There are two versions of the logarithm.
\begin{itemize}
\item The first relies on 80-bit double-extended arithmetic, and is
  well suited to IA32 and IA64 architectures which have hardware
  support for such arithmetic. It computes the quick step in
  double-extended arithmetic, and the accurate step in
  double-double-extended arithmetic.
\item The second relies only on double-precision arithmetic, and is
  portable. It uses double-double for the quick step, and
  triple-double for the accurate step.
\end{itemize}

Both implementations use the same algorithm, which is detailed in
section \ref{sec:logoutline}. Sections \ref{sec:logdeproof} and
\ref{sec:logtdproof} detail the proofs of both implementations, and
section \ref{sec:logperf} gives some performance results.


\section{General outline of the algorithm\label{sec:logoutline}}
The algorithm used is mainly due to
Wong and Goto \cite{WG94} and has been discussed further in \cite{Muller97}. In the implementation presented here,
both the quick and the accurate phase use essentially the same algorithm, optimized however for different accuracies.

The function's argument $x \in \F$ is first checked for special cases, such as $x\leq0$, $+\infty$, $\nan$, etc.
These checks are mainly implemented using integer arithmetic and will be further explained in section
\ref{subsec:reduction}. Then, the argument is reduced using integer arithmetic as follows:
$$x = 2^{E^\prime} \cdot m$$
where $E^\prime$ is the exponent of $x$ and $m$ a double corresponding
to the mantissa of $x$. This decomposition is done such that in any
case, i.e. even if $x$ is subnormal, $1 \leq m < 2$. In the subnormal
case, the exponent of $x$ is adjusted accordingly.  This first
argument reduction corresponds to the equality
$$\log\left( x \right) = E^\prime \cdot \log\left(2\right) + \log\left(m \right)$$
Using this term directly would lead to catastrophic cancellation in the case where $E^\prime = -1$ and
$m \approx 2$. To overcome this difficulty, a second adjustment is done as follows:
\begin{center}
  \begin{tabular}{cc}
    \begin{minipage}{50mm}
      $$E = \left \lbrace \begin{array}{lcl} E^\prime & \mbox{ if } & m \leq \sqrt{2} \\
          E^\prime +1 & \mbox{ if } & m > \sqrt{2} \end{array} \right.$$
    \end{minipage}
    &
    \begin{minipage}{50mm}
      $$y = \left \lbrace \begin{array}{lcl} m & \mbox{ if } & m \leq \sqrt{2} \\
          \frac{m}{2} & \mbox{ if } & m > \sqrt{2} \end{array} \right.$$
    \end{minipage}
  \end{tabular}
\end{center}
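This adjustment indeed removes the cancellation: for an argument $x$ slightly below $1$ we have $E^\prime = -1$ and
$m$ slightly below $2$, so the direct formula would subtract two terms that are both close to $\log\left( 2 \right)$;
after the adjustment, $E = 0$ and $y = \frac{m}{2}$ is slightly below $1$, and
$$\log\left( x \right) = 0 \cdot \log\left( 2 \right) + \log\left( y \right)$$
is computed without any cancellation.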

The decision whether $m \leq \sqrt{2}$ or not is performed using integer arithmetic on
the high order bits of the mantissa $m$. The test is therefore not completely exact, which is no
disadvantage since, in any case, the bound $\sqrt{2}$ is somewhat arbitrary.\par
All the previous reduction steps can be implemented exactly as they consist mainly of decompositions
of a floating point number, multiplications by powers of $2$ and integer additions on the corresponding exponent value.
All this leads to the following equation
$$\log\left( x \right) = E \cdot \log\left( 2 \right) + \log\left( y \right)$$
where
$$-\frac{1}{2} \cdot \log\left( 2 \right) \leq \log\left( y \right) \leq \frac{1}{2} \cdot \log\left( 2 \right)$$
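As a small worked example of this first argument reduction, take $x = 3$: then $E^\prime = 1$ and $m = 1.5 > \sqrt{2}$,
so $E = 2$ and $y = 0.75$, and indeed
$$\log\left( 3 \right) = 2 \cdot \log\left( 2 \right) + \log\left( 0.75 \right) \approx 1.3863 - 0.2877 \approx 1.0986$$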
The range of $y$ is still too wide to allow for a direct polynomial approximation of $\log\left(y\right)$.
Therefore, a second argument reduction step is performed using a table of $128$ entries as follows:
using the high order bits of $y$ as an index $i$, a tabulated value $r_i$ is looked up which approximates
$\frac{1}{y}$ very well. Setting $z = y \cdot r_i - 1$, one obtains
$$\log\left( y \right) = \log\left( 1 + z \right) - \log\left( r_i \right)$$
Since $y = \frac{1}{r_i} + \delta$, the magnitude of $z$ is finally
small enough (typically $\left \vert z \right \vert < 2^{-8}$) for
approximating $\log\left(1+z\right)$ by a Remez polynomial
$p\left(z\right)$. The values for $\log\left(r_i\right)$ are of course also tabulated.

It is important to notice that the reduction step
$$z = y \cdot r_i - 1$$
can be implemented exactly, which eases the correctness proof of the algorithm. This property will be proven in
section \ref{subsec:reduction}. The reduced argument $z$ will
be represented as a double-double number $z_\hi + z_\lo$ that will be fed into the polynomial approximation
algorithms of both quick and accurate phase. Each of these phases takes the less significant value
$z_\lo$ into account for more or fewer of the higher monomial degrees.

Both phases will finally reconstruct the function's value as follows:
$$\log\left( x \right) \approx E \cdot \log\left( 2 \right) + p\left( z \right) - \log\left( r_i \right)$$
using a double-double (respectively a triple-double for the accurate phase) value for each of
$\log\left( 2 \right)$ and $-\log\left( r_i \right)$. The computations necessary for performing this reconstruction
are carried out in double-double arithmetic for the quick phase and triple-double arithmetic for the accurate phase.

The quick phase uses a modified Remez polynomial of degree $7$ of the form
$$p\left( z \right) = z - \frac{1}{2} \cdot z^2 + z^3 \cdot
\left( c_3 + z \cdot \left( c_4 + z \cdot \left( c_5 + z \cdot \left( c_6 + z \cdot c_7 \right) \right) \right) \right)$$
with $c_i \in \F$.
This polynomial is evaluated as indicated by the parentheses in the following term:
$$p\left( z_\hi + z_\lo \right) \approx \left( \left(z_\hi + z_\lo \right) - \frac{1}{2} \cdot  z_\hi^2\right) +
\left( \left( - z_\hi \cdot z_\lo \right) +
\left(z_\hi^2 \cdot z_\hi \right) \cdot
\left( c_3 + z_\hi \cdot \left( c_4 + z_\hi \cdot \left( c_5 + z_\hi \cdot \left( c_6 + z_\hi \cdot c_7 \right) \right) \right) \right) \right)$$
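The fragment below is a minimal sketch of this evaluation order in plain C. The variable names are hypothetical,
and in the real code the leading terms and the final summations are performed with exact double-double operators
rather than the plain additions shown here; the sketch only illustrates which terms are computed on $z_\hi$ alone
and where $z_\lo$ enters:
\begin{lstlisting}[caption={Sketch of the quick phase evaluation order (illustrative names only)}]
static double p_quick_sketch(double zh, double zl,
                             double c3, double c4, double c5, double c6, double c7) {
  double polyHorner = c3 + zh * (c4 + zh * (c5 + zh * (c6 + zh * c7))); /* Horner part on zh only */
  double zhSquare   = zh * zh;
  double zhCube     = zhSquare * zh;
  double higher     = (-zh * zl) + zhCube * polyHorner;                 /* where zl enters        */
  return ((zh + zl) - 0.5 * zhSquare) + higher;                         /* leading terms first    */
}
\end{lstlisting}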
The mathematical relative approximation error of the polynomial $p\left( z \right)$ defined as
$$\epsilon_{\mbox{\tiny meth}} = \frac{p\left( z \right) - \log\left( 1 + z \right)}{\log\left(1 + z \right)}$$ is bounded by
$$\left \vert \epsilon_{\mbox{\tiny meth}} \right \vert \leq 2^{-62.99}$$
To this approximation error must be added the arithmetical error induced by the evaluation of $p\left( z \right)$
and by the rounding of the constants $\log\left( 2 \right)$ and $\log\left( r_i \right)$.
As will be shown in section \ref{subsec:quickphase}, the overall error of the quick phase defined as
$$\epsilon_{\mbox{\tiny quick}} = \frac{\left(log_\hi + log_\lo\right) - \log\left(x\right)}{\log\left(x\right)}$$
is bounded by
$$\left \vert \epsilon_{\mbox{\tiny quick}} \right \vert \leq 5 \cdot 2^{-65} \leq 2^{-62.6}$$ ~ \par
After the computation of the quick phase double-double value $\left( log_\hi + log_\lo \right)$, a rounding test is performed
using the rounding constants according to \ref{th:roundingRN1}. If the rounding cannot be decided, the accurate
phase is launched. \par
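In round-to-nearest mode, such a rounding test typically takes the shape sketched below. The constant name
{\tt roundcst} and the exact code shape are assumptions made here for illustration only; the actual test and the
value of the rounding constant are the ones prescribed by \ref{th:roundingRN1}:
\begin{lstlisting}[caption={Sketch of a round-to-nearest rounding test (names assumed)}]
if (logh == (logh + (logm * roundcst)))
  return logh;          /* the double-double result rounds correctly to a double */
/* otherwise fall through to the accurate (triple-double) phase */
\end{lstlisting}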
The accurate phase performs all its computations on the same reduced argument $z = z_\hi + z_\lo$ which will be shown to be
exact. An approximation polynomial of degree $14$ is used. It is once again a modified Remez polynomial and has the
following form:
$$p\left( z \right) = z - \frac{1}{2} \cdot z^2 + z^3 \cdot q\left( z \right)$$
where
$$q\left( z \right) =
c^\prime_3 + z \cdot \left(
c^\prime_4 + z \cdot \left(
c^\prime_5 + z \cdot \left(
c^\prime_6 + z \cdot \left(
c^\prime_7 + z \cdot \left(
c^\prime_8 + z \cdot \left(
c^\prime_9 + z \cdot r\left( z \right) \right) \right) \right) \right) \right) \right)$$
with
$c^\prime_i = c_{i\hi} + c_{i\lo} \in \F + \F$ and
$$r\left( z \right) =
c_{10} + z \cdot \left(
c_{11} + z \cdot \left(
c_{12} + z \cdot \left(
c_{13} + z \cdot c_{14} \right) \right) \right)$$
with $c_i \in \F$.
The mathematical relative error
$$\epsilon_{\mbox{\tiny meth}} = \frac{p\left( z \right) - \log\left( 1 + z \right)}{\log\left( 1 + z \right)}$$
is bounded by
$$\left \vert \epsilon_{\mbox{\tiny meth}} \right \vert \leq  2^{-125}$$
The polynomial is evaluated using double precision for $r\left( z \right)$, double-double arithmetic for
$q\left( z \right)$ and a triple-double representation for $p\left( z \right)$ and the final reconstruction.
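As an illustration of the first part of this evaluation strategy, the degree-$10$ to degree-$14$ tail
$r\left( z \right)$ can be evaluated entirely in plain double precision before its result {\tt highPoly} enters the
double-double Horner steps of listing \ref{list:doubledouble}; the coefficient names below are assumptions chosen by
analogy with the {\tt accPolyC3} through {\tt accPolyC9} coefficients appearing there:
\begin{lstlisting}[caption={Sketch of the evaluation of $r(z)$ in plain double precision (coefficient names assumed)}]
highPoly = accPolyC10 + zh * (accPolyC11 + zh * (accPolyC12 + zh * (accPolyC13 + zh * accPolyC14)));
\end{lstlisting}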

The overall error
$$\epsilon_{\mbox{\tiny accurate}} = \frac{\left( log_\hi + log_\mi + log_\lo \right) - \log\left( x \right)}{\log\left( x \right)}$$
is bounded by
$$\left \vert \epsilon_{\mbox{\tiny accurate}} \right \vert \leq 5735 \cdot 2^{-132} \leq 2^{-119.5}$$
as will be shown in section \ref{subsec:accuratephase}. Here $\left( log_\hi + log_\mi + log_\lo \right)$
is obtained by reconstructing the logarithm as indicated by the parentheses in the following term:
$$log_\hi + log_\mi + log_\lo = \left(E \cdot \left( log2_\hi + log2_\mi + log2_\lo \right) \right) +
\left( \left( p_\hi + p_\mi + p_\lo \right) + \left(logi_\hi + logi_\mi + logi_\lo \right) \right)$$
where
$log2_\hi + log2_\mi + log2_\lo \approx \log\left( 2 \right)$ and $logi_\hi + logi_\mi + logi_\lo \approx -\log\left(r_i \right)$.

Since the critical accuracy of the double precision $\log$ function is $118$ bits according to
\cite{DinDefLau2004LIP}, rounding $log_\hi + log_\mi + log_\lo \approx \log\left( x \right)$ to double precision is equivalent
to rounding the infinite precision value $\log\left( x \right)$ to double precision.
Using the final rounding sequences presented in \cite{Lauter2005LIP:tripledouble}, which are shown there to be correct,
the double precision value returned by the function is the correctly rounded double precision value of
$\log\left( x \right)$.



\section{Proof of correctness of the triple-double implementation \label{sec:logtdproof}}
Proving that an implementation of an elementary function is correctly rounded mainly means proving two
bounds on the relative errors $\epsilon_{\mbox{\tiny quick}}$ and $\epsilon_{\mbox{\tiny accurate}}$, using the appropriate lemma for proving the
correctness of the rounding test and concluding by means of the theorem stating the critical accuracy of the
function considered. The computation of the error bounds will be done mainly using the Gappa tool \cite{Melqu05} but
some parts of the proof are still based on paper or Maple computations. These parts will be shown in sections
\ref{subsec:reduction}, \ref{subsec:quickphase} and \ref{subsec:accuratephase} and mainly comprise the following:
\begin{itemize}
\item the demonstration that all special cases are handled correctly,
\item a proof that $z_\hi + z_\lo = r_i \cdot y - 1$ exactly,
\item the bounds for the mathematical approximation errors of the polynomials,
\item a proof of the exactness of some multiplications in the code,
\item the proof of the accuracy of all basic addition and multiplication code sequences on
double-double and triple-double numbers,
\item the correctness proof of the final rounding sequences for rounding triple-double numbers to double precision and
\item the mathematical equality of the term rewriting hints in the Gappa code.
\end{itemize}
The proofs of the accuracy of the basic operation bricks and the correctness proof of the final rounding sequences
are somewhat lengthy and are not given here; they can be found in \cite{Lauter2005LIP:tripledouble}.
\subsection{Exactness of the argument reduction\label{subsec:reduction}}
In this section, we will show that all special cases are handled correctly and that the
reduced argument consisting of $E$ and $z_\hi + z_\lo$ is exact, which means that we have the mathematically exact
equation
$$\log\left( x \right) = E \cdot \log\left( 2 \right) + \log\left( 1 + \left( z_\hi + z_\lo \right) \right) - \log\left( r_i \right)$$
This part of the algorithm is performed by the following code sequences, which we will analyse line by line:
\begin{lstlisting}[caption={Handling of special cases and table access},firstnumber=1]
E=0;
xdb.d=x;

/* Filter cases */
if (xdb.i[HI] < 0x00100000){                     /* x < 2^(-1022)    */
  if (((xdb.i[HI] & 0x7fffffff)|xdb.i[LO])==0){
    return -1.0/0.0;
  }                                              /* log(+/-0) = -Inf */
  if (xdb.i[HI] < 0){
    return (x-x)/0;                              /* log(-x) = Nan    */
  }
  /* Subnormal number */
  E = -52;
  xdb.d *= ((db_number) ((double) two52)).d;     /* make x a normal number    */
}

if (xdb.i[HI] >= 0x7ff00000){
  return  x+x;                                   /* Inf or Nan       */
}


/* Do argument reduction */
E += (xdb.i[HI]>>20)-1023;                       /* extract the exponent */
index = (xdb.i[HI] & 0x000fffff);
xdb.i[HI] =  index | 0x3ff00000;                 /* do exponent = 0 */
index = (index + (1<<(20-L-1))) >> (20-L);

/* reduce  such that sqrt(2)/2 < xdb.d < sqrt(2) */
if (index >= MAXINDEX){                          /* corresponds to xdb>sqrt(2)*/
  xdb.i[HI] -= 0x00100000;
  E++;
}
y = xdb.d;
index = index & INDEXMASK;

ed = (double) E;

ri = argredtable[index].ri;

logih = argredtable[index].logih;
logim = argredtable[index].logim;
\end{lstlisting}
Analysis of the code:
{\renewcommand{\labelenumi}{}
\begin{enumerate}
\item lines 1 and 2: Initialization of the integer $E$ and of the {\tt db\_number} {\tt xdb}, which is now equal to $x$.
\item line 5: As the integer ordering and the ordering on floating point numbers are compatible, the test being true means that
$x < +2^{-1022}$, i.e. $x$ is negative, $-\infty$, equal to zero or a subnormal.
\item line 6: {\tt xdb.i[HI] \& 0x7fffffff} is the high order word of $x$ without the sign bit. If the test is true,
$\left \vert x \right \vert = 0$. As the logarithm of $0$ is not defined but the limit $-\infty$ is known, returning
$-1.0 / 0.0$ is correct.
\item line 9: Since the integer ordering and the ordering on floating point numbers are compatible,
{\tt xdb.i[HI] < 0} implies $x < 0$. The logarithm is not defined for negative numbers, so the result must be $\nan$.
$0.0 / 0.0$ leads to a $\nan$; one uses $\left(x - x\right) / 0.0$ in order to overcome the static tests of the compiler.
\item lines 13 and 14: if these code lines are reached, $x$ is a subnormal. Since $E$ equals $0$ at this point,
setting it to $-52$ and multiplying {\tt xdb} by $2^{52}$ means bringing {\tt xdb} to the normal number range and
rescaling the internal representation $x = 2^E \cdot m = 2^E \cdot${\tt xdb} in consequence.
\item line 17: As the integer ordering and the ordering on floating point numbers are compatible and as
{\tt 0x7fefffff ffffffff} is the greatest normal, the test being true implies that $x$ is equal to $+\infty$ or $\nan$.
In the case of $x=+\infty$, $+\infty$ must be returned, which is done. In the other case, $\nan$ must be returned, which is
still assured by $x + x$.
\item line 23: At this point of the code, the most significant bit of the high order word of {\tt xdb} must be $0$ as
the case where $x < 0$ is already filtered out. So {\tt xdb.i[HI] >> 20} is equal to the biased exponent of {\tt xdb}
because a double number consists of $1$ sign bit and $11$ exponent bits and the word bit length is supposed to be $32$.
Subtracting $1023$ yields the unbiased exponent, which is written to $E$.
\item lines 24 and 25: Since a double number consists of $1$ sign bit and $11$ exponent bits, the operation
{\tt xdb.i[HI] \& 0x000fffff} extracts the mantissa bits in the higher order word of {\tt xdb}.
Rewriting {\tt xdb.i[HI] = index | 0x3ff00000} means setting the exponent of {\tt xdb} to $0$ because
{\tt 0x3ff}$ - 1023 = 0$.
\item line 26: Before execution of this line of code, {\tt index} contains the high order bits of the normalized mantissa
of $x$ stored as a double in {\tt xdb.d} and thus verifying $1 \leq m < 2$. The second argument reduction step
will slice this interval into $128$ intervals, for each of which a table entry is available. For reasons of possible
cancellation in the reconstruction step on the operation $p\left( z \right) - \log\left( r_i \right)$, we want the
small intervals to be centered around $1$. That means e.g. for the interval around $1$ and a table indexed by $7$ bits
that mantissas (as doubles) with the high order word {\tt 0x3fefffff} through {\tt 0x3ff00fff} must be mapped to $0$.
The decision is therefore made at the $\left(7+1\right)$-th bit of the mantissa part of the double depending on whether this bit is $0$
-- in which case the value falls into the lower interval -- or $1$ -- in which case the value goes to the next higher
interval. So adding $1$ at the $\left(20 - L - 1\right)$-th bit position (with $L = 7$) before the right shift by $20 - L$ bits increases the index value by $1$ iff this bit is $1$.
So after execution of the line, {\tt index} contains the number of the interval, centered in $1$, for the second argument
reduction step.
\item lines 29 through 31: The second adjustment to be made on $E^\prime$ and $m$ is the decision whether $m > \sqrt{2}$ as
indicated in section \ref{sec:logoutline}. The high order word of $\sqrt{2}$ rounded to a double is {\tt 0x3ff6a09e}.
As one can simply verify, the value for {\tt index} calculated for this value is $53$ (a short numerical check is given
just after this list). As the integer ordering and
the ordering of floating point numbers are compatible and as the computations for computing {\tt index} are monotone,
{\tt index} being greater than or equal to $53$ implies that the (normalized) mantissa of $x$ is greater than
$\sqrt{2} + \delta$ with a negligible error $\delta$.
As {\tt MAXINDEX} is equal to $53$, the test will be true iff the adjustment on $E^\prime$ leading
to $E$ and on $m$ yielding $y$ is to be made. It is trivial to see that the code in the {\tt if}'s body implements the
adjustment correctly.
\item lines 33 and 34: the final value of the reduced argument $y$ -- still stored in {\tt xdb.d} -- is copied to
a {\tt double} variable (or register) named {\tt y}. The final index value is masked out by means of an {\tt INDEXMASK}
which is equal to $127 = 2^7-1$.
\item line 36: The integer value of the exponent $E$ stored in {\tt E} is cast to a double and stored in {\tt ed}.
\item lines 38 through 41: The table is indexed by {\tt index} and the values {\tt ri}$=r_i$,
{\tt logih}$=logi_\hi$ and {\tt logim}$=logi_\mi$ are read.
Since the latter two form a double-double precision value, we know that
$logi_\hi + logi_\mi = -\log\left( r_i \right) \cdot \left( 1 + \epsilon \right)$ with $\left \vert \epsilon \right \vert \leq 2^{-106}$.
The value {\tt ri} is stored as a single precision variable and a Maple procedure assures that for each value
$y$ the following inequality is verified:
$$\left \vert z \right \vert = \left \vert y \cdot r_i - 1 \right \vert \leq 2^{-8}$$
\end{enumerate}
}
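The value $53$ claimed above for the high order word of $\sqrt{2}$ can be checked by replaying the integer computations
of lines 24 and 26 on the constant {\tt 0x3ff6a09e}; the following stand-alone fragment is given for illustration only
and is not part of the implementation:
\begin{lstlisting}[caption={Illustrative check of the index computed for $\sqrt{2}$}]
#include <stdio.h>

/* Replays the index computation on the high word of sqrt(2), 0x3ff6a09e. */
int main(void) {
  const int L = 7;
  int index = 0x3ff6a09e & 0x000fffff;               /* line 24: mantissa bits */
  index = (index + (1 << (20 - L - 1))) >> (20 - L); /* line 26: rounded index */
  printf("index = %d\n", index);                     /* prints: index = 53     */
  return 0;
}
\end{lstlisting}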
Let us now show that the following lines compute $z_\hi$ and $z_\lo$ such that for each $y$ and corresponding $r_i$,
we obtain exactly
$$z_\hi + z_\lo = y \cdot r_i - 1$$
\begin{lstlisting}[caption={Argument reduction},firstnumber=42]
Mul12(&yrih, &yril, y, ri);
th = yrih - 1.0;
Add12Cond(zh, zl, th, yril);
\end{lstlisting}
We know that we can suppose that the multiplication and addition sequences \Mul~ and \Add~ used at lines
42 and 44 are exact. Thus, it suffices to show that
$$yri_\hi - 1.0 = yri_\hi \ominus 1.0$$
because in that case, we can write
$$z_\hi + z_\lo = th + yri_\lo = yri_\hi \ominus 1.0 + yri_\lo = y \cdot r_i - 1.0$$
We will show this property using Sterbenz' lemma. It thus suffices to prove that
$$\frac{1}{2} \leq yri_\hi \leq 2$$
We know that
\begin{eqnarray*}
yri_\hi & = & \circ\left( y \cdot r_i \right) \\
& \leq & \circ \left( 1 + 2^{-8} \right) \\
& = & 1 + 2^{-8} \\
& < & 2
\end{eqnarray*}
since the rounding function $\circ$ is monotonic and the accuracy of the format is greater than $9$ bits.

In the other direction, we get
\begin{eqnarray*}
yri_\hi & = & \circ \left( y \cdot r_i \right) \\
& \geq & \circ \left( 1 - 2^{-8} \right) \\
& = & 1 - 2^{-8} \\
& > & \frac{1}{2}
\end{eqnarray*}
for the same reasons.

Thus $z_\hi + z_\lo = y \cdot r_i - 1$ exactly. Since the previous phases of the argument reduction were all exact, the reduced argument
verifies $x = 2^{E} \cdot y$ exactly.

Still in this section, let us show that neither the reduced argument of the logarithm function nor its result may be
a subnormal double number. The first property has already been assured by the special case handling as shown above. The
latter can be proven as follows: the $\log\left( x \right)$ function has one zero for $x = 1$ and only one.
As it is monotone, for $x = 1 \pm 1 \mUlp = 1 \pm 2^{-52}$ we will obtain $\log\left( 1 \pm 2^{-52} \right) = 0 \pm 2^{-52} + \delta$
with
$\left \vert \delta \right \vert \leq 2^{-103}$. As $\pm 2^{-1022}$ is the least normal in magnitude, the result of the logarithm function will
always be a normal. Further, in both the double-double and the triple-double representation of the final intermediate result
of the function, as its critical accuracy is $118$, the least significant double in the representation will still be
a normal as $52 + 106 = 158 < 1022$.
\subsection{Accuracy proof of the quick phase\label{subsec:quickphase}}
As already mentioned, the accuracy proof of the quick phase is mainly based on the Gappa tool. To prove the desired
accuracy bound defined as
$$\epsilon_{\mbox{\tiny quick}} = \frac{\left(log_\hi + log_\lo\right) - \log\left(x\right)}{\log\left(x\right)}$$
and given by
$$\left \vert \epsilon_{\mbox{\tiny quick}} \right \vert \leq 5 \cdot 2^{-65} \leq 2^{-62.6}$$
three different Gappa proof files are necessary depending on the following cases:
\begin{itemize}
\item for $E \geq 1$ and all indexes to the table $0 \leq i \leq 127$, a general proof file named {\tt log-td.gappa} is used,
\item for $E = 0$ and all indexes to the table except $0$, i.e. $1 \leq i \leq 127$, a proof file named {\tt log-td-E0.gappa}
is used and
\item for $E = 0$ and the table index $i = 0$, a proof file called {\tt log-td-E0-logir0.gappa} is employed.
This latter file
uses relative error computations as opposed to the other two cases where absolute error estimates suffice. This
is necessary because in this case, and in this one only, the logarithm function has a zero in the interval considered.
\end{itemize}
In each of the three proof files, we will ask the Gappa tool to verify the accuracy bound expressed in its syntax as
follows:
\begin{lstlisting}[caption={Accuracy bound to prove},firstnumber=109]
->
((logh + logm) - Log) / Log in [-5b-65,5b-65]
\end{lstlisting}
Still in each proof file, some hypotheses are made on the exactness of one multiplication sequence and on the
accuracy of the constants and of the remaining operations carried out in double-double arithmetic.
These hypotheses are the following:
\begin{itemize}
\item The operations in the following code sequence are exact since the constants are stored with enough trailing zeros:
\begin{lstlisting}[caption={Multiplication by $E$},firstnumber=50]
Add12(log2edh, log2edl, log2h * ed, log2m * ed);
\end{lstlisting}
This means that $log2ed_\hi + log2ed_\lo = E \cdot \left( log2_\hi + log2_\lo \right)$ exactly.
\item The operations in the following code sequence are exact since multiplications by a power of $2$ are exact
as long as the result is not underflowed:
\begin{lstlisting}[caption={Multiplication by $-0.5$},firstnumber=60]
zhSquareHalfh = zhSquareh * -0.5;
zhSquareHalfl = zhSquarel * -0.5;
\end{lstlisting}
i.e. $zhSquareHalf_\hi + zhSquareHalf_\lo = -0.5 \cdot \left( zhSquare_\hi + zhSquare_\lo \right)$.
\item The following hypotheses on the accuracy bounds, expressed here in Gappa syntax, are verified:
\begin{lstlisting}[caption={Gappa hypotheses},firstnumber=100]
(T2hl - T2) / T2 in [-1b-103,1b-103]
/\ (Phl - PE) / PE in [-1b-103,1b-103]
/\ (LogTabPolyhl - LogTabPoly) / LogTabPoly in [-1b-103,1b-103]
/\ (Loghm - LogE) / LogE in [-1b-103,1b-103]
/\ (Log2hm - Log2) / Log2 in [-1b-84,1b-84]
/\ (Logihm - Logir) / Logir in [-1b-106,1b-106]
/\ Z in [_zmin,_zmax]
/\ (P - Log1pZ) / Log1pZ in [-_epsilonApproxQuick,_epsilonApproxQuick]
/\ ((logh + logm) - Loghm) / Loghm in [-1b-106,1b-106]
\end{lstlisting}
Here, {\tt \_zmin}, {\tt \_zmax} and {\tt \_epsilonApproxQuick} are replaced by Maple calculated values, typically
$-zmin = zmax = 2^{-8}$ and $epsilonApproxQuick = 2^{-62.99}$.
\end{itemize}
Let us now prove each of these hypotheses.
\begin{enumerate}
\item The operations yielding {\tt log2edh} and {\tt log2edl} are all exact because the \Add~ sequence is supposed
to be exact in any case, because the constants {\tt log2h} and {\tt log2m} are calculated by the following Maple
code and have in consequence at least 11 trailing zeros, and because {\tt ed} $=E$ is less than $1024$ in magnitude since $1024$ is
the maximum exponent value for double precision.
\begin{lstlisting}[caption={Maple code for computing {\tt log2h} and {\tt log2m}},firstnumber=21,label={list:maplelog2}]
log2acc := log(2):
log2h := round(log2acc * 2**(floor(-log[2](abs(log2acc))) + (53 - 11))) /
         2**(floor(-log[2](abs(log2acc))) + (53 - 11)):
log2m := round((log2acc - log2h) * 2**(floor(-log[2](abs((log2acc - log2h)))) +
         (53 - 11))) / 2**(floor(-log[2](abs((log2acc - log2h)))) + (53 - 11)):
\end{lstlisting}
\item To show that $zhSquareHalf_\hi + zhSquareHalf_\lo = -0.5 \cdot \left( zhSquare_\hi + zhSquare_\lo \right)$ we just have to show
that both values $zhSquare_\hi$ and $zhSquare_\lo$ are either equal to $0$ or large enough in magnitude for their
multiplication by $-0.5$ to be exact. Let us first give the definitions of both values:
\begin{eqnarray*}
zhSquare_\hi & = & \circ \left( z_\hi \cdot z_\hi \right) \\
zhSquare_\lo & = & z_\hi \cdot z_\hi - zhSquare_\hi
\end{eqnarray*}
where $z_\hi = \circ \left( z \right)$.
Let us suppose that $z \not = 0$. Otherwise all values are equal to $0$ and we can conclude.

Let us first show that $\left \vert zhSquare_\hi \right \vert$ is greater than $2^{54}$ times the smallest normal.
Let us therefore suppose that this
is not the case, i.e. $\left \vert zhSquare_\hi \right \vert < 2^{-968}$. Since the rounding function is monotonic,
this implies that $\left \vert z_\hi \right \vert \leq 2^{-484}$. For the same reason, we can note that
$\left \vert z \right \vert \leq 2^{-484}$. As we have $z = y \cdot r_i - 1$, clearly neither $y$ nor $r_i$ can be exactly $1$.
If this were the case for both, we would obtain $z=0$, which we excluded; if only one of them were
exactly $1$, the other being a floating point number in the interval $\left[ 0.5, 1.5 \right]$,
we would obtain the inequality $\left \vert z \right \vert \geq 2^{-53}$, which would be contradictory.

Otherwise, since we know that $1 - 2^{-8} \leq y \cdot r_i \leq 1 + 2^{-8}$ and since the precision of all formats used is greater than
$9$ bits, the hypothesis that $1 - 2^{-484} \leq y \cdot r_i \leq 1 + 2^{-484}$ and $y \cdot r_i \not = 1$
would imply that the infinite precision mantissa of
$y \cdot r_i$ contains a $1$ weighted with $2^0$ and a $1$ weighted with less than $2^{-484}$. So its length would be greater than
$483$ bits. As it is the product of two floating point numbers whose mantissas have at most $53$ and $24$ significant bits, there
cannot be a $1$ weighted with less than $2^{-76}$ if there is a $1$ weighted with $2^0$, which is the case. Contradiction.

So $-0.5 \cdot zhSquare_\hi$ is not underflowed. Additionally, the same argument on the mantissa lengths shows that
$y \cdot r_i$, and hence also $z = y \cdot r_i - 1$, is always an integer multiple of $2^{-77}$; a non-zero $z$ therefore
satisfies $\left \vert z \right \vert \geq 2^{-77}$ and thus $\left \vert z_\hi \right \vert \geq 2^{-77}$. Writing
$z_\hi = M \cdot 2^{k}$ with $2^{52} \leq \left \vert M \right \vert < 2^{53}$, this gives $2^{2k} \geq 2^{-260}$, and
$z_\hi \cdot z_\hi$, $zhSquare_\hi$ and thus also $zhSquare_\lo = z_\hi \cdot z_\hi - zhSquare_\hi$ are integer multiples
of $2^{2k}$. In consequence, $zhSquare_\lo$ is either $0$ or at least $2^{-260}$ in magnitude, far above $2$ times the
smallest normal.

So, the floating point multiplications of $zhSquare_\hi$ and $zhSquare_\lo$ by $-0.5$ can be considered to be exact.
\item {\tt (T2hl - T2) / T2 in [-1b-103,1b-103]}, which means that
$$\left \vert \frac{T2hl - T2}{T2} \right \vert \leq 2^{-103}$$ is verified as $T2hl$ and $T2$ are defined as follows:
$$T2hl = t2_\hi + t2_\lo \gets \mAddDD \left( z_\hi, z_\lo, zhSquareHalf_\hi, zhSquareHalf_\lo \right)$$
$$T2 = \left( z_\hi + z_\lo \right) + \left( zhSquareHalf_\hi + zhSquareHalf_\lo \right)$$
The given bound is thus just the accuracy bound of the \AddDD~ sequence, for which a proof can be found in
\cite{Lauter2005LIP:tripledouble}.
\item {\tt (Phl - PE) / PE in [-1b-103,1b-103]} is verified for the same reason; let us just recall the definitions
$$Phl = p_\hi + p_\lo \gets \mAddDD \left( t2_\hi, t2_\lo, t1_\hi, t1_\lo \right)$$
$$PE = \left( t2_\hi + t2_\lo\right) + \left( t1_\hi + t1_\lo \right)$$
\item {\tt (LogTabPolyhl - LogTabPoly) / LogTabPoly in [-1b-103,1b-103]} still falls into the same case with
$$LogTabPolyhl = logTabPoly_\hi + logTabPoly_\lo \gets \mAddDD \left( logi_\hi, logi_\mi, p_\hi, p_\lo \right)$$
$$LogTabPoly = \left( logi_\hi + logi_\mi \right) + \left( p_\hi +  p_\lo \right)$$
\item And finally, {\tt (Loghm - LogE) / LogE in [-1b-103,1b-103]}
is also just the accuracy bound of the \AddDD~ sequence, this time for
$$Loghm = log_\hi + log_\mi \gets \mAddDD \left( log2ed_\hi, log2ed_\lo, logTabPoly_\hi, logTabPoly_\lo \right)$$
$$LogE = \left( log2ed_\hi + log2ed_\lo \right) + \left( logTabPoly_\hi + logTabPoly_\lo \right)$$
\item {\tt (Log2hm - Log2) / Log2 in [-1b-84,1b-84]} is verified
since $log2_\hi$ and $log2_\mi$ are computed as already indicated in listing \ref{list:maplelog2}.
This means that at least $11$ trailing zeros are stored in each of the doubles of this (pseudo-)double-double number,
so it is exact to $2^{-106+2 \cdot 11} = 2^{-84}$.
\item {\tt (Logihm - Logir) / Logir in [-1b-106,1b-106]}, which means
$$\left \vert \frac{\left( logi_\hi + logi_\mi \right) + \log\left( r_i \right)}{\log\left( r_i \right)} \right \vert \leq 2^{-106}$$
is verified by construction as $logi_\hi$ and $logi_\mi$ are computed by the following Maple code:
\begin{lstlisting}[caption={Maple code for computing $logi_\hi$ and $logi_\mi$},firstnumber=35]
(logih[i], logim[i], logil[i]) := hi_mi_lo(evalf(-log(r[i]))):
\end{lstlisting}
where {\tt hi\_mi\_lo} is the procedure for rounding an arbitrary precision number to a triple-double number
whose two most significant components form a double-double number.
\item The hypothesis {\tt Z in [\_zmin,\_zmax]} simply recalls the bounds for $z$ as calculated by Maple.
\item The same can be said of the hypothesis \\
{\tt (P - Log1pZ) / Log1pZ in [-\_epsilonApproxQuick,\_epsilonApproxQuick]} \\
which gives the mathematical approximation error of the polynomial. This bound is computed by Maple using the following
instructions:
\begin{lstlisting}[caption={Maple code for computing the relative error of the polynomial},firstnumber=129]
epsilonApproxQuick := numapprox[infnorm]( 1-polyQuick/log(1+x), x=zminmin..zmaxmax)
\end{lstlisting}
\item Finally, Gappa's hypothesis {\tt ((logh + logm) - Loghm) / Loghm in [-1b-106,1b-106]}
simply restates the fact that a double-double precision number is exact to
at least $2^{-106}$ in terms of its relative error.
\end{enumerate}
The Gappa tool itself is not capable of proving, without help, the final accuracy bound it is asked for on an algorithm
as complex as the one given here. Its user must provide hints to help it rewrite the interval arithmetic terms it encounters
in the program. These hints are generally given in the form {\tt $\alpha$ -> $\beta$},
where $\beta$ is an expression by which we want the
tool to rewrite the expression $\alpha$. Generally speaking, the idea behind each hint is one of the following:
\begin{itemize}
\item For computing interval bounds on differences like $\alpha = a - A$ where both $a$ and $A$ are sums of terms like
$a = c + C$ and $A = d + D$, it is often useful to rewrite $\alpha$ by $\beta = \left(c - d \right) + \left( C - D \right)$.
\item An interval bound can often be found more easily for a term $A$ representing an exact mathematical value than for $a$,
which is its arithmetical equivalent. So it is useful to rewrite $a$ by $A \cdot \left( 1 + \frac{a - A}{A} \right)$ when
an interval for $\frac{a - A}{A}$ is known.
\item Fractional left-hand sides like $\frac{a}{b}$ where both expressions $a$ and $b$ are functions in a common argument
$x$ that can be written like $a = a\left( x \right) = x^n \cdot a^\prime\left( x \right)$ and
$b = b\left( x \right) = x^m \cdot b^\prime\left( x \right)$ should usually be rewritten as follows:
$$\frac{a\left(x\right)}{b\left( x \right)} = \frac{x^n \cdot a^\prime\left( x \right)}{x^m \cdot b^\prime\left( x \right)} =
x^{n - m} \cdot \frac{a^\prime\left( x \right)}{b^\prime\left( x \right)}$$ In particular, this kind of hint is needed when an
interval for the denominator of a fractional left-hand side contains $0$.
\item Fractional left-hand sides of the form $\frac{a - A}{A}$ with an unknown $A$ can easily be rewritten as
$$\frac{a - A}{A} = \frac{a - B}{B} + \frac{B - A}{A} + \frac{a - B}{B} \cdot \frac{B - A}{A}$$
We can show this equivalence as follows:
\begin{eqnarray*}
\frac{a - A}{A} & = & \frac{a - B + B - A}{A} \\
& = & \frac{a - B}{A} + \frac{B - A}{A} \\
& = & \frac{a - B}{B} \cdot \frac{B}{A} + \frac{B - A}{A} \\
& = & \frac{a - B}{B} \cdot \left( 1 + \frac{B - A}{A} \right) + \frac{B - A}{A} \\
& = & \frac{a - B}{B} + \frac{B - A}{A} + \frac{a - B}{B} \cdot \frac{B - A}{A}
\end{eqnarray*}
This is particularly useful when a bound on the relative error of some term $a$ with regard to $B$ should be
extended to the next approximation level.
\end{itemize}
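As an example of the second idea, consider the hint {\tt Log2 -> Log2hm * (1 / (((Log2hm - Log2) / Log2) + 1))}
that appears in the listing below. Both sides are indeed mathematically equal since, for $Log2 \neq 0$ and
$Log2hm \neq 0$,
$$Log2hm \cdot \frac{1}{\frac{Log2hm - Log2}{Log2} + 1} = Log2hm \cdot \frac{Log2}{Log2hm} = Log2$$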
Clearly, the left-hand side $A$ and the right-hand side $B$ of a hint must be mathematically equivalent to provide a
correct result. The Gappa tool checks for this equivalence and sometimes is able to prove it. If not, it emits a
warning indicating that the formal proof it is generating for the accuracy bound computations is valid only under
the hypothesis that both sides of the rewriting hint are mathematically equivalent. Further, it prints out the
difference $A - B$ of both sides $A$ and $B$, which it has already reduced using the equivalences given in the
Gappa code. It is relatively simple to verify, by means of Maple scripts, that all these differences are equal to $0$ modulo
the definitions given in the Gappa code. This work can even be done automatically. Thus, we refrain
from giving a paper proof of each hint in the Gappa files used for proving the logarithm function but just
give the exhaustive list of the hints in the files {\tt log-td.gappa} and {\tt log-td-E0-logir0.gappa}:
\begin{lstlisting}[caption={Gappa term rewriting hints in file {\tt log-td.gappa}},firstnumber=115]
T2hl - T2 -> ((T2hl - T2) / T2) * T2;
T2hl -> (T2hl - T2) + T2;

Phl - PE -> ((Phl - PE) / PE) * PE;
Phl -> (Phl - PE) + PE;


LogTabPolyhl -> (LogTabPolyhl - LogTabPoly) + LogTabPoly;

Loghm -> (Loghm - LogE) + LogE;

Log2 -> Log2hm * (1 / (((Log2hm - Log2) / Log2) + 1));

Logir -> Logihm * (1 / (((Logihm - Logir) / Logir) + 1));


LogTabPolyhl - LogTabPoly -> ((LogTabPolyhl - LogTabPoly) / LogTabPoly) * LogTabPoly;

HZZsimp -> (-0.5 * zh * zh) - (0.5 * zl * zl);

T2hl - ZpHZZsimp -> (0.5 * zl * zl) + delta1;

zhCube - ZZZ -> (Z * (zhSquareh - Z * Z)) - (zl * zhSquareh);

polyUpper - ZZZPhigher -> ZZZ * (polyHorner - Phigher) + polyHorner * delta3 + delta2;

ZpHZZ + ZZZPhigher -> ZpHZZsimp + ZZZPhigherPzhzl;

Phl - P -> (T2hl - ZpHZZsimp) + (T1hl - ZZZPhigherPzhzl) + delta4;

Log1pZ -> P * (1 / (((P - Log1pZ) / Log1pZ) + 1));
P - Log1pZ -> ((P - Log1pZ) / Log1pZ) * Log1pZ;

Phl - Log1pZ -> (Phl - P) + delta6;

LogTabPolyhl - Log1pZpTab -> (Logihm - Logir) + (Phl - Log1pZ) + delta7;

Loghm - Log -> (Log2edhm - Log2E) + (LogTabPolyhl - Log1pZpTab) + delta5;

(logh + logm) - Loghm -> (((logh + logm) - Loghm) / Loghm) * Loghm;

(logh + logm) - Log -> ((logh + logm) - Loghm) + (Loghm - Log);
\end{lstlisting}
\begin{lstlisting}[caption={Gappa term rewriting hints in file {\tt log-td-E0-logir0.gappa}},firstnumber=81]
T2hl - T2 -> ((T2hl - T2) / T2) * T2;
T2hl -> (T2hl - T2) + T2;

Phl - PE -> ((Phl - PE) / PE) * PE;
Phl -> (Phl - PE) + PE;


(ZhSquarehl - ZZ) / ZZ -> 2 * ((zh - Z) / Z) + ((zh - Z) / Z) * ((zh - Z) / Z);

(zhSquareh - ZZ) / ZZ -> ((ZhSquarehl - ZZ) / ZZ) + ((zhSquareh - ZhSquarehl) / ZZ);

(zhSquareh - ZhSquarehl) / ZZ -> ((zhSquareh - ZhSquarehl) / ZhSquarehl) * (ZhSquarehl / ZZ);

ZhSquarehl / ZZ -> ((ZhSquarehl - ZZ) / ZZ) + 1;

(ZhCube - ZZZ) / ZZZ -> (((zh * zhSquareh) - ZZZ) / ZZZ) + ((ZhCube - (zh * zhSquareh)) / ZZZ);

((zh * zhSquareh) - ZZZ) / ZZZ -> (1 + ((zh - Z) / Z)) * (1 + ((zhSquareh - ZZ) / ZZ)) - 1;

((ZhCube - (zh * zhSquareh)) / ZZZ) -> ((ZhCube - (zh * zhSquareh)) / (zh * zhSquareh)) * (((zh - Z) / Z) + 1) * (((zhSquareh - ZZ) / ZZ) + 1);

polyHorner / Phigher -> ((polyHorner - Phigher) / Phigher) + 1;

(polyUpper - ZZZPhigher) / ZZZPhigher -> ((polyHorner - Phigher) / Phigher) + ((ZhCube - ZZZ) / ZZZ) * (polyHorner / Phigher)
					  + ((polyUpper - (polyHorner * ZhCube)) / (polyHorner * ZhCube)) * (polyHorner / Phigher)
					  + ((ZhCube - ZZZ) / ZZZ) * ((polyUpper - (polyHorner * ZhCube)) / (polyHorner * ZhCube)) *
					    (polyHorner / Phigher);


((ZhSquareHalfhl - (zh * zl)) - HZZ) / HZZ -> - ((zh - Z) / Z) * ((zh - Z) / Z);

(ZhSquareHalfhl - HZZ) / HZZ -> (ZhSquarehl - ZZ) / ZZ;

((T2hl - (zh * zl)) - ZpHZZ) / ZpHZZ -> ((HZ * (((ZhSquareHalfhl - (zh * zl)) - HZZ) / HZZ)) + ((T2hl - T2) / T2)
                                        + (HZ * ((T2hl - T2) / T2))
					+ (HZ * ((ZhSquareHalfhl - HZZ) / HZZ) * ((T2hl - T2) / T2))) / (1 + HZ);

(PE - P) / P -> (((1 + HZ) * (((T2hl - (zh * zl)) - ZpHZZ) / ZpHZZ)) +
		((1 + ((zh - Z) / Z)) * (Z * ((zh - Z) / Z)) * ((Flzhzl - (zh * zl)) / (zh * zl)))
	        + (ZZ * Phigher * ((polyUpper - ZZZPhigher) / ZZZPhigher))) / (1 + HZ + ZZ * Phigher);

(Phl - P) / P -> ((PE - P) / P) + ((((PE - P) / P) + 1) * ((Phl - PE) / PE));

(Loghm - Log) / Log -> ((Loghm - P) / P) + ((P - Log) / Log) + ((Loghm - P) / P) * ((P - Log) / Log);

(((logh + logm) - Log) / Log) -> (((logh + logm) - Loghm) / Loghm) + ((Loghm - Log) / Log) + (((logh + logm) - Loghm) / Loghm) * ((Loghm - Log) / Log);
\end{lstlisting}
For the reasons mentioned, we can consider the accuracy proof of the quick phase to be correct.
\subsection{Accuracy proof of the accurate phase\label{subsec:accuratephase}}
The accuracy proof of the accurate phase is also based mainly on the use of the Gappa tool.
Nevertheless, since the tool does not currently support triple-double representations directly, some additional
hand-proven accuracy bound results for the main addition and multiplication operators are needed. They can be
found in \cite{Lauter2005LIP:tripledouble}. Since all these accuracy bounds are parameterized by the maximal overlap bound
for the triple-double numbers along the computations, before being able to give a numerical value for
these error bounds understood by the Gappa tool, it is necessary to do a maximal overlap bound analysis using
the theorems given in \cite{Lauter2005LIP:tripledouble}.\par
Finally, since the logarithm function must return not a possibly overlapped triple-double intermediate result but a
double precision number that is the correct rounding according to the rounding mode chosen, the algorithm performs
a renormalizing operation on the final result and rounds this non-overlapped result down to a double using an
appropriate rounding sequence. All these renormalization and rounding sequences are exact and have been shown to be
correct in \cite{Lauter2005LIP:tripledouble}. In the same way, all properties shown in section \ref{subsec:reduction}
concerning the special case handling and the exactness of the argument reduction can be reused because the algorithm implemented in
the accurate phase uses the same reduced argument and is substantially the same as for the quick phase. \par
We will thus rely on all these properties and simply show that the relative error
$$\epsilon_{\mbox{\tiny accurate}} = \frac{\left( log_\hi + log_\mi + log_\lo \right) - \log\left( x \right)}{\log\left( x \right)}$$
is bounded by
$$\left \vert \epsilon_{\mbox{\tiny accurate}} \right \vert \leq 5735 \cdot 2^{-132} \leq 2^{-119.5}$$
which will be expressed in Gappa syntax as follows:
\begin{lstlisting}[caption={Accuracy bound to prove for the accurate phase},firstnumber=165]
->
((logh + logm + logl) - MLog) / MLog in [-5735b-132,5735b-132]
\end{lstlisting}
The Gappa proof files still make the hypothesis that two of the multiplications in the accurate phase code can be
considered to be
exact. This property must therefore be shown by a paper proof in the following.

The first of these multiplications is the following sequence:
\begin{lstlisting}[caption={Multiplication of triple-double $\circ\left( Z \cdot Z \right)$ by $-\frac{1}{2}$},firstnumber=99]
  zSquareHalfh = zSquareh * -0.5;
  zSquareHalfm = zSquarem * -0.5;
  zSquareHalfl = zSquarel * -0.5;
\end{lstlisting}
As will be shown below, the relative error $\epsilon_{ZSquare}$ defined as
$$\epsilon_{ZSquare} = \frac{\left( zSquare_\hi + zSquare_\mi + zSquare_\lo \right) - Z^2}{Z^2}$$
is bounded by $\left \vert \epsilon_{ZSquare} \right \vert \leq 2^{-149}$.
Using the same argument as the one given in section \ref{subsec:quickphase}, one can show that $Z$ is either $0$
or at least $2^{-77}$ in magnitude. So the following is true:
$$Z^2 = 0 \lor \left \vert Z^2 \right \vert \geq 2^{-154}$$
If $Z^2=0$, $ZSquarehml = zSquare_\hi + zSquare_\mi + zSquare_\lo$ trivially is $0$, too, and the multiplication
with $-\frac{1}{2}$ is therefore exact.
Since we can write $ZSquarehml = Z^2 \cdot \left( 1 + \epsilon_{ZSquare} \right)$, we know that in the other case
$$\left \vert ZSquarehml \right \vert \geq 2^{-155}$$
We can suppose that in the triple-double number $zSquare_\hi + zSquare_\mi + zSquare_\lo$, $zSquare_\mi$ and
$zSquare_\lo$ are not overlapped at all (since $zSquare_\mi = \circ \left( zSquare_\mi + zSquare_\lo \right)$)
and that $zSquare_\hi$ and $zSquare_\mi$ are not fully overlapped.
So we can write $\left \vert zSquare_\mi \right \vert \leq 2^{-\beta_o} \cdot \left \vert zSquare_\hi \right \vert$ and
$\left \vert zSquare_\lo \right \vert \leq 2^{-\beta_u} \cdot \left \vert zSquare_\mi \right \vert$ with $\beta_o \geq 1$ and
$\beta_u \geq 53$.
We will show this property below; here we just assume it.
So we can verify the following:
\begin{eqnarray*}
\left \vert ZSquarehml \right \vert & = & \left \vert zSquare_\hi + zSquare_\mi + zSquare_\lo \right \vert \\
& \leq & \left \vert zSquare_\hi \right \vert + \left \vert zSquare_\mi \right \vert + \left \vert zSquare_\lo \right \vert \\
& \leq & \left \vert zSquare_\hi \right \vert +
2^{-\beta_o} \cdot \left \vert zSquare_\hi \right \vert +
2^{-\beta_o} \cdot 2^{-\beta_u} \cdot \left \vert zSquare_\hi \right \vert \\
& \leq & 2 \cdot \left \vert zSquare_\hi \right \vert
\end{eqnarray*}
In consequence, we obtain
$$\left \vert zSquare_\hi \right \vert \geq \frac{1}{2} \cdot \left \vert ZSquarehml \right \vert$$
and thus
$$\left \vert zSquare_\hi \right \vert \geq 2^{-156}$$ under the hypothesis that it is not exactly zero.
So $zSquareHalf_\hi = -\frac{1}{2} \cdot zSquare_\hi$ will never be underflowed.

Let us now show first that the operations for computing $zSquareHalf_\mi$ and $zSquareHalf_\lo$ cannot both be
inexact. We will use the fact that $\left \vert zSquare_\lo \right \vert \leq 2^{-53} \cdot \left \vert zSquare_\mi \right \vert$.
Suppose first that
$$zSquareHalf_\mi \gets - \frac{1}{2} \otimes zSquare_\mi$$ is inexact. This can only happen if the result falls into
the subnormal range, so $\left \vert zSquare_\mi \right \vert < 2^{-1021}$ and
in consequence $\left \vert zSquare_\lo \right \vert < 2^{-1021-53} = 2^{-1074}$. Note that the inequality is strict.
Since the least (in magnitude) representable denormalized non-zero double precision floating point number is $2^{-52} \cdot 2^{-1022} = 2^{-1074}$,
$zSquare_\lo = 0$ in this case. So $$zSquareHalf_\lo \gets - \frac{1}{2} \otimes zSquare_\lo$$ is exact because, trivially, a
multiplication by $0$ is exact.

Suppose now that $$zSquareHalf_\lo \gets - \frac{1}{2} \otimes zSquare_\lo$$ is inexact.
So $\left \vert zSquare_\lo \right \vert < 2^{-1021}$. Further, the bit of weight $2^{-1074}$ in $zSquare_\lo$ must be
$1$ because otherwise a shift of its mantissa by one bit position would be an exact operation.
Thus $\left \vert zSquare_\lo \right \vert \geq 2^{-52} \cdot 2^{-1022} = 2^{-1074}$ and $\left \vert zSquare_\mi \right \vert \geq 2^{53} \cdot \left \vert zSquare_\lo \right \vert \geq 2^{-1021}$. So
$$zSquareHalf_\mi \gets - \frac{1}{2} \otimes zSquare_\mi$$ cannot be inexact because, as seen above, this would require
$\left \vert zSquare_\mi \right \vert < 2^{-1021}$.

So, in any case, if ever $zSquareHalf_\mi + zSquareHalf_\lo$ is not exactly
$-\frac{1}{2} \cdot \left( zSquare_\mi + zSquare_\lo \right)$, the error made will be at most $\frac{1}{2} \cdot d$
in magnitude, where $d = 2^{-1074}$ is the smallest representable denormalized non-zero double. So we can write in this case
$$zSquareHalf_\hi + zSquareHalf_\mi + zSquareHalf_\lo = - \frac{1}{2} \cdot \left( zSquare_\hi + zSquare_\mi + zSquare_\lo \right) + \delta$$
with $\left \vert \delta \right \vert \leq 2^{-1075}$. Since we know that
$\left \vert - \frac{1}{2} \cdot \left( zSquare_\hi + zSquare_\mi + zSquare_\lo \right) \right \vert \geq 2^{-156}$, we can give the
following bound:
$$\left \vert \frac{\delta}{-\frac{1}{2} \cdot \left( zSquare_\hi + zSquare_\mi + zSquare_\lo \right)} \right \vert \leq
\frac{2^{-1075}}{2^{-156}} = 2^{-919}$$
So we get
$$ZSquareHalfhml = - \frac{1}{2} \cdot ZSquarehml \cdot \left(1 + \epsilon\right)$$
with
$\left \vert \epsilon \right \vert \leq 2^{-919}$.

On the other hand, since we know that $\left \vert Z \right \vert \leq 2^{-8}$ and
thus that $\left \vert Z^2 \right \vert \leq 2^{-16}$, but that $\left \vert Z^2 \right \vert \geq 2^{-154}$, we can
assume that the infinite precision mantissa of $Z^2$ can always be written exactly with at most $154 - 16 = 138 < 149$
bits. As we can show that
$\frac{1}{2} \cdot \left \vert ZSquarehml \right \vert \leq \left \vert zSquare_\hi \right \vert \leq
2 \cdot \left \vert ZSquarehml \right \vert$, we know that if ever one of $zSquare_\mi$ or $zSquare_\lo$ is such that
the multiplication by $-\frac{1}{2}$ is not exact, the error made has already been accounted for in the error bound
for $ZSquarehml$ with regard to $Z^2$.
So the operation computing $ZSquareHalfhml$ out of $ZSquarehml$ can be considered to be exact. \par
Let us now analyse the following sequence:
\begin{lstlisting}[caption={Multiplication of triple-double $Log2hml$ by $E$},firstnumber=126]
  log2edhover = log2h * ed;
  log2edmover = log2m * ed;
  log2edlover = log2l * ed;
\end{lstlisting}
Similarly to the argumentation given in section \ref{subsec:quickphase}, since $E=ed$ is bounded
in magnitude by $1024=2^{10}$ and since $log2_\hi$, $log2_\mi$ are stored with at least $11$ trailing bits at zero,
the multiplications on these components are exact. The constant $log2_\lo$ is not stored with $11$ trailing bits at zero,
but this is not needed because we only suppose the bound $\left \vert \epsilon_{Log2hml} \right \vert \leq 2^{-3 \cdot 53 + 33} = 2^{-126}$ for
$$\epsilon_{Log2hml} = \frac{log2_\hi + log2_\mi + log2_\lo - \log\left( 2 \right)}{\log\left( 2 \right)}$$
So the multiplication on this component is not exact in itself, but the final result is more accurate than the bound we are using for it.

Let us finally just recall the Maple code for computing the constants:
\begin{lstlisting}[caption={Maple code for computing $Log2hml$},firstnumber=21]
log2acc := log(2):
log2h := round(log2acc * 2**(floor(-log[2](abs(log2acc))) + (53 - 11))) / 2**(floor(-log[2](abs(log2acc))) + (53 - 11)):
log2m := round((log2acc - log2h) * 2**(floor(-log[2](abs((log2acc - log2h)))) + (53 - 11))) / 2**(floor(-log[2](abs((log2acc - log2h)))) + (53 - 11)):
log2l := log2acc - (log2h + log2m):
\end{lstlisting}
So the multiplication can be considered to be exact as long as the weaker bound for $\epsilon_{Log2hml}$ is used.

Let us now analyse the bounds that we can give for the maximal overlap of the components of the triple-double numbers
in the logarithm implementation. For doing this, we will assign each triple-double number in the code an overlap bound as
follows. Call the number in consideration e.g. $a_\hi + a_\mi + a_\lo$. We will then give bounds expressed like this:
\begin{eqnarray*}
\left \vert a_\mi \right \vert & \leq & 2^{-\alpha_o} \cdot \left \vert a_\hi \right \vert \\
\left \vert a_\lo \right \vert & \leq & 2^{-\alpha_u} \cdot \left \vert a_\mi \right \vert
\end{eqnarray*}
where $\alpha_o, \alpha_u \geq 2$.
We will then propagate this information following the flow of control in the implementation and using the overlap
bound theorems given in \cite{Lauter2005LIP:tripledouble}. Here, by ``propagating'' we understand checking a system of constraints
on the bounds under the limitations provided by the theorems. As the control flow graph of our implementation
is completely linear, this check is linear, too. The theorems mentioned can be summarized as follows:
\begin{center}
\begin{tabular}{|l|l|l|l|l|}
\hline
Operation & 1st arg. & 2nd arg. & result high & result low \\
\hline
\AddTT & $\alpha_o \geq 4$, $\alpha_u \geq 1$ & $\beta_o \geq 4$, $\beta_u \geq 1$ & $\gamma_o \geq \min\left( \alpha_o, \beta_o \right) - 5$ & $\gamma_u \geq 53$ \\
\hline
\AddDTT & - & $\beta_o \geq 2$, $\beta_u \geq 1$ & $\gamma_o \geq \min\left( 45, \beta_o - 4, \beta_o + \beta_u - 2 \right)$ & $\gamma_u \geq 53$ \\
\hline
\MulDT & - & - & $\gamma_o \geq 48$ & $\gamma_u \geq 53$ \\
\hline
\MulDTT & - & $\beta_o \geq 2$, $\beta_u \geq 1$ & $\gamma_o \geq \min\left( 48, \beta_o - 4, \beta_o + \beta_u - 4 \right)$ & $\gamma_u \geq 53$ \\
\hline
\end{tabular}
\end{center}
So let us analyse the following code:
\begin{lstlisting}[caption={Triple-double computations},firstnumber=90,label={list:tripledouble}]
Mul23(&zSquareh, &zSquarem, &zSquarel, zh, zl, zh, zl);
Mul233(&zCubeh, &zCubem, &zCubel, zh, zl, zSquareh, zSquarem, zSquarel);
Mul233(&higherPolyMultZh, &higherPolyMultZm, &higherPolyMultZl, t14h, t14l, zCubeh, zCubem, zCubel);
zSquareHalfh = zSquareh * -0.5;
zSquareHalfm = zSquarem * -0.5;
zSquareHalfl = zSquarel * -0.5;
Add33(&polyWithSquareh, &polyWithSquarem, &polyWithSquarel,
      zSquareHalfh, zSquareHalfm, zSquareHalfl,
      higherPolyMultZh, higherPolyMultZm, higherPolyMultZl);
Add233(&polyh, &polym, &polyl, zh, zl, polyWithSquareh, polyWithSquarem, polyWithSquarel);
Add33(&logyh, &logym, &logyl, logih, logim, logil, polyh, polym, polyl);
log2edhover = log2h * ed;
log2edmover = log2m * ed;
log2edlover = log2l * ed;
log2edh = log2edhover;
log2edm = log2edmover;
log2edl = log2edlover;
Add33(&loghover, &logmover, &loglover, log2edh, log2edm, log2edl, logyh, logym, logyl);
\end{lstlisting}
This code will finally generate triple-double numbers respecting the following overlap bounds as will be
shown below:
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Variable & Line(s) & $\alpha_o \geq$ & $\alpha_u \geq$ \\
\hline
$ZSquarehml$ & 90 & $48$ & $53$ \\
\hline
$ZCubehml$ & 91 & $44$ & $53$ \\
\hline
$HigherPolyMultZhml$ & 92 & $40$ & $53$ \\
\hline
$ZSquareHalfhml$ & 93-95 & $48$ & $53$ \\
\hline
$PolyWithSquarehml$ & 96-98 & $35$ & $53$ \\
\hline
$Polyhml$ & 99 & $31$ & $53$ \\
\hline
$Logyhml$ & 100 & $26$ & $53$ \\
\hline
$Log2edhml$ & 101-106 & $40$ & $40$ \\
\hline
$Logoverhml$ & 107 & $21$ & $53$ \\
\hline
\end{tabular}
\end{center}
Let us now verify some of these bounds by way of example:
\begin{itemize}
\item At line 90, $ZSquarehml$ is computed out of the double-double number $z_\hi + z_\lo$ by use of the \MulDT~ sequence.
Since the inputs of this function are not triple-double, the overlap bound is just the bound provided by the sequence
itself, i.e. $\alpha_o \geq 48$, $\alpha_u \geq 53$.
\item $ZCubehml$ is the result of a \MulDTT~ sequence at line 91. Its overlap bound depends therefore on the one
for $ZSquarehml$, which is the second argument of the function. Since we know the bound for this variable, we easily
verify the one for $ZCubehml$, which is $\alpha_o \geq 44$ and $\alpha_u \geq 53$.
\item $Log2edhml$ is the exact pairwise product of the triple-double constant $Log2hml$ and the double $E$. Since $E$ may be
as small as $0$ in magnitude and, further, since the multiplication is pairwise, the overlap bound available for
$Log2edhml$ is the same as for $Log2hml$, which is stored with at least $11$ trailing bits at zero.
So an appropriate bound is $\alpha_o \geq 52 - 11 \geq 40$ and $\alpha_u \geq 40$.
\end{itemize}
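As one further worked instance, the bound for $PolyWithSquarehml$, computed at lines 96-98 by an \AddTT~ sequence out of
$ZSquareHalfhml$ ($\alpha_o \geq 48$, $\alpha_u \geq 53$) and $HigherPolyMultZhml$ ($\beta_o \geq 40$, $\beta_u \geq 53$),
follows directly from the corresponding line of the table of theorems above:
$$\gamma_o \geq \min\left( 48, 40 \right) - 5 = 35, \quad \gamma_u \geq 53$$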
All other bounds can be verified the same way using the theorems given in \cite{Lauter2005LIP:tripledouble} and indicated above. \par
Since we have computed the overlap bounds for the different triple-double operands in the code, we can now
calculate the accuracy bounds for the operations. Doing this is only possible with the knowledge of the
overlap of the operands because all accuracy bound theorems given in \cite{Lauter2005LIP:tripledouble} are parameterized with these
overlap expressions.

Let us first give a list of the accuracies of the different basic operations; the list is not exhaustive in the sense
that it omits most of the preconditions on the sequences required for the theorems to hold. We refrain from explicitly
verifying each of these preconditions in this document as this is only tedious work and not of special interest.
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Operation & Overlap 1st arg. & Overlap 2nd arg. & Relative error $\epsilon$ \\
\hline
\AddDD & - & - & $\left \vert \epsilon \right \vert \leq 2^{-103.5} \leq 2^{-103}$ \\
\hline
\MulDD & - & - & $\left \vert \epsilon \right \vert \leq 2^{-102}$\\
\hline
\AddTT & $\alpha_o \geq 4$, $\alpha_u \geq 1$ & $\beta_o \geq 4$, $\beta_u \geq 1$ &
$\left \vert \epsilon \right \vert \leq 2^{-\min\left( \alpha_o + \alpha_u, \beta_o + \beta_u \right) -47} + 2^{-\min\left( \alpha_o, \beta_o \right) - 98}$
\\
\hline
\AddDTT & - & $\beta_o \geq 2$, $\beta_u \geq 1$ &
$\left \vert \epsilon \right \vert \leq 2^{-\beta_o - \beta_u - 52} + 2^{-\beta_o-104} + 2^{-153}$
\\
\hline
\MulDT & - & - &
$\left \vert \epsilon \right \vert \leq 2^{-149}$
\\
\hline
\MulDTT & - & $\beta_o \geq 2$, $\beta_u \geq 1$ &
$\left \vert \epsilon \right \vert \leq 2^{-97-\beta_o} + 2^{-97-\beta_o-\beta_u} + 2^{-150}$
\\
\hline
\end{tabular}
\end{center}
847Still analyzing the following double-double computations code and the code
848given at listing \ref{list:tripledouble}, one can now easily check the bounds for
849the relative error of the different operations listed in the table below.
850We define here the relative error of an operation $\ast$ and its arithmetical equivalent $\circledast$ as follows:
851$$\epsilon = \frac{\left(a \circledast b \right) - \left(a \ast b\right)}{\left(a \ast b \right)}$$
852\begin{lstlisting}[caption={Double-double computations in accurate phase},firstnumber=73,label={list:doubledouble}]
853  Mul12(&t1h, &t1l, zh, highPoly);
854  Add22(&t2h, &t2l, accPolyC9h, accPolyC9l, t1h, t1l);
855  Mul22(&t3h, &t3l, zh, zl, t2h, t2l);
856  Add22(&t4h, &t4l, accPolyC8h, accPolyC8l, t3h, t3l);
857  Mul22(&t5h, &t5l, zh, zl, t4h, t4l);
858  Add22(&t6h, &t6l, accPolyC7h, accPolyC7l, t5h, t5l);
859  Mul22(&t7h, &t7l, zh, zl, t6h, t6l);
860  Add22(&t8h, &t8l, accPolyC6h, accPolyC6l, t7h, t7l);
861  Mul22(&t9h, &t9l, zh, zl, t8h, t8l);
862  Add22(&t10h, &t10l, accPolyC5h, accPolyC5l, t9h, t9l);
863  Mul22(&t11h, &t11l, zh, zl, t10h, t10l);
864  Add22(&t12h, &t12l, accPolyC4h, accPolyC4l, t11h, t11l);
865  Mul22(&t13h, &t13l, zh, zl, t12h, t12l);
866  Add22(&t14h, &t14l, accPolyC3h, accPolyC3l, t13h, t13l);
867\end{lstlisting}
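As a reminder of what the Mul12 and Mul22 sequences used in this listing look like, the following is a textbook double-double
product. This is a sketch for illustration only and not crlibm's exact code, but its relative error is of the same order of
magnitude, $2^{-102}$, as the bound used for \MulDD~ above.
\begin{lstlisting}
/* Sketch (not crlibm's exact code): exact product of two doubles by
   Dekker/Veltkamp splitting, so that rh + rl == a * b exactly. */
static void mul12_sketch(double *rh, double *rl, double a, double b) {
  const double split = 134217729.0;              /* 2^27 + 1 */
  double ca = split * a, ah = ca - (ca - a), al = a - ah;
  double cb = split * b, bh = cb - (cb - b), bl = b - bh;
  *rh = a * b;
  *rl = (((ah * bh - *rh) + ah * bl) + al * bh) + al * bl;
}

/* Sketch: double-double product with a relative error of the order of
   2^-102; the cross terms are accumulated in double precision and the
   result is renormalized by a Fast2Sum step. */
static void mul22_sketch(double *rh, double *rl,
                         double ah, double al, double bh, double bl) {
  double ph, pl;
  mul12_sketch(&ph, &pl, ah, bh);   /* exact product of the high parts  */
  pl += ah * bl + al * bh;          /* accumulate the cross terms       */
  *rh = ph + pl;                    /* renormalize into a double-double */
  *rl = pl - (*rh - ph);
}
\end{lstlisting}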
868\begin{center}
869\begin{tabular}{|l|l|l|l|}
870\hline
871Result & Line(s) & Operation & Relative error $\epsilon$ \\
872\hline
873$T1hl$ through $T14hl$ & 73 - 86 & \AddDD~ / \MulDD &
874$\left \vert \epsilon \right \vert \leq 2^{-103}$ / $\left \vert \epsilon \right \vert \leq 2^{-102}$ \\
875\hline
876$ZSquarehml$ & 90 & \MulDT & $\left \vert \epsilon \right \vert \leq 2^{-149}$ \\
877\hline
878$ZCubehml$ & 91 & \MulDTT & $\left \vert \epsilon \right \vert \leq 2^{-144}$ \\
879\hline
880$HigherPolyMultZhml$ & 92 & \MulDTT & $\left \vert \epsilon \right \vert \leq 2^{-141}$ \\
881\hline
882$PolyWithSquarehml$ & 96-98 & \AddTT & $\left \vert \epsilon \right \vert \leq 2^{-137}$ \\
883\hline
884$Polyhml$ & 99 & \AddDTT & $\left \vert \epsilon \right \vert \leq 2^{-134}$ \\
885\hline
886$Logyhml$ & 100 & \AddTT & $\left \vert \epsilon \right \vert \leq 2^{-128}$ \\
887\hline
888$Logoverhml$ & 107 & \AddTT & $\left \vert \epsilon \right \vert \leq 2^{-123}$ \\
889\hline
890\end{tabular}
891\end{center}
Let us explicitly check the bound for one of the operations by way of example. We take
the \AddTT~ sequence at lines 96-98, which computes $PolyWithSquarehml$ out of $ZSquareHalfhml$ and
$HigherPolyMultZhml$. We have already obtained the following overlap bounds:
895\begin{eqnarray*}
896\left \vert zSquareHalf_\mi \right \vert & \leq & 2^{-48} \cdot \left \vert zSquareHalf_\hi \right \vert \\
897\left \vert zSquareHalf_\lo \right \vert & \leq & 2^{-53} \cdot \left \vert zSquareHalf_\mi \right \vert \\
898\left \vert higherPolyMultZ_\mi \right \vert & \leq & 2^{-40} \cdot \left \vert higherPolyMultZ_\hi \right \vert \\
899\left \vert higherPolyMultZ_\lo \right \vert & \leq & 2^{-53} \cdot \left \vert higherPolyMultZ_\mi \right \vert
900\end{eqnarray*}
Feeding these bounds into the theorem on the accuracy of \AddTT, we get
902$$\left \vert \epsilon \right \vert \leq 2^{-\min\left( 48 + 53, 40 + 53 \right) - 47} + 2^{-\min \left( 48, 40 \right) - 98} \leq 2^{-140} + 2^{-138} \leq 2^{-137}$$
903All other error bounds can be verified in a similar way. They are finally expressed in Gappa syntax as follows:
904\begin{lstlisting}[caption={Relative error bounds in Gappa code},firstnumber=139]
905(T2hl - T2) / T2 in [-1b-103,1b-103]
906/\ (T3hl - T3) / T3 in [-1b-102,1b-102]
907/\ (T4hl - T4) / T4 in [-1b-103,1b-103]
908/\ (T5hl - T5) / T5 in [-1b-102,1b-102]
909/\ (T6hl - T6) / T6 in [-1b-103,1b-103]
910/\ (T7hl - T7) / T7 in [-1b-102,1b-102]
911/\ (T8hl - T8) / T8 in [-1b-103,1b-103]
912/\ (T9hl - T9) / T9 in [-1b-102,1b-102]
913/\ (T10hl - T10) / T10 in [-1b-103,1b-103]
914/\ (T11hl - T11) / T11 in [-1b-102,1b-102]
915/\ (T12hl - T12) / T12 in [-1b-103,1b-103]
916/\ (T13hl - T13) / T13 in [-1b-102,1b-102]
917/\ (T14hl - T14) / T14 in [-1b-103,1b-103]
918/\ (ZSquarehml - ZSquare) / ZSquare in [-1b-149,1b-149]
919/\ (ZCubehml - ZCube) / ZCube in [-1b-144,1b-144]
920/\ (HigherPolyMultZhml - HigherPolyMultZ) / HigherPolyMultZ in [-1b-141,1b-141]
921/\ (PolyWithSquarehml - PolyWithSquare) / PolyWithSquare in [-1b-137,1b-137]
922/\ (Polyhml - Poly) / Poly in [-1b-134,1b-134]
923/\ (Logyhml - Logy) / Logy in [-1b-128,1b-128]
924/\ (Loghml - Logover) / Logover in [-1b-123,1b-123]
925/\ (Log2hml - MLog2) / MLog2 in [-1b-126,1b-126]
926/\ (Logihml - MLogi) / MLogi in [-1b-159,1b-159]
927/\ (MPoly - MLog1pZ) / MLog1pZ in [-_epsilonApproxAccurate,_epsilonApproxAccurate]
928/\ Z in [_zmin,_zmax]
929/\ ((logh + logm + logl) - Loghml) / Loghml in [-1b-159,1b-159]
930\end{lstlisting}
Concerning the Gappa proofs for the accurate phase, as for the quick phase, three different proof
files are used. They reflect once again the three main cases for the argument of the logarithm function:
933\begin{itemize}
\item For the cases where, after argument reduction, $\left \vert E \right \vert \geq 1$,
the file {\tt log-td-accurate.gappa} is used. Here, absolute error computations are sufficient
for deducing the final relative error bound, because
$\left \vert \log\left( x \right) \right \vert \geq \frac{1}{2} \cdot \log\left( 2 \right)$ in this case (see the short computation after this list).
\item For the case where, after argument reduction, $E = 0$ and $i \not = 0$, the file
{\tt log-td-accurate-E0.gappa} is used. Here too, the result is dominated by a constant term, so absolute
error computations suffice.
\item For the remaining case, where $E=0$ and $i=0$, the file {\tt log-td-accurate-E0-logir0.gappa} provides the
accuracy bound proof. In contrast to the other cases, we are obliged to use relative error estimates from the beginning,
since the function $\log\left( x \right)$ has a zero in this interval.
944\end{itemize}
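The reduction from absolute to relative error mentioned in the first two items is straightforward. For the first case,
if the absolute error of the computed result with respect to $\log\left( x \right)$ is bounded by $\delta$, then
$$\left \vert \epsilon \right \vert \leq \frac{\delta}{\left \vert \log\left( x \right) \right \vert} \leq \frac{2 \cdot \delta}{\log\left( 2 \right)} \leq 4 \cdot \delta$$
so an absolute error bound carries over to a relative one at the cost of at most $2$ bits. The same argument applies to the
second case, with the preponderant constant term providing the lower bound on $\left \vert \log\left( x \right) \right \vert$.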
Once again, several term rewriting hints are needed in the Gappa proof files to enable the Gappa tool to
generate a proof of the accuracy bounds. As before, the hints whose mathematical correctness cannot be
checked directly by the tool itself are verified by semi-automatic Maple scripts.\par
Given the accuracy proof for a final relative error of $\left \vert \epsilon_{\mbox{\tiny accurate}} \right \vert \leq 2^{-119.5}$, and
since the critical accuracy of the double precision natural logarithm function is
$118$ bits~\cite{DinDefLau2004LIP}, we can consider the implementation to be correctly rounding under the hypothesis
that the final rounding sequence used is exact and correct. Since this is the case -- a correctness proof of the rounding
sequence can be found in \cite{Lauter2005LIP:tripledouble} -- the correct rounding property is verified.
953
954
955\section{Proof of correctness of the double-extended implementation \label{sec:logdeproof}}
956
957
958\section{Performance results\label{sec:logperf}}
The given implementation of the natural logarithm function aims at
being both portable and faster than the previous
implementations, which used the SCS library for the accurate phase. This
goal is achieved both in terms of memory consumption (if the code size of
{\tt scslib} is taken into account) and in terms of speed. The main reason
is that all values computed in the argument reduction phase, as well as the
tables, can be reused directly by the accurate phase.
967
968\subsection{Memory requirements}
969
970
971
972\subsection{Timings}
973
974
975%%% Local Variables:
976%%% mode: latex
977%%% TeX-master: "crlibm"
978%%% End:
979