There are two versions of the logarithm.
\begin{itemize}
\item The first relies on 80-bit double-extended arithmetic, and is
  well suited to IA32 and IA64 architectures which have hardware
  support for such arithmetic. It computes the quick step in
  double-extended arithmetic, and the accurate step in
  double-double-extended arithmetic.
\item The second relies only on double-precision arithmetic, and is
  portable. It uses double-double for the quick step, and
  triple-double for the accurate step.
\end{itemize}

Both implementations use the same algorithm, which is detailed in
\ref{sec:logoutline}. Sections \ref{sec:logdeproof} and
\ref{sec:logtdproof} detail the proof of both implementations, and
\ref{sec:logperf} gives some performance results.


\section{General outline of the algorithm\label{sec:logoutline}}
The algorithm used is mainly due to Wong and Goto~\cite{WG94} and has been discussed further in \cite{Muller97}. In the implementation given here,
both the quick and the accurate phase use essentially the same algorithm, optimized for different accuracies.

The function's argument $x \in \F$ is first checked for special cases, such as $x\leq0$, $+\infty$, $\nan$ etc.
These checks are mainly implemented using integer arithmetic and will be further explained in section
\ref{subsec:reduction}. Then, the argument is reduced using integer arithmetic as follows:
$$x = 2^{E^\prime} \cdot m$$
where $E^\prime$ is the exponent of $x$ and $m$ a double corresponding
to the mantissa of $x$. This decomposition is done such that in any
case, i.e. even if $x$ is subnormal, $1 \leq m < 2$. In the subnormal
case, the exponent of $x$ is adjusted accordingly. This first
argument reduction corresponds to the equality
$$\log\left( x \right) = E^\prime \cdot \log\left(2\right) + \log\left(m \right)$$
Using this term directly would lead to catastrophic cancellation in the case where $E^\prime = -1$ and
$m \approx 2$. To overcome this difficulty, a second adjustment is made as follows:
\begin{center}
  \begin{tabular}{cc}
  \begin{minipage}{50mm}
  $$E = \left \lbrace \begin{array}{lcl} E^\prime & \mbox{ if } & m \leq \sqrt{2} \\
  E^\prime +1 & \mbox{ if } & m > \sqrt{2} \end{array} \right.$$
  \end{minipage}
  &
  \begin{minipage}{50mm}
  $$y = \left \lbrace \begin{array}{lcl} m & \mbox{ if } & m \leq \sqrt{2} \\
  \frac{m}{2} & \mbox{ if } & m > \sqrt{2} \end{array} \right.$$
  \end{minipage}
  \end{tabular}
\end{center}

The decision whether $m \leq \sqrt{2}$ or not is made using integer arithmetic on
the high order bits of the mantissa $m$. The test is therefore not completely exact, which is no
disadvantage since, in any case, the bound $\sqrt{2}$ is somewhat arbitrary.\par
All the previous reduction steps can be implemented exactly, as they consist mainly in decompositions
of a floating point number, multiplications by powers of $2$ and integer additions on the corresponding exponent value.
All this leads to the following equation
$$\log\left( x \right) = E \cdot \log\left( 2 \right) + \log\left( y \right)$$
where
$$-\frac{1}{2} \cdot \log\left( 2 \right) \leq \log\left( y \right) \leq \frac{1}{2} \cdot \log\left( 2 \right)$$
The magnitude of $y$ is thus still too great to allow for a direct polynomial approximation of $\log\left(y\right)$.
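For instance, for $x = 3 = 2^1 \cdot 1.5$ we have $E^\prime = 1$ and $m = 1.5$; since $1.5 > \sqrt{2} \approx 1.414$, the adjustment yields $E = 2$ and $y = 0.75$, so that
$$\log\left( 3 \right) = 2 \cdot \log\left( 2 \right) + \log\left( 0.75 \right)$$
with $\log\left( 0.75 \right) \approx -0.288$, which lies inside $\left[ -\frac{1}{2} \cdot \log\left( 2 \right); \frac{1}{2} \cdot \log\left( 2 \right) \right] \approx \left[ -0.347; 0.347 \right]$ but is still too large in magnitude to be handled by a single low-degree polynomial at the required accuracy.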
Therefore, a second argument reduction step is performed using a table of $128$ entries as follows:
using the high order bits of $y$ as an index $i$, a tabulated value $r_i$ is looked up which approximates
$\frac{1}{y}$ very well. Setting $z = y \cdot r_i - 1$, one obtains
$$\log\left( y \right) = \log\left( 1 + z \right) - \log\left( r_i \right)$$
Since $y = \frac{1}{r_i} + \delta$, the magnitude of $z$ is finally
small enough (typically $\left \vert z \right \vert < 2^{-8}$) for
approximating $\log\left(1+z\right)$ by a Remez polynomial
$p\left(z\right)$. The values for $\log\left(r_i\right)$ are of course also tabulated.

It is important to notice that the reduction step
$$z = y \cdot r_i - 1$$
can be implemented exactly, which eases the correctness proof of the algorithm. This property will be proven in
section \ref{subsec:reduction}. The reduced argument $z$ will
be represented as a double-double number $z_\hi + z_\lo$ that will be fed into the polynomial approximation
algorithms of both the quick and the accurate phase. Each of these phases takes the less significant value
$z_\lo$ into account up to a different monomial degree.

Both phases finally reconstruct the function's value as follows:
$$\log\left( x \right) \approx E \cdot \log\left( 2 \right) + p\left( z \right) - \log\left( r_i \right)$$
using a double-double (respectively a triple-double for the accurate phase) value for each of
$\log\left( 2 \right)$ and $-\log\left( r_i \right)$. The computations necessary for performing this reconstruction
are carried out in double-double arithmetic for the quick phase and triple-double arithmetic for the accurate phase.

The quick phase uses a modified Remez polynomial of degree $7$ of the form
$$p\left( z \right) = z - \frac{1}{2} \cdot z^2 + z^3 \cdot
\left( c_3 + z \cdot \left( c_4 + z \cdot \left( c_5 + z \cdot \left( c_6 + z \cdot c_7 \right) \right) \right) \right)$$
with $c_i \in \F$.
This polynomial is evaluated as indicated by the parentheses in the following term:
$$p\left( z_\hi + z_\lo \right) \approx \left( \left(z_\hi + z_\lo \right) - \frac{1}{2} \cdot z_\hi^2\right) +
\left( \left( - z_\hi \cdot z_\lo \right) +
\left(z_\hi^2 \cdot z_\hi \right) \cdot
\left( c_3 + z_\hi \cdot \left( c_4 + z_\hi \cdot \left( c_5 + z_\hi \cdot \left( c_6 + z_\hi \cdot c_7 \right) \right) \right) \right) \right)$$
The mathematical relative approximation error of the polynomial $p\left( z \right)$, defined as
$$\epsilon_{\mbox{\tiny meth}} = \frac{p\left( z \right) - \log\left( 1 + z \right)}{\log\left(1 + z \right)}$$
is bounded by
$$\left \vert \epsilon_{\mbox{\tiny meth}} \right \vert \leq 2^{-62.99}$$
This methodical error is joined by the arithmetical error induced by the evaluation of $p\left( z \right)$
and by the rounding of the constants $\log\left( 2 \right)$ and $\log\left( r_i \right)$.
As will be shown in section \ref{subsec:quickphase}, the overall error of the quick phase, defined as
$$\epsilon_{\mbox{\tiny quick}} = \frac{\left(log_\hi + log_\lo\right) - \log\left(x\right)}{\log\left(x\right)}$$
is bounded by
$$\left \vert \epsilon_{\mbox{\tiny quick}} \right \vert \leq 5 \cdot 2^{-65} \leq 2^{-62.6}$$ ~ \par
After the computation of the quick phase double-double value $\left( log_\hi + log_\lo \right)$, a rounding test is performed
using the rounding constants according to \ref{th:roundingRN1}. If the rounding cannot be decided, the accurate
phase is launched.
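To give an idea of what such a rounding test typically looks like, the following C sketch shows a two-phase test in the style referenced above. It is only an illustration: the function names, the call standing in for the accurate phase and the numerical value of the rounding constant are placeholders introduced here, not identifiers or values taken from the actual code.
\begin{lstlisting}[caption={Illustrative sketch of a two-phase rounding test (not the actual crlibm code)}]
#include <stdio.h>

/* Placeholder standing in for the accurate phase followed by the
   final rounding of its triple-double result. */
static double accurate_phase_placeholder(void) {
  return 0.0;
}

/* logh + logm is the double-double result of the quick phase;
   roundcst is a rounding constant as in the theorem referenced
   above (its value is not reproduced here). */
static double round_or_launch_accurate(double logh, double logm,
                                       double roundcst) {
  /* If adding logm, inflated by roundcst, cannot change the
     rounding of logh, then logh is already the correctly rounded
     result; otherwise the accurate phase must decide. */
  if (logh == (logh + (logm * roundcst)))
    return logh;
  return accurate_phase_placeholder();
}

int main(void) {
  /* purely illustrative values */
  printf("%g\n",
         round_or_launch_accurate(0.6931471805599453, 2.3e-17, 1.005));
  return 0;
}
\end{lstlisting}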
\par
The accurate phase performs all its computations on the same reduced argument $z = z_\hi + z_\lo$, which will be shown to be
exact. An approximation polynomial of degree $14$ is used. It is once again a modified Remez polynomial and has the
following form:
$$p\left( z \right) = z - \frac{1}{2} \cdot z^2 + z^3 \cdot q\left( z \right)$$
where
$$q\left( z \right) =
c^\prime_3 + z \cdot \left(
c^\prime_4 + z \cdot \left(
c^\prime_5 + z \cdot \left(
c^\prime_6 + z \cdot \left(
c^\prime_7 + z \cdot \left(
c^\prime_8 + z \cdot \left(
c^\prime_9 + z \cdot r\left( z \right) \right) \right) \right) \right) \right) \right)$$
with
$c^\prime_i = c_{i\hi} + c_{i\lo} \in \F + \F$ and
$$r\left( z \right) =
c_{10} + z \cdot \left(
c_{11} + z \cdot \left(
c_{12} + z \cdot \left(
c_{13} + z \cdot c_{14} \right) \right) \right)$$
with $c_i \in \F$.
The mathematical relative error
$$\epsilon_{\mbox{\tiny meth}} = \frac{p\left( z \right) - \log\left( 1 + z \right)}{\log\left( 1 + z \right)}$$
is bounded by
$$\left \vert \epsilon_{\mbox{\tiny meth}} \right \vert \leq 2^{-125}$$
The polynomial is evaluated using double precision for $r\left( z \right)$, double-double arithmetic for
$q\left( z \right)$ and a triple-double representation for $p\left( z \right)$ and the final reconstruction.

The overall error
$$\epsilon_{\mbox{\tiny accurate}} = \frac{\left( log_\hi + log_\mi + log_\lo \right) - \log\left( x \right)}{\log\left( x \right)}$$
is bounded by
$$\left \vert \epsilon_{\mbox{\tiny accurate}} \right \vert \leq 5735 \cdot 2^{-132} \leq 2^{-119.5}$$
as will be shown in section \ref{subsec:accuratephase}. Here, $\left( log_\hi + log_\mi + log_\lo \right)$
is obtained by reconstructing the logarithm as indicated by the parentheses in the following term:
$$log_\hi + log_\mi + log_\lo = \left(E \cdot \left( log2_\hi + log2_\mi + log2_\lo \right) \right) +
\left( \left( p_\hi + p_\mi + p_\lo \right) + \left(logi_\hi + logi_\mi + logi_\lo \right) \right)$$
where
$log2_\hi + log2_\mi + log2_\lo \approx \log\left( 2 \right)$ and $logi_\hi + logi_\mi + logi_\lo \approx -\log\left(r_i \right)$.

Since the critical accuracy of the double precision $\log$ function is $118$ bits according to
\cite{DinDefLau2004LIP}, rounding $log_\hi + log_\mi + log_\lo \approx \log\left( x \right)$ to double precision is equivalent
to rounding the infinite precision value $\log\left( x \right)$ to double precision.
Using the final rounding sequences presented in \cite{Lauter2005LIP:tripledouble}, which are supposed to be correct,
the double precision value returned by the function is the correctly rounded double precision value of
$\log\left( x \right)$.



\section{Proof of correctness of the triple-double implementation \label{sec:logtdproof}}
Proving that an implementation of an elementary function is correctly rounded mainly means proving two
bounds on the relative errors $\epsilon_{\mbox{\tiny quick}}$ and $\epsilon_{\mbox{\tiny accurate}}$, using the appropriate lemma for proving the
correctness of the rounding test and concluding by means of the theorem stating the critical accuracy of the
function considered. The computation of the error bounds will be done mainly using the Gappa tool~\cite{Melqu05}, but
some parts of the proof are still based on paper or Maple computations.
These parts will be shown in sections
\ref{subsec:reduction}, \ref{subsec:quickphase} and \ref{subsec:accuratephase} and mainly comprise the following:
\begin{itemize}
\item the demonstration that all special cases are handled correctly,
\item a proof that $z_\hi + z_\lo = r_i \cdot y - 1$ exactly,
\item the bounds for the mathematical approximation errors of the polynomials,
\item a proof of the exactness of some multiplications in the code,
\item the proof of the accuracy of all basic addition and multiplication code sequences on
double-double and triple-double numbers,
\item the correctness proof of the final rounding sequences for rounding triple-double numbers to double precision and
\item the mathematical equality of the term rewriting hints in the Gappa code.
\end{itemize}
The proofs of the accuracy of the basic operation bricks and the correctness proof of the final rounding sequences
are somewhat lengthy and are not given here; they can be found in \cite{Lauter2005LIP:tripledouble}.
\subsection{Exactness of the argument reduction\label{subsec:reduction}}
In this section, we will show that all special cases are handled correctly and that the
reduced argument consisting in $E$ and $z_\hi + z_\lo$ is exact, which means that we have the mathematically exact
equation
$$\log\left( x \right) = E \cdot \log\left( 2 \right) + \log\left( 1 + \left( z_\hi + z_\lo \right) \right) - \log\left( r_i \right)$$
This part of the algorithm is performed by the following code sequence, which we will analyse line by line:
\begin{lstlisting}[caption={Handling of special cases and table access},firstnumber=1]
E=0;
xdb.d=x;

/* Filter cases */
if (xdb.i[HI] < 0x00100000){          /* x < 2^(-1022) */
  if (((xdb.i[HI] & 0x7fffffff)|xdb.i[LO])==0){
    return -1.0/0.0;
  }                                   /* log(+/-0) = -Inf */
  if (xdb.i[HI] < 0){
    return (x-x)/0;                   /* log(-x) = Nan */
  }
  /* Subnormal number */
  E = -52;
  xdb.d *= ((db_number) ((double) two52)).d; /* make x a normal number */
}

if (xdb.i[HI] >= 0x7ff00000){
  return x+x;                         /* Inf or Nan */
}


/* Do argument reduction */
E += (xdb.i[HI]>>20)-1023;            /* extract the exponent */
index = (xdb.i[HI] & 0x000fffff);
xdb.i[HI] = index | 0x3ff00000;       /* do exponent = 0 */
index = (index + (1<<(20-L-1))) >> (20-L);

/* reduce such that sqrt(2)/2 < xdb.d < sqrt(2) */
if (index >= MAXINDEX){               /* corresponds to xdb>sqrt(2)*/
    xdb.i[HI] -= 0x00100000;
    E++;
}
y = xdb.d;
index = index & INDEXMASK;

ed = (double) E;

ri = argredtable[index].ri;

logih = argredtable[index].logih;
logim = argredtable[index].logim;
\end{lstlisting}
Analysis of the code:
{\renewcommand{\labelenumi}{}
\begin{enumerate}
\item lines 1 and 2: Initialization of the integer $E$ and of the {\tt db\_number} {\tt xdb}, which is now equal to $x$.
\item line 5: As the integer ordering and the ordering on floating point numbers are compatible,
$x < +2^{-1022}$, i.e. $x$ is negative, negative infinity, equal to zero or a subnormal.
\item line 6: {\tt xdb.i[HI] \& 0x7fffffff} is the high order word of $x$ without the sign bit. If the test is true,
$\left \vert x \right \vert = 0$. As the logarithm of $0$ is not defined but the limit $-\infty$ is known, returning
$-1.0 / 0.0$ is correct.
\item line 9: Since the integer ordering and the ordering on floating point numbers are compatible,
{\tt xdb.i[HI] < 0} implies $x < 0$.
The logarithm is not defined for negative numbers, so the result must be $\nan$.
$0.0 / 0.0$ leads to a $\nan$; one uses $\left(x - x\right) / 0.0$ in order to overcome the static checks of the compiler.
\item lines 13 and 14: if these code lines are reached, $x$ is a subnormal. Since $E$ equals $0$ at this point,
setting it to $-52$ and multiplying {\tt xdb} by $2^{52}$ brings {\tt xdb} into the normal number range while
rescaling the internal representation $x = 2^E \cdot m = 2^E \cdot${\tt xdb} accordingly.
\item line 17: As the integer ordering and the ordering on floating point numbers are compatible and as
{\tt 0x7fefffff ffffffff} is the greatest normal, the test being true implies that $x$ is equal to $+\infty$ or $\nan$.
In the case of $x=+\infty$, $+\infty$ must be returned, which is done. In the other case, $\nan$ must be returned, which is
still assured by $x + x$.
\item line 23: At this point of the code, the most significant bit of the high order word of {\tt xdb} must be $0$ as
the case where $x < 0$ has already been filtered out. So {\tt xdb.i[HI] >> 20} is equal to the biased exponent of {\tt xdb}
because a double number consists of $1$ sign bit and $11$ exponent bits, and the word bit length is supposed to be $32$.
Subtracting $1023$ yields the unbiased exponent, which is written to $E$.
\item lines 24 and 25: Since a double number consists of $1$ sign bit and $11$ exponent bits, the operation
{\tt xdb.i[HI] \& 0x000fffff} masks out everything but the mantissa bits in the high order word of {\tt xdb}.
Rewriting {\tt xdb.i[HI] = index | 0x3ff00000} means setting the exponent of {\tt xdb} to $0$ because
{\tt 0x3ff}$ - 1023 = 0$.
\item line 26: Before execution of this line of code, {\tt index} contains the high order bits of the normalized mantissa
of $x$, stored as a double in {\tt xdb.d} and thus verifying $1 \leq m < 2$. The second argument reduction step
slices this interval into $128$ subintervals, for each of which we dispose of a table entry. Because of possible
cancellation in the reconstruction step in the operation $p\left( z \right) - \log\left( r_i \right)$, we want the
small subintervals to be centered around $1$. That means, e.g. for the interval around $1$ and a table indexed by $7$ bits,
that mantissas (as doubles) with the high order word {\tt 0x3fefffff} through {\tt 0x3ff00fff} must be mapped to index $0$.
The decision is therefore made at the $\left(7+1\right)$th bit of the mantissa part of the double depending on whether this bit is $0$
-- in which case the value falls into the lower interval -- or $1$ -- in which case the value goes to the next higher
interval. So adding $1$ at bit position $20 - 7 - 1$ ($L = 7$) before the shift increases the index value by $1$ iff this bit is $1$.
So after execution of the line, {\tt index} contains the number of the interval for the second argument reduction step
centered in $1$.
\item lines 29 through 31: The second adjustment to be made on $E^\prime$ and $m$ is the decision whether $m > \sqrt{2}$, as
indicated in section \ref{sec:logoutline}. The high order word of $\sqrt{2}$ rounded to a double is {\tt 0x3ff6a09e}.
As one can simply verify (the computation is carried out explicitly after this enumeration), the value of {\tt index} calculated for this word is $53$.
As the integer ordering and
the ordering of floating point numbers are compatible and as the computations yielding {\tt index} are monotone,
{\tt index} being greater than or equal to $53$ implies that the (normalized) mantissa of $x$ is greater than
$\sqrt{2} + \delta$ with a negligible error $\delta$.
As {\tt MAXINDEX} is equal to $53$, the test will be true iff the adjustment on $E^\prime$ leading
to $E$ and on $m$ yielding $y$ is to be made. It is trivial to see that the code in the {\tt if}'s body implements the
adjustment correctly.
\item lines 33 and 34: the final value of the reduced argument $y$ -- still stored in {\tt xdb.d} -- is copied to
a {\tt double} variable (or register) named {\tt y}. The final index value is masked out by means of {\tt INDEXMASK},
which is equal to $127 = 2^7-1$.
\item line 36: The integer value of the exponent $E$ stored in {\tt E} is cast to a {\tt double} {\tt ed}.
\item lines 38 through 41: The table is indexed by {\tt index} and the values {\tt ri}$=r_i$,
{\tt logih}$=logi_\hi$ and {\tt logim}$=logi_\mi$ are read.
Since the latter two form a double-double precision value, we know that
$logi_\hi + logi_\mi = -\log\left( r_i \right) \cdot \left( 1 + \epsilon \right)$ with $\left \vert \epsilon \right \vert \leq 2^{-106}$.
The value {\tt ri} is stored as a single precision variable and a Maple procedure assures that for each value
$y$ the following inequality is verified:
$$\left \vert z \right \vert = \left \vert y \cdot r_i - 1 \right \vert \leq 2^{-8}$$
\end{enumerate}
}
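To carry out explicitly the small verification announced in the analysis of lines 29 through 31: the high order word of $\sqrt{2}$ is {\tt 0x3ff6a09e}, so after masking out the exponent bits, {\tt index} is {\tt 0x0006a09e}, and line 26 computes, with $L = 7$,
$$\left( \mbox{\tt 0x6a09e} + 2^{12} \right) \gg 13 = \mbox{\tt 0x6b09e} \gg 13 = 53$$
which is exactly the value of {\tt MAXINDEX}.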
Let us show now that the following lines calculate $z_\hi$ and $z_\lo$ such that for each $y$ and corresponding $r_i$,
we obtain exactly
$$z_\hi + z_\lo = y \cdot r_i - 1$$
\begin{lstlisting}[caption={Argument reduction},firstnumber=42]
Mul12(&yrih, &yril, y, ri);
th = yrih - 1.0;
Add12Cond(zh, zl, th, yril);
\end{lstlisting}
We know that we can suppose that the multiplication and addition sequences \Mul~ and \Add~ used at lines
42 and 44 are exact. Thus, it suffices to show that
$$yri_\hi - 1.0 = yri_\hi \ominus 1.0$$
because in that case, we can write
$$z_\hi + z_\lo = th + yri_\lo = yri_\hi \ominus 1.0 + yri_\lo = y \cdot r_i - 1.0$$
We will show this property using Sterbenz' lemma. It thus suffices to prove that
$$\frac{1}{2} \leq yri_\hi \leq 2$$
We know that
\begin{eqnarray*}
yri_\hi & = & \circ\left( y \cdot r_i \right) \\
& \leq & \circ \left( 1 + 2^{-8} \right) \\
& = & 1 + 2^{-8} \\
& < & 2
\end{eqnarray*}
since the rounding function $\circ$ is monotonic and the precision of the format is greater than $9$ bits.

The other way round, we get
\begin{eqnarray*}
yri_\hi & = & \circ \left( y \cdot r_i \right) \\
& \geq & \circ \left( 1 - 2^{-8} \right) \\
& = & 1 - 2^{-8} \\
& > & \frac{1}{2}
\end{eqnarray*}
for the same reasons.

Thus $z_\hi + z_\lo = y \cdot r_i - 1$ exactly. Since the previous phases of the argument reduction were all exact, the reduced argument
verifies $x = 2^{E} \cdot y$ exactly.

Still in this section, let us show that neither the reduced argument of the logarithm function nor its result may be
a subnormal double number. The first property has already been assured by the special case handling as shown above. The
latter can be proven as follows: the $\log\left( x \right)$ function has exactly one zero, at $x = 1$.
As it is monotone, for $x = 1 \pm 1 \mUlp = 1 \pm 2^{-52}$ we will obtain $\log\left( 1 \pm 2^{-52} \right) = 0 \pm 2^{-52} + \delta$
with
$\left \vert \delta \right \vert \leq 2^{-103}$. As $0 \pm 2^{-1022}$ is the least normal, the result of the logarithm function will
always be a normal. Further, in both the double-double and triple-double representations of the final intermediate result
of the function, whose critical accuracy is $118$ bits, the least significant double of the representation will still be
a normal since $52 + 106 = 158 < 1022$.
\subsection{Accuracy proof of the quick phase\label{subsec:quickphase}}
As already mentioned, the accuracy proof of the quick phase is mainly based on the Gappa tool. To prove the desired
accuracy bound defined as
$$\epsilon_{\mbox{\tiny quick}} = \frac{\left(log_\hi + log_\lo\right) - \log\left(x\right)}{\log\left(x\right)}$$
and given by
$$\left \vert \epsilon_{\mbox{\tiny quick}} \right \vert \leq 5 \cdot 2^{-65} \leq 2^{-62.6}$$
three different Gappa proof files are needed depending on the following cases:
\begin{itemize}
\item for $E \geq 1$ and all table indexes $0 \leq i \leq 127$, a general proof file named {\tt log-td.gappa} is used,
\item for $E = 0$ and all table indexes except $0$, i.e. $1 \leq i \leq 127$, a proof file named {\tt log-td-E0.gappa}
comes to hand and
\item for $E = 0$ and the table index $i = 0$, a proof file called {\tt log-td-E0-logir0.gappa} is employed.
This latter file
uses relative error computations, in contrast to the other two cases where absolute error estimates suffice. This
is necessary because in this case, and in this one only, the logarithm function has a zero in the interval considered.
\end{itemize}
In each of the three proof files, we ask the Gappa tool to verify the accuracy bound expressed in its syntax as
follows:
\begin{lstlisting}[caption={Accuracy bound to prove},firstnumber=109]
->
((logh + logm) - Log) / Log in [-5b-65,5b-65]
\end{lstlisting}
Furthermore, in each proof file, some hypotheses are made on the exactness of one multiplication sequence and on the
accuracy of the constants and of the remaining operations carried out in double-double arithmetic.
These hypotheses are the following:
\begin{itemize}
\item The operations in the following code sequence are exact since the constants are stored with enough trailing zeros:
\begin{lstlisting}[caption={Multiplication by $E$},firstnumber=50]
Add12(log2edh, log2edl, log2h * ed, log2m * ed);
\end{lstlisting}
This means that $log2ed_\hi + log2ed_\lo = E \cdot \left( log2_\hi + log2_\lo \right)$ exactly.
\item The operations in the following code sequence are exact since multiplications by a power of $2$ are exact
as long as the result does not underflow:
\begin{lstlisting}[caption={Multiplication by $-0.5$},firstnumber=60]
zhSquareHalfh = zhSquareh * -0.5;
zhSquareHalfl = zhSquarel * -0.5;
\end{lstlisting}
i.e. $zhSquareHalf_\hi + zhSquareHalf_\lo = -0.5 \cdot \left( zhSquare_\hi + zhSquare_\lo \right)$.
\item The following hypotheses on the accuracy bounds, expressed here in Gappa syntax, are verified:
\begin{lstlisting}[caption={Gappa hypotheses},firstnumber=100]
(T2hl - T2) / T2 in [-1b-103,1b-103]
/\ (Phl - PE) / PE in [-1b-103,1b-103]
/\ (LogTabPolyhl - LogTabPoly) / LogTabPoly in [-1b-103,1b-103]
/\ (Loghm - LogE) / LogE in [-1b-103,1b-103]
/\ (Log2hm - Log2) / Log2 in [-1b-84,1b-84]
/\ (Logihm - Logir) / Logir in [-1b-106,1b-106]
/\ Z in [_zmin,_zmax]
/\ (P - Log1pZ) / Log1pZ in [-_epsilonApproxQuick,_epsilonApproxQuick]
/\ ((logh + logm) - Loghm) / Loghm in [-1b-106,1b-106]
\end{lstlisting}
Here, {\tt \_zmin}, {\tt \_zmax} and {\tt \_epsilonApproxQuick} are replaced by Maple-calculated values, typically
$-zmin = zmax = 2^{-8}$ and $epsilonApproxQuick = 2^{-62.99}$.
\end{itemize}
Let us now prove each of these hypotheses.
\begin{enumerate}
\item The operations yielding {\tt log2edh} and {\tt log2edl} are all exact because the \Add~ sequence is supposed
to be exact in any case, because the constants {\tt log2h} and {\tt log2m} are calculated by the following Maple
code and have in consequence at least $11$ trailing zeros, and because {\tt ed} $=E$ is less than $1024$ in magnitude since $1024$ is
the maximum exponent value for double precision.
\begin{lstlisting}[caption={Maple code for computing {\tt log2h} and {\tt log2m}},firstnumber=21,label={list:maplelog2}]
log2acc := log(2):
log2h := round(log2acc * 2**(floor(-log[2](abs(log2acc))) + (53 - 11))) /
         2**(floor(-log[2](abs(log2acc))) + (53 - 11)):
log2m := round((log2acc - log2h) * 2**(floor(-log[2](abs((log2acc - log2h)))) +
         (53 - 11))) / 2**(floor(-log[2](abs((log2acc - log2h)))) + (53 - 11)):
\end{lstlisting}
\item To show that $zhSquareHalf_\hi + zhSquareHalf_\lo = -0.5 \cdot \left( zhSquare_\hi + zhSquare_\lo \right)$, we just have to show
that both values $zhSquare_\hi$ and $zhSquare_\lo$ are either equal to $0$ or greater in magnitude than $2$ times the smallest
normal. Let us first give the definitions of both values:
\begin{eqnarray*}
zhSquare_\hi & = & \circ \left( z_\hi \cdot z_\hi \right) \\
zhSquare_\lo & = & z_\hi \cdot z_\hi - zhSquare_\hi
\end{eqnarray*}
where $z_\hi = \circ \left( z \right)$.
Let us suppose that $z \not = 0$. Otherwise all values are equal to $0$ and we can conclude.

Let us first show that $\left \vert zhSquare_\hi \right \vert$ is greater than $2^{54}$ times the smallest normal.
Let us therefore suppose that this
is not the case, i.e. $\left \vert zhSquare_\hi \right \vert < 2^{-948}$. Since the rounding function is monotonic,
this implies that $\left \vert z_\hi \right \vert \leq 2^{-424}$. For the same reason, we can note that
$\left \vert z \right \vert \leq 2^{-424}$. As we have $z = y \cdot r_i - 1$, clearly neither $y$ nor $r_i$ can be exactly $1$:
if this were the case for both, we would obtain $z=0$, which we have excluded; if only one of them were
exactly $1$, the other being a floating point number in the interval $\left[ 0.5; 1.5 \right]$,
we would obtain the inequality $\left \vert z \right \vert \geq 2^{-53}$, which would be contradictory.
Otherwise, since we know that $1 - 2^{-8} \leq y \cdot r_i \leq 1 + 2^{-8}$ and since the precision of all formats used is greater than
$9$ bits, the hypothesis that $1 - 2^{-424} \leq y \cdot r_i \leq 1 + 2^{-424}$ and $y \cdot r_i \not = 1$
would imply that the infinite precision mantissa of
$y \cdot r_i$ contains a $1$ weighted with $2^0$ and a $1$ weighted with less than $2^{-424}$. So its length would be greater than
$423$ bits. As it is the product of two floating point numbers which have $52$ and $23$ significant bits, there
cannot be a $1$ of weight smaller than $2^{-75}$ if there is a $1$ weighted with $2^0$, which is the case. Contradiction.

So $-0.5 \cdot zhSquare_\hi$ does not underflow. Additionally, by a similar argument, since {\tt zh} is a double precision
number, $zhSquare_\lo$ is either $0$ or greater in magnitude than $2^{-53} \cdot \left \vert zhSquare_\hi \right \vert$, which is
$2^{52}$ times greater in magnitude than the smallest normal. So $zhSquare_\lo$ is either $0$ or $2$ times greater in
magnitude than the smallest normal.

So, the floating point multiplications of $zhSquare_\hi$ and $zhSquare_\lo$ by $-0.5$ can be considered to be exact.
\item {\tt (T2hl - T2) / T2 in [-1b-103,1b-103]}, which means that
$$\left \vert \frac{T2hl - T2}{T2} \right \vert \leq 2^{-103}$$
is verified as $T2hl$ and $T2$ are defined as follows:
$$T2hl = t2_\hi + t2_\lo \gets \mAddDD \left( z_\hi, z_\lo, zhSquareHalf_\hi, zhSquareHalf_\lo \right)$$
$$T2 = \left( z_\hi + z_\lo \right) + \left( zhSquareHalf_\hi + zhSquareHalf_\lo \right)$$
The given bound is thus just the accuracy bound of the \AddDD~ sequence, for which a proof can be found in
\cite{Lauter2005LIP:tripledouble}.
\item {\tt (Phl - PE) / PE in [-1b-103,1b-103]} is verified for the same reason; let us just recall the definitions
$$Phl = p_\hi + p_\lo \gets \mAddDD \left( t2_\hi, t2_\lo, t1_\hi, t1_\lo \right)$$
$$PE = \left( t2_\hi + t2_\lo\right) + \left( t1_\hi + t1_\lo \right)$$
\item {\tt (LogTabPolyhl - LogTabPoly) / LogTabPoly in [-1b-103,1b-103]} still falls into the same case with
$$LogTabPolyhl = logTabPoly_\hi + logTabPoly_\lo \gets \mAddDD \left( logi_\hi, logi_\mi, p_\hi, p_\lo \right)$$
$$LogTabPoly = \left( logi_\hi + logi_\mi \right) + \left( p_\hi + p_\lo \right)$$
\item And finally, {\tt (Loghm - LogE) / LogE in [-1b-103,1b-103]},
which is also just the accuracy bound of the \AddDD~ sequence for
$$Loghm = log_\hi + log_\mi \gets \mAddDD \left( log2ed_\hi, log2ed_\lo, logTabPoly_\hi, logTabPoly_\lo \right)$$
$$LogE = \left( log2ed_\hi + log2ed_\lo \right) + \left( logTabPoly_\hi + logTabPoly_\lo \right)$$
\item {\tt (Log2hm - Log2) / Log2 in [-1b-84,1b-84]} is verified
since $log2_\hi$ and $log2_\mi$ are computed as already indicated in listing \ref{list:maplelog2}.
This means that at least $11$ trailing zeros are stored in each of the doubles of this (pseudo-)double-double number,
so it is exact to $2^{-106 + 2 \cdot 11} = 2^{-84}$.
\item {\tt (Logihm - Logir) / Logir in [-1b-106,1b-106]}, which means that $logi_\hi + logi_\mi$ approximates
$-\log\left( r_i \right)$ with a relative error of at most $2^{-106}$,
is verified by construction as $logi_\hi$ and $logi_\mi$ are computed by the following Maple code:
\begin{lstlisting}[caption={Maple code for computing $logi_\hi$ and $logi_\mi$},firstnumber=35]
(logih[i], logim[i], logil[i]) := hi_mi_lo(evalf(-log(r[i]))):
\end{lstlisting}
where {\tt hi\_mi\_lo} is the procedure for rounding an arbitrary precision number to a triple-double number whose two
more significant components form a double-double number.
\item The hypothesis {\tt Z in [\_zmin,\_zmax]} simply recalls the bounds for $z$ as calculated by Maple.
\item The same can be said about the hypothesis \\
{\tt (P - Log1pZ) / Log1pZ in [-\_epsilonApproxQuick,\_epsilonApproxQuick]} \\
which gives the mathematical approximation error of the polynomial. This bound is computed by Maple using the following
instruction:
\begin{lstlisting}[caption={Maple code for computing the relative error of the polynomial},firstnumber=129]
epsilonApproxQuick := numapprox[infnorm]( 1-polyQuick/log(1+x), x=zminmin..zmaxmax)
\end{lstlisting}
\item Finally, Gappa's hypothesis {\tt ((logh + logm) - Loghm) / Loghm in [-1b-106,1b-106]}
simply restates the fact that a double-double precision number is exact to
at least $2^{-106}$ in terms of its relative error.
\end{enumerate}
The Gappa tool is not by itself capable of proving the final accuracy bound it is asked for on an algorithm as complex as the
one given here. Its user must provide hints that help it rewrite the interval arithmetic terms it encounters
in the program. These hints are generally given in the form {\tt $\alpha$ -> $\beta$},
where $\beta$ is the expression by which we want the
tool to rewrite the expression $\alpha$. Generally speaking, the idea behind each hint is one of the following:
\begin{itemize}
\item For computing interval bounds on differences like $\alpha = a - A$ where both $a$ and $A$ are sums of terms like
$a = c + C$ and $A = d + D$, it is often useful to rewrite $\alpha$ as $\beta = \left(c - d \right) + \left( C - D \right)$.
\item An interval bound can often be found more easily for a term $A$ representing an exact mathematical value than for $a$,
which is its arithmetical equivalent. So it is useful to rewrite $a$ as $A \cdot \left( 1 + \frac{a - A}{A} \right)$ when
an interval for $\frac{a - A}{A}$ is known.
\item Fractional left-hand-sides like $\frac{a}{b}$, where both expressions $a$ and $b$ are functions of a common argument
$x$ that can be written as $a = a\left( x \right) = x^n \cdot a^\prime\left( x \right)$ and
$b = b\left( x \right) = x^m \cdot b^\prime\left( x \right)$, should usually be rewritten as follows:
$$\frac{a\left(x\right)}{b\left( x \right)} = \frac{x^n \cdot a^\prime\left( x \right)}{x^m \cdot b^\prime\left( x \right)} =
x^{n - m} \cdot \frac{a^\prime\left( x \right)}{b^\prime\left( x \right)}$$
In particular, this kind of hint is needed when the
interval for the denominator of a fractional left-hand-side contains $0$.
\item Fractional left-hand-sides of the form $\frac{a - A}{A}$ with an unknown $A$ can easily be rewritten as
$$\frac{a - A}{A} = \frac{a - B}{B} + \frac{B - A}{A} + \frac{a - B}{B} \cdot \frac{B - A}{A}$$
This equivalence can be shown as follows:
\begin{eqnarray*}
\frac{a - A}{A} & = & \frac{a - B + B - A}{A} \\
& = & \frac{a - B}{A} + \frac{B - A}{A} \\
& = & \frac{a - B}{B} \cdot \frac{B}{A} + \frac{B - A}{A} \\
& = & \frac{a - B}{B} \cdot \left( 1 + \frac{B - A}{A} \right) + \frac{B - A}{A} \\
& = & \frac{a - B}{B} + \frac{B - A}{A} + \frac{a - B}{B} \cdot \frac{B - A}{A}
\end{eqnarray*}
This is particularly useful when a bound on the relative error of some term $a$ with regard to $B$ is to be
extended to the next approximation level.
\end{itemize}
Clearly, the left-hand-side $\alpha$ and the right-hand-side $\beta$ of a hint must be mathematically equivalent to provide a
correct result. The Gappa tool checks for this equivalence and is sometimes able to prove it. If not, it emits a
warning indicating that the formal proof it is generating for the accuracy bound computations is valid only under
the hypothesis that both sides of the rewriting hint are mathematically equivalent. Further, it prints out the
difference of the two sides, which it has already reduced using the equivalences given in the
Gappa code. It is relatively simple to verify by means of Maple scripts that all these differences are equal to $0$ modulo the definitions
given in the Gappa code. This work can even be done automatically. Thus, we refrain
from giving a paper proof of each hint in the Gappa files used for proving the logarithm function but just
give the exhaustive list of the hints in the files {\tt log-td.gappa} and {\tt log-td-E0-logir0.gappa}:
\begin{lstlisting}[caption={Gappa term rewriting hints in file {\tt log-td.gappa}},firstnumber=115]
T2hl - T2 -> ((T2hl - T2) / T2) * T2;
T2hl -> (T2hl - T2) + T2;

Phl - PE -> ((Phl - PE) / PE) * PE;
Phl -> (Phl - PE) + PE;


LogTabPolyhl -> (LogTabPolyhl - LogTabPoly) + LogTabPoly;

Loghm -> (Loghm - LogE) + LogE;

Log2 -> Log2hm * (1 / (((Log2hm - Log2) / Log2) + 1));

Logir -> Logihm * (1 / (((Logihm - Logir) / Logir) + 1));


LogTabPolyhl - LogTabPoly -> ((LogTabPolyhl - LogTabPoly) / LogTabPoly) * LogTabPoly;

HZZsimp -> (-0.5 * zh * zh) - (0.5 * zl * zl);

T2hl - ZpHZZsimp -> (0.5 * zl * zl) + delta1;

zhCube - ZZZ -> (Z * (zhSquareh - Z * Z)) - (zl * zhSquareh);

polyUpper - ZZZPhigher -> ZZZ * (polyHorner - Phigher) + polyHorner * delta3 + delta2;

ZpHZZ + ZZZPhigher -> ZpHZZsimp + ZZZPhigherPzhzl;

Phl - P -> (T2hl - ZpHZZsimp) + (T1hl - ZZZPhigherPzhzl) + delta4;

Log1pZ -> P * (1 / (((P - Log1pZ) / Log1pZ) + 1));
P - Log1pZ -> ((P - Log1pZ) / Log1pZ) * Log1pZ;

Phl - Log1pZ -> (Phl - P) + delta6;

LogTabPolyhl - Log1pZpTab -> (Logihm - Logir) + (Phl - Log1pZ) + delta7;

Loghm - Log -> (Log2edhm - Log2E) + (LogTabPolyhl - Log1pZpTab) + delta5;

(logh + logm) - Loghm -> (((logh + logm) - Loghm) / Loghm) * Loghm;

(logh + logm) - Log -> ((logh + logm) - Loghm) + (Loghm - Log);
\end{lstlisting}
\begin{lstlisting}[caption={Gappa term rewriting hints in file {\tt log-td-E0-logir0.gappa}},firstnumber=81]
T2hl - T2 -> ((T2hl - T2) / T2) * T2;
T2hl -> (T2hl - T2) + T2;

Phl - PE -> ((Phl - PE) / PE) * PE;
Phl -> (Phl - PE) + PE;

(ZhSquarehl - ZZ) / ZZ -> 2 * ((zh - Z) / Z) + ((zh - Z) / Z) * ((zh - Z) / Z);

(zhSquareh - ZZ) / ZZ -> ((ZhSquarehl - ZZ) / ZZ) + ((zhSquareh - ZhSquarehl) / ZZ);

(zhSquareh - ZhSquarehl) / ZZ -> ((zhSquareh - ZhSquarehl) / ZhSquarehl) * (ZhSquarehl / ZZ);

ZhSquarehl / ZZ -> ((ZhSquarehl - ZZ) / ZZ) + 1;

(ZhCube - ZZZ) / ZZZ -> (((zh * zhSquareh) - ZZZ) / ZZZ) + ((ZhCube - (zh * zhSquareh)) / ZZZ);

((zh * zhSquareh) - ZZZ) / ZZZ -> (1 + ((zh - Z) / Z)) * (1 + ((zhSquareh - ZZ) / ZZ)) - 1;

((ZhCube - (zh * zhSquareh)) / ZZZ) -> ((ZhCube - (zh * zhSquareh)) / (zh * zhSquareh)) * (((zh - Z) / Z) + 1) * (((zhSquareh - ZZ) / ZZ) + 1);

polyHorner / Phigher -> ((polyHorner - Phigher) / Phigher) + 1;

(polyUpper - ZZZPhigher) / ZZZPhigher -> ((polyHorner - Phigher) / Phigher) + ((ZhCube - ZZZ) / ZZZ) * (polyHorner / Phigher) +
    + ((polyUpper - (polyHorner * ZhCube)) / (polyHorner * ZhCube)) * (polyHorner / Phigher) +
    + ((ZhCube - ZZZ) / ZZZ) * ((polyUpper - (polyHorner * ZhCube)) / (polyHorner * ZhCube)) *
      (polyHorner / Phigher);


((ZhSquareHalfhl - (zh * zl)) - HZZ) / HZZ -> - ((zh - Z) / Z) * ((zh - Z) / Z);

(ZhSquareHalfhl - HZZ) / HZZ -> (ZhSquarehl - ZZ) / ZZ;

((T2hl - (zh * zl)) - ZpHZZ) / ZpHZZ -> ((HZ * (((ZhSquareHalfhl - (zh * zl)) - HZZ) / HZZ)) + ((T2hl - T2) / T2)
    + (HZ * ((T2hl - T2) / T2))
    + (HZ * ((ZhSquareHalfhl - HZZ) / HZZ) * ((T2hl - T2) / T2))) / (1 + HZ);

(PE - P) / P -> (((1 + HZ) * (((T2hl - (zh * zl)) - ZpHZZ) / ZpHZZ)) +
    ((1 + ((zh - Z) / Z)) * (Z * ((zh - Z) / Z)) * ((Flzhzl - (zh * zl)) / (zh * zl)))
    + (ZZ * Phigher * ((polyUpper - ZZZPhigher) / ZZZPhigher))) / (1 + HZ + ZZ * Phigher);

(Phl - P) / P -> ((PE - P) / P) + ((((PE - P) / P) + 1) * ((Phl - PE) / PE));

(Loghm - Log) / Log -> ((Loghm - P) / P) + ((P - Log) / Log) + ((Loghm - P) / P) * ((P - Log) / Log);

(((logh + logm) - Log) / Log) -> (((logh + logm) - Loghm) / Loghm) + ((Loghm - Log) / Log) + (((logh + logm) - Loghm) / Loghm) * ((Loghm - Log) / Log);
\end{lstlisting}
For the reasons mentioned, we can consider the accuracy proof of the quick phase to be correct.
\subsection{Accuracy proof of the accurate phase\label{subsec:accuratephase}}
The accuracy proof of the accurate phase is also mainly based on the use of the Gappa tool.
Nevertheless, since the tool currently does not directly support triple-double representations, some additional
hand-proven accuracy bound results for the main addition and multiplication operators are needed. They can be
found in \cite{Lauter2005LIP:tripledouble}. Since all these accuracy bounds are parameterized by the maximal overlap bound
for the triple-double numbers along the computations, before being able to give a numerical value for
these error bounds understood by the Gappa tool, it is necessary to do a maximal overlap bound analysis using
the theorems given in \cite{Lauter2005LIP:tripledouble}.\par
Finally, since the logarithm function must return not an overlapped triple-double intermediate result but a
double precision number that is the correct rounding according to the rounding mode chosen, the algorithm performs
a renormalizing operation on the final result and rounds this non-overlapped result to a double using an
appropriate rounding sequence. All these renormalization and rounding sequences are exact and have been shown to be
correct in \cite{Lauter2005LIP:tripledouble}.
In the same way, all the properties shown in section \ref{subsec:reduction}
concerning the special case handling and the exactness of the argument reduction can be reused, because the algorithm implemented in
the accurate phase uses the same reduced argument and is substantially the same as that of the quick phase. \par
We will thus rely on all these properties and simply show that the relative error
$$\epsilon_{\mbox{\tiny accurate}} = \frac{\left( log_\hi + log_\mi + log_\lo \right) - \log\left( x \right)}{\log\left( x \right)}$$
is bounded by
$$\left \vert \epsilon_{\mbox{\tiny accurate}} \right \vert \leq 5735 \cdot 2^{-132} \leq 2^{-119.5}$$
which is expressed in Gappa syntax as follows:
\begin{lstlisting}[caption={Accuracy bound to prove for the accurate phase},firstnumber=165]
->
((logh + logm + logl) - MLog) / MLog in [-5735b-132,5735b-132]
\end{lstlisting}
The Gappa proof files still make the hypothesis that two of the multiplications in the accurate phase code can be
considered to be
exact. This property must therefore be shown by a paper proof in the following.

The first of these multiplications is the following sequence:
\begin{lstlisting}[caption={Multiplication of the triple-double $\circ\left( Z \cdot Z \right)$ by $-\frac{1}{2}$},firstnumber=99]
  zSquareHalfh = zSquareh * -0.5;
  zSquareHalfm = zSquarem * -0.5;
  zSquareHalfl = zSquarel * -0.5;
\end{lstlisting}
As will be shown below, the relative error $\epsilon_{ZSquare}$ defined as
$$\epsilon_{ZSquare} = \frac{\left( zSquare_\hi + zSquare_\mi + zSquare_\lo \right) - Z^2}{Z^2}$$
is bounded by $\left \vert \epsilon_{ZSquare} \right \vert \leq 2^{-149}$.
Using the same argument as the one given in section \ref{subsec:quickphase}, one can show that $Z$ is either $0$
or greater in magnitude than at least $2^{-77}$. So the following is true:
$$Z^2 = 0 \lor \left \vert Z^2 \right \vert \geq 2^{-154}$$
If $Z^2=0$, then $ZSquarehml = zSquare_\hi + zSquare_\mi + zSquare_\lo$ trivially is $0$, too, and the multiplication
by $-\frac{1}{2}$ is therefore exact.
Since we can write $ZSquarehml = Z^2 \cdot \left( 1 + \epsilon_{ZSquare} \right)$, we know that in the other case
$$\left \vert ZSquarehml \right \vert \geq 2^{-155}$$
We can suppose that in the triple-double number $zSquare_\hi + zSquare_\mi + zSquare_\lo$, $zSquare_\mi$ and
$zSquare_\lo$ do not overlap at all (since $zSquare_\mi = \circ \left( zSquare_\mi + zSquare_\lo \right)$)
and that $zSquare_\hi$ and $zSquare_\mi$ do not overlap completely.
So we can write $\left \vert zSquare_\mi \right \vert \leq 2^{-\beta_o} \cdot \left \vert zSquare_\hi \right \vert$ and
$\left \vert zSquare_\lo \right \vert \leq 2^{-\beta_u} \cdot \left \vert zSquare_\mi \right \vert$ with $\beta_o \geq 1$ and
$\beta_u \geq 53$.
We will show this property below; here we just suppose it.
So we can verify the following:
\begin{eqnarray*}
\left \vert ZSquarehml \right \vert & = & \left \vert zSquare_\hi + zSquare_\mi + zSquare_\lo \right \vert \\
& \leq & \left \vert zSquare_\hi \right \vert + \left \vert zSquare_\mi \right \vert + \left \vert zSquare_\lo \right \vert \\
& \leq & \left \vert zSquare_\hi \right \vert +
2^{-\beta_o} \cdot \left \vert zSquare_\hi \right \vert +
2^{-\beta_o} \cdot 2^{-\beta_u} \cdot \left \vert zSquare_\hi \right \vert \\
& \leq & 2 \cdot \left \vert zSquare_\hi \right \vert
\end{eqnarray*}
In consequence, we obtain
$$\left \vert zSquare_\hi \right \vert \geq \frac{1}{2} \cdot \left \vert ZSquarehml \right \vert$$
and thus
$$\left \vert zSquare_\hi \right \vert \geq 2^{-156}$$
under the hypothesis that it is not exactly zero.
So $zSquareHalf_\hi = -\frac{1}{2} \cdot zSquare_\hi$ will never underflow.

Let us now show that the operations for computing $zSquareHalf_\mi$ and $zSquareHalf_\lo$ cannot both be
inexact. We will use the fact that $\left \vert zSquare_\lo \right \vert \leq 2^{-53} \cdot \left \vert zSquare_\mi \right \vert$.
Suppose first that
$$zSquareHalf_\mi \gets - \frac{1}{2} \otimes zSquare_\mi$$ is inexact. Then $\left \vert zSquare_\mi \right \vert < 2^{-1022}$ and
in consequence $\left \vert zSquare_\lo \right \vert < 2^{-1022-53}$. Note that the inequality is strict.
Since the least (in magnitude) representable denormalized double precision floating point number is $2^{-52} \cdot 2^{-1023}$,
$zSquare_\lo = 0$ in this case. So $$zSquareHalf_\lo \gets - \frac{1}{2} \otimes zSquare_\lo$$ is exact because, trivially, a
multiplication by $0$ is exact.

Suppose now that $$zSquareHalf_\lo \gets - \frac{1}{2} \otimes zSquare_\lo$$ is inexact.
Then $\left \vert zSquare_\lo \right \vert < 2^{-1022}$. Further, the least significant bit of the mantissa of $zSquare_\lo$ is
$1$ because otherwise a bit-shift of its mantissa by $1$ would be an exact operation.
Thus $\left \vert zSquare_\lo \right \vert \geq 2^{-52} \cdot 2^{-1023}$ and $\left \vert zSquare_\mi \right \vert \geq 2^{-1022}$. So
$$zSquareHalf_\mi \gets - \frac{1}{2} \otimes zSquare_\mi$$ cannot be inexact because in this case we would have
$\left \vert zSquare_\mi \right \vert < 2^{-1022}$.

So, in any case, if ever $zSquareHalf_\mi + zSquareHalf_\lo$ is not exactly
$-\frac{1}{2} \cdot \left( zSquare_\mi + zSquare_\lo \right)$, the error made will be at most $\frac{1}{2} \cdot d$
in magnitude, where $d$ is the smallest representable denormalized non-zero double. So in this case we can write
$$zSquareHalf_\hi + zSquareHalf_\mi + zSquareHalf_\lo = - \frac{1}{2} \cdot \left( zSquare_\hi + zSquare_\mi + zSquare_\lo \right) + \delta$$
with $\left \vert \delta \right \vert \leq 2^{-1075}$.
Since we know that
$\left \vert - \frac{1}{2} \cdot \left( zSquare_\hi + zSquare_\mi + zSquare_\lo \right) \right \vert \geq 2^{-156}$, we can give the
following bound:
$$\left \vert \frac{\delta}{-\frac{1}{2} \cdot \left( zSquare_\hi + zSquare_\mi + zSquare_\lo \right)} \right \vert \leq
\frac{2^{-1075}}{2^{-156}} = 2^{-919}$$
So we get
$$ZSquareHalfhml = - \frac{1}{2} \cdot ZSquarehml \cdot \left(1 + \epsilon\right)$$
with
$\left \vert \epsilon \right \vert \leq 2^{-919}$.

On the other hand, since we know that $\left \vert Z \right \vert \leq 2^{-8}$,
thus that $\left \vert Z^2 \right \vert \leq 2^{-16}$, but that $\left \vert Z^2 \right \vert \geq 2^{-154}$, we can
assume that the infinite precision mantissa of $Z^2$ can always be written exactly with at most $154 - 16 = 138 < 149$
bits. As we can show that
$\frac{1}{2} \cdot \left \vert ZSquarehml \right \vert \leq \left \vert zSquare_\hi \right \vert \leq
2 \cdot \left \vert ZSquarehml \right \vert$, we know that if ever one of $zSquare_\mi$ or $zSquare_\lo$ is such that
the multiplication by $-\frac{1}{2}$ is not exact, the error made has already been accounted for in the error bound
for $ZSquarehml$ with regard to $Z^2$.
So the operation computing $ZSquareHalfhml$ out of $ZSquarehml$ can be considered to be exact. \par
Let us now analyse the following sequence:
\begin{lstlisting}[caption={Multiplication of the triple-double $Log2hml$ by $E$},firstnumber=126]
  log2edhover = log2h * ed;
  log2edmover = log2m * ed;
  log2edlover = log2l * ed;
\end{lstlisting}
Similarly to the argument given in section \ref{subsec:quickphase}, since $E=ed$ is bounded
in magnitude by $1024=2^{10}$ and since $log2_\hi$ and $log2_\mi$ are stored with at least $11$ trailing bits at zero,
the multiplications on these components are exact. The constant $log2_\lo$ is not stored with $11$ trailing bits at zero,
but this is not needed, because we only suppose the bound $\left \vert \epsilon_{Log2hml} \right \vert \leq 2^{-3 \cdot 53 + 33} = 2^{-126}$ for
$$\epsilon_{Log2hml} = \frac{log2_\hi + log2_\mi + log2_\lo - \log\left( 2 \right)}{\log\left( 2 \right)}$$
So the multiplication on this component is not exact in itself, but the final result is still more accurate than the bound we use for it.

Let us finally just recall the Maple code for computing the constants:
\begin{lstlisting}[caption={Maple code for computing $Log2hml$},firstnumber=21]
log2acc := log(2):
log2h := round(log2acc * 2**(floor(-log[2](abs(log2acc))) + (53 - 11))) / 2**(floor(-log[2](abs(log2acc))) + (53 - 11)):
log2m := round((log2acc - log2h) * 2**(floor(-log[2](abs((log2acc - log2h)))) + (53 - 11))) / 2**(floor(-log[2](abs((log2acc - log2h)))) + (53 - 11)):
log2l := log2acc - (log2h + log2m):
\end{lstlisting}
So the multiplication can be considered to be exact as long as the less accurate bound for $\epsilon_{Log2hml}$ is used.

Let us now analyse the bounds that we can give for the maximal overlap of the components of the triple-double numbers
in the logarithm implementation. For doing this, we will assign to each triple-double number in the code an overlap bound as
follows. Call the number in consideration e.g. $a_\hi + a_\mi + a_\lo$.
We will then give bounds expressed like this:
\begin{eqnarray*}
\left \vert a_\mi \right \vert & \leq & 2^{-\alpha_o} \cdot \left \vert a_\hi \right \vert \\
\left \vert a_\lo \right \vert & \leq & 2^{-\alpha_u} \cdot \left \vert a_\mi \right \vert
\end{eqnarray*}
where $\alpha_o, \alpha_u \geq 2$.
We will then propagate this information following the flow of control in the implementation and using the overlap
bound theorems given in \cite{Lauter2005LIP:tripledouble}. Here, by ``propagating'' we understand checking a system of constraints
on the bounds under the limitations provided by the theorems. As the control-flow-graph of our implementation
is completely linear, this check is linear, too. The theorems mentioned can be summarized as follows:
\begin{center}
\begin{tabular}{|l|l|l|l|l|}
\hline
Operation & 1st arg. & 2nd arg. & result high & result low \\
\hline
\AddTT & $\alpha_o \geq 4$, $\alpha_u \geq 1$ & $\beta_o \geq 4$, $\beta_u \geq 1$ & $\gamma_o \geq \min\left( \alpha_o, \beta_o \right) - 5$ & $\gamma_u \geq 53$ \\
\hline
\AddDTT & - & $\beta_o \geq 2$, $\beta_u \geq 1$ & $\gamma_o \geq \min\left( 45, \beta_o - 4, \beta_o + \beta_u - 2 \right)$ & $\gamma_u \geq 53$ \\
\hline
\MulDT & - & - & $\gamma_o \geq 48$ & $\gamma_u \geq 53$ \\
\hline
\MulDTT & - & $\beta_o \geq 2$, $\beta_u \geq 1$ & $\gamma_o \geq \min\left( 48, \beta_o - 4, \beta_o + \beta_u - 4 \right)$ & $\gamma_u \geq 53$ \\
\hline
\end{tabular}
\end{center}
So let us analyse the following code:
\begin{lstlisting}[caption={Triple-double computations},firstnumber=90,label={list:tripledouble}]
Mul23(&zSquareh, &zSquarem, &zSquarel, zh, zl, zh, zl);
Mul233(&zCubeh, &zCubem, &zCubel, zh, zl, zSquareh, zSquarem, zSquarel);
Mul233(&higherPolyMultZh, &higherPolyMultZm, &higherPolyMultZl, t14h, t14l, zCubeh, zCubem, zCubel);
zSquareHalfh = zSquareh * -0.5;
zSquareHalfm = zSquarem * -0.5;
zSquareHalfl = zSquarel * -0.5;
Add33(&polyWithSquareh, &polyWithSquarem, &polyWithSquarel,
      zSquareHalfh, zSquareHalfm, zSquareHalfl,
      higherPolyMultZh, higherPolyMultZm, higherPolyMultZl);
Add233(&polyh, &polym, &polyl, zh, zl, polyWithSquareh, polyWithSquarem, polyWithSquarel);
Add33(&logyh, &logym, &logyl, logih, logim, logil, polyh, polym, polyl);
log2edhover = log2h * ed;
log2edmover = log2m * ed;
log2edlover = log2l * ed;
log2edh = log2edhover;
log2edm = log2edmover;
log2edl = log2edlover;
Add33(&loghover, &logmover, &loglover, log2edh, log2edm, log2edl, logyh, logym, logyl);
\end{lstlisting}
This code finally generates triple-double numbers respecting the following overlap bounds, as will be
shown below:
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Variable & Line(s) & $\alpha_o \geq$ & $\alpha_u \geq$ \\
\hline
$ZSquarehml$ & 90 & $48$ & $53$ \\
\hline
$ZCubehml$ & 91 & $44$ & $53$ \\
\hline
$HigherPolyMultZhml$ & 92 & $40$ & $53$ \\
\hline
$ZSquareHalfhml$ & 93-95 & $48$ & $53$ \\
\hline
$PolyWithSquarehml$ & 96-98 & $35$ & $53$ \\
\hline
$Polyhml$ & 99 & $31$ & $53$ \\
\hline
$Logyhml$ & 100 & $26$ & $53$ \\
\hline
$Log2edhml$ & 101-106 & $40$ & $40$ \\
\hline
$Logoverhml$ & 107 & $21$ & $53$ \\
\hline
\end{tabular}
\end{center}
Let us verify some of these bounds as examples:
\begin{itemize}
\item At line 90, $ZSquarehml$ is computed out of the double-double number $z_\hi + z_\lo$ by use of the \MulDT~ sequence.
Since the inputs of this function are not triple-double, the overlap bound is just the bound provided by the sequence
itself, i.e. $\alpha_o \geq 48$, $\alpha_u \geq 53$.
\item $ZCubehml$ is the result of a \MulDTT~ sequence at line 91. Its overlap bound therefore depends on the one
for $ZSquarehml$, which is the second argument of the function. Since we know the bound for this variable, we easily
verify the one for $ZCubehml$: $\alpha_o \geq \min\left( 48, 48 - 4, 48 + 53 - 4 \right) = 44$ and $\alpha_u \geq 53$.
\item $Log2edhml$ is the exact pairwise product of the triple-double constant $Log2hml$ and the double $E$. Since $E$ may be
as small as $0$ in magnitude and, further, since the multiplication is pairwise, the overlap bound we dispose of for
$Log2edhml$ is the same as for $Log2hml$, which is stored with at least $11$ trailing zero bits.
So an appropriate bound is $\alpha_o \geq 52 - 11 \geq 40$ and $\alpha_u \geq 40$.
\end{itemize}
All other bounds can be verified the same way using the theorems given in \cite{Lauter2005LIP:tripledouble} and indicated above. \par
Since we have computed the overlap bounds for the different triple-double operands in the code, we can now
calculate the accuracy bounds for the operations. Doing this is only possible with the knowledge of the
overlaps of the operands, because all accuracy bound theorems given in \cite{Lauter2005LIP:tripledouble} are parameterized by these
overlap expressions.

Let us first give a list of the accuracies of the different basic operations. This list is not exhaustive in that it omits
almost all the preconditions on the sequences that are required for the theorems to hold. We refrain from explicitly
verifying each of these preconditions in this document, as this is only tedious work of no special interest.
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Operation & Overlap 1st arg. & Overlap 2nd arg. & Relative error $\epsilon$ \\
\hline
\AddDD & - & - & $\left \vert \epsilon \right \vert \leq 2^{-103.5} \leq 2^{-103}$ \\
\hline
\MulDD & - & - & $\left \vert \epsilon \right \vert \leq 2^{-102}$\\
\hline
\AddTT & $\alpha_o \geq 4$, $\alpha_u \geq 1$ & $\beta_o \geq 4$, $\beta_u \geq 1$ &
$\left \vert \epsilon \right \vert \leq 2^{-\min\left( \alpha_o + \alpha_u, \beta_o + \beta_u \right) -47} + 2^{-\min\left( \alpha_o, \beta_o \right) - 98}$
\\
\hline
\AddDTT & - & $\beta_o \geq 2$, $\beta_u \geq 1$ &
$\left \vert \epsilon \right \vert \leq 2^{-\beta_o - \beta_u - 52} + 2^{-\beta_o-104} + 2^{-153}$
\\
\hline
\MulDT & - & - &
$\left \vert \epsilon \right \vert \leq 2^{-149}$
\\
\hline
\MulDTT & - & $\beta_o \geq 2$, $\beta_u \geq 1$ &
$\left \vert \epsilon \right \vert \leq 2^{-97-\beta_o} + 2^{-97-\beta_o-\beta_u} + 2^{-150}$
\\
\hline
\end{tabular}
\end{center}
Analysing now the following double-double computation code and the code
given in listing \ref{list:tripledouble}, one can easily check the bounds for
the relative error of the different operations listed in the table below.
We define here the relative error of an operation $\ast$ and its arithmetical equivalent $\circledast$ as follows:
$$\epsilon = \frac{\left(a \circledast b \right) - \left(a \ast b\right)}{\left(a \ast b \right)}$$
\begin{lstlisting}[caption={Double-double computations in the accurate phase},firstnumber=73,label={list:doubledouble}]
  Mul12(&t1h, &t1l, zh, highPoly);
  Add22(&t2h, &t2l, accPolyC9h, accPolyC9l, t1h, t1l);
  Mul22(&t3h, &t3l, zh, zl, t2h, t2l);
  Add22(&t4h, &t4l, accPolyC8h, accPolyC8l, t3h, t3l);
  Mul22(&t5h, &t5l, zh, zl, t4h, t4l);
  Add22(&t6h, &t6l, accPolyC7h, accPolyC7l, t5h, t5l);
  Mul22(&t7h, &t7l, zh, zl, t6h, t6l);
  Add22(&t8h, &t8l, accPolyC6h, accPolyC6l, t7h, t7l);
  Mul22(&t9h, &t9l, zh, zl, t8h, t8l);
  Add22(&t10h, &t10l, accPolyC5h, accPolyC5l, t9h, t9l);
  Mul22(&t11h, &t11l, zh, zl, t10h, t10l);
  Add22(&t12h, &t12l, accPolyC4h, accPolyC4l, t11h, t11l);
  Mul22(&t13h, &t13l, zh, zl, t12h, t12l);
  Add22(&t14h, &t14l, accPolyC3h, accPolyC3l, t13h, t13l);
\end{lstlisting}
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Result & Line(s) & Operation & Relative error $\epsilon$ \\
\hline
$T1hl$ through $T14hl$ & 73 - 86 & \AddDD~ / \MulDD &
$\left \vert \epsilon \right \vert \leq 2^{-103}$ / $\left \vert \epsilon \right \vert \leq 2^{-102}$ \\
\hline
$ZSquarehml$ & 90 & \MulDT & $\left \vert \epsilon \right \vert \leq 2^{-149}$ \\
\hline
$ZCubehml$ & 91 & \MulDTT & $\left \vert \epsilon \right \vert \leq 2^{-144}$ \\
\hline
$HigherPolyMultZhml$ & 92 & \MulDTT & $\left \vert \epsilon \right \vert \leq 2^{-141}$ \\
\hline
$PolyWithSquarehml$ & 96-98 & \AddTT & $\left \vert \epsilon \right \vert \leq 2^{-137}$ \\
\hline
$Polyhml$ & 99 & \AddDTT & $\left \vert \epsilon \right \vert \leq 2^{-134}$ \\
\hline
$Logyhml$ & 100 & \AddTT & $\left \vert \epsilon \right \vert \leq 2^{-128}$ \\
\hline
$Logoverhml$ & 107 & \AddTT & $\left \vert \epsilon \right \vert \leq 2^{-123}$ \\
\hline
\end{tabular}
\end{center}
Let us explicitly check the bound for one of the operations for the sake of an example. Let us take
the \AddTT~ sequence at lines 96-98 computing $PolyWithSquarehml$ out of $ZSquareHalfhml$ and
$HigherPolyMultZhml$. We have already obtained the following overlap bounds:
\begin{eqnarray*}
\left \vert zSquareHalf_\mi \right \vert & \leq & 2^{-48} \cdot \left \vert zSquareHalf_\hi \right \vert \\
\left \vert zSquareHalf_\lo \right \vert & \leq & 2^{-53} \cdot \left \vert zSquareHalf_\mi \right \vert \\
\left \vert higherPolyMultZ_\mi \right \vert & \leq & 2^{-40} \cdot \left \vert higherPolyMultZ_\hi \right \vert \\
\left \vert higherPolyMultZ_\lo \right \vert & \leq & 2^{-53} \cdot \left \vert higherPolyMultZ_\mi \right \vert
\end{eqnarray*}
Feeding these bounds into the theorem on the accuracy of \AddTT, we get
$$\left \vert \epsilon \right \vert \leq 2^{-\min\left( 48 + 53, 40 + 53 \right) - 47} + 2^{-\min \left( 48, 40 \right) - 98} \leq 2^{-140} + 2^{-138} \leq 2^{-137}$$
All other error bounds can be verified in a similar way.
They are finally expressed in Gappa syntax as follows:
\begin{lstlisting}[caption={Relative error bounds in Gappa code},firstnumber=139]
(T2hl - T2) / T2 in [-1b-103,1b-103]
/\ (T3hl - T3) / T3 in [-1b-102,1b-102]
/\ (T4hl - T4) / T4 in [-1b-103,1b-103]
/\ (T5hl - T5) / T5 in [-1b-102,1b-102]
/\ (T6hl - T6) / T6 in [-1b-103,1b-103]
/\ (T7hl - T7) / T7 in [-1b-102,1b-102]
/\ (T8hl - T8) / T8 in [-1b-103,1b-103]
/\ (T9hl - T9) / T9 in [-1b-102,1b-102]
/\ (T10hl - T10) / T10 in [-1b-103,1b-103]
/\ (T11hl - T11) / T11 in [-1b-102,1b-102]
/\ (T12hl - T12) / T12 in [-1b-103,1b-103]
/\ (T13hl - T13) / T13 in [-1b-102,1b-102]
/\ (T14hl - T14) / T14 in [-1b-103,1b-103]
/\ (ZSquarehml - ZSquare) / ZSquare in [-1b-149,1b-149]
/\ (ZCubehml - ZCube) / ZCube in [-1b-144,1b-144]
/\ (HigherPolyMultZhml - HigherPolyMultZ) / HigherPolyMultZ in [-1b-141,1b-141]
/\ (PolyWithSquarehml - PolyWithSquare) / PolyWithSquare in [-1b-137,1b-137]
/\ (Polyhml - Poly) / Poly in [-1b-134,1b-134]
/\ (Logyhml - Logy) / Logy in [-1b-128,1b-128]
/\ (Loghml - Logover) / Logover in [-1b-123,1b-123]
/\ (Log2hml - MLog2) / MLog2 in [-1b-126,1b-126]
/\ (Logihml - MLogi) / MLogi in [-1b-159,1b-159]
/\ (MPoly - MLog1pZ) / MLog1pZ in [-_epsilonApproxAccurate,_epsilonApproxAccurate]
/\ Z in [_zmin,_zmax]
/\ ((logh + logm + logl) - Loghml) / Loghml in [-1b-159,1b-159]
\end{lstlisting}
Concerning the Gappa proofs for the accurate phase, in a similar way as for the quick phase, three different proof
files are used. They reflect once again the three main cases for the argument of the logarithm function:
\begin{itemize}
\item For cases where, after argument reduction, $\left \vert E \right \vert \geq 1$,
the file {\tt log-td-accurate.gappa} is used. In this case, absolute error computations are sufficient
for the final relative error bound to be computable because
$\left \vert \log\left( x \right) \right \vert \geq \frac{1}{2} \log\left( 2 \right)$ in this case.
\item For the case where, after argument reduction, $E = 0$ and $i \not = 0$, the file
{\tt log-td-accurate-E0.gappa} is used. Here, too, we have a preponderant constant term, so absolute
error computations suffice.
\item For the other case, where $E=0$ and $i=0$, the file {\tt log-td-accurate-E0-logir0.gappa} provides the
accuracy bound proof. In contrast to the other cases, we are obliged to use relative error estimates from the beginning,
since the function $\log\left( x \right)$ has a zero in the interval considered.
\end{itemize}
Once again, several term rewriting hints are needed in the Gappa proof files to enable the Gappa tool to
generate a proof of the accuracy bounds. In a similar way, the hints which cannot directly be checked for their
mathematical correctness by the tool itself are verified by semi-automatic Maple scripts.\par
By the existence of an accuracy proof for a final relative error of $\left \vert \epsilon_{\mbox{\tiny accurate}} \right \vert \leq 2^{-119.5}$ and
by the use of the critical accuracy of the double precision natural logarithm function, which is
$118$ bits~\cite{DinDefLau2004LIP}, we can consider the implementation to be correctly rounding under the hypothesis
that the final rounding sequence used is exact and correct. Since we suppose this -- a correctness proof can be
found in \cite{Lauter2005LIP:tripledouble} -- the correct rounding property is verified.


\section{Proof of correctness of the double-extended implementation \label{sec:logdeproof}}


\section{Performance results\label{sec:logperf}}
The given implementation of the natural logarithm function aims at
being both portable and faster than the previous
implementations using the SCS library for the accurate phase. This
goal is achieved in terms of memory consumption (if the code sizes for
{\tt scslib} are taken into account) and in terms of speed. The main
reason for this is that the accurate phase can directly reuse all
values computed in the argument reduction phase as well as the tables.

\subsection{Memory requirements}



\subsection{Timings}



%%% Local Variables:
%%% mode: latex
%%% TeX-master: "crlibm"
%%% End: