1This chapter is contributed by Ch. Q. Lauter. 2 3\section{Main considerations, critical accuracy bounds}\label{subsec:criticalboundslog10} 4If one wants to guarantee that an implementation of the logarithm in 5base $10$, $\log_{10}\left( x \right)$, in double precision is 6correctly rounded, one has to ensure that the final intermediate 7approximation before rounding to double precision has a relative error 8of less than $2^{-122}$. 9 10An implementation of $\log_{10}\left(x\right)$ can also be derived from an 11implementation of the natural logarithm $\ln\left(x\right)$ using the formula: 12 13\begin{equation} 14 \log_{10}\left( x \right) = \frac{1}{\ln\left( 10 \right)} \cdot 15\ln\left( x \right)\label{eq:log10} 16\end{equation} 17When doing so, one must ensure that the constant 18$\mathit{log10inv} = \frac{1}{\ln\left(10\right)}$ is stored with 19enough accuracy, that the approximation for $\ln\left( x \right)$ is 20exact enough and that the multiplication sequence does not introduce 21too great an error. As we will see in the next section 22\ref{subsec:outlinelog10}, this implies slight changes to the code for 23the natural logarithm with regard to what has been presented in 24 Chapter \ref{chap:log}. 25 26 With regard to final rounding, the elementary function $\log_{10}$ 27 presents a particular issue that is somewhat singular amongst all 28 considered elementary functions: There exist a large set of input 29 double-precision numbers $x$ such that $\log_{10}(x)$ is a rational 30 and is representable as a double-precision number. 31 32 For such cases, a final directed rounding will be correct only if the 33 approximation error is exactly $0$. 34 Indeed, the rounding $\diamond\left( f\left( x \right) \right)$ of 35 the exactly representably value $f\left(x\right) \in \F$ is trivially 36 $\diamond\left( f\left( x \right) \right) = f\left( x \right)$ 37 \cite{IEEE754}. In contrast, $\diamond\left( f\left( x \right) + 38 \delta \right) \not = f\left( x \right) \in \F$ holds for all 39 $\left \vert \delta \right \vert > 0$. 40 41 As it is impossible to achieve an approximation error exactly equal 42 to zero, it is preferable to filter out such cases and handle them 43 separately. Other functions described so far had only one such argument 44 ($x=1$ for $\log$, x=0 for the trigs). $\log_2$ has a set of such 45 cases ($x = 2^k$, $k \in \Z$) which is equally trivial to handle in 46 binary floating point. 47 48 49For $f = \log_{10}$, filtering much more difficult. In fact, $y = 50\log\left( x \right)$ is algebraic for exactly all $x = 10^k$, $k \in 51\Z$ \cite{Baker75}. Filtering means thus testing whether an input $x$ 52can be written $x = 10^k$ with an integer $k \in \Z$. This is 53equivalent to testing if $\log_{10}\left( x \right)$ an integer, 54i.e. $\log_{10}\left( x \right) \in \Z$. However, since $\log_{10}\left( x 55\right)$ can only be approximated, filtering this way is 56impossible. 57 58One possibility is the following approach. In 59floating point arithmetic, in order to be in a situation of difficult 60rounding, not only $\log_{10}\left( x \right)$ must be algebraic but 61also the corresponding $x = 10^k$, $k \in \Z$, must be representable 62in floating point. To start with eliminating cases, we can argue that 63this impossible for all $k < 0$. Indeed, since $2 \nmid 5$, there 64exist no $m \in \N$ and $e \in \Z$ for any $k \in \Z^-$ such that 65$10^k = 2^e \cdot m$ \cite{Muller97}. So we have reduced the range of 66cases to filter to all $x = 10^k$, $k \in \N \cup \left \lbrace 670\right \rbrace$. Further in double precision, the mantissa's length 68is $53$. So $10^k = 2^k \cdot 5^k = 2^e \cdot m$ is exactly 69representable in double precision only for values $k \in \N \cup \left 70\lbrace 0 \right \rbrace$ such that $5^k \leq 2^{53}$. This yields to 71$k \leq 53 \cdot \frac{\ln\left( 2 \right)}{\ln\left( 5 \right)} 72\approx 22.82$; hence $0 \leq k \leq 22$. In consequence, it would be 73possible to filter out the $23$ arguments $x = 10^k$ with $k \in \left 74\lbrace 0 \dots 22 \right \rbrace$. Nevertheless, this approach would 75be relatively costly. It is not the way that has been chosen for the 76implementation presented here. 77 78Our approach uses the critical worst case accuracy of the 79elementary function $\log_{10}\left( x \right)$. As already mentioned, 80it is $2^{-122}$. Under the condition that we can provide an 81approximation to the function that is exact to at least $2^{-123}$, we 82can decide the directed rounding using a modified final rounding 83sequence: We know that a $1$ after a long series of $0$s (respectively 84a $0$ after a long series of $1$s) must be present at least at the 85$122$th bit of the intermediate mantissa. If it is not, we can 86consider a potentially present $1$ after the $122$th bit to be an 87approximation error artefact. In fact this means neglecting $\delta$s 88relatively less than $2^{-122}$ when rounding $\diamond \left( f\left( 89x \right) + \delta \right)$ instead of $\diamond \left( f\left( x 90\right) \right)$. 91 92One shortcoming of this approach is that the 93accurate phase is launched for arguments where the quick phase's 94accuracy would suffice to return the correct result. As 95such arguments are extremely rare ($p = \frac{23}{2^{63}} \approx 2.5 96\cdot 10^{-18}$~!), this is not of an issue. The modification of the 97final rounding sequence is relatively lightweight: merely one floating 98point multiplication, two integer masks and one integer comparison 99have to be added to handle the case. 100 101One remarks this approach is only possible because the critical worst 102case accuracy of the function is known by Lef{\`e}vre's works. Ziv's oignon peeling 103strategy without the filtering of the 104$23$ possible cases in input and without any accuracy limitation for 105intermediate computations yields to nontermination of the 106implementation of the function on such arguments $x = 10^k$. 107 108An earlier \crlibm\ implementation of the $\log_{10}\left(x\right)$ 109function based on the SCS format did not handle the problem and 110returned incorrectly rounded results for inputs $x = 10^k$ in the 111directed rounding modes. 112 113 114\section{General outline of the algorithm and accuracy estimates}\label{subsec:outlinelog10} 115% 1/2 page 116% 117% - Multiply by the right constant, this time using a triple double for the constant => Mul33 which is costly 118% - Tell about the need to gain some bits in the log for the worst case => renormalize at some point in the code 119% - Analyse the issue of integer powers of 10 => give explanation that there are only 17 cases 120% - Indicate the way the final rounding sequence for triple double can be modified => additional costs 121% - Mention that we launch the accurate phase even for results where the quick phase result suffices (10^n), 122% analyse the problem and mention that it is unique for log10 (in the usual list of elementary functions) 123% but that it is quasi impossible to get around it (tell that log10 in SCS did not correctly treat the problem) 124 125The quick phase of the implementation of the $\log_{10}\left( x 126\right)$ follows exactly the scheme depicted by equation 127(\ref{eq:log10}) above. Similarly to the logarithm in base 128$2$, the natural logarithm's intermediate double-double result is 129multiplied by a double-double precision approximation of 130$\mathit{log10inv}$. The rounding test is slightly modified in order 131to ensure safe rounding or launching the accurate phase. 132 133Concerning the accurate phase, some modifications in the natural 134logarithm's code are necessary because of the tighter accuracy bound 135needed for the worst case. The natural logarithms accurate phase 136polynomial approximation relative error has already been less than 137$2^{-125}$ which is exact enough for $\log_{10}\left( x 138\right)$. The fact that the complete triple-double implementation is 139exact to only $119$ bits, is mainly due to the inexactness of the 140operators used in reconstruction phase. In turn, this inexactness is 141caused by the relatively high overlap in the triple-double numbers 142handled. By adding two additional renormalisations the triple-double 143operators become exact enough. 144 145The constant $\mathit{log10inv}$ cannot be stored in double-double 146precision with an accuracy of $124$ bits. A triple-double 147approximation is therefore used. Its relative approximation error is 148smaller than $2^{-159}$. The final multiplication of the triple-double 149constant representing $\mathit{log10inv}$ and the triple-double 150natural logarithm result is performed by a \MulTT. The relative error 151of this operator on non-overlapping triple-doubles is not greater than 152$2^{-140}$. This last operation therefore offers a large accuracy 153overkill. 154 155TODO The combination of the previous errors should be verified in Gappa. 156 157 158\section{Timings}\label{subsec:timingslog10} 159 160We compare \crlibm's portable triple-double implementation 161for $\log_{10}\left( x \right)$ to other correctly rounded and 162not-correctly rounded implementations. ``\crlibm\ portable using 163\scslib'' is the timing for the earlier implementation in {\tt 164 crlibm}, which has been superseded by the one depicted here since 165version 0.10$\beta$. This earlier implementation was completely based 166on the SCS format and did not contain a quick phase implemented in 167double precision arithmetic. The values are given in arbitrary units 168and obtained on a IBM Power 5 processor with gcc 3.3.3 on a Linux 169Kernel 2.6.5. 170 171\begin{table}[h] 172 \begin{center} 173\begin{tabular}{|l|r|r|} 174 \hline 175 Library & avg time & max time \\ 176 \hline 177 \hline 178 \multicolumn{3}{|c|}{Power5 / Linux-2.6 / gcc-3.3} \\ 179 \hline 180 \texttt{MPFR} & 9490 & 84478 \\ 181 \hline 182 \crlibm\ portable using \texttt{scslib} & 2624 & 2744 \\ 183 \hline 184 \crlibm\ portable using triple-double & 60 & 311 \\ 185 \hline 186 default \texttt{libm} (not correctly rounded) & 66 & 71 \\ 187 \hline 188 \hline 189 \multicolumn{3}{|c|}{PentiumM / Linux-2.6 / gcc-4.0} \\ 190 \hline 191 \crlibm\ portable using triple-double & 304 & 1529 \\ 192 \hline 193 default \texttt{libm} (not correctly rounded) & 153 & 1904 \\ 194 \hline 195 \hline 196\end{tabular} 197\end{center} 198\caption{Log10 timings on Power5 and PentiumM architectures} 199\label{Log10timings} 200\end{table} 201 202On average, our triple-double based implementation is even $10\%$ 203faster than its incorrectly rounding counterpart on Power. On 204Pentium, we observe the usual factor 2 with respect to an 205implementation using double-extended arithmetic. Worst case timings 206are acceptable in both cases. 207 208 209 210%%% Local Variables: 211%%% mode: latex 212%%% TeX-master: "crlibm" 213%%% End: 214