1This chapter is contributed by Ch. Q. Lauter.
2
3\section{Main considerations, critical accuracy bounds}\label{subsec:criticalboundslog10}
4If one wants to guarantee that an implementation of the logarithm in
5base $10$, $\log_{10}\left( x \right)$, in double precision is
6correctly rounded, one has to ensure that the final intermediate
7approximation before rounding to double precision has a relative error
8of less than $2^{-122}$.
9
10An implementation of $\log_{10}\left(x\right)$ can also be derived from an
11implementation of the natural logarithm $\ln\left(x\right)$ using the formula:
12
13\begin{equation}
14  \log_{10}\left( x \right) = \frac{1}{\ln\left( 10 \right)} \cdot
15\ln\left( x \right)\label{eq:log10}
16\end{equation}
17When doing so, one must ensure that the constant
18$\mathit{log10inv} = \frac{1}{\ln\left(10\right)}$ is stored with
19enough accuracy, that the approximation for $\ln\left( x \right)$ is
20exact enough and that the multiplication sequence does not introduce
21too great an error. As we will see in the next section
22\ref{subsec:outlinelog10}, this implies slight changes to the code for
23the natural logarithm with regard to what has been presented in
24 Chapter \ref{chap:log}.
25
26 With regard to final rounding, the elementary function $\log_{10}$
27 presents a particular issue that is somewhat singular amongst all
28 considered elementary functions: There exist a large set of input
29 double-precision numbers $x$ such that $\log_{10}(x)$ is a rational
30 and is representable as a double-precision number.
31
32 For such cases, a final directed rounding will be correct only if the
33 approximation error is exactly $0$.
34 Indeed, the rounding $\diamond\left( f\left( x \right) \right)$ of
35 the exactly representably value $f\left(x\right) \in \F$ is trivially
36 $\diamond\left( f\left( x \right) \right) = f\left( x \right)$
37 \cite{IEEE754}. In contrast, $\diamond\left( f\left( x \right) +
38   \delta \right) \not = f\left( x \right) \in \F$ holds for all
39 $\left \vert \delta \right \vert > 0$.
40
41 As it is impossible to achieve an approximation error exactly equal
42 to zero, it is preferable to filter out such cases and handle them
43 separately. Other functions described so far had only one such argument
44 ($x=1$ for $\log$, x=0 for the trigs). $\log_2$ has a set of such
45 cases ($x = 2^k$, $k \in \Z$) which is equally trivial to handle in
46 binary floating point.
47
48
49For $f = \log_{10}$, filtering much more difficult. In fact, $y =
50\log\left( x \right)$ is algebraic for exactly all $x = 10^k$, $k \in
51\Z$ \cite{Baker75}. Filtering means thus testing whether an input $x$
52can be written $x = 10^k$ with an integer $k \in \Z$. This is
53equivalent to testing if $\log_{10}\left( x \right)$ an integer,
54i.e. $\log_{10}\left( x \right) \in \Z$. However, since $\log_{10}\left( x
55\right)$ can only be approximated, filtering this way is
56impossible.
57
58One possibility is the following approach. In
59floating point arithmetic, in order to be in a situation of difficult
60rounding, not only $\log_{10}\left( x \right)$ must be algebraic but
61also the corresponding $x = 10^k$, $k \in \Z$, must be representable
62in floating point. To start with eliminating cases, we can argue that
63this impossible for all $k < 0$. Indeed, since $2 \nmid 5$, there
64exist no $m \in \N$ and $e \in \Z$ for any $k \in \Z^-$ such that
65$10^k = 2^e \cdot m$ \cite{Muller97}. So we have reduced the range of
66cases to filter to all $x = 10^k$, $k \in \N \cup \left \lbrace
670\right \rbrace$. Further in double precision, the mantissa's length
68is $53$. So $10^k = 2^k \cdot 5^k = 2^e \cdot m$ is exactly
69representable in double precision only for values $k \in \N \cup \left
70\lbrace 0 \right \rbrace$ such that $5^k \leq 2^{53}$. This yields to
71$k \leq 53 \cdot \frac{\ln\left( 2 \right)}{\ln\left( 5 \right)}
72\approx 22.82$; hence $0 \leq k \leq 22$. In consequence, it would be
73possible to filter out the $23$ arguments $x = 10^k$ with $k \in \left
74\lbrace 0 \dots 22 \right \rbrace$. Nevertheless, this approach would
75be relatively costly. It is not the way that has been chosen for the
76implementation presented here.
77
78Our approach uses the critical worst case accuracy of the
79elementary function $\log_{10}\left( x \right)$. As already mentioned,
80it is $2^{-122}$. Under the condition that we can provide an
81approximation to the function that is exact to at least $2^{-123}$, we
82can decide the directed rounding using a modified final rounding
83sequence: We know that a $1$ after a long series of $0$s (respectively
84a $0$ after a long series of $1$s) must be present at least at the
85$122$th bit of the intermediate mantissa.  If it is not, we can
86consider a potentially present $1$ after the $122$th bit to be an
87approximation error artefact. In fact this means neglecting $\delta$s
88relatively less than $2^{-122}$ when rounding $\diamond \left( f\left(
89x \right) + \delta \right)$ instead of $\diamond \left( f\left( x
90\right) \right)$.
91
92One shortcoming of this approach is that the
93accurate phase is launched for arguments where the quick phase's
94accuracy would suffice to return the correct result. As
95such arguments are extremely rare ($p = \frac{23}{2^{63}} \approx 2.5
96\cdot 10^{-18}$~!), this is not of an issue. The modification of the
97final rounding sequence is relatively lightweight: merely one floating
98point multiplication, two integer masks and one integer comparison
99have to be added to handle the case.
100
101One remarks this approach is only possible because the critical worst
102case accuracy of the function is known by Lef{\`e}vre's works. Ziv's oignon peeling
103strategy without the filtering of the
104$23$ possible cases in input and without any accuracy limitation for
105intermediate computations yields to nontermination of the
106implementation of the function on such arguments $x = 10^k$.
107
108An earlier \crlibm\ implementation of the $\log_{10}\left(x\right)$
109function based on the SCS format did not handle the problem and
110returned incorrectly rounded results for inputs $x = 10^k$ in the
111directed rounding modes.
112
113
114\section{General outline of the algorithm and accuracy estimates}\label{subsec:outlinelog10}
115% 1/2 page
116%
117% - Multiply by the right constant, this time using a triple double for the constant => Mul33 which is costly
118% - Tell about the need to gain some bits in the log for the worst case => renormalize at some point in the code
119% - Analyse the issue of integer powers of 10 => give explanation that there are only 17 cases
120% - Indicate the way the final rounding sequence for triple double can be modified => additional costs
121% - Mention that we launch the accurate phase even for results where the quick phase result suffices (10^n),
122%   analyse the problem and mention that it is unique for log10 (in the usual list of elementary functions)
123%   but that it is quasi impossible to get around it (tell that log10 in SCS did not correctly treat the problem)
124
125The quick phase of the implementation of the $\log_{10}\left( x
126\right)$ follows exactly the scheme depicted by equation
127(\ref{eq:log10}) above. Similarly to the logarithm in base
128$2$, the natural logarithm's intermediate double-double result is
129multiplied by a double-double precision approximation of
130$\mathit{log10inv}$. The rounding test is slightly modified in order
131to ensure safe rounding or launching the accurate phase.
132
133Concerning the accurate phase, some modifications in the natural
134logarithm's code are necessary because of the tighter accuracy bound
135needed for the worst case. The natural logarithms accurate phase
136polynomial approximation relative error has already been less than
137$2^{-125}$ which is exact enough for $\log_{10}\left( x
138\right)$. The fact that the complete triple-double implementation is
139exact to only $119$ bits, is mainly due to the inexactness of the
140operators used in reconstruction phase. In turn, this inexactness is
141caused by the relatively high overlap in the triple-double numbers
142handled. By adding two additional renormalisations the triple-double
143operators become exact enough.
144
145The constant $\mathit{log10inv}$ cannot be stored in double-double
146precision with an accuracy of $124$ bits. A triple-double
147approximation is therefore used. Its relative approximation error is
148smaller than $2^{-159}$. The final multiplication of the triple-double
149constant representing $\mathit{log10inv}$ and the triple-double
150natural logarithm result is performed by a \MulTT. The relative error
151of this operator on non-overlapping triple-doubles is not greater than
152$2^{-140}$. This last operation therefore offers a large accuracy
153overkill.
154
155TODO The combination of the previous errors should be verified in Gappa.
156
157
158\section{Timings}\label{subsec:timingslog10}
159
160We compare \crlibm's portable triple-double implementation
161for $\log_{10}\left( x \right)$ to other correctly rounded and
162not-correctly rounded implementations.  ``\crlibm\ portable using
163\scslib'' is the timing for the earlier implementation in {\tt
164  crlibm}, which has been superseded by the one depicted here since
165version 0.10$\beta$. This earlier implementation was completely based
166on the SCS format and did not contain a quick phase implemented in
167double precision arithmetic. The values are given in arbitrary units
168and obtained on a IBM Power 5 processor with gcc 3.3.3 on a Linux
169Kernel 2.6.5.
170
171\begin{table}[h]
172  \begin{center}
173\begin{tabular}{|l|r|r|}
174 \hline
175  Library                       &     avg time  & max time \\
176 \hline
177 \hline
178 \multicolumn{3}{|c|}{Power5 / Linux-2.6 / gcc-3.3}   \\
179 \hline
180 \texttt{MPFR}   &   9490    & 84478        \\
181 \hline
182 \crlibm\ portable using \texttt{scslib}   &   2624    & 2744        \\
183 \hline
184 \crlibm\ portable using triple-double      &        60    & 311        \\
185 \hline
186 default \texttt{libm} (not correctly rounded)   &        66    & 71      \\
187 \hline
188 \hline
189 \multicolumn{3}{|c|}{PentiumM / Linux-2.6 / gcc-4.0}   \\
190 \hline
191 \crlibm\ portable using triple-double                  &        304    & 1529      \\
192 \hline
193 default \texttt{libm}  (not correctly rounded)          &        153    & 1904      \\
194 \hline
195 \hline
196\end{tabular}
197\end{center}
198\caption{Log10 timings on Power5 and PentiumM architectures}
199\label{Log10timings}
200\end{table}
201
202On average, our triple-double based implementation is even $10\%$
203faster than its  incorrectly rounding counterpart on Power. On
204Pentium, we observe the usual factor 2 with respect to an
205implementation using double-extended arithmetic. Worst case timings
206are acceptable in both cases.
207
208
209
210%%% Local Variables:
211%%% mode: latex
212%%% TeX-master: "crlibm"
213%%% End:
214