.. _using:

Using ball arithmetic
===============================================================================

This section gives an introduction to working with
real numbers in Arb (see :ref:`arb` for the API and technical documentation).
The general principles carry over to complex numbers, polynomials and
matrices.

Ball semantics
-------------------------------------------------------------------------------

Let `f : A \to B` be a function.
A ball implementation of `f` is a function `F` that maps sets `X \subseteq A`
to sets `F(X) \subseteq B` subject to the following rule:

    For all `x \in X`,
    we have `f(x) \in F(X)`.

In other words, `F(X)` is an *enclosure* for the set `\{f(x) : x \in X\}`.
This rule is sometimes called the *inclusion principle*.

Throughout the documentation (except where otherwise noted),
we will simply write `f(x)` instead of `F(X)`
when describing ball implementations of pointwise-defined mathematical
functions, understanding that the input is a set of point values and that
the output is an enclosure.

General subsets of `\mathbb{R}` cannot be
represented on a computer. Instead, we work with subsets of the form
`[m \pm r] = [m-r, m+r]` where the midpoint *m* and radius *r* are binary
floating-point numbers, i.e. numbers of the form `u 2^v` with `u, v \in \mathbb{Z}`
(to make this scheme complete,
we also need to adjoin the special floating-point
values `-\infty`, `+\infty` and `\operatorname{NaN}`).

Given a ball `[m \pm r]` with `m \in \mathbb{R}` (not necessarily a
floating-point number),
we can always round *m* to a nearby floating-point number that has
at most *prec* bits in the component *u*,
and add an upper bound for the rounding error to *r*.
In Arb, ball functions that take a *prec* argument as input
(e.g. :func:`arb_add`) always round their output to *prec* bits.
Some functions are always exact (e.g.
:func:`arb_neg`), and thus do not take a *prec* argument.

The programming interface resembles that of GMP.
Each :type:`arb_t` variable must be initialized with :func:`arb_init` before use
(this also sets its value to zero), and deallocated with :func:`arb_clear`
after use. Variables have pass-by-reference semantics.
In the list of arguments to a function, output variables come first,
followed by input variables, and finally the precision:

.. code-block:: c

    #include "arb.h"

    int main()
    {
        arb_t x, y;
        arb_init(x); arb_init(y);
        arb_set_ui(x, 3);           /* x = 3 */
        arb_const_pi(y, 128);       /* y = pi, to 128 bits */
        arb_sub(y, y, x, 53);       /* y = y - x, to 53 bits */
        arb_clear(x); arb_clear(y);
    }

Binary and decimal
-------------------------------------------------------------------------------

While the internal representation uses binary floating-point numbers,
it is usually preferable to print numbers in decimal. The binary-to-decimal
conversion generally requires rounding. Three different methods
are available for printing a number to standard output:

* :func:`arb_print` shows the exact internal representation of a ball, with binary exponents.
* :func:`arb_printd` shows an inexact view of the internal representation, approximated by decimal floating-point numbers.
* :func:`arb_printn` shows a *decimal ball* that is guaranteed to be an enclosure of the binary
  floating-point ball. By default, it only prints digits in the midpoint that are certain to
  be correct, up to an error of at most one unit in the last place.
  Converting from binary to decimal is generally inexact, and the output of this
  method takes this rounding into account when printing the radius.

This snippet computes a 53-bit enclosure of `\pi` and prints it
in three ways:

.. code-block:: c

    arb_const_pi(x, 53);
    arb_print(x); printf("\n");
    arb_printd(x, 20); printf("\n");
    arb_printn(x, 20, 0); printf("\n");

The output is:

.. code-block:: text

    (884279719003555 * 2^-48) +/- (536870913 * 2^-80)
    3.141592653589793116 +/- 4.4409e-16
    [3.141592653589793 +/- 5.61e-16]

The :func:`arb_get_str` and :func:`arb_set_str` methods are useful for
converting rigorously between decimal strings and binary balls
(:func:`arb_get_str` produces the same string as :func:`arb_printn`,
and :func:`arb_set_str` can parse such strings back).

A common mistake is to create a ball from a ``double`` constant
such as ``2.3``, when this actually represents
``2.29999999999999982236431605997495353221893310546875``.
To produce a ball containing the rational number
`23/10`, one of the following can be used:

.. code-block:: c

    arb_set_str(x, "2.3", prec);

    arb_set_ui(x, 23);
    arb_div_ui(x, x, 10, prec);

    fmpq_set_si(q, 23, 10);      /* q is a FLINT fmpq_t */
    arb_set_fmpq(x, q, prec);

Quality of enclosures
-------------------------------------------------------------------------------

The main problem when working with ball arithmetic (or interval arithmetic)
is *overestimation*. In general, the enclosure of a value or set
of values as computed with ball arithmetic will be larger than the smallest
possible enclosure.

Overestimation results naturally from rounding errors and cancellations
in the individual steps of a calculation.
As a general principle, formula rewriting techniques that make
floating-point code more numerically stable also make ball arithmetic code
more numerically stable, in the sense of producing tighter enclosures.

As a result of the *dependency problem*, ball or interval
arithmetic can produce error
bounds that are much larger than the actual numerical errors
resulting from doing floating-point arithmetic.
Consider the expression `(x + 1) - x` as an example.
When evaluated in floating-point
arithmetic, `x` may have a large initial error. However, that error will
cancel itself out in the subtraction, so that the result equals 1
(except perhaps for a small rounding error left from the operation `x + 1`).
In ball arithmetic, dependent errors add up instead of cancelling out.
If `x = [3 \pm 0.1]`, the result will be `[1 \pm 0.2]`, where
the error bound has doubled.
In unfavorable circumstances, error bounds can grow exponentially
with the number of steps.

If all inputs to a calculation are "point values", i.e.
exact numbers and known mathematical constants that can
be approximated arbitrarily closely (such as `\pi`), then an error
of order `2^n` can typically be overcome by working with *n* extra bits of
precision, increasing the computation time by an amount
that is polynomial in *n*.
In certain situations, however, overestimation leads to exponential
slowdown or even failure of an algorithm to converge.
For example, root-finding algorithms that refine the result iteratively
may fail to converge in ball arithmetic, even if they do converge in plain
floating-point arithmetic.

Therefore, ball arithmetic is not a silver bullet: there will always
be situations where some amount of numerical or mathematical analysis
is required. Some experimentation may be needed to find whether
(and how) it can be used effectively for a given problem.

Predicates
-------------------------------------------------------------------------------

A ball implementation of a predicate
`f : \mathbb{R} \to \{\operatorname{True}, \operatorname{False}\}`
would need to be able to return a third logical value indicating
that the result could be either True or False.
In most cases, predicates in Arb are implemented as
functions that return the *int* value 1 to indicate that the
result certainly is True, and the *int* value 0 to indicate
that the result could be either True or False.
To test whether a predicate certainly is False, the user must
test whether the negated predicate certainly is True.

For example, the following code would *not* be correct in general:

.. code-block:: c

    if (arb_is_positive(x))
    {
        ...  /* do things assuming that x > 0 */
    }
    else
    {
        ...  /* do things assuming that x <= 0 */
    }

Instead, the following can be used:

.. code-block:: c

    if (arb_is_positive(x))
    {
        ...  /* do things assuming that x > 0 */
    }
    else if (arb_is_nonpositive(x))
    {
        ...  /* do things assuming that x <= 0 */
    }
    else
    {
        ...  /* do things assuming that the sign of x is unknown */
    }

Likewise, we will write `x \le y` in mathematical notation with the meaning
that `x \le y` holds for all `x \in X, y \in Y` where `X` and `Y` are balls.

Note that some predicates such as :func:`arb_overlaps` and :func:`arb_contains`
actually are predicates on balls viewed as sets, and not ball implementations
of pointwise predicates.

Some predicates are also complementary.
For example, :func:`arb_contains_zero` tests whether the input ball
contains the point zero.
Negated, it is equivalent to :func:`arb_is_nonzero`,
and complementary to :func:`arb_is_zero` as a pointwise predicate:

.. code-block:: c

    if (arb_is_zero(x))
    {
        ...  /* do things assuming that x = 0 */
    }
    #if 1
    else if (arb_is_nonzero(x))
    #else
    else if (!arb_contains_zero(x))  /* equivalent */
    #endif
    {
        ...  /* do things assuming that x != 0 */
    }
    else
    {
        ...  /* do things assuming that the sign of x is unknown */
    }

A worked example: the sine function
-------------------------------------------------------------------------------

We implement the function `\sin(x)` naively using
the Taylor series `\sum_{k=0}^{\infty} (-1)^k x^{2k+1} / (2k+1)!`
and :type:`arb_t` arithmetic.
Since there are infinitely many terms, we need to split the series
in two parts: a finite sum that can be evaluated directly, and
a tail that has to be bounded.

We stop as soon as we reach a term `t` bounded by `|t| \le 2^{-prec} < 1`.
The terms are alternating and must have decreasing magnitude
from that point, so the tail of the series
is bounded by `|t|`. We add this magnitude to the radius
of the output. Since ball arithmetic automatically bounds the numerical errors
resulting from all arithmetic operations, the output *res* is a
ball guaranteed to contain `\sin(x)`.

.. code-block:: c

    #include "arb.h"

    void arb_sin_naive(arb_t res, const arb_t x, slong prec)
    {
        arb_t s, t, u, tol;
        slong k;
        arb_init(s); arb_init(t); arb_init(u); arb_init(tol);

        arb_one(tol);
        arb_mul_2exp_si(tol, tol, -prec);  /* tol = 2^-prec */

        for (k = 0; ; k++)
        {
            arb_pow_ui(t, x, 2 * k + 1, prec);
            arb_fac_ui(u, 2 * k + 1, prec);
            arb_div(t, t, u, prec);  /* t = x^(2k+1) / (2k+1)! */

            arb_abs(u, t);
            if (arb_le(u, tol))      /* if |t| <= 2^-prec */
            {
                arb_add_error(s, u); /* add |t| to the radius and stop */
                break;
            }

            if (k % 2 == 0)
                arb_add(s, s, t, prec);
            else
                arb_sub(s, s, t, prec);
        }

        arb_set(res, s);
        arb_clear(s); arb_clear(t); arb_clear(u); arb_clear(tol);
    }

This algorithm is naive, because the Taylor series is slow to converge
and suffers from catastrophic cancellation when `|x|` is large
(we could also improve the efficiency of the code slightly by
computing the terms using recurrence relations instead of
computing `x^k` and `k!` from scratch in each iteration).

As a test, we compute `\sin(2016.1)`.
The largest term in the Taylor series for `\sin(x)` reaches
a magnitude of about `x^x / x!`, or about `10^{873}` in this case.
Therefore, we need over 873 digits (about 3000 bits) of precision
to overcome the catastrophic cancellation and determine
the result with sufficient accuracy to tell whether it is positive
or negative.

.. code-block:: c

    int main()
    {
        arb_t x, y;
        slong prec;
        arb_init(x); arb_init(y);

        for (prec = 64; ; prec *= 2)
        {
            arb_set_str(x, "2016.1", prec);
            arb_sin_naive(y, x, prec);
            printf("Using %5ld bits, sin(x) = ", prec);
            arb_printn(y, 10, 0); printf("\n");
            if (!arb_contains_zero(y))  /* stopping condition */
                break;
        }

        arb_clear(x); arb_clear(y);
    }

The program produces the following output:

.. code-block:: text

    Using    64 bits, sin(x) = [+/- 2.67e+859]
    Using   128 bits, sin(x) = [+/- 1.30e+840]
    Using   256 bits, sin(x) = [+/- 3.60e+801]
    Using   512 bits, sin(x) = [+/- 3.01e+724]
    Using  1024 bits, sin(x) = [+/- 2.18e+570]
    Using  2048 bits, sin(x) = [+/- 1.22e+262]
    Using  4096 bits, sin(x) = [-0.7190842207 +/- 1.20e-11]

As an exercise, the reader may improve the naive algorithm by making it
subtract a well-chosen multiple of `2 \pi` from `x` before invoking
the Taylor series (hint: use :func:`arb_const_pi`, :func:`arb_div`
and :func:`arf_get_fmpz`).
If done correctly, 64 bits of precision should be more than enough to
compute `\sin(2016.1)`, and with minor adjustments
to the code, the user should be able to compute
`\sin(\exp(2016.1))` quite easily as well.

This example illustrates how ball arithmetic can be used to perform
nontrivial calculations. To evaluate an infinite series, the user
needs to know how to bound the tail of the series, but everything
else is automatic.
When evaluating a finite formula that can be expressed
completely using built-in functions, all error bounding is automatic
from the point of view of the user.
In particular, the :func:`arb_sin` method should be used to compute the sine
of a real number; it uses a much more efficient algorithm
than the naive code above.

This example also illustrates the "guess-and-verify" paradigm:
instead of determining *a priori* the floating-point precision necessary
to get a correct result, we *guess* some initial precision, use ball arithmetic
to *verify* that the result is accurate enough, and restart with
higher precision (or signal failure) if it is not.

If we think of rounding errors as essentially random processes,
then a floating-point computation is analogous to a
*Monte Carlo algorithm*.
Using ball arithmetic to get a verified result
effectively turns it into the analog of a *Las Vegas algorithm*,
which is a randomized algorithm that always gives a correct result if it
terminates, but may fail to terminate (alternatively, instead of actually
looping forever, it might signal failure after a certain number of iterations).

The loop will fail to terminate if we attempt to determine the sign of
`\sin(\pi)`:

.. code-block:: text

    Using    64 bits, sin(x) = [+/- 3.96e-18]
    Using   128 bits, sin(x) = [+/- 2.17e-37]
    Using   256 bits, sin(x) = [+/- 6.10e-76]
    Using   512 bits, sin(x) = [+/- 5.13e-153]
    Using  1024 bits, sin(x) = [+/- 4.01e-307]
    Using  2048 bits, sin(x) = [+/- 2.13e-615]
    Using  4096 bits, sin(x) = [+/- 6.85e-1232]
    Using  8192 bits, sin(x) = [+/- 6.46e-2465]
    Using 16384 bits, sin(x) = [+/- 5.09e-4931]
    Using 32768 bits, sin(x) = [+/- 5.41e-9863]
    ...

The sign of a nonzero real number can be
decided by computing it to sufficiently high accuracy, but the sign
of an expression that is exactly equal to zero cannot be decided
by a numerical computation unless the entire computation
happens to be exact (in this example, we could use the :func:`arb_sin_pi`
function, which computes `\sin(\pi x)` in one step, with the input `x = 1`).

It is up to the user to implement a stopping criterion appropriate for
the circumstances of a given application. For example, breaking
when it is clear that `|\sin(x)| < 10^{-10000}` would allow the program
to terminate and convey some meaningful information about the input `x = \pi`,
though this would not constitute a mathematical proof that
`\sin(\pi) = 0`.

More on precision and accuracy
-------------------------------------------------------------------------------

The relation between the working precision and the accuracy of the output
is not always easy to predict.
The following remarks might help
to choose *prec* optimally.

For a ball `[m \pm r]`, it is convenient to define the following notions:

* Absolute error: `e_{abs} = |r|`
* Relative error: `e_{rel} = |r| / \max(0, |m| - |r|)` (or `e_{rel} = 0` if `r = m = 0`)
* Absolute accuracy: `a_{abs} = 1 / e_{abs}`
* Relative accuracy: `a_{rel} = 1 / e_{rel}`

Expressed in bits, one takes the corresponding `\log_2` values.

Of course, if `x` is the exact value being approximated, then
the "absolute error" so defined is an upper bound for the
actual absolute error `|x-m|`, and the "absolute accuracy"
a lower bound for `1/|x-m|`, etc.

The *prec* argument in Arb should be thought of as controlling
the working precision.
Generically, when evaluating a fixed expression (that is, when the
sequence of operations does not depend on the precision), the
absolute or relative error will be bounded by

.. math ::

    2^{O(1) - prec}

where the `O(1)` term depends on the expression and implementation
details of the ball functions used to evaluate it.
Accordingly, for an accuracy of *p* bits, we need to use a working precision
of `O(1) + p` bits.
If the expression is numerically well-behaved, then the `O(1)` term
will be small, which leads to the heuristic of "adding a few guard bits"
(for most basic calculations, 10 or 20 guard bits are enough).
If the `O(1)` term is unknown, then increasing the number of guard
bits in exponential steps until the result is accurate enough
is generally a good strategy.

Sometimes, a partially accurate result can be used to estimate the `O(1)`
term. For example, if the goal is to achieve 100 bits of accuracy
and a precision of 120 bits yields 80 bits of accuracy, then
it is plausible that a precision of just over
140 bits yields 100 bits of accuracy.

Built-in functions in Arb can roughly be characterized as
belonging to one of two extremes (though there is actually a spectrum):

* Simple operations, including basic arithmetic operations and many
  elementary functions. In most cases, for an input `x = [m \pm r]`,
  `f(x)` is evaluated by computing `f(m)` and then separately bounding the
  *propagated error* `|f(m) - f(m + \varepsilon)|, |\varepsilon| \le r`.
  The working precision is automatically increased internally
  so that `f(m)` is computed to *prec* bits of relative accuracy
  with an error of at most a few units in the last place (perhaps with
  rare exceptions).
  The propagated error can generally be bounded quite tightly as well
  (see :ref:`general_formulas`).
  As a result, the enclosure will be close to the best possible
  at the given precision, and the user can estimate the precision to use
  accordingly.

* Complex operations, such as certain higher
  transcendental functions (for example, the Riemann zeta function).
  The function is evaluated by performing a sequence of simpler operations,
  each using ball arithmetic with a working precision of roughly *prec*
  bits. The sequence of operations might depend on *prec*;
  for example, an infinite series might be truncated
  so that the remainder is smaller than `2^{-prec}`.
  The final result can be far from tight, and it is not guaranteed
  that the error converges to zero as `prec \to \infty`, though
  in practice, it should do so in most cases.

In short, the *inclusion principle* is the fundamental contract in Arb.
Enclosures computed by built-in functions may or may not be tight
enough to be useful, but the hope is that they will be sufficient
for most purposes.
Tightening the error bounds for more complex operations is a long-term
optimization goal, which in many cases will require a
fair amount of research.
A tradeoff also has to be made for efficiency: tighter error bounds
allow the user to work with a lower precision, but they may
also be much more expensive to compute.

Polynomial time guarantee
-------------------------------------------------------------------------------

Arb provides a soft guarantee that the time used to evaluate a ball
function will depend polynomially on *prec* and the bit size
of the input, uniformly regardless of the numerical value of the input.

The idea behind this soft guarantee is to allow Arb to be used as a
black box to evaluate expressions numerically without potentially
slowing down, hanging indefinitely or crashing
because of "bad" input such as nested exponentials.
By controlling the precision, the user can cancel
a computation before it uses up
an unreasonable amount of resources,
without having to rely on other timeout or exception mechanisms.
A result that is feasible but very expensive to compute
can still be forced by setting the precision high enough.

As motivation, consider evaluating `\sin(x)` or `\exp(x)` with
the exact floating-point number
`x = 2^{2^n}` as input.
The time and space required to compute an accurate floating-point
approximation of `\sin(x)` or `\exp(x)` increases as `2^n`,
in the first case because of the need to subtract an accurate
multiple of `2\pi` and in the second case due to the size of the
output exponent and the internal subtraction of an accurate multiple of `\log(2)`.
This is despite the fact that the size of `x` as an object in memory only
increases linearly with `n`.
Already `n = 33` would require at least 1 GB of memory, and
`n = 100` would be physically impossible to process.
For functions that are computed by direct use of power series expansions,
e.g.
`f(x) = \sum_{k=0}^{\infty} c_k x^k`,
without fast argument-reduction techniques
like those for elementary functions,
the time would be exponential in `n` already when `x = 2^n`.

Therefore, Arb caps internal work parameters
(the internal working precision,
the number of terms of an infinite series to add, etc.) by polynomial,
usually linear, functions of *prec*.
When the limit is exceeded, the output is set to a crude bound.
For example, if `x` is too large, :func:`arb_sin` will
simply return `[\pm 1]`, and :func:`arb_exp`
will simply return `[\pm \infty]` if `x` is positive
or `[\pm 2^{-m}]` if `x` is negative.

This is not just a failsafe, but occasionally a useful optimization.
It is not entirely uncommon to have formulas where one term
is modest and another term decreases exponentially, such as:

.. math ::

    \log(x) + \sin(x) \exp(-x).

For example, the reflection formula of the digamma function has
a similar structure.
When `x` is large, the right term would be expensive to compute
to high relative accuracy. Doing so is unnecessary, however,
since a crude bound of `[\pm 1] \cdot [\pm 2^{-m}]` is enough to evaluate
the expression as a whole accurately.

The polynomial time guarantee is "soft" in that there are a few exceptions.
For example, the complexity of computing the Riemann zeta function
`\zeta(\sigma+it)` increases linearly with the imaginary height `|t|`
in the current implementation, and all known algorithms
have a complexity of `|t|^{\alpha}` where the best known value for `\alpha`
is about `0.3`.
Input with large `|t|` is most likely to be given deliberately
by users with the explicit intent of evaluating the zeta
function itself, so the evaluation is not cut off automatically.