1.. _longlong:
2
3**longlong.h** -- support functions for multi-word arithmetic
4===============================================================================
5
6
7Auxiliary asm macros
8--------------------------------------------------------------------------------
9
10
11.. macro:: umul_ppmm(high_prod, low_prod, multipler, multiplicand)
12
13    Multiplies two single limb integers ``MULTIPLER`` and
14    ``MULTIPLICAND``, and generates a two limb product in
15    ``HIGH_PROD`` and ``LOW_PROD``.
16
17.. macro:: smul_ppmm(high_prod, low_prod, multipler, multiplicand)
18
19    As for ``umul_ppmm()`` but the numbers are signed.
20
21.. macro:: udiv_qrnnd(quotient, remainder, high_numerator, low_numerator, denominator)
22
23    Divides an unsigned integer, composed by the limb integers
24    ``HIGH_NUMERATOR`` and\\ ``LOW_NUMERATOR``, by ``DENOMINATOR``
25    and places the quotient in ``QUOTIENT`` and the remainder in
26    ``REMAINDER``.  ``HIGH_NUMERATOR`` must be less than
27    ``DENOMINATOR`` for correct operation.
28
29.. macro:: sdiv_qrnnd(quotient, remainder, high_numerator, low_numerator, denominator)
30
31    As for ``udiv_qrnnd()`` but the numbers are signed.  The quotient is
32    rounded towards `0`. Note that as the quotient is signed it must lie in
33    the range `[-2^63, 2^63)`.
34
35.. macro:: count_leading_zeros(count, x)
36
37    Counts the number of zero-bits from the msb to the first non-zero bit
38    in the limb ``x``.  This is the number of steps ``x`` needs to
39    be shifted left to set the msb. If ``x`` is `0` then count is
40    undefined.
41
42.. macro:: count_trailing_zeros(count, x)
43
44    As for ``count_leading_zeros()``, but counts from the least
45    significant end. If ``x`` is zero then count is undefined.
46
47.. macro:: add_ssaaaa(high_sum, low_sum, high_addend_1, low_addend_1, high_addend_2, low_addend_2)
48
49    Adds two limb integers, composed by ``HIGH_ADDEND_1`` and
50    ``LOW_ADDEND_1``, and\\ ``HIGH_ADDEND_2`` and ``LOW_ADDEND_2``,
51    respectively.  The result is placed in ``HIGH_SUM`` and
52    ``LOW_SUM``.  Overflow, i.e.\ carry out, is not stored anywhere,
53    and is lost.
54
55.. macro:: add_sssaaaaaa(high_sum, mid_sum, low_sum, high_addend_1, mid_addend_1, low_addend_1, high_addend_2, mid_addend_2, low_addend_2)
56
57    Adds two three limb integers. Carry out is lost.
58
59.. macro:: sub_ddmmss(high_difference, low_difference, high_minuend, low_minuend, high_subtrahend, low_subtrahend)
60
61    Subtracts two limb integers, composed by ``HIGH_MINUEND_1`` and
62    ``LOW_MINUEND_1``, and ``HIGH_SUBTRAHEND_2`` and
63    ``LOW_SUBTRAHEND_2``, respectively.  The result is placed in\\
64    ``HIGH_DIFFERENCE`` and ``LOW_DIFFERENCE``.  Overflow, i.e.\
65    carry out is not stored anywhere, and is lost.
66
67.. macro:: sub_dddmmmsss(high_diff, mid_diff, low_diff, high_minuend_1, mid_minuend_1, low_minuend_1, high_subtrahend_2, mid_subtrahend_2, low_subtrahend_2)
68
69    Subtracts two three limb integers. Borrow out is lost.
70
71.. macro:: byte_swap(x)
72
73    Swap the order of the bytes in the word `x`, i.e. most significant byte
74    becomes least significant byte, etc.
75
76.. macro:: invert_limb(invxl, xl)
77
78    Computes an approximate inverse ``invxl`` of the limb ``xl``,
79    with an implicit leading~`1`. More formally it computes::
80
81        invxl = (B^2 - B*x - 1)/x = (B^2 - 1)/x - B
82
83    Note that `x` must be normalised, i.e.\ with msb set. This inverse
84    makes use of the following theorem of Torbjorn Granlund and Peter
85    Montgomery~[Lemma~8.1][GraMon1994]_:
86
87    Let `d` be normalised, `d < B`, i.e.\ it fits in a word, and suppose
88    that `m d < B^2 \leq (m+1) d`. Let `0 \leq n \leq B d - 1`.  Write
89    `n = n_2 B + n_1 B/2 + n_0` with `n_1 = 0` or `1` and `n_0 < B/2`.
90    Suppose `q_1 B + q_0 = n_2 B + (n_2 + n_1) (m - B) + n_1 (d-B/2) + n_0`
91    and `0 \leq q_0 < B`. Then `0 \leq q_1 < B` and `0 \leq n - q_1 d < 2 d`.
92
93    In the theorem, `m` is the inverse of `d`. If we let
94    ``m = invxl + B`` and `d = x` we have `m d = B^2 - 1 < B^2` and
95    `(m+1) x = B^2 + d - 1 \geq B^2`.
96
97    The theorem is often applied as follows: note that `n_0` and `n_1 (d-B/2)`
98    are both less than `B/2`. Also note that `n_1 (m-B) < B`. Thus the sum of
99    all these terms contributes at most `1` to `q_1`. We are left with
100    `n_2 B + n_2 (m-B)`. But note that `(m-B)` is precisely our precomputed
101    inverse ``invxl``. If we write `q_1 B + q_0 = n_2 B + n_2 (m-B)`,
102    then from the theorem, we have `0 \leq n - q_1 d < 3 d`, i.e.\ the
103    quotient is out by at most `2` and is always either correct or too small.
104
105.. macro:: udiv_qrnnd_preinv(q, r, nh, nl, d, di)
106
107    As for ``udiv_qrnnd()`` but takes a precomputed inverse ``di`` as
108    computed by ``invert_limb()``. The algorithm, in terms of the theorem
109    above, is::
110
111        nadj = n1*(d-B/2) + n0
112        xh, xl = (n2+n1)*(m-B)
113        xh, xl += nadj + n2*B ( xh, xl = n2*B + (n2+n1)*(m-B) + n1*(d-B/2) + n0 )
114        _q1 = B - xh - 1
115        xh, xl = _q1*d + nh, nl - B*d = nh, nl - q1*d - d so that xh = 0 or -1
116        r = xl + xh*d where xh is 0 if q1 is off by 1, otherwise -1
117        q = xh - _q1 = xh + 1 + n2
118
119