C P4: 32 cycles/limb integer part, 30 cycles/limb fraction part.
C
C mp_limb_t mpn_divrem_1 (mp_ptr dst, mp_size_t xsize,
C                         mp_srcptr src, mp_size_t size,
C                         mp_limb_t divisor);
C mp_limb_t mpn_divrem_1c (mp_ptr dst, mp_size_t xsize,
C                          mp_srcptr src, mp_size_t size,
C                          mp_limb_t divisor, mp_limb_t carry);
C mp_limb_t mpn_preinv_divrem_1 (mp_ptr dst, mp_size_t xsize,
C                                mp_srcptr src, mp_size_t size,
C                                mp_limb_t divisor, mp_limb_t inverse,
C                                unsigned shift);
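C For orientation, a portable reference model of what these entry points are
C understood to compute. This is a sketch and not the GMP implementation: the
C names ref_divrem_1 and ref_divrem_1c are hypothetical, limbs are taken to be
C 32 bits and stored least significant first, and the meaning of "carry" as an
C incoming remainder below the divisor is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model, not GMP code.  Divides {src,size} by
   divisor, starting from an incoming remainder "carry" (assumed to be
   less than divisor).  The integer quotient limbs are written to
   dst[xsize .. xsize+size-1]; xsize further "fraction" limbs, developed by
   dividing appended zero limbs, are written to dst[0 .. xsize-1].
   Returns the final remainder. */
static uint32_t ref_divrem_1c(uint32_t *dst, size_t xsize,
                              const uint32_t *src, size_t size,
                              uint32_t divisor, uint32_t carry)
{
    assert(divisor != 0 && carry < divisor);
    uint64_t r = carry;

    for (size_t i = size; i-- > 0; ) {      /* integer part, high to low */
        uint64_t n = (r << 32) | src[i];
        dst[xsize + i] = (uint32_t)(n / divisor);
        r = n % divisor;
    }
    for (size_t i = xsize; i-- > 0; ) {     /* fraction part, high to low */
        uint64_t n = r << 32;
        dst[i] = (uint32_t)(n / divisor);
        r = n % divisor;
    }
    return (uint32_t)r;
}

/* In this model the plain entry point is the carry-in variant with carry 0. */
static uint32_t ref_divrem_1(uint32_t *dst, size_t xsize,
                             const uint32_t *src, size_t size,
                             uint32_t divisor)
{
    return ref_divrem_1c(dst, xsize, src, size, divisor, 0);
}
```

C For example, dividing the two-limb value 2^32+5 by 7 with xsize=1 gives
C integer limbs 0 and 613566757, one fraction limb 1227133513, and a returned
C remainder of 1.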
C
C Algorithm:
C
C The method and nomenclature follow "Division by Invariant Integers using
C Multiplication" by Granlund and Montgomery, reference in gmp.texi.
C
C "m" is written for what is m' in the paper, and "d" for d_norm, which
C won't cause any confusion since it's only the normalized divisor that's of
C any use in the code. "b" is written for 2^N, the size of a limb, N being
C 32 here.
C
C The step "sdword dr = n - 2^N*d + (2^N-1-q1) * d" is instead done as
C "n-d - q1*d". This rearrangement gives the same two-limb answer but lets
C us have just a psubq on the dependent chain.
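C The equality of the two forms follows by expanding the paper's expression:
C n - 2^N*d + 2^N*d - d - q1*d = (n-d) - q1*d. A small C check of this, with
C N=32 and results reduced mod 2^64 (the two-limb answer). The function names
C are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Paper form: dr = n - 2^N*d + (2^N-1-q1)*d, with N = 32, mod 2^64. */
static uint64_t dr_paper(uint64_t n, uint32_t q1, uint32_t d)
{
    return n - ((uint64_t)d << 32) + (uint64_t)(0xFFFFFFFFu - q1) * d;
}

/* Rearranged form: (n-d) - q1*d.  Only the final subtract depends on the
   q1*d product; n-d can be formed ahead of time. */
static uint64_t dr_rearranged(uint64_t n, uint32_t q1, uint32_t d)
{
    return (n - d) - (uint64_t)q1 * d;
}

/* Asserts that both forms give the same two-limb result. */
static void check_same(uint64_t n, uint32_t q1, uint32_t d)
{
    assert(dr_paper(n, q1, d) == dr_rearranged(n, q1, d));
}
```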
C
C For reference, the way the k7 code uses "n-(q1+1)*d" would not suit here:
C detecting an overflow of q1+1 when q1=0xFFFFFFFF would cost too much.
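C To illustrate the overflow concern only (this does not reproduce the k7
C code): if q1+1 is formed in a single 32-bit limb it wraps to 0 when
C q1=0xFFFFFFFF, so the product is wrong unless the wrap is detected, whereas
C "(n-d) - q1*d" never forms q1+1. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* n - (q1+1)*d with q1+1 held in one 32-bit limb and no overflow check.
   d is assumed normalized (high bit set), as in the surrounding text. */
static uint64_t dr_q1_plus_1_in_limb(uint64_t n, uint32_t q1, uint32_t d)
{
    assert(d >> 31);
    uint32_t q1p1 = (uint32_t)(q1 + 1u);    /* wraps to 0 at 0xFFFFFFFF */
    return n - (uint64_t)q1p1 * d;
}

/* (n-d) - q1*d: same value mod 2^64 whenever q1+1 does not wrap, and still
   correct when it would. */
static uint64_t dr_no_increment(uint64_t n, uint32_t q1, uint32_t d)
{
    assert(d >> 31);
    return (n - d) - (uint64_t)q1 * d;
}
```

C With n=0xFFFFFFFEFFFFFFFF, d=0xFFFFFFFF and q1=0xFFFFFFFF, the second form
C gives -1 as a two-limb value (so an addback follows), while the wrapped form
C just returns n.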
C
C Notes:
C
C mpn_divrem_1 and mpn_preinv_divrem_1 avoid one division if the src high
C limb is less than the divisor. mpn_divrem_1c doesn't check for a zero
C carry, since in normal circumstances that will be a very rare event.
C
C The test for skipping a division is branch free (once size>=1 is tested).
C The store to the destination high limb is 0 when a divide is skipped, or
C if it's not skipped then a copy of the src high limb is stored. The
C latter is in case src==dst.
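C The skip described above can be sketched in C with a mask instead of a
C branch. This is an illustration under assumptions, not the assembly: the
C helper name and its interface are hypothetical, limbs are taken as 32 bits,
C and a C compiler is not guaranteed to emit branch-free code for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper.  Looks at the src high limb, stores the dst high
   limb, shrinks *size by one if a division is skipped, and returns the
   carry (initial remainder) for the rest of the division.
   Requires *size >= 1. */
static uint32_t skip_high_limb(uint32_t *dst_high, uint32_t src_high,
                               uint32_t divisor, size_t *size)
{
    assert(*size >= 1);
    uint32_t skip = (uint32_t)(src_high < divisor);  /* 1 if skipping */
    uint32_t mask = 0u - skip;                       /* all ones if skipping */

    /* 0 when skipped; otherwise a copy of the src high limb, so the limb is
       still intact for the main loop when src and dst overlap. */
    *dst_high = src_high & ~mask;
    *size -= skip;
    return src_high & mask;      /* skipped: high limb becomes the carry */
}
```

C For divisor 7, a high limb of 5 is skipped (carry 5, stored limb 0, size
C reduced by one), while a high limb of 9 is not (carry 0, stored limb 9).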
C
C There's a small bias towards expecting xsize==0, by having code for
C xsize==0 in a straight line and xsize!=0 under forward jumps.
C
C Enhancements:
C
C The loop measures 32 cycles, but the dependent chain would suggest it
C could be done with 30. Not sure where to start looking for the extras.
C
C Alternatives:
C
C If the divisor is normalized (high bit set) then a division step can
C always be skipped, since the high destination limb is always 0 or 1 in
C that case. It doesn't seem worth checking for this though, since it
C probably occurs infrequently.
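C The reason the high limb is 0 or 1: with d >= 2^31 and a one-limb n < 2^32,
C n/d < 2. A sketch of the resulting compare-and-subtract, assuming 32-bit
C limbs; the function name is hypothetical and this path is not something the
C source implements.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient and remainder of a single limb n by a normalized divisor d,
   without a division: the quotient is 1 exactly when n >= d. */
static uint32_t high_limb_quotient_normalized(uint32_t n, uint32_t d,
                                              uint32_t *rem)
{
    assert(d >> 31);                     /* high bit set */
    uint32_t q = (uint32_t)(n >= d);     /* 0 or 1 */
    *rem = n - (d & (0u - q));           /* subtract d only when q == 1 */
    return q;
}
```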
C -----------------------------------------------------------------------------
C The dependent chain here consists of
C
C     2   paddd     n1+n2
C     8   pmuludq   m*(n1+n2)
C     2   paddq     n2:nadj + m*(n1+n2)
C     2   psrlq     q1
C     8   pmuludq   d*q1
C     2   psubq     (n-d)-q1*d
C     2   psrlq     high n-(q1+1)*d mask
C     2   pand      d masked
C     2   paddd     n2+d addback
C    --
C    30
C
C But it seems to run at 32 cycles, so presumably there's something else
C going on.
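C The arithmetic along this chain can be modelled in portable C. The sketch
C below is one reading of the steps listed above, not a transcription of the
C assembly. Assumptions: 32-bit limbs, d normalized, n2 < d on entry, and the
C inverse taken as m = floor((2^64-1)/d) - 2^32, which is the paper's
C definition of m'. The names qr32, invert_limb32 and div_step are
C hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t q, r; } qr32;

/* m = floor((2^64-1)/d) - 2^32 for a normalized d (assumed definition). */
static uint32_t invert_limb32(uint32_t d)
{
    assert(d >> 31);
    return (uint32_t)(UINT64_MAX / d - (UINT64_C(1) << 32));
}

/* One division step: divides the two-limb value n2:n10 by d, with n2 < d. */
static qr32 div_step(uint32_t n2, uint32_t n10, uint32_t d, uint32_t m)
{
    assert((d >> 31) && n2 < d);
    uint32_t n1   = n10 >> 31;                  /* top bit of the low limb */
    uint32_t nadj = n10 + (d & (0u - n1));      /* n10, plus d if n1 set */

    /* paddd, pmuludq, paddq: n2:nadj + m*(n1+n2); cannot exceed 64 bits */
    uint64_t t = (uint64_t)m * (n2 + n1) + (((uint64_t)n2 << 32) | nadj);
    uint32_t q1 = (uint32_t)(t >> 32);          /* psrlq: quotient estimate */

    /* pmuludq, psubq: (n-d) - q1*d, which equals n - (q1+1)*d mod 2^64 */
    uint64_t n  = ((uint64_t)n2 << 32) | n10;
    uint64_t dr = (n - d) - (uint64_t)q1 * d;

    /* psrlq: the high limb of dr is 0, or all ones if dr went negative */
    uint32_t mask = (uint32_t)(dr >> 32);

    qr32 out;
    out.r = (uint32_t)dr + (d & mask);          /* pand, paddd: addback */
    out.q = (uint32_t)(q1 + 1u + mask);         /* q1+1, or q1 on addback */
    return out;
}
```

C The returned remainder is the n2 for the next limb, which is what puts the
C whole sequence on the loop's dependent chain.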
C -----------------------------------------------------------------------------
C