Lines Matching defs:C
25 C                 cycles/limb
26 C UltraSPARC 1&2:      9
27 C UltraSPARC 3:       10
29 C Algorithm: We use 16 floating-point multiplies per limb product, with the
30 C 2-limb v operand split into eight 16-bit pieces, and the n-limb u operand
31 C split into 32-bit pieces (two per limb). We sum four 48-bit partial products
32 C using floating-point add, then convert the resulting four 50-bit quantities
33 C and transfer them to the integer unit.
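The arithmetic described above can be checked with a small model. This is only a sketch, not the register-level pipeline of the assembly: the name `mul_2_model` and the grouping of products by bit weight are illustrative assumptions. It shows that every 32-bit by 16-bit product is exact in an IEEE double, that at most four such products share a weight, and that their sum stays below 2^50.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul_2_model(u_limbs, v):
    """Multiply little-endian 64-bit limbs by a 128-bit v via exact double products.

    Hypothetical helper for illustration; assumes 0 <= limb < 2**64 and 0 <= v < 2**128.
    """
    # Eight 16-bit pieces of the 2-limb v operand, held as doubles.
    v_pieces = [float((v >> (16 * k)) & MASK16) for k in range(8)]

    # Two 32-bit pieces per u limb, low half first, held as doubles.
    u_pieces = []
    for limb in u_limbs:
        u_pieces.append(float(limb & MASK32))
        u_pieces.append(float(limb >> 32))

    # u piece m sits at bit 32*m and v piece k at bit 16*k, so their product
    # belongs to the 16-bit column 2*m + k.
    columns = {}
    for m, u_piece in enumerate(u_pieces):
        for k, v_piece in enumerate(v_pieces):
            columns.setdefault(2 * m + k, []).append(u_piece * v_piece)

    result = 0
    for column, products in columns.items():
        assert len(products) <= 4          # at most four products share a weight
        acc = 0.0
        for product in products:
            acc += product                 # floating-point add, still exact
        assert acc < 2.0 ** 50             # four 48-bit values fit in 50 bits
        result += int(acc) << (16 * column)  # fp->int, then integer recombination
    return result
```

Python integers are unbounded, so the final recombination here sidesteps the carry handling that the real code must do in 64-bit registers.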
35 C Possible optimizations:
36 C   1. Align the stack area where we transfer the four 50-bit product-sums
37 C      to a 32-byte boundary. That would minimize cache collisions.
38 C      (UltraSPARC-1/2 use a direct-mapped cache.) (Perhaps even better would
39 C      be to align the area to map to the area immediately before up?)
40 C   2. Perform two of the fp->int conversions with integer instructions. We
41 C      can get almost ten free IEU slots, if we clean up bookkeeping and the
42 C      silly carry-limb code.
43 C   3. For an mpn_addmul_1 based on this, we need to fix the silly carry-limb
44 C      code.
46 C OSP (Overlapping software pipeline) version of mpn_mul_basecase:
47 C Operand swap will require 8 LDDA and 8 FXTOD, which will mean 8 cycles.
48 C FI = 20
49 C L = 9 x un * vn
50 C WDFI = 10 x vn / 2
51 C WD = 4
53 C Instruction classification (as per UltraSPARC functional units).
54 C Assuming silly carry code is fixed. Includes bookkeeping.
55 C
56 C                  mpn_addmul_X       mpn_mul_X
57 C                     1       2       1       2
58 C                    ==========      ==========
59 C FM                  8      16       8      16
60 C FA                 10      18      10      18
61 C MEM                12      12      10      10
62 C ISHIFT              6       6       6       6
63 C IADDLOG            11      11      10      10
64 C BRANCH              1       1       1       1
65 C
66 C TOTAL IEU          17      17      16      16
67 C TOTAL              48      64      45      61
68 C
69 C IEU cycles        8.5     8.5       8       8
70 C MEM cycles         12      12      10      10
71 C ISSUE cycles       12      16   11.25   15.25
72 C FPU cycles         10      18      10      18
73 C cycles/loop        12      18      12      18
74 C cycles/limb        12       9      12       9
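The derived rows of the table follow from the instruction counts by simple arithmetic. The sketch below reproduces them under assumptions inferred from the table itself rather than stated in the source: two integer units (IEU cycles = IEU instructions / 2), one memory operation per cycle, four instructions issued per cycle, separate multiply and add FP pipes (FPU cycles = max(FM, FA)), and cycles/limb = cycles/loop divided by X. The function name `loop_bounds` is hypothetical.

```python
import math


def loop_bounds(fm, fa, mem, ishift, iaddlog, branch, x):
    """Return (total instructions, cycles/loop, cycles/limb) for one loop body.

    Resource limits are assumptions read off the table, not taken from a manual.
    """
    ieu = ishift + iaddlog                 # TOTAL IEU row
    total = fm + fa + mem + ieu + branch   # TOTAL row
    bounds = [
        ieu / 2,        # IEU cycles: two integer units
        mem,            # MEM cycles: one memory operation per cycle
        total / 4,      # ISSUE cycles: four instructions per cycle
        max(fm, fa),    # FPU cycles: separate multiply and add pipes
    ]
    cycles_per_loop = math.ceil(max(bounds))  # the tightest resource sets the pace
    return total, cycles_per_loop, cycles_per_loop / x
```

For example, `loop_bounds(16, 18, 10, 6, 10, 1, 2)` corresponds to the mpn_mul_2 column, where the FP add pipe is the limiting resource.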
77 C INPUT PARAMETERS
78 C rp[n + 1]   i0
79 C up[n]       i1
80 C n           i2
81 C vp[2]       i3
88 C Combine registers:
89 C u00_hi = u32_hi
90 C u00_lo = u32_lo
91 C a000   = out000
92 C a016   = out016
93 C Free: f52 f54
128 C Initialization. (1) Split v operand into eight 16-bit chunks and store them
129 C as IEEE double in fp registers. (2) Clear upper 32 bits of fp register pairs
130 C f2 and f4. (3) Store masks in registers aliased to `xffff' and `xffffffff'.
131 C This code could be better scheduled.
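The initialization steps can be modelled outside the assembly as follows. This is a hedged sketch: `split_v` and `widen_u32` are hypothetical names, and the reading of step (2), that a 32-bit piece loaded into the low half of a register pair whose upper half is zero can be converted as a 64-bit integer, is an assumption about why the upper bits are cleared.

```python
import struct

XFFFF = 0xFFFF            # mask named after the `xffff' register alias
XFFFFFFFF = 0xFFFFFFFF    # mask named after the `xffffffff' register alias


def split_v(v0, v1):
    """Step (1): eight 16-bit chunks of limbs v0 (low) and v1 (high), as doubles."""
    chunks = []
    for limb in (v0, v1):
        for shift in (0, 16, 32, 48):
            chunks.append(float((limb >> shift) & XFFFF))
    return chunks


def widen_u32(word):
    """Step (2): model a register pair with a zero upper half and `word` below it."""
    pair = struct.pack(">II", 0, word & XFFFFFFFF)  # big-endian, as on SPARC
    (as_int64,) = struct.unpack(">q", pair)         # reinterpret as a 64-bit integer
    return float(as_int64)                          # integer-to-double conversion
```

Every value produced here is below 2^32, so it is represented exactly in a double.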
205 C Initialization done.
211 C Start software pipeline.
215 C mid
249 C mid
279 C mid
307 C  64      32       0
308 C   .       .       .
309 C   .       |__rXXX_|   32
310 C   .      |___cy___|   34
311 C   .  |_______i00__|   50
312 C  |_______i16__|   .   50
315 C BEGIN MAIN LOOP
362 C midloop
409 C WIND-DOWN PHASE 1
442 C mid
463 C WIND-DOWN PHASE 2
482 C mid
501 C WIND-DOWN PHASE 3
518 C mid