Name                          Date         Size      Lines  LOC
..                            13-Feb-2017  -
penryn/                       13-Nov-2015  -         312    268
README                        14-Sep-2014  4.5 KiB   160    94
add_n.as                      14-Sep-2014  2.3 KiB   99     92
addadd_n.asm                  13-Nov-2015  2.7 KiB   126    118
addlsh_n.as                   14-Sep-2014  3.2 KiB   145    135
addmul_1.asm                  14-Sep-2014  2.7 KiB   147    124
addmul_2.as                   14-Sep-2014  4.1 KiB   207    200
addsub_n.asm                  13-Nov-2015  2.7 KiB   126    118
and_n.as                      14-Sep-2014  1.7 KiB   66     60
andn_n.as                     14-Sep-2014  1.7 KiB   66     61
com_n.as                      14-Sep-2014  1.6 KiB   60     55
copyd.as                      14-Sep-2014  1.7 KiB   73     66
copyi.asm                     14-Sep-2014  4 KiB     185    176
divexact_byff.as              14-Sep-2014  2.2 KiB   93     84
divrem_hensel_qr_1_2.asm      14-Sep-2014  3.1 KiB   168    148
gmp-mparam.h                  13-Nov-2015  3 KiB     88     62
hamdist.asm                   14-Sep-2014  3.3 KiB   189    172
ior_n.as                      14-Sep-2014  1.7 KiB   66     60
iorn_n.as                     14-Sep-2014  1.7 KiB   66     61
karaadd.asm                   14-Sep-2014  4.6 KiB   248    240
karasub.asm                   14-Sep-2014  5.1 KiB   270    262
lshift.asm                    14-Sep-2014  2.2 KiB   103    94
mod_1_1.asm                   14-Sep-2014  1.5 KiB   62     52
mod_1_2.asm                   14-Sep-2014  2.2 KiB   107    97
mod_1_3.asm                   14-Sep-2014  3.7 KiB   207    198
mul_1.asm                     14-Sep-2014  1.3 KiB   50     42
mul_2.as                      14-Sep-2014  2.8 KiB   135    129
mul_basecase.as               14-Sep-2014  16.3 KiB  922    873
mullow_n_basecase.asm         13-Feb-2017  7.5 KiB   426    381
nand_n.as                     14-Sep-2014  1.7 KiB   66     61
nior_n.as                     14-Sep-2014  1.7 KiB   66     61
popcount.asm                  14-Sep-2014  3.7 KiB   160    150
redc_1.as                     14-Sep-2014  7.9 KiB   426    403
rsh1add_n.as                  14-Sep-2014  2.8 KiB   123    116
rsh1sub_n.as                  14-Sep-2014  2.8 KiB   125    118
rsh_divrem_hensel_qr_1_2.asm  14-Sep-2014  4 KiB     237    200
rshift.asm                    14-Sep-2014  2.3 KiB   104    95
store.asm                     14-Sep-2014  1.3 KiB   56     46
sub_n.as                      14-Sep-2014  2.3 KiB   99     92
subadd_n.asm                  13-Nov-2015  2.7 KiB   126    118
sublsh1_n.as                  14-Sep-2014  2.1 KiB   91     85
submul_1.asm                  14-Sep-2014  2.7 KiB   147    124
sumdiff_n.asm                 13-Nov-2015  3.2 KiB   162    154
xnor_n.as                     14-Sep-2014  1.7 KiB   66     61
xor_n.as                      14-Sep-2014  1.8 KiB   66     61

README

This is a patch to solve two problems:

1.  It makes gmp run faster on Intel Core2 CPUs (e.g. Woodcrest, Conroe,
    and friends) under Linux.

2.  It makes gmp work (and run fast) under Mac OS X on Core2 CPU
    machines (e.g. Mac Pro).

As an added bonus, it also gives gmp a small speedup on AMD64
machines.


To Install on a 64-bit Intel Mac (e.g. Mac Pro)
-------------------------------------------------------
1. Download gmp-4.2.1-core2-port.tar.gz


2. Uncompress and untar it.  Let's say that it's in the directory
~/gmp-4.2.1-core2-port


3.  Download GMP version 4.2.1


4.  Uncompress and untar GMP.  Let's say that it's in the directory
~/gmp-4.2.1


5.  Change into the gmp-4.2.1-core2-port directory and run the install
script (if you want to see what it does, just read it; it's a
very simple script).

    > cd ~/gmp-4.2.1-core2-port
    > ./install_gmp_4.2.1_core2_patch.sh ~/gmp-4.2.1


6.  Configure gmp for a 64-bit Intel Mac as follows:

   > cd ~/gmp-4.2.1
   > ./configure --build=x86_64-apple-darwin CFLAGS="-m64 -fast"

(You can, of course, add whatever other configure options you want.  Be
sure to set the CFLAGS environment variable on the configure command
line exactly as shown above.  Otherwise, the default CFLAGS that
configure generates will cause compilation problems.)

7.  Build it!  Execute the following:

   > make


8.  Check it!  Execute the following:

   > make check


9.  Install it.

   > sudo make install
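Step 6 depends on `-m64` making the compiler generate 64-bit code; GCC documents `-m64` on x86 as giving 32-bit `int` and 64-bit `long` and pointers. If you want to confirm that your compiler honors the flags before running configure, a minimal sketch is a small C11 file of compile-time checks, compiled with the same CFLAGS (for example `cc -m64 -c check_m64.c`). The file name `check_m64.c` and the function `built_as_lp64` are hypothetical and not part of this patch.

```c
#include <limits.h>

/* Compile-time checks (C11 _Static_assert): this file only compiles if
   the compiler uses the data model GCC documents for -m64 on x86:
   32-bit int, 64-bit long and 64-bit pointers. */
_Static_assert(sizeof(int) * CHAR_BIT == 32, "int is not 32 bits");
_Static_assert(sizeof(long) * CHAR_BIT == 64,
               "long is not 64 bits: is -m64 in CFLAGS?");
_Static_assert(sizeof(void *) * CHAR_BIT == 64,
               "pointers are not 64 bits: is -m64 in CFLAGS?");

/* Runtime form of the same check: returns 1 for a 64-bit (LP64) build. */
int built_as_lp64(void)
{
    return sizeof(int) * CHAR_BIT == 32
        && sizeof(long) * CHAR_BIT == 64
        && sizeof(void *) * CHAR_BIT == 64;
}
```

If one of the checks fails, the compiler stops with the quoted message instead of failing partway through the GMP build. The `_Static_assert` keyword needs a C11 compiler; an older compiler will reject the file for that reason alone.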


To Install on a Linux Machine
-------------------------------------------------------
1. Download gmp-4.2.1-core2-port.tar.gz


2. Uncompress and untar it.  Let's say that it's in the directory
~/gmp-4.2.1-core2-port


3.  Download GMP version 4.2.1


4.  Uncompress and untar GMP.  Let's say that it's in the directory
~/gmp-4.2.1


5.  Change into the gmp-4.2.1-core2-port directory and run the install
script (if you want to see what it does, just read it; it's a
very simple script).

    > cd ~/gmp-4.2.1-core2-port
    > ./install_gmp_4.2.1_core2_patch.sh ~/gmp-4.2.1


6.  Configure gmp as usual.

   > cd ~/gmp-4.2.1
   > ./configure

(You can, of course, add whatever other configure options you want.)


7.  Build it!  Execute the following:

   > make


8.  Check it!  Execute the following:

   > make check


9.  Install it.

   > sudo make install


NOTES:

1. Wow!  The GMP code base is really well organized!  It was very easy
for me to find out exactly what files needed changing.  Nice work guys!!

2. In amd64call.asm all I changed was to make the addressing relative to
the rip register rather than absolute.  The Apple 64-bit ABI doesn't support
absolute addressing.  Since Linux can use either addressing mode, it
makes sense to use position-independent code: it's more portable and
there's no real performance difference.

3. In add_n and sub_n I rewrote the code to accommodate the Woodcrest
nuances.  Mainly, I unrolled the main loop and got rid of the "inc"
instruction (which causes a false dependency on the flags register and
thus stalls the pipeline).  Of course, this also meant that I had to
save the carry flag between loop iterations using the "lahf" and
"sahf" instructions.  These instructions are available on the Mac Pro
using the Apple assembler, but because some early x86_64 CPUs didn't
support them in 64-bit mode, the GNU assembler doesn't accept those
mnemonics on 64-bit machines (even when the CPU supports them).  So
my assembly code includes some m4 code that calls the shell script
"lahf_sahf_test.sh", which determines whether the lahf and sahf
instructions are available on the CPU.  If they are, it includes some
hand-assembled bytes to get around the GNU as limitation.  Otherwise,
it falls back to the slower "setc" and "bt" instructions.

4.  On my 2.66 GHz Mac Pro, I was able to get a GMPbench score of 8263.

5.  You'll notice a Makefile and a bunch of extraneous files.  These are
used for testing the code outside of the GMP source tree.  The Makefile
will produce a file called mpn_test, which runs the routines through
a bunch of speed and correctness tests and compares them against the
original GMP 4.2.1 assembly files.

6.  On Mac OS X I haven't yet found a nice way to build dynamic
libraries.  The biggest obstacle is that the Apple "libtool" and the
GNU "libtool" have incompatible syntax.  My guess is that in the near
future GNU libtool will support Apple's libtool for creating
dynamic shared libraries.  In the meantime, I'll be content with
static libraries.  If you find a simple solution, please let me know.
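Notes 3 and 5 describe two mechanisms: an add_n whose unrolled main loop has to keep the carry alive from one iteration to the next (lahf/sahf when the CPU has them, setc/bt otherwise), and an mpn_test harness that checks rewritten routines against the originals. The C sketch below illustrates these ideas; it is not the patch's assembly. Everything in it is an assumption for illustration: 64-bit limbs (`limb_t`) and the hypothetical functions `lahf_sahf_available`, `add_limb`, `ref_add_n`, `unrolled_add_n`, `xorshift64` and `compare_add_n`. The README does not say how lahf_sahf_test.sh performs its check. This sketch instead reads the CPUID feature bit (leaf 0x80000001, ECX bit 0) through the `<cpuid.h>` header that GCC and Clang provide, so treat it as one possible approach.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>              /* GCC/Clang helper for the CPUID instruction */
#endif

typedef uint64_t limb_t;        /* assumption: 64-bit limbs */

/* 1 if the CPU reports LAHF/SAHF in 64-bit mode, 0 if not, -1 if unknown. */
int lahf_sahf_available(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return -1;              /* extended leaf not supported */
    return (int)(ecx & 1u);     /* CPUID.80000001H:ECX bit 0 = LAHF/SAHF */
#else
    return -1;                  /* not x86-64 or not GCC/Clang: cannot tell */
#endif
}

/* Returns a + b + *cy and stores the carry out (0 or 1) back in *cy. */
static inline limb_t add_limb(limb_t a, limb_t b, limb_t *cy)
{
    limb_t s = a + *cy;
    limb_t c1 = s < *cy;        /* carry out of a + *cy */
    limb_t r = s + b;
    *cy = c1 | (r < s);         /* at most one of the two carries is set */
    return r;
}

/* Simple reference: one limb per iteration; rp = up + vp, returns carry. */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t r = up[i] + vp[i];
        limb_t c = r < up[i];   /* carry out of up[i] + vp[i] */
        rp[i] = r + cy;
        cy = c | (rp[i] < r);   /* carry out of adding the incoming carry */
    }
    return cy;
}

/* Main loop unrolled four times; cy carries the carry between iterations,
   the job done by lahf/sahf (or setc/bt) in the assembly version. */
limb_t unrolled_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        rp[i]     = add_limb(up[i],     vp[i],     &cy);
        rp[i + 1] = add_limb(up[i + 1], vp[i + 1], &cy);
        rp[i + 2] = add_limb(up[i + 2], vp[i + 2], &cy);
        rp[i + 3] = add_limb(up[i + 3], vp[i + 3], &cy);
    }
    for (; i < n; i++)          /* leftover 0..3 limbs */
        rp[i] = add_limb(up[i], vp[i], &cy);
    return cy;
}

static uint64_t rng = 0x9E3779B97F4A7C15ULL;   /* any nonzero seed */
static uint64_t xorshift64(void)
{
    rng ^= rng << 13;
    rng ^= rng >> 7;
    rng ^= rng << 17;
    return rng;
}

/* Differential test in the spirit of mpn_test: returns the number of
   random trials in which the two implementations disagree. */
int compare_add_n(int trials)
{
    enum { MAXN = 64 };
    limb_t up[MAXN], vp[MAXN], r1[MAXN], r2[MAXN];
    int failures = 0;
    for (int t = 0; t < trials; t++) {
        size_t n = (size_t)(xorshift64() % (MAXN + 1));   /* 0..64 limbs */
        int worst = (t % 4 == 0);   /* all-ones + 1: longest carry chain */
        for (size_t i = 0; i < n; i++) {
            up[i] = worst ? UINT64_MAX : xorshift64();
            vp[i] = worst ? (limb_t)(i == 0) : xorshift64();
        }
        limb_t c1 = ref_add_n(r1, up, vp, n);
        limb_t c2 = unrolled_add_n(r2, up, vp, n);
        if (c1 != c2 || memcmp(r1, r2, n * sizeof(limb_t)) != 0)
            failures++;
    }
    return failures;
}
```

In the sketch, `cy` plays the role of the saved carry flag: it is the only state that crosses from one unrolled group of four limbs to the next. A C compiler may or may not turn `add_limb` into add-with-carry instructions, so the sketch says nothing about speed; it only shows the data flow and one way to cross-check two implementations.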

Jason Worth Martin
Asst. Prof. of Mathematics
James Madison Univ.
martinjw@jmu.edu