Copyright 2002, 2005 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.





                       POWERPC 32-BIT MPN SUBROUTINES


This directory contains mpn functions for various 32-bit PowerPC chips.


CODE ORGANIZATION

        directory         used for
        ================================================
        powerpc           generic, 604, 604e, 744x, 745x
        powerpc/750       740, 750, 7400, 7410


The top-level powerpc directory is currently mostly aimed at 604/604e but
should be reasonable on all powerpcs.



STATUS

The code is quite well optimized for the 604e; other chips have had less
attention.

Altivec SIMD available in 74xx might hold some promise, but unfortunately
GMP only guarantees 32-bit data alignment, so there's lots of fiddling
around with partial operations at the start and end of limb vectors.  A
128-bit limb would be a novel idea, but is unlikely to be practical, since
it would have to work with ordinary +, -, * etc in the C code.
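
To give a feel for the fiddling involved, the following is a rough sketch in
C of how a limb vector would have to be split around 16-byte boundaries
before any whole-vector SIMD operation could be used.  The helper
split_for_simd and the struct simd_split are hypothetical names made up for
this illustration, not part of GMP, and the sketch assumes 32-bit limbs and
16-byte vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GMP.  Limbs are assumed to be 32 bits
   wide and SIMD vectors 16 bytes (four limbs) wide.  */
struct simd_split
{
  size_t head;    /* limbs before the first 16-byte boundary */
  size_t blocks;  /* whole aligned 4-limb vectors */
  size_t tail;    /* limbs left over after the last whole vector */
};

static struct simd_split
split_for_simd (uintptr_t addr, size_t n)
{
  struct simd_split s;
  size_t head;

  /* Only limb (4-byte) alignment is guaranteed.  */
  assert ((addr & 3) == 0);

  /* Bytes up to the next 16-byte boundary, converted to limbs: 0..3.  */
  head = ((16 - (addr & 15)) & 15) / 4;
  if (head > n)
    head = n;   /* short vector that ends before any boundary */

  s.head = head;
  s.blocks = (n - head) / 4;
  s.tail = (n - head) % 4;
  return s;
}
```

For instance, a 10-limb vector starting at address 0x1004 would split into
3 head limbs, one whole 4-limb vector and 3 tail limbs, so most of the work
would fall outside the SIMD path.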

Also, Altivec isn't very well suited to GMP's multiplication needs.
Using floating-point based multiplication has much better performance
potential for all current powerpcs, both the ones with slow integer multiply
units (603, 740, 750, 7400, 7410) and those with fast (604, 604e, 744x,
745x).  This is because all powerpcs do some level of pipelining in the FPU:

603 and 750 can sustain one fmadd every 2nd cycle.
604 and 604e can sustain one fmadd per cycle.
7400 and 7410 can sustain 3 fmadd in 4 cycles.
744x and 745x can sustain 4 fmadd in 5 cycles.



REGISTER NAMES

The normal powerpc convention is to give registers as plain numbers, like
"mtctr 6", but on Apple MacOS X (powerpc*-*-rhapsody* and
powerpc*-*-darwin*) the assembler demands an "r" like "mtctr r6".  Note,
however, that when register 0 in an instruction means a literal zero the
"r" is omitted, for instance "lwzx r6,0,r7".

The GMP code uses the "r" forms; powerpc-defs.m4 transforms them to plain
numbers according to what GMP_ASM_POWERPC_R_REGISTERS finds is needed.
(Note that this style isn't fully general, as the identifier r4 and the
register r4 will not be distinguishable on some systems.  However, this is
not a problem for the limited GMP assembly usage.)



GLOBAL REFERENCES

Linux non-PIC
        lis     9, __gmp_binvert_limb_table@ha
        rlwinm  11, 5, 31, 25, 31
        la      9, __gmp_binvert_limb_table@l(9)
        lbzx    11, 9, 11

Linux PIC (FIXME)
.LCL0:
        .long   .LCTOC1-.LCF0
        bcl     20, 31, .LCF0
.LCF0:
        mflr    30
        lwz     7, .LCL0-.LCF0(30)
        add     30, 7, 30
        lwz     11, .LC0-.LCTOC1(30)
        rlwinm  3, 5, 31, 25, 31
        lbzx    7, 11, 3

AIX (always PIC)
LC..0:
        .tc __gmp_binvert_limb_table[TC],__gmp_binvert_limb_table[RW]
        lwz     9, LC..0(2)
        rlwinm  0, 5, 31, 25, 31
        lbzx    0, 9, 0

Darwin (non-PIC)
        lis     r2, ha16(___gmp_binvert_limb_table)
        rlwinm  r9, r5, 31, 25, 31
        la      r2, lo16(___gmp_binvert_limb_table)(r2)
        lbzx    r0, r2, r9

Darwin (PIC)
        mflr    r0
        bcl     20, 31, L0001$pb
L0001$pb:
        mflr    r7
        mtlr    r0
        addis   r2, r7, ha16(L___gmp_binvert_limb_table$non_lazy_ptr-L0001$pb)
        rlwinm  r9, r5, 31, 25, 31
        lwz     r2, lo16(L___gmp_binvert_limb_table$non_lazy_ptr-L0001$pb)(r2)
        lbzx    r0, r2, r9
------
        .non_lazy_symbol_pointer
L___gmp_binvert_limb_table$non_lazy_ptr:
        .indirect_symbol ___gmp_binvert_limb_table
        .long   0
        .subsections_via_symbols


For GNU/Linux and Darwin, we might want to duplicate __gmp_binvert_limb_table
into the text section in this file.  We should thus be able to reach it like
this:

        bl      L0
L0:     mflr    r2
        rlwinm  r9, r5, 31, 25, 31
        addi    r9, r9, lo16(local_binvert_table-L0)
        lbzx    r0, r2, r9



REFERENCES

PowerPC Microprocessor Family: The Programming Environments for 32-bit
Microprocessors, IBM document G522-0290-01, 2000.

PowerPC 604e RISC Microprocessor User's Manual with Supplement for PowerPC
604 Microprocessor, IBM document G552-0330-00, Freescale document
MPC604EUM/AD, 3/1998.

MPC7410/MPC7400 RISC Microprocessor User's Manual, Freescale document
MPC7400UM/D, rev 1, 11/2002.

MPC7450 RISC Microprocessor Family Reference Manual, Freescale document
MPC7450UM, rev 5, 1/2005.

The above are available online from

        http://www.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC
        http://www.freescale.com/PowerPC



----------------
Local variables:
mode: text
fill-column: 76
End: