.ident	"sparcv8plus.s, Version 1.4"
.ident	"SPARC v9 ISA artwork by Andy Polyakov <appro@openssl.org>"

/*
 * ====================================================================
 * Copyright 1999-2016 The OpenSSL Project Authors. All Rights Reserved.
 *
 * Licensed under the Apache License 2.0 (the "License"). You may not use
 * this file except in compliance with the License. You can obtain a copy
 * in the file LICENSE in the source distribution or at
 * https://www.openssl.org/source/license.html
 * ====================================================================
 */

/*
 * This is my modest contribution to OpenSSL project (see
 * http://www.openssl.org/ for more information about it) and is
 * a drop-in UltraSPARC ISA replacement for crypto/bn/bn_asm.c
 * module. For updates see http://fy.chalmers.se/~appro/hpe/.
 *
 * Questions-n-answers.
 *
 * Q. How to compile?
 * A. With SC4.x/SC5.x:
 *
 *      cc -xarch=v8plus -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *
 *    and with gcc:
 *
 *      gcc -mcpu=ultrasparc -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *
 *    or if above fails (it does if you have gas installed):
 *
 *      gcc -E bn_asm.sparc.v8plus.S | as -xarch=v8plus /dev/fd/0 -o bn_asm.o
 *
 *    Quick-n-dirty way to fuse the module into the library.
 *    Provided that the library is already configured and built
 *    (in 0.9.2 case with no-asm option):
 *
 *      # cd crypto/bn
 *      # cp /some/place/bn_asm.sparc.v8plus.S .
 *      # cc -xarch=v8plus -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *      # make
 *      # cd ../..
 *      # make; make test
 *
 *    Quick-n-dirty way to get rid of it:
 *
 *      # cd crypto/bn
 *      # touch bn_asm.c
 *      # make
 *      # cd ../..
 *      # make; make test
 *
 * Q. V8plus architecture? What kind of beast is that?
 * A. Well, it's rather a programming model than an architecture...
 *    It's actually v9-compliant, i.e. *any* UltraSPARC, CPU under
 *    special conditions, namely when kernel doesn't preserve upper
 *    32 bits of otherwise 64-bit registers during a context switch.
 *
 * Q. Why just UltraSPARC? What about SuperSPARC?
 * A. Original release did target UltraSPARC only. Now SuperSPARC
 *    version is provided along. Both versions share bn_*comba[48]
 *    implementations (see comment later in code for explanation).
 *    But what's so special about this UltraSPARC implementation?
 *    Why didn't I let compiler do the job? Trouble is that most of
 *    available compilers (well, SC5.0 is the only exception) don't
 *    attempt to take advantage of UltraSPARC's 64-bitness under
 *    32-bit kernels even though it's perfectly possible (see next
 *    question).
 *
 * Q. 64-bit registers under 32-bit kernels? Didn't you just say it
 *    doesn't work?
 * A. You can't address *all* registers as 64-bit wide:-( The catch is
 *    that you actually may rely upon %o0-%o5 and %g1-%g4 being fully
 *    preserved if you're in a leaf function, i.e. such never calling
 *    any other functions. All functions in this module are leaf and
 *    10 registers is a handful. And as a matter of fact non-"comba"
 *    routines don't require even that much and I could even afford to
 *    not allocate own stack frame for 'em:-)
 *
 * Q. What about 64-bit kernels?
 * A. What about 'em? Just kidding:-) Pure 64-bit version is currently
 *    under evaluation and development...
 *
 * Q. What about shared libraries?
 * A. What about 'em? Kidding again:-) Code does *not* contain any
 *    code position dependencies and it's safe to include it into
 *    shared library as is.
 *
 * Q. How much faster does it go?
 * A. Do you have a good benchmark? In either case below is what I
 *    experience with crypto/bn/expspeed.c test program:
 *
 *      v8plus module on U10/300MHz against bn_asm.c compiled with:
 *
 *      cc-5.0 -xarch=v8plus -xO5 -xdepend      +7-12%
 *      cc-4.2 -xarch=v8plus -xO5 -xdepend      +25-35%
 *      egcs-1.1.2 -mcpu=ultrasparc -O3         +35-45%
 *
 *      v8 module on SS10/60MHz against bn_asm.c compiled with:
 *
 *      cc-5.0 -xarch=v8 -xO5 -xdepend          +7-10%
 *      cc-4.2 -xarch=v8 -xO5 -xdepend          +10%
 *      egcs-1.1.2 -mv8 -O3                     +35-45%
 *
 *    As you can see it's damn hard to beat the new Sun C compiler
 *    and it's in first place GNU C users who will appreciate this
 *    assembler implementation:-)
 */

/*
 * Revision history.
 *
 * 1.0  - initial release;
 * 1.1  - new loop unrolling model(*);
 *      - some more fine tuning;
 * 1.2  - made gas friendly;
 *      - updates to documentation concerning v9;
 *      - new performance comparison matrix;
 * 1.3  - fixed problem with /usr/ccs/lib/cpp;
 * 1.4  - native V9 bn_*_comba[48] implementation (15% more efficient)
 *        resulting in slight overall performance kick;
 *      - some retunes;
 *      - support for GNU as added;
 *
 * (*)  Originally unrolled loop looked like this:
 *          for (;;) {
 *              op(p+0); if (--n==0) break;
 *              op(p+1); if (--n==0) break;
 *              op(p+2); if (--n==0) break;
 *              op(p+3); if (--n==0) break;
 *              p+=4;
 *          }
 *      I unroll according to following:
 *          while (n&~3) {
 *              op(p+0); op(p+1); op(p+2); op(p+3);
 *              p+=4; n-=4;
 *          }
 *          if (n) {
 *              op(p+0); if (--n==0) return;
 *              op(p+1); if (--n==0) return;
 *              op(p+2); return;
 *          }
 */

#if defined(__SUNPRO_C) && defined(__sparcv9)
  /* They've said -xarch=v9 at command line */
  .register	%g2,#scratch
  .register	%g3,#scratch
# define	FRAME_SIZE	-192
#elif defined(__GNUC__) && defined(__arch64__)
  /* They've said -m64 at command line */
  .register	%g2,#scratch
  .register	%g3,#scratch
# define	FRAME_SIZE	-192
#else
# define	FRAME_SIZE	-96
#endif
/*
 * GNU assembler can't stand stuw:-(
 */
#define stuw st

.section	".text",#alloc,#execinstr
.file		"bn_asm.sparc.v8plus.S"

.align	32

.global bn_mul_add_words
/*
 * BN_ULONG bn_mul_add_words(rp,ap,num,w)
 * BN_ULONG *rp,*ap;
 * int num;
 * BN_ULONG w;
 */
bn_mul_add_words:
	sra	%o2,%g0,%o2	! signx %o2
	brgz,a	%o2,.L_bn_mul_add_words_proceed
	lduw	[%o1],%g2
	retl
	clr	%o0
	nop
	nop
	nop

.L_bn_mul_add_words_proceed:
	srl	%o3,%g0,%o3	! clruw	%o3
	andcc	%o2,-4,%g0
	bz,pn	%icc,.L_bn_mul_add_words_tail
	clr	%o5

.L_bn_mul_add_words_loop:	! wow! 32 aligned!
	lduw	[%o0],%g1
	lduw	[%o1+4],%g3
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	nop
	add	%o4,%g2,%o4
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5

	lduw	[%o0+4],%g1
	lduw	[%o1+8],%g2
	mulx	%o3,%g3,%g3
	add	%g1,%o5,%o4
	dec	4,%o2
	add	%o4,%g3,%o4
	stuw	%o4,[%o0+4]
	srlx	%o4,32,%o5

	lduw	[%o0+8],%g1
	lduw	[%o1+12],%g3
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	inc	16,%o1
	add	%o4,%g2,%o4
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5

	lduw	[%o0+12],%g1
	mulx	%o3,%g3,%g3
	add	%g1,%o5,%o4
	inc	16,%o0
	add	%o4,%g3,%o4
	andcc	%o2,-4,%g0
	stuw	%o4,[%o0-4]
	srlx	%o4,32,%o5
	bnz,a,pt	%icc,.L_bn_mul_add_words_loop
	lduw	[%o1],%g2

	brnz,a,pn	%o2,.L_bn_mul_add_words_tail
	lduw	[%o1],%g2
.L_bn_mul_add_words_return:
	retl
	mov	%o5,%o0

.L_bn_mul_add_words_tail:
	lduw	[%o0],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	dec	%o2
	add	%o4,%g2,%o4
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_add_words_return
	stuw	%o4,[%o0]

	lduw	[%o1+4],%g2
	lduw	[%o0+4],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	dec	%o2
	add	%o4,%g2,%o4
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_add_words_return
	stuw	%o4,[%o0+4]

	lduw	[%o1+8],%g2
	lduw	[%o0+8],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	add	%o4,%g2,%o4
	stuw	%o4,[%o0+8]
	retl
	srlx	%o4,32,%o0

.type	bn_mul_add_words,#function
.size	bn_mul_add_words,(.-bn_mul_add_words)

.align	32

.global bn_mul_words
/*
 * BN_ULONG bn_mul_words(rp,ap,num,w)
 * BN_ULONG *rp,*ap;
 * int num;
 * BN_ULONG w;
 */
bn_mul_words:
	sra	%o2,%g0,%o2	! signx %o2
	brgz,a	%o2,.L_bn_mul_words_proceed
	lduw	[%o1],%g2
	retl
	clr	%o0
	nop
	nop
	nop

.L_bn_mul_words_proceed:
	srl	%o3,%g0,%o3	! clruw	%o3
	andcc	%o2,-4,%g0
	bz,pn	%icc,.L_bn_mul_words_tail
	clr	%o5

.L_bn_mul_words_loop:	! wow! 32 aligned!
	lduw	[%o1+4],%g3
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	nop
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5

	lduw	[%o1+8],%g2
	mulx	%o3,%g3,%g3
	add	%g3,%o5,%o4
	dec	4,%o2
	stuw	%o4,[%o0+4]
	srlx	%o4,32,%o5

	lduw	[%o1+12],%g3
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	inc	16,%o1
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5

	mulx	%o3,%g3,%g3
	add	%g3,%o5,%o4
	inc	16,%o0
	stuw	%o4,[%o0-4]
	srlx	%o4,32,%o5
	andcc	%o2,-4,%g0
	bnz,a,pt	%icc,.L_bn_mul_words_loop
	lduw	[%o1],%g2
	nop
	nop

	brnz,a,pn	%o2,.L_bn_mul_words_tail
	lduw	[%o1],%g2
.L_bn_mul_words_return:
	retl
	mov	%o5,%o0

.L_bn_mul_words_tail:
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	dec	%o2
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_words_return
	stuw	%o4,[%o0]

	lduw	[%o1+4],%g2
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	dec	%o2
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_words_return
	stuw	%o4,[%o0+4]

	lduw	[%o1+8],%g2
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	stuw	%o4,[%o0+8]
	retl
	srlx	%o4,32,%o0

.type	bn_mul_words,#function
.size	bn_mul_words,(.-bn_mul_words)

.align	32
.global	bn_sqr_words
/*
 * void bn_sqr_words(r,a,n)
 * BN_ULONG *r,*a;
 * int n;
 */
bn_sqr_words:
	sra	%o2,%g0,%o2	! signx %o2
	brgz,a	%o2,.L_bn_sqr_words_proceed
	lduw	[%o1],%g2
	retl
	clr	%o0
	nop
	nop
	nop

.L_bn_sqr_words_proceed:
	andcc	%o2,-4,%g0
	nop
	bz,pn	%icc,.L_bn_sqr_words_tail
	nop

.L_bn_sqr_words_loop:	! wow! 32 aligned!
	lduw	[%o1+4],%g3
	mulx	%g2,%g2,%o4
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5
	stuw	%o5,[%o0+4]
	nop

	lduw	[%o1+8],%g2
	mulx	%g3,%g3,%o4
	dec	4,%o2
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5
	stuw	%o5,[%o0+12]

	lduw	[%o1+12],%g3
	mulx	%g2,%g2,%o4
	srlx	%o4,32,%o5
	stuw	%o4,[%o0+16]
	inc	16,%o1
	stuw	%o5,[%o0+20]

	mulx	%g3,%g3,%o4
	inc	32,%o0
	stuw	%o4,[%o0-8]
	srlx	%o4,32,%o5
	andcc	%o2,-4,%g2
	stuw	%o5,[%o0-4]
	bnz,a,pt	%icc,.L_bn_sqr_words_loop
	lduw	[%o1],%g2
	nop

	brnz,a,pn	%o2,.L_bn_sqr_words_tail
	lduw	[%o1],%g2
.L_bn_sqr_words_return:
	retl
	clr	%o0

.L_bn_sqr_words_tail:
	mulx	%g2,%g2,%o4
	dec	%o2
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_sqr_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+4],%g2
	mulx	%g2,%g2,%o4
	dec	%o2
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_sqr_words_return
	stuw	%o5,[%o0+12]

	lduw	[%o1+8],%g2
	mulx	%g2,%g2,%o4
	srlx	%o4,32,%o5
	stuw	%o4,[%o0+16]
	stuw	%o5,[%o0+20]
	retl
	clr	%o0

.type	bn_sqr_words,#function
.size	bn_sqr_words,(.-bn_sqr_words)

.align	32
.global bn_div_words
/*
 * BN_ULONG bn_div_words(h,l,d)
 * BN_ULONG h,l,d;
 */
bn_div_words:
	sllx	%o0,32,%o0
	or	%o0,%o1,%o0
	udivx	%o0,%o2,%o0
	retl
	srl	%o0,%g0,%o0	! clruw	%o0

.type	bn_div_words,#function
.size	bn_div_words,(.-bn_div_words)

.align	32

.global bn_add_words
/*
 * BN_ULONG bn_add_words(rp,ap,bp,n)
 * BN_ULONG *rp,*ap,*bp;
 * int n;
 */
bn_add_words:
	sra	%o3,%g0,%o3	! signx %o3
	brgz,a	%o3,.L_bn_add_words_proceed
	lduw	[%o1],%o4
	retl
	clr	%o0

.L_bn_add_words_proceed:
	andcc	%o3,-4,%g0
	bz,pn	%icc,.L_bn_add_words_tail
	addcc	%g0,0,%g0	! clear carry flag

.L_bn_add_words_loop:	! wow! 32 aligned!
	dec	4,%o3
	lduw	[%o2],%o5
	lduw	[%o1+4],%g1
	lduw	[%o2+4],%g2
	lduw	[%o1+8],%g3
	lduw	[%o2+8],%g4
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0]

	lduw	[%o1+12],%o4
	lduw	[%o2+12],%o5
	inc	16,%o1
	addccc	%g1,%g2,%g1
	stuw	%g1,[%o0+4]

	inc	16,%o2
	addccc	%g3,%g4,%g3
	stuw	%g3,[%o0+8]

	inc	16,%o0
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0-4]
	and	%o3,-4,%g1
	brnz,a,pt	%g1,.L_bn_add_words_loop
	lduw	[%o1],%o4

	brnz,a,pn	%o3,.L_bn_add_words_tail
	lduw	[%o1],%o4
.L_bn_add_words_return:
	clr	%o0
	retl
	movcs	%icc,1,%o0
	nop

.L_bn_add_words_tail:
	lduw	[%o2],%o5
	dec	%o3
	addccc	%o5,%o4,%o5
	brz,pt	%o3,.L_bn_add_words_return
	stuw	%o5,[%o0]

	lduw	[%o1+4],%o4
	lduw	[%o2+4],%o5
	dec	%o3
	addccc	%o5,%o4,%o5
	brz,pt	%o3,.L_bn_add_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+8],%o4
	lduw	[%o2+8],%o5
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0+8]
	clr	%o0
	retl
	movcs	%icc,1,%o0

.type	bn_add_words,#function
.size	bn_add_words,(.-bn_add_words)

.global bn_sub_words
/*
 * BN_ULONG bn_sub_words(rp,ap,bp,n)
 * BN_ULONG *rp,*ap,*bp;
 * int n;
 */
bn_sub_words:
	sra	%o3,%g0,%o3	! signx %o3
	brgz,a	%o3,.L_bn_sub_words_proceed
	lduw	[%o1],%o4
	retl
	clr	%o0

.L_bn_sub_words_proceed:
	andcc	%o3,-4,%g0
	bz,pn	%icc,.L_bn_sub_words_tail
	addcc	%g0,0,%g0	! clear carry flag

.L_bn_sub_words_loop:	! wow! 32 aligned!
	dec	4,%o3
	lduw	[%o2],%o5
	lduw	[%o1+4],%g1
	lduw	[%o2+4],%g2
	lduw	[%o1+8],%g3
	lduw	[%o2+8],%g4
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0]

	lduw	[%o1+12],%o4
	lduw	[%o2+12],%o5
	inc	16,%o1
	subccc	%g1,%g2,%g2
	stuw	%g2,[%o0+4]

	inc	16,%o2
	subccc	%g3,%g4,%g4
	stuw	%g4,[%o0+8]

	inc	16,%o0
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0-4]
	and	%o3,-4,%g1
	brnz,a,pt	%g1,.L_bn_sub_words_loop
	lduw	[%o1],%o4

	brnz,a,pn	%o3,.L_bn_sub_words_tail
	lduw	[%o1],%o4
.L_bn_sub_words_return:
	clr	%o0
	retl
	movcs	%icc,1,%o0
	nop

.L_bn_sub_words_tail:	! wow! 32 aligned!
	lduw	[%o2],%o5
	dec	%o3
	subccc	%o4,%o5,%o5
	brz,pt	%o3,.L_bn_sub_words_return
	stuw	%o5,[%o0]

	lduw	[%o1+4],%o4
	lduw	[%o2+4],%o5
	dec	%o3
	subccc	%o4,%o5,%o5
	brz,pt	%o3,.L_bn_sub_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+8],%o4
	lduw	[%o2+8],%o5
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0+8]
	clr	%o0
	retl
	movcs	%icc,1,%o0

.type	bn_sub_words,#function
.size	bn_sub_words,(.-bn_sub_words)

/*
 * Code below depends on the fact that upper parts of the %l0-%l7
 * and %i0-%i7 are zeroed by kernel after context switch. In
 * previous versions this comment stated that "the trouble is that
 * it's not feasible to implement the mumbo-jumbo in less V9
 * instructions:-(" which apparently isn't true thanks to
 * 'bcs,a %xcc,.+8; inc %rd' pair. But the performance improvement
 * results not from the shorter code, but from elimination of
 * multicycle non-pairable 'rd %y,%rd' instructions.
 *
 *							Andy.
 */

/*
 * Here is register usage map for *all* routines below.
 */
#define t_1	%o0
#define	t_2	%o1
#define c_12	%o2
#define c_3	%o3

#define ap(I)	[%i1+4*I]
#define bp(I)	[%i2+4*I]
#define rp(I)	[%i0+4*I]

#define	a_0	%l0
#define	a_1	%l1
#define	a_2	%l2
#define	a_3	%l3
#define	a_4	%l4
#define	a_5	%l5
#define	a_6	%l6
#define	a_7	%l7

#define	b_0	%i3
#define	b_1	%i4
#define	b_2	%i5
#define	b_3	%o4
#define	b_4	%o5
#define	b_5	%o7
#define	b_6	%g1
#define	b_7	%g4

.align	32
.global bn_mul_comba8
/*
 * void bn_mul_comba8(r,a,b)
 * BN_ULONG *r,*a,*b;
 */
bn_mul_comba8:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2
	lduw	bp(0),b_0	!=
	lduw	bp(1),b_1
	mulx	a_0,b_0,t_1	!mul_add_c(a[0],b[0],c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!=!r[0]=c1;

	lduw	ap(1),a_1
	mulx	a_0,b_1,t_1	!mul_add_c(a[0],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(2),a_2
	mulx	a_1,b_0,t_1	!=!mul_add_c(a[1],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_0,t_1	!mul_add_c(a[2],b[0],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(2),b_2	!=
	mulx	a_1,b_1,t_1	!mul_add_c(a[1],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(3),b_3
	mulx	a_0,b_2,t_1	!mul_add_c(a[0],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_3,t_1	!mul_add_c(a[0],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_2,t_1	!=!mul_add_c(a[1],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_2,b_1,t_1	!mul_add_c(a[2],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(4),a_4
	mulx	a_3,b_0,t_1	!=!mul_add_c(a[3],b[0],c1,c2,c3);!=
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_4,b_0,t_1	!mul_add_c(a[4],b[0],c2,c3,c1);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_1,t_1	!=!mul_add_c(a[3],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_2,t_1	!=!mul_add_c(a[2],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(4),b_4	!=
	mulx	a_1,b_3,t_1	!mul_add_c(a[1],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(5),b_5
	mulx	a_0,b_4,t_1	!mul_add_c(a[0],b[4],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_5,t_1	!mul_add_c(a[0],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_4,t_1	!mul_add_c(a[1],b[4],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_2,b_3,t_1	!mul_add_c(a[2],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_2,t_1	!mul_add_c(a[3],b[2],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(5),a_5
	mulx	a_4,b_1,t_1	!mul_add_c(a[4],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(6),a_6
	mulx	a_5,b_0,t_1	!=!mul_add_c(a[5],b[0],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_6,b_0,t_1	!mul_add_c(a[6],b[0],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,b_1,t_1	!=!mul_add_c(a[5],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,b_2,t_1	!=!mul_add_c(a[4],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_3,t_1	!=!mul_add_c(a[3],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_4,t_1	!=!mul_add_c(a[2],b[4],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(6),b_6	!=
	mulx	a_1,b_5,t_1	!mul_add_c(a[1],b[5],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(7),b_7
	mulx	a_0,b_6,t_1	!mul_add_c(a[0],b[6],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_7,t_1	!mul_add_c(a[0],b[7],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_6,t_1	!mul_add_c(a[1],b[6],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_2,b_5,t_1	!mul_add_c(a[2],b[5],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_4,t_1	!mul_add_c(a[3],b[4],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_4,b_3,t_1	!mul_add_c(a[4],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_2,t_1	!mul_add_c(a[5],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(7),a_7
	mulx	a_6,b_1,t_1	!=!mul_add_c(a[6],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_7,b_0,t_1	!=!mul_add_c(a[7],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(7)	!r[7]=c2;
	or	c_12,c_3,c_12

	mulx	a_7,b_1,t_1	!=!mul_add_c(a[7],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_6,b_2,t_1	!mul_add_c(a[6],b[2],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_5,b_3,t_1	!mul_add_c(a[5],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_4,b_4,t_1	!mul_add_c(a[4],b[4],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_3,b_5,t_1	!mul_add_c(a[3],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_2,b_6,t_1	!mul_add_c(a[2],b[6],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_1,b_7,t_1	!mul_add_c(a[1],b[7],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	srlx	t_1,32,c_12
	stuw	t_1,rp(8)	!r[8]=c3;
	or	c_12,c_3,c_12

	mulx	a_2,b_7,t_1	!=!mul_add_c(a[2],b[7],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_3,b_6,t_1	!mul_add_c(a[3],b[6],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_4,b_5,t_1	!mul_add_c(a[4],b[5],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_4,t_1	!mul_add_c(a[5],b[4],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_3,t_1	!mul_add_c(a[6],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_7,b_2,t_1	!mul_add_c(a[7],b[2],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(9)	!r[9]=c1;
	or	c_12,c_3,c_12	!=

	mulx	a_7,b_3,t_1	!mul_add_c(a[7],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_4,t_1	!mul_add_c(a[6],b[4],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_5,t_1	!mul_add_c(a[5],b[5],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_4,b_6,t_1	!mul_add_c(a[4],b[6],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_7,t_1	!mul_add_c(a[3],b[7],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(10)	!r[10]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_4,b_7,t_1	!mul_add_c(a[4],b[7],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_6,t_1	!mul_add_c(a[5],b[6],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_5,t_1	!mul_add_c(a[6],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_7,b_4,t_1	!mul_add_c(a[7],b[4],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(11)	!r[11]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_7,b_5,t_1	!mul_add_c(a[7],b[5],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_6,t_1	!mul_add_c(a[6],b[6],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_7,t_1	!mul_add_c(a[5],b[7],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(12)	!r[12]=c1;
	or	c_12,c_3,c_12	!=

	mulx	a_6,b_7,t_1	!mul_add_c(a[6],b[7],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_7,b_6,t_1	!mul_add_c(a[7],b[6],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	st	t_1,rp(13)	!r[13]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_7,b_7,t_1	!mul_add_c(a[7],b[7],c3,c1,c2);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(14)	!r[14]=c3;
	stuw	c_12,rp(15)	!r[15]=c1;

	ret
	restore	%g0,%g0,%o0	!=

.type	bn_mul_comba8,#function
.size	bn_mul_comba8,(.-bn_mul_comba8)

.align	32

.global bn_mul_comba4
/*
 * void bn_mul_comba4(r,a,b)
 * BN_ULONG *r,*a,*b;
 */
bn_mul_comba4:
	save	%sp,FRAME_SIZE,%sp
	lduw	ap(0),a_0
	mov	1,t_2
	lduw	bp(0),b_0
	sllx	t_2,32,t_2	!=
	lduw	bp(1),b_1
	mulx	a_0,b_0,t_1	!mul_add_c(a[0],b[0],c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!=!r[0]=c1;

	lduw	ap(1),a_1
	mulx	a_0,b_1,t_1	!mul_add_c(a[0],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(2),a_2
	mulx	a_1,b_0,t_1	!=!mul_add_c(a[1],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_0,t_1	!mul_add_c(a[2],b[0],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(2),b_2	!=
	mulx	a_1,b_1,t_1	!mul_add_c(a[1],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(3),b_3
	mulx	a_0,b_2,t_1	!mul_add_c(a[0],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_3,t_1	!mul_add_c(a[0],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_2,t_1	!mul_add_c(a[1],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_2,b_1,t_1	!mul_add_c(a[2],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_0,t_1	!mul_add_c(a[3],b[0],c1,c2,c3);!=
	addcc	c_12,t_1,t_1	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(3)	!=!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_3,b_1,t_1	!mul_add_c(a[3],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_2,t_1	!mul_add_c(a[2],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,b_3,t_1	!mul_add_c(a[1],b[3],c2,c3,c1);
	addcc	c_12,t_1,t_1	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!=!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_3,t_1	!mul_add_c(a[2],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_2,t_1	!mul_add_c(a[3],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!=!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_3,b_3,t_1	!mul_add_c(a[3],b[3],c1,c2,c3);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(6)	!r[6]=c1;
	stuw	c_12,rp(7)	!r[7]=c2;

	ret
	restore	%g0,%g0,%o0

.type	bn_mul_comba4,#function
.size	bn_mul_comba4,(.-bn_mul_comba4)

.align	32

.global bn_sqr_comba8
bn_sqr_comba8:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2
	lduw	ap(1),a_1
	mulx	a_0,a_0,t_1	!sqr_add_c(a,0,c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!r[0]=c1;

	lduw	ap(2),a_2
	mulx	a_0,a_1,t_1	!=!sqr_add_c2(a,1,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,a_0,t_1	!sqr_add_c2(a,2,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_1,a_1,t_1	!sqr_add_c(a,1,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12

	mulx	a_0,a_3,t_1	!sqr_add_c2(a,3,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(4),a_4
	mulx	a_1,a_2,t_1	!sqr_add_c2(a,2,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	st	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_4,a_0,t_1	!sqr_add_c2(a,4,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,a_1,t_1	!sqr_add_c2(a,3,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(5),a_5
	mulx	a_2,a_2,t_1	!sqr_add_c(a,2,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_0,a_5,t_1	!sqr_add_c2(a,5,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_4,t_1	!sqr_add_c2(a,4,1,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(6),a_6
	mulx	a_2,a_3,t_1	!sqr_add_c2(a,3,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_6,a_0,t_1	!sqr_add_c2(a,6,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_1,t_1	!sqr_add_c2(a,5,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,a_2,t_1	!sqr_add_c2(a,4,2,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(7),a_7
	mulx	a_3,a_3,t_1	!=!sqr_add_c(a,3,c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	or	c_12,c_3,c_12

	mulx	a_0,a_7,t_1	!sqr_add_c2(a,7,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_6,t_1	!sqr_add_c2(a,6,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,a_5,t_1	!sqr_add_c2(a,5,2,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,a_4,t_1	!sqr_add_c2(a,4,3,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(7)	!r[7]=c2;
	or	c_12,c_3,c_12

	mulx	a_7,a_1,t_1	!sqr_add_c2(a,7,1,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_6,a_2,t_1	!sqr_add_c2(a,6,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_3,t_1	!sqr_add_c2(a,5,3,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,a_4,t_1	!sqr_add_c(a,4,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(8)	!r[8]=c3;
	or	c_12,c_3,c_12

	mulx	a_2,a_7,t_1	!sqr_add_c2(a,7,2,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,a_6,t_1	!sqr_add_c2(a,6,3,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,a_5,t_1	!sqr_add_c2(a,5,4,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(9)	!r[9]=c1;
	or	c_12,c_3,c_12

	mulx	a_7,a_3,t_1	!sqr_add_c2(a,7,3,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_6,a_4,t_1	!sqr_add_c2(a,6,4,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_5,t_1	!sqr_add_c(a,5,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(10)	!r[10]=c2;
	or	c_12,c_3,c_12

	mulx	a_4,a_7,t_1	!sqr_add_c2(a,7,4,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_6,t_1	!sqr_add_c2(a,6,5,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(11)	!r[11]=c3;
	or	c_12,c_3,c_12

	mulx	a_7,a_5,t_1	!sqr_add_c2(a,7,5,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_6,a_6,t_1	!sqr_add_c(a,6,c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(12)	!r[12]=c1;
	or	c_12,c_3,c_12

	mulx	a_6,a_7,t_1	!sqr_add_c2(a,7,6,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(13)	!r[13]=c2;
	or	c_12,c_3,c_12

	mulx	a_7,a_7,t_1	!sqr_add_c(a,7,c3,c1,c2);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12
	stuw	t_1,rp(14)	!r[14]=c3;
	stuw	c_12,rp(15)	!r[15]=c1;

	ret
	restore	%g0,%g0,%o0

.type	bn_sqr_comba8,#function
.size	bn_sqr_comba8,(.-bn_sqr_comba8)

.align	32

.global bn_sqr_comba4
/*
 * void bn_sqr_comba4(r,a)
 * BN_ULONG *r,*a;
 */
bn_sqr_comba4:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2
	lduw	ap(1),a_1
	mulx	a_0,a_0,t_1	!sqr_add_c(a,0,c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!r[0]=c1;

	lduw	ap(2),a_2
	mulx	a_0,a_1,t_1	!sqr_add_c2(a,1,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,a_0,t_1	!sqr_add_c2(a,2,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_1,a_1,t_1	!sqr_add_c(a,1,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12

	mulx	a_0,a_3,t_1	!sqr_add_c2(a,3,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_2,t_1	!sqr_add_c2(a,2,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_3,a_1,t_1	!sqr_add_c2(a,3,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,a_2,t_1	!sqr_add_c(a,2,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,a_3,t_1	!sqr_add_c2(a,3,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_3,a_3,t_1	!sqr_add_c(a,3,c1,c2,c3);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	stuw	c_12,rp(7)	!r[7]=c2;

	ret
	restore	%g0,%g0,%o0

.type	bn_sqr_comba4,#function
.size	bn_sqr_comba4,(.-bn_sqr_comba4)

.align	32
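
/*
 * For orientation only: a minimal C sketch of the result bn_sqr_comba4
 * is expected to produce, assuming 32-bit BN_ULONG words. It is not
 * part of this module, and ref_sqr_comba4 is a hypothetical name used
 * for illustration. The 64-bit accumulator plus carry counter loosely
 * mirrors the c_12/c_3 pair above: every 64-bit carry-out adds t_2
 * (1<<32) to c_3, and after each column the sum is shifted down by one
 * word (srlx t_1,32,c_12) and the saved carries are merged back in
 * (or c_12,c_3,c_12). Visiting both (i,j) and (j,i) below plays the
 * role of the doubled add in sqr_add_c2.

```c
#include <stdint.h>

// Hypothetical reference model: squares the 4-word little-endian
// number a[0..3] into the 8-word result r[0..7], one column at a time.
static void ref_sqr_comba4(uint32_t r[8], const uint32_t a[4])
{
    uint64_t acc = 0;    // low 64 bits of the running column sum
    uint32_t carry = 0;  // number of times acc wrapped past 64 bits

    for (int k = 0; k < 7; k++) {
        for (int i = 0; i < 4; i++) {
            int j = k - i;
            if (j < 0 || j > 3)
                continue;
            uint64_t p = (uint64_t)a[i] * a[j];
            acc += p;
            if (acc < p)     // unsigned wraparound means a carry-out
                carry++;
        }
        r[k] = (uint32_t)acc;                         // store the low word
        acc = (acc >> 32) | ((uint64_t)carry << 32);  // move to next column
        carry = 0;
    }
    r[7] = (uint32_t)acc;    // top word of the 256-bit square
}
```

 * Comparing r[0..7] from such a model against the output of the
 * assembly routine for a handful of inputs (all-ones words are a
 * good stress case for the carry path) is one way to sanity-check
 * a port.
 */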