.ident	"sparcv8plus.s, Version 1.4"
.ident	"SPARC v9 ISA artwork by Andy Polyakov <appro@openssl.org>"

/*
 * ====================================================================
 * Copyright 1999-2016 The OpenSSL Project Authors. All Rights Reserved.
 *
 * Licensed under the Apache License 2.0 (the "License").  You may not use
 * this file except in compliance with the License.  You can obtain a copy
 * in the file LICENSE in the source distribution or at
 * https://www.openssl.org/source/license.html
 * ====================================================================
 */

/*
 * This is my modest contribution to the OpenSSL project (see
 * http://www.openssl.org/ for more information about it) and is
 * a drop-in UltraSPARC ISA replacement for the crypto/bn/bn_asm.c
 * module. For updates see http://fy.chalmers.se/~appro/hpe/.
 *
 * Questions-n-answers.
 *
 * Q. How to compile?
 * A. With SC4.x/SC5.x:
 *
 *	cc -xarch=v8plus -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *
 *    and with gcc:
 *
 *	gcc -mcpu=ultrasparc -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *
 *    or if the above fails (it does if you have gas installed):
 *
 *	gcc -E bn_asm.sparc.v8plus.S | as -xarch=v8plus /dev/fd/0 -o bn_asm.o
 *
 *    A quick-n-dirty way to fuse the module into the library,
 *    provided that the library is already configured and built
 *    (in the 0.9.2 case with the no-asm option):
 *
 *	# cd crypto/bn
 *	# cp /some/place/bn_asm.sparc.v8plus.S .
 *	# cc -xarch=v8plus -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *	# make
 *	# cd ../..
 *	# make; make test
 *
 *    A quick-n-dirty way to get rid of it:
 *
 *	# cd crypto/bn
 *	# touch bn_asm.c
 *	# make
 *	# cd ../..
 *	# make; make test
 *
 * Q. V8plus architecture? What kind of beast is that?
 * A. Well, it's a programming model rather than an architecture...
 *    It's actually a v9-compliant (i.e. *any* UltraSPARC) CPU under
 *    special conditions, namely when the kernel doesn't preserve the
 *    upper 32 bits of otherwise 64-bit registers during a context
 *    switch.
 *
 * Q. Why just UltraSPARC? What about SuperSPARC?
 * A. The original release targeted UltraSPARC only. A SuperSPARC
 *    version is now provided alongside. Both versions share the
 *    bn_*comba[48] implementations (see the comment later in the code
 *    for an explanation). But what's so special about this UltraSPARC
 *    implementation? Why didn't I let the compiler do the job? The
 *    trouble is that most of the available compilers (well, SC5.0 is
 *    the only exception) don't attempt to take advantage of
 *    UltraSPARC's 64-bitness under 32-bit kernels even though it's
 *    perfectly possible (see the next question).
 *
 * Q. 64-bit registers under 32-bit kernels? Didn't you just say it
 *    doesn't work?
 * A. You can't address *all* registers as 64-bit wide:-( The catch is
 *    that you actually may rely upon %o0-%o5 and %g1-%g4 being fully
 *    preserved if you're in a leaf function, i.e. one that never calls
 *    any other function. All functions in this module are leaf and
 *    10 registers is a handful. And as a matter of fact the
 *    non-"comba" routines don't even require that much, and I could
 *    afford not to allocate a stack frame of their own for 'em:-)
 *
 * Q. What about 64-bit kernels?
 * A. What about 'em? Just kidding:-) A pure 64-bit version is
 *    currently under evaluation and development...
 *
 * Q. What about shared libraries?
 * A. What about 'em? Kidding again:-) The code does *not* contain any
 *    position dependencies and it's safe to include it in a shared
 *    library as is.
 *
 * Q. How much faster does it go?
 * A. Do you have a good benchmark? In any case, below is what I
 *    experience with the crypto/bn/expspeed.c test program:
 *
 *	v8plus module on U10/300MHz against bn_asm.c compiled with:
 *
 *	cc-5.0 -xarch=v8plus -xO5 -xdepend	+7-12%
 *	cc-4.2 -xarch=v8plus -xO5 -xdepend	+25-35%
 *	egcs-1.1.2 -mcpu=ultrasparc -O3		+35-45%
 *
 *	v8 module on SS10/60MHz against bn_asm.c compiled with:
 *
 *	cc-5.0 -xarch=v8 -xO5 -xdepend		+7-10%
 *	cc-4.2 -xarch=v8 -xO5 -xdepend		+10%
 *	egcs-1.1.2 -mv8 -O3			+35-45%
 *
 *    As you can see it's damn hard to beat the new Sun C compiler,
 *    and it's first and foremost GNU C users who will appreciate this
 *    assembler implementation:-)
 */

/*
 * Revision history.
 *
 * 1.0	- initial release;
 * 1.1	- new loop unrolling model(*);
 *	- some more fine tuning;
 * 1.2	- made gas friendly;
 *	- updates to documentation concerning v9;
 *	- new performance comparison matrix;
 * 1.3	- fixed problem with /usr/ccs/lib/cpp;
 * 1.4	- native V9 bn_*_comba[48] implementation (15% more efficient)
 *	  resulting in slight overall performance kick;
 *	- some retunes;
 *	- support for GNU as added;
 *
 * (*)	Originally the unrolled loop looked like this:
 *	    for (;;) {
 *		op(p+0); if (--n==0) break;
 *		op(p+1); if (--n==0) break;
 *		op(p+2); if (--n==0) break;
 *		op(p+3); if (--n==0) break;
 *		p+=4;
 *	    }
 *	I unroll according to the following:
 *	    while (n&~3) {
 *		op(p+0); op(p+1); op(p+2); op(p+3);
 *		p+=4; n-=4;
 *	    }
 *	    if (n) {
 *		op(p+0); if (--n==0) return;
 *		op(p+1); if (--n==0) return;
 *		op(p+2); return;
 *	    }
 */

#if defined(__SUNPRO_C) && defined(__sparcv9)
  /* They've said -xarch=v9 at command line */
  .register	%g2,#scratch
  .register	%g3,#scratch
# define	FRAME_SIZE	-192
#elif defined(__GNUC__) && defined(__arch64__)
  /* They've said -m64 at command line */
  .register	%g2,#scratch
  .register	%g3,#scratch
# define	FRAME_SIZE	-192
#else
# define	FRAME_SIZE	-96
#endif
/*
 * GNU assembler can't stand stuw:-(
 */
#define stuw st

.section	".text",#alloc,#execinstr
.file		"bn_asm.sparc.v8plus.S"

.align	32

.global bn_mul_add_words
/*
 * BN_ULONG bn_mul_add_words(rp,ap,num,w)
 * BN_ULONG *rp,*ap;
 * int num;
 * BN_ULONG w;
 */
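/*
 * Editor's reference sketch, not part of the original module: a plain
 * C model of what the routine below appears to compute, assuming
 * BN_ULONG is a 32-bit unsigned type (as the lduw/stuw accesses
 * suggest). The name ref_mul_add_words and the <stdint.h> types are
 * illustrative only.
 *
 *	uint32_t ref_mul_add_words(uint32_t *rp, const uint32_t *ap,
 *				   int num, uint32_t w)
 *	{
 *	    uint64_t c = 0;			// carry, kept in %o5
 *	    while (num-- > 0) {
 *		uint64_t t = (uint64_t)w * *ap++ + *rp + c;
 *		*rp++ = (uint32_t)t;		// low word goes to memory
 *		c = t >> 32;			// high word is the next carry
 *	    }
 *	    return (uint32_t)c;			// 0 when num <= 0
 *	}
 *
 * The sum cannot overflow 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64-1.
 */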
bn_mul_add_words:
	sra	%o2,%g0,%o2	! signx %o2
	brgz,a	%o2,.L_bn_mul_add_words_proceed
	lduw	[%o1],%g2
	retl
	clr	%o0
	nop
	nop
	nop

.L_bn_mul_add_words_proceed:
	srl	%o3,%g0,%o3	! clruw	%o3
	andcc	%o2,-4,%g0
	bz,pn	%icc,.L_bn_mul_add_words_tail
	clr	%o5

.L_bn_mul_add_words_loop:	! wow! 32 aligned!
	lduw	[%o0],%g1
	lduw	[%o1+4],%g3
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	nop
	add	%o4,%g2,%o4
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5

	lduw	[%o0+4],%g1
	lduw	[%o1+8],%g2
	mulx	%o3,%g3,%g3
	add	%g1,%o5,%o4
	dec	4,%o2
	add	%o4,%g3,%o4
	stuw	%o4,[%o0+4]
	srlx	%o4,32,%o5

	lduw	[%o0+8],%g1
	lduw	[%o1+12],%g3
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	inc	16,%o1
	add	%o4,%g2,%o4
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5

	lduw	[%o0+12],%g1
	mulx	%o3,%g3,%g3
	add	%g1,%o5,%o4
	inc	16,%o0
	add	%o4,%g3,%o4
	andcc	%o2,-4,%g0
	stuw	%o4,[%o0-4]
	srlx	%o4,32,%o5
	bnz,a,pt	%icc,.L_bn_mul_add_words_loop
	lduw	[%o1],%g2

	brnz,a,pn	%o2,.L_bn_mul_add_words_tail
	lduw	[%o1],%g2
.L_bn_mul_add_words_return:
	retl
	mov	%o5,%o0

.L_bn_mul_add_words_tail:
	lduw	[%o0],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	dec	%o2
	add	%o4,%g2,%o4
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_add_words_return
	stuw	%o4,[%o0]

	lduw	[%o1+4],%g2
	lduw	[%o0+4],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	dec	%o2
	add	%o4,%g2,%o4
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_add_words_return
	stuw	%o4,[%o0+4]

	lduw	[%o1+8],%g2
	lduw	[%o0+8],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	add	%o4,%g2,%o4
	stuw	%o4,[%o0+8]
	retl
	srlx	%o4,32,%o0

.type	bn_mul_add_words,#function
.size	bn_mul_add_words,(.-bn_mul_add_words)

.align	32

.global bn_mul_words
/*
 * BN_ULONG bn_mul_words(rp,ap,num,w)
 * BN_ULONG *rp,*ap;
 * int num;
 * BN_ULONG w;
 */
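/*
 * Editor's reference sketch, not part of the original module: a C
 * model of the routine below under the same assumption of a 32-bit
 * BN_ULONG. The name ref_mul_words is illustrative only.
 *
 *	uint32_t ref_mul_words(uint32_t *rp, const uint32_t *ap,
 *			       int num, uint32_t w)
 *	{
 *	    uint64_t c = 0;			// carry, kept in %o5
 *	    while (num-- > 0) {
 *		uint64_t t = (uint64_t)w * *ap++ + c;
 *		*rp++ = (uint32_t)t;		// low word overwrites rp[i]
 *		c = t >> 32;			// high word is the next carry
 *	    }
 *	    return (uint32_t)c;			// 0 when num <= 0
 *	}
 */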
bn_mul_words:
	sra	%o2,%g0,%o2	! signx %o2
	brgz,a	%o2,.L_bn_mul_words_proceed
	lduw	[%o1],%g2
	retl
	clr	%o0
	nop
	nop
	nop

.L_bn_mul_words_proceed:
	srl	%o3,%g0,%o3	! clruw	%o3
	andcc	%o2,-4,%g0
	bz,pn	%icc,.L_bn_mul_words_tail
	clr	%o5

.L_bn_mul_words_loop:		! wow! 32 aligned!
	lduw	[%o1+4],%g3
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	nop
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5

	lduw	[%o1+8],%g2
	mulx	%o3,%g3,%g3
	add	%g3,%o5,%o4
	dec	4,%o2
	stuw	%o4,[%o0+4]
	srlx	%o4,32,%o5

	lduw	[%o1+12],%g3
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	inc	16,%o1
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5

	mulx	%o3,%g3,%g3
	add	%g3,%o5,%o4
	inc	16,%o0
	stuw	%o4,[%o0-4]
	srlx	%o4,32,%o5
	andcc	%o2,-4,%g0
	bnz,a,pt	%icc,.L_bn_mul_words_loop
	lduw	[%o1],%g2
	nop
	nop

	brnz,a,pn	%o2,.L_bn_mul_words_tail
	lduw	[%o1],%g2
.L_bn_mul_words_return:
	retl
	mov	%o5,%o0

.L_bn_mul_words_tail:
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	dec	%o2
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_words_return
	stuw	%o4,[%o0]

	lduw	[%o1+4],%g2
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	dec	%o2
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_words_return
	stuw	%o4,[%o0+4]

	lduw	[%o1+8],%g2
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	stuw	%o4,[%o0+8]
	retl
	srlx	%o4,32,%o0

.type	bn_mul_words,#function
.size	bn_mul_words,(.-bn_mul_words)

.align  32
.global	bn_sqr_words
/*
 * void bn_sqr_words(r,a,n)
 * BN_ULONG *r,*a;
 * int n;
 */
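/*
 * Editor's reference sketch, not part of the original module: a C
 * model of the routine below, again assuming a 32-bit BN_ULONG. Each
 * input word yields two output words, so r must hold 2*n words. The
 * name ref_sqr_words is illustrative only.
 *
 *	void ref_sqr_words(uint32_t *r, const uint32_t *a, int n)
 *	{
 *	    while (n-- > 0) {
 *		uint64_t t = (uint64_t)*a * *a;	// mulx %g2,%g2,%o4
 *		a++;
 *		*r++ = (uint32_t)t;		// low half of the square
 *		*r++ = (uint32_t)(t >> 32);	// high half of the square
 *	    }
 *	}
 *
 * The assembly additionally leaves 0 in %o0 on return.
 */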
bn_sqr_words:
	sra	%o2,%g0,%o2	! signx %o2
	brgz,a	%o2,.L_bn_sqr_words_proceed
	lduw	[%o1],%g2
	retl
	clr	%o0
	nop
	nop
	nop

.L_bn_sqr_words_proceed:
	andcc	%o2,-4,%g0
	nop
	bz,pn	%icc,.L_bn_sqr_words_tail
	nop

.L_bn_sqr_words_loop:		! wow! 32 aligned!
	lduw	[%o1+4],%g3
	mulx	%g2,%g2,%o4
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5
	stuw	%o5,[%o0+4]
	nop

	lduw	[%o1+8],%g2
	mulx	%g3,%g3,%o4
	dec	4,%o2
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5
	stuw	%o5,[%o0+12]

	lduw	[%o1+12],%g3
	mulx	%g2,%g2,%o4
	srlx	%o4,32,%o5
	stuw	%o4,[%o0+16]
	inc	16,%o1
	stuw	%o5,[%o0+20]

	mulx	%g3,%g3,%o4
	inc	32,%o0
	stuw	%o4,[%o0-8]
	srlx	%o4,32,%o5
	andcc	%o2,-4,%g2
	stuw	%o5,[%o0-4]
	bnz,a,pt	%icc,.L_bn_sqr_words_loop
	lduw	[%o1],%g2
	nop

	brnz,a,pn	%o2,.L_bn_sqr_words_tail
	lduw	[%o1],%g2
.L_bn_sqr_words_return:
	retl
	clr	%o0

.L_bn_sqr_words_tail:
	mulx	%g2,%g2,%o4
	dec	%o2
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_sqr_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+4],%g2
	mulx	%g2,%g2,%o4
	dec	%o2
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_sqr_words_return
	stuw	%o5,[%o0+12]

	lduw	[%o1+8],%g2
	mulx	%g2,%g2,%o4
	srlx	%o4,32,%o5
	stuw	%o4,[%o0+16]
	stuw	%o5,[%o0+20]
	retl
	clr	%o0

.type	bn_sqr_words,#function
.size	bn_sqr_words,(.-bn_sqr_words)

.align	32
.global bn_div_words
/*
 * BN_ULONG bn_div_words(h,l,d)
 * BN_ULONG h,l,d;
 */
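/*
 * Editor's reference sketch, not part of the original module: a C
 * model of the five instructions below, assuming a 32-bit BN_ULONG
 * and a nonzero d. The name ref_div_words is illustrative only.
 *
 *	uint32_t ref_div_words(uint32_t h, uint32_t l, uint32_t d)
 *	{
 *	    uint64_t n = ((uint64_t)h << 32) | l;	// sllx + or
 *	    return (uint32_t)(n / d);			// udivx, then clruw
 *	}
 *
 * The quotient is truncated to its low 32 bits, so the result is only
 * the full quotient when h < d.
 */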
bn_div_words:
	sllx	%o0,32,%o0
	or	%o0,%o1,%o0
	udivx	%o0,%o2,%o0
	retl
	srl	%o0,%g0,%o0	! clruw	%o0

.type	bn_div_words,#function
.size	bn_div_words,(.-bn_div_words)

.align	32

.global bn_add_words
/*
 * BN_ULONG bn_add_words(rp,ap,bp,n)
 * BN_ULONG *rp,*ap,*bp;
 * int n;
 */
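/*
 * Editor's reference sketch, not part of the original module: a C
 * model of the carry chain below, assuming a 32-bit BN_ULONG. The
 * name ref_add_words is illustrative only.
 *
 *	uint32_t ref_add_words(uint32_t *rp, const uint32_t *ap,
 *			       const uint32_t *bp, int n)
 *	{
 *	    uint32_t c = 0;			// the %icc carry flag
 *	    while (n-- > 0) {
 *		uint64_t t = (uint64_t)*ap++ + *bp++ + c;
 *		*rp++ = (uint32_t)t;		// addccc result
 *		c = (uint32_t)(t >> 32);	// 0 or 1
 *	    }
 *	    return c;				// 0 when n <= 0
 *	}
 */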
bn_add_words:
	sra	%o3,%g0,%o3	! signx %o3
	brgz,a	%o3,.L_bn_add_words_proceed
	lduw	[%o1],%o4
	retl
	clr	%o0

.L_bn_add_words_proceed:
	andcc	%o3,-4,%g0
	bz,pn	%icc,.L_bn_add_words_tail
	addcc	%g0,0,%g0	! clear carry flag

.L_bn_add_words_loop:		! wow! 32 aligned!
	dec	4,%o3
	lduw	[%o2],%o5
	lduw	[%o1+4],%g1
	lduw	[%o2+4],%g2
	lduw	[%o1+8],%g3
	lduw	[%o2+8],%g4
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0]

	lduw	[%o1+12],%o4
	lduw	[%o2+12],%o5
	inc	16,%o1
	addccc	%g1,%g2,%g1
	stuw	%g1,[%o0+4]

	inc	16,%o2
	addccc	%g3,%g4,%g3
	stuw	%g3,[%o0+8]

	inc	16,%o0
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0-4]
	and	%o3,-4,%g1
	brnz,a,pt	%g1,.L_bn_add_words_loop
	lduw	[%o1],%o4

	brnz,a,pn	%o3,.L_bn_add_words_tail
	lduw	[%o1],%o4
.L_bn_add_words_return:
	clr	%o0
	retl
	movcs	%icc,1,%o0
	nop

.L_bn_add_words_tail:
	lduw	[%o2],%o5
	dec	%o3
	addccc	%o5,%o4,%o5
	brz,pt	%o3,.L_bn_add_words_return
	stuw	%o5,[%o0]

	lduw	[%o1+4],%o4
	lduw	[%o2+4],%o5
	dec	%o3
	addccc	%o5,%o4,%o5
	brz,pt	%o3,.L_bn_add_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+8],%o4
	lduw	[%o2+8],%o5
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0+8]
	clr	%o0
	retl
	movcs	%icc,1,%o0

.type	bn_add_words,#function
.size	bn_add_words,(.-bn_add_words)

.global bn_sub_words
/*
 * BN_ULONG bn_sub_words(rp,ap,bp,n)
 * BN_ULONG *rp,*ap,*bp;
 * int n;
 */
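/*
 * Editor's reference sketch, not part of the original module: a C
 * model of the borrow chain below, assuming a 32-bit BN_ULONG. The
 * name ref_sub_words is illustrative only.
 *
 *	uint32_t ref_sub_words(uint32_t *rp, const uint32_t *ap,
 *			       const uint32_t *bp, int n)
 *	{
 *	    uint32_t borrow = 0;		// the %icc carry flag
 *	    while (n-- > 0) {
 *		uint64_t t = (uint64_t)*ap++ - *bp++ - borrow;
 *		*rp++ = (uint32_t)t;		// subccc result
 *		borrow = (uint32_t)(t >> 63);	// 1 if the subtraction wrapped
 *	    }
 *	    return borrow;			// 0 when n <= 0
 *	}
 */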
bn_sub_words:
	sra	%o3,%g0,%o3	! signx %o3
	brgz,a	%o3,.L_bn_sub_words_proceed
	lduw	[%o1],%o4
	retl
	clr	%o0

.L_bn_sub_words_proceed:
	andcc	%o3,-4,%g0
	bz,pn	%icc,.L_bn_sub_words_tail
	addcc	%g0,0,%g0	! clear carry flag

.L_bn_sub_words_loop:		! wow! 32 aligned!
	dec	4,%o3
	lduw	[%o2],%o5
	lduw	[%o1+4],%g1
	lduw	[%o2+4],%g2
	lduw	[%o1+8],%g3
	lduw	[%o2+8],%g4
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0]

	lduw	[%o1+12],%o4
	lduw	[%o2+12],%o5
	inc	16,%o1
	subccc	%g1,%g2,%g2
	stuw	%g2,[%o0+4]

	inc	16,%o2
	subccc	%g3,%g4,%g4
	stuw	%g4,[%o0+8]

	inc	16,%o0
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0-4]
	and	%o3,-4,%g1
	brnz,a,pt	%g1,.L_bn_sub_words_loop
	lduw	[%o1],%o4

	brnz,a,pn	%o3,.L_bn_sub_words_tail
	lduw	[%o1],%o4
.L_bn_sub_words_return:
	clr	%o0
	retl
	movcs	%icc,1,%o0
	nop

.L_bn_sub_words_tail:		! wow! 32 aligned!
	lduw	[%o2],%o5
	dec	%o3
	subccc	%o4,%o5,%o5
	brz,pt	%o3,.L_bn_sub_words_return
	stuw	%o5,[%o0]

	lduw	[%o1+4],%o4
	lduw	[%o2+4],%o5
	dec	%o3
	subccc	%o4,%o5,%o5
	brz,pt	%o3,.L_bn_sub_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+8],%o4
	lduw	[%o2+8],%o5
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0+8]
	clr	%o0
	retl
	movcs	%icc,1,%o0

.type	bn_sub_words,#function
.size	bn_sub_words,(.-bn_sub_words)

/*
 * The code below depends on the fact that the upper parts of %l0-%l7
 * and %i0-%i7 are zeroed by the kernel after a context switch. In
 * previous versions this comment stated that "the trouble is that
 * it's not feasible to implement the mumbo-jumbo in less V9
 * instructions:-(" which apparently isn't true thanks to the
 * 'bcs,a %xcc,.+8; inc %rd' pair. But the performance improvement
 * results not from the shorter code, but from elimination of the
 * multicycle non-pairable 'rd %y,%rd' instructions.
 *
 *							Andy.
 */

/*
 * Here is the register usage map for *all* routines below.
 */
#define t_1	%o0
#define	t_2	%o1
#define c_12	%o2
#define c_3	%o3

#define ap(I)	[%i1+4*I]
#define bp(I)	[%i2+4*I]
#define rp(I)	[%i0+4*I]

#define	a_0	%l0
#define	a_1	%l1
#define	a_2	%l2
#define	a_3	%l3
#define	a_4	%l4
#define	a_5	%l5
#define	a_6	%l6
#define	a_7	%l7

#define	b_0	%i3
#define	b_1	%i4
#define	b_2	%i5
#define	b_3	%o4
#define	b_4	%o5
#define	b_5	%o7
#define	b_6	%g1
#define	b_7	%g4

.align	32
.global bn_mul_comba8
/*
 * void bn_mul_comba8(r,a,b)
 * BN_ULONG *r,*a,*b;
 */
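/*
 * Editor's sketch, not part of the original module: how each
 * mul_add_c() step and each column close-out in the code below maps
 * to C, assuming 64-bit c_12, c_3 and t_1, with t_2 holding the
 * constant 1<<32 (set up by "mov 1,t_2; sllx t_2,32,t_2"). The helper
 * names are illustrative only.
 *
 *	// accumulate one product into the current column
 *	void ref_mul_add_c(uint32_t a, uint32_t b,
 *			   uint64_t *c_12, uint64_t *c_3)
 *	{
 *	    uint64_t t_1 = (uint64_t)a * b;	// mulx
 *	    *c_12 += t_1;			// addcc
 *	    if (*c_12 < t_1)			// bcs,a %xcc,.+8
 *		*c_3 += (uint64_t)1 << 32;	// add c_3,t_2,c_3
 *	}
 *
 *	// emit the finished word and shift the accumulator down
 *	uint32_t ref_column_done(uint64_t *c_12, uint64_t *c_3)
 *	{
 *	    uint32_t r = (uint32_t)*c_12;	// stuw t_1,rp(k)
 *	    *c_12 = (*c_12 >> 32) | *c_3;	// srlx + or
 *	    *c_3 = 0;				// clr c_3
 *	    return r;
 *	}
 *
 * Because the shifted accumulator is below 2^32 and c_3 is a multiple
 * of 2^32, the "or" acts as an addition. The annulled branch executes
 * its delay-slot "add" only when the 64-bit carry is set.
 */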
bn_mul_comba8:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2
	lduw	bp(0),b_0	!=
	lduw	bp(1),b_1
	mulx	a_0,b_0,t_1	!mul_add_c(a[0],b[0],c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!=!r[0]=c1;

	lduw	ap(1),a_1
	mulx	a_0,b_1,t_1	!mul_add_c(a[0],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(2),a_2
	mulx	a_1,b_0,t_1	!=!mul_add_c(a[1],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_0,t_1	!mul_add_c(a[2],b[0],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(2),b_2	!=
	mulx	a_1,b_1,t_1	!mul_add_c(a[1],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(3),b_3
	mulx	a_0,b_2,t_1	!mul_add_c(a[0],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_3,t_1	!mul_add_c(a[0],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_2,t_1	!=!mul_add_c(a[1],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_2,b_1,t_1	!mul_add_c(a[2],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(4),a_4
	mulx	a_3,b_0,t_1	!=!mul_add_c(a[3],b[0],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_4,b_0,t_1	!mul_add_c(a[4],b[0],c2,c3,c1);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_1,t_1	!=!mul_add_c(a[3],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_2,t_1	!=!mul_add_c(a[2],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(4),b_4	!=
	mulx	a_1,b_3,t_1	!mul_add_c(a[1],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(5),b_5
	mulx	a_0,b_4,t_1	!mul_add_c(a[0],b[4],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_5,t_1	!mul_add_c(a[0],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_4,t_1	!mul_add_c(a[1],b[4],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_2,b_3,t_1	!mul_add_c(a[2],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_2,t_1	!mul_add_c(a[3],b[2],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(5),a_5
	mulx	a_4,b_1,t_1	!mul_add_c(a[4],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(6),a_6
	mulx	a_5,b_0,t_1	!=!mul_add_c(a[5],b[0],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_6,b_0,t_1	!mul_add_c(a[6],b[0],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,b_1,t_1	!=!mul_add_c(a[5],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,b_2,t_1	!=!mul_add_c(a[4],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_3,t_1	!=!mul_add_c(a[3],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_4,t_1	!=!mul_add_c(a[2],b[4],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(6),b_6	!=
	mulx	a_1,b_5,t_1	!mul_add_c(a[1],b[5],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(7),b_7
	mulx	a_0,b_6,t_1	!mul_add_c(a[0],b[6],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_7,t_1	!mul_add_c(a[0],b[7],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_6,t_1	!mul_add_c(a[1],b[6],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_2,b_5,t_1	!mul_add_c(a[2],b[5],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_4,t_1	!mul_add_c(a[3],b[4],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_4,b_3,t_1	!mul_add_c(a[4],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_2,t_1	!mul_add_c(a[5],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(7),a_7
	mulx	a_6,b_1,t_1	!=!mul_add_c(a[6],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_7,b_0,t_1	!=!mul_add_c(a[7],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(7)	!r[7]=c2;
	or	c_12,c_3,c_12

	mulx	a_7,b_1,t_1	!=!mul_add_c(a[7],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_6,b_2,t_1	!mul_add_c(a[6],b[2],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_5,b_3,t_1	!mul_add_c(a[5],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_4,b_4,t_1	!mul_add_c(a[4],b[4],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_3,b_5,t_1	!mul_add_c(a[3],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_2,b_6,t_1	!mul_add_c(a[2],b[6],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_1,b_7,t_1	!mul_add_c(a[1],b[7],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	srlx	t_1,32,c_12
	stuw	t_1,rp(8)	!r[8]=c3;
	or	c_12,c_3,c_12

	mulx	a_2,b_7,t_1	!=!mul_add_c(a[2],b[7],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_3,b_6,t_1	!mul_add_c(a[3],b[6],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_4,b_5,t_1	!mul_add_c(a[4],b[5],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_4,t_1	!mul_add_c(a[5],b[4],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_3,t_1	!mul_add_c(a[6],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_7,b_2,t_1	!mul_add_c(a[7],b[2],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(9)	!r[9]=c1;
	or	c_12,c_3,c_12	!=

	mulx	a_7,b_3,t_1	!mul_add_c(a[7],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_4,t_1	!mul_add_c(a[6],b[4],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_5,t_1	!mul_add_c(a[5],b[5],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_4,b_6,t_1	!mul_add_c(a[4],b[6],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_7,t_1	!mul_add_c(a[3],b[7],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(10)	!r[10]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_4,b_7,t_1	!mul_add_c(a[4],b[7],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_6,t_1	!mul_add_c(a[5],b[6],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_5,t_1	!mul_add_c(a[6],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_7,b_4,t_1	!mul_add_c(a[7],b[4],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(11)	!r[11]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_7,b_5,t_1	!mul_add_c(a[7],b[5],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_6,t_1	!mul_add_c(a[6],b[6],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_7,t_1	!mul_add_c(a[5],b[7],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(12)	!r[12]=c1;
	or	c_12,c_3,c_12	!=

	mulx	a_6,b_7,t_1	!mul_add_c(a[6],b[7],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_7,b_6,t_1	!mul_add_c(a[7],b[6],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(13)	!r[13]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_7,b_7,t_1	!mul_add_c(a[7],b[7],c3,c1,c2);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(14)	!r[14]=c3;
	stuw	c_12,rp(15)	!r[15]=c1;

	ret
	restore	%g0,%g0,%o0	!=

.type	bn_mul_comba8,#function
.size	bn_mul_comba8,(.-bn_mul_comba8)

.align	32

.global bn_mul_comba4
/*
 * void bn_mul_comba4(r,a,b)
 * BN_ULONG *r,*a,*b;
 */
bn_mul_comba4:
	save	%sp,FRAME_SIZE,%sp
	lduw	ap(0),a_0
	mov	1,t_2
	lduw	bp(0),b_0
	sllx	t_2,32,t_2	!=
	lduw	bp(1),b_1
	mulx	a_0,b_0,t_1	!mul_add_c(a[0],b[0],c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!=!r[0]=c1;

	lduw	ap(1),a_1
	mulx	a_0,b_1,t_1	!mul_add_c(a[0],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(2),a_2
	mulx	a_1,b_0,t_1	!=!mul_add_c(a[1],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_0,t_1	!mul_add_c(a[2],b[0],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(2),b_2	!=
	mulx	a_1,b_1,t_1	!mul_add_c(a[1],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(3),b_3
	mulx	a_0,b_2,t_1	!mul_add_c(a[0],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_3,t_1	!mul_add_c(a[0],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_2,t_1	!mul_add_c(a[1],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_2,b_1,t_1	!mul_add_c(a[2],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_0,t_1	!mul_add_c(a[3],b[0],c1,c2,c3);
	addcc	c_12,t_1,t_1	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(3)	!=!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_3,b_1,t_1	!mul_add_c(a[3],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_2,t_1	!mul_add_c(a[2],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,b_3,t_1	!mul_add_c(a[1],b[3],c2,c3,c1);
	addcc	c_12,t_1,t_1	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!=!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_3,t_1	!mul_add_c(a[2],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_2,t_1	!mul_add_c(a[3],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!=!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_3,b_3,t_1	!mul_add_c(a[3],b[3],c1,c2,c3);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(6)	!r[6]=c1;
	stuw	c_12,rp(7)	!r[7]=c2;

	ret
	restore	%g0,%g0,%o0

.type	bn_mul_comba4,#function
.size	bn_mul_comba4,(.-bn_mul_comba4)

.align	32

.global bn_sqr_comba8
/*
 * void bn_sqr_comba8(r,a)
 * BN_ULONG *r,*a;
 */
bn_sqr_comba8:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2
	lduw	ap(1),a_1
	mulx	a_0,a_0,t_1	!sqr_add_c(a,0,c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!r[0]=c1;

	lduw	ap(2),a_2
	mulx	a_0,a_1,t_1	!=!sqr_add_c2(a,1,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,a_0,t_1	!sqr_add_c2(a,2,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_1,a_1,t_1	!sqr_add_c(a,1,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12

	mulx	a_0,a_3,t_1	!sqr_add_c2(a,3,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(4),a_4
	mulx	a_1,a_2,t_1	!sqr_add_c2(a,2,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_4,a_0,t_1	!sqr_add_c2(a,4,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,a_1,t_1	!sqr_add_c2(a,3,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(5),a_5
	mulx	a_2,a_2,t_1	!sqr_add_c(a,2,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_0,a_5,t_1	!sqr_add_c2(a,5,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_4,t_1	!sqr_add_c2(a,4,1,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(6),a_6
	mulx	a_2,a_3,t_1	!sqr_add_c2(a,3,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_6,a_0,t_1	!sqr_add_c2(a,6,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_1,t_1	!sqr_add_c2(a,5,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,a_2,t_1	!sqr_add_c2(a,4,2,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(7),a_7
	mulx	a_3,a_3,t_1	!=!sqr_add_c(a,3,c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	or	c_12,c_3,c_12

	mulx	a_0,a_7,t_1	!sqr_add_c2(a,7,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_6,t_1	!sqr_add_c2(a,6,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,a_5,t_1	!sqr_add_c2(a,5,2,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,a_4,t_1	!sqr_add_c2(a,4,3,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(7)	!r[7]=c2;
	or	c_12,c_3,c_12

	mulx	a_7,a_1,t_1	!sqr_add_c2(a,7,1,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_6,a_2,t_1	!sqr_add_c2(a,6,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_3,t_1	!sqr_add_c2(a,5,3,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,a_4,t_1	!sqr_add_c(a,4,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(8)	!r[8]=c3;
	or	c_12,c_3,c_12

	mulx	a_2,a_7,t_1	!sqr_add_c2(a,7,2,c1,c2,c3);
134574664626SKris Kennaway	addcc	c_12,t_1,c_12
134674664626SKris Kennaway	clr	c_3
134774664626SKris Kennaway	bcs,a	%xcc,.+8
134874664626SKris Kennaway	add	c_3,t_2,c_3
134974664626SKris Kennaway	addcc	c_12,t_1,c_12
135074664626SKris Kennaway	bcs,a	%xcc,.+8
135174664626SKris Kennaway	add	c_3,t_2,c_3
	mulx	a_3,a_6,t_1	!sqr_add_c2(a,6,3,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,a_5,t_1	!sqr_add_c2(a,5,4,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(9)	!r[9]=c1;
	or	c_12,c_3,c_12

	mulx	a_7,a_3,t_1	!sqr_add_c2(a,7,3,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_6,a_4,t_1	!sqr_add_c2(a,6,4,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_5,t_1	!sqr_add_c(a,5,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(10)	!r[10]=c2;
	or	c_12,c_3,c_12

	mulx	a_4,a_7,t_1	!sqr_add_c2(a,7,4,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_6,t_1	!sqr_add_c2(a,6,5,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(11)	!r[11]=c3;
	or	c_12,c_3,c_12

	mulx	a_7,a_5,t_1	!sqr_add_c2(a,7,5,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_6,a_6,t_1	!sqr_add_c(a,6,c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(12)	!r[12]=c1;
	or	c_12,c_3,c_12

	mulx	a_6,a_7,t_1	!sqr_add_c2(a,7,6,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(13)	!r[13]=c2;
	or	c_12,c_3,c_12

	mulx	a_7,a_7,t_1	!sqr_add_c(a,7,c3,c1,c2);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12
	stuw	t_1,rp(14)	!r[14]=c3;
	stuw	c_12,rp(15)	!r[15]=c1;

	ret
	restore	%g0,%g0,%o0

.type	bn_sqr_comba8,#function
.size	bn_sqr_comba8,(.-bn_sqr_comba8)

.align	32

.global bn_sqr_comba4
/*
 * void bn_sqr_comba4(r,a)
 * BN_ULONG *r,*a;
 */
bn_sqr_comba4:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2	!t_2=1<<32, the carry unit added to c_3
	lduw	ap(1),a_1
	mulx	a_0,a_0,t_1	!sqr_add_c(a,0,c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!r[0]=c1;

	lduw	ap(2),a_2
	mulx	a_0,a_1,t_1	!sqr_add_c2(a,1,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!on 64-bit carry run the delay slot, else annul it
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12	!fold the collected carries into the accumulator

	mulx	a_2,a_0,t_1	!sqr_add_c2(a,2,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_1,a_1,t_1	!sqr_add_c(a,1,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12

	mulx	a_0,a_3,t_1	!sqr_add_c2(a,3,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_2,t_1	!sqr_add_c2(a,2,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_3,a_1,t_1	!sqr_add_c2(a,3,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,a_2,t_1	!sqr_add_c(a,2,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,a_3,t_1	!sqr_add_c2(a,3,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_3,a_3,t_1	!sqr_add_c(a,3,c1,c2,c3);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	stuw	c_12,rp(7)	!r[7]=c2;

	ret
	restore	%g0,%g0,%o0

.type	bn_sqr_comba4,#function
.size	bn_sqr_comba4,(.-bn_sqr_comba4)

.align	32