\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename gmp.info
@documentencoding ISO-8859-1
@include version.texi
@settitle GNU MP @value{VERSION}
@synindex tp fn
@iftex
@afourpaper
@end iftex
@comment %**end of header

@copying
This manual describes how to install and use the GNU multiple precision
arithmetic library, version @value{VERSION}.

Copyright 1991, 1993-2014 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.3 or any later
version published by the Free Software Foundation; with no Invariant Sections,
with the Front-Cover Texts being ``A GNU Manual'', and with the Back-Cover
Texts being ``You have freedom to copy and modify this GNU Manual, like GNU
software''. A copy of the license is included in
@ref{GNU Free Documentation License}.
@end copying
@c Note the @ref above must be on one line, a line break in an @ref within
@c @copying will bomb in recent texinfo.tex (eg. 2004-04-07.08 which comes
@c with texinfo 4.7), with messages about missing @endcsname.


@c Texinfo version 4.2 or up will be needed to process this file.
@c
@c The version number and edition number are taken from version.texi provided
@c by automake (note that it's regenerated only if you configure with
@c --enable-maintainer-mode).
@c
@c Notes discussing the present version number of GMP in relation to previous
@c ones (for instance in the "Compatibility" section) must be updated
@c manually though.
@c
@c @cindex entries have been made for function categories and programming
@c topics. The "mpn" section is not included in this, because a beginner
@c looking for "GCD" or something is only going to be confused by pointers to
@c low level routines.
@c
@c @cindex entries are present for processors and systems when there's
@c particular notes concerning them, but not just for everything GMP
@c supports.
@c
@c Index entries for files use @code rather than @file, @samp or @option,
@c since the latter come out with quotes in TeX, which are nice in the text
@c but don't look so good in index columns.
@c
@c Tex:
@c
@c A suitable texinfo.tex is supplied, a newer one should work equally well.
@c
@c HTML:
@c
@c Nothing special is done for links to external manuals, they just come out
@c in the usual makeinfo style, eg. "../libc/Locales.html". If you have
@c local copies of such manuals then this is a good thing, if not then you
@c may want to search-and-replace to some online source.
@c

@dircategory GNU libraries
@direntry
* gmp: (gmp).                   GNU Multiple Precision Arithmetic Library.
@end direntry

@c html <meta name="description" content="...">
@documentdescription
How to install and use the GNU multiple precision arithmetic library, version @value{VERSION}.
@end documentdescription

@c smallbook
@finalout
@setchapternewpage on

@ifnottex
@node Top, Copying, (dir), (dir)
@top GNU MP
@end ifnottex

@iftex
@titlepage
@title GNU MP
@subtitle The GNU Multiple Precision Arithmetic Library
@subtitle Edition @value{EDITION}
@subtitle @value{UPDATED}

@author by Torbj@"orn Granlund and the GMP development team
@c @email{tg@@gmplib.org}

@c Include the Distribution inside the titlepage so
@c that headings are turned off.

@tex
\global\parindent=0pt
\global\parskip=8pt
\global\baselineskip=13pt
@end tex

@page
@vskip 0pt plus 1filll
@end iftex

@insertcopying
@ifnottex
@sp 1
@end ifnottex

@iftex
@end titlepage
@headings double
@end iftex

@c Don't bother with contents for html, the menus seem adequate.
@ifnothtml
@contents
@end ifnothtml

@menu
* Copying::                    GMP Copying Conditions (LGPL).
* Introduction to GMP::        Brief introduction to GNU MP.
* Installing GMP::             How to configure and compile the GMP library.
* GMP Basics::                 What every GMP user should know.
* Reporting Bugs::             How to usefully report bugs.
* Integer Functions::          Functions for arithmetic on signed integers.
* Rational Number Functions::  Functions for arithmetic on rational numbers.
* Floating-point Functions::   Functions for arithmetic on floats.
* Low-level Functions::        Fast functions for natural numbers.
* Random Number Functions::    Functions for generating random numbers.
* Formatted Output::           @code{printf} style output.
* Formatted Input::            @code{scanf} style input.
* C++ Class Interface::        Class wrappers around GMP types.
* Custom Allocation::          How to customize the internal allocation.
* Language Bindings::          Using GMP from other languages.
* Algorithms::                 What happens behind the scenes.
* Internals::                  How values are represented behind the scenes.

* Contributors::               Who brings you this library?
* References::                 Some useful papers and books to read.
* GNU Free Documentation License::
* Concept Index::
* Function Index::
@end menu


@c @m{T,N} is $T$ in tex or @math{N} otherwise. This is an easy way to give
@c different forms for math in tex and info. Commas in N or T don't work,
@c but @C{} can be used instead. \, works in info but not in tex.
@iftex
@macro m {T,N}
@tex$\T\$@end tex
@end macro
@end iftex
@ifnottex
@macro m {T,N}
@math{\N\}
@end macro
@end ifnottex

@macro C {}
,
@end macro

@c @ms{V,N} is $V_N$ in tex or just vn otherwise. This suits simple
@c subscripts like @ms{x,0}.
@iftex
@macro ms {V,N}
@tex$\V\_{\N\}$@end tex
@end macro
@end iftex
@ifnottex
@macro ms {V,N}
\V\\N\
@end macro
@end ifnottex

@c @nicode{S} is plain S in info, or @code{S} elsewhere. This can be used
@c when the quotes that @code{} gives in info aren't wanted, but the
@c fontification in tex or html is wanted. Doesn't work as @nicode{'\\0'}
@c though (gives two backslashes in tex).
@ifinfo
@macro nicode {S}
\S\
@end macro
@end ifinfo
@ifnotinfo
@macro nicode {S}
@code{\S\}
@end macro
@end ifnotinfo

@c @nisamp{S} is plain S in info, or @samp{S} elsewhere. This can be used
@c when the quotes that @samp{} gives in info aren't wanted, but the
@c fontification in tex or html is wanted.
@ifinfo
@macro nisamp {S}
\S\
@end macro
@end ifinfo
@ifnotinfo
@macro nisamp {S}
@samp{\S\}
@end macro
@end ifnotinfo

@c Usage: @GMPtimes{}
@c Give either \times or the word "times".
@tex
\gdef\GMPtimes{\times}
@end tex
@ifnottex
@macro GMPtimes
times
@end macro
@end ifnottex

@c Usage: @GMPmultiply{}
@c Give * in info, or nothing in tex.
@tex
\gdef\GMPmultiply{}
@end tex
@ifnottex
@macro GMPmultiply
*
@end macro
@end ifnottex

@c Usage: @GMPabs{x}
@c Give either |x| in tex, or abs(x) in info or html.
@tex
\gdef\GMPabs#1{|#1|}
@end tex
@ifnottex
@macro GMPabs {X}
@abs{}(\X\)
@end macro
@end ifnottex

@c Usage: @GMPfloor{x}
@c Give either \lfloor x\rfloor in tex, or floor(x) in info or html.
@tex
\gdef\GMPfloor#1{\lfloor #1\rfloor}
@end tex
@ifnottex
@macro GMPfloor {X}
floor(\X\)
@end macro
@end ifnottex

@c Usage: @GMPceil{x}
@c Give either \lceil x\rceil in tex, or ceil(x) in info or html.
@tex
\gdef\GMPceil#1{\lceil #1 \rceil}
@end tex
@ifnottex
@macro GMPceil {X}
ceil(\X\)
@end macro
@end ifnottex

@c Math operators already available in tex, made available in info too.
@c For example @bmod{} can be used in both tex and info.
@ifnottex
@macro bmod
mod
@end macro
@macro gcd
gcd
@end macro
@macro ge
>=
@end macro
@macro le
<=
@end macro
@macro log
log
@end macro
@macro min
min
@end macro
@macro leftarrow
<-
@end macro
@macro rightarrow
->
@end macro
@end ifnottex

@c New math operators.
@c @abs{} can be used in both tex and info, or just \abs in tex.
@tex
\gdef\abs{\mathop{\rm abs}}
@end tex
@ifnottex
@macro abs
abs
@end macro
@end ifnottex

@c @cross{} is a \times symbol in tex, or an "x" in info. In tex it works
@c inside or outside $ $.
@tex
\gdef\cross{\ifmmode\times\else$\times$\fi}
@end tex
@ifnottex
@macro cross
x
@end macro
@end ifnottex

@c @times{} made available as a "*" in info and html (already works in tex).
@ifnottex
@macro times
*
@end macro
@end ifnottex

@c Usage: @W{text}
@c Like @w{} but working in math mode too.
@tex
\gdef\W#1{\ifmmode{#1}\else\w{#1}\fi}
@end tex
@ifnottex
@macro W {S}
@w{\S\}
@end macro
@end ifnottex

@c Usage: \GMPdisplay{text}
@c Put the given text in an @display style indent, but without turning off
@c paragraph reflow etc.
@tex
\gdef\GMPdisplay#1{%
\noindent
\advance\leftskip by \lispnarrowing
#1\par}
@end tex

@c Usage: \GMPhat
@c A new \hat that will work in math mode, unlike the texinfo redefined
@c version.
@tex
\gdef\GMPhat{\mathaccent"705E}
@end tex

@c Usage: \GMPraise{text}
@c For use in a $ $ math expression as an alternative to "^". This is good
@c for @code{} in an exponent, since there seems to be no superscript font
@c for that.
@tex
\gdef\GMPraise#1{\mskip0.5\thinmuskip\hbox{\raise0.8ex\hbox{#1}}}
@end tex

@c Usage: @texlinebreak{}
@c A line break as per @*, but only in tex.
@iftex
@macro texlinebreak
@*
@end macro
@end iftex
@ifnottex
@macro texlinebreak
@end macro
@end ifnottex

@c Usage: @maybepagebreak
@c Allow tex to insert a page break, if it feels the urge.
@c Normally blocks of @deftypefun/funx are kept together, which can lead to
@c some poor page break positioning if it's a big block, like the sets of
@c division functions etc.
@tex
\gdef\maybepagebreak{\penalty0}
@end tex
@ifnottex
@macro maybepagebreak
@end macro
@end ifnottex

@c Usage: @GMPreftop{info,title}
@c Usage: @GMPpxreftop{info,title}
@c
@c Like @ref{} and @pxref{}, but designed for a reference to the top of a
@c document, not a particular section. The TeX output for plain @ref insists
@c on printing a particular section, GMPreftop gives just the title.
@c
@c The texinfo manual recommends putting a likely section name in references
@c like this, eg. "Introduction", but it seems better to just give the title.
@c
@iftex
@macro GMPreftop{info,title}
@i{\title\}
@end macro
@macro GMPpxreftop{info,title}
see @i{\title\}
@end macro
@end iftex
@c
@ifnottex
@macro GMPreftop{info,title}
@ref{Top,\title\,\title\,\info\,\title\}
@end macro
@macro GMPpxreftop{info,title}
@pxref{Top,\title\,\title\,\info\,\title\}
@end macro
@end ifnottex


@node Copying, Introduction to GMP, Top, Top
@comment node-name, next, previous, up
@unnumbered GNU MP Copying Conditions
@cindex Copying conditions
@cindex Conditions for copying GNU MP
@cindex License conditions

This library is @dfn{free}; this means that everyone is free to use it and
free to redistribute it on a free basis. The library is not in the public
domain; it is copyrighted and there are restrictions on its distribution, but
these restrictions are designed to permit everything that a good cooperating
citizen would want to do. What is not allowed is to try to prevent others
from further sharing any version of this library that they might get from
you.@refill

Specifically, we want to make sure that you have the right to give away copies
of the library, that you receive source code or else can get it if you want
it, that you can change this library or use pieces of it in new free programs,
and that you know you can do these things.@refill

To make sure that everyone has such rights, we have to forbid you to deprive
anyone else of these rights. For example, if you distribute copies of the GNU
MP library, you must give the recipients all the rights that you have. You
must make sure that they, too, receive or can get the source code. And you
must tell them their rights.@refill

Also, for our own protection, we must make certain that everyone finds out
that there is no warranty for the GNU MP library.
If it is modified by
someone else and passed on, we want their recipients to know that what they
have is not what we distributed, so that any problems introduced by others
will not reflect on our reputation.@refill

More precisely, the GNU MP library is dual licensed, under the conditions of
the GNU Lesser General Public License version 3 (see
@file{COPYING.LESSERv3}), or the GNU General Public License version 2 (see
@file{COPYINGv2}). This is the recipient's choice, and the recipient also has
the additional option of applying later versions of these licenses. (The
reason for this dual licensing is to make it possible to use the library with
programs which are licensed under GPL version 2, but which for historical or
other reasons do not allow use under later versions of the GPL.)

Programs which are not part of the library itself, such as demonstration
programs and the GMP testsuite, are licensed under the terms of the GNU
General Public License version 3 (see @file{COPYINGv3}), or any later
version.


@node Introduction to GMP, Installing GMP, Copying, Top
@comment node-name, next, previous, up
@chapter Introduction to GNU MP
@cindex Introduction

GNU MP is a portable library written in C for arbitrary precision arithmetic
on integers, rational numbers, and floating-point numbers. It aims to provide
the fastest possible arithmetic for all applications that need higher
precision than is directly supported by the basic C types.

Many applications use just a few hundred bits of precision; but some
applications may need thousands or even millions of bits. GMP is designed to
give good performance for both, by choosing algorithms based on the sizes of
the operands, and by carefully keeping the overhead at a minimum.

The speed of GMP is achieved by using fullwords as the basic arithmetic type,
by using sophisticated algorithms, by including carefully optimized assembly
code for the most common inner loops for many different CPUs, and by a general
emphasis on speed (as opposed to simplicity or elegance).

There is assembly code for these CPUs:
@cindex CPU types
ARM Cortex-A9, Cortex-A15, and generic ARM,
DEC Alpha 21064, 21164, and 21264,
AMD K8 and K10 (sold under many brands, e.g. Athlon64, Phenom, Opteron),
Bulldozer, and Bobcat,
Intel Pentium, Pentium Pro/II/III, Pentium 4, Core2, Nehalem, Sandy Bridge, Haswell, generic x86,
Intel IA-64,
Motorola/IBM PowerPC 32 and 64 such as POWER970, POWER5, POWER6, and POWER7,
MIPS 32-bit and 64-bit,
SPARC 32-bit and 64-bit with special support for all UltraSPARC models.
There is also assembly code for many obsolete CPUs.


@cindex Home page
@cindex Web page
@noindent
For up-to-date information on GMP, please see the GMP web pages at

@display
@uref{https://gmplib.org/}
@end display

@cindex Latest version of GMP
@cindex Anonymous FTP of latest version
@cindex FTP of latest version
@noindent
The latest version of the library is available at

@display
@uref{https://ftp.gnu.org/gnu/gmp/}
@end display

Many sites around the world mirror @samp{ftp.gnu.org}; please use a mirror
near you. See @uref{https://www.gnu.org/order/ftp.html} for a full list.

@cindex Mailing lists
There are three public mailing lists of interest: one for release
announcements, one for general questions and discussions about usage of the GMP
library, and one for bug reports. For more information, see

@display
@uref{https://gmplib.org/mailman/listinfo/}.
@end display

The proper place for bug reports is @email{gmp-bugs@@gmplib.org}. See
@ref{Reporting Bugs} for information about reporting bugs.

@sp 1
@section How to use this Manual
@cindex About this manual

Everyone should read @ref{GMP Basics}. If you need to install the library
yourself, then read @ref{Installing GMP}. If you have a system with multiple
ABIs, then read @ref{ABI and ISA}, for the compiler options that must be used
on applications.

The rest of the manual can be used for later reference, although it is
probably a good idea to glance through it.


@node Installing GMP, GMP Basics, Introduction to GMP, Top
@comment node-name, next, previous, up
@chapter Installing GMP
@cindex Installing GMP
@cindex Configuring GMP
@cindex Building GMP

GMP has an autoconf/automake/libtool based configuration system. On a
Unix-like system a basic build can be done with

@example
./configure
make
@end example

@noindent
Some self-tests can be run with

@example
make check
@end example

@noindent
And you can install (under @file{/usr/local} by default) with

@example
make install
@end example

If you experience problems, please report them to @email{gmp-bugs@@gmplib.org}.
See @ref{Reporting Bugs}, for information on what to include in useful bug
reports.

@menu
* Build Options::
* ABI and ISA::
* Notes for Package Builds::
* Notes for Particular Systems::
* Known Build Problems::
* Performance optimization::
@end menu


@node Build Options, ABI and ISA, Installing GMP, Installing GMP
@section Build Options
@cindex Build options

All the usual autoconf configure options are available, run @samp{./configure
--help} for a summary. The file @file{INSTALL.autoconf} has some generic
installation information too.

@table @asis
@item Tools
@cindex Non-Unix systems
@samp{configure} requires various Unix-like tools. See @ref{Notes for
Particular Systems}, for some options on non-Unix systems.

It might be possible to build without the help of @samp{configure}, certainly
all the code is there, but unfortunately you'll be on your own.

@item Build Directory
@cindex Build directory
To compile in a separate build directory, @command{cd} to that directory, and
prefix the configure command with the path to the GMP source directory. For
example

@example
cd /my/build/dir
/my/sources/gmp-@value{VERSION}/configure
@end example

Not all @samp{make} programs have the necessary features (@code{VPATH}) to
support this. In particular, SunOS and Solaris @command{make} have bugs that
make them unable to build in a separate directory. Use GNU @command{make}
instead.

@item @option{--prefix} and @option{--exec-prefix}
@cindex Prefix
@cindex Exec prefix
@cindex Install prefix
@cindex @code{--prefix}
@cindex @code{--exec-prefix}
The @option{--prefix} option can be used in the normal way to direct GMP to
install under a particular tree. The default is @samp{/usr/local}.

@option{--exec-prefix} can be used to direct architecture-dependent files like
@file{libgmp.a} to a different location. This can be used to share
architecture-independent parts like the documentation, but separate the
dependent parts. Note however that @file{gmp.h} and @file{mp.h} are
architecture-dependent since they encode certain aspects of @file{libgmp}, so
it will be necessary to ensure both @file{$prefix/include} and
@file{$exec_prefix/include} are available to the compiler.

@item @option{--disable-shared}, @option{--disable-static}
@cindex @code{--disable-shared}
@cindex @code{--disable-static}
By default both shared and static libraries are built (where possible), but
one or other can be disabled.
Shared libraries result in smaller executables
and permit code sharing between separate running processes, but on some CPUs
are slightly slower, having a small cost on each function call.

@item Native Compilation, @option{--build=CPU-VENDOR-OS}
@cindex Native compilation
@cindex Build system
@cindex @code{--build}
For normal native compilation, the system can be specified with
@samp{--build}. By default @samp{./configure} uses the output from running
@samp{./config.guess}. On some systems @samp{./config.guess} can determine
the exact CPU type, on others it will be necessary to give it explicitly. For
example,

@example
./configure --build=ultrasparc-sun-solaris2.7
@end example

In all cases the @samp{OS} part is important, since it controls how libtool
generates shared libraries. Running @samp{./config.guess} is the simplest way
to see what it should be, if you don't know already.

@item Cross Compilation, @option{--host=CPU-VENDOR-OS}
@cindex Cross compiling
@cindex Host system
@cindex @code{--host}
When cross-compiling, the system used for compiling is given by @samp{--build}
and the system where the library will run is given by @samp{--host}. For
example when using a FreeBSD Athlon system to build GNU/Linux m68k binaries,

@example
./configure --build=athlon-pc-freebsd3.5 --host=m68k-mac-linux-gnu
@end example

Compiler tools are sought first with the host system type as a prefix. For
example @command{m68k-mac-linux-gnu-ranlib} is tried, then plain
@command{ranlib}. This makes it possible for a set of cross-compiling tools
to co-exist with native tools. The prefix is the argument to @samp{--host},
and this can be an alias, such as @samp{m68k-linux}. But note that tools
don't have to be set up this way; it's enough to just have a @env{PATH} with a
suitable cross-compiling @command{cc} etc.
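
As an illustration of that last point, a cross build relying only on
@env{PATH} might look like the following sketch. The directory
@file{/opt/cross/m68k/bin} is a hypothetical location holding a
cross-compiling @command{cc}; it is an assumption made for the example, not
something GMP defines or looks for.

@example
PATH=/opt/cross/m68k/bin:$PATH
export PATH
./configure --build=athlon-pc-freebsd3.5 --host=m68k-mac-linux-gnu
@end example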

Compiling for a different CPU in the same family as the build system is a form
of cross-compilation, though very possibly this would merely be special
options on a native compiler. In any case @samp{./configure} avoids depending
on being able to run code on the build system, which is important when
creating binaries for a newer CPU since they very possibly won't run on the
build system.

In all cases the compiler must be able to produce an executable (of whatever
format) from a standard C @code{main}. Although only object files will go to
make up @file{libgmp}, @samp{./configure} uses linking tests for various
purposes, such as determining what functions are available on the host system.

Currently a warning is given unless an explicit @samp{--build} is used when
cross-compiling, because it may not be possible to correctly guess the build
system type if the @env{PATH} has only a cross-compiling @command{cc}.

Note that the @samp{--target} option is not appropriate for GMP@. It's for use
when building compiler tools, with @samp{--host} being where they will run,
and @samp{--target} what they'll produce code for. Ordinary programs or
libraries like GMP are only interested in the @samp{--host} part, being where
they'll run. (Some past versions of GMP used @samp{--target} incorrectly.)

@item CPU types
@cindex CPU types
In general, if you want a library that runs as fast as possible, you should
configure GMP for the exact CPU type your system uses. However, this may mean
the binaries won't run on older members of the family, and might run slower on
other members, older or newer. The best idea is always to build GMP for the
exact machine type you intend to run it on.

The following CPUs have specific support. See @file{configure.ac} for details
of what code and compiler options they select.

@itemize @bullet

@c Keep this formatting, it's easy to read and it can be grepped to
@c automatically test that CPUs listed get through ./config.sub

@item
Alpha:
@nisamp{alpha},
@nisamp{alphaev5},
@nisamp{alphaev56},
@nisamp{alphapca56},
@nisamp{alphapca57},
@nisamp{alphaev6},
@nisamp{alphaev67},
@nisamp{alphaev68},
@nisamp{alphaev7}

@item
Cray:
@nisamp{c90},
@nisamp{j90},
@nisamp{t90},
@nisamp{sv1}

@item
HPPA:
@nisamp{hppa1.0},
@nisamp{hppa1.1},
@nisamp{hppa2.0},
@nisamp{hppa2.0n},
@nisamp{hppa2.0w},
@nisamp{hppa64}

@item
IA-64:
@nisamp{ia64},
@nisamp{itanium},
@nisamp{itanium2}

@item
MIPS:
@nisamp{mips},
@nisamp{mips3},
@nisamp{mips64}

@item
Motorola:
@nisamp{m68k},
@nisamp{m68000},
@nisamp{m68010},
@nisamp{m68020},
@nisamp{m68030},
@nisamp{m68040},
@nisamp{m68060},
@nisamp{m68302},
@nisamp{m68360},
@nisamp{m88k},
@nisamp{m88110}

@item
POWER:
@nisamp{power},
@nisamp{power1},
@nisamp{power2},
@nisamp{power2sc}

@item
PowerPC:
@nisamp{powerpc},
@nisamp{powerpc64},
@nisamp{powerpc401},
@nisamp{powerpc403},
@nisamp{powerpc405},
@nisamp{powerpc505},
@nisamp{powerpc601},
@nisamp{powerpc602},
@nisamp{powerpc603},
@nisamp{powerpc603e},
@nisamp{powerpc604},
@nisamp{powerpc604e},
@nisamp{powerpc620},
@nisamp{powerpc630},
@nisamp{powerpc740},
@nisamp{powerpc7400},
@nisamp{powerpc7450},
@nisamp{powerpc750},
@nisamp{powerpc801},
@nisamp{powerpc821},
@nisamp{powerpc823},
@nisamp{powerpc860},
@nisamp{powerpc970}

@item
SPARC:
@nisamp{sparc},
@nisamp{sparcv8},
@nisamp{microsparc},
@nisamp{supersparc},
@nisamp{sparcv9},
@nisamp{ultrasparc},
@nisamp{ultrasparc2},
@nisamp{ultrasparc2i},
@nisamp{ultrasparc3},
@nisamp{sparc64}

@item
x86 family:
@nisamp{i386},
@nisamp{i486},
@nisamp{i586},
@nisamp{pentium},
@nisamp{pentiummmx},
@nisamp{pentiumpro},
@nisamp{pentium2},
@nisamp{pentium3},
@nisamp{pentium4},
@nisamp{k6},
@nisamp{k62},
@nisamp{k63},
@nisamp{athlon},
@nisamp{amd64},
@nisamp{viac3},
@nisamp{viac32}

@item
Other:
@nisamp{arm},
@nisamp{sh},
@nisamp{sh2},
@nisamp{vax}
@end itemize

CPUs not listed will use generic C code.

@item Generic C Build
@cindex Generic C
If some of the assembly code causes problems, or if otherwise desired, the
generic C code can be selected with the configure option
@option{--disable-assembly}.

Note that this will run quite slowly, but it should be portable and should at
least make it possible to get something running if all else fails.

@item Fat binary, @option{--enable-fat}
@cindex Fat binary
@cindex @code{--enable-fat}
Using @option{--enable-fat} selects a ``fat binary'' build on x86, where
optimized low level subroutines are chosen at runtime according to the CPU
detected. This means more code, but gives good performance on all x86 chips.
(This option might become available for more architectures in the future.)

@item @option{ABI}
@cindex ABI
On some systems GMP supports multiple ABIs (application binary interfaces),
meaning data type sizes and calling conventions. By default GMP chooses the
best ABI available, but a particular ABI can be selected. For example

@example
./configure --host=mips64-sgi-irix6 ABI=n32
@end example

See @ref{ABI and ISA}, for the available choices on relevant CPUs, and what
applications need to do.

@item @option{CC}, @option{CFLAGS}
@cindex C compiler
@cindex @code{CC}
@cindex @code{CFLAGS}
By default the C compiler used is chosen from among some likely candidates,
with @command{gcc} normally preferred if it's present.
The usual
@samp{CC=whatever} can be passed to @samp{./configure} to choose something
different.

For various systems, default compiler flags are set based on the CPU and
compiler. The usual @samp{CFLAGS="-whatever"} can be passed to
@samp{./configure} to use something different or to set good flags for systems
GMP doesn't otherwise know.

The @samp{CC} and @samp{CFLAGS} used are printed during @samp{./configure},
and can be found in each generated @file{Makefile}. This is the easiest way
to check the defaults when considering changing or adding something.

Note that when @samp{CC} and @samp{CFLAGS} are specified on a system
supporting multiple ABIs it's important to give an explicit
@samp{ABI=whatever}, since GMP can't determine the ABI just from the flags and
won't be able to select the correct assembly code.

If just @samp{CC} is selected then normal default @samp{CFLAGS} for that
compiler will be used (if GMP recognises it). For example @samp{CC=gcc} can
be used to force the use of GCC, with default flags (and default ABI).

@item @option{CPPFLAGS}
@cindex @code{CPPFLAGS}
Any flags like @samp{-D} defines or @samp{-I} includes required by the
preprocessor should be set in @samp{CPPFLAGS} rather than @samp{CFLAGS}.
Compiling is done with both @samp{CPPFLAGS} and @samp{CFLAGS}, but
preprocessing uses just @samp{CPPFLAGS}. This distinction is because most
preprocessors won't accept all the flags the compiler does. Preprocessing is
done separately in some configure tests.

@item @option{CC_FOR_BUILD}
@cindex @code{CC_FOR_BUILD}
Some build-time programs are compiled and run to generate host-specific data
tables. @samp{CC_FOR_BUILD} is the compiler used for this. It doesn't need
to be in any particular ABI or mode, it merely needs to generate executables
that can run.
The default is to try the selected @samp{CC} and some likely
candidates such as @samp{cc} and @samp{gcc}, looking for something that works.

No flags are used with @samp{CC_FOR_BUILD} because a simple invocation like
@samp{cc foo.c} should be enough. If some particular options are required
they can be included as for instance @samp{CC_FOR_BUILD="cc -whatever"}.

@item C++ Support, @option{--enable-cxx}
@cindex C++ support
@cindex @code{--enable-cxx}
C++ support in GMP can be enabled with @samp{--enable-cxx}, in which case a
C++ compiler will be required. As a convenience @samp{--enable-cxx=detect}
can be used to enable C++ support only if a compiler can be found. The C++
support consists of a library @file{libgmpxx.la} and header file
@file{gmpxx.h} (@pxref{Headers and Libraries}).

A separate @file{libgmpxx.la} has been adopted rather than having C++ objects
within @file{libgmp.la} in order to ensure dynamic linked C programs aren't
bloated by a dependency on the C++ standard library, and to avoid any chance
that the C++ compiler could be required when linking plain C programs.

@file{libgmpxx.la} will use certain internals from @file{libgmp.la} and can
only be expected to work with @file{libgmp.la} from the same GMP version.
Future changes to the relevant internals will be accompanied by renaming, so a
mismatch will cause unresolved symbols rather than perhaps mysterious
misbehaviour.

In general @file{libgmpxx.la} will be usable only with the C++ compiler that
built it, since name mangling and runtime support are usually incompatible
between different compilers.

@item @option{CXX}, @option{CXXFLAGS}
@cindex C++ compiler
@cindex @code{CXX}
@cindex @code{CXXFLAGS}
When C++ support is enabled, the C++ compiler and its flags can be set with
variables @samp{CXX} and @samp{CXXFLAGS} in the usual way.
The default for
@samp{CXX} is the first compiler that works from a list of likely candidates,
with @command{g++} normally preferred when available. The default for
@samp{CXXFLAGS} is to try @samp{CFLAGS}, @samp{CFLAGS} without @samp{-g}, then
for @command{g++} either @samp{-g -O2} or @samp{-O2}, or for other compilers
@samp{-g} or nothing. Trying @samp{CFLAGS} this way is convenient when using
@samp{gcc} and @samp{g++} together, since the flags for @samp{gcc} will
usually suit @samp{g++}.

It's important that the C and C++ compilers match, meaning their startup and
runtime support routines are compatible and that they generate code in the
same ABI (if there's a choice of ABIs on the system). @samp{./configure}
isn't currently able to check these things very well itself, so for that
reason @samp{--disable-cxx} is the default, to avoid a build failure due to a
compiler mismatch. Perhaps this will change in the future.

Incidentally, it's normally not good enough to set @samp{CXX} to the same as
@samp{CC}. Although @command{gcc} for instance recognises @file{foo.cc} as
C++ code, only @command{g++} will invoke the linker the right way when
building an executable or shared library from C++ object files.

@item Temporary Memory, @option{--enable-alloca=<choice>}
@cindex Temporary memory
@cindex Stack overflow
@cindex @code{alloca}
@cindex @code{--enable-alloca}
GMP allocates temporary workspace using one of the following three methods,
which can be selected with for instance
@samp{--enable-alloca=malloc-reentrant}.

@itemize @bullet
@item
@samp{alloca} - C library or compiler builtin.
@item
@samp{malloc-reentrant} - the heap, in a re-entrant fashion.
@item
@samp{malloc-notreentrant} - the heap, with global variables.
@end itemize

For convenience, the following choices are also available.
@samp{--disable-alloca} is the same as @samp{no}.

@itemize @bullet
@item
@samp{yes} - a synonym for @samp{alloca}.
@item
@samp{no} - a synonym for @samp{malloc-reentrant}.
@item
@samp{reentrant} - @code{alloca} if available, otherwise
@samp{malloc-reentrant}. This is the default.
@item
@samp{notreentrant} - @code{alloca} if available, otherwise
@samp{malloc-notreentrant}.
@end itemize

@code{alloca} is reentrant and fast, and is recommended. It actually allocates
just small blocks on the stack; larger ones use malloc-reentrant.

@samp{malloc-reentrant} is, as the name suggests, reentrant and thread safe,
but @samp{malloc-notreentrant} is faster and should be used if reentrancy is
not required.

The two malloc methods in fact use the memory allocation functions selected by
@code{mp_set_memory_functions}, these being @code{malloc} and friends by
default. @xref{Custom Allocation}.

An additional choice @samp{--enable-alloca=debug} is available, to help when
debugging memory related problems (@pxref{Debugging}).

@item FFT Multiplication, @option{--disable-fft}
@cindex FFT multiplication
@cindex @code{--disable-fft}
By default multiplications are done using Karatsuba, 3-way Toom, higher degree
Toom, and Fermat FFT@. The FFT is only used on large to very large operands
and can be disabled to save code size if desired.

@item Assertion Checking, @option{--enable-assert}
@cindex Assertion checking
@cindex @code{--enable-assert}
This option enables some consistency checking within the library. This can be
of use while debugging, @pxref{Debugging}.

@item Execution Profiling, @option{--enable-profiling=prof/gprof/instrument}
@cindex Execution profiling
@cindex @code{--enable-profiling}
Enable profiling support, in one of various styles, @pxref{Profiling}.
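
As a hedged sketch of how the options in this table are given on one command
line, a @command{gprof} style profiling build with assertion checking might be
configured as follows. The particular combination is chosen only for
illustration; both option names are the ones described above.

@example
./configure --enable-profiling=gprof --enable-assert
@end example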

@item @option{MPN_PATH}
@cindex @code{MPN_PATH}
Various assembly versions of each mpn subroutine are provided. For a given
CPU, a search is made through a path to choose a version of each. For example
@samp{sparcv8} has

@example
MPN_PATH="sparc32/v8 sparc32 generic"
@end example

which means look first for v8 code, then plain sparc32 (which is v7), and
finally fall back on generic C@. Knowledgeable users with special requirements
can specify a different path. Normally this is completely unnecessary.

@item Documentation
@cindex Documentation formats
@cindex Texinfo
The source for the document you're now reading is @file{doc/gmp.texi}, in
Texinfo format, see @GMPreftop{texinfo, Texinfo}.

@cindex Postscript
@cindex DVI
@cindex PDF
Info format @samp{doc/gmp.info} is included in the distribution. The usual
automake targets are available to make PostScript, DVI, PDF and HTML (these
will require various @TeX{} and Texinfo tools).

@cindex DocBook
@cindex XML
DocBook and XML can be generated by the Texinfo @command{makeinfo} program
too, see @ref{makeinfo options,, Options for @command{makeinfo}, texinfo,
Texinfo}.

Some supplementary notes can also be found in the @file{doc} subdirectory.

@end table


@need 2000
@node ABI and ISA, Notes for Package Builds, Build Options, Installing GMP
@section ABI and ISA
@cindex ABI
@cindex Application Binary Interface
@cindex ISA
@cindex Instruction Set Architecture

ABI (Application Binary Interface) refers to the calling conventions between
functions, meaning what registers are used and what sizes the various C data
types are. ISA (Instruction Set Architecture) refers to the instructions and
registers a CPU has available.

Some 64-bit ISA CPUs have both a 64-bit ABI and a 32-bit ABI defined, the
latter for compatibility with older CPUs in the family. GMP supports some
CPUs like this in both ABIs. In fact within GMP @samp{ABI} means a
combination of chip ABI, plus how GMP chooses to use it. For example in some
32-bit ABIs, GMP may support a limb as either a 32-bit @code{long} or a 64-bit
@code{long long}.

By default GMP chooses the best ABI available for a given system, and this
generally gives significantly greater speed. But an ABI can be chosen
explicitly to make GMP compatible with other libraries, or particular
application requirements. For example,

@example
./configure ABI=32
@end example

In all cases it's vital that all object code used in a given program is
compiled for the same ABI.

Usually a limb is implemented as a @code{long}. When a @code{long long} limb
is used this is encoded in the generated @file{gmp.h}. This is convenient for
applications, but it does mean that @file{gmp.h} will vary, and can't be just
copied around. @file{gmp.h} remains compiler independent though, since all
compilers for a particular ABI will be expected to use the same limb type.

Currently no attempt is made to follow whatever conventions a system has for
installing library or header files built for a particular ABI@. This will
probably only matter when installing multiple builds of GMP, and it might be
as simple as configuring with a special @samp{libdir}, or it might require
more than that. Note that builds for different ABIs need to be done separately,
with a fresh @command{./configure} and @command{make} each.

@sp 1
@table @asis
@need 1000
@item AMD64 (@samp{x86_64})
@cindex AMD64
On AMD64 systems supporting both 32-bit and 64-bit modes for applications, the
following ABI choices are available.

@table @asis
@item @samp{ABI=64}
The 64-bit ABI uses 64-bit limbs and pointers and makes full use of the chip
architecture. This is the default. Applications will usually not need
special compiler flags, but for reference the option is

@example
gcc -m64
@end example

@item @samp{ABI=32}
The 32-bit ABI is the usual i386 conventions. This will be slower, and is not
recommended except for inter-operating with other code not yet 64-bit capable.
Applications must be compiled with

@example
gcc -m32
@end example

(In GCC 2.95 and earlier there's no @samp{-m32} option, it's the only mode.)

@item @samp{ABI=x32}
The x32 ABI uses 64-bit limbs but 32-bit pointers. Like the 64-bit ABI, it
makes full use of the chip's arithmetic capabilities. This ABI is not
supported by all operating systems.

@example
gcc -mx32
@end example

@end table

@sp 1
@need 1000
@item HPPA 2.0 (@samp{hppa2.0*}, @samp{hppa64})
@cindex HPPA
@cindex HP-UX
@table @asis
@item @samp{ABI=2.0w}
The 2.0w ABI uses 64-bit limbs and pointers and is available on HP-UX 11 or
up. Applications must be compiled with

@example
gcc [built for 2.0w]
cc +DD64
@end example

@item @samp{ABI=2.0n}
The 2.0n ABI means the 32-bit HPPA 1.0 ABI and all its normal calling
conventions, but with 64-bit instructions permitted within functions. GMP
uses a 64-bit @code{long long} for a limb. This ABI is available on hppa64
GNU/Linux and on HP-UX 10 or higher. Applications must be compiled with

@example
gcc [built for 2.0n]
cc +DA2.0 +e
@end example

Note that current versions of GCC (eg.@: 3.2) don't generate 64-bit
instructions for @code{long long} operations and so may be slower than for
2.0w. (The GMP assembly code is the same though.)

@item @samp{ABI=1.0}
HPPA 2.0 CPUs can run all HPPA 1.0 and 1.1 code in the 32-bit HPPA 1.0 ABI@.
No special compiler options are needed for applications.
@end table

All three ABIs are available for CPU types @samp{hppa2.0w}, @samp{hppa2.0} and
@samp{hppa64}, but for CPU type @samp{hppa2.0n} only 2.0n or 1.0 are
considered.

Note that GCC on HP-UX has no options to choose between 2.0n and 2.0w modes,
unlike HP @command{cc}. Instead it must be built for one or the other ABI@.
GMP will detect how it was built, and skip to the corresponding @samp{ABI}.

@sp 1
@need 1500
@item IA-64 under HP-UX (@samp{ia64*-*-hpux*}, @samp{itanium*-*-hpux*})
@cindex IA-64
@cindex HP-UX
HP-UX supports two ABIs for IA-64. GMP performance is the same in both.

@table @asis
@item @samp{ABI=32}
In the 32-bit ABI, pointers, @code{int}s and @code{long}s are 32 bits and GMP
uses a 64-bit @code{long long} for a limb. Applications can be compiled
without any special flags since this ABI is the default in both HP C and GCC,
but for reference the flags are

@example
gcc -milp32
cc +DD32
@end example

@item @samp{ABI=64}
In the 64-bit ABI, @code{long}s and pointers are 64 bits and GMP uses a
@code{long} for a limb. Applications must be compiled with

@example
gcc -mlp64
cc +DD64
@end example
@end table

On other IA-64 systems, GNU/Linux for instance, @samp{ABI=64} is the only
choice.

@sp 1
@need 1000
@item MIPS under IRIX 6 (@samp{mips*-*-irix[6789]})
@cindex MIPS
@cindex IRIX
IRIX 6 always has a 64-bit MIPS 3 or better CPU, and supports ABIs o32, n32,
and 64. n32 or 64 are recommended, and GMP performance will be the same in
each. The default is n32.

@table @asis
@item @samp{ABI=o32}
The o32 ABI is 32-bit pointers and integers, and no 64-bit operations.
GMP
will be slower than in n32 or 64, this option only exists to support old
compilers, eg.@: GCC 2.7.2. Applications can be compiled with no special
flags on an old compiler, or on a newer compiler with

@example
gcc -mabi=32
cc -32
@end example

@item @samp{ABI=n32}
The n32 ABI is 32-bit pointers and integers, but with a 64-bit limb using a
@code{long long}. Applications must be compiled with

@example
gcc -mabi=n32
cc -n32
@end example

@item @samp{ABI=64}
The 64-bit ABI is 64-bit pointers and integers. Applications must be compiled
with

@example
gcc -mabi=64
cc -64
@end example
@end table

Note that MIPS GNU/Linux, as of kernel version 2.2, doesn't have the necessary
support for n32 or 64 and so only gets a 32-bit limb and the MIPS 2 code.

@sp 1
@need 1000
@item PowerPC 64 (@samp{powerpc64}, @samp{powerpc620}, @samp{powerpc630}, @samp{powerpc970}, @samp{power4}, @samp{power5})
@cindex PowerPC
@table @asis
@item @samp{ABI=mode64}
@cindex AIX
The AIX 64 ABI uses 64-bit limbs and pointers and is the default on PowerPC 64
@samp{*-*-aix*} systems. Applications must be compiled with

@example
gcc -maix64
xlc -q64
@end example

On 64-bit GNU/Linux, BSD, and Mac OS X/Darwin systems, the applications must
be compiled with

@example
gcc -m64
@end example

@item @samp{ABI=mode32}
The @samp{mode32} ABI uses a 64-bit @code{long long} limb but with the chip
still in 32-bit mode and using 32-bit calling conventions. This is the default
for systems where the true 64-bit ABI is unavailable. No special compiler
options are typically needed for applications. This ABI is not available under
AIX.

@item @samp{ABI=32}
This is the basic 32-bit PowerPC ABI, with a 32-bit limb. No special compiler
options are needed for applications.
@end table

GMP's speed is greatest for the @samp{mode64} ABI; the @samp{mode32} ABI is
second best. In @samp{ABI=32} only the 32-bit ISA is used and this doesn't
make full use of a 64-bit chip.

@sp 1
@need 1000
@item Sparc V9 (@samp{sparc64}, @samp{sparcv9}, @samp{ultrasparc*})
@cindex Sparc V9
@cindex Solaris
@cindex Sun
@table @asis
@item @samp{ABI=64}
The 64-bit V9 ABI is available on the various BSD sparc64 ports, recent
versions of Sparc64 GNU/Linux, and Solaris 2.7 and up (when the kernel is in
64-bit mode). GCC 3.2 or higher, or Sun @command{cc} is required. On
GNU/Linux, depending on the default @command{gcc} mode, applications must be
compiled with

@example
gcc -m64
@end example

On Solaris applications must be compiled with

@example
gcc -m64 -mptr64 -Wa,-xarch=v9 -mcpu=v9
cc -xarch=v9
@end example

On the BSD sparc64 systems no special options are required, since 64-bit is
the only ABI available.

@item @samp{ABI=32}
For the basic 32-bit ABI, GMP still uses as much of the V9 ISA as it can. In
the Sun documentation this combination is known as ``v8plus''. On GNU/Linux,
depending on the default @command{gcc} mode, applications may need to be
compiled with

@example
gcc -m32
@end example

On Solaris, no special compiler options are required for applications, though
using something like the following is recommended. (@command{gcc} 2.8 and
earlier only support @samp{-mv8} though.)

@example
gcc -mv8plus
cc -xarch=v8plus
@end example
@end table

GMP speed is greatest in @samp{ABI=64}, so it's the default where available.
The speed is partly because there are extra registers available and partly
because 64-bit is considered the more important case and has therefore had
better code written for it.

Don't be confused by the names of the @samp{-m} and @samp{-x} compiler
options, they're called @samp{arch} but effectively control both ABI and ISA@.

On Solaris 2.6 and earlier, only @samp{ABI=32} is available since the kernel
doesn't save all registers.

On Solaris 2.7 with the kernel in 32-bit mode, a normal native build will
reject @samp{ABI=64} because the resulting executables won't run.
@samp{ABI=64} can still be built if desired by making it look like a
cross-compile, for example

@example
./configure --build=none --host=sparcv9-sun-solaris2.7 ABI=64
@end example
@end table


@need 2000
@node Notes for Package Builds, Notes for Particular Systems, ABI and ISA, Installing GMP
@section Notes for Package Builds
@cindex Build notes for binary packaging
@cindex Packaged builds

GMP should present no great difficulties for packaging in a binary
distribution.

@cindex Libtool versioning
@cindex Shared library versioning
Libtool is used to build the library and @samp{-version-info} is set
appropriately, having started from @samp{3:0:0} in GMP 3.0 (@pxref{Versioning,
Library interface versions, Library interface versions, libtool, GNU
Libtool}).

The GMP 4 series will be upwardly binary compatible in each release and will
be upwardly binary compatible with all of the GMP 3 series. Additional
function interfaces may be added in each release, so on systems where libtool
versioning is not fully checked by the loader an auxiliary mechanism may be
needed to express that a dynamic linked application depends on a new enough
GMP.

An auxiliary mechanism may also be needed to express that @file{libgmpxx.la}
(from @option{--enable-cxx}, @pxref{Build Options}) requires @file{libgmp.la}
from the same GMP version, since this is not done by the libtool versioning,
nor otherwise.
A mismatch will result in unresolved symbols from the linker,
or perhaps the loader.

When building a package for a CPU family, care should be taken to use
@samp{--host} (or @samp{--build}) to choose the least common denominator among
the CPUs which might use the package. For example this might mean plain
@samp{sparc} (meaning V7) for SPARCs.

For x86s, @option{--enable-fat} sets things up for a fat binary build, making a
runtime selection of optimized low level routines. This is a good choice for
packaging to run on a range of x86 chips.

Users who care about speed will want GMP built for their exact CPU type, to
make best use of the available optimizations. Providing a way to suitably
rebuild a package may be useful. This could be as simple as making it
possible for a user to omit @samp{--build} (and @samp{--host}) so
@samp{./config.guess} will detect the CPU@. But a way to manually specify a
@samp{--build} will be wanted for systems where @samp{./config.guess} is
inexact.

On systems with multiple ABIs, a packaged build will need to decide which
among the choices is to be provided, see @ref{ABI and ISA}. A given run of
@samp{./configure} etc will only build one ABI@. If a second ABI is also
required then a second run of @samp{./configure} etc must be made, starting
from a clean directory tree (@samp{make distclean}).

As noted under ``ABI and ISA'', currently no attempt is made to follow system
conventions for install locations that vary with ABI, such as
@file{/usr/lib/sparcv9} for @samp{ABI=64} as opposed to @file{/usr/lib} for
@samp{ABI=32}. A package build can override @samp{libdir} and other standard
variables as necessary.

Note that @file{gmp.h} is a generated file, and will be architecture and ABI
dependent.
When attempting to install two ABIs simultaneously it will be
important that an application compile gets the correct @file{gmp.h} for its
desired ABI@. If compiler include paths don't vary with ABI options then it
might be necessary to create a @file{/usr/include/gmp.h} which tests
preprocessor symbols and chooses the correct actual @file{gmp.h}.


@need 2000
@node Notes for Particular Systems, Known Build Problems, Notes for Package Builds, Installing GMP
@section Notes for Particular Systems
@cindex Build notes for particular systems
@cindex Particular systems
@cindex Systems
@table @asis

@c This section is more or less meant for notes about performance or about
@c build problems that have been worked around but might leave a user
@c scratching their head. Fun with different ABIs on a system belongs in the
@c above section.

@item AIX 3 and 4
@cindex AIX
On systems @samp{*-*-aix[34]*} shared libraries are disabled by default, since
some versions of the native @command{ar} fail on the convenience libraries
used. A shared build can be attempted with

@example
./configure --enable-shared --disable-static
@end example

Note that the @samp{--disable-static} is necessary because in a shared build
libtool makes @file{libgmp.a} a symlink to @file{libgmp.so}, apparently for
the benefit of old versions of @command{ld} which only recognise @file{.a},
but unfortunately this is done even if a fully functional @command{ld} is
available.

@item ARM
@cindex ARM
On systems @samp{arm*-*-*}, versions of GCC up to and including 2.95.3 have a
bug in unsigned division, giving wrong results for some operands. GMP
@samp{./configure} will demand GCC 2.95.4 or later.

@item Compaq C++
@cindex Compaq C++
Compaq C++ on OSF 5.1 has two flavours of @code{iostream}, a standard one and
an old pre-standard one (see @samp{man iostream_intro}). GMP can only use the
standard one, which unfortunately is not the default but must be selected by
defining @code{__USE_STD_IOSTREAM}. Configure with for instance

@example
./configure --enable-cxx CPPFLAGS=-D__USE_STD_IOSTREAM
@end example

@item Floating Point Mode
@cindex Floating point mode
@cindex Hardware floating point mode
@cindex Precision of hardware floating point
@cindex x87
On some systems, the hardware floating point has a control mode which can set
all operations to be done in a particular precision, for instance single,
double or extended on x86 systems (x87 floating point). The GMP functions
involving a @code{double} cannot be expected to operate to their full
precision when the hardware is in single precision mode. Of course this
affects all code, including application code, not just GMP.

@item FreeBSD 7.x, 8.x, 9.0, 9.1, 9.2
@cindex FreeBSD
@command{m4} in these releases of FreeBSD has an eval function which ignores
its 2nd and 3rd arguments, which makes it unsuitable for @file{.asm} file
processing. @samp{./configure} will detect the problem and either abort or
choose another m4 in the @env{PATH}. The bug is fixed in FreeBSD 9.3 and 10.0,
so either upgrade or use GNU m4. Note that the FreeBSD package system installs
GNU m4 under the name @samp{gm4}, which GMP cannot guess.

@item FreeBSD 7.x, 8.x, 9.x
@cindex FreeBSD
GMP releases starting with 6.0 do not support @samp{ABI=32} on FreeBSD/amd64
prior to release 10.0 of the system. The cause is a broken @code{limits.h},
which GMP no longer works around.

@item MS-DOS and MS Windows
@cindex MS-DOS
@cindex MS Windows
@cindex Windows
@cindex Cygwin
@cindex DJGPP
@cindex MINGW
On an MS-DOS system DJGPP can be used to build GMP, and on an MS Windows
system Cygwin, DJGPP and MINGW can be used. All three are excellent ports of
GCC and the various GNU tools.

@display
@uref{http://www.cygwin.com/}
@uref{http://www.delorie.com/djgpp/}
@uref{http://www.mingw.org/}
@end display

@cindex Interix
@cindex Services for Unix
Microsoft also publishes an Interix ``Services for Unix'' which can be used to
build GMP on Windows (with a normal @samp{./configure}), but it's not free
software.

@item MS Windows DLLs
@cindex DLLs
@cindex MS Windows
@cindex Windows
On systems @samp{*-*-cygwin*}, @samp{*-*-mingw*} and @samp{*-*-pw32*} by
default GMP builds only a static library, but a DLL can be built instead using

@example
./configure --disable-static --enable-shared
@end example

Static and DLL libraries can't both be built, since certain export directives
in @file{gmp.h} must be different.

A MINGW DLL build of GMP can be used with Microsoft C@. Libtool doesn't
install a @file{.lib} format import library, but it can be created with MS
@command{lib} as follows, and copied to the install directory. Similarly for
@file{libmp} and @file{libgmpxx}.

@example
cd .libs
lib /def:libgmp-3.dll.def /out:libgmp-3.lib
@end example

MINGW uses the C runtime library @samp{msvcrt.dll} for I/O, so applications
wanting to use the GMP I/O routines must be compiled with @samp{cl /MD} to do
the same. If one of the other C runtime library choices provided by MS C is
desired then the suggestion is to use the GMP string functions and confine I/O
to the application.

@item Motorola 68k CPU Types
@cindex 68000
@samp{m68k} is taken to mean 68000. @samp{m68020} or higher will give a
performance boost on applicable CPUs. @samp{m68360} can be used for CPU32
series chips. @samp{m68302} can be used for ``Dragonball'' series chips,
though this is merely a synonym for @samp{m68000}.

@item NetBSD 5.x
@cindex NetBSD
@command{m4} in these releases of NetBSD has an eval function which ignores its
2nd and 3rd arguments, which makes it unsuitable for @file{.asm} file
processing. @samp{./configure} will detect the problem and either abort or
choose another m4 in the @env{PATH}. The bug is fixed in NetBSD 6, so either
upgrade or use GNU m4. Note that the NetBSD package system installs GNU m4
under the name @samp{gm4}, which GMP cannot guess.

@item OpenBSD 2.6
@cindex OpenBSD
@command{m4} in this release of OpenBSD has a bug in @code{eval} that makes it
unsuitable for @file{.asm} file processing. @samp{./configure} will detect
the problem and either abort or choose another m4 in the @env{PATH}. The bug
is fixed in OpenBSD 2.7, so either upgrade or use GNU m4.

@item Power CPU Types
@cindex Power/PowerPC
In GMP, CPU types @samp{power*} and @samp{powerpc*} will each use instructions
not available on the other, so it's important to choose the right one for the
CPU that will be used. Currently GMP has no assembly code support for using
just the common instruction subset. To get executables that run on both, the
current suggestion is to use the generic C code (@option{--disable-assembly}),
possibly with appropriate compiler options (like @samp{-mcpu=common} for
@command{gcc}). CPU @samp{rs6000} (which is not a CPU but a family of
workstations) is accepted by @file{config.sub}, but is currently equivalent to
@option{--disable-assembly}.
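
As a sketch of the suggestion above, a generic C build meant to run on both
@samp{power*} and @samp{powerpc*} chips might be configured as follows. The
@samp{-O2} is an assumption added for illustration only, and whether
@samp{-mcpu=common} is accepted depends on the version of @command{gcc} in
use.

@example
./configure --disable-assembly CC=gcc CFLAGS="-O2 -mcpu=common"
@end example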

@item Sparc CPU Types
@cindex Sparc
@samp{sparcv8} or @samp{supersparc} on relevant systems will give a
significant performance increase over the V7 code selected by plain
@samp{sparc}.

@item Sparc App Regs
@cindex Sparc
The GMP assembly code for both 32-bit and 64-bit Sparc clobbers the
``application registers'' @code{g2}, @code{g3} and @code{g4}, the same way
that the GCC default @samp{-mapp-regs} does (@pxref{SPARC Options,, SPARC
Options, gcc, Using the GNU Compiler Collection (GCC)}).

This makes that code unsuitable for use with the special V9
@samp{-mcmodel=embmedany} (which uses @code{g4} as a data segment pointer), and
for applications wanting to use those registers for special purposes. In these
cases the only suggestion currently is to build GMP with
@option{--disable-assembly} to avoid the assembly code.

@item SunOS 4
@cindex SunOS
@command{/usr/bin/m4} lacks various features needed to process @file{.asm}
files, and instead @samp{./configure} will automatically use
@command{/usr/5bin/m4}, which we believe is always available (if not then use
GNU m4).

@item x86 CPU Types
@cindex x86
@cindex 80x86
@cindex i386
@samp{i586}, @samp{pentium} or @samp{pentiummmx} code is good for its intended
P5 Pentium chips, but quite slow when run on Intel P6 class chips (PPro, P-II,
P-III)@. @samp{i386} is a better choice when making binaries that must run on
both.

@item x86 MMX and SSE2 Code
@cindex MMX
@cindex SSE2
If the CPU selected has MMX code but the assembler doesn't support it, a
warning is given and non-MMX code is used instead. This will be an inferior
build, since the MMX code that's present is there because it's faster than the
corresponding plain integer code. The same applies to SSE2.

Old versions of @samp{gas} don't support MMX instructions, in particular
version 1.92.3 that comes with FreeBSD 2.2.8 or the more recent OpenBSD 3.1
doesn't.

Solaris 2.6 and 2.7 @command{as} generate incorrect object code for register
to register @code{movq} instructions, and so can't be used for MMX code.
Install a recent @command{gas} if MMX code is wanted on these systems.
@end table


@need 2000
@node Known Build Problems, Performance optimization, Notes for Particular Systems, Installing GMP
@section Known Build Problems
@cindex Build problems known

@c This section is more or less meant for known build problems that are not
@c otherwise worked around and require some sort of manual intervention.

You might find more up-to-date information at @uref{https://gmplib.org/}.

@table @asis
@item Compiler link options
The version of libtool currently in use rather aggressively strips compiler
options when linking a shared library. This will hopefully be relaxed in the
future, but for now if this is a problem the suggestion is to create a little
script to hide them, and for instance configure with

@example
./configure CC=gcc-with-my-options
@end example

@item DJGPP (@samp{*-*-msdosdjgpp*})
@cindex DJGPP
The DJGPP port of @command{bash} 2.03 is unable to run the @samp{configure}
script, it exits silently, having died writing a preamble to
@file{config.log}. Use @command{bash} 2.04 or higher.

@samp{make all} was found to run out of memory during the final
@file{libgmp.la} link on one system tested, despite having 64Mb available.
Running @samp{make libgmp.la} directly helped, perhaps recursing into the
various subdirectories uses up memory.

@item GNU binutils @command{strip} prior to 2.12
@cindex Stripped libraries
@cindex Binutils @command{strip}
@cindex GNU @command{strip}
@command{strip} from GNU binutils 2.11 and earlier should not be used on the
static libraries @file{libgmp.a} and @file{libmp.a} since it will discard all
but the last of multiple archive members with the same name, like the three
versions of @file{init.o} in @file{libgmp.a}. Binutils 2.12 or higher can be
used successfully.

The shared libraries @file{libgmp.so} and @file{libmp.so} are not affected by
this and any version of @command{strip} can be used on them.

@item @command{make} syntax error
@cindex SCO
@cindex IRIX
On certain versions of SCO OpenServer 5 and IRIX 6.5 the native @command{make}
is unable to handle the long dependencies list for @file{libgmp.la}. The
symptom is a ``syntax error'' on the following line of the top-level
@file{Makefile}.

@example
libgmp.la: $(libgmp_la_OBJECTS) $(libgmp_la_DEPENDENCIES)
@end example

Either use GNU Make, or as a workaround remove
@code{$(libgmp_la_DEPENDENCIES)} from that line (which will make the initial
build work, but if any recompiling is done @file{libgmp.la} might not be
rebuilt).

@item MacOS X (@samp{*-*-darwin*})
@cindex MacOS X
@cindex Darwin
Libtool currently only knows how to create shared libraries on MacOS X using
the native @command{cc} (which is a modified GCC), not a plain GCC@. A
static-only build should work though (@samp{--disable-shared}).

@item NeXT prior to 3.3
@cindex NeXT
The system compiler on old versions of NeXT was a massacred and old GCC, even
if it called itself @file{cc}. This compiler cannot be used to build GMP, you
need to get a real GCC, and install that. (NeXT may have fixed this in
release 3.3 of their system.)

@item POWER and PowerPC
@cindex Power/PowerPC
Bugs in GCC 2.7.2 (and 2.6.3) mean it can't be used to compile GMP on POWER or
PowerPC@. If you want to use GCC for these machines, get GCC 2.7.2.1 (or
later).

@item Sequent Symmetry
@cindex Sequent Symmetry
Use the GNU assembler instead of the system assembler, since the latter has
serious bugs.

@item Solaris 2.6
@cindex Solaris
The system @command{sed} prints an error ``Output line too long'' when libtool
builds @file{libgmp.la}. This doesn't seem to cause any obvious ill effects,
but GNU @command{sed} is recommended, to avoid any doubt.

@item Sparc Solaris 2.7 with gcc 2.95.2 in @samp{ABI=32}
@cindex Solaris
A shared library build of GMP seems to fail in this combination, it builds but
then fails the tests, apparently due to some incorrect data relocations within
@code{gmp_randinit_lc_2exp_size}. The exact cause is unknown,
@samp{--disable-shared} is recommended.
@end table


@need 2000
@node Performance optimization, , Known Build Problems, Installing GMP
@section Performance optimization
@cindex Optimizing performance

@c At some point, this should perhaps move to a separate chapter on optimizing
@c performance.

For optimal performance, build GMP for the exact CPU type of the target
computer, see @ref{Build Options}.

Unlike what is the case for most other programs, the compiler typically
doesn't matter much, since GMP uses assembly language for the most critical
operations.

In particular for long-running GMP applications, and applications demanding
extremely large numbers, building and running the @code{tuneup} program in the
@file{tune} subdirectory can be important.
For example, 1790 1791@example 1792cd tune 1793make tuneup 1794./tuneup 1795@end example 1796 1797will generate better contents for the @file{gmp-mparam.h} parameter file. 1798 1799To use the results, put the output in the file indicated in the 1800@samp{Parameters for ...} header. Then recompile from scratch. 1801 1802The @code{tuneup} program takes one useful parameter, @samp{-f NNN}, which 1803instructs the program how long to check FFT multiply parameters. If you're 1804going to use GMP for extremely large numbers, you may want to run @code{tuneup} 1805with a large NNN value. 1806 1807 1808@node GMP Basics, Reporting Bugs, Installing GMP, Top 1809@comment node-name, next, previous, up 1810@chapter GMP Basics 1811@cindex Basics 1812 1813@strong{Using functions, macros, data types, etc.@: not documented in this 1814manual is strongly discouraged. If you do so your application is guaranteed 1815to be incompatible with future versions of GMP.} 1816 1817@menu 1818* Headers and Libraries:: 1819* Nomenclature and Types:: 1820* Function Classes:: 1821* Variable Conventions:: 1822* Parameter Conventions:: 1823* Memory Management:: 1824* Reentrancy:: 1825* Useful Macros and Constants:: 1826* Compatibility with older versions:: 1827* Demonstration Programs:: 1828* Efficiency:: 1829* Debugging:: 1830* Profiling:: 1831* Autoconf:: 1832* Emacs:: 1833@end menu 1834 1835@node Headers and Libraries, Nomenclature and Types, GMP Basics, GMP Basics 1836@section Headers and Libraries 1837@cindex Headers 1838 1839@cindex @file{gmp.h} 1840@cindex Include files 1841@cindex @code{#include} 1842All declarations needed to use GMP are collected in the include file 1843@file{gmp.h}. It is designed to work with both C and C++ compilers. 1844 1845@example 1846#include <gmp.h> 1847@end example 1848 1849@cindex @code{stdio.h} 1850Note however that prototypes for GMP functions with @code{FILE *} parameters 1851are only provided if @code{<stdio.h>} is included too. 
1852 1853@example 1854#include <stdio.h> 1855#include <gmp.h> 1856@end example 1857 1858@cindex @code{stdarg.h} 1859Likewise @code{<stdarg.h>} is required for prototypes with @code{va_list} 1860parameters, such as @code{gmp_vprintf}. And @code{<obstack.h>} for prototypes 1861with @code{struct obstack} parameters, such as @code{gmp_obstack_printf}, when 1862available. 1863 1864@cindex Libraries 1865@cindex Linking 1866@cindex @code{libgmp} 1867All programs using GMP must link against the @file{libgmp} library. On a 1868typical Unix-like system this can be done with @samp{-lgmp}, for example 1869 1870@example 1871gcc myprogram.c -lgmp 1872@end example 1873 1874@cindex @code{libgmpxx} 1875GMP C++ functions are in a separate @file{libgmpxx} library. This is built 1876and installed if C++ support has been enabled (@pxref{Build Options}). For 1877example, 1878 1879@example 1880g++ mycxxprog.cc -lgmpxx -lgmp 1881@end example 1882 1883@cindex Libtool 1884GMP is built using Libtool and an application can use that to link if desired, 1885@GMPpxreftop{libtool, GNU Libtool}. 1886 1887If GMP has been installed to a non-standard location then it may be necessary 1888to use @samp{-I} and @samp{-L} compiler options to point to the right 1889directories, and some sort of run-time path for a shared library. 1890 1891 1892@node Nomenclature and Types, Function Classes, Headers and Libraries, GMP Basics 1893@section Nomenclature and Types 1894@cindex Nomenclature 1895@cindex Types 1896 1897@cindex Integer 1898@tindex @code{mpz_t} 1899In this manual, @dfn{integer} usually means a multiple precision integer, as 1900defined by the GMP library. The C data type for such integers is @code{mpz_t}. 1901Here are some examples of how to declare such integers: 1902 1903@example 1904mpz_t sum; 1905 1906struct foo @{ mpz_t x, y; @}; 1907 1908mpz_t vec[20]; 1909@end example 1910 1911@cindex Rational number 1912@tindex @code{mpq_t} 1913@dfn{Rational number} means a multiple precision fraction. 
The C data type
for these fractions is @code{mpq_t}. For example:

@example
mpq_t quotient;
@end example

@cindex Floating-point number
@tindex @code{mpf_t}
@dfn{Floating point number} or @dfn{Float} for short, is an arbitrary precision
mantissa with a limited precision exponent. The C data type for such objects
is @code{mpf_t}. For example:

@example
mpf_t fp;
@end example

@tindex @code{mp_exp_t}
The floating point functions accept and return exponents in the C type
@code{mp_exp_t}. Currently this is usually a @code{long}, but on some systems
it's an @code{int} for efficiency.

@cindex Limb
@tindex @code{mp_limb_t}
A @dfn{limb} means the part of a multi-precision number that fits in a single
machine word. (We chose this word because a limb of the human body is
analogous to a digit, only larger, and containing several digits.) Normally a
limb is 32 or 64 bits. The C data type for a limb is @code{mp_limb_t}.

@tindex @code{mp_size_t}
Counts of limbs of a multi-precision number are represented in the C type
@code{mp_size_t}. Currently this is normally a @code{long}, but on some
systems it's an @code{int} for efficiency, and on some systems it will be
@code{long long} in the future.

@tindex @code{mp_bitcnt_t}
Counts of bits of a multi-precision number are represented in the C type
@code{mp_bitcnt_t}. Currently this is always an @code{unsigned long}, but on
some systems it will be an @code{unsigned long long} in the future.

@cindex Random state
@tindex @code{gmp_randstate_t}
@dfn{Random state} means an algorithm selection and current state data. The C
data type for such objects is @code{gmp_randstate_t}.
For example:

@example
gmp_randstate_t rstate;
@end example

Also, in general @code{mp_bitcnt_t} is used for bit counts and ranges, and
@code{size_t} is used for byte or character counts.


@node Function Classes, Variable Conventions, Nomenclature and Types, GMP Basics
@section Function Classes
@cindex Function classes

There are five classes of functions in the GMP library:

@enumerate
@item
Functions for signed integer arithmetic, with names beginning with
@code{mpz_}. The associated type is @code{mpz_t}. There are about 150
functions in this class. (@pxref{Integer Functions})

@item
Functions for rational number arithmetic, with names beginning with
@code{mpq_}. The associated type is @code{mpq_t}. There are about 35
functions in this class, but the integer functions can be used for arithmetic
on the numerator and denominator separately. (@pxref{Rational Number
Functions})

@item
Functions for floating-point arithmetic, with names beginning with
@code{mpf_}. The associated type is @code{mpf_t}. There are about 70
functions in this class. (@pxref{Floating-point Functions})

@item
Fast low-level functions that operate on natural numbers. These are used by
the functions in the preceding groups, and you can also call them directly
from very time-critical user programs. These functions' names begin with
@code{mpn_}. The associated type is array of @code{mp_limb_t}. There are
about 60 (hard-to-use) functions in this class. (@pxref{Low-level Functions})

@item
Miscellaneous functions. Functions for setting up custom allocation and
functions for generating random numbers.
(@pxref{Custom Allocation}, and 2000@pxref{Random Number Functions}) 2001@end enumerate 2002 2003 2004@node Variable Conventions, Parameter Conventions, Function Classes, GMP Basics 2005@section Variable Conventions 2006@cindex Variable conventions 2007@cindex Conventions for variables 2008 2009GMP functions generally have output arguments before input arguments. This 2010notation is by analogy with the assignment operator. The BSD MP compatibility 2011functions are exceptions, having the output arguments last. 2012 2013GMP lets you use the same variable for both input and output in one call. For 2014example, the main function for integer multiplication, @code{mpz_mul}, can be 2015used to square @code{x} and put the result back in @code{x} with 2016 2017@example 2018mpz_mul (x, x, x); 2019@end example 2020 2021Before you can assign to a GMP variable, you need to initialize it by calling 2022one of the special initialization functions. When you're done with a 2023variable, you need to clear it out, using one of the functions for that 2024purpose. Which function to use depends on the type of variable. See the 2025chapters on integer functions, rational number functions, and floating-point 2026functions for details. 2027 2028A variable should only be initialized once, or at least cleared between each 2029initialization. After a variable has been initialized, it may be assigned to 2030any number of times. 2031 2032For efficiency reasons, avoid excessive initializing and clearing. In 2033general, initialize near the start of a function and clear near the end. 
For 2034example, 2035 2036@example 2037void 2038foo (void) 2039@{ 2040 mpz_t n; 2041 int i; 2042 mpz_init (n); 2043 for (i = 1; i < 100; i++) 2044 @{ 2045 mpz_mul (n, @dots{}); 2046 mpz_fdiv_q (n, @dots{}); 2047 @dots{} 2048 @} 2049 mpz_clear (n); 2050@} 2051@end example 2052 2053 2054@node Parameter Conventions, Memory Management, Variable Conventions, GMP Basics 2055@section Parameter Conventions 2056@cindex Parameter conventions 2057@cindex Conventions for parameters 2058 2059When a GMP variable is used as a function parameter, it's effectively a 2060call-by-reference, meaning if the function stores a value there it will change 2061the original in the caller. Parameters which are input-only can be designated 2062@code{const} to provoke a compiler error or warning on attempting to modify 2063them. 2064 2065When a function is going to return a GMP result, it should designate a 2066parameter that it sets, like the library functions do. More than one value 2067can be returned by having more than one output parameter, again like the 2068library functions. A @code{return} of an @code{mpz_t} etc doesn't return the 2069object, only a pointer, and this is almost certainly not what's wanted. 2070 2071Here's an example accepting an @code{mpz_t} parameter, doing a calculation, 2072and storing the result to the indicated parameter. 2073 2074@example 2075void 2076foo (mpz_t result, const mpz_t param, unsigned long n) 2077@{ 2078 unsigned long i; 2079 mpz_mul_ui (result, param, n); 2080 for (i = 1; i < n; i++) 2081 mpz_add_ui (result, result, i*7); 2082@} 2083 2084int 2085main (void) 2086@{ 2087 mpz_t r, n; 2088 mpz_init (r); 2089 mpz_init_set_str (n, "123456", 0); 2090 foo (r, n, 20L); 2091 gmp_printf ("%Zd\n", r); 2092 return 0; 2093@} 2094@end example 2095 2096@code{foo} works even if the mainline passes the same variable for 2097@code{param} and @code{result}, just like the library functions. 
But 2098sometimes it's tricky to make that work, and an application might not want to 2099bother supporting that sort of thing. 2100 2101For interest, the GMP types @code{mpz_t} etc are implemented as one-element 2102arrays of certain structures. This is why declaring a variable creates an 2103object with the fields GMP needs, but then using it as a parameter passes a 2104pointer to the object. Note that the actual fields in each @code{mpz_t} etc 2105are for internal use only and should not be accessed directly by code that 2106expects to be compatible with future GMP releases. 2107 2108 2109@need 1000 2110@node Memory Management, Reentrancy, Parameter Conventions, GMP Basics 2111@section Memory Management 2112@cindex Memory management 2113 2114The GMP types like @code{mpz_t} are small, containing only a couple of sizes, 2115and pointers to allocated data. Once a variable is initialized, GMP takes 2116care of all space allocation. Additional space is allocated whenever a 2117variable doesn't have enough. 2118 2119@code{mpz_t} and @code{mpq_t} variables never reduce their allocated space. 2120Normally this is the best policy, since it avoids frequent reallocation. 2121Applications that need to return memory to the heap at some particular point 2122can use @code{mpz_realloc2}, or clear variables no longer needed. 2123 2124@code{mpf_t} variables, in the current implementation, use a fixed amount of 2125space, determined by the chosen precision and allocated at initialization, so 2126their size doesn't change. 2127 2128All memory is allocated using @code{malloc} and friends by default, but this 2129can be changed, see @ref{Custom Allocation}. Temporary memory on the stack is 2130also used (via @code{alloca}), but this can be changed at build-time if 2131desired, see @ref{Build Options}. 
2132 2133 2134@node Reentrancy, Useful Macros and Constants, Memory Management, GMP Basics 2135@section Reentrancy 2136@cindex Reentrancy 2137@cindex Thread safety 2138@cindex Multi-threading 2139 2140@noindent 2141GMP is reentrant and thread-safe, with some exceptions: 2142 2143@itemize @bullet 2144@item 2145If configured with @option{--enable-alloca=malloc-notreentrant} (or with 2146@option{--enable-alloca=notreentrant} when @code{alloca} is not available), 2147then naturally GMP is not reentrant. 2148 2149@item 2150@code{mpf_set_default_prec} and @code{mpf_init} use a global variable for the 2151selected precision. @code{mpf_init2} can be used instead, and in the C++ 2152interface an explicit precision to the @code{mpf_class} constructor. 2153 2154@item 2155@code{mpz_random} and the other old random number functions use a global 2156random state and are hence not reentrant. The newer random number functions 2157that accept a @code{gmp_randstate_t} parameter can be used instead. 2158 2159@item 2160@code{gmp_randinit} (obsolete) returns an error indication through a global 2161variable, which is not thread safe. Applications are advised to use 2162@code{gmp_randinit_default} or @code{gmp_randinit_lc_2exp} instead. 2163 2164@item 2165@code{mp_set_memory_functions} uses global variables to store the selected 2166memory allocation functions. 2167 2168@item 2169If the memory allocation functions set by a call to 2170@code{mp_set_memory_functions} (or @code{malloc} and friends by default) are 2171not reentrant, then GMP will not be reentrant either. 2172 2173@item 2174If the standard I/O functions such as @code{fwrite} are not reentrant then the 2175GMP I/O functions using them will not be reentrant either. 2176 2177@item 2178It's safe for two threads to read from the same GMP variable simultaneously, 2179but it's not safe for one to read while another might be writing, nor for 2180two threads to write simultaneously. 
It's not safe for two threads to 2181generate a random number from the same @code{gmp_randstate_t} simultaneously, 2182since this involves an update of that variable. 2183@end itemize 2184 2185 2186@need 2000 2187@node Useful Macros and Constants, Compatibility with older versions, Reentrancy, GMP Basics 2188@section Useful Macros and Constants 2189@cindex Useful macros and constants 2190@cindex Constants 2191 2192@deftypevr {Global Constant} {const int} mp_bits_per_limb 2193@findex mp_bits_per_limb 2194@cindex Bits per limb 2195@cindex Limb size 2196The number of bits per limb. 2197@end deftypevr 2198 2199@defmac __GNU_MP_VERSION 2200@defmacx __GNU_MP_VERSION_MINOR 2201@defmacx __GNU_MP_VERSION_PATCHLEVEL 2202@cindex Version number 2203@cindex GMP version number 2204The major and minor GMP version, and patch level, respectively, as integers. 2205For GMP i.j, these numbers will be i, j, and 0, respectively. 2206For GMP i.j.k, these numbers will be i, j, and k, respectively. 2207@end defmac 2208 2209@deftypevr {Global Constant} {const char * const} gmp_version 2210@findex gmp_version 2211The GMP version number, as a null-terminated string, in the form ``i.j.k''. 2212This release is @nicode{"@value{VERSION}"}. Note that the format ``i.j'' was 2213used, before version 4.3.0, when k was zero. 2214@end deftypevr 2215 2216@defmac __GMP_CC 2217@defmacx __GMP_CFLAGS 2218The compiler and compiler flags, respectively, used when compiling GMP, as 2219strings. 2220@end defmac 2221 2222 2223@node Compatibility with older versions, Demonstration Programs, Useful Macros and Constants, GMP Basics 2224@section Compatibility with older versions 2225@cindex Compatibility with older versions 2226@cindex Past GMP versions 2227@cindex Upward compatibility 2228 2229This version of GMP is upwardly binary compatible with all 5.x, 4.x, and 3.x 2230versions, and upwardly compatible at the source level with all 2.x versions, 2231with the following exceptions. 
2232 2233@itemize @bullet 2234@item 2235@code{mpn_gcd} had its source arguments swapped as of GMP 3.0, for consistency 2236with other @code{mpn} functions. 2237 2238@item 2239@code{mpf_get_prec} counted precision slightly differently in GMP 3.0 and 22403.0.1, but in 3.1 reverted to the 2.x style. 2241 2242@item 2243@code{mpn_bdivmod}, documented as preliminary in GMP 4, has been removed. 2244@end itemize 2245 2246There are a number of compatibility issues between GMP 1 and GMP 2 that of 2247course also apply when porting applications from GMP 1 to GMP 5. Please 2248see the GMP 2 manual for details. 2249 2250@c @item Integer division functions round the result differently. The obsolete 2251@c functions (@code{mpz_div}, @code{mpz_divmod}, @code{mpz_mdiv}, 2252@c @code{mpz_mdivmod}, etc) now all use floor rounding (i.e., they round the 2253@c quotient towards 2254@c @ifinfo 2255@c @minus{}infinity). 2256@c @end ifinfo 2257@c @iftex 2258@c @tex 2259@c $-\infty$). 2260@c @end tex 2261@c @end iftex 2262@c There are a lot of functions for integer division, giving the user better 2263@c control over the rounding. 2264 2265@c @item The function @code{mpz_mod} now compute the true @strong{mod} function. 2266 2267@c @item The functions @code{mpz_powm} and @code{mpz_powm_ui} now use 2268@c @strong{mod} for reduction. 2269 2270@c @item The assignment functions for rational numbers do no longer canonicalize 2271@c their results. In the case a non-canonical result could arise from an 2272@c assignment, the user need to insert an explicit call to 2273@c @code{mpq_canonicalize}. This change was made for efficiency. 2274 2275@c @item Output generated by @code{mpz_out_raw} in this release cannot be read 2276@c by @code{mpz_inp_raw} in previous releases. This change was made for making 2277@c the file format truly portable between machines with different word sizes. 2278 2279@c @item Several @code{mpn} functions have changed. 
But they were intentionally 2280@c undocumented in previous releases. 2281 2282@c @item The functions @code{mpz_cmp_ui}, @code{mpz_cmp_si}, and @code{mpq_cmp_ui} 2283@c are now implemented as macros, and thereby sometimes evaluate their 2284@c arguments multiple times. 2285 2286@c @item The functions @code{mpz_pow_ui} and @code{mpz_ui_pow_ui} now yield 1 2287@c for 0^0. (In version 1, they yielded 0.) 2288 2289@c In version 1 of the library, @code{mpq_set_den} handled negative 2290@c denominators by copying the sign to the numerator. That is no longer done. 2291 2292@c Pure assignment functions do not canonicalize the assigned variable. It is 2293@c the responsibility of the user to canonicalize the assigned variable before 2294@c any arithmetic operations are performed on that variable. 2295@c Note that this is an incompatible change from version 1 of the library. 2296 2297@c @end enumerate 2298 2299 2300@need 1000 2301@node Demonstration Programs, Efficiency, Compatibility with older versions, GMP Basics 2302@section Demonstration programs 2303@cindex Demonstration programs 2304@cindex Example programs 2305@cindex Sample programs 2306The @file{demos} subdirectory has some sample programs using GMP@. These 2307aren't built or installed, but there's a @file{Makefile} with rules for them. 2308For instance, 2309 2310@example 2311make pexpr 2312./pexpr 68^975+10 2313@end example 2314 2315@noindent 2316The following programs are provided 2317 2318@itemize @bullet 2319@item 2320@cindex Expression parsing demo 2321@cindex Parsing expressions demo 2322@samp{pexpr} is an expression evaluator, the program used on the GMP web page. 2323@item 2324@cindex Expression parsing demo 2325@cindex Parsing expressions demo 2326The @samp{calc} subdirectory has a similar but simpler evaluator using 2327@command{lex} and @command{yacc}. 
@item
@cindex Expression parsing demo
@cindex Parsing expressions demo
The @samp{expr} subdirectory is yet another expression evaluator, a library
designed for ease of use within a C program. See @file{demos/expr/README} for
more information.
@item
@cindex Factorization demo
@samp{factorize} is a Pollard-Rho factorization program.
@item
@samp{isprime} is a command-line interface to the @code{mpz_probab_prime_p}
function.
@item
@samp{primes} counts or lists primes in an interval, using a sieve.
@item
@samp{qcn} is an example use of @code{mpz_kronecker_ui} to estimate quadratic
class numbers.
@item
@cindex @code{perl}
@cindex GMP Perl module
@cindex Perl module
The @samp{perl} subdirectory is a comprehensive perl interface to GMP@. See
@file{demos/perl/INSTALL} for more information. Documentation is in POD
format in @file{demos/perl/GMP.pm}.
@end itemize

As an aside, consideration has been given at various times to some sort of
expression evaluation within the main GMP library. Going beyond something
minimal quickly leads to matters like user-defined functions, looping, fixnums
for control variables, etc, which are considered outside the scope of GMP
(much closer to language interpreters or compilers, @pxref{Language Bindings}).
Something simple for program input convenience may yet be a possibility, a
combination of the @file{expr} demo and the @file{pexpr} tree back-end
perhaps. But for now the above evaluators are offered as illustrations.


@need 1000
@node Efficiency, Debugging, Demonstration Programs, GMP Basics
@section Efficiency
@cindex Efficiency

@table @asis
@item Small Operands
@cindex Small operands
On small operands, the time for function call overheads and memory allocation
can be significant in comparison to actual calculation.
This is unavoidable 2374in a general purpose variable precision library, although GMP attempts to be 2375as efficient as it can on both large and small operands. 2376 2377@item Static Linking 2378@cindex Static linking 2379On some CPUs, in particular the x86s, the static @file{libgmp.a} should be 2380used for maximum speed, since the PIC code in the shared @file{libgmp.so} will 2381have a small overhead on each function call and global data address. For many 2382programs this will be insignificant, but for long calculations there's a gain 2383to be had. 2384 2385@item Initializing and Clearing 2386@cindex Initializing and clearing 2387Avoid excessive initializing and clearing of variables, since this can be 2388quite time consuming, especially in comparison to otherwise fast operations 2389like addition. 2390 2391A language interpreter might want to keep a free list or stack of 2392initialized variables ready for use. It should be possible to integrate 2393something like that with a garbage collector too. 2394 2395@item Reallocations 2396@cindex Reallocations 2397An @code{mpz_t} or @code{mpq_t} variable used to hold successively increasing 2398values will have its memory repeatedly @code{realloc}ed, which could be quite 2399slow or could fragment memory, depending on the C library. If an application 2400can estimate the final size then @code{mpz_init2} or @code{mpz_realloc2} can 2401be called to allocate the necessary space from the beginning 2402(@pxref{Initializing Integers}). 2403 2404It doesn't matter if a size set with @code{mpz_init2} or @code{mpz_realloc2} 2405is too small, since all functions will do a further reallocation if necessary. 2406Badly overestimating memory required will waste space though. 2407 2408@item @code{2exp} Functions 2409@cindex @code{2exp} functions 2410It's up to an application to call functions like @code{mpz_mul_2exp} when 2411appropriate. 
General purpose functions like @code{mpz_mul} make no attempt to
identify powers of two or other special forms, because such inputs will
usually be very rare and testing every time would be wasteful.

@item @code{ui} and @code{si} Functions
@cindex @code{ui} and @code{si} functions
The @code{ui} functions and the small number of @code{si} functions exist for
convenience and should be used where applicable. But if for example an
@code{mpz_t} contains a value that fits in an @code{unsigned long} there's no
need to extract it and call a @code{ui} function, just use the regular
@code{mpz} function.

@item In-Place Operations
@cindex In-place operations
@code{mpz_abs}, @code{mpq_abs}, @code{mpf_abs}, @code{mpz_neg}, @code{mpq_neg}
and @code{mpf_neg} are fast when used for in-place operations like
@code{mpz_abs(x,x)}, since in the current implementation only a single field
of @code{x} needs changing. On suitable compilers (GCC for instance) this is
inlined too.

@code{mpz_add_ui}, @code{mpz_sub_ui}, @code{mpf_add_ui} and @code{mpf_sub_ui}
benefit from an in-place operation like @code{mpz_add_ui(x,x,y)}, since
usually only one or two limbs of @code{x} will need to be changed. The same
applies to the full precision @code{mpz_add} etc if @code{y} is small. If
@code{y} is big then cache locality may be helped, but that's all.

@code{mpz_mul} is currently the opposite, a separate destination is slightly
better. A call like @code{mpz_mul(x,x,y)} will, unless @code{y} is only one
limb, make a temporary copy of @code{x} before forming the result. Normally
that copying will only be a tiny fraction of the time for the multiply, so
this is not a particularly important consideration.
2442 2443@code{mpz_set}, @code{mpq_set}, @code{mpq_set_num}, @code{mpf_set}, etc, make 2444no attempt to recognise a copy of something to itself, so a call like 2445@code{mpz_set(x,x)} will be wasteful. Naturally that would never be written 2446deliberately, but if it might arise from two pointers to the same object then 2447a test to avoid it might be desirable. 2448 2449@example 2450if (x != y) 2451 mpz_set (x, y); 2452@end example 2453 2454Note that it's never worth introducing extra @code{mpz_set} calls just to get 2455in-place operations. If a result should go to a particular variable then just 2456direct it there and let GMP take care of data movement. 2457 2458@item Divisibility Testing (Small Integers) 2459@cindex Divisibility testing 2460@code{mpz_divisible_ui_p} and @code{mpz_congruent_ui_p} are the best functions 2461for testing whether an @code{mpz_t} is divisible by an individual small 2462integer. They use an algorithm which is faster than @code{mpz_tdiv_ui}, but 2463which gives no useful information about the actual remainder, only whether 2464it's zero (or a particular value). 2465 2466However when testing divisibility by several small integers, it's best to take 2467a remainder modulo their product, to save multi-precision operations. For 2468instance to test whether a number is divisible by any of 23, 29 or 31 take a 2469remainder modulo @math{23@times{}29@times{}31 = 20677} and then test that. 2470 2471The division functions like @code{mpz_tdiv_q_ui} which give a quotient as well 2472as a remainder are generally a little slower than the remainder-only functions 2473like @code{mpz_tdiv_ui}. If the quotient is only rarely wanted then it's 2474probably best to just take a remainder and then go back and calculate the 2475quotient if and when it's wanted (@code{mpz_divexact_ui} can be used if the 2476remainder is zero). 
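
As a minimal sketch of the product-remainder idea for 23, 29 and 31 described above: the hypothetical helper below (not part of GMP) takes the single-word remainder @code{r} of a number modulo 20677, which an application might obtain with one call such as @code{mpz_tdiv_ui (n, 20677UL)}, and tests the three primes using only word-sized arithmetic.

```c
#include <assert.h>

/* Hypothetical helper, not part of GMP.  r is assumed to be the remainder of
   some number n modulo 23*29*31 = 20677, for instance the return value of
   mpz_tdiv_ui (n, 20677UL).  Returns non-zero if n is divisible by at least
   one of 23, 29 or 31. */
int
any_small_factor (unsigned long r)
{
  assert (r < 20677UL);   /* r must be a remainder modulo 20677 */
  return r % 23 == 0 || r % 29 == 0 || r % 31 == 0;
}
```

Since each of the three primes divides 20677, reducing modulo 20677 first doesn't change divisibility by any of them, so only one multi-precision operation is needed.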

@item Rational Arithmetic
@cindex Rational arithmetic
The @code{mpq} functions operate on @code{mpq_t} values with no common factors
in the numerator and denominator. Common factors are checked-for and cast out
as necessary. In general, cancelling factors every time is the best approach
since it minimizes the sizes for subsequent operations.

However, applications that know something about the factorization of the
values they're working with might be able to avoid some of the GCDs used for
canonicalization, or swap them for divisions. For example when multiplying by
a prime it's enough to check for factors of it in the denominator instead of
doing a full GCD@. Or when forming a big product it might be known that very
little cancellation will be possible, and so canonicalization can be left to
the end.

The @code{mpq_numref} and @code{mpq_denref} macros give access to the
numerator and denominator to do things outside the scope of the supplied
@code{mpq} functions. @xref{Applying Integer Functions}.

The canonical form for rationals allows mixed-type @code{mpq_t} and integer
additions or subtractions to be done directly with multiples of the
denominator. This will be somewhat faster than @code{mpq_add}. For example,

@example
/* mpq increment */
mpz_add (mpq_numref(q), mpq_numref(q), mpq_denref(q));

/* mpq += unsigned long */
mpz_addmul_ui (mpq_numref(q), mpq_denref(q), 123UL);

/* mpq -= mpz */
mpz_submul (mpq_numref(q), mpq_denref(q), z);
@end example

@item Number Sequences
@cindex Number sequences
Functions like @code{mpz_fac_ui}, @code{mpz_fib_ui} and @code{mpz_bin_uiui}
are designed for calculating isolated values. If a range of values is wanted
it's probably best to call one of them to get a starting point and iterate
from there.
2517 2518@item Text Input/Output 2519@cindex Text input/output 2520Hexadecimal or octal are suggested for input or output in text form. 2521Power-of-2 bases like these can be converted much more efficiently than other 2522bases, like decimal. For big numbers there's usually nothing of particular 2523interest to be seen in the digits, so the base doesn't matter much. 2524 2525Maybe we can hope octal will one day become the normal base for everyday use, 2526as proposed by King Charles XII of Sweden and later reformers. 2527@c Reference: Knuth volume 2 section 4.1, page 184 of second edition. :-) 2528@end table 2529 2530 2531@node Debugging, Profiling, Efficiency, GMP Basics 2532@section Debugging 2533@cindex Debugging 2534 2535@table @asis 2536@item Stack Overflow 2537@cindex Stack overflow 2538@cindex Segmentation violation 2539@cindex Bus error 2540Depending on the system, a segmentation violation or bus error might be the 2541only indication of stack overflow. See @samp{--enable-alloca} choices in 2542@ref{Build Options}, for how to address this. 2543 2544In new enough versions of GCC, @samp{-fstack-check} may be able to ensure an 2545overflow is recognised by the system before too much damage is done, or 2546@samp{-fstack-limit-symbol} or @samp{-fstack-limit-register} may be able to 2547add checking if the system itself doesn't do any (@pxref{Code Gen Options,, 2548Options for Code Generation, gcc, Using the GNU Compiler Collection (GCC)}). 2549These options must be added to the @samp{CFLAGS} used in the GMP build 2550(@pxref{Build Options}), adding them just to an application will have no 2551effect. Note also they're a slowdown, adding overhead to each function call 2552and each stack allocation. 2553 2554@item Heap Problems 2555@cindex Heap problems 2556@cindex Malloc problems 2557The most likely cause of application problems with GMP is heap corruption. 
2558Failing to @code{init} GMP variables will have unpredictable effects, and 2559corruption arising elsewhere in a program may well affect GMP@. Initializing 2560GMP variables more than once or failing to clear them will cause memory leaks. 2561 2562@cindex Malloc debugger 2563In all such cases a @code{malloc} debugger is recommended. On a GNU or BSD 2564system the standard C library @code{malloc} has some diagnostic facilities, 2565see @ref{Allocation Debugging,, Allocation Debugging, libc, The GNU C Library 2566Reference Manual}, or @samp{man 3 malloc}. Other possibilities, in no 2567particular order, include 2568 2569@display 2570@uref{http://www.inf.ethz.ch/personal/biere/projects/ccmalloc/} 2571@uref{http://dmalloc.com/} 2572@uref{http://www.perens.com/FreeSoftware/} @ (electric fence) 2573@uref{http://packages.debian.org/stable/devel/fda} 2574@uref{http://www.gnupdate.org/components/leakbug/} 2575@uref{http://people.redhat.com/~otaylor/memprof/} 2576@uref{http://www.cbmamiga.demon.co.uk/mpatrol/} 2577@end display 2578 2579The GMP default allocation routines in @file{memory.c} also have a simple 2580sentinel scheme which can be enabled with @code{#define DEBUG} in that file. 2581This is mainly designed for detecting buffer overruns during GMP development, 2582but might find other uses. 2583 2584@item Stack Backtraces 2585@cindex Stack backtrace 2586On some systems the compiler options GMP uses by default can interfere with 2587debugging. In particular on x86 and 68k systems @samp{-fomit-frame-pointer} 2588is used and this generally inhibits stack backtracing. Recompiling without 2589such options may help while debugging, though the usual caveats about it 2590potentially moving a memory problem or hiding a compiler bug will apply. 

@item GDB, the GNU Debugger
@cindex GDB
@cindex GNU Debugger
A sample @file{.gdbinit} is included in the distribution, showing how to call
some undocumented dump functions to print GMP variables from within GDB@. Note
that these functions shouldn't be used in final application code since they're
undocumented and may be subject to incompatible changes in future versions of
GMP.

@item Source File Paths
GMP has multiple source files with the same name, in different directories.
For example @file{mpz}, @file{mpq} and @file{mpf} each have an
@file{init.c}. If the debugger can't already determine the right one it may
help to build with absolute paths on each C file. One way to do that is to
use a separate object directory with an absolute path to the source directory.

@example
cd /my/build/dir
/my/source/dir/gmp-@value{VERSION}/configure
@end example

This works via @code{VPATH}, and might require GNU @command{make}.
Alternatively it might be possible to change the @code{.c.lo} rules
appropriately.

@item Assertion Checking
@cindex Assertion checking
The build option @option{--enable-assert} is available to add some consistency
checks to the library (see @ref{Build Options}). These are likely to be of
limited value to most applications. Assertion failures are just as likely to
indicate memory corruption as a library or compiler bug.

Applications using the low-level @code{mpn} functions, however, will benefit
from @option{--enable-assert} since it adds checks on the parameters of most
such functions, many of which have subtle restrictions on their usage. Note
however that only the generic C code has checks, not the assembly code, so
@option{--disable-assembly} should be used for maximum checking.
2629 2630@item Temporary Memory Checking 2631The build option @option{--enable-alloca=debug} arranges that each block of 2632temporary memory in GMP is allocated with a separate call to @code{malloc} (or 2633the allocation function set with @code{mp_set_memory_functions}). 2634 2635This can help a malloc debugger detect accesses outside the intended bounds, 2636or detect memory not released. In a normal build, on the other hand, 2637temporary memory is allocated in blocks which GMP divides up for its own use, 2638or may be allocated with a compiler builtin @code{alloca} which will go 2639nowhere near any malloc debugger hooks. 2640 2641@item Maximum Debuggability 2642To summarize the above, a GMP build for maximum debuggability would be 2643 2644@example 2645./configure --disable-shared --enable-assert \ 2646 --enable-alloca=debug --disable-assembly CFLAGS=-g 2647@end example 2648 2649For C++, add @samp{--enable-cxx CXXFLAGS=-g}. 2650 2651@item Checker 2652@cindex Checker 2653@cindex GCC Checker 2654The GCC checker (@uref{https://savannah.nongnu.org/projects/checker/}) can be 2655used with GMP@. It contains a stub library which means GMP applications 2656compiled with checker can use a normal GMP build. 2657 2658A build of GMP with checking within GMP itself can be made. This will run 2659very very slowly. On GNU/Linux for example, 2660 2661@cindex @command{checkergcc} 2662@example 2663./configure --disable-assembly CC=checkergcc 2664@end example 2665 2666@option{--disable-assembly} must be used, since the GMP assembly code doesn't 2667support the checking scheme. The GMP C++ features cannot be used, since 2668current versions of checker (0.9.9.1) don't yet support the standard C++ 2669library. 2670 2671@item Valgrind 2672@cindex Valgrind 2673Valgrind (@uref{http://valgrind.org/}) is a memory checker for x86, ARM, MIPS, 2674PowerPC, and S/390. 
It translates and emulates machine instructions to do 2675strong checks for uninitialized data (at the level of individual bits), memory 2676accesses through bad pointers, and memory leaks. 2677 2678Valgrind does not always support every possible instruction, in particular 2679ones recently added to an ISA. Valgrind might therefore be incompatible with 2680a recent GMP or even a less recent GMP which is compiled using a recent GCC. 2681 2682GMP's assembly code sometimes promotes a read of the limbs to some larger size, 2683for efficiency. GMP will do this even at the start and end of a multilimb 2684operand, using naturally aligned operations on the larger type. This may lead 2685to benign reads outside of allocated areas, triggering complaints from 2686Valgrind. Valgrind's option @samp{--partial-loads-ok=yes} should help. 2687 2688@item Other Problems 2689Any suspected bug in GMP itself should be isolated to make sure it's not an 2690application problem, see @ref{Reporting Bugs}. 2691@end table 2692 2693 2694@node Profiling, Autoconf, Debugging, GMP Basics 2695@section Profiling 2696@cindex Profiling 2697@cindex Execution profiling 2698@cindex @code{--enable-profiling} 2699 2700Running a program under a profiler is a good way to find where it's spending 2701most time and where improvements can be best sought. The profiling choices 2702for a GMP build are as follows. 2703 2704@table @asis 2705@item @samp{--disable-profiling} 2706The default is to add nothing special for profiling. 2707 2708It should be possible to just compile the mainline of a program with @code{-p} 2709and use @command{prof} to get a profile consisting of timer-based sampling of 2710the program counter. Most of the GMP assembly code has the necessary symbol 2711information. 
2712 2713This approach has the advantage of minimizing interference with normal program 2714operation, but on most systems the resolution of the sampling is quite low (10 2715milliseconds for instance), requiring long runs to get accurate information. 2716 2717@item @samp{--enable-profiling=prof} 2718@cindex @code{prof} 2719Build with support for the system @command{prof}, which means @samp{-p} added 2720to the @samp{CFLAGS}. 2721 2722This provides call counting in addition to program counter sampling, which 2723allows the most frequently called routines to be identified, and an average 2724time spent in each routine to be determined. 2725 2726The x86 assembly code has support for this option, but on other processors 2727the assembly routines will be as if compiled without @samp{-p} and therefore 2728won't appear in the call counts. 2729 2730On some systems, such as GNU/Linux, @samp{-p} in fact means @samp{-pg} and in 2731this case @samp{--enable-profiling=gprof} described below should be used 2732instead. 2733 2734@item @samp{--enable-profiling=gprof} 2735@cindex @code{gprof} 2736Build with support for @command{gprof}, which means @samp{-pg} added to the 2737@samp{CFLAGS}. 2738 2739This provides call graph construction in addition to call counting and program 2740counter sampling, which makes it possible to count calls coming from different 2741locations. For example the number of calls to @code{mpn_mul} from 2742@code{mpz_mul} versus the number from @code{mpf_mul}. The program counter 2743sampling is still flat though, so only a total time in @code{mpn_mul} would be 2744accumulated, not a separate amount for each call site. 2745 2746The x86 assembly code has support for this option, but on other processors 2747the assembly routines will be as if compiled without @samp{-pg} and therefore 2748not be included in the call counts. 
2749 2750On x86 and m68k systems @samp{-pg} and @samp{-fomit-frame-pointer} are 2751incompatible, so the latter is omitted from the default flags in that case, 2752which might result in poorer code generation. 2753 2754Incidentally, it should be possible to use the @command{gprof} program with a 2755plain @samp{--enable-profiling=prof} build. But in that case only the 2756@samp{gprof -p} flat profile and call counts can be expected to be valid, not 2757the @samp{gprof -q} call graph. 2758 2759@item @samp{--enable-profiling=instrument} 2760@cindex @code{-finstrument-functions} 2761@cindex @code{instrument-functions} 2762Build with the GCC option @samp{-finstrument-functions} added to the 2763@samp{CFLAGS} (@pxref{Code Gen Options,, Options for Code Generation, gcc, 2764Using the GNU Compiler Collection (GCC)}). 2765 2766This inserts special instrumenting calls at the start and end of each 2767function, allowing exact timing and full call graph construction. 2768 2769This instrumenting is not normally a standard system feature and will require 2770support from an external library, such as 2771 2772@cindex FunctionCheck 2773@cindex fnccheck 2774@display 2775@uref{http://sourceforge.net/projects/fnccheck/} 2776@end display 2777 2778This should be included in @samp{LIBS} during the GMP configure so that test 2779programs will link. For example, 2780 2781@example 2782./configure --enable-profiling=instrument LIBS=-lfc 2783@end example 2784 2785On a GNU system the C library provides dummy instrumenting functions, so 2786programs compiled with this option will link. In this case it's only 2787necessary to ensure the correct library is added when linking an application. 2788 2789The x86 assembly code supports this option, but on other processors the 2790assembly routines will be as if compiled without 2791@samp{-finstrument-functions} meaning time spent in them will effectively be 2792attributed to their caller. 
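
As a rough illustration of what such an instrumenting library has to supply,
GCC's @samp{-finstrument-functions} emits calls to two hooks,
@code{__cyg_profile_func_enter} and @code{__cyg_profile_func_exit}. The
following is only a minimal sketch and not how FunctionCheck itself works; it
merely counts hook calls, and the names @code{enter_count}, @code{exit_count}
and @code{hook_balance} are hypothetical.

```c
/* Hypothetical counters, for illustration only. */
static unsigned long enter_count = 0;
static unsigned long exit_count = 0;

/* GCC calls this hook on entry to every function compiled with
   -finstrument-functions.  The attribute keeps the hook itself from
   being instrumented, which would otherwise recurse. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
  (void) this_fn;     /* address of the function being entered */
  (void) call_site;   /* address it was called from */
  enter_count++;
}

/* GCC calls this hook just before every instrumented function returns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
  (void) this_fn;
  (void) call_site;
  exit_count++;
}

/* Hypothetical helper: number of entries not yet matched by an exit,
   which is the current depth of instrumented calls. */
__attribute__((no_instrument_function))
long hook_balance(void)
{
  return (long) enter_count - (long) exit_count;
}
```

A real profiler would record @code{this_fn} and a timestamp in each hook
rather than just counting.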
2793@end table 2794 2795 2796@node Autoconf, Emacs, Profiling, GMP Basics 2797@section Autoconf 2798@cindex Autoconf 2799 2800Autoconf based applications can easily check whether GMP is installed. The 2801only thing to be noted is that GMP library symbols from version 3 onwards have 2802prefixes like @code{__gmpz}. The following therefore would be a simple test, 2803 2804@cindex @code{AC_CHECK_LIB} 2805@example 2806AC_CHECK_LIB(gmp, __gmpz_init) 2807@end example 2808 2809This just uses the default @code{AC_CHECK_LIB} actions for found or not found, 2810but an application that must have GMP would want to generate an error if not 2811found. For example, 2812 2813@example 2814AC_CHECK_LIB(gmp, __gmpz_init, , 2815 [AC_MSG_ERROR([GNU MP not found, see https://gmplib.org/])]) 2816@end example 2817 2818If functions added in some particular version of GMP are required, then one of 2819those can be used when checking. For example @code{mpz_mul_si} was added in 2820GMP 3.1, 2821 2822@example 2823AC_CHECK_LIB(gmp, __gmpz_mul_si, , 2824 [AC_MSG_ERROR( 2825 [GNU MP not found, or not 3.1 or up, see https://gmplib.org/])]) 2826@end example 2827 2828An alternative would be to test the version number in @file{gmp.h} using say 2829@code{AC_EGREP_CPP}. That would make it possible to test the exact version, 2830if some particular sub-minor release is known to be necessary. 2831 2832In general it's recommended that applications should simply demand a new 2833enough GMP rather than trying to provide supplements for features not 2834available in past versions. 2835 2836Occasionally an application will need or want to know the size of a type at 2837configuration or preprocessing time, not just with @code{sizeof} in the code. 2838This can be done in the normal way with @code{mp_limb_t} etc, but GMP 4.0 or 2839up is best for this, since prior versions needed certain @samp{-D} defines on 2840systems using a @code{long long} limb. 
The following would suit Autoconf 2.50 2841or up, 2842 2843@example 2844AC_CHECK_SIZEOF(mp_limb_t, , [#include <gmp.h>]) 2845@end example 2846 2847 2848@node Emacs, , Autoconf, GMP Basics 2849@section Emacs 2850@cindex Emacs 2851@cindex @code{info-lookup-symbol} 2852 2853@key{C-h C-i} (@code{info-lookup-symbol}) is a good way to find documentation 2854on C functions while editing (@pxref{Info Lookup, , Info Documentation Lookup, 2855emacs, The Emacs Editor}). 2856 2857The GMP manual can be included in such lookups by putting the following in 2858your @file{.emacs}, 2859 2860@c This isn't pretty, but there doesn't seem to be a better way (in emacs 2861@c 21.2 at least). info-lookup->mode-value could be used for the "assoc"s, 2862@c but that function isn't documented, whereas info-lookup-alist is. 2863@c 2864@example 2865(eval-after-load "info-look" 2866 '(let ((mode-value (assoc 'c-mode (assoc 'symbol info-lookup-alist)))) 2867 (setcar (nthcdr 3 mode-value) 2868 (cons '("(gmp)Function Index" nil "^ -.* " "\\>") 2869 (nth 3 mode-value))))) 2870@end example 2871 2872 2873@node Reporting Bugs, Integer Functions, GMP Basics, Top 2874@comment node-name, next, previous, up 2875@chapter Reporting Bugs 2876@cindex Reporting bugs 2877@cindex Bug reporting 2878 2879If you think you have found a bug in the GMP library, please investigate it 2880and report it. We have made this library available to you, and it is not too 2881much to ask you to report the bugs you find. 2882 2883Before you report a bug, check it's not already addressed in @ref{Known Build 2884Problems}, or perhaps @ref{Notes for Particular Systems}. You may also want 2885to check @uref{https://gmplib.org/} for patches for this release. 2886 2887Please include the following in any report, 2888 2889@itemize @bullet 2890@item 2891The GMP version number, and if pre-packaged or patched then say so. 2892 2893@item 2894A test program that makes it possible for us to reproduce the bug. 
Include 2895instructions on how to run the program. 2896 2897@item 2898A description of what is wrong. If the results are incorrect, in what way. 2899If you get a crash, say so. 2900 2901@item 2902If you get a crash, include a stack backtrace from the debugger if it's 2903informative (@samp{where} in @command{gdb}, or @samp{$C} in @command{adb}). 2904 2905@item 2906Please do not send core dumps, executables or @command{strace}s. 2907 2908@item 2909The @samp{configure} options you used when building GMP, if any. 2910 2911@item 2912The output from @samp{configure}, as printed to stdout, with any options used. 2913 2914@item 2915The name of the compiler and its version. For @command{gcc}, get the version 2916with @samp{gcc -v}, otherwise perhaps @samp{what `which cc`}, or similar. 2917 2918@item 2919The output from running @samp{uname -a}. 2920 2921@item 2922The output from running @samp{./config.guess}, and from running 2923@samp{./configfsf.guess} (might be the same). 2924 2925@item 2926If the bug is related to @samp{configure}, then the compressed contents of 2927@file{config.log}. 2928 2929@item 2930If the bug is related to an @file{asm} file not assembling, then the contents 2931of @file{config.m4} and the offending line or lines from the temporary 2932@file{mpn/tmp-<file>.s}. 2933@end itemize 2934 2935Please make an effort to produce a self-contained report, with something 2936definite that can be tested or debugged. Vague queries or piecemeal messages 2937are difficult to act on and don't help the development effort. 2938 2939It is not uncommon that an observed problem is actually due to a bug in the 2940compiler; the GMP code tends to explore interesting corners in compilers. 2941 2942If your bug report is good, we will do our best to help you get a corrected 2943version of the library; if the bug report is poor, we won't do anything about 2944it (except maybe ask you to send a better report). 2945 2946Send your report to: @email{gmp-bugs@@gmplib.org}. 

If you think something in this manual is unclear, or downright incorrect, or if
the language needs to be improved, please send a note to the same address.


@node Integer Functions, Rational Number Functions, Reporting Bugs, Top
@comment node-name, next, previous, up
@chapter Integer Functions
@cindex Integer functions

This chapter describes the GMP functions for performing integer arithmetic.
These functions start with the prefix @code{mpz_}.

GMP integers are stored in objects of type @code{mpz_t}.

@menu
* Initializing Integers::
* Assigning Integers::
* Simultaneous Integer Init & Assign::
* Converting Integers::
* Integer Arithmetic::
* Integer Division::
* Integer Exponentiation::
* Integer Roots::
* Number Theoretic Functions::
* Integer Comparisons::
* Integer Logic and Bit Fiddling::
* I/O of Integers::
* Integer Random Numbers::
* Integer Import and Export::
* Miscellaneous Integer Functions::
* Integer Special Functions::
@end menu

@node Initializing Integers, Assigning Integers, Integer Functions, Integer Functions
@comment node-name, next, previous, up
@section Initialization Functions
@cindex Integer initialization functions
@cindex Initialization functions

The functions for integer arithmetic assume that all integer objects are
initialized. You do that by calling the function @code{mpz_init}. For
example,

@example
@{
  mpz_t integ;
  mpz_init (integ);
  @dots{}
  mpz_add (integ, @dots{});
  @dots{}
  mpz_sub (integ, @dots{});

  /* Unless the program is about to exit, do ... */
  mpz_clear (integ);
@}
@end example

As you can see, you can store new values any number of times, once an
object is initialized.

@deftypefun void mpz_init (mpz_t @var{x})
Initialize @var{x}, and set its value to 0.
3010@end deftypefun 3011 3012@deftypefun void mpz_inits (mpz_t @var{x}, ...) 3013Initialize a NULL-terminated list of @code{mpz_t} variables, and set their 3014values to 0. 3015@end deftypefun 3016 3017@deftypefun void mpz_init2 (mpz_t @var{x}, mp_bitcnt_t @var{n}) 3018Initialize @var{x}, with space for @var{n}-bit numbers, and set its value to 0. 3019Calling this function instead of @code{mpz_init} or @code{mpz_inits} is never 3020necessary; reallocation is handled automatically by GMP when needed. 3021 3022While @var{n} defines the initial space, @var{x} will grow automatically in the 3023normal way, if necessary, for subsequent values stored. @code{mpz_init2} makes 3024it possible to avoid such reallocations if a maximum size is known in advance. 3025 3026In preparation for an operation, GMP often allocates one limb more than 3027ultimately needed. To make sure GMP will not perform reallocation for 3028@var{x}, you need to add the number of bits in @code{mp_limb_t} to @var{n}. 3029@end deftypefun 3030 3031@deftypefun void mpz_clear (mpz_t @var{x}) 3032Free the space occupied by @var{x}. Call this function for all @code{mpz_t} 3033variables when you are done with them. 3034@end deftypefun 3035 3036@deftypefun void mpz_clears (mpz_t @var{x}, ...) 3037Free the space occupied by a NULL-terminated list of @code{mpz_t} variables. 3038@end deftypefun 3039 3040@deftypefun void mpz_realloc2 (mpz_t @var{x}, mp_bitcnt_t @var{n}) 3041Change the space allocated for @var{x} to @var{n} bits. The value in @var{x} 3042is preserved if it fits, or is set to 0 if not. 3043 3044Calling this function is never necessary; reallocation is handled automatically 3045by GMP when needed. But this function can be used to increase the space for a 3046variable in order to avoid repeated automatic reallocations, or to decrease it 3047to give memory back to the heap. 
@end deftypefun


@node Assigning Integers, Simultaneous Integer Init & Assign, Initializing Integers, Integer Functions
@comment node-name, next, previous, up
@section Assignment Functions
@cindex Integer assignment functions
@cindex Assignment functions

These functions assign new values to already initialized integers
(@pxref{Initializing Integers}).

@deftypefun void mpz_set (mpz_t @var{rop}, const mpz_t @var{op})
@deftypefunx void mpz_set_ui (mpz_t @var{rop}, unsigned long int @var{op})
@deftypefunx void mpz_set_si (mpz_t @var{rop}, signed long int @var{op})
@deftypefunx void mpz_set_d (mpz_t @var{rop}, double @var{op})
@deftypefunx void mpz_set_q (mpz_t @var{rop}, const mpq_t @var{op})
@deftypefunx void mpz_set_f (mpz_t @var{rop}, const mpf_t @var{op})
Set the value of @var{rop} from @var{op}.

@code{mpz_set_d}, @code{mpz_set_q} and @code{mpz_set_f} truncate @var{op} to
make it an integer.
@end deftypefun

@deftypefun int mpz_set_str (mpz_t @var{rop}, const char *@var{str}, int @var{base})
Set the value of @var{rop} from @var{str}, a null-terminated C string in base
@var{base}. White space is allowed in the string, and is simply ignored.

The @var{base} may vary from 2 to 62, or if @var{base} is 0, then the leading
characters are used: @code{0x} and @code{0X} for hexadecimal, @code{0b} and
@code{0B} for binary, @code{0} for octal, or decimal otherwise.

For bases up to 36, case is ignored; upper-case and lower-case letters have
the same value. For bases 37 to 62, upper-case letters represent the usual
10..35 while lower-case letters represent 36..61.

This function returns 0 if the entire string is a valid number in base
@var{base}. Otherwise it returns @minus{}1.
@c
@c It turns out that it is not entirely true that this function ignores
@c white-space.
@c It does ignore it between digits, but not after a minus sign
@c or within or after ``0x''. Some thought was given to disallowing all
@c whitespace, but that would be an incompatible change, whitespace has been
@c documented as ignored ever since GMP 1.
@c
@end deftypefun

@deftypefun void mpz_swap (mpz_t @var{rop1}, mpz_t @var{rop2})
Swap the values @var{rop1} and @var{rop2} efficiently.
@end deftypefun


@node Simultaneous Integer Init & Assign, Converting Integers, Assigning Integers, Integer Functions
@comment node-name, next, previous, up
@section Combined Initialization and Assignment Functions
@cindex Integer assignment functions
@cindex Assignment functions
@cindex Integer initialization functions
@cindex Initialization functions

For convenience, GMP provides a parallel series of initialize-and-set functions
which initialize the output and then store the value there. These functions'
names have the form @code{mpz_init_set@dots{}}

Here is an example of using one:

@example
@{
  mpz_t pie;
  mpz_init_set_str (pie, "3141592653589793238462643383279502884", 10);
  @dots{}
  mpz_sub (pie, @dots{});
  @dots{}
  mpz_clear (pie);
@}
@end example

@noindent
Once the integer has been initialized by any of the @code{mpz_init_set@dots{}}
functions, it can be used as the source or destination operand for the ordinary
integer functions. Don't use an initialize-and-set function on a variable
already initialized!

@deftypefun void mpz_init_set (mpz_t @var{rop}, const mpz_t @var{op})
@deftypefunx void mpz_init_set_ui (mpz_t @var{rop}, unsigned long int @var{op})
@deftypefunx void mpz_init_set_si (mpz_t @var{rop}, signed long int @var{op})
@deftypefunx void mpz_init_set_d (mpz_t @var{rop}, double @var{op})
Initialize @var{rop} with limb space and set the initial numeric value from
@var{op}.
3137@end deftypefun 3138 3139@deftypefun int mpz_init_set_str (mpz_t @var{rop}, const char *@var{str}, int @var{base}) 3140Initialize @var{rop} and set its value like @code{mpz_set_str} (see its 3141documentation above for details). 3142 3143If the string is a correct base @var{base} number, the function returns 0; 3144if an error occurs it returns @minus{}1. @var{rop} is initialized even if 3145an error occurs. (I.e., you have to call @code{mpz_clear} for it.) 3146@end deftypefun 3147 3148 3149@node Converting Integers, Integer Arithmetic, Simultaneous Integer Init & Assign, Integer Functions 3150@comment node-name, next, previous, up 3151@section Conversion Functions 3152@cindex Integer conversion functions 3153@cindex Conversion functions 3154 3155This section describes functions for converting GMP integers to standard C 3156types. Functions for converting @emph{to} GMP integers are described in 3157@ref{Assigning Integers} and @ref{I/O of Integers}. 3158 3159@deftypefun {unsigned long int} mpz_get_ui (const mpz_t @var{op}) 3160Return the value of @var{op} as an @code{unsigned long}. 3161 3162If @var{op} is too big to fit an @code{unsigned long} then just the least 3163significant bits that do fit are returned. The sign of @var{op} is ignored, 3164only the absolute value is used. 3165@end deftypefun 3166 3167@deftypefun {signed long int} mpz_get_si (const mpz_t @var{op}) 3168If @var{op} fits into a @code{signed long int} return the value of @var{op}. 3169Otherwise return the least significant part of @var{op}, with the same sign 3170as @var{op}. 3171 3172If @var{op} is too big to fit in a @code{signed long int}, the returned 3173result is probably not very useful. To find out if the value will fit, use 3174the function @code{mpz_fits_slong_p}. 3175@end deftypefun 3176 3177@deftypefun double mpz_get_d (const mpz_t @var{op}) 3178Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding 3179towards zero). 
3180 3181If the exponent from the conversion is too big, the result is system 3182dependent. An infinity is returned where available. A hardware overflow trap 3183may or may not occur. 3184@end deftypefun 3185 3186@deftypefun double mpz_get_d_2exp (signed long int *@var{exp}, const mpz_t @var{op}) 3187Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding 3188towards zero), and returning the exponent separately. 3189 3190The return value is in the range @math{0.5@le{}@GMPabs{@var{d}}<1} and the 3191exponent is stored to @code{*@var{exp}}. @m{@var{d} * 2^{exp}, @var{d} * 31922^@var{exp}} is the (truncated) @var{op} value. If @var{op} is zero, the 3193return is @math{0.0} and 0 is stored to @code{*@var{exp}}. 3194 3195@cindex @code{frexp} 3196This is similar to the standard C @code{frexp} function (@pxref{Normalization 3197Functions,,, libc, The GNU C Library Reference Manual}). 3198@end deftypefun 3199 3200@deftypefun {char *} mpz_get_str (char *@var{str}, int @var{base}, const mpz_t @var{op}) 3201Convert @var{op} to a string of digits in base @var{base}. The base argument 3202may vary from 2 to 62 or from @minus{}2 to @minus{}36. 3203 3204For @var{base} in the range 2..36, digits and lower-case letters are used; for 3205@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62, 3206digits, upper-case letters, and lower-case letters (in that significance order) 3207are used. 3208 3209If @var{str} is @code{NULL}, the result string is allocated using the current 3210allocation function (@pxref{Custom Allocation}). The block will be 3211@code{strlen(str)+1} bytes, that being exactly enough for the string and 3212null-terminator. 3213 3214If @var{str} is not @code{NULL}, it should point to a block of storage large 3215enough for the result, that being @code{mpz_sizeinbase (@var{op}, @var{base}) 3216+ 2}. The two extra bytes are for a possible minus sign, and the 3217null-terminator. 
3218 3219A pointer to the result string is returned, being either the allocated block, 3220or the given @var{str}. 3221@end deftypefun 3222 3223 3224@need 2000 3225@node Integer Arithmetic, Integer Division, Converting Integers, Integer Functions 3226@comment node-name, next, previous, up 3227@section Arithmetic Functions 3228@cindex Integer arithmetic functions 3229@cindex Arithmetic functions 3230 3231@deftypefun void mpz_add (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3232@deftypefunx void mpz_add_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2}) 3233Set @var{rop} to @math{@var{op1} + @var{op2}}. 3234@end deftypefun 3235 3236@deftypefun void mpz_sub (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3237@deftypefunx void mpz_sub_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2}) 3238@deftypefunx void mpz_ui_sub (mpz_t @var{rop}, unsigned long int @var{op1}, const mpz_t @var{op2}) 3239Set @var{rop} to @var{op1} @minus{} @var{op2}. 3240@end deftypefun 3241 3242@deftypefun void mpz_mul (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3243@deftypefunx void mpz_mul_si (mpz_t @var{rop}, const mpz_t @var{op1}, long int @var{op2}) 3244@deftypefunx void mpz_mul_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2}) 3245Set @var{rop} to @math{@var{op1} @GMPtimes{} @var{op2}}. 3246@end deftypefun 3247 3248@deftypefun void mpz_addmul (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3249@deftypefunx void mpz_addmul_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2}) 3250Set @var{rop} to @math{@var{rop} + @var{op1} @GMPtimes{} @var{op2}}. 
3251@end deftypefun 3252 3253@deftypefun void mpz_submul (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3254@deftypefunx void mpz_submul_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2}) 3255Set @var{rop} to @math{@var{rop} - @var{op1} @GMPtimes{} @var{op2}}. 3256@end deftypefun 3257 3258@deftypefun void mpz_mul_2exp (mpz_t @var{rop}, const mpz_t @var{op1}, mp_bitcnt_t @var{op2}) 3259@cindex Bit shift left 3260Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to 3261@var{op2}}. This operation can also be defined as a left shift by @var{op2} 3262bits. 3263@end deftypefun 3264 3265@deftypefun void mpz_neg (mpz_t @var{rop}, const mpz_t @var{op}) 3266Set @var{rop} to @minus{}@var{op}. 3267@end deftypefun 3268 3269@deftypefun void mpz_abs (mpz_t @var{rop}, const mpz_t @var{op}) 3270Set @var{rop} to the absolute value of @var{op}. 3271@end deftypefun 3272 3273 3274@need 2000 3275@node Integer Division, Integer Exponentiation, Integer Arithmetic, Integer Functions 3276@section Division Functions 3277@cindex Integer division functions 3278@cindex Division functions 3279 3280Division is undefined if the divisor is zero. Passing a zero divisor to the 3281division or modulo functions (including the modular powering functions 3282@code{mpz_powm} and @code{mpz_powm_ui}), will cause an intentional division by 3283zero. This lets a program handle arithmetic exceptions in these functions the 3284same way as for normal C @code{int} arithmetic. 3285 3286@c Separate deftypefun groups for cdiv, fdiv and tdiv produce a blank line 3287@c between each, and seem to let tex do a better job of page breaks than an 3288@c @sp 1 in the middle of one big set. 
3289 3290@deftypefun void mpz_cdiv_q (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d}) 3291@deftypefunx void mpz_cdiv_r (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d}) 3292@deftypefunx void mpz_cdiv_qr (mpz_t @var{q}, mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d}) 3293@maybepagebreak 3294@deftypefunx {unsigned long int} mpz_cdiv_q_ui (mpz_t @var{q}, const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3295@deftypefunx {unsigned long int} mpz_cdiv_r_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3296@deftypefunx {unsigned long int} mpz_cdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{const mpz_t @var{n}}, @w{unsigned long int @var{d}}) 3297@deftypefunx {unsigned long int} mpz_cdiv_ui (const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3298@maybepagebreak 3299@deftypefunx void mpz_cdiv_q_2exp (mpz_t @var{q}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}}) 3300@deftypefunx void mpz_cdiv_r_2exp (mpz_t @var{r}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}}) 3301@end deftypefun 3302 3303@deftypefun void mpz_fdiv_q (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d}) 3304@deftypefunx void mpz_fdiv_r (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d}) 3305@deftypefunx void mpz_fdiv_qr (mpz_t @var{q}, mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d}) 3306@maybepagebreak 3307@deftypefunx {unsigned long int} mpz_fdiv_q_ui (mpz_t @var{q}, const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3308@deftypefunx {unsigned long int} mpz_fdiv_r_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3309@deftypefunx {unsigned long int} mpz_fdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{const mpz_t @var{n}}, @w{unsigned long int @var{d}}) 3310@deftypefunx {unsigned long int} mpz_fdiv_ui (const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3311@maybepagebreak 3312@deftypefunx void mpz_fdiv_q_2exp (mpz_t @var{q}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}}) 3313@deftypefunx void mpz_fdiv_r_2exp (mpz_t @var{r}, 
const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}}) 3314@end deftypefun 3315 3316@deftypefun void mpz_tdiv_q (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d}) 3317@deftypefunx void mpz_tdiv_r (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d}) 3318@deftypefunx void mpz_tdiv_qr (mpz_t @var{q}, mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d}) 3319@maybepagebreak 3320@deftypefunx {unsigned long int} mpz_tdiv_q_ui (mpz_t @var{q}, const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3321@deftypefunx {unsigned long int} mpz_tdiv_r_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3322@deftypefunx {unsigned long int} mpz_tdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{const mpz_t @var{n}}, @w{unsigned long int @var{d}}) 3323@deftypefunx {unsigned long int} mpz_tdiv_ui (const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3324@maybepagebreak 3325@deftypefunx void mpz_tdiv_q_2exp (mpz_t @var{q}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}}) 3326@deftypefunx void mpz_tdiv_r_2exp (mpz_t @var{r}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}}) 3327@cindex Bit shift right 3328 3329@sp 1 3330Divide @var{n} by @var{d}, forming a quotient @var{q} and/or remainder 3331@var{r}. For the @code{2exp} functions, @m{@var{d}=2^b, @var{d}=2^@var{b}}. 3332The rounding is in three styles, each suiting different applications. 3333 3334@itemize @bullet 3335@item 3336@code{cdiv} rounds @var{q} up towards @m{+\infty, +infinity}, and @var{r} will 3337have the opposite sign to @var{d}. The @code{c} stands for ``ceil''. 3338 3339@item 3340@code{fdiv} rounds @var{q} down towards @m{-\infty, @minus{}infinity}, and 3341@var{r} will have the same sign as @var{d}. The @code{f} stands for 3342``floor''. 3343 3344@item 3345@code{tdiv} rounds @var{q} towards zero, and @var{r} will have the same sign 3346as @var{n}. The @code{t} stands for ``truncate''. 
3347@end itemize 3348 3349In all cases @var{q} and @var{r} will satisfy 3350@m{@var{n}=@var{q}@var{d}+@var{r}, @var{n}=@var{q}*@var{d}+@var{r}}, and 3351@var{r} will satisfy @math{0@le{}@GMPabs{@var{r}}<@GMPabs{@var{d}}}. 3352 3353The @code{q} functions calculate only the quotient, the @code{r} functions 3354only the remainder, and the @code{qr} functions calculate both. Note that for 3355@code{qr} the same variable cannot be passed for both @var{q} and @var{r}, or 3356results will be unpredictable. 3357 3358For the @code{ui} variants the return value is the remainder, and in fact 3359returning the remainder is all the @code{div_ui} functions do. For 3360@code{tdiv} and @code{cdiv} the remainder can be negative, so for those the 3361return value is the absolute value of the remainder. 3362 3363For the @code{2exp} variants the divisor is @m{2^b,2^@var{b}}. These 3364functions are implemented as right shifts and bit masks, but of course they 3365round the same as the other functions. 3366 3367For positive @var{n} both @code{mpz_fdiv_q_2exp} and @code{mpz_tdiv_q_2exp} 3368are simple bitwise right shifts. For negative @var{n}, @code{mpz_fdiv_q_2exp} 3369is effectively an arithmetic right shift treating @var{n} as twos complement 3370the same as the bitwise logical functions do, whereas @code{mpz_tdiv_q_2exp} 3371effectively treats @var{n} as sign and magnitude. 3372@end deftypefun 3373 3374@deftypefun void mpz_mod (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d}) 3375@deftypefunx {unsigned long int} mpz_mod_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}}) 3376Set @var{r} to @var{n} @code{mod} @var{d}. The sign of the divisor is 3377ignored; the result is always non-negative. 3378 3379@code{mpz_mod_ui} is identical to @code{mpz_fdiv_r_ui} above, returning the 3380remainder as well as setting @var{r}. See @code{mpz_fdiv_ui} above if only 3381the return value is wanted. 
3382@end deftypefun 3383 3384@deftypefun void mpz_divexact (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d}) 3385@deftypefunx void mpz_divexact_ui (mpz_t @var{q}, const mpz_t @var{n}, unsigned long @var{d}) 3386@cindex Exact division functions 3387Set @var{q} to @var{n}/@var{d}. These functions produce correct results only 3388when it is known in advance that @var{d} divides @var{n}. 3389 3390These routines are much faster than the other division functions, and are the 3391best choice when exact division is known to occur, for example reducing a 3392rational to lowest terms. 3393@end deftypefun 3394 3395@deftypefun int mpz_divisible_p (const mpz_t @var{n}, const mpz_t @var{d}) 3396@deftypefunx int mpz_divisible_ui_p (const mpz_t @var{n}, unsigned long int @var{d}) 3397@deftypefunx int mpz_divisible_2exp_p (const mpz_t @var{n}, mp_bitcnt_t @var{b}) 3398@cindex Divisibility functions 3399Return non-zero if @var{n} is exactly divisible by @var{d}, or in the case of 3400@code{mpz_divisible_2exp_p} by @m{2^b,2^@var{b}}. 3401 3402@var{n} is divisible by @var{d} if there exists an integer @var{q} satisfying 3403@math{@var{n} = @var{q}@GMPmultiply{}@var{d}}. Unlike the other division 3404functions, @math{@var{d}=0} is accepted and following the rule it can be seen 3405that only 0 is considered divisible by 0. 3406@end deftypefun 3407 3408@deftypefun int mpz_congruent_p (const mpz_t @var{n}, const mpz_t @var{c}, const mpz_t @var{d}) 3409@deftypefunx int mpz_congruent_ui_p (const mpz_t @var{n}, unsigned long int @var{c}, unsigned long int @var{d}) 3410@deftypefunx int mpz_congruent_2exp_p (const mpz_t @var{n}, const mpz_t @var{c}, mp_bitcnt_t @var{b}) 3411@cindex Divisibility functions 3412@cindex Congruence functions 3413Return non-zero if @var{n} is congruent to @var{c} modulo @var{d}, or in the 3414case of @code{mpz_congruent_2exp_p} modulo @m{2^b,2^@var{b}}. 
3415 3416@var{n} is congruent to @var{c} mod @var{d} if there exists an integer @var{q} 3417satisfying @math{@var{n} = @var{c} + @var{q}@GMPmultiply{}@var{d}}. Unlike 3418the other division functions, @math{@var{d}=0} is accepted and following the 3419rule it can be seen that @var{n} and @var{c} are considered congruent mod 0 3420only when exactly equal. 3421@end deftypefun 3422 3423 3424@need 2000 3425@node Integer Exponentiation, Integer Roots, Integer Division, Integer Functions 3426@section Exponentiation Functions 3427@cindex Integer exponentiation functions 3428@cindex Exponentiation functions 3429@cindex Powering functions 3430 3431@deftypefun void mpz_powm (mpz_t @var{rop}, const mpz_t @var{base}, const mpz_t @var{exp}, const mpz_t @var{mod}) 3432@deftypefunx void mpz_powm_ui (mpz_t @var{rop}, const mpz_t @var{base}, unsigned long int @var{exp}, const mpz_t @var{mod}) 3433Set @var{rop} to @m{base^{exp} \bmod mod, (@var{base} raised to @var{exp}) 3434modulo @var{mod}}. 3435 3436Negative @var{exp} is supported if an inverse @math{@var{base}^@W{-1} @bmod 3437@var{mod}} exists (see @code{mpz_invert} in @ref{Number Theoretic Functions}). 3438If an inverse doesn't exist then a divide by zero is raised. 3439@end deftypefun 3440 3441@deftypefun void mpz_powm_sec (mpz_t @var{rop}, const mpz_t @var{base}, const mpz_t @var{exp}, const mpz_t @var{mod}) 3442Set @var{rop} to @m{base^{exp} \bmod @var{mod}, (@var{base} raised to @var{exp}) 3443modulo @var{mod}}. 3444 3445It is required that @math{@var{exp} > 0} and that @var{mod} is odd. 3446 3447This function is designed to take the same time and have the same cache access 3448patterns for any two same-size arguments, assuming that function arguments are 3449placed at the same position and that the machine state is identical upon 3450function entry. This function is intended for cryptographic purposes, where 3451resilience to side-channel attacks is desired. 
3452@end deftypefun 3453 3454@deftypefun void mpz_pow_ui (mpz_t @var{rop}, const mpz_t @var{base}, unsigned long int @var{exp}) 3455@deftypefunx void mpz_ui_pow_ui (mpz_t @var{rop}, unsigned long int @var{base}, unsigned long int @var{exp}) 3456Set @var{rop} to @m{base^{exp}, @var{base} raised to @var{exp}}. The case 3457@math{0^0} yields 1. 3458@end deftypefun 3459 3460 3461@need 2000 3462@node Integer Roots, Number Theoretic Functions, Integer Exponentiation, Integer Functions 3463@section Root Extraction Functions 3464@cindex Integer root functions 3465@cindex Root extraction functions 3466 3467@deftypefun int mpz_root (mpz_t @var{rop}, const mpz_t @var{op}, unsigned long int @var{n}) 3468Set @var{rop} to @m{\lfloor\root n \of {op}\rfloor@C{},} the truncated integer 3469part of the @var{n}th root of @var{op}. Return non-zero if the computation 3470was exact, i.e., if @var{op} is @var{rop} to the @var{n}th power. 3471@end deftypefun 3472 3473@deftypefun void mpz_rootrem (mpz_t @var{root}, mpz_t @var{rem}, const mpz_t @var{u}, unsigned long int @var{n}) 3474Set @var{root} to @m{\lfloor\root n \of {u}\rfloor@C{},} the truncated 3475integer part of the @var{n}th root of @var{u}. Set @var{rem} to the 3476remainder, @m{(@var{u} - @var{root}^n), 3477@var{u}@minus{}@var{root}**@var{n}}. 3478@end deftypefun 3479 3480@deftypefun void mpz_sqrt (mpz_t @var{rop}, const mpz_t @var{op}) 3481Set @var{rop} to @m{\lfloor\sqrt{@var{op}}\rfloor@C{},} the truncated 3482integer part of the square root of @var{op}. 3483@end deftypefun 3484 3485@deftypefun void mpz_sqrtrem (mpz_t @var{rop1}, mpz_t @var{rop2}, const mpz_t @var{op}) 3486Set @var{rop1} to @m{\lfloor\sqrt{@var{op}}\rfloor, the truncated integer part 3487of the square root of @var{op}}, like @code{mpz_sqrt}. Set @var{rop2} to the 3488remainder @m{(@var{op} - @var{rop1}^2), 3489@var{op}@minus{}@var{rop1}*@var{rop1}}, which will be zero if @var{op} is a 3490perfect square. 
3491 3492If @var{rop1} and @var{rop2} are the same variable, the results are 3493undefined. 3494@end deftypefun 3495 3496@deftypefun int mpz_perfect_power_p (const mpz_t @var{op}) 3497@cindex Perfect power functions 3498@cindex Root testing functions 3499Return non-zero if @var{op} is a perfect power, i.e., if there exist integers 3500@m{a,@var{a}} and @m{b,@var{b}}, with @m{b>1, @var{b}>1}, such that 3501@m{@var{op}=a^b, @var{op} equals @var{a} raised to the power @var{b}}. 3502 3503Under this definition both 0 and 1 are considered to be perfect powers. 3504Negative values of @var{op} are accepted, but of course can only be odd 3505perfect powers. 3506@end deftypefun 3507 3508@deftypefun int mpz_perfect_square_p (const mpz_t @var{op}) 3509@cindex Perfect square functions 3510@cindex Root testing functions 3511Return non-zero if @var{op} is a perfect square, i.e., if the square root of 3512@var{op} is an integer. Under this definition both 0 and 1 are considered to 3513be perfect squares. 3514@end deftypefun 3515 3516 3517@need 2000 3518@node Number Theoretic Functions, Integer Comparisons, Integer Roots, Integer Functions 3519@section Number Theoretic Functions 3520@cindex Number theoretic functions 3521 3522@deftypefun int mpz_probab_prime_p (const mpz_t @var{n}, int @var{reps}) 3523@cindex Prime testing functions 3524@cindex Probable prime testing functions 3525Determine whether @var{n} is prime. Return 2 if @var{n} is definitely prime, 3526return 1 if @var{n} is probably prime (without being certain), or return 0 if 3527@var{n} is definitely composite. 3528 3529This function does some trial divisions, then some Miller-Rabin probabilistic 3530primality tests. The argument @var{reps} controls how many such tests are 3531done; a higher value will reduce the chances of a composite being returned as 3532``probably prime''. 25 is a reasonable number; a composite number will then be 3533identified as a prime with a probability of less than @m{2^{-50},2^(-50)}. 

Miller-Rabin and similar tests can be more properly called compositeness
tests. Numbers which fail are known to be composite but those which pass
might be prime or might be composite. Only a few composites pass, hence those
which pass are considered probably prime.
@end deftypefun

@deftypefun void mpz_nextprime (mpz_t @var{rop}, const mpz_t @var{op})
@cindex Next prime function
Set @var{rop} to the next prime greater than @var{op}.

This function uses a probabilistic algorithm to identify primes. For
practical purposes it's adequate, since the chance of a composite passing
will be extremely small.
@end deftypefun

@c mpz_prime_p not implemented as of gmp 3.0.

@c @deftypefun int mpz_prime_p (const mpz_t @var{n})
@c Return non-zero if @var{n} is prime and zero if @var{n} is a non-prime.
@c This function is far slower than @code{mpz_probab_prime_p}, but then it
@c never returns non-zero for composite numbers.

@c (For practical purposes, using @code{mpz_probab_prime_p} is adequate.
@c The likelihood of a programming error or hardware malfunction is orders
@c of magnitudes greater than the likelihood for a composite to pass as a
@c prime, if the @var{reps} argument is in the suggested range.)
@c @end deftypefun

@deftypefun void mpz_gcd (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
@cindex Greatest common divisor functions
@cindex GCD functions
Set @var{rop} to the greatest common divisor of @var{op1} and @var{op2}. The
result is always positive even if one or both input operands are negative,
except when both inputs are zero, in which case this function defines
@math{gcd(0,0) = 0}.
@end deftypefun

@deftypefun {unsigned long int} mpz_gcd_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
Compute the greatest common divisor of @var{op1} and @var{op2}.
If 3573@var{rop} is not @code{NULL}, store the result there. 3574 3575If the result is small enough to fit in an @code{unsigned long int}, it is 3576returned. If the result does not fit, 0 is returned, and the result is equal 3577to the argument @var{op1}. Note that the result will always fit if @var{op2} 3578is non-zero. 3579@end deftypefun 3580 3581@deftypefun void mpz_gcdext (mpz_t @var{g}, mpz_t @var{s}, mpz_t @var{t}, const mpz_t @var{a}, const mpz_t @var{b}) 3582@cindex Extended GCD 3583@cindex GCD extended 3584Set @var{g} to the greatest common divisor of @var{a} and @var{b}, and in 3585addition set @var{s} and @var{t} to coefficients satisfying 3586@math{@var{a}@GMPmultiply{}@var{s} + @var{b}@GMPmultiply{}@var{t} = @var{g}}. 3587The value in @var{g} is always positive, even if one or both of @var{a} and 3588@var{b} are negative (or zero if both inputs are zero). The values in @var{s} 3589and @var{t} are chosen such that normally, @math{@GMPabs{@var{s}} < 3590@GMPabs{@var{b}} / (2 @var{g})} and @math{@GMPabs{@var{t}} < @GMPabs{@var{a}} 3591/ (2 @var{g})}, and these relations define @var{s} and @var{t} uniquely. There 3592are a few exceptional cases: 3593 3594If @math{@GMPabs{@var{a}} = @GMPabs{@var{b}}}, then @math{@var{s} = 0}, 3595@math{@var{t} = sgn(@var{b})}. 3596 3597Otherwise, @math{@var{s} = sgn(@var{a})} if @math{@var{b} = 0} or 3598@math{@GMPabs{@var{b}} = 2 @var{g}}, and @math{@var{t} = sgn(@var{b})} if 3599@math{@var{a} = 0} or @math{@GMPabs{@var{a}} = 2 @var{g}}. 3600 3601In all cases, @math{@var{s} = 0} if and only if @math{@var{g} = 3602@GMPabs{@var{b}}}, i.e., if @var{b} divides @var{a} or @math{@var{a} = @var{b} 3603= 0}. 3604 3605If @var{t} is @code{NULL} then that value is not computed. 
3606@end deftypefun 3607 3608@deftypefun void mpz_lcm (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3609@deftypefunx void mpz_lcm_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long @var{op2}) 3610@cindex Least common multiple functions 3611@cindex LCM functions 3612Set @var{rop} to the least common multiple of @var{op1} and @var{op2}. 3613@var{rop} is always positive, irrespective of the signs of @var{op1} and 3614@var{op2}. @var{rop} will be zero if either @var{op1} or @var{op2} is zero. 3615@end deftypefun 3616 3617@deftypefun int mpz_invert (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3618@cindex Modular inverse functions 3619@cindex Inverse modulo functions 3620Compute the inverse of @var{op1} modulo @var{op2} and put the result in 3621@var{rop}. If the inverse exists, the return value is non-zero and @var{rop} 3622will satisfy @math{0 < @var{rop} < @GMPabs{@var{op2}}}. If an inverse doesn't 3623exist the return value is zero and @var{rop} is undefined. The behaviour of 3624this function is undefined when @var{op2} is zero. 3625@end deftypefun 3626 3627@deftypefun int mpz_jacobi (const mpz_t @var{a}, const mpz_t @var{b}) 3628@cindex Jacobi symbol functions 3629Calculate the Jacobi symbol @m{\left(a \over b\right), 3630(@var{a}/@var{b})}. This is defined only for @var{b} odd. 3631@end deftypefun 3632 3633@deftypefun int mpz_legendre (const mpz_t @var{a}, const mpz_t @var{p}) 3634@cindex Legendre symbol functions 3635Calculate the Legendre symbol @m{\left(a \over p\right), 3636(@var{a}/@var{p})}. This is defined only for @var{p} an odd positive 3637prime, and for such @var{p} it's identical to the Jacobi symbol. 
3638@end deftypefun 3639 3640@deftypefun int mpz_kronecker (const mpz_t @var{a}, const mpz_t @var{b}) 3641@deftypefunx int mpz_kronecker_si (const mpz_t @var{a}, long @var{b}) 3642@deftypefunx int mpz_kronecker_ui (const mpz_t @var{a}, unsigned long @var{b}) 3643@deftypefunx int mpz_si_kronecker (long @var{a}, const mpz_t @var{b}) 3644@deftypefunx int mpz_ui_kronecker (unsigned long @var{a}, const mpz_t @var{b}) 3645@cindex Kronecker symbol functions 3646Calculate the Jacobi symbol @m{\left(a \over b\right), 3647(@var{a}/@var{b})} with the Kronecker extension @m{\left(a \over 36482\right) = \left(2 \over a\right), (a/2)=(2/a)} when @math{a} odd, or 3649@m{\left(a \over 2\right) = 0, (a/2)=0} when @math{a} even. 3650 3651When @var{b} is odd the Jacobi symbol and Kronecker symbol are 3652identical, so @code{mpz_kronecker_ui} etc can be used for mixed 3653precision Jacobi symbols too. 3654 3655For more information see Henri Cohen section 1.4.2 (@pxref{References}), 3656or any number theory textbook. See also the example program 3657@file{demos/qcn.c} which uses @code{mpz_kronecker_ui}. 3658@end deftypefun 3659 3660@deftypefun {mp_bitcnt_t} mpz_remove (mpz_t @var{rop}, const mpz_t @var{op}, const mpz_t @var{f}) 3661@cindex Remove factor functions 3662@cindex Factor removal functions 3663Remove all occurrences of the factor @var{f} from @var{op} and store the 3664result in @var{rop}. The return value is how many such occurrences were 3665removed. 
@end deftypefun

@deftypefun void mpz_fac_ui (mpz_t @var{rop}, unsigned long int @var{n})
@deftypefunx void mpz_2fac_ui (mpz_t @var{rop}, unsigned long int @var{n})
@deftypefunx void mpz_mfac_uiui (mpz_t @var{rop}, unsigned long int @var{n}, unsigned long int @var{m})
@cindex Factorial functions
Set @var{rop} to the factorial of @var{n}: @code{mpz_fac_ui} computes the plain factorial @var{n}!,
@code{mpz_2fac_ui} computes the double-factorial @var{n}!!, and @code{mpz_mfac_uiui} the
@var{m}-multi-factorial @m{n!^{(m)}, @var{n}!^(@var{m})}.
@end deftypefun

@deftypefun void mpz_primorial_ui (mpz_t @var{rop}, unsigned long int @var{n})
@cindex Primorial functions
Set @var{rop} to the primorial of @var{n}, i.e., the product of all positive
prime numbers @math{@le{}@var{n}}.
@end deftypefun

@deftypefun void mpz_bin_ui (mpz_t @var{rop}, const mpz_t @var{n}, unsigned long int @var{k})
@deftypefunx void mpz_bin_uiui (mpz_t @var{rop}, unsigned long int @var{n}, @w{unsigned long int @var{k}})
@cindex Binomial coefficient functions
Compute the binomial coefficient @m{\left({n}\atop{k}\right), @var{n} over
@var{k}} and store the result in @var{rop}. Negative values of @var{n} are
supported by @code{mpz_bin_ui}, using the identity
@m{\left({-n}\atop{k}\right) = (-1)^k \left({n+k-1}\atop{k}\right),
bin(-n@C{}k) = (-1)^k * bin(n+k-1@C{}k)}, see Knuth volume 1 section 1.2.6
part G.
@end deftypefun

@deftypefun void mpz_fib_ui (mpz_t @var{fn}, unsigned long int @var{n})
@deftypefunx void mpz_fib2_ui (mpz_t @var{fn}, mpz_t @var{fnsub1}, unsigned long int @var{n})
@cindex Fibonacci sequence functions
@code{mpz_fib_ui} sets @var{fn} to @m{F_n,F[n]}, the @var{n}'th Fibonacci
number. @code{mpz_fib2_ui} sets @var{fn} to @m{F_n,F[n]}, and @var{fnsub1} to
@m{F_{n-1},F[n-1]}.

These functions are designed for calculating isolated Fibonacci numbers. When
a sequence of values is wanted it's best to start with @code{mpz_fib2_ui} and
iterate the defining @m{F_{n+1} = F_n + F_{n-1}, F[n+1]=F[n]+F[n-1]} or
similar.
@end deftypefun

@deftypefun void mpz_lucnum_ui (mpz_t @var{ln}, unsigned long int @var{n})
@deftypefunx void mpz_lucnum2_ui (mpz_t @var{ln}, mpz_t @var{lnsub1}, unsigned long int @var{n})
@cindex Lucas number functions
@code{mpz_lucnum_ui} sets @var{ln} to @m{L_n,L[n]}, the @var{n}'th Lucas
number. @code{mpz_lucnum2_ui} sets @var{ln} to @m{L_n,L[n]}, and @var{lnsub1}
to @m{L_{n-1},L[n-1]}.

These functions are designed for calculating isolated Lucas numbers. When a
sequence of values is wanted it's best to start with @code{mpz_lucnum2_ui} and
iterate the defining @m{L_{n+1} = L_n + L_{n-1}, L[n+1]=L[n]+L[n-1]} or
similar.

The Fibonacci numbers and Lucas numbers are related sequences, so it's never
necessary to call both @code{mpz_fib2_ui} and @code{mpz_lucnum2_ui}. The
formulas for going from Fibonacci to Lucas can be found in @ref{Lucas Numbers
Algorithm}, the reverse is straightforward too.
@end deftypefun


@node Integer Comparisons, Integer Logic and Bit Fiddling, Number Theoretic Functions, Integer Functions
@comment node-name, next, previous, up
@section Comparison Functions
@cindex Integer comparison functions
@cindex Comparison functions

@deftypefn Function int mpz_cmp (const mpz_t @var{op1}, const mpz_t @var{op2})
@deftypefnx Function int mpz_cmp_d (const mpz_t @var{op1}, double @var{op2})
@deftypefnx Macro int mpz_cmp_si (const mpz_t @var{op1}, signed long int @var{op2})
@deftypefnx Macro int mpz_cmp_ui (const mpz_t @var{op1}, unsigned long int @var{op2})
Compare @var{op1} and @var{op2}.
Return a positive value if @math{@var{op1} > 3737@var{op2}}, zero if @math{@var{op1} = @var{op2}}, or a negative value if 3738@math{@var{op1} < @var{op2}}. 3739 3740@code{mpz_cmp_ui} and @code{mpz_cmp_si} are macros and will evaluate their 3741arguments more than once. @code{mpz_cmp_d} can be called with an infinity, 3742but results are undefined for a NaN. 3743@end deftypefn 3744 3745@deftypefn Function int mpz_cmpabs (const mpz_t @var{op1}, const mpz_t @var{op2}) 3746@deftypefnx Function int mpz_cmpabs_d (const mpz_t @var{op1}, double @var{op2}) 3747@deftypefnx Function int mpz_cmpabs_ui (const mpz_t @var{op1}, unsigned long int @var{op2}) 3748Compare the absolute values of @var{op1} and @var{op2}. Return a positive 3749value if @math{@GMPabs{@var{op1}} > @GMPabs{@var{op2}}}, zero if 3750@math{@GMPabs{@var{op1}} = @GMPabs{@var{op2}}}, or a negative value if 3751@math{@GMPabs{@var{op1}} < @GMPabs{@var{op2}}}. 3752 3753@code{mpz_cmpabs_d} can be called with an infinity, but results are undefined 3754for a NaN. 3755@end deftypefn 3756 3757@deftypefn Macro int mpz_sgn (const mpz_t @var{op}) 3758@cindex Sign tests 3759@cindex Integer sign tests 3760Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and 3761@math{-1} if @math{@var{op} < 0}. 3762 3763This function is actually implemented as a macro. It evaluates its argument 3764multiple times. 3765@end deftypefn 3766 3767 3768@node Integer Logic and Bit Fiddling, I/O of Integers, Integer Comparisons, Integer Functions 3769@comment node-name, next, previous, up 3770@section Logical and Bit Manipulation Functions 3771@cindex Logical functions 3772@cindex Bit manipulation functions 3773@cindex Integer logical functions 3774@cindex Integer bit manipulation functions 3775 3776These functions behave as if twos complement arithmetic were used (although 3777sign-magnitude is the actual implementation). The least significant bit is 3778number 0. 
3779 3780@deftypefun void mpz_and (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3781Set @var{rop} to @var{op1} bitwise-and @var{op2}. 3782@end deftypefun 3783 3784@deftypefun void mpz_ior (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3785Set @var{rop} to @var{op1} bitwise inclusive-or @var{op2}. 3786@end deftypefun 3787 3788@deftypefun void mpz_xor (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2}) 3789Set @var{rop} to @var{op1} bitwise exclusive-or @var{op2}. 3790@end deftypefun 3791 3792@deftypefun void mpz_com (mpz_t @var{rop}, const mpz_t @var{op}) 3793Set @var{rop} to the one's complement of @var{op}. 3794@end deftypefun 3795 3796@deftypefun {mp_bitcnt_t} mpz_popcount (const mpz_t @var{op}) 3797If @math{@var{op}@ge{}0}, return the population count of @var{op}, which is the 3798number of 1 bits in the binary representation. If @math{@var{op}<0}, the 3799number of 1s is infinite, and the return value is the largest possible 3800@code{mp_bitcnt_t}. 3801@end deftypefun 3802 3803@deftypefun {mp_bitcnt_t} mpz_hamdist (const mpz_t @var{op1}, const mpz_t @var{op2}) 3804If @var{op1} and @var{op2} are both @math{@ge{}0} or both @math{<0}, return the 3805hamming distance between the two operands, which is the number of bit positions 3806where @var{op1} and @var{op2} have different bit values. If one operand is 3807@math{@ge{}0} and the other @math{<0} then the number of bits different is 3808infinite, and the return value is the largest possible @code{mp_bitcnt_t}. 3809@end deftypefun 3810 3811@deftypefun {mp_bitcnt_t} mpz_scan0 (const mpz_t @var{op}, mp_bitcnt_t @var{starting_bit}) 3812@deftypefunx {mp_bitcnt_t} mpz_scan1 (const mpz_t @var{op}, mp_bitcnt_t @var{starting_bit}) 3813@cindex Bit scanning functions 3814@cindex Scan bit functions 3815Scan @var{op}, starting from bit @var{starting_bit}, towards more significant 3816bits, until the first 0 or 1 bit (respectively) is found. Return the index of 3817the found bit. 
3818 3819If the bit at @var{starting_bit} is already what's sought, then 3820@var{starting_bit} is returned. 3821 3822If there's no bit found, then the largest possible @code{mp_bitcnt_t} is 3823returned. This will happen in @code{mpz_scan0} past the end of a negative 3824number, or @code{mpz_scan1} past the end of a nonnegative number. 3825@end deftypefun 3826 3827@deftypefun void mpz_setbit (mpz_t @var{rop}, mp_bitcnt_t @var{bit_index}) 3828Set bit @var{bit_index} in @var{rop}. 3829@end deftypefun 3830 3831@deftypefun void mpz_clrbit (mpz_t @var{rop}, mp_bitcnt_t @var{bit_index}) 3832Clear bit @var{bit_index} in @var{rop}. 3833@end deftypefun 3834 3835@deftypefun void mpz_combit (mpz_t @var{rop}, mp_bitcnt_t @var{bit_index}) 3836Complement bit @var{bit_index} in @var{rop}. 3837@end deftypefun 3838 3839@deftypefun int mpz_tstbit (const mpz_t @var{op}, mp_bitcnt_t @var{bit_index}) 3840Test bit @var{bit_index} in @var{op} and return 0 or 1 accordingly. 3841@end deftypefun 3842 3843@node I/O of Integers, Integer Random Numbers, Integer Logic and Bit Fiddling, Integer Functions 3844@comment node-name, next, previous, up 3845@section Input and Output Functions 3846@cindex Integer input and output functions 3847@cindex Input functions 3848@cindex Output functions 3849@cindex I/O functions 3850 3851Functions that perform input from a stdio stream, and functions that output to 3852a stdio stream, of @code{mpz} numbers. Passing a @code{NULL} pointer for a 3853@var{stream} argument to any of these functions will make them read from 3854@code{stdin} and write to @code{stdout}, respectively. 3855 3856When using any of these functions, it is a good idea to include @file{stdio.h} 3857before @file{gmp.h}, since that will allow @file{gmp.h} to define prototypes 3858for these functions. 3859 3860See also @ref{Formatted Output} and @ref{Formatted Input}. 

@deftypefun size_t mpz_out_str (FILE *@var{stream}, int @var{base}, const mpz_t @var{op})
Output @var{op} on stdio stream @var{stream}, as a string of digits in base
@var{base}. The base argument may vary from 2 to 62 or from @minus{}2 to
@minus{}36.

For @var{base} in the range 2..36, digits and lower-case letters are used; for
@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
digits, upper-case letters, and lower-case letters (in that significance order)
are used.

Return the number of bytes written, or if an error occurred, return 0.
@end deftypefun

@deftypefun size_t mpz_inp_str (mpz_t @var{rop}, FILE *@var{stream}, int @var{base})
Input a string in base @var{base}, possibly preceded by white space, from
stdio stream @var{stream}, and put the read integer in @var{rop}.

The @var{base} may vary from 2 to 62, or if @var{base} is 0, then the leading
characters are used: @code{0x} and @code{0X} for hexadecimal, @code{0b} and
@code{0B} for binary, @code{0} for octal, or decimal otherwise.

For bases up to 36, case is ignored; upper-case and lower-case letters have
the same value. For bases 37 to 62, upper-case letters represent the usual
10..35 while lower-case letters represent 36..61.

Return the number of bytes read, or if an error occurred, return 0.
@end deftypefun

@deftypefun size_t mpz_out_raw (FILE *@var{stream}, const mpz_t @var{op})
Output @var{op} on stdio stream @var{stream}, in raw binary format. The
integer is written in a portable format, with 4 bytes of size information, and
that many bytes of limbs. Both the size and the limbs are written in
decreasing significance order (i.e., in big-endian).

The output can be read with @code{mpz_inp_raw}.

Return the number of bytes written, or if an error occurred, return 0.

The output of this cannot be read by @code{mpz_inp_raw} from GMP 1, because
of changes necessary for compatibility between 32-bit and 64-bit machines.
@end deftypefun

@deftypefun size_t mpz_inp_raw (mpz_t @var{rop}, FILE *@var{stream})
Input from stdio stream @var{stream} in the format written by
@code{mpz_out_raw}, and put the result in @var{rop}. Return the number of
bytes read, or if an error occurred, return 0.

This routine can read the output from @code{mpz_out_raw} also from GMP 1, in
spite of changes necessary for compatibility between 32-bit and 64-bit
machines.
@end deftypefun


@need 2000
@node Integer Random Numbers, Integer Import and Export, I/O of Integers, Integer Functions
@comment node-name, next, previous, up
@section Random Number Functions
@cindex Integer random number functions
@cindex Random number functions

The random number functions of GMP come in two groups: older functions
that rely on a global state, and newer functions that accept a state
parameter that is read and modified. Please see @ref{Random Number
Functions} for more information on how to use and not to use random
number functions.

@deftypefun void mpz_urandomb (mpz_t @var{rop}, gmp_randstate_t @var{state}, mp_bitcnt_t @var{n})
Generate a uniformly distributed random integer in the range 0 to @m{2^n-1,
2^@var{n}@minus{}1}, inclusive.

The variable @var{state} must be initialized by calling one of the
@code{gmp_randinit} functions (@ref{Random State Initialization}) before
invoking this function.
@end deftypefun

@deftypefun void mpz_urandomm (mpz_t @var{rop}, gmp_randstate_t @var{state}, const mpz_t @var{n})
Generate a uniform random integer in the range 0 to @math{@var{n}-1},
inclusive.

The variable @var{state} must be initialized by calling one of the
@code{gmp_randinit} functions (@ref{Random State Initialization})
before invoking this function.
@end deftypefun

@deftypefun void mpz_rrandomb (mpz_t @var{rop}, gmp_randstate_t @var{state}, mp_bitcnt_t @var{n})
Generate a random integer with long strings of zeros and ones in the
binary representation. Useful for testing functions and algorithms,
since random numbers of this kind have proven to be more likely to
trigger corner-case bugs. The random number will be in the range
0 to @m{2^n-1, 2^@var{n}@minus{}1}, inclusive.

The variable @var{state} must be initialized by calling one of the
@code{gmp_randinit} functions (@ref{Random State Initialization})
before invoking this function.
@end deftypefun

@deftypefun void mpz_random (mpz_t @var{rop}, mp_size_t @var{max_size})
Generate a random integer of at most @var{max_size} limbs. The generated
random number doesn't satisfy any particular requirements of randomness.
Negative random numbers are generated when @var{max_size} is negative.

This function is obsolete. Use @code{mpz_urandomb} or
@code{mpz_urandomm} instead.
@end deftypefun

@deftypefun void mpz_random2 (mpz_t @var{rop}, mp_size_t @var{max_size})
Generate a random integer of at most @var{max_size} limbs, with long strings
of zeros and ones in the binary representation. Useful for testing functions
and algorithms, since random numbers of this kind have proven to be more
likely to trigger corner-case bugs. Negative random numbers are generated
when @var{max_size} is negative.

This function is obsolete. Use @code{mpz_rrandomb} instead.
@end deftypefun


@node Integer Import and Export, Miscellaneous Integer Functions, Integer Random Numbers, Integer Functions
@section Integer Import and Export

@code{mpz_t} variables can be converted to and from arbitrary words of binary
data with the following functions.

@deftypefun void mpz_import (mpz_t @var{rop}, size_t @var{count}, int @var{order}, size_t @var{size}, int @var{endian}, size_t @var{nails}, const void *@var{op})
@cindex Integer import
@cindex Import
Set @var{rop} from an array of word data at @var{op}.

The parameters specify the format of the data. @var{count} many words are
read, each @var{size} bytes. @var{order} can be 1 for most significant word
first or -1 for least significant first. Within each word @var{endian} can be
1 for most significant byte first, -1 for least significant first, or 0 for
the native endianness of the host CPU@. The most significant @var{nails} bits
of each word are skipped, this can be 0 to use the full words.

There is no sign taken from the data, @var{rop} will simply be a positive
integer. An application can handle any sign itself, and apply it for instance
with @code{mpz_neg}.

There are no data alignment restrictions on @var{op}, any address is allowed.

Here's an example converting an array of @code{unsigned long} data, most
significant element first, and host byte order within each value.

@example
unsigned long a[20];
/* Initialize @var{z} and @var{a} */
mpz_import (z, 20, 1, sizeof(a[0]), 0, 0, a);
@end example

This example assumes the full @code{sizeof} bytes are used for data in the
given type, which is usually true, and certainly true for @code{unsigned long}
everywhere we know of.
However on Cray vector systems it may be noted that
@code{short} and @code{int} are always stored in 8 bytes (and with
@code{sizeof} indicating that) but use only 32 or 46 bits. The @var{nails}
feature can account for this, by passing for instance
@code{8*sizeof(int)-INT_BIT}.
@end deftypefun

@deftypefun {void *} mpz_export (void *@var{rop}, size_t *@var{countp}, int @var{order}, size_t @var{size}, int @var{endian}, size_t @var{nails}, const mpz_t @var{op})
@cindex Integer export
@cindex Export
Fill @var{rop} with word data from @var{op}.

The parameters specify the format of the data produced. Each word will be
@var{size} bytes and @var{order} can be 1 for most significant word first or
-1 for least significant first. Within each word @var{endian} can be 1 for
most significant byte first, -1 for least significant first, or 0 for the
native endianness of the host CPU@. The most significant @var{nails} bits of
each word are unused and set to zero, this can be 0 to produce full words.

The number of words produced is written to @code{*@var{countp}}, or
@var{countp} can be @code{NULL} to discard the count. @var{rop} must have
enough space for the data, or if @var{rop} is @code{NULL} then a result array
of the necessary size is allocated using the current GMP allocation function
(@pxref{Custom Allocation}). In either case the return value is the
destination used, either @var{rop} or the allocated block.

If @var{op} is non-zero then the most significant word produced will be
non-zero. If @var{op} is zero then the count returned will be zero and
nothing written to @var{rop}. If @var{rop} is @code{NULL} in this case, no
block is allocated, just @code{NULL} is returned.

The sign of @var{op} is ignored, just the absolute value is exported. An
application can use @code{mpz_sgn} to get the sign and handle it as desired.
(@pxref{Integer Comparisons})

There are no data alignment restrictions on @var{rop}, any address is allowed.

When an application is allocating space itself the required size can be
determined with a calculation like the following. Since @code{mpz_sizeinbase}
always returns at least 1, @code{count} here will be at least one, which
avoids any portability problems with @code{malloc(0)}, though if @code{z} is
zero no space at all is actually needed (or written).

@example
numb = 8*size - nail;
count = (mpz_sizeinbase (z, 2) + numb-1) / numb;
p = malloc (count * size);
@end example
@end deftypefun


@need 2000
@node Miscellaneous Integer Functions, Integer Special Functions, Integer Import and Export, Integer Functions
@comment node-name, next, previous, up
@section Miscellaneous Functions
@cindex Miscellaneous integer functions
@cindex Integer miscellaneous functions

@deftypefun int mpz_fits_ulong_p (const mpz_t @var{op})
@deftypefunx int mpz_fits_slong_p (const mpz_t @var{op})
@deftypefunx int mpz_fits_uint_p (const mpz_t @var{op})
@deftypefunx int mpz_fits_sint_p (const mpz_t @var{op})
@deftypefunx int mpz_fits_ushort_p (const mpz_t @var{op})
@deftypefunx int mpz_fits_sshort_p (const mpz_t @var{op})
Return non-zero iff the value of @var{op} fits in an @code{unsigned long int},
@code{signed long int}, @code{unsigned int}, @code{signed int}, @code{unsigned
short int}, or @code{signed short int}, respectively. Otherwise, return zero.
@end deftypefun

@deftypefn Macro int mpz_odd_p (const mpz_t @var{op})
@deftypefnx Macro int mpz_even_p (const mpz_t @var{op})
Determine whether @var{op} is odd or even, respectively. Return non-zero if
yes, zero if no. These macros evaluate their argument more than once.
@end deftypefn

@deftypefun size_t mpz_sizeinbase (const mpz_t @var{op}, int @var{base})
@cindex Size in digits
@cindex Digits in an integer
Return the size of @var{op} measured in number of digits in the given
@var{base}. @var{base} can vary from 2 to 62. The sign of @var{op} is
ignored, just the absolute value is used. The result will be either exact or
1 too big. If @var{base} is a power of 2, the result is always exact. If
@var{op} is zero the return value is always 1.

This function can be used to determine the space required when converting
@var{op} to a string. The right amount of allocation is normally two more
than the value returned by @code{mpz_sizeinbase}, one extra for a minus sign
and one for the null-terminator.

@cindex Most significant bit
It will be noted that @code{mpz_sizeinbase(@var{op},2)} can be used to locate
the most significant 1 bit in @var{op}, counting from 1. (Unlike the bitwise
functions which start from 0, @xref{Integer Logic and Bit Fiddling,, Logical
and Bit Manipulation Functions}.)
@end deftypefun


@node Integer Special Functions, , Miscellaneous Integer Functions, Integer Functions
@section Special Functions
@cindex Special integer functions
@cindex Integer special functions

The functions in this section are for various special purposes. Most
applications will not need them.

@deftypefun void mpz_array_init (mpz_t @var{integer_array}, mp_size_t @var{array_size}, @w{mp_size_t @var{fixed_num_bits}})
@strong{This is an obsolete function. Do not use it.}
@end deftypefun

@deftypefun {void *} _mpz_realloc (mpz_t @var{integer}, mp_size_t @var{new_alloc})
Change the space for @var{integer} to @var{new_alloc} limbs. The value in
@var{integer} is preserved if it fits, or is set to 0 if not. The return
value is not useful to applications and should be ignored.

@code{mpz_realloc2} is the preferred way to accomplish allocation changes like
this. @code{mpz_realloc2} and @code{_mpz_realloc} are the same except that
@code{_mpz_realloc} takes its size in limbs.
@end deftypefun

@deftypefun mp_limb_t mpz_getlimbn (const mpz_t @var{op}, mp_size_t @var{n})
Return limb number @var{n} from @var{op}. The sign of @var{op} is ignored,
just the absolute value is used. The least significant limb is number 0.

@code{mpz_size} can be used to find how many limbs make up @var{op}.
@code{mpz_getlimbn} returns zero if @var{n} is outside the range 0 to
@code{mpz_size(@var{op})-1}.
@end deftypefun

@deftypefun size_t mpz_size (const mpz_t @var{op})
Return the size of @var{op} measured in number of limbs. If @var{op} is zero,
the returned value will be zero.
@c (@xref{Nomenclature}, for an explanation of the concept @dfn{limb}.)
@end deftypefun

@deftypefun {const mp_limb_t *} mpz_limbs_read (const mpz_t @var{x})
Return a pointer to the limb array representing the absolute value of @var{x}.
The size of the array is @code{mpz_size(@var{x})}. Intended for read access
only.
@end deftypefun

@deftypefun {mp_limb_t *} mpz_limbs_write (mpz_t @var{x}, mp_size_t @var{n})
@deftypefunx {mp_limb_t *} mpz_limbs_modify (mpz_t @var{x}, mp_size_t @var{n})
Return a pointer to the limb array, intended for write access. The array is
reallocated as needed, to make room for @var{n} limbs. Requires @math{@var{n}
> 0}. The @code{mpz_limbs_modify} function returns an array that holds the old
absolute value of @var{x}, while @code{mpz_limbs_write} may destroy the old
value and return an array with unspecified contents.
@end deftypefun

@deftypefun void mpz_limbs_finish (mpz_t @var{x}, mp_size_t @var{s})
Updates the internal size field of @var{x}.
Used after writing to the limb
array pointer returned by @code{mpz_limbs_write} or @code{mpz_limbs_modify} is
completed. The array should contain @math{@GMPabs{@var{s}}} valid limbs,
representing the new absolute value for @var{x}, and the sign of @var{x} is
taken from the sign of @var{s}. This function never reallocates @var{x}, so
the limb pointer remains valid.
@end deftypefun

@c FIXME: Some more useful and less silly example?
@example
void foo (mpz_t x)
@{
  mp_size_t n, i;
  mp_limb_t *xp;

  n = mpz_size (x);
  xp = mpz_limbs_modify(x, 2*n);
  for (i = 0; i < n; i++)
    xp[n+i] = xp[n-1-i];
  mpz_limbs_finish (x, mpz_sgn (x) < 0 ? - 2*n : 2*n);
@}
@end example

@deftypefun mpz_srcptr mpz_roinit_n (mpz_t @var{x}, const mp_limb_t *@var{xp}, mp_size_t @var{xs})
Special initialization of @var{x}, using the given limb array and size.
@var{x} should be treated as read-only: it can be passed safely as input to
any mpz function, but not as an output. The array @var{xp} must point to at
least a readable limb, its size is
@math{@GMPabs{@var{xs}}}, and the sign of @var{x} is the sign of @var{xs}. For
convenience, the function returns @var{x}, but cast to a const pointer type.
@end deftypefun

@example
void foo (mpz_t x)
@{
  static const mp_limb_t y[3] = @{ 0x1, 0x2, 0x3 @};
  mpz_t tmp;
  mpz_add (x, x, mpz_roinit_n (tmp, y, 3));
@}
@end example

@deftypefn Macro mpz_t MPZ_ROINIT_N (mp_limb_t *@var{xp}, mp_size_t @var{xs})
This macro expands to an initializer which can be assigned to an mpz_t
variable. The limb array @var{xp} must point to at least a readable limb,
moreover, unlike the @code{mpz_roinit_n} function, the array must be
normalized: if @var{xs} is non-zero, then
@code{@var{xp}[@math{@GMPabs{@var{xs}}-1}]} must be non-zero. Intended
primarily for constant values.
Using it for non-constant values requires a C
compiler supporting C99.
@end deftypefn

@example
void foo (mpz_t x)
@{
  static const mp_limb_t ya[3] = @{ 0x1, 0x2, 0x3 @};
  static const mpz_t y = MPZ_ROINIT_N ((mp_limb_t *) ya, 3);

  mpz_add (x, x, y);
@}
@end example


@node Rational Number Functions, Floating-point Functions, Integer Functions, Top
@comment node-name, next, previous, up
@chapter Rational Number Functions
@cindex Rational number functions

This chapter describes the GMP functions for performing arithmetic on rational
numbers. These functions start with the prefix @code{mpq_}.

Rational numbers are stored in objects of type @code{mpq_t}.

All rational arithmetic functions assume operands have a canonical form, and
canonicalize their result. The canonical form means that the denominator and
the numerator have no common factors, and that the denominator is positive.
Zero has the unique representation 0/1.

Pure assignment functions do not canonicalize the assigned variable. It is
the responsibility of the user to canonicalize the assigned variable before
any arithmetic operations are performed on that variable.

@deftypefun void mpq_canonicalize (mpq_t @var{op})
Remove any factors that are common to the numerator and denominator of
@var{op}, and make the denominator positive.
@end deftypefun

@menu
* Initializing Rationals::
* Rational Conversions::
* Rational Arithmetic::
* Comparing Rationals::
* Applying Integer Functions::
* I/O of Rationals::
@end menu

@node Initializing Rationals, Rational Conversions, Rational Number Functions, Rational Number Functions
@comment node-name, next, previous, up
@section Initialization and Assignment Functions
@cindex Rational assignment functions
@cindex Assignment functions
@cindex Rational initialization functions
@cindex Initialization functions

@deftypefun void mpq_init (mpq_t @var{x})
Initialize @var{x} and set it to 0/1. Each variable should normally only be
initialized once, or at least cleared out (using the function @code{mpq_clear})
between each initialization.
@end deftypefun

@deftypefun void mpq_inits (mpq_t @var{x}, ...)
Initialize a NULL-terminated list of @code{mpq_t} variables, and set their
values to 0/1.
@end deftypefun

@deftypefun void mpq_clear (mpq_t @var{x})
Free the space occupied by @var{x}. Make sure to call this function for all
@code{mpq_t} variables when you are done with them.
@end deftypefun

@deftypefun void mpq_clears (mpq_t @var{x}, ...)
Free the space occupied by a NULL-terminated list of @code{mpq_t} variables.
@end deftypefun

@deftypefun void mpq_set (mpq_t @var{rop}, const mpq_t @var{op})
@deftypefunx void mpq_set_z (mpq_t @var{rop}, const mpz_t @var{op})
Assign @var{rop} from @var{op}.
@end deftypefun

@deftypefun void mpq_set_ui (mpq_t @var{rop}, unsigned long int @var{op1}, unsigned long int @var{op2})
@deftypefunx void mpq_set_si (mpq_t @var{rop}, signed long int @var{op1}, unsigned long int @var{op2})
Set the value of @var{rop} to @var{op1}/@var{op2}.
Note that if @var{op1} and
@var{op2} have common factors, @var{rop} has to be passed to
@code{mpq_canonicalize} before any operations are performed on @var{rop}.
@end deftypefun

@deftypefun int mpq_set_str (mpq_t @var{rop}, const char *@var{str}, int @var{base})
Set @var{rop} from a null-terminated string @var{str} in the given @var{base}.

The string can be an integer like ``41'' or a fraction like ``41/152''. The
fraction must be in canonical form (@pxref{Rational Number Functions}), or if
not then @code{mpq_canonicalize} must be called.

The numerator and optional denominator are parsed the same as in
@code{mpz_set_str} (@pxref{Assigning Integers}). White space is allowed in
the string, and is simply ignored. The @var{base} can vary from 2 to 62, or
if @var{base} is 0 then the leading characters are used: @code{0x} or @code{0X} for hex,
@code{0b} or @code{0B} for binary,
@code{0} for octal, or decimal otherwise. Note that this is done separately
for the numerator and denominator, so for instance @code{0xEF/100} is 239/100,
whereas @code{0xEF/0x100} is 239/256.

The return value is 0 if the entire string is a valid number, or @minus{}1 if
not.
@end deftypefun

@deftypefun void mpq_swap (mpq_t @var{rop1}, mpq_t @var{rop2})
Swap the values @var{rop1} and @var{rop2} efficiently.
@end deftypefun


@need 2000
@node Rational Conversions, Rational Arithmetic, Initializing Rationals, Rational Number Functions
@comment node-name, next, previous, up
@section Conversion Functions
@cindex Rational conversion functions
@cindex Conversion functions

@deftypefun double mpq_get_d (const mpq_t @var{op})
Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
towards zero).

If the exponent from the conversion is too big or too small to fit a
@code{double} then the result is system dependent.
If the exponent is too big, an infinity is
returned when available. If it is too small, @math{0.0} is normally returned.
Hardware overflow, underflow and denorm traps may or may not occur.
@end deftypefun

@deftypefun void mpq_set_d (mpq_t @var{rop}, double @var{op})
@deftypefunx void mpq_set_f (mpq_t @var{rop}, const mpf_t @var{op})
Set @var{rop} to the value of @var{op}. There is no rounding, this conversion
is exact.
@end deftypefun

@deftypefun {char *} mpq_get_str (char *@var{str}, int @var{base}, const mpq_t @var{op})
Convert @var{op} to a string of digits in base @var{base}. The base may vary
from 2 to 36. The string will be of the form @samp{num/den}, or if the
denominator is 1 then just @samp{num}.

If @var{str} is @code{NULL}, the result string is allocated using the current
allocation function (@pxref{Custom Allocation}). The block will be
@code{strlen(str)+1} bytes, that being exactly enough for the string and
null-terminator.

If @var{str} is not @code{NULL}, it should point to a block of storage large
enough for the result, that being

@example
mpz_sizeinbase (mpq_numref(@var{op}), @var{base})
+ mpz_sizeinbase (mpq_denref(@var{op}), @var{base}) + 3
@end example

The three extra bytes are for a possible minus sign, possible slash, and the
null-terminator.

A pointer to the result string is returned, being either the allocated block,
or the given @var{str}.
@end deftypefun


@node Rational Arithmetic, Comparing Rationals, Rational Conversions, Rational Number Functions
@comment node-name, next, previous, up
@section Arithmetic Functions
@cindex Rational arithmetic functions
@cindex Arithmetic functions

@deftypefun void mpq_add (mpq_t @var{sum}, const mpq_t @var{addend1}, const mpq_t @var{addend2})
Set @var{sum} to @var{addend1} + @var{addend2}.
@end deftypefun

@deftypefun void mpq_sub (mpq_t @var{difference}, const mpq_t @var{minuend}, const mpq_t @var{subtrahend})
Set @var{difference} to @var{minuend} @minus{} @var{subtrahend}.
@end deftypefun

@deftypefun void mpq_mul (mpq_t @var{product}, const mpq_t @var{multiplier}, const mpq_t @var{multiplicand})
Set @var{product} to @math{@var{multiplier} @GMPtimes{} @var{multiplicand}}.
@end deftypefun

@deftypefun void mpq_mul_2exp (mpq_t @var{rop}, const mpq_t @var{op1}, mp_bitcnt_t @var{op2})
Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to
@var{op2}}.
@end deftypefun

@deftypefun void mpq_div (mpq_t @var{quotient}, const mpq_t @var{dividend}, const mpq_t @var{divisor})
@cindex Division functions
Set @var{quotient} to @var{dividend}/@var{divisor}.
@end deftypefun

@deftypefun void mpq_div_2exp (mpq_t @var{rop}, const mpq_t @var{op1}, mp_bitcnt_t @var{op2})
Set @var{rop} to @m{@var{op1}/2^{op2}, @var{op1} divided by 2 raised to
@var{op2}}.
@end deftypefun

@deftypefun void mpq_neg (mpq_t @var{negated_operand}, const mpq_t @var{operand})
Set @var{negated_operand} to @minus{}@var{operand}.
@end deftypefun

@deftypefun void mpq_abs (mpq_t @var{rop}, const mpq_t @var{op})
Set @var{rop} to the absolute value of @var{op}.
@end deftypefun

@deftypefun void mpq_inv (mpq_t @var{inverted_number}, const mpq_t @var{number})
Set @var{inverted_number} to 1/@var{number}. If the new denominator is
zero, this routine will divide by zero.
@end deftypefun

@node Comparing Rationals, Applying Integer Functions, Rational Arithmetic, Rational Number Functions
@comment node-name, next, previous, up
@section Comparison Functions
@cindex Rational comparison functions
@cindex Comparison functions

@deftypefun int mpq_cmp (const mpq_t @var{op1}, const mpq_t @var{op2})
Compare @var{op1} and @var{op2}. Return a positive value if @math{@var{op1} >
@var{op2}}, zero if @math{@var{op1} = @var{op2}}, and a negative value if
@math{@var{op1} < @var{op2}}.

To determine if two rationals are equal, @code{mpq_equal} is faster than
@code{mpq_cmp}.
@end deftypefun

@deftypefn Macro int mpq_cmp_ui (const mpq_t @var{op1}, unsigned long int @var{num2}, unsigned long int @var{den2})
@deftypefnx Macro int mpq_cmp_si (const mpq_t @var{op1}, long int @var{num2}, unsigned long int @var{den2})
Compare @var{op1} and @var{num2}/@var{den2}. Return a positive value if
@math{@var{op1} > @var{num2}/@var{den2}}, zero if @math{@var{op1} =
@var{num2}/@var{den2}}, and a negative value if @math{@var{op1} <
@var{num2}/@var{den2}}.

@var{num2} and @var{den2} are allowed to have common factors.

These functions are implemented as macros and evaluate their arguments
multiple times.
@end deftypefn

@deftypefn Macro int mpq_sgn (const mpq_t @var{op})
@cindex Sign tests
@cindex Rational sign tests
Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
@math{-1} if @math{@var{op} < 0}.

This function is actually implemented as a macro. It evaluates its
argument multiple times.
@end deftypefn

@deftypefun int mpq_equal (const mpq_t @var{op1}, const mpq_t @var{op2})
Return non-zero if @var{op1} and @var{op2} are equal, zero if they are
non-equal. Although @code{mpq_cmp} can be used for the same purpose, this
function is much faster.
@end deftypefun

@node Applying Integer Functions, I/O of Rationals, Comparing Rationals, Rational Number Functions
@comment node-name, next, previous, up
@section Applying Integer Functions to Rationals
@cindex Rational numerator and denominator
@cindex Numerator and denominator

The set of @code{mpq} functions is quite small. In particular, there are few
functions for either input or output. The following functions give direct
access to the numerator and denominator of an @code{mpq_t}.

Note that if an assignment to the numerator and/or denominator could take an
@code{mpq_t} out of the canonical form described at the start of this chapter
(@pxref{Rational Number Functions}) then @code{mpq_canonicalize} must be
called before any other @code{mpq} functions are applied to that @code{mpq_t}.

@deftypefn Macro mpz_t mpq_numref (const mpq_t @var{op})
@deftypefnx Macro mpz_t mpq_denref (const mpq_t @var{op})
Return a reference to the numerator and denominator of @var{op}, respectively.
The @code{mpz} functions can be used on the result of these macros.
@end deftypefn

@deftypefun void mpq_get_num (mpz_t @var{numerator}, const mpq_t @var{rational})
@deftypefunx void mpq_get_den (mpz_t @var{denominator}, const mpq_t @var{rational})
@deftypefunx void mpq_set_num (mpq_t @var{rational}, const mpz_t @var{numerator})
@deftypefunx void mpq_set_den (mpq_t @var{rational}, const mpz_t @var{denominator})
Get or set the numerator or denominator of a rational. These functions are
equivalent to calling @code{mpz_set} with an appropriate @code{mpq_numref} or
@code{mpq_denref}. Direct use of @code{mpq_numref} or @code{mpq_denref} is
recommended instead of these functions.
@end deftypefun


@need 2000
@node I/O of Rationals, , Applying Integer Functions, Rational Number Functions
@comment node-name, next, previous, up
@section Input and Output Functions
@cindex Rational input and output functions
@cindex Input functions
@cindex Output functions
@cindex I/O functions

Functions that perform input from a stdio stream, and functions that output to
a stdio stream, of @code{mpq} numbers. Passing a @code{NULL} pointer for a
@var{stream} argument to any of these functions will make them read from
@code{stdin} and write to @code{stdout}, respectively.

When using any of these functions, it is a good idea to include @file{stdio.h}
before @file{gmp.h}, since that will allow @file{gmp.h} to define prototypes
for these functions.

See also @ref{Formatted Output} and @ref{Formatted Input}.

@deftypefun size_t mpq_out_str (FILE *@var{stream}, int @var{base}, const mpq_t @var{op})
Output @var{op} on stdio stream @var{stream}, as a string of digits in base
@var{base}. The base may vary from 2 to 36. Output is in the form
@samp{num/den} or if the denominator is 1 then just @samp{num}.

Return the number of bytes written, or if an error occurred, return 0.
@end deftypefun

@deftypefun size_t mpq_inp_str (mpq_t @var{rop}, FILE *@var{stream}, int @var{base})
Read a string of digits from @var{stream} and convert them to a rational in
@var{rop}. Any initial white-space characters are read and discarded. Return
the number of characters read (including white space), or 0 if a rational
could not be read.

The input can be a fraction like @samp{17/63} or just an integer like
@samp{123}. Reading stops at the first character not in this form, and white
space is not permitted within the string.
If the input might not be in
canonical form, then @code{mpq_canonicalize} must be called (@pxref{Rational
Number Functions}).

The @var{base} can be between 2 and 36, or can be 0 in which case the leading
characters of the string determine the base, @samp{0x} or @samp{0X} for
hexadecimal, @samp{0} for octal, or decimal otherwise. The leading characters
are examined separately for the numerator and denominator of a fraction, so
for instance @samp{0x10/11} is @math{16/11}, whereas @samp{0x10/0x11} is
@math{16/17}.
@end deftypefun


@node Floating-point Functions, Low-level Functions, Rational Number Functions, Top
@comment node-name, next, previous, up
@chapter Floating-point Functions
@cindex Floating-point functions
@cindex Float functions
@cindex User-defined precision
@cindex Precision of floats

GMP floating point numbers are stored in objects of type @code{mpf_t} and
functions operating on them have an @code{mpf_} prefix.

The mantissa of each float has a user-selectable precision, limited only by
available memory. Each variable has its own precision, and that can be
increased or decreased at any time.

The exponent of each float is a fixed precision, one machine word on most
systems. In the current implementation the exponent is a count of limbs, so
for example on a 32-bit system this means a range of roughly
@math{2^@W{-68719476768}} to @math{2^@W{68719476736}}, or on a 64-bit system
this will be greater. Note however that @code{mpf_get_str} can only return an
exponent which fits an @code{mp_exp_t} and currently @code{mpf_set_str}
doesn't accept exponents bigger than a @code{long}.

Each variable keeps a size for the mantissa data actually in use. This means
that if a float is exactly represented in only a few bits then only those bits
will be used in a calculation, even if the selected precision is high.

All calculations are performed to the precision of the destination variable.
Each function is defined to calculate with ``infinite precision'' followed by
a truncation to the destination precision, but of course the work done is only
what's needed to determine a result under that definition.

The precision selected by the user for a variable is a minimum value, GMP may
increase it to facilitate efficient calculation. Currently this means
rounding up to a whole limb, and then sometimes having a further partial limb,
depending on the high limb of the mantissa.

The mantissa is stored in binary. One consequence of this is that decimal
fractions like @math{0.1} cannot be represented exactly. The same is true of
plain IEEE @code{double} floats. This makes both highly unsuitable for
calculations involving money or other values that should be exact decimal
fractions. (Suitably scaled integers, or perhaps rationals, are better
choices.)

The @code{mpf} functions and variables have no special notion of infinity or
not-a-number, and applications must take care not to overflow the exponent or
results will be unpredictable. This might change in a future release.

Note that the @code{mpf} functions are @emph{not} intended as a smooth
extension to IEEE P754 arithmetic. In particular results obtained on one
computer often differ from the results on a computer with a different word
size.

The GMP extension library MPFR (@url{http://mpfr.org}) is an alternative to
GMP's @code{mpf} functions. MPFR provides well-defined precision and accurate
rounding, and thereby naturally extends IEEE P754.

@menu
* Initializing Floats::
* Assigning Floats::
* Simultaneous Float Init & Assign::
* Converting Floats::
* Float Arithmetic::
* Float Comparison::
* I/O of Floats::
* Miscellaneous Float Functions::
@end menu

@node Initializing Floats, Assigning Floats, Floating-point Functions, Floating-point Functions
@comment node-name, next, previous, up
@section Initialization Functions
@cindex Float initialization functions
@cindex Initialization functions

@deftypefun void mpf_set_default_prec (mp_bitcnt_t @var{prec})
Set the default precision to be @strong{at least} @var{prec} bits. All
subsequent calls to @code{mpf_init} will use this precision, but previously
initialized variables are unaffected.
@end deftypefun

@deftypefun {mp_bitcnt_t} mpf_get_default_prec (void)
Return the default precision actually used.
@end deftypefun

An @code{mpf_t} object must be initialized before storing the first value in
it. The functions @code{mpf_init} and @code{mpf_init2} are used for that
purpose.

@deftypefun void mpf_init (mpf_t @var{x})
Initialize @var{x} to 0. Normally, a variable should be initialized once only
or at least be cleared, using @code{mpf_clear}, between initializations. The
precision of @var{x} is undefined unless a default precision has already been
established by a call to @code{mpf_set_default_prec}.
@end deftypefun

@deftypefun void mpf_init2 (mpf_t @var{x}, mp_bitcnt_t @var{prec})
Initialize @var{x} to 0 and set its precision to be @strong{at least}
@var{prec} bits. Normally, a variable should be initialized once only or at
least be cleared, using @code{mpf_clear}, between initializations.
@end deftypefun

@deftypefun void mpf_inits (mpf_t @var{x}, ...)
Initialize a NULL-terminated list of @code{mpf_t} variables, and set their
values to 0.
The precision of the initialized variables is undefined unless a
default precision has already been established by a call to
@code{mpf_set_default_prec}.
@end deftypefun

@deftypefun void mpf_clear (mpf_t @var{x})
Free the space occupied by @var{x}. Make sure to call this function for all
@code{mpf_t} variables when you are done with them.
@end deftypefun

@deftypefun void mpf_clears (mpf_t @var{x}, ...)
Free the space occupied by a NULL-terminated list of @code{mpf_t} variables.
@end deftypefun

@need 2000
Here is an example of how to initialize floating-point variables:
@example
@{
  mpf_t x, y;
  mpf_init (x);           /* use default precision */
  mpf_init2 (y, 256);     /* precision @emph{at least} 256 bits */
  @dots{}
  /* Unless the program is about to exit, do ... */
  mpf_clear (x);
  mpf_clear (y);
@}
@end example

The following three functions are useful for changing the precision during a
calculation. A typical use would be for adjusting the precision gradually in
iterative algorithms like Newton-Raphson, making the computation precision
closely match the actual accurate part of the numbers.

@deftypefun {mp_bitcnt_t} mpf_get_prec (const mpf_t @var{op})
Return the current precision of @var{op}, in bits.
@end deftypefun

@deftypefun void mpf_set_prec (mpf_t @var{rop}, mp_bitcnt_t @var{prec})
Set the precision of @var{rop} to be @strong{at least} @var{prec} bits. The
value in @var{rop} will be truncated to the new precision.

This function requires a call to @code{realloc}, and so should not be used in
a tight loop.
@end deftypefun

@deftypefun void mpf_set_prec_raw (mpf_t @var{rop}, mp_bitcnt_t @var{prec})
Set the precision of @var{rop} to be @strong{at least} @var{prec} bits,
without changing the memory allocated.
4694 4695@var{prec} must be no more than the allocated precision for @var{rop}, that 4696being the precision when @var{rop} was initialized, or in the most recent 4697@code{mpf_set_prec}. 4698 4699The value in @var{rop} is unchanged, and in particular if it had a higher 4700precision than @var{prec} it will retain that higher precision. New values 4701written to @var{rop} will use the new @var{prec}. 4702 4703Before calling @code{mpf_clear} or the full @code{mpf_set_prec}, another 4704@code{mpf_set_prec_raw} call must be made to restore @var{rop} to its original 4705allocated precision. Failing to do so will have unpredictable results. 4706 4707@code{mpf_get_prec} can be used before @code{mpf_set_prec_raw} to get the 4708original allocated precision. After @code{mpf_set_prec_raw} it reflects the 4709@var{prec} value set. 4710 4711@code{mpf_set_prec_raw} is an efficient way to use an @code{mpf_t} variable at 4712different precisions during a calculation, perhaps to gradually increase 4713precision in an iteration, or just to use various different precisions for 4714different purposes during a calculation. 4715@end deftypefun 4716 4717 4718@need 2000 4719@node Assigning Floats, Simultaneous Float Init & Assign, Initializing Floats, Floating-point Functions 4720@comment node-name, next, previous, up 4721@section Assignment Functions 4722@cindex Float assignment functions 4723@cindex Assignment functions 4724 4725These functions assign new values to already initialized floats 4726(@pxref{Initializing Floats}). 

@deftypefun void mpf_set (mpf_t @var{rop}, const mpf_t @var{op})
@deftypefunx void mpf_set_ui (mpf_t @var{rop}, unsigned long int @var{op})
@deftypefunx void mpf_set_si (mpf_t @var{rop}, signed long int @var{op})
@deftypefunx void mpf_set_d (mpf_t @var{rop}, double @var{op})
@deftypefunx void mpf_set_z (mpf_t @var{rop}, const mpz_t @var{op})
@deftypefunx void mpf_set_q (mpf_t @var{rop}, const mpq_t @var{op})
Set the value of @var{rop} from @var{op}.
@end deftypefun

@deftypefun int mpf_set_str (mpf_t @var{rop}, const char *@var{str}, int @var{base})
Set the value of @var{rop} from the string in @var{str}. The string is of the
form @samp{M@@N} or, if the base is 10 or less, alternatively @samp{MeN}.
@samp{M} is the mantissa and @samp{N} is the exponent. The mantissa is always
in the specified base. The exponent is either in the specified base or, if
@var{base} is negative, in decimal. The decimal point expected is taken from
the current locale, on systems providing @code{localeconv}.

The argument @var{base} may be in the ranges 2 to 62, or @minus{}62 to
@minus{}2. Negative values are used to specify that the exponent is in
decimal.

For bases up to 36, case is ignored; upper-case and lower-case letters have
the same value; for bases 37 to 62, upper-case letters represent the usual
10..35 while lower-case letters represent 36..61.

Unlike the corresponding @code{mpz} function, the base will not be determined
from the leading characters of the string if @var{base} is 0. This is so that
numbers like @samp{0.23} are not interpreted as octal.

White space is allowed in the string, and is simply ignored. [This is not
really true; white-space is ignored in the beginning of the string and within
the mantissa, but not in other places, such as after a minus sign or in the
exponent.
We are considering changing the definition of this function, making 4761it fail when there is any white-space in the input, since that makes a lot of 4762sense. Please tell us your opinion about this change. Do you really want it 4763to accept @nicode{"3 14"} as meaning 314 as it does now?] 4764 4765This function returns 0 if the entire string is a valid number in base 4766@var{base}. Otherwise it returns @minus{}1. 4767@end deftypefun 4768 4769@deftypefun void mpf_swap (mpf_t @var{rop1}, mpf_t @var{rop2}) 4770Swap @var{rop1} and @var{rop2} efficiently. Both the values and the 4771precisions of the two variables are swapped. 4772@end deftypefun 4773 4774 4775@node Simultaneous Float Init & Assign, Converting Floats, Assigning Floats, Floating-point Functions 4776@comment node-name, next, previous, up 4777@section Combined Initialization and Assignment Functions 4778@cindex Float assignment functions 4779@cindex Assignment functions 4780@cindex Float initialization functions 4781@cindex Initialization functions 4782 4783For convenience, GMP provides a parallel series of initialize-and-set functions 4784which initialize the output and then store the value there. These functions' 4785names have the form @code{mpf_init_set@dots{}} 4786 4787Once the float has been initialized by any of the @code{mpf_init_set@dots{}} 4788functions, it can be used as the source or destination operand for the ordinary 4789float functions. Don't use an initialize-and-set function on a variable 4790already initialized! 4791 4792@deftypefun void mpf_init_set (mpf_t @var{rop}, const mpf_t @var{op}) 4793@deftypefunx void mpf_init_set_ui (mpf_t @var{rop}, unsigned long int @var{op}) 4794@deftypefunx void mpf_init_set_si (mpf_t @var{rop}, signed long int @var{op}) 4795@deftypefunx void mpf_init_set_d (mpf_t @var{rop}, double @var{op}) 4796Initialize @var{rop} and set its value from @var{op}. 
4797 4798The precision of @var{rop} will be taken from the active default precision, as 4799set by @code{mpf_set_default_prec}. 4800@end deftypefun 4801 4802@deftypefun int mpf_init_set_str (mpf_t @var{rop}, const char *@var{str}, int @var{base}) 4803Initialize @var{rop} and set its value from the string in @var{str}. See 4804@code{mpf_set_str} above for details on the assignment operation. 4805 4806Note that @var{rop} is initialized even if an error occurs. (I.e., you have to 4807call @code{mpf_clear} for it.) 4808 4809The precision of @var{rop} will be taken from the active default precision, as 4810set by @code{mpf_set_default_prec}. 4811@end deftypefun 4812 4813 4814@node Converting Floats, Float Arithmetic, Simultaneous Float Init & Assign, Floating-point Functions 4815@comment node-name, next, previous, up 4816@section Conversion Functions 4817@cindex Float conversion functions 4818@cindex Conversion functions 4819 4820@deftypefun double mpf_get_d (const mpf_t @var{op}) 4821Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding 4822towards zero). 4823 4824If the exponent in @var{op} is too big or too small to fit a @code{double} 4825then the result is system dependent. For too big an infinity is returned when 4826available. For too small @math{0.0} is normally returned. Hardware overflow, 4827underflow and denorm traps may or may not occur. 4828@end deftypefun 4829 4830@deftypefun double mpf_get_d_2exp (signed long int *@var{exp}, const mpf_t @var{op}) 4831Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding 4832towards zero), and with an exponent returned separately. 4833 4834The return value is in the range @math{0.5@le{}@GMPabs{@var{d}}<1} and the 4835exponent is stored to @code{*@var{exp}}. @m{@var{d} \times 2^{exp}, 4836@var{d} * 2^@var{exp}} is the (truncated) @var{op} value. If @var{op} is zero, 4837the return is @math{0.0} and 0 is stored to @code{*@var{exp}}. 
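
As a rough illustration of this return convention, the following portable C
sketch splits an ordinary @code{double} in the same way: a fraction @var{d}
with 0.5 <= |@var{d}| < 1, plus a power-of-two exponent. It does not call GMP.
The helper name @code{split_2exp} is made up for this example, and a finite
input is assumed.

```c
#include <assert.h>

/* Hypothetical helper, not part of GMP: split x into d * 2^(*exp) with
   0.5 <= |d| < 1, mirroring the convention described for mpf_get_d_2exp.
   Zero yields d = 0.0 and *exp = 0.  */
double
split_2exp (long *exp, double x)
{
  long e = 0;
  double d = x;

  assert (exp != 0);
  assert (x - x == 0.0);        /* finite input only (rejects inf and NaN) */

  if (d == 0.0)
    {
      *exp = 0;
      return 0.0;
    }
  /* Scaling by 2 is exact for normal doubles, so nothing is rounded.  */
  while (d >= 1.0 || d <= -1.0)
    {
      d /= 2.0;
      e++;
    }
  while (d < 0.5 && d > -0.5)
    {
      d *= 2.0;
      e--;
    }
  *exp = e;
  return d;
}
```

In this model 3.0 comes back as 0.75 with exponent 2, since 0.75 * 2^2 = 3.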
4838 4839@cindex @code{frexp} 4840This is similar to the standard C @code{frexp} function (@pxref{Normalization 4841Functions,,, libc, The GNU C Library Reference Manual}). 4842@end deftypefun 4843 4844@deftypefun long mpf_get_si (const mpf_t @var{op}) 4845@deftypefunx {unsigned long} mpf_get_ui (const mpf_t @var{op}) 4846Convert @var{op} to a @code{long} or @code{unsigned long}, truncating any 4847fraction part. If @var{op} is too big for the return type, the result is 4848undefined. 4849 4850See also @code{mpf_fits_slong_p} and @code{mpf_fits_ulong_p} 4851(@pxref{Miscellaneous Float Functions}). 4852@end deftypefun 4853 4854@deftypefun {char *} mpf_get_str (char *@var{str}, mp_exp_t *@var{expptr}, int @var{base}, size_t @var{n_digits}, const mpf_t @var{op}) 4855Convert @var{op} to a string of digits in base @var{base}. The base argument 4856may vary from 2 to 62 or from @minus{}2 to @minus{}36. Up to @var{n_digits} 4857digits will be generated. Trailing zeros are not returned. No more digits 4858than can be accurately represented by @var{op} are ever generated. If 4859@var{n_digits} is 0 then that accurate maximum number of digits are generated. 4860 4861For @var{base} in the range 2..36, digits and lower-case letters are used; for 4862@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62, 4863digits, upper-case letters, and lower-case letters (in that significance order) 4864are used. 4865 4866If @var{str} is @code{NULL}, the result string is allocated using the current 4867allocation function (@pxref{Custom Allocation}). The block will be 4868@code{strlen(str)+1} bytes, that being exactly enough for the string and 4869null-terminator. 4870 4871If @var{str} is not @code{NULL}, it should point to a block of 4872@math{@var{n_digits} + 2} bytes, that being enough for the mantissa, a 4873possible minus sign, and a null-terminator. 
When @var{n_digits} is 0 to get 4874all significant digits, an application won't be able to know the space 4875required, and @var{str} should be @code{NULL} in that case. 4876 4877The generated string is a fraction, with an implicit radix point immediately 4878to the left of the first digit. The applicable exponent is written through 4879the @var{expptr} pointer. For example, the number 3.1416 would be returned as 4880string @nicode{"31416"} and exponent 1. 4881 4882When @var{op} is zero, an empty string is produced and the exponent returned 4883is 0. 4884 4885A pointer to the result string is returned, being either the allocated block 4886or the given @var{str}. 4887@end deftypefun 4888 4889 4890@node Float Arithmetic, Float Comparison, Converting Floats, Floating-point Functions 4891@comment node-name, next, previous, up 4892@section Arithmetic Functions 4893@cindex Float arithmetic functions 4894@cindex Arithmetic functions 4895 4896@deftypefun void mpf_add (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2}) 4897@deftypefunx void mpf_add_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2}) 4898Set @var{rop} to @math{@var{op1} + @var{op2}}. 4899@end deftypefun 4900 4901@deftypefun void mpf_sub (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2}) 4902@deftypefunx void mpf_ui_sub (mpf_t @var{rop}, unsigned long int @var{op1}, const mpf_t @var{op2}) 4903@deftypefunx void mpf_sub_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2}) 4904Set @var{rop} to @var{op1} @minus{} @var{op2}. 4905@end deftypefun 4906 4907@deftypefun void mpf_mul (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2}) 4908@deftypefunx void mpf_mul_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2}) 4909Set @var{rop} to @math{@var{op1} @GMPtimes{} @var{op2}}. 
4910@end deftypefun 4911 4912Division is undefined if the divisor is zero, and passing a zero divisor to the 4913divide functions will make these functions intentionally divide by zero. This 4914lets the user handle arithmetic exceptions in these functions in the same 4915manner as other arithmetic exceptions. 4916 4917@deftypefun void mpf_div (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2}) 4918@deftypefunx void mpf_ui_div (mpf_t @var{rop}, unsigned long int @var{op1}, const mpf_t @var{op2}) 4919@deftypefunx void mpf_div_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2}) 4920@cindex Division functions 4921Set @var{rop} to @var{op1}/@var{op2}. 4922@end deftypefun 4923 4924@deftypefun void mpf_sqrt (mpf_t @var{rop}, const mpf_t @var{op}) 4925@deftypefunx void mpf_sqrt_ui (mpf_t @var{rop}, unsigned long int @var{op}) 4926@cindex Root extraction functions 4927Set @var{rop} to @m{\sqrt{@var{op}}, the square root of @var{op}}. 4928@end deftypefun 4929 4930@deftypefun void mpf_pow_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2}) 4931@cindex Exponentiation functions 4932@cindex Powering functions 4933Set @var{rop} to @m{@var{op1}^{op2}, @var{op1} raised to the power @var{op2}}. 4934@end deftypefun 4935 4936@deftypefun void mpf_neg (mpf_t @var{rop}, const mpf_t @var{op}) 4937Set @var{rop} to @minus{}@var{op}. 4938@end deftypefun 4939 4940@deftypefun void mpf_abs (mpf_t @var{rop}, const mpf_t @var{op}) 4941Set @var{rop} to the absolute value of @var{op}. 4942@end deftypefun 4943 4944@deftypefun void mpf_mul_2exp (mpf_t @var{rop}, const mpf_t @var{op1}, mp_bitcnt_t @var{op2}) 4945Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to 4946@var{op2}}. 4947@end deftypefun 4948 4949@deftypefun void mpf_div_2exp (mpf_t @var{rop}, const mpf_t @var{op1}, mp_bitcnt_t @var{op2}) 4950Set @var{rop} to @m{@var{op1}/2^{op2}, @var{op1} divided by 2 raised to 4951@var{op2}}. 
@end deftypefun

@node Float Comparison, I/O of Floats, Float Arithmetic, Floating-point Functions
@comment node-name, next, previous, up
@section Comparison Functions
@cindex Float comparison functions
@cindex Comparison functions

@deftypefun int mpf_cmp (const mpf_t @var{op1}, const mpf_t @var{op2})
@deftypefunx int mpf_cmp_d (const mpf_t @var{op1}, double @var{op2})
@deftypefunx int mpf_cmp_ui (const mpf_t @var{op1}, unsigned long int @var{op2})
@deftypefunx int mpf_cmp_si (const mpf_t @var{op1}, signed long int @var{op2})
Compare @var{op1} and @var{op2}. Return a positive value if @math{@var{op1} >
@var{op2}}, zero if @math{@var{op1} = @var{op2}}, and a negative value if
@math{@var{op1} < @var{op2}}.

@code{mpf_cmp_d} can be called with an infinity, but results are undefined for
a NaN.
@end deftypefun

@deftypefun int mpf_eq (const mpf_t @var{op1}, const mpf_t @var{op2}, mp_bitcnt_t @var{op3})
Return non-zero if the first @var{op3} bits of @var{op1} and @var{op2} are
equal, zero otherwise. I.e., test if @var{op1} and @var{op2} are approximately
equal.

Caution 1: All versions of GMP up to version 4.2.4 compared just whole limbs,
meaning sometimes more than @var{op3} bits, sometimes fewer.

Caution 2: This function will consider XXX11...111 and XX100...000 different,
even if ... is replaced by a semi-infinite number of bits. Such numbers are
really just one ulp off, and should be considered equal.
@end deftypefun

@deftypefun void mpf_reldiff (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
Compute the relative difference between @var{op1} and @var{op2} and store the
result in @var{rop}. This is @math{@GMPabs{@var{op1}-@var{op2}}/@var{op1}}.
@end deftypefun

@deftypefn Macro int mpf_sgn (const mpf_t @var{op})
@cindex Sign tests
@cindex Float sign tests
Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
@math{-1} if @math{@var{op} < 0}.

This function is actually implemented as a macro. It evaluates its argument
multiple times.
@end deftypefn

@node I/O of Floats, Miscellaneous Float Functions, Float Comparison, Floating-point Functions
@comment node-name, next, previous, up
@section Input and Output Functions
@cindex Float input and output functions
@cindex Input functions
@cindex Output functions
@cindex I/O functions

The functions in this section read @code{mpf} numbers from a stdio stream or
write them to one. Passing a @code{NULL} pointer for a @var{stream} argument
makes the input functions read from @code{stdin} and the output functions
write to @code{stdout}.

When using any of these functions, it is a good idea to include @file{stdio.h}
before @file{gmp.h}, since that will allow @file{gmp.h} to define prototypes
for these functions.

See also @ref{Formatted Output} and @ref{Formatted Input}.

@deftypefun size_t mpf_out_str (FILE *@var{stream}, int @var{base}, size_t @var{n_digits}, const mpf_t @var{op})
Print @var{op} to @var{stream}, as a string of digits. Return the number of
bytes written, or if an error occurred, return 0.

The mantissa is prefixed with a @samp{0.} and is in the given @var{base},
which may vary from 2 to 62 or from @minus{}2 to @minus{}36. An exponent is
then printed, separated by an @samp{e}, or if the base is greater than 10 then
by an @samp{@@}. The exponent is always in decimal. The decimal point follows
the current locale, on systems providing @code{localeconv}.
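
To make this layout concrete, here is a small sketch that assembles text of
that shape from a digit string and a separate exponent, the form in which
@code{mpf_get_str} delivers a number. It is an illustration only and does not
call GMP. The name @code{format_mantissa_exp} is hypothetical, the sketch
always uses @samp{.} instead of the locale's decimal point, it applies the
"base greater than 10" rule literally, and it does not model the output for
zero.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GMP.  Builds "0.<digits>e<exp>", or
   "0.<digits>@<exp>" when base is greater than 10, in buf.  A leading
   '-' in digits is moved in front of the "0.".  The exponent is printed
   in decimal.  Returns the snprintf result: the length needed, or a
   negative value on error.  */
int
format_mantissa_exp (char *buf, size_t size, const char *digits,
                     long exp, int base)
{
  const char *sign = "";

  assert (digits != NULL);
  if (digits[0] == '-')
    {
      sign = "-";
      digits++;
    }
  assert (strlen (digits) > 0);   /* the empty string for zero is not modelled */

  return snprintf (buf, size, "%s0.%s%c%ld", sign, digits,
                   base > 10 ? '@' : 'e', exp);
}
```

With the digits @nicode{"31416"}, exponent 1 and base 10 the sketch produces
@nicode{"0.31416e1"}, which reads as 0.31416 times 10 to the power 1.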

For @var{base} in the range 2..36, digits and lower-case letters are used; for
@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
digits, upper-case letters, and lower-case letters (in that significance order)
are used.

Up to @var{n_digits} digits will be printed from the mantissa, except that no
more digits than are accurately representable by @var{op} will be printed.
@var{n_digits} can be 0 to select that accurate maximum.
@end deftypefun

@deftypefun size_t mpf_inp_str (mpf_t @var{rop}, FILE *@var{stream}, int @var{base})
Read a string in base @var{base} from @var{stream}, and put the read float in
@var{rop}. The string is of the form @samp{M@@N} or, if the base is 10 or
less, alternatively @samp{MeN}. @samp{M} is the mantissa and @samp{N} is the
exponent. The mantissa is always in the specified base. The exponent is
either in the specified base or, if @var{base} is negative, in decimal. The
decimal point expected is taken from the current locale, on systems providing
@code{localeconv}.

The argument @var{base} may be in the ranges 2 to 36, or @minus{}36 to
@minus{}2. Negative values are used to specify that the exponent is in
decimal.

Unlike the corresponding @code{mpz} function, the base will not be determined
from the leading characters of the string if @var{base} is 0. This is so that
numbers like @samp{0.23} are not interpreted as octal.

Return the number of bytes read, or if an error occurred, return 0.
@end deftypefun

@c @deftypefun void mpf_out_raw (FILE *@var{stream}, const mpf_t @var{float})
@c Output @var{float} on stdio stream @var{stream}, in raw binary
@c format. The float is written in a portable format, with 4 bytes of
@c size information, and that many bytes of limbs. Both the size and the
@c limbs are written in decreasing significance order.
5064@c @end deftypefun 5065 5066@c @deftypefun void mpf_inp_raw (mpf_t @var{float}, FILE *@var{stream}) 5067@c Input from stdio stream @var{stream} in the format written by 5068@c @code{mpf_out_raw}, and put the result in @var{float}. 5069@c @end deftypefun 5070 5071 5072@node Miscellaneous Float Functions, , I/O of Floats, Floating-point Functions 5073@comment node-name, next, previous, up 5074@section Miscellaneous Functions 5075@cindex Miscellaneous float functions 5076@cindex Float miscellaneous functions 5077 5078@deftypefun void mpf_ceil (mpf_t @var{rop}, const mpf_t @var{op}) 5079@deftypefunx void mpf_floor (mpf_t @var{rop}, const mpf_t @var{op}) 5080@deftypefunx void mpf_trunc (mpf_t @var{rop}, const mpf_t @var{op}) 5081@cindex Rounding functions 5082@cindex Float rounding functions 5083Set @var{rop} to @var{op} rounded to an integer. @code{mpf_ceil} rounds to the 5084next higher integer, @code{mpf_floor} to the next lower, and @code{mpf_trunc} 5085to the integer towards zero. 5086@end deftypefun 5087 5088@deftypefun int mpf_integer_p (const mpf_t @var{op}) 5089Return non-zero if @var{op} is an integer. 5090@end deftypefun 5091 5092@deftypefun int mpf_fits_ulong_p (const mpf_t @var{op}) 5093@deftypefunx int mpf_fits_slong_p (const mpf_t @var{op}) 5094@deftypefunx int mpf_fits_uint_p (const mpf_t @var{op}) 5095@deftypefunx int mpf_fits_sint_p (const mpf_t @var{op}) 5096@deftypefunx int mpf_fits_ushort_p (const mpf_t @var{op}) 5097@deftypefunx int mpf_fits_sshort_p (const mpf_t @var{op}) 5098Return non-zero if @var{op} would fit in the respective C data type, when 5099truncated to an integer. 
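
Read literally, "when truncated to an integer" means that the fraction part is
dropped before the range check. The sketch below models that reading with a
@code{double} standing in for @var{op}. It makes no GMP calls, the name
@code{fits_ushort_when_truncated} is hypothetical, and it shows the wording
above rather than the behaviour of any particular GMP release.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model, not part of GMP: would x fit in an unsigned short
   once its fraction part is discarded (rounding towards zero)?  The
   truncated value lies in 0..USHRT_MAX exactly when
   -1 < x < USHRT_MAX + 1.  Returns non-zero if it fits.  */
int
fits_ushort_when_truncated (double x)
{
  assert (x == x);              /* NaN is not modelled */
  return x > -1.0 && x < (double) USHRT_MAX + 1.0;
}
```

Under this reading a value half a unit above @code{USHRT_MAX} still fits,
because it truncates to @code{USHRT_MAX}, while @code{USHRT_MAX} plus one
does not.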
@end deftypefun

@deftypefun void mpf_urandomb (mpf_t @var{rop}, gmp_randstate_t @var{state}, mp_bitcnt_t @var{nbits})
@cindex Random number functions
@cindex Float random number functions
Generate a uniformly distributed random float in @var{rop}, such that @math{0
@le{} @var{rop} < 1}, with @var{nbits} significant bits in the mantissa or
fewer if the precision of @var{rop} is smaller.

The variable @var{state} must be initialized by calling one of the
@code{gmp_randinit} functions (@pxref{Random State Initialization}) before
invoking this function.
@end deftypefun

@deftypefun void mpf_random2 (mpf_t @var{rop}, mp_size_t @var{max_size}, mp_exp_t @var{exp})
Generate a random float of at most @var{max_size} limbs, with long strings of
zeros and ones in the binary representation. The exponent of the number is in
the interval @minus{}@var{exp} to @var{exp} (in limbs). This function is
useful for testing functions and algorithms, since this kind of random
number has proven to be more likely to trigger corner-case bugs. Negative
random numbers are generated when @var{max_size} is negative.
@end deftypefun

@c @deftypefun size_t mpf_size (const mpf_t @var{op})
@c Return the size of @var{op} measured in number of limbs. If @var{op} is
@c zero, the returned value will be zero. (@xref{Nomenclature}, for an
@c explanation of the concept @dfn{limb}.)
@c
@c @strong{This function is obsolete. It will disappear from future GMP
@c releases.}
@c @end deftypefun


@node Low-level Functions, Random Number Functions, Floating-point Functions, Top
@comment node-name, next, previous, up
@chapter Low-level Functions
@cindex Low-level functions

This chapter describes low-level GMP functions, used to implement the
high-level GMP functions, but also intended for time-critical user code.

These functions start with the prefix @code{mpn_}.

@c 1. Some of these functions clobber input operands.
@c

The @code{mpn} functions are designed to be as fast as possible, @strong{not}
to provide a coherent calling interface. The different functions have somewhat
similar interfaces, but there are variations that make them hard to use. These
functions do as little as possible apart from the real multiple precision
computation, so that no time is spent on things that not all callers need.

A source operand is specified by a pointer to the least significant limb and a
limb count. A destination operand is specified by just a pointer. It is the
responsibility of the caller to ensure that the destination has enough space
for storing the result.

With this way of specifying operands, it is possible to perform computations on
subranges of an argument, and store the result into a subrange of a
destination.

A common requirement for all functions is that each source area needs at least
one limb. No size argument may be zero. Unless otherwise stated, in-place
operations are allowed where source and destination are the same, but not where
they only partly overlap.

The @code{mpn} functions are the base for the implementation of the
@code{mpz_}, @code{mpf_}, and @code{mpq_} functions.

This example adds the number beginning at @var{s1p} and the number beginning at
@var{s2p} and writes the sum at @var{destp}. All areas have @var{n} limbs.

@example
cy = mpn_add_n (destp, s1p, s2p, n);
@end example

It should be noted that the @code{mpn} functions make no attempt to identify
high or low zero limbs on their operands, or other special forms. On random
data such cases will be unlikely and it'd be wasteful for every function to
check every time.
An application knowing something about its data can take 5180steps to trim or perhaps split its calculations. 5181@c 5182@c For reference, within gmp mpz_t operands never have high zero limbs, and 5183@c we rate low zero limbs as unlikely too (or something an application should 5184@c handle). This is a prime motivation for not stripping zero limbs in say 5185@c mpn_mul_n etc. 5186@c 5187@c Other applications doing variable-length calculations will quite likely do 5188@c something similar to mpz. And even if not then it's highly likely zero 5189@c limb stripping can be done at just a few judicious points, which will be 5190@c more efficient than having lots of mpn functions checking every time. 5191 5192@sp 1 5193@noindent 5194In the notation used below, a source operand is identified by the pointer to 5195the least significant limb, and the limb count in braces. For example, 5196@{@var{s1p}, @var{s1n}@}. 5197 5198@deftypefun mp_limb_t mpn_add_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n}) 5199Add @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@}, and write the @var{n} 5200least significant limbs of the result to @var{rp}. Return carry, either 0 or 52011. 5202 5203This is the lowest-level function for addition. It is the preferred function 5204for addition, since it is written in assembly for most CPUs. For addition of 5205a variable to itself (i.e., @var{s1p} equals @var{s2p}) use @code{mpn_lshift} 5206with a count of 1 for optimal speed. 5207@end deftypefun 5208 5209@deftypefun mp_limb_t mpn_add_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb}) 5210Add @{@var{s1p}, @var{n}@} and @var{s2limb}, and write the @var{n} least 5211significant limbs of the result to @var{rp}. Return carry, either 0 or 1. 
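
The carry convention of these addition functions can be modelled in a few
lines of portable C. This sketch is not GMP code: it uses @code{uint64_t} as a
stand-in for @code{mp_limb_t} (an assumption; the real limb size depends on
the build), and the names @code{model_add_n} and @code{model_add_1} are
hypothetical. Limbs are stored least significant first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;        /* stand-in for mp_limb_t, an assumption */

/* Model of an n-limb addition: rp = s1p + s2p, returning the carry out of
   the top limb (0 or 1).  Each source limb is read before rp[i] is
   written, so rp may be the same array as s1p or s2p.  */
limb_t
model_add_n (limb_t *rp, const limb_t *s1p, const limb_t *s2p, size_t n)
{
  limb_t cy = 0;
  size_t i;

  assert (n >= 1);
  for (i = 0; i < n; i++)
    {
      limb_t a = s1p[i];
      limb_t sum = a + s2p[i];  /* wraps modulo 2^64 */
      limb_t c1 = sum < a;      /* carry from adding the two limbs */
      limb_t res = sum + cy;
      limb_t c2 = res < sum;    /* carry from adding the incoming carry */

      rp[i] = res;
      cy = c1 | c2;             /* at most one of the two can be set */
    }
  return cy;
}

/* Model of adding a single limb to an n-limb number.  */
limb_t
model_add_1 (limb_t *rp, const limb_t *s1p, size_t n, limb_t s2limb)
{
  limb_t cy = s2limb;
  size_t i;

  assert (n >= 1);
  for (i = 0; i < n; i++)
    {
      limb_t res = s1p[i] + cy;

      cy = res < cy;            /* 1 if the addition wrapped, else 0 */
      rp[i] = res;
    }
  return cy;
}
```

For instance, adding 1 to a two-limb operand whose limbs are all ones leaves
both result limbs zero and returns a carry of 1.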
5212@end deftypefun 5213 5214@deftypefun mp_limb_t mpn_add (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n}) 5215Add @{@var{s1p}, @var{s1n}@} and @{@var{s2p}, @var{s2n}@}, and write the 5216@var{s1n} least significant limbs of the result to @var{rp}. Return carry, 5217either 0 or 1. 5218 5219This function requires that @var{s1n} is greater than or equal to @var{s2n}. 5220@end deftypefun 5221 5222@deftypefun mp_limb_t mpn_sub_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n}) 5223Subtract @{@var{s2p}, @var{n}@} from @{@var{s1p}, @var{n}@}, and write the 5224@var{n} least significant limbs of the result to @var{rp}. Return borrow, 5225either 0 or 1. 5226 5227This is the lowest-level function for subtraction. It is the preferred 5228function for subtraction, since it is written in assembly for most CPUs. 5229@end deftypefun 5230 5231@deftypefun mp_limb_t mpn_sub_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb}) 5232Subtract @var{s2limb} from @{@var{s1p}, @var{n}@}, and write the @var{n} least 5233significant limbs of the result to @var{rp}. Return borrow, either 0 or 1. 5234@end deftypefun 5235 5236@deftypefun mp_limb_t mpn_sub (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n}) 5237Subtract @{@var{s2p}, @var{s2n}@} from @{@var{s1p}, @var{s1n}@}, and write the 5238@var{s1n} least significant limbs of the result to @var{rp}. Return borrow, 5239either 0 or 1. 5240 5241This function requires that @var{s1n} is greater than or equal to 5242@var{s2n}. 5243@end deftypefun 5244 5245@deftypefun mp_limb_t mpn_neg (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}) 5246Perform the negation of @{@var{sp}, @var{n}@}, and write the result to 5247@{@var{rp}, @var{n}@}. 
This is equivalent to calling @code{mpn_sub_n} with an
@var{n}-limb zero minuend and passing @{@var{sp}, @var{n}@} as subtrahend.
Return borrow, either 0 or 1.
@end deftypefun

@deftypefun void mpn_mul_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Multiply @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@}, and write the
2*@var{n}-limb result to @var{rp}.

The destination has to have space for 2*@var{n} limbs, even if the product's
most significant limb is zero. No overlap is permitted between the
destination and either source.

If the two input operands are the same, use @code{mpn_sqr}.
@end deftypefun

@deftypefun mp_limb_t mpn_mul (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
Multiply @{@var{s1p}, @var{s1n}@} and @{@var{s2p}, @var{s2n}@}, and write the
(@var{s1n}+@var{s2n})-limb result to @var{rp}. Return the most significant
limb of the result.

The destination has to have space for @var{s1n} + @var{s2n} limbs, even if the
product's most significant limb is zero. No overlap is permitted between the
destination and either source.

This function requires that @var{s1n} is greater than or equal to @var{s2n}.
@end deftypefun

@deftypefun void mpn_sqr (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n})
Compute the square of @{@var{s1p}, @var{n}@} and write the 2*@var{n}-limb
result to @var{rp}.

The destination has to have space for 2*@var{n} limbs, even if the result's
most significant limb is zero. No overlap is permitted between the
destination and the source.
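
A schoolbook multiplication shows why the destination needs 2*@var{n} limbs
and why it must not overlap a source: every row of partial products updates
destination limbs while the source limbs are still needed for later rows. The
sketch below is a portable model, not GMP's implementation, which may use
entirely different algorithms. It assumes 32-bit limbs so that each partial
product fits in a @code{uint64_t}, and @code{model_mul_n} is a hypothetical
name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model only: rp[0..2n-1] = s1p[0..n-1] * s2p[0..n-1], limbs stored least
   significant first.  rp must not overlap s1p or s2p.  */
void
model_mul_n (uint32_t *rp, const uint32_t *s1p, const uint32_t *s2p,
             size_t n)
{
  size_t i, j;

  assert (n >= 1);
  for (i = 0; i < 2 * n; i++)
    rp[i] = 0;

  for (j = 0; j < n; j++)
    {
      uint64_t carry = 0;

      /* Add s1p * s2p[j], shifted up by j limbs, into the destination.  */
      for (i = 0; i < n; i++)
        {
          /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1.  */
          uint64_t t = (uint64_t) s1p[i] * s2p[j] + rp[i + j] + carry;

          rp[i + j] = (uint32_t) t;
          carry = t >> 32;
        }
      rp[j + n] = (uint32_t) carry;     /* this limb was still zero */
    }
}
```

Squaring the two-limb value with all bits set fills all four destination
limbs, whereas 3 times 5 leaves the high limb zero; the destination has to be
sized for the first case.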
5282@end deftypefun 5283 5284@deftypefun mp_limb_t mpn_mul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb}) 5285Multiply @{@var{s1p}, @var{n}@} by @var{s2limb}, and write the @var{n} least 5286significant limbs of the product to @var{rp}. Return the most significant 5287limb of the product. @{@var{s1p}, @var{n}@} and @{@var{rp}, @var{n}@} are 5288allowed to overlap provided @math{@var{rp} @le{} @var{s1p}}. 5289 5290This is a low-level function that is a building block for general 5291multiplication as well as other operations in GMP@. It is written in assembly 5292for most CPUs. 5293 5294Don't call this function if @var{s2limb} is a power of 2; use @code{mpn_lshift} 5295with a count equal to the logarithm of @var{s2limb} instead, for optimal speed. 5296@end deftypefun 5297 5298@deftypefun mp_limb_t mpn_addmul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb}) 5299Multiply @{@var{s1p}, @var{n}@} and @var{s2limb}, and add the @var{n} least 5300significant limbs of the product to @{@var{rp}, @var{n}@} and write the result 5301to @var{rp}. Return the most significant limb of the product, plus carry-out 5302from the addition. 5303 5304This is a low-level function that is a building block for general 5305multiplication as well as other operations in GMP@. It is written in assembly 5306for most CPUs. 5307@end deftypefun 5308 5309@deftypefun mp_limb_t mpn_submul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb}) 5310Multiply @{@var{s1p}, @var{n}@} and @var{s2limb}, and subtract the @var{n} 5311least significant limbs of the product from @{@var{rp}, @var{n}@} and write the 5312result to @var{rp}. Return the most significant limb of the product, plus 5313borrow-out from the subtraction. 5314 5315This is a low-level function that is a building block for general 5316multiplication and division as well as other operations in GMP@. 
It is written 5317in assembly for most CPUs. 5318@end deftypefun 5319 5320@deftypefun void mpn_tdiv_qr (mp_limb_t *@var{qp}, mp_limb_t *@var{rp}, mp_size_t @var{qxn}, const mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn}) 5321Divide @{@var{np}, @var{nn}@} by @{@var{dp}, @var{dn}@} and put the quotient 5322at @{@var{qp}, @var{nn}@minus{}@var{dn}+1@} and the remainder at @{@var{rp}, 5323@var{dn}@}. The quotient is rounded towards 0. 5324 5325No overlap is permitted between arguments, except that @var{np} might equal 5326@var{rp}. The dividend size @var{nn} must be greater than or equal to divisor 5327size @var{dn}. The most significant limb of the divisor must be non-zero. The 5328@var{qxn} operand must be zero. 5329@end deftypefun 5330 5331@deftypefun mp_limb_t mpn_divrem (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n}) 5332[This function is obsolete. Please call @code{mpn_tdiv_qr} instead for best 5333performance.] 5334 5335Divide @{@var{rs2p}, @var{rs2n}@} by @{@var{s3p}, @var{s3n}@}, and write the 5336quotient at @var{r1p}, with the exception of the most significant limb, which 5337is returned. The remainder replaces the dividend at @var{rs2p}; it will be 5338@var{s3n} limbs long (i.e., as many limbs as the divisor). 5339 5340In addition to an integer quotient, @var{qxn} fraction limbs are developed, and 5341stored after the integral limbs. For most usages, @var{qxn} will be zero. 5342 5343It is required that @var{rs2n} is greater than or equal to @var{s3n}. It is 5344required that the most significant bit of the divisor is set. 5345 5346If the quotient is not needed, pass @var{rs2p} + @var{s3n} as @var{r1p}. Aside 5347from that special case, no overlap between arguments is permitted. 5348 5349Return the most significant limb of the quotient, either 0 or 1. 

The area at @var{r1p} needs to be @var{rs2n} @minus{} @var{s3n} + @var{qxn}
limbs large.
@end deftypefun

@deftypefn Function mp_limb_t mpn_divrem_1 (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, @w{mp_limb_t *@var{s2p}}, mp_size_t @var{s2n}, mp_limb_t @var{s3limb})
@deftypefnx Macro mp_limb_t mpn_divmod_1 (mp_limb_t *@var{r1p}, mp_limb_t *@var{s2p}, @w{mp_size_t @var{s2n}}, @w{mp_limb_t @var{s3limb}})
Divide @{@var{s2p}, @var{s2n}@} by @var{s3limb}, and write the quotient at
@var{r1p}. Return the remainder.

The integer quotient is written to @{@var{r1p}+@var{qxn}, @var{s2n}@} and in
addition @var{qxn} fraction limbs are developed and written to @{@var{r1p},
@var{qxn}@}. Either or both @var{s2n} and @var{qxn} can be zero. For most
usages, @var{qxn} will be zero.

@code{mpn_divmod_1} exists for upward source compatibility and is simply a
macro calling @code{mpn_divrem_1} with a @var{qxn} of 0.

The areas at @var{r1p} and @var{s2p} have to be identical or completely
separate, not partially overlapping.
@end deftypefn

@deftypefun mp_limb_t mpn_divmod (mp_limb_t *@var{r1p}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n})
[This function is obsolete. Please call @code{mpn_tdiv_qr} instead for best
performance.]
@end deftypefun

@deftypefn Macro mp_limb_t mpn_divexact_by3 (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}})
@deftypefnx Function mp_limb_t mpn_divexact_by3c (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}}, mp_limb_t @var{carry})
Divide @{@var{sp}, @var{n}@} by 3, expecting it to divide exactly, and writing
the result to @{@var{rp}, @var{n}@}. If 3 divides exactly, the return value is
zero and the result is the quotient. If not, the return value is non-zero and
the result won't be anything useful.
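
As a minimal sketch of using the return value as an exactness test, the
following wrapper may help. The helper name @code{div3_if_exact} is
hypothetical and not part of GMP.

@example
#include <gmp.h>

/* Hypothetical helper: write @{sp, n@} / 3 to @{rp, n@} and return 1 when
   3 divides exactly; return 0 otherwise, in which case @{rp, n@} holds
   nothing useful.  */
int
div3_if_exact (mp_limb_t *rp, mp_limb_t *sp, mp_size_t n)
@{
  return mpn_divexact_by3 (rp, sp, n) == 0;
@}
@end example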

@code{mpn_divexact_by3c} takes an initial carry parameter, which can be the
return value from a previous call, so a large calculation can be done piece by
piece from low to high. @code{mpn_divexact_by3} is simply a macro calling
@code{mpn_divexact_by3c} with a 0 carry parameter.

These routines use a multiply-by-inverse and will be faster than
@code{mpn_divrem_1} on CPUs with fast multiplication but slow division.

The source @math{a}, result @math{q}, size @math{n}, initial carry @math{i},
and return value @math{c} satisfy @m{cb^n+a-i=3q, c*b^n + a-i = 3*q}, where
@m{b=2\GMPraise{@code{GMP\_NUMB\_BITS}}, b=2^GMP_NUMB_BITS}. The
return @math{c} is always 0, 1 or 2, and the initial carry @math{i} must also
be 0, 1 or 2 (these are both borrows really). When @math{c=0} clearly
@math{q=(a-i)/3}. When @m{c \neq 0, c!=0}, the remainder @math{(a-i) @bmod{}
3} is given by @math{3-c}, because @math{b @equiv{} 1 @bmod{} 3} (when
@code{mp_bits_per_limb} is even, which is always so currently).
@end deftypefn

@deftypefun mp_limb_t mpn_mod_1 (const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t @var{s2limb})
Divide @{@var{s1p}, @var{s1n}@} by @var{s2limb}, and return the remainder.
@var{s1n} can be zero.
@end deftypefun

@deftypefun mp_limb_t mpn_lshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
Shift @{@var{sp}, @var{n}@} left by @var{count} bits, and write the result to
@{@var{rp}, @var{n}@}. The bits shifted out at the left are returned in the
least significant @var{count} bits of the return value (the rest of the return
value is zero).

@var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1. The
regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
@math{@var{rp} @ge{} @var{sp}}.

This function is written in assembly for most CPUs.
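
For instance, an in-place multiplication by a small power of 2 can be sketched
as below. The helper name @code{mul_2exp_small} is hypothetical, and the
sketch assumes a shift count @var{k} in the range 1 to
@nicode{mp_bits_per_limb}@minus{}1, as required above.

@example
#include <gmp.h>

/* Hypothetical helper: multiply @{p, n@} by 2^k in place and return the
   k bits shifted out at the top.  Using the same pointer for source and
   destination satisfies the overlap rule rp >= sp.  */
mp_limb_t
mul_2exp_small (mp_limb_t *p, mp_size_t n, unsigned int k)
@{
  return mpn_lshift (p, p, n, k);
@}
@end example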
@end deftypefun

@deftypefun mp_limb_t mpn_rshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
Shift @{@var{sp}, @var{n}@} right by @var{count} bits, and write the result to
@{@var{rp}, @var{n}@}. The bits shifted out at the right are returned in the
most significant @var{count} bits of the return value (the rest of the return
value is zero).

@var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1. The
regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
@math{@var{rp} @le{} @var{sp}}.

This function is written in assembly for most CPUs.
@end deftypefun

@deftypefun int mpn_cmp (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Compare @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@} and return a
positive value if @math{@var{s1} > @var{s2}}, 0 if they are equal, or a
negative value if @math{@var{s1} < @var{s2}}.
@end deftypefun

@deftypefun mp_size_t mpn_gcd (mp_limb_t *@var{rp}, mp_limb_t *@var{xp}, mp_size_t @var{xn}, mp_limb_t *@var{yp}, mp_size_t @var{yn})
Set @{@var{rp}, @var{retval}@} to the greatest common divisor of @{@var{xp},
@var{xn}@} and @{@var{yp}, @var{yn}@}. The result can be up to @var{yn} limbs;
the return value is the actual number produced. Both source operands are
destroyed.

It is required that @math{@var{xn} @ge @var{yn} > 0}, and the most significant
limb of @{@var{yp}, @var{yn}@} must be non-zero. No overlap is permitted
between @{@var{xp}, @var{xn}@} and @{@var{yp}, @var{yn}@}.
@end deftypefun

@deftypefun mp_limb_t mpn_gcd_1 (const mp_limb_t *@var{xp}, mp_size_t @var{xn}, mp_limb_t @var{ylimb})
Return the greatest common divisor of @{@var{xp}, @var{xn}@} and @var{ylimb}.
Both operands must be non-zero.
@end deftypefun

@deftypefun mp_size_t mpn_gcdext (mp_limb_t *@var{gp}, mp_limb_t *@var{sp}, mp_size_t *@var{sn}, mp_limb_t *@var{up}, mp_size_t @var{un}, mp_limb_t *@var{vp}, mp_size_t @var{vn})
Let @m{U,@var{U}} be defined by @{@var{up}, @var{un}@} and let @m{V,@var{V}} be
defined by @{@var{vp}, @var{vn}@}.

Compute the greatest common divisor @math{G} of @math{U} and @math{V}. Compute
a cofactor @math{S} such that @math{G = US + VT}. The second cofactor @var{T}
is not computed but can easily be obtained from @m{(G - US) / V, (@var{G} -
@var{U}*@var{S}) / @var{V}} (the division will be exact). It is required that
@math{@var{un} @ge @var{vn} > 0}, and the most significant
limb of @{@var{vp}, @var{vn}@} must be non-zero.

@math{S} satisfies @math{S = 1} or @math{@GMPabs{S} < V / (2 G)}. @math{S =
0} if and only if @math{V} divides @math{U} (i.e., @math{G = V}).

Store @math{G} at @var{gp} and let the return value define its limb count.
Store @math{S} at @var{sp} and let |*@var{sn}| define its limb count. @math{S}
can be negative; when this happens *@var{sn} will be negative. The area at
@var{gp} should have room for @var{vn} limbs and the area at @var{sp} should
have room for @math{@var{vn}+1} limbs.

Both source operands are destroyed.

Compatibility notes: GMP 4.3.0 and 4.3.1 defined @math{S} less strictly.
Earlier as well as later GMP releases define @math{S} as described here.
GMP releases before GMP 4.3.0 required additional space for both input and output
areas. More precisely, the areas @{@var{up}, @math{@var{un}+1}@} and
@{@var{vp}, @math{@var{vn}+1}@} were destroyed (i.e.@: the operands plus an
extra limb past the end of each), and the areas pointed to by @var{gp} and
@var{sp} should each have room for @math{@var{un}+1} limbs.
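
The calling convention can be illustrated with single-limb operands. This is
only a sketch: the helper name @code{gcdext_1limb} is hypothetical, both
inputs are assumed to be non-zero, and the buffer sizes assume the space
requirements stated above for current releases (not those of releases before
GMP 4.3.0).

@example
#include <gmp.h>

/* Hypothetical helper: extended GCD of two non-zero single-limb values.
   Stores G in *g and the absolute value of S in *s_abs, and returns the
   sign of S (-1, 0 or 1), so that G = u*S + v*T for some integer T.  */
int
gcdext_1limb (mp_limb_t *g, mp_limb_t *s_abs, mp_limb_t u, mp_limb_t v)
@{
  mp_limb_t up[1], vp[1];   /* copies, since both inputs are destroyed */
  mp_limb_t gp[1], sp[2];   /* vn limbs for G, vn+1 limbs for S */
  mp_size_t sn, gn;

  up[0] = u;
  vp[0] = v;
  gn = mpn_gcdext (gp, sp, &sn, up, 1, vp, 1);

  *g = (gn > 0) ? gp[0] : 0;
  *s_abs = (sn != 0) ? sp[0] : 0;
  return (sn > 0) - (sn < 0);
@}
@end example

The second cofactor is not returned; a caller needing it would compute
@math{(G - US) / V} separately, as described above.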
@end deftypefun

@deftypefun mp_size_t mpn_sqrtrem (mp_limb_t *@var{r1p}, mp_limb_t *@var{r2p}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
Compute the square root of @{@var{sp}, @var{n}@} and put the result at
@{@var{r1p}, @math{@GMPceil{@var{n}/2}}@} and the remainder at @{@var{r2p},
@var{retval}@}. @var{r2p} needs space for @var{n} limbs, but the return value
indicates how many are produced.

The most significant limb of @{@var{sp}, @var{n}@} must be non-zero. The
areas @{@var{r1p}, @math{@GMPceil{@var{n}/2}}@} and @{@var{sp}, @var{n}@} must
be completely separate. The areas @{@var{r2p}, @var{n}@} and @{@var{sp},
@var{n}@} must be either identical or completely separate.

If the remainder is not wanted then @var{r2p} can be @code{NULL}, and in this
case the return value is zero or non-zero according to whether the remainder
would have been zero or non-zero.

A return value of zero indicates a perfect square. See also
@code{mpn_perfect_square_p}.
@end deftypefun

@deftypefun size_t mpn_sizeinbase (const mp_limb_t *@var{xp}, mp_size_t @var{n}, int @var{base})
Return the size of @{@var{xp},@var{n}@} measured in number of digits in the
given @var{base}. @var{base} can vary from 2 to 62. Requires @math{@var{n} > 0}
and @math{@var{xp}[@var{n}-1] > 0}. The result will be either exact or
1 too big. If @var{base} is a power of 2, the result is always exact.
@end deftypefun

@deftypefun mp_size_t mpn_get_str (unsigned char *@var{str}, int @var{base}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n})
Convert @{@var{s1p}, @var{s1n}@} to a raw unsigned char array at @var{str} in
base @var{base}, and return the number of characters produced. There may be
leading zeros in the string. The string is not in ASCII; to convert it to
printable format, add the ASCII codes for @samp{0} or @samp{A}, depending on
the base and range.
@var{base} can vary from 2 to 256.

The most significant limb of the input @{@var{s1p}, @var{s1n}@} must be
non-zero. The input @{@var{s1p}, @var{s1n}@} is clobbered, except when
@var{base} is a power of 2, in which case it's unchanged.

The area at @var{str} has to have space for the largest possible number
represented by a @var{s1n} long limb array, plus one extra character.
@end deftypefun

@deftypefun mp_size_t mpn_set_str (mp_limb_t *@var{rp}, const unsigned char *@var{str}, size_t @var{strsize}, int @var{base})
Convert bytes @{@var{str},@var{strsize}@} in the given @var{base} to limbs at
@var{rp}.

@math{@var{str}[0]} is the most significant input byte and
@math{@var{str}[@var{strsize}-1]} is the least significant input byte. Each
byte should be a value in the range 0 to @math{@var{base}-1}, not an ASCII
character. @var{base} can vary from 2 to 256.

The converted value is @{@var{rp},@var{rn}@} where @var{rn} is the return
value. If the most significant input byte @math{@var{str}[0]} is non-zero,
then @math{@var{rp}[@var{rn}-1]} will be non-zero, else
@math{@var{rp}[@var{rn}-1]} and some number of subsequent limbs may be zero.

The area at @var{rp} has to have space for the largest possible number with
@var{strsize} digits in the chosen base, plus one extra limb.

The input must have at least one byte, and no overlap is permitted between
@{@var{str},@var{strsize}@} and the result at @var{rp}.
@end deftypefun

@deftypefun {mp_bitcnt_t} mpn_scan0 (const mp_limb_t *@var{s1p}, mp_bitcnt_t @var{bit})
Scan @var{s1p} from bit position @var{bit} for the next clear bit.

It is required that there be a clear bit within the area at @var{s1p} at or
beyond bit position @var{bit}, so that the function has something to return.
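
One way a caller might guarantee that requirement is to check for a limb that
is not all ones before scanning. The sketch below does that; the helper name
@code{lowest_clear_bit} and its ``all bits set'' return convention are
assumptions made for illustration only.

@example
#include <gmp.h>

/* Hypothetical helper: index of the lowest clear bit in @{p, n@}, or
   n * GMP_NUMB_BITS when every bit of the area is set.  */
mp_bitcnt_t
lowest_clear_bit (const mp_limb_t *p, mp_size_t n)
@{
  mp_size_t i;

  for (i = 0; i < n; i++)
    if (p[i] != GMP_NUMB_MAX)
      return mpn_scan0 (p, 0);   /* a clear bit exists, so this is safe */

  return (mp_bitcnt_t) n * GMP_NUMB_BITS;
@}
@end example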
@end deftypefun

@deftypefun {mp_bitcnt_t} mpn_scan1 (const mp_limb_t *@var{s1p}, mp_bitcnt_t @var{bit})
Scan @var{s1p} from bit position @var{bit} for the next set bit.

It is required that there be a set bit within the area at @var{s1p} at or
beyond bit position @var{bit}, so that the function has something to return.
@end deftypefun

@deftypefun void mpn_random (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
@deftypefunx void mpn_random2 (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
Generate a random number of length @var{r1n} and store it at @var{r1p}. The
most significant limb is always non-zero. @code{mpn_random} generates
uniformly distributed limb data; @code{mpn_random2} generates long strings of
zeros and ones in the binary representation.

@code{mpn_random2} is intended for testing the correctness of the @code{mpn}
routines.
@end deftypefun

@deftypefun {mp_bitcnt_t} mpn_popcount (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
Count the number of set bits in @{@var{s1p}, @var{n}@}.
@end deftypefun

@deftypefun {mp_bitcnt_t} mpn_hamdist (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Compute the Hamming distance between @{@var{s1p}, @var{n}@} and @{@var{s2p},
@var{n}@}, which is the number of bit positions where the two operands have
different bit values.
@end deftypefun

@deftypefun int mpn_perfect_square_p (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
Return non-zero iff @{@var{s1p}, @var{n}@} is a perfect square.
The most significant limb of the input @{@var{s1p}, @var{n}@} must be
non-zero.
@end deftypefun

@deftypefun void mpn_and_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Perform the bitwise logical and of @{@var{s1p}, @var{n}@} and @{@var{s2p},
@var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
@end deftypefun

@deftypefun void mpn_ior_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Perform the bitwise logical inclusive or of @{@var{s1p}, @var{n}@} and
@{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
@end deftypefun

@deftypefun void mpn_xor_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Perform the bitwise logical exclusive or of @{@var{s1p}, @var{n}@} and
@{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
@end deftypefun

@deftypefun void mpn_andn_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Perform the bitwise logical and of @{@var{s1p}, @var{n}@} and the bitwise
complement of @{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
@end deftypefun

@deftypefun void mpn_iorn_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Perform the bitwise logical inclusive or of @{@var{s1p}, @var{n}@} and the bitwise
complement of @{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
@end deftypefun

@deftypefun void mpn_nand_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Perform the bitwise logical and of @{@var{s1p}, @var{n}@} and @{@var{s2p},
@var{n}@}, and write the bitwise complement of the result to @{@var{rp}, @var{n}@}.
@end deftypefun

@deftypefun void mpn_nior_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Perform the bitwise logical inclusive or of @{@var{s1p}, @var{n}@} and
@{@var{s2p}, @var{n}@}, and write the bitwise complement of the result to
@{@var{rp}, @var{n}@}.
@end deftypefun

@deftypefun void mpn_xnor_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Perform the bitwise logical exclusive or of @{@var{s1p}, @var{n}@} and
@{@var{s2p}, @var{n}@}, and write the bitwise complement of the result to
@{@var{rp}, @var{n}@}.
@end deftypefun

@deftypefun void mpn_com (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
Perform the bitwise complement of @{@var{sp}, @var{n}@}, and write the result
to @{@var{rp}, @var{n}@}.
@end deftypefun

@deftypefun void mpn_copyi (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n})
Copy from @{@var{s1p}, @var{n}@} to @{@var{rp}, @var{n}@}, increasingly.
@end deftypefun

@deftypefun void mpn_copyd (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n})
Copy from @{@var{s1p}, @var{n}@} to @{@var{rp}, @var{n}@}, decreasingly.
@end deftypefun

@deftypefun void mpn_zero (mp_limb_t *@var{rp}, mp_size_t @var{n})
Zero @{@var{rp}, @var{n}@}.
@end deftypefun

@sp 1
@section Low-level functions for cryptography
@cindex Low-level functions for cryptography
@cindex Cryptography functions, low-level

The functions prefixed with @code{mpn_sec_} and @code{mpn_cnd_} are designed to
perform the exact same low-level operations and have the same cache access
patterns for any two same-size arguments, assuming that function arguments are
placed at the same position and that the machine state is identical upon
function entry. These functions are intended for cryptographic purposes, where
resilience to side-channel attacks is desired.

These functions are less efficient than their ``leaky'' counterparts; their
performance for operands of the sizes typically used for cryptographic
applications is between 15% and 100% worse.
For larger operands, these
functions might be inadequate, since they rely on asymptotically elementary
algorithms.

These functions do not make any explicit allocations. Those that need scratch
space take it as an explicit operand. This convention allows
callers to keep sensitive data in designated memory areas. Note however that
compilers may choose to spill scalar values used within these functions to
their stack frame and that such scalars may contain sensitive data.

In addition to these specially crafted functions, the following @code{mpn}
functions are naturally side-channel resistant: @code{mpn_add_n},
@code{mpn_sub_n}, @code{mpn_lshift}, @code{mpn_rshift}, @code{mpn_zero},
@code{mpn_copyi}, @code{mpn_copyd}, @code{mpn_com}, and the logical functions
(@code{mpn_and_n}, etc.).

There are some exceptions to the side-channel resilience: (1) Some assembly
implementations of @code{mpn_lshift} identify shift-by-one as a special case.
This is a problem iff the shift count is a function of sensitive data. (2)
Alpha ev6 and Pentium4 using 64-bit limbs have leaky @code{mpn_add_n} and
@code{mpn_sub_n}. (3) Alpha ev6 has a leaky @code{mpn_mul_1} which also makes
@code{mpn_sec_mul} on those systems unsafe.

@deftypefun mp_limb_t mpn_cnd_add_n (mp_limb_t @var{cnd}, mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
@deftypefunx mp_limb_t mpn_cnd_sub_n (mp_limb_t @var{cnd}, mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
These functions do conditional addition and subtraction. If @var{cnd} is
non-zero, they produce the same result as a regular @code{mpn_add_n} or
@code{mpn_sub_n}, and if @var{cnd} is zero, they copy @{@var{s1p},@var{n}@} to
the result area and return zero.
The functions are designed to have timing and
memory access patterns depending only on size and location of the data areas,
but independent of the condition @var{cnd}. As with @code{mpn_add_n} and
@code{mpn_sub_n}, on most machines, the timing will also be independent of the
actual limb values.
@end deftypefun

@deftypefun mp_limb_t mpn_sec_add_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{n}, mp_limb_t @var{b}, mp_limb_t *@var{tp})
@deftypefunx mp_limb_t mpn_sec_sub_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{n}, mp_limb_t @var{b}, mp_limb_t *@var{tp})
Set @var{R} to @var{A} + @var{b} or @var{A} - @var{b}, respectively, where
@var{R} = @{@var{rp},@var{n}@}, @var{A} = @{@var{ap},@var{n}@}, and @var{b} is
a single limb. Return the carry-out (the borrow-out for the subtraction).

These functions take @math{O(N)} time, unlike the leaky functions
@code{mpn_add_1} and @code{mpn_sub_1}, which are @math{O(1)} on average. They
require scratch space
of @code{mpn_sec_add_1_itch(@var{n})} and @code{mpn_sec_sub_1_itch(@var{n})}
limbs, respectively, to be passed in the @var{tp} parameter. The scratch space
requirements are guaranteed to increase monotonically in the operand size.
@end deftypefun

@deftypefun void mpn_sec_mul (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{an}, const mp_limb_t *@var{bp}, mp_size_t @var{bn}, mp_limb_t *@var{tp})
@deftypefunx mp_size_t mpn_sec_mul_itch (mp_size_t @var{an}, mp_size_t @var{bn})
Set @var{R} to @math{A @times{} B}, where @var{A} = @{@var{ap},@var{an}@},
@var{B} = @{@var{bp},@var{bn}@}, and @var{R} =
@{@var{rp},@math{@var{an}+@var{bn}}@}.

It is required that @math{@var{an} @ge @var{bn} > 0}.

No overlapping between @var{R} and the input operands is allowed. For
@math{@var{A} = @var{B}}, use @code{mpn_sec_sqr} for optimal performance.
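
A minimal calling sketch follows. It sizes the scratch area with
@code{mpn_sec_mul_itch} and, purely as an assumption made for brevity, obtains
it from @code{malloc}; an application handling secrets would more likely place
the scratch area in its own designated memory and wipe it afterwards. The
wrapper name @code{sec_mul_alloc} is hypothetical.

@example
#include <stdlib.h>
#include <gmp.h>

/* Hypothetical wrapper: R = A * B with temporary scratch space.
   Requires an >= bn > 0 and room for an+bn limbs at rp, which must not
   overlap the inputs.  Returns 0 on success, -1 if allocation fails.  */
int
sec_mul_alloc (mp_limb_t *rp, const mp_limb_t *ap, mp_size_t an,
               const mp_limb_t *bp, mp_size_t bn)
@{
  mp_size_t itch = mpn_sec_mul_itch (an, bn);
  mp_limb_t *tp = malloc ((size_t) (itch > 0 ? itch : 1)
                          * sizeof (mp_limb_t));

  if (tp == NULL)
    return -1;
  mpn_sec_mul (rp, ap, an, bp, bn, tp);
  free (tp);
  return 0;
@}
@end example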

This function requires scratch space of @code{mpn_sec_mul_itch(@var{an},
@var{bn})} limbs to be passed in the @var{tp} parameter. The scratch space
requirements are guaranteed to increase monotonically in the operand sizes.
@end deftypefun


@deftypefun void mpn_sec_sqr (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{an}, mp_limb_t *@var{tp})
@deftypefunx mp_size_t mpn_sec_sqr_itch (mp_size_t @var{an})
Set @var{R} to @math{A^2}, where @var{A} = @{@var{ap},@var{an}@}, and @var{R} =
@{@var{rp},@math{2@var{an}}@}.

It is required that @math{@var{an} > 0}.

No overlapping between @var{R} and the input operands is allowed.

This function requires scratch space of @code{mpn_sec_sqr_itch(@var{an})} limbs
to be passed in the @var{tp} parameter. The scratch space requirements are
guaranteed to increase monotonically in the operand size.
@end deftypefun


@deftypefun void mpn_sec_powm (mp_limb_t *@var{rp}, const mp_limb_t *@var{bp}, mp_size_t @var{bn}, const mp_limb_t *@var{ep}, mp_bitcnt_t @var{enb}, const mp_limb_t *@var{mp}, mp_size_t @var{n}, mp_limb_t *@var{tp})
@deftypefunx mp_size_t mpn_sec_powm_itch (mp_size_t @var{bn}, mp_bitcnt_t @var{enb}, size_t @var{n})
Set @var{R} to @m{B^E \bmod @var{M}, (@var{B} raised to @var{E}) modulo
@var{M}}, where @var{R} = @{@var{rp},@var{n}@}, @var{B} =
@{@var{bp},@var{bn}@}, @var{M} = @{@var{mp},@var{n}@},
and @var{E} = @{@var{ep},@math{@GMPceil{@var{enb} /
@code{GMP\_NUMB\_BITS}}}@}.

It is required that @math{@var{B} > 0}, that @math{@var{M} > 0} is odd, and
that @m{@var{E} < 2@GMPraise{@var{enb}}, @var{E} < 2^@var{enb}}.

No overlapping between @var{R} and the input operands is allowed.

This function requires scratch space of @code{mpn_sec_powm_itch(@var{bn},
@var{enb}, @var{n})} limbs to be passed in the @var{tp} parameter.
The scratch
space requirements are guaranteed to increase monotonically in the operand
sizes.
@end deftypefun

@deftypefun void mpn_sec_tabselect (mp_limb_t *@var{rp}, const mp_limb_t *@var{tab}, mp_size_t @var{n}, mp_size_t @var{nents}, mp_size_t @var{which})
Select entry @var{which} from table @var{tab}, which has @var{nents} entries,
each of @var{n} limbs. Store the selected entry at @var{rp}.

This function reads the entire table to avoid side-channel information leaks.
@end deftypefun

@deftypefun mp_limb_t mpn_sec_div_qr (mp_limb_t *@var{qp}, mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn}, mp_limb_t *@var{tp})
@deftypefunx mp_size_t mpn_sec_div_qr_itch (mp_size_t @var{nn}, mp_size_t @var{dn})

Set @var{Q} to @m{\lfloor @var{N} / @var{D}\rfloor, the truncated quotient
@var{N} / @var{D}} and @var{R} to @m{@var{N} \bmod @var{D}, @var{N} modulo
@var{D}}, where @var{N} = @{@var{np},@var{nn}@}, @var{D} =
@{@var{dp},@var{dn}@}, @var{Q}'s most significant limb is the function return
value and the remaining limbs are
@{@var{qp},@var{nn}@minus{}@var{dn}@}, and @var{R} =
@{@var{np},@var{dn}@}.

It is required that @math{@var{nn} @ge @var{dn} @ge 1}, and that
@m{@var{dp}[@var{dn}-1] @neq 0, @var{dp}[@var{dn}-1] != 0}. This does not
imply that @math{@var{N} @ge @var{D}} since @var{N} might be zero-padded.

Note the overlapping between @var{N} and @var{R}. No other operand overlapping
is allowed. The entire space occupied by @var{N} is overwritten.

This function requires scratch space of @code{mpn_sec_div_qr_itch(@var{nn},
@var{dn})} limbs to be passed in the @var{tp} parameter.
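
The way the quotient is split between the return value and @var{qp} can be
seen in the following sketch. The wrapper name @code{sec_div_qr_alloc} is
hypothetical, and the use of @code{malloc} for the scratch area is an
assumption made for brevity rather than a recommendation for sensitive data.

@example
#include <stdlib.h>
#include <gmp.h>

/* Hypothetical wrapper: divide @{np, nn@} by @{dp, dn@}.  The high
   quotient limb goes to *qh, the low nn-dn quotient limbs to qp, and
   the remainder is left in @{np, dn@}.  Requires nn >= dn >= 1 and
   dp[dn-1] != 0.  Returns 0 on success, -1 if allocation fails.  */
int
sec_div_qr_alloc (mp_limb_t *qh, mp_limb_t *qp, mp_limb_t *np,
                  mp_size_t nn, const mp_limb_t *dp, mp_size_t dn)
@{
  mp_size_t itch = mpn_sec_div_qr_itch (nn, dn);
  mp_limb_t *tp = malloc ((size_t) (itch > 0 ? itch : 1)
                          * sizeof (mp_limb_t));

  if (tp == NULL)
    return -1;
  *qh = mpn_sec_div_qr (qp, np, nn, dp, dn, tp);
  free (tp);
  return 0;
@}
@end example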
@end deftypefun

@deftypefun void mpn_sec_div_r (mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn}, mp_limb_t *@var{tp})
@deftypefunx mp_size_t mpn_sec_div_r_itch (mp_size_t @var{nn}, mp_size_t @var{dn})

Set @var{R} to @m{@var{N} \bmod @var{D}, @var{N} modulo @var{D}}, where @var{N}
= @{@var{np},@var{nn}@}, @var{D} = @{@var{dp},@var{dn}@}, and @var{R} =
@{@var{np},@var{dn}@}.

It is required that @math{@var{nn} @ge @var{dn} @ge 1}, and that
@m{@var{dp}[@var{dn}-1] @neq 0, @var{dp}[@var{dn}-1] != 0}. This does not
imply that @math{@var{N} @ge @var{D}} since @var{N} might be zero-padded.

Note the overlapping between @var{N} and @var{R}. No other operand overlapping
is allowed. The entire space occupied by @var{N} is overwritten.

This function requires scratch space of @code{mpn_sec_div_r_itch(@var{nn},
@var{dn})} limbs to be passed in the @var{tp} parameter.
@end deftypefun

@deftypefun int mpn_sec_invert (mp_limb_t *@var{rp}, mp_limb_t *@var{ap}, const mp_limb_t *@var{mp}, mp_size_t @var{n}, mp_bitcnt_t @var{nbcnt}, mp_limb_t *@var{tp})
@deftypefunx mp_size_t mpn_sec_invert_itch (mp_size_t @var{n})
Set @var{R} to @m{@var{A}^{-1} \bmod @var{M}, the inverse of @var{A} modulo
@var{M}}, where @var{R} = @{@var{rp},@var{n}@}, @var{A} = @{@var{ap},@var{n}@},
and @var{M} = @{@var{mp},@var{n}@}. @strong{This function's interface is
preliminary.}

If an inverse exists, return 1, otherwise return 0 and leave @var{R}
undefined. In either case, the input @var{A} is destroyed.

It is required that @var{M} is odd, and that @math{@var{nbcnt} @ge
@GMPceil{\log(@var{A}+1)} + @GMPceil{\log(@var{M}+1)}}.
A safe choice is
@m{@var{nbcnt} = 2@var{n} @times{} @code{GMP\_NUMB\_BITS}, @var{nbcnt} = 2
@times{} @var{n} @times{} GMP_NUMB_BITS}, but a smaller value might improve
performance if @var{M} or @var{A} are known to have leading zero bits.

This function requires scratch space of @code{mpn_sec_invert_itch(@var{n})}
limbs to be passed in the @var{tp} parameter.
@end deftypefun


@sp 1
@section Nails
@cindex Nails

@strong{Everything in this section is highly experimental and may disappear or
be subject to incompatible changes in a future version of GMP.}

Nails are an experimental feature whereby a few bits are left unused at the
top of each @code{mp_limb_t}. This can significantly improve carry handling
on some processors.

All the @code{mpn} functions accepting limb data will expect the nail bits to
be zero on entry, and will return data with the nails similarly all zero.
This applies both to limb vectors and to single limb arguments.

Nails can be enabled by configuring with @samp{--enable-nails}. By default
the number of bits will be chosen according to what suits the host processor,
but a particular number can be selected with @samp{--enable-nails=N}.

At the mpn level, a nail build is neither source nor binary compatible with a
non-nail build, strictly speaking. But programs acting on limbs only through
the mpn functions are likely to work equally well with either build, and
judicious use of the definitions below should make any program compatible with
either build, at the source level.

For the higher level routines, meaning @code{mpz} etc, a nail build should be
fully source and binary compatible with a non-nail build.

@defmac GMP_NAIL_BITS
@defmacx GMP_NUMB_BITS
@defmacx GMP_LIMB_BITS
@code{GMP_NAIL_BITS} is the number of nail bits, or 0 when nails are not in
use.
@code{GMP_NUMB_BITS} is the number of data bits in a limb.
@code{GMP_LIMB_BITS} is the total number of bits in an @code{mp_limb_t}. In
all cases

@example
GMP_LIMB_BITS == GMP_NAIL_BITS + GMP_NUMB_BITS
@end example
@end defmac

@defmac GMP_NAIL_MASK
@defmacx GMP_NUMB_MASK
Bit masks for the nail and number parts of a limb. @code{GMP_NAIL_MASK} is 0
when nails are not in use.

@code{GMP_NAIL_MASK} is not often needed, since the nail part can be obtained
with @code{x >> GMP_NUMB_BITS}, and that means one less large constant, which
can help various RISC chips.
@end defmac

@defmac GMP_NUMB_MAX
The maximum value that can be stored in the number part of a limb. This is
the same as @code{GMP_NUMB_MASK}, but can be used for clarity when doing
comparisons rather than bit-wise operations.
@end defmac

The term ``nails'' comes from finger or toe nails, which are at the ends of a
limb (arm or leg). ``numb'' is short for number, but is also how the
developers felt after trying for a long time to come up with sensible names
for these things.

In the future (the distant future most likely) a non-zero nail might be
permitted, giving non-unique representations for numbers in a limb vector.
This would help vector processors since carries would only ever need to
propagate one or two limbs.


@node Random Number Functions, Formatted Output, Low-level Functions, Top
@chapter Random Number Functions
@cindex Random number functions

Sequences of pseudo-random numbers in GMP are generated using a variable of
type @code{gmp_randstate_t}, which holds an algorithm selection and a current
state. Such a variable must be initialized by a call to one of the
@code{gmp_randinit} functions, and can be seeded with one of the
@code{gmp_randseed} functions.
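
The typical life cycle of such a variable might look like the sketch below.
The helper name @code{random_bits} is hypothetical; @code{mpz_urandomb} is one
of the integer generating functions, and creating a fresh state for every call
is done here only to keep the example self-contained.

@example
#include <gmp.h>

/* Hypothetical helper: set rop to a uniformly distributed integer in
   the range 0 to 2^nbits - 1, using a state that is initialized,
   seeded, used once and then cleared.  */
void
random_bits (mpz_t rop, unsigned long seed, mp_bitcnt_t nbits)
@{
  gmp_randstate_t state;

  gmp_randinit_default (state);
  gmp_randseed_ui (state, seed);
  mpz_urandomb (rop, state, nbits);
  gmp_randclear (state);
@}
@end example

A real application would normally keep one state alive and draw many numbers
from it, since the same seed always reproduces the same sequence.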

The functions actually generating random numbers are described in
@ref{Integer Random Numbers}, and @ref{Miscellaneous Float Functions}.

The older style random number functions don't accept a @code{gmp_randstate_t}
parameter but instead share a global variable of that type. They use a
default algorithm and are currently not seeded (though perhaps that will
change in the future). The new functions accepting a @code{gmp_randstate_t}
are recommended for applications that care about randomness.

@menu
* Random State Initialization::
* Random State Seeding::
* Random State Miscellaneous::
@end menu

@node Random State Initialization, Random State Seeding, Random Number Functions, Random Number Functions
@section Random State Initialization
@cindex Random number state
@cindex Initialization functions

@deftypefun void gmp_randinit_default (gmp_randstate_t @var{state})
Initialize @var{state} with a default algorithm. This will be a compromise
between speed and randomness, and is recommended for applications with no
special requirements. Currently this is @code{gmp_randinit_mt}.
@end deftypefun

@deftypefun void gmp_randinit_mt (gmp_randstate_t @var{state})
@cindex Mersenne twister random numbers
Initialize @var{state} for a Mersenne Twister algorithm. This algorithm is
fast and has good randomness properties.
@end deftypefun

@deftypefun void gmp_randinit_lc_2exp (gmp_randstate_t @var{state}, const mpz_t @var{a}, @w{unsigned long @var{c}}, @w{mp_bitcnt_t @var{m2exp}})
@cindex Linear congruential random numbers
Initialize @var{state} with a linear congruential algorithm @m{X = (@var{a}X +
@var{c}) @bmod 2^{m2exp}, X = (@var{a}*X + @var{c}) mod 2^@var{m2exp}}.

The low bits of @math{X} in this algorithm are not very random.
The least
significant bit will have a period no more than 2, and the second bit no more
than 4, etc. For this reason only the high half of each @math{X} is actually
used.

When a random number of more than @math{@var{m2exp}/2} bits is to be
generated, multiple iterations of the recurrence are used and the results
concatenated.
@end deftypefun

@deftypefun int gmp_randinit_lc_2exp_size (gmp_randstate_t @var{state}, mp_bitcnt_t @var{size})
@cindex Linear congruential random numbers
Initialize @var{state} for a linear congruential algorithm as per
@code{gmp_randinit_lc_2exp}. @var{a}, @var{c} and @var{m2exp} are selected
from a table, chosen so that @var{size} bits (or more) of each @math{X} will
be used, i.e.@: @math{@var{m2exp}/2 @ge{} @var{size}}.

If successful the return value is non-zero. If @var{size} is bigger than the
table data provides then the return value is zero. The maximum @var{size}
currently supported is 128.
@end deftypefun

@deftypefun void gmp_randinit_set (gmp_randstate_t @var{rop}, gmp_randstate_t @var{op})
Initialize @var{rop} with a copy of the algorithm and state from @var{op}.
@end deftypefun

@c Although gmp_randinit, gmp_errno and related constants are obsolete, we
@c still put @findex entries for them, since they're still documented and
@c someone might be looking them up when perusing old application code.

@deftypefun void gmp_randinit (gmp_randstate_t @var{state}, @w{gmp_randalg_t @var{alg}}, @dots{})
@strong{This function is obsolete.}

@findex GMP_RAND_ALG_LC
@findex GMP_RAND_ALG_DEFAULT
Initialize @var{state} with an algorithm selected by @var{alg}. The only
choice is @code{GMP_RAND_ALG_LC}, which is @code{gmp_randinit_lc_2exp_size}
described above. A third parameter of type @code{unsigned long} is required,
this is the @var{size} for that function.
@code{GMP_RAND_ALG_DEFAULT} or 0
are the same as @code{GMP_RAND_ALG_LC}.

@c For reference, this is the only place gmp_errno has been documented, and
@c due to being non thread safe we won't be adding to its uses.
@findex gmp_errno
@findex GMP_ERROR_UNSUPPORTED_ARGUMENT
@findex GMP_ERROR_INVALID_ARGUMENT
@code{gmp_randinit} sets bits in the global variable @code{gmp_errno} to
indicate an error. The bit is @code{GMP_ERROR_UNSUPPORTED_ARGUMENT} if
@var{alg} is unsupported, or @code{GMP_ERROR_INVALID_ARGUMENT} if the
@var{size} parameter is too big. It may be noted this error reporting is not
thread safe (a good reason to use @code{gmp_randinit_lc_2exp_size} instead).
@end deftypefun

@deftypefun void gmp_randclear (gmp_randstate_t @var{state})
Free all memory occupied by @var{state}.
@end deftypefun


@node Random State Seeding, Random State Miscellaneous, Random State Initialization, Random Number Functions
@section Random State Seeding
@cindex Random number seeding
@cindex Seeding random numbers

@deftypefun void gmp_randseed (gmp_randstate_t @var{state}, const mpz_t @var{seed})
@deftypefunx void gmp_randseed_ui (gmp_randstate_t @var{state}, @w{unsigned long int @var{seed}})
Set an initial seed value into @var{state}.

The size of a seed determines how many different sequences of random numbers
it's possible to generate. The ``quality'' of the seed is the randomness
of a given seed compared to the previous seed used, and this affects the
randomness of separate number sequences. The method for choosing a seed is
critical if the generated numbers are to be used for important applications,
such as generating cryptographic keys.

Traditionally the system time has been used to seed, but care needs to be
taken with this.
If an application seeds often and the resolution of the
system clock is low, then the same sequence of numbers might be repeated.
Also, the system time is quite easy to guess, so if unpredictability is
required then it should definitely not be the only source for the seed value.
On some systems there's a special device @file{/dev/random} which provides
random data better suited for use as a seed.
@end deftypefun


@node Random State Miscellaneous, , Random State Seeding, Random Number Functions
@section Random State Miscellaneous

@deftypefun {unsigned long} gmp_urandomb_ui (gmp_randstate_t @var{state}, unsigned long @var{n})
Return a uniformly distributed random number of @var{n} bits, i.e.@: in the
range 0 to @m{2^n-1,2^@var{n}-1} inclusive. @var{n} must be less than or
equal to the number of bits in an @code{unsigned long}.
@end deftypefun

@deftypefun {unsigned long} gmp_urandomm_ui (gmp_randstate_t @var{state}, unsigned long @var{n})
Return a uniformly distributed random number in the range 0 to
@math{@var{n}-1}, inclusive.
@end deftypefun


@node Formatted Output, Formatted Input, Random Number Functions, Top
@chapter Formatted Output
@cindex Formatted output
@cindex @code{printf} formatted output

@menu
* Formatted Output Strings::
* Formatted Output Functions::
* C++ Formatted Output::
@end menu

@node Formatted Output Strings, Formatted Output Functions, Formatted Output, Formatted Output
@section Format Strings

@code{gmp_printf} and friends accept format strings similar to the standard C
@code{printf} (@pxref{Formatted Output,, Formatted Output, libc, The GNU C
Library Reference Manual}).
A format specification is of the form

@example
% [flags] [width] [.[precision]] [type] conv
@end example

GMP adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}
and @code{mpf_t} respectively, @samp{M} for @code{mp_limb_t}, and @samp{N} for
an @code{mp_limb_t} array. @samp{Z}, @samp{Q}, @samp{M} and @samp{N} behave
like integers. @samp{Q} will print a @samp{/} and a denominator, if needed.
@samp{F} behaves like a float. For example,

@example
mpz_t z;
gmp_printf ("%s is an mpz %Zd\n", "here", z);

mpq_t q;
gmp_printf ("a hex rational: %#40Qx\n", q);

mpf_t f;
int n;
gmp_printf ("fixed point mpf %.*Ff with %d digits\n", n, f, n);

mp_limb_t l;
gmp_printf ("limb %Mu\n", l);

const mp_limb_t *ptr;
mp_size_t size;
gmp_printf ("limb array %Nx\n", ptr, size);
@end example

For @samp{N} the limbs are expected least significant first, as per the
@code{mpn} functions (@pxref{Low-level Functions}). A negative size can be
given to print the value as a negative.

All the standard C @code{printf} types behave the same as the C library
@code{printf}, and can be freely intermixed with the GMP extensions. In the
current implementation the standard parts of the format string are simply
handed to @code{printf} and only the GMP extensions handled directly.

The flags accepted are as follows. GLIBC style @nisamp{'} is only for the
standard C types (not the GMP types), and only if the C library supports it.

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{0} @tab pad with zeros (rather than spaces)
@item @nicode{#} @tab show the base with @samp{0x}, @samp{0X} or @samp{0}
@item @nicode{+} @tab always show a sign
@item (space) @tab show a space or a @samp{-} sign
@item @nicode{'} @tab group digits, GLIBC style (not GMP types)
@end multitable
@end quotation

The optional width and precision can be given as a number within the format
string, or as a @samp{*} to take an extra parameter of type @code{int}, the
same as the standard @code{printf}.

The standard types accepted are as follows. @samp{h} and @samp{l} are
portable, the rest will depend on the compiler (or include files) for the type
and the C library for the output.

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{h} @tab @nicode{short}
@item @nicode{hh} @tab @nicode{char}
@item @nicode{j} @tab @nicode{intmax_t} or @nicode{uintmax_t}
@item @nicode{l} @tab @nicode{long} or @nicode{wchar_t}
@item @nicode{ll} @tab @nicode{long long}
@item @nicode{L} @tab @nicode{long double}
@item @nicode{q} @tab @nicode{quad_t} or @nicode{u_quad_t}
@item @nicode{t} @tab @nicode{ptrdiff_t}
@item @nicode{z} @tab @nicode{size_t}
@end multitable
@end quotation

@noindent
The GMP types are

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{F} @tab @nicode{mpf_t}, float conversions
@item @nicode{Q} @tab @nicode{mpq_t}, integer conversions
@item @nicode{M} @tab @nicode{mp_limb_t}, integer conversions
@item @nicode{N} @tab @nicode{mp_limb_t} array, integer conversions
@item @nicode{Z} @tab @nicode{mpz_t}, integer conversions
@end multitable
@end quotation

The conversions accepted are as follows.
@samp{a} and @samp{A} are always
supported for @code{mpf_t} but depend on the C library for standard C float
types. @samp{m} and @samp{p} depend on the C library.

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{a} @nicode{A} @tab hex floats, C99 style
@item @nicode{c} @tab character
@item @nicode{d} @tab decimal integer
@item @nicode{e} @nicode{E} @tab scientific format float
@item @nicode{f} @tab fixed point float
@item @nicode{i} @tab same as @nicode{d}
@item @nicode{g} @nicode{G} @tab fixed or scientific float
@item @nicode{m} @tab @code{strerror} string, GLIBC style
@item @nicode{n} @tab store characters written so far
@item @nicode{o} @tab octal integer
@item @nicode{p} @tab pointer
@item @nicode{s} @tab string
@item @nicode{u} @tab unsigned integer
@item @nicode{x} @nicode{X} @tab hex integer
@end multitable
@end quotation

@samp{o}, @samp{x} and @samp{X} are unsigned for the standard C types, but for
types @samp{Z}, @samp{Q} and @samp{N} they are signed. @samp{u} is not
meaningful for @samp{Z}, @samp{Q} and @samp{N}.

@samp{M} is a proxy for the C library @samp{l} or @samp{L}, according to the
size of @code{mp_limb_t}. Unsigned conversions will be usual, but a signed
conversion can be used and will interpret the value as a two's complement
negative.

@samp{n} can be used with any type, even the GMP types.

Other types or conversions that might be accepted by the C library
@code{printf} cannot be used through @code{gmp_printf}. This includes for
instance extensions registered with GLIBC @code{register_printf_function}.
Also currently there's no support for POSIX @samp{$} style numbered arguments
(perhaps this will be added in the future).
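
As a rough illustration of the types and conversions above, the following
sketch prints one value of each of the main GMP types and mixes a standard
type into the same format string. The values chosen are arbitrary and serve
only the example.

@example
#include <stdio.h>
#include <gmp.h>

int
main (void)
@{
  mpz_t z;
  mpq_t q;
  mpf_t f;

  mpz_init_set_si (z, -255);
  mpq_init (q);
  mpq_set_si (q, 3, 4);
  mpf_init_set_d (f, 1.5);

  gmp_printf ("%Zd %#Zx\n", z, z);     /* hex is signed for Z */
  gmp_printf ("%Qd\n", q);             /* numerator/denominator */
  gmp_printf ("%.3Ff\n", f);           /* fixed point float */
  gmp_printf ("%d and %Zd\n", 42, z);  /* standard and GMP mixed */

  mpz_clear (z);
  mpq_clear (q);
  mpf_clear (f);
  return 0;
@}
@end example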

The precision field has its usual meaning for integer @samp{Z} and float
@samp{F} types, but is currently undefined for @samp{Q} and should not be used
with that.

@code{mpf_t} conversions only ever generate as many digits as can be
accurately represented by the operand, the same as @code{mpf_get_str} does.
Zeros will be used if necessary to pad to the requested precision. This
happens even for an @samp{f} conversion of an @code{mpf_t} which is an
integer, for instance @math{2^@W{1024}} in an @code{mpf_t} of 128 bits
precision will only produce about 40 digits, then pad with zeros to the
decimal point. An empty precision field like @samp{%.Fe} or @samp{%.Ff} can
be used to specifically request just the significant digits. Without any dot
and thus no precision field, a precision value of 6 will be used. Note that
these rules mean that @samp{%Ff}, @samp{%.Ff}, and @samp{%.0Ff} will all be
different.

The decimal point character (or string) is taken from the current locale
settings on systems which provide @code{localeconv} (@pxref{Locales,, Locales
and Internationalization, libc, The GNU C Library Reference Manual}). The C
library will normally do the same for standard float output.

The format string is only interpreted as plain @code{char}s, multibyte
characters are not recognised. Perhaps this will change in the future.


@node Formatted Output Functions, C++ Formatted Output, Formatted Output Strings, Formatted Output
@section Functions
@cindex Output functions

Each of the following functions is similar to the corresponding C library
function. The basic @code{printf} forms take a variable argument list. The
@code{vprintf} forms take an argument pointer, see @ref{Variadic Functions,,
Variadic Functions, libc, The GNU C Library Reference Manual}, or @samp{man 3
va_start}.
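
For instance, the @code{vprintf} forms make it possible to write a variadic
wrapper which accepts the GMP conversions. The following is a minimal
sketch; @code{log_msg} is a hypothetical helper invented for this example,
not a GMP function. @file{stdio.h} and @file{stdarg.h} are included before
@file{gmp.h} so that the prototypes with @code{FILE} and @code{va_list}
parameters are provided.

@example
#include <stdarg.h>
#include <stdio.h>
#include <gmp.h>

/* hypothetical helper: print a prefixed message to stderr */
static int
log_msg (const char *fmt, ...)
@{
  va_list ap;
  int ret;

  fputs ("log: ", stderr);
  va_start (ap, fmt);
  ret = gmp_vfprintf (stderr, fmt, ap);
  va_end (ap);
  return ret;
@}

int
main (void)
@{
  mpz_t z;

  mpz_init_set_ui (z, 99);
  log_msg ("z is %Zd\n", z);
  mpz_clear (z);
  return 0;
@}
@end example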

It should be emphasised that if a format string is invalid, or the arguments
don't match what the format specifies, then the behaviour of any of these
functions will be unpredictable. GCC format string checking is not available,
since it doesn't recognise the GMP extensions.

The file based functions @code{gmp_printf} and @code{gmp_fprintf} will return
@math{-1} to indicate a write error. Output is not ``atomic'', so partial
output may be produced if a write error occurs. All the functions can return
@math{-1} if the C library @code{printf} variant in use returns @math{-1}, but
this shouldn't normally occur.

@deftypefun int gmp_printf (const char *@var{fmt}, @dots{})
@deftypefunx int gmp_vprintf (const char *@var{fmt}, va_list @var{ap})
Print to the standard output @code{stdout}. Return the number of characters
written, or @math{-1} if an error occurred.
@end deftypefun

@deftypefun int gmp_fprintf (FILE *@var{fp}, const char *@var{fmt}, @dots{})
@deftypefunx int gmp_vfprintf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})
Print to the stream @var{fp}. Return the number of characters written, or
@math{-1} if an error occurred.
@end deftypefun

@deftypefun int gmp_sprintf (char *@var{buf}, const char *@var{fmt}, @dots{})
@deftypefunx int gmp_vsprintf (char *@var{buf}, const char *@var{fmt}, va_list @var{ap})
Form a null-terminated string in @var{buf}. Return the number of characters
written, excluding the terminating null.

No overlap is permitted between the space at @var{buf} and the string
@var{fmt}.

These functions are not recommended, since there's no protection against
exceeding the space available at @var{buf}.
@end deftypefun

@deftypefun int gmp_snprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, @dots{})
@deftypefunx int gmp_vsnprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, va_list @var{ap})
Form a null-terminated string in @var{buf}. No more than @var{size} bytes
will be written. To get the full output, @var{size} must be enough for the
string and null-terminator.

The return value is the total number of characters which ought to have been
produced, excluding the terminating null. If @math{@var{retval} @ge{}
@var{size}} then the actual output has been truncated to the first
@math{@var{size}-1} characters, and a null appended.

No overlap is permitted between the region @{@var{buf},@var{size}@} and the
@var{fmt} string.

Notice the return value is in ISO C99 @code{snprintf} style. This is so even
if the C library @code{vsnprintf} is the older GLIBC 2.0.x style.
@end deftypefun

@deftypefun int gmp_asprintf (char **@var{pp}, const char *@var{fmt}, @dots{})
@deftypefunx int gmp_vasprintf (char **@var{pp}, const char *@var{fmt}, va_list @var{ap})
Form a null-terminated string in a block of memory obtained from the current
memory allocation function (@pxref{Custom Allocation}). The block will be the
size of the string and null-terminator. The address of the block is stored to
*@var{pp}. The return value is the number of characters produced, excluding
the null-terminator.

Unlike the C library @code{asprintf}, @code{gmp_asprintf} doesn't return
@math{-1} if there's no more memory available, it lets the current allocation
function handle that.
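
Because the block comes from the current allocation function, it is best
released through the matching free function rather than by assuming the C
library @code{free}. The following is a minimal sketch of one way to do
that, using @code{mp_get_memory_functions} (@pxref{Custom Allocation}) to
look up the free function. The variable names and the value printed are
arbitrary.

@example
#include <stdio.h>
#include <string.h>
#include <gmp.h>

int
main (void)
@{
  mpz_t z;
  char *str;
  void (*freefunc) (void *, size_t);

  mpz_init_set_ui (z, 123456789UL);
  gmp_asprintf (&str, "value %Zd", z);
  puts (str);

  /* the block size is the string length plus the null-terminator */
  mp_get_memory_functions (NULL, NULL, &freefunc);
  freefunc (str, strlen (str) + 1);

  mpz_clear (z);
  return 0;
@}
@end example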
@end deftypefun

@deftypefun int gmp_obstack_printf (struct obstack *@var{ob}, const char *@var{fmt}, @dots{})
@deftypefunx int gmp_obstack_vprintf (struct obstack *@var{ob}, const char *@var{fmt}, va_list @var{ap})
@cindex @code{obstack} output
Append to the current object in @var{ob}. The return value is the number of
characters written. A null-terminator is not written.

@var{fmt} cannot be within the current object in @var{ob}, since that object
might move as it grows.

These functions are available only when the C library provides the obstack
feature, which probably means only on GNU systems, see @ref{Obstacks,,
Obstacks, libc, The GNU C Library Reference Manual}.
@end deftypefun


@node C++ Formatted Output, , Formatted Output Functions, Formatted Output
@section C++ Formatted Output
@cindex C++ @code{ostream} output
@cindex @code{ostream} output

The following functions are provided in @file{libgmpxx} (@pxref{Headers and
Libraries}), which is built if C++ support is enabled (@pxref{Build Options}).
Prototypes are available from @code{<gmp.h>}.

@deftypefun ostream& operator<< (ostream& @var{stream}, const mpz_t @var{op})
Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
@code{ios::width} is reset to 0 after output, the same as the standard
@code{ostream operator<<} routines do.

In hex or octal, @var{op} is printed as a signed number, the same as for
decimal. This is unlike the standard @code{operator<<} routines on @code{int}
etc, which instead give two's complement.
@end deftypefun

@deftypefun ostream& operator<< (ostream& @var{stream}, const mpq_t @var{op})
Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
@code{ios::width} is reset to 0 after output, the same as the standard
@code{ostream operator<<} routines do.

Output will be a fraction like @samp{5/9}, or if the denominator is 1 then
just a plain integer like @samp{123}.

In hex or octal, @var{op} is printed as a signed value, the same as for
decimal. If @code{ios::showbase} is set then a base indicator is shown on
both the numerator and denominator (if the denominator is required).
@end deftypefun

@deftypefun ostream& operator<< (ostream& @var{stream}, const mpf_t @var{op})
Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
@code{ios::width} is reset to 0 after output, the same as the standard
@code{ostream operator<<} routines do.

The decimal point follows the standard library float @code{operator<<}, which
on recent systems means the @code{std::locale} imbued on @var{stream}.

Hex and octal are supported, unlike the standard @code{operator<<} on
@code{double}. The mantissa will be in hex or octal, the exponent will be in
decimal. For hex the exponent delimiter is an @samp{@@}. This is as per
@code{mpf_out_str}.

@code{ios::showbase} is supported, and will put a base on the mantissa, for
example hex @samp{0x1.8} or @samp{0x0.8}, or octal @samp{01.4} or @samp{00.4}.
This last form is slightly strange, but at least differentiates itself from
decimal.
@end deftypefun

These operators mean that GMP types can be printed in the usual C++ way, for
example,

@example
mpz_t z;
int n;
...
cout << "iteration " << n << " value " << z << "\n";
@end example

But note that @code{ostream} output (and @code{istream} input, @pxref{C++
Formatted Input}) is the only overloading available for the GMP types and that
for instance using @code{+} with an @code{mpz_t} will have unpredictable
results. For classes with overloading, see @ref{C++ Class Interface}.


@node Formatted Input, C++ Class Interface, Formatted Output, Top
@chapter Formatted Input
@cindex Formatted input
@cindex @code{scanf} formatted input

@menu
* Formatted Input Strings::
* Formatted Input Functions::
* C++ Formatted Input::
@end menu


@node Formatted Input Strings, Formatted Input Functions, Formatted Input, Formatted Input
@section Formatted Input Strings

@code{gmp_scanf} and friends accept format strings similar to the standard C
@code{scanf} (@pxref{Formatted Input,, Formatted Input, libc, The GNU C
Library Reference Manual}). A format specification is of the form

@example
% [flags] [width] [type] conv
@end example

GMP adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}
and @code{mpf_t} respectively. @samp{Z} and @samp{Q} behave like integers.
@samp{Q} will read a @samp{/} and a denominator, if present. @samp{F} behaves
like a float.

GMP variables don't require an @code{&} when passed to @code{gmp_scanf}, since
they're already ``call-by-reference''. For example,

@example
/* to read say "a(5) = 1234" */
int n;
mpz_t z;
gmp_scanf ("a(%d) = %Zd\n", &n, z);

mpq_t q1, q2;
gmp_sscanf ("0377 + 0x10/0x11", "%Qi + %Qi", q1, q2);

/* to read say "topleft (1.55,-2.66)" */
mpf_t x, y;
char buf[32];
gmp_scanf ("%31s (%Ff,%Ff)", buf, x, y);
@end example

All the standard C @code{scanf} types behave the same as in the C library
@code{scanf}, and can be freely intermixed with the GMP extensions. In the
current implementation the standard parts of the format string are simply
handed to @code{scanf} and only the GMP extensions handled directly.

The flags accepted are as follows. @samp{a} and @samp{'} will depend on
support from the C library, and @samp{'} cannot be used with GMP types.

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{*} @tab read but don't store
@item @nicode{a} @tab allocate a buffer (string conversions)
@item @nicode{'} @tab grouped digits, GLIBC style (not GMP types)
@end multitable
@end quotation

The standard types accepted are as follows. @samp{h} and @samp{l} are
portable, the rest will depend on the compiler (or include files) for the type
and the C library for the input.

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{h} @tab @nicode{short}
@item @nicode{hh} @tab @nicode{char}
@item @nicode{j} @tab @nicode{intmax_t} or @nicode{uintmax_t}
@item @nicode{l} @tab @nicode{long int}, @nicode{double} or @nicode{wchar_t}
@item @nicode{ll} @tab @nicode{long long}
@item @nicode{L} @tab @nicode{long double}
@item @nicode{q} @tab @nicode{quad_t} or @nicode{u_quad_t}
@item @nicode{t} @tab @nicode{ptrdiff_t}
@item @nicode{z} @tab @nicode{size_t}
@end multitable
@end quotation

@noindent
The GMP types are

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{F} @tab @nicode{mpf_t}, float conversions
@item @nicode{Q} @tab @nicode{mpq_t}, integer conversions
@item @nicode{Z} @tab @nicode{mpz_t}, integer conversions
@end multitable
@end quotation

The conversions accepted are as follows. @samp{p} and @samp{[} will depend on
support from the C library, the rest are standard.

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{c} @tab character or characters
@item @nicode{d} @tab decimal integer
@item @nicode{e} @nicode{E} @nicode{f} @nicode{g} @nicode{G}
      @tab float
@item @nicode{i} @tab integer with base indicator
@item @nicode{n} @tab characters read so far
@item @nicode{o} @tab octal integer
@item @nicode{p} @tab pointer
@item @nicode{s} @tab string of non-whitespace characters
@item @nicode{u} @tab decimal integer
@item @nicode{x} @nicode{X} @tab hex integer
@item @nicode{[} @tab string of characters in a set
@end multitable
@end quotation

@samp{e}, @samp{E}, @samp{f}, @samp{g} and @samp{G} are identical, they all
read either fixed point or scientific format, and either upper or lower case
@samp{e} for the exponent in scientific format.

C99 style hex float format (@code{printf %a}, @pxref{Formatted Output
Strings}) is always accepted for @code{mpf_t}, but for the standard float
types it will depend on the C library.

@samp{x} and @samp{X} are identical, both accept both upper and lower case
hexadecimal.

@samp{o}, @samp{u}, @samp{x} and @samp{X} all read positive or negative
values. For the standard C types these are described as ``unsigned''
conversions, but that merely affects certain overflow handling, negatives are
still allowed (per @code{strtoul}, @pxref{Parsing of Integers,, Parsing of
Integers, libc, The GNU C Library Reference Manual}). For GMP types there are
no overflows, so @samp{d} and @samp{u} are identical.

@samp{Q} type reads the numerator and (optional) denominator as given. If the
value might not be in canonical form then @code{mpq_canonicalize} must be
called before using it in any calculations (@pxref{Rational Number
Functions}).
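
The need for @code{mpq_canonicalize} can be illustrated with a short sketch.
The input string @samp{6/8} is an arbitrary example of a fraction not in
lowest terms; after canonicalization it is printed as @samp{3/4}.

@example
#include <stdio.h>
#include <gmp.h>

int
main (void)
@{
  mpq_t q;

  mpq_init (q);
  if (gmp_sscanf ("6/8", "%Qd", q) == 1)
    @{
      mpq_canonicalize (q);     /* reduce before any arithmetic */
      gmp_printf ("%Qd\n", q);  /* prints 3/4 */
    @}
  mpq_clear (q);
  return 0;
@}
@end example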

@samp{Qi} will read a base specification separately for the numerator and
denominator. For example @samp{0x10/11} would be 16/11, whereas
@samp{0x10/0x11} would be 16/17.

@samp{n} can be used with any of the types above, even the GMP types.
@samp{*} to suppress assignment is allowed, though in that case it would do
nothing at all.

Other conversions or types that might be accepted by the C library
@code{scanf} cannot be used through @code{gmp_scanf}.

Whitespace is read and discarded before a field, except for @samp{c} and
@samp{[} conversions.

For float conversions, the decimal point character (or string) expected is
taken from the current locale settings on systems which provide
@code{localeconv} (@pxref{Locales,, Locales and Internationalization, libc,
The GNU C Library Reference Manual}). The C library will normally do the same
for standard float input.

The format string is only interpreted as plain @code{char}s, multibyte
characters are not recognised. Perhaps this will change in the future.


@node Formatted Input Functions, C++ Formatted Input, Formatted Input Strings, Formatted Input
@section Formatted Input Functions
@cindex Input functions

Each of the following functions is similar to the corresponding C library
function. The plain @code{scanf} forms take a variable argument list. The
@code{vscanf} forms take an argument pointer, see @ref{Variadic Functions,,
Variadic Functions, libc, The GNU C Library Reference Manual}, or @samp{man 3
va_start}.

It should be emphasised that if a format string is invalid, or the arguments
don't match what the format specifies, then the behaviour of any of these
functions will be unpredictable. GCC format string checking is not available,
since it doesn't recognise the GMP extensions.

No overlap is permitted between the @var{fmt} string and any of the results
produced.

@deftypefun int gmp_scanf (const char *@var{fmt}, @dots{})
@deftypefunx int gmp_vscanf (const char *@var{fmt}, va_list @var{ap})
Read from the standard input @code{stdin}.
@end deftypefun

@deftypefun int gmp_fscanf (FILE *@var{fp}, const char *@var{fmt}, @dots{})
@deftypefunx int gmp_vfscanf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})
Read from the stream @var{fp}.
@end deftypefun

@deftypefun int gmp_sscanf (const char *@var{s}, const char *@var{fmt}, @dots{})
@deftypefunx int gmp_vsscanf (const char *@var{s}, const char *@var{fmt}, va_list @var{ap})
Read from a null-terminated string @var{s}.
@end deftypefun

The return value from each of these functions is the same as the standard C99
@code{scanf}, namely the number of fields successfully parsed and stored.
@samp{%n} fields and fields read but suppressed by @samp{*} don't count
towards the return value.

If end of input (or a file error) is reached before a character for a field or
a literal, and if no previous non-suppressed fields have matched, then the
return value is @code{EOF} instead of 0. A whitespace character in the format
string is only an optional match and doesn't induce an @code{EOF} in this
fashion. Leading whitespace read and discarded for a field doesn't count as
characters for that field.

For the GMP types, input parsing follows C99 rules, namely one character of
lookahead is used and characters are read while they continue to meet the
format requirements. If this doesn't provide a complete number then the
function terminates, with that field not stored nor counted towards the return
value. For instance with @code{mpf_t} an input @samp{1.23e-XYZ} would be read
up to the @samp{X} and that character pushed back since it's not a digit.
The
string @samp{1.23e-} would then be considered invalid since an @samp{e} must
be followed by at least one digit.

For the standard C types, in the current implementation GMP calls the C
library @code{scanf} functions, which might have looser rules about what
constitutes a valid input.

Note that @code{gmp_sscanf} is the same as @code{gmp_fscanf} and only does one
character of lookahead when parsing. Although clearly it could look at its
entire input, it is deliberately made identical to @code{gmp_fscanf}, the same
way C99 @code{sscanf} is the same as @code{fscanf}.


@node C++ Formatted Input, , Formatted Input Functions, Formatted Input
@section C++ Formatted Input
@cindex C++ @code{istream} input
@cindex @code{istream} input

The following functions are provided in @file{libgmpxx} (@pxref{Headers and
Libraries}), which is built only if C++ support is enabled (@pxref{Build
Options}). Prototypes are available from @code{<gmp.h>}.

@deftypefun istream& operator>> (istream& @var{stream}, mpz_t @var{rop})
Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.
@end deftypefun

@deftypefun istream& operator>> (istream& @var{stream}, mpq_t @var{rop})
An integer like @samp{123} will be read, or a fraction like @samp{5/9}. No
whitespace is allowed around the @samp{/}. If the fraction is not in
canonical form then @code{mpq_canonicalize} must be called (@pxref{Rational
Number Functions}) before operating on it.

As per integer input, a @samp{0} or @samp{0x} base indicator is read when
none of @code{ios::dec}, @code{ios::oct} or @code{ios::hex} are set. This is
done separately for numerator and denominator, so that for instance
@samp{0x10/11} is @math{16/11} and @samp{0x10/0x11} is @math{16/17}.
@end deftypefun

@deftypefun istream& operator>> (istream& @var{stream}, mpf_t @var{rop})
Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.

Hex or octal floats are not supported, but might be in the future, or perhaps
it's best to accept only what the standard float @code{operator>>} does.
@end deftypefun

Note that digit grouping specified by the @code{istream} locale is currently
not accepted. Perhaps this will change in the future.

@sp 1
These operators mean that GMP types can be read in the usual C++ way, for
example,

@example
mpz_t z;
...
cin >> z;
@end example

But note that @code{istream} input (and @code{ostream} output, @pxref{C++
Formatted Output}) is the only overloading available for the GMP types and
that for instance using @code{+} with an @code{mpz_t} will have unpredictable
results. For classes with overloading, see @ref{C++ Class Interface}.



@node C++ Class Interface, Custom Allocation, Formatted Input, Top
@chapter C++ Class Interface
@cindex C++ interface

This chapter describes the C++ class based interface to GMP.

All GMP C language types and functions can be used in C++ programs, since
@file{gmp.h} has @code{extern "C"} qualifiers, but the class interface offers
overloaded functions and operators which may be more convenient.

Due to the implementation of this interface, a reasonably recent C++ compiler
is required, one supporting namespaces, partial specialization of templates
and member templates.

@strong{Everything described in this chapter is to be considered preliminary
and might be subject to incompatible changes if some unforeseen difficulty
reveals itself.}

@menu
* C++ Interface General::
* C++ Interface Integers::
* C++ Interface Rationals::
* C++ Interface Floats::
* C++ Interface Random Numbers::
* C++ Interface Limitations::
@end menu


@node C++ Interface General, C++ Interface Integers, C++ Class Interface, C++ Class Interface
@section C++ Interface General

@noindent
All the C++ classes and functions are available with

@cindex @code{gmpxx.h}
@example
#include <gmpxx.h>
@end example

Programs should be linked with the @file{libgmpxx} and @file{libgmp}
libraries. For example,

@example
g++ mycxxprog.cc -lgmpxx -lgmp
@end example

@noindent
The classes defined are

@deftp Class mpz_class
@deftpx Class mpq_class
@deftpx Class mpf_class
@end deftp

The standard operators and various standard functions are overloaded to allow
arithmetic with these classes. For example,

@example
int
main (void)
@{
  mpz_class a, b, c;

  a = 1234;
  b = "-5678";
  c = a+b;
  cout << "sum is " << c << "\n";
  cout << "absolute value is " << abs(c) << "\n";

  return 0;
@}
@end example

An important feature of the implementation is that an expression like
@code{a=b+c} results in a single call to the corresponding @code{mpz_add},
without using a temporary for the @code{b+c} part. Expressions which by their
nature imply intermediate values, like @code{a=b*c+d*e}, still use temporaries
though.

The classes can be freely intermixed in expressions, as can the classes and
the standard types @code{long}, @code{unsigned long} and @code{double}.
Smaller types like @code{int} or @code{float} can also be intermixed, since
C++ will promote them.

Note that @code{bool} is not accepted directly, but must be explicitly cast to
an @code{int} first. This is because C++ will automatically convert any
pointer to a @code{bool}, so if GMP accepted @code{bool} it would make all
sorts of invalid class and pointer combinations compile but almost certainly
not do anything sensible.

Conversions back from the classes to standard C++ types aren't done
automatically, instead member functions like @code{get_si} are provided (see
the following sections for details).

Also there are no automatic conversions from the classes to the corresponding
GMP C types, instead a reference to the underlying C object can be obtained
with the following functions,

@deftypefun mpz_t mpz_class::get_mpz_t ()
@deftypefunx mpq_t mpq_class::get_mpq_t ()
@deftypefunx mpf_t mpf_class::get_mpf_t ()
@end deftypefun

These can be used to call a C function which doesn't have a C++ class
interface. For example to set @code{a} to the GCD of @code{b} and @code{c},

@example
mpz_class a, b, c;
...
mpz_gcd (a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t());
@end example

In the other direction, a class can be initialized from the corresponding GMP
C type, or assigned to if an explicit constructor is used. In both cases this
makes a copy of the value, it doesn't create any sort of association. For
example,

@example
mpz_t z;
// ... init and calculate z ...
mpz_class x(z);
mpz_class y;
y = mpz_class (z);
@end example

There are no namespace setups in @file{gmpxx.h}, all types and functions are
simply put into the global namespace. This is what @file{gmp.h} has done in
the past, and continues to do for compatibility.
The extras provided by
@file{gmpxx.h} follow GMP naming conventions and are unlikely to clash with
anything.


@node C++ Interface Integers, C++ Interface Rationals, C++ Interface General, C++ Class Interface
@section C++ Interface Integers

@deftypefun {} mpz_class::mpz_class (type @var{n})
Construct an @code{mpz_class}. All the standard C++ types may be used, except
@code{long long} and @code{long double}, and all the GMP C++ classes can be
used, although conversions from @code{mpq_class} and @code{mpf_class} are
@code{explicit}. Any necessary conversion follows the corresponding C
function, for example @code{double} follows @code{mpz_set_d}
(@pxref{Assigning Integers}).
@end deftypefun

@deftypefun explicit mpz_class::mpz_class (const mpz_t @var{z})
Construct an @code{mpz_class} from an @code{mpz_t}. The value in @var{z} is
copied into the new @code{mpz_class}, there won't be any permanent association
between it and @var{z}.
@end deftypefun

@deftypefun explicit mpz_class::mpz_class (const char *@var{s}, int @var{base} = 0)
@deftypefunx explicit mpz_class::mpz_class (const string& @var{s}, int @var{base} = 0)
Construct an @code{mpz_class} converted from a string using @code{mpz_set_str}
(@pxref{Assigning Integers}).

If the string is not a valid integer, an @code{std::invalid_argument}
exception is thrown. The same applies to @code{operator=}.
@end deftypefun

@deftypefun mpz_class operator"" _mpz (const char *@var{str})
With C++11 compilers, integers can be constructed with the syntax
@code{123_mpz} which is equivalent to @code{mpz_class("123")}.
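
As a small sketch of the string constructors and the literal syntax, the
following shows one way to handle the @code{std::invalid_argument} exception
and to write a constant too large for the builtin integer types. The helper
names @code{parse_or} and @code{big_constant} are hypothetical, chosen only
for this example, and a C++11 compiler is assumed.

@example
#include <gmpxx.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a decimal string, returning
// the given fallback if the string is not a valid integer.
mpz_class
parse_or (const std::string &s, const mpz_class &fallback)
@{
  try
    @{
      return mpz_class (s, 10);
    @}
  catch (const std::invalid_argument &)
    @{
      return fallback;
    @}
@}

// The literal is passed to the library as a string, so it is
// not limited to the range of unsigned long long.
mpz_class
big_constant (void)
@{
  return 123456789012345678901234567890_mpz;
@}
@end example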
@end deftypefun

@deftypefun mpz_class operator/ (mpz_class @var{a}, mpz_class @var{d})
@deftypefunx mpz_class operator% (mpz_class @var{a}, mpz_class @var{d})
Divisions involving @code{mpz_class} round towards zero, as per the
@code{mpz_tdiv_q} and @code{mpz_tdiv_r} functions (@pxref{Integer Division}).
This is the same as the C99 @code{/} and @code{%} operators.

The @code{mpz_fdiv@dots{}} or @code{mpz_cdiv@dots{}} functions can always be called
directly if desired. For example,

@example
mpz_class q, a, d;
...
mpz_fdiv_q (q.get_mpz_t(), a.get_mpz_t(), d.get_mpz_t());
@end example
@end deftypefun

@deftypefun mpz_class abs (mpz_class @var{op})
@deftypefunx int cmp (mpz_class @var{op1}, type @var{op2})
@deftypefunx int cmp (type @var{op1}, mpz_class @var{op2})
@maybepagebreak
@deftypefunx bool mpz_class::fits_sint_p (void)
@deftypefunx bool mpz_class::fits_slong_p (void)
@deftypefunx bool mpz_class::fits_sshort_p (void)
@maybepagebreak
@deftypefunx bool mpz_class::fits_uint_p (void)
@deftypefunx bool mpz_class::fits_ulong_p (void)
@deftypefunx bool mpz_class::fits_ushort_p (void)
@maybepagebreak
@deftypefunx double mpz_class::get_d (void)
@deftypefunx long mpz_class::get_si (void)
@deftypefunx string mpz_class::get_str (int @var{base} = 10)
@deftypefunx {unsigned long} mpz_class::get_ui (void)
@maybepagebreak
@deftypefunx int mpz_class::set_str (const char *@var{str}, int @var{base})
@deftypefunx int mpz_class::set_str (const string& @var{str}, int @var{base})
@deftypefunx int sgn (mpz_class @var{op})
@deftypefunx mpz_class sqrt (mpz_class @var{op})
@maybepagebreak
@deftypefunx void mpz_class::swap (mpz_class& @var{op})
@deftypefunx void swap (mpz_class& @var{op1}, mpz_class& @var{op2})
These functions provide a C++ class interface to the corresponding GMP C
routines.

@code{cmp} can be used with any of the classes or the standard C++ types,
except @code{long long} and @code{long double}.
@end deftypefun

@sp 1
Overloaded operators for combinations of @code{mpz_class} and @code{double}
are provided for completeness, but it should be noted that if the given
@code{double} is not an integer then the way any rounding is done is currently
unspecified. The rounding might take place at the start, in the middle, or at
the end of the operation, and it might change in the future.

Conversions between @code{mpz_class} and @code{double}, however, are defined
to follow the corresponding C functions @code{mpz_get_d} and @code{mpz_set_d}.
And comparisons are always made exactly, as per @code{mpz_cmp_d}.


@node C++ Interface Rationals, C++ Interface Floats, C++ Interface Integers, C++ Class Interface
@section C++ Interface Rationals

In all the following constructors, if a fraction is given then it should be in
canonical form, or if not then @code{mpq_class::canonicalize} must be called.

@deftypefun {} mpq_class::mpq_class (type @var{op})
@deftypefunx {} mpq_class::mpq_class (integer @var{num}, integer @var{den})
Construct an @code{mpq_class}. The initial value can be a single value of any
type (conversion from @code{mpf_class} is @code{explicit}), or a pair of
integers (@code{mpz_class} or standard C++ integer types) representing a
fraction, except that @code{long long} and @code{long double} are not
supported. For example,

@example
mpq_class q (99);
mpq_class q (1.75);
mpq_class q (1, 3);
@end example
@end deftypefun

@deftypefun explicit mpq_class::mpq_class (const mpq_t @var{q})
Construct an @code{mpq_class} from an @code{mpq_t}. The value in @var{q} is
copied into the new @code{mpq_class}, there won't be any permanent association
between it and @var{q}.
@end deftypefun

@deftypefun explicit mpq_class::mpq_class (const char *@var{s}, int @var{base} = 0)
@deftypefunx explicit mpq_class::mpq_class (const string& @var{s}, int @var{base} = 0)
Construct an @code{mpq_class} converted from a string using @code{mpq_set_str}
(@pxref{Initializing Rationals}).

If the string is not a valid rational, an @code{std::invalid_argument}
exception is thrown. The same applies to @code{operator=}.
@end deftypefun

@deftypefun mpq_class operator"" _mpq (const char *@var{str})
With C++11 compilers, integral rationals can be constructed with the syntax
@code{123_mpq} which is equivalent to @code{mpq_class(123_mpz)}. Other
rationals can be built as @code{-1_mpq/2} or @code{0xb_mpq/123456_mpz}.
@end deftypefun

@deftypefun void mpq_class::canonicalize ()
Put an @code{mpq_class} into canonical form, as per @ref{Rational Number
Functions}. All arithmetic operators require their operands in canonical
form, and will return results in canonical form.
@end deftypefun

@deftypefun mpq_class abs (mpq_class @var{op})
@deftypefunx int cmp (mpq_class @var{op1}, type @var{op2})
@deftypefunx int cmp (type @var{op1}, mpq_class @var{op2})
@maybepagebreak
@deftypefunx double mpq_class::get_d (void)
@deftypefunx string mpq_class::get_str (int @var{base} = 10)
@maybepagebreak
@deftypefunx int mpq_class::set_str (const char *@var{str}, int @var{base})
@deftypefunx int mpq_class::set_str (const string& @var{str}, int @var{base})
@deftypefunx int sgn (mpq_class @var{op})
@maybepagebreak
@deftypefunx void mpq_class::swap (mpq_class& @var{op})
@deftypefunx void swap (mpq_class& @var{op1}, mpq_class& @var{op2})
These functions provide a C++ class interface to the corresponding GMP C
routines.

@code{cmp} can be used with any of the classes or the standard C++ types,
except @code{long long} and @code{long double}.
@end deftypefun

@deftypefun {mpz_class&} mpq_class::get_num ()
@deftypefunx {mpz_class&} mpq_class::get_den ()
Get a reference to an @code{mpz_class} which is the numerator or denominator
of an @code{mpq_class}. This can be used both for read and write access. If
the object returned is modified, it modifies the original @code{mpq_class}.

If direct manipulation might produce a non-canonical value, then
@code{mpq_class::canonicalize} must be called before further operations.
@end deftypefun

@deftypefun mpz_t mpq_class::get_num_mpz_t ()
@deftypefunx mpz_t mpq_class::get_den_mpz_t ()
Get a reference to the underlying @code{mpz_t} numerator or denominator of an
@code{mpq_class}. This can be passed to C functions expecting an
@code{mpz_t}. Any modifications made to the @code{mpz_t} will modify the
original @code{mpq_class}.

If direct manipulation might produce a non-canonical value, then
@code{mpq_class::canonicalize} must be called before further operations.
@end deftypefun

@deftypefun istream& operator>> (istream& @var{stream}, mpq_class& @var{rop});
Read @var{rop} from @var{stream}, using its @code{ios} formatting settings,
the same as @code{mpq_t operator>>} (@pxref{C++ Formatted Input}).

If the @var{rop} read might not be in canonical form then
@code{mpq_class::canonicalize} must be called.
@end deftypefun


@node C++ Interface Floats, C++ Interface Random Numbers, C++ Interface Rationals, C++ Class Interface
@section C++ Interface Floats

When an expression requires the use of temporary intermediate @code{mpf_class}
values, like @code{f=g*h+x*y}, those temporaries will have the same precision
as the destination @code{f}.
Explicit constructors can be used if this
doesn't suit.

@deftypefun {} mpf_class::mpf_class (type @var{op})
@deftypefunx {} mpf_class::mpf_class (type @var{op}, mp_bitcnt_t @var{prec})
Construct an @code{mpf_class}. Any standard C++ type can be used, except
@code{long long} and @code{long double}, and any of the GMP C++ classes can be
used.

If @var{prec} is given, the initial precision is that value, in bits. If
@var{prec} is not given, then the initial precision is determined by the type
of @var{op} given. An @code{mpz_class}, @code{mpq_class}, or C++
builtin type will give the default @code{mpf} precision (@pxref{Initializing
Floats}). An @code{mpf_class} or expression will give the precision of that
value. The precision of a binary expression is the higher of the two
operands.

@example
mpf_class f(1.5);        // default precision
mpf_class f(1.5, 500);   // 500 bits (at least)
mpf_class f(x);          // precision of x
mpf_class f(abs(x));     // precision of x
mpf_class f(-g, 1000);   // 1000 bits (at least)
mpf_class f(x+y);        // greater of precisions of x and y
@end example
@end deftypefun

@deftypefun explicit mpf_class::mpf_class (const mpf_t @var{f})
@deftypefunx {} mpf_class::mpf_class (const mpf_t @var{f}, mp_bitcnt_t @var{prec})
Construct an @code{mpf_class} from an @code{mpf_t}. The value in @var{f} is
copied into the new @code{mpf_class}, there won't be any permanent association
between it and @var{f}.

If @var{prec} is given, the initial precision is that value, in bits. If
@var{prec} is not given, then the initial precision is that of @var{f}.
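
As an illustrative sketch of the two forms, the following copies an
@code{mpf_t} and widens the copy when the source precision is below a
requested minimum. The name @code{copy_with_min_prec} is hypothetical, it is
not a GMP function.

@example
#include <gmpxx.h>

// Hypothetical helper: copy f, giving the copy at least
// min_prec bits of precision.
mpf_class
copy_with_min_prec (const mpf_t f, mp_bitcnt_t min_prec)
@{
  if (mpf_get_prec (f) < min_prec)
    return mpf_class (f, min_prec);   // precision given explicitly
  return mpf_class (f);               // precision taken from f
@}
@end example

Since the value is copied, later changes to @var{f} do not affect the
returned @code{mpf_class}.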
@end deftypefun

@deftypefun explicit mpf_class::mpf_class (const char *@var{s})
@deftypefunx {} mpf_class::mpf_class (const char *@var{s}, mp_bitcnt_t @var{prec}, int @var{base} = 0)
@deftypefunx explicit mpf_class::mpf_class (const string& @var{s})
@deftypefunx {} mpf_class::mpf_class (const string& @var{s}, mp_bitcnt_t @var{prec}, int @var{base} = 0)
Construct an @code{mpf_class} converted from a string using @code{mpf_set_str}
(@pxref{Assigning Floats}). If @var{prec} is given, the initial precision is
that value, in bits. If not, the default @code{mpf} precision
(@pxref{Initializing Floats}) is used.

If the string is not a valid float, an @code{std::invalid_argument} exception
is thrown. The same applies to @code{operator=}.
@end deftypefun

@deftypefun mpf_class operator"" _mpf (const char *@var{str})
With C++11 compilers, floats can be constructed with the syntax
@code{1.23e-1_mpf} which is equivalent to @code{mpf_class("1.23e-1")}.
@end deftypefun

@deftypefun {mpf_class&} mpf_class::operator= (type @var{op})
Convert and store the given @var{op} value to an @code{mpf_class} object. The
same types are accepted as for the constructors above.

Note that @code{operator=} only stores a new value, it doesn't copy or change
the precision of the destination, instead the value is truncated if necessary.
This is the same as @code{mpf_set} etc. Note in particular this means for
@code{mpf_class} a copy constructor is not the same as a default constructor
plus assignment.

@example
mpf_class x (y);   // x created with precision of y

mpf_class x;       // x created with default precision
x = y;             // value truncated to that precision
@end example

Applications using templated code may need to be careful about the assumptions
the code makes in this area, when working with @code{mpf_class} values of
various different or non-default precisions. For instance implementations of
the standard @code{complex} template have been seen in both styles above,
though of course @code{complex} is normally only actually specified for use
with the builtin float types.
@end deftypefun

@deftypefun mpf_class abs (mpf_class @var{op})
@deftypefunx mpf_class ceil (mpf_class @var{op})
@deftypefunx int cmp (mpf_class @var{op1}, type @var{op2})
@deftypefunx int cmp (type @var{op1}, mpf_class @var{op2})
@maybepagebreak
@deftypefunx bool mpf_class::fits_sint_p (void)
@deftypefunx bool mpf_class::fits_slong_p (void)
@deftypefunx bool mpf_class::fits_sshort_p (void)
@maybepagebreak
@deftypefunx bool mpf_class::fits_uint_p (void)
@deftypefunx bool mpf_class::fits_ulong_p (void)
@deftypefunx bool mpf_class::fits_ushort_p (void)
@maybepagebreak
@deftypefunx mpf_class floor (mpf_class @var{op})
@deftypefunx mpf_class hypot (mpf_class @var{op1}, mpf_class @var{op2})
@maybepagebreak
@deftypefunx double mpf_class::get_d (void)
@deftypefunx long mpf_class::get_si (void)
@deftypefunx string mpf_class::get_str (mp_exp_t& @var{exp}, int @var{base} = 10, size_t @var{digits} = 0)
@deftypefunx {unsigned long} mpf_class::get_ui (void)
@maybepagebreak
@deftypefunx int mpf_class::set_str (const char *@var{str}, int @var{base})
@deftypefunx int mpf_class::set_str (const string& @var{str}, int @var{base})
@deftypefunx int sgn (mpf_class @var{op})
@deftypefunx mpf_class sqrt (mpf_class @var{op})
@maybepagebreak
@deftypefunx void mpf_class::swap (mpf_class& @var{op})
@deftypefunx void swap (mpf_class& @var{op1}, mpf_class& @var{op2})
@deftypefunx mpf_class trunc (mpf_class @var{op})
These functions provide a C++ class interface to the corresponding GMP C
routines.

@code{cmp} can be used with any of the classes or the standard C++ types,
except @code{long long} and @code{long double}.

The accuracy provided by @code{hypot} is not currently guaranteed.
@end deftypefun

@deftypefun {mp_bitcnt_t} mpf_class::get_prec ()
@deftypefunx void mpf_class::set_prec (mp_bitcnt_t @var{prec})
@deftypefunx void mpf_class::set_prec_raw (mp_bitcnt_t @var{prec})
Get or set the current precision of an @code{mpf_class}.

The restrictions described for @code{mpf_set_prec_raw} (@pxref{Initializing
Floats}) apply to @code{mpf_class::set_prec_raw}. Note in particular that the
@code{mpf_class} must be restored to its allocated precision before being
destroyed. This must be done by application code, there's no automatic
mechanism for it.
@end deftypefun


@node C++ Interface Random Numbers, C++ Interface Limitations, C++ Interface Floats, C++ Class Interface
@section C++ Interface Random Numbers

@deftp Class gmp_randclass
The C++ class interface to the GMP random number functions uses
@code{gmp_randclass} to hold an algorithm selection and current state, as per
@code{gmp_randstate_t}.
@end deftp

@deftypefun {} gmp_randclass::gmp_randclass (void (*@var{randinit}) (gmp_randstate_t, @dots{}), @dots{})
Construct a @code{gmp_randclass}, using a call to the given @var{randinit}
function (@pxref{Random State Initialization}). The arguments expected are
the same as @var{randinit}, but with @code{mpz_class} instead of @code{mpz_t}.
For example,

@example
gmp_randclass r1 (gmp_randinit_default);
gmp_randclass r2 (gmp_randinit_lc_2exp_size, 32);
gmp_randclass r3 (gmp_randinit_lc_2exp, a, c, m2exp);
gmp_randclass r4 (gmp_randinit_mt);
@end example

@code{gmp_randinit_lc_2exp_size} will fail if the size requested is too big,
an @code{std::length_error} exception is thrown in that case.
@end deftypefun

@deftypefun {} gmp_randclass::gmp_randclass (gmp_randalg_t @var{alg}, @dots{})
Construct a @code{gmp_randclass} using the same parameters as
@code{gmp_randinit} (@pxref{Random State Initialization}). This function is
obsolete and the above @var{randinit} style should be preferred.
@end deftypefun

@deftypefun void gmp_randclass::seed (unsigned long int @var{s})
@deftypefunx void gmp_randclass::seed (mpz_class @var{s})
Seed a random number generator. @xref{Random Number Functions}, for how
to choose a good seed.
@end deftypefun

@deftypefun mpz_class gmp_randclass::get_z_bits (mp_bitcnt_t @var{bits})
@deftypefunx mpz_class gmp_randclass::get_z_bits (mpz_class @var{bits})
Generate a random integer with a specified number of bits.
@end deftypefun

@deftypefun mpz_class gmp_randclass::get_z_range (mpz_class @var{n})
Generate a random integer in the range 0 to @math{@var{n}-1} inclusive.
@end deftypefun

@deftypefun mpf_class gmp_randclass::get_f ()
@deftypefunx mpf_class gmp_randclass::get_f (mp_bitcnt_t @var{prec})
Generate a random float @var{f} in the range @math{0 <= @var{f} < 1}. @var{f}
will be to @var{prec} bits precision, or if @var{prec} is not given then to
the precision of the destination. For example,

@example
gmp_randclass r;
...
mpf_class f (0, 512);   // 512 bits precision
f = r.get_f();          // random number, 512 bits
@end example
@end deftypefun



@node C++ Interface Limitations, , C++ Interface Random Numbers, C++ Class Interface
@section C++ Interface Limitations

@table @asis
@item @code{mpq_class} and Templated Reading
A generic piece of template code probably won't know that @code{mpq_class}
requires a @code{canonicalize} call if inputs read with @code{operator>>}
might be non-canonical. This can lead to incorrect results.

@code{operator>>} behaves as it does for reasons of efficiency. A
canonicalize can be quite time consuming on large operands, and is best
avoided if it's not necessary.

But this potential difficulty reduces the usefulness of @code{mpq_class}.
Perhaps a mechanism to tell @code{operator>>} what to do will be adopted in
the future, maybe a preprocessor define, a global flag, or an @code{ios} flag
pressed into service. Or maybe, at the risk of inconsistency, the
@code{mpq_class} @code{operator>>} could canonicalize and leave @code{mpq_t}
@code{operator>>} not doing so, for use on those occasions when that's
acceptable. Send feedback or alternate ideas to @email{gmp-bugs@@gmplib.org}.

@item Subclassing
Subclassing the GMP C++ classes works, but is not currently recommended.

Expressions involving subclasses resolve correctly (or seem to), but in normal
C++ fashion the subclass doesn't inherit constructors and assignments.
There are many of those in the GMP classes, and a good way to reestablish them
in a subclass is not yet provided.

@item Templated Expressions
A subtle difficulty exists when using expressions together with
application-defined template functions.
Consider the following, with @code{T}
intended to be some numeric type,

@example
template <class T>
T fun (const T &, const T &);
@end example

@noindent
When used with, say, plain @code{mpz_class} variables, it works fine: @code{T}
is resolved as @code{mpz_class}.

@example
mpz_class f(1), g(2);
fun (f, g);    // Good
@end example

@noindent
But when one of the arguments is an expression, it doesn't work.

@example
mpz_class f(1), g(2), h(3);
fun (f, g+h);  // Bad
@end example

This is because @code{g+h} ends up being a certain expression template type
internal to @code{gmpxx.h}, which the C++ template resolution rules are unable
to automatically convert to @code{mpz_class}. The workaround is simply to add
an explicit cast.

@example
mpz_class f(1), g(2), h(3);
fun (f, mpz_class(g+h));  // Good
@end example

Similarly, within @code{fun} it may be necessary to cast an expression to type
@code{T} when calling a templated @code{fun2}.

@example
template <class T>
void fun (T f, T g)
@{
  fun2 (f, f+g);     // Bad
@}

template <class T>
void fun (T f, T g)
@{
  fun2 (f, T(f+g));  // Good
@}
@end example

@item C++11
C++11 provides several new ways in which types can be inferred: @code{auto},
@code{decltype}, etc. While they can be very convenient, they don't mix well
with expression templates. In this example, the addition is performed twice,
as if we had defined @code{sum} as a macro.

@example
mpz_class z = 33;
auto sum = z + z;
mpz_class prod = sum * sum;
@end example

This other example may crash, though some compilers might make it look like
it is working, because the expression @code{z+z} goes out of scope before it
is evaluated.

@example
mpz_class z = 33;
auto sum = z + z + z;
mpz_class prod = sum * 2;
@end example

It is thus strongly recommended to avoid @code{auto} anywhere a GMP C++
expression may appear.
@end table


@node Custom Allocation, Language Bindings, C++ Class Interface, Top
@comment node-name, next, previous, up
@chapter Custom Allocation
@cindex Custom allocation
@cindex Memory allocation
@cindex Allocation of memory

By default GMP uses @code{malloc}, @code{realloc} and @code{free} for memory
allocation, and if they fail GMP prints a message to the standard error output
and terminates the program.

Alternate functions can be specified, to allocate memory in a different way or
to have a different error action on running out of memory.

@deftypefun void mp_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t, size_t), @* void (*@var{free_func_ptr}) (void *, size_t))
Replace the current allocation functions from the arguments. If an argument
is @code{NULL}, the corresponding default function is used.

These functions will be used for all memory allocation done by GMP, apart from
temporary space from @code{alloca} if that function is available and GMP is
configured to use it (@pxref{Build Options}).

@strong{Be sure to call @code{mp_set_memory_functions} only when there are no
active GMP objects allocated using the previous memory functions! Usually
that means calling it before any other GMP function.}
@end deftypefun

The functions supplied should fit the following declarations:

@deftypevr Function {void *} allocate_function (size_t @var{alloc_size})
Return a pointer to newly allocated space with at least @var{alloc_size}
bytes.
@end deftypevr

@deftypevr Function {void *} reallocate_function (void *@var{ptr}, size_t @var{old_size}, size_t @var{new_size})
Resize a previously allocated block @var{ptr} of @var{old_size} bytes to be
@var{new_size} bytes.

The block may be moved if necessary or if desired, and in that case the
smaller of @var{old_size} and @var{new_size} bytes must be copied to the new
location. The return value is a pointer to the resized block, that being the
new location if moved or just @var{ptr} if not.

@var{ptr} is never @code{NULL}, it's always a previously allocated block.
@var{new_size} may be bigger or smaller than @var{old_size}.
@end deftypevr

@deftypevr Function void free_function (void *@var{ptr}, size_t @var{size})
De-allocate the space pointed to by @var{ptr}.

@var{ptr} is never @code{NULL}, it's always a previously allocated block of
@var{size} bytes.
@end deftypevr

A @dfn{byte} here means the unit used by the @code{sizeof} operator.

The @var{reallocate_function} parameter @var{old_size} and the
@var{free_function} parameter @var{size} are passed for convenience, but of
course they can be ignored if not needed by an implementation. The default
functions using @code{malloc} and friends for instance don't use them.

No error return is allowed from any of these functions, if they return then
they must have performed the specified operation. In particular note that
@var{allocate_function} or @var{reallocate_function} mustn't return
@code{NULL}.

Getting a different fatal error action is a good use for custom allocation
functions, for example giving a graphical dialog rather than the default print
to @code{stderr}. How much is possible when genuinely out of memory is
another question though.
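
As a rough sketch of functions fitting the declarations above, the following
wraps @code{malloc}, @code{realloc} and @code{free} and changes only the
error action. The names @code{out_of_memory}, @code{xalloc},
@code{xrealloc}, @code{xfree} and @code{install_allocators}, and the message
text, are hypothetical and chosen just for this example.

@example
#include <cstdio>
#include <cstdlib>
#include <gmp.h>

// Hypothetical error action: report and terminate, since
// returning NULL to GMP is not allowed.
static void
out_of_memory (size_t size)
@{
  std::fprintf (stderr, "myprog: cannot allocate %lu bytes\n",
                (unsigned long) size);
  std::abort ();
@}

extern "C" @{

static void *
xalloc (size_t alloc_size)
@{
  void *p = std::malloc (alloc_size);
  if (p == NULL)
    out_of_memory (alloc_size);
  return p;
@}

static void *
xrealloc (void *ptr, size_t old_size, size_t new_size)
@{
  (void) old_size;   // not needed when wrapping realloc
  void *p = std::realloc (ptr, new_size);
  if (p == NULL)
    out_of_memory (new_size);
  return p;
@}

static void
xfree (void *ptr, size_t size)
@{
  (void) size;       // not needed when wrapping free
  std::free (ptr);
@}

@}

// Call this before any other GMP function.
void
install_allocators (void)
@{
  mp_set_memory_functions (xalloc, xrealloc, xfree);
@}
@end example

An application wanting a different fatal error action would presumably
change only @code{out_of_memory}, which must still terminate the program
rather than return.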

There's currently no defined way for the allocation functions to recover from
an error such as out of memory, they must terminate program execution. A
@code{longjmp} or throwing a C++ exception will have undefined results. This
may change in the future.

GMP may use allocated blocks to hold pointers to other allocated blocks. This
will limit the assumptions a conservative garbage collection scheme can make.

Since the default GMP allocation uses @code{malloc} and friends, those
functions will be linked in even if the first thing a program does is an
@code{mp_set_memory_functions}. It's necessary to change the GMP sources if
this is a problem.

@sp 1
@deftypefun void mp_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t, size_t), @* void (**@var{free_func_ptr}) (void *, size_t))
Get the current allocation functions, storing function pointers to the
locations given by the arguments. If an argument is @code{NULL}, that
function pointer is not stored.

@need 1000
For example, to get just the current free function,

@example
void (*freefunc) (void *, size_t);

mp_get_memory_functions (NULL, NULL, &freefunc);
@end example
@end deftypefun

@node Language Bindings, Algorithms, Custom Allocation, Top
@chapter Language Bindings
@cindex Language bindings
@cindex Other languages

The following packages and projects offer access to GMP from languages other
than C, though perhaps with varying levels of functionality and efficiency.

@c @spaceuref{U} is the same as @uref{U}, but with a couple of extra spaces
@c in tex, just to separate the URL from the preceding text a bit.
@iftex
@macro spaceuref {U}
@ @ @uref{\U\}
@end macro
@end iftex
@ifnottex
@macro spaceuref {U}
@uref{\U\}
@end macro
@end ifnottex

@sp 1
@table @asis
@item C++
@itemize @bullet
@item
GMP C++ class interface, @pxref{C++ Class Interface} @* Straightforward
interface, expression templates to eliminate temporaries.
@item
ALP @spaceuref{https://www-sop.inria.fr/saga/logiciels/ALP/} @* Linear algebra and
polynomials using templates.
@item
Arithmos @spaceuref{http://cant.ua.ac.be/old/arithmos/} @* Rationals
with infinities and square roots.
@item
CLN @spaceuref{http://www.ginac.de/CLN/} @* High level classes for arithmetic.
@item
Linbox @spaceuref{http://www.linalg.org/} @* Sparse vectors and matrices.
@item
NTL @spaceuref{http://www.shoup.net/ntl/} @* A C++ number theory library.
@end itemize

@c @item D
@c @itemize @bullet
@c @item
@c gmp-d @spaceuref{http://home.comcast.net/~benhinkle/gmp-d/}
@c @end itemize

@item Eiffel
@itemize @bullet
@item
Eiffelroom @spaceuref{http://www.eiffelroom.org/node/442}
@end itemize

@c @item Fortran
@c @itemize @bullet
@c @item
@c Omni F77 @spaceuref{http://phase.hpcc.jp/Omni/home.html} @* Arbitrary
@c precision floats.
@c @end itemize

@item Haskell
@itemize @bullet
@item
Glasgow Haskell Compiler @spaceuref{https://www.haskell.org/ghc/}
@end itemize

@item Java
@itemize @bullet
@item
Kaffe @spaceuref{https://github.com/kaffe/kaffe}
@end itemize

@item Lisp
@itemize @bullet
@item
GNU Common Lisp @spaceuref{https://www.gnu.org/software/gcl/gcl.html}
@item
Librep @spaceuref{http://librep.sourceforge.net/}
@item
@c FIXME: When there's a stable release with gmp support, just refer to it
@c rather than bothering to talk about betas.
XEmacs (21.5.18 beta and up) @spaceuref{http://www.xemacs.org} @* Optional
big integers, rationals and floats using GMP.
@end itemize

@item M4
@itemize @bullet
@item
@c FIXME: When there's a stable release with gmp support, just refer to it
@c rather than bothering to talk about betas.
GNU m4 betas @spaceuref{http://www.seindal.dk/rene/gnu/} @* Optionally provides
an arbitrary precision @code{mpeval}.
@end itemize

@item ML
@itemize @bullet
@item
MLton compiler @spaceuref{http://mlton.org/}
@end itemize

@item Objective Caml
@itemize @bullet
@item
MLGMP @spaceuref{http://opam.ocamlpro.com/pkg/mlgmp.20120224.html}
@item
Numerix @spaceuref{http://pauillac.inria.fr/~quercia/} @* Optionally using
GMP.
@end itemize

@item Oz
@itemize @bullet
@item
Mozart @spaceuref{http://mozart.github.io/}
@end itemize

@item Pascal
@itemize @bullet
@item
GNU Pascal Compiler @spaceuref{http://www.gnu-pascal.de/} @* GMP unit.
@item
Numerix @spaceuref{http://pauillac.inria.fr/~quercia/} @* For Free Pascal,
optionally using GMP.
@end itemize

@item Perl
@itemize @bullet
@item
GMP module, see @file{demos/perl} in the GMP sources (@pxref{Demonstration
Programs}).
@item
Math::GMP @spaceuref{http://www.cpan.org/} @* Compatible with Math::BigInt, but
not as many functions as the GMP module above.
@item
Math::BigInt::GMP @spaceuref{http://www.cpan.org/} @* Plug Math::GMP into
normal Math::BigInt operations.
@end itemize

@need 1000
@item Pike
@itemize @bullet
@item
mpz module in the standard distribution, @uref{http://pike.ida.liu.se/}
@end itemize

@need 500
@item Prolog
@itemize @bullet
@item
SWI Prolog @spaceuref{http://www.swi-prolog.org/} @*
Arbitrary precision floats.
@end itemize

@item Python
@itemize @bullet
@item
GMPY @uref{https://code.google.com/p/gmpy/}
@end itemize

@item Ruby
@itemize @bullet
@item
@uref{http://rubygems.org/gems/gmp}
@end itemize

@item Scheme
@itemize @bullet
@item
GNU Guile @spaceuref{https://www.gnu.org/software/guile/guile.html}
@item
RScheme @spaceuref{http://www.rscheme.org/}
@item
STklos @spaceuref{http://www.stklos.net/}
@c
@c For reference, MzScheme uses some of gmp, but (as of version 205) it only
@c has copies of some of the generic C code, and we don't consider that a
@c language binding to gmp.
@c
@end itemize

@item Smalltalk
@itemize @bullet
@item
GNU Smalltalk @spaceuref{http://www.smalltalk.org/versions/GNUSmalltalk.html}
@end itemize

@item Other
@itemize @bullet
@item
Axiom @uref{https://savannah.nongnu.org/projects/axiom} @* Computer algebra
using GCL.
@item
DrGenius @spaceuref{http://drgenius.seul.org/} @* Geometry system and
mathematical programming language.
@item
GiNaC @spaceuref{http://www.ginac.de/} @* C++ computer algebra using CLN.
@item
GOO @spaceuref{https://www.eecs.berkeley.edu/~jrb/goo/} @* Dynamic object oriented
language.
@item
Maxima @uref{https://www.ma.utexas.edu/users/wfs/maxima.html} @* Macsyma
computer algebra using GCL.
@c @item
@c Q @spaceuref{http://q-lang.sourceforge.net/} @* Equational programming system.
@item
Regina @spaceuref{http://regina.sourceforge.net/} @* Topological calculator.
@item
Yacas @spaceuref{http://yacas.sourceforge.net} @* Yet another computer algebra system.
@end itemize

@end table


@node Algorithms, Internals, Language Bindings, Top
@chapter Algorithms
@cindex Algorithms

This chapter is an introduction to some of the algorithms used for various GMP
operations.
The code is likely to be hard to understand without knowing
something about the algorithms.

Some GMP internals are mentioned, but applications that expect to be
compatible with future GMP releases should take care to use only the
documented functions.

@menu
* Multiplication Algorithms::
* Division Algorithms::
* Greatest Common Divisor Algorithms::
* Powering Algorithms::
* Root Extraction Algorithms::
* Radix Conversion Algorithms::
* Other Algorithms::
* Assembly Coding::
@end menu


@node Multiplication Algorithms, Division Algorithms, Algorithms, Algorithms
@section Multiplication
@cindex Multiplication algorithms

N@cross{}N limb multiplications and squares are done using one of seven
algorithms, as the size N increases.

@quotation
@multitable {KaratsubaMMM} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item Algorithm @tab Threshold
@item Basecase  @tab (none)
@item Karatsuba @tab @code{MUL_TOOM22_THRESHOLD}
@item Toom-3    @tab @code{MUL_TOOM33_THRESHOLD}
@item Toom-4    @tab @code{MUL_TOOM44_THRESHOLD}
@item Toom-6.5  @tab @code{MUL_TOOM6H_THRESHOLD}
@item Toom-8.5  @tab @code{MUL_TOOM8H_THRESHOLD}
@item FFT       @tab @code{MUL_FFT_THRESHOLD}
@end multitable
@end quotation

Similarly for squaring, with the @code{SQR} thresholds.

N@cross{}M multiplications of operands with different sizes above
@code{MUL_TOOM22_THRESHOLD} are currently done by special Toom-inspired
algorithms or directly with FFT, depending on operand size (@pxref{Unbalanced
Multiplication}).

@menu
* Basecase Multiplication::
* Karatsuba Multiplication::
* Toom 3-Way Multiplication::
* Toom 4-Way Multiplication::
* Higher degree Toom'n'half::
* FFT Multiplication::
* Other Multiplication::
* Unbalanced Multiplication::
@end menu


@node Basecase Multiplication, Karatsuba Multiplication, Multiplication Algorithms, Multiplication Algorithms
@subsection Basecase Multiplication

Basecase N@cross{}M multiplication is a straightforward rectangular set of
cross-products, the same as long multiplication done by hand and for that
reason sometimes known as the schoolbook or grammar school method. This is an
@m{O(NM),O(N*M)} algorithm. See Knuth section 4.3.1 algorithm M
(@pxref{References}), and the @file{mpn/generic/mul_basecase.c} code.

Assembly implementations of @code{mpn_mul_basecase} are essentially the same
as the generic C code, but have all the usual assembly tricks and
obscurities introduced for speed.

A square can be done in roughly half the time of a multiply, by using the fact
that the cross products above and below the diagonal are the same. A triangle
of products below the diagonal is formed, doubled (left shift by one bit), and
then the products on the diagonal added. This can be seen in
@file{mpn/generic/sqr_basecase.c}. Again the assembly implementations take
essentially the same approach.
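
As a rough illustration of the two schemes just described, here is a small
self-contained C sketch. It is not the code in
@file{mpn/generic/mul_basecase.c} or @file{mpn/generic/sqr_basecase.c}. The
names @code{limb_t}, @code{sketch_mul_basecase} and
@code{sketch_sqr_basecase} are made up for this example, and a fixed 32-bit
limb with a 64-bit intermediate is assumed purely for simplicity.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1], least significant limb first.
   A plain rectangle of un*vn cross products.  */
static void
sketch_mul_basecase (limb_t *rp, const limb_t *up, size_t un,
                     const limb_t *vp, size_t vn)
{
  assert (un >= 1 && vn >= 1);
  for (size_t i = 0; i < un + vn; i++)
    rp[i] = 0;

  for (size_t j = 0; j < vn; j++)
    {
      uint64_t carry = 0;
      for (size_t i = 0; i < un; i++)
        {
          /* cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64-1 */
          uint64_t t = (uint64_t) up[i] * vp[j] + rp[i + j] + carry;
          rp[i + j] = (limb_t) t;
          carry = t >> 32;
        }
      rp[j + un] = (limb_t) carry;
    }
}

/* rp[0..2n-1] = up[0..n-1]^2, using the triangle/double/diagonal scheme.  */
static void
sketch_sqr_basecase (limb_t *rp, const limb_t *up, size_t n)
{
  assert (n >= 1);
  for (size_t i = 0; i < 2 * n; i++)
    rp[i] = 0;

  /* Triangle of cross products up[i]*up[j] for i < j.  */
  for (size_t j = 1; j < n; j++)
    {
      uint64_t carry = 0;
      for (size_t i = 0; i < j; i++)
        {
          uint64_t t = (uint64_t) up[i] * up[j] + rp[i + j] + carry;
          rp[i + j] = (limb_t) t;
          carry = t >> 32;
        }
      rp[2 * j] = (limb_t) carry;
    }

  /* Double the triangle: left shift by one bit across all limbs.  */
  limb_t top = 0;
  for (size_t i = 0; i < 2 * n; i++)
    {
      limb_t next = rp[i] >> 31;
      rp[i] = (rp[i] << 1) | top;
      top = next;
    }

  /* Add the diagonal products up[i]^2 at limb position 2*i.  */
  uint64_t carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t sq = (uint64_t) up[i] * up[i];
      uint64_t t = (uint64_t) rp[2 * i] + (limb_t) sq + carry;
      rp[2 * i] = (limb_t) t;
      t = (uint64_t) rp[2 * i + 1] + (sq >> 32) + (t >> 32);
      rp[2 * i + 1] = (limb_t) t;
      carry = t >> 32;
    }
}
```
@end verbatim

Calling both routines on the same operand should give identical limbs, the
squaring routine having formed only about half as many cross products.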

@tex
\def\GMPline#1#2#3#4#5#6{%
  \hbox {%
    \vrule height 2.5ex depth 1ex
           \hbox to 2em {\hfil{#2}\hfil}%
    \vrule \hbox to 2em {\hfil{#3}\hfil}%
    \vrule \hbox to 2em {\hfil{#4}\hfil}%
    \vrule \hbox to 2em {\hfil{#5}\hfil}%
    \vrule \hbox to 2em {\hfil{#6}\hfil}%
    \vrule}}
\GMPdisplay{
  \hbox{%
    \vbox{%
      \hbox to 1.5em {\vrule height 2.5ex depth 1ex width 0pt}%
      \hbox {\vrule height 2.5ex depth 1ex width 0pt u0\hfil}%
      \hbox {\vrule height 2.5ex depth 1ex width 0pt u1\hfil}%
      \hbox {\vrule height 2.5ex depth 1ex width 0pt u2\hfil}%
      \hbox {\vrule height 2.5ex depth 1ex width 0pt u3\hfil}%
      \hbox {\vrule height 2.5ex depth 1ex width 0pt u4\hfil}%
      \vfill}%
    \vbox{%
      \hbox{%
        \hbox to 2em {\hfil u0\hfil}%
        \hbox to 2em {\hfil u1\hfil}%
        \hbox to 2em {\hfil u2\hfil}%
        \hbox to 2em {\hfil u3\hfil}%
        \hbox to 2em {\hfil u4\hfil}}%
      \vskip 0.7ex
      \hrule
      \GMPline{u0}{d}{}{}{}{}%
      \hrule
      \GMPline{u1}{}{d}{}{}{}%
      \hrule
      \GMPline{u2}{}{}{d}{}{}%
      \hrule
      \GMPline{u3}{}{}{}{d}{}%
      \hrule
      \GMPline{u4}{}{}{}{}{d}%
      \hrule}}}
@end tex
@ifnottex
@example
@group
     u0  u1  u2  u3  u4
   +---+---+---+---+---+
u0 | d |   |   |   |   |
   +---+---+---+---+---+
u1 |   | d |   |   |   |
   +---+---+---+---+---+
u2 |   |   | d |   |   |
   +---+---+---+---+---+
u3 |   |   |   | d |   |
   +---+---+---+---+---+
u4 |   |   |   |   | d |
   +---+---+---+---+---+
@end group
@end example
@end ifnottex

In practice squaring isn't a full 2@cross{} faster than multiplying, it's
usually around 1.5@cross{}. Less than 1.5@cross{} probably indicates
@code{mpn_sqr_basecase} wants improving on that CPU.

On some CPUs @code{mpn_mul_basecase} can be faster than the generic C
@code{mpn_sqr_basecase} on some small sizes.
@code{SQR_BASECASE_THRESHOLD} is
the size at which to use @code{mpn_sqr_basecase}, this will be zero if that
routine should be used always.


@node Karatsuba Multiplication, Toom 3-Way Multiplication, Basecase Multiplication, Multiplication Algorithms
@subsection Karatsuba Multiplication
@cindex Karatsuba multiplication

The Karatsuba multiplication algorithm is described in Knuth section 4.3.3
part A, and various other textbooks. A brief description is given here.

The inputs @math{x} and @math{y} are treated as each split into two parts of
equal length (or the most significant part one limb shorter if N is odd).

@tex
% GMPboxwidth used for all the multiplication pictures
\global\newdimen\GMPboxwidth \global\GMPboxwidth=5em
% GMPboxdepth and GMPboxheight are also used for the float pictures
\global\newdimen\GMPboxdepth \global\GMPboxdepth=1ex
\global\newdimen\GMPboxheight \global\GMPboxheight=2ex
\gdef\GMPvrule{\vrule height \GMPboxheight depth \GMPboxdepth}
\def\GMPbox#1#2{%
  \vbox {%
    \hrule
    \hbox to 2\GMPboxwidth{%
      \GMPvrule \hfil $#1$\hfil \vrule \hfil $#2$\hfil \vrule}%
    \hrule}}
\GMPdisplay{%
\vbox{%
  \hbox to 2\GMPboxwidth {high \hfil low}
  \vskip 0.7ex
  \GMPbox{x_1}{x_0}
  \vskip 0.5ex
  \GMPbox{y_1}{y_0}
}}
@end tex
@ifnottex
@example
@group
 high              low
+----------+----------+
|    x1    |    x0    |
+----------+----------+

+----------+----------+
|    y1    |    y0    |
+----------+----------+
@end group
@end example
@end ifnottex

Let @math{b} be the power of 2 where the split occurs, i.e.@: if @ms{x,0} is
@math{k} limbs (@ms{y,0} the same) then
@m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}}, b=2^(k*mp_bits_per_limb)}.
With that @m{x=x_1b+x_0,x=x1*b+x0} and @m{y=y_1b+y_0,y=y1*b+y0}, and the
following holds,

@display
@m{xy = (b^2+b)x_1y_1 - b(x_1-x_0)(y_1-y_0) + (b+1)x_0y_0,
  x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0}
@end display

This formula means doing only three multiplies of (N/2)@cross{}(N/2) limbs,
whereas a basecase multiply of N@cross{}N limbs is equivalent to four
multiplies of (N/2)@cross{}(N/2). The factors @math{(b^2+b)} etc represent
the positions where the three products must be added.

@tex
\def\GMPboxA#1#2{%
  \vbox{%
    \hrule
    \hbox{%
      \GMPvrule
      \hbox to 2\GMPboxwidth {\hfil\hbox{$#1$}\hfil}%
      \vrule
      \hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%
      \vrule}
    \hrule}}
\def\GMPboxB#1#2{%
  \hbox{%
    \raise \GMPboxdepth \hbox to \GMPboxwidth {\hfil #1\hskip 0.5em}%
    \vbox{%
      \hrule
      \hbox{%
        \GMPvrule
        \hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%
        \vrule}%
      \hrule}}}
\GMPdisplay{%
\vbox{%
  \hbox to 4\GMPboxwidth {high \hfil low}
  \vskip 0.7ex
  \GMPboxA{x_1y_1}{x_0y_0}
  \vskip 0.5ex
  \GMPboxB{$+$}{x_1y_1}
  \vskip 0.5ex
  \GMPboxB{$+$}{x_0y_0}
  \vskip 0.5ex
  \GMPboxB{$-$}{(x_1-x_0)(y_1-y_0)}
}}
@end tex
@ifnottex
@example
@group
 high                              low
+--------+--------+ +--------+--------+
|      x1*y1      | |      x0*y0      |
+--------+--------+ +--------+--------+
          +--------+--------+
      add |      x1*y1      |
          +--------+--------+
          +--------+--------+
      add |      x0*y0      |
          +--------+--------+
          +--------+--------+
      sub | (x1-x0)*(y1-y0) |
          +--------+--------+
@end group
@end example
@end ifnottex

The term @m{(x_1-x_0)(y_1-y_0),(x1-x0)*(y1-y0)} is best calculated as an
absolute value, and the sign used to choose to add or subtract.
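
The formula and the sign handling can be tried out on a single machine word.
The following C sketch is only an illustration under simplifying assumptions:
it multiplies two 32-bit values using three 16@cross{}16 multiplies, so
@math{b=2^16}, and the name @code{sketch_karatsuba32} is made up for this
example. It is not how GMP's limb-vector code is organised.

@verbatim
```c
#include <stdint.h>

/* x*y from three half-size multiplies, with b = 2^16:
   x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0  */
static uint64_t
sketch_karatsuba32 (uint32_t x, uint32_t y)
{
  uint32_t x1 = x >> 16, x0 = x & 0xFFFF;
  uint32_t y1 = y >> 16, y0 = y & 0xFFFF;

  uint64_t hi = (uint64_t) x1 * y1;
  uint64_t lo = (uint64_t) x0 * y0;

  /* |x1-x0| * |y1-y0|, with the sign of the true product kept apart */
  uint32_t dx = x1 >= x0 ? x1 - x0 : x0 - x1;
  uint32_t dy = y1 >= y0 ? y1 - y0 : y0 - y1;
  int negative = (x1 >= x0) != (y1 >= y0);
  uint64_t mid = (uint64_t) dx * dy;

  /* (b^2+b)*hi + (b+1)*lo.  Unsigned arithmetic wraps modulo 2^64, which
     is harmless here since the final product is below 2^64.  */
  uint64_t r = (hi << 32) + (hi << 16) + (lo << 16) + lo;

  /* the sign chooses between adding and subtracting b*mid */
  if (negative)
    r += mid << 16;
  else
    r -= mid << 16;
  return r;
}
```
@end verbatim

For any inputs the result should equal the plain 64-bit product
@code{(uint64_t) x * y}.
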
Notice the
sum @m{\mathop{\rm high}(x_0y_0)+\mathop{\rm low}(x_1y_1),
high(x0*y0)+low(x1*y1)} occurs twice, so it's possible to do @m{5k,5*k} limb
additions, rather than @m{6k,6*k}, but in GMP extra function call overheads
outweigh the saving.

Squaring is similar to multiplying, but with @math{x=y} the formula reduces to
an equivalent with three squares,

@display
@m{x^2 = (b^2+b)x_1^2 - b(x_1-x_0)^2 + (b+1)x_0^2,
   x^2 = (b^2+b)*x1^2 - b*(x1-x0)^2 + (b+1)*x0^2}
@end display

The final result is accumulated from those three squares the same way as for
the three multiplies above. The middle term @m{(x_1-x_0)^2,(x1-x0)^2} is now
always positive.

A similar formula for both multiplying and squaring can be constructed with a
middle term @m{(x_1+x_0)(y_1+y_0),(x1+x0)*(y1+y0)}. But those sums can exceed
@math{k} limbs, leading to more carry handling and additions than the form
above.

Karatsuba multiplication is asymptotically an @math{O(N^@W{1.585})} algorithm,
the exponent being @m{\log3/\log2,log(3)/log(2)}, representing 3 multiplies
each @math{1/2} the size of the inputs. This is a big improvement over the
basecase multiply at @math{O(N^2)} and the advantage soon overcomes the extra
additions Karatsuba performs. @code{MUL_TOOM22_THRESHOLD} can be as little
as 10 limbs. The @code{SQR} threshold is usually about twice the @code{MUL}.

The basecase algorithm will take a time of the form @m{M(N) = aN^2 + bN + c,
M(N) = a*N^2 + b*N + c} and the Karatsuba algorithm @m{K(N) = 3M(N/2) + dN +
e, K(N) = 3*M(N/2) + d*N + e}, which expands to @m{K(N) = {3\over4} aN^2 +
{3\over2} bN + 3c + dN + e, K(N) = 3/4*a*N^2 + 3/2*b*N + 3*c + d*N + e}. The
factor @m{3\over4, 3/4} for @math{a} means per-crossproduct speedups in the
basecase code will increase the threshold since they benefit @math{M(N)} more
than @math{K(N)}.
And conversely the @m{3\over2, 3/2} for @math{b} means
linear style speedups of @math{b} will reduce the threshold since they
benefit @math{K(N)} more than @math{M(N)}. The latter can be seen for
instance when adding an optimized @code{mpn_sqr_diagonal} to
@code{mpn_sqr_basecase}. Of course all speedups reduce total time, and in
that sense the algorithm thresholds are merely of academic interest.


@node Toom 3-Way Multiplication, Toom 4-Way Multiplication, Karatsuba Multiplication, Multiplication Algorithms
@subsection Toom 3-Way Multiplication
@cindex Toom multiplication

The Karatsuba formula is the simplest case of a general approach to splitting
inputs that leads to both Toom and FFT algorithms. A description of
Toom can be found in Knuth section 4.3.3, with an example 3-way
calculation after Theorem A@. The 3-way form used in GMP is described here.

The operands are each considered split into 3 pieces of equal length (or the
most significant part 1 or 2 limbs shorter than the other two).

@tex
\def\GMPbox#1#2#3{%
  \vbox{%
    \hrule \vfil
    \hbox to 3\GMPboxwidth {%
      \GMPvrule
      \hfil$#1$\hfil
      \vrule
      \hfil$#2$\hfil
      \vrule
      \hfil$#3$\hfil
      \vrule}%
    \vfil \hrule
}}
\GMPdisplay{%
\vbox{%
  \hbox to 3\GMPboxwidth {high \hfil low}
  \vskip 0.7ex
  \GMPbox{x_2}{x_1}{x_0}
  \vskip 0.5ex
  \GMPbox{y_2}{y_1}{y_0}
  \vskip 0.5ex
}}
@end tex
@ifnottex
@example
@group
 high                         low
+----------+----------+----------+
|    x2    |    x1    |    x0    |
+----------+----------+----------+

+----------+----------+----------+
|    y2    |    y1    |    y0    |
+----------+----------+----------+
@end group
@end example
@end ifnottex

@noindent
These parts are treated as the coefficients of two polynomials

@display
@group
@m{X(t) = x_2t^2 + x_1t + x_0,
   X(t) = x2*t^2 + x1*t + x0}
@m{Y(t) = y_2t^2 + y_1t + y_0,
   Y(t) = y2*t^2 + y1*t + y0}
@end group
@end display

Let @math{b} equal the power of 2 which is the size of the @ms{x,0}, @ms{x,1},
@ms{y,0} and @ms{y,1} pieces, i.e.@: if they're @math{k} limbs each then
@m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}}, b=2^(k*mp_bits_per_limb)}.
With this @math{x=X(b)} and @math{y=Y(b)}.

Let a polynomial @m{W(t)=X(t)Y(t),W(t)=X(t)*Y(t)} and suppose its coefficients
are

@display
@m{W(t) = w_4t^4 + w_3t^3 + w_2t^2 + w_1t + w_0,
   W(t) = w4*t^4 + w3*t^3 + w2*t^2 + w1*t + w0}
@end display

The @m{w_i,w[i]} are going to be determined, and when they are they'll give
the final result using @math{w=W(b)}, since
@m{xy=X(b)Y(b),x*y=X(b)*Y(b)=W(b)}.
The coefficients will be roughly
@math{b^2} each, and the final @math{W(b)} will be an addition like,

@tex
\def\GMPbox#1#2{%
  \moveright #1\GMPboxwidth
  \vbox{%
    \hrule
    \hbox{%
      \GMPvrule
      \hbox to 2\GMPboxwidth {\hfil$#2$\hfil}%
      \vrule}%
    \hrule
}}
\GMPdisplay{%
\vbox{%
  \hbox to 6\GMPboxwidth {high \hfil low}%
  \vskip 0.7ex
  \GMPbox{0}{w_4}
  \vskip 0.5ex
  \GMPbox{1}{w_3}
  \vskip 0.5ex
  \GMPbox{2}{w_2}
  \vskip 0.5ex
  \GMPbox{3}{w_1}
  \vskip 0.5ex
  \GMPbox{4}{w_0}
}}
@end tex
@ifnottex
@example
@group
 high                                        low
+-------+-------+
|       w4      |
+-------+-------+
       +--------+-------+
       |        w3      |
       +--------+-------+
              +--------+-------+
              |        w2      |
              +--------+-------+
                     +--------+-------+
                     |        w1      |
                     +--------+-------+
                              +-------+-------+
                              |       w0      |
                              +-------+-------+
@end group
@end example
@end ifnottex

The @m{w_i,w[i]} coefficients could be formed by a simple set of cross
products, like @m{w_4=x_2y_2,w4=x2*y2}, @m{w_3=x_2y_1+x_1y_2,w3=x2*y1+x1*y2},
@m{w_2=x_2y_0+x_1y_1+x_0y_2,w2=x2*y0+x1*y1+x0*y2} etc, but this would need all
nine @m{x_iy_j,x[i]*y[j]} for @math{i,j=0,1,2}, and would be equivalent merely
to a basecase multiply. Instead the following approach is used.

@math{X(t)} and @math{Y(t)} are evaluated and multiplied at 5 points, giving
values of @math{W(t)} at those points.
In GMP the following points are used,

@quotation
@multitable {@m{t=\infty,t=inf}M} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item Point @tab Value
@item @math{t=0} @tab @m{x_0y_0,x0 * y0}, which gives @ms{w,0} immediately
@item @math{t=1} @tab @m{(x_2+x_1+x_0)(y_2+y_1+y_0),(x2+x1+x0) * (y2+y1+y0)}
@item @math{t=-1} @tab @m{(x_2-x_1+x_0)(y_2-y_1+y_0),(x2-x1+x0) * (y2-y1+y0)}
@item @math{t=2} @tab @m{(4x_2+2x_1+x_0)(4y_2+2y_1+y_0),(4*x2+2*x1+x0) * (4*y2+2*y1+y0)}
@item @m{t=\infty,t=inf} @tab @m{x_2y_2,x2 * y2}, which gives @ms{w,4} immediately
@end multitable
@end quotation

At @math{t=-1} the values can be negative and that's handled using the
absolute values and tracking the sign separately. At @m{t=\infty,t=inf} the
value is actually @m{\lim_{t\to\infty} {X(t)Y(t)\over t^4}, X(t)*Y(t)/t^4 in
the limit as t approaches infinity}, but it's much easier to think of as
simply @m{x_2y_2,x2*y2} giving @ms{w,4} immediately (much like
@m{x_0y_0,x0*y0} at @math{t=0} gives @ms{w,0} immediately).

Each of the points substituted into
@m{W(t)=w_4t^4+\cdots+w_0,W(t)=w4*t^4+@dots{}+w0} gives a linear combination
of the @m{w_i,w[i]} coefficients, and the value of those combinations has just
been calculated.

@tex
\GMPdisplay{%
$\matrix{%
W(0)      & = &       &   &      &   &      &   &      &   & w_0 \cr
W(1)      & = &   w_4 & + &  w_3 & + &  w_2 & + &  w_1 & + & w_0 \cr
W(-1)     & = &   w_4 & - &  w_3 & + &  w_2 & - &  w_1 & + & w_0 \cr
W(2)      & = & 16w_4 & + & 8w_3 & + & 4w_2 & + & 2w_1 & + & w_0 \cr
W(\infty) & = &   w_4 \cr
}$}
@end tex
@ifnottex
@example
@group
W(0)   =                              w0
W(1)   =    w4 +   w3 +   w2 +   w1 + w0
W(-1)  =    w4 -   w3 +   w2 -   w1 + w0
W(2)   = 16*w4 + 8*w3 + 4*w2 + 2*w1 + w0
W(inf) =    w4
@end group
@end example
@end ifnottex

This is a set of five equations in five unknowns, and some elementary linear
algebra quickly isolates each @m{w_i,w[i]}. This involves adding or
subtracting one @math{W(t)} value from another, and a couple of divisions by
powers of 2 and one division by 3, the latter using the special
@code{mpn_divexact_by3} (@pxref{Exact Division}).

The conversion of @math{W(t)} values to the coefficients is interpolation. A
polynomial of degree 4 like @math{W(t)} is uniquely determined by values known
at 5 different points. The points are arbitrary and can be chosen to make the
linear equations come out with a convenient set of steps for quickly isolating
the @m{w_i,w[i]}.

Squaring follows the same procedure as multiplication, but there's only one
@math{X(t)} and it's evaluated at the 5 points, and those values squared to
give values of @math{W(t)}. The interpolation is then identical, and in fact
the same @code{toom_interpolate_5pts} subroutine is used for both squaring and
multiplying.

Toom-3 is asymptotically @math{O(N^@W{1.465})}, the exponent being
@m{\log5/\log3,log(5)/log(3)}, representing 5 recursive multiplies of 1/3 the
original size each. This is an improvement over Karatsuba at
@math{O(N^@W{1.585})}, though Toom does more work in the evaluation and
interpolation and so it only realizes its advantage above a certain size.
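
The evaluate, multiply and interpolate steps can be followed on small numbers.
The C sketch below is an illustration only, not GMP's
@code{toom_interpolate_5pts}. It assumes the pieces are small enough that
everything fits in an @code{int64_t}, takes @math{b=1000} instead of a power
of 2, and uses one possible ordering of the interpolation steps. The names
@code{sketch_toom3_coeffs} and @code{sketch_toom3_mul} are made up for this
example.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Coefficients w[0..4] of W(t) = X(t)*Y(t), where
   X(t) = x[2]*t^2 + x[1]*t + x[0] and likewise for Y(t).  */
static void
sketch_toom3_coeffs (int64_t w[5], const int64_t x[3], const int64_t y[3])
{
  /* Evaluate at the five points and multiply: five multiplies in all.  */
  int64_t w_0   = x[0] * y[0];
  int64_t w_1   = (x[2] + x[1] + x[0]) * (y[2] + y[1] + y[0]);
  int64_t w_m1  = (x[2] - x[1] + x[0]) * (y[2] - y[1] + y[0]);
  int64_t w_2   = (4 * x[2] + 2 * x[1] + x[0]) * (4 * y[2] + 2 * y[1] + y[0]);
  int64_t w_inf = x[2] * y[2];

  /* Interpolate.  Every division below is exact.  */
  w[0] = w_0;
  w[4] = w_inf;
  w[2] = (w_1 + w_m1) / 2 - w[4] - w[0];
  int64_t odd = (w_1 - w_m1) / 2;                       /* w3 + w1   */
  int64_t t = (w_2 - 16 * w[4] - 4 * w[2] - w[0]) / 2;  /* 4*w3 + w1 */
  w[3] = (t - odd) / 3;
  w[1] = odd - w[3];
}

/* x*y for 0 <= x,y < 10^9, each split into three pieces with b = 1000.  */
static int64_t
sketch_toom3_mul (int64_t x, int64_t y)
{
  const int64_t b = 1000;
  assert (0 <= x && x < b * b * b && 0 <= y && y < b * b * b);

  int64_t xs[3] = { x % b, x / b % b, x / (b * b) };
  int64_t ys[3] = { y % b, y / b % b, y / (b * b) };
  int64_t w[5];
  sketch_toom3_coeffs (w, xs, ys);

  /* x*y = W(b), summed by Horner's rule */
  int64_t r = 0;
  for (int i = 4; i >= 0; i--)
    r = r * b + w[i];
  return r;
}
```
@end verbatim

With pieces @math{(x0,x1,x2)=(1,2,3)} and @math{(y0,y1,y2)=(4,5,6)} the five
point values are 4, 90, 10, 646 and 18, and the interpolation recovers the
coefficients 4, 13, 28, 27, 18, the same as the nine cross products would
give.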

Near the crossover between Toom-3 and Karatsuba there's generally a range of
sizes where the difference between the two is small.
@code{MUL_TOOM33_THRESHOLD} is a somewhat arbitrary point in that range and
successive runs of the tune program can give different values due to small
variations in measuring. A graph of time versus size for the two shows the
effect, see @file{tune/README}.

At the fairly small sizes where the Toom-3 thresholds occur it's worth
remembering that the asymptotic behaviour for Karatsuba and Toom-3 can't be
expected to make accurate predictions, due of course to the big influence of
all sorts of overheads, and the fact that only a few recursions of each are
being performed. Even at large sizes there's a good chance machine dependent
effects like cache architecture will mean actual performance deviates from
what might be predicted.

The formula given for the Karatsuba algorithm (@pxref{Karatsuba
Multiplication}) has an equivalent for Toom-3 involving only five multiplies,
but this would be complicated and unenlightening.

An alternate view of Toom-3 can be found in Zuras (@pxref{References}), using
a vector to represent the @math{x} and @math{y} splits and a matrix
multiplication for the evaluation and interpolation stages. The matrix
inverses are not meant to be actually used, and they have elements with values
much greater than in fact arise in the interpolation steps. The diagram shown
for the 3-way is attractive, but again doesn't have to be implemented that way
and for example with a bit of rearrangement just one division by 6 can be
done.


@node Toom 4-Way Multiplication, Higher degree Toom'n'half, Toom 3-Way Multiplication, Multiplication Algorithms
@subsection Toom 4-Way Multiplication
@cindex Toom multiplication

Karatsuba and Toom-3 split the operands into 2 and 3 coefficients,
respectively. Toom-4 analogously splits the operands into 4 coefficients.
Using the notation from the section on Toom-3 multiplication, we form two
polynomials:

@display
@group
@m{X(t) = x_3t^3 + x_2t^2 + x_1t + x_0,
   X(t) = x3*t^3 + x2*t^2 + x1*t + x0}
@m{Y(t) = y_3t^3 + y_2t^2 + y_1t + y_0,
   Y(t) = y3*t^3 + y2*t^2 + y1*t + y0}
@end group
@end display

@math{X(t)} and @math{Y(t)} are evaluated and multiplied at 7 points, giving
values of @math{W(t)} at those points. In GMP the following points are used,

@quotation
@multitable {@m{t=-1/2,t=inf}M} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item Point @tab Value
@item @math{t=0} @tab @m{x_0y_0,x0 * y0}, which gives @ms{w,0} immediately
@item @math{t=1/2} @tab @m{(x_3+2x_2+4x_1+8x_0)(y_3+2y_2+4y_1+8y_0),(x3+2*x2+4*x1+8*x0) * (y3+2*y2+4*y1+8*y0)}
@item @math{t=-1/2} @tab @m{(-x_3+2x_2-4x_1+8x_0)(-y_3+2y_2-4y_1+8y_0),(-x3+2*x2-4*x1+8*x0) * (-y3+2*y2-4*y1+8*y0)}
@item @math{t=1} @tab @m{(x_3+x_2+x_1+x_0)(y_3+y_2+y_1+y_0),(x3+x2+x1+x0) * (y3+y2+y1+y0)}
@item @math{t=-1} @tab @m{(-x_3+x_2-x_1+x_0)(-y_3+y_2-y_1+y_0),(-x3+x2-x1+x0) * (-y3+y2-y1+y0)}
@item @math{t=2} @tab @m{(8x_3+4x_2+2x_1+x_0)(8y_3+4y_2+2y_1+y_0),(8*x3+4*x2+2*x1+x0) * (8*y3+4*y2+2*y1+y0)}
@item @m{t=\infty,t=inf} @tab @m{x_3y_3,x3 * y3}, which gives @ms{w,6} immediately
@end multitable
@end quotation

The number of additions and subtractions for Toom-4 is much larger than for
Toom-3. But several subexpressions occur multiple times, for example
@m{x_2+x_0,x2+x0} occurs for both @math{t=1} and @math{t=-1}.
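
The sharing of subexpressions can be sketched in C as follows. This is an
illustration on small @code{int64_t} pieces rather than limb vectors, and the
function names are made up for this example. The even and odd parts are
formed once and then combined by one addition and one subtraction, giving the
values at a point and at its negative together.

@verbatim
```c
#include <stdint.h>

/* X(1) and X(-1) for X(t) = x[3]*t^3 + x[2]*t^2 + x[1]*t + x[0].  */
static void
sketch_toom4_eval_pm1 (int64_t *plus, int64_t *minus, const int64_t x[4])
{
  int64_t even = x[2] + x[0];   /* shared by t=1 and t=-1 */
  int64_t odd  = x[3] + x[1];
  *plus  = even + odd;
  *minus = even - odd;
}

/* 8*X(1/2) and 8*X(-1/2), the scaled forms shown in the table above.  */
static void
sketch_toom4_eval_pmhalf (int64_t *plus, int64_t *minus, const int64_t x[4])
{
  int64_t even = 2 * x[2] + 8 * x[0];
  int64_t odd  = x[3] + 4 * x[1];
  *plus  = even + odd;
  *minus = even - odd;
}
```
@end verbatim

Each pair of points then costs four additions or subtractions, plus shifts
for the scaled pair, instead of the six needed when the two values are formed
independently.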

Toom-4 is asymptotically @math{O(N^@W{1.404})}, the exponent being
@m{\log7/\log4,log(7)/log(4)}, representing 7 recursive multiplies of 1/4 the
original size each.


@node Higher degree Toom'n'half, FFT Multiplication, Toom 4-Way Multiplication, Multiplication Algorithms
@subsection Higher degree Toom'n'half
@cindex Toom multiplication

The Toom algorithms described above (@pxref{Toom 3-Way Multiplication},
@pxref{Toom 4-Way Multiplication}) generalize to split into an arbitrary
number of pieces. In general a split of two equally long operands into
@math{r} pieces leads to evaluations and pointwise multiplications done at
@m{2r-1,2*r-1} points. To fully exploit symmetries it would be better to have
a multiple of 4 points, which is why Toom'n'half is used for higher degrees.

Toom'n'half means that the existence of one more piece is considered for a
single operand. It can be virtual, i.e.@: zero, or real, when the two operands
are not exactly balanced. By choosing an even @math{r},
Toom-@m{r{1\over2},r+1/2} requires @math{2r} points, a multiple of four.

The four-plets of points include 0, @m{\infty,inf}, +1, -1 and
@m{\pm2^i,+-2^i}, @m{\pm2^{-i},+-2^-i}. Each of them gives shortcuts for the
evaluation phase and for some steps in the interpolation phase. Further tricks
are used to reduce the memory footprint of the whole multiplication algorithm
to a memory buffer equal in size to the result of the product.

Current GMP uses both Toom-6'n'half and Toom-8'n'half.


@node FFT Multiplication, Other Multiplication, Higher degree Toom'n'half, Multiplication Algorithms
@subsection FFT Multiplication
@cindex FFT multiplication
@cindex Fast Fourier Transform

At large to very large sizes a Fermat style FFT multiplication is used,
following Sch@"onhage and Strassen (@pxref{References}).
Descriptions of FFTs
in various forms can be found in many textbooks, for instance Knuth section
4.3.3 part C or Lipson chapter IX@. A brief description of the form used in
GMP is given here.

The multiplication done is @m{xy \bmod 2^N+1, x*y mod 2^N+1}, for a given
@math{N}. A full product @m{xy,x*y} is obtained by choosing @m{N \ge
\mathop{\rm bits}(x)+\mathop{\rm bits}(y), N>=bits(x)+bits(y)} and padding
@math{x} and @math{y} with high zero limbs. The modular product is the native
form for the algorithm, so padding to get a full product is unavoidable.

The algorithm follows a split, evaluate, pointwise multiply, interpolate and
combine similar to that described above for Karatsuba and Toom-3. A @math{k}
parameter controls the split, with an FFT-@math{k} splitting into @math{2^k}
pieces of @math{M=N/2^k} bits each. @math{N} must be a multiple of
@m{2^k\times@code{mp\_bits\_per\_limb}, (2^k)*@nicode{mp_bits_per_limb}} so
the split falls on limb boundaries, avoiding bit shifts in the split and
combine stages.

The evaluations, pointwise multiplications, and interpolation, are all done
modulo @m{2^{N'}+1, 2^N'+1} where @math{N'} is @math{2M+k+3} rounded up to a
multiple of @math{2^k} and of @code{mp_bits_per_limb}. The results of
interpolation will be the following negacyclic convolution of the input
pieces, and the choice of @math{N'} ensures these sums aren't truncated.
@tex
$$ w_n = \sum_{{i+j = b2^k+n}\atop{b=0,1}} (-1)^b x_i y_j $$
@end tex
@ifnottex

@example
           ---
           \         b
w[n] =     /     (-1) * x[i] * y[j]
           ---
       i+j==b*2^k+n
          b=0,1
@end example

@end ifnottex
The points used for the evaluation are @math{g^i} for @math{i=0} to
@math{2^k-1} where @m{g=2^{2N'/2^k}, g=2^(2N'/2^k)}.
@math{g} is a
@m{2^k,2^k'}th root of unity mod @m{2^{N'}+1,2^N'+1}, which produces necessary
cancellations at the interpolation stage, and it's also a power of 2 so the
fast Fourier transforms used for the evaluation and interpolation do only
shifts, adds and negations.

The pointwise multiplications are done modulo @m{2^{N'}+1, 2^N'+1} and either
recurse into a further FFT or use a plain multiplication (Toom-3, Karatsuba or
basecase), whichever is optimal at the size @math{N'}. The interpolation is
an inverse fast Fourier transform. The resulting set of sums of @m{x_iy_j,
x[i]*y[j]} are added at appropriate offsets to give the final result.

Squaring is the same, but @math{x} is the only input so it's one transform at
the evaluate stage and the pointwise multiplies are squares. The
interpolation is the same.

For a mod @math{2^N+1} product, an FFT-@math{k} is an @m{O(N^{k/(k-1)}),
O(N^(k/(k-1)))} algorithm, the exponent representing @math{2^k} recursed
modular multiplies each @m{1/2^{k-1},1/2^(k-1)} the size of the original.
Each successive @math{k} is an asymptotic improvement, but overheads mean each
is only faster at bigger and bigger sizes. In the code, @code{MUL_FFT_TABLE}
and @code{SQR_FFT_TABLE} are the thresholds where each @math{k} is used. Each
new @math{k} effectively swaps some multiplying for some shifts, adds and
overheads.

A mod @math{2^N+1} product can be formed with a normal
@math{N@cross{}N@rightarrow{}2N} bit multiply plus a subtraction, so an FFT
and Toom-3 etc can be compared directly. A @math{k=4} FFT at
@math{O(N^@W{1.333})} can be expected to be the first faster than Toom-3 at
@math{O(N^@W{1.465})}. In practice this is what's found, with
@code{MUL_FFT_MODF_THRESHOLD} and @code{SQR_FFT_MODF_THRESHOLD} being between
300 and 1000 limbs, depending on the CPU@.
So far it's been found that only
very large FFTs recurse into pointwise multiplies above these sizes.

When an FFT is to give a full product, the change of @math{N} to @math{2N}
doesn't alter the theoretical complexity for a given @math{k}, but for the
purposes of considering where an FFT might be first used it can be assumed
that the FFT is recursing into a normal multiply and that on that basis it's
doing @math{2^k} recursed multiplies each @m{1/2^{k-2},1/2^(k-2)} the size of
the inputs, making it @m{O(N^{k/(k-2)}), O(N^(k/(k-2)))}. This would mean
@math{k=7} at @math{O(N^@W{1.4})} would be the first FFT faster than Toom-3.
In practice @code{MUL_FFT_THRESHOLD} and @code{SQR_FFT_THRESHOLD} have been
found to be in the @math{k=8} range, somewhere between 3000 and 10000 limbs.

The way @math{N} is split into @math{2^k} pieces and then @math{2M+k+3} is
rounded up to a multiple of @math{2^k} and @code{mp_bits_per_limb} means that
when @math{2^k@ge{}@nicode{mp\_bits\_per\_limb}} the effective @math{N} is a
multiple of @m{2^{2k-1},2^(2k-1)} bits. The @math{+k+3} means some values of
@math{N} just under such a multiple will be rounded to the next. The
complexity calculations above assume that a favourable size is used, meaning
one which isn't padded through rounding, and it's also assumed that the extra
@math{+k+3} bits are negligible at typical FFT sizes.

The practical effect of the @m{2^{2k-1},2^(2k-1)} constraint is to introduce a
step-effect into measured speeds. For example @math{k=8} will round @math{N}
up to a multiple of 32768 bits, so for a 32-bit limb there'll be 512 limb
groups of sizes for which @code{mpn_mul_n} runs at the same speed. Or for
@math{k=9} groups of 2048 limbs, @math{k=10} groups of 8192 limbs, etc.
In
practice it's been found each @math{k} is used at quite small multiples of its
size constraint and so the step effect is quite noticeable in a time versus
size graph.

The threshold determinations currently measure at the mid-points of size
steps, but this is sub-optimal since at the start of a new step it can happen
that it's better to go back to the previous @math{k} for a while. Something
more sophisticated for @code{MUL_FFT_TABLE} and @code{SQR_FFT_TABLE} will be
needed.


@node Other Multiplication, Unbalanced Multiplication, FFT Multiplication, Multiplication Algorithms
@subsection Other Multiplication
@cindex Toom multiplication

The Toom algorithms described above (@pxref{Toom 3-Way Multiplication},
@pxref{Toom 4-Way Multiplication}) generalize to split into an arbitrary
number of pieces, as per Knuth section 4.3.3 algorithm C@. This is not
currently used. The notes here are merely for interest.

In general a split into @math{r+1} pieces is made, and evaluations and
pointwise multiplications done at @m{2r+1,2*r+1} points. A 4-way split does 7
pointwise multiplies, 5-way does 9, etc. Asymptotically an @math{(r+1)}-way
algorithm is @m{O(N^{log(2r+1)/log(r+1)}), O(N^(log(2*r+1)/log(r+1)))}. Only
the pointwise multiplications count towards big-@math{O} complexity, but the
time spent in the evaluate and interpolate stages grows with @math{r} and has
a significant practical impact, with the asymptotic advantage of each @math{r}
realized only at bigger and bigger sizes. The overheads grow as
@m{O(Nr),O(N*r)}, whereas in an @math{r=2^k} FFT they grow only as @m{O(N \log
r), O(N*log(r))}.
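
As a rough worked illustration of the exponent formula above, the first few
splits give approximately the following values. This is plain arithmetic on
@math{log(2r+1)/log(r+1)}, rounded to three decimals, and is not a
measurement of any implementation.

@example
r=1   2-way (Karatsuba)   3 multiplies   log(3)/log(2) = 1.585
r=2   3-way (Toom-3)      5 multiplies   log(5)/log(3) = 1.465
r=3   4-way (Toom-4)      7 multiplies   log(7)/log(4) = 1.404
r=4   5-way               9 multiplies   log(9)/log(5) = 1.365
@end example

Each extra piece lowers the exponent by less than the previous one did, while
the evaluation and interpolation overheads keep growing with @math{r}, which
is consistent with the advantage of each @math{r} appearing only at larger
sizes.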

Knuth algorithm C evaluates at points 0,1,2,@dots{},@m{2r,2*r}, but exercise 4
uses @math{-r},@dots{},0,@dots{},@math{r} and the latter saves some small
multiplies in the evaluate stage (or rather trades them for additions), and
has a further saving of nearly half the interpolate steps. The idea is to
separate odd and even final coefficients and then perform algorithm C steps C7
and C8 on them separately. The divisors at step C7 become @math{j^2} and the
multipliers at C8 become @m{2tj-j^2,2*t*j-j^2}.

Splitting odd and even parts through positive and negative points can be
thought of as using @math{-1} as a square root of unity. If a 4th root of
unity were available then a further split and speedup would be possible, but no
such root exists for plain integers. Going to complex integers with
@m{i=\sqrt{-1}, i=sqrt(-1)} doesn't help, essentially because in Cartesian
form it takes three real multiplies to do a complex multiply. The existence
of @m{2^k,2^k'}th roots of unity in a suitable ring or field lets the fast
Fourier transform keep splitting and get to @m{O(N \log r), O(N*log(r))}.

Floating point FFTs use complex numbers approximating Nth roots of unity.
Some processors have special support for such FFTs. But these are not used in
GMP since it's very difficult to guarantee an exact result (to some number of
bits). An occasional difference of 1 in the last bit might not matter to a
typical signal processing algorithm, but is of course of vital importance to
GMP.


@node Unbalanced Multiplication, , Other Multiplication, Multiplication Algorithms
@subsection Unbalanced Multiplication
@cindex Unbalanced multiplication

Multiplication of operands with different sizes, both below
@code{MUL_TOOM22_THRESHOLD}, is done with plain schoolbook multiplication
(@pxref{Basecase Multiplication}).
8371 8372For really large operands, we invoke FFT directly. 8373 8374For operands between these sizes, we use Toom inspired algorithms suggested by 8375Alberto Zanoni and Marco Bodrato. The idea is to split the operands into 8376polynomials of different degree. GMP currently splits the smaller operand 8377onto 2 coefficients, i.e., a polynomial of degree 1, but the larger operand 8378can be split into 2, 3, or 4 coefficients, i.e., a polynomial of degree 1 to 83793. 8380 8381@c FIXME: This is mighty ugly, but a cleaner @need triggers texinfo bugs that 8382@c screws up layout here and there in the rest of the manual. 8383@c @tex 8384@c \goodbreak 8385@c @end tex 8386@node Division Algorithms, Greatest Common Divisor Algorithms, Multiplication Algorithms, Algorithms 8387@section Division Algorithms 8388@cindex Division algorithms 8389 8390@menu 8391* Single Limb Division:: 8392* Basecase Division:: 8393* Divide and Conquer Division:: 8394* Block-Wise Barrett Division:: 8395* Exact Division:: 8396* Exact Remainder:: 8397* Small Quotient Division:: 8398@end menu 8399 8400 8401@node Single Limb Division, Basecase Division, Division Algorithms, Division Algorithms 8402@subsection Single Limb Division 8403 8404N@cross{}1 division is implemented using repeated 2@cross{}1 divisions from 8405high to low, either with a hardware divide instruction or a multiplication by 8406inverse, whichever is best on a given CPU. 8407 8408The multiply by inverse follows ``Improved division by invariant integers'' by 8409M@"oller and Granlund (@pxref{References}) and is implemented as 8410@code{udiv_qrnnd_preinv} in @file{gmp-impl.h}. The idea is to have a 8411fixed-point approximation to @math{1/d} (see @code{invert_limb}) and then 8412multiply by the high limb (plus one bit) of the dividend to get a quotient 8413@math{q}. With @math{d} normalized (high bit set), @math{q} is no more than 1 8414too small. 
Subtracting @m{qd,q*d} from the dividend gives a remainder, and 8415reveals whether @math{q} or @math{q-1} is correct. 8416 8417The result is a division done with two multiplications and four or five 8418arithmetic operations. On CPUs with low latency multipliers this can be much 8419faster than a hardware divide, though the cost of calculating the inverse at 8420the start may mean it's only better on inputs bigger than say 4 or 5 limbs. 8421 8422When a divisor must be normalized, either for the generic C 8423@code{__udiv_qrnnd_c} or the multiply by inverse, the division performed is 8424actually @m{a2^k,a*2^k} by @m{d2^k,d*2^k} where @math{a} is the dividend and 8425@math{k} is the power necessary to have the high bit of @m{d2^k,d*2^k} set. 8426The bit shifts for the dividend are usually accomplished ``on the fly'' 8427meaning by extracting the appropriate bits at each step. Done this way the 8428quotient limbs come out aligned ready to store. When only the remainder is 8429wanted, an alternative is to take the dividend limbs unshifted and calculate 8430@m{r = a \bmod d2^k, r = a mod d*2^k} followed by an extra final step @m{r2^k 8431\bmod d2^k, r*2^k mod d*2^k}. This can help on CPUs with poor bit shifts or 8432few registers. 8433 8434The multiply by inverse can be done two limbs at a time. The calculation is 8435basically the same, but the inverse is two limbs and the divisor treated as if 8436padded with a low zero limb. This means more work, since the inverse will 8437need a 2@cross{}2 multiply, but the four 1@cross{}1s to do that are 8438independent and can therefore be done partly or wholly in parallel. Likewise 8439for a 2@cross{}1 calculating @m{qd,q*d}. The net effect is to process two 8440limbs with roughly the same two multiplies worth of latency that one limb at a 8441time gives. This extends to 3 or 4 limbs at a time, though the extra work to 8442apply the inverse will almost certainly soon reach the limits of multiplier 8443throughput. 
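
The plain N@cross{}1 scheme this section starts from, repeated 2@cross{}1
divisions from high to low, can be sketched as below. This is only an
illustration under stated assumptions: limbs are taken to be 32 bits, the C
@code{/} and @code{%} operators on a 64-bit value stand in for the hardware
2@cross{}1 divide, and @code{divrem_1_sketch} is a hypothetical name, not a
GMP function. No normalization or multiply by inverse is attempted.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a GMP function.  Divides the n-limb number
   np[0..n-1] (least significant limb first, 32-bit limbs) by the single
   limb d, stores the n quotient limbs at qp and returns the remainder.  */
uint32_t
divrem_1_sketch (uint32_t *qp, const uint32_t *np, size_t n, uint32_t d)
{
  uint32_t r = 0;               /* running remainder, always < d */
  size_t i;

  assert (d != 0);
  for (i = n; i-- > 0; )        /* from the high limb down to the low limb */
    {
      /* 2x1 step: the two-limb value (r, np[i]) divided by d.  Because
         r < d the quotient is known to fit in one limb.  */
      uint64_t two = ((uint64_t) r << 32) | np[i];
      qp[i] = (uint32_t) (two / d);
      r = (uint32_t) (two % d);
    }
  return r;
}
```
@end verbatim

The remainder from each step becomes the high limb of the next 2@cross{}1
dividend, which is what makes the steps a dependent chain and why the latency
of the divide (or of the multiplies replacing it) matters so much.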
8444 8445A similar approach in reverse can be taken to process just half a limb at a 8446time if the divisor is only a half limb. In this case the 1@cross{}1 multiply 8447for the inverse effectively becomes two @m{{1\over2}\times1, (1/2)x1} for each 8448limb, which can be a saving on CPUs with a fast half limb multiply, or in fact 8449if the only multiply is a half limb, and especially if it's not pipelined. 8450 8451 8452@node Basecase Division, Divide and Conquer Division, Single Limb Division, Division Algorithms 8453@subsection Basecase Division 8454 8455Basecase N@cross{}M division is like long division done by hand, but in base 8456@m{2\GMPraise{@code{mp\_bits\_per\_limb}}, 2^mp_bits_per_limb}. See Knuth 8457section 4.3.1 algorithm D, and @file{mpn/generic/sb_divrem_mn.c}. 8458 8459Briefly stated, while the dividend remains larger than the divisor, a high 8460quotient limb is formed and the N@cross{}1 product @m{qd,q*d} subtracted at 8461the top end of the dividend. With a normalized divisor (most significant bit 8462set), each quotient limb can be formed with a 2@cross{}1 division and a 84631@cross{}1 multiplication plus some subtractions. The 2@cross{}1 division is 8464by the high limb of the divisor and is done either with a hardware divide or a 8465multiply by inverse (the same as in @ref{Single Limb Division}) whichever is 8466faster. Such a quotient is sometimes one too big, requiring an addback of the 8467divisor, but that happens rarely. 8468 8469With Q=N@minus{}M being the number of quotient limbs, this is an 8470@m{O(QM),O(Q*M)} algorithm and will run at a speed similar to a basecase 8471Q@cross{}M multiplication, differing in fact only in the extra multiply and 8472divide for each of the Q quotient limbs. 
8473 8474 8475@node Divide and Conquer Division, Block-Wise Barrett Division, Basecase Division, Division Algorithms 8476@subsection Divide and Conquer Division 8477 8478For divisors larger than @code{DC_DIV_QR_THRESHOLD}, division is done by dividing. 8479Or to be precise by a recursive divide and conquer algorithm based on work by 8480Moenck and Borodin, Jebelean, and Burnikel and Ziegler (@pxref{References}). 8481 8482The algorithm consists essentially of recognising that a 2N@cross{}N division 8483can be done with the basecase division algorithm (@pxref{Basecase Division}), 8484but using N/2 limbs as a base, not just a single limb. This way the 8485multiplications that arise are (N/2)@cross{}(N/2) and can take advantage of 8486Karatsuba and higher multiplication algorithms (@pxref{Multiplication 8487Algorithms}). The two ``digits'' of the quotient are formed by recursive 8488N@cross{}(N/2) divisions. 8489 8490If the (N/2)@cross{}(N/2) multiplies are done with a basecase multiplication 8491then the work is about the same as a basecase division, but with more function 8492call overheads and with some subtractions separated from the multiplies. 8493These overheads mean that it's only when N/2 is above 8494@code{MUL_TOOM22_THRESHOLD} that divide and conquer is of use. 8495 8496@code{DC_DIV_QR_THRESHOLD} is based on the divisor size N, so it will be somewhere 8497above twice @code{MUL_TOOM22_THRESHOLD}, but how much above depends on the 8498CPU@. An optimized @code{mpn_mul_basecase} can lower @code{DC_DIV_QR_THRESHOLD} a 8499little by offering a ready-made advantage over repeated @code{mpn_submul_1} 8500calls. 8501 8502Divide and conquer is asymptotically @m{O(M(N)\log N),O(M(N)*log(N))} where 8503@math{M(N)} is the time for an N@cross{}N multiplication done with FFTs. The 8504actual time is a sum over multiplications of the recursed sizes, as can be 8505seen near the end of section 2.2 of Burnikel and Ziegler. 
For example, within 8506the Toom-3 range, divide and conquer is @m{2.63M(N), 2.63*M(N)}. With higher 8507algorithms the @math{M(N)} term improves and the multiplier tends to @m{\log 8508N, log(N)}. In practice, at moderate to large sizes, a 2N@cross{}N division 8509is about 2 to 4 times slower than an N@cross{}N multiplication. 8510 8511 8512@node Block-Wise Barrett Division, Exact Division, Divide and Conquer Division, Division Algorithms 8513@subsection Block-Wise Barrett Division 8514 8515For the largest divisions, a block-wise Barrett division algorithm is used. 8516Here, the divisor is inverted to a precision determined by the relative size of 8517the dividend and divisor. Blocks of quotient limbs are then generated by 8518multiplying blocks from the dividend by the inverse. 8519 8520Our block-wise algorithm computes a smaller inverse than in the plain Barrett 8521algorithm. For a @math{2n/n} division, the inverse will be just @m{\lceil n/2 8522\rceil, ceil(n/2)} limbs. 8523 8524 8525@node Exact Division, Exact Remainder, Block-Wise Barrett Division, Division Algorithms 8526@subsection Exact Division 8527 8528 8529A so-called exact division is when the dividend is known to be an exact 8530multiple of the divisor. Jebelean's exact division algorithm uses this 8531knowledge to make some significant optimizations (@pxref{References}). 8532 8533The idea can be illustrated in decimal for example with 368154 divided by 8534543. Because the low digit of the dividend is 4, the low digit of the 8535quotient must be 8. This is arrived at from @m{4 \mathord{\times} 7 \bmod 10, 85364*7 mod 10}, using the fact 7 is the modular inverse of 3 (the low digit of 8537the divisor), since @m{3 \mathord{\times} 7 \mathop{\equiv} 1 \bmod 10, 3*7 8538@equiv{} 1 mod 10}. So @m{8\mathord{\times}543 = 4344,8*543=4344} can be 8539subtracted from the dividend leaving 363810. Notice the low digit has become 8540zero. 
8541 8542The procedure is repeated at the second digit, with the next quotient digit 7 8543(@m{1 \mathord{\times} 7 \bmod 10, 7 @equiv{} 1*7 mod 10}), subtracting 8544@m{7\mathord{\times}543 = 3801,7*543=3801}, leaving 325800. And finally at 8545the third digit with quotient digit 6 (@m{8 \mathord{\times} 7 \bmod 10, 8*7 8546mod 10}), subtracting @m{6\mathord{\times}543 = 3258,6*543=3258} leaving 0. 8547So the quotient is 678. 8548 8549Notice however that the multiplies and subtractions don't need to extend past 8550the low three digits of the dividend, since that's enough to determine the 8551three quotient digits. For the last quotient digit no subtraction is needed 8552at all. On a 2N@cross{}N division like this one, only about half the work of 8553a normal basecase division is necessary. 8554 8555For an N@cross{}M exact division producing Q=N@minus{}M quotient limbs, the 8556saving over a normal basecase division is in two parts. Firstly, each of the 8557Q quotient limbs needs only one multiply, not a 2@cross{}1 divide and 8558multiply. Secondly, the crossproducts are reduced when @math{Q>M} to 8559@m{QM-M(M+1)/2,Q*M-M*(M+1)/2}, or when @math{Q@le{}M} to @m{Q(Q-1)/2, 8560Q*(Q-1)/2}. Notice the savings are complementary. If Q is big then many 8561divisions are saved, or if Q is small then the crossproducts reduce to a small 8562number. 8563 8564The modular inverse used is calculated efficiently by @code{binvert_limb} in 8565@file{gmp-impl.h}. This does four multiplies for a 32-bit limb, or six for a 856664-bit limb. @file{tune/modlinv.c} has some alternate implementations that 8567might suit processors better at bit twiddling than multiplying. 8568 8569The sub-quadratic exact division described by Jebelean in ``Exact Division 8570with Karatsuba Complexity'' is not currently implemented. It uses a 8571rearrangement similar to the divide and conquer for normal division 8572(@pxref{Divide and Conquer Division}), but operating from low to high. 
A 8573further possibility not currently implemented is ``Bidirectional Exact Integer 8574Division'' by Krandick and Jebelean which forms quotient limbs from both the 8575high and low ends of the dividend, and can halve once more the number of 8576crossproducts needed in a 2N@cross{}N division. 8577 8578A special case exact division by 3 exists in @code{mpn_divexact_by3}, 8579supporting Toom-3 multiplication and @code{mpq} canonicalizations. It forms 8580quotient digits with a multiply by the modular inverse of 3 (which is 8581@code{0xAA..AAB}) and uses two comparisons to determine a borrow for the next 8582limb. The multiplications don't need to be on the dependent chain, as long as 8583the effect of the borrows is applied, which can help chips with pipelined 8584multipliers. 8585 8586 8587@node Exact Remainder, Small Quotient Division, Exact Division, Division Algorithms 8588@subsection Exact Remainder 8589@cindex Exact remainder 8590 8591If the exact division algorithm is done with a full subtraction at each stage 8592and the dividend isn't a multiple of the divisor, then low zero limbs are 8593produced but with a remainder in the high limbs. For dividend @math{a}, 8594divisor @math{d}, quotient @math{q}, and @m{b = 2 8595\GMPraise{@code{mp\_bits\_per\_limb}}, b = 2^mp_bits_per_limb}, this remainder 8596@math{r} is of the form 8597@tex 8598$$ a = qd + r b^n $$ 8599@end tex 8600@ifnottex 8601 8602@example 8603a = q*d + r*b^n 8604@end example 8605 8606@end ifnottex 8607@math{n} represents the number of zero limbs produced by the subtractions, 8608that being the number of limbs produced for @math{q}. @math{r} will be in the 8609range @math{0@le{}r<d} and can be viewed as a remainder, but one shifted up by 8610a factor of @math{b^n}. 8611 8612Carrying out full subtractions at each stage means the same number of cross 8613products must be done as a normal division, but there's still some single limb 8614divisions saved. 
When @math{d} is a single limb some simplifications arise, 8615providing good speedups on a number of processors. 8616 8617The functions @code{mpn_divexact_by3}, @code{mpn_modexact_1_odd} and the 8618internal @code{mpn_redc_X} functions differ subtly in how they return @math{r}, 8619leading to some negations in the above formula, but all are essentially the 8620same. 8621 8622@cindex Divisibility algorithm 8623@cindex Congruence algorithm 8624Clearly @math{r} is zero when @math{a} is a multiple of @math{d}, and this 8625leads to divisibility or congruence tests which are potentially more efficient 8626than a normal division. 8627 8628The factor of @math{b^n} on @math{r} can be ignored in a GCD when @math{d} is 8629odd, hence the use of @code{mpn_modexact_1_odd} by @code{mpn_gcd_1} and 8630@code{mpz_kronecker_ui} etc (@pxref{Greatest Common Divisor Algorithms}). 8631 8632Montgomery's REDC method for modular multiplications uses operands of the form 8633of @m{xb^{-n}, x*b^-n} and @m{yb^{-n}, y*b^-n} and on calculating @m{(xb^{-n}) 8634(yb^{-n}), (x*b^-n)*(y*b^-n)} uses the factor of @math{b^n} in the exact 8635remainder to reach a product in the same form @m{(xy)b^{-n}, (x*y)*b^-n} 8636(@pxref{Modular Powering Algorithm}). 8637 8638Notice that @math{r} generally gives no useful information about the ordinary 8639remainder @math{a @bmod d} since @math{b^n @bmod d} could be anything. If 8640however @math{b^n @equiv{} 1 @bmod d}, then @math{r} is the negative of the 8641ordinary remainder. This occurs whenever @math{d} is a factor of 8642@math{b^n-1}, as for example with 3 in @code{mpn_divexact_by3}. For a 32 or 864364 bit limb other such factors include 5, 17 and 257, but no particular use 8644has been found for this. 8645 8646 8647@node Small Quotient Division, , Exact Remainder, Division Algorithms 8648@subsection Small Quotient Division 8649 8650An N@cross{}M division where the number of quotient limbs Q=N@minus{}M is 8651small can be optimized somewhat. 
8652 8653An ordinary basecase division normalizes the divisor by shifting it to make 8654the high bit set, shifting the dividend accordingly, and shifting the 8655remainder back down at the end of the calculation. This is wasteful if only a 8656few quotient limbs are to be formed. Instead a division of just the top 8657@m{\rm2Q,2*Q} limbs of the dividend by the top Q limbs of the divisor can be 8658used to form a trial quotient. This requires only those limbs normalized, not 8659the whole of the divisor and dividend. 8660 8661A multiply and subtract then applies the trial quotient to the M@minus{}Q 8662unused limbs of the divisor and N@minus{}Q dividend limbs (which includes Q 8663limbs remaining from the trial quotient division). The starting trial 8664quotient can be 1 or 2 too big, but all cases of 2 too big and most cases of 1 8665too big are detected by first comparing the most significant limbs that will 8666arise from the subtraction. An addback is done if the quotient still turns 8667out to be 1 too big. 8668 8669This whole procedure is essentially the same as one step of the basecase 8670algorithm done in a Q limb base, though with the trial quotient test done only 8671with the high limbs, not an entire Q limb ``digit'' product. The correctness 8672of this weaker test can be established by following the argument of Knuth 8673section 4.3.1 exercise 20 but with the @m{v_2 \GMPhat q > b \GMPhat r 8674+ u_2, v2*q>b*r+u2} condition appropriately relaxed. 


@need 1000
@node Greatest Common Divisor Algorithms, Powering Algorithms, Division Algorithms, Algorithms
@section Greatest Common Divisor
@cindex Greatest common divisor algorithms
@cindex GCD algorithms

@menu
* Binary GCD::
* Lehmer's Algorithm::
* Subquadratic GCD::
* Extended GCD::
* Jacobi Symbol::
@end menu


@node Binary GCD, Lehmer's Algorithm, Greatest Common Divisor Algorithms, Greatest Common Divisor Algorithms
@subsection Binary GCD

At small sizes GMP uses an @math{O(N^2)} binary style GCD@. This is described
in many textbooks, for example Knuth section 4.5.2 algorithm B@. It simply
consists of successively reducing odd operands @math{a} and @math{b} using

@quotation
@math{a,b = @abs{}(a-b),@min{}(a,b)} @*
strip factors of 2 from @math{a}
@end quotation

The Euclidean GCD algorithm, as per Knuth algorithms E and A, repeatedly
computes the quotient @m{q = \lfloor a/b \rfloor, q = floor(a/b)} and replaces
@math{a,b} by @math{b, a - q b}. The binary algorithm has so far been found to
be faster than the Euclidean algorithm everywhere. One reason the binary
method does well is that the implied quotient at each step is usually small,
so often only one or two subtractions are needed to get the same effect as a
division. Quotients 1, 2 and 3 for example occur 67.7% of the time; see Knuth
section 4.5.3 Theorem E@.

When the implied quotient is large, meaning @math{b} is much smaller than
@math{a}, then a division is worthwhile. This is the basis for the initial
@math{a @bmod b} reductions in @code{mpn_gcd} and @code{mpn_gcd_1} (the latter
for both N@cross{}1 and 1@cross{}1 cases). But after that initial reduction,
big quotients occur too rarely to make it worth checking for them.
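
The reduction step quoted above can be sketched for single 64-bit words as
follows. This is a minimal illustration, not GMP's code:
@code{gcd_binary_sketch} and @code{count_low_zeros} are hypothetical names,
the handling of zero operands and of common factors of 2 is an assumption
added so the function accepts arbitrary inputs, and the low zero bits are
stripped with the simplest possible loop.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Number of low zero bits of a nonzero x (simple loop, for clarity).  */
static unsigned
count_low_zeros (uint64_t x)
{
  unsigned n = 0;
  assert (x != 0);
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Hypothetical sketch of a binary GCD on single words.  */
uint64_t
gcd_binary_sketch (uint64_t a, uint64_t b)
{
  unsigned shift;

  if (a == 0)
    return b;
  if (b == 0)
    return a;

  shift = count_low_zeros (a | b);   /* power of 2 common to both */
  a >>= count_low_zeros (a);         /* make both operands odd */
  b >>= count_low_zeros (b);

  while (a != b)
    {
      /* a,b = |a-b|, min(a,b); the difference of two odd numbers is
         even and here nonzero, so its factors of 2 can be stripped.  */
      uint64_t diff = a > b ? a - b : b - a;
      uint64_t low = a < b ? a : b;
      a = diff >> count_low_zeros (diff);
      b = low;
    }
  return a << shift;
}
```
@end verbatim

For instance @code{gcd_binary_sketch (12, 18)} reduces the odd parts 3 and 9
to 3 in one step and then restores the common factor of 2.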
8718 8719@sp 1 8720The final @math{1@cross{}1} GCD in @code{mpn_gcd_1} is done in the generic C 8721code as described above. For two N-bit operands, the algorithm takes about 87220.68 iterations per bit. For optimum performance some attention needs to be 8723paid to the way the factors of 2 are stripped from @math{a}. 8724 8725Firstly it may be noted that in twos complement the number of low zero bits on 8726@math{a-b} is the same as @math{b-a}, so counting or testing can begin on 8727@math{a-b} without waiting for @math{@abs{}(a-b)} to be determined. 8728 8729A loop stripping low zero bits tends not to branch predict well, since the 8730condition is data dependent. But on average there's only a few low zeros, so 8731an option is to strip one or two bits arithmetically then loop for more (as 8732done for AMD K6). Or use a lookup table to get a count for several bits then 8733loop for more (as done for AMD K7). An alternative approach is to keep just 8734one of @math{a} or @math{b} odd and iterate 8735 8736@quotation 8737@math{a,b = @abs{}(a-b), @min{}(a,b)} @* 8738@math{a = a/2} if even @* 8739@math{b = b/2} if even 8740@end quotation 8741 8742This requires about 1.25 iterations per bit, but stripping of a single bit at 8743each step avoids any branching. Repeating the bit strip reduces to about 0.9 8744iterations per bit, which may be a worthwhile tradeoff. 8745 8746Generally with the above approaches a speed of perhaps 6 cycles per bit can be 8747achieved, which is still not terribly fast with for instance a 64-bit GCD 8748taking nearly 400 cycles. It's this sort of time which means it's not usually 8749advantageous to combine a set of divisibility tests into a GCD. 8750 8751Currently, the binary algorithm is used for GCD only when @math{N < 3}. 
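
The alternative iteration described above, which keeps at least one of
@math{a} and @math{b} odd and strips a single bit per step, might look as
follows in C. This is a sketch under assumptions rather than the code GMP
uses: @code{gcd_one_odd_sketch} is a hypothetical name, both inputs are
required to be nonzero with at least one of them odd, and whether the
conditional expressions compile to branch-free code depends on the compiler
and CPU.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: GCD keeping at least one operand odd.  */
uint64_t
gcd_one_odd_sketch (uint64_t a, uint64_t b)
{
  assert (a != 0 && b != 0);
  assert (((a | b) & 1) != 0);      /* at least one operand is odd */

  do
    {
      uint64_t diff = a > b ? a - b : b - a;
      uint64_t low = a < b ? a : b;
      a = diff;
      b = low;
      /* Halve each operand if it is even: the shift count is 1 for an
         even value and 0 for an odd one, so no loop is needed.  The GCD
         is odd, so dropping a factor of 2 never changes it.  */
      a >>= (~a & 1);
      b >>= (~b & 1);
    }
  while (a != 0);

  return b;                         /* a reached 0, so b holds the GCD */
}
```
@end verbatim

The loop ends when the two operands have become equal, at which point the
difference is zero and the minimum is the GCD.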

@node Lehmer's Algorithm, Subquadratic GCD, Binary GCD, Greatest Common Divisor Algorithms
@comment  node-name,  next,  previous,  up
@subsection Lehmer's algorithm

Lehmer's improvement of the Euclidean algorithm is based on the observation
that the initial part of the quotient sequence depends only on the most
significant parts of the inputs. The variant of Lehmer's algorithm used in GMP
splits off the most significant two limbs, as suggested, e.g., in ``A
Double-Digit Lehmer-Euclid Algorithm'' by Jebelean (@pxref{References}). The
quotients of two double-limb inputs are collected as a 2 by 2 matrix with
single-limb elements. This is done by the function @code{mpn_hgcd2}. The
resulting matrix is applied to the inputs using @code{mpn_mul_1} and
@code{mpn_submul_1}. Each iteration usually reduces the inputs by almost one
limb. In the rare case of a large quotient, no progress can be made by
examining just the most significant two limbs, and the quotient is computed
using plain division.

The resulting algorithm is asymptotically @math{O(N^2)}, just like the
Euclidean algorithm and the binary algorithm. The quadratic part of the work
is the calls to @code{mpn_mul_1} and @code{mpn_submul_1}. For small sizes, the
linear work is also significant. There are roughly @math{N} calls to the
@code{mpn_hgcd2} function. This function uses a couple of important
optimizations:

@itemize
@item
It uses the same relaxed notion of correctness as @code{mpn_hgcd} (see next
section). This means that when called with the most significant two limbs of
two large numbers, the returned matrix does not always correspond exactly to
the initial quotient sequence for the two large numbers; the final quotient
may sometimes be one off.

@item
It takes advantage of the fact the quotients are usually small.
The division
operator is not used, since the corresponding assembler instruction is very
slow on most architectures. (This code could probably be improved further; it
uses many branches that are unfriendly to prediction.)

@item
It switches from double-limb calculations to single-limb calculations half-way
through, when the input numbers have been reduced in size from two limbs to
one and a half.

@end itemize

@node Subquadratic GCD, Extended GCD, Lehmer's Algorithm, Greatest Common Divisor Algorithms
@subsection Subquadratic GCD

For inputs larger than @code{GCD_DC_THRESHOLD}, GCD is computed via the HGCD
(Half GCD) function, as a generalization of Lehmer's algorithm.

Let the inputs @math{a,b} be of size @math{N} limbs each. Put @m{S=\lfloor N/2
\rfloor + 1, S = floor(N/2) + 1}. Then HGCD(a,b) returns a transformation
matrix @math{T} with non-negative elements, and reduced numbers @math{(c;d) =
T^{-1} (a;b)}. The reduced numbers @math{c,d} must be larger than @math{S}
limbs, while their difference @math{abs(c-d)} must fit in @math{S} limbs. The
matrix elements will also be of size roughly @math{N/2}.

The HGCD base case uses Lehmer's algorithm, but with the above stop condition
that returns reduced numbers and the corresponding transformation matrix
half-way through. For inputs larger than @code{HGCD_THRESHOLD}, HGCD is
computed recursively, using the divide and conquer algorithm in ``On
Sch@"onhage's algorithm and subquadratic integer GCD computation'' by M@"oller
(@pxref{References}). The recursive algorithm consists of these main
steps.

@itemize

@item
Call HGCD recursively, on the most significant @math{N/2} limbs. Apply the
resulting matrix @math{T_1} to the full numbers, reducing them to a size just
above @math{3N/4}.

@item
Perform a small number of division or subtraction steps to reduce the numbers
to size below @math{3N/4}. This is essential mainly for the unlikely case of
large quotients.

@item
Call HGCD recursively, on the most significant @math{N/2} limbs of the reduced
numbers. Apply the resulting matrix @math{T_2} to the full numbers, reducing
them to a size just above @math{N/2}.

@item
Compute @math{T = T_1 T_2}.

@item
Perform a small number of division and subtraction steps to satisfy the
requirements, and return.
@end itemize

GCD is then implemented as a loop around HGCD, similarly to Lehmer's
algorithm. Where Lehmer repeatedly chops off the top two limbs, calls
@code{mpn_hgcd2}, and applies the resulting matrix to the full numbers, the
subquadratic GCD chops off the most significant third of the limbs (the
proportion is a tuning parameter, and @math{1/3} seems to be more efficient
than, e.g., @math{1/2}), calls @code{mpn_hgcd}, and applies the resulting
matrix. Once the input numbers are reduced to size below
@code{GCD_DC_THRESHOLD}, Lehmer's algorithm is used for the rest of the work.

The asymptotic running time of both HGCD and GCD is @m{O(M(N)\log N),O(M(N)*log(N))},
where @math{M(N)} is the time for multiplying two @math{N}-limb numbers.

@comment  node-name,  next,  previous,  up

@node Extended GCD, Jacobi Symbol, Subquadratic GCD, Greatest Common Divisor Algorithms
@subsection Extended GCD

The extended GCD function, or GCDEXT, calculates @math{@gcd{}(a,b)} and also
cofactors @math{x} and @math{y} satisfying @m{ax+by=\gcd(a@C{}b),
a*x+b*y=gcd(a@C{}b)}. All the algorithms used for plain GCD are extended to
handle this case. The binary algorithm is used only for single-limb GCDEXT.
Lehmer's algorithm is used for sizes up to @code{GCDEXT_DC_THRESHOLD}.
Above
this threshold, GCDEXT is implemented as a loop around HGCD, but with more
book-keeping to keep track of the cofactors. This gives the same asymptotic
running time as for GCD and HGCD, @m{O(M(N)\log N),O(M(N)*log(N))}.

One difference from plain GCD is that while the inputs @math{a} and @math{b} are
reduced as the algorithm proceeds, the cofactors @math{x} and @math{y} grow in
size. This makes the tuning of the chopping-point more difficult. The current
code chops off the most significant half of the inputs for the call to HGCD in
the first iteration, and the most significant two thirds for the remaining
calls. This strategy could surely be improved. Also the stop condition for the
loop, where Lehmer's algorithm is invoked once the inputs are reduced below
@code{GCDEXT_DC_THRESHOLD}, could maybe be improved by taking into account the
current size of the cofactors.

@node Jacobi Symbol, , Extended GCD, Greatest Common Divisor Algorithms
@subsection Jacobi Symbol
@cindex Jacobi symbol algorithm

[This section is obsolete. The current Jacobi code actually uses a very
efficient algorithm.]

@code{mpz_jacobi} and @code{mpz_kronecker} are currently implemented with a
simple binary algorithm similar to that described for the GCDs (@pxref{Binary
GCD}). They're not very fast when both inputs are large. Lehmer's multi-step
improvement or a binary based multi-step algorithm is likely to be better.

When one operand fits a single limb, and that includes @code{mpz_kronecker_ui}
and friends, an initial reduction is done with either @code{mpn_mod_1} or
@code{mpn_modexact_1_odd}, followed by the binary algorithm on a single limb.
The binary algorithm is well suited to a single limb, and the whole
calculation in this case is quite efficient.
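
A simple binary Jacobi symbol calculation of the kind described here can be
sketched on single 64-bit words as below. Since this section is marked
obsolete, the sketch illustrates only the textbook binary method and makes no
claim about GMP's current code. The name @code{jacobi_binary_sketch} is
hypothetical, the lower operand must be odd, and the sign is tracked with
ordinary conditionals for readability.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: Jacobi symbol (a/n) for odd n, binary method.
   Returns 1, -1, or 0 when a and n have a common factor.  */
int
jacobi_binary_sketch (uint64_t a, uint64_t n)
{
  int result = 1;

  assert ((n & 1) != 0);

  while (a != 0)
    {
      /* Strip factors of 2 from a; (2/n) = -1 exactly when n is 3 or 5
         mod 8.  */
      while ((a & 1) == 0)
        {
          a >>= 1;
          if ((n & 7) == 3 || (n & 7) == 5)
            result = -result;
        }

      /* Both odd.  Keep the larger on top, using quadratic reciprocity:
         the sign flips only when both are 3 mod 4.  */
      if (a < n)
        {
          uint64_t t = a;
          a = n;
          n = t;
          if ((a & 3) == 3 && (n & 3) == 3)
            result = -result;
        }

      /* (a/n) = ((a-n)/n), and the difference of two odds is even.  */
      a -= n;
    }

  /* n now holds gcd of the original operands.  */
  return n == 1 ? result : 0;
}
```
@end verbatim

The structure is the same subtract and strip loop as the binary GCD, with the
symbol's sign rules applied at each strip and each swap.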
8897 8898In all the routines sign changes for the result are accumulated using some bit 8899twiddling, avoiding table lookups or conditional jumps. 8900 8901 8902@need 1000 8903@node Powering Algorithms, Root Extraction Algorithms, Greatest Common Divisor Algorithms, Algorithms 8904@section Powering Algorithms 8905@cindex Powering algorithms 8906 8907@menu 8908* Normal Powering Algorithm:: 8909* Modular Powering Algorithm:: 8910@end menu 8911 8912 8913@node Normal Powering Algorithm, Modular Powering Algorithm, Powering Algorithms, Powering Algorithms 8914@subsection Normal Powering 8915 8916Normal @code{mpz} or @code{mpf} powering uses a simple binary algorithm, 8917successively squaring and then multiplying by the base when a 1 bit is seen in 8918the exponent, as per Knuth section 4.6.3. The ``left to right'' 8919variant described there is used rather than algorithm A, since it's just as 8920easy and can be done with somewhat less temporary memory. 8921 8922 8923@node Modular Powering Algorithm, , Normal Powering Algorithm, Powering Algorithms 8924@subsection Modular Powering 8925 8926Modular powering is implemented using a @math{2^k}-ary sliding window 8927algorithm, as per ``Handbook of Applied Cryptography'' algorithm 14.85 8928(@pxref{References}). @math{k} is chosen according to the size of the 8929exponent. Larger exponents use larger values of @math{k}, the choice being 8930made to minimize the average number of multiplications that must supplement 8931the squaring. 8932 8933The modular multiplies and squarings use either a simple division or the REDC 8934method by Montgomery (@pxref{References}). REDC is a little faster, 8935essentially saving N single limb divisions in a fashion similar to an exact 8936remainder (@pxref{Exact Remainder}). 
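
The sliding window idea can be sketched for word-sized operands as follows.
This is an illustration under assumptions, not GMP's implementation: the name
@code{powm_window_sketch} is hypothetical, the window size @math{k} is fixed
at 3 rather than chosen from the exponent size, the modulus is limited to 32
bits so that products fit in 64 bits, and a plain @code{%} reduction is used
in place of REDC.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BITS 3                        /* k, fixed for simplicity */
#define TABLE_SIZE (1 << (WINDOW_BITS - 1))  /* odd powers only */

static uint32_t
mulmod (uint32_t a, uint32_t b, uint32_t m)
{
  return (uint32_t) (((uint64_t) a * b) % m);
}

/* Hypothetical sketch: base^e mod m by a 2^k-ary sliding window.  */
uint32_t
powm_window_sketch (uint32_t base, uint64_t e, uint32_t m)
{
  uint32_t odd[TABLE_SIZE];     /* odd[j] = base^(2j+1) mod m */
  uint32_t sq, acc;
  int i, j;

  assert (m != 0);
  if (m == 1)
    return 0;

  odd[0] = base % m;
  sq = mulmod (odd[0], odd[0], m);
  for (j = 1; j < TABLE_SIZE; j++)
    odd[j] = mulmod (odd[j - 1], sq, m);

  acc = 1;
  i = 63;                       /* scan exponent bits from the top */
  while (i >= 0)
    {
      if (((e >> i) & 1) == 0)
        {
          acc = mulmod (acc, acc, m);   /* zero bit: just square */
          i--;
        }
      else
        {
          /* Take the longest window of at most k bits that starts at
             bit i and ends in a 1 bit, so its value w is odd.  */
          int low = i - WINDOW_BITS + 1, len;
          unsigned w;
          if (low < 0)
            low = 0;
          while (((e >> low) & 1) == 0)
            low++;
          len = i - low + 1;
          w = (unsigned) ((e >> low) & ((1u << len) - 1u));

          for (j = 0; j < len; j++)     /* one squaring per window bit */
            acc = mulmod (acc, acc, m);
          acc = mulmod (acc, odd[w >> 1], m);   /* one multiply per window */
          i = low - 1;
        }
    }
  return acc;
}
```
@end verbatim

Only odd powers are tabulated because every window ends in a 1 bit. A larger
@math{k} means fewer multiplies per exponent bit but a bigger table to
precompute, which is the tradeoff behind choosing @math{k} from the exponent
size.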


@node Root Extraction Algorithms, Radix Conversion Algorithms, Powering Algorithms, Algorithms
@section Root Extraction Algorithms
@cindex Root extraction algorithms

@menu
* Square Root Algorithm::
* Nth Root Algorithm::
* Perfect Square Algorithm::
* Perfect Power Algorithm::
@end menu


@node Square Root Algorithm, Nth Root Algorithm, Root Extraction Algorithms, Root Extraction Algorithms
@subsection Square Root
@cindex Square root algorithm
@cindex Karatsuba square root algorithm

Square roots are taken using the ``Karatsuba Square Root'' algorithm by Paul
Zimmermann (@pxref{References}).

An input @math{n} is split into four parts of @math{k} bits each, so with
@math{b=2^k} we have @m{n = a_3b^3 + a_2b^2 + a_1b + a_0, n = a3*b^3 + a2*b^2
+ a1*b + a0}. Part @ms{a,3} must be ``normalized'' so that either the high or
second highest bit is set. In GMP, @math{k} is kept on a limb boundary and
the input is left shifted (by an even number of bits) to normalize.

The square root of the high two parts is taken, by recursive application of
the algorithm (bottoming out in a one-limb Newton's method),
@tex
$$ s',r' = \mathop{\rm sqrtrem} \> (a_3b + a_2) $$
@end tex
@ifnottex

@example
s1,r1 = sqrtrem (a3*b + a2)
@end example

@end ifnottex
This is an approximation to the desired root and is extended by a division to
give @math{s},@math{r},
@tex
$$\eqalign{
q,u &= \mathop{\rm divrem} \> (r'b + a_1, 2s') \cr
s &= s'b + q \cr
r &= ub + a_0 - q^2
}$$
@end tex
@ifnottex

@example
q,u = divrem (r1*b + a1, 2*s1)
s = s1*b + q
r = u*b + a0 - q^2
@end example

@end ifnottex
The normalization requirement on @ms{a,3} means at this point @math{s} is
either correct or 1 too big.
@math{r} is negative in the latter case, so
@tex
$$\eqalign{
\mathop{\rm if} \; r &< 0 \; \mathop{\rm then} \cr
r &\leftarrow r + 2s - 1 \cr
s &\leftarrow s - 1
}$$
@end tex
@ifnottex

@example
if r < 0 then
  r = r + 2*s - 1
  s = s - 1
@end example

@end ifnottex
The algorithm is expressed in a divide and conquer form, but as noted in the
paper it can also be viewed as a discrete variant of Newton's method, or as a
variation on the schoolboy method (no longer taught) for square roots two
digits at a time.

If the remainder @math{r} is not required then usually only a few high limbs
of @math{r} and @math{u} need to be calculated to determine whether an
adjustment to @math{s} is required. This optimization is not currently
implemented.

In the Karatsuba multiplication range this algorithm is @m{O({3\over2}
M(N/2)),O(1.5*M(N/2))}, where @math{M(n)} is the time to multiply two numbers
of @math{n} limbs. In the FFT multiplication range this grows to a bound of
@m{O(6 M(N/2)),O(6*M(N/2))}. In practice a factor of about 1.5 to 1.8 is
found in the Karatsuba and Toom-3 ranges, growing to 2 or 3 in the FFT range.

The algorithm does all its calculations in integers and the resulting
@code{mpn_sqrtrem} is used for both @code{mpz_sqrt} and @code{mpf_sqrt}.
The extended precision given by @code{mpf_sqrt_ui} is obtained by
padding with zero limbs.


@node Nth Root Algorithm, Perfect Square Algorithm, Square Root Algorithm, Root Extraction Algorithms
@subsection Nth Root
@cindex Root extraction algorithm
@cindex Nth root algorithm

Integer Nth roots are taken using Newton's method with the following
iteration, where @math{A} is the input and @math{n} is the root to be taken.
@tex
$$a_{i+1} = {1\over n} \left({A \over a_i^{n-1}} + (n-1)a_i \right)$$
@end tex
@ifnottex

@example
         1         A
a[i+1] = - * ( --------- + (n-1)*a[i] )
         n     a[i]^(n-1)
@end example

@end ifnottex
The initial approximation @m{a_1,a[1]} is generated bitwise by successively
powering a trial root with or without new 1 bits, aiming to be just above the
true root. The iteration converges quadratically when started from a good
approximation. When @math{n} is large more initial bits are needed to get
good convergence. The current implementation is not particularly well
optimized.


@node Perfect Square Algorithm, Perfect Power Algorithm, Nth Root Algorithm, Root Extraction Algorithms
@subsection Perfect Square
@cindex Perfect square algorithm

A significant fraction of non-squares can be quickly identified by checking
whether the input is a quadratic residue modulo small integers.

@code{mpz_perfect_square_p} first tests the input mod 256, which means just
examining the low byte. Only 44 different values occur for squares mod 256,
so 82.8% of inputs can be immediately identified as non-squares.

On a 32-bit system similar tests are done mod 9, 5, 7, 13 and 17, for a total
99.25% of inputs identified as non-squares. On a 64-bit system 97 is tested
too, for a total 99.62%.

These moduli are chosen because they're factors of @math{2^@W{24}-1} (or
@math{2^@W{48}-1} for 64-bits), and such a remainder can be quickly taken just
using additions (see @code{mpn_mod_34lsub1}).

When nails are in use moduli are instead selected by the @file{gen-psqr.c}
program and applied with an @code{mpn_mod_1}. The same @math{2^@W{24}-1} or
@math{2^@W{48}-1} could be done with nails using some extra bit shifts, but
this is not currently implemented.
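
The residue filtering described so far can be sketched in Python as follows,
using the 32-bit set of moduli. This is only an illustration: a plain
@code{%} stands in for the addition-based @code{mpn_mod_34lsub1}, the tables
are built at import time rather than by @file{gen-psqr.c}, and the names
@code{SQUARE_MOD_256}, @code{RESIDUE_TABLES} and @code{is_perfect_square}
are hypothetical.

```python
import math

# Quadratic residues mod 256: only 44 of the 256 entries end up set.
SQUARE_MOD_256 = [False] * 256
for i in range(256):
    SQUARE_MOD_256[(i * i) % 256] = True

# Small moduli that all divide 2^24 - 1, each with its own residue table.
SMALL_MODULI = (9, 5, 7, 13, 17)
RESIDUE_TABLES = []
for m in SMALL_MODULI:
    table = [False] * m
    for i in range(m):
        table[(i * i) % m] = True
    RESIDUE_TABLES.append(table)

def is_perfect_square(n):
    """Return True if the integer n is a perfect square."""
    if n < 0:
        return False
    # Cheapest test first: look only at the low byte.
    if not SQUARE_MOD_256[n & 255]:
        return False
    # One remainder mod 2^24 - 1 serves every small modulus that divides it.
    r = n % ((1 << 24) - 1)
    for m, table in zip(SMALL_MODULI, RESIDUE_TABLES):
        if not table[r % m]:
            return False
    # Survivors still need a real square root to confirm.
    s = math.isqrt(n)
    return s * s == n
```

Counting the entries set in each table reproduces the 44 residues mod 256
quoted above.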

In any case each modulus is applied to the @code{mpn_mod_34lsub1} or
@code{mpn_mod_1} remainder and a table lookup identifies non-squares. By
using a ``modexact'' style calculation, and suitably permuted tables, just one
multiply each is required, see the code for details. Moduli are also combined
to save operations, so long as the lookup tables don't become too big.
@file{gen-psqr.c} does all the pre-calculations.

A square root must still be taken for any value that passes these tests, to
verify it's really a square and not one of the small fraction of non-squares
that get through (i.e.@: a pseudo-square to all the tested bases).

Clearly more residue tests could be done, @code{mpz_perfect_square_p} only
uses a compact and efficient set. Big inputs would probably benefit from more
residue testing, small inputs might be better off with less. The assumed
distribution of squares versus non-squares in the input would affect such
considerations.


@node Perfect Power Algorithm, , Perfect Square Algorithm, Root Extraction Algorithms
@subsection Perfect Power
@cindex Perfect power algorithm

Detecting perfect powers is required by some factorization algorithms.
Currently @code{mpz_perfect_power_p} is implemented using repeated Nth root
extractions, though naturally only prime roots need to be considered.
(@xref{Nth Root Algorithm}.)

If a prime divisor @math{p} with multiplicity @math{e} can be found, then only
roots which are divisors of @math{e} need to be considered, much reducing the
work necessary. To this end divisibility by a set of small primes is checked.


@node Radix Conversion Algorithms, Other Algorithms, Root Extraction Algorithms, Algorithms
@section Radix Conversion
@cindex Radix conversion algorithms

Radix conversions are less important than other algorithms.
A program
dominated by conversions should probably use a different data representation.

@menu
* Binary to Radix::
* Radix to Binary::
@end menu


@node Binary to Radix, Radix to Binary, Radix Conversion Algorithms, Radix Conversion Algorithms
@subsection Binary to Radix

Conversions from binary to a power-of-2 radix use a simple and fast
@math{O(N)} bit extraction algorithm.

Conversions from binary to other radices use one of two algorithms. Sizes
below @code{GET_STR_PRECOMPUTE_THRESHOLD} use a basic @math{O(N^2)} method.
Repeated divisions by @math{b^n} are made, where @math{b} is the radix and
@math{n} is the biggest power that fits in a limb. But instead of simply
using the remainder @math{r} from such divisions, an extra divide step is done
to give a fractional limb representing @math{r/b^n}. The digits of @math{r}
can then be extracted using multiplications by @math{b} rather than divisions.
Special case code is provided for decimal, allowing multiplications by 10 to
optimize to shifts and adds.

Above @code{GET_STR_PRECOMPUTE_THRESHOLD} a sub-quadratic algorithm is used.
For an input @math{t}, powers @m{b^{n2^i},b^(n*2^i)} of the radix are
calculated, until a power between @math{t} and @m{\sqrt{t},sqrt(t)} is
reached. @math{t} is then divided by that largest power, giving a quotient
which is the digits above that power, and a remainder which is those below.
These two parts are in turn divided by the second highest power, and so on
recursively. When a piece has been divided down to less than
@code{GET_STR_DC_THRESHOLD} limbs, the basecase algorithm described above is
used.
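
The recursive splitting can be sketched in Python as below. This is a
minimal illustration rather than the GMP code: the chunk size @math{n=9}
is an assumption matching decimal on a 32-bit limb, the recursion runs all
the way down to a single chunk instead of stopping at
@code{GET_STR_DC_THRESHOLD} limbs, and the names @code{get_str} and
@code{_basecase} are hypothetical.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def _basecase(t, base, n):
    """Convert t < base**n into exactly n digits by repeated small divisions."""
    out = []
    for _ in range(n):
        t, d = divmod(t, base)
        out.append(DIGITS[d])
    return "".join(reversed(out))

def get_str(t, base=10, n=9):
    """Convert a non-negative integer t to a digit string in the given base."""
    if t < 0 or not 2 <= base <= 36 or n < 1:
        raise ValueError("need t >= 0, 2 <= base <= 36 and n >= 1")
    # powers[i] = base**(n * 2**i); stop once the square of the top power
    # exceeds t, so the top power is at least about sqrt(t).
    powers = [base ** n]
    while powers[-1] * powers[-1] <= t:
        powers.append(powers[-1] * powers[-1])

    def convert(x, i):
        # Requires x < powers[i]**2; returns exactly n * 2**(i+1) digits,
        # or n digits when i is -1.
        if i < 0:
            return _basecase(x, base, n)
        q, r = divmod(x, powers[i])     # one big division splits x in half
        return convert(q, i - 1) + convert(r, i - 1)

    s = convert(t, len(powers) - 1).lstrip("0")
    return s or "0"
```

Every level reuses the same table of powers, which is why the cost of
computing them is paid only once per conversion.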

The advantage of this algorithm is that big divisions can make use of the
sub-quadratic divide and conquer division (@pxref{Divide and Conquer
Division}), and big divisions tend to have lower overheads than lots of
separate single limb divisions anyway. But in any case the cost of
calculating the powers @m{b^{n2^i},b^(n*2^i)} must first be overcome.

@code{GET_STR_PRECOMPUTE_THRESHOLD} and @code{GET_STR_DC_THRESHOLD} represent
the same basic thing, the point where it becomes worth doing a big division to
cut the input in half. @code{GET_STR_PRECOMPUTE_THRESHOLD} includes the cost
of calculating the radix power required, whereas @code{GET_STR_DC_THRESHOLD}
assumes that's already available, which is the case when recursing.

Since the base case produces digits from least to most significant but they
want to be stored from most to least, it's necessary to calculate in advance
how many digits there will be, or at least be sure not to underestimate that.
For GMP the number of input bits is multiplied by @code{chars_per_bit_exactly}
from @code{mp_bases}, rounding up. The result is either correct or one too
big.

Examining some of the high bits of the input could increase the chance of
getting the exact number of digits, but an exact result every time would not
be practical, since in general the difference between numbers 100@dots{} and
99@dots{} is only in the last few bits and the work to identify 99@dots{}
might well be almost as much as a full conversion.

@code{mpf_get_str} doesn't currently use the algorithm described here, it
multiplies or divides by a power of @math{b} to move the radix point to
just above the highest non-zero digit (or at worst one above that location),
then multiplies by @math{b^n} to bring out digits. This is @math{O(N^2)} and
is certainly not optimal.
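
The fractional limb technique used by the base case can be sketched in
Python as below, for decimal with an assumed 32-bit limb and 9 digits per
chunk. The function name @code{chunk_digits} and the rounding choice (the
fraction is rounded up by one unit) are assumptions made for this sketch,
not a description of the GMP source.

```python
LIMB_BITS = 32
BASE = 10
CHUNK = 9                       # largest n with 10**n < 2**32

def chunk_digits(r):
    """Return the CHUNK decimal digits of r, most significant first."""
    big = BASE ** CHUNK
    if not 0 <= r < big:
        raise ValueError("r must satisfy 0 <= r < 10**9")
    # One extra division turns r into a fixed-point fraction r / 10**9,
    # scaled by 2**32 and rounded up so truncation never loses a digit.
    frac = (r << LIMB_BITS) // big + 1
    mask = (1 << LIMB_BITS) - 1
    digits = []
    for _ in range(CHUNK):
        frac *= BASE                        # a multiply instead of a divide
        digits.append(frac >> LIMB_BITS)    # the integer part is the next digit
        frac &= mask                        # keep only the fractional part
    return digits
```

The rounding error is at most @math{2^{-32}}, which is smaller than
@math{10^{-9}}, so it cannot disturb any of the nine digits produced.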

The @math{r/b^n} scheme described above for using multiplications to bring out
digits might be useful for more than a single limb. Some brief experiments
with it on the base case when recursing didn't give a noticeable improvement,
but perhaps that was only due to the implementation. Something similar would
work for the sub-quadratic divisions too, though there would be the cost of
calculating a bigger radix power.

Another possible improvement for the sub-quadratic part would be to arrange
for radix powers that balanced the sizes of quotient and remainder produced,
i.e.@: the highest power would be a @m{b^{nk},b^(n*k)} approximately equal to
@m{\sqrt{t},sqrt(t)}, not restricted to a @math{2^i} factor. That ought to
smooth out a graph of times against sizes, but may or may not be a net
speedup.


@node Radix to Binary, , Binary to Radix, Radix Conversion Algorithms
@subsection Radix to Binary

@strong{This section needs to be rewritten, it currently describes the
algorithms used before GMP 4.3.}

Conversions from a power-of-2 radix into binary use a simple and fast
@math{O(N)} bitwise concatenation algorithm.

Conversions from other radices use one of two algorithms. Sizes below
@code{SET_STR_PRECOMPUTE_THRESHOLD} use a basic @math{O(N^2)} method. Groups
of @math{n} digits are converted to limbs, where @math{n} is the biggest
power of the base @math{b} which will fit in a limb, then those groups are
accumulated into the result by multiplying by @math{b^n} and adding. This
saves multi-precision operations, as per Knuth section 4.4 part E
(@pxref{References}). Some special case code is provided for decimal, giving
the compiler a chance to optimize multiplications by 10.

Above @code{SET_STR_PRECOMPUTE_THRESHOLD} a sub-quadratic algorithm is used.
First groups of @math{n} digits are converted into limbs.
Then adjacent
limbs are combined into limb pairs with @m{xb^n+y,x*b^n+y}, where @math{x}
and @math{y} are the limbs. Adjacent limb pairs are combined into quads
similarly with @m{xb^{2n}+y,x*b^(2n)+y}. This continues until a single block
remains, that being the result.

The advantage of this method is that the multiplications for each @math{x} are
big blocks, allowing Karatsuba and higher algorithms to be used. But the cost
of calculating the powers @m{b^{n2^i},b^(n*2^i)} must be overcome.
@code{SET_STR_PRECOMPUTE_THRESHOLD} usually ends up quite big, around 5000 digits, and on
some processors much bigger still.

@code{SET_STR_PRECOMPUTE_THRESHOLD} is based on the input digits (and tuned
for decimal), though it might be better based on a limb count, so as to be
independent of the base. But that sort of count isn't used by the base case
and so would need some sort of initial calculation or estimate.

The main reason @code{SET_STR_PRECOMPUTE_THRESHOLD} is so much bigger than the
corresponding @code{GET_STR_PRECOMPUTE_THRESHOLD} is that @code{mpn_mul_1} is
much faster than @code{mpn_divrem_1} (often by a factor of 5, or more).
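
The pairwise combining scheme can be sketched in Python as follows. This is
an illustration of the method as described in this section only, with
assumed details: the chunk size @math{n=9}, the use of Python's built-in
@code{int} to convert each small group, and the hypothetical name
@code{set_str}.

```python
def set_str(digits, base=10, n=9):
    """Convert a non-empty digit string to an integer by pairwise combining."""
    if not digits:
        raise ValueError("digits must be non-empty")
    # Groups of n digits, least significant group first.
    chunks = [int(digits[max(i - n, 0):i], base)
              for i in range(len(digits), 0, -n)]
    power = base ** n
    while len(chunks) > 1:
        paired = []
        # Combine neighbours as x * b^(n*2^i) + y, with x the more significant.
        for i in range(0, len(chunks) - 1, 2):
            paired.append(chunks[i + 1] * power + chunks[i])
        if len(chunks) % 2:
            # An unpaired top block moves up a level unchanged.
            paired.append(chunks[-1])
        chunks = paired
        power *= power                  # next level uses the squared power
    return chunks[0]
```

At each level the blocks double in size, so the multiplications become large
enough for the faster multiplication algorithms to apply.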


@need 1000
@node Other Algorithms, Assembly Coding, Radix Conversion Algorithms, Algorithms
@section Other Algorithms

@menu
* Prime Testing Algorithm::
* Factorial Algorithm::
* Binomial Coefficients Algorithm::
* Fibonacci Numbers Algorithm::
* Lucas Numbers Algorithm::
* Random Number Algorithms::
@end menu


@node Prime Testing Algorithm, Factorial Algorithm, Other Algorithms, Other Algorithms
@subsection Prime Testing
@cindex Prime testing algorithms

The primality testing in @code{mpz_probab_prime_p} (@pxref{Number Theoretic
Functions}) first does some trial division by small factors and then uses the
Miller-Rabin probabilistic primality testing algorithm, as described in Knuth
section 4.5.4 algorithm P (@pxref{References}).

For an odd input @math{n}, and with @math{n = q@GMPmultiply{}2^k+1} where
@math{q} is odd, this algorithm selects a random base @math{x} and tests
whether @math{x^q @bmod{} n} is 1 or @math{-1}, or an @m{x^{q2^j} \bmod n,
x^(q*2^j) mod n} is @math{-1}, for @math{1@le{}j<k}. If so then @math{n}
is probably prime, if not then @math{n} is definitely composite.

Any prime @math{n} will pass the test, but some composites do too. Such
composites are known as strong pseudoprimes to base @math{x}. No @math{n} is
a strong pseudoprime to more than @math{1/4} of all bases (see Knuth exercise
22), hence with @math{x} chosen at random there's no more than a @math{1/4}
chance a ``probable prime'' will in fact be composite.

In fact strong pseudoprimes are quite rare, making the test much more
powerful than this analysis would suggest, but @math{1/4} is all that's proven
for an arbitrary @math{n}.
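
A single round of the Miller-Rabin test, and a driver around it, can be
sketched in Python as below. This is a generic textbook formulation, not
the GMP code. The function names, the short list of trial divisors, the
default of 25 rounds and the seeded generator are all assumptions made for
the illustration.

```python
import random

def miller_rabin_round(n, x):
    """One strong probable-prime test of odd n > 2 to base x."""
    q, k = n - 1, 0
    while q % 2 == 0:               # write n - 1 as q * 2^k with q odd
        q //= 2
        k += 1
    y = pow(x, q, n)
    if y == 1 or y == n - 1:
        return True
    for _ in range(k - 1):
        y = y * y % n
        if y == n - 1:
            return True
        if y == 1:
            return False            # reached 1 without passing through -1
    return False

def probably_prime(n, reps=25, seed=0):
    """Trial division by a few small primes, then reps random rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    rng = random.Random(seed)
    for _ in range(reps):
        x = rng.randrange(2, n - 1)
        if not miller_rabin_round(n, x):
            return False            # definitely composite
    return True                     # probably prime
```

For example 2047 = 23 * 89 passes a round to base 2 but fails to base 3,
which shows why several independently chosen bases are used.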


@node Factorial Algorithm, Binomial Coefficients Algorithm, Prime Testing Algorithm, Other Algorithms
@subsection Factorial
@cindex Factorial algorithm

Factorials are calculated by a combination of two algorithms. An idea is
shared among them: to compute the odd part of the factorial; a final step
takes account of the power of @math{2} term, by shifting.

For small @math{n}, the odd factor of @math{n!} is computed with the simple
observation that it is equal to the product of all positive odd numbers
not exceeding @math{n} times the odd factor of @m{\lfloor n/2\rfloor!, [n/2]!},
where @m{\lfloor x\rfloor, [x]} is the integer part of @math{x}, and so on
recursively. The procedure can be best illustrated with an example,

@quotation
@math{23! = (23.21.19.17.15.13.11.9.7.5.3)(11.9.7.5.3)(5.3)2^{19}}
@end quotation

Current code collects all the factors in a single list, with a loop and no
recursion, and computes the product, with no special care for repeated chunks.

When @math{n} is larger, the computation passes through prime sieving. A helper
function is used, as suggested by Peter Luschny:
@tex
$$\mathop{\rm msf}(n) = {n!\over\lfloor n/2\rfloor!^2\cdot2^k} = \prod_{p=3}^{n}
p^{\mathop{\rm L}(p,n)} $$
@end tex
@ifnottex

@example
                            n
                          -----
               n!          | |   L(p,n)
msf(n) = -------------- =  | |  p
          [n/2]!^2.2^k     p=3
@end example
@end ifnottex

Where @math{p} ranges over odd prime numbers. The exponent @math{k} is chosen to
obtain an odd integer number: @math{k} is the number of 1 bits in the binary
representation of @m{\lfloor n/2\rfloor, [n/2]}.
The function L@math{(p,n)}
can be defined as zero when @math{p} is composite, and, for any prime
@math{p}, it is computed with:
@tex
$$\mathop{\rm L}(p,n) = \sum_{i>0}\left\lfloor{n\over p^i}\right\rfloor\bmod2
\leq\log_p(n)$$
@end tex
@ifnottex

@example
          ---
           \    n
L(p,n) =   /  [---] mod 2 <= log (n) .
          ---  p^i              p
          i>0
@end example
@end ifnottex

With this helper function, we are able to compute the odd part of @math{n!}
using the recursion implied by @m{n!=\lfloor n/2\rfloor!^2\cdot\mathop{\rm
msf}(n)\cdot2^k , n!=[n/2]!^2*msf(n)*2^k}. The recursion stops using the
small-@math{n} algorithm on some @m{\lfloor n/2^i\rfloor, [n/2^i]}.

Both the above algorithms use binary splitting to compute the product of many
small factors. At first as many products as possible are accumulated in a
single register, generating a list of factors that fit in a machine word. This
list is then split into halves, and the product is computed recursively.

Such splitting is more efficient than repeated N@cross{}1 multiplies since it
forms big multiplies, allowing Karatsuba and higher algorithms to be used.
And even below the Karatsuba threshold a big block of work can be more
efficient for the basecase algorithm.


@node Binomial Coefficients Algorithm, Fibonacci Numbers Algorithm, Factorial Algorithm, Other Algorithms
@subsection Binomial Coefficients
@cindex Binomial coefficient algorithm

Binomial coefficients @m{\left({n}\atop{k}\right), C(n@C{}k)} are calculated
by first arranging @math{k @le{} n/2} using @m{\left({n}\atop{k}\right) =
\left({n}\atop{n-k}\right), C(n@C{}k) = C(n@C{}n-k)} if necessary, and then
evaluating the following product simply from @math{i=2} to @math{i=k}.
@tex
$$ \left({n}\atop{k}\right) = (n-k+1) \prod_{i=2}^{k} {{n-k+i} \over i} $$
@end tex
@ifnottex

@example
                       k  (n-k+i)
C(n,k) =  (n-k+1) * prod -------
                      i=2    i
@end example

@end ifnottex
It's easy to show that each denominator @math{i} will divide the product so
far, so the exact division algorithm is used (@pxref{Exact Division}).

The numerators @math{n-k+i} and denominators @math{i} are first accumulated
into as many as fit a limb, to save multi-precision operations, though for
@code{mpz_bin_ui} this applies only to the divisors, since @math{n} is an
@code{mpz_t} and @math{n-k+i} in general won't fit in a limb at all.


@node Fibonacci Numbers Algorithm, Lucas Numbers Algorithm, Binomial Coefficients Algorithm, Other Algorithms
@subsection Fibonacci Numbers
@cindex Fibonacci number algorithm

The Fibonacci functions @code{mpz_fib_ui} and @code{mpz_fib2_ui} are designed
for calculating isolated @m{F_n,F[n]} or @m{F_n,F[n]},@m{F_{n-1},F[n-1]}
values efficiently.

For small @math{n}, a table of single limb values in @code{__gmp_fib_table} is
used. On a 32-bit limb this goes up to @m{F_{47},F[47]}, or on a 64-bit limb
up to @m{F_{93},F[93]}. For convenience the table starts at @m{F_{-1},F[-1]}.

Beyond the table, values are generated with a binary powering algorithm,
calculating a pair @m{F_n,F[n]} and @m{F_{n-1},F[n-1]} working from high to
low across the bits of @math{n}. The formulas used are
@tex
$$\eqalign{
  F_{2k+1} &= 4F_k^2 - F_{k-1}^2 + 2(-1)^k \cr
  F_{2k-1} &= F_k^2 + F_{k-1}^2 \cr
  F_{2k} &= F_{2k+1} - F_{2k-1}
}$$
@end tex
@ifnottex

@example
F[2k+1] = 4*F[k]^2 - F[k-1]^2 + 2*(-1)^k
F[2k-1] = F[k]^2 + F[k-1]^2

F[2k] = F[2k+1] - F[2k-1]
@end example

@end ifnottex
At each step, @math{k} is the high @math{b} bits of @math{n}.
If the next bit
of @math{n} is 0 then @m{F_{2k},F[2k]},@m{F_{2k-1},F[2k-1]} is used, or if
it's a 1 then @m{F_{2k+1},F[2k+1]},@m{F_{2k},F[2k]} is used, and the process
repeated until all bits of @math{n} are incorporated. Notice these formulas
require just two squares per bit of @math{n}.

It'd be possible to handle the first few @math{n} above the single limb table
with simple additions, using the defining Fibonacci recurrence @m{F_{k+1} =
F_k + F_{k-1}, F[k+1]=F[k]+F[k-1]}, but this is not done since it usually
turns out to be faster for only about 10 or 20 values of @math{n}, and
including a block of code for just those doesn't seem worthwhile. If they
really mattered it'd be better to extend the data table.

Using a table avoids lots of calculations on small numbers, and makes small
@math{n} go fast. A bigger table would make more small @math{n} go fast, it's
just a question of balancing size against desired speed. For GMP the code is
kept compact, with the emphasis primarily on a good powering algorithm.

@code{mpz_fib2_ui} returns both @m{F_n,F[n]} and @m{F_{n-1},F[n-1]}, but
@code{mpz_fib_ui} is only interested in @m{F_n,F[n]}. In this case the last
step of the algorithm can become one multiply instead of two squares. One of
the following two formulas is used, according as @math{n} is odd or even.
@tex
$$\eqalign{
  F_{2k} &= F_k (F_k + 2F_{k-1}) \cr
  F_{2k+1} &= (2F_k + F_{k-1}) (2F_k - F_{k-1}) + 2(-1)^k
}$$
@end tex
@ifnottex

@example
F[2k] = F[k]*(F[k]+2F[k-1])

F[2k+1] = (2F[k]+F[k-1])*(2F[k]-F[k-1]) + 2*(-1)^k
@end example

@end ifnottex
@m{F_{2k+1},F[2k+1]} here is the same as above, just rearranged to be a
multiply.
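
The pair-doubling scheme of this subsection can be sketched in Python as
below. It is an illustration only: the walk starts from @math{k=1} rather
than from a table entry, and the name @code{fib_pair} is hypothetical.

```python
def fib_pair(n):
    """Return (F[n], F[n-1]) for n >= 1 using two squarings per bit of n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    fk, fk1 = 1, 0              # k = 1 is the top bit of n: F[1], F[0]
    k = 1
    # Walk the remaining bits of n from high to low.
    for shift in range(n.bit_length() - 2, -1, -1):
        sq_k = fk * fk          # the only two big squarings in this step
        sq_k1 = fk1 * fk1
        f2k_plus_1 = 4 * sq_k - sq_k1 + (2 if k % 2 == 0 else -2)
        f2k_minus_1 = sq_k + sq_k1
        f2k = f2k_plus_1 - f2k_minus_1
        if (n >> shift) & 1:
            fk, fk1 = f2k_plus_1, f2k       # k becomes 2k+1
            k = 2 * k + 1
        else:
            fk, fk1 = f2k, f2k_minus_1      # k becomes 2k
            k = 2 * k
    return fk, fk1
```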
For interest, the @m{2(-1)^k, 2*(-1)^k} term both here and above
can be applied just to the low limb of the calculation, without a carry or
borrow into further limbs, which saves some code size. See comments with
@code{mpz_fib_ui} and the internal @code{mpn_fib2_ui} for how this is done.


@node Lucas Numbers Algorithm, Random Number Algorithms, Fibonacci Numbers Algorithm, Other Algorithms
@subsection Lucas Numbers
@cindex Lucas number algorithm

@code{mpz_lucnum2_ui} derives a pair of Lucas numbers from a pair of Fibonacci
numbers with the following simple formulas.
@tex
$$\eqalign{
  L_k &= F_k + 2F_{k-1} \cr
  L_{k-1} &= 2F_k - F_{k-1}
}$$
@end tex
@ifnottex

@example
L[k] = F[k] + 2*F[k-1]
L[k-1] = 2*F[k] - F[k-1]
@end example

@end ifnottex
@code{mpz_lucnum_ui} is only interested in @m{L_n,L[n]}, and some work can be
saved. Trailing zero bits on @math{n} can be handled with a single square
each.
@tex
$$ L_{2k} = L_k^2 - 2(-1)^k $$
@end tex
@ifnottex

@example
L[2k] = L[k]^2 - 2*(-1)^k
@end example

@end ifnottex
And the lowest 1 bit can be handled with one multiply of a pair of Fibonacci
numbers, similar to what @code{mpz_fib_ui} does.
@tex
$$ L_{2k+1} = 5F_{k-1} (2F_k + F_{k-1}) - 4(-1)^k $$
@end tex
@ifnottex

@example
L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*(-1)^k
@end example

@end ifnottex


@node Random Number Algorithms, , Lucas Numbers Algorithm, Other Algorithms
@subsection Random Numbers
@cindex Random number algorithms

For the @code{urandomb} functions, random numbers are generated simply by
concatenating bits produced by the generator. As long as the generator has
good randomness properties this will produce well-distributed @math{N} bit
numbers.

For the @code{urandomm} functions, random numbers in a range @math{0@le{}R<N}
are generated by taking values @math{R} of @m{\lceil \log_2 N \rceil,
ceil(log2(N))} bits each until one satisfies @math{R<N}. This will normally
require only one or two attempts, but the attempts are limited in case the
generator is somehow degenerate and produces only 1 bits or similar.

@cindex Mersenne twister algorithm
The Mersenne Twister generator is by Matsumoto and Nishimura
(@pxref{References}). It has a non-repeating period of @math{2^@W{19937}-1},
which is a Mersenne prime, hence the name of the generator. The state is 624
words of 32-bits each, which is iterated with one XOR and shift for each
32-bit word generated, making the algorithm very fast. Randomness properties
are also very good and this is the default algorithm used by GMP.

@cindex Linear congruential algorithm
Linear congruential generators are described in many text books, for instance
Knuth volume 2 (@pxref{References}). With a modulus @math{M} and parameters
@math{A} and @math{C}, an integer state @math{S} is iterated by the formula
@math{S @leftarrow{} A@GMPmultiply{}S+C @bmod{} M}. At each step the new
state is a linear function of the previous, mod @math{M}, hence the name of
the generator.

In GMP only moduli of the form @math{2^N} are supported, and the current
implementation is not as well optimized as it could be. Overheads are
significant when @math{N} is small, and when @math{N} is large clearly the
multiply at each step will become slow. This is not a big concern, since the
Mersenne Twister generator is better in every respect and is therefore
recommended for all normal applications.

For both generators the current state can be deduced by observing enough
output and applying some linear algebra (over GF(2) in the case of the
Mersenne Twister).
This generally means raw output is unsuitable for
cryptographic applications without further hashing or the like.


@node Assembly Coding, , Other Algorithms, Algorithms
@section Assembly Coding
@cindex Assembly coding

The assembly subroutines in GMP are the most significant source of speed at
small to moderate sizes. At larger sizes algorithm selection becomes more
important, but of course speedups in low level routines will still speed up
everything proportionally.

Carry handling and widening multiplies that are important for GMP can't be
easily expressed in C@. GCC @code{asm} blocks help a lot and are provided in
@file{longlong.h}, but hand coding low level routines invariably offers a
speedup over generic C by a factor of anything from 2 to 10.

@menu
* Assembly Code Organisation::
* Assembly Basics::
* Assembly Carry Propagation::
* Assembly Cache Handling::
* Assembly Functional Units::
* Assembly Floating Point::
* Assembly SIMD Instructions::
* Assembly Software Pipelining::
* Assembly Loop Unrolling::
* Assembly Writing Guide::
@end menu


@node Assembly Code Organisation, Assembly Basics, Assembly Coding, Assembly Coding
@subsection Code Organisation
@cindex Assembly code organisation
@cindex Code organisation

The various @file{mpn} subdirectories contain machine-dependent code, written
in C or assembly. The @file{mpn/generic} subdirectory contains default code,
used when there's no machine-specific version of a particular file.

Each @file{mpn} subdirectory is for an ISA family. Generally 32-bit and
64-bit variants in a family cannot share code and have separate directories.
Within a family further subdirectories may exist for CPU variants.

In each directory a @file{nails} subdirectory may exist, holding code with
nails support for that CPU variant.
A @code{NAILS_SUPPORT} directive in each
file indicates the nails values the code handles. Nails code only exists
where it's faster, or promises to be faster, than plain code. There's no
effort put into nails if they're not going to enhance a given CPU.


@node Assembly Basics, Assembly Carry Propagation, Assembly Code Organisation, Assembly Coding
@subsection Assembly Basics

@code{mpn_addmul_1} and @code{mpn_submul_1} are the most important routines
for overall GMP performance. All multiplications and divisions come down to
repeated calls to these. @code{mpn_add_n}, @code{mpn_sub_n},
@code{mpn_lshift} and @code{mpn_rshift} are next most important.

On some CPUs assembly versions of the internal functions
@code{mpn_mul_basecase} and @code{mpn_sqr_basecase} give significant speedups,
mainly through avoiding function call overheads. They can also potentially
make better use of a wide superscalar processor, as can bigger primitives like
@code{mpn_addmul_2} or @code{mpn_addmul_4}.

The restrictions on overlaps between sources and destinations
(@pxref{Low-level Functions}) are designed to facilitate a variety of
implementations. For example, knowing @code{mpn_add_n} won't have partly
overlapping sources and destination means reading can be done far ahead of
writing on superscalar processors, and loops can be vectorized on a vector
processor, depending on the carry handling.


@node Assembly Carry Propagation, Assembly Cache Handling, Assembly Basics, Assembly Coding
@subsection Carry Propagation
@cindex Assembly carry propagation

The problem that presents most challenges in GMP is propagating carries from
one limb to the next. In functions like @code{mpn_addmul_1} and
@code{mpn_add_n}, carries are the only dependencies between limb operations.
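
To make that dependency concrete, the following Python sketch emulates a
limb-by-limb addition in the style of @code{mpn_add_n}. It is a model for
illustration, not the GMP code: 64-bit limbs are assumed, fixed-width
registers are emulated by masking, and the name @code{add_n} is
hypothetical. The variable @code{carry} is the only value passed from one
iteration to the next.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1

def add_n(xp, yp):
    """Add two equal-length limb lists, least significant limb first.

    Returns the list of sum limbs and the carry out of the top limb.
    """
    if len(xp) != len(yp):
        raise ValueError("operands must have the same number of limbs")
    out = []
    carry = 0
    for x, y in zip(xp, yp):
        s = (x + y) & LIMB_MASK         # wrapping add of the two limbs
        c1 = 1 if s < x else 0          # a compare detects the overflow
        t = (s + carry) & LIMB_MASK     # add the incoming carry
        c2 = 1 if t < s else 0          # which may itself overflow
        out.append(t)
        carry = c1 | c2                 # at most one of the two can be set
    return out, carry
```

The two compares per limb are what a machine without a carry flag has to
execute, and they sit on the path from carry-in to carry-out.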

On processors with carry flags, a straightforward CISC style @code{adc} is
generally best. AMD K6 @code{mpn_addmul_1} however is an example of an
unusual set of circumstances where a branch works out better.

On RISC processors generally an add and compare for overflow is used. This
sort of thing can be seen in @file{mpn/generic/aors_n.c}. Some carry
propagation schemes require 4 instructions, meaning at least 4 cycles per
limb, but other schemes may use just 1 or 2. On wide superscalar processors
performance may be completely determined by the number of dependent
instructions between carry-in and carry-out for each limb.

On vector processors good use can be made of the fact that a carry bit only
very rarely propagates more than one limb. When adding a single bit to a
limb, there's only a carry out if that limb was @code{0xFF@dots{}FF} which on
random data will be only 1 in @m{2\GMPraise{@code{mp\_bits\_per\_limb}},
2^mp_bits_per_limb}. @file{mpn/cray/add_n.c} is an example of this, it adds
all limbs in parallel, adds one set of carry bits in parallel and then only
rarely needs to fall through to a loop propagating further carries.

On the x86s, GCC (as of version 2.95.2) doesn't generate particularly good code
for the RISC style idioms that are necessary to handle carry bits in
C@. Often conditional jumps are generated where @code{adc} or @code{sbb} forms
would be better. And so unfortunately almost any loop involving carry bits
needs to be coded in assembly for best results.


@node Assembly Cache Handling, Assembly Functional Units, Assembly Carry Propagation, Assembly Coding
@subsection Cache Handling
@cindex Assembly cache handling

GMP aims to perform well both on operands that fit entirely in L1 cache and
those which don't.

Basic routines like @code{mpn_add_n} or @code{mpn_lshift} are often used on
large operands, so L2 and main memory performance is important for them.
@code{mpn_mul_1} and @code{mpn_addmul_1} are mostly used for multiply and
square basecases, so L1 performance matters most for them, unless assembly
versions of @code{mpn_mul_basecase} and @code{mpn_sqr_basecase} exist, in
which case the remaining uses are mostly for larger operands.

For L2 or main memory operands, memory access times will almost certainly be
more than the calculation time. The aim therefore is to maximize memory
throughput, by starting a load of the next cache line while processing the
contents of the previous one. Clearly this is only possible if the chip has a
lock-up free cache or some sort of prefetch instruction. Most current chips
have both these features.

Prefetching sources combines well with loop unrolling, since a prefetch can be
initiated once per unrolled loop (or more than once if the loop covers more
than one cache line).

On CPUs without write-allocate caches, prefetching destinations will ensure
individual stores don't go further down the cache hierarchy, limiting
bandwidth. Of course for calculations which are slow anyway, like
@code{mpn_divrem_1}, write-throughs might be fine.

The distance ahead to prefetch will be determined by memory latency versus
throughput. The aim of course is to have data arriving continuously, at peak
throughput. Some CPUs have limits on the number of fetches or prefetches in
progress.

If a special prefetch instruction doesn't exist then a plain load can be used,
but in that case care must be taken not to attempt to read past the end of an
operand, since that might produce a segmentation violation.
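
The plain-load variant can be sketched in C as follows. This is only an
illustration under stated assumptions, not code from GMP: the names
@code{limb_t} and @code{sketch_sum_touch_ahead} are hypothetical, a 64-byte
cache line is assumed, and the prefetch distance is an arbitrary guess that
would need tuning on a real CPU. The point shown is the bound check that keeps
the early load inside the operand.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t limb_t;            /* hypothetical limb type for this sketch */

#define LIMBS_PER_LINE 8            /* assumes 64-byte cache lines */
#define AHEAD (4 * LIMBS_PER_LINE)  /* prefetch distance, an arbitrary guess */

/* Sum the limbs of {a,n} modulo 2^64, touching data AHEAD limbs in advance
   with a plain load.  The bound check keeps the touch inside the operand.  */
limb_t
sketch_sum_touch_ahead (const limb_t *a, size_t n)
{
  limb_t sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Once per cache line, load a limb from a line further ahead.  The
         volatile access stops the compiler discarding the unused load.  */
      if (i % LIMBS_PER_LINE == 0 && i + AHEAD < n)
        (void) *(const volatile limb_t *) &a[i + AHEAD];
      sum += a[i];
    }
  return sum;
}
```

In an unrolled assembly loop the touch would normally be issued once per
unrolled iteration rather than tested for on every limb.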

Some CPUs or systems have hardware that detects sequential memory accesses and
initiates suitable cache movements automatically, making life easy.


@node Assembly Functional Units, Assembly Floating Point, Assembly Cache Handling, Assembly Coding
@subsection Functional Units

When choosing an approach for an assembly loop, consideration is given to
what operations can execute simultaneously and what throughput can thereby be
achieved. In some cases an algorithm can be tweaked to accommodate available
resources.

Loop control will generally require a counter and pointer updates, costing as
much as 5 instructions, plus any delays a branch introduces. CPU addressing
modes might reduce pointer updates, perhaps by allowing just one updating
pointer and others expressed as offsets from it, or on CISC chips with all
addressing done with the loop counter as a scaled index.

The final loop control cost can be amortised by processing several limbs in
each iteration (@pxref{Assembly Loop Unrolling}). This at least ensures loop
control isn't a big fraction of the work done.

Memory throughput is always a limit. If perhaps only one load or one store
can be done per cycle then 3 cycles/limb will be the top speed for ``binary''
operations like @code{mpn_add_n}, and any code achieving that is optimal.

Integer resources can be freed up by having the loop counter in a float
register, or by pressing the float units into use for some multiplying,
perhaps doing every second limb on the float side (@pxref{Assembly Floating
Point}).

Float resources can be freed up by doing carry propagation on the integer
side, or even by doing integer to float conversions in integers using bit
twiddling.


@node Assembly Floating Point, Assembly SIMD Instructions, Assembly Functional Units, Assembly Coding
@subsection Floating Point
@cindex Assembly floating point

Floating point arithmetic is used in GMP for multiplications on CPUs with poor
integer multipliers. It's mostly useful for @code{mpn_mul_1},
@code{mpn_addmul_1} and @code{mpn_submul_1} on 64-bit machines, and
@code{mpn_mul_basecase} on both 32-bit and 64-bit machines.

With IEEE 53-bit double precision floats, integer multiplications producing up
to 53 bits will give exact results. Breaking a 64@cross{}64 multiplication
into eight 16@cross{}@math{32@rightarrow{}48} bit pieces is convenient. With
some care though six 21@cross{}@math{32@rightarrow{}53} bit products can be
used, if one of the lower two 21-bit pieces also uses the sign bit.

For the @code{mpn_mul_1} family of functions on a 64-bit machine, the
invariant single limb is split at the start, into 3 or 4 pieces. Inside the
loop, the bignum operand is split into 32-bit pieces. Fast conversion of
these unsigned 32-bit pieces to floating point is highly machine-dependent.
In some cases, reading the data into the integer unit, zero-extending to
64 bits, then transferring to the floating point unit via memory is the
only option.

Converting partial products back to 64-bit limbs is usually best done as a
signed conversion. Since all values are smaller than @m{2^{53},2^53}, signed
and unsigned are the same, but most processors lack unsigned conversions.

@sp 2

Here is a diagram showing 16@cross{}32 bit products for an @code{mpn_mul_1} or
@code{mpn_addmul_1} with a 64-bit limb. The single limb operand V is split
into four 16-bit parts. The multi-limb operand U is split in the loop into
two 32-bit parts.

@tex
\global\newdimen\GMPbits      \global\GMPbits=0.18em
\def\GMPbox#1#2#3{%
  \hbox{%
    \hbox to 128\GMPbits{\hfil
      \vbox{%
        \hrule
        \hbox to 48\GMPbits {\GMPvrule \hfil$#2$\hfil \vrule}%
        \hrule}%
      \hskip #1\GMPbits}%
    \raise \GMPboxdepth \hbox{\hskip 2em #3}}}
%
\GMPdisplay{%
  \vbox{%
    \hbox{%
      \hbox to 128\GMPbits {\hfil
        \vbox{%
          \hrule
          \hbox to 64\GMPbits{%
            \GMPvrule \hfil$v48$\hfil
            \vrule    \hfil$v32$\hfil
            \vrule    \hfil$v16$\hfil
            \vrule    \hfil$v00$\hfil
            \vrule}
          \hrule}}%
       \raise \GMPboxdepth \hbox{\hskip 2em V Operand}}
    \vskip 0.5ex
    \hbox{%
      \hbox to 128\GMPbits {\hfil
        \raise \GMPboxdepth \hbox{$\times$\hskip 1.5em}%
        \vbox{%
          \hrule
          \hbox to 64\GMPbits {%
            \GMPvrule \hfil$u32$\hfil
            \vrule \hfil$u00$\hfil
            \vrule}%
          \hrule}}%
       \raise \GMPboxdepth \hbox{\hskip 2em U Operand (one limb)}}%
    \vskip 0.5ex
    \hbox{\vbox to 2ex{\hrule width 128\GMPbits}}%
    \GMPbox{0}{u00 \times v00}{$p00$\hskip 1.5em 48-bit products}%
    \vskip 0.5ex
    \GMPbox{16}{u00 \times v16}{$p16$}
    \vskip 0.5ex
    \GMPbox{32}{u00 \times v32}{$p32$}
    \vskip 0.5ex
    \GMPbox{48}{u00 \times v48}{$p48$}
    \vskip 0.5ex
    \GMPbox{32}{u32 \times v00}{$r32$}
    \vskip 0.5ex
    \GMPbox{48}{u32 \times v16}{$r48$}
    \vskip 0.5ex
    \GMPbox{64}{u32 \times v32}{$r64$}
    \vskip 0.5ex
    \GMPbox{80}{u32 \times v48}{$r80$}
}}
@end tex
@ifnottex
@example
@group
                +---+---+---+---+
                |v48|v32|v16|v00|    V operand
                +---+---+---+---+

                +-------+---+---+
             x  |  u32  |  u00  |    U operand (one limb)
                +---------------+

---------------------------------

                    +-----------+
                    | u00 x v00 |    p00    48-bit products
                    +-----------+
                +-----------+
                | u00 x v16 |        p16
                +-----------+
            +-----------+
            | u00 x v32 |            p32
            +-----------+
        +-----------+
        | u00 x v48 |                p48
        +-----------+
            +-----------+
            | u32 x v00 |            r32
            +-----------+
        +-----------+
        | u32 x v16 |                r48
        +-----------+
    +-----------+
    | u32 x v32 |                    r64
    +-----------+
+-----------+
| u32 x v48 |                        r80
+-----------+
@end group
@end example
@end ifnottex

@math{p32} and @math{r32} can be summed using floating-point addition, and
likewise @math{p48} and @math{r48}. @math{p00} and @math{p16} can be summed
with @math{r64} and @math{r80} from the previous iteration.

For each loop then, four 49-bit quantities are transferred to the integer unit,
aligned as follows,

@tex
% GMPbox here should be 49 bits wide, but use 51 to better show p16+r80'
% crossing into the upper 64 bits.
\def\GMPbox#1#2#3{%
  \hbox{%
    \hbox to 128\GMPbits {%
      \hfil
      \vbox{%
        \hrule
        \hbox to 51\GMPbits {\GMPvrule \hfil$#2$\hfil \vrule}%
        \hrule}%
      \hskip #1\GMPbits}%
    \raise \GMPboxdepth \hbox{\hskip 1.5em $#3$\hfil}%
}}
\newbox\b \setbox\b\hbox{64 bits}%
\newdimen\bw \bw=\wd\b \advance\bw by 2em
\newdimen\x \x=128\GMPbits
\advance\x by -2\bw
\divide\x by4
\GMPdisplay{%
  \vbox{%
    \hbox to 128\GMPbits {%
      \GMPvrule
      \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
      \hfil 64 bits\hfil
      \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
      \vrule
      \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
      \hfil 64 bits\hfil
      \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
      \vrule}%
    \vskip 0.7ex
    \GMPbox{0}{p00+r64'}{i00}
    \vskip 0.5ex
    \GMPbox{16}{p16+r80'}{i16}
    \vskip 0.5ex
    \GMPbox{32}{p32+r32}{i32}
    \vskip 0.5ex
    \GMPbox{48}{p48+r48}{i48}
}}
@end tex
@ifnottex
@example
@group
|-----64bits----|-----64bits----|
                   +------------+
                   | p00 + r64' |    i00
                   +------------+
               +------------+
               | p16 + r80' |        i16
               +------------+
           +------------+
           | p32 + r32  |            i32
           +------------+
       +------------+
       | p48 + r48  |                i48
       +------------+
@end group
@end example
@end ifnottex

The challenge then is to sum these efficiently and add in a carry limb,
generating a low 64-bit result limb and a high 33-bit carry limb (@math{i48}
extends 33 bits into the high half).


@node Assembly SIMD Instructions, Assembly Software Pipelining, Assembly Floating Point, Assembly Coding
@subsection SIMD Instructions
@cindex Assembly SIMD

The single-instruction multiple-data support in current microprocessors is
aimed at signal processing algorithms where each data point can be treated
more or less independently. There's generally not much support for
propagating the sort of carries that arise in GMP.

SIMD multiplications of say four 16@cross{}16 bit multiplies only do as much
work as one 32@cross{}32 from GMP's point of view, and need some shifts and
adds besides. But of course if say the SIMD form is fully pipelined and uses
less instruction decoding then it may still be worthwhile.

On the x86 chips, MMX has so far found a use in @code{mpn_rshift} and
@code{mpn_lshift}, and is used in a special case for 16-bit multipliers in the
P55 @code{mpn_mul_1}. SSE2 is used for Pentium 4 @code{mpn_mul_1},
@code{mpn_addmul_1}, and @code{mpn_submul_1}.


@node Assembly Software Pipelining, Assembly Loop Unrolling, Assembly SIMD Instructions, Assembly Coding
@subsection Software Pipelining
@cindex Assembly software pipelining

Software pipelining consists of scheduling instructions around the branch
point in a loop. For example a loop might issue a load not for use in the
present iteration but the next, thereby allowing extra cycles for the data to
arrive from memory.
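
The load-for-the-next-iteration idea can be sketched in C as below. This is
an illustration only: @code{limb_t} and @code{sketch_com_pipelined} are
hypothetical names, a ones complement is used simply because it has no carries
to distract from the scheduling, and a C compiler is free to reschedule the
statements, so the real effect is only obtained in assembly.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t limb_t;   /* hypothetical limb type for this sketch */

/* Store the ones complement of {a,n} into {r,n}, for n >= 1.  The load for
   iteration i+1 is issued during iteration i, so the value has a whole
   iteration in which to arrive before it is used.  */
void
sketch_com_pipelined (limb_t *r, const limb_t *a, size_t n)
{
  limb_t cur = a[0];            /* feed-in: first load, before the loop */
  for (size_t i = 0; i + 1 < n; i++)
    {
      limb_t next = a[i + 1];   /* load for the next iteration */
      r[i] = ~cur;              /* work on the value loaded earlier */
      cur = next;
    }
  r[n - 1] = ~cur;              /* wind-down: last value, no further load */
}
```

Note that the loop never loads beyond @code{a[n-1]}; the final limb is handled
after the loop precisely so that no load is issued for a non-existent next
iteration.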

Naturally this is wanted only when doing things like loads or multiplies that
take several cycles to complete, and only where a CPU has multiple functional
units so that other work can be done in the meantime.

A pipeline with several stages will have a data value in progress at each
stage and each loop iteration moves them along one stage. This is like
juggling.

If the latency of some instruction is greater than the loop time then it will
be necessary to unroll, so one register has a result ready to use while
another (or multiple others) are still in progress (@pxref{Assembly Loop
Unrolling}).


@node Assembly Loop Unrolling, Assembly Writing Guide, Assembly Software Pipelining, Assembly Coding
@subsection Loop Unrolling
@cindex Assembly loop unrolling

Loop unrolling consists of replicating code so that several limbs are
processed in each loop. At a minimum this reduces loop overheads by a
corresponding factor, but it can also allow better register usage, for example
alternately using one register combination and then another. Judicious use of
@command{m4} macros can help avoid lots of duplication in the source code.

Any amount of unrolling can be handled with a loop counter that's decremented
by @math{N} each time, stopping when the remaining count is less than the
further @math{N} the loop will process. Or by subtracting @math{N} at the
start, the termination condition becomes when the counter @math{C} is less
than 0 (and the count of remaining limbs is @math{C+N}).

Alternatively for a power of 2 unroll the loop count and remainder can be
established with a shift and mask. This is convenient if also making a
computed jump into the middle of a large loop.
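
The two counter schemes just described can be sketched in C as follows. The
names @code{limb_t}, @code{sketch_com_unrolled}, @code{sketch_loop_count} and
@code{sketch_remainder} are hypothetical, a 4-limb unroll is assumed, and the
leftover limbs are handled with a simple loop, which is only one of several
possible choices.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t limb_t;   /* hypothetical limb type for this sketch */

#define UNROLL      4      /* N in the text; a power of 2 here */
#define UNROLL_LOG2 2

/* Ones complement of {a,n} into {r,n}, n >= 0, with a 4-limb unrolled loop. */
void
sketch_com_unrolled (limb_t *r, const limb_t *a, long n)
{
  long c = n - UNROLL;     /* subtract N at the start */
  while (c >= 0)           /* terminate when the counter goes below 0 */
    {
      r[0] = ~a[0];
      r[1] = ~a[1];
      r[2] = ~a[2];
      r[3] = ~a[3];
      r += UNROLL;
      a += UNROLL;
      c -= UNROLL;
    }
  c += UNROLL;             /* remaining limbs are C+N, between 0 and N-1 */
  while (c-- > 0)          /* simple loop for the excess */
    *r++ = ~*a++;
}

/* For a power of 2 unroll and n >= 0, the same split by shift and mask.  */
long
sketch_loop_count (long n)
{
  return n >> UNROLL_LOG2;
}

long
sketch_remainder (long n)
{
  return n & (UNROLL - 1);
}
```

For example, with 7 limbs the unrolled loop runs once and 3 limbs are left
over for the trailing loop.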

The limbs not a multiple of the unrolling can be handled in various ways, for
example

@itemize @bullet
@item
A simple loop at the end (or the start) to process the excess. Care will be
wanted that it isn't too much slower than the unrolled part.

@item
A set of binary tests, for example after an 8-limb unrolling, test for 4 more
limbs to process, then a further 2 more or not, and finally 1 more or not.
This will probably take more code space than a simple loop.

@item
A @code{switch} statement, providing separate code for each possible excess,
for example an 8-limb unrolling would have separate code for 0 remaining, 1
remaining, etc, up to 7 remaining. This might take a lot of code, but may be
the best way to optimize all cases in combination with a deep pipelined loop.

@item
A computed jump into the middle of the loop, thus making the first iteration
handle the excess. This should make times smoothly increase with size, which
is attractive, but setups for the jump and adjustments for pointers can be
tricky and could become quite difficult in combination with deep pipelining.
@end itemize


@node Assembly Writing Guide, , Assembly Loop Unrolling, Assembly Coding
@subsection Writing Guide
@cindex Assembly writing guide

This is a guide to writing software pipelined loops for processing limb
vectors in assembly.

First determine the algorithm and which instructions are needed. Code it
without unrolling or scheduling, to make sure it works. On a 3-operand CPU
try to write each new value to a new register; this will greatly simplify later
steps.

Then note for each instruction the functional unit and/or issue port
requirements. If an instruction can use either of two units, like U0 or U1
then make a category ``U0/U1''.
Count the total using each unit (or combined
unit), and count all instructions.

Figure out from those counts the best possible loop time. The goal will be to
find a perfect schedule where instruction latencies are completely hidden.
The total instruction count might be the limiting factor, or perhaps a
particular functional unit. It might be possible to tweak the instructions to
help the limiting factor.

Suppose the loop time is @math{N}, then make @math{N} issue buckets, with the
final loop branch at the end of the last. Now fill the buckets with dummy
instructions using the functional units desired. Run this to make sure the
intended speed is reached.

Now replace the dummy instructions with the real instructions from the slow
but correct loop you started with. The first will typically be a load
instruction. Then the instruction using that value is placed in a bucket an
appropriate distance down. Run the loop again, to check it still runs at
target speed.

Keep placing instructions, frequently measuring the loop. After a few you
will need to wrap around from the last bucket back to the top of the loop. If
you used the new-register for new-value strategy above then there will be no
register conflicts. If not then take care not to clobber something already in
use. Changing registers at this time is very error prone.

The loop will overlap two or more of the original loop iterations, and the
computation of one vector element result will be started in one iteration of
the new loop, and completed one or several iterations later.

The final step is to create feed-in and wind-down code for the loop.
A good
way to do this is to make a copy (or copies) of the loop at the start and
delete those instructions which don't have valid antecedents, and at the end
replicate and delete those whose results are unwanted (including any further
loads).

The loop will have a minimum number of limbs loaded and processed, so the
feed-in code must test if the request size is smaller and skip either to a
suitable part of the wind-down or to special code for small sizes.


@node Internals, Contributors, Algorithms, Top
@chapter Internals
@cindex Internals

@strong{This chapter is provided only for informational purposes and the
various internals described here may change in future GMP releases.
Applications expecting to be compatible with future releases should use only
the documented interfaces described in previous chapters.}

@menu
* Integer Internals::
* Rational Internals::
* Float Internals::
* Raw Output Internals::
* C++ Interface Internals::
@end menu

@node Integer Internals, Rational Internals, Internals, Internals
@section Integer Internals
@cindex Integer internals

@code{mpz_t} variables represent integers using sign and magnitude, in space
dynamically allocated and reallocated. The fields are as follows.

@table @asis
@item @code{_mp_size}
The number of limbs, or the negative of that when representing a negative
integer. Zero is represented by @code{_mp_size} set to zero, in which case
the @code{_mp_d} data is unused.

@item @code{_mp_d}
A pointer to an array of limbs which is the magnitude. These are stored
``little endian'' as per the @code{mpn} functions, so @code{_mp_d[0]} is the
least significant limb and @code{_mp_d[ABS(_mp_size)-1]} is the most
significant. Whenever @code{_mp_size} is non-zero, the most significant limb
is non-zero.

Currently there's always at least one limb allocated, so for instance
@code{mpz_set_ui} never needs to reallocate, and @code{mpz_get_ui} can fetch
@code{_mp_d[0]} unconditionally (though its value is then only wanted if
@code{_mp_size} is non-zero).

@item @code{_mp_alloc}
@code{_mp_alloc} is the number of limbs currently allocated at @code{_mp_d},
and naturally @code{_mp_alloc >= ABS(_mp_size)}. When an @code{mpz} routine
is about to (or might be about to) increase @code{_mp_size}, it checks
@code{_mp_alloc} to see whether there's enough space, and reallocates if not.
@code{MPZ_REALLOC} is generally used for this.
@end table

The various bitwise logical functions like @code{mpz_and} behave as if
negative values were twos complement. But sign and magnitude is always used
internally, and necessary adjustments are made during the calculations.
Sometimes this isn't pretty, but sign and magnitude are best for other
routines.

Some internal temporary variables are set up with @code{MPZ_TMP_INIT} and these
have @code{_mp_d} space obtained from @code{TMP_ALLOC} rather than the memory
allocation functions. Care is taken to ensure that these are big enough that
no reallocation is necessary (since it would have unpredictable consequences).

@code{_mp_size} and @code{_mp_alloc} are @code{int}, although @code{mp_size_t}
is usually a @code{long}. This is done to make the fields just 32 bits on
some 64-bit systems, thereby saving a few bytes of data space but still
providing plenty of range.


@node Rational Internals, Float Internals, Integer Internals, Internals
@section Rational Internals
@cindex Rational internals

@code{mpq_t} variables represent rationals using an @code{mpz_t} numerator and
denominator (@pxref{Integer Internals}).

The canonical form adopted is denominator positive (and non-zero), no common
factors between numerator and denominator, and zero uniquely represented as
0/1.

It's believed that casting out common factors at each stage of a calculation
is best in general. A GCD is an @math{O(N^2)} operation so it's better to do
a few small ones immediately than to delay and have to do a big one later.
Knowing the numerator and denominator have no common factors can be used for
example in @code{mpq_mul} to make only two cross GCDs necessary, not four.

This general approach to common factors is badly sub-optimal in the presence
of simple factorizations or little prospect for cancellation, but GMP has no
way to know when this will occur. As per @ref{Efficiency}, that's left to
applications. The @code{mpq_t} framework might still suit, with
@code{mpq_numref} and @code{mpq_denref} for direct access to the numerator and
denominator, or of course @code{mpz_t} variables can be used directly.


@node Float Internals, Raw Output Internals, Rational Internals, Internals
@section Float Internals
@cindex Float internals

Efficient calculation is the primary aim of GMP floats and the use of whole
limbs and simple rounding facilitates this.

@code{mpf_t} floats have a variable precision mantissa and a single machine
word signed exponent. The mantissa is represented using sign and magnitude.

@c FIXME: The arrow heads don't join to the lines exactly.
@tex
\global\newdimen\GMPboxwidth \GMPboxwidth=5em
\global\newdimen\GMPboxheight \GMPboxheight=3ex
\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
\GMPdisplay{%
\vbox{%
  \hbox to 5\GMPboxwidth {most significant limb \hfil least significant limb}
  \vskip 0.7ex
  \def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
  \hbox {
    \hbox to 3\GMPboxwidth {%
      \setbox 0 = \hbox{@code{\_mp\_exp}}%
      \dimen0=3\GMPboxwidth
      \advance\dimen0 by -\wd0
      \divide\dimen0 by 2
      \advance\dimen0 by -1em
      \setbox1 = \hbox{$\rightarrow$}%
      \dimen1=\dimen0
      \advance\dimen1 by -\wd1
      \GMPcentreline{\dimen0}%
      \hfil
      \box0%
      \hfil
      \GMPcentreline{\dimen1{}}%
      \box1}
    \hbox to 2\GMPboxwidth {\hfil @code{\_mp\_d}}}
  \vskip 0.5ex
  \vbox {%
    \hrule
    \hbox{%
      \vrule height 2ex depth 1ex
      \hbox to \GMPboxwidth {}%
      \vrule
      \hbox to \GMPboxwidth {}%
      \vrule
      \hbox to \GMPboxwidth {}%
      \vrule
      \hbox to \GMPboxwidth {}%
      \vrule
      \hbox to \GMPboxwidth {}%
      \vrule}
    \hrule
  }
  \hbox {%
    \hbox to 0.8 pt {}
    \hbox to 3\GMPboxwidth {%
      \hfil $\cdot$} \hbox {$\leftarrow$ radix point\hfil}}
  \hbox to 5\GMPboxwidth{%
    \setbox 0 = \hbox{@code{\_mp\_size}}%
    \dimen0 = 5\GMPboxwidth
    \advance\dimen0 by -\wd0
    \divide\dimen0 by 2
    \advance\dimen0 by -1em
    \dimen1 = \dimen0
    \setbox1 = \hbox{$\leftarrow$}%
    \setbox2 = \hbox{$\rightarrow$}%
    \advance\dimen0 by -\wd1
    \advance\dimen1 by -\wd2
    \hbox to 0.3 em {}%
    \box1
    \GMPcentreline{\dimen0}%
    \hfil
    \box0
    \hfil
    \GMPcentreline{\dimen1}%
    \box2}
}}
@end tex
@ifnottex
@example
   most                   least
significant            significant
   limb                   limb

                            _mp_d
 |---- _mp_exp --->           |
  _____ _____ _____ _____ _____
 |_____|_____|_____|_____|_____|
                   . <------------ radix point

  <-------- _mp_size --------->
@sp 1
@end example
@end ifnottex

@noindent
The fields are as follows.

@table @asis
@item @code{_mp_size}
The number of limbs currently in use, or the negative of that when
representing a negative value. Zero is represented by @code{_mp_size} and
@code{_mp_exp} both set to zero, and in that case the @code{_mp_d} data is
unused. (In the future @code{_mp_exp} might be undefined when representing
zero.)

@item @code{_mp_prec}
The precision of the mantissa, in limbs. In any calculation the aim is to
produce @code{_mp_prec} limbs of result (the most significant being non-zero).

@item @code{_mp_d}
A pointer to the array of limbs which is the absolute value of the mantissa.
These are stored ``little endian'' as per the @code{mpn} functions, so
@code{_mp_d[0]} is the least significant limb and
@code{_mp_d[ABS(_mp_size)-1]} the most significant.

The most significant limb is always non-zero, but there are no other
restrictions on its value, in particular the highest 1 bit can be anywhere
within the limb.

@code{_mp_prec+1} limbs are allocated to @code{_mp_d}, the extra limb being
for convenience (see below). There are no reallocations during a calculation,
only in a change of precision with @code{mpf_set_prec}.

@item @code{_mp_exp}
The exponent, in limbs, determining the location of the implied radix point.
Zero means the radix point is just above the most significant limb. Positive
values mean a radix point offset towards the lower limbs and hence a value
@math{@ge{} 1}, as for example in the diagram above. Negative exponents mean
a radix point further above the highest limb.

Naturally the exponent can be any value, it doesn't have to fall within the
limbs as the diagram shows, it can be a long way above or a long way below.
Limbs other than those included in the @code{@{_mp_d,_mp_size@}} data
are treated as zero.
@end table

The @code{_mp_size} and @code{_mp_prec} fields are @code{int}, although the
@code{mp_size_t} type is usually a @code{long}. The @code{_mp_exp} field is
usually @code{long}. This is done to make some fields just 32 bits on some
64-bit systems, thereby saving a few bytes of data space but still providing
plenty of precision and a very large range.


@sp 1
@noindent
The following various points should be noted.

@table @asis
@item Low Zeros
The least significant limbs @code{_mp_d[0]} etc can be zero, though such low
zeros can always be ignored. Routines likely to produce low zeros check and
avoid them to save time in subsequent calculations, but for most routines
they're quite unlikely and aren't checked.

@item Mantissa Size Range
The @code{_mp_size} count of limbs in use can be less than @code{_mp_prec} if
the value can be represented in less. This means low precision values or
small integers stored in a high precision @code{mpf_t} can still be operated
on efficiently.

@code{_mp_size} can also be greater than @code{_mp_prec}. Firstly a value is
allowed to use all of the @code{_mp_prec+1} limbs available at @code{_mp_d},
and secondly when @code{mpf_set_prec_raw} lowers @code{_mp_prec} it leaves
@code{_mp_size} unchanged and so the size can be arbitrarily bigger than
@code{_mp_prec}.

@item Rounding
All rounding is done on limb boundaries. Calculating @code{_mp_prec} limbs
with the high limb non-zero will ensure the application requested minimum
precision is obtained.

The use of simple ``trunc'' rounding towards zero is efficient, since there's
no need to examine extra limbs and increment or decrement.

@item Bit Shifts
Since the exponent is in limbs, there are no bit shifts in basic operations
like @code{mpf_add} and @code{mpf_mul}. When differing exponents are
encountered all that's needed is to adjust pointers to line up the relevant
limbs.

Of course @code{mpf_mul_2exp} and @code{mpf_div_2exp} will require bit shifts,
but the choice is between an exponent in limbs which requires shifts there, or
one in bits which requires them almost everywhere else.

@item Use of @code{_mp_prec+1} Limbs
The extra limb on @code{_mp_d} (@code{_mp_prec+1} rather than just
@code{_mp_prec}) helps when an @code{mpf} routine might get a carry from its
operation. @code{mpf_add} for instance will do an @code{mpn_add} of
@code{_mp_prec} limbs. If there's no carry then that's the result, but if
there is a carry then it's stored in the extra limb of space and
@code{_mp_size} becomes @code{_mp_prec+1}.

Whenever @code{_mp_prec+1} limbs are held in a variable, the low limb is not
needed for the intended precision, only the @code{_mp_prec} high limbs. But
zeroing it out or moving the rest down is unnecessary. Subsequent routines
reading the value will simply take the high limbs they need, and this will be
@code{_mp_prec} if their target has that same precision. This is no more than
a pointer adjustment, and must be checked anyway since the destination
precision can be different from the sources.

Copy functions like @code{mpf_set} will retain a full @code{_mp_prec+1} limbs
if available. This ensures that a variable which has @code{_mp_size} equal to
@code{_mp_prec+1} will get its full exact value copied.
Strictly speaking
this is unnecessary since only @code{_mp_prec} limbs are needed for the
application's requested precision, but it's considered that an @code{mpf_set}
from one variable into another of the same precision ought to produce an exact
copy.

@item Application Precisions
@code{__GMPF_BITS_TO_PREC} converts an application requested precision to an
@code{_mp_prec}. The value in bits is rounded up to a whole limb then an
extra limb is added since the most significant limb of @code{_mp_d} is only
guaranteed to be non-zero and therefore might contain only one bit.

@code{__GMPF_PREC_TO_BITS} does the reverse conversion, and removes the extra
limb from @code{_mp_prec} before converting to bits. The net effect of
reading back with @code{mpf_get_prec} is simply the precision rounded up to a
multiple of @code{mp_bits_per_limb}.

Note that the extra limb added here for the high limb only being non-zero is
in addition to the extra limb allocated to @code{_mp_d}. For example with a
32-bit limb, an application request for 250 bits will be rounded up to 8
limbs, then an extra limb added for the high limb being only non-zero, giving
an @code{_mp_prec} of 9. @code{_mp_d} then gets 10 limbs allocated. Reading
back with @code{mpf_get_prec} will take @code{_mp_prec}, subtract 1 limb, and
multiply by 32, giving 256 bits.

Strictly speaking, the fact the high limb has at least one bit means that a
float with, say, 3 limbs of 32 bits each will be holding at least 65 bits, but
for the purposes of @code{mpf_t} it's considered simply to be 64 bits, a nice
multiple of the limb size.
@end table


@node Raw Output Internals, C++ Interface Internals, Float Internals, Internals
@section Raw Output Internals
@cindex Raw output internals

@noindent
@code{mpz_out_raw} uses the following format.

@tex
\global\newdimen\GMPboxwidth \GMPboxwidth=5em
\global\newdimen\GMPboxheight \GMPboxheight=3ex
\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
\GMPdisplay{%
\vbox{%
  \def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
  \vbox {%
    \hrule
    \hbox{%
      \vrule height 2.5ex depth 1.5ex
      \hbox to \GMPboxwidth {\hfil size\hfil}%
      \vrule
      \hbox to 3\GMPboxwidth {\hfil data bytes\hfil}%
      \vrule}
    \hrule}
}}
@end tex
@ifnottex
@example
+------+------------------------+
| size |       data bytes       |
+------+------------------------+
@end example
@end ifnottex

The size is 4 bytes written most significant byte first, being the number of
subsequent data bytes, or the two's complement negative of that when a negative
integer is represented. The data bytes are the absolute value of the integer,
written most significant byte first.

The most significant data byte is always non-zero, so the output is the same
on all systems, irrespective of limb size.

In GMP 1, leading zero bytes were written to pad the data bytes to a multiple
of the limb size. @code{mpz_inp_raw} will still accept this, for
compatibility.

The use of ``big endian'' for both the size and data fields is deliberate: it
makes the data easy to read in a hex dump of a file. Unfortunately it also
means that the limb data must be reversed when reading or writing, so neither
a big endian nor little endian system can just read and write @code{_mp_d}.


@node C++ Interface Internals, , Raw Output Internals, Internals
@section C++ Interface Internals
@cindex C++ interface internals

A system of expression templates is used to ensure something like @code{a=b+c}
turns into a simple call to @code{mpz_add} etc.
For @code{mpf_class}
the scheme also ensures the precision of the final
destination is used for any temporaries within a statement like
@code{f=w*x+y*z}. These are important features which a naive implementation
cannot provide.

A simplified description of the scheme follows. The true scheme is
complicated by the fact that expressions have different return types. For
detailed information, refer to the source code.

To perform an operation, say, addition, we first define a ``function object''
evaluating it,

@example
struct __gmp_binary_plus
@{
  static void eval(mpf_t f, const mpf_t g, const mpf_t h)
  @{
    mpf_add(f, g, h);
  @}
@};
@end example

@noindent
And an ``additive expression'' object,

@example
__gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
operator+(const mpf_class &f, const mpf_class &g)
@{
  return __gmp_expr
    <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
@}
@end example

The seemingly redundant @code{__gmp_expr<__gmp_binary_expr<@dots{}>>} is used to
encapsulate any possible kind of expression into a single template type. In
fact even @code{mpf_class} etc are @code{typedef} specializations of
@code{__gmp_expr}.

Next we define assignment of @code{__gmp_expr} to @code{mpf_class}.

@example
template <class T>
mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
@{
  expr.eval(this->get_mpf_t(), this->precision());
  return *this;
@}

template <class Op>
void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
(mpf_t f, mp_bitcnt_t precision)
@{
  Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
@}
@end example

where @code{expr.val1} and @code{expr.val2} are references to the expression's
operands (here @code{expr} is the @code{__gmp_binary_expr} stored within the
@code{__gmp_expr}).

This way, the expression is actually evaluated only at the time of assignment,
when the required precision (that of @code{f}) is known. Furthermore the
target @code{mpf_t} is now available, thus we can call @code{mpf_add} directly
with @code{f} as the output argument.

Compound expressions are handled by defining operators taking subexpressions
as their arguments, like this:

@example
template <class T, class U>
__gmp_expr
<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
@{
  return __gmp_expr
    <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
    (expr1, expr2);
@}
@end example

And the corresponding specializations of @code{__gmp_expr::eval}:

@example
template <class T, class U, class Op>
void __gmp_expr
<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
(mpf_t f, mp_bitcnt_t precision)
@{
  // declare two temporaries
  mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
  Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
@}
@end example

The expression is thus recursively evaluated to any level of complexity and
all
subexpressions are evaluated to the precision of @code{f}.


@node Contributors, References, Internals, Top
@comment  node-name,  next,  previous,  up
@appendix Contributors
@cindex Contributors

Torbj@"orn Granlund wrote the original GMP library and is still the main
developer. Code not explicitly attributed to others was contributed by
Torbj@"orn. Several other individuals and organizations have contributed to
GMP. Here is a list in chronological order of first contribution:

Gunnar Sj@"odin and Hans Riesel helped with mathematical problems in early
versions of the library.

Richard Stallman helped with the interface design and revised the first
version of this manual.

Brian Beuning and Doug Lea helped with testing of early versions of the
library and made creative suggestions.

John Amanatides of York University in Canada contributed the function
@code{mpz_probab_prime_p}.

Paul Zimmermann wrote the REDC-based mpz_powm code, the Sch@"onhage-Strassen
FFT multiply code, and the Karatsuba square root code. He also improved the
Toom3 code for GMP 4.2. Paul sparked the development of GMP 2, with his
comparisons between bignum packages. The ECMNET project Paul is organizing
was a driving force behind many of the optimizations in GMP 3. Paul also
wrote the new GMP 4.3 nth root code (with Torbj@"orn).

Ken Weber (Kent State University, Universidade Federal do Rio Grande do Sul)
contributed now defunct versions of @code{mpz_gcd}, @code{mpz_divexact},
@code{mpn_gcd}, and @code{mpn_bdivmod}, partially supported by CNPq (Brazil)
grant 301314194-2.

Per Bothner of Cygnus Support helped to set up GMP to use Cygnus' configure.
He has also made valuable suggestions and tested numerous intermediary
releases.

Joachim Hollman was involved in the design of the @code{mpf} interface, and in
the @code{mpz} design revisions for version 2.

Bennet Yee contributed the initial versions of @code{mpz_jacobi} and
@code{mpz_legendre}.

Andreas Schwab contributed the files @file{mpn/m68k/lshift.S} and
@file{mpn/m68k/rshift.S} (now in @file{.asm} form).

Robert Harley of Inria, France and David Seal of ARM, England, suggested clever
improvements for population count. Robert also wrote highly optimized
Karatsuba and 3-way Toom multiplication functions for GMP 3, and contributed
the ARM assembly code.

Torsten Ekedahl of the Mathematical department of Stockholm University provided
significant inspiration during several phases of the GMP development. His
mathematical expertise helped improve several algorithms.

Linus Nordberg wrote the new configure system based on autoconf and
implemented the new random functions.

Kevin Ryde worked on a large number of things: optimized x86 code, m4 asm
macros, parameter tuning, speed measuring, the configure system, function
inlining, divisibility tests, bit scanning, Jacobi symbols, Fibonacci and Lucas
number functions, printf and scanf functions, perl interface, demo expression
parser, the algorithms chapter in the manual, @file{gmpasm-mode.el}, and
various miscellaneous improvements elsewhere.

Kent Boortz made the Mac OS 9 port.

Steve Root helped write the optimized alpha 21264 assembly code.

Gerardo Ballabio wrote the @file{gmpxx.h} C++ class interface and the C++
@code{istream} input routines.

Jason Moxham rewrote @code{mpz_fac_ui}.

Pedro Gimeno implemented the Mersenne Twister and made other random number
improvements.

Niels M@"oller wrote the sub-quadratic GCD, extended GCD and jacobi code, the
quadratic Hensel division code, and (with Torbj@"orn) the new divide and
conquer division code for GMP 4.3. Niels also helped implement the new Toom
multiply code for GMP 4.3 and implemented helper functions to simplify Toom
evaluations for GMP 5.0. He wrote the original version of mpn_mulmod_bnm1, and
he is the main author of the mini-gmp package used for gmp bootstrapping.

Alberto Zanoni and Marco Bodrato suggested the unbalanced multiply strategy,
and found the optimal strategies for evaluation and interpolation in Toom
multiplication.

Marco Bodrato helped implement the new Toom multiply code for GMP 4.3 and
implemented most of the new Toom multiply and squaring code for 5.0.
He is the main author of the current mpn_mulmod_bnm1 and mpn_mullo_n. Marco
also wrote the functions mpn_invert and mpn_invertappr. He is the author of
the current combinatorial functions: binomial, factorial, multifactorial,
primorial.

David Harvey suggested the internal function @code{mpn_bdiv_dbm1}, implementing
division relevant to Toom multiplication. He also worked on fast assembly
sequences, in particular on a fast AMD64 @code{mpn_mul_basecase}. He wrote
the internal middle product functions @code{mpn_mulmid_basecase},
@code{mpn_toom42_mulmid}, @code{mpn_mulmid_n} and related helper routines.

Martin Boij wrote @code{mpn_perfect_power_p}.

Marc Glisse improved @file{gmpxx.h}: use fewer temporaries (faster),
specializations of @code{numeric_limits} and @code{common_type}, C++11
features (move constructors, explicit bool conversion, UDL), make the
conversion from @code{mpq_class} to @code{mpz_class} explicit, optimize
operations where one argument is a small compile-time constant, replace
some heap allocations by stack allocations.
He also fixed the eofbit
handling of C++ streams, and removed one division from @file{mpq/aors.c}.

David S Miller wrote assembly code for SPARC T3 and T4.

Mark Sofroniou cleaned up the types of mul_fft.c, letting it work for huge
operands.

Ulrich Weigand ported GMP to the powerpc64le ABI.

(This list is chronological, not ordered by significance. If you have
contributed to GMP but are not listed above, please tell
@email{gmp-devel@@gmplib.org} about the omission!)

The development of floating point functions of GNU MP 2 was supported in part
by the ESPRIT-BRA (Basic Research Activities) 6846 project POSSO (POlynomial
System SOlving).

The development of GMP 2, 3, and 4.0 was supported in part by the IDA Center
for Computing Sciences.

The development of GMP 4.3, 5.0, and 5.1 was supported in part by the Swedish
Foundation for Strategic Research.

Thanks go to Hans Thorsen for donating an SGI system for the GMP test system
environment.

@node References, GNU Free Documentation License, Contributors, Top
@comment  node-name,  next,  previous,  up
@appendix References
@cindex References

@c  FIXME: In tex, the @uref's are unhyphenated, which is good for clarity,
@c  but being long words they upset paragraph formatting (the preceding line
@c  can get badly stretched).  Would like a conditional @* style line break
@c  if the uref is too long to fit on the last line of the paragraph, but it's
@c  not clear how to do that.  For now explicit @texlinebreak{}s are used on
@c  paragraphs that come out bad.

@section Books

@itemize @bullet
@item
Jonathan M. Borwein and Peter B. Borwein, ``Pi and the AGM: A Study in
Analytic Number Theory and Computational Complexity'', Wiley, 1998.

@item
Richard Crandall and Carl Pomerance, ``Prime Numbers: A Computational
Perspective'', 2nd edition, Springer-Verlag, 2005.
@texlinebreak{} @uref{http://www.math.dartmouth.edu/~carlp/}

@item
Henri Cohen, ``A Course in Computational Algebraic Number Theory'', Graduate
Texts in Mathematics number 138, Springer-Verlag, 1993.
@texlinebreak{} @uref{http://www.math.u-bordeaux.fr/~cohen/}

@item
Donald E. Knuth, ``The Art of Computer Programming'', volume 2,
``Seminumerical Algorithms'', 3rd edition, Addison-Wesley, 1998.
@texlinebreak{} @uref{http://www-cs-faculty.stanford.edu/~knuth/taocp.html}

@item
John D. Lipson, ``Elements of Algebra and Algebraic Computing'',
The Benjamin Cummings Publishing Company Inc, 1981.

@item
Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone, ``Handbook of
Applied Cryptography'', @uref{http://www.cacr.math.uwaterloo.ca/hac/}

@item
Richard M. Stallman and the GCC Developer Community, ``Using the GNU Compiler
Collection'', Free Software Foundation, 2008, available online
@uref{https://gcc.gnu.org/onlinedocs/}, and in the GCC package
@uref{https://ftp.gnu.org/gnu/gcc/}
@end itemize

@section Papers

@itemize @bullet
@item
Yves Bertot, Nicolas Magaud and Paul Zimmermann, ``A Proof of GMP Square
Root'', Journal of Automated Reasoning, volume 29, 2002, pp.@: 225-252. Also
available online as INRIA Research Report 4475, June 2002,
@uref{http://hal.inria.fr/docs/00/07/21/13/PDF/RR-4475.pdf}

@item
Christoph Burnikel and Joachim Ziegler, ``Fast Recursive Division'',
Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022,
@texlinebreak{} @uref{http://data.mpi-sb.mpg.de/internet/reports.nsf/NumberView/1998-1-022}

@item
Torbj@"orn Granlund and Peter L.
Montgomery, ``Division by Invariant Integers
using Multiplication'', in Proceedings of the SIGPLAN PLDI'94 Conference, June
1994. Also available @uref{https://gmplib.org/~tege/divcnst-pldi94.pdf}.

@item
Niels M@"oller and Torbj@"orn Granlund, ``Improved division by invariant
integers'', IEEE Transactions on Computers, 11 June 2010.
@uref{https://gmplib.org/~tege/division-paper.pdf}

@item
Torbj@"orn Granlund and Niels M@"oller, ``Division of integers large and
small'', to appear.

@item
Tudor Jebelean,
``An algorithm for exact division'',
Journal of Symbolic Computation,
volume 15, 1993, pp.@: 169-180.
Research report version available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz}

@item
Tudor Jebelean, ``Exact Division with Karatsuba Complexity - Extended
Abstract'', RISC-Linz technical report 96-31, @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz}

@item
Tudor Jebelean, ``Practical Integer Division with Karatsuba Complexity'',
ISSAC 97, pp.@: 339-341. Technical report available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz}

@item
Tudor Jebelean, ``A Generalization of the Binary GCD Algorithm'', ISSAC 93,
pp.@: 111-116. Technical report version available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz}

@item
Tudor Jebelean, ``A Double-Digit Lehmer-Euclid Algorithm for Finding the GCD
of Long Integers'', Journal of Symbolic Computation, volume 19, 1995,
pp.@: 145-157.
Technical report version also available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz}

@item
Werner Krandick and Tudor Jebelean, ``Bidirectional Exact Integer Division'',
Journal of Symbolic Computation, volume 21, 1996, pp.@: 441-455. Early
technical report version also available
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz}

@item
Makoto Matsumoto and Takuji Nishimura, ``Mersenne Twister: A 623-dimensionally
equidistributed uniform pseudorandom number generator'', ACM Transactions on
Modelling and Computer Simulation, volume 8, January 1998, pp.@: 3-30.
Available online @texlinebreak{}
@uref{http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/mt.ps.gz} (or .pdf)

@item
R. Moenck and A. Borodin, ``Fast Modular Transforms via Division'',
Proceedings of the 13th Annual IEEE Symposium on Switching and Automata
Theory, October 1972, pp.@: 90-96. Reprinted as ``Fast Modular Transforms'',
Journal of Computer and System Sciences, volume 8, number 3, June 1974,
pp.@: 366-386.

@item
Niels M@"oller, ``On Sch@"onhage's algorithm and subquadratic integer GCD
computation'', in Mathematics of Computation, volume 77, January 2008, pp.@:
589-607.

@item
Peter L. Montgomery, ``Modular Multiplication Without Trial Division'', in
Mathematics of Computation, volume 44, number 170, April 1985.

@item
Arnold Sch@"onhage and Volker Strassen, ``Schnelle Multiplikation grosser
Zahlen'', Computing 7, 1971, pp.@: 281-292.

@item
Kenneth Weber, ``The accelerated integer GCD algorithm'',
ACM Transactions on Mathematical Software,
volume 21, number 1, March 1995, pp.@: 111-122.

@item
Paul Zimmermann, ``Karatsuba Square Root'', INRIA Research Report 3805,
November 1999, @uref{http://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf}

@item
Paul Zimmermann, ``A Proof of GMP Fast Division and Square Root
Implementations'', @texlinebreak{}
@uref{http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz}

@item
Dan Zuras, ``On Squaring and Multiplying Large Integers'', ARITH-11: IEEE
Symposium on Computer Arithmetic, 1993, pp.@: 260-271. Reprinted as ``More
on Multiplying and Squaring Large Integers'', IEEE Transactions on Computers,
volume 43, number 8, August 1994, pp.@: 899-908.
@end itemize


@node GNU Free Documentation License, Concept Index, References, Top
@appendix GNU Free Documentation License
@cindex GNU Free Documentation License
@cindex Free Documentation License
@cindex Documentation license
@include fdl-1.3.texi


@node Concept Index, Function Index, GNU Free Documentation License, Top
@comment  node-name,  next,  previous,  up
@unnumbered Concept Index
@printindex cp

@node Function Index,  , Concept Index, Top
@comment  node-name,  next,  previous,  up
@unnumbered Function and Type Index
@printindex fn

@bye

@c Local variables:
@c fill-column: 78
@c compile-command: "make gmp.info"
@c End: