\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libgomp.info
@settitle GNU libgomp
@c %**end of header


@copying
Copyright @copyright{} 2006-2020 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@ifinfo
@dircategory GNU Libraries
@direntry
* libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
@end direntry

This manual documents libgomp, the GNU Offloading and Multi Processing
Runtime library.  This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and
Fortran.

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA

@insertcopying
@end ifinfo


@setchapternewpage odd

@titlepage
@title GNU Offloading and Multi Processing Runtime Library
@subtitle The GNU OpenMP and OpenACC Implementation
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@sp 1
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
@sp 1
@insertcopying
@end titlepage

@summarycontents
@contents
@page


@node Top
@top Introduction
@cindex Introduction

This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library.  This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
on this, support for OpenACC and offloading (both via OpenACC and via
OpenMP 4's @code{target} construct) was added later, and the library
was renamed to the GNU Offloading and Multi Processing Runtime Library.



@comment
@comment  When you add a new menu item, please keep the right hand
@comment  aligned to the same column.  Do not use tabs.  This provides
@comment  better formatting.
@comment
@menu
* Enabling OpenMP::            How to enable OpenMP for your applications.
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
* OpenMP Environment Variables: Environment Variables.
                               Influencing OpenMP runtime behavior with
                               environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
* OpenACC Runtime Library Routines:: The OpenACC runtime application
                               programming interface.
* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
                               environment variables.
* CUDA Streams Usage::         Notes on the implementation of
                               asynchronous operations.
* OpenACC Library Interoperability:: OpenACC library interoperability with the
                               NVIDIA CUBLAS library.
* OpenACC Profiling Interface::
* The libgomp ABI::            Notes on the external ABI presented by libgomp.
* Reporting Bugs::             How to report bugs in the GNU Offloading and
                               Multi Processing Runtime Library.
* Copying::                    GNU general public license says
                               how you can copy and share libgomp.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
* Library Index::              Index of this documentation.
@end menu


@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP

To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
@code{#pragma omp} in C/C++ and, for Fortran, the @code{!$omp} directive in
free form, the @code{c$omp}, @code{*$omp} and @code{!$omp} directives in
fixed form, the @code{!$} conditional compilation sentinel in free form,
and the @code{c$}, @code{*$} and @code{!$} sentinels in fixed form.  The
flag also arranges for automatic linking of the OpenMP runtime library
(@ref{Runtime Library Routines}).

A complete description of all OpenMP directives accepted may be found in
the @uref{https://www.openmp.org, OpenMP Application Program Interface} manual,
version 4.5.


@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 3 of the OpenMP
specification in version 4.5.  The routines are structured in the following
three parts:

@menu
Control threads, processors and the parallel environment.  They have C
linkage, and do not throw exceptions.

* omp_get_active_level::        Number of active parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_cancellation::        Whether cancellation support is enabled
* omp_get_default_device::      Get the default device for target regions
* omp_get_dynamic::             Dynamic teams setting
* omp_get_level::               Number of parallel regions
* omp_get_max_active_levels::   Maximum number of active regions
* omp_get_max_task_priority::   Maximum task priority value that can be set
* omp_get_max_threads::         Maximum number of threads of parallel region
* omp_get_nested::              Nested parallel regions
* omp_get_num_devices::         Number of target devices
* omp_get_num_procs::           Number of processors online
* omp_get_num_teams::           Number of teams
* omp_get_num_threads::         Size of the active team
* omp_get_proc_bind::           Whether threads may be moved between CPUs
* omp_get_schedule::            Obtain the runtime scheduling method
* omp_get_team_num::            Get team number
* omp_get_team_size::           Number of threads in a team
* omp_get_thread_limit::        Maximum number of threads
* omp_get_thread_num::          Current thread ID
* omp_in_parallel::             Whether a parallel region is active
* omp_in_final::                Whether in final or included task region
* omp_is_initial_device::       Whether executing on the host device
* omp_set_default_device::      Set the default device for target regions
* omp_set_dynamic::             Enable/disable dynamic teams
* omp_set_max_active_levels::   Limits the number of active parallel regions
* omp_set_nested::              Enable/disable nested parallel regions
* omp_set_num_threads::         Set upper team size limit
* omp_set_schedule::            Set the runtime scheduling method

Initialize, set, test, unset and destroy simple and nested locks.

* omp_init_lock::               Initialize simple lock
* omp_set_lock::                Wait for and set simple lock
* omp_test_lock::               Test and set simple lock if available
* omp_unset_lock::              Unset simple lock
* omp_destroy_lock::            Destroy simple lock
* omp_init_nest_lock::          Initialize nested lock
* omp_set_nest_lock::           Wait for and set nested lock
* omp_test_nest_lock::          Test and set nested lock if available
* omp_unset_nest_lock::         Unset nested lock
* omp_destroy_nest_lock::       Destroy nested lock

Portable, thread-based, wall clock timer.

* omp_get_wtick::               Get timer precision.
* omp_get_wtime::               Elapsed wall clock time.
@end menu



@node omp_get_active_level
@section @code{omp_get_active_level} -- Number of active parallel regions
@table @asis
@item @emph{Description}:
This function returns the nesting level for the active parallel blocks
that enclose the calling routine.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@end table



@node omp_get_ancestor_thread_num
@section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
@table @asis
@item @emph{Description}:
This function returns the thread identification number for the given
nesting level of the current thread.  For values of @var{level} outside
the range 0 to @code{omp_get_level}, -1 is returned; if @var{level} is
@code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@end table



@node omp_get_cancellation
@section @code{omp_get_cancellation} -- Whether cancellation support is enabled
@table @asis
@item @emph{Description}:
This function returns @code{true} if cancellation is activated, @code{false}
otherwise.  Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
deactivated.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
@end multitable

@item @emph{See also}:
@ref{OMP_CANCELLATION}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@end table



@node omp_get_default_device
@section @code{omp_get_default_device} -- Get the default device for target regions
@table @asis
@item @emph{Description}:
Get the default device for target regions without a @code{device} clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@end table



@node omp_get_dynamic
@section @code{omp_get_dynamic} -- Dynamic teams setting
@table @asis
@item @emph{Description}:
This function returns @code{true} if enabled, @code{false} otherwise.
Here, @code{true} and @code{false} represent their language-specific
counterparts.

The dynamic team setting may be initialized at startup by the
@env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
@end multitable

@item @emph{See also}:
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@end table



@node omp_get_level
@section @code{omp_get_level} -- Obtain the current nesting level
@table @asis
@item @emph{Description}:
This function returns the nesting level for the parallel blocks
that enclose the calling routine.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@end table



@node omp_get_max_active_levels
@section @code{omp_get_max_active_levels} -- Maximum number of active regions
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed number of nested, active parallel
regions.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@end table


@node omp_get_max_task_priority
@section @code{omp_get_max_task_priority} -- Maximum priority value
that can be set for tasks.
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed priority number for tasks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table


@node omp_get_max_threads
@section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
@table @asis
@item @emph{Description}:
Return the maximum number of threads used for a parallel region that
does not use the @code{num_threads} clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@end table



@node omp_get_nested
@section @code{omp_get_nested} -- Nested parallel regions
@table @asis
@item @emph{Description}:
This function returns @code{true} if nested parallel regions are
enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

Nested parallel regions may be initialized at startup by the
@env{OMP_NESTED} environment variable or at runtime using
@code{omp_set_nested}.  If undefined, nested parallel regions are
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_nested()}
@end multitable

@item @emph{See also}:
@ref{omp_set_nested}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@end table



@node omp_get_num_devices
@section @code{omp_get_num_devices} -- Number of target devices
@table @asis
@item @emph{Description}:
Returns the number of target devices.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@end table



@node omp_get_num_procs
@section @code{omp_get_num_procs} -- Number of processors online
@table @asis
@item @emph{Description}:
Returns the number of processors online on the current device.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
@end table



@node omp_get_num_teams
@section @code{omp_get_num_teams} -- Number of teams
@table @asis
@item @emph{Description}:
Returns the number of teams in the current teams region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@end table



@node omp_get_num_threads
@section @code{omp_get_num_threads} -- Size of the active team
@table @asis
@item @emph{Description}:
Returns the number of threads in the current team.  In a sequential section of
the program @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable.  At runtime, the size
of the current team may be set either by the @code{num_threads}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per CPU online is used.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@end table



@node omp_get_proc_bind
@section @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
@table @asis
@item @emph{Description}:
This function returns the currently active thread affinity policy, which is
set via @env{OMP_PROC_BIND}.
Possible values are @code{omp_proc_bind_false},
@code{omp_proc_bind_true}, @code{omp_proc_bind_master},
@code{omp_proc_bind_close} and @code{omp_proc_bind_spread}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
@end multitable

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@end table



@node omp_get_schedule
@section @code{omp_get_schedule} -- Obtain the runtime scheduling method
@table @asis
@item @emph{Description}:
Obtain the runtime scheduling method.  The @var{kind} argument will be
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
@var{chunk_size}, is set to the chunk size.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
@end table



@node omp_get_team_num
@section @code{omp_get_team_num} -- Get team number
@table @asis
@item @emph{Description}:
Returns the team number of the calling thread.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@end table



@node omp_get_team_size
@section @code{omp_get_team_size} -- Number of threads in a team
@table @asis
@item @emph{Description}:
This function returns the number of threads in a thread team to which
either the current thread or its ancestor belongs.  For values of
@var{level} outside the range 0 to @code{omp_get_level}, -1 is returned;
if @var{level} is zero, 1 is returned; and for @code{omp_get_level}, the
result is identical to @code{omp_get_num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@end table



@node omp_get_thread_limit
@section @code{omp_get_thread_limit} -- Maximum number of threads
@table @asis
@item @emph{Description}:
Return the maximum number of threads of the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@end table



@node omp_get_thread_num
@section @code{omp_get_thread_num} -- Current thread ID
@table @asis
@item @emph{Description}:
Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
always returns 0.  In parallel regions the return value varies
from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
value of the master thread of a team is always 0.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
@end table



@node omp_in_parallel
@section @code{omp_in_parallel} -- Whether a parallel region is active
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in parallel,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
@end table


@node omp_in_final
@section @code{omp_in_final} -- Whether in final or included task region
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in a final
or included task region, @code{false} otherwise.  Here, @code{true}
and @code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_final(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_final()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@end table



@node omp_is_initial_device
@section @code{omp_is_initial_device} -- Whether executing on the host device
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@end table



@node omp_set_default_device
@section @code{omp_set_default_device} -- Set the default device for target regions
@table @asis
@item @emph{Description}:
Set the default device for target regions without a @code{device} clause.
The argument shall be a nonnegative device number.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
@item @tab @code{integer device_num}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table



@node omp_set_dynamic
@section @code{omp_set_dynamic} -- Enable/disable dynamic teams
@table @asis
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team.  The function takes the language-specific equivalent
of @code{true} and @code{false}, where @code{true} enables dynamic
adjustment of team sizes and @code{false} disables it.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
@item @tab @code{logical, intent(in) :: dynamic_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
@end table



@node omp_set_max_active_levels
@section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
@table @asis
@item @emph{Description}:
This function limits the maximum allowed number of nested, active
parallel regions.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
@item @tab @code{integer max_levels}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
@end table



@node omp_set_nested
@section @code{omp_set_nested} -- Enable/disable nested parallel regions
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
@item @tab @code{logical, intent(in) :: nested}
@end multitable

@item @emph{See also}:
@ref{OMP_NESTED}, @ref{omp_get_nested}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@end table



@node omp_set_num_threads
@section @code{omp_set_num_threads} -- Set upper team size limit
@table @asis
@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
sections, if those do not specify a @code{num_threads} clause.  The
argument of @code{omp_set_num_threads} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
@item @tab @code{integer, intent(in) :: num_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@end table



@node omp_set_schedule
@section @code{omp_set_schedule} -- Set the runtime scheduling method
@table @asis
@item @emph{Description}:
Sets the runtime scheduling method.  The @var{kind} argument can have the
value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.
Except for 966@code{omp_sched_auto}, the chunk size is set to the value of 967@var{chunk_size} if positive, or to the default value if zero or negative. 968For @code{omp_sched_auto} the @var{chunk_size} argument is ignored. 969 970@item @emph{C/C++}: 971@multitable @columnfractions .20 .80 972@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);} 973@end multitable 974 975@item @emph{Fortran}: 976@multitable @columnfractions .20 .80 977@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)} 978@item @tab @code{integer(kind=omp_sched_kind) kind} 979@item @tab @code{integer chunk_size} 980@end multitable 981 982@item @emph{See also}: 983@ref{omp_get_schedule}, 984@ref{OMP_SCHEDULE} 985 986@item @emph{Reference}: 987@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12. 988@end table 989 990 991 992@node omp_init_lock 993@section @code{omp_init_lock} -- Initialize simple lock 994@table @asis 995@item @emph{Description}: 996Initialize a simple lock. After initialization, the lock is in 997an unlocked state. 998 999@item @emph{C/C++}: 1000@multitable @columnfractions .20 .80 1001@item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);} 1002@end multitable 1003 1004@item @emph{Fortran}: 1005@multitable @columnfractions .20 .80 1006@item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)} 1007@item @tab @code{integer(omp_lock_kind), intent(out) :: svar} 1008@end multitable 1009 1010@item @emph{See also}: 1011@ref{omp_destroy_lock} 1012 1013@item @emph{Reference}: 1014@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1. 1015@end table 1016 1017 1018 1019@node omp_set_lock 1020@section @code{omp_set_lock} -- Wait for and set simple lock 1021@table @asis 1022@item @emph{Description}: 1023Before setting a simple lock, the lock variable must be initialized by 1024@code{omp_init_lock}. The calling thread is blocked until the lock 1025is available.
If the lock is already held by the current thread, 1026a deadlock occurs. 1027 1028@item @emph{C/C++}: 1029@multitable @columnfractions .20 .80 1030@item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);} 1031@end multitable 1032 1033@item @emph{Fortran}: 1034@multitable @columnfractions .20 .80 1035@item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)} 1036@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar} 1037@end multitable 1038 1039@item @emph{See also}: 1040@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock} 1041 1042@item @emph{Reference}: 1043@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4. 1044@end table 1045 1046 1047 1048@node omp_test_lock 1049@section @code{omp_test_lock} -- Test and set simple lock if available 1050@table @asis 1051@item @emph{Description}: 1052Before setting a simple lock, the lock variable must be initialized by 1053@code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock} 1054does not block if the lock is not available. This function returns 1055@code{true} upon success, @code{false} otherwise. Here, @code{true} and 1056@code{false} represent their language-specific counterparts. 1057 1058@item @emph{C/C++}: 1059@multitable @columnfractions .20 .80 1060@item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);} 1061@end multitable 1062 1063@item @emph{Fortran}: 1064@multitable @columnfractions .20 .80 1065@item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)} 1066@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar} 1067@end multitable 1068 1069@item @emph{See also}: 1070@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock} 1071 1072@item @emph{Reference}: 1073@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1074@end table 1075 1076 1077 1078@node omp_unset_lock 1079@section @code{omp_unset_lock} -- Unset simple lock 1080@table @asis 1081@item @emph{Description}: 1082A simple lock about to be unset must have been locked by @code{omp_set_lock} 1083or @code{omp_test_lock} before. In addition, the lock must be held by the 1084thread calling @code{omp_unset_lock}. Then, the lock becomes unlocked. If one 1085or more threads attempted to set the lock before, one of them is then 1086chosen to acquire the lock. 1087 1088@item @emph{C/C++}: 1089@multitable @columnfractions .20 .80 1090@item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);} 1091@end multitable 1092 1093@item @emph{Fortran}: 1094@multitable @columnfractions .20 .80 1095@item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)} 1096@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar} 1097@end multitable 1098 1099@item @emph{See also}: 1100@ref{omp_set_lock}, @ref{omp_test_lock} 1101 1102@item @emph{Reference}: 1103@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5. 1104@end table 1105 1106 1107 1108@node omp_destroy_lock 1109@section @code{omp_destroy_lock} -- Destroy simple lock 1110@table @asis 1111@item @emph{Description}: 1112Destroy a simple lock. In order to be destroyed, a simple lock must be 1113in the unlocked state. 1114 1115@item @emph{C/C++}: 1116@multitable @columnfractions .20 .80 1117@item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);} 1118@end multitable 1119 1120@item @emph{Fortran}: 1121@multitable @columnfractions .20 .80 1122@item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)} 1123@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar} 1124@end multitable 1125 1126@item @emph{See also}: 1127@ref{omp_init_lock} 1128 1129@item @emph{Reference}: 1130@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1131@end table 1132 1133 1134 1135@node omp_init_nest_lock 1136@section @code{omp_init_nest_lock} -- Initialize nested lock 1137@table @asis 1138@item @emph{Description}: 1139Initialize a nested lock. After initialization, the lock is in 1140an unlocked state and the nesting count is set to zero. 1141 1142@item @emph{C/C++}: 1143@multitable @columnfractions .20 .80 1144@item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);} 1145@end multitable 1146 1147@item @emph{Fortran}: 1148@multitable @columnfractions .20 .80 1149@item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)} 1150@item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar} 1151@end multitable 1152 1153@item @emph{See also}: 1154@ref{omp_destroy_nest_lock} 1155 1156@item @emph{Reference}: 1157@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1. 1158@end table 1159 1160 1161@node omp_set_nest_lock 1162@section @code{omp_set_nest_lock} -- Wait for and set nested lock 1163@table @asis 1164@item @emph{Description}: 1165Before setting a nested lock, the lock variable must be initialized by 1166@code{omp_init_nest_lock}. The calling thread is blocked until the lock 1167is available. If the lock is already held by the current thread, the 1168nesting count for the lock is incremented. 1169 1170@item @emph{C/C++}: 1171@multitable @columnfractions .20 .80 1172@item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);} 1173@end multitable 1174 1175@item @emph{Fortran}: 1176@multitable @columnfractions .20 .80 1177@item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)} 1178@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar} 1179@end multitable 1180 1181@item @emph{See also}: 1182@ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock} 1183 1184@item @emph{Reference}: 1185@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4. 
1186@end table 1187 1188 1189 1190@node omp_test_nest_lock 1191@section @code{omp_test_nest_lock} -- Test and set nested lock if available 1192@table @asis 1193@item @emph{Description}: 1194Before setting a nested lock, the lock variable must be initialized by 1195@code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock}, 1196@code{omp_test_nest_lock} does not block if the lock is not available. 1197If the lock is already held by the current thread, the new nesting count 1198is returned. Otherwise, the return value equals zero. 1199 1200@item @emph{C/C++}: 1201@multitable @columnfractions .20 .80 1202@item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);} 1203@end multitable 1204 1205@item @emph{Fortran}: 1206@multitable @columnfractions .20 .80 1207@item @emph{Interface}: @tab @code{integer function omp_test_nest_lock(nvar)} 1208@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar} 1209@end multitable 1210 1211 1212@item @emph{See also}: 1213@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock} 1214 1215@item @emph{Reference}: 1216@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6. 1217@end table 1218 1219 1220 1221@node omp_unset_nest_lock 1222@section @code{omp_unset_nest_lock} -- Unset nested lock 1223@table @asis 1224@item @emph{Description}: 1225A nested lock about to be unset must have been locked by @code{omp_set_nest_lock} 1226or @code{omp_test_nest_lock} before. In addition, the lock must be held by the 1227thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the 1228lock becomes unlocked. If one or more threads attempted to set the lock before, 1229one of them is then chosen to acquire the lock.
1230 1231@item @emph{C/C++}: 1232@multitable @columnfractions .20 .80 1233@item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);} 1234@end multitable 1235 1236@item @emph{Fortran}: 1237@multitable @columnfractions .20 .80 1238@item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)} 1239@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar} 1240@end multitable 1241 1242@item @emph{See also}: 1243@ref{omp_set_nest_lock} 1244 1245@item @emph{Reference}: 1246@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5. 1247@end table 1248 1249 1250 1251@node omp_destroy_nest_lock 1252@section @code{omp_destroy_nest_lock} -- Destroy nested lock 1253@table @asis 1254@item @emph{Description}: 1255Destroy a nested lock. In order to be destroyed, a nested lock must be 1256in the unlocked state and its nesting count must equal zero. 1257 1258@item @emph{C/C++}: 1259@multitable @columnfractions .20 .80 1260@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);} 1261@end multitable 1262 1263@item @emph{Fortran}: 1264@multitable @columnfractions .20 .80 1265@item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)} 1266@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar} 1267@end multitable 1268 1269@item @emph{See also}: 1270@ref{omp_init_nest_lock} 1271 1272@item @emph{Reference}: 1273@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3. 1274@end table 1275 1276 1277 1278@node omp_get_wtick 1279@section @code{omp_get_wtick} -- Get timer precision 1280@table @asis 1281@item @emph{Description}: 1282Gets the timer precision, i.e., the number of seconds between two 1283successive clock ticks.
1284 1285@item @emph{C/C++}: 1286@multitable @columnfractions .20 .80 1287@item @emph{Prototype}: @tab @code{double omp_get_wtick(void);} 1288@end multitable 1289 1290@item @emph{Fortran}: 1291@multitable @columnfractions .20 .80 1292@item @emph{Interface}: @tab @code{double precision function omp_get_wtick()} 1293@end multitable 1294 1295@item @emph{See also}: 1296@ref{omp_get_wtime} 1297 1298@item @emph{Reference}: 1299@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2. 1300@end table 1301 1302 1303 1304@node omp_get_wtime 1305@section @code{omp_get_wtime} -- Elapsed wall clock time 1306@table @asis 1307@item @emph{Description}: 1308Elapsed wall clock time in seconds. The time is measured per thread; no 1309guarantee can be made that two distinct threads measure the same time. 1310Time is measured from some ``time in the past'', which is an arbitrary time 1311guaranteed not to change during the execution of the program. 1312 1313@item @emph{C/C++}: 1314@multitable @columnfractions .20 .80 1315@item @emph{Prototype}: @tab @code{double omp_get_wtime(void);} 1316@end multitable 1317 1318@item @emph{Fortran}: 1319@multitable @columnfractions .20 .80 1320@item @emph{Interface}: @tab @code{double precision function omp_get_wtime()} 1321@end multitable 1322 1323@item @emph{See also}: 1324@ref{omp_get_wtick} 1325 1326@item @emph{Reference}: 1327@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1. 1328@end table 1329 1330 1331 1332@c --------------------------------------------------------------------- 1333@c OpenMP Environment Variables 1334@c --------------------------------------------------------------------- 1335 1336@node Environment Variables 1337@chapter OpenMP Environment Variables 1338 1339The environment variables beginning with @env{OMP_} are defined by 1340section 4 of the OpenMP specification in version 4.5, while those 1341beginning with @env{GOMP_} are GNU extensions.
1342 1343@menu 1344* OMP_CANCELLATION:: Set whether cancellation is activated 1345* OMP_DISPLAY_ENV:: Show OpenMP version and environment variables 1346* OMP_DEFAULT_DEVICE:: Set the device used in target regions 1347* OMP_DYNAMIC:: Dynamic adjustment of threads 1348* OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions 1349* OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value 1350* OMP_NESTED:: Nested parallel regions 1351* OMP_NUM_THREADS:: Specifies the number of threads to use 1352* OMP_PROC_BIND:: Whether threads may be moved between CPUs 1353* OMP_PLACES:: Specifies on which CPUs the threads should be placed 1354* OMP_STACKSIZE:: Set default thread stack size 1355* OMP_SCHEDULE:: How threads are scheduled 1356* OMP_THREAD_LIMIT:: Set the maximum number of threads 1357* OMP_WAIT_POLICY:: How waiting threads are handled 1358* GOMP_CPU_AFFINITY:: Bind threads to specific CPUs 1359* GOMP_DEBUG:: Enable debugging output 1360* GOMP_STACKSIZE:: Set default thread stack size 1361* GOMP_SPINCOUNT:: Set the busy-wait spin count 1362* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools 1363@end menu 1364 1365 1366@node OMP_CANCELLATION 1367@section @env{OMP_CANCELLATION} -- Set whether cancellation is activated 1368@cindex Environment Variable 1369@table @asis 1370@item @emph{Description}: 1371If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or 1372if unset, cancellation is disabled and the @code{cancel} construct is ignored.
1373 1374@item @emph{See also}: 1375@ref{omp_get_cancellation} 1376 1377@item @emph{Reference}: 1378@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11 1379@end table 1380 1381 1382 1383@node OMP_DISPLAY_ENV 1384@section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables 1385@cindex Environment Variable 1386@table @asis 1387@item @emph{Description}: 1388If set to @code{TRUE}, the OpenMP version number and the values 1389associated with the OpenMP environment variables are printed to @code{stderr}. 1390If set to @code{VERBOSE}, it additionally shows the value of the environment 1391variables which are GNU extensions. If undefined or set to @code{FALSE}, 1392this information will not be shown. 1393 1394 1395@item @emph{Reference}: 1396@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12 1397@end table 1398 1399 1400 1401@node OMP_DEFAULT_DEVICE 1402@section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions 1403@cindex Environment Variable 1404@table @asis 1405@item @emph{Description}: 1406Set to choose the device which is used in a @code{target} region, unless the 1407value is overridden by @code{omp_set_default_device} or by a @code{device} 1408clause. The value shall be the nonnegative device number. If no device with 1409the given device number exists, the code is executed on the host. If unset, 1410device number 0 will be used. 1411 1412 1413@item @emph{See also}: 1414@ref{omp_get_default_device}, @ref{omp_set_default_device} 1415 1416@item @emph{Reference}: 1417@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13 1418@end table 1419 1420 1421 1422@node OMP_DYNAMIC 1423@section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads 1424@cindex Environment Variable 1425@table @asis 1426@item @emph{Description}: 1427Enable or disable the dynamic adjustment of the number of threads 1428within a team. The value of this environment variable shall be 1429@code{TRUE} or @code{FALSE}.
If undefined, dynamic adjustment is 1430disabled by default. 1431 1432@item @emph{See also}: 1433@ref{omp_set_dynamic} 1434 1435@item @emph{Reference}: 1436@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3 1437@end table 1438 1439 1440 1441@node OMP_MAX_ACTIVE_LEVELS 1442@section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions 1443@cindex Environment Variable 1444@table @asis 1445@item @emph{Description}: 1446Specifies the initial value for the maximum number of nested parallel 1447regions. The value of this variable shall be a positive integer. 1448If undefined, the number of active levels is unlimited. 1449 1450@item @emph{See also}: 1451@ref{omp_set_max_active_levels} 1452 1453@item @emph{Reference}: 1454@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9 1455@end table 1456 1457 1458 1459@node OMP_MAX_TASK_PRIORITY 1460@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority 1461number that can be set for a task 1462@cindex Environment Variable 1463@table @asis 1464@item @emph{Description}: 1465Specifies the initial value for the maximum priority value that can be 1466set for a task. The value of this variable shall be a non-negative 1467integer. If undefined, the default priority is 14680. 1469 1470@item @emph{See also}: 1471@ref{omp_get_max_task_priority} 1472 1473@item @emph{Reference}: 1474@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14 1475@end table 1476 1477 1478 1479@node OMP_NESTED 1480@section @env{OMP_NESTED} -- Nested parallel regions 1481@cindex Environment Variable 1482@cindex Implementation specific setting 1483@table @asis 1484@item @emph{Description}: 1485Enable or disable nested parallel regions, i.e., whether team members 1486are allowed to create new teams. The value of this environment variable 1487shall be @code{TRUE} or @code{FALSE}. If undefined, nested parallel 1488regions are disabled by default.
1489 1490@item @emph{See also}: 1491@ref{omp_set_nested} 1492 1493@item @emph{Reference}: 1494@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6 1495@end table 1496 1497 1498 1499@node OMP_NUM_THREADS 1500@section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use 1501@cindex Environment Variable 1502@cindex Implementation specific setting 1503@table @asis 1504@item @emph{Description}: 1505Specifies the default number of threads to use in parallel regions. The 1506value of this variable shall be a comma-separated list of positive integers; 1507each value specifies the number of threads to use for the corresponding nested 1508level. If undefined, one thread per CPU is used. 1509 1510@item @emph{See also}: 1511@ref{omp_set_num_threads} 1512 1513@item @emph{Reference}: 1514@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2 1515@end table 1516 1517 1518 1519@node OMP_PROC_BIND 1520@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs 1521@cindex Environment Variable 1522@table @asis 1523@item @emph{Description}: 1524Specifies whether threads may be moved between processors. If set to 1525@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE} 1526they may be moved. Alternatively, a comma-separated list with the 1527values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify 1528the thread affinity policy for the corresponding nesting level. With 1529@code{MASTER} the worker threads are in the same place partition as the 1530master thread. With @code{CLOSE} those are kept close to the master thread 1531in contiguous place partitions. And with @code{SPREAD} a sparse distribution 1532across the place partitions is used. 1533 1534When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when 1535@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
1536 1537@item @emph{See also}: 1538@ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind} 1539 1540@item @emph{Reference}: 1541@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4 1542@end table 1543 1544 1545 1546@node OMP_PLACES 1547@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed 1548@cindex Environment Variable 1549@table @asis 1550@item @emph{Description}: 1551The thread placement can be either specified using an abstract name or by an 1552explicit list of the places. The abstract names @code{threads}, @code{cores} 1553and @code{sockets} can be optionally followed by a positive number in 1554parentheses, which denotes how many places shall be created. With 1555@code{threads} each place corresponds to a single hardware thread; @code{cores} 1556to a single core with the corresponding number of hardware threads; and with 1557@code{sockets} the place corresponds to a single socket. The resulting 1558placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment 1559variable. 1560 1561Alternatively, the placement can be specified explicitly as a comma-separated 1562list of places. A place is specified by a set of nonnegative numbers in curly 1563braces, denoting the hardware threads. The hardware threads 1564belonging to a place can either be specified as a comma-separated list of 1565nonnegative thread numbers or using an interval. Multiple places can also be 1566either specified by a comma-separated list of places or by an interval. To 1567specify an interval, a colon followed by the count is placed after 1568the hardware thread number or the place. Optionally, the length can be 1569followed by a colon and the stride number -- otherwise a unit stride is 1570assumed. For instance, the following all specify the same places list: 1571@code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"}; 1572@code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.
1573 1574If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and 1575@env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved 1576between CPUs following no placement policy. 1577 1578@item @emph{See also}: 1579@ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}, 1580@ref{OMP_DISPLAY_ENV} 1581 1582@item @emph{Reference}: 1583@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5 1584@end table 1585 1586 1587 1588@node OMP_STACKSIZE 1589@section @env{OMP_STACKSIZE} -- Set default thread stack size 1590@cindex Environment Variable 1591@table @asis 1592@item @emph{Description}: 1593Set the default thread stack size in kilobytes, unless the number 1594is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which 1595case the size is, respectively, in bytes, kilobytes, megabytes 1596or gigabytes. This is different from @code{pthread_attr_setstacksize} 1597which gets the number of bytes as an argument. If the stack size cannot 1598be set due to system constraints, an error is reported and the initial 1599stack size is left unchanged. If undefined, the stack size is system 1600dependent. 1601 1602@item @emph{Reference}: 1603@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7 1604@end table 1605 1606 1607 1608@node OMP_SCHEDULE 1609@section @env{OMP_SCHEDULE} -- How threads are scheduled 1610@cindex Environment Variable 1611@cindex Implementation specific setting 1612@table @asis 1613@item @emph{Description}: 1614Allows specifying the @code{schedule type} and @code{chunk size}. 1615The value of the variable shall have the form: @code{type[,chunk]} where 1616@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or @code{auto}. 1617The optional @code{chunk} size shall be a positive integer. If undefined, 1618dynamic scheduling and a chunk size of 1 are used.
1619 1620@item @emph{See also}: 1621@ref{omp_set_schedule} 1622 1623@item @emph{Reference}: 1624@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1 1625@end table 1626 1627 1628 1629@node OMP_THREAD_LIMIT 1630@section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads 1631@cindex Environment Variable 1632@table @asis 1633@item @emph{Description}: 1634Specifies the number of threads to use for the whole program. The 1635value of this variable shall be a positive integer. If undefined, 1636the number of threads is not limited. 1637 1638@item @emph{See also}: 1639@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit} 1640 1641@item @emph{Reference}: 1642@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10 1643@end table 1644 1645 1646 1647@node OMP_WAIT_POLICY 1648@section @env{OMP_WAIT_POLICY} -- How waiting threads are handled 1649@cindex Environment Variable 1650@table @asis 1651@item @emph{Description}: 1652Specifies whether waiting threads should be active or passive. If 1653the value is @code{PASSIVE}, waiting threads should not consume CPU 1654power while waiting; the value @code{ACTIVE} specifies that 1655they should. If undefined, threads wait actively for a short time 1656before waiting passively. 1657 1658@item @emph{See also}: 1659@ref{GOMP_SPINCOUNT} 1660 1661@item @emph{Reference}: 1662@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8 1663@end table 1664 1665 1666 1667@node GOMP_CPU_AFFINITY 1668@section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs 1669@cindex Environment Variable 1670@table @asis 1671@item @emph{Description}: 1672Binds threads to specific CPUs. The variable should contain a space-separated 1673or comma-separated list of CPUs. This list may contain different kinds of 1674entries: either single CPU numbers in any order, a range of CPUs (M-N) 1675or a range with some stride (M-N:S). CPU numbers are zero based.
For example, 1676@code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread 1677to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to 1678CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12, 1679and 14 respectively and then start assigning back from the beginning of 1680the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0. 1681 1682There is no libgomp library routine to determine whether a CPU affinity 1683specification is in effect. As a workaround, language-specific library 1684functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in 1685Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY} 1686environment variable. A defined CPU affinity on startup cannot be changed 1687or disabled during the runtime of the application. 1688 1689If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set, 1690@env{OMP_PROC_BIND} has a higher precedence. If neither has been set and 1691@env{OMP_PROC_BIND} is unset, or when @env{OMP_PROC_BIND} is set to 1692@code{FALSE}, the host system will handle the assignment of threads to CPUs. 1693 1694@item @emph{See also}: 1695@ref{OMP_PLACES}, @ref{OMP_PROC_BIND} 1696@end table 1697 1698 1699 1700@node GOMP_DEBUG 1701@section @env{GOMP_DEBUG} -- Enable debugging output 1702@cindex Environment Variable 1703@table @asis 1704@item @emph{Description}: 1705Enable debugging output. The variable should be set to @code{0} 1706(disabled, also the default if not set), or @code{1} (enabled). 1707 1708If enabled, some debugging output will be printed during execution. 1709This is currently not specified in more detail, and subject to change. 1710@end table 1711 1712 1713 1714@node GOMP_STACKSIZE 1715@section @env{GOMP_STACKSIZE} -- Set default thread stack size 1716@cindex Environment Variable 1717@cindex Implementation specific setting 1718@table @asis 1719@item @emph{Description}: 1720Set the default thread stack size in kilobytes. 
This is different from 1721@code{pthread_attr_setstacksize} which gets the number of bytes as an 1722argument. If the stack size cannot be set due to system constraints, an 1723error is reported and the initial stack size is left unchanged. If undefined, 1724the stack size is system dependent. 1725 1726@item @emph{See also}: 1727@ref{OMP_STACKSIZE} 1728 1729@item @emph{Reference}: 1730@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html, 1731GCC Patches Mailinglist}, 1732@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html, 1733GCC Patches Mailinglist} 1734@end table 1735 1736 1737 1738@node GOMP_SPINCOUNT 1739@section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count 1740@cindex Environment Variable 1741@cindex Implementation specific setting 1742@table @asis 1743@item @emph{Description}: 1744Determines how long a thread waits actively, consuming CPU power, 1745before waiting passively without consuming CPU power. The value may be 1746either @code{INFINITE} or @code{INFINITY} to always wait actively, or an 1747integer which gives the number of spins of the busy-wait loop. The 1748integer may optionally be followed by the following suffixes acting 1749as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega, 1750million), @code{G} (giga, billion), or @code{T} (tera, trillion). 1751If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE}, 1752300,000 is used when @env{OMP_WAIT_POLICY} is undefined and 175330 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}. 1754If there are more OpenMP threads than available CPUs, 1000 and 100 1755spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or 1756undefined, respectively; unless @env{GOMP_SPINCOUNT} is lower 1757or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
1758 1759@item @emph{See also}: 1760@ref{OMP_WAIT_POLICY} 1761@end table 1762 1763 1764 1765@node GOMP_RTEMS_THREAD_POOLS 1766@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools 1767@cindex Environment Variable 1768@cindex Implementation specific setting 1769@table @asis 1770@item @emph{Description}: 1771This environment variable is only used on the RTEMS real-time operating system. 1772It determines the scheduler instance specific thread pools. The format for 1773@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional 1774@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations 1775separated by @code{:} where: 1776@itemize @bullet 1777@item @code{<thread-pool-count>} is the thread pool count for this scheduler 1778instance. 1779@item @code{$<priority>} is an optional priority for the worker threads of a 1780thread pool according to @code{pthread_setschedparam}. If a priority 1781value is omitted, a worker thread will inherit the priority of the OpenMP 1782master thread that created it. The priority of the worker thread is not 1783changed after creation, even if a new OpenMP master thread using the worker has 1784a different priority. 1785@item @code{@@<scheduler-name>} is the scheduler instance name according to the 1786RTEMS application configuration. 1787@end itemize 1788In case no thread pool configuration is specified for a scheduler instance, 1789then each OpenMP master thread of this scheduler instance will use its own 1790dynamically allocated thread pool. To limit the worker thread count of the 1791thread pools, each OpenMP master thread must call @code{omp_set_num_threads}. 1792@item @emph{Example}: 1793Let's suppose we have three scheduler instances @code{IO}, @code{WRK0}, and 1794@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to 1795@code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for 1796scheduler instance @code{IO}.
In the scheduler instance @code{WRK0} there is 1797one thread pool available. Since no priority is specified for this scheduler 1798instance, the worker thread inherits the priority of the OpenMP master thread 1799that created it. In the scheduler instance @code{WRK1} there are three thread 1800pools available and their worker threads run at priority four. 1801@end table 1802 1803 1804 1805@c --------------------------------------------------------------------- 1806@c Enabling OpenACC 1807@c --------------------------------------------------------------------- 1808 1809@node Enabling OpenACC 1810@chapter Enabling OpenACC 1811 1812To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 1813flag @option{-fopenacc} must be specified. This enables the OpenACC directive 1814@code{#pragma acc} in C/C++ and @code{!$acc} directives in free form, 1815@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form, 1816@code{!$} conditional compilation sentinels in free form and @code{c$}, 1817@code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also 1818arranges for automatic linking of the OpenACC runtime library 1819(@ref{OpenACC Runtime Library Routines}). 1820 1821See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information. 1822 1823A complete description of all OpenACC directives accepted may be found in 1824the @uref{https://www.openacc.org, OpenACC} Application Programming 1825Interface manual, version 2.6. 1826 1827 1828 1829@c --------------------------------------------------------------------- 1830@c OpenACC Runtime Library Routines 1831@c --------------------------------------------------------------------- 1832 1833@node OpenACC Runtime Library Routines 1834@chapter OpenACC Runtime Library Routines 1835 1836The runtime routines described here are defined by section 3 of the OpenACC 1837specifications in version 2.6. 1838They have C linkage, and do not throw exceptions. 
1839Generally, they are available only for the host, with the exception of 1840@code{acc_on_device}, which is available for both the host and the 1841acceleration device. 1842 1843@menu 1844* acc_get_num_devices:: Get number of devices for the given device 1845 type. 1846* acc_set_device_type:: Set type of device accelerator to use. 1847* acc_get_device_type:: Get type of device accelerator to be used. 1848* acc_set_device_num:: Set device number to use. 1849* acc_get_device_num:: Get device number to be used. 1850* acc_get_property:: Get device property. 1851* acc_async_test:: Tests for completion of a specific asynchronous 1852 operation. 1853* acc_async_test_all:: Tests for completion of all asynchronous 1854 operations. 1855* acc_wait:: Wait for completion of a specific asynchronous 1856 operation. 1857* acc_wait_all:: Waits for completion of all asynchronous 1858 operations. 1859* acc_wait_all_async:: Wait for completion of all asynchronous 1860 operations. 1861* acc_wait_async:: Wait for completion of asynchronous operations. 1862* acc_init:: Initialize runtime for a specific device type. 1863* acc_shutdown:: Shuts down the runtime for a specific device 1864 type. 1865* acc_on_device:: Whether executing on a particular device 1866* acc_malloc:: Allocate device memory. 1867* acc_free:: Free device memory. 1868* acc_copyin:: Allocate device memory and copy host memory to 1869 it. 1870* acc_present_or_copyin:: If the data is not present on the device, 1871 allocate device memory and copy from host 1872 memory. 1873* acc_create:: Allocate device memory and map it to host 1874 memory. 1875* acc_present_or_create:: If the data is not present on the device, 1876 allocate device memory and map it to host 1877 memory. 1878* acc_copyout:: Copy device memory to host memory. 1879* acc_delete:: Free device memory. 1880* acc_update_device:: Update device memory from mapped host memory. 1881* acc_update_self:: Update host memory from mapped device memory. 
1882* acc_map_data:: Map previously allocated device memory to host 1883 memory. 1884* acc_unmap_data:: Unmap device memory from host memory. 1885* acc_deviceptr:: Get device pointer associated with specific 1886 host address. 1887* acc_hostptr:: Get host pointer associated with specific 1888 device address. 1889* acc_is_present:: Indicate whether host variable / array is 1890 present on device. 1891* acc_memcpy_to_device:: Copy host memory to device memory. 1892* acc_memcpy_from_device:: Copy device memory to host memory. 1893* acc_attach:: Let device pointer point to device-pointer target. 1894* acc_detach:: Let device pointer point to host-pointer target. 1895 1896API routines for target platforms. 1897 1898* acc_get_current_cuda_device:: Get CUDA device handle. 1899* acc_get_current_cuda_context::Get CUDA context handle. 1900* acc_get_cuda_stream:: Get CUDA stream handle. 1901* acc_set_cuda_stream:: Set CUDA stream handle. 1902 1903API routines for the OpenACC Profiling Interface. 1904 1905* acc_prof_register:: Register callbacks. 1906* acc_prof_unregister:: Unregister callbacks. 1907* acc_prof_lookup:: Obtain inquiry functions. 1908* acc_register_library:: Library registration. 1909@end menu 1910 1911 1912 1913@node acc_get_num_devices 1914@section @code{acc_get_num_devices} -- Get number of devices for given device type 1915@table @asis 1916@item @emph{Description} 1917This function returns a value indicating the number of devices available 1918for the device type specified in @var{devicetype}. 
1919 1920@item @emph{C/C++}: 1921@multitable @columnfractions .20 .80 1922@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);} 1923@end multitable 1924 1925@item @emph{Fortran}: 1926@multitable @columnfractions .20 .80 1927@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)} 1928@item @tab @code{integer(kind=acc_device_kind) devicetype} 1929@end multitable 1930 1931@item @emph{Reference}: 1932@uref{https://www.openacc.org, OpenACC specification v2.6}, section 19333.2.1. 1934@end table 1935 1936 1937 1938@node acc_set_device_type 1939@section @code{acc_set_device_type} -- Set type of device accelerator to use. 1940@table @asis 1941@item @emph{Description} 1942This function indicates to the runtime library which device type, specified 1943in @var{devicetype}, to use when executing a parallel or kernels region. 1944 1945@item @emph{C/C++}: 1946@multitable @columnfractions .20 .80 1947@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);} 1948@end multitable 1949 1950@item @emph{Fortran}: 1951@multitable @columnfractions .20 .80 1952@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)} 1953@item @tab @code{integer(kind=acc_device_kind) devicetype} 1954@end multitable 1955 1956@item @emph{Reference}: 1957@uref{https://www.openacc.org, OpenACC specification v2.6}, section 19583.2.2. 1959@end table 1960 1961 1962 1963@node acc_get_device_type 1964@section @code{acc_get_device_type} -- Get type of device accelerator to be used. 1965@table @asis 1966@item @emph{Description} 1967This function returns what device type will be used when executing a 1968parallel or kernels region. 

This function returns @code{acc_device_none} if
@code{acc_get_device_type} is called from the
@code{acc_ev_device_init_start} or @code{acc_ev_device_init_end}
callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
Interface}), that is, if the device is currently being initialized.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_type()}
@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.3.
@end table



@node acc_set_device_num
@section @code{acc_set_device_num} -- Set device number to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime which device number, specified
by @var{devicenum} and associated with the specified device type
@var{devicetype}, to use when executing a parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_num(int devicenum, acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.4.
@end table



@node acc_get_device_num
@section @code{acc_get_device_num} -- Get device number to be used.
@table @asis
@item @emph{Description}
This function returns which device number, associated with the specified
device type @var{devicetype}, will be used when executing a parallel or
kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer acc_get_device_num}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.5.
@end table



@node acc_get_property
@section @code{acc_get_property} -- Get device property.
@cindex acc_get_property
@cindex acc_get_property_string
@table @asis
@item @emph{Description}
These routines return the value of the specified @var{property} for the
device being queried according to @var{devicenum} and @var{devicetype}.
Integer-valued and string-valued properties are returned by
@code{acc_get_property} and @code{acc_get_property_string} respectively.
The Fortran @code{acc_get_property_string} subroutine returns the string
retrieved in its fourth argument, while the remaining entry points are
functions, which pass the return value as their result.

A note for Fortran only: the OpenACC technical committee corrected and
hence modified the interface introduced in OpenACC 2.6.  The kind-value
parameter @code{acc_device_property} has been renamed to
@code{acc_device_property_kind} for consistency and the return type of
the @code{acc_get_property} function is now a @code{c_size_t} integer
instead of an @code{acc_device_property} integer.
The parameter @code{acc_device_property} will continue to be provided,
but might be removed in a future version of GCC.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
@item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
@item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
@item @tab @code{use ISO_C_Binding, only: c_size_t}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer(kind=acc_device_property_kind) property}
@item @tab @code{integer(kind=c_size_t) acc_get_property}
@item @tab @code{character(*) string}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.6.
@end table



@node acc_async_test
@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function tests for completion of the asynchronous operation specified
in @var{arg}.  In C/C++, a non-zero value is returned to indicate that the
specified asynchronous operation has completed, while Fortran returns
@code{.true.}.  If the asynchronous operation has not completed, C/C++
returns zero and Fortran returns @code{.false.}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
@item @tab @code{integer(kind=acc_handle_kind) arg}
@item @tab @code{logical acc_async_test}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.9.
@end table



@node acc_async_test_all
@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned to indicate that all asynchronous
operations have completed, while Fortran returns @code{.true.}.  If any
asynchronous operation has not completed, C/C++ returns zero and Fortran
returns @code{.false.}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.10.
@end table



@node acc_wait
@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function waits for completion of the asynchronous operation
specified in @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait(int arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.11.
@end table



@node acc_wait_all
@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function waits for the completion of all asynchronous operations.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.13.
@end table



@node acc_wait_all_async
@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on the queue @var{async} for any
and all asynchronous operations that have been previously enqueued on
any queue.
2212 2213@item @emph{C/C++}: 2214@multitable @columnfractions .20 .80 2215@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);} 2216@end multitable 2217 2218@item @emph{Fortran}: 2219@multitable @columnfractions .20 .80 2220@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)} 2221@item @tab @code{integer(acc_handle_kind) async} 2222@end multitable 2223 2224@item @emph{Reference}: 2225@uref{https://www.openacc.org, OpenACC specification v2.6}, section 22263.2.14. 2227@end table 2228 2229 2230 2231@node acc_wait_async 2232@section @code{acc_wait_async} -- Wait for completion of asynchronous operations. 2233@table @asis 2234@item @emph{Description} 2235This function enqueues a wait operation on queue @var{async} for any and all 2236asynchronous operations enqueued on queue @var{arg}. 2237 2238@item @emph{C/C++}: 2239@multitable @columnfractions .20 .80 2240@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);} 2241@end multitable 2242 2243@item @emph{Fortran}: 2244@multitable @columnfractions .20 .80 2245@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)} 2246@item @tab @code{integer(acc_handle_kind) arg, async} 2247@end multitable 2248 2249@item @emph{Reference}: 2250@uref{https://www.openacc.org, OpenACC specification v2.6}, section 22513.2.12. 2252@end table 2253 2254 2255 2256@node acc_init 2257@section @code{acc_init} -- Initialize runtime for a specific device type. 2258@table @asis 2259@item @emph{Description} 2260This function initializes the runtime for the device type specified in 2261@var{devicetype}. 

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.7.
@end table



@node acc_shutdown
@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
@table @asis
@item @emph{Description}
This function shuts down the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.8.
@end table



@node acc_on_device
@section @code{acc_on_device} -- Whether executing on a particular device
@table @asis
@item @emph{Description}:
This function returns whether the program is executing on a particular
device specified in @var{devicetype}.  In C/C++, a non-zero value is
returned to indicate that the program is executing on the specified
device type; in Fortran, @code{.true.} is returned.  If the program is
not executing on the specified device type, C/C++ returns zero, while
Fortran returns @code{.false.}.
2316 2317@item @emph{C/C++}: 2318@multitable @columnfractions .20 .80 2319@item @emph{Prototype}: @tab @code{acc_on_device(acc_device_t devicetype);} 2320@end multitable 2321 2322@item @emph{Fortran}: 2323@multitable @columnfractions .20 .80 2324@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)} 2325@item @tab @code{integer(acc_device_kind) devicetype} 2326@item @tab @code{logical acc_on_device} 2327@end multitable 2328 2329 2330@item @emph{Reference}: 2331@uref{https://www.openacc.org, OpenACC specification v2.6}, section 23323.2.17. 2333@end table 2334 2335 2336 2337@node acc_malloc 2338@section @code{acc_malloc} -- Allocate device memory. 2339@table @asis 2340@item @emph{Description} 2341This function allocates @var{len} bytes of device memory. It returns 2342the device address of the allocated memory. 2343 2344@item @emph{C/C++}: 2345@multitable @columnfractions .20 .80 2346@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);} 2347@end multitable 2348 2349@item @emph{Reference}: 2350@uref{https://www.openacc.org, OpenACC specification v2.6}, section 23513.2.18. 2352@end table 2353 2354 2355 2356@node acc_free 2357@section @code{acc_free} -- Free device memory. 2358@table @asis 2359@item @emph{Description} 2360Free previously allocated device memory at the device address @code{a}. 2361 2362@item @emph{C/C++}: 2363@multitable @columnfractions .20 .80 2364@item @emph{Prototype}: @tab @code{acc_free(d_void *a);} 2365@end multitable 2366 2367@item @emph{Reference}: 2368@uref{https://www.openacc.org, OpenACC specification v2.6}, section 23693.2.19. 2370@end table 2371 2372 2373 2374@node acc_copyin 2375@section @code{acc_copyin} -- Allocate device memory and copy host memory to it. 2376@table @asis 2377@item @emph{Description} 2378In C/C++, this function allocates @var{len} bytes of device memory 2379and maps it to the specified host address in @var{a}. The device 2380address of the newly allocated device memory is returned. 

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.20.
@end table



@node acc_present_or_copyin
@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
@table @asis
@item @emph{Description}
This function tests whether the host data specified by @var{a} and of
length @var{len} is present on the device.  If it is not present, device
memory is allocated and the host memory copied.  The device address of
the newly allocated device memory is returned.

In Fortran, two forms are supported.
In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.20.
@end table



@node acc_create
@section @code{acc_create} -- Allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function allocates device memory and maps it to host memory specified
by the host address @var{a} with a length of @var{len} bytes.  In C/C++,
the function returns the device address of the allocated device memory.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.
In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.21.
@end table



@node acc_present_or_create
@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function tests whether the host data specified by @var{a} and of
length @var{len} is present on the device.  If it is not present, device
memory is allocated and mapped to host memory.  In C/C++, the device
address of the newly allocated device memory is returned.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.
In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.21.
@end table



@node acc_copyout
@section @code{acc_copyout} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
In C/C++, this function copies mapped device memory to the host memory
specified by the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
2552 2553@item @emph{C/C++}: 2554@multitable @columnfractions .20 .80 2555@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);} 2556@item @emph{Prototype}: @tab @code{acc_copyout_async(h_void *a, size_t len, int async);} 2557@item @emph{Prototype}: @tab @code{acc_copyout_finalize(h_void *a, size_t len);} 2558@item @emph{Prototype}: @tab @code{acc_copyout_finalize_async(h_void *a, size_t len, int async);} 2559@end multitable 2560 2561@item @emph{Fortran}: 2562@multitable @columnfractions .20 .80 2563@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)} 2564@item @tab @code{type, dimension(:[,:]...) :: a} 2565@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)} 2566@item @tab @code{type, dimension(:[,:]...) :: a} 2567@item @tab @code{integer len} 2568@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)} 2569@item @tab @code{type, dimension(:[,:]...) :: a} 2570@item @tab @code{integer(acc_handle_kind) :: async} 2571@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)} 2572@item @tab @code{type, dimension(:[,:]...) :: a} 2573@item @tab @code{integer len} 2574@item @tab @code{integer(acc_handle_kind) :: async} 2575@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)} 2576@item @tab @code{type, dimension(:[,:]...) :: a} 2577@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)} 2578@item @tab @code{type, dimension(:[,:]...) :: a} 2579@item @tab @code{integer len} 2580@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)} 2581@item @tab @code{type, dimension(:[,:]...) :: a} 2582@item @tab @code{integer(acc_handle_kind) :: async} 2583@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)} 2584@item @tab @code{type, dimension(:[,:]...) 
:: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.22.
@end table



@node acc_delete
@section @code{acc_delete} -- Free device memory.
@table @asis
@item @emph{Description}
This function frees previously allocated device memory, specified by
the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...)
:: a} 2627@item @tab @code{integer len} 2628@item @tab @code{integer(acc_handle_kind) :: async} 2629@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)} 2630@item @tab @code{type, dimension(:[,:]...) :: a} 2631@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)} 2632@item @tab @code{type, dimension(:[,:]...) :: a} 2633@item @tab @code{integer len} 2634@item @emph{Interface}: @tab @code{subroutine acc_delete_async_finalize(a, async)} 2635@item @tab @code{type, dimension(:[,:]...) :: a} 2636@item @tab @code{integer(acc_handle_kind) :: async} 2637@item @emph{Interface}: @tab @code{subroutine acc_delete_async_finalize(a, len, async)} 2638@item @tab @code{type, dimension(:[,:]...) :: a} 2639@item @tab @code{integer len} 2640@item @tab @code{integer(acc_handle_kind) :: async} 2641@end multitable 2642 2643@item @emph{Reference}: 2644@uref{https://www.openacc.org, OpenACC specification v2.6}, section 26453.2.23. 2646@end table 2647 2648 2649 2650@node acc_update_device 2651@section @code{acc_update_device} -- Update device memory from mapped host memory. 2652@table @asis 2653@item @emph{Description} 2654This function updates the device copy from the previously mapped host memory. 2655The host memory is specified with the host address @var{a} and a length of 2656@var{len} bytes. 2657 2658In Fortran, two (2) forms are supported. In the first form, @var{a} specifies 2659a contiguous array section. The second form @var{a} specifies a variable or 2660array element and @var{len} specifies the length in bytes. 
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_device_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.24.
@end table



@node acc_update_self
@section @code{acc_update_self} -- Update host memory from mapped device memory.
@table @asis
@item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.25.
@end table



@node acc_map_data
@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
@table @asis
@item @emph{Description}
This function maps previously allocated device and host memory. The device
memory is specified with the device address @var{d}. The host memory is
specified with the host address @var{h} and a length of @var{len} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.26.
@end table



@node acc_unmap_data
@section @code{acc_unmap_data} -- Unmap device memory from host memory.
@table @asis
@item @emph{Description}
This function unmaps previously mapped device and host memory. The host
memory is specified by the host address @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.27.
@end table



@node acc_deviceptr
@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
@table @asis
@item @emph{Description}
This function returns the device address that has been mapped to the
host address specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.28.
@end table



@node acc_hostptr
@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
@table @asis
@item @emph{Description}
This function returns the host address that has been mapped to the
device address specified by @var{d}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.29.
@end table



@node acc_is_present
@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
@table @asis
@item @emph{Description}
This function indicates whether the host memory specified by the host
address @var{a} and a length of @var{len} bytes is present on the device.
In C/C++, a non-zero value is returned if the memory is mapped on the
device; zero is returned if it is not.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
If the host memory is mapped to device memory, @code{true} is returned;
otherwise, @code{false} is returned.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_is_present(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{logical acc_is_present}
@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{logical acc_is_present}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.30.
@end table



@node acc_memcpy_to_device
@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
@table @asis
@item @emph{Description}
This function copies host memory specified by the host address @var{src} to
device memory specified by the device address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.31.
@end table



@node acc_memcpy_from_device
@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
This function copies device memory specified by the device address @var{src}
to host memory specified by the host address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.32.
@end table



@node acc_attach
@section @code{acc_attach} -- Let device pointer point to device-pointer target.
@table @asis
@item @emph{Description}
This function updates a pointer on the device from pointing to a host-pointer
address to pointing to the corresponding device data.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.34.
@end table



@node acc_detach
@section @code{acc_detach} -- Let device pointer point to host-pointer target.
@table @asis
@item @emph{Description}
This function updates a pointer on the device from pointing to a device-pointer
address to pointing to the corresponding host data.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.35.
@end table



@node acc_get_current_cuda_device
@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
@table @asis
@item @emph{Description}
This function returns the CUDA device handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.1.
@end table



@node acc_get_current_cuda_context
@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
@table @asis
@item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.2.
@end table



@node acc_get_cuda_stream
@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
@table @asis
@item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.3.
@end table



@node acc_set_cuda_stream
@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
@table @asis
@item @emph{Description}
This function associates the stream handle specified by @var{stream} with
the queue @var{async}.

This cannot be used to change the stream handle associated with
@code{acc_async_sync}.

The return value is not specified.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.4.
@end table



@node acc_prof_register
@section @code{acc_prof_register} -- Register callbacks.
@table @asis
@item @emph{Description}:
This function registers callbacks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_prof_unregister
@section @code{acc_prof_unregister} -- Unregister callbacks.
@table @asis
@item @emph{Description}:
This function unregisters callbacks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_prof_lookup
@section @code{acc_prof_lookup} -- Obtain inquiry functions.
@table @asis
@item @emph{Description}:
Function to obtain inquiry functions.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_register_library
@section @code{acc_register_library} -- Library registration.
@table @asis
@item @emph{Description}:
Function for library registration.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@c ---------------------------------------------------------------------
@c OpenACC Environment Variables
@c ---------------------------------------------------------------------

@node OpenACC Environment Variables
@chapter OpenACC Environment Variables

The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
are defined by section 4 of the OpenACC specification in version 2.0.
The variable @env{ACC_PROFLIB}
is defined by section 4 of the OpenACC specification in version 2.6.
The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.

@menu
* ACC_DEVICE_TYPE::
* ACC_DEVICE_NUM::
* ACC_PROFLIB::
* GCC_ACC_NOTIFY::
@end menu



@node ACC_DEVICE_TYPE
@section @code{ACC_DEVICE_TYPE}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.1.
@end table



@node ACC_DEVICE_NUM
@section @code{ACC_DEVICE_NUM}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.2.
@end table



@node ACC_PROFLIB
@section @code{ACC_PROFLIB}
@table @asis
@item @emph{See also}:
@ref{acc_register_library}, @ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.3.
@end table



@node GCC_ACC_NOTIFY
@section @code{GCC_ACC_NOTIFY}
@table @asis
@item @emph{Description}:
Print debug information pertaining to the accelerator.
@end table



@c ---------------------------------------------------------------------
@c CUDA Streams Usage
@c ---------------------------------------------------------------------

@node CUDA Streams Usage
@chapter CUDA Streams Usage

This applies to the @code{nvptx} plugin only.

The library provides elements that perform asynchronous movement of
data and asynchronous operation of computing constructs. This
asynchronous functionality is implemented by making use of CUDA
streams@footnote{See "Stream Management" in "CUDA Driver API",
TRM-06703-001, Version 5.5, for additional information}.

The primary means by which the asynchronous functionality is accessed
is through the use of those OpenACC directives which make use of the
@code{async} and @code{wait} clauses. When the @code{async} clause is
first used with a directive, it creates a CUDA stream. If an
@code{async-argument} is used with the @code{async} clause, then the
stream is associated with the specified @code{async-argument}.

Following the creation of an association between a CUDA stream and the
@code{async-argument} of an @code{async} clause, both the @code{wait}
clause and the @code{wait} directive can be used. When either the
clause or directive is used after stream creation, it creates a
rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, stream, have
completed.

Normally, the management of the streams that are created as a result of
using the @code{async} clause is done without any intervention by the
caller. This implies that the association between the @code{async-argument}
and the CUDA stream will be maintained for the lifetime of the program.
However, this association can be changed through the use of the library
function @code{acc_set_cuda_stream}. When the function
@code{acc_set_cuda_stream} is called, the CUDA stream that was
originally associated with the @code{async} clause will be destroyed.
Caution should be taken when changing the association, as subsequent
references to the @code{async-argument} will refer to a different
CUDA stream.



@c ---------------------------------------------------------------------
@c OpenACC Library Interoperability
@c ---------------------------------------------------------------------

@node OpenACC Library Interoperability
@chapter OpenACC Library Interoperability

@section Introduction

The OpenACC library uses the CUDA Driver API, and may interact with
programs that use the Runtime library directly, or another library
based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
"Interactions with the CUDA Driver API" in
"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
for additional information on library interoperability.}.
This chapter describes the use cases and what changes are
required in order to use both the OpenACC library and the CUBLAS and Runtime
libraries within a program.

@section First invocation: NVIDIA CUBLAS library API

In this first use case (see below), a function in the CUBLAS library is called
prior to any of the functions in the OpenACC library. More specifically, the
function @code{cublasCreate()}.

When invoked, the function initializes the library and allocates the
hardware resources on the host and the device on behalf of the caller. Once
the initialization and allocation have completed, a handle is returned to the
caller. The OpenACC library also requires initialization and allocation of
hardware resources.
Since the CUBLAS library has already allocated the
hardware resources for the device, all that is left to do is to initialize
the OpenACC library and acquire the hardware resources on the host.

Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}. Invoking the
runtime library function @code{cudaGetDevice()} accomplishes this. Once
acquired, the device number is passed along with the device type as
parameters to the OpenACC library function @code{acc_set_device_num()}.

Once the call to @code{acc_set_device_num()} has completed, the OpenACC
library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries will be sharing the
same context.

@smallexample
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
      @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
      @}

    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess)
      @{
        fprintf(stderr, "cudaGetDevice failed %d\n", e);
        exit(EXIT_FAILURE);
      @}

    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);

@end smallexample
@center Use Case 1

@section First invocation: OpenACC library API

In this second use case (see below), a function in the OpenACC library is
called prior to any of the functions in the CUBLAS library. More specifically,
the function @code{acc_set_device_num()}.

In the use case presented here, the function @code{acc_set_device_num()}
is used to both initialize the OpenACC library and allocate the hardware
resources on the host and the device. In the call to the function, the
call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
is but one method to initialize the OpenACC library and allocate the
appropriate hardware resources. Other methods are available through the
use of environment variables; these will be discussed in the next section.

Once the call to @code{acc_set_device_num()} has completed, other OpenACC
functions can be called, as seen with the multiple calls made to
@code{acc_copyin()}. In addition, calls can be made to functions in the
CUBLAS library. In the use case, a call to @code{cublasCreate()} is made
subsequent to the calls to @code{acc_copyin()}.
As seen in the previous use case, a call to @code{cublasCreate()}
initializes the CUBLAS library and allocates the hardware resources on the
host and the device. However, since the device has already been allocated,
@code{cublasCreate()} will only initialize the CUBLAS library and allocate
the appropriate hardware resources on the host. The context that was created
as part of the OpenACC initialization is shared with the CUBLAS library,
similarly to the first use case.
@smallexample
    dev = 0;

    acc_set_device_num(dev, acc_device_nvidia);

    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL)
      @{
        fprintf(stderr, "copyin error h_X\n");
        exit(EXIT_FAILURE);
      @}

    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL)
      @{
        fprintf(stderr, "copyin error h_Y1\n");
        exit(EXIT_FAILURE);
      @}

    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
      @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
      @}

    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS)
      @{
        fprintf(stderr, "cublasSaxpy failed %d\n", s);
        exit(EXIT_FAILURE);
      @}

    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));

@end smallexample
@center Use Case 2

@section OpenACC library and environment variables

There are two environment variables associated with the OpenACC library
that may be used to control the device type and device number:
@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
environment variables can be used as an alternative to calling
@code{acc_set_device_num()}. As seen in the second use case, the device
type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.


The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}.@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the ``@uref{https://www.openacc.org, OpenACC}
Application Programming Interface'', Version 2.6.}



@c ---------------------------------------------------------------------
@c OpenACC Profiling Interface
@c ---------------------------------------------------------------------

@node OpenACC Profiling Interface
@chapter OpenACC Profiling Interface

@section Implementation Status and Implementation-Defined Behavior

We're implementing the OpenACC Profiling Interface as defined by the
OpenACC 2.6 specification. We're clarifying some aspects here as
@emph{implementation-defined behavior}, while they're still under
discussion within the OpenACC Technical Committee.

This implementation is tuned to keep the performance impact as low as
possible for the (very common) case that the Profiling Interface is
not enabled. This is relevant, as the Profiling Interface affects all
the @emph{hot} code paths (in the target code, not in the offloaded
code). Users of the OpenACC Profiling Interface can be expected to
understand that performance will be impacted to some degree once the
Profiling Interface has been enabled: for example, because of the
@emph{runtime} (libgomp) calling into a third-party @emph{library} for
every event that has been registered.

We're not yet accounting for the fact that @cite{OpenACC events may
occur during event processing}.
We just handle one case specially, as required by CUDA 9.0
@command{nvprof}: @code{acc_get_device_type}
(@ref{acc_get_device_type}) may be called from
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
callbacks.
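As a concrete illustration of the registration mechanism, a profiling
library might register callbacks for these device initialization events
roughly as follows. This is a minimal sketch only: the callback name and
body are made up for this example, and it assumes the type and enumerator
declarations from the @code{acc_prof.h} header as specified by OpenACC 2.6,
section 5.3.

@smallexample
#include <stdio.h>
#include <acc_prof.h>

/* Example callback; the body is purely illustrative.  */
static void
device_init_cb (acc_prof_info *prof_info, acc_event_info *event_info,
                acc_api_info *api_info)
@{
  fprintf (stderr, "device init event %d\n", (int) prof_info->event_type);
@}

/* Entry point called by libgomp when this library is loaded via the
   ACC_PROFLIB environment variable.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
@{
  reg (acc_ev_device_init_start, device_init_cb, acc_reg);
  reg (acc_ev_device_init_end, device_init_cb, acc_reg);
@}
@end smallexample

Alternatively, the application itself could register the same callback
directly by calling @code{acc_prof_register (acc_ev_device_init_start,
device_init_cb, acc_reg)}.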
3396 3397We're not yet implementing initialization via a 3398@code{acc_register_library} function that is either statically linked 3399in, or dynamically via @env{LD_PRELOAD}. 3400Initialization via @code{acc_register_library} functions dynamically 3401loaded via the @env{ACC_PROFLIB} environment variable does work, as 3402does directly calling @code{acc_prof_register}, 3403@code{acc_prof_unregister}, @code{acc_prof_lookup}. 3404 3405As currently there are no inquiry functions defined, calls to 3406@code{acc_prof_lookup} will always return @code{NULL}. 3407 3408There aren't separate @emph{start}, @emph{stop} events defined for the 3409event types @code{acc_ev_create}, @code{acc_ev_delete}, 3410@code{acc_ev_alloc}, @code{acc_ev_free}. It's not clear if these 3411should be triggered before or after the actual device-specific call is 3412made. We trigger them after. 3413 3414Remarks about data provided to callbacks: 3415 3416@table @asis 3417 3418@item @code{acc_prof_info.event_type} 3419It's not clear if for @emph{nested} event callbacks (for example, 3420@code{acc_ev_enqueue_launch_start} as part of a parent compute 3421construct), this should be set for the nested event 3422(@code{acc_ev_enqueue_launch_start}), or if the value of the parent 3423construct should remain (@code{acc_ev_compute_construct_start}). In 3424this implementation, the value will generally correspond to the 3425innermost nested event type. 3426 3427@item @code{acc_prof_info.device_type} 3428@itemize 3429 3430@item 3431For @code{acc_ev_compute_construct_start}, and in presence of an 3432@code{if} clause with @emph{false} argument, this will still refer to 3433the offloading device type. 3434It's not clear if that's the expected behavior. 3435 3436@item 3437Complementary to the item before, for 3438@code{acc_ev_compute_construct_end}, this is set to 3439@code{acc_device_host} in presence of an @code{if} clause with 3440@emph{false} argument. 3441It's not clear if that's the expected behavior. 
3442 3443@end itemize 3444 3445@item @code{acc_prof_info.thread_id} 3446Always @code{-1}; not yet implemented. 3447 3448@item @code{acc_prof_info.async} 3449@itemize 3450 3451@item 3452Not yet implemented correctly for 3453@code{acc_ev_compute_construct_start}. 3454 3455@item 3456In a compute construct, for host-fallback 3457execution/@code{acc_device_host} it will always be 3458@code{acc_async_sync}. 3459It's not clear if that's the expected behavior. 3460 3461@item 3462For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}, 3463it will always be @code{acc_async_sync}. 3464It's not clear if that's the expected behavior. 3465 3466@end itemize 3467 3468@item @code{acc_prof_info.async_queue} 3469There is no @cite{limited number of asynchronous queues} in libgomp. 3470This will always have the same value as @code{acc_prof_info.async}. 3471 3472@item @code{acc_prof_info.src_file} 3473Always @code{NULL}; not yet implemented. 3474 3475@item @code{acc_prof_info.func_name} 3476Always @code{NULL}; not yet implemented. 3477 3478@item @code{acc_prof_info.line_no} 3479Always @code{-1}; not yet implemented. 3480 3481@item @code{acc_prof_info.end_line_no} 3482Always @code{-1}; not yet implemented. 3483 3484@item @code{acc_prof_info.func_line_no} 3485Always @code{-1}; not yet implemented. 3486 3487@item @code{acc_prof_info.func_end_line_no} 3488Always @code{-1}; not yet implemented. 3489 3490@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type} 3491Relating to @code{acc_prof_info.event_type} discussed above, in this 3492implementation, this will always be the same value as 3493@code{acc_prof_info.event_type}. 3494 3495@item @code{acc_event_info.*.parent_construct} 3496@itemize 3497 3498@item 3499Will be @code{acc_construct_parallel} for all OpenACC compute 3500constructs as well as many OpenACC Runtime API calls; should be the 3501one matching the actual construct, or 3502@code{acc_construct_runtime_api}, respectively. 
@item
Will be @code{acc_construct_enter_data} or
@code{acc_construct_exit_data} when processing variable mappings
specified in OpenACC @emph{declare} directives; should be
@code{acc_construct_declare}.

@item
For implicit @code{acc_ev_device_init_start},
@code{acc_ev_device_init_end}, and explicit as well as implicit
@code{acc_ev_alloc}, @code{acc_ev_free},
@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
@code{acc_ev_enqueue_download_start}, and
@code{acc_ev_enqueue_download_end}, will be
@code{acc_construct_parallel}; should reflect the real parent
construct.

@end itemize

@item @code{acc_event_info.*.implicit}
For @code{acc_ev_alloc}, @code{acc_ev_free},
@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
@code{acc_ev_enqueue_download_start}, and
@code{acc_ev_enqueue_download_end}, this currently will be @code{1}
even for explicit usage.

@item @code{acc_event_info.data_event.var_name}
Always @code{NULL}; not yet implemented.

@item @code{acc_event_info.data_event.host_ptr}
For @code{acc_ev_alloc} and @code{acc_ev_free}, this is always
@code{NULL}.

@item @code{typedef union acc_api_info}
@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
Information}.  This should obviously be @code{typedef @emph{struct}
acc_api_info}.

@item @code{acc_api_info.device_api}
Possibly not yet implemented correctly for
@code{acc_ev_compute_construct_start},
@code{acc_ev_device_init_start}, and @code{acc_ev_device_init_end}:
will always be @code{acc_device_api_none} for these event types.
For @code{acc_ev_enter_data_start}, it will be
@code{acc_device_api_none} in some cases.

@item @code{acc_api_info.device_type}
Always the same as @code{acc_prof_info.device_type}.
@item @code{acc_api_info.vendor}
Always @code{-1}; not yet implemented.

@item @code{acc_api_info.device_handle}
Always @code{NULL}; not yet implemented.

@item @code{acc_api_info.context_handle}
Always @code{NULL}; not yet implemented.

@item @code{acc_api_info.async_handle}
Always @code{NULL}; not yet implemented.

@end table

Remarks about certain event types:

@table @asis

@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
@itemize

@item
@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
When a compute construct triggers implicit
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
events, they currently aren't @emph{nested within} the corresponding
@code{acc_ev_compute_construct_start} and
@code{acc_ev_compute_construct_end}, but are instead observed
@emph{before} @code{acc_ev_compute_construct_start}.
It's not clear what to do here: the standard asks us to provide a lot
of details to the @code{acc_ev_compute_construct_start} callback, but
how can we do that without (implicitly) initializing a device first?

@item
Callbacks for these event types will not be invoked for calls to the
@code{acc_set_device_type} and @code{acc_set_device_num} functions.
It's not clear if they should be.

@end itemize

@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
@itemize

@item
Callbacks for these event types will also be invoked for OpenACC
@emph{host_data} constructs.
It's not clear if they should be.

@item
Callbacks for these event types will also be invoked when processing
variable mappings specified in OpenACC @emph{declare} directives.
It's not clear if they should be.

@end itemize

@end table

Callbacks for the following event types will be invoked, but the
dispatch and the information provided therein have not yet been
thoroughly reviewed:

@itemize
@item @code{acc_ev_alloc}
@item @code{acc_ev_free}
@item @code{acc_ev_update_start}, @code{acc_ev_update_end}
@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
@end itemize

During device initialization and finalization, callbacks for the
following event types will not yet be invoked:

@itemize
@item @code{acc_ev_alloc}
@item @code{acc_ev_free}
@end itemize

Callbacks for the following event types have not yet been implemented,
so currently won't be invoked:

@itemize
@item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
@item @code{acc_ev_runtime_shutdown}
@item @code{acc_ev_create}, @code{acc_ev_delete}
@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
@end itemize

For the following runtime library functions, not all expected
callbacks will be invoked (mostly concerning implicit device
initialization):

@itemize
@item @code{acc_get_num_devices}
@item @code{acc_set_device_type}
@item @code{acc_get_device_type}
@item @code{acc_set_device_num}
@item @code{acc_get_device_num}
@item @code{acc_init}
@item @code{acc_shutdown}
@end itemize

Aside from implicit device initialization, for the following runtime
library functions, no callbacks will be invoked for shared-memory
offloading devices (it's not clear if they should be):

@itemize
@item @code{acc_malloc}
@item @code{acc_free}
@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
@item @code{acc_create},
@code{acc_present_or_create}, @code{acc_create_async}
@item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
@item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
@item @code{acc_update_device}, @code{acc_update_device_async}
@item @code{acc_update_self}, @code{acc_update_self_async}
@item @code{acc_map_data}, @code{acc_unmap_data}
@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
@end itemize



@c ---------------------------------------------------------------------
@c The libgomp ABI
@c ---------------------------------------------------------------------

@node The libgomp ABI
@chapter The libgomp ABI

The following sections present notes on the external ABI as
presented by libgomp.  Only maintainers should need them.
@menu
* Implementing MASTER construct::
* Implementing CRITICAL construct::
* Implementing ATOMIC construct::
* Implementing FLUSH construct::
* Implementing BARRIER construct::
* Implementing THREADPRIVATE construct::
* Implementing PRIVATE clause::
* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
* Implementing REDUCTION clause::
* Implementing PARALLEL construct::
* Implementing FOR construct::
* Implementing ORDERED construct::
* Implementing SECTIONS construct::
* Implementing SINGLE construct::
* Implementing OpenACC's PARALLEL construct::
@end menu


@node Implementing MASTER construct
@section Implementing MASTER construct

@smallexample
if (omp_get_thread_num () == 0)
  block
@end smallexample

Alternatively, we generate two copies of the parallel subfunction
and only include this check in the version run by the master thread.
Surely this is not worthwhile though...



@node Implementing CRITICAL construct
@section Implementing CRITICAL construct

Without a specified name, use

@smallexample
  void GOMP_critical_start (void);
  void GOMP_critical_end (void);
@end smallexample

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use @code{omp_set_lock} and
@code{omp_unset_lock}, with the name being transformed into a
variable declared like

@smallexample
  omp_lock_t gomp_critical_user_<name> __attribute__((common))
@end smallexample

Ideally the ABI would specify that all-zero is a valid unlocked
state, and so we wouldn't need to initialize this at
startup.



@node Implementing ATOMIC construct
@section Implementing ATOMIC construct

The target should implement the @code{__sync} builtins.
Failing that, we could add

@smallexample
  void GOMP_atomic_enter (void)
  void GOMP_atomic_exit (void)
@end smallexample

which reuses the regular lock code, but with yet another lock
object private to the library.



@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.



@node Implementing BARRIER construct
@section Implementing BARRIER construct

@smallexample
  void GOMP_barrier (void)
@end smallexample


@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In @emph{most} cases we can map this directly to @code{__thread},
except that OMP allows constructors for C++ objects.  We can either
refuse to support this (how often is it used?) or we can
implement something akin to .ctors.

Even more ideally, this ctor feature would be handled by extensions
to the main pthreads library.  Failing that, we can have a set
of entry points to register ctor functions to be called.



@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function.  This preserves
the semantic of new variable creation.



@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks.  Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalars and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types.  In the
subfunction, copy the value into the local variable.

It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

@smallexample
#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)
  body;
@end smallexample

which becomes

@smallexample
@{
  int x = x, y;

  // for stuff

  if (i == n)
    y = y;
@}
@end smallexample

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C.  Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.



@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}.  The thread stores its final value into the
array, and after the barrier, the master thread iterates over the
array to collect the values.

@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
  #pragma omp parallel
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    use data;
    body;
  @}

  setup data;
  GOMP_parallel_start (subfunction, &data, num_threads);
  subfunction (&data);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock.  It needs to
create the team structure and assign team ids.

@smallexample
  void GOMP_parallel_end (void)
@end smallexample

Tears down the team and returns us to the previous
@code{omp_in_parallel()} state.

@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
  #pragma omp parallel for
  for (i = lb; i <= ub; i++)
    body;
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    long _s0, _e0;
    while (GOMP_loop_static_next (&_s0, &_e0))
      @{
        long _e1 = _e0, i;
        for (i = _s0; i < _e1; i++)
          body;
      @}
    GOMP_loop_end_nowait ();
  @}

  GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
  subfunction (NULL);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  #pragma omp for schedule(runtime)
  for (i = 0; i < n; i++)
    body;
@end smallexample

becomes

@smallexample
  @{
    long i, _s0, _e0;
    if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
      do @{
        long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
    GOMP_loop_end ();
  @}
@end smallexample

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really.  We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables.  So the expression should remain evaluable in the
subfunction.  We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we need not do so.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations.  Which would mean that we wouldn't need to call any
of these routines.

There are separate routines for handling loops with an ORDERED
clause.  Bookkeeping for that is non-trivial...

@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)
@end smallexample



@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block such as

@smallexample
  #pragma omp sections
  @{
    #pragma omp section
    stmt1;
    #pragma omp section
    stmt2;
    #pragma omp section
    stmt3;
  @}
@end smallexample

becomes

@smallexample
  for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
    switch (i)
      @{
      case 1:
        stmt1;
        break;
      case 2:
        stmt2;
        break;
      case 3:
        stmt3;
        break;
      @}
  GOMP_barrier ();
@end smallexample


@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
  #pragma omp single
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  if (GOMP_single_start ())
    body;
  GOMP_barrier ();
@end smallexample

while

@smallexample
  #pragma omp single copyprivate(x)
    body;
@end smallexample

becomes

@smallexample
  datap = GOMP_single_copy_start ();
  if (datap == NULL)
    @{
      body;
      data.x = x;
      GOMP_single_copy_end (&data);
    @}
  else
    x = datap->x;
  GOMP_barrier ();
@end smallexample



@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
  void GOACC_parallel ()
@end smallexample



@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs

Bugs in
the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}.  Please add
"openacc", or "openmp", or both to the keywords field in the bug
report, as appropriate.



@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi



@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi



@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index

@printindex cp

@bye