====================
``<atomic>`` Design
====================

There were originally three designs under consideration. They differ in where
most of the implementation work is done. The functionality exposed to the
customer should be identical (and conforming) for all three designs.


Design A: Minimal work for the library
======================================
The compiler supplies all of the intrinsics as described below. This list of
intrinsics roughly parallels the requirements of the C and C++ atomics proposals.
The C and C++ library implementations simply drop through to these intrinsics.
For anything the platform does not support in hardware, the compiler arranges
for a (compiler-rt) library call to be made which does the job with a mutex,
ignoring the memory ordering parameter (effectively implementing
``memory_order_seq_cst``).

Ultimate efficiency is preferred over run-time error checking. Undefined
behavior is acceptable when the inputs do not conform as defined below.

.. code-block:: cpp

  // In every intrinsic signature below, type* atomic_obj may be a pointer to a
  // volatile-qualified type. Memory ordering values map to the following meanings:
  //   memory_order_relaxed == 0
  //   memory_order_consume == 1
  //   memory_order_acquire == 2
  //   memory_order_release == 3
  //   memory_order_acq_rel == 4
  //   memory_order_seq_cst == 5

  // type must be trivially copyable
  // type represents a "type argument"
  bool __atomic_is_lock_free(type);

  // type must be trivially copyable
  // Behavior is defined for mem_ord = 0, 1, 2, 5
  type __atomic_load(const type* atomic_obj, int mem_ord);

  // type must be trivially copyable
  // Behavior is defined for mem_ord = 0, 3, 5
  void __atomic_store(type* atomic_obj, type desired, int mem_ord);

  // type must be trivially copyable
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_exchange(type* atomic_obj, type desired, int mem_ord);

  // type must be trivially copyable
  // Behavior is defined for mem_success = [0 ... 5],
  //   mem_failure <= mem_success
  //   mem_failure != 3
  //   mem_failure != 4
  bool __atomic_compare_exchange_strong(
      type* atomic_obj, type* expected, type desired,
      int mem_success, int mem_failure);

  // type must be trivially copyable
  // Behavior is defined for mem_success = [0 ... 5],
  //   mem_failure <= mem_success
  //   mem_failure != 3
  //   mem_failure != 4
  bool __atomic_compare_exchange_weak(
      type* atomic_obj, type* expected, type desired,
      int mem_success, int mem_failure);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_add(type* atomic_obj, type operand, int mem_ord);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_sub(type* atomic_obj, type operand, int mem_ord);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_and(type* atomic_obj, type operand, int mem_ord);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_or(type* atomic_obj, type operand, int mem_ord);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_xor(type* atomic_obj, type operand, int mem_ord);

  // Behavior is defined for mem_ord = [0 ... 5]
  void* __atomic_fetch_add(void** atomic_obj, ptrdiff_t operand, int mem_ord);
  void* __atomic_fetch_sub(void** atomic_obj, ptrdiff_t operand, int mem_ord);

  // Behavior is defined for mem_ord = [0 ... 5]
  void __atomic_thread_fence(int mem_ord);
  void __atomic_signal_fence(int mem_ord);
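
For illustration only, here is a sketch of how a library member function might
drop through to one of these intrinsics under this design. The
``__atomic_int_base`` helper and the spelled-out ``memory_order`` enum are
hypothetical names used only for this sketch; they are not part of the proposal.

.. code-block:: cpp

  // Hypothetical library-side code, for illustration only.  It assumes the
  // compiler provides __atomic_fetch_add exactly as declared above and that
  // the memory_order enumerators carry the 0-5 values from the table above.
  enum memory_order {
      memory_order_relaxed, memory_order_consume, memory_order_acquire,
      memory_order_release, memory_order_acq_rel, memory_order_seq_cst
  };

  template <class T>
  struct __atomic_int_base {
      T __value_;

      // The run-time ordering value is passed straight through; the compiler
      // (or the compiler-rt fallback) decides how, or whether, to honor it.
      T fetch_add(T operand, memory_order order = memory_order_seq_cst) volatile {
          return __atomic_fetch_add(&__value_, operand, static_cast<int>(order));
      }
  };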

If desired, the intrinsics taking a single ``mem_ord`` parameter can default
this argument to 5.

If desired, the intrinsics taking two ordering parameters can default
``mem_success`` to 5, and ``mem_failure`` to
``translate_memory_order(mem_success)``, where ``translate_memory_order`` is
defined as:

.. code-block:: cpp

  int translate_memory_order(int o) {
      switch (o) {
      case 4:
          return 2;
      case 3:
          return 0;
      }
      return o;
  }
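
This is the same translation the library's single-ordering ``compare_exchange``
overloads would apply when forwarding to the two-ordering intrinsic. A sketch,
assuming the intrinsic declared above and the 0-5 encoding; the
``__cxx_compare_exchange_strong`` helper name is hypothetical:

.. code-block:: cpp

  // Hypothetical library-side forwarding, for illustration only.
  template <class T>
  bool __cxx_compare_exchange_strong(
      T volatile* obj, T* expected, T desired, int order) {
      // The failure ordering may not be release (3) or acq_rel (4), nor be
      // stronger than the success ordering, so it is derived from 'order'
      // with translate_memory_order.
      return __atomic_compare_exchange_strong(
          obj, expected, desired, order, translate_memory_order(order));
  }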

Below are representative C++ implementations of all of the operations. Their
purpose is to document the desired semantics of each operation, assuming
``memory_order_seq_cst``. This is essentially the code that will be called
if the front end calls out to compiler-rt.

.. code-block:: cpp

  template <class T>
  T __atomic_load(T const volatile* obj) {
      unique_lock<mutex> _(some_mutex);
      return *obj;
  }

  template <class T>
  void __atomic_store(T volatile* obj, T desr) {
      unique_lock<mutex> _(some_mutex);
      *obj = desr;
  }

  template <class T>
  T __atomic_exchange(T volatile* obj, T desr) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj = desr;
      return r;
  }

  template <class T>
  bool __atomic_compare_exchange_strong(T volatile* obj, T* exp, T desr) {
      unique_lock<mutex> _(some_mutex);
      if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) // if (*obj == *exp)
      {
          std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));    // *obj = desr;
          return true;
      }
      std::memcpy(exp, const_cast<T*>(obj), sizeof(T));          // *exp = *obj;
      return false;
  }

  // May spuriously return false (even if *obj == *exp)
  template <class T>
  bool __atomic_compare_exchange_weak(T volatile* obj, T* exp, T desr) {
      unique_lock<mutex> _(some_mutex);
      if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) // if (*obj == *exp)
      {
          std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));    // *obj = desr;
          return true;
      }
      std::memcpy(exp, const_cast<T*>(obj), sizeof(T));          // *exp = *obj;
      return false;
  }

  template <class T>
  T __atomic_fetch_add(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj += operand;
      return r;
  }

  template <class T>
  T __atomic_fetch_sub(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj -= operand;
      return r;
  }

  template <class T>
  T __atomic_fetch_and(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj &= operand;
      return r;
  }

  template <class T>
  T __atomic_fetch_or(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj |= operand;
      return r;
  }

  template <class T>
  T __atomic_fetch_xor(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj ^= operand;
      return r;
  }

  void* __atomic_fetch_add(void* volatile* obj, ptrdiff_t operand) {
      unique_lock<mutex> _(some_mutex);
      void* r = *obj;
      (char*&)(*obj) += operand;
      return r;
  }

  void* __atomic_fetch_sub(void* volatile* obj, ptrdiff_t operand) {
      unique_lock<mutex> _(some_mutex);
      void* r = *obj;
      (char*&)(*obj) -= operand;
      return r;
  }

  void __atomic_thread_fence() {
      unique_lock<mutex> _(some_mutex);
  }

  void __atomic_signal_fence() {
      unique_lock<mutex> _(some_mutex);
  }


Design B: Something in between
==============================
This is a variation of design A which puts the burden on the library to arrange
for the correct manipulation of the run-time memory ordering arguments, and only
calls the compiler for well-defined memory orderings. I think of this design as
the worst of A and C, instead of the best of A and C, but I offer it as an
option in the spirit of completeness.

.. code-block:: cpp

  // type must be trivially copyable
  bool __atomic_is_lock_free(const type* atomic_obj);

  // type must be trivially copyable
  type __atomic_load_relaxed(const volatile type* atomic_obj);
  type __atomic_load_consume(const volatile type* atomic_obj);
  type __atomic_load_acquire(const volatile type* atomic_obj);
  type __atomic_load_seq_cst(const volatile type* atomic_obj);

  // type must be trivially copyable
  void __atomic_store_relaxed(volatile type* atomic_obj, type desired);
  void __atomic_store_release(volatile type* atomic_obj, type desired);
  void __atomic_store_seq_cst(volatile type* atomic_obj, type desired);

  // type must be trivially copyable
  type __atomic_exchange_relaxed(volatile type* atomic_obj, type desired);
  type __atomic_exchange_consume(volatile type* atomic_obj, type desired);
  type __atomic_exchange_acquire(volatile type* atomic_obj, type desired);
  type __atomic_exchange_release(volatile type* atomic_obj, type desired);
  type __atomic_exchange_acq_rel(volatile type* atomic_obj, type desired);
  type __atomic_exchange_seq_cst(volatile type* atomic_obj, type desired);

  // type must be trivially copyable
  bool __atomic_compare_exchange_strong_relaxed_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_consume_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_consume_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acquire_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acquire_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acquire_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_release_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_release_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_release_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acq_rel_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acq_rel_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acq_rel_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_seq_cst_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_seq_cst_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_seq_cst_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_seq_cst_seq_cst(
      volatile type* atomic_obj, type* expected, type desired);

  // type must be trivially copyable
  bool __atomic_compare_exchange_weak_relaxed_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_consume_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_consume_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acquire_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acquire_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acquire_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_release_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_release_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_release_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acq_rel_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acq_rel_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acq_rel_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_seq_cst_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_seq_cst_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_seq_cst_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_seq_cst_seq_cst(
      volatile type* atomic_obj, type* expected, type desired);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  type __atomic_fetch_add_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_seq_cst(volatile type* atomic_obj, type operand);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  type __atomic_fetch_sub_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_seq_cst(volatile type* atomic_obj, type operand);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  type __atomic_fetch_and_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_seq_cst(volatile type* atomic_obj, type operand);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  type __atomic_fetch_or_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_seq_cst(volatile type* atomic_obj, type operand);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //                 unsigned int, long, unsigned long, long long, unsigned long long,
  //                 char16_t, char32_t, wchar_t
  type __atomic_fetch_xor_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_seq_cst(volatile type* atomic_obj, type operand);

  void* __atomic_fetch_add_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_consume(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_release(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);

  void* __atomic_fetch_sub_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_consume(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_release(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);

  void __atomic_thread_fence_relaxed();
  void __atomic_thread_fence_consume();
  void __atomic_thread_fence_acquire();
  void __atomic_thread_fence_release();
  void __atomic_thread_fence_acq_rel();
  void __atomic_thread_fence_seq_cst();

  void __atomic_signal_fence_relaxed();
  void __atomic_signal_fence_consume();
  void __atomic_signal_fence_acquire();
  void __atomic_signal_fence_release();
  void __atomic_signal_fence_acq_rel();
  void __atomic_signal_fence_seq_cst();
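
To make the library's share of the work under this design concrete, the
run-time dispatch it would have to perform for ``load`` might look like the
following sketch. The ``__cxx_atomic_load`` helper name is hypothetical, the
0-5 encoding is the one from Design A, and the sketch assumes all of the
suffixed ``load`` intrinsics above are available:

.. code-block:: cpp

  // Hypothetical library-side dispatch under Design B, for illustration only.
  // In practice the library would also have to cope with missing intrinsics.
  template <class T>
  T __cxx_atomic_load(T const volatile* obj, int order) {
      switch (order) {
      case 0:  return __atomic_load_relaxed(obj);  // memory_order_relaxed
      case 1:  return __atomic_load_consume(obj);  // memory_order_consume
      case 2:  return __atomic_load_acquire(obj);  // memory_order_acquire
      default: return __atomic_load_seq_cst(obj);  // memory_order_seq_cst (5)
      }
  }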

Design C: Minimal work for the front end
========================================
The ``<atomic>`` header is one of the headers most closely coupled to the compiler.
Ideally, when you invoke any function from ``<atomic>``, it should result in highly
optimized assembly being inserted directly into your application -- assembly that
is not otherwise representable by higher level C or C++ expressions. The design of
the libc++ ``<atomic>`` header started with this goal in mind. A secondary, but
still very important goal is that the compiler should have to do minimal work to
facilitate the implementation of ``<atomic>``. Without this second goal,
practically speaking, the libc++ ``<atomic>`` header would be doomed to be a
barely supported, second-class citizen on almost every platform.

Goals:

1. Optimal code generation for atomic operations
2. Minimal effort for the compiler to achieve goal 1 on any given platform
3. Conformance to the C++0X draft standard

The purpose of this document is to inform compiler writers what they need to do
to enable a high-performance libc++ ``<atomic>`` with minimal effort.

The minimal work that must be done for a conforming ``<atomic>``
------------------------------------------------------------------
The only "atomic" operations that must actually be lock free in
``<atomic>`` are represented by the following compiler intrinsics:

.. code-block:: cpp

  __atomic_flag__ __atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) {
      unique_lock<mutex> _(some_mutex);
      __atomic_flag__ result = *obj;
      *obj = desr;
      return result;
  }

  void __atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) {
      unique_lock<mutex> _(some_mutex);
      *obj = desr;
  }

Where:

- If ``__has_feature(__atomic_flag)`` evaluates to 1 in the preprocessor then
  the compiler must define ``__atomic_flag__`` (e.g. as a typedef to ``int``).
- If ``__has_feature(__atomic_flag)`` evaluates to 0 in the preprocessor then
  the library defines ``__atomic_flag__`` as a typedef to ``bool``.
- To communicate that the above intrinsics are available, the compiler must
  arrange for ``__has_feature`` to return 1 when fed the intrinsic name
  appended with an '_' and the mangled type name of ``__atomic_flag__``.

For example, if ``__atomic_flag__`` is ``unsigned int``:

.. code-block:: cpp

  // __has_feature(__atomic_flag) == 1
  // __has_feature(__atomic_exchange_seq_cst_j) == 1
  // __has_feature(__atomic_store_seq_cst_j) == 1

  typedef unsigned int __atomic_flag__;

  unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int) {
      // ...
  }

  void __atomic_store_seq_cst(unsigned int volatile*, unsigned int) {
      // ...
  }

That's it! Compiler writers do the above and you've got a fully conforming
(though sub-par performance) ``<atomic>`` header!
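
For example, here is a sketch (not mandated by this design) of how the library
might build ``atomic_flag`` directly on top of these two intrinsics; every
other atomic operation can then fall back to the library's internal lock:

.. code-block:: cpp

  // Hypothetical library-side atomic_flag, for illustration only.  It assumes
  // __atomic_flag__, __atomic_exchange_seq_cst and __atomic_store_seq_cst are
  // available as described above.
  struct atomic_flag {
      __atomic_flag__ __flag_;

      bool test_and_set() volatile {
          // The previous value is returned; non-zero means it was already set.
          return __atomic_exchange_seq_cst(&__flag_, __atomic_flag__(1)) != 0;
      }

      void clear() volatile {
          __atomic_store_seq_cst(&__flag_, __atomic_flag__(0));
      }
  };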

Recommended work for a higher performance ``<atomic>``
--------------------------------------------------------
It would be good if the above intrinsics worked with all integral types plus
``void*``. Because this may not be possible to do in a lock-free manner for
all integral types on all platforms, a compiler must communicate each type that
an intrinsic works with. For example, if ``__atomic_exchange_seq_cst`` works
for all types except for ``long long`` and ``unsigned long long`` then:

.. code-block:: cpp

  __has_feature(__atomic_exchange_seq_cst_b) == 1  // bool
  __has_feature(__atomic_exchange_seq_cst_c) == 1  // char
  __has_feature(__atomic_exchange_seq_cst_a) == 1  // signed char
  __has_feature(__atomic_exchange_seq_cst_h) == 1  // unsigned char
  __has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t
  __has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t
  __has_feature(__atomic_exchange_seq_cst_w) == 1  // wchar_t
  __has_feature(__atomic_exchange_seq_cst_s) == 1  // short
  __has_feature(__atomic_exchange_seq_cst_t) == 1  // unsigned short
  __has_feature(__atomic_exchange_seq_cst_i) == 1  // int
  __has_feature(__atomic_exchange_seq_cst_j) == 1  // unsigned int
  __has_feature(__atomic_exchange_seq_cst_l) == 1  // long
  __has_feature(__atomic_exchange_seq_cst_m) == 1  // unsigned long
  __has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void*

Note that only the ``__has_feature`` flag is decorated with the argument
type. The name of the compiler intrinsic is not decorated; instead it works
like a C++ overloaded function.

Additionally, there are other intrinsics besides ``__atomic_exchange_seq_cst``
and ``__atomic_store_seq_cst``. They are optional. But if the compiler can
generate faster code than provided by the library, then clients will benefit
from the compiler writer's expertise and knowledge of the targeted platform.

Below is the complete list of *sequentially consistent* intrinsics, and
their library implementations. Template syntax is used to indicate the desired
overloading for integral and ``void*`` types. The template does not represent a
requirement that the intrinsic operate on **any** type!

.. code-block:: cpp

  // T is one of:
  //   bool, char, signed char, unsigned char, short, unsigned short,
  //   int, unsigned int, long, unsigned long,
  //   long long, unsigned long long, char16_t, char32_t, wchar_t, void*

  template <class T>
  T __atomic_load_seq_cst(T const volatile* obj) {
      unique_lock<mutex> _(some_mutex);
      return *obj;
  }

  template <class T>
  void __atomic_store_seq_cst(T volatile* obj, T desr) {
      unique_lock<mutex> _(some_mutex);
      *obj = desr;
  }

  template <class T>
  T __atomic_exchange_seq_cst(T volatile* obj, T desr) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj = desr;
      return r;
  }

  template <class T>
  bool __atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) {
      unique_lock<mutex> _(some_mutex);
      if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) {
          std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
          return true;
      }
      std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
      return false;
  }

  template <class T>
  bool __atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) {
      unique_lock<mutex> _(some_mutex);
      if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) {
          std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
          return true;
      }
      std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
      return false;
  }

  // T is one of:
  //   char, signed char, unsigned char, short, unsigned short,
  //   int, unsigned int, long, unsigned long,
  //   long long, unsigned long long, char16_t, char32_t, wchar_t

  template <class T>
  T __atomic_fetch_add_seq_cst(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj += operand;
      return r;
  }

  template <class T>
  T __atomic_fetch_sub_seq_cst(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj -= operand;
      return r;
  }

  template <class T>
  T __atomic_fetch_and_seq_cst(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj &= operand;
      return r;
  }

  template <class T>
  T __atomic_fetch_or_seq_cst(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj |= operand;
      return r;
  }

  template <class T>
  T __atomic_fetch_xor_seq_cst(T volatile* obj, T operand) {
      unique_lock<mutex> _(some_mutex);
      T r = *obj;
      *obj ^= operand;
      return r;
  }

  void* __atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand) {
      unique_lock<mutex> _(some_mutex);
      void* r = *obj;
      (char*&)(*obj) += operand;
      return r;
  }

  void* __atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand) {
      unique_lock<mutex> _(some_mutex);
      void* r = *obj;
      (char*&)(*obj) -= operand;
      return r;
  }

  void __atomic_thread_fence_seq_cst() {
      unique_lock<mutex> _(some_mutex);
  }

  void __atomic_signal_fence_seq_cst() {
      unique_lock<mutex> _(some_mutex);
  }

One should consult the (currently draft) `C++ Standard <https://wg21.link/n3126>`_
for the details of the definitions for these operations. For example,
``__atomic_compare_exchange_weak_seq_cst_seq_cst`` is allowed to fail
spuriously while ``__atomic_compare_exchange_strong_seq_cst_seq_cst`` is not.

If on your platform the lock-free definition of ``__atomic_compare_exchange_weak_seq_cst_seq_cst``
would be the same as ``__atomic_compare_exchange_strong_seq_cst_seq_cst``, you may omit the
``__atomic_compare_exchange_weak_seq_cst_seq_cst`` intrinsic without a performance cost. The
library will prefer your implementation of ``__atomic_compare_exchange_strong_seq_cst_seq_cst``
over its own definition for implementing ``__atomic_compare_exchange_weak_seq_cst_seq_cst``.
That is, the library will arrange for ``__atomic_compare_exchange_weak_seq_cst_seq_cst`` to call
``__atomic_compare_exchange_strong_seq_cst_seq_cst`` if you supply an intrinsic for the strong
version but not the weak.

Taking advantage of weaker memory synchronization
--------------------------------------------------
So far, all of the intrinsics presented require a **sequentially consistent** memory ordering.
That is, no loads or stores can move across the operation (just as if the library had locked
that internal mutex). But ``<atomic>`` supports weaker memory ordering operations. In all,
there are six memory orderings (listed here from strongest to weakest):

.. code-block:: cpp

  memory_order_seq_cst
  memory_order_acq_rel
  memory_order_release
  memory_order_acquire
  memory_order_consume
  memory_order_relaxed

(See the `C++ Standard <https://wg21.link/n3126>`_ for the detailed definitions of each of
these orderings.)

On some platforms, the compiler vendor can offer some or even all of the above
intrinsics at one or more weaker levels of memory synchronization. This might
lead, for example, to not issuing an ``mfence`` instruction on x86.

If the compiler does not offer a given operation at a given memory ordering
level, the library will automatically attempt to call the next strongest memory
ordering operation. This continues up to ``seq_cst``, and if that doesn't
exist, then the library takes over and does the job with a ``mutex``. This
is a compile-time search and selection operation. At run time, the application
will only see the few inlined assembly instructions for the selected intrinsic.
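
For illustration, the compile-time selection for a relaxed load of ``int``
might look something like the following sketch. The feature-test names follow
the convention described next; the helper name and the exact library machinery
are hypothetical:

.. code-block:: cpp

  // Hypothetical illustration of the compile-time fallback search for a
  // relaxed load of int.  Only the search order matters here:
  // relaxed -> (consume, elided) -> acquire -> seq_cst -> mutex.
  int __cxx_load_relaxed(int const volatile* obj) {
  #if __has_feature(__atomic_load_relaxed_i)
      return __atomic_load_relaxed(obj);    // exact match
  #elif __has_feature(__atomic_load_acquire_i)
      return __atomic_load_acquire(obj);    // next stronger available ordering
  #elif __has_feature(__atomic_load_seq_cst_i)
      return __atomic_load_seq_cst(obj);
  #else
      unique_lock<mutex> _(some_mutex);     // no intrinsic at all: mutex fallback
      return *obj;
  #endif
  }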

Each intrinsic is appended with the 7-letter name of the memory ordering it
addresses. For example, a ``load`` with ``relaxed`` ordering is defined by:

.. code-block:: cpp

  T __atomic_load_relaxed(const volatile T* obj);

And announced with:

.. code-block:: cpp

  __has_feature(__atomic_load_relaxed_b) == 1  // bool
  __has_feature(__atomic_load_relaxed_c) == 1  // char
  __has_feature(__atomic_load_relaxed_a) == 1  // signed char
  ...

The ``__atomic_compare_exchange_strong(weak)`` intrinsics are parameterized
on two memory orderings. The first ordering applies when the operation returns
``true`` and the second ordering applies when the operation returns ``false``.

Not every memory ordering is appropriate for every operation. ``exchange``
and the ``fetch_XXX`` operations support all six. But ``load`` only supports
``relaxed``, ``consume``, ``acquire`` and ``seq_cst``. ``store`` only supports
``relaxed``, ``release``, and ``seq_cst``. The ``compare_exchange`` operations
support the following 16 combinations out of the possible 36:

.. code-block:: cpp

  relaxed_relaxed
  consume_relaxed
  consume_consume
  acquire_relaxed
  acquire_consume
  acquire_acquire
  release_relaxed
  release_consume
  release_acquire
  acq_rel_relaxed
  acq_rel_consume
  acq_rel_acquire
  seq_cst_relaxed
  seq_cst_consume
  seq_cst_acquire
  seq_cst_seq_cst

Again, the compiler supplies intrinsics only for the strongest orderings where
it can make a difference. The library takes care of calling the weakest
supplied intrinsic that is as strong as, or stronger than, the ordering the
customer asked for.

Note about ABI
==============
With any design, the (back end) compiler writer should note that the decision to
implement lock-free operations on any given type (or not) is an ABI-binding decision.
One cannot change from treating a type as not lock free to lock free (or vice versa)
without breaking the ABI.

For example:

**TU1.cpp**:

.. code-block:: cpp

  extern atomic<long long> A;
  int foo() { return A.compare_exchange_strong(w, x); }

**TU2.cpp**:

.. code-block:: cpp

  extern atomic<long long> A;
  void bar() { A.compare_exchange_strong(y, z); }

If only **one** of these calls to ``compare_exchange_strong`` is implemented with
mutex-locked code, then that mutex-locked code will not be executed mutually
exclusively of the one implemented in a lock-free manner.