..
  Copyright (c) 2015-2020 Linaro Ltd.

  This work is licensed under the terms of the GNU GPL, version 2 or
  later. See the COPYING file in the top-level directory.

==================
Multi-threaded TCG
==================

This document outlines the design for multi-threaded TCG (a.k.a. MTTCG)
system-mode emulation. User-mode emulation has always mirrored the
thread structure of the translated executable, although some of the
changes done for MTTCG system emulation have improved the stability of
linux-user emulation.

The original system-mode TCG implementation was single-threaded and
dealt with multiple CPUs using simple round-robin scheduling. This
simplified a lot of things but became increasingly limiting as emulated
systems gained additional cores and per-core performance gains for
host systems started to level off.

vCPU Scheduling
===============

We introduce a new running mode where each vCPU runs on its own
user-space thread. This is enabled by default for all FE/BE
(front-end/back-end) combinations where the host memory model is able
to accommodate the guest (TCG_GUEST_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO
is zero) and the guest has had the required work done to support this
safely (TARGET_SUPPORTS_MTTCG).
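
The default-enable condition above is a bitmask subset test: every
ordering guarantee the guest relies on must already be provided by the
host. The sketch below assumes a hypothetical encoding of the four
load/store ordering pairs as bit flags; the ``MO_*`` names, their values
and ``mttcg_default_on()`` are illustrative and are not QEMU's
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ordering flags: "A_B" means an access of kind A is
 * ordered before a later access of kind B in program order. */
enum {
    MO_LD_LD = 1u << 0,
    MO_ST_LD = 1u << 1,
    MO_LD_ST = 1u << 2,
    MO_ST_ST = 1u << 3,
    MO_ALL   = MO_LD_LD | MO_ST_LD | MO_LD_ST | MO_ST_ST,
};

/* A TSO-like model: everything is ordered except store-then-load. */
#define MO_TSO (MO_ALL & ~MO_ST_LD)

/* True when the host already provides every ordering the guest needs,
 * i.e. (guest_mo & ~host_mo) == 0, and the guest front-end has been
 * made MTTCG-safe (the TARGET_SUPPORTS_MTTCG condition). */
bool mttcg_default_on(uint32_t guest_mo, uint32_t host_mo,
                      bool target_supports_mttcg)
{
    assert((guest_mo & ~(uint32_t)MO_ALL) == 0);  /* known bits only */
    uint32_t missing = guest_mo & ~host_mo;       /* orderings host lacks */
    return target_supports_mttcg && missing == 0;
}
```

With these flags, a TSO-like guest on a host that guarantees no
ordering (mask 0) is missing MO_LD_LD, MO_LD_ST and MO_ST_ST, so the
sketch returns false, while a guest that needs no implicit ordering is
accepted on any host.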

System emulation will fall back to the original round-robin approach
if:

* forced by --accel tcg,thread=single
* --icount mode is enabled
* running 64-bit guests on 32-bit hosts (TCG_OVERSIZED_GUEST)

In the general case of running translated code there should be no
inter-vCPU dependencies and all vCPUs should be able to run at full
speed. Synchronisation will only be required while accessing internal
shared data structures or when the emulated architecture requires a
coherent representation of the emulated machine state.

Shared Data Structures
======================

Main Run Loop
-------------

Even when there is no code being generated there are a number of
structures associated with the hot-path through the main run-loop.
These are associated with looking up the next translation block to
execute. These include:

  - tb_jmp_cache (per-vCPU, cache of recent jumps)
  - tb_ctx.htable (global hash table, phys address->tb lookup)

As TB linking only occurs when blocks are in the same page, this code
is critical to performance: looking up the next TB to execute is the
most common reason to exit the generated code.

DESIGN REQUIREMENT: Make access to lookup structures safe with
multiple reader/writer threads. Minimise any lock contention to do it.
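
One way to meet this requirement for a small direct-mapped cache is to
publish entries with release stores and read them with acquire loads.
A reader then sees either no entry or a fully initialised one, and it
re-checks the key afterwards. The sketch below uses C11 atomics;
``struct tb``, ``struct jmp_cache`` and the ``jc_*`` functions are
hypothetical and do not reproduce QEMU's tb_jmp_cache layout. It also
assumes a block stays allocated while a reader may still hold a pointer
to it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define JC_BITS 12
#define JC_SIZE (1u << JC_BITS)

/* Hypothetical translation block: only the fields this sketch needs.
 * Fields are written once, before the block is published. */
struct tb {
    uint64_t pc;        /* guest PC the block was translated from */
    void *host_code;    /* start of the generated host code */
};

struct jmp_cache {
    _Atomic(struct tb *) slot[JC_SIZE];
};

static size_t jc_hash(uint64_t pc)
{
    return (size_t)((pc >> 2) ^ (pc >> (2 + JC_BITS))) & (JC_SIZE - 1);
}

/* Lock-free read: the acquire load pairs with the release store in
 * jc_insert(), so tb->pc is visible once the pointer is. */
struct tb *jc_lookup(struct jmp_cache *jc, uint64_t pc)
{
    struct tb *tb = atomic_load_explicit(&jc->slot[jc_hash(pc)],
                                         memory_order_acquire);
    /* Different PCs can share a slot, so confirm the key. */
    return (tb != NULL && tb->pc == pc) ? tb : NULL;
}

/* Writers simply overwrite the slot; a reader never sees a
 * half-written entry. */
void jc_insert(struct jmp_cache *jc, struct tb *tb)
{
    assert(tb != NULL);
    atomic_store_explicit(&jc->slot[jc_hash(tb->pc)], tb,
                          memory_order_release);
}

/* Clearing a slot that holds some other PC is harmless: it only
 * causes a later cache miss. */
void jc_invalidate(struct jmp_cache *jc, uint64_t pc)
{
    atomic_store_explicit(&jc->slot[jc_hash(pc)], NULL,
                          memory_order_release);
}
```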

The hot-path avoids using locks where possible. The tb_jmp_cache is
updated with atomic accesses to ensure consistent results. The
fall-back QHT-based hash table is also designed for lockless lookups.
Locks are only taken when code generation is required or when
TranslationBlocks have their block-to-block jumps patched.

Global TCG State
----------------

User-mode emulation
~~~~~~~~~~~~~~~~~~~

We need to protect the entire code generation cycle, including any
post-generation patching of the translated code. This also implies a
shared translation buffer which contains code running on all cores.
Any execution path that comes to the main run loop will need to hold a
mutex for code generation. This also includes times when we need to
flush code or entries from any shared lookups/caches. Structures held
on a per-vCPU basis won't need locking unless other vCPUs will need to
modify them.

DESIGN REQUIREMENT: Add locking around all code generation and TB
patching.

(Current solution)

Code generation is serialised with mmap_lock().

!User-mode emulation
~~~~~~~~~~~~~~~~~~~~

Each vCPU has its own TCG context and associated TCG region, thereby
requiring no locking during translation.
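
Because each vCPU thread translates into its own region, allocating
space for new code can be a plain bump pointer with no lock. The sketch
below assumes a hypothetical fixed split of one buffer into equal
per-thread regions. ``struct code_region``, ``regions_init()`` and
``region_alloc()`` are illustrative names, and QEMU's real region sizing
is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread slice of a shared code buffer. */
struct code_region {
    uint8_t *start;   /* first byte owned by this thread */
    uint8_t *ptr;     /* next free byte */
    uint8_t *end;     /* one past the last owned byte */
};

/* Split buf into n equal regions; leftover bytes stay unused. */
void regions_init(struct code_region *r, size_t n, uint8_t *buf,
                  size_t size)
{
    assert(n > 0);
    size_t each = size / n;
    for (size_t i = 0; i < n; i++) {
        r[i].start = buf + i * each;
        r[i].ptr = r[i].start;
        r[i].end = r[i].start + each;
    }
}

/* Only the owning thread calls this, so no lock is taken. A NULL
 * return means the region is full; in this sketch that is the point
 * where the caller would have to request a flush. */
void *region_alloc(struct code_region *r, size_t len)
{
    if ((size_t)(r->end - r->ptr) < len) {
        return NULL;
    }
    void *p = r->ptr;
    r->ptr += len;
    return p;
}
```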

Translation Blocks
------------------

Currently the whole system shares a single code generation buffer
which, when full, forces a flush of all translations so that
translation starts from scratch again. Some operations also force a
full flush of translations, including:

  - debugging operations (breakpoint insertion/removal)
  - some CPU helper functions
  - linux-user spawning its first thread
  - operations related to TCG Plugins

This is done with the async_safe_run_on_cpu() mechanism to ensure all
vCPUs are quiescent when changes are being made to shared global
structures.

More granular translation invalidation events are typically due
to a change of the state of a physical page:

  - code modification (self-modifying code, patching code)
  - page changes (new page mapping in linux-user mode)

While setting the invalid flag in a TranslationBlock will stop it
being used when looked up in the hot-path, there are a number of other
book-keeping structures that need to be safely cleared.

Any TranslationBlocks which have been patched to jump directly to the
now-invalid blocks need those jump patches reverted so they will
return to the C code.
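
The two invalidation steps above (mark the block invalid so lookups
ignore it, then unpatch every direct jump that targets it) can be
modelled as follows. This is a simplified, hypothetical sketch. A
direct jump is an atomic pointer slot that either names the next block
or is NULL, meaning "return to the C code", whereas real TCG patches
host branch instructions. The list updates are deliberately left
unsynchronised here because protecting them is a separate requirement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct tb;

/* Hypothetical direct-jump slot: NULL means "exit to the dispatcher",
 * otherwise control chains straight into the target block. */
struct jmp_slot {
    _Atomic(struct tb *) target;
    struct jmp_slot *next_in;    /* next incoming jump to the same target */
};

struct tb {
    atomic_bool invalid;         /* checked before a looked-up TB is used */
    struct jmp_slot out[2];      /* up to two direct exits per block */
    struct jmp_slot *incoming;   /* slots in other TBs that jump here */
};

/* Chain exit n of `from` directly to `dst`. Assumes the slot is not
 * already linked; list updates are unsynchronised in this sketch. */
void tb_link(struct tb *from, int n, struct tb *dst)
{
    assert(n == 0 || n == 1);
    struct jmp_slot *s = &from->out[n];
    s->next_in = dst->incoming;
    dst->incoming = s;
    atomic_store(&s->target, dst);
}

bool tb_usable(struct tb *tb)
{
    return !atomic_load(&tb->invalid);
}

/* Mark tb invalid so lookups stop using it, then reset every jump that
 * chains into it so those blocks return to the dispatcher instead. */
void tb_invalidate(struct tb *tb)
{
    atomic_store(&tb->invalid, true);
    for (struct jmp_slot *s = tb->incoming; s != NULL; s = s->next_in) {
        atomic_store(&s->target, NULL);
    }
    tb->incoming = NULL;
}
```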

There are a number of look-up caches that need to be properly updated,
including:

  - the jump lookup cache
  - the physical-to-tb lookup hash table
  - the global page table

The global page table (l1_map) provides a multi-level look-up for
PageDesc structures, which contain pointers to the start of a linked
list of all TranslationBlocks in that page (see page_next).

Both the jump patching and the page cache involve linked lists that
the invalidated TranslationBlock needs to be removed from.

DESIGN REQUIREMENT: Safely handle invalidation of TBs
  - safely patch/revert direct jumps
  - remove central PageDesc lookup entries
  - ensure lookup caches/hashes are safely updated

(Current solution)

The direct jumps themselves are updated atomically by the TCG
tb_set_jmp_target() code. Modifications to the linked lists that allow
searching for linked pages are done under the protection of
tb->jmp_lock, where tb is the destination block of a jump. Each origin
block keeps a pointer to its destinations so that the appropriate lock
can be acquired before iterating over a jump list.

The global page table is a lockless radix tree; cmpxchg is used
to atomically insert new elements.
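
A lockless radix-tree insert with cmpxchg typically follows a
publish-or-adopt pattern. Allocate a fresh node and try to swing the
empty slot to it. If another thread won the race, free the local node
and use the winner's. The sketch below is a two-level version with
hypothetical sizes; ``struct page_desc`` is a stand-in holding only a
list head, not QEMU's PageDesc.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define L1_BITS 10
#define L2_BITS 10
#define L2_SIZE (1u << L2_BITS)

/* Hypothetical per-page descriptor: just a list head in this sketch. */
struct page_desc {
    void *first_tb;
};

struct l2_node {
    struct page_desc pd[L2_SIZE];
};

static _Atomic(struct l2_node *) l1_map[1u << L1_BITS];

/* Return the descriptor for page index idx, allocating the leaf on
 * demand without taking a lock. Returns NULL if allocation fails. */
struct page_desc *page_find_alloc(uint32_t idx)
{
    assert(idx < (1u << (L1_BITS + L2_BITS)));
    _Atomic(struct l2_node *) *slot = &l1_map[idx >> L2_BITS];
    struct l2_node *node = atomic_load_explicit(slot, memory_order_acquire);

    if (node == NULL) {
        struct l2_node *fresh = calloc(1, sizeof(*fresh));
        if (fresh == NULL) {
            return NULL;
        }
        struct l2_node *expected = NULL;
        if (atomic_compare_exchange_strong_explicit(
                slot, &expected, fresh,
                memory_order_acq_rel, memory_order_acquire)) {
            node = fresh;      /* we published the new leaf */
        } else {
            free(fresh);       /* lost the race: adopt the winner's */
            node = expected;   /* cmpxchg wrote the current value here */
        }
    }
    return &node->pd[idx & (L2_SIZE - 1)];
}
```

The release half of the successful cmpxchg, paired with the acquire
loads, makes the zero-initialised leaf contents visible to any thread
that later reads the published pointer.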

The lookup caches are updated atomically and the lookup hash uses QHT,
which is designed for safe concurrent lookup.

Parallel code generation is supported. QHT is used at insertion time
as the synchronisation point across threads, thereby ensuring that we
only keep track of a single TranslationBlock for each guest code block.

Memory maps and TLBs
--------------------

The memory handling code is fairly critical to the speed of memory
access in the emulated system. The SoftMMU code is designed so the
hot-path can be handled entirely within translated code. This is
handled with a per-vCPU TLB structure which, once populated, will allow
a series of accesses to the page to occur without exiting the
translated code. It is possible to set flags in the TLB address which
will ensure the slow-path is taken for each access. This can be done
to support:

  - Memory regions (dividing up access to PIO, MMIO and RAM)
  - Dirty page tracking (for code gen, SMC detection, migration and display)
  - Virtual TLB (for translating guest address->real address)

When the TLB tables are updated by a vCPU thread other than their own,
we need to ensure it is done in a safe way so that no inconsistent
state is seen by the vCPU thread.
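
Keeping flag bits in the low, page-offset bits of the stored TLB
address means the fast-path check needs only one comparison. Any set
flag makes it fail and sends the access down the slow path. The sketch
below assumes 4 KiB pages; the ``TLB_FLAG_*`` names and the helper
functions are hypothetical. Another thread can set a flag with an
atomic OR, so the owning vCPU never reads a torn value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_MASK (~(uint64_t)0 << PAGE_BITS)

/* Hypothetical flags stored below the page boundary. */
#define TLB_FLAG_NOTDIRTY ((uint64_t)1 << 0)  /* writes must be tracked */
#define TLB_FLAG_MMIO     ((uint64_t)1 << 1)  /* access goes to a device */

struct tlb_entry {
    _Atomic uint64_t addr_write;  /* page address | flag bits */
};

void tlb_fill(struct tlb_entry *e, uint64_t page_addr)
{
    assert((page_addr & ~PAGE_MASK) == 0);  /* must be page aligned */
    atomic_store(&e->addr_write, page_addr);
}

/* Fast path: compare the page bits of vaddr against the whole stored
 * value, flags included, so any set flag forces a mismatch. */
bool tlb_fast_hit(struct tlb_entry *e, uint64_t vaddr)
{
    return atomic_load(&e->addr_write) == (vaddr & PAGE_MASK);
}

/* Safe to call from another vCPU thread. */
void tlb_set_flag(struct tlb_entry *e, uint64_t flag)
{
    assert((flag & PAGE_MASK) == 0);  /* flags live below the page bits */
    atomic_fetch_or(&e->addr_write, flag);
}
```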

Some operations require updating a number of vCPUs' TLBs at the same
time in a synchronised manner.

DESIGN REQUIREMENTS:

  - TLB Flush All/Page

    - can be cross-vCPU
    - a cross-vCPU TLB flush may need the other vCPUs brought to a halt
    - the change may need to be visible to the calling vCPU immediately

  - TLB Flag Update

    - usually cross-vCPU
    - want the change to be visible as soon as possible

  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)

    - this is a per-vCPU table, so by definition it can't race
    - updated by its own thread when the slow-path is forced

(Current solution)

We have updated cputlb.c to defer cross-vCPU operations with
async_run_on_cpu(), which ensures each vCPU sees a coherent state when
it next runs its work (within a few instructions' time).

A new set of operations (tlb_flush_*_all_cpus) takes an additional flag
which, when set, forces synchronisation by setting the source vCPU's
work as "safe work" and exiting the CPU run loop. This ensures that by
the time execution restarts, all flush operations have completed.

TLB flag updates are all done atomically and are also protected by the
corresponding page lock.
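
The deferral described above is essentially a per-vCPU work queue. The
requesting thread only enqueues the flush and asks the target to leave
its run loop. The target then performs the flush on its own thread
before it next executes guest code. This is a minimal sketch:
``struct vcpu``, ``queue_flush()`` and ``run_queued_work()`` are
hypothetical stand-ins for the async_run_on_cpu() machinery, and the
"kick" is reduced to a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TLB_ENTRIES 64

struct vcpu;

/* One queued request, run later on the target vCPU's thread. */
struct work_item {
    void (*fn)(struct vcpu *cpu);
    struct work_item *next;
};

struct vcpu {
    pthread_mutex_t work_lock;      /* protects head/tail only */
    struct work_item *head, *tail;  /* FIFO of pending work */
    atomic_bool exit_request;       /* "kick": leave the run loop soon */
    unsigned long tlb[TLB_ENTRIES]; /* touched only by the owning thread */
};

void vcpu_init(struct vcpu *cpu)
{
    assert(cpu != NULL);
    pthread_mutex_init(&cpu->work_lock, NULL);
    cpu->head = cpu->tail = NULL;
    atomic_init(&cpu->exit_request, false);
    memset(cpu->tlb, 0, sizeof(cpu->tlb));
}

/* Runs on the target vCPU's own thread, so the TLB needs no lock. */
static void do_tlb_flush(struct vcpu *cpu)
{
    memset(cpu->tlb, 0, sizeof(cpu->tlb));
}

/* May be called from any thread: queue a flush and kick the target. */
bool queue_flush(struct vcpu *cpu)
{
    struct work_item *w = malloc(sizeof(*w));
    if (w == NULL) {
        return false;
    }
    w->fn = do_tlb_flush;
    w->next = NULL;

    pthread_mutex_lock(&cpu->work_lock);
    if (cpu->tail != NULL) {
        cpu->tail->next = w;
    } else {
        cpu->head = w;
    }
    cpu->tail = w;
    pthread_mutex_unlock(&cpu->work_lock);

    atomic_store(&cpu->exit_request, true);
    return true;
}

/* Called by the target vCPU between blocks of guest execution. */
void run_queued_work(struct vcpu *cpu)
{
    /* Clear first: anything queued after this point re-raises it. */
    atomic_store(&cpu->exit_request, false);

    pthread_mutex_lock(&cpu->work_lock);
    struct work_item *w = cpu->head;
    cpu->head = cpu->tail = NULL;
    pthread_mutex_unlock(&cpu->work_lock);

    while (w != NULL) {
        struct work_item *next = w->next;
        w->fn(cpu);
        free(w);
        w = next;
    }
}
```

The exit request is cleared before the queue is drained. Work queued
while draining is therefore either picked up in this pass or re-raises
the request for the next one. At worst the vCPU takes one spurious
exit.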

(Known limitation)

Not really a limitation, but the wait mechanism is overly strict for
some architectures which only need flushes completed by a barrier
instruction. This could be a future optimisation.

Emulated hardware state
-----------------------

Currently, thanks to KVM work, any access to IO memory is automatically
protected by the BQL (Big QEMU Lock). Any IO region that doesn't use
the BQL is expected to do its own locking.

However, IO memory isn't the only way emulated hardware state can be
modified. Some architectures have model-specific registers that
trigger hardware emulation features. Generally, any translation helper
that needs to update more than a single vCPU's state should take the
BQL.

As the BQL, or global iothread mutex, is shared across the system, we
push the use of the lock as far down into the TCG code as possible to
minimise contention.

(Current solution)

MMIO access automatically serialises hardware emulation by way of the
BQL. Currently Arm targets serialise all ARM_CP_IO register accesses
and also defer the reset/startup of vCPUs to the vCPU context by way
of async_run_on_cpu().

Updates to interrupt state are also protected by the BQL as they can
often be cross-vCPU.
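
Pushing the lock "as far down as possible" means a helper takes it
only around the part that touches state shared with other vCPUs or
devices. This is a minimal sketch in which a plain pthread mutex stands
in for the BQL and the interrupt controller is hypothetical. None of
the names below are QEMU APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_CPUS 8

/* Stand-in for the BQL. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical interrupt controller shared by all vCPUs. */
struct intc {
    uint32_t pending[MAX_CPUS];   /* per-target pending IRQ bits */
};

struct vcpu_regs {
    uint64_t scratch;             /* private to one vCPU: no lock */
};

/* Models a system-register write that, as a side effect, raises an
 * IRQ on another vCPU (for example an inter-processor interrupt). */
void helper_write_ipi_reg(struct vcpu_regs *self, struct intc *ic,
                          unsigned target, uint32_t irq_mask)
{
    assert(target < MAX_CPUS);
    self->scratch = irq_mask;          /* own state: lock not needed */

    pthread_mutex_lock(&big_lock);     /* shared device state */
    ic->pending[target] |= irq_mask;
    pthread_mutex_unlock(&big_lock);
}

/* The target vCPU collects and acknowledges its pending IRQs. */
uint32_t intc_take_pending(struct intc *ic, unsigned cpu)
{
    assert(cpu < MAX_CPUS);
    pthread_mutex_lock(&big_lock);
    uint32_t p = ic->pending[cpu];
    ic->pending[cpu] = 0;
    pthread_mutex_unlock(&big_lock);
    return p;
}
```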

Memory Consistency
==================

Between emulated guests and host systems there is a range of memory
consistency models. Even when emulating weakly ordered systems on
strongly ordered hosts, we need to ensure that things like
store-after-load re-ordering can be prevented when the guest requires
it.

Memory Barriers
---------------

Barriers (sometimes known as fences) provide a mechanism for software
to enforce a particular ordering of memory operations from the point
of view of external observers (e.g. another processor core). They can
apply to all memory operations, or only to loads or only to stores.

The Linux kernel has an excellent `write-up
<https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt>`_
on the various forms of memory barrier and the guarantees they can
provide.

Barriers are often wrapped around synchronisation primitives to
provide explicit memory ordering semantics. However, they can also be
used by themselves to provide safe lockless access, for example by
ensuring that a change to a signal flag only becomes visible once the
changes to the payload are.

DESIGN REQUIREMENT: Add a new tcg_memory_barrier op

This would enforce a strong load/store ordering so that all
loads/stores complete at the memory barrier.
On single-core, non-SMP, strongly ordered backends this could become a
NOP.

Aside from explicit standalone memory barrier instructions, there are
also implicit memory ordering semantics which come with each guest
memory access instruction. For example, all x86 loads/stores come with
fairly strong ordering guarantees (total store ordering), whereas Arm
has special variants of load/store instructions that imply
acquire/release semantics.

In the case of a strongly ordered guest architecture being emulated on
a weakly ordered host, the scope for a heavy performance impact is
quite high.

DESIGN REQUIREMENTS: Be efficient with use of memory barriers
  - host systems with stronger implied guarantees can skip some barriers
  - merge consecutive barriers into the strongest one

(Current solution)

The system currently has a tcg_gen_mb() which will add memory barrier
operations if code generation is being done in a parallel context. The
tcg_optimize() function attempts to merge barriers up to their
strongest form before any load/store operations. The solution was
originally developed and tested for linux-user based systems. All
backends have been converted to emit fences when required.
So far, the following front-ends have been updated to emit fences when
required:

  - target-i386
  - target-arm
  - target-aarch64
  - target-alpha
  - target-mips

Memory Control and Maintenance
------------------------------

This includes a class of instructions for controlling system cache
behaviour. While QEMU doesn't model cache behaviour, these instructions
are often seen when code modification has taken place to ensure the
changes take effect.

Synchronisation Primitives
--------------------------

There are two broad types of synchronisation primitives found in
modern ISAs: atomic instructions and exclusive regions.

The first type offers a simple atomic instruction which guarantees that
some sort of test and conditional store will be truly atomic w.r.t.
other cores sharing access to the memory. The classic example is the
x86 cmpxchg instruction.

The second type offers a pair of load/store instructions which
guarantee that a region of memory has not been touched between the
load and store instructions. An example of this is Arm's ldrex/strex
pair, where the strex instruction returns a flag indicating a
successful store only if no other CPU has accessed the memory region
since the ldrex.
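
A common way to build a load-exclusive/store-exclusive pair on top of
compare-and-swap is to remember the value returned by the exclusive
load and let the store succeed only if memory still holds that value.
The sketch below uses C11 atomics. ``struct excl_monitor`` and the two
functions are hypothetical, and the model only approximates a hardware
exclusive monitor: a value that is changed and then changed back in
between still lets the store succeed (the ABA problem).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU exclusive monitor state. */
struct excl_monitor {
    _Atomic uint32_t *addr;   /* NULL when no exclusive access is open */
    uint32_t val;             /* value observed by the exclusive load */
};

/* ldrex-like: load the value and remember address and value. */
uint32_t load_exclusive(struct excl_monitor *m, _Atomic uint32_t *addr)
{
    assert(addr != NULL);
    m->addr = addr;
    m->val = atomic_load(addr);
    return m->val;
}

/* strex-like: returns 0 on success and 1 on failure, matching the
 * status convention of Arm's strex. Succeeds only if the monitor is
 * open for addr and memory still holds the remembered value. */
int store_exclusive(struct excl_monitor *m, _Atomic uint32_t *addr,
                    uint32_t newval)
{
    if (m->addr != addr) {
        return 1;
    }
    uint32_t expected = m->val;
    bool ok = atomic_compare_exchange_strong(addr, &expected, newval);
    m->addr = NULL;           /* the monitor is closed either way */
    return ok ? 0 : 1;
}
```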

Traditionally TCG has generated a series of operations that work
because they are within the context of a single translation block and
so will have completed before another CPU is scheduled. However, with
the ability to have multiple threads running to emulate multiple CPUs,
we need to expose these semantics explicitly.

DESIGN REQUIREMENTS:
  - Support classic atomic instructions
  - Support load/store exclusive (or load link/store conditional) pairs
  - Generic enough infrastructure to support all guest architectures

CURRENT OPEN QUESTIONS:
  - How problematic is the ABA problem in general?

(Current solution)

The TCG provides a number of atomic helpers (tcg_gen_atomic_*) which
can be used directly or combined to emulate other instructions like
Arm's ldrex/strex instructions. While they are susceptible to the ABA
problem, so far common guests have not implemented patterns where
this may be a problem, typically presenting a locking ABI which
assumes cmpxchg-like semantics.

The code also includes a fall-back for cases where multi-threaded TCG
ops can't work (e.g. guest atomic width > host atomic width). In this
case an EXCP_ATOMIC exit occurs and the instruction is emulated with
an exclusive lock which ensures all emulation is serialised.
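
The fall-back amounts to performing the oversized operation while no
other access can interleave with it. The sketch below is a simplified
128-bit compare-and-swap under one global lock, and its names are
hypothetical. A lock like this only excludes threads that also take it,
whereas the mechanism described above serialises all emulation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit guest value on a host assumed to lack 128-bit atomics. */
struct u128 {
    uint64_t lo, hi;
};

/* Stand-in for the exclusive lock: in this model every emulated access
 * to memory the wide atomic may touch must also hold it. */
static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns true and stores newv if *mem equals *expected; otherwise
 * copies the current value into *expected and returns false. */
bool cmpxchg128_serialised(struct u128 *mem, struct u128 *expected,
                           struct u128 newv)
{
    assert(mem != NULL && expected != NULL);
    bool ok;

    pthread_mutex_lock(&exclusive_lock);
    ok = mem->lo == expected->lo && mem->hi == expected->hi;
    if (ok) {
        *mem = newv;
    } else {
        *expected = *mem;
    }
    pthread_mutex_unlock(&exclusive_lock);
    return ok;
}
```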

While the atomic helpers look good enough for now, there may be a need
to look at solutions that can more closely model the guest
architectures' semantics.