# Design Notes for Execution and Memory Space Instances

## Objective

 * Enable Kokkos interoperability with coarse-grain tasking models

## Requirements

 * Backwards compatible with existing Kokkos API
 * Support existing Host execution spaces (Serial, Threads, OpenMP)
 * Support DARMA threading model (may require a new Host execution space)
 * Support Uintah threading model, i.e. independent worker thread pools working off of shared task queues


## Execution Space

 * Parallel work is *dispatched* on an execution space instance

 * Execution space instances are conceptually disjoint/independent from each other


## Host Execution Space Instances

 * A host-side *control* thread dispatches work to an instance

 * `main` is the initial control thread

 * A host execution space instance is an organized thread pool

 * All instances are disjoint, i.e. hardware resources are not shared between instances

 * Exactly one control thread is associated with
   an instance and only that control thread may
   dispatch work to that instance

 * The control thread is a member of the instance

 * The pool of threads associated with an instance is not mutable during that instance's existence

 * The pool of threads associated with an instance may be masked

   - Allows work to be dispatched to a subset of the pool

   - Example: only one hyperthread per core of the instance

   - A mask can be applied during the policy creation of a parallel algorithm

   - Masking is portable by defining it as the ceiling of a fraction in [0.0, 1.0]
     of the available resources

```
class ExecutionSpace {
public:
  using execution_space = ExecutionSpace;
  using memory_space = ...;
  using device_type = Kokkos::Device<execution_space, memory_space>;
  using array_layout = ...;
  using size_type = ...;
  using scratch_memory_space = ...;


  class Instance
  {
    int thread_pool_size( int depth = 0 );
    ...
  };

  class InstanceRequest
  {
  public:
    using Control = std::function< void( Instance * )>;

    InstanceRequest( Control control
                   , unsigned thread_count
                   , unsigned use_numa_count = 0
                   , unsigned use_cores_per_numa = 0
                   );

  };

  static bool in_parallel();

  static bool sleep();
  static bool wake();

  static void fence();

  static void print_configuration( std::ostream &, const bool detailed = false );

  static void initialize( unsigned thread_count = 0
                        , unsigned use_numa_count = 0
                        , unsigned use_cores_per_numa = 0
                        );

  // Partition the current instance into the requested instances
  // and run the given functions on the corresponding instances.
  // Blocks until all the partitioned instances complete, after which
  // the original instance is restored.
  //
  // Requires that the space has already been initialized
  // Requires that the request can be satisfied by the current instance
  //   i.e. the sum of number of requested threads must be less than the
  //   max_hardware_threads
  //
  // Each control functor will accept a handle to its new default instance
  // Each instance must be independent of all other instances
  //   i.e. no assumption on scheduling between instances
  // The user is responsible for checking the return code for errors
  static int run_instances( std::vector< InstanceRequest> const& requests );

  static void finalize();

  static int is_initialized();

  static int concurrency();

  static int thread_pool_size( int depth = 0 );

  static int thread_pool_rank();

  static int max_hardware_threads();

  static int hardware_thread_id();

};

```
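The masking rule from the Host Execution Space Instances list (a fraction in [0.0, 1.0] of the available resources, rounded up) can be illustrated with a small helper. This is only a sketch: `masked_thread_count` is a hypothetical name, not part of the Kokkos API, and it assumes that "available resources" means the number of threads in the instance's pool.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not a Kokkos function): number of pool threads that
// remain active when a mask of the given fraction is applied.
// Assumption: the mask is applied to the thread count of the pool.
inline int masked_thread_count(int pool_size, double fraction) {
  assert(pool_size >= 0);
  assert(fraction >= 0.0 && fraction <= 1.0);
  // Round up so that any non-zero fraction selects at least one thread.
  const int count = static_cast<int>(std::ceil(fraction * pool_size));
  // Defensive clamp: never report more threads than the pool holds.
  return count > pool_size ? pool_size : count;
}
```

Under these assumptions, a 16-thread pool with two hyperthreads per core and a fraction of 0.5 yields 8 active threads, which matches the "one hyperthread per core" example. The sketch only computes how many threads are selected; which threads are chosen is not specified by these notes.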
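The precondition on `run_instances` (the requested thread counts must fit within `max_hardware_threads`, and instances must not share hardware resources) can be sketched as a partitioning step. Everything below is illustrative: `ThreadRequest`, `ThreadRange`, and `partition_threads` are hypothetical names, contiguous thread-id blocks are just one possible layout, and the sketch assumes that a total equal to `max_hardware_threads` is acceptable even though the comment above says "less than".

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the thread_count argument of an InstanceRequest.
struct ThreadRequest {
  unsigned thread_count;
};

// Half-open range [first, second) of hardware thread ids owned by one instance.
using ThreadRange = std::pair<unsigned, unsigned>;

// Assign each request a contiguous, non-overlapping block of thread ids.
// Returns an empty vector when the requests cannot be satisfied.
// Assumption: a total equal to max_hardware_threads is accepted.
inline std::vector<ThreadRange>
partition_threads(const std::vector<ThreadRequest>& requests,
                  unsigned max_hardware_threads) {
  std::vector<ThreadRange> ranges;
  unsigned next = 0;  // first thread id not yet assigned
  for (const ThreadRequest& r : requests) {
    // Reject empty instances and requests that exceed the remaining threads.
    if (r.thread_count == 0 || r.thread_count > max_hardware_threads - next) {
      return {};
    }
    ranges.emplace_back(next, next + r.thread_count);
    next += r.thread_count;
  }
  assert(next <= max_hardware_threads);
  return ranges;
}
```

Because each range starts where the previous one ends, no hardware thread belongs to two instances, which reflects the disjointness requirement. A real implementation would presumably also honor `use_numa_count` and `use_cores_per_numa`, which this sketch ignores.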