# Design Notes for Execution and Memory Space Instances

## Objective

* Enable Kokkos interoperability with coarse-grain tasking models

## Requirements

* Backwards compatible with existing Kokkos API
* Support existing Host execution spaces (Serial, Threads, OpenMP)
* Support DARMA threading model (may require a new Host execution space)
* Support Uintah threading model, i.e. independent worker thread pools working off of shared task queues

## Execution Space

* Parallel work is *dispatched* on an execution space instance
* Execution space instances are conceptually disjoint/independent from each other

## Host Execution Space Instances

* A host-side *control* thread dispatches work to an instance
* `main` is the initial control thread
* A host execution space instance is an organized thread pool
* All instances are disjoint, i.e. hardware resources are not shared between instances
* Exactly one control thread is associated with an instance, and only that control thread may dispatch work to that instance
* The control thread is a member of the instance
* The pool of threads associated with an instance is not mutable during that instance's existence
* The pool of threads associated with an instance may be masked
  - Allows work to be dispatched to a subset of the pool
  - Example: only one hyperthread per core of the instance
  - A mask can be applied during the policy creation of a parallel algorithm
  - Masking is portable by defining it as the ceiling of a fraction in [0.0, 1.0] of the available resources

```
class ExecutionSpace {
public:
  using execution_space       = ExecutionSpace;
  using memory_space          = ...;
  using device_type           = Kokkos::Device< execution_space, memory_space >;
  using array_layout          = ...;
  using size_type             = ...;
  using scratch_memory_space  = ...;

  class Instance
  {
    int thread_pool_size( int depth = 0 );
    ...
  };

  class InstanceRequest
  {
  public:
    using Control = std::function< void( Instance * ) >;

    InstanceRequest( Control  control
                   , unsigned thread_count
                   , unsigned use_numa_count = 0
                   , unsigned use_cores_per_numa = 0
                   );
  };

  static bool in_parallel();

  static bool sleep();
  static bool wake();

  static void fence();

  static void print_configuration( std::ostream &, const bool detailed = false );

  static void initialize( unsigned thread_count = 0
                        , unsigned use_numa_count = 0
                        , unsigned use_cores_per_numa = 0
                        );

  // Partition the current instance into the requested instances
  // and run the given functions on the corresponding instances.
  // Blocks until all the partitioned instances complete, then the
  // original instance is restored.
  //
  // Requires that the space has already been initialized.
  // Requires that the request can be satisfied by the current instance,
  //   i.e. the sum of the requested thread counts must be less than
  //   max_hardware_threads.
  //
  // Each control functor will accept a handle to its new default instance.
  //
  // Each instance must be independent of all other instances,
  //   i.e. no assumption on scheduling between instances.
  //
  // The user is responsible for checking the return code for errors.
  static int run_instances( std::vector< InstanceRequest > const & requests );

  static void finalize();

  static int is_initialized();

  static int concurrency();

  static int thread_pool_size( int depth = 0 );
  static int thread_pool_rank();

  static int max_hardware_threads();
  static int hardware_thread_id();
};
```
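
To illustrate how the proposed partitioning API might be driven from a control thread, here is a minimal usage sketch. It assumes the interface above: `ExecutionSpace`, `Instance`, `InstanceRequest`, and `run_instances` are the proposed names from this design note, not part of the current Kokkos API, and the thread counts are arbitrary examples.

```
#include <functional>
#include <vector>
#include <iostream>

// Hypothetical usage of the proposed interface above. "ExecutionSpace"
// stands in for a concrete host execution space (e.g. Threads or OpenMP)
// that would implement the sketched Instance / InstanceRequest API.
int main()
{
  // The initial control thread ("main") initializes the space it owns.
  ExecutionSpace::initialize( 8 /* thread_count */ );

  // A control functor receives a handle to its own default instance and
  // may only dispatch work to that instance.
  auto worker = []( ExecutionSpace::Instance * instance )
  {
    // Each partitioned instance is an independent thread pool; no ordering
    // may be assumed between this functor and the other control functors.
    std::cout << "instance pool size: "
              << instance->thread_pool_size() << std::endl;
    // ... dispatch parallel algorithms on *instance here ...
  };

  // Partition the 8-thread instance into two disjoint 4-thread instances.
  // run_instances blocks until both control functors return, then the
  // original instance is restored.
  std::vector< ExecutionSpace::InstanceRequest > requests;
  requests.emplace_back( worker, 4 /* thread_count */ );
  requests.emplace_back( worker, 4 /* thread_count */ );

  const int err = ExecutionSpace::run_instances( requests );
  if ( err ) {
    std::cerr << "run_instances failed with code " << err << std::endl;
  }

  ExecutionSpace::finalize();
  return 0;
}
```

Because each control functor is handed only its own instance, work dispatched inside one partition cannot touch the hardware resources of another, which is the practical consequence of the "no assumption on scheduling between instances" requirement above.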