# Design Notes for Execution and Memory Space Instances

## Objective

 * Enable Kokkos interoperability with coarse-grain tasking models

## Requirements

 * Backwards compatible with the existing Kokkos API
 * Support existing Host execution spaces (Serial, Threads, OpenMP)
 * Support the DARMA threading model (may require a new Host execution space)
 * Support the Uintah threading model, i.e. independent worker thread pools working off of shared task queues


## Execution Space

  * Parallel work is *dispatched* on an execution space instance, as sketched below

  * Execution space instances are conceptually disjoint/independent from each other

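For illustration, a minimal sketch of dispatching work on a particular instance, assuming an execution policy constructor that accepts an instance handle; both that constructor and the `dispatch_on_instance` helper are assumptions of this sketch, not part of the existing API:

```
#include <Kokkos_Core.hpp>

// Hypothetical dispatch sketch: the control thread owns 'instance' and
// passes it to the execution policy so that the parallel_for runs only
// on that instance's thread pool.  The instance-taking policy
// constructor is assumed by this design, not provided by the current API.
void dispatch_on_instance( ExecutionSpace::Instance & instance, int N )
{
  Kokkos::parallel_for(
    Kokkos::RangePolicy< ExecutionSpace >( instance, 0, N ),
    KOKKOS_LAMBDA( const int i ) {
      // work item i executes on threads belonging to 'instance'
      (void) i;
    });
}
```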

## Host Execution Space Instances

  *  A host-side *control* thread dispatches work to an instance

  *  `main` is the initial control thread

  *  A host execution space instance is an organized thread pool

  *  All instances are disjoint, i.e. hardware resources are not shared between instances

  *  Exactly one control thread is associated with
     an instance and only that control thread may
     dispatch work to that instance

  *  The control thread is a member of the instance

  *  The pool of threads associated with an instance cannot be changed during that instance's existence

  *  The pool of threads associated with an instance may be masked

    -  Allows work to be dispatched to a subset of the pool

    -  Example: only one hyperthread per core of the instance

    -  A mask can be applied when the policy of a parallel algorithm is created

    -  Masking is portable because it is defined as the ceiling of a fraction in [0.0, 1.0]
       of the available resources (see the sketch after this list)

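As a concrete illustration of the fraction-based mask, the number of pool threads selected is the ceiling of the fraction times the pool size; the helper name below is illustrative and not part of the proposed API:

```
#include <cmath>

// Illustrative helper (not part of the proposed API): number of threads
// a mask selects from a pool, defined as ceil( fraction * pool_size ).
// The fraction is clamped to [0.0, 1.0] so the result is portable
// across instances with different pool sizes.
inline int masked_thread_count( double fraction, int pool_size )
{
  if ( fraction < 0.0 ) fraction = 0.0;
  if ( fraction > 1.0 ) fraction = 1.0;
  return static_cast<int>( std::ceil( fraction * pool_size ) );
}

// Example: a 0.5 mask on a 16-thread instance with two hyperthreads per
// core selects ceil(0.5 * 16) = 8 threads, i.e. one hyperthread per core.
```
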
```
class ExecutionSpace {
public:
  using execution_space = ExecutionSpace;
  using memory_space = ...;
  using device_type = Kokkos::Device<execution_space, memory_space>;
  using array_layout = ...;
  using size_type = ...;
  using scratch_memory_space = ...;


  class Instance
  {
    int thread_pool_size( int depth = 0 );
    ...
  };

  class InstanceRequest
  {
  public:
    using Control = std::function< void( Instance * )>;

    InstanceRequest( Control control
                   , unsigned thread_count
                   , unsigned use_numa_count = 0
                   , unsigned use_cores_per_numa = 0
                   );

  };

  static bool in_parallel();

  static bool sleep();
  static bool wake();

  static void fence();

  static void print_configuration( std::ostream &, const bool detailed = false );

  static void initialize( unsigned thread_count = 0
                        , unsigned use_numa_count = 0
                        , unsigned use_cores_per_numa = 0
                        );

  // Partition the current instance into the requested instances
  // and run the given functions on the corresponding instances.
  // Blocks until all the partitioned instances complete, after which
  // the original instance is restored.
  //
  // Requires that the space has already been initialized.
  // Requires that the request can be satisfied by the current instance,
  //   i.e. the sum of the requested thread counts must be less than
  //   max_hardware_threads().
  //
  // Each control functor will accept a handle to its new default instance.
  // Each instance must be independent of all other instances,
  //   i.e. no assumption on scheduling between instances.
  // The user is responsible for checking the return code for errors.
  static int run_instances( std::vector< InstanceRequest > const & requests );

  static void finalize();

  static int is_initialized();

  static int concurrency();

  static int thread_pool_size( int depth = 0 );

  static int thread_pool_rank();

  static int max_hardware_threads();

  static int hardware_thread_id();
};

```
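
A usage sketch of the proposed `run_instances` interface: the control thread partitions its instance into two smaller, independent instances, each driven by its own control functor. The thread counts and the work inside the functors are illustrative only.

```
// Usage sketch for the proposed run_instances interface (illustrative).
// Assumes the ExecutionSpace interface declared above; <vector> and
// <functional> are needed for std::vector and std::function.
void example_partition()
{
  std::vector< ExecutionSpace::InstanceRequest > requests;

  requests.emplace_back(
    []( ExecutionSpace::Instance * inst ) {
      // coarse-grain task A: dispatch parallel work on 'inst'
      (void) inst;
    },
    8 /* thread_count */ );

  requests.emplace_back(
    []( ExecutionSpace::Instance * inst ) {
      // coarse-grain task B: no assumption on scheduling relative to task A
      (void) inst;
    },
    4 /* thread_count */ );

  // Blocks until both partitioned instances complete; the original
  // instance is then restored.
  const int err = ExecutionSpace::run_instances( requests );
  if ( err ) { /* the user is responsible for handling the error */ }
}
```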