1<!--
2(pandoc `#--from gfm` --to html --standalone --metadata title=" " $0 > $0.html) && firefox --new-window $0.html; sleep 5; rm $0.html; exit
3-->
4# [Boost.]Multi
5
6(not an official Boost library)
7
8_© Alfredo A. Correa, 2018-2021_
9
10`Multi` provides multidimensional array access to contiguous or regularly contiguous memory (or ranges).
11It shares the goals of [Boost.MultiArray](https://www.boost.org/doc/libs/1_69_0/libs/multi_array/doc/index.html),
12although the code is completely independent and the syntax has slight differences or has been extended.
13`Multi` and `Boost.MultiArray` types can be used interchangeably for the most part, they differ in the semantics of reference and value types.
14
15Multi aims to simplify the semantics of Boost.MultiArray and make it more compatible with the Standard (STL) Algorithms and special memory.
16It requires C++14.
17
18Some features:
19
20* Arbitrary pointer types (minimal requirements)
21* Simplified implementation (~1200 lines)
22* Fast access of subarrays (view) types
23* Value semantics of multi-dimensional array container
24* Better semantics of subarray (view) types
25* Interoperability with other libraries, STL, ranges,
26
27(Do not confuse this library with Boost.MultiArray or Boost.MultiIndex.)
28
29
30## Contents
31[[_TOC_]]
32
33## Installation and Tests
34
35`Multi` doesn't require instalation, single file `#include<multi/array.hpp>` is enough to use the full core library.
36`Multi`'s _only_ dependecy is the standard C++ library.
37
38It is important to compile programs that use the library with a decent level of optimization (e.g. `-O2`) to avoid slowdown if indiviudual element-access is intensively used.
39For example, when testing speed, please make sure that you are compiling in release mode (`-DNDEBUG`) and with optimizations (`-O3`),
40if your test involves mathematical operations add arithmetic optimizations (`-Ofast`) to compare with Fortran code.
41
42A CMake build system is provided to automatically run basic tests.
43Test do depend on Boost.Test.
44
45```bash
46git clone https://gitlab.com/correaa/boost-multi.git multi
47cd multi
48```
49```bash
50#export CXX="nvcc -DBOOST_PP_VARIADICS=1 -x cu -O3"  #optional spec. compiler
51mkdir -p test/build
52cd test/build
53cmake ..
54make -j
55make test -j
56```
57
58The code is developed on `clang` (10.0), `gcc` (9.3) and `nvcc` 11 compilers, and [tested regularly ](https://gitlab.com/correaa/boost-multi/pipelines) with clang 9.0, NVCC 10.1, Intel (19.1), and PGI(nvc++) 20.7 compilers.
59For detailed compilation instructions of test see the Continuous Integration (CI) definition file https://gitlab.com/correaa/boost-multi/-/blob/master/.gitlab-ci.yml
60
61## Types
62
63* `multi::array<T, D, A>`: Array of dimension `D`, it has value semantics if `T` has value semantics. Memory is requested by allocator of type `A`, should support stateful allocators.
64* `multi::array_ref<T, D, P = T*>`: Array interpretation of a random access range, usually a memory block. It has reference semantics. Thanks to (non-virtual) inheritance an `array<T, D, A>` is-a `array_ref<T, D, A::pointer>`.
65* other derived "unspecified types" fulfil (a still loosely defined) `MultiArrayView` concept, for example by taking partial indices or rotations (transpositions). These reference types cannot be stored except through life-time extensions `auto&&`. Due to language limitations `auto` will not deduce a corresponding value-sematics type; for this reason it is necessary to use a "decay" idiom to obtain value object.
66* `MultiArrayView<T,D,P>::(const_)iterator`: Iterator to subarrays of dimension `D - 1`. For `D == 1` this is an iterator to an element. This types are generated by `begin` and `end` functions.
67* `MultiArrayView<T, D, P>::(const_)reference`: Reference to subarrays of dimension `D - 1`. For `D > 1` this are not true C++-references but types emulate them (with reference semantics), therefore `auto` is not well behaved. For `D==1` this is a true C++ reference to an elements. These types are generated by dereferencing iterators, e.g. `*begin(MA)`.
68
69## Basic Usage
70
71Declare an array specifying the element type and the dimension.
72Elements can be input with nested braced notation.
73```cpp
74std::array<double, 2> A = {
75	{1, 2, 3}
76	{4, 5, 6}
77};
78```
79
80The size is automatically deduced; the first dimension are the (two) "rows" above.
81
82```cpp
83assert( A.size()==2 );
84assert( std::get<1>(A.sizes()) == 3 );
85```
86
87The value of an array can be copied, moved, and compared.
88Copies are equal but independent.
89```cpp
90std::array<double, 2> B = A;
91assert( extensions(B) == extensions(A) );
92assert(  B[0][1] ==  A[0][1] );
93assert( &B[0][1] != &A[0][1] );
94assert( B == A );
95```
96
97Array can be initialized by the size alone, in which case the element values are default constructed:
98
99```cpp
100std::array<double, 3> C({3, 4, 5}); // 3*4*5 = 60 elements
101```
102
103Arrays can be passed by value or by reference, most of the time they should be passed through generic parameters.
104Most useful function work on the concept of array rather than on a concrete type.
105
106```cpp
107template<class ArrayDouble2D> // instead of the over specific argument std::array<double, 2>
108double const& element_1_1(ArrayDouble2D const& m){return m[1][1];}
109...
110assert( element_1_1(A) == A[1][1] );
111```
112
113These generic function arguments that are not intended to be modified are passed by `const&`; otherwise pass by forward-reference `&&`.
114In this way the functions can be called on subblocks of larger matrices.
115
116```cpp
117assert( &element_1_1(C3D[0]) == &C3D[0][1][1] );
118```
119
120## Advanced Usage
121
122We create a static C-array of `double`s, and refer to it via a bidimensional array `multi::array_ref<double, 2>`.
123
124```cpp
125	#include "../array_ref.hpp"
126	#include "../array.hpp"
127
128	#include<algorithm> // for sort
129	#include<iostream> // for print
130
131	namespace multi = boost::multi;
132	using std::cout; using std::cerr;
133
134	int main(){
135		double d2D[4][5] = {
136			{150, 16, 17, 18, 19},
137			{ 30,  1,  2,  3,  4},
138			{100, 11, 12, 13, 14},
139			{ 50,  6,  7,  8,  9}
140		};
141		multi::array_ref<double, 2> d2D_ref{&d2D[0][0], {4, 5}};
142		...
143```
144
145Note that the syntax of creating a reference array involves passing the pointer to a memory block (20 elements here) and the logical dimensions of that memory block (4 by 5 here).
146
147Next we print the elements in a way that corresponds to the logical arrangement:
148
149```cpp
150		...
151		for(auto i : d2D_ref.extension(0)){
152			for(auto j : d2D_ref.extension(1))
153				cout << d2D_ref[i][j] <<' ';
154			cout <<'\n';
155		}
156		...
157```
158
159This will output:
160
161> ```cpp
162> 150 16 17 18 19
163> 30 1 2 3 4
164> 100 11 12 13 14
165> 50 6 7 8 9
166> ```
167
168It is sometimes said (by Sean Parent) that the whole of STL algorithms can be seen as intermediate pieces to implement`std::stable_sort`.
169Pressumably if one can sort over a range, one can perform any other standard algorithm.
170
171```cpp
172		...
173		std::stable_sort( begin(d2D_ref), end(d2D_ref) );
174		...
175```
176
177If we print this we will get
178
179> ```cpp
180> 30 1 2 3 4
181> 50 6 7 8 9
182> 100 11 12 13 14
183> 150 16 17 18 19
184> ```
185
186
187The array has been changed to be in row-based lexicographical order.
188Since the sorted array is a reference to the original data, the original array has changed.
189
190```cpp
191		...
192		assert( d2D[1][1] == 6 );
193		...
194```
195
196(Note that `std::*sort` cannot be applied directly to a multidimensional C-array or to Boost.MultiArray types.)
197
198If we want to order the matrix in a per-column basis we need to "view" the matrix as range of columns. This is done in the bidimensional case, by accessing the matrix as a range of columns:
199
200```cpp
201		...
202		std::stable_sort( d2D_ref.begin(1), d2D_ref.end(1) );
203	}
204```
205
206Which will transform the matrix into.
207
208> ```cpp
209> 1 2 3 4 30
210> 6 7 8 9 50
211> 11 12 13 14 100
212> 16 17 18 19 150
213> ```
214
215In other words, a matrix of dimension `D` can be viewed simultaneously as `D` different ranges of different "transpositions" by passing an interger value to `begin` and `end` indicating the preferred dimension.
216`begin(0)` is equivalent to `begin()`.
217
218## Initialization
219
220`array_ref` is initialized from a preexisting contiguous range, the index extensions should compatible with the total number of elements.
221
222```cpp
223double* dp = new double[12];
224multi::array_ref<double, 2> A({3,4}, dp);
225multi::array_ref<double, 2> B({2,6}, dp);
226...
227delete[] dp;
228```
229`array` is initialized by specifying the index extensions (and optionally a default value) or alternatively from a rectangular list.
230
231```cpp
232/*In C++17 the element-type and the dimensionality can be omitted*/
233multi::array/*<double, 1>*/ A1 = {1.,2.,3.};
234                     assert(A1.dimensionality==1 and A1.num_elements()==3);
235multi::array/*<double, 2>*/ A2 {
236	 {1.,2.,3.},
237	 {4.,5.,6.}
238};                   assert(A2.dimensionality==2 and A2.num_elements()==2*3);
239multi::array/*<double, 3>*/ const A3 = {
240    {{ 1.2,  0.}, { 2.4, 1.}},
241    {{11.2,  3.}, {34.4, 4.}},
242    {{15.2, 99.}, {32.4, 2.}}
243};                   assert(A3.dimensionality==3 and A3.num_elements()==3*2*2);
244```
245
246## Iteration
247
248Accessing arrays by iterators (`begin`/`end`) enables the use of many iterator based algorithms (see the sort example above).
249`begin/end(A)` (or equivalently `A.begin/end()`) gives iterators that linear and random access in the leading dimension.
250
251`A.begin/end(n)` gives access in non-leading nested dimension number `n`.
252
253`cbegin/cend(A)` (or equivalently `A.cbegin/cend()`) gives read-only iterators.
254
255For example in three dimensional array,
256
257	(cbegin(A)+1)->operator[](1).begin()[0] = 342.4; //error, read-only
258	(begin(A)+1)->operator[](1).begin()[0] = 342.4; // assigns to A[1][1][0]
259	assert( (begin(A)+1)->operator[](1).begin()[0] == 342.4 );
260
261As an example, this function allows printing arrays of arbitrary dimension into a linear comma-separated form.
262
263```cpp
264void print(double const& d){cout<<d;};
265template<class MultiArray>
266void print(MultiArray const& ma){
267	cout<<"{";
268	if(not ma.empty()){
269		print(*cbegin(ma));
270		std::for_each(cbegin(ma)+1, cend(ma), [](auto&& e){cout<<","; print(e);});
271	}
272	cout<<"}";
273}
274...
275print(A);
276```
277> {{{1.2,1.1},{2.4,1}},{{11.2,3},{34.4,4}},{{15.2,99},{32.4,2}}}
278
279
280Except for those corresponding to the one-dimensional case, derreferencing iterators generally produce proxy-reference objects.
281Therefore this is not allowed:
282
283    auto row = *begin(A); // compile error
284
285This because `row` doesn't have the expected value semantics, and didn't produce any data copy.
286However this express the intention better
287
288    decltype(A)::value_type row = *begin(A); // there is a real copy.
289
290In my experience, however, this produces a more consistent idiom to hold references without copying elements.
291
292    auto const& crow = *cbegin(A); // same as decltype(A)::const_reference crow = *cbegin(A);
293    auto&&       row = * begin(A); // same as decltype(A)::      reference  row = * begin(A);
294
295## Indexing
296
297Arrays provide random access to elements or subviews.
298Many algorithms on arrays are oriented to linear algebra, which are ubiquitously implemented in terms of multidimensional index access.
299
300### Element access and partial access
301
302Index access mimics that of C-fixed sizes arrays, for example a 3-dimensional array will access to an element by `m[1][2][3]`,
303which can be used for write and read operations.
304
305Partial index arguments `m[1][2]` generate a view 1-dimensional object.
306Transpositions are also multi-dimensional arrays views in which the index are *logically* rearranged, for example `m.rotated(1)[2][3][1] == rotated(m)[2][3][1] == m[1][2][3]`.
307(rotate refers to the fact that the logical indices are rotated.)
308
309As an illustration of an algorithm based on index access (as opposed to iterators),
310this example code implements Gauss Jordan Elimination without pivoting:
311
312```cpp
313template<class Matrix, class Vector>
314auto gj_solve(Matrix&& A, Vector&& y)->decltype(y[0]/=A[0][0], y){
315	std::ptrdiff_t Asize = size(A);
316	for(std::ptrdiff_t r = 0; r != Asize; ++r){
317		auto&& Ar = A[r];
318		auto&& Arr = Ar[r];
319		for(std::ptrdiff_t c = r + 1; c != Asize; ++c) Ar[c] /= Arr;
320		auto const yr = (y[r] /= Arr);
321		for(std::ptrdiff_t r2 = r + 1; r2 != Asize; ++r2){
322			auto&& Ar2 = A[r2];
323			auto const& Ar2r = Ar2[r]; // auto&& Ar = A[r];
324			for(std::ptrdiff_t c = r + 1; c != Asize; ++c) Ar2[c] -= Ar2r*Ar[c];
325			y[r2] -= Ar2r*yr;
326		}
327	}
328	for(std::ptrdiff_t r = Asize - 1; r > 0; --r){
329		auto const& yr = y[r];
330		for(std::ptrdiff_t r2 = r-1; r2 >=0; --r2) y[r2] -= yr*A[r2][r];
331	}
332	return y;
333}
334```
335
336This function can be applied to a `multi::array` container:
337
338```cpp
339multi::array<double, 2> A = {{-3., 2., -4.},{0., 1., 2.},{2., 4., 5.}};
340multi::array<double, 1> y = {12.,5.,2.}; //(M); assert(y.size() == M); iota(y.begin(), y.end(), 3.1);
341gj_solve(A, y);
342```
343
344and also to a combination of `MultiArrayView`-type objects:
345
346```cpp
347multi::array<double, 2> A({6000, 7000}); std::iota(A.data(), A.data() + A.num_elements(), 0.1);
348std::vector<double> y(3000); std::iota(y.begin(), y.end(), 0.2);
349gj_solve(A({1000, 4000}, {0, 3000}), y);
350```
351
352### Slices and strides
353
354Given an array, a slice in the first dimension can be taken with the `sliced` function. `sliced` takes two arguments, the first index of the slice and the last index (not included) of the slice. For example,
355
356```cpp
357multi::array<double, 2> d2D({4, 5});
358assert( d2D.size(0) == 4 and d2D.size(1) == 5 );
359
360auto&& d2D_sliced = d2D.sliced(1, 3); // {{d2D[1], d2D[2]}}
361assert( d2D_sliced.size(0) == 2 and d2D_sliced.size(1) == 5 );
362```
363
364The number of rows in the sliced matrix is 2 because we took only two rows, row 1 and row 2 (row 3 is excluded).
365
366In the same way a strided view of the original array can be taken with the `strided` function.
367
368```cpp
369auto&& d2D_strided = d2D.strided(2); // {{ d2D[0], d2D[1] }};
370assert( d2D_strided.size(0) == 2 and d2D_strided.size(1) == 5 );
371```
372
373In this case the number of rows is 2 because, out of the 4 original rows we took one every two.
374
375Operations can be combined in a single line:
376
377```cpp
378auto&& d2D_slicedstrided = d2D.sliced(1, 3).strided(2); // {{ d2D[1] }};
379assert( d2D_slicedstrided.size(0) == 1 and d2D_slicedstrided.size(1) == 5 );
380```
381
382For convenience, `A.sliced(a, b, c)` is the same as `A.sliced(a, b).strided(c)`.
383
384By combining `rotated`, `sliced` and `strided` one can take sub arrays at any dimension.
385For example in a two dimensional array one can take a subset of columns by defining.
386
387```cpp
388auto&& subA = A.rotated(1).strided(1, 3).sliced(2).rotated(-1);
389```
390
391Other notations are available, but when in doubt the `rotated/strided/sliced/rotated` and combinations of them idioms provides the most control over the subview operations.
392(At the moment the `strided` argument has to divide the total size of the slice (or matrix), otherwise the behavior is undefined.)
393
394Blocks (slices) in multidimensions can be obtained but pure index notation using `.operator()`:
395
396```cpp
397multi::array<double, 2> A({6, 7}); // 6x7 array
398A({1, 4}, {2, 4}) // 3x2 array, containing indices 1 to 4 in the first dimension and 2 to 4 in the second dimension.
399```
400
401## Concept Requirements
402
403The design tries to impose the minimum possible requirements over the used referred types.
404Pointer-like random access types can be used as substitutes of built-in pointers.
405
406```cpp
407namespace minimal{
408    template<class T> class ptr{ // minimalistic pointer
409    	T* impl_;
410    	T& operator*() const{return *impl_;}
411    	auto operator+(std::ptrdiff_t n) const{return ptr{impl_ + n};}
412    //	operator[], operator+=, etc are optional but not necessary
413    };
414}
415
416int main(){
417	double* buffer = new double[100];
418	multi::array_ref<double, 2, minimal::ptr<double> > CC(minimal::ptr<double>{buffer}, {10, 10});
419	CC[2]; // requires operator+
420	CC[1][1]; // requires operator*
421	CC[1][1] = 9;
422	assert(CC[1][1] == 9);
423	delete[] buffer;
424}
425```
426
427### Linear Sequences: Pointers
428
429An `array_ref` can reference to an arbitrary random access iterator sequence.
430This way, any linear (random access) sequence (e.g. `raw memory`, `std::vector`, `std::queue`) can be efficiently arranged as a multidimensional array.
431
432```cpp
433std::vector<double> buffer(100);
434multi::array_ref<double, 2, std::vector<double>::iterator> A({10, 10}, buffer.begin());
435A[1][1] = 9;
436assert(A[1][1] == 9);
437assert(buffer[11]==9);
438```
439Since `array_ref` does not manage the memory associated with it, the reference can be simply dangle if the `buffer` memory is reallocated (e.g. by `resize`).
440
441### Special Memory: Allocators and Fancy Pointers
442
443`array`'s manages its memory through allocators.
444It can handle special memory, as long as the underlying types behave coherently, these include fancy pointers and fancy references.
445Associated fancy pointers and fancy reference (if any) are deduced from the allocator types.
446
447The behavior regarding memory managament of the [fancy pointers](https://en.cppreference.com/w/cpp/named_req/Allocator#Fancy_pointers) can be customized (if necessary) by specializations of some or all of these functions:
448
449```cpp
450destroy(a, first, last)
451destroy_n(a, first, n) -> last
452uninitialized_copy_n(a, first, n, dest) -> last;
453uninitialized_fill_n(a, first, n, value) -> last
454uninitialized_default_construct_n(a, first, n) -> last
455uninitialized_value_construct_n(a, first, n) -> last
456```
457
458where `a` is the special allocator, `n` is a size (usually the number of elements), `first`, `last` and `dest` are fancy pointers.
459
460Copying underlying memory can be customized by specializing
461
462```cpp
463copy_n(first, n, dest)
464fill_n(first, n, value)
465```
466
467Specific cases of fancy memory are file-mapped memory or interprocess shared memory.
468This example illustrates memory persistency by combining with Boost.Interprocess library.
469The arrays support their allocators and fancy pointers (`boost::interprocess::offset_ptr`).
470
471```cpp
472#include <boost/interprocess/managed_mapped_file.hpp>
473using namespace boost::interprocess;
474using manager = managed_mapped_file;
475template<class T> using mallocator = allocator<T, manager::segment_manager>;
476decltype(auto) get_allocator(manager& m){return m.get_segment_manager();}
477
478template<class T, auto D> using marray = multi::array<T, D, mallocator<T>>;
479
480int main(){
481{
482	manager m{create_only, "mapped_file.bin", 1 << 25};
483	auto&& arr2d = *m.construct<marray<double, 2>>("arr2d")(std::tuple{1000, 1000}, 0.0, get_allocator(m));
484	arr2d[4][5] = 45.001;
485}
486// imagine execution restarts here
487{
488	manager m{open_only, "mapped_file.bin"};
489	auto&& arr2d = *m.find<marray<double, 2>>("arr2d").first;
490	assert( arr2d[7][8] == 0. );
491	assert( arr2d[4][5] == 45.001 );
492	m.destroy<marray<double, 2>>("arr2d");
493}
494}
495```
496
497# Interoperability with other software
498
499## STL (Standard Template Library)
500
501The fundamental goal of the library is that the arrays and iterators can be used with STL algorithms out-of-the-box with a reasonable efficiency.
502The most dramatic example of this is that `std::sort` works with array as it is shown in a previous example.
503
504Along with STL itself, the library tries to interact with other existing C++ libraries.
505
506## Range v3
507
508```cpp
509#include <range/v3/all.hpp>
510int main(){
511
512	multi::array const d2D = {
513		{ 0,  1,  2,  3},
514		{ 5,  6,  7,  8},
515		{10, 11, 12, 13},
516		{15, 16, 17, 18}
517	};
518	assert( ranges::inner_product(d2D[0], d2D[1], 0.) == 6+2*7+3*8 );
519	assert( ranges::inner_product(d2D[0], rotated(d2D)[0], 0.) == 1*5+2*10+15*3 );
520
521	static_assert(ranges::RandomAccessIterator<multi::array<double, 1>::iterator>{});
522	static_assert(ranges::RandomAccessIterator<multi::array<double, 2>::iterator>{});
523}
524```
525
526## Boost.Interprocess
527
528Using Interprocess allows for shared memory and for persistent mapped memory.
529
530```cpp
531#include <boost/interprocess/managed_mapped_file.hpp>
532#include "multi/array.hpp"
533#include<cassert>
534
535namespace bip = boost::interprocess;
536using manager = bip::managed_mapped_file;
537template<class T> using mallocator = bip::allocator<T, manager::segment_manager>;
538auto get_allocator(manager& m){return m.get_segment_manager();}
539
540namespace multi = boost::multi;
541template<class T, int D> using marray = multi::array<T, D, mallocator<T>>;
542
543int main(){
544{
545	manager m{bip::create_only, "bip_mapped_file.bin", 1 << 25};
546	auto&& arr2d = *m.construct<marray<double, 2>>("arr2d")(std::tuple{1000, 1000}, 0., get_allocator(m));
547	arr2d[4][5] = 45.001;
548	m.flush();
549}
550{
551	manager m{bip::open_only, "bip_mapped_file.bin"};
552	auto&& arr2d = *m.find<marray<double, 2>>("arr2d").first;
553	assert( arr2d[4][5] == 45.001 );
554	m.destroy<marray<double, 2>>("arr2d");//	eliminate<marray<double, 2>>(m, "arr2d");}
555}
556}
557```
558
559(Similarly works with [LLNL's Meta Allocator](https://github.com/llnl/metall))
560
561## Cuda thrust
562
563```cpp
564#include "multi/adaptors/thrust/allocator_traits.hpp"
565#include "multi/adaptors/thrust/algorithms.hpp"
566#include "multi/array.hpp"
567
568namespace multi = boost::multi;
569int main(){
570	multi::array<double, 2, thrust::device_allocator<double>> A2({10,10});
571	multi::array<double, 2, thrust::device_allocator<double>> B2({10,10});
572	A2[5][0] = 50.;
573	thrust::copy(begin(rotated(A2)[0]), end(rotated(A2)[0]), begin(rotated(B2)[0]));
574	assert( B2[5][0] == 50. );
575}
576```
577
578## TotalView
579
580TotalView visual debugger (commercial) can display arrays in human-readable form (for simple types, like `double` or `std::complex`).
581To use it, simply `#include "multi/adaptors/totalview.hpp"` and link to the TotalView libraries, compile and run the code with the debugger.
582
583## Memory Resources
584
585The library is compatible with C++17's polymorphic memory resources which allows using preallocated buffers.
586This enables the use of stack memory or in order to reduce the number of allocations.
587For example, this code ends up with `buffer` containing the string `"aaaabbbbbb  "`.
588
589```cpp
590#include<pmr>
591int main(){
592	char buffer[13] = "____________"; // a small buffer on the stack
593	std::pmr::monotonic_buffer_resource pool{std::data(buffer), std::size(buffer)}; // or multi::memory::monotonic<char*>
594
595	multi::array<char, 2, std::pmr::polymorphic_allocator<char>> A({2, 2}, 'a', &pool); // or multi::memory::monotonic_allocator<double>
596	multi::array<char, 2, std::pmr::polymorphic_allocator<char>> B({3, 2}, 'b', &pool);
597}
598```
599
600The library comes with its own customized (non-polymorphic) memory resources if, for any reason, the standard PMRs are not sufficiently general.
601The headers to include are:
602
603```cpp
604#include<multi/memory/monotonic.hpp> // multi::memory::monotonic<char*> : no memory reclaim
605#include<multi/memory/stack.hpp>     // multi::memory::stack<char*>     : FIFO memory reclaim
606```
607
608# Technical points
609
610### What's up with the multiple bracket notation?
611
612The chained bracket notation (`A[i][j][k]`) allows to refer to elements and subarrays lower dimensional subarrays in a consistent and _generic_ manner and it is the recommended way to access the array objects.
613It is a frequently raised question whether the chained bracket notation is good for performance, since it appears that each utilization of the bracket leads to the creation of a temporary which in turn generates a partial copy of the layout.
614Moreover, this goes against [historical recommendations](https://isocpp.org/wiki/faq/operator-overloading#matrix-subscript-op).
615
616It turns out that [modern compilers with a fair level of optimization (`-O2`)](https://godbolt.org/z/3fYd5c) can elide these temporary objects, so that `A[i][j][k]` generates identical assembly code as `A.base() + i*stride1 + j*stride2 + k*stride3` (+offsets not shown).
617
618In a subsequent optimization, constant indices can have their "partial stride" computation removed from loops.
619As a result, these two loops lead to the [same machine code](https://godbolt.org/z/z1se74):
620
621```cpp
622    for(int j = 0; j != nj; ++j)
623        ++A[i][j][k];
624```
625```cpp
626    double* Ai_k = A.base() + i*A_stride1 + k*A_stride3;
627    for(int j = 0; j != nj; ++jj)
628        ++(*(Ai_k + j*A_stride2));
629```
630
631Incidentally, the library also supports parenthesis notation with multiple indices `A(i, j, k)` for element or partial access, but it does so for accidental reasons as part of a more general syntax to generate sub-blocks.
632In any case `A(i, j, k)` is expanded to `A[i][j][k]` internally in the library when `i, j, k` are normal integer indices.
633Additionally, array coordinates can be directly stored in tuple-like data structures, allowing this functional syntax:
634
635```cpp
636std::array p = {2,3,4};
637std::apply(A, p) = 234; // A[2][3][4] = 234;
638```
639
640### Customizing recursive operations: SCARY iterators
641
642A custom level of customization can be achieved by intercepting internal recursive algorithms.
643Multi iterators are [SCARY](http://www.open-std.org/jtc1/sc22/WG21/docs/papers/2009/n2980.pdf).
644SCARY means that they are independent of any container and can be accessed generically through their dimension and underlying pointer types:
645
646For example, `boost::multi::array_iterator<double, 2, double*> it` is a row (or column) iterator of an array of dimension 2 or higher, whose underlying pointer type is `double*`.
647This row (or column) and subsequent ones can be accessed by the normal iterator(pointer) notation `*it` and `it[n]` respectively.
648Indirection `it->...` is supported (even for iterators if high dimension).
649The base pointer, the strides and the size of the arrow can be accessed by `base(it)`, `stride(it)`, `it->size()`.
650
651The template arguments of the iterator can be used to customize operations that are recursive (and possibly inefficient in certain context) in the library:
652
653```cpp
654namespace boost{namespace multi{
655template<class It, class T>  // custom copy 1D (aka strided copy)
656void copy(It first, It last, multi::array_iterator<T, 1, fancy::ptr<T> > dest){
657	assert( stride(first) == stride(last) );
658	std::cerr<<"1D copy(it1D, it1D, it1D) with strides "<< stride(first) <<" "<< stride(dest) <<std::endl;
659}
660
661template<class It, class T> // custom copy 2D (aka double strided copy)
662void copy(It first, It last, multi::array_iterator<T, 2, fancy::ptr<T> > dest){
663	assert( stride(first) == stride(last) );
664	std::cerr<<"2D copy(It, It, it2D) with strides "<< stride(first) <<" "<< stride(dest) <<std::endl;
665}
666}}
667```
668
669For example, if your custom pointers refers a memory type in which 2D memory copying (strided copy) is faster than sequencial copying, that kind of instruction can be ejecuted when the library internally calls `copy`.
670This customization must be performed (unfortunately) in the `boost::multi` namespace (this is where the Multi iterators are defined) and the customization happens through matching the dimension and the pointer type.
671
672