1Flow Tutorial
2=============
3
4   * [Using Flow](#using-flow)
5      * [Keywords/primitives](#keywordsprimitives)
6         * [Promise, Future](#promise-future)
7         * [Network traversal](#network-traversal)
8         * [wait()](#wait)
9         * [ACTOR](#actor)
10         * [State Variables](#state-variables)
11         * [Void](#void)
12         * [PromiseStream<>, FutureStream<>](#promisestream-futurestream)
13         * [waitNext()](#waitnext)
14         * [choose / when](#choose--when)
15         * [Future composition](#future-composition)
16      * [Design Patterns](#design-patterns)
17         * [RPC](#rpc)
18         * [ACTOR return values](#actor-return-values)
19      * [“gotchas”](#gotchas)
20         * [Actor compiler](#actor-compiler)
21            * [Switch statements](#switch-statements)
22            * [try/catch with no wait()](#trycatch-with-no-wait)
23         * [ACTOR cancellation](#actor-cancellation)
24   * [Memory Management](#memory-management)
25      * [Reference Counting](#reference-counting)
26         * [Potential Gotchas](#potential-gotchas)
27            * [Reference Cycles](#reference-cycles)
28      * [Arenas](#arenas)
29         * [Potential Gotchas](#potential-gotchas-1)
30            * [Function Creating and Returning a non-Standalone Ref Object](#function-creating-and-returning-a-non-standalone-ref-object)
31            * [Assigning Returned Standalone Object to non Standalone Variable](#assigning-returned-standalone-object-to-non-standalone-variable)
32            * [Use of Standalone Objects in ACTOR Functions](#use-of-standalone-objects-in-actor-functions)
33
34# Using Flow
35
36Flow introduces some new keywords and flow controls. Combining these into workable units
37also introduces some new design patterns to C++ programmers.
38
39## Keywords/primitives
40
41The essence of Flow is the capability of passing messages asynchronously between
42components. The basic data types that connect asynchronous senders and receivers are
43`Promise<>` and `Future<>`. The sender holds a `Promise<X>` to, sometime in the future, deliver
44a value of type `X` to the holder of the `Future<X>`. A receiver, holding a `Future<X>`, at some point
45needs the `X` to continue computation, and invokes the `wait(Future<> f)` statement to pause
46until the value is delivered. To use the `wait()` statement, a function needs to be declared as an
47ACTOR function, a special flow keyword which directs the flow compiler to create the necessary
48internal callbacks, etc. Similarly, When a component wants to deal not with one asynchronously
49delivered value, but with a series, there are `PromiseStream<>` and `FutureStream<>`. These
50two constructs allow for “reliable delivery” of messages, and play an important role in the design
51patterns.
52
53### Promise<T>, Future<T>
54
55`Promise<T>` and `Future<T>` are intrinsically linked (they go in pairs) and are two wrappers
56around a construct called `SingleAssignmentVar`, a variable that can be set only once. A
57Promise is a handle to a `SingleAssignmentVar` that allows for a one-time set of the value; a
58Future is a read-only handle to the variable that only allows reading of the value.
59
60The following example uses these two simple types:
61
62```c++
63Promise<int> p;
64Future<int> f = p.getFuture();
65p.send( 4 );
66printf( "%d\n", f.get() ); // f is already set
67```
68
69### Network traversal
70
71Promises and futures can be used within a single process, but their real strength in a distributed
72system is that they can traverse the network. For example, one computer could create a
73promise/future pair, then send the promise to another computer over the network. The promise
74and future will still be connected, and when the promise is fulfilled by the remote computer, the
75original holder of the future will see the value appear.
76
77[TODO: network delivery guarantees]
78
79### wait()
80
81Wait allows for the calling code to pause execution while the value of a `Future` is set. This
82statement is called with a `Future<T>` as its parameter and returns a `T`; the eventual value of the
83`Future`. Errors that are generated in the code that is setting the `Future`, will be thrown from
84the location of the `wait()`, so `Error`s must be caught at this location.
85
86The following example shows a snippet (from an ACTOR) of waiting on a `Future`:
87
88```c++
89Future<int> f = asyncCalculation(); // defined elsewhere
90int count = wait( f );
91printf( "%d\n", count );
92```
93
94It is worth nothing that, although the function `wait()` is declared in [actorcompiler.h](actorcompiler.h), this
95“function” is compiled by the Actor Compiler into a complex set of integrated statements and
96callbacks. It is therefore never present in generated code or at link time.
97**Note** : because of the way that the actor compiler is built, `wait()` must always assign the
98resulting value to a _newly declared variable._
99
100From 6.1, `wait()` on `Void` actors shouldn't assign the resulting value. So, the following code
101
102```c++
103Future<Void> asyncTask(); //defined elsewhere
104wait(asyncTask());
105```
106
107becomes
108
109```c++
110Future<Void> asyncTask(); //defined elsewhere
111wait(asyncTask());
112```
113
114### ACTOR
115
116The only code that can call the `wait()` function are functions that are themselves labeled with
117the “ACTOR” tag. This is the essential unit of asynchronous work that can be chained together
118to create complex message-passing systems.
119An actor, although declared as returning a `Future<T>`, simply returns a `T`. Because an actor
120may wait on the results of other actors, an actor must return either a `Future `or `void`. In most
121cases returning `void `is less advantageous than returning a `Future`, since there are
122implications for actor cancellation. See the Actor Cancellation section for details.
123
124The following simple actor function waits on the `Future` to be ready, when it is ready adds `offset` and returns the result:
125
126```c++
127ACTOR Future<int> asyncAdd(Future<int> f, int offset) {
128    int value = wait( f );
129    return value + offset;
130}
131```
132
133### State Variables
134
135Since ACTOR-labeled functions are compiled into a c++ class and numerous supporting
136functions, the variable scoping rules that normally apply are altered. The differences arise out
137of the fact that control flow is broken at wait() statements. Generally the compiled code is
138broken into chunks at wait statements, so scoping variables so that they can be seen in multiple
139“chunks” requires the `state `keyword.
140The following function waits on two inputs and outputs the sum with an offset attached:
141
142```c++
143ACTOR Future<int> asyncCalculation(Future<int> f1, Future<int> f2, int offset ) {
144    state int value1 = wait( f1 );
145    int value2 = wait( f2 );
146    return value1 + value2 + offset;
147}
148```
149
150### Void
151
152The `Void `type is used as a signalling-only type for coordination of asynchronous processes.
153The following function waits on an input, send an output to a `Promise`, and signals completion:
154
155```c++
156ACTOR Future<Void> asyncCalculation(Future<int> f, Promise<int> p, int offset ) {
157    int value = wait( f );
158    p.send( value + offset );
159    return Void();
160}
161```
162
163### PromiseStream<>, FutureStream<>
164
165PromiseStream ​and `FutureStream` are groupings of a series of asynchronous messages.
166
167
168These allow for two important features: multiplexing and network reliability, discussed later.
169They can be waited on with the `waitNext()` function.
170
171### waitNext()
172
173Like `wait()`, `waitNext()` pauses program execution and awaits the next value in a
174`FutureStream`. If there is a value ready in the stream, execution continues without delay. The
175following “server” waits on input, send an output to a `PromiseStream`:
176
177```c++
178ACTOR void asyncCalculation(FutureStream<int> f, PromiseStream<int> p, int offset ) {
179    while( true ) {
180        int value = waitNext( f );
181        p.send( value + offset );
182    }
183}
184```
185
186### choose / when
187
188The `choose / when` construct allows an Actor to wait for multiple `Future `events at once in a
189ordered and predictable way. Only the `when` associated with the first future to become ready
190will be executed. The following shows the general use of choose and when:
191
192```c++
193choose {
194    when( int number = waitNext( futureStreamA ) ) {
195        // clause A
196    }
197    when( std::string text = wait( futureB ) ) {
198        // clause B
199    }
200}
201```
202
203You can put this construct in a loop if you need multiple `when` clauses to execute.
204
205### Future composition
206
207Futures can be chained together with the result of one depending on the output of another.
208
209```c++
210ACTOR Future<int> asyncAddition(Future<int> f, int offset ) {
211    int value = wait( f );
212    return value + offset;
213}
214
215ACTOR Future<int> asyncDivision(Future<int> f, int divisor ) {
216    int value = wait( f );
217    return value / divisor;
218}
219
220ACTOR Future<int> asyncCalculation( Future<int> f ) {
221    int value = wait( asyncDivision(
222    asyncAddition( f, 10 ), 2 ) );
223    return value;
224}
225```
226
227
228## Design Patterns
229
230### RPC
231
232Many of the “servers” in FoundationDB that communicate over the network expose their interfaces as a struct of PromiseStreams--one for each request type. For instance, a logical server that keeps a count could look like this:
233
234```c++
235struct CountingServerInterface {
236    PromiseStream<int> addCount;
237    PromiseStream<int> subtractCount;
238    PromiseStream<Promise<int>> getCount;
239
240    // serialization code required for use on a network
241    template <class Ar>
242    void serialize( Ar& ar ) {
243        serializer(ar, addCount, subtractCount, getCount);
244    }
245};
246```
247
248Clients can then pass messages to the server with calls such as this:
249
250```c++
251CountingServerInterface csi = ...; // comes from somewhere
252csi.addCount.send(5);
253csi.subtractCount.send(2);
254Promise<int> finalCount;
255csi.getCount.send(finalCount);
256int value = wait( finalCount.getFuture() );
257```
258
259There is even a utility function to take the place of the last three lines: [TODO: And is necessary
260when sending requests over a real network to ensure delivery]
261
262```c++
263CountingServerInterface csi = ...; // comes from somewhere
264csi.addCount.send(5);
265csi.subtractCount.send(2);
266int value = wait( csi.getCount.getReply<int>() );
267```
268
269Canonically, a single server ACTOR that implements the interface is a loop with a choose
270statement between all of the request types:
271
272```c++
273ACTOR void serveCountingServerInterface(CountingServerInterface csi) {
274    state int count = 0;
275    loop {
276        choose {
277            when (int x = waitNext(csi.addCount.getFuture())){
278                count += x;
279            }
280            when (int x = waitNext(csi.subtractCount.getFuture())){
281                count -= x;
282            }
283            when (Promise<int> r = waitNext(csi.getCount.getFuture())){
284                r.send( count ); // goes to client
285            }
286        }
287    }
288}
289```
290
291In this example, the add and subtract interfaces modify the count itself, stored with a state
292variable. The get interface is a bit more complicated, taking a `Promise<int>` instead of just an
293int. In the interface class, you can see a `PromiseStream<Promise<int>>`. This is a common
294construct that is analogous to sending someone a self-addressed envelope. You send a
295promise to a someone else, who then unpacks it and send the answer back to you, because
296you are holding the corresponding future.
297
298### ACTOR return values
299
300An actor can have only one returned Future, so there is a case that one actor wants to perform
301some operation more than once:
302
303```c++
304ACTOR Future<Void> periodically(PromiseStream<Void> ps, int seconds) {
305    loop {
306        wait( delay( seconds ) );
307        ps.send(Void());
308    }
309}
310```
311
312In this example, the `PromiseStream `is actually a way for the actor to return data from some
313operation that it ongoing.
314
315## “gotchas”
316
317### Actor compiler
318
319There are some things about the actor compiler that can confuse and may change over time
320
321#### Switch statements
322
323Do not use these with wait statements inside!
324
325#### try/catch with no wait()
326
327When a `try/catch` block does not `wait()` the blocks are still decomposed into separate
328functions. This means that variables that you want to access both before and after such a block
329will need to be declared state.
330
331### ACTOR cancellation
332
333When the reference to the returned `Future` of an actor is dropped, that actor will be cancelled.
334Cancellation of an actor means that any `wait()`s that were currently active (the callback was
335currently registered) will be delivered an exception (`actor_cancelled`). In almost every case
336this exception should not be caught, though there are cetainly exceptions!
337
338# Memory Management
339
340## Reference Counting
341
342The FoundationDB solution uses reference counting to manage the lifetimes of many of its
343constituent classes. In order for a class `T` to be reference counted, the following two globally
344defined methods must be defined (see [FastRef.h](FastRef.h)):
345
346
347```c++
348void addref(T*);
349void delref(T*);
350```
351
352The easiest way to implement these methods is by making your class a descendant of
353`ReferenceCounted`.
354
355NOTE: Any descendants of `ReferenceCounted` should either have virtual destructors or be
356sealed. If you fail to meet these criteria, then references to descendants of your class will never
357be deleted.
358
359If you choose not to inherit from `ReferenceCounted`, you will have to manage the reference
360count yourself. One way this can be done is to define `void addref()` and `void delref()`
361methods on your class, which will make it compatible with the existing global `addref` and
362`delref` methods. Otherwise, you will need to create the global `addref` and `delref` methods
363for your class, as mentioned above. In either case, you will need to manage the reference
364count on your object and delete it when the count reaches 0. Note that the reference count
365should usually be initialized to 1, as the `addRef(T*)` function is not called when the object is
366created.
367
368To create a reference counted instance of a class `T`, you instantiate a `Reference<T>` on the
369stack with a pointer to your `T` object:
370
371```c++
372Reference<T> refCountedInstance(new T());
373```
374The `Reference<T>` class automatically calls addref on your `T` instance every time it is copied
375(such as by argument passing or assignment), but not when the object is initially created
376(consequently, `ReferenceCounted` classes are initialized with reference count 1). It will call
377`delref` on your `T` instance whenever a particular `Reference<T>` instance gets deleted (usually
378by going out of scope). When no more instances of `Reference<T>` holding a particular `T`
379instance exist, then that `T` instance will be destroyed.
380
381### Potential Gotchas
382
383#### Reference Cycles
384
385You must be cautious about creating reference cycles when using reference counting. For
386example, if two `Reference<T>` objects refer to each other, then without specific intervention
387their reference counts will never reach 0 and the objects will never be deleted.
388
389## Arenas
390
391
392In addition to using reference counting, the FoundationDB solution also uses memory pools to
393allocate buffers. In this scheme, buffers are allocated from a common pool, called an `Arena`,
394and remain valid for the entire lifetime of that `Arena`. When the `Arena` is destroyed, all of the
395memory it held for the buffers is deallocated along with it. As a general convention, types which
396can use these `Arenas` and do not manage their own memory are given the "`Ref`" suffix. When
397a `*Ref` object is being used, consideration should be given to how its buffers are being
398managed (much in the same way that you would consider memory management when you see
399a `T*`).
400
401As an example, consider the `StringRef` class. A `StringRef` is an object which contains a
402pointer to a sequence of bytes, but does not actually manage that buffer. Thus, if a `StringRef`
403is deallocated, the data remains intact. Conversely, if the data is deallocated, the `StringRef`
404becomes invalid. In order for the `StringRef` to manage its own buffer, we need to create an
405instance of the `Standalone<StringRef>` class:
406
407```c++
408Standalone<StringRef> str("data");
409```
410
411A `Standalone<T>` object has its own arena (technically, it is an `Arena`), and for classes like
412`StringRef` which support the use of arenas, the memory buffers used by the class are
413allocated from that arena. `Standalone<T>` is also a subclass of `T`, and so for all other purposes
414operates just like a `T`.
415
416There are a number of classes which support the use of arenas, and some which have
417convenience types for their `Standalone` versions (not a complete list):
418
419|        T         | Standalone<T> alias |
420|:----------------:|:-------------------:|
421| StringRef        | N/A                 |
422| LiteralStringRef | N/A                 |
423| KeyRef           | Key                 |
424| ValueRef         | Value               |
425| KeyValueRef      | KeyValue            |
426| KeyRangeRef      | KeyRange            |
427| KeySelectorRef   | KeySelector         |
428| VectorRef        | N/A                 |
429
430The `VectorRef<T>` class is an `std::vector`-like object which is used to manage a list of these
431`*Ref` objects. A `Standalone<VectorRef<T>>` has its own arena which can be used to store
432the buffers held by its constituents. In order for that to happen, one of the two deep insertion
433methods (`push_back_deep` or `append_deep`) should be used when placing items in the vector.
434The shallow insertion methods will hold the objects only; any arena-managed memory is not
435copied. Thus, the `Standalone<VectorRef<T>>` will hold the `T` objects without managing their
436memory. Note that the arena(s) used by the `VectorRef` need not be its own (and cannot be
437unless the `VectorRef` is a `Standalone` object), and are determined by arguments to the
438functions that insert items.
439
440`VectorRef<T>` can also be used with types besides the standard `Ref` types, in which case the
441deep copy methods should not be used. In this case, the `VectorRef<T>` object holds the items
442in an arena much like a normal vector would hold the items in its buffer. Again, the arena used
443by the `VectorRef<T>` need not be its own.
444
445When a `Standalone<T>` is copied (e.g. by argument passing or assignment) to another
446`Standalone<T>`, they will share the same memory. The actual memory contents of the arena
447are stored in a reference counted structure (`ArenaBlock`), so the memory will persist until all
448instances of `Arena` holding that memory are destroyed. If instead a `T` object is copied to a
449`Standalone<T>`, then its entire contents are copied into the arena of the new `Standalone<T>`
450object using a deep copy. Thus, it is generally more efficient to consistently use `*Ref` objects
451and manage the memory with something external, or to consistently use `Standalone<T>`
452objects (where assignments just increment reference counters) to avoid memory copies.
453
454### Potential Gotchas
455
456#### Function Creating and Returning a non-Standalone Ref Object
457
458A function which creates a `Ref` object should generally return a `Standalone` version of that
459object. Otherwise, make certain that the `Arena` on which that `Ref` object was created still exists
460when the caller uses the returned `Ref`.
461
462#### Assigning Returned Standalone Object to non Standalone Variable
463
464A caller which receives a `Standalone` return value should assign that return value to a
465`Standalone` variable. Consider the following example:
466
467```c++
468Standalone<StringRef> foo() {
469    return Standalone<StringRef>("string");
470}
471
472void bar() {
473    StringRef val = foo();
474}
475```
476
477When `val` is copy-assigned in `bar`, its data is stored in the `Arena` of the `StringRef` that was
478returned from `foo`. When this returned `StringRef` is subsequently deallocated, `val` will no
479longer be valid.
480
481
482#### Use of Standalone Objects in ACTOR Functions
483
484Special care needs to be taken when using using `Standalone` values in actor functions.
485Consider the following example:
486
487```
488ACTOR Future<void> foo(StringRef param)
489{
490    //Do something
491    return Void();
492}
493
494ACTOR Future<Void> bar()
495{
496    Standalone<StringRef> str("string");
497    wait(foo(str));
498    return Void();
499}
500```
501
502Although it appears at first glance that `bar` keeps the `Arena` for `str` alive during the call to `foo`,
503it will actually go out of scope in the class generated by the actor compiler. As a result, `param` in
504`foo` will become invalid. To prevent this, either declare `param` to be of type
505`Standalone<StringRef>` or make `str` a state variable.
506