/*
 * Copyright (c) 2009-2013 Zmanda, Inc.  All Rights Reserved.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
 *
 * Contact information: Zmanda Inc., 465 S. Mathilda Ave., Suite 300
 * Sunnyvale, CA 94085, USA, or: http://www.zmanda.com
 */

%perlcode %{

=head1 NAME

Amanda::MainLoop - Perl interface to the Glib MainLoop

=head1 SYNOPSIS

    use Amanda::MainLoop;

    my $to = Amanda::MainLoop::timeout_source(2000);
    $to->set_callback(sub {
        print "Time's Up!\n";
        $to->remove();              # don't re-queue this timeout
        Amanda::MainLoop::quit();   # return from Amanda::MainLoop::run
    });

    Amanda::MainLoop::run();

Note that all functions in this module are individually available for
export, e.g.,

    use Amanda::MainLoop qw(run quit);

=head1 OVERVIEW

The main event loop of an application is a tight loop which waits for
events, and calls functions to respond to those events.  This design
allows an IO-bound application to multitask within a single thread, by
responding to IO events as they occur instead of blocking on
particular IO operations.

The Amanda security API, transfer API, and other components rely on
the event loop to allow them to respond to their own events in a
timely fashion.

The overall structure of an application, then, is to initialize its
state, register callbacks for some events, and begin looping.  In each
iteration, the loop waits for interesting events to occur (data
available for reading or writing, timeouts, etc.), and then calls
functions to handle those interesting things.  Thus, the application
spends most of its time waiting.  When some application-defined state
is reached, the loop is terminated and the application cleans up and
exits.

The Glib main loop takes place within a call to
C<Amanda::MainLoop::run()>.  This function executes until a call to
C<Amanda::MainLoop::quit()> occurs, at which point C<run()> returns.
You can check whether the loop is running with
C<Amanda::MainLoop::is_running()>.

=head1 HIGH-LEVEL INTERFACE

The functions in this section are intended to make asynchronous
programming as simple as possible.  They are implemented on top of the
interfaces described in the LOW-LEVEL INTERFACE section.

=head3 call_later

In most cases, a callback does not need to be invoked immediately.  In
fact, because Perl does not do tail-call optimization, a long chain of
callbacks may cause the Perl stack to grow unnecessarily.

The solution is to queue the callback for execution on the next
iteration of the main loop, and C<call_later($cb, @args)> does exactly
this.

    sub might_delay {
        my ($cb) = @_;
        if (can_do_it_now()) {
            my $result = do_it();
            Amanda::MainLoop::call_later($cb, $result);
        } else {
            # ..
        }
    }

When starting the main loop, an application usually has a sub that
should run after the loop has started.  C<call_later> works in this
situation, too.

    my $main = sub {
        # ..
        Amanda::MainLoop::quit();
    };
    Amanda::MainLoop::call_later($main);
    # ..
    Amanda::MainLoop::run();

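To see why deferring helps, the following self-contained sketch
replaces the main loop with a plain array used as a queue.  The names
C<toy_call_later> and C<toy_run> are hypothetical stand-ins invented
for this illustration; they are not part of C<Amanda::MainLoop>.  Each
callback returns to the toy loop before the next one starts, so the
Perl stack stays shallow however long the chain is.

    use strict;
    use warnings;

    my @queue;          # pending [ $cb, @args ] entries

    # hypothetical stand-in for call_later: queue the call, don't make it
    sub toy_call_later {
        my ($cb, @args) = @_;
        push @queue, [ $cb, @args ];
    }

    # hypothetical stand-in for the main loop: run queued calls in order
    sub toy_run {
        while (my $entry = shift @queue) {
            my ($cb, @args) = @$entry;
            $cb->(@args);
        }
    }

    my $max_depth = 0;
    my $chain;
    $chain = sub {
        my ($n) = @_;
        # count the frames currently on the call stack
        my $depth = 0;
        $depth++ while caller($depth);
        $max_depth = $depth if $depth > $max_depth;
        # schedule the next link instead of calling it directly
        toy_call_later($chain, $n - 1) if $n > 0;
    };

    toy_call_later($chain, 1000);
    toy_run();
    print "deepest stack: $max_depth frames\n";

Calling C<< $chain->($n - 1) >> directly instead would add a stack
frame for every link in the chain.
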
=head3 make_cb

As a convenience, C<make_cb> wraps a sub with a call to C<call_later>
while also naming the sub (using C<Sub::Name>, if available):

    my $fetched_cb = make_cb(fetched_cb => sub {
        # .. callback body
    });

In general, C<make_cb> should be used whenever a callback is passed to
some other library.  For example, a callback for the Changer API (see
L<Amanda::Changer>) might be defined like this:

    my $reset_finished_cb = make_cb(reset_finished_cb => sub {
        my ($err) = @_;
        die "while resetting: $err" if $err;
        # ..
    });

Be careful I<not> to use C<make_cb> in cases where some action must
take place before the next iteration of the main loop.  In practice,
this means C<make_cb> should be avoided with file-descriptor
callbacks, which will trigger repeatedly until the descriptors' needs
are addressed.

C<make_cb> is exported automatically.

=head3 call_after

Sometimes you need the MainLoop equivalent of C<sleep()>.  That comes
in the form of C<call_after($delay, $cb, @args)>, which takes a delay
(in milliseconds), a sub, and an arbitrary number of arguments.  The
sub is called with the arguments after the delay has elapsed.

    sub countdown {
        my $counter;
        $counter = sub {
            my ($i) = @_;
            print "$i..\n";
            if ($i) {
                Amanda::MainLoop::call_after(1000, $counter, $i-1);
            }
        };
        $counter->(10);
    }

The function returns the underlying event source (see below), enabling
the caller to cancel the pending call:

    my $tosrc = Amanda::MainLoop::call_after(15000, $timeout_cb);
    # ...data arrives before timeout...
    $tosrc->remove();

=head3 call_on_child_termination

To monitor a child process for termination, give its pid to
C<call_on_child_termination($pid, $cb, @args)>.  When the child exits
for any reason, this will collect its exit status (via C<waitpid>) and
call C<$cb> as

    $cb->($exitstatus, @args);

Like C<call_after>, this function returns the event source to allow
early cancellation if desired.

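As a minimal sketch, the following forks a child that exits at once
and prints whatever status the callback receives.  It relies only on
the call signature given above and assumes Amanda's Perl modules are
installed.  The status is printed as-is rather than decoded, since its
exact encoding is not specified here.

    use strict;
    use warnings;
    use POSIX ();
    use Amanda::MainLoop;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        POSIX::_exit(0);    # child: exit at once, skipping END blocks
    }

    my $seen;
    Amanda::MainLoop::call_on_child_termination($pid, sub {
        my ($exitstatus) = @_;
        $seen = $exitstatus;
        print "child $pid finished, status $exitstatus\n";
        Amanda::MainLoop::quit();   # nothing else to wait for
    });

    Amanda::MainLoop::run();
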
=head3 async_read

    async_read(
        fd => $fd,
        size => $size,        # optional, default 0
        async_read_cb => $async_read_cb,
        args => [ .. ]);      # optional

This function will read C<$size> bytes when they are available from
file descriptor C<$fd>, and invoke the callback with the results:

    $async_read_cb->($err, $buf, @args);

If C<$size> is zero, then the callback will get whatever data is
available as soon as it is available, up to an arbitrary buffer size.
If C<$size> is nonzero, then a short read may still occur if C<$size>
bytes do not become available simultaneously.  On EOF, C<$buf> will be
the empty string.  It is the caller's responsibility to set C<$fd> to
non-blocking mode.  Note that not all operating systems generate errors
that might be reported here.  For example, on Solaris an invalid file
descriptor will be silently ignored.

The return value is an event source, and calling its C<remove> method
will cancel the read.  It is an error to have more than one
C<async_read> operation on a single file descriptor at any time; doing
so will lead to unpredictable results.

This function adds a new FdSource every time it is invoked, so it is
not well-suited to processing large amounts of data.  For that
purpose, consider using the low-level interface or, better, the
transfer architecture (see L<Amanda::Xfer>).

=head3 async_write

    async_write(
        fd => $fd,
        data => $data,
        async_write_cb => $async_write_cb,
        args => [ .. ]);      # optional

This function will write C<$data> to file descriptor C<$fd> and invoke
the callback with the number of bytes written:

    $async_write_cb->($err, $bytes_written, @args);

If C<$bytes_written> is less than the length of C<$data>, then an
error occurred, and is given in C<$err>.  As for C<async_read>, the
caller should set C<$fd> to non-blocking mode.  Multiple parallel
invocations of this function for the same file descriptor are allowed
and will be serialized in the order the calls were made:

    async_write(
        fd => $fd,
        data => "HELLO!\n",
        async_write_cb => make_cb(wrote_hello => sub {
            print "wrote 'HELLO!'\n";
        }));
    async_write(
        fd => $fd,
        data => "GOODBYE!\n",
        async_write_cb => make_cb(wrote_goodbye => sub {
            print "wrote 'GOODBYE!'\n";
        }));

In this case, the two strings are guaranteed to be written in the
order the calls were made, and the callbacks will be called in that
same order.

Like C<async_read>, this function may add a new FdSource every time it
is invoked, so it is not well-suited to processing large amounts of data.

=head3 synchronized

Java has the notion of a "synchronized" method, which can only execute in one
thread at any time.  This is a particular application of a lock, in which the
lock is acquired when the method begins, and released when it finishes.

With C<Amanda::MainLoop>, this functionality is generally not needed because
there is no unexpected preemption.  However, if you break up a long-running
operation (that doesn't allow concurrency) into several callbacks, you'll need
to ensure that at most one of those operations is going on at a time.  The
C<synchronized> function manages that for you.

The function takes a C<$lock> argument, which should be initialized to an empty
arrayref (C<[]>).  It is used like this:

    use Amanda::MainLoop 'synchronized';
    # ..
    sub dump_data {
        my $self = shift;
        my ($arg1, $arg2, $dump_cb) = @_;

        synchronized($self->{'lock'}, $dump_cb, sub {
            my ($dump_cb) = @_; # IMPORTANT! See below
            $self->do_dump_data($arg1, $arg2, $dump_cb);
        });
    }

Here, C<do_dump_data> may take a long time to complete (perhaps it starts
a long-running data transfer), but only one such operation is allowed at any
time.  Other C<Amanda::MainLoop> callbacks (e.g., a timeout) may still occur
while it runs.  When the critical operation is complete, it calls C<$dump_cb>,
which releases the lock before transferring control to the caller.

Note that the C<$dump_cb> in the inner C<sub> shadows that in
C<dump_data>.  This is intentional: the call to the inner
C<$dump_cb> is how C<synchronized> knows that the operation has completed.

Several methods may be synchronized with one another by simply sharing the same
lock.

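The reason C<$lock> starts out as an empty arrayref is easier to see
with a toy version.  The sketch below is an illustration written for
this document, not the module's implementation: C<toy_synchronized> is
a hypothetical name, and it starts the next waiting operation directly,
where a real implementation would more likely defer it through the main
loop.  The lock is simply a queue of waiting operations, and the head
of the queue is the one currently running.

    use strict;
    use warnings;

    sub toy_synchronized {
        my ($lock, $orig_cb, $sub) = @_;

        my $done_cb;
        $done_cb = sub {
            my @args = @_;
            # the operation that just ended is at the head of the queue
            my $finished = shift @$lock;
            # release the lock: start the next waiter, if there is one
            $lock->[0]{sub}->($done_cb) if @$lock;
            # then hand the results to the original callback
            return $finished->{cb}->(@args);
        };

        push @$lock, { sub => $sub, cb => $orig_cb };
        # the lock was free if we are the only entry: start right away
        $sub->($done_cb) if @$lock == 1;
    }

    my $lock = [];
    my @log;
    my @pending;    # completion callbacks of operations "in progress"

    for my $name (qw(first second)) {
        toy_synchronized($lock, sub { push @log, "$name done" }, sub {
            my ($done_cb) = @_;
            push @log, "$name started";
            push @pending, $done_cb;    # pretend the work finishes later
        });
    }

    # only "first" has started so far; finishing it lets "second" begin
    (shift @pending)->();
    (shift @pending)->();
    print "$_\n" for @log;

Because this toy starts the next operation before reporting the
previous result, "second started" is logged just ahead of "first done".
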
=head1 ASYNCHRONOUS STYLE

When writing asynchronous code, it's easy to write code that is I<very>
difficult to read or debug.  The suggestions in this section will help
write code that is more readable, and also ensure that all asynchronous
code in Amanda uses similar, common idioms.

=head2 USING CALLBACKS

Most often, callbacks are short, and can be specified as anonymous
subs.  They should be specified with C<make_cb>, like this:

    some_async_function(make_cb(foo_cb => sub {
        my ($x, $y) = @_;
        # ...
    }));

If a callback is more than about two lines, specify it in a named
variable, rather than directly in the function call:

    my $foo_cb = make_cb(foo_cb => sub {
        my ($src) = @_;
        # .
        # .  long function
        # .
    });
    some_async_function($foo_cb);

When using callbacks from an object-oriented package, it is often
useful to treat a method as a callback.  This requires an anonymous
sub "wrapper", which can be written on one line:

    some_async_function(sub { $self->foo_cb(@_) });

=head2 LINEARITY

The single most important factor in readability is linearity.  If a function
performs operations A, B, and C in that order, then the code for A, B, and
C should appear in that order in the source file.  This seems obvious, but it's
all too easy to write

    sub three_ops {
        my $do_c = sub { .. };
        my $do_b = sub { .. $do_c->() .. };
        my $do_a = sub { .. $do_b->() .. };
        $do_a->();
    }

which isn't very readable, because the operations appear in the reverse of
the order in which they run.  Be readable.

=head2 SINGLE ENTRY AND EXIT

Amanda's use of callbacks emulates continuation-passing style.  As such, when a
function finishes -- whether successfully or with an error -- it should call a
single callback.  This ensures that the function has a simple control
interface: perform the operation and call the callback.

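A minimal sketch of that rule follows.  The function C<slurp_file> and
its callback name C<slurp_cb> are hypothetical, and the file is read
synchronously only to keep the example self-contained.  Every path
through the function ends by returning the result of exactly one call
to the callback, with the error (or C<undef>) as the first argument.

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    sub slurp_file {
        my ($filename, $slurp_cb) = @_;

        # error exit: report through the same callback
        open(my $fh, '<', $filename)
            or return $slurp_cb->("cannot open $filename: $!", undef);
        my $contents = do { local $/; <$fh> };
        close($fh);

        # success exit: error argument is undef
        return $slurp_cb->(undef, $contents);
    }

    # a temporary file to read, removed automatically at exit
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
    print $tmp_fh "hello\n";
    close($tmp_fh);

    my @calls;
    my $slurp_cb = sub {
        my ($err, $contents) = @_;
        push @calls, $err ? "error" : "ok";
    };
    slurp_file($tmp_name, $slurp_cb);
    slurp_file("$tmp_name.missing", $slurp_cb);     # fails to open
    print "@calls\n";
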
=head2 MULTIPLE STEPS

Some operations require a long sequence of asynchronous operations.  For
example, often the results of one operation are required to initiate
another.  The I<step> syntax is useful to make this much more readable, and
also eliminates some nasty reference-counting bugs.  The idea is that each "step"
in the process gets its own sub, and then each step calls the next step.  The
first step defined will be called automatically.

    sub send_file {
        my ($hostname, $port, $data, $sendfile_cb) = @_;
        my ($addr, $socket); # shared lexical variables
        my $steps = define_steps
                cb_ref => \$sendfile_cb;
        step lookup_addr => sub {
            return async_gethostbyname(hostname => $hostname,
                                ghbn_cb => $steps->{'ghbn_cb'});
        };
        step ghbn_cb => sub {
            my ($err, $hostinfo) = @_;
            die $err if $err;
            $addr = $hostinfo->{'ipaddr'};
            return $steps->{'connect'}->();
        };
        step connect => sub {
            return async_connect(
                ipaddr => $addr,
                port => $port,
                connect_cb => $steps->{'connect_cb'},
            );
        };
        step connect_cb => sub {
            my ($err, $conn_sock) = @_;
            die $err if $err;
            $socket = $conn_sock;
            return $steps->{'write_block'}->();
        };
        # ...
    }

The C<define_steps> function sets the stage.  It is given a reference to the
callback for this function (recall there is only one exit point!), and
"patches" that reference to free C<$steps>, which otherwise forms a reference
loop, on exit.

WARNING: if the function or method needs to do any kind of setup before its
first step, that setup should be done either in a C<setup> step or I<before>
the C<define_steps> invocation.  Do not write any statements other than step
declarations after the C<define_steps> call.

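As a sketch of the C<setup> approach, the hypothetical function below
does its preparation in a first step rather than in loose statements
after C<define_steps>.  The function name, the step names, and the use
of C<call_after> as the asynchronous operation are illustrative
assumptions; C<define_steps> and C<step> are assumed to be imported by
C<use Amanda::MainLoop>.

    use strict;
    use warnings;
    use Amanda::MainLoop;

    sub delayed_sum {
        my ($numbers, $sum_cb) = @_;
        my ($total, @todo);     # shared lexical variables

        my $steps = define_steps
            cb_ref => \$sum_cb;

        step setup => sub {
            # preparation lives here, not after define_steps
            $total = 0;
            @todo = @$numbers;
            return $steps->{'add_next'}->();
        };

        step add_next => sub {
            # single exit: report the total once nothing is left
            return $sum_cb->(undef, $total) unless @todo;
            $total += shift @todo;
            # wait 10ms, then come back to this step
            return Amanda::MainLoop::call_after(10, $steps->{'add_next'});
        };
    }

    my $result;
    delayed_sum([ 1, 2, 3 ], sub {
        my ($err, $total) = @_;
        $result = $total;
        Amanda::MainLoop::quit();
    });
    Amanda::MainLoop::run();
    print "total: $result\n";
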
Note that there are more steps in this example than are strictly necessary: the
body of C<connect> could be appended to C<ghbn_cb>.  The extra steps make the
overall operation more readable by adding "punctuation" to separate the task of
handling a callback (C<ghbn_cb>) from starting the next operation (C<connect>).

Also note that the enclosing scope contains some lexical (C<my>)
variables which are shared by several of the callbacks.

All of the steps are wrapped by C<make_cb>, so each step will be executed on a
separate iteration of the MainLoop.  This generally has the effect of making
asynchronous functions share CPU time more fairly.  Sometimes, especially when
using the low-level interface, a callback must be called immediately.  To
achieve this for all callbacks, add C<< immediate => 1 >> to the C<define_steps>
invocation:

    my $steps = define_steps
            cb_ref => \$finished_cb,
            immediate => 1;

To do the same for a single step, add the same keyword to the C<step> invocation:

    step immediate => 1,
         connect => sub { .. };

In some cases, you may want to execute some code when the steps finish.  This
can be done by passing a C<finalize> sub to C<define_steps>:

    my $steps = define_steps
            cb_ref => \$finished_cb,
            finalize => sub { .. };

=head2 JOINING ASYNCHRONOUS "THREADS"

With slow operations, it is often useful to perform multiple operations
simultaneously.  As an example, the following code might run two system
commands simultaneously and capture their output:

    sub run_two_commands {
        my ($finished_cb) = @_;
        my $running_commands = 0;
        my ($result1, $result2);
        my $steps = define_steps
            cb_ref => \$finished_cb;
        step start => sub {
            $running_commands++;
            run_command($command1,
                run_cb => $steps->{'command1_done'});
            $running_commands++;
            run_command($command2,
                run_cb => $steps->{'command2_done'});
        };
        step command1_done => sub {
            $result1 = $_[0];
            $steps->{'maybe_done'}->();
        };
        step command2_done => sub {
            $result2 = $_[0];
            $steps->{'maybe_done'}->();
        };
        step maybe_done => sub {
            return if --$running_commands; # not done yet
            $finished_cb->($result1, $result2);
        };
    }

It is tempting to optimize out the C<$running_commands> with something like:

    step maybe_done => sub { ## BAD!
        return unless defined $result1 and defined $result2;
        $finished_cb->($result1, $result2);
    };

However, this can lead to trouble.  Remember that C<define_steps> automatically
applies C<make_cb> to each step, so C<maybe_done> is not invoked immediately
by C<command1_done> and C<command2_done>.  Instead, C<maybe_done> is scheduled
for invocation in the next iteration of the main loop (via C<call_later>).  If
both commands finish before C<maybe_done> is invoked, C<maybe_done> will have
been queued I<twice>, and both C<$result1> and C<$result2> will be defined both
times it runs.  The result is that C<$finished_cb> is called twice, and mayhem
ensues.

This is a complex case, but worth understanding if you want to be able to debug
difficult MainLoop bugs.

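The double invocation can be reproduced without Amanda at all.  In the
sketch below, C<toy_call_later> and C<toy_run> are hypothetical
stand-ins for C<call_later> and the main loop, and both "commands"
finish before the loop gets a chance to run C<maybe_done>.

    use strict;
    use warnings;

    my @queue;
    sub toy_call_later { my ($cb, @args) = @_; push @queue, [ $cb, @args ] }
    sub toy_run {
        while (my $entry = shift @queue) {
            my ($cb, @args) = @$entry;
            $cb->(@args);
        }
    }

    # returns how many times the "finished" callback ran
    sub join_two {
        my ($use_counter) = @_;
        my ($result1, $result2);
        my $running = 2;
        my $finished = 0;

        my $maybe_done = sub {
            if ($use_counter) {
                return if --$running;       # not done yet
            } else {
                return unless defined $result1 and defined $result2;
            }
            $finished++;
        };

        # both commands complete back to back; each one defers
        # maybe_done, as a step wrapped by make_cb would
        $result1 = "one";
        toy_call_later($maybe_done);
        $result2 = "two";
        toy_call_later($maybe_done);

        toy_run();
        return $finished;
    }

    print "defined-check: ", join_two(0), " call(s)\n";
    print "counter:       ", join_two(1), " call(s)\n";

The check based on defined results reports two calls, while the
counter-based check reports one.
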
=head2 WRITING ASYNCHRONOUS INTERFACES

When designing a library or interface that will accept and invoke
callbacks, follow these guidelines so that users of the interface will
not need to remember special rules.

Each callback signature within a package should always have the same
name, ending with C<_cb>.  For example, a hypothetical
C<Amanda::Estimate> module might provide its estimates through a
callback with four parameters.  This callback should be referred to as
C<estimate_cb> throughout the package, and its parameters should be
clearly defined in the package's documentation.  It should take
positional parameters only.  If error conditions must also be
communicated via the callback, then the first parameter should be an
C<$error> parameter, which is undefined when no error has occurred.
The Changer API's C<res_cb> is typical of such a callback signature.

A caller can only know that an operation is complete by the invocation
of the callback, so it is important that a callback be invoked
I<exactly once> in all circumstances.  Even in an error condition, the
caller needs to know that the operation has failed.  Also beware of
bugs that might cause a callback to be invoked twice.

Functions or methods taking callbacks as arguments should either take
only a callback (like C<call_later>), or take hash-key parameters,
where the callback's key is the signature name.  For example, the
C<Amanda::Estimate> package might define a function like
C<perform_estimate>, invoked something like this:

    my $estimate_cb = make_cb(estimate_cb => sub {
        my ($err, $size, $level) = @_;
        die $err if $err;
        # ...
    });
    Amanda::Estimate::perform_estimate(
        host => $host,
        disk => $disk,
        estimate_cb => $estimate_cb,
    );

When invoking a user-supplied callback within the library, there is no
need to wrap it in a C<call_later> invocation, as the user already
supplied that wrapper via C<make_cb>, or is not interested in using
such a wrapper.

Callbacks are a form of continuation
(L<http://en.wikipedia.org/wiki/Continuations>), and as such should
only be called at the I<end> of a function.  Do not do anything after
invoking a callback, as you cannot know what processing has gone on in
the callback.

    sub estimate_done {
        # ...
        $self->{'estimate_cb'}->(undef, $size, $level);
        $self->{'estimate_in_progress'} = 0; # BUG!!
    }

In this case, the C<estimate_cb> invocation may have called
C<perform_estimate> again, setting C<estimate_in_progress> back to 1.
A technique to avoid this pitfall is to always C<return> a callback's
result, even though that result is not important.  This makes the bug
much more apparent:

    sub estimate_done {
        # ...
        return $self->{'estimate_cb'}->(undef, $size, $level);
        $self->{'estimate_in_progress'} = 0; # BUG (this just looks silly)
    }

=head1 LOW-LEVEL INTERFACE

MainLoop events are generated by event sources.  A source may produce
multiple events over its lifetime.  The higher-level functions
described above provide a more Perlish abstraction of event sources,
but for efficiency it is sometimes necessary to use event sources
directly.

The method C<< $src->set_callback(\&cb) >> sets the function that will
be called for a given source, and "attaches" the source to the main
loop so that it will begin generating events.  The arguments to the
callback depend on the event source, but the first argument is always
the source itself.  Unless specified, no other arguments are provided.

Event sources persist until they are removed with
C<< $src->remove() >>, even if the source itself is no longer accessible from Perl.
Although Glib supports it, there is no provision for "automatically"
removing an event source.  Also, calling C<< $src->remove() >> more than
once is a potentially fatal error.  As an example:

  sub start_timer {
    my ($loops) = @_;
    Amanda::MainLoop::timeout_source(200)->set_callback(sub {
      my ($src) = @_;
      print "timer\n";
      if (--$loops <= 0) {
        $src->remove();
        Amanda::MainLoop::quit();
      }
    });
  }
  start_timer(10);
  Amanda::MainLoop::run();

There is no means in place to specify extra arguments to be provided
to a source callback when it is set.  If the callback needs access to
other data, it should use a Perl closure in the form of lexically
scoped variables and an anonymous sub.  In fact, this is exactly what
the higher-level functions (described above) do.

=head2 Timeout

  my $src = Amanda::MainLoop::timeout_source(10000);

A timeout source will create events at the specified interval,
specified in milliseconds (thousandths of a second).  The events will
continue until the source is destroyed.

=head2 Idle

  my $src = Amanda::MainLoop::idle_source(2);

An idle source will create events continuously except when a
higher-priority source is emitting events.  Priorities are generally
small positive integers, with larger integers denoting lower
priorities.  The events will continue until the source is destroyed.

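A minimal sketch of an idle source follows, using only the calls
described in this section and assuming Amanda's Perl modules are
installed.  The callback runs whenever the loop has nothing more urgent
to do, and removes its source after a fixed number of passes.  The
counter C<$passes> and the limit of three are arbitrary choices made
for the illustration.

    use strict;
    use warnings;
    use Amanda::MainLoop;

    my $passes = 0;
    Amanda::MainLoop::idle_source(2)->set_callback(sub {
        my ($src) = @_;         # the first argument is always the source
        $passes++;
        if ($passes >= 3) {
            $src->remove();     # without this, the events never stop
            Amanda::MainLoop::quit();
        }
    });

    Amanda::MainLoop::run();
    print "idle callback ran $passes times\n";
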
=head2 Child Watch

  my $src = Amanda::MainLoop::child_watch_source($pid);

A child watch source will issue an event when the process with the
given PID dies.  To avoid race conditions, it will issue an event even
if the process dies before the source is created.  The callback is
called with three arguments: the event source, the PID, and the
child's exit status.

Note that this source is totally incompatible with anything that
would cause Perl to change the SIGCHLD handler.  If the SIGCHLD
handler is changed, the module will, under some circumstances,
recognize this, add a warning to the debug log, and continue operating.
However, it is impossible to catch all possible situations.

=head2 File Descriptor

  my $src = Amanda::MainLoop::fd_source($fd, $G_IO_IN);

This source will issue an event whenever one of the given conditions
is true for the given file (a file handle or integer file descriptor).
The conditions are from Glib's GIOCondition, and are C<$G_IO_IN>,
C<$G_IO_OUT>, C<$G_IO_PRI>, C<$G_IO_ERR>, C<$G_IO_HUP>, and
C<$G_IO_NVAL>.  These constants are available with the import tag
C<:GIOCondition>.

Generally, when reading from a file descriptor, use
C<$G_IO_IN|$G_IO_HUP|$G_IO_ERR> to ensure that an EOF triggers an
event as well.  Writing to a file descriptor can simply use
C<$G_IO_OUT|$G_IO_ERR>.

The callback attached to an FdSource should read from or write to the
underlying file descriptor before returning, or it will be called
again in the next iteration of the main loop, which can lead to
unexpected results.  Do I<not> use C<make_cb> here!

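The advice above can be sketched as follows, reading a pipe until EOF.
The pipe, the buffer size, and the variable names are assumptions made
for the example, which also assumes Amanda's Perl modules are
installed.  The source is given the integer file descriptor, and the
callback always performs a read before returning.

    use strict;
    use warnings;
    use Amanda::MainLoop qw(:GIOCondition);

    pipe(my $rd, my $wr) or die "pipe failed: $!";
    syswrite($wr, "some data\n");
    close($wr);     # the reader will see the data, then EOF

    my $received = '';
    my $src = Amanda::MainLoop::fd_source(fileno($rd),
                                          $G_IO_IN|$G_IO_HUP|$G_IO_ERR);
    $src->set_callback(sub {
        my ($src) = @_;
        # always consume from the descriptor before returning
        my $n = sysread($rd, my $buf, 4096);
        if (!$n) {
            # undef means error, 0 means EOF; either way, stop watching
            $src->remove();
            close($rd);
            Amanda::MainLoop::quit();
            return;
        }
        $received .= $buf;
    });

    Amanda::MainLoop::run();
    print "received: $received";
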
=head2 Combining Event Sources

Event sources are often set up in groups, e.g., a long-term operation
and a timeout.  When this is the case, be careful that all sources are
removed when the operation is complete.  The easiest way to accomplish
this is to include all sources in a lexical scope and remove them at
the appropriate times:

    {
        my $op_src = long_operation_src();
        my $timeout_src = Amanda::MainLoop::timeout_source($timeout);

        sub finish {
            $op_src->remove();
            $timeout_src->remove();
        }

        $op_src->set_callback(sub {
            print "Operation complete\n";
            finish();
        });

        $timeout_src->set_callback(sub {
            print "Operation timed out\n";
            finish();
        });
    }

=head2 Relationship to Glib

Glib's main event loop is described in the Glib manual:
L<http://library.gnome.org/devel/glib/stable/glib-The-Main-Event-Loop.html>.
Note that Amanda depends only on the functionality available in
Glib-2.2.0, so many functions described in that document are not
available in Amanda.  This module provides a much-simplified interface
to the Glib library, and is not intended as a generic wrapper for it:
Amanda's Perl-accessible main loop only runs a single C<GMainContext>,
and always runs in the main thread; and (aside from idle sources)
event priorities are not accessible from Perl.

=cut


%}
