/*
 * Copyright (c) 2009-2013 Zmanda, Inc. All Rights Reserved.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 *
 * Contact information: Zmanda Inc., 465 S. Mathilda Ave., Suite 300
 * Sunnyvale, CA 94085, USA, or: http://www.zmanda.com
 */

%perlcode %{

=head1 NAME

Amanda::MainLoop - Perl interface to the Glib MainLoop

=head1 SYNOPSIS

    use Amanda::MainLoop;

    my $to = Amanda::MainLoop::timeout_source(2000);
    $to->set_callback(sub {
        print "Time's Up!\n";
        $to->remove();            # don't re-queue this timeout
        Amanda::MainLoop::quit(); # return from Amanda::MainLoop::run
    });

    Amanda::MainLoop::run();

Note that all functions in this module are individually available for
export, e.g.,

    use Amanda::MainLoop qw(run quit);

=head1 OVERVIEW

The main event loop of an application is a tight loop which waits for
events, and calls functions to respond to those events. This design
allows an IO-bound application to multitask within a single thread, by
responding to IO events as they occur instead of blocking on
particular IO operations.

The Amanda security API, transfer API, and other components rely on
the event loop to allow them to respond to their own events in a
timely fashion.
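
As a rough illustration of the idea (not of how Glib implements it), an
event loop can be pictured as a queue of pending callbacks that is
drained one entry at a time. The sketch below is plain Perl and does
not use C<Amanda::MainLoop>. The names C<toy_call_later>, C<toy_run>,
and C<toy_quit> are hypothetical, and unlike a real main loop this one
returns when its queue is empty instead of waiting for more events.

```perl
use strict;
use warnings;

# Toy stand-in for a main loop; every name here is hypothetical.
my @queue;          # pending events, each stored as [callback, args...]
my $running = 0;

sub toy_call_later { push @queue, [@_] }   # queue a callback for later
sub toy_quit       { $running = 0 }        # ask the loop to stop
sub toy_run {
    $running = 1;
    # A real loop would block here waiting for events; this sketch
    # simply stops when nothing is queued.
    while ($running and @queue) {
        my ($cb, @args) = @{ shift @queue };
        $cb->(@args);                      # handle one event, then loop
    }
    $running = 0;
}

my @log;
toy_call_later(sub { push @log, "first: $_[0]" }, 'hello');
toy_call_later(sub { push @log, 'second'; toy_quit() });
toy_call_later(sub { push @log, 'never reached' });
toy_run();
```

The third callback stays queued because the second one stops the loop,
which mirrors how C<quit()> causes C<run()> to return.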

The overall structure of an application, then, is to initialize its
state, register callbacks for some events, and begin looping. In each
iteration, the loop waits for interesting events to occur (data
available for reading or writing, timeouts, etc.), and then calls
functions to handle those interesting things. Thus, the application
spends most of its time waiting. When some application-defined state
is reached, the loop is terminated and the application cleans up and
exits.

The Glib main loop runs within a call to
C<Amanda::MainLoop::run()>. This function executes until a call to
C<Amanda::MainLoop::quit()> occurs, at which point C<run()> returns.
You can check whether the loop is running with
C<Amanda::MainLoop::is_running()>.

=head1 HIGH-LEVEL INTERFACE

The functions in this section are intended to make asynchronous
programming as simple as possible. They are implemented on top of the
interfaces described in the LOW-LEVEL INTERFACE section.

=head3 call_later

In most cases, a callback does not need to be invoked immediately. In
fact, because Perl does not do tail-call optimization, a long chain of
callbacks may cause the Perl stack to grow unnecessarily.

The solution is to queue the callback for execution on the next
iteration of the main loop, and C<call_later($cb, @args)> does exactly
this.

    sub might_delay {
        my ($cb) = @_;
        if (can_do_it_now()) {
            my $result = do_it();
            Amanda::MainLoop::call_later($cb, $result);
        } else {
            # ..
        }
    }

When starting the main loop, an application usually has a sub that
should run after the loop has started. C<call_later> works in this
situation, too.

    my $main = sub {
        # ..
        Amanda::MainLoop::quit();
    };
    Amanda::MainLoop::call_later($main);
    # ..
    Amanda::MainLoop::run();

=head3 make_cb

As an optimization, C<make_cb> wraps a sub with a call to C<call_later>
while also naming the sub (using C<Sub::Name>, if available):

    my $fetched_cb = make_cb(fetched_cb => sub {
        # .. callback body
    });

In general, C<make_cb> should be used whenever a callback is passed to
some other library. For example, the Changer API (see
L<Amanda::Changer>) might be invoked like this:

    my $reset_finished_cb = make_cb(reset_finished_cb => sub {
        my ($err) = @_;
        die "while resetting: $err" if $err;
        # ..
    });

Be careful I<not> to use C<make_cb> in cases where some action must
take place before the next iteration of the main loop. In practice,
this means C<make_cb> should be avoided with file-descriptor
callbacks, which will trigger repeatedly until the descriptors' needs
are addressed.

C<make_cb> is exported automatically.

=head3 call_after

Sometimes you need the MainLoop equivalent of C<sleep()>. That comes
in the form of C<call_after($delay, $cb, @args)>, which takes a delay
(in milliseconds), a sub, and an arbitrary number of arguments. The
sub is called with the arguments after the delay has elapsed.

    sub countdown {
        my $counter;
        $counter = sub {
            my ($i) = @_;
            print "$i..\n";
            if ($i) {
                Amanda::MainLoop::call_after(1000, $counter, $i-1);
            }
        };
        $counter->(10);
    }

The function returns the underlying event source (see below), enabling
the caller to cancel the pending call:

    my $tosrc = Amanda::MainLoop::call_after(15000, $timeout_cb);
    # ...data arrives before timeout...
    $tosrc->remove();

=head3 call_on_child_termination

To monitor a child process for termination, give its pid to
C<call_on_child_termination($pid, $cb, @args)>.
When the child exits
for any reason, this will collect its exit status (via C<waitpid>) and
call C<$cb> as

    $cb->($exitstatus, @args);

Like C<call_after>, this function returns the event source to allow
early cancellation if desired.

=head3 async_read

    async_read(
        fd => $fd,
        size => $size,        # optional, default 0
        async_read_cb => $async_read_cb,
        args => [ .. ]);      # optional

This function will read C<$size> bytes when they are available from
file descriptor C<$fd>, and invoke the callback with the results:

    $async_read_cb->($err, $buf, @args);

If C<$size> is zero, then the callback will get whatever data is
available as soon as it is available, up to an arbitrary buffer size.
If C<$size> is nonzero, then a short read may still occur if C<$size>
bytes do not become available simultaneously. On EOF, C<$buf> will be
the empty string. It is the caller's responsibility to set C<$fd> to
non-blocking mode. Note that not all operating systems generate errors
that might be reported here. For example, on Solaris an invalid file
descriptor will be silently ignored.

The return value is an event source, and calling its C<remove> method
will cancel the read. It is an error to have more than one
C<async_read> operation on a single file descriptor at any time; doing
so will lead to unpredictable results.

This function adds a new FdSource every time it is invoked, so it is
not well-suited to processing large amounts of data. For that
purpose, consider using the low-level interface or, better, the
transfer architecture (see L<Amanda::Xfer>).

=head3 async_write

    async_write(
        fd => $fd,
        data => $data,
        async_write_cb => $async_write_cb,
        args => [ ..
        ]);                   # optional

This function will write C<$data> to file descriptor C<$fd> and invoke
the callback with the number of bytes written:

    $async_write_cb->($err, $bytes_written, @args);

If C<$bytes_written> is less than the length of C<$data>, then an
error occurred, and is given in C<$err>. As for C<async_read>, the
caller should set C<$fd> to non-blocking mode. Multiple parallel
invocations of this function for the same file descriptor are allowed
and will be serialized in the order the calls were made:

    async_write(fd => $fd, data => "HELLO!\n",
        async_write_cb => make_cb(wrote_hello => sub {
            print "wrote 'HELLO!'\n";
        }));
    async_write(fd => $fd, data => "GOODBYE!\n",
        async_write_cb => make_cb(wrote_goodbye => sub {
            print "wrote 'GOODBYE!'\n";
        }));

In this case, the two strings are guaranteed to be written in the order
of the calls, and the callbacks will be called in that same order.

Like C<async_read>, this function may add a new FdSource every time it is
invoked, so it is not well-suited to processing large amounts of data.

=head3 synchronized

Java has the notion of a "synchronized" method, which can only execute in one
thread at any time. This is a particular application of a lock, in which the
lock is acquired when the method begins, and released when it finishes.

With C<Amanda::MainLoop>, this functionality is generally not needed because
there is no unexpected preemption. However, if you break up a long-running
operation (that doesn't allow concurrency) into several callbacks, you'll need
to ensure that at most one of those operations is going on at a time. The
C<synchronized> function manages that for you.

The function takes a C<$lock> argument, which should be initialized to an empty
arrayref (C<[]>). It is used like this:

    use Amanda::MainLoop 'synchronized';
    # ..
    sub dump_data {
        my $self = shift;
        my ($arg1, $arg2, $dump_cb) = @_;

        synchronized($self->{'lock'}, $dump_cb, sub {
            my ($dump_cb) = @_; # IMPORTANT! See below
            $self->do_dump_data($arg1, $arg2, $dump_cb);
        });
    }

Here, C<do_dump_data> may take a long time to complete (perhaps it starts
a long-running data transfer), but only one such operation is allowed at any
time, while other C<Amanda::MainLoop> callbacks (e.g., a timeout) may still
occur. When the critical operation is complete, it calls C<$dump_cb>, which
will release the lock before transferring control to the caller.

Note that the C<$dump_cb> in the inner C<sub> shadows that in
C<dump_data>. This is intentional: a call to the inner
C<$dump_cb> is how C<synchronized> knows that the operation has completed.

Several methods may be synchronized with one another by simply sharing the same
lock.

=head1 ASYNCHRONOUS STYLE

When writing asynchronous code, it's easy to write code that is I<very>
difficult to read or debug. The suggestions in this section will help
write code that is more readable, and also ensure that all asynchronous
code in Amanda uses similar, common idioms.

=head2 USING CALLBACKS

Most often, callbacks are short, and can be specified as anonymous
subs. They should be specified with C<make_cb>, like this:

    some_async_function(make_cb(foo_cb => sub {
        my ($x, $y) = @_;
        # ...
    }));

If a callback is more than about two lines, specify it in a named
variable, rather than directly in the function call:

    my $foo_cb = make_cb(foo_cb => sub {
        my ($src) = @_;
        # .
        # . long function
        # .
    });
    some_async_function($foo_cb);

When using callbacks from an object-oriented package, it is often
useful to treat a method as a callback.
This requires an anonymous
sub "wrapper", which can be written on one line:

    some_async_function(sub { $self->foo_cb(@_) });

=head2 LINEARITY

The single most important factor in readability is linearity. If a function
performs operations A, B, and C in that order, then the code for A, B, and
C should appear in that order in the source file. This seems obvious, but it's
all too easy to write

    sub three_ops {
        my $do_c = sub { .. };
        my $do_b = sub { .. $do_c->() .. };
        my $do_a = sub { .. $do_b->() .. };
        $do_a->();
    }

which isn't very readable. Be readable.

=head2 SINGLE ENTRY AND EXIT

Amanda's use of callbacks emulates continuation-passing style. As such, when a
function finishes -- whether successfully or with an error -- it should call a
single callback. This ensures that the function has a simple control
interface: perform the operation and call the callback.

=head2 MULTIPLE STEPS

Some operations require a long sequence of asynchronous operations. For
example, often the results of one operation are required to initiate
another. The I<step> syntax is useful to make this much more readable, and
also eliminate some nasty reference-counting bugs. The idea is that each "step"
in the process gets its own sub, and then each step calls the next step. The
first step defined will be called automatically.

    sub send_file {
        my ($hostname, $port, $data, $sendfile_cb) = @_;
        my ($addr, $socket); # shared lexical variables
        my $steps = define_steps
            cb_ref => \$sendfile_cb;
        step lookup_addr => sub {
            return async_gethostbyname(hostname => $hostname,
                ghbn_cb => $steps->{'ghbn_cb'});
        };
        step ghbn_cb => sub {
            my ($err, $hostinfo) = @_;
            die $err if $err;
            $addr = $hostinfo->{'ipaddr'};
            return $steps->{'connect'}->();
        };
        step connect => sub {
            return async_connect(
                ipaddr => $addr,
                port => $port,
                connect_cb => $steps->{'connect_cb'},
            );
        };
        step connect_cb => sub {
            my ($err, $conn_sock) = @_;
            die $err if $err;
            $socket = $conn_sock;
            return $steps->{'write_block'}->();
        };
        # ...
    }

The C<define_steps> function sets the stage. It is given a reference to the
callback for this function (recall there is only one exit point!), and
"patches" that reference so that C<$steps>, which otherwise forms a reference
loop, is freed on exit.

WARNING: if the function or method needs to do any kind of setup before its
first step, that setup should be done either in a C<setup> step or I<before>
the C<define_steps> invocation. Do not write any statements other than step
declarations after the C<define_steps> call.

Note that there are more steps in this example than are strictly necessary: the
body of C<connect> could be appended to C<ghbn_cb>. The extra steps make the
overall operation more readable by adding "punctuation" to separate the task of
handling a callback (C<ghbn_cb>) from starting the next operation (C<connect>).

Also note that the enclosing scope contains some lexical (C<my>)
variables which are shared by several of the callbacks.

All of the steps are wrapped by C<make_cb>, so each step will be executed on a
separate iteration of the MainLoop.
This generally has the effect of making
asynchronous functions share CPU time more fairly. Sometimes, especially when
using the low-level interface, a callback must be called immediately. To
achieve this for all callbacks, add C<< immediate => 1 >> to the C<define_steps>
invocation:

    my $steps = define_steps
        cb_ref => \$finished_cb,
        immediate => 1;

To do the same for a single step, add the same keyword to the C<step> invocation:

    step immediate => 1,
        connect => sub { .. };

In some cases, you may want to execute some code when the steps finish. This
can be done by passing a C<finalize> sub to C<define_steps>:

    my $steps = define_steps
        cb_ref => \$finished_cb,
        finalize => sub { .. };

=head2 JOINING ASYNCHRONOUS "THREADS"

With slow operations, it is often useful to perform multiple operations
simultaneously. As an example, the following code might run two system
commands simultaneously and capture their output:

    sub run_two_commands {
        my ($finished_cb) = @_;
        my $running_commands = 0;
        my ($result1, $result2);
        my $steps = define_steps
            cb_ref => \$finished_cb;
        step start => sub {
            $running_commands++;
            run_command($command1,
                run_cb => $steps->{'command1_done'});
            $running_commands++;
            run_command($command2,
                run_cb => $steps->{'command2_done'});
        };
        step command1_done => sub {
            $result1 = $_[0];
            $steps->{'maybe_done'}->();
        };
        step command2_done => sub {
            $result2 = $_[0];
            $steps->{'maybe_done'}->();
        };
        step maybe_done => sub {
            return if --$running_commands; # not done yet
            $finished_cb->($result1, $result2);
        };
    }

It is tempting to optimize out the C<$running_commands> with something like:

    step maybe_done => sub { ## BAD!
        return unless defined $result1 and defined $result2;
        $finished_cb->($result1, $result2);
    };

However, this can lead to trouble.
Remember that C<define_steps> automatically
applies C<make_cb> to each step, so C<maybe_done> is not invoked immediately
by C<command1_done> and C<command2_done>. Instead, C<maybe_done> is scheduled
for invocation in the next iteration of the main loop (via C<call_later>). If
both commands finish before C<maybe_done> is invoked, C<maybe_done> will be
scheduled I<twice>, and both C<$result1> and C<$result2> will be defined both
times it runs. The result is that C<$finished_cb> is called twice, and mayhem
ensues.

This is a complex case, but worth understanding if you want to be able to debug
difficult MainLoop bugs.

=head2 WRITING ASYNCHRONOUS INTERFACES

When designing a library or interface that will accept and invoke
callbacks, follow these guidelines so that users of the interface will
not need to remember special rules.

Each callback signature within a package should always have the same
name, ending with C<_cb>. For example, a hypothetical
C<Amanda::Estimate> module might provide its estimates through a
callback with four parameters. This callback should be referred to as
C<estimate_cb> throughout the package, and its parameters should be
clearly defined in the package's documentation. It should take
positional parameters only. If error conditions must also be
communicated via the callback, then the first parameter should be an
C<$error> parameter, which is undefined when no error has occurred.
The Changer API's C<res_cb> is typical of such a callback signature.

A caller can only know that an operation is complete by the invocation
of the callback, so it is important that a callback be invoked
I<exactly once> in all circumstances. Even in an error condition, the
caller needs to know that the operation has failed. Also beware of
bugs that might cause a callback to be invoked twice.
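
While debugging, one way to catch a double invocation early is to wrap
the callback in a guard that dies on the second call. This is a minimal
sketch in plain Perl. C<once_cb> is a hypothetical helper, not part of
C<Amanda::MainLoop>, and the C<estimate_cb> signature used here is only
illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Amanda::MainLoop): wrap a callback so
# that a second invocation dies loudly instead of silently running twice.
sub once_cb {
    my ($name, $cb) = @_;
    my $called = 0;
    return sub {
        die "$name invoked more than once" if $called++;
        return $cb->(@_);
    };
}

my @results;
my $estimate_cb = once_cb(estimate_cb => sub {
    my ($err, $size) = @_;
    push @results, defined $err ? "error: $err" : "size: $size";
});

# The first call runs the wrapped callback normally.
$estimate_cb->(undef, 1024);

# The second call dies; the error is captured here for inspection.
my $ok = eval { $estimate_cb->(undef, 2048); 1 };
my $second_error = $ok ? '' : $@;
```

Such a guard only detects the "invoked twice" bug. It cannot detect a
callback that is never invoked at all.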

Functions or methods taking callbacks as arguments should either take
only a callback (like C<call_later>), or take hash-key parameters,
where the callback's key is the signature name. For example, the
C<Amanda::Estimate> package might define a function like
C<perform_estimate>, invoked something like this:

    my $estimate_cb = make_cb(estimate_cb => sub {
        my ($err, $size, $level) = @_;
        die $err if $err;
        # ...
    });
    Amanda::Estimate::perform_estimate(
        host => $host,
        disk => $disk,
        estimate_cb => $estimate_cb,
    );

When invoking a user-supplied callback within the library, there is no
need to wrap it in a C<call_later> invocation, as the user already
supplied that wrapper via C<make_cb>, or is not interested in using
such a wrapper.

Callbacks are a form of continuation
(L<http://en.wikipedia.org/wiki/Continuations>), and as such should
only be called at the I<end> of a function. Do not do anything after
invoking a callback, as you cannot know what processing has gone on in
the callback.

    sub estimate_done {
        # ...
        $self->{'estimate_cb'}->(undef, $size, $level);
        $self->{'estimate_in_progress'} = 0; # BUG!!
    }

In this case, the C<estimate_cb> invocation may have called
C<perform_estimate> again, setting C<estimate_in_progress> back to 1.
A technique to avoid this pitfall is to always C<return> a callback's
result, even though that result is not important. This makes the bug
much more apparent:

    sub estimate_done {
        # ...
        return $self->{'estimate_cb'}->(undef, $size, $level);
        $self->{'estimate_in_progress'} = 0; # BUG (this just looks silly)
    }

=head1 LOW-LEVEL INTERFACE

MainLoop events are generated by event sources. A source may produce
multiple events over its lifetime.
The functions in the
HIGH-LEVEL INTERFACE section provide a more Perlish abstraction of event
sources, but for efficiency it is sometimes necessary to use event sources
directly.

The method C<< $src->set_callback(\&cb) >> sets the function that will
be called for a given source, and "attaches" the source to the main
loop so that it will begin generating events. The arguments to the
callback depend on the event source, but the first argument is always
the source itself. Unless specified, no other arguments are provided.

Event sources persist until they are removed with
C<< $src->remove() >>, even if the source itself is no longer accessible from Perl.
Although Glib supports it, there is no provision for "automatically"
removing an event source. Also, calling C<< $src->remove() >> more than
once is a potentially-fatal error. As an example:

    sub start_timer {
        my ($loops) = @_;
        Amanda::MainLoop::timeout_source(200)->set_callback(sub {
            my ($src) = @_;
            print "timer\n";
            if (--$loops <= 0) {
                $src->remove();
                Amanda::MainLoop::quit();
            }
        });
    }
    start_timer(10);
    Amanda::MainLoop::run();

There is no means in place to specify extra arguments to be provided
to a source callback when it is set. If the callback needs access to
other data, it should use a Perl closure in the form of lexically
scoped variables and an anonymous sub. In fact, this is exactly what
the higher-level functions (described above) do.

=head2 Timeout

    my $src = Amanda::MainLoop::timeout_source(10000);

A timeout source will create events at the given interval,
specified in milliseconds (thousandths of a second). The events will
continue until the source is destroyed.

=head2 Idle

    my $src = Amanda::MainLoop::idle_source(2);

An idle source will create events continuously except when a
higher-priority source is emitting events.
Priorities are generally
small positive integers, with larger integers denoting lower
priorities. The events will continue until the source is destroyed.

=head2 Child Watch

    my $src = Amanda::MainLoop::child_watch_source($pid);

A child watch source will issue an event when the process with the
given PID dies. To avoid race conditions, it will issue an event even
if the process dies before the source is created. The callback is
called with three arguments: the event source, the PID, and the
child's exit status.

Note that this source is totally incompatible with anything that
would cause Perl to change the SIGCHLD handler. If SIGCHLD is
changed, under some circumstances the module will recognize this,
add a warning to the debug log, and continue operating.
However, it is impossible to catch all possible situations.

=head2 File Descriptor

    my $src = Amanda::MainLoop::fd_source($fd, $G_IO_IN);

This source will issue an event whenever one of the given conditions
is true for the given file (a file handle or integer file descriptor).
The conditions are from Glib's GIOCondition, and are C<$G_IO_IN>,
C<$G_IO_OUT>, C<$G_IO_PRI>, C<$G_IO_ERR>, C<$G_IO_HUP>, and
C<$G_IO_NVAL>. These constants are available with the import tag
C<:GIOCondition>.

Generally, when reading from a file descriptor, use
C<$G_IO_IN|$G_IO_HUP|$G_IO_ERR> to ensure that an EOF triggers an
event as well. Writing to a file descriptor can simply use
C<$G_IO_OUT|$G_IO_ERR>.

The callback attached to an FdSource should read from or write to the
underlying file descriptor before returning, or it will be called
again in the next iteration of the main loop, which can lead to
unexpected results. Do I<not> use C<make_cb> here!

=head2 Combining Event Sources

Event sources are often set up in groups, e.g., a long-term operation
and a timeout.
When this is the case, be careful that all sources are
removed when the operation is complete. The easiest way to accomplish
this is to include all sources in a lexical scope and remove them at
the appropriate times:

    {
        my $op_src = long_operation_src();
        my $timeout_src = Amanda::MainLoop::timeout_source($timeout);

        sub finish {
            $op_src->remove();
            $timeout_src->remove();
        }

        $op_src->set_callback(sub {
            print "Operation complete\n";
            finish();
        });

        $timeout_src->set_callback(sub {
            print "Operation timed out\n";
            finish();
        });
    }

=head2 Relationship to Glib

Glib's main event loop is described in the Glib manual:
L<http://library.gnome.org/devel/glib/stable/glib-The-Main-Event-Loop.html>.
Note that Amanda depends only on the functionality available in
Glib-2.2.0, so many functions described in that document are not
available in Amanda. This module provides a much-simplified interface
to the Glib library, and is not intended as a generic wrapper for it:
Amanda's Perl-accessible main loop only runs a single C<GMainContext>,
and always runs in the main thread; and (aside from idle sources),
event priorities are not accessible from Perl.

=cut


%}