NAME
    AnyEvent::Task - Client/server-based asynchronous worker pool

SYNOPSIS 1: PASSWORD HASHING
  Server
        use AnyEvent::Task::Server;
        use Authen::Passphrase::BlowfishCrypt;

        my $dev_urandom;
        my $server = AnyEvent::Task::Server->new(
          listen => ['unix/', '/tmp/anyevent-task.socket'],
          setup => sub {
            open($dev_urandom, '<', "/dev/urandom") || die "open urandom: $!";
          },
          interface => {
            hash => sub {
              my ($plaintext) = @_;
              read($dev_urandom, my $salt, 16) == 16 || die "bad read from urandom";
              return Authen::Passphrase::BlowfishCrypt->new(cost => 10,
                                                            salt => $salt,
                                                            passphrase => $plaintext)
                                                      ->as_crypt;
            },
            verify => sub {
              my ($crypted, $plaintext) = @_;
              return Authen::Passphrase::BlowfishCrypt->from_crypt($crypted)
                                                      ->match($plaintext);
            },
          },
        );

        $server->run; # or AE::cv->recv

  Client
        use AnyEvent::Task::Client;

        my $client = AnyEvent::Task::Client->new(
          connect => ['unix/', '/tmp/anyevent-task.socket'],
        );

        my $checkout = $client->checkout( timeout => 5, );

        my $cv = AE::cv;

        $checkout->hash('secret',
          sub {
            my ($checkout, $crypted) = @_;

            print "Hashed password is $crypted\n";

            $checkout->verify($crypted, 'secret',
              sub {
                my ($checkout, $result) = @_;
                print "Verify result is $result\n";
                $cv->send;
              });
          });

        $cv->recv;

  Output
        Hashed password is $2a$10$NwTOwxmTlG0Lk8YZMT29/uysC9RiZX4jtWCx.deBbb2evRjCq6ovi
        Verify result is 1

SYNOPSIS 2: DBI
  Server
        use AnyEvent::Task::Server;
        use DBI;

        my $dbh;

        AnyEvent::Task::Server->new(
          listen => ['unix/', '/tmp/anyevent-task.socket'],
          setup => sub {
            $dbh = DBI->connect("dbi:SQLite:dbname=/tmp/junk.sqlite3","","",{ RaiseError => 1, });
          },
          interface => sub {
            my ($method, @args) = @_;
            $dbh->$method(@args);
          },
        )->run;

  Client
        use AnyEvent::Task::Client;

        my $client = AnyEvent::Task::Client->new(
          connect => ['unix/', '/tmp/anyevent-task.socket'],
        );

        my $dbh = $client->checkout;

        my $cv = AE::cv;

        $dbh->do(q{ CREATE TABLE user(username TEXT PRIMARY KEY, email TEXT); },
                 sub { });

        ## Requests will queue up on the checkout and execute in order:

        $dbh->do(q{ INSERT INTO user (username, email) VALUES (?, ?) },
                 undef, 'jimmy',
                        'jimmy@example.com',
                 sub { });

        $dbh->selectrow_hashref(q{ SELECT * FROM user }, sub {
          my ($dbh, $user) = @_;
          print "username: $user->{username}, email: $user->{email}\n";
          $cv->send;
        });

        $cv->recv;

  Output
        username: jimmy, email: jimmy@example.com

DESCRIPTION
    The synopses make this module look much more complicated than it
    actually is. In a nutshell, a synchronous worker process is forked off
    by a server whenever a client asks for one. The client keeps as many of
    these workers around as it wants and delegates tasks to them
    asynchronously.

    Another way of saying that is that AnyEvent::Task is a
    pre-fork-on-demand server (AnyEvent::Task::Server) combined with a
    persistent worker-pooled client (AnyEvent::Task::Client).

    The examples in the synopses are complete stand-alone programs. Run the
    server in one window and the client in another. The server will remain
    running but the client will exit after printing its output. Typically
    the "client" programs would be embedded in a server program such as a
    web-server.

    Note that the client examples don't implement error checking (see the
    "ERROR HANDLING" section).

    A server is started with "AnyEvent::Task::Server->new". This constructor
    should be passed in at least the "listen" and "interface" arguments.
    Keep the returned server object around for as long as you want the
    server to be running. "listen" is an array ref containing the host and
    service options to be passed to AnyEvent::Socket's "tcp_server"
    function. "interface" is the code that should handle each request. See
    the INTERFACE section below for its specification. A "setup" coderef can
    be passed in to run some code after a new worker is forked. A
    "checkout_done" coderef can be passed in to run some code whenever a
    checkout is released in order to perform any required clean-up.

    A client is started with "AnyEvent::Task::Client->new". You only need to
    pass "connect" to this constructor which is an array ref containing the
    host and service options to be passed to AnyEvent::Socket's
    "tcp_connect". Keep the returned client object around as long as you
    wish the client to be connected.

    After the server and client are initialised, each process must enter
    AnyEvent's "main loop" in some way, possibly just "AE::cv->recv". The
    "run" method on the server object is a convenient short-cut for this.

    To acquire a worker process you call the "checkout" method on the client
    object. The "checkout" method doesn't need any arguments, but several
    optional ones such as "timeout" are described below. As long as the
    checkout object is around, this checkout has exclusive access to the
    worker.

    The checkout object is an object that proxies its method calls to a
    worker process or a function that does the same. The arguments to this
    method/function are the arguments you wish to send to the worker process
    followed by a callback to run when the operation completes. The callback
    will be passed two arguments: the original checkout object and the value
    returned by the worker process. The checkout object is passed into the
    callback as a convenience just in case you no longer have the original
    checkout available lexically.

    In the event of an exception thrown by the worker process, a timeout, or
    some other unexpected condition, an error is raised in the dynamic
    context of the callback (see the "ERROR HANDLING" section).

DESIGN
    Both client and server are of course built with AnyEvent. However,
    workers can't use AnyEvent (yet). I've never found a need to do event
    processing in the worker since if the library you wish to use is already
    AnyEvent-compatible you can simply use the library in the client
    process. If the client process is too over-loaded, it may make sense to
    run multiple client processes.

    Each client maintains a "pool" of connections to worker processes. Every
    time a checkout is requested, the request is placed into a first-come,
    first-served queue. Once a worker process becomes available, it is
    associated with that checkout until that checkout is garbage collected,
    which in perl means as soon as it is no longer needed. Each checkout
    also maintains a queue of requested method-calls so that as soon as a
    worker process is allocated to a checkout, any queued method calls are
    filled in order.

    "timeout" can be passed as a keyword argument to "checkout". Once a
    request is queued up on that checkout, a timer of "timeout" seconds
    (default is 30, undef means infinity) is started. If the request
    completes during this timeframe, the timer is cancelled. If the timer
    expires, the worker connection is terminated and an exception is thrown
    in the dynamic context of the callback (see the "ERROR HANDLING"
    section).

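    As a rough illustration of the timeout behaviour, the sketch below
    forks a deliberately slow server and gives the checkout a 1 second
    timeout. The socket path and the 3 second delay are made up for this
    example, and the text of the error message is not shown because it is
    not specified in this document.

```perl
use strict;
use warnings;

use AnyEvent;
use AnyEvent::Task::Server;
use AnyEvent::Task::Client;
use Callback::Frame;

## Hypothetical socket path, chosen only for this sketch
my $socket = "/tmp/anyevent-task-timeout-demo.$$.socket";

## Fork the server first, before any AnyEvent watchers are installed
AnyEvent::Task::Server::fork_task_server(
  listen => ['unix/', $socket],
  interface => sub {
    select undef, undef, undef, 3; ## block for longer than the client's timeout
    return "finished";
  },
);

my $client = AnyEvent::Task::Client->new(
  connect => ['unix/', $socket],
);

my $cv = AE::cv;
my $outcome;

frame(code => sub {

  $client->checkout(timeout => 1)->(sub {
    my ($checkout, $result) = @_;
    $outcome = "completed: $result";
    $cv->send;
  });

}, catch => sub {

  ## The timeout error arrives here, in the dynamic context of the callback
  $outcome = "failed: $@";
  $cv->send;

})->();

$cv->recv;
print "$outcome\n";
```

    With these made-up numbers the catch handler should run rather than
    the callback, since the worker is still blocked when the timer expires.
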
    Note that since timeouts are associated with a checkout, checkouts can
    be created before the server is started. As long as the server is
    running within "timeout" seconds, no error will be thrown and no
    requests will be lost. The client will continually try to acquire worker
    processes until a server is available, and once one is available it will
    attempt to allocate all queued checkouts.

    Because of checkout queuing, the maximum number of worker processes a
    client will attempt to obtain can be limited with the "max_workers"
    argument when creating a client object. If there are more live checkouts
    than "max_workers", the remaining checkouts will have to wait until one
    of the other workers becomes available. Because of timeouts, some
    checkouts may never be serviced if the system can't handle the load (the
    timeout error should be handled to indicate the service is temporarily
    unavailable).

    The "min_workers" argument determines how many "hot-standby" workers
    should be pre-forked when creating the client. The default is 2 though
    note that this may change to 0 in the future.

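    A client that sets both pool limits might be constructed roughly as
    follows. The numbers are illustrative only; setting "min_workers"
    explicitly is one way to avoid depending on a default that may change.

```perl
use strict;
use warnings;

use AnyEvent::Task::Client;

my $client = AnyEvent::Task::Client->new(
  connect => ['unix/', '/tmp/anyevent-task.socket'],
  min_workers => 0,   ## don't pre-fork any hot-standby workers
  max_workers => 10,  ## illustrative cap; further checkouts wait in the queue
);
```
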
STARTING THE SERVER
    Typically you will want to start the client and server as completely
    separate processes as shown in the synopses.

    Running the server and the client in the same process is technically
    possible but is highly discouraged since the server will "fork()" when
    the client demands a new worker process. In this case, all descriptors
    in use by the client are duped into the worker process and the worker
    ought to close these extra descriptors. Also, forking a busy client may
    be memory-inefficient (and dangerous if it uses threads).

    Since it's more of a bother than it's worth to run the server and the
    client in the same process, there is an alternate server constructor,
    "AnyEvent::Task::Server::fork_task_server" for when you'd like to fork a
    dedicated server process. It can be passed the same arguments as the
    regular "new" constructor:

        ## my ($keepalive_pipe, $server_pid) =
        AnyEvent::Task::Server::fork_task_server(
          listen => ['unix/', '/tmp/anyevent-task.socket'],
          interface => sub {
            return "Hello from PID $$";
          },
        );

    The only differences between this and the regular constructor are that
    "fork_task_server" will fork a process which becomes the server and will
    also install a "keep-alive" pipe between the server and the client. This
    keep-alive pipe will be used by the server to detect when its parent
    (the client process) exits.

    If "AnyEvent::Task::Server::fork_task_server" is called in a void
    context then the reference to this keep-alive pipe is pushed onto
    @AnyEvent::Task::Server::children_sockets. Otherwise, the keep-alive
    pipe and the server's PID are returned. Closing the pipe will terminate
    the server gracefully. "kill" the PID to terminate it immediately. Note
    that even when the server is shut down, existing worker processes and
    checkouts may still be active in the client. The client object and all
    checkout objects should be destroyed if you wish to ensure all workers
    are shut down.

    Since the "fork_task_server" constructor calls fork and requires using
    AnyEvent in both the parent and child processes, it is important that
    you not install any AnyEvent watchers before calling it. The usual
    caveats about forking AnyEvent processes apply (see the AnyEvent docs).

    You should also not call "fork_task_server" after having started threads
    since, again, this function calls fork. Forking a threaded process is
    dangerous because the threads might have userspace data-structures in
    inconsistent states at the time of the fork.

INTERFACE
    When creating a server, there are two possible formats for the
    "interface" option. The first and most general is a coderef. This
    coderef will be passed the list of arguments that were sent when the
    checkout was called in the client process (without the trailing callback
    of course).

    As described above, you can use a checkout object as a coderef or as an
    object with methods. If the checkout is invoked as an object, the method
    name is prepended to the arguments passed to "interface":

        interface => sub {
          my ($method, @args) = @_;
        },

    If the checkout is invoked as a coderef, method is omitted:

        interface => sub {
          my (@args) = @_;
        },

    The second format possible for "interface" is a hash ref. This is a
    simple method dispatch feature where the method invoked on the checkout
    object is the key used to lookup which coderef to run in the worker:

        interface => {
          method1 => sub {
            my (@args) = @_;
          },
          method2 => sub {
            my (@args) = @_;
          },
        },

    Note that since the protocol between the client and the worker process
    is currently JSON-based, all arguments and return values must be
    serializable to JSON. This includes most perl scalars like strings, a
    limited range of numerical types, and hash/list constructs with no
    cyclical references.

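    To get a feel for what "serializable to JSON" rules out, the sketch
    below runs a few values through the core JSON::PP module. JSON::PP is
    only a stand-in here; this document does not say which JSON
    implementation AnyEvent::Task uses, so exact behaviour may differ.

```perl
use strict;
use warnings;

use JSON::PP;

my $json = JSON::PP->new;

## Plain strings, numbers, undef, and nested hashes/arrays survive a round trip
my $args = ['jimmy', 42, { tags => ['a', 'b'], note => undef }];
my $copy = $json->decode($json->encode($args));
print "round trip ok: $copy->[0] $copy->[1] $copy->[2]{tags}[1]\n";

## A code ref has no JSON representation, so encoding it fails
my $ok = eval { $json->encode([ sub { 1 } ]); 1 };
print "code ref encodes: ", ($ok ? "yes" : "no"), "\n";

## A structure that refers to itself cannot be serialised either
my $cycle = {};
$cycle->{self} = $cycle;
my $cycle_ok = eval { $json->encode($cycle); 1 };
print "cyclical structure encodes: ", ($cycle_ok ? "yes" : "no"), "\n";
delete $cycle->{self}; ## break the cycle so perl can free the hash
```

    Anything that fails here (code refs, filehandles, cyclical data) is a
    good candidate for keeping inside the worker and referring to
    indirectly, for example by name or ID.
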
    Because there isn't any way for the callback to indicate the context it
    desires, interface subs are always called in scalar context.

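    One practical consequence of scalar context is that an interface sub
    returning an array hands back only its element count. The following
    core-Perl sketch mimics this with a made-up helper,
    "call_in_scalar_context", which is not part of AnyEvent::Task:

```perl
use strict;
use warnings;

## Stand-in for how a worker invokes an interface sub: always scalar context.
## (call_in_scalar_context is a made-up helper for this sketch.)
sub call_in_scalar_context {
  my ($interface, @args) = @_;
  my $result = $interface->(@args);
  return $result;
}

my @rows = ('alice', 'bob', 'carol');

## Returning an array yields its element count in scalar context
my $count = call_in_scalar_context(sub { return @rows });

## Returning an array ref keeps all of the elements
my $list = call_in_scalar_context(sub { return [ @rows ] });

print "array returned: $count\n";
print "array ref returned: @$list\n";
```

    Returning an array ref or hash ref from the interface sub is therefore
    the safer habit whenever more than one value needs to reach the client.
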
    A future backwards compatible RPC protocol may use Sereal. Although it's
    inefficient you can already serialise an object with Sereal manually,
    send the resulting string over the existing protocol, and then
    deserialise it in the worker.

LOGGING
    Because workers run in a separate process, they can't directly use
    logging contexts in the client process. That is why this module is
    integrated with Log::Defer.

    A Log::Defer object is created on demand in the worker process. Once the
    worker has finished an operation, any messages in the object will be
    extracted and sent back to the client. The client then merges this into
    its main Log::Defer object that was passed in when creating the
    checkout.

    In your server code, use AnyEvent::Task::Logger. It exports the function
    "logger" which returns a Log::Defer object:

        use AnyEvent::Task::Server;
        use AnyEvent::Task::Logger;

        AnyEvent::Task::Server->new(
          listen => ['unix/', '/tmp/anyevent-task.socket'],
          interface => sub {
            logger->info('about to compute some operation');
            {
              my $timer = logger->timer('computing some operation');
              select undef,undef,undef, 1; ## sleep for 1 second
            }
          },
        )->run;

    Note: Portable server code should never call "sleep" because on some
    systems it will interfere with the recoverable worker timeout feature
    implemented with "SIGALRM".

    In your client code, pass a Log::Defer object in when you create a
    checkout:

        use AnyEvent::Task::Client;
        use Log::Defer;

        my $client = AnyEvent::Task::Client->new(
          connect => ['unix/', '/tmp/anyevent-task.socket'],
        );

        my $log_defer_object = Log::Defer->new(sub {
          my $msg = shift;
          use Data::Dumper; ## or whatever
          print Dumper($msg);
        });

        $log_defer_object->info('going to compute some operation in a worker');

        my $checkout = $client->checkout(log_defer_object => $log_defer_object);

        my $cv = AE::cv;

        $checkout->(sub {
          $log_defer_object->info('finished some operation');
          $cv->send;
        });

        $cv->recv;

    When run, the above client will print something like this:

        $VAR1 = {
                  'start' => '1363232705.96839',
                  'end' => '1.027309',
                  'logs' => [
                              [
                                '0.000179',
                                30,
                                'going to compute some operation in a worker'
                              ],
                              [
                                '0.023881061050415',
                                30,
                                'about to compute some operation'
                              ],
                              [
                                '1.025965',
                                30,
                                'finished some operation'
                              ]
                            ],
                  'timers' => {
                                'computing some operation' => [
                                                                '0.024089061050415',
                                                                '1.02470206105041'
                                                              ]
                              }
                };

ERROR HANDLING
    In a synchronous program, if you expected some operation to throw an
    exception you might wrap it in "eval" like this:

        my $crypted;

        eval {
          $crypted = hash('secret');
        };

        if ($@) {
          say "hash failed: $@";
        } else {
          say "hashed password is $crypted";
        }

    But in an asynchronous program, typically "hash" would initiate some
    kind of asynchronous operation and then return immediately, allowing the
    program to go about other tasks while waiting for the result. Since the
    error might come back at any time in the future, the program needs a way
    to map the exception that is thrown back to the original context.

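    To see why a plain "eval" around the call site is not enough, here is a
    sketch in core Perl. A hand-rolled callback queue stands in for a real
    event loop; the names @pending and "hash_async" are hypothetical and
    exist only for this illustration.

```perl
use strict;
use warnings;

## A toy "event loop": callbacks are queued now and run later.
## (@pending and hash_async are hypothetical names for this sketch.)
my @pending;

sub hash_async {
  my ($plaintext, $cb) = @_;
  push @pending, sub { die "hash failed\n" };  ## the operation fails, but only later
  return;
}

my $caught_at_call_site = 0;

eval {
  hash_async('secret', sub { print "hashed password is $_[0]\n" });
  1;
} or $caught_at_call_site = 1;

## The eval block has already finished by the time the queued work runs
my $caught_in_loop = 0;
for my $task (@pending) {
  eval { $task->(); 1 } or $caught_in_loop = 1;
}

print "caught at call site: $caught_at_call_site\n";
print "caught in event loop: $caught_in_loop\n";
```

    The exception surfaces inside the loop, far from the code that asked
    for the hash, which is exactly the gap that has to be bridged.
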
    AnyEvent::Task accomplishes this mapping with Callback::Frame.

    Callback::Frame lets you preserve error handlers (and "local" variables)
    across asynchronous callbacks. Callback::Frame is not tied to
    AnyEvent::Task, AnyEvent or any other async framework and can be used
    with almost all callback-based libraries.

    However, when using AnyEvent::Task, libraries that you use in the client
    must be AnyEvent compatible. This restriction obviously does not apply
    to your server code, that being the main purpose of this module:
    accessing blocking resources from an asynchronous program. In your
    server code, when there is an error condition you should simply "die" or
    "croak" as in a synchronous program.

    As an example usage of Callback::Frame, here is how we would handle
    errors thrown from a worker process running the "hash" method in an
    asynchronous client program:

        use feature 'say';
        use Callback::Frame;

        frame(code => sub {

          $client->checkout->hash('secret', sub {
            my ($checkout, $crypted) = @_;
            say "Hashed password is $crypted";
          });

        }, catch => sub {

          my $back_trace = shift;
          say "Error is: $@";
          say "Full back-trace: $back_trace";

        })->(); ## <-- frame is created and then immediately executed

    Of course if "hash" is something like a bcrypt hash function it is
    unlikely to raise an exception so maybe that's a bad example. On the
    other hand, maybe it's a really good example: In addition to errors that
    occur while running your callbacks, AnyEvent::Task uses Callback::Frame
    to throw errors if the worker process times out, so if the bcrypt "cost"
    is really cranked up it might hit the default 30 second time limit.

  Rationale for Callback::Frame
    Why not just call the callback but set $@ and indicate an error has
    occurred? This is the approach taken with AnyEvent::DBI for example. I
    believe the Callback::Frame interface is superior to this method. In a
    synchronous program, exceptions are out-of-band messages and code
    doesn't need to locally handle them. It can let them "bubble up" the
    stack, perhaps to a top-level error handler. Invoking the callback when
    an error occurs forces exceptions to be handled in-band.

    How about having AnyEvent::Task expose an error callback? This is the
    approach taken by AnyEvent::Handle for example. I believe
    Callback::Frame is superior to this method also. Although separate
    callbacks are (sort of) out-of-band, you still have to write error
    handler callbacks and do something relevant locally instead of allowing
    the exception to bubble up to an error handler.

    In servers, Callback::Frame helps you maintain the "dynamic state"
    (error handlers and dynamic variables) installed for a single
    connection. In other words, any errors that occur while servicing that
    connection will be able to be caught by an error handler specific to
    that connection. This lets you send an error response to the client and
    collect associated log messages in a Log::Defer object specific to that
    connection.

    Callback::Frame provides an error handler stack so you can have a
    top-level handler as well as nested handlers (similar to nested
    "eval"s). This is useful when you wish to have a top-level "bail-out"
    error handler and also nested error handlers that know how to retry or
    recover from an error in an async sub-operation.

    Callback::Frame is designed to be easily used with callback-based
    libraries that don't know about Callback::Frame. "fub" is a shortcut for
    "frame" with just the "code" argument. Instead of passing "sub { ... }"
    into libraries you can pass in "fub { ... }". When invoked, this wrapped
    callback will first re-establish any error handlers that you installed
    with "frame" and then run your provided code. Libraries that force
    in-band error signalling can be handled with callbacks such as "fub {
    die $@ if $@; ... }". Separate error callbacks should simply be "fub {
    die "failed because ..." }".

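    As a small sketch of "fub" in use, the program below hands a fub to an
    AnyEvent timer, a library call that knows nothing about
    Callback::Frame. The 0.1 second delay and the error text are
    arbitrary, and the Callback::Frame documentation remains the
    authoritative reference for the details.

```perl
use strict;
use warnings;

use AnyEvent;
use Callback::Frame;

my $cv = AE::cv;
my ($timer, $report);

frame(code => sub {

  ## AE::timer knows nothing about Callback::Frame, so pass it a fub
  ## rather than a plain sub
  $timer = AE::timer 0.1, 0, fub {
    die "lookup failed\n";
  };

}, catch => sub {

  my $back_trace = shift;
  $report = "caught: $@";
  $cv->send;

})->();

$cv->recv;
print $report;
```

    Had a plain "sub" been passed to the timer instead, the "die" would
    have escaped into the event loop rather than reaching the "catch"
    handler.
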
    It's important that all callbacks be created with "fub" (or "frame")
    even if you don't expect them to fail so that the dynamic context is
    preserved for nested callbacks that may. An exception is the callbacks
    provided to AnyEvent::Task checkouts: These are automatically wrapped in
    frames for you (although explicitly passing in fubs is fine too).

    The Callback::Frame documentation explains how this works in much more
    detail.

  Reforking of workers after errors
    If a worker throws an error, the client receives the error but the
    worker process stays running. As long as the client has a reference to
    the checkout (and as long as the exception wasn't "fatal" -- see below),
    it can still be used to communicate with that worker so you can access
    error states, rollback transactions, or do any sort of required
    clean-up.

    However, once the checkout object is destroyed, by default the worker
    will be shut down instead of returning to the client's worker pool as in
    the normal case where no errors were thrown. This is a "safe-by-default"
    behaviour that may help in the event that an exception thrown by a
    worker leaves the worker process in a broken/inconsistent state for some
    reason (for example a DBI connection died). This can be overridden by
    setting the "dont_refork_after_error" option to 1 in the client
    constructor. This will only matter if errors are being thrown frequently
    and your "setup" routines take a long time (aside from the setup
    routine, creating new workers is quite fast since the server has already
    compiled all the application code and just has to fork).

    There are cases where workers will never be returned to the worker pool:
    workers that have thrown fatal errors such as loss of worker connection
    or hung worker timeout errors. These errors are stored in the checkout
    and for as long as the checkout exists any methods on the checkout will
    immediately return the stored fatal error. Your client process can
    invoke this behaviour manually by calling the "throw_fatal_error" method
    on a checkout object to cancel an operation and force-terminate a
    worker.

    Another reason that a worker might not be returned to the worker pool is
    if it has been checked out "max_checkouts" times. If "max_checkouts" is
    specified as an argument to the Client constructor, then workers will be
    destroyed and reforked after being checked out this number of times.
    When not specified, workers are never re-forked for this reason. This
    parameter is useful for coping with libraries that leak memory or
    otherwise become slower/more resource-hungry over time.

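    Both worker-recycling options go in the client constructor. A
    configuration along these lines is one possibility; the value 1000 is
    purely illustrative and should be tuned to how quickly your workers
    degrade.

```perl
use strict;
use warnings;

use AnyEvent::Task::Client;

my $client = AnyEvent::Task::Client->new(
  connect => ['unix/', '/tmp/anyevent-task.socket'],
  dont_refork_after_error => 1,  ## reuse workers even after a non-fatal error
  max_checkouts => 1000,         ## illustrative: recycle a worker after 1000 checkouts
);
```
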
COMPARISON WITH HTTP
    Why a custom protocol, client, and server? Can't we just use something
    like HTTP?

    It depends.

    AnyEvent::Task clients send discrete messages and receive ordered
    replies from workers, much like HTTP. The AnyEvent::Task protocol can be
    extended in a backwards-compatible manner like HTTP. AnyEvent::Task
    communication can be pipelined and possibly in the future even
    compressed like HTTP.

    The current AnyEvent::Task server obeys a very specific implementation
    policy: It is like a CGI server in that each process it forks is
    guaranteed to be handling only one connection at once so it can perform
    blocking operations without worrying about holding up other connections.

    But since a single process can handle many requests in a row without
    exiting, they are more like persistent FastCGI processes. The difference
    however is that while a client holds a checkout it is guaranteed an
    exclusive lock on that process (useful for supporting DB transactions
    for example). With a FastCGI server it is assumed that requests are
    stateless so you can't necessarily be sure you'll get the same process
    for two consecutive requests. In fact, if an error is thrown in the
    FastCGI handler you may never get the same process back again,
    preventing you from being able to recover from the error, retry, or at
    least collect process state for logging reasons.

    The fundamental difference between the AnyEvent::Task protocol and HTTP
    is that in AnyEvent::Task the client is the dominant protocol
    orchestrator whereas in HTTP it is the server.

    In AnyEvent::Task, the client manages the worker pool and the client
    decides if/when worker processes should terminate. In the normal case, a
    client will just return the worker to its worker pool. A worker is
    supposed to accept commands for as long as possible until the client
    dismisses it.

    The client decides the timeout for each checkout and different clients
    can have different timeouts while connecting to the same server.

    Client processes can be started and checkouts can be obtained before the
    server is even started. The client will continue trying to connect to
    the server to obtain worker processes until either the server starts or
    the checkout's timeout period lapses. As well as freeing you from having
    to start your services in the "right" order, this also means servers can
    be restarted without throwing any errors (aka "zero-downtime restarts").

    The client even decides how many minimum workers should be in the pool
    upon start-up and how many maximum workers to acquire before checkout
    creation requests are queued. The server is really just a dumb
    fork-on-demand server and most of the sophistication is in the
    asynchronous client.

SEE ALSO
    The AnyEvent::Task github repo
    <https://github.com/hoytech/AnyEvent-Task>

    In order to handle exceptions in a meaningful way with this module, you
    must use Callback::Frame. In order to maintain seamless request logging
    across clients and workers, you should use Log::Defer.

    There are many modules on CPAN similar to AnyEvent::Task.

    This module is designed to be used in a non-blocking, process-based unix
    program. Depending on your exact requirements you might find something
    else useful: Parallel::ForkManager, Thread::Pool, or an HTTP server of
    some kind.

    If you're into AnyEvent, AnyEvent::DBI and AnyEvent::Worker (based on
    AnyEvent::DBI), AnyEvent::ForkObject, and AnyEvent::Fork::RPC send and
    receive commands from worker processes similar to this module.
    AnyEvent::Worker::Pool also has an implementation of a worker pool.
    AnyEvent::Gearman can interface with Gearman services.

    If you're into POE there is POE::Component::Pool::DBI, POEx::WorkerPool,
    POE::Component::ResourcePool, POE::Component::PreforkDispatch,
    Cantella::Worker.

BUGS
    Although this module's interface is now stable and has been in
    production use for some time, there are a few remaining TODO items (see
    the bottom of Task.pm).

AUTHOR
    Doug Hoyte, "<doug@hcsw.org>"

COPYRIGHT & LICENSE
    Copyright 2012-2014 Doug Hoyte.

    This module is licensed under the same terms as perl itself.