=head1 NAME

Test::Harness::Beyond - Beyond make test

=head1 Beyond make test

Test::Harness is responsible for running test scripts, analysing
their output and reporting success or failure. When I type
F<make test> (or F<./Build test>) for a module, Test::Harness is usually
used to run the tests (not all modules use Test::Harness but the
majority do).

To start exploring some of the features of Test::Harness I need to
switch from F<make test> to the F<prove> command (which ships with
Test::Harness). For the following examples I'll also need a recent
version of Test::Harness installed; 3.14 is current as I write.

For the examples I'm going to assume that we're working with a
'normal' Perl module distribution. Specifically I'll assume that
typing F<make> or F<./Build> causes the built, ready-to-install module
code to be available below ./blib/lib and ./blib/arch and that
there's a directory called 't' that contains our tests. Test::Harness
isn't hardwired to that configuration but it saves me from explaining
which files live where for each example.

Back to F<prove>; like F<make test> it runs a test suite - but it
provides far more control over which tests are executed, in what
order and how their results are reported. Typically F<make test>
runs all the test scripts below the 't' directory. To do the same
thing with prove I type:

  prove -rb t

The switches here are -r to recurse into any directories below 't'
and -b which adds ./blib/lib and ./blib/arch to Perl's include path
so that the tests can find the code they will be testing. If I'm
testing a module of which an earlier version is already installed
I need to be careful about the include path to make sure I'm not
running my tests against the installed version rather than the new
one that I'm working on.

Unlike F<make test>, typing F<prove> doesn't automatically rebuild
my module. If I forget to make before prove I will be testing against
older versions of those files - which inevitably leads to confusion.
I either get into the habit of typing

  make && prove -rb t

or - if I have no XS code that needs to be built - I use the modules
below F<lib> instead:

  prove -Ilib -r t

So far I've shown you nothing that F<make test> doesn't do. Let's
fix that.

=head2 Saved State

If I have failing tests in a test suite that consists of more than
a handful of scripts and takes more than a few seconds to run it
rapidly becomes tedious to run the whole test suite repeatedly as
I track down the problems.

I can tell prove just to run the tests that are failing like this:

  prove -b t/this_fails.t t/so_does_this.t

That speeds things up but I have to make a note of which tests are
failing and make sure that I run those tests. Instead I can use
prove's --state switch and have it keep track of failing tests for
me. First I do a complete run of the test suite and tell prove to
save the results:

  prove -rb --state=save t

That stores a machine-readable summary of the test run in a file
called '.prove' in the current directory. If I have failures I can
then run just the failing scripts like this:

  prove -b --state=failed

I can also tell prove to save the results again so that it updates
its idea of which tests failed:

  prove -b --state=failed,save

As soon as one of my failing tests passes it will be removed from
the list of failed tests. Eventually I fix them all and prove can
find no failing tests to run:

  Files=0, Tests=0, 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
  Result: NOTESTS

As I work on a particular part of my module it's most likely that
the tests that cover that code will fail. I'd like to run the whole
test suite but have it prioritize these 'hot' tests. I can tell
prove to do this:

  prove -rb --state=hot,save t

All the tests will run but those that failed most recently will be
run first. If no tests have failed since I started saving state all
tests will run in their normal order. This combines full test
coverage with early notification of failures.

The --state switch supports a number of options; for example to run
failed tests first followed by all remaining tests ordered by the
timestamps of the test scripts - and save the results - I can use

  prove -rb --state=failed,new,save t

See the prove documentation (type prove --man) for the full list
of state options.

When I tell prove to save state it writes a file called '.prove'
('_prove' on Windows) in the current directory. It's a YAML document
so it's quite easy to write tools of your own that work on the saved
test state - but the format isn't officially documented so it might
change without (much) warning in the future.

=head2 Parallel Testing

If my tests take too long to run I may be able to speed them up by
running multiple test scripts in parallel. This is particularly
effective if the tests are I/O bound or if I have multiple CPU
cores. I tell prove to run my tests in parallel like this:

  prove -rb -j 9 t

The -j switch enables parallel testing; the number that follows it
is the maximum number of tests to run in parallel. Sometimes tests
that pass when run sequentially will fail when run in parallel. For
example if two different test scripts use the same temporary file
or attempt to listen on the same socket I'll have problems running
them in parallel. If I see unexpected failures I need to check my
tests to work out which of them are trampling on the same resource
and rename temporary files or add locks as appropriate.

To get the most performance benefit I want to have the test scripts
that take the longest to run start first - otherwise I'll be waiting
for the one test that takes nearly a minute to complete after all
the others are done. I can use the --state switch to run the tests
in slowest to fastest order:

  prove -rb -j 9 --state=slow,save t

147=head2 Non-Perl Tests
148
149The Test Anything Protocol (http://testanything.org/) isn't just
150for Perl. Just about any language can be used to write tests that
151output TAP. There are TAP based testing libraries for C, C++, PHP,
152Python and many others. If I can't find a TAP library for my language
153of choice it's easy to generate valid TAP. It looks like this:
154
155  1..3
156  ok 1 - init OK
157  ok 2 - opened file
158  not ok 3 - appended to file
159
160The first line is the plan - it specifies the number of tests I'm
161going to run so that it's easy to check that the test script didn't
162exit before running all the expected tests. The following lines are
163the test results - 'ok' for pass, 'not ok' for fail. Each test has
164a number and, optionally, a description. And that's it. Any language
165that can produce output like that on STDOUT can be used to write
166tests.
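
Because TAP is just text on STDOUT, a plain script with no testing
library at all can emit it. Here's a sketch that produces the same
three-line stream as the transcript above; the pass/fail values are
hard-coded for illustration:

```perl
use strict;
use warnings;

# Each result is a [ pass?, description ] pair.
my @results = (
    [ 1, 'init OK' ],
    [ 1, 'opened file' ],
    [ 0, 'appended to file' ],
);

# The plan line comes first, then one numbered result line per test.
my @tap = ( '1..' . scalar @results );
my $n = 0;
for my $r (@results) {
    my ( $ok, $desc ) = @$r;
    push @tap, sprintf '%s %d - %s', ( $ok ? 'ok' : 'not ok' ), ++$n, $desc;
}
print "$_\n" for @tap;
```

prove doesn't care how the lines are produced - only that they
arrive on STDOUT in this shape.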

Recently I've been rekindling a two-decades-old interest in Forth.
Evidently I have a masochistic streak that even Perl can't satisfy.
I want to write tests in Forth and run them using prove (you can
find my gforth TAP experiments at
https://svn.hexten.net/andy/Forth/Testing/). I can use the --exec
switch to tell prove to run the tests using gforth like this:

  prove -r --exec gforth t

Alternately, if the language used to write my tests allows a shebang
line I can use that to specify the interpreter. Here's a test written
in PHP:

  #!/usr/bin/php
  <?php
    print "1..2\n";
    print "ok 1\n";
    print "not ok 2\n";
  ?>

If I save that as t/phptest.t the shebang line will ensure that it
runs correctly along with all my other tests.

=head2 Mixing it up

Subtle interdependencies between test programs can mask problems -
for example an earlier test may neglect to remove a temporary file
that affects the behaviour of a later test. To find this kind of
problem I use the --shuffle and --reverse options to run my tests
in random or reversed order.

=head2 Rolling My Own

If I need a feature that prove doesn't provide I can easily write my own.

Typically you'll want to change how TAP gets I<input> into and I<output>
from the parser. L<App::Prove> supports arbitrary plugins, and L<TAP::Harness>
supports custom I<formatters> and I<source handlers> that you can load using
either L<prove> or L<Module::Build>; there are many examples to base mine on.
For more details see L<App::Prove>, L<TAP::Parser::SourceHandler>, and
L<TAP::Formatter::Base>.

If writing a plugin is not enough, you can write your own test harness; one of
the motives for the 3.00 rewrite of Test::Harness was to make it easier to
subclass and extend.

The Test::Harness module is a compatibility wrapper around TAP::Harness.
For new applications I should use TAP::Harness directly. As we'll
see, prove uses TAP::Harness.

When I run prove it processes its arguments, figures out which test
scripts to run and then passes control to TAP::Harness to run the
tests, parse, analyse and present the results. By subclassing
TAP::Harness I can customise many aspects of the test run.

I want to log my test results in a database so I can track them
over time. To do this I override the summary method in TAP::Harness.
I start with a simple prototype that dumps the results as a YAML
document:

  package My::TAP::Harness;

  use base 'TAP::Harness';
  use YAML;

  sub summary {
    my ( $self, $aggregate ) = @_;
    print Dump( $aggregate );
    $self->SUPER::summary( $aggregate );
  }

  1;

I need to tell prove to use My::TAP::Harness. If My::TAP::Harness
is on Perl's @INC include path I can type:

  prove --harness=My::TAP::Harness -rb t

If I don't have My::TAP::Harness installed on @INC I need to provide
the correct path to perl when I run prove:

  perl -Ilib `which prove` --harness=My::TAP::Harness -rb t

I can incorporate these options into my own version of prove. It's
pretty simple. Most of the work of prove is handled by App::Prove.
The important code in prove is just:

  use App::Prove;

  my $app = App::Prove->new;
  $app->process_args(@ARGV);
  exit( $app->run ? 0 : 1 );

If I write a subclass of App::Prove I can customise any aspect of
the test runner while inheriting all of prove's behaviour. Here's
myprove:

  #!/usr/bin/env perl

  use lib qw( lib );    # Add ./lib to @INC
  use App::Prove;

  my $app = App::Prove->new;

  # Use custom TAP::Harness subclass
  $app->harness( 'My::TAP::Harness' );

  $app->process_args( @ARGV );
  exit( $app->run ? 0 : 1 );

Now I can run my tests like this:

  ./myprove -rb t

=head2 Deeper Customisation

Now that I know how to subclass and replace TAP::Harness I can
replace any other part of the harness. To do that I need to know
which classes are responsible for which functionality. Here's a
brief guided tour; the default class for each component is shown
in parentheses. Normally any replacements I write will be subclasses
of these default classes.

When I run my tests TAP::Harness creates a scheduler
(TAP::Parser::Scheduler) to work out the running order for the
tests, an aggregator (TAP::Parser::Aggregator) to collect and analyse
the test results and a formatter (TAP::Formatter::Console) to display
those results.

If I'm running my tests in parallel there may also be a multiplexer
(TAP::Parser::Multiplexer) - the component that allows multiple
tests to run simultaneously.

Once it has created those helpers TAP::Harness starts running the
tests. For each test it creates a new parser (TAP::Parser) which
is responsible for running the test script and parsing its output.

To replace any of these components I call one of these harness
methods with the name of the replacement class:

  aggregator_class
  formatter_class
  multiplexer_class
  parser_class
  scheduler_class

For example, to replace the aggregator I would

  $harness->aggregator_class( 'My::Aggregator' );

Alternately I can supply the names of my substitute classes to the
TAP::Harness constructor:

  my $harness = TAP::Harness->new(
    { aggregator_class => 'My::Aggregator' }
  );

If I need to reach even deeper into the internals of the harness I
can replace the classes that TAP::Parser uses to execute test scripts
and tokenise their output. Before running a test script TAP::Parser
creates a grammar (TAP::Parser::Grammar) to decode the raw TAP into
tokens, a result factory (TAP::Parser::ResultFactory) to turn the
decoded TAP results into objects and, depending on whether it's
running a test script or reading TAP from a file, scalar or array,
a source or an iterator (TAP::Parser::IteratorFactory).

Each of these objects may be replaced by calling one of these parser
methods:

  source_class
  perl_source_class
  grammar_class
  iterator_factory_class
  result_factory_class

=head2 Callbacks

As an alternative to subclassing the components I need to change I
can attach callbacks to the default classes. TAP::Harness exposes
these callbacks:

  parser_args      Tweak the parameters used to create the parser
  made_parser      Just made a new parser
  before_runtests  About to run tests
  after_runtests   Have run all tests
  after_test       Have run an individual test script

TAP::Parser also supports callbacks; bailout, comment, plan, test,
unknown, version and yaml are called for the corresponding TAP
result types, ALL is called for all results, ELSE is called for all
results for which a named callback is not installed and EOF is
called once at the end of each TAP stream.
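
For example, here's a sketch that uses a test callback on TAP::Parser
to count passes and failures; the two-test transcript is made up for
illustration and handed to the parser as a string:

```perl
use strict;
use warnings;
use TAP::Parser;

# A made-up transcript; in real use this would come from a test script.
my $tap = "1..2\nok 1 - alpha\nnot ok 2 - beta\n";

my ( $pass, $fail ) = ( 0, 0 );
my $parser = TAP::Parser->new( { tap => $tap } );

# The 'test' callback fires once for each test result line.
$parser->callback(
    test => sub {
        my ($test) = @_;
        $test->is_ok ? $pass++ : $fail++;
    }
);

$parser->run;    # consume the whole stream, firing callbacks as it goes
print "passed $pass, failed $fail\n";
```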

To install a callback I pass the name of the callback and a subroutine
reference to TAP::Harness or TAP::Parser's callback method:

  $harness->callback( after_test => sub {
    my ( $script, $desc, $parser ) = @_;
  } );

I can also pass callbacks to the constructor:

  my $harness = TAP::Harness->new({
    callbacks => {
      after_test => sub {
        my ( $script, $desc, $parser ) = @_;
        # Do something interesting here
      }
    }
  });

When it comes to altering the behaviour of the test harness there's
more than one way to do it. Which way is best depends on my
requirements. In general if I only want to observe test execution
without changing the harness' behaviour (for example to log test
results to a database) I choose callbacks. If I want to make the
harness behave differently subclassing gives me more control.

=head2 Parsing TAP

Perhaps I don't need a complete test harness. If I already have a
TAP test log that I need to parse all I need is TAP::Parser and the
various classes it depends upon. Here's the code I need to run a
test and parse its TAP output:

  use TAP::Parser;

  my $parser = TAP::Parser->new( { source => 't/simple.t' } );
  while ( my $result = $parser->next ) {
    print $result->as_string, "\n";
  }

Alternately I can pass an open filehandle as source and have the
parser read from that rather than attempting to run a test script:

  open my $tap, '<', 'tests.tap'
    or die "Can't read TAP transcript ($!)\n";
  my $parser = TAP::Parser->new( { source => $tap } );
  while ( my $result = $parser->next ) {
    print $result->as_string, "\n";
  }

This approach is useful if I need to convert my TAP-based test
results into some other representation. See TAP::Convert::TET
(http://search.cpan.org/dist/TAP-Convert-TET/) for an example of
this approach.
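
If all I want is the headline numbers rather than every result line,
I can hand the finished parser to a TAP::Parser::Aggregator. This
sketch parses a made-up two-test transcript supplied inline as a
string and prints a summary:

```perl
use strict;
use warnings;
use TAP::Parser;
use TAP::Parser::Aggregator;

# Parse a made-up transcript supplied via the 'tap' argument.
my $parser = TAP::Parser->new( { tap => "1..2\nok 1\nnot ok 2\n" } );
$parser->run;    # consume every result before aggregating

# The aggregator collates results from one or more parsers;
# the label 'transcript' is just a name for this run.
my $agg = TAP::Parser::Aggregator->new;
$agg->add( 'transcript', $parser );

printf "ran %d tests, all passed: %s\n", $agg->total,
    $agg->all_passed ? 'yes' : 'no';
```

This is the same machinery TAP::Harness uses internally to produce
its end-of-run summary.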

=head2 Getting Support

The Test::Harness developers hang out on the tapx-dev mailing
list[1]. For discussion of general, language-independent TAP issues
there's the tap-l[2] list. Finally there's a wiki dedicated to the
Test Anything Protocol[3]. Contributions to the wiki, patches and
suggestions are all welcome.

[1] L<http://www.hexten.net/mailman/listinfo/tapx-dev>
[2] L<http://testanything.org/mailman/listinfo/tap-l>
[3] L<http://testanything.org/>
423