1#!/usr/bin/perl -w
2
3=head1 NAME
4
5frend.pl -- Render a Bio::Graphics Feature File on the web
6
7=head1 SYNOPSIS
8
9 http://your.host.com/cgi-bin/frend.pl
10
11=head1 DESCRIPTION
12
The frend.pl script is a thin front end around the Bio::Graphics
module.  It accepts sequence (protein, nucleotide) feature coordinates
that are pasted into a text field or uploaded as a file, renders
them, and displays the resulting GIF or PNG image in the web page.
18
19=head1 INSTALLATION
20
21Copy this script into your web site's cgi-bin directory.  Name it
22whatever you want.
23
=head1 FEATURE FILE FORMAT
25
26This script accepts and processes sequence annotations in a simple
27tab-delimited format or in GFF format.
28
29The feature file format has a configuration section and a data
30section. The configuration section sets up the size and overall
31properties of the image, and the data section gives the feature
32data itself.
33
34=head2 Configuration Section
35
If you do not provide a configuration section, this script generates
a reasonable default for you. To tune the appearance of the image,
however, you will probably want to tweak the configuration. Here is
an excerpt from a configuration section:
41
42
43 # example file
44 [general]
45 bases = -1000..21000
46 height = 12
47
48 [EST]
49 glyph = segments
50 bgcolor= yellow
51 connector = solid
52 height = 5
53
54 [FGENES]
55 glyph = transcript2
56 bgcolor = green
57 description = 1
58
59
60The configuration section is divided into a set of sections, each one
61labeled with a [section title]. The [general] section specifies global
62options for the entire image. Other sections apply to particular
63feature types. In the example above, the configuration in the [EST]
64section applies to features labeled as ESTs, while the configuration
65in the [FGENES] section applies to features labeled as predictions
66from the FGENES gene prediction program.
67
68Inside each section is a series of name=value pairs, where the name is
69the name of an option to set. You can put whitespace around the = sign
70to make it more readable, or even use a colon (:) if you prefer. The
71following option names are recognized:
72
 Option       Value                                           Example
 ------       -----                                           -------

 bases        Min & max of the sequence range (bp)            1200..60000
 width        Width of the image (pixels)                     600
 height       Height of each graphical element (pixels)       10
 glyph        Style of each graphical element (see below)     transcript
 fgcolor      Foreground color of each element                yellow
 bgcolor      Background color of each element                blue
 linewidth    Width of lines                                  3
 label        Print the feature's name                        1
 description  Print the feature's description                 0
 bump         Elements are not allowed to collide             1
 ticks        Print tick marks on arrows                      1
 connector    Type of group connector (dashed, hat or solid)  dashed
88
The "bases" and "width" options are only relevant in the [general]
section. The width is overridden by the image width selected in the
web form.
91
92The remainder of the options can be located in any section, but if
93present in the [general] section will set defaults for the others.
94
95Colors are English-language color names or Web-style #RRGGBB colors
96(see a book on HTML for an explanation). True/false values are 1 for
97true, and 0 for false. Numeric ranges can be expressed in start..end
98fashion with two dots, or as start-end with a hyphen.
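
As an illustration of how the defaults and the range notation might
combine, here is a sketch of a configuration (the values are made up
for this example):

 [general]
 bases   = 1200-60000
 width   = 800
 height  = 10
 bgcolor = wheat

 [EST]
 glyph   = segments
 bgcolor = yellow

 [FGENESH]
 glyph   = transcript2

Under the rules above, EST features would be drawn 10 pixels high in
yellow, while FGENESH features would inherit both the height and the
wheat background from the [general] section.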
99
100The "glyph" option controls how the features are rendered. The
101following glyphs are implemented:
102
  Name                Description
  ----                -----------

  box                 A filled rectangle, nondirectional.
  ellipse             An oval.
  arrow               An arrow; can be unidirectional or
                      bidirectional.  It is also capable of displaying
                      a scale with major and minor tickmarks, and can
                      be oriented horizontally or vertically.
  segments            A set of filled rectangles connected by solid
                      lines. Used for interrupted features, such as
                      gapped alignments and exon groups.
  transcript          Similar to segments, but the connecting line is
                      a "hat" shape, and the direction of
                      transcription is indicated by a small arrow.
  transcript2         Similar to transcript, but the direction of
                      transcription is indicated by a terminal segment
                      in the shape of an arrow.
  primers             Two inward pointing arrows connected by a
                      line. Used for STSs.
122
The bump option is the most important option for controlling the look
of the image. If set to false (the number 0), then the features are
allowed to overlap. If set to true (the number 1), then the features
will move vertically to avoid colliding. If not specified, bump is
turned on, but it is turned off for any type of sequence feature that
has more than 50 features.
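
For example, to let densely packed features overlap while keeping
gene predictions on separate rows, the option can be set explicitly
for each feature type (a sketch that reuses the section names from
the example above):

 [EST]
 bump = 0

 [FGENES]
 bump = 1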
129
130=head2 Data Section
131
The data section can follow or precede the configuration section. The
two sections can also be intermixed. The data section is tab- or
whitespace-delimited text, which you can export from a spreadsheet
application or word processor (be sure to save as text only!).
136
137Here is an example data section:
138
139
 Cosmid     B0511        .       516-619
 Cosmid     B0511        .       3185-3294
 Cosmid     B0511        .       10946-11208
 Cosmid     B0511        .       13126-13511
 Cosmid     B0511        .       66-208
 Cosmid     B0511        .       6354-6499
 Cosmid     B0511        .       13955-14115
 EST        yk595e6.5    +       3187-3294
 EST        yk846e07.3   -       11015-11208
 EST        yk53c10
            yk53c10.5    +       18892-19154
            yk53c10.3    -       15000-15500,15700-15800
 EST        yk53c10.5    +       16032-16105
 SwissProt  PECANEX      +       13153-13656     Swedish fish
 FGENESH    "Gene 1"     -       1-205,518-616,661-735,3187-3365,3436-3846       Transmembrane domain
 FGENESH    "Gene 2"     -       16626-17396,17451-17597 Kinase and sushi domains
156
157
158Each line of the file contains five columns. The columns are:
159
160 Column #   Description
161 --------   -----------
162
163 1          feature type
164 2          feature name
165 3          strand
166 4          coordinates
167 5          description
168
169=over 4
170
=item Feature type

The feature type should correspond to one of the [feature type]
headings in the configuration section. If it doesn't, the [general]
options will be applied to the feature when rendering it.

=item Feature name

The feature name is a name for the feature. Use a "." or "-" if this
is not relevant. If the name contains whitespace, put single or double
quotes around the name, as in "Gene 1".
179
180=item Strand
181
182The strand indicates which strand the feature is on. It is one of "+"
183for the forward strand, "-" for the reverse strand, or "." for
184features that are not stranded.
185
186=item Coordinates
187
The coordinates column is a set of one or more ranges that the feature
occupies. Ranges are written using ".." as in start..stop, or with
hyphens, as in start-stop. For features that are composed of multiple
ranges (for example, transcripts that have multiple exons), you can
either put the ranges on the same line separated by commas or spaces,
or put the ranges on individual lines and just use the same feature
name and type to group them. In the example above, the Cosmid B0511
features use the individual line style, while the FGENESH features use
the all-ranges-on-one-line style.
197
198=item Description
199
200The last column contains some descriptive text. If the description
201option is set to true, this text will be printed underneath the
202feature in the rendering.
203
204=back
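
To illustrate the coordinate notation, the following two fragments
should, according to the rules above, describe the same two-exon
feature. The first puts both ranges on one line using ".." notation;
the second uses one line per range with hyphens and relies on the
shared type and name for grouping:

 FGENESH    "Gene 2"     -       16626..17396,17451..17597

 FGENESH    "Gene 2"     -       16626-17396
 FGENESH    "Gene 2"     -       17451-17597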
205
206Finally, it is possible to group related features together. An example
207is the ESTs yk53c10.5 and yk53c10.3, which are related by being reads
208from the two ends of the clone yk53c10. To indicate this relationship,
209generate a section that looks like this:
210
211 EST        yk53c10
212            yk53c10.5    +       18892-19154
213            yk53c10.3    -       15000-15500,15700-15800
214
215
216The group is indicated by a line that contains just two columns
217containing the feature type and a unique name for the group. Follow
218this line with all the features that form the group, but leave the
219first column (the feature type) blank. The group will be rendered by
220drawing a dashed line between all the members of the group. You can
221change this by specifying a different connector option in the
222configuration section for this feature type.
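
Putting the pieces together, a small complete input that mixes a
configuration section with a grouped data section might look like the
following sketch (the values are taken from the examples above and
are illustrative only):

 [EST]
 glyph     = segments
 connector = solid

 EST        yk53c10
            yk53c10.5    +       18892-19154
            yk53c10.3    -       15000-15500,15700-15800

With this configuration the two reads should be joined by a solid
line instead of the default dashed one.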
223
224=head1 BUGS
225
226Please report them to the author.
227
228=head1 SEE ALSO
229
230L<Bio::Graphics>, L<feature_draw.pl>
231
232=head1 AUTHOR
233
234Lincoln Stein, lstein@cshl.org
235
236=cut
237
238use strict;
239use Bio::Graphics::Panel;
240use Bio::Graphics::Feature;
241use Bio::Graphics::FeatureFile;
242use CGI qw(:standard);
243use CGI::Carp;
244use File::Temp ':mktemp';
245use File::Spec;
246use File::Basename 'basename';
247use File::Path 'mkpath';
248use vars '@COLORS';
249
250use constant WIDTH          => 600;  # default width
251use constant BUMP_THRESHOLD => 50;  # if more than this # of features, will stop bumping
252@COLORS = qw(cyan blue red yellow green wheat turquoise orange);  # default colors
253
254if (param('cat')) {
255  catfile(param('cat'));
256  exit 0;
257}
258
259print header,start_html('Sequence Feature Renderer');
260print h1('Sequence Feature Renderer');
261
262print p('This is a front end to the Bio::Graphics package, a part of the',
263	a({-href=>'http://www.bioperl.org'},'BioPerl library.'),
264	  'Cut and paste your sequence annotation data into the text field below, or upload it using the',
265	'upload button.',
266	'The format of the annotation data is explained',a({-href=>'#format'},'below.'));
267
268my $self = url(-relative=>1);
269print h3('Instant examples'),
  p('For the impatient, you can paste in an',
    # the query string must set the "Example 1" parameter, which is what the code below tests
    b(a({-href=>"$self?Example+1=1"},'example file.')));
272
273read_file() if param('file');
274
275my $example = param('Example 1')
276  ? test_data(0)
277  : param('Example 2')
278  ? test_data(1)
279  : '';
280param(text => $example) if length $example;
281
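# default to '' so that an absent upload does not trigger an "uninitialized value" warning under -w
render() if param('text') || (param('file') || '') =~ /\w/;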
283
284print start_multipart_form(),
285  table({-border=>0,-width=>300,-cellspacing=>0,-cellpadding=>0},
	TR({-class=>'resultsbody'},
	   td({-colspan=>1},
	      'Cut and Paste the annotation file...'
	     ),
	   td({-colspan=>2},
	      'Image width: ',
	      popup_menu(-name=>'width',-values=>[480,640,800,1024,1280,1600],-default=>800)
	     )
	  ),
	# the text area gets a table row of its own rather than being nested inside the row above
	TR({-class=>'resultsbody'},
	   td({-colspan=>3},
	      pre(
		  textarea(-name=>'text',-value=>$example,
			   -cols=>80,-rows=>10,-wrap=>'off',
			   # scalar() keeps param() from returning an empty list inside the argument list
			   -override=>scalar(length $example || param('Clear')))
		 )
	     )
	  ),
303	TR({-class=>'resultsbody'},
304	   td({-colspan=>1},'Upload it... ',filefield(-name=>'file',-size=>30)),
305	   td({-align=>'left',-colspan=>2},
306	      'Or paste one of the example files...',
307	      submit('Example 1'),
308	      submit('Example 2'),
309	      submit('Clear'),
310	     )
311	  ),
312	TR({-class=>'resultstitle'},
313	   td({-align=>'left',-colspan=>3},
314	      "Press",b('Render'),'when ready...',
315	      b(submit('Render...'))
316	     ),
317	   )),
318  end_form;
319
320print_format();
321
322print hr(),a({-href=>'http://www.bioperl.org'},'www.bioperl.org'),end_html();
323
324exit 0;
325
sub read_file {
  # upload() returns a real filehandle, which is safe to read under "use strict"
  my $fh = upload('file') or return;
  my $text = '';
  $text .= $_ while <$fh>;
  param(text => $text);
}
332
333sub render {
334  my $text = shift;
335  my $color = 0;      # position in color cycle
336
337  $text ||= param('text');
338  my $data = $text ? Bio::Graphics::FeatureFile->new(-text => $text)
339                   : Bio::Graphics::FeatureFile->new(-file => param('file'));
340
  unless ($data->min < $data->max) {
    # AceError() is not defined anywhere in this script, so report the problem in the page itself
    print p(b("This doesn't look like a valid annotation file.  No annotations found."));
    return;
  }
345
346  # adjust the width if requested
347  $data->setting(general => 'width',param('width')) if param('width');
348
349  # render the panel
350  my $panel = $data->new_panel;
351  $data->render($panel);
352
353  # we create the file and write it out
354  my $gd = $panel->gd;
355  my $suffix = $gd->can('gif') ? '.gif' : '.png';
356  my $dir  = tmpdir();
357  mkpath($dir) unless -e $dir;
358  my($fh,$filename) = mkstemps(tmpfile('XXXXXXXX'),$suffix);
359
  binmode $fh;   # the image data is binary
  print $fh ($gd->can('gif') ? $gd->gif : $gd->png);
  close $fh;
362
363  # now we send the link to the user
364  my $self = url(-relative=>1);
365  my $base = basename($filename);
366  my $url  = "$self/features$suffix?cat=$base";
367  my ($w,$h) = $gd->getBounds;
368
369  print hr(),h2('Rendering');
370  print a({-name=>'rendering'},
371	  img({-src=>$url,-alt=>'Right-click and "Save As..." to save this image',
372	       -border=>0,-width=>$w,-height=>$h})
373	  );
374}
375
376sub tmpdir {
377  return File::Spec->catfile(File::Spec->tmpdir,'frend');
378}
379
380sub tmpfile {
381  return File::Spec->catfile(tmpdir(),shift);
382}
383
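sub catfile {
  # Accept only a bare file name of the form created by mkstemps() in render(),
  # so that a crafted "cat" parameter cannot read or delete files outside the temp directory.
  my $file = basename(shift);
  my ($type) = $file =~ /^\w{8}\.(gif|png)$/ or die "Invalid image file name";
  my $path = tmpfile($file);
  open my $fh,'<',$path or die "Couldn't open $file for reading: $!";
  binmode $fh;
  binmode STDOUT;
  print header($type eq 'gif' ? 'image/gif' : 'image/png');
  print while <$fh>;
  close $fh;
  unlink $path;   # each image is served once and then removed
}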
393
394sub print_format {
395  print hr();
396  print a({-name=>'format'},h2('Annotation file format'));
397  print <<END;
398<p>
399The annotation file format has a configuration section and a data section.  The configuration section
400sets up the size and overall properties of the image, and the data section gives the annotation data
401itself.
402<p>
403<h3>Configuration Section</h3>
404<p>
405If not provided, this page generates a reasonable default configuration section for you, so you
406do not need to provide a configuration section to get a reasonable image.  However, to tune the
407appearance of the image, you will probably want to tweak the configuration.  Here is an excerpt
408from the configuration section:
409<blockquote>
410<pre>
411# example file
412[general]
413bases = -1000..21000
414height = 12
415
416[EST]
417glyph = segments
418bgcolor= yellow
419connector = solid
420height = 5
421
422[FGENES]
423glyph = transcript2
424bgcolor = green
425description = 1
426</pre>
427</blockquote>
428
429<p>
430The configuration section is divided into a set of sections, each one labeled with a [section title].
431The [general] section specifies global options for the entire image.  Other sections apply to particular
432feature types.  In the example above, the configuration in the [EST] section applies to features labeled
433as ESTs, while the configuration in the [FGENES] section applies to features labeled as predictions from
434the FGENES gene prediction program.
435<p>
436Inside each section is a series of <i>name</i>=<i>value</i> pairs, where the name is the name of
437an option to set.  You can put whitespace around the = sign to make it more readable, or even use
438a colon (:) if you prefer.  The following option names are recognized:
439<p>
440<table border="1">
441<tr>
442  <th>Option</th><th>Value</th><th>Example</th>
443</tr>
444<tr>
445  <th>bases</th><td>Min &amp; max of the sequence range (bp)</td><td>1200..60000</td>
446</tr>
447<tr>
  <th>width</th><td>Width of the image (pixels)</td><td>600</td>
449</tr>
450<tr>
451  <th>height</th><td>Height of each graphical element (pixels)</td><td>10</td>
452</tr>
453<tr>
454  <th>glyph</th><td>Style of each graphical element (see below)</td><td>transcript</td>
455</tr>
456<tr>
457  <th>fgcolor</th>      <td>Foreground color of each element</td>            <td>yellow</td>
458</tr>
459<tr>
460  <th>bgcolor</th>      <td>Background color of each element</td>            <td>blue</td>
461</tr>
462<tr>
463  <th>linewidth</th>      <td>Width of lines</td>            <td>3</td>
464</tr>
465<tr>
466  <th>label</th>        <td>Print the feature's name</td>         <td>1</td>
467</tr>
468<tr>
469  <th>description</th>  <td>Whether to print the feature's description </td> <td>0</td>
470</tr>
471<tr>
472  <th>bump</th>         <td>Elements are not allowed to collide</td> <td>1</td>
473</tr>
474<tr>
475  <th>ticks</th>        <td>Print tick marks on arrows</td>       <td>1</td>
476</tr>
477<tr>
478  <th>connector</th>    <td>Type of group connector (dashed, hat or solid)</td>       <td>dashed</td>
479</tr>
480</table>
481<p>
482
483The "bases" and "width" options are only relevant in the [general]
484section.  The rest can be located in any section, but if present in
485the [general] section will set defaults for the others.
486
487<p>
488
489Colors are English-language color names or Web-style #RRGGBB colors
490(see a book on HTML for an explanation).  True/false values are 1 for
491true, and 0 for false.  Numeric ranges can be expressed in
492<i>start</i>..<i>end</i> fashion with two dots, or as
493<i>start</i>-<i>end</i> with a hyphen.
494
495<p>
496The "glyph" option controls how the features are rendered.  The
497following glyphs are implemented:
498
499<p>
500
501<table border="1">
502
503<tr><th>Name</th><th>Description</th></tr>
504<tr>
505  <th>
506  box
507  </th>
508  <td>A filled rectangle, nondirectional.</td>
509</tr>
510<tr>
511  <th>ellipse</th><td>An oval.</td>
512</tr>
513<tr>
514<th>arrow</th>
515<td>	      An arrow; can be unidirectional or bidirectional.
516	      It is also capable of displaying a scale with
517	      major and minor tickmarks, and can be oriented
518	      horizontally or vertically.
519</td>
520</tr>
521<tr>
522  <th>segments</th>
523  <td>    A set of filled rectangles connected by solid lines.
524  Used for interrupted features, such as gapped
525  alignments.
526</td>
527</tr>
528<tr>
529  <th>transcript</th>
530<td>
531  Similar to segments, but the connecting line is
532  a "hat" shape, and the direction of transcription
533  is indicated by a small arrow.
534  </td>
535</tr>
536<tr>
537<th>
538  transcript2</th>
539<td>  Similar to transcript, but the direction of
540  transcription is indicated by a terminal segment
541  in the shape of an arrow.
542</td>
543</tr>
544<tr>
545<th>
546  primers
547</th>
548<td>     Two inward pointing arrows connected by a line.
549	      Used for STSs.
550</td>
551</tr>
552</table>
553<p>
554
The <b>bump</b> option is the most important option for controlling the look
of the image.  If set to false (the number 0), then the features are allowed
to overlap.  If set to true (the number 1), then the features will move
vertically to avoid colliding.  If not specified, bump is turned on, but it is
turned off for any type of sequence feature that has more than
${\BUMP_THRESHOLD} features.
561
562<h3>Data Section</h3>
563<p>
564
The data section can follow or precede the configuration section.  The two sections
can also be intermixed.  The data section is tab- or whitespace-delimited text, which you can
export from a spreadsheet application or word processor (be sure to save as text only!).
568
569<p>
570
571Here is an example data section:
572
573<p>
574
575<blockquote>
576<pre>
577Cosmid	   B0511	.	516-619
578Cosmid	   B0511	.	3185-3294
579Cosmid	   B0511	.	10946-11208
580Cosmid	   B0511	.	13126-13511
581Cosmid	   B0511	.	66-208
582Cosmid	   B0511	.	6354-6499
583Cosmid	   B0511	.	13955-14115
584EST	   yk595e6.5	+	3187-3294
585EST	   yk846e07.3	-	11015-11208
586EST	   yk53c10
587	   yk53c10.5	+	18892-19154
588	   yk53c10.3	-	15000-15500,15700-15800
589EST	   yk53c10.5	+	16032-16105
590SwissProt  PECANEX	+	13153-13656	Swedish fish
591FGENESH	   "Gene 1"	-	1-205,518-616,661-735,3187-3365,3436-3846	Transmembrane domain
592FGENESH	   "Gene 2"	-	16626-17396,17451-17597	Kinase and sushi domains
593</pre>
594</blockquote>
595
596<p>
597
598Each line of the file contains five columns.  The columns are:
599
600<p>
601
602<table border="1">
603<tr><th>Column #</th><th>Column Description</th></tr>
604<tr><td align="right">1</td><td>feature type</td></tr>
605<tr><td align="right">2</td><td>feature name</td></tr>
606<tr><td align="right">3</td><td>strand</td></tr>
607<tr><td align="right">4</td><td>coordinates</td></tr>
608<tr><td align="right">5</td><td>description</td></tr>
609</table>
610<p>
611
612The <b>feature type</b> should correspond to one of the [feature type] headings
613in the configuration section.  If it doesn't, the [general] options will
614be applied to the feature when rendering it.  The <b>feature name</b> is a
615name for the feature.  Use a "." or "-" if this is not relevant.  If
616the name contains whitespace, put single or double quotes ("") around
617the name.
618
619<p>
620
621The <b>strand</b>
622indicates which strand the feature is on.  It is one of "+" for the
623forward strand, "-" for the reverse strand, or "." for features that are not
624stranded.
625
626<p>
627
The <b>coordinates</b> column is a set of one or more ranges that the
feature occupies.  Ranges are written using ".." as in <i>start</i>..<i>stop</i>,
or with hyphens, as in <i>start</i>-<i>stop</i>. For features that are composed
of multiple ranges (for example, transcripts that have multiple exons),
you can either put the ranges on the same line separated by commas or spaces,
or put the ranges on individual lines and just use the same feature name and
type to group them.  In the example above, the Cosmid B0511 features use
the individual line style, while the FGENESH features use the all-ranges-on-one-line
style.
637
638<p>
639
640The last column contains some descriptive text.  If the <b>description</b> option
641is set to true, this text will be printed underneath the feature in the rendering.
642
643<p>
644
645Finally, it is possible to group related features together.  An example is
646the ESTs yk53c10.5 and yk53c10.3, which are related by being reads from
647the two ends of the clone yk53c10.  To indicate this relationship, generate
648a section that looks like this:
649
650<p>
651
652<blockquote>
653<pre>
654EST	   yk53c10
655	   yk53c10.5	+	18892-19154
656	   yk53c10.3	-	15000-15500,15700-15800
657</pre>
658</blockquote>
659
660<p>
661
662The group is indicated by a line that contains just two columns
663containing the feature type and a unique name for the group.
664Follow this line with all
665the features that form the group, but leave the first column
666(the feature type) blank.  The group will be rendered by
667drawing a dashed line between all the members of the group.
668You can change this by specifying a different <b>connector</b>
669option in the configuration section for this feature type.
670
671END
672;
673
674}
675
676sub test_data {
677  my $config = shift;
678  my $header = <<'END';
679[general]
680bases = -1000..21000
681height = 12
682reference = B0511
683
684[Cosmid]
685glyph = segments
686fgcolor = blue
687key = C. elegans conserved regions
688
689[EST]
690glyph = segments
691bgcolor= yellow
692connector = solid
693height = 5
694
695[FGENESH]
696glyph = transcript2
697bgcolor = green
698description = 1
699
700[SwissProt]
701glyph = arrow
702base  = 1
703linewidth = 2
704fgcolor = red
705description = 1
706
707[P-element]
708glyph = triangle
709orient = S
710bgcolor = red
711fgcolor = white
712label = 1
713point = 1
714
715END
716;
717
718my $data =<<'END';
719Cosmid	B0511	516-619
720Cosmid	B0511	3185-3294
721Cosmid	B0511	10946-11208
722Cosmid	B0511	13126-13511
723Cosmid	B0511	11394-11539
724Cosmid	B0511	14383-14490
725Cosmid	B0511	15569-15755
726Cosmid	B0511	18879-19178
727Cosmid	B0511	15850-16110
728Cosmid	B0511	66-208
729Cosmid	B0511	6354-6499
730Cosmid	B0511	13955-14115
731Cosmid	B0511	7985-8042
732Cosmid	B0511	11916-12046
733P-element	""	500-500
734P-element	MrQ	700-700
735P-element	MrR	10000-10000
736EST	yk260e10.5	15569-15724
737EST	yk672a12.5	537-618,3187-3294
738EST	yk595e6.5	552-618
739EST	yk595e6.5	3187-3294
740EST	yk846e07.3	11015-11208
741EST	yk53c10
742	yk53c10.3	12876-13577,13882-14121,14169-14535
743	yk53c10.5	18892-19154,15853-16219
744SwissProt	"PECANEX Protein"	5513-16656	"From SwissProt"
745FGENESH	"Predicted gene 1"	-1200--500,518-616,661-735,3187-3365,3436-3846	Pfam domain
746FGENESH	"Predicted gene 2"	5513-6497,7968-8136,8278-8383,8651-8839,9462-9515,10032-10705,10949-11340,11387-11524,11765-12067,12876-13577,13882-14121,14169-14535,15006-15209,15259-15462,15513-15753,15853-16219	Mysterious
747FGENESH	"Predicted gene 3"	16626-17396,17451-17597
748FGENESH	"Predicted gene 4"	18459-18722,18882-19176,19221-19513,19572-30000	"Transmembrane protein"
749END
750
751  return $config ? $header . $data : $data;
752}
753
754__END__
755
756