#!/usr/bin/perl -w

=head1 NAME

frend.pl -- Render a Bio::Graphics Feature File on the web

=head1 SYNOPSIS

 http://your.host.com/cgi-bin/frend.pl

=head1 DESCRIPTION

The frend.pl script is a thin web front end around the Bio::Graphics
module. It accepts sequence (protein or nucleotide) feature
coordinates that are pasted into a form or uploaded as a file, renders
them, and displays the resulting image in the page (as a GIF if the
installed GD library supports it, otherwise as a PNG).

=head1 INSTALLATION

Copy this script into your web site's cgi-bin directory. Name it
whatever you want.

=head1 Feature File Format

This script accepts and processes sequence annotations in a simple
tab-delimited format or in GFF format.

The feature file format has a configuration section and a data
section. The configuration section sets up the size and overall
properties of the image, and the data section gives the feature
data itself.

=head2 Configuration Section

If not provided, this script generates a reasonable default
configuration section for you, so you do not need to provide a
configuration section to get a reasonable image. However, to tune the
appearance of the image, you will probably want to tweak the
configuration. Here is an excerpt from the configuration section:


 # example file
 [general]
 bases = -1000..21000
 height = 12

 [EST]
 glyph = segments
 bgcolor= yellow
 connector = solid
 height = 5

 [FGENES]
 glyph = transcript2
 bgcolor = green
 description = 1


The configuration section is divided into a set of sections, each one
labeled with a [section title]. The [general] section specifies global
options for the entire image. Other sections apply to particular
feature types.
In the example above, the configuration in the [EST]
section applies to features labeled as ESTs, while the configuration
in the [FGENES] section applies to features labeled as predictions
from the FGENES gene prediction program.

Inside each section is a series of name=value pairs, where the name is
the name of an option to set. You can put whitespace around the = sign
to make it more readable, or even use a colon (:) if you prefer. The
following option names are recognized:

 Option      Value                                           Example
 ------      -----                                           -------

 bases       Min & max of the sequence range (bp)            1200..60000
 width       Width of the image (pixels)                     600
 height      Height of each graphical element (pixels)       10
 glyph       Style of each graphical element (see below)     transcript
 fgcolor     Foreground color of each element                yellow
 bgcolor     Background color of each element                blue
 linewidth   Width of lines                                  3
 label       Print the feature's name                        1
 description Whether to print the feature's description      0
 bump        Elements are not allowed to collide             1
 ticks       Print tick marks on arrows                      1
 connector   Type of group connector (dashed, hat or solid)  dashed

The "bases" and "width" options are only relevant in the [general]
section. The "width" setting is overridden by the "Image width" menu
on the web form.

The remainder of the options can be located in any section, but if
present in the [general] section will set defaults for the others.

Colors are English-language color names or Web-style #RRGGBB colors
(see a book on HTML for an explanation). True/false values are 1 for
true, and 0 for false. Numeric ranges can be expressed in start..end
fashion with two dots, or as start-end with a hyphen.

The "glyph" option controls how the features are rendered. The
following glyphs are implemented:

 Name        Description
 ----        -----------

 box         A filled rectangle, nondirectional.
 ellipse     An oval.
 arrow       An arrow; can be unidirectional or
             bidirectional. It is also capable of displaying
             a scale with major and minor tickmarks, and can
             be oriented horizontally or vertically.
 segments    A set of filled rectangles connected by solid
             lines. Used for interrupted features, such as
             gapped alignments and exon groups.
 transcript  Similar to segments, but the connecting line is
             a "hat" shape, and the direction of
             transcription is indicated by a small arrow.
 transcript2 Similar to transcript, but the direction of
             transcription is indicated by a terminal segment
             in the shape of an arrow.
 primers     Two inward pointing arrows connected by a line.
             Used for STSs.

The bump option is the most important option for controlling the look
of the image. If set to false (the number 0), then the features are
allowed to overlap. If set to true (the number 1), then the features
will move vertically to avoid colliding. If not specified, bump is
turned on if the number of any given type of sequence feature is
greater than 50.

=head2 Data Section

The data section can follow or precede the configuration section. The
two sections can also be intermixed. The data section is a tab or
whitespace-delimited file which you can export from a spreadsheet
application or word processor file (be sure to save as text only!)

Here is an example data section:


 Cosmid B0511 . 516-619
 Cosmid B0511 . 3185-3294
 Cosmid B0511 . 10946-11208
 Cosmid B0511 . 13126-13511
 Cosmid B0511 . 66-208
 Cosmid B0511 . 6354-6499
 Cosmid B0511 . 13955-14115
 EST yk595e6.5 + 3187-3294
 EST yk846e07.3 - 11015-11208
 EST yk53c10
    yk53c10.5 + 18892-19154
    yk53c10.3 - 15000-15500,15700-15800
 EST yk53c10.5 + 16032-16105
 SwissProt PECANEX + 13153-13656 Swedish fish
 FGENESH "Gene 1" - 1-205,518-616,661-735,3187-3365,3436-3846 Transmembrane domain
 FGENESH "Gene 2" - 16626-17396,17451-17597 Kinase and sushi domains


Each line of the file contains five columns. The columns are:

 Column #    Description
 --------    -----------

 1           feature type
 2           feature name
 3           strand
 4           coordinates
 5           description

=over 4

=item Feature type

The feature type should correspond to one of the [feature type]
headings in the configuration section. If it doesn't, the [general]
options will be applied to the feature when rendering it.

=item Feature name

The feature name is a name for the feature. Use a "." or "-" if this
is not relevant. If the name contains whitespace, put single ('') or
double ("") quotes around the name.

=item Strand

The strand indicates which strand the feature is on. It is one of "+"
for the forward strand, "-" for the reverse strand, or "." for
features that are not stranded.

=item Coordinates

The coordinates column is a set of one or more ranges that the feature
occupies. Ranges are written using ".." as in start..stop, or with
hyphens, as in start-stop. For features that are composed of multiple
ranges (for example, transcripts that have multiple exons), you can
either put the ranges on the same line separated by commas or spaces,
or put the ranges on individual lines and just use the same feature
name and type to group them. In the example above, the Cosmid B0511
features use the individual line style, while the FGENESH features
use the all-ranges-on-one-line style.

=item Description

The last column contains some descriptive text.
If the description option is set to true, this text will be printed
underneath the feature in the rendering.

=back

Finally, it is possible to group related features together. An example
is the ESTs yk53c10.5 and yk53c10.3, which are related by being reads
from the two ends of the clone yk53c10. To indicate this relationship,
generate a section that looks like this:

 EST yk53c10
    yk53c10.5 + 18892-19154
    yk53c10.3 - 15000-15500,15700-15800


The group is indicated by a line that contains just two columns
containing the feature type and a unique name for the group. Follow
this line with all the features that form the group, but leave the
first column (the feature type) blank. The group will be rendered by
drawing a dashed line between all the members of the group. You can
change this by specifying a different connector option in the
configuration section for this feature type.

=head1 BUGS

Please report them to the author.
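
=head1 EXAMPLE

The feature file below is a minimal sketch, not one of the script's
built-in examples. It shows a configuration section and a data
section pasted together as a single input. It uses only the options
and glyphs documented above. The STS and EST names and coordinates
are hypothetical and were made up for illustration. The EST group
uses a "hat" connector instead of the default dashed line.

 # hypothetical example: all feature names and coordinates are made up
 [general]
 bases = 1..5000
 height = 10

 [STS]
 glyph = primers
 fgcolor = black

 [EST]
 glyph = segments
 bgcolor = yellow
 connector = hat
 bump = 1

 STS sts1 . 100-600
 EST est_clone1
    est_clone1.5 + 1000-1400
    est_clone1.3 - 2500-2900

Because the [EST] section sets connector = hat, the two grouped reads
should be joined by a hat-shaped line rather than a dashed one.
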

=head1 SEE ALSO

L<Bio::Graphics>, L<feature_draw.pl>

=head1 AUTHOR

Lincoln Stein, lstein@cshl.org

=cut

use strict;
use Bio::Graphics::Panel;
use Bio::Graphics::Feature;
use Bio::Graphics::FeatureFile;
use CGI qw(:standard);
use CGI::Carp;
use File::Temp ':mktemp';
use File::Spec;
use File::Basename 'basename';
use File::Path 'mkpath';
use vars '@COLORS';

use constant WIDTH          => 600;  # default width
use constant BUMP_THRESHOLD => 50;   # if more than this # of features, will stop bumping
@COLORS = qw(cyan blue red yellow green wheat turquoise orange);  # default colors

if (param('cat')) {
  catfile(param('cat'));
  exit 0;
}

print header,start_html('Sequence Feature Renderer');
print h1('Sequence Feature Renderer');

print p('This is a front end to the Bio::Graphics package, a part of the',
        a({-href=>'http://www.bioperl.org'},'BioPerl library.'),
        'Cut and paste your sequence annotation data into the text field below, or upload it using the',
        'upload button.',
        'The format of the annotation data is explained',a({-href=>'#format'},'below.'));

my $self = url(-relative=>1);
# "Example+1=1" sets param('Example 1'); a bare "?Paste+Example+1" would only set CGI keywords
print h3('Instant examples'),
  p('For the impatient, you can paste in an',
    b(a({-href=>"$self?Example+1=1"},'example file.')));

read_file() if param('file');

my $example = param('Example 1')
  ? test_data(0)
  : param('Example 2')
  ? test_data(1)
  : '';
param(text => $example) if length $example;

render() if param('text') || param('file') =~ /\w/;

print start_multipart_form(),
  table({-border=>0,-width=>300,-cellspacing=>0,-cellpadding=>0},
        TR({-class=>'resultsbody'},
           td({-colspan=>1},
              'Cut and Paste the annotation file...'
              ),
           td({-colspan=>2},
              'Image width: ',
              popup_menu(-name=>'width',-values=>[480,640,800,1024,1280,1600],-default=>800)
              )
           ),
        TR({-class=>'resultsbody'},
           td({-colspan=>3},
              pre(
                  textarea(-name=>'text',-value=>$example,
                           -cols=>80,-rows=>10,-wrap=>'off',-override=>length $example || param('Clear'))
                 )
              )
           ),
        TR({-class=>'resultsbody'},
           td({-colspan=>1},'Upload it... ',filefield(-name=>'file',-size=>30)),
           td({-align=>'left',-colspan=>2},
              'Or paste one of the example files...',
              submit('Example 1'),
              submit('Example 2'),
              submit('Clear'),
              )
           ),
        TR({-class=>'resultstitle'},
           td({-align=>'left',-colspan=>3},
              "Press",b('Render'),'when ready...',
              b(submit('Render...'))
              ),
           )),
  end_form;

print_format();

print hr(),a({-href=>'http://www.bioperl.org'},'www.bioperl.org'),end_html();

exit 0;

sub read_file {
  my $text;
  my $fh = param('file') or return;
  $text .= $_ while <$fh>;
  param(text => $text);
}

sub render {
  my $text  = shift;
  my $color = 0;  # position in color cycle

  $text ||= param('text');
  my $data = $text ? Bio::Graphics::FeatureFile->new(-text => $text)
                   : Bio::Graphics::FeatureFile->new(-file => param('file'));

  unless ($data->min < $data->max) {
    # AceError() is not defined in this script, so report the problem inline
    # and return, letting the caller still print the entry form.
    print p(b("This doesn't look like a valid annotation file. No annotations found."));
    return;
  }

  # adjust the width if requested
  $data->setting(general => 'width',param('width')) if param('width');

  # render the panel
  my $panel = $data->new_panel;
  $data->render($panel);

  # we create the file and write it out
  my $gd     = $panel->gd;
  my $suffix = $gd->can('gif') ? '.gif' : '.png';
  my $dir    = tmpdir();
  mkpath($dir) unless -e $dir;
  my($fh,$filename) = mkstemps(tmpfile('XXXXXXXX'),$suffix);

  binmode $fh;  # image data is binary
  print $fh ($gd->can('gif') ? $gd->gif : $gd->png);
  close $fh;

  # now we send the link to the user
  my $self = url(-relative=>1);
  my $base = basename($filename);
  my $url  = "$self/features$suffix?cat=$base";
  my ($w,$h) = $gd->getBounds;

  print hr(),h2('Rendering');
  print a({-name=>'rendering'},
          img({-src=>$url,-alt=>'Right-click and "Save As..." to save this image',
               -border=>0,-width=>$w,-height=>$h})
         );
}

sub tmpdir {
  return File::Spec->catfile(File::Spec->tmpdir,'frend');
}

sub tmpfile {
  return File::Spec->catfile(tmpdir(),shift);
}

sub catfile {
  my $file = shift;
  # Only serve names shaped like the temporary images written by render()
  # (mkstemps fills the X's with word characters). This blocks path
  # traversal and shell-style names in the user-supplied "cat" parameter.
  die "Invalid image name\n" unless $file =~ /\A\w+\.(?:gif|png)\z/;
  my $path = tmpfile($file);
  open my $in, '<', $path or die "Couldn't open $file for reading: $!";
  binmode $in;
  binmode STDOUT;
  print header($path =~ /\.gif$/ ? 'image/gif' : 'image/png');
  print while <$in>;
  close $in;
  unlink $path;
}

sub print_format {
  print hr();
  print a({-name=>'format'},h2('Annotation file format'));
  print <<END;
<p>
The annotation file format has a configuration section and a data section. The configuration section
sets up the size and overall properties of the image, and the data section gives the annotation data
itself.
<p>
<h3>Configuration Section</h3>
<p>
If not provided, this page generates a reasonable default configuration section for you, so you
do not need to provide a configuration section to get a reasonable image. However, to tune the
appearance of the image, you will probably want to tweak the configuration. Here is an excerpt
from the configuration section:
<blockquote>
<pre>
# example file
[general]
bases = -1000..21000
height = 12

[EST]
glyph = segments
bgcolor= yellow
connector = solid
height = 5

[FGENES]
glyph = transcript2
bgcolor = green
description = 1
</pre>
</blockquote>

<p>
The configuration section is divided into a set of sections, each one labeled with a [section title].
The [general] section specifies global options for the entire image. Other sections apply to particular
feature types. In the example above, the configuration in the [EST] section applies to features labeled
as ESTs, while the configuration in the [FGENES] section applies to features labeled as predictions from
the FGENES gene prediction program.
<p>
Inside each section is a series of <i>name</i>=<i>value</i> pairs, where the name is the name of
an option to set. You can put whitespace around the = sign to make it more readable, or even use
a colon (:) if you prefer. The following option names are recognized:
<p>
<table border="1">
<tr>
  <th>Option</th><th>Value</th><th>Example</th>
</tr>
<tr>
  <th>bases</th><td>Min &amp; max of the sequence range (bp)</td><td>1200..60000</td>
</tr>
<tr>
  <th>width</th><td>Width of the image (pixels)</td><td>600</td>
</tr>
<tr>
  <th>height</th><td>Height of each graphical element (pixels)</td><td>10</td>
</tr>
<tr>
  <th>glyph</th><td>Style of each graphical element (see below)</td><td>transcript</td>
</tr>
<tr>
  <th>fgcolor</th><td>Foreground color of each element</td><td>yellow</td>
</tr>
<tr>
  <th>bgcolor</th><td>Background color of each element</td><td>blue</td>
</tr>
<tr>
  <th>linewidth</th><td>Width of lines</td><td>3</td>
</tr>
<tr>
  <th>label</th><td>Print the feature's name</td><td>1</td>
</tr>
<tr>
  <th>description</th><td>Whether to print the feature's description</td><td>0</td>
</tr>
<tr>
  <th>bump</th><td>Elements are not allowed to collide</td><td>1</td>
</tr>
<tr>
  <th>ticks</th><td>Print tick marks on arrows</td><td>1</td>
</tr>
<tr>
  <th>connector</th><td>Type of group connector (dashed, hat or solid)</td><td>dashed</td>
</tr>
</table>
<p>

The "bases" and "width" options are only relevant in the [general]
section.
The "width" setting is overridden by the <b>Image width</b> menu on this
form. The remaining options can be located in any section, but if present
in the [general] section they set defaults for the others.

<p>

Colors are English-language color names or Web-style #RRGGBB colors
(see a book on HTML for an explanation). True/false values are 1 for
true, and 0 for false. Numeric ranges can be expressed in
<i>start</i>..<i>end</i> fashion with two dots, or as
<i>start</i>-<i>end</i> with a hyphen.

<p>
The "glyph" option controls how the features are rendered. The
following glyphs are implemented:

<p>

<table border="1">

<tr><th>Name</th><th>Description</th></tr>
<tr>
  <th>box</th>
  <td>A filled rectangle, nondirectional.</td>
</tr>
<tr>
  <th>ellipse</th><td>An oval.</td>
</tr>
<tr>
  <th>arrow</th>
  <td>An arrow; can be unidirectional or bidirectional.
      It is also capable of displaying a scale with
      major and minor tickmarks, and can be oriented
      horizontally or vertically.
  </td>
</tr>
<tr>
  <th>segments</th>
  <td>A set of filled rectangles connected by solid lines.
      Used for interrupted features, such as gapped
      alignments and exon groups.
  </td>
</tr>
<tr>
  <th>transcript</th>
  <td>Similar to segments, but the connecting line is
      a "hat" shape, and the direction of transcription
      is indicated by a small arrow.
  </td>
</tr>
<tr>
  <th>transcript2</th>
  <td>Similar to transcript, but the direction of
      transcription is indicated by a terminal segment
      in the shape of an arrow.
  </td>
</tr>
<tr>
  <th>primers</th>
  <td>Two inward pointing arrows connected by a line.
      Used for STSs.
  </td>
</tr>
</table>
<p>

The <b>bump</b> option is the most important option for controlling the look
of the image. If set to false (the number 0), then the features are allowed
to overlap.
If set to true (the number 1), then the features will move
vertically to avoid colliding. If not specified, bump is turned on
if the number of any given type of sequence feature is greater than
${\BUMP_THRESHOLD}.

<h3>Data Section</h3>
<p>

The data section can follow or precede the configuration section. The two sections
can also be intermixed. The data section is a tab or whitespace-delimited file which you can
export from a spreadsheet application or word processor file (be sure to save as text only!)

<p>

Here is an example data section:

<p>

<blockquote>
<pre>
Cosmid B0511 . 516-619
Cosmid B0511 . 3185-3294
Cosmid B0511 . 10946-11208
Cosmid B0511 . 13126-13511
Cosmid B0511 . 66-208
Cosmid B0511 . 6354-6499
Cosmid B0511 . 13955-14115
EST yk595e6.5 + 3187-3294
EST yk846e07.3 - 11015-11208
EST yk53c10
   yk53c10.5 + 18892-19154
   yk53c10.3 - 15000-15500,15700-15800
EST yk53c10.5 + 16032-16105
SwissProt PECANEX + 13153-13656 Swedish fish
FGENESH "Gene 1" - 1-205,518-616,661-735,3187-3365,3436-3846 Transmembrane domain
FGENESH "Gene 2" - 16626-17396,17451-17597 Kinase and sushi domains
</pre>
</blockquote>

<p>

Each line of the file contains five columns. The columns are:

<p>

<table border="1">
<tr><th>Column #</th><th>Column Description</th></tr>
<tr><td align="right">1</td><td>feature type</td></tr>
<tr><td align="right">2</td><td>feature name</td></tr>
<tr><td align="right">3</td><td>strand</td></tr>
<tr><td align="right">4</td><td>coordinates</td></tr>
<tr><td align="right">5</td><td>description</td></tr>
</table>
<p>

The <b>feature type</b> should correspond to one of the [feature type] headings
in the configuration section. If it doesn't, the [general] options will
be applied to the feature when rendering it. The <b>feature name</b> is a
name for the feature. Use a "."
or "-" if this is not relevant. If
the name contains whitespace, put single ('') or double ("") quotes around
the name.

<p>

The <b>strand</b>
indicates which strand the feature is on. It is one of "+" for the
forward strand, "-" for the reverse strand, or "." for features that are not
stranded.

<p>

The <b>coordinates</b> column is a set of one or more ranges that the
feature occupies. Ranges are written using ".." as in <i>start</i>..<i>stop</i>,
or with hyphens, as in <i>start</i>-<i>stop</i>. For features that are composed
of multiple ranges (for example, transcripts that have multiple exons),
you can either put the ranges on the same line separated by commas or spaces,
or put the ranges on individual lines and just use the same feature name and
type to group them. In the example above, the Cosmid B0511 features use
the individual line style, while the FGENESH features use the all-ranges-on-one-line
style.

<p>

The last column contains some descriptive text. If the <b>description</b> option
is set to true, this text will be printed underneath the feature in the rendering.

<p>

Finally, it is possible to group related features together. An example is
the ESTs yk53c10.5 and yk53c10.3, which are related by being reads from
the two ends of the clone yk53c10. To indicate this relationship, generate
a section that looks like this:

<p>

<blockquote>
<pre>
EST yk53c10
   yk53c10.5 + 18892-19154
   yk53c10.3 - 15000-15500,15700-15800
</pre>
</blockquote>

<p>

The group is indicated by a line that contains just two columns
containing the feature type and a unique name for the group.
Follow this line with all
the features that form the group, but leave the first column
(the feature type) blank. The group will be rendered by
drawing a dashed line between all the members of the group.
You can change this by specifying a different <b>connector</b>
option in the configuration section for this feature type.

END

}

sub test_data {
  my $config = shift;
  my $header = <<'END';
[general]
bases = -1000..21000
height = 12
reference = B0511

[Cosmid]
glyph = segments
fgcolor = blue
key = C. elegans conserved regions

[EST]
glyph = segments
bgcolor= yellow
connector = solid
height = 5

[FGENESH]
glyph = transcript2
bgcolor = green
description = 1

[SwissProt]
glyph = arrow
base = 1
linewidth = 2
fgcolor = red
description = 1

[P-element]
glyph = triangle
orient = S
bgcolor = red
fgcolor = white
label = 1
point = 1

END

  my $data = <<'END';
Cosmid B0511 516-619
Cosmid B0511 3185-3294
Cosmid B0511 10946-11208
Cosmid B0511 13126-13511
Cosmid B0511 11394-11539
Cosmid B0511 14383-14490
Cosmid B0511 15569-15755
Cosmid B0511 18879-19178
Cosmid B0511 15850-16110
Cosmid B0511 66-208
Cosmid B0511 6354-6499
Cosmid B0511 13955-14115
Cosmid B0511 7985-8042
Cosmid B0511 11916-12046
P-element "" 500-500
P-element MrQ 700-700
P-element MrR 10000-10000
EST yk260e10.5 15569-15724
EST yk672a12.5 537-618,3187-3294
EST yk595e6.5 552-618
EST yk595e6.5 3187-3294
EST yk846e07.3 11015-11208
EST yk53c10
   yk53c10.3 12876-13577,13882-14121,14169-14535
   yk53c10.5 18892-19154,15853-16219
SwissProt "PECANEX Protein" 5513-16656 "From SwissProt"
FGENESH "Predicted gene 1" -1200--500,518-616,661-735,3187-3365,3436-3846 Pfam domain
FGENESH "Predicted gene 2" 5513-6497,7968-8136,8278-8383,8651-8839,9462-9515,10032-10705,10949-11340,11387-11524,11765-12067,12876-13577,13882-14121,14169-14535,15006-15209,15259-15462,15513-15753,15853-16219 Mysterious
FGENESH "Predicted gene 3" 16626-17396,17451-17597
FGENESH "Predicted gene 4" 18459-18722,18882-19176,19221-19513,19572-30000 "Transmembrane protein"
END

  return $config ? $header . $data : $data;
}

__END__