1# 2# BioPerl module for Bio::Search::Tiling::MapTiling 3# 4# Please direct questions and support issues to <bioperl-l@bioperl.org> 5# 6# Cared for by Mark A. Jensen <maj@fortinbras.us> 7# 8# Copyright Mark A. Jensen 9# 10# You may distribute this module under the same terms as perl itself 11 12# POD documentation - main docs before the code 13 14=head1 NAME 15 16Bio::Search::Tiling::MapTiling - An implementation of an HSP tiling 17algorithm, with methods to obtain frequently-requested statistics 18 19=head1 SYNOPSIS 20 21 # get a BLAST $hit from somewhere, then 22 $tiling = Bio::Search::Tiling::MapTiling->new($hit); 23 24 # stats 25 $numID = $tiling->identities(); 26 $numCons = $tiling->conserved(); 27 $query_length = $tiling->length('query'); 28 $subject_length = $tiling->length('subject'); # or... 29 $subject_length = $tiling->length('hit'); 30 31 # get a visual on the coverage map 32 print $tiling->coverage_map_as_text('query',$context,'LEGEND'); 33 34 # tilings 35 $context = $tiling->_context( -type => 'subject', -strand=> 1, -frame=>1); 36 @covering_hsps_for_subject = $tiling->next_tiling('subject',$context); 37 $context = $tiling->_context( -type => 'query', -strand=> -1, -frame=>0); 38 @covering_hsps_for_query = $tiling->next_tiling('query', $context); 39 40=head1 DESCRIPTION 41 42Frequently, users want to use a set of high-scoring pairs (HSPs) 43obtained from a BLAST or other search to assess the overall level of 44identity, conservation, or coverage represented by matches between a 45subject and a query sequence. Because a set of HSPs frequently 46describes multiple overlapping sequence fragments, a simple summation of 47statistics over the HSPs will generally overestimate those 48statistics. To obtain an accurate estimate of global hit statistics, a 49'tiling' of HSPs onto either the subject or the query sequence must be 50performed, in order to properly correct for this. 51 52This module will execute a tiling algorithm on a given hit based on an 53interval decomposition I'm calling the "coverage map". Internal object 54methods compute the various statistics, which are then stored in 55appropriately-named public object attributes. See 56L<Bio::Search::Tiling::MapTileUtils> for more info on the algorithm. 57 58=head2 STRAND/FRAME CONTEXTS 59 60In BLASTX, TBLASTN, and TBLASTX reports, strand and frame information 61are reported for the query, subject, or query and subject, 62respectively, for each HSP. Tilings for these sequence types are only 63meaningful when they include HSPs in the same strand and frame, or 64"context". So, in these situations, the context must be specified 65in the method calls or the methods will throw. 66 67Contexts are specified as strings: C<[ 'all' | [m|p][_|0|1|2] ]>, where 68C<all> = all HSPs (will throw if context must be specified), C<m> = minus 69strand, C<p> = plus strand, and C<_> = no frame info, C<0,1,2> = respective 70(absolute) frame. The L</_make_context_key> method will convert a (strand, 71frame) specification to a context string, e.g.: 72 73 $context = $self->_context(-type=>'query', -strand=>-1, -frame=>-2); 74 75returns C<m2>. 76 77The contexts present among the HSPs in a hit are identified and stored 78for convenience upon object construction. These are accessed off the 79object with the L</contexts> method. If contexts don't apply for the 80given report, this returns C<('all')>. 81 82=head1 TILED ALIGNMENTS 83 84The experimental method L<ALIGNMENTS/get_tiled_alns> will use a tiling 85to concatenate tiled hsps into a series of L<Bio::SimpleAlign> 86objects: 87 88 @alns = $tiling->get_tiled_alns($type, $context); 89 90Each alignment contains two sequences with ids 'query' and 'subject', 91and consists of a concatenation of tiling HSPs which overlap or are 92directly adjacent. The alignment are returned in C<$type> sequence 93order. When HSPs overlap, the alignment sequence is taken from the HSP 94which comes first in the coverage map array. 95 96The sequences in each alignment contain features (even though they are 97L<Bio::LocatableSeq> objects) which map the original query/subject 98coordinates to the new alignment sequence coordinates. You can 99determine the original BLAST fragments this way: 100 101 $aln = ($tiling->get_tiled_alns)[0]; 102 $qseq = $aln->get_seq_by_id('query'); 103 $hseq = $aln->get_seq_by_id('subject'); 104 foreach my $feat ($qseq->get_SeqFeatures) { 105 $org_start = ($feat->get_tag_values('query_start'))[0]; 106 $org_end = ($feat->get_tag_values('query_end'))[0]; 107 # original fragment as represented in the tiled alignment: 108 $org_fragment = $feat->seq; 109 } 110 foreach my $feat ($hseq->get_SeqFeatures) { 111 $org_start = ($feat->get_tag_values('subject_start'))[0]; 112 $org_end = ($feat->get_tag_values('subject_end'))[0]; 113 # original fragment as represented in the tiled alignment: 114 $org_fragment = $feat->seq; 115 } 116 117=head1 DESIGN NOTE 118 119The major calculations are made just-in-time, and then memoized. So, 120for example, for a given MapTiling object, a coverage map would 121usually be calculated only once (for the query), and at most twice (if 122the subject perspective is also desired), and then only when a 123statistic is first accessed. Afterward, the map and/or any statistic 124is read from storage. So feel free to call the statistic methods 125frequently if it suits you. 126 127=head1 FEEDBACK 128 129=head2 Mailing Lists 130 131User feedback is an integral part of the evolution of this and other 132Bioperl modules. Send your comments and suggestions preferably to 133the Bioperl mailing list. Your participation is much appreciated. 134 135 bioperl-l@bioperl.org - General discussion 136 http://bioperl.org/wiki/Mailing_lists - About the mailing lists 137 138=head2 Support 139 140Please direct usage questions or support issues to the mailing list: 141 142I<bioperl-l@bioperl.org> 143 144rather than to the module maintainer directly. Many experienced and 145reponsive experts will be able look at the problem and quickly 146address it. Please include a thorough description of the problem 147with code and data examples if at all possible. 148 149=head2 Reporting Bugs 150 151Report bugs to the Bioperl bug tracking system to help us keep track 152of the bugs and their resolution. Bug reports can be submitted via 153the web: 154 155 https://github.com/bioperl/bioperl-live/issues 156 157=head1 AUTHOR - Mark A. Jensen 158 159Email maj -at- fortinbras -dot- us 160 161=head1 APPENDIX 162 163The rest of the documentation details each of the object methods. 164Internal methods are usually preceded with a _ 165 166=cut 167 168# Let the code begin... 169 170package Bio::Search::Tiling::MapTiling; 171$Bio::Search::Tiling::MapTiling::VERSION = '1.7.7'; 172use strict; 173use warnings; 174 175use Bio::Root::Root; 176use Bio::Search::Tiling::TilingI; 177use Bio::Search::Tiling::MapTileUtils; 178use Bio::LocatableSeq; 179 180# use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI); 181use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI); 182 183=head1 CONSTRUCTOR 184 185=head2 new 186 187 Title : new 188 Usage : my $obj = new Bio::Search::Tiling::GenericTiling(); 189 Function: Builds a new Bio::Search::Tiling::GenericTiling object 190 Returns : an instance of Bio::Search::Tiling::GenericTiling 191 Args : -hit => $a_Bio_Search_Hit_HitI_object 192 general filter function: 193 -hsp_filter => sub { my $this_hsp = shift; 194 ...; 195 return 1 if $wanted; 196 return 0; } 197 198=cut 199 200sub new { 201 my $class = shift; 202 my @args = @_; 203 my $self = $class->SUPER::new(@args); 204 my($hit, $filter) = $self->_rearrange( [qw( HIT HSP_FILTER)],@args ); 205 206 $self->throw("HitI object required") unless $hit; 207 $self->throw("Argument must be HitI object") unless ( ref $hit && $hit->isa('Bio::Search::Hit::HitI') ); 208 $self->{hit} = $hit; 209 $self->_set_attributes(); 210 $self->{"_algorithm"} = $hit->algorithm; 211 212 my @hsps = $hit->hsps; 213 # apply filter function if requested 214 if ( defined $filter ) { 215 if ( ref($filter) eq 'CODE' ) { 216 @hsps = map { $filter->($_) ? $_ : () } @hsps; 217 } 218 else { 219 $self->warn("-filter is not a coderef; ignoring"); 220 } 221 } 222 223 # identify available contexts 224 for my $t (qw( query hit )) { 225 my %contexts; 226 for my $i (0..$#hsps) { 227 my $ctxt = $self->_context( 228 -type => $t, 229 -strand => $hsps[$i]->strand($t), 230 -frame => $hsps[$i]->frame($t)); 231 $contexts{$ctxt} ||= []; 232 push @{$contexts{$ctxt}}, $i; 233 } 234 $self->{"_contexts_${t}"} = \%contexts; 235 } 236 237 $self->warn("No HSPs present in hit after filtering") unless (@hsps); 238 $self->hsps(\@hsps); 239 return $self; 240} 241 242# a tiling is based on the set of hsps contained in a single hit. 243# check all the boundaries - zero hsps, one hsp, all disjoint hsps 244 245=head1 TILING ITERATORS 246 247=head2 next_tiling 248 249 Title : next_tiling 250 Usage : @hsps = $self->next_tiling($type); 251 Function: Obtain a tiling: a minimal set of HSPs covering the $type 252 ('hit', 'subject', 'query') sequence 253 Example : 254 Returns : an array of HSPI objects 255 Args : scalar $type: one of 'hit', 'subject', 'query', with 256 'subject' an alias for 'hit' 257 258=cut 259 260sub next_tiling{ 261 my $self = shift; 262 my ($type, $context) = @_; 263 $self->_check_type_arg(\$type); 264 $self->_check_context_arg($type, \$context); 265 return $self->_tiling_iterator($type, $context)->(); 266} 267 268=head2 rewind_tilings 269 270 Title : rewind_tilings 271 Usage : $self->rewind_tilings($type) 272 Function: Reset the next_tilings($type) iterator 273 Example : 274 Returns : True on success 275 Args : scalar $type: one of 'hit', 'subject', 'query'; 276 default is 'query' 277 278=cut 279 280sub rewind_tilings{ 281 my $self = shift; 282 my ($type,$context) = @_; 283 $self->_check_type_arg(\$type); 284 $self->_check_context_arg($type, \$context); 285 return $self->_tiling_iterator($type, $context)->('REWIND'); 286} 287 288=head1 ALIGNMENTS 289 290=head2 get_tiled_alns() 291 292 Title : get_tiled_alns 293 Usage : @alns = $tiling->get_tiled_alns($type, $context) 294 Function: Use a tiling to construct a minimal set of alignment 295 objects covering the region specified by $type/$context 296 by splicing adjacent HSP tiles 297 Returns : an array of Bio::SimpleAlign objects; see Note below 298 Args : scalar $type: one of 'hit', 'subject', 'query' 299 default is 'query' 300 scalar $context: strand/frame context string 301 Following $type and $context, an array of 302 ordered, tiled HSP objects can be specified; this is 303 the tiling that will directly the alignment construction 304 default -- the first tiling provided by a tiling iterator 305 Notes : Each returned alignment is a concatenation of adjacent tiles. 306 The set of alignments will cover all regions described by the 307 $type/$context pair in the hit. The pair of sequences in each 308 alignment have ids 'query' and 'subject', and each sequence 309 possesses SeqFeatures that map the original query or subject 310 coordinates to the sequence coordinates in the tiled alignment. 311 312=cut 313 314sub get_tiled_alns { 315 my $self = shift; 316 my ($type, $context) = @_; 317 $self->_check_type_arg(\$type); 318 $self->_check_context_arg($type, \$context); 319 my $t = shift; # first arg after type/context, arrayref to a tiling 320 my @tiling; 321 if ($t && (ref($t) eq 'ARRAY')) { 322 @tiling = @$t; 323 } 324 else { # otherwise, take the first tiling available 325 326 @tiling = $self->_make_tiling_iterator($type,$context)->(); 327 } 328 my @ret; 329 330 my @map = $self->coverage_map($type, $context); 331 my @intervals = map {$_->[0]} @map; # disjoint decomp 332 # divide into disjoint covering groups 333 my @groups = covering_groups(@intervals); 334 335 require Bio::SimpleAlign; 336 require Bio::SeqFeature::Generic; 337 # cut hsp aligns along the disjoint decomp 338 # look for gaps...or add gaps? 339 my ($q_start, $h_start, $q_strand, $h_strand); 340 # build seqs 341 for my $grp (@groups) { 342 my $taln = Bio::SimpleAlign->new(); 343 my (@qfeats, @hfeats); 344 my $query_string = ''; 345 my $hit_string = ''; 346 my ($qlen,$hlen) = (0,0); 347 my ($qinc, $hinc, $qstart, $hstart); 348 for my $intvl (@$grp) { 349 # following just chooses the first available hsp containing the 350 # interval -- this is arbitrary, could be smarter. 351 my $aln = ( containing_hsps($intvl, @tiling) )[0]->get_aln; 352 my $qseq = $aln->get_seq_by_pos(1); 353 my $hseq = $aln->get_seq_by_pos(2); 354 $qstart ||= $qseq->start; 355 $hstart ||= $hseq->start; 356 # calculate the slice boundaries 357 my ($beg, $end); 358 for ($type) { 359 /query/ && do { 360 $beg = $aln->column_from_residue_number($qseq->id, $intvl->[0]); 361 $end = $aln->column_from_residue_number($qseq->id, $intvl->[1]); 362 last; 363 }; 364 /subject|hit/ && do { 365 $beg = $aln->column_from_residue_number($hseq->id, $intvl->[0]); 366 $end = $aln->column_from_residue_number($hseq->id, $intvl->[1]); 367 last; 368 }; 369 } 370 $aln = $aln->slice($beg, $end); 371 $qseq = $aln->get_seq_by_pos(1); 372 $hseq = $aln->get_seq_by_pos(2); 373 $qinc = $qseq->length - $qseq->num_gaps($Bio::LocatableSeq::GAP_SYMBOLS); 374 $hinc = $hseq->length - $hseq->num_gaps($Bio::LocatableSeq::GAP_SYMBOLS); 375 376 push @qfeats, Bio::SeqFeature::Generic->new( 377 -start => $qlen+1, 378 -end => $qlen+$qinc, 379 -strand => $qseq->strand, 380 -primary => 'query', 381 -source_tag => 'BLAST', 382 -display_name => 'query coordinates', 383 -tag => { query_id => $qseq->id, 384 query_desc => $qseq->desc, 385 query_start => $qstart + (($qseq->strand && $qseq->strand < 0) ? -1 : 1)*$qlen, 386 query_end => $qstart + (($qseq->strand && $qseq->strand < 0) ? -1 : 1)*($qlen+$qinc-1), 387 } 388 ); 389 push @hfeats, Bio::SeqFeature::Generic->new( 390 -start => $hlen+1, 391 -end => $hlen+$hinc, 392 -strand => $hseq->strand, 393 -primary => 'subject/hit', 394 -source_tag => 'BLAST', 395 -display_name => 'subject/hit coordinates', 396 -tag => { subject_id => $hseq->id, 397 subject_desc => $hseq->desc, 398 subject_start => $hstart + (($hseq->strand && $hseq->strand < 0) ? -1 : 1)*$hlen, 399 subject_end => $hstart + (($hseq->strand && $hseq->strand < 0) ? -1 : 1)*($hlen+$hinc-1) 400 } 401 ); 402 $query_string .= $qseq->seq; 403 $hit_string .= $hseq->seq; 404 $qlen += $qinc; 405 $hlen += $hinc; 406 } 407 # create the LocatableSeqs and add the features to each 408 # then add the seqs to the new aln 409 # note in MapTileUtils, Bio::FeatureHolderI methods have been 410 # mixed into Bio::LocatableSeq 411 my $qseq = Bio::LocatableSeq->new( -id => 'query', 412 -seq => $query_string); 413 $qseq->add_SeqFeature(@qfeats); 414 my $hseq = Bio::LocatableSeq->new( -id => 'subject', 415 -seq => $hit_string ); 416 $hseq->add_SeqFeature(@hfeats); 417 $taln->add_seq($qseq); 418 $taln->add_seq($hseq); 419 push @ret, $taln; 420 } 421 422 return @ret; 423} 424 425=head1 STATISTICS 426 427=head2 identities 428 429 Title : identities 430 Usage : $tiling->identities($type, $action, $context) 431 Function: Retrieve the calculated number of identities for the invocant 432 Example : 433 Returns : value of identities (a scalar) 434 Args : scalar $type: one of 'hit', 'subject', 'query' 435 default is 'query' 436 option scalar $action: one of 'exact', 'est', 'fast', 'max' 437 default is 'exact' 438 option scalar $context: strand/frame context string 439 Note : getter only 440 441=cut 442 443sub identities{ 444 my $self = shift; 445 my ($type, $action, $context) = @_; 446 $self->_check_type_arg(\$type); 447 $self->_check_action_arg(\$action); 448 $self->_check_context_arg($type, \$context); 449 if (!defined $self->{"identities_${type}_${action}_${context}"}) { 450 $self->_calc_stats($type, $action, $context); 451 } 452 return $self->{"identities_${type}_${action}_${context}"}; 453} 454 455=head2 conserved 456 457 Title : conserved 458 Usage : $tiling->conserved($type, $action) 459 Function: Retrieve the calculated number of conserved sites for the invocant 460 Example : 461 Returns : value of conserved (a scalar) 462 Args : scalar $type: one of 'hit', 'subject', 'query' 463 default is 'query' 464 option scalar $action: one of 'exact', 'est', 'fast', 'max' 465 default is 'exact' 466 option scalar $context: strand/frame context string 467 Note : getter only 468 469=cut 470 471sub conserved{ 472 my $self = shift; 473 my ($type, $action, $context) = @_; 474 $self->_check_type_arg(\$type); 475 $self->_check_action_arg(\$action); 476 $self->_check_context_arg($type, \$context); 477 if (!defined $self->{"conserved_${type}_${action}_${context}"}) { 478 $self->_calc_stats($type, $action, $context); 479 } 480 return $self->{"conserved_${type}_${action}_${context}"}; 481} 482 483=head2 length 484 485 Title : length 486 Usage : $tiling->length($type, $action, $context) 487 Function: Retrieve the total length of aligned residues for 488 the seq $type 489 Example : 490 Returns : value of length (a scalar) 491 Args : scalar $type: one of 'hit', 'subject', 'query' 492 default is 'query' 493 option scalar $action: one of 'exact', 'est', 'fast', 'max' 494 default is 'exact' 495 option scalar $context: strand/frame context string 496 Note : getter only 497 498=cut 499 500sub length{ 501 my $self = shift; 502 my ($type,$action,$context) = @_; 503 $self->_check_type_arg(\$type); 504 $self->_check_action_arg(\$action); 505 $self->_check_context_arg($type, \$context); 506 if (!defined $self->{"length_${type}_${action}_${context}"}) { 507 $self->_calc_stats($type, $action, $context); 508 } 509 return $self->{"length_${type}_${action}_${context}"}; 510} 511 512=head2 frac 513 514 Title : frac 515 Usage : $tiling->frac($type, $denom, $action, $context, $method) 516 Function: Return the fraction of sequence length consisting 517 of desired kinds of pairs (given by $method), 518 with respect to $denom 519 Returns : scalar float 520 Args : -type => one of 'hit', 'subject', 'query' 521 -denom => one of 'total', 'aligned' 522 -action => one of 'exact', 'est', 'fast', 'max' 523 -context => strand/frame context string 524 -method => one of 'identical', 'conserved' 525 Note : $denom == 'aligned', return desired_stat/num_aligned 526 $denom == 'total', return desired_stat/_reported_length 527 (i.e., length of the original input sequences) 528 Note : In keeping with the spirit of Bio::Search::HSP::HSPI, 529 reported lengths of translated dna are reduced by 530 a factor of 3, to provide fractions relative to 531 amino acid coordinates. 532 533=cut 534 535sub frac { 536 my $self = shift; 537 my @args = @_; 538 my ($type, $denom, $action, $context, $method) = $self->_rearrange([qw(TYPE DENOM ACTION CONTEXT METHOD)],@args); 539 $self->_check_type_arg(\$type); 540 $self->_check_action_arg(\$action); 541 $self->_check_context_arg($type, \$context); 542 unless ($method and grep(/^$method$/, qw( identical conserved ))) { 543 $self->throw("-method must specified; one of ('identical', 'conserved')"); 544 } 545 $denom ||= 'total'; 546 unless (grep /^$denom/, qw( total aligned )) { 547 $self->throw("Denominator selection must be one of ('total', 'aligned'), not '$denom'"); 548 } 549 my $key = "frac_${method}_${type}_${denom}_${action}_${context}"; 550 my $stat; 551 for ($method) { 552 $_ eq 'identical' && do { 553 $stat = $self->identities($type, $action, $context); 554 last; 555 }; 556 $_ eq 'conserved' && do { 557 $stat = $self->conserved($type, $action, $context); 558 last; 559 }; 560 do { 561 $self->throw("What are YOU doing here?"); 562 }; 563 } 564 if (!defined $self->{$key}) { 565 for ($denom) { 566 /total/ && do { 567 $self->{$key} = 568 $stat/$self->_reported_length($type); # need fudge fac?? 569 last; 570 }; 571 /aligned/ && do { 572 $self->{$key} = 573 $stat/$self->length($type,$action,$context); 574 last; 575 }; 576 do { 577 $self->throw("What are YOU doing here?"); 578 }; 579 } 580 } 581 return $self->{$key}; 582} 583 584=head2 frac_identical 585 586 Title : frac_identical 587 Usage : $tiling->frac_identical($type, $denom, $action, $context) 588 Function: Return the fraction of sequence length consisting 589 of identical pairs, with respect to $denom 590 Returns : scalar float 591 Args : -type => one of 'hit', 'subject', 'query' 592 -denom => one of 'total', 'aligned' 593 -action => one of 'exact', 'est', 'fast', 'max' 594 -context => strand/frame context string 595 Note : $denom == 'aligned', return conserved/num_aligned 596 $denom == 'total', return conserved/_reported_length 597 (i.e., length of the original input sequences) 598 Note : In keeping with the spirit of Bio::Search::HSP::HSPI, 599 reported lengths of translated dna are reduced by 600 a factor of 3, to provide fractions relative to 601 amino acid coordinates. 602 Note : This an alias that calls frac() 603 604=cut 605 606sub frac_identical{ 607 my $self = shift; 608 my @args = @_; 609 my ($type, $denom, $action,$context) = $self->_rearrange( [qw[ TYPE DENOM ACTION CONTEXT]],@args ); 610 $self->frac( -type=>$type, -denom=>$denom, -action=>$action, -method=>'identical', -context=>$context); 611} 612 613=head2 frac_conserved 614 615 Title : frac_conserved 616 Usage : $tiling->frac_conserved($type, $denom, $action, $context) 617 Function: Return the fraction of sequence length consisting 618 of conserved pairs, with respect to $denom 619 Returns : scalar float 620 Args : -type => one of 'hit', 'subject', 'query' 621 -denom => one of 'total', 'aligned' 622 -action => one of 'exact', 'est', 'fast', 'max' 623 -context => strand/frame context string 624 Note : $denom == 'aligned', return conserved/num_aligned 625 $denom == 'total', return conserved/_reported_length 626 (i.e., length of the original input sequences) 627 Note : In keeping with the spirit of Bio::Search::HSP::HSPI, 628 reported lengths of translated dna are reduced by 629 a factor of 3, to provide fractions relative to 630 amino acid coordinates. 631 Note : This an alias that calls frac() 632 633=cut 634 635sub frac_conserved{ 636 my $self = shift; 637 my @args = @_; 638 my ($type, $denom, $action, $context) = $self->_rearrange( [qw[ TYPE DENOM ACTION CONTEXT]],@args ); 639 $self->frac( -type=>$type, -denom=>$denom, -action=>$action, -context=>$context, -method=>'conserved'); 640} 641 642=head2 frac_aligned 643 644 Title : frac_aligned 645 Aliases : frac_aligned_query - frac_aligned(-type=>'query',...) 646 frac_aligned_hit - frac_aligned(-type=>'hit',...) 647 Usage : $tiling->frac_aligned(-type=>$type, 648 -action=>$action, 649 -context=>$context) 650 Function: Return the fraction of input sequence length 651 that was aligned by the algorithm 652 Returns : scalar float 653 Args : -type => one of 'hit', 'subject', 'query' 654 -action => one of 'exact', 'est', 'fast', 'max' 655 -context => strand/frame context string 656 657=cut 658 659sub frac_aligned{ 660 my ($self, @args) = @_; 661 my ($type, $action, $context) = $self->_rearrange([qw(TYPE ACTION CONTEXT)],@args); 662 $self->_check_type_arg(\$type); 663 $self->_check_action_arg(\$action); 664 $self->_check_context_arg($type, \$context); 665 if (!$self->{"frac_aligned_${type}_${action}_${context}"}) { 666 $self->{"frac_aligned_${type}_${action}_${context}"} = $self->num_aligned($type,$action,$context)/$self->_reported_length($type); 667 } 668 return $self->{"frac_aligned_${type}_${action}_${context}"}; 669} 670 671sub frac_aligned_query { shift->frac_aligned(-type=>'query', @_) } 672sub frac_aligned_hit { shift->frac_aligned(-type=>'hit', @_) } 673 674 675=head2 num_aligned 676 677 Title : num_aligned 678 Usage : $tiling->num_aligned(-type=>$type) 679 Function: Return the number of residues of sequence $type 680 that were aligned by the algorithm 681 Returns : scalar int 682 Args : -type => one of 'hit', 'subject', 'query' 683 -action => one of 'exact', 'est', 'fast', 'max' 684 -context => strand/frame context string 685 Note : Since this is calculated from reported coordinates, 686 not symbol string counts, it is already in terms of 687 "logical length" 688 Note : Aliases length() 689 690=cut 691 692sub num_aligned { shift->length( @_ ) }; 693 694=head2 num_unaligned 695 696 Title : num_unaligned 697 Usage : $tiling->num_unaligned(-type=>$type) 698 Function: Return the number of residues of sequence $type 699 that were left unaligned by the algorithm 700 Returns : scalar int 701 Args : -type => one of 'hit', 'subject', 'query' 702 -action => one of 'exact', 'est', 'fast', 'max' 703 -context => strand/frame context string 704 Note : Since this is calculated from reported coordinates, 705 not symbol string counts, it is already in terms of 706 "logical length" 707 708=cut 709 710sub num_unaligned { 711 my $self = shift; 712 my ($type,$action,$context) = @_; 713 my $ret; 714 $self->_check_type_arg(\$type); 715 $self->_check_action_arg(\$action); 716 $self->_check_context_arg($type, \$context); 717 if (!defined $self->{"num_unaligned_${type}_${action}_${context}"}) { 718 $self->{"num_unaligned_${type}_${action}_${context}"} = $self->_reported_length($type)-$self->num_aligned($type,$action,$context); 719 } 720 return $self->{"num_unaligned_${type}_${action}_${context}"}; 721} 722 723 724=head2 range 725 726 Title : range 727 Usage : $tiling->range(-type=>$type) 728 Function: Returns the extent of the longest tiling 729 as ($min_coord, $max_coord) 730 Returns : array of two scalar integers 731 Args : -type => one of 'hit', 'subject', 'query' 732 -context => strand/frame context string 733 734=cut 735 736sub range { 737 my ($self, $type, $context) = @_; 738 $self->_check_type_arg(\$type); 739 $self->_check_context_arg($type, \$context); 740 my @a = $self->_contig_intersection($type,$context); 741 return ($a[0][0], $a[-1][1]); 742} 743 744 745 746=head1 ACCESSORS 747 748=head2 coverage_map 749 750 Title : coverage_map 751 Usage : $map = $tiling->coverage_map($type) 752 Function: Property to contain the coverage map calculated 753 by _calc_coverage_map() - see that for 754 details 755 Example : 756 Returns : value of coverage_map_$type as an array 757 Args : scalar $type: one of 'hit', 'subject', 'query' 758 default is 'query' 759 Note : getter 760 761=cut 762 763sub coverage_map{ 764 my $self = shift; 765 my ($type, $context) = @_; 766 $self->_check_type_arg(\$type); 767 $self->_check_context_arg($type, \$context); 768 769 if (!defined $self->{"coverage_map_${type}_${context}"}) { 770 # following calculates coverage maps in all strands/frames 771 # if necessary 772 $self->_calc_coverage_map($type, $context); 773 } 774 # if undef is returned, then there were no hsps for given strand/frame 775 if (!defined $self->{"coverage_map_${type}_${context}"}) { 776 $self->warn("No HSPS present for type '$type' in context '$context' for this hit"); 777 return undef; 778 } 779 return @{$self->{"coverage_map_${type}_${context}"}}; 780} 781 782=head2 coverage_map_as_text 783 784 Title : coverage_map_as_text 785 Usage : $tiling->coverage_map_as_text($type, $legend_flag) 786 Function: Format a text-graphic representation of the 787 coverage map 788 Returns : an array of scalar strings, suitable for printing 789 Args : $type: one of 'query', 'hit', 'subject' 790 $context: strand/frame context string 791 $legend_flag: boolean; add a legend indicating 792 the actual interval coordinates for each component 793 interval and hsp (in the $type sequence context) 794 Example : print $tiling->coverage_map_as_text('query',1); 795 796=cut 797 798sub coverage_map_as_text{ 799 my $self = shift; 800 my ($type, $context, $legend_q) = @_; 801 $self->_check_type_arg(\$type); 802 $self->_check_context_arg($type, \$context); 803 804 my @map = $self->coverage_map($type, $context); 805 my @ret; 806 my @hsps = $self->hit->hsps; 807 my %hsps_i; 808 require Tie::RefHash; 809 tie %hsps_i, 'Tie::RefHash'; 810 @hsps_i{@hsps} = (0..$#hsps); 811 my @mx; 812 foreach (0..$#map) { 813 my @hspx = ('') x @hsps; 814 my @these_hsps = @{$map[$_]->[1]}; 815 @hspx[@hsps_i{@these_hsps}] = ('*') x @these_hsps; 816 $mx[$_] = \@hspx; 817 } 818 untie %hsps_i; 819 820 push @ret, "\tIntvl\n"; 821 push @ret, "HSPS\t", join ("\t", (0..$#map)), "\n"; 822 foreach my $h (0..$#hsps) { 823 push @ret, join("\t", $h, map { $mx[$_][$h] } (0..$#map) ),"\n"; 824 } 825 if ($legend_q) { 826 push @ret, "Interval legend\n"; 827 foreach (0..$#map) { 828 push @ret, sprintf("%d\t[%d, %d]\n", $_, @{$map[$_][0]}); 829 } 830 push @ret, "HSP legend\n"; 831 my @ints = get_intervals_from_hsps($type,@hsps); 832 foreach (0..$#hsps) { 833 push @ret, sprintf("%d\t[%d, %d]\n", $_, @{$ints[$_]}); 834 } 835 } 836 return @ret; 837} 838 839=head2 hit 840 841 Title : hit 842 Usage : $tiling->hit 843 Function: 844 Example : 845 Returns : The HitI object associated with the invocant 846 Args : none 847 Note : getter only 848 849=cut 850 851sub hit{ 852 my $self = shift; 853 $self->warn("Getter only") if @_; 854 return $self->{'hit'}; 855} 856 857=head2 hsps 858 859 Title : hsps 860 Usage : $tiling->hsps() 861 Function: Container for the HSP objects associated with invocant 862 Example : 863 Returns : an array of hsps associated with the hit 864 Args : on set, new value (an arrayref or undef, optional) 865 866=cut 867 868sub hsps{ 869 my $self = shift; 870 return $self->{'hsps'} = shift if @_; 871 return @{$self->{'hsps'}}; 872} 873 874=head2 contexts 875 876 Title : contexts 877 Usage : @contexts = $tiling->context($type) or 878 @indices = $tiling->context($type, $context) 879 Function: Retrieve the set of available contexts in the hit, 880 or the indices of hsps having the given context 881 (integer indices for the array returned by $self->hsps) 882 Returns : array of scalar context strings or 883 array of scalar positive integers 884 undef if no hsps in given context 885 Args : $type: one of 'query', 'hit', 'subject' 886 optional $context: context string 887 888=cut 889 890sub contexts{ 891 my $self = shift; 892 my ($type, $context) = @_; 893 $self->_check_type_arg(\$type); 894 return keys %{$self->{"_contexts_$type"}} unless defined $context; 895 return undef unless $self->{"_contexts_$type"}{$context}; 896 return @{$self->{"_contexts_$type"}{$context}}; 897} 898 899=head2 mapping 900 901 Title : mapping 902 Usage : $tiling->mapping($type) 903 Function: Retrieve the mapping coefficient for the sequence type 904 based on the underlying algorithm 905 Returns : scalar integer (mapping coefficient) 906 Args : $type: one of 'query', 'hit', 'subject' 907 Note : getter only (set in constructor) 908 909=cut 910 911sub mapping{ 912 my $self = shift; 913 my $type = shift; 914 $self->_check_type_arg(\$type); 915 return $self->{"_mapping_${type}"}; 916} 917 918=head2 default_context 919 920 Title : default_context 921 Usage : $tiling->default_context($type) 922 Function: Retrieve the default strand/frame context string 923 for the sequence type based on the underlying algorithm 924 Returns : scalar string (context string) 925 Args : $type: one of 'query', 'hit', 'subject' 926 Note : getter only (set in constructor) 927 928=cut 929 930sub default_context{ 931 my $self = shift; 932 my $type = shift; 933 $self->_check_type_arg(\$type); 934 return $self->{"_def_context_${type}"}; 935} 936 937=head2 algorithm 938 939 Title : algorithm 940 Usage : $tiling->algorithm 941 Function: Retrieve the algorithm name associated with the 942 invocant's hit object 943 Returns : scalar string 944 Args : none 945 Note : getter only (set in constructor) 946 947=cut 948 949sub algorithm{ 950 my $self = shift; 951 $self->warn("Getter only") if @_; 952 return $self->{"_algorithm"}; 953} 954 955=head1 "PRIVATE" METHODS 956 957=head2 Calculators 958 959See L<Bio::Search::Tiling::MapTileUtils> for lower level 960calculation methods. 961 962=head2 _calc_coverage_map 963 964 Title : _calc_coverage_map 965 Usage : $tiling->_calc_coverage_map($type) 966 Function: Calculates the coverage map for the object's associated 967 hit from the perspective of the desired $type (see Args:) 968 and sets the coverage_map() property 969 Returns : True on success 970 Args : optional scalar $type: one of 'hit'|'subject'|'query' 971 default is 'query' 972 Note : The "coverage map" is an array with the following format: 973 ( [ $component_interval => [ @containing_hsps ] ], ... ), 974 where $component_interval is a closed interval (see 975 DESCRIPTION) of the form [$a0, $a1] with $a0 <= $a1, and 976 @containing_hsps is an array of all HspI objects in the hit 977 which completely contain the $component_interval. 978 The set of $component_interval's is a disjoint decomposition 979 of the minimum set of minimal intervals that completely 980 cover the hit's HSPs (from the perspective of the $type) 981 Note : This calculates the map for all strand/frame contexts available 982 in the hit 983 984=cut 985 986sub _calc_coverage_map { 987 my $self = shift; 988 my ($type) = @_; 989 $self->_check_type_arg(\$type); 990 991 # obtain the [start, end] intervals for all hsps in the hit (relative 992 # to the type) 993 unless ($self->{'hsps'}) { 994 $self->warn("No HSPs for this hit"); 995 return; 996 } 997 998 my (@map, @hsps, %filters, @intervals); 999 1000 1001 # conversion here? 1002 my $c = $self->mapping($type); 1003 1004 # create the possible maps 1005 for my $context ($self->contexts($type)) { 1006 @map = (); 1007 @hsps = ($self->hsps)[$self->contexts($type, $context)]; 1008 @intervals = get_intervals_from_hsps( $type, @hsps ); 1009 # the "frame" 1010 my $f = ($intervals[0]->[0] - 1) % $c; 1011 1012 # convert interval endpoints... 1013 for (@intervals) { 1014 $$_[0] = ($$_[0] - $f + $c - 1)/$c; 1015 $$_[1] = ($$_[1] - $f)/$c; 1016 } 1017 1018 # determine the minimal set of disjoint intervals that cover the 1019 # set of hsp intervals 1020 my @dj_set = interval_tiling(\@intervals); 1021 1022 # decompose each disjoint interval into another set of disjoint 1023 # intervals, each of which is completely contained within the 1024 # original hsp intervals with which it overlaps 1025 my $i=0; 1026 my @decomp; 1027 for my $dj_elt (@dj_set) { 1028 my ($covering, $indices) = @$dj_elt; 1029 my @covering_hsps = @hsps[@$indices]; 1030 my @coverers = @intervals[@$indices]; 1031 @decomp = decompose_interval( \@coverers ); 1032 for (@decomp) { 1033 my ($component, $container_indices) = @{$_}; 1034 push @map, [ $component, 1035 [@covering_hsps[@$container_indices]] ]; 1036 } 1037 1; 1038 } 1039 1040 # unconvert the components: 1041##### 1042 foreach (@map) { 1043 $$_[0][0] = $c*$$_[0][0] - $c + 1 + $f; 1044 $$_[0][1] = $c*$$_[0][1] + $f; 1045 } 1046 foreach (@dj_set) { 1047 $$_[0][0] = $c*$$_[0][0] - $c + 1 + $f; 1048 $$_[0][1] = $c*$$_[0][1] + $f; 1049 } 1050 1051 # sort the map on the interval left-ends 1052 @map = sort { $a->[0][0]<=>$b->[0][0] } @map; 1053 $self->{"coverage_map_${type}_${context}"} = [@map]; 1054 # set the _contig_intersection attribute here (side effect) 1055 $self->{"_contig_intersection_${type}_${context}"} = [map { $$_[0] } @map]; 1056 } 1057 1058 return 1; # success 1059} 1060 1061=head2 _calc_stats 1062 1063 Title : _calc_stats 1064 Usage : $tiling->_calc_stats($type, $action, $context) 1065 Function: Calculates [estimated] tiling statistics (identities, conserved sites 1066 length) and sets the public accessors 1067 Returns : True on success 1068 Args : scalar $type: one of 'hit', 'subject', 'query' 1069 default is 'query' 1070 optional scalar $action: requests calculation method 1071 currently one of 'exact', 'est', 'fast', 'max' 1072 option scalar $context: strand/frame context string 1073 Note : Action: The statistics are calculated by summing quantities 1074 over the disjoint component intervals, taking into account 1075 coverage of those intervals by multiple HSPs. The action 1076 tells the algorithm how to obtain those quantities-- 1077 'exact' will use Bio::Search::HSP::HSPI::matches 1078 to count the appropriate segment of the homology string; 1079 'est' will estimate the statistics by multiplying the 1080 fraction of the HSP overlapped by the component interval 1081 (see MapTileUtils) by the BLAST-reported identities/postives 1082 (this may be convenient for BLAST summary report formats) 1083 * Both exact and est take the average over the number of HSPs 1084 that overlap the component interval. 1085 'max' uses the exact method to calculate the statistics, 1086 and returns only the maximum identites/positives over 1087 overlapping HSP for the component interval. No averaging 1088 is involved here. 1089 'fast' doesn't involve tiling at all (hence the name), 1090 but it seems like a very good estimate, and uses only 1091 reported values, and so does not require sequence data. It 1092 calculates an average of reported identities, conserved 1093 sites, and lengths, over unmodified hsps in the hit, 1094 weighted by the length of the hsps. 1095 1096=cut 1097 1098sub _calc_stats { 1099 my $self = shift; 1100 my ($type, $action, $context) = @_; 1101 # need to check args here, in case method is called internally. 1102 $self->_check_type_arg(\$type); 1103 $self->_check_action_arg(\$action); 1104 $self->_check_context_arg($type, \$context); 1105 my ($ident, $cons, $length) = (0,0,0); 1106 1107 # fast : avoid coverage map altogether, get a pretty damn 1108 # good estimate with a weighted average of reported hsp 1109 # statistics 1110 1111 ($action eq 'fast') && do { 1112 my @hsps = $self->hit->hsps; 1113 @hsps = @hsps[$self->contexts($type, $context)]; 1114 # weights for averages 1115 my @wt = map {$_->length($type)} @hsps; 1116 my $sum = eval( join('+',@wt) ); 1117 $_ /= $sum for (@wt); 1118 for (@hsps) { 1119 my $wt = shift @wt; 1120 $ident += $wt*$_->matches_MT($type,'identities'); 1121 $cons += $wt*$_->matches_MT($type,'conserved'); 1122 $length += $wt*$_->length($type); 1123 } 1124 }; 1125 1126 # or, do tiling 1127 1128 # calculate identities/conserved sites in tiling 1129 # estimate based on the fraction of the component interval covered 1130 # and ident/cons reported by the HSPs 1131 ($action ne 'fast') && do { 1132 foreach ($self->coverage_map($type, $context)) { 1133 my ($intvl, $hsps) = @{$_}; 1134 my $len = ($$intvl[1]-$$intvl[0]+1); 1135 my $ncover = ($action eq 'max') ? 1 : scalar @$hsps; 1136 my ($acc_i, $acc_c) = (0,0); 1137 foreach my $hsp (@$hsps) { 1138 for ($action) { 1139 ($_ eq 'est') && do { 1140 my ($inc_i, $inc_c) = $hsp->matches_MT( 1141 -type => $type, 1142 -action => 'searchutils', 1143 ); 1144 my $frac = $len/$hsp->length($type); 1145 $acc_i += $inc_i * $frac; 1146 $acc_c += $inc_c * $frac; 1147 last; 1148 }; 1149 ($_ eq 'max') && do { 1150 my ($inc_i, $inc_c) = $hsp->matches_MT( 1151 -type => $type, 1152 -action => 'searchutils', 1153 -start => $$intvl[0], 1154 -end => $$intvl[1] 1155 ); 1156 $acc_i = ($acc_i > $inc_i) ? $acc_i : $inc_i; 1157 $acc_c = ($acc_c > $inc_c) ? $acc_c : $inc_c; 1158 last; 1159 }; 1160 (!$_ || ($_ eq 'exact')) && do { 1161 my ($inc_i, $inc_c) = $hsp->matches_MT( 1162 -type => $type, 1163 -action => 'searchutils', 1164 -start => $$intvl[0], 1165 -end => $$intvl[1] 1166 ); 1167 $acc_i += $inc_i; 1168 $acc_c += $inc_c; 1169 last; 1170 }; 1171 } 1172 } 1173 $ident += ($acc_i/$ncover); 1174 $cons += ($acc_c/$ncover); 1175 $length += $len; 1176 } 1177 }; 1178 1179 $self->{"identities_${type}_${action}_${context}"} = $ident; 1180 $self->{"conserved_${type}_${action}_${context}"} = $cons; 1181 $self->{"length_${type}_${action}_${context}"} = $length; 1182 1183 return 1; 1184} 1185 1186=head2 Tiling Helper Methods 1187 1188=cut 1189 1190# coverage_map is of the form 1191# ( [ $interval, \@containing_hsps ], ... ) 1192 1193# so, for each interval, pick one of the containing hsps, 1194# and return the union of all the picks. 1195 1196# use the combinatorial generating iterator, with 1197# the urns containing the @containing_hsps for each 1198# interval 1199 1200=head2 _make_tiling_iterator 1201 1202 Title : _make_tiling_iterator 1203 Usage : $self->_make_tiling_iterator($type) 1204 Function: Create an iterator code ref that will step through all 1205 minimal combinations of HSPs that produce complete coverage 1206 of the $type ('hit', 'subject', 'query') sequence, 1207 and set the correct iterator property of the invocant 1208 Example : 1209 Returns : The iterator 1210 Args : scalar $type, one of 'hit', 'subject', 'query'; 1211 default is 'query' 1212 1213=cut 1214 1215sub _make_tiling_iterator { 1216 ### create the urns 1217 my $self = shift; 1218 my ($type, $context) = @_; 1219 $self->_check_type_arg(\$type); 1220 $self->_check_context_arg($type, \$context); 1221 1222 # initialize the urns 1223 my @urns = map { [0, $$_[1]] } $self->coverage_map($type, $context); 1224 1225 my $FINISHED = 0; 1226 my $iter = sub { 1227 # rewind? 1228 if (my $rewind = shift) { 1229 # reinitialize urn indices 1230 $$_[0] = 0 for (@urns); 1231 $FINISHED = 0; 1232 return 1; 1233 } 1234 # check if done... 1235 return if $FINISHED; 1236 1237 my $finished_incrementing = 0; 1238 # @ret is the collector of urn choices 1239 my @ret; 1240 1241 for my $urn (@urns) { 1242 my ($n, $hsps) = @$urn; 1243 push @ret, $$hsps[$n]; 1244 unless ($finished_incrementing) { 1245 if ($n == $#$hsps) { $$urn[0] = 0; } 1246 else { ($$urn[0])++; $finished_incrementing = 1 } 1247 } 1248 } 1249 1250 # backstop... 1251 $FINISHED = 1 unless $finished_incrementing; 1252 # uniquify @ret 1253 # $hsp->rank is a unique identifier for an hsp in a hit. 1254 # preserve order in @ret 1255 1256 my (%order, %uniq); 1257 @order{(0..$#ret)} = @ret; 1258 $uniq{$order{$_}->rank} = $_ for (0..$#ret); 1259 @ret = @order{ sort {$a<=>$b} values %uniq }; 1260 1261 return @ret; 1262 }; 1263 return $iter; 1264} 1265 1266=head2 _tiling_iterator 1267 1268 Title : _tiling_iterator 1269 Usage : $tiling->_tiling_iterator($type,$context) 1270 Function: Retrieve the tiling iterator coderef for the requested 1271 $type ('hit', 'subject', 'query') 1272 Example : 1273 Returns : coderef to the desired iterator 1274 Args : scalar $type, one of 'hit', 'subject', 'query' 1275 default is 'query' 1276 option scalar $context: strand/frame context string 1277 Note : getter only 1278 1279=cut 1280 1281sub _tiling_iterator { 1282 my $self = shift; 1283 my ($type, $context) = @_; 1284 $self->_check_type_arg(\$type); 1285 $self->_check_context_arg($type, \$context); 1286 1287 if (!defined $self->{"_tiling_iterator_${type}_${context}"}) { 1288 $self->{"_tiling_iterator_${type}_${context}"} = 1289 $self->_make_tiling_iterator($type,$context); 1290 } 1291 return $self->{"_tiling_iterator_${type}_${context}"}; 1292} 1293=head2 Construction Helper Methods 1294 1295See also L<Bio::Search::Tiling::MapTileUtils>. 1296 1297=cut 1298 1299sub _check_type_arg { 1300 my $self = shift; 1301 my $typeref = shift; 1302 $$typeref ||= 'query'; 1303 $self->throw("Unknown type '$$typeref'") unless grep(/^$$typeref$/, qw( hit query subject )); 1304 $$typeref = 'hit' if $$typeref eq 'subject'; 1305 return 1; 1306} 1307 1308sub _check_action_arg { 1309 my $self = shift; 1310 my $actionref = shift; 1311 if (!$$actionref) { 1312 $$actionref = ($self->_has_sequence_data ? 'exact' : 'est'); 1313 } 1314 else { 1315 $self->throw("Calc action '$$actionref' unrecognized") unless grep /^$$actionref$/, qw( est exact fast max ); 1316 if ($$actionref ne 'est' and !$self->_has_sequence_data) { 1317 $self->warn("Blast file did not possess sequence data; defaulting to 'est' action"); 1318 $$actionref = 'est'; 1319 } 1320 } 1321 return 1; 1322} 1323 1324sub _check_context_arg { 1325 my $self = shift; 1326 my ($type, $contextref) = @_; 1327 if (!$$contextref) { 1328 $self->throw("Type '$type' requires strand/frame context for algorithm ".$self->algorithm) unless ($self->mapping($type) == 1); 1329 # set default according to default_context attrib 1330 $$contextref = $self->default_context($type); 1331 } 1332 else { 1333 ($$contextref =~ /^[mp]$/) && do { $$contextref .= '_' }; 1334 $self->throw("Context '$$contextref' unrecognized") unless 1335 $$contextref =~ /all|[mp][0-2_]/; 1336 } 1337 1338} 1339 1340=head2 _make_context_key 1341 1342 Title : _make_context_key 1343 Alias : _context 1344 Usage : $tiling->_make_context_key(-strand => $strand, -frame => $frame) 1345 Function: create a string indicating strand/frame context; serves as 1346 component of memoizing hash keys 1347 Returns : scalar string 1348 Args : -type => one of ('query', 'hit', 'subject') 1349 -strand => one of (1,0,-1) 1350 -frame => one of (-2, 1, 0, 1, -2) 1351 called w/o args: returns 'all' 1352 1353=cut 1354 1355sub _make_context_key { 1356 my $self = shift; 1357 my @args = @_; 1358 my ($type, $strand, $frame) = $self->_rearrange([qw(TYPE STRAND FRAME)], @args); 1359 _check_type_arg(\$type); 1360 return 'all' unless (defined $strand or defined $frame); 1361 if ( defined $strand && $self->_has_strand($type) ) { 1362 if (defined $frame && $self->_has_frame($type)) { 1363 return ($strand >= 0 ? 'p' : 'm').abs($frame); 1364 } 1365 else { 1366 return ($strand >= 0 ? 'p_' : 'm_'); 1367 } 1368 } 1369 else { 1370 if (defined $frame && $self->_has_frame($type)) { 1371 $self->warn("Frame defined without strand; punting with plus strand"); 1372 return 'p'.abs($frame); 1373 } 1374 else { 1375 return 'all'; 1376 } 1377 } 1378} 1379 1380=head2 _context 1381 1382 Title : _context 1383 Alias : _make_context_key 1384 Usage : $tiling->_make_context_key(-strand => $strand, -frame => $frame) 1385 Function: create a string indicating strand/frame context; serves as 1386 component of memoizing hash keys 1387 Returns : scalar string 1388 Args : -type => one of ('query', 'hit', 'subject') 1389 -strand => one of (1,0,-1) 1390 -frame => one of (-2, 1, 0, 1, -2) 1391 called w/o args: returns 'all' 1392 1393=cut 1394 1395sub _context { shift->_make_context_key(@_) } 1396 1397=head2 Predicates 1398 1399Most based on a reading of the algorithm name with a configuration lookup. 1400 1401=over 1402 1403=item _has_sequence_data() 1404 1405=cut 1406 1407sub _has_sequence_data { 1408 my $self = shift; 1409 $self->throw("Hit attribute not yet set") unless defined $self->hit; 1410 return (($self->hit->hsps)[0]->seq_str('match') ? 1 : 0); 1411} 1412 1413=item _has_logical_length() 1414 1415=cut 1416 1417sub _has_logical_length { 1418 my $self = shift; 1419 my $type = shift; 1420 $self->_check_type_arg(\$type); 1421 # based on mapping coeff 1422 $self->throw("Mapping coefficients not yet set") unless defined $self->mapping($type); 1423 return ($self->mapping($type) > 1); 1424} 1425 1426=item _has_strand() 1427 1428=cut 1429 1430sub _has_strand { 1431 my $self = shift; 1432 my $type = shift; 1433 $self->_check_type_arg(\$type); 1434 return $self->{"_has_strand_${type}"}; 1435} 1436 1437=item _has_frame() 1438 1439=cut 1440 1441sub _has_frame { 1442 my $self = shift; 1443 my $type = shift; 1444 $self->_check_type_arg(\$type); 1445 return $self->{"_has_frame_${type}"}; 1446} 1447 1448=back 1449 1450=head1 Private Accessors 1451 1452=head2 _contig_intersection 1453 1454 Title : _contig_intersection 1455 Usage : $tiling->_contig_intersection($type) 1456 Function: Return the minimal set of $type coordinate intervals 1457 covered by the invocant's HSPs 1458 Returns : array of intervals (2-member arrayrefs; see MapTileUtils) 1459 Args : scalar $type: one of 'query', 'hit', 'subject' 1460 1461=cut 1462 1463sub _contig_intersection { 1464 my $self = shift; 1465 my ($type, $context) = @_; 1466 $self->_check_type_arg(\$type); 1467 $self->_check_context_arg($type, \$context); 1468 if (!defined $self->{"_contig_intersection_${type}_${context}"}) { 1469 $self->_calc_coverage_map($type); 1470 } 1471 return @{$self->{"_contig_intersection_${type}_${context}"}}; 1472} 1473 1474=head2 _reported_length 1475 1476 Title : _reported_length 1477 Usage : $tiling->_reported_length($type) 1478 Function: Get the total length of the seq $type 1479 for the invocant's hit object, as reported 1480 by (not calculated from) the input data file 1481 Returns : scalar int 1482 Args : scalar $type: one of 'query', 'hit', 'subject' 1483 Note : This is kludgy; the hit object does not currently 1484 maintain accessors for these values, but the 1485 hsps possess these attributes. This is a wrapper 1486 that allows a consistent access method in the 1487 MapTiling code. 1488 Note : Since this number is based on a reported length, 1489 it is already a "logical length". 1490 1491=cut 1492 1493sub _reported_length { 1494 my $self = shift; 1495 my $type = shift; 1496 $self->_check_type_arg(\$type); 1497 my $key = uc( $type."_LENGTH" ); 1498 return ($self->hsps)[0]->{$key}; 1499} 1500 15011; 1502 1503