2020-09-17  twu

    * Makefile.gsnaptoo.am: Including sam_sort again

2020-09-13  twu

    * VERSION, index.html: Updated version number

    * path-solve.c, distant-rna.c, stage3hr.c, terminal.c: Using new interfaces
      to Substring trim functions

    * gmap_build.pl.in: Fixed flag in help output

    * substring.c, substring.h: For DNA-seq, not allowing extension of last
      mismatch at end if it extends beyond chromosomal bounds

2020-07-10  twu

    * pair.c: Added semicolon between Dir and coverage in gff3 output

2020-06-29  twu

    * trunk, VERSION: Revised version number

    * src: Merged revisions 222852 through 222926 from
      branches/2020-06-12-end-trimming

    * terminal.c: Merged revisions 222852 through 222926 from
      branches/2020-06-12-end-trimming to use new interface to
      Substring_trim_qend_nosplice

    * stage1hr.c, stage1hr.h: Merged revisions 222852 through 222926 to use both
      max_mismatches_refalt and max_mismatches_ref

    * gsnap.c: Merged revisions 222852 through 222926 to add
      --max-mismatches-ref and to change --ignore-trim-in-filtering to
      --filter-within-trims

    * substring.c, substring.h: Merged revisions 222852 through 222926 from
      branches/2020-06-12-end-trimming to back up 1 bp for mismatches at the ends
      of reads for DNA-seq

    * stage3hr.c, stage3hr.h, stage3hrdef.h: Merged revisions 222852 through
      222926 from branches/2020-06-12-end-trimming to use refalt for scoring and
      both ref and refalt for filtering

    * indel.c: Merged revisions 222852 through 222926 from
      branches/2020-06-12-end-trimming to use new interface to
      Genome_count_mismatches_substring

    * genome128_hr.c, genome128_hr.h: Merged revisions 222852 through 222926
      from branches/2020-06-12-end-trimming to handle masked genomes

2020-06-22  twu

    * gsnap.c: Using new interface to Indel_setup

    * indel.h: Providing genomelength in Indel_setup

    * indel.c: Using Genome_fill_buffer_ref instead of
      Genome_fill_buffer_blocks to avoid issues at ends of the genome

    * genome.c, genome.h: Implemented Genome_fill_buffer_ref

2020-06-18  twu

    * cpuid.c: Fixed parameter list for Intel compilers

    * stage3.c: Adding another exception for long end introns: end exon must be
      less than 40 bp

2020-06-15  twu

    * VERSION: Updated version number

    * trunk, src, genome128_consec.c, genome128_hr.c, splice.c: Merged revisions
      222853 through 222859 from branches/2020-06-12-end-trimming to increase
      MIN_EXON_LENGTH from 9 to 20

2020-06-13  twu

    * trunk, src, substring.c: Merged revision 222854 from
      branches/2020-06-12-end-trimming to iterate through correct number of
      mismatches from Genome trim procedures, and when splicing fails, using
      trimpos rather than pos5 or pos3

2020-06-04  twu

    * samprint.c: Fixed memory leak with mate_md_fp in nomapping alignments

    * trunk, VERSION, config.site.rescomp.tst: Updated version number

    * index.html: Updated for new version

    * src, Makefile.gsnaptoo.am, cigar.c, cigar.h, gsnap.c, mdprint.c,
      mdprint.h, samprint.c, samprint.h: Merged revisions 222824 through 222834
      from branches/2020-06-03-MD to integrate computation of CIGAR and MD
      strings

2020-06-03  twu

    * trunk, cigar.c, cigar.h, samprint.c: Merged revisions 222820 through
      222823 from branches/2020-06-03-MD to compute MD string correctly for
      hardclipping on minus alignments and for the --sam-hardclip-use-S flag

    * samprint.c: Removed debugging macro

    * samprint.c: Removed code that led to a zero-length MD string when
      hard-clipping was present

    * substring.c: Changed format of debugging statement to handle both regular
      and large genomes

    * splice.c, oligoindex_hr.c, kmer-search.c, iit-read-univ.c: Removed unused
      variables

    * gsnap.c: Using new interfaces to Indexdb_new_genome and
      Indexdb_new_transcriptome

    * gmap.c,
      atoiindex.c, cmetindex.c, indexdb-cat.c: Using new interface to
      Indexdb_new_genome

    * genome.c, transcriptome.c: Removed unused variable

    * compress-write.c: Fixed messages to stderr

    * stage3.c: Changed type of intronlength from Chrpos_T to int.  Changed
      types of new_leftgenomepos and new_rightgenomepos to be int

    * stage1hr.c: Using new interface to Terminal_solve_plus,
      Terminal_solve_minus, and Distant_rna_solve

    * distant-rna.c, distant-rna.h: Removed unused parameters queryuc_ptr and
      queryrc

    * terminal.c, terminal.h: Removed unused parameters queryuc_ptr and queryrc
      from Terminal_solve_plus and Terminal_solve_minus, respectively

    * uniqscan.c: Using new interface to Stage3_new_genome

    * stage3hr.c: Using new interface to Junction_new_chimera

    * junction.c, junction.h: Removed unused parameter sensedir from
      Junction_new_chimera

    * indexdb.c, indexdb.h: Removed unused parameter expand_offsets_p

    * trunk, VERSION, config.site.rescomp.tst, index.html, src, distant-dna.c,
      distant-dna.h, gsnap.c, indel.c, indel.h, kmer-search.c, path-solve.c,
      splice.c, splice.h, stage3hr.c, stage3hr.h: Merged revisions 222790
      through 222796 from branches/2020-06-03-TGGA to fix bugs in
      transcriptome-guided genomic alignment

2020-06-02  twu

    * stage3hrdef.h: Added comment

    * substring.c: Made computations for mandatory trims similar to those for
      querystart_chrbound and queryend_chrbound

    * stage3hr.c: Computing fields mandatory_trim_querystart and
      mandatory_trim_queryend and using them in computing coverage

    * stage3hrdef.h: Added fields mandatory_trim_querystart and
      mandatory_trim_queryend

    * kmer-search.c: Changed types of genomic coords from Trcoord_T to
      Univcoord_T

    * get-genome.c, snpindex.c: Added calls to Genome_setup

2020-06-01  twu

    * VERSION: Updated version number

    * gmap.c:
      Using new interface to Genome_user_setup

    * genome.c, genome.h: Added genomelength to Genome_user_setup

    * gsnap.c: Moved Genome_setup before knownsplicing initialization

2020-05-31  twu

    * gmap.c, gsnap.c: Using new interface to Genome_setup

    * index.html: Updated for latest version

    * VERSION: Updated version number

    * README: Updated information

    * genome.c: Fixed bug in specifying coordinates in Genome_fill_buffer_simple

    * genome.c, genome.h: Modifying pos5 and pos3 in Genome_fill_buffer_simple
      to avoid going outside of genome bounds

    * stage3hr.c: Fixed a bug in Stage3end_remove_duplicates that failed to
      return distant alignments

    * gmap_build.pl.in: Checking if transcript FASTA or genes file is provided
      but transcriptome name is not

    * gsnap.c: Using new interfaces to setup procedures

    * stage3hr.c, stage3hr.h, substring.c, substring.h, terminal.c, terminal.h,
      distant-rna.h, extension-search.c, extension-search.h, kmer-search.h,
      path-solve.c, path-solve.h: Providing genomelength to setup procedures

    * kmer-search.c: Fixed calculations of pos5 and pos3 for transcriptome bounds

    * distant-rna.c: Fixed bug in computation of pos3.  Providing genomelength
      to setup procedure

    * stage3hr.c, stage3hr.h: Removed unused variables for Stage3hr_setup

    * gsnap.c, substring.c, substring.h: Removed unused variables for setup

    * get-genome.c: Added a --genes option for converting a genes file to FASTA
      format

    * gmap_build.pl.in: Added --genes option for building a transcriptome from a
      genes file

    * gff3_genes.pl.in: Added options for printing exon and/or CDNA fields.
      Printing only exons by default

    * get-genome.c: Removed unused code

    * kmer-search.c: Fixed issue with uninitialized variable in
      transcriptome-guided genomic alignment

2020-05-30  twu

    * VERSION: Updated version number

    * gmap_build.pl.in: Building a genome index based on the presence of genome
      FASTA files

    * path-solve.c: Handling the case where best_left_paths or best_right_paths
      is NULL, due to an alignment attempt that yields an unacceptable path

    * kmer-search.c: No longer using SIMD shortcuts for computing exoni.  Not
      comparing len against nindels, only exon_residual against nindels

    * stage1hr.c: Edited comment

    * stage3hr.c: Removed test against MIN_ALIGNMENT_LEN

    * transcriptome.c: Added variable for debugging

    * transcript.c: Fixed memory leak

    * distant-rna.c: Computing fragment substring bounds for new
      Genome_mismatches_left and Genome_mismatches_right

    * distant-dna.c: Fixed memory leak.
      Allocating memory for new
      Genome_mismatches_left and Genome_mismatches_right

    * genome128_hr.c: For Genome_mismatches_left and Genome_mismatches_right,
      enforcing that nmismatches <= max_mismatches

2020-05-29  twu

    * VERSION: Updated version number

    * iit_store.c: Fixed bug in allocating memory for string

    * path-solve.c: Rearranged debugging code

    * distant-rna.c: Using new interface to Substring_qstart_trim and
      Substring_qend_trim

    * stage3hr.c: In Stage3end_new_substitution, checking for pos5 and pos3
      being outside of genome bounds

    * substring.c, substring.h: Changed parameter names for
      Substring_qstart_trim and Substring_qend_trim

    * iit_store.c: Fixed bug with double freeing line in parsing GFF3 files

    * concordance.c, concordance.h, distant-dna.c, distant-dna.h, kmer-search.c,
      kmer-search.h, path-solve.c, path-solve.h, samprint.c, samprint.h,
      simplepair.c, simplepair.h, stage1hr.c, stage1hr.h, terminal.c,
      terminal.h: Using splicingp instead of novelsplicingp

    * gsnap.c: Added variable splicingp and providing it to setup procedures

2020-05-28  twu

    * stage1hr.c: Testing novelsplicingp before all code for antisense hits

    * ladder.c: In Ladder_minimax_trim, handling the case where the antisense
      ladders are NULL

    * VERSION: Updated version number

    * trunk, src, concordance.c, concordance.h, distant-dna.c, distant-dna.h,
      distant-rna.c, distant-rna.h, extension-search.c, extension-search.h,
      genome128_hr.c, gsnap.c, indel.c, indel.h, junction.c, junction.h,
      kmer-search.c, kmer-search.h, ladder.c, ladder.h, path-solve.c,
      path-solve.h, segment-search.c, segment-search.h, splice.c, splice.h,
      stage1hr.c, stage3hr.c, stage3hr.h, stage3hrdef.h, substring.c,
      substring.h, terminal.c, terminal.h: Merged revisions 222646 through
      222711 from branches/2020-05-23-min-coverage to improve splice-plus-indel
      alignments, allow multiple paths in path-solve procedures, and find
      concordance separately for sense and antisense alignments

2020-05-23  twu

    * indexdb-cat.c: Handling cases where sampling intervals are different

    * gsnap.c: Using new interface to Extension_search_setup

    * extension-search.c, extension-search.h: Generalized from an index1interval
      of 3

2020-05-20  twu

    * stage1hr.c: Always making calls to Stage3end_filter

    * stage3hr.c: Added a debugging statement

2020-05-19  twu

    * trunk, src, gsnap.c, indel.c, ladder.c, merge-diagonals-simd-uint4.c,
      regiondb.c, splice.c, stage1hr.c, stage3hr.c, stage3hr.h, stage3hrdef.h,
      terminal.c, terminal.h: Merged revisions 222614 through 222628 from
      branches/2020-05-18-filtering to improve the filtering and choices among
      alignments

    * VERSION, config.site.rescomp.tst: Updated version number

2020-05-18  twu

    * substring.c: Removed printing of NA for probabilities under NO_COMPARE
      macro

2020-05-15  twu

    * splice.c: Removed code that exited prematurely from search for indels plus
      splicing

    * splice.c: Removed unused code

2020-05-14  twu

    * stage3hr.c: In Stage3end_new_substrings, checking if an alignment in a
      circular chromosome exceeds chrlength

    * substring.c: Added assertions

2020-05-13  twu

    * stage3hr.c: Setting trim_querystart and trim_queryend, and then
      querystart_chrbound and queryend_chrbound, revising according to trim
      values

    * stage3hr.c: Applying minimum alignment length to Stage3_new_substrings

    * stage3hr.c: Requiring a minimum exon length before reducing score for
      spliced ends

    * extension-search.c: Using new interface to Univ_IIT_update_chrnum

    * stage3hr.c: Fixed penalty for trims to work with spliced ends

    * gsnap.c: Restored the -E abbreviation for --distant-splice-penalty

2020-05-12  twu

    * trunk, config.site.rescomp.tst, src, distant-rna.c, extension-search.c,
      extension-search.h, gsnap.c, kmer-search.c, kmer-search.h, path-solve.c,
      path-solve.h, segment-search.c, segment-search.h, stage1hr.c, stage1hr.h,
      stage3hr.c, stage3hr.h: Merged revisions 222561 through 222589 from
      branches/2020-05-08-large-insertions to identify large insertions

    * Makefile.gsnaptoo.am: Added comment

    * stage3hr.c: Not compensating for short chrlengths if chrlength is 0

    * stage1hr.c: Filtering singlehits5 and singlehits3

    * terminal.c, distant-rna.c: Using new interface to Univ_IIT_get_chrnum

    * genome128_hr.c: Setting final querypos to be pos5 - 1, rather than -1

    * concordance.h: Removed unused header file

    * iit-read-univ.c, iit-read-univ.h: Univ_IIT_get_chrnum,
      Univ_IIT_update_chrnum, and Univ_IIT_get_trnum take low and high as
      parameters

    * segment-search.c: Checking for the need to go to a previous chrnum

    * intersect-large.c: Adding parentheses for clarity

    * kmer-search.c: Using new interface to Univ_IIT_get_trnum

    * path-solve.c: In attach_qstart_diagonal and attach_qend_diagonal, checking
      that left+pos5/pos3 are not before the beginning of the genome

2020-05-09  twu

    * intersect-large.c: Fixed check for positions0 at beginning of genome

    * extension-search.c: Using new interface to Univ_IIT_update_chrnum

2020-04-25  twu

    * stage3hr.c: Redefining low and high in Stage3end_T object to be based on
      the aligned endpoints

    * extension-search.c: Checking for genomic bounds on each of the
      univdiagonals

2020-04-24  twu

    * intersect-large.c, intersect.c: Fixed Intersect_exact_indices routines to
      handle small univdiagonals less than diagterm

2020-04-22  twu

    * splice.c: Specifying a minimum splice prob

    * stage3hr.c: Adding slop for insert length and splice score

    * util,
      gmap_cat.pl.in: Removed unused lines

    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, src,
      Makefile.gsnaptoo.am, concordance.c, distant-rna.c, genome128_hr.c,
      genome128_hr.h, gmap.c, gsnap.c, indexdb.c, kmer-search.c, ladder.c,
      path-solve.c, path-solve.h, splice.c, splice.h, splicetrie.c,
      splicetrie.h, stage3.h, stage3hr.c, stage3hr.h, stage3hrdef.h,
      substring.c, substring.h, terminal.c: Merged revisions 222160 through
      222483 from branches/2020-03-13-exon-intron-scores to allow for masked
      genomes

    * simplepair.c: No longer printing transcript information, since it requires
      a merging step

    * samprint.c: Using new interfaces to procedures for printing transcripts

    * stage3hr.c, stage3hr.h, stage3hrdef.h, transcript.c, transcript.h: Removed
      transcripts5 and transcripts3 from Stage3pair_T object, and computing and
      printing concordance when needed

    * stage3hr.c: Put debugging statements within a macro

    * stage3hr.c: Merged revision 222198 from branches/2020-03-13 to restore
      hit_equal and hitpair_equal procedures for handling overlaps within loci

    * substring.c: Merged revision 222188 from
      branches/2020-03-13-exon-intron-scores to improve trimming

    * stage1hr.c: Merged revision 222189 from
      branches/2020-03-13-exon-intron-scores to call optimal_score_prefinal
      before removing overlaps and optimal_score_final

    * splice.c, splice.h: Merged revision 222187 from
      branches/2020-03-13-exon-intron-scores to take probability and
      nconsecutive thresholds as parameters

2020-04-21  twu

    * iit-read-univ.c, iit-read-univ.h: Removed obsolete procedure
      Univ_IIT_interval_bounds_linear

    * kmer-search.c: Using Univ_IIT_get_trnum

    * trunk, configure.ac, src, Makefile.gsnaptoo.am, chrnum.h, distant-dna.c,
      distant-rna.c, extension-search.c, extension-search.h, gmapindex.c,
      gsnap.c, iit-read-univ.c, iit-read-univ.h,
      indexdb.c, indexdb.h,
      kmer-search.c, localdb.c, merge-diagonals-heap.c,
      merge-diagonals-simd-uint4.c, merge-diagonals-simd-uint8.c, path-solve.c,
      path-solve.h, record.h, regiondb.c, regiondb.h, segment-search.c,
      stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h, terminal.c,
      util: Merged revisions 222404 through 222444 from
      branches/2020-04-14-right-diagonals to use univdiagonals instead of lefts

2020-04-18  twu

    * samprint.c: Allowing for XS field to be printed in transcriptome-guided
      alignment

    * index.html: Updated for version 2020-04-08

    * gsnap.c: Using new interface to Transcriptome_new

    * gmap_build.pl.in: Added options for building a transcriptome db

    * transcriptome.c, transcriptome.h: Putting transcriptome info into a
      subdirectory of the genome db

    * trindex.c: Putting transcriptome info into a subdirectory in the genome db

    * kmer-search.c: Fixed bug in computing adj

    * Makefile.gsnaptoo.am: Restored trindex program

2020-04-15  twu

    * output.c: Fixed bug in checking for a condition

2020-04-14  twu

    * gmap.c, output.c, output.h: Added options cdna+introns and genomic+introns
      to the --exons flag

2020-04-13  twu

    * svncl.pl: Grouping files with identical comments

2020-04-12  twu

    * iit_get.c: Setting coordstart and coordend to 0 when force_label_p is true

2020-04-10  twu

    * VERSION: Updated version number

    * access.c, access.h, iit-read-univ.c, iit-read.c: Removed procedures for
      read/write memory mapping

2020-04-08  twu

    * trunk, src, gmapindex.c, indexdb-cat.c, gmap_cat.pl.in: Merged revisions
      222346 to 222387 from branches/2020-03-13-exon-intron-scores to remove the
      -F flag from concatenation programs

2020-04-05  twu

    * VERSION, config.site.rescomp.prd, index.html: Revised for latest version

    * svncl.pl: Adding spaces between lines of
      multi-line comments

    * svncl.pl, MAINTAINER: Replaced svncl.pl with a program that does not
      depend on xml output

    * configure.ac: Added comment

    * dynprog_end.c: Fixed debugging macro

    * gsnap.c: Turned shared memory off by default

    * gmap.c: Turned shared memory off by default.  Added option
      --use-shared-memory

    * compress-write.c, regiondb-write.c: Initializing value of current_pos in
      concatenation procedures

2020-04-02  twu

    * get-genome.c: Added option --add-circular

2020-03-27  twu

    * iit_store.c: Added option --accesion-only

2020-03-23  twu

    * compress-write.c, compress-write.h, gmapindex.c: Allowing for compressing
      genomes from stdin

    * iit_get.c: Removed debugging command

    * compress-write.c, compress-write.h, gmapindex.c: Implemented option for
      compressing FASTA files

    * iit_get.c: Fixed printing of sequence with coordinates

    * gmap.c, output.c, output.h, pair.c, pair.h: Differentiating between
      mask_introns and mask_utr_introns

2020-03-18  twu

    * gmap.c, output.c, output.h, pair.c, pair.h: Added option --mask-introns to
      GMAP

2020-03-13  twu

    * iit_get.c: Ignoring linefeed characters in handling coordinates

    * stage3hr.c, stage3hrdef.h, substring.c, substring.h: Computing
      querystart_chrbound and queryend_chrbound, and disregarding in comparing
      against max_mismatches

    * index.html: Added statement about nosimd versions being restored

2020-03-12  twu

    * archive.html, index.html: Updated for latest version

    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst,
      index.html, src, Makefile.gsnaptoo.am, atoi.c, atoi.h, atoiindex.c,
      cmet.c, cmet.h, cmetindex.c, dynprog_end.c, gmap.c, gmapindex.c, gsnap.c,
      iit-read.c, iit-read.h, indexdb-write.c, indexdb.c, indexdb.h,
      intersect.c, kmer-search.c, kmer-search.h, localdb.c,
      merge-diagonals-simd-uint8.c, path-solve.c, path-solve.h,
      regiondb-write.c, regiondb-write.h, regiondb.c, regiondb.h, regiondbdef.h,
      stage1hr.c, stage1hr.h, transcriptome.c, trindex.c, uint8list.c,
      uint8list.h, util, fa_coords.pl.in, gmap_build.pl.in, gmap_cat.pl.in:
      Merged revisions 221585 through 222138 from
      branches/2020-02-01-local-fixed-size to implement a regiondb hash to
      replace the localdb hash

    * MAINTAINER: Revised instructions for rosalind

    * bootstrap.gsnaptoo: Using automake and autoreconf in path

    * index.html: Added comment about default behavior for localdb usage

2020-03-06  twu

    * gsnap.c: Setting default behavior for localdb usage to be true for RNA-seq
      and false for DNA-seq

    * gmap.c: Setting npaths_primary and npaths_altloc when stage3list is NULL

    * atoiindex.c, cmetindex.c, genome.c, indexdb-write.c, indexdb.c, localdb.c,
      snpindex.c: Using new interface to Access_mmap

    * access.c, access.h: Access_mmap now returns seconds

2020-02-20  twu

    * trunk, index.html, src, iit_store.c, indexdb-cat.c, indexdb.c, util,
      gmap_cat.pl.in: Merged revisions 221935 through 221944 from
      branches/2020-02-01-local-fixed-size to handle circular chromosomes in
      gmap_cat

    * gmap_cat.pl.in, indexdb-cat.c: Allowing -F flag to handle multiple source
      directories

    * gmapindex.c: Improved user error message

    * gmap_build.pl.in: Made none the default value for sorting.  Removed usage
      of sourcedir and -F in calls to gmapindex.

    * gmapindex.c: Removed usage of -F except for concatenating genomes.
      Allowing -F to handle multiple source directories

2020-02-19  twu

    * README: Now references LICENSE file

    * Makefile.am: Added LICENSE file for distribution

    * LICENSE: Initial import

    * COPYING: Removed old license.
      Now refers to the LICENSE file

    * NOTICE: Removed references to suffix array code.  Added journal reference
      for Lemire and Boytsov

    * compress-write.c: Implemented a simpler and more general algorithm for
      Compress_cat

2020-02-17  twu

    * gmapindex.c: Fixed bug when -F and -D values are different when
      concatenating genomes

2020-02-13  twu

    * indexdb.c: Fixed lengths for kmer size and sampling and error message to
      user

    * gmap_build.pl.in: Removed unused variable

    * gmap_cat.pl.in: Fixed Id comment line

    * gmap_cat.pl.in: Added Id comment line

    * gmap_cat.pl.in: Removed references to sleep.  Added better final message

    * gmap_build.pl.in, gmap_cat.pl.in: Adding package version numbers to
      version file

    * Makefile.am: Removed obsolete scripts

    * indexdb-cat.c: Updating required_index1part and required_index1interval so
      they all match

    * indexdb.c: Added user statement about required kmer and interval

    * gmap_cat.pl.in: Checking compiler assumptions and creating version file

    * compress-write.c: Made fixes to computation of flags

    * trunk, index.html: Updated version

    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
      number

    * gmap_cat.pl.in: Implemented --names option

    * gmap_build.pl.in: Changed documentation

    * gmap_cat.pl.in: Initial import

    * index.html: Updated for latest version

    * access.h: Added NOT_USED as an access type

    * access.c: Adding a check for a null filename

    * gmap_build.pl.in: Removed comment about gmap_setup

    * indexdb-cat.c: Using destdir, which should be set correctly by gmap_cat

    * iit_store.c: For universal IIT output, adding a circular type

    * gmapindex.c: Added code for concatenating genomecomp

    * gmap.c: Turning off contig output

    * compress-write.c: Put debugging messages inside macro

    * bitpack64-write.c, bitpack64-write.h: Using
      Bitpack64_compute_q4_diffs_bidir_huge.  Removed duplicate code

    * README: Removed references to suffix array

    * configure.ac: Added gmap_cat.  Removed obsolete scripts

    * Makefile.gsnaptoo.am: Added instructions for indexdb_cat

    * indexdbdef.h: Added a field hugep

    * indexdb.c: Modified user messages

    * indexdb-cat.c: Implemented procedures for huge genomes and 8-byte positions

    * indexdb-cat.c: Implemented uint8 procedure, but has errors

    * indexdb-cat.c: Implemented merging of positions

    * indexdb.c: Cleaned up Indexdb_new_genome procedure

2020-02-12  twu

    * indexdb-cat.c: Initial import

    * compress-write.c: Enclosed debugging statements into macros

    * compress-write.c: Fixed bugs in Compress_cat

    * compress-write.c: Simplified code for Compress_cat

2020-02-11  twu

    * compress-write.c: Made fixes for shifts < 16

    * compress-write.c, compress-write.h: Initial implementation of Compress_cat
      for concatenating genomes

    * Makefile.gsnaptoo.am: Removed trindex and sam_sort

    * gmap.c, pair.c, sequence.c, sequence.h, stage3.c, stage3.h: Added option
      --gff3-fasta-annotation to GMAP

2020-02-04  twu

    * iit-read.c: Fixed IIT_dump for intervals with start and end both being 0

    * iit_store.c: Handling the case where annotation has zero length

    * gmap_build.pl.in: Returning pipe variable

    * stage3.c: Using new interface to Pair_print_gff3

    * sequence.c, sequence.h: Implemented Sequence_restofheader

    * path-solve.c: No longer using localdb for large genomes

    * pair.c, pair.h: For GFF3 output, printing FASTA headers as annotation

    * localdb.c: Created a separate debugging category

2020-01-30  twu

    * Makefile.gsnaptoo.am: Removed ushortlist.c

    * dynprog.h, dynprog_cdna.h, dynprog_end.h, dynprog_genome.h,
      dynprog_single.h, extension-search.h, genome-write.h, genome128-write.h,
      iit-read-univ.h, ladder.h, record.h, stage2.h, stage3hrdef.h: Added
      include for genomicpos.h

    * chrnum.h, genome.h, genomicpos.h, merge-records-heap.h,
      merge-records-simd.h, path-solve.h: Added include for univcoord.h

    * snpindex.c: Making genomeblocks for snpindex the same length as for the
      reference

    * parserange.c: Initializing coordstart and coordend

    * localdbdef.h, localdb.c, localdb.h: Using a single loctable

    * localdb-write.c, localdb-write.h: Using Uintlist_T instead of Ushortlist_T

    * iit_get.c: If coords are given, printing the corresponding substring of
      the annotation

    * gsnap.c, gmap.c: Allowing pipe signals

    * gmapindex.c: Printing user messages about compression only when offsets
      are compressed

    * epu16-bitpack64-readtwo.c: Declaring a variable

2020-01-23  twu

    * iit_store.c: Allowing input FASTA to have labels without intervals

2019-12-15  twu

    * fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Handling 1-column
      names.txt file

    * stage3.c: Changed some variables from int to Chrpos_T

    * merge-diagonals-simd-uint8.c: Using correct printf format for uint8

    * localdb.c, localdb.h, localdbdef.h, localdb-write.c, localdb-write.h:
      Handling large genomes

    * indexdb-write.c, indexdb-write.h: Changed procedure names to make them
      clearer

    * gmapindex.c: Using new interfaces to indexdb and localdb write procedures

    * epu16-bitpack64-write.h: Added comment

    * Makefile.gsnaptoo.am, ushortlist.c, ushortlist.h: Added Ushortlist_T
      object for localdb

    * pair.c: Added cDNA direction to GFF3 output

2019-12-12  twu

    * path-solve.c: Changed type for deletionpos to be Univcoord_T

2019-09-12  twu

    * src, concordance.c, concordance.h, distant-dna.c, distant-dna.h,
      extension-search.c,
      extension-search.h, gsnap.c, junction.c, junction.h,
      kmer-search.c, kmer-search.h, list.c, list.h, stage1hr.c, stage1hr.h,
      stage3hr.c, stage3hr.h, stage3hrdef.h, substring.c, substring.h: Merged
      revisions 220290 through 220325 from
      branches/2019-09-11-zero-length-introns to handle cases where ambiguous
      ends are resolved and where distant DNA alignments yield a zero-length
      intron

    * index.html: Updated for latest version

2019-08-09  twu

    * inbuffer.c, inbuffer.h: Removed references to interleavedp from GMAP

    * gsnap.c: Added --interleaved feature

    * bzip2.c: Saving a file handle and closing it

    * Makefile.gsnaptoo.am: Including bzip2.c and bzip2.h to relevant programs

    * atoiindex.c, cmetindex.c: Initializing filenames to be NULL

2019-07-15  twu

    * inbuffer.c, inbuffer.h, shortread.h: Added interleavedp parameter

    * shortread.c: Implemented interleaved format for gzip- and bzip2-compressed
      files.

    * getline.c, getline.h: Implemented Getline_gzip and Getline_bzip2

    * shortread.c: Implemented Shortread_read_interleaved_text

2019-06-11  twu

    * stage1hr.c: Added debugging statements

    * concordance.c: Limiting number of overlaps to avoid combinatorial
      explosion in some cases

2019-05-20  twu

    * index.html: Updated for latest version

    * ax_ext.m4: Improved structure for AVX2 and AVX512

2019-05-12  twu

    * gmap.c, gsnap_select.c, gsnapl_select.c, cpuid.c, cpuid.h, gmap_select.c,
      gmapl_select.c: Adding support for avx512bw

    * gsnap.c: Changed default parameter for --max-mismatches for DNA-seq

    * Makefile.gsnaptoo.am: Adding programs for avx512bw

    * configure.ac: Adding option for AVX512BW SIMD

    * ax_cpuid_intel.m4, ax_cpuid_non_intel.m4: Adding test for AVX512BW support

    * ax_ext.m4: Adding commands for AVX512BW

    * univdiagpool.c: Adding assertions

    * substring.c: Checking against substrings on the wrong chromosome

    * stage1hr.c: Commenting out extended algorithm, which can cause problems
      with repetitive reads

    * output.c: Consider excessive output to be a fail for the purpose of the
      --nofails flag.

    * merge-uint8.c: Fixed SIMD command for AVX512 machines

2019-03-25  twu

    * extension-search.c, extension-search.h: Implemented extension of elt sets
      in the opposite direction

2019-03-19  twu

    * segment-search.c: For alignments straddling a chromosome, recomputing
      querypos and queryend to cover the new chromosome

    * gsnap.c: Using new interface for Path_solve_setup

    * path-solve.h: New interface for Path_solve_setup

    * path-solve.c: Not allowing splices on circular chromosomes

    * concordance.c: Using field sensedir_for_concordance

    * stage3hr.c, stage3hrdef.h: Now using fields sensedir_for_concordance and
      sensedir

    * samprint.c: Removed references to Stage3end_sensedir_distant_guess

2019-03-15  twu

    * trunk, VERSION, config.site.rescomp.tst, index.html, src,
      Makefile.gsnaptoo.am, distant-dna.c, distant-dna.h, distant-rna.c,
      distant-rna.h, gsnap.c, method.c, method.h, output.c, path-solve.c,
      samprint.c, samprint.h, splice.c, splice.h, stage1hr.c, stage3hr.c,
      stage3hr.h, stage3hrdef.h, substring.c, substring.h, terminal.c: Merged
      revisions 218560 through 218674 from branches/2019-03-07-distant-dna to
      implement distant splicing and to fix some bugs in spliced alignments

    * genome128_hr.c: Added comment

2019-03-06  twu

    * distant-dna.h, distant-dna.c: Initial import

2019-03-05  twu

    * gsnap.c: Added option --use-local-hash

    * trunk, VERSION, config.site.rescomp.tst, src: Updated for latest version

2019-03-04  twu

    * localdb.c, localdb.h, path-solve.c: Merged revisions 218528 and 218529
      from branches/2019-03-04-fix-repetitive to limit recursive procedures

    * stage3hr.c: Fixed debugging statements

    * stage1hr.c, kmer-search.c: Added debugging statements

    * segment-search.c: Fixed debugging statement

    * gsnap.c: Removed variables relating to stage2 suboptimal
	  alignments

2019-03-02  twu

	* path-solve.c: For compute_qstart_paths and compute_qend_paths, added a
	  max_depth criterion, and checking for repetitive positions

2019-03-01  twu

	* stage3hr.c: Checking nmismatches when we are checking all assertions

	* path-solve.c: Revising qstart of middle segment if an insertion is
	  present. Revised code for computing ninserts

	* path-solve.c: Fixed addition of alts substring to best path from
	  all_child_paths, rather than to path

2019-02-26  twu

	* VERSION, config.site.rescomp.prd, index.html: Updated for latest version

	* substring.c: In trimming at ends without splicing, extending to the end if
	  nmismatches is 0

	* stage3hr.c: For score_within_trims, adding a penalty for long ambiguous
	  ends

	* substring.c: Restored previous algorithm for computing trim at ends with
	  no splice. For alts with good splice probability, counting substring as
	  nmatches rather than amb. For alts with poor splice probability, counting
	  substring as amb.

2019-02-25  twu

	* path-solve.c: Fixed cases in compute_qstart_paths and compute_qend_paths
	  where terminalp was not being set

2019-02-22  twu

	* trunk, config.site.rescomp.prd, config.site.rescomp.tst, src,
	  concordance.c, concordance.h, distant-rna.c, distant-rna.h,
	  extension-search.c, extension-search.h, genome128_hr.c, gsnap.c,
	  kmer-search.c, kmer-search.h, ladder.c, path-solve.c, path-solve.h,
	  resulthr.c, segment-search.c, segment-search.h, stage1hr.c, stage3hr.c,
	  stage3hr.h, stage3hrdef.h, substring.c, substring.h, terminal.c,
	  terminal.h: Merged revisions 218419 through 218472 from
	  branches/2019-02-19-restore-fusions

	* index.html: Updated for latest version

2019-02-19  twu

	* stage1hr.c: Using user_maxlevel_float for final filtering, and not for
	  searching

	* concordance.c: Using score_ignore_trim instead of score_posttrim

	* gsnap.c: Added option for --ignore-trim-in-filtering

	* stage3hrdef.h: Changed field score_posttrim to score_ignore_trim

	* stage3hr.c, stage3hr.h: Changed Stage3hr_filter_coverage to
	  Stage3hr_filter, which accounts for number of mismatches. Added parameter
	  ignore_trim_p. Changed score_posttrim to score_ignore_trim, and computing
	  this to be lower than score

2019-02-18  twu

	* terminal.c: Using Univ_IIT_get_chrnum and new interface to Substring_new

	* substring.c, substring.h: Substring_new now assumes that chrnum was set
	  correctly

	* stage3hr.c: Using new interface to Substring_new

	* stage1hr.c: Using new interface to Segment_identify procedures

	* segment-search.c, segment-search.h: Handling alignments straddling more
	  than two chromosomes. Removed plusp as a parameter

	* kmer-search.c: Using Univ_IIT_get_chrnum.
	  Using new interface to Substring_new

	* iit-read-univ.c, iit-read-univ.h: Implemented Univ_IIT_update_chrnum and
	  Univ_IIT_get_chrnum

	* extension-search.c: Calling Univ_IIT_update_chrnum to set chrnum

	* stage3hr.c: If qend <= qstart, do not calculate number of nmismatches,
	  which is not defined. For nmatches, do not penalize for indels

	* substring.c: If trimming changes querystart or queryend, recalculate
	  nmismatches

2019-02-15  twu

	* stage1hr.c: Skipping alignment when querylength is less than index1part +
	  index1interval - 1

	* segment-search.c: When advancing chrnum for straddled alignments, checking
	  that we do not go past the last chromosome

	* VERSION: Updated version number

	* indexdb-write.c: Added code for comparing counts with compression and
	  counts without compression

	* bitpack64-write.c, bitpack64-write.h: For huge genomes, using an array of
	  UINT8 for calculations of genome position. Printing strerror for
	  file-related errors.

	* substring.c: For circular chromosomes, checking if the entire substring
	  resides in the next chromosome and returning the circularpos at that query
	  position

	* segment-search.c: When a straddle calls for advancing to a later
	  chromosome, using local data structures instead of mixing them with a call
	  to Univ_IIT_get_one

	* path-solve.c: Using subtract_bounded to subtract the local1part amount

	* localdb.c: In Localdb_get_diagonals, checking for low < high, since low ==
	  high can occur when a substring is at the beginning or end of a chromosome

	* cigar.c: Checking for an initial M to be printed before printing any indel
	  or splice

2019-02-13  twu

	* intersect-large.c, intersect.c: No longer initializing last_diagonal to be
	  0, and comparing against it, which fails if a diagonal is 0.
	  Instead, checking explicitly for the first case

2019-01-31  twu

	* gsnap.c, path-solve.c, path-solve.h, segment-search.c, segment-search.h,
	  stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Removed GMAP parameters
	  from GSNAP code

	* VERSION, config.site.rescomp.prd, archive.html, index.html: Updated for
	  latest version

	* intersect-large.c, intersect.c: In Intersect_approx_lower and
	  Intersect_approx_higher, ignoring duplicates of diagonals0, in order to
	  ensure that the result is in ascending order

	* substring.c: Using plusp in interpreting mandatory_trim_querystart and
	  mandatory_trim_queryend

	* path-solve.c: When a mismatch extends a diagonal, computing the number of
	  mismatches at that time

	* merge-diagonals-simd-uint4.c: Added code for checking that inputs are in
	  ascending order

	* stage3hr.c: Improved tradeoffs between nmatches, nmatches_posttrim, splice
	  score, nsegments, insertlength, and outerlength in Stage3end_optimal_score
	  and Stage3pair_optimal_score_final

	* extension-search.c, segment-search.c: Handling the case where the middle
	  or anchor diagonal straddles two chromosomes

2019-01-23  twu

	* trunk, VERSION, src, Makefile.gsnaptoo.am, changepoint.h, compress.c,
	  concordance.h, diag.c, extension-search.h, gbuffer.c, gbuffer.h, genome.c,
	  genome.h, genome128_consec.c, genome128_hr.c, genome_hr.c, genome_sites.c,
	  gmap.c, gsnap.c, kmer-search.c, kmer-search.h, knownsplicing.c,
	  knownsplicing.h, oligoindex_localdb.h, outbuffer.h, output.c, pair.c,
	  pair.h, path-solve.c, path-solve.h, samprint.c, segment-search.c,
	  segment-search.h, simplepair.c, simplepair.h, splice.c, splice.h,
	  splicetrie.c, splicetrie.h, splicetrie_build.h, stage1hr.c, stage1hr.h,
	  stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h, types.h,
	  univcoord.h, univdiagpool.c: Merged revisions
	  218195 through 218285 from
	  branches/2019-01-17-split-gmap-gsnap to separate GMAP and GSNAP code

2019-01-22  twu

	* stage1hr.c, cigar.c, path-solve.c: Added debugging statements

	* stage3hr.c: At a given locus, checking for nmatches_posttrim before
	  checking for splice score

	* splice.c, splice.h, substring.c: Added limit on consecutive matches in
	  scanning for spliceends

	* path-solve.c: Restored previous code that did not look at extendedp

	* splice.c: Added requirement for MIN_EXON_LENGTH in trimming ends

	* gsnap.c: Added --sam_sparse_secondaries to omit SEQ and QUAL flags in
	  secondary alignments

	* samprint.c, samprint.h: Using mate_plusp computed in Cigar_compute_main in
	  compute_flag, to give correct results for translocations. Added
	  sam_sparse_secondaries_p to omit SEQ and QUAL flags in secondary alignments

	* path-solve.c: Restored usage of the sense condition to handle the
	  non-splicing condition

	* filestring.c: Handling the case in Filestring_merge where source->string
	  is NULL

	* path-solve.c: Checking if qstart or qend is extended from spliced
	  endpoints of middle diagonal, and if not extended, using the original
	  endpoints

	* junction.c: Changed name of macro for debugging

	* samprint.c, cigar.c: Handling the case where hard clipping removes all
	  substrings

	* extension-search.c: Added debugging statement

2019-01-18  twu

	* extension-search.c: Fixed bugs in process_seed for processing the
	  remainder of the queryfwd set or the queryrev set

	* extension-search.c: Rewrote algorithm extensively to combine seeds and
	  sets from queryfwd and queryrev passes

	* segment-search.c: Using new interface to Path_solve_from_diagonals

	* path-solve.c, path-solve.h: Changed Path_solve_from_diagonals to take a
	  univdiagonal, qstart, and qend,
	  instead of a Univdiag_T object

	* kmer-search.c: Removed unused include file

2019-01-17  twu

	* setup.genomecomp.ok: Revised gold standard for extra bytes at end

	* segment-search.c: Fixed problem with allocation when total_npositions is
	  zero in Segment_identify_lower and Segment_identify_higher

	* hitlistpool.c: Initial import

	* gsnap.c: Removed oligoindices_major, oligoindices_minor, pairpool,
	  diagpool, cellpool, and Dynprog_T objects as variables

	* gmap.c: Using new interfaces to stage1, stage2, and stage3 procedures

	* stage3.c, stage3.h: Removed unused parameters

	* translation.c: Removed npairs as a parameter for backward procedures

	* terminal.c, terminal.h: Removed mismatch_positions_alloc as a parameter

	* stage2.c: Removed code based on anchoredp, anchor_querypos, and
	  anchor_position, which are now always false and 0

	* stage1hr.c, stage1hr.h: Using new interfaces to kmer-search,
	  extension-search, terminal, and concordance procedures. Removed
	  oligoindices_minor, diagpool, and cellpool parameters to single_read and
	  paired_read procedures

	* stage1.c, stage1.h: Using new interfaces to Block_process_oligo_5 and
	  Block_process_oligo_3.
	  Removed sizelimit parameters to Stage1_compute
	  procedures

	* splice.c, splice.h: Removed unused parameters to Splice_setup

	* smooth.c: Removed exon_denominator as a parameter to
	  find_internal_bads_by_prob

	* segment-search.c: Removed unused variable

	* samprint.h: Removed preprocessor macros for GSNAP

	* samprint.c: Using new interfaces to Substring_compute_chrpos and
	  Pair_print_sam

	* path-solve.h: Removed interface to Path_solve_via_gmap

	* path-solve.c: Using new interfaces to substring procedures

	* pair.c, pair.h: Removed unused parameters for Pair_print_sam

	* output.c: Using new interfaces to stage3 print procedures

	* iit-write-univ.c: Removed omegas as a parameter to node_select

	* iit-read.c, iit-read.h: Commented out obsolete procedure

2019-01-16  twu

	* genome128_consec.c: Added macros around some procedures. Removed unused
	  procedures

	* genome128_hr.c: Added macros around some procedures

	* intersect.c, indexdb.c: Added LARGE_GENOMES macro to a procedure

	* oligoindex_hr.c, oligoindex_hr.h: Removed unused parameters from
	  Oligoindex_untally

	* kmer-search.c, kmer-search.h, extension-search.c, extension-search.h:
	  Removed unused parameters

	* epu16-bitpack64-read.c, epu16-bitpack64-readtwo.c: Commented out print
	  procedures for debugging

	* epu16-bitpack64-incr.c: Turned off CHECK macro

	* dynprog_single.c, dynprog_single.h: Removed glengthL and glengthR as
	  parameters to Dynprog_microexon_int

	* dynprog_genome.c, dynprog_genome.h: Removed unused parameters, including
	  calculation of canonical_reward

	* distant-rna.c, distant-rna.h: Removed user_maxlevel as a parameter

	* datadir.c: Removed unused variables

	* concordance.c, concordance.h: Using new interface to Stage3pair_new.
	  Removed unused parameters

	* stage3hr.c, stage3hr.h: Removed oligoindices_minor, diagpool, and cellpool
	  as parameters, used previously for resolving insides

	* compress-write.c: Using different format statements for Univcoord_T
	  variables

	* cigar.c: Removed trimlength as a parameter to length_cigar_M. Using new
	  interface to Substring_compute_chrpos

	* substring.c, substring.h: Removed plusp as a parameter to
	  embellish_genomic and hardclip_high as a parameter to
	  Substring_compute_chrpos

	* reader.c, reader.h: Commented out unused procedures

	* block.c, block.h: Removed indexdb_sizelimit as a parameter to
	  Block_process_oligo_5 and Block_process_oligo_3

	* atoiindex.c, cmetindex.c, snpindex.c: Using new interfaces to
	  Indexdb_bitpack_counter and Localdb_new_genome

	* localdb.c, localdb.h: Removed expand_offsets_p as a parameter to
	  Localdb_new_genome

	* indexdb-write.c, indexdb-write.h: Removed offsetsstrm and offsetspages as
	  parameters to Indexdb_bitpack_counter and Indexdb_bitpack_counter_huge

	* trunk, VERSION, src, Makefile.gsnaptoo.am, access.c, atoiindex.c,
	  boyer-moore.h, cellpool.c, chrom.h, cigar.c, cmetindex.c,
	  compress-write.h, concordance.c, concordance.h, diag.h, diagpool.c,
	  distant-rna.c, distant-rna.h, extension-search.c, extension-search.h,
	  filestring.c, genome128_hr.c, genome_sites.h, genomicpos.c, gmap.c,
	  gmapindex.c, gregion.c, gsnap.c, hitlistpool.h, indel.c, indexdb.c,
	  indexdb.h, intersect-large.h, intersect.c, intersect.h, intlist.c,
	  intlist.h, intlistdef.h, intlistpool.c, intlistpool.h, junction.c,
	  junction.h, kmer-search.c, kmer-search.h, ladder.c, ladder.h, list.h,
	  listdef.h, listpool.c, listpool.h, localdb.c, localdb.h, localdbdef.h,
	  matchpool.c, maxent_hr.h, mem.h, merge-diagonals-heap.h,
	  merge-diagonals-simd-uint4.c, merge-diagonals-simd-uint4.h,
	  merge-diagonals-simd-uint8.h, merge-uint4.c, method.c, oligoindex_hr.c,
	  oligoindex_hr.h, outbuffer.c, outbuffer.h, output.c, output.h, pair.c,
	  pair.h, pairpool.c, path-solve.c, path-solve.h, result.h, resulthr.h,
	  samprint.c, samprint.h, segment-search.c, segment-search.h, shortread.c,
	  shortread.h, splice.c, splice.h, splicestringpool.c, splicetrie.c,
	  splicetrie_build.h, stage1hr.c, stage1hr.h, stage3.c, stage3.h,
	  stage3hr.c, stage3hr.h, stage3hrdef.h, substring.c, substring.h,
	  terminal.c, terminal.h, types.h, uint8listpool.c, uint8listpool.h,
	  uint8table_rh.c, uint8table_rh.h, uintlist.c, uintlist.h, uintlistpool.c,
	  uintlistpool.h, uinttable.c, uinttable_rh.c, uinttable_rh.h, uniqscan.c,
	  univcoord.h, univdiag.c, univdiag.h, univdiagdef.h, univdiagpool.c,
	  univdiagpool.h, univinterval.h: Merged revisions 216893 to 218146 from
	  branches/2018-10-08-path-solve to improve path-solve procedure

2018-10-18  twu

	* genome-write.c: Adding 2 words to the end of genomecomp, needed for
	  accessing nextlow (ptr+4) in the fwd_partial and rev_partial procedures in
	  oligoindex_hr.c

2018-10-10  twu

	* genome.c, bitpack64-serial-write.c, bitpack64-write.c, genome-write.c,
	  genome128.c, gmapindex.c, indexdb-write.c, indexdb.c, indexdb_hr.c,
	  kmer-search.c, outbuffer.c, pair.c, parserange.c, stage3.c, stage3hr.c,
	  substring.c: Replaced occurrences of 1U with 1

	* sedgesort.c: Fixed bug in Sedgesort_uint8 where we assigned -1U instead of
	  (UINT8) -1 as the sentinel value

2018-10-09  twu

	* path-solve.c: Removing endpoints from the left and right to see if a
	  continuing alignment works

	* trunk, src, Makefile.gsnaptoo.am, concordance.c, concordance.h,
	  distant-rna.c, distant-rna.h, extension-search.c, extension-search.h,
	  gsnap.c, intlistpool.c, junction.c, junction.h, kmer-search.c,
	  kmer-search.h, list.h, listpool.c,
	  listpool.h, pair.c, pair.h,
	  path-solve.c, path-solve.h, segment-search.c, segment-search.h,
	  stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, terminal.c, terminal.h:
	  Merged revision 216940 from branches/2018-10-10-reduce-list-push to add
	  Listpool_T object for lists of substrings and junctions

	* trunk, VERSION, src, Makefile.gsnaptoo.am, intersect-large.c,
	  intersect-large.h, intersect.c, intersect.h, path-solve.c, record.h,
	  segment-search.c, segment-search.h, stage1hr.c, stage1hr.h: Merged
	  revisions 216922 through 216936 from branches/2018-10-09-merge-records to
	  replace Merge_records procedures in segment search with Merge_diagonals

	* localdb.c: Allocating extra space for array, needed for Sedgesort

	* concordance.c: Removed unused code for filtering paired-end hits

	* concordance.c: Restored computation of abort_pairing_p

2018-10-08  twu

	* trunk, VERSION, src, Makefile.gsnaptoo.am, distant-rna.c, distant-rna.h,
	  extension-search.c, filter-diagonals.c, filter-diagonals.h, gsnap.c,
	  kmer-search.c, kmer-search.h, localdb.c, localdb.h,
	  merge-diagonals-heap.c, merge-diagonals-heap.h,
	  merge-diagonals-simd-uint4.c, merge-diagonals-simd-uint4.h,
	  merge-diagonals-simd-uint8.c, merge-diagonals-simd-uint8.h,
	  merge-records-heap.c, merge-records-heap.h, merge-records-simd.c,
	  merge-records-simd.h, merge-uint4.c, path-solve.c, path-solve.h,
	  segment-search.c, segment-search.h, splice.c, splice.h, stage1hr.c,
	  stage1hr.h, stage3hr.c, stage3hr.h, substring.c, substring.h, terminal.c,
	  terminal.h: Merged revisions 216889 to 216917 from
	  branches/2018-10-07-filter-diagonals to introduce a filtering step before
	  segment search, and to pre-allocate memory for Merge_records,
	  Merge_diagonals, Splice_resolve, and Substring_new procedures

	* src, kmer-search.c: Fixed double-assignment of variable

	* method.c, method.h,
	  segment-search.c, segment-search.h, stage1hr.c:
	  Distinguishing between segment search for single-end reads and segment
	  search for anchored paired-end reads

2018-10-07  twu

	* trunk, VERSION, config.site.rescomp.prd, src, Makefile.gsnaptoo.am,
	  concordance.c, concordance.h, extension-search.c, extension-search.h,
	  gsnap.c, intersect-large.c, intersect-large.h, intersect.c, intersect.h,
	  intlistpool.c, intlistpool.h, intpool.c, intpool.h, kmer-search.c,
	  kmer-search.h, ladder.c, ladder.h, localdb.c, mem.h,
	  merge-diagonals-heap.c, merge-diagonals-heap.h,
	  merge-diagonals-simd-uint4.c, merge-diagonals-simd-uint4.h,
	  merge-diagonals-simd-uint8.c, merge-diagonals-simd-uint8.h,
	  merge-heap-diagonals.c, merge-heap-diagonals.h, merge-heap-records.c,
	  merge-heap-records.h, merge-records-heap.c, merge-records-heap.h,
	  merge-records-simd.c, merge-records-simd.h, merge-simd-diagonals.c,
	  merge-simd-diagonals.h, merge-simd-records.c, merge-simd-records.h,
	  merge-uint4.h, merge-uint8.c, merge-uint8.h, method.c, method.h,
	  path-solve.c, path-solve.h, sedgesort.c, sedgesort.h, segment-search.c,
	  segment-search.h, stage1hr.c, stage1hr.h, stage3hr.c: Merged revisions
	  216741 to 216887 from branches/2018-10-01-gsnapl-speed to increase speed
	  of GSNAP and GSNAPL, especially for paired-end reads

	* index.html: Updated for current version

2018-10-03  twu

	* concordance.c, concordance.h, stage1hr.c: Using new interfaces to
	  Stage3pair_new and Concordance_pair_up procedures

	* stage3hr.c, stage3hr.h: No longer filtering substrings based on
	  endtrim_allowed on one side.
	  Performing resolve_insides at end of
	  Stage3pair_new

	* segment-search.c: Fixed debugging statements

	* path-solve.c: No longer calling a check creation of a substring of the
	  middle diagonal

	* kmer-search.c: Hiding a debugging procedure

	* acinclude.m4, simd-intrinsics.m4, configure.ac, genome128_consec.c,
	  genome128_hr.c: Added compiler checks for the SIMD intrinsics
	  _mm_extract_epi64 and _mm_popcnt_u64, and using them

	* extension-search.c: Removed unused debugging procedure

	* epu16-bitpack64-write.c: Modified comments

	* bigendian.c, bigendian.h: Implemented FWRITE_USHORT and FWRITE_USHORTS for
	  bigendian machines

2018-10-02  twu

	* substring.c: Fixed uninitialized fields querystart_pretrim and
	  queryend_pretrim in Substring_T object

2018-07-05  twu

	* trunk, VERSION, config.site.rescomp.tst, index.html, src, path-solve.c:
	  Changed check on sense_endpoints to antisense_endpoints before trying to
	  remove an end segment

2018-06-29  twu

	* types.h: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to define Univcoordlist_pop

	* stage3hrdef.h: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to add nmatches_amb as a field

	* stage3hr.c, stage3hr.h: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to consider trim amount in the
	  final comparison among alignments

	* stage3.h: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to take queryseq as an argument
	  in merge procedures

	* stage3.c: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to copy all pairs when performing
	  a merge, to peelback to indels on the medial side when taking the
	  continuous solution, and to remove indels in
	  insert_gapholders

	* pair.c, pair.h: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to implement Pair_split_circular

	* junction.c, junction.h: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to implement Junction_new_generic

	* gmap.c: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to split circular alignments to
	  cross the origin

	* chimera.c, chimera.h: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to disallow chimeras to circular
	  chromosomes

	* path-solve.c: Merged revisions 215752 to 215897 from
	  branches/2018-06-15-path-solve-junctions to create a new junction at the
	  end, rather than use a precomputed junction

	* path-solve.c: At ends, pushing NULL instead of the previously computed
	  junction

	* pair.c, pair.h: Implemented checking procedures

2018-06-15  twu

	* stage3hr.c: Removed extraneous characters in debugging statement

2018-06-14  twu

	* stage3.c: Fixed handling of end exons in end trimming procedures.
	  Using new interfaces to Pair_clip_bounded_list_5 and
	  Pair_clip_bounded_list_3

	* pair.c, pair.h: Implemented separate 5' and 3' versions of
	  Pair_clip_bounded_list

	* chimera.c: Fixed debugging statements to use new interface to
	  Sequence_stdout

2018-05-30  twu

	* pair.c: In converting pairarray to substrings, now resetting exon
	  variables after an insertion or deletion

	* iit-read.c: Added a warning in using IIT_read for a version 1 IIT

	* iit_get.c, iit_store.c: Fixed some memory leaks

	* getline.c, getline.h: Returning string_length with Getline_wlinefeed

	* iit_store.c: Using Getline_wlinefeed instead of Getline_wlength

	* stage1hr.c: Fixed uninitialized variable

	* substring.c: Fixed assertion

	* stage3hr.c: Checking if there is enough space at ends of the chromosome
	  before resolving inner exons

	* gmapindex.c: Assigning variable and not pointer when clearing empty space
	  at end of line

	* atoiindex.c, cmetindex.c, indexdb-write.c, snpindex.c: Calling
	  Access_allocate_private properly for machines where mmap is not available
	  or disabled

	* gregion.c: Improved procedure for finding unique gregions by sorting by
	  support instead of weight, by checking if query coordinates are
	  consistent, and handling cases where endpoints are equal

	* Makefile.gsnaptoo.am: Not making iit_pileup

2018-05-25  twu

	* Makefile.gsnaptoo.am: Added getline.c and getline.h to library

	* get-genome.c: Using variable line instead of Buffer

	* trunk, config.site.rescomp.prd, config.site.rescomp.tst, index.html, src,
	  gmap.c, oligoindex_hr.c, stage2.c, stage3.c: Merged revisions 215481
	  through 215483 from branches/2018-05-25-restore-compute-ends to restore
	  GMAP behavior from 2018-03-20 and use Stage3middle_T,
	  Stage3_compute_middle, and Stage3_compute_ends for
	  better alignment at ends

	* datadir.c: Fixed extraneous parenthesis

	* datadir.c, Makefile.gsnaptoo.am, iit_get.c, iit_store.c: Using Getline

	* gmap.c, chrsubset.c, iit-read-univ.c, iit-read.c, iit_store.c, samread.c,
	  sequence.c, shortread.c, splicing-scan.c, stage2.c, stage3hr.c: Ensuring
	  that calls to strncpy are followed by setting the end to be '\0'. Using
	  malloc instead of calloc in these situations

	* Makefile.gsnaptoo.am, datadir.c, get-genome.c, getline.c, getline.h,
	  gmapindex.c: Using calls to Getline to prevent problems with buffer
	  overflow

	* substring.c: Fixed calculation of mandatory_trim_left and
	  mandatory_trim_right for alias to be positive, rather than negative

2018-05-11  twu

	* VERSION: Updated version number

	* src: Made changes

	* fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Added options to
	  handle fastq files and to reverse complement all sequences

	* stage1hr.c: No longer iterating through Segment_search

	* stage1hr.c: Not calling Stage1_init or other Stage1 procedures when
	  querylength < index1part

	* substring.c: Not doing aliasing and unaliasing on alt substrings.
	  Computing mandatory_trim_querystart and mandatory_trim_queryend instead of
	  mandatory_trim_left and mandatory_trim_right

	* trindex.c: Revised instructions

	* iit-read-univ.c, iit_get.c, iit_pileup.c, iit_tally.c: Freeing div name
	  allocated by new versions of Parserange routines

	* genome128_consec.c, genome128_hr.c: Added header files needed for SSSE3
	  computers

2018-04-30  twu

	* shortread.c: Formatting change

	* parserange.c: Allow colons in accession names

	* gmapindex.c: Allow colons in accession.
	  Implementing revcomp by using '-'
	  sign for contig length

	* get-genome.c: Freeing chromosome string after call to Parserange

2018-04-21  twu

	* stage1hr.c: Merged revision 214805 from branches/2018-04-21-fix-anchors to
	  use Solve_segment_all instead of Concordance_filter_records for paired-end
	  reads

	* stage3hr.c, stage1hr.c: Added variable remap_transcriptome_p

	* stage3.c: Checking for a NULL stage3 in Stage3_split

	* intersect.c: Fixed comments

	* stage1.c: Using Oligospace_T to cast 0

	* kmer-search.c: Fixed issues with transcript coordinates

2018-04-20  twu

	* gmap.c, gsnap.c, inbuffer.c, inbuffer.h, sequence.c, sequence.h,
	  shortread.c, shortread.h: Adding a command-line option --read-files-command

	* extension-search.c, segment-search.c, stage1hr.c: Using (Oligospace_T) as
	  a cast for 0

	* Makefile.gsnaptoo.am, fopen.c, fopen.h: Added a command-line option
	  --read-files-command

	* gsnap.c: Changed flag from --use-transcriptome to --use-transcriptome-only

2018-03-25  twu

	* samprint.c: Using new interface to Pair_print_sam

	* trunk, config.site.rescomp.prd, index.html, src, block.c, gmap.c,
	  indexdb.c, oligoindex_hr.c, pair.c, stage1.c, stage2.c, stage3.c,
	  translation.c: Merged revisions 214439 through 214446 from
	  branches/2018-03-24-restore-gmap to restore speed and sensitivity of GMAP
	  by eliminating use of Stage3_compute_ends, restoring use of multiple
	  oligoindices, and fixing CDS phases for GFF3 output

2018-03-24  twu

	* indexdb.c: Removed option for expanding offsets

	* gmap.c: Fixed memory issues with new calls to Stage3_merge_local and
	  Stage3_merge_chimera, and appending middlepieces to stage3list. Changed
	  default gff3_cds to be genomic.
	  Ignoring option for --expand-offsets

	* stage3.c, stage3.h: Removed cigar_tokens and intronp as fields for
	  Stage3_T objects. Implemented Stage3_copy and using it in
	  Stage3_merge_local and Stage3_merge_chimera

	* pair.c, pair.h: Removed cigar_tokens and intronp as parameters to
	  Pair_print_sam

2018-03-23  twu

	* substring.c: Added assertions about alts, to make sure we don't use
	  alignstart_trim or alignend_trim fields in those cases

	* stage3hr.c: Checking for alts when getting chrpos_low or chrpos_high

	* samprint.c: Formatting changes

	* dynprog.h: Setting AMBIGUOUS score to be 3, to make cmet and atoi
	  alignments equivalent to standard

	* cigar.c: For sam_hardclip_use_S option, returning hardclips to be 0 so
	  they don't affect the query sequence in SAM output

	* stage3.c: Restored force of single gaps, to avoid problems with
	  add_dual_break later

2018-03-21  twu

	* iit_tally.c: Initial import

2018-03-20  twu

	* VERSION: Updated version number

	* trunk, src, dynprog.c, dynprog.h, dynprog_cdna.c, dynprog_cdna.h,
	  dynprog_end.c, dynprog_end.h, dynprog_genome.c, dynprog_genome.h,
	  dynprog_simd.c, dynprog_simd.h, dynprog_single.c, dynprog_single.h,
	  gmap.c, gsnap.c, pair.c, path-solve.c, splicetrie.c, splicetrie.h,
	  stage3.c, stage3.h: Merged revisions 214343 through 214360 from
	  branches/2018-03-20-cmet-gmap to make non-standard modes work in GMAP
	  alignments

	* sam-exons.pl.in: Initial import

	* indexdb.c: Fixed preprocessor macro

2018-03-19  twu

	* VERSION: Updated version number

	* trunk, VERSION, config.site.rescomp.tst, configure.ac, index.html, src,
	  Makefile.gsnaptoo.am, atoi.c, atoi.h, atoiindex.c, cmet.c, cmet.h,
	  cmetindex.c, epu16-bitpack64-access.c, epu16-bitpack64-access.h,
	  epu16-bitpack64-incr.c, epu16-bitpack64-incr.h,
	  epu16-bitpack64-read.c,
	  epu16-bitpack64-read.h, epu16-bitpack64-readtwo.c,
	  epu16-bitpack64-readtwo.h, epu16-bitpack64-write.c,
	  epu16-bitpack64-write.h, extension-search.c, extension-search.h,
	  genomicpos.c, genomicpos.h, gsnap.c, indexdb.h, indexdbdef.h,
	  kmer-search.c, kmer-search.h, localdb-write.c, localdb.c, localdb.h,
	  localdbdef.h, oligo.c, path-solve.c, path-solve.h, reader.c, stage1hr.c,
	  types.h: Merged revisions 214117 through 214304 from
	  branches/2018-03-11-cmet-localdb to implement non-standard modes with
	  support for localdb

	* substring.c: Removed an assertion that is not always valid

	* stage3hr.c, stage3hr.h: Commented out code for Stage3end_substring_high,
	  which is not used anymore

	* samprint.c: Fixed typo in using querylength5 instead of querylength3 for
	  mate

	* samflags.h: Incremented values for number of filestream outputs, to handle
	  XS output files properly

	* outbuffer.c: Always skipping output for OUTPUT_NONE filestream

	* Makefile.gsnaptoo.am: Hiding program iit_pileup

	* oligoindex_hr.c, localdb.c: Allowing debugging statements to work

	* kmer-search.c: Added debugging statements

	* indexdb.c: Distinguishing between GSNAP pointer procedures, which need to
	  handle negative diagterms, and GMAP read procedures, where diagterms are
	  always non-negative

	* gsnap.c: Added option for --sam-hardclip-use-S

	* cigar.c, cigar.h: Calling low substring for all circular chromosome hits
	  to get correct chrpos in SAM output.
	  Added provisions for
	  SAM_hardclip_use_S_p

2018-03-10 twu

	* substring.c: Changed assertion to allow for '*'

	* stage3.c: Added a clip of end5 against the chromosomal bound for
	  alignments on the minus strand

	* intersect.c: Commenting out advance of positions past -diagterm

	* indexdb.c: No longer checking for diagterm < 0, since diagterm <= 0 should
	  be true

	* extension-search.c: Reverted advance of positions past -diagterm and added
	  assertions instead. Changed algorithm to not add an elt when the
	  positions are invalid.

	* shortread.c: Preventing invalid read when accession contains only one or
	  two characters

	* kmer-search.c, merge-heap-diagonals.c, merge-heap-records.c,
	  merge-simd-diagonals.c, merge-simd-records.c: Added assertions to check
	  that positions >= -diagterm

	* extension-search.c, intersect.c: Skipping or advancing positions so they
	  exceed -diagterm

	* oligo.c: Fixed initialization code so that Clang compiler will accept it

2018-03-09 twu

	* localdb-write.c: Added a message to indicate when writing is done

	* Makefile.gsnaptoo.am, iit_pileup.c: Added ability for the user to specify
	  a genomic range

	* stage3hr.c, stage3hr.h: Fixed bug with circular read not being unaliased
	  because of a soft clip. Removed macro for soft clips avoiding
	  circularization. Removed circularalias values of +2 and -2, since
	  alignments now should stay within chromosomal bounds

	* iit_pileup.c: Added code for using filestrings

	* Makefile.gsnaptoo.am, iit-read.c, iit-read.h, iit_pileup.c: Restored
	  iit_pileup program.
	  Rewrote to take a FASTA file as input

2018-03-05 twu

	* stage3.c: Added back declaration of variables

	* stage3.c: Removed unused variables and parameters

	* samprint.c: Removed unused static variable

	* path-solve.c: Using new interface to Stage3end_new_gmap

	* stage3hr.c, stage3hr.h: Removed unused parameters to Stage3end_new_gmap.
	  Removed unused code

	* gsnap.c: Removed unused parameter to process_request

	* extension-search.c: Removed unused code

	* stage1hr.c: Fixed memory error when abort_pairing_p is true

2018-03-04 twu

	* localdb.c: Defined variables needed for large genomes

	* trunk, VERSION, src, block.c, block.h, distant-rna.c, distant-rna.h,
	  extension-search.c, gmap.c, gsnap.c, kmer-search.c, kmer-search.h,
	  ladder.c, ladder.h, localdb.c, localdb.h, oligo.c, oligo.h, pair.c,
	  pair.h, path-solve.c, path-solve.h, segment-search.c, segment-search.h,
	  splice.c, splice.h, stage1.c, stage1.h, stage1hr.c, stage1hr.h, stage2.c,
	  stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h, terminal.c,
	  terminal.h: Merged revisions 213978 through 214024 from
	  branches/2018-03-01-ptr-and-diagterm to add potential support for cmet,
	  atoi, and ttoc modes

2018-03-03 twu

	* cigar.c, cigar.h, gsnap.c: Obey behavior of --sam-use-0M flag

2018-03-02 twu

	* trunk, config.site.rescomp.prd, src, Makefile.gsnaptoo.am, access.c,
	  atoiindex.c, bitpack64-readtwo.c, cmetindex.c, epu16-bitpack64-read.c,
	  epu16-bitpack64-readtwo.c, epu16-bitpack64-write.c, extension-search.c,
	  filesuffix.h, genome-write.c, genome128_hr.c, gmapindex.c, gsnap.c,
	  iit-write.c, iit_store.c, indel.c, indexdb-write.c, indexdb.c, indexdb.h,
	  indexdbdef.h, intersect.c, intersect.h, kmer-search.c, kmer-search.h,
	  localdb.c, merge-heap-diagonals.c, merge-heap-diagonals.h,
	  merge-heap-records.c,
	  merge-heap-records.h, merge-simd-diagonals.c,
	  merge-simd-diagonals.h, oligoindex_hr.c, pair.c, path-solve.c,
	  segment-search.c, segment-search.h, smooth.c, snpindex.c, stage1hr.c,
	  stage1hr.h, stage2.c, stage3.c, stage3hr.c, substring.c, terminal.c,
	  types.h: Merged revisions 213921 to 213977 from
	  branches/2018-03-01-ptr-and-diagterm to add back support for large genomes

	* stage3hr.c: Improved debugging statements

	* kmer-search.c: Removed extraneous calls to free lists

	* ladder.c, ladder.h: Implemented Ladder_cutoff to allow for ladders, but
	  restrict computation to MAX_HITS

	* concordance.c: Calling Ladder_cutoff instead of Ladder_maxscore

	* cigar.c: Fixed computation of SAM chrpos. Always calling
	  Stage3end_substring_low for a single SAM line

	* concordance.c: Applying MAX_HITS to Concordance_pair_up_distant

2018-03-01 twu

	* stage1hr.c: Fixed a typo: querypos_rc instead of querypos for minus
	  positions

	* indexdb.c: Hiding Indexdb_ptr_with_diagterm from utility programs

	* pairpool.c, pairpool.h, stage3.c: Using a better method for handling
	  chromosomal bounds in GMAP, by trimming pairs, rather than the pairarray

	* trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
	  Makefile.gsnaptoo.am, extension-search.c, extension-search.h, indexdb.c,
	  indexdb.h, intersect.c, intersect.h, kmer-search.c, kmer-search.h,
	  localdb.c, merge-heap-diagonals.c, merge-heap-diagonals.h,
	  merge-heap-records.c, merge-heap-records.h, merge-simd-diagonals.c,
	  merge-simd-diagonals.h, merge-simd-records.c, merge-simd-records.h,
	  segment-search.c, stage1hr.c, stage1hr.h: Merged revisions 213873 through
	  213921 from branches/2018-03-01-ptr-and-diagterm to use pointers to
	  positions and diagterms, rather than copying positions

	* substring.c: In Substring_set_alt, handling amb_splice_pos
	  properly as
	  being defined from left, rather than querystart

	* Makefile.util.am: Added instructions for bitpack64-test

	* segment-search.c: Freeing lists provided to Merge_records

	* path-solve.c: Using FREE_ALIGN for freeing results of merge procedures

	* localdb.c: Localdb_read_with_bounds now always returns aligned memory,
	  which can happen if merging is performed. Using new interface to merge
	  procedures

	* kmer-search.c: Fixed a memory leak in exact procedure when the ends have
	  invalid oligos

	* merge-heap-diagonals.c, merge-heap-records.c, merge-simd-diagonals.c,
	  merge-simd-records.c: Merge procedures no longer free their input lists or
	  streams. Streams can be aligned or not, so the caller needs to free them

	* pair.c, pairpool.c, pairpool.h, stage3.c: Prevent GMAP results from going
	  beyond chromosomal bounds, when making the final pairarray

2018-02-28 twu

	* stage1hr.c: Using new interfaces to exact and approx algorithms

	* stage3hr.c, stage3hr.h: Renamed Stage3end_list_free to Stage3end_gc

	* segment-search.c: Fixing memory leak

	* kmer-search.c, kmer-search.h: Limiting results for exact algorithm with
	  max_hits, and limiting approx algorithm with both max_hits and sizelimit

	* concordance.c: Limiting results for newladder with MAX_HITS

	* uint8table.c, uint8table.h: Initial implementation

	* Makefile.gsnaptoo.am, stage1hr.c: Using a Uint8table_T object to call
	  repeated occurrences of an oligo invalid

	* substring.c: Allocating querylength+1 for mismatch_positions

	* path-solve.c: Making a better call to Substring_new for the middle
	  diagonal, and handling the case where the result is NULL

	* gsnap.c: Including header for concordance.h

	* terminal.c, stage3hr.c: Using new interface to Substring_new

	* path-solve.c: Removed all
	  instances of middle_path

	* extension-search.c, kmer-search.c: Using new interfaces to
	  Path_solve_from_diagonals and Substring_new

	* segment-search.c: Using new interface to Path_solve_from_diagonals.
	  Letting that procedure determine which left and right diagonals are in the
	  correct chrnum

	* path-solve.c, path-solve.h: Path_solve_from_diagonals now takes a middle
	  diagonal, rather than a middle path. Sets chrnum based on middle
	  diagonal, as determined by Substring_new. Also considers only left and
	  right diagonals in that chrnum.

	* substring.c, substring.h: Substring_new now takes chrnum_fixed_p as a
	  parameter. If true, then the procedure does not recompute chromosomal
	  bounds

2018-02-27 twu

	* segment-search.c: Using left instead of lowpos to determine chrnum, to be
	  safe

	* substring.c: Checking for substring being below given chroffset, and
	  recomputing chromosomal bounds

	* segment-search.c: When adding segments to left and right, checking to make
	  sure they fall at least partially into the same chromosome as the anchor
	  segment

	* pair.c: Fixed argument for setting querystart_pretrim and queryend_pretrim

	* substring.c: Fixed typo

	* pair.c, substring.c, substring.h: Computing more accurate values for
	  querystart_pretrim and queryend_pretrim. Removed assertion about
	  left_bound and right_bound in Substring_count_mismatches_region

	* concordance.c, stage1hr.c, stage3hr.c, stage3hr.h, stage3hrdef.h: Removed
	  private5p and private3p fields from Stage3pair_T object.
	  Always copying
	  hits when making a pair, needed because the concordance procedure can now
	  delete hits

	* ladder.c: Assigning a value to nhits in all cases from
	  Ladder_hits_for_score

	* concordance.c, stage1hr.c: Restored assignment of abort_pairing_p

	* ladder.c, ladder.h, stage3hr.c, stage3hr.h: Removing duplicates before
	  returning hits at a given score. Added a duplicates field to hold
	  duplicate hits, and a procedure for deleting them

2018-02-26 twu

	* filestring.c: Now handling %p in format statement

	* stage3hr.c: Changed one occurrence of Substring_ambiguous_p to
	  Substring_has_alts_p

	* trunk, src, Makefile.gsnaptoo.am, cigar.c, concordance.c, concordance.h,
	  distant-rna.c, distant-rna.h, gsnap.c, kmer-search.c, kmer-search.h,
	  ladder.c, ladder.h, merge-heap-records.c, merge-heap-records.h,
	  merge-simd-records.c, merge-simd-records.h, merge-uint4.c, method.c,
	  method.h, pair.c, pair.h, path-solve.c, record.h, samprint.c,
	  segment-search.c, segment-search.h, splice.h, stage1hr.c, stage1hr.h,
	  stage3hr.c, stage3hr.h, stage3hrdef.h, substring.c, substring.h,
	  terminal.c, terminal.h, types.h, univdiag.c: Merged revisions 213654 to
	  213758 from branches/2018-02-22-limit-segment-search to add concordance
	  procedures, ladders, and filters for segment search, and to revise the
	  single-end and paired-end procedures

2018-02-22 twu

	* trunk, VERSION, config.site.rescomp.tst, src, Makefile.gsnaptoo.am,
	  block.c, extension-search.c, genomicpos.h, gmap.c, gsnap.c, indexdb.c,
	  indexdb.h, intersect.c, intersect.h, intlist.c, intlist.h, junction.c,
	  kmer-search.c, localdb.c, localdb.h, merge-heap-diagonals.c,
	  merge-heap-diagonals.h, merge-heap-records.c, merge-heap-records.h,
	  merge-heap.c, merge-heap.h, merge-simd-diagonals.c,
	  merge-simd-diagonals.h, merge-simd-records.c, merge-simd-records.h,
	  merge-uint4.c, merge-uint4.h, merge.c, merge.h, oligo.c, pair.c, pair.h,
	  path-solve.c, segment-search.c, segment-search.h, splice.c, splicetrie.c,
	  splicetrie.h, stage1.c, stage1hr.c, stage1hr.h, stage2.c, stage2.h,
	  stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h, types.h,
	  uint8list.c, uint8list.h, uintlist.h, uniqscan.c, util, gmap_build.pl.in:
	  Merged revisions 213593 through 213652 from
	  branches/2018-02-21-large-genomes to add support for large genomes

2018-02-21 twu

	* stage1hr.c: Checking for a set with zero elements before calling
	  Orderstat_int_pct

	* segment-search.c: Excluding invalid oligos

	* path-solve.c: When novelsplicing is false, creating just one hit

	* oligo.c: Commented out unused procedures

	* kmer-search.c: Handling invalid oligos correctly

	* stage1hr.c: Making poly_A and poly_T oligos not valid. Added a min
	  sizelimit

	* stage1hr.c: Added comment

	* cigar.c, samprint.c, stage3hr.c, stage3hr.h: Removed code designed for old
	  meaning of Substring_ambiguous_p, now distinct from Substring_has_alts_p

	* indexdb.c: Commented out SIMD procedures for utility programs.
	  Added
	  support for AVX2

	* trunk, src, extension-search.c, extension-search.h, indexdb.c, indexdb.h,
	  kmer-search.c, kmer-search.h, path-solve.c, path-solve.h,
	  segment-search.c, segment-search.h, splice.c, stage1hr.c, stage1hr.h,
	  stage3hr.c, stage3hr.h, substring.c, substring.h, types.h: Merged
	  revisions 213472 through 213572 from branches/2018-02-16-faster-one-kmer
	  to increase speed and accuracy

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
	  Updated for latest version

	* stage1hr.c: Added debugging statements

2018-02-16 twu

	* stage3.c: Replaced check with an assertion

	* gmap.c: Filtering out Stage3_T objects with zero npairs

	* junction.c, junction.h: Implemented Junction_typestring

	* stage3.c: Making explicit check for npairs being 0 in Stage3_new

	* pair.c: Handling extra exons with a specific type and transition rules

	* gmap.c: Checking for Stage3_T object having zero pairs before pushing onto
	  list

	* cigar.c: Checking hardclip when printing insertion token.
	  Calling either
	  Stage3end_substring_low or Stage3end_substring_high

	* stage3hr.c: Added provisions for junctions_HtoL

2018-02-13 twu

	* localdb.c: Performing a check for possible negative coordinates

	* cigar.c, cigar.h: Storing value of extended_cigar_p during setup

	* gsnap.c: Added flags for --endtrim-length and --sam-extended-cigar

	* stage3.c: In Stage3_recompute_coverage, added a check for npairs being 0

	* shortread.c, shortread.h: Added support for end (3') trimming of each read

	* kmer-search.c: Improved debugging statements

	* gmap.c: Added checks for breakpoint being invalid, before running
	  Stage3_mergeable

	* chimera.c: Added checks for npairs being zero

	* gsnap.c: Removed the --expand-offsets flag

	* access.c: Put a limit on the maximum number of attempts to kill an
	  unattached shared memory segment

	* splice.c, substring.c: Checking for cases where the splice search
	  boundaries yield a negative coordinate. Redefined middle coordinate to be
	  alignstart and alignend, and not the positions 1 bp distal to these
	  coordinates.

	* stage1hr.c: Increased all sizelimits for diagonals from 100 to 3000

	* stage3.c: Defined all parameters for allowed iterations to be 1

	* stage2.c: Added a step to filter stage2 middles, starts, and ends for
	  uniqueness

2018-02-11 twu

	* substring.c: Improved debugging statement

	* stage3hr.c: Added a comment

	* splice.c: Improved debugging statements

	* stage3hr.c: Now allowing Stage3pair_resolve_insides to resolve both ends
	  at the same time

	* VERSION: Updated version number

	* stage3hr.c: Added debugging statements

2018-02-10 twu

	* pair.c: Fixed counting routines to skip over bad pairs correctly

	* substring.c: Removed unused code

	* stage3hr.c: In Stage3pair_resolve_insides and resolve_inside_general
	  procedures, when hits are changed for the hitpair, revising the values of
	  alts_resolve_5 and alts_resolve_3. Fixes a seg fault when
	  Stage3pair_eval_and_sort tries to resolve the hits a second time

	* path-solve.c, samprint.c, stage3hr.c, stage3hr.h, substring.c,
	  substring.h: Changed variable names to distinguish between an ambiguous
	  splice length and alternative splice coords

	* trunk, config.site.rescomp.tst, src, pair.c, pair.h, segment-search.c,
	  segment-search.h, stage1hr.c, stage3.c, stage3hr.c: Merged revisions
	  213287 through 213291 from branches/2018-02-10-fix-bugs to fix bugs
	  related to pairarrays that crossed chromosomal bounds, segments subsuming
	  others, and handling of dual breaks

	* pair.c: Hiding function Pairarray_convert_to_substrings from GMAP

	* trunk, index.html, src, cigar.c, cigar.h, distant-rna.c,
	  extension-search.c, filestring.c, filestring.h, gsnap.c, kmer-search.c,
	  output.c, pair.c, pair.h, path-solve.c, samprint.c, samprint.h,
	  segment-search.c, splice.c, stage1hr.c, stage3hr.c, stage3hr.h,
	  substring.c,
	  substring.h: Merged revisions 213162 through 213277 from
	  branches/2018-02-07-improve-circular to standardize printing of circular
	  and translocation alignments

	* stage3hr.c: Fixed printing of method label

2018-02-08 twu

	* extension-search.c: Making sure that queryoffset does not go outside
	  bounds of 0 to query_lastpos

2018-02-07 twu

	* substring.c: In Substring_new, giving initial values to trim_left and
	  trim_right

	* substring.c: In Substring_new, calling trim_left_end and trim_right_end
	  from 0 and querylength, but trim_novel_spliceends from given querystart
	  and queryend. If novel spliceends yields a short exon, then using the
	  non-spliced trimming results.

	* stage3hr.c: Calling compute_circularpos on all Substring_T objects

	* samprint.c: Added debugging statements

	* substring.c: In Substring_new, initializing necessary values of
	  Substring_T object earlier than first possible abort and call to
	  Substring_free

2018-02-06 twu

	* trunk, config.site.rescomp.tst, index.html, src, extension-search.c,
	  gmap.c, oligo.c, oligo.h, reader.c, reader.h, stage1hr.c, svncl.pl: Merged
	  revisions 213097 through 213130 from branches/2018-02-05-distant-dna to
	  allow for updating of querypos around N's and fix of fatal bug in GMAP in
	  merge_left_and_right_readthrough

2018-02-05 twu

	* substring.c: In trim_left_end and trim_right_end, computing trims relative
	  to querystart and queryend, rather than 0 and querylength.
	  In
	  embellish_genomic_sam, filling in beginning and end with dashes and stars
	  to avoid genomic_refdiff having a different length than querylength

	* Makefile.gsnaptoo.am: When MAKE_LIB is not defined, not copying headers
	  either

	* VERSION, config.site.rescomp.prd: Updated version number

	* get-genome.c, parserange.c, parserange.h: Allowing Parserange_universal to
	  return a value for whole_chromosome_p

	* substring.c: In Substring_new, always making querystart and queryend
	  correspond to trim_left and trim_right, to fix errors in CIGAR strings

	* access.c: Using a loop to create (and possibly deallocate) shared memory

	* access.c: Upon startup, if a shared memory segment exists with no other
	  attached processes (possibly corrupted), then deallocating it and creating
	  a new one

	* kmer-search.c: Simplified code in making sure that both ends are in the
	  same chromosome

	* samprint.c: In SAM_compute_chrpos, if hardclip_low yields a NULL
	  low_substring, then trying hardclip_high

	* substring.c: Handling the case where both cdna and genome have 'N'

	* stage3hr.c, stage3hr.h: Reversed revision 213080, which is giving CIGAR
	  errors.
	  Restored Stage3end_substring_high

	* substring.c: In Substring_new, added a minimum of 8 bp in general test 2

	* stage3hr.c: For Stage3end_substring_low and Stage3end_substring_high, if
	  the hardclip passes all substrings, return the last one processed

	* stage3hr.c: In Stage3end_new_substrings, handling the case where
	  substring1 and substringN have different chrnums assigned by Substring_new

	* stage3hr.c: In Stage3end_new_substitution, handling the case where
	  Substring_new has assigned a different chrnum than the one given

	* substring.c: Made trim incremental when performed as a preliminary step
	  before finding novel splice ends

	* kmer-search.c: Made a more rigorous check that both ends are on the same
	  chromosome

	* kmer-search.c: Making sure the two ends are on the same chromosome

	* substring.c: Handling the case where the alignment goes over the upper
	  bound of the next higher chromosome

	* substring.c: Removed residue from an SVN merge conflict

	* trunk, src, Makefile.gsnaptoo.am, distant-rna.c, gsnap.c, path-solve.c,
	  stage1hr.c, stage3hr.c, stage3hr.h, terminal.c, terminal.h, types.h:
	  Merged revisions 213033 through 213070 from
	  branches/2018-02-05-add-terminals to add terminal alignments

2018-02-04 twu

	* substring.c: Not applying general test 1 if orig_nmismatches < 0

	* substring.c: Making outofbounds adjustments to be increments to the
	  existing trim_left and trim_right

	* trunk, src, cigar.c, distant-rna.c, gsnap.c, kmer-search.c, samprint.c,
	  splice.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h:
	  Merged revisions 213027 through 213053 from
	  branches/2018-02-04-simplify-substring-new to simplify Substring_new
	  procedure

	* config.site.rescomp.prd, config.site.rescomp.tst, index.html, svncl.pl:
	  Updated version number

	* stage3.c: Added a List_reverse command

	* types.h: Added comments

	* stage1hr.c: Fixed wrong assignment of first_read_p

	* stage3hr.c: Checking for the eventrim region having bounds that don't make
	  sense

	* substring.c: Added an assertion

	* stage3.c: Restored pass 7 to remove dual breaks at the ends

	* stage1hr.c: Fixed a memory leak for queryrc

	* filestring.c: Added a check for fp being NULL, which can occur with the
	  flags --omit-concordant-uniq or --omit-concordant-mult

2018-02-02 twu

	* VERSION: Updated version number

	* configure.ac: Fixed command for --enable-lib

2018-02-01 twu

	* trunk, index.html, src, block.c, block.h, extension-search.c,
	  extension-search.h, filestring.c, gmap.c, gsnap.c, indexdb.c, indexdb.h,
	  intersect.c, kmer-search.c, kmer-search.h, oligoindex_hr.c, pair.c,
	  pair.h, path-solve.c, segment-search.c, stage1.c, stage1.h, stage1hr.c,
	  stage1hr.h, stage2.c, stage3.c, stage3.h, stage3hr.c, substring.c,
	  transcriptome.c: Merged revisions 212952 through 212996 from
	  branches/2018-02-01-reverse-gmap-slowdown

	* gmap.c: Fixed reference to array0 that should have been array1

	* configure.ac: Disabled alloca by default

	* stage3.c: Restored computation of Stage2_compute_starts and
	  Stage2_compute_ends

	* pair.c: Handling GFF3 output when chrstring or accession is NULL

	* oligoindex_localdb.c, oligoindex_localdb.h: Not using
	  Oligoindex_localdb_tally because of speed

	* inbuffer.c: Added debugging statements

	* gmap.c: Restored function of --pairalign option

2018-01-30 twu

	* stage3hr.c: Made changes to debugging statements

	* stage3.h: Updated interface for Stage3_setup

	* stage3.c: Distinguishing between overall_end_distance_linear and
	  overall_end_distance_circular when making call to
	  Stage2_compute_starts
	  and Stage2_compute_ends

	* stage1hr.c: Making calls to remove_circular_alias and remove_duplicates
	  for single-end reads

	* samprint.c: Fixed bug where alignment results to circular chromosome were
	  not being printed

	* path-solve.c, kmer-search.c: Not allowing splice junctions for circular
	  chromosomes

	* gsnap.c: Using new interface to Stage3_setup

	* gmap.c: Using genomelength instead of genome_totallength. Initializing
	  circularp for usersegment

2018-01-29 twu

	* gsnap.c, kmer-search.c, kmer-search.h, path-solve.c, path-solve.h: Not
	  allowing Junction_new_splice to be called when splicing is turned off

	* trindex.c: Added some headers for open() procedure

	* gmap.c, gsnap.c: Using interfaces to new setup procedures

	* stage3.c, stage3.h: Limiting chrend for Stage2_compute_starts and
	  Stage2_compute_ends based on genome total length

	* stage1hr.c, stage1hr.h: Added hook for distant DNA alignments

	* samprint.c, samprint.h: Not printing XS field in SAM output when splicing
	  is turned off.

	* pair.c, pair.h: Printing . in features field of GFF3 output when sense is
	  unknown. Not printing XS field in SAM output when splicing is turned off.

	* stage3.c: Before running Stage2_compute_starts and Stage2_compute_ends,
	  removed the check on chrend going past chrhigh. The truncated coordinates
	  can cause chrend to be less than chrstart if the alignment straddles a
	  chromosomal bound

	* stage3hr.c: Fixed a memory free error.
	  In Stage3end_new_substrings, when
	  Stage3end_T object fails due to circular alias, letting Stage3end_free and
	  not the caller free the junctions

	* stage3.c: Accepting a single alignment regardless of final score, if the
	  original queryjump or genomejump is negative

2018-01-27 twu

	* substring.c: In Substring_new, computing trims first, then adjusting trims
	  for out-of-bound lengths, then adjusting query and genomic bounds

2018-01-26 twu

	* VERSION: Updated version number

	* intlist.c, intlist.h, list.c, list.h, uintlist.c, uintlist.h: Added code
	  for non-inlined functions

	* transcript.c, transcript.h: Made Transcript_num non-inline

	* configure.ac: Added macro AC_C_INLINE

	* get-genome.c: Added include for intlist.h

2018-01-25 twu

	* archive.html, index.html: Updated for latest version

	* path-solve.c: Added a type conversion

	* Makefile.gsnaptoo.am: Fixed typos in file names

	* VERSION: Updated version number

	* gsnap.c: Change in description for --action-if-cigar-error

	* gmap.c: Implemented --action-if-cigar-error flag

	* stage3.c: Putting an upper bound on the number of Boyer-Moore searches for
	  microexons, based on the number of 5' and 3' splice positions

	* translation.c: Assigning aaphase_e for indels in CDS

	* pair.c, pair.h, gsnap.c: Implemented flag --action-if-cigar-error

	* oligoindex_hr.h, oligoindex_hr.c: Moved SIMD includes to header file,
	  since definition of Oligoindex_T object is now there

	* gmap.c: Using new interface to Pair_setup

	* Makefile.gsnaptoo.am: Turned off making of gmapl and gsnapl for all SIMD
	  types

	* pair.c: Implemented patch by Nathan Weeks to remove extra token from
	  print_gff3_exons_forward. For GFF3 code, changed types for genomic
	  coordinates from int to Chrpos_T.
	  Also initializing genomic coordinates
	  to 0 instead of -1

2018-01-24 twu

	* oligoindex_hr.c: Added assertion

	* stage3.c: Fixed computation of chrstart and chrend before final
	  Stage2_compute_starts and Stage2_compute_ends for circular chromosomes

	* trunk, src, Makefile.gsnaptoo.am, gmap.c, gsnap.c, localdb.c, localdb.h,
	  merge.c, oligoindex_hr.c, oligoindex_hr.h, oligoindex_localdb.c,
	  oligoindex_localdb.h, stage2.c, stage2.h, stage3.c: Merged revisions
	  212704 through 212742 from branches/2018-01-23-gmap-localdb

	* VERSION, config.site.rescomp.tst: Updated version

	* configure.ac: Added the conditional MAKE_LIB and a flag to control it

	* pair.c: Added an assertion about cds_phase being non-negative

	* Makefile.gsnaptoo.am: Building library only if MAKE_LIB is true

2018-01-23 twu

	* gsnap.c: Allow --terminal-threshold flag for backward compatibility, but
	  ignore

2018-01-22 twu

	* gsnap.c: Allowing the --use-sarray flag and ignoring it

	* stage1hr.c: Fixed bugs regarding the use of querylength3 instead of
	  querylength5, and for handling the case where all paired-end hits are NULL
	  and we need to run GMAP on the complete paths.

2018-01-21 twu

	* setup1.test.in: Turning off setup1.test

2018-01-19 twu

	* trunk, src, Makefile.gsnaptoo.am, access.c, atoiindex.c,
	  bitpack64-access.h, bitpack64-incr.c, bitpack64-incr.h, bitpack64-read.c,
	  bitpack64-read.h, bitpack64-readtwo.c, bitpack64-write.c,
	  bitpack64-write.h, block.c, cellpool.c, cmetindex.c, diagpool.c,
	  distant-rna.c, distant-rna.h, dynprog.c, dynprog_cdna.c, dynprog_end.c,
	  dynprog_genome.c, dynprog_single.c, epu16-bitpack64-access.c,
	  epu16-bitpack64-access.h, epu16-bitpack64-incr.c, epu16-bitpack64-incr.h,
	  epu16-bitpack64-read.c, epu16-bitpack64-read.h, epu16-bitpack64-readtwo.c,
	  epu16-bitpack64-readtwo.h, epu16-bitpack64-write.c,
	  epu16-bitpack64-write.h, extension-search.c, extension-search.h,
	  filesuffix.h, genome.c, genome128_consec.c, genome128_hr.c,
	  genome128_hr.h, genome_sites.c, genome_sites.h, gmap.c, gmapindex.c,
	  gregion.c, gregion.h, gsnap.c, indel.c, indel.h, indexdb-write.c,
	  indexdb-write.h, indexdb.c, indexdb.h, indexdb_hr.c, indexdbdef.h,
	  intersect.c, intersect.h, intlist.c, intlist.h, junction.c, kmer-search.c,
	  kmer-search.h, list.c, list.h, littleendian.h, localdb-write.c,
	  localdb-write.h, localdb.c, localdb.h, localdbdef.h, matchpool.c,
	  maxent_hr.c, merge.c, merge.h, oligo.c, oligo.h, oligoindex.c, output.c,
	  pair.c, pair.h, pairpool.c, path-solve.c, path-solve.h, reader.c,
	  reader.h, resulthr.c, resulthr.h, samprint.c, sarray-read.c,
	  sarray-search.c, sarray-search.h, sedgesort.c, sedgesort.h,
	  segment-search.c, segment-search.h, smooth.c, snpindex.c, splice.c,
	  splice.h, splicestringpool.c, splicetrie.c, splicetrie.h, stage1.c,
	  stage1hr.c, stage1hr.h, stage2.c, stage3.c, stage3.h, stage3hr.c,
	  stage3hr.h, substring.c, substring.h, transcript.c, transcript.h, types.h,
	  uintlist.c, uintlist.h, uniqscan.c, univdiag.c, univdiag.h, univdiagdef.h,
	  util, gmap_build.pl.in:
	  Merged revisions 210996 to 212658 from
	  branches/2017-11-04-faster-transcriptome. Put previous version into
	  tags/2018-01-19-version1-pre-transcriptome

	* index.html: Revised for new version

	* uniqscan.c: Using new interface to Stage1hr_setup

	* transcript.c: Sorting transcripts before printing

	* stage3hr.c, stage3hr.h: Stage3end_new_transcript aborting if number of
	  substrings and junctions don't match. Stage3end_new_substrings returning
	  new junctions to caller

	* stage1hr.h, stage1hr.c: Using transcriptome_end_accept

	* sarray-search.c: Various improvements to algorithm

2018-01-17 twu

	* pair.c: Skipping gap characters in Pairarray_genomic_sequence

	* gsnap.c: Fixed default value of max_middle_insertions

2017-11-17 twu

	* sarray-search.c: Turned off qsort in favor of sedgesort

2017-11-15 twu

	* stopwatch.c: Added standalone code for testing

	* stage3.c: Fixed issue with trimming of chimeras not being re-extended back
	  to the breakpoint. Fixed issue with CIGAR strings from shortgap comps
	  being treated differently from indel comps.

	* gsnap.c: Using new interface to Sarray_search. Printing worker runtimes
	  to stderr.
	  Using --transcriptome-accept flag instead of old flags

	* sarray-search.c, sarray-search.h: Removed genes_iit as a parameter

	* indel.c: Added debugging statements

	* dynprog_end.c: Handling the case where rev_goffset is negative

	* outbuffer.c: Fixed fatal bug when trying to write SAM headers to
	  OUTPUT_NONE split output, which is now NULL

2017-10-30  twu

	* stage3.c: Fixed trim_end_indel procedures to stop at a gap

2017-10-29  twu

	* substring.c: Fixed a memory leak when embellish_genomic is called a second
	  time on a Stage3end_T object

	* stage3hr.c: In Stage3end_new_gmap, in computing CIGAR, starting with
	  hardclips set to zero

	* stage3.c: In traverse_single_gap, not allowing a force of the gappairs.
	  Implemented contiguous versions of peel_leftward and peel_rightward, but
	  not using yet.

	* sarray-search.c: For transcriptome endpoints, restricting further the
	  cases where an indel should be ignored.
	  Added some potential code for
	  genome endpoints, but not implemented

	* pairpool.c: In Pairpool_join_end5 and Pairpool_join_end3, if the final
	  revision process still results in a negative queryjump or genomejump, then
	  don't attach the end at all

	* get-genome.c: If map file is not in mapdir, then look in genomesubdir

	* cigar.c, cigar.h, gsnap.c: Calling Cigar_setup to initialize static
	  variables

2017-10-20  twu

	* trindex.c: Added error message if genes file not provided by user

	* stage1hr.c, stage3hr.c, stage3hr.h: Doing remapping back to transcriptome
	  during pairing procedure

	* get-genome.c: Added comment

	* pair.c, pair.h, samprint.c: For GMAP-based alignments in GSNAP, printing
	  transcript info

	* sarray-read.c, sarray-read.h: Changed relevant types from Univcoord_T to
	  Sarrayptr_T

	* sarray-search.c, sarray-search.h: Changed relevant types from Univcoord_T
	  to Sarrayptr_T. Choosing closest indel on left and right ends. For
	  transcriptome, checking for overlapping or adjacent indels

2017-10-18  twu

	* config.site.rescomp.prd, config.site.rescomp.tst, index.html: Updated for
	  latest version

	* transcriptome.c, transcriptome.h: Implemented
	  Transcriptome_genomic_bounded_p

	* transcript.c, transcript.h: Implemented Transcript_in_list_p.
	  Returning
	  min_insertlength from Transcript_intersect_p

	* stage3hr.c: Fixed bug in retrieving parent hit in Stage3end_remove_overlaps

	* sarray-search.c: Fixed an issue where querystart was -1, which can result
	  when nmatches is 0

2017-10-17  twu

	* gmap.c: Changed option name from --alt-initiation-codons to
	  --alt-start-codons

	* stage1hr.c: Fixed typo in variable name

2017-10-12  twu

	* stage3hr.c: Removed optimization to use the minimum of npaths and
	  maxpaths, since we need computations on all npaths to make a random
	  selection

	* samprint.c: Handling dinucleotides gracefully if genomic sequence is NULL

	* gsnap.c: Increased default expected_pairlength from 200 to 500. Removed
	  --maxsearch option, since it can lead to poor answers.

	* outbuffer.c, samflags.h, samheader.c: Fixed problem introduced in 2017-05
	  which caused the --output-file option to produce a NULL file pointer

	* chrom.c: Dereferencing a character pointer

	* pair.c: Fixed problem with initialization of endi in determining gff3
	  coordinates. Avoiding reading the pair at index of npairs.

	* cellpool.c, cellpool.h, stage2.c: Adding non-overlapping paths, as well as
	  high-scoring paths

	* stage1hr.c: Using new interface to Sarray_search_transcriptome

	* substring.c, substring.h: Added functions for Substring_chrpos_low and
	  Substring_chrpos_high

	* stage3hr.h: Added interfaces for chrpos_low and chrpos_high

	* stage3hr.c: In transferring Transcript_T objects, checking for duplicates

	* sarray-search.c, sarray-search.h: Allowing for genomic bounds on
	  transcriptome alignment, for use in re-aligning genomic hit to find
	  transcriptome coordinates

	* gmap.c, translation.c, translation.h: Adding option for only ATG as
	  initiation codon. Making this the default.

	* stage3.c: For trimming end exons, using a percentage of the querylength
	  instead of a fixed length

2017-10-03  twu

	* stage3hr.c: Removed old version of pair_up_concordant_transcriptome

	* stage3hr.c: In resolve_ambiguous_splice procedures, when ambiguity is
	  resolved, setting genomicstart or genomicend fields accordingly

	* stage3hr.h: Removed interface for Stage3pair_pairtype

	* stage3hr.c: Fixed the order for constructing genomic sequence from
	  substrings

	* substring.h, substring.c: Not applying general test of goodness for
	  substrings from transcriptome-guided alignment

	* sarray-search.c: Modified debugging statements

	* samprint.c: Determining pairtype again, and allowing for the possibility
	  of concordant uniq

2017-10-02  twu

	* trindex.c: Copying the genes IIT file to the transcriptome directory

	* transcriptome.c: Allowing a transcriptome to be read without a genome, by
	  using divints instead of chrnums

	* transcript.c: Revised debugging statements

	* gsnap.c, uniqscan.c: Using new interface to Stage1hr_setup

	* stage3hr.c, stage3hr.h: Implemented a separate procedure for pairing up
	  transcriptome alignments. Transcriptome hit types now take precedence.
	  Implemented Stage3end_substrings_genomic_sequence.

	* stage1hr.c, stage1hr.h: Using new interface to Stage3pair_new. Re-mapping
	  to transcriptome from genomic suffix array alignment.

	* junction.c, junction.h: Added function Junction_deletionpos

2017-09-29  twu

	* stage3.c: Moved location of build_dual_breaks step to get better behavior

2017-09-27  twu

	* stage3.c: Fixed over-aggressive use of minintronlen_ends from wrong end of
	  sequence

	* chimera.c: Initializing a variable

2017-09-22  twu

	* uinttableuint.c, uinttableuint.h: Initial import

	* stage1hr.c, stage3hr.c, stage3hr.h: For transcriptome-guided genomic
	  alignment, placing results into concordant_uniq instead of
	  paired_uniq_long, if a transcript matches both ends

	* pair.c, pair.h: Fixed computation of cds bounds for GFF3 output

	* gsnap.c, uniqscan.c: Using new interface to Pair_setup

	* gmap.c: Restored MAX_CHIMERA_ITER to be 3, but not iterating multiple
	  times for middle pieces. Added option --gff3-cds

	* Makefile.gsnaptoo.am: Added uinttableuint.c to library

	* translation.c: Assigning aaphase_g for final genomic codon

2017-09-11  twu

	* gmap.c: Restored --intronlength option

	* pair.c: Fixed gff3 cds output so it ignores indels

	* pair.c: Replaced gff3 printing code for CDS with a call to code for exons

	* pair.c, pair.h: Using new interface to transcript print functions

	* stage3hr.c, stage3hr.h, transcript.c, transcript.h: Replacing separate
	  trnums, trstarts, and trends fields with transcripts field

	* stage1hr.c: No longer filtering initially by transcript concordance

	* gsnap.c, samprint.c, samprint.h: Using new interface to transcripts field
	  for Stage3end_T and Stage3pair_T objects

	* gmap.c: No longer iterating on check_middle_local

	* Makefile.gsnaptoo.am: Added transcript.c and transcript.h to programs that
	  need it from pair.c

2017-09-05  twu

	* sarray-search.c, stage3hr.c: Commented out or fixed code for
	  LARGE_GENOMES
	  to use Uint8list_T

	* sarray-search.c, sarray-search.h: Commented out genome code for
	  LARGE_GENOMES

	* trindex.c: Fixed code so it will compile. Fixed memory leaks.

	* trunk, VERSION, gsl.m4, config.site.rescomp.prd, config.site.rescomp.tst,
	  index.html, src, Makefile.gsnaptoo.am, bitpack64-readtwo.c,
	  genome128_consec.c, genome128_consec.h, genome128_hr.c, genome128_hr.h,
	  get-genome.c, gmap.c, gmapindex.c, gsnap.c, iit-read-univ.c,
	  iit-read-univ.h, iit-read.c, iit-read.h, indel.c, indel.h, intlist.c,
	  intlist.h, junction.c, junction.h, mapq.c, mapq.h, output.c, pair.c,
	  pair.h, pairpool.c, samprint.c, samprint.h, sarray-read.c, sarray-read.h,
	  sarray-search.c, sarray-search.h, sarray-write.c, sarray-write.h,
	  splicealt.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c,
	  substring.h, transcriptome.c, transcriptome.h, trindex.c, uniqscan.c:
	  Merged revisions 207858 to 209656 from branches/2017-07-01-transcripts to
	  allow for transcriptome-guided genomic alignment

	* sarray-search.c: Fixed bug resulting from check of common diagonal over
	  circular origin

2017-09-01  twu

	* index.html: Updated for latest version

	* stage1hr.c: Fixed criterion for looking for spliceends with nmismatches
	  less than max_splice_mismatches on each end

	* sarray-search.c: Checking right and left diagonals for collinearity with
	  middle diagonal in query coordinates

	* gmapindex.c: Casting all Univcoord_T lengths to UINT4 for suffix array
	  procedures

	* get-genome.c: Added option --gsequence to print exons and introns

2017-08-15  twu

	* stage3hr.c: Changed checks on circularalias to circularpos

	* sarray-search.c: Disallowing any splicing solution that goes around a
	  circular origin

	* pair.c: Checking that we are not at the end of the alignment before doing
	  backward steps

	* gmap.c: Removed option -G for uncompressed genome

	* stage1hr.c: Disallowing any splicing solution that goes around a circular
	  origin. Incrementing counter when comparing against max_gmap_improvement.
	  Fixed a memory leak.

2017-07-27  twu

	* stage3.c: In find_dual_break_spliceends, fixed a bug that generated
	  negative coordinates

2017-06-29  twu

	* Makefile.gsnaptoo.am: Added some files to the library and the include
	  directory

	* table.c, table.h: Added a function needed by gstruct

	* interval.c, interval.h, iit-write.c, iit-write.h: Added a variable to make
	  a function compatible with the gstruct version

	* gsnap.c: Added a header file

	* dynprog_genome.c: Commented out assertions that do not hold in transcript
	  alignment

	* chrom.c: Removed a faulty assertion

2017-06-21  twu

	* stage3.c: For final call to insert_gapholders from path_compute_final,
	  filling the gap with nucleotides if queryjump == genomejump

	* pair.c: For GFF3 output, not printing lines where genomestart and
	  genomeend coordinates are the same, typically resulting from a query skip

2017-06-20  twu

	* Makefile.gsnaptoo.am: Added maxent_hr to lib and include

	* stage3hr.c: Added assertions to make sure ilengths are not negative

	* substring.c: In overlap checking procedures, decrementing high coordinate
	  by 1 if possible to match the procedures for clip_overlap and
	  merge_overlap in stage3hr.c

2017-06-19  twu

	* VERSION, index.html: Updated version number

2017-06-16  twu

	* gmap.c: Added to debugging statements

	* stage3.c: In merge procedures, restoring original pairs to Stage3_T
	  objects if the merge fails

	* gsnap.c: Turning off default of 0 for trim-mismatch-score and
	  trim-indel-score for DNA-Seq

2017-06-15  twu

	* samprint.c, pair.c: For XM, handling the case where queryseq_mate is NULL

	* shortread.c: Changed memory source of longstring to IN

	* samprint.c: Added back a missing else clause after checking for
	  omit_concordant_uniq_p

	* stage3.c: Added debugging statements for creating and freeing Stage3_T
	  objects

	* stage1hr.c: Fixed memory leaks relating to floors and anchor segments

	* sequence.c: Changed memory source of all contents to IN

	* pair.c: Changed memory source of all tokens to OUT

	* list.c, list.h: Implemented List_to_array_out_n

	* intlist.c, intlist.h: Implemented Intlist_to_char_array_in

	* gmap.c: Fixed memory leaks and memory bugs relating to chimera code.
	  Removed all references to a nonjoinable list, and using stage3list as the
	  master list for all procedures.

	* genome.c: Changed source of alloc to IN

2017-06-14  twu

	* index.html: Updated for latest version

	* configure.ac: Removed unused macros

	* src, util: Merged revisions 204076 through 207268 from
	  branches/2017-03-07-multimapper-genes

	* stage3hr.c: Reverted from revision 207330 (revision 205421 from
	  branches/2017-03-07-multimapper-genes) to remove nindelbreaks field, since
	  it discriminates against some equivalently good alignments

	* stage3hr.c: Merged revision 205421 from
	  branches/2017-03-07-multimapper-genes to add nindelbreaks field

	* Makefile.gsnaptoo.am: Added commands for building lib and include

	* uniqscan.c: Added Access_controlled_cleanup

	* substring.c: Merged revisions 204076 through 205371 from
	  branches/2017-03-07-multimapper-genes to remove splicecoordN and to set
	  splicecoordD_knowni and splicecoordA_knowni.

	* stage1hr.c: Merged revisions 204076 through 205713 from
	  branches/2017-03-07-multimapper-genes to find DNA chimeras in paired-end
	  reads and to double-check apparent perfect matches for actual number of
	  mismatches

	* sequence.c, sequence.h: Added function Sequence_stdout_header

	* sarray-read.c, sarray-read.h, sarray-search.c, sarray-search.h: Merged
	  revisions 204076 through 205420 from branches/2017-03-07-multimapper-genes
	  to move search functions from sarray-read.c to sarray-search.c

	* samprint.c, samprint.h: Merged revisions 204076 through 206196 from
	  branches/2017-03-07-multimapper-genes to print information in XT field for
	  transcript splicing and to handle omitting of concordant alignments

	* samheader.c: Don't open a file for OUTPUT_NONE

	* popcount.c, popcount.h: Modified conditions for including our own popcount
	  instructions. No longer needed if built-in options are available

	* pair.c: Modified compressed format to no longer print tokens or
	  dinucleotides

	* littleendian.h: Added macros for FREAD_FLOATS and FWRITE_FLOATS

	* iit-read.c, iit-read.h: Merged revisions 205322 through 206058 from
	  branches/2017-03-07-multimapper-genes to support --coding in get-genome
	  and to implement IIT_genestruct_chrpos

	* gsnap.c: Merged revisions 204076 through 206184 from
	  branches/2017-03-07-multimapper-genes to add options for transcriptome
	  alignment and omitting concordant output.

	* datadir.c: Modified messages when gmapdb is not found

	* cigar.c: Fixed printing of "*" for cigar when mate is NULL or substrings
	  is NULL

	* block.c: Merged revision 204180 from branches/2017-03-07-multimapper-genes
	  to generalize from 12-mers to oligo size for debugging output

	* stage3hr.c, stage3hr.h: Merged revisions 206187 and 205714 from
	  branches/2017-03-07-multimapper-genes to add behavior for
	  --omit-concordant-uniq and --omit-concordant-mult and to add a
	  splice_score field for all splice types

2017-06-13  twu

	* pair.c, stage3.c: Changed type of chroffset and chrhigh from Chrpos_T to
	  Univcoord_T in trim end functions

2017-06-12  twu

	* src, gmap.c, output.c, pair.c, pair.h, stage3.c, stage3.h: Merged revision
	  204925 from branches/2017-04-02-genome-genome to add bedpe output

	* diag.c, stage2.c: Merged revisions 207196 and 207198 from
	  branches/2017-04-02-genome-genome to improve genome-genome alignment

2017-06-09  twu

	* stage3.c, output.c: Using functions now in pair.c

	* samprint.c, cigar.c, pair.c, pair.h, stage3hr.c, stage3hr.h: Moved some
	  functions to pair.c

	* get-genome.c: Allowing for --dump to work with --exons

	* substring.h: Moved typedef of Substring_T early

	* Makefile.gsnaptoo.am: Including cigar.c and cigar.h for uniqscan and
	  uniqscanl

2017-05-30  twu

	* substring.c, substring.h: Commenting out procedures needed for chrpos_high

	* stage3hr.c, stage3hr.h: Commenting out procedures needed for chrpos_high.
	  Using procedures from cigar.c.

	* stage3.c: Using procedures from cigar.c

	* stage1hr.c: Added debugging statement

	* pair.c, pair.h, samprint.c, samprint.h: Moved CIGAR printing procedures to
	  cigar.c. Printing mate cigar in XM field instead of mate chrpos_high.

	* gsnap.c: Using new interfaces to Output_setup and SAM_setup

	* output.c, output.h, samprint.c, samprint.h: Moved setup of merge_samechr_p
	  from output.c to samprint.c

	* Makefile.gsnaptoo.am, cigar.c, cigar.h: Added cigar.c and cigar.h for code
	  relating to printing of CIGAR strings

2017-05-25  twu

	* sarray-read.c: Increased iteration condition, allowing sarray algorithm to
	  work when nmisses_allowed is zero.

2017-05-12  twu

	* stage3.c: In Stage3_merge_chimera, doing peelback to remove any indels at
	  the chimeric junction

2017-05-11  twu

	* output.c, pair.c, pair.h, samprint.c, samprint.h, stage3hr.c, stage3hr.h,
	  substring.c, substring.h: Added printing of mate chrpos high with an XM
	  field

	* stage1hr.c: Removed exception for FREE_ALIGN when nstreams is 1

	* iit_get.c: Commented out printing of total when reading queries from stdin

2017-05-10  twu

	* chimera.c, chimera.h, gmap.c: Allowing search for chimera exon-exon
	  boundary to extend for 1 mismatch

	* stage3.c, stage3.h: Implemented procedures Stage3_trim_left and
	  Stage3_trim_right

	* gmap.c: Calling Chimera_find_breakpoint first to set bounds based on
	  sequence, and then Chimera_find_exonexon to find the exon boundary

	* gmap.c: Increased value of CHIMERA_EXTEND from 8 to 20

	* parserange.c: Added null terminating character after strncpy

2017-05-09  twu

	* shortread.c: Fixed uninitialized variable in nextchar2 and invalid free
	  when skipping in second file

2017-05-08  twu

	* uniqscan.c: Using new interface to Stage1hr_setup

	* gsnap.c, stage1hr.c, stage1hr.h: Added --speed option for GSNAP

	* gmap.c: Set default value for maxintronlen to be 500,000

2017-05-03  twu

	* stage3hr.c: Changed the procedure for resolving overlapping and separate
	  alignments. Now filtering both the overlapping and separate alignments.
	  Using expected pairlength and pairlength deviation to select which one to
	  report.

	* stage1hr.c: Turned off the shortcut to skip complete set algorithm if
	  suffix array has found something. Turned off the shortcut for GMAP
	  pairsearch/halfmapping if nconcordant > 0

	* spanningelt.c: Changed a check procedure to abort rather than exit

	* spanningelt.h: Fixed a typo in a comment

	* merge.c: Made Merge_diagonals non-destructive, by copying the streams into
	  the heap

	* indexdb_hr.c: Added a comment about Merge_uint4 being destructive

	* iit-read.c, iit-read.h: Added IIT_gene_overlapp function used by
	  get-genome with the --coding flag

	* get-genome.c: Added a --coding flag to report only genes that overlap in
	  their coding regions

2017-04-24  twu

	* index.html: Updated for latest version

	* Makefile.am: Added gtf_transcript_splicesites to CLEANFILES

	* pair.c, pair.h: Taking mate_chrnum as an argument in SAM print function

	* output.c, samprint.c, samprint.h: Computing chrnum and mate_chrnum at same
	  time as chrpos and mate_chrpos, to resolve issues with SAM output

2017-04-21  twu

	* samprint.c, stage3hr.c, stage3hr.h: Fixed issue in mate chromosome printed
	  when mate is a translocation

2017-04-13  twu

	* chrom.c: Fixed compare function for alpha_numeric entries

2017-04-12  twu

	* Makefile.am, configure.ac: Added an entry for gtf_transcript_splicesites

	* index.html: Changed for latest version

	* gtf_transcript_splicesites.pl.in: Changed output format

	* stage3.c: Revised debugging statements

	* samflags.h, samprint.c, samprint.h: Adding supplementary flag. Adding
	  information to XT field for transcript splicing.

	* gsnap.c: Turning off trim mismatch for DNA-Seq

	* gmap.c: Added comments

	* dynprog_end.c: Turned wideband off for extending from medial splicesite,
	  which was causing the end exon to be re-discovered as an indel

2017-04-11  twu

	* gtf_transcript_splicesites.pl.in: Initial import

2017-03-17  twu

	* substring.c: Commented out assertions that don't hold under SNP-tolerant
	  alignment

	* stage3hr.c: Improved debugging statement

	* stage1hr.c: Proceeding to spanning set procedure if the other end has more
	  hits than the number of concordant hits

	* splicing-score.c: Added debugging calls to Maxent_hr procedures, to help
	  in development

	* samprint.c, maxent_hr.c, junction.c: Added debugging statements

	* Makefile.gsnaptoo.am: Added files for splicing_score

	* iit-read.c: Fixed memory leak for intron-level known splicing

	* intron.c, intron.h: Added some utility functions

	* indel.c, indel.h: Modified Indel_resolve_middle_deletion to favor and
	  report intron dinucleotides for short deletions

	* gsnap.c: Using new interface to Sarray_setup

	* sarray-read.c, sarray-read.h: Checking short deletions with length between
	  min_intronlength and max_deletionlen to see if they are introns

2017-03-09  twu

	* get-genome.c: Allowing dump of all sequences from a map file

2017-03-02  twu

	* pairpool.c: Allowing querypos and genomepos of 0

2017-02-24  twu

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
	  number

	* chrom.c: Added a type ALPHA_NUMERIC and sorting appropriately for those.
	  Stripping "Chr" as well as "chr" from names

	* pairpool.c: In Pairpool_push, not doing anything if querypos or genomepos
	  is less than or equal to 0

	* stage2.c: For convert_to_nucleotides, handling the case where path is NULL

	* gmap.c: Added missing brace

	* gmap.c, stage3.c, stage3.h: Added option --split-large-introns and
	  implemented procedure Stage3_split

	* stage2.c: Renamed variable querypos to curr_querypos in some procedures,
	  so debug9 can be used

2017-02-16  twu

	* iit-read.c: Handling the case in finding unique positions and splices
	  where a gene has no overlapping genes

2017-02-15  twu

	* archive.html, index.html: Updated for latest version

	* VERSION: Updated version number

	* substring.c: Fixed calculation of mandatory_trim_left and
	  mandatory_trim_right

	* indexdb.c: Assigning MMAPPED to positions_high_access when appropriate, to
	  avoid free() error at end of program

2017-02-14  twu

	* output.c: Ignoring mergedp in restricting the final result to a single path

	* gmap.c: Allowing value of --suboptimal-score to be a float. Ignoring
	  mergedp in handling the final result

	* gmap_build.pl.in: Added flag to build genome index in parts

	* substring.c: For default alignment format, filling in stars in regions
	  where the alignment goes past the beginning or end of the genome

	* dynprog_single.c, stage3.c: Added checks against non-positive values for
	  rlength and glength in Dynprog_single_gap. Also requiring a positive
	  value for rlength in running Dynprog_single_gap over Dynprog_cdna_gap or
	  Dynprog_genome_gap.

	* dynprog.c: Added debugging statements

	* stage3.c, stage3.h: Added sort comparison procedures to help with local
	  chimeric joins on each chromosome

	* gmap.c: In checking for local chimeric joins, processing each chromosome
	  separately

	* stage3hr.c: Not resolving inside alignment when the coordinates look like
	  a scramble, which can occur with circular chromosomes

	* stage1hr.c: Fixed a memory leak for a non-concordant pair. Fixed an
	  uninitialized variable for non-spliced alignment

	* oligoindex_hr.h: Commented out obsolete code

	* iit-read.c, iit-read.h: Added support for finding unique splices, and for
	  finding unique positions and splices in a set of genes

	* gmap.c, gsnap.c: Fixed printing of SIMD capabilities for AVX2 and AVX512

	* get-genome.c: Added ability to dump a map file, and the ability to print
	  unique positions among a set of genes

	* genome128_hr.c: Changed builtin commands for trailing and leading zeroes
	  to use the long long versions for 64-bit words

	* genome.c: Commented out messages to stderr for negative coordinates

	* dynprog_genome.c: Increased rewards for canonical intron. Removed
	  penalties for indels next to a splice site

2017-02-08  twu

	* get-genome.c: Printing presence/absence of unique splices also

2017-01-31  twu

	* get-genome.c, iit-read.c, iit-read.h: Added option --nunique to print
	  number of unique positions

2017-01-27  twu

	* oligoindex_hr.c: Fixed typos in atoi functions for SSE2 code

2017-01-13  twu

	* stage3hr.c: Fixed a memory leak in resolving inner splices

	* dynprog_end.c: Fixed conditional jump based on finalscore, by not checking
	  when endalign is QUERYEND_NOGAPS

	* stage1hr.c: Fixed uninitialized value for successp.
	  Using FREE_ALIGN macro

	* spanningelt.c, indexdb_hr.c: Using MALLOC_ALIGN instead of MALLOC when
	  needed

	* oligoindex_hr.c: Including atoi.h

	* samprint.c, substring.c, substring.h: Fixed coordinates reported in XT
	  field, which depend on the donor and acceptor strands

	* merge.c: Using macros FREE_ALIGN and CHECK_ALIGN

	* mem.h: Defined macros FREE_ALIGN and CHECK_ALIGN

2017-01-10  twu

	* genome128_hr.c: Fixed incorrect AVX macro

	* oligoindex_hr.c: Changed _mm_bsrli_si128 to _mm_srli_si128. Added atoi
	  and ttoc modes to all code.

2017-01-09  twu

	* gsnap.c: Removed option --microexon-spliceprob

2017-01-06  twu

	* stage1hr.c: Using alignments with most matches, even if they are
	  translocations compared with other hitpairs

2017-01-02  twu

	* genome128_hr.c: For handling middle rows, using <= and >= to endptr and
	  startptr, instead of < and >

2017-01-01  twu

	* stage3.c: Using new interface to Dynprog_end5_gap and Dynprog_end3_gap

	* stage1hr.c: In identify_all_segments, filtering out diagonals <
	  querylength from the merged array

	* dynprog_single.c, dynprog_cdna.c, dynprog_genome.c: Using use8p_size

	* dynprog_simd.h: Removing fixed definition for SIMD_MAXLENGTH_EPI8

	* dynprog_simd.c: Added assertions for traceback procedures for vertical and
	  horizontal jumps not to go past the main diagonal. Put macros around
	  memory fences in debugging print procedures.

	* dynprog_end.c, dynprog_end.h: Using use8p_size and introduced parameter
	  require_pos_score_p

	* dynprog.c, dynprog.h: Introducing an array for use8p_size that depends on
	  the mismatch type

2016-12-30  twu

	* stage3hr.c: Not converting splices when resolving insides of
	  paired-end-reads

2016-12-29  twu

	* trunk, src, dynprog_genome.c, gsnap.c, pair.c, pair.h, sarray-read.c,
	  smooth.c, stage1hr.c, stage1hr.h, stage2.c, stage3.c, stage3.h,
	  stage3hr.c, stage3hr.h, substring.c, substring.h, uniqscan.c: Merged
	  revisions 201789 through 202030 from branches/2016-12-18-stage2-soa to
	  make various improvements to alignments

	* stage1hr.c: Added debugging statements

	* indexdb_hr.c: Checking for nmerged being 0

2016-12-16  twu

	* ax_ext.m4: Not adding -mno options to an Intel compiler

	* indexdb_hr.c: Returning an array created by malloc, rather than
	  _mm_malloc, from the merge version of Indexdb_merge_compoundpos

	* sarray-read.c: Using qsort instead of Sedgesort, because of seg faults
	  observed on Intel compiler

	* Makefile.gsnaptoo.am: Including merge.c, merge.h, merge-heap.c, and
	  merge-heap.h where needed

	* stage1hr.c: Providing a version of identify_all_segments for LARGE_GENOMES

	* indexdb_hr.c: Cleaned up code so there are three versions of
	  Indexdb_merge_compoundpos. Fixed the merge version.

	* oligoindex_hr.c: Fixed faulty svn merge

	* genome128_hr.c: Fixed faulty svn merge, and hid shift_lo and shift_hi
	  procedures

	* trunk, src, Makefile.gsnaptoo.am, indexdb_hr.c, mem.h, merge-heap.c,
	  merge-heap.h, merge.c, merge.h, stage1hr.c: Merged revisions 200992
	  through 201743 from branches/2016-11-28-simd-merging to revise SIMD merge
	  code

	* spanningelt.c, spanningelt.h: Merged revisions 200992 through 201743 from
	  branches/2016-11-28-simd-merging to change a calloc to a malloc

	* trunk, ax_cpuid_intel.m4, ax_cpuid_non_intel.m4, ax_ext.m4, configure.ac,
	  src, Makefile.gsnaptoo.am, cpuid.c: Merged revisions 200476 through 201735
	  from branches/2016-11-14-avx512 to make provisions for AVX-512

	* gmap.c: Merged revisions 200476 through 201735 from
	  branches/2016-11-14-avx512 to change Genome_hr_user_setup to
	  Genome_hr_setup

	* gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Merged
	  revisions 200476 through 201735 from branches/2016-11-14-avx512 to add
	  provisions for AVX-512

	* genome128_hr.c, genome128_hr.h: Merged revisions 200476 through 201735
	  from branches/2016-11-14-avx512 to add shift and wrap procedures

	* oligoindex_hr.c, oligoindex_hr.h: Merged revisions 200476 through 201735
	  from branches/2016-11-14-avx512 to revise algorithms substantially

	* oligoindex_old.c, oligoindex_old.h: Merged revisions 200476 through 201735
	  from branches/2016-11-14-avx512 to make checking code work with current
	  code

	* stage2.c: Merged revisions 200476 through 201735 from
	  branches/2016-11-14-avx512 to fix debugging comment

	* sarray-read.c: Merged revisions 200476 through 201735 from
	  branches/2016-11-14-avx512 to add AVX-512 code

	* stage1hr.c: Fixed uninitialized variable

2016-12-13  twu

	* trunk, VERSION, config.site.rescomp.prd,
	  config.site.rescomp.tst, src,
	  genome128_hr.c: Merged revisions 201421 through 201532 from
	  branches/2016-12-09-genomebits-serial-simd to change structure of SIMD
	  code in genome128_hr.c

	* index.html: Updated for version 2016-11-07

	* configure.ac: Allowing sse4.1 and sse4.2 as responses to --with-simd-level

	* samprint.c: Added missing pair of braces

	* gsnap.c, stage1hr.c, stage1hr.h: Removed references to indel_knownsplice
	  mode for gmap

2016-11-18  twu

	* oligoindex_hr.c: Fixed debugging statements to use SIMD commands in count
	  procedures

2016-11-16  twu

	* ax_ext.m4: Removed -mno... flags for compilers

	* configure.ac: Restricting response to --with-simd-level

	* ax_cpuid_intel.m4: Fixed configure issue for AVX2 support using Intel
	  compiler

2016-11-14  twu

	* pair.c: Removed initialization of static variables

	* gsnap.c, outbuffer.c, outbuffer.h, output.c, output.h: Separate output
	  files for single-end and paired-end results

2016-11-07  twu

	* sam_sort.c: Added printing at monitor intervals

	* stage3hr.c: Checking for cases where insertions and deletions extend past
	  genomicpos 0

	* samprint.c: Added preliminary code for printing extended cigar strings

	* pair.c, pair.h: Added code for printing extended cigar strings. Not
	  printing BLAST e-values.

	* indexdb.c: Removed unused statement

	* gsnap.c, uniqscan.c: Using new interface to Pair_setup

	* gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Using new
	interface to CPUID_support

	* gmap.c: Added option --sam-cigar-extended

	* cpuid.c, cpuid.h: Added provisions for AVX512

2016-10-24 twu

	* stage1hr.c: Not computing floors if querylength is less than index1part

	* pair.c: Showing blast_evalue function for GMAP

	* samprint.c: Removing assertions and aborts that do not hold for DNA-Seq
	chimeras

2016-10-23 twu

	* bitpack64-read.c, pair.c, samprint.c, stage3hr.c, stage3hr.h, substring.c,
	substring.h: Printing BLAST e-values

2016-10-22 twu

	* indexdb-write.c: For initializing counters, using packsizes from
	offsetsmeta, rather than recomputing them from offsetsstrm

	* bitpack64-write.c: Handling problem with ptri overflowing a signed int.
	Now just advancing a pointer.

	* bitpack64-read.h: Added interface for a function

	* bitpack64-access.c, bitpack64-incr.c, bitpack64-read.c,
	bitpack64-readtwo.c: Handling the case where nwritten*4 overflows an
	unsigned int. Casting it first to UINT8.

2016-10-17 twu

	* bitpack64-access.c: Fixed an increment of out in extract_28

2016-09-26 twu

	* archive.html, index.html: Updated for latest version

2016-09-23 twu

	* stage3.c: In solving dual introns, handling the case where single_gappairs
	is NULL. Added code for gmapl.

	* stage1.c: Modified debugging statements

	* pair.c: Added a check for monotonicity of query coordinates to the
	debugging procedure

	* dynprog_genome.c: If procedure is returning NULL instead of the computed
	gap pairs, then setting finalscore to be negative, so the result is not
	used by the calling procedure

	* access.c: If shm_attach fails, and using mmap instead, then not trying to
	copy a file to a read-only memory segment

	* Makefile.gsnaptoo.am: Added uint8list.c and uint8list.h for gmap

	* stage2.c: Added back find_shifted_canonical procedure as unused code

2016-09-20 twu

	* substring.h: Using sensedir as a field, instead of chimera_sensedir

	* substring.c: Substring_new can use trimmed ends to determine the sensedir.
	Using sensedir as a field, instead of chimera_sensedir

	* stage3hr.h: Stage3end_new_gmap takes sensedir_knownp as an argument

	* stage3hr.c: Stage3end_new_gmap takes sensedir_knownp as an argument, and
	can use trimmed ends to determine the sensedir. Stage3end_new_substrings
	can determine sensedir from its component substrings and junctions. For
	comparing alignments, using nmatches rather than nmatches_posttrim

	* stage3.h: Changed variable name

	* stage3.c: Removing maxpeelback restriction on peeling back for introns.
	For microexons, just transferring without checking. In comparing single
	and dual gaps, not using middle exonprob to evaluate middle exon. In
	solving dual breaks for microexons, allowing for multiple possible outer
	splice positions. Changed order of operations to smooth first, then find
	dual breaks, and then single introns.

	* stage1hr.c: Deciding separately whether to run gmap on 5' and 3' ends,
	depending on max_matches found on each end

	* pair.c: Putting macro around GSNAP-specific output code for using mate
	sensedir

	* dynprog_genome.c: Not using probabilities to determine if dinucleotide
	solution is good

	* boyer-moore.c: Added debugging statement

	* dynprog_genome.c: Removed backup algorithm for best score above a
	probability threshold. Instead, using best probability among canonical or
	semicanonical dinucleotides.

2016-09-16 twu

	* stage3.c: Solving for microexons inside of traverse_dual_break. Solving
	for dual breaks before solving introns.

	* splice.c, splice.h: Splice_trim_novel_spliceends function now returning
	new splicedir

	* stage3.c: Added intron-specific functions for peelback, to handle long
	similarity between exon ends and intron segments on the other end.
	Function for finding novel spliceends now returns new splicedir, although
	currently not used.

2016-09-15 twu

	* samprint.c: Using new interface for Substring_sensedir

	* pair.c: Printing mate sensedir, to be consistent with samprint code

2016-09-13 twu

	* stage3hr.h: Removed obsolete functions

	* stage3hr.c: Fixed cases where trim was added to amb_length. Removed
	specific amb_length fields for GMAP alignments, and calculating instead
	using trim_left_splicep and trim_right_splicep

	* stage1hr.c: Modified debugging statements

	* substring.h, substring.c: Removed an include statement

	* splice.c, splice.h: Moved splice site probability calculations from
	substring_trim_novel_spliceends to here

	* stage3.c: Fixed gmap_trim_novel_spliceends to initialize mismatchp to be
	true if the alignment does not extend to the end

	* dynprog_genome.c: Turned off debugging

	* substring.c: Added comments

	* dynprog_genome.c: Fixed bug in decision-making for using bestscore when it
	has a good probability. Previously, this switched to the
	probability-based algorithm. Renamed variables to clarify the algorithms.

2016-09-12 twu

	* stage3hr.c: For overlap calculation, using just trim, not trim plus amb
	length

2016-09-09 twu

	* stage3hr.h: Defining Stage3end_nmatches

	* stage3hr.c: Defining nmatches to be nmatches_posttrim plus amb length.
	Requiring minimum number of matches to allow a transloc splice. Favoring
	definite ambig results, plus insertlength, over definite splices or
	trimmed ambig, and then favoring definite splices over trimmed ambig.

	* stage1hr.c: Using Stage3end_nmatches instead of
	Stage3end_nmatches_posttrim to decide whether to run GMAP

	* substring.h: Defining procedures for returning nmatches and amb lengths

	* substring.c: Defining nmatches to be nmatches_posttrim plus ambiguous
	length. Computing MAPQ over trimmed region to be consistent with
	pair-based method. For new donor and acceptor substrings, extending the
	trim calculation to 0 or querylength.

2016-09-07 twu

	* stage1hr.c: Checking whether result of Stage3end_new_splice is NULL

	* stage3hr.c: Using number of matches and nmatches_posttrim in
	hit_goodness_cmp and hitpair_goodness_cmp. Requiring a minimum number of
	matches in donor and acceptor before creating a transloc splice. Added
	code for checking suffix array mismatches.

	* sarray-read.c: After finding an insertion, modifying querystart of current
	diagonal, so next substring operation starts from that position

	* indel.c: Improved debugging statements

	* bitpack64-incr.c: Fixed errors in code for transferring from bitpack sizes
	22 to 24, and from 26 to 28

2016-09-02 twu

	* gmap.c, gsnap.c, uniqscan.c: Using new interface to Indexdb_new_genome

	* splice.c: When splice is not found, return -1 as values for nmismatches

	* sarray-read.c: Allowing initial value of nmismatches to be used if it is
	0. Fixed case involving ambiguous substrings.

	* sarray-read.c: Setting nmismatches correctly in various cases, so we do
	not have to recompute them. Looking at endpoints to determine if the
	nmismatches value is correct.

2016-09-01 twu

	* indexdb.c, indexdb.h: For the option --unload-shared-memory, use
	allocation and not memory mapping to make sure we deallocate any shared
	memory

2016-08-24 twu

	* genome.c: Not accessing beyond end of blocks when enddiscard is 0

2016-08-16 twu

	* VERSION: Updated version number

	* README: Discussing MAX_STACK_READLENGTH

	* gsnap.c, uniqscan.c: Using MAX_FLOORS_READLENGTH instead of MAX_READLENGTH

	* configure.ac, Makefile.gsnaptoo.am: Using MAX_STACK_READLENGTH instead of
	MAX_READLENGTH

	* stage1hr.h: Adding max_floor_readlength to setup

	* stage1hr.c: Removed local allocation of arrays of size MAX_READLENGTH.
	Now checking querylength against MAX_STACK_READLENGTH to determine whether
	to allocate from stack or heap. Adding max_floor_readlength to setup

	* indel.c, mapq.c, sarray-read.c, splice.c: Removed local allocation of
	arrays of size MAX_READLENGTH. Now checking querylength against
	MAX_STACK_READLENGTH to determine whether to allocate from stack or heap

	* stage3hr.c: Not allowing any indels to set trims in determining optimal
	score

	* stage1hr.c: Using pre-processor macro LONG_READLENGTHS to allocate
	read-related memory on heap instead of stack. Setting spliceable_high_p
	to be false for last segment. In computing end indels, ensuring that
	shifti is not negative when looking up array value.

	* shortread.c: Using MAX_EXPECTED_READLENGTH instead of MAX_READLENGTH

	* stage3.c: When trimming ends, handling the case where the exon is empty

	* stage3hr.c: Restored setting of abort_pairing_p when nconcordant exceeds
	maxpairedpaths

	* gsnap.c, uniqscan.c: Using new interface to Pair_setup

	* indel.c, mapq.c, sarray-read.c, splice.c, substring.c: Using pre-processor
	macro LONG_READLENGTHS to allocate read-related memory on heap instead of
	stack

	* gmap.c, pair.c, pair.h: Added option --gff3-swap-phase

	* bytecoding.c: Added explanation messages to remove shared memory segments

2016-08-12 twu

	* trunk, config.site.rescomp.prd, configure.ac, src, Makefile.gsnaptoo.am,
	filestring.c, genome_sites.c, gsnap.c, pair.c, samprint.c, sarray-read.c,
	sedgesort.c, sedgesort.h, shortread.c, splice.c, stage1hr.c, stage3hr.c,
	stage3hr.h, substring.c, substring.h, univdiag.c, univdiag.h, util: Merged
	revisions 195608 to 196272 from branches/2016-08-09-genome-sites-hr, which
	contains merged revisions from branches/2016-08-02-long-read-fusions and
	2016-07-01-better-triage

	* trunk, VERSION: Updated version number

	* Makefile.gsnaptoo.am: Removed chrsubset.c and chrsubset.h for
	splicing-score

	* pair.c: Added variable to swap phase for gff3 output

	* configure.ac: Added a line to disable maintainer mode for users

	* config.site.rescomp.prd, config.site.rescomp.tst, archive.html,
	index.html: Updated for latest version

	* MAINTAINER: Added note about PATH

2016-08-08 twu

	* gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Printing both
	gene_id and gene_name

	* atoi.c, cmet.c: Fixed reduce procedures for 64-bit computers

	* Makefile.gsnaptoo.am: Added semaphore.c and semaphore.h to list of files
	for splicing-score

	* stage1hr.c: Fixed debugging statements

	* stage3.c: Fixed issue where we tried to use pairs_pretrim after path_trim
	altered the pairs

	* samprint.c, substring.c, substring.h: Fixed XT field to print correct
	junction coordinates

2016-08-02 twu

	* stage3hr.c: Restoring final procedure based on nmatches in
	Stage3pair_optimal_score

	* stage3.c: Reverting from revision 195487 to allow extraexon comps again
	and from revision 193238 to always insert dual break alignments

	* comp.h, pair.c, pairpool.c: Reverting from revision 195484 to allow
	extraexon comps again

2016-08-01 twu

	* gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Imposing
	preference order based on desired keys, rather than the text

	* src, inbuffer.c, shortread.c: Merged revisions 195492 and 195493 to fix
	problem where --force-single-end terminated when a file had reads that
	were a multiple of --input-buffer-size

	* stage3.c, comp.h, pair.c, pairpool.c: Using shortgap comp instead of
	extraexon comp for representing dual breaks

	* shortread.c: Fixed issues in reading multiple pairs of files from command
	line

2016-07-23 twu

	* atoiindex.c: Fixed calculation of oligo using new block algorithm

2016-07-11 twu

	* stage1hr.c: For paired terminals, assigning final pairtype to be concordant

	* archive.html: Exposed version 2015-07-23. Improved formatting.

	* stage3hr.c, substring.c: Handling the special case when alignstart or
	alignend is requested on an ambiguous substring

	* stage3hr.c: Computing insertlength and concordance properly for
	overlapping dual GMAP alignments

	* stage1hr.c: For dynamic programming of anchor segments, proceeding from
	closest to farthest segments, to favor shorter splice distances. Resolved
	uninitialized variable when completeset algorithm is called but
	spanningset was not called, so read_oligos was not called. Skipping
	re-alignment of hits that already have a type of GMAP.

	* sarray-read.c: Resolving ambiguous ends if one dominates by both
	probability and splice distance

	* gmap.c, gsnap.c, uniqscan.c: Using new interface to Stage3_setup

	* stage3.c, stage3.h: Adding dual break for both SAM and non-SAM output,
	needed to give the correct CIGAR starting coordinate

	* doublelist.c, doublelist.h, intlist.c, intlist.h, uintlist.c, uintlist.h:
	Implemented procedures for keeping a single item in the list

	* sarray-read.c: Among ambiguous splice segments, ranking by probability and
	selecting closest one if it is less than half the distance of the second
	one

	* substring.c, stage3hr.c: Improved debugging statements

	* sarray-read.c: Fixed value of substring1p passed to Substring_new_ambig
	for alignments on the minus strand, which resulted in problems with the
	--merge-overlap feature

2016-07-06 twu

	* stage3.c, pair.c: Restored backward movement of ptr

	* stage1hr.c: Fixed infinite loop due to circular list

	* gsnap.c: Changed default end detail to be medium

	* substring.c: Fixed standard GSNAP output for deletions, by reducing the
	number of final dashes

	* stage3hr.c: Added debugging statements

2016-06-30 twu

	* pair.c, stage3.c: No longer going backward after an indel, which could
	cause an infinite loop

	* splice.c: Using looser criteria for accepting a splice

	* sarray-read.c: Revising previous number of mismatches instead of replacing
	it

	* substring.c: Using correct memory category for substrings

	* stage3hr.c: Penalizing bad introns

	* pair.c, pair.h: Pair_nmismatches_region returning number of bad introns

	* indel.c: Improved debugging statements

	* gmap.c: Including -K for backward compatibility

	* stage1hr.c: Merged revision 193193 from branches/2016-06-29-add-listpool
	to change from lists to a vector for anchor_segments

2016-06-29 twu

	* resulthr.c, resulthr.h: Added UNPAIRED_TERMINALS result type

	* stage1hr.c: Handling unpaired_terminals. Consolidated memory allocation
	for plus and minus cases in Stage1hr_T object.

2016-06-21 twu

	* shortread.c: Made fixes for --force-single-end option to work properly

2016-06-15 twu

	* configure.ac: Added provision for user-selected SIMD level

2016-06-09 twu

	* cpuid.c: Providing more detailed information from standalone program

	* splicetrie.c: Commented out a debugging statement

	* splice.c: Restored check for sufficient splice probabilities on splices

	* sarray-read.c: Restored _pext_u32 command

	* Makefile.gsnaptoo.am, cpuid.c: Added cpuid main program

	* stage3.c: Added comment

2016-06-08 twu

	* ax_ext.m4: Added -mbmi2 flag for avx2

2016-06-03 twu

	* VERSION: Updated version number

	* stage1hr.c: Replaced constant value of 15 with
	min_distantsplicing_end_matches

	* indexdb.c: Removed sanity check on positions filesize, which can fail on
	multiple simultaneous instances of the process

	* stage1hr.c, stage3hr.c, stage3hr.h: Searching for distant splicing based
	on trim

	* sarray-read.c: Turning off AVX2-specific version of
	fill_positions_filtered_first

	* sam_sort.c: Fixed warning message. Fixed memory leak.

	* gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Improved
	warning messages

2016-05-25 twu

	* pair.c: Fixed calculation of circularpos for plus strand

2016-05-24 twu

	* trunk, VERSION, acinclude.m4, bootstrap.gsnaptoo, asm-bsr.m4, ax_ext.m4,
	builtin-popcount.m4, configure.ac, index.html, src, Makefile.gsnaptoo.am,
	cpuid.c: Merged revisions 189683 through 190434 from
	branches/2016-05-12-power8

	* uniqscan.c, gsnap.c: Using new interface to Stage3_setup

	* iit-read-univ.c: Commented out warning when IIT file cannot be read

	* gmap.c: Changed flag names to --max-intronlength-middle,
	--max-intronlength-ends, and --trim-end-exons

	* stage3.c, stage3.h: Checking for end exon length with minendexon variable

	* config.site.rescomp.dev, config.site.rescomp.tst: Added a configuration
	file for run-time checking

2016-05-17 twu

	* stage3hr.c: Fixed uninitialized variable

	* gmap.c, gsnap.c, uniqscan.c: Using new interface to Stage3_setup

	* stage3.h: Added variable maxintronlen_ends

	* stage3.c: In trimming ends, always going to intron, and not allowing
	indel. Comparing end intron length with maxintronlen_ends.

2016-05-06 twu

	* dynprog_simd.c: Computing X_prev_nogap correctly for the case of zero gap
	penalty

	* stage1hr.c: Modified debugging statements

	* stage2.c: Added assertions

	* acinclude.m4, ax_cpuid_intel.m4, ax_cpuid_non_intel.m4, ax_ext.m4,
	configure.ac: Writing own cpuid configure checks based on same code as in
	src/cpuid.c

2016-05-01 twu

	* trunk, src, dynprog.c, dynprog.h, dynprog_genome.c, gmap.c, gsnap.c,
	pair.c, pair.h, sarray-read.c, splice.c, stage1hr.c, stage3.c, stage3.h,
	stage3hr.c, stage3hr.h, uniqscan.c: Merged revisions 188721 through 188751
	from branches/2016-04-29-improve-alignments

	* trunk, Makefile.gsnaptoo.am: Property changes

	* VERSION, config.site.rescomp.prd: Updated version number

	* config.site.rescomp.tst: Added sanitize flag

2016-04-29 twu

	* papers: Removed papers directory from SVN

	* src, Makefile.gsnaptoo.am, gmap.c, pair.c, stage3.c, translation.c,
	translation.h: Merged revisions 188558 to 188717 from
	branches/2016-04-27-alt-codons to allow for alternate genetic codes

2016-04-20 twu

	* archive.html, index.html: Updated for latest version

	* stage3.c: Not allowing any ambiguous matches at 3' or 5' ends when trimming

	* datadir.c: Modified comments

	* datadir.c: In find_fileroot, showing preference if <dbroot>.version is
	found. Otherwise, handling the case where multiple .version files are
	found.

2016-04-04 twu

	* archive.html, index.html: Revised for latest version

	* splice.c: Checking for more than 10% mismatches in either end. Using
	value of min_shortend in Splice_resolve_sense and Splice_resolve_antisense.

	* sarray-read.c: Modified debugging statements

2016-03-30 twu

	* gmapindex.c: Not creating altscaffold IIT file if no alt scaffolds are
	observed

	* gmap.c, uniqscan.c: Using new interface to Univ_IIT_altlocp

	* VERSION: Updated version number

	* index.html: Updated for latest version

	* stage3hr.c: Removed low_alias and high_alias fields. Using altlocp,
	alias_starts, and alias_ends.

	* resulthr.c: Using npaths_primary and npaths_altloc

	* gsnap.c, iit-read-univ.c, iit-read-univ.h: Reading altloc IIT file

2016-03-29 twu

	* sam_sort.c: Added option --restore-orig-order

	* samprint.c: Removed print statement

	* iit-read.c: Trying to add .iit suffix first

	* stage3hr.c: Turning off DISTANT_SPLICE_SPECIAL, so we can find distant
	splices. For substrings, updating found_score only when the new one is
	better. Using nmismatches_whole for score field.

	* sarray-read.c: Fixed debugging statements

2016-03-17 twu

	* gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Handling
	Parent and ID fields in exon and CDS types of recent NCBI gff3 files.
	Handling new transcript types.

	* pair.c: Changed occurrences of abs() to explicit conditional statements,
	since abs() can give large integers with -m64 compiler flag

	* stage2.c: Added parentheses

	* gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Added option --max-anchors

	* samread.c: Added debugging statements

2016-02-19 twu

	* VERSION: Updated for latest version

	* index.html: Updated for latest version

	* stage3hr.c: Turned off debugging

	* stage1hr.c, stage3hr.c, substring.c: Fixed query coordinates for salvage
	terminal procedure on minus strand

2016-02-18 twu

	* stage3hr.c: Checking for stage2pairs being NULL when running GMAP on
	substrings or previous GMAP

	* gmap_build.pl.in: Restored removal of fasta_sources and coordsfile
	temporary files

	* gmap_build.pl.in: Added quotes around bindir programs

	* stage3hr.c: Fixed creating stage 2 pairs for circular chromosomes

	* stage3.c: Fixed debugging statements

2016-02-17 twu

	* indel.c: Require more matches on both ends than the length of the insertion

	* indel.c, oligoindex_hr.c, sarray-read.c, stage1hr.c, stage3hr.c,
	stage3hr.h, substring.c, substring.h: Removed genomiclength as a field
	from Substring_T objects. Fixed overflow bug for large insertion
	substrings.

	* sarray-read.c: Fixed code for SSE2 compilation

	* stage3hr.c: Removed assertion, which is not valid

	* indexdb.c: Handling stderr message for single sequence, where number of
	seconds is not defined

	* stage3hr.c: Changed assertion to handle large genomes

	* stage1hr.c, smooth.c, stage3.c, translation.c, translation.h, gmap.c,
	gregion.c, gregion.h, stage1.c, stage1.h, sarray-read.c, splice.c,
	splice.h: Removed unused variables and parameters

	* splicetrie_build.c, sarray-write.c, indexdb.c, indexdb-write.c,
	gmapindex.c, dynprog_single.c: Removed unused variables

	* splicetrie.c: Using new interface to dynprog procedures

	* pbinom.c, outbuffer.c, iit-write.c, get-genome.c: Hiding unused procedures

	* pair.c, indel.c, indel.h, sarray-read.c, sarray-read.h, stage1hr.c,
	stage3.c, dynprog_single.c, dynprog.c, dynprog_cdna.c, dynprog_cdna.h,
	dynprog_end.c, dynprog_end.h, dynprog_genome.c, dynprog_simd.c,
	dynprog_simd.h: Removed unused parameters

	* output.c, stage3.h: Removed unused parameters in Stage3_print_sam

	* oligoindex_hr.c: Fixed comparison between unsigned and signed values

	* genome128_hr.c: Hiding procedures specific to GSNAP

	* dynprog_cdna.c, dynprog_end.c, dynprog_end.h, dynprog_genome.c,
	dynprog_single.c, dynprog_single.h: Changed check for HAVE_SSE4_1 or
	HAVE_SSE2 to just HAVE_SSE2

	* compress.c, bitpack64-readtwo.c: Put macros around a variable

	* stage3hr.c: Using new interfaces to stage 2 procedures

	* splicetrie.c: Using new interfaces to dynprog procedures

	* gmap.c, stage1hr.c, stage2.c, stage2.h, stage3.c: Removed stage2_source
	and stage2_indexsize as return values from procedures

	* stage3hr.c: Fixed comparisons of signed and unsigned integers

	* gmap.c, stage3.c, stage3.h: Removed stage2 and stage3 benchmarking fields
	from Stage3_T object

	* output.c, samprint.c, samprint.h, chimera.c, pair.c, pair.h: Removed
	unused variables and parameters from pair procedures

	* dynprog_single.c, dynprog.c, dynprog.h, dynprog_cdna.c, dynprog_end.c,
	dynprog_genome.c, dynprog_simd.c, pairpool.c, pairpool.h: Reduced
	parameters for Pairpool_add_genomeskip and Dynprog_traceback_std

	* gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h, uniqscan.c: Putting
	subopt_levels into Stage3hr_setup. Removing cutoff_level as parameter
	from optimal score procedures

	* stage1hr.c, stage3hr.c, stage3hr.h: Removed unused parameters from display
	and eval procedures

2016-02-16 twu

	* sarray-read.c, stage1hr.c, stage3hr.c, stage3hr.h: Removed unused
	parameters for stage3hr procedures

	* sarray-read.c, splice.c, stage1hr.c, stage3hr.c, stage3hr.h: Removed
	unused parameters and variables

	* stage1hr.c, stage3hr.c, stage3hr.h: Using new interfaces to functions
	without first_read_p

	* mapq.c, mapq.h, bitpack64-readtwo.c: Hiding unused functions

	* bitpack64-read.c: Put correct macros around variable

	* indel.c, splice.c: Using new interfaces to procedures without first_read_p

	* sarray-read.c, substring.c, substring.h: Removed unused field first_read_p

	* samprint.c: Removed unused parameter concordant_chrpos

2016-02-13 twu

	* gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Removed unused variables and
	parameters in stage1hr.c

2016-02-12 twu

	* gmap.c, gsnap.c, indexdb_hr.c, oligo.c, sarray-read.c, sarray-read.h,
	stage1hr.c, stage2.c, stage2.h, stage3.c, stage3hr.c: Removed unused
	variables and parameters from sarray procedures

	* genome-write.c, genome-write.h, gmapindex.c: Removed altstrain_iit as a
	parameter to Genome_write_comp32

	* genome_sites.c, sequence.c: Hiding unused function

	* genome_sites.c, genome_sites.h, access.c: Hiding unused functions

	* genome128_hr.c, genome128_hr.h, indel.c, mapq.c, sarray-read.c, splice.c,
	splicetrie.c, stage1hr.c, stage3hr.c, substring.c: Removed first_read_p as
	a parameter from all genome128 procedures

	* compress.c: Removed unused variables

	* iit_get.c: Removed unused variables and parameters

	* parserange.c, iit-read-univ.c: Removed unused variable

	* iit-read.c, iit-read.h, stage3.c: Removed map_bothstrands_p as a parameter
	to IIT_print_header

	* iit-read-univ.c, iit-read.c, iit-read.h, iit_get.c: Removed sortp as a
	parameter from IIT_get_values routines

	* get-genome.c, gmap.c, gsnap.c, iit-read.c, iit-read.h, iit_dump.c,
	iit_fetch.c, iit_get.c, parserange.c, snpindex.c, uniqscan.c: Removed
	labels_read_p as a parameter from IIT_read

	* access.c: Restored check for number of attached processes when deallocating

	* iit-read.c, iit-read.h, splicing-scan.c: Removed parameter annotationonlyp
	from IIT_dump

	* genome128.c, gmapindex.c, sarray-read.c, snpindex.c, access.c, access.h,
	atoiindex.c, cmetindex.c, genome.c, iit-read-univ.c, iit-read.c,
	indexdb-write.c, indexdb.c, sarray-write.c: Removed eltsize as an argument
	to Access_mmap routines

	* gsnap.c, stage3hr.c, stage3hr.h, uniqscan.c: Added --end-detail flag

	* access.c: No longer printing long string of periods and commas during
	pre-load

2016-02-09 twu

	* gmap.c, gsnap.c: Added message to remove shared memory manually

	* access.c: Removed warning message

	* spanningelt.c: Removed debugging code

	* spanningelt.c: For debugging purposes

	* src, atoiindex.c, cmetindex.c, gmapindex.c, indexdb.c, indexdb_hr.c,
	sarray-write.c: Using new interface to Access_allocate_private

	* sarray-read.c: Using new interface to Access_allocate_shared and
	Access_allocate_private. Removed code for USE_CSA.

	* genome.c, indexdb.c, indexdb.h, indexdbdef.h: Using new interface to
	Access_allocate_shared and Access_allocate_private

	* access.c, access.h: If shared allocation fails, now using memory mapping
	if possible. Setting access variable.

	* gsnap.c, gmap.c, uniqscan.c: Using new interface to Access_setup

	* dynprog_cdna.c: Changed variable initialization

	* semaphore.c: Changed variable names

	* access.c: Storing all semaphore IDs, and looking at their
	resident/freeable status. Handling emergency stops better.

2016-02-08 twu

	* Makefile.gsnaptoo.am, access.c, semaphore.c, semaphore.h: Put semaphore
	commands in a separate file. Fixed small bugs with deleting semaphores
	and shared memory.

2016-02-05 twu

	* stage2.c: Removed unnecessary calls to abs(). Replaced with a comparison
	between gendistance and querydistance.

	* shortread.c: Using size_t instead of unsigned long long

	* bitpack64-write.c, indexdb_hr.c, junction.c, pair.c, parserange.c,
	sarray-write.c, splicetrie_build.c, substring.c, uint8list.c: Using %llu
	for formatting instead of %u

	* access.c, access.h, genuncompress.c, iit-read-univ.c, iit-read.c,
	indexdb-write.c, indexdb.c, sam_sort.c: Changed off_t to size_t for
	filesize

	* gsnap.c: Removed testing code

	* gsnap.c: For testing purposes

2016-02-04 twu

	* pairpool.c: Fixed assertion on genomepos

	* stage3hr.c: Fixed computation of minus chromosome coordinates for circular
	chromosomes

2016-02-03 twu

	* gmap.c: Creating altlocp, alias_starts, and alias_ends for user-provided
	genomic segment

	* coords1.test.ok: Revised for alternate genomic contigs

	* gsnap.c: Allowing for npaths_primary and npaths_alternate. Letting
	insertion length be arbitrarily long when user does not specify
	--max-middle-insertions.

	* gmap.c, stage3hr.h, stage3.h: Allowing for npaths_primary and
	npaths_alternate

	* uniqscan.c: Using new interfaces to functions

	* substring.h, substring.c: Trimming novel spliceends for substrings

	* stage3hr.c: Allowing for npaths_primary and npaths_alternate. Changed
	logic for extending substrings using GMAP. Implemented extension of GMAP
	alignments.

	* stage3.c: In find_novel_spliceends, using trim lengths at ends to define
	two regions, one with a stronger and one with a weaker criterion for
	splice site probability.

	* stage2.c, stage2.h: Implemented Stage2_compute_starts and
	Stage2_compute_ends for extending ends of alignments. Fixed a condition
	for termination of while loop.

	* stage1hr.h: Allowing for npaths_primary and npaths_alternate. Allowing
	for arbitrarily long insertions when --max-middle-insertions is not set by
	user.

	* stage1hr.c: Allowing for npaths_primary and npaths_alternate. Making call
	to extend gmap alignments. Allowing for arbitrarily long insertions when
	--max-middle-insertions is not set by user

	* sarray-read.h, sarray-read.c: Allowing for arbitrarily long insertions
	when --max-middle-insertions is not set by user

	* samprint.c, samprint.h: Allowing for npaths_primary and npaths_alternate.
	Added parameter for artificial mate in --add-paired-nomappers.

	* pairpool.c: Added assertion for genomepos

	* pair.c, pair.h: Allowing for npaths_primary and npaths_alternate. Added
	function Pair_trim_distances, used by new find_novel_spliceends function
	for pairs.

	* output.c: Allowing for npaths_primary and npaths_alternate. Using new
	interface using artificial_mate_p.

	* gmapindex.c: Allowing for alternate scaffolds

	* dynprog_simd.c: Calling correct printing procedures for debugging

	* dynprog_genome.c: Requiring finalscore to be >= 0

	* Makefile.gsnaptoo.am: Added parserange.c and parserange.h for sam_sort

2016-01-15 twu

	* src, result.c, result.h, resulthr.c, resulthr.h: Merging revision 182439
	  from branches/2014-09-04-secondary-chr to handle npaths_primary and
	  npaths_altloc

	* src, iit-read.c, iit_store.c: Merging revision 182435 to use new interface
	  to Chrom_from_string

	* src, parserange.c: Merging revision 182431 from
	  branches/2014-09-04-secondary-chr to check if contig_iit is NULL

	* Makefile.gsnaptoo.am: Merging revisions 162111 and 182429 from
	  branches/2014-09-04-secondary-chr to add tableint.c, tableint.h,
	  parserange.c, and parserange.h where needed

	* chrom.c, chrom.h: Merged revisions 162112 and 182427 from
	  branches/2014-09-04-secondary-chr to add fields for alt_scaffold_start and
	  alt_scaffold_end

	* table.c, tableint.c, tableuint.c, tableuint8.c: Merged revision 162110
	  from branches/2014-09-04-secondary-chr to fix memory leak

	* iit-read-univ.c, iit-read-univ.h: Merged revision 182424 from
	  branches/2014-09-04-secondary-chr to add function Univ_IIT_altlocp

	* util, fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Merged
	  revisions 146896 through 182422 from branches/2014-09-04-secondary-chr

	* index.html: Updated to latest version

	* archive.html: Added revision for 2014-12-31.v2

	* stage1hr.c: Fixed array overflow in segmentation procedure

2016-01-14 twu

	* uniqscan.c: Using new interface to Stage3hr_setup

	* gsnap.c, stage3hr.c, stage3hr.h: Distinguishing between pairmax_linear and
	  pairmax_circular

	* pair.c: Removed SOFT_CLIPS_AVOID_CIRCULARIZATION code in computing
	  circularpos, since it isn't needed

	* substring.c, substring.h: Defining Substring_mandatory_trim_left and
	  Substring_mandatory_trim_right.

	* stage3hr.c: Turning SOFT_CLIPS_AVOID_CIRCULARIZATION back on. Using
	  Substring_mandatory_left_trim and Substring_mandatory_right_trim.

2016-01-13 twu

	* gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Checking
	  for ENOENT instead of EACCES

	* substring.c: Handling trim_left_action and trim_right_action, instead of
	  trim_left_p and trim_right_p

	* substring.h, stage3hr.c: Using trim_left_action and trim_right_action,
	  instead of trim_left_p and trim_right_p

2016-01-12 twu

	* gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Returning
	  return code from execvp

	* gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Handling
	  case where no path is given, by using execvp to find the correct program

	* stage3hr.c, substring.c: Computing coordinates correctly in salvage
	  procedure for terminal alignments with too many mismatches

2016-01-11 twu

	* iit-read.c: Removed debugging code which was not adding .iit ending to
	  file suffix

2016-01-08 twu

	* dynprog_cdna.c, dynprog_end.c, dynprog_genome.c, dynprog_single.c: Added
	  parameters for Dynprog_standard in non-SIMD code

	* samprint.c, stage3hr.c, stage3hr.h: Changed name of function from
	  Stage3end_substring2 to Stage3end_substringN

	* gsnap.c, sarray-read.c, sarray-read.h: Not allowing for ambiguous splicing
	  on circular chromosomes

	* dynprog_single.c: Added assertion

	* dynprog_genome.c: Checking for the case where no intron is found in a gap

2016-01-07 twu

	* stage1hr.c: Removed extra debugging code

	* sarray-read.c, stage3hr.c: Handling Junction_gc from within
	  Stage3end_new_substrings

	*
	  stage1hr.c: Defining MAX_ANCHORS instead of EXHAUSTIVE_ANCHORS. Keeping
	  track of both all_segments and anchor_segments, and using whichever
	  satisfies MAX_ANCHORS.

	* dynprog_cdna.c, dynprog_end.c, dynprog_genome.c, dynprog_simd.h,
	  dynprog_single.c: Replacing DEBUG14 and DEBUG16 with DEBUG_SIMD and
	  DEBUG_AVX2, respectively

	* dynprog.c: Allocating space needed for AVX2 debugging

	* dynprog.h: Providing SIMD variables for non-AVX2 debugging procedures

	* dynprog_simd.c: Fixed SIMD variables in non-AVX2 debugging procedures

	* dynprog_simd.c: Added code for AVX2

	* dynprog_cdna.c, dynprog_end.c, dynprog_genome.c, dynprog_simd.h: Passing
	  debugging parameters for both SIMD and AVX2 debugging

	* dynprog.c: Generalized allocation procedures to use ALIGN_SIZE

	* stage3hr.c: In Stage3end_optimal_score and Stage3pair_optimal_score,
	  turning off comparison of score_eventrim with cutoff_level. In
	  Stage3end_new_terminal, if number of mismatches between pos5 and pos3
	  exceeds number allowed, then recomputing pos5 or pos3 that does fit within
	  the number allowed.

	* stage1hr.c: Allowing for an exhaustive set of anchor segments

2015-12-19 twu

	* types.h: Revised comment

	* dynprog.h: Moved definitions of infinite gap penalties here

	* dynprog.c: Adjusting for negative infinity in last_nogap in F loop

	* atoi.c, cmet.c: Added types to make sure 64 bits are used

2015-12-18 twu

	* dynprog_simd.c: Fixed initial conditions for all three types of initial
	  gap penalty

2015-12-15 twu

	* dynprog.c: Added initialization for ZERO_INITIAL_GAP_PENALTY. Made fixes
	  for initial Fgap calculation for standard initial gap penalty, by
	  initializing last_nogap appropriately.

	* dynprog_simd.c: Using a filter on lband for the E calculation for the
	  first column when using a standard initial gap penalty

	* dynprog_end.c: Added function needed for debugging

	* dynprog_simd.c: For standard initial gap penalty, revising extend_ladder
	  for first column of values, in second and later blocks.

	* dynprog_simd.c: Implemented code for ZERO_INITIAL_GAP_PENALTY and
	  INFINITE_INITIAL_GAP_PENALTY. Added filters on lband for first column of
	  values in ZERO_INITIAL_GAP_PENALTY.

2015-12-11 twu

	* dynprog.c, dynprog.h: Added upperp and lowerp parameters to
	  Dynprog_standard, to give it the same behavior as the SIMD upper and lower
	  procedures

2015-12-10 twu

	* VERSION: Updated version number

	* oligoindex_hr.h: Removed duplicate definition of Shortoligmer_T

	* trunk, config.site.rescomp.tst, src, Makefile.gsnaptoo.am,
	  Makefile.pmaptoo.am, alphabet.c, alphabet.h, atoi.h, bitpack64-read.c,
	  bitpack64-read.h, block.c, block.h, cmet.h, compress-write.c, dynprog.c,
	  dynprog_single.c, gmap.c, gmapindex.c, indexdb-write.c, indexdb-write.h,
	  indexdb.c, indexdb.h, oligoindex.h, oligoindex_hr.c, oligoindex_pmap.c,
	  oligoindex_pmap.h, oligop.c, oligop.h, pair.c, pmap_select.c, pmapindex.c,
	  stage1.c, stage2.c, stage2.h, stage3.c, types.h: Merged revisions 179384
	  through 180698 from branches/2015-11-20-pmap

2015-12-09 twu

	* indexdb.c: Defining blocksize for PMAP

	* stage3hr.c: Adjusting cutoff levels in Stage3end_pair_up_concordant to
	  consider the best alignment on each end, if has mismatches that exceed the
	  given cutoff level. Added a field distant_splice_p, and using that for
	  filtering, rather than chrnum == 0.

	* output.c, samprint.c, samprint.h: Printing NH to be the maximum of the two
	  npaths in --add-paired-nomappers option

2015-12-07 twu

	* trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
	  Makefile.gsnaptoo.am, atoi.c, atoi.h, atoiindex.c, bitpack64-access.c,
	  bitpack64-access.h, bitpack64-incr.c, bitpack64-incr.h, bitpack64-read.c,
	  bitpack64-read.h, bitpack64-readtwo.c, bitpack64-readtwo.h,
	  bitpack64-write.c, bitpack64-write.h, block.c, block.h, cmet.c, cmet.h,
	  cmetindex.c, gdiag.c, genome_sites.c, gmap.c, gmapindex.c, gsnap.c,
	  indexdb-write.c, indexdb-write.h, indexdb.c, indexdb.h, indexdb_hr.c,
	  indexdb_hr.h, oligo.c, oligo.h, oligoindex_hr.c, sarray-read.c,
	  sarray-write.c, snpindex.c, spanningelt.c, spanningelt.h,
	  splicetrie_build.c, stage1.c, stage1hr.c, types.h: Merged revisions 179335
	  through 180340 from branches/2015-11-20-16mers to allow for genomic
	  indices up to 18-mers

	* index.html: Updated for latest version

2015-12-04 twu

	* samprint.c: Fixed printing under --add-paired-nomappers for unpaired
	  multiple alignments where npaths2 > npaths1

2015-12-02 twu

	* stage2.c: In looking back, skipping positions where no hits were found

2015-12-01 twu

	* configure.ac: Building only one level of SIMD

	* ax_ext.m4: Added "no" flags for various SIMD levels

2015-11-19 twu

	* gsnap.c: Fixed incorrect check on floating-point values for --min-coverage

	* gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Picking
	  best available program at run-time

	* stage1hr.c: In computing gmap using segments, introducing a min_genomepos
	  and max_genomepos

	* samread.c: Fixed incorrect parsing needed by sam_sort, resulting from
	  missing break commands in case statements

	* samprint.c: Fixed some issues with --add-paired-nomappers
	  option

	* inbuffer.c: Initializing variables

	* pair.c, stage3hr.c: Excluding alignments to circular chromosomes that
	  extend below the first copy or above the second copy

	* atoiindex.c, cmetindex.c, gmapindex.c: Using new interface to suffix array
	  write procedures

	* sarray-write.h, sarray-write.c: Using a list of Cell_T objects to compute
	  exceptions

	* bytecoding.c, bytecoding.h: Implemented procedures for writing bytes file
	  and interleaving bytes files.

	* Makefile.gsnaptoo.am: Removed uinttable.c and uinttable.h from programs
	  with bytecoding.c

2015-11-13 twu

	* Makefile.gsnaptoo.am: Added uinttable files needed by bytecoding

2015-10-29 twu

	* stage3hr.c: Turning off SOFT_CLIPS_AVOID_CIRCULARIZATION. Handling
	  deletions on minus strand that go beyond genomic position 0.

2015-10-06 twu

	* Makefile.gsnaptoo.am: Added "=1" to some pre-processor flags

	* access.h: Added LOADED type for cases where IIT file is loaded from memory
	  instead of being read from file

	* iit-write.c: Adding padding to character arrays in IIT file, so the
	  integer arrays are aligned.

	* iit-read.c, iit-read.h: Added IIT_load function to obtain IIT from a
	  region of memory

2015-09-28 twu

	* oligoindex_hr.c: Handling the case where left_plus_length < indexsize

2015-09-24 twu

	* gmap.c, gsnap.c: Revised explanation message for illegal instructions

	* datadir.c: Increased buffer size for dbversion file name

	* access.c: Looping until we get a semaphore, either by creation or by using
	  existing one

2015-09-21 twu

	* stage3hr.c: Using nsegments field for all alignment types. Filtering
	  results if nsegments is relatively high compared with the best result.

	* stage3.c: Using higher standard for microexons.
	  Using nmatches instead of support for sufficient_splice_prob.

	* stage1hr.c: Handling case of paired-end alignments where both ends do not
	  satisfy minimum coverage. Fixed debugging statements.

	* splice.c: Using correct variable names inside FREEA calls

	* pair.c, pair.h: Implemented Pair_maxnegscore

	* dynprog_genome.c: Revised bridge_intron_gap procedures to make reasoning
	  clearer. Using weaker values for scoreI. Using maxnegscore to filter out
	  bad alignments.

2015-09-16 twu

	* stage1hr.c: Removed debugging comment

	* stage1hr.c, stage3hr.c: Fixed calls to Genome_get_segment_blocks_left,
	  where we were providing the left coordinate instead of the right one.

	* genome.c: Added comment

2015-09-11 twu

	* cpuid.c: Added to repository

	* uniqscan.c: Using new interfaces to stage 1 procedures

	* substring.c, substring.h: Removed reject_trimlength

	* stage3hr.c, stage3hr.h: Added procedures for filtering by coverage

	* stage1hr.h, gsnap.c: Added --min-coverage and removed --terminal-threshold

	* stage1hr.c: Added --min-coverage. Changed criteria for running
	  find_terminals.

	* gmap_build.pl.in: Allowing for spaces in destination directory

2015-09-08 twu

	* stage3hr.c: Fixed debugging statement

2015-09-01 twu

	* gmap.c: Setting some uninitialized variables for chimera

	* chimera.c: When chimeric breakpoint is beyond chromosomal bounds,
	  returning NN for dinucleotides

	* stage3hr.c: Favoring non-zero sensedirs when sorting results

	* splice.c: Fixed variable names for FREEA command, which was observed by
	  compiler only when alloca is not available.

	* oligoindex_hr.c: Initializing some return variables when exiting trimming
	  procedure early

	* chimera.c, chimera.h, gmap.c, pair.c: Fixed issues when chimeras extend to
	  beginning or end of chromosomes, causing a search for donor and acceptor
	  nucleotides beyond chromosomal bounds. Fixed Pair_pathscores to extend to
	  last pair of path.

	* dynprog_genome.c: Resolved fatal bug in bridging intron gaps when no
	  probabilities are found

2015-08-31 twu

	* stage1.c: No longer using alloca for array of Batch_T objects

2015-08-28 twu

	* gmap.c: Changed MALLOC of array to MALLOC_OUT. Fixed code for memusage.

	* outbuffer.c: Changed FREE of outputs to FREE_KEEP

	* stage3.c: Using known splices in pick_cdna_direction. Changed MALLOC of
	  Stage3_new to MALLOC_OUT.

2015-08-27 twu

	* stage3.c: Changed criterion for evaluating splice neighborhood to allow
	  for short ends

2015-08-24 twu

	* sarray-read.c: Using max_mismatches_allowed from original call to suffix
	  array algorithm, and not allowing it to be unlimited

	* splice.c: Added debugging statements

	* genome.c: Setting end of genomealt string to be NULL

	* dynprog_genome.c: Changed loop end condition to avoid accessing
	  uninitialized variables

	* stage3hr.c: Calling Genome_get_segment_blocks_left with chroffset and not
	  chrhigh. Setting resolve value to be -1 in case of AMB_UNRESOLVED_TOOCLOSE.

2015-08-19 twu

	* gsnap.c: Removed extraneous space after newline

	* access.c: Added header file

2015-08-13 twu

	* gsnap.c: Using new interface to SAM_setup

	* oligoindex_hr.h: Changing Inquery_T types to be unsigned

	* oligoindex_hr.c: Removed old code that caused counts to be incremented
	  twice

	* access.c, access.h, atoiindex.c, cmetindex.c, genome.c, gmapindex.c,
	  indexdb-write.c, indexdb.c, sarray-read.c, sarray-write.c, snpindex.c:
	  Replaced Access_allocate with Access_allocate_private and
	  Access_allocate_shared

	* gsnap.c, samprint.c, samprint.h: Added option
	  --paired-flag-means-concordant

2015-08-11 twu

	* genome.c, indexdb.c, indexdbdef.h, sarray-read.c: Storing keys from shared
	  memory to check semaphores to see if the memory should be retained

	* access.c, access.h, gsnap.c: Added code for preloading and unloading of
	  shared memory

	* configure.ac: No longer offering options to enable or disable CPU types

	* ax_ext.m4: Various changes to handle CPU types and features

	* oligoindex_hr.c, oligoindex_hr.h: Created Inquery_T type to handle both
	  SSE2 and non-SSE code

	* genome128_hr.c: Fixed branch for clz/ctz for SSE4.2

	* configure.ac: Removing mpi for now

2015-08-10 twu

	* ax_ext.m4: Requiring BMI2 as part of AVX2

	* sarray-read.c: Allowing stream_load of si128 only for HAVE_SSE4_1

	* genome.c, genome.h, indexdb.c, indexdb.h, sarray-read.h: Commented unused
	  procedures for shmem_remove

	* trunk, Makefile.am, acinclude.m4, ax_compiler_vendor.m4, ax_ext.m4,
	  config.site.rescomp.tst, configure.ac, src, Makefile.gsnaptoo.am, cpuid.h,
	  genome128_hr.c, gmap.c, gmap_select.c, gmapindex.c, gmapl_select.c,
	  gsnap.c, gsnap_select.c, gsnapl_select.c, popcount.c, popcount.h,
	  sarray-read.c: Merged revisions 171384
	  through 171613 from
	  branches/2015-08-06-run-time-variants to allow for run-time variants

	* trunk, VERSION, config.site.rescomp.tst, index.html, src, oligoindex_hr.c,
	  oligoindex_hr.h, sarray-read.c: Merged revisions 170634 to 171595 to add
	  code for AVX2

2015-08-05 twu

	* oligoindex_hr.c: Changed debugging statements

2015-08-04 twu

	* oligoindex_hr.c: Added debugging statements

2015-08-03 twu

	* gsnap.c, samprint.c, samprint.h: Added flag --add-paired-nomappers

2015-07-28 twu

	* oligoindex_hr.c: Restored missing line in counting of 9-mers

2015-07-23 twu

	* VERSION: Updated version number

	* stage1hr.c: Removed an abort command from debugging

	* sarray-read.c: Using new interface to Bytecoding lcp_next function.
	  Commented out code that is not used when SUBDIVIDE_ENDS is not defined.

	* bytecoding.c, bytecoding.h: Call to lcp_next now returns child_next

2015-07-22 twu

	* VERSION: Updated version number

	* dynprog_genome.c: Fixed boundaries that led to negative coordinates for
	  splice site candidates.

	* stage1hr.c: Removed unused variables

	* stage1hr.c: Removed allvalidp as parameter to align_end and align_pair.

	* stage1hr.c: Setting spanningsetp and completesetp to false if querylength
	  < min_kmer_readlength

	* stage1hr.c: Removed restriction on min_readlength. Running only suffix
	  array, if possible, if reads are too short.

	* access.c: Changed user message

	* sarray-write.c: Changing plcp[n] to be 0 instead of -1

	* sarray-read.c: Improved debugging results

	* access.c: Printing user message if shmem fails

2015-07-17 twu

	* get-genome.c, sequence.c, sequence.h: Added flags for --stream-chars and
	  --stream-ints

2015-06-26 twu

	* trunk, Makefile.gsnaptoo.am, 2015-statgen, algorithm.tex, discussion.tex,
	  features.tex, introduction.tex, util: Modified mergeinfo

	* config.site.rescomp.tst: Updated version

	* index.html: Updated for version 2015-06-23

	* archive.html: Updated for version 2014-12-31

	* README: Removed references to Goby

	* src, access.c, bigendian.c, bigendian.h, bitpack64-access.c,
	  bitpack64-read.c, bitpack64-readtwo.c, bytecoding.c, compress.c,
	  compress.h, genome-write.c, genome.c, genome.h, genome128_hr.c,
	  iit-read-univ.c, indexdb.c, indexdb_hr.c, sarray-read.c, sarray-write.c,
	  snpindex.c, types.h, univinterval.h: Merged revisions 167282 through
	  168383 from branches/2015-06-10-bigendian to support bigendian
	  architectures

	* Makefile.dna.am, Makefile.util.am: Added instructions for check-bigendian

2015-06-23 twu

	* VERSION, config.site.rescomp.tst: Updated version number

	* algorithm.tex, biblio.bib, discussion.tex, features.tex, introduction.tex,
	  toplevel.tex: Final version

	* stage1hr.c: Added comments

	* gmap.c: Removed message about different batch levels

	* gsnap.c: Added option --master-is-worker for MPI version

	* access.c: Using malloc whenever shmget fails

2015-06-15 twu

	* stage1hr.c: Removed extra #endif statements

	* trunk, VERSION, config.site.rescomp.tst, Makefile.gsnaptoo.am,
	  2015-statgen, Ambiguous-splicing.eps, Hierarchical-GMAP.eps,
	  Large-hash-table.eps, Overlapping-alignment.eps,
	  biblio.bib, toplevel.tex, util: Updated version number

	* stage1hr.c: Fixed indentation

	* src, genome.c, genome128_hr.c, gmap.c, gsnap.c, indexdb.c, mode.h,
	  sarray-read.c, stage1hr.c, substring.c, uniqscan.c: Merged revisions
	  165630 through 167691 from branches/2015-05-13-ttoc to implement ttoc mode

	* splice.c: Applied revision 167580 from releases/public-2014-12-17. In
	  group_by_segmenti_aux and group_by_segmentj_aux, checking plusp for each
	  individual hit in deciding whether to group donor or acceptor.

	* bitpack64-readtwo.c: Added debugging statements

	* sarray-read.c: Defining a variable for debugging

	* oligoindex_hr.c: Defining reverse_nt for machines without SSE4.1

2015-06-11 twu

	* stage3hr.c: Changed occurrences of Uintlist_next to Uint8list_next for
	  LARGE_GENOMES

	* oligoindex_hr.c: Providing alternative to _mm_extract_epi32 for machines
	  without SSE4.1

	* acinclude.m4, shm-flags.m4, configure.ac, access.c: Including check for
	  SHM_NORESERVE

	* Makefile.gsnaptoo.am: Removed -lrt

	* sarray-read.c: Initializing chromosome values to be those for chrnum 1 to
	  handle left == 0

2015-06-10 twu

	* VERSION, index.html: Updated version number

	* sarray-write.c, gmapindex.c: Removing rankfile

	* gmap_build.pl.in: Changed flag from --no-sarray to --build-sarray

	* atoiindex.c, cmetindex.c: Added flag --build-sarray

2015-06-09 twu

	* indel.c: Added debugging statements

	* stage1hr.c: Bypassing gmap on region if mappingend is less than or equal
	  to mappingstart, which can happen if the region is pushed to the beginning
	  or end of the chromosome

	* stage3hr.c: Assigning loop variable to given junctions before we push
	  left_ambig

2015-06-06 twu

	* stage3.c: Reversed last revision, and put trim_novel_spliceends at
	  beginning of path_trim, since putting it at the end results in an infinite
	  loop

	* stage3hr.c: Added debugging statement

	* stage3.c: Moved trimming of novel spliceends from beginning of path_trim
	  procedure to end

	* pair.c: Fixed computation of circularpos for minus alignments

2015-06-05 twu

	* samprint.c: Removed unused variables

	* stage3hr.c: In printing translocations, getting separate chrs for the two
	  halves. Turned on TRANSLOC_SPECIAL.

	* samprint.c: In printing halfdonors and halfacceptors, comparing endlengths
	  to trimlengths to determine whether to print H or S in CIGAR string

	* samprint.c: Fixed printing of CIGAR strings for minus alignments

2015-06-04 twu

	* stage1hr.c: Added lowpos and highpos to Segment_T object. Rewrote dynamic
	  programming procedures for converting segments to pairs.

2015-06-03 twu

	* stage1hr.c: In converting segments to GMAP, changed criteria for dynamic
	  programming to be relative to anchor_segment and not to segment[k].

2015-06-02 twu

	* sarray-read.c, stage3hr.c: Using new interface to Substring_new_ambig

	* substring.c, substring.h: Setting trim_left and trim_right for ambiguous
	  substrings

	* stage3hr.c, substring.c, substring.h: Renamed outofbounds variables to
	  outofbounds_start and outofbounds_end. Handling the case where the
	  alignment is out of bounds to the left of the current chromosome.

	* VERSION: Updated version number

	* archive.html, index.html: Made changes for new version

	* stage1hr.c: Handling the case where floors is NULL, such as for a poly-A
	  read

	* stage3hr.c: Fixed genomic segments for converting substrings to GMAP

	* stage1hr.c: For converting segments to GMAP, fixed criteria for allowing
	  non-monotonic query orders and possible insertions

	* stage3hr.c: Fixed bug in referring to uninitialized substring

	* substring.c, substring.h: Removed left_genomicseg field

	* stage3hr.c: In converting substrings to GMAP, using correct genomic
	  nucleotide now

	* gsnap.c: Made batch level 4 the default

	* stage1hr.c: Reordered search algorithms. Limiting number of anchor
	  segments, and pairing up those instead. Disabling doublesplicing
	  algorithm.

	* sarray-read.h, sarray-read.c: Removed references to sarray_gmap

	* pair.c, pair.h: For GSNAP default output format, no longer printing pair
	  info for single-end reads

	* memory-check.pl: Handling results for non-threaded runs

	* sarray-read.c: Fixed memory leak

2015-06-01 twu

	* stage1hr.c: Deferring read_oligos until we need them for spanning set or
	  complete set algorithms

	* stage1hr.c: Fixed call to single hit alignment of GMAP. Made batch level
	  4 the default for memory.

	* stage1hr.c: Allowing terminal alignments only if no single-end alignments
	  are found, or if no concordant alignments are found.

	* sarray-read.c: Fixed memory leak

	* stage1hr.c: Limiting number of anchor segments. Implementing terminal
	  alignments.

	* stage3hr.c: Using new interface to Substring procedures

	* substring.c, substring.h: Removed unused variables

	* samprint.c: Removed obsolete code for printing specific GSNAP types

	* stage1hr.c: Implemented finding of terminals based on anchor segments

	* stage3hr.c: Fixed accumulation of ilength_high. In comparing GMAP against
	  substrings, iterating through all substrings.

	* stage3hr.c, stage3hr.h, substring.c, substring.h: Fixed issues with
	  substring boundaries, computing genomic_diff, and marking mismatches

	* stage2.c: Removed GMAP-specific code from GSNAP

	* samprint.c: Changed call to get querylengths

	* genome128_hr.c, genome128_hr.h: Removed mismatch_offset

2015-05-31 twu

	* samprint.c: Removed references to Pair_check_cigar. Changed calls to get
	  cdna_direction to those for sensedir.

	* pair.c: Removed printing of state

	* substring.c: Removed references to genomicstart_adj and genomicend_adj in
	  converting substrings to pairs

	* stage3hr.h: Removed interface for Stage3end_indel_pos

	* stage3hr.c: Changed calls to Substring_new for insertion and deletion
	  types to conform to new substrings standards, where each substring has its
	  genomicstart and genomicend adjusted for indels. Removed indel_pos and
	  indel_low fields from Stage3end_T object. Removed code for printing
	  separate GSNAP types.

	* stage3hr.c: Setting trim_left, trim_right, trim_left_splicep, and
	  trim_right_splicep for substring hit type

2015-05-29 twu

	* stage3hr.c: Fixed coordinate error in test_hardclips

	* stage3hr.c: Fixed typo

	* samprint.c, stage3hr.c: Fixed issues in finding substring_low for minus
	  alignments using hardclip_low

	* stage3hr.c: Fixed computation of ilength for substrings

	* trunk, README, Makefile.gsnaptoo.am, 2015-statgen, Ambiguous-splicing.eps,
	  DP-triangles.eps, Diagonalization.eps, Hierarchical-GMAP.eps,
	  Large-hash-table.eps, Overlapping-alignment.eps, SIMD-oligomers.eps,
	  Vertical-format.eps, algorithm.tex, biblio.bib, context.tex,
	  discussion.tex, features.tex, introduction.tex, toplevel.tex, src, diag.c,
	  diag.h, diagpool.c, diagpool.h, doublelist.c, doublelist.h,
	  genome128_hr.c, gmap.c, gsnap.c, indel.c, indel.h, intlist.c, intlist.h,
	  junction.c, junction.h, list.c, list.h, oligoindex_hr.c, oligoindex_hr.h,
	  pair.c, pair.h, samprint.c, samprint.h, sarray-read.c, sarray-read.h,
	  sequence.c, splice.c, splice.h, splicing-score.c, stage1hr.c, stage1hr.h,
	  stage2.c, stage2.h, stage3.c, stage3.h, stage3hr.c, stage3hr.h,
	  substring.c, substring.h, uint8list.c, uint8list.h, uintlist.c,
	  uintlist.h, uniqscan.c, univdiag.c, univdiag.h, univdiagdef.h, util:
	  Merged revisions 162218 to 166640 from branches/2015-03-28-sarray-gmap,
	  2015-03-31-new-sarray-, 2015-05-07-sarray-ambig, 2015-05-21-segment-gmap,
	  and 2015-05-22-fast-oligoindex

	* trunk, config.site.rescomp.tst: Updated version number

	* index.html: Made changes for 2014-12-29

	* samprint.c: Moved position of #endif line

2015-05-27 twu

	* substring.c, stage3.c: Fixes to debugging statements

	* samprint.c, samprint.h: Revisions to SAM_compute_chrpos

	* output.c: Using new interface to SAM_compute_chrpos

2015-05-19 twu

	* src, gmapindex.c: Allowing genomecomp to be a command-line argument.
	  Merged changes from branches/2015-05-15-compressed-sarray to allow for
	  compressed suffix arrays.

	* util, gmap_build.pl.in: Providing genomecomp file as a command-line
	  argument, instead of piping it into gmapindex

	* sarray-write.c, sarray-write.h: Merged changes from
	  branches/2015-05-15-compressed-sarray to allow for compressed suffix
	  arrays, but removed csafile needed for debugging

	* sarray-read.c: Turning off code for compressed suffix arrays

	* indexdb-write.c, indexdb-write.h: Allowing the case where genomelength is
	  less than index1part

	* bitpack64-write.h: Improved comments

	* access.c: Merged changes from branches/2015-05-15-compressed-sarray to
	  assign *fd, even if file is empty

	* sarray-read.c: Merged code for compressed suffix array. Implemented
	  different methods for Elt_fill_positions_filtered, depending on whether
	  the filtering occurs more than once.

	* gmap.c: Using new interface to Pair_setup

2015-05-15 twu

	* output.c: Not computing chrpos for SAMECHR_SPLICE and TRANSLOC_SPLICE
	  hittypes

	* gmap.c, gsnap.c, pair.c, pair.h, uniqscan.c: Fixed issue with printing
	  nsnpdiffs for GMAP alignments

	* stage3hr.c: Turned on TRANSLOC_SPECIAL to remove translocations when
	  non-translocation alignments are found. Using effective_chr for printing
	  purposes. Pushing both substrings for a distant splice. Using querystart
	  and queryend instead of querystart_adj and queryend_adj for computing
	  insertlength.

	* samprint.c: Using Substring_compute_chrpos to compute chrpos based on
	  substrings instead of Stage3end_T object

	* substring.c, substring.h: Implemented Substring_compute_chrpos

2015-05-01 twu

	* iit-read.c: Checking for the possibility in IIT_get_highs_for_low and
	  IIT_get_lows_for_high of a zero-length array.

	* stage3hr.c: Fixed order of LtoH substrings for deletions

	* oligoindex_hr.c: Replaced count_fwdrev_simd with individual
	  count_*mer_fwd|rev_simd procedures

2015-04-30 twu

	* substring.c: Revised some debugging statements

	* stage3hr.c: Retaining old information about sarrayp when copying a
	  Stage3_T object

	* stage3.c: Initializing max_nmatches to be 0 in end-trimming procedures

	* Makefile.gsnaptoo.am: Added -lrt to get shm commands

	* algorithm.tex, context.tex, features.tex, introduction.tex: Augmented
	  captions

	* biblio.bib, toplevel.tex: Added references

	* discussion.tex: Added material

	* algorithm.tex, features.tex, introduction.tex: Added citations

	* discussion.tex: Added text

	* context.tex: Added description of GSTRUCT

	* context.tex, discussion.tex: Moved HTSeqGenie to context.tex

	* introduction.tex: Added caption

	* features.tex: Revisions

	* Diagonalization.eps, Hierarchical-GMAP.eps, Large-hash-table.eps,
	  Overlapping-alignment.eps, SIMD-oligomers.eps: Revised figures

	* algorithm.tex: Expanded caption

2015-04-29 matthejb

	* discussion.tex: + adding content to discussion

2015-04-29 twu

	* context.tex, algorithm.tex: Revisions

	* algorithm.tex: Revisions to diagonalization

	* toplevel.tex: Changed symbols for logical operations

2015-04-28 matthejb

	* discussion.tex: + initial additions to discussion by MB

2015-04-28 twu

	*
algorithm.tex: Revisions to linear genome 5270 5271 * algorithm.tex: Moved material on large genomes from features.tex to here 5272 5273 * introduction.tex, features.tex: Revisions 5274 5275 * algorithm.tex: Moved section on ranking alignments and eliminating 5276 duplicates to features.tex 5277 52782015-04-27 twu 5279 5280 * discussion.tex: Added notes 5281 5282 * algorithm.tex: Changed table 5283 5284 * features.tex, introduction.tex: Revisions 5285 52862015-04-27 michafla 5287 5288 * biblio.bib, context.tex: first draft of gmapR writeup 5289 52902015-04-24 twu 5291 5292 * Hierarchical-GMAP.eps, algorithm.tex, features.tex, introduction.tex, 5293 toplevel.tex: Revisions 5294 5295 * Ambiguous-splicing.eps, DP-triangles.eps, Diagonalization.eps, 5296 Hierarchical-GMAP.eps, Large-hash-table.eps, Overlapping-alignment.eps, 5297 SIMD-oligomers.eps, Vertical-format.eps: Added figures 5298 52992015-04-23 twu 5300 5301 * papers, 2015-statgen, algorithm.tex, context.tex, discussion.tex, 5302 features.tex, introduction.tex, toplevel.tex: Added directory for editing 5303 papers 5304 53052015-04-07 twu 5306 5307 * splice.c: Fixed probability calculation for an ambiguous splice 5308 53092015-03-27 twu 5310 5311 * stage3hr.c: Allowing insertlength to be negative, up to -pairmax, to allow 5312 for overlaps. For debugging messages involving insert length, using 5313 chromosomal coordinates. 

	* stage1hr.c: Added address of GMAP alignment to debugging messages

	* chimera.c: Added information about querypos and homology to XT field for
	GMAP

	* samprint.c: Removed old version of adjust_hardclips

2015-03-26  twu

	* filestring.c: Turned off debugging output to stdout

	* outbuffer.c, master.c, master.h, gsnap.c, filestring.c: Allow possibility
	in MPI for output to stdout

	* mpidebug.h: Added tag for writing to stdout

	* mpidebug.c: Handling debugging output for MPI_BOOL_T as an unsigned char

	* gsnap.c: Allowing MPI with only a single thread per rank, by calling
	Master_parser as a detached thread

	* sarray-read.c: Allowing memory mapping for indexij_access

2015-03-25  twu

	* gmap.c, gsnap.c: Added USE_MPI checks around final MPI_Barrier

	* VERSION: Updated version number

	* trunk, configure.ac, index.html, src, access.c, access.h, atoiindex.c,
	cmetindex.c, genome.c, genome.h, get-genome.c, gmap.c, gmapindex.c,
	gsnap.c, iit-read-univ.c, iit-read.c, indexdb-write.c, indexdb.c,
	indexdb.h, indexdbdef.h, outbuffer.c, sarray-read.c, sarray-read.h,
	sarray-write.c, snpindex.c, uniqscan.c: Merged revisions 161768 through
	161939 from branches/2015-03-23-shmem to implement shared memory

2015-03-24  twu

	* stage3hr.c: In test_hardclips, checking if low and high coordinates are
	equal

	* stage3hr.c: Fixed comparison of chrpos in adjust_hardclips_right and
	adjust_hardclips_left

2015-03-23  twu

	* stage3hr.c: In adjust_hardclips, advancing both low_querypos and
	high_querypos on either failure, to prevent infinite loop

	* stage3hr.c: In adjust_hardclips, advancing either low_querypos or
	high_querypos if needed

2015-03-22  twu

	* stage3hr.c: Doing a final test_hardclip when shift right and shift left
	are not possible

	* substring.c: In alias_circular and unalias_circular, updating
	genomicstart_adj and genomicend_adj

	* stage3hr.c: Changed endpoint test in Stage3end_substring_low

	* substring.c: Removed debugging string

	* substring.c, substring.h: Added fields genomicstart_adj and genomicend_adj
	for substring2 of insertions and deletions to handle computations with
	querypos to obtain a genomic position

	* stage3hr.c: Using genomicstart_adj and genomicend_adj in insertions and
	deletions to handle computations with querypos to obtain a genomic position

2015-03-21  twu

	* substring.c, substring.h: Substring_convert_to_pairs now takes
	genomicstart_indel_adj

	* stage3hr.c: No longer changing left2, genomicstart2, and genomicend2 for
	substring2 of insertions and deletions. Providing indel adjustments
	instead to Substring_convert_to_pairs.

	* pair.c: Made Pairarray_contains_p routine look for any case of a gap or
	indel for a given querypos

2015-03-20  twu

	* stage3hr.c: In adjust_hardclips, for dual GMAP, added the ability to shift
	low_querypos or high_querypos independently to make the low genomicpos and
	high genomicpos equal.

	* stage3hr.c: In test_hardclips, for dual GMAP, checking that the
	coordinates match for the two ends

	* stage3hr.c: On recomputing of hardclips near center, decrementing the
	higher value to make the clipping more even

	* stage3hr.c: Fixed bug in defining left2 for deletion

	* stage3hr.c: In find_ilengths, returning false instead of aborting

	* VERSION: Updated version number

	* list.c, list.h: Implemented List_pop_out

	* substring.c: Fixed genomic coordinates to be 0-based when converting from
	substrings to pairs

	* stage3hr.c: In test_hardclips, fixed bug with uninitialized values.
	In adjust_hardclips, checking querypos, querypos-1, and querypos+1 again.
	Also, for dual GMAP, checking that genomepos matches for the given
	low_querypos and high_querypos, meaning that alignments are similar.
	Always doing a recompute of ilengths after adjust_hardclips. Implemented
	stripping of gaps and indels that occur between the two parts when doing a
	merge overlap.

	* stage3hr.c: Subtracting 1 from alignstart or alignend in computing
	overlaps. The find_ilengths function returns false if a common point is
	not found. Added a test_hardclips step and separate right and left shifts
	for adjust_hardclip. Computing a separate genomicstart2 for substring2 of
	insertions and deletions.

2015-03-19  twu

	* pair.c, pair.h: Implemented Pairarray_lookup

	* stage3hr.c: Computing second hardclip from its ilength, not overlap. In
	finding common point involving GMAP, skipping introns and indels. Added
	code to check that merged overlap pieces are next to each other.

2015-03-18  twu

	* stage3hr.c: Fixed bug in some of the initial loops of adjust_hardclips

	* splice.c, stage1hr.c: Using only sensedir and not sensep in calling
	Substring_new_donor, acceptor, and shortexon

2015-03-17  twu

	* stage3hr.c, substring.c, substring.h: Removed unused variables and
	parameters. Using sensedir instead of sensep.

	* samprint.c: Removed unused parameters and variables

	* substring.c, substring.h: Making Substring_print_shortexon use sensedir
	instead of sensep. Removed unused parameters.

	* stage3hr.c: Calling Substring_print_donor, acceptor, and shortexon
	procedures with sensedir instead of sensep

	* pair.c, pair.h: Removed unused parameters

	* VERSION: Updated version number

	* output.c: Using new interface to SAM_compute_chrpos

	* samprint.c, samprint.h: Corrected calculations in SAM_compute_chrpos

	* stage3hr.c, stage3hr.h: Using substring_LtoH instead of substring_low and
	substring_high. Added initial shift in adjust_hardclips. Fixed
	calculation of overlap to depend only on common_left and common_right.

	* substring.c, substring.h: Changed Substring_chrstart and Substring_chrend
	to Substring_alignstart_chr and Substring_alignend_chr

	* output.c, samprint.c, samprint.h: Did a reverse merge to undo revision
	160876 which used substring_hardclipped instead of substring_low

2015-03-12  twu

	* VERSION: Updated version number

	* output.c, samprint.c, samprint.h: Revised SAM_compute_chrpos to search for
	the hardclipped substring, rather than using substring_low

	* stage3hr.c: Changed comment

	* shortread.c: Initializing nextchar2 in various procedures

	* gsnap.c: Fixed small memory leak

2015-03-11  twu

	* stage3hr.c: Adjusting hardclips by checking adjacent positions left and
	right of the crossover querypos.

	* substring.c: Removed comment

	* stage3hr.c: Restored correct ilength calculations for minus strand

2015-03-06  twu

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
	Updated version number

	* stage3hr.c: Added comparisons in hitpair_sort_cmp to fix issue where
	duplicate alignments were not being put together for removal

	* oligoindex_hr.c: Implemented bit twiddling and SIMD-based method for
	computing reverse_nt

2015-03-03  twu

	* stage3.c: Removed automatic trimming of ends less than 12 bp. Fixed bug
	in assigning splice pair in end trimming procedures.

	* ax_ext.m4: Performing run test for tzcnt_u32 and tzcnt_u64

	* stage3hr.c: Made minor fixes in --clip-overlap feature, including fixes to
	gaps and overlaps, more even division of overlaps, and preference for
	clipping heads rather than tails in cases of ties

	* stage3.c: Turning off branch that can lead to bad CIGAR strings

	* inbuffer.c: Defining variable needed when MPI_FILE_INPUT is specified

	* gsnap.c: Doing a chromosome_iit_setup before worker_setup

	* genome128_hr.c: Using HAVE_TZCNT instead of HAVE_BMI1

2015-02-25  twu

	* stage1hr.c, stage3hr.c, stage3hr.h: Printing an accession when reporting a
	CIGAR error

	* inbuffer.c, inbuffer.h: Changed nspaces to be an unsigned int

	* gsnap.c: Moved pthread_attr_init to places just before they are needed

	* Makefile.gsnaptoo.am: Added master.c and master.h as extra files to be
	distributed

	* master.c: Added pre-processor macros

2015-02-24  twu

	* gsnap.c: Added pre-processor macro around inclusion of master.h

	* trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst,
	Makefile.gsnaptoo.am, index.html, src, filestring.c, filestring.h,
	gsnap.c, inbuffer.c, inbuffer.h, master.c, master.h,
	mpidebug.c, mpidebug.h, util: Merged revisions 158119 through 159424 from
	branches/2015-02-05-mpi-workers-0 to allow for worker threads in rank 0

2015-02-12  twu

	* gmap.c: Added debugging statements

	* chimera.c, pair.c, pair.h: Providing Pair_pathscores with a
	pre_extension_slop parameter. Distinguishing between call to
	Pair_pathscores when finding non-extended paths to pair up, and when
	finding a breakpoint between the final, extended paths.

	* outbuffer.c: Rearranged procedures for compilation to work

	* pair.c: In Pair_print_sam, always doing a Pair_compute_cigar

	* outbuffer.c: Printing SAM headers on empty files

2015-02-10  twu

	* gmap.c: Allowing PMAP to have variables for gff3_separators_p

	* gmap.c, gsnap.c, pair.c, pair.h, uniqscan.c: For gff3 output, always
	adding a separator line. Added --gff3-add-separators flag to GMAP.

	* stage1.c: In find_range, limiting number of results to 100 to avoid
	getting bogged down in repeats

	* gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: For gff3
	files without a gene name, always reading $chr from line

2015-02-05  twu

	* pair.c: For GMAP, always recomputing cigar_tokens, in case merging has
	affected them

2015-02-04  twu

	* pair.c: Added slop in computing Pair_pathscores, to allow for better
	identification of translocations

	* gmap.c: Improved debugging statements

	* chimera.c: Changed type of some debugging statements

2015-02-02  twu

	* trunk, VERSION, src, gmap.c, gsnap.c, pair.c, pair.h, samprint.c,
	stage3.c, stage3hr.c, stage3hr.h, uniqscan.c: Merged revisions 157793
	through 157918 from branches/2015-01-30-cigar-check to create and check
	cigar strings when Stage3_T or Stage3end_T objects are created

2015-01-30  twu

	* stage1hr.c: Using new interface to Stage3_compute

	* gmap.c: Using new interface to Stage3_compute and Stage3_new. No longer
	calling Stage3_recompute_goodness.

	* pair.c, pair.h: Implemented Pair_fracidentity_array, which returns goodness

	* stage3.h, stage3.c: Changed Stage3_recompute_goodness to
	Stage3_compute_mapq. Always recomputing matches and goodness when
	this->pairarray is assigned. Removed references to
	END_KNOWNSPLICING_SHORTCUT.

2015-01-29  twu

	* stage3.c: In Stage3_cmp, using npairs and matches as secondary criteria
	beyond goodness

	* gmap.c: Cleaned up unused variables and parameters. Using new interface
	to Stage3_compute

	* filestring.c: Added ability to handle %f

	* stage3.c, stage3.h: Cleaned up unused variables and parameters

	* stage1hr.c: Using new interface to Stage3_compute

	* pair.c: Using false instead of 0

2015-01-28  twu

	* gmap.c: Added call to Outbuffer_cleanup()

	* outbuffer.c: Moved lock outside of loop to prevent a race condition

	* inbuffer.c: Removed check of nextchar == EOF, which causes standard GSNAP
	and GMAP not to terminate

2015-01-27  twu

	* shortread.c, shortread.h: Fixed some issues with variable names for MPI
	code

	* outbuffer.c, outbuffer.h: Added Outbuffer_cleanup, which frees array of
	outputs

	* inbuffer.c: Allowing for gzipped and bzip2-compressed files in MPI version
	by sending and receiving filecontents

	* gsnap.c: Calling Outbuffer_cleanup

	* gmap.c: Revealed variable needed for debugging

	* filestring.c, filestring.h: Implemented Filestring_send and Filestring_recv

	* compress.c: Fixed comment

	* shortread.c: Made code consistent across text, gzip and bzip2. Added
	hooks for filling a Filestring_T object in gzip and bzip2 procedures.

2015-01-26  twu

	* index.html: Updated for 2014-12-17.v2

	* shortread.c, shortread.h, mpidebug.c, mpidebug.h: Using workers_comm in
	MPI_fopen

	* inbuffer.c, inbuffer.h: Passing workers_comm to Shortread_read_filecontents

	* gsnap.c: Introduced a workers_comm so MPI_File_open and MPI_File_close can
	be restricted to that group

	* shortread.c: Added debugging statements for opening and closing files

	* gsnap.c: Added debugging statements for opening and closing files. For
	MPI master using MPI_File input, explicitly closing those inputs.

	* gsnap.c: Using new interfaces to Inbuffer_setup, Inbuffer_new, and
	Inbuffer_master_process. Master rank 0 no longer calling Inbuffer_new.

	* gmap.c: Using new interface to Inbuffer_new

	* inbuffer.c: No longer making a special case in fill_buffer for MPI when
	nextchar at end of block is EOF.

	* shortread.c, shortread.h: MPI procedures for reading from filecontents
	also close and open input files

	* inbuffer.h, inbuffer.c: Moved nspaces into Inbuffer_T object and into
	Inbuffer_new instead of Inbuffer_setup. Made Inbuffer_master_process free
	of an Inbuffer_T object.

2015-01-23  twu

	* inbuffer.c: Added comments

	* gsnap.c: Created separate worker_setup and worker_cleanup procedures

2015-01-22  twu

	* inbuffer.c: Assigning filecontents buffers to the IN category for memusage

	* trunk, config.site.rescomp.prd, src, gsnap.c, inbuffer.c, inbuffer.h,
	shortread.c, shortread.h: Merged revisions 157242 to 157253 from
	branches/2015-01-22-mpi-file-block to have worker ranks read blocks into a
	buffer

	* memchk.c, popcount.c, bitpack64-read.h, bitpack64-serial-read.h,
	compress.h, dynprog.h, except.h, genomicpos.h, iit-read.h,
	indexdb-write.h, indexdb.h, indexdbdef.h, popcount.h, sequence.h: Added
	include of config.h

	* configure.ac: Changed variable name from USE_MPI_FILE to USE_MPI_FILE_INPUT

	* samheader.h, iit-read-univ.h: Added include of <mpi.h>

	* oligoindex_pmap.h, bigendian.h, fopen.h, iitdef.h, littleendian.h, mem.h,
	oligoindex.h, types.h: Added explanation of why config.h needs to be
	included

	* gsnap.c, inbuffer.c, inbuffer.h, shortread.c, shortread.h: Checking for
	both USE_MPI and USE_MPI_FILE_INPUT in using MPI_File for input

	* chrsubset.h, access.h, alphabet.h, backtranslation.h, block.h,
	boyer-moore.h, bp-read.h, bp-write.h, bytecoding.h, bzip2.h, chrom.h,
	chrsegment.h, datadir.h, diag.h, diagdef.h, diagpool.h, genome-write.h,
	genome128-write.h, genome_hr.h, genome_sites.h, genomepage.h, gregion.h,
	iit-write-univ.h, iit-write.h, indel.h, indexdb_hr.h, interval.h,
	intlist.h, intpool.h, intron.h, match.h, matchdef.h, matchpool.h,
	maxent128_hr.h, maxent_hr.h, oligo.h, oligop.h, pairdef.h, parserange.h,
	reader.h, stage1.h, tableuint8.h, tally.h, translation.h, univinterval.h:
	Added blank line for formatting

	* filestring.h, mpidebug.h, oligoindex_hr.h, samprint.h, sortinfo.h: Removed
	include of config.h, since not necessary

	* atoi.h, bitpack64-write.h, cmet.h: Added $Id$ string

2015-01-21  twu

	* stage3hr.c: Turning on SOFT_CLIPS_AVOID_CIRCULARIZATION again to avoid
	duplicates in circular chromosomes

	* ax_mpi.m4: Added cc to list of possible values for MPICC, for systems that
	use a wrapper called cc

	* shortread.c: Fixed parsing issues for blank lines and ends of files

	* configure.ac: Added configure flag --enable-mpi-file

	* Makefile.gsnaptoo.am: Removed mpi_gmap for now

	* gsnap.c, pair.c, pair.h: Added noprint option for --action-if-cigar-error
	and made it the default

	* gsnap.c, inbuffer.c: Made -q or --part flag work for MPI code

	* inbuffer.c: Added ending brace for MPI code

2015-01-20  twu

	* shortread.c: Fixed bug in a print statement where a pointer was not being
	provided. In input_oneline, making a single read to get nextchar.

	* inbuffer.c: Not doing fseek if nextchar is EOF

	* gsnap.c: Removed a debugging statement

	* filestring.c: Increased size of buffer

	* outbuffer.c, outbuffer.h: Added parameter for output_file

	* gmap.c: Using new interface to Outbuffer_setup and
	Outbuffer_print_filestrings

	* samheader.c, samheader.h, iit-read-univ.c, iit-read-univ.h, gsnap.c,
	filestring.c, filestring.h: Applied changes from
	branches/2015-01-17-mpi-seq

	* outbuffer.c, outbuffer.h: Applied changes from
	branches/2015-01-17-mpi-seq. Removed code for Outbuffer_mpi_process.

	* inbuffer.c: Removed requestid variable from fill_buffer for GMAP

	* gmap.c: Put in dummy variables for Inbuffer_new

	* trunk, config.site.rescomp.tst, src, filestring.c, filestring.h, gsnap.c,
	inbuffer.c, inbuffer.h, mpidebug.c, mpidebug.h, outbuffer.c, shortread.c,
	shortread.h: Merged revisions 156908 to 157083 from
	branches/2015-01-17-mpi-seq to change the input side of mpi_gsnap

	* index.html: Updated for version 2014-12-17

	* VERSION: Updated version number

	* samprint.c: Consolidated print statements

	* output.c: Defining abbrev for a nomapper

	* diag.c: Added debugging statement

2015-01-15  twu

	* gmap.c, pair.c, stage2.c, stage3.c: Merged revisions 156824 to 156843 from
	branches/2015-01-15-fix-chimeras to make better decisions for last exons
	having partial alignments

	* oligoindex_hr.c: Allowing diagonals where ptr->i < querylength. Reveals
	alignments that were otherwise missed.

	* gmap.c: Fixed debugging statements to use Sequence_stdout instead of
	Sequence_print

	* chimera.c, chimera.h, gmap.c: Fixed algorithm for finding non-exon-exon
	chimeric breakpoint and finding dinucleotides

2015-01-14  twu

	* stage3hr.c: In anomalous_splice_p procedures, checking for samechr_splice
	hittypes

	* stage1hr.c: Not applying GMAP to samechr_splice hittypes

2015-01-07  twu

	* oligoindex_hr.c: Fixed type for positions_space field in Oligoindex_T

	* trunk, src, oligoindex_hr.c, oligoindex_hr.h, oligoindex_old.c,
	oligoindex_old.h, stage2.c: Merged revisions 154793 through 156263 from
	branches/2014-12-06-stage2-larger-kmers to allow for 9-mers in stage 2

	* config.site.rescomp.prd, config.site.rescomp.tst, VERSION: Updated version
	number

	* index.html: Added changes for version 2014-12-16 (v2)

	* substring.c: Fixed assertions to account for out-of-bounds regions

	* README: Added explanation of XI field

	* pair.c, samprint.c, shortread.c, shortread.h: Added code for XI field

2015-01-05  twu

	* stage1hr.c: Using correct typecast of ambcoords to (Uint8list_T) NULL for
	large genomes

	* stage2.c: Fixed uninitialized variable for firstactive

2014-12-16  twu

	* gsnap.c, uniqscan.c: Using new interface to Stage3hr_setup

	* stage3hr.c, stage3hr.h: Computing outofbounds_left and outofbounds_right.
	Using new interface to Substring_new.

	* substring.c, substring.h: Added provision for outofbounds_left and
	outofbounds_right, to be considered part of trimming

	* gsnap.c: Changed input sequence to open input streams to get one character
	and determine if it is FASTQ format, and then to do Shortread_setup, and
	then to fill the inbuffer.

	* sarray-read.c: Fixed typo: spliceends_antisense => spliceends_sense

	* substring.c: Removed debugging statement

2014-12-15  twu

	* samheader.c: Not printing tabs if there are no headers

	* sam_sort.c: Setting fileposition variable for each file

	* filestring.c: Handling the case where filestring is NULL

2014-12-12  twu

	* doublelist.c: Fixed type error in doublelist_to_array_out

	* trunk, config.site.rescomp.prd, Makefile.gsnaptoo.am, src, gsnap.c,
	samprint.c, stage1hr.c, stage1hr.h, stage3hr.c, substring.c, substring.h,
	uniqscan.c: Merged revisions 154499 through 155289 from
	branches/2014-12-03-dna-chimeras

	* VERSION, config.site.rescomp.prd, index.html: Updated version number

	* sam_sort.c: Revised sam_sort to handle multiple input files

	* trunk, Makefile.am, VERSION, bootstrap.gsnaptoo, ax_mpi.m4, config.site,
	config.site.rescomp.prd, config.site.rescomp.tst, configure.ac,
	memory-check.pl, mpi, src, Makefile.gsnaptoo.am, access.c,
	backtranslation.c, backtranslation.h, bool.h, chimera.c, chimera.h,
	filestring.c, filestring.h, genomicpos.c, genomicpos.h, get-genome.c,
	gmap.c, gsnap.c, iit-read-univ.c, iit-read-univ.h, iit-read.c, iit-read.h,
	inbuffer.c, inbuffer.h, md5.c, md5.h, mem.c, mem.h, mpidebug.c,
	mpidebug.h, outbuffer.c, outbuffer.h, output.c, output.h, pair.c, pair.h,
	request.c, request.h, resulthr.c, resulthr.h, revcomp.c, sam_sort.c,
	samflags.h, samheader.c, samheader.h, samprint.c, samprint.h,
	sarray-read.c, segmentpos.c, segmentpos.h, sequence.c, sequence.h,
	shortread.c, shortread.h, stage1hr.c, stage2.c, stage2.h, stage3.c,
	stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h, translation.c,
	translation.h, types.h, uniqscan.c: Merged revisions 154226 to 155279 from
	branches/2014-11-27-mpi to implement MPI versions and to use Filestring_T objects
	for all output

	* genome.c, genome.h: Changed type of gbuffer from unsigned char to char

2014-12-10  twu

	* oligoindex_hr.c: Added code for handling 9-mers

2014-12-06  twu

	* stage1hr.c: Fixed typo in assigning probs_acceptor

2014-12-05  twu

	* trunk, VERSION, config.site.rescomp.prd, src, doublelist.c, doublelist.h,
	gsnap.c, samprint.c, sarray-read.c, splice.c, stage1hr.c, stage1hr.h,
	stage3hr.c, stage3hr.h, uniqscan.c: Merged revisions 154673 through 154777
	from branches/2014-12-04-stage1-ambig to compute ambiguous splicing better
	in suffix array, stage1, and combining splices. Fixed memory leak and
	changed criteria for comparing across hits

2014-12-04  twu

	* samprint.c, stage3hr.c, stage3hr.h: Merged revisions 154673 through 154678
	from branches/2014-12-04-stage1-ambig to change XA field

	* index.html: Updated for latest version

	* configure.ac: Added more detailed messages about our own loading of
	config.site files to counteract the warning message from the standard
	autoconf loading

2014-12-03  twu

	* uniqscan.c: Using new interface to Substring_setup

	* gsnap.c: Replaced --terminal-output-minlength with --reject-trimlength

	* stage1hr.c, stage1hr.h: Calling Sarray_search_greedy with nmisses_allowed
	being cutoff_level, and not querylength. Using reject_trimlength instead
	of terminal_output_minlength.

	* stage3hr.c, stage3hr.h: Replaced Stage3_filter_terminals with
	Stage3_reject_trimlengths

	* substring.c, substring.h: Implemented new logic based on
	reject_trimlength. True terminals from the GSNAP algorithm are allowed at
	this point (but taken care of now by Stage3end_reject_trimlengths).

	* sarray-read.c: Improved debugging statements

	* stage3hr.c: No longer trying to clip overlaps when the two ends are not in
	a concordant orientation

	* outbuffer.c: Using new interface to SAM_print_nomapping

	* samprint.c, samprint.h: Allowing for non-zero npaths to be printed in
	SAM_print_nomapping as an NH field, which can occur with the
	--quiet-if-excessive feature

	* samread.c: Allowing for the possibility that XO is the first field in a
	SAM line

	* stage3hr.c: Fixed problem with --merge-distant-samechr feature giving the
	wrong chrpos on SAM output on distant splices, since this was being
	treated the same as a translocation (chrnum == 0)

	* samread.c: Terminating parse_XO procedures for either '\0' or '\n'

2014-12-02  twu

	* gmap.c, gsnap.c: Including default variables in --help statement

	* stage3hr.c: Calculating common_shift to get more even splits between the
	two paired ends, by accounting for the common shared point between
	common_right and common_left.

	* sarray-read.c: Fixed typo in a for loop

	* sam_sort.c: Made --no-sam-headers option work correctly

	* sam_sort.c, samheader.c, samheader.h: For --split-output function, writing
	SAM header files to each output file

2014-11-27  twu

	* archive.html, index.html: Updated for latest version

2014-11-25  twu

	* README: Added comment about sam_sort and --split-output

	* sam_sort.c, samflags.h, samread.c, samread.h: Added --split-output and
	--append-output options

	* outbuffer.c: Changed abbrev NM in comment

	* stage1hr.c: Changed calculation of amb_nmatches to amb_length

	* stage3hr.c: Swapping ilength_low and ilength_high for GMAP when alignment
	is minus

2014-11-24  twu

	* stage1hr.c: Turning off debugging

	* bootstrap.gsnaptoo: Running automake to add missing files

	* trunk, util, src, outbuffer.c, pair.c, pair.h, samprint.c, samprint.h,
	stage3hr.c, stage3hr.h, substring.c, substring.h: Merged revisions 153682
	to 154020 from branches/2014-11-20-redo-overlap to compute overlap better
	using ilength53 and ilength35 and a common shift

	* sarray-read.c: Merged revisions 153682 to 154020 to handle ambiguous
	splicing better

	* trunk, INSTALL, VERSION, config.site.rescomp.prd, index.html: No longer
	keeping track of INSTALL

	* config.guess, config.sub, ltmain.sh: No longer keeping track of
	config.guess, config.sub, or ltmain.sh

	* gmap_build.pl.in: Added comment about meaning of -D flag

	* acinclude.m4, configure.ac: Adding check for MPI

	* ax_mpi.m4: Added code for MPI

	* src, access.c, bitpack64-read.c, bitpack64-readtwo.c, bitpack64-write.c,
	compress-write.c, genome-write.c, genome.c, genome128-write.c,
	genome128.c, genome_sites.c, get-genome.c, gmapindex.c, iit-read-univ.c,
	iit-read.c, iit_get.c, iit_store.c, indel.c,
	indexdb-write.c, indexdb.c, indexdb_hr.c, mem.c, oligoindex_hr.c,
	sam_sort.c, samheader.c, snpindex.c, stage1hr.c, stage3.c, table.c,
	tableuint8.c, uniqscan.c, univinterval.c: Merged revisions 153114 to
	153944 from branches/2014-11-12-make-check-i386 to make tests work on
	i386 computers

	* stage3hr.c: Not using ambiguous splices to update found_score

	* stage2.c: Removed adjacentp as an unused variable

	* samprint.c, samprint.h: For circular alignments, checking for sole HS
	pattern. Also checking for chrpos > chrlength, and subtracting chrlength
	if necessary.

	* pair.c, pair.h: Added Cigar_action_T. Added Pair_check_cigar. Removed
	prev as an unused variable.

	* iit-write-univ.c: Handling the case if total_nintervals is 0

	* gmap.c, gsnap.c: Added --action-if-cigar-error

2014-11-17  twu

	* bytecoding.c: Removed unused variable

	* gmapindex.c: Printing genome length to stderr

	* bytecoding.c: Using a buffer of 10,000,000 block-sizes, and writing
	iteratively, rather than a single buffer and single write.

2014-11-13  twu

	* gmapindex.c: Made some changes in casting. Fixed printf format to use
	%llu.

	* stage3hr.c, stage3hr.h: Renamed amb_nmatches to amb_length.

	* samprint.c: In adjust_hardclips, not changing hardclips if shift downward
	fails. Renamed amb_nmatches to amb_length.

	* splice.c, stage1hr.c: Renamed amb_nmatches to amb_length. Providing
	Substring_match_length_orig to amb_length in Stage3end_new_splice and
	Stage3end_new_shortexon

2014-10-31  twu

	* iit-read-univ.c, iit-read.c: Using %llu and casting to (long long int) for
	printing offset and filesize

	* gmap.c, gsnap.c: Using %zu for printing results of sizeof().

2014-10-29  twu

	* stage3hr.c: Restoring revision of SAM insertlength for ends involving GMAP
	when method is successful

	* stage3hr.c: Fixed SAM output of insert length of 0 when no overlap is
	found in a GMAP alignment

	* stage3.c: Added debugging statements

	* gmap.c, gmapindex.c, gsnap.c: Added output statement at end of checking
	compiler assumptions

	* README: Added comment about change from PG: to XG:

	* ax_ext.m4, configure.ac: Added option to enable or disable sse4.2

	* samprint.c: Fixed typo in adjust_hardclips. Also, when querypos increase
	fails, trying querypos decrease.

	* samprint.c: Fixed infinite loop in adjust_hardclips

2014-10-28  twu

	* stage3hr.c: Fixed bug with uninitialized variables

	* outbuffer.c: Fixing potential data race as noted by valgrind for
	this->ntotal between input and output threads, although not problematic
	before, because this->ntotal increases monotonically

2014-10-27  twu

	* stage3hr.c: Fixed computation of overlap between GMAP and non-GMAP
	alignments

2014-10-22  twu

	* gregion.c: Checking size before deciding to use alloca or malloc.

2014-10-16  twu

	* stage2.c: Fixed an uninitialized variable in grand_fwd and grand_rev
	procedures, plus the checks on maxintronlen in computing
	grand_fwd_lookforward and grand_rev_lookforward.

	* shortread.c: Allowing queryseq1 to be equal to SKIPPED. Removed unused
	parameter acc from input_oneline routines.

	* VERSION, index.html: Updated version number

	* stage1hr.c: Added debugging statement

	* sarray-read.c: Don't limit filling of best elt based on nmatches being
	  more than half of the read length

	* stage3hr.c: Restored previous behavior where soft clips avoid
	  circularization

	* indexdb-write.c, sarray-write.c: Removed unnecessary includes of popcount.h

	* bitpack64-write.c, genome128_hr.c: In lookups of clz_table, removing the
	  intermediate variable "top".

2014-10-15 twu

	* stage3hr.c: Not allowing soft-clipping at ends to avoid circularization.
	  Added pre-processor macro SOFT_CLIPS_AVOID_CIRCULARIZATION to preserve
	  previous code.

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
	  Updated version number

	* outbuffer.c: Added gff3 header for GMAP gff3 output to stdout. Added HD
	  and PG headers for GMAP sam output to --split-output files.

2014-10-14 twu

	* Makefile.gsnaptoo.am: Changed variable name to SIMD_CFLAGS

	* configure.ac: Changed variable name to SIMD_CFLAGS. Setting -mpopcnt
	  based only on acx_mpopcnt_ok, and not on individual builtin functions.

	* builtin-popcount.m4: Setting CFLAGS instead of LIBS. Checking for builtin
	  functions regardless of whether -mpopcnt works.

	* ax_ext.m4: Setting CFLAGS instead of LIBS. Changed variable name to
	  SIMD_CFLAGS

	* builtin-popcount.m4, configure.ac: Changed macro name to
	  ACX_BUILTIN_POPCOUNT

	* acinclude.m4, builtin-popcount.m4, builtin.m4: Renamed program to
	  builtin-popcount.m4

	* popcnt.m4: Added comment

	* builtin.m4: Added check from popcnt.m4

2014-10-13 twu

	* shortread.c: Made input procedures robust to incomplete entries

	* indexdb.c: Fixed bug where munmap was called twice on positions_high for
	  GSNAPL and GMAPL.

2014-10-09 twu

	* VERSION: Updated version number

	* ax_ext.m4: Seeing if avx and avx2 are enabled

	* configure.ac: Added ability to disable avx and avx2

	* gmap_build.pl.in: Calling gmapindex initially to check for compiler
	  assumptions

	* gmapindex.c: Added check for compiler assumptions

	* gmap.c, gsnap.c: Added description of --check option to --help message

	* gmap.c, gsnap.c: Improved check for compiler assumptions, and added
	  --check option

	* samprint.c: Fixed flag for merged overlap to not have PAIRED_READ set.
	  When clipdir == 0, not calling adjust_hardclips.

2014-10-07 twu

	* sarray-read.c: Fixed bug when setting array_stop and finalptr is less than
	  4

2014-10-01 twu

	* samprint.c: Further fixes to mapping quality for merged alignments

	* samprint.c: For merged alignments, printing mapping quality of 40

	* pair.c, samprint.c: Putting XB and XP tags after XH

	* VERSION: Updated version number

	* samprint.c: Calling GMAP procedure for querystart and queryend when
	  necessary in adjust_hardclips

	* stage3.c: Initializing variables

	* pair.c: Passing correct values for hardclip_low and hardclip_high to
	  hardclip_pairs

	* samprint.c: Removed static initialization of hide_soft_clips_p

	* trunk, src, outbuffer.c, pair.c, pair.h, samprint.c, samprint.h: Merged
	  revisions 149547 through 149570 from branches/2014-10-01-hardclip-adj to
	  improve adjustment of hardclipping

	* stage3.c: Removed diagnosticp from some procedures

2014-09-30 twu

	* sam_sort.c: Providing timing information to user

	* sam_sort.c, samread.c, samread.h: Changed algorithm to parse just for
	  linelengths initially, which allows for the SAM file to be read using
	  buffers

2014-09-29 twu

	* VERSION, index.html: Updated
	  version number

	* sam_sort.c: Fixed usage statement

	* samread.c, samread.h: Commented out unused procedures

	* samread.c: Added header file

	* samheader.c, samheader.h, chimera.c, chimera.h, get-genome.c, gmap.c,
	  gsnap.c, iit-read-univ.c, iit-read-univ.h, outbuffer.c, outbuffer.h,
	  pair.c, pair.h, samprint.c, samprint.h, shortread.c, shortread.h,
	  stage3.c, stage3.h, uniqscan.c: Removing computation of .sortinfo file

	* Makefile.gsnaptoo.am: Removed sortinfo.c and sortinfo.h

	* sam_sort.c, samflags.h, samheader.c, samheader.h, samread.c, samread.h:
	  Made sam_sort independent of a .sortinfo file. Computes sortinfo
	  information directly from the SAM input file

2014-09-26 twu

	* outbuffer.c: Providing sortinfo to Pair_print_sam_nomapping

	* VERSION, index.html: Updated version number

	* trunk, index.html, src, outbuffer.c, pair.c, pair.h, sam_sort.c,
	  samflags.h, samprint.c, samprint.h, samread.c, samread.h, sortinfo.c,
	  sortinfo.h, stage3.c: Merged revisions 148987 through 149172 from
	  branches/2014-09-25-sam-sort to yield a working version of sortinfo feature

	* stage3hr.c: Reverting to previous version 149160 in trunk

	* stage3hr.c: Rewrote Stage3pair_remove_overlap procedures to use
	  hitpair_overlap_score_cmp, hitpair_overlap_test,
	  hitpair_equiv_preference_cmp, and hitpair_equiv_tst

2014-09-25 twu

	* samprint.c: Not merging paired ends if there is no overlap

2014-09-24 twu

	* pair.c: Enabled call to Sortinfo_update to work for GMAP

	* VERSION, index.html: Updated version number

	* chimera.c, chimera.h, pair.c, samprint.c: Using @ instead of : for
	  coordinates for XT field. Made GMAP XT field same as that for GSNAP.

	* config.site.rescomp.tst: Allowing builtin_popcount

	* README: Added description of XH field

	* Makefile.gsnaptoo.am: Added samheader.c to sam_sort

	* sam_sort.c: Handling single-end alignments

	* samread.c: Setting terminating character at end of computation

	* samheader.c, samheader.h: Implemented change of HD header to SO:sorted

	* sam_sort.c, samread.c, samread.h: Handling hard-clipped alignments

	* pair.c, pair.h, samprint.c, shortread.c, shortread.h: Added XH field to
	  provide hard-clipped sequence

	* substring.c: For --merge-overlap feature, in deciding whether to add
	  insertion, deletion, or intron between the pieces, using <= instead of <
	  to decide.

	* samprint.c: In SAM_compute_chrpos, always using substring_low to compute
	  chrpos

	* indel.c: Initializing variables

	* sam_sort.c: Implemented --dups-only and --uniq-only

	* sortinfo.c: Fixed typo in comment

	* pair.c, pair.h, samprint.c, sortinfo.c, sortinfo.h: Sortinfo_update uses
	  sign on chrnum to indicate whether a read is the low or high end

	* iit-read-univ.c: Using safer computation of an average

	* Makefile.gsnaptoo.am, sam_sort.c, samread.c, samread.h: Implemented
	  --mark-dups function into sam_sort. Works except for hard-clipping.

	* samread.c, samread.h: Brought over copy from GSTRUCT

	* sam_sort.c: Introduced readindex, needed for marking duplicates

2014-09-23 twu

	* Makefile.gsnaptoo.am, iit-read-univ.c, iit-read-univ.h, sam_sort.c,
	  sortinfo.c: Using genome chromosome_iit file in sam_sort, instead of
	  storing information in .sortinfo files

	* VERSION, index.html: Updated version number

	* trunk, src, Makefile.gsnaptoo.am, chimera.c, chimera.h, chrnum.h,
	  get-genome.c, gmap.c, gsnap.c, iit-read-univ.c, iit-read-univ.h,
	  outbuffer.c, outbuffer.h, pair.c, pair.h, sam_sort.c, samheader.c,
	  samheader.h, samprint.c, samprint.h, shortread.c, shortread.h, sortinfo.c,
	  sortinfo.h, stage1hr.h, stage3.c, stage3.h, types.h, uint8list.h,
	  uintlist.h, uniqscan.c: Merged revisions 148659 through 148720 from
	  branches/2014-09-23-sam-sort to add --make-sortinfo option and sam_sort
	  program

	* splice.c: Fixed bug in reading Stage3end_T object that was already freed

2014-09-22 twu

	* configure.ac: Loading default ./config.site file only if it exists

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst, configure.ac,
	  index.html: Handling cases differently if CONFIG_SITE file begins with ./.
	  Allowing for multiple files in CONFIG_SITE.

	* Makefile.gsnaptoo.am: Interchanged order of POPCNT_CFLAGS and SIMD_FLAGS

	* samprint.c: Fixed bugs introduced in adding --merge-overlap feature

	* INSTALL, config.guess, config.sub, ltmain.sh: Updated to more recent
	  version of autoconf

	* bootstrap.gsnaptoo: Changed /usr/bin/touch to touch

	* configure.ac: Testing first for ./ in front of CONFIG_SITE.
	  Added AC_SUBST of POPCNT_CFLAGS

	* shortread.c, shortread.h: Removed unused procedures

	* stage1hr.c: Changed nmisses_allowed for sarray method to be querylength,
	  for both --use-sarray=2 and normal method

2014-09-21 twu

	* pair.c: Changed PG field to XG

	* configure.ac: Change flag to --disable-builtin-popcount

	* splice.c: In procedures that group for ambiguous hits, separating sense
	  and antisense segments

	* substring.c, stage3hr.c, stage1hr.c: Improved debugging statements

2014-09-19 twu

	* stage3.c: Using new interface to Pair_print_sam

	* ax_ext.m4, builtin.m4: Restoring value of LIBS at end of procedure

	* builtin.m4: Setting LIBS to -mpopcnt before running tests

	* gsnap.c: Added comment that --merge-overlap is a beta implementation

	* trunk, VERSION, config.site.rescomp.tst, index.html, src, gsnap.c, list.c,
	  list.h, outbuffer.c, outbuffer.h, pair.c, pair.h, samprint.c, samprint.h,
	  shortread.c, shortread.h, stage3hr.c, stage3hr.h, substring.c,
	  substring.h: Merged revisions 148190 through 1418357 from
	  branches/2014-10-18-merge-overlap to add --merge-overlap feature

	* ax_ext.m4: Changed name of cpu features to sse4.1 and sse4.2

	* configure.ac: Showing pthread and popcnt flags at end

	* README: Explaining XG flag

	* samprint.c: Changed PG flag to XG flag

	* samheader.c: Commenting out code for extra @PG lines

	* configure.ac: Setting POPCNT_CFLAGS based on result from builtin.m4

	* builtin.m4: Setting a variable ax_cv_compile_builtin_ext

	* ax_ext.m4: Setting LIBS properly before running AC_LINK_IFELSE

2014-09-18 twu

	* genome128_hr.c: Using Harley's method to reduce number of popcount
	  operations for SSE2

	* genome128_hr.c: Fixed HAVE_MM_POPCNT alternative for SSE2-based popcount

	* archive.html,
	  index.html: Updated for version 2014-09-18

2014-09-17 twu

	* VERSION: Updated version number

	* genome128_hr.c: Added macro for debug4

	* genome128_hr.c: Using _mm_extract_epi16 for SSE2 code instead of casting
	  to (UINT4 *), because casting leads some compilers to generate wrong
	  ordering of statements

2014-09-16 twu

	* ax_ext.m4, configure.ac: Providing information to user about SIMD cpu
	  features available and compiler flags to be used

	* gmap.c, gsnap.c: Added extra information to --version about type of popcnt
	  supported

	* ax_ext.m4: Changed ordering of ifthen clauses. Added variable to indicate
	  compiler or linker problems.

	* configure.ac: Added warnings for compiler or linker problems for SIMD
	  extensions

2014-09-15 twu

	* configure.ac: Changed flag name from --enable-popcnt to
	  --enable-builtin-popcnt

	* genome128_hr.c, sarray-read.c: Using mm_popcnt when _popcnt not available

	* ax_ext.m4: Revamped tests to check for CPU, compile, and linking, in that
	  order. Renamed variables more systematically.

	* pair.c, pair.h, pairpool.c, pairpool.h, stage3.c, stage3.h: Moved
	  Pairpool_clip_bounded to pair.c, and created Pair_clip_bounded_pairs for
	  computing chimeras by the working thread and Pair_clip_bounded_array for
	  truncation by the output thread. This enables the --truncate flag to work
	  again for GMAP.

	* gmap.c: Improved error message when invalid argument is given to -f

	* stage3.c: Made cdna direction choices based on splice site scores more
	  stringent, so both donor and acceptor sites have to be significantly
	  better.

2014-09-12 twu

	* gmap.c, outbuffer.c, pair.c, pair.h, stage3hr.c, stage3hr.h, substring.c,
	  substring.h: For BLAST m8 output, adding endings to accessions for
	  paired-end reads

	* gmap.c, gsnap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, stage3hr.c,
	  stage3hr.h, substring.c, substring.h, uniqscan.c: Added implementation of
	  BLAST m8 output format

2014-09-10 twu

	* VERSION: Updated version number

	* ax_ext.m4: Fixed typo for handling AVX

2014-09-08 twu

	* uniqscan.c: Using new interface to Stage1hr_setup

	* stage1hr.c, stage1hr.h, gsnap.c: Added option --use-sarray=2 to use only
	  suffix array algorithm

	* stage3hr.c: Stopped using alloca for hitlists, since they can cause stack
	  overflow. Made loops more efficient for pair_up_concordant_aux.

	* gmap.c: Stopping memory error when a chimera is found, --npaths is set to
	  1, and one part of the chimera fails the conditions for --min-identity or
	  --min-trimmed-coverage

2014-09-04 twu

	* sarray-read.c: Implemented faster SIMD algorithm for
	  Elt_fill_positions_filtered

	* sarray-read.c: Implemented Elt_fill_positions_filtered using alloca and
	  copying from stack, instead of guessing allocation

2014-09-03 twu

	* VERSION, config.site.rescomp.prd: Updated version number

	* stage3hr.c: In pair_remove_bad_superstretches, keeping track of better and
	  worse children separately, and handling list order correctly. Now chooses
	  shorter insert lengths correctly. Added OUTERLENGTH_SLOP.

2014-09-02 twu

	* stage3hr.c: In Stage3end_optimal_score_aux and
	  Stage3pair_optimal_score_aux, counting indels only if they are within
	  trim_left and trim_right

	* gsnap.c, stage3hr.c, stage3hr.h, uniqscan.c: Added option
	  --order-among-best to GSNAP to control randomization among best alignments

	* stage3hr.c: When SCORE_INDELS is true for comparing alignments, not
	  counting indel_penalty in new->penalties to avoid double-counting

	* stage2.c, stage3hr.c, stage3.c: Turned off debugging

	* stage1hr.c: Calling GMAP pairsearch if indels5 or indels3 is not NULL, as
	  well as if found_score is too high

	* trunk, src: Merged revisions 145502 to 146146 from
	  branches/2014-08-19-stack-alloca and 146146 to 146618 from
	  branches/2014-08-27-parallel-stage2

	* oligoindex_hr.c, diag.c: Merging revisions 145502 to 146146 from
	  branches/2014-08-19-stack-alloca to ignore check on querylength and to use
	  alloca for GSNAP

	* stage2.c, stage2.h: Merging revisions 146146 to 146618 from
	  branches/2014-08-27-parallel-stage2 to make stage 2 computation faster

	* list.c, list.h, spanningelt.c, spanningelt.h, stage1hr.c, stage1hr.h:
	  Merging revisions 145502 to 146146 from branches/2014-08-19-stack-alloca
	  to work directly on arrays of Spanningelt_T objects

	* dynprog_simd.c, dynprog_single.c: Merging revisions 146146 to 146618 from
	  branches/2014-08-27-parallel-stage2 to fix debugging procedures and to use
	  a stricter check for using 8-bit SIMD

	* gmap.c, gsnap.c, stage3.c, stage3.h: Merging revisions 145502 to 146146
	  from branches/2014-08-19-stack-alloca to use stage2_alloc only for GMAP
	  initial stage 2 computation

	* stage1hr.c: Fixed debugging statements

2014-08-25 twu

	* trunk, configure.ac: Merged revisions 145989 through 145990 from
	  branches/2014-08-19-stack-alloca to add flag
	  to configure.ac

	* trunk, VERSION, index.html, src, boyer-moore.c, boyer-moore.h, chimera.c,
	  chop_primers.c, diag.c, doublelist.c, doublelist.h, dynprog.c,
	  dynprog_cdna.c, dynprog_end.c, dynprog_genome.c, dynprog_simd.c,
	  dynprog_single.c, genome.c, genome.h, genome128_hr.c, genome_sites.c,
	  genomicpos.h, gmap.c, gregion.c, gregion.h, gsnap.c, indel.c, intlist.c,
	  intlist.h, list.c, list.h, mapq.c, mem.c, mem.h, oligo.c, oligoindex_hr.c,
	  outbuffer.c, pair.c, pair.h, pairpool.c, parserange.c, samprint.c,
	  sarray-read.c, shortread.c, shortread.h, smooth.c, splice.c, splicetrie.c,
	  stage1.c, stage1hr.c, stage1hr.h, stage2.c, stage2.h, stage3.c, stage3.h,
	  stage3hr.c, substring.c, uint8list.c, uint8list.h, uintlist.c, uintlist.h,
	  uinttable.c, uinttable.h, gvf_iit.pl.in: Merged revisions 145503 through
	  145988 from branches/2014-08-19-stack-alloca to use alloca

2014-08-20 twu

	* uniqscan.c: Using new interface to Shortread_new

	* shortread.c, shortread.h: Made Shortread_new extern again

	* trunk, INSTALL, README, config.guess, config.sub, ltmain.sh,
	  config.site.rescomp.prd, config.site.rescomp.tst, src, gmap.c, gsnap.c,
	  indel.c, mapq.c, mem.c, mem.h, sarray-read.c, shortread.c, shortread.h,
	  splice.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c: Merged
	  revisions 145503 through 145603 from branches/2014-08-19-stack-alloca to
	  use alloca instead of stack arrays based on MAX_READLENGTH and to handle
	  reads longer than MAX_READLENGTH

2014-08-19 twu

	* VERSION, index.html: Updated version number

	* configure.ac: Added configure flag for enabling or disabling ssse3
	  instructions

	* ax_ext.m4: Checking whether user wants SSSE3 instructions

	* stage2.c, stage3.c: Not putting gapholders into starts and ends. Removing
	  gapholders from middle before calling Pairpool_join_end5 and
	  Pairpool_join_end3.
	  Gapholders were causing problems with the join
	  operation.

	* pairpool.c, pairpool.h: Implemented Pairpool_remove_gapholders

2014-08-04 twu

	* stage3.c: Not setting best_pairs or best_path when the result is NULL

	* shortread.c: Ignoring spaces in read

	* pairpool.c: In joining paths, handling the case when one path is NULL

2014-07-29 twu

	* ltmain.sh: Updated from 2.2.6 to 2.2.6b

	* config.sub, config.guess: Updated from 2008 version to 2009 version

	* INSTALL: Updated from 2007 version to 2009 version

	* archive.html, index.html: Made changes for new version

	* gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Handling
	  GFF3 files that have both exon and CDS fields

2014-07-21 twu

	* stage3hr.c: Restored random behavior for equivalent alignments

	* VERSION: Updated version number

	* atoiindex.c, cmetindex.c: Making correct calls to
	  Sarray_discriminating_chars

	* dynprog_simd.c: Added code for having no initial gap penalties, to be used
	  for bam_indelfix

2014-07-19 twu

	* dynprog_simd.c: Improvements to debugging procedures to handle 3-digit
	  indices

2014-07-16 twu

	* gmap.c: Using new interface to Stage3_compute

	* stage3hr.h: Added interfaces for Stage3end_shortexonA_distance and
	  Stage3end_shortexonD_distance

	* stage3hr.c: Added hook for amb_prob in Stage3end_new_gmap. Removed
	  penalty for ambig end lengths if amb_prob > 0.9. Fixed pointer advances
	  in removing bad superstretches.

	* stage3.c: Made fixes to choice of cdna_direction: using presence/absence
	  of intron types, rather than number, and decreased binomial threshold for
	  alignments around intron. Fixed handling of multiple start or end paths
	  by joining them at the outset.

	* stage2.c: Allowing for multiple end cells from each rootposition

	* dynprog_genome.c: Altered decision-making between best alignment and
	  probability-based alignment

	* dynprog_simd.c: Added debugging statements

	* dynprog.h: Made open gap penalties more uniform for different defect rates

	* pair.c: Fixed calculation of nmatches at end for Pair_trim_ends

	* pairpool.c: Enhanced debugging statement

2014-07-15 twu

	* stage3.h: Stage3_compute now returns ambig_prob_5 and ambig_prob_3

	* stage3.c: For ambiguous ends, no longer calling clean_pairs_end5,
	  clean_path_end3, trim_end5_exon_indels, or trim_end3_exon_indels

	* stage1hr.c: Passing amb_prob_5 and amb_prob_3 to Stage3end_new_gmap

	* pair.c, pair.h: In Pair_trim_ends, not trimming ambiguous ends

	* stage3.c: In trim_end5_exon_indels and trim_end3_exon_indels, counting
	  trimmed ends as mismatches and handling large indels differently from
	  small indels. Added hooks for ambig_prob.

	* pair.c, pair.h, stage3hr.c: Eliminating ambig_end_nmatches from
	  consideration in Pair_nmismatches_region

	* stage3hr.c: Restored eventrim algorithm from revision 140363, which had
	  only a single eventrim calculation and not separate calculations for the
	  two ends.

2014-07-03 twu

	* VERSION: Updated version number

	* dynprog_end.c: Fixed typo in calling matrix for 16_upper twice, instead of
	  16_upper and 16_lower

	* splicetrie.c: Handling the case where Dynprog_end5_splicejunction or
	  Dynprog_end3_splicejunction returns NULL

	* dynprog_simd.h: Decreased value of SIMD_MAXLENGTH_EPI8 from 40 to 30 to
	  prevent issues with overflows

	* dynprog_simd.c, dynprog.c: In traceback, using main loop to decide whether
	  to handle dir == DIAG

	* dynprog_end.c: In traceback, using main loop to decide whether to handle
	  dir == DIAG. In Dynprog_end5_splicejunction and
	  Dynprog_end3_splicejunction, requiring finalscore to be positive before
	  doing any traceback.

	* sarray-write.c: Made monitoring statements work for
	  Sarray_discriminating_chars

	* sarray-write.c: Implemented batch reading method for
	  Sarray_discriminating_chars

2014-07-02 twu

	* gmapindex.c, sarray-write.c, sarray-write.h: Made the building of the LCP
	  array and discriminating chars array more memory efficient by writing
	  temporary files for rank and permuted sarray

	* genome.c, genome.h: Changed type of counts to be Univcoord_T

	* access.c: Fixed bug in handling of final partial block. Added debugging
	  code for checking results.

	* access.c: Increased FREAD_BATCH to 100 million bytes. Modified
	  Access_allocated to always read in batches of size FREAD_BATCH.

2014-07-01 twu

	* trunk, config.site.rescomp.prd, config.site.rescomp.tst, src,
	  genome128_hr.c, gmap.c, intlist.c, intlist.h, samprint.c, sarray-read.c,
	  splice.c, splice.h, stage1hr.c, stage3.c, stage3hr.c, stage3hr.h,
	  substring.c, uint8list.c, uint8list.h, uintlist.c, uintlist.h: Merged
	  revisions 140131 through 140367 from branches/2014-06-27-fix-amb to
	  implement separate eventrim scores for start/end of read, fix
	  cmet-stranded and cmet-nonstranded modes, implement separate
	  sense/antisense for Splice_solve_single, and rewrite of ambiguous
	  parameters from left/right to donor/acceptor

	* index.html: Updated for version 2014-06-10

	* archive.html: Added link to version 2011-12-28

	* VERSION: Updated version number

	* stage3hr.c: In Stage3end_new_shortexon, setting amb_nmismatches_start and
	  amb_nmismatches_end separately

	* stage3.c: Using score_introns (which looks at splice site neighborhood),
	  instead of score_alignment to count canonical introns. Using defect_rate
	  to determine whether to rely on splice site probabilities.

	* stage1hr.c: Added blank lines

	* gmap.c: Preventing leftpos and rightpos from exceeding query coordinates
	  in solving for chimeras. Not using extension in finding remaining
	  alignment, since it makes alignment harder.

2014-06-30 twu

	* stage3.c: Transferring microexon pairs without looking at probabilities

	* dynprog_single.c: Using MIN_MICROEXON_LENGTH instead of 8

2014-06-25 twu

	* stage3hr.c: Fixed assignment of amb_nmatches_start and amb_nmatches_end
	  for shortexons on minus strand

	* stage3hr.c: Removed debugging code

	* sarray-write.c: In Sarray_compute_child, cleaning out stack at end,
	  because skipping it results in an incorrect child array

2014-06-24 twu

	* stage3hr.c: Commenting out assertions that are not always true

	* stage1hr.c: Assigning correct values of amb_nmatches_donor and
	  amb_nmatches_acceptor to Stage3end_new_shortexon

2014-06-11 twu

	* VERSION: Updated version number

	* trunk, config.site.rescomp.prd, config.site.rescomp.tst, src,
	  sarray-read.c, splice.c, stage1hr.c, stage3hr.c, stage3hr.h: Merged
	  revisions 138722 through 138743 from
	  branches/2014-06-01-amb-shortexon-fix1 to allow for ambiguous shortexons
	  and to place a limit of MAX_LOCALSPLICING_POTENTIAL on splicing and
	  shortexons

	* archive.html, index.html: Put version 2014-05-15.v3 into archive

	* stage1hr.c: Not processing splicing or shortexons if number of
	  possibilities exceeds MAX_LOCALSPLICING_POTENTIAL

	* gsnap.c: In memusage debugging, printing accession for each thread at the
	  start of its processing

	* substring.c, substring.h: Added function Substring_chimera_prob_2

	* segmentpos.c, segmentpos.h: Added function Segmentpos_compare_order

	* samprint.c: Removed unused variable

	* iitdef.h: Added FILENAME_SORT as a sorting type

	* gsnap.c: Added commas to memusage debugging output

	* dynprog_end.c: Removed assertions, which do not hold

2014-06-09 twu

	* gmap_build.pl.in, chrom.c, chrom.h, gmapindex.c: Added option for sorting
	  chromosomes by order in a file

2014-06-04 twu

	* VERSION: Updated version number

	* README, configure.ac: Increased default MAX_READLENGTH for GSNAP from 250
	  to 300

	* dynprog_simd.c: Fixed formatting

	* dynprog_genome.c, dynprog_cdna.c: Using correct interface to
	  Dynprog_standard for non-SSE2 systems

	* dynprog_end.c: Enabled non-SSE2 compilation to work. Made traceback
	  procedures follow those in dynprog_simd.c.

	* dynprog.c, dynprog.h: Modified traceback_std (non-SIMD) to behave the same
	  as the SIMD traceback routines. Improved debugging output.

	* stage3.c: Added comments

	* dynprog.c, dynprog.h: Exposed Dynprog_standard, needed for systems without
	  SSE2 instructions

2014-06-03 twu

	* gmap.c, gsnap.c: In checking behavior of _mm_extract_epi8, just reporting
	  results and not exiting based on behavior

	* genome128_hr.c: Casting _mm_extract_epi16 to unsigned short, or a zero
	  extended result, which is technically the correct behavior

	* dynprog_simd.c: Being very explicit about casting between int and Score8_T
	  and Score16_T types

	* dynprog.h: Removed conditionals around defining Score_8 and Score_16 types

	* compress.c: Calling _mm_free to match _mm_malloc

2014-05-30 twu

	* trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst,
	  index.html, src, compress.c, genome128_hr.c: Merged revisions 137093
	  through 137694 from branches/2014-05-23-genome128-32bit-shortchut to
	  implement 32-bit shortcuts for 128-bit genomebits

	* stage3hr.c: Not checking any more for duplicate Stage3end_T objects

	* stage2.c: Eliminated penalty when exon length < EXON_DEFN, which misses
	  short exons

2014-05-29 twu

	* stage1hr.c: Restored usage of paired_usedp to avoid excess calls to GMAP
	  for halfmapping alignments

	* stage1hr.c:
	  Not using paired_usedp in computing GMAP for halfmapping
	  alignments

	* dynprog_simd.c: Fixed traceback procedures to follow correct paths on gaps

	* dynprog.c, dynprog.h, dynprog_genome.c, dynprog_single.c: Moved gap
	  penalties to dynprog.h, and reduced open penalties to allow for multiple
	  indels

2014-05-28 twu

	* stage3.c: In merge_local_single, checking filledp to see if merge failed.
	  When merge fails, recompute pairarray for this_left and this_right.

2014-05-27 twu

	* dynprog_simd.c: Implemented replacement for _mm_min_epi8 for non-SSE4.1
	  systems

2014-05-23 twu

	* dynprog_simd.c: Fixed computations of E and H for values near
	  NEG_INFINITY, to prevent horizontal or vertical jumps into the empty
	  triangle. Added an E_mask variable to set horizontal/vertical scores into
	  the empty triangle to be NEG_INFINITY. Setting directions_nogap
	  explicitly to DIAG along the main diagonal to take care of ties between E
	  and H.

	* dynprog_end.c: Using computed lband and uband in find_best_endpoint
	  procedures, instead of trying to recompute them, which led to incorrect
	  results

	* get-genome.c, iit-read-univ.c: Fixed zero-based behavior of -L option to
	  one-based behavior. Zero-based behavior introduced in revision 99737 on
	  2013-06-27.

	* index.html: Made changes for version 2014-05-15

	* gmap.c, gsnap.c: Added better test for behavior of max operation in SSE4.1

	* stage1hr.c: Fixed a memory leak involving ambcoords

	* stage1hr.c: Using nmatches_posttrim in evaluating alignments. Doing a
	  comparison between concordant alignments involving terminals and
	  halfmapping alignments, to determine the best solution.

	* stage3hr.c: Fixed assignment of amb_nmatches to correct end for minus
	  alignments in Stage3end_new_splice.
	  Extending hardclips by
	  amb_nmatches_start and amb_nmatches_end in Stage3pair_overlap.

	* stage3hr.c, stage3hr.h: Changed Stage3end_nmatches and Stage3pair_nmatches
	  to Stage3end_nmatches_posttrim and Stage3pair_nmatches_posttrim

2014-05-21 twu

	* samprint.c, samprint.h: Fixed typo in keeping a parameter

	* README: Clarified effect of --failed-input option

	* README: Fixed typo

	* acx_mmap_fixed.m4, acx_mmap_variable.m4: Fixed type incompatibility when
	  char * is cast to int

	* stage1hr.c: Changed categories for some debugging statements

	* gmap.c, gsnap.c, outbuffer.c, outbuffer.h, samprint.c, samprint.h,
	  stage3hr.c, stage3hr.h: Changed --fails-as-input flag to --failed-input
	  flag, which takes an argument. Printing failed inputs in addition to
	  nomapping output.

	* genome.c: Made error message clearer when genomebits128 file not found

2014-05-17 twu

	* stage3hr.c: Setting amb_nmatches_start, amb_nmatches_end,
	  start_ambiguous_p, and end_ambiguous_p based on amb_nmatches for halfdonor
	  and halfacceptor splices, even when ambcoords_left and ambcoords_right are
	  NULL

2014-05-16 twu

	* stage1hr.c: Minor fixes to debugging statements

	* stage1hr.c: Turned on macro for finding middle alignments.

	* stage1hr.c: Turned on finding of middle alignments in find_terminals. Set
	  length threshold to be querylength/3 instead of index1part.
6991 69922014-05-15 twu 6993 6994 * stage3.c: Removed revisions of coordinates near indels, which is not 6995 needed any more with the latest dynamic programming procedures 6996 69972014-05-13 twu 6998 6999 * VERSION, index.html: Updated version number 7000 7001 * dynprog_simd.c: For systems with SSE2 but not SSE4.1, subtracting 128 from 7002 pairscore in F loop of Dynprog_simd_8, and from initial column in 7003 Dynprog_simd_8_lower, to obtain correct results 7004 7005 * dynprog_genome.c: Removed debugging statements 7006 7007 * dynprog_genome.c: Fixed an uninitialized variable, best_prob 7008 7009 * trunk, README, src, Makefile.gsnaptoo.am, bytecoding.c, samprint.c, 7010 sarray-read.c, sarray-read.h, splice.c, splice.h, stage1hr.c, stage3hr.c, 7011 stage3hr.h, substring.c, substring.h, uint8list.c, uint8list.h, 7012 uintlist.c, uintlist.h: Merged revisions 135802 through 136084 from 7013 branches/2014-05-09-novel-ambiguous to consolidate ambiguous splices to 7014 save on memory usage and to consider them as a single concordant alignment 7015 7016 * pairpool.c: Not advancing coordinate at start of Pairpool_add_queryskip 7017 and Pairpool_add_genomeskip 7018 70192014-05-09 twu 7020 7021 * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, 7022 index.html, src, sarray-read.c, splice.c, splice.h, stage1hr.c, 7023 stage3hr.c, stage3hr.h, substring.c, substring.h: Merged revisions 135797 7024 through 135801 from branches/2014-05-09-novel-ambiguous to save state 7025 before implementing novel ambiguous positions 7026 7027 * gsnap.c: Improved output for MEMUSAGE 7028 7029 * dynprog_simd.c: Speeding up traceback for upper and lower triangles, to 7030 take advantage of the fact that the main diagonal is filled with DIAGs. 7031 7032 * sarray-read.c, stage1hr.c, stage3hr.c, stage3hr.h: Grouping multiple 7033 splice segments connected to a given splice site, and selecting the best 7034 among these. 
If multiple best choices are found, creating an ambiguous 7035 splice. 7036 7037 * dynprog_simd.c: In traceback procedures, added check after each indel to 7038 see if we are in row 0 or column 0, and if so, not to trust the value in 7039 directions_nogap. 7040 70412014-05-08 twu 7042 7043 * splicetrie_build.c: Fixed comparison with MAX_SITES_ALLOCATED for 7044 intron-level splicing files 7045 7046 * gsnap.c: Setting POOL_FREE_INTERVAL to be 1 7047 7048 * gsnap.c, outbuffer.c: Using new interface to memusage routines 7049 7050 * mem.c, mem.h: Changed variable names for memusage. Added memusage report 7051 for keep pool. 7052 7053 * dynprog_simd.c: Fixed bug with infinite loop at row 1 or column 1, leading 7054 to error in memory allocation for pairpool 7055 7056 * dynprog_simd.c, pairpool.c: Adding 1 to r and c only on final indel 7057 7058 * pairpool.c: When adding indel (queryskip or genomeskip), not advancing 7059 coordinate by 1, since that can cross over chromosomal bounds 7060 70612014-05-07 twu 7062 7063 * VERSION: Updated version number 7064 7065 * Makefile.gsnaptoo.am, cellpool.c, cellpool.h, gmap.c, smooth.c, 7066 stage1hr.c, stage1hr.h, stage2.c, stage2.h, stage3.c, stage3.h, 7067 uniqscan.c: Added Cellpool_T object to handle allocations of Cell_T in 7068 stage 2 7069 7070 * gsnap.c: Calling Pairpool_reset, Diagpool_reset, and Cellpool_reset before 7071 processing each request. Previously, this memory was not being freed 7072 until the end of the process. 7073 7074 * splicestringpool.c: Changed memory procedures to use standard pool instead 7075 of keep pool 7076 7077 * dynprog_simd.c: Fixed saturation bug in F loop when trying to add 7078 pairscore. Setting E and H so non-diag directions are placed in row 0 and 7079 column 0. At end of traceback procedures, adding final indel to (0,0). 7080 7081 * stage3.c: Having make_pairarray_merge return a boolean to indicate success 7082 or failure. Trying to use the old pairarray in case of failure. 
7083 7084 * gmap.c, gsnap.c, uniqscan.c: Using new interface to Splicetrie_retrieve 7085 procedures, using Splicestringpool_T object 7086 7087 * Makefile.gsnaptoo.am, splicestringpool.c, splicestringpool.h, 7088 splicetrie_build.c, splicetrie_build.h: Using Splicestringpool_T object to 7089 reduce number of memory allocations for Splicestring_T objects. Using 7090 local array for sites to also reduce number of memory allocations. 7091 7092 * splicetrie_build.c: Allocating struct Interval_T when copies are needed, 7093 to reduce the number of calls to allocate memory. Allocating triecontents 7094 as an array instead of uintlist, also to reduce the number of calls to 7095 allocate memory. 7096 7097 * interval.c, interval.h: Allocating struct Interval_T when copies are 7098 needed, to reduce the number of calls to allocate memory 7099 71002014-05-06 twu 7101 7102 * dynprog.h: Added definitions of POS_INFINITY_8 and POS_INFINITY_16 7103 7104 * pair.c: Returning correct type of NULL 7105 7106 * dynprog_cdna.c, dynprog_genome.c: Fixed potential memory leak with 7107 SNP-tolerant alignment 7108 7109 * genome.c: If glength == 0, Genome_get_segment_blocks now returns NULL 7110 7111 * stage3.c: Changed condition for not running Dynprog_single_gap from 7112 (queryjump < 0 || genomejump < 0) to (queryjump <= 0 || genomejump <= 0) 7113 71142014-05-05 twu 7115 7116 * stage1hr.c: Checking for anomalous splice (samechr splice) before trying 7117 to compute mapping position from distal end 7118 7119 * compress.c: Bypassing SSSE3 version of Compress_shift when 7120 DEFECTIVE_SSE2_COMPILER is true 7121 71222014-05-02 twu 7123 7124 * ax_ext.m4: Not allowing AVX unless immintrin.h is present 7125 7126 * indexdb.c: Freeing offsetspages for large genomes 7127 7128 * gsnap.c: For MEMUSAGE, changing pool free interval to be 1 7129 7130 * dynprog.c: Freeing nt_to_int_array 7131 7132 * dynprog_simd.c: Simplified traceback procedures 7133 71342014-05-01 twu 7135 7136 * stopwatch.c: Removed 
comment 7137 7138 * stage3hr.c: Allocating ambi and amb_nmismatches into output memory pool 7139 7140 * stage1hr.c: Limiting the number of terminals 7141 7142 * mem.c: Fixed memusage_reset procedure and revised debugging messages to 7143 print pool type 7144 7145 * intlist.c, intlist.h: Implemented Intlist_to_array_out 7146 7147 * compress.c, compress.h: Implemented SSSE3 procedure and fixed bug in SSE2 7148 procedure for Compress_shift 7149 7150 * genome128_hr.c: Fixed block_diff_snp procedure to perform the complete 7151 calculation for query, ref, and alt sequences. 7152 7153 * genome.c: Fixed Genome_get_segment_blocks_left and _right to provide 7154 correct alternate genomic segment 7155 7156 * dynprog_simd.c: Setting values along bands to be DIAG to avoid going out 7157 of bounds. Revised loops for gaps in traceback procedures. Compensating 7158 for open value in comparing E vs H for dir_horiz and dir_vert. Using 7159 nt_to_int_array. Improved debugging print procedures. 7160 7161 * dynprog.c, dynprog.h: Introduced nt_to_int_array. Setting score for 7162 AMBIGUOUS to be 0, and setting N-N to be that score. 7163 71642014-04-24 twu 7165 7166 * ax_ext.m4: Removing -mavx and -mavx2 compiler flags for now. Being added 7167 to Mac OS X Mavericks, where they are causing problems. 7168 7169 * stopwatch.c: Added macro for specifying POSIX C time 7170 7171 * stage3hr.c: Allowing pair_insert_length_trimmed to handle non-concordant 7172 paired ends 7173 7174 * genome128_hr.c: Implemented code for defective SSE2 compilers that cannot 7175 handle shifts with a non-immediate scalar 7176 7177 * samprint.c: Fixed printing of XS for half-donor and half-acceptor reads 7178 71792014-04-21 twu 7180 7181 * iit-read-univ.c: Printing tp:circular for circular chromosomes 7182 7183 * stage3.c: Using new interface to Pair_circularpos 7184 7185 * stage3hr.c: Computing insertlength properly for circular chromosomes when 7186 ends have different aliases. 
Handling duplicates of aliases better. 7187 7188 * pair.c, pair.h: Computing alias correctly in Pair_circularpos 7189 7190 * pair.c: Fixed bug when Pair_trim_ends is called when pairs is NULL 7191 71922014-04-19 twu 7193 7194 * trunk, src, Makefile.gsnaptoo.am, atoiindex.c, cmetindex.c, genome.c, 7195 genome.h, genome128_hr.c, genome128_hr.h, gmapindex.c, gsnap.c, indel.c, 7196 indel.h, indexdb.c, mapq.c, mapq.h, oligo.c, oligoindex_hr.c, 7197 sarray-read.c, sarray-read.h, sarray-write.c, sarray-write.h, splice.c, 7198 splice.h, splicetrie.c, splicetrie.h, stage1hr.c, stage1hr.h, stage3hr.c, 7199 stage3hr.h, substring.c, substring.h: Merged revisions 133654 through 7200 133759 from branches/2014-04-18-cmet-atoi-suffix-arrays to fix CMET and 7201 ATOI procedures 7202 7203 * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version 7204 number 7205 7206 * index.html: Updated for version 2014-04-17 7207 7208 * stage3.c: Fixed uninitialized variable 7209 72102014-04-17 twu 7211 7212 * indexdb.c: Creating correct bitpack filenames when snps_root is given 7213 7214 * snpindex.c: Handling the case when an IIT file is not provided on the 7215 command-line, and installation is not needed 7216 72172014-04-10 twu 7218 7219 * gmap_build.pl.in: Fixed documentation for compression types 7220 7221 * dynprog_simd.c, dynprog_simd.h, dynprog_single.c: Added traceback 7222 procedure for Score16_T variables 7223 72242014-04-09 twu 7225 7226 * snpindex.c: Creating genomebits128 format instead of genomebits format 7227 7228 * sarray-read.c: Made changes in code (commented out) to try to infer 7229 lcp-intervals for short oligomers from 12-mer index, but still has some 7230 bugs 7231 7232 * archive.html, index.html: Made changes for version 2014-04-08 and 7233 2013-03-28.v2 7234 72352014-04-08 twu 7236 7237 * VERSION: Updated version number 7238 7239 * gmap.c: Removed unnecessary const 7240 7241 * stage2.c: Applying maxintronlen in deciding whether to create a grand 7242 
lookback 7243 7244 * pairpool.c: Fixed bug in Pairpool_compact_copy 7245 7246 * stage3.c: Removed debugging statement 7247 7248 * trunk, src, Makefile.gsnaptoo.am, dynprog.c, dynprog.h, dynprog_cdna.c, 7249 dynprog_cdna.h, dynprog_end.c, dynprog_end.h, dynprog_genome.c, 7250 dynprog_genome.h, dynprog_simd.c, dynprog_simd.h, dynprog_single.c, 7251 dynprog_single.h, gmap.c, gsnap.c, sarray-read.c, sequence.c, sequence.h, 7252 splicetrie.c, stage3.c, stage3.h, uniqscan.c, util: Merged --reintegrate 7253 branches/2014-04-04-dynprog-shift to change dynprog_end routines to use 7254 upper/lower algorithm without F loops 7255 7256 * index.html: Updated for version 2014-04-06 7257 7258 * pairpool.c, pairpool.h: Implemented Pairpool_compact_copy 7259 7260 * oligoindex_hr.c: Freeing storage memory in Oligoindex_array_T object 7261 7262 * stage1.c: Checking extensions to make sure they fall within 7263 --max_totallength 7264 7265 * chimera.c, chimera.h, gmap.c, stage3.c, stage3.h: Checking chimeras to see 7266 that they satisfy --min-trimmed-coverage and --min-identity filters 7267 7268 * mem.c: Turned off DEBUG macro 7269 7270 * iit-read-univ.c: Restored warning message when IIT file cannot be read 7271 7272 * gmapindex.c, sarray-write.c, sarray-write.h: Implemented 7273 Sarray_child_uncompress 7274 7275 * gmap.c: No longer proceeding to align if Stage2_scan yields max_ncovered 7276 of less than 10% of the querylength. 7277 7278 * genome128_hr.c: Made additional changes to avoid _mm_extract_ps on 7279 non-SSE4.1 systems 7280 72812014-04-07 twu 7282 7283 * sarray-read.c: When effective querylength is less than bucket indexsize, 7284 no longer trying to infer lcp-interval from bucket array, but just 7285 starting from entire lcp-interval. 
7286 72872014-04-06 twu 7288 7289 * gmap.c, gsnap.c, uniqscan.c: Checking for valid float values between 0.0 7290 and 1.0 7291 7292 * dynprog_genome.c, splicetrie.c: Fixed headers for dynprog routines 7293 7294 * dynprog_single.c: Using a variable in the F loop to see if H needs to be 7295 reloaded 7296 7297 * oligoindex_hr.c: Allocating storage for Oligoindex_array_T 7298 72992014-04-05 twu 7300 7301 * trunk, src, Makefile.gsnaptoo.am, dynprog.c, dynprog.h, dynprog_genome.c, 7302 dynprog_genome.h, dynprog_single.c, dynprog_single.h, gmap.c, gsnap.c, 7303 pairpool.c, pairpool.h, stage3.c, stage3.h, uniqscan.c, util: Merge 7304 reintegrated branches/2014-04-04-dynprog-shift to use new SIMD routines 7305 that are row-first and reduce use of F loops 7306 7307 * util, src: Removed property changes 7308 7309 * VERSION: Updated version number 7310 7311 * gmap.c, gsnap.c, oligoindex_hr.c, oligoindex_hr.h, stage1hr.c, stage1hr.h, 7312 stage2.c, stage2.h, stage3.c, stage3.h, uniqscan.c: Created a separate 7313 Oligoindex_array_T object, which also holds storage 7314 7315 * configure.ac: Added AC_FUNC_ALLOCA 7316 7317 * mem.h: Added hooks for memory allocation using alloca 7318 7319 * bitpack64-readtwo.c: Added missing #endif 7320 7321 * indexdb.c: Fixed type of offsets from UINT4 back to Positionsptr_T 7322 7323 * iit-read-univ.c: Making Univ_FNode_T struct separate from FNode_T struct 7324 7325 * gmapindex.c: Checking genomelength before trying to create suffix array or 7326 LCP/child/DC arrays 7327 7328 * bitpack64-readtwo.c: Fixed bug where not enough 128-bit registers were 7329 provided for large genomes 7330 7331 * bitpack64-read.c: Eliminated an extra addition from computing offset in 7332 large genomes 7333 73342014-04-03 twu 7335 7336 * uniqscan.c: Using new interface to Oligoindex_new routines 7337 7338 * index.html: Updated for 2014-04-01 version 7339 7340 * archive.html: Moved 2014-03-28 and earlier versions to archive 7341 7342 * acinclude.m4, ax_ext.m4,
ax_gcc_x86_avx_xgetbv.m4: Updated ax_ext.m4 and 7343 added ax_gcc_x86_avx_xgetbv 7344 7345 * uniqscan.c: Using new interface to Stage3_setup 7346 7347 * stage3.c, stage3.h: Providing min_end_indel_matches to end trimming 7348 procedures 7349 7350 * sarray-read.c, sarray-read.h: Added separate access control for lcpchilddc 7351 7352 * oligoindex_hr.c, oligoindex_hr.h: Allocating dedicated space needed for 7353 Oligoindex_get_mappings, to avoid memory allocation/deallocation 7354 7355 * gsnap.c: Using new interface to Oligoindex_new routines, Stage3_setup, and 7356 Sarray_new 7357 7358 * gmap.c: Using new interface to Oligoindex_new routines and Stage3_setup 7359 7360 * genome128_hr.c: Provided alternative to _mm_extract_ps, which also 7361 requires SSE4.1 7362 73632014-04-02 twu 7364 7365 * dynprog.c: Modified debugging statements 7366 7367 * ax_ext.m4: Adding warning messages when immintrin.h is not found and could 7368 be used 7369 7370 * ax_ext.m4: Checking for immintrin.h before allowing popcnt, lzcnt, or bmi1 7371 7372 * ax_ext.m4: Improved warning messages 7373 7374 * trunk, VERSION, ax_ext.m4, config.site.rescomp.prd, 7375 config.site.rescomp.tst, src, Makefile.gsnaptoo.am, atoiindex.c, 7376 bitpack64-access.c, bitpack64-read.c, bitpack64-read.h, 7377 bitpack64-readtwo.c, bitpack64-readtwo.h, bitpack64-serial-read.c, 7378 bitpack64-serial-read.h, bitpack64-serial-write.c, 7379 bitpack64-serial-write.h, bitpack64-write.c, bitpack64-write.h, 7380 bytecoding.c, bytecoding.h, cmetindex.c, compress-write.c, 7381 compress-write.h, compress.c, compress.h, compress128.c, dynprog.c, 7382 genome-write.c, genome-write.h, genome.c, genome128-write.c, 7383 genome128-write.h, genome128_hr.c, genome128_hr.h, genome_hr.c, 7384 genome_hr.h, genome_sites.c, gmap.c, gmapindex.c, gsnap.c, 7385 iit-read-univ.c, iit-read.h, iit-write-univ.h, iit-write.h, iitdef.h, 7386 indel.c, indexdb-write.c, indexdb-write.h, indexdb.c, indexdb.h, 7387 indexdb_hr.c, indexdbdef.h, mapq.c,
sarray-read.c, sarray-read.h, 7388 sarray-write.c, sarray-write.h, snpindex.c, splice.c, splicetrie.c, 7389 splicetrie_build.c, stage1hr.c, stage2.c, stage3.c, stage3hr.c, 7390 substring.c, types.h, uniqscan.c, util, gmap_build.pl.in: Major change. 7391 Merged revisions 131573 to 132142 from branches/2014-03-26-bitpack-esa to 7392 use genomebits128 format, bp64-columnar, and enhanced suffix arrays 7393 7394 * VERSION: Updated version number 7395 7396 * index.html: Revised for latest version 7397 7398 * stage3hr.c: Not using number of introns to determine equivalence in 7399 hit_equiv_cmp and hitpair_equiv_cmp 7400 7401 * genome_sites.c: Resolved comparison between unsigned and signed values for 7402 -1 7403 74042014-04-01 twu 7405 7406 * boyer-moore.c, dynprog.c, stage3.c: Implemented changes to restore finding 7407 microexons 7408 7409 * shortread.c: Handling Casava reads ending in ";1". Fixed problem where -q 7410 and --allow-pe-name-mismatch together caused a fatal bug. 7411 7412 * genome_hr.c: Changed debugging output for splice fragments to print 7413 unsigned shorts 7414 74152014-03-28 twu 7416 7417 * indexdb.c: Changed type of offsets (called only for regular bitpack 7418 procedure) from Positionsptr_T * to UINT4 * 7419 7420 * indexdb_hr.c: Calling correct procedures for LARGE_GENOMES 7421 7422 * substring.c, substring.h, stage3hr.c: Fixing comparisons of coordinates to 7423 handle circular chromosomes 7424 7425 * stage3hr.h: Removed queryseq as arguments to Stage3end_remove_duplicates 7426 7427 * stage1hr.c: Calling Stage3end_remove_duplicates after 7428 Stage3end_remove_circular_alias 7429 74302014-03-27 twu 7431 7432 * stage1hr.c, stage3hr.c, stage3hr.h: Trying to salvage alias +1 within 7433 Stage3end_remove_circular_alias, and calling that rather than 7434 Stage3end_unalias_circular 7435 7436 * stage3hr.c: In Stage3_new_splice, not trying to merge long-distance 7437 splices at this time, which can lead to bad coordinates for GMAP 7438 7439 * stage1hr.c:
Initializing variable, previously not initialized 7440 7441 * pair.c: Fixed assertion on CIGAR length to include hardclips 7442 7443 * gsnap.c: Including splice.h 7444 7445 * chimera.c: Rearrange order of loops in Chimera_bestpath 7446 74472014-03-26 twu 7448 7449 * stage3hr.c: Fixed bug where overlap across circular chromosome origin is 7450 entirely trimmed, leading to an "SH" cigar string 7451 7452 * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, 7453 index.html, util: Updated version number 7454 7455 * gmap.c: Checking return value of Chimera_bestpath 7456 7457 * chimera.c, chimera.h: Chimera_bestpath returning a value to indicate if a 7458 chimera was found 7459 7460 * stage3.c: In Stage3_merge_local, traversing cDNA gap any time intronlength 7461 is less than 0 7462 7463 * pairpool.c: Created separate debugging category for Pairpool_clean_join 7464 7465 * gsnap.c, splice.c, splice.h: Using min_shortend to control splice length 7466 at ends 7467 7468 * sarray-read.c: Fixed problem where splice or deletion could extend into 7469 next chromosome 7470 74712014-03-17 twu 7472 7473 * translation.c: Fixed bug where assign_cdna_backward was returning ncdna 7474 instead of codon 7475 74762014-03-12 twu 7477 7478 * sarray-read.c: Put sarray_search loop into a new function, 7479 find_longest_match 7480 7481 * sarray-read.c: Improved sarray_search by looking up genome only when lcp 7482 advances by more than one position 7483 7484 * oligoindex_hr.c: Using faster method for checking for zero 128-bit 7485 register when SSE4.1 is available 7486 7487 * dynprog.c, gdiag.c, indexdb_hr.c, oligo.c, pair.c, sarray-read.c, 7488 spanningelt.c, stage1.c, stage1hr.c, stage2.c: Using safer method for 7489 computing average of lowi and highi in binary search 7490 7491 * access.c: Handling empty files 7492 74932014-02-28 twu 7494 7495 * gmap.c, stage3.c, stage3.h: Removed alignment_score_fwd and 7496 alignment_score_rev from Stage3_T object 7497 7498 * VERSION: 
Updated version number 7499 7500 * stage3.h, gmap.c: Using maxintronlen_bound in Stage3_mergeable 7501 7502 * stage3.c: Not trimming exons at end based on splice site probabilities 7503 7504 * stage2.c: In D4 section, when diffdistance <= EQUAL_DISTANCE_NO_SPLICING, 7505 adding CONSEC_POINTS_PER_MATCH, to extend chains further 7506 7507 * gmap.c: Performing iterations for finding a local join, before iterations 7508 for finding a chimera. 7509 7510 * stage3.c: Revised criteria for pick_cdna_direction. When cdna_direction 7511 == 0, assigning fwd or rev introntype instead of NONINTRON. 7512 7513 * Makefile.dna.am, Makefile.gsnaptoo.am, get-genome.c, gmap.c, outbuffer.c, 7514 outbuffer.h, stage1.c, stage1.h: Removed references to Chrsubset_T. Made 7515 -c flag work again by setting universal coordinate bounds in stage 1. 7516 7517 * gmap.c: Checking coverage of nonchimericbest against each chimeric part, 7518 and if coverage is large enough on each side, picking non-chimeric 7519 alignment over chimera. 7520 7521 * stage3.c, stage3.h: Not determining mergeable based on cdna_directions of 7522 the left and right part. In that situation, fixing the alignment by 7523 recomputing to find the best cdna_direction. 7524 7525 * pair.c: Disallowing transitions even 10 bp outside of alignments 7526 7527 * chimera.c: Calling Pair_pathscores directly, instead of through 7528 Stage3_pathscores 7529 7530 * gmap_build.pl.in: Removed reference to localdir. Putting tempfiles for 7531 <genome>.coords and <genome>.sources into destination directory. 7532 75332014-02-27 twu 7534 7535 * sarray-read.c: Slight speed improvement in handling pre-alignment loop. 7536 Now using 0 instead of 4. 
7537 7538 * stage3hr.c: Restored old version of Stage3end_remove_overlaps, with 7539 different algorithm from that for paired ends 7540 75412014-02-26 twu 7542 7543 * stage1hr.c: For paired-end reads, when only one end is too short, aligning 7544 the other end as part of a halfmapping alignment 7545 75462014-02-24 twu 7547 7548 * stage3hr.c: Implemented recursive, list-based approach to removing bad 7549 superstretches in paired-end alignments, instead of O(n^3) algorithm, 7550 which occasionally hung in repetitive regions 7551 7552 * gmap.c: Added debugging statements for chimera 7553 7554 * stage2.c: Reducing NINTRON_PENALTY_MISMATCH from 32 to 1, because short 7555 exons were being missed. Also, eliminated querydist_credit and restored 7556 querydist_penalty. 7557 7558 * chimera.c: In Chimera_local_join_p, checking genomic positions to make 7559 sure they make sense 7560 75612014-02-21 twu 7562 7563 * gsnap.c, pair.c, pair.h, samprint.c, samprint.h: Implemented option 7564 --hide-soft-clips 7565 7566 * stage3.c: Returning value from merge_local_single 7567 7568 * substring.c, substring.h: Added function Substring_queryend_orig 7569 75702014-02-20 twu 7571 7572 * gsnap.c: Using new interface to Stage2_setup 7573 7574 * src, chimera.c, diagpool.c, gmap.c, oligoindex.c, oligoindex_hr.c, pair.c, 7575 stage2.c, stage2.h, stage3.c, uniqscan.c: Merged revisions 128075 to 7576 128117 from branches/2014-02-20-chimera-breakpoint to fix cases where 7577 breakpoint was found outside of alignments 7578 75792014-02-19 twu 7580 7581 * dynprog.c: Handling lower-case nucleotides correctly in dynamic 7582 programming. Handling alternate alleles equally in dynamic programming.
7583 7584 * bitpack64-read.c, indexdb.c: Fixed bugs in handling bitpackpages file for 7585 huge genomes 7586 75872014-02-18 twu 7588 7589 * pair.c, pair.h, stage3.c: Added numbers of matches, mismatches, indels and 7590 unknowns to GFF3 output 7591 7592 * VERSION: Updated version 7593 7594 * ax_ext.m4: Fixed typo 7595 7596 * gmap_build.pl.in: Not printing warning message about -T unless it is used. 7597 Deleting .sources file. 7598 7599 * chimera.c, chimera.h, gmap.c: Added donor_watsonp and acceptor_watsonp to 7600 fields for GMAP 7601 7602 * stage3hr.c: Fixed bug in printing header twice for concordant 7603 translocations in standard GSNAP output format 7604 7605 * get-genome.c: Added output for vareffect 7606 7607 * samprint.c: Added strands to XT field 7608 7609 * indexdb-write.c: When checking bitpack64, printing warning messages rather 7610 than exiting. 7611 76122014-02-14 twu 7613 7614 * sarray-read.c: Fixed a bug where r for an lcp-interval was being 7615 decremented from 0 to -1U. 
7616 76172014-02-13 twu 7618 7619 * samprint.c: Added distant splice information to XT field, but without 7620 strand information 7621 7622 * samprint.c: Added XC flag to indicate a circular alignment 7623 76242014-02-11 twu 7625 7626 * stage1hr.c: Removed unnecessary variables that have been made global 7627 within the file 7628 7629 * branchpoint.c, branchpoint.h, splicing-score.c: Added bpa-all option to 7630 print all marginals 7631 7632 * iit-read.c: Handling cases in signed matching where sign == 0 7633 7634 * get-genome.c: Added option --aslabel 7635 7636 * sarray-read.c: Turning off debugging 7637 76382014-02-02 twu 7639 7640 * get-genome.c: Added debugging statements 7641 76422014-01-31 twu 7643 7644 * get-genome.c: Making typed queries when user provides a typestring 7645 7646 * trunk, README, src, Makefile.dna.am, Makefile.gsnaptoo.am, branchpoint.c, 7647 branchpoint.h, gsnap.c, indel.c, indel.h, sarray-read.c, splicing-score.c, 7648 stage1hr.c, translation.c, uniqscan.c, util: Merged revisions 125127 to 7649 125307 from branches/2014-01-30-lsm-branch-point to add bpa analysis to 7650 splicing-score 7651 76522014-01-29 twu 7653 7654 * splicing-score.c: Added -v flag to handle alternate alleles 7655 76562014-01-21 twu 7657 7658 * fa_coords.pl.in: Fixed syntax error 7659 7660 * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html, 7661 src, closeparen-table.pl, excess-leftward.pl, excess-rightward.pl, 7662 openparen-table.pl, select1-table.pl: Updated version number 7663 7664 * Makefile.dna.am, Makefile.gsnaptoo.am: Added SIMD_FLAGS and POPCNT_FLAGS 7665 to all programs that include compress.c 7666 7667 * sarray-read.c: Added benchmarking procedure Sarray_traverse_children 7668 7669 * oligo.c: Added "U" to end of integer constants where necessary 7670 7671 * gsnap.c: Fixed typo in --help statement 7672 7673 * gmap.c: Fixed revision of extension length relative to end-segment length 7674 for finding chimeras 7675 76762014-01-15 twu 7677 
7678 * gmap_build.pl.in: Fixed variable name 7679 76802013-12-23 twu 7681 7682 * sarray-read.c: Added comments 7683 7684 * Makefile.gsnaptoo.am: Added files for bp-read and bp-write 7685 7686 * bp-read.c, bp-read.h, bp-write.c, bp-write.h, bp.h, gmapindex.c, 7687 sarray-write.c, sarray-write.h, types.h: Merged revisions 119699 to 122378 7688 from branches/2013-11-27-child-bitvector to add code for bitvector 7689 representation of child array 7690 7691 * bitpack64-read.c: Removed extraneous addition for second half of block for 7692 packsize of 32 7693 76942013-12-20 twu 7695 7696 * util, gmap_build.pl.in: Merged revisions 119700 through 122189 from 7697 branches/2013-11-27-child-bitvector/util to write genome indices in 7698 destination directory 7699 7700 * fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Writing FASTA 7701 sources to a file 7702 77032013-12-19 twu 7704 7705 * Makefile.dna.am: Added gamma-speed-test and bitpack64-speed-test 7706 7707 * splicing-score.c: Checking if coordinates are valid 7708 7709 * gamma-speed-test.c: Printing nanoseconds per query 7710 7711 * bitpack64-speed-test.c: Using new interface to bitpack64 read commands 7712 77132013-12-17 twu 7714 7715 * stage1hr.c, stage3hr.c, stage3hr.h: Requiring single or unpaired terminals 7716 to be 2/3rds of the querylength 7717 7718 * sarray-read.c: Fixed memory leak 7719 77202013-12-16 twu 7721 7722 * dynprog.c: Fixed a bug where the wrong genomic position was provided for a 7723 high-probability microexon. 7724 7725 * stage1hr.c: Not passing in max_terminal_length to find_terminals 7726 7727 * sarray-read.c: Allowing initial lcp interval for the first nucleotide 7728 7729 * gmapindex.c: Changed name of child_uncompress procedure 7730 7731 * bitpack64-write.c: Handling the case for Bitpack64_write_direct where the 7732 packsize is 0. 
7733 7734 * sarray-write.c, sarray-write.h: Included code for bp version of child 7735 information 7736 77372013-12-13 twu 7738 7739 * trunk, config.site.rescomp.tst, src, Makefile.dna.am, 7740 Makefile.gsnaptoo.am, atoiindex.c, bitpack64-access.c, bitpack64-access.h, 7741 bitpack64-read.c, bitpack64-read.h, bitpack64-write.c, bitpack64-write.h, 7742 chimera.c, cmetindex.c, dynprog.c, dynprog.h, genome.c, genome.h, gmap.c, 7743 gmapindex.c, indexdb-write.c, indexdb-write.h, indexdb.c, indexdb.h, 7744 indexdb_hr.c, indexdbdef.h, oligoindex_hr.c, oligoindex_hr.h, 7745 oligoindex_pmap.c, oligoindex_pmap.h, pmapindex.c, sarray-read.c, 7746 sarray-write.c, sarray-write.h, sequence.c, sequence.h, snpindex.c, 7747 splicetrie.c, splicetrie.h, splicing-score.c, stage1hr.h, stage2.c, 7748 stage2.h, stage3.c, stage3.h, util: Merged revisions 120874 through 121506 7749 from branches/2013-12-10-pmap 7750 77512013-12-04 twu 7752 7753 * substring.c: Fixed issue where circularpos was assigned in the region of 7754 soft-clipping, leading to an improper CIGAR string 7755 7756 * pair.c: In GFF output, printing '.' instead of '?' for unknown strand 7757 77582013-11-27 twu 7759 7760 * index.html: Added statement about portability of NAN 7761 7762 * iit_store.c: Handling NAN when it is not available 7763 7764 * VERSION, index.html: Updated version number 7765 7766 * popcount.c, popcount.h: Added popcount.c and popcount.h to store tables 7767 7768 * Makefile.gsnaptoo.am: Added popcount.c and popcount.h 7769 7770 * genome_sites.c: Added include of popcount.h. 7771 7772 * genome_hr.c: Moved tables to popcount.c, and added include of popcount.h. 7773 7774 * indexdb-write.c: Revised size of memory reserved for bitpackptrs. Moved 7775 tables to popcount.c, and added include of popcount.h. 7776 7777 * sarray-write.c: Revised size of memory reserved. 
Added include of 7778 popcount.h 7779 77802013-11-22 twu 7781 7782 * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html: 7783 Updated version number 7784 7785 * stage3.c: In build_pairs_singles, build_pairs_introns, and 7786 build_pairs_dualintrons, remove gaps at the beginning and any gap at the 7787 end of the path. 7788 7789 * outbuffer.c: Fixed bug in using --fails-as-input on single-end reads 7790 7791 * stage3.c: Replaced one call to insert_gapholders with List_reverse 7792 7793 * Makefile.dna.am: Added files needed for gmapindex and uniqscan 7794 7795 * stage3.c: Inserting gapholders before calling assign_intron_probs 7796 77972013-11-21 twu 7798 7799 * iit-read.c: Fixed issue with IIT_fieldvalue returning the previous line 7800 from what is desired 7801 78022013-11-20 twu 7803 7804 * stage3hr.c: Slight improvement in efficiency of randomization procedure 7805 7806 * stage3hr.c: Using a different formula for generating a random integer 7807 7808 * stage3hr.c: Put an explicit type conversion to double before RAND_MAX 7809 7810 * stage3hr.c: Now picking primary alignment randomly among best alignments 7811 with ties 7812 7813 * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, 7814 index.html, src, Makefile.dna.am, Makefile.gsnaptoo.am, dynprog.c, 7815 iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_get.c, iit_store.c, 7816 iitdef.h, littleendian.h, util: Merged revisions 115687 to 115891 from 7817 branches/2013-11-19-iit-values 7818 7819 * archive.html: Moved version 2013-10-28 to archive 7820 7821 * README: Changed discussion to use gmap_build and not gmap_setup 7822 7823 * configure.ac, Makefile.am: Removing gmap_setup program 7824 7825 * gmap_build.pl.in: Added --contigs-are-mapped and --fasta-pipe options. 7826 Providing arguments to subroutines. 
7827 7828 * archive.html: Added link to old database format using gamma compression 7829 7830 * index.html: Provided link to old genome database formats 7831 7832 * src, spanningelt.c: Fixed bug in code for spanning elt in regular-sized 7833 genomes 7834 7835 * index.html: Updated for version 2013-10-28 7836 78372013-11-19 twu 7838 7839 * README: Changed N1 and N2 abbreviations to NM 7840 7841 * configure.ac, Makefile.am, ensembl_genes.pl.in: Added ensembl_genes program 7842 7843 * gmap_build.pl.in: Clarified usage of --circular feature when --names 7844 feature is also used 7845 7846 * bitpack64-read.c: Declared variable needed for non-SSE2 compilation 7847 7848 * stage2.c: Revised debugging statement 7849 7850 * gmap.c: Added debugging statement 7851 7852 * oligoindex_hr.c: Made debugging statements print counts for each oligomer 7853 7854 * oligoindex.c: Resetting query_evaluated_p in Oligoindex_untally, to avoid 7855 interactions between queryseqs 7856 78572013-11-18 twu 7858 7859 * dynprog.c: Not adding dashes if gap is long. Not performing simple genome 7860 gap if finalp is true. 7861 7862 * Makefile.dna.am, Makefile.gsnaptoo.am: Added sarray-read.c and 7863 sarray-read.h to gsnapl and uniqscanl 7864 7865 * uniqscan.c: Using new interface to Stage3hr_setup 7866 7867 * stage3hr.c, outbuffer.c: Removed unused code 7868 7869 * smooth.c, stage3.c: In traverse_dual_intron, if dual scores win over 7870 single score, protecting that exon from being smoothed by size 7871 7872 * pairdef.h: Added comment about protectedp 7873 7874 * pair.c, pair.h: Added function Pair_protect_list 7875 7876 * gmap.c: Reduced threshold for discarding gregion from 0.80*max_ncovered to 7877 0.25. 7878 7879 * stage3.c: Reduced amount of peelback. Made fixes to improve finding of 7880 short exons, including reducing criteria for dual_canonical_p, and 7881 eliminating smoothing by size.

	* access.c, atoiindex.c, bitpack64-read.c, bitpack64-read.h,
	  bitpack64-speed-test.c, cmetindex.c, genome_hr.c, genome_hr.h, gmap.c,
	  gmapindex.c, gsnap.c, iit-read-univ.c, indexdb-write.c, indexdb-write.h,
	  indexdb.c, indexdb.h, indexdb_hr.c, indexdbdef.h, littleendian.c,
	  littleendian.h, snpindex.c, spanningelt.c, spanningelt.h, stage1hr.c,
	  stage2.c, types.h: Merged changes from branches/2013-10-16-huge-genomes to
	  allow for huge genomes, where offsets (or length of positions file)
	  exceeds 2^32 entries

	* table.c, table.h: Changed type of keyfree procedure to take const void as
	  argument type

	* stage3hr.c, stage3hr.h, shortread.c, shortread.h: Allowing
	  --fails-as-input to work with new xs files

	* samflags.h: Changed N1 and N2 abbreviations to be NM

	* outbuffer.c, samprint.c, samprint.h: Allowing --fastq-as-input to work
	  with new xs files

	* gmap_build.pl.in: Merged changes from branches/2013-10-16-huge-genomes to
	  count offsets and to allow for huge genomes

2013-11-15 twu

	* README: Added discussion of XS categories

	* stage1hr.c: Setting ignore_found_score to be found_score, so the input
	  value will be correct for those procedures

	* stage1.c: For short sequences (< 4 times the default matchsize),
	  performing scan_ends twice, once with a short matchsize and once with a
	  default matchsize, to improve sensitivity and specificity

	* outbuffer.c, samflags.h, samprint.c, samprint.h, stage3hr.c, stage3hr.h:
	  Added xs (for quiet-if-excessive) output files and categories. Added
	  file_setup procedures for SAM and standard output types for GSNAP.

	* gmap.c: Removed --quiet-if-excessive option from GMAP, because not working
	  properly

	* gmap_build.pl.in: Added comment

	* indexdb-write.c: Not performing check of bitpack compression when there
	  are no k-mers

	* sarray-write.c: Include code for reading sarray using fread instead of mmap

2013-11-14 twu

	* Makefile.gsnaptoo.am, gmapindex.c, sarray-write.c, sarray-write.h: Using
	  less memory for creating LCP array and saindex. Using permuted LCP
	  algorithm. Memory mapping suffix array file and compressed LCP files.

	* genome_hr.c: Made some fixes to Genome_consecutive_matches_pair

2013-11-13 twu

	* genome_hr.c, genome_hr.h: Implemented Genome_consecutive_matches_pair

2013-10-29 twu

	* indexdb.c: Using a separate sanity check on positions filesize when
	  expanding bitpack offsets

	* stage1.c: Reducing initial matchsize for short reads to be less than half
	  the read length

	* stage3hr.c: Fixed bug where insert length was being computed between GMAP
	  and a TRANSLOC_SPLICE using pair_insert_length on a NULL substring.

2013-10-25 twu

	* samheader.c: Changed SO:unknown to SO:unsorted

	* gregion.c: Added comment

2013-10-24 twu

	* gregion.c: In Gregion_extend, when chrend goes past chrhigh, setting it to
	  chrlength - 1 and not chrlength

	* stage3.c: In peel_leftward and peel_rightward, removing initial pairs that
	  are gaps or indels, so we don't leave a gap or indel on the top.

	* genome_sites.c: Fixed bug in large genomes where -1U was being compared to
	  -1UL

2013-10-23 twu

	* samprint.c: For non-concordant pairs, setting clipdir to 0 when calling
	  SAM_compute_chrpos.

	* stage3hr.c: Now calling pair_insert_length_unpaired for unpaired
	  alignments involving GMAP, which helps to eliminate duplicate alignments.

	* README, acinclude.m4, sse2_shift_defect.m4, configure.ac: Added a test to
	  see if compiler can handle SSE2 shift commands properly, and setting
	  config.h variable automatically

2013-10-22 twu

	* VERSION: Updated version number

	* dynprog.c: Fixed bug in checking for either rlength or glength to be too
	  long

2013-10-21 twu

	* stage3.c: Added hooks for HMM step, but still not using it

	* stage3.h, gmap.c: Allowing stage3debug value of middle

	* oligoindex.c: Adjusting lookback for querypos with no hits. Making
	  lookback value equal for all oligoindices.

2013-10-18 twu

	* gmap.c, stage3.c, stage3.h: Made stage3debug work

2013-10-15 twu

	* stage1hr.c: Restored previous criteria for distant splicing

	* stage1hr.c: Loosened criteria for distant splice probabilities. Checking
	  for gmap_allowance on single-end GMAP and terminal improvement GMAP
	  alignments.

	* samprint.c: Fixed bug in printing deletion resulting in consecutive M
	  tokens

2013-10-11 twu

	* indexdb.c: Fixed memory leak when snp_root is given and desired file is
	  not found

	* boyer-moore.c: Checking if result of Genome_get_segment_blocks_left is NULL

	* dynprog.c, genome.c: In Genome_get_segment_blocks_right and
	  Genome_get_segment_blocks_left, returning NULL if the entire segment is
	  outside the chromosomal bounds

	* psl_splices.pl.in: Replaced hard-coded path

2013-10-10 twu

	* stage3.c: Fixed typo in variable name

	* stage2.c: Adding query_offset to pairs in non-standard (cmet or atoi) modes

2013-10-09 twu

	* VERSION, index.html: Updated version

	* README: Added section on compiler issues. Added section numbers. Revised
	  some sections for latest version.

	* configure.ac: Added option --with-defective-sse2-compiler

	* diag.c: Added debugging statements

	* compress.c: Added code for defective SSE2 compilers

	* stage3hr.c: In cases of overreach when concordant is expected, returning
	  NULL

	* gmap_build.pl.in: Added --no-sarray option to gmap_build

	* stage3.c: Making sure not to leave an INDEL_COMP or SHORTGAP_COMP on the
	  path/pairs after peelback. Treating SHORTGAP_COMP the same as INDEL_COMP
	  throughout the code.

	* VERSION: Updated version number

	* stage3hr.c: Fixed insert length calculation for GMAP/substring to go from
	  start5 to end3. Fixed overreach criteria to consider only substrings that
	  entirely pass the end of the other hit.

2013-10-08 twu

	* substring.c, substring.h: Added procedures for segment-based trimmed
	  overlap

	* stage3hr.c: Added a general segment-based algorithm for finding trimmed
	  insert length. Skipping cases with dual overreach.

	* stage3hr.c: Fixed bug in computing start and end positions in computing
	  non-GMAP trimmed insert length. Added code for handling overreach, by
	  changing splices to halfsplices.

	* stage3hr.c: Fixed bug in computing overlap between non-GMAP hits

2013-10-07 twu

	* stage3hr.c, substring.c, substring.h: Checking all possible substring
	  combinations in computing trimmed insertlength for non-GMAP hits

2013-10-04 twu

	* stage1hr.c: Fixed the computation of goal for finding the next chromosome
	  bounds

	* sarray-read.c: Handling the case where nmisses_allowed < 0. When saindex
	  yields no result, setting low and high to be the beginning and end of the
	  suffix array, rather than iterating through saindex for a hit.

	* gmap.c, stage3.c, stage3.h: In Stage3_merge_local, recovering gracefully
	  if clipping leads to a NULL list for either Stage3_T object

	* substring.c, substring.h: Revised calculation of substring trimmed
	  insertlength to account for trimming of 3' end

	* stage3hr.c: Revised calculation of GMAP-GMAP trimmed insertlength to
	  account for trimming

	* samprint.c: Revised criteria for printing D token in deletion. Made
	  compute_cigar follow the logic of print_cigar.

2013-10-03 twu

	* samprint.c: Not computing pair overlaps when a pair involves a circular
	  alignment

	* stage3hr.c: Improvements to computing insertlength for calculating pair
	  overlap. Using querylength instead of querylength_adj. Searching through
	  GMAP pairarray to find intersecting genomic position. Added checks for
	  (left + genomiclength) > chrhigh.

2013-10-02 twu

	* stage3.c: Reduced some parameters for looping in stage 3. Not looping for
	  GSNAP.

	* bitpack64-access.c: Added an access procedure for packsize of zero

	* bitpack64-read.c: Added a read procedure for packsize of zero

	* sarray-write.c: Changed type of len to size_t

	* indexdb-write.c: Handling the case when all offset differences are zero,
	  resulting in a packsize of zero

	* bitpack64-write.c: Allow writing when packsize is zero

	* stage3.c: Using new interface to Pair_print_sam

	* config.site.rescomp.prd, config.site.rescomp.tst: Updated version number

	* archive.html, index.html: Made changes for new version, including the one
	  with suffix arrays

	* outbuffer.c, pair.c, pair.h, samprint.c, samprint.h, stage3hr.c,
	  substring.c, substring.h: Made numerous changes to fix --clip-overlap,
	  involving trimming and GMAP alignments

2013-10-01 twu

	* sarray-read.c: Fixed memory leak

	* stage3.c: Turned off HMM in stage 3, because of possible core dumps

	* VERSION, config.site.rescomp.prd: Updated version number

	* stage3.c: Allowing iteration of outer loop based on dual_break_p.
	  Restored filtering by HMM to remove bad sections. Removing noncanonical
	  end exons only if both canonical and noncanonical introns exist.

	* index.html: Updated for latest version

	* fa_coords.pl.in, gmap_build.pl.in: Allowing user to specify circular
	  chromosomes using a file instead of the command line.

2013-09-30 twu

	* bitpack64-read.c: Fixed Bitpack64_offsetptr_only to work on poly-T.

	* trunk, src, bitpack64-read.c, dynprog.c, indexdb-write.c, sarray-write.c,
	  util: Merged revisions from branches/2013-09-27-bidir-bitpack64 to
	  implement bidirectional bitpack64 format

	* stage3hr.c: Fixed Stage3pair_overlap to handle trimmed ends

	* access.c: Disabling memory allocation on Macintoshes when single fread
	  fails

	* stage3hr.c, stage1hr.c: Added debugging statements

	* resulthr.c: Handling a missing case in Pairtype_string

	* gsnap.c: No longer setting terminal_output_minlength to be MAX_READLENGTH

	* genome_hr.c: For determining trim mismatches, setting query_unk_mismatch_p
	  to be false

	* access.c: Eliminated check for i >= 0, which is always true for size_t

	* trunk, src, dynprog.c, util: Merged revisions 109420 through 109556 from
	  releases/internal-2013-09-27

	* stage1hr.c, stage3hr.c, stage3hr.h: Merged revisions 109420 through 109556
	  from releases/internal-2013-09-27 to check for concordance after running
	  GMAP improvement on paired results

	* madvise-flags.m4: Added check for MADV_SEQUENTIAL

	* dynprog.c: Making NEG_INFINITY_16 and NEG_INFINITY_8 visible outside SIMD
	  code

	* gmap.c: No longer bounding chimera_margin by CHIMERA_SLOP

	* stage3.c: Avoiding indels at chimeric join by cleaning ends, extending
	  with no gaps, and then clipping at breakpoint

	* substring.c: Fixed assertions relative to chrhigh

	* samprint.c: Fixed bug in --clip-overlap SAM output where deletion occurs
	  at the clip site

2013-09-27 twu

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
	  number

	* access.c: Fixed workaround for Macintosh fread bug to handle large files

	* uniqscan.c, gsnap.c: Providing --gmap-allowance

	* samprint.c: Fixing problem with mate flags when the mate is a translocation

	* sarray-read.h: Passing max_end_deletions into Sarray_setup

	* sarray-read.c: If read from beginning does not go halfway across read,
	  trying a search from the middle of the read to see if it finds
	  substitutions

	* substring.c: No longer allowing donor or acceptor substrings in circular
	  chromosomes, thereby preventing translocations to those chromosomes

	* stage1hr.c, stage3hr.c, stage3hr.h: Using
	  Stage3end_better_equiv_unpaired_p as a way to determine whether to do
	  spanning set or complete set algorithms after suffix array method.

	* stage1hr.h: Added gmap_allowance to filter bad GMAP alignments

	* stage1hr.c: Added gmap_allowance to filter bad GMAP alignments. No longer
	  allowing GMAP splicing on circular chromosomes. No GMAP improvement on
	  sarray hits. Doing spanning set algorithm after suffix array algorithm if
	  nconcordant == 0.

2013-09-26 twu

	* sarray-read.c: Applying max_end_deletions. In collect_elt_matches,
	  looking at multiple hits between goal and high.

2013-09-25 twu

	* substring.c: Disallowing donor and acceptor substrings on the duplicate
	  length of a circular chromosome

2013-09-24 twu

	* trunk, src, dynprog.c, util: Reintegrated changes from
	  releases/internal-2013-09-01

	* stage3.c, stage3.h: Merged changes from releases/internal-2013-09-01 to
	  clean ends of local merges

	* sarray-read.c, sarray-read.h: Merged changes from
	  releases/internal-2013-09-01 to fix problem with insertions that are too
	  long

	* pairpool.c: Merged changes from releases/internal-2013-09-01 to add
	  debugging statements

	* pair.c: Merged changes from releases/internal-2013-09-01 to fix
	  Pair_start_bound and Pair_end_bound

	* gsnap.c: Merged changes from releases/internal-2013-09-01 to use new
	  interfaces to Sarray_setup

	* gmap.c: Merged changes from releases/internal-2013-09-01 to use new
	  interfaces to stage 3 chimera commands

	* VERSION: Updated version number

	* bitpack64-speed-test.c, gamma-speed-test.c: Finding both start and end of
	  position blocks

	* sarray-read.c, sarray-read.h: Make some fields available for benchmarking
	  purposes

2013-09-19 twu

	* pairpool.c: In Pairpool_clean_join, making sure that no gaps are left at
	  the medial ends

	* gmap.c, stage3.c, stage3.h: In Stage3_merge_local_splice, when a new
	  intron is required, performing a full call to path_compute_dir and
	  path_compute_final

	* gmap.c: Fixed behavior of -n 1 in GMAP, so it does not print chimeric
	  alignments

	* substring.c: Added new assertions to make sure alignstart and alignend do
	  not exceed chrhigh

	* sarray-read.c: For indels and splices, using subtract_bounded and
	  add_bounded to make sure the other piece is on the same chromosome

2013-09-18 twu

	* stage3hr.c: Stage3pair_overlap inverts return value for minus alignments

	* pair.c, stage3hr.c: Reworking of computation for pair overlap

2013-09-17 twu

	* bitpack64-speed-test.c: Added code for uncompressing offsets

	* stage3hr.c: Computing pair overlap based on difference between totallength
	  and insertlength, not alignment coordinates

	* indexdb.c: Fixed user message about expanding bitpackcomp

	* stage1hr.c: Fixed memory leak when newpair involving GMAP is not kept

	* stage3.c: When queryjump > nullgap and genomejump < 16 (not enough
	  material for stage 2), performing cDNA gap rather than dual break. In
	  path_compute_final, resolving dual breaks as single gaps.

	* stage3hr.c: Fixed memory leak when resolving inner splices

	* stage3hr.c: Fixed memory leak

	* shortread.c: Fixed invalid read in chopping adapters when read lengths are
	  unequal

	* stage3.c: Removed uninitialized variable

2013-09-16 twu

	* bitpack64-speed-test.c: Initial import into SVN

	* Makefile.dna.am: Made instructions match those for Makefile.gsnaptoo.am.
	  Added gamma-speed-test and bitpack64-speed-test.

	* gamma-speed-test.c: Simplified code to focus only on decoding

	* splicing-score.c: Fixed code for universal chromosome IIT

	* indexdb.c, indexdb.h: Exposing different types of filename retrieval

	* gregion.c: Considering fraction of overlap

	* samprint.c: Fixed CIGAR strings for splicing on minus strand with
	  clip-overlap

2013-09-13 twu

	* gmap.c: Increased default value for shortsplicedist from 200,000 to
	  2,000,000. For chimeric alignments that overlap, fixed bug where
	  maxpeelback was negative, and now computing based on midpoint of the ends.

	* stage3.h: In Stage3_mergeable, using shortsplicedist

	* stage3.c: No longer calling Smooth_pairs_by_netgap. In Stage3_mergeable,
	  using shortsplicedist. For extend5 and extend3, handling case where path
	  or pairs is NULL after peelback.

2013-09-12 twu

	* index.html: Updated for version 2013-09-11

2013-09-11 twu

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
	  numbers

	* stage1.c: Added debugging statement

	* gregion.c: For determining overlaps, using chrstart and chrend instead of
	  extentstart and extentend

	* pairpool.c, pairpool.h, stage3.c: In merging for local splice, added
	  procedure Pairpool_clean_join to peel back both ends to remove negative
	  genomejumps

	* stage3hr.c: Fixed memory leak on new pairs. Handling the case where pair
	  overlap is negative.

	* stage1hr.c: Fixed memory leak on terminal alignments that fail
	  terminal_output_minlength test

	* shortread.c: Not checking for Illumina endings on shortreads that are
	  skipped

	* dynprog.c: Initializing value of introntype

	* pair.c: Fixed memory leak under --clip-overlap feature

	* stage3.c: Requiring 25 bp on both sides of a chimeric alignment

2013-09-10 twu

	* sarray-read.c: Fixed bug in Elt_fill_positions_all where position could be
	  negative

2013-09-09 twu

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
	  number

	* stage3.c: Restored finding of microexons when sequence quality is high

	* pair.c: Making value of trim_5p dependent on trim_left

	* iit_get.c: Fixed behavior of iit_get to work when reading queries from
	  stdin

	* stage3.c: In build_pairs_introns, when finalp is true, solving cDNA gaps
	  as single gaps

	* stage3.c: When finalp is true, not relying upon stored solutions for any
	  traversal functions. Performing iterative calls to trimming non-canonical
	  introns at ends.

	* stage3.c: In traverse_single_gap, not relying upon previous result if
	  finalp is true

	* stage2.c: Checking to make sure calls to genome canonicalp procedures do
	  not have negative coordinates

	* gmap.c: Initializing genome_sites for pairalign and usersegment queries

	* samprint.c: Allowing printing of results where chrpos == 0

2013-09-06 twu

	* index.html: Updated for latest version

	* stage1hr.c: Fixed bug in add_bounded where coordinates extended past
	  chrhigh

	* dynprog.c: Fixed problem where prob_trunc was not being initialized

	* fa_coords.pl.in, gmap_build.pl.in: Added -n flag for substituting
	  chromosome names

2013-09-05 twu

	* gmap.c, gsnap.c: Allowed compilation when pthreads are disabled

2013-09-04 twu

	* stage1hr.c: Fixed a fatal bug in find_middle_indels when floors is NULL,
	  from all oligos being omitted

	* README: Added discussion about output types

	* stage3.c: Put an absolute limit on peelback. In build_pairs_introns, when
	  queryjump exceeds nullgap, traversing a dual break.

	* pair.c: Eliminated possible infinite loop in
	  Pair_guess_cdna_direction_array

	* gsnap.c: Fixed error message

2013-09-03 twu

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
	  number

	* outbuffer.c, stage3.c, stage3.h: Providing XO abbrev information for SAM
	  output from GMAP

	* outbuffer.c, pair.c, pair.h, samflags.h, samprint.c, samprint.h: Added XO
	  flag to SAM output

2013-08-31 twu

	* gsnap.c, stage3hr.c, stage3hr.h, uniqscan.c: Fixed genomicstart for
	  distant samechr splices when --merge-distant-samechr is specified

	* gregion.c: Fixed bug where overlaps were found in error because we used
	  chromosomal coordinates instead of universal coordinates

	* dynprog.c: Made single and paired indel penalties the same

	* README, configure.ac: Modified to include gvf_iit program

	* Makefile.am, gvf_iit.pl.in: Added gvf_iit program

2013-08-30 twu

	* gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, uniqscan.c:
	  Filtering terminals with minlength less than value of
	  --terminal-output-minlength

	* stage3hr.c, stage3hr.h: Creating Stage3pair_T for terminals only if the
	  terminal lengths exceed terminal_output_minlength

2013-08-29 twu

	* trunk, src, dynprog.c, dynprog.h, gmap.c, gsnap.c, pair.c, pair.h,
	  stage1hr.c, stage2.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h, util:
	  Did reintegration merge (revisions 106186 to 106267) of
	  branches/2013-08-20-stage23-work

	* trunk, src, util: Commit property changes

2013-08-28 twu

	* trunk, config.site.rescomp.tst, src, Makefile.gsnaptoo.am, diag.c, diag.h,
	  doublelist.c, dynprog.c, dynprog.h, genome_sites.c, genome_sites.h,
	  gmap.c, gsnap.c, pairpool.c, pairpool.h, smooth.c, smooth.h, stage1.c,
	  stage1hr.c, stage2.c, stage2.h, stage3.c, stage3.h, stage3hr.c,
	  uniqscan.c, util: Merged revisions 105272 to 106186 from
	  branches/2013-08-20-stage23-work to improve stage 2 and stage 3 procedures

2013-08-20 twu

	* trunk, src, dynprog.c, gmap.c, scores.h, util: Reintegrated changes from
	  branches/2013-08-20-stage23-work to fix alignment rankings and evaluation
	  of chimeric alignment

	* index.html: Updated for latest version

	* gmap_build.pl.in: Handling case where -d argument contains a directory

	* stage3.c: Fixed infinite loop in peelback. Made some changes in stage 3
	  procedures.

	* smooth.c, smooth.h: Restoring procedures for smoothing by net gap

	* pairpool.c: If all pairs are outside bounds, returning NULL

	* dynprog.c: Reduced indel penalties for low-quality alignments

2013-08-19 twu

	* outbuffer.c: Adding q() line for R output

2013-08-13 twu

	* stage3hr.c: Restoring filter where terminal alignments are removed if
	  mismatches in trimmed region are too high.

	* stage1hr.c: In computing genomic bounds for GMAP alignment, truncating at
	  chromosomal boundaries

	* chimera.c, chimera.h, gmap.c: For local chimeric joins, requiring that
	  pieces are locally joinable in Chimera_bestpath

2013-08-07 twu

	* dynprog.c: Added cache flush command needed for opencc compiler using -O3
	  optimization on AMD computers

2013-08-06 twu

	* genome_sites.c: Added lookup tables needed when popcnt is not available

	* genome_hr.c: Made debugging statements work when popcnt is not available

2013-08-02 twu

	* acinclude.m4, configure.ac: Removed popcnt.m4 and relying upon ACX_BUILTIN

	* Makefile.gsnaptoo.am, outbuffer.c, samheader.c, samheader.h, samprint.c,
	  samprint.h: Moved SAM header code to new samheader.c file

	* gmap.c, gsnap.c, outbuffer.c, outbuffer.h, samprint.c, samprint.h: Added
	  @HD and @PG header lines to SAM output

2013-07-25 twu

	* trunk, NOTICE, config.site.rescomp.prd, src, Makefile.gsnaptoo.am,
	  dynprog.c, fastlog.h, mapq.c, mapq.h, oligoindex.c, oligoindex_hr.c,
	  pair.c, pair.h, pairpool.c, smooth.c, smooth.h, stage1hr.c, stage2.c,
	  stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h,
	  univinterval.c, univinterval.h, util: Reintegrated revisions from
	  branches/2013-07-24-fastlog to use a fast approximate log function, to
	  change calloc to malloc in several places, to eliminate a smoothing step,
	  and to improve the oligoindex SIMD procedures slightly

2013-07-24 twu

	* trunk, src, Makefile.gsnaptoo.am, compress.c, compress.h, dynprog.c,
	  dynprog.h, genome_hr.c, gmap.c, pair.c, pairpool.c, pairpool.h,
	  stage1hr.c, stage2.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h, table.c,
	  table.h, univinterval.c, univinterval.h, util: Merged revisions 102329 to
	  102725 from branches/2012-07-22-stage3-dir to restructure stage 3
	  algorithm, including reduction of insert_gapholders and early
	  determination of cdna direction; to add gmap_history to memoize GMAP
	  alignment results; and SIMD instructions for Compress_shift

2013-07-23 twu

	* stage3.c: Changing Pairpool_pop to List_transfer_one in some procedures

2013-07-22 twu

	* trunk, config.site.rescomp.prd, config.site.rescomp.tst, src, dynprog.c,
	  dynprog.h, stage3.c, util: Reintegrated revisions from
	  branches/2013-07-22-stage3-dir to determine cdna direction early in stage 3

	* VERSION, index.html: Updated version number

	* dynprog.c: Added memory fences at beginning of each SIMD loop

	* stage2.c: Returned to safer code that keeps ranges 3 and 4 separate

	* dynprog.c: Added needed variables

	* dynprog.c: Making a single call to find splice sites from IIT file,
	  instead of individual calls. Removed old code.

	* iit-read.c, iit-read.h: Added functions IIT_get_lows_signed and
	  IIT_get_highs_signed

	* dynprog.c: In bridge_intron_gap procedures, using genomic sequence instead
	  of get_genomic_nt. Also, implemented code for a single call to
	  splicesites IIT, instead of one call for each position.

2013-07-19 twu

	* gmap_setup.pl.in: Added backslash escape that was missing

	* gsnap.c: Fixed help string

	* sarray-read.c: Handling N's in query sequence

	* types.h, compress-write.h: Added comment

	* iit-write-univ.c: Fixed message

	* types.h: Prefer to use unsigned long long for UINT8

	* stage3hr.c: Changed debugging statement

	* stage1hr.c: Changed single-end statement for terminals to look like
	  paired-end statement

	* gsnap.c: Turned default for --terminal-threshold back to 2 for DNA-Seq
	  alignments, since terminal alignments are needed for some GMAP alignments

	* bitpack64-write.c: Using _mm_store_si128 instead of type casting and a
	  memory fence

	* iit-write-univ.c: Added message about coordinate sizes

	* iit-write.c: Including types.h just to make sure we have it

	* stage1hr.c: Keeping lowprob splices, unless dominated by other splices

	* gsnap.c, uniqscan.c: Restored --pairexpect and --pairdev flags. Giving
	  information to Stage3hr_setup.

	* stage3hr.h, stage3hr.c: Using insertlength, outerlength, and splice
	  probabilities to resolve difficult cases

	* dynprog.c: Added memory fences between SIMD and non-SIMD code

2013-07-18 twu

	* bitpack64-write.c: Added comment

	* bitpack64-write.c: Added _mm_lfence to take care of incorrect SIMD
	  behavior.

	* sarray-write.c: Fixed procedure for checking bitpack compression

	* indexdb.c: Fixed procedure for expanding bitpack

	* indexdb-write.c: Added procedure for checking bitpack compression

	* indexdb.c: Added extra information to message when expanding offsets

	* bitpack64-access.c: Added extra information to error message

	* bitpack64-read.c: Fixed Bitpack64_block_offsets function

2013-07-17 twu

	* trunk, config.site.rescomp.tst, src, dynprog.c, util: Reintegrated
	  revisions from branches/2013-07-16-faster-stage2

	* stage2.c: Reintegrated revisions from branches/2013-07-16-faster-stage2 to
	  speed up stage2 procedure

	* trunk, src, dynprog.c, goby.c, goby.h, inbuffer.c, shortread.c,
	  shortread.h, uniqscan.c, util: Reintegrated changes from
	  branches/2013-07-17-reduce-memset to not allocate memory for short reads
	  that are skipped

	* VERSION, index.html: Updated version number

	* dynprog.c: Preventing read of uninitialized variable at matrix position
	  0,0 during traceback

	* dynprog.c: Not initializing directions_Egap or directions_nogap

	* stage1.c: In find_first_pair, restored decision of which end to advance
	  based on number of hits

2013-07-16 twu

	* gregion.c: Turned off debugging

	* stage1.c: Allowing arbitrarily long scanning of ends to find first pair

	* sequence.c: Modified definition of Sequence_trimlength

	* gregion.c: Handling case where extentstart goes past genomic position 0

	* gmap.c: Handling case where genomebits is absent

	* genome.c: Handling case of negative genomic coordinates

	* diag.c, oligoindex_hr.c, stage3.c: Fixed ends to capture last oligomers of
	  sequence and chimeras exactly

	* stage2.c: Added comment

	* bitpack64-access.c: Fixed size of accessor table

	* dynprog.c: For non-SSE4.1 8-bit SIMD, using a separate pairscore array
	  incremented by 128.

2013-07-15 twu

	* dynprog.c: Turned off debugging

	* trunk, src, dynprog.c, util: Reintegrated revisions from
	  branches/2013-07-15-sse2-simd8 allowing SSE computers to run 8-bit SIMD
	  dynamic programming procedures

	* VERSION, config.site.rescomp.prd, index.html: Updated version number

	* dynprog.c: Before calling Boyer-Moore procedure, requiring that textlen >=
	  querylen

	* dynprog.c: Not calling Boyer-Moore procedure if textright <= textleft

	* dynprog.c: Fixed variable name for SSE2 code

	* gsnap.c: Fixed --help comment for --mode flag

	* gmap.c: Including a header file for compress-write.h

	* compress-write.c: Fixed bug in writing genomebits for user-provided
	  segment in GMAP

	* genomicpos.c, genomicpos.h: Fixed formatting routine to work on large files

	* trunk, Makefile.am, src, Makefile.gsnaptoo.am, bitpack64-access.c,
	  bitpack64-access.h, bitpack64-write.c, bitpack64-write.h, gmapindex.c,
	  gsnap.c, indexdb-write.c, sarray-read.c, sarray-write.c, sarray-write.h,
	  util, gmap_build.pl.in, gmap_setup.pl.in: Reintegrated revisions from
	  branches/2013-07-11-compress-lcp to compress lcp file

	* stage1hr.c: Disallowing diagonals < querylength, which lead to left < 0,
	  again

	* boyer-moore.c, dynprog.c: Using new Genome_get_segment_blocks_left and
	  Genome_get_segment_blocks_right procedures

	* genome.c, genome.h: Added separate Genome_get_segment_blocks_left and
	  Genome_get_segment_blocks_right

2013-07-14 twu

	* stage1hr.c: In batch_init and identify_all_segments, allowing diagonal to
	  be less than querylength, needed for insertions in reads at the beginning
	  of the genome.
8758 8759 * genome.c: Fixed memory leak when genomebits file is not available 8760 8761 * Makefile.am, NOTICE: Added NOTICE file for distribution 8762 8763 * Makefile.gsnaptoo.am: Using new saca-k.c and saca-k.h files 8764 8765 * compress-write.c, genome.c: Fixed bug where coordinates above 2^31 were 8766 being treated as negative values 8767 8768 * iit_store.c: For IITs with divs (versions above 1), converting from 8769 Univinterval_T objects to Interval_T objects 8770 8771 * saca-k.c, saca-k.h, sarray-write.c: Moved suffix array construction 8772 procedures to a separate file. Using latest version of SACA-K code. 8773 8774 * gmap_build.pl.in: Added command for making suffix array 8775 87762013-07-11 twu 8777 8778 * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src, 8779 Makefile.gsnaptoo.am, compress-write.c, compress-write.h, compress.c, 8780 compress.h, compress128.c, compress128.h, genome-write.c, genome-write.h, 8781 genome.c, genome.h, genome128-write.c, genome128-write.h, genome128.c, 8782 genome128.h, genome128_hr.c, genome128_hr.h, genome_hr.c, genome_hr.h, 8783 genome_sites.c, genome_sites.h, get-genome.c, gmap.c, gmapindex.c, 8784 gsnap.c, indexdb-write.c, maxent128_hr.c, maxent128_hr.h, snpindex.c, 8785 splice.c, splicetrie.c, splicetrie_build.c, stage1hr.c, stage2.c, 8786 uniqscan.c, util, gmap_build.pl.in: Merged changes from 8787 branches/2013-07-05-new-genomecomp to implement 32-bit unshuffled 8788 representation of genome 8789 8790 * gmap_setup.pl.in: Added offsets, bitpackptrs, and bitpackcomp suffixes 8791 8792 * gff3_splicesites.pl.in: Removed debugging statement 8793 8794 * boyer-moore.c, dynprog.c: Using Genome_get_segment_blocks to avoid making 8795 repeated calls to Genome_get_char_blocks 8796 8797 * genome.c, genome.h: Implemented Genome_get_segment_blocks 8798 8799 * sarray-read.c: Fixed bug in known splicing, where index j was not being 8800 incremented 8801 8802 * samprint.c: Fixed hardclipping ends for circular 
	  alignments

2013-07-10 twu

	* stage1hr.c: Fixed debugging statement

2013-07-09 twu

	* gff3_introns.pl.in: Fixed warning statements

2013-07-05 twu

	* setup1.test.in: Fixed test to use gmap_build, rather than gmap_setup

	* oligoindex_hr.c: Handling special cases now within general case. Not
	  allowing count/store rev SIMD procedures to have ptr go below 0. Fixed
	  debugging comparison of std and SIMD results.

	* gmap.c, stage2.c: Using new interfaces to Oligoindex_untally and
	  Oligoindex_clear_inquery

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
	  number

	* sarray-read.c, sarray-write.c: Changed lcp file suffix to be .salcp.
	  Checking to see if suffix array files are present, and returning NULL
	  gracefully if not.

	* oligoindex.c, oligoindex.h: Clearing Oligoindex_T data structures after
	  alignment by going through oligomers in the query, rather than using
	  memset on the entire structure.

	* uniqscan.c: Using new interface to Stage1hr_setup

	* gsnap.c: Added --use-sarray flag

	* stage1hr.c, stage1hr.h: Added parameter use_sarray_p to setup procedure.
	  Improved warning message for very short reads.

	* gff3_genes.pl.in: Printing genes with a single exon

	* gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Allowing
	  type CDS to be equivalent to exon

2013-07-04 twu

	* oligoindex_hr.c: Removed global memcpy of oligospace in
	  allocate_positions, and copying individual pointers at the same time as
	  positions are assigned.

	* gsnap.c, uniqscan.c: Using new interfaces to Stage1hr procedures

	* stage1hr.c, stage1hr.h: Generalized mergeable list to be used for middle
	  indels as well as splicing

	* oligoindex_hr.c: Fixed values for startdiscard and enddiscard in special
	  case of count_fwdrev_simd and store_fwdrev_simd when startptr + 3 ==
	  endptr.

	* sarray-read.c: Implemented SIMD procedure for scanning array in
	  Elt_fill_positions_filtered

2013-07-03 twu

	* gsnap.c: For a single read, changing sarray access to be USE_MMAP_ONLY

	* stage1hr.c: Removed requirement of nconcordant == 0 from performing GMAP
	  terminal alignments. The concordant pairs can sometimes include pairs of
	  terminal alignments.

	* sarray-read.c: In calls to Genome_count_mismatches_limit, changed
	  nmismatches_allowed parameter from incorrect value of 2 to nmisses_allowed.

	* sarray-read.c: Increased value of SARRAY_EXCESS_HITS from 1000 to 100,000.
	  Providing separate actions for USE_MMAP_PRELOAD and USE_MMAP_ONLY in
	  Sarray_new.

	* trunk, src, Makefile.gsnaptoo.am, access.c, genome.c, genome.h,
	  genome_hr.c, genome_hr.h, gmapindex.c, gsnap.c, pair.c, pair.h,
	  samprint.c, sarray-read.c, sarray-read.h, sarray-write.c, sarray-write.h,
	  spanningelt.c, splice.c, splice.h, stage1hr.c, stage3hr.c, stage3hr.h,
	  types.h, util: Merged revisions 100273 to 100402 from
	  branches/2013-07-02-suffix-array-redo to implement suffix array algorithm

2013-07-02 twu

	* gsnap.c: Added comment

	* gmap_build.pl.in: When compression type is specified to be none, setting
	  base size to be equal to the k-mer size.

	* gmapindex.c: Fixed bug from failing to initialize compression_types.
	  Changing to NO_COMPRESSION when base size is equal to k-mer size.

	* indexdb-write.c, indexdb-write.h: When compression_type is NO_COMPRESSION,
	  writing file as "offsets", rather than "offsetscomp".

	* indexdb.c, indexdb_hr.c: Checking for NO_COMPRESSION case first

	* oligoindex_hr.c: Fixed case for SIMD where startptr and endptr are 3 units
	  apart (adjacent blocks).

	* shortread.c: Ignoring accession and header for second queryseq in
	  paired-end FASTA format over two files

	* indexdb-write.c: Computing basesize separately for bitpack and gamma
	  compression

	* atoiindex.c, cmetindex.c, gmapindex.c, indexdb.c, indexdb.h, snpindex.c:
	  Added offsets_only_p argument to Indexdb_get_filenames

	* gmap_build.pl.in: Added -z flag for specifying compression types

	* trunk, config.site.rescomp.prd, src, util: Merged changes from
	  branches/2013-07-01-faster-splicing to speed up find_singlesplices and
	  find_doublesplices

	* gsnap.c, uniqscan.c: Using new interface to Stage1hr procedures

	* stage1hr.c, stage1hr.h: Merged changes from
	  branches/2013-07-01-faster-splicing to use a spliceable array. Removed
	  code for quicksort version of identify_all_segments.

	* genome_hr.c: Added comment about unshuffle procedure

	* VERSION, config.site.rescomp.tst: Updated version number

	* gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Handling
	  GFF3 files that lack gene lines

	* indexdb.c: Changed stderr message when allocating memory for output
	  pointers

	* bitpack64-read.c: Using a procedure dispatch table instead of a switch
	  statement

2013-07-01 twu

	* atoiindex.c, cmetindex.c, indexdb-write.c, indexdb-write.h, snpindex.c:
	  Added support for bitpack compression

	* gmapindex.c, indexdb.c, indexdb.h, indexdb-write.c, indexdb-write.h: Added
	  ability to write positions from offsets compressed with bitpacking method

	* bitpack64-read.c: Implemented Bitpack64_block_offsets to compute all
	  offsets

2013-06-30 twu

	* bitpack64-read.c: Gave distinct variable names for debugging statements

	* indexdb-write.c, bitpack64-write.c: Added code so bitpack offsets file can
	  be written without SIMD instructions.

	* bitpack16-read.c, bitpack32-read.c: Fixed debugging code

	* bitpack64-read.c: Made some speed improvements in non-SIMD code by only
	  adding as many terms as needed.

	* bitpack64-read.c: Added code for decoding of bitpack files without SSE2
	  present

	* Makefile.gsnaptoo.am, bitpack64-read.c, bitpack64-read.h,
	  bitpack64-write.c, bitpack64-write.h, gmapindex.c, indexdb-write.c,
	  indexdb.c, indexdb_hr.c, indexdbdef.h: Changed back to 64-element blocks,
	  with only two pieces of meta-information, pointer and offset0, and
	  packsize inferred from successive pointers.

	* bitpack32-read.c, bitpack32-read.h: Performing cumulative sums within each
	  unpack procedure

2013-06-29 twu

	* bitpack32-read.c, bitpack32-read.h, bitpack32-write.c, bitpack32-write.h,
	  indexdb-write.c, indexdb.c, indexdb_hr.c, indexdbdef.h, gmapindex.c,
	  Makefile.gsnaptoo.am: Changed from 64-element blocks to 32-element
	  half-blocks

	* indexdb.c, indexdb.h: Added Indexdb_get_filenames procedure that returns a
	  Filenames_T object. Put definition of Filenames_T into header file.

	* genome_hr.c, genome_hr.h: Using Offsetscomp_T type

	* bitpack16-write.c: Changed write procedure to take a pointer as argument,
	  to avoid compiler warnings

	* bitpack16-read.h: Added procedure to compute all offsets

	* gmapindex.c, indexdb-write.h: Determining selection of filenames based on
	  compression type for offsets.

	* bitpack16-read.c, indexdb-write.c: Changed format to put all metablock
	  information, including offset and packsize, into the pointers file

2013-06-28 twu

	* indexdbdef.h: Added values for compression types, and compression type
	  field in Indexdb_T object.

	* indexdb_hr.c: Calling bitpack or gamma compression at run time, as needed

	* indexdb.c, indexdb.h: Searching for bitpack format, then gamma format, and
	  then no compression. Created Filenames_T object to standardize routines.

	* indexdb-write.c, indexdb-write.h: Added compression_types as a parameter,
	  so multiple formats can be written

	* bitpack16-read.c, bitpack16-read.h: Moved data structures to static
	  variables, so they do not need to be passed as arguments each time.

	* bitpack16-read.c, bitpack16-read.h: Using Blocksize_T type

	* indexdb-write.c, indexdb-write.h: Restored lost code from large genome
	  revisions

	* Makefile.gsnaptoo.am: Added bitpack files for large genome programs

	* trunk, config.site.rescomp.prd, src, Makefile.gsnaptoo.am,
	  bitpack16-read.c, bitpack16-read.h, bitpack16-write.c, bitpack16-write.h,
	  indexdb-write.c, indexdb-write.h, indexdb.c, indexdb_hr.c, util:
	  Reintegrated revisions from branches/2013-06-14-bitpacking to add bitpack
	  compression code

	* trunk, index.html, src, util: Added link to large genomes version

	* oligoindex_hr.c: Merged revisions 99785 to 99781 from
	  branches/2013-06-27-simd-oligo to add SIMD code for counting and storing
	  oligomers

2013-06-27 twu

	* types.h: Added comment

	* stage1.c, gsnap.c, gmap.c: Using Width_T type

	* spanningelt.c, spanningelt.h: Using Width_T types

	* pair.c, pair.h: Changed types for binary search to be Chrpos_T

	* indexdbdef.h, indexdb.c, indexdb.h: Using Width_T and Blocksize_T types

	* indexdb-write.c: Resolved compiler warnings about signed/unsigned
	  comparisons

	* genome_hr.c, genome_hr.h: Using Blocksize_T type for offsetscomp_blocksize

	* block.c, block.h: Using Width_T type for oligosize

	* types.h: Added Width_T and Blocksize_T types

	* trunk, README, configure.ac, src, Makefile.gsnaptoo.am, access.c,
	  alphabet.c, alphabet.h, atoi.c, atoiindex.c, bigendian.c, bigendian.h,
	  block.c, block.h, boyer-moore.c, boyer-moore.h, chimera.c, chimera.h,
	  chrnum.c, chrnum.h, chrom.c, chrom.h, chrsubset.c, chrsubset.h, cmet.c,
	  cmetindex.c, compress.c, compress.h, diag.c, diag.h, diagdef.h, dynprog.c,
	  dynprog.h, gdiag.c, genome-write.c, genome-write.h, genome.c, genome.h,
	  genome_hr.c, genome_hr.h, genomicpos.c, genomicpos.h, genuncompress.c,
	  get-genome.c, gmap.c, gmapindex.c, goby.c, goby.h, gregion.c, gregion.h,
	  gsnap.c, gsnap_tally.c, iit-read-univ.c, iit-read-univ.h, iit-read.c,
	  iit-read.h, iit-write-univ.c, iit-write-univ.h, iit-write.c, iit_dump.c,
	  iit_get.c, iit_store.c, iitdef.h, indexdb-write.c, indexdb-write.h,
	  indexdb.c, indexdb.h, indexdb_hr.c, indexdb_hr.h, indexdbdef.h,
	  interval.c, interval.h, intron.c, intron.h, littleendian.c,
	  littleendian.h, mapq.c, mapq.h, match.c, match.h, matchdef.h, matchpool.c,
	  matchpool.h, maxent_hr.c, maxent_hr.h, oligo.c, oligo.h, oligoindex.c,
	  oligoindex.h, oligoindex_hr.c, oligoindex_hr.h, oligop.c, oligop.h,
	  outbuffer.c, outbuffer.h, pair.c, pair.h, pairdef.h, parserange.c,
	  parserange.h, samprint.c, samprint.h, segmentpos.c, segmentpos.h,
	  snpindex.c, spanningelt.c, spanningelt.h, splicetrie.c, splicetrie.h,
	  splicetrie_build.c, splicetrie_build.h, stage1.c, stage1.h, stage1hr.c,
	  stage1hr.h, stage2.c, stage2.h, stage3.c, stage3.h, stage3hr.c,
	  stage3hr.h, substring.c, substring.h, tableint.c, tableuint.c,
	  tableuint.h, tableuint8.c, tableuint8.h, types.h, uint8list.c,
	  uint8list.h, uintlist.c, uintlist.h, uniqscan.c, univinterval.c,
	  univinterval.h, iittest.iit.ok, util: Reintegrated changes from
	  branches/2012-02-14-biggenomes to handle large genomes

	* index.html: Updated comments for latest version

	* trunk, src, dynprog.c, oligoindex.c, oligoindex.h, oligoindex_hr.c,
	  stage2.c, util: Merged revisions 99648 to 99702 from
	  branches/2013-06-25-simd-8 to use unsigned chars for counts, and 8-bit
	  SIMD instructions for allocate_positions

	* stage3.c: Removed assertion that c != g when filling in a single bp gap,
	  which can occur with overabundant oligomers, such as poly-A

2013-06-26 twu

	* VERSION, index.html: Updated version number

	* trunk, ax_ext.m4, config.site, config.site.rescomp.tst, configure.ac, src,
	  boyer-moore.c, dynprog.c, gmap.c, gsnap.c, stage3hr.c, util: Merged
	  revisions 99462 to 99647 from branches/2013-06-25-simd-8 to allow for
	  dynamic programming with 8-bit chars when SSE4.1 is available

	* stage3.c: Allowing insert_gapholders to fill in single mismatches, rather
	  than inserting a gap. Fixed peel_rightward and peel_leftward, so the
	  extrapeel step does not remove gaps.

	* stage3hr.c: Fixed issues with wrong queryseq sequence for second end being
	  printed in standard GSNAP output

	* dynprog.c: Further check to make sure traceback_local does not give rise
	  to negative querypos coordinates.

2013-06-25 twu

	* dynprog.c: Distinguishing between NEG_INFINITY and NEG_INFINITY_DISPLAY

	* dynprog.c: Made traceback_local code follow that for traceback

	* VERSION, index.html: Updated version number

	* dynprog.c: Fixed bug in traceback_local where r is negative

2013-06-18 twu

	* VERSION, index.html: Updated version number

	* stage3.c: Using new interface to Pair_print_sam, which requires two
	  accessions

	* gsnap.c, outbuffer.c, pair.c, pair.h, samprint.c, samprint.h, shortread.c,
	  shortread.h, stage3hr.c: Added flag --allow-pe-name-mismatch

	* oligoindex_hr.c: Fixed faulty cmpgt statement based on 16-bit quantities,
	  instead of 32-bit quantities

2013-06-14 twu

	* dynprog.c: Removed unnecessary initialization for SIMD code

	* trunk, util, src, gsnap.c, iit-read.c, iit-read.h, stage1hr.c, stage1hr.h:
	  Merged revisions 98458 to 98523 from branches/2013-06-13-sort-not-merge to
	  speed up update of chromosome bounds in identify_all_segments

	* VERSION: Removed newline from file

2013-06-13 twu

	* VERSION, config.site.rescomp.tst, index.html: Updated version number

	* trunk, src, gmap.c, gsnap.c, indexdb.c, stage1hr.c, util: Merged revisions
	  98429 to 98457 from branches/2013-06-13-sort-not-merge to add code for
	  qsort, although code not used

	* config.site.rescomp.prd: Removed -g flag from production CFLAGS

	* trunk, config.site.rescomp.prd, index.html, src, dynprog.c, util: Merged
	  revisions 97749 to 98423 from branches/2013-06-05-dynprog-sse to use SIMD
	  instructions for dynamic programming procedures

2013-06-12 twu

	* dynprog.c: Fixed an unassigned value for rlo. Fixed code in
	  make_splicejunction_3 where the standard splicejunction was reverse
	  complemented, but not splicejunction_alt.

2013-06-10 twu

	* oligoindex_hr.c: Using compile-time constants to clarify code

	* gmap.c, gsnap.c: Added run-time check of compiler assumptions

2013-06-09 twu

	* dynprog.c: Fixes to debugging code. Slightly more efficient
	  initialization.

2013-06-08 twu

	* oligoindex_hr.c: Computing skips of empty counts to increase speed of
	  allocate_positions

2013-06-07 twu

	* oligoindex_hr.c: Using a convert instruction instead of a store
	  instruction to compute the final sum in allocate_positions.

	* oligoindex_hr.c: In allocate_positions, using SIMD commands to check when
	  positions need to be computed

	* gsnap.c: Turning off terminal alignments by default for DNA-Seq alignment.
	  Added "all" option for --gmap-mode.

	* trunk, config.site.rescomp.prd, config.site.rescomp.tst, src, dynprog.c,
	  pair.c, Makefile.am, util: Merged revisions 97750 to 97963 from
	  branches/2013-06-05-dynprog-sse for a faster and better implementation of
	  dynamic programming procedures

	* indexdb.c: Removed forced writing of positions by chunks for debugging

	* Makefile.dna.am, Makefile.gsnaptoo.am: Removed PTHREAD_CFLAGS from LDFLAGS
	  and moved SIMD_FLAGS to CFLAGS

	* config.site: Added sections for bzlib, simd, and popcnt

	* gmap.c: Moved free of usersegment to end, after we decide whether we need
	  to free genome_blocks

	* configure.ac: Moved check for SIMD to be close to that for check for popcnt

	* VERSION: Updated version number

	* configure.ac: Added flags --enable-simd and --disable-simd to control
	  check for SIMD features

	* indexdb.c: Added a second attempt to write positions file in chunks if the
	  first write fails. Added a sanity check in reading in a genomic index
	  that the positions file has the expected size.

2013-06-05 twu

	* configure.ac: Fixed comment

	* ax_ext.m4: Changed comment lines for config.h to be standard definitions
	  of 1

	* Makefile.dna.am, Makefile.gsnaptoo.am: Added SIMD_FLAGS

	* gmap.c, gsnap.c: Added available SIMD functions to --version output

	* indexdb.c: Added check that positions file has the expected size

	* oligoindex.c, oligoindex.h, oligoindex_hr.c: Using SIMD functions for
	  allocate_positions

	* acinclude.m4, configure.ac: Added check for SIMD support

	* ax_ext.m4: Fixed spelling errors

	* ax_check_compile_flag.m4, ax_gcc_x86_cpuid.m4, ax_ext.m4: Initial import
	  into SVN

	* dynprog.c: Created separate source code for jump late and jump early
	  conditions

2013-05-28 twu

	* pair.c: Fixed bug in final revcomp of sequence, where genomealt was not
	  being complemented

2013-05-22 twu

	* archive.html: Storing version 2013-03-31 into archive

	* index.html: Updated for version 2013-05-09

	* iit_get.c: No longer printing total, when getting queries from stdin

	* parserange.c: Handling a case where interpreting query as a contig, and
	  the result is NULL

	* pair.c: Fixed computation of coverage for GFF output

2013-05-17 michafla

	* Makefile.am, bootstrap.dna, bootstrap.gmaponly, bootstrap.gsnaptoo,
	  bootstrap.pmaptoo, bootstrap.three, configure.ac: Port from release: use
	  only 'config' for m4 macros

2013-05-09 twu

	* dynprog.c, dynprog.h, splicetrie.c: In making splice junctions, checking
	  for junctions that go to the left of genomic position 0.

2013-05-08 twu

	* samprint.c: Fixed SAM output for translocations, affected by changes to
	  hardclipping

2013-05-07 twu

	* stage2.c: In find_shifted_canonical, checking for leftpos and rightpos
	  that exceed chrhigh

2013-05-06 twu

	* uniqscan.c: Using new interface to Stage1_single_read

	* gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Passing
	  min_distantsplicing_end_matches and min_distantsplicing_identity to
	  Stage1hr_setup. Made GSNAP more sensitive to shorter reads, such as 50 bp.

	* stage3.c: Removed line that is not reached.

2013-05-03 twu

	* gmap.c: Added comment in --help output about chimeric alignments being an
	  exception to the filtering options

	* stage3.c, stage3.h: Added functions Stage3_recompute_coverage and
	  Stage3_passes_filter

	* gmap.c: Added options --min-trimmed-coverage and --min-identity

	* stage2.c: Further check to make sure splice site does not occur in first 3
	  bp of segment in find_shifted_canonical

2013-05-02 twu

	* stage2.c: In find_shifted_canonical, preventing discovery of splice sites
	  in first 3 bp of segment

	* index.html: Made changes for version 2013-03-31 (v5)

	* gmap.c: Checking value of Stage3_merge_chimera before creating chimera and
	  running merge_left_and_right_transloc.

	* stage3.c, stage3.h: Stage3_merge_chimera checks for pairs on left and
	  right being non-NULL, and returns a bool

	* pairpool.c: Fixed Pairpool_clip_bounded so it handles the case where the
	  list is NULL.

2013-04-30 twu

	* get-genome.c: Added --exact flag

	* stage3hr.c: Removed exclusions in Stage3pair_overlap that prevented tails
	  from being hard clipped.

	* shortread.c: Fixed hard clipping of reads and quality strings to match new
	  definitions of hardclip_low and hardclip_high

2013-04-26 twu

	* iit_get.c: Added --exact option

	* chimera.c: Adding range of chimeric overlap in XT SAM field

	* outbuffer.c: Using new interface to SAM_print

	* stage3hr.h, stage3hr.c: Revised Stage3pair_overlap to return hardclip5,
	  hardclip3, and clipdir.

	* samprint.h: Added clipdir as a parameter to SAM_print

	* samprint.c: Handling clipping of overlaps when the low end of the first
	  read and the high end of the second read should be clipped. This is
	  indicated by clipdir of -1, and occurs when the insert length is so short
	  that the two reads have passed each other.

2013-04-12 twu

	* pair.c, pair.h, samprint.c: Providing effective_chrnum of mate to
	  Pair_print_sam, in case GMAP alignment has a translocation for a mate

2013-04-11 twu

	* gmap.c: Made changes for PMAP to compile

2013-04-10 twu

	* gmap.c: Fixed memory issues in merging middle pieces

2013-04-09 twu

	* Makefile.dna.am: Fixed name of file

2013-04-05 twu

	* stage1hr.c, stage3hr.c, stage3hr.h: When terminal_threshold is set to a
	  high value, using trim_left_raw and trim_right_raw to exclude GMAP hits.

	* samprint.c: Reverting to previous version, where chrpos of 0 indicates
	  nomapping

	* oligoindex_hr.c: Fixed an issue where minus oligomers were extending past
	  the beginning

	* stage3.c: Removed the unused variables nintrons, nnonintrons, intronlen,
	  and nonintronlen.

2013-04-04 twu

	* gmap.c, outbuffer.c, stage3.h: Changed name from map_genes to map_ranges,
	  to avoid confusion

	* iit_store.c: Handling empty files gracefully

	* iit-write.c: If no intervals are found, then returning gracefully instead
	  of exiting

	* outbuffer.c: If allow_chimeras_p is false and chimera is present, then
	  effective_maxpaths is 0.

	* gmap.c: Added flag --no-chimeras

	* stage3hr.c: Added debugging statement

	* samprint.c: Using NULL hit instead of zero chrpos to indicate lack of
	  mapping

2013-04-03 twu

	* dynprog.c, stage3.c: Fixed uninitialized variable for g_alt in
	  get_genomic_nt

	* stage3hr.c, substring.c, substring.h: Updating nmismatches_bothdiff also

2013-04-02 twu

	* stage3hr.c: Handling the new pairtype UNSPECIFIED.

	* stage1hr.c: Allowing align_pair_with_gmap to change final_pairtype. Calls
	  Stage3pair_new with pairtype UNSPECIFIED.

	* resulthr.c, resulthr.h: Added function Pairtype_string

2013-04-01 twu

	* VERSION, config.site.rescomp.tst: Updated version number

	* stage3.c: Added debugging statements

	* splicetrie.c: Using computed miss_score. Both miss_score and
	  threshold_miss_score are negative.

	* smooth.c, smooth.h: Added code for marking short exons, but not used

	* pair.c, pair.h: Removed unused procedures

	* dynprog.h, dynprog.c: Computing miss_score. Both miss_score and
	  threshold_miss_score are negative.

2013-03-30 twu

	* stage3.c: Fixed bug in handling list for middle exon in
	  build_pairs_dualintrons. In traverse_single_gap, when forcep == false, not
	  adding pairs if finalscore < 0.

2013-03-28 twu

	* stage3.c: Skipping pass 9 (distalmedial) and relying instead on
	  trim_noncanonical_exons.
	  In pass 8, for extend_endings, not removing
	  indel gaps, and setting quit_on_gap_p true, to preserve indels at ends.
	  Not setting ambig_end_lengths in trim_noncanonical_exons, reserving it
	  instead for trim_novel_spliceends. Performing final extension of ends at
	  end of path_compute, and not at beginning of path_trim.

2013-03-27 twu

	* stage1hr.c: Allowing redo of GMAP pairs based on inconsistent senses.

	* gmap.c, gsnap.c: Increased maxpeelback from 11 to 20

	* stage3.c: Added quit_on_gap_p parameter to peel_rightward and
	  peel_leftward. This allows smoothing procedures after traverse_single_gap
	  to merge gaps, and dynamic programming traversal of introns to add indels.
	  Calling remove_indel_gaps before dynamic programming solutions of introns.
	  Added computation of max_intron_score, and using it in
	  pick_cdna_direction to determine if sense is NULL.

	* Makefile.dna.am: Added commands for splicing-score program

	* stage3hr.c: Handling case in Stage3pair_new where expect_concordant_p is
	  false for a GMAP alignment

	* stage1hr.h, gsnap.c, uniqscan.c: Added distances_observed_p to
	  Stage1hr_setup

	* splicetrie_build.c: When observed distance is greater than
	  localsplicedist, storing observed distance

	* stage1hr.c: In find_splicepairs_distant, if splice distance is within the
	  known maximum distance, then treating as a local splice, rather than a
	  distant splice.

	* substring.c: Added code for Substring_intragenic_splice_p, but using
	  version in stage1hr.c instead

	* stage3hr.c: In Stage3_determine_pairtype, using effective_chrnum instead
	  of chrnum, so new pairs with PAIRED_UNSPECIFIED from GMAP runs work.

	* stage3hr.h: Added function Stage3pair_sense_consistent_p.

	* stage3hr.c: Changed ambig_end_interval, used in penalty, to 8.
	  In Stage3end_pick_cdna_direction, returning a non-zero cdna_direction.
	  Added function Stage3pair_sense_consistentp.

	* stage3.c: Allowing ambig end only if medial prob > 0.95. Moved
	  trim_novel_spliceends earlier, so it is effective. In
	  pick_cdna_direction, making a final use of alignment score. Checking for
	  divide by zero in computing defect_rate.

	* stage1hr.c: Using matches over entire read to determine whether to perform
	  GMAP. Performing halfmapping of GMAP against terminals only if nconcordant
	  is 0. Added a redo step on align_pair_with_gmap if the senses are
	  inconsistent.

	* inbuffer.c, shortread.c, shortread.h: Allowing FASTA input files to be
	  paired

2013-03-26 twu

	* stage1hr.c: For GMAP alignments with long trimmed ends, comparing
	  terminal_threshold against user_maxlevel in deciding whether to drop them.

2013-03-25 twu

	* stage3hr.c, stage1hr.c: Allowing GMAP improvement to be run on paired
	  results

	* samprint.c: Adding PG:Z:T output for terminal alignments

	* stage3hr.c: Switching back to old GMAP filter, where nmatches_posttrim
	  must exceed querylength/2.

	* stage1hr.c: Applying terminal_threshold test to GMAP alignments

2013-03-21 twu

	* dynprog.c: Allowing only two mismatches in a distant splice

	* gmap.c, gsnap.c: Made results of --version consistent

	* stage1hr.c: Applying match length criterion to terminal GMAP alignments

	* stage1hr.c: Incrementing nconcordant only for high-quality GMAP pairsearch
	  results, where nmatches is high enough, instead of using GMAP score.

2013-03-14 twu

	* splicetrie_build.c: Fixed type

	* gsnap.c, shortread.c, shortread.h: Implemented --force-single-end flag

	* index.html: Added comment about --force-single-end flag

	* README: Added description of stranded and nonstranded cmet and atoi modes.

	* README: Added information about --force-single-end flag and multiple FASTQ
	  files on the command line.

	* configure.ac: Removed --with-samtools flag

2013-03-13 twu

	* VERSION, config.site.rescomp.tst, index.html: Updated version number

	* Makefile.dna.am, gmap.c, oligoindex_hr.c, stage3.c, stage3.h: Made changes
	  so PMAP would compile

	* index.html: Made changes for release 2013-03-05

	* splicetrie_build.c: Added code for get_exons, which may be needed for
	  getting splice sites from a genes IIT file

	* genome-write.h: Changed UINT4 to Genomecomp_T

	* types.h: Added Genomecomp_T

	* stage3.c: Added minimum length for running stage2 in a dual break

	* stage2.c: Improved debugging statements

2013-03-12 twu

	* genome_hr.c: Returning correct reverse chrpos for
	  prev_dinucleotide_position_rev. Printing universal coordinates on blocks
	  for debugging. Improved debugging statements for finding dinucleotides.

	* stage3.c: Fixed pass 7, so path is expected in loop, and then is converted
	  to pairs at the end.

	* pair.c: In Pair_start_bound and Pair_end_bound, skipping pairs with
	  querypos < 0, which represent gaps.

2013-03-06 twu

	* indexdb.c: Removed code for writing word-by-word

	* gsnap.c: Fixed bug where --adapter-strip=off had no effect.

2013-03-05 twu

	* fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Allowing FASTA file
	  names and gzipped file names to contain spaces

	* gsnap.c: Added entry for distant-splice-identity

	* VERSION, config.site.rescomp.tst: Updated version number

	* stage3hr.c: Restored check on bad GMAP alignments

	* stage1hr.c: Taking both long and short terminals, with short terminals
	  based on half the number of mismatches. Always setting long and short
	  terminal pos based on one-third of the read length.

	* gmap.c: Fixed issues with computation of chimeric middle pieces, including
	  memory freeing

2013-03-04 twu

	* dynprog.c: Fixed bug with uninitialized variable introntype in
	  Dynprog_genome_gap

2013-02-28 twu

	* stage1hr.c: No longer using Stage3_short_alignment_p to rule out GMAP hits

	* gsnap.c, stage3hr.c, stage3hr.h: Using min-nconsecutive criterion instead
	  of min-coverage criterion for keeping GMAP hits.

2013-02-26 twu

	* stage1hr.c: Introduced idea of long terminals and short terminals, with
	  terminal length = querylength/3. Allowing GMAP pairsearch to set
	  nconcordant.

	* stage1hr.c: Reverted to not recording GMAP pairsearch successes as
	  nconcordant

	* stage3hr.c: No longer calling Stage3_bad_stretch_p on GMAP alignment. Not
	  allowing end indels to set trims for alignment comparisons. Penalizing
	  indels in alignment comparisons.

	* stage3.c, stage3.h: Added procedure Stage3_good_part

	* stage1hr.c: Recording GMAP pairsearch successes as nconcordant

2013-02-25 twu

	* index.html: Updated to reflect new version number

	* VERSION: Updated version number

	* gsnap.c: Increased values of max_gmap_pairsearch and max_gmap_terminals
	  from 10 and 5, respectively, to 50 and 50

	* stage1hr.c: Sorting terminals by matches before running GMAP against a
	  limited number of them

	* uniqscan.c: Printing given sequence in addition to uniqueness result

	* stage3.c: Reduced requirement for GMAP from querylength/2 to querylength/3

	* gsnap.c: Reduced gmap_min_coverage from 0.50 to 0.33

	* stage1hr.c: Counting nhits during subs and indels, and exiting when the
	  value exceeds maxpaths_search

2013-02-22 twu

	* gff3_introns.pl.in, gff3_splicesites.pl.in: Disallowing negative intron
	  lengths

	* gff3_genes.pl.in, gff3_introns.pl.in: Skipping blank lines.

	* gff3_splicesites.pl.in: Fixed bug in warning message. Skipping blank
	  lines.

	* stage3hr.c: Setting value of guided_insertlength for exact hits

	* gsnap.c, outbuffer.c, outbuffer.h, samprint.c, samprint.h, stage1hr.c,
	  stage1hr.h: Created separate variables maxpaths_search and maxpaths_report

	* shortread.c: Fixed potential bug if sequence is longer than sequence_length

	* uniqscan.c: Computing full sequence first, then iterating from start until
	  we reach a unique alignment.
9665 96662013-02-21 twu 9667 9668 * samprint.c: Fixed assertion for circularpos to handle trimming at ends 9669 9670 * substring.c: Handling trimmed regions in Substring_circularpos 9671 96722013-02-15 twu 9673 9674 * VERSION, config.site.rescomp.tst, index.html: Updated version number 9675 9676 * stage1hr.c: Changed terminal length from one-half to one-third 9677 9678 * stage1hr.c: Fixed bug in the terminal position used for comparing 9679 mismatches in find_terminals 9680 96812013-02-14 twu 9682 9683 * stage3hr.c: Preventing distant splices from being considered as circular 9684 aliases and removed 9685 9686 * stage1hr.c: Fixed typo in comment 9687 96882013-02-12 twu 9689 9690 * stage1hr.c: Increased MAXCHIMERAPATHS from 3 to 100 9691 96922013-02-07 twu 9693 9694 * stage1hr.c, stage3hr.c, stage3hr.h: Added Stage3end_eval_and_sort_guided, 9695 to sort one end of unpaired alignments when the other end has a unique 9696 alignment. 9697 9698 * get-genome.c: Fixed bug in printing accession for sequence, based on 9699 coordinates, for minus strand 9700 97012013-02-06 twu 9702 9703 * uniqscan.c: Running scan for entire sequence length. Reduced 9704 terminal-threshold to 5. 
9705 97062013-02-05 twu 9707 9708 * VERSION, index.html: Updated version number 9709 9710 * genome-write.c, genome_hr.c, indexdb.c, outbuffer.c, stage1hr.c, stage3.c, 9711 substring.c: Fixed static analysis errors found by Nathan Weeks using the 9712 Clang 3.1 compiler 9713 97142013-01-24 twu 9715 9716 * VERSION, config.site.rescomp.tst, archive.html, index.html: Updated 9717 version number 9718 9719 * iit_get.c: Allowing -N flag to be used with tally IIT files 9720 9721 * outbuffer.c: Fixed bug where SAM headers were duplicated in .nomapping file 9722 97232013-01-17 twu 9724 9725 * pair.c: Added PG:Z:M flag for alignments using GMAP method within GSNAP 9726 9727 * stage3.c: Made further changes to peel_leftward and peel_rightward to 9728 prevent an indel from being on top of pairs or path 9729 9730 * Makefile.dna.am: Add bzip2 files 9731 9732 * stage3.c: Changed peel_leftward and peel_rightward so they do not leave a 9733 gap or indel at the top of the pairs or path. 9734 97352013-01-16 twu 9736 9737 * iit_store.c: Added debugging statement 9738 9739 * snpindex.c: Added information for warning statement from check_acgt 9740 9741 * configure.ac: Added configure commands for bzip2 library 9742 9743 * README: Added comment about bunzip2 9744 9745 * Makefile.gsnaptoo.am, bzip2.c, bzip2.h, gsnap.c, inbuffer.c, inbuffer.h, 9746 sequence.c, sequence.h, shortread.c, shortread.h: Added procedures for 9747 handling bunzip2 9748 9749 * snpindex.c: Fixed issue where warning messages referred to wrong labels 9750 9751 * stage1hr.c: Added debugging information about hits used as anchors for GMAP 9752 9753 * stage3hr.c: Restored checks on chromosomal bounds for Stage3_new_gmap, and 9754 returning NULL when bounds are exceeded 9755 97562012-12-19 twu 9757 9758 * VERSION, config.site.rescomp.tst, index.html: Updated version number 9759 9760 * stage3.c: In assign_gap_types, handling case where cdna_direction == 0 9761 9762 * gmap.c: Calling Stage3_guess_cdna_direction at 
appropriate places after 9763 merge_left_and_right_readthrough. Using maxextension instead of 9764 max_intronlength for Stage 1 computations. 9765 9766 * stage3.h: Added function Stage3_guess_cdna_direction 9767 9768 * stage3.c: Using new interface to Pairpool_push_gapalign. Not performing 9769 guess of cdna_direction in Stage3_merge_local_splice. 9770 9771 * stage3hr.c, stage3hr.h: Stage3_pair_up_concordant now limits number of 9772 samechr results 9773 9774 * stage1hr.c: Using new interface to Stage3_pair_up_concordant, which now 9775 takes nsamechr 9776 9777 * stage1.c, stage1.h: Using extensionlen instead of max_intronlength. 9778 Providing variables in Stage1_setup. 9779 9780 * pairpool.c, pairpool.h: Handling introntype field in Pair_T object 9781 9782 * pairdef.h: Added introntype as field for Pair_T object 9783 9784 * pair.c, pair.h: Added function Pair_fix_cdna_direction_array 9785 9786 * outbuffer.c, outbuffer.h: Adding commas to output of memory usage 9787 9788 * mem.c, mem.h: Added reporting of peak memory usage 9789 9790 * intron.c: For cdna_direction of 0, returning introntype rather than 9791 NONINTRON 9792 97932012-12-18 twu 9794 9795 * stage3.c: In Stage3_merge_local_splice, if intronlength is small or 9796 negative, calling Stage3_merge_local_single. 9797 9798 * mem.c: Increased size of hash table 9799 9800 * gmap.c: Putting results of middle search into middlepieces, and iterating 9801 through that list. Searching for local readthroughs first, but linking 9802 local middle piece to both ends in any case. 
9803 98042012-12-14 twu 9805 9806 * gsnap.c: Fixed --help documentation to show default for -a is off 9807 98082012-12-11 twu 9809 9810 * VERSION, index.html: Updated version number 9811 9812 * stage3.c: In Stage3_merge_local_splice, not advancing querypos for a 9813 deletion 9814 9815 * trunk, VERSION, src, gmap.c, pair.c, result.c, stage3.c, stage3.h, util: 9816 Merged changes from branches/2012-11-11-middle-piece to allow for 9817 searching of middle pieces in GMAP 9818 9819 * uinttable.c, table.c: Fixed memory leak 9820 9821 * samprint.c: Fixed issue with CIGAR string and hard-clipping 9822 9823 * gmap.c: Not finding chimeras if --nosplicing is requested 9824 98252012-12-10 twu 9826 9827 * indexdb.c: In writing gammas, using a buffer for offsetcomp, so we do not 9828 write words one at a time, which is slow on some filesystems 9829 9830 * samprint.c: In compute_cigar, handling D and N types based on querypos 9831 9832 * samprint.c: Rewrote compute_cigar 9833 98342012-12-07 twu 9835 9836 * gmap.c: Added separate procedure evaluate_query for use before all calls 9837 to Stage1_compute. Turned on check for repetitivep. 9838 9839 * VERSION, config.site.rescomp.tst, index.html: Updated version number 9840 9841 * archive.html: Added entry for 2012-07-20.v2 9842 9843 * gmap_build.pl.in: Added -e or --nmessages flag 9844 9845 * samprint.c: Fixed problem with CIGAR string and clip-overlap function 9846 9847 * shortread.c: No longer assuming that slashes in the input FASTQ file are 9848 present consistently 9849 98502012-12-06 twu 9851 9852 * stage1hr.c: Added debugging messages 9853 9854 * iit_get.c: Added printing of flanking results in stdin mode. Added -C 9855 flag to force interpretation of queries as coordinates. 
9856 9857 * iit-read.c: Fixed coord_search_low and coord_search_high to prevent it 9858 from going below given chromosome 9859 9860 * gsnap.c: Revised --help message to give correct formula for fast index size 9861 9862 * gmapindex.c, genome-write.c, genome-write.h: Added nmessages as a parameter 9863 9864 * gmap.c: Using new interface to Stage3_merge_chimera 9865 9866 * get-genome.c: Added --signed flag 9867 9868 * stage3.h: Doing full trimming of inside ends of chimeras 9869 9870 * stage3.c: Doing full trimming of inside ends of chimeras. Added commented 9871 code from revised 2012-07-20 version to prevent chimera extensions from 9872 crossing chromosomal coordinates. 9873 9874 * pair.c: Handling case for MD string in reverse query direction of I-to-D 9875 and N-to-D transitions 9876 98772012-11-28 twu 9878 9879 * get-genome.c: Fixed sign to be 0 when coordend == coordstart 9880 98812012-11-27 twu 9882 9883 * VERSION, index.html: Updated version number 9884 9885 * gmap_process.pl.in: Turning off handling of pipes in FASTA headers 9886 9887 * iit_get.c: Using new interface for IIT_get_flanking_typed 9888 9889 * iit-read.c, iit-read.h: Added procedures for getting signed results 9890 9891 * get-genome.c: Added --signed flag 9892 9893 * samprint.c: Fixed bugs in cigar strings for hard clipping and 9894 translocations 9895 98962012-11-21 twu 9897 9898 * VERSION, index.html: Updated version number 9899 9900 * gmap.c: Added a limit to iterations of chimera search to prevent an 9901 infinite loop 9902 99032012-11-19 twu 9904 9905 * VERSION, index.html: Updated version number 9906 9907 * gmap.c: On chimera search, calling original found subsequence, in case 9908 stage 1 was led astray by the ends 9909 9910 * stage2.c: Changed debugging call 9911 9912 * dynprog.c: Not aborting from Dynprog_microexon_int if cdna_direction is 0, 9913 which can happen with chimeric joins. Returning NULL instead. 
9914 9915 * stage2.c: Moved some debugging statements around 9916 9917 * stage1.c: Added debugging statements for printing final results 9918 9919 * gregion.c: Increasing MAX_GENOMICLENGTH from 1 million bp to 2 million bp 9920 9921 * stage3.c: Using make_pairarrays instead of make_pairarrays_chimera in 9922 Stage3_merge_local_splice. Calling this even for dual break. 9923 9924 * stage2.c: Made slight differences in coordinates for Oligoindex_hr_tally 9925 on minus strand 9926 9927 * oligoindex_hr.c: Added debugging statements 9928 9929 * oligoindex.c, oligoindex.h: Added debugging procedure 9930 9931 * pair.c: Changed all types of exon_genome positions from int to Genomicpos_T 9932 99332012-11-16 twu 9934 9935 * stage3.c: Fixed bug in computing chrstart and chrend for 9936 Stage2_compute_one from traverse_dual_break. Making all splice_genomepos 9937 consistently as chromosomal coordinates. 9938 9939 * stage3hr.c: Handling case in Stage3_cdna_direction where stage3 is NULL 9940 9941 * pair.c, pair.h, samprint.c: Fixed CIGAR strings when clip-overlap hits an 9942 insertion 9943 99442012-11-15 twu 9945 9946 * pairpool.c: Added assert.h 9947 9948 * trunk, config.site.rescomp.tst, src, boyer-moore.c, boyer-moore.h, diag.c, 9949 diag.h, diagdef.h, dynprog.c, dynprog.h, genome.c, genome_hr.c, 9950 genome_hr.h, gmap.c, gregion.c, gregion.h, iit-read.c, iit_dump.c, 9951 oligoindex.c, oligoindex.h, oligoindex_hr.c, oligoindex_hr.h, pair.c, 9952 pair.h, pairpool.c, splicetrie.c, splicetrie.h, splicetrie_build.c, 9953 stage1hr.c, stage2.c, stage2.h, stage3.c, stage3.h, util: Merged revisions 9954 78884 to 79299 from branches/2012-11-11-middle-piece to use chromosomal 9955 coordinates, chroffset, and chrhigh 9956 9957 * README, VERSION, index.html: Updated version number 9958 9959 * genome.h: Changed interface to Genome_fill_buffer_simple_alt 9960 9961 * splicetrie_build.c: Using new interface to Genome_fill_buffer_simple_alt 9962 9963 * genome.c: Fixed bugs in substituting 
SNPs into string 9964 (uncompress_mmap_snps_subst) 9965 99662012-11-09 twu 9967 9968 * VERSION, index.html: Updated version number 9969 9970 * stage3.c, stage3.h: Implemented Stage3_set_genomicend 9971 9972 * stage2.c: Remove MAX_GENOMICLENGTH restriction 9973 9974 * gregion.c: Impose MAX_GENOMICLENGTH on gregion 9975 9976 * gmap.c: Revising genomicend after pieces are merged 9977 9978 * indexdb.h: Fixed type conflict involving Oligospace_T with definition in 9979 indexdb.c 9980 9981 * stage2.c: Raised MAX_GENOMICLENGTH from 1 million to 10 million bp 9982 9983 * stage3.c: Fixed identification of insertion in Stage3_mergeable 9984 9985 * dynprog.c: Added debugging statement 9986 9987 * gregion.c: Fixed bug where gregion could extend beyond chrhigh 9988 99892012-11-07 twu 9990 9991 * stage3hr.c: Using fact that circularp[0] == false to handle translocations 9992 9993 * iit-read.c: Fixed IIT_circularp for lookup of 1-based chrnum values 9994 9995 * VERSION, index.html, config.site.rescomp.prd: Updated version number 9996 9997 * gmap.c, gsnap.c, uniqscan.c: Using new interface to Stage3hr_setup 9998 9999 * stage3hr.c: Checking hit->alias before checking hit->plusp 10000 10001 * stage3hr.c, stage3hr.h: Using information about circularp in 10002 compute_circularpos and in computing hit->alias. 10003 10004 * iit-read.c, iit-read.h: Implemented IIT_circularp function 10005 10006 * stage1hr.c: Added checks to make sure greedy mapping positions do not 10007 result in a genomic segment with negative length 10008 100092012-11-06 twu 10010 10011 * VERSION, index.html: Updated version number 10012 10013 * access.c: Changed printf commands for off_t to use %ju 10014 10015 * genome_hr.c: Fixed computation of wildcard SNP positions to exclude 10016 positions where reference allele is 'N'. 
10017 10018 * configure.ac, access.c: Handling compiler warning messages when sizeof 10019 off_t is not 8 10020 10021 * dynprog.c, dynprog.h, gsnap.c, uniqscan.c: Removed genome and genomealt 10022 from Dynprog_setup 10023 10024 * stage1.c: No longer filtering for support, since it leads to poor 10025 behavior 10026 10027 * gmap.c: Fixed a bug where a NULL genomealt was being passed to 10028 Chimera_find_exonexon 10029 100302012-10-31 twu 10031 10032 * coords1.test.ok: Changed to match current output of program, which allows 10033 for circular chromosomes 10034 10035 * Makefile.gsnaptoo.am: Made files match those of Makefile.dna.am 10036 10037 * archive.html, index.html: Made changes for new version 10038 10039 * README: Added discussion of --force-xs-flag, the XW and XV fields, and 10040 circular chromosomes. 10041 10042 * VERSION, config.site.rescomp.tst: Updated version number 10043 10044 * stage2.c: Prefer shorter intron in all cases 10045 10046 * gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Implemented variable for 10047 splice distances at novel ends 10048 10049 * stage2.c: For first intron, favoring shorter intron lengths in cases of 10050 ties 10051 10052 * stage1hr.c: Refinements made to computing GMAP genomic region, by checking 10053 if a fallback mappingstart or end is available. Performing n best GMAP 10054 alignments for pairsearch. Going back to sorting terminals by matches in 10055 GMAP terminals. 10056 10057 * stage3hr.c: Modifications made to score_eventrim for GMAP, by ignoring 10058 small trims and restoring indel penalty. Fixed bug in computing 10059 non_terminal3_p. 10060 10061 * stage3.c: Calculating ambig end lengths better in trimming noncanonical 10062 ends. Not doing trim of novel splice end if ambig end length already set. 
10063 10064 * samprint.c, samprint.h: Implemented --force-xs-direction flag 10065 100662012-10-30 twu 10067 10068 * dynprog.c: Limiting indels to 3 bp around splice sites 10069 10070 * stage2.c: Edited comment on debugging usage 10071 10072 * stage3.c: In trimming non-canonical end exons, when end exon is actually 10073 canonical but with poor probability, we trim the exon but set 10074 ambig_end_length, so the alignment can compete with other alignments. 10075 10076 * stage3.c: Made bad stretch algorithm more tolerant 10077 100782012-10-29 twu 10079 10080 * genome_hr.c: Fixed compile for GMAP 10081 10082 * pair.h: Added code to handle force-xs-direction. Using 10083 mate_cdna_direction when necessary. 10084 10085 * pair.c: Added code to handle force-xs-direction. Using 10086 mate_cdna_direction when necessary. Counting N's as mismatches for 10087 trimming purposes. 10088 10089 * genome_hr.c, genome_hr.h, substring.c: Counting N's as mismatches for the 10090 purposes of end trimming 10091 100922012-10-28 twu 10093 10094 * substring.c, substring.h: Added function Substring_chimera_sensedir 10095 10096 * outbuffer.c, outbuffer.h: Removed snps_p as a parameter 10097 10098 * gmap.c, gsnap.c, uniqscan.c: Added --force-xs-dir flag 10099 101002012-10-27 twu 10101 10102 * stage3.c, stage3.h: Removed snps_p 10103 101042012-10-26 twu 10105 10106 * dynprog.c: Fixed debugging statements 10107 10108 * gmap.c, gsnap.c, uniqscan.c: Using new interface to Stage3_setup 10109 10110 * dynprog.c, stage3.c, stage3.h, intron.c: Commented out constants that 10111 should not be used by PMAP 10112 10113 * samprint.c, pair.c, pair.h: Printing XW and XV only when SNP-tolerant 10114 alignment is used 10115 10116 * uniqscan.c: Using new interfaces to procedures. 10117 10118 * gmap.c, gsnap.c: Added flag --md-lowercase-snp. Using new interfaces to 10119 procedures. 
10120 10121 * dynprog.c: Using jump_late_p in cdna_gap and genome_gap, which can 10122 sometimes give rise to indels 10123 10124 * stage1hr.c: Removed nsalvage from debugging statements 10125 10126 * stage3.c: Changed Stage3_bad_stretch_p to count each indel gap as one 10127 mismatch, regardless of length 10128 10129 * pair.c: Removed debugging statements 10130 101312012-10-25 twu 10132 10133 * stage3.c, stage3.h: Using new interface to Intron_type. Obtaining 10134 alternate genomic segments in some procedures. 10135 10136 * stage2.c, stage2.h: Obtaining alternate allele and putting into Pair_T 10137 object when SNP-tolerant alignment is used 10138 10139 * splicetrie.c, splicetrie.h: Using new interfaces to Dynprog splicejunction 10140 procedures 10141 10142 * outbuffer.c, outbuffer.h: Added snps_p field 10143 10144 * iit-read.c, iit-read.h: Added function IIT_interval_type 10145 10146 * genome_hr.c: In finding dinucleotides, using alternate genome 10147 10148 * genome.c, genome.h: In function used to create splicejunctions, returning 10149 alternate genomic segment 10150 10151 * chimera.c, chimera.h, intron.c, intron.h, maxent_hr.c, maxent_hr.h, 10152 dynprog.c: Using alternate alleles in evaluating splice sites 10153 10154 * samprint.c, samprint.h, pair.c, pair.h: Printing lowercase MD for known 10155 SNP variants 10156 10157 * access.c, access.h: Added function Access_file_equal 10158 10159 * dynprog.h: Taking alternate genome sequence in splicejunction procedures 10160 10161 * dynprog.c: Using alternate allele in computing dynamic programming matrices 10162 10163 * snpindex.c: Made snpindex work with circular chromosomes 10164 10165 * snpindex.c: Checking if given IIT file and installed IIT file are the same 10166 101672012-10-23 twu 10168 10169 * trunk, VERSION, config.site.rescomp.tst, src, boyer-moore.c, 10170 boyer-moore.h, diag.c, diag.h, dynprog.c, dynprog.h, genome.c, gmap.c, 10171 oligoindex_hr.c, oligoindex_hr.h, stage1hr.c, stage2.c, stage2.h, 10172 
stage3.c, stage3.h, util: Merged revisions 77378 to 77446 from 10173 branches/2012-10-21-no-genomicseg to remove genomicseg parameters 10174 10175 * stage3.c: Fixed uninitialized variables knownsplice5p, knownsplice3p, and 10176 intronlength. 10177 101782012-10-21 twu 10179 10180 * gsnap.c: Using new interface to Genome_setup and SAM_setup. 10181 10182 * gmap.c: Using new interface to Genome_setup. Using -V and -v flags as in 10183 GSNAP. 10184 10185 * stage1.c: Fixed matchsize to be double index1part when index1part < 12 10186 10187 * dynprog.c, genome.c, pair.c, pair.h, pairdef.h, pairpool.c, pairpool.h, 10188 stage2.c, stage3.c: Added genomealt to Pair_T object and assigning this 10189 value in Pairpool_push routines. Creating GENOMEALT_DEFERRED value until 10190 we resolve all occurrences of genomicseg. 10191 10192 * boyer-moore.c, dynprog.c, dynprog.h, genome.c, genome.h, maxent_hr.c, 10193 pair.c, stage2.c, stage3.c: Genome_get_char_blocks and get_genomic_nt now 10194 return alternate allele 10195 10196 * samprint.h: Added snps_iit to SAM_setup 10197 10198 * samprint.c: Computing mismatches for both refdiff and bothdiff, and 10199 printing XW and XV fields if running in SNP-tolerant mode 10200 102012012-10-20 twu 10202 10203 * gmap.c: Clarified that chimera search can be turned off by setting value 10204 to 0 10205 10206 * indexdb_hr.c: Fixed debugging statement 10207 10208 * snpindex.c: Fixed program to work with sampling intervals other than 3 bp. 10209 Performing file copy of IIT file to maps subdirectory. 10210 10211 * access.c, access.h: Added function Access_file_copy 10212 10213 * samprint.c: Fixed bug where circularpos was called before results arrays 10214 were retrieved 10215 102162012-10-19 twu 10217 10218 * stage3hr.c: Using hitpair score_eventrim in Stage3pair_optimal_score, 10219 instead of individual score5 and score3. 
10220 10221 * trunk, VERSION, config.site.rescomp.tst, src, Makefile.dna.am, chrom.c, 10222 chrom.h, dynprog.c, gamma-speed-test.c, gdiag.c, genome-write.c, 10223 genome-write.h, genome.c, genome.h, gmap.c, gmapindex.c, gregion.c, 10224 gregion.h, gsnap.c, gsnap_tally.c, iit-read.c, iit-read.h, iit_dump.c, 10225 iit_store.c, indexdb.c, outbuffer.c, pair.c, pair.h, samprint.c, 10226 samprint.h, splicetrie_build.c, stage1.c, stage1.h, stage1hr.c, stage3.c, 10227 stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h, uniqscan.c, 10228 util, fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Merged 10229 revisions 76693 to 77345 from branches/2012-10-15-circular to handle 10230 circular chromosomes 10231 10232 * outbuffer.c: Fixed write mode for appending to a file 10233 102342012-10-18 twu 10235 10236 * dynprog.c: Allowing get_genomic_nt to retrieve negative coordinates 10237 102382012-10-15 twu 10239 10240 * gmap.c, gsnap.c, outbuffer.c, outbuffer.h: Added --append option to append 10241 results to output files 10242 10243 * memory-check.pl: Added -9 flag to print successive maximum memory usage 10244 102452012-10-12 twu 10246 10247 * types.h: Added OLIGOSPACE_NOT_LONG 10248 10249 * stage3.c: Handling PMAP case for final guess at cdna_direction 10250 10251 * stage2.c: Not doing stage 2 if genomiclength > MAX_GENOMICLENGTH 10252 10253 * indexdb.c: Fixed issues with %lu using OLIGOSPACE_NOT_LONG 10254 10255 * sequence.c, sequence.h: Sequence_read_unlimited returns nextchar 10256 10257 * inbuffer.c, inbuffer.h, outbuffer.c, outbuffer.h: Handling multiple pairs 10258 of sequences in --pairalign mode 10259 10260 * gmap.c, gsnap.c: Requiring -m to be 0.10 or less when it is a float 10261 10262 * stage3hr.c, stage3hr.h: In pair_up_concordant, treating hits and terminals 10263 separately. When all results are double terminals, treating as if it were 10264 final. 10265 10266 * stage1hr.c: Added variable gmap_rerun_p. Fixed memory leak. 
Removed use 10267 of segment->usedp. Changed some uses of starti and endi. GMAP result 10268 must be significantly better than original hit (reducing misses by half). 10269 GMAP pairsearch run only if hit list is small enough. Keeping hits and 10270 terminals in separate lists. 10271 102722012-10-10 twu 10273 10274 * substring.c: Fixed bug in not copying trim_left_splicep and 10275 trim_right_splicep 10276 102772012-10-01 twu 10278 10279 * stage3.c: Added ability for Stage3_merge_local_splice to make a deletion 10280 instead of an intron 10281 10282 * stage3.c: Made code for coordinate change in Stage3_merge_local_splice 10283 match that of Stage3_merge_local_single 10284 102852012-09-26 twu 10286 10287 * gmap.c, gsnap.c, pair.c, pair.h, stage3.c, stage3.h, uniqscan.c: Added 10288 --require-splicedir flag and code for guessing at cdna direction. 10289 102902012-09-24 twu 10291 10292 * gmap.c, gsnap.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h, uniqscan.c: 10293 Removed --pairexpect and --pairdev flags. Removed expected_pairlength and 10294 pairlength_deviation variables. 10295 10296 * gsnap.c, uniqscan.c: Using new interface to setup procedures 10297 10298 * stage1hr.c: Turned off debugging 10299 10300 * substring.c, substring.h: Added check for splice site at trimmed position, 10301 to be used in even-trimming 10302 10303 * stage3hr.c, stage3hr.h: Added Stage3end_trim_left and Stage3end_trim_right 10304 commands. In Stage3pair_optimal_score, eliminating hits only if both hit5 10305 and hit3 are worse than optimal in pre-final stages. In final stages, 10306 using the sum of hit5 and hit3. Eliminated absdifflength field. 10307 10308 * stage1hr.c: For determining GMAP bounds, computing both close 10309 mappingstart/mappingend (at distal end) and middle 10310 mappingstart/mappingend, and using close if available. If close 10311 mappingstart/mappingend does not give a good alignment at the end, then 10312 trying full pairmax plus shortsplicedist. 
10313 103142012-09-21 twu 10315 10316 * stage1hr.c: Consolidated calls to Stage2_compute and Stage3_compute into a 10317 run_gmap procedure. Computing close_genomicstart and close_genomicend 10318 values, but using full pairmax for now. 10319 103202012-09-20 twu 10321 10322 * stage1hr.c: In computing mappingstart and mappingend for GMAP region, 10323 evaluating each diagonal for shortsplicedist vs querylength extension 10324 103252012-09-19 twu 10326 10327 * stage3hr.c: Computing trim_left_splicep and trim_right_splicep for GMAP 10328 alignments. Using trim_left_splicep and trim_right_splicep to determine 10329 trim amount. 10330 10331 * stage1hr.c: Taking lowprob splice only if both ends have minimum support 10332 (set at 20) and no subs or indels were found previously 10333 103342012-09-18 twu 10335 10336 * stage1hr.c: Requiring one splice site to be sufficient for lowprob 10337 splices. Finding best splice first by nmismatches, and then by prob. 10338 10339 * gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Using min_intronlength to 10340 prevent deletions from showing up as lowprob splices 10341 103422012-09-17 twu 10343 10344 * gmap.c, stage3.c, stage3.h: Implemented procedures to favor paths first by 10345 best goodness score, and then by shorter genomiclength 10346 103472012-09-14 twu 10348 10349 * stage3.c: Compute best_absmq_score, even if it is negative 10350 10351 * dynprog.c, pair.c, pair.h, stage3.c: Not protecting distal indels after 10352 known splice sites 10353 10354 * gsnap.c: For cmet-stranded and cmet-nonstranded mode, make 10355 --terminal-threshold=100 the default 10356 10357 * stage3.c: Allowing for merging when there is excess query sequence at the 10358 breakpoint 10359 103602012-09-13 twu 10361 10362 * oligoindex_hr.c: For minus strand, not subtracting 1 from left if left is 10363 0, which caused the entire sequence to be skipped 10364 10365 * datadir.c: Including comment about -F flag for cmetindex and atoiindex 10366 10367 * snpindex.c: 
Allowing snpindex to work when there is no gammaptrs file 10368 103692012-09-12 twu 10370 10371 * stage3hr.c: Changed a free command from FREE to FREE_OUT 10372 103732012-08-10 twu 10374 10375 * oligoindex.c: Fixed assertion for PMAP to use 3*querylength 10376 103772012-07-20 twu 10378 10379 * pair.c: Restored capitalized Coverage for standard output 10380 10381 * index.html: Added note about lower-case coverage and identity tags in GFF3 10382 10383 * pair.c: Changed Coverage and Identity in GFF3 output to be lower case 10384 10385 * Makefile.three.am: Fixed Makefile instructions 10386 10387 * VERSION, config.site.rescomp.tst, index.html: Updated version number 10388 10389 * gmap_build.pl.in: Fixed use of short and long options 10390 103912012-07-18 twu 10392 10393 * stage3.c: Merging pairs list of left and right parts for local merge, so 10394 that the resulting Stage3_T object can be used iteratively to find a 10395 chimera. 10396 10397 * gmap.c: Added DEBUG2A macro to show details of chimera detection 10398 103992012-07-17 twu 10400 10401 * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Implemented iterative 10402 algorithm for finding chimeras 10403 104042012-07-12 twu 10405 10406 * VERSION, config.site.rescomp.tst, index.html: Updated version number 10407 10408 * pair.h, pair.c, samprint.c: Fixed problem in SAM output with an unpaired 10409 alignment with one end being a GMAP alignment 10410 104112012-07-09 twu 10412 10413 * gff3_introns.pl.in, gff3_splicesites.pl.in: Added -Q flag to suppress 10414 messages to stderr 10415 10416 * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Checking for 10417 transcript lines as well as mRNA lines 10418 104192012-07-06 twu 10420 10421 * README: Added instructions for using vcf_iit 10422 10423 * configure.ac, Makefile.am, vcf_iit.pl.in: Added vcf_iit program for 10424 processing VCF files 10425 104262012-07-05 twu 10427 10428 * dbsnp_iit.pl.in: Added exception types InconsistentAlleles and 10429 SingleAlleleFreq 
10430 10431 * dbsnp_iit.pl.in: Printing exception handling rules to stderr even if 10432 exception file not given 10433 104342012-07-03 twu 10435 10436 * VERSION, index.html: Updated version number 10437 10438 * parserange.c: No longer checking for labels that match a contig 10439 10440 * stage3.c: In extend_ending5 and extend_ending3, checking for a gap between 10441 gappairs and the rest of the read 10442 104432012-06-27 twu 10444 10445 * gmap.c: Fixed bug which prevented -1 flag from working 10446 10447 * stage1.c: Made stage 1 work for PMAP 10448 10449 * stage3hr.c: Turned off debugging 10450 10451 * stage3hr.c: For hitpair_equiv_cmp, not looking at score or nmatches anymore 10452 104532012-06-25 twu 10454 10455 * VERSION, index.html: Updated version number 10456 10457 * splicetrie_build.c: Fixed contents of splicesite_i in splicestrings, after 10458 sorting of splice sites 10459 104602012-06-20 twu 10461 10462 * uniqscan.c: Using new interface to Stage3hr_setup 10463 10464 * VERSION, index.html: Updated version number 10465 10466 * index.html, gsnap.c, stage3hr.c, stage3hr.h: Added --gmap-min-coverage flag 10467 10468 * stage2.c: Changed find_shifted_canonical to go directly to Genome_hr 10469 procedures instead of allocating memory and saving past results 10470 10471 * gsnap.c, stage1hr.c: Added indel_knownsplice as option to --gmap-mode 10472 104732012-06-19 twu 10474 10475 * gmap_build.pl.in: Added -M flag for handling NCBI MD files 10476 104772012-06-18 twu 10478 10479 * stage3hr.c: Cleaned up optimal_score commands for removing terminal 10480 alignments in final stage. Using trim_terminals_p argument in calling 10481 compute_mapq functions. 
10482 10483 * pair.c, pair.h, substring.c, substring.h: Added trim_terminals_p argument 10484 to compute_mapq functions 10485 104862012-06-15 twu 10487 10488 * index.html, pair.c, substring.c: Reverted back to old behavior for 10489 computing MAPQ in entire read, but trimming off ends of type TERM 10490 10491 * index.html: Added comments for changes 10492 10493 * gmap_build.pl.in: Using long flag names 10494 10495 * pair.c, pair.h, stage3hr.c, substring.c: Computing MAPQ scores over 10496 trim-kept region, instead of entire substring 10497 10498 * VERSION, config.site.rescomp.tst, index.html: Updated version number 10499 10500 * stage3.c: In trim_noncanonical_end_exons, keeping known introns only if 10501 nmismatches == 0 10502 105032012-06-14 twu 10504 10505 * stage3hr.c: Allowing Stage3end_remove_overlaps to work with translocations 10506 10507 * stage1hr.c: Allowing for multiple translocations to be reported. Not 10508 updating nconcordant for GMAP pair revisions 10509 10510 * outbuffer.c, resulthr.c, samprint.c: Allowing for multiple single-end and 10511 unpaired translocations to be printed 10512 10513 * resulthr.c, samprint.c: Allowing for multiple paired translocations to be 10514 printed 10515 10516 * stage3hr.c, stage3hr.h: Changed Stage3pair_remove_overlaps and 10517 hitpair_sort_cmp, so they work on translocations 10518 10519 * stage3hr.c: Allowing multiple concordant translocations to be printed 10520 10521 * stage1hr.c: Not skipping GMAP on terminal alignments. Performing 10522 align_concordant_with_gmap on with_terminal list. 
10523 10524 * resulthr.c, resulthr.h, pair.c: Changed pairtype TRANSLOCATION to 10525 CONCORDANT_TRANSLOCATIONS 10526 10527 * stage1hr.c, stage3hr.c, stage3hr.h: Added a category of hitpairs called 10528 with_terminal, with lower priority than samechr or conc_transloc 10529 10530 * stage2.c: Increased value of SHIFT_EXTRA to fix a fatal bug 10531 105322012-06-13 twu 10533 10534 * stage3.c: Counting indels and short gaps as mismatches in 10535 Stage3_bad_stretch_p 10536 105372012-06-12 twu 10538 10539 * index.html: Added comment about improved detection of translocations 10540 within read ends 10541 10542 * stage3hr.c: Computing substring_for_concordance for both translocations 10543 (chrnum == 0) and intrachromosomal rearrangements (shortdistancep == false) 10544 10545 * stage1hr.c: Checking for bad stretch in GMAP hits, as soon as we call 10546 Stage3end_new_gmap 10547 10548 * index.html: Updated version number 10549 105502012-06-11 twu 10551 10552 * trunk, VERSION: Updated version 10553 10554 * Makefile.gsnaptoo.am: Removed extra includes of cmet and atoi files for 10555 GMAP 10556 10557 * oligoindex_hr.c: Getting the final oligomers when computing mappings 10558 10559 * stage3.c: Fixed computation of mappingstart and mappingend for traversing 10560 dual breaks on crick strand 10561 10562 * stage1.c: Restoring old scan ends algorithm 10563 10564 * stage1hr.c: Removed unused debugging macro 10565 10566 * stage3.c: In trimming novel splice ends, allowing perfect matches to 10567 extend into intron 10568 10569 * psl_introns.pl.in: Added print command 10570 10571 * Makefile.gsnaptoo.am: Added file dependencies 10572 10573 * stage3.c: Using QUERYEND_NOGAPS for pass 9a and pass 9b for GSNAP, so 10574 trimming will work. Fixed computation of mappingstart and mappingend in 10575 traverse_dual_break. 

2012-06-06 twu

	* Makefile.dna.am, stage3hr.c: Adding an absolute sufficient minlength for a
	  terminal, besides using querylength

	* VERSION, config.site.rescomp.tst, index.html: Updated version number

	* src: Committing property changes from last merge

	* gmap.c: Increased max_nalignments from 3 to 10

	* stage1hr.c: Fixed bug in find_terminals, where querypos3 was used to
	  compute start_endtype and querypos5 was used to compute end_endtype,
	  instead of querypos5 and querypos3, respectively.

	* stage3hr.c: Allowing both ends to be of type TERM in a terminal, and
	  checking for mismatches only between the trimmed ends. Requiring that
	  final length is querylength/3.

	* dynprog.c: Dropped mismatch scores, which helps GMAP extend ends and find
	  chimeras.

	* stage3.c: Changed endalign for pass 9a and 9b from QUERYEND_NOGAPS to
	  BEST_LOCAL. This fixes an issue in GMAP where ends are truncated, and
	  chimeras not found, as introduced in revision 64732 on 2012-05-22.

	* stage2.c: Fixed bug in condition on suboptimal stage 2 paths, where we
	  were requiring fewer than max_nalignments results plus the score ==
	  bestscore. The condition should have been a disjunction, not a
	  conjunction.

	* stage1hr.c: Skipping computation of GMAP on single-end terminal
	  alignments, since that is a duplication of effort

	* stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Restored
	  assignment of endtypes to terminal alignments. Using them again to determine
	  whether to extend terminals left or right for GMAP alignments.

	* stage1hr.c: Integrated two criteria for finding terminals: old method
	  based on counting mismatches from ends, and new method based on width of
	  (querypos3 - querypos5).

2012-06-05 twu

	* stage3.c: Fixed bug in local chimera alignment with uninitialized value
	  for genomicseg_ptr

2012-06-01 twu

	* genome.c: Added assertions to Genome_get_char and Genome_get_char_blocks
	  to check for negative coordinates

	* dynprog.c: Removed debugging statements

	* stage3.c, dynprog.c: Fixed get_genomic_nt to check for both genomicpos
	  between 0 and genomiclength, and pos between chroffset and chrhigh

	* VERSION, index.html, config.site.rescomp.tst: Updated version number

	* dynprog.c, stage3.c: Checking only for genomepos < 0 in get_genomic_nt,
	  not for chrpos between chroffset and chrhigh, which may need further
	  debugging for chimeras.

	* dynprog.c: Checking for genomepos < 0 in get_genomic_nt.

	* stage3.c: For Stage3_extend_left and Stage3_extend_right, using
	  get_genomic_nt instead of going directly to Genome_get_char. Checking for
	  genomepos < 0 in get_genomic_nt.

	* stage3hr.c: In Stage3pair_remove_overlaps, allowing separate pair to
	  subsume overlapping pair only if it is better

	* stage3.c, dynprog.c: Fixed check of chrpos to compare genomicpos against
	  chrhigh

	* dynprog.c, stage3.c: Checking for chrpos between 0 and chrhigh - 1,
	  inclusive

	* dynprog.c, dynprog.h, gmap.c, gregion.c, gregion.h, splicetrie.c,
	  splicetrie.h, stage1hr.c, stage3.c, stage3.h: Passing chrhigh along with
	  chroffset to all procedures

	* dynprog.c, genome.c: When chromosomal coordinate is negative, returns '*'
	  instead of 'N'. Traceback procedures in dynamic programming will not add
	  pairs with '*' genomic nucleotides.

	* util: Merged changes from last branch

	* README: Added note that MAX_READLENGTH applies only to GSNAP

	* stage2.c, stage3.c: Merged changes from
	  branches/2012-06-01-merge-single-gap to fix problems with merging single
	  gap on minus strand

2012-05-31 twu

	* stage3.c: Protected another debugging statement from referring to
	  genomicseg

	* gsnap.c: Fixed documentation for --fails-as-input flag.

	* gmap.c: Added --fails-as-input string to getopt processing. Fixed
	  documentation for --fails-as-input flag.

	* dynprog.c: Added messages to stderr before all abort statements

	* translation.c: Requiring translation_leftpos and translation_rightpos to
	  be between 0 and querylength-1.

2012-05-25 twu

	* gmap_build.pl.in: If -k 15 specified, but not -b, setting basesize to be 12

2012-05-24 twu

	* Makefile.gsnaptoo.am: Added uniqscan program

	* stage1hr.c: Decreased max_nalignments from 3 to 2

	* dynprog.c: For known splicesites, adjusted low and high boundaries so
	  contlength is always between 0 and endlength-1, inclusive.

	* stage3.c: Not reducing genomejump at ends anymore

2012-05-23 twu

	* VERSION, index.html: Updated version number

	* gmap.c, stage1hr.c: Increased max_nalignments for stage 2 to 3

	* stage3hr.c: Turned off check for cdna_direction != 0 and SENSE_NULL in
	  declaring a GMAP alignment as bad

	* stage3.c: Changed pass 9 from queryend_indels to queryend_nogaps, to avoid
	  false positive indels at ends and to prepare for noncanonical end trimming

	* splicetrie.c: Improved debugging statements

	* pair.c: Added information about knowngapp and protectedp in printing pair
	  information for debugging purposes

2012-05-22 twu

	* stage3.c: In trimming of noncanonical introns near end, making an
	  exception for known introns

	* dynprog.c: Replaced noindel version of Dynprog_end_splicejunction
	  functions with version allowing indels

	* stage3.c, stage3.h, stage3hr.c: In Stage3_bad_stretch_p, excluding trimmed
	  regions on ends

	* uniqscan.c: Using new interfaces to setup procedures

	* stage1hr.c: Added debugging statements

	* genome_hr.c: Removed debugging statements

	* gamma-speed-test.c: Using new interface to setup procedures

	* dynprog.c, dynprog.h, stage3.c: Introduced new endalign type,
	  QUERYEND_GAP, and using it in pass 8. Restored call of Dynprog_end
	  procedures in trim ends using BEST_LOCAL, which does not try to find an
	  intron.

	* stage3.c: Better handling of ends: pass 8, best_local plus known splicing;
	  pass 9, queryend_indels; pass 10, queryend_nogaps. Medial gap not using
	  known splicing. Simplified trim_ends procedure. Not removing or
	  re-inserting known intron gaps. Bayesian computation of mapping scores
	  for GMAP alignments.

	* splicetrie.c: Computing separate offsets for anchor and far splicesites
	  for use in Dynprog_end_splicejunction procedures. Not calling
	  Dynprog_add_known_splice procedures.

	* pairdef.h, pairpool.c, pairpool.h: Added knowngapp field to Pair_T object

	* dynprog.c, dynprog.h: Allowing compute_scores procedures to work on
	  genomicseg (for splicejunctions). Added functions
	  Dynprog_end5_splicejunction and Dynprog_end3_splicejunction to replace
	  add_known_splice procedures. Calling traceback_local_nogaps in two parts.
	  Dynprog_end_gap procedures returning final score even for QUERYEND_INDELS.
	  Made debugging statements work without genomicseg.

2012-05-21 twu

	* samprint.c: Checking if MD string output is empty and if so printing "0"

2012-05-18 twu

	* stage3hr.c: For paired-end reads, in cases of tie score, sorting results
	  by genomic position

	* splicetrie_build.c: For intron-level splicing information, sorting
	  individual splicesites by ascending genomic position

	* splicetrie.c, splicetrie.h: Computing spliceoffset needed to construct
	  splicejunctions. Calling Dynprog_local_nogaps procedures. Requiring
	  dynprog score > 0 on known splicejunctions.

	* dynprog.c, dynprog.h: Fixed bug in making A-G and C-T ambiguous scores for
	  all modes. Implemented traceback procedure using given sequence and no
	  gaps for handling known splicejunctions. End dynamic programming
	  procedure now returns a final score for queryend_nogaps endalign.
	  Implemented make_contjunction procedures to retrieve the continuous part
	  of splicejunctions. Made make_splicejunction_3 consistent with
	  make_contjunction_3.

	* stage3hr.c: Turned off check for min_splice_prob on GMAP alignments, since
	  it appears not to work for known splicesites

	* pair.c: Pair_dump_list now prints line to indicate start of list

2012-05-17 twu

	* dynprog.c: Handling cases where length1 == 0 or length2 == 0, which
	  otherwise cause fatal errors

2012-05-16 twu

	* stage3.c: Setting use_genomicseg_p to false in all cases

	* stage2.c: Using Oligoindex_hr_tally even if user genomic segment provided

	* gmap.c: Computing genomicend even if user genomic segment provided

	* genome-write.c: Added extra 4 words to end of genome blocks to accommodate
	  nextlow in Oligoindex_hr procedures

	* Makefile.gsnaptoo.am: Added source files for GMAP

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
	  Updated version number

	* stage1.c, stage1.h: Added procedure for nonstranded alignment. Turning
	  off scan_ends algorithm and using only sampling. Using indexdb size limit
	  for standard mode, but not for cmet-nonstranded mode.

	* stage2.c: Looking up genomic nt for all alignment pairs in
	  convert_to_nucleotides

	* uniqscan.c, gsnap.c: Using new interfaces to setup routines

	* gmap.c: Using different indexdb size thresholds for standard and
	  cmet-nonstranded modes

2012-05-15 twu

	* stage1hr.c: Using new interfaces to stage 2 procedures

	* stage3.c, stage3.h: Handling AMBIGUOUS_COMP the same as MATCH_COMP and
	  DYNPROG_MATCH_COMP. Removed genomic_offset argument from Stage2_compute.
	  Fixed intermediate alignment results for debugging by returning pairs
	  instead of path from path_compute.

	* oligoindex_hr.c, oligoindex_hr.h: Fixed Cmet_reduce commands for
	  CMET_STRANDED mode

	* match.c: Fixed memory leak in a debugging procedure, Match_print

	* gregion.c, gregion.h: Made Gregion_filter_support function available
	  again. Added function Gregion_genestrand.

	* Makefile.dna.am, dynprog.c, dynprog.h, genome.c, genome.h, gmap.c,
	  stage2.c, stage2.h: Added code to make GMAP work on cmet-stranded and
	  cmet-nonstranded modes

2012-05-14 twu

	* stage3hr.c: Disallowing new Stage3pair_T object if its insertlength
	  exceeds pairmax and we expect a concordant pair

	* trunk, config.site.rescomp.tst, src, Makefile.dna.am, block.c, block.h,
	  boyer-moore.c, boyer-moore.h, dynprog.c, dynprog.h, genome-write.c,
	  genome-write.h, genome.c, genome.h, genome_hr.c, genome_hr.h, gmap.c,
	  gsnap.c, intlist.c, intlist.h, oligoindex.c, oligoindex.h,
	  oligoindex_hr.c, oligoindex_hr.h, pair.c, pair.h, splicetrie.c, stage1.c,
	  stage1.h, stage1hr.c, stage2.c, stage2.h, stage3.c, stage3.h, stage3hr.c,
	  stage3hr.h, uniqscan.c, util: Merged revisions 63606 to 64016 from
	  branches/2012-05-08-genomic-nts to read genomic nt rather than generate
	  genomicseg

2012-05-10 twu

	* trunk, config.site.rescomp.tst, archive.html, index.html, src, dynprog.c,
	  util: Merged revisions 63773 through 63823 from
	  branches/2012-05-10-better-affine-gap to make score matrix symmetric, put
	  sequence2 in outer loop, fix boundary conditions, and improve efficiency

	* resulthr.c: Fixed uninitialized value for X2 on halfmapping_mult alignments

	* samprint.c: Fixed uninitialized value for X2 on halfmapping_uniq alignments

2012-05-07 twu

	* gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Made -E flag
	  use exon_number field

	* gtf_genes.pl.in, gtf_introns.pl.in,
	  gtf_splicesites.pl.in: Added -E flag
	  to ignore exon_number fields in GTF file

	* VERSION: Updated version

	* oligoindex_hr.c: Reverting to version that zeroes out counts for oligomers
	  that are overabundant or not in query

	* gmap.c, stage1hr.c, stage2.c, stage2.h: Providing a limit on the number of
	  suboptimal alignments returned from stage 2. Limit set to 2 for GMAP and
	  1 for GSNAP.

	* gsnap.c: Added getopt handler for --sam-multiple-primaries

2012-05-03 twu

	* trunk, VERSION, config.site.rescomp.tst, src, dynprog.c, gmap.c, pair.c,
	  pair.h, stage1hr.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h, util:
	  Merged revisions 63036 to 63240 from branches/2012-05-01-affine-gap to
	  implement an affine gap algorithm for dynamic programming

	* uniqscan.c: Using new interface to Stage1hr procedure that contains
	  first_absmq as an argument

	* shortread.c: Generalized handling of old Illumina paired-end format ending
	  in :0 or :<digit>.

	* genome.c: Fixed function Genome_fill_buffer_simple_alt so it returns
	  ref+alt, instead of empty+alt.

	* indexdb.c: Made writing of offsetscomp file faster when blocksize == 1 (or
	  k == b), by using a single write command instead of looping.

	* goby.c, outbuffer.c: Implemented patches for Goby 2.0

	* gmap.c, gsnap.c, outbuffer.c, pair.c, pair.h, result.c, result.h,
	  resulthr.c, resulthr.h, samprint.c, samprint.h, stage1hr.c, stage1hr.h,
	  stage3.c, stage3.h, stage3hr.c, stage3hr.h: Added flag
	  --sam-multiple-primaries to allow multiple alignments to be marked as
	  primary, if their mapping scores are equally good

	* shortread.c: Handling older style Illumina paired-end reads that end in
	  ":0"

2012-05-02 twu

	* stage3.c: Fixed debugging statements

	* oligoindex_hr.c: Not zeroing out counts[i] for oligomers that are either
	  overabundant or not in query. Saves time in allocate_positions function
	  and in store functions.

2012-04-27 twu

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
	  Updated version

	* gmap.c: Using new interface to Stage3_compute

	* stage3hr.c, stage3hr.h: Using nmatches_posttrim (adjusted by scores for
	  indels) to compare alignments, except for pre-final terminals. Performing
	  iteration in optimal_score procedures to use updated trim boundaries as
	  poor alignments are removed.

	* stage3.h: No longer use nmatches_pretrim in Stage3_compute

	* stage3.c: Removed use of QUERYEND_INDELS at ends of reads, and using
	  QUERYEND_NOGAPS instead, to reduce time spent in dynamic programming

	* stage1hr.c: Added usedp field to Segment_T object, and marking it true if
	  the segment is used in making a Stage3end_T object. Skipping these
	  segments in finding terminals.

	* pair.c, pair.h: Added pos5 and pos3 arguments to Pair_nmatches

	* genome.c: Made uncompress_mmap faster by translating high and low words 16
	  bits at a time. Adding N's only if flags is not zero.

	* dynprog.c: Modified bridge_intron_gap so it searches for indels on left or
	  right of splice, but not both. Changes algorithm from quadratic to linear
	  time.

2012-04-25 twu

	* stage1hr.c, stage3hr.c, stage3hr.h: Revisions made to hit_goodness_cmp and
	  hitpair_goodness_cmp. Using genomiclength in hit_goodness_cmp, final
	  round. For terminals, returning 0 in preliminary rounds. Not using
	  scores for terminals, and using scores only if GMAP or indels are
	  involved. In pair_up procedure, not using terminal scores to update
	  found_score.

	* indexdb.c: Fixed uninitialized variables that caused problems with older
	  GMAP indices

2012-04-23 twu

	* dynprog.c: Created separate functions, compute_scores_lookup_fwd and
	  compute_scores_lookup_rev

	* splicetrie.c: Fixed calls to Dynprog_end5_gap and Dynprog_end3_gap to use
	  endalign instead of to_queryend_p.

2012-04-20 twu

	* VERSION: Updated version

	* stage3hr.c: Added check for too many indel breaks in GMAP

	* stage1hr.c, stage3hr.c, stage3hr.h: Storing cdna_direction in hit of type
	  GMAP and using it instead of sensedir when printing

2012-04-19 twu

	* VERSION, index.html: Updated version number

	* stage3.c: Turned off debugging

	* stage3.c: Fixed uninitialized variable in trim_novel_spliceends

	* dynprog.c, dynprog.h, stage3.c: Using correct endalign types in
	  Dynprog_end5_known and Dynprog_end3_known

	* pair.c: Fixed issue with uninitialized variable in printing splicesite
	  labels

2012-04-18 twu

	* result.c, result.h: Added mergedp variable

	* outbuffer.c, outbuffer.h: Handling results where mergedp is true

	* gmap.c: Allowing chimera finding to be turned off by setting -x to be 0.
	  Added mergedp variable so merged alignments generate only a single result.

2012-04-16 twu

	* VERSION, config.site.rescomp.tst, index.html: Updated version

	* samprint.c: Handling case where clip-overlap results in a NULL substring

	* gmap.c: In call to Stage3_merge_local_single, clipping parts around
	  breakpoint instead of chimerapos and chimeraequivpos, to avoid issues
	  where maxpeelback is insufficient

	* stage3.c, stage3.h: Renamed variable from extendp to max_extend_p

	* get-genome.c: Added --forsam flag to generate header for SAM files

2012-04-10 twu

	* VERSION: Updated version

	* stage1hr.c: Fixed bug from uninitialized variable

	* gmap.c, stage3.c, stage3.h: Added a criterion for extending left and right
	  chimera ends to consecutive mismatches, based on queryjump and genomejump
	  being unequal.

2012-04-09 twu

	* gsnap.c, uniqscan.c: Using new interface to Stage3hr_setup

	* stage1hr.c: Finding terminals by a new method. Instead of counting
	  mismatches from end, requiring only that querypos3 - querypos5 is greater
	  than index1part. Now searching terminals on single-end reads even if a
	  GSNAP alignment has been found. Removed nsalvage == 0 requirement for
	  searching terminals and paired-end reads.

	* substring.c, substring.h: Added procedure Substring_set_endtypes

	* stage3hr.c, stage3hr.h: Changed optimal score procedures to use max of
	  max-terminal and min-other for prefinal rounds, and max of max-GMAP and
	  min-other for final rounds. For GMAP eventrim scores, not counting
	  indels, and adding a penalty for long ambiguous ends, by dividing by
	  index1part + (index1interval - 1). Terminal alignments now compute their
	  own endtypes.

2012-04-06 twu

	* stage2.c: Fixed fatal bug when looking for shifted canonical splice site
	  by checking that rightpos is less than genomicend.

	* gmap.c, stage3.c, stage3.h: For chimeras, extending ends until three
	  consecutive mismatches are found. At final breakpoint, cleaning indels
	  from ends.

2012-04-04 twu

	* stage1hr.c: For single-end reads, finding distant splicing only when no
	  other hits have been found

	* VERSION, index.html: Updated version

	* pair.c, pair.h, stage3.c: Fixed bug in Stage3_mergeable where we require
	  end1 and start2 pairs to be computed

	* splicetrie.c: Allowing 1 mismatch in distal exon

	* stage3hr.c: Changed debugging statement to report score_eventrim

	* stage3.c: Rewrite trim_novel_spliceends to scan pairs first to find
	  genomicpos bounds and then iterate through genomicpos. Allowing
	  pick_cdna_direction to return SENSE_NULL if no introns exist.

	* stage1hr.c: For GMAP terminals, also checking for a bad stretch in GMAP
	  result after the call to align_halfmapping_with_gmap

	* stage3.c: Fixed bug in trim_novel_spliceends when pairs is NULL

2012-04-03 twu

	* stage3.c: Turned off trimming of novel splice ends for GMAP

	* index.html: Made changes to reflect new version

	* VERSION: Updated version

	* stage3hr.c: Fixed issue where no non-terminal alignment existed, resulting
	  in using min trim length of MAX_READLENGTH

	* splicetrie.c: Allowing only 1 mismatch (previously 2) in internal splice
	  region of 6 bp, and no mismatches in external splice region (previously
	  depended on extension). This avoids bad splicing due to poor gene models.

	* pair.c, pair.h, stage3.c: Using pairs instead of pairarray in determining
	  whether chimera ends are connectable

	* pair.c, pair.h, stage3hr.c: Counting indels in GMAP alignments only within
	  eventrim region

	* stage3.c: Added function trim_novel_spliceends

	* pair.c: Requiring that cdna_direction not be zero when printing splice
	  site probabilities at the ends

2012-04-02 twu

	* trunk, VERSION, src, chimera.c, chimera.h, gmap.c, outbuffer.c, pair.c,
	  pair.h, pairpool.c, pairpool.h, stage1hr.c, stage3.c, stage3.h,
	  stage3hr.c, stage3hr.h, util: Merged revisions 60621 to 60936 from
	  branches/2012-03-27-gmap-chimeras to improve GMAP chimeras and to apply a
	  uniform eventrim procedure in stage 3 optimal score procedures

2012-03-30 twu

	* gamma-speed-test.c: Added to SVN repository

	* config.site.rescomp.tst, configure.ac, memory-check.pl, atoiindex.c,
	  cmetindex.c, gmap.c, gsnap.c, indexdb.c, pdldata.c, snpindex.c: Added
	  --enable-mmap flag to configure. Added small fixes to allow programs to
	  work without mmap.

2012-03-29 twu

	* stage3hr.c: Allowing trimming on both ends of a terminal alignment

	* pair.c: Handling case where hard clipping is not possible

2012-03-27 twu

	* stage1hr.c: Handling the case where GMAP alignment is attempted on a
	  translocation

2012-03-23 twu

	* VERSION, index.html: Updated version

	* stage1hr.c: Added multiple checks for GMAP bad stretch

2012-03-22 twu

	* index.html: Updated for latest version

	* stage1hr.c: Fixing fatal bug when max_end_insertions is set to less than 3

2012-03-21 twu

	* VERSION: Updated version

	* index.html, archive.html: Updated for new version

	* gsnap.c: Reduced default value of distant-splice-penalty from 2 to 1

	* gsnap.c: Reduced default value of distant-splice-penalty from 3 to 2

	* stage1hr.c: Changed criterion for GMAP salvage from
	  Stage3end_bad_stretch_p to Stage3end_score > cutoff_level, because
	  previous criterion caused GSNAP to miss distant splicing

	* dbsnp_iit.pl.in: Checking whether the exceptions field is defined in the
	  snp file

2012-03-20 twu

	* gmap.c, outbuffer.c, stage3.c, stage3.h: Removed some unused parameters

	* VERSION: Updated version

	* stage3hr.c: Changed score for GMAP alignments to be post-trim matches
	  minus penalties for splicing and indels. Allowing Stage3end_bad_stretch_p
	  to handle GMAP alignments.

	* stage3.c: Not doing any peelback on the extension after trim_ends

	* stage1hr.c: Moved check for GMAP bad stretch from elimination as a very
	  bad alignment to a salvage status

	* dynprog.h: Added comment

2012-03-19 twu

	* pair.c, pair.h, stage3.c: Added an extend step after the ends are trimmed,
	  to get as long an extension as possible.

	* stage1hr.c, stage3.c, stage3.h: Added a check for GMAP alignment length,
	  in addition to the bad stretch check

	* gmap.c: Using new interface to Stage3_compute

	* substring.c: Added comment

	* stage1hr.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h: Removed
	  computation of gmap_nconsecutive, and implemented Stage3_bad_stretch_p to
	  evaluate GMAP alignments instead.

2012-03-16 twu

	* gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Filtering
	  out comment lines beginning with '#'.

	* gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Printing
	  results for last mRNA in each gene

2012-03-15 twu

	* stage3.c: In traversing dual break, accepting the stage 2 solution only if
	  the entire query sequence is bridged

	* gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in,
	  gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Made gff3 and
	  gtf programs handle exons in arbitrary order

	* gtf_genes.pl.in, gtf_introns.pl.in: Checking for comment lines

	* gtf_genes.pl.in, gtf_introns.pl.in: Allowing GTF file to lack exon_number
	  field

	* gtf_genes.pl.in: Checking both gene_name and gene_id to get gene name

2012-03-13 twu

	* gmap.c, inbuffer.c, inbuffer.h: Added -1 flag for self-align feature

2012-03-12 twu

	* configure.ac, README, config.site: Changed default value of MAX_READLENGTH
	  from 200 to 250

	* stage1hr.c: Added missing else statement

	* dynprog.c, splicetrie.c, splicetrie.h: Changed Splicetrie_solve_end5 and
	  Splicetrie_solve_end3 to take triecontents, trieoffsets, and j, rather
	  than triestart, and to check for a null pointer.

	* stage3hr.c: No longer returning NULL from Stage3end_new_gmap when result
	  crosses before beginning of genome or at end of chromosome.

	* stage1hr.c: Removed possibility of dereferencing uninitialized memory when
	  skipping over diagonals straddling beginning of genome.

2012-03-09 twu

	* gmap.c, stage3.c, stage3.h: Made changes in Stage3_merge_single compatible
	  with PMAP

	* Makefile.dna.am, gsnap_fasta.c: Moved gsnap_fasta.c and bam_fasta and
	  sam_fasta programs to GSTRUCT repository

	* get-genome.c: Fixed -E option for printing exons from gene map files.
	  Added -S option for printing sequence from gene map files.

	* get-genome.c: Removed R from getopt string

	* get-genome.c: Removed -R flag and references to map_relativep

2012-03-08 twu

	* gmap.c: Removed debugging statement

	* gmap.c: Fixed test for CHIMERA_SLOP to handle minus strand alignments
	  correctly. Added a remove duplicates step to stage3list_from_gregions.
	  Using new interface to Stage3_merge_single and Stage3_merge_splice.

	* stage3.c, stage3.h: Implemented functions Stage3_merge_single and
	  Stage3_merge_splice. The first function uses dynamic programming to solve
	  the region between the two parts.

	* pair.c, pair.h: Implemented function Pair_set_genomepos_list

2012-03-07 twu

	* gsnap.c: Fixed default values in --help statement for number of GMAP runs
	  allowed

	* stage1hr.c: Turned bad stretchp back on as a criterion for running GMAP
	  improvement in paired-end reads. Implemented GMAP improvement for
	  single-end reads.

	* substring.c, substring.h: Implemented Substring_genestrand

	* stage3hr.c, stage3hr.h: Added genestrand as a field for Stage3end_T object

2012-03-02 twu

	* stage3.c: Added another check to make sure we don't try to solve for dual
	  breaks at the ends of an alignment

2012-03-01 twu

	* dynprog.c: Making certain that left_prob and right_prob are initialized to
	  0.0

	* splicetrie_build.c: Limiting distance for splicetries to shortsplicedist

	* stage3.c: Checking for endgappairs == NULL, before trying to access
	  *endgappairs

2012-02-29 twu

	* configure.ac: Changed default .config.site to ./config.site

	* dynprog.c: Fixed cases where left_prob and right_prob were not assigned

	* stage3.c: Making singlep a local variable in traverse_dual_genome_gap

2012-02-28 twu

	* stage3.c: Setting *singlep to false to fix bug in traversing dual genome
	  gaps where left goodness or right goodness was called after a dual gap win.

	* gmap.c: Calling Alphabet_setup, Oligo_setup, Oligop_setup, Indexdb_setup,
	  and Stage1_setup only when genome is provided

2012-02-27 twu

	* gmap.c, stage3.c, stage3.h: Restored cdna_direction == 0 when no splices
	  are present. Transferring overall cdna_direction to first Stage3_T object
	  in a chimera.

	* stage3.c: Disallowing cdna_direction to be set to 0

	* gmap.c: Added call to Oligo_setup

	* indexdb.c: Looking for offsetscomp_suffix exactly

	* gmap.c: Using new interface to Stage3_compute

2012-02-24 twu

	* VERSION: Updated version number

	* stage3.c: Handling case where non-canonical splice is exactly in the
	  middle of the read

	* stage1hr.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h: Replaced
	  high_quality computation on gmap alignments with gmap_nconsecutive

	* outbuffer.c, gsnap.c, goby.c, goby.h: Implemented new Goby code

	* gtf_splicesites.pl.in: Skipping comment lines that begin with '#'.

	* gtf_splicesites.pl.in: Removed requirement for exon_number field in GTF
	  file

	* gmap_build.pl.in: Added -T flag to specify temporary build directory

	* gmap_build.pl.in: Deleting .coords file

2012-02-14 twu

	* gmap_build.pl.in: Fixed filenames for 1-digit values

	* oligo.c: Added repetitive oligos for 6-, 7-, and 8-mers.

	* indexdb.c: Improved warning message

	* gmap_build.pl.in: Made default value for base size to be kmer size

	* configure.ac: Moved read of config.site and setting of CFLAGS earlier, so
	  default CFLAGS is not set by autoconf.

	* setup1.test.in, Makefile.am, setup.ref12123positions.ok,
	  setup.ref123positions.ok: Changed name of positions file to reflect new
	  naming scheme

	* README: Added comment about new naming for positions file

	* VERSION: Updated version

	* config.site.rescomp.prd, config.site.rescomp.tst: Changed dates

	* configure.ac: Adding check for CFLAGS and setting default to be -O3

	* config.site: Commenting out default CFLAGS variable

	* README: Added comment about adding -m64 in CFLAGS for Macintosh machines

	* gsnap.c, stage1hr.c, stage1hr.h: Added control of gmap_indel_knownsplice
	  feature to gsnap program

	* stage3hr.c, stage3hr.h: Added function to help run GMAP on indels to find
	  known splicing

	* stage1hr.c: Added function to run GMAP on indels to find known splicing

	* stage1.c: Using new interface to Reader_new

	* pmapindex.c: Removed -R flag for processing reverse strand of genome.
	  Consolidated code for computing offsets and positions.

	* pair.c: Eliminated extra points subtracted from Pair_nmatches, so that the
	  function reports the correct number of matches

	* dynprog.c: Fixed bug where known splicing was being called when length2
	  was 0, resulting in bad endpoints for binary search

2012-02-03 twu

	* stage1hr.c: Using spansize in computing floors. Reduced value of
	  STAGE2_MIN_OLIGO from 5 to 3.

	* gsnap.c, uniqscan.c: Using new interface to Stage1hr_setup and
	  Spanningelt_setup

	* reader.c, reader.h: Removed blocksize as a field for Reader_T object

	* oligo.c, oligo.h: Storing oligosize as a static variable, and not using it
	  anymore as a parameter to Oligo_next or Oligo_skip.

	* block.c: Removed oligosize as a field from Block_T object

	* stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Revised
	  Stage3end_optimal_score to compare terminal and non-terminal alignments
	  using an eventrim procedure, based on the maximal trims found in the
	  non-terminal alignments, and re-computing the mismatch scores in those
	  regions.

	* stage1hr.c: Running Stage3end_remove_overlaps to remove terminals that
	  overlap with a GSNAP alignment. Using spansize to compute fast_level.
	  Removed index1part (oligosize) from calls to Reader_new and Oligo_next.

2012-02-02 twu

	* stage1hr.c, stage1hr.h: Generalized from 6 to STAGE2_MIN_OLIGO +
	  index1interval, for deciding whether a found diagonal is close enough to
	  end to redefine region for GMAP alignment. Using spansize to compute
	  floors, and counting all mismatches if spansize != index1part.

	* spanningelt.c, spanningelt.h: Added function Spanningelt_setup, which
	  computes spansize for a given index1part and index1interval

	* pairdef.h, pairpool.c, stage3.c: Added field end_intron_p to Pair_T
	  object, and using it in traversing dual genome gaps. When applicable,
	  solving introns where left intron or right intron is omitted.

	* stage1hr.c: Improved debugging messages by printing chromosomal
	  coordinates rather than universal coordinates

	* trunk, config.site.rescomp.prd, config.site.rescomp.tst, memory-check.pl,
	  src, Makefile.dna.am, atoiindex.c, cmetindex.c, genomicpos.c,
	  genomicpos.h, gmapindex.c, indexdb.c, indexdb_hr.c, pmapindex.c,
	  snpindex.c, spanningelt.c, spanningelt.h, stage1hr.c, types.h, util,
	  gmap_build.pl.in, gmap_setup.pl.in: Merged revisions 56772 to 56962 from
	  branches/2012-01-31-index1interval to allow genomic indices up to 16-mers
	  and sampling intervals from 1 to 3

2012-02-01 twu

	* resulthr.c, stage3hr.c, stage3hr.h: If anomalous splices (of type
	  SAMECHR_SPLICE) occur within a read for a unique mapping, changing
	  resulttype from _UNIQ to _TRANSLOC.

	* outbuffer.c: Fixed issue with double opening of nomapping output file,
	  first as single and then as paired

2012-01-31 twu

	* gmapindex.c: Allowing only values 1, 2, and 3 for sampling interval

2012-01-30 twu

	* gmap.c, gsnap.c, indexdb.c, indexdb.h, stage1hr.c, stage1hr.h, uniqscan.c:
	  Added --sampling flag and passing index1interval to Stage1hr_setup

	* uniqscan.c: Using new interface to Indexdb_new_genome

	* indexdb.c: Fixed incorrect variable name (oligo instead of oligoi).

	* gmap_build.pl.in: Removed fixed value of -q 3 in calls to gmapindex

	* gmap_build.pl.in: Added -q flag for sampling interval

	* gmap.c, gmapindex.c, gsnap.c: Allowing user to enter -k 16

	* block.c, indexdb_dump.c, indexdb_hr.c, stage1.c, stage1hr.c: Fixed mask
	  calculation to use unsigned long, so kmer of 16 works

	* atoiindex.c, cmetindex.c, indexdb.c, indexdb.h, snpindex.c: Enabled
	  writing of genome indices with kmer of 16

	* oligo.c: Allowing k-mer size of 16

	* genome_hr.c: Clarified types of gamma and value to be Positionsptr_T and
	  firstbit to be int.

2012-01-27 twu

	* oligo.c: Allowing k-mer size to go down to 9

	* indexdb_hr.c, indexdbdef.h, indexdb.c: Eliminated separate data storage
	  for offsets when expand_offsets_p is true. Instead, expanding gammas into
	  offsetscomp and making gammaptrs just the identity function.

2012-01-19 twu

	* indexdb.c, indexdb.h: Allowing selection of base size, and returning found
	  base size. Simplified logic for selecting indexdb.

	* gmap.c, gsnap.c: Added --basesize flag to allow user to select base size

	* gmapindex.c: Allowing k-mer size to be 15 or less

2012-01-18 twu

	* gmapindex.c: Added break after handling flag -b

2012-01-13 twu

	* VERSION: Updated version number

	* Makefile.three.am: Rebuilt instructions from Makefile.gsnaptoo.am, plus
	  Makefile.dna.am for PMAP and PMAPINDEX

	* Makefile.gsnaptoo.am: Removed gsnap_tally from distribution

	* README: Changed URL for Web site

	* alphabet.c: No longer checking new get_codon procedures against old ones

	* pmapindex.c: Revised limit for 8-mers to be alphabet size of 13 or less

	* README.PMAP: Added a README file for PMAP

	* indexdb.c, indexdb.h: Providing alphabet and alphabet_size to caller of
	  Indexdb_new_genome

	* gmap.c: Added --alphabet flag to PMAP to specify a particular alphabet

	* atoiindex.c, cmetindex.c, genome_hr.h, snpindex.c, types.h: Moved
	  definition of Storedoligomer_T from indexdbdef.h to types.h

	* block.c, block.h, stage1.c: Removed msb computation and storage in Block_T
	  object, and initializing instead by Oligop_setup

	* alphabet.c, alphabet.h, oligop.c, oligop.h: Providing aa_index_table to
	  Oligop procedures at run time

	* Makefile.dna.am, alphabet.c, alphabet.h, indexdb.c, indexdb.h,
	  indexdbdef.h, pmapindex.c: Created Alphabet_T object. Moved relevant
	  codon-based procedures from indexdb.c to alphabet.c. Replaced
	  get_codon_fwd and get_codon_rev with lookup tables. Allowed pmapindex to
	  generate indices with different alphabets and alphabet sizes.

	* gmapindex.c: Added check for k-mer size being greater than or equal to
	  base size

	* gmap.c: Made default k-mer size 7 for PMAP

	* Makefile.dna.am, indexdb.c, pmapindex.c: Made PMAP index files compatible
	  with compressed hash tables and added flags to be consistent with other
	  auxiliary indexing programs

2012-01-11 twu

	* VERSION: Updated version

	* index.html: Updated Web page for latest version

	* mapq.c: Fixed debugging statement

	* stage3hr.c: In output_cmp procedures, sorting first by nmatches (pretrim),
	  and then by mapq. Added procedure to enforce monotonicity of mapq scores.

	* stage1hr.c, stage3hr.c, stage3hr.h: Running optimal_score first allowing
	  all GMAP alignments, then removing overlaps, and then running
	  optimal_score again without special provision for GMAP alignments.

	* gmap.c, gsnap.c, pair.c, pair.h, uniqscan.c: Added flag --sam-use-0M to
	  control printing of 0M in CIGAR strings

	* gmap.c, gsnap.c, stage3.h, uniqscan.c: Providing output_sam_p to
	  Stage3_setup

	* stage3.c: In fill_in_gaps, handling dual breaks by inserting query and
	  genomic segments when output_sam_p is true.

	* pair.c: In compute_md_string, setting state to be IN_MATCHES after seeing
	  I token, so we don't print two successive ^ tokens.

	* stage3.c: Added a pass to remove adjacent insertions and deletions and fix
	  single gaps again.

2012-01-10 twu

	* gmap_build.pl.in: Doing chmod on gammaptrs only if kmersize > basesize

2012-01-09 twu

	* gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h: Computing GMAP alignment
	  score using standard splicing penalties and indel penalties

2012-01-06 twu

	* README: Added information about maximum genome length. Added information
	  about SAM fields.

	* atoiindex.c, cmetindex.c: Eliminated memory leaks and reduced memory usage
	  from 20 GB to 12 GB for a human-sized genome

	* gmap.c, gsnap.c, uniqscan.c: Using new interfaces to Dynprog_setup and
	  Stage3_setup to provide information about novelsplicingp. Passing
	  fileroot instead of dbroot to Datadir_find_mapdir.

	* stage3hr.c: Stage3end_optimal_score and Stage3pair_optimal_score now
	  keeping all GMAP results

	* stage3.c, stage3.h: Marking pairs as disallowed when novelsplicingp is
	  false and dynamic programming cannot find a solution provided by the
	  splicing_iit file. Trimming end introns that are disallowed.

	* stage1hr.c: Added debugging message

	* dynprog.c, dynprog.h: The bridge_intron_gap function now handling runs
	  where novelsplicingp is false and known splicing is at intron level.
	  Dynprog_genome_gap returning NULL in such cases.

	* pair.c, pairdef.h, pairpool.c: Added field disallowedp to Pair_T object

	* iit-read.c, iit-read.h: Added functions IIT_low_exists and
	  IIT_high_exists, used for intron-level known splicing. Improved warning
	  message for invalid IIT files.

2012-01-05 twu

	* gmapindex.c: Added check for total genomic length exceeding 4 GB

	* snpindex.c: Reduced maximum memory usage for human genome from 12 GB to 8
	  GB. Eliminated memory leaks.

2012-01-03 twu

	* trunk, src, util: Merged revisions 50470 to 50909 from
	  branches/gmap-2011-10-24-mult-stage2

	* config.site.rescomp.tst, config.site.rescomp.prd: Changed date

2011-12-28 twu

	* README: Changed comment about splicing file with known introns possibly
	  being buggy

	* VERSION: Updated version number

	* index.html: Added changes for new version

	* iit-read.c: If add_iit_p is true, trying filename with .iit suffix first
	  before trying filename as given

	* stage3hr.c: Added check for low coordinate of new GMAP object being to the
	  left of coordinate 0

2011-12-21 twu

	* shortread.c: Allowing for /3 endings in second end of Illumina short reads

2011-12-13 twu

	* index.html: Made update for new version

	* VERSION: Updated version number

	* stage3.c: Fixed issue in trying to solve dual introns with negative query
	  coordinates

2011-12-09 twu

	* spanningelt.c, spanningelt.h, stage1hr.c: Alternate fix to problem with
	  positions being NULL, while npositions > 0. Fixed Spanningelt_set so it
	  updates npositions along with positions.

	* VERSION: Updated version number

	* archive.html: Moved old version here

	* index.html: Added new version. Added information about users group.

	* stage1hr.c: Added an extra check for positions[querypos] not being NULL,
	  needed for nonstranded modes

2011-12-07 twu

	* gmap_build.pl.in: Fixed bug in not assigning cmd variable

2011-12-05 twu

	* shortread.c: Fixed bug in printing null accession for second end when
	  using --fails-as-input flag

	* gmap_build.pl.in: Checking return codes from system calls

2011-12-02 twu

	* VERSION, index.html: Updated version number

	* gsnap.c: Changed name of flag from --ambig-splice-notrim to
	  --ambig-splice-noclip

	* pair.c: Pushing "0M" between adjacent I and D operations in a cigar string

	* gmap.c, gsnap.c, uniqscan.c: Using new interface to Splicetrie_setup.
	  Providing --ambig-splice-notrim flag in GSNAP.

	* splicetrie.c, splicetrie.h: Providing behavior to turn clipping off at
	  ambiguous known splice sites, useful if trying to turn off all soft
	  clipping

	* stage1hr.c: Fixed debugging statement

2011-11-30 twu

	* stage1.c: Fixed variable name so PMAP could compile

2011-11-29 twu

	* dynprog.c, maxent_hr.c, maxent_hr.h, pair.c, stage1hr.c, stage3.c,
	  substring.c: Checking for case where splice_pos minus margin goes beyond
	  beginning of chromosome

	* VERSION, index.html: Updated version number

	* maxent_hr.c: Checking for case where splice_pos is smaller than margin

	* pair.c, samprint.c: Changed value of NM tag in SAM output to be edit
	  distance (mismatches plus gaps)

	* stage1hr.c: Fixed debugging statement

2011-11-27 twu

	* VERSION, index.html: Updated version number

	* gsnap.c: Calling SAM_setup

	* outbuffer.c: For GMAP, when quiet_if_excessive_p is true and npaths >
	  maxpaths, not printing any output

	* pair.c, pair.h, stage3.c: Printing HI
	  tag in SAM output

	* samprint.c, samprint.h: When quiet_if_excessive_p is true and npaths_mate
	  > maxpaths, setting MATE_UNMAPPED in flag. Printing HI tag. Added
	  function SAM_setup.

2011-11-25 twu

	* VERSION, index.html: Updated version number

	* stage3.c: For GMAP, using QUERYEND_INDELS instead of QUERYEND_NOGAPS

	* stage1.c: Added check for querylength being too short

	* gmap.c: Changed information in --help about how to turn off chimera
	  detection

	* stage3.c: In Stage3_mergeable, added check for firstpart_npairs or
	  secondpart_npairs being 0

2011-11-23 twu

	* index.html, VERSION: Updated version number

	* stage1.c: Made direct calls to fields in Match_T objects for speed. Made
	  changes so debugging macros work.

	* dynprog.c: Made special traceback procedure for queryend_nogaps

2011-11-21 twu

	* stage3hr.c: Fixed bug where Stage3end_remove_overlaps was not keeping ends
	  where paired_usedp was true. In Stage3end_remove_overlaps and
	  Stage3pair_remove_overlaps, terminal alignments lose to all other types.

	* stage3hr.h: Put GMAP hittype before TERMINAL hittype

	* gsnap.c: Increased default values for max_gmap_pairsearch,
	  max_gmap_terminal, and max_gmap_improvement

	* stage3hr.c: Changed code for Stage3end_remove_overlaps to parallel that
	  for Stage3pair_remove_overlaps

	* samprint.c: Allowing for GMAP alignments to be printed for single-end reads

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
	  number

	* stage3.c: Added comment about pass 8 and 9

	* index.html: Made changes for 2011-11-20 version

	* uniqscan.c: Using new interface to Stage1_single_read

	* stage3hr.c: Changed the code for Stage3end_optimal_score to be consistent
	  with that of Stage3pair_optimal_score. Allowing terminal alignments to be
	  considered along with other alignments.

	* stage1hr.c: For single-end alignment, initializing done_level to
	  user_maxlevel, rather than opt_level, to be consistent with the code for
	  paired-end alignment. However, the values are the same anyway.

	* gsnap.c: Changed the default value for terminal_threshold from 3 to 2.
	  Expanded the --help entry for --terminal-threshold.

	* index.html: Made changes for version 2011-11-17

2011-11-20 twu

	* stage3.c: Added more restrictions on non-canonical splices at ends of read

	* stage3hr.c, stage3hr.h: Added functions Stage1end_start_endtype and
	  Stage1end_end_endtype for terminal alignments

	* gsnap.c: Using new interface to Stage1_single_read

	* gmap.c, stage3.c, stage3.h: Changed name of variable

	* stage1hr.c, stage1hr.h: Implemented GMAP terminal mode for single-end reads

	* dynprog.c: Lowered rewards for canonical introns, to help find
	  non-canonical introns

	* stage3.c: In fill_in_gaps, if splicingp is false, then filling in a
	  deletion, not an intron

2011-11-19 twu

	* stage3.c: Moved trim_noncanonical procedures from path_trim to
	  path_compute. After trim_noncanonical procedures, doing an extension using
	  QUERYEND_NOGAPS.

	* stage1hr.c: Using -3*nmismatches and -4 for an indel in evaluating end
	  indels, corresponding to default values for trim_mismatch_score and
	  trim_indel_score.

2011-11-18 twu

	* stage3.c: Using endalign instead of to_queryend_p. When endalign is
	  QUERYEND_NOGAPS, not doing peelback. In pass 8, changed extendp from false
	  to true (QUERYEND_INDELS).

	* gsnap.c, uniqscan.c: Added flag --trim-indel-score

	* dynprog.c, dynprog.h: Replaced to_queryend_p with endalign, with types
	  BEST_LOCAL, QUERYEND_INDELS, and QUERYEND_NOGAPS. For extensions to
	  queryend, always returning gappairs.

	* atoiindex.c, cmetindex.c: Updated --help to indicate how --kmer and -D are
	  chosen by default

2011-11-17 twu

	* VERSION: Updated version number

	* gmap.c, gsnap.c, stage1hr.c, stage3.c, stage3.h, uniqscan.c: Revised
	  calculation of insertlength inside trim_noncanonical_ends procedures to
	  compensate for using maximum overlap in computing genomicseg.

	* stage3.c: Made fill_in_gaps procedure replace short non-canonical introns
	  with deletions. Removed this feature from assign_gap_types. Added
	  additional checks on translation coordinates to stay within array bounds.

	* dynprog.c: Added debugging statements

	* substring.c: Added comment

	* shortread.c, shortread.h, stage1hr.c: For GMAP algorithm in GSNAP,
	  assuming maximum overlap, rather than trying to compute overlap

	* outbuffer.c: Printing all paths when -E flag is given to GMAP

	* stage3hr.c: Improved error message when ambig end splicetype has an
	  unexpected value

	* dynprog.c: Removed unused code for END_KNOWNSPLICING_SHORTCUT. Always
	  assigning *ambig_end_length in Dynprog_end5_known and Dynprog_end3_known.

2011-11-14 twu

	* uniqscan.c: Using new interface to Stage3hr_setup

	* stage3.c: Not substituting for long deletions at end of path_compute

	* stage1hr.c, stage3hr.c, stage3hr.h: Removed end_indel_p parameters to
	  Stage3end_new_insertion and Stage3end_new_deletion

	* stage1hr.c: Preventing call to Genome_count_mismatches_limit by
	  find_doublesplices where pos5 >= pos3

	* VERSION: Updated version number

	* configure.ac: Added flag --enable-popcnt

	* dynprog.c: Initializing value of ambig_end_length

	* stage3.c: Skipping gap pairs at the beginning of alignments in fill_in_gaps

	* goby.c: Using new interface to Result_array

2011-11-12 twu

	* VERSION: Updated version number

	* stage1hr.c, stage3hr.c: Checking in Stage3end_new_gmap if genomicstart or
	  genomicend exceeds chrhigh, and if so, returning NULL.

2011-11-11 twu

	* stage3.c: Moved assigning of gap types to end of path_compute, rather than
	  beginning of path_trim

	* stage3.c: Inserting gap pairs and adding gap types at beginning of
	  path_trim

	* gsnap.c, outbuffer.c, outbuffer.h, samprint.c, samprint.h, stage1hr.c,
	  stage3hr.c, stage3hr.h: Divided DISTANT_SPLICE type into SAMECHR_SPLICE and
	  TRANSLOC_SPLICE. Making merge_samechr_p act only at print time, which
	  allows SAMECHR_SPLICE to undergo pair_up_concordant again. Stopping
	  clip-overlap on distant splices.

2011-11-10 twu

	* VERSION: Updated version number

	* stage3hr.c: Keeping hits in optimal scoring if nmatches_posttrim is
	  sufficiently high relative to the best hit

	* gsnap.c, uniqscan.c: Increased value of max_deletionlength from 30 to 50

	* gmap.c, stage1hr.c: Using new interface to Stage3_compute

	* stage3.c, stage3.h: Changed flow to path_compute on both cdna directions,
	  then pick_cdna_direction, then path_trim, which removes non-canonical end
	  exons.

	* stage3hr.c, stage3hr.h, substring.c, substring.h: Added nmatches_posttrim
	  and using it to break ties resulting from equal values of nmatches. Added
	  a general test for Substring_new based on matches and mismatches before
	  trimming.

	* pair.c: Modified Pair_nmatches to add penalties for an indel and for a
	  splice site with low probabilities. Modified compute_md_string on I
	  tokens to skip only for insertion pairs.

	* dynprog.c: Fixed procedure find_best_endpoint_to_queryend to look only at
	  r == length1

	* stage3hr.c: Turned off separate treatment of terminal alignments in
	  Stage3pair_optimal_score. Always trimming ends of insertions and
	  deletions (previously depended on value of end_indel_p).

2011-11-09 twu

	* stage3.c: Using a sliding scale in trimming end exons

	* pair.c: Added debugging macro for compute_md_string

	* pair.c, pair.h: Taking cdna_direction as a parameter in
	  Pair_print_exonsummary, instead of determining sense separately for each
	  intron

	* stage3.c: Restored alignment score in pick_cdna_direction with different
	  values of significant difference for GMAP and GSNAP. Considering a
	  non-canonical intron as canonical if both splice site probabilities are
	  high.

	* splicing-score.c: Fixed getopt so -D takes an argument

	* pairpool.c, pairpool.h: Revised Pairpool_count_bounded to return number of
	  pairs at start.

	* gmap.c: Added comments

	* stage3.c: Restored use of alignment scores in pick_cdna_direction, after
	  comparing number of noncanonical splices

2011-11-08 twu

	* VERSION: Updated version number

	* shortread.c: No longer printing warning message about not finding "/1" or
	  "/2" endings

	* gsnap.c, uniqscan.c: Added max_deletionlength variable

	* gmap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, samprint.c: No longer
	  converting short noncanonical splices from type N to type D, since this is
	  now performed in stage 3. Removed cigar_noncanonical_splices_p variable.

	* stage3.c, stage3.h: In assign_gap_types, converting noncanonical splices
	  smaller than max_deletionlength into deletions

	* gsnap.c, stage3hr.c, stage3hr.h, uniqscan.c: Treating distant splices on
	  the same chromosome as a translocation by default. Added a flag
	  --merge-distant-samechr to get previous behavior.

	* stage3.c: Using maxintronlen in trimming end exons

	* stage3.c: Removed alignment scores from pick_cdna_direction. Revised
	  procedures for trimming noncanonical end exons and doing distal/medial
	  comparison by adding procedures canonicalp and good_end_intron_p, with
	  latter using probabilities.

	* gmap.c: Increased parameter for maxpeelback_distalmedial from 24 to 100

	* dynprog.c: Added debugging statements

2011-11-07 twu

	* pair.c: Computing MD string from cigar tokens

	* stage2.c: Restored querydist penalty

	* VERSION: Revised version number

	* index.html: Added entry for new version

	* stage1hr.c: Commented out second round of terminal alignments

	* gmap.c, gsnap.c, uniqscan.c: Set trim_indel_score to -4 to be consistent
	  with previous value

	* gmap.c, gsnap.c, uniqscan.c: Calling Pair_setup

	* substring.c: Extending trimming toward ends in case of ties

	* stage3.c: Extending ends completely before final trim

	* pair.h, pair.c: Extending trimming toward ends in case of ties. Added
	  Pair_setup function to use trim_mismatch_score value provided by user.

	* stage2.c: Put code for suboptimal starts into a compiler directive

	* gmap.c, gsnap.c, uniqscan.c: Provided new defaults for
	  suboptimal_score_start and suboptimal_score_end, based on simulations

	* gmap.c, gsnap.c, stage2.c, stage2.h, uniqscan.c: Introduced parameters for
	  suboptimal_score_end and suboptimal_score_start

2011-11-06 twu

	* gmap.c: Added flag for --suboptimal-score

	* gmap.c, gsnap.c, outbuffer.c, pair.c, pair.h, result.c, result.h,
	  resulthr.c, resulthr.h, samprint.c, samprint.h, stage1hr.c, stage3.c,
	  stage3.h, stage3hr.c, stage3hr.h, uniqscan.c: Restoring old MAPQ score.
	  Making absolute MAPQ score a separate calculation, and printing it in an
	  XQ flag.

2011-11-05 twu

	* stage1hr.c: Added comments

	* stage3hr.c: Added field indel_low, and using it to prefer indels at low
	  genomic coords

	* stage1hr.c: Fixed computation of firstbound and lastbound so end indels
	  are found on short reads, such as 36-mers.

2011-11-04 twu

	* pair.c: Added code to compute_cigar to merge duplicate token types

	* stage2.c: Fixed uninitialized value for last_canonicalp

	* stage2.c: Implemented ability to generate suboptimal paths based on
	  different initial positions

2011-11-03 twu

	* gmapindex.c: Removed unnecessary file open for -P flag

	* gmapindex.c: Fixed memory leaks for -G flag

	* gmapindex.c: Fixed memory leaks for -A flag

2011-11-01 twu

	* pairpool.c: Commented out copy of shortexonp

	* dynprog.c, dynprog.h, gmap.c, gsnap.c, stage3.c, uniqscan.c: Removed
	  endlength requirement for microexons. Returning prob2 and prob3 from
	  Dynprog_microexon_int and applying two standards, depending on whether an
	  indel was originally present

	* pair.c: Printing shortexon information

	* pairpool.c: Copying shortexon information in copying pairs

	* smooth.c: Removed unused parameters

	* stage3.c: Adding endlength requirement for finding microexons

	* smooth.c: Printing result of smoothing as pairs

	* stage3.c: Made penalties harsher for indels at end near poor splice sites

	* stage2.c: Computing best overall score during dynamic programming process

	* uniqscan.c: Using new interface to Stage3hr_setup

	* stage2.c: Made changes so PMAP could compile

	* stage3.h: Added parameter favor_mode to Stage3_compute

	* stage3.c: Counting indel near splice as 2 mismatches. Added endlength
	  requirement of 12 for indel near splice.

	* stage2.c: Going through all hits to accumulate cells. No longer using
	  number of links to set root scores.

	* gmap.c, stage1hr.c: Passing value of favor_mode to Stage3_compute

	* smooth.c: Relaxing probability requirement for end exons in GSNAP from
	  0.05 to 0.10.

2011-10-31 twu

	* stage3.c: In trimming non-canonical end exons, not combining nearindelp
	  with splice probs, requiring 1 mismatch or less for bingop, and allowing
	  AT-AC introns. Extending alignments to queryend before trimming
	  non-canonical end exons.

	* stage3.c: In trimming end exons, using a sliding scale based on intron
	  length. Also penalizing for indels and mismatches close to exon-exon
	  boundary.

2011-10-28 twu

	* gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Providing
	  expected_pairlength and pairlength_deviation values in Stage3hr_setup, and
	  removing from Stage1hr procedures.

	* stage3.c: In end exons, checking if indel present, and if so, requiring
	  that splice probabilities both be greater than 0.9.

	* gsnap.c, stage3hr.c, stage3hr.h: Restoring pairlength deviation. Using
	  expected pairlength and pairlength deviation to discriminate among
	  paired-end reads.

	* stage2.c: Adding all hits from final querypos directly to celllist, rather
	  than updating rootscores. Fixed update of rootscore information to use
	  current querypos and hit. Dynamic programming starting from querypos 0,
	  rather than querypos 1.

	* diag.c: Restricted update of diagonal in middle region, to avoid affecting
	  subsequent beginning and end regions

	* splicetrie_build.c: Fixed handling of intron intervals, by introducing
	  INTRON_HIGH_TO_LOW

2011-10-27 twu

	* stage3hr.c: Made other fixes to allow copying of GMAP hit types

	* stage1hr.c: Making copies where necessary for multiple GMAP subpaths, and
	  freeing old Stage3pair_T objects at the appropriate time. Calling
	  Stage3pair_remove_overlaps on double GMAP alignments.

	* pairpool.c: Using CALLOC_OUT in Pairpool_copy_array

	* gsnap.c: Printing sequence name when debugging memusage

	* stage1hr.c: Handling multiple subpaths from stage 2 computation

	* stage2.c: Removed restriction on number of positions for final non-zero
	  querypos

	* stage3hr.c: Allowing Stage3end_T objects of type GMAP to be copied

	* pairpool.c, pairpool.h: Added function Pairpool_copy_array

	* stage2.c: Fixed location of compiler directive for PMAP

	* stage3.c: Restoring negative points for non-canonical introns in computing
	  goodness

	* stage2.c: Adding root scores for final non-zero and specific querypos

	* gmap.c, list.c, list.h, stage1hr.c, stage1hr.h, stage2.c, stage2.h: Going
	  back to changes from revision 50508 for multiple subpaths from stage2,
	  plus revisions 50504 to 50909 from branches/gmap-2011-10-24-mult-stage2
	  for a root position method for finding optimal and suboptimal subpaths

2011-10-25 twu

	* src, gmap.c, list.c, list.h, stage1hr.c, stage1hr.h, stage2.c, stage2.h:
	  Reverted to version 50507, before changes made to allow multiple paths
	  from a stage2 computation

2011-10-24 twu

	* VERSION: Updated version number

	* archive.html, index.html: Added changes for 2011-10-24 version

	*
	  gmap.c, list.c, list.h, stage1hr.c, stage1hr.h, stage2.c, stage2.h: Merged
	  revisions 50469 to 50504 from branches/2011-10-24-mult-stage2 to allow for
	  multiple stage2 results from the same genomic segment for GMAP.

	* stage3.c: Assigning sensedir in all cases, based on intron scores if
	  necessary

	* gmap.c, gsnap.c, outbuffer.c, pair.c, pair.h, result.c, result.h,
	  resulthr.c, resulthr.h, samprint.c, samprint.h, stage1hr.c, stage1hr.h,
	  stage3hr.c, stage3hr.h, uniqscan.c: Computing MAPQ score relative to best
	  alignment. Printing X2 field in SAM output to provide second best MAPQ
	  score.

	* stage3.c, stage3.h: When sense_try is provided, assigning sensedir

2011-10-16 twu

	* VERSION: Updated version number

	* index.html: Added comment about change to GMAP

	* gmap.c: Sorting stage3list before evaluating for chimeras

2011-10-14 twu

	* VERSION: Updated version number

	* index.html: Updated for new version

	* configure.ac: Grouped together checks for built-in procedures

	* splicetrie_build.c, splicetrie_build.h: Checking for splice sites being
	  beyond the chromosome length boundary

	* splicetrie.c: Handling case where trieoffsets has a NULL_POINTER, which
	  can occur with intron-type splicing

	* splicetrie_build.c: Handling case where nsites from a given splice site is
	  zero

2011-10-13 twu

	* popcnt.m4: Checking each built-in instruction only if available

	* asm-bsr.m4: Assigning a value to x

	* VERSION: Updated version number

	* gsnap.c: Removed 'S' from getopt

	* gsnap.c: Fixed naming of --splicingdir flag

	* samprint.c: Not printing mate chr or mate chrpos when mate has an
	  excessive number of paths

	* popcnt.m4: Changed test for -mpopcnt from a compiler test to a run test,
	  to make sure instruction is legal

2011-10-12 twu

	* dbsnp_iit.pl.in: Fixed syntax

	* stage1hr.c: Fixed debugging statements

	* VERSION: Updated version number

	* README, dbsnp_iit.pl.in: Made changes to handle exceptions within the snp
	  file

	* stage1hr.c: Fixed detection of novel splice ends for distant splicing

	* maxent_hr.c: Fixed debugging commands

	* stage3hr.c: Added comment

	* stage3hr.c: Eliminating terminals in Stage3end_optimal_score when
	  non-terminals are present

	* iit-read.c: Checking for case where nintervals is zero

	* iit-write.c: Fixed memory leak in IIT_build

2011-10-10 twu

	* stage3hr.c: Removing -nindels from calculation of querylength_adj

	* gsnap.c: Limiting minimum value of indexdb_size_threshold

	* indexdb.c: Fixed bug in computing Indexdb_mean_size using compressed hash
	  table

	* genome_hr.c: Adding +1 to ctr only when necessary

2011-10-09 twu

	* genome_hr.c, genome_hr.h: In gamma decoding, changed from division of
	  shift by 2 to using shift - 1, made all variables unsigned ints, and added
	  code for branchless computation.
2011-10-07 twu

	* gmap.c, gsnap.c, inbuffer.c, inbuffer.h: Generalized --filter-chastity to
	  work on either or both ends of a paired-end read

	* index.html: Changed wording

	* index.html: Added information about --filter-chastity option

	* goby.c, gsnap.c, inbuffer.c, shortread.c, shortread.h, uniqscan.c:
	  Implemented --filter-chastity option

	* genome_hr.c: Implemented a 2-shift method for decoding gammas, which
	  avoids a branch

	* config.site.rescomp.prd, config.site.rescomp.tst: Updated for new version

	* index.html: Entered new version 2011-10-07

	* VERSION: Updated version number

	* splicetrie.c: Fixed a bug where splicesites_i was not being initialized to
	  NULL

	* genome_hr.c: Removed dispatch procedures. Moved part of read_gamma to top
	  of loop, to guarantee that the final ptr location is correct.

	* genome_hr.c: Removed nbits. Implemented macros for clear_lowbit and
	  clear_highbit.

	* genome_hr.c: Implemented both dispatch and non-dispatch methods for
	  getting offsetptrs from gammas

2011-10-06 twu

	* genome_hr.c: Rearranged order of gamma computations within loops

	* genome_hr.c: For gamma commands, trying builtin clz first, then bsr in
	  assembly, then table lookup.
	* Makefile.dna.am, Makefile.gsnaptoo.am: Added POPCNT_CFLAGS

	* config.site: Made CFLAGS=-O3 and added comments about testing -mpopcnt

	* acinclude.m4, asm-bsr.m4, popcnt.m4, configure.ac: Added tests for
	  -mpopcnt compiler flag and bsr function in assembly

	* genome_hr.c: Added assertion statements to make sure builtin_clz is not
	  called with a value of 0

2011-10-05 twu

	* VERSION: Updated version number

	* index.html: Made changes for latest version

	* configure.ac: Added gff3 utility programs

	* stage3hr.c: Sped up comparison of overlapping and separate paired-end
	  alignments by using arrays instead of lists.

	* stage3hr.c: Simplified procedure for finding bad superstretches

	* stage1hr.c: Fixed bugs in pushing indels to low genomic position in
	  solve_middle_insertion and solve_middle_deletion

	* gsnap.c: Commenting out --genes option

	* stage3hr.c: Performing separate runs of Stage3pair_remove_overlaps on
	  overlapping and non-overlapping alignments

	* stage3hr.c: Added check for bad superstretches in Stage3pair_overlap

	* gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Implemented a fast,
	  integrated method for novel and known double splicing, and using it by
	  default

	* access.c: Added error messages when mmap fails

2011-10-04 twu

	* stage3hr.c: Picking a winner in case of ties in Stage3pair_remove_overlaps

	* stage1hr.c: Fixed calculation of genomicstart and genomicend for GMAP
	  alignments, to extend as if there were no trimming

	* stage3hr.c: Added debugging statements

	* stage1hr.c: Made finding of novel doublesplices faster

	* psl_introns.pl.in, psl_splicesites.pl.in: Moved print_exons into a
	  subroutine

	* psl_introns.pl.in, psl_splicesites.pl.in: Using
	  donor_okay_p and
	  acceptor_okay_p subroutines

	* gtf_introns.pl.in: Changed warning message to refer to intron, not exon

	* gtf_genes.pl.in: Removed unused variables

	* Makefile.am, gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in:
	  Added programs to handle GFF3 files

	* Makefile.dna.am: Removed programs for processing BAM files

	* iitdef.h, iit-write.c: Restored NUMERIC_ALPHA_SORT to sort types

	* stage1hr.c: Fixed issue with finding novel doublesplices where not all
	  middle segments were tested

2011-10-03 twu

	* iitdef.h, iit-write.c, iit-write.h: Added code for creating IIT files
	  internally

	* archive.html, index.html: Made changes for version 2011-10-01

2011-10-02 twu

	* VERSION: Updated version number

2011-10-01 twu

	* sequence.c: Removed compiler directive to undef HAVE_ZLIB

	* stage1hr.c: Allowing reads to be equal to index1part+2, not just greater
	  than

	* Makefile.dna.am, uniqscan.c: Added program uniqscan

2011-09-30 twu

	* Makefile.dna.am, dynprog.c, dynprog.h, gmap.c, gsnap.c, splicetrie.c,
	  splicetrie.h, stage1hr.c, stage1hr.h, stage3.c, stage3.h: Moved
	  splicesites and trieoffsets into local static variables to avoid passing
	  them as parameters

	* stage1hr.c: Moved check of genestrand value outside loop for retrieving
	  oligos

	* substring.c: Created separate procedures for mark_mismatches for stranded
	  and non-stranded cases

	* genome_hr.c, gsnap.c, indexdb.c, mode.h, stage1hr.c, substring.c:
	  Implemented stranded and non-stranded versions of cmet and atoi
	  substitutions

	* trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
	  compress.c, genome_hr.c, genome_hr.h, gsnap.c, mapq.c, mapq.h,
	  splicetrie.c, splicetrie.h,
	  stage1hr.c, stage3hr.c, stage3hr.h,
	  substring.c, substring.h, util: Merged revisions 48527 to 48790 from
	  branches/2011-09-28-atoi

2011-09-29 twu

	* stage1hr.c: Previously returning NULL if either end of a paired-end read
	  had no oligos. Now returning NULL if both ends have no oligos.

	* stage3hr.c: Requiring that overlap be greater than both ends of a
	  paired-end read

2011-09-28 twu

	* oligoindex_hr.c: Not computing oligoindex if left_plus_length <= left

	* psl_introns.pl.in, psl_splicesites.pl.in: Added warning if intron lengths
	  are negative

	* splicetrie_build.c: Added check for negative distances in a splicing_iit
	  file

	* genome_hr.c: Fixed assignment to predicate inside of assert statement

2011-09-22 twu

	* VERSION: Updated version number

	* gsnap.c: Fixed typo in help statement

	* table.c, uinttable.c: Allowing retrieval of keys to work, even if table is
	  empty

	* substring.c: Allowing case where nothing is found in splicesites_iit,
	  because it is for introns and not splicesites

	* splicetrie.c: Fixed case where splicetrie_obs is not NULL, but
	  splicetrie_max is NULL, which occurs if the splicing file is for introns,
	  and not splicesites

	* splicetrie_build.c: Fixed memory leak in building splicetrie for introns
	  splicing file

2011-09-20 twu

	* cappaths.c, spliceclean.c: Moved cappaths and spliceclean programs to
	  GSTRUCT repository

	* Makefile.dna.am, genecompare.c: Moved program genecompare to GSTRUCT
	  repository

	* Makefile.dna.am: Moved cappaths to GSTRUCT repository

	* Makefile.dna.am, splicegene.c, spliceturn.c: Moved spliceturn and
	  splicegene to GSTRUCT repository

2011-09-16 twu

	* gsnap.c: Provided further clarification in help statement
	  about
	  min-localsplice-endlength

	* gsnap.c: Checking that min_distantsplicing_end_matches is greater than or
	  equal to kmer size. Clarified some help statements.

	* README: Added recommendation to use known splice sites, rather than known
	  introns

	* README: Clarified that a given set of known splice sites can find
	  alternative splicing.

	* except.c: Fixed Except_advance_stack to return a value if pthreads not
	  available

	* Makefile.dna.am: Moved instructions for spliceclean and splicealt to
	  GSTRUCT repository

	* psl_introns.pl.in: Removed extraneous "v" at beginning of line

2011-09-14 twu

	* VERSION: Updated version number

	* index.html: Updated page to show version 2011-09-14

	* inbuffer.c, sequence.c, shortread.c, shortread.h: Revised read procedures
	  to handle multiple files correctly

2011-09-13 twu

	* pair.c, pair.h, samprint.c: For SAM output of GMAP alignments, printing
	  correct value of NH for number of hits

	* stage3.c, stage3.h, gmap.c, gsnap.c: Added parameter for min_intronlength

	* Makefile.dna.am, bam_pileup.c, bam_tally.c, bamread.c, bamread.h,
	  gsnap_extents.c, gsnap_splices.c: Moved files to gstruct

	* stage3.c: Reduced value of MIN_NONINTRON from 50 to 9 to avoid declaring
	  short introns as indels

	* pair.c: Fixed Pair_print_sam to work properly for chimeric alignments

	* stage3.c: Cleaning up gaps and indels from ends at end of stage 3

	* stage1hr.c: Fixed debugging statement

2011-09-09 twu

	* VERSION: Updated version number

	* dynprog.c: Fixed computation of lband and rband in find_best_endpoint and
	  find_best_endpoint_to_queryend

	* dynprog.c: Added protections against length1 being negative

2011-09-07 twu

	* index.html: Made
	  changes for 2011-09-07 version

	* samprint.c, pair.c, pair.h, stage3.c: Removed almost all unused parameters

	* outbuffer.c, stage3.c, stage3.h: Removed most unused parameters

	* pair.c: Fixed compiler messages about comparison of signed and unsigned
	  ints

	* compress.c, compress.h: Fixed compiler messages about comparison of
	  unsigned and signed ints

	* svncl.pl: Made changes to fix merges of words between lines and preserve
	  original numbers of spaces

	* VERSION: Updated version number

	* shortread.c: For input error involving line that is too long, printing
	  accession where problem occurred.

	* Makefile.am: Including full set of psl and gtf parsing programs with or
	  without fulldist

	* genome.c: Added include line for genomicpos.h

	* Makefile.dna.am: Included genomicpos.c and genomicpos.h for
	  extents_genebounds

	* README: Made changes to reflect new gtf_introns program

	* configure.ac, Makefile.am, gtf_introns.pl.in: Added gtf_introns program.
	  Also putting psl_introns back into the public distribution. Made changes
	  accordingly in README file.

	* dynprog.c, dynprog.h, stage3.c: When searching for gappairs_alt using
	  probabilities, bounding the search based on a score threshold computed
	  from the original score

	* stage3.c: For GSNAP, no trimming of noncanonical ends based on
	  probabilities, since need to compare fwd and rev directions. Stopped
	  final trimming of short end exons. For gappairs_alt, accepting if it
	  results in high-probability splice sites. For pick_cdna_direction, using
	  separate donor and acceptor scores and using alignment score again.
	* stage3.c: Put final pass to find canonical introns before trimming of dual
	  breaks at ends

	* stage3.c: Fixed problem with trimming dual breaks where it was trimming
	  indels. In trimming noncanonical exons at end, reduced NONCANONICAL_ACCEPT
	  from 20 to 15, and added NONCANONICAL_PERFECT_ACCEPT. In
	  pick_cdna_direction, turned off use of indel_alignment_score, and added
	  nmatches - nmismatches - 3*nindels with MATCHES_SIGDIFF after use of
	  totalintronscore.

	* pair.c, dynprog.c: Counting nindels correctly near splice sites

	* gtf_splicesites.pl.in: Allowing GTF file to use tag gene_id instead of
	  gene_name

2011-09-06 twu

	* stage1hr.c: Fixed masking of oligos for cmet and atoi modes

	* gsnap.c: Added 'S' flag to getopt command. Removed 'R' flag from getopt
	  and from help message.

	* datadir.c: Fixed error message to remove the word default

	* stage3.c: Peeling back 1 pair in a dual break (previously turned off) to
	  avoid having a gap on either side.

	* dynprog.c, dynprog.h, stage3.c: Changed bridge_intron_gap to have an
	  explicit parameter for use_probabilities_p. Using indel_alignment_score
	  now in pick_cdna_direction.

	* splicetrie.c, splicetrie.h: Providing contlength as parameter when making
	  3' splicejunctions

	* genome.c, genome.h: Added functions for Genome_fill_buffer_blocks that do
	  not print final null at end of string, needed for making splicejunctions.

	* dynprog.c, dynprog.h: Fixed creation of splicejunctions. For 5'
	  splicejunctions, not printing final null at end of string. For 3'
	  splicejunctions, printing distal sequence at splicejunction[contlength].
2011-09-02 twu

	* atoiindex.c, cmetindex.c: Masking oligomer indices to given index1part
	  size

	* gmap.c: Fixed bug in pairalign where min_matches was not being adjusted
	  downward to MIN_MATCHES

2011-09-01 twu

	* stage3.c: Fixed bug where longer end of dual break was being trimmed, not
	  shorter end

	* dynprog.c: Allow for cDNA insert of up to 3 bp at splice site

	* dynprog.c, dynprog.h, gmap.c, gsnap.c, stage3.c: Introduced
	  --microexon-spliceprob flag and allowing microexons only if one of the
	  splice sites exceeds this value

	* stage3.c: Always keeping gappairs if finalp is true (was forcep)

	* dynprog.c: Always using splice site probabilities to find introns if
	  finalp is true, and allowing indels nearby

	* stage3.c: Added a factor for dual break query jump to avoid dual breaks at
	  end with small query jumps

	* gmap.c: Fixed indentation

	* scores.h, pair.c: Added negative points in pathscore for non-canonical
	  intron or for when cdna direction is indeterminate

	* chimera.c: Not allowing chimeric transition into a gap

	* stage1hr.c: Using new interface to Stage3_compute

	* gsnap.c: Using new interface to Stage3_setup

	* gmap.c: Added --nosplicing flag. Fixed memory leak when matches <
	  min_matches.

	* stage3.c, stage3.h: Made splicingp a static variable. Added a step to
	  remove dual breaks from the ends of an alignment.

	* stage2.c: Turning off link back to grand_fwd_hit when splicingp is false

	* stage2.c: Using macros for diffdist_penalty under splicing and
	  non-splicing cases

	* stage2.c: Consolidated loops for use_shifted_canonical_p == true and ==
	  false. Removed compiler branches when SHIFT_EXTRA is not defined.
2011-08-31 twu

	* stage3.c: In trimming noncanonical exons from end, requiring end intron to
	  be canonical and have donor prob or acceptor prob >= 0.9.

	* chimera.c, chimera.h, pair.c, pair.h, stage3.c, stage3.h: Prohibiting
	  chimeric join at a query position containing a gap

	* gsnap.c: Added comment in --help output about how to turn off terminal
	  alignments

	* stage1hr.c: Using terminal_threshold in paired-end reads

	* gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Replaced
	  terminal_penalty with terminal_threshold

2011-08-30 twu

	* gmap.c: Fixed entry for --pairalign in long_options

	* gmap.c: If min_matches exceeds MIN_MATCHES, use MIN_MATCHES

	* outbuffer.c, pair.c, pair.h, stage3.c, stage3.h: In compressed output,
	  printing accession of usersegment instead of null dbversion

	* gmap.c: Implemented --pairalign flag for aligning a pair of sequences via
	  stdin

	* splicetrie_build.c: Fixed compiler warning about unused variable

	* stage3.c: Improved debugging output

	* genome.c, gmap.c, inbuffer.c, inbuffer.h, sequence.c, sequence.h:
	  Implemented --cmdline flag to align two sequences provided on the command
	  line

	* shortread.c: Changed warning message when /1 or /2 endings not found

	* access.c: Changed warning message for Macs

	* stage3.c: Fixed add_querypos_offset, which previously excluded gap pairs

2011-08-29 twu

	* genome.c: Fixed compiler warnings about comparing ints and unsigned ints

	* indexdb.c: Printing genomesubdir and then individual index file names in
	  monitoring message

	* indexdb.c: Using commas in initial monitoring message

	* genome.c: Using commas in initial monitoring message. Allowing allocation
	  if mmap not available.
	* gsnap.c: Modified messages about RNA-Seq and DNA-Seq

	* stage1hr.c: Put GMAP modes on a single line in monitoring message

	* indexdb.c: Restored debugging messages

	* iit-read.c: Casting all size_t to (unsigned long) in error messages

	* access.c: Added check for failure of fread, which happens with Macs on
	  large genomes

	* pair.c: Fixed case of a negative distance in printing GMAP alignment
	  beyond chromosomal bounds

	* substring.c, substring.h: Using new interface to Pair_print_gsnap

	* segmentpos.c, indexdb_hr.c: Fixed compiler warning about comparing int and
	  unsigned int

	* oligo.c: Commented out unused procedures for dibase

	* stage1hr.c: Using new interfaces to Stage3end_remove_overlaps and
	  Pair_print_gsnap

	* stage3hr.c, stage3hr.h: Removed unused parameters in
	  Stage3end_remove_overlaps

	* pair.c, pair.h: Fixed bug in GSNAP standard output in printing GMAP
	  alignments beyond chromosomal bounds

	* stage1hr.c: For finding GMAP mapping bounds using segments, checking
	  plus_nsegments and minus_nsegments > 0, rather than plus_segments and
	  minus_segments == NULL.

2011-08-27 twu

	* VERSION: Updated VERSION

	* pair.c: Switched from aaphase_g to aaphase_e, since aaphase_e correctly
	  codes for all three positions of the stop codon

	* config.site: Removed references to samtools

	* gsnap.c, shortread.c, shortread.h: Added flags --fastq-id-start and
	  --fastq-id-end. Stripping Illumina paired-end endings more intelligently.
2011-08-26 twu

	* dynprog.c, dynprog.h, splicetrie.c, stage3.c: Removed some unused
	  parameters

	* dynprog.c, dynprog.h, stage3.c: Removed code for INTRON_HELP

	* gmap.c: Changed author list

	* dynprog.c, dynprog.h, splicetrie.c, splicetrie.h: Removed references to
	  gbuffer

2011-08-25 twu

	* gsnap.c: Fixed warning messages for -N and -s. In --help message,
	  notifying user that full pathnames are allowed.

	* stage3.h: Providing accessor commands for chrnum, chrstart, and chrend

	* stage3.c: Fixed bug in determining coordinates for Stage3_mergeable

	* gmap.c: Modified some debugging statements for chimeras

2011-08-19 twu

	* stage1hr.c: Adding at least querylength beyond distal mappingstart or
	  mappingend to obtain distal genomicstart or genomicend

	* oligoindex.c: Increased size of genomicdiag by 1. Added assertion about
	  exceeding those bounds.
	* gsnap.c: Added warning messages about interpretation of -N and -s flags

	* genome_hr.c: Fixed bugs preventing program from compiling

	* gmap_build.pl.in: Not providing -s flag to fa_coords

2011-08-16 twu

	* archive.html, index.html: Updated for version 2011-08-15

	* COPYING: Changed Developer

	* gmap_setup.pl.in: Removed instructions about gmapdb_lc and gmapdb_lc_masked

	* README: Added default value for MAX_READLENGTH

	* README: Revised for new features

	* indexdb.c: Added warning message if no gammaptrs file is produced

	* gmapindex.c: Allocating an extra two chars in the offsets file names for
	  the basesize

	* archive.html, index.html: Changes made for 2011-03-28 version

2011-08-15 twu

	* fa_coords.pl.in: Limiting number of warning messages about duplicate
	  contigs

	* atoiindex.c, cmetindex.c: Improved monitoring messages

	* setup1.test.in, Makefile.am, setup.ref12123positions.ok,
	  setup.ref123positions.ok: Changed tests for new gamma file format

	* atoiindex.c, cmetindex.c, indexdb.c: Handling special case where
	  index1part == basesize by not writing gammaptrs file and reading
	  offsetscomp file directly into offsets.
	* atoiindex.c, cmetindex.c: Modified procedures to work with new compressed
	  offsets file format

	* indexdb.c: Handling case where kmer == basesize

2011-08-14 twu

	* gmap_setup.pl.in: Removed -S flag and added -s flag

	* gmap_build.pl.in: Changed message to indicate that default order is chrom
	  order

	* fa_coords.pl.in: Removed -S flag

	* gmapindex.c: Made chrom sort order the default

	* gmap.c, gsnap.c: Added --splicingdir flag

	* gsnap_splices.c: Turned off warning messages about non-canonical splices

	* bam_tally.c: Fixed warning message output going to stdout

	* bam_tally.c: Fixed printing of print_allele_counts_simple. Allocating and
	  freeing tallies within parse_bam procedure for each chromosome.

	* indexdb.c: Added missing closing brace

	* gsnap_extents.c: Using find_strand procedure from gsnap_splices, which
	  trusts strand from SAM output

	* spliceturn.c: Fixed eliminatep to be indexed by universal IIT index

2011-08-13 twu

	* indexdb.c, indexdb.h, indexdb_hr.c, indexdbdef.h: Allowing backward
	  compatibility with pre-gamma genomic indices. Using littleendian and
	  bigendian versions of gamma procedures. Implemented more compatibility
	  with bigendian machines.

	* genome_hr.c, genome_hr.h: Instead of allocated/mmapped versions of gamma
	  procedures, creating littleendian and bigendian versions

	* spliceclean.c: Changed wording of monitoring messages from "Resolve" to
	  "Choose". Providing count information even when cannot choose between fwd
	  and rev.
	* spliceturn.c: Printing monitoring message about number of splices
	  eliminated

	* bam_tally.c: Improved warning messages for genotypes inconsistent with
	  reference allele

	* atoiindex.c, cmetindex.c: Using offsetscomp instead of offsets in variable
	  names.

	* snpindex.c: Fixed bug with freeing gammaptrs_filename and
	  offsetscomp_filename too early. Using offsetscomp instead of offsets in
	  variable names.

	* indexdb.h: Removed unused procedures

	* indexdb.c: Checking for rare case that ctr == 0 after all gammas read, and
	  not advancing ptr in that case

	* bam_tally.c: Added Tally_T structure to simplify data structures and speed
	  up program

	* bam_tally.c: Removed quality_score_constant. Handling empty quality
	  strings correctly.

	* bamread.c: Handling empty quality strings correctly

	* indexdb.c, indexdb_hr.c: Handling an offsetscomp_access condition that is
	  not possible, to eliminate compiler warnings

	* gmap.c, stage1hr.c: Reduced value of minendexon from 12 to 9, since we are
	  using nmatches - nmismatches. Gives much better results at ends.

	* stage3hr.c: Restored usage of score before nmatches in remove_overlaps
	  procedures

	* stage3.c: For trimming at ends, using nmatches - nmismatches to evaluate.
	  Fixed bug in pick_cdna_direction where value for cdna_direction was not
	  assigned correctly. For indels next to bad introns, requiring each end
	  probability to be greater than 0.9, and taking alternate gappairs if
	  nmismatches is less.
2011-08-12 twu

	* trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
	  Makefile.dna.am, Makefile.gsnaptoo.am, access.h, atoiindex.c, cmetindex.c,
	  genome_hr.c, genome_hr.h, gmap.c, gmapindex.c, gsnap.c, iit-read.c,
	  indexdb.c, indexdb.h, indexdb_hr.c, indexdbdef.h, pmapindex.c, snpindex.c,
	  splicetrie_build.c, types.h, util, gmap_build.pl.in, gmap_setup.pl.in:
	  Merged revisions 44539 to 44852 from branches/2011-08-09-elias-gamma,
	  implementing gamma coding to represent offsets in genomic indices

2011-08-10 twu

	* stage1hr.c: Put GMAP pairsearch back in front of distant splicing.
	  Characterizing GMAP pairsearch results according to quality, and updating
	  either nconcordant or nsalvage. Distant splicing done if nconcordant is
	  0. Terminals done if both nconcordant and nsalvage are 0.

	* stage3.c: Not using nnoncanonical in pick_cdna_direction. Restored
	  assignment of SENSE_NULL if no canonical site is found.

	* stage3hr.c: In Stage3end_remove_overlaps and Stage3pair_remove_overlaps,
	  using nmatches rather than score as the primary measure

	* samprint.c, stage1hr.c, stage3hr.h: Introduced new hittype DISTANT_SPLICE
	  and not trying to do GMAP alignment on those

	* stage1hr.c: Moved distant splicing ahead of GMAP pairsearch

	* stage3hr.c: Added debugging information

2011-08-09 twu

	* substring.c: Trimming terminals with fixed -3 mismatch score, while
	  allowing other ends to be controlled with user-specified
	  trim_mismatch_score.

	* dynprog.c: Using probabilities to find splice sites only if finalp is true

2011-08-08 twu

	* bamread.c: Added missing #endif statement

	* stage3.c: Limiting SENSE_NULL only to ties between fwd and rev

	* stage3.c: Preventing semicanonical splices at end from being trimmed.
	  Using product of probabilities to decide whether an indel is next to a bad
	  intron

	* dynprog.c: If canonical intron cannot be found, using probabilities to
	  find best splice junction

	* bamread.c, bamread.h: Added function Bamread_splice_strand

	* samprint.c: Making cigar_noncanonical_splices_p true

	* sequence.c, shortread.c: Made fixes in new handling of eoln situations

2011-08-07 twu

	* stage3hr.c, stage3hr.h, substring.c, substring.h: Using runlength IIT for
	  resolving multiple mappings

	* spliceturn.c: Sorting splices in order of observed counts, and eliminating
	  in that order

	* snpindex.c: Fixed coordinates in error messages

	* gsnap.c: Added option for using runlength IIT to resolve multiple mappings

	* bam_tally.c: Added option for printing runlengths

	* bam_tally.c: Printing genotype. Added --diffs-only flag.

2011-08-06 twu

	* dynprog.c: Providing separate rewards for GC-AG and AT-AC introns, with
	  stronger reward for GC-AG.

	* stage3.c: Not removing bad non-canonical exons at end. Computing combined
	  probability score for donor and acceptor splice sites.

	* spliceturn.c: Fixed program to not depend on distinction between known and
	  new splices

	* gsnap_splices.c: Added --minsupport flag

	* gmap.c: Added break after case 'z'

	* stage3hr.c: Turned off TALLY_RATIO, and checking instead for presence or
	  absence of overlap

	* gsnap_splices.c: Added --mincount flag

	* gsnap.c: Adding .iit to splicesites file when searching locally

	* bam_pileup.c: Printing accessions at start and end of reads

	* indexdb.c: Checking for full offsets_suffix, not just "offsets"

	* dynprog.c, dynprog.h, stage3.c: Fixed bug in assigning the wrong value to
	  splicingp.
	  For indels next to bad introns, checking the alternative to
	  see if it is free of mismatches.

	* Makefile.dna.am, Makefile.gsnaptoo.am, atoiindex.c, cmetindex.c,
	  indexdb.c, indexdb.h, snpindex.c: Created a general Indexdb_get_filenames
	  procedure and using it for snpindex, cmetindex, and atoiindex, so they
	  work on all k-mer types

	* stage3hr.c, substring.c, substring.h: Removed some unused parameters and
	  variables

	* outbuffer.c, stage3.c, stage3.h: Removed some unused parameters

	* stage3.c: Not calling pick_cdna_direction when splicingp is false

2011-08-05 twu

	* bam_pileup.c: Initial import into SVN

	* outbuffer.c: Restored Paths and Alignments sections to "gmap -4" output
	  (continuous by exon).

	* VERSION: Updated version number

	* stage3hr.c: In Stage3end_gene_overlap, initializing foundp

	* iit-read.c: In IIT_gene_overlap, initializing allocp and freeing matches.

	* gtf_genes.pl.in: Revised ends and starts for genes on minus strand.

	* psl_genes.pl.in: Fixed 0-basis of starts. Revised ends and starts for
	  genes on minus strand.
	* psl_genes.pl.in: Added headers for gene format

	* configure.ac, Makefile.am: Revised set of files distributed

2011-08-04 twu

	* Makefile.am, gtf_genes.pl.in, psl_genes.pl.in: Added psl_genes and
	  gtf_genes programs

	* gmap.c, iit-read.c, stage1.c, stage1.h, gregion.c: Removed unused
	  parameters

	* inbuffer.h: Removed parameter pc_linefeeds_p

	* shortread.h: Removed braces

	* sequence.c, shortread.c: Removed call to find_bad_char, since we are
	  checking for '\r' directly before '\n'

	* sequence.c, shortread.c: Added checks so we don't read p[-1] when the
	  first character in the string is already '\n'

	* sequence.c, shortread.c: Checking for carriage return before every line
	  feed

	* gsnap.c, inbuffer.c, shortread.h: Removed --pc-lines option

	* Makefile.am, dbsnp_iit.pl.in, fa_coords.pl.in, gmap_compress.pl.in,
	  gmap_process.pl.in, gmap_uncompress.pl.in, gtf_splicesites.pl.in,
	  md_coords.pl.in, psl_introns.pl.in, psl_splices.pl.in,
	  psl_splicesites.pl.in: Stripping CR-LF from input files

	* gsnap.c, iit-read.c, iit-read.h, pair.c, pair.h, stage3hr.c, stage3hr.h,
	  substring.c, substring.h: Added option to favor multi-exon genes

	* gmap.c, gsnap.c: Checking for valid int and float arguments

	* stage3hr.c: Fixed bug in resolve_multimapping procedures

	* stage3hr.c: Fixed bug in not initializing antistranded_penalty

	* gsnap.c, iit-read.c, iit-read.h, pair.c, pair.h, stage1hr.c, stage3hr.c,
	  stage3hr.h, substring.c, substring.h: Added -g flag and genes_iit. Added
	  procedures for resolving multimapping using known genes and tally.

2011-08-03 twu

	* gmap.c: Making call to Splicetrie_setup, so -s flag works for known splice
	  sites

	* Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, Makefile.am,
	  coords1.test.in, iit.test.in, setup1.test.in: Fixed "make check" so it
	  works for Cygwin on Windows, where copying of programs from src does not
	  work

	* dynprog.c, dynprog.h, stage3.c: Computing splice site probabilities when
	  user genomic segment is provided

	* stage3.c: In assign_gap_types, using known splicesites_iit to assign
	  splice site probabilities of 1.

	* pair.c, pairdef.h, pairpool.c, stage3.c: In addition to trimming
	  noncanonical exons close to the end, trimming bad canonical exons close to
	  the end. Now computing splice site probabilities in assign_gap_types.

	* dynprog.c: Checking for indel plus bad intron only when finalp is true,
	  because earlier passes may need some time to iterate to reach a final
	  solution.

	* stage3.c: Checking for pairs being NULL before calling Pair_trim_ends

	* gmap.c, pair.c, pair.h, stage1hr.c, stage3.c, stage3.h, stage3hr.c,
	  stage3hr.h: Using matches post-trim for deciding if the alignment has
	  sufficient quality, but using nmatches_pretrim for ranking and scoring
	  purposes.

2011-08-02 twu

	* stage3.c: Turned final pass 6 back on, which was inadvertently turned off

	* dynprog.c, pair.c, pair.h, pairdef.h, pairpool.c: Protecting pairs at end
	  against trimming by GMAP if they are found by splicetrie at known splice
	  sites

	* stage3.c: Not running Dynprog_single_gap if queryjump or genomejump of a
	  dual break is equal to 1, since that just leads to two indels.

	* pair.c: For trimming of ends, changed penalty for indel score from -6 to
	  -4.

	* dynprog.c: Corrected coordinates for splice site probabilities and
	  dinucleotides. Disallowing indels near splice sites if either probability
	  is less than 0.9.

	* stage3.c: Keeping non-canonical intron if there is sufficient exon
	  evidence at the end. Iterating trimming of non-canonical introns at the
	  end.

	* pair.c: Printing "method:gmap" even if assertions are turned off

	* dynprog.c: Checking probability of intron found by bridge_intron_gap and
	  discarding the solution if it finds both an indel and a bad intron.

	* chrom.c, diag.c, diag.h, dynprog.c, gmap.c, oligoindex_hr.c, outbuffer.c,
	  pair.c, pair.h, smooth.c, stage1.c, stage1hr.c, stage2.c, stage3.c,
	  stage3.h: Removed various unused parameters

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Revised version
	  number

	* stage1hr.c: Fixed debugging statements

	* substring.c: Added a general test of goodness for a substring based on its
	  numbers of matches and mismatches.

	* stage3.c: Restored trimming of non-canonical end exons, but removed code
	  that trimmed exons less than 20 bp when known splicing was available.

	* iit-read.c, iit-read.h: Added function IIT_exists_with_divno_typed_signed

	* stage1hr.c: Providing an allowance for GMAP score when want_high_quality_p
	  is true. Turning off use of score as primary criterion in choosing
	  splices, since it does not take advantage of known splicing.

	* stage3.c: Not using nmismatches in pick_cdna_direction. If
	  splicesites_iit is available, assigning known splicesites a probability of
	  1.0.

2011-08-01 twu

	* stage1hr.c, stage1hr.h: When antistranded_penalty has a value, using
	  mismatches plus penalty to decide between sense and antisense, rather than
	  using probabilities.

	* stage3.c: Changed to_queryend_p to be true for distalmedial_ending, since
	  it is comparing alternatives.

	* stage3hr.c, stage3hr.h: Adding antistranded_penalty to score

	* stage3.c: In pick_cdna_direction, using presence of indels in combination
	  with bad intron

	* stage1hr.c, stage1hr.h: Using antistranded penalty to determine whether to
	  force GMAP to look for antisense result. Requiring GMAP to be high quality in
	  all cases. Basing GMAP quality on nmismatches plus gap opens. Choosing
	  best result among all paired types, so concordant no longer predominates
	  over others.

	* samflags.h: Added comment to show some common flags

	* gsnap.c: Added parameter for antistranded penalty

	* chrom.c, chrom.h, gmapindex.c, gsnap_extents.c, gsnap_splices.c,
	  gsnap_terms.c, iit-read.c, iit_store.c, iitdef.h, segmentpos.c,
	  segmentpos.h, spliceclean.c: Introducing new chrom sort in addition to
	  previous numeric_alpha sort

	* chrom.c: Fixed bugs in parsing out initial "chr" from strings

2011-07-31 twu

	* gsnap.c: Changed default for all gmap parameters from 2 to 3

	* stage1hr.c: Added macros add_bounded and subtract_bounded to keep
	  computations within chromosomal bounds

	* gmap.c, stage2.c, stage2.h: Created a specialized procedure for
	  score_querypos when splicing is true and no shifted canonicals are used

	* oligoindex.c: Using only 8-mers for oligoindices_major in GSNAP for stage 2

	* trunk, config.site.rescomp.tst, configure.ac, src, gsnap.c, stage1hr.c,
	  stage1hr.h, stage3hr.c, stage3hr.h, util, gmap_build.pl.in,
	  gtf_splicesites.pl.in, psl_splicesites.pl.in: Merged revisions 44034 to
	  44047 from branches/2011-07-31-gmap-then-terminals

	* stage1hr.c: For GMAP halfmapping, when overlap is found, take widest
	  possible starting point between the
	  overlap calculation and the normal
	  calculation

	* stage3.c: Added gappairs in debugging statements

	* shortread.c, shortread.h, stage1hr.c: In computing GMAP halfmapping,
	  checking for existence of primers and extending GMAP region if they exist

	* stage3hr.c, stage3hr.h: Added function Stage3end_best_score_paired

	* stage1hr.c: Running GMAP pairsearch only on ends with scores better than
	  those already paired

	* gmap.c, gsnap.c: Printing arguments before they are parsed

	* stage2.c: When shifted_canonical_p is false, not computing rev scores,
	  since they are the same as the fwd scores

	* stage2.c: Improved debugging statements

	* gmap.c, stage1hr.c, stage2.c, stage2.h, stage3.c: Added parameter
	  use_shifted_canonical_p. Using now only for cross-species alignment in
	  GMAP.

	* stage3hr.c, stage3hr.h: Added function Stage3pair_sort_bymatches. For
	  substitutions with 0 mismatches, classifying hit as EXACT rather than SUB,
	  so duplicates are eliminated properly.

	* stage1hr.c, stage1hr.h: Allowing multiple concordant results to undergo
	  GMAP improvement, up to max_gmap_improvement. Sorting results by matches
	  before GMAP improvement.

	* gsnap.c: Introduced separate parameter for max_gmap_improvement. Hid
	  pairexpect and terminal-penalty parameters.

2011-07-30 twu

	* stage1hr.c: Fixed errors in find_terminals with analysis of mismatches and
	  use of floors

	* stage1hr.c: Requiring high quality GMAP unpaired method when we observe a
	  paired toolong alignment

	* gsnap.c: Made halfmapping,unpaired,improve the default GMAP method

	* stage3hr.c, stage3hr.h: Scoring GMAP based on total number of matches.
	  Implemented Stage3end_best_score.

	* stage1hr.c: Using new test based on Stage3end_best_score to avoid
	  terminals in anticipation of GMAP halfmapping. Allowing poor quality GMAP
	  in GMAP improvement. Filtering results by optimal score before GMAP
	  improvement using an infinite cutoff.

	* stage3.c: In pick_cdna_direction, allowing bad-scoring canonical introns
	  to determine sense

	* Makefile.dna.am, Makefile.gsnaptoo.am, gmap.c, sense.h, stage1hr.c,
	  stage3.c, stage3.h, stage3hr.h: Created new header file sense.h.
	  Returning sensedir from pick_cdna_direction and Stage3_compute.

	* gsnap.c, stage1hr.c, stage1hr.h: Introduced separate parameter for
	  trigger-score-for terminals

	* stage1hr.c: Turned off option to avoid terminals if halfmapping
	  anticipated later

	* stage3.c: Increased threshold for trying microexon from 0 acceptable
	  mismatches to 2 in high quality sequences

	* stage1hr.c: Applying Stage3pair_optimal_score after GMAP improvement step

	* pair.c, pair.h, samprint.c, stage3hr.c, substring.c, substring.h:
	  Implemented computation of MAPQ scores for GMAP alignments

	* mapq.c, mapq.h: Moved some constants to mapq.h, so pair.c can access them

	* gsnap.c, stage1hr.h: Changed name of GMAP method from concordant_uniq to
	  improvement

	* stage1hr.c: Conducting search for terminals even if gmap_halfmapping_p is
	  true, if both single ends have no hits so far

	* splicetrie.c, splicetrie.h: Checking internal exon region at splice site
	  before doing a search for a short-end splice.

	* oligoindex_hr.c: Removed assertions about sequencepos

	* stage1hr.c: In using segments to bound GMAP region, looking at querypos5
	  and querypos3 to help decide whether to extend mapping region by
	  shortsplicedist.

	* stage3.c: In pick_cdna_direction, no longer using indel_alignment_score

	* gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Introducing
	  separate parameters for GMAP methods. Sorting singlehits5 and singlehits3
	  before running GMAP halfmapping/unpaired.

	* gmap.c, gsnap.c, oligoindex_hr.c, oligoindex_hr.h, stage1hr.c, stage1hr.h,
	  stage2.c, stage2.h, stage3.c: Merged revisions 43984 to 43994 from
	  branches/2011-07-30-oligoindex-mapping-region to introduce mappingstart
	  and mappingend in addition to genomicstart and genomicend for finding
	  oligoindex mappings in stage 2

	* pair.c: In trimming, counting an indel only as a single mismatch

	* stage3.c: Turning off the step to remove noncanonical introns, since it
	  fails in some cases

	* stage1hr.c, stage2.c: Not doing terminals if GMAP pairsearch or
	  halfmapping is available

	* stage2.c: Ignore log message for 43986. Correct log message should be:
	  Introduced NEAR_END_LENGTH to define ends of reads where we can ignore
	  EXON_DEFN.

	* gsnap.c: Made halfmapping,cuniq the default for gmap-mode.

	* gmap.c: Reduced values of CHIMERA_SLOP

2011-07-29 twu

	* stage3hr.c: For single-end reads, favoring non-ambiguous alignments over
	  ambiguous ones when they have identical genomicstarts or genomicends.

	* dynprog.c: In Dynprog_end5_known and Dynprog_end3_known, if extension not
	  found to query end and if ambiguous splicing not found, then doing dynamic
	  programming again to best end, rather than to query end.

	* pair.c, pair.h, stage3.c: Trimming of ends of GMAP alignments

	* stage1hr.c: Allowing very poor alignments to be reported by GMAP method,
	  now that we have trimming at ends

	* gsnap.c, stage1hr.c, stage1hr.h: Avoiding duplication of chr marker
	  segments and chr marker segment at the beginning. Passing chromosome_iit
	  and nchromosomes into Stage1hr_setup.

2011-07-28 twu

	* stage1hr.c: Using binary search on segments to bound region for GMAP
	  alignments

2011-07-27 twu

	* stage1hr.c: Created debugging version that identifies plus_segments and
	  minus_segments within GMAP halfmapping region

	* stage2.c: Re-using result of find_shifted_canonical when leftpos remains
	  the same

	* gsnap.c: Made default for GMAP halfmapping with 2 candidates

	* genome_hr.c: Added comments in splicesite_positions that offset - 1 is
	  verified

	* stage1hr.c: Fixed issues with fast_level and cutoff_level for very short
	  reads, where fast_level < 0

	* stage1hr.c: Made plus_segments, plus_nsegments, minus_segments, and
	  minus_nsegments fields within the Stage1_T object, so they can be used
	  later in limiting range for GMAP alignments

	* genome_hr.c, genome_hr.h, stage2.c: Implemented lookup as needed of the
	  previous dinucleotide from the genomic blocks

2011-07-26 twu

	* stage1hr.c: Added chr marker segments to indicate boundaries between
	  chromosomes and simplify inner loops for localsplicing and indels

	* Makefile.dna.am: Removed gbuffer.c and gbuffer.h from file lists

	* stage3.c, stage3.h: Multiple revisions to pick_cdna_direction

	* stage2.c: Requiring exon length > EXON_DEFN for canonical splicing only in
	  middle of queryseq

	* gsnap.c, stage1hr.c, stage1hr.h: Allowing control over individual gmap
	  modes

	* chimera.c, dynprog.c, dynprog.h, extents_genebounds.c, genome.c, genome.h,
	  get-genome.c, gmap.c, gsnap_tally.c, iit_plot.c, match.c, pair.c,
	  sequence.c, sequence.h, splicetrie.c, splicetrie.h, splicing-score.c,
	  substring.c: Making complement in place. Using new interface to
	  Genome_get_segment.

2011-07-21 twu

	* Makefile.am, gtf_splicesites.pl.in: Added gtf_splicesites program

	* config.site.rescomp.tst, VERSION: Updated version number

	* indexdb.c: Fixed error message for PMAP

	* trunk, config.site.rescomp.tst, src, gsnap.c, indexdb.c, outbuffer.c,
	  outbuffer.h, resulthr.c, resulthr.h, splicealt.c, splicetrie.c,
	  splicetrie.h, splicetrie_build.c, splicetrie_build.h, stage1hr.c,
	  stage1hr.h, stage3hr.c, stopwatch.c, substring.c, util: Merged revisions
	  43003 to 43362 from branches/2011-07-15-fast-knownsplices in creating
	  splicecomp and creating specialized procedures for find_spliceends for
	  shortends and distant splicing

	* stage2.c: Fixed bug in older canonical dinucleotides procedure that
	  overwrote -1 values in initial positions

	* stage1hr.c: Revised comments

	* Makefile.dna.am: Added command for splicealt

	* psl_splicesites.pl.in, psl_introns.pl.in: Fixed warning messages

	* gmap_build.pl.in: Added initial definition for -B flag

2011-07-16 twu

	* stage2.c: Fixed problem in get_last, needed by find_shifted_canonical,
	  when first few positions lack a value of -1.

2011-07-13 twu

	* VERSION: Revised version number

	* genome-write.c: Added fix for genomes in PC-DOS file format

	* trunk, VERSION, config.site.rescomp.prd, src, atoiindex.c, cmetindex.c,
	  gmap.c, gmapindex.c, gsnap.c, indexdb.c, oligo.c, stage1.c, stage1hr.c,
	  stage1hr.h, stage3.c, stage3hr.c, stage3hr.h, util, gmap_build.pl.in,
	  gmap_setup.pl.in: Merged revisions 42540 through 42858 from
	  branches/2011-07-08-index-14mers to handle 13-mers and 14-mers, and to
	  fix various bugs

2011-07-08 twu

	* samprint.c: Using new interface to Pair_print_sam

	* outbuffer.c, pair.c, pair.h, stage3.c, stage3.h: Using usersegment
	  accession in SAM and GFF3 output when -g flag is specified.

	* goby.h, gsnap.c, outbuffer.c, shortread.c, shortread.h, stage3hr.c,
	  stage3hr.h, substring.h: Merged external changes for Goby from 2011-07-01
	  and 2011-07-08

	* Makefile.gsnaptoo.am: Added files for oligoindex_hr.c and oligoindex_hr.h

	* goby.c: Applied external patch from 2011-07-08

	* trunk, VERSION, config.site.rescomp.tst, src, Makefile.dna.am,
	  genome_hr.c, genome_hr.h, gmap.c, gsnap.c, oligoindex.c, oligoindex.h,
	  oligoindex_hr.c, oligoindex_hr.h, stage1hr.c, stage1hr.h, stage2.c,
	  stage2.h, stage3.c, stage3hr.c, stage3hr.h, util: Merged revisions 42282
	  to 42511 from branches/2011-07-05-oligoindex-hr to speed up GMAP
	  oligoindex and provide options for GMAP

	* oligoindex.c: Stop initializing all diagonals. Initialize only when
	  needed.

	* splice-sites-hr.pl: Added -S flag for semicanonical splice sites

	* indexdb.c: Fixed Indexdb_new_genome for PMAP

	* gmap.c: Changed default index1part for PMAP

2011-07-07 twu

	* splice-sites-hr.pl: Initial import into SVN

2011-07-05 twu

	* shortread.c: Allows for non-digit, then "1" or "2" for paired-end reads

	* dynprog.c: Added protection against negative genomic coordinates in
	  dynamic programming at 5' and 3' ends

	* shortread.c: Modified error statement

	* stage1hr.c: Adding querylength to genomicbound in cases where no overlap
	  is found, just in case the overlap is not found.

	* stage1hr.c: Requiring high quality when aligning concordant unique hits
	  with GMAP

2011-07-04 twu

	* stage3.c: Trimming end exons if less than 20 bp and known splicing is
	  available

	* stage1hr.c: Improved debugging statements for GMAP alignments

	* splicetrie.c: Counting mismatches correctly, without penalty, when known
	  splicing exceeds observed distances

	* gsnap.c, stage3hr.c, stage3hr.h: Adding ambiguous matches to nmatches when
	  favor_ambiguous_p is true, which happens if we have known splicing without
	  observed distances

	* dynprog.c, dynprog.h: Implemented find_best_endpoint_to_queryend

	* stage3.c: Trimming semi-canonical exons at end also

	* splicetrie.c: Fixed bug in failing to push coordinates for ambiguous cases

	* pair.c: Fixed symbol for rev semi-canonical pair in debugging output

	* iit_store.c: Improved error message when parsing coords

	* iit_store.c: Printing entire problematic line when a parsing error occurs

2011-07-03 twu

	* stage3.c: Revised pick_cdna_direction to count semicanonical splices, and
	  to use nmatches - nmismatches

2011-07-02 twu

	* stage3hr.c: Fixed bug in always eliminating second hit in
	  Stage3end_remove_overlaps

2011-07-01 twu

	* Makefile.am, setup.ref123positions.ok, setup.ref3positions.ok,
	  setup1.test.in, setup2.test.in: Made changes for renaming of ref3positions
	  to ref123positions

	* gmap.c, gsnap.c, stage2.c, stage2.h: Added setup for stage 2 to handle
	  GMAP alignments without splicing

	* pair.c: Putting "0M" between adjacent deletion and insertion in CIGAR
	  string

	* gsnap.c, stage1hr.c, stage1hr.h: Providing --allow-gmap flag to control
	  whether GMAP alignments are allowed

	* stage3.c: Adding gap holders before nonconcordant exon trimming. Checking
	  if nonconcordant end exon is less than halfway from end. In picking cDNA
	  direction, checking only for presence or absence of nonconcordant or
	  concordant introns.

	* stage2.c: Put back diffdist_penalty

	* stage1hr.c: Using better genomicbounds when running GMAP alignments.
	  Introduced check for very bad GMAP alignments (nmatches < querylength/2).
	  Comparing GMAP against original hit for aligning concordant uniques with
	  GMAP. Trying top hits with GMAP to find concordant pairs, rather than
	  checking to see if the total number of hits is less than a threshold.

	* pair.c, pair.h, shortread.c, stage3hr.c, stage3hr.h, substring.c,
	  substring.h: Added procedures for computing better genomicbounds on GMAP
	  alignments, based on overlap between the paired ends

	* gsnap.c, splicetrie.c, splicetrie.h: Added option for amb_closest_p
	  behavior, where shortest intron among ambiguous ones is picked

	* genome-write.c: Added index1part as parameter to some procedures

2011-06-30 twu

	* stage3hr.c: Allowing terminal ends to win on the basis of nmatches,
	  instead of score

	* stage3.c: Added pass 6a to remove noncanonical end exons

	* stage2.c: Restored NINTRON_PENALTY_MISMATCH from 4 to 8

	* stage1hr.c: Generalized minimum querylength for one-miss algorithm to
	  handle 15-mers. Order is now GMAP, terminal 1, terminal 2, and distant
	  splicing, with terminals done if found_score > trigger_score_for_gmap.

	* gmap.c, gsnap.c, iit-read.c, iit-read.h, outbuffer.c, outbuffer.h: Added
	  flags for read group library and platform in SAM headers

	* compress.c, compress.h, genome-write.c, genome-write.h, gmapindex.c:
	  Allowing for 15-mer genomic indices when writing an uncompressed genome
	  using a file

	* spanningelt.c: Generalized from 12-mers to 15-mers

2011-06-29 twu

	* gmap_build.pl.in: Fixed installation for 12-mer and 15-mer indices

	* gmap.c, gsnap.c: Added -B 5 option to allocate offsets file

	* trunk, src, Makefile.dna.am, access.h, atoiindex.c, block.c, block.h,
	  cmetindex.c, gdiag.c, gmap.c, gmapindex.c, gsnap.c, indexdb.c, indexdb.h,
	  indexdb_dump.c, indexdb_hr.c, indexdb_hr.h, indexdbdef.h, oligo-count.c,
	  oligo.c, oligo.h, oligop.c, oligop.h, outbuffer.c, pmapindex.c,
	  snpindex.c, spanningelt.c, splicetrie.c, stage1.c, stage1.h, stage1hr.c,
	  stage1hr.h, stage2.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h,
	  substring.c, substring.h, util, gmap_build.pl.in, gmap_setup.pl.in: Merged
	  revisions 41633 to 41936 from branches/2011-06-22-index-15mers to allow
	  for 15-mers in genomic indices

2011-06-24 twu

	* stage1hr.c: Computing query_compress if necessary before deciding whether
	  to perform halfmapping GMAP on concordant unique

	* pair.c: Fixed bug in printing start: instead of end: when endtype2 is END

2011-06-22 twu

	* trunk, config.site.rescomp.prd, config.site.rescomp.tst, src, stage1hr.c,
	  util: Merged revisions 41618 to 41631 from
	  branches/2011-06-22-gmap-earlier to move GMAP algorithm before distant
	  splices

	* dynprog.c, dynprog.h, gmap.c, gsnap.c, iit-read.c, intron.c, intron.h,
	  stage1hr.c, stage1hr.h, stage3.c, stage3.h, stage3hr.c, stage3hr.h: Merged
	  revisions 41516 to 41609 from branches/2011-06-14-terminals to try GMAP up
	  to 5 times before terminals if no concordant matches can be found.

2011-06-21 twu

	* dynprog.c, dynprog.h, gmap.c, iit-read.c, iit-read.h, maxent_hr.c, pair.c,
	  pair.h, splicetrie.c, splicetrie.h, stage1hr.c, stage2.c, stage3.c,
	  stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h: Merged
	  revisions 41410 to 41516 from branches/2011-06-14-terminals. Applying GMAP
	  for halfmapping multiple and unpaired unique results. Allowing ambiguous
	  splice ends for GMAP alignments. Fixed computation of splice junctions.
	  Allowing canonical exons to be found for short exons in GMAP alignments.
	  Improved pick_cdna_direction. Made fixes to GSNAP output for GMAP
	  alignments.

2011-06-18 twu

	* src, stage3hr.c: Merged change from branches/2011-06-14-terminals to check
	  for subsumption in paired alignments

2011-06-17 twu

	* samprint.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h:
	  Merged revisions 41223 to 41410 from branches/2011-06-14-terminals to use
	  lexicographic comparison in Stage3end_remove_duplicates; to prevent known
	  splicing from extending past chromosomal bounds; to prevent end indels
	  from going past right end of chromosome; to iterate through both
	  mismatch_positions_cont and mismatch_positions_shift simultaneously in end
	  indel procedures; and to compute nmatches over entire substring.

2011-06-14 twu

	* genome.c: Allowing fill_buffer_simple procedures to fill past the left end
	  of the genome

	* splicetrie.c, splicetrie.h, stage1hr.c: Preventing short-overlap splicing
	  from going past chromosomal boundaries

	* VERSION: Updated version

	* gmap.c: Increased default chimera margin from 20 to 40

	* trunk, src, gmap.c, stage3.c, stage3.h, util: Merged revisions 41185 to
	  41214 from branches/2011-06-13-gmap-merge. Merging chimeric parts when
	  possible.

	* trunk, config.site.rescomp.tst, src, maxent.c, stage3hr.c, util: Merged
	  revisions 40955 to 41210 from releases/internal-2011-06-09

	* trunk, src, dynprog.c, stage1hr.c, stage3hr.c, stage3hr.h, util: Merged
	  revisions 40682 to 41210 from releases/internal-2011-06-06

	* stage1hr.c: Turning on NEW_TERMINALS branch

2011-06-13 twu

	* shortread.c, shortread.h: Added function Shortread_find_overlap

2011-06-12 twu

	* stage1hr.c: Added hooks for computing terminals at zero penalty if
	  necessary

2011-06-10 twu

	* stage3hr.c: Allowing use of Stage3end_substringD and Stage3end_substringA
	  by half splices

	* samprint.c, stage3hr.c, substring.c, substring.h: Merged changes from
	  releases/internal-2011-06-09 to change internal representation of splice
	  from substring1 for donor and substring2 for acceptor. Now substring1 and
	  substring2 are in query order. Needed to avoid problems when a splice was
	  labeled both as donor/acceptor and as acceptor/donor.

	* gsnap.c: Checking for value of adapter stripping flag. Turning on adapter
	  stripping by default.

2011-06-09 twu

	* gmap.c, gsnap.c: Joining worker threads instead of detaching them, so
	  Inbuffer_free can be called safely

	* gmap.c, stage3.c, stage3.h: Merge chimeric parts into a single continuous
	  alignment if possible.

	* VERSION: Updated version

	* gmap.c: Allowing chimera switchpoint to occur one base pair earlier

	* chimera.c: Improved debugging statements

	* genome.c: Made Genome_fill_buffer refer to its local genome argument, not
	  the global one. Needed to fix a bug in snpindex.

	* Makefile.dna.am, Makefile.gsnaptoo.am, compress.h, dynprog.h, outbuffer.c,
	  pair.c, pair.h, pairpool.h, samprint.c, splicetrie.h, splicetrie_build.h,
	  stage3.c, stage3.h: Printing XT flag for translocation information in SAM
	  output of both GMAP and GSNAP

	* chimera.c, chimera.h, gmap.c: Changed algorithm for finding chimera
	  boundary in GMAP to maintain best number of mismatches, and then to find
	  highest splice site probabilities within that range

2011-06-08 twu

	* gsnap.c: Fixed mode flag to take a required argument

	* gmap.c, outbuffer.c, outbuffer.h: Added .transloc split output file for
	  GMAP

	* gmap.c: Hiding -s flag from help output

	* gmap.c, stage3.c, stage3.h: Revising pairarray genomepos coordinates of
	  GMAP chimeras to be chromosomal coordinates, so SAM output is correct

	* pair.c: Implemented hard clipping in SAM output for GMAP chimeras

2011-06-06 twu

	* samprint.c: Made the sign of insertlength for a translocation depend on
	  the concordant substring

	* VERSION: Updated version number

	* stage3hr.c: Fixed computation of pair_insert_length when there is no
	  overlap

	* stage1hr.c: Added debugging statements

	* stage1hr.c: Corrected privatep flags and memory freeing for halfmapping
	  unique cases solved by GMAP.

	* outbuffer.c: Corrected argument list for GMAP when MEMUSAGE is turned on

	* stage3hr.c, stage3hr.h: Added function Stage3end_effective_chrnum

	* samprint.c: For mates that are translocations, using the effective chrnum
	  in printing the mate location

	* resulthr.c: Fixed bug in assigning UNPAIRED_TRANSLOC category

2011-06-05 twu

	* stage1hr.c: Added checks for new pair from Stage3pair_new being NULL

	* stage3.c: Placed a limit on iterations of building ends using known
	  splicing

	* trunk, src, util: Merged property changes on subdirectories from
	  branches/2011-06-04-gmap-genomicseg

	* VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
	  number

	* dynprog.c, gmap.c, pair.c, stage1hr.c, stage3.c, stage3hr.c, stage3hr.h:
	  Merged changes from revision 40649 to 40663 from
	  branches/2011-06-04-gmap-genomicseg to provide correct bounds on GMAP
	  alignment in GSNAP and to improve various issues in alignment, including
	  close indels

2011-06-03 twu

	* stage3hr.c: Changed paired_seenp back to paired_usedp in
	  Stage3end_remove_duplicates, because not all pairs are seen in previous
	  pair_up procedures, resulting in a fatal bug.

	* config.site.rescomp.prd, config.site.rescomp.tst, VERSION: Updated version
	  number

	* gmap.c, splicetrie.c, splicetrie.h, stage1hr.c, stage3.c, stage3.h: Making
	  use of jump_late_p

	* dynprog.c, dynprog.h: Added provision for jump_late_p. Fixed issue in
	  jump_penalty where it was not consistent with jump_penalty_init. Now both
	  procedures compute extend*length.

2011-06-02 twu

	* pair.c: Fixed printing of dashes in GSNAP standard output

	* dynprog.c, dynprog.h, gmap.c, stage1hr.c, stage3.c, stage3.h: Allowing
	  close combinations of insertions and deletions, by allowing onesidegapp to
	  be false and letting extraband_single equal 3 instead of 0. Controlled by
	  --allow-close-indels flag in GMAP. Default set to be on in GMAP and in
	  GSNAP.

	* stage1hr.c: Not performing Stage3end_remove_duplicates on exact matches,
	  which should not be necessary.

	* stage3hr.c: In Stage3end_remove_duplicates, checking against paired_seenp,
	  instead of paired_usedp, for speed.

	* stage3hr.c: Reverted back to revision 40489 of Stage3_pair_up_concordant,
	  which does have the pairing procedures inline.

	* stage3hr.c: Attempt to have different lists for old and new hits, but this
	  seems to slow down the program.

2011-06-01 twu

	* stage3hr.c: Moved parts of Stage3_pair_up_concordant into separate
	  procedures

	* stage1hr.c, stage3hr.c, stage3hr.h: Performing GMAP on concordant unique
	  results where one end is of type TERMINAL

	* gsnap.c: Changed default indel penalty from 1 to 2

	* stage3hr.h: Formatting change

	* stage3hr.c, substring.c, substring.h: Using nchimera_novel in
	  Stage3end_remove_overlaps

	* stage3hr.c: In Stage3pair_remove_overlap, favoring longer insert lengths,
	  if all else is equal

	* pair.c: Moved MD string for SAM output before NH tag to be consistent with
	  other GSNAP SAM output

	* stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Making a
	  distinction between Stage3end_remove_duplicates and
	  Stage3end_remove_overlaps

	* stage3hr.c: Reverted to old method of finding pair insert length, where
	  all substrings are checked.

	* stage1hr.c: In pairing algorithm, moved short-overlap splicing and distant
	  splicing into a single singlesplicing class, so duplicates are handled
	  properly.

	* gsnap.c: Added documentation for --use-tally flag

2011-05-30 twu

	* inbuffer.c, inbuffer.h: Changed nspaces and nread to unsigned int

	* gmap.c, gsnap.c, outbuffer.c, outbuffer.h: Made output buffer size a
	  user-definable parameter

	* gmap.c, gsnap.c, outbuffer.c, outbuffer.h: Made more changes to output
	  thread. Made noutput a local variable. Clearing backlog in ordered output
	  when necessary.

	* stage1hr.c: Added a dinucleotide check for repetitive sequences

	* gmap.c, gsnap.c, mem.c, mem.h, outbuffer.c, request.c, result.c,
	  resulthr.c, sequence.c, shortread.c, stage1hr.c, stage1hr.h, stage3.c,
	  stage3hr.c, substring.c: Replaced LEAKCHECK system with MEMUSAGE system

	* list.c, list.h: Added specialized procedures for using specific memory
	  pools for memusage

	* inbuffer.c: Added comment

	* diagpool.c, diagpool.h, pairpool.c, pairpool.h: Added procedures for
	  reporting memory usage. Using memory from keep portion.

	* outbuffer.c: Cleaned up pthread code for output thread. Added MAXQUEUE to
	  clear out outbuffer.

2011-05-28 twu

	* config.site.rescomp.tst: Added -Wextra to CFLAGS

	* gsnap.c, stage3hr.c: Eliminating hitpair duplicates based on hittypes of
	  ends. Allowing MAPQ score to go as high as 96.

2011-05-27 twu

	* gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h: Implemented mapq-unique-score

	* internal-2011-02-27, AUTHORS, COPYING, INSTALL, MAINTAINER, Makefile.am,
	  NEWS, README, VERSION, acinclude.m4, bootstrap.dna, bootstrap.gmaponly,
	  bootstrap.gsnaptoo, bootstrap.pmaptoo, bootstrap.three, config,
	  acx_mmap_fixed.m4, acx_mmap_variable.m4, acx_pthread.m4, builtin.m4,
	  config.guess, config.sub, expand.m4, fopen.m4, ltmain.sh,
	  madvise-flags.m4, mmap-flags.m4, pagesize.m4, perl.m4, struct-stat64.m4,
	  config.site, config.site.rescomp.prd, config.site.rescomp.tst,
	  configure.ac, dev, maint, memory-check.pl, share, archive.html,
	  index.html, src, Makefile.dna.am, Makefile.gmaponly.am,
	  Makefile.gsnaptoo.am, Makefile.pmaptoo.am, Makefile.three.am,
	  Makefile.util.am, access.c, access.h, add_rpk.c, assert.c, assert.h,
	  atoi.c, atoi.h, atoiindex.c, backtranslation.c, backtranslation.h,
	  bam_tally.c, bamread.c, bamread.h, bigendian.c, bigendian.h, block.c,
	  block.h, bool.h, boyer-moore.c, boyer-moore.h, cappaths.c, changepoint.c,
	  changepoint.h, chimera.c, chimera.h, chop_primers.c, chrnum.c, chrnum.h,
	  chrom.c, chrom.h, chrsegment.c, chrsegment.h, chrsubset.c, chrsubset.h,
	  cmet.c, cmet.h, cmetindex.c, color.c, color.h, comp.h, complement.h,
	  compress.c, compress.h, convert.t.c, cum.c, datadir.c, datadir.h, datum.c,
	  datum.h, diag.c, diag.h, diagdef.h, diagnostic.c, diagnostic.h,
	  diagpool.c, diagpool.h, dibase.c, dibase.h, dibaseindex.c, doublelist.c,
	  doublelist.h, dynprog.c, dynprog.h, except.c, except.h, exonscan.c,
	  extents_genebounds.c, fopen.h, gbuffer.c, gbuffer.h, gdiag.c,
	  geneadjust.c, genecompare.c, geneeval.c, genome-write.c, genome-write.h,
	  genome.c, genome.h, genome_hr.c, genome_hr.h, genomepage.c, genomepage.h,
	  genomeplot.c, genomicpos.c, genomicpos.h, genuncompress.c, get-genome.c,
	  getopt.c,
	  getopt.h, getopt1.c, gmap.c, gmapindex.c, goby.c, goby.h,
	  gregion.c, gregion.h, gsnap.c, gsnap_best.c, gsnap_concordant.c,
	  gsnap_extents.c, gsnap_fasta.c, gsnap_filter.c, gsnap_iit.c,
	  gsnap_multiclean.c, gsnap_splices.c, gsnap_tally.c, gsnap_terms.c,
	  gsnapread.c, gsnapread.h, hint.c, hint.h, iit-read.c, iit-read.h,
	  iit-write.c, iit-write.h, iit_dump.c, iit_fetch.c, iit_get.c,
	  iit_pileup.c, iit_plot.c, iit_store.c, iit_update.c, iitdef.h, inbuffer.c,
	  inbuffer.h, indexdb.c, indexdb.h, indexdb_dibase.c, indexdb_dibase.h,
	  indexdb_dump.c, indexdb_hr.c, indexdb_hr.h, indexdbdef.h, interval.c,
	  interval.h, intlist.c, intlist.h, intlistdef.h, intpool.c, intpool.h,
	  intron.c, intron.h, lgamma.c, lgamma.h, list.c, list.h, listdef.h,
	  littleendian.c, littleendian.h, mapq.c, mapq.h, match.c, match.h,
	  matchdef.h, matchpool.c, matchpool.h, maxent.c, maxent.h, maxent_hr.c,
	  maxent_hr.h, md5-compute.c, md5.c, md5.h, mem.c, mem.h, memchk.c, mode.h,
	  nmath.c, nmath.h, nr-x.c, nr-x.h, oligo-count.c, oligo.c, oligo.h,
	  oligoindex.c, oligoindex.h, oligop.c, oligop.h, orderstat.c, orderstat.h,
	  outbuffer.c, outbuffer.h, pair.c, pair.h, pairdef.h, pairingcum.c,
	  pairingflats.c, pairinggene.c, pairingstrand.c, pairingtrain.c,
	  pairpool.c, pairpool.h, parserange.c, parserange.h, pbinom.c, pbinom.h,
	  pdl_smooth.c, pdldata.c, pdldata.h, pdlimage.c, plotdata.c, plotdata.h,
	  plotgenes.c, plotgenes.h, pmapindex.c, random.c, random.h, rbtree.c,
	  rbtree.h, rbtree.t.c, reader.c, reader.h, reads.c, reads.h, reads_dump.c,
	  reads_store.c, request.c, request.h, result.c, result.h, resulthr.c,
	  resulthr.h, revcomp.c, samflags.h, samprint.c, samprint.h, samread.c,
	  samread.h, scores.h, segmentpos.c, segmentpos.h, segue.c, separator.h,
	  seqlength.c, sequence.c, sequence.h, shortread.c, shortread.h, smooth.c,
	  smooth.h, snpindex.c, spanningelt.c, spanningelt.h,
	  spliceclean.c,
	  spliceeval.c, splicefill.c, splicegene.c, splicegraph.c, splicescan.c,
	  splicetrie.c, splicetrie.h, splicetrie_build.c, splicetrie_build.h,
	  spliceturn.c, splicing-scan.c, splicing-score.c, stage1.c, stage1.h,
	  stage1hr.c, stage1hr.h, stage2.c, stage2.h, stage3.c, stage3.h,
	  stage3hr.c, stage3hr.h, stopwatch.c, stopwatch.h, subseq.c, substring.c,
	  substring.h, table.c, table.h, tableint.c, tableint.h, tableuint.c,
	  tableuint.h, tally.c, tally.h, tally_exclude.c, tally_expr.c, tallyadd.c,
	  tallyflats.c, tallygene.c, tallyhmm.c, tallystrand.c, translation.c,
	  translation.h, trial.c, trial.h, types.h, uintlist.c, uintlist.h,
	  uinttable.c, uinttable.h, svncl.pl, tests, align.test.in, align.test.ok,
	  coords1.test.in, coords1.test.ok, defs, fa.iittest, iit.test.in,
	  iit_get.out.ok, iittest.iit.ok, map.test.ok, setup.genomecomp.ok,
	  setup.idxpositions.ok, setup.ref3positions.ok, setup1.test.in,
	  setup2.test.in, ss.chr17test, ss.her2, util, dbsnp_iit.pl.in,
	  ddsgap2_compress.pl, fa_coords.pl.in, gmap_build.pl.in,
	  gmap_compress.pl.in, gmap_process.pl.in, gmap_reassemble.pl.in,
	  gmap_setup.pl.in, gmap_uncompress.pl.in, gmap_update.pl.in,
	  gsnap-fetch-reads.pl, gsnap-fetch-reads.pl.in, gsnap-remap.pl,
	  gsnap-remap.pl.in, md_coords.pl.in, psl_introns.pl.in, psl_splices.pl.in,
	  psl_splicesites.pl.in, sam_merge.pl.in, sam_restore.pl.in,
	  sim4_compress.pl, sim4_uncompress.pl, spidey_compress.pl, whats_on, trunk:
	  Restored gmap trunk subdirectory

	* VERSION: Updated version

	* substring.c: Computing MAPQ on entire substring, not on trimmed portion

	* dynprog.c, dynprog.h, gmap.c, stage1hr.c, stage3.c, stage3.h: For genomic
	  GMAP alignments in GSNAP, not assigning any canonical reward, not
	  computing pairs_rev, and not scoring introns.

	* splicetrie.c, splicetrie.h, stage1hr.c: Removed unused parameter
	  splicetypes

	* gsnap.c, stage1hr.c, stage1hr.h: Removed unused parameters, including
	  queryptr and queryrc

	* src, Makefile.dna.am, Makefile.gsnaptoo.am, dynprog.c, dynprog.h,
	  genome.c, genome.h, genome_hr.c, genome_hr.h, gmap.c, gsnap.c, mapq.c,
	  mapq.h, maxent_hr.c, maxent_hr.h, oligoindex.c, outbuffer.c, outbuffer.h,
	  pair.c, pair.h, samprint.c, samprint.h, splicetrie.c, splicetrie.h,
	  splicetrie_build.c, splicetrie_build.h, stage1hr.c, stage1hr.h, stage3.c,
	  stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h: Merged
	  revisions 40182:40234 from branches/2011-05-27-no-block-vars to reduce
	  number of parameters

2011-05-26 twu

	* VERSION: Updated version number

	* dynprog.c, dynprog.h, splicetrie.c, splicetrie.h, stage1hr.c, stage3.c,
	  stage3.h: Restored complete searching of known splicesites for dynamic
	  programming of ends

	* stage1hr.c, dynprog.c, dynprog.h, splicetrie.c, splicetrie.h, stage3.c,
	  stage3.h: Created hybrid procedure for performing dynamic programming at
	  5' and 3' ends with known splicing

	* dynprog.c, dynprog.h, splicetrie.c, splicetrie.h, stage3.c, stage3.h:
	  Wrote faster procedure for performing dynamic programming at 5' and 3'
	  ends with known splicing, but does not handle distal indels.

	* inbuffer.c, gsnap.c, request.c, request.h, shortread.c, shortread.h:
	  Performing chopping of adapters only after paired-end alignment fails to
	  give concordant or paired result.

	* stage1hr.c: Removed query as a parameter. Changed knownsplice limits.

	* uintlist.c, uintlist.h: Added procedure Uintlist_to_string

	* mapq.c, mapq.h, stage3hr.c, stage3hr.h, substring.c, substring.h: Removed
	  query as parameter to procedures

	* genome_hr.c, genome_hr.h: Removed query as parameter to some procedures

2011-05-25 twu

	* stage3hr.c: Moved assertions about private5p and private3p to correct place

	* gsnap.c, inbuffer.c, request.c, request.h, shortread.c, shortread.h: When
	  potential paired-end adapter is found, checking alignment first without
	  chopping adapters, and then if no concordant or paired alignments are
	  found, then re-aligning with adapters chopped.

	* stage3hr.c: Removing only duplicates that have not been used yet in a pair

	* stage1hr.c: Doing Stage3pair_privatize before Stage3pair_eval

	* pair.c: Added NH tag for GMAP alignments in GSNAP

	* get-genome.c: Enabling re-use of contig_iit

	* shortread.c: Fixed bug in printing pairedend fasta. Was printing both
	  revcomp and forward sequence for queryseq2.

2011-05-24 twu

	* stage1hr.c, stage3hr.c, stage3hr.h: Reduced amount of memory copying in
	  making Stage3pair_T objects

	* get-genome.c, parserange.c, parserange.h: Made operation of get-genome
	  from stdin more efficient by making only one open of chromosome_iit and
	  contig_iit

	* VERSION: Updated version

	* gsnap.c: Added message to indicate when alignment is starting

	* stage1hr.c: Doing pairing only when individual alignments are performed

	* outbuffer.c: Fixed debugging statement

	* gmap.c: Added information to --help output on the -f flag about other
	  output types

	* gsnap.c: Changed default value of genome_unk_mismatch_p to be 1

2011-05-23 twu

	* samprint.c: Fixed sign of insert size when read and mate have identical
	  chrpos

	* VERSION: Updated version

	* samprint.c: Added NH flag to indicate number of paths

	* gsnap_concordant.c: Defined concordance to allow for overlapping reads

	* stage1hr.c: Introduced DEBUG4K for known doublesplicing

	* dynprog.c, dynprog.h, gmap.c, intlist.c, intlist.h, splicetrie.c,
	  splicetrie.h, stage3.c, stage3.h: Removing duplicate results from
	  splicetrie when SNPs are allowed

	* samprint.c: Fixed bug in printing cigar for two-thirds shortexon on minus
	  strand

	* gmap.c: Removed include of mode.h

	* Makefile.dna.am, Makefile.gsnaptoo.am, atoi.c, atoi.h, atoiindex.c,
	  genome_hr.c, genome_hr.h, gmap.c, gsnap.c, indexdb.c, indexdb.h, mode.h,
	  stage1hr.c, stage1hr.h, substring.c, substring.h: Using Mode_T instead of
	  cmetp. Incorporated atoi mode.

	* oligo.c, oligo.h: Removed Oligo_setup

	* convert.t.c: Initial import into SVN

	* gsnap_fasta.c: Added code stub for handling BAM input

	* gsnap.c: Added a thread-specific key for storing the request, and
	  accessing it with the signal handler, which no longer throws an exception.

	* stage3.c: Checking for case in distalmedial comparison where medial
	  location extends past given genomicseg.

	* stage1hr.c: Replaced indirect function calls with direct calls to
	  read_oligos_cmet and read_oligos_standard.

	* oligo.c: Replaced indirect function calls with direct calls to oligo_read
	  and oligo_revise. Not handling dibasep anymore.

	* genome_hr.c, genome_hr.h: Replaced indirect function calls with static
	  inline procedures

2011-05-22 twu

	* stage1.c: Removed dibase parameter in calling Reader_new

	* stage3hr.c: Removed assertion check for plusp equality in
	  pair_insert_length. For splice translocations, redefining plusp based on
	  substring_for_concordance.

	* gsnap.c: Fixed output of query when exception occurs

	* except.c: Fixed handling of exceptions by removing unnecessary call to
	  Except_advance_stack.

	* gsnap.c: Removed dibasep and cmetp as parameters.

	* stage1hr.c, stage1hr.h: Removed dibasep as a parameter. Also eliminated
	  cmetp as a parameter.

	* substring.c, substring.h, stage3hr.c, stage3hr.h, splicetrie.c,
	  splicetrie.h, reader.c, reader.h, mapq.c, mapq.h, genome_hr.c,
	  genome_hr.h: Removed dibasep as a parameter

	* oligo.c, oligo.h: Removed dibasep as a parameter. Added setup procedure
	  to assign procedure for dibase operation.

2011-05-21 twu

	* genome_hr.c, genome_hr.h, gsnap.c, mapq.c, mapq.h, splicetrie.c,
	  splicetrie.h, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
	  substring.h: Setting block_diff procedure in genome_hr.c during setup, and
	  removing many uses of the cmetp variable.

	* dynprog.c: Include header for splicetrie.h

2011-05-20 twu

	* cmet.c, cmet.h, genome_hr.c: Moved mark_a, mark_c, mark_g, and mark_t data
	  and procedures from cmet.c to genome_hr.c

	* Makefile.gsnaptoo.am: Made files for GMAP and GSNAP match those in
	  Makefile.dna.am

	* Makefile.dna.am: Minor rearrangement of filenames

	* samprint.c: Fixed printing of SAM output for translocations

	* stage3hr.c: Using both hit5 and hit3 end points in hitpair_equal_cmp. Put
	  tally ahead of score in ranking subsumed hitpairs.

	* splicetrie.c: Constraining max_mismatches_allowed to be less than
	  one-third of the end length

	* substring.c: Added header file for pair.h

	* stage3hr.c: Not using absdifflength bingo at all

	* gsnap.c: Removed references to pairlength deviation

	* stage3.c: In peel_leftward and peel_rightward, when running into a second
	  gap, transferring endgappairs first before transferring peeled pairs.

	* dynprog.c: Using List_push_existing instead of Pairpool_push_existing to
	  save on memory

	* stage3hr.c: Using only best splice within constraints and not pairlength
	  in resolving ambiguous inside splices

	* stage1hr.c: Fixed definition of collect_all_p for shortexons

	* splicetrie.c, splicetrie.h: Removed old code. Implemented collect_all_p
	  in Splicetrie_search_left and Splicetrie_search_right procedure.

2011-05-19 twu

	* stage1hr.c, stage3hr.c, stage3hr.h: Requiring concordant pairs to have a
	  non-zero insert length

	* stage1hr.c, stage3hr.c, stage3hr.h: Recording number of ambiguous matches
	  after known splicing, and subtracting from nmatches when ambiguous inner
	  splicing yields no candidates.

2011-05-18 twu

	* gmap.c, stage1hr.c, stage2.c, stage2.h, stage3.c: Allowing stage 2 to
	  favor either left or right part of genomicseg

	* gsnap.c: Using the same value for middle and end indel penalties. Changed
	  flags to allow only one indel penalty to be specified.

	* stage3hr.c: In Stage3pair_remove_duplicates, allowing for ties within
	  cluster. Using hittype to rank hitpairs in hitpair_equal_cmp, but not for
	  distinguishing hitpairs in hitpair_equal_no_hittype_cmp.

	* stage3hr.c: Fixed bug in using hitpair_equal_cmp

	* stage3hr.c: Using same procedure, hitpair_equal_cmp (previously called
	  hitpair_position_cmp), both for sorting and for recognizing equal hitpairs.

	* stage3hr.c: In Stage3pair_remove_duplicates, added hittype in sorting and
	  removing exact duplicates. Going through clusters separately from left
	  and right and checking for subsumption against initial alignment, not the
	  previous one.

2011-05-17 twu

	* stage3hr.c: In Stage3pair_remove_duplicates, using tally within clusters.
	  Removing absdifflength bingo from Stage3pair_optimal_score.

	* README, config.site, configure.ac: Setting default value of MAX_READLENGTH
	  to be 200

	* gsnap.c: Providing value of MAX_READLENGTH in printing --version output

	* dibase.c, inbuffer.c, mapq.c, shortread.c, stage3hr.c, substring.c:
	  Removed unnecessary includes of stage1hr.h, needed previously to obtain
	  MAX_READLENGTH

	* Makefile.dna.am: Providing MAX_READLENGTH to gsnap.
	  Provided files for bam_fasta.

	* README, config.site, configure.ac, Makefile.gsnaptoo.am, dibase.c,
	  gsnap.c, inbuffer.c, mapq.c, reads_store.c, samprint.c, shortread.c,
	  stage1hr.c, stage1hr.h, stage3hr.c, substring.c: Changed MAX_QUERYLENGTH
	  to MAX_READLENGTH and allowing value to be defined as an argument to
	  configure

	* samprint.c, shortread.c, shortread.h: Printing chopped primers in SAM
	  output

	* stage3hr.c: Moved code for resolving inside ambiguous splices to separate
	  procedures. Allowing mate of a GMAP alignment to resolve its inside
	  ambiguous splice.

	* dynprog.c, dynprog.h, gmap.c, splicetrie.c, splicetrie.h, stage1hr.c,
	  stage3.c, stage3.h: Limiting region of known splice extension for GMAP
	  alignments in GSNAP involving paired-end reads. Region now cannot extend
	  past mate.

2011-05-16 twu

	* stage3hr.c: Allowing overlapping of paired-end reads when resolving
	  ambiguous splices on insides

	* dynprog.c, dynprog.h, gmap.c, stage1hr.c, stage3.c, stage3.h: Counting
	  ambiguous end matches in stage 3 alignment

	* substring.c: Hid debugging statement

	* gsnap.c: Using user-provided dir for tally IIT when available

	* substring.c, substring.h, gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h:
	  Implemented a multiclean procedure using a tally IIT file

2011-05-15 twu

	* stage3hr.c: Not using hitpair type in making comparisons across multiple
	  alignments. Using subsumption instead of overlap in
	  Stage3pair_remove_duplicates.

2011-05-14 twu

	* stage3hr.c: Changed removal of Stage3pair_T duplicates within overlapping
	  clusters from an O(n^2) algorithm to an O(n) algorithm

2011-05-13 twu

	* stage1hr.c: Added debugging statement

	* stage3hr.c: Assigning plusp for translocations based on overall
	  genomestart and genomeend. Setting substring_low and substring_high for
	  translocations to be the part that is concordant.

	* samprint.c: Fixed bug in printing of translocations

	* gsnap.c, genome_hr.c, genome_hr.h: Added options --query-unk-mismatch and
	  --genome-unk-mismatch, and made both default false, meaning that query N
	  and genome N no longer count as mismatches

2011-05-12 twu

	* samprint.c, stage3hr.c, stage3hr.h: Handling case where clipping of
	  overlap removes entire alignment

	* pair.c: Fixed bug in number of dashes in GSNAP output on deletions

	* substring.c: Fixed GSNAP and SAM output for bisulfite alignments

	* stage1hr.c: Eliminating cases with bad GMAP alignments, either with
	  non-canonical splices or with too many mismatches

	* stage3.c: On extend_ending5 and extend_ending3, returning dynamic
	  programming results, even if finalscore is negative

2011-05-11 twu

	* snpindex.c: Checking for presence of IIT file at destination, and
	  providing a better reminder message

	* snpindex.c: Added reminder message at end to install IIT file

	* gsnap.c: Added warnings under --use-cmet flag if cmet index files are not
	  present

	* substring.c: Removed references to MAX_END_DELETIONS

	* stage1hr.c, substring.c: Allocating gbuffer when it exceeds the amount
	  allocated statically

	* stage3hr.c, substring.c, substring.h: Added preference in removing
	  duplicates for known splice sites over novel ones

	* gsnap.c:
	  Fixed documentation for --clip-overlap flag

	* outbuffer.c, pair.c, pair.h, samprint.c, samprint.h, stage3hr.c,
	  stage3hr.h, substring.c, substring.h: Performing search for correct
	  hardclipping boundary. Computing chrpos and mate_chrpos in advance of
	  printing SAM output. For Pair_binary_search, performing forward and
	  backward search of middlei to avoid gaps. Fixed computation of overlap in
	  some cases involving GMAP alignments.

2011-05-10 twu

	* samprint.c: Made logic of print_cigar follow that of print_md_string

	* outbuffer.c, samprint.c, shortread.c, stage3hr.c: Fixes made to MD string
	  with hard clipping of overlaps

2011-05-09 twu

	* pair.c: Fixed infinite loops in binary search procedures

	* resulthr.c: Removed unused variable

	* pair.c, pair.h, samprint.c: Performing hard clipping by computing a
	  subsequence on pairarray

	* stage3hr.c: Allowing for NULL arguments in Stage3end_substring_low and
	  Stage3end_chrnum, now possible in samprint.c procedures

	* stage1hr.c: Rewrote calculation for genomicseg

	* stage3hr.c: Fixed calculation of insert length involving GMAP alignment

	* samprint.c, pair.c: Fixed printing of SAM chromosomal pos

	* pair.c, pair.h, samprint.c: Further implementation of hard clipping for
	  overlapping paired-end reads

2011-05-07 twu

	* pair.c: Fixed bug caused by wrong order of parameters

	* gsnap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, samprint.c, samprint.h,
	  stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Added code for
	  hardclipping overlaps between paired ends. Improved computation of insert
	  lengths involving GMAP alignments. Increased default shortsplicedist to
	  200000.

	* gmap.c: Changed default shortsplicedist to 200000

	* get-genome.c: Added -E option for printing exons for gene maps

	* genome.c: Made changes to perror statements

2011-05-05 twu

	* samprint.c: Made changes to MD string to handle hard clipping

	* stage1hr.c: Reduced genomicseg for GMAP from pairmax + shortsplicedist to
	  just pairmax

	* dynprog.c: Added a new debugging category for known splicing at ends

	* outbuffer.c, samprint.c, samprint.h, stage3hr.c, stage3hr.h: Computing
	  cigar strings to allow for hard clipping

	* pair.c, pair.h, stage3.c: Fixed SAM flags when printing GMAP alignment in
	  GSNAP

	* bam_tally.c: Made bam_tally work on entire genome

2011-05-04 twu

	* README, chrom.c, chrom.h, gmapindex.c, fa_coords.pl.in, gmap_build.pl.in,
	  gmap_process.pl.in, gmap_setup.pl.in: Made changes for fa_coords to sort
	  chromosomes in .coords file, for gmap_process to provide universal
	  coordinate information to gmapindex, and for gmapindex to sort based on
	  this order. Ignoring leading "chr" in sorting chromosomes.

	* chrom.c: Ignoring leading "chr" in chromosome name for sorting purposes

2011-05-03 twu

	* stage1hr.c: Fixed bug in calling Substring_chrhigh

2011-05-02 twu

	* gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h:
	  Storing both chroffset and chrhigh in Stage3end_T and Substring_T objects.
	  Checking chromosomal boundaries when performing GMAP algorithm in GSNAP.

2011-05-01 twu

	* gmap_build.pl.in: Added Id property

	* gmap_build.pl.in: Added a -w flag for sleeping between steps

2011-04-27 twu

	* splicetrie.c: Removed unused variables

	* goby.c, gsnap.c, outbuffer.c, pair.c, pair.h, resulthr.c, resulthr.h,
	  samprint.c, samprint.h, stage1hr.c, stage1hr.h, stage3.c, stage3.h,
	  stage3hr.c, stage3hr.h: Changes in handling of translocations: (1) Created
	  new "_transloc" output files. (2) Removing XT flag from SAM output. (3)
	  Enforcing reported translocations to be unique. (4) Creating a new
	  category for translocations in pair_up procedure.

	* gmap.c: Fixed bug in calling Genome_blocks on a user-provided segment

	* diag.c: Added debugging information about minactive and maxactive at ends
	  of query

2011-04-26 twu

	* stage2.c: Not checking for pct_coverage or ncovered when querylength < 150

	* tally_expr.c: Removed dependence on pre-computed total in tally IIT file

	* README: Showing examples from both hg18 and hg19 in retrieving known
	  splicesite tracks from UCSC

	* gsnap.c, stage1hr.c, stage1hr.h: Removed min_localsplicing_end_matches
	  parameter. For short overlaps, now checking only that endlength >=
	  support.

	* pair.c: Not printing first read or second read bit in GMAP samse output

2011-04-25 twu

	* shortread.c: Fixed bug in printing queryseq in SAM output when
	  hardclipping is present

	* samprint.c: Formatting changes

	* pair.c: Removed warning message when splicesite not found in GMAP

	* gsnap.c: Changed documentation for -N

2011-04-22 twu

	* parserange.c: Added warning message if divstring not found in IIT file

	* get-genome.c, iit_get.c: Removed extra linefeed introduced in old IIT
	  versions

	* stage1hr.c: Added check for NULL pairarray

2011-04-21 twu

	* Makefile.dna.am, dynprog.c, dynprog.h, gmap.c, indexdb_hr.c, sequence.c,
	  stage3.c: Made changes so PMAP would compile

	* trunk, src, Makefile.dna.am, dynprog.c, dynprog.h, genome.c, genome.h,
	  gmap.c, goby.c, goby.h, gregion.c, gregion.h, gsnap.c, iit-read.c,
	  maxent.c, maxent_hr.c, outbuffer.c, pair.c, pair.h, pairdef.h, resulthr.c,
	  samprint.c, samprint.h, sequence.c, sequence.h, shortread.c, shortread.h,
	  splicetrie.c, splicetrie.h, stage1.c, stage1.h, stage1hr.c, stage1hr.h,
	  stage3.c, stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h,
	  translation.c, translation.h, util: Merged revisions 38122 to 38539 from
	  branch 2011-04-14-halfmapping-gmap

2011-04-15 twu

	* bam_tally.c: Changed colon to tab when printing chromosomal coordinates

2011-04-14 twu

	* trunk, src, genome_hr.c, gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c,
	  stage3hr.h, util: Merged revisions 37902 to 38171 from
	  branches/2011-04-10-end-indels

	* bam_tally.c: Fixed bug in iterating through list in position_printable_p

	* bam_tally.c: Added -n and -X flags to control depth and variant strands
	  required. Added -B flag to control block format output.

	* bam_tally.c: Changed -A flag into separate -C and -Q flags to print
	  details about cycles and quality scores

	* stage3hr.c: Allowing optimal_score procedures to consider terminal scores
	  if all alignments are terminal

	* trunk, VERSION, src, dynprog.c, dynprog.h, gmap.c, intron.c, intron.h,
	  maxent.c, maxent.h, stage3.c, stage3.h, util: Merged 38078:38120 from
	  branch 2011-04-13-gmap-knownsplicing

2011-04-13 twu

	* snpindex.c: Added variables for bigendian machines

2011-04-10 twu

	* stage1hr.c: Using function Genome_fill_buffer_blocks for debugging

	* config.site.rescomp.prd: Revised version number

	* genome.c, genome.h: Added function Genome_fill_buffer_blocks

	* substring.c: Made fixes to printing of SNPs in GSNAP output

	* README, configure.ac, Makefile.am, psl_introns.pl.in: Added psl_introns
	  program. Explaining in README file about known site-level and known
	  intron-level splicing.

	* gmap.c, outbuffer.c, pair.c, pair.h, stage3.c, stage3.h: Added option for
	  -f introns output

2011-04-09 twu

	* gsnap.c, iit-read.c, iit-read.h, interval.c, interval.h, splicetrie.c,
	  splicetrie.h: Added code to allow known splicing based on introns, rather
	  than splice sites

2011-03-29 twu

	* bam_tally.c: Fixed bug in checking too early for chrpos_high > alloc_high

	* VERSION: Updated version number

2011-03-28 twu

	* cmetindex.c, snpindex.c: Made program work on bigendian machines

	* stage3hr.c, stage3hr.h, substring.c, substring.h: Moved definition of
	  Hittype_T from substring.h to stage3hr.h

	* trunk, VERSION, config.site.rescomp.tst, src, goby.c, goby.h, gsnap.c,
	  inbuffer.c, inbuffer.h, outbuffer.c, samprint.c, samprint.h, stage3hr.c,
	  substring.c, util: Merged changes 36746 through 37246 from branch
	  2011-03-17-goby-paired-end

2011-03-26 twu

	* resulthr.c: Fixed bug in assignment of translocationp for single-end reads

2011-03-25 twu

	* gmap.c, gsnap.c: Added clarification about memory mapping in message

	* bamread.c: Fixed a memory leak caused by unnecessary copying of read
	  string from BAM.

	* Makefile.dna.am, bamread.c, bamread.h, gsnap_splices.c: Added bam_splices
	  program

	* bam_tally.c: Fixed bug with non-initialized count_plus or count_minus

	* bam_tally.c: Fixed bug in extracting genomic reference nt from Genome_T.
	  Added option for signed counts.

	* configure.ac, Makefile.dna.am: Added Automake conditional to control when
	  bam_tally can be made

	* bamread.c: Added compiler directives to protect code when samtools is not
	  available

	* bam_tally.c: Fixed usage statement

	* bam_tally.c: Added --pairmax option

	* bam_tally.c: Fixed help message and added some printing options

	* bamread.c: Stopped printing of chromosomes to stdout

	* Makefile.dna.am: Added compiler commands for bam_tally

	* bam_tally.c: Added clipping at ends of requested genomic region

	* samread.c, samread.h: Added function Samread_print_cigar

	* bam_tally.c: Implementation of working version

2011-03-24 twu

	* bam_tally.c: Implementation of overall alloc and block structure

	* bam_tally.c, bamread.c, bamread.h: Implemented memory freeing procedures

	* bam_tally.c: Initial import into SVN

2011-03-18 twu

	* gsnap.c: Rearranged some lines

	* gmap.c: Added --quality-protocol flag and 'j' flag to set the quality
	  print shift

	* gmap.c, outbuffer.c, outbuffer.h: Added --quiet-if-excessive option to GMAP

	* gsnap.c, outbuffer.c: Adding globals for invert_first_p and
	  invert_second_p to stage3hr.c

	* stage1hr.c, stage3hr.c, stage3hr.h: Determining effective_chrnum,
	  genomicstart, and genomicend for splice translocations based on inner
	  substrings. No longer generating copies for each substring when chrnum ==
	  0.

	* outbuffer.c: Changed a 0 to a false

	* README: Added explanation of dbsnp_iit program and output reporting for
	  translocations

	* configure.ac, Makefile.am, dbsnp_iit.pl.in: Added dbsnp_iit program

	* stage3.h: Fixed declaration

	* resulthr.c: Put debugging statements into debug macro

	* outbuffer.c, pair.c, pair.h, resulthr.c, resulthr.h, samprint.c,
	  samprint.h, stage3.c, stage3hr.c, stage3hr.h: Removed separate fp_transloc
	  file. Adding (transloc) string to GSNAP output, and XT flag to SAM output
	  for translocation results.

	* substring.c: Added error message for endtype_string

	* outbuffer.c, resulthr.c, resulthr.h, samprint.c, stage3hr.c: Removed
	  SINGLEEND_TRANSLOCATION and PAIREDEND_TRANSLOCATION types

2011-03-11 twu

	* README: Added more information about the --gunzip option, about the
	  command-line usage for paired-end reads, and about extended FASTA inputs.

	* README: Added information about -s flag for psl_splicesites

	* VERSION: Changed version number

	* snpindex.c: Added option to limit number of warning messages

2011-03-10 twu

	* snpindex.c: Fixed warning messages that previously reported the wrong SNPs
	  and coordinates that were problematic

	* gsnap.c, mem.c, outbuffer.c, shortread.c: Enabled LEAKCHECK in GSNAP to
	  check for memory leaks

	* gmap_build.pl.in: Fixed bug in creating .maps subdirectory

2011-03-09 twu

	* VERSION: Updated version

	* README: Added information about latest syntax for running snpindex

2011-03-08 twu

	* stage1hr.c, stage1hr.h: Passing new parameter nsplicepartners_skip

	* stage3hr.c, stage3hr.h: Eliminating splice translocations if a
	  non-translocation exists

	* outbuffer.c, outbuffer.h, resulthr.c, resulthr.h, samprint.c, samprint.h:
	  Providing a translocation result type and printing split output to a new
	  file

	* indexdbdef.h: Removing unused macro definition

	* gsnap.c, splicetrie.c, splicetrie.h: Providing a minimum intron length in
	  building the splicetrie

	* snpindex.c: Providing user options to specify sourcedir and destdir

	* cmetindex.c: Fixed amount of space allocated for filename

2011-03-06 twu

	* gmap.c, gsnap.c: Changed warning message about memory mapping

	* stage1hr.c: Removed a conversion for bigendian machines in
	  Batch_init_simple.

	* spanningelt.c: Added a necessary conversion for bigendian machines

	* indexdb.c, indexdb_hr.c: Making FILEIO access to positions look the same
	  as MMAPPED access for bigendian machines

	* bigendian.c, littleendian.c: Using unsigned char instead of char

2011-03-04 twu

	* inbuffer.c: Initializing value of pc_linefeeds_p

	* gmap.c, gsnap.c: Changed advice message about -B 3 and -B 4

	* iit_store.c: Using 2^32-1 as a constant instead of 2^32

	* stage1hr.c: Added additional places where Bigendian_convert_uint should be
	  applied

	* bigendian.c: Removed monitoring message

2011-03-03 twu

	* stage3hr.c: Sorting paired-end reads by insert length

	* gsnap.c: Removed --dibase flag from --help output

	* inbuffer.c, shortread.c, shortread.h: Added field for pc_linefeeds_p.
	  Changed variable name from pc_line_feeds_p.

	* gsnap.c, inbuffer.c, inbuffer.h, shortread.c, shortread.h: Added option to
	  strip PC line feeds from input

2011-03-02 twu

	* sequence.c: Enabled GMAP to read FASTQ files

	* gsnap_fasta.c: Made the default behavior to print all sequences

	* pair.c: Fixed problem with double tabs in SAM output. Now printing NM tag
	  in SAM output.

	* config.site.rescomp.prd, config.site.rescomp.tst, VERSION: Changed version
	  number

	* sam_merge.pl.in: Printing nomapping lines from original GSNAP output

	* configure.ac, Makefile.am, sam_restore.pl.in: Added program sam_restore

	* spliceturn.c: Added option to print splices, rather than splicesites

	* samread.c, samread.h: Added functions to print modified SAM reads

	* gsnap_splices.c: Trusting SAM splice directions by default. Added an
	  explicit variable to indicate if the splice is canonical.

	* gsnap_multiclean.c: Added a flag -C to pick either concordant or
	  non-concordant behavior. Allowing nonmapped queries to be printed, if no
	  other alignment is available.

	* gsnap_filter.c: Modified to print nonmapped SAM entries. Removed
	  inconsistencies in printing some lines to fp_one and some to fp_many.

	* gsnap_fasta.c: Prints the query from the SAM line with the most matches
	  (i.e., not hardclipped). Added an option --oneway to print both ends of a
	  paired-end read in the same direction.

	* gsnap_concordant.c: Modified to print nonmapped SAM entries, and to add
	  mate information for concordant pairs. Code follows that in
	  gsnap_multiclean.c

2011-03-01 twu

	* splicetrie.c: Increased value of MAX_DUPLICATES from 100 to 1000

	* gsnap.c: Checking for case where splicesites_iit has no sites
	  corresponding to given genome

	* pair.c: Fixed MD string to print genomic nt, rather than query nt

2011-02-28 twu

	* pair.c: Implemented MD string in SAM output for GMAP

2011-02-26 twu

	* spliceturn.c: Added --new flag to print only new splices

	* spliceclean.c: Added break after case 0 in getopt

	* substring.c, substring.h: Fixed insert length calculation to be based on
	  genomicstart and genomicend. Removed querylength_adj field.

	* stage3hr.c: Preventing terminal alignments from setting minscore in
	  Stage3_optimal_score and Stage3pair_optimal_score

	* stage1hr.c: Fixed and simplified calculation of floor_left and floor_right
	  for diagonals

2011-02-25 twu

	* configure.ac, Makefile.am, psl_splices.pl.in: Added psl_splices program

	* sam_merge.pl.in: Including inserted query segment in computing number of
	  matches.

	* spliceturn.c: Eliminating only new splices (numeric labels).
	  Searching for nearest surrounding known splices.

	* spliceclean.c: Printing original label for each splice

	* gsnap_splices.c: Added flag to require canonical splices. Checking for
	  canonical dinucleotides regardless of sam XS string.

	* samprint.c: Fixed bug in MD string for shortexon. Added debugging
	  statements.

	* stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Added
	  querylength_adj to compensate for indels when computing insert length

	* pair.c: Fixed bug in tokens where Ilength was being printed instead of
	  Dlength

	* spliceturn.c: Printing intron lengths. Added labels to output when
	  splices are turned.

	* shortread.c: Requiring first file to have "/1" and second file to have
	  "/2" if slashes are present

	* samread.c: Allowing for XS:A:? to indicate unknown splice direction

	* pair.c: Changed minimum intron length for non-concordant splice in cigar
	  string from 100 to 20

	* gsnap_tally.c: Changed default handling of quality scores to be Sanger
	  protocol

	* gsnap_splices.c: Using available XS flag in SAM output. Printing
	  non-directional splices as both forward and reverse. Printing
	  non-canonical dinucleotide pairs.

	* gsnap_multiclean.c: Skipping, rather than aborting on, concordant pairs
	  with different numbers of hits, due to translocations

	* gmap.c: Changed flag for printing noncanonical splices in cigar string

	* samprint.c, stage3hr.c, substring.c, substring.h: Fixed MD string to
	  exclude part that is soft-clipped

	* spliceclean.c, tally.c, tally.h: Added default-count option. Fixed bug
	  where splice occurred beyond extents. Fixed some memory leaks.

2011-02-24 twu

	* psl_splicesites.pl.in: Added missing parentheses

	* configure.ac, Makefile.am, sam_merge.pl.in: Added sam_merge program

	* README: Added description of gmap_build

	* gsnap.c, mapq.c: Made sanger the default for quality protocol

2011-02-23 twu

	* shortread.c: Fixed bug in chopping paired-end reads of different lengths

	* sequence.c: Fixed bug in skipping initial '<', '>', or '+' in quality
	  string

	* gsnap_concordant.c: Requiring chr strings in the two ends to be equal

	* pair.c: Printing XS:A:? when splice direction is not known, because it is
	  non-canonical

	* sequence.c, sequence.h: Added compiler directives to hide quality string
	  from PMAP

	* gsnap_fasta.c: Printing extended FASTA with quality strings for GMAP output

	* gmap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, sequence.c, sequence.h,
	  stage3.c, stage3.h: Implemented ability to read extended FASTA with
	  quality strings and print them in SAM format

	* mapq.c: Added recommendation to use --quality-protocol=sanger

	* gsnap_concordant.c, samread.c, samread.h: Printing altered mapq scores

	* gmap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, stage3.c, stage3.h:
	  Added flag to print non-concordant splices as N in SAM cigar string,
	  rather than as D

	* gsnap_concordant.c, samread.c, samread.h: Added calculation and printing
	  of insert length

	* gsnap.c: Added information about quality protocols to --help output

2011-02-21 twu

	* shortread.c: Fixed bugs in handling gzipped paired-end files

	* exonscan.c, extents_genebounds.c, gdiag.c, geneadjust.c, get-genome.c,
	  gsnap_extents.c, gsnap_splices.c, gsnap_tally.c, gsnap_terms.c,
	  iit_plot.c, pairinggene.c, segue.c, snpindex.c, splicefill.c,
	  splicegene.c,
	  splicegraph.c, splicescan.c, spliceturn.c, splicing-scan.c,
	  splicing-score.c, tally_expr.c, tallygene.c: Changed parameter for
	  Genome_new from batchp to access mode

	* psl_splicesites.pl.in: Added -R flag to report non-canonical splice sites

	* gmap_build.pl.in: Fixed name of maps subdirectory

	* sequence.c, shortread.c: Fixed bugs further in closing NULL file pointer

	* psl_splicesites.pl.in: Fixed bugs in syntax. Added -s flag to specify
	  start column.

	* util, fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Merged -r
	  35283:35500 from branches/gmapindex-multifile/util

	* Makefile.am: Added psl_splicesites to the list of files to be cleaned

	* README, configure.ac, Makefile.am, psl_splicesites.pl.in: Added
	  psl_splicesites program to process UCSC alignment tracks into a splicesite
	  file.

	* sequence.c, shortread.c: Fixed bug from trying to close a NULL file pointer

	* trunk, README, VERSION, config.site.rescomp.tst, maint, memory-check.pl,
	  src, gsnap.c, interval.c, mapq.c, mapq.h, samprint.c, splicetrie.c,
	  splicetrie.h, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c,
	  substring.h: Merged revisions 35346:35468 from branches/tentative-splices
	  to store and resolve ambiguous splice ends

	* outbuffer.c: Added warning messages when an output file cannot be written

2011-02-16 twu

	* mapq.c: Changed allowable range of quality scores to go from 0 to 96

	* gsnap.c: Changed default min_shortend from 1 to 2

	* stage1hr.c: Removed unused code for old maxent procedures

	* splicetrie.c, splicetrie.h, stage1hr.c: Providing splicesites to
	  Splicetrie_find_short procedures, useful for debugging

	* stage3hr.c: Removed debugging statement

	* stage1hr.c: Restored ability to find short-overlap splicing in 1-2 bp at
	  ends of read

	* stage3hr.c: For insert length of 0, setting absdifflength to be infinite

	* sequence.c, shortread.c: Closing files when multiple ones are provided on
	  the command line

	* pair.c: Implemented printing of intron distances in splice sites (-f 6)
	  output

	* gsnap.c, interval.c, interval.h, splicetrie.c, splicetrie.h, stage1hr.c,
	  stage1hr.h: Implemented usage of known splice distances for single splices

	* get-genome.c: Printing output correctly for version 5 IIT files with
	  information in rest of header

2011-02-15 twu

	* stage1hr.c: Removed unnecessary checks for nsplicesites > 0

	* stage1hr.c: Computing and storing known splicesites_i for each segment once

	* shortread.c: Removed abort left in for debugging

	* shortread.c: Fixed bug in selecting region of short read to look for
	  adapter stripping

	* stage3hr.c: Fixed indentation

	* gsnap_concordant.c: Fixed procedure to report all concordant pairs with
	  correct distance and orientation

	* gsnap.c, inbuffer.c, shortread.c, shortread.h: Simplified procedure for
	  adapter stripping using a linear algorithm, instead of dynamic programming.

	* Makefile.dna.am, Makefile.gsnaptoo.am: Revised instructions for cmetindex

	* cmetindex.c: Enabled -F and -D flags to specify source and destination
	  directories. Added --version and --help flags.

	* gmap_build.pl.in: Added -B flag to optionally specify bindir

	* gmap_setup.pl.in: Modified usage statement to include GSNAP

	* configure.ac, Makefile.am, gmap_build.pl.in: Added gmap_build program

2011-02-14 twu

	* VERSION: Updated version

	* README: Added statement about usage of -m flag

	* gsnap_concordant.c: Fixed typo leading to wrong output file

	* outbuffer.c: Fixed typo

	* stage3.c, stage3.h: Allowing flags for GMAP to indicate SAM output is
	  paired-end

	* shortread.c: Allowing extended FASTA format to include a quality string

	* sequence.c: Removed debugging statement

	* samflags.h: Clarified comment

	* pair.c, pair.h: Allowing flags for GMAP to indicate SAM output is
	  paired-end. Printing XS flag for strand direction.

	* outbuffer.c, outbuffer.h: Added variables for indicating SAM output is
	  paired-end

	* gsnap_terms.c: Using procedure in samread.c for computing chrpos_high.

	* gsnap_multiclean.c, samread.c, samread.h: Added file variable for printing
	  altered flags. Added procedure for computing chrpos_high.

	* gsnap_filter.c: Added program for sam_filter

	* gsnap_fasta.c: Handling GMAP and GSNAP output correctly in a single
	  procedure

	* gsnap_extents.c: Enabling GMAP indexdb for sam_extents

	* gsnap_concordant.c: Recomputing all SAM flags

	* gmap.c: Changed format names to samse and sampe

	* Makefile.gsnaptoo.am: Include maxent_hr.c and .h for GMAP. Removed
	  maxent.c and .h from GSNAP.

	* Makefile.dna.am: Include maxent_hr.c and .h for PMAP. Included programs
	  sam_fasta and sam_concordant.

2011-02-13 twu

	* gsnap_concordant.c: Added file for concordant_mult. Printing concordant
	  results in adjacent pairs of SAM lines.

	* gsnap_concordant.c: Initial import into SVN

2011-02-12 twu

	* gsnap_fasta.c: Printing extended FASTA output for GMAP, using '>' and '<'

	* gsnap_fasta.c: Provided separate output types for GMAP and for GSNAP

	* pair.c, sequence.c, sequence.h: Enabled reading of extended FASTA using
	  '>' and '<' to indicate first and second reads, and putting information in
	  flag of SAM output.

2011-02-11 twu

	* stage3hr.c: Implemented a different method for using bingo pairlengths, by
	  calculating a minscore for those pairlengths with the bingo characteristic

	* stage3hr.c: Using pairlength_deviation to eliminate pairs, even if
	  non-overlapping

	* gsnap_extents.c, gsnap_iit.c, gsnap_splices.c, gsnap_tally.c,
	  gsnap_terms.c: Changed defaults of concordant and unique to be false in
	  all programs

	* gsnap_fasta.c: Added printing of quality strings. Implemented -A flag to
	  print all sequences.

	* stage1hr.c: Rearranged order of singlesplice_minus procedure

	* gsnap_terms.c: Fixed check of concordance and uniqueness for last sequence

	* shortread.c: Moved return statement to correct place

	* gmap.c, outbuffer.c: Added compiler directives for case when pthreads not
	  available

	* datadir.c: Changed warning message about needing to recompile GMAP package

	* outbuffer.c, samprint.c, samprint.h, stage3hr.c, stage3hr.h: Split
	  fp_paired_uniq into separate files for inversions, scrambles, and long
	  inserts

2011-02-10 twu

	* gsnap.c, substring.c, substring.h: Removed notion of termdonor and
	  termacceptor typeints

	* samprint.c: Printing insert length for paired alignments. Changed method
	  for determining sign.

	* substring.c, substring.h: Added functions for finding overlaps and insert
	  lengths between two substrings

	* stage3hr.c: Changed criterion for scramble to be the absence of any
	  overlap in the wrong relative positions. Revamped computation of insert
	  length to look for overlapping substrings.

	* inbuffer.c: Removed second check of nleft == 0 when initially it is not

	* substring.c: Added other fields in Substring_T that were not being copied.
	  Removed splicesites_offset from Substring_T object.

	* gsnap.c, outbuffer.c, outbuffer.h, stage1hr.c, stage3hr.c, stage3hr.h,
	  substring.c, substring.h: Added compiler directives for using new
	  maxent_hr procedures. Fixed problem in substring.c where chimera_knownp_2
	  was not being copied.

	* maxent_hr.c: Using jump tables based on shift

2011-02-09 twu

	* maxent_hr.c: Removed duplicate calls in reading genome_blocks

	* stage3hr.h: Fixed definition of sense consistency for inversion pairs

	* stage3hr.c, stage3hr.h: Changed splicing sense test to work for inversion
	  pairs

	* stage1hr.c: Distinguished a binary_search procedure to be used for
	  bigendian computers in processing positions from the indexdb file.

	* stage1hr.c: Added missing code for computing nmismatches for one type of
	  splice end

	* Makefile.dna.am, maxent_hr.c, maxent_hr.h, stage1hr.c: Implemented fast
	  calculation of maxent splice site probabilities

	* stage1hr.c: Removed trimpos calculation from find_spliceends. Not useful
	  when we can compute short-overlaps.

	* Makefile.dna.am, gsnap_best.c, gsnap_fasta.c: Added programs for
	  extracting alignment with best MAPQ score and converting alignment output
	  to FASTA.

	* stage1hr.c: Using genome-based splice site detection for splice ends

	* gsnap.c: Fixed name of flag suboptimal-levels in help statement

	* shortread.c: Fixed bug in not allocating space for final '\0' to contents.
	  Commented out check for PC line feeds.

2011-02-08 twu

	* genome_hr.c: Fixed excessive shift in calculating high_halfbit

	* stage1hr.c: Put back checks for zero nsegments

	* outbuffer.c: Some output seems to be missed on occasion. Rewrote to use
	  ndone and noutput.

	* stage1hr.c: Making all loops on segments go to segments[nsegments] instead
	  of nsegments-1

	* stage1hr.c: Implemented novel double splice detection using new
	  genome-based splice site detection. Removed leftspan/rightspan test and
	  replaced with counting mismatches.

	* outbuffer.c: Fixed outbuffer to check for donep flag, and for inbuffer to
	  signal when donep is set

2011-02-07 twu

	* genome_hr.c: Fixed bug in dealing with high halfbit

2011-02-06 twu

	* stage1hr.c: Fixed memory leak caused by computing floors twice

	* stage1hr.c: Raised minimum splice prob support to 0.80. Extending range
	  for single splicing to 2 nt from each end. Implemented double splice
	  detection involving known splice sites.

	* stage3hr.c: Eliminating identical Stage3_T pairs properly

	* outbuffer.c: Fixed bug that caused GMAP to handle maxpaths parameter
	  incorrectly

	* genome_hr.c, genome_hr.h, stage1hr.c, substring.c: Implemented working
	  procedure for finding splice sites from compressed genome, and merging
	  with known sites. Using this procedure to find single splices. Merged
	  single-splice procedures for plus and minus strand, and moved handling of
	  plus and minus strands inside of Substring_T procedures.

2011-02-05 twu

	* genome_hr.c, genome_hr.h: Implemented fast determination of splice site
	  locations

	* stage1hr.c: Removed unused variables

	* genome_hr.c: Added tables for splice site positions

	* dev: New directory for developer work

2011-02-04 twu

	* stage1hr.c: Integrated multiple procedures for merging heaps to find
	  segments into a single procedure (plus a specialized one for terminals
	  only). Removed separate Splicesegment_T object and using a general
	  Segment_T object.

	* stage1hr.c: Divided single splicing, known double splicing, and novel
	  double splicing into separate procedures.

	* gsnap.c, stage1hr.c, stage1hr.h: Provided flag for detecting novel double
	  splices, and turned the feature off by default.

	* stage1hr.c: Implemented faster method for finding double splices.
	  Commented out code no longer valid for setting splice_pos_start and
	  splice_pos_end in finding single splices.

	* stage1hr.c: Increased speed of finding local splices by storing leftmost
	  and rightmost querypos. Hiding double-splicing for now.

2011-02-03 twu

	* stage1hr.c, substring.c, substring.h: Handling splicesites_offset in
	  donor, acceptor, and shortexon Substring_T types. Handling two values of
	  splicesites_i in shortexon.

	* stage1hr.c: Implemented detection of double-splicing at novel splice
	  sites. Integrated detection of single-splicing and double-splicing.
	  Removed some unused code based on USE_CHARS rather than nucleotides.

	* stage3hr.h: Added missing declaration of function

	* outbuffer.c: Finishing up printing of remaining output

2011-02-02 twu

	* stage1hr.c: Offset knowni arrays by +1, so we can clear by setting to 0,
	  rather than by setting to -1.

	* outbuffer.c, pair.c, pair.h: Now GMAP prints nomapping results in SAM
	  format

	* pair.c: Fixed printing of splice site scores for antisense cDNAs

	* stage1hr.c: Removed restriction on finding terminals only when nconcordant
	  was 0.

	* trunk, src, gsnap.c, stage1hr.c, stage1hr.h: Merged r33262:34617 from
	  branch suboptimal-alignments, adding parameter for terminal length, and
	  not using 10*maxpaths for computation

	* config.site.rescomp.tst, VERSION: Updated version number

	* README: Updated description to include information about paired alignments
	  and the advantages of known splice sites

	* stage1hr.c: Added better debugging statements. Renamed "shortend"
	  procedures to "short-overlap".

	* splicetrie.c: Requiring that 3 * nmismatches < nmatches to report a true
	  search result

	* gsnap.c, outbuffer.c, resulthr.c, resulthr.h, samprint.c, samprint.h,
	  stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Implemented three types of
	  alignment: concordant, paired, and unpaired, with three subtypes of
	  paired: inversion, toolong, and scramble. Detecting paired alignments in
	  Stage3_pair_up_concordant. Converting unpaired uniq to paired uniq when
	  appropriate.

2011-02-01 twu

	* stage3hr.c: Adding information about unpaired type (interchrom, toolong,
	  scramble, inversion) for unpaired_uniq results

	* outbuffer.c: Changed type for nread and ncomputed fields from bool to int

	* gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Added parameter
	  for pairlength_deviation

	* outbuffer.c: Protecting print loops with surrounding lock and unlock
	  instructions. Removed debugging flag.

	* Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, gmap.c, gsnap.c,
	  inbuffer.c, inbuffer.h, ioboard.c, ioboard.h, outbuffer.c, outbuffer.h:
	  Fixed bug where multithreading was hanging. Moved IOBoard_T information
	  to Outbuffer_T.

	* stage3hr.c: Rewrite of Stage3_pair_up_concordant to get all concordant
	  pairs

	* gmap.c, gsnap.c: Not printing program name as an arg

	* gmap.c, gsnap.c: Printing version and calling arguments to stderr

	* stage3hr.c: Using a pointer instead of a count to mark paired_seenp

	* outbuffer.c: Using RRlist_T to represent queue for ordered output

	* outbuffer.c: Replaced doubly linked list with singly linked list for queue

2011-01-31 twu

	* stage3hr.c: Added additional check to prevent negative insert lengths

	* outbuffer.c: Storing results in a queue, instead of a list

	* Makefile.dna.am: Removed reads_store and reads_dump

	* stage3hr.c: Removed keyword "quality:" from output

	* shortread.c: Fixed bug that removed all quality strings

	* stage3hr.c: Fixed bug in finding concordance between overlapping ends of
	  read

	* indexdb.c: Fixed bug in compiler directive for MMAP

2011-01-28 twu

	* goby.c, goby.h, gsnap.c, inbuffer.c, inbuffer.h, samprint.c, shortread.c,
	  shortread.h, stage3hr.c: Simplified procedures in shortread.c.
	  Implemented parsing of barcodes.

	* mapq.c, mapq.h, substring.c: Fixed calculation of MAPQ with separate
	  coordinates for checking genomic string and quality string

	* shortread.c, shortread.h: Initial import into SVN. Contains Sequence_T
	  functions specific to GSNAP.

	* Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, chop_primers.c,
	  genome.c, goby.c, goby.h, gsnap.c, inbuffer.c, outbuffer.c, reads_store.c,
	  request.c, request.h, samprint.c, samprint.h, sequence.c, sequence.h,
	  stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c, substring.h:
	  Separated Sequence_T functions into Sequence_T and Shortread_T

	* gmap.c, outbuffer.c, outbuffer.h: Made outbuffer work for PMAP by removing
	  references to sam_header_p and related variables

	* gsnap.c: Removed old sam.h include

	* substring.c: Turned off debugging

	* spliceclean.c: Version 34373 was an accidental reversion. Going back to
	  version 34369, where we are adding use of genebounds_iit and adding
	  functionality for resolving splice directions

	* spliceturn.c: Version 34374 was an accidental reversion. Going back to
	  version 34369, where information goes to stderr.

	* splicegene.c: Version 34375 was an accidental reversion. Going back to
	  version 34369, which actually does stop fixing of terminalp in acceptors
	  where next donor is terminal.

	* splicefill.c: Version 34377 was an accidental reversion. Going back to
	  version 34369, which uses tally, adds smoothing, does not use slopes to
	  find edges, and does not check for edgedistance.

	* pair.c: Using new samflags.h

	* Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, gsnap_extents.c,
	  gsnap_multiclean.c, gsnap_splices.c, gsnap_tally.c, gsnap_terms.c, sam.c,
	  sam.h, samflags.h, samprint.c, samprint.h: Change file name from sam.c to
	  samprint.c. Moved definitions of SAM flags to samflags.h.

	* splicefill.c: Removed smoothing. Using slopes to find edges. Checking
	  for edgedistance.

	* splicegene.c: Stopped fixing of terminalp in acceptors cases where next
	  donor was terminal

	* spliceturn.c: Providing information to stdout about splices that are turned

	* spliceclean.c: Removed use of genebounds_iit and functionality for
	  resolving splice directions

	* changepoint.c: Changed function for both ends, but not used anyway

	* src, Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am,
	  blackboard.c, blackboard.h, changepoint.c, chop_primers.c, gmap.c,
	  gsnap.c, iit_pileup.c, inbuffer.c, inbuffer.h, ioboard.c, ioboard.h,
	  outbuffer.c, outbuffer.h, reads_get.c, reqpost.c, reqpost.h, request.c,
	  request.h, result.c, result.h, resulthr.c, resulthr.h, sequence.c,
	  sequence.h, spliceclean.c, splicefill.c, splicegene.c, spliceturn.c,
	  stage3.c, stage3.h, tableuint.c, tableuint.h, tally_exclude.c: Merging
	  changes to threads system from new-threads branch

	* indexdb.c: Removed extraneous allocation of memory for offsets

	* pair.c, pair.h, stage3.c, stage3.h: Using a single parameter for
	  sourcename for GFF3 output

	* substring.c: Fixed calculation of region to check for MAPQ scoring

	* mapq.c: Added check for wrong segment to check in computing MAPQ

2011-01-24 twu

	* result.c: Altered debugging statements

	* gmap.c: Fixed memory leak in chimera detection

2011-01-21 twu

	* gmap.c: Fixed case where best0 or best1 was duplicated in rest of
	  stage3list

	* result.c: Added debugging statements

	* gmap.c: Removed debugging comment

	* stage3.c, stage3.h: Added function Stage3_identity_cmp to help with
	  chimera detection

	* gmap.c: Removed check for chimeras based on alignment break. Handling
	  cases where the same stage3 object is in both lists.

	* chimera.c, pair.c, pair.h: Simplified Pair_matchscores and computing over
	  querylength. In Chimera_bestpair, check for cases where the same stage3
	  object is in both lists.

2011-01-20 twu

	* Makefile.gsnaptoo.am: Added chimera.c to build of gmap

	* VERSION: Updated version to 2011-01-21

	* gsnap.c: Always creating a .nomapping file with --split-output option

	* stage1hr.c: Changed debugging statements for shortexon

	* splicetrie.c: Changed debugging statements

	* sequence.c: Not printing space at end of accession

	* gsnap.c: Turning on splicetrie precomputation by default

	* gmap.c: Fixed bug in separating chimeric paths

	* gmap.c: Not sorting first part of stage3list when chimera is present

	* Makefile.dna.am: Added uintlist.c to gsnap_iit

	* chimera.c: Made detection of alignment break work again

	* splicetrie.c: Implemented handling of duplicate leaves

2011-01-19 twu

	* splicegene.c: Handling genebounds.iit as input

	* gsnap.c: Added --sam-headers-batch option

	* gsnap_iit.c: Changed output to look like gene map format

	* gsnap_extents.c: Fixed handling of non-spliced reads in sam_extents

2011-01-18 twu

	* sam.h: Added constant for clearing NOT_PRIMARY bit

	* sam_tally.c: Removed from CVS

	* gsnap_multiclean.c, samread.c, samread.h: Implemented printing of altered
	  flag

	* gsnap_terms.c: Made program provide same output with SAM input.
	  Implemented filtering for concordant pairs. Removed filtering by
	  max_endlength.

	* gsnap_tally.c: Implemented filtering for concordant pairs

	* gsnap_splices.c: Made program provide same output with SAM input

	* gsnap_extents.c: Made program work with SAM input

	* gsnap_multiclean.c: Turned off debugging statements

	* Makefile.dna.am, gsnap_multiclean.c: Implemented sam_multiclean

2011-01-15 twu

	* gsnap_multiclean.c, multiclean.c: Renamed file

	* gsnap_tally.c: Added check for concordantp in SAM input. Fixed bug in
	  initializing a variable.

	* sequence.c: Made paired adapter detection more stringent, allowing only 1
	  mismatch

	* gsnap.c, sam.c, sequence.c, sequence.h, stage3hr.c: Fixed bugs in printing
	  full quality string in GSNAP output

	* gsnap.c, sam.c, sequence.c, sequence.h, stage3hr.c: Printing full quality
	  string (not chopped for adapter) in GSNAP output

2011-01-10 twu

	* stage3.c: Fixed compilation for PMAP

	* gmap.c: Added compiler directives to hide SAM output which is not used in
	  PMAP

	* translation.c: Added compiler directives to hide functions that are not
	  used in PMAP

	* oligop.c: Fixed compiler warnings about array index being char

	* Makefile.dna.am: Removed bam_pileup from being made

	* gmap.c: Added documentation for new output flags

	* gsnap.c: Changed output flag from -7 to --split-output

	* chimera.c, chimera.h, genome.c, get-genome.c, gmap.c, iit-read.c,
	  iit-read.h, md5-compute.c, pair.c, pair.h, revcomp.c, segmentpos.c,
	  segmentpos.h, sequence.c, sequence.h, stage1.c, stage3.c, stage3.h,
	  subseq.c, translation.c, translation.h: Implemented split output to files

	* iit-read.c: Fixed bug in handling NULL IITs

	* gmap.c, pair.c, pair.h, sequence.c, sequence.h, stage3.c, stage3.h:
	  Implemented printing of chimeras in SAM output

2011-01-09 twu

	* trunk, gmap.c, pair.c, pair.h, result.c, result.h, stage3.c, stage3.h,
	translation.c: Merged all changes from chimera branch

	* Makefile.pmaptoo.am: Update commands

	* Makefile.dna.am: Added commands for bam_pileup

2011-01-07 twu

	* gmap.c, stage3.h: Added new debugging point for result after all cycles.

	* stage3.c: Not forcing solution for dual breaks. Using separate maxiter
	limits.

	* stage3.c: Changed comments for fix_adjacent_indels

2011-01-06 twu

	* Makefile.three.am: Added files to GSNAP

	* pair.c: Changed debugging output for Pair_dump to show the comp

	* stage2.c: Added a check for all zero scores when trying to find alignment
	end point

2011-01-05 twu

	* stage3.c: Added a final cleaning of ends

	* stage3.c: Added procedure to fix adjacent indels

	* gmap.c, pair.c, pair.h, segmentpos.c, segmentpos.h, stage3.c, stage3.h:
	Removed references to zerobasedp

	* pair.c: Using last_querypos and last_genomepos explicitly instead of
	prev->querypos and prev->genomepos. Fixed issues with SAM output.

2011-01-04 twu

	* gmap.c: Added compiler directives to prevent PMAP from seeing SAM output
	code

	* backtranslation.h: Fixed typo in declaration

	* gsnap.c: Fixed comment

	* gmap.c, pair.c, pair.h, stage3.c, stage3.h: Printing headers and
	read-groups in SAM output

2011-01-03 twu

	* MAINTAINER: Updated instructions for ChangeLog

	* config.guess: Update of config.guess by latest autoconf

	* INSTALL: Update of INSTALL message by latest autoconf

	* stage3hr.c: Added assertions about sign of nindels

	* gmap_setup.pl.in: Handling case where user gives -d argument with trailing
	slash

	* gsnap.c: Added missing break after -o flag

2010-12-22 coryba

	* gsnap_tally.c: Changed compiler directives to get gmap build to work

	* sam.c: Minor change to have the MD field output a 0 after the deletion if
	an insertion is adjacent to a deletion. IGB can now parse gsnap's SAM
	output

2010-12-15 twu

	* gsnap.c, mapq.c, sequence.c: Added flag --quality-protocol

2010-12-12 twu

	* stage1hr.c: Fixed bugs in storing splicesites_i

2010-12-10 twu

	* pair.c: Fixed bug in dealing with EXTRAEXON_COMP

	* gsnap_tally.c: Added flag for minimum mapq

	* gsnap.c, mapq.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h,
	substring.c: Merged r32485:32693 from branch gsnap-trim-penalty into the
	trunk

2010-12-08 twu

	* config.site.rescomp.tst: Updated to include with_samtools

	* bamread.c: Hid declaration of bam_init_header_hash when samtools is not
	enabled

2010-12-07 twu

	* substring.c: Implemented marking of methylation changes

	* stage1.c: Performing a single uniqueness step at end

	* stage2.c: Using global or local winner for end of stage 2

	* indexdb_dibase.c: Using Access_mode_T for Indexdb_new_genome

	* indexdb.c: Minor fixes

	* gregion.c, gregion.h: Providing hooks for Gregion_filter_clean

	* gmap.c, gsnap.c: Using allocate as default mode if mmap not available

2010-12-06 twu

	* gsnap_tally.c, bamread.c, bamread.h, gsnap_extents.c, gsnap_splices.c,
	gsnap_terms.c, samread.c, samread.h: Returning mapping quality from SAM
	and BAM inputs

	* gmap.c: Improved default information for --batch feature in --help

	* get-genome.c: Fixed mapping labels from stdin

	* gsnap.c: Changed default memory access to be level 2

2010-12-04 twu

	* stage3hr.c: Disallowing concordant pairs between two terminal alignments

	* stage1hr.c, stage3hr.c, stage3hr.h: Placed restriction on terminal
	alignments to have fewer than allowed mismatches within region after
	trimming

	* stage3hr.c: Changed Stage3pair_remove_duplicates to resolve overlaps using
	absdifflength

	* gsnap.c: Changed --help output to show default batch mode of 4

	* gmap.c: Providing more batch modes in GMAP

	* access.h, genome.c, genome.h, gsnap.c, indexdb.c, indexdb.h: Providing
	more batch modes in GSNAP

2010-12-03 twu

	* stage1hr.c: Made done_level always less than or equal to user_maxlevel

2010-12-02 twu

	* samread.h: Added Id tag

	* sam.c: Changed terminal alignments to use soft clipping, since hard
	clipping information appears to be removed in making BAM files.

	* parserange.c, parserange.h: Implemented simple parser for regions

	* gsnap_tally.c: Implemented limited region for indexed BAM files in
	bam_tally. Added -P flag for printing probabilities.

	* bamread.c, bamread.h: Implemented indexed BAM files

	* Makefile.dna.am: Added parserange.c and .h to bam_tally

2010-12-01 twu

	* Makefile.dna.am, bamread.c, bamread.h, gsnap_tally.c: Implemented
	bam_tally. Changed standard tally output back to previous format.

	* config.site, configure.ac: Made changes to include samtools library

2010-11-30 twu

	* Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, blackboard.c,
	blackboard.h, gmap.c, gsnap.c, sequence.c, sequence.h: Implemented the
	ability to uncompress gzip files by GSNAP

	* README, config.site, configure.ac: Made changes to reflect a new zlib
	option

2010-11-29 twu

	* gsnap.c: Fixed bug in output to multiple files where GSNAP single-end
	nomapping goes to stdout.

2010-11-24 twu

	* get-genome.c: Fixed stdin input to get-genome for non-map requests

2010-11-22 twu

	* Makefile.dna.am, Makefile.pmaptoo.am, Makefile.three.am: Added uinttable.c
	and uinttable.h to pmap

	* Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
	Makefile.pmaptoo.am, Makefile.three.am: Added uinttable.c and uinttable.h
	to gmap

	* stage2.c: Fixed bug in determining overall grand winner

	* sam.c: Moved read group field to be first

	* iit-read.c, iit-read.h: Implemented print_comment option

	* gmap.c: Providing nchrs to stage1 procedure

	* gregion.c, gregion.h: Implemented extentstart and extentend for comparing
	gregions. Added code for a Gregion cleaning step.

	* stage1.c, stage1.h: Added hooks for a Gregion cleaning step

2010-11-18 twu

	* gmap.c, pair.c, pair.h, stage3.c, stage3.h: Implemented --print-comment
	for map output. Removed old code for universal coordinate IIT files.

	* genome.h, stage3hr.h: Formatting changes

	* goby.c, goby.h, gsnap.c, sequence.c, sequence.h: Changes made for new Goby
	code

	* substring.c: Always initializing trim_left and trim_right

	* pdl_smooth.c, spliceturn.c, multiclean.c: Initial import into svn

	* sam.c: Made fixes in printing mate information

	* splicing-scan.c: Combining splice and terminal splicesites

	* Makefile.dna.am: Removed hexamer-score.c and .h from extents_genebounds

	* extents_genebounds.c: Added debugging information

	* tally_expr.c: Using new interface to IIT_annotation. Providing option to
	print gc-content.

	* Makefile.dna.am: Removed tally_exclude from bin files

	* Makefile.dna.am: Removed iit_pileup from bin files

	* fopen.m4: Added _cv_ to all variable names

	* Makefile.am: Added ACLOCAL_AMFLAGS

	* VERSION: Updated version

	* bootstrap.dna, bootstrap.gmaponly, bootstrap.gsnaptoo, bootstrap.three:
	Using autoreconf. Added --install to some files to allow building from
	svn.

	* sam.c: For unpaired_uniq, performing sorting first and then selecting mate
	for each end.

	* sam.c: Restored null mate for unpaired_mult

	* sam.c: Providing mate information in unpaired_mult

	* iit-read.c: Printing tabs in SAM headers

	* Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
	Makefile.three.am: Changes to Makefile.am files

2010-11-17 twu

	* gsnap.c: Added a flag for --no-sam-headers

	* resulthr.c, resulthr.h: Added a printing command for resulttype

	* sam.c: For unmapped reads, always providing a mate if available

	* sequence.c, sequence.h, gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added
	ability to print GMAP alignments in SAM output format

	* gsnap.c, substring.c, substring.h: Added --show-refdiff option

	* README: Added further information about SNP-tolerant alignment and
	wildcard SNPs.

2010-11-16 twu

	* README: Made changes in instructions for -V and -v flags

	* gsnap.c, iit-read.c, iit-read.h, sam.c, sam.h: Implemented SM and RG
	fields.

	* gsnap.c: Added warning about paired-end output in Goby

	* goby.c, goby.h: Using new interface to sequence.h

	* datadir.c: Minor formatting change

	* get-genome.c, sequence.c, sequence.h: Handling printing of wildcard SNPs

2010-11-15 twu

	* sequence.c, sequence.h: Changed name of procedure

	* revcomp.c: Added flag for --byline

	* reads_store.c: Fixed bug in freeing memory too early

	* gsnapread.c: Reading quality string based on presence of third tab

	* gsnap.c: Removed short versions of some flags

	* sam.c: Using nmismatches_refdiffs in NM output

	* stage3hr.c, stage3hr.h, substring.c, substring.h: Fixed trimming based on
	SNPs. Computing different types of nmismatches.

	* add_rpk.c, exonscan.c, genecompare.c, plotgenes.c, tally.c, tallygene.c:
	Using new interface to IIT_annotation

	* genome_hr.c, snpindex.c: Enabled representation of wildcard SNPs

	* get-genome.c: Added -V flag to specify a directory for alternate genome
	information

	* substring.c, substring.h, gsnap.c, mapq.c, mapq.h, sam.c, stage1hr.c,
	stage3hr.c, stage3hr.h: Added computation of mapping quality

2010-11-11 twu

	* sequence.c, sequence.h: Fixes to printing of query sequences for failed
	alignments

	* goby.c, goby.h, gsnap.c: Always shutting down Protobuf if compiled in.
	Calling gobyAlEntry_appendTooManyHits, even under quiet-if-excessive
	option. Changes to flag descriptions.

	* genome_hr.c, genome_hr.h, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
	substring.h: Marking all mismatches by using query_compress,
	genome_blocks, and snp_blocks.

2010-11-10 twu

	* goby.c, goby.h, gsnap.c, sam.c, sam.h, stage3hr.c, stage3hr.h: Allowing
	for three paired-end orientations instead of circular option. Added
	--fails-as-input flag. Fixed issues with handling --failsonly option.

	* Makefile.gsnaptoo.am: Added blank line

	* Makefile.gmaponly.am: Added parserange.c and parserange.h

	* iit-read.c, iit-read.h: Revised IIT_annotation to handle version 5 IIT
	files

	* iit_get.c: Using new interface to IIT_annotation

	* genome.c, genome.h: Added procedures for returning ntcounts in a segment

	* gmap.c: Using long options without short options for --version and --help

	* pair.c: Fixed output of -f 8 format

	* sequence.c, sequence.h: Added procedures for printing GSNAP queries

2010-11-09 twu

	* goby.c, goby.h, Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am,
	gsnap.c, stage3hr.c, stage3hr.h: Added functionality for Goby file formats

	* configure.ac: Added hooks for Goby compile-time option

	* config.site: Added information for Goby compile-time option

	* README: Added comments for FASTQ files, -z flag, and Goby functionality

	* gsnap_tally.c: Fixed problem with underflow in taking exp() of log
	likelihood

	* gsnap_tally.c: Added flag for using a constant quality score. Printing
	1-p instead of p.

2010-11-08 twu

	* get-genome.c: Making only a single open of genome or genomealt

2010-11-07 twu

	* genome.c, get-genome.c: Using new interface to IIT_annotation

2010-11-04 twu

	* gsnap_terms.c: Added flags and parameters for mincount, min_endlength, and
	max_endlength.

2010-10-31 twu

	* reads_get.c: Made several changes in parsing

	* gsnap.c, sam.c, sam.h, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h,
	substring.c, substring.h: Made SNP and splicesite parameters local to
	Substring procedures

2010-10-29 twu

	* gsnap_tally.c: Added -A flag for controlling printing of ref details.
	Removed unused global parameters. Fixed bug in retrieving genomic
	reference.

2010-10-28 twu

	* sam.c: Added XS flag to indicate splice direction

	* configure.ac: Additions needed for new libtool version

	* bootstrap.gsnaptoo: Running full set of autoconf programs

	* iit.test.in: Revised version of test

	* config.guess, config.sub: New version of libtool programs

	* config.guess, config.sub, ltmain.sh: Previous version of libtool programs

	* gsnap_tally.c: Added computation of genotype probabilities

	* indexdb_dibase.c, indexdb_dibase.h, setup.ref3positions.ok: Initial import
	into SVN

2010-10-27 twu

	* gsnap_tally.c: Sorting shifts and quality scores

	* gsnap_tally.c: Keeping track of and reporting shifts and quality for
	reference matches

	* list.c: Added tentative code for dealing with NULL lists in List_to_array

	* chop_primers.c: Using new interface to Sequence_print_header

	* translation.c, translation.h: Added ability to start CDS from a given
	position

	* tally_expr.c: Added ability to show mincount

	* tally.c, tally.h: Added functions Tally_mean_double and Tally_quantile

	* seqlength.c, pair.c: Using new interface to Sequence_print_digest

	* parserange.c: Fixed bug in returning coordstart

	* md5-compute.c: Using new interface to MD5_print

	* iit_store.c: Storing rest of header in annotation. Using new interface to
	IIT_write.

	* iit_get.c: Changed stats to compute mean over entire width, not just
	non-zero positions

	* gsnap_iit.c: Using new interface to Gsnapread_parse_line

	* gsnap_tally.c: Subtracting 64 from quality scores, as standard for Illumina

	* gsnap_extents.c, gsnap_splices.c, gsnap_terms.c: Using new interface to
	Samread

	* gsnap_tally.c: Printing quality scores relative to highest one seen

	* sam.c: Changed separator for extra fields to be a tab, rather than a space.

2010-10-26 twu

	* gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c,
	substring.h: Implemented trim-mismatch-score for controlling trimming

	* samread.c, samread.h, gsnapread.c, gsnapread.h: Implemented retrieval of
	quality string

	* gsnap_tally.c: Printing mismatch information by position and quality

	* gmapindex.c: Using new interface to IIT_write

	* gmap.c: Protecting against calling List_to_array with an empty list

	* get-genome.c: Fixed bug introduced by new default snps_mode

	* diag.c: Protecting against call to List_to_array on an empty list

	* stage3hr.c, stage3hr.h: Reversing quality string in GSNAP output when
	necessary, and using quality shift.

	* gsnap.c: Fixed memory leak when npaths is zero. Reversing quality string
	in GSNAP output when necessary.

2010-10-25 twu

	* gsnap_tally.c: Changed output format to show all signed query positions

	* gsnap_tally.c: Changed output format to show all query positions

	* gsnap_tally.c: Incorporated sam_tally into this source code

2010-10-24 twu

	* splicetrie.h: Changed Splicetrie_dump

	* splicetrie.c: Changed Splicetrie_dump. Handling case where pos5 == pos3
	in short-exon splicing. Added debugging statements.

	* stage1hr.c: Fixed a bug in handling one case of ambiguous splice ends in
	short-exon splicing.

	* splicetrie.c: Allowing only one mismatch at most for searching at ends in
	short-exon splicing when ends are 16 nt or shorter.

	* splicetrie.c: Combined Trie_new and Trie_output into a single procedure

2010-10-23 twu

	* splicetrie.c: Removed debugging statement

	* gsnap.c, splicetrie.c, splicetrie.h, stage1hr.c, stage1hr.h: Enabled
	computation of splice tries on the fly

	* gsnap.c, splicetrie.c, splicetrie.h: Divided Splicetrie_build process into
	two steps, with one computing nsplicepartners.

	* gsnap.c, splicetrie.c, splicetrie.h: Using unsigned ints rather than
	char * to store splicestrings and compute tries.

2010-10-22 twu

	* splicetrie.c: Ignoring cases where splice site has an N

	* stage3hr.c: Changed assertion to use effective_chrnum rather than chrnum

	* stage1hr.c: Using new interface to Splicetrie procedures. Revised
	parameters for distant splicing.

	* splicetrie.c, splicetrie.h: Checking short-end and short-exon splicing
	against extension

	* gsnap.c: Automatically setting pairmax if not specified for RNA-Seq

2010-10-20 twu

	* stage3hr.c: Changed ambiguous splice procedure to remove longer splice

	* stage1hr.c: Using new interfaces to Splicetrie procedures.

	* splicetrie.c, splicetrie.h: Fixed various bugs. Implemented separate
	procedures for short-ends and for longer ends (needed for short-exon
	alignments).

2010-10-19 twu

	* splicetrie.c, splicetrie.h, stage1hr.c: Checking entire subtree against
	splicefrags when using alternate genome and reaching a non-leaf with no
	string remaining. Removed unused parameters.

	* stage1hr.c: Using new interface to Splicetrie_dump

	* get-genome.c: Changed flags for using SNPs

	* genome.c, genome.h: Added function Genome_fill_buffer_simple_alt

	* splicetrie.c, splicetrie.h: Fixed use of nmismatches from splicefrags.
	Fixed use of alternate genome. Using 4 bytes instead of 2 bytes for
	reloffsets. Not using suboptimal separation.

2010-10-18 twu

	* stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Removed
	chrhigh from Substring_T and Stage3_T objects, and removed from segment
	objects in stage 1.

	* gsnap.c, splicetrie.c, splicetrie.h, stage1hr.c: Completed transition to
	using splicetries. Removed unnecessary variables and code.

	* genome_hr.c, genome_hr.h, gsnap.c, splicetrie.c, splicetrie.h, stage1hr.c,
	stage1hr.h: Enabling use of splicefrags with splicetrie. Enabled
	processing of alternate alleles.

2010-10-16 twu

	* Makefile.dna.am, Makefile.gsnaptoo.am, gsnap.c, splicetrie.c,
	splicetrie.h, stage1hr.c, stage1hr.h: Implemented tries for short-end
	splicing

2010-10-15 twu

	* gsnapread.c, gsnapread.h: Returning number of mismatches

	* stage3.c, stage3.h, gmap.c: Added ability to specify where CDS begins

	* get-genome.c: Added -A flag for dumping entire genome. Handling -m flag
	correctly for stdin input.

2010-10-14 twu

	* pair.c: Implemented printing of coverage, identity, and phases in GFF
	output

	* iit-write.c, iit-write.h, iitdef.h: Implemented version 5 of IIT format,
	which allows different pointer sizes for labels and annotations

	* gsnap.c, iit-read.c, iit-read.h, md5.c, md5.h, resulthr.c, resulthr.h,
	sam.c, sam.h, sequence.c, sequence.h, stage3hr.c, stage3hr.h, substring.c,
	substring.h: Enabled printing of output into multiple files

	* stage1hr.c: Allowing 1 mismatch in shortexon end, but requiring a
	separation of 2 from next best alignment. Consolidated code into
	find_left_splice and find_right_splice.

2010-10-13 twu

	* gsnap.c: Changed default quality-shift to be 0

	* stage3hr.c, stage3hr.h, substring.c, substring.h, sam.c: Implemented
	calculation and printing of MD string

	* gsnap_tally.c: Fixed bug in freeing data. Setting min_readlength to 0.
	Using new interface to Gsnapread.

2010-10-06 twu

	* spliceclean.c: Added ability to print excluded splices

	* spliceclean.c: In resolving splice direction, using a 10-to-1 threshold
	and checking adjacent splices if necessary.

	* gsnap.c: Turned on indels by setting default indel-penalty to be 1.

2010-10-04 twu

	* stage1hr.c: Added restriction on number of mismatches allowed in short
	exons

2010-10-02 twu

	* spliceclean.c: Added ability to resolve between competing splices based on
	fwd and rev extents. Added ability to print endpoints and midpoints.
	Added flag to bypass cleaning step.

	* stage3hr.c: Allowing pair overlaps when splices are involved. Using new
	way of computing low and high for pairs.

2010-09-30 twu

	* gsnap_extents.c: Initial creation

	* Makefile.dna.am: Added extents_genebounds

2010-09-29 twu

	* spliceclean.c: Added ability to print runlengths and splicesites. Added
	ability to filter based on uniqueness, concordance, or maxminsupport.

	* extents_genebounds.c: Initial creation

2010-09-28 twu

	* Makefile.dna.am: Added multiclean, gsnap_extents, and gsnap_terms

	* gsnap_terms.c: Initial creation of gsnap_terms

2010-09-22 twu

	* iit_dump.c: Implemented printing of IIT in runlength or integral output

	* gsnap.c, sam.c, sam.h, sequence.c, stage3hr.c: Fixed handling of circular
	reads and implemented printing of SAM output for circular reads.

	* gmap.c, stage3.c, stage3.h, translation.c, translation.h: Added feature to
	start protein coding sequence from first query position.

2010-09-20 twu

	* splicegene.c: Added the ability to output genes as well as paths

2010-09-08 twu

	* mem.c: Added comment about use of LEAKCHECK

	* list.c: Removed unused variable

	* iit-read.c, iit-read.h, iit_dump.c: Added sort functionality for iit_dump

	* snpindex.c: Using -V flag to allow user to specify destination directory

	* substring.c: Added check to avoid checking for mismatches past end of
	string

	* stage1hr.c: Simplified computation of leftbound and rightbound in
	short-end splicing

	* stage1hr.c: Using different stopi for novel and known splicing. Fixed
	possible bug in reading past mismatches_left and mismatches_right. Fixed
	calculation of chrend in finding right bound for short-end splicing.

2010-09-04 twu

	* splicegene.c: Incorporated cappaths functionality

	* splicegene.c: Removed global variables related to linear fitting

	* splicegene.c: Fixed memory leaks. Added filtering based on mean number of
	splices.

2010-09-03 twu

	* splicegene.c: Removed global variables

	* splicegene.c: Added ability to handle all chromosomes in a single run.
	Fixed some memory leaks.

	* gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c,
	substring.h: Introduced -V flag for specifying snpsdir, and now using -v
	flag for indicating SNPs file. Removed geneprob option and procedures.

	* stage1hr.c, stage3hr.c, substring.c, substring.h: Made terminals extend to
	beginning and end of read, with trimming starting from there. Endtypes
	based on presence of trimming.

2010-09-02 twu

	* gsnapread.c, sam.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
	substring.h: Added left and right endtypes to Substring_T object, and
	using them for printing exact, substitution, and terminal alignments.
	Renamed variables in Hittype_T enum. Added ambiguous alignments.
	Restored usage of score in Stage3_remove_duplicates. Using number of
	mismatches to compute nmatches in Stage3_T objects. Revised computation
	of terminal alignments.

2010-09-01 twu

	* gsnap.c, stage1hr.c, stage1hr.h: Introduced min_shortend as a parameter
	with flag -l. The find_left_splice and find_right_splice procedures now
	compete with extension.

2010-08-31 twu

	* gmap.c, stage3.c, stage3.h: Introducing sense_filter in addition to
	sense_try. Counting non-canonical introns, rather than canonical ones to
	determine sense, and adding a small penalty for introns to bias against
	short exons.

	* stage1hr.c: Using new parameter to turn off concordant translocations with
	terminal alignments. Clarified usage of query, queryuc_ptr, and queryrc.

	* stage3hr.c, stage3hr.h: Added flag to control concordant translocations

	* genome_hr.c, genome_hr.h: Fixed bug in handling fragments when query
	length is 16. Removed query parameter from Genome_trim_left and
	Genome_trim_right procedures.

	* stage1hr.c: Stopped placing restrictions on stopi in finding splice ends.
	Requiring minimum endlength for short end splicing.

2010-08-30 twu

	* stage1hr.c: Fixed bug with donor, acceptor, and shortexons that were NULL.
	Fixed logic with novel splice sites in local splicing.

2010-08-26 twu

	* stage1hr.c: Fixed bug attempting to make shortexon of length 0

2010-08-25 twu

	* splicing-scan.c: Initial import into SVN

	* Makefile.util.am: Renamed revcomp program to rc

	* Makefile.gsnaptoo.am: Added gsnapread.c and gsnapread.h for gsnap_tally

	* Makefile.dna.am: Added rc and splicing-scan

	* get-genome.c: Removed unused parameter

	* gsnap_splices.c, gsnapread.c: Allowing program to handle short exon
	alignments with multiple splices

	* parserange.c: Added check for coordinate lengths that exceed 32 bits

	* substring.c: Commented old location of sub: field in donor and acceptor
	substrings

	* stage3hr.c, stage3hr.h: Including chrhigh in substrings

	* stage1hr.c: Including chrhigh in segments and substrings. Implemented
	usage of splice distance in short exon alignments.

	* gsnap_iit.c, gsnap_splices.c, gsnap_tally.c, gsnapread.c, gsnapread.h:
	Gsnapread_parse_line returns information about types of endpoints

2010-08-24 twu

	* substring.c, substring.h: Including chrhigh as a field in Substring_T

	* sam.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h:
	Implemented shortexon alignment

	* gsnap.c: Changed message for reading splicesites IIT file

	* genome_hr.c: Fixed bug in using flags for shortend splicing

2010-08-23 twu

	* gsnap.c: Enabled reading of a local splicesite file

2010-08-19 twu

	* stage1hr.c: Added check for query_compress to be non-NULL before
	find_terminals for single-end alignment

	* stage1hr.c: Fixed bug where query_compress needed to be computed before
	finding terminals

2010-08-18 twu

	* iit_store.c: Using string_compare and string_hash functions from table.c

	* iit-write.c: Moved position of free() statement

	* iit-read.c: Fixed debugging output

	* genome_hr.c, genome_hr.h, gsnap.c, stage1hr.c, stage1hr.h: Using
	splicefrags to increase speed of finding short-end splicing

	* stage3hr.c: Setting nindels to be zero for a terminal alignment

2010-08-13 twu

	* gsnap.c, indexdb.c, indexdb.h, indexdb_hr.c, indexdb_hr.h, spanningelt.c,
	spanningelt.h, stage1hr.c, stage1hr.h: Allowing GSNAP to run when
	positions are read as FILEIO.

	* stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Using
	splicesites array rather than splicesites_iit in short-end splicing

	* stage1hr.h: Increased MAX_QUERYLENGTH from 200 to 500

2010-08-09 twu

	* indexdb_dump.c: Added file left out in SVN conversion

	* indexdb_dump.c: Undo addition to wrong directory

	* stage1hr.c: Removed requirement for nconcordant == 0 in deciding to
	compute local splices and short-end splices.

2010-08-06 twu

	* reads.c, reads_store.c: Eliminating labelorder in file format. Cleaned up
	memory leaks.

2010-08-05 twu

	* reads.c: Improved speed of dumping procedure

	* reads.c, reads_get.c, reads_store.c: Allowing either 4-byte or 8-byte
	label and read pointers

	* Makefile.dna.am, reads.c, reads.h, reads_get.c, reads_store.c: Enabled
	compression of reads

	* Makefile.dna.am, reads.c, reads.h, reads_store.c: Using a div structure in
	file format

	* Makefile.dna.am, reads.c, reads.h, reads_dump.c, reads_get.c,
	reads_store.c: Using our own file format for storing reads, rather than
	BerkeleyDB

	* iit-read.c: Fixed bug in fileio reading of annotations

	* access.c, access.h, add_rpk.c, assert.c, assert.h, backtranslation.c,
	backtranslation.h, bigendian.c, bigendian.h, blackboard.c, blackboard.h,
	block.c, block.h, bool.h, boyer-moore.c, boyer-moore.h, cappaths.c,
	changepoint.c, changepoint.h, chimera.c, chimera.h, chop_primers.c,
	chrnum.c, chrnum.h, chrom.c, chrom.h, chrsegment.c, chrsegment.h,
	chrsubset.c, chrsubset.h, cmet.c, cmet.h, cmetindex.c, color.c, color.h,
	comp.h, complement.h, compress.c, compress.h, cum.c, datadir.c, datadir.h,
	datum.c, datum.h, diag.c, diag.h, diagdef.h, diagnostic.c, diagnostic.h,
	diagpool.c, diagpool.h, dibase.c, dibase.h, dibaseindex.c, doublelist.c,
	doublelist.h, dynprog.c, dynprog.h, except.c, except.h, exonscan.c,
	fopen.h, gbuffer.c, gbuffer.h, gdiag.c, geneadjust.c, genecompare.c,
	geneeval.c, genome-write.c, genome-write.h, genome.c, genome.h,
	genome_hr.c, genome_hr.h, genomepage.c, genomepage.h, genomeplot.c,
	genomicpos.c, genomicpos.h, genuncompress.c, get-genome.c, getopt.c,
	getopt.h, getopt1.c, gmap.c, gmapindex.c, gregion.c, gregion.h, gsnap.c,
	gsnap_iit.c, gsnap_splices.c, gsnap_tally.c, gsnapread.c, gsnapread.h,
	hint.c, hint.h, iit-read.h, iit-write.c, iit-write.h, iit_dump.c,
	iit_fetch.c, iit_get.c, iit_plot.c, iit_store.c, iit_update.c, iitdef.h,
	indexdb.c, indexdb.h, indexdb_hr.c, indexdb_hr.h, indexdbdef.h,
	interval.c, interval.h, intlist.c, intlist.h, intlistdef.h, intpool.c,
	intpool.h, intron.c, intron.h, lgamma.c, lgamma.h, list.c, list.h,
	listdef.h, littleendian.c, littleendian.h, match.c, match.h, matchdef.h,
	matchpool.c, matchpool.h, maxent.c, maxent.h, md5-compute.c, md5.c, md5.h,
	mem.c, mem.h, memchk.c, nmath.c, nmath.h, nr-x.c, nr-x.h, oligo-count.c,
	oligo.c, oligo.h, oligoindex.c, oligoindex.h, oligop.c, oligop.h,
	orderstat.c, orderstat.h, pair.c, pair.h, pairdef.h, pairingcum.c,
	pairingflats.c, pairinggene.c, pairingstrand.c, pairingtrain.c,
	pairpool.c, pairpool.h, parserange.c, parserange.h, pbinom.c, pbinom.h,
	pdldata.c, pdldata.h, pdlimage.c, plotdata.c, plotdata.h, plotgenes.c,
	plotgenes.h, pmapindex.c, random.c, random.h, rbtree.c, rbtree.h,
	rbtree.t.c, reader.c, reader.h, reqpost.c, reqpost.h, request.c,
	request.h, result.c, result.h, resulthr.c, resulthr.h, revcomp.c, sam.c,
	sam.h, sam_tally.c, samread.c, samread.h, scores.h, segmentpos.c,
	segmentpos.h, segue.c, separator.h, seqlength.c, sequence.c, sequence.h,
	smooth.c, smooth.h, snpindex.c, spanningelt.c, spanningelt.h,
	spliceeval.c, splicegene.c, splicegraph.c, splicescan.c, splicing-score.c,
	stage1.c, stage1.h, stage1hr.c, stage1hr.h, stage2.c, stage2.h, stage3.c,
	stage3.h, stage3hr.c, stage3hr.h, stopwatch.c, stopwatch.h, subseq.c,
	substring.c, substring.h, table.c, table.h, tableint.c, tableint.h,
	tallyadd.c, tallyflats.c, tallygene.c, tallyhmm.c, tallystrand.c,
	translation.c, translation.h, trial.c, trial.h, types.h, uintlist.c,
	uintlist.h, uinttable.c, uinttable.h: Added keyword property for Id

	* tally_expr.c: Fixed bug in reporting number of exons and in skipping exons

2010-08-04 twu

	* stage3hr.c: Improved debugging statements

	* stage3.c: Fixed bug when ngap was larger than gaps in dual_break

	* samread.c: Added old but unused code

	* iit-read.c, iit-read.h: Added function IIT_get_typed_signed

	* gsnap_splices.c: Added parameter for shortsplicedist

	* gsnapread.c, gsnapread.h: Added function Gsnapread_accession

	* gsnap_tally.c: Fixed bug in using advance_one_hit

	* Makefile.dna.am, gsnap_iit.c: Created gsnap_iit

	* Makefile.dna.am, reads_get.c, reads_store.c: Created programs reads_store
	and reads_get.

	* splicefill.c: Using median filtering as a first step

	* splicefill.c: Removed probabilistic calculations

	* splicefill.c: Version with probabilistic calculations

2010-08-03 twu

	* splicefill.c: Using tally information to find edges. Using Poisson and
	exponential models.

	* iit-read.c, iit-read.h, snpindex.c: Providing messages about chromosomes
	  in the genome and in the SNPs IIT file

2010-08-02 twu

	* Makefile.dna.am, splicefill.c: Initial creation of splicefill program

2010-08-01 twu

	* gsnap_tally.c, gsnapread.c, gsnapread.h: Able to separate low and high
	  ends of paired-end reads

2010-07-31 twu

	* Makefile.dna.am, gsnap_tally.c: Using parsing functions in gsnapread.c

	* Makefile.dna.am, gsnap_splices.c, gsnapread.c, gsnapread.h: Moved parsing
	  functions to gsnapread.c

2010-07-30 twu

	* spliceclean.c: Preserving information in rest of header

	* gsnap_splices.c: Printing maxminsupport and nconcordant information

	* Makefile.dna.am, spliceclean.c: Enabled spliceclean to handle all
	  chromosomes. Using tables to store splices.

	* spliceclean.c: Fixed bugs in parsing input

2010-07-29 twu

	* spliceclean.c: Added procedure to free memory

	* gsnap_splices.c: Fixed bug from freeing table keys too early

	* gsnap_splices.c: Enabled program to handle all chromosomes in a single run

	* Makefile.dna.am, gsnap_splices.c, iit-read.c, uinttable.c, uinttable.h:
	  Using a table to store splice sites in gsnap_splices.c

2010-07-28 twu

	* gsnap_splices.c: Removed -F and -R flags for separate strands

	* Makefile.dna.am, gsnap_splices.c, sam_splices.c: Integrated sam_splices.c
	  and gsnap_splices.c into a single file

	* mem.c, genome.c: Removed unused variable

	* iit-read.c, substring.c: Removed unused code

	* blackboard.c: Returning bool type explicitly

	* sequence.c: Resolving compiler warning about type casting

	* gsnap_splices.c, sam_splices.c, spliceclean.c: Allowing -s flag to print
	  annotations about known splicesites

	* struct-stat64.m4: Added missing m4 file

	* Makefile.am, cvs2cl.pl, svncl.pl: Replace cvs2cl.pl with svncl.pl

	* CVSROOT: Removed CVSROOT directory

2010-07-27 twu

	* assert.h: Changed compiler variable

	* VERSION, config.site.rescomp.prd, index.html: Revised for 2010-07-27
	  release

	* bootstrap.dna: Using autoreconf

	* README: Modified statement about -m flag and about types in SNP IIT files

	* MAINTAINER: Added statement about assert.h

	* tally_expr.c: Standardized output format

	* gsnap.c: Made -q flag work correctly for single-thread mode. Printing run
	  time at end of each run.

	* gmap.c: Calling correct exception for a sigtrap

2010-07-26 twu

	* Makefile.dna.am, Makefile.gsnaptoo.am: Using datadir in snpindex

	* iit-read.c, iit-read.h: Fixed IIT_index function

	* snpindex.c: Using datadir. Fixed error messages.

	* stage3hr.h, substring.c, substring.h: Removed fields for halfintrons.

	* stage3hr.c: Fixed bug in removing duplicates. Removed fields for
	  halfintrons.

	* stage1hr.c, stage1hr.h: Implemented short-end splicing for known splice
	  sites

	* mem.c: Changed monitoring statement to print only in debug mode

	* iit-read.c, iit-read.h: Added procedure for typed and signed intervals
	  based on divno

	* gsnap.c: New interface to stage 1 procedures

2010-07-23 twu

	* VERSION, index.html: Revised for 2010-07-23 release

	* spliceclean.c: Processing forward and reverse splices separately

	* gsnap.c: Fixed bug where -a flag modified trim_maxlength

	* assert.h: Turned off assertion checking

	* Makefile.dna.am: Added tally_exclude

	* substring.c: Modified debugging statements for trimming

	* stage3hr.c: Added debugging statements

	* iit-read.c, iit-read.h: Added function IIT_interval_sign

2010-07-22 twu

	* tally_expr.c: Allowing printing over all positions

	* tally_expr.c: Allowing multiple tallies

2010-07-21 twu

	* gsnap.c, sam.c, sam.h, sequence.c, sequence.h: Fixed handling of quality
	  scores to match that of sequence. Added -j flag to specify amount of
	  shift for quality scores.

	* setup1.test.in: Putting test chromosome in subdirectory

	* setup2.test.in: Revised test for new gmapindex, but test not being used
	  currently

	* iit.test.in: Not testing for diff in iittest.iit

	* align.test.ok, coords1.test.ok, map.test.ok: Changed expectations to match
	  latest program output

	* iittest.iit.ok: Using latest IIT version

	* Makefile.am: Using ref3offsets and ref3positions instead of idxoffsets and
	  idxpositions

	* acx_mmap_fixed.m4, acx_mmap_variable.m4: Added stdlib.h and unistd.h
	  headers

	* bootstrap.dna, bootstrap.gsnaptoo, bootstrap.three,
	  config.site.rescomp.prd, config.site.rescomp.tst, gsnap-fetch-reads.pl,
	  gsnap-fetch-reads.pl.in, gsnap-remap.pl, gsnap-remap.pl.in, cum.c,
	  dibaseindex.c, geneadjust.c, pairingtrain.c, splicegraph.c, tallyadd.c,
	  tallygene.c, tallystrand.c: Initial import into CVS

	* config.site.gne: Removed old config file

	* acinclude.m4: Including builtin m4 code

	* MAINTAINER: Added notes about checking Bigendian behavior

	* archive.html, index.html: Revised for 2010-07-20 release

	* configure.ac: Better checking for VERSION

	* bootstrap.pmaptoo: Added --force flag

	* bootstrap.gmaponly: Added autoreconf step

	* README, VERSION: Changed for 2010-07-20 release

	* gmap_process.pl.in: Removed check for contig version

	* gmap_update.pl.in: Not updating chromosome or contig IIT files

	* gmap_setup.pl.in: Providing -q and -Q flags for GMAP and PMAP indexing
	  intervals.

	* gsnap_splicing.pl: Program is superseded by C program gsnap_splices

	* gsnap_splicing.pl: Various changes. Program is superseded by C program
	  gsnap_splices.

	* gmap_compress.pl.in, gmap_reassemble.pl.in, gmap_uncompress.pl.in,
	  md_coords.pl.in: Using "use warnings" instead of "-w" flag

	* fa_coords.pl.in: Handling duplicate occurrences of a chromosome. Limiting
	  number of warnings.

	* Makefile.am: Added gmap_update

	* gmap.c, pair.c, pair.h, stage3.c, stage3.h: Implemented -f 4 GFF estmatch
	  format based on patch from Shaun Jackman and Eoghan Harrington of British
	  Columbia Genome Sciences Centre.

	* chimera.c: Commented out problematic code, to be resolved later

	* get-genome.c: Fixed coordinates when retrieving map file contents

	* pairingstrand.c, tallyhmm.c, geneeval.c: Using new Parserange_universal
	  function

	* tally.c, tally.h: Treating counts as long ints

	* splicegene.c: Changed algorithm

	* spliceeval.c: Removed unused code

	* plotgenes.c, plotgenes.h: Several changes, including trying to resolve
	  fatal errors

	* pdldata.c, pdldata.h: Implemented Pdldata_new and Pdldata_write

	* pairinggene.c: Counting found splices as flats

	* pairingflats.c: Changed algorithm for finding flat regions

	* pairingcum.c: Treating high and low reads separately

	* oligo-count.c: Using new interface to Reader_new.

	* lgamma.c, lgamma.h: Handling counts as long ints

	* hint.c, hint.h: Changed models

	* genecompare.c: Separate output for forward and reverse chromosome strands.

	* gdiag.c: Removed some output. Using new interfaces to IIT_read.

	* dibase.c, dibase.h, exonscan.c: Change in algorithm

	* chimera.c: Using Path_matchscores instead of Stage3_matchscores

	* cappaths.c: Using xintercepts instead of slopes

	* boyer-moore.c, boyer-moore.h: Added procedures for chop_primers.c

	* add_rpk.c: Change of output format

2010-07-20 twu

	* stage1hr.c: Tightened requirements further for splice site probabilities
	  on distant splicing.

	* stage3hr.c: Using nmatches to filter pairs containing terminal alignments

	* gsnap.c: Changed advice on RNA-Seq settings for -m.

	* archive.html, index.html: Released version 2010-03-10

	* substring.c, substring.h: Computing nmatches directly

	* stage3hr.h: Removed score parameter from Stage3_new_terminal

	* stage3hr.c: Selecting best among terminal alignments. Computing nmatches
	  directly.

	* stage1hr.c: Changed algorithm for finding terminal alignments. Requiring
	  distant splicing to have high splice probabilities.

	* sam_splices.c: Computing readlengths on each end of splice separately

	* gsnap.c, gsnap_splices.c: Added debugging code

2010-07-19 twu

	* stage1hr.c: Using sequences as numeric in some cases

	* maxent.c, maxent.h: Added procedures to handle sequences as numeric

	* gsnap.c: Added a comment to the --help message

	* genome_hr.c, genome_hr.h: Added a procedure to retrieve a dinucleotide

	* genome.c, genome.h: Added a procedure to retrieve sequences as numeric

	* Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
	  Makefile.pmaptoo.am: Revised files and programs as needed

2010-07-16 twu

	* stage3hr.c: Requiring that dual translocations be concordant only across
	  the same two chromosomes.

	* smooth.c: Conserving listcells where possible

	* oligoindex.c, oligoindex.h: Removed computation of fingerprint

	* list.c, list.h: Implemented List_transfer_one and List_push_existing

	* gsnap.c: Performing trimming by default

	* dynprog.c: Ensuring that finalscore is assigned in Dynprog_genome_gap.

	* stage3hr.c, substring.c, substring.h: Providing a minlength parameter to
	  Substring_new, so end indels do not get eliminated.

	* chop_primers.c: Initial import into CVS

	* sam_tally.c: Trimming uses -3 for mismatches and resets negative scores to
	  zero. Handling hard clipping.

	* gsnap_tally.c: Trimming uses -3 for mismatches and resets negative scores
	  to zero

	* gsnap_splices.c: Sorting splices using both ends.

	* sam_splices.c: Handling AT-AC introns. Sorting splices using both ends.

	* samread.c, samread.h: Returning acc in parsing line

	* sam.h: Renamed NOT_PRIMARY bit

	* sam.c: Implemented hard clipping of sequences for SAM output. Enabled
	  printing of distant splices onto two separate lines. Using NOT_PRIMARY
	  bit in flag.

	* sequence.c, sequence.h: Implemented hard clipping of sequences for SAM
	  output

	* stage3.c: Removed unused procedures. Conserving listcells when possible.

	* stage2.c: Removed unused procedures

	* gmap.c: Removed references to Intpool_T

	* pairpool.c: Setting initial value for state

	* matchpool.c, matchpool.h: Implemented Matchpool_free_memory procedure

	* mem.c, mem.h: Added procedures for computing memory usage

	* stage3hr.c, substring.c, substring.h: Now trimming from both ends of
	  terminal alignment. Explicitly specifying which ends to trim.

	* substring.h: Replaced Substring_T with T

	* substring.c: Allowing Substring_new to return NULL if trimmed alignments
	  are poor. Replaced Substring_T with T. Resetting score to zero when it
	  becomes negative in trimming.

	* stage1hr.c, stage3hr.c: Allowing Substring_new and Stage3_new to return
	  NULL if trimmed alignments are poor.

	* substring.c: Changed mismatch score from -1 to -3 for trimming

	* stage1hr.c, stage3hr.h: Added notion of ambiguous splices.

	* stage3hr.c: Added notion of ambiguous splices. Fixing removal of
	  duplicates.

2010-07-14 twu

	* stage3hr.c, stage3hr.h: Implemented Stage3_substring_low

	* samread.c: Added debugging comments

	* sam.c, sam.h: Moved flag constants to sam.h. Using Stage3_substring_low
	  to print chromosomal pos.

	* sam_splices.c: Simplified loop

	* gsnap_splices.c: Having lines_gc return NULL

	* gsnap_tally.c: Fixed trimming. Turning off trimming by default.

	* sam_tally.c: Initial import into CVS

	* sam_splices.c, samread.c, samread.h: Fixed bug in specifying allowed
	  dinucleotides. Moved parsing procedures to samread.c.

2010-07-13 twu

	* sam_splices.c: Initial import into CVS

2010-07-10 twu

	* spliceclean.c: Changed variable names

	* stage2.c, stage2.h, stage3.h: Removed stage2 fingerprint

	* gmap.c: Added freeing of pairpool and diagpool memory at certain intervals.

	* pair.c, pair.h, stage3.c: Moved HMM code from pair.c to stage3.c

	* pairpool.c, pairpool.h: Implemented Pairpool_free_memory function

	* diagpool.c, diagpool.h: Implemented Diagpool_free_memory function

	* gsnap.c: Added ability to remove adapters from paired-end reads.
	  Providing option for maxlength on trimming.

	* gmap.c: Using Stage2_scan method to rank gregions. Providing additional
	  diagnostic options.

	* diag.c, diag.h, diagpool.h: Added ability to allocate memory for
	  diagonals, rather than using diagpool

	* tally_expr.c: Fixed bug in using IIT index

	* substring.h: Added handling of terminal reads

	* substring.c: Using trimming maxlength. Fixed printing of sequences with
	  adapters.

	* stage3hr.c: Fixed identification of duplicates. Using total matches to
	  compare results, rather than score.

	* stage3.c, stage3.h: Using an HMM to find bad sections and fixing resulting
	  dual breaks.

	* stage2.c, stage2.h: Added Stage2_scan procedure. Providing diagonals for
	  diagnostic purposes. Computing a fingerprint.

	* stage1.c: Using a boolean to see if weight exists rather than depending on
	  floating point value

	* sequence.h: Added handling of finding adapters. Computing sequence
	  quality for trimming.

	* sequence.c: Fixed bug where fastq quality line begins with ">". Added
	  removal of adapters from paired-end data.

	* sam.h: Removed genome from argument lists

	* sam.c: Fixed bugs in coordinates, especially involving trimmed reads.
	  Handling terminal reads.

	* result.c, result.h: Added ability to report intermediate gregions or
	  diagonals

	* oligoindex.h: Added computation of fingerprint

	* oligoindex.c: Added necessary clearing of oligoindex.

2010-07-09 twu

	* pairdef.h, pairpool.c: Added Pair_goodness_hmm procedure.

	* pair.c, pair.h: Added Pair_goodness_hmm procedure. Added printing of
	  stage2 fingerprint.

	* orderstat.c: Removed reliance on a floating point equality

	* mem.c, mem.h: Added leak check procedures

	* match.c, match.h, matchdef.h: Using a boolean to record whether weight is
	  zero or not, rather than relying on floating point

	* indexdb_hr.c: Added comment

	* indexdb.c: Fixed printf procedure

	* iit-read.h: Removed unused IIT_print prototype

	* iit-read.c: Fixed print_record procedure

	* gsnap_tally.c: Fixed trimming procedure. Added reference nucleotide in
	  all lines. Fixed processing of all chromosomes.

	* gsnap_splices.c: Fixed parsing. Made uniquep false by default. Added
	  info about nextensions and nunique.

	* gregion.c, gregion.h: Added fields ncovered and source, plus function
	  Gregion_cmp

	* get-genome.c: Removed unused function print_map

	* genome.c, genome.h: Added function Genome_get_char

	* dynprog.c: Added space for formatting

	* stage1hr.h: Setting a maxlength on trimming

	* stage1hr.c: Finding terminals rather than halfintrons. Fixed case where
	  splice ends are adjacent in genome.

2010-07-02 twu

	* stage3hr.h: Added support for a terminal alignment.

	* stage3hr.c: Added support for a terminal alignment. Removed
	  halfintron_support field.

2010-05-28 twu

	* iit_store.c: Fixed issues with removing and re-inserting null divstring.

2010-05-26 twu

	* stage1hr.c: Added trim_maxlength. Added nmismatches to halfintron
	  alignments.

	* stage3hr.h: Added trim_maxlength.

	* stage3hr.c: Added trim_maxlength. Checking pairlength on samechr_single
	  to see if concordant.

2010-05-21 twu

	* iit-read.c, iit-read.h, stage3.c: Fixed printing of chromosome in map
	  results

2010-05-20 twu

	* stage3hr.c: Finding concordant pairs against translocations with chrnum ==
	  0, by making copies for each chrnum and storing in effective_chrnum.

2010-05-17 twu

	* substring.c, substring.h: Added halfintron support field.

	* stage3hr.c, stage3hr.h: Implemented sense consistency in paired-end
	  alignments

	* stage1hr.c: Fixed bugs in previous implementation of half introns

2010-05-16 twu

	* stage1hr.c, stage3hr.c: Implemented new way of handling half introns, by
	  storing best half intron for sense and for antisense

2010-05-14 twu

	* resulthr.c, resulthr.h, stage1hr.c, stage3hr.c, stage3hr.h: Added
	  procedure for finding samechr pairs if no concordant ones found. Revised
	  result types to include PAIREDEND_SAMECHR_SINGLE and
	  PAIREDEND_SAMECHR_MULTIPLE.

2010-05-13 twu

	* stage1hr.c: Added conditional compilation statements for filtering
	  halfintrons

	* gsnap.c, stage3hr.c, stage3hr.h: Handling failsonly and nofails flags for
	  paired-end data. Printing FASTQ format for failsonly on single-end data.

2010-04-16 twu

	* iit-write.c: Fixed bug in freeing data when number of intervals is zero

2010-04-12 twu

	* iit-read.c: Commented out IIT_index function

	* sam.c: Fixed situation where query has no mapping and mate is an
	  interchromosomal splice

2010-04-05 twu

	* tally_expr.c: Initial import into CVS

2010-04-02 twu

	* iit_get.c: Added allele information to -T option

2010-03-24 twu

	* gsnap.c, sequence.c, sequence.h: Implemented processing of FASTQ files

	* gmap.c: Using new interface to blackboard.c

	* blackboard.c, blackboard.h: Added input2 to Blackboard_T object

	* stage3hr.c, stage3hr.h: Fixed classification of paired-end reads when one
	  or both ends have a translocation.

2010-03-09 twu

	* stage3hr.c: Revised half_intron_score. Using that score when comparing
	  overlapping half_introns with one another.

	* gsnap.c, stage1hr.c, stage1hr.h: Added parameter for
	  min_distantsplicing_identity

	* stage1hr.c: Providing querylength information when making Stage3_T splice
	  objects

	* stage3hr.c, stage3hr.h: Adding a penalty to half-intron alignments based
	  on the amount of sequence that was not aligned.

	* stage3hr.c: Changed output for samechr results

	* substring.c: Printing sub:0 instead of exact

	* stage1hr.c: Checking for exact matches that cross chromosomal boundaries

2010-03-08 twu

	* resulthr.c: Making all paired reads of type concordant

	* stage3hr.c: Added printing of samechr as a special case of
	  PAIREDEND_AS_SINGLES_UNIQUE.

	* sam.h: Added mate information to nomapping result.

	* sam.c: Removed unused code. Fixed printing of query string. Added mate
	  information to nomapping result.

	* gsnap_tally.c: Handling new output format for GSNAP

	* gsnap.c: Using new interface for SAM_print_nomapping

	* README: Added more information about GSNAP features and output

2010-03-04 twu

	* iit-read.c, iit-read.h: Added function IIT_dump_sam

	* gsnap.c: Renamed resulttypes

	* resulthr.c, resulthr.h: Added resulttype PAIREDEND_AS_SINGLES_UNIQUE

	* substring.c, substring.h: Added function Substring_match_length

	* stage3hr.h: Computing chrnum, chroffset, genomicstart, and genomicpos at
	  Stage3_T level for splices.

	* stage3hr.c: Pairing up at each successive score level. Computing chrnum,
	  chroffset, genomicstart, and genomicpos at Stage3_T level for splices.

	* stage1hr.c: Fixed bug allowing deletion to extend past genomicpos 0.
	  Fixed cases where known splicing occurs near end of sequence. Removing
	  duplicate hits before pairing up ends.

	* sam.c: Made multiple changes to generate correct SAM output

2010-03-01 twu

	* substring.c, substring.h: Removed unnecessary parameters during printing

	* stage3hr.h: Removed unnecessary parameters.

	* stage3hr.c: Added support information to splices, and using it to select
	  best half introns. Removing unnecessary parameters during printing.
	  Checking for abort in pairing process, based on local counts.

	* stage1hr.c: Added support information in making splices. Not checking for
	  sufficiency for half introns. Using an abort_pairing_p flag, and when
	  true, recomputing ends as singles.

	* splicing-score.c: Using parserange module. Allowing range to be specified.

	* iit-read.h: Removed unused parameter

	* iit-read.c: Changed format strings to eliminate compiler warnings

	* genome.c: Added parentheses around some conditional statements

2010-02-26 twu

	* stage3.c: Removed unused parameters from print functions

	* sequence.c: Handling sequence at end of file without line feed

	* reader.c: Commented out unused code

	* gsnap.c: Added flags for SAM and quiet-if-excessive. Dropped flags for
	  probability thresholds.

	* datadir.h: Added external interface for a function

2010-02-25 twu

	* sam.c: Fixed bug where number of deletions was being reported as a
	  negative number

	* genome_hr.c, genome_hr.h, stage1hr.c: Removed computation of snpdiffs by
	  genome_hr

	* genome_hr.h, genome_hr.c: Added code for performing trimming. Using
	  macros for clearing and setting outside regions in start and end blocks.

	* stage1hr.c: Added trimming of splice ends to avoid extending into region
	  of many mismatches. Saving all splice ends that have sufficient sequence
	  and probability support.

2010-02-23 twu

	* substring.c: Fixed printing of splices. Fixed bugs in retrieving SNP
	  information.

	* stage3hr.h: Returning found score from all functions that create a
	  Stage3_T object

	* stage3hr.c: Fixed computation of pair length. Fixed search for concordant
	  pairs.

	* stage1hr.h: Removed unused parameters

	* stage1hr.c: Using found score rather than found number of mismatches.
	  Fixed cases where indel pos was outside of query range.

	* spanningelt.c: Fixed typecast error

	* sam.c, sam.h: Implemented SAM output for paired-end reads

2010-02-12 twu

	* resulthr.c, resulthr.h, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
	  substring.h: Changed output format to have separate columns for alignment
	  information and pair information. Standardized output routines. Three
	  categories for paired-end reads: concordant, samechr, and unpaired.

2010-02-11 twu

	* sam.c, sam.h, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
	  substring.h: Rearranged and cleaned up code for making substrings

2010-02-10 twu

	* gmap_process.pl.in: Removed code that removed version numbers on accessions

2010-02-03 twu

	* indexdb.c: Fixed string formatting

	* snpindex.c: Fixed some printing statements

	* get-genome.c: Changed call to parserange to match new interface

	* uintlist.c, uintlist.h: Added Uintlist_find command

	* table.c, tableint.c: Added stdlib.h header file

	* stage3.h: Added genome to print_alignment for splice sites scores in output

	* stage3.c: Allowing null gaps again

	* stage2.c: Added separate data types for a 1-dimensional matrix and
	  2-dimensional matrix representation

	* stage1hr.c: Prevented splicing unless both dinucleotides are present

	* stage1.c: Removed extensions of gregions

	* sequence.c: Commented out unused functions

	* resulthr.c, resulthr.h: Renamed result type to PAIRED_AND_PAIRABLE

	* parserange.c, parserange.h: Implemented parse_query function

	* pair.c, pair.h: Added donor and acceptor scores to output

	* orderstat.c, orderstat.h: Added functions for long int

	* oligoindex.c, oligoindex.h: Added parameter oned_matrix_p

	* nr-x.h: Added ppois functions

	* nr-x.c: Added ppois functions. Fixed bug in pbinom for zero observed
	  counts.

	* list.c, list.h: Rewrote function for List_insert

	* intlist.c: Handling case of empty list better for conversion to string

	* interval.c, interval.h: Added functions for sorting intervals by position

	* indexdb.c: Added debugging statements

	* iit_plot.c: Using new interface to Genome_new

	* iit_get.c: Implemented statistics function. Using long int for tally
	  IITs. Using parserange module.

	* iit-read.h: Added function for divlength

	* iit-read.c: New implementation of sorting of intervals by position

2010-02-02 twu

	* gmapindex.c: Increased expected table size for number of chromosomes.
	  Stopping warning messages after 100 printed.

	* gmap.c: Added genome parameter to Stage3_print_alignment

	* get-genome.c: Using parserange module. Implemented flanking segments.

	* genome_hr.c: Removed unused variables for certain compile-time conditions

	* genome-write.c: Stopping warning messages after 50 are printed

	* gdiag.c: Formatting changes

	* except.c: Using pointers to exception frame objects

	* dynprog.c: Reduced PAIRED_OPEN penalty from -24 to -18

	* diag.h: Added function Diag_range

	* diag.c: Reduced EXTRA_BOUNDS parameter

	* datadir.c: Fixed bug where insufficient buffer space was provided for one
	  string

	* backtranslation.h: Removed void in formal parameter lists

	* backtranslation.c: Casting character array indices to ints

	* splicegene.c: Attempted to find genebounds on all sites

2010-02-01 twu

	* splicegene.c: Implemented finding and reporting of alternate splice forms

	* splicegene.c: Differentiated donor and acceptor sites. Handling reverse
	  strand in reverse direction. Noting conflicts when either endpoint is
	  close to an endpoint on the other.

2010-01-31 twu

	* splicegene.c: Completely new rewrite based on pairinggene.c. Attempt to
	  assign genebounds based on tally high and tally low.

2010-01-30 twu

	* splicegene.c: Using tally high and low. Added hooks for alternate splice
	  site.

	* spliceclean.c: Performing validation based on ratio of count to maxcount
	  over region

2010-01-29 twu

	* spliceclean.c: Using less memory. Attempted validation of splices based
	  on envelope.

	* spliceclean.c: Initial import into CVS

	* pairingcum.c: Implemented filtering based on significance at endpoints

2010-01-27 twu

	* pairingcum.c: Added computation on floors as well as ceilings

2010-01-25 twu

	* pairinggene.c: Testing flat regions against the splice IIT to determine if
	  they are intron-like. Also adding splice edges to the original list.
	  Splices will therefore need to be filtered.

	* pairinggene.c: Reverted to previous version using only observed GSNAP
	  splices

	* pairinggene.c: Improved algorithm for distinguishing between intergenic
	  flats and intron flats.

2010-01-24 twu

	* geneeval.c: Initial import into CVS

	* pairinggene.c: Improved algorithm for detecting intergenic regions. For
	  flats, we can use a loose criterion without a level threshold, because of
	  the ordering constraint. We are using both edges from flats and from
	  gsnap splices. We added a procedure for distinguishing between intergenic
	  regions and long exons based on the counts_tally.

	* pairinggene.c: Using iblocks instead of nblocks to control exon segments,
	  so essentially all combinations of introns are considered

	* pairinggene.c: Reading in edges from the splices_iit file, presumably
	  after filtering

	* pairinggene.c: Attempt to get more edges by looking up splice edges when a
	  flat does not yield clean ones

	* cappaths.c: Added analysis of slopes and attempt to find a flat region

	* pairinggene.c: Fixed bug with negative unsigned int

	* pairinggene.c: For objective function, using the count of observed splices
	  from GSNAP.

	* pairinggene.c: Eliminated concept of an eblock (or exon block). Trying
	  all intron combinations, since intergenic blocks are sufficient to contain
	  the search space.

	* pairinggene.c: Fixed bug where an up was a terminal, which hid the
	  downstream down. Added some debugging code.

	* pairinggene.c: Added checking of edges based on genome splice sites

2010-01-23 twu

	* pairinggene.c: Made intergenic regions go between flats, and increased the
	  length requirement. Using auto_exonlength for adding exons.

	* pairinggene.c: Restricted intergenic blocks to be between adjacent down to
	  up edges

	* pairinggene.c: Reordered procedures to minimize memory usage

	* pairinggene.c: Implemented a new algorithm for constructing the graph,
	  using various blocks and building the graph in stages

	* tally.c: Added functions for the median and for adding a runlength to an
	  existing count

	* pairinggene.c: Fixed error in formula for computing down edge

	* splicing-score.c: Initial import into CVS

	* pairingcum.c: Fixed a bug where the cum was being put at the wrong
	  position, causing the down edge to be 1 position too small.

    * pairinggene.c: Implemented trimming of ends

    * pairinggene.c: Implemented a new test for intergenic regions based on
      finding a long flat region in the counts, which should not happen in an
      exon.

2010-01-22 twu

    * pairinggene.c: Added a test for sharpness based on an area ratio

    * pairinggene.c: Fixed dynamic programming procedure

    * pairinggene.c: Keeping a min-max test on whether introns are acceptable,
      but using mean levels of introns and exons for scoring.

    * pairinggene.c: Using zero-based check on pairingfull to test for
      intergenic regions. Added a greedy addition of introns.

    * pairinggene.c: Attempt to use pairing full information and gradual
      downsloping to find UTRs.

    * pairinggene.c: Using intron level minus exon level to determine edges with
      greater sensitivity. Implemented scores as double, rather than int,
      although currently using mincount and maxcount.

    * pairinggene.c: Changed from onepath dynamic programming to multiple paths
      with terminals. Using explicit objects for exons and introns.

    * pairinggene.c: Implemented finding of initial ups

2010-01-21 twu

    * pairinggene.c: Initial import into CVS. Dynamic programming based on
      splicegene.c

2010-01-20 twu

    * spliceeval.c: Implemented computation of reception zone and init/term
      status of splice sites

    * gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h: Implemented trimming of ends
      of sequences

    * stage1hr.c: Allowing GC-AG splicing as well as GT-AG. Using a sliding
      scale of splice site probabilities based on alignment support.

    * stage3hr.c, stage3hr.h: Added code for using a geneprob IIT file to assist
      in finding splice sites

    * gsnap.c: Added a -g flag for using a geneprob IIT file to assist in
      finding splice sites

2010-01-19 twu

    * tally.c, tally.h: Added function Tally_mean()

    * tallyflats.c: Analyzing both fwd and rev tallies and storing in a single
      IIT file

    * splicegene.c: Using variability in pairing.unk rather than tallyflats to
      determine intragenic regions

    * spliceeval.c: Computing slopes internally, rather than relying upon
      pairingflats

    * pairingflats.c: Added median smoothing

    * lgamma.c, lgamma.h: Added ppois function

    * genecompare.c: Initial import into CVS

    * geneeval.c: Changed name to genecompare.c

    * geneeval.c: Added ability to handle comment lines in gene

2010-01-17 twu

    * lgamma.c, lgamma.h, random.c, random.h, pairingstrand.c, tallyflats.c:
      Initial import into CVS

    * tally.c, tally.h: Added functions

    * geneeval.c: Printing goldstandard information in comment line

    * cappaths.c: Using pairing fwd and rev iits

    * spliceeval.c: Removed unused code. Added procedures for merging
      pairingflats.

    * splicegene.c: Added a check for validity of tally flats by looking at
      tally information

    * tallyflats.c: Keeping track separately of zero regions and flat regions.
      Changed parameters.

    * gsnap_splices.c: Removed unused code

    * gsnap_tally.c: Added flags for picking specific strands and for forced
      trimming at ends

    * pairingflats.c: Storing regions and then printing them. Have three
      states, for zero, flat, and bumpy.

    * pairingcum.c: Print all run lengths, even those with level 0

    * splicegene.c: Using tallyflats to determine boundaries for donor to
      acceptor

    * splicegene.c: Removed donorprob and acceptorprob. Recording and printing
      all extra information from each splice. Removed unused paths code.

2010-01-16 twu

    * pairingflats.c: Initial import into CVS

    * spliceeval.c: Added an intron_transition buffer at the ends of each intron,
      where level changes are ignored.

2010-01-15 twu

    * splicegene.c: Using a gap test on pairing IITs to determine whether to
      link donor to previous acceptor.

    * spliceeval.c: Now computing statistics based on edge finding using Poisson
      model and number of consecutive zeroes.

    * spliceeval.c: Printing mean pairing levels of each splice

2010-01-13 twu

    * tallyhmm.c: Integrated parserange and tally modules. Removed hints.
      Added edge detection. Simplified state model.

2010-01-12 twu

    * tally.c, tally.h: Implemented an exon test and a scanning solution for
      pairing information.

    * splicegene.c: Using an exon test to determine if we can join splices

    * littleendian.c, geneeval.c: Initial import into CVS

    * bigendian.c: Created distinct function names for 64-bit procedures. Added
      procedures for OUTPUT_BIGENDIAN. Fixed compiler warning messages about
      truncating unsigned ints to chars.

    * bigendian.h: Created distinct function names for 64-bit procedures

2010-01-11 twu

    * cappaths.c: Initial import into CVS

2010-01-09 twu

    * tally.c, tally.h: Initial import into CVS

2010-01-08 twu

    * pairingcum.c: Initial import into CVS

2010-01-07 twu

    * splicegene.c: Computing exonbounds for each donor

2010-01-05 twu

    * tallyhmm.c: Using edges rather than edgepairs

2010-01-04 twu

    * parserange.c, parserange.h, spliceeval.c: Initial import into CVS

    * splicegene.c: Added iterative method to remove conflicting splices

    * splicegene.c: Computing one path over forward and one path over reverse
      strands, instead of collecting terminals

2010-01-03 twu

    * splicegene.c: Added reading and printing of probability values. Added
      debugging statements for Paths_remove_dominated

2009-12-28 twu

    * stage1hr.c, stage1hr.h: Added separate stage for half introns. Added hook
      for geneprob_iit eval.

2009-12-22 twu

    * splicegene.c: Initial import into CVS

    * gsnap_splices.c: Added command for dumping graph

2009-12-21 twu

    * gsnap.c, stage3hr.h: Added ability to print output in SAM format

    * stage3hr.c: Added ability to print output in SAM format. Fixed bug in
      identifying pairing.

2009-12-17 twu

    * exonscan.c: Added function for writing edges

2009-12-10 twu

    * stage1hr.c: Fixed bug in insertion at end of query sequence. Removed
      requirement for HALF_INTRON_END_LENGTH. Made separate done levels for 5'
      and 3' ends in paired alignment.

2009-12-04 twu

    * gsnap_tally.c: Added ability to run on forward or reverse complement
      strand only

    * gsnap_tally.c: Added ability to run on all chromosomes

2009-11-25 twu

    * stage1hr.h: Added new masktypes

    * stage1hr.c: Created a single procedure for omit_oligos. Altered xfirst
      and xlast calculation.

2009-11-20 twu

    * stage1hr.c: Made slight efficiency improvements in accessing floor->score
      array

2009-11-18 twu

    * stage2.c: Combined features of versions 235 and 237 for both GMAP and PMAP
      to work.

    * stage2.h: Updated interface

    * stage2.c: Fixed bug where processed was updated too soon

    * pairpool.c, pairpool.h: Added function Pairpool_transfer_n

    * orderstat.c: Commented out debugging function

    * gmap.c, oligoindex.c: Restored variables specific to gmap

    * oligoindex.c, oligoindex.h: Added major and minor oligoindices

    * gmap.c: Added Oligoindex_clear_inquery in all cases

    * stage2.c: Restored stage 2 to working condition

2009-11-06 twu

    * maxent.c, maxent.h: Added functions for reporting log odds scores

    * littleendian.h: Added interface for WRITE_UINT

    * list.c: Added check for NULL in List_truncate

    * iit_fetch.c: Added flag for computing cumulative total of an iit. Removed
      unused variables.

    * iit_get.c: Added flags for computing mean and overall total of tally iit.

2009-11-04 twu

    * add_rpk.c: Initial import into CVS

2009-10-30 twu

    * exonscan.c, hint.c, hint.h, tallyhmm.c: Using edgepair and splice
      information in transitions, and tally and pairing information in
      emissions. Providing separate training information for transitions and
      emissions.

2009-10-27 twu

    * hint.c, hint.h: Initial import into CVS

2009-10-26 twu

    * stage3.c: Removed unused variables

2009-10-14 twu

    * pair.c: Fixed bug in PSL output

2009-10-08 twu

    * exonscan.c, tallyhmm.c: Multiple changes. Version used for rGASP
      submission 2.

2009-10-02 twu

    * gsnap.c, stage1hr.c, stage1hr.h: Allowing user to specify max mismatches
      as a fraction of read length

    * stage3hr.c, stage3hr.h: Made printing of score and insert length more
      consistent. Made filtering of paired hits by score and duplicates
      consistent with filtering of single hits.

    * resulthr.c, resulthr.h: Removed Pairedresult_T type

    * stage3hr.c: Made printing of insert length consistent for paired-end reads

    * gsnap.c, stage1hr.c, stage1hr.h: Added parameters for minimum end matches
      for local and distant splicing

2009-10-01 twu

    * spanningelt.c: Made intersection procedures remove duplicates

    * snpindex.c: Formatting change

    * gsnap.c: Added parameters for second part of novel splicing and half
      intron minimum support

    * genome_hr.c, genome_hr.h: Returning ncolordiffs

    * gbuffer.h: Added procedures for allocing and freeing contents

    * blackboard.c: Fixed problem with hanging when using -q batch feature

    * stage1hr.c: Made min_end_matches work on middle indels

    * stage3hr.h: Added printing of colordiffs and score.

    * stage3hr.c: Added printing of colordiffs and score. Fixed problem with
      printing splice on second, inverted read.

    * stage1hr.h: Added half_intron_min_support parameter.

    * stage1hr.c: Added half_intron_min_support parameter. Fixed bug where
      deletion indels were mixed up with colordiffs.
      Fixed bug where splice
      junctions were evaluated past beginning of genome.

2009-09-21 twu

    * tallyhmm.c: Uses hints from splices iit and altexons iit files. Added
      median filtering.

    * exonscan.c: Added ability to get splice sites from splices iit and
      altexons iit file

    * tallyhmm.c: Adding information from splices_iit and altexons_iit files

    * tallyhmm.c: Implemented two-strand solution as default, with ability to
      force 1-strand solution. Provided hooks for splices and altexons iit
      files.

    * gsnap_tally.c: Added flag for handling 2-base encoded GSNAP output

    * gsnap_splices.c: Eliminated printing of overlapping paths.

2009-09-20 twu

    * gsnap_splices.c: Fixed various bugs

    * gsnap_splices.c: Added ability to find alternate skipped or extra exons at
      each acceptor.

    * gsnap_splices.c: Initial import into CVS

2009-09-19 twu

    * tallyhmm.c: Implemented faster way of computing running percentiles

    * tallyhmm.c: Implemented ability to read lambda parameters from a file.
      Attempted to add a SINGLE exon state and allow transitions from NON to
      SINGLE even when no edges were present.

    * gsnap_tally.c: Had program determine own trimming based on scoring matches
      and mismatches from the ends.

    * iit_get.c: Added -M flag for reporting mean of a region in a tally IIT
      file.

    * gsnap_splicing.pl: Initial import into CVS

2009-09-18 twu

    * exonscan.c: Printing information about sharp edges

    * exonscan.c: Fixed bug in recording history. Added hook for allowing GC
      donor site. Added splice model probability to name of splice site.

    * tallyhmm.c: Added flag for printing lambdas

    * tallyhmm.c: Added ability to handle multiple sites at the same position,
      by making a mixture of transition tables. Wrote down transition table
      explicitly.

2009-09-17 twu

    * tallyhmm.c: Added smoothing to estimation of lambdas. Added routines for
      printing genes.

    * tallyhmm.c: Consolidated separate fivefwd, fiverev, threefwd, and threerev
      sites back into up and down sites.

    * tallyhmm.c: Working version of Viterbi algorithm, but still need output of
      segments.

    * tallyhmm.c: Initial import into CVS

    * segue.c: Attempt to use objective function based on sum of counts,
      relative to threshold.

    * segue.c: Reduced states to be much simpler, where only one strand can be
      coding at a time.

2009-09-16 twu

    * segue.c: Added functions for printing genes by exons.

    * segue.c: Fixed lookback lengths between sites. Added flag for specifying
      knownsites iit.

    * segue.c: Optimizing using mean square error. Simplified code for
      traversing graph.

    * segue.c: Fixed bug with computing cumulative gammln. Version works on
      test data set.

    * segue.c: Complete rewrite to handle both strands simultaneously

2009-09-15 twu

    * exonscan.c: Using LR test to take all acceptable gene ends. Using
      separate end bounds for finding gene ends.

    * exonscan.c: Using both edge algorithms, stepfunction and linear fit. Made
      different objects for Edge_T and Diff_T. Using goodness-of-fit instead of
      xintercept for finding gene ends.

    * exonscan.c: No longer using bootstrap method, but relying on testing of
      sites using goodness of fit.
      Evaluating missing edges for both ups and
      downs based only on greedy splice sites, and then performing both testing
      of sites and tracing.

2009-09-14 twu

    * exonscan.c: Using x-intercepts instead of step function to detect edges

    * exonscan.c: Added hooks for a history-recording mechanism.

    * exonscan.c: Fixed some bugs with array indices. Added flags for debugging.

    * segue.c: Using splice model scores to evaluate introns

2009-09-13 twu

    * exonscan.c: Made numerous tweaks to the scanning algorithm. Incorporated
      finding of ends into scanning, using x-intercepts. Always finding ends
      when an edge with a splice lacks a match.

    * exonscan.c: Implemented two-phase method on stepfunction results, first
      picking steps with highest probabilities, and then bootstrapping
      neighboring steps.

    * exonscan.c: Allowed best prob again for stepfunction results. Fixed bug
      in code in scanning procedure. Distinguishing between donor and acceptor
      types for matching edges.

    * exonscan.c: Large numbers of changes. Implemented scanning method,
      testing at positions with good splice site scores, for finding other ends.
      Using adjacency information to decide whether to scan. Implemented
      testing procedures for ends. Removed unused code.

2009-09-12 twu

    * exonscan.c: Implemented a Gibbs sampling method to speed up identification
      of changepoint, but reverted back to testing goodness of fit exhaustively
      over a limited range.

    * exonscan.c: Implemented a strategy of finding edges only for those that
      appear to be missing. Using changepoint to find those edges, with
      maximizing goodness of fit.

    * exonscan.c: Made reasonably good step function based on log scale.

    * exonscan.c: Implemented rampfunction as anchored to a step result.

2009-09-11 twu

    * exonscan.c: Implemented a ramp detector using linear fitting, but too
      sensitive

    * exonscan.c: Using a cumulative tally to speed up computation of segment
      means

    * exonscan.c: Added hooks for a redo changepoint step

    * segue.c: Implemented traversal of minus strand. Implemented reading of
      splicepairs.

    * segue.c: Implemented scoring of exons using log likelihood. Implemented
      dynamic programming and printing of paths.

2009-09-10 twu

    * exonscan.c: Added splice sites based on analyzing local data, using
      methodology from splicescan.

    * splicescan.c: Implemented posterior log odds calculations.

    * exonscan.c: Added filtering of other ends. Cleaned up unused code.

    * exonscan.c: Made output format consistent with that of splicescan

    * segue.c: Initial import into CVS

    * exonscan.c: Added finding of nearest good splice sites.

2009-09-09 twu

    * exonscan.c: Method based on finding exons. However, will need to switch
      to a dynamic programming method.

    * exonscan.c: Initial import into CVS

2009-09-08 twu

    * splicescan.c: Added options for separate output files, training mode only,
      and random output.

    * stage1hr.c: Fixed algorithm for end indels. Provided hooks for 2-base
      encoding.

2009-09-07 twu

    * splicescan.c: Fixed calculations

2009-09-06 twu

    * splicescan.c: Added ability to use a known splice site IIT file

    * splicescan.c: Initial import into CVS

2009-09-03 twu

    * stage1hr.c: Fixed bug where singlehits5 and singlehits3 not being
      initialized. Set limits on local splicing hits and attempts.

2009-09-02 twu

    * stage3.c, stage3.h: Allowing a re-do of stage 2 for bad exons in middle

2009-08-31 twu

    * gsnap.c: Using single-end hits already computed when paired alignments not
      found.

    * gmap.c: Added minor oligoindices

    * changepoint.c: Added comment

    * stage3hr.c, stage3hr.h: Introduced faster pair-up procedure. Sorting
      paired-end solutions by score.

    * stage1hr.h: For paired-end alignment, returning single-end hits.

    * stage1hr.c: Fixes to paired-end alignment: (1) stopping when excessive
      splicing hits or paired hits found, (2) using new pair_up procedure, (3)
      fixed pairing code, (4) returning single-end hits. For dibase alignment,
      skipping spanning set.

    * iit_store.c: Using total label and annotation lengths to decide if format
      should use 8-byte quantities.

    * iit_get.c: Added flag to explicitly indicate coordinate is a label. Added
      flag to print all zeroes in tally mode.

2009-08-28 twu

    * stage3hr.c, stage3hr.h: Taking a splicing penalty for all splices. Added
      code for marking dibase mismatches.

    * stage1hr.c, stage1hr.h: Made procedure work for 2-base encoded reads

    * oligo.c: Added code to read 2-base encoded queries

    * reader.c, reader.h: Added field to indicate if Reader_T is for dibase
      queries

    * littleendian.h: Added code for handling 8-byte quantities

    * iit-read.c, iit-write.c, iitdef.h: Added version 4 format, which uses
      8-byte quantities to store label pointers and annotation pointers.

    * gsnap_tally.c: Added trimming on left and right

    * gsnap.c: Added flag for 2-base mode. Added local splice penalty.

    * genome_hr.c, genome_hr.h: Provided hooks for dibase procedures

    * genome.c, genome.h: Provided exposure to uncompress_mmap directly from
      blocks, needed by dibase procedures.

    * dibase.c, dibase.h: Initial import into CVS

    * compress.c: Added code for compressing 2-base color genomes, but not
      necessary.

    * bigendian.c, bigendian.h: Added functions for 8-byte quantities

    * access.c: Changed types in debugging statements for off_t

2009-08-21 twu

    * oligo.c: If state is invalid, skipping forward until a valid state is found

    * sequence.h: Added FILE * parameter for oneline outputs.

    * sequence.c: Added FILE * parameter for oneline outputs. Added hooks for
      skipping dashes, but appears to be buggy.

    * gsnap.c: Added flag to turn off output (quiet) if too many are found.

    * stage1hr.c: Classified half introns as long distance

2009-08-20 twu

    * stage1hr.c: Moved half introns after distant splicing. Set fast_level to
      be 1, if user hasn't already specified it.

2009-08-19 twu

    * gmap_setup.pl.in: Using "use warnings" instead of -w flag

2009-08-18 twu

    * stage3hr.h: Added quiet-if-excessive flag.

    * stage3hr.c: Fixed problem where total_nmismatches not being set for
      indels. Making printing of excessive paths consistent with single-end
      behavior.

    * stage1hr.c: Fixed problem where pair_up function was creating circular
      loops by calling List_append more than once.

    * list.c, list.h: Added function List_dump

2009-08-17 twu

    * gsnap-to-iit.c, gsnap_tally.c: Renamed gsnap-to-iit.c to gsnap_tally.c

    * stage1hr.c: Fixed bug where program tried to find deletions at end
      extending past coordinate 0U.

2009-08-14 twu

    * stage3hr.c: Revised paired-end output to show npaths and indication of
      paired or unpaired.

    * bigendian.c, gsnap.c, mem.c, iit-read.c, sequence.c, genome.c, indexdb.c,
      indexdb_hr.c: Fixed compiler warnings from -Wall

    * iit-read.c: Fixed bug where divno not checking the last div.

    * sequence.h: Added a function to the interface

    * genome.c: Using SNP_FLAGS for getting alternate genome

    * indexdb.c, indexdb.h: Added procedure for reading with diagterm and
      sizelimit

    * indexdb_hr.c: Providing information about nmerged

    * maxent.h: Defined a variable to specify maximum storage required

    * maxent.c: Added code for computing splices from revcomp sequences

    * cmet.c: Added a missing type

    * genome_hr.c, stage3hr.c: Fixed compiler warnings from -Wall.

    * stage1hr.h: Added Floors_free() to interface.

    * stage1hr.c: Fixed bug where unsigned int * assigned to signed int *.
      Fixed compiler warnings from -Wall.

2009-08-04 twu

    * list.c, list.h: Added function List_truncate

    * pair.c, pair.h: Added function Pair_fracidentity_max

    * stage3.h: Added some interfaces

    * stage3.c: Forward/reverse decision based on local scoring around each
      intron. Distal/medial step now truncates distal exon at best point, and
      iterates. When edges cross in changepoint step, now chopping shortest end.

    * stage2.c: Allowing rightward and leftward shifts in finding shifted
      canonical introns. Fixed bug in scoring for reverse introns. Adjusted
      scoring for canonical introns.

    * stage1.h: Imposing a size limit on position lists, so large ones are
      ignored.

    * stage1.c: Imposing a size limit on position lists, so large ones are
      ignored. Finding best solutions for each level of numbers of exons.

2009-07-30 twu

    * stage3.c: Using existing pairs/path in end exons, rather than recomputing
      in distal/medial calculation. Moved distal/medial after changepoint.

    * stage3.h: Moved distal/medial calculation to be after changepoint

    * README: Added information about processing reads from bisulfite-treated DNA

    * gsnap.c, stage3hr.c, stage3hr.h: Added printing of SNP information

    * stage1hr.c: Making correct decision on when to find splice ends for half
      introns

    * gmap.c: Added indexdb_size_threshold. Made -9 flag do checking, but not
      print full diagnostics.

    * dynprog.c: Made gap penalties at ends the same as for middle

    * diag.c: Added debugging statement

    * Makefile.gsnaptoo.am: Included snpindex

2009-07-29 twu

    * README: Added information about providing information to GSNAP about known
      splice sites and SNPs

    * gsnap.c: Increased default max_middle_insertions from 6 to 9

    * stage1hr.c: Checking entire query for nsnpdiffs when snp_blocks is present

2009-07-07 twu

    * stage1.c: In find_best_path, calculating median of segments and requiring
      that medians ascend or descend. Adjusting scores for overlaps between
      segments.

2009-07-01 twu

    * gsnap.c, stage1hr.c, stage1hr.h: Added separate probability thresholds for
      local and distant splicing

    * stage1hr.c, stage1hr.h: Re-implemented paired-end alignment

2009-06-29 twu

    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Introduced a
      penalty for distant splicing

    * gsnap.c: Introduced masktype

    * stage3hr.c, stage3hr.h: Introduced chimera_prob field for Stage3_T object.
      Increased distantsplicing penalty from 1 to 2.

    * stage1hr.c: For novel splice sites in local splicing, requiring canonical
      dinucleotides plus sufficient probability score in either donor or
      acceptor. Fixed bug in recognizing plus antisense splicing. Restricted
      half introns to known splice sites.

    * stage1hr.h: Added masktype

    * stage1hr.c: Added masktype. Added back half introns.

2009-06-28 twu

    * stage1hr.c: Merged finding of distant splice pairs using known or novel
      splice sites.

    * stage1hr.c: Consolidated finding of splice pairs using known or novel
      splice sites. Making a single call to retrieve genomic segment for local
      splicing.

2009-06-26 twu

    * stage1hr.c: Removed unused tournament and middle_indel_p code

    * stage1hr.c: Using floors for novel distant splicing

    * stage1hr.c: Using floors for finding novel local splicing

    * stage3hr.h: Added field for paired_up

    * stage3hr.c: Added field for paired_up. Removing duplicates that differ in
      indel gap length.

2009-06-25 twu

    * stage1hr.h: Implemented paired-end alignment

    * stage1hr.c: Implemented paired-end alignment. Fixed computation of floors
      for middle indels. Fixed computation of find_segments_all for novel
      splicing.

2009-06-18 twu

    * Makefile.dna.am, Makefile.gsnaptoo.am: Removed segmentpool files from
      Makefile.am

    * gsnap.c, segmentpool.c, segmentpool.h, stage1hr.c, stage1hr.h: Removed all
      references to segmentpool

    * stage1hr.c: Removed code for Segmentpool_T

    * segmentpool.c, segmentpool.h: Removed some fields

    * stage1hr.c: Fixed bug resulting from resetting nmismatches_all
      unnecessarily

    * stage3.c: Fixed bug in computing goodness_rev

2009-06-09 twu

    * dynprog.c: Increased penalties for mismatch at ends and for opening indels
      around introns.

    * gsnap.c: Increased default definition of shortsplicedist from 200000 to
      500000.

    * stage1hr.c: Fixed bugs in dealing with empty plus_segments or
      minus_segments.

2009-06-08 twu

    * stage1hr.c: Created variables for faster retrieval of floor scores.
      Reporting half introns only if both local and distant known splicing fail.

2009-06-07 twu

    * stage1hr.c: Fixed computation of floors, floor_xfirst, and floor_xlast

    * stage1.c: Fixed bug in debugging variables

    * stage1hr.c: Implemented code that does not use Segmentpool_T object

    * pair.c: Treating ambiguous characters as mismatches for computing
      matchscores

    * gmap.c: Providing an option for splicesites output. Allowing use of a SNP
      genome version.

    * dynprog.c: Giving ambiguous characters a negative score in GMAP, but not
      PMAP

    * stage3.c: Computing direction using just canonical introns and indel
      openings

    * stage1hr.c: Removed middle_indel_p field in Segment_T object

2009-06-06 twu

    * segmentpool.c, segmentpool.h, stage1hr.c: Removed floor from the
      Segmentpool_T object

    * segmentpool.c, segmentpool.h: Added a procedure for pushing without any
      floors

    * stage3hr.c: Added a penalty for distant splicing

    * stage1hr.c: Performing local and distant splicing in separate levels

2009-06-05 twu

    * pair.c, pair.h, stage3.h: Added a function for printing splicesites

    * stage3.c: Reinstated checking of goodness to determine direction, but now
      considering just canonical introns and indels. Added a final call to
      assign_gap_types after trimming.

2009-06-04 twu

    * stage1hr.c: Small improvements to code for binary_search and dual_search

    * stage1hr.c: Made improvements in code for dual_search by removing lowi and
      lowj and updating pointers for positions1 and positions2.

2009-06-03 twu

    * stage1hr.c: Using single splicesites list for novel local splicing

    * stage1hr.c: Implemented slightly more efficient code for dual_search.

    * gsnap.c, stage1hr.c, stage1hr.h: Improved speed of finding known
      splicesites by doing one dual_search for all splice types.

    * stage1hr.c: Revised dual_search procedure to handle overlapping
      splicesites and positions correctly.

2009-06-02 twu

    * genome_hr.c: Fixed bug in handling pos5 for Genome_mismatches_left.

    * stage1hr.c: Fixed dual_search so it handles all overlapping splicesites
      and intervals

    * stage1hr.c: Implemented faster code for finding novel splice ends

    * stage1hr.c: Changed Floors_T to be just a single set of scores. Using
      floors to prune known splice ends.

2009-06-01 twu

    * stage3hr.c: Removed genomiclength from Stage3_T object. Using chrnum of 0
      to indicate distant splicing.

    * stage1hr.c: Removed oldindels code

2009-05-29 twu

    * stage1hr.c: Using compressed nucleotide-level alignment for end indels.
      Preventing firstbound and lastbound from going past read boundaries.

    * stage1hr.c: Fixed assignment of shortdistancep in splice pairs. Fixed
      assignments of prior penalties.

2009-05-25 twu

    * stage3hr.c: Eliminating duplicates where genomicstart and genomicend are
      equal

    * stage1hr.c: Excluding splice positions at ends in novel local splicing

    * genome_hr.c: Fixed bug in mismatches_right_snps where startdiscard was not
      being applied

2009-05-24 twu

    * stage3hr.h: Added genomicend to Stage3_T object

    * stage3hr.c: Made removal of duplicates work for splicing by adding
      genomicend to Stage3_T object. Made removal of duplicate splice ends
      faster.

    * stage1hr.c: Eliminating splice ends where splice position occurs too close
      to the beginning. Fixed triage to treat novel splicing and known splicing
      equally, and to give preferences to substitutions over indels over
      splicing when hits are found at each type.

    * gsnap.c: Passing only one splice prob to stage 1

    * stage3hr.c: Looking up splicesites iit only if site was known

    * stage1hr.h: Passing only one splice prob.

    * stage1hr.c: Implemented finding of novel distant splice pairs. Fixed bug
      in dual_search.
18375 183762009-05-23 twu 18377 18378 * stage1hr.c: Implemented novel local splicing, which includes known splice 18379 sites 18380 183812009-05-22 twu 18382 18383 * indexdb_hr.c, indexdb_hr.h, spanningelt.c, stage1hr.c: Implemented gallop 18384 search 18385 18386 * stage1hr.c, stage1hr.h: Implemented a new flow through the different 18387 algorithms. 18388 18389 * gsnap.c: Made shortsplicedist an unsigned int. Changed name of spliceprob 18390 to minspliceprob. 18391 18392 * stage3hr.c: Using total number of mismatches to score spliced reads 18393 18394 * stage1hr.c: Implemented dual intersection method for finding known splice 18395 sites 18396 18397 * genome_hr.c, genome_hr.h: Added substring parameters to 18398 Genome_count_mismatches_limit 18399 184002009-05-20 twu 18401 18402 * stage1hr.h: Changed name from spliceprob to minspliceprob 18403 18404 * stage1hr.c: Moved mismatches, indels, and splicing into a single 18405 procedure. Using firstbound and lastbound for half introns. Added 18406 provisions for a stretch procedure which uses a spanning set with 18407 nrequired = 1. 18408 18409 * stage2.c: Modified debugging output 18410 18411 * oligoindex.c: Removed unnecessary clearing step 18412 18413 * gmap.c: Fixed bug where program failed to clear oligoindices after a poor 18414 or repetitive sequence. 18415 184162009-05-17 twu 18417 18418 * genome_hr.c, gsnap.c: Enabled methylation mode on snp databases 18419 18420 * indexdb.c: Improved pre-loading messages for snp databases 18421 18422 * snpindex.c: Changed naming convention for snp databases 18423 18424 * stage1hr.c: Fixed bug in calling new substitution with fixed value for 18425 cmetp. 18426 18427 * cmetindex.c: Made the program work for snp databases 18428 18429 * stage3hr.c, stage3hr.h: Stage3 now handles all marking of mismatches. 18430 Implemented marking of methylation for indels. 
18431 18432 * stage1hr.c: Moved functions for counting and marking mismatches to 18433 stage3hr.c 18434 18435 * gsnap.c: Removed call to specify methylation printing 18436 18437 * genome.c, genome.h: Removed function for signaling methylation printing 18438 184392009-05-16 twu 18440 18441 * stage1hr.h: Providing second indexdb and size_threshold to procedures. 18442 18443 * stage1hr.c: Omitting frequent 12-mers in the middle and poly-AT 12-mers at 18444 the ends. Performing another round without omitting 12-mers if necessary. 18445 Added first implementation for handling methylation data. 18446 18447 * oligo.c, oligo.h: Changed definition of repetitive to mean only shifts of 18448 1, 2, or 3 nucleotides. Added procedure to mark frequent oligos, but not 18449 used. 18450 18451 * indexdb.c, indexdb.h: Added procedure to compute indexdb mean size. 18452 18453 * gsnap.c: Added flag to deal with methylation data. Passing size_threshold 18454 to stage 1 procedure. 18455 18456 * genome_hr.c, genome_hr.h: Added procedures to deal with methylation data. 18457 18458 * cmetindex.c: Creating two indexdb's, one for plus strand and one for minus 18459 strand. Moved conversion tables to cmet.c. Removed conversion of genome. 18460 184612009-05-14 twu 18462 18463 * cmet.c, cmet.h: Initial import into CVS 18464 184652009-05-13 twu 18466 18467 * cmetindex.c: Initial import into CVS. Implements reverse genome. 18468 18469 * iit-read.c: Added warning for use of IIT_string_from_position 18470 18471 * stage2.c: Fixed bug in using wrong indexsize for a given oligoindex 18472 18473 * oligoindex.c, oligoindex.h: Added a procedure to return indexsize 18474 18475 * get-genome.c: Fixed printing of coordinates 18476 18477 * stage1hr.c, stage1hr.h: Changed polyat to omitted. Created separate 18478 procedure to mark omitted, which omits repetitive oligos except at the 18479 ends, except for poly-AT at the ends. 
18480 18481 * stage3hr.c, stage3hr.h: Made removal of duplicates faster 18482 18483 * stage1hr.c, stage1hr.h: Added ability to handle methylation data. Using 18484 simplified version of Genome_fill_buffer that does not check chromosome 18485 bounds. 18486 18487 * snpindex.c: Changed name of SNP genome file from genome to genomecomp. 18488 18489 * genome.c, genome.h, gsnap.c: Added ability to handle methylation data 18490 184912009-05-11 twu 18492 18493 * gsnap.c: Made min_end_matches a user-adjustable parameter. Made 18494 maxchimerapaths the same as maxpaths. 18495 18496 * stage1hr.h: Made min_end_matches a user-adjustable parameter 18497 18498 * stage1hr.c: Fixed computation of end indels so it finds maximal length 18499 from end. Made min_end_matches a user-adjustable parameter. Removed 18500 allocation of polyat outside of 0..query_lastpos. 18501 185022009-05-10 twu 18503 18504 * stage1hr.c: Fixed bug in turning off end indels. Implemented faster 18505 method for computing firstbound and lastbound for xfirst and xlast 18506 computation. 18507 185082009-05-08 twu 18509 18510 * stage3hr.c: Fixed sorting so it uses hittype, and not indel separation 18511 18512 * oligo.c, oligo.h: Added function Oligo_mark_repetitive 18513 18514 * stage1hr.c: Fixed computations of xfirst and xlast in presence of 18515 repetitive oligos. Increased speed of computing nmismatches_long in end 18516 indels. 18517 18518 * genome_hr.c, genome_hr.h: Modified Genome_mismatches_left and 18519 Genome_mismatches_right to take pos5 and pos3 as arguments. 18520 185212009-05-07 twu 18522 18523 * stage1hr.c: Fixed floor formulas again. Generalized idea of polyat to 18524 mean all repetitive oligos. For middle indels, computing middle floor 18525 explicitly when polyat oligos are present. For end indels, computing 18526 firstbound and lastbound to handle cases with polyat oligos at the ends. 18527 Reordered compute_end_indels to start computing from 1 deletion.
18528 185292009-05-06 twu 18530 18531 * spanningelt.c, spanningelt.h, stage1hr.c: Removed unused code 18532 18533 * spanningelt.c, spanningelt.h: Added code for spanning set computation of 18534 end indels, but not used. 18535 18536 * gsnap.c, stage1hr.h: Added floors_array 18537 18538 * stage1hr.c: Made changes to floors: (1) Made floor_middle formula handle 18539 middle indels, (2) Created Floors_T object to precompute floors and handle 18540 polyat oligomers. Fixed computation of end indels, now can handle 18541 mismatches. Fixed bug in computing max_indel_sep. Added code for 18542 possible fast computation of end indels, but not used. 18543 185442009-05-05 twu 18545 18546 * stage1hr.c: Replaced arithmetic expression for smallesti with if statement. 18547 18548 * stage3hr.c: Changed sorting order so non-indel alignments rank higher than 18549 indel alignments. 18550 18551 * stage1hr.c: Computing floors between two segments in finding middle indels 18552 185532009-05-04 twu 18554 18555 * stage1hr.c: Added limit to number of middle indels found 18556 185572009-05-01 twu 18558 18559 * stage1hr.c: Implemented incremental execution of fast mismatch algorithm. 18560 18561 * stage1hr.c: Fixed setting of indel separation in triage. Prevented 18562 boostpos from being a compoundpos. Turned off tournament tree, since it 18563 was causing a crash. 18564 18565 * spanningelt.c: Fixed bug occurring when all spanningelts have no positions 18566 185672009-04-30 twu 18568 18569 * stage1hr.c: Fixed bug with setting mismatch levels for suboptimal results. 18570 Solving all multiple_mm solutions in a single run. 18571 18572 * stage3hr.c: Fixed bugs in Stage3_remove_duplicates 18573 18574 * stage3hr.h: Added function for counting number of optimal hits. 18575 18576 * stage3hr.c: Fixed problem with undesired removal of second splice site 18577 within a given genomic region. Added function for counting number of 18578 optimal hits. 
18579 18580 * stage1hr.c: Removed code for REFINE_MISSES. Added back missing statement 18581 setting duplicates_possible_p. 18582 18583 * gsnap.c: Removed minlevel and maxlevel and replaced with suboptimal 18584 mismatches. Made sort always happen. Changed flag name from invertp to 18585 circular-output. 18586 18587 * stage1hr.c, stage3hr.c, stage3hr.h: Using score in sorting procedure. 18588 Added provision for minlevel in Stage3_optimal_score. 18589 18590 * stage1hr.c, stage1hr.h: Refined miss_querypos5 and miss_querypos3 18591 boundaries. Implemented triage for minlevel_mismatches and 18592 maxlevel_mismatches instead of minlevel and maxlevel, and accounted for 18593 fast mismatch algorithm. 18594 185952009-04-27 twu 18596 18597 * gmapindex.c: Allowing contigs with a single nucleotide 18598 185992009-04-26 twu 18600 18601 * spanningelt.c, spanningelt.h: Removed boosterset idea and returning only 18602 the minscore within the spanningset. 18603 18604 * stage1hr.c: Removed all recursion from identify_multimiss_iter. Added 18605 feature to modify spanningset list and update a counter when a spanningelt 18606 is empty. Removed boosterset idea and restored a single boostpos. 18607 Correctly implemented fast multimiss algorithm. 18608 18609 * spanningelt.c, spanningelt.h, stage1hr.c: Created separate scores for 18610 candidate generation and for pruning. Implemented idea of a boosterset, 18611 instead of a single boostpos, but seems to be slower. 18612 186132009-04-25 twu 18614 18615 * Makefile.dna.am, Makefile.gsnaptoo.am, indexdb_hr.c, indexdb_hr.h, 18616 spanningelt.c, spanningelt.h, stage1hr.c: Created formal Spanningelt_T 18617 object and rewrote algorithms to use it. 18618 186192009-04-24 twu 18620 18621 * stage1hr.c: Preliminary implementation of a multimiss algorithm 18622 generalized from the onemiss algorithm. 
18623 186242009-04-23 twu 18625 18626 * stage3hr.c: Fixed bug that resulted in duplicate outputs 18627 18628 * stage1hr.c: Reverting to previous version 18629 18630 * stage1hr.c: Implemented vertical solution for find_segments_multiple_mm, 18631 which handles each querypos one at a time, but result is much slower. 18632 18633 * stage1hr.c: Removed code from before DELAY_READING. Implemented 18634 tournament trees, which require slightly fewer instructions than heaps. 18635 186362009-04-22 twu 18637 18638 * Makefile.gmaponly.am, Makefile.pmaptoo.am: Included changepoint files. 18639 18640 * Makefile.dna.am: Included changepoint files. Included indexdb_dump 18641 program. 18642 18643 * bigendian.c: Added code to output files in bigendian format 18644 18645 * stage3.c, stage3.h: Parameterized TRIM_END_PVALUE. Fixed map feature of 18646 GMAP for new IIT format. 18647 18648 * stage3hr.c: Loosened criteria for duplicate hits that was eliminating 18649 overlapping matches with the same number of mismatches. 18650 18651 * indexdb_hr.c, indexdb_hr.h, stage1hr.c: Using binary_threshold instead of 18652 parent_ndiagonals 18653 18654 * indexdb.c: Fixed bug for bigendian machines 18655 18656 * iitdef.h: Changed type of divsort to be int, apparently for compiler 18657 warnings? 18658 18659 * iit-read.c, iit-read.h: Implemented algorithm for map feature of GMAP to 18660 use new IIT format 18661 18662 * gmap.c: Using divint crosstable for map feature. 18663 186642009-04-21 twu 18665 18666 * stage1hr.c: Implemented a delay in converting positions to diagonals until 18667 needed. 18668 186692009-04-08 twu 18670 18671 * stage3.c: Using number of matches to trim ends, not total length 18672 186732009-04-04 twu 18674 18675 * indexdb_hr.c, stage1hr.c: Made exact and sub-1 code work correctly on 18676 bigendian machines without having to copy memory. 
18677 186782009-04-03 twu 18679 18680 * gmap.c, stage3.c, stage3.h: Modified -H flag to let user control 18681 minendexon length 18682 18683 * stage1hr.c: Skipping computation on poly-A or poly-T sequences 18684 186852009-04-02 twu 18686 18687 * stage3.c: Added check for null pairs before generating matchscores 18688 18689 * pair.c, pair.h: Revised procedures for producing matchscores 18690 18691 * gsnap.c: Adding number of paths to output. Removed check for inplace 18692 being possible. 18693 18694 * snpindex.c: Added comment 18695 18696 * indexdb.c, indexdb.h: Removed procedures that create sentinels. 18697 18698 * indexdb_hr.c: Not using sentinels. For compoundpos, always reading in 18699 place and converting to bigendian when needed. 18700 18701 * stage1hr.c, stage1hr.h: Not using sentinels. For exact and sub:1, always 18702 reading in place and converting to bigendian when needed. 18703 18704 * stage1.c: Added limits on number of gregions to speed up program 18705 18706 * stage3.c, stage3.h: Using changepoint algorithm and iterative trimming of 18707 ends to improve ends of alignment. 18708 18709 * indexdb.c, indexdb.h: Added hooks for sentinel in indexdb files 18710 18711 * gregion.c, gregion.h: Rewrote support filtering to use either a fixed 18712 difference (for longer sequences) or a percentage difference (for shorter 18713 ones). For extending sequences, using querylength for adequate support or 18714 short support. 18715 18716 * gmap.c: Improved documentation for --help 18717 18718 * genome-write.c: Always print out number of bad characters 18719 18720 * changepoint.c, changepoint.h: Modified procedures to ignore -1 values in 18721 input 18722 187232009-03-31 twu 18724 18725 * stage1hr.c: Using heaps for nomiss and onemiss, but not for exact. 18726 Replacing NULL lists with sentinels when necessary. 18727 18728 * stage1.c: Implemented limit on number of gregions before finding unique 18729 ones, to prune nonspecific, slow sequences. 
18730 187312009-03-26 twu 18732 18733 * stage1hr.c: Using pointers and relying on sentinels to advance through 18734 lists. 18735 18736 * stage1hr.c: Removed old code 18737 18738 * stage1hr.c: Changed sub:1 recursive procedures to iterative 18739 18740 * stage1hr.c: Reverting to previous version 18741 18742 * stage1hr.c: Attempt to use information from third and later lists to 18743 advance first list 18744 18745 * stage1hr.c: Using results of second list to speed up intersection 18746 187472009-03-25 twu 18748 18749 * stage1hr.c: Implemented a faster version of performing intersection for 18750 exact matches. 18751 18752 * stage1hr.c: Starting exact matches with intersection of first two lists. 18753 18754 * stage1hr.c: Made non-heap compoundpos_find procedure the default 18755 18756 * indexdb_hr.c, indexdb_hr.h: Implemented a non-heap version of searching 18757 through a union of positions. 18758 18759 * indexdb_hr.c: Added hooks for handling indexdb with sentinels. Removed 18760 unused code. 18761 18762 * stage1hr.c: Implemented iterative version of find_exact_aux. Depending on 18763 sentinel to increase speed of loop. 18764 187652009-03-24 twu 18766 18767 * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Fixed printing in psl format 18768 when user provides a segment 18769 18770 * compress.c, compress.h, genome-write.c: Fixed reporting of non-ACGTNX 18771 characters 18772 18773 * gmap.c, stage3.c, stage3.h: Removed -k flag and trimexonpct parameter 18774 18775 * iit_fetch.c: Added ability to compute ratios from pdl files 18776 18777 * iit-read.c, iit-read.h: Provided procedures for dumping version 1 IITs 18778 18779 * get-genome.c: Fixed dump output of contigs 18780 18781 * compress.c: Reduced number of error messages for non-ACGTNX characters 18782 18783 * stage3.c: No longer using non-canonical introns for trimming end exons, 18784 only binomial test. Fixed bad behavior for theta values close to 1.0. Not 18785 reporting alignments with fewer than 20 matches. 
18786 187872009-03-19 twu 18788 18789 * iit_fetch.c: Added hook for splices iits 18790 18791 * plotgenes.c, plotgenes.h: Added function for handling splices. Made sure 18792 nbins > 0. 18793 187942009-03-18 twu 18795 18796 * plotgenes.c, plotgenes.h: Added function Plotgenes_fetch_points. Renamed 18797 some functions. 18798 18799 * gmap_setup.pl.in: Gives files on command line to fa_coords and 18800 gmap_process, which can then add linefeed if necessary to lines. 18801 18802 * gmap_process.pl.in: Allowed program to read from either stdin or files on 18803 command line. In the latter case, it adds linefeed if necessary to lines. 18804 18805 * fa_coords.pl.in: Re-indented program. Allowed program to read from either 18806 stdin or files on command line. 18807 18808 * iit_fetch.c: Implemented handling of PDL files. Added -s flag to specify 18809 sample number. 18810 18811 * stage3.c: Trimming using a binomial test on end exons 18812 18813 * gdiag.c, genome.c, genome.h, get-genome.c, gsnap-to-iit.c, gsnap.c, 18814 indexdb.c, indexdb.h, oligo-count.c, snpindex.c: Modified programs to take 18815 a snp_root argument to -V 18816 18817 * gmap.c: Added "Processed" message to stderr at end of batch run 18818 18819 * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am, 18820 Makefile.pmaptoo.am: Updated Makefiles 18821 18822 * pair.c, pair.h: Added function Pair_fracidentity_simple 18823 18824 * pbinom.c, pbinom.h: Initial import into CVS 18825 188262009-03-17 twu 18827 18828 * gsnap.c: Considering too many paths as a failure type for the --nofails 18829 and --failsonly flags 18830 188312009-03-10 twu 18832 18833 * stage1.c: Added variable needed in debugging 18834 18835 * plotgenes.c, plotgenes.h: Renamed functions relating to fetching 18836 18837 * iit_fetch.c: Removed code relating to printing 18838 18839 * iit-read.c: Fixed issues for bigendian machines and for fileio access 18840 18841 * datadir.c, datadir.h, gmap.c: Added extra functionality to show available 
18842 databases and map files 18843 18844 * iit_fetch.c: Initial import into CVS. Copied from iit_plot.c 18845 18846 * blackboard.c, changepoint.c, compress.c, diag.c, dynprog.c, 18847 genome-write.c, genome.c, get-genome.c, gmap.c, gmapindex.c, gregion.c, 18848 iit-read.c, iit-write.c, indexdb.c, indexdb_hr.c, intpool.c, oligo.c, 18849 oligoindex.c, oligoindex.h, pair.c, reader.c, sequence.c, smooth.c, 18850 stage1.c, stage2.c, stage3.c, translation.c: Removed unused variables 18851 based on SGI compiler warnings 18852 188532009-03-09 twu 18854 18855 * iit-read.c: Fixed reading of labels and annotations using fileio on 18856 bigendian machines. 18857 18858 * bigendian.c, bigendian.h: Added command for reading uint using fileio 18859 18860 * stage3.c: Fixed bug where gap was left at 3' end before extending the end. 18861 18862 * stage1hr.c: Fixed counting of mismatches on revcomp sequences 18863 18864 * indexdb_hr.c: Made heapsize an int, rather than unsigned int, in all 18865 procedures 18866 18867 * indexdb.c: Moved variable declarations above debugging statement to 18868 satisfy SGI compiler. 18869 18870 * gsnap.c: Changed variable names involving -q flag to "part_" 18871 18872 * get-genome.c, iit_get.c: Printing divstring only if ndivs > 1 18873 18874 * iit-read.h: Added ability to retrieve divstring from universal coordinates. 18875 18876 * iit-read.c: Added ability to retrieve divstring from universal 18877 coordinates. Fixed bug occurring during reading annotations using fileio. 18878 18879 * iit-write.c: Fixed bug relating to use of null annotlist to indicate 18880 altstrain IIT. 18881 188822009-03-06 twu 18883 18884 * pair.c, pair.h, stage3.c: Fixed bug where -P and -Q flags were printing 18885 C-to-N on antisense cDNAs.
18886 188872009-02-20 twu 18888 18889 * gmap_setup.pl.in: Added FILE_ENDINGS variable 18890 18891 * get-genome.c: Removed debugging flag 18892 18893 * sequence.c: Improved handling of blank lines 18894 18895 * indexdb.c, snpindex.c: Changed offsets and positions to occur at modulo 0 18896 intervals based on chrpos, not universal position. 18897 18898 * iit_get.c: Added feature to print results from tally types of IITs 18899 18900 * iit-read.c: Fixed bug in re-fetching matches based on index 18901 18902 * gsnap.c: Cleared flags for batch and novelsplicing 18903 18904 * types.h: Removed an extraneous open brace 18905 18906 * stage3hr.c, stage3hr.h: Calculating splice site model score at print time 18907 for all splice sites. 18908 18909 * stage1hr.c: Providing sense information for donor and acceptor substrings. 18910 Testing if duplicate matches are possible due to minlevel and eliminating 18911 them. 18912 189132009-02-05 twu 18914 18915 * stage3hr.c, stage3hr.h: Distinguishing between splice objects that require 18916 copying of substrings and those that do not. 18917 18918 * stage1hr.c: Incorporated splice site probabilities into finding splice 18919 sites by distance. 18920 18921 * stage1hr.c: Initial implementation of finding splice pairs by distance 18922 18923 * gsnap.c: Retrieving and printing known splicesite information at print 18924 time. 18925 18926 * iit-read.c, iit-read.h: Added function IIT_get_typed_with_divno 18927 18928 * stage3hr.c, stage3hr.h: Providing chroffset information to Substring_T 18929 object. Retrieving and printing known splicesite information at print 18930 time. 18931 18932 * stage1hr.c: Providing chroffset information to Substring_T object 18933 189342009-02-04 twu 18935 18936 * stage1hr.c, stage1hr.h, gsnap.c: Implemented faster method for applying 18937 known splice sites 18938 18939 * stage1hr.c, stage1hr.h: Fixed bug when allvalidp5 or allvalidp3 is false 18940 in paired end reads. Implemented Stage1_retrieve_splicesites. 
18941 18942 * resulthr.c, resulthr.h: Renamed paired_translocation as paired_as_singles 18943 18944 * iit-read.c: Fixed bug in evaluating nexactmatches 18945 18946 * get-genome.c: Made retrieval of map information work with universal 18947 coordinates. 18948 18949 * stage3hr.c: Fixed bug where second read of paired_as_singles was not being 18950 printed 18951 18952 * gsnap.c: Renamed paired_translocation to paired_as_singles 18953 189542009-02-03 twu 18955 18956 * genome_hr.c: Now requiring query to match either reference or alternate 18957 allele at SNPs. Otherwise, it counts as a mismatch. 18958 189592009-02-02 twu 18960 18961 * get-genome.c: Added feature to print snp information 18962 18963 * gdiag.c, gsnap-to-iit.c, iit_plot.c: Using new interface to Genome_new 18964 18965 * stage3hr.c: Eliminating novel splice site when it duplicates a known 18966 splice site 18967 18968 * stage1hr.c: Fixed bug where plus multiple mismatches are dropped when 18969 minus batches are all empty. Added flexibility to floor_xfirst and 18970 floor_xlast to allow for indels adjacent to end 12-mers. Changed 18971 condition endpoint for indels at ends. Using genome_hr procedure to count 18972 mismatches for indels. Finding all shortdistance splices with known 18973 splice sites first, before finding novel splice sites. 18974 18975 * snpindex.c: Writing revised version of genome. Skipping cases where snp 18976 type is inconsistent with reference genome. Taking snps_iit as a 18977 command-line argument. 18978 18979 * sequence.c, sequence.h: Added Sequence_print_two function for snps 18980 18981 * gsnap.c, genome_hr.c, genome_hr.h, stage1hr.c, stage1hr.h: Using genomealt 18982 instead of snps_iit 18983 18984 * gmap.c: Switched -v and -V flags. Using new interface to Genome_new. 18985 18986 * genome_hr.c: Corrected calculation of mismatches_right. Corrected offset 18987 and now subtracting number of leading zeroes. Using clz_table instead of 18988 log_table. Improved debugging statements. 
18989 18990 * genome.c, genome.h: Added ability to read Genome_T object as alternate or 18991 snp only versions. 18992 189932009-02-01 twu 18994 18995 * gdiag.c, oligo-count.c: Using new interface to Indexdb_new_genome 18996 18997 * gmapindex.c, pmapindex.c: Removed altstrain code 18998 18999 * indexdb.c, indexdb.h: Storing positions only at 0 mod 3. 19000 19001 * gsnap.c: Made batch loading the default for multiple input sequences. 19002 Testing for both new names and old names of reference offsets and 19003 positions files. Made flag for splicing refer to novel splicing. 19004 19005 * gmap.c: Made batch loading the default for multiple input sequences. 19006 Testing for both new names and old names of reference offsets and 19007 positions files. 19008 19009 * get-genome.c: Added debugging statement 19010 19011 * genome_hr.c: Fixed retrieval of intervals from snps_iit file, which are 19012 1-based. 19013 19014 * stage1hr.c, stage1hr.h: Implemented snp differences for splicing and exact 19015 matches. Allowing identification of novel splicing in addition to known 19016 splice sites. 19017 19018 * stage3hr.c: Implemented reporting of snp differences for indels. 19019 19020 * snpindex.c: Fixed bug in using chromosomal position instead of universal 19021 position. Using parameterized suffix for reference offsets and positions 19022 files. 19023 190242009-01-30 twu 19025 19026 * snpindex.c: Initial import into CVS 19027 190282009-01-28 twu 19029 19030 * genome_hr.c: In Genome_mismatches_left and Genome_mismatches_right, adding 19031 a 'sentinel' mismatch position to list if haven't reached max_mismatches. 19032 19033 * stage1hr.c: Implemented tolerance and reporting of snp differences for 19034 novel splice sites. 19035 19036 * stage3hr.c, stage3hr.h: Added ability to print number of SNP differences 19037 separately. This feature not yet implemented for indels though.
19038 19039 * stage1hr.c, stage1hr.h: Added ability to tolerate known SNPs and count 19040 differences at those sites separately. This feature not yet implemented 19041 for novel splices though. 19042 19043 * segmentpool.c, segmentpool.h: Changed name of chrlow to chroffset 19044 19045 * iit-read.c: Fixed computation of divint_crosstable 19046 19047 * genome_hr.c, genome_hr.h, gsnap.c: Added ability to tolerate known SNPs 19048 and count differences at those sites separately. 19049 190502009-01-27 twu 19051 19052 * gsnap-to-iit.c: Increased buffer sizes 19053 19054 * get-genome.c: Made get-genome work on map files 19055 19056 * plotgenes.c: Edited comments 19057 19058 * gsnap.c: Provided user with the ability to set parameters for size of 19059 middle and end insertions and deletions. 19060 190612009-01-22 twu 19062 19063 * stage1hr.c: Added steps to remove duplicate paired-end results 19064 19065 * stage1hr.c: Fixed memory leak 19066 19067 * gsnap.c: Increased default maxpaths from 20 to 100. Added -e flag to 19068 --help output. Providing max_mismatches parameter to paired-end procedure. 19069 19070 * stage3hr.h: Added function Stage3_remove_old. 19071 19072 * stage3hr.c: Revised definition of paired-end length to go from 19073 beginning-of-read to beginning-of-read. Added function Stage3_remove_old. 19074 19075 * stage1hr.c, stage1hr.h: Passing max_mismatches parameter for paired reads 19076 190772009-01-21 twu 19078 19079 * indexdb_hr.c: Fixed bug where heapify was performed after binary_search 19080 used up all available diagonals in a batch. 19081 19082 * gmap_setup.pl.in: Fixed bug in escaping a variable 19083 19084 * pagesize.m4: Made comment line clearer 19085 19086 * builtin.m4: Initial import into CVS. 19087 19088 * Makefile.am: Added gmap_reassemble 19089 19090 * fa_coords.pl.in: Made -S flag the default. Added -C flag to look 19091 explicitly for chromosomal information.
19092 19093 * md_coords.pl.in: Added check for unmapped contigs 19094 19095 * gmap_setup.pl.in: Made -S flag the default behavior. Added -C and -O 19096 flags. Added clean procedure when making coords.genome. 19097 19098 * stage3hr.c, stage3hr.h: Including label as part of Substring_T 19099 19100 * stage1hr.h: Added procedures for finding splices against known splicesites 19101 iit. 19102 19103 * stage1hr.c: Added procedures for finding splices against known splicesites 19104 iit. Corrected computation of distances on inversions. 19105 19106 * indexdb.c: Cleaned code to ensure gsnap finds the right offsets file 19107 19108 * iit-write.c: Added check for a null typestring 19109 19110 * iit_dump.c: Fixed debugging output 19111 19112 * iit-read.h: Added procedures to search based on divint and to get a 19113 crosstable of divints. 19114 19115 * iit-read.c: Fixed IIT_debug. Added procedures to search based on divint 19116 and to get a crosstable of divints. 19117 19118 * gsnap.c: Added flags for maxmismatches, splicing penalties, and splicing 19119 iit. Added flags for failsonly and nofails in output. 19120 19121 * gregion.c: Added abort if genomicend < genomicstart 19122 19123 * gmapindex.c: Eliminated reading of strain information and assignment to 19124 contigtypelist. Increased size of chrpos string from 100 to 8192. 19125 19126 * gmap.c: Reformatted output for --help 19127 191282009-01-15 twu 19129 19130 * stage1hr.c: Created user-specified parameters for splicing probabilities 19131 and length. 19132 19133 * stage1hr.c: Fixed bug in printing coordinates of splicing results on minus 19134 strand 19135 191362009-01-14 twu 19137 19138 * gsnap.c: Added flags for excluding failed alignments, or limiting to those 19139 191402009-01-13 twu 19141 19142 * stage1hr.c: Added additional check to prevent straddling across chromosomes 19143 19144 * indexdb_hr.c, indexdb_hr.h, stage1hr.c: Implemented more efficient way of 19145 ignoring extensions past beginning of genome.
19146 191472009-01-07 twu 19148 19149 * stage1hr.c: Hacks put in to exclude diagonals that are less than 19150 querylength 19151 19152 * stage1hr.c: Fixed issues with wrong indel_pos chosen in middle insertions, 19153 and not checking up to specified number of indels. 19154 191552008-12-24 twu 19156 19157 * indexdbdef.h: Reverted to version 1.2 19158 19159 * indexdb.c: Reverted to version 1.121 19160 19161 * indexdb.c, indexdbdef.h: Attempt to use a compressed indexdb file 19162 191632008-12-22 twu 19164 19165 * stage1hr.c: Reading floor 2 during find_segments_multiple_mm. Returning 19166 min_mismatches_seen from find_onemiss_matches. 19167 191682008-12-21 twu 19169 19170 * indexdb_hr.c: Put check for size of compoundpos outside of loop. 19171 19172 * stage1hr.c: Added bounds on location of mismatch in onemiss search 19173 19174 * stage1hr.c: Reverted back to version 1.106 that has hanging compoundpos 19175 positions for exact and onemiss matches. 19176 19177 * stage1hr.c: Version of stage 1 with hooks for disallowing compoundpos 19178 positions that hang over ends. However, this appears to add 40% to number 19179 of instructions. 19180 191812008-12-20 twu 19182 19183 * stage1hr.c: Setting pointers->compoundpos to NULL after it becomes empty, 19184 to prevent further computation on it. 19185 19186 * indexdb_hr.c, indexdb_hr.h: Added function Compoundpos_intersect 19187 19188 * stage1hr.c: Eliminated compoundpos positions in creating segments from 19189 multiple mismatches. Delayed sorting of segments until needed for middle 19190 insertions and deletions. Setting floor to zero in cases where poly-AT is 19191 present. For end indels, computing oligomer_start and oligomer_end based 19192 on results of actual mismatches found. 
19193 19194 * stage1hr.c: Fixed bug in debugging statement 19195 19196 * gmapindex.c: Added more compiler checks to hide alternate strain code 19197 19198 * Makefile.util.am: Added program all-orfs 19199 19200 * Makefile.dna.am, Makefile.gsnaptoo.am: Removed chrsubset.c from gsnap 19201 sources 19202 192032008-12-18 twu 19204 19205 * stage3hr.c, stage3hr.h: Using left instead of genomicpos5 for creating 19206 Stage3_T objects. 19207 19208 * stage1hr.c: Using left instead of genomicpos5 for creating Stage3_T 19209 objects. Consolidated code for plus and minus segments. Changed parameter 19210 list for find_segments_multiple_mm to prepare for finding hits within that 19211 procedure. 19212 19213 * stage1hr.c: Clarified calculations of floors 19214 192152008-12-17 twu 19216 19217 * stage1hr.c: Using init and search routines for Compoundpos_T objects 19218 19219 * separator.h: Changed coordinate separator from "--" to ".." 19220 19221 * intlist.c, intlist.h: Added function Intlist_sort_ascending() 19222 19223 * indexdb_hr.c, indexdb_hr.h: Implemented init and search routines for 19224 Compoundpos_T objects 19225 19226 * iit-write.c, iit_store.c: Added monitoring output 19227 19228 * stage1hr.c: Clarified processing of pointers in search for onemiss matches. 19229 192302008-12-16 twu 19231 19232 * stage1hr.c: Allowing spanning 12-mers for exact and onemiss searches to go 19233 in either forward or reverse direction, and picking the optimal direction. 19234 19235 * stage1hr.c: Allowing compoundpos positions to be used for boosting, by 19236 merging them during search for exact matches. 19237 19238 * stage1.c: Fixed a bug where the querypos of sentinel was set incorrectly. 19239 Now using querylength, not -1. Added a check to prevent gregions with 19240 negative values for genomicstart. 19241 19242 * indexdb_hr.c, indexdb_hr.h: Removed partner_diagonals from Compoundpos_T 19243 object and removed reduce function. 

	* stage1hr.c: Fixed potential problem with sentinel. The querypos part of
	sentinel now set to querylength, not -1, to guarantee it stops the loop.
	Introduced Pointers_T object to simplify exact and onemiss code.

2008-12-15 twu

	* stage1hr.c: Introduction of Compoundpos_T object for speeding up
	computation in exact and onemiss algorithms

	* intlist.c, intlist.h: Added Intlist_insert_second() function

	* indexdb_hr.c, indexdb_hr.h: Introduction of Compoundpos_T object and
	operations

2008-12-14 twu

	* intlist.c: Made cell_ascending() and cell_descending() static.

	* genome_hr.c, genome_hr.h: Created Genome_count_mismatches_limit. Also
	added code for a oneloop version of Genome_count_mismatches.

	* gsnap.c: Removed chrsubset feature

	* stage3hr.c, stage3hr.h: Clarified that Substring_new_donor and
	Substring_new_acceptor should receive forward query sequence.

	* stage1hr.c: Removed querylength from call to select_positions_for_exact().

	* stage1hr.h: Removed Chrsubset_T object.

	* stage1hr.c: Using new call to Genome_count_mismatches_limit. Replaced
	uses of queryseq with queryuc_ptr and queryrc. Introduced query_lastpos
	to replace multiple calculations of (querylength - INDEX1PART). Removed
	Chrsubset_T object.

	* stage1hr.c: Fixed onemiss algorithm so it handles short reads less than
	2*INDEX1PART in length. Changed occurrences of oligobase to INDEX1PART.
	Removed oligobase and querylength from Stage1_T object.

	* genome_hr.c, genome_hr.h: Made Genome_count_mismatches more efficient, by
	using pointers and stepping through query and genome blocks sequentially.

2008-12-13 twu

	* stage1hr.c: Implemented new method for identifying single mismatches,
	similar to that for finding exact matches.

2008-12-12 twu

	* result.h: Added a new failure type for short sequences

	* pmapindex.c: Changed default index1interval from 3 to 6

	* oligo.c: Added a comment.

	* oligo-count.c: Using revised interface to indexdb.c.

	* indexdb_hr.c, indexdb_hr.h: Removed addition of a diagterm for lookups
	involving a left or right shift.

	* indexdb.c, indexdb.h: Changed some function names. Added function to
	determine if inplace reading is possible. Added parameter to require
	sampling of 3 in indexdb.

	* gmap.c: Using revised interface to indexdb.c. Added check and message for
	sequences shorter than INDEX1PART.

	* gdiag.c: Using revised interface to indexdb.c

	* block.c: Using revised function names in indexdb.c

	* segmentpool.c, segmentpool.h: Removed break5 and break3 from Segmentpool_T
	object.

	* stage1hr.c: Removed break5 and break3 from Segmentpool_T object

	* stage1.c: Providing correct adjustment to diagonals for the minus strand,
	by adding the length of the oligomer.

	* stage3hr.c, stage3hr.h: Added function to compare Stage3_T objects by
	genomic location

	* gsnap.c: Added flag to sort results by genomic location

	* stage1hr.c: No longer saving segments during check for single mismatches.
	Checking and saving substitution hits within each heap merge.

	* gsnap.c: Added user flag for setting indel penalty

	* stage1hr.c, stage1hr.h: Made indel_penalty a parameter adjustable by the
	user.

2008-12-11 twu

	* stage1hr.c: Moved around code for special cases that prevent searching for
	end indels at beginning or end of the sequence.

	* gsnap.c: Added information about whether inplace reading of indexdb is
	possible. Added information to version command.

	* segmentpool.c, segmentpool.h: Added floor, floor_xfirst, and floor_xlast
	fields to the Segment_T object.

	* stage1hr.c: Removed separate lists of segments by floor. Keeping only a
	single list of plus segments and of minus segments, with floor information
	stored in the Segment_T object.

	* stage1hr.h: Providing information about whether inplace reading of
	diagonals is possible.

	* stage1hr.c: Delayed addition of diagterm until after search for exact
	matches. Allowing reads of diagonals from indexdb to be inplace when
	possible

2008-12-10 twu

	* types.h: Added check for size of unsigned long long as an 8-byte word

	* stage1hr.c: Using 64-bit words, if available, to speed up comparison of
	batches in heap merge.

	* stage1hr.c: Replaced calls to List_head, List_next, Intlist_head, and
	Intlist_next with primitives.

	* Makefile.pmaptoo.am: Revised source files

	* Makefile.gsnaptoo.am: Added gsnap-to-iit program. Added segmentpool.

	* Makefile.gmaponly.am: Added chrom.c to iit utilities

	* Makefile.dna.am: Removed test programs for Compress_T procedures

	* Makefile.dna.am: Added test programs for Compress_T procedures. Added
	segmentpool.

	* segmentpool.c, segmentpool.h: Initial import into CVS

	* gsnap.c, stage1hr.c, stage1hr.h: Added segmentpool

2008-12-09 twu

	* stage1hr.c: Enforcing diagonals to be within chromosomal bounds. Removed
	unused code.

	* genome.c: Fixed check for chromosome bounds

	* genome_hr.c, genome_hr.h: Removed checks for crossing of chromosome
	boundaries. Relying upon calling procedures to enforce this.

2008-12-08 twu

	* gsnap.c: Added flags for minlevel and maxlevel. Cleaned up unused flags.

	* genome_hr.c, genome_hr.h: Made functions return maximum number of
	mismatches if they cross a chromosome bound.

2008-12-05 twu

	* stage1hr.c: On end indels, checking to see if indel_pos is non-positive.
	Passing chromosome_iit to Genome_mismatches_left and
	Genome_mismatches_right.

	* gsnap.c: Added statement at end of batch processing to indicate number of
	queries processed.

2008-12-02 twu

	* subseq.c: Added U and l flags

	* stage1hr.c: For middle indels, putting indel at leftmost genomic position.
	Fixed filtering criteria for end indels. Counting mismatches on left
	and right to identify candidates for end indels.

	* stage1.c: Added debugging statements

	* gdiag.c, gsnap-to-iit.c: Using new interface to Genome_fill_buffer

	* gmap.c: Fixed genomiclength for user-provided genomic sequence. Stopped
	trimming of sequence.

	* diag.c: Loosened criteria for MAX_DIAGONALS and MIN_SCORE.

2008-11-25 twu

	* genome_hr.c: Made Genome_mismatches_left and Genome_mismatches_right fill
	mismatch_positions entries 0..max_mismatches.

	* stage1hr.c: Implemented new algorithms for middle insertions and
	deletions, using Genome_mismatches_left and Genome_mismatches_right.

	* stage1hr.c: Using new interface to Genome_count_mismatches

	* genome_hr.c, genome_hr.h: Added functions Genome_mismatches_left and
	Genome_mismatches_right. Added builtin bit-vector functions.

2008-11-24 twu

	* genome_hr.c, genome_hr.h: Allowed specification of pos5 and pos3 in
	Genome_count_mismatches

	* genome_hr.h: Using new Compress_T object.

	* genome_hr.c: Using new Compress_T object. Removed unused code.

	* stage1hr.c: Using new Compress_T object

	* stage3hr.c: Fixed memory leak

	* compress.c, compress.h: Introduced Compress_T object

2008-11-23 twu

	* genome_hr.c, genome_hr.h: Initial entry into CVS

	* gsnap.c: Added flag for handling circular-end reads

	* stage3hr.c, stage3hr.h: Making a single invertp work on paired-end and
	circular-end reads

	* stage1hr.c: Using direct comparison against compressed genome to count
	mismatches

	* genome.c, genome.h: Added function Genome_blocks. Made Genome_fill_buffer
	return nunknowns.

	* compress.c, compress.h: Added functions Compress_new and Compress_shift

2008-11-20 twu

	* sequence.c, sequence.h: For circular-end reads, keeping reverse complement
	of queryseq2, but swapping queryseq1 and queryseq2.

	* stage1hr.h: Placed local exon-exon mappings after multiple substitutions
	and indels in the hierarchy of levels.

	* stage2.c: Commented out unused procedure

	* stage3.c: Added debugging statement

	* stage3hr.c, stage3hr.h: Enabled printing of circular-end reads

2008-11-13 twu

	* stage1hr.c: Allowing mismatches with splicing

2008-11-11 twu

	* gsnap.c, stage1hr.h: Removed unused parameters

	* stage1hr.c: Using new single-end read algorithms for paired-end reads.
	Fixed problems with query sequences that contain non-ACGT characters.

	* stage3hr.c: Fixed problems with printing inverted sequence for paired-end
	reads

	* gmap_reassemble.pl.in: Initial entry into CVS

	* stage3hr.c: Fixed bug in Stage3_remove_duplicates

	* stage1hr.c: Revised splicing parameters. Fixed calculation of maxfloor.

2008-11-10 twu

	* stage3hr.c: Favoring substitutions over equivalent indels in removing
	repeats.

	* stage1hr.c: Treating max_middle_insertions and max_middle_deletions
	separately in solving middle indels. Removed scores from
	compute_end_indels. Not resetting min_mismatches after single_mm, because
	of effect of poly_at oligos.

	* gsnap.c, stage1hr.h: Allowing separate parameters for middle and end
	insertions and deletions.

	* stage3hr.c: Added check and warning messages if observed mismatches is
	different from the number expected

	* stage1hr.c: Introduced floor system for computing indels. Defined
	calculation of middle and end indels more clearly, with separate
	parameters for middle and end insertions and deletions.

2008-11-06 twu

	* stage1hr.c: Made fixes for splicing to work

	* indexdb_hr.h: Removed obsolete functions

	* indexdb_hr.c: Fixed masking of left shifts

	* indexdb.c: Commented out warning message for multiple index files

	* gsnap.c: Partially implemented minlevel and maxlevel controls for Stage1.
	Removed references to Stage3chimera_T objects

	* resulthr.c, resulthr.h: Removed references to Stage3chimera_T objects

	* stage3hr.h: Implemented new structure for Stage3 objects: single reads may
	have one or more substrings.

	* stage3hr.c: Implemented new structure for Stage3 objects: single reads may
	have one or more substrings. Modified print procedure for indels to allow
	for mismatches.

	* stage1hr.h: Removed chimerap variable since Stage3 single reads are all of
	the same type now.

	* stage1hr.c: Fixed polyat assessment at ends of query. Storing first and
	last diagonals and computing mismatches on both. Changed ptr->indels to
	be consistently positive for insertions and negative for deletions. Using
	new Stage3 objects. Solving middle indels with mismatches.
	Using minlevel and maxlevel to control computing behavior on different
	alignment types.

2008-11-03 twu

	* stage1hr.h: Made max_insertions and max_deletions parameters. Added
	minlevel and maxlevel.

	* stage1hr.c: Cleaned up procedures for single mismatches and multiple
	mismatches. Added oligobase to minus diagonals to prevent negative
	coordinates. Made max_insertions and max_deletions parameters. Added
	minlevel and maxlevel.

2008-10-28 twu

	* md_coords.pl.in: Revised instructions to user

2008-10-24 twu

	* stage3.c: In comparing paths_fwd and paths_rev, using just number of
	matches

	* stage2.c: Also performing stage2 if there is a sufficient value for
	ncoverage

	* stage1.c: Removed matchsize and matchinterval from Stage1_T object, and
	allowing option in scan_ends of iterating on different matchsizes. In
	removal of repeated oligomers, now also removing neighboring oligomers.
	Now filtering gregions by support.

	* reader.c: Added fields so Reader_reset_ends resets correctly

	* gregion.c, gregion.h: Added function to filter by support

	* gmap.c: Fixed error message for -z flag

	* diag.c, diag.h: Returning ncovered from Diag_update_coverage

	* block.c, block.h: Removed high-resolution option

2008-10-23 twu

	* stage3.c: Handling case where gap is at beginning of path. Trimming end
	exons until a canonical intron is reached.

	* stage1.c: Identifying repeated oligos at the outset

	* pair.c: Made counting of ambiguous matches more uniform

	* gsnap-to-iit.c: Added information about unique positions. Added ability
	to halt at a given position.

	* gregion.c: Modified print statement

2008-10-10 twu

	* pair.c: Made N's in query sequence align as mismatches in GMAP.

	* gmapindex.c: Removing "chr" from chrsubset file

	* stage1hr.c: Tightened criteria for finding exon-exon junctions. Not
	reading 10- or 11-mers at ends if the 12-mer is invalid.

	* stage3hr.c: Made macros for text constants

	* match.c, match.h: Added function Match_print

	* gsnap.c: Changed default of trim flag to be false

	* gmap.c: Added jobdiv capability to GMAP

2008-10-02 twu

	* blackboard.c: Simplified the fix for the hang for input done with no inputs

2008-10-01 twu

	* blackboard.c: Fixed hang that occurs when no input was ever received,
	which happens with the jobdiv option when the input has fewer sequences
	than the first batch modulus.

2008-09-26 twu

	* iit_get.c: Allowing retrieval of labels that contain colons, by checking
	first to see if the first part of the label is a divstring.

	* iit-read.c, iit-read.h: Added function to determine divint without reading
	entire IIT.

2008-09-23 twu

	* stage3.h: Added PRE_ENDS as a debugging endpoint

	* stage3.c: Modified solutions at ends. First, we decide between distal and
	medial, with distal penalized for non-canonical introns. Then, we simply
	extend the ends without peelback and permitting an initial gap.

	* stage2.c: Introducing a minimum pct_coverage

	* oligoindex.c, oligoindex.h: Allowed suffnconsecutive to be a different
	value in each level of resolution.

	* match.c, match.h: Moved function Match_get_coords to gregion.c

	* gregion.h: Added fields for weight and support.

	* gregion.c: Added fields for weight and support. Duplicate gregions are
	now resolved in favor of the gregion with the greatest weight, or if
	equal, the greatest support.

	* gmap.c: Increased extraband at end from 3 to 6

	* dynprog.c: Allowing a gap to start the alignment of end5 and end3.
	Introducing a parameter init_jump_penalty_p to control this.

	* diag.c: Replaced 0 with 0U in some cases

	* stage1.c: Using weights on matches and on gregions to focus on genomic
	regions with most specificity.

2008-09-19 twu

	* stage1.c: Added penalty for intron length in find_best_path to reduce
	excessively large regions. If segments are used, then clearing gregions
	and starting over.

2008-09-16 twu

	* stage1hr.c: Fixed bug with false positives on middle indel. Fixed bug
	with combinations of insertions and deletions in find_segments_multiple_mm.

2008-09-15 twu

	* gmap.c: Allowing multiple paths for alignment against user-provided
	segment. Explicitly recomputing goodness over all stage3 objects.
	Allowing user to specify direction of introns.

	* stage2.c: Giving points for indexsize-equivalent number of matches if it
	starts a new chain.

	* stage1hr.c: Fixed problem with insertions in first 12-mer. Now treating
	as a mismatch, as we did for insertions in the last 12-mer.

	* oligoindex.h: Removed debug_graphic_p from argument list

	* oligoindex.c: Fixed memory leak

	* md5-compute.c: Added ability to handle multiple input files

	* diag.c, diag.h, diagdef.h: Computing scores for each diagonal and
	requiring a minimum score

2008-09-09 twu

	* stage1hr.c: Fixed bug in exact match to end of chromosome, resulting in
	negative coordinates of the next chromosome.

	* stage3.h: Added function for recomputing goodness

	* stage3.c: Made widebandp true on all single gap solutions. Extending 5'
	and 3' ends, rather than comparing distal with medial, when defect rate is
	high.
	Recomputing goodness using just matches if best hit is poor.

2008-09-08 twu

	* diag.c, diag.h: Moved some functions from oligoindex.c to diag.c

	* oligoindex.c, oligoindex.h: Implemented different mapping resolutions by
	using multiple oligoindices. Using a separate lookback for each
	resolution.

	* stage2.c, stage2.h: Implemented different mapping resolutions by using
	multiple oligoindices.

2008-09-04 twu

	* configure.ac: Added check for stat64

	* acinclude.m4: Including config/acx_mmap_fixed.m4,
	config/acx_mmap_variable.m4, and config/struct-stat64.m4.

	* VERSION: Updated version

	* README: Augmented instructions for new gmap_setup flags and made mention
	of GSNAP.

	* dynprog.c, dynprog.h: New functions added for dealing with an internal gap

	* indexdb.c: Fixed problem in reading offsets and positions file based on
	interval of 6.

	* gmap.c: Made default canonical mode to be 1

	* stage3.c: Reverted to revision 1.300 with newer code kept for stage3debug.

	* stage2.c: Reverted to revision 1.221 with newer code kept for converting
	oligomers to nucleotides

	* smooth.h: Removed stage2_indexsize

	* smooth.c: Reverted to revision 1.41, plus removal of stage2_indexsize

	* oligoindex.h: Reverted to revision 1.47, plus most recent wobble masking
	and code for multiple oligoindices

	* oligoindex.c: Reverted to revision 1.108, plus most recent wobble masking
	and code for multiple oligoindices

	* diagpool.c: Removed initialization for bestscore and prev fields

	* diagdef.h: Removed score, bestscore, and prev fields

	* diag.h: Reverted to revision 1.5 with some functions moved from
	oligoindex.c.

	* diag.c: Reverted to revision 1.7 with some functions moved from
	oligoindex.c.

	* stage3.h: Calling stage 2 directly

	* stage3.c: More attempts to rearrange steps

	* stage2.c, stage2.h: Bypasses former stage 2 and returns best path of
	diagonals, converted to nucleotides

	* smooth.c: Changed function for finding internal shorts

	* oligoindex.c, oligoindex.h: Changed Oligoindex_get_mappings to return a
	list of diagonals

	* iit-read.h: Added comments to explain arguments

	* gmap.c: Having stage2 return a path

	* diagpool.c: Added initialization for bestscore and prev

	* diagdef.h: Added fields for bestscore and prev

	* diag.c, diag.h: Added functions Diag_compare_querystart and Diag_best_path

2008-08-15 twu

	* diag.c, diag.h, gmap.c, oligoindex.c, oligoindex.h, stage2.c, stage2.h:
	Implementation of oligoindex step at multiple resolutions

	* stage2.c: Rearranged procedures in preparation for multiple oligoindices.

	* oligoindex.c, oligoindex.h: Moved various functions from oligoindex.c to
	diag.c. Added various variables to Oligoindex_T struct. Rearranged
	procedures in preparation for multiple oligoindices.

	* diag.c, diag.h: Moved various functions from oligoindex.c to diag.c

	* stage2.h: Added a version of stage 2 that can be called from within stage
	3.

	* stage2.c: Using active hits, instead of minactive and maxactive bounds.
	Added hooks for relying upon splice site scores. Made conversion to
	nucleotides handle arbitrary masks. Added penalty for diffdistance not a
	multiple of 3. Added a version of stage 2 that can be called from within
	stage 3.

	* stage1hr.c: Exiting if a single polyat 12-mer found, to prevent false
	indels from being found in find_segments_multiple_mm.

	* oligoindex.h: Computing active hits around each diagonal, instead of
	minactive and maxactive bounds.

	* oligoindex.c: Added wobble masking. Computing dominance by using scores,
	based on number of diagonals overlapping each querypos.

	* indexdb_hr.c: Added masking for all left shifts

	* indexdb.c: Fixed problem where highest resolution indexdb was not being
	used

	* gmap.c: Using new interface to Oligoindex_set_inquery

	* diag.c, diag.h, diagdef.h: Added score to Diag_T object

	* block.c: Added error message

2008-08-11 twu

	* oligoindex.c, oligoindex.h: Passing character strings to procedures,
	rather than Sequence_T objects.

2008-08-10 twu

	* gmap.c: Made changes to debug requests from stage3

2008-08-09 twu

	* stage3.c: Rearranging steps to improve cross-species performance. Work
	still in progress.

2008-08-08 twu

	* stage1hr.c: Removed old code

	* stage1.c: Made heap and segment algorithm work for PMAP

	* binarray.c, binarray.h: Removed binarray source code

	* sequence.c: Redefined trim_end for PMAP to exclude the terminal stop codon
	added

	* pmapindex.c: Including index1interval in filename for PMAP databases

	* matchpool.c: Removed old code that referred to positions, not diagonals

	* match.c: Simplified a procedure

	* indexdbdef.h: For PMAP, allowed index1interval to be determined by
	available databases

	* indexdb_hr.c, indexdb_hr.h: Moved Indexdb_read_no_subst command to
	indexdb.c

	* indexdb.c, indexdb.h: Moved Indexdb_read_no_subst command here. Including
	index1interval into filename for PMAP databases.

	* gmap.c: Changed variable name from samplingp to lowidentityp

	* block.c: Bypassing oligo.c and calling Indexdb commands directly

2008-08-06 twu

	* Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
	Makefile.pmaptoo.am, stage1.c, stage1.h: Removed binarray

2008-08-05 twu

	* binarray.c, binarray.h, stage1.c: Transitioning away from bins and toward
	segments. Intermediate code contains both sets of functions.

2008-08-01 twu

	* block.c, block.h, gregion.c, gregion.h, matchpool.c, matchpool.h,
	stage1.c: Just before change to using diagonals, with directives
	indicating changes

2008-07-30 twu

	* gsnap.c: Changed batch specification so it runs from 0 to n-1.

	* stage1hr.c: Changed hierarchy of results to be exact, sub:1, local
	splicing, half introns, sub:2, sub:3, sub:4, indels, distant splicing.
	Increased speed for computing splice ends. Limiting nmismatches for each
	splice end, so not checking nmismatches for splicing after that.

	* gsnap.c: Reduced default maxpaths to 20 and maxchimerapaths to 2

2008-07-29 twu

	* stage1hr.c: Improved identification of repetitive oligos

	* sequence.c: Better handling of FASTA files that end with blank lines

	* stage1hr.c, stage1hr.h: Implemented different sizes for insertions and
	deletions

	* stage3hr.c, stage3hr.h: Added function Stage3_remove_duplicates

	* stage1hr.c: Made 12-mer mod 3 strategy work for multiple mismatches,
	indels, and exon-exon junctions.

2008-07-28 twu

	* stage1hr.c: Removed special variables for -2, -1, querypos+1, and
	querypos+2. Removed middle_indel_p.

	* stage1hr.c: Made paired reads use new 12-mer strategy for exact and 1-sub

	* stage1hr.h: Changed variable name to expected_pairlength

	* datadir.c: Improved error message when genome db not found

	* indexdb_hr.c, indexdb_hr.h, intlist.c, intlist.h, stage1hr.c: Implemented
	faster version of exact and 1-sub using 12-mers

2008-07-26 twu

	* indexdb_hr.c: Removed oligo_hr.h and oligo_hr.c. Added code for reading
	left and right subst of 1 and 2 nts.

	* oligo_hr.c, oligo_hr.h: Removed oligo_hr.h and oligo_hr.c

	* Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
	Makefile.pmaptoo.am, block.c, indexdb.c, indexdb.h, stage1hr.c: Removed
	oligo_hr.h and oligo_hr.c.

	* indexdb_hr.h: Initial import into CVS

2008-07-17 twu

	* stage3hr.c: Fixed handling of trimming for inverted hits. Fixed handling
	of hits that have negative genomic coordinates.

	* stage1hr.c: Fixed handling of trimming at ends

	* pair.c: Changed output to show "genome" instead of "chr"

	* Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
	Makefile.pmaptoo.am, diagnostic.c, diagnostic.h, gmap.c, result.c,
	result.h, stage1.c, stage1.h: Added Diagnostic_T to hold information

	* chimera.c, chimera.h, get-genome.c, iit_plot.c, match.c, match.h,
	stage3.c: Using new interface to Genome_get_segment

	* genome.c, genome.h: Printing out-of-bounds characters on all cases where
	coordinates exceed chromosomal boundaries.

2008-07-07 twu

	* stage1hr.c: Added comment

	* stage1.c: Added high-resolution sampling

	* oligo.c, oligo.h: Removed burden of leftreadshift to caller

	* indexdb.c, indexdb.h: Added function to provide indexing interval

	* block.c, block.h: Added high-resolution behavior to Block_T object

	* gmap.c: Added hybrid behavior for finding canonical introns: low reward
	for high-identity sequences and high reward otherwise.

	* stage1.h: Removed obsolete functions

	* stage1.c: Renamed variables

	* pair.c, pair.h, stage3.c: Printing separate runtimes for stage2
	diagonalization and alignment

	* stage2.h: Computing separate runtimes for stage2 diagonalization and
	alignment.

	* stage2.c: Reinstating limitation on maximum number of active hits.
	Computing separate runtimes for stage2 diagonalization and alignment.

	* stage1.c, stage1.h: Reporting whether sampling was used

	* gmap.c: Using smaller stage 2 indexsize when stage 1 sampling is done

	* plotgenes.c, plotgenes.h: Added ability to handle values

	* pdldata.c: Using Access_mmap function

	* gdiag.c, gsnap-to-iit.c: Using new interface to Genome_fill_buffer

	* subseq.c: Added initial '>' to header

	* stage3.c: Using new interface to IIT_print

	* stage1.c: Removed references to Matchpair_T

	* pmapindex.c: Removed -l as an input flag

	* oligo.c, oligo.h: Added code for identifying repetitive oligos

	* match.c, match.h: Added code for dealing with pairs of matches

	* matchpair.c, matchpair.h: Removed Matchpair_T code

	* indexdb.h: Restoring previous definition of sufficient support

	* iit_update.c: Using new interface to IIT_read

	* iit_plot.c: Handling values, in addition to counts and genes

	* gregion.c, gregion.h: Added fields to Gregion_T

	* gmap.c: Interpreting optarg as strings, not integers

	* get-genome.c: Removed -F and -R flags. Using -R flag for relative
	coordinates.

	* block.h: Separated interfaces for GMAP and PMAP

	* block.c: Added hook for removing repetitive oligos

	* binarray.c: Taking all boxes in final step. Reduced debugging output.

	* Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
	Makefile.pmaptoo.am: Added binarray.c and .h and removed matchpair.c and .h

	* matchpair.c: Added code for bins

	* stage1.c: Added two levels of ntopboxes

2008-07-03 twu

	* binarray.c, binarray.h, stage1.c: Implemented working version of binarray
	algorithm

	* Makefile.util.am, revcomp.c, seqlength.c, subseq.c: Added utility programs
	for internal use

2008-07-02 twu

	* binarray.c, binarray.h: Initial import into CVS

2008-07-01 twu

	* stage1.c, stage1.h: Initial implementation of bins

2008-06-30 twu

	* stage3.c, iit-read.c, iit-read.h: Added ability to print relative
	coordinates

	* stage1hr.c: Improved handling of heaps. Added code for handling
	out-of-bounds conditions.

	* sequence.c, sequence.h: Added command for printing revcomp of sequence

	* interval.c, indexdb.c: Added debugging statements

	* indexdb_hr.c: Replaced separate variables for heapsize and delta into a
	single header. Added code for doing all reads, then doing all writes.

	* iitdef.h: Storing separate mmap pointers for parts of IIT

	* gmap.c: Removed -R flag

	* gsnap.c: Added ability to handle input sequence in batches

	* genome.c, genome.h: Changed out-of-bounds symbol to be '*'.

2008-06-25 twu

	* gmap.c: Made low reward for canonical sequences to be the default

	* get-genome.c: Fixed calculation of genomiclength

	* iit-read.h: Removed unused function.

	* iit-read.c: Now doing memory mapping of pointers rather than reading all
	of them. Fixed bug in reporting second chromosomal coordinate. Fixed bug
	in sorting segments by coordinate.

2008-05-19 twu

	* indexdb_hr.c: Made process_heap inline. Removed delta from Batch_T.

	* indexdb_hr.c: Various code introduced to improve speed of heapify operation

2008-05-09 twu

	* access.c, access.h: Fixed mmap calls with offset so offset is on a page
	boundary

	* stage3hr.c, stage3hr.h: Added new functions for filtering and sorting
	chimeras. Fixed calls of scrambled exons.

	* stage1hr.c: Major efficiency improvements in heapify and other heap
	functions for merging diagonals

	* indexdb_hr.c: Major efficiency improvements in heapify and other heap
	functions for merging batches

	* stage1hr.c: Changed filtering methodology for exon-exon junctions

2008-05-08 twu

	* indexdb_hr.c: Using pointers to memory-mapped positions file, and adding
	shift-plus-diagterm as heap builds the final array of positions.

2008-05-07 twu

	* indexdb_hr.c: Restored previous version

	* indexdb_hr.c: Attempt to reduce D2 cache miss rate, but actually increases
	it by 10x.

2008-05-06  twu

	* stage3hr.c: Using correct type for Stage3chimera_t objects

	* stage1hr.c: Set chimerap flag correctly

	* indexdb_hr.c: Faster counting of entries in cases where duplicates are not
	allowed

	* indexdb.c: Minor syntactic changes

	* gsnap.c: Turned off reading of labels for map iit files

2008-05-05  twu

	* gmapindex.c, pmapindex.c: Providing chromosome_iit to procedures for
	writing offset and position files

	* stage3hr.c, stage3hr.h: Values for chrnum are pre-computed rather than
	computed here

	* stage1hr.c: Using new interfaces to Genome_fill_buffer and Stage3_new
	routines

	* indexdb.c, indexdb.h: No longer storing oligomers at ends of chromosomes

	* iit-read.c, iit-read.h, iitdef.h: Providing specific fields for memory
	mapping of labels and annotations. Reading all pointers for labels and
	annotations.

	* genome.c, genome.h: Trimming correctly at chromosome boundaries.
	Returning chrnum.

	* access.h: Added function to mmap at a particular offset.

	* access.c: Added function to mmap at a particular offset. Added check for
	struct stat64.

	* stage1hr.c: Searching for indels only if substitution fails

2008-05-04  twu

	* gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Added trimming of
	mismatches at ends of substitutions

2008-04-25  twu

	* stage3hr.c, stage3hr.h: Printing information about paired result type and
	about structural variations in spliced reads.

	* stage3.c: Removed include of maxent.h

	* stage1hr.c, stage1hr.h: Added hierarchy of paired result types. Checking
	for cross repetitiveness.

	* sequence.c: Improved debugging statements

	* resulthr.c, resulthr.h: Storing information about paired result type

	* iit_plot.c, plotgenes.c, plotgenes.h: Fixed handling of new IIT map format

	* pair.c: Changed output format for IIT-readable files (-f 7)

	* list.c, list.h: Added function of List_to_array that reports list length

	* iit-read.c: Fixed handling of flanking intervals

	* gsnap.c: Adding information about paired result type and providing
	information about max number of paired paths.

	* Makefile.gsnaptoo.am: Using tables for IITs. Removed iit_update.

	* Makefile.dna.am: Using tables for IIT. Removing gdiag and iit_update.

2008-04-23  twu

	* iit_get.c: Fixed bug in handling queries from stdin

2008-04-22  twu

	* stage3hr.c: Slight improvement in efficiency in eliminating duplicates or
	dominated paired end solutions.

	* dynprog.c: Reduced mismatch penalty for low quality sequences. Equalizing
	extension penalty for single gaps, regardless of sequence quality.

	* maxent.c: Added debugging statements

	* stage3hr.c: Fixed bugs in eliminating duplicate or dominated paired-end
	results

	* iit-read.c: Fixed memory leak for entire IIT structure

	* stage3.c: In dual break, peeling pairs first.

	* stage3.c: Improved handling of dual breaks by scanning genomic segment

2008-04-21  twu

	* iit_store.c: Fixed bug in handling intervals without divs

	* iit-write.c: Added error message if total_nintervals is zero.

	* iit-read.c: Modified output for IIT_dump

	* fa.iittest, iit_get.out.ok: Modified IIT input/output for new interval
	format

	* stage3hr.c, stage3hr.h: Added code for printing half introns. Now storing
	chrnum when Stage3_T objects are computed.
	Using chrnum to determine whether two paired ends are connectable.

	* stage1hr.c: Using scores to determine whether indel beats substitution.
	Added code for finding half introns. Now storing chrnum when Stage3_T
	objects are computed.

	* indexdb_hr.c: Added code, not currently used, for using doubles to find
	longer oligomers.

2008-04-15  twu

	* stage1hr.c, stage1hr.h: Increased MAX_INDELS, and using it instead of
	hard-coded 3

	* plotdata.c, segmentpos.c, stage3hr.c, chrnum.c, chrnum.h, chrsubset.c,
	gdiag.c, genomepage.c, genomeplot.c: Using new interface to IIT_label

	* plotgenes.c, plotgenes.h, stage3.c: Using new interface to IIT_get and
	IIT_label

	* table.c, table.h: Added functions Table_string_compare and
	Table_string_hash

	* pmapindex.c, iit_dump.c, iit_plot.c: Using new interface to IIT_read

	* match.c, pair.c: Using new interface to Chrnum_to_string

	* indexdb.c: Using IIT_total_nintervals

	* indexdbdef.h: Moved definition of Indexdb_T to a separate file

	* iitdef.h: Added fields for whether labels were read, and for offsets to
	various parts of the iit file.

	* iit_store.c: Using new version for reporting intervals

	* iit_get.c: Using new interface to IIT_get and IIT_read. Added ability to
	center annotations at a given column.

	* iit-write.c: Fixed bugs for divs with no intervals

	* iit-print.c, iit-print.h: Moved IIT_print procedures back to iit-read.c.

	* iit-read.c, iit-read.h: Fixed bug in handling divs with no intervals.
	Allowing memory mapping of labels and intervals and their pointers (in
	addition to annotations). Moved IIT_print procedures back to this file.

	* gsnap.c: Providing flag for user to specify consecutive matches, to
	control speed

	* gsnap-to-iit.c: Removed flag for old GSNAP version output format

	* gmapindex.c: Using tables to provide information to IIT_write

	* get-genome.c, gmap.c: Using new interface to IIT_get and IIT_read

	* genome-write.c: Using new interface to IIT_get

2008-04-10  twu

	* gregion.c, match.c, matchpool.c: Made IIT_get_one pass additional parameter

2008-04-01  twu

	* stage1hr.c: Various methods to improve speed, including separate
	processing for plus and minus strands, use of threshold_noligomers and a
	user-specified threshold_score for finding segments for multiple
	mismatches.

2008-03-31  twu

	* stage1hr.c: Removed old code based on fixed (nonrecursive) oligosize

	* stage1hr.h: Changed variable names

	* stage1hr.c: Using new variable names for paired-end lengths. Generalized
	mask for oligosize.

	* gsnap.c: Removed -a flag and replaced it with -S flag. Changed flags for
	paired-end lengths.

	* chrnum.c, chrom.c, chrom.h, chrsubset.c, segmentpos.c: Using new interface
	to IIT routines with divs.

	* get-genome.c: Moved Chrom_string_from_position function to iit-print.c

	* stage3hr.h: Changed variable names for paired-end lengths.

	* stage3hr.c: Using new interface to IIT routines with divs. Changed
	variable names for paired-end lengths.

	* stage1hr.c: Made indel alignments extend inward from ends as far as
	possible.

	* stage1hr.c: Added new routine for computing indels without using dynamic
	programming matrix. Maximizes matches from left to right.

2008-03-27  twu

	* iit-print.c, iit-print.h, iit-read.c, iit-read.h, iit-write.c,
	iit-write.h, iit_get.c, iit_store.c, iitdef.h: Introduced version 3 of IIT
	format, to handle multiple divs.

2008-03-20  twu

	* Makefile.dna.am, Makefile.gsnaptoo.am: Removed block_hr and blockdef files

	* pmapindex.c: Removed both uppercase and lowercase flags, and added -l flag
	to make the distinction

	* stage3hr.c: Changed order of output so type of match comes before genomic
	location

	* stage1hr.c: Handling short reads with lowercase characters. Using
	Oligo_hr functions rather than Block_T functions.

	* sequence.c, sequence.h: Added functions to handle short reads with
	lowercase characters

	* oligo_hr.c, oligo_hr.h: Moved leftreadshift step out of oligo_hr functions

	* oligo-count.c: Using new interface to Block_new

	* indexdb_hr.c: Removed checking for duplicates

	* indexdb.c, indexdb.h: Added ability to mask lowercase characters in genome

	* gsnap.c: Made program work for query sequences with lower case

	* gmapindex.c: Removed uppercase and lowercase flags and added -l flag.
	Making ".masked" indexdb files for masked genomes (where lowercase nts not
	indexed).

	* genome.c, genome.h: Changed name of variable

	* block.c, block_hr.c, block_hr.h, blockdef.h: Restored definition of
	Block_T to block.c

2008-03-05  twu

	* gsnap.c: Using new interfaces to Stage1 procedures

	* stage1hr.c: Deleted debugging statements that give a seg fault

	* stage1hr.c, stage1hr.h: Generalized procedures to use arbitrary oligosize

	* stage1.c: Using new interface to Block_new

	* block.c, block.h, block_hr.c, blockdef.h, oligo.c, oligo.h, oligo_hr.c,
	oligo_hr.h: Generalized procedures to handle arbitrary oligosize

	* indexdb_hr.c: Fixed bugs in adding wildcard nucleotides

	* indexdb.c: Fixed bug in recognizing index file at interval 6

	* Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am: Added new
	source files

	* stage3hr.c: Generalized print procedures to handle arbitrarily long reads

	* stage1hr.c, stage1hr.h: Added recursive procedures for paired end reads

	* stage1.c: Using original GMAP calls to Block_T procedures

	* oligo.c, oligo.h, oligo_hr.c, oligo_hr.h: Moved GSNAP-specific procedures
	to a separate file

	* intlist.c, intlist.h: Added function Intlist_ascending_by_key

	* indexdb_hr.c: Moved GSNAP-specific procedures to a separate file.

	* indexdb.h: Using "id<number>" as file suffix for offsets and positions
	files.

	* indexdb.c: Moved definition of Indexdb_T object to a separate file.
	Separated GSNAP-specific procedures to a separate file. Using
	"id<number>" as file suffix for offsets and positions files.

	* gmapindex.c: Removed -e flag for specifying subindexing

	* block_hr.c, block_hr.h: Made separate file for GSNAP-specific procedures

	* blockdef.h: Put Block_T definition into a separate file

	* block.h: Removed Block_T procedures specific to GSNAP

	* block.c: Put Block_T definition into a separate file. Removed GSNAP
	parameters for GMAP calls to Block_T procedures.

2008-03-04  twu

	* stage1hr.c: Implemented recursive method for finding exact matches.
	Binary search not yet added.

2008-03-03  twu

	* Makefile.am, cvs2cl.pl: Made maintainer Perl machine-independent

	* fa_coords.pl.in, gmap_process.pl.in, gmap_setup.pl.in, md_coords.pl.in:
	Made different make commands for gmapdb_highres and gmapdb_lowres

	* Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am: Customized
	each Makefile.am for its specific task

2008-02-29  twu

	* gmapindex.c, iit-read.c, segmentpos.c: Using new interface to obtain
	strings from Chrom_T objects

	* chrom.c, chrom.h: Restricted criteria for considering initial part of
	chromosome string as numeric. Now storing initial string directly.

	* gsnap.c: Using new interface to print commands

2008-02-28  twu

	* stage3hr.c, stage3hr.h: Changed output to be more uniform, in a 1-column
	format

	* list.c: Added include of string.h

	* iit_plot.c: Made program able to print counts

	* iit-read.c: Added more informative error messages when offset appears
	incorrect relative to filesize. Removed output of type in print_record.

	* gsnap-to-iit.c: Handles new GSNAP output format. Handles remapping to
	genome.

	* get-genome.c: Made program work correctly on chromosomally tagged IIT map
	files

	* genomepage.c, genomepage.h: Removed sequence as a parameter

	* pair.c, pair.h, stage3.c: Modified output of exon map

	* plotgenes.c, plotgenes.h: Added function for printing counts

	* Makefile.dna.am, Makefile.gsnaptoo.am, blackboard.c, blackboard.h, gmap.c,
	gsnap.c, params.c, params.h, reqpost.c, reqpost.h: Removed Params_T object

2008-02-26  twu

	* gsnap-to-iit.c: Handling new version of gsnap output (after remapping).

	* gsnap-to-iit.c: Added -b flag to specify blocksize. Made default
	blocksize 10000.

2008-02-13  twu

	* iit_plot.c, plotgenes.c, plotgenes.h: Fixed printing of genes in ascii
	format

2008-02-08  twu

	* plotgenes.c, plotgenes.h: Added binning by pixel. Removed allgenesp for
	plot_counts.

2008-02-07  twu

	* gsnap-to-iit.c, plotgenes.c: Modified count format for IITs to store
	information in batches

	* plotgenes.c: Added printing of alternate counts. Fixed problem for calls
	to IIT_get_typed.

	* gsnap-to-iit.c: Initial import into CVS

2008-02-06  twu

	* iit_plot.c: Increased top margin. Added -V flag for handling count data.

	* plotgenes.c, plotgenes.h: Added function for plotting count data.
	Handling signs for both versions 1 and 2 of IIT files.

	* iit-read.h: Added interface for IIT_version()

	* iit-read.c: Added abort statement for negative coordinates

	* sequence.c: Added functions for skipping sequences

	* indexdb.c: Commented out some information output to stderr

	* iit_get.c: If iit file not found, try adding ".iit" suffix

	* stage3hr.c: Printing distances for spliced reads only if distance value is
	nonzero

	* stage1hr.c: Fixed calculation of distances in spliced reads

2008-02-05  twu

	* Makefile.gmaponly.am, Makefile.gsnaptoo.am, Makefile.pmaptoo.am: Added
	compiler commands for iit_plot

	* iit_plot.c: Taken from mapplot.c in gdp.

	* genomepage.c, genomepage.h: Extracted commands from gdata-write in gdp.

	* plotgenes.c, plotgenes.h: Incorporated changes from gdp. Improved
	plotting capabilities.

	* list.c, list.h: Incorporated changes from gdp. Added List_from_string.

	* color.c: Incorporated changes from gdp. Removed yellow.

2008-01-30  twu

	* gsnap.c: Limited reporting of exon-exon paths. Added -E flag to turn off
	finding of exon-exon solutions.

	* genome.c, genome.h: Made Genome_fill_buffer return a false value if it
	goes into negative genome coordinates.

2008-01-29  twu

	* stage1hr.c: Skipping cases that result in negative genomic coordinates.
	Skipping cases of finding first indels when alignment doesn't extend to
	the end.

	* stage3hr.c, stage3hr.h: Made fixes for handling exon-exon junctions

	* stage1hr.c: Fixed problems in handling various combinations of
	sense/antisense and plus/minus strands for exon-exon junctions.

	* gmap.c: Made finding canonical introns the default. Made -X flag take an
	argument.

2008-01-16  twu

	* stage1hr.c, stage3hr.c, stage3hr.h: Improved algorithm for finding and
	ranking chimeras

2008-01-14  twu

	* gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added output in IIT exon map
	format

	* stage1hr.c, stage3hr.c, stage3hr.h: Added printing of number of mismatches
	for chimeras

	* dynprog.c: Added type cast for memset.

	* stage1hr.c: Reduced max mismatches to 4. Penalized mismatches further in
	finding breakpoints for chimeras.

	* stage3hr.c: Reporting breakpoint coordinates for chimeras

	* stage1hr.c: Increased penalty for mismatches to help find correct
	breakpoint for chimeras

2008-01-11  twu

	* Makefile.gmaponly.am, Makefile.gsnaptoo.am, Makefile.pmaptoo.am,
	stage1hr.c, stage3hr.c, stage3hr.h: Added probabilistic calculations of
	splice sites

	* gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Implemented code
	for identifying chimeras

	* resulthr.c, resulthr.h: Added a new type for chimeras

2008-01-09  twu

	* stage1hr.c: Fixed typo in variable name

	* resulthr.c, resulthr.h, stage3hr.c, stage3hr.h, gsnap.c: Added the ability
	to report paired-end cases that fail to co-localize

	* gmap.c: Cleaned up some code

	* gsnap.c, stage1hr.c, stage1hr.h: Added -o flag to specify optimum length

2008-01-08  twu

	* gsnap.c: Added -I flag for specifying inversion of second read of paired
	end read

	* sequence.c, sequence.h: Added procedure for printing revcomp of a short
	read

	* stage3hr.c, stage3hr.h: Provided options for printing second read either
	in original direction or as reverse complement.

	* stage1hr.c: Fixed various memory leaks

	* gsnap.c: Added flag to print all solutions, either for single read or for
	paired end read.

	* stage3hr.c, stage3hr.h: Added procedures for sorting results of single
	read mappings

	* stage1hr.c, stage1hr.h: Added ability to print all solutions in single read

	* stage3hr.c, stage3hr.h: Added sorting of results by closeness to optimal
	distance

	* gsnap.c: Removed unused variables. Removed instant printing feature.

	* stage1hr.c, stage1hr.h: Removed instant printing feature

2008-01-07  twu

	* resulthr.c, resulthr.h: Generalized Result_T object so it can print either
	single or paired end results

	* sequence.c, sequence.h: Implemented procedure for reading short reads,
	either single or paired ends.

	* request.c, request.h: Enabled storage of paired reads in Request_T object

	* stage3hr.c, stage3hr.h: Implemented routines for storing and printing
	paired ends

	* stage1hr.c: Implemented separate strategy for handling reads with poly-A
	or poly-T 12-mers. In such cases, need to test 12-mers exhaustively.

	* stage1hr.c, stage1hr.h: Initial implementation of mapping for paired
	reads. For consistency, changed indel to be same rank as sub:2 for single
	reads. Generalized separator used in printing results.

2008-01-04  twu

	* stage1hr.c: Separated single read strategy into separate components

	* sequence.c: Fixed a memory leak.

	* stage3hr.c, stage3hr.h: Added a stage3 procedure specific for GSNAP

	* sequence.c, sequence.h: Added a read procedure that converts input to
	uppercase

	* gsnap.c, resulthr.c, resulthr.h: Made GSNAP algorithm return results
	rather than printing them

	* Makefile.gmaponly.am, Makefile.gsnaptoo.am, Makefile.pmaptoo.am: Removed
	gregion.c and added stage3hr.c to GSNAP build

	* stage1hr.c, stage1hr.h: Made algorithm return results rather than printing
	them. Fixed a bug in handling cases with mismatches on both ends.

2007-12-19  twu

	* gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added -4 flag for printing
	alignments per exon

	* gsnap.c: Removed unused code

	* stage1hr.c: Removed unused header files and objects

	* stage1.c, stage1.h: Added function Stage1_size()

	* resulthr.c, resulthr.h: Added specialized Result_T for GSNAP

	* reqpost.h: Added conditional include of resulthr.h for GSNAP

	* params.c, params.h: Removed obsolete fields

	* matchpair.c, matchpair.h: Removed obsolete functions

	* iit-read.c: Removed warning message about not finding a file

	* stage3.h: Made maponly mode work with Gregion_T objects.

	* stage3.c: Fixed bug where genomicuc_ptr was NULL. Made maponly mode work
	with Gregion_T objects.

	* gregion.c, gregion.h: Added fields to Gregion_T for maponly mode.

	* gmap.c: Restored old maponlyp code. Added error message if pthreads fails
	on operating system.

	* genuncompress.c: Added flag to print one character per line

	* blackboard.c, blackboard.h: Added function to see if blackboard is done

	* Makefile.gmaponly.am, Makefile.gsnaptoo.am, Makefile.pmaptoo.am: Added
	commands for GSNAP

2007-12-07  twu

	* pair.c: Fixed bug with trim_end, now an exclusive coordinate rather than
	an inclusive one.

	* cvs2cl.pl: Added changelog program to CVS

	* gmap.c: Changed calls to Stage1_compute to match new interface

	* stage1hr.h: Removing unused parameters

	* stage1hr.c: Corrected positions of 12-mers for sequences shorter than 36
	nt. Reduced final threshold score to 12. Checking for repetitive sequence.

	* stage1.c, stage1.h: Separated stage 1 low-resolution procedure from
	high-resolution procedure.

	* block.c, block.h, indexdb.c, indexdb.h, oligo.c, oligo.h: Added procedure
	for counting number of genomic positions for a given oligomer

	* gsnap.c: Now calling Stage1hr_compute directly. Removed some unused code.

	* gmap.c: Restored maponly mode

2007-12-06  twu

	* stage1hr.c: Initial attempt to generalize procedure to handle oligomers
	shorter than 36, using initial testing of 3 12-mers.

2007-12-01  twu

	* stage1hr.c: Fixed bugs in computation and printing of middle indels.
	Fixed bug when best_querypos (in terms of npositions) was -1.

	* stage1hr.c: Added strategy of using most specific oligomer to drive search
	for exact matches.

	* stage1hr.c: Added keep_score parameter to find_segments.

	* indexdb.c: Made heapify faster. Hard-coded left_nts and right_nts to be 1
	in read_shifted.

	* genome.c: In Genome_fill_buffer, checking for negative starting
	coordinate, and filling with N's if necessary.

2007-11-30  twu

	* stage1hr.c: Limited dynamic programming to just the non-matching oligomer,
	whenever possible.

	* stage1hr.c: Added a triplet matching step with binary search to find exact
	matches. Fixed a bug in find_segments in handling the last diagonal.

2007-11-29  twu

	* stage1hr.c: Changed output format to show substitutions, insertions, and
	deletions. Made speed improvements in heap algorithm.

	* gsnap.c: Simplified code for handling short reads. Stopped usage of
	oligoindex.

	* stage1hr.c: Implemented version that handles indels. Some speed
	improvements in reporting exact matches when found.

2007-11-27  twu

	* stage1hr.c: Removed unused code

	* stage1hr.c, stage1hr.h: Working version implemented for 36-mers, allowing
	for substitutions

	* stage1.h: Taking queryseq as an argument for Stage1_compute (needed for
	gsnap).

	* stage1.c: Using new interface to Block_process_oligo

	* Makefile.am: Makefile.am now generated by bootstrap from other files

	* stage2.c: Added debugging statements

	* sequence.c, sequence.h: Added procedure Sequence_print_oneline

	* rbtree.c, rbtree.h, rbtree.t.c, gregion.c, gregion.h: Initial import into
	CVS

	* result.c, result.h: Added procedure Result_blank

	* params.c, params.h: Removed truncstep

	* oligoindex.c: Added correct calculation of badoligos

	* oligo.c, oligo.h: Providing diagterm information to lookups from indexdb

	* indexdb.c, indexdb.h: Changed high-resolution indexdb to be subclassified
	by adjacent nucleotides, rather than by phase.

	* gsnap.c: Adding separate main program for gsnap.

	* block.h: Added function Block_skipto. Giving diagterm information to
	Oligo_lookup.

	* block.c: Added function Block_skipto. Revised coordinates assigned to
	last_querypos. Giving diagterm information to Oligo_lookup.

	* Makefile.gmaponly.am, Makefile.pmaptoo.am: Added hooks for gsnap

	* fa_coords.pl.in: Improved handling of cases where chromosome is not parsed

	* gmap_setup.pl.in: Added -H flag to generate high-resolution gmap dbs.

2007-11-26  twu

	* iit_store.c: Fixed bug in handling GFF files

2007-11-14  twu

	* indexdb.c, indexdb.h: Implemented precise positioning by organizing
	composite positions according to phase

2007-11-13  twu

	* result.c, result.h: Removed stage 1 diagnostic information

	* matchpair.c, matchpair.h: Making matchpair generate gregion as output from
	stage 1

	* gmap.c: Using new interface to stage 1. Removed maponly output.

	* Makefile.gmaponly.am, Makefile.pmaptoo.am, stage1hr.c, stage1hr.h: Moved
	high-resolution stage 1 algorithm to a different file

	* stage1.c, stage1.h: Eliminated diagnostic fields. Made interface for
	low-resolution version compatible with high-resolution version.

2007-11-02  twu

	* stage3.c, stage3.h: Removed matchpairend and Stage3_direct procedure

	* stage1.c, stage1.h: Reverting back to 2007-09-28 version

	* pmapindex.c: Changed order of arguments in a function call

	* params.c, params.h: Added slots for truncstep and chromosomal transitions.

	* list.c, list.h: Added functions List_insert and List_reinsert.

	* indexdb.c, indexdb.h: Added function Indexdb_shiftedp. For
	high-resolution indexdbs, added code to merge batches using either a queue
	or a heap.

	* iit-read.c, iit-read.h: Added function IIT_transitions_subset.

	* gmapindex.c: Added -e flag to specify high-resolution genomic indices

	* chrsubset.c, chrsubset.h: Added function Chrsubset_transitions. Added
	assumption to Chrsubset_includep.

	* chrnum.c, chrnum.h: Added function Chrnum_print_position

	* block.h: Added function Block_donep.

	* block.c: Improved debugging output. Added function Block_donep.

	* access.h: Added function to report if file exists.

	* access.c: Improved error messages. Added function to report if file
	exists.

2007-10-16  twu

	* stage1.c: Refined high-resolution algorithm

2007-10-11  twu

	* orderstat.c: Included appropriate header files for memcpy

2007-10-08  twu

	* reader.c: Made reader go all the way to the ends of the sequence

	* sequence.c: Fixed computation of trimlength

	* indexdb.c, indexdb.h: Implemented read and write procedures for new
	genomic index format (trading off position resolution for adjacent
	nucleotide contents).

2007-10-07  twu

	* stage1.c: Implemented mapping at ends

2007-10-06  twu

	* stage1.c: Completed initial mapping from middle outward

	* stage1.c: Added computation of best subpaths

2007-10-03  twu

	* stage1.c: Implemented high-resolution mapping, and arbitrarily long
	matches for the middle of the sequence outward.

2007-09-30  twu

	* stage1.c: Attempt to use diagonals to find genomic position

2007-09-29  twu

	* genome.c, genome.h: Added Genome_totallength function

	* stage1.c, stage1.h: Added procedure to match doubles of truncated indexdb
	entries

2007-09-28  twu

	* gmap_setup.pl.in: In -S mode (treating each contig as a chromosome),
	turning off sorting of chromosomes and contigs.

	* gmapindex.c: Added -S flag to turn off sorting of chromosomes and contigs

	* table.c, table.h, tableint.c, tableint.h: Added ability to return keys
	sorted by timeindex

	* stage1.c, trial.c, trial.h: Changes made to scan query sequence from
	middle outward

	* VERSION: Updated version

2007-09-27  twu

	* gmap.c: Fixed bug for -f 9 and -E output when no paths were found

2007-09-26  twu

	* VERSION: Updated version

	* index.html: Revised features for 2007-09-26 version

	* gmap_update.pl.in: Made new IIT file permissions the same as the old
	permissions

	* iit-read.c: Added error messages to various conditions in IIT_read

2007-09-25  twu

	* sequence.c: Fixed reading of sequences with multiple PC line feeds

2007-09-20  twu

	* stage1.c: Kept code that depended on USE_MATCHPOOL and removed alternate
	(old) code

	* block.c, block.h: Put save variables inside Block_T object

2007-09-19  twu

	* VERSION: Updated version number

	* MAINTAINER: Added reminder to do cvs tag

	* iit-read.c, iit-read.h, iit-write.c: Moved compute_flanking procedure from
	iit-read.c to iit-write.c

	* configure.ac, Makefile.am, gmap_update.pl.in: Added gmap_update program

	* Makefile.am: Added compile instructions for iit_update

	* iit-write.c: Made stringlen of type off_t (to handle annotations of length
	greater than can be handled by int). Added check to make sure stringlen
	is non-zero.

	* iit-read.c: Made stringlen of type off_t (to handle annotations of length
	greater than can be handled by int)

	* archive.html, index.html: Made changes for 2007-09-20 release

2007-09-18  twu

	* Makefile.gmaponly.am, Makefile.pmaptoo.am, iit-read.c, iit-read.h,
	iit-write.c, iit-write.h, iit_update.c: Implemented iit_update program

	* iit_store.c: Added -v flag to specify desired version

2007-09-17  twu

	* oligoindex.c, stage2.c: Changed R output for diagonal graphics

2007-09-12  twu

	* gmap.c: Added a check to make sure we don't push NULL for Stage3_T object.

	* dynprog.c: Fixed bug in Dynprog_dual_break; need to compute matrix scores
	only to the minimum of length1 and length2.

2007-09-11  twu

	* dynprog.c: Fixed problem where Dynprog_dual_break was exiting
	unnecessarily; need to be concerned only about shorter distance.

	* stage3.c: Added cDNA direction to debugging statements

	* iit_get.c, stage3.c: Added sign argument for getting flanking entries

	* pair.c: Added provision in PMAP to limit coverage to 100% (could exceed
	previously because of implicit stop codon added at end of query sequence).

	* iit-read.c, iit-read.h: Added a sign argument for getting flanking entries

	* get-genome.c: Added flags for accessing from map files entries of a
	particular direction or tag

	* stage1.c: Performing filtering based on clustersize only if too many
	entries and at least one cluster is large.

2007-09-04  twu

	* stage1.c: Removed filtering based on too many matching pairs

2007-08-30  twu

	* gbuffer.c, gbuffer.h, gmap.c: Removed unused code and parameters from
	Gbuffer_T

	* gmap.c: Allocating memory for genomicseg only as needed

2007-08-28  twu

	* blackboard.c, blackboard.h, gmap.c, sequence.c, sequence.h: Added ability
	to read input from multiple sequence files

	* oligoindex.h: Changed calls to reset oligoindex

	* oligoindex.c: Fixed hang that resulted when no oligomer positions were
	found. Eliminated an extra call to Oligoindex_set_inquery.

	* stage2.c: Changed call to Oligoindex to reset after tally

	* stage1.c: Modified debugging output

	* match.c, mem.c: Enhanced debugging output

	* stage2.c, stage2.h: Returned to previous algorithm for finding shifted
	canonical dinucleotides, but now allocating memory dynamically.

	* gbuffer.c, gbuffer.h: Removed pre-allocated memory for finding shifted
	dinucleotides

	* stage2.c: Attempt to conserve memory used in finding shifted canonical
	dinucleotides. However, results in speed penalty.

2007-08-26  twu

	* gbuffer.c, gbuffer.h: Removed unused matchscores variable and unnecessary
	memory allocation.

2007-08-23  twu

	* gmap_uncompress.pl.in: Added a missing space in the output

	* gmap_uncompress.pl.in: Added coordinates output (with flag '-f 9')

2007-08-22  twu

	* pair.c: Fixed potential divide-by-zero bug

2007-08-20  twu

	* gmap_setup.pl.in: Added a .SUFFIXES: command at top to prevent unexpected
	behaviors

2007-08-18  twu

	* stage2.c: Added step to recover when all scores at a querypos are
	negative, by continuing from grand result.

2007-08-16 twu

	* pair.c, pair.h, stage2.c, stage2.h, stage3.c: Computing defect rate in
	middle of stage 3, instead of in stage 2

2007-08-15 twu

	* oligoindex.c: Restored amino acid alphabet to 20 from 18.

	* oligoindex.c: Fixed typo in variable name

	* stage2.c: Inactivated limit on number of active hits

	* Makefile.gmaponly.am, Makefile.pmaptoo.am: Added orderstat.c and
	orderstat.h to code

	* orderstat.c, orderstat.h: Modified procedures to compute order statistics
	in place and for both doubles and ints.

	* oligoindex.c: Computing overabundance based on upper percentile of
	non-zero counts

	* doublelist.h: Added Id info to header

	* gmap.c: Fixed memory leak when user segment is provided

	* orderstat.c, orderstat.h: Added orderstat to CVS

	* oligoindex.c, oligoindex.h: Trial to eliminate limit on maxoligohits

	* stage2.c: Improved output for graphical debugging

	* oligoindex.c: Improved debugging output

2007-08-13 twu

	* stage2.c: Fixed problem with debugging output

	* stage1.c: Added pruning by path sizes

	* matchpair.c, matchpair.h: Added a procedure for finding path size of a
	given matchpair

2007-07-16 twu

	* Makefile.gmaponly.am, Makefile.pmaptoo.am: Created two specialized
	Makefile.am files

	* Makefile.am: Preparing for iit and genome libraries

	* bootstrap, bootstrap.gmaponly, bootstrap.pmaptoo: Created separate
	bootstrap routines for gmap and gmap-plus-pmap

	* VERSION: Updated version number

	* iit-read.c: Computing alphas and betas for iit_dump

	* iit-read.c: Computing alphas and betas only when needed for flanking

	* gmapindex.c, iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_get.c,
	iit_store.c, iitdef.h:
	Added fields for annotation in IITs

	* indexdb.c: Added monitoring information

2007-06-25 twu

	* genome-write.c, segmentpos.c: For version 2 IITs and later, getting sign
	directly from IIT, rather than from annotation.

	* get-genome.c, plotgenes.c: Providing sortp parameter to IIT_get

	* iit_store.c: Added -v flag to print IIT version

	* iit_get.c: Added -U flag to indicate unsigned results

	* genome.c: Added comment

	* gmapindex.c: No longer writing segment length in contig IITs

	* iit-write.c: Writing alphas and betas for correct calculation of flanking
	intervals.

	* iit-read.c, iit-read.h: Using alphas and betas for correct calculation of
	flanking intervals. Added functions IIT_types, IIT_get_all, and
	IIT_get_all_typed.

	* interval.h: Added interface for Interval_sign

	* iitdef.h: Added space for alphas and betas, needed for correct calculation
	of flanking intervals

2007-06-22 twu

	* dynprog.h: Defined UNKNOWNJUMP to be used for temporary gapholders during
	stage 3 calculations.

	* dynprog.c: Returning NULL on all failures, without gapholders (which are
	now inserted by calling procedures in stage 3). Allowing 5' and 3'
	extensions to work to maxlength allowed.

2007-06-21 twu

	* iitdef.h: Added version to IIT_T

	* interval.c: Storing sign for each interval

	* iit-read.c, iit-read.h, iit-write.c, iit-write.h: Introduced version 2
	format, which stores sign for each interval. Not using annotation anymore
	to represent sign. Added function IIT_find_multiple.

2007-06-20 twu

	* pairpool.c: Improved debugging statements

	* gdiag.c, get-genome.c, gmap.c, iit-read.c, iit-read.h, iit_get.c,
	interval.c, interval.h, plotgenes.c, segmentpos.c, stage3.c: Added ability
	to sort intervals by coordinates in IIT_get routines

	* mem.c: In comments, showing how TRAP should be defined

	* stage3.c: Whenever dynprog procedure returns NULL, make sure to put back
	peeled pairs and insert a gapholder. Fixes a bug in BQ672778 against
	hg18. Jumps in gapholders now calculated only in certain procedures.

2007-06-06 twu

	* stage3.c: Removed abort commands on peels that run into gaps

	* dynprog.c: Expanded on comment

2007-06-04 twu

	* VERSION, index.html: Updated version

	* dynprog.c: Removed insertion of gapholder for a single gap that is too
	long to solve.

2007-06-02 twu

	* stage1.c: Added debugging statement to signal end of stage 1

	* dynprog.c: Eliminated allocation of temporary Dynprog_T objects

	* pair.c: Now printing query_skip in -A and -S output

2007-05-29 twu

	* dynprog.c: Lowered gap penalties for single gaps. Fixed bug in solving
	dynamic programming for lower-case input sequences.

	* stage3.c: Added procedures for cleaning non-matches at ends of alignment,
	which are always called.

	* stage3.c: Removing all nonmatches at 5' and 3' ends

2007-05-25 twu

	* VERSION: Updated version

	* index.html, archive.html: Made changes to reflect 2007-05-25 version

	* pair.c, pair.h, stage3.c: Added coverage and identity information to GFF3
	output

2007-05-24 twu

	* configure.ac: Checking both fixed and variable mapping for mmap

	* fa_coords.pl.in: Fixed problem in parsing lines containing the word
	"chromosome"

	* stage3.c: Fixed bug in solving dual breaks

	* pair.c: Fixed PSL output so query and target gaps are computed directly
	from the block starts and lengths.

	* gmap.c: Added flag -j for showing dual breaks

	* dynprog.c: Fixed bug where solution of dual break exceeded minimum gap

2007-05-15 twu

	* stage3.h: Added do_final_p parameter to Stage3_compute

	* stage3.c: Incorporating procedure to trim bad middle exons

	* stage1.c: Performing salvage if total number of matches is relatively low

	* smooth.c: Added to exon length in smoothing for short exons

	* pair.c: Added test code to print information about extra exons

	* intron.c: Removing call to abort.

	* dynprog.c: Added code to bridge dual break with rewards for canonical
	introns. Tweaked some parameters, including less penalty for gap
	extensions.

2007-04-25 twu

	* mem.c: Added error messages for memory allocation problems

	* gmap.c: Added -X flag for heavily favoring canonical and semi-canonical
	introns

2007-04-23 twu

	* gmap.c: Made strict translation the default again

	* indexdb.c: Added "U" to integers in bit operations

	* genome.c: Using two arrays instead of one for translate.

	* compress.c: Using two arrays instead of one for translate.
	Added various abort checks.

	* translation.c: Removed unnecessary checks of extraexonp

	* pair.c: Fixed Pair_dump_one so it handles extraexonp flag

	* pairpool.c: Fixed bug in assigning extraexonp

	* pair.c: Changed add_intronlengths slightly

	* acx_mmap_variable.m4: Moved AC_DEFINE out of macro, to be called
	explicitly in configure.ac

	* acx_mmap_fixed.m4: Added macro for testing mmap with MAP_FIXED

	* gmap.c: Made frameshift-tolerant translation the default

	* pair.c: Changed output of CDS in gff3 mode to produce an in-frame protein
	sequence

	* translation.h, stage3.h: Added strictp for PMAP.

	* translation.c: Added strictp for PMAP. Handling extraexonp items in
	alignment.

	* stage3.c: Giving extraexonp information to pairpool procedures for
	gapalign items

	* pair.h: Added procedure for computing fractional error.

	* pair.c: Added provisions for handling extraexonp flag. Added procedure
	for computing fractional error.

	* pairpool.c, pairpool.h: Added provisions for handling extraexonp flag

	* pairdef.h: Added flag for extra cDNA exon

	* comp.h: Added comp for extra cDNA exon

	* stage3.c: Added procedures for trimming internal exons with poor matches,
	and for finding extra exons in a dual break. Added strictp for PMAP.

	* dynprog.c, dynprog.h: Added procedures for finding extra exons in a dual
	break

	* compress.c: Provided more informative error message

	* oligoindex.c: Handled an arithmetic error caused by divide by zero

	* gmap.c: Added flag -H for handling trimming middle exons and reporting of
	dual breaks. Removed flag -j. Allowed strictp to be used in PMAP.

2007-04-16 twu

	* align.test.ok, map.test.ok: Made test output match current output

	* acx_mmap_variable.m4, configure.ac: Added check for mmap using
	MAP_VARIABLE (because AIX fails on MAP_FIXED)

	* gmap.c: Fixed global variables not used in PMAP

	* stage2.c: Fixed check for negative scores to work for PMAP

	* stage1.c: Placing bound on number of samples taken over query sequence

	* stage2.c: For very long sequences, pruning based on clear coverage

	* oligoindex.c, oligoindex.h: Added computation of clear coverage (percent
	of query sequence covered by relatively few diagonals).

	* matchpair.c: Added debugging statements

2007-04-06 twu

	* stage2.c: Added restriction on distance for grand lookback. Added check
	on nactive for scoring at querypos.

	* stage2.c: Added nactive as a filter in stage 2

2007-04-02 twu

	* stage2.c: Added comment about a way to restrict debugging output

	* iit-read.c: Added start/end information to endpoints in dumping counts

2007-03-22 yunli

	* modules: *** empty log message ***

2007-03-21 yunli

	* modules: *** empty log message ***

2007-03-08 twu

	* stage2.c: Quitting if totalpositions is zero

	* gmap.c: Enabled table output (-f 9) for relative alignment mode (-w).

2007-03-03 twu

	* stage2.c: Fixed bug which led to the partial fill problem in filling
	oligomers

2007-03-02 twu

	* Makefile.am: Added pthread_libs for iit utilities

2007-03-01 twu

	* gdiag.c: Added printing of centromere regions. Printing marginals more
	efficiently.

	* Makefile.am: Added chrnum and chrsubset files to gdiag

	* iit-read.c, iit-read.h: Made IIT_transitions function return signs

2007-02-28 twu

	* gdiag.c: Added flag for ignoring main diagonal. Fixed problem with
	printing revcomp diagonals. Added code for computing different types of
	patterns.

2007-02-21 twu

	* gdiag.c: Added hooks for user to enter chromosomal subsets

	* complement.h: Added a character code string that doesn't convert to
	uppercase, for gdiag.

2007-02-20 twu

	* gdiag.c: Added ability to use map iit files. Skipping over masked regions
	in determining lookback.

	* gdiag.c: Fixed bug in retrieving last part of sequence from gmapdb.
	Providing flag to ignore lowercase (e.g., masked) characters in query.

	* gdiag.c: Added ability for user to provide genomic segment

	* Makefile.am: Added iit-write source files for gdiag

	* gdiag.c: Fixed bug where genomestart could be less than genomeend in a
	diagonal. Made separate procedures for updating forward and revcomp
	diagonals.

	* indexdb.c, indexdb.h: Added procedure IIT_read_inplace for gdiag.

	* iit-write.h: Added procedure IIT_new to allow creation and use of iit in
	the same program.

	* iit-write.c: Added procedure IIT_new to allow creation and use of iit in
	the same program. Simplified code for Node_fwrite.

	* iit-read.c, iit-read.h: Added procedure IIT_transitions

2007-02-18 twu

	* gdiag.c: Added a ring structure to increase speed

	* genome.c, genome.h: Added procedures for gdiag

	* gdiag.c: Made speed improvements by not storing full 24-mers, but rather
	storing results of previous 12-mers

	* gdiag.c: Added calculation of diagonals, ability to read query from
	gmapdb, and storage of intervals.

2007-02-16 twu

	* Makefile.am, gdiag.c: Added program gdiag

2007-02-12 twu

	* gmap.c: Increased parameter for maxoligohits

	* stage3.c: Lowered parameter for intronlen

	* stage2.h: Removed unused function

	* stage2.c: Changed distance penalty to consider both gendistance (now
	linearly, instead of logarithmically) and querydistance (quadratically).
	Using both maxnconsecutive and pct_coverage to decide whether to continue
	with stage 2.

	* stage1.c: Reduced parameter for number of trials

	* pair.c: Fixed calculation of coverage for PMAP

	* oligoindex.h: Moved parameters here

	* oligoindex.c: Implemented algorithm for PMAP. Allowing a diagonal to
	dominate only if it is completely consecutive.

2007-02-09 twu

	* iit-read.c, iit-read.h: Added function to dump labels

	* gmap.c: Fixed bugs with map iit files: bad test for distinguishing between
	universal map files and chromosomal map files, and incorrectly checking
	map tags against chromosomal iit.

2007-02-08 twu

	* oligoindex.c, oligoindex.h: Added computation of percent coverage by
	diagonals

	* access.c: Added a debugging statement

	* gmap.c: Fixed floating point error when trimoligos is zero

	* gmap.c, oligoindex.c, oligoindex.h, stage2.c, stage2.h: Added graphical
	debugging output for stage 2

2007-02-07 twu

	* smooth.c: Distinguishing use of genomejump and queryjump lengths in
	pre-single-gap smoothing versus post-single-gap smoothing.

	* gmap.c: Fixed floating exception when sequence has no oligos

	* stage2.c: Initializing guide in 5' trim region, until first hits are found.

	* gmap.c: Changed default pruning behavior to be no pruning.

	* oligoindex.c: Made speed improvements in scanning diagonals. Removed old
	code for computing maxconsecutive.

	* oligoindex.h, stage2.c: Changed name of variable

	* oligoindex.c: Eliminated convex hull algorithm and implemented method
	based on ordering of diagonals.

	* diag.c, diag.h: Removed unused procedures

2007-02-06 twu

	* oligoindex.c: Implemented a convex hull algorithm to determine minactive
	and maxactive bounds.

	* diag.c, diag.h: Added a procedure to sort diagonals based on closeness to
	origin

	* oligoindex.c, oligoindex.h, stage2.c: Restored computation of
	maxgoodconsecutive to filter out bad stage1 candidates

	* stage2.c: Allowing fill of nucleotides to occur even when
	querypos/lastquerypos or genomepos/lastgenomepos are too close.

	* oligoindex.c: Added new procedure for determining dominance among diagonals

	* diag.c, diag.h: Added procedure for sorting by nconsecutive

	* stage2.c: Using minactive and maxactive to bound current querypos, and
	active to determine available hits for previous querypos.

	* oligoindex.c, oligoindex.h: Computing diagonals inside
	Oligoindex_get_mappings procedure. Implemented simple dominance procedure.

	* diag.c, diag.h, diagdef.h, diagpool.c, diagpool.h: Added fields for
	nconsecutive and dominancep

2007-02-05 twu

	* oligoindex.c: Changes in parameters

	* stage2.c: Changes to debugging output

	* oligoindex.c: Speed improvements by inlining calls to Intlist accessors

	* stage2.h: Removed unused function

	* oligoindex.h, stage2.c: Using length as a criterion instead of
	nconsecutive for proceeding to dynamic programming

	* oligoindex.c: Reduced requirement for nconsecutive in scanning diagonals.
	Keeping cumulative track of highest and lowest diagonals. Added
	extra_bounds to diagonal bounds.

	* Makefile.am, diag.c, diag.h, diagdef.h, diagpool.c, diagpool.h, gmap.c:
	Adding Diag_T and Diagpool_T objects for scanning diagonals

	* oligoindex.c, oligoindex.h, stage2.c, stage2.h: Scanning diagonals to set
	bounds on active oligomers

2007-02-04 twu

	* Makefile.am, gmap.c, intpool.c, intpool.h, oligoindex.c, oligoindex.h,
	stage2.c, stage2.h: Added Intpool_T object to manage storage for Intlist_T
	objects

	* gmap.c: Using new interface to Stage3_compute

	* stage2.c: In dynamic programming, added a lookback to the grand best
	querypos and hit

	* intlist.c, intlistdef.h: Provided exposure to internal structure of
	Intlist_T

2007-02-03 twu

	* oligoindex.c, oligoindex.h, stage2.c: Implemented faster algorithm for
	identifying active stage 2 oligomers.

	* stage2.h: Removed unused procedures

	* oligoindex.c, oligoindex.h, stage2.c: Implemented procedures to skip over
	unused mappings, based on active

2007-02-02 twu

	* gmap.c, oligoindex.c, oligoindex.h, stage2.c, stage2.h: Implemented new
	stage 2 procedure. Now using oligoindex at minindexsize and filtering
	those hits according to a local search. Initial search is based on
	maxindexsize. Uncovered ends of the alignment receive a looser local
	search criterion. Increased stage 2 lookback from 60 to 100.

	* translation.c: Fixed bug where first cDNA amino acid appeared under a cDNA
	space.

	* stage1.c: Adding one more trial for long sequences

	* pair.h, stage3.h: Computing coverage and now trimmed coverage at print
	time.

	* pair.c: Computing coverage and now trimmed coverage at print time. Added
	output line for trimmed coverage.
	Added trim information in compressed
	(-Z) output. Adjusting output of trim boundaries based on alignment.

	* stage3.c: Computing coverage and now trimmed coverage at print time.
	Increased parameter for minintronlength.

	* gmap.c: Removed -X (cross-species) flag

2007-01-31 twu

	* oligoindex.c: Added code for 18-amino-acid alphabet

	* gmap.c: Made user_stage1p false

	* stage2.c: Trying to make penalties consistent across different cases

	* stage3.c: Initialized variables in Stage3_T object

	* pair.c: Fixed memory leak in gff exon mode (-f 2)

2007-01-07 twu

	* stage3.c: If both single and dual gap solutions are canonical, picking
	solution with best score.

2007-01-06 twu

	* dynprog.c: Made reward for final canonical intron uniform across defect
	rates. Boosted reward for final semicanonical intron to match that for
	canonical intron.

2007-01-05 twu

	* pair.c: Fixed dinucleotide output in compressed (-Z) format when
	user-provided genomic segment has lower-case characters.

2007-01-04 twu

	* stage2.c: Reduced value of EQUAL_DISTANCE, to favor better local alignment
	over longer global alignment

	* stage3.c: Counting exons only after gaps filled in

	* pair.c, pair.h: Added procedure for counting exons after gaps filled in

2007-01-03 twu

	* dynprog.c: Made one-sided gap behavior true only for single gaps and end
	gaps

	* stage3.c: Removed fix_pmap_holes function and all references to it

2006-12-18 twu

	* index.html, VERSION: Updated version

	* translation.c: Prevented assignment of incomplete last codon on cDNA side
	in strict mode

	* stage1.c: Removed unused variables

	* matchpair.c: Increased EXTRA_SHORTEND

	* gmap.c: Reduced default trimexonpct. Changed bandwidths for single and
	gap gaps.

	* dynprog.c: Added onesidegap behavior, which allows gaps on either genomic
	or cDNA side, but not both. Added concept of fixeddestp, which is not
	true for the ends.

2006-12-15 twu

	* VERSION: Updated version

	* index.html: Made changes to reflect new version and strict translation as
	default

	* gmap.c: Reduced extraband_end

	* dynprog.c, gmap.c: Reduced extraband_single to prevent gaps from being
	inserted on both sides

	* gmap.c, pair.c, pair.h, sequence.c, sequence.h, stage3.c: For PMAP, adding
	an implicit stop codon at end of sequence if not already present, and
	distinguishing between computational fulllength and given fulllength.

	* smooth.c: Changed probability threshold for identifying short exons

2006-12-14 twu

	* gmap.c: Made strict translation the default, and tolerant translation
	turned on by -Y flag

	* stage3.c: Providing pound signs in dual breaks in diagnostic output.
	Replacing backtranslation characters with 'N' in PMAP output. Removed
	microexon search from PMAP. Using single gap procedure instead of
	fix_pmap_holes procedure for PMAP.

	* pair.c: Counting ambiguous characters as matches in all instances of
	computing percent identity

	* indexdb.h: Added variables for 5-aa mers

	* dynprog.c: Limiting bandwidth in single gap alignment to be dependent on
	differences in segment lengths

	* backtranslation.c: Not performing backtranslation if any genomic codon
	position is blank.

	* md5.t.c: Removed unused file

2006-12-13 twu

	* dynprog.c: Reduced width of band in single gaps when lengths are equal

	* translation.c: Fixed strict translation mode, so it begins at the same
	location as genomic translation.

	* stage3.c: Removed step of merging adjacent dynamic programming. Using two
	different smoothing steps. Protected small introns from being solved as
	single gaps in final intron pass.

	* smooth.c, smooth.h: Created two separate smoothing procedures, one based
	on net gap, and one based on size.

	* dynprog.c: For cDNA gaps, inserting indel pairs only if both gaps are small

2006-12-12 twu

	* map.test.ok: Added blank line at end

	* VERSION: Updated version

	* MAINTAINER: Added reminder to check cvs log to make sure files are all up
	to date

	* stage3.c: Removing gaps at 5' and 3' ends after end extensions. Checking
	for division by zero in trim_bad_exons.

	* Makefile.am: Simplified list of source files

2006-12-08 twu

	* archive.html: Updated to reflect 2006-12-08 version

	* archive.html, Makefile.am, align.test.in, align.test.ok, coords1.test.in,
	coords1.test.ok, iit_dump.test.in, iit_get.test.in, iit_store.test.in,
	map.test.in, map.test.ok, setup1.test.in, setup2.test.in: Merged into trunk

	* defs: Initial import into CVS

	* VERSION, config.site.gne, share, index.html, pmap_setup.pl.in: Merging
	into trunk

	* MAINTAINER: Merging into main trunk

	* iit.test.in: Combined iit_store, iit_get and iit_dump tests into one script

	* stage1.c: Increased definition of short sequence (for allowing cluster
	mode) for PMAP

	* match.c: Fixed printing of sequences in debugging statements

2006-12-05 twu

	* stage3.c: Fixed miscount problem with filling in short introns. Increased
	MININTRONLEN_FINAL significantly.

2006-12-01 twu

	* stage1.c: Printing chromosome name in debugging statements

	* match.c, match.h: Added procedure Match_chr

2006-11-30 twu

	* stage3.c: Allowing and correcting for gaps after gaps

	* smooth.c: Using difference between genomejump and queryjump to define
	introns for the purposes of smoothing.

	* configure.ac: Added check for sigaction

	* README: Updated README file

	* md_coords.pl.in: Fixed behavior when user wants only the reference strain

	* gmap_setup.pl.in: Changed name from raw to fullascii. Changed default for
	PMAP from 7 to 6.

	* gmap_process.pl.in: Added check to see that all contigs are processed

	* stage3.c: Assigning gap pairs after final extensions of 5' and 3' ends

	* smooth.c: Removed include of unused header file

	* genome-write.c, gmapindex.c, indexdb.c, indexdb.h, pmapindex.c: Added
	genome name to monitoring statements

	* gmap.c: Stopped warning message for -B when flag was not provided

2006-11-28 twu

	* stage1.c: Revised heuristics for determining maxtotallen and lengths for
	extensions

	* gmap.c: Ignoring batch flag if only a single sequence is given

2006-11-27 twu

	* stage1.c: Removed unused code

	* matchpair.c: Adding extra extension length when continuousp is false

	* gmap.c: Revised default lengths for single intron length and total genomic
	length

	* dynprog.c: Added checks for genomic segment at ends being shorter than
	query segment

	* indexdb.h: Using 6-mers with full alphabet in PMAP

	* indexdb.c: Improved monitoring statements

	* matchpair.c, matchpair.h: Revised procedures for computing support and
	extensions. Integrated procedures for filtering of unique and duplicate
	matchpairs.

	* oligoindex.c: Returned to 20 amino acids in stage 2

	* params.c, params.h, stage1.h: Removed unused variable

	* stage1.c: Integrated matchpairs into a single list. Revised procedures
	for extending genomic region based on 12-mers.

	* stage3.c: Allowed arbitrarily long incursion into previous dynprog during
	peelback.

	* stage2.c: Separated fwd_consecutive and rev_consecutive. Made values
	consistent regardless of indexsize.

2006-11-21 twu

	* stage1.c: Fixed extensions for PMAP

	* indexdb.c: Reversed previous changes to try to make idxpositions file
	point to end of oligomer for reverse strand matches.

	* indexdb.c: Made idxpositions file point to end of oligomer for reverse
	strand matches. Improved debugging output.

	* stage1.c: Added a binary search routine

2006-11-20 twu

	* indexdb.h, stage1.c: Made changes for PMAP to work with 6-mer pmapdb

	* oligoindex.c: Fixed debugging statements for PMAP

	* pair.c: Revised psl protein output for matches to the negative genome
	strand

	* backtranslation.c, backtranslation.h: Made an extern procedure for
	computing consistent codon for a given amino acid.

	* translation.c, translation.h: Made get_codon an extern procedure

	* stage3.c: Added procedure for fixing alignment holes in PMAP. Applying
	higher standard for accepting dual intron solutions.

	* stage2.c: Fixing bugs in identifying stage 2 candidates to abort

	* gmap.c: Setting trim variables appropriately in maponly mode

	* dynprog.c: In PMAP, rounding up or down to finish codon

2006-11-16 twu

	* stage3.c: Using intron types to evaluate bad exons at ends. Adding
	another round of extensions at ends after trimming of bad exons. Restored
	correction for genomepos at left end skip when filling in introns.

	* dynprog.c: Assigning intron type for microexons added at ends of alignment

	* gbuffer.c, gbuffer.h: Removed unused variables

	* stage2.c: Removed unused variables. Using correct value for
	maxconsecutive instead of last one.

2006-11-15 twu

	* stage3.c: Using uppercase string, with U-to-T conversion, to identify
	mismatches in peelback procedures.

	* backtranslation.c, translation.c: Using uppercase string, with U-to-T
	conversion, instead of toupper().

	* sequence.c: Using new complement and uppercase strings

	* pair.c: Using new name for (lowercase) complement string. Including 'U'
	and 'u' as known bases for computing percent identity.

	* indexdb.c: Using uppercase string, which also performs U-to-T conversion,
	instead of toupper().

	* genome-write.c, genome.c: Using new name for (lowercase) complement string.

	* compress.c: Using uppercase string instead of toupper.
	Compress_get_char() no longer converts characters to uppercase.

	* dynprog.c: Made U and T a matching pair. Commented out old code dealing
	with lowercase characters.

	* complement.h: Added strings for uppercase of complement, and for U-to-T
	conversion during uppercase

	* sequence.c: Enabled removal of spaces in read procedure

2006-11-14 twu

	* dynprog.c: Reduced extension penalties for single gaps

	* stage3.c: Fixed bug in filling in gaps where leftpair has a genome gap.
	Increased size of MININTRONLEN to avoid finding introns in single gap
	regions.

2006-11-12 twu

	* stage3.h: Added parameter for number of flanking sequences to
	Stage3_print_map

	* stage3.c: In 5' and 3' extensions, evaluating continuations before and
	after a gap if one is found during peeling, and performing microexon
	search medial to the gap

	* dynprog.c: Returning null in genome gap if queryjump <= 1

	* get-genome.c: Added -u flag for printing flanking intervals

	* iit-read.c, iit-read.h: Added option to print iit entries in reverse order

	* indexdb.h: Restored previous parameters

	* pair.c: Added pointer to pair in debugging output

2006-11-11 twu

	* stage3.c: Fixed computation of bad end exons. Included short end exons in
	definition of bad end exons. Finding bad end exons after 5' and 3'
	extensions. Fixed declaration of sense/antisense when no canonical or
	semicanonical introns are present. Removing end introns during peelback
	before 5' and 3' extensions. Removed unused code for trimming alignment
	at ends.

2006-11-06 twu

	* gmap.c: Added flags for printing flanking IIT hits and for trimming end
	exons

	* stage3.c: Fixed bug in trimming empty alignment

	* smooth.c: Fixed bug in handling lower-case query sequences

2006-11-01 twu

	* translation.c: Fixed bug in strict translation

	* iit-read.h: Added procedures for finding flanking hits.

	* iit-read.c: Made IIT_get more efficient. Added procedures for finding
	flanking hits.

	* iit_get.c: Added -u flag for printing flanking hits

	* stage3.c, stage3.h: Allowing trimming of bad exons at ends. Increased
	peelback at ends. Added iterative cycles of intron finding within
	smoothing and dual intron cycles.

	* smooth.c: Relaxing requirements for short exons at ends, because of later
	trimming of poor exons at ends

	* pair.c: Adding printing of intron type for debugging

	* gmap.c: Stopping deletion of global_except_key, because worker threads may
	still need it. Increasing standards for defining a sequence to be
	repetitive. Eliminating -U flag for trimming alignments, and adding -k
	flag for specifying trimming of exons at ends.

	* except.c: Stopping deletion of global_except_key, because worker threads
	may still need it

	* blackboard.c: Letting each thread destroy its own reqpost

2006-10-31 twu

	* gmap.c: Added -Y flag for performing strict translation of cDNA sequence.
	Removed worker_assignments variable, and using global blackboard variable
	instead to handle exceptions.

	* stage3.c, stage3.h, translation.c, translation.h: Added strictp flag for
	protein translation

	* oligoindex.c: Dropped oligospace requirements for PMAP by reducing amino
	acid alphabet in stage 2 from 20 to 16.

	* gmapindex.c, indexdb.c, pmapindex.c: Fixed memory allocation for filename

	* except.c: Fixed location of compiler directive

	* blackboard.c: Put mutex locks outside of updates to input counter and
	output counter. This is to be cautious, since only input thread and
	output thread, respectively, should be affecting these counters.

2006-10-24 twu

	* stage3.c: Fixed undefine_nucleotides to handle gapholders

	* oligoindex.c: Using calloc instead of malloc for initializing oligoindex
	space

	* gmap.c: Reduced indexsizes in PMAP, so they won't overflow in some machines

	* backtranslation.c: Fixed usage of translation_start and translation_end

2006-10-20 twu

	* gmap.c: Printing messages to stderr when no paths are found, in all cases
	where sequence headers are not printed.

	* translation.c: Fixed coordinates for translation start and end

	* stage3.c: Fixed bug with NULL path passed to undefine_nucleotides

	* pair.c: Changed gff3 procedures to treat translation start and end values
	as query positions, not alignment indices.

2006-10-16 twu

	* stage3.c: Making sure that gaps are inserted after smoothing procedure
	deletes exons

	* stage2.c: Clarified differences between amino acid indexsize and
	nucleotide indexsize. Cleaned up code for filling in oligomers.

	* smooth.c: Reduced definition of a gap between exons

	* oligop.c: Included possibility of 12-amino acid alphabet for 8-mers.

	* indexdb.h: Included possibility of 12-amino acid alphabet for 8-mers.
	Provided compile-time values for file suffixes.

	* indexdb.c: Included possibility of 12-amino acid alphabet for 8-mers

	* pmapindex.c: Performing complete build with a single command

	* gmapindex.c: Using compile-time value for suffix

	* gmap.c: Printing value of INDEX1PART in help output for PMAP

2006-10-13 twu

	* oligoindex.c: Added debugging statements

	* translation.c: Fixed bug with translating cDNA beyond the genomic stop

	* stage3.c: Reorganized passes through the alignment. Made peelback
	routines more robust.

	* smooth.h: Using stage 2 indexsize in smoothing procedures

	* smooth.c: Major rewrite of smoothing procedures

	* dynprog.c: Added another mechanism to prevent microexon from having a gap
	at either end

2006-10-12 twu

	* gmap.c: Allowing "-t 0" to mean non-threaded behavior. Using new
	thread-safe exception handler.

	* dynprog.c, dynprog.h: Fixed traceback for cDNA gaps

	* except.c, except.h: Re-implemented thread-safe exception handler to remove
	memory leaks. Now using exception frames in stack rather than in heap.

	* stage3.c: Fixed peelback to codon boundaries for PMAP. Relaxed forcep
	requirement for single gaps. Recognizing cases where prior genome or cDNA
	gap solution was obtained.

	* stage3.h: Removed ngap from parameter lists when possible

	* stage1.c: Initialized a diagnostic variable

	* pair.c, pair.h: Removed unused code and variables

2006-10-11 twu

	* except.c, except.h: Implemented thread-safe version of exception handler

	* gmap.c: Added -j flag to control printing of dual breaks

	* except.c, except.h: Reformatted exception handling code. Using pointers
	to frames.

2006-10-10 twu

	* stage3.c, stage3.h: Rewrote peelback routines. More accurate handling of
	coordinates and checking of coordinates and gaps.

	* dynprog.c: Advancing query and genome coordinates in cases of skips

	* smooth.c: Revised trimming at ends to use individual exons, rather than
	the sum of exon and intron lengths

	* pairpool.c: Showing pointer to pair in debugging statements

	* pair.c: Showing queryjump and genomejump in debugging statements

	* gmap.c: Added -0 flag to inactivate exception handler, and -7 and -8 flags
	to show results of stage 2 and smoothing, respectively.

	* except.c, except.h: Added mechanism to inactivate exception handler

2006-10-09 twu

	* access.c: Fixed compiler warning about reference to void *.

	* block.c, chimera.c, oligo.c, sequence.c, stage1.c: Removed unused variables

	* compress.c, dynprog.c, genome-write.c, intron.c: Added necessary header
	file

	* genome.c: Fixed compiler warning about mismatched variable types.

	* gmap.c: Added flag for pruning level. Inactivated conversion of signals
	to exceptions with diagnostic flag. Removed references to badoligos.

	* indexdb.c: Added necessary header file. Fixed compiler warning about
	mismatched variable types.

	* matchpool.h: Added declarations of external functions

	* oligoindex.c, oligoindex.h: Computing estimate of maxconsecutive when
	mappings are obtained

	* pair.c, pair.h: Added diagnostic information about stage 2 maxconsecutive.

	* result.c, result.h: Added diagnostic information about initial query check

	* smooth.c: Handling possible gaps at ends of alignment

	* stage2.c: Using maxconsecutive estimate from Oligoindex_get_mappings to
	determine whether to proceed with stage 2.

	* stage3.c: Added diagnostic information about stage 2 maxconsecutive.
	Fixed procedure for removing adjacent dynamic programming to remove all
	gaps, and then to reinsert them later.

2006-10-06 twu

	* oligoindex.c, oligoindex.h: Added counting of replicate oligos

	* md_coords.pl.in: Added information about number of contigs in each strain

	* configure.ac: Removed obsolete tests. Fixed problem in setting share
	directory. Added maintainer option.

	* gmap.c: Distinguishing between poor and repetitive sequences. Providing
	-p flag to control pruning behavior.

	* result.h: Distinguishing between poor and repetitive sequences

	* sequence.c: Set skiplength correctly on empty sequences

	* gmap.c: Added -W flag to force GMAP to compute repetitive or poor sequences

	* oligoindex.c: Limited definition of badoligo to consider only non-ACGT
	characters, and not to consider number of hits.

	* stage3.c: Fixed bug arising from gaps left at ends of alignment

	* dynprog.c: Disallowing bridges of introns and cDNA insertions that lead to
	coordinate errors

	* gmap.c: Changed thread-based exception handling to kill all other threads
	and to report all worker assignments

2006-10-05 twu

	* stage3.h: Made checking of coordinates occur in diagnostic mode.

	* stage3.c: Made checking of coordinates occur in diagnostic mode. Fixed
	case where cDNA gap turned into a single gap after peelback.

	* stage1.c: Fixed memory leak

	* smooth.c: Fixed bug resulting from apparent negative exon and negative
	intron lengths.

	* oligoindex.c: Restored pruning of sequences with bad oligos.

	* gmap.c: Added handlers to convert signals into exceptions, to indicate the
	problematic sequence. Restored pruning of sequences with bad oligos.

2006-10-04 twu

	* result.c, stage1.c, stage1.h: Added reporting of more diagnostic
	information

	* stage2.c: Fixed problems with uninitialized variable

	* matchpair.c: Fixed problem with uninitialized variable

	* gmap.c, pair.c, pair.h, result.c, result.h: Printing diagnostic
	information upon request

	* access.c: Using a Stopwatch_T object

	* stage1.c, stage1.h, stage2.c, stage2.h, stage3.c, stage3.h: Storing
	diagnostic information

	* smooth.c: Fixed memory leak

	* oligoindex.c: Stopped initializing data buffer for Oligoindex_T object

	* stopwatch.c, stopwatch.h: Created a Stopwatch_T object

2006-10-03 twu

	* stage3.c: Removed unnecessary list reversal

	* pair.c: Allowing jump in querypos in pair check procedure

2006-10-02 twu

	* gmap.c: Provide stage 2 information in diagnostic output. Use stage 2
	information to prune bad alignments before stage 3.

	* stage3.c: Provide stage 2 information in diagnostic output. Allow a
	single open in scoring a single intron compared with dual introns.

	* stage2.h: Interface provides number of canonical and non-canonical introns

	* stage2.c: Returned to using gendistance for computing penalties, except
	for diffdistance in deadp. Fixed bug in tallying unknown types of introns.

	* sequence.c: Fixed problems with reading control-M characters (PC line
	feed) in input.

	* pair.h: Reporting stage 2 information in diagnostic output.

	* pair.c: Reporting stage 2 information in diagnostic output. Counting
	indels in computing percent identity for each exon.

	* dynprog.c: Eliminated extra reward for finding semicanonical introns in
	final pass

	* stage2.c: Need to take abs() when measuring diffdistance.
	Scoring behavior checked against revision 1.157. Making stage2 information
	available for diagnostic output.

2006-09-30 twu

	* gmap.c: Added -8 flag to show results of stage 2 calculation

	* boyer-moore.c: Revised procedure to handle ambiguous characters for PMAP.

	* dynprog.h: Added dynprogindex information.

	* dynprog.c: Added table for performing Boyer-Moore searches of microexons
	for PMAP. Reduced penalties for extending gaps. Added separate rewards
	for final pass of finding canonical introns. Added dynprogindex
	information.

	* smooth.c: Systematically checking ends for smoothing. Using matches
	instead of lengths to evaluate exons. Added probabilistic checking for
	marking middle exons.

	* stage3.h: Passing stage2p as a parameter to Stage3_compute.

	* stage3.c: Major changes to algorithm. Added iteration through smoothing,
	dual intron, and single intron passes. Checking peel back to determine if
	canonical intron needs to be recomputed. Added final pass to find introns
	with higher reward. Added dynprogindex information. Using dynprogindex
	information in peeling leftward and rightward.

	* stage1.c: New criterion for setting usep to false, namely, if support is
	less than a certain fraction of the maximum observed support.

	* pair.h: Pair_check_array now returns a bool.

	* pair.c: Handling more cases of short gaps as indels. Printing
	dynprogindex in diagnostic and debugging output.

	* stage2.c: Reverted to algorithm from revision 1.157. Using diffdistance
	instead of gendistance. Making sufflookback depend on mapfraction.

2006-09-28 twu

	* stage2.c: Changes made to scoring algorithm, but not well-motivated.
	Fixed bugs in predicting cDNA direction.

2006-09-18 twu

	* stage3.c: Fixed bug when recomputing over adjacent dynamic programming
	regions at end of sequence

	* stage2.c: Revised rules for giving credit for query distance, giving none
	if difference in distance is greater than min intron length.

	* stage1.c: Doubling genomic region with each iteration, until sufficient
	support found for a matchpair.

	* matchpair.c, matchpair.h: Computing and storing fraction of stage 1 support

	* gbuffer.c, gbuffer.h, gmap.c: Allowed Genome_T object to exceed default
	length of genomic segment

	* dynprog.c: Reduced penalties for gap extension, to match reductions in
	mismatch penalties

	* Makefile.am: Provided target machine during compilation

2006-09-11 twu

	* gmap.c: Included build target in version output. Increased oligomer size
	in PMAP from 3-4 to 4-5.

2006-09-08 twu

	* stage2.c: Added oligos to output of debugging statements

	* configure.ac: Using AC_FUNC_FSEEKO to check for fseeko. Added comment
	line for $Id$.

2006-09-07 twu

	* stage2.c: Added debugging statements for finding shifted canonical introns

	* stage1.c: Increased trimlength and extension past ends for PMAP

	* gmap.c: Increased maxextension to 120000

2006-09-01 twu

	* translation.c: Making sure to assign values to variables when number of
	alignment pairs is fewer than the minimum

	* pair.c: Fixed bug in printing CDS of GFF3 format

	* stage3.c: For PMAP, trimming ends of alignment to codon boundaries

	* translation.c: Removed check for minimum number of pairs for PMAP

	* dynprog.c: Changed calls to Pairpool_push. Added dynprogindex
	information. Reduced penalty for mismatches.

	* dynprog.h: Changed calls to Pairpool_push.
	Added dynprogindex information.

	* stage3.c: Fixed bug where peeling back yielded wrong coordinates. Changed
	calls to Pairpool_push. Added dynprogindex information. Recomputing
	regions with adjacent dynamic programming solutions.

	* matchpair.c, stage2.c: Changed calls to Pairpool_push

	* smooth.c: Added debugging statements for exon and intron lengths

	* sequence.c: Fixed bug where return type should be int, not bool.

	* pairpool.c, pairpool.h: Distinguished between gapholder and gapalign
	elements. Added dynprogindex to Pairpool_push.

	* pair.c: Added debugging option for printing dynprogindex

	* pairdef.h: Added dynprogindex to struct. Reordered fields.

	* bool.h: Defining bool to be an unsigned char instead of an enumerated type

2006-08-03 twu

	* fa_coords.pl.in: Added pattern Chr_ seen in some TIGR genomes. Changed
	variable name from chronlyp to concatenatedp.

	* oligoindex.c: Added check for query lengths shorter than index size

	* get-genome.c, iit_get.c: Allowed program to take coordinate requests from
	stdin

	* iit-read.c, iit-read.h, iit_dump.c: Added option to dump counts of each
	segment

	* gmap.c: Printing calling arguments in gff mode

2006-06-12 twu

	* pair.c, gmap_compress.pl.in, gmap_uncompress.pl.in: Using bp to denote
	query length, instead of nt

	* stage3.c: Turned off gap checking

	* gmap_compress.pl.in: Putting cDNA length into the Coverage field

	* gmap_uncompress.pl.in: Getting cDNA length from the Coverage field

2006-05-31 twu

	* params.c, params.h: Adding maxoligohits as a parameter

	* oligoindex.c, oligoindex.h: Using maxoligohits parameter, and reducing it
	for cross-species alignment (to avoid random and misleading matches)

	* stage2.h: Using maxoligohits parameter

	* stage2.c: For cross-species alignment, increasing enough_consecutive
	parameter and not opportunistically increasing sampling interval

	* stage1.h: Reduced SINGLEINTRONLENGTH to 100000

	* stage1.c: Using maxextension parameter instead of SINGLEINTRONLENGTH
	directly

	* gmap.c: Limited crossspecies parameters to maxextension and maxoligohits.

2006-05-25 twu

	* stage2.c: Introduced detection of semicanonical introns and penalty for
	these. Removed distpenalty_dead, and introduced distpenalty_noncanonical;
	motivated by ENST0356720. Decreased distpenalty; motivated by
	ENST0356222. Introduced procedure for querydist_credit, bounded below by
	zero. Decreased querydist points when gendistance equals querydistance;
	motivated by ENST0354988.

	* gmap.c: Using single intron length as basis for maxextension

	* stage3.c: Made initial pass of build_pairs_singles work only when
	genomejump equals queryjump; motivated by ENST0341339. Made acceptable
	mismatches for dual introns depend again on defect rate.

	* smooth.c: Removed deletion of longest middle exon in a series of short
	exons. Motivated by ENST0348697.

	* stage1.c, stage1.h: Using single intron length to extend genomic segment
	at ends. Motivated by ENST0358972.

	* oligoindex.c: Increased thetadiff for trimming repetitive oligos from 2 to
	20. Motivated by ENST0357282.

	* gbuffer.c, gbuffer.h: Added data structures for storing positions of
	semicanonical dinucleotides

	* dynprog.c, dynprog.h: Made microexon p-value threshold depend on the
	defect rate. Increased genomejump needed for single gap penalties to
	apply. Motivated by ENST0262608.

2006-05-23 twu

	* stage2.c: Moved preprocessor directive outside of macro (needed for gcc3
	compiler).

2006-05-22 twu

	* gmap.c, stage3.c, stage3.h: Changed variable name from extend_mismatch_p
	to trimalignmentp

	* changepoint.c: Changed criterion from a difference in theta to a ratio

2006-05-19 twu

	* stage3.c: Removed trimming of alignments in PMAP

	* stage1.c: Changed some parameters to increase sensitivity

	* chrsubset.c, chrsubset.h: Added function Chrsubset_make

	* translation.c: Fixed assignment of amino acids to genomic sequence in PMAP

	* stage3.h: Minor formatting change

	* stage3.c: Printing trimmed query coordinates in path summary. Pruning
	stage 3 result if coverage is less than MINCOVERAGE.

	* sequence.h: Added appropriate MAXSEQLEN for PMAP

	* reader.c, reader.h: Allowing reading in each direction to proceed to the
	ends of the query sequence

	* pairpool.c: Setting initial value for aa_g and aa_e

	* pair.c, pair.h: Printing trimmed query coordinates in path summary

	* oligoindex.c: Reinstated trimming of query sequence based on changepoint
	analysis

	* mem.c: Fixed compiler warning about pointer arithmetic on void *.

	* matchpair.c: Added comments

	* gmap.c: Performing trimming of query sequence in more cases. Changed name
	of "mutation reference" to "reference sequence".

	* dynprog.c: Removed step function penalty based on codons. Reduced
	extension penalty to obtain better behavior.

	* stage2.c: Changed position for starting to compute mismatch gaps. Added
	trimming at ends for PMAP.

	* stage1.c: Improved calculation of genome segment length, based on expected
	exon and intron sizes. In sampling mode, continuing sampling at current
	position of block pointers.

	* block.c, block.h: Added procedures for saving and restoring blocks

2006-05-15 twu

	* Makefile.am, gbuffer.c, gbuffer.h, gmap.c, matchpair.c, matchpair.h,
	pair.c, pair.h, stage2.c, stage2.h, stage3.c, stage3.h: Created Gbuffer_T
	object to use as workspace for various calculations

	* translation.c: Fixed uninitialized variable

	* dynprog.c, pair.c: Made cDNA gaps into type SHORTGAP_COMP instead of
	INDEL_COMP, so they get treated properly by the changepoint analysis

	* stage3.c: Fixed memory leak

	* changepoint.c: Changed changepoint parameters slightly

2006-05-14 twu

	* stage3.c: Added checks to make sure both qgenome lengths are adequately
	long in dual intron gaps

	* dynprog.c: Increased penalties for mismatches

	* gmap.c: Changed 'U' flag to mean no trimming of poor alignments at ends

2006-05-13 twu

	* gmap.c: Changed interfaces to some Stage3_T functions

	* pair.c, pair.h: Added query length to Coverage line

	* stage1.c: Fixed bug where maxtrial wasn't set

	* stage2.c: Removed final assignment of dinucleotide positions

	* stage3.h: Changed some interfaces to Pair_T functions

	* stage3.c: Added some shortcuts for changepoint analysis. Changed some
	interfaces to Pair_T functions.

2006-05-12 twu

	* gmap.c: Provided initial values to some variables

	* oligoindex.c: Reduced MAXHITS parameter from 200 to 20

	* stage1.c: Limiting trials for same-species alignment. Limiting salvage
	algorithm to short sequences and cross-species alignment.

	* stage2.c: Implemented faster method for finding shifted canonical introns

	* stage2.c: Saving mappings for each indexsize, and going back to best one.
	Introduced idea of sufficient and minimum map fraction, and aborting if
	minimum map fraction not satisfied.

	* gmap.c, stage3.c, stage3.h: Added option to print output in IIT FASTA map
	format

	* pair.c, pair.h: Removed parameter from Pair_print_iit_map

	* pair.c, pair.h: Removed old code. Added a procedure for printing an IIT
	map.

	* sequence.c: Removed printing of '>' from Sequence_print_header

	* iit-read.c: Fixed bug in printing results from map iit

	* stage2.c: Added debugging statements

2006-05-11 twu

	* gmap.c, pair.c, pair.h, stage3.c, stage3.h: Included filename of
	user-provided genomic seg as source in gff3 output

	* iit_dump.c: Included header for getopt_long

	* README: Added more information about IIT utilities

	* iit-read.c, iit-read.h: Added annotation-only option to IIT_dump

	* get-genome.c: Changed program description statement slightly

	* Makefile.am, iit_dump.c, iit_get.c, iit_store.c: Added long options and
	documentation for the IIT utilities

	* iit_store.c: Added support for quotation marks in gff3 features

	* chrnum.c, chrnum.h, chrsubset.c: Using new Chrnum_to_string interface

	* gmap.c, stage3.c, stage3.h: Added cDNA_match option for gff3 output

	* pair.c, pair.h: Added cDNA_match option for gff3 output, including Gap
	attribute. Using new Chrnum_to_string interface.

	* gmap.c, params.c, params.h, stage3.c, stage3.h: Added procedures for
	allowing chromosome-tagged IIT map files, in addition to strand-tagged IIT
	map files.

	* iit-read.c, iit-read.h: Added functions for retrieving multiple types and
	for getting label when access mode is fileio.

2006-05-10 twu

	* iit_get.c: Allowing user to specify multiple types

	* iit_store.c: Modified gff3 parsing to assign only one tag for each row.
	Using feature column as a source for labels.

	* pair.c, pair.h: Added routines for output in GFF3 format

	* Makefile.am: Added trimming of alignment based on changepoint analysis

	* stage2.c: Fixed bug in scanning for reverse canonical intron

	* backtranslation.c, pairdef.h, pairpool.c, translation.c: Introduced phase

	* gmap.c: Added flags for gff3 output

	* stage3.c, stage3.h: Added procedure for trimming pairs. Added gap when
	bounds don't make sense for dual intron gaps. Introduced gff3 format and
	phase.

	* changepoint.c, changepoint.h: Initial import into CVS

	* stage2.c: Added debugging statements for the result of stage 2 prior to
	trimming

	* pair.c, pair.h: Added a function for computing matchscores from an
	alignment

	* dynprog.c: Changed various gap penalties, especially at ends of sequence

	* iit_get.c: Changed atol() to strtoul(), because atol() was truncating
	numbers above 2^31 in machines with long ints of 4 bytes.

2006-05-08 twu

	* stage2.c: Changed condition for re-computing dead links from an "or" to an
	"and" on both directions. Added trimming of ends, based on consecutive
	matches.

2006-05-07 twu

	* stage2.c: Cleaned up procedure for finding introns in PMAP, which can be
	shifted. Cleaned up counts of canonical and total introns.

	* stage3.c: Fixed problems with shortcut for existing introns and with
	coordinates for dual genome pairs

	* oligoindex.c: Added debugging statement

2006-05-06 twu

	* stage2.c: Reworking of stage 2 scoring to make it more robust for
	low-identity sequences.
	Includes identification of possible canonical sites by shifting boundaries.

	* stage3.c, stage3.h: Using dynamic programming paths computed for dual
	intron gaps

	* gmap.c, stage2.c, stage2.h: Computing indexsize adaptively

	* smooth.c, smooth.h: Removed indexsize from smoothing procedures

	* params.c, params.h: Added minindexsize and maxindexsize to params

	* oligoindex.c, oligoindex.h: Changed PMAP indexsize to be in aa.
	Calculating mapfraction.

	* pair.c: Keeping ambiguous character and comp in PMAP alignments

	* intron.c, intron.h: Added function to return string for printing intron
	type

	* gmap.c, oligoindex.c, oligoindex.h: Put indexsize parameters inside of
	Oligoindex_T object

2006-05-05 twu

	* md_coords.pl.in: Improved handling of alternate strains

2006-05-04 twu

	* iit_store.c: Implemented parsing of gff format

	* gmap.c: Incremented stuttercycles for PMAP

	* indexdb.c: In monitoring commands, printing positions with commas

	* matchpair.c, matchpair.h, stage1.c: Using maxintronlen instead of querylen
	as criterion for removing hits before clustering

	* stage2.c: Checking two possible query positions for intron in PMAP

	* dynprog.c, dynprog.h: Removed obsolete parameters for computing genome gaps

	* stage3.c: Fixed boundaries for check of dual introns

	* fa_coords.pl.in: Improved monitoring messages to indicate when coordinates
	are parsed and when they are concatenated.

	* md_coords.pl.in: Fixed bug in handling alternate strains

2006-05-03 twu

	* md5-compute.c, oligo-count.c: Using new version of Sequence_read

2006-05-02 twu

	* stage2.c: Removed statement that does not apply to PMAP

	* matchpair.c: Fixed computation of support for PMAP.
	Added debugging statements

	* match.c: Fixed genomic segment retrieved for debugging

	* iit-read.c: Minor editing changes

	* datadir.c: Improved error message when genome directory isn't found

	* compress.c: Removing spaces from reading of uncompressed sequence

	* stage1.c: Increased matchpairs allowed. Fixed position adjustment for
	reverse strand matches on PMAP.

	* indexdb.c: Shifted positions for .prxpositions down by one.

2006-04-21 twu

	* gmap_setup.pl.in: Fixed bugs in printing of instructions

	* fa_coords.pl.in: Augmented patterns allowed for specifying chromosomal
	location of contigs

	* Makefile.am, chrsubset.c, chrsubset.h, get-genome.c: Added ability to
	print all chromosomal subsets from get-genome

	* datadir.c: Improved error message

	* README: Added information about the -q flag. Added additional forms for
	specifying chromosomal location of contigs.

2006-04-18 twu

	* gmap_setup.pl.in: Added the -q and -Q flags for specifying indexing
	intervals

2006-04-07 twu

	* gmap_setup.pl.in: Fixed bugs with new install statements

	* gmap_setup.pl.in: Added comment about editing .chrsubset file. Creating
	genome.maps directory.

	* stage3.c: Turned off CHECK

	* README: Added comment about editing .chrsubset file

2006-04-06 twu

	* gmap_compress.pl.in: Changed program to handle intron lengths in exon
	summary

	* stage2.c, stage2.h: Introduced limit on individual intron lengths

	* stage1.c, stage1.h: Changed variable name from maxintronlen to maxtotallen

	* gmap.c: Added separate flag for limiting individual intron lengths

	* pair.c: Added intronlengths to exon summary

	* sequence.h: Increased maximum sequence length to be 1000000.

2006-04-04 twu

	* stage3.c: Building singles if a short exon is deleted during smoothing

	* smooth.c: Improved debugging statements

2006-03-24 twu

	* gmap_setup.pl.in: Printing a copy of the install procedure to a file

2006-03-20 twu

	* match.c: Match_npairings returns an int

	* md_coords.pl.in: Passing back maxwidth as a result.

	* gmap.c: Made changes for compatibility with PMAP.

	* stage3.c, stage3.h: Giving maxpeelback information to dynamic programming
	routine, so it can use single gap penalties for long intron gaps. Made
	changes for compatibility with PMAP.

	* smooth.c, smooth.h: Changed smoothing routine to be based on net intron
	lengths. Sequences of small exons are removed if they yield a net intron
	length of approximately zero.

	* dynprog.c, dynprog.h: Disallowing intron or cDNA gaps to be placed at the
	edge of the segment, which caused an error to occur in the check_gaps
	routine. Using single gap penalties for long intron gaps.

2006-03-17 twu

	* sequence.c: Added handler for cases where requested subsequence start and
	end are beyond the bounds of the sequence

	* gmap.c, stage1.c, stage1.h: Added concept of maxtrial, to be used for
	chimera (subsequence) problems

	* stage3.c: Added an exception handler for errors in checking gaps

	* dynprog.c: Disallowed intron or cDNA gaps to be inserted at ends of the
	subsequence, which results in an unexpected gap.

2006-03-05 twu

	* gmap.c: Providing maponlyp information to Sequence_read, to turn
	skiplength warning message on or off.

	* sequence.h: Set MAXSEQLEN to be 200000

	* pair.c, stage3.c: Revision of procedures to handle sequences with
	skiplength

	* stage1.c: Expanded maxintronlen to include skiplength

	* sequence.c, sequence.h: Addition of skiplength. Rewriting of code for
	reading sequences to handle skipping of middle correctly.

2006-03-04 twu

	* gmap.c: Reworking of maponlyp case to generate a Stage3_T object

	* stage3.c, stage3.h: Implemented Stage3_direct function for maponlyp case.
	Cleaned up merge function for combining two Stage3_T objects.

	* stage1.c, stage1.h: Cleaned up various procedures in stage 1 computation.
	Simplified function identify_matches. Eliminating extensions for maponlyp
	case.

	* matchpair.c, matchpair.h: Added function for making a path from a
	matchpair object

	* matchpool.c: Simplified code for handling positions on reverse genomic
	strands.

	* match.c, match.h: Added function for printing the oligomer for a match.
	Simplified code for handling positions on reverse genomic strands.

	* oligoindex.c: Turned off code for changepoint analysis for trimming ends

	* pair.c, pair.h: Modified printing of path summary for maponlyp

	* result.c, result.h: Removed Stage1_T objects from Result_T

	* genome.c: Added debugging statements

	* block.c, oligo.c, oligo.h: Fixed problem where oligomers read from left
	side need to be shifted down to low 12-mer. This corrects problem with
	match coordinates being off by 4.
22906 229072006-03-02 twu 22908 22909 * gmap.c: Revised code for computing chimeras 22910 22911 * chimera.c, chimera.h: Made Chimera_T object created only when completely 22912 specified 22913 22914 * stage3.c: Added a step to allow for subseq_offset, if present 22915 22916 * sequence.c, sequence.h: Added subseq_offset to Sequence_T 22917 229182006-03-01 twu 22919 22920 * dynprog.c, dynprog.h: Restored one gap behavior on ends. Using 22921 cdna_direction information on single gaps. 22922 22923 * stage3.c: Forcing single gaps to be solved. Adding cdna_direction 22924 information for single gaps. Fixed problem with short indels being 22925 inserted backward. 22926 22927 * oligoindex.c, oligoindex.h: Implemented new scheme for detecting 22928 repetitive sequence on ends, based on changepoint analysis 22929 22930 * smooth.c: Fixed memory leak. 22931 22932 * translation.c: Added check so we won't go beyond ends. Assigned variables 22933 when npairs is too few. 22934 229352006-02-27 twu 22936 22937 * stage3.c, stage3.h: Minor bug fixes 22938 229392006-02-26 twu 22940 22941 * match.c, match.h, matchdef.h, matchpool.c, stage1.c: Keeping track of 22942 number of pairings for each match, and placing a limit on the number of 22943 matchpairs generated for each match with a "promiscuous" variable 22944 229452006-02-25 twu 22946 22947 * stage2.c: Made behavior similar for sequence and reverse complement, 22948 including bug fix and using diffdistance rather than querydistance 22949 229502006-02-24 twu 22951 22952 * pairpool.c, pairpool.h: Added procedure for counting result of bounding 22953 operation 22954 22955 * pair.c, pair.h: Counting amino acids directly for protein PSL output. 22956 Fixed problem in coordinates output where chrstring was NULL. 22957 22958 * dynprog.c: Increased penalty for gaps in single alignments and made them 22959 uniform across sequence quality 22960 22961 * smooth.c, smooth.h: Rewrite of code to use arrays instead of lists. 
22962 Reduced definition of short exon. Now deleting consecutive strings of 22963 short exons. 22964 22965 * translation.c: Noting large insertions and deletions of amino acids, even 22966 if not a multiple of 3 22967 229682006-02-23 twu 22969 22970 * chimera.c, chimera.h, gmap.c, stage3def.h: Moved various functions back to 22971 stage3.c 22972 22973 * stage3.c, stage3.h: Performing substitution of gaps only for final cDNA 22974 direction 22975 22976 * oligoindex.c, oligoindex.h: Turned off trimming of sequence for reference 22977 sequences and for protein sequences 22978 22979 * intron.c, intron.h: Using cdna_direction information in assigning 22980 Intron_type 22981 22982 * dynprog.c, pairpool.c, pairpool.h, stage2.c: Passing in gapp as a 22983 parameter to Pairpool_push 22984 22985 * translation.c: Fixed bug with marking backwards cDNAs relative to 22986 reference sequence 22987 229882006-02-22 twu 22989 22990 * translation.c: Fixed minor bugs in new implementation 22991 22992 * Makefile.am: Rewrite of code for determining mutations and for printing 22993 the results. Removed mutation.c and mutation.h. 22994 22995 * mutation.c, mutation.h, pair.c, pair.h, translation.c: Rewrite of code for 22996 determining mutations and for printing the results 22997 229982006-02-21 twu 22999 23000 * stage3.c: Moved some chimera functions from stage3.c to chimera.c. Set 23001 acceptable_mismatches for microexons to be 2. 
23002 23003 * Makefile.am, chimera.c, chimera.h, stage3.h, stage3def.h: Moved some 23004 chimera functions from stage3.c to chimera.c 23005 23006 * dynprog.c: Increased probability standard for finding microexons 23007 230082006-02-19 twu 23009 23010 * translation.c: Fixed bug where cDNA translation was incomplete 23011 23012 * stage3.c: Fixed bug in substitution for gaps when ngap is not 3 23013 23014 * stage3.c, stage3.h: Complete rewrite of stage 3 to use gap pairs 23015 23016 * translation.c: Increased parameter for ignoring amino acid mismatches at 23017 ends of query sequence 23018 23019 * smooth.c, smooth.h: Made changes to handle new gap pairs 23020 23021 * pair.c: No longer assigning coordinates for query sequence and genomic 23022 segment within gaps 23023 23024 * matchpair.c, matchpair.h: Limiting 12-mer hits that are considered in 23025 clustering method to those that have a neighboring hit within the query 23026 length 23027 23028 * dynprog.c, dynprog.h: Inserting a single gap pair for introns and cDNA 23029 insertions instead of filling in nucleotides 23030 23031 * stage1.c: Reduced extension of genomic segment when cluster mode is 23032 required 23033 23034 * gmap.c: Put output to stderr when path not found in compressed output 23035 23036 * intron.c, intron.h: Moved Intron_type function here 23037 23038 * pairpool.c, pairpool.h: Added explicit functions for handling gap pairs 23039 23040 * pairdef.h: Added fields for queryjump and genomejump, to be used for gaps 23041 230422006-02-08 twu 23043 23044 * translation.c: Set minimum number of pairs required for a translation 23045 230462006-02-07 twu 23047 23048 * gmap.c: Now checking for existence of -g or -d flag before proceeding 23049 23050 * stage3.c: Fixed problem when solving an intron and unable to peel back 23051 anything. 23052 230532006-02-05 twu 23054 23055 * dynprog.c: Fixed problem with extending 5' and 3' ends with assumption of 23056 no gap. Added extra efficiency based on this assumption. 
23057 230582006-01-19 twu 23059 23060 * README: Enhanced usage statement for gmap_setup 23061 23062 * gmap_setup.pl.in: Cleaned up flags. Added messages after each make 23063 procedure. Enhanced usage statement. 23064 23065 * gmap_process.pl.in: Removed code for a separate strain file 23066 23067 * gmap_process.pl.in: Added provision for a separate strain file, but 23068 commented out code 23069 23070 * md_coords.pl.in: Fixed problem when MD file has fewer than 6 lines. Put 23071 output into an array for printing out in one batch. Improved handling of 23072 strains. 23073 23074 * fa_coords.pl.in: Put output into an array for printing out in one batch. 23075 23076 * pmap_setup.pl.in, Makefile.am: Removed pmap_setup program 23077 23078 * stage3.c: Added procedure to fix short gaps 23079 23080 * gmapindex.c: Added ability to read reference strain from coords file 23081 23082 * gmap.c: Added provision for different stage 2 index size for PMAP 23083 230842006-01-17 twu 23085 23086 * pair.c: Fixed problem with protein PSL coordinates 23087 230882005-12-15 twu 23089 23090 * backtranslation.c, backtranslation.h: Fixed problems in backtranslation 23091 when genomic segment has lower case characters 23092 23093 * gmap.c, stage3.c, stage3.h: Preserved diagnostic info in PMAP through 23094 backtranslation 23095 23096 * pair.c: Changed printing of cDNA on ambiguous comps to be lower case if 23097 appropriate 23098 23099 * dynprog.c: Changed ends from 1 gap to no gaps. Changed open/extend 23100 penalties at ends (which may be irrelevant now). 23101 23102 * matchpool.c, stage1.c: Fixed problems with genomic position in reverse 23103 complement matches in PMAP. 23104 23105 * translation.c: Fixed problems with ends of cDNA and genomic translation 23106 for PMAP. Set margin to zero for computing amino acid changes. 
23107 23108 * iit-read.c: Commented out abort 23109 231102005-12-14 twu 23111 23112 * sequence.c: Fixed uninitialized heap 23113 231142005-12-13 twu 23115 23116 * gmap.c, stage2.c, stage2.h: Added pruning before stage 2 based on number 23117 of potentially consecutive hits and short paths 23118 23119 * oligoindex.c, oligoindex.h: Added computation of potentially consecutive 23120 hits in the query 23121 23122 * stage1.c: Added filtering of matchlist based on support 23123 23124 * matchpair.c, matchpair.h: Added storage of support and usep in Matchpair_T 23125 object 23126 231272005-12-09 twu 23128 23129 * gmap.c, stage2.c: Removed code for finding PMAP unaligned access error 23130 23131 * gmap.c, stage2.c: Added code for finding PMAP unaligned access error 23132 23133 * backtranslation.c, oligoindex.c: Removed code for checking assertions 23134 23135 * backtranslation.c, oligoindex.c: Added code for checking assertions 23136 231372005-12-08 twu 23138 23139 * pair.c: Streamlined determination of amino acid coordinates in alignment 23140 output 23141 23142 * indexdb.c: Fixed bug in handling offsets in alternative strains in PMAP 23143 23144 * dynprog.c: Reformulated assignment of pointers in two-dimensional array 23145 231462005-12-06 twu 23147 23148 * translation.c: Formatting change 23149 23150 * stage1.c: Turned on use of matchpool. Fixed problem where list was not 23151 reset to NULL. 
23152 23153 * pair.c: Changed dir:unknown to dir:indet 23154 23155 * oligoindex.c: Fixed uninitialized variable in GMAP 23156 23157 * matchpool.c: Improved debugging statements 23158 23159 * matchpair.c: Increased standard for stage 1 support 23160 231612005-12-05 twu 23162 23163 * oligoindex.c: Made code compatible with both GMAP and PMAP 23164 23165 * backtranslation.c, dynprog.c: Reduced memory allocation for 23166 two-dimensional array into a one-dimensional array 23167 23168 * matchpool.c, pairpool.c: Removed initial creation of chunks 23169 23170 * oligoindex.c: Fixed bug in PMAP where stop codon in the genomic sequence 23171 created a value that exceeded oligospace 23172 23173 * pair.c, pair.h: Added a way for the thread worker id to be printed with 23174 the result. Removed ambiguous comp characters from gmap. 23175 23176 * gmap.c, reqpost.c, reqpost.h, result.c, result.h, stage3.c, stage3.h: 23177 Added a way for the thread worker id to be printed with the result 23178 23179 * matchpool.c: Added commands for saving and restoring pointers, so memory 23180 can be re-used 23181 23182 * match.c, match.h, stage1.c: Added compiler conditions for using matchpool 23183 method. 23184 23185 * genome.c: Fixed messages to user 23186 23187 * chrsubset.c: Changed format of output 23188 23189 * translation.c: Fixed bug in translating backward cDNAs. Extended 23190 translation all the way to the end. 
23191 231922005-12-04 twu 23193 23194 * Makefile.am, gmap.c, matchpair.c, matchpair.h, matchpairdef.h, 23195 matchpairpool.c, matchpairpool.h, stage1.c, stage1.h: Removed special 23196 memory allocation routines for matchpairs 23197 23198 * Makefile.am, gmap.c, match.c, match.h, matchpair.c, matchpair.h, 23199 matchpairdef.h, matchpairpool.c, matchpairpool.h, matchpool.c, 23200 matchpool.h, stage1.c, stage1.h: Added special memory allocation routines 23201 for matches and matchpairs 23202 23203 * iit-read.c: Added an exception handler 23204 23205 * pair.c: Commented out unused procedure 23206 23207 * genome.c: Added include of except.h 23208 232092005-12-02 twu 23210 23211 * gmap.c: Fixed memory leak 23212 23213 * translation.c: Added separate routine for printing list of mutations. 23214 Fixed problem where number of cDNA nucleotides in codon is 4 or 5. 23215 23216 * stage2.c: Clarified different code for gmap and pmap 23217 23218 * stage1.c: Added checking routine for Stage1 object 23219 23220 * access.c, mem.c: Augmented debugging statements 23221 23222 * sequence.c: Fixed case where first sequence of FASTA file has no header, 23223 but subsequent sequences do. 
23224 23225 * nr-x.c: Initial import into CVS 23226 23227 * pair.c: Added printing of aapos to all positions in "f -9" mode 23228 23229 * mutation.c: Simplified logic of merge functions 23230 232312005-11-29 twu 23232 23233 * match.h: Provided interface for new functions 23234 23235 * gmap.c: Fixed bug due to switched parameters 23236 23237 * stage3.c: Added comment 23238 23239 * config.site: Added information about defaults 23240 23241 * README: Added information about Cygwin and defaults 23242 232432005-11-28 twu 23244 23245 * stage1.c: Added include of match.h 23246 232472005-11-23 twu 23248 23249 * acinclude.m4, fopen.m4, configure.ac: Added commands to check for 'b' or 23250 't' flag to fopen 23251 23252 * pmap.c: Removed obsolete file 23253 23254 * Makefile.am, access.c, chrsubset.c, datadir.c, fopen.h, genome-write.c, 23255 genomeplot.c, gmap.c, gmapindex.c, iit-read.c, iit-write.c, iit_store.c, 23256 indexdb.c, oligo-count.c, pdldata.c, pmapindex.c: All calls to fopen now 23257 generalized to handle systems that allow or disallow the 'b' or 't' flag 23258 232592005-11-22 twu 23260 23261 * VERSION: Updated version 23262 23263 * Makefile.am: Removed coords1.test, which is now performed by setup1.test 23264 and setup2.test 23265 23266 * setup1.test.in, setup2.test.in: Added prerequisite of fa_coords program 23267 for setup tests 23268 23269 * README: Made instructions for raw genome build match changes in gmap_setup 23270 23271 * gmap_setup.pl.in: Changed name of make command 23272 23273 * gmap_setup.pl.in: Clarified comments 23274 23275 * gmap.c: Made npaths output correct when user provides a segment 23276 23277 * match.c, matchdef.h, stage1.c: Storing reciprocal of nentries to avoid 23278 repeating this calculation multiple times later 23279 23280 * setup1.test.in, setup2.test.in: Made changes in test to match changes in 23281 program 23282 23283 * align.test.ok, map.test.ok: Made change in output from Mutations to Amino 23284 acid changes 23285 23286 * 
Makefile.am: Made change in name of coords file 23287 23288 * README: Made instructions consistent with changes in programs 23289 232902005-11-21 twu 23291 23292 * fa_coords.pl.in: Changed a flag. Output now going to stdout rather than 23293 stderr. 23294 23295 * gmap_setup.pl.in: Now making the call to fa_coords or md_coords within the 23296 Makefile 23297 23298 * matchpair.c: Turned off debugging 23299 23300 * match.c, match.h, matchdef.h: Storing number of entries for each match 23301 23302 * indexdb.c: Moved one type of debug macro into its own category 23303 23304 * stage1.c: Weighted dangling computation according to number of entries for 23305 each match 23306 23307 * translation.c: Fixed bug where pointer went past the end of the sequence 23308 233092005-11-19 twu 23310 23311 * chrsubset.c, gmap.c: Added printing of chrsubset information. 23312 Consolidated printing of npaths information into a single function. 23313 23314 * backtranslation.c: Using the two aamarkers. Allowing matches to codons 23315 even for frameshifts. 23316 23317 * mutation.c: Allowed merging of adjacent insertions 23318 23319 * translation.c: Made PMAP assignment of genomic amino acids conform to GMAP 23320 code for assignment of cDNA amino acids 23321 23322 * translation.c: Added further translation of cDNA beyond genomic stop 23323 codon, if possible 23324 23325 * translation.c: Streamlined code for amino acids to cDNA sequence 23326 23327 * translation.c: Overhaul of method for assigning amino acids to cDNA 23328 sequence, now based on separate marking and assignment of codons. 
23329 23330 * pair.c, pairdef.h, pairpool.c: Created separate aamarkers for genomic and 23331 cDNA sequence 23332 233332005-11-18 twu 23334 23335 * gmap.c, params.c, params.h, stage3.c, stage3.h, translation.c, 23336 translation.h: Added flag for specifying maximum number of amino acid 23337 changes to show 23338 23339 * stage1.c: Fixed memory leak 23340 23341 * matchpair.c: Fixed read of uninitialized heap when bestsize == 0 23342 23343 * matchpair.c, matchpair.h: Removed storage of support value 23344 23345 * gmap.c, matchpair.c, matchpair.h, stage1.c: Moved sequence pruning 23346 procedures from gmap.c to matchpair.c 23347 23348 * stage1.c: Fixed bug which caused loop to continue unnecessarily 23349 23350 * gmap.c, result.c, stage1.h: Added complete option for freeing Stage 1 23351 objects 23352 23353 * stage1.c: Introduced idea of stepping through trials to identify poor 23354 genomic matches 23355 23356 * matchpair.c, matchpair.h: Introduced method for salvaging individual 23357 12-mer hits 23358 23359 * gmap.c, stage1.c, stage1.h: Simplified call to Stage1_matchlist 23360 23361 * stage1.c: Cleaning up parameters in preparation for cycling through stage 1 23362 233632005-11-17 twu 23364 23365 * indexdb.c: Added forward/backward to pre-loading messages for pmap 23366 23367 * translation.c: Skipping mutation calls on non-standard amino acids 23368 23369 * backtranslation.c: Fixed bug when trying to backtranslate non-standard 23370 amino acids 23371 23372 * gmap_setup.pl.in: Added intermediate commands to Makefile 23373 233742005-11-11 twu 23375 23376 * backtranslation.c: Improved matching of genomic codon to cdna codon. 23377 23378 * translation.c: Added debugging statement 23379 23380 * pair.c: Restored printing of genomic sequence for ambiguous matches in pmap 23381 233822005-11-10 twu 23383 23384 * genome-write.c: Added read of linefeed after FASTA entry in raw genome 23385 files. Improved speed of writing blocks of zeros or X's. 
23386 23387 * gmapindex.c: Fixed bug in skip_sequence for raw genome files 23388 23389 * get-genome.c: Implemented printing of raw genome files 23390 23391 * gmap.c, stage3.c, stage3.h: Moved final translation and backtranslation 23392 steps into print procedures 23393 23394 * Makefile.am, backtranslation.c, backtranslation.h, translation.c, 23395 translation.h: Moved nucleotide consistency procedures for pmap into 23396 backtranslation.c 23397 23398 * pair.c: Removed consistency conversion. Now being done by backtranslation 23399 procedures. Removed meaning of AMBIGUOUS_COMP for compressed output of 23400 pmap. 23401 23402 * dynprog.c: Added actual coordinates to debugging statements 23403 23404 * translation.h: Made backtranslation procedure more rigorous. 23405 23406 * translation.c: Made backtranslation procedure more rigorous. Added 23407 debugging statements. 23408 234092005-11-09 twu 23410 23411 * get-genome.c: Changed -r flag to also indicate use of the uncompressed 23412 genome file 23413 23414 * get-genome.c, sequence.c, sequence.h: Added uncompressed raw format for 23415 printing genome segment 23416 23417 * genome-write.c, genome-write.h, gmapindex.c: Added uncompressed raw format 23418 for genome file 23419 234202005-11-08 twu 23421 23422 * pair.c: Reformulated printing of protein-based PSL output 23423 23424 * intlist.c: Added include of stdio.h 23425 23426 * chimera.c: Removed include of nmath.h 23427 23428 * gmap.c: Allowed coordinate output for pmap. Changed flag to -f 9. 
23429 234302005-11-04 twu 23431 23432 * gmap_compress.pl.in: Allowed handling of PMAP output 23433 23434 * gmap_uncompress.pl.in: Fixed bug in printing last line of alignment 23435 234362005-11-01 twu 23437 23438 * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Allowed introns to be printed 23439 in exon mode 23440 23441 * matchpair.c: Imposed the requirement that minsize be 2 or more away from 23442 bestsize 23443 234442005-10-31 twu 23445 23446 * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added ability to print exons 23447 using genomic sequence 23448 234492005-10-28 twu 23450 23451 * indexdb.c: Using lseek instead of fseek/fseeko for writing a positions 23452 file on disk 23453 23454 * access.c, access.h: Added a function for opening a file as read/write 23455 234562005-10-27 twu 23457 23458 * VERSION: Updated version 23459 23460 * Makefile.am: Restored setup2.test 23461 23462 * setup2.test.in: Made test use new gmap_setup script 23463 23464 * setup1.test.in: Removed install step 23465 23466 * gmap_setup.pl.in: Fixed bug in clean statement 23467 23468 * configure.ac: Added check for fseeko 23469 23470 * gmap.c: Added information about various type sizes to -V flag 23471 23472 * compress.c, genome-write.c, indexdb.c: Using fseeko if available 23473 234742005-10-25 twu 23475 23476 * gmap_setup.pl.in: Fixed bug where -W flag was in the wrong branch 23477 23478 * pair.c: Removed extraneous linefeed in compressed output 23479 23480 * VERSION: Updated version number 23481 23482 * pair.c: Fixed psl output 23483 23484 * README: Clarified use of ./configure flags. Added instructions for the -C 23485 flag in fa_coords. 
23486 23487 * gmap_setup.pl.in: Restored -W flag for writing directly to file 23488 23489 * gmap_setup.pl.in: Added instructions for clean to Makefile 23490 23491 * fa_coords.pl.in: Added -C flag to make each sequence a separate chromosome 23492 23493 * pair.c, pair.h, stage3.c: Added printing of cDNA direction in compressed 23494 output 23495 23496 * bigendian.h, mem.h: Added include of config.h. 23497 234982005-10-21 twu 23499 23500 * VERSION: Updated version 23501 23502 * README: Added instructions for running make after gmap_setup 23503 23504 * MAINTAINER: Added reminder to check for DEBUG mode 23505 23506 * pair.c, pair.h, stage3.c, stage3.h: Restored printing of strain information 23507 23508 * pdldata.c: Fixed typo 23509 23510 * oligo-count.c: Using new interface to indexdb 23511 23512 * gmap.c: Added error message if user tries to use strain information and 23513 file is not found 23514 23515 * gmap.c: Restored printing of strain information. Added conversion to 23516 upper case for altstrain sequence. 
23517 23518 * genome.c, indexdb.c: Added printing of number of bytes 23519 23520 * access.h: Added MAX32BIT 23521 23522 * access.c: Added debugging statements 23523 23524 * Makefile.am: Added needed files 23525 235262005-10-18 twu 23527 23528 * configure.ac: Added warning message if mmap not available 23529 23530 * fa_coords.pl.in: Added ability to read from stdin 23531 23532 * setup1.test.in: Added install command 23533 23534 * access.c, access.h: Made Access_filesize an external routine 23535 23536 * genuncompress.c, pdldata.c: Using routines from access.c 23537 23538 * Makefile.am: Added access.c and access.h to programs with IIT_T object 23539 23540 * chrsubset.c: Added include of config.h 23541 23542 * genome-write.c, genomeplot.c, get-genome.c, iit_dump.c, iit_get.c, 23543 pmapindex.c, segmentpos.c: Changed calls to IIT_free and IIT_annotation 23544 23545 * gmapindex.c: Removing free of accsegmentpos_table, which fails on some 23546 computers 23547 23548 * gmap.c: Reading user-provided genomic segment and reference sequence 23549 before FASTA query 23550 23551 * iit-write.c, iit-write.h: Made write version of IIT_free static and 23552 renamed it. 23553 23554 * iit-read.h: Changed interface to IIT_annotation. 23555 23556 * iit-read.c: Added FILEIO mode for reading IIT_T objects. Changed 23557 interface to IIT_annotation. 23558 23559 * iitdef.h: Made mutex part of IIT_T object. Added offset to IIT_T object 23560 for FILEIO mode. 23561 23562 * indexdb.c: Made mutexes part of Indexdb_T object. Changed calls to 23563 IIT_annotation. 23564 23565 * genome.c: Made mutex part of Genome_T object 23566 23567 * access.h: Added flag for randomp. Added function for read/write mmap. 
23568 23569 * access.c: Moved file size determination to a separate function 23570 235712005-10-14 twu 23572 23573 * gmap.c: Moved reading of input sequences to beginning 23574 23575 * indexdb.c: Minor fixes 23576 23577 * access.h: Returning length and time for Access_immediate 23578 23579 * access.c: Returning length and time for Access_immediate. Forcing read of 23580 pages during pre-load. 23581 23582 * datadir.c, gmap.c: Removed unused variables 23583 23584 * genome.c, stage3.c: Added necessary include file 23585 23586 * result.c: Addressed compiler warning 23587 23588 * pair.c: Fixed faulty print statement in pslformat_nt 23589 23590 * indexdb.c: Added necessary include file. Removed unnecessary variables. 23591 23592 * dynprog.c: Applied type conversion for char to access array 23593 23594 * Makefile.am: Added access.c and access.h 23595 23596 * blackboard.c, blackboard.h, gmap.c: Added nextchar to Blackboard_T object 23597 23598 * gmap.c: Now reading first sequence in main thread, and using existence of 23599 a second sequence to determine whether to start threads and to pre-read 23600 offsets file for GMAP. Conditioning some flags based on existence of mmap 23601 and threading support. 23602 23603 * datadir.h: Removed unnecessary include 23604 23605 * intlist.c: Minor fix to resolve gcc compiler warning 23606 23607 * access.c, access.h, genome.c, indexdb.c: Standardized file access routines 23608 and moved them to access.c 23609 236102005-10-13 twu 23611 23612 * genomeplot.c, plotdata.c, plotdata.h: Fixed ASCII printing of universal 23613 coordinates when a range is selected 23614 23615 * matchpair.c: Fixed calculation of genome length for segment 23616 23617 * sequence.c: Fixed Sequence_read_unlimited to handle sequences without a 23618 header line. 
23619 236202005-10-12 twu 23621 23622 * gmap_setup.pl.in: Changed program to generate a Makefile 23623 23624 * fa_coords.pl.in, md_coords.pl.in: Deleted comment about gmap_setup running 23625 time 23626 23627 * gmap_process.pl.in: Initial import into CVS 23628 23629 * Makefile.am: Added instructions for gmap_process 23630 23631 * setup1.test.in: Modified setup test for new interface to utility programs 23632 23633 * Makefile.am: Modified setup test to put binary files in tests directory 23634 23635 * MAINTAINER: Minor note to self 23636 23637 * Makefile.am: Made FULLDIST work for gmap sources 23638 23639 * gmap.c: Made separate flags for batch for offsets and batch for positions 23640 file. Simplified input thread. 23641 23642 * indexdb.h: Made separate flags for batch for offsets and batch for 23643 positions file 23644 23645 * indexdb.c: Added memory mapping for offsets files under PMAP. Made 23646 separate flags for batch for offsets and batch for positions file. 23647 23648 * oligoindex.c: Removed stop codon from oligomers in stage 2 23649 23650 * Makefile.am: Moved beta code for GMAP into a separate program 23651 23652 * stage2.c: Moved PMAP conditionals out of debugging statements 23653 236542005-10-11 twu 23655 23656 * Makefile.am: Removed conditional distribution of files 23657 23658 * translation.c: Including comp.h header 23659 23660 * gmap.c, pmapindex.c: Changed PMAP indexing interval to be based on amino 23661 acids. 23662 23663 * oligop.c: Removed STOP from amino acid alphabet. 23664 23665 * gmapindex.c: Generating chrsubset file at same time as chromosome file 23666 23667 * indexdb.h: Removed STOP from amino acid alphabet. Changed PMAP interval 23668 to be based on amino acids. 23669 23670 * indexdb.c: Simplified conversion of oligomer to amino acid index for PMAP. 23671 Removed STOP from amino acid alphabet. Computing each protein frame 23672 separately. 23673 23674 * configure.ac: Added large file support with AC_SYS_LARGEFILE. 
Removed 23675 setup test number 2. Added gmap_process. 23676 23677 * acinclude.m4: Removed macros for O_LARGEFILE 23678 23679 * open-flags.m4: Removed file open-flags.m4 23680 236812005-10-10 twu 23682 23683 * acinclude.m4, open-flags.m4: Added check for O_LARGEFILE in open 23684 236852005-10-07 twu 23686 23687 * gmap_setup.pl.in: Restored -W flag and improved it 23688 236892005-10-06 twu 23690 23691 * VERSION: Updated version 23692 23693 * configure.ac: Added hook for pmap_setup.pl 23694 23695 * README: Added explanation of full, uncompressed genome, and of batch modes 23696 23697 * gmap_setup.pl.in: Added checks to make sure desired files are built. 23698 Added printing of commands to stdout. 23699 23700 * pmap_setup.pl.in: Added checks to make sure desired files are built 23701 23702 * gmap_compress.pl.in, gmap_uncompress.pl.in, pair.c: Altered format of 23703 compressed output to indicate ambiguous matches 23704 23705 * genome.c: Fixed batch loading of full genomes greater than 2 gigabytes 23706 23707 * gmap.c: Modified message about batch mode and multiple threads mode 23708 23709 * stage2.c: Parameterized alignment characters and defined them centrally in 23710 comp.h. Restored previous intron penalties based on length. 23711 23712 * pair.c: Parameterized alignment characters and defined them centrally in 23713 comp.h. Now printing ambiguous nucleotide matches. 23714 23715 * dynprog.c: Parameterized alignment characters and defined them centrally 23716 in comp.h. Added separate table for consistent nucleotide pairs. 23717 23718 * Makefile.am, comp.h, pairpool.c, stage3.c, translation.c: Parameterized 23719 alignment characters and defined them centrally in comp.h 23720 237212005-10-05 twu 23722 23723 * gmap.c: Changed batch mode to be of two types: pre-loading of indices 23724 only, and pre-loading of both indices and genome. 
23725 23726 * gmap.c: Clarified various user messages 23727 23728 * indexdb.c: Added an explicit check for a nonsensical offsets file 23729 23730 * pair.c: Made margin width determined dynamically in printing the alignments 23731 237322005-10-03 twu 23733 23734 * dynprog.c: Removed reverse intron possibilities from PMAP 23735 23736 * gmapindex.c: Restored monitoring output for logging contigs 23737 23738 * indexdb.c: Added fwd/rev to monitoring commands for indexing offsets and 23739 position files 23740 23741 * compress.c: Added monitoring commands for compressing and uncompressing 23742 files 23743 23744 * gmap_setup.pl.in: Clarified behavior and instructions for building a full 23745 (uncompressed) genome file 23746 23747 * fa_coords.pl.in: Abbreviated monitoring output, with a parameter that 23748 controls which contigs to ignore 23749 23750 * Makefile.am: Added make instructions for pmap_setup 23751 23752 * pmap_setup.pl.in: Initial import into CVS 23753 237542005-10-01 twu 23755 23756 * gmap.c: Performing translation of query sequence and genomic segment to 23757 upper case. Turned off stage 1 for user-provided genomic segment in PMAP. 23758 Provided -G flag for specifying full genome, if it exists. 
23759 23760 * genome.c: Turned warning into error, if user wishes to use a full genome 23761 and none exists 23762 23763 * dynprog.c: Allowed intron gap parameter to be arbitrarily large 23764 23765 * pair.c, stage2.c, translation.c: Fixed handling of user-provided genomic 23766 segment with lower case characters for PMAP 23767 23768 * stage1.c: Improved debugging statement 23769 23770 * oligoindex.c: Minor formatting change 23771 23772 * mem.c: Enhanced trap features 23773 23774 * boyer-moore.c: Removed assertions 23775 237762005-09-30 twu 23777 23778 * boyer-moore.c, dynprog.c, dynprog.h, stage3.c, stage3.h: Made stage 3 use 23779 upper case for query sequence and genomic segment when needed, but 23780 original sequences for building alignment 23781 23782 * oligoindex.c, oligoindex.h, stage2.c, stage2.h: Made stage 2 use upper 23783 case for query sequence and genomic segment for oligomer chaining, but 23784 original sequences for building alignment 23785 23786 * oligo.c, oligop.c, stage1.c, stage1.h: Made stage 1 assume upper case 23787 query sequence 23788 23789 * pair.c: Removed call to toupper 23790 23791 * complement.h, sequence.c, sequence.h: Provided utilities for making 23792 uppercase and alias versions of sequences 23793 23794 * compress.c: Added toupper as reason for including ctype.h 23795 23796 * plotdata.c: Revised autoscale function 23797 237982005-09-29 twu 23799 23800 * stage2.c: For PMAP, fixed bug where C terminus of query sequence was not 23801 aligned. Eliminated computation of reverse intron direction for PMAP. 23802 23803 * oligoindex.c: Modified comments 23804 23805 * gmap.c: Removed Sequence_trim for PMAP, and reduced stage 2 indexsize. 
23806 23807 * pair.c: Changed psl output to reflect definition of a block to be a region 23808 without indels or gaps, instead of an exon 23809 238102005-09-22 twu 23811 23812 * oligoindex.c: Make amino acid index for stage 2 (with 21 amino acids) 23813 distinct from that of stage 1 (with 16) 23814 23815 * indexdb.c, indexdb.h, oligop.c: Compressing 21 amino acids into 16 to 23816 allow offsets of amino acid 7-mers to fit into less than 2 GB 23817 238182005-09-21 twu 23819 23820 * stage1.c: Parameterized size of oligomers for PMAP 23821 23822 * gmap.c, indexdb.h: Parameterized interval for stage 1 when user provides a 23823 genomic segment 23824 23825 * pmapindex.c: Parameterized size of oligomers 23826 23827 * matchpair.c: Turned off debugging 23828 23829 * indexdb.c, indexdb.h: Introduced indexing of 7-mers by PMAP 23830 23831 * VERSION: Updated version 23832 23833 * gmap_setup.pl.in: Commented out -W flag for forcing write to file. Added 23834 option -G for making an uncompressed version of the genome (.genome file). 23835 23836 * fa_coords.pl.in: Allowed both chr and Chr in parsing for chromosomal 23837 mapping 23838 23839 * config.site: Clarified possible choices for LDFLAGS 23840 23841 * matchpair.c: Penalizing clusters spread out in repetitive genomic regions 23842 23843 * pdlimage.c: Made images in color 23844 238452005-09-20 twu 23846 23847 * gmapindex.c: Commented out monitoring statement about logging contigs 23848 23849 * stage1.c: Fixed a bug involving subtraction of two unsigned ints into a 23850 signed int, occurring for chromosomes greater than 2^31 in length. 

2005-09-19 twu

	* stage2.c: Fixed bug when stage 2 fails

	* pair.c: Fixed assessment of unknown bases for PMAP queries

	* matchpair.c: Fixed computation of stretch for PMAP protein queries

	* indexdb.c: Removed debugging flag

	* iit_get.c: Added termination message and flushing output when input coming
	  from stdin

	* gmap.c: Added debugging messages

2005-09-16 twu

	* Makefile.am, pdlimage.c: Initial addition of pdlimage to CVS.

2005-09-08 twu

	* iit-read.c, iit-read.h, stage3.c: Added option to print levels of map
	  results

	* intlist.c, intlist.h: Added command for Intlist_to_string

	* gmap.c: Modified directory printing to go to a given file pointer. Added
	  information about default directory to print_version command.

	* datadir.c, datadir.h, get-genome.c: Modified directory printing to go to a
	  given file pointer

	* genomeplot.c, plotdata.c, plotdata.h: Changed format of positions file.
	  Changed title for summary genome plots.

2005-09-06 twu

	* get-genome.c: Added ability to print levels of map contents. Fixed bug in
	  interpreting an entire chromosome.

2005-09-02 twu

	* genomeplot.c, pdldata.c, pdldata.h, plotdata.c, plotdata.h: Generalized
	  variable for transform and added reciprocal

	* genomeplot.c, plotdata.c, plotdata.h: Added option to autoscale

	* genomeplot.c, pdldata.c, pdldata.h, plotdata.c, plotdata.h: Added options
	  for computing summary of multiple samples

	* genomeplot.c, plotdata.c, plotdata.h: Added functions for printing a
	  threshold line, and for printing output in ascii format.

	* pdldata.c: Now removing line feeds from annotations. If no annotations
	  are available, using sample numbers.

2005-08-29 twu

	* plotgenes.c, plotgenes.h: Improved display of genes

	* genomeplot.c: Made changes so PDL file is read only when necessary. Added
	  extra room for showing genes.

2005-08-26 twu

	* genomeplot.c, plotdata.c, plotdata.h: Printing accession header only if
	  one sample per page. Reduced default number of genomes per page to 12.

	* genomeplot.c, pdldata.c, pdldata.h, plotdata.c: Added ability to read
	  sample identifiers from a separate file for PDL input

	* genomeplot.c: Added ability to plot a single page

2005-08-23 twu

	* gmap.c, pair.c, pair.h, stage3.c, stage3.h: For PMAP, allowed PSL output
	  in both nucleotide coordinates and protein coordinates

	* Makefile.am, genomeplot.c, plotgenes.c, plotgenes.h: Added ability to plot
	  genes

2005-08-18 twu

	* gmap.c, stage3.c, stage3.h: Added option for printing coordinates

	* plotdata.c, plotdata.h: Added options for printing dots and overlapping
	  samples

	* pair.c, pair.h: Added option for printing coordinates. Trying to fix PSL
	  output for PMAP.

	* genomeplot.c: Added option for printing dots and overlapping samples

	* color.c, color.h: Added color brewer palette

	* plotdata.c: Prevented printing of empty strings

2005-08-16 twu

	* plotdata.c, plotdata.h: Fixed bug when only a subset of genes is selected.
	  Added commands for gif output.

	* get-genome.c, gmap.c: Showing available map files when valid one is not
	  entered

	* datadir.c, datadir.h: Added function to list directory contents

	* genomeplot.c: Allowed user to specify a list of samples to plot

	* intlist.c, intlist.h: Added function Intlist_from_string

	* stage3.c: Fixed mapping to account for cDNA direction

2005-08-10 twu

	* Makefile.am, genomeplot.c, pdldata.c, pdldata.h, plotdata.c, plotdata.h:
	  Allowed genomeplot to read PDL files

2005-08-07 twu

	* iit-read.c: Improved debugging statements

	* gmap.c, stage3.c, stage3.h: Added option to map by exons

	* pair.c, pair.h: Added function to retrieve exon bounds

2005-08-04 twu

	* Makefile.am, gmap.c: Merged pmap main code into gmap.c

	* sequence.c, sequence.h: Added functionality for pmap chimeras

	* stage3.c, stage3.h: Changed function to take queryntlength instead of
	  queryseq. Made function work with both gmap and pmap.

	* chimera.c, chimera.h: Changed functions to take queryntlength instead of
	  queryseq

	* pair.c: Made PSL format for proteins print protein coordinates

2005-08-03 twu

	* chimera.c, chimera.h, gmap.c: Changed chimera algorithm to potentially
	  search both sides of an incomplete alignment

2005-08-02 twu

	* stage3.c: Increased size of merge length for chimeric exon-exon junctions

	* sequence.c: Restored trimming of subsequences

2005-08-02 gcavet

	* modules: put back to original state

	* modules: added dev module

2005-08-01 twu

	* stage3.c, stage3.h: Changed chimeric margin detection to work on both ends

	* stage1.c: Added debugging statements

	* chimera.c, chimera.h, gmap.c: Changed chimeric search to work on both ends
	  that fail to align

	* sequence.c: Turned trimming off for subsequences

	* pair.c, pair.h: Added indel penalties at appropriate end for chimeric path
	  scores

	* Makefile.am, get-genome.c: Allowed user to look up information in map iit
	  files

	* chimera.c: Tested code for checking if breakpoint is outside the alignment

2005-07-29 twu

	* datum.c, datum.h, genomeplot.c, plotdata.c, plotdata.h: Allowed colors to
	  be specified in input file

	* chrsubset.c, chrsubset.h, genomeplot.c, plotdata.c: Implemented
	  user-selected genomic range

2005-07-28 twu

	* plotdata.c: Allowed program to handle NaNQs

	* genomeplot.c, plotdata.c, plotdata.h: Added log and signed cube root
	  functions

	* genomeplot.c, plotdata.c, plotdata.h: Added ability to print genome on a
	  single line

2005-07-27 twu

	* genomeplot.c: Added ability to handle multiple samples

	* plotdata.c, plotdata.h: Added ability to detect and read header lines.
	  Made code for starting and ending pages extern.

	* genomeplot.c, plotdata.c, plotdata.h: Implemented printing of circular
	  genome

2005-07-26 twu

	* pair.c: Corrected query coordinates of chimera in compressed mode

	* pairpool.c: Fixed problem where a gap was left at the 5' end of a bounded
	  transfer.

	* datum.c, datum.h, genomeplot.c, plotdata.c, plotdata.h: Added ability to
	  read specified colors for each line

	* datum.c, datum.h, genomeplot.c, plotdata.c, plotdata.h: Allowed printing
	  of segments between chromosomes

2005-07-25 twu

	* pmap.c, stage3.c, stage3.h: Allowed chimeric pieces to be merged over
	  longer length if ends have strong splice sites

	* stage1.c: Restored reader overlap for longer sequences

	* gmap.c, iit-read.c, iit-read.h: Added chromosomal positions to map
	  information

	* chrnum.c, chrnum.h: Added function to get offset for a chrnum

	* Makefile.am: Added chrsegment.h to sources for genomeplot

2005-07-22 twu

	* gmap.c, pmap.c, stage3.c, stage3.h: Allowed two parts of chimera to merge
	  if close on the genome

2005-07-21 twu

	* stage3.c, stage3.h, translation.c, translation.h: Clarified code specific
	  to PMAP and GMAP

	* sequence.c: Changed sequence header for PMAP to refer to amino acids

	* pair.c, pair.h: Added ability to print inferred nucleotide sequence for
	  PMAP

	* oligop.c: Clarified meaning of INDEX1PART to be number of amino acids

	* pmapindex.c: Clarified meaning of INDEX1PART to be number of nucleotides

	* oligo-count.c: Using new interface for Reader_new

	* indexdb.c, indexdb.h: Clarified meaning of INDEX1PART to be number of
	  nucleotides for PMAP.

	* pmap.c: Removed -q flag for specifying stage 1 interval, and removed -T
	  flag for truncating sequence at full-length protein. Specified -Q flag to
	  be printing of inferred nucleotide sequence.

	* gmap.c: Removed -q flag

	* Makefile.am: Added beta source files for pmap

	* stage1.c: Introduced min intron length. Clarified meaning of INDEX1PART
	  to be number of amino acids. Added debugging statements.

	* reader.c, reader.h: Allowed crossover of start pointer and end pointer so
	  that middle oligomers will be read. Should help in mapping of short
	  sequences.

	* translation.c, translation.h: Moved combinatorial testing of codons to
	  translation step

	* stage3.h: Performing protein translation only when necessary.

	* stage3.c: Considering only forward intron directions for pmap. Performing
	  protein translation only when necessary.

	* stage2.c: Considering only forward intron directions for pmap

	* sequence.c: Changed header line for pmap

	* pair.c: Made further changes to accommodate plus sign in alignment

	* pair.c: Introduced plus sign in alignment

	* oligoindex.c: Improved efficiency of analyzing genomic segment, by storing
	  indices for each frame

	* genome.c: Added debugging statements

	* dynprog.c, dynprog.h: Changed combinatorial instantiation of codons to a
	  single instantiation

	* Makefile.am, params.c, params.h, pmap.c: Gave pmap the same overall
	  behavior as gmap, including multi-threading and flag options

2005-07-19 twu

	* Makefile.am, block.c, block.h, dynprog.c, dynprog.h, gmap.c, indexdb.c,
	  indexdb.h, oligoindex.c, oligop.c, oligop.h, pair.c, pmap.c, pmapindex.c,
	  sequence.c, sequence.h, stage1.c, stage1.h, stage2.c, stage2.h, stage3.c,
	  stage3.h, translation.c, translation.h: Introduced pmap and
	  pmapindex

2005-07-15 twu

	* get-genome.c: Added range format to allow negative lengths

	* genome.c: Added exception when requested length exceeds allocated buffer
	  length

	* except.c: Added printing of exception message

	* chimera.c: Fixed problem when donor or acceptor length exceeded allocated
	  buffer length

	* Makefile.am: Fixed handling of non-distributed source code

	* align.test.ok: Changed genomic coordinate to match new computation of
	  coordinates in gaps

	* VERSION: Updated version number

2005-07-13 twu

	* dynprog.c: Made genomic positions on left and right ends of gap constant,
	  to avoid problems in stage 3 computations

	* memchk.c: Made procedures thread-safe

	* stage3.c, stage3.h: Fixed genomic positions on left and right ends of gap.
	  Removing gaps at 5' end, possibly introduced by smoothing.

	* mem.c, mem.h: Improved memory trap procedures

	* matchpair.c: Added check before freeing some possibly null structures

	* genome.c: Removed duplicate FREE of filename

	* gmap.c: Fixed genomic positions on left and right ends of gap. Fixed bug
	  when chimera was not reset to NULL.

	* pair.c, pair.h: Fixed genomic positions on left and right ends of gap

2005-07-12 twu

	* stage3.c: Fixed bugs in computing dual introns, dealing with previously
	  computed gaps, and returning coordinates for empty peelbacks.

	* stage1.c: Increased parameters for maximum number of matching pairs
	  considered

	* pairpool.c: Enhanced debugging output

	* pair.c, pair.h: Added procedure for printing a single pair

	* mutation.c: Changed unnamed unions to named unions

2005-07-08 twu

	* stage3.c: Added extra check to make sure pairs is non-empty

	* gmap.c: Initialized chimera to be NULL

	* oligoindex.c: Fixed bug caused by writing to a random location when
	  indexsize < 8.

	* mem.c: Improved trap code

	* memchk.c: Changed types to be consistent with regular version of memory
	  manager

	* memchk.c: Added checking implementation of memory manager

	* stage3.c: Fixed a segmentation fault bug.

	* stage2.c: Changed distpenalty to ignore query distance and
	  max_intronlength, and simplified computation. These values were probably
	  not affecting previous computation anyway.

	* mutation.c: Fixed problem caused by removal of unnamed union

	* iitdef.h: Included header file for off_t type.

	* gmap.c: Changed maxpeelback for cross-species mode back to previous value

	* VERSION: Updated version

	* configure.ac: Added check for caddr_t type. Added check for madvise flags.

	* gmap.c: Removed unnecessary variables and arguments. Changed variable
	  type of nworkers.

	* oligoindex.h, pairpool.c, pairpool.h, stopwatch.c, stopwatch.h: Added
	  formal void argument

	* compress.c, genome.c, genome.h, matchpair.c, oligoindex.c, stage2.c:
	  Removed unnecessary variables

	* block.c, boyer-moore.c, dynprog.c, dynprog.h, matchpair.h, oligo.c,
	  oligo.h, stage1.c, stage1.h: Removed unnecessary arguments

	* sequence.c, sequence.h: Added formal void argument. Changed some variable
	  types.

	* indexdb.c, match.c, segmentpos.c: Changed print statement

	* pair.c: Added static specification to some functions

	* iit-read.c, iitdef.h: Changed some variable types

	* indexdb.c: Increased interval of monitoring output from 1 million nt to 10
	  million nt

	* gmap_uncompress.pl.in: Fixed bug in argument list

	* bigendian.c, boyer-moore.c, chrom.c, chrsubset.c, chrsubset.h, compress.c,
	  datadir.c, dynprog.c, except.c, genome-write.c, genome.c, get-genome.c,
	  gmap.c, gmapindex.c, iit-read.c, iit-write.c, iit_dump.c, iit_get.c,
	  iit_store.c, indexdb.c, interval.h, intlist.c, list.c, match.c,
	  matchpair.c, md5-compute.c, md5.c, mutation.c, oligo-count.c, oligo.c,
	  oligoindex.c, pair.c, pairpool.c, params.h, reqpost.c, segmentpos.c,
	  segmentpos.h, sequence.c, smooth.c, stage1.c, stage2.c, stage3.c,
	  translation.c, uintlist.c: Made changes to satisfy pedantic gcc compiler
	  warnings and to comply with ANSI C

	* acinclude.m4: Added autoconf macro for madvise flags

	* MAINTAINER: Added comment about strict compiler checking

	* madvise-flags.m4: Initial import into CVS

2005-07-07 twu

	* gmap.c: Changed parameters to prevent segmentation fault in cross-species
	  mode

	* gmap.c, stage3.c, stage3.h, pair.c, pair.h: Added psl output format

	* stage1.c: Increased matchpairs allowed at pre-unique stage

	* match.c, match.h: Trivial formatting change

	* gmapindex.c: Removed some unused variables

	* get-genome.c: Changed usage statement for coordinate interval

	* datadir.c: Added error message when genome subdirectory is not readable

	* chrnum.c, chrnum.h: Added Chrnum_length command, needed for psl output
	  format

2005-06-23 twu

	* VERSION: Updated version for release

	* stage2.c: Increased
	  cross-species penalty for intron length

	* gmap.c: Added other constraints on using oligo depth. Reporting failure
	  type. Separated out beta source files from gmap.

	* result.c, result.h: Added failure type

	* chrsubset.c, chrsubset.h, plotdata.c: Added checks if chrsubset is NULL.

	* genomeplot.c: Added getopt to genomeplot. Added mode for printing
	  segments.

	* Makefile.am: Added getopt to genomeplot. Separated out beta source files
	  from gmap.

2005-06-21 twu

	* genomeplot.c, plotdata.c, plotdata.h: Fixed coloring of raw data,
	  depending on whether segmentation is performed.

	* Makefile.am: Moved chrsegment functionality to genomeplot

	* gmap.c: Giving crossspecies flag to Stage 2

	* genomeplot.c: Getting segment breakpoints back in three separate lists

	* chrsegment.c, chrsegment.h: Added re-checking of segment breakpoints

	* intlist.c, intlist.h: Added Intlist_delete function

	* stage2.c, stage2.h: Implemented different intron penalties for
	  crossspecies mode

	* stage1.c: Restored full functionality for crossspecies mode

2005-06-16 twu

	* stage1.c: Added check for too many matchpairs before applying
	  Matchpair_filter_unique

	* chrsegment.c, chrsegment.h: Using chromosomal positions in calculations

	* genomeplot.c, plotdata.c, plotdata.h: Modified calls to Plotdata_values
	  and Plotdata_chrpositions

	* genomeplot.c, iit-read.c, iit-read.h: Added function IIT_length

	* plotdata.c: Storing chrpositions and values as individual arrays

	* genomeplot.c: Using a tree structure to store segment results.

	* chrsegment.c, chrsegment.h: Using a tree structure to store segment
	  results. Added check for single breakpoint in addition to double
	  breakpoints.

2005-06-15 twu

	* chrsegment.c, chrsegment.h, genomeplot.c: Implemented recursive
	  segmentation, generating a list of segments

	* iit-read.c: Fixed problem with memory fault

	* Makefile.am, chrsegment.c, chrsegment.h, genomeplot.c, plotdata.c,
	  plotdata.h: Merged chrsegment functionality into genomeplot

	* genomeplot.c: Fixed some memory leaks

	* datum.c, datum.h: Added Datum_T object for use by Plotdata_T

	* Makefile.am: Added program chrsegment and added Datum_T object to
	  genomeplot

	* chrsegment.c, nr-x.h: Added program chrsegment

	* plotdata.c, plotdata.h: Now storing data as sorted within each chromosome

	* chrsubset.c, chrsubset.h: Added function to compute and retrieve old
	  indices

2005-06-14 twu

	* Makefile.am, chrsubset.c, chrsubset.h, color.c, color.h, doublelist.c,
	  doublelist.h, genomeplot.c, plotdata.c, plotdata.h: Added program
	  genomeplot

	* uintlist.c: Fixed typo

	* iit-read.c: Skipping freeing of memory, since it sometimes gives a memory
	  fault.

	* Makefile.am, chimera.c, maxent.c, maxent.h, splice-site.c, splice-site.h:
	  Changed splice site predictor from scoring matrix to maxent method

2005-06-10 twu

	* indexdb.c: Added error message when user-provided genomic segment is
	  invalid

2005-06-03 twu

	* chimera.c, chimera.h: Added detection of exon-exon boundary for chimeras
	  in both forward and reverse directions

	* gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added output of cDNA direction
	  of exon-exon boundary for chimeras

2005-06-02 twu

	* dynprog.c, dynprog.h, gmap.c, stage3.c: Restored previous behavior for
	  finding microexons. Changed meaning of end_microexons_p to be an
	  allowance for longer introns at the ends.

	* chimera.c: Improved debugging output

2005-06-01 twu

	* stage3.c, stage3.h: Added utilities for new chimera functions.

	* gmap.c: Added stage 3 calls for truncating full length. Using Chimera_T
	  objects and new chimera functions.

	* stage1.c, stage1.h: Using ends only for cross-species mode in stage 1

	* result.c, result.h: Created Chimera_T object.

	* pair.c, pair.h: Added utility programs for chimera evaluation.

	* chimera.c, chimera.h: Added search for exon-exon boundaries in chimeras.
	  Created Chimera_T object.

	* Makefile.am, splice-site.c, splice-site.h: Added splice site calculations
	  to chimera evaluation.

2005-05-24 twu

	* gmap.c: Moved translate calls up to gmap.c. Added hook for -T flag for
	  truncating full-length sequence.

	* stage3.c, stage3.h: Using function Pairpool_transfer_bounded. Moved
	  translate calls up to gmap.c.

	* pairpool.c, pairpool.h: Added function Pairpool_transfer_bounded.

2005-05-20 twu

	* VERSION: Revised version

	* stage3.c, stage3.h: Turned off default microexon finding at ends. Cleaned
	  up margin function for identifying chimeras.

	* pair.c: Changed computation of matchscores

	* oligoindex.c: Changed definition of oligodepth.

	* gmap.c: Added -U flag to turn on microexons at ends. Changed code for
	  chimeras, and changed meaning of -x flag.

	* chrsubset.c: Added check on freeing object.

	* chimera.c: Fixed debugging statement

	* Makefile.am: Added beta testing flag.

2005-05-09 twu

	* Makefile.am: Added compiler instructions for pthreads to various programs

	* VERSION: Modified version number.

	* README: Added information about -E feature of fa_coords and gmap_setup,
	  and information about editing coords.txt.

	* MAINTAINER: Added reminder to modify VERSION.

	* gmap_setup.pl.in: Removed reverse complement procedures here; now being
	  done by gmapindex. Allowed specification of a command.

	* fa_coords.pl.in: Introduced chromosome NA for headers that cannot be
	  parsed. Allowed specification of a command. Improved handling of Celera
	  genomes.

	* genome.c, indexdb.c: Put mutexes around read procedures for the
	  combination of multi-threading and non-memory mapped reading of file.

	* gmap.c: Fixed bug from uninitialized querysubseq.

	* pair.c, pair.h, result.c, result.h, stage3.c, stage3.h: Allowed printing
	  of range of chimera breakpoints

	* interval.c, interval.h: Changed interface to some functions

	* iit-read.c, iit-write.c: Fixed bug in debug version of dump. Changed
	  calls to Interval_T functions.

	* gmapindex.c: Changed count_sequence() to read a line at a time

	* genome-write.c: Properly handling contigs marked as reverse complement.

	* gmap.c: Using fscore threshold to determine statistical significance.
	  Reporting equivalent positions for breakpoint.

	* chimera.c, chimera.h: Using fscore threshold to determine statistical
	  significance

2005-05-06 twu

	* gmap_setup.pl.in: Handling other NCBI cases where version numbers are
	  missing

	* genome-write.c, indexdb.c: Minor changes in monitoring output

	* VERSION: Updated version number

	* README: Added explanation of output ordering with multiple threads

	* coords1.test.ok: Changed to add new comment line in coords.txt

	* README: Minor textual change

	* gmap_setup.pl.in: Added -q flag for specifying indexing interval. Allowed
	  comment lines to be in coords.txt.

	* md_coords.pl.in: Improved messages to user.

	* fa_coords.pl.in: Added handling of unmapped contigs for Ensembl genomes.
	  Improved messages to user. Added check for possible conversions of
	  alternate chromosomes to alternate strains.

	* gmap_uncompress.pl.in: Fixed bug due to old code that referred to the -R
	  flag

	* gmap.c: Enhanced result to show number of matches, mismatches, and indels
	  in alternative to chimera. Introduced maxpaths of 0 to indicate output of
	  both paths of chimera if present, otherwise one path.

	* pair.c, pair.h, sequence.c, sequence.h, stage3.c, stage3.h: Removed
	  references to ntrimmed

	* result.c, result.h: Enhanced result to show number of matches, mismatches,
	  and indels in alternative to chimera

	* gmap.c: Removed check for badoligos. Modified logic for computing
	  chimeras. Made calls to initialization and termination routines for
	  Dynprog_T.

	* chimera.c: Fixed memory leak

	* pair.c: Removed printing of ntrimmed nucleotides

	* stage3.c, stage3.h: Added functions for reporting matches, mismatches,
	  indels, and margin of a Stage3_T object

	* translation.c: Added initial values for translation_start and
	  translation_end

	* stage2.c: Removed computation of stage2 support. Simplified loop
	  conditions.

	* oligoindex.c, oligoindex.h: Removed computation of stage2 support

	* dynprog.c, dynprog.h: Replaced functions with arrays for computing
	  pairdistances and jump penalties

2005-05-05 twu

	* oligoindex.c, oligoindex.h: Changed memory allocation scheme, by setting
	  ALLOCSIZE == MAXHITS. Assigning blocks in ascending order of available
	  slots. Computing trim_start and trim_end. Reporting support for stage 2.

	* gmap.c: Changed calls to Sequence_read(). Using oligomer-based method for
	  trimming query sequence.

	* md5-compute.c, oligo-count.c: Changed calls to Sequence_read()

	* sequence.c, sequence.h: Removed poly-A and poly-T detection in favor of
	  oligomer-based trimming at ends.

	* stage2.c, stage2.h: Added check for stage 2 support.

	* stage1.c: Restored terminal sampling for short sequences. Fixed potential
	  bug with subtracting unsigned ints. Enhanced debugging messages.

	* md5-compute.c, oligo-count.c, sequence.c, sequence.h: Modified functions
	  to report next char in input.

	* matchpair.c, matchpair.h: Added reporting of stage1 support and stage1
	  stretch.

	* gmap.c: Added checks for bad input sequences based on oligo depth, bad
	  oligos, stage1 support, and stage2 support. Moved message about batch
	  mode earlier, if evidence of a second sequence is present.

	* gmap.c: Chopping chimeras at breakpoint, and providing a flag to allow
	  overlaps at the breakpoint.

	* stage3.c, stage3.h: Simplified interface to Stage3_copy.

	* pair.c, pair.h: Removed coverage correction for genomic gaps. Added way
	  to turn off merge_gaps during copying of pairs.

2005-05-04 twu

	* stage2.c: Made changes in individual instructions to improve speed

	* oligoindex.c: Added overabundant field

	* chimera.c: Speeded up computation

	* gmap.c: Using explicit step for marking oligos in the query. Terminating
	  attempt at mapping if oligo depth exceeds 2. Fixed memory leak.

	* stage2.c, stage2.h: The variable badsequencep is now fed into
	  Stage2_compute.

	* stage1.c: Killed terminal sampling for short sequences. Reduced values
	  for maxentries. Both done to improve speed.

	* oligoindex.c, oligoindex.h: Added an explicit step for marking oligos in
	  the query, which needs to be done only once for each query sequence.

	* chimera.h: Added computation of margin.

	* chimera.c: Added computation of margin. Improved debugging output.

	* gmap.c: Fixed bug where bestfrom == bestto. Added check for sufficient
	  margin at ends before finding chimera.

	* gmap_compress.pl.in: Changed compression routine to handle chimera
	  information

	* chrsubset.c: Fixed bug where stdin was closed if .chrsubset file didn't
	  exist

	* stage3.h: Added function to compute matchscores for chimera detection.

	* stage3.c: Changed calls to Sequence_T functions. Added function to
	  compute matchscores for chimera detection.

	* stage2.c: Performing Stage 2 from trim start to trim end, instead of
	  entire sequence. Changed calls to Sequence_T functions.

	* stage1.c: Changed calls to Sequence_T and Reader_T functions

	* sequence.c, sequence.h: Cleaned up interface. Added ability to print
	  trimmed part of sequence.

	* Makefile.am, chimera.c, chimera.h, gmap.c, nmath.c, nmath.h, pair.c,
	  pair.h: Added chimera detection based on Chow test

	* md5-compute.c: Changed call to Sequence_T function. Using full sequence
	  now for MD5 computation.

	* matchpair.c: Removed call to Sequence_T function

	* oligoindex.c: Changed calls to Sequence_T functions

	* oligo-count.c: Changed call to Reader_new

	* get-genome.c: Changed call to Sequence_print

	* reader.c, reader.h: Storing querystart and queryend in Reader_T object

	* block.c, block.h: Removed unnecessary field

2005-05-03 twu

	* gmap.c, gmapindex.c, indexdb.c, indexdb.h: Allowed indexing interval of
	  12-mers to be specified at run time

	* configure.ac: Added check for madvise function

	* README: Added Ensembl format as a recognized coordinate format

	* md_coords.pl.in: Improved prompt for alternate chromosomes

	* genome.c, iit-read.c, indexdb.c: Put compiler flags around madvise

	* datadir.c: Deleted line that was causing problems when the GMAPDB
	  environment variable was set

2005-05-01 twu

	* fa_coords.pl.in: Further fixed coordinates

	* fa_coords.pl.in: Removed addition of 1 to coordinates. Added parsing for
	  Ensembl format.

	* gmap_setup.pl.in: Testing accessions with and without version numbers

	* md_coords.pl.in: Making -U and -A flags standard. Can exclude unmapped
	  contigs and alternate chromosomes with chrsubsets.

	* md_coords.pl.in: Fixed case where direction eq "0".

	* oligoindex.c: Modified memory allocation scheme to have a fixed block of
	  memory that expands when necessary.

	* iit_get.c: Added -A back to allowed flags.

	* chrsubset.c: Added debug statements

	* VERSION: Updated version

2005-04-20 twu

	* sequence.c: Kept poly-A and poly-T limits when specifying subsequences.

	* pair.c: Added an exception handler. Removed minor bug where first pair
	  was handled twice.

	* gmapindex.c: Allowed compress and uncompress routines to take a filename
	  as an argument. Added wraplength option for uncompress.

	* gmap.c: Fixed bug in specifying wrong sequence length for computing
	  chimeras. Removed limit on number of paths for finding chimeras. Added
	  exception handler.

	* except.c: Modified behavior of exception handler

	* genuncompress.c: Fixed problem if positions were greater than allowed for
	  signed ints.

	* compress.c, compress.h: Added wraplength option to Compress_uncompress.

2005-04-19 twu

	* sequence.c, stage1.c: Added checks for null before freeing memory.

	* gmap.c: Made IIT_get return an array of ints, rather than an Intlist, to
	  reduce repeated small memory allocations. Placed a limit on npaths for
	  finding chimeras.

	* get-genome.c, iit-read.c, iit-read.h, iit_get.c, segmentpos.c, stage3.c:
	  Made IIT_get return an array of ints, rather than an Intlist, to reduce
	  repeated small memory allocations.

	* mem.c: Added debugging statements.

2005-04-18 twu

	* dynprog.c: Added memory allocation routines in cases where problem size
	  exceeds maxlength of Dynprog_T. Removed unused code for affine gap
	  penalties.

2005-04-12 vivekr

	* cvswrappers: Added binary extensions

2005-03-11 twu

	* gmap_setup.pl.in, md_coords.pl.in: Allowed for contigs to be reverse
	complement

	* fa_coords.pl.in: Removed unused functions

	* gmap.c, get-genome.c: Moved dump functions to get-genome

	* segmentpos.c: Fixed bug when alternate strain contig exists but reference
	is to reference strain

	* get-genome.c, iit-read.c, iit-read.h: Changed output of dump functions

	* README: Added instructions for specifying reverse coordinates

	* VERSION: Changed version number

2005-03-09 twu

	* gmapindex.c, iit-read.c: Now storing information about reverse
	complementing of contigs

	* match.c, pair.c, pair.h, segmentpos.c, segmentpos.h, stage3.c: Limited
	printing of contigs to those that are relevant for a given strain.

	* get-genome.c, gmap.c: Fixed bug when using the -R release flag.

2005-03-08 twu

	* get-genome.c: Changed default behavior to print just the reference strain.
	Added a flag to print all strains.

2005-03-04 twu

	* chrsubset.c: Fixed minor memory leak

	* VERSION: Updated version

	* README: Added explanation of chromosome subsets

	* chrsubset.c: Changed Chrsubset_T object to be NULL when a blank list is
	read in .chrsubset file.

	* gmap.c: Incorporated chrsubset. Fixed printing of option flags.

	* gmap_setup.pl.in: Added creation of chrsubset file

	* whats_on: Changed directories where genomic maps are located

	* Makefile.am, chrsubset.c, chrsubset.h, params.c, params.h, stage1.c,
	stage1.h: Added capability to search on chromosome subsets

	* separator.h: Changed separator back to dashes

	* iit-read.c: Changed format of dumping typestrings for .altstrain.type file.

	* gmapindex.c: Added writing of .altstrain.type file.

	* stage1.c: Removed unused code. Using stage1size instead of INDEX1PART in
	some places.

	* gmap.c: Added error message.

	* datadir.c: Removed unused error message.

2005-03-03 twu

	* stage1.c: Introduced idea of dangling matches at ends, and using it to
	determine when to sample further at each end, and when to sample from the
	middle.

2005-03-02 twu

	* separator.h: Changed separator from dashes to dots.

	* stage1.c: Fixed a bug in find_3prime_matches. Changed sampling to avoid
	terminal sampling, and to redo sampling just before nskip is zero. This is
	done to avoid long computation times with terminal sampling on long cDNAs.

2005-03-01 twu

	* matchpair.c, matchpair.h: Added a boundmethod type.

	* stage1.c: Added code for finding matches using triplets, but not using it.
	Removing terminal sampling, and performing a redo of last sampling instead.

2005-02-18 twu

	* params.c, params.h: Removed maxintronlen from the params structure.

	* gmap.c: Increased default maxintronlen to 1.2M, and provided a flag to
	allow user to change this value.

	* perl.m4, configure.ac: Changed name of macro

	* configure.ac: Added check for Perl with needed modules. Added warning
	messages to bottom of configure script.

	* config.site: Added option for user to specify a value for PERL

	* acinclude.m4: Added check for Perl with needed modules

	* perl.m4: Added check for Perl with appropriate modules

	* VERSION: Set version number

	* COPYING, config.site: Changed wording slightly

	* README: Removed optional comment after make check

2005-02-17 twu

	* datadir.c, gmap_setup.pl.in: Allowed subdirectory to be present in the -d
	flag

	* config.site: Fixed advice on installing in build directory

	* README: Fixed some textual errors

2005-02-16 twu

	* gmap_setup.pl.in: Modified instruction text

	* md_coords.pl.in: Added guessing of columns

	* genome.c, genuncompress.c, iit-read.c, indexdb.c: Added type cast to avoid
	compiler warnings for munmap.

	* configure.ac: Removed capitalization

	* VERSION: Updated version

	* configure.ac: Capitalized message when compilation of pthreads fails

	* Makefile.am: Added subdirectories

	* iit_get.out.ok, iittest.iit.ok: Added okay files for IIT programs

	* AUTHORS: Minor text change

	* gmap_setup.pl.in: Changed usage statement

	* acx_pthread.m4: Updated macro to latest version

	* configure.ac: Added tests for IIT programs. Changed call to ACX_PTHREAD.

	* config.site.gne: Changed name from genomedir to gmapdb

	* config.site: Added lines for PTHREAD_CFLAGS and PTHREAD_LIBS

	* MAINTAINER: Added instructions for building .ok files for tests

	* align.test.in, coords1.test.in, map.test.in, setup1.test.in,
	setup2.test.in: Added ${srcdir} where necessary to make distcheck happy

	* Makefile.am, fa.iittest, iit.test.in, iit_dump.test.in, iit_get.test.in,
	iit_store.test.in: Added tests for IIT programs

	* gmap.c: Changed ENABLE_PTHREADS to HAVE_PTHREAD. Added reporting of
	features to version command.

	* blackboard.c, reqpost.c: Changed ENABLE_PTHREADS to HAVE_PTHREAD

	* iit_store.c: Changed flags and calling convention

	* Makefile.am: Removed ENABLE_PTHREADS and POPT_LIBS.

	* acinclude.m4, acx-pthread.m4, acx_pthread.m4: Changed name of file

	* README: Completed instructions

	* COPYING: Completed license terms

	* acinclude.m4, config, acx-pthread.m4, expand.m4, mmap-flags.m4,
	pagesize.m4: Put m4 macros into separate files

	* configure.ac: Commented out code for AC_PROG_LIBTOOL. Added some compiler
	checks.

2005-02-15 twu

	* gmap_setup.pl.in: Removed IO::Dir. Changed behavior if -I flag not given.
	Added -9 for debugging behavior.

	* fa_coords.pl.in, md_coords.pl.in: Removed IO::Dir

	* iit-read.h, iit-write.h: Fixed compiler complaint about double typedef for
	IIT_T

	* iit-read.c: Fixed one-off problem with IIT_totallength.

	* genome-write.c: Fixed monitoring statements.

	* gmap.c: Put pthreads information in version text.

	* gmapindex.c: Fixed problem in comparing an int (255) with EOF (-1) on some
	machines.

	* Makefile.am, align.test.in, align.test.ok, coords1.test.in,
	coords1.test.ok, map.test.in, map.test.ok, setup.genomecomp.ok,
	setup.idxpositions.ok, setup1.test.in, setup2.test.in, ss.chr17test:
	Expanded test suite

	* Makefile.am, ss.cdna, ss.chr17test, ss.her2: Initial addition to CVS
	repository.

2005-02-14 twu

	* gmap_setup.pl.in, md_coords.pl.in: Moved functionality to separate
	md_coords program

	* Makefile.am: Added fa_coords program

	* fa_coords.pl.in: Added file to CVS repository.

	* block.c, block.h, compress.c, dynprog.c, dynprog.h, genome-write.c,
	iit-read.c, iit-write.c, indexdb.c, interval.c, intron.c, match.c,
	match.h, matchpair.c, matchpair.h, md5.c, md5.h, md5.t.c, oligo.c,
	oligo.h, pair.h, pairpool.c, pairpool.h, reader.c, request.c, result.h,
	segmentpos.c, segmentpos.h, smooth.c, smooth.h, stage1.c, stage3.c,
	stopwatch.c, translation.h: Cleaned up included headers

	* table.c, table.h, tableint.c, tableint.h, chrom.c, chrom.h: Clarified
	meaning of unsigned type.

	* reqpost.h: Using Blackboard_T in interface.

	* oligo-count.c: Fixed call to Block_new.

	* listdef.h: Added a define for T.

	* iitdef.h: Moved typedef to iit-read.h and iit-write.h.

	* iit_get.c: Removed popt library calls.

	* iit-read.h, iit-write.h: Moved include of iitdef.h to .c files.

	* get-genome.c: Using SEPARATOR now instead of DASH.

	* datadir.c, datadir.h: Formatting changes.

	* gmap.c, oligoindex.c, oligoindex.h, params.c, params.h, stage2.c,
	stage2.h: Moved get_mappings command to be in oligoindex.c. Moved
	indexsize to be stored in Params_T.

	* complement.c, complement.h, genome.c, pair.c, sequence.c, translation.c:
	Changed complement table to be a macro.

	* blackboard.h: Added comments about include of reqpost.h.

	* Makefile.am: Cleaned up source files needed for each binary.

	* shortoligomer.h: Removed file. Definition needed only by oligoindex.c.

	* bigendian.h, genuncompress.c, iit-write.c, littleendian.h: Conditionally
	include littleendian.h.

	* iit-read.h: Added function to compute total length.

	* iit-read.c: Conditionally include littleendian.h. Added function to
	compute total length.

	* indexdb.h: Allow user to force building of positions file in file.

	* indexdb.c: Conditionally include littleendian.h. Allow user to force
	building of positions file in file.

	* genome-write.c: Added explanation of file format.

	* genome.c: Changed type from unsigned int to UINT4. Conditionally include
	littleendian.h.

	* compress.c, compress.h: Added ability to create genome file in memory, if
	enough is available. Changed type from unsigned int to UINT4.

	* Makefile.am, genome-write.c, genome-write.h, gmapindex.c: Moved procedures
	for writing genome file to a new file. Added ability to create genome
	file in memory, if enough is available.

2005-02-10 twu

	* iit_get.c: Added include for strings.h to handle rindex.

	* bigendian.h, genome.c, genuncompress.c, indexdb.c, sequence.c: Added
	includes for stddef.h to handle size_t

	* genome.c, genuncompress.c, iit-read.c, indexdb.c: Added check for
	HAVE_SYS_STAT_H

	* gmap.c, gmapindex.c, oligo-count.c: Removed include of sys/stat.h

	* iit-read.c: Commented out include of sys/param.h

	* genome.c, indexdb.c: Commented out include of errno.h

	* except.c: Removed code for mailing error messages to developer.

	* genome.c, genuncompress.c, gmapindex.c, iit-read.c, iit_store.c,
	indexdb.c, md5-compute.c, stopwatch.c: Added checks for HAVE_UNISTD_H and
	HAVE_FCNTL_H.

	* blackboard.c, compress.c, datadir.c, genome.c, genuncompress.c, gmap.c,
	gmapindex.c, iit-read.c, iit_store.c, indexdb.c, oligo-count.c, reqpost.c,
	stopwatch.c: Added check for HAVE_SYS_TYPES_H

	* genome.c, genomicpos.c, iit-write.c, indexdb.c, match.c, md5.c,
	oligoindex.c, pair.c, sequence.c: Created separate macros for handling
	absence of memcpy and memmove.

	* genome.c, genomicpos.c, iit-write.c, indexdb.c, match.c, md5.c,
	oligoindex.c, pair.c, sequence.c: Included macros for handling computers
	without memcpy or memmove.

	* datadir.c: Included macros for handling computers without dirent.h.

2005-02-10 jmurray

	* cvswrappers: Added binary extensions

2005-02-07 twu

	* chimera.c, translation.c: Fixed rcsid lines

	* Makefile.am, uinttable.c, uinttable.h: Removed files uinttable.c and
	uinttable.h

	* bigendian.c: Added ending quotation mark to rcsid.

	* bigendian.h, chimera.h, scores.h, separator.h: Added Id comment to
	beginning of header files.

	* assert.c, assert.h, bigendian.c, bigendian.h, blackboard.c, blackboard.h,
	block.c, block.h, boyer-moore.c, chimera.c, chrnum.c, chrnum.h, chrom.c,
	chrom.h, complement.c, complement.h, compress.c, datadir.h, dynprog.c,
	dynprog.h, except.c, except.h, genome.c, genome.h, genomicpos.c,
	genomicpos.h, get-genome.c, gmap.c, gmapindex.c, iit-read.c, iit-read.h,
	iit-write.c, iit-write.h, iit_dump.c, iit_get.c, iit_store.c, indexdb.c,
	indexdb.h, interval.c, interval.h, intlist.c, intlist.h, intron.c,
	intron.h, list.c, list.h, match.c, match.h, matchpair.c, matchpair.h,
	md5-compute.c, md5.c, md5.h, mem.c, mem.h, mutation.c, mutation.h,
	oligo-count.c, oligo.c, oligo.h, oligoindex.c, oligoindex.h, pair.c,
	pair.h, pairpool.c, pairpool.h, params.c, params.h, reader.c, reader.h,
	reqpost.c, reqpost.h, request.c, request.h, result.c, result.h,
	segmentpos.c, segmentpos.h, sequence.c, sequence.h, smooth.c, smooth.h,
	stage1.c, stage1.h, stage2.c, stage2.h, stage3.c, stage3.h, stopwatch.c,
	stopwatch.h, table.c, table.h, tableint.c, tableint.h, translation.c,
	translation.h, uintlist.c, uintlist.h, uinttable.c, uinttable.h: Moved
	HAVE_CONFIG_H from .h file to .c file.

	* datadir.c: Added check to see if closedir succeeded.

	* Makefile.am: Augmented list of bin programs.

	* get-genome.c, match.c, pair.c, pair.h, stage3.c, stage3.h: Changing
	variable names to genomesubdir, fileroot, and dbversion.

	* gmap.c: Added -g flag. Changing variable names to genomesubdir, fileroot,
	and dbversion.

	* params.c, params.h: Made dbversion a static variable.

	* genome.c, genome.h, indexdb.c, indexdb.h: Changing variable names to
	genomesubdir and fileroot.

	* datadir.c, datadir.h: Now searching subdirectory to find name of fileroot,
	which can be different from subdirectory name.

	* pair.c: Removed unnecessary math.h header. Added initialization of donor
	and acceptor arrays.

	* getopt.c: Removed internationalization code.

	* gmap.c: Removed unnecessary math.h header. Changed location of map
	directory for each genome.

	* matchpair.c, oligoindex.c, segmentpos.c, smooth.c, stage3.c: Removed
	unnecessary math.h header.

	* indexdb.c, indexdb.h: Allowed user to build positions file directly to
	disk, if sufficient memory is unavailable.

	* mem.c, mem.h: Added procedures for allocating memory without throwing an
	exception.

	* gmapindex.c: Changed flags. Allowed user to build positions file directly
	to disk, if sufficient memory is unavailable.

	* chrom.c: Eliminated printing of initial zero on non-numeric chromosomes.

2005-02-03 twu

	* gmap_setup.pl.in: Removed -R flag, and symbolic links. Fixed problems
	with parsing unmapped contigs in seq_contig.md files.

	* gmapindex.c: Added debugging statements.

2005-01-27 twu

	* config.site: Added warning about non-absolute paths.

	* README: Added comments about downloading a genome database.

	* Makefile.am: Added extra commands for "make distcheck" to be happy.
	Removed genome example.

	* MAINTAINER: Added comment about --enable-fulldist

2005-01-26 twu

	* config.site, configure.ac, Makefile.am, datadir.c: Changed GENOMEDIR to
	GMAPDB.

2005-01-25 twu

	* ss.AA005326, ss.cdna: Changed name of example cDNA sequence.

2005-01-24 twu

	* MAINTAINER: Added recommended steps for creating a distribution.

	* Makefile.am, chrnum.c, chrnum.h, chrom.c, chrom.h, genome.c, genome.h,
	get-genome.c, gmap.c, gmapindex.c, match.c, match.h, matchdef.h,
	matchpair.c, pair.c, pair.h, segmentpos.c, segmentpos.h, stage1.c,
	stage3.c, stage3.h: Made changes to allow chromosome names to be
	arbitrarily long

	* gmap_setup.pl.in: Removed restriction on chromosome name length. Stripped
	spaces from beginning and end of input. Added step to create initial
	genomedir.

	* config.site: Changed defaults in config.site.

	* getopt.c, getopt.h, getopt1.c: Added gnu getopt_long function

	* tests, ss.AA005326: Added test sequence.

	* Makefile.am: Created Makefile.am in util subdirectory

	* gmap_setup.pl.in: Fixed bug due to missing quotation mark

	* configure.ac: Removed dependence upon popt library

	* Makefile.am, get-genome.c, gmap.c: Added gnu getopt_long procedure

	* README: Changed prerequisites. Improved formatting.

2005-01-23 twu

	* gmap_setup.pl.in: Added procedures for handling UCSC genomes.

	* iit_store.c: Using Tableint_T instead of Table_T for types.

	* Makefile.am: Removed some unnecessary source files.

	* configure.ac: Added ACX_EXPAND, turned off popt, and fixed problem when no
	threads compilation is possible.

	* config.site.gne: Added comments for profiling and making .third file.

	* acinclude.m4: Added macro for ACX_EXPAND.

	* README: Added mention of examples and make check.

	* Makefile.am: Added extra dist files for examples.

	* oligoindex.c: Created a union type to make clear the possible storage of
	either a position or a pointer to an array of positions.

	* datadir.c: Removed unused function.

	* gmapindex.c, table.c, table.h, tableint.c, tableint.h, uinttable.c,
	uinttable.h: Added an end value to avoid problems when table length is 0.

	* Makefile.am, tableint.c, tableint.h, uinttable.c, uinttable.h: Made
	specific table types.

	* gmap.c: Removed duplicate getopt line.

	* iit_get.c: Fixed compilation bug when popt not available.

	* gmapindex.c: Used specific table types and keys/values functions.

	* table.c, table.h: Made functions Table_keys and Table_values

	* gmap_uncompress.pl.in: Using BINDIR for substitution.

	* Makefile.am: Removed Makefile.am

	* gmap_setup.pl.in: Major changes made to provide both interactive and
	command-line use.

2005-01-22 twu

	* configure.ac: Allowed hyphens to be in the version number

	* MAINTAINER, bootstrap, config.site, config.site.gne: Added local
	config.site to CVS directory

	* MAINTAINER: Added notes for maintainer

	* README: Simplifying the installation instructions

	* configure.ac: Made configuration easier by adding VERSION and config.site
	files. Removed MAPDIR. Added Perl scripts.

	* VERSION, config.site: Made configuration easier by adding VERSION and
	config.site files.

	* gmap_compress.pl.in, gmap_uncompress.pl.in: Changed file from .pl version
	to .pl.in version.

	* Makefile.am: Moved Perl scripts to util subdirectory.

	* datadir.c, datadir.h, gmap.c: Moved map files to a subdirectory in genome
	directory.

	* gmapsetup.pl.in: Moved file to util subdirectory.

	* whats_on: Changed location of map files to be inside genome directories.

	* gmap_compress.pl, gmap_uncompress.pl: Changing scripts from .pl to .pl.in
	version

	* README, configure.ac, Makefile.am, compress.c, datadir.c, genome.c,
	get-genome.c, gmap.c, gmapsetup.pl.in, iit-read.c, indexdb.c,
	segmentpos.c, gmap_compress.pl, gmap_compress.pl.in, gmap_setup.pl.in,
	snap.c, snapbuild.pl.in, snapindex.c, snap_compress.pl,
	snap_uncompress.pl: Renamed program from snap to gmap

	* gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl: Better handling
	of MD5 info and aa lines.

	* gmap_uncompress.pl, gmap_uncompress.pl.in, snap_uncompress.pl: Handling
	arbitrary flags in compression.

	* mutation.c, mutation.h, pair.c, pair.h, pairdef.h, translation.c: Added
	refquerypos to print nucleotide position of mutations.

	* get-genome.c: Fixed problem with empty header for reference sequence when
	specific strain is requested.

2005-01-19 twu

	* translation.c: Fixed problem with printing of an AA in an intron.

	* mutation.c: Consolidated point mutations near a segmental mutation.

2005-01-06 twu

	* translation.c: Fixed detection of deletion mutations where aapos was
	advancing in a gap.

	* mutation.c, mutation.h, translation.c: Fixed cases where a single-position
	mutation was reported next to a segmental mutation.

	* stage3.c: Added debugging statements for relative alignment.

2005-01-05 twu

	* translation.c: Allowed lower case letters to translate appropriately to a
	codon.

2004-12-21 twu

	* stage3.c: Performing microexon search for all defect rates. Adjusted
	acceptable mismatches for low-quality sequences.

2004-12-20 twu

	* translation.c: Increased IGNORE_MARGIN to deal with nucleotide coordinates.

	* stage3.c: Changed criteria for performing microexon search.

	* translation.c, translation.h: Fixed detection of large deletions relative
	to reference sequence. Fixed printing of cDNA aa in a gap.

	* stage3.c: Changed criterion for starting microexon search to add
	mismatches and indels. Fixed detection of large deletions relative to
	reference sequence.

	* gmap.c, snap.c: Set chimera threshold to 0 for default. Reduced band from
	10 to 7.

	* dynprog.c: Reduced pvalue thresholds for microexons.

2004-12-19 twu

	* gmap.c, snap.c: Turned on chimera functionality. Increased dynamic
	programming band from 7 to 10.

	* stage1.c: Changed function for maxintronlen.

	* smooth.c: Increased SHORTMIDEXON_LEN from 40 to 80.

	* dynprog.c: Removed definition for INFINITY, which wasn't being used.

2004-12-18 twu

	* stage2.c: Created define parameter SAMPLE_INTERVAL.

	* gmap.c, snap.c: Change maxintronlen to be maxintronlen_bound, and compute
	new maxintronlen depending on current query length. Increased size of
	extraband_single and extraband_paired.

	* params.c, params.h, stage1.c, stage1.h: Change maxintronlen to be
	maxintronlen_bound, and compute new maxintronlen depending on current
	query length.

	* dynprog.c: Changed compute_scores_affine to have parameter list compatible
	with compute_scores (with codon penalty).

	* stage3.c: Subtracting points for non-canonical introns in determining
	direction. Doing middle introns of sequence before doing 3' and 5' ends.

	* dynprog.c: Increased pvalue thresholds.

2004-12-13 twu

	* gmap.c, pair.c, pair.h, snap.c, stage3.h: Added function for printing cDNA
	exons.

	* stage3.c: Fixed problem where we shouldn't perform single-gap dynamic
	programming because unable to peel forward and peel back.

	* translation.c: Created separate mutation types for substitution,
	insertion, and deletion. Allowed filling in of last amino acid.

	* mutation.c, mutation.h: Created separate mutation types for substitution,
	insertion, and deletion.

2004-12-09 twu

	* stage3.c: Changed calls to Translate module.

	* translation.c: Simplified code for computing protein bounds. Handled the
	case where full length is specified, but no full length protein exists.

	* mutation.c, mutation.h: Added procedures for handling multiple insertions
	and deletions.

	* translation.c, translation.h: Changed algorithm for translate_est_forward
	and translate_est_backward.

2004-12-08 twu

	* translation.c, translation.h: Changed algorithms for translate_est_forward
	and translation_est_backward. Added printing of nucleotide differences.

	* gmap.c, snap.c, stage3.c, stage3.h: Added options for printing either
	genomic or cDNA version of protein.

	* pair.c, pair.h: Added function Pair_dump_aapos.

	* mutation.c, mutation.h: Added functions for retrieving amino acids from
	mutation.

	* dynprog.c: Added slight penalty against gaps next to an intron.

2004-12-05 twu

	* Makefile.am, mutation.c, mutation.h, stage3.c, translation.c,
	translation.h: Simplified computation of translations and mutations.

	* pair.c, pair.h, pairdef.h, pairpool.c: Now printing both genomic and cDNA
	proteins.

2004-12-02 twu

	* stage3.h: Removed unused chimera code.

	* stage3.c: Removed unused chimera code. Changed criteria for finding
	microexons at end; now performed only when extension is poor and sequence
	quality is high.

	* dynprog.h, gmap.c, snap.c: Allowed user option to extend alignment past
	last match.

	* dynprog.c: Fixed bug in adding gap to replace dashes.

	* pairpool.c: Added debugging statement for creation of pairs.

	* smooth.c: Added check for negative exon length.

2004-11-29 twu

	* dynprog.c, dynprog.h: Added symbols for an intron if applicable to a large
	horizontal jump. Increased maximum microexon size.

	* stage3.c: Added peel_back and peel_forward to 5' and 3' ends before doing
	search for microexons.

2004-11-22 twu

	* scores.h, stage3.c: Added credit for dual half-canonical introns.

	* stage1.c, stage1.h: Removed unused code.

	* dynprog.c: Added parameters for PVALUE for microexon and end exon searches.

2004-11-18 twu

	* result.h: Changed interface for Result_new to match implementation.

	* Makefile.am: Added scores.h to Makefile.am.

2004-11-15 twu

	* stage3.c, stage3.h: Commented out code for extending pairs in a chimera.

	* gmap.c, snap.c: Fixed problem in rearranging best two paths for chimera.

	* pair.c: Stopped printing of the terminal amino acid '*'.

	* genome.c, indexdb.c: Added printing of a dot every 10000 pages.

2004-10-12 twu

	* gmap.c, snap.c: For chimeras that extend too long, now chopping off the
	extra part.

	* stage3.c, stage3.h: Added procedure for doing a bounded copy of a Stage 3
	object.

	* pairpool.c, pairpool.h: Added procedure for doing a bounded copy of a path.

2004-10-06 twu

	* Makefile.am, chimera.c, chimera.h, gmap.c, snap.c: Changed procedure for
	chimeras to find best pair and to order the chimeras according to query
	sequence.

	* scores.h: Moved scores for determining goodness into a separate file.

	* stage3.c, stage3.h: Added procedure for copying a Stage3 object.

	* result.c, result.h: Changed chimera information to be a position, rather
	than a boolean.

	* dynprog.c: Added code for allowing right angles, but not using at present.

	* pairpool.c: Changed print statement to work only in debug mode.

	* pair.c, pair.h: Added procedure for computing scores along a path.

2004-09-30 twu

	* gmap.c, match.c, pair.c, pair.h, sequence.c, sequence.h, snap.c, stage3.c,
	stage3.h: Added MD5 checksum for compressed output.

	* stage1.c: Added notation about using position for revcomp matches in IITs.

	* iit-read.c: Changed debugging statements to print unsigned ints.

2004-09-27 twu

	* stage3.c: Fixed problem when peeling an extra pair if it's a gap.

2004-09-09 twu

	* stage3.c: Restored behavior of crossing just one short exon for dual
	genome gap.

	* stage3.c: Peeled back one more matching pair. For dual intron gap, now
	skipping multiple short exons and keeping the longest one.

	* gmap.c, snap.c: Increased maxpeelback from 10 to 11.

	* pairpool.c, pairpool.h: Added command Pairpool_transfer_copy, although not
	currently used.

	* pair.c, pair.h: Added command Pair_check_list.

	* dynprog.c: Added end reward for bridging a cDNA gap.

	* md5-compute.c: Changed behavior from a single sequence to a FASTA file of
	multiple sequences.

	* Makefile.am: Added object file for md5-compute.

2004-09-02 twu

	* stage3.c: Fixed floating exception bug when middle_exonlength is
	non-positive.

	* stage2.c: Fixed problem of reading uninitialized value.

2004-08-30 twu

	* dynprog.c: Added check for non-positive span.

2004-07-28 twu

	* stage2.c: Changed some penalties. Using bad sequence information to
	increase lookback.

	* oligoindex.c, oligoindex.h: Added check for bad sequences (with several
	non-ACGTN characters).

	* dynprog.c: Added check for zero span.

2004-06-25 twu

	* gmap.c, snap.c: Added flag to search only reference strain.

	* stage2.c: Increased definition of ENOUGH_CONSECUTIVE. Added penalties for
	deadp.

	* stage3.c: Penalizing noncanonical introns in comparing across different
	paths.

	* segmentpos.c: Changed output of contig length.

	* pair.c, pair.h: Reporting number of noncanonical introns. Allowing
	goodness to be reported during debugging.

2004-06-23 jtang

	* cvswrappers: Added binary extensions

2004-06-20 twu

	* stage2.c: Simplified decision making for mismatch gaps. Increased
	penalties on gendistance and querydistance.

2004-06-16 twu

	* whats_on: Added get_sequences function.

	* gmap_uncompress.pl, gmap_uncompress.pl.in, snap_uncompress.pl: Implemented
	code to interpret new compression scheme.

	* stage2.c: Changed penalty functions into macros for speed. Made some
	other changes to improve speed.

	* dynprog.c: Fixed bug involving uninitialized variable.

2004-06-15 twu

	* stage2.c: Penalizing intron length per 2000 nt instead of 1000.

	* stage3.c: Using goodness scores to decide between single and dual introns.
	Searching for microexons only when sequence quality is medium to high.

	* dynprog.c, dynprog.h: Reporting nopens and nindels from Dynprog_genome_gap.

2004-06-12 twu

	* stage3.c: Added evaluation of middle exon length in deciding between
	single and dual introns.

	* stage2.c: Reduced size of INTRON_DEFN. Further penalized large query
	distances.

	* stage1.c: Increased length of additional ends of genome segment.

	* oligoindex.c: Improved debugging statements.

	* dynprog.c, dynprog.h: Allowing up to 1 mismatch on either side for
	microexon search. Reporting position of exonhead in Dynprog_genome_gap for
	use in traversing dual genome gap.

2004-06-10 twu

	* stage2.c: Added procedure to determine maximum intron length at a given
	querypos, and based penalties to be linear with that.

	* smooth.c: Added check for nullness of intronlengths.

	* dynprog.c: Added probabilistic check on microexon length for a given
	genomic span.

2004-06-09 twu

	* pair.c, pair.h, stage3.c: Keeping track of semicanonical introns and
	scoring them to decide on strand.

	* smooth.c: Made decision about deleting end exons based on probability.

2004-06-08 twu

	* get-genome.c: Fixed a bug in determining whether a query is a range.

2004-06-07 twu

	* stage3.c: Added debugging statement for microexons.

	* dynprog.c: Added minimum length for introns when looking for microexons.

	* stage2.c: Changed penalties to be more consistent on mismatches between
	different conditions, including deadp. For deadp, now requiring that
	abs(gendistance - querydistance) or querydistance be less than INTRON_DEFN.

	* smooth.c: Increased threshold on ends to be 20.

	* pair.c: For determining fracidentity (and selecting between forward and
	reverse strands), now counting semicanonical introns as canonical.

	* stage2.c: For deadp, increased lookback.

	* gmap.c, snap.c: Increased maxintronlen from 1 million bp to 2 million bp.
	Motivated by HER4 (NM_005235).

	* gmap.c, snap.c: Increased nullgap from 80 to 600.

	* stage2.c: Modified stage 2 scoring for mismatch alignments. Invoked deadp
	when fwd or rev score is zero.

2004-06-04 twu

	* translation.c: Further fixed the bug involving uninitialized heap
	(translation_start/translation_end extending beyond sequence boundaries).

	* stage2.c, stage2.h: Rewrote code into separate procedure. Increased
	gendistance penalty. Changed penalties when querypos is dead.

	* gmap.c, snap.c: Created separate parameters for extraband_end and
	extraband_paired. Renamed maxlookback to nullgap. Created nsufflookback
	parameter. Removed repetition of stage 2.

	* params.c, params.h, stage3.c, stage3.h: Created separate parameters for
	extraband_end and extraband_paired. Renamed maxlookback to nullgap.
	Created nsufflookback parameter.

	* dynprog.h: Created separate parameters for extraband_end and
	extraband_paired.

	* dynprog.c: Created separate parameters for extraband_end and
	extraband_paired. Extending last nucleotide at ends if possible. Removing
	gaps at ends.

	* translation.c: Fixed a bug involving reading/writing of uninitialized heap.

2004-06-02 twu

	* stage3.c: Doubled intron space required for a paired gap solution to be
	attempted.

	* dynprog.c: Implemented gap penalties that are non-affine, with extensions
	being the same within a codon.

	* translation.c: Fixed bug where codon was assigned improperly at a cDNA gap.

	* dynprog.c, dynprog.h, stage3.c: Added a conservative search for microexons
	at the 5' and 3' ends.

	* smooth.c: Increased pruning of ends from 8 back to 16.

2004-06-01 twu

	* stage2.c, stage2.h: Added Stage2_pathlength function.

	* gmap.c, snap.c: Increased maxpeelback from 8 to 10 and allowed program to
	redo stage2 with increased suflookback if cDNA not covered.

	* stage3.h: Changed MININTRONLEN from 9 to 6 and moved definition into .c
	file.

	* stage3.c: Made search for microexon dependent on number of mismatches.

	* dynprog.c, dynprog.h: Made Dynprog_genome_gap return number of matches and
	mismatches.

2004-05-26 twu

	* dynprog.c, dynprog.h, stage3.c: Made microexon search work in reverse
	direction. Fixed memory leak.

	* boyer-moore.h: Added RCS Id.

	* boyer-moore.c: Removed debugging statement. Added RCS Id.

	* Makefile.am, boyer-moore.c, boyer-moore.h, dynprog.c, dynprog.h, stage3.c:
	Added procedure for finding microexons. Works for forward direction only.

	* stage3.c: Increased goodness score for canonical intron when deciding
	between forward and reverse directions.

	* sequence.c: Fixed read procedure to handle PC line feeds.

	* dynprog.c: Changed end extension to allow one gap and to proceed if number
	of matches is greater than or equal to number of mismatches.

	* gmap.c, snap.c, stage3.c, stage3.h, translation.c, translation.h: Added
	option for assuming a full-length sequence.

	* pair.c: Changed printing of protein coordinates to correspond to first
	amino acid on each line.

	* gmap.c, snap.c: Added ability to print protein sequence. Fixed some flags.

	* translation.c, translation.h: Fixed calculation of translation coordinates.

	* pair.c, pair.h, stage3.c, stage3.h: Added printing of protein coordinates.

	* indexdb.c: Revised monitoring statement.

	* dynprog.c: Revised extensions of 5' and 3' ends to use best score with no
	gap, even if negative. This extends ends when there is one match and one
	mismatch.

2004-05-14 vivekr

	* cvswrappers: Added binary extensions

2004-05-05 twu

	* compress.c, gmap.c, indexdb.c, pair.c, snap.c, stage3.c, stage3.h,
	translation.c, translation.h: Made improvements to relative translation
	routines.

	* Makefile.am, compress.c, compress.h, gmapindex.c, indexdb.c, snapindex.c:
	Moved compress and uncompress routines to a new file.

2004-04-22 twu

	* translation.c: Fixed frameshift-tolerant protein computation for cases
	where cDNA deletion is 3 or more nt.

	* gmap.c, snap.c, stage3.c, stage3.h, translation.c, translation.h: Added
	feature for fixing frameshifts in reference-based protein computation and
	made it the default.

	* stage3.c, translation.c, translation.h: Changed internal data format for
	calculating translations.

	* translation.c: Fixed array bounds bug in translating from reference.

	* whats_on: Added flag for showing original headers.

2004-04-21 twu

	* gmap.c, pair.c, pair.h, pairdef.h, pairpool.c, params.c, params.h, snap.c,
	stage3.c, stage3.h, translation.c, translation.h: Added protein
	calculation for ESTs based on a reference mRNA.

2004-04-19 twu

	* gmap.c, pair.c, pair.h, params.c, params.h, snap.c, stage3.c, stage3.h,
	translation.c, translation.h: Changes to allow calculation of mutation
	effect given a specific mutation

2004-04-18 twu

	* genome.c, genome.h: Fixed code for patching strains.

	* genomicpos.c: Cleaned up code for adding commas.

	* stage1.c: Changed variable name from stutter to stutterdist.

	* gmap.c, snap.c: Added internal flag to control strain searching feature.

2004-04-17 twu

	* whats_on: Added ability to print align.iit files, rather than map.iit
	files.

	* gmap_uncompress.pl, gmap_uncompress.pl.in, snap_uncompress.pl: Added
	inversion mode.

	* gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl: Added code to
	skip protein sequence lines.

2004-03-30 twu

	* Makefile.am, gmap.c, snap.c: Added routines for printing protein sequence.

	* iit-read.c, iit-read.h: Added procedure for listing all types.

	* stage3.c, stage3.h: Fixed memory leak and bug when stage3 result is NULL.

	* translation.c, translation.h: Added routines for printing peptide sequence.

	* get-genome.c: Allowed user to select a particular strain to align against.

	* pair.c, pair.h: Added routine for printing peptide. Clarified code for
	handling inversions on minus strand. Fixed bug in compression for '#'
	character.

2004-02-23 twu

	* stage3.c: Added special cases for single mismatch and single cDNA
	insertion.

	* stage1.c: Defined maximum on finding match pairs, to eliminate slow
	response on nonsense sequences, such as poly-G.

	* stage3.c: In pass 3, force single gap to be crossed even if finalscore is
	negative, to complete alignment.

2004-02-19 twu

	* stage3.c, stage3.h: Removed unused variable minendtrigger.

	* gmap.c, params.c, params.h, snap.c: Removed global user-specified
	parameters from Params_T.

	* gmap.c, snap.c: Removed fraction_threshold parameter. Changed default
	chimera_threshold to 0.50.

	* params.c, params.h: Removed fraction_threshold parameter.

2004-02-18 twu

	* gmap.c, snap.c: Made chimera threshold definable by user, and set default
	to 0.70.

	* stage3.c: Re-defined criterion for a gap to be when queryjump <= 0 and
	genomejump <= 0, which holds true after single gaps are filled.
	Prevented filling in a genome gap when its alignment score is negative.

	* smooth.c, smooth.h: Re-defined criterion for a gap to be when queryjump <=
	0 and genomejump <= 0, which holds true after single gaps are filled.

	* matchpair.c, matchpair.h: Made size bound a fraction of the best, rather
	than subtraction.

	* stage1.c: Increased maxentries parameters. Made size bound a fraction of
	the best, rather than subtraction.

	* gmap.c, snap.c: Removed calls to Stage1_matchpairlist. Now performing
	sampling by default.

	* smooth.c: Reduced size definition of short intron. Made intron definition
	depend only on genome distance, which now includes single gaps that
	weren't filled in in stage 3.

	* stage2.c: Made a macro for query distance penalty.

	* stage3.c: Fixed problem when peeled is NULL. Added decision to not fill
	in single gap if the score is negative, and to restore peeled pairs in
	that case.

	* smooth.c: Fixed memory leak.

2004-02-17 twu

	* stage3.c: Fixed bug where program would skip over a pair after gappairs
	was added.

	* stage3.c: Made peel_back and peel_forward end at a non-gap, by
	backtracking from peeled.

2004-02-16 twu

	* stage3.c: Increased reward for canonical introns from 5 to 8.

2004-02-15 twu

	* stage3.c: Fixed bugs in peel_back and peel_forward. Fixed bug in
	computing goodness scores.

	* smooth.c, smooth.h, stage3.c: Giving information about number of short
	exons found in smoothing to stage 3 to help improve speed.

	* stage3.c: Removed occurrences of indexsize. Cleaned up procedure for
	finding middle exons in dual intron procedure.

	* smooth.c, smooth.h: Rewrote smoothing procedure to be a cleaner procedure.
	Analyzing both ends to prune short exons.

	* pair.c, pair.h: Added Pair_debug_alignment procedure.

2004-02-14 twu

	* smooth.c, smooth.h, stage2.c, stage3.c: Changed stage 2 to produce a
	nucleotide-based path, rather than 8-mer path. Changed smoothing and
	stage 3 accordingly. Made all intron distance penalties equal in stage 2.

	* dynprog.c, dynprog.h, stage3.c: Rearranged stage 3 to solve dual introns
	before other introns and large gaps. Performing smoothing iteratively
	with dual introns.

	* stage2.c: Replaced calculations of gendistance_penalty with macros.

	* smooth.c: Increased minexonlen for smoothing, because smoothing has been
	made iterative.

	* pair.c: Using memcpy commands instead of copying individual fields. Added
	diagnostic printing of short exons.

	* Makefile.am, smooth.c, smooth.h, stage3.c: Added files smooth.c and
	smooth.h and moved Smooth_path there

	* matchpair.c: Fixed memory leak.

2004-02-13 twu

	* gmap.c, snap.c, stage2.c, stage2.h, stage3.c, stage3.h: Changed stage 2
	and stage 3 algorithms to interleave in the following order: dynamic
	programming on single gaps, then smoothing, then dynamic programming on
	ends and large gaps. Allows dual intron algorithm to work even when
	middle exon has small mismatches or gaps.

	* pair.c: Fixed merge_one_gap to handle user-selected ngap != 3.

	* dynprog.c, dynprog.h: Moved definitions of defect rate boundaries to
	header file.

	* stage2.c: Subtracting (querydistance+7)/8 on mismatches to penalize once
	per 8-mer. Subtracting 1 for each intron to reduce number of introns,
	especially when a/1000 + b/1000 < (a+b)/1000.

	* stage2.c: Added separate intron penalties for consistent, unknown, and
	inconsistent introns. Increased lengths of short middle exons marked for
	dual genome gap.

	* stage2.h, stage3.c: Added intron length to goodness score.

	* stage2.c: Implemented two parallel computations in Stage 2 under forward
	and reverse assumptions. Removed firstregion and lastregion computations
	from smoothing.

2004-02-12 twu

	* gmap.c, snap.c: Removed universalp flag (-U).

	* iit_get.c: Added annotation only mode (-A).

2004-02-11 twu

	* whats_on: Simplified code greatly.

	* gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl: Removed space
	before first token. Fixed bug in reporting genomic exon length rather
	than cDNA exon length.

	* dynprog.c, dynprog.h, stage3.c: Added computation of nonintronlen for
	goodness ranking.

	* stage2.c: Introduced Link_T to hold dynamic programming data.

	* stage2.c: Modified smoothing to have keep, delete, and mark options.

	* stage2.c, stage2.h, stage3.c: Added category for introns of unknown
	direction.

	* gmap.c, snap.c: Made default for chimerasearchp false again. Added an
	automatic mode for chimera search if coverage is less than 50%.

2004-02-10 twu

	* gmap.c, snap.c: Made chimera search the default.

	* stage2.c: Added counts of forward, reverse, and non-canonical introns to
	the dynamic programming procedure, and used consistency in computing
	scores.

	* stage3.c: Added debugging macros.

	* stage2.c: Added intron penalty only for noncanonical introns.

	* stage1.c: Added debugging statements.

	* matchpair.c: Reduced MAXCANDIDATES from 30 to 10.

	* dynprog.c, pair.c, pairdef.h: Added '~' character for non-canonical gaps
	converted to insertions, to avoid penalizing them as non-intron gaps.

2004-02-09 twu

	* gmap.c, iit-read.c, iit-read.h, snap.c, stage3.c, stage3.h: Changed map
	output to include strand if both strands are requested.

	* dynprog.c: Restored horizontal or vertical jump of 1 next to intron.

	* datadir.c: Changed error message.

	* stage2.c: Add penalty for number of non-canonical introns. Accumulate
	best score for introns, even if negative, and use that if no other score
	exceeds 0.

	* iit-read.h: Added name to IIT structure.

	* gmap.c, pair.c, pair.h, params.c, params.h, snap.c, stage3.h: Added
	compression feature.

	* stage3.c: Added compression feature. Added debug mode to show output from
	stage 2.

	* gmapindex.c, snapindex.c, get-genome.c, iit-read.c, iit_dump.c, iit_get.c,
	iitdef.h: Added optional name to IIT structure.

2004-02-06 twu

	* dynprog.c: Advanced counter within gaps to the next position.

	* pair.c, pair.h, pairdef.h, pairpool.c: Added the shortexonp field for
	pairs.

	* stage2.c: For smoothing of short exons, marking positions as short, rather
	than deleting them. Increased length threshold for short exons, because
	we now have a mechanism for handling them well.

	* gmap.c, snap.c: Added a dynprogM for handling short exons.

	* stage3.c, stage3.h: Removed special procedure for dual genome gaps.
	Instead comparing a single genome gap with two half genome gaps for short
	exons.

	* dynprog.c: Removed special procedure for dual genome gaps. Instead, for
	short exons, comparing a single genome gap with two half genome gaps.

	* dynprog.c: Passing pointers to revsequence and revoffset from stage3 to
	dynprog procedures where appropriate. Added preliminary code for dual
	genome gap.

	* dynprog.h, stage3.c: Passing pointers to revsequence and revoffset from
	stage3 to dynprog procedures where appropriate.

	* get-genome.c, gmap.c, pair.c, pair.h, params.c, params.h, sequence.c,
	sequence.h, snap.c, stage3.c, stage3.h: Added option for specifying wrap
	length.

	* dynprog.c: Fixed problem with sequence being short by 1 nt in conversion
	of gap to insertion.

	* dynprog.c: Convert short non-canonical introns into insertions.

	* dynprog.c: Removed reverse_sequence and creation of reverse sequence. Now
	using a boolean to determine whether to use negative indices.

2004-02-05 twu

	* README, configure.ac, Makefile.am, datadir.c, datadir.h, gmap.c, params.c,
	params.h, snap.c, stage3.c, stage3.h: Changed references to "bounds" to
	"map".

	* ddsgap2_compress.pl: Made much faster.

	* get-genome.c: Fixed get-genome for reverse complement. Added debugging
	statements.

	* dynprog.c: Added specific constraints on whether to allow gaps adjacent to
	the intron, depending on sequence quality.

2004-02-03 twu

	* dynprog.c, dynprog.h: Removed conservative option. Added comments to
	explain rationale behind scoring scheme.

	* gmap.c, params.c, params.h, snap.c, stage3.c, stage3.h: Removed
	conservative option.

	* stage3.c: Removed peelback on sequence ends. Continued peelback through
	small gaps and mismatches. Included comp of '-' in pruning of gaps at end.

	* iit-read.c: Added debugging code.

	* genome.c: Fixed faulty reasoning when patch has expansion or contraction.

	* dynprog.c: Raised penalties on paired gap alignment to prevent
	gap-match-gap being preferred to two mismatches. Added checks to bridging
	across introns to prevent genomic insertion or more than one cDNA
	insertion.

2004-02-02 twu

	* pairdef.h: Revised comment about definition of gapp.

	* pair.c: Removed comment.

	* dynprog.c: Fixed debugging statements for pairs pushed on horizontal or
	vertical moves.

	* gmap.c, pair.c, pair.h, snap.c, stage3.c, stage3.h: Added printing of
	bounds information as a separate section.

2004-01-31 twu

	* Makefile.am: Added uintlist.c and uintlist.h to source lists where
	necessary.

	* gmapindex.c, snapindex.c: Made contig intervals inclusive.

	* iit_get.c: Changed isnumber to isnumberp to avoid conflict on some Unix
	machines.

	* iit_get.c: Handle case where strlen of annotation is 0. Add carriage
	return after annotation if necessary. If one numeric argument given, try
	as a label, then as a number.

	* iit-read.c: Handle case where strlen of annotation is 0.

	* genome.c, get-genome.c: Reverted to previous IIT format, where we don't
	store lengths explicitly. For sequences, can determine actual length from
	annotation strlen.

	* iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_store.c, interval.c,
	interval.h: Reverted to previous format, where we don't store lengths
	explicitly.

	* iit_dump.c: Added warning if IIT_read fails.

	* gmapindex.c, snapindex.c: Reverted to previous format, where we don't
	store lengths explicitly. For FASTA files, count sequence length and store
	as annotation in contig_iit.

	* stage3.c: Added Pair_check procedure.

	* dynprog.c: Fixed problem with dynamic programming not going back to
	beginning. Fixed bridging across cDNA gaps.

	* datadir.c, datadir.h: Created two data directories, one for genome files
	and one for bounds files.

	* pair.c, pair.h: Added Pair_check function.

	* configure.ac, Makefile.am, gmap.c, snap.c: Created two data directories,
	one for genome files and one for batch files.

2004-01-27 twu

	* dynprog.c: Reduced mismatch and gap penalties at ends to extend ends more
	completely.

	* stage1.c: Increased length of very small sequences from 30 to 40.

2004-01-26 twu

	* gmap.c, snap.c, stage1.h: Changed criterion for good alignment on short
	sequences to be based on coverage rather than percent identity.

	* stage1.c: Sampling exhaustively on short sequences.

	* stage2.c: Removed tiebreaker based on genomic distance. Ignoring
	gendistance penalty if no better score can be found, which allows program
	to find distant 5' exons.

	* pair.c, pairpool.c, stage3.c: Restored large gap and '#' character when
	queryjump exceeds maxlookback.

	* match.c: Fixed bug where accessions were looked up on chromosomal
	coordinates instead of universal coordinates.

	* Makefile.am, datadir.c, datadir.h, snapconfig.c: Removed snapconfig and
	run-time configuration of SNAP, which doesn't work on statically built
	binaries.

2004-01-23 twu

	* gmap.c, snap.c: Updated print_usage statement for non-popt systems.

	* snapconfig.c: Added a usage statement.

	* iit_dump.c: Added a debug flag.

	* iit-write.c: Writing out elements of structs individually, instead of
	depending on an fwrite of the struct.

	* iit-read.c, iit-read.h: Fixed problem with Bigendian reads of iit files.
	Added IIT_debug function.

	* Makefile.am: Provided different dist and nodist instructions depending on
	FULLDIST.

	* stage1.c: Set maxentries during sampling to be 10 times that of scanning.
	Set stage1size for short sequences to be 12-mers for < 40 nt, and 18-mers
	for 40-80 nt.

	* pair.c, pair.h, pairpool.c: Removed '#' as a character in alignment.

	* dynprog.c, dynprog.h, stage3.c: Treated cDNA gaps (extra cDNA material) in
	a way analogous to genome gaps.

	* get-genome.c: Changed name of function from isnumber to isnumberp to avoid
	name conflict with some systems (like MacOSX) that define isnumber in
	ctype.h.

2004-01-20 twu

	* stage3.c: Fixed bug where dynamic programming of ends wouldn't go all the
	way to the end of the genomic segment.

	* dynprog.c: Fixed debug statement.

	* Makefile.am: Added file matchdef.h

	* dynprog.c, dynprog.h, gmap.c, pair.c, pair.h, params.c, params.h, snap.c,
	stage3.c, stage3.h: Added parameter for length of intron gap shown.

	* stage1.c: Added a second maxentries parameter to prevent slowness on long
	repeated inputs, like CA...CA.

2004-01-19 twu

	* stage3.c: Allowed cDNA direction to be indeterminate.

	* matchpair.c, stage1.c: Fixed clustering to work with minsize of 1.

	* dynprog.c: Reduced points for match, which improves some alignments.

2004-01-16 twu

	* gmap.c, params.c, params.h, snap.c, stage1.c, stage1.h: Removed nsamples
	as a global parameter.

	* bootstrap, configure.ac, Makefile.am: Added libtool and
	--enable-static-linking feature.

	* gmap.c, snap.c: Implemented incremental clustering based on progressively
	smaller sampling intervals. Added ability to print alignment continuously.

	* matchpair.c, matchpair.h, stage1.c: Implemented incremental clustering
	based on progressively smaller sampling intervals.

	* match.c, matchdef.h: Moved structure definition to matchdef.h

	* block.c, block.h, reader.c, reader.h: Added ability to reset ends of block.

	* stage3.c, stage3.h: Added printing of number of unknowns and of gap
	openings in cDNA and genome.

	* params.c, params.h: Added parameter for continuous output of alignment.

	* pair.c, pair.h: Added output of number of unknowns. Added procedure for
	continuous output of alignment.

	* dynprog.c: Created different penalties for gaps in single and paired gaps.

2004-01-14 desany

	* loginfo: Added Yan to e-mail notifications.

	* loginfo: Finally figured out where to put the quote (I think).

	* loginfo: e-mail command tweak

	* loginfo: Tweaking the e-mail command.

	* loginfo: Sending log messages to desany when cgh module updates are
	committed.

2004-01-14 twu

	* configure.ac: Added feature for static linking.

	* params.c, params.h: Using two parameters for stutter: stuttercycles and
	stutterhits.

	* gmap.c, snap.c: Performing sampling only when necessary. Using popt help
	when available.

	* stage1.c, stage1.h: Performing sampling only when necessary. Limiting
	size and changing parameters for bestlist.

	* matchpair.c, matchpair.h: Eliminated unused code in filtering procedure.

	* indexdb.c: Fixed fread_int to be fread_uint.

	* iit-read.c: Added abort statement when more than one interval retrieved by
	IIT_get_one.

	* get-genome.c: Fixed bug with accessing chromosome_iit after being freed.
	Using popt help when available.

	* oligo.c, oligo.h: Added Oligo_skip function.

	* block.c, block.h: Removed maxtries and added Block_skip.

2004-01-12 twu

	* gmap.c, snap.c, stage1.c: Changed strategy to use clusters of matches,
	after first pair found.

	* gmapindex.c, snapindex.c: Eliminated check for genome database in
	compression mode.

	* stage2.c: Changed distance penalty to 1 point per 1000 nt.

	* pair.c, pair.h, stage3.c: Keeping separate track of query indels and
	target indels.

	* genome.c, genome.h, get-genome.c: Implemented check for gbufferlen when
	shifting old sequence.

	* separator.h: Added file for separator information.

2004-01-09 twu

	* Makefile.am, get-genome.c: Changed program to use chromosome_iit and
	contig_iit, rather than text files.

	* genome.c: Fixed bug from call to madvise on NULL region.

	* iit-read.c, iit-read.h: Added function IIT_read_linear.

	* gmapindex.c, snapindex.c: Storing length in interval of contig_iit, rather
	than in annotation.

	* stage1.c: Changed paired algorithm to use sum of reciprocals of number of
	hits.

	* get-genome.c: Removed unnecessary decompression functions (now in
	genome.c).

	* gmap.c, snap.c: Fixed bug where fraction_threshold was declared as int
	rather than double.

	* stage1.c: Revised algorithm to count number of query hits on 5' and 3'
	ends.

	* Makefile.am, datadir.c, datadir.h, get-genome.c, gmap.c, snap.c,
	snapconfig.c: Moved datadir functions to a separate file.

	* gmapindex.c, snapindex.c: Changed format of text files .chromosome and
	.contig.

2004-01-08 twu

	* genome.c, genome.h, get-genome.c, gmap.c, iit-write.c, iit-write.h,
	snap.c: Allowed genomic patches to be longer or shorter than their
	endpoints.

	* gmapindex.c, snapindex.c: Allowed intervals to have length that is
	different from their endpoints. Changed format for fasta file input to
	snapindex.

	* iit_store.c, interval.c, interval.h, segmentpos.c, segmentpos.h: Allowed
	intervals to have length that is different from their endpoints.

	* iit-read.c: Added carriage returns to annotations, if absent.

2004-01-07 twu

	* gmap.c, params.c, params.h, snap.c: Made fraction_threshold a parameter.

	* stage2.c: Changed calculation of penalty for large genome distances to be
	done only when necessary.

	* snapconfig.c: Changed feedback message.

	* genome.c, indexdb.c: Improved warning messages when memory mapping fails.

2004-01-05 twu

	* snapdir.c: Changed name of snapdir to snapconfig.

	* gmap.c, match.c, match.h, params.c, params.h, result.c, result.h, snap.c:
	Restored alignment using stage 1 only.

	* stage1.c, stage1.h: Moved decision of stage1size and maxentries to here.

	* genome.c: Added warning message if memory mapping of genome fails.

	* genome.c: Restored batch memory mapping of genome.

	* stage1.c: Greatly increased MAXENTRIES parameter.

	* gmap.c, params.c, params.h, snap.c: Made stage1size dependent upon
	sequence length, with short sequences getting stage1size of 12.

	* gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl, whats_on:
	Generalized parse for coordinate separator.

	* get-genome.c: Restored -- as coordinate separator.

2003-12-19 twu

	* gmap.c, sequence.c, sequence.h, snap.c, stage3.c, stage3.h: New approach
	to chimeras, involving a subsequence and new stage1 procedure.

	* stage2.c: Added distance penalty for long introns.

	* Makefile.am, pair.c, segmentpos.c: Included separator.h

	* pair.c, segmentpos.c, segmentpos.h: Removed unnecessary parameters in
	Segmentpos_print_accessions.

	* get-genome.c: Change in coordinate separator from -- to ..

2003-12-17 twu

	* gmap.c, match.c, match.h, matchpair.c, matchpair.h, snap.c, stage1.c,
	stage1.h: Changed procedures for finding chimeras to try singlelist of the
	appropriate side.

	* pair.c, segmentpos.c: Changing coordinate output from -- to ..

	* stage3.c, stage3.h: Changed procedures for finding chimeras to try
	singlelist of the appropriate side. Fixed bug in computing chimeric
	goodness.

	* dynprog.c, dynprog.h: Provided separate parameters for ends, removed
	multiplicative reward, and changed all score calculations to be integers.

2003-12-16 twu

	* matchpair.c, stage1.c, stage1.h: Fixed bug with position calculations on
	large chromosomes (> 2 Gig).

	* gmap.c, matchpair.c, matchpair.h, snap.c, stage1.c, stage1.h: Based
	algorithm for finding extensions on 12-mers.

	* chrnum.c, chrnum.h: Added function for computing chromosomal string and
	position from genomic position.

2003-12-15 twu

	* gmap.c, matchpair.c, matchpair.h, params.c, params.h, snap.c: Made
	extension linear depending on query length.

	* stage1.c, stage1.h: Made cluster list depend on size of largest cluster.

2003-12-14 twu

	* stage2.c: Added a minimum exon length for ends during smoothing.

	* stage1.c, stage1.h: Added a last-resort procedure for trying all matches
	found in stage 1. Enhanced debugging statements.

	* gmap.c, snap.c: Added a last-resort procedure for trying all matches found
	in stage 1.

	* oligoindex.c, shortoligomer.h: Returned to old method for store_positions,
	because it appears to be faster.

	* genome.c, genome.h, get-genome.c: Enhanced debugging statements.

	* matchpair.c: Added assertions about strands and relative position of
	matches.

	* stage2.c: Returned to old method for store_positions. Fixed smoothing for
	a single exon.

2003-12-13 twu

	* oligoindex.c, shortoligomer.h, types.h: Further attempt to increase speed
	of store_positions.

	* gmap.c, snap.c: Fixed memory leak when stage3array is recomputed.

	* oligoindex.c, oligoindex.h, stage2.c: Increasing speed of store_positions
	by reducing number of calls to calloc.

	* gmap.c, matchpair.c, matchpair.h, params.c, params.h, snap.c, stage1.c,
	stage1.h: Changed cluster algorithm to rank clusters based on size and
	process the top ones based on sum of sizes.

2003-12-12 twu

	* genome.c: Added check for enddiscard being 0.

	* stage2.c: Did an in-lining of intron_score.

	* gmap.c, params.c, params.h, snap.c, stage1.c, stage1.h, stage3.c,
	stage3.h: Added new cluster algorithm for stage 1, used when paired
	algorithm fails to produce an alignment with high identity.

	* gmap.c, snap.c: Added ability to modify binary file to include default
	genome directory.

	* snapconfig.c, snapdir.c: Initial import into CVS.

2003-12-10 twu

	* gmapindex.c, indexdb.c, indexdb.h, snapindex.c: Added ability to generate
	idxoffsets and idxpositions files from compressed genome.

	* gmap.c, snap.c: Changed the uncompressed flag from -G to -g.

	* gmapindex.c, snapindex.c: Implemented direct writing of compressed genome
	file.

2003-12-09 twu

	* iit_store.c: Fixed bug where non-copied string is entered into table.

	* iit_get.c: Improved error message.

	* iit_dump.c: Added function for showing all types.

	* table.c: Added debugging statements.

	* gmap.c, params.c, params.h, snap.c: For user-provided segments, skipping
	  stage 1 (although can be specified by the user), to achieve increased
	  speed.

	* sequence.c, sequence.h: Restored function Sequence_revcomp.

2003-12-04  twu

	* stage1.c: Restored cluster algorithm for short sequences.

	* gmap.c, snap.c: Generalized definition of chimera, and reduced percentage
	  to 80%.

2003-12-03  twu

	* Makefile.am, iit-read.c, iit-read.h, iit_get.c: Augmented iit_get to
	  handle types and file input.

	* gmap.c, intlist.c, intlist.h, sequence.c, sequence.h, snap.c: Allowed
	  user-specified genomic segment to have arbitrary length.

	* gmap.c, snap.c: Restored -U flag for reporting in universal coordinates.

	* iit-read.c: Fixed bug in IIT_dump_formatted.

	* Makefile.am, md5-compute.c: Added program md5-compute.

2003-12-01  twu

	* gmap.c, params.c, params.h, snap.c: Added message to user when FASTA file
	  is run without batch mode.

2003-11-28  twu

	* oligo.c: Changed debug statements.

	* reader.c: Cleaned up pointer calculation.

	* sequence.h: Removed Sequence_revcomp, which is not used.

	* sequence.c: Revised comments.

	* stage2.c: In-lined gap_score.

	* indexdb.c: More bug fixes for bigendian machines on user-provided segments.

	* indexdb.c: Fixed a problem with bigendian machines for user-provided
	  segments.

	* gmap.c, snap.c: Added releasestring in attempt to find version file.

	* genome.c, genome.h: Added option for replacing X's with N's.

	* get-genome.c: Added option for replacing X's with N's. Fixed bug when
	  closing a null file pointer.

	* iit_store.c: Append .iit to given filename, instead of replacing existing
	  suffix.

2003-11-26  twu

	* gmapindex.c, snapindex.c: Removed -U flag.

	* gmapindex.c, indexdb.c, indexdb.h, snapindex.c: Reverted back to using
	  uncompressed genome for making idxoffsets and idxpositions.

	* gmap.c, snap.c: Changed flag for uncompressed genome from -G to -U.

	* gmapindex.c, indexdb.c, indexdb.h, snapindex.c: Attempt to build
	  idxoffsets and idxpositions from genomecomp, but has problems.

	* genome.c: Added automated switching between compressed and uncompressed
	  genome, if the requested one cannot be found.

	* iit_store.c: Keeping last carriage return of annotation.

	* iit_get.c: If iit file cannot be found, try appending .iit.

	* gmapindex.c, snapindex.c: Finding labels in IIT directly instead of
	  converting to a table.

	* iit-read.c, iit-read.h, iit-write.c, iitdef.h: Changed IIT format to store
	  alphabetic order of labels, so that labels can be found by binary search.

2003-11-25  twu

	* genome.c, genome.h, gmap.c, pair.c, pair.h, params.c, params.h, snap.c,
	  stage3.c, stage3.h: Added popt handling of options. Renamed various
	  program options.

	* iit-read.c, iit-read.h, iit_get.c: Added ability to search IITs by label.

	* get-genome.c: Changed usage statement for popt autohelp.

	* Makefile.am: Changed name of variable to POPT_LIBS.

	* acinclude.m4, configure.ac: Added AC_DEFINE for HAVE_LIBPOPT. Set various
	  defines to have value 1.

	* gmapindex.c, iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_get.c,
	  iit_store.c, iitdef.h, match.c, segmentpos.c, snapindex.c: Change made to
	  format of IIT file. Now allowing each interval to be labeled.

	* indexdb.c: Fix made for the case where an oligomer earlier than TT...TT is
	  the last one and points to totalcounts.

2003-11-24  twu

	* gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl: Added notation
	  for chimeric sequences.

	* acinclude.m4: Added check for MAP_FAILED. Added sys/types.h when checking
	  for pthreads (needed for Sun compiler).

	* assert.h, bigendian.h, blackboard.h, block.h, chrnum.h, complement.h,
	  dynprog.h, except.h, genome.h, genomicpos.h, iit-read.h, iit-write.c,
	  iit-write.h, iit_dump.c, iit_get.c, iit_store.c, indexdb.h, interval.h,
	  intlist.h, intron.h, list.h, match.h, matchpair.h, md5.h, mem.h, oligo.h,
	  oligoindex.h, pair.h, pairpool.h, params.h, reader.h, reqpost.h,
	  request.h, result.h, segmentpos.h, sequence.h, stage1.h, stage2.h,
	  stage3.h, stopwatch.h, table.h, uintlist.h: Included config.h in all
	  header files, to catch redefinition of const, which is needed for the Sun
	  compiler.

	* stage3.c: Commented out code that is never reached.

	* genome.c, indexdb.c: Modified messages to stderr for batch mode.

	* blackboard.c, gmap.c, reqpost.c, snap.c: Added sys/types.h to handle
	  pthread_t, needed by Sun compiler.

	* assert.c: Kept only the header file definition of assert, due to problem
	  with Sun compiler.

	* iit-read.c, table.c: For functions passed as arguments, added pointer and
	  parentheses around parameter list.

	* stage2.c: Changed some exon length parameters.

2003-11-19  twu

	* gmap.c, snap.c, stage3.c, stage3.h: Added additional check for chimeras,
	  based on top two hits.

	* bigendian.c, indexdb.c: Moved masking to the logical or statements to
	  address a bug on MacOSX.

2003-11-18  twu

	* gmap.c, snap.c: Made directory searching process more flexible, by looking
	  for version file at toplevel and subdirectory of datadir.

	* genome.c, indexdb.c: Fixed calls to mmap and munmap when mmap fails.
	  Moved stopwatch start before madvise command.

	* bigendian.c, genome.c, indexdb.c: Added masks to chars when converting to
	  an int or unsigned int, due to problem observed on DEC Alpha.

	* genome.c, indexdb.c: Corrected conversion of littleendian to bigendian
	  numbers. Added lseek and read procedures when mmap is not present or
	  fails.

	* bigendian.c: Corrected conversion of littleendian to bigendian numbers.

	* Makefile.am: Generate ChangeLog only when CVS directory present.

2003-11-17  twu

	* Makefile.am: Used LDADD instructions to call libraries instead of LDFLAGS.
	  (Required for program to load on SGI.) Moved SCRIPTS under FULLDIST.

	* configure.ac: Renamed POPT_LDFLAGS to POPT_LIBS.

	* bootstrap: Added --copy flag to automake.

	* Makefile.am: Added dist-hook to make ChangeLog up to date.

	* config: Removed secondary config files generated by automake.

	* gmapindex.c, snapindex.c: Fixed bug where X's were not being filled in,
	  because variable declared as int, rather than unsigned int.

	* block.c, block.h: Removed obsolete function.

	* acinclude.m4: Moved to top-level directory.

	* ChangeLog: Removed from repository. Can be generated as needed.

	* bootstrap: Added --add-missing flag.

	* README: Added message about config.site.

	* bootstrap: Initial import into CVS. Added because autoreconf doesn't work
	  with a config subdirectory.

	* configure.ac: Made toplevel configure.ac work with a config subdirectory.

	* gmap.c, snap.c, stage1.c, stage1.h: Changed algorithm to declare chimera
	  only after alignment is done, and to use salvaged matches in that case.

	* stage3.c, stage3.h: Stored genomicstart and genomicend as part of Stage3_T
	  structure.

	* ddsgap2_compress.pl: Initial import into CVS.

	* whats_on, install-sh, missing, mkinstalldirs, sim4_compress.pl,
	  sim4_uncompress.pl, snap_compress.pl, snap_uncompress.pl, snapbuild.pl.in,
	  spidey_compress.pl: Moved to subdirectory.

	* compile, config.guess, config.sub, depcomp: Removed secondary config files
	  (generated by automake).

	* Makefile.am: Adding top-level Makefile.am

	* assert.c, assert.h, bigendian.c, bigendian.h, blackboard.c, blackboard.h,
	  block.c, block.h, bool.h, chrnum.c, chrnum.h, complement.c, complement.h,
	  dynprog.c, dynprog.h, except.c, except.h, genome.c, genome.h,
	  genomicpos.c, genomicpos.h, genuncompress.c, get-genome.c, iit-read.c,
	  iit-read.h, iit-write.c, iit-write.h, iit_dump.c, iit_get.c, iit_store.c,
	  iitdef.h, indexdb.c, indexdb.h, interval.c, interval.h, intlist.c,
	  intlist.h, intron.c, intron.h, list.c, list.h, listdef.h, match.c,
	  match.h, matchpair.c, matchpair.h, md5.c, md5.h, md5.t.c, mem.c, mem.h,
	  oligo-count.c, oligo.c, oligo.h, oligoindex.c, oligoindex.h, pair.c,
	  pair.h, pairdef.h, pairpool.c, pairpool.h, params.c, params.h, reader.c,
	  reader.h, reqpost.c, reqpost.h, request.c, request.h, result.c, result.h,
	  segmentpos.c, segmentpos.h, sequence.c, sequence.h, shortoligomer.h,
	  snap.c, snapindex.c, stage1.c, stage1.h, stage2.c, stage2.h, stage3.c,
	  stage3.h, stopwatch.c, stopwatch.h, table.c, table.h, types.h, uintlist.c,
	  uintlist.h: Moved source files to subdirectory.

	* iit-read.c, iit-read.h: Added function IIT_get_typed.

	* indexdb.c: Removed debugging message.

	* snap.c, gmap.c, stage3.c, stage3.h: Improved determination of when an
	  alternate strain applies, based on the aligned genomic segment. Added
	  strain type to sorting of results.

	* stage1.c: Bypassing the cluster algorithm.

	* snap.c, gmap.c: Added ability to determine datadir from environment
	  variable or configuration file.

	* get-genome.c: Added popt processing of command-line options.

	* genome.c: Added bigendian conversions for compressed genome, which is
	  memory mapped.

	* configure.ac, Makefile.am: Added check for popt library.

2003-11-15  twu

	* snapindex.c, gmapindex.c: Fixed pointer bug.

	* stage2.c, stage2.h: Removed directional check on stage 2 smoothing.
	  Introduced separate length criterion for first long exon.

	* stage3.c, stage3.h: Implemented checks and procedures for chimeric
	  sequences. Removed directional check on stage 2 smoothing.

	* pair.c, pair.h, result.c, result.h, snap.c, gmap.c, stage1.c, stage1.h:
	  Implemented checks and procedures for chimeric sequences.

	* genome.c: Changed debug statements from stderr to stdout.

2003-11-14  twu

	* stage1.c: Changed identify_matches to assume the absence of duplicates.

	* stage2.c: Changed criterion for short first and last exon during smoothing
	  to be half of the corresponding region.

	* stage3.c: Fixed debugging statements.

	* snap.c, gmap.c, stage3.c, stage3.h: Fixed bug where a strain was falsely
	  reported due to duplicate stage 3 objects and deletion of the one for the
	  reference.

	* sequence.c: Reduced poly-A tail left from 7 to 1.

	* pair.c: Made print procedure backward compatible with old altstrain_iits.

	* pair.c, params.c, params.h, snap.c, snapindex.c, gmap.c, gmapindex.c: Made
	  changes to include name of reference strain.

	* snap.c, gmap.c: Fixed typo in comment.

	* iit-read.c: Fixed memory leak when altstrain_iit doesn't exist.

	* Makefile.am: Integrated get-genome into snap code.

	* get-genome.c, sequence.c, sequence.h: Major rewrite of get-genome, to
	  integrate it into existing snap code.

	* genome.c, genome.h, snap.c, gmap.c: Handled case where more than one patch
	  from a given strain is applicable to a given genomic segment.

	* intlist.c: Added check for null list in Intlist_to_array.

	* indexdb.c: Changed idxpositions to eliminate duplicates during writing and
	  to skip bad values during reading.

2003-11-13  twu

	* snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Revised program
	  to parse strain info.

	* stage1.c: Added some comments.

	* block.c, block.h, oligo-count.c, oligo.c, oligo.h, Makefile.am: Revised
	  oligo-count to use the new code.

	* Makefile.am: Added build for get-genome.

	* get-genome.c: Major cleaning of code. Added ability to read from
	  compressed genome files.

	* oligo-count.c: Initial import into CVS. Dated 2003-07-16.

	* genome.c, genome.h, genomicpos.c, genomicpos.h, matchpair.c, matchpair.h,
	  pair.c, pair.h, snap.c, gmap.c, stage3.c, stage3.h: Added ability to align
	  to multiple strains.

	* stage1.c: Cleaned up some bugs on handling stutter. Implemented check for
	  duplicates in idxpositions.

	* indexdb.c, indexdb.h, snapindex.c, gmapindex.c: Changed strategy for
	  idxoffsets and idxpositions for strains. Now storing the union of all
	  strains.

2003-11-12  twu

	* snapbuild.pl.in, Makefile.am, gmapsetup.pl.in, gmap_setup.pl.in: Fixed
	  procedure for making snapbuild.

	* Makefile.am: Added procedure for making snapbuild script.

	* configure.ac: Added feature for enabling full distribution.

	* snapbuild.pl: Changed file from snapbuild.pl to snapbuild.pl.in.

	* configure.ac, params.h, Makefile.am: Cleaned up specification of data
	  directory and version file.

	* params.c: Added provisions for reading altstrain IIT.

	* snap.c, gmap.c: Cleaned up specification of data directory and version
	  file. Added provisions for reading altstrain IIT.

	* snapindex.c, gmapindex.c: Fixed problem with slashes in alternate strain
	  name.

	* stage1.c: Cleaned up code for stage1.c. Fixed memory leak for paired
	  algorithm. Added chromosomal constraint for cluster algorithm.

2003-11-11  twu

	* iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_dump.c, iit_get.c,
	  iit_store.c, iitdef.h, indexdb.c, indexdb.h, interval.c, interval.h,
	  segmentpos.c, segmentpos.h, snap.c, snapbuild.pl, snapbuild.pl.in,
	  snapindex.c, Makefile.am, gmap.c, gmapindex.c, gmapsetup.pl.in,
	  gmap_setup.pl.in: Changes made to introduce types into IITs, and to build
	  SNAP databases with alternate strain information.

	* match.c, match.h, stage1.c: Changes to stage 1 algorithm: (1) choice of 5'
	  or 3' advancement based on number of hits, (2) stutter based on positions
	  with hits, (3) computed fraction of paired hits on each end.

2003-11-10  twu

	* snapbuild.pl, snapbuild.pl.in, gmapsetup.pl.in, gmap_setup.pl.in: Initial
	  import into CVS.

	* Makefile.am: Added object files for bigendian.

	* iit-read.c: Added header file for bigendian.h

	* bigendian.c, indexdb.c: Fixed problem in bigendian conversion.

	* sequence.c: Fixed problem in handling sequence files without headers.

	* iit-read.c: Changed most elements of IIT_T to be fread, rather than
	  mmapped. Added code for program to work on bigendian architectures.

	* indexdb.c: Changed offsets file to be fread, rather than mmapped.
	  Added code for program to work on bigendian architectures.

	* iitdef.h: Added comments.

	* bigendian.c, bigendian.h, configure.ac, genuncompress.c, iit-write.c,
	  snapindex.c, Makefile.am, gmapindex.c: Added code for program to work on
	  bigendian architectures.

2003-11-08  twu

	* acinclude.m4, configure.ac: Made VERSION automatically equal the current
	  date.

	* Makefile.am: Removed reference to iit_convert.

	* genome.c: Turned off batch loading of genome.

	* sequence.c, snap.c, gmap.c: Rest of header printed in output. Exceptional
	  file terminations handled better.

2003-11-07  twu

	* params.c, params.h, snap.c, gmap.c, stage1.c, stage1.h: Added cluster
	  algorithm for short query sequences.

	* block.c, block.h, longoligomer.c, longoligomer.h, match.c, match.h,
	  matchpair.c, matchpair.h, oligo.c, oligo.h, Makefile.am: Removed
	  longoligomers.

	* genome.c: Fixed print statement for batch mode.

	* snap.c, gmap.c: Restored dump_segs functionality.

	* snapindex.c, gmapindex.c: Changed name of table from chroffset to
	  chrlength.

	* iit-read.c, iit-read.h: Added function IIT_dump_formatted.

2003-10-27  twu

	* iit_get.c, iit_store.c: Removed carriage return at end of annotation.

	* iit-read.c, iit-read.h, iit_dump.c, Makefile.am: Added a program for
	  dumping IIT files.

	* snapindex.c, gmapindex.c: Added better comments.

	* whats_on: Fixed program to use new IIT file format.

	* table.c: Removed assertion checks for key being non-zero, which doesn't
	  work for a chromosome of 0.

	* INSTALL: Copied generic installation instructions.

	* COPYING: Created copyright notice.

2003-10-25  twu

	* iit-write.c: Made Node_make static.

2003-10-24  twu

	* indexdb.c: Fixed format of batch statement.

	* iit-read.c, iit-write.c, iit_get.c, match.c, segmentpos.c, snapindex.c,
	  gmapindex.c: Changed annotations in .iit files to have '\0' characters at
	  the ends, so they can be used in the file, without copying.

2003-10-23  twu

	* interval.c: Added comment about sorting procedures.

	* iit_get.c, iit_store.c: Changed program to use the IIT implementation in
	  this directory.

	* iit-read.c: Added madvise command.

	* genome.c, indexdb.c: Changed reporting of touching pages under batch mode.

	* Makefile.am: Added iit_store and iit_get.

	* genome.c, indexdb.c: Revised touching of pages for batch mode.

	* assert.h, blackboard.h, block.h, bool.h, chrnum.h, complement.h,
	  dynprog.h, except.h, genome.h, genomicpos.h, iit-read.h, iit-write.h,
	  iitdef.h, indexdb.h, interval.h, intlist.h, intron.h, list.h, listdef.h,
	  longoligomer.h, match.h, matchpair.h, md5.h, mem.h, oligo.h, oligoindex.h,
	  pair.h, pairdef.h, pairpool.h, params.h, reader.h, reqpost.h, request.h,
	  result.h, segmentpos.h, sequence.h, shortoligomer.h, stage1.h, stage2.h,
	  stage3.h, stopwatch.h, table.h, types.h, uintlist.h: Added RCS Id string
	  to header files.

	* snap.c, gmap.c: Removed call to strdup.

	* snapindex.c, gmapindex.c: Removed printing of superaccessions for NCBI
	  genomes.

	* segmentpos.c: Removed unused procedures based on Berkeley DB.

	* chrnum.c: Fixed problem with numeric-alpha ordering of chromosomes. XU
	  now follows X and precedes Y.

2003-10-22  twu

	* acinclude.m4, genome.c, indexdb.c: Added macros to check for pagesize
	  determination.

	* config.h.in: Removed derived file.

	* configure.ac: Cleaned up unnecessary autoconf macros.

	* acinclude.m4, config.h.in, genome.c, genuncompress.c, iit-read.c,
	  indexdb.c: Improved autoconf checks and header files for mmap.

	* snapindex.c, gmapindex.c: Fixed problem with freeing memory.

	* segmentpos.c: Fixed small error with printing accession bounds.

	* chrnum.c, iit-read.c, iit-read.h, iit-write.c, iit-write.h, segmentpos.c,
	  snap.c, snapindex.c, gmap.c, gmapindex.c: Fixed memory leaks.

	* acinclude.m4, block.h, chrnum.c, chrnum.h, config.h.in, configure.ac,
	  database.c, database.h, genomicpos.c, genomicpos.h, get-genome.c,
	  iit-read.c, iit-read.h, iit-write.c, iit-write.h, interval.c, interval.h,
	  match.c, match.h, offset.c, offset.h, offsetdb.c, offsetdb.h, oligo.c,
	  oligo.h, pair.c, pair.h, params.c, params.h, segmentpos.c, segmentpos.h,
	  sequence.c, snap.c, snapindex.c, Makefile.am, gmap.c, gmapindex.c,
	  stage1.c, stage1.h, stage3.c, stage3.h, table.c, table.h: Eliminated
	  dependence upon Berkeley DB.

	* table.c, table.h: Initial import into CVS.

2003-10-21  twu

	* acinclude.m4, config.h.in, configure.ac, genome.c, genuncompress.c,
	  iit-read.c, indexdb.c: Added checks for various mmap flags.

	* iitdef.h: Restructured IIT_T commands.

	* iit-read.c, iit-read.h, iit-write.c, iit-write.h, interval-read.c,
	  interval-read.h, interval.c, interval.h, pair.c, snap.c, Makefile.am,
	  gmap.c: Restructured Interval_T and IIT_T implementations so they don't
	  depend on BerkeleyDB, and added ability to write IITs.

	* acinclude.m4, database.c: Added provision for BerkeleyDB version 4.1.

	* iit_store.c: Changed format of input file to have only intervals on the
	  header line.

	* iit_get.c: Changed program to use new IIT format.

	* iit_store.c: Fixed problem with annotlist being reversed.

	* iit_store.c: Changed format of iit file to include annotations.

2003-10-20  twu

	* sequence.c: Corrected type for return value of fgetc.

	* oligo.c: Corrected type for return value of Reader_getc.

	* stage1.h: Removed db.h as an included header.

	* acinclude.m4: Added -rpath flag during linking of Berkeley DB.

	* Makefile.in, configure: Removing from CVS.

	* Makefile.in, configure: Result of autoreconf.

	* Makefile.am: Added header files to SOURCES.

	* configure.ac: Added no-dependencies option.

	* iit-read.c: Removed MAP_VARIABLE from mmap call, because not recognized by
	  Linux.

	* sequence.c: Renamed variable strlen to avoid compiler error on Linux.

	* Makefile.in: Added various auxiliary files.

	* Makefile.in, compile, config.guess, config.sub, depcomp, install-sh,
	  configure: Initial import into CVS.

	* missing, mkinstalldirs: Provided updated version.

	* genome.c, genomicpos.c, iit-read.c, iit-read.h, indexdb.c, intlist.c,
	  intlist.h, match.c, md5.c, mem.c, offset.c, offsetdb.c, oligoindex.c,
	  pair.c, segmentpos.c, segmentpos.h, Makefile.am, uintlist.c, uintlist.h:
	  Addressed compiler warnings from gcc.

2003-10-19  twu

	* acinclude.m4, blackboard.c, configure.ac, reqpost.c, snap.c, Makefile.am,
	  gmap.c: Allowed pthreads to be enabled or disabled.

2003-10-18  twu

	* assert.c, block.c, chrnum.c, complement.c, database.c, dynprog.c,
	  except.c, genome.c, genomicpos.c, genuncompress.c, get-genome.c,
	  iit-read.c, indexdb.c, interval-read.c, intron.c, list.c, longoligomer.c,
	  match.c, matchpair.c, md5.c, offset.c, offsetdb.c, oligo.c, oligoindex.c,
	  pair.c, pairpool.c, params.c, reader.c, reqpost.c, request.c, result.c,
	  segmentpos.c, sequence.c, snap.c, snapindex.c, gmap.c, gmapindex.c,
	  stage1.c, stage2.c, stage3.c, stopwatch.c: Added RCS Id string correctly

	* assert.c, block.c, chrnum.c, complement.c, database.c, dynprog.c,
	  except.c, genome.c, genomicpos.c, genuncompress.c, get-genome.c,
	  iit-read.c, indexdb.c, interval-read.c, intron.c, list.c, longoligomer.c,
	  match.c, matchpair.c, md5.c, offset.c, offsetdb.c, oligo.c, oligoindex.c,
	  pair.c, pairpool.c, params.c, reader.c, reqpost.c, request.c, result.c,
	  segmentpos.c, sequence.c, snap.c, snapindex.c, gmap.c, gmapindex.c,
	  stage1.c, stage2.c, stage3.c, stopwatch.c: Added rcsid strings.

	* blackboard.c, block.c, complement.c, database.c, dynprog.c, except.c,
	  iit-read.c, interval-read.c, intron.c, list.c, matchpair.c, md5.c, mem.c,
	  mem.h, oligo.c, oligoindex.c, pair.c, pairpool.c, params.c, reader.c,
	  reqpost.c, request.c, result.c, sequence.c, stage2.c, stopwatch.c:
	  Rearranged header includes.

	* longoligomer.h: Defined T for both cases of HAVE_64_BIT.

	* longoligomer.c: Added conditional compiling based on HAVE_64_BIT.

	* offset.h: Added necessary header file stdio.h.

	* types.h: Added compiler directives from config.h.

	* configure.ac: Initial changes to configure.scan to make autoconf and
	  automake work for the cc compiler.

	* Makefile: Removed Makefile from CVS, because it is now generated from
	  Makefile.am by automake, and then from Makefile.in by configure.

	* AUTHORS, COPYING, ChangeLog, INSTALL, NEWS, README, acinclude.m4, config,
	  missing, mkinstalldirs, config.h.in, Makefile.am: Added files for autoconf
	  and automake to work.

	* configure.ac: Initial configure.ac from configure.scan produced by
	  autoscan.

2003-10-17  twu

	* gencompress.c, snapindex.c, gmapindex.c: Moved gencompress function inside
	  snapindex (previously in gencompress.c).

	* segmentpos.c: Changed type of relstart and relend to int, due to problems
	  with long.

2003-10-16  twu

	* dynprog.c: Removed splice-site.c.

	* commafmt.c, commafmt.h, genomicpos.c, genomicpos.h, match.c, pair.c,
	  segmentpos.c: Moved commafmt command to genomicpos.c.

	* types.h: Defined UINT8 only if HAVE_64_BIT is defined.

	* splice-site.c, splice-site.h: Removed splice-site.c from CVS.

	* readcirc.c, readcirc.h: Removing readcirc from CVS.

	* radixsort.c, radixsort.h: Removing radixsort from CVS.

	* boyer-moore.c, boyer-moore.h: Removed Boyer-Moore procedures from CVS.

	* longoligomer.h: Introduced constants and procedures for Longoligomer_T on
	  32-bit systems.

	* snapindex.c, gmapindex.c: Changed output type of write_genome_file.

	* indexdb.c, indexdb.h: Introduced Storedoligomer_T.

	* iit-read.c: Added type cast from void * to char *.

	* oligo.c, oligo.h: Created 32-bit versions of procedures.

	* match.c: Removed functions Match_print() and oligo_nt().

	* stage1.c: Removed mask from Block_T. Removed function Match_print().

	* block.c, block.h: Removed mask from Block_T.

	* longoligomer.c: Added object Longoligomer_T for 32-bit systems.

2003-10-13  twu

	* Makefile, chrnum.c, database.c, gencompress.c, genome.c, genome.h,
	  genuncompress.c, stage1.c, chrnum.h, match.c, match.h, offset.c, offset.h,
	  pair.c, pair.h, segmentpos.c, segmentpos.h, snap.c, snapindex.c, gmap.c,
	  gmapindex.c, stage3.c, stage3.h: Changed unsigned int to more descriptive
	  types.

	* genomicpos.c, genomicpos.h, longoligomer.h, shortoligomer.h: Added new
	  types.

	* offsetdb.c, offsetdb.h: Added type for Chrnum_T. Removed function
	  Offset_position_to_chr.

	* oligoindex.c, stage1.c, stage2.c: Changed unsigned long and unsigned int
	  to more descriptive types.

	* add-chrpos-to-endpoints.c: Removed file used for prototyping.

	* rsort-check.c, rsort-test.c: Removed utility files for radixsort.

	* sequence.c: Removed code for computing CRC32 checksum.

	* sample-oligos.c: Removed sample-oligos.c, which was used for prototyping.

	* Makefile: Removed cksum-fa

	* prb.c, prb.h: Removed prb.c and prb.h, which implemented red-black trees.

	* block.c, block.h, indexdb.c, indexdb.h, match.c, match.h, oligo.c,
	  oligo.h, oligoindex.c, oligoindex.h: Changed unsigned long and unsigned
	  int to more informative types.

	* cksum.c: Removed cksum.c, which is now computed in sequence.c

	* cksum-fa.c: Removed cksum-fa.c, which was a utility program.

	* cell.c, cell.h: Removed Cell_T, which was designed for the HashDB storage
	  scheme for genomic oligomers.

	* pair.c, pair.h, sequence.c, sequence.h, stage3.c: Added provision for
	  correcting coverage in the presence of genomic gaps at the ends.

	* chrnum.c: Fixed a bug in printing output.

2003-10-09  twu

	* stage3.c, stage3.h: Added reward for spliced cDNAs based on number of
	  exons, if it's greater than 2.
	  Also, added flag for conservative behavior for splice site prediction,
	  by reducing the reward for canonical splice sites. Note, however, that
	  such behavior causes SNAP to perform poorly in the presence of sequence
	  errors.

	* dynprog.c, dynprog.h, params.c, params.h, snap.c, gmap.c: Added flag for
	  conservative behavior for splice site prediction, by reducing the reward
	  for canonical splice sites. Note, however, that such behavior causes SNAP
	  to perform poorly in the presence of sequence errors.

2003-10-07  twu

	* iit-read.c: Adapt to new format of bounds database contents.

	* pair.c: Makes correct call to IIT_get when coordinates are in reverse
	  order.

2003-08-19  twu

	* Makefile, iit-read.c, iit.c, iit.h, interval-read.c, interval.c,
	  interval.h, pair.h, params.h, snap.c, gmap.c, stage3.h: Changed filenames
	  from iit.c and interval.c to iit-read.c and interval-read.c

	* whats_on: Generalized procedure for identifying FASTA files containing
	  ESTs.

	* sequence.c: Fixed conversion of char to unsigned char.

	* Makefile, bounds.c, bounds.h, iit-read.c, iit-read.h, iit.c, iit.h,
	  interval-read.c, interval-read.h, interval.c, interval.h, pair.c, pair.h,
	  params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h: Changed calls to
	  iit to open the files just once.

	* bounds.c, bounds.h: Adding bounds.c file to compute bounds.

	* Makefile, database.c, database.h, iit-read.c, iit.c: Added ability to use
	  a gene bounds iit file.

	* interval-read.c, interval-read.h, interval.c, interval.h: Revised version
	  from berkeleydb CVS repository.

	* pair.c, pair.h, params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h:
	  Added capability to use a gene bounds iit file.

2003-08-18  twu

	* iit-read.h, iit.h: Initial import into CVS.

	* iit_get.c: Compare only to query length.

	* get-genome.c: Fixes procedure isrange to make a copy of the string.

2003-07-07  twu

	* whats_on: Changed behavior to not die if directory isn't found.

	* chrnum.c, chrnum.h, segmentpos.c: Fixed sorting and printing for
	  chromosomes like 2L.

	* stage3.h: Removed Stage3_goodness as an external procedure.

	* stage3.c: Changed goodness within a given chromosomal segment to include
	  canonical introns, but goodness between chromosomal segments to exclude
	  this.

	* stage2.c: Increased MAXHITS from 20 to 1000. Previous value was too low
	  and led to splicing errors.

	* get-genome.c: Changed program to try segment first as a chromosome, then
	  as a contig.

	* offsetdb.c: Improved output statements to print beginning and ending of
	  chromosomes.

2003-06-19  twu

	* dynprog.c: Changed penalties. Made reward for extension multiplicative.

	* Makefile, md5.c, md5.h, md5.t.c, params.c, params.h, sequence.c,
	  sequence.h, snap.c, snap_compress.pl, snap_uncompress.pl, gmap.c, types.h,
	  gmap_compress.pl, gmap_compress.pl.in, gmap_uncompress.pl,
	  gmap_uncompress.pl.in: Added MD5 calculations.

	* stage3.c: Added debugging statements for finalscore.

	* cksum-fa.c: Added comments.

2003-06-17  twu

	* Makefile: Rearranged lines.

	* snap.c, gmap.c: Fixed calculation of indexdb to occur only once for
	  user-provided segment.

	* sequence.c, sequence.h: Added computation for crc32.

2003-06-13  twu

	* dynprog.c: Changed reward for partial match to be zero.

	* stage3.c: Fixed bug where pairs_fwd or pairs_rev might be NULL.

2003-06-03  twu

	* stage2.c, stage2.h: Created separate paths for forward and revcomp
	  directions after smoothing.
	  Added back intron score during calculations.

	* stage3.c: Separated calculations of forward and revcomp paths.

	* snap.c, gmap.c: Increased size of maxlookback.

	* pair.c, pair.h: Added calculation of number of canonical exons.

	* dynprog.c: Setting finalscore as a return value.

	* stage3.c: Added number of canonical exons to goodness criterion. Added
	  "Stage 3" to debug statements.

2003-05-27 twu

	* snap.c, gmap.c, stage2.c, stage2.h, stage3.c, stage3.h: Moved alignment of
	  different cDNA direction from stage 2 to stage 3.

	* pair.c, pair.h: Changed Pair_fracidentity to work on a list, rather than
	  an array.

2003-05-22 twu

	* stage3.c: Changed goodness function to ignore number of canonical introns.

	* snap.c, gmap.c: Added parameter for sufflookback, potentially different
	  from maxlookback, but found that setting maxlookback >> sufflookback led
	  to long, poor alignments, so set maxlookback = sufflookback.

	* params.c, params.h, stage2.c, stage2.h: Added separate parameter for
	  sufflookback, to be used in stage 2, and possibly different from
	  maxlookback, used in stage 3.

2003-05-03 twu

	* genome.c, genuncompress.c, iit-read.c, iit.c, indexdb.c: Removed
	  MAP_VARIABLE from mmap command, because it is not available in Linux.

	* hash-test.c, hashdb-read.c, hashdb-read.h, hashdb-write.c, hashdb-write.h,
	  hashdb.c, hashdb.h: Removed hashdb files, which have been replaced by
	  indexdb.

	* whats_on: Added error message.

	* Makefile: Removed old Makefile commands.

	* snapgenerate.c, snapindex.c, gmapindex.c: Moved functions from
	  snapgenerate.c to snapindex.c, so only snapindex is needed to create SNAP
	  genome files.

	* genuncompress.c: Initial import into CVS.

2003-04-30 twu

	* stage2.c: Added check for MAXHITS in stage 2, to prevent slowness problems
	  from repetitive cDNAs in repetitive genomic segments (such as AA704019).

	* stage1.c: Added debugging statement.

	* snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Fixed
	  problem where gpos was not handled correctly for the minus strand.

	* chrnum.c: Fixed problem where signed chromosomes were being printed
	  incorrectly.

2003-04-29 twu

	* whats_on: Fixed problem where genomic coordinates were in the order of
	  largest, then smallest (reverse strand).

2003-04-27 twu

	* stage3.c: Removed queryoffset.

	* stage2.c, stage2.h: Removed queryoffset. Made sampling interval variable.
	  Added bounding of a querypos to a single hit if its top score exceeds its
	  second highest score.

	* snap.c, gmap.c: Changed lookback and extramaterial_paired.

	* sequence.c: Changed trimming to leave non-poly-A/T oligomers.

	* chrnum.c, database.c, get-genome.c, segmentpos.c: Changed interpretation
	  of chromosome numbers to allow all single letters and all numbers.

	* offsetdb.c: Extra blank line.

2003-04-16 twu

	* Makefile, accpos.c: Removed file accpos.c, which isn't being used anymore.

	* genome.c, sequence.c, sequence.h: Removed offset as a parameter for
	  Sequence_genomic_new.

	* mem.c: Removed upper limit check on allocating memory.

	* pair.h: Removed queryoffset from print routines.

	* pair.c: Removed queryoffset from print routines. Fixed calculation of
	  genomic distances for Crick strand.

2003-04-14 rkh

	* config, cvswrappers: *** empty log message ***

2003-04-09 rkh

	* config: *** empty log message ***

2003-04-07 twu

	* dynprog.c: Reduced rewards for canonical introns.

	* pair.c: Added conversion to uppercase.

	* mem.c: Added check for unexpectedly large allocations.

2003-04-02 twu

	* stage3.c: Made separate procedures for 3' and 5' ends. Turned off
	  Boyer-Moore extension at ends. Added checks to prevent dynamic
	  programming past end of sequence.

	* params.c: Removed freeing of version.

	* pairpool.c: Added additional debugging checks.

	* pair.c: Improved output for user-provided segments.

	* indexdb.c, indexdb.h, match.c, matchpair.c, matchpair.h, offset.c,
	  sequence.c, sequence.h, snap.c, gmap.c: Now performing stage 1 on
	  user-provided segments. This eliminates poor alignments when the
	  user-provided segment is longer than stage 1 would have provided.

	* segmentpos.c: Added limit to number of accessions reported.

	* request.c, request.h, blackboard.c, blackboard.h: Changed name from
	  genomicseg to usersegment.

	* stage1.c: Removed offset from call to Block_T procedures.

	* genome.c: Renamed some procedures.

	* dynprog.c: Increased penalties for mismatch.

	* chrnum.c: Allowed chromosome 0.

	* block.c, block.h: Removed offset from list of parameters.

2003-03-27 twu

	* pair.c, snap_compress.pl, snap_uncompress.pl, stage3.c, gmap_compress.pl,
	  gmap_compress.pl.in, gmap_uncompress.pl, gmap_uncompress.pl.in: Changed
	  alignment output for dual breaks.

2003-03-25 twu

	* dynprog.c: Created an inline procedure and scheme for scoring canonical
	  and alternate introns. Increased penalties for mismatches.

	* intron.c, intron.h: Moved most functions to other files, to increase speed.

	* pair.c, pair.h, pairdef.h, pairpool.c, stage3.c: Added field to Pair_T
	  object to denote a gap.

	* sequence.c: Fixed bug that caused large amounts of memory to be allocated.

2003-03-24 twu

	* snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Introduced a
	  better error statement.

	* stage2.c: Changed sampling to start at -1 after the first 8-mer missed,
	  and then go back by the Nyquist rate.

	* stage3.c: Introduced peelback for single gaps.

2003-03-21 twu

	* genome.c, pair.c, pair.h, sequence.c, sequence.h, snap.c, gmap.c,
	  stage3.c: Fixed algorithm to handle poly-T starts as well as poly-A ends.
	  Added extra information to Sequence_T structure and output procedures to
	  handle this correctly.

	* snap.c, gmap.c: Fixed problems with Stage3_T objects that were not
	  assigned to NULL. Added flushing of output for debugging.

	* dynprog.c: Fixed dynamic programming on ends so the genomic segment won't
	  stick out.

2003-03-20 twu

	* spidey_compress.pl: Modified routine to look for spaces of at least 10,
	  instead of 20.

	* dynprog.c: Added a separate reward for canonical introns, depending on the
	  defect rate.

	* list.c, list.h: Added a command for setting the head of a list.

	* pair.c: Fixed counting of indels.

	* pairpool.c: Created new debugging commands.

	* snap.c, gmap.c: Added trimming of first or last exon in stage 2 if the
	  defect rate is high enough and the exons are too long. Increased lookback
	  from 60 to 90.

	* stage3.c: Modified peelback to go past nonconsecutive hits, stopping only
	  at an intron.

	* stage2.c, stage2.h: Added trimming of first or last exon if the defect
	  rate is high enough and the exons are too long.

2003-03-16 twu

	* stage2.c: Added hooks for making smooth_path depend on defect_rate, but
	  this appears to be a bad idea.

	* pair.c: Improved consistency check to work when cdna_direction is
	  initially zero.

	* dynprog.c, dynprog.h, stage3.c: Changed effect of defect rate to be on
	  mismatches and gaps, rather than intron scores.

2003-03-15 twu

	* pair.c, pair.h, snap.c, gmap.c, stage2.c, stage2.h, stage3.c, stage3.h:
	  Added check for consistency of intron directions, and ability to back
	  track to stage 2 with forced cdna_directions if the stage 3 result is
	  inconsistent.

2003-03-14 twu

	* dynprog.c, dynprog.h, pair.c, pair.h, stage2.c, stage2.h, stage3.c: Added
	  estimation of defect_rate in stage 2, and used it to change parameters in
	  dynamic programming and extension of ends.

2003-03-12 twu

	* stage3.c: Changed limitation on Boyer-Moore search to be a certain number
	  of hits. This compensates for the fact that smaller oligomers will occur
	  more frequently than longer ones, and that longer ones are more
	  statistically significant.

	* stage3.c, stage3.h: Limited length of Boyer-Moore search at ends. Changed
	  name of minendsearch to minendtrigger.

	* params.c, params.h, snap.c, gmap.c: Changed name of minendsearch to
	  minendtrigger.

2003-03-11 twu

	* dynprog.c: Extended the search range of bridge_gap, so that it finds
	  introns even at the bounds of the dynamic programming.

	* params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h: Added parameter
	  for minendsearch.

	* dynprog.c: Fixed safety check in intron_score for reading off end of
	  segment.

	* stage3.c: Rearranged computation of stage 3, such that middle is computed
	  first, then cDNA direction is recomputed, then 5' and 3' ends are computed.

	* pair.c, pair.h: Added function for computing cDNA direction from list of
	  pairs.

	* dynprog.c: Adjusted various dynamic programming scores. Fixed coordinates
	  in gap. Added check for very short introns.

	* stage3.c: Discrimination between paired gap dynamic programming at ends
	  and in middle.

	* dynprog.c, dynprog.h: Major rewrite of dynamic programming procedures.
	  Changed from Gotoh algorithm to pure banded procedure. Reversing
	  sequences when necessary, so all computations are symmetric.

2003-03-10 twu

	* sim4_compress.pl, spidey_compress.pl: Added output of the number of exons.

	* stage3.c: Added check for genomejump being zero or negative, which would
	  give rise to a position beyond the genomic segment.

	* pair.c: Added check for zero denominator.

	* boyer-moore.c: Added check for sequence to consist entirely of valid
	  nucleotides.

2003-03-09 twu

	* spidey_compress.pl: Added printing of exon lengths, intron lengths, and
	  dinucleotides, to match new output of snap_compress.pl. Fixed problems
	  with parsing Spidey output.

	* sim4_compress.pl: Added printing of exon lengths, intron lengths, and
	  dinucleotides, to match new output of snap_compress.pl.

	* snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Fixed problem
	  when reverse intron is GT-AG.

	* snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Fixed bug
	  that occurs when snap was called with -N, without printing intron lengths.

2003-03-08 twu

	* stage1.c: Fixed a memory leak from not freeing Stage1_T object.

	* block.c: Fixed bug which caused a memory leak because we were overwriting
	  a previous querypos.

	* oligo.c: Fixed debug message.

2003-03-07 twu

	* snap.c, gmap.c: Reduced stage1size for short query sequences (< 60 bp).

	* match.c, match.h, stage1.c, stage1.h: Fixed Match_print to print the
	  correct oligo.

	* get-genome.c: Changed header to contain the version number.

	* snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Added exon
	  lengths to compressed output.

	* snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Added exon
	  lengths to compressed output. Removed printing of dinucleotides for
	  canonical introns.

2003-03-06 twu

	* stage3.c: Cleaned up code extensively. Added Boyer-Moore searches on both
	  ends of cDNA.

	* dynprog.c, dynprog.h: Cleaned up code by making separate procedures for
	  single gap in middle, and 5' and 3' ends.

	* pair.c, pair.h: Added procedure for dumping a list of pairs.

	* boyer-moore.c: Removed debugging statements.

2003-03-04 twu

	* Makefile, boyer-moore.c, boyer-moore.h: Addition of Boyer-Moore string
	  search.

	* stage3.c: Consolidated peelback code. Beginning to insert Boyer-Moore
	  code.

	* stage2.c: Fixed bug where index was -1.

	* block.c, block.h, oligo.c, oligo.h, stage1.c: Fixed code to use stage1size
	  instead of INDEX1PART in certain places.

	* snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Fixed code to
	  handle genomic accession when genomic sequence is provided by the user.

	* pair.c, sequence.c, sequence.h, snap.c, gmap.c: Fixed code to print out
	  genomic accession when genomic sequence is provided by the user.

	* match.c: Fixed code to print just forward oligo.

2003-03-03 twu

	* intron.c, pair.c: Changed '===...===' to represent a non-canonical intron.

	* dynprog.c: Reduced reward to semi-canonical introns to be slightly less
	  than that for canonical introns.

	* stage3.c: Changed output of large gaps from '=========' to '###...###'.

	* snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Made
	  changes to accommodate enhancements to SNAP, namely use of '#' for large
	  gaps and switch of intronends and intronlengths info.

	* snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Initial import
	  into CVS.

2003-03-02 twu

	* snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Initial
	  import into CVS.

	* Makefile, dynprog.c, intron.c, intron.h, pair.c, splice-site.c: Removed
	  use of splice site matrices and added identification of semi-canonical
	  dinucleotides.

2003-02-28 twu

	* params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h: Changed
	  extramaterial at the end and for paired to be parameters.

	* get-genome.c: Changed program to check only first four letters of genomic
	  name.

2003-02-11 twu

	* pair.c, pair.h, stage3.c: Adjusted goodness score of alignment by number
	  of canonical introns.

	* dynprog.c, dynprog.h, params.c, params.h, snap.c, gmap.c, stage3.c,
	  stage3.h: Parameterized band size in dynamic programming and increased
	  bands for cross-species alignment.

	* oligoindex.c, oligoindex.h, params.c, params.h, snap.c, gmap.c, stage2.c,
	  stage3.c, stage3.h: Parameterized INDEXSIZE and made it different for
	  cross-species alignment.

	* stage2.c: Added smooth_path step in stage 2 to remove short spurious exon
	  hits.

2003-02-03 twu

	* params.c, params.h: Replaced dbroot with version.

	* snap.c, gmap.c: Added reporting of version to program.

2003-01-27 twu

	* stage3.c: Fixed problem where a base pair was missed on the 5' end.

	* stage2.c: Fixed problems where genomic matches can overlap.

	* pair.c: Fixed problems in computing exon endpoints.

2003-01-22 twu

	* stage3.c: Reverted back to old method of building pairs in the middle.

	* pair.c: Added post-processing check for a gap at the end of the alignment.

	* oligoindex.c: Added check for poly-T.

2003-01-03 twu

	* stage3.c: Made some changes to eliminate large gaps at the 3' end.

	* snap.c, gmap.c: Improved handling of case where user provides both cDNA
	  and genomic files.

	* stage3.c: Fixed bug when no pairs are found.

	* sequence.c: Fixed bug in failing to initialize.

2002-12-30 twu

	* params.c, params.h: Added parameter for fwdonlyp.

	* pairpool.c: Fixed small memory leak.

	* params.h: Changed genomeinvert from a bool to an int.

	* pair.c: Fixed bug where pointer was advanced before freeing it.

2002-12-11 twu

	* snap.c, gmap.c: Fixed problem where complement table was not initialized
	  early enough.

2002-12-10 twu

	* sequence.c, sequence.h, snap.c, gmap.c: Improved procedure for trimming
	  poly-A tails.

	* pair.c: Increased space for positions from 12 to 14.

	* complement.h: Removed extraneous semicolon.

	* stage2.c: Fixed problem where no matching 8-mers are found.

2002-12-04 twu

	* snapindex.c, gmapindex.c: Write accession names to .aux file, even if they
	  do not start with NT_ or GA_.

	* snap.c, gmap.c: Added routines for adding signs to chromosomes, inverting
	  the genome, printing intron lengths, and trimming poly-A tails.

	* pair.c, pair.h, params.c, params.h, stage3.c, stage3.h: Added routines for
	  adding signs to chromosomes, inverting the genome, and printing intron
	  lengths.

	* sequence.c, sequence.h: Added routines for trimming poly-A tails.

	* chrnum.c, chrnum.h, match.c: Added routines for adding signs to
	  chromosomes.

	* Makefile, complement.c, complement.h, genome.c: Added files for handling
	  complements.

2002-11-26 twu

	* match.c, pair.c: Changed printing of FWD/REV to +/-. Added printing of
	  intron lengths.

	* pair.c, pair.h, params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h:
	  Added ability to print genome first in alignment.

2002-11-25 twu

	* snap.c, gmap.c: Added iteration code for cross-species alignments.

	* Makefile, matchpair.c, matchpair.h: Added an object Matchpair_T to hold
	  pairs of Match_T objects.

	* segmentpos.c: Added a check against freeing a null value.

	* result.c, result.h: Created a Stage1_T object that can hold state, for
	  resuming stage 1 calculations later.

	* pair.c, stage3.c: Changed definition of coverage to be based on length of
	  query sequence that aligns.

	* dynprog.c: Changed allocation procedures for Matrix_T and Directions_T.
	  Provided hooks for doing band-limited memory clearing, but this won't work
	  with the Gotoh P1 and Q1 matrices.

	* stage2.c: Added a seenone check to protect against long stretches of N's
	  in the genome.

	* stage1.c, stage1.h: Created a Stage1_T object that can hold state, for
	  resuming stage 1 calculations later. Stage1_T contains a list of
	  Matchpair_T objects, and some procedures have been moved to matchpair.c.

	* params.c, params.h: Added parameters for crossspecies and changed name of
	  maxextend to maxstutter.

2002-11-20 twu

	* dynprog.c, dynprog.h, stage3.c: When calling dynprog, now passing pointers
	  to subsequence rather than copying subsequences.

	* block.c, block.h: Simplified procedure for processing oligos by Block_T
	  object.

	* params.c, params.h: Added parameters for stage1size and maxlookback.

	* pair.c, pair.h, stage3.c, stage3.h: Added counts for unknowns and
	  reporting of coverage.

	* oligoindex.c: Increased size of memory blocks from 10 to 50.

	* oligoindex.c: Replaced realloc function with explicit calls to calloc and
	  free, because Third Degree reported occasional errors with realloc.

	* indexdb.c, indexdb.h, oligo.c, oligo.h, snap.c, gmap.c, stage1.c,
	  stage1.h: Major change to stage 1 procedure to work on either 24-mers or
	  18-mers.

	* mem.c: Added blank line.

	* match.c, match.h: Added procedure for Match_copy and simplified Match_new.

	* genome.c: Inlined procedure fill_buffer.

	* genome.c: Simplified routine for fill_buffer.

2002-11-15 twu

	* indexdb.c, indexdb.h: Added code for ignoring poly A hits. Added
	  procedure for reading 12-mer positions.

	* pair.c: Removed debugging statement.

	* block.c, match.c, match.h, oligo.c, stage1.c, stage1.h: Parameterized
	  stage1size.

2002-11-11 twu

	* pair.c, pair.h: Distinguished between mismatches and indels. Fixed cases
	  where gaps need to be merged (e.g., affy.HGU95A.34233_i_at, which created
	  ===...======...=== when an 8-mer fell into a gap and was then aligned to
	  either end of the gap by dynamic programming).

	* params.c, params.h: Added flag for low stringency.

	* stage3.c, stage3.h: Changed definition of LARGEQUERYGAP to be maxlookback.
	  Distinguished between mismatches and indels.

	* snap.c, gmap.c: Changed definition of LARGEQUERYGAP to be maxlookback.
	  Added flag for lowstringency (12-mers).

	* dynprog.c, dynprog.h, stage2.c: Changed definition of LARGEQUERYGAP to be
	  maxlookback.

	* result.c: Improved check on whether to free array in result.

	* params.c, params.h, stage2.c, stage2.h: Made maxlookback a parameter.

	* snap.c, gmap.c: Introduced heap memory for each thread for dynamic
	  programming. Made maxlookback a parameter.

	* stage3.c, stage3.h: Introduced heap memory for each thread for dynamic
	  programming. Restricted peelback for consecutive positions.

	* dynprog.c, dynprog.h: Introduced heap memory for each thread for dynamic
	  programming.

2002-11-08 twu

	* dynprog.c: Changed dynamic programming procedure to be banded.

	* stage2.c, stage3.c: Revised stage 2 procedure to jump every INDEXSIZE,
	  keep track of consecutive matches, and have a maximum lookback. Changed
	  stage 3 procedure accordingly, including increasing peelback to INDEXSIZE.

	* snap.c, gmap.c: Changed default behavior to be ordered output.

	* genome.c: Removed pre-loading for genome, and used madvise(MADV_DONTNEED)
	  instead.

2002-11-07 twu

	* dynprog.c: Added a zero gap penalty on the ends. Changed mismatch penalty
	  to be less than a match penalty, and reduced intron reward accordingly.

	* stage3.c: Added a peelback on the 5' end, because it's just like half of a
	  paired gap alignment.

	* snap.c, gmap.c: Removed hack used for debugging.

	* stage2.c: Introduced concept of a maximum lookback, and will now go beyond
	  the previous limit if no hit has been found.

	* genome.c: Changed genome to be pre-paged when user specifies it.

	* indexdb.c: Changed type of i from int to size_t.

2002-11-06 twu

	* pair.c, pair.h, stage3.c: Changed pairs in stage 3 object to be allocated
	  as a separate block, so they can be output at a later time.

	* blackboard.c, blackboard.h, reqpost.c, reqpost.h, request.c, request.h,
	  result.c, result.h, snap.c, gmap.c: Updated multithreading system to
	  handle ordered output with better throughput by adding an output queue.

	* genome.c, indexdb.c, pair.c: Added header for string.h to eliminate
	  compiler warnings about strlen type.

	* pairpool.c: Increased chunk size from 10000 to 20000.

	* stage2.c: Added debugging comments to generate a graph.

	* stage3.c: In-lined calls to List_T and Pair_T accessor functions.

	* pair.c, stage1.c: In-lined calls to List_T accessor functions.

	* list.c, list.h: Added function to return value of the last element of a
	  list.

2002-11-05 twu

	* pairpool.c, pairpool.h: Removed calls to realloc(), because they do not
	  preserve pointer values. Replaced with allocation of chunks of memory as
	  needed.

	* dynprog.c, stage2.c: Changed two-dimensional matrices to be
	  one-dimensional with pointer.

	* blackboard.h, snap.c, gmap.c: Made minor tweaks to blackboard object,
	  primarily altering ninputs and noutputs outside the lock, and changing
	  signal of end of output to be a null result.

	* reqpost.c, reqpost.h: Added orderedp flag to send only appropriate signals.

	* blackboard.c: Made minor tweaks to blackboard object, primarily altering
	  ninputs and noutputs outside the lock.

2002-11-04 twu

	* dynprog.c, list.c, listdef.h, pairpool.c, pairpool.h, stage2.c, stage3.c:
	  Added a pool of List_T cells for each thread to reduce heap contention.

	* Makefile, blackboard.c, dynprog.c, dynprog.h, genome.c, genome.h, pair.c,
	  pairdef.h, pairpool.c, pairpool.h, reqpost.c, reqpost.h, request.c,
	  request.h, sequence.c, sequence.h, snap.c, gmap.c, stage2.c, stage2.h,
	  stage3.c, stage3.h: Provided each worker thread with separate sources of
	  heap memory for genomic sequence and for Pair_T objects. Intended to
	  reduce heap contention.

	* stage2.c: Created define for MAXLOOKBACK.

	* stage1.c, stage1.h: Changed constants to be based on those in indexdb.h.

	* indexdb.c, indexdb.h: Changed stage 1 lookup to be based on 12-mers,
	  rather than 8-mers.

2002-11-02 twu

	* indexdb.c: Implemented binary search on third 8-mer.

	* snap.c, gmap.c: Allowed user to specify full path of database in the -d
	  flag.

	* Makefile: Added stopwatch to Makefile.

	* indexdb.c: Changed preloading of indexdb to touch each page effectively,
	  not by using memcpy(), which fails to load in pages.

	* sequence.c: Added check for first call to fgetc(input) being EOF.

2002-11-01 twu

	* pair.c: Changed print routine to work properly on user-supplied genomic
	  segments.

	* genome.c, indexdb.c: Changed pre-load to use fread/fopen/fwrite, rather
	  than memcpy, which fails to load the pages into memory.

	* Makefile, block.c, block.h, oligo.c, oligo.h, params.c, params.h, snap.c,
	  snapindex.c, gmap.c, gmapindex.c, stage1.c, stage1.h: Changed stage 1
	  database to use index table of 8-mers, rather than a hash table of 24-mers.

	* stopwatch.c, stopwatch.h: Added stopwatch function to program.

	* genome.c, genome.h: Added batch mode by using mmap/memcpy, but this
	  appears to fail on a clustered file system.

	* indexdb.c, indexdb.h: Implemented Indexdb_T as a substitute for Hashdb_T.

2002-10-31 twu

	* stage2.c: Changed stage 2 procedure to consider both forward and reverse
	  complement introns in one pass. Fixed a small bug in intron_score to
	  require position >= 2.

2002-10-29 twu

	* stage2.c, stage3.c: Replaced calls to Sequence_char with direct array
	  access.

	* Makefile, blackboard.h, genome.c, genome.h, hashdb-read.c, match.c,
	  offset.c, oligoindex.c, pair.c, reqpost.c, reqpost.h, segmentpos.c,
	  snap.c, gmap.c, stage2.c, stage3.c: Made various fixes for compiler
	  warnings.

	* stage3.c: Separated procedures for middle single gap and end single gap.
	  Decreased size of single gap dynamic programming procedure for 5' and 3'
	  ends to have genomejump = 2*queryjump.

	* snap.c, gmap.c: Increased default extension to 30000 nt.

	* dynprog.c: Prevented horizontal jumps on 3' end of splice site. Adjusted
	  score parameters.

2002-10-28 twu

	* dynprog.c, oligoindex.h, stage2.c: Changed oligomer size in stage 2 from
	  10 to 8, and adjusted dynamic programming parameters accordingly.
	  Prevented genomic gap at the 5' edge of an intron. Made initial
	  cdna_direction test more robust.

	* snap.c, gmap.c: Fixed calls to SNAP that don't involve any sequence (the
	  -C and -L flags).

	* stage3.c: Reduced minimum intron size from 10 to 9.

	* stage1.c: Substituted the constant HASHSIZE for 24.

	* dynprog.c: Increased reward for intron. A score of 10 fails to identify a
	  canonical intron with a gap.

	* pair.c: Fixed misreporting of query start coordinate.

	* snap.c, gmap.c: Fixed small memory leak.

2002-10-27 twu

	* Makefile, dynprog.c, splice-site.c, splice-site.h: Added splice site
	  calculations to find best intron.

	* dynprog.c, dynprog.h, stage2.c, stage2.h, stage3.c, stage3.h: Improved
	  stage 2 dynamic programming procedure to consider introns (only for
	  consecutive query positions), to compute gap penalty based on difference
	  of genomejump and queryjump, and to consider cDNA directions separately.

	* snap.c, gmap.c: Improved handling of arguments for database search and for
	  alignment to genomic segment.

	* stage1.c, stage1.h: Fixed stage 1 to consider Watson and Crick strands
	  separately.

	* params.c, params.h, snap.c, gmap.c, stage1.c, stage1.h: Made extension in
	  stage 1 a user-definable parameter.

	* blackboard.c, blackboard.h, pair.c, request.c, request.h, sequence.c,
	  sequence.h, snap.c, gmap.c: Provided ability to align cDNA against
	  user-provided genomic segment.

	* dynprog.c: Gave credit to half introns.

2002-10-26 twu

	* genome.c, genome.h, oligoindex.c, oligoindex.h, snap.c, gmap.c, stage1.c,
	  stage1.h, stage2.c, stage2.h, stage3.c, stage3.h: Changed genomicseg to be
	  of type Sequence_T.

	* block.c: Changed debug flag.

2002-10-25 twu

	* Makefile, request.c, request.h, sequence.c, sequence.h, snap.c, gmap.c,
	  stage1.c, stage1.h, stage2.c, stage2.h, stage3.c, stage3.h: Renamed
	  Queryseq_T to Sequence_T.

	* queryseq.c, queryseq.h: Renamed Queryseq_T to Sequence_T, to allow genomic
	  sequences to be represented this way.

	* pair.c, pair.h, stage3.c, stage3.h: Simplified argument lists of some
	  functions.

	* params.c, params.h, result.c, result.h: Allowed first-order approximation
	  using stage1 results.

	* stage2.c: Increased extension on left and right to find small terminal
	  exons.

	* stage1.c, stage1.h: Fixed assessment of whether getpair succeeded or
	  failed.

	* match.c, match.h, snap.c, gmap.c: Added first-order approximation, to use
	  just stage1 results.

	* block.c, block.h, oligo.c: Fixed bugs in Block_next_to_stoppos when the
	  query sequence has many non-ACGT characters.

	* Makefile, snap.c, gmap.c: Made compressed genome the default.

	* hashdb-read.c, hashdb-write.c: Reverted to old hashtable format, which
	  contains only two arrays.

2002-10-24 twu

	* dynprog.c, dynprog.h, pair.c, pair.h, params.c, params.h, snap.c, gmap.c,
	  stage3.c, stage3.h: Added diagnostic mode to print out asterisks instead
	  of vertical bars where dynamic programming was done.

	* genome.c, genome.h: Added ability to read compressed genomes.

	* Makefile, gencompress.c: Added compression routine for genomes.

	* blackboard.c, blackboard.h, reqpost.c, reqpost.h, snap.c, gmap.c: Added
	  anyorder behavior to blackboard, and made it default.

2002-10-22 twu

	* stage2.c: Removed code for memory freeing of positions, which is now
	  performed by Oligoindex_T.

	* oligoindex.c: Changed type of positions from void ** to unsigned int **,
	  to make code clearer and more robust.

	* pair.c, pair.h, params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h:
	  Added option to print universal genomic coordinates.

	* genome.c, hashdb-read.c: Changed mmap from MAP_PRIVATE to MAP_SHARED.

	* genome.c, genome.h, params.c, params.h, snap.c, gmap.c, stage2.c,
	  stage2.h: Changed Genome_T to be memory-mapped, rather than using fopen,
	  which is needed for multithreading.

	* stage1.c: Fixed bug where salvage procedure fails to find anything.

	* mem.c: Enhanced mem.c to give actual location of failure.

	* oligoindex.c, oligoindex.h, snap.c, gmap.c, stage2.c, stage2.h: Changed
	  algorithm for stage 2 to allocate genomic positions dynamically in
	  Oligoindex_T. To limit number of positions stored, we prescan the
	  queryseq to see what oligomers are relevant.

	* stage2.c: Reverted to previous Stage 2 strategy where we stored
	  genomic sequence in oligoindex and scanned query sequence.

	* oligoindex.c, oligoindex.h: Simplified routines greatly.

2002-10-21  twu

	* stage2.c: Changed stage 2 strategy to index the query sequence rather than
	  the genomic sequence. This should result in some speedup.

	* reader.c, reader.h, stage1.c: Made Reader_new function depend on sequence
	  rather than Queryseq_T object.

	* list.c, list.h: Added function List_index.

	* oligoindex.c, oligoindex.h: Hard-coded interval, rather than passing it in.

	* pair.c, snap.c, gmap.c: Added flag to avoid showing contig coordinates.

	* params.c, params.h: Added Params_T object.

	* Makefile, blackboard.c, blackboard.h, reqpost.c, reqpost.h, request.c,
	  request.h, result.c, result.h, snap.c, gmap.c: Major change to make
	  program multithreaded. Introduced Blackboard_T, new Reqpost_T, and new
	  Result_T objects.

	* stage3.h: Added header file.

	* stage2.c, stage2.h: Cleaned up procedures. Passed in querylength via
	  queryseq.

	* stage1.c, stage1.h: Cleaned up procedures. Made hashinterval a constant.

	* queryseq.c, sequence.c: Removed macro for DEBUG2.

	* oligoindex.c, oligoindex.h: Allowed offsets for oligoindex to be created
	  separately (for individual worker threads).

	* block.c: Changed name from Result_T to Match_T.

	* block.c, match.c, match.h, stage1.c, stage1.h, stage2.c, stage2.h: Changed
	  name of Result_T to Match_T.

	* result.c, result.h: Renamed Result_T to Match_T.

	* stage3.c: Turned off debug statements.

2002-10-20  twu

	* queryseq.c, sequence.c: Fixed small memory leak.

	* Makefile, align.c, align.h, snap.c, gmap.c, stage2.c, stage2.h, stage3.c,
	  stage3.h: Created Stage 3 and moved part of Stage 2 commands there.

	* Makefile, align.c, align.h, queryseq.c, queryseq.h, reader.c, reader.h,
	  sequence.c, sequence.h, snap.c, gmap.c, stage1.c, stage1.h, stage2.c,
	  stage2.h: Added a separate Queryseq_T object, and moved some functions
	  from Reader to Querypos.

	* pair.c: Removed bottom ruler.

	* block.c, block.h, snap.c, gmap.c, stage1.c, stage1.h: Removed
	  multithreading from stage 1 (hash table reads).

	* dynprog.c: Made maximize_entry inline to speed up dynamic programming.

	* stage2.c: Added another 24 (hashsize) to extensions. Without this, for
	  some reason, we miss the ends.

	* Makefile, dynprog.c, penalties.c, penalties.h: Removed Penalties_T object
	  in order to increase speed.

	* stage2.c: Changed extension to be based on remaining distance from end.

2002-10-19  twu

	* snap.c, gmap.c: Eliminated parameters maxentries and indexsize.

	* oligoindex.c, oligoindex.h: Made changes to improve speed, by eliminating
	  unnecessary arrays.

	* stage2.c, stage2.h: Made changes to improve speed, including making
	  build_pairs_middle iterative and using the fill_oligo function where
	  possible.

	* align.c: Changed Pair_T object to reflect the actual case of the query
	  sequence.

	* oligoindex.c: Simplified construction of Oligoindex_T object.

	* Makefile, align.c, align.h, oligoindex.c, oligoindex.h, snap.c, gmap.c,
	  stage2.c, stage2.h: Made indexsize a hardcoded parameter.
	  Allocated space for Oligoindex once at beginning of program.

	* align.c, stage2.c: Using Pair_T object instead of Result_T object
	  throughout stage 2.

	* stage1.c: Fixed memory leak.

	* align.c, align.h, stage2.c: Major change to improve stage 2 efficiency.
	  Using arrays instead of lists for the dynamic programming alignment.

	* oligoindex.c: Changed debug statements from fprintf to printf.

	* offset.c, offset.h: Added back to repository.

2002-10-18  twu

	* stage1.c, stage2.c: Allowed alignments even if we can't find a matching
	  pair on the 5' and 3' ends.

	* oligoindex.c: Changed debug statements.

	* align.c: Removed debug statement.

	* Makefile: Added rule for counting lines of code.

2002-10-17  twu

	* Makefile, segmentpos.c, segmentpos.h, snap.c, snapgenerate.c, gmap.c:
	  Generated dump procedure to work on either the text offset file or the
	  offset BerkeleyDB. Added accession length to the output.

	* match.c, match.h, result.c, result.h: Cleaned up unused or obsolete
	  procedures.

	* hashdb-read.c: Made hashindex memory mapped again. Added madvise()
	  commands to help with memory mapping.

	* stage1.c: Changed algorithm for stage 1 to extend for 2 hash intervals
	  past the first connectable pair of hits.

	* block.c, block.h: Added ability to stop block at a certain position.

2002-10-15  twu

	* dynprog.c, dynprog.h, stage1.c, stage2.c: Added counts of matches and
	  mismatches on dynamic programming of single gaps, and used this to exclude
	  dynamic programming results on 5' and 3' ends.

	* snap.c, gmap.c, stage2.c, stage2.h: Increased EXTENSION from 90 to 1000.
	  Included check for genomicpos2 against chromosomal length.

	* hashdb-read.c, hashdb-write.c: Reading hashindex into memory instead of
	  memory mapping it.

	* dynprog.c: Cosmetic changes to debug macro.

	* align.c: Cleaned out unused code.

	* snap.c, gmap.c: Added flag for specifying maxentries (in stage 2).

	* stage2.c: Fixed off-by-one error on requesting dynamic programming of 5'
	  end.

2002-10-14  twu

	* hashdb-write.h: Added log file.

	* snapindex.c, gmapindex.c: Fixed minor bug relating to log file.

	* hashdb-write.c: Added file pointer for a log file.

	* snapindex.c, gmapindex.c: Raised default maxentries value from 5 to 20.
	  Added file pointer for a log file.

	* hashdb-write.c: Commented out monitoring statements.

	* hashdb.c: Developed new hash function to give the same hash value for an
	  oligo and its reverse complement. This should improve page access for the
	  hash lookup.

	* stage2.c: Improved debugging statement.

	* snap.c, gmap.c: Removed effect of maxentries in stage 2, which was causing
	  some alignments to be short.

	* align.c: Changed debug statements from fprintf to printf.

	* hashdb-read.c: Fixed problem in binary search where we subtracted 1U from
	  0U.

	* dynprog.c: Allowed dynamic programming to identify introns even if
	  lowercase.

	* database.c: Reformatting.

	* hashdb-read.c: Fixed bug where function was returning NULL prematurely.

	* snapindex.c, gmapindex.c: Made second pass read only on auxfile.

2002-10-13  twu

	* Makefile, block.h, database.c, database.h, match.c, oligo.h, request.h,
	  result.c, snap.c, snapindex.c, gmap.c, gmapindex.c, stage1.h: Removed
	  traces of PureDB package.

	* Makefile, cell.c, cell.h, snapindex.c, gmapindex.c: Changed snapindex
	  program to a two-pass process.
	  The first pass saves the .aux file, and the second pass creates the hash
	  table. This simplifies the Cell_T object greatly.

	* cell.c, cell.h, hashdb-write.c, snapindex.c, gmapindex.c: Sorting cell
	  entries by hashvalue then by oligo.

2002-10-12  twu

	* hashdb-read.c: Fixed minor bug in binary search routine.

2002-10-11  twu

	* Makefile, block.c, hashdb-read.c, hashdb-write.c: Changed structure of
	  hashdb to have three tables: oligo_offset, oligos, and positions.

	* hashdb-write.c: Checking totalsize of contents and setting file size
	  initially to that.

	* hashdb-read.c: Memory mapping hash contents now.

2002-10-10  twu

	* snapgenerate.c: Fixed memory leaks.

	* snap.c, gmap.c: Fixed memory leak from failure to free Offset_T object.

	* snap.c, gmap.c: Fixed bug where datadir was freed.

	* hashdb-read.c: Fixed bug where memory mapped offsets were freed.

	* dynprog.c: Fixed bug where cL+1 or cL+2 exceeded length2L.

	* oligo.c: Revised procedures to handle lowercase letters in the query
	  sequence.

	* hashdb-read.c: Fixed bug where nentries wasn't being set.

	* Makefile: Divided hashdb into separate read and write files.

	* block.c, block.h, oligo.c, oligo.h, request.c, request.h, snap.c, gmap.c:
	  Changing from PureDB to our own Hashdb_T.

	* stage1.c, stage1.h: Changing from PureDB to our own Hashdb_T. Also fixed
	  bug where results3 was not being initialized to NULL.

	* snapindex.c, gmapindex.c: Divided hashdb file into a separate read and
	  write file.

	* snapgenerate.c: Using Offset_T object now, after we have written the
	  chromosome file.

	* hashdb-write.c: Fixed bug causing unaligned access errors, by splitting
	  header into two 4-byte unsigned ints.

	* hashdb-read.c: Fixed bug where length = 0. Also fixed bug causing
	  unaligned access errors, by splitting header into two 4-byte unsigned ints.

	* hashdb-read.c, hashdb-read.h, hashdb-write.c, hashdb-write.h, hashdb.c,
	  hashdb.h: Split Hashdb functions into separate read and write files.

	* hashdb.c, hashdb.h: Provided option to switch between unsigned long and
	  unsigned int for hashoffset_t.

	* hashdb.c, hashdb.h: Changed offsets to be memory-mapped rather than read
	  by file.

	* Makefile, hashdb.c, hashdb.h, snapindex.c, gmapindex.c: Changing hash
	  database to our own format.

	* snapindex.c, gmapindex.c: Fixed bug where last oligo would not get stored.

2002-10-09  twu

	* pair.c, segmentpos.c: Added missing header file for commafmt.

	* Makefile: Revised object files needed for snapindex and snapgenerate.

	* Makefile, accpos.c, add-chrpos-to-endpoints.c, block.c, block.h, cell.c,
	  database.c, database.h, match.c, match.h, offsetdb.c, offsetdb.h,
	  result.c, result.h, segmentpos.c, segmentpos.h, snap.c, snapgenerate.c,
	  snapindex.c, gmap.c, gmapindex.c, stage1.c, stage1.h: Changed offset reads
	  from a database to a structure read from a flat file.

	* get-genome.c: Added -U flag to generate unmasked sequences.

	* offset.c, offset.h: Renamed files from offset.* to offsetdb.*

	* snap.c, gmap.c: Implemented print_details.

	* stage1.c, stage1.h: Implemented print_details. Fixed problem where
	  dominated bounds were not being eliminated.

	* stage2.c: Increased the peelback to identify introns. Added debugging
	  statements.

	* align.c: Fixed greediness for finding introns. Removed gap penalty and
	  reward for intron. Instead, implemented a tie breaker for scores based on
	  genomic distance. Increased the peelback to identify introns.

2002-10-08  twu

	* align.c, block.c, match.c, match.h, result.c, result.h, snap.c, gmap.c,
	  stage1.c, stage1.h, stage2.c: Changed stage 1 of algorithm to find bounds
	  using 5' and 3' hits.

	* Makefile, pair.c, pair.h, stage2.c: Changed goodness to be differences of
	  matches and mismatches.

2002-10-07  twu

	* dynprog.c: Changed recursive functions of traceback and scoreback to be
	  iterative.

	* snap.c, gmap.c, stage2.c: Added check for large query gaps and avoided
	  doing dynamic programming on those. Also added check for allpaths being
	  NULL from stage 2.

	* align.c: Toggled DEBUG.

	* Makefile, snap.c, gmap.c, stage2.c, stage2.h: Added ability to print
	  alignment summaries only.

	* pair.c: Ignored N's in computing percent identity.

	* Makefile, pair.c, pair.h, stage2.c: Added number of exons to calculations
	  and output.

	* snap.c, gmap.c, stage2.c, stage2.h: Made alignment procedure the default.
	  Now sorting paths based on the goodness of the alignment.

	* pair.c, pair.h, stage2.c: Removed npairs from some parameter lists.

	* pair.c, pair.h, stage2.c, stage2.h: Added calculation for goodness, based
	  on percent identity.

	* match.c, pair.c, pair.h, result.c, segmentpos.c, segmentpos.h, snap.c,
	  gmap.c, stage2.c, stage2.h: Now printing endpoints based on alignments, if
	  available.

	* list.c: Fixed bug in List_last.

	* list.c, list.h: Added a List_last procedure.

	* Makefile, match.c, match.h, result.c, result.h, snap.c, gmap.c, stage1.c,
	  stage1.h, stage2.c, stage2.h: Created a Stage2_T object and reorganized
	  calculations, in preparation for using the alignments to rank the results.

	* Makefile, snap.c, gmap.c: Added parameter for maxaligns, the maximum
	  number of alignments to print.

2002-10-06  twu

	* dynprog.c: Fixed read of unallocated hash.

	* align.c: Fixed read of uninitialized variable.

	* align.c, dynprog.c, pair.c, pair.h, stage2.c: Added ability to recognize
	  introns in revcomp direction, and to print correct indices for Crick
	  strand matches.

	* match.c, result.c: Simplified use of zerobasedp.

	* snap.c, gmap.c, stage1.c, stage1.h, stage2.c, stage2.h: Changed variable
	  names to distinguish between hashsize and indexsize.

	* dynprog.c, genome.c, stage2.c: Fixed errors with the sequence and genomic
	  indices.

	* stage1.c: Removed list reversal to match new scheme for doing stage 1
	  dynamic programming.

	* align.c, pair.c: Enhanced debugging information.

	* align.c: Revised code to make sure that we don't pick unwanted paths after
	  the first. We set the usedp flags and recompute dynamic programming on
	  subsequent rounds to avoid using those results. This should affect only
	  stage 1, because maxpaths equals 1 on stage2.

	* align.c: Removed gappenalty for stage 1 computation. This was causing
	  problems with multiple paths for HER2.

	* pair.c, pair.h, stage2.c: Added procedure for summary of exons.

	* snap.c, gmap.c: Made printout slightly better.

	* match.c, result.c: Fixed miscount on number of matches.

	* pair.c: No change.

	* genome.c, genome.h: Added modules to retrieve genome sequences.

	* dynprog.c: Minor restructuring of procedures.

	* dynprog.c: Fixed coordinates in gap. Changed gap output for non-introns.

	* pair.c: Added printing of rulers in alignments.

	* stage2.c: Fixed memory leaks.

2002-10-05  twu

	* dynprog.c: Fixed major problem in paired gap assessments. Need to
	  subtract, not add, the entry in the right matrix.

	* stage2.c: Changed criteria for single and paired gaps, based on a minimum
	  intron length. Created special case for the 3' end.

	* penalties.c: Changed middle gap penalties to have bigger opening and
	  smaller extend penalties.

	* dynprog.c, dynprog.h: Changed concepts from short and long gaps to single
	  and paired gaps.

	* stage2.c: Added peelback procedure to help identify correct intron.
	  Otherwise, the greedy oligo matching procedure can mask the intron
	  boundaries.

	* dynprog.c: Fixed bug for traceback on longgap, where we didn't start from
	  the lower right cell.

	* align.c, stage2.c: Increased size of stage 2 oligos from 8 to 10.

	* align.c, oligoindex.c, oligoindex.h, snap.c, gmap.c, stage2.c, stage2.h:
	  Added ability to limit maxentries in stage 2.

	* dynprog.c: Changed alignment character for dynamic programming to help
	  with debugging.

	* dynprog.c, stage2.c: Implemented dynamic programming across long gaps.

	* dynprog.c: Reordered priorities in traceback to be (1) continue in same
	  direction, (2) diagonal, (3) vertical, and (4) horizontal.

	* dynprog.c, dynprog.h, penalties.c, penalties.h, stage2.c: Cleaned up
	  dynamic programming code for the three cases of FIVE, MIDDLE, and THREE.
	  Added stub for dynamic programming of long gaps.

	* pair.c, pair.h, stage2.c: Made improvements to the alignment output.

2002-10-04  twu

	* match.c, pair.c, pair.h, result.c, snap.c, gmap.c, stage2.c: Added
	  improvements to the alignment output.

	* stage2.c: Added code to handle the 5' end properly.

	* penalties.c: Changed some values for the penalty parameters.

	* dynprog.c: Changed opening penalties to not include the extension.
	  Added special procedures for 5' and 3' ends of sequence, essentially
	  implementing part of Smith-Waterman on each end. Added special cases in
	  traceback for 5' and 3' ends, but may not be necessary in light of the
	  other changes.

	* stage2.c: Added querypos and genomepos to the Pair object. Reorganized
	  various functions.

	* pair.c, pair.h: Added querypos and genomepos to the Pair object.

	* reader.h: Added another option to cDNAEnd_T.

	* align.c: Fixed the precise bounds around an intron.

	* Makefile, dynprog.c, dynprog.h, penalties.c, penalties.h: Added penalties
	  object. Provided ability to specify different penalties for left, middle,
	  and right part of sequence.

	* stage2.c: Moved printing procedure to another file. Fixed small bug that
	  caused us to miss printing a base.

	* pair.c, pair.h: Removed printing of loci names from alignment.

	* Makefile, align.c, dynprog.c, dynprog.h, pair.c, stage2.c: Added dynamic
	  programming routine to take care of small gaps.

	* Makefile, align.c, align.h, match.c, match.h, matrix.c, matrix.h,
	  oligoindex.c, oligoindex.h, pair.c, pair.h, path.c, path.h, penalties.c,
	  penalties.h, reader.c, reader.h, result.c, result.h, snap.c, gmap.c,
	  stage1.c, stage1.h, stage2.c, stage2.h: Major change to algorithm to have
	  two stages: one using hash table (24-mers) and another using an index
	  table (8-mers). Still need to incorporate a dynamic programming step for
	  gaps in the final alignment.

2002-10-02  twu

	* whats_on: Changed program to work with new data directory for alignment
	  results.

2002-10-01  twu

	* snap.c, gmap.c: Fixed problem where intronlen == 0. Now requiring
	  intronlen > 0. Added extra carriage return when zero paths found.

2002-09-27  twu

	* Makefile: Reduced number of object files used in SNAP.

	* get-genome.c: Fixed use of fscanf to match the .chromosome and .contig
	  file format.

	* path.c, path.h: Simplified call to Path_compute to eliminate scoremat.

	* penalties.c, penalties.h: Added procedure to create a default penalties
	  object.

	* match.c, result.c: Added line for number of matches.

	* snap.c, gmap.c: Fixed bug where resultlist was uninitialized. Allowed
	  resultstring of 0. Simplified call to Path_compute.

	* whats_on: Added -R flag for release number.

2002-09-25  twu

	* intlist.c, intlist.h, scoremat.c, scoremat.h: No longer need Intlist_T or
	  Scoremat_T.

	* Makefile, path.c, path.h, snap.c, gmap.c: Removing Sequence_T. Using
	  char * instead to represent sequences.

	* reader.c, reader.h: Added Reader_pointer function.

	* penalties.c, intlist.c, matrix.c, scoremat.c: Using CALLOC/FREE macros.

	* snap.c, gmap.c: Inadvertent commit. Adding routines to perform
	  nucleotide-level dynamic programming.

	* ring.c, ring.h: Removed Ring_T. Apparently not used by other seqalign
	  files.

	* path.c, path.h: Premature commit. Adding routines to analyze only
	  submatrices.

	* Makefile: Adding files from seqalign.

	* intlist.c, intlist.h: Added files from seqalign to do nucleotide-level
	  dynamic programming.

	* get-genome.c: Added flag for release string. Changed type of positions
	  from long to unsigned int.

	* offset.c, offset.h, offsetdb.c, offsetdb.h: Added datadir to
	  Offset_read_file.

	* match.c, match.h, result.c, result.h: Added Result_path command.

	* snapgenerate.c, snapindex.c, gmapindex.c: Simplified strcpy/strcat calls
	  to sprintf.

	* matrix.c, matrix.h, penalties.c, penalties.h, ring.c, ring.h, scoremat.c,
	  scoremat.h, path.c, path.h: Added to program for doing nucleotide-level
	  dynamic programming. Taken from seqalign.

	* path.c: Inadvertent commit. Still editing.

2002-09-24  twu

	* segmentpos.c, segmentpos.h, snapindex.c, gmapindex.c: Added
	  superaccessions to accsegmentpos_db.

2002-09-23  twu

	* radixsort.c: Fixed syntax error when monitoring is turned off.

	* Makefile, radixsort.c: Added monitoring routine for radix sort.

2002-09-19  twu

	* snapgenerate.c: Removed debug line.

	* snapindex.c, gmapindex.c: Added option for using lowercase characters.

2002-09-18  twu

	* Makefile, snapgenerate.c: Added program snapgenerate, to create text
	  .chromosome, .contig, and .chromosome files.

	* snapindex.c, gmapindex.c: Clarified the variable auxfile.

	* snapindex.c, gmapindex.c: Clarified the variable dbroot.

2002-09-17  twu

	* oligo.c: Made comment to explain Third Degree warning.

	* Makefile, block.c, block.h, match.c, match.h, reqpost.c, reqpost.h,
	  result.c, result.h, snap.c, gmap.c: Made changes to sample query sequence
	  at a test interval and perform dynamic programming.

2002-09-16  twu

	* endpoints.c, endpoints.h: Removed from source.

2002-09-12  twu

	* block.c: Changed debug flag.

	* Makefile: Changed C compiler flags.

	* snap.c, gmap.c: Changed default directory to be in /usr/seqdb2_nb.

2002-08-30  twu

	* endpoints.c, match.c, match.h, result.c, result.h, snap.c, gmap.c: Made
	  changes to facilitate garbage collection, including adding a matchedp flag
	  to results, and putting singleton results into an endpoint.

	* block.c: Changed debug messages.

	* endpoints.c, endpoints.h: Changed print routine. Added code for query
	  length.

	* snap.c, gmap.c: Added consolidation of endpoints, and ranking of those to
	  generate a single result.

2002-08-29  twu

	* block.c, match.c, oligo.c, result.c, segmentpos.c: Added debug macros.

	* endpoints.c, endpoints.h: Added commands for sorting endpoints and testing
	  for adjacency.

	* reader.c: Fixed test when startptr == endptr.

	* snap.c, gmap.c: Implemented divide-and-conquer strategy on query sequence.

2002-08-28  twu

	* snapindex.c, gmapindex.c: Turned off printing of subaccession messages.

2002-08-22  twu

	* snapindex.c, gmapindex.c: Added timing statistics.

	* block.c, oligo.c: Fixed coordinate calculations. May need to check.

	* snapindex.c, gmapindex.c: Added dump function.

	* radixsort.c, radixsort.h: Changed accessor function to get a character
	  rather than a pointer. Fixed algorithm for case where byte equals strlen.

	* cell.c, cell.h: Changed accessor function to get a character rather than a
	  pointer.

	* Makefile, rsort-check.c, rsort-test.c: Added a test and check routine for
	  radixsort.

	* Makefile, snapindex.c, gmapindex.c: Removed unnecessary files for
	  snapindex.

2002-08-21  twu

	* Makefile, block.c, block.h, endpoints.c, endpoints.h, match.c, match.h,
	  offset.c, offsetdb.c, oligo.c, oligo.h, readcirc.c, reader.c, reader.h,
	  request.h, result.c, result.h, snap.c, gmap.c: Major change to implement
	  divide-and-conquer strategy.

	* read.c, read.h: Changed name of file from read.c to readcirc.c

	* block.c, block.h, endpoints.c: Partial changes to implement
	  divide-and-conquer strategy.

	* snapindex.c, gmapindex.c: Improved diagnostic messages.

	* Makefile, offset.c, offsetdb.c, radixsort.c, radixsort.h, read.c,
	  readcirc.c, segmentpos.c, snapindex.c, gmapindex.c: Fixed minor compiler
	  warnings.

	* cell.c: Using pointers rather than lists to store multiple positions for
	  an oligo. Fixed quicksort compare function accordingly.

	* snapindex.c, gmapindex.c: Using pointers rather than lists to store
	  multiple positions for an oligo.

	* radixsort.c: Added small speed hacks.

	* Makefile: Added quicksort as an option.

	* Makefile, cell.c, cell.h, radixsort.c, radixsort.h, snapindex.c,
	  gmapindex.c: Added radix sort as a replacement for quicksort.

2002-08-20  twu

	* oligo.c: Fixed key_size for partial bytes.

	* snapindex.c, gmapindex.c: Changed location of oligo file to be in dbenv
	  directory, not a subdirectory.

2002-08-15  twu

	* Makefile, block.c, block.h, oligo.c, oligo.h, request.c, request.h,
	  snap-withenv.c, snap.c, snapindex.c, gmap.c, gmapindex.c: Made changes to
	  accommodate sizes less than 32-mers.

	* match.c, match.h, offset.c, offset.h, offsetdb.c, offsetdb.h, result.c,
	  result.h, snap.c, gmap.c: Added ability to read chromosome information
	  from file, but not done by default right now.

2002-08-11  twu

	* Makefile, block.c, block.h, dpentry.c, dpentry.h, endpoints.c,
	  endpoints.h, match.c, match.h, result.c, result.h, snap.c, gmap.c: Changed
	  algorithm to work inward from both ends and find a single match.

2002-08-10  twu

	* Makefile, snapindex.c, gmapindex.c: Fixed program to handle cases where
	  interval is less than size.

2002-07-19  twu

	* segmentpos.c: Changed type of querylen.

	* whats_on: Changed suffix for db filenames.

	* iit-read.c, iit.c, interval-read.c, interval-read.h, interval.c,
	  interval.h: Changed binary storage format to be a single file.

	* get-genome.c: Changed input format to accept a single string.

2002-07-13  twu

	* iit_get.c: Added ability to query symbolic db.

2002-07-12  twu

	* iit_get.c: Fixed bugs in the algorithm.

	* iit_get.c: Allowed user to specify a single point, rather than an interval.

	* iit_store.c: Added an output message when Berkeley DB file is done.

	* iit_get.c, iit_store.c: Integrated interval tree into
	  db_load/retrieve_endpoints.

	* Makefile, basic.h, iit-read.c, iit.c, interval-read.c, interval-read.h,
	  interval.c, interval.h: Rewrote interval tree to handle interval queries
	  and to write tree to and read tree from files.

2002-07-11  twu

	* basic.h, iit-read.c, iit.c: Added code for integer interval trees from
	  Edelsbrunner's alpha shapes.

	* get-genome.c: Added ability to convert a single coordinate.

	* prb.c: Fixed minor typos.

	* prb.c, prb.h: Revised format and separated interface from implementation.

	* prb.c, prb.h: Added routines for red-black trees with parent pointers from
	  libavl 2.0

2002-07-10  twu

	* dpentry.c: Changed criterion to consider query coverage.

	* endpoints.c: Fixed problem with negative relative positions.

2002-07-09  twu

	* endpoints.c, endpoints.h, snap.c, gmap.c: Revised output format of SNAP.
	  Coordinates are now given for each accession.

	* offset.c, offset.h, offsetdb.c, offsetdb.h, snap.c, gmap.c: Revised
	  chromosome dump procedure to print lengths as well as offsets.

	* Makefile, add-chrpos-to-endpoints.c: Added program for adding chromosomal
	  position to endpoints.

	* whats_on: Modified program to work with new version of SNAP.

2002-07-08  twu

	* dpentry.c, match.c, match.h, result.c, result.h, snap.c, gmap.c: Restored
	  nleads as a criterion in dynamic programming. Added features to help with
	  debugging.

	* Makefile, get-genome.c: Freed get-genome from using BerkeleyDB databases,
	  which are too slow to open.

	* get-genome.c: Added ability to report coordinates.

	* iit_get.c: Added check for zero matches.

	* whats_on: Preliminary changes (inadvertent checkin).

	* Makefile, get-genome.c: Created get-genome program.

	* accpos.c: Added offset for chromosomes.

	* accpos.c, database.c, database.h, snap.c, snapindex.c, gmap.c,
	  gmapindex.c: Changed location of data files.

2002-07-07  twu

	* iit_get.c: Fixed bug with testing dumpp.

2002-07-06  twu

	* endpoints.c, endpoints.h, match.c, result.c: Added check for boomerang
	  paths, where the genomic length is 0.

2002-07-05  twu

	* whats_on: Added whats_on from ../snap.

	* spidey_compress.pl: Added meta-level compression.

	* spidey_compress.pl: Added spidey_compress.pl from ../snap.

	* iit_get.c: Added dump utility.

	* iit_get.c, iit_store.c: Added programs for storing and retrieving records
	  based on endpoints.

2002-07-04  twu

	* sim4_uncompress.pl: Added retrieval function for get-genome.

2002-07-03  twu

	* sim4_compress.pl, sim4_uncompress.pl: Added further compression by
	  counting repeated tokens.

	* sim4_compress.pl, sim4_uncompress.pl, util: Added sim4
	  compression/uncompression routines from snap CVS archive.

	* dpentry.c, match.c, result.c: Changed from using slopes (quotients) to
	  intron measurements (differences).

	* endpoints.c, snap.c, gmap.c: Changed output slightly, e.g., en-dash for
	  number ranges.

	* Makefile, accpos.c: Created program accpos, for finding genomic position
	  of accessions.

	* segmentpos.c, segmentpos.h: Added procedure for finding partially matching
	  accessions.

	* snapindex.c, gmapindex.c: Made creation of aux-only database faster.

	* snap.c, gmap.c: Fixed small bug in error message.

	* segmentpos.c, segmentpos.h: Made Segmentpos_print extern.

	* database.c: Changed accsegmentpos_db from hash to B-tree.

	* database.c, database.h, snap.c, snapindex.c, gmap.c, gmapindex.c: Merged
	  two database procedures.

	* segmentpos.c, segmentpos.h: Added procedure for reading from
	  accsegmentpos_db.

	* Makefile, cell.c, cell.h, database.c, database.h, endpoints.c,
	  endpoints.h, segmentpos.c, segmentpos.h, snap.c, snapindex.c, gmap.c,
	  gmapindex.c: Added another database, from accession name to segmentpos,
	  and renamed databases.

2002-07-02  twu

	* Makefile, block.c, block.h, snap.c, gmap.c: Added specification for
	  minimum separation between leads.

	* Makefile: Removed segmentpos dump flag from db.test

	* endpoints.c, endpoints.h, snap.c, gmap.c: Changed to 1-based coordinates
	  as default.

	* snapindex.c, gmapindex.c: Removed segment dump, because it can be
	  performed by snap.

	* snap.c, gmap.c: Added several command-line options.

	* segmentpos.c, segmentpos.h: Enhanced dump procedure to report absolute
	  genomic positions.

	* match.c, match.h, result.c, result.h: Storing signed genome_coverage into
	  dpentry and checking for impossible slopes (< 0.9).

	* offset.c, offset.h, offsetdb.c, offsetdb.h: Added dump procedure.

	* endpoints.c: Added printing of subaccessions for Celera genome.
Added 29105 commas to output of positions. Changed dominated function to look for any 29106 overlap instead of complete coverage. 29107 29108 * Makefile, dpentry.c, dpentry.h: Changed comparison function to use slopes. 29109 29110 * chrnum.c: Added check for uninitialized chromosome. 29111 291122002-07-01 twu 29113 29114 * block.h, buffer-thread-attempt.c, buffer-thread-attempt.h, buffer.c, 29115 buffer.h, dbentry.c, dbentry.h, entry.c, entry.h, hits.c, hits.h, oligo.c, 29116 sort.c, sort.h, table.c, table.h: Removed unused files. 29117 29118 * Makefile, block.c, block.h, database.c, database.h, endpoints.c, 29119 endpoints.h, hash-oligos.c, hit.c, hit.h, match.c, match.h, oligo.c, 29120 oligo.h, request.c, request.h, result.c, result.h, scan.c, scan.h, 29121 segmentpos.c, snap.c, snapindex.c, gmap.c, gmapindex.c: Changed oligo_db 29122 from BerkeleyDB to PureDB. Created object for endpoints. Removed unused 29123 files. 29124 291252002-06-29 twu 29126 29127 * segmentpos.c, segmentpos.h, snap.c, gmap.c: Added genomic position to the 29128 output. 29129 29130 * match.c, result.c, snap.c, gmap.c: Fixed memory leaks. 29131 29132 * block.c, block.h, dpentry.c, dpentry.h, match.c, match.h, request.c, 29133 request.h, result.c, result.h, snap.c, gmap.c: Added minimum spanning 29134 tree. Version appears to work well. 29135 29136 * Makefile, block.c, block.h, dpentry.c, dpentry.h, match.c, match.h, 29137 request.c, request.h, result.c, result.h, snap.c, gmap.c: Early version of 29138 dynamic programming that stores H best paths at each hit. 29139 291402002-06-28 twu 29141 29142 * match.c, match.h, result.c, result.h, segmentpos.c, snap.c, gmap.c: Added 29143 simple dynamic programming and best pair techniques. 

	* Makefile, block.c, block.h, database.c, database.h, match.c, match.h,
	  oligo.c, oligo.h, reqpost.c, request.c, request.h, result.c, result.h,
	  snap.c, snapindex.c, gmap.c, gmapindex.c: Implemented working version of
	  snap that uses multiple oligo_dbs with requests and strings results
	  together from 5' and 3' ends.

	* commafmt.c, commafmt.h: Added source code for adding commas to numbers.

2002-06-25  twu

	* Makefile: Added specification of directory for dbenv.

	* database.c, snapindex.c, gmapindex.c: Added provisions for transactions,
	  to try to speed up build of database.

2002-05-29  twu

	* snapindex.c, gmapindex.c: Allowed the user to specify a directory for the
	  BerkeleyDB environment.

2002-05-27  twu

	* Makefile, segmentpos.c, snapindex.c, gmapindex.c: Added specification of
	  segmentfile as flag -g.

	* Makefile, database.c, database.h, snapindex.c, gmapindex.c: Removed
	  genome_db and delta_db from snapindex.

	* Makefile, segmentpos.c, segmentpos.h, snapindex.c, gmapindex.c: Added
	  ability to dump segments (in order) from segmentpos_db

	* Makefile: Changed flags for C compiler.

	* segmentpos.c: Added check to get previous segment only in some cases.

2002-05-22  twu

	* Makefile, oligo.c, oligo.h, read.c, read.h, readcirc.c, readcirc.h,
	  scan.c, scan.h, segmentpos.c, segmentpos.h, snap.c, gmap.c: Working
	  version of snap using a scan of genomic and delta information.

2002-05-21  twu

	* Makefile, cell.c, cell.h, database.c, database.h, hit.c, hit.h, offset.c,
	  offset.h, offsetdb.c, offsetdb.h, oligo.c, oligo.h, scan.c, scan.h,
	  snap.c, snapindex.c, gmap.c, gmapindex.c, table.c, table.h: Made changes
	  to store delta position of genomic oligos and to store oligos of query
	  sequence.

2002-05-08  twu

	* Makefile, database.c, database.h, hash-oligos.c, snap.c, snapindex.c,
	  gmap.c, gmapindex.c: Consolidated sample-oligos and hash-oligos into
	  snapindex. Specified oligo dbtype by using Berkeley DB constants.

2002-05-03  twu

	* Makefile, cell.c, cell.h, database.c, database.h, hit.c, read.c,
	  readcirc.c, sample-oligos.c: Separated database commands for oligos from
	  the other database (aux).

2002-04-26  twu

	* Makefile, hit.c, oligo.c, read.c, readcirc.c, snap.c, gmap.c: Removed
	  environment. Began implementation of dynamic programming.

	* Makefile, chrnum.c, chrnum.h, database.c, database.h, hash-oligos.c,
	  hit.c, hit.h, offset.c, offset.h, offsetdb.c, offsetdb.h, oligo.c,
	  oligo.h, read.c, read.h, readcirc.c, readcirc.h, snap.c, gmap.c:
	  Re-implementation of SNAP using new database created by hash-oligos.

2002-04-24  twu

	* segmentpos.c, segmentpos.h: Handled problems with chromosome string to
	  integer conversions.

	* hash-oligos.c: Handled problems with chromosome string to integer
	  conversions. Rearranged calls to db->open so that each db is opened only
	  once.

	* Makefile, btree.c, btree.h, hash-oligos.c, hash.c, hash.h, oligo.c,
	  oligo.h: Consolidated code into fewer files.

	* Makefile: Changed CFLAGS to optimize speed.

	* cell.h: Matched up .h file with .c file.

	* Makefile, genomicpos.c, genomicpos.h, hash-oligos.c, segmentpos.c,
	  segmentpos.h: Now storing genomic locations as global positions, which
	  require keeping track of chromosomal offsets.

	* Makefile, btree.c, btree.h, cell.c, cell.h, entry.c, entry.h,
	  genomicpos.c, genomicpos.h, hash-oligos.c, sample-oligos.c: Major change
	  to allow B-trees, to avoid storing adjacent oligos, to store genomic
	  positions, and to write oligos in binary format.

2002-04-22  twu

	* Makefile, assert.c, assert.h, block.c, block.h, bool.h,
	  buffer-thread-attempt.c, buffer-thread-attempt.h, buffer.c, buffer.h,
	  cksum-fa.c, cksum.c, dbentry.c, dbentry.h, entry.c, entry.h, except.c,
	  except.h, hash-oligos.c, hash-test.c, hash.c, hash.h, hits.c, hits.h,
	  list.c, list.h, match.c, match.h, mem.c, mem.h, oligo.c, oligo.h, read.c,
	  read.h, readcirc.c, readcirc.h, reqpost.c, reqpost.h, request.c,
	  request.h, result.c, result.h, sample-oligos.c, snap-withenv.c, snap.c,
	  sort.c, sort.h, src, gmap.c: Initial import into CVS.

2000-05-08  paf

	* config, cvswrappers, loginfo, modules, CVSROOT, checkoutlist, commitinfo,
	  editinfo, notify, rcsinfo, taginfo, verifymsg: initial checkin

2000-05-08  (no author)

	* branches, tags, trunk: Standard project directories initialized by cvs2svn.