2020-09-17  twu

    * Makefile.gsnaptoo.am: Including sam_sort again

2020-09-13  twu

    * VERSION, index.html: Updated version number

    * path-solve.c, distant-rna.c, stage3hr.c, terminal.c: Using new interfaces
      to Substring trim functions

    * gmap_build.pl.in: Fixed flag in help output

    * substring.c, substring.h: For DNA-seq, not allowing extension of last
      mismatch at end if it extends beyond chromosomal bounds

2020-07-10  twu

    * pair.c: Added semicolon between Dir and coverage in gff3 output

2020-06-29  twu

    * trunk, VERSION: Revised version number

    * src: Merged revisions 222852 through 222926 from
      branches/2020-06-12-end-trimming

    * terminal.c: Merged revisions 222852 through 222926 from
      branches/2020-06-12-end-trimming to use new interface to
      Substring_trim_qend_nosplice

    * stage1hr.c, stage1hr.h: Merged revisions 222852 through 222926 to use both
      max_mismatches_refalt and max_mismatches_ref

    * gsnap.c: Merged revisions 222852 through 222926 to add
      --max-mismatches-ref and to change --ignore-trim-in-filtering to
      --filter-within-trims

    * substring.c, substring.h: Merged revisions 222852 through 222926 from
      branches/2020-06-12-end-trimming to back up 1 bp for mismatches at the ends
      of reads for DNA-seq

    * stage3hr.c, stage3hr.h, stage3hrdef.h: Merged revisions 222852 through
      222926 from branches/2020-06-12-end-trimming to use refalt for scoring and
      both ref and refalt for filtering

    * indel.c: Merged revisions 222852 through 222926 from
      branches/2020-06-12-end-trimming to use new interface to
      Genome_count_mismatches_substring

    * genome128_hr.c, genome128_hr.h: Merged revisions 222852 through 222926
      from branches/2020-06-12-end-trimming to handle masked genomes

2020-06-22  twu

    * gsnap.c: Using new interface to Indel_setup

    * indel.h: Providing genomelength in Indel_setup

    * indel.c: Using Genome_fill_buffer_ref instead of Genome_fill_buffer_blocks
      to avoid issues at ends of the genome

    * genome.c, genome.h: Implemented Genome_fill_buffer_ref

2020-06-18  twu

    * cpuid.c: Fixed parameter list for Intel compilers

    * stage3.c: Adding another exception for long end introns: end exon must be
      less than 40 bp

2020-06-15  twu

    * VERSION: Updated version number

    * trunk, src, genome128_consec.c, genome128_hr.c, splice.c: Merged revisions
      222853 through 222859 from branches/2020-06-12-end-trimming to increase
      MIN_EXON_LENGTH from 9 to 20

2020-06-13  twu

    * trunk, src, substring.c: Merged revision 222854 from
      branches/2020-06-12-end-trimming to iterate through correct number of
      mismatches from Genome trim procedures, and when splicing fails, using
      trimpos rather than pos5 or pos3

2020-06-04  twu

    * samprint.c: Fixed memory leak with mate_md_fp in nomapping alignments

    * trunk, VERSION, config.site.rescomp.tst: Updated version number

    * index.html: Updated for new version

    * src, Makefile.gsnaptoo.am, cigar.c, cigar.h, gsnap.c, mdprint.c,
      mdprint.h, samprint.c, samprint.h: Merged revisions 222824 through 222834
      from branches/2020-06-03-MD to integrate computation of CIGAR and MD
      strings

2020-06-03  twu

    * trunk, cigar.c, cigar.h, samprint.c: Merged revisions 222820 through
      222823 from branches/2020-06-03-MD to compute MD string correctly for
      hardclipping on minus alignments and for the --sam-hardclip-use-S flag

    * samprint.c: Removed debugging macro

    * samprint.c: Removed code that led to a zero-length MD string when
      hard-clipping was present

    * substring.c: Changed format of debugging statement to handle both regular
      and large genomes

    * splice.c, oligoindex_hr.c, kmer-search.c, iit-read-univ.c: Removed unused
      variables

    * gsnap.c: Using new interfaces to Indexdb_new_genome and
      Indexdb_new_transcriptome

    * gmap.c, atoiindex.c, cmetindex.c, indexdb-cat.c: Using new interface to
      Indexdb_new_genome

    * genome.c, transcriptome.c: Removed unused variable

    * compress-write.c: Fixed messages to stderr

    * stage3.c: Changed type of intronlength from Chrpos_T to int.  Changed
      types of new_leftgenomepos and new_rightgenomepos to be int

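    Note on the stage3.c entry above: the log does not state why intronlength
    and the new genomic positions became int.  One common motive for such a
    change, assumed here for illustration only, is that a difference of
    unsigned coordinates wraps around instead of going negative.  The sketch
    below shows that effect; pos_t, length_unsigned, and length_signed are
    hypothetical names and do not come from the GMAP sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unsigned position type (stand-in only, not GMAP's Chrpos_T). */
typedef uint32_t pos_t;

/* Unsigned subtraction wraps modulo 2^32, so a "negative" length
   comes back as a very large positive value. */
pos_t
length_unsigned (pos_t start, pos_t end) {
  return end - start;
}

/* Widening to a signed 64-bit type before subtracting keeps the sign.
   The result is asserted to fit in 32 bits before narrowing to int. */
int
length_signed (pos_t start, pos_t end) {
  int64_t diff = (int64_t) end - (int64_t) start;
  assert(diff >= INT32_MIN && diff <= INT32_MAX);
  return (int) diff;
}
```

    With start = 10 and end = 5, length_unsigned wraps to a value near 2^32,
    while length_signed returns -5, which a later "length < 0" test can catch.
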
    * stage1hr.c: Using new interface to Terminal_solve_plus,
      Terminal_solve_minus, and Distant_rna_solve

    * distant-rna.c, distant-rna.h: Removed unused parameters queryuc_ptr and
      queryrc

    * terminal.c, terminal.h: Removed unused parameters queryuc_ptr and queryrc
      from Terminal_solve_plus and Terminal_solve_minus, respectively

    * uniqscan.c: Using new interface to Stage3_new_genome

    * stage3hr.c: Using new interface to Junction_new_chimera

    * junction.c, junction.h: Removed unused parameter sensedir from
      Junction_new_chimera

    * indexdb.c, indexdb.h: Removed unused parameter expand_offsets_p

    * trunk, VERSION, config.site.rescomp.tst, index.html, src, distant-dna.c,
      distant-dna.h, gsnap.c, indel.c, indel.h, kmer-search.c, path-solve.c,
      splice.c, splice.h, stage3hr.c, stage3hr.h: Merged revisions 222790
      through 222796 from branches/2020-06-03-TGGA to fix bugs in
      transcriptome-guided genomic alignment

2020-06-02  twu

    * stage3hrdef.h: Added comment

    * substring.c: Made computations for mandatory trims similar to those for
      querystart_chrbound and queryend_chrbound

    * stage3hr.c: Computing fields mandatory_trim_querystart and
      mandatory_trim_queryend and using them in computing coverage

    * stage3hrdef.h: Added fields mandatory_trim_querystart and
      mandatory_trim_queryend

    * kmer-search.c: Changed types of genomic coords from Trcoord_T to
      Univcoord_T

    * get-genome.c, snpindex.c: Added calls to Genome_setup

2020-06-01  twu

    * VERSION: Updated version number

    * gmap.c: Using new interface to Genome_user_setup

    * genome.c, genome.h: Added genomelength to Genome_user_setup

    * gsnap.c: Moved Genome_setup before knownsplicing initialization

2020-05-31  twu

    * gmap.c, gsnap.c: Using new interface to Genome_setup

    * index.html: Updated for latest version

    * VERSION: Updated version number

    * README: Updated information

    * genome.c: Fixed bug in specifying coordinates in Genome_fill_buffer_simple

    * genome.c, genome.h: Modifying pos5 and pos3 in Genome_fill_buffer_simple
      to avoid going outside of genome bounds

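    Note on the Genome_fill_buffer_simple entry above: a minimal sketch of
    clamping an interval to genome bounds, assuming a half-open interval
    [pos5, pos3) over coordinates 0..genomelength.  The names coord_t and
    clamp_interval are hypothetical, and the actual change in genome.c may
    be implemented differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical coordinate type; GMAP's own types are not reproduced here. */
typedef uint64_t coord_t;

/* Clamp the half-open interval [*pos5, *pos3) so that it lies within
   [0, genomelength).  Returns the number of positions that remain. */
coord_t
clamp_interval (coord_t *pos5, coord_t *pos3, coord_t genomelength) {
  assert(pos5 != 0 && pos3 != 0);
  assert(*pos5 <= *pos3);

  if (*pos3 > genomelength) {
    *pos3 = genomelength;       /* cut off the part past the genome end */
  }
  if (*pos5 > *pos3) {
    *pos5 = *pos3;              /* interval started past the end: now empty */
  }
  return *pos3 - *pos5;
}
```

    A caller would fill its buffer only for the returned number of positions,
    so that nothing is read past the last genomic coordinate.
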
    * stage3hr.c: Fixed a bug in Stage3end_remove_duplicates that failed to
      return distant alignments

    * gmap_build.pl.in: Checking if transcript FASTA or genes file is provided
      but transcriptome name is not

    * gsnap.c: Using new interfaces to setup procedures

    * stage3hr.c, stage3hr.h, substring.c, substring.h, terminal.c, terminal.h,
      distant-rna.h, extension-search.c, extension-search.h, kmer-search.h,
      path-solve.c, path-solve.h: Providing genomelength to setup procedures

    * kmer-search.c: Fixed calculations of pos5 and pos3 for transcriptome bounds

    * distant-rna.c: Fixed bug in computation of pos3.  Providing genomelength
      to setup procedure

    * stage3hr.c, stage3hr.h: Removed unused variables for Stage3hr_setup

    * gsnap.c, substring.c, substring.h: Removed unused variables for setup

    * get-genome.c: Added a --genes option for converting a genes file to FASTA
      format

    * gmap_build.pl.in: Added --genes option for building a transcriptome from a
      genes file

    * gff3_genes.pl.in: Added options for printing exon and/or CDNA fields.
      Printing only exons by default

    * get-genome.c: Removed unused code

    * kmer-search.c: Fixed issue with uninitialized variable in
      transcriptome-guided genomic alignment

2020-05-30  twu

    * VERSION: Updated version number

    * gmap_build.pl.in: Building a genome index based on the presence of genome
      FASTA files

    * path-solve.c: Handling the case where best_left_paths or best_right_paths
      is NULL, due to an alignment attempt that yields an unacceptable path

    * kmer-search.c: No longer using SIMD shortcuts for computing exoni.  Not
      comparing len against nindels, only exon_residual against nindels

    * stage1hr.c: Edited comment

    * stage3hr.c: Removed test against MIN_ALIGNMENT_LEN

    * transcriptome.c: Added variable for debugging

    * transcript.c: Fixed memory leak

    * distant-rna.c: Computing fragment substring bounds for new
      Genome_mismatches_left and Genome_mismatches_right

    * distant-dna.c: Fixed memory leak.  Allocating memory for new
      Genome_mismatches_left and Genome_mismatches_right

    * genome128_hr.c: For Genome_mismatches_left and Genome_mismatches_right,
      enforcing that nmismatches <= max_mismatches

2020-05-29  twu

    * VERSION: Updated version number

    * iit_store.c: Fixed bug in allocating memory for string

    * path-solve.c: Rearranged debugging code

    * distant-rna.c: Using new interface to Substring_qstart_trim and
      Substring_qend_trim

    * stage3hr.c: In Stage3end_new_substitution, checking for pos5 and pos3
      being outside of genome bounds

    * substring.c, substring.h: Changed parameter names for
      Substring_qstart_trim and Substring_qend_trim

    * iit_store.c: Fixed bug with double freeing line in parsing GFF3 files

    * concordance.c, concordance.h, distant-dna.c, distant-dna.h, kmer-search.c,
      kmer-search.h, path-solve.c, path-solve.h, samprint.c, samprint.h,
      simplepair.c, simplepair.h, stage1hr.c, stage1hr.h, terminal.c,
      terminal.h: Using splicingp instead of novelsplicingp

    * gsnap.c: Added variable splicingp and providing it to setup procedures

2020-05-28  twu

    * stage1hr.c: Testing novelsplicingp before all code for antisense hits

    * ladder.c: In Ladder_minimax_trim, handling the case where the antisense
      ladders are NULL

    * VERSION: Updated version number

    * trunk, src, concordance.c, concordance.h, distant-dna.c, distant-dna.h,
      distant-rna.c, distant-rna.h, extension-search.c, extension-search.h,
      genome128_hr.c, gsnap.c, indel.c, indel.h, junction.c, junction.h,
      kmer-search.c, kmer-search.h, ladder.c, ladder.h, path-solve.c,
      path-solve.h, segment-search.c, segment-search.h, splice.c, splice.h,
      stage1hr.c, stage3hr.c, stage3hr.h, stage3hrdef.h, substring.c,
      substring.h, terminal.c, terminal.h: Merged revisions 222646 through
      222711 from branches/2020-05-23-min-coverage to improve splice-plus-indel
      alignments, allow multiple paths in path-solve procedures, and find
      concordance separately for sense and antisense alignments

2020-05-23  twu

    * indexdb-cat.c: Handling cases where sampling intervals are different

    * gsnap.c: Using new interface to Extension_search_setup

    * extension-search.c, extension-search.h: Generalized from an index1interval
      of 3

2020-05-20  twu

    * stage1hr.c: Always making calls to Stage3end_filter

    * stage3hr.c: Added a debugging statement

2020-05-19  twu

    * trunk, src, gsnap.c, indel.c, ladder.c, merge-diagonals-simd-uint4.c,
      regiondb.c, splice.c, stage1hr.c, stage3hr.c, stage3hr.h, stage3hrdef.h,
      terminal.c, terminal.h: Merged revisions 222614 through 222628 from
      branches/2020-05-18-filtering to improve the filtering and choices among
      alignments

    * VERSION, config.site.rescomp.tst: Updated version number

2020-05-18  twu

    * substring.c: Removed printing of NA for probabilities under NO_COMPARE
      macro

2020-05-15  twu

    * splice.c: Removed code that exited prematurely from search for indels plus
      splicing

    * splice.c: Removed unused code

2020-05-14  twu

    * stage3hr.c: In Stage3end_new_substrings, checking if an alignment in a
      circular chromosome exceeds chrlength

    * substring.c: Added assertions

2020-05-13  twu

    * stage3hr.c: Setting trim_querystart and trim_queryend, and then
      querystart_chrbound and queryend_chrbound, revising according to trim
      values

    * stage3hr.c: Applying minimum alignment length to Stage3_new_substrings

    * stage3hr.c: Requiring a minimum exon length before reducing score for
      spliced ends

    * extension-search.c: Using new interface to Univ_IIT_update_chrnum

    * stage3hr.c: Fixed penalty for trims to work with spliced ends

    * gsnap.c: Restored the -E abbreviation for --distant-splice-penalty

2020-05-12  twu

    * trunk, config.site.rescomp.tst, src, distant-rna.c, extension-search.c,
      extension-search.h, gsnap.c, kmer-search.c, kmer-search.h, path-solve.c,
      path-solve.h, segment-search.c, segment-search.h, stage1hr.c, stage1hr.h,
      stage3hr.c, stage3hr.h: Merged revisions 222561 through 222589 from
      branches/2020-05-08-large-insertions to identify large insertions

    * Makefile.gsnaptoo.am: Added comment

    * stage3hr.c: Not compensating for short chrlengths if chrlength is 0

    * stage1hr.c: Filtering singlehits5 and singlehits3

    * terminal.c, distant-rna.c: Using new interface to Univ_IIT_get_chrnum

    * genome128_hr.c: Setting final querypos to be pos5 - 1, rather than -1

    * concordance.h: Removed unused header file

    * iit-read-univ.c, iit-read-univ.h: Univ_IIT_get_chrnum,
      Univ_IIT_update_chrnum, and Univ_IIT_get_trnum take low and high as
      parameters

    * segment-search.c: Checking for the need to go to a previous chrnum

    * intersect-large.c: Adding parentheses for clarity

    * kmer-search.c: Using new interface to Univ_IIT_get_trnum

    * path-solve.c: In attach_qstart_diagonal and attach_qend_diagonal, checking
      that left+pos5/pos3 are not before the beginning of the genome

2020-05-09  twu

    * intersect-large.c: Fixed check for positions0 at beginning of genome

    * extension-search.c: Using new interface to Univ_IIT_update_chrnum

2020-04-25  twu

    * stage3hr.c: Redefining low and high in Stage3end_T object to be based on
      the aligned endpoints

    * extension-search.c: Checking for genomic bounds on each of the
      univdiagonals

2020-04-24  twu

    * intersect-large.c, intersect.c: Fixed Intersect_exact_indices routines to
      handle small univdiagonals less than diagterm

2020-04-22  twu

    * splice.c: Specifying a minimum splice prob

    * stage3hr.c: Adding slop for insert length and splice score

    * util, gmap_cat.pl.in: Removed unused lines

    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, src,
      Makefile.gsnaptoo.am, concordance.c, distant-rna.c, genome128_hr.c,
      genome128_hr.h, gmap.c, gsnap.c, indexdb.c, kmer-search.c, ladder.c,
      path-solve.c, path-solve.h, splice.c, splice.h, splicetrie.c,
      splicetrie.h, stage3.h, stage3hr.c, stage3hr.h, stage3hrdef.h,
      substring.c, substring.h, terminal.c: Merged revisions 222160 through
      222483 from branches/2020-03-13-exon-intron-scores to allow for masked
      genomes

    * simplepair.c: No longer printing transcript information, since it requires
      a merging step

    * samprint.c: Using new interfaces to procedures for printing transcripts

    * stage3hr.c, stage3hr.h, stage3hrdef.h, transcript.c, transcript.h: Removed
      transcripts5 and transcripts3 from Stage3pair_T object, and computing and
      printing concordance when needed

    * stage3hr.c: Put debugging statements within a macro

    * stage3hr.c: Merged revision 222198 from branches/2020-03-13 to restore
      hit_equal and hitpair_equal procedures for handling overlaps within loci

    * substring.c: Merged revision 222188 from
      branches/2020-03-13-exon-intron-scores to improve trimming

    * stage1hr.c: Merged revision 222189 from
      branches/2020-03-13-exon-intron-scores to call optimal_score_prefinal
      before removing overlaps and optimal_score_final

    * splice.c, splice.h: Merged revision 222187 from
      branches/2020-03-13-exon-intron-scores to take probability and
      nconsecutive thresholds as parameters

2020-04-21  twu

    * iit-read-univ.c, iit-read-univ.h: Removed obsolete procedure
      Univ_IIT_interval_bounds_linear

    * kmer-search.c: Using Univ_IIT_get_trnum

    * trunk, configure.ac, src, Makefile.gsnaptoo.am, chrnum.h, distant-dna.c,
      distant-rna.c, extension-search.c, extension-search.h, gmapindex.c,
      gsnap.c, iit-read-univ.c, iit-read-univ.h, indexdb.c, indexdb.h,
      kmer-search.c, localdb.c, merge-diagonals-heap.c,
      merge-diagonals-simd-uint4.c, merge-diagonals-simd-uint8.c, path-solve.c,
      path-solve.h, record.h, regiondb.c, regiondb.h, segment-search.c,
      stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h, terminal.c,
      util: Merged revisions 222404 through 222444 from
      branches/2020-04-14-right-diagonals to use univdiagonals instead of lefts

2020-04-18  twu

    * samprint.c: Allowing for XS field to be printed in transcriptome-guided
      alignment

    * index.html: Updated for version 2020-04-08

    * gsnap.c: Using new interface to Transcriptome_new

    * gmap_build.pl.in: Added options for building a transcriptome db

    * transcriptome.c, transcriptome.h: Putting transcriptome info into a
      subdirectory of the genome db

    * trindex.c: Putting transcriptome info into a subdirectory in the genome db

    * kmer-search.c: Fixed bug in computing adj

    * Makefile.gsnaptoo.am: Restored trindex program

2020-04-15  twu

    * output.c: Fixed bug in checking for a condition

2020-04-14  twu

    * gmap.c, output.c, output.h: Added options cdna+introns and genomic+introns
      to the --exons flag

2020-04-13  twu

    * svncl.pl: Grouping files with identical comments

2020-04-12  twu

    * iit_get.c: Setting coordstart and coordend to 0 when force_label_p is true

2020-04-10  twu

    * VERSION: Updated version number

    * access.c, access.h, iit-read-univ.c, iit-read.c: Removed procedures for
      read/write memory mapping

2020-04-08  twu

    * trunk, src, gmapindex.c, indexdb-cat.c, gmap_cat.pl.in: Merged revisions
      222346 to 222387 from branches/2020-03-13-exon-intron-scores to remove the
      -F flag from concatenation programs

2020-04-05  twu

    * VERSION, config.site.rescomp.prd, index.html: Revised for latest version

    * svncl.pl: Adding spaces between lines of multi-line comments

    * svncl.pl, MAINTAINER: Replaced svncl.pl with a program that does not
      depend on xml output

    * configure.ac: Added comment

    * dynprog_end.c: Fixed debugging macro

    * gsnap.c: Turned shared memory off by default

    * gmap.c: Turned shared memory off by default.  Added option
      --use-shared-memory

    * compress-write.c, regiondb-write.c: Initializing value of current_pos in
      concatenation procedures

2020-04-02  twu

    * get-genome.c: Added option --add-circular

2020-03-27  twu

    * iit_store.c: Added option --accesion-only

2020-03-23  twu

    * compress-write.c, compress-write.h, gmapindex.c: Allowing for compressing
      genomes from stdin

    * iit_get.c: Removed debugging command

    * compress-write.c, compress-write.h, gmapindex.c: Implemented option for
      compressing FASTA files

    * iit_get.c: Fixed printing of sequence with coordinates

    * gmap.c, output.c, output.h, pair.c, pair.h: Differentiating between
      mask_introns and mask_utr_introns

2020-03-18  twu

    * gmap.c, output.c, output.h, pair.c, pair.h: Added option --mask-introns to
      GMAP

2020-03-13  twu

    * iit_get.c: Ignoring linefeed characters in handling coordinates

    * stage3hr.c, stage3hrdef.h, substring.c, substring.h: Computing
      querystart_chrbound and queryend_chrbound, and disregarding in comparing
      against max_mismatches

    * index.html: Added statement about nosimd versions being restored

2020-03-12  twu

    * archive.html, index.html: Updated for latest version

    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst,
      index.html, src, Makefile.gsnaptoo.am, atoi.c, atoi.h, atoiindex.c,
      cmet.c, cmet.h, cmetindex.c, dynprog_end.c, gmap.c, gmapindex.c, gsnap.c,
      iit-read.c, iit-read.h, indexdb-write.c, indexdb.c, indexdb.h,
      intersect.c, kmer-search.c, kmer-search.h, localdb.c,
      merge-diagonals-simd-uint8.c, path-solve.c, path-solve.h,
      regiondb-write.c, regiondb-write.h, regiondb.c, regiondb.h, regiondbdef.h,
      stage1hr.c, stage1hr.h, transcriptome.c, trindex.c, uint8list.c,
      uint8list.h, util, fa_coords.pl.in, gmap_build.pl.in, gmap_cat.pl.in:
      Merged revisions 221585 through 222138 from
      branches/2020-02-01-local-fixed-size to implement a regiondb hash to
      replace the localdb hash

    * MAINTAINER: Revised instructions for rosalind

    * bootstrap.gsnaptoo: Using automake and autoreconf in path

    * index.html: Added comment about default behavior for localdb usage

2020-03-06  twu

    * gsnap.c: Setting default behavior for localdb usage to be true for RNA-seq
      and false for DNA-seq

    * gmap.c: Setting npaths_primary and npaths_altloc when stage3list is NULL

    * atoiindex.c, cmetindex.c, genome.c, indexdb-write.c, indexdb.c, localdb.c,
      snpindex.c: Using new interface to Access_mmap

    * access.c, access.h: Access_mmap now returns seconds

2020-02-20  twu

    * trunk, index.html, src, iit_store.c, indexdb-cat.c, indexdb.c, util,
      gmap_cat.pl.in: Merged revisions 221935 through 221944 from
      branches/2020-02-01-local-fixed-size to handle circular chromosomes in
      gmap_cat

    * gmap_cat.pl.in, indexdb-cat.c: Allowing -F flag to handle multiple source
      directories

    * gmapindex.c: Improved user error message

    * gmap_build.pl.in: Made none the default value for sorting.  Removed usage
      of sourcedir and -F in calls to gmapindex

    * gmapindex.c: Removed usage of -F except for concatenating genomes.
      Allowing -F to handle multiple source directories

2020-02-19  twu

    * README: Now references LICENSE file

    * Makefile.am: Added LICENSE file for distribution

    * LICENSE: Initial import

    * COPYING: Removed old license.  Now refers to the LICENSE file

    * NOTICE: Removed references to suffix array code.  Added journal reference
      for Lemire and Boytsov

    * compress-write.c: Implemented a simpler and more general algorithm for
      Compress_cat

2020-02-17  twu

    * gmapindex.c: Fixed bug when -F and -D values are different when
      concatenating genomes

2020-02-13  twu

    * indexdb.c: Fixed lengths for kmer size and sampling and error message to
      user

    * gmap_build.pl.in: Removed unused variable

    * gmap_cat.pl.in: Fixed Id comment line

    * gmap_cat.pl.in: Added Id comment line

    * gmap_cat.pl.in: Removed references to sleep.  Added better final message

    * gmap_build.pl.in, gmap_cat.pl.in: Adding package version numbers to
      version file

    * Makefile.am: Removed obsolete scripts

    * indexdb-cat.c: Updating required_index1part and required_index1interval so
      they all match

    * indexdb.c: Added user statement about required kmer and interval

    * gmap_cat.pl.in: Checking compiler assumptions and creating version file

    * compress-write.c: Made fixes to computation of flags

    * trunk, index.html: Updated version

    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
      number

    * gmap_cat.pl.in: Implemented --names option

    * gmap_build.pl.in: Changed documentation

    * gmap_cat.pl.in: Initial import

    * index.html: Updated for latest version

    * access.h: Added NOT_USED as an access type

    * access.c: Adding a check for a null filename

    * gmap_build.pl.in: Removed comment about gmap_setup

    * indexdb-cat.c: Using destdir, which should be set correctly by gmap_cat

    * iit_store.c: For universal IIT output, adding a circular type

    * gmapindex.c: Added code for concatenating genomecomp

    * gmap.c: Turning off contig output

    * compress-write.c: Put debugging messages inside macro

    * bitpack64-write.c, bitpack64-write.h: Using
      Bitpack64_compute_q4_diffs_bidir_huge.  Removed duplicate code

    * README: Removed references to suffix array

    * configure.ac: Added gmap_cat.  Removed obsolete scripts

    * Makefile.gsnaptoo.am: Added instructions for indexdb_cat

    * indexdbdef.h: Added a field hugep

    * indexdb.c: Modified user messages

    * indexdb-cat.c: Implemented procedures for huge genomes and 8-byte positions

    * indexdb-cat.c: Implemented uint8 procedure, but has errors

    * indexdb-cat.c: Implemented merging of positions

    * indexdb.c: Cleaned up Indexdb_new_genome procedure

2020-02-12  twu

    * indexdb-cat.c: Initial import

    * compress-write.c: Enclosed debugging statements into macros

    * compress-write.c: Fixed bugs in Compress_cat

    * compress-write.c: Simplified code for Compress_cat

2020-02-11  twu

    * compress-write.c: Made fixes for shifts < 16

    * compress-write.c, compress-write.h: Initial implementation of Compress_cat
      for concatenating genomes

    * Makefile.gsnaptoo.am: Removed trindex and sam_sort

    * gmap.c, pair.c, sequence.c, sequence.h, stage3.c, stage3.h: Added option
      --gff3-fasta-annotation to GMAP

2020-02-04  twu

    * iit-read.c: Fixed IIT_dump for intervals with start and end both being 0

    * iit_store.c: Handling the case where annotation has zero length

    * gmap_build.pl.in: Returning pipe variable

    * stage3.c: Using new interface to Pair_print_gff3

    * sequence.c, sequence.h: Implemented Sequence_restofheader

    * path-solve.c: No longer using localdb for large genomes

    * pair.c, pair.h: For GFF3 output, printing FASTA headers as annotation

    * localdb.c: Created a separate debugging category

2020-01-30  twu

    * Makefile.gsnaptoo.am: Removed ushortlist.c

    * dynprog.h, dynprog_cdna.h, dynprog_end.h, dynprog_genome.h,
      dynprog_single.h, extension-search.h, genome-write.h, genome128-write.h,
      iit-read-univ.h, ladder.h, record.h, stage2.h, stage3hrdef.h: Added
      include for genomicpos.h

    * chrnum.h, genome.h, genomicpos.h, merge-records-heap.h,
      merge-records-simd.h, path-solve.h: Added include for univcoord.h

    * snpindex.c: Making genomeblocks for snpindex the same length as for the
      reference

    * parserange.c: Initializing coordstart and coordend

    * localdbdef.h, localdb.c, localdb.h: Using a single loctable

    * localdb-write.c, localdb-write.h: Using Uintlist_T instead of Ushortlist_T

    * iit_get.c: If coords are given, printing the corresponding substring of
      the annotation

    * gsnap.c, gmap.c: Allowing pipe signals

    * gmapindex.c: Printing user messages about compression only when offsets
      are compressed

    * epu16-bitpack64-readtwo.c: Declaring a variable

2020-01-23  twu

    * iit_store.c: Allowing input FASTA to have labels without intervals

2019-12-15  twu

    * fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Handling 1-column
      names.txt file

    * stage3.c: Changed some variables from int to Chrpos_T

    * merge-diagonals-simd-uint8.c: Using correct printf format for uint8

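    Note on the merge-diagonals-simd-uint8.c entry above: the log does not
    show which format string was adopted.  As a sketch of one portable way
    to print a 64-bit unsigned value in C, the PRIu64 macro from
    <inttypes.h> can be used.  The helper name format_uint8 is hypothetical
    and is not taken from the GMAP sources.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Write the decimal form of a 64-bit unsigned value into buf.
   Returns what snprintf returns: the number of characters the full
   result needs (excluding the terminating NUL), or a negative value
   on an encoding error. */
int
format_uint8 (char *buf, size_t buflen, uint64_t value) {
  assert(buf != NULL && buflen > 0);
  /* PRIu64 expands to the right conversion specifier for uint64_t
     on the current platform (for example "lu" or "llu"). */
  return snprintf(buf, buflen, "%" PRIu64, value);
}
```

    Using "%u" or "%d" with a 64-bit argument is undefined behavior in C,
    which is why a width-specific macro is the safer choice for debugging
    output that must work for both regular and large genomes.
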
    * localdb.c, localdb.h, localdbdef.h, localdb-write.c, localdb-write.h:
      Handling large genomes

    * indexdb-write.c, indexdb-write.h: Changed procedure names to make them
      clearer

    * gmapindex.c: Using new interfaces to indexdb and localdb write procedures

    * epu16-bitpack64-write.h: Added comment

    * Makefile.gsnaptoo.am, ushortlist.c, ushortlist.h: Added Ushortlist_T
      object for localdb

    * pair.c: Added cDNA direction to GFF3 output

2019-12-12  twu

    * path-solve.c: Changed type for deletionpos to be Univcoord_T

2019-09-12  twu

    * src, concordance.c, concordance.h, distant-dna.c, distant-dna.h,
      extension-search.c, extension-search.h, gsnap.c, junction.c, junction.h,
      kmer-search.c, kmer-search.h, list.c, list.h, stage1hr.c, stage1hr.h,
      stage3hr.c, stage3hr.h, stage3hrdef.h, substring.c, substring.h: Merged
      revisions 220290 through 220325 from
      branches/2019-09-11-zero-length-introns to handle cases where ambiguous
      ends are resolved and where distant DNA alignments yield a zero-length
      intron

    * index.html: Updated for latest version

2019-08-09  twu

    * inbuffer.c, inbuffer.h: Removed references to interleavedp from GMAP

    * gsnap.c: Added --interleaved feature

    * bzip2.c: Saving a file handle and closing it

    * Makefile.gsnaptoo.am: Including bzip2.c and bzip2.h to relevant programs

    * atoiindex.c, cmetindex.c: Initializing filenames to be NULL

2019-07-15  twu

    * inbuffer.c, inbuffer.h, shortread.h: Added interleavedp parameter

    * shortread.c: Implemented interleaved format for gzip- and bzip2-compressed
      files

    * getline.c, getline.h: Implemented Getline_gzip and Getline_bzip2

    * shortread.c: Implemented Shortread_read_interleaved_text

2019-06-11  twu

    * stage1hr.c: Added debugging statements

    * concordance.c: Limiting number of overlaps to avoid combinatorial
      explosion in some cases

2019-05-20  twu

    * index.html: Updated for latest version

    * ax_ext.m4: Improved structure for AVX2 and AVX512

2019-05-12  twu

    * gmap.c, gsnap_select.c, gsnapl_select.c, cpuid.c, cpuid.h, gmap_select.c,
      gmapl_select.c: Adding support for avx512bw

    * gsnap.c: Changed default parameter for --max-mismatches for DNA-seq

    * Makefile.gsnaptoo.am: Adding programs for avx512bw

    * configure.ac: Adding option for AVX512BW SIMD

    * ax_cpuid_intel.m4, ax_cpuid_non_intel.m4: Adding test for AVX512BW support

    * ax_ext.m4: Adding commands for AVX512BW

    * univdiagpool.c: Adding assertions

    * substring.c: Checking against substrings on the wrong chromosome

    * stage1hr.c: Commenting out extended algorithm, which can cause problems
      with repetitive reads

    * output.c: Consider excessive output to be a fail for the purpose of the
      --nofails flag

    * merge-uint8.c: Fixed SIMD command for AVX512 machines

2019-03-25  twu

    * extension-search.c, extension-search.h: Implemented extension of elt sets
      in the opposite direction

9242019-03-19  twu
925
926    * segment-search.c: For alignments straddling a chromosome, recomputing
927      querypos and queryend to cover the new chromosome
928
929    * gsnap.c: Using new interface for Path_solve_setup
930
931    * path-solve.h: New interface for Path_solve_setup
932
933    * path-solve.c: Not allowing splices on circular chromosomes
934
935    * concordance.c: Using field sensedir_for_concordance
936
937    * stage3hr.c, stage3hrdef.h: Now using fields sensedir_for_concordance and
938      sensedir
939
940    * samprint.c: Removed references to Stage3end_sensedir_distant_guess
941
9422019-03-15  twu
943
944    * trunk, VERSION, config.site.rescomp.tst, index.html, src,
945      Makefile.gsnaptoo.am, distant-dna.c, distant-dna.h, distant-rna.c,
946      distant-rna.h, gsnap.c, method.c, method.h, output.c, path-solve.c,
947      samprint.c, samprint.h, splice.c, splice.h, stage1hr.c, stage3hr.c,
948      stage3hr.h, stage3hrdef.h, substring.c, substring.h, terminal.c: Merged
949      revisions 218560 through 218674 from branches/2019-03-07-distant-dna to
950      implement distant splicing and to fix some bugs in spliced alignments
951
952    * genome128_hr.c: Added comment
953
9542019-03-06  twu
955
956    * distant-dna.h, distant-dna.c: Initial import
957
9582019-03-05  twu
959
960    * gsnap.c: Added option --use-local-hash
961
962    * trunk, VERSION, config.site.rescomp.tst, src: Updated for latest version
963
9642019-03-04  twu
965
966    * localdb.c, localdb.h, path-solve.c: Merged revisions 218528 and 218529
967      from branches/2019-03-04-fix-repetitive to limit recursive procedures
968
969    * stage3hr.c: Fixed debugging statements
970
971    * stage1hr.c, kmer-search.c: Added debugging statements
972
973    * segment-search.c: Fixed debugging statement
974
975    * gsnap.c: Removed variables relating to stage2 suboptimal alignments
976
9772019-03-02  twu
978
979    * path-solve.c: For compute_qstart_paths and compute_qend_paths, added a
980      max_depth criterion, and checking for repetitive positions
981
9822019-03-01  twu
983
984    * stage3hr.c: Checking nmismatches when we are checking all assertions
985
986    * path-solve.c: Revising qstart of middle segment if an insertion is
987      present.  Revised code for computing ninserts
988
989    * path-solve.c: Fixed addition of alts substring to best path from
990      all_child_paths, rather than to path
991
9922019-02-26  twu
993
994    * VERSION, config.site.rescomp.prd, index.html: Updated for latest version
995
996    * substring.c: In trimming at ends without splicing, extending to the end if
997      nmismatches is 0
998
999    * stage3hr.c: For score_within_trims, adding a penalty for long ambiguous
1000      ends
1001
1002    * substring.c: Restored previous algorithm for computing trim at ends with
1003      no splice. For alts with good splice probability, counting substring as
1004      nmatches rather than amb.  For alts with poor splice probability, counting
1005      substring as amb.
1006
10072019-02-25  twu
1008
1009    * path-solve.c: Fixed cases in compute_qstart_paths and compute_qend_paths
1010      where terminalp was not being set
1011
10122019-02-22  twu
1013
1014    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, src,
1015      concordance.c, concordance.h, distant-rna.c, distant-rna.h,
1016      extension-search.c, extension-search.h, genome128_hr.c, gsnap.c,
1017      kmer-search.c, kmer-search.h, ladder.c, path-solve.c, path-solve.h,
1018      resulthr.c, segment-search.c, segment-search.h, stage1hr.c, stage3hr.c,
1019      stage3hr.h, stage3hrdef.h, substring.c, substring.h, terminal.c,
1020      terminal.h: Merged revisions 218419 through 218472 from
1021      branches/2019-02-19-restore-fusions
1022
1023    * index.html: Updated for latest version
1024
10252019-02-19  twu
1026
1027    * stage1hr.c: Using user_maxlevel_float for final filtering, and not for
1028      searching
1029
1030    * concordance.c: Using score_ignore_trim instead of score_posttrim
1031
1032    * gsnap.c: Added option for --ignore-trim-in-filtering
1033
1034    * stage3hrdef.h: Changed field score_posttrim to score_ignore_trim
1035
1036    * stage3hr.c, stage3hr.h: Changed Stage3hr_filter_coverage to
1037      Stage3hr_filter, which accounts for number of mismatches.  Added parameter
1038      ignore_trim_p.  Changed score_posttrim to score_ignore_trim, and computing
1039      this to be lower than score
1040
10412019-02-18  twu
1042
1043    * terminal.c: Using Univ_IIT_get_chrnum and new interface to Substring_new
1044
1045    * substring.c, substring.h: Substring_new now assumes that chrnum was set
1046      correctly
1047
1048    * stage3hr.c: Using new interface to Substring_new
1049
1050    * stage1hr.c: Using new interface to Segment_identify procedures
1051
1052    * segment-search.c, segment-search.h: Handling alignments straddling more
1053      than two chromosomes.  Removed plusp as a parameter
1054
1055    * kmer-search.c: Using Univ_IIT_get_chrnum.  Using new interface to
1056      Substring_new
1057
1058    * iit-read-univ.c, iit-read-univ.h: Implemented Univ_IIT_update_chrnum and
1059      Univ_IIT_get_chrnum
1060
1061    * extension-search.c: Calling Univ_IIT_update_chrnum to set chrnum
1062
1063    * stage3hr.c: If qend <= qstart, do not calculate number of nmismatches,
1064      which is not defined.  For nmatches, do not penalize for indels
1065
1066    * substring.c: If trimming changes querystart or queryend, recalculate
1067      nmismatches
1068
10692019-02-15  twu
1070
1071    * stage1hr.c: Skipping alignment when querylength is less than index1part +
1072      index1interval - 1
1073
1074    * segment-search.c: When advancing chrnum for straddled alignments, checking
1075      that we do not go past the last chromosome
1076
1077    * VERSION: Updated version number
1078
1079    * indexdb-write.c: Added code for comparing counts with compression and
1080      counts without compression
1081
1082    * bitpack64-write.c, bitpack64-write.h: For huge genomes, using an array of
1083      UINT8 for calculations of genome position.  Printing strerror for
1084      file-related errors.
1085
1086    * substring.c: For circular chromosomes, checking if the entire substring
1087      resides in the next chromosome and returning the circularpos at that query
1088      position
1089
1090    * segment-search.c: When a straddle calls for advancing to a later
1091      chromosome, using local data structures instead of mixing them with a call
1092      to Univ_IIT_get_one
1093
1094    * path-solve.c: Using subtract_bounded to subtract the local1part amount
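
      Illustration only: the entry above does not show how subtract_bounded
      is declared, so the helper below is a hypothetical sketch with an
      assumed name, signature, and coordinate type. It shows why a bounded
      subtraction matters for unsigned coordinates, where a plain x - amount
      would wrap around to a very large value.

```c
#include <assert.h>

typedef unsigned int Univcoord_T;   /* assumption: an unsigned coordinate type */

/* Hypothetical helper: subtract amount from x, but never go below floor_value */
Univcoord_T subtract_with_floor(Univcoord_T x, Univcoord_T amount,
                                Univcoord_T floor_value) {
  assert(x >= floor_value);         /* caller must start at or above the floor */
  if (x - floor_value < amount) {
    return floor_value;             /* plain subtraction would cross the floor */
  }
  return x - amount;
}
```

      With a floor of 0, subtracting 30 from 10 returns 0 instead of
      wrapping.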
1095
1096    * localdb.c: In Localdb_get_diagonals, checking for low < high, since low ==
1097      high can occur when a substring is at the beginning or end of a chromosome
1098
1099    * cigar.c: Checking for an initial M to be printed before printing any indel
1100      or splice
1101
11022019-02-13  twu
1103
1104    * intersect-large.c, intersect.c: No longer initializing last_diagonal to be
1105      0, and comparing against it, which fails if a diagonal is 0.  Instead,
1106      checking explicitly for the first case
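
      Illustration only: a minimal sketch of the pattern described above,
      not the actual Intersect code. The function name and types are
      assumptions. Comparing against a "last" value initialized to 0 would
      drop a legitimate first diagonal of 0; testing for the first element
      explicitly avoids needing any sentinel value.

```c
#include <assert.h>

/* Hypothetical helper: remove adjacent duplicates from an ascending array
   in place and return the new length */
int unique_diagonals(unsigned int *diagonals, int n) {
  int i, k = 0;

  assert(n >= 0);
  for (i = 0; i < n; i++) {
    /* k == 0 identifies the first element explicitly, so a diagonal of 0
       is kept rather than mistaken for an initial "last" value */
    if (k == 0 || diagonals[i] != diagonals[k - 1]) {
      diagonals[k++] = diagonals[i];
    }
  }
  return k;
}
```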
1107
11082019-01-31  twu
1109
1110    * gsnap.c, path-solve.c, path-solve.h, segment-search.c, segment-search.h,
1111      stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Removed GMAP parameters
1112      from GSNAP code
1113
1114    * VERSION, config.site.rescomp.prd, archive.html, index.html: Updated for
1115      latest version
1116
1117    * intersect-large.c, intersect.c: In Intersect_approx_lower and
1118      Intersect_approx_higher, ignoring duplicates of diagonals0, in order to
1119      ensure that the result is in ascending order
1120
1121    * substring.c: Using plusp in interpreting mandatory_trim_querystart and
1122      mandatory_trim_queryend
1123
1124    * path-solve.c: When a mismatch extends a diagonal, computing the number of
1125      mismatches at that time
1126
1127    * merge-diagonals-simd-uint4.c: Added code for checking that inputs are in
1128      ascending order
1129
1130    * stage3hr.c: Improved tradeoffs between nmatches, nmatches_posttrim, splice
1131      score, nsegments, insertlength, and outerlength in Stage3end_optimal_score
1132      and Stage3pair_optimal_score_final
1133
1134    * extension-search.c, segment-search.c: Handling the case where the middle
1135      or anchor diagonal straddles two chromosomes
1136
11372019-01-23  twu
1138
1139    * trunk, VERSION, src, Makefile.gsnaptoo.am, changepoint.h, compress.c,
1140      concordance.h, diag.c, extension-search.h, gbuffer.c, gbuffer.h, genome.c,
1141      genome.h, genome128_consec.c, genome128_hr.c, genome_hr.c, genome_sites.c,
1142      gmap.c, gsnap.c, kmer-search.c, kmer-search.h, knownsplicing.c,
1143      knownsplicing.h, oligoindex_localdb.h, outbuffer.h, output.c, pair.c,
1144      pair.h, path-solve.c, path-solve.h, samprint.c, segment-search.c,
1145      segment-search.h, simplepair.c, simplepair.h, splice.c, splice.h,
1146      splicetrie.c, splicetrie.h, splicetrie_build.h, stage1hr.c, stage1hr.h,
1147      stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h, types.h,
1148      univcoord.h, univdiagpool.c: Merged revisions 218195 through 218285 from
1149      branches/2019-01-17-split-gmap-gsnap to separate GMAP and GSNAP code
1150
11512019-01-22  twu
1152
1153    * stage1hr.c, cigar.c, path-solve.c: Added debugging statements
1154
1155    * stage3hr.c: At a given locus, checking for nmatches_posttrim before
1156      checking for splice score
1157
1158    * splice.c, splice.h, substring.c: Added limit on consecutive matches in
1159      scanning for spliceends
1160
1161    * path-solve.c: Restored previous code that did not look at extendedp
1162
1163    * splice.c: Added requirement for MIN_EXON_LENGTH in trimming ends
1164
1165    * gsnap.c: Added --sam_sparse_secondaries to omit SEQ and QUAL fields in
1166      secondary alignments
1167
1168    * samprint.c, samprint.h: Using mate_plusp computed in Cigar_compute_main in
1169      compute_flag, to give correct results for translocations.  Added
1170      sam_sparse_secondaries_p to omit SEQ and QUAL fields in secondary alignments
1171
1172    * path-solve.c: Restored usage of the sense condition to handle the
1173      non-splicing condition
1174
1175    * filestring.c: Handling the case in Filestring_merge where source->string
1176      is NULL
1177
1178    * path-solve.c: Checking if qstart or qend is extended from spliced
1179      endpoints of middle diagonal, and if not extended, using the original
1180      endpoints
1181
1182    * junction.c: Changed name of macro for debugging
1183
1184    * samprint.c, cigar.c: Handling the case where hard clipping removes all
1185      substrings
1186
1187    * extension-search.c: Added debugging statement
1188
11892019-01-18  twu
1190
1191    * extension-search.c: Fixed bugs in process_seed for processing the
1192      remainder of the queryfwd set or the queryrev set
1193
1194    * extension-search.c: Rewrote algorithm extensively to combine seeds and
1195      sets from queryfwd and queryrev passes
1196
1197    * segment-search.c: Using new interface to Path_solve_from_diagonals
1198
1199    * path-solve.c, path-solve.h: Changed Path_solve_from_diagonals to take a
1200      univdiagonal, qstart, and qend, instead of a Univdiag_T object
1201
1202    * kmer-search.c: Removed unused include file
1203
12042019-01-17  twu
1205
1206    * setup.genomecomp.ok: Revised gold standard for extra bytes at end
1207
1208    * segment-search.c: Fixed problem with allocation when total_npositions is
1209      zero in Segment_identify_lower and Segment_identify_higher
1210
1211    * hitlistpool.c: Initial import
1212
1213    * gsnap.c: Removed oligoindices_major, oligoindices_minor, pairpool,
1214      diagpool, cellpool, and Dynprog_T objects as variables
1215
1216    * gmap.c: Using new interfaces to stage1, stage2, and stage3 procedures
1217
1218    * stage3.c, stage3.h: Removed unused parameters
1219
1220    * translation.c: Removed npairs as a parameter for backward procedures
1221
1222    * terminal.c, terminal.h: Removed mismatch_positions_alloc as a parameter
1223
1224    * stage2.c: Removed code based on anchoredp, anchor_querypos, and
1225      anchor_position, which are now always false and 0
1226
1227    * stage1hr.c, stage1hr.h: Using new interfaces to kmer-search,
1228      extension-search, terminal, and concordance procedures.  Removed
1229      oligoindices_minor, diagpool, and cellpool parameters to single_read and
1230      paired_read procedures
1231
1232    * stage1.c, stage1.h: Using new interfaces to Block_process_oligo_5 and
1233      Block_process_oligo_3.  Removed sizelimit parameters to Stage1_compute
1234      procedures
1235
1236    * splice.c, splice.h: Removed unused parameters to Splice_setup
1237
1238    * smooth.c: Removed exon_denominator as a parameter to
1239      find_internal_bads_by_prob
1240
1241    * segment-search.c: Removed unused variable
1242
1243    * samprint.h: Removed preprocessor macros for GSNAP
1244
1245    * samprint.c: Using new interfaces to Substring_compute_chrpos and
1246      Pair_print_sam
1247
1248    * path-solve.h: Removed interface to Path_solve_via_gmap
1249
1250    * path-solve.c: Using new interfaces to substring procedures
1251
1252    * pair.c, pair.h: Removed unused parameters for Pair_print_sam
1253
1254    * output.c: Using new interfaces to stage3 print procedures
1255
1256    * iit-write-univ.c: Removed omegas as a parameter to node_select
1257
1258    * iit-read.c, iit-read.h: Commented out obsolete procedure
1259
12602019-01-16  twu
1261
1262    * genome128_consec.c: Added macros around some procedures.  Removed unused
1263      procedures
1264
1265    * genome128_hr.c: Added macros around some procedures
1266
1267    * intersect.c, indexdb.c: Added LARGE_GENOMES macro to a procedure
1268
1269    * oligoindex_hr.c, oligoindex_hr.h: Removed unused parameters from
1270      Oligoindex_untally
1271
1272    * kmer-search.c, kmer-search.h, extension-search.c, extension-search.h:
1273      Removed unused parameters
1274
1275    * epu16-bitpack64-read.c, epu16-bitpack64-readtwo.c: Commented out print
1276      procedures for debugging
1277
1278    * epu16-bitpack64-incr.c: Turned off CHECK macro
1279
1280    * dynprog_single.c, dynprog_single.h: Removed glengthL and glengthR as
1281      parameters to Dynprog_microexon_int
1282
1283    * dynprog_genome.c, dynprog_genome.h: Removed unused parameters, including
1284      calculation of canonical_reward
1285
1286    * distant-rna.c, distant-rna.h: Removed user_maxlevel as a parameter
1287
1288    * datadir.c: Removed unused variables
1289
1290    * concordance.c, concordance.h: Using new interface to Stage3pair_new.
1291      Removed unused parameters
1292
1293    * stage3hr.c, stage3hr.h: Removed oligoindices_minor, diagpool, and cellpool
1294      as parameters, used previously for resolving insides
1295
1296    * compress-write.c: Using different format statements for Univcoord_T
1297      variables
1298
1299    * cigar.c: Removed trimlength as a parameter to length_cigar_M.  Using new
1300      interface to Substring_compute_chrpos
1301
1302    * substring.c, substring.h: Removed plusp as a parameter to
1303      embellish_genomic and hardclip_high as a parameter to
1304      Substring_compute_chrpos
1305
1306    * reader.c, reader.h: Commented out unused procedures
1307
1308    * block.c, block.h: Removed indexdb_sizelimit as a parameter to
1309      Block_process_oligo_5 and Block_process_oligo_3
1310
1311    * atoiindex.c, cmetindex.c, snpindex.c: Using new interfaces to
1312      Indexdb_bitpack_counter and Localdb_new_genome
1313
1314    * localdb.c, localdb.h: Removed expand_offsets_p as a parameter to
1315      Localdb_new_genome
1316
1317    * indexdb-write.c, indexdb-write.h: Removed offsetsstrm and offsetspages as
1318      parameters to Indexdb_bitpack_counter and Indexdb_bitpack_counter_huge
1319
1320    * trunk, VERSION, src, Makefile.gsnaptoo.am, access.c, atoiindex.c,
1321      boyer-moore.h, cellpool.c, chrom.h, cigar.c, cmetindex.c,
1322      compress-write.h, concordance.c, concordance.h, diag.h, diagpool.c,
1323      distant-rna.c, distant-rna.h, extension-search.c, extension-search.h,
1324      filestring.c, genome128_hr.c, genome_sites.h, genomicpos.c, gmap.c,
1325      gmapindex.c, gregion.c, gsnap.c, hitlistpool.h, indel.c, indexdb.c,
1326      indexdb.h, intersect-large.h, intersect.c, intersect.h, intlist.c,
1327      intlist.h, intlistdef.h, intlistpool.c, intlistpool.h, junction.c,
1328      junction.h, kmer-search.c, kmer-search.h, ladder.c, ladder.h, list.h,
1329      listdef.h, listpool.c, listpool.h, localdb.c, localdb.h, localdbdef.h,
1330      matchpool.c, maxent_hr.h, mem.h, merge-diagonals-heap.h,
1331      merge-diagonals-simd-uint4.c, merge-diagonals-simd-uint4.h,
1332      merge-diagonals-simd-uint8.h, merge-uint4.c, method.c, oligoindex_hr.c,
1333      oligoindex_hr.h, outbuffer.c, outbuffer.h, output.c, output.h, pair.c,
1334      pair.h, pairpool.c, path-solve.c, path-solve.h, result.h, resulthr.h,
1335      samprint.c, samprint.h, segment-search.c, segment-search.h, shortread.c,
1336      shortread.h, splice.c, splice.h, splicestringpool.c, splicetrie.c,
1337      splicetrie_build.h, stage1hr.c, stage1hr.h, stage3.c, stage3.h,
1338      stage3hr.c, stage3hr.h, stage3hrdef.h, substring.c, substring.h,
1339      terminal.c, terminal.h, types.h, uint8listpool.c, uint8listpool.h,
1340      uint8table_rh.c, uint8table_rh.h, uintlist.c, uintlist.h, uintlistpool.c,
1341      uintlistpool.h, uinttable.c, uinttable_rh.c, uinttable_rh.h, uniqscan.c,
1342      univcoord.h, univdiag.c, univdiag.h, univdiagdef.h, univdiagpool.c,
1343      univdiagpool.h, univinterval.h: Merged revisions 216893 to 218146 from
1344      branches/2018-10-08-path-solve to improve path-solve procedure
1345
13462018-10-18  twu
1347
1348    * genome-write.c: Adding 2 words to the end of genomecomp, needed for
1349      accessing nextlow (ptr+4) in the fwd_partial and rev_partial procedures in
1350      oligoindex_hr.c
1351
13522018-10-10  twu
1353
1354    * genome.c, bitpack64-serial-write.c, bitpack64-write.c, genome-write.c,
1355      genome128.c, gmapindex.c, indexdb-write.c, indexdb.c, indexdb_hr.c,
1356      kmer-search.c, outbuffer.c, pair.c, parserange.c, stage3.c, stage3hr.c,
1357      substring.c: Replaced occurrences of 1U with 1
1358
1359    * sedgesort.c: Fixed bug in Sedgesort_uint8 where we assigned -1U instead of
1360      (UINT8) -1 as the sentinel value
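
      Illustration only: a minimal sketch of why the two sentinel
      expressions differ, assuming UINT8 is an 8-byte unsigned type and
      unsigned int is narrower than 8 bytes. The function names are
      hypothetical. The literal -1U has type unsigned int, so widening it
      leaves the upper bits zero, whereas casting -1 to the 8-byte type sets
      every bit.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t UINT8;   /* assumption: 8-byte unsigned integer */

/* -1U is evaluated as unsigned int first, then zero-extended to 8 bytes */
UINT8 sentinel_from_unsigned_int(void) {
  assert(sizeof(unsigned int) < sizeof(UINT8));   /* stated assumption */
  return -1U;
}

/* (UINT8) -1 converts -1 directly to the 8-byte type: all 64 bits set */
UINT8 sentinel_from_cast(void) {
  return (UINT8) -1;
}
```

      Under these assumptions the first value is smaller than the second, so
      it cannot serve as a maximum sentinel when sorting 8-byte values.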
1361
13622018-10-09  twu
1363
1364    * path-solve.c: Removing endpoints from the left and right to see if a
1365      continuing alignment works
1366
1367    * trunk, src, Makefile.gsnaptoo.am, concordance.c, concordance.h,
1368      distant-rna.c, distant-rna.h, extension-search.c, extension-search.h,
1369      gsnap.c, intlistpool.c, junction.c, junction.h, kmer-search.c,
1370      kmer-search.h, list.h, listpool.c, listpool.h, pair.c, pair.h,
1371      path-solve.c, path-solve.h, segment-search.c, segment-search.h,
1372      stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, terminal.c, terminal.h:
1373      Merged revision 216940 from branches/2018-10-10-reduce-list-push to add
1374      Listpool_T object for lists of substrings and junctions
1375
1376    * trunk, VERSION, src, Makefile.gsnaptoo.am, intersect-large.c,
1377      intersect-large.h, intersect.c, intersect.h, path-solve.c, record.h,
1378      segment-search.c, segment-search.h, stage1hr.c, stage1hr.h: Merged
1379      revisions 216922 through 216936 from branches/2018-10-09-merge-records to
1380      replace Merge_records procedures in segment search with Merge_diagonals
1381
1382    * localdb.c: Allocating extra space for array, needed for Sedgesort
1383
1384    * concordance.c: Removed unused code for filtering paired-end hits
1385
1386    * concordance.c: Restored computation of abort_pairing_p
1387
13882018-10-08  twu
1389
1390    * trunk, VERSION, src, Makefile.gsnaptoo.am, distant-rna.c, distant-rna.h,
1391      extension-search.c, filter-diagonals.c, filter-diagonals.h, gsnap.c,
1392      kmer-search.c, kmer-search.h, localdb.c, localdb.h,
1393      merge-diagonals-heap.c, merge-diagonals-heap.h,
1394      merge-diagonals-simd-uint4.c, merge-diagonals-simd-uint4.h,
1395      merge-diagonals-simd-uint8.c, merge-diagonals-simd-uint8.h,
1396      merge-records-heap.c, merge-records-heap.h, merge-records-simd.c,
1397      merge-records-simd.h, merge-uint4.c, path-solve.c, path-solve.h,
1398      segment-search.c, segment-search.h, splice.c, splice.h, stage1hr.c,
1399      stage1hr.h, stage3hr.c, stage3hr.h, substring.c, substring.h, terminal.c,
1400      terminal.h: Merged revisions 216889 to 216917 from
1401      branches/2018-10-07-filter-diagonals to introduce a filtering step before
1402      segment search, and to pre-allocate memory for Merge_records,
1403      Merge_diagonals, Splice_resolve, and Substring_new procedures
1404
1405    * src, kmer-search.c: Fixed double-assignment of variable
1406
1407    * method.c, method.h, segment-search.c, segment-search.h, stage1hr.c:
1408      Distinguishing between segment search for single-end reads and segment
1409      search for anchored paired-end reads
1410
14112018-10-07  twu
1412
1413    * trunk, VERSION, config.site.rescomp.prd, src, Makefile.gsnaptoo.am,
1414      concordance.c, concordance.h, extension-search.c, extension-search.h,
1415      gsnap.c, intersect-large.c, intersect-large.h, intersect.c, intersect.h,
1416      intlistpool.c, intlistpool.h, intpool.c, intpool.h, kmer-search.c,
1417      kmer-search.h, ladder.c, ladder.h, localdb.c, mem.h,
1418      merge-diagonals-heap.c, merge-diagonals-heap.h,
1419      merge-diagonals-simd-uint4.c, merge-diagonals-simd-uint4.h,
1420      merge-diagonals-simd-uint8.c, merge-diagonals-simd-uint8.h,
1421      merge-heap-diagonals.c, merge-heap-diagonals.h, merge-heap-records.c,
1422      merge-heap-records.h, merge-records-heap.c, merge-records-heap.h,
1423      merge-records-simd.c, merge-records-simd.h, merge-simd-diagonals.c,
1424      merge-simd-diagonals.h, merge-simd-records.c, merge-simd-records.h,
1425      merge-uint4.h, merge-uint8.c, merge-uint8.h, method.c, method.h,
1426      path-solve.c, path-solve.h, sedgesort.c, sedgesort.h, segment-search.c,
1427      segment-search.h, stage1hr.c, stage1hr.h, stage3hr.c: Merged revisions
1428      216741 to 216887 from branches/2018-10-01-gsnapl-speed to increase speed
1429      of GSNAP and GSNAPL, especially for paired-end reads
1430
1431    * index.html: Updated for current version
1432
14332018-10-03  twu
1434
1435    * concordance.c, concordance.h, stage1hr.c: Using new interfaces to
1436      Stage3pair_new and Concordance_pair_up procedures
1437
1438    * stage3hr.c, stage3hr.h: No longer filtering substrings based on
1439      endtrim_allowed on one side. Performing resolve_insides at end of
1440      Stage3pair_new
1441
1442    * segment-search.c: Fixed debugging statements
1443
1444    * path-solve.c: No longer calling a check on creation of a substring of
1445      the middle diagonal
1446
1447    * kmer-search.c: Hiding a debugging procedure
1448
1449    * acinclude.m4, simd-intrinsics.m4, configure.ac, genome128_consec.c,
1450      genome128_hr.c: Added compiler checks for the SIMD intrinsics
1451      _mm_extract_epi64 and _mm_popcnt_u64, and using them
1452
1453    * extension-search.c: Removed unused debugging procedure
1454
1455    * epu16-bitpack64-write.c: Modified comments
1456
1457    * bigendian.c, bigendian.h: Implemented FWRITE_USHORT and FWRITE_USHORTS for
1458      bigendian machines
1459
14602018-10-02  twu
1461
1462    * substring.c: Fixed uninitialized fields querystart_pretrim and
1463      queryend_pretrim in Substring_T object
1464
14652018-07-05  twu
1466
1467    * trunk, VERSION, config.site.rescomp.tst, index.html, src, path-solve.c:
1468      Changed check on sense_endpoints to antisense_endpoints before trying to
1469      remove an end segment
1470
14712018-06-29  twu
1472
1473    * types.h: Merged revisions 215752 to 215897 from
1474      branches/2018-06-15-path-solve-junctions to define Univcoordlist_pop
1475
1476    * stage3hrdef.h: Merged revisions 215752 to 215897 from
1477      branches/2018-06-15-path-solve-junctions to add nmatches_amb as a field
1478
1479    * stage3hr.c, stage3hr.h: Merged revisions 215752 to 215897 from
1480      branches/2018-06-15-path-solve-junctions to consider trim amount in the
1481      final comparison among alignments
1482
1483    * stage3.h: Merged revisions 215752 to 215897 from
1484      branches/2018-06-15-path-solve-junctions to take queryseq as an argument
1485      in merge procedures
1486
1487    * stage3.c: Merged revisions 215752 to 215897 from
1488      branches/2018-06-15-path-solve-junctions to copy all pairs when performing
1489      a merge, to peelback to indels on the medial side when taking the
1490      continuous solution, and to remove indels in insert_gapholders
1491
1492    * pair.c, pair.h: Merged revisions 215752 to 215897 from
1493      branches/2018-06-15-path-solve-junctions to implement Pair_split_circular
1494
1495    * junction.c, junction.h: Merged revisions 215752 to 215897 from
1496      branches/2018-06-15-path-solve-junctions to implement Junction_new_generic
1497
1498    * gmap.c: Merged revisions 215752 to 215897 from
1499      branches/2018-06-15-path-solve-junctions to split circular alignments to
1500      cross the origin
1501
1502    * chimera.c, chimera.h: Merged revisions 215752 to 215897 from
1503      branches/2018-06-15-path-solve-junctions to disallow chimeras to circular
1504      chromosomes
1505
1506    * path-solve.c: Merged revisions 215752 to 215897 from
1507      branches/2018-06-15-path-solve-junctions to create a new junction at the
1508      end, rather than use a precomputed junction
1509
1510    * path-solve.c: At ends, pushing NULL instead of the previously computed
1511      junction
1512
1513    * pair.c, pair.h: Implemented checking procedures
1514
15152018-06-15  twu
1516
1517    * stage3hr.c: Removed extraneous characters in debugging statement
1518
15192018-06-14  twu
1520
1521    * stage3.c: Fixed handling of end exons in end trimming procedures.  Using
1522      new interfaces to Pair_clip_bounded_list_5 and Pair_clip_bounded_list_3
1523
1524    * pair.c, pair.h: Implemented separate 5' and 3' versions of
1525      Pair_clip_bounded_list
1526
1527    * chimera.c: Fixed debugging statements to use new interface to
1528      Sequence_stdout
1529
15302018-05-30  twu
1531
1532    * pair.c: In converting pairarray to substrings, now resetting exon
1533      variables after an insertion or deletion
1534
1535    * iit-read.c: Added a warning in using IIT_read for a version 1 IIT
1536
1537    * iit_get.c, iit_store.c: Fixed some memory leaks
1538
1539    * getline.c, getline.h: Returning string_length with Getline_wlinefeed
1540
1541    * iit_store.c: Using Getline_wlinefeed instead of Getline_wlength
1542
1543    * stage1hr.c: Fixed uninitialized variable
1544
1545    * substring.c: Fixed assertion
1546
1547    * stage3hr.c: Checking if there is enough space at ends of the chromosome
1548      before resolving inner exons
1549
1550    * gmapindex.c: Assigning variable and not pointer when clearing empty space
1551      at end of line
1552
1553    * atoiindex.c, cmetindex.c, indexdb-write.c, snpindex.c: Calling
1554      Access_allocate_private properly for machines where mmap is not available
1555      or disabled
1556
1557    * gregion.c: Improved procedure for finding unique gregions by sorting by
1558      support instead of weight, by checking if query coordinates are
1559      consistent, and handling cases where endpoints are equal
1560
1561    * Makefile.gsnaptoo.am: Not making iit_pileup
1562
15632018-05-25  twu
1564
1565    * Makefile.gsnaptoo.am: Added getline.c and getline.h to library
1566
1567    * get-genome.c: Using variable line instead of Buffer
1568
1569    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, index.html, src,
1570      gmap.c, oligoindex_hr.c, stage2.c, stage3.c: Merged revisions 215481
1571      through 215483 from branches/2018-05-25-restore-compute-ends to restore
1572      GMAP behavior from 2018-03-20 and use Stage3middle_T,
1573      Stage3_compute_middle, and Stage3_compute_ends for better alignment at ends
1574
1575    * datadir.c: Fixed extraneous parenthesis
1576
1577    * datadir.c, Makefile.gsnaptoo.am, iit_get.c, iit_store.c: Using Getline
1578
1579    * gmap.c, chrsubset.c, iit-read-univ.c, iit-read.c, iit_store.c, samread.c,
1580      sequence.c, shortread.c, splicing-scan.c, stage2.c, stage3hr.c: Ensuring
1581      that calls to strncpy are followed by setting the end to be '\0'.  Using
1582      malloc instead of calloc in these situations
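
      Illustration only: a minimal sketch of the strncpy pattern described
      above, using a hypothetical helper name rather than code from the
      listed files. strncpy does not write a terminating '\0' when the
      source has n or more characters, so the terminator is set explicitly;
      since every byte of the buffer is then written, malloc suffices and
      the zero-fill from calloc is unnecessary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of at most the first
   n characters of src.  The caller frees the result. */
char *copy_prefix(const char *src, size_t n) {
  char *dest = malloc(n + 1);   /* n characters plus the terminator */

  assert(dest != NULL);
  strncpy(dest, src, n);        /* writes exactly n bytes, padding with '\0'
                                   only if src is shorter than n */
  dest[n] = '\0';               /* explicit terminator for the long case */
  return dest;
}
```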
1583
1584    * Makefile.gsnaptoo.am, datadir.c, get-genome.c, getline.c, getline.h,
1585      gmapindex.c: Using calls to Getline to prevent problems with buffer
1586      overflow
1587
1588    * substring.c: Fixed calculation of mandatory_trim_left and
1589      mandatory_trim_right for alias to be positive, rather than negative
1590
15912018-05-11  twu
1592
1593    * VERSION: Updated version number
1594
1595    * src: Made changes
1596
1597    * fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Added options to
1598      handle fastq files and to reverse complement all sequences
1599
1600    * stage1hr.c: No longer iterating through Segment_search
1601
1602    * stage1hr.c: Not calling Stage1_init or other Stage1 procedures when
1603      querylength < index1part
1604
1605    * substring.c: Not doing aliasing and unaliasing on alt substrings.
1606      Computing mandatory_trim_querystart and mandatory_trim_queryend instead of
1607      mandatory_trim_left and mandatory_trim_right
1608
1609    * trindex.c: Revised instructions
1610
1611    * iit-read-univ.c, iit_get.c, iit_pileup.c, iit_tally.c: Freeing div name
1612      allocated by new versions of Parserange routines
1613
1614    * genome128_consec.c, genome128_hr.c: Added header files needed for SSSE3
1615      computers
1616
16172018-04-30  twu
1618
1619    * shortread.c: Formatting change
1620
1621    * parserange.c: Allow colons in accession names
1622
1623    * gmapindex.c: Allow colons in accession.  Implementing revcomp by using '-'
1624      sign for contig length
1625
1626    * get-genome.c: Freeing chromosome string after call to Parserange
1627
16282018-04-21  twu
1629
1630    * stage1hr.c: Merged revision 214805 from branches/2018-04-21-fix-anchors to
1631      use Solve_segment_all instead of Concordance_filter_records for paired-end
1632      reads
1633
1634    * stage3hr.c, stage1hr.c: Added variable remap_transcriptome_p
1635
1636    * stage3.c: Checking for a NULL stage3 in Stage3_split
1637
1638    * intersect.c: Fixed comments
1639
1640    * stage1.c: Using Oligospace_T to cast 0
1641
1642    * kmer-search.c: Fixed issues with transcript coordinates
1643
16442018-04-20  twu
1645
1646    * gmap.c, gsnap.c, inbuffer.c, inbuffer.h, sequence.c, sequence.h,
1647      shortread.c, shortread.h: Adding a command-line option --read-files-command
1648
1649    * extension-search.c, segment-search.c, stage1hr.c: Using (Oligospace_T) as
1650      a cast for 0
1651
1652    * Makefile.gsnaptoo.am, fopen.c, fopen.h: Added a command-line option
1653      --read-files-command
1654
1655    * gsnap.c: Changed flag from --use-transcriptome to --use-transcriptome-only
1656
16572018-03-25  twu
1658
1659    * samprint.c: Using new interface to Pair_print_sam
1660
1661    * trunk, config.site.rescomp.prd, index.html, src, block.c, gmap.c,
1662      indexdb.c, oligoindex_hr.c, pair.c, stage1.c, stage2.c, stage3.c,
1663      translation.c: Merged revisions 214439 through 214446 from
1664      branches/2018-03-24-restore-gmap to restore speed and sensitivity of GMAP
1665      by eliminating use of Stage3_compute_ends, restoring use of multiple
1666      oligoindices, and fixing CDS phases for GFF3 output
1667
16682018-03-24  twu
1669
1670    * indexdb.c: Removed option for expanding offsets
1671
1672    * gmap.c: Fixed memory issues with new calls to Stage3_merge_local and
1673      Stage3_merge_chimera, and appending middlepieces to stage3list. Changed
1674      default gff3_cds to be genomic.  Ignoring option for --expand-offsets
1675
1676    * stage3.c, stage3.h: Removed cigar_tokens and intronp as fields for
1677      Stage3_T objects. Implemented Stage3_copy and using it in
1678      Stage3_merge_local and Stage3_merge_chimera
1679
1680    * pair.c, pair.h: Removed cigar_tokens and intronp as parameters to
1681      Pair_print_sam
1682
16832018-03-23  twu
1684
1685    * substring.c: Added assertions about alts, to make sure we don't use
1686      alignstart_trim or alignend_trim fields in those cases
1687
1688    * stage3hr.c: Checking for alts when getting chrpos_low or chrpos_high
1689
1690    * samprint.c: Formatting changes
1691
1692    * dynprog.h: Setting AMBIGUOUS score to be 3, to make cmet and atoi
1693      alignments equivalent to standard
1694
1695    * cigar.c: For sam_hardclip_use_S option, returning hardclips as 0 so
1696      they don't affect the query sequence in SAM output
1697
1698    * stage3.c: Restored force of single gaps, to avoid problems with
1699      add_dual_break later
1700
17012018-03-21  twu
1702
1703    * iit_tally.c: Initial import
1704
17052018-03-20  twu
1706
1707    * VERSION: Updated version number
1708
1709    * trunk, src, dynprog.c, dynprog.h, dynprog_cdna.c, dynprog_cdna.h,
1710      dynprog_end.c, dynprog_end.h, dynprog_genome.c, dynprog_genome.h,
1711      dynprog_simd.c, dynprog_simd.h, dynprog_single.c, dynprog_single.h,
1712      gmap.c, gsnap.c, pair.c, path-solve.c, splicetrie.c, splicetrie.h,
1713      stage3.c, stage3.h: Merged revisions 214343 through 214360 from
1714      branches/2018-03-20-cmet-gmap to make non-standard modes work in GMAP
1715      alignments
1716
1717    * sam-exons.pl.in: Initial import
1718
1719    * indexdb.c: Fixed preprocessor macro
1720
17212018-03-19  twu
1722
1723    * VERSION: Updated version number
1724
1725    * trunk, VERSION, config.site.rescomp.tst, configure.ac, index.html, src,
1726      Makefile.gsnaptoo.am, atoi.c, atoi.h, atoiindex.c, cmet.c, cmet.h,
1727      cmetindex.c, epu16-bitpack64-access.c, epu16-bitpack64-access.h,
1728      epu16-bitpack64-incr.c, epu16-bitpack64-incr.h, epu16-bitpack64-read.c,
1729      epu16-bitpack64-read.h, epu16-bitpack64-readtwo.c,
1730      epu16-bitpack64-readtwo.h, epu16-bitpack64-write.c,
1731      epu16-bitpack64-write.h, extension-search.c, extension-search.h,
1732      genomicpos.c, genomicpos.h, gsnap.c, indexdb.h, indexdbdef.h,
1733      kmer-search.c, kmer-search.h, localdb-write.c, localdb.c, localdb.h,
1734      localdbdef.h, oligo.c, path-solve.c, path-solve.h, reader.c, stage1hr.c,
1735      types.h: Merged revisions 214117 through 214304 from
1736      branches/2018-03-11-cmet-localdb to implement non-standard modes with
1737      support for localdb
1738
1739    * substring.c: Removed an assertion that is not always valid
1740
1741    * stage3hr.c, stage3hr.h: Commented out code for Stage3end_substring_high,
1742      which is not used anymore
1743
1744    * samprint.c: Fixed typo in using querylength5 instead of querylength3 for
1745      mate
1746
1747    * samflags.h: Incremented values for number of filestream outputs, to handle
1748      XS output files properly
1749
1750    * outbuffer.c: Always skipping output for OUTPUT_NONE filestream
1751
1752    * Makefile.gsnaptoo.am: Hiding program iit_pileup
1753
1754    * oligoindex_hr.c, localdb.c: Allowing debugging statements to work
1755
1756    * kmer-search.c: Added debugging statements
1757
1758    * indexdb.c: Distinguishing between GSNAP pointer procedures, which need to
1759      handle negative diagterms, and GMAP read procedures, where diagterms are
1760      always non-negative
1761
1762    * gsnap.c: Added option for --sam-hardclip-use-S
1763
1764    * cigar.c, cigar.h: Calling low substring for all circular chromosome hits
1765      to get correct chrpos in SAM output.  Added provisions for
1766      SAM_hardclip_use_S_p
1767
17682018-03-10  twu
1769
1770    * substring.c: Changed assertion to allow for '*'
1771
1772    * stage3.c: Added a clip of end5 against the chromosomal bound for
1773      alignments on the minus strand
1774
1775    * intersect.c: Commenting out advance of positions past -diagterm
1776
1777    * indexdb.c: No longer checking for diagterm < 0, since diagterm <= 0 should
1778      be true
1779
1780    * extension-search.c: Reverted advance of positions past -diagterm and added
1781      assertions instead.  Changed algorithm to not add an elt when the
1782      positions are invalid.
1783
1784    * shortread.c: Preventing invalid read when accession contains only one or
1785      two characters
1786
1787    * kmer-search.c, merge-heap-diagonals.c, merge-heap-records.c,
1788      merge-simd-diagonals.c, merge-simd-records.c: Added assertions to check
1789      that positions >= -diagterm
1790
1791    * extension-search.c, intersect.c: Skipping or advancing positions so they
1792      exceed -diagterm
1793
1794    * oligo.c: Fixed initialization code so that Clang compiler will accept it
1795
17962018-03-09  twu
1797
1798    * localdb-write.c: Added a message to indicate when writing is done
1799
1800    * Makefile.gsnaptoo.am, iit_pileup.c: Added ability for the user to specify
1801      a genomic range
1802
1803    * stage3hr.c, stage3hr.h: Fixed bug with circular read not being unaliased
1804      because of a soft clip.  Removed macro for soft clips avoiding
1805      circularization.  Removed circularalias values of +2 and -2, since
1806      alignments now should stay within chromosomal bounds
1807
1808    * iit_pileup.c: Added code for using filestrings
1809
1810    * Makefile.gsnaptoo.am, iit-read.c, iit-read.h, iit_pileup.c: Restored
1811      iit_pileup program.  Rewrote to take a FASTA file as input
1812
18132018-03-05  twu
1814
1815    * stage3.c: Added back declaration of variables
1816
1817    * stage3.c: Removed unused variables and parameters
1818
1819    * samprint.c: Removed unused static variable
1820
1821    * path-solve.c: Using new interface to Stage3end_new_gmap
1822
1823    * stage3hr.c, stage3hr.h: Removed unused parameters to Stage3end_new_gmap.
1824      Removed unused code
1825
1826    * gsnap.c: Removed unused parameter to process_request
1827
1828    * extension-search.c: Removed unused code
1829
1830    * stage1hr.c: Fixed memory error when abort_pairing_p is true
1831
18322018-03-04  twu
1833
1834    * localdb.c: Defined variables needed for large genomes
1835
1836    * trunk, VERSION, src, block.c, block.h, distant-rna.c, distant-rna.h,
1837      extension-search.c, gmap.c, gsnap.c, kmer-search.c, kmer-search.h,
1838      ladder.c, ladder.h, localdb.c, localdb.h, oligo.c, oligo.h, pair.c,
1839      pair.h, path-solve.c, path-solve.h, segment-search.c, segment-search.h,
1840      splice.c, splice.h, stage1.c, stage1.h, stage1hr.c, stage1hr.h, stage2.c,
1841      stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h, terminal.c,
1842      terminal.h: Merged revisions 213978 through 214024 from
1843      branches/2018-03-01-ptr-and-diagterm to add potential support for cmet,
1844      atoi, and ttoc modes
1845
18462018-03-03  twu
1847
1848    * cigar.c, cigar.h, gsnap.c: Obey behavior of --sam-use-0M flag
1849
18502018-03-02  twu
1851
1852    * trunk, config.site.rescomp.prd, src, Makefile.gsnaptoo.am, access.c,
1853      atoiindex.c, bitpack64-readtwo.c, cmetindex.c, epu16-bitpack64-read.c,
1854      epu16-bitpack64-readtwo.c, epu16-bitpack64-write.c, extension-search.c,
1855      filesuffix.h, genome-write.c, genome128_hr.c, gmapindex.c, gsnap.c,
1856      iit-write.c, iit_store.c, indel.c, indexdb-write.c, indexdb.c, indexdb.h,
1857      indexdbdef.h, intersect.c, intersect.h, kmer-search.c, kmer-search.h,
1858      localdb.c, merge-heap-diagonals.c, merge-heap-diagonals.h,
1859      merge-heap-records.c, merge-heap-records.h, merge-simd-diagonals.c,
1860      merge-simd-diagonals.h, oligoindex_hr.c, pair.c, path-solve.c,
1861      segment-search.c, segment-search.h, smooth.c, snpindex.c, stage1hr.c,
1862      stage1hr.h, stage2.c, stage3.c, stage3hr.c, substring.c, terminal.c,
1863      types.h: Merged revisions 213921 to 213977 from
1864      branches/2018-03-01-ptr-and-diagterm to add back support for large genomes
1865
1866    * stage3hr.c: Improved debugging statements
1867
1868    * kmer-search.c: Removed extraneous calls to free lists
1869
1870    * ladder.c, ladder.h: Implemented Ladder_cutoff to allow for ladders, but
1871      restrict computation to MAX_HITS
1872
1873    * concordance.c: Calling Ladder_cutoff instead of Ladder_maxscore
1874
1875    * cigar.c: Fixed computation of SAM chrpos.  Always calling
1876      Stage3end_substring_low for a single SAM line
1877
1878    * concordance.c: Applying MAX_HITS to Concordance_pair_up_distant
1879
18802018-03-01  twu
1881
1882    * stage1hr.c: Fixed a typo: querypos_rc instead of querypos for minus
1883      positions
1884
1885    * indexdb.c: Hiding Indexdb_ptr_with_diagterm from utility programs
1886
1887    * pairpool.c, pairpool.h, stage3.c: Using a better method for handling
1888      chromosomal bounds in GMAP, by trimming pairs, rather than the pairarray
1889
1890    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
1891      Makefile.gsnaptoo.am, extension-search.c, extension-search.h, indexdb.c,
1892      indexdb.h, intersect.c, intersect.h, kmer-search.c, kmer-search.h,
1893      localdb.c, merge-heap-diagonals.c, merge-heap-diagonals.h,
1894      merge-heap-records.c, merge-heap-records.h, merge-simd-diagonals.c,
1895      merge-simd-diagonals.h, merge-simd-records.c, merge-simd-records.h,
1896      segment-search.c, stage1hr.c, stage1hr.h: Merged revisions 213873 through
1897      213921 from branches/2018-03-01-ptr-and-diagterm to use pointers to
1898      positions and diagterms, rather than copying positions
1899
1900    * substring.c: In Substring_set_alt, handling amb_splice_pos properly as
1901      being defined from left, rather than querystart
1902
1903    * Makefile.util.am: Added instructions for bitpack64-test
1904
1905    * segment-search.c: Freeing lists provided to Merge_records
1906
1907    * path-solve.c: Using FREE_ALIGN for freeing results of merge procedures
1908
1909    * localdb.c: Localdb_read_with_bounds now always returns aligned memory,
1910      not only when merging is performed.  Using new interface to merge
1911      procedures
1912
1913    * kmer-search.c: Fixed a memory leak in exact procedure when the ends have
1914      invalid oligos
1915
1916    * merge-heap-diagonals.c, merge-heap-records.c, merge-simd-diagonals.c,
1917      merge-simd-records.c: Merge procedures no longer free their input lists or
1918      streams.  Streams can be aligned or not, so the caller needs to free them
1919
1920    * pair.c, pairpool.c, pairpool.h, stage3.c: Prevent GMAP results from going
1921      beyond chromosomal bounds, when making the final pairarray
1922
19232018-02-28  twu
1924
1925    * stage1hr.c: Using new interfaces to exact and approx algorithms
1926
1927    * stage3hr.c, stage3hr.h: Renamed Stage3end_list_free to Stage3end_gc
1928
1929    * segment-search.c: Fixing memory leak
1930
1931    * kmer-search.c, kmer-search.h: Limiting results for exact algorithm with
1932      max_hits, and limiting approx algorithm with both max_hits and sizelimit
1933
1934    * concordance.c: Limiting results for newladder with MAX_HITS
1935
1936    * uint8table.c, uint8table.h: Initial implementation
1937
1938    * Makefile.gsnaptoo.am, stage1hr.c: Using a Uint8table_T object to mark
1939      repeated occurrences of an oligo as invalid
1940
1941    * substring.c: Allocating querylength+1 for mismatch_positions
1942
1943    * path-solve.c: Making a better call to Substring_new for the middle
1944      diagonal, and handling the case where the result is NULL
1945
1946    * gsnap.c: Including header for concordance.h
1947
1948    * terminal.c, stage3hr.c: Using new interface to Substring_new
1949
1950    * path-solve.c: Removed all instances of middle_path
1951
1952    * extension-search.c, kmer-search.c: Using new interfaces to
1953      Path_solve_from_diagonals and Substring_new
1954
1955    * segment-search.c: Using new interface to Path_solve_from_diagonals.
1956      Letting that procedure determine which left and right diagonals are in the
1957      correct chrnum
1958
1959    * path-solve.c, path-solve.h: Path_solve_from_diagonals now takes a middle
1960      diagonal, rather than a middle path.  Sets chrnum based on middle
1961      diagonal, as determined by Substring_new.  Also considers only left and
1962      right diagonals in that chrnum.
1963
1964    * substring.c, substring.h: Substring_new now takes chrnum_fixed_p as a
1965      parameter.  If true, then the procedure does not recompute chromosomal
1966      bounds
1967
19682018-02-27  twu
1969
1970    * segment-search.c: Using left instead of lowpos to determine chrnum, to be
1971      safe
1972
1973    * substring.c: Checking for substring being below given chroffset, and
1974      recomputing chromosomal bounds
1975
1976    * segment-search.c: When adding segments to left and right, checking to make
1977      sure they fall at least partially into the same chromosome as the anchor
1978      segment
1979
1980    * pair.c: Fixed argument for setting querystart_pretrim and queryend_pretrim
1981
1982    * substring.c: Fixed typo
1983
1984    * pair.c, substring.c, substring.h: Computing more accurate values for
1985      querystart_pretrim and queryend_pretrim.  Removed assertion about
1986      left_bound and right_bound in Substring_count_mismatches_region
1987
1988    * concordance.c, stage1hr.c, stage3hr.c, stage3hr.h, stage3hrdef.h: Removed
1989      private5p and private3p fields from Stage3pair_T object. Always copying
1990      hits when making a pair, needed because the concordance procedure can now
1991      delete hits
1992
1993    * ladder.c: Assigning a value to nhits in all cases from
1994      Ladder_hits_for_score
1995
1996    * concordance.c, stage1hr.c: Restored assignment of abort_pairing_p
1997
1998    * ladder.c, ladder.h, stage3hr.c, stage3hr.h: Removing duplicates before
1999      returning hits at a given score.  Added a duplicates field to hold
2000      duplicate hits, and a procedure for deleting them
2001
20022018-02-26  twu
2003
2004    * filestring.c: Now handling %p in format statement
2005
2006    * stage3hr.c: Changed one occurrence of Substring_ambiguous_p to
2007      Substring_has_alts_p
2008
2009    * trunk, src, Makefile.gsnaptoo.am, cigar.c, concordance.c, concordance.h,
2010      distant-rna.c, distant-rna.h, gsnap.c, kmer-search.c, kmer-search.h,
2011      ladder.c, ladder.h, merge-heap-records.c, merge-heap-records.h,
2012      merge-simd-records.c, merge-simd-records.h, merge-uint4.c, method.c,
2013      method.h, pair.c, pair.h, path-solve.c, record.h, samprint.c,
2014      segment-search.c, segment-search.h, splice.h, stage1hr.c, stage1hr.h,
2015      stage3hr.c, stage3hr.h, stage3hrdef.h, substring.c, substring.h,
2016      terminal.c, terminal.h, types.h, univdiag.c: Merged revisions 213654 to
2017      213758 from branches/2018-02-22-limit-segment-search to add concordance
2018      procedures, ladders, and filters for segment search, and to revise the
2019      single-end and paired-end procedures
2020
20212018-02-22  twu
2022
2023    * trunk, VERSION, config.site.rescomp.tst, src, Makefile.gsnaptoo.am,
2024      block.c, extension-search.c, genomicpos.h, gmap.c, gsnap.c, indexdb.c,
2025      indexdb.h, intersect.c, intersect.h, intlist.c, intlist.h, junction.c,
2026      kmer-search.c, localdb.c, localdb.h, merge-heap-diagonals.c,
2027      merge-heap-diagonals.h, merge-heap-records.c, merge-heap-records.h,
2028      merge-heap.c, merge-heap.h, merge-simd-diagonals.c,
2029      merge-simd-diagonals.h, merge-simd-records.c, merge-simd-records.h,
2030      merge-uint4.c, merge-uint4.h, merge.c, merge.h, oligo.c, pair.c, pair.h,
2031      path-solve.c, segment-search.c, segment-search.h, splice.c, splicetrie.c,
2032      splicetrie.h, stage1.c, stage1hr.c, stage1hr.h, stage2.c, stage2.h,
2033      stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h, types.h,
2034      uint8list.c, uint8list.h, uintlist.h, uniqscan.c, util, gmap_build.pl.in:
2035      Merged revisions 213593 through 213652 from
2036      branches/2018-02-21-large-genomes to add support for large genomes
2037
20382018-02-21  twu
2039
2040    * stage1hr.c: Checking for a set with zero elements before calling
2041      Orderstat_int_pct
2042
2043    * segment-search.c: Excluding invalid oligos
2044
2045    * path-solve.c: When novelsplicing is false, creating just one hit
2046
2047    * oligo.c: Commented out unused procedures
2048
2049    * kmer-search.c: Handling invalid oligos correctly
2050
2051    * stage1hr.c: Making poly_A and poly_T oligos not valid.  Added a min
2052      sizelimit
2053
2054    * stage1hr.c: Added comment
2055
2056    * cigar.c, samprint.c, stage3hr.c, stage3hr.h: Removed code designed for old
2057      meaning of Substring_ambiguous_p, now distinct from Substring_has_alts_p
2058
2059    * indexdb.c: Commented out SIMD procedures for utility programs.  Added
2060      support for AVX2
2061
2062    * trunk, src, extension-search.c, extension-search.h, indexdb.c, indexdb.h,
2063      kmer-search.c, kmer-search.h, path-solve.c, path-solve.h,
2064      segment-search.c, segment-search.h, splice.c, stage1hr.c, stage1hr.h,
2065      stage3hr.c, stage3hr.h, substring.c, substring.h, types.h: Merged
2066      revisions 213472 through 213572 from branches/2018-02-16-faster-one-kmer
2067      to increase speed and accuracy
2068
2069    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
2070      Updated for latest version
2071
2072    * stage1hr.c: Added debugging statements
2073
20742018-02-16  twu
2075
2076    * stage3.c: Replaced check with an assertion
2077
2078    * gmap.c: Filtering out Stage3_T objects with zero npairs
2079
2080    * junction.c, junction.h: Implemented Junction_typestring
2081
2082    * stage3.c: Making explicit check for npairs being 0 in Stage3_new
2083
2084    * pair.c: Handling extra exons with a specific type and transition rules
2085
2086    * gmap.c: Checking for Stage3_T object having zero pairs before pushing onto
2087      list
2088
2089    * cigar.c: Checking hardclip when printing insertion token.  Calling either
2090      Stage3end_substring_low or Stage3end_substring_high
2091
2092    * stage3hr.c: Added provisions for junctions_HtoL
2093
20942018-02-13  twu
2095
2096    * localdb.c: Performing a check for possible negative coordinates
2097
2098    * cigar.c, cigar.h: Storing value of extended_cigar_p during setup
2099
2100    * gsnap.c: Added flags for --endtrim-length and --sam-extended-cigar
2101
2102    * stage3.c: In Stage3_recompute_coverage, added a check for npairs being 0
2103
2104    * shortread.c, shortread.h: Added support for end (3') trimming of each read
2105
2106    * kmer-search.c: Improved debugging statements
2107
2108    * gmap.c: Added checks for breakpoint being invalid, before running
2109      Stage3_mergeable
2110
2111    * chimera.c: Added checks for npairs being zero
2112
2113    * gsnap.c: Removed the --expand-offsets flag
2114
2115    * access.c: Put a limit on the maximum number of attempts to kill an
2116      unattached shared memory segment
2117
2118    * splice.c, substring.c: Checking for cases where the splice search
2119      boundaries yield a negative coordinate.  Redefined middle coordinate to be
2120      alignstart and alignend, and not the positions 1 bp distal to these
2121      coordinates.
2122
2123    * stage1hr.c: Increased all sizelimits for diagonals from 100 to 3000
2124
2125    * stage3.c: Defined all parameters for allowed iterations to be 1
2126
2127    * stage2.c: Added a step to filter stage2 middles, starts, and ends for
2128      uniqueness
2129
21302018-02-11  twu
2131
2132    * substring.c: Improved debugging statement
2133
2134    * stage3hr.c: Added a comment
2135
2136    * splice.c: Improved debugging statements
2137
2138    * stage3hr.c: Now allowing Stage3pair_resolve_insides to resolve both ends
2139      at the same time
2140
2141    * VERSION: Updated version number
2142
2143    * stage3hr.c: Added debugging statements
2144
21452018-02-10  twu
2146
2147    * pair.c: Fixed counting routines to skip over bad pairs correctly
2148
2149    * substring.c: Removed unused code
2150
2151    * stage3hr.c: In Stage3pair_resolve_insides and resolve_inside_general
2152      procedures, when hits are changed for the hitpair, revising the values of
2153      alts_resolve_5 and alts_resolve_3.  Fixes a seg fault when
2154      Stage3pair_eval_and_sort tries to resolve the hits a second time
2155
2156    * path-solve.c, samprint.c, stage3hr.c, stage3hr.h, substring.c,
2157      substring.h: Changed variable names to distinguish between an ambiguous
2158      splice length and alternative splice coords
2159
2160    * trunk, config.site.rescomp.tst, src, pair.c, pair.h, segment-search.c,
2161      segment-search.h, stage1hr.c, stage3.c, stage3hr.c: Merged revisions
2162      213287 through 213291 from branches/2018-02-10-fix-bugs to fix bugs
2163      related to pairarrays that crossed chromosomal bounds, segments subsuming
2164      others, and handling of dual breaks
2165
2166    * pair.c: Hiding function Pairarray_convert_to_substrings from GMAP
2167
2168    * trunk, index.html, src, cigar.c, cigar.h, distant-rna.c,
2169      extension-search.c, filestring.c, filestring.h, gsnap.c, kmer-search.c,
2170      output.c, pair.c, pair.h, path-solve.c, samprint.c, samprint.h,
2171      segment-search.c, splice.c, stage1hr.c, stage3hr.c, stage3hr.h,
2172      substring.c, substring.h: Merged revisions 213162 through 213277 from
2173      branches/2018-02-07-improve-circular to standardize printing of circular
2174      and translocation alignments
2175
2176    * stage3hr.c: Fixed printing of method label
2177
21782018-02-08  twu
2179
2180    * extension-search.c: Making sure that queryoffset does not go outside
2181      bounds of 0 to query_lastpos
2182
21832018-02-07  twu
2184
2185    * substring.c: In Substring_new, giving initial values to trim_left and
2186      trim_right
2187
2188    * substring.c: In Substring_new, calling trim_left_end and trim_right_end
2189      from 0 and querylength, but trim_novel_spliceends from given querystart
2190      and queryend.  If novel spliceends yields a short exon, then using the
2191      non-spliced trimming results.
2192
2193    * stage3hr.c: Calling compute_circularpos on all Substring_T objects
2194
2195    * samprint.c: Added debugging statements
2196
2197    * substring.c: In Substring_new, initializing necessary values of
2198      Substring_T object earlier than first possible abort and call to
2199      Substring_free
2200
22012018-02-06  twu
2202
2203    * trunk, config.site.rescomp.tst, index.html, src, extension-search.c,
2204      gmap.c, oligo.c, oligo.h, reader.c, reader.h, stage1hr.c, svncl.pl: Merged
2205      revisions 213097 through 213130 from branches/2018-02-05-distant-dna to
2206      allow for updating of querypos around N's and to fix a fatal bug in GMAP
2207      in merge_left_and_right_readthrough
2208
22092018-02-05  twu
2210
2211    * substring.c: In trim_left_end and trim_right_end, computing trims relative
2212      to querystart and queryend, rather than 0 and querylength.  In
2213      embellish_genomic_sam, filling in beginning and end with dashes and stars
2214      to avoid genomic_refdiff having a different length than querylength
2215
2216    * Makefile.gsnaptoo.am: When MAKE_LIB is not defined, not copying headers
2217      either
2218
2219    * VERSION, config.site.rescomp.prd: Updated version number
2220
2221    * get-genome.c, parserange.c, parserange.h: Allowing Parserange_universal to
2222      return a value for whole_chromosome_p
2223
2224    * substring.c: In Substring_new, always making querystart and queryend
2225      correspond to trim_left and trim_right, to fix errors in CIGAR strings
2226
2227    * access.c: Using a loop to create (and possibly deallocate) shared memory
2228
2229    * access.c: Upon startup, if a shared memory segment exists with no other
2230      attached processes (possibly corrupted), then deallocating it and creating
2231      a new one
2232
2233    * kmer-search.c: Simplified code in making sure that both ends are in the
2234      same chromosome
2235
2236    * samprint.c: In SAM_compute_chrpos, if hardclip_low yields a NULL
2237      low_substring, then trying hardclip_high
2238
2239    * substring.c: Handling the case where both cdna and genome have 'N'
2240
2241    * stage3hr.c, stage3hr.h: Reverted revision 213080, which was giving CIGAR
2242      errors.  Restored Stage3end_substring_high
2243
2244    * substring.c: In Substring_new, added a minimum of 8 bp in general test 2
2245
2246    * stage3hr.c: For Stage3end_substring_low and Stage3end_substring_high, if
2247      the hardclip passes all substrings, returning the last one processed
2248
2249    * stage3hr.c: In Stage3end_new_substrings, handling the case where
2250      substring1 and substringN have different chrnums assigned by Substring_new
2251
2252    * stage3hr.c: In Stage3end_new_substitution, handling the case where
2253      Substring_new has assigned a different chrnum than the one given
2254
2255    * substring.c: Made trim incremental when performed as a preliminary step
2256      before finding novel splice ends
2257
2258    * kmer-search.c: Made a more rigorous check that both ends are on the same
2259      chromosome
2260
2261    * kmer-search.c: Making sure the two ends are on the same chromosome
2262
2263    * substring.c: Handling the case where the alignment goes over the upper
2264      bound of the next higher chromosome
2265
2266    * substring.c: Removed residue from an SVN merge conflict
2267
2268    * trunk, src, Makefile.gsnaptoo.am, distant-rna.c, gsnap.c, path-solve.c,
2269      stage1hr.c, stage3hr.c, stage3hr.h, terminal.c, terminal.h, types.h:
2270      Merged revisions 213033 through 213070 from
2271      branches/2018-02-05-add-terminals to add terminal alignments
2272
22732018-02-04  twu
2274
2275    * substring.c: Not applying general test 1 if orig_nmismatches < 0
2276
2277    * substring.c: Making outofbounds adjustments to be increments to the
2278      existing trim_left and trim_right
2279
2280    * trunk, src, cigar.c, distant-rna.c, gsnap.c, kmer-search.c, samprint.c,
2281      splice.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h:
2282      Merged revisions 213027 through 213053 from
2283      branches/2018-02-04-simplify-substring-new to simplify Substring_new
2284      procedure
2285
2286    * config.site.rescomp.prd, config.site.rescomp.tst, index.html, svncl.pl:
2287      Updated version number
2288
2289    * stage3.c: Added a List_reverse command
2290
2291    * types.h: Added comments
2292
2293    * stage1hr.c: Fixed wrong assignment of first_read_p
2294
2295    * stage3hr.c: Checking for the eventrim region having bounds that don't make
2296      sense
2297
2298    * substring.c: Added an assertion
2299
2300    * stage3.c: Restored pass 7 to remove dual breaks at the ends
2301
2302    * stage1hr.c: Fixed a memory leak for queryrc
2303
2304    * filestring.c: Added a check for fp being NULL, which can occur with the
2305      flags --omit-concordant-uniq or --omit-concordant-mult
2306
23072018-02-02  twu
2308
2309    * VERSION: Updated version number
2310
2311    * configure.ac: Fixed command for --enable-lib
2312
23132018-02-01  twu
2314
2315    * trunk, index.html, src, block.c, block.h, extension-search.c,
2316      extension-search.h, filestring.c, gmap.c, gsnap.c, indexdb.c, indexdb.h,
2317      intersect.c, kmer-search.c, kmer-search.h, oligoindex_hr.c, pair.c,
2318      pair.h, path-solve.c, segment-search.c, stage1.c, stage1.h, stage1hr.c,
2319      stage1hr.h, stage2.c, stage3.c, stage3.h, stage3hr.c, substring.c,
2320      transcriptome.c: Merged revisions 212952 through 212996 from
2321      branches/2018-02-01-reverse-gmap-slowdown
2322
2323    * gmap.c: Fixed reference to array0 that should have been array1
2324
2325    * configure.ac: Disabled alloca by default
2326
2327    * stage3.c: Restored computation of Stage2_compute_starts and
2328      Stage2_compute_ends
2329
2330    * pair.c: Handling GFF3 output when chrstring or accession is NULL
2331
2332    * oligoindex_localdb.c, oligoindex_localdb.h: Not using
2333      Oligoindex_localdb_tally because of speed
2334
2335    * inbuffer.c: Added debugging statements
2336
2337    * gmap.c: Restored function of --pairalign option
2338
23392018-01-30  twu
2340
2341    * stage3hr.c: Made changes to debugging statements
2342
2343    * stage3.h: Updated interface for Stage3_setup
2344
2345    * stage3.c: Distinguishing between overall_end_distance_linear and
2346      overall_end_distance_circular when making call to Stage2_compute_starts
2347      and Stage2_compute_ends
2348
2349    * stage1hr.c: Making calls to remove_circular_alias and remove_duplicates
2350      for single-end reads
2351
2352    * samprint.c: Fixed bug where alignment results for circular chromosomes
2353      were not being printed
2354
2355    * path-solve.c, kmer-search.c: Not allowing splice junctions for circular
2356      chromosomes
2357
2358    * gsnap.c: Using new interface to Stage3_setup
2359
2360    * gmap.c: Using genomelength instead of genome_totallength.  Initializing
2361      circularp for usersegment
2362
23632018-01-29  twu
2364
2365    * gsnap.c, kmer-search.c, kmer-search.h, path-solve.c, path-solve.h: Not
2366      allowing Junction_new_splice to be called when splicing is turned off
2367
2368    * trindex.c: Added some headers for open() procedure
2369
2370    * gmap.c, gsnap.c: Using interfaces to new setup procedures
2371
2372    * stage3.c, stage3.h: Limiting chrend for Stage2_compute_starts and
2373      Stage2_compute_ends based on genome total length
2374
2375    * stage1hr.c, stage1hr.h: Added hook for distant DNA alignments
2376
2377    * samprint.c, samprint.h: Not printing XS field in SAM output when splicing
2378      is turned off.
2379
2380    * pair.c, pair.h: Printing . in features field of GFF3 output when sense is
2381      unknown. Not printing XS field in SAM output when splicing is turned off.
2382
2383    * stage3.c: Before running Stage2_compute_starts and Stage2_compute_ends,
2384      removed the check on chrend going past chrhigh.  The truncated coordinates
2385      can cause chrend to be less than chrstart if the alignment straddles a
2386      chromosomal bound
2387
2388    * stage3hr.c: Fixed a memory free error.  In Stage3end_new_substrings, when
2389      Stage3end_T object fails due to circular alias, letting Stage3end_free and
2390      not the caller free the junctions
2391
2392    * stage3.c: Accepting a single alignment regardless of final score, if the
2393      original queryjump or genomejump is negative
2394
23952018-01-27  twu
2396
2397    * substring.c: In Substring_new, computing trims first, then adjusting trims
2398      for out-of-bound lengths, then adjusting query and genomic bounds
2399
24002018-01-26  twu
2401
2402    * VERSION: Updated version number
2403
2404    * intlist.c, intlist.h, list.c, list.h, uintlist.c, uintlist.h: Added code
2405      for non-inlined functions
2406
2407    * transcript.c, transcript.h: Made Transcript_num non-inline
2408
2409    * configure.ac: Added macro AC_C_INLINE
2410
2411    * get-genome.c: Added include for intlist.h
2412
24132018-01-25  twu
2414
2415    * archive.html, index.html: Updated for latest version
2416
2417    * path-solve.c: Added a type conversion
2418
2419    * Makefile.gsnaptoo.am: Fixed typos in file names
2420
2421    * VERSION: Updated version number
2422
2423    * gsnap.c: Change in description for --action-if-cigar-error
2424
2425    * gmap.c: Implemented --action-if-cigar-error flag
2426
2427    * stage3.c: Putting an upper bound on the number of Boyer-Moore searches for
2428      microexons, based on the number of 5' and 3' splice positions
2429
2430    * translation.c: Assigning aaphase_e for indels in CDS
2431
2432    * pair.c, pair.h, gsnap.c: Implemented flag --action-if-cigar-error
2433
2434    * oligoindex_hr.h, oligoindex_hr.c: Moved SIMD includes to header file,
2435      since definition of Oligoindex_T object is now there
2436
2437    * gmap.c: Using new interface to Pair_setup
2438
2439    * Makefile.gsnaptoo.am: Turned off making of gmapl and gsnapl for all SIMD
2440      types
2441
2442    * pair.c: Implemented patch by Nathan Weeks to remove extra token from
2443      print_gff3_exons_forward.  For GFF3 code, changed types for genomic
2444      coordinates from int to Chrpos_T.  Also initializing genomic coordinates
2445      to 0 instead of -1
2446
24472018-01-24  twu
2448
2449    * oligoindex_hr.c: Added assertion
2450
2451    * stage3.c: Fixed computation of chrstart and chrend before final
2452      Stage2_compute_starts and Stage2_compute_ends for circular chromosomes
2453
2454    * trunk, src, Makefile.gsnaptoo.am, gmap.c, gsnap.c, localdb.c, localdb.h,
2455      merge.c, oligoindex_hr.c, oligoindex_hr.h, oligoindex_localdb.c,
2456      oligoindex_localdb.h, stage2.c, stage2.h, stage3.c: Merged revisions
2457      212704 through 212742 from branches/2018-01-23-gmap-localdb
2458
2459    * VERSION, config.site.rescomp.tst: Updated version
2460
2461    * configure.ac: Added the conditional MAKE_LIB and a flag to control it
2462
2463    * pair.c: Added an assertion about cds_phase being non-negative
2464
2465    * Makefile.gsnaptoo.am: Building library only if MAKE_LIB is true
2466
24672018-01-23  twu
2468
2469    * gsnap.c: Allowing --terminal-threshold flag for backward compatibility,
2470      but ignoring it
2471
24722018-01-22  twu
2473
2474    * gsnap.c: Allowing the --use-sarray flag and ignoring it
2475
2476    * stage1hr.c: Fixed bugs regarding the use of querylength3 instead of
2477      querylength5, and for handling the case where all paired-end hits are NULL
2478      and we need to run GMAP on the complete paths.
2479
24802018-01-21  twu
2481
2482    * setup1.test.in: Turning off setup1.test
2483
24842018-01-19  twu
2485
2486    * trunk, src, Makefile.gsnaptoo.am, access.c, atoiindex.c,
2487      bitpack64-access.h, bitpack64-incr.c, bitpack64-incr.h, bitpack64-read.c,
2488      bitpack64-read.h, bitpack64-readtwo.c, bitpack64-write.c,
2489      bitpack64-write.h, block.c, cellpool.c, cmetindex.c, diagpool.c,
2490      distant-rna.c, distant-rna.h, dynprog.c, dynprog_cdna.c, dynprog_end.c,
2491      dynprog_genome.c, dynprog_single.c, epu16-bitpack64-access.c,
2492      epu16-bitpack64-access.h, epu16-bitpack64-incr.c, epu16-bitpack64-incr.h,
2493      epu16-bitpack64-read.c, epu16-bitpack64-read.h, epu16-bitpack64-readtwo.c,
2494      epu16-bitpack64-readtwo.h, epu16-bitpack64-write.c,
2495      epu16-bitpack64-write.h, extension-search.c, extension-search.h,
2496      filesuffix.h, genome.c, genome128_consec.c, genome128_hr.c,
2497      genome128_hr.h, genome_sites.c, genome_sites.h, gmap.c, gmapindex.c,
2498      gregion.c, gregion.h, gsnap.c, indel.c, indel.h, indexdb-write.c,
2499      indexdb-write.h, indexdb.c, indexdb.h, indexdb_hr.c, indexdbdef.h,
2500      intersect.c, intersect.h, intlist.c, intlist.h, junction.c, kmer-search.c,
2501      kmer-search.h, list.c, list.h, littleendian.h, localdb-write.c,
2502      localdb-write.h, localdb.c, localdb.h, localdbdef.h, matchpool.c,
2503      maxent_hr.c, merge.c, merge.h, oligo.c, oligo.h, oligoindex.c, output.c,
2504      pair.c, pair.h, pairpool.c, path-solve.c, path-solve.h, reader.c,
2505      reader.h, resulthr.c, resulthr.h, samprint.c, sarray-read.c,
2506      sarray-search.c, sarray-search.h, sedgesort.c, sedgesort.h,
2507      segment-search.c, segment-search.h, smooth.c, snpindex.c, splice.c,
2508      splice.h, splicestringpool.c, splicetrie.c, splicetrie.h, stage1.c,
2509      stage1hr.c, stage1hr.h, stage2.c, stage3.c, stage3.h, stage3hr.c,
2510      stage3hr.h, substring.c, substring.h, transcript.c, transcript.h, types.h,
2511      uintlist.c, uintlist.h, uniqscan.c, univdiag.c, univdiag.h, univdiagdef.h,
2512      util, gmap_build.pl.in: Merged revisions 210996 to 212658 from
2513      branches/2017-11-04-faster-transcriptome.  Put previous version into
2514      tags/2018-01-19-version1-pre-transcriptome
2515
2516    * index.html: Revised for new version
2517
2518    * uniqscan.c: Using new interface to Stage1hr_setup
2519
2520    * transcript.c: Sorting transcripts before printing
2521
2522    * stage3hr.c, stage3hr.h: Stage3end_new_transcript aborting if number of
2523      substrings and junctions don't match.  Stage3end_new_substrings returning
2524      new junctions to caller
2525
2526    * stage1hr.h, stage1hr.c: Using transcriptome_end_accept
2527
2528    * sarray-search.c: Various improvements to algorithm
2529
25302018-01-17  twu
2531
2532    * pair.c: Skipping gap characters in Pairarray_genomic_sequence
2533
2534    * gsnap.c: Fixed default value of max_middle_insertions
2535
25362017-11-17  twu
2537
2538    * sarray-search.c: Turned off qsort in favor of sedgesort
2539
25402017-11-15  twu
2541
2542    * stopwatch.c: Added standalone code for testing
2543
2544    * stage3.c: Fixed issue with trimming of chimeras not being re-extended back
2545      to the breakpoint.  Fixed issue with CIGAR strings from shortgap comps
2546      being treated differently from indel comps.
2547
2548    * gsnap.c: Using new interface to Sarray_search.  Printing worker runtimes
2549      to stderr.  Using --transcriptome-accept flag instead of old flags
2550
2551    * sarray-search.c, sarray-search.h: Removed genes_iit as a parameter
2552
2553    * indel.c: Added debugging statements
2554
2555    * dynprog_end.c: Handling the case where rev_goffset is negative
2556
2557    * outbuffer.c: Fixed fatal bug when trying to write SAM headers to
2558      OUTPUT_NONE split output, which is now NULL
2559
25602017-10-30  twu
2561
2562    * stage3.c: Fixed trim_end_indel procedures to stop at a gap
2563
25642017-10-29  twu
2565
2566    * substring.c: Fixed a memory leak when embellish_genomic is called a second
2567      time on a Stage3end_T object
2568
2569    * stage3hr.c: In Stage3end_new_gmap, in computing CIGAR, starting with
2570      hardclips set to zero
2571
2572    * stage3.c: In traverse_single_gap, not allowing a force of the gappairs.
2573      Implemented contiguous versions of peel_leftward and peel_rightward, but
2574      not using yet.
2575
2576    * sarray-search.c: For transcriptome endpoints, restricting further the
2577      cases where an indel should be ignored.  Added some potential code for
2578      genome endpoints, but not implemented
2579
2580    * pairpool.c: In Pairpool_join_end5 and Pairpool_join_end3, if the final
2581      revision process still results in a negative queryjump or genomejump, then
2582      don't attach the end at all
2583
2584    * get-genome.c: If map file is not in mapdir, then look in genomesubdir
2585
2586    * cigar.c, cigar.h, gsnap.c: Calling Cigar_setup to initialize static
2587      variables
2588
25892017-10-20  twu
2590
2591    * trindex.c: Added error message if genes file not provided by user
2592
2593    * stage1hr.c, stage3hr.c, stage3hr.h: Doing remapping back to transcriptome
2594      during pairing procedure
2595
2596    * get-genome.c: Added comment
2597
2598    * pair.c, pair.h, samprint.c: For GMAP-based alignments in GSNAP, printing
2599      transcript info
2600
2601    * sarray-read.c, sarray-read.h: Changed relevant types from Univcoord_T to
2602      Sarrayptr_T
2603
2604    * sarray-search.c, sarray-search.h: Changed relevant types from Univcoord_T
2605      to Sarrayptr_T.  Choosing closest indel on left and right ends.  For
2606      transcriptome, checking for overlapping or adjacent indels
2607
26082017-10-18  twu
2609
2610    * config.site.rescomp.prd, config.site.rescomp.tst, index.html: Updated for
2611      latest version
2612
2613    * transcriptome.c, transcriptome.h: Implemented
2614      Transcriptome_genomic_bounded_p
2615
2616    * transcript.c, transcript.h: Implemented Transcript_in_list_p.  Returning
2617      min_insertlength from Transcript_intersect_p
2618
2619    * stage3hr.c: Fixed bug in retrieving parent hit in Stage3end_remove_overlaps
2620
2621    * sarray-search.c: Fixed an issue where querystart was -1, which can result
2622      when nmatches is 0
2623
26242017-10-17  twu
2625
2626    * gmap.c: Changed option name from --alt-initiation-codons to
2627      --alt-start-codons
2628
2629    * stage1hr.c: Fixed typo in variable name
2630
26312017-10-12  twu
2632
2633    * stage3hr.c: Removed optimization to use the minimum of npaths and
2634      maxpaths, since we need computations on all npaths to make a random
2635      selection
2636
2637    * samprint.c: Handling dinucleotides gracefully if genomic sequence is NULL
2638
2639    * gsnap.c: Increased default expected_pairlength from 200 to 500.  Removed
2640      --maxsearch option, since it can lead to poor answers.
2641
2642    * outbuffer.c, samflags.h, samheader.c: Fixed problem introduced in 2017-05
2643      which caused the --output-file option to produce a NULL file pointer
2644
2645    * chrom.c: Dereferencing a character pointer
2646
2647    * pair.c: Fixed problem with initialization of endi in determining gff3
2648      coordinates.  Avoiding reading the pair at index of npairs.
2649
2650    * cellpool.c, cellpool.h, stage2.c: Adding non-overlapping paths, as well as
2651      high-scoring paths
2652
2653    * stage1hr.c: Using new interface to Sarray_search_transcriptome
2654
2655    * substring.c, substring.h: Added functions for Substring_chrpos_low and
2656      Substring_chrpos_high
2657
2658    * stage3hr.h: Added interfaces for chrpos_low and chrpos_high
2659
2660    * stage3hr.c: In transferring Transcript_T objects, checking for duplicates
2661
2662    * sarray-search.c, sarray-search.h: Allowing for genomic bounds on
2663      transcriptome alignment, for use in re-aligning genomic hit to find
2664      transcriptome coordinates
2665
2666    * gmap.c, translation.c, translation.h: Adding option for only ATG as
2667      initiation codon.  Making this the default.
2668
2669    * stage3.c: For trimming end exons, using a percentage of the querylength
2670      instead of a fixed length
2671
26722017-10-03  twu
2673
2674    * stage3hr.c: Removed old version of pair_up_concordant_transcriptome
2675
2676    * stage3hr.c: In resolve_ambiguous_splice procedures, when ambiguity is
2677      resolved, setting genomicstart or genomicend fields accordingly
2678
2679    * stage3hr.h: Removed interface for Stage3pair_pairtype
2680
2681    * stage3hr.c: Fixed the order for constructing genomic sequence from
2682      substrings
2683
2684    * substring.h, substring.c: Not applying general test of goodness for
2685      substrings from transcriptome-guided alignment
2686
2687    * sarray-search.c: Modified debugging statements
2688
2689    * samprint.c: Determining pairtype again, and allowing for the possibility
2690      of concordant uniq
2691
26922017-10-02  twu
2693
2694    * trindex.c: Copying the genes IIT file to the transcriptome directory
2695
2696    * transcriptome.c: Allowing a transcriptome to be read without a genome, by
2697      using divints instead of chrnums
2698
2699    * transcript.c: Revised debugging statements
2700
2701    * gsnap.c, uniqscan.c: Using new interface to Stage1hr_setup
2702
2703    * stage3hr.c, stage3hr.h: Implemented a separate procedure for pairing up
2704      transcriptome alignments.  Transcriptome hit types now take precedence.
2705      Implemented Stage3end_substrings_genomic_sequence.
2706
2707    * stage1hr.c, stage1hr.h: Using new interface to Stage3pair_new.  Re-mapping
2708      to transcriptome from genomic suffix array alignment.
2709
2710    * junction.c, junction.h: Added function Junction_deletionpos
2711
27122017-09-29  twu
2713
2714    * stage3.c: Moved location of build_dual_breaks step to get better behavior
2715
27162017-09-27  twu
2717
2718    * stage3.c: Fixed over-aggressive use of minintronlen_ends from wrong end of
2719      sequence
2720
2721    * chimera.c: Initializing a variable
2722
27232017-09-22  twu
2724
2725    * uinttableuint.c, uinttableuint.h: Initial import
2726
2727    * stage1hr.c, stage3hr.c, stage3hr.h: For transcriptome-guided genomic
2728      alignment, placing results into concordant_uniq instead of
2729      paired_uniq_long, if a transcript matches both ends
2730
2731    * pair.c, pair.h: Fixed computation of cds bounds for GFF3 output
2732
2733    * gsnap.c, uniqscan.c: Using new interface to Pair_setup
2734
2735    * gmap.c: Restored MAX_CHIMERA_ITER to be 3, but not iterating multiple
2736      times for middle pieces.  Added option --gff3-cds
2737
2738    * Makefile.gsnaptoo.am: Added uinttableuint.c to library
2739
2740    * translation.c: Assigning aaphase_g for final genomic codon
2741
27422017-09-11  twu
2743
2744    * gmap.c: Restored --intronlength option
2745
2746    * pair.c: Fixed gff3 cds output so it ignores indels
2747
2748    * pair.c: Replaced gff3 printing code for CDS with a call to code for exons
2749
2750    * pair.c, pair.h: Using new interface to transcript print functions
2751
2752    * stage3hr.c, stage3hr.h, transcript.c, transcript.h: Replacing separate
2753      trnums, trstarts, and trends fields with transcripts field
2754
2755    * stage1hr.c: No longer filtering initially by transcript concordance
2756
2757    * gsnap.c, samprint.c, samprint.h: Using new interface to transcripts field
2758      for Stage3end_T and Stage3pair_T objects
2759
2760    * gmap.c: No longer iterating on check_middle_local
2761
2762    * Makefile.gsnaptoo.am: Added transcript.c and transcript.h to programs that
2763      need it from pair.c
2764
27652017-09-05  twu
2766
2767    * sarray-search.c, stage3hr.c: Commented out or fixed code for LARGE_GENOMES
2768      to use Uint8list_T
2769
2770    * sarray-search.c, sarray-search.h: Commented out genome code for
2771      LARGE_GENOMES
2772
2773    * trindex.c: Fixed code so it will compile.  Fixed memory leaks.
2774
2775    * trunk, VERSION, gsl.m4, config.site.rescomp.prd, config.site.rescomp.tst,
2776      index.html, src, Makefile.gsnaptoo.am, bitpack64-readtwo.c,
2777      genome128_consec.c, genome128_consec.h, genome128_hr.c, genome128_hr.h,
2778      get-genome.c, gmap.c, gmapindex.c, gsnap.c, iit-read-univ.c,
2779      iit-read-univ.h, iit-read.c, iit-read.h, indel.c, indel.h, intlist.c,
2780      intlist.h, junction.c, junction.h, mapq.c, mapq.h, output.c, pair.c,
2781      pair.h, pairpool.c, samprint.c, samprint.h, sarray-read.c, sarray-read.h,
2782      sarray-search.c, sarray-search.h, sarray-write.c, sarray-write.h,
2783      splicealt.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c,
2784      substring.h, transcriptome.c, transcriptome.h, trindex.c, uniqscan.c:
2785      Merged revisions 207858 to 209656 from branches/2017-07-01-transcripts to
2786      allow for transcriptome-guided genomic alignment
2787
2788    * sarray-search.c: Fixed bug resulting from check of common diagonal over
2789      circular origin
2790
27912017-09-01  twu
2792
2793    * index.html: Updated for latest version
2794
2795    * stage1hr.c: Fixed criterion for looking for spliceends with nmismatches
2796      less than max_splice_mismatches on each end
2797
2798    * sarray-search.c: Checking right and left diagonals for collinearity with
2799      middle diagonal in query coordinates
2800
2801    * gmapindex.c: Casting all Univcoord_T lengths to UINT4 for suffix array
2802      procedures
2803
2804    * get-genome.c: Added option --gsequence to print exons and introns
2805
28062017-08-15  twu
2807
2808    * stage3hr.c: Changed checks on circularalias to circularpos
2809
2810    * sarray-search.c: Disallowing any splicing solution that goes around a
2811      circular origin
2812
2813    * pair.c: Checking that we are not at the end of the alignment before doing
2814      backward steps
2815
2816    * gmap.c: Removed option -G for uncompressed genome
2817
2818    * stage1hr.c: Disallowing any splicing solution that goes around a circular
2819      origin. Incrementing counter when comparing against max_gmap_improvement.
2820      Fixed a memory leak.
2821
28222017-07-27  twu
2823
2824    * stage3.c: In find_dual_break_spliceends, fixed a bug that generated
2825      negative coordinates
2826
28272017-06-29  twu
2828
2829    * Makefile.gsnaptoo.am: Added some files to the library and the include
2830      directory
2831
2832    * table.c, table.h: Added a function needed by gstruct
2833
2834    * interval.c, interval.h, iit-write.c, iit-write.h: Added a variable to make
2835      a function compatible with the gstruct version
2836
2837    * gsnap.c: Added a header file
2838
2839    * dynprog_genome.c: Commented out assertions that do not hold in transcript
2840      alignment
2841
2842    * chrom.c: Removed a faulty assertion
2843
28442017-06-21  twu
2845
2846    * stage3.c: For final call to insert_gapholders from path_compute_final,
2847      filling the gap with nucleotides if queryjump == genomejump
2848
2849    * pair.c: For GFF3 output, not printing lines where genomestart and
2850      genomeend coordinates are the same, typically resulting from a query skip
2851
28522017-06-20  twu
2853
2854    * Makefile.gsnaptoo.am: Added maxent_hr to lib and include
2855
2856    * stage3hr.c: Added assertions to make sure ilengths are not negative
2857
2858    * substring.c: In overlap checking procedures, decrementing high coordinate
2859      by 1 if possible to match the procedures for clip_overlap and
2860      merge_overlap in stage3hr.c
2861
28622017-06-19  twu
2863
2864    * VERSION, index.html: Updated version number
2865
28662017-06-16  twu
2867
2868    * gmap.c: Added to debugging statements
2869
2870    * stage3.c: In merge procedures, restoring original pairs to Stage3_T
2871      objects if the merge fails
2872
2873    * gsnap.c: Turning off default of 0 for trim-mismatch-score and
2874      trim-indel-score for DNA-Seq
2875
28762017-06-15  twu
2877
2878    * samprint.c, pair.c: For XM, handling the case where queryseq_mate is NULL
2879
2880    * shortread.c: Changed memory source of longstring to IN
2881
2882    * samprint.c: Added back a missing else clause after checking for
2883      omit_concordant_uniq_p
2884
2885    * stage3.c: Added debugging statements for creating and freeing Stage3_T
2886      objects
2887
2888    * stage1hr.c: Fixed memory leaks relating to floors and anchor segments
2889
2890    * sequence.c: Changed memory source of all contents to IN
2891
2892    * pair.c: Changed memory source of all tokens to OUT
2893
2894    * list.c, list.h: Implemented List_to_array_out_n
2895
2896    * intlist.c, intlist.h: Implemented Intlist_to_char_array_in
2897
2898    * gmap.c: Fixed memory leaks and memory bugs relating to chimera code.
2899      Removed all references to a nonjoinable list, and using stage3list as the
2900      master list for all procedures.
2901
2902    * genome.c: Changed source of alloc to IN
2903
29042017-06-14  twu
2905
2906    * index.html: Updated for latest version
2907
2908    * configure.ac: Removed unused macros
2909
2910    * src, util: Merged revisions 204076 through 207268 from
2911      branches/2017-03-07-multimapper-genes
2912
2913    * stage3hr.c: Reverted from revision 207330 (revision 205421 from
2914      branches/2017-03-07-multimapper-genes) to remove nindelbreaks field, since
2915      it discriminates against some equivalently good alignments
2916
2917    * stage3hr.c: Merged revision 205421 from
2918      branches/2017-03-07-multimapper-genes to add nindelbreaks field
2919
2920    * Makefile.gsnaptoo.am: Added commands for building lib and include
2921
2922    * uniqscan.c: Added Access_controlled_cleanup
2923
2924    * substring.c: Merged revisions 204076 through 205371 from
2925      branches/2017-03-07-multimapper-genes to remove splicecoordN and to set
2926      splicecoordD_knowni and splicecoordA_knowni.
2927
2928    * stage1hr.c: Merged revisions 204076 through 205713 from
2929      branches/2017-03-07-multimapper-genes to find DNA chimeras in paired-end
2930      reads and to double-check apparent perfect matches for actual number of
2931      mismatches
2932
2933    * sequence.c, sequence.h: Added function Sequence_stdout_header
2934
2935    * sarray-read.c, sarray-read.h, sarray-search.c, sarray-search.h: Merged
2936      revisions 204076 through 205420 from branches/2017-03-07-multimapper-genes
2937      to move search functions from sarray-read.c to sarray-search.c
2938
2939    * samprint.c, samprint.h: Merged revisions 204076 through 206196 from
2940      branches/2017-03-07-multimapper-genes to print information in XT field for
2941      transcript splicing and to handle omitting of concordant alignments
2942
2943    * samheader.c: Don't open a file for OUTPUT_NONE
2944
2945    * popcount.c, popcount.h: Modified conditions for including our own popcount
2946      instructions.  No longer needed if built-in options are available
2947
2948    * pair.c: Modified compressed format to no longer print tokens or
2949      dinucleotides
2950
2951    * littleendian.h: Added macros for FREAD_FLOATS and FWRITE_FLOATS
2952
2953    * iit-read.c, iit-read.h: Merged revisions 205322 through 206058 from
2954      branches/2017-03-07-multimapper-genes to support --coding in get-genome
2955      and to implement IIT_genestruct_chrpos
2956
2957    * gsnap.c: Merged revisions 204076 through 206184 from
2958      branches/2017-03-07-multimapper-genes to add options for transcriptome
2959      alignment and omitting concordant output.
2960
2961    * datadir.c: Modified messages when gmapdb is not found
2962
2963    * cigar.c: Fixed printing of "*" for cigar when mate is NULL or substrings
2964      is NULL
2965
2966    * block.c: Merged revision 204180 from branches/2017-03-07-multimapper-genes
2967      to generalize from 12-mers to oligo size for debugging output
2968
2969    * stage3hr.c, stage3hr.h: Merged revisions 206187 and 205714 from
2970      branches/2017-03-07-multimapper-genes to add behavior for
2971      --omit-concordant-uniq and --omit-concordant-mult and to add a
2972      splice_score field for all splice types
2973
29742017-06-13  twu
2975
2976    * pair.c, stage3.c: Changed type of chroffset and chrhigh from Chrpos_T to
2977      Univcoord_T in trim end functions
2978
29792017-06-12  twu
2980
2981    * src, gmap.c, output.c, pair.c, pair.h, stage3.c, stage3.h: Merged revision
2982      204925 from branches/2017-04-02-genome-genome to add bedpe output
2983
2984    * diag.c, stage2.c: Merged revisions 207196 and 207198 from
2985      branches/2017-04-02-genome-genome to improve genome-genome alignment
2986
29872017-06-09  twu
2988
2989    * stage3.c, output.c: Using functions now in pair.c
2990
2991    * samprint.c, cigar.c, pair.c, pair.h, stage3hr.c, stage3hr.h: Moved some
2992      functions to pair.c
2993
2994    * get-genome.c: Allowing for --dump to work with --exons
2995
2996    * substring.h: Moved typedef of Substring_T early
2997
2998    * Makefile.gsnaptoo.am: Including cigar.c and cigar.h for uniqscan and
2999      uniqscanl
3000
30012017-05-30  twu
3002
3003    * substring.c, substring.h: Commenting out procedures needed for chrpos_high
3004
3005    * stage3hr.c, stage3hr.h: Commenting out procedures needed for chrpos_high.
3006      Using procedures from cigar.c.
3007
3008    * stage3.c: Using procedures from cigar.c
3009
3010    * stage1hr.c: Added debugging statement
3011
3012    * pair.c, pair.h, samprint.c, samprint.h: Moved CIGAR printing procedures to
3013      cigar.c.  Printing mate cigar in XM field instead of mate chrpos_high.
3014
3015    * gsnap.c: Using new interfaces to Output_setup and SAM_setup
3016
3017    * output.c, output.h, samprint.c, samprint.h: Moved setup of merge_samechr_p
3018      from output.c to samprint.c
3019
3020    * Makefile.gsnaptoo.am, cigar.c, cigar.h: Added cigar.c and cigar.h for code
3021      relating to printing of CIGAR strings
3022
30232017-05-25  twu
3024
3025    * sarray-read.c: Increased iteration condition, allowing sarray algorithm to
3026      work when nmisses_allowed is zero.
3027
30282017-05-12  twu
3029
3030    * stage3.c: In Stage3_merge_chimera, doing peelback to remove any indels at
3031      the chimeric junction
3032
30332017-05-11  twu
3034
3035    * output.c, pair.c, pair.h, samprint.c, samprint.h, stage3hr.c, stage3hr.h,
3036      substring.c, substring.h: Added printing of mate chrpos high with an XM
3037      field
3038
3039    * stage1hr.c: Removed exception for FREE_ALIGN when nstreams is 1
3040
3041    * iit_get.c: Commented out printing of total when reading queries from stdin
3042
30432017-05-10  twu
3044
3045    * chimera.c, chimera.h, gmap.c: Allowing search for chimera exon-exon
3046      boundary to extend for 1 mismatch
3047
3048    * stage3.c, stage3.h: Implemented procedures Stage3_trim_left and
3049      Stage3_trim_right
3050
3051    * gmap.c: Calling Chimera_find_breakpoint first to set bounds based on
3052      sequence, and then Chimera_find_exonexon to find the exon boundary
3053
3054    * gmap.c: Increased value of CHIMERA_EXTEND from 8 to 20
3055
3056    * parserange.c: Added null terminating character after strncpy
3057
30582017-05-09  twu
3059
3060    * shortread.c: Fixed uninitialized variable in nextchar2 and invalid free
3061      when skipping in second file
3062
30632017-05-08  twu
3064
3065    * uniqscan.c: Using new interface to Stage1hr_setup
3066
3067    * gsnap.c, stage1hr.c, stage1hr.h: Added --speed option for GSNAP
3068
3069    * gmap.c: Set default value for maxintronlen to be 500,000
3070
30712017-05-03  twu
3072
3073    * stage3hr.c: Changed the procedure for resolving overlapping and separate
3074      alignments.  Now filtering both the overlapping and separate alignments.
3075      Using expected pairlength and pairlength deviation to select which one to
3076      report.
3077
3078    * stage1hr.c: Turned off the shortcut to skip complete set algorithm if
3079      suffix array has found something.  Turned off the shortcut for GMAP
3080      pairsearch/halfmapping if nconcordant > 0
3081
3082    * spanningelt.c: Changed a check procedure to abort rather than exit
3083
3084    * spanningelt.h: Fixed a typo in a comment
3085
3086    * merge.c: Made Merge_diagonals non-destructive, by copying the streams into
3087      the heap
3088
3089    * indexdb_hr.c: Added a comment about Merge_uint4 being destructive
3090
3091    * iit-read.c, iit-read.h: Added IIT_gene_overlapp function used by
3092      get-genome with the --coding flag
3093
3094    * get-genome.c: Added a --coding flag to report only genes that overlap in
3095      their coding regions
3096
30972017-04-24  twu
3098
3099    * index.html: Updated for latest version
3100
3101    * Makefile.am: Added gtf_transcript_splicesites to CLEANFILES
3102
3103    * pair.c, pair.h: Taking mate_chrnum as an argument in SAM print function
3104
3105    * output.c, samprint.c, samprint.h: Computing chrnum and mate_chrnum at same
3106      time as chrpos and mate_chrpos, to resolve issues with SAM output
3107
31082017-04-21  twu
3109
3110    * samprint.c, stage3hr.c, stage3hr.h: Fixed issue in mate chromosome printed
3111      when mate is a translocation
3112
31132017-04-13  twu
3114
3115    * chrom.c: Fixed compare function for alpha_numeric entries
3116
31172017-04-12  twu
3118
3119    * Makefile.am, configure.ac: Added an entry for gtf_transcript_splicesites
3120
3121    * index.html: Changed for latest version
3122
3123    * gtf_transcript_splicesites.pl.in: Changed output format
3124
3125    * stage3.c: Revised debugging statements
3126
3127    * samflags.h, samprint.c, samprint.h: Adding supplementary flag.  Adding
3128      information to XT field for transcript splicing.
3129
3130    * gsnap.c: Turning off trim mismatch for DNA-Seq
3131
3132    * gmap.c: Added comments
3133
3134    * dynprog_end.c: Turned wideband off for extending from medial splicesite,
3135      which was causing the end exon to be re-discovered as an indel
3136
31372017-04-11  twu
3138
3139    * gtf_transcript_splicesites.pl.in: Initial import
3140
31412017-03-17  twu
3142
3143    * substring.c: Commented out assertions that don't hold under SNP-tolerant
3144      alignment
3145
3146    * stage3hr.c: Improved debugging statement
3147
3148    * stage1hr.c: Proceeding to spanning set procedure if the other end has more
3149      hits than the number of concordant hits
3150
3151    * splicing-score.c: Added debugging calls to Maxent_hr procedures, to help
3152      in development
3153
3154    * samprint.c, maxent_hr.c, junction.c: Added debugging statements
3155
3156    * Makefile.gsnaptoo.am: Added files for splicing_score
3157
3158    * iit-read.c: Fixed memory leak for intron-level known splicing
3159
3160    * intron.c, intron.h: Added some utility functions
3161
3162    * indel.c, indel.h: Modified Indel_resolve_middle_deletion to favor and
3163      report intron dinucleotides for short deletions
3164
3165    * gsnap.c: Using new interface to Sarray_setup
3166
3167    * sarray-read.c, sarray-read.h: Checking short deletions with length between
3168      min_intronlength and max_deletionlen to see if they are introns
3169
31702017-03-09  twu
3171
3172    * get-genome.c: Allowing dump of all sequences from a map file
3173
31742017-03-02  twu
3175
3176    * pairpool.c: Allowing querypos and genomepos of 0
3177
31782017-02-24  twu
3179
3180    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
3181      number
3182
3183    * chrom.c: Added a type ALPHA_NUMERIC and sorting appropriately for those.
3184      Stripping "Chr" as well as "chr" from names
3185
3186    * pairpool.c: In Pairpool_push, not doing anything if querypos or genomepos
3187      is less than or equal to 0
3188
3189    * stage2.c: For convert_to_nucleotides, handling the case where path is NULL
3190
3191    * gmap.c: Added missing brace
3192
3193    * gmap.c, stage3.c, stage3.h: Added option --split-large-introns and
3194      implemented procedure Stage3_split
3195
3196    * stage2.c: Renamed variable querypos to curr_querypos in some procedures,
3197      so debug9 can be used
3198
31992017-02-16  twu
3200
3201    * iit-read.c: Handling the case in finding unique positions and splices
3202      where a gene has no overlapping genes
3203
32042017-02-15  twu
3205
3206    * archive.html, index.html: Updated for latest version
3207
3208    * VERSION: Updated version number
3209
3210    * substring.c: Fixed calculation of mandatory_trim_left and
3211      mandatory_trim_right
3212
3213    * indexdb.c: Assigning MMAPPED to positions_high_access when appropriate, to
3214      avoid free() error at end of program
3215
32162017-02-14  twu
3217
3218    * output.c: Ignoring mergedp in restricting the final result to a single path
3219
3220    * gmap.c: Allowing value of --suboptimal-score to be a float.  Ignoring
3221      mergedp in handling the final result
3222
3223    * gmap_build.pl.in: Added flag to build genome index in parts
3224
3225    * substring.c: For default alignment format, filling in stars in regions
3226      where the alignment goes past the beginning or end of the genome
3227
3228    * dynprog_single.c, stage3.c: Added checks against non-positive values for
3229      rlength and glength in Dynprog_single_gap.  Also requiring a positive
3230      value for rlength in running Dynprog_single_gap over Dynprog_cdna_gap or
3231      Dynprog_genome_gap.
3232
3233    * dynprog.c: Added debugging statements
3234
3235    * stage3.c, stage3.h: Added sort comparison procedures to help with local
3236      chimeric joins on each chromosome
3237
3238    * gmap.c: In checking for local chimeric joins, processing each chromosome
3239      separately
3240
3241    * stage3hr.c: Not resolving inside alignment when the coordinates look like
3242      a scramble, which can occur with circular chromosomes
3243
3244    * stage1hr.c: Fixed a memory leak for a non-concordant pair.  Fixed an
3245      uninitialized variable for non-spliced alignment
3246
3247    * oligoindex_hr.h: Commented out obsolete code
3248
3249    * iit-read.c, iit-read.h: Added support for finding unique splices, and for
3250      finding unique positions and splices in a set of genes
3251
3252    * gmap.c, gsnap.c: Fixed printing of SIMD capabilities for AVX2 and AVX512
3253
3254    * get-genome.c: Added ability to dump a map file, and the ability to print
3255      unique positions among a set of genes
3256
3257    * genome128_hr.c: Changed builtin commands for trailing and leading zeroes
3258      to use the long long versions for 64-bit words
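
      Illustrative note, not taken from the GMAP source: a minimal sketch of
      counting trailing and leading zeroes on 64-bit words with the long long
      builtins of GCC and Clang. The helper names are hypothetical, and the
      zero-word guard is an addition, since the builtins are undefined for 0.

```c
#include <stdint.h>

/* Hypothetical helpers; __builtin_ctzll and __builtin_clzll are GCC/Clang builtins. */

/* Number of trailing zero bits in a 64-bit word (64 if the word is zero). */
int trailing_zeros64(uint64_t word) {
  return word ? __builtin_ctzll(word) : 64;
}

/* Number of leading zero bits in a 64-bit word (64 if the word is zero). */
int leading_zeros64(uint64_t word) {
  return word ? __builtin_clzll(word) : 64;
}
```

      The plain int versions (__builtin_ctz, __builtin_clz) take an unsigned
      int, so a 64-bit argument would be truncated to its low 32 bits first.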
3259
3260    * genome.c: Commented out messages to stderr for negative coordinates
3261
3262    * dynprog_genome.c: Increased rewards for canonical intron.  Removed
3263      penalties for indels next to a splice site
3264
32652017-02-08  twu
3266
3267    * get-genome.c: Printing presence/absence of unique splices also
3268
32692017-01-31  twu
3270
3271    * get-genome.c, iit-read.c, iit-read.h: Added option --nunique to print
3272      number of unique positions
3273
32742017-01-27  twu
3275
3276    * oligoindex_hr.c: Fixed typos in atoi functions for SSE2 code
3277
32782017-01-13  twu
3279
3280    * stage3hr.c: Fixed a memory leak in resolving inner splices
3281
3282    * dynprog_end.c: Fixed conditional jump based on finalscore, by not checking
3283      when endalign is QUERYEND_NOGAPS
3284
3285    * stage1hr.c: Fixed uninitialized value for successp.  Using FREE_ALIGN macro
3286
3287    * spanningelt.c, indexdb_hr.c: Using MALLOC_ALIGN instead of MALLOC when
3288      needed
3289
3290    * oligoindex_hr.c: Including atoi.h
3291
3292    * samprint.c, substring.c, substring.h: Fixed coordinates reported in XT
3293      field, which depend on the donor and acceptor strands
3294
3295    * merge.c: Using macros FREE_ALIGN and CHECK_ALIGN
3296
3297    * mem.h: Defined macros FREE_ALIGN and CHECK_ALIGN
3298
32992017-01-10  twu
3300
3301    * genome128_hr.c: Fixed incorrect AVX macro
3302
3303    * oligoindex_hr.c: Changed _mm_bsrli_si128 to _mm_srli_si128.  Added atoi
3304      and ttoc modes to all code.
3305
33062017-01-09  twu
3307
3308    * gsnap.c: Removed option --microexon-spliceprob
3309
33102017-01-06  twu
3311
3312    * stage1hr.c: Using alignments with most matches, even if they are
3313      translocations compared with other hitpairs
3314
33152017-01-02  twu
3316
3317    * genome128_hr.c: For handling middle rows, using <= and >= to endptr and
3318      startptr, instead of < and >
3319
33202017-01-01  twu
3321
3322    * stage3.c: Using new interface to Dynprog_end5_gap and Dynprog_end3_gap
3323
3324    * stage1hr.c: In identify_all_segments, filtering out diagonals <
3325      querylength from the merged array
3326
3327    * dynprog_single.c, dynprog_cdna.c, dynprog_genome.c: Using use8p_size
3328
3329    * dynprog_simd.h: Removing fixed definition for SIMD_MAXLENGTH_EPI8
3330
3331    * dynprog_simd.c: Added assertions for traceback procedures for vertical and
3332      horizontal jumps not to go past the main diagonal.  Put macros around
3333      memory fences in debugging print procedures.
3334
3335    * dynprog_end.c, dynprog_end.h: Using use8p_size and introduced parameter
3336      require_pos_score_p
3337
3338    * dynprog.c, dynprog.h: Introducing an array for use8p_size that depends on
3339      the mismatch type
3340
33412016-12-30  twu
3342
3343    * stage3hr.c: Not converting splices when resolving insides of
3344      paired-end reads
3345
33462016-12-29  twu
3347
3348    * trunk, src, dynprog_genome.c, gsnap.c, pair.c, pair.h, sarray-read.c,
3349      smooth.c, stage1hr.c, stage1hr.h, stage2.c, stage3.c, stage3.h,
3350      stage3hr.c, stage3hr.h, substring.c, substring.h, uniqscan.c: Merged
3351      revisions 201789 through 202030 from branches/2016-12-18-stage2-soa to
3352      make various improvements to alignments
3353
3354    * stage1hr.c: Added debugging statements
3355
3356    * indexdb_hr.c: Checking for nmerged being 0
3357
33582016-12-16  twu
3359
3360    * ax_ext.m4: Not adding -mno options to an Intel compiler
3361
3362    * indexdb_hr.c: Returning an array created by malloc, rather than
3363      _mm_malloc, from the merge version of Indexdb_merge_compoundpos
3364
3365    * sarray-read.c: Using qsort instead of Sedgesort, because of seg faults
3366      observed on Intel compiler
3367
3368    * Makefile.gsnaptoo.am: Including merge.c, merge.h, merge-heap.c, and
3369      merge-heap.h where needed
3370
3371    * stage1hr.c: Providing a version of identify_all_segments for LARGE_GENOMES
3372
3373    * indexdb_hr.c: Cleaned up code so there are three versions of
3374      Indexdb_merge_compoundpos.  Fixed the merge version.
3375
3376    * oligoindex_hr.c: Fixed faulty svn merge
3377
3378    * genome128_hr.c: Fixed faulty svn merge, and hid shift_lo and shift_hi
3379      procedures
3380
3381    * trunk, src, Makefile.gsnaptoo.am, indexdb_hr.c, mem.h, merge-heap.c,
3382      merge-heap.h, merge.c, merge.h, stage1hr.c: Merged revisions 200992
3383      through 201743 from branches/2016-11-28-simd-merging to revise SIMD merge
3384      code
3385
3386    * spanningelt.c, spanningelt.h: Merged revisions 200992 through 201743 from
3387      branches/2016-11-28-simd-merging to change a calloc to a malloc
3388
3389    * trunk, ax_cpuid_intel.m4, ax_cpuid_non_intel.m4, ax_ext.m4, configure.ac,
3390      src, Makefile.gsnaptoo.am, cpuid.c: Merged revisions 200476 through 201735
3391      from branches/2016-11-14-avx512 to make provisions for AVX-512
3392
3393    * gmap.c: Merged revisions 200476 through 201735 from
3394      branches/2016-11-14-avx512 to change Genome_hr_user_setup to
3395      Genome_hr_setup
3396
3397    * gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Merged
3398      revisions 200476 through 201735 from branches/2016-11-14-avx512 to add
3399      provisions for AVX-512
3400
3401    * genome128_hr.c, genome128_hr.h: Merged revisions 200476 through 201735
3402      from branches/2016-11-14-avx512 to add shift and wrap procedures
3403
3404    * oligoindex_hr.c, oligoindex_hr.h: Merged revisions 200476 through 201735
3405      from branches/2016-11-14-avx512 to revise algorithms substantially
3406
3407    * oligoindex_old.c, oligoindex_old.h: Merged revisions 200476 through 201735
3408      from branches/2016-11-14-avx512 to make checking code work with current
3409      code
3410
3411    * stage2.c: Merged revisions 200476 through 201735 from
3412      branches/2016-11-14-avx512 to fix debugging comment
3413
3414    * sarray-read.c: Merged revisions 200476 through 201735 from
3415      branches/2016-11-14-avx512 to add AVX-512 code
3416
3417    * stage1hr.c: Fixed uninitialized variable
3418
34192016-12-13  twu
3420
3421    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
3422      genome128_hr.c: Merged revisions 201421 through 201532 from
3423      branches/2016-12-09-genomebits-serial-simd to change structure of SIMD
3424      code in genome128_hr.c
3425
3426    * index.html: Updated for version 2016-11-07
3427
3428    * configure.ac: Allowing sse4.1 and sse4.2 as responses to --with-simd-level
3429
3430    * samprint.c: Added missing pair of braces
3431
3432    * gsnap.c, stage1hr.c, stage1hr.h: Removed references to indel_knownsplice
3433      mode for gmap
3434
34352016-11-18  twu
3436
3437    * oligoindex_hr.c: Fixed debugging statements to use SIMD commands in count
3438      procedures
3439
34402016-11-16  twu
3441
3442    * ax_ext.m4: Removed -mno... flags for compilers
3443
3444    * configure.ac: Restricting response to --with-simd-level
3445
3446    * ax_cpuid_intel.m4: Fixed configure issue for AVX2 support using Intel
3447      compiler
3448
34492016-11-14  twu
3450
3451    * pair.c: Removed initialization of static variables
3452
3453    * gsnap.c, outbuffer.c, outbuffer.h, output.c, output.h: Separate output
3454      files for single-end and paired-end results
3455
34562016-11-07  twu
3457
3458    * sam_sort.c: Added printing at monitor intervals
3459
3460    * stage3hr.c: Checking for cases where insertions and deletions extend past
3461      genomicpos 0
3462
3463    * samprint.c: Added preliminary code for printing extended cigar strings
3464
3465    * pair.c, pair.h: Added code for printing extended cigar strings.  Not
3466      printing BLAST e-values.
3467
3468    * indexdb.c: Removed unused statement
3469
3470    * gsnap.c, uniqscan.c: Using new interface to Pair_setup
3471
3472    * gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Using new
3473      interface to CPUID_support
3474
3475    * gmap.c: Added option --sam-cigar-extended
3476
3477    * cpuid.c, cpuid.h: Added provisions for AVX512
3478
34792016-10-24  twu
3480
3481    * stage1hr.c: Not computing floors if querylength is less than index1part
3482
3483    * pair.c: Showing blast_evalue function for GMAP
3484
3485    * samprint.c: Removing assertions and aborts that do not hold for DNA-Seq
3486      chimeras
3487
34882016-10-23  twu
3489
3490    * bitpack64-read.c, pair.c, samprint.c, stage3hr.c, stage3hr.h, substring.c,
3491      substring.h: Printing BLAST e-values
3492
34932016-10-22  twu
3494
3495    * indexdb-write.c: For initializing counters, using packsizes from
3496      offsetsmeta, rather than recomputing them from offsetsstrm
3497
3498    * bitpack64-write.c: Handling problem with ptri overflowing a signed int.
3499      Now just advancing a pointer.
3500
3501    * bitpack64-read.h: Added interface for a function
3502
3503    * bitpack64-access.c, bitpack64-incr.c, bitpack64-read.c,
3504      bitpack64-readtwo.c: Handling the case where nwritten*4 overflows an
3505      unsigned int.  Casting it first to UINT8.
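
      Illustrative note, not taken from the GMAP source: a minimal sketch of
      why the cast has to happen before the multiplication. It assumes a 32-bit
      unsigned int and treats UINT8 as an 8-byte unsigned integer; the function
      names are hypothetical.

```c
#include <stdint.h>

/* Product is computed in 32 bits and wraps before being widened. */
uint64_t scaled_narrow(uint32_t nwritten) {
  return nwritten * 4U;
}

/* Operand is widened first, so the product is computed in 64 bits. */
uint64_t scaled_wide(uint32_t nwritten) {
  return (uint64_t) nwritten * 4U;
}
```

      With nwritten equal to 0x40000000, the narrow version wraps to 0, while
      the widened version yields 0x100000000.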
3506
35072016-10-17  twu
3508
3509    * bitpack64-access.c: Fixed an increment of out in extract_28
3510
35112016-09-26  twu
3512
3513    * archive.html, index.html: Updated for latest version
3514
35152016-09-23  twu
3516
3517    * stage3.c: In solving dual introns, handling the case where single_gappairs
3518      is NULL.  Added code for gmapl.
3519
3520    * stage1.c: Modified debugging statements
3521
3522    * pair.c: Added a check for monotonicity of query coordinates to the
3523      debugging procedure
3524
3525    * dynprog_genome.c: If procedure is returning NULL instead of the computed
3526      gap pairs, then setting finalscore to be negative, so the result is not
3527      used by the calling procedure
3528
3529    * access.c: If shm_attach fails, and using mmap instead, then not trying to
3530      copy a file to a read-only memory segment
3531
3532    * Makefile.gsnaptoo.am: Added uint8list.c and uint8list.h for gmap
3533
3534    * stage2.c: Added back find_shifted_canonical procedure as unused code
3535
35362016-09-20  twu
3537
3538    * substring.h: Using sensedir as a field, instead of chimera_sensedir
3539
3540    * substring.c: Substring_new can use trimmed ends to determine the sensedir.
3541      Using sensedir as a field, instead of chimera_sensedir
3542
3543    * stage3hr.h: Stage3end_new_gmap takes sensedir_knownp as an argument
3544
3545    * stage3hr.c: Stage3end_new_gmap takes sensedir_knownp as an argument, and
3546      can use trimmed ends to determine the sensedir.  Stage3end_new_substrings
3547      can determine sensedir from its component substrings and junctions.  For
3548      comparing alignments, using nmatches rather than nmatches_posttrim
3549
3550    * stage3.h: Changed variable name
3551
3552    * stage3.c: Removing maxpeelback restriction on peeling back for introns.
3553      For microexons, just transferring without checking.  In comparing single
3554      and dual gaps, not using middle exonprob to evaluate middle exon.  In
3555      solving dual breaks for microexons, allowing for multiple possible outer
3556      splice positions.  Changed order of operations to smooth first, then find
3557      dual breaks, and then single introns.
3558
3559    * stage1hr.c: Deciding separately whether to run gmap on 5' and 3' ends,
3560      depending on max_matches found on each end
3561
3562    * pair.c: Putting macro around GSNAP-specific output code for using mate
3563      sensedir
3564
3565    * dynprog_genome.c: Not using probabilities to determine if dinucleotide
3566      solution is good
3567
3568    * boyer-moore.c: Added debugging statement
3569
3570    * dynprog_genome.c: Removed backup algorithm for best score above a
3571      probability threshold. Instead, using best probability among canonical or
3572      semicanonical dinucleotides.
3573
35742016-09-16  twu
3575
3576    * stage3.c: Solving for microexons inside of traverse_dual_break.  Solving
3577      for dual breaks before solving introns.
3578
3579    * splice.c, splice.h: Splice_trim_novel_spliceends function now returning
3580      new splicedir
3581
3582    * stage3.c: Added intron-specific functions for peelback, to handle long
3583      similarity between exon ends and intron segments on the other end.
3584      Function for finding novel spliceends now returns new splicedir, although
3585      currently not used.
3586
35872016-09-15  twu
3588
3589    * samprint.c: Using new interface for Substring_sensedir
3590
3591    * pair.c: Printing mate sensedir, to be consistent with samprint code
3592
35932016-09-13  twu
3594
3595    * stage3hr.h: Removed obsolete functions
3596
3597    * stage3hr.c: Fixed cases where trim was added to amb_length.  Removed
3598      specific amb_length fields for GMAP alignments, and calculating instead
3599      using trim_left_splicep and trim_right_splicep
3600
3601    * stage1hr.c: Modified debugging statements
3602
3603    * substring.h, substring.c: Removed an include statement
3604
3605    * splice.c, splice.h: Moved splice site probability calculations from
3606      substring_trim_novel_spliceends to here
3607
3608    * stage3.c: Fixed gmap_trim_novel_spliceends to initialize mismatchp to be
3609      true if the alignment does not extend to the end
3610
3611    * dynprog_genome.c: Turned off debugging
3612
3613    * substring.c: Added comments
3614
3615    * dynprog_genome.c: Fixed bug in decision-making for using bestscore when it
3616      has a good probability.  Previously, this switched to the
3617      probability-based algorithm.  Renamed variables to clarify the algorithms.
3618
36192016-09-12  twu
3620
3621    * stage3hr.c: For overlap calculation, using just trim, not trim plus amb
3622      length
3623
36242016-09-09  twu
3625
3626    * stage3hr.h: Defining Stage3end_nmatches
3627
3628    * stage3hr.c: Defining nmatches to be nmatches_posttrim plus amb length.
3629      Requiring minimum number of matches to allow a transloc splice.  Favoring
3630      definite ambig results, plus insertlength, over definite splices or
3631      trimmed ambig, and then favoring definite splices over trimmed ambig.
3632
3633    * stage1hr.c: Using Stage3end_nmatches instead of
3634      Stage3end_nmatches_posttrim to decide whether to run GMAP
3635
3636    * substring.h: Defining procedures for returning nmatches and amb lengths
3637
3638    * substring.c: Defining nmatches to be nmatches_posttrim plus ambiguous
3639      length. Computing MAPQ over trimmed region to be consistent with
3640      pair-based method.  For new donor and acceptor substrings, extending the
3641      trim calculation to 0 or querylength.
3642
36432016-09-07  twu
3644
3645    * stage1hr.c: Checking whether result of Stage3end_new_splice is NULL
3646
3647    * stage3hr.c: Using number of matches and nmatches_posttrim in
3648      hit_goodness_cmp and hitpair_goodness_cmp.  Requiring a minimum number of
3649      matches in donor and acceptor before creating a transloc splice.  Added
3650      code for checking suffix array mismatches.
3651
3652    * sarray-read.c: After finding an insertion, modifying querystart of current
3653      diagonal, so next substring operation starts from that position
3654
3655    * indel.c: Improved debugging statements
3656
3657    * bitpack64-incr.c: Fixed errors in code for transferring from bitpack sizes
3658      22 to 24, and from 26 to 28
3659
36602016-09-02  twu
3661
3662    * gmap.c, gsnap.c, uniqscan.c: Using new interface to Indexdb_new_genome
3663
3664    * splice.c: When splice is not found, return -1 as values for nmismatches
3665
3666    * sarray-read.c: Allowing initial value of nmismatches to be used if it is
3667      0.  Fixed case involving ambiguous substrings.
3668
3669    * sarray-read.c: Setting nmismatches correctly in various cases, so we do
3670      not have to recompute them.  Looking at endpoints to determine if the
3671      nmismatches value is correct.
3672
36732016-09-01  twu
3674
3675    * indexdb.c, indexdb.h: For the option --unload-shared-memory, use
3676      allocation and not memory mapping to make sure we deallocate any shared
3677      memory
3678
36792016-08-24  twu
3680
3681    * genome.c: Not accessing beyond end of blocks when enddiscard is 0
3682
36832016-08-16  twu
3684
3685    * VERSION: Updated version number
3686
3687    * README: Discussing MAX_STACK_READLENGTH
3688
3689    * gsnap.c, uniqscan.c: Using MAX_FLOORS_READLENGTH instead of MAX_READLENGTH
3690
3691    * configure.ac, Makefile.gsnaptoo.am: Using MAX_STACK_READLENGTH instead of
3692      MAX_READLENGTH
3693
3694    * stage1hr.h: Adding max_floor_readlength to setup
3695
3696    * stage1hr.c: Removed local allocation of arrays of size MAX_READLENGTH.
3697      Now checking querylength against MAX_STACK_READLENGTH to determine whether
3698      to allocate from stack or heap.  Adding max_floor_readlength to setup
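
      Illustrative note, not taken from the GMAP source: a minimal sketch of
      choosing between stack and heap storage by comparing querylength against
      MAX_STACK_READLENGTH. The value 300, the function name, and the use of a
      fixed-size stack buffer are assumptions made for the example.

```c
#include <stdlib.h>

#define MAX_STACK_READLENGTH 300  /* hypothetical value for this sketch */

/* Fills a per-read work array with 0..querylength-1 and returns its sum,
   or -1 on a negative length or failed allocation. */
long sum_query_positions(int querylength) {
  int stack_buffer[MAX_STACK_READLENGTH];
  int *positions;
  long sum = 0;
  int i;

  if (querylength < 0) {
    return -1;
  }
  if (querylength <= MAX_STACK_READLENGTH) {
    positions = stack_buffer;            /* short read: use the stack */
  } else {
    positions = malloc((size_t) querylength * sizeof(int));  /* long read: use the heap */
    if (positions == NULL) {
      return -1;
    }
  }

  for (i = 0; i < querylength; i++) {
    positions[i] = i;
  }
  for (i = 0; i < querylength; i++) {
    sum += positions[i];
  }

  if (positions != stack_buffer) {
    free(positions);                     /* only heap storage is freed */
  }
  return sum;
}
```

      Short reads avoid the cost of malloc and free, while long reads no
      longer depend on a compile-time upper bound for array sizes.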
3699
3700    * indel.c, mapq.c, sarray-read.c, splice.c: Removed local allocation of
3701      arrays of size MAX_READLENGTH.  Now checking querylength against
3702      MAX_STACK_READLENGTH to determine whether to allocate from stack or heap
3703
3704    * stage3hr.c: Not allowing any indels to set trims in determining optimal
3705      score
3706
3707    * stage1hr.c: Using pre-processor macro LONG_READLENGTHS to allocate
3708      read-related memory on heap instead of stack.  Setting spliceable_high_p
3709      to be false for last segment.  In computing end indels, ensuring that
3710      shifti is not negative when looking up array value.
3711
3712    * shortread.c: Using MAX_EXPECTED_READLENGTH instead of MAX_READLENGTH
3713
3714    * stage3.c: When trimming ends, handling the case where the exon is empty
3715
3716    * stage3hr.c: Restored setting of abort_pairing_p when nconcordant exceeds
3717      maxpairedpaths
3718
3719    * gsnap.c, uniqscan.c: Using new interface to Pair_setup
3720
3721    * indel.c, mapq.c, sarray-read.c, splice.c, substring.c: Using pre-processor
3722      macro LONG_READLENGTHS to allocate read-related memory on heap instead of
3723      stack
3724
3725    * gmap.c, pair.c, pair.h: Added option --gff3-swap-phase
3726
3727    * bytecoding.c: Added explanation messages to remove shared memory segments
3728
37292016-08-12  twu
3730
3731    * trunk, config.site.rescomp.prd, configure.ac, src, Makefile.gsnaptoo.am,
3732      filestring.c, genome_sites.c, gsnap.c, pair.c, samprint.c, sarray-read.c,
3733      sedgesort.c, sedgesort.h, shortread.c, splice.c, stage1hr.c, stage3hr.c,
3734      stage3hr.h, substring.c, substring.h, univdiag.c, univdiag.h, util: Merged
3735      revisions 195608 to 196272 from branches/2016-08-09-genome-sites-hr, which
3736      contains merged revisions from branches/2016-08-02-long-read-fusions and
3737      2016-07-01-better-triage
3738
3739    * trunk, VERSION: Updated version number
3740
3741    * Makefile.gsnaptoo.am: Removed chrsubset.c and chrsubset.h for
3742      splicing-score
3743
3744    * pair.c: Added variable to swap phase for gff3 output
3745
3746    * configure.ac: Added a line to disable maintainer mode for users
3747
3748    * config.site.rescomp.prd, config.site.rescomp.tst, archive.html,
3749      index.html: Updated for latest version
3750
3751    * MAINTAINER: Added note about PATH
3752
37532016-08-08  twu
3754
3755    * gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Printing both
3756      gene_id and gene_name
3757
3758    * atoi.c, cmet.c: Fixed reduce procedures for 64-bit computers
3759
3760    * Makefile.gsnaptoo.am: Added semaphore.c and semaphore.h to list of files
3761      for splicing-score
3762
3763    * stage1hr.c: Fixed debugging statements
3764
3765    * stage3.c: Fixed issue where we tried to use pairs_pretrim after path_trim
3766      altered the pairs
3767
3768    * samprint.c, substring.c, substring.h: Fixed XT field to print correct
3769      junction coordinates
3770
37712016-08-02  twu
3772
3773    * stage3hr.c: Restoring final procedure based on nmatches in
3774      Stage3pair_optimal_score
3775
3776    * stage3.c: Reverting from revision 195487 to allow extraexon comps again
3777      and from revision 193238 to always insert dual break alignments
3778
3779    * comp.h, pair.c, pairpool.c: Reverting from revision 195484 to allow
3780      extraexon comps again
3781
37822016-08-01  twu
3783
3784    * gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Imposing
3785      preference order based on desired keys, rather than the text
3786
3787    * src, inbuffer.c, shortread.c: Merged revisions 195492 and 195493 to fix
3788      problem where --force-single-end terminated when a file had reads that
3789      were a multiple of --input-buffer-size
3790
3791    * stage3.c, comp.h, pair.c, pairpool.c: Using shortgap comp instead of
3792      extraexon comp for representing dual breaks
3793
3794    * shortread.c: Fixed issues in reading multiple pairs of files from command
3795      line
3796
37972016-07-23  twu
3798
3799    * atoiindex.c: Fixed calculation of oligo using new block algorithm
3800
38012016-07-11  twu
3802
3803    * stage1hr.c: For paired terminals, assigning final pairtype to be concordant
3804
3805    * archive.html: Exposed version 2015-07-23.  Improved formatting.
3806
3807    * stage3hr.c, substring.c: Handling the special case when alignstart or
3808      alignend is requested on an ambiguous substring
3809
3810    * stage3hr.c: Computing insertlength and concordance properly for
3811      overlapping dual GMAP alignments
3812
3813    * stage1hr.c: For dynamic programming of anchor segments, proceeding from
3814      closest to farthest segments, to favor shorter splice distances.  Resolved
3815      uninitialized variable when completeset algorithm is called but
3816      spanningset was not called, so read_oligos was not called.  Skipping
3817      re-alignment of hits that already have a type of GMAP.
3818
3819    * sarray-read.c: Resolving ambiguous ends if one dominates by both
3820      probability and splice distance
3821
3822    * gmap.c, gsnap.c, uniqscan.c: Using new interface to Stage3_setup
3823
3824    * stage3.c, stage3.h: Adding dual break for both SAM and non-SAM output,
3825      needed to give the correct CIGAR starting coordinate
3826
3827    * doublelist.c, doublelist.h, intlist.c, intlist.h, uintlist.c, uintlist.h:
3828      Implemented procedures for keeping a single item in the list
3829
3830    * sarray-read.c: Among ambiguous splice segments, ranking by probability and
3831      selecting closest one if it is less than half the distance of the second
3832      one
3833
3834    * substring.c, stage3hr.c: Improved debugging statements
3835
3836    * sarray-read.c: Fixed value of substring1p passed to Substring_new_ambig
3837      for alignments on the minus strand, which resulted in problems with the
3838      --merge-overlap feature
3839
38402016-07-06  twu
3841
3842    * stage3.c, pair.c: Restored backward movement of ptr
3843
3844    * stage1hr.c: Fixed infinite loop due to circular list
3845
3846    * gsnap.c: Changed default end detail to be medium
3847
3848    * substring.c: Fixed standard GSNAP output for deletions, by reducing the
3849      number of final dashes
3850
3851    * stage3hr.c: Added debugging statements
3852
38532016-06-30  twu
3854
3855    * pair.c, stage3.c: No longer going backward after an indel, which could
3856      cause an infinite loop
3857
3858    * splice.c: Using looser criteria for accepting a splice
3859
3860    * sarray-read.c: Revising previous number of mismatches instead of replacing
3861      it
3862
3863    * substring.c: Using correct memory category for substrings
3864
3865    * stage3hr.c: Penalizing bad introns
3866
3867    * pair.c, pair.h: Pair_nmismatches_region returning number of bad introns
3868
3869    * indel.c: Improved debugging statements
3870
3871    * gmap.c: Including -K for backward compatibility
3872
3873    * stage1hr.c: Merged revision 193193 from branches/2016-06-29-add-listpool
3874      to change from lists to a vector for anchor_segments
3875
38762016-06-29  twu
3877
3878    * resulthr.c, resulthr.h: Added UNPAIRED_TERMINALS result type
3879
3880    * stage1hr.c: Handling unpaired_terminals.  Consolidated memory allocation
3881      for plus and minus cases in Stage1hr_T object.
3882
38832016-06-21  twu
3884
3885    * shortread.c: Made fixes for --force-single-end option to work properly
3886
38872016-06-15  twu
3888
3889    * configure.ac: Added provision for user-selected SIMD level
3890
38912016-06-09  twu
3892
3893    * cpuid.c: Providing more detailed information from standalone program
3894
3895    * splicetrie.c: Commented out a debugging statement
3896
3897    * splice.c: Restored check for sufficient splice probabilities on splices
3898
3899    * sarray-read.c: Restored _pext_u32 command
3900
3901    * Makefile.gsnaptoo.am, cpuid.c: Added cpuid main program
3902
3903    * stage3.c: Added comment
3904
39052016-06-08  twu
3906
3907    * ax_ext.m4: Added -mbmi2 flag for avx2
3908
39092016-06-03  twu
3910
3911    * VERSION: Updated version number
3912
3913    * stage1hr.c: Replaced constant value of 15 with
3914      min_distantsplicing_end_matches
3915
3916    * indexdb.c: Removed sanity check on positions filesize, which can fail on
3917      multiple simultaneous instances of the process
3918
3919    * stage1hr.c, stage3hr.c, stage3hr.h: Searching for distant splicing based
3920      on trim
3921
3922    * sarray-read.c: Turning off AVX2-specific version of
3923      fill_positions_filtered_first
3924
3925    * sam_sort.c: Fixed warning message.  Fixed memory leak.
3926
3927    * gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Improved
3928      warning messages
3929
39302016-05-25  twu
3931
3932    * pair.c: Fixed calculation of circularpos for plus strand
3933
39342016-05-24  twu
3935
3936    * trunk, VERSION, acinclude.m4, bootstrap.gsnaptoo, asm-bsr.m4, ax_ext.m4,
3937      builtin-popcount.m4, configure.ac, index.html, src, Makefile.gsnaptoo.am,
3938      cpuid.c: Merged revisions 189683 through 190434 from
3939      branches/2016-05-12-power8
3940
3941    * uniqscan.c, gsnap.c: Using new interface to Stage3_setup
3942
3943    * iit-read-univ.c: Commented out warning when IIT file cannot be read
3944
3945    * gmap.c: Changed flag names to --max-intronlength-middle,
3946      --max-intronlength-ends, and --trim-end-exons
3947
3948    * stage3.c, stage3.h: Checking for end exon length with minendexon variable
3949
3950    * config.site.rescomp.dev, config.site.rescomp.tst: Added a configuration
3951      file for run-time checking
3952
39532016-05-17  twu
3954
3955    * stage3hr.c: Fixed uninitialized variable
3956
3957    * gmap.c, gsnap.c, uniqscan.c: Using new interface to Stage3_setup
3958
3959    * stage3.h: Added variable maxintronlen_ends
3960
3961    * stage3.c: In trimming ends, always going to intron, and not allowing
3962      indel. Comparing end intron length with maxintronlen_ends.
3963
39642016-05-06  twu
3965
3966    * dynprog_simd.c: Computing X_prev_nogap correctly for the case of zero gap
3967      penalty
3968
3969    * stage1hr.c: Modified debugging statements
3970
3971    * stage2.c: Added assertions
3972
3973    * acinclude.m4, ax_cpuid_intel.m4, ax_cpuid_non_intel.m4, ax_ext.m4,
3974      configure.ac: Writing own cpuid configure checks based on same code as in
3975      src/cpuid.c
3976
39772016-05-01  twu
3978
3979    * trunk, src, dynprog.c, dynprog.h, dynprog_genome.c, gmap.c, gsnap.c,
3980      pair.c, pair.h, sarray-read.c, splice.c, stage1hr.c, stage3.c, stage3.h,
3981      stage3hr.c, stage3hr.h, uniqscan.c: Merged revisions 188721 through 188751
3982      from branches/2016-04-29-improve-alignments
3983
3984    * trunk, Makefile.gsnaptoo.am: Property changes
3985
3986    * VERSION, config.site.rescomp.prd: Updated version number
3987
3988    * config.site.rescomp.tst: Added sanitize flag
3989
39902016-04-29  twu
3991
3992    * papers: Removed papers directory from SVN
3993
3994    * src, Makefile.gsnaptoo.am, gmap.c, pair.c, stage3.c, translation.c,
3995      translation.h: Merged revisions 188558 to 188717 from
3996      branches/2016-04-27-alt-codons to allow for alternate genetic codes
3997
39982016-04-20  twu
3999
4000    * archive.html, index.html: Updated for latest version
4001
4002    * stage3.c: Not allowing any ambiguous matches at 3' or 5' ends when trimming
4003
4004    * datadir.c: Modified comments
4005
4006    * datadir.c: In find_fileroot, showing preference if <dbroot>.version is
4007      found. Otherwise, handling the case where multiple .version files are
4008      found.
4009
40102016-04-04  twu
4011
4012    * archive.html, index.html: Revised for latest version
4013
4014    * splice.c: Checking for more than 10% mismatches in either end.  Using
4015      value of min_shortend in Splice_resolve_sense and Splice_resolve_antisense.
4016
4017    * sarray-read.c: Modified debugging statements
4018
40192016-03-30  twu
4020
4021    * gmapindex.c: Not creating altscaffold IIT file if no alt scaffolds are
4022      observed
4023
4024    * gmap.c, uniqscan.c: Using new interface to Univ_IIT_altlocp
4025
4026    * VERSION: Updated version number
4027
4028    * index.html: Updated for latest version
4029
4030    * stage3hr.c: Removed low_alias and high_alias fields.  Using altlocp,
4031      alias_starts, and alias_ends.
4032
4033    * resulthr.c: Using npaths_primary and npaths_altloc
4034
4035    * gsnap.c, iit-read-univ.c, iit-read-univ.h: Reading altloc IIT file
4036
40372016-03-29  twu
4038
4039    * sam_sort.c: Added option --restore-orig-order
4040
4041    * samprint.c: Removed print statement
4042
4043    * iit-read.c: Trying to add .iit suffix first
4044
4045    * stage3hr.c: Turning off DISTANT_SPLICE_SPECIAL, so we can find distant
4046      splices. For substrings, updating found_score only when the new one is
4047      better. Using nmismatches_whole for score field.
4048
4049    * sarray-read.c: Fixed debugging statements
4050
40512016-03-17  twu
4052
4053    * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Handling
4054      Parent and ID fields in exon and CDS types of recent NCBI gff3 files.
4055      Handling new transcript types.
4056
4057    * pair.c: Changed occurrences of abs() to explicit conditional statements,
4058      since abs() can give large integers with -m64 compiler flag
4059
4060    * stage2.c: Added parentheses
4061
4062    * gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Added option --max-anchors
4063
4064    * samread.c: Added debugging statements
4065
40662016-02-19  twu
4067
4068    * VERSION: Updated for latest version
4069
4070    * index.html: Updated for latest version
4071
4072    * stage3hr.c: Turned off debugging
4073
4074    * stage1hr.c, stage3hr.c, substring.c: Fixed query coordinates for salvage
4075      terminal procedure on minus strand
4076
40772016-02-18  twu
4078
4079    * stage3hr.c: Checking for stage2pairs being NULL when running GMAP on
4080      substrings or previous GMAP
4081
4082    * gmap_build.pl.in: Restored removal of fasta_sources and coordsfile
4083      temporary files
4084
4085    * gmap_build.pl.in: Added quotes around bindir programs
4086
4087    * stage3hr.c: Fixed creating stage 2 pairs for circular chromosomes
4088
4089    * stage3.c: Fixed debugging statements
4090
40912016-02-17  twu
4092
4093    * indel.c: Require more matches on both ends than the length of the insertion
4094
4095    * indel.c, oligoindex_hr.c, sarray-read.c, stage1hr.c, stage3hr.c,
4096      stage3hr.h, substring.c, substring.h: Removed genomiclength as a field
4097      from Substring_T objects.  Fixed overflow bug for large insertion
4098      substrings.
4099
4100    * sarray-read.c: Fixed code for SSE2 compilation
4101
4102    * stage3hr.c: Removed assertion, which is not valid
4103
4104    * indexdb.c: Handling stderr message for single sequence, where number of
4105      seconds is not defined
4106
4107    * stage3hr.c: Changed assertion to handle large genomes
4108
4109    * stage1hr.c, smooth.c, stage3.c, translation.c, translation.h, gmap.c,
4110      gregion.c, gregion.h, stage1.c, stage1.h, sarray-read.c, splice.c,
4111      splice.h: Removed unused variables and parameters
4112
4113    * splicetrie_build.c, sarray-write.c, indexdb.c, indexdb-write.c,
4114      gmapindex.c, dynprog_single.c: Removed unused variables
4115
4116    * splicetrie.c: Using new interface to dynprog procedures
4117
4118    * pbinom.c, outbuffer.c, iit-write.c, get-genome.c: Hiding unused procedures
4119
4120    * pair.c, indel.c, indel.h, sarray-read.c, sarray-read.h, stage1hr.c,
4121      stage3.c, dynprog_single.c, dynprog.c, dynprog_cdna.c, dynprog_cdna.h,
4122      dynprog_end.c, dynprog_end.h, dynprog_genome.c, dynprog_simd.c,
4123      dynprog_simd.h: Removed unused parameters
4124
4125    * output.c, stage3.h: Removed unused parameters in Stage3_print_sam
4126
4127    * oligoindex_hr.c: Fixed comparison between unsigned and signed values
4128
4129    * genome128_hr.c: Hiding procedures specific to GSNAP
4130
4131    * dynprog_cdna.c, dynprog_end.c, dynprog_end.h, dynprog_genome.c,
4132      dynprog_single.c, dynprog_single.h: Changed check for HAVE_SSE4_1 or
4133      HAVE_SSE2 to just HAVE_SSE2
4134
4135    * compress.c, bitpack64-readtwo.c: Put macros around a variable
4136
4137    * stage3hr.c: Using new interfaces to stage 2 procedures
4138
4139    * splicetrie.c: Using new interfaces to dynprog procedures
4140
4141    * gmap.c, stage1hr.c, stage2.c, stage2.h, stage3.c: Removed stage2_source
4142      and stage2_indexsize as return values from procedures
4143
4144    * stage3hr.c: Fixed comparisons of signed and unsigned integers
4145
4146    * gmap.c, stage3.c, stage3.h: Removed stage2 and stage3 benchmarking fields
4147      from Stage3_T object
4148
4149    * output.c, samprint.c, samprint.h, chimera.c, pair.c, pair.h: Removed
4150      unused variables and parameters from pair procedures
4151
4152    * dynprog_single.c, dynprog.c, dynprog.h, dynprog_cdna.c, dynprog_end.c,
4153      dynprog_genome.c, dynprog_simd.c, pairpool.c, pairpool.h: Reduced
4154      parameters for Pairpool_add_genomeskip and Dynprog_traceback_std
4155
4156    * gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h, uniqscan.c: Putting
4157      subopt_levels into Stage3hr_setup.  Removing cutoff_level as parameter
4158      from optimal score procedures
4159
4160    * stage1hr.c, stage3hr.c, stage3hr.h: Removed unused parameters from display
4161      and eval procedures
4162
41632016-02-16  twu
4164
4165    * sarray-read.c, stage1hr.c, stage3hr.c, stage3hr.h: Removed unused
4166      parameters for stage3hr procedures
4167
4168    * sarray-read.c, splice.c, stage1hr.c, stage3hr.c, stage3hr.h: Removed
4169      unused parameters and variables
4170
4171    * stage1hr.c, stage3hr.c, stage3hr.h: Using new interfaces to functions
4172      without first_read_p
4173
4174    * mapq.c, mapq.h, bitpack64-readtwo.c: Hiding unused functions
4175
4176    * bitpack64-read.c: Put correct macros around variable
4177
4178    * indel.c, splice.c: Using new interfaces to procedures without first_read_p
4179
4180    * sarray-read.c, substring.c, substring.h: Removed unused field first_read_p
4181
4182    * samprint.c: Removed unused parameter concordant_chrpos
4183
41842016-02-13  twu
4185
4186    * gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Removed unused variables and
4187      parameters in stage1hr.c
4188
41892016-02-12  twu
4190
4191    * gmap.c, gsnap.c, indexdb_hr.c, oligo.c, sarray-read.c, sarray-read.h,
4192      stage1hr.c, stage2.c, stage2.h, stage3.c, stage3hr.c: Removed unused
4193      variables and parameters from sarray procedures
4194
4195    * genome-write.c, genome-write.h, gmapindex.c: Removed altstrain_iit as a
4196      parameter to Genome_write_comp32
4197
4198    * genome_sites.c, sequence.c: Hiding unused function
4199
4200    * genome_sites.c, genome_sites.h, access.c: Hiding unused functions
4201
4202    * genome128_hr.c, genome128_hr.h, indel.c, mapq.c, sarray-read.c, splice.c,
4203      splicetrie.c, stage1hr.c, stage3hr.c, substring.c: Removed first_read_p as
4204      a parameter from all genome128 procedures
4205
4206    * compress.c: Removed unused variables
4207
4208    * iit_get.c: Removed unused variables and parameters
4209
4210    * parserange.c, iit-read-univ.c: Removed unused variable
4211
4212    * iit-read.c, iit-read.h, stage3.c: Removed map_bothstrands_p as a parameter
4213      to IIT_print_header
4214
4215    * iit-read-univ.c, iit-read.c, iit-read.h, iit_get.c: Removed sortp as a
4216      parameter from IIT_get_values routines
4217
4218    * get-genome.c, gmap.c, gsnap.c, iit-read.c, iit-read.h, iit_dump.c,
4219      iit_fetch.c, iit_get.c, parserange.c, snpindex.c, uniqscan.c: Removed
4220      labels_read_p as a parameter from IIT_read
4221
4222    * access.c: Restored check for number of attached processes when deallocating
4223
4224    * iit-read.c, iit-read.h, splicing-scan.c: Removed parameter annotationonlyp
4225      from IIT_dump
4226
4227    * genome128.c, gmapindex.c, sarray-read.c, snpindex.c, access.c, access.h,
4228      atoiindex.c, cmetindex.c, genome.c, iit-read-univ.c, iit-read.c,
4229      indexdb-write.c, indexdb.c, sarray-write.c: Removed eltsize as an argument
4230      to Access_mmap routines
4231
4232    * gsnap.c, stage3hr.c, stage3hr.h, uniqscan.c: Added --end-detail flag
4233
4234    * access.c: No longer printing long string of periods and commas during
4235      pre-load
4236
42372016-02-09  twu
4238
4239    * gmap.c, gsnap.c: Added message to remove shared memory manually
4240
4241    * access.c: Removed warning message
4242
4243    * spanningelt.c: Removed debugging code
4244
4245    * spanningelt.c: For debugging purposes
4246
4247    * src, atoiindex.c, cmetindex.c, gmapindex.c, indexdb.c, indexdb_hr.c,
4248      sarray-write.c: Using new interface to Access_allocate_private
4249
4250    * sarray-read.c: Using new interface to Access_allocate_shared and
4251      Access_allocate_private.  Removed code for USE_CSA.
4252
4253    * genome.c, indexdb.c, indexdb.h, indexdbdef.h: Using new interface to
4254      Access_allocate_shared and Access_allocate_private
4255
4256    * access.c, access.h: If shared allocation fails, now using memory mapping
4257      if possible. Setting access variable.
4258
4259    * gsnap.c, gmap.c, uniqscan.c: Using new interface to Access_setup
4260
4261    * dynprog_cdna.c: Changed variable initialization
4262
4263    * semaphore.c: Changed variable names
4264
4265    * access.c: Storing all semaphore IDs, and looking at their
4266      resident/freeable status.  Handling emergency stops better.
4267
42682016-02-08  twu
4269
4270    * Makefile.gsnaptoo.am, access.c, semaphore.c, semaphore.h: Put semaphore
4271      commands in a separate file.  Fixed small bugs with deleting semaphores
4272      and shared memory.
4273
42742016-02-05  twu
4275
4276    * stage2.c: Removed unnecessary calls to abs().  Replaced with a comparison
4277      between gendistance and querydistance.
4278
4279    * shortread.c: Using size_t instead of unsigned long long
4280
4281    * bitpack64-write.c, indexdb_hr.c, junction.c, pair.c, parserange.c,
4282      sarray-write.c, splicetrie_build.c, substring.c, uint8list.c: Using %llu
4283      for formatting instead of %u
4284
4285    * access.c, access.h, genuncompress.c, iit-read-univ.c, iit-read.c,
4286      indexdb-write.c, indexdb.c, sam_sort.c: Changed off_t to size_t for
4287      filesize
4288
4289    * gsnap.c: Removed testing code
4290
4291    * gsnap.c: For testing purposes
4292
42932016-02-04  twu
4294
4295    * pairpool.c: Fixed assertion on genomepos
4296
4297    * stage3hr.c: Fixed computation of minus chromosome coordinates for circular
4298      chromosomes
4299
43002016-02-03  twu
4301
4302    * gmap.c: Creating altlocp, alias_starts, and alias_ends for user-provided
4303      genomic segment
4304
4305    * coords1.test.ok: Revised for alternate genomic contigs
4306
4307    * gsnap.c: Allowing for npaths_primary and npaths_alternate.  Letting
4308      insertion length be arbitrarily long when user does not specify
4309      --max-middle-insertions.
4310
4311    * gmap.c, stage3hr.h, stage3.h: Allowing for npaths_primary and
4312      npaths_alternate
4313
4314    * uniqscan.c: Using new interfaces to functions
4315
4316    * substring.h, substring.c: Trimming novel spliceends for substrings
4317
4318    * stage3hr.c: Allowing for npaths_primary and npaths_alternate.  Changed
4319      logic for extending substrings using GMAP.  Implemented extension of GMAP
4320      alignments.
4321
4322    * stage3.c: In find_novel_spliceends, using trim lengths at ends to define
4323      two regions, one with a stronger and one with a weaker criterion for
4324      splice site probability.
4325
4326    * stage2.c, stage2.h: Implemented Stage2_compute_starts and
4327      Stage2_compute_ends for extending ends of alignments.  Fixed a condition
4328      for termination of while loop.
4329
4330    * stage1hr.h: Allowing for npaths_primary and npaths_alternate.  Allowing
4331      for arbitrarily long insertions when --max-middle-insertions is not set by
4332      user.
4333
4334    * stage1hr.c: Allowing for npaths_primary and npaths_alternate.  Making call
4335      to extend gmap alignments.  Allowing for arbitrarily long insertions when
4336      --max-middle-insertions is not set by user
4337
4338    * sarray-read.h, sarray-read.c: Allowing for arbitrarily long insertions
4339      when --max-middle-insertions is not set by user
4340
4341    * samprint.c, samprint.h: Allowing for npaths_primary and npaths_alternate.
4342      Added parameter for artificial mate in --add-paired-nomappers.
4343
4344    * pairpool.c: Added assertion for genomepos
4345
4346    * pair.c, pair.h: Allowing for npaths_primary and npaths_alternate.  Added
4347      function Pair_trim_distances, used by new find_novel_spliceends function
4348      for pairs.
4349
4350    * output.c: Allowing for npaths_primary and npaths_alternate.  Using new
4351      interface using artificial_mate_p.
4352
4353    * gmapindex.c: Allowing for alternate scaffolds
4354
4355    * dynprog_simd.c: Calling correct printing procedures for debugging
4356
4357    * dynprog_genome.c: Requiring finalscore to be >= 0
4358
4359    * Makefile.gsnaptoo.am: Added parserange.c and parserange.h for sam_sort
4360
43612016-01-15  twu
4362
4363    * src, result.c, result.h, resulthr.c, resulthr.h: Merging revision 182439
4364      from branches/2014-09-04-secondary-chr to handle npaths_primary and
4365      npaths_altloc
4366
4367    * src, iit-read.c, iit_store.c: Merging revision 182435 to use new interface
4368      to Chrom_from_string
4369
4370    * src, parserange.c: Merging revision 182431 from
4371      branches/2014-09-04-secondary-chr to check if contig_iit is NULL
4372
4373    * Makefile.gsnaptoo.am: Merging revisions 162111 and 182429 from
4374      branches/2014-09-04-secondary-chr to add tableint.c, tableint.h,
4375      parserange.c, and parserange.h where needed
4376
4377    * chrom.c, chrom.h: Merged revisions 162112 and 182427 from
4378      branches/2014-09-04-secondary-chr to add fields for alt_scaffold_start and
4379      alt_scaffold_end
4380
4381    * table.c, tableint.c, tableuint.c, tableuint8.c: Merged revision 162110
4382      from branches/2014-09-04-secondary-chr to fix memory leak
4383
4384    * iit-read-univ.c, iit-read-univ.h: Merged revision 182424 from
4385      branches/2014-09-04-secondary-chr to add function Univ_IIT_altlocp
4386
4387    * util, fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Merged
4388      revisions 146896 through 182422 from branches/2014-09-04-secondary-chr
4389
4390    * index.html: Updated to latest version
4391
4392    * archive.html: Added revision for 2014-12-31.v2
4393
4394    * stage1hr.c: Fixed array overflow in segmentation procedure
4395
43962016-01-14  twu
4397
4398    * uniqscan.c: Using new interface to Stage3hr_setup
4399
4400    * gsnap.c, stage3hr.c, stage3hr.h: Distinguishing between pairmax_linear and
4401      pairmax_circular
4402
4403    * pair.c: Removed SOFT_CLIPS_AVOID_CIRCULARIZATION code in computing
4404      circularpos, since it isn't needed
4405
4406    * substring.c, substring.h: Defining Substring_mandatory_trim_left and
4407      Substring_mandatory_trim_right.
4408
4409    * stage3hr.c: Turning SOFT_CLIPS_AVOID_CIRCULARIZATION back on.  Using
4410      Substring_mandatory_trim_left and Substring_mandatory_trim_right.
4411
44122016-01-13  twu
4413
4414    * gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Checking
4415      for ENOENT instead of EACCES
4416
4417    * substring.c: Handling trim_left_action and trim_right_action, instead of
4418      trim_left_p and trim_right_p
4419
4420    * substring.h, stage3hr.c: Using trim_left_action and trim_right_action,
4421      instead of trim_left_p and trim_right_p
4422
44232016-01-12  twu
4424
4425    * gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Returning
4426      return code from execvp
4427
4428    * gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Handling
4429      case where no path is given, by using execvp to find the correct program
4430
4431    * stage3hr.c, substring.c: Computing coordinates correctly in salvage
4432      procedure for terminal alignments with too many mismatches
4433
44342016-01-11  twu
4435
4436    * iit-read.c: Removed debugging code which was not adding .iit ending to
4437      file suffix
4438
44392016-01-08  twu
4440
4441    * dynprog_cdna.c, dynprog_end.c, dynprog_genome.c, dynprog_single.c: Added
4442      parameters for Dynprog_standard in non-SIMD code
4443
4444    * samprint.c, stage3hr.c, stage3hr.h: Changed name of function from
4445      Stage3end_substring2 to Stage3end_substringN
4446
4447    * gsnap.c, sarray-read.c, sarray-read.h: Not allowing for ambiguous splicing
4448      on circular chromosomes
4449
4450    * dynprog_single.c: Added assertion
4451
4452    * dynprog_genome.c: Checking for the case where no intron is found in a gap
4453
44542016-01-07  twu
4455
4456    * stage1hr.c: Removed extra debugging code
4457
4458    * sarray-read.c, stage3hr.c: Handling Junction_gc from within
4459      Stage3end_new_substrings
4460
4461    * stage1hr.c: Defining MAX_ANCHORS instead of EXHAUSTIVE_ANCHORS.  Keeping
4462      track of both all_segments and anchor_segments, and using whichever
4463      satisfies MAX_ANCHORS.
4464
4465    * dynprog_cdna.c, dynprog_end.c, dynprog_genome.c, dynprog_simd.h,
4466      dynprog_single.c: Replacing DEBUG14 and DEBUG16 with DEBUG_SIMD and
4467      DEBUG_AVX2, respectively
4468
4469    * dynprog.c: Allocating space needed for AVX2 debugging
4470
4471    * dynprog.h: Providing SIMD variables for non-AVX2 debugging procedures
4472
4473    * dynprog_simd.c: Fixed SIMD variables in non-AVX2 debugging procedures
4474
4475    * dynprog_simd.c: Added code for AVX2
4476
4477    * dynprog_cdna.c, dynprog_end.c, dynprog_genome.c, dynprog_simd.h: Passing
4478      debugging parameters for both SIMD and AVX2 debugging
4479
4480    * dynprog.c: Generalized allocation procedures to use ALIGN_SIZE
4481
4482    * stage3hr.c: In Stage3end_optimal_score and Stage3pair_optimal_score,
4483      turning off comparison of score_eventrim with cutoff_level.  In
4484      Stage3end_new_terminal, if number of mismatches between pos5 and pos3
4485      exceeds number allowed, then recomputing pos5 or pos3 so that it fits
4486      within the number allowed.
4487
4488    * stage1hr.c: Allowing for an exhaustive set of anchor segments
4489
44902015-12-19  twu
4491
4492    * types.h: Revised comment
4493
4494    * dynprog.h: Moved definitions of infinite gap penalties here
4495
4496    * dynprog.c: Adjusting for negative infinity in last_nogap in F loop
4497
4498    * atoi.c, cmet.c: Added types to make sure 64 bits are used
4499
45002015-12-18  twu
4501
4502    * dynprog_simd.c: Fixed initial conditions for all three types of initial
4503      gap penalty
4504
45052015-12-15  twu
4506
4507    * dynprog.c: Added initialization for ZERO_INITIAL_GAP_PENALTY.  Made fixes
4508      for initial Fgap calculation for standard initial gap penalty, by
4509      initializing last_nogap appropriately.
4510
4511    * dynprog_simd.c: Using a filter on lband for the E calculation for the
4512      first column when using a standard initial gap penalty
4513
4514    * dynprog_end.c: Added function needed for debugging
4515
4516    * dynprog_simd.c: For standard initial gap penalty, revising extend_ladder
4517      for first column of values, in second and later blocks.
4518
4519    * dynprog_simd.c: Implemented code for ZERO_INITIAL_GAP_PENALTY and
4520      INFINITE_INITIAL_GAP_PENALTY.  Added filters on lband for first column of
4521      values in ZERO_INITIAL_GAP_PENALTY.
4522
45232015-12-11  twu
4524
4525    * dynprog.c, dynprog.h: Added upperp and lowerp parameters to
4526      Dynprog_standard, to give it the same behavior as the SIMD upper and lower
4527      procedures
4528
45292015-12-10  twu
4530
4531    * VERSION: Updated version number
4532
4533    * oligoindex_hr.h: Removed duplicate definition of Shortoligomer_T
4534
4535    * trunk, config.site.rescomp.tst, src, Makefile.gsnaptoo.am,
4536      Makefile.pmaptoo.am, alphabet.c, alphabet.h, atoi.h, bitpack64-read.c,
4537      bitpack64-read.h, block.c, block.h, cmet.h, compress-write.c, dynprog.c,
4538      dynprog_single.c, gmap.c, gmapindex.c, indexdb-write.c, indexdb-write.h,
4539      indexdb.c, indexdb.h, oligoindex.h, oligoindex_hr.c, oligoindex_pmap.c,
4540      oligoindex_pmap.h, oligop.c, oligop.h, pair.c, pmap_select.c, pmapindex.c,
4541      stage1.c, stage2.c, stage2.h, stage3.c, types.h: Merged revisions 179384
4542      through 180698 from branches/2015-11-20-pmap
4543
45442015-12-09  twu
4545
4546    * indexdb.c: Defining blocksize for PMAP
4547
4548    * stage3hr.c: Adjusting cutoff levels in Stage3end_pair_up_concordant to
4549      consider the best alignment on each end, if it has mismatches that exceed
4550      the given cutoff level.  Added a field distant_splice_p, and using that
4551      for filtering, rather than chrnum == 0.
4552
4553    * output.c, samprint.c, samprint.h: Printing NH to be the maximum of the two
4554      npaths in --add-paired-nomappers option
4555
45562015-12-07  twu
4557
4558    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
4559      Makefile.gsnaptoo.am, atoi.c, atoi.h, atoiindex.c, bitpack64-access.c,
4560      bitpack64-access.h, bitpack64-incr.c, bitpack64-incr.h, bitpack64-read.c,
4561      bitpack64-read.h, bitpack64-readtwo.c, bitpack64-readtwo.h,
4562      bitpack64-write.c, bitpack64-write.h, block.c, block.h, cmet.c, cmet.h,
4563      cmetindex.c, gdiag.c, genome_sites.c, gmap.c, gmapindex.c, gsnap.c,
4564      indexdb-write.c, indexdb-write.h, indexdb.c, indexdb.h, indexdb_hr.c,
4565      indexdb_hr.h, oligo.c, oligo.h, oligoindex_hr.c, sarray-read.c,
4566      sarray-write.c, snpindex.c, spanningelt.c, spanningelt.h,
4567      splicetrie_build.c, stage1.c, stage1hr.c, types.h: Merged revisions 179335
4568      through 180340 from branches/2015-11-20-16mers to allow for genomic
4569      indices up to 18-mers
4570
4571    * index.html: Updated for latest version
4572
45732015-12-04  twu
4574
4575    * samprint.c: Fixed printing under --add-paired-nomappers for unpaired
4576      multiple alignments where npaths2 > npaths1
4577
45782015-12-02  twu
4579
4580    * stage2.c: In looking back, skipping positions where no hits were found
4581
45822015-12-01  twu
4583
4584    * configure.ac: Building only one level of SIMD
4585
4586    * ax_ext.m4: Added "no" flags for various SIMD levels
4587
45882015-11-19  twu
4589
4590    * gsnap.c: Fixed incorrect check on floating-point values for --min-coverage
4591
4592    * gmap_select.c, gmapl_select.c, gsnap_select.c, gsnapl_select.c: Picking
4593      best available program at run-time
4594
4595    * stage1hr.c: In computing gmap using segments, introducing a min_genomepos
4596      and max_genomepos
4597
4598    * samread.c: Fixed incorrect parsing needed by sam_sort, resulting from
4599      missing break commands in case statements
4600
4601    * samprint.c: Fixed some issues with --add-paired-nomappers option
4602
4603    * inbuffer.c: Initializing variables
4604
4605    * pair.c, stage3hr.c: Excluding alignments to circular chromosomes that
4606      extend below the first copy or above the second copy
4607
4608    * atoiindex.c, cmetindex.c, gmapindex.c: Using new interface to suffix array
4609      write procedures
4610
4611    * sarray-write.h, sarray-write.c: Using a list of Cell_T objects to compute
4612      exceptions
4613
4614    * bytecoding.c, bytecoding.h: Implemented procedures for writing bytes file
4615      and interleaving bytes files.
4616
4617    * Makefile.gsnaptoo.am: Removed uinttable.c and uinttable.h from programs
4618      with bytecoding.c
4619
46202015-11-13  twu
4621
4622    * Makefile.gsnaptoo.am: Added uinttable files needed by bytecoding
4623
46242015-10-29  twu
4625
4626    * stage3hr.c: Turning off SOFT_CLIPS_AVOID_CIRCULARIZATION.  Handling
4627      deletions on minus strand that go beyond genomic position 0.
4628
46292015-10-06  twu
4630
4631    * Makefile.gsnaptoo.am: Added "=1" to some pre-processor flags
4632
4633    * access.h: Added LOADED type for cases where IIT file is loaded from memory
4634      instead of being read from file
4635
4636    * iit-write.c: Adding padding to character arrays in IIT file, so the
4637      integer arrays are aligned.
4638
4639    * iit-read.c, iit-read.h: Added IIT_load function to obtain IIT from a
4640      region of memory
4641
46422015-09-28  twu
4643
4644    * oligoindex_hr.c: Handling the case where left_plus_length < indexsize
4645
46462015-09-24  twu
4647
4648    * gmap.c, gsnap.c: Revised explanation message for illegal instructions
4649
4650    * datadir.c: Increased buffer size for dbversion file name
4651
4652    * access.c: Looping until we get a semaphore, either by creation or by using
4653      existing one
4654
46552015-09-21  twu
4656
4657    * stage3hr.c: Using nsegments field for all alignment types.  Filtering
4658      results if nsegments is relatively high compared with the best result.
4659
4660    * stage3.c: Using higher standard for microexons.  Using nmatches instead of
4661      support for sufficient_splice_prob.
4662
4663    * stage1hr.c: Handling case of paired-end alignments where both ends do not
4664      satisfy minimum coverage.  Fixed debugging statements.
4665
4666    * splice.c: Using correct variable names inside FREEA calls
4667
4668    * pair.c, pair.h: Implemented Pair_maxnegscore
4669
4670    * dynprog_genome.c: Revised bridge_intron_gap procedures to make reasoning
4671      clearer.  Using weaker values for scoreI.  Using maxnegscore to filter out
4672      bad alignments.
4673
46742015-09-16  twu
4675
4676    * stage1hr.c: Removed debugging comment
4677
4678    * stage1hr.c, stage3hr.c: Fixed calls to Genome_get_segment_blocks_left,
4679      where we were providing the left coordinate instead of the right one.
4680
4681    * genome.c: Added comment
4682
46832015-09-11  twu
4684
4685    * cpuid.c: Added to repository
4686
4687    * uniqscan.c: Using new interfaces to stage 1 procedures
4688
4689    * substring.c, substring.h: Removed reject_trimlength
4690
4691    * stage3hr.c, stage3hr.h: Added procedures for filtering by coverage
4692
4693    * stage1hr.h, gsnap.c: Added --min-coverage and removed --terminal-threshold
4694
4695    * stage1hr.c: Added --min-coverage.  Changed criteria for running
4696      find_terminals.
4697
4698    * gmap_build.pl.in: Allowing for spaces in destination directory
4699
47002015-09-08  twu
4701
4702    * stage3hr.c: Fixed debugging statement
4703
47042015-09-01  twu
4705
4706    * gmap.c: Setting some uninitialized variables for chimera
4707
4708    * chimera.c: When chimeric breakpoint is beyond chromosomal bounds,
4709      returning NN for dinucleotides
4710
4711    * stage3hr.c: Favoring non-zero sensedirs when sorting results
4712
4713    * splice.c: Fixed variable names in FREEA calls, an error that the compiler
4714      reports only when alloca is not available.
4715
4716    * oligoindex_hr.c: Initializing some return variables when exiting trimming
4717      procedure early
4718
4719    * chimera.c, chimera.h, gmap.c, pair.c: Fixed issues when chimeras extend to
4720      beginning or end of chromosomes, causing a search for donor and acceptor
4721      nucleotides beyond chromosomal bounds.  Fixed Pair_pathscores to extend to
4722      last pair of path.
4723
4724    * dynprog_genome.c: Resolved fatal bug in bridging intron gaps when no
4725      probabilities are found
4726
47272015-08-31  twu
4728
4729    * stage1.c: No longer using alloca for array of Batch_T objects
4730
47312015-08-28  twu
4732
4733    * gmap.c: Changed MALLOC of array to MALLOC_OUT.  Fixed code for memusage.
4734
4735    * outbuffer.c: Changed FREE of outputs to FREE_KEEP
4736
4737    * stage3.c: Using known splices in pick_cdna_direction.  Changed MALLOC of
4738      Stage3_new to MALLOC_OUT.
4739
47402015-08-27  twu
4741
4742    * stage3.c: Changed criterion for evaluating splice neighborhood to allow
4743      for short ends
4744
47452015-08-24  twu
4746
4747    * sarray-read.c: Using max_mismatches_allowed from original call to suffix
4748      array algorithm, and not allowing it to be unlimited
4749
4750    * splice.c: Added debugging statements
4751
4752    * genome.c: Setting end of genomealt string to be NULL
4753
4754    * dynprog_genome.c: Changed loop end condition to avoid accessing
4755      uninitialized variables
4756
4757    * stage3hr.c: Calling Genome_get_segment_blocks_left with chroffset and not
4758      chrhigh. Setting resolve value to be -1 in case of AMB_UNRESOLVED_TOOCLOSE.
4759
47602015-08-19  twu
4761
4762    * gsnap.c: Removed extraneous space after newline
4763
4764    * access.c: Added header file
4765
47662015-08-13  twu
4767
4768    * gsnap.c: Using new interface to SAM_setup
4769
4770    * oligoindex_hr.h: Changing Inquery_T types to be unsigned
4771
4772    * oligoindex_hr.c: Removed old code that caused counts to be incremented
4773      twice
4774
4775    * access.c, access.h, atoiindex.c, cmetindex.c, genome.c, gmapindex.c,
4776      indexdb-write.c, indexdb.c, sarray-read.c, sarray-write.c, snpindex.c:
4777      Replaced Access_allocate with Access_allocate_private and
4778      Access_allocate_shared
4779
4780    * gsnap.c, samprint.c, samprint.h: Added option
4781      --paired-flag-means-concordant
4782
47832015-08-11  twu
4784
4785    * genome.c, indexdb.c, indexdbdef.h, sarray-read.c: Storing keys from shared
4786      memory to check semaphores to see if the memory should be retained
4787
4788    * access.c, access.h, gsnap.c: Added code for preloading and unloading of
4789      shared memory
4790
4791    * configure.ac: No longer offering options to enable or disable CPU types
4792
4793    * ax_ext.m4: Various changes to handle CPU types and features
4794
4795    * oligoindex_hr.c, oligoindex_hr.h: Created Inquery_T type to handle both
4796      SSE2 and non-SSE code
4797
4798    * genome128_hr.c: Fixed branch for clz/ctz for SSE4.2
4799
4800    * configure.ac: Removing mpi for now
4801
48022015-08-10  twu
4803
4804    * ax_ext.m4: Requiring BMI2 as part of AVX2
4805
4806    * sarray-read.c: Allowing stream_load of si128 only for HAVE_SSE4_1
4807
4808    * genome.c, genome.h, indexdb.c, indexdb.h, sarray-read.h: Commented out
4809      unused procedures for shmem_remove
4810
4811    * trunk, Makefile.am, acinclude.m4, ax_compiler_vendor.m4, ax_ext.m4,
4812      config.site.rescomp.tst, configure.ac, src, Makefile.gsnaptoo.am, cpuid.h,
4813      genome128_hr.c, gmap.c, gmap_select.c, gmapindex.c, gmapl_select.c,
4814      gsnap.c, gsnap_select.c, gsnapl_select.c, popcount.c, popcount.h,
4815      sarray-read.c: Merged revisions 171384 through 171613 from
4816      branches/2015-08-06-run-time-variants to allow for run-time variants
4817
4818    * trunk, VERSION, config.site.rescomp.tst, index.html, src, oligoindex_hr.c,
4819      oligoindex_hr.h, sarray-read.c: Merged revisions 170634 to 171595 to add
4820      code for AVX2
4821
48222015-08-05  twu
4823
4824    * oligoindex_hr.c: Changed debugging statements
4825
48262015-08-04  twu
4827
4828    * oligoindex_hr.c: Added debugging statements
4829
48302015-08-03  twu
4831
4832    * gsnap.c, samprint.c, samprint.h: Added flag --add-paired-nomappers
4833
48342015-07-28  twu
4835
4836    * oligoindex_hr.c: Restored missing line in counting of 9-mers
4837
48382015-07-23  twu
4839
4840    * VERSION: Updated version number
4841
4842    * stage1hr.c: Removed an abort command from debugging
4843
4844    * sarray-read.c: Using new interface to Bytecoding lcp_next function.
4845      Commented out code that is not used when SUBDIVIDE_ENDS is not defined.
4846
4847    * bytecoding.c, bytecoding.h: Call to lcp_next now returns child_next
4848
48492015-07-22  twu
4850
4851    * VERSION: Updated version number
4852
4853    * dynprog_genome.c: Fixed boundaries that led to negative coordinates for
4854      splice site candidates.
4855
4856    * stage1hr.c: Removed unused variables
4857
4858    * stage1hr.c: Removed allvalidp as parameter to align_end and align_pair.
4859
4860    * stage1hr.c: Setting spanningsetp and completesetp to false if querylength
4861      < min_kmer_readlength
4862
4863    * stage1hr.c: Removed restriction on min_readlength.  Running only suffix
4864      array, if possible, if reads are too short.
4865
4866    * access.c: Changed user message
4867
4868    * sarray-write.c: Changing plcp[n] to be 0 instead of -1
4869
4870    * sarray-read.c: Improved debugging results
4871
4872    * access.c: Printing user message if shmem fails
4873
48742015-07-17  twu
4875
4876    * get-genome.c, sequence.c, sequence.h: Added flags for --stream-chars and
4877      --stream-ints
4878
48792015-06-26  twu
4880
4881    * trunk, Makefile.gsnaptoo.am, 2015-statgen, algorithm.tex, discussion.tex,
4882      features.tex, introduction.tex, util: Modified mergeinfo
4883
4884    * config.site.rescomp.tst: Updated version
4885
4886    * index.html: Updated for version 2015-06-23
4887
4888    * archive.html: Updated for version 2014-12-31
4889
4890    * README: Removed references to Goby
4891
4892    * src, access.c, bigendian.c, bigendian.h, bitpack64-access.c,
4893      bitpack64-read.c, bitpack64-readtwo.c, bytecoding.c, compress.c,
4894      compress.h, genome-write.c, genome.c, genome.h, genome128_hr.c,
4895      iit-read-univ.c, indexdb.c, indexdb_hr.c, sarray-read.c, sarray-write.c,
4896      snpindex.c, types.h, univinterval.h: Merged revisions 167282 through
4897      168383 from branches/2015-06-10-bigendian to support bigendian
4898      architectures
4899
4900    * Makefile.dna.am, Makefile.util.am: Added instructions for check-bigendian
4901
49022015-06-23  twu
4903
4904    * VERSION, config.site.rescomp.tst: Updated version number
4905
4906    * algorithm.tex, biblio.bib, discussion.tex, features.tex, introduction.tex,
4907      toplevel.tex: Final version
4908
4909    * stage1hr.c: Added comments
4910
4911    * gmap.c: Removed message about different batch levels
4912
4913    * gsnap.c: Added option --master-is-worker for MPI version
4914
4915    * access.c: Using malloc whenever shmget fails
4916
49172015-06-15  twu
4918
4919    * stage1hr.c: Removed extra #endif statements
4920
4921    * trunk, VERSION, config.site.rescomp.tst, Makefile.gsnaptoo.am,
4922      2015-statgen, Ambiguous-splicing.eps, Hierarchical-GMAP.eps,
4923      Large-hash-table.eps, Overlapping-alignment.eps, biblio.bib, toplevel.tex,
4924      util: Updated version number
4925
4926    * stage1hr.c: Fixed indentation
4927
4928    * src, genome.c, genome128_hr.c, gmap.c, gsnap.c, indexdb.c, mode.h,
4929      sarray-read.c, stage1hr.c, substring.c, uniqscan.c: Merged revisions
4930      165630 through 167691 from branches/2015-05-13-ttoc to implement ttoc mode
4931
4932    * splice.c: Applied revision 167580 from releases/public-2014-12-17.  In
4933      group_by_segmenti_aux and group_by_segmentj_aux, checking plusp for each
4934      individual hit in deciding whether to group donor or acceptor.
4935
4936    * bitpack64-readtwo.c: Added debugging statements
4937
4938    * sarray-read.c: Defining a variable for debugging
4939
4940    * oligoindex_hr.c: Defining reverse_nt for machines without SSE4.1
4941
49422015-06-11  twu
4943
4944    * stage3hr.c: Changed occurrences of Uintlist_next to Uint8list_next for
4945      LARGE_GENOMES
4946
4947    * oligoindex_hr.c: Providing alternative to _mm_extract_epi32 for machines
4948      without SSE4.1
4949
4950    * acinclude.m4, shm-flags.m4, configure.ac, access.c: Including check for
4951      SHM_NORESERVE
4952
4953    * Makefile.gsnaptoo.am: Removed -lrt
4954
4955    * sarray-read.c: Initializing chromosome values to be those for chrnum 1 to
4956      handle left == 0
4957
49582015-06-10  twu
4959
4960    * VERSION, index.html: Updated version number
4961
4962    * sarray-write.c, gmapindex.c: Removing rankfile
4963
4964    * gmap_build.pl.in: Changed flag from --no-sarray to --build-sarray
4965
4966    * atoiindex.c, cmetindex.c: Added flag --build-sarray
4967
49682015-06-09  twu
4969
4970    * indel.c: Added debugging statements
4971
4972    * stage1hr.c: Bypassing gmap on region if mappingend is less than or equal
4973      to mappingstart, which can happen if the region is pushed to the beginning
4974      or end of the chromosome
4975
4976    * stage3hr.c: Assigning loop variable to given junctions before we push
4977      left_ambig
4978
49792015-06-06  twu
4980
4981    * stage3.c: Reversed last revision, and put trim_novel_spliceends at
4982      beginning of path_trim, since putting it at the end results in an infinite
4983      loop
4984
4985    * stage3hr.c: Added debugging statement
4986
4987    * stage3.c: Moved trimming of novel spliceends from beginning of path_trim
4988      procedure to end
4989
4990    * pair.c: Fixed computation of circularpos for minus alignments
4991
49922015-06-05  twu
4993
4994    * samprint.c: Removed unused variables
4995
4996    * stage3hr.c: In printing translocations, getting separate chrs for the two
4997      halves. Turned on TRANSLOC_SPECIAL.
4998
4999    * samprint.c: In printing halfdonors and halfacceptors, comparing endlengths
5000      to trimlengths to determine whether to print H or S in CIGAR string
5001
5002    * samprint.c: Fixed printing of CIGAR strings for minus alignments
5003
50042015-06-04  twu
5005
5006    * stage1hr.c: Added lowpos and highpos to Segment_T object.  Rewrote dynamic
5007      programming procedures for converting segments to pairs.
5008
50092015-06-03  twu
5010
5011    * stage1hr.c: In converting segments to GMAP, changed criteria for dynamic
5012      programming to be relative to anchor_segment and not to segment[k].
5013
50142015-06-02  twu
5015
5016    * sarray-read.c, stage3hr.c: Using new interface to Substring_new_ambig
5017
5018    * substring.c, substring.h: Setting trim_left and trim_right for ambiguous
5019      substrings
5020
5021    * stage3hr.c, substring.c, substring.h: Renamed outofbounds variables to
5022      outofbounds_start and outofbounds_end.  Handling the case where the
5023      alignment is out of bounds to the left of the current chromosome.
5024
5025    * VERSION: Updated version number
5026
5027    * archive.html, index.html: Made changes for new version
5028
5029    * stage1hr.c: Handling the case where floors is NULL, such as for a poly-A
5030      read
5031
5032    * stage3hr.c: Fixed genomic segments for converting substrings to GMAP
5033
5034    * stage1hr.c: For converting segments to GMAP, fixed criteria for allowing
5035      non-monotonic query orders and possible insertions
5036
5037    * stage3hr.c: Fixed bug in referring to uninitialized substring
5038
5039    * substring.c, substring.h: Removed left_genomicseg field
5040
5041    * stage3hr.c: In converting substrings to GMAP, using correct genomic
5042      nucleotide now
5043
5044    * gsnap.c: Made batch level 4 the default
5045
5046    * stage1hr.c: Reordered search algorithms.  Limiting number of anchor
5047      segments, and pairing up those instead.  Disabling doublesplicing
5048      algorithm.
5049
5050    * sarray-read.h, sarray-read.c: Removed references to sarray_gmap
5051
5052    * pair.c, pair.h: For GSNAP default output format, no longer printing pair
5053      info for single-end reads
5054
5055    * memory-check.pl: Handling results for non-threaded runs
5056
5057    * sarray-read.c: Fixed memory leak
5058
50592015-06-01  twu
5060
5061    * stage1hr.c: Deferring read_oligos until we need them for spanning set or
5062      complete set algorithms
5063
5064    * stage1hr.c: Fixed call to single hit alignment of GMAP.  Made batch level
5065      4 the default for memory.
5066
5067    * stage1hr.c: Allowing terminal alignments only if no single-end alignments
5068      are found, or if no concordant alignments are found.
5069
5070    * sarray-read.c: Fixed memory leak
5071
5072    * stage1hr.c: Limiting number of anchor segments.  Implementing terminal
5073      alignments.
5074
5075    * stage3hr.c: Using new interface to Substring procedures
5076
5077    * substring.c, substring.h: Removed unused variables
5078
5079    * samprint.c: Removed obsolete code for printing specific GSNAP types
5080
5081    * stage1hr.c: Implemented finding of terminals based on anchor segments
5082
5083    * stage3hr.c: Fixed accumulation of ilength_high.  In comparing GMAP against
5084      substrings, iterating through all substrings.
5085
5086    * stage3hr.c, stage3hr.h, substring.c, substring.h: Fixed issues with
5087      substring boundaries, computing genomic_diff, and marking mismatches
5088
5089    * stage2.c: Removed GMAP-specific code from GSNAP
5090
5091    * samprint.c: Changed call to get querylengths
5092
5093    * genome128_hr.c, genome128_hr.h: Removed mismatch_offset
5094
50952015-05-31  twu
5096
5097    * samprint.c: Removed references to Pair_check_cigar.  Changed calls to get
5098      cdna_direction to those for sensedir.
5099
5100    * pair.c: Removed printing of state
5101
5102    * substring.c: Removed references to genomicstart_adj and genomicend_adj in
5103      converting substrings to pairs
5104
5105    * stage3hr.h: Removed interface for Stage3end_indel_pos
5106
5107    * stage3hr.c: Changed calls to Substring_new for insertion and deletion
5108      types to conform to new substrings standards, where each substring has its
5109      genomicstart and genomicend adjusted for indels.  Removed indel_pos and
5110      indel_low fields from Stage3end_T object.  Removed code for printing
5111      separate GSNAP types.
5112
5113    * stage3hr.c: Setting trim_left, trim_right, trim_left_splicep, and
5114      trim_right_splicep for substring hit type
5115
51162015-05-29  twu
5117
5118    * stage3hr.c: Fixed coordinate error in test_hardclips
5119
5120    * stage3hr.c: Fixed typo
5121
5122    * samprint.c, stage3hr.c: Fixed issues in finding substring_low for minus
5123      alignments using hardclip_low
5124
5125    * stage3hr.c: Fixed computation of ilength for substrings
5126
5127    * trunk, README, Makefile.gsnaptoo.am, 2015-statgen, Ambiguous-splicing.eps,
5128      DP-triangles.eps, Diagonalization.eps, Hierarchical-GMAP.eps,
5129      Large-hash-table.eps, Overlapping-alignment.eps, SIMD-oligomers.eps,
5130      Vertical-format.eps, algorithm.tex, biblio.bib, context.tex,
5131      discussion.tex, features.tex, introduction.tex, toplevel.tex, src, diag.c,
5132      diag.h, diagpool.c, diagpool.h, doublelist.c, doublelist.h,
5133      genome128_hr.c, gmap.c, gsnap.c, indel.c, indel.h, intlist.c, intlist.h,
5134      junction.c, junction.h, list.c, list.h, oligoindex_hr.c, oligoindex_hr.h,
5135      pair.c, pair.h, samprint.c, samprint.h, sarray-read.c, sarray-read.h,
5136      sequence.c, splice.c, splice.h, splicing-score.c, stage1hr.c, stage1hr.h,
5137      stage2.c, stage2.h, stage3.c, stage3.h, stage3hr.c, stage3hr.h,
5138      substring.c, substring.h, uint8list.c, uint8list.h, uintlist.c,
5139      uintlist.h, uniqscan.c, univdiag.c, univdiag.h, univdiagdef.h, util:
5140      Merged revisions 162218 to 166640 from branches/2015-03-28-sarray-gmap,
5141      2015-03-31-new-sarray-, 2015-05-07-sarray-ambig, 2015-05-21-segment-gmap,
5142      and 2015-05-22-fast-oligoindex
5143
5144    * trunk, config.site.rescomp.tst: Updated version number
5145
5146    * index.html: Made changes for 2014-12-29
5147
5148    * samprint.c: Moved position of #endif line
5149
51502015-05-27  twu
5151
5152    * substring.c, stage3.c: Fixes to debugging statements
5153
5154    * samprint.c, samprint.h: Revisions to SAM_compute_chrpos
5155
5156    * output.c: Using new interface to SAM_compute_chrpos
5157
51582015-05-19  twu
5159
5160    * src, gmapindex.c: Allowing genomecomp to be a command-line argument.
5161      Merged changes from branches/2015-05-15-compressed-sarray to allow for
5162      compressed suffix arrays.
5163
5164    * util, gmap_build.pl.in: Providing genomecomp file as a command-line
5165      argument, instead of piping it into gmapindex
5166
5167    * sarray-write.c, sarray-write.h: Merged changes from
5168      branches/2015-05-15-compressed-sarray to allow for compressed suffix
5169      arrays, but removed csafile needed for debugging
5170
5171    * sarray-read.c: Turning off code for compressed suffix arrays
5172
5173    * indexdb-write.c, indexdb-write.h: Allowing the case where genomelength is
5174      less than index1part
5175
5176    * bitpack64-write.h: Improved comments
5177
5178    * access.c: Merged changes from branches/2015-05-15-compressed-sarray to
5179      assign *fd, even if file is empty
5180
5181    * sarray-read.c: Merged code for compressed suffix array.  Implemented
5182      different methods for Elt_fill_positions_filtered, depending on whether
5183      the filtering occurs more than once.
5184
5185    * gmap.c: Using new interface to Pair_setup
5186
51872015-05-15  twu
5188
5189    * output.c: Not computing chrpos for SAMECHR_SPLICE and TRANSLOC_SPLICE
5190      hittypes
5191
5192    * gmap.c, gsnap.c, pair.c, pair.h, uniqscan.c: Fixed issue with printing
5193      nsnpdiffs for GMAP alignments
5194
5195    * stage3hr.c: Turned on TRANSLOC_SPECIAL to remove translocations when
5196      non-translocation alignments are found.  Using effective_chr for printing
5197      purposes.  Pushing both substrings for a distant splice. Using querystart
5198      and queryend instead of querystart_adj and queryend_adj for computing
5199      insertlength.
5200
5201    * samprint.c: Using Substring_compute_chrpos to compute chrpos based on
5202      substrings instead of Stage3end_T object
5203
5204    * substring.c, substring.h: Implemented Substring_compute_chrpos
5205
52062015-05-01  twu
5207
5208    * iit-read.c: Checking for the possibility in IIT_get_highs_for_low and
5209      IIT_get_lows_for_high of a zero-length array.
5210
5211    * stage3hr.c: Fixed order of LtoH substrings for deletions
5212
5213    * oligoindex_hr.c: Replaced count_fwdrev_simd with individual
5214      count_*mer_fwd|rev_simd procedures
5215
52162015-04-30  twu
5217
5218    * substring.c: Revised some debugging statements
5219
5220    * stage3hr.c: Retaining old information about sarrayp when copying a
5221      Stage3_T object
5222
5223    * stage3.c: Initializing max_nmatches to be 0 in end-trimming procedures
5224
5225    * Makefile.gsnaptoo.am: Added -lrt to get shm commands
5226
5227    * algorithm.tex, context.tex, features.tex, introduction.tex: Augmented
5228      captions
5229
5230    * biblio.bib, toplevel.tex: Added references
5231
5232    * discussion.tex: Added material
5233
5234    * algorithm.tex, features.tex, introduction.tex: Added citations
5235
5236    * discussion.tex: Added text
5237
5238    * context.tex: Added description of GSTRUCT
5239
5240    * context.tex, discussion.tex: Moved HTSeqGenie to context.tex
5241
5242    * introduction.tex: Added caption
5243
5244    * features.tex: Revisions
5245
5246    * Diagonalization.eps, Hierarchical-GMAP.eps, Large-hash-table.eps,
5247      Overlapping-alignment.eps, SIMD-oligomers.eps: Revised figures
5248
5249    * algorithm.tex: Expanded caption
5250
52512015-04-29  matthejb
5252
5253    * discussion.tex: + adding content to discussion
5254
52552015-04-29  twu
5256
5257    * context.tex, algorithm.tex: Revisions
5258
5259    * algorithm.tex: Revisions to diagonalization
5260
5261    * toplevel.tex: Changed symbols for logical operations
5262
52632015-04-28  matthejb
5264
5265    * discussion.tex: + initial additions to discussion by MB
5266
52672015-04-28  twu
5268
5269    * algorithm.tex: Revisions to linear genome
5270
5271    * algorithm.tex: Moved material on large genomes from features.tex to here
5272
5273    * introduction.tex, features.tex: Revisions
5274
5275    * algorithm.tex: Moved section on ranking alignments and eliminating
5276      duplicates to features.tex
5277
52782015-04-27  twu
5279
5280    * discussion.tex: Added notes
5281
5282    * algorithm.tex: Changed table
5283
5284    * features.tex, introduction.tex: Revisions
5285
52862015-04-27  michafla
5287
5288    * biblio.bib, context.tex: first draft of gmapR writeup
5289
52902015-04-24  twu
5291
5292    * Hierarchical-GMAP.eps, algorithm.tex, features.tex, introduction.tex,
5293      toplevel.tex: Revisions
5294
5295    * Ambiguous-splicing.eps, DP-triangles.eps, Diagonalization.eps,
5296      Hierarchical-GMAP.eps, Large-hash-table.eps, Overlapping-alignment.eps,
5297      SIMD-oligomers.eps, Vertical-format.eps: Added figures
5298
52992015-04-23  twu
5300
5301    * papers, 2015-statgen, algorithm.tex, context.tex, discussion.tex,
5302      features.tex, introduction.tex, toplevel.tex: Added directory for editing
5303      papers
5304
53052015-04-07  twu
5306
5307    * splice.c: Fixed probability calculation for an ambiguous splice
5308
53092015-03-27  twu
5310
5311    * stage3hr.c: Allowing insertlength to be negative, up to -pairmax, to allow
5312      for overlaps.  For debugging messages involving insert length, using
5313      chromosomal coordinates.
5314
5315    * stage1hr.c: Added address of GMAP alignment to debugging messages
5316
5317    * chimera.c: Added information about querypos and homology to XT field for
5318      GMAP
5319
5320    * samprint.c: Removed old version of adjust_hardclips
5321
53222015-03-26  twu
5323
5324    * filestring.c: Turned off debugging output to stdout
5325
5326    * outbuffer.c, master.c, master.h, gsnap.c, filestring.c: Allow possibility
5327      in MPI for output to stdout
5328
5329    * mpidebug.h: Added tag for writing to stdout
5330
5331    * mpidebug.c: Handling debugging output for MPI_BOOL_T as an unsigned char
5332
5333    * gsnap.c: Allowing MPI with only a single thread per rank, by calling
5334      Master_parser as a detached thread
5335
5336    * sarray-read.c: Allowing memory mapping for indexij_access
5337
53382015-03-25  twu
5339
5340    * gmap.c, gsnap.c: Added USE_MPI checks around final MPI_Barrier
5341
5342    * VERSION: Updated version number
5343
5344    * trunk, configure.ac, index.html, src, access.c, access.h, atoiindex.c,
5345      cmetindex.c, genome.c, genome.h, get-genome.c, gmap.c, gmapindex.c,
5346      gsnap.c, iit-read-univ.c, iit-read.c, indexdb-write.c, indexdb.c,
5347      indexdb.h, indexdbdef.h, outbuffer.c, sarray-read.c, sarray-read.h,
5348      sarray-write.c, snpindex.c, uniqscan.c: Merged revisions 161768 through
5349      161939 from branches/2015-03-23-shmem to implement shared memory
5350
53512015-03-24  twu
5352
5353    * stage3hr.c: In test_hardclips, checking if low and high coordinates are
5354      equal
5355
5356    * stage3hr.c: Fixed comparison of chrpos in adjust_hardclips_right and
5357      adjust_hardclips_left
5358
53592015-03-23  twu
5360
5361    * stage3hr.c: In adjust_hardclips, advancing both low_querypos and
5362      high_querypos on either failure, to prevent infinite loop
5363
5364    * stage3hr.c: In adjust_hardclips, advancing either low_querypos or
5365      high_querypos if needed
5366
53672015-03-22  twu
5368
5369    * stage3hr.c: Doing a final test_hardclip when shift right and shift left
5370      are not possible
5371
5372    * substring.c: In alias_circular and unalias_circular, updating
5373      genomicstart_adj and genomicend_adj
5374
5375    * stage3hr.c: Changed endpoint test in Stage3end_substring_low
5376
5377    * substring.c: Removed debugging string
5378
5379    * substring.c, substring.h: Added fields genomicstart_adj and genomicend_adj
5380      for substring2 of insertions and deletions to handle computations with
5381      querypos to obtain a genomic position
5382
5383    * stage3hr.c: Using genomicstart_adj and genomicend_adj in insertions and
5384      deletions to handle computations with querypos to obtain a genomic position
5385
53862015-03-21  twu
5387
5388    * substring.c, substring.h: Substring_convert_to_pairs now takes
5389      genomicstart_indel_adj
5390
5391    * stage3hr.c: No longer changing left2, genomicstart2, and genomicend2 for
5392      substring2 of insertions and deletions.  Providing indel adjustments
5393      instead to Substring_convert_to_pairs.
5394
5395    * pair.c: Made Pairarray_contains_p routine look for any case of a gap or
5396      indel for a given querypos
5397
53982015-03-20  twu
5399
5400    * stage3hr.c: In adjust_hardclips, for dual GMAP, added the ability to shift
5401      low_querypos or high_querypos independently to make the low genomicpos and
5402      high genomicpos equal.
5403
5404    * stage3hr.c: In test_hardclips, for dual GMAP, checking that the
5405      coordinates match for the two ends
5406
5407    * stage3hr.c: On recomputing of hardclips near center, decrementing the
5408      higher value to make the clipping more even
5409
5410    * stage3hr.c: Fixed bug in defining left2 for deletion
5411
5412    * stage3hr.c: In find_ilengths, returning false instead of aborting
5413
5414    * VERSION: Updated version number
5415
5416    * list.c, list.h: Implemented List_pop_out
5417
5418    * substring.c: Fixed genomic coordinates to be 0-based when converting from
5419      substrings to pairs
5420
5421    * stage3hr.c: In test_hardclips, fixed bug with uninitialized values.  In
5422      adjust_hardclips, checking querypos, querypos-1, and querypos+1 again.
5423      Also, for dual GMAP, checking that genomepos matches for the given
5424      low_querypos and high_querypos, meaning that alignments are similar.
5425      Always doing a recompute of ilengths after adjust_hardclips. Implemented
5426      stripping of gaps and indels that occur between the two parts when doing a
5427      merge overlap.
5428
5429    * stage3hr.c: Subtracting 1 from alignstart or alignend in computing
5430      overlaps.  The find_ilengths function returns false if a common point is
5431      not found. Added a test_hardclips step and separate right and left shifts
5432      for adjust_hardclip.  Computing a separate genomicstart2 for substring2 of
5433      insertions and deletions.
5434
54352015-03-19  twu
5436
5437    * pair.c, pair.h: Implemented Pairarray_lookup
5438
5439    * stage3hr.c: Computing second hardclip from its ilength, not overlap.  In
5440      finding common point involving GMAP, skipping introns and indels.  Added
5441      code to check that merged overlap pieces are next to each other.
5442
54432015-03-18  twu
5444
5445    * stage3hr.c: Fixed bug in some of the initial loops of adjust_hardclips
5446
5447    * splice.c, stage1hr.c: Using only sensedir and not sensep in calling
5448      Substring_new_donor, acceptor, and shortexon
5449
54502015-03-17  twu
5451
5452    * stage3hr.c, substring.c, substring.h: Removed unused variables and
5453      parameters.  Using sensedir instead of sensep.
5454
5455    * samprint.c: Removed unused parameters and variables
5456
5457    * substring.c, substring.h: Making Substring_print_shortexon use sensedir
5458      instead of sensep. Removed unused parameters.
5459
5460    * stage3hr.c: Calling Substring_print_donor, acceptor, and shortexon
5461      procedures with sensedir instead of sensep
5462
5463    * pair.c, pair.h: Removed unused parameters
5464
5465    * VERSION: Updated version number
5466
5467    * output.c: Using new interface to SAM_compute_chrpos
5468
5469    * samprint.c, samprint.h: Corrected calculations in SAM_compute_chrpos
5470
5471    * stage3hr.c, stage3hr.h: Using substring_LtoH instead of substring_low and
5472      substring_high. Added initial shift in adjust_hardclips.  Fixed
5473      calculation of overlap to depend only on common_left and common_right.
5474
5475    * substring.c, substring.h: Changed Substring_chrstart and Substring_chrend
5476      to Substring_alignstart_chr and Substring_alignend_chr
5477
5478    * output.c, samprint.c, samprint.h: Did a reverse merge to undo revision
5479      160876 which used substring_hardclipped instead of substring_low
5480
54812015-03-12  twu
5482
5483    * VERSION: Updated version number
5484
5485    * output.c, samprint.c, samprint.h: Revised SAM_compute_chrpos to search for
5486      the hardclipped substring, rather than using substring_low
5487
5488    * stage3hr.c: Changed comment
5489
5490    * shortread.c: Initializing nextchar2 in various procedures
5491
5492    * gsnap.c: Fixed small memory leak
5493
54942015-03-11  twu
5495
5496    * stage3hr.c: Adjusting hardclips by checking adjacent positions left and
5497      right of the crossover querypos.
5498
5499    * substring.c: Removed comment
5500
5501    * stage3hr.c: Restored correct ilength calculations for minus strand
5502
55032015-03-06  twu
5504
5505    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
5506      Updated version number
5507
5508    * stage3hr.c: Added comparisons in hitpair_sort_cmp to fix issue where
5509      duplicate alignments were not being put together for removal
5510
5511    * oligoindex_hr.c: Implemented bit twiddling and SIMD-based method for
5512      computing reverse_nt
5513
55142015-03-03  twu
5515
5516    * stage3.c: Removed automatic trimming of ends less than 12 bp.  Fixed bug
5517      in assigning splice pair in end trimming procedures.
5518
5519    * ax_ext.m4: Performing run test for tzcnt_u32 and tzcnt_u64
5520
5521    * stage3hr.c: Made minor fixes in --clip-overlap feature, including fixes to
5522      gaps and overlaps, more even division of overlaps, and preference for
5523      clipping heads rather than tails in cases of ties
5524
5525    * stage3.c: Turning off branch that can lead to bad CIGAR strings
5526
5527    * inbuffer.c: Defining variable needed when MPI_FILE_INPUT is specified
5528
5529    * gsnap.c: Doing a chromosome_iit_setup before worker_setup
5530
5531    * genome128_hr.c: Using HAVE_TZCNT instead of HAVE_BMI1
5532
55332015-02-25  twu
5534
5535    * stage1hr.c, stage3hr.c, stage3hr.h: Printing an accession when reporting a
5536      CIGAR error
5537
5538    * inbuffer.c, inbuffer.h: Changed nspaces to be an unsigned int
5539
5540    * gsnap.c: Moved pthread_attr_init to places just before they are needed
5541
5542    * Makefile.gsnaptoo.am: Added master.c and master.h as extra files to be
5543      distributed
5544
5545    * master.c: Added pre-processor macros
5546
55472015-02-24  twu
5548
5549    * gsnap.c: Added pre-processor macro around inclusion of master.h
5550
5551    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst,
5552      Makefile.gsnaptoo.am, index.html, src, filestring.c, filestring.h,
5553      gsnap.c, inbuffer.c, inbuffer.h, master.c, master.h, mpidebug.c,
5554      mpidebug.h, util: Merged revisions 158119 through 159424 from
5555      branches/2015-02-05-mpi-workers-0 to allow for worker threads in rank 0
5556
55572015-02-12  twu
5558
5559    * gmap.c: Added debugging statements
5560
5561    * chimera.c, pair.c, pair.h: Providing Pair_pathscores with a
5562      pre_extension_slop parameter. Distinguishing between call to
5563      Pair_pathscores when finding non-extended paths to pair up, and when
5564      finding a breakpoint between the final, extended paths.
5565
5566    * outbuffer.c: Rearranged procedures for compilation to work
5567
5568    * pair.c: In Pair_print_sam, always doing a Pair_compute_cigar
5569
5570    * outbuffer.c: Printing SAM headers on empty files
5571
55722015-02-10  twu
5573
5574    * gmap.c: Allowing PMAP to have variables for gff3_separators_p
5575
5576    * gmap.c, gsnap.c, pair.c, pair.h, uniqscan.c: For gff3 output, always
5577      adding a separator line.  Added --gff3-add-separators flag to GMAP.
5578
5579    * stage1.c: In find_range, limiting number of results to 100 to avoid
5580      getting bogged down in repeats
5581
5582    * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: For gff3
5583      files without a gene name, always read $chr from line
5584
55852015-02-05  twu
5586
5587    * pair.c: For GMAP, always recomputing cigar_tokens, in case merging has
5588      affected them
5589
55902015-02-04  twu
5591
5592    * pair.c: Added slop in computing Pair_pathscores, to allow for better
5593      identification of translocations
5594
5595    * gmap.c: Improved debugging statements
5596
5597    * chimera.c: Changed type of some debugging statements
5598
55992015-02-02  twu
5600
5601    * trunk, VERSION, src, gmap.c, gsnap.c, pair.c, pair.h, samprint.c,
5602      stage3.c, stage3hr.c, stage3hr.h, uniqscan.c: Merged revisions 157793
5603      through 157918 from branches/2015-01-30-cigar-check to create and check
5604      cigar strings when Stage3_T or Stage3end_T objects are created
5605
56062015-01-30  twu
5607
5608    * stage1hr.c: Using new interface to Stage3_compute
5609
5610    * gmap.c: Using new interface to Stage3_compute and Stage3_new.  No longer
5611      calling Stage3_recompute_goodness.
5612
5613    * pair.c, pair.h: Implemented Pair_fracidentity_array, which returns goodness
5614
5615    * stage3.h, stage3.c: Changed Stage3_recompute_goodness to
5616      Stage3_compute_mapq.  Always recomputing matches and goodness when
5617      this->pairarray is assigned. Removed references to
5618      END_KNOWNSPLICING_SHORTCUT.
5619
56202015-01-29  twu
5621
5622    * stage3.c: In Stage3_cmp, using npairs and matches as secondary criteria
5623      beyond goodness
5624
5625    * gmap.c: Cleaned up unused variables and parameters.  Using new interface
5626      to Stage3_compute
5627
5628    * filestring.c: Added ability to handle %f
5629
5630    * stage3.c, stage3.h: Cleaned up unused variables and parameters
5631
5632    * stage1hr.c: Using new interface to Stage3_compute
5633
5634    * pair.c: Using false instead of 0
5635
56362015-01-28  twu
5637
5638    * gmap.c: Added call to Outbuffer_cleanup()
5639
5640    * outbuffer.c: Moved lock outside of loop to prevent a race condition
5641
5642    * inbuffer.c: Removed check of nextchar == EOF, which causes standard GSNAP
5643      and GMAP not to terminate
5644
56452015-01-27  twu
5646
5647    * shortread.c, shortread.h: Fixed some issues with variable names for MPI
5648      code
5649
5650    * outbuffer.c, outbuffer.h: Added Outbuffer_cleanup, which frees array of
5651      outputs
5652
5653    * inbuffer.c: Allowing for gzipped and bzip2-compressed files in MPI version
5654      by sending and receiving filecontents
5655
5656    * gsnap.c: Calling Outbuffer_cleanup
5657
5658    * gmap.c: Revealed variable needed for debugging
5659
5660    * filestring.c, filestring.h: Implemented Filestring_send and Filestring_recv
5661
5662    * compress.c: Fixed comment
5663
5664    * shortread.c: Made code consistent across text, gzip and bzip2.  Added
5665      hooks for filling a Filestring_T object in gzip and bzip2 procedures.
5666
56672015-01-26  twu
5668
5669    * index.html: Updated for 2014-12-17.v2
5670
5671    * shortread.c, shortread.h, mpidebug.c, mpidebug.h: Using workers_comm in
5672      MPI_fopen
5673
5674    * inbuffer.c, inbuffer.h: Passing workers_comm to Shortread_read_filecontents
5675
5676    * gsnap.c: Introduced a workers_comm so MPI_File_open and MPI_File_close can
5677      be restricted to that group
5678
5679    * shortread.c: Added debugging statements for opening and closing files
5680
5681    * gsnap.c: Added debugging statements for opening and closing files.  For
5682      MPI master using MPI_File input, explicitly closing those inputs.
5683
5684    * gsnap.c: Using new interfaces to Inbuffer_setup, Inbuffer_new, and
5685      Inbuffer_master_process.  Master rank 0 no longer calling Inbuffer_new.
5686
5687    * gmap.c: Using new interface to Inbuffer_new
5688
5689    * inbuffer.c: No longer making a special case in fill_buffer for MPI when
5690      nextchar at end of block is EOF.
5691
5692    * shortread.c, shortread.h: MPI procedures for reading from filecontents
5693      also close and open input files
5694
5695    * inbuffer.h, inbuffer.c: Moved nspaces into Inbuffer_T object and into
5696      Inbuffer_new instead of Inbuffer_setup.  Made Inbuffer_master_process
5697      independent of an Inbuffer_T object.
5698
56992015-01-23  twu
5700
5701    * inbuffer.c: Added comments
5702
5703    * gsnap.c: Created separate worker_setup and worker_cleanup procedures
5704
57052015-01-22  twu
5706
5707    * inbuffer.c: Assigning filecontents buffers to the IN category for memusage
5708
5709    * trunk, config.site.rescomp.prd, src, gsnap.c, inbuffer.c, inbuffer.h,
5710      shortread.c, shortread.h: Merged revisions 157242 to 157253 from
5711      branches/2015-01-22-mpi-file-block to have worker ranks read blocks into a
5712      buffer
5713
5714    * memchk.c, popcount.c, bitpack64-read.h, bitpack64-serial-read.h,
5715      compress.h, dynprog.h, except.h, genomicpos.h, iit-read.h,
5716      indexdb-write.h, indexdb.h, indexdbdef.h, popcount.h, sequence.h: Added
5717      include of config.h
5718
5719    * configure.ac: Changed variable name from USE_MPI_FILE to USE_MPI_FILE_INPUT
5720
5721    * samheader.h, iit-read-univ.h: Added include of <mpi.h>
5722
5723    * oligoindex_pmap.h, bigendian.h, fopen.h, iitdef.h, littleendian.h, mem.h,
5724      oligoindex.h, types.h: Added explanation of why config.h needs to be
5725      included
5726
5727    * gsnap.c, inbuffer.c, inbuffer.h, shortread.c, shortread.h: Checking for
5728      both USE_MPI and USE_MPI_FILE_INPUT in using MPI_File for input
5729
5730    * chrsubset.h, access.h, alphabet.h, backtranslation.h, block.h,
5731      boyer-moore.h, bp-read.h, bp-write.h, bytecoding.h, bzip2.h, chrom.h,
5732      chrsegment.h, datadir.h, diag.h, diagdef.h, diagpool.h, genome-write.h,
5733      genome128-write.h, genome_hr.h, genome_sites.h, genomepage.h, gregion.h,
5734      iit-write-univ.h, iit-write.h, indel.h, indexdb_hr.h, interval.h,
5735      intlist.h, intpool.h, intron.h, match.h, matchdef.h, matchpool.h,
5736      maxent128_hr.h, maxent_hr.h, oligo.h, oligop.h, pairdef.h, parserange.h,
5737      reader.h, stage1.h, tableuint8.h, tally.h, translation.h, univinterval.h:
5738      Added blank line for formatting
5739
5740    * filestring.h, mpidebug.h, oligoindex_hr.h, samprint.h, sortinfo.h: Removed
5741      include of config.h, since not necessary
5742
5743    * atoi.h, bitpack64-write.h, cmet.h: Added $Id$ string
5744
57452015-01-21  twu
5746
5747    * stage3hr.c: Turning on SOFT_CLIPS_AVOID_CIRCULARIZATION again to avoid
5748      duplicates in circular chromosomes
5749
5750    * ax_mpi.m4: Added cc to list of possible values for MPICC, for systems that
5751      use a wrapper called cc
5752
5753    * shortread.c: Fixed parsing issues for blank lines and ends of files
5754
5755    * configure.ac: Added configure flag --enable-mpi-file
5756
5757    * Makefile.gsnaptoo.am: Removed mpi_gmap for now
5758
5759    * gsnap.c, pair.c, pair.h: Added noprint option for --action-if-cigar-error
5760      and made it the default
5761
5762    * gsnap.c, inbuffer.c: Made -q or --part flag work for MPI code
5763
5764    * inbuffer.c: Added ending brace for MPI code
5765
57662015-01-20  twu
5767
5768    * shortread.c: Fixed bug in a print statement where a pointer was not being
5769      provided. In input_oneline, making a single read to get nextchar.
5770
5771    * inbuffer.c: Not doing fseek if nextchar is EOF
5772
5773    * gsnap.c: Removed a debugging statement
5774
5775    * filestring.c: Increased size of buffer
5776
5777    * outbuffer.c, outbuffer.h: Added parameter for output_file
5778
5779    * gmap.c: Using new interface to Outbuffer_setup and
5780      Outbuffer_print_filestrings
5781
5782    * samheader.c, samheader.h, iit-read-univ.c, iit-read-univ.h, gsnap.c,
5783      filestring.c, filestring.h: Applied changes from
5784      branches/2015-01-17-mpi-seq
5785
5786    * outbuffer.c, outbuffer.h: Applied changes from
5787      branches/2015-01-17-mpi-seq.  Removed code for Outbuffer_mpi_process.
5788
5789    * inbuffer.c: Removed requestid variable from fill_buffer for GMAP
5790
5791    * gmap.c: Put in dummy variables for Inbuffer_new
5792
5793    * trunk, config.site.rescomp.tst, src, filestring.c, filestring.h, gsnap.c,
5794      inbuffer.c, inbuffer.h, mpidebug.c, mpidebug.h, outbuffer.c, shortread.c,
5795      shortread.h: Merged revisions 156908 to 157083 from
5796      branches/2015-01-17-mpi-seq to change the input side of mpi_gsnap
5797
5798    * index.html: Updated for version 2014-12-17
5799
5800    * VERSION: Updated version number
5801
5802    * samprint.c: Consolidated print statements
5803
5804    * output.c: Defining abbrev for a nomapper
5805
5806    * diag.c: Added debugging statement
5807
58082015-01-15  twu
5809
5810    * gmap.c, pair.c, stage2.c, stage3.c: Merged revisions 156824 to 156843 from
5811      branches/2015-01-15-fix-chimeras to make better decisions for last exons
5812      having partial alignments
5813
5814    * oligoindex_hr.c: Allowing diagonals where ptr->i < querylength.  Reveals
5815      alignments that were otherwise missed.
5816
5817    * gmap.c: Fixed debugging statements to use Sequence_stdout instead of
5818      Sequence_print
5819
5820    * chimera.c, chimera.h, gmap.c: Fixed algorithm for finding non-exon-exon
5821      chimeric breakpoint and finding dinucleotides
5822
58232015-01-14  twu
5824
5825    * stage3hr.c: In anomalous_splice_p procedures, checking for samechr_splice
5826      hittypes
5827
5828    * stage1hr.c: Not applying GMAP to samechr_splice hittypes
5829
58302015-01-07  twu
5831
5832    * oligoindex_hr.c: Fixed type for positions_space field in Oligoindex_T
5833
5834    * trunk, src, oligoindex_hr.c, oligoindex_hr.h, oligoindex_old.c,
5835      oligoindex_old.h, stage2.c: Merged revisions 154793 through 156263 from
5836      branches/2014-12-06-stage2-larger-kmers to allow for 9-mers in stage 2
5837
5838    * config.site.rescomp.prd, config.site.rescomp.tst, VERSION: Updated version
5839      number
5840
5841    * index.html: Added changes for version 2014-12-16 (v2)
5842
5843    * substring.c: Fixed assertions to account for out-of-bounds regions
5844
5845    * README: Added explanation of XI field
5846
5847    * pair.c, samprint.c, shortread.c, shortread.h: Added code for XI field
5848
58492015-01-05  twu
5850
5851    * stage1hr.c: Using correct typecast of ambcoords to (Uint8list_T) NULL for
5852      large genomes
5853
5854    * stage2.c: Fixed uninitialized variable for firstactive
5855
58562014-12-16  twu
5857
5858    * gsnap.c, uniqscan.c: Using new interface to Stage3hr_setup
5859
5860    * stage3hr.c, stage3hr.h: Computing outofbounds_left and outofbounds_right.
5861      Using new interface to Substring_new.
5862
5863    * substring.c, substring.h: Added provision for outofbounds_left and
5864      outofbounds_right, to be considered part of trimming
5865
5866    * gsnap.c: Changed input sequence to open input streams to get one character
5867      and determine if it is FASTQ format, and then to do Shortread_setup, and
5868      then to fill the inbuffer.
5869
5870    * sarray-read.c: Fixed typo: spliceends_antisense => spliceends_sense
5871
5872    * substring.c: Removed debugging statement
5873
58742014-12-15  twu
5875
5876    * samheader.c: Not printing tabs if there are no headers
5877
5878    * sam_sort.c: Setting fileposition variable for each file
5879
5880    * filestring.c: Handling the case where filestring is NULL
5881
58822014-12-12  twu
5883
5884    * doublelist.c: Fixed type error in doublelist_to_array_out
5885
5886    * trunk, config.site.rescomp.prd, Makefile.gsnaptoo.am, src, gsnap.c,
5887      samprint.c, stage1hr.c, stage1hr.h, stage3hr.c, substring.c, substring.h,
5888      uniqscan.c: Merged revisions 154499 through 155289 from
5889      branches/2014-12-03-dna-chimeras
5890
5891    * VERSION, config.site.rescomp.prd, index.html: Updated version number
5892
5893    * sam_sort.c: Revised sam_sort to handle multiple input files
5894
5895    * trunk, Makefile.am, VERSION, bootstrap.gsnaptoo, ax_mpi.m4, config.site,
5896      config.site.rescomp.prd, config.site.rescomp.tst, configure.ac,
5897      memory-check.pl, mpi, src, Makefile.gsnaptoo.am, access.c,
5898      backtranslation.c, backtranslation.h, bool.h, chimera.c, chimera.h,
5899      filestring.c, filestring.h, genomicpos.c, genomicpos.h, get-genome.c,
5900      gmap.c, gsnap.c, iit-read-univ.c, iit-read-univ.h, iit-read.c, iit-read.h,
5901      inbuffer.c, inbuffer.h, md5.c, md5.h, mem.c, mem.h, mpidebug.c,
5902      mpidebug.h, outbuffer.c, outbuffer.h, output.c, output.h, pair.c, pair.h,
5903      request.c, request.h, resulthr.c, resulthr.h, revcomp.c, sam_sort.c,
5904      samflags.h, samheader.c, samheader.h, samprint.c, samprint.h,
5905      sarray-read.c, segmentpos.c, segmentpos.h, sequence.c, sequence.h,
5906      shortread.c, shortread.h, stage1hr.c, stage2.c, stage2.h, stage3.c,
5907      stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h, translation.c,
5908      translation.h, types.h, uniqscan.c: Merged revisions 154226 to 155279 from
5909      branches/2014-11-27-mpi to implement MPI versions and to use Filestring_T
5910      objects for all output
5911
5912    * genome.c, genome.h: Changed type of gbuffer from unsigned char to char
5913
59142014-12-10  twu
5915
5916    * oligoindex_hr.c: Added code for handling 9-mers
5917
59182014-12-06  twu
5919
5920    * stage1hr.c: Fixed typo in assigning probs_acceptor
5921
59222014-12-05  twu
5923
5924    * trunk, VERSION, config.site.rescomp.prd, src, doublelist.c, doublelist.h,
5925      gsnap.c, samprint.c, sarray-read.c, splice.c, stage1hr.c, stage1hr.h,
5926      stage3hr.c, stage3hr.h, uniqscan.c: Merged revisions 154673 through 154777
5927      from branches/2014-12-04-stage1-ambig to compute ambiguous splicing better
5928      in suffix array, stage1, and combining splices.  Fixed memory leak and
5929      changed criteria for comparing across hits
5930
59312014-12-04  twu
5932
5933    * samprint.c, stage3hr.c, stage3hr.h: Merged revisions 154673 through 154678
5934      from branches/2014-12-04-stage1-ambig to change XA field
5935
5936    * index.html: Updated for latest version
5937
5938    * configure.ac: Added more detailed messages about our own loading of
5939      config.site files to counteract the warning message from the standard
5940      autoconf loading
5941
59422014-12-03  twu
5943
5944    * uniqscan.c: Using new interface to Substring_setup
5945
5946    * gsnap.c: Replaced --terminal-output-minlength with --reject-trimlength
5947
5948    * stage1hr.c, stage1hr.h: Calling Sarray_search_greedy with nmisses_allowed
5949      being cutoff_level, and not querylength.  Using reject_trimlength instead
5950      of terminal_output_minlength.
5951
5952    * stage3hr.c, stage3hr.h: Replaced Stage3_filter_terminals with
5953      Stage3_reject_trimlengths
5954
5955    * substring.c, substring.h: Implemented new logic based on
5956      reject_trimlength.  True terminals from the GSNAP algorithm are allowed at
5957      this point (but taken care of now by Stage3end_reject_trimlengths).
5958
5959    * sarray-read.c: Improved debugging statements
5960
5961    * stage3hr.c: No longer trying to clip overlaps when the two ends are not in
5962      a concordant orientation
5963
5964    * outbuffer.c: Using new interface to SAM_print_nomapping
5965
5966    * samprint.c, samprint.h: Allowing for non-zero npaths to be printed in
5967      SAM_print_nomapping as an NH field, which can occur with the
5968      --quiet-if-excessive feature
5969
5970    * samread.c: Allowing for the possibility that XO is the first field in a
5971      SAM line
5972
5973    * stage3hr.c: Fixed problem with --merge-distant-samechr feature giving the
5974      wrong chrpos on SAM output on distant splices, since this was being
5975      treated the same as a translocation (chrnum == 0)
5976
5977    * samread.c: Terminating parse_XO procedures for either '\0' or '\n'
5978
59792014-12-02  twu
5980
5981    * gmap.c, gsnap.c: Including default variables in --help statement
5982
5983    * stage3hr.c: Calculating common_shift to get more even splits between the
5984      two paired ends, by accounting for the common shared point between
5985      common_right and common_left.
5986
5987    * sarray-read.c: Fixed typo in a for loop
5988
5989    * sam_sort.c: Made --no-sam-headers option work correctly
5990
5991    * sam_sort.c, samheader.c, samheader.h: For --split-output function, writing
5992      SAM headers to each output file
5993
59942014-11-27  twu
5995
5996    * archive.html, index.html: Updated for latest version
5997
59982014-11-25  twu
5999
6000    * README: Added comment about sam_sort and --split-output
6001
6002    * sam_sort.c, samflags.h, samread.c, samread.h: Added --split-output and
6003      --append-output options
6004
6005    * outbuffer.c: Changed abbrev NM in comment
6006
6007    * stage1hr.c: Changed calculation of amb_nmatches to amb_length
6008
6009    * stage3hr.c: Swapping ilength_low and ilength_high for GMAP when alignment
6010      is minus
6011
60122014-11-24  twu
6013
6014    * stage1hr.c: Turning off debugging
6015
6016    * bootstrap.gsnaptoo: Running automake to add missing files
6017
6018    * trunk, util, src, outbuffer.c, pair.c, pair.h, samprint.c, samprint.h,
6019      stage3hr.c, stage3hr.h, substring.c, substring.h: Merged revisions 153682
6020      to 154020 from branches/2014-11-20-redo-overlap to compute overlap better
6021      using ilength53 and ilength35 and a common shift
6022
6023    * sarray-read.c: Merged revisions 153682 to 154020 to handle ambiguous
6024      splicing better
6025
6026    * trunk, INSTALL, VERSION, config.site.rescomp.prd, index.html: No longer
6027      keeping track of INSTALL
6028
6029    * config.guess, config.sub, ltmain.sh: No longer keeping track of
6030      config.guess, config.sub, or ltmain.sh
6031
6032    * gmap_build.pl.in: Added comment about meaning of -D flag
6033
6034    * acinclude.m4, configure.ac: Adding check for MPI
6035
6036    * ax_mpi.m4: Added code for MPI
6037
6038    * src, access.c, bitpack64-read.c, bitpack64-readtwo.c, bitpack64-write.c,
6039      compress-write.c, genome-write.c, genome.c, genome128-write.c,
6040      genome128.c, genome_sites.c, get-genome.c, gmapindex.c, iit-read-univ.c,
6041      iit-read.c, iit_get.c, iit_store.c, indel.c, indexdb-write.c, indexdb.c,
6042      indexdb_hr.c, mem.c, oligoindex_hr.c, sam_sort.c, samheader.c, snpindex.c,
6043      stage1hr.c, stage3.c, table.c, tableuint8.c, uniqscan.c, univinterval.c:
6044      Merged revisions 153114 to 153944 from branches/2014-11-12-make-check-i386
6045      to make tests work on i386 computers
6046
6047    * stage3hr.c: Not using ambiguous splices to update found_score
6048
6049    * stage2.c: Removed adjacentp as an unused variable
6050
6051    * samprint.c, samprint.h: For circular alignments, checking for sole HS
6052      pattern.  Also checking for chrpos > chrlength, and subtracting chrlength
6053      if necessary.
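
      The chrpos adjustment described above can be sketched as follows.  This is
      a minimal illustration, not the code in samprint.c; the function name and
      the choice of unsigned int are assumptions made for the example.

```c
#include <assert.h>

/* Maps a 1-based position that ran past the end of a circular
   chromosome back into the range 1..chrlength.
   The name wrap_circular_chrpos is hypothetical. */
unsigned int wrap_circular_chrpos(unsigned int chrpos, unsigned int chrlength) {
    assert(chrlength > 0);
    if (chrpos > chrlength) {
        chrpos -= chrlength;   /* single subtraction, as the entry describes */
    }
    return chrpos;
}
```

      For a chromosome of length 16569, a computed position of 16600 would be
      reported as 31, while positions within bounds are left unchanged.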
6054
6055    * pair.c, pair.h: Added Cigar_action_T.  Added Pair_check_cigar.  Removed
6056      prev as an unused variable.
6057
6058    * iit-write-univ.c: Handling the case if total_nintervals is 0
6059
6060    * gmap.c, gsnap.c: Added --action-if-cigar-error
6061
60622014-11-17  twu
6063
6064    * bytecoding.c: Removed unused variable
6065
6066    * gmapindex.c: Printing genome length to stderr
6067
6068    * bytecoding.c: Using a buffer of 10,000,000 block-sizes, and writing
6069      iteratively, rather than a single buffer and single write.
6070
60712014-11-13  twu
6072
6073    * gmapindex.c: Made some changes in casting.  Fixed printf format to use
6074      %llu.
6075
6076    * stage3hr.c, stage3hr.h: Renamed amb_nmatches to amb_length.
6077
6078    * samprint.c: In adjust_hardclips, not changing hardclips if shift downward
6079      fails. Renamed amb_nmatches to amb_length.
6080
6081    * splice.c, stage1hr.c: Renamed amb_nmatches to amb_length.  Providing
6082      Substring_match_length_orig to amb_length in Stage3end_new_splice and
6083      Stage3end_new_shortexon
6084
60852014-10-31  twu
6086
6087    * iit-read-univ.c, iit-read.c: Using %llu and casting to (long long int) for
6088      printing offset and filesize
6089
6090    * gmap.c, gsnap.c: Using %zu for printing results of sizeof().
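
      As a short illustration of the %zu change: sizeof() yields a size_t, and
      C99 provides the z length modifier for that type.  The helper below is a
      hypothetical sketch, not code from gmap.c or gsnap.c.

```c
#include <assert.h>
#include <stdio.h>

/* Formats a size_t value with %zu (C99) into buf.
   Returns the value returned by snprintf.
   The name format_size is hypothetical. */
int format_size(char *buf, size_t buflen, size_t value) {
    assert(buf != NULL);
    return snprintf(buf, buflen, "%zu", value);
}
```

      Using %zu avoids guessing whether size_t matches unsigned int or unsigned
      long on a given platform.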
6091
60922014-10-29  twu
6093
6094    * stage3hr.c: Restoring revision of SAM insertlength for ends involving GMAP
6095      when method is successful
6096
6097    * stage3hr.c: Fixed SAM output of insert length of 0 when no overlap is
6098      found in a GMAP alignment
6099
6100    * stage3.c: Added debugging statements
6101
6102    * gmap.c, gmapindex.c, gsnap.c: Added output statement at end of checking
6103      compiler assumptions
6104
6105    * README: Added comment about change from PG: to XG:
6106
6107    * ax_ext.m4, configure.ac: Added option to enable or disable sse4.2
6108
6109    * samprint.c: Fixed typo in adjust_hardclips.  Also, when querypos increase
6110      fails, trying querypos decrease.
6111
6112    * samprint.c: Fixed infinite loop in adjust_hardclips
6113
61142014-10-28  twu
6115
6116    * stage3hr.c: Fixed bug with uninitialized variables
6117
6118    * outbuffer.c: Fixing potential data race as noted by valgrind for
6119      this->ntotal between input and output threads, although not problematic
6120      before, because this->ntotal increases monotonically
6121
61222014-10-27  twu
6123
6124    * stage3hr.c: Fixed computation of overlap between GMAP and non-GMAP
6125      alignments
6126
61272014-10-22  twu
6128
6129    * gregion.c: Checking size before deciding to use alloca or malloc.
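
      The pattern named in this entry can be sketched as below: small scratch
      buffers go on the stack and large ones on the heap.  The threshold, the
      function name, and the use of <alloca.h> (a glibc header) are assumptions
      for the example and are not taken from gregion.c.

```c
#include <alloca.h>   /* alloca; assumes a glibc-style platform */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical threshold: requests larger than this go to the heap. */
#define MAX_STACK_BYTES 4096

/* Sums 0..n-1 through a scratch array, choosing stack or heap by size. */
long long sum_with_scratch(size_t n) {
    size_t nbytes = n * sizeof(long long);
    int on_heap = (nbytes > MAX_STACK_BYTES);
    long long *scratch;
    long long total = 0;
    size_t i;

    if (n == 0) {
        return 0;
    }
    if (on_heap) {
        scratch = (long long *) malloc(nbytes);
        assert(scratch != NULL);
    } else {
        scratch = (long long *) alloca(nbytes);
    }
    for (i = 0; i < n; i++) {
        scratch[i] = (long long) i;
    }
    for (i = 0; i < n; i++) {
        total += scratch[i];
    }
    if (on_heap) {
        free(scratch);   /* alloca memory is released when the function returns */
    }
    return total;
}
```

      The size check matters because alloca cannot report failure; an oversized
      request simply overruns the stack.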
6130
61312014-10-16  twu
6132
6133    * stage2.c: Fixed an uninitialized variable in grand_fwd and grand_rev
6134      procedures, plus the checks on maxintronlen in computing
6135      grand_fwd_lookforward and grand_rev_lookforward.
6136
6137    * shortread.c: Allowing queryseq1 to be equal to SKIPPED.  Removed unused
6138      parameter acc from input_oneline routines.
6139
6140    * VERSION, index.html: Updated version number
6141
6142    * stage1hr.c: Added debugging statement
6143
6144    * sarray-read.c: Don't limit filling of best elt based on nmatches being
6145      more than half of the read length
6146
6147    * stage3hr.c: Restored previous behavior where soft clips avoid
6148      circularization
6149
6150    * indexdb-write.c, sarray-write.c: Removed unnecessary includes of popcount.h
6151
6152    * bitpack64-write.c, genome128_hr.c: In lookups of clz_table, removing the
6153      intermediate variable "top".
6154
61552014-10-15  twu
6156
6157    * stage3hr.c: Not allowing soft-clipping at ends to avoid circularization.
6158      Added pre-processor macro SOFT_CLIPS_AVOID_CIRCULARIZATION to preserve
6159      previous code.
6160
6161    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
6162      Updated version number
6163
6164    * outbuffer.c: Added gff3 header for GMAP gff3 output to stdout.  Added HD
6165      and PG headers for GMAP sam output to --split-output files.
6166
61672014-10-14  twu
6168
6169    * Makefile.gsnaptoo.am: Changed variable name to SIMD_CFLAGS
6170
6171    * configure.ac: Changed variable name to SIMD_CFLAGS.  Setting -mpopcnt
6172      based only on acx_mpopcnt_ok, and not on individual builtin functions.
6173
6174    * builtin-popcount.m4: Setting CFLAGS instead of LIBS.  Checking for builtin
6175      functions regardless of whether -mpopcnt works.
6176
6177    * ax_ext.m4: Setting CFLAGS instead of LIBS.  Changed variable name to
6178      SIMD_CFLAGS
6179
6180    * builtin-popcount.m4, configure.ac: Changed macro name to
6181      ACX_BUILTIN_POPCOUNT
6182
6183    * acinclude.m4, builtin-popcount.m4, builtin.m4: Renamed program to
6184      builtin-popcount.m4
6185
6186    * popcnt.m4: Added comment
6187
6188    * builtin.m4: Added check from popcnt.m4
6189
61902014-10-13  twu
6191
6192    * shortread.c: Made input procedures robust to incomplete entries
6193
6194    * indexdb.c: Fixed bug where munmap was called twice on positions_high for
6195      GSNAPL and GMAPL.
6196
61972014-10-09  twu
6198
6199    * VERSION: Updated version number
6200
6201    * ax_ext.m4: Seeing if avx and avx2 are enabled
6202
6203    * configure.ac: Added ability to disable avx and avx2
6204
6205    * gmap_build.pl.in: Calling gmapindex initially to check for compiler
6206      assumptions
6207
6208    * gmapindex.c: Added check for compiler assumptions
6209
6210    * gmap.c, gsnap.c: Added description of --check option to --help message
6211
6212    * gmap.c, gsnap.c: Improved check for compiler assumptions, and added
6213      --check option
6214
6215    * samprint.c: Fixed flag for merged overlap to not have PAIRED_READ set.
6216      When clipdir == 0, not calling adjust_hardclips.
6217
62182014-10-07  twu
6219
6220    * sarray-read.c: Fixed bug when setting array_stop and finalptr is less than
6221      4
6222
62232014-10-01  twu
6224
6225    * samprint.c: Further fixes to mapping quality for merged alignments
6226
6227    * samprint.c: For merged alignments, printing mapping quality of 40
6228
6229    * pair.c, samprint.c: Putting XB and XP tags after XH
6230
6231    * VERSION: Updated version number
6232
6233    * samprint.c: Calling GMAP procedure for querystart and queryend when
6234      necessary in adjust_hardclips
6235
6236    * stage3.c: Initializing variables
6237
6238    * pair.c: Passing correct values for hardclip_low and hardclip_high to
6239      hardclip_pairs
6240
6241    * samprint.c: Removed static initialization of hide_soft_clips_p
6242
6243    * trunk, src, outbuffer.c, pair.c, pair.h, samprint.c, samprint.h: Merged
6244      revisions 149547 through 149570 from branches/2014-10-01-hardclip-adj to
6245      improve adjustment of hardclipping
6246
6247    * stage3.c: Removed diagnosticp from some procedures
6248
62492014-09-30  twu
6250
6251    * sam_sort.c: Providing timing information to user
6252
6253    * sam_sort.c, samread.c, samread.h: Changed algorithm to parse just for
6254      linelengths initially, which allows for the SAM file to be read using
6255      buffers
6256
62572014-09-29  twu
6258
6259    * VERSION, index.html: Updated version number
6260
6261    * sam_sort.c: Fixed usage statement
6262
6263    * samread.c, samread.h: Commented out unused procedures
6264
6265    * samread.c: Added header file
6266
6267    * samheader.c, samheader.h, chimera.c, chimera.h, get-genome.c, gmap.c,
6268      gsnap.c, iit-read-univ.c, iit-read-univ.h, outbuffer.c, outbuffer.h,
6269      pair.c, pair.h, samprint.c, samprint.h, shortread.c, shortread.h,
6270      stage3.c, stage3.h, uniqscan.c: Removing computation of .sortinfo file
6271
6272    * Makefile.gsnaptoo.am: Removed sortinfo.c and sortinfo.h
6273
6274    * sam_sort.c, samflags.h, samheader.c, samheader.h, samread.c, samread.h:
6275      Made sam_sort independent of a .sortinfo file.  Computes sortinfo
6276      information directly from the SAM input file
6277
62782014-09-26  twu
6279
6280    * outbuffer.c: Providing sortinfo to Pair_print_sam_nomapping
6281
6282    * VERSION, index.html: Updated version number
6283
6284    * trunk, index.html, src, outbuffer.c, pair.c, pair.h, sam_sort.c,
6285      samflags.h, samprint.c, samprint.h, samread.c, samread.h, sortinfo.c,
6286      sortinfo.h, stage3.c: Merged revisions 148987 through 149172 from
6287      branches/2014-09-25-sam-sort to yield a working version of sortinfo feature
6288
6289    * stage3hr.c: Reverting to previous version 149160 in trunk
6290
6291    * stage3hr.c: Rewrote Stage3pair_remove_overlap procedures to use
6292      hitpair_overlap_score_cmp, hitpair_overlap_test,
6293      hitpair_equiv_preference_cmp, and hitpair_equiv_tst
6294
62952014-09-25  twu
6296
6297    * samprint.c: Not merging paired ends if there is no overlap
6298
62992014-09-24  twu
6300
6301    * pair.c: Enabled call to Sortinfo_update to work for GMAP
6302
6303    * VERSION, index.html: Updated version number
6304
6305    * chimera.c, chimera.h, pair.c, samprint.c: Using @ instead of : for
6306      coordinates for XT field.  Made GMAP XT field same as that for GSNAP.
6307
6308    * config.site.rescomp.tst: Allowing builtin_popcount
6309
6310    * README: Added description of XH field
6311
6312    * Makefile.gsnaptoo.am: Added samheader.c to sam_sort
6313
6314    * sam_sort.c: Handling single-end alignments
6315
6316    * samread.c: Setting terminating character at end of computation
6317
6318    * samheader.c, samheader.h: Implemented change of HD header to SO:sorted
6319
6320    * sam_sort.c, samread.c, samread.h: Handling hard-clipped alignments
6321
6322    * pair.c, pair.h, samprint.c, shortread.c, shortread.h: Added XH field to
6323      provide hard-clipped sequence
6324
6325    * substring.c: For --merge-overlap feature, in deciding whether to add
6326      insertion, deletion, or intron between the pieces, using <= instead of <
6327      to decide.
6328
6329    * samprint.c: In SAM_compute_chrpos, always using substring_low to compute
6330      chrpos
6331
6332    * indel.c: Initializing variables
6333
6334    * sam_sort.c: Implemented --dups-only and --uniq-only
6335
6336    * sortinfo.c: Fixed typo in comment
6337
6338    * pair.c, pair.h, samprint.c, sortinfo.c, sortinfo.h: Sortinfo_update uses
6339      sign on chrnum to indicate whether a read is the low or high end
6340
6341    * iit-read-univ.c: Using safer computation of an average
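
      The entry does not say which computation was adopted.  One common way to
      make an average of two unsigned values safe is to avoid forming their sum,
      which can wrap around.  The sketch below shows that approach under that
      assumption; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Midpoint of low and high without computing low + high,
   which could exceed the range of uint32_t. */
uint32_t safe_midpoint(uint32_t low, uint32_t high) {
    assert(low <= high);
    return low + (high - low) / 2;
}
```

      With low = 4000000000 and high = 4200000000, the sum would not fit in 32
      bits, but the difference-based form stays in range.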
6342
6343    * Makefile.gsnaptoo.am, sam_sort.c, samread.c, samread.h: Implemented
6344      --mark-dups function into sam_sort.  Works except for hard-clipping.
6345
6346    * samread.c, samread.h: Brought over copy from GSTRUCT
6347
6348    * sam_sort.c: Introduced readindex, needed for marking duplicates
6349
63502014-09-23  twu
6351
6352    * Makefile.gsnaptoo.am, iit-read-univ.c, iit-read-univ.h, sam_sort.c,
6353      sortinfo.c: Using genome chromosome_iit file in sam_sort, instead of
6354      storing information in .sortinfo files
6355
6356    * VERSION, index.html: Updated version number
6357
6358    * trunk, src, Makefile.gsnaptoo.am, chimera.c, chimera.h, chrnum.h,
6359      get-genome.c, gmap.c, gsnap.c, iit-read-univ.c, iit-read-univ.h,
6360      outbuffer.c, outbuffer.h, pair.c, pair.h, sam_sort.c, samheader.c,
6361      samheader.h, samprint.c, samprint.h, shortread.c, shortread.h, sortinfo.c,
6362      sortinfo.h, stage1hr.h, stage3.c, stage3.h, types.h, uint8list.h,
6363      uintlist.h, uniqscan.c: Merged revisions 148659 through 148720 from
6364      branches/2014-09-23-sam-sort to add --make-sortinfo option and sam_sort
6365      program
6366
6367    * splice.c: Fixed bug in reading Stage3end_T object that was already freed
6368
63692014-09-22  twu
6370
6371    * configure.ac: Loading default ./config.site file only if it exists
6372
6373    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, configure.ac,
6374      index.html: Handling cases differently if CONFIG_SITE file begins with ./.
6375      Allowing for multiple files in CONFIG_SITE.
6376
6377    * Makefile.gsnaptoo.am: Interchanged order of POPCNT_CFLAGS and SIMD_FLAGS
6378
6379    * samprint.c: Fixed bugs introduced in adding --merge-overlap feature
6380
6381    * INSTALL, config.guess, config.sub, ltmain.sh: Updated to more recent
6382      version of autoconf
6383
6384    * bootstrap.gsnaptoo: Changed /usr/bin/touch to touch
6385
6386    * configure.ac: Testing first for ./ in front of CONFIG_SITE.  Added
6387      AC_SUBST of POPCNT_CFLAGS
6388
6389    * shortread.c, shortread.h: Removed unused procedures
6390
6391    * stage1hr.c: Changed nmisses_allowed for sarray method to be querylength,
6392      for both --use-sarray=2 and normal method
6393
63942014-09-21  twu
6395
6396    * pair.c: Changed PG field to XG
6397
6398    * configure.ac: Change flag to --disable-builtin-popcount
6399
6400    * splice.c: In procedures that group for ambiguous hits, separating sense
6401      and antisense segments
6402
6403    * substring.c, stage3hr.c, stage1hr.c: Improved debugging statements
6404
64052014-09-19  twu
6406
6407    * stage3.c: Using new interface to Pair_print_sam
6408
6409    * ax_ext.m4, builtin.m4: Restoring value of LIBS at end of procedure
6410
6411    * builtin.m4: Setting LIBS to -mpopcnt before running tests
6412
6413    * gsnap.c: Added comment that --merge-overlap is a beta implementation
6414
6415    * trunk, VERSION, config.site.rescomp.tst, index.html, src, gsnap.c, list.c,
6416      list.h, outbuffer.c, outbuffer.h, pair.c, pair.h, samprint.c, samprint.h,
6417      shortread.c, shortread.h, stage3hr.c, stage3hr.h, substring.c,
6418      substring.h: Merged revisions 148190 through 1418357 from
6419      branches/2014-10-18-merge-overlap to add --merge-overlap feature
6420
6421    * ax_ext.m4: Changed name of cpu features to sse4.1 and sse4.2
6422
6423    * configure.ac: Showing pthread and popcnt flags at end
6424
6425    * README: Explaining XG flag
6426
6427    * samprint.c: Changed PG flag to XG flag
6428
6429    * samheader.c: Commenting out code for extra @PG lines
6430
6431    * configure.ac: Setting POPCNT_CFLAGS based on result from builtin.m4
6432
6433    * builtin.m4: Setting a variable ax_cv_compile_builtin_ext
6434
6435    * ax_ext.m4: Setting LIBS properly before running AC_LINK_IFELSE
6436
64372014-09-18  twu
6438
6439    * genome128_hr.c: Using Harley's method to reduce number of popcount
6440      operations for SSE2
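
      Harley's method rests on a carry-save adder (CSA) step, which combines
      three words into a "sum" word and a "carry" word so that two popcounts do
      the work of three.  The scalar sketch below shows only that step on 32-bit
      words; it is not the SSE2 code in genome128_hr.c, and the function names
      are hypothetical.

```c
#include <stdint.h>

/* Portable bit count for one 32-bit word. */
static int popcount32(uint32_t x) {
    int n = 0;
    while (x) {
        x &= x - 1;   /* clears the lowest set bit */
        n++;
    }
    return n;
}

/* Counts set bits in three words using two popcounts instead of three. */
int popcount3_csa(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t u = a ^ b;
    uint32_t high = (a & b) | (u & c);   /* carry bits, each worth 2 */
    uint32_t low = u ^ c;                /* sum bits, each worth 1 */
    return 2 * popcount32(high) + popcount32(low);
}
```

      Applied repeatedly over a long array, the same step lets several input
      words share each popcount operation.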
6441
6442    * genome128_hr.c: Fixed HAVE_MM_POPCNT alternative for SSE2-based popcount
6443
6444    * archive.html, index.html: Updated for version 2014-09-18
6445
64462014-09-17  twu
6447
6448    * VERSION: Updated version number
6449
6450    * genome128_hr.c: Added macro for debug4
6451
6452    * genome128_hr.c: Using _mm_extract_epi16 for SSE2 code instead of casting
6453      to (UINT4 *), because casting leads some compilers to generate wrong
6454      ordering of statements
6455
64562014-09-16  twu
6457
6458    * ax_ext.m4, configure.ac: Providing information to user about SIMD cpu
6459      features available and compiler flags to be used
6460
6461    * gmap.c, gsnap.c: Added extra information to --version about type of popcnt
6462      supported
6463
6464    * ax_ext.m4: Changed ordering of if-then clauses.  Added variable to indicate
6465      compiler or linker problems.
6466
6467    * configure.ac: Added warnings for compiler or linker problems for SIMD
6468      extensions
6469
64702014-09-15  twu
6471
6472    * configure.ac: Changed flag name from --enable-popcnt to
6473      --enable-builtin-popcnt
6474
6475    * genome128_hr.c, sarray-read.c: Using mm_popcnt when _popcnt not available
6476
6477    * ax_ext.m4: Revamped tests to check for CPU, compile, and linking, in that
6478      order.  Renamed variables more systematically.
6479
6480    * pair.c, pair.h, pairpool.c, pairpool.h, stage3.c, stage3.h: Moved
6481      Pairpool_clip_bounded to pair.c, and created Pair_clip_bounded_pairs for
6482      computing chimeras by the working thread and Pair_clip_bounded_array for
6483      truncation by the output thread.  This enables the --truncate flag to work
6484      again for GMAP.
6485
6486    * gmap.c: Improved error message when invalid argument is given to -f
6487
6488    * stage3.c: Made cdna direction choices based on splice site scores more
6489      stringent, so both donor and acceptor sites have to be significantly
6490      better.
6491
64922014-09-12  twu
6493
6494    * gmap.c, outbuffer.c, pair.c, pair.h, stage3hr.c, stage3hr.h, substring.c,
6495      substring.h: For BLAST m8 output, adding endings to accessions for
6496      paired-end reads
6497
6498    * gmap.c, gsnap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, stage3hr.c,
6499      stage3hr.h, substring.c, substring.h, uniqscan.c: Added implementation of
6500      BLAST m8 output format
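
      Note: a minimal sketch of one BLAST m8 (tabular) line, which has 12
      tab-separated columns: query, subject, percent identity, alignment
      length, mismatches, gap openings, query start, query end, subject start,
      subject end, e-value, and bit score. The function name, the "/1" and
      "/2" endings, and the numeric formats are assumptions for illustration,
      not the code in pair.c or substring.c.

```c
#include <stdio.h>

/* Formats one alignment as a 12-column, tab-separated m8 line.
   "suffix" is the hypothetical paired-end ending ("/1", "/2", or "").
   Returns the value of snprintf (length of the full line). */
int format_m8_line(char *buf, size_t size, const char *accession,
                   const char *suffix, const char *subject,
                   double pct_identity, int align_length, int nmismatches,
                   int ngapopens, int qstart, int qend, long sstart,
                   long send, double evalue, double bitscore) {
  return snprintf(buf, size,
                  "%s%s\t%s\t%.2f\t%d\t%d\t%d\t%d\t%d\t%ld\t%ld\t%.2g\t%.1f",
                  accession, suffix, subject, pct_identity, align_length,
                  nmismatches, ngapopens, qstart, qend, sstart, send,
                  evalue, bitscore);
}
```

      Appending the ending to the accession keeps the two ends of a pair
      distinguishable in the first column.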
6501
65022014-09-10  twu
6503
6504    * VERSION: Updated version number
6505
6506    * ax_ext.m4: Fixed typo for handling AVX
6507
65082014-09-08  twu
6509
6510    * uniqscan.c: Using new interface to Stage1hr_setup
6511
6512    * stage1hr.c, stage1hr.h, gsnap.c: Added option --use-sarray=2 to use only
6513      suffix array algorithm
6514
6515    * stage3hr.c: Stopped using alloca for hitlists, since they can cause stack
6516      overflow.  Made loops more efficient for pair_up_concordant_aux.
6517
6518    * gmap.c: Stopping memory error when a chimera is found, --npaths is set to
6519      1, and one part of the chimera fails the conditions for --min-identity or
6520      --min-trimmed-coverage
6521
65222014-09-04  twu
6523
6524    * sarray-read.c: Implemented faster SIMD algorithm for
6525      Elt_fill_positions_filtered
6526
6527    * sarray-read.c: Implemented Elt_fill_positions_filtered using alloca and
6528      copying from stack, instead of guessing allocation
6529
65302014-09-03  twu
6531
6532    * VERSION, config.site.rescomp.prd: Updated version number
6533
6534    * stage3hr.c: In pair_remove_bad_superstretches, keeping track of better and
6535      worse children separately, and handling list order correctly.  Now chooses
6536      shorter insert lengths correctly.  Added OUTERLENGTH_SLOP.
6537
65382014-09-02  twu
6539
6540    * stage3hr.c: In Stage3end_optimal_score_aux and
6541      Stage3pair_optimal_score_aux, counting indels only if they are within
6542      trim_left and trim_right
6543
6544    * gsnap.c, stage3hr.c, stage3hr.h, uniqscan.c: Added option
6545      --order-among-best to GSNAP to control randomization among best alignments
6546
6547    * stage3hr.c: When SCORE_INDELS is true for comparing alignments, not
6548      counting indel_penalty in new->penalties to avoid double-counting
6549
6550    * stage2.c, stage3hr.c, stage3.c: Turned off debugging
6551
6552    * stage1hr.c: Calling GMAP pairsearch if indels5 or indels3 is not NULL, as
6553      well as if found_score is too high
6554
6555    * trunk, src: Merged revisions 145502 to 146146 from
6556      branches/2014-08-19-stack-alloca and 146146 to 146618 from
6557      branches/2014-08-27-parallel-stage2
6558
6559    * oligoindex_hr.c, diag.c: Merging revisions 145502 to 146146 from
6560      branches/2014-08-19-stack-alloca to ignore check on querylength and to use
6561      alloca for GSNAP
6562
6563    * stage2.c, stage2.h: Merging revisions 146146 to 146618 from
6564      branches/2014-08-27-parallel-stage2 to make stage 2 computation faster
6565
6566    * list.c, list.h, spanningelt.c, spanningelt.h, stage1hr.c, stage1hr.h:
6567      Merging revisions 145502 to 146146 from branches/2014-08-19-stack-alloca
6568      to work directly on arrays of Spanningelt_T objects
6569
6570    * dynprog_simd.c, dynprog_single.c: Merging revisions 146146 to 146618 from
6571      branches/2014-08-27-parallel-stage2 to fix debugging procedures and to use
6572      a stricter check for using 8-bit SIMD
6573
6574    * gmap.c, gsnap.c, stage3.c, stage3.h: Merging revisions 145502 to 146146
6575      from branches/2014-08-19-stack-alloca to use stage2_alloc only for GMAP
6576      initial stage 2 computation
6577
6578    * stage1hr.c: Fixed debugging statements
6579
65802014-08-25  twu
6581
6582    * trunk, configure.ac: Merged revisions 145989 through 145990 from
6583      branches/2014-08-19-stack-alloca to add flag to configure.ac
6584
6585    * trunk, VERSION, index.html, src, boyer-moore.c, boyer-moore.h, chimera.c,
6586      chop_primers.c, diag.c, doublelist.c, doublelist.h, dynprog.c,
6587      dynprog_cdna.c, dynprog_end.c, dynprog_genome.c, dynprog_simd.c,
6588      dynprog_single.c, genome.c, genome.h, genome128_hr.c, genome_sites.c,
6589      genomicpos.h, gmap.c, gregion.c, gregion.h, gsnap.c, indel.c, intlist.c,
6590      intlist.h, list.c, list.h, mapq.c, mem.c, mem.h, oligo.c, oligoindex_hr.c,
6591      outbuffer.c, pair.c, pair.h, pairpool.c, parserange.c, samprint.c,
6592      sarray-read.c, shortread.c, shortread.h, smooth.c, splice.c, splicetrie.c,
6593      stage1.c, stage1hr.c, stage1hr.h, stage2.c, stage2.h, stage3.c, stage3.h,
6594      stage3hr.c, substring.c, uint8list.c, uint8list.h, uintlist.c, uintlist.h,
6595      uinttable.c, uinttable.h, gvf_iit.pl.in: Merged revisions 145503 through
6596      145988 from branches/2014-08-19-stack-alloca to use alloca
6597
65982014-08-20  twu
6599
6600    * uniqscan.c: Using new interface to Shortread_new
6601
6602    * shortread.c, shortread.h: Made Shortread_new extern again
6603
6604    * trunk, INSTALL, README, config.guess, config.sub, ltmain.sh,
6605      config.site.rescomp.prd, config.site.rescomp.tst, src, gmap.c, gsnap.c,
6606      indel.c, mapq.c, mem.c, mem.h, sarray-read.c, shortread.c, shortread.h,
6607      splice.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c: Merged
6608      revisions 145503 through 145603 from branches/2014-08-19-stack-alloca to
6609      use alloca instead of stack arrays based on MAX_READLENGTH and to handle
6610      reads longer than MAX_READLENGTH
6611
66122014-08-19  twu
6613
6614    * VERSION, index.html: Updated version number
6615
6616    * configure.ac: Added configure flag for enabling or disabling ssse3
6617      instructions
6618
6619    * ax_ext.m4: Checking whether user wants SSSE3 instructions
6620
6621    * stage2.c, stage3.c: Not putting gapholders into starts and ends.  Removing
6622      gapholders from middle before calling Pairpool_join_end5 and
6623      Pairpool_join_end3. Gapholders were causing problems with the join
6624      operation.
6625
6626    * pairpool.c, pairpool.h: Implemented Pairpool_remove_gapholders
6627
66282014-08-04  twu
6629
6630    * stage3.c: Not setting best_pairs or best_path when the result is NULL
6631
6632    * shortread.c: Ignoring spaces in read
6633
6634    * pairpool.c: In joining paths, handling the case when one path is NULL
6635
66362014-07-29  twu
6637
6638    * ltmain.sh: Updated from 2.2.6 to 2.2.6b
6639
6640    * config.sub, config.guess: Updated from 2008 version to 2009 version
6641
6642    * INSTALL: Updated from 2007 version to 2009 version
6643
6644    * archive.html, index.html: Made changes for new version
6645
6646    * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Handling
6647      GFF3 files that have both exon and CDS fields
6648
66492014-07-21  twu
6650
6651    * stage3hr.c: Restored random behavior for equivalent alignments
6652
6653    * VERSION: Updated version number
6654
6655    * atoiindex.c, cmetindex.c: Making correct calls to
6656      Sarray_discriminating_chars
6657
6658    * dynprog_simd.c: Added code for having no initial gap penalties, to be used
6659      for bam_indelfix
6660
66612014-07-19  twu
6662
6663    * dynprog_simd.c: Improvements to debugging procedures to handle 3-digit
6664      indices
6665
66662014-07-16  twu
6667
6668    * gmap.c: Using new interface to Stage3_compute
6669
6670    * stage3hr.h: Added interfaces for Stage3end_shortexonA_distance and
6671      Stage3end_shortexonD_distance
6672
6673    * stage3hr.c: Added hook for amb_prob in Stage3end_new_gmap.  Removed
6674      penalty for ambig end lengths if amb_prob > 0.9.  Fixed pointer advances
6675      in removing bad superstretches.
6676
6677    * stage3.c: Made fixes to choice of cdna_direction: using presence/absence
6678      of intron types, rather than number, and decreased binomial threshold for
6679      alignments around intron.  Fixed handling of multiple start or end paths
6680      by joining them at the outset.
6681
6682    * stage2.c: Allowing for multiple end cells from each rootposition
6683
6684    * dynprog_genome.c: Altered decision-making between best alignment and
6685      probability-based alignment
6686
6687    * dynprog_simd.c: Added debugging statements
6688
6689    * dynprog.h: Made open gap penalties more uniform for different defect rates
6690
6691    * pair.c: Fixed calculation of nmatches at end for Pair_trim_ends
6692
6693    * pairpool.c: Enhanced debugging statement
6694
66952014-07-15  twu
6696
6697    * stage3.h: Stage3_compute now returns ambig_prob_5 and ambig_prob_3
6698
6699    * stage3.c: For ambiguous ends, no longer calling clean_pairs_end5,
6700      clean_path_end3, trim_end5_exon_indels, or trim_end3_exon_indels
6701
6702    * stage1hr.c: Passing amb_prob_5 and amb_prob_3 to Stage3end_new_gmap
6703
6704    * pair.c, pair.h: In Pair_trim_ends, not trimming ambiguous ends
6705
6706    * stage3.c: In trim_end5_exon_indels and trim_end3_exon_indels, counting
6707      trimmed ends as mismatches and handling large indels differently from
6708      small indels.  Added hooks for ambig_prob.
6709
6710    * pair.c, pair.h, stage3hr.c: Eliminating ambig_end_nmatches from
6711      consideration in Pair_nmismatches_region
6712
6713    * stage3hr.c: Restored eventrim algorithm from revision 140363, which had
6714      only a single eventrim calculation and not separate calculations for the
6715      two ends.
6716
67172014-07-03  twu
6718
6719    * VERSION: Updated version number
6720
6721    * dynprog_end.c: Fixed typo in calling matrix for 16_upper twice, instead of
6722      16_upper and 16_lower
6723
6724    * splicetrie.c: Handling the case where Dynprog_end5_splicejunction or
6725      Dynprog_end3_splicejunction returns NULL
6726
6727    * dynprog_simd.h: Decreased value of SIMD_MAXLENGTH_EPI8 from 40 to 30 to
6728      prevent issues with overflows
6729
6730    * dynprog_simd.c, dynprog.c: In traceback, using main loop to decide whether
6731      to handle dir == DIAG
6732
6733    * dynprog_end.c: In traceback, using main loop to decide whether to handle
6734      dir == DIAG. In Dynprog_end5_splicejunction and
6735      Dynprog_end3_splicejunction, requiring finalscore to be positive before
6736      doing any traceback.
6737
6738    * sarray-write.c: Made monitoring statements work for
6739      Sarray_discriminating_chars
6740
6741    * sarray-write.c: Implemented batch reading method for
6742      Sarray_discriminating_chars
6743
67442014-07-02  twu
6745
6746    * gmapindex.c, sarray-write.c, sarray-write.h: Made the building of the LCP
6747      array and discriminating chars array more memory efficient by writing
6748      temporary files for rank and permuted sarray
6749
6750    * genome.c, genome.h: Changed type of counts to be Univcoord_T
6751
6752    * access.c: Fixed bug in handling of final partial block.  Added debugging
6753      code for checking results.
6754
6755    * access.c: Increased FREAD_BATCH to 100 million bytes.  Modified
6756      Access_allocated to always read in batches of size FREAD_BATCH.
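
      Note: a minimal sketch of reading a file in fixed-size batches with a
      final partial block, assuming a standard FILE stream. The function name
      read_in_batches and its parameters are hypothetical; only the batch size
      of 100 million bytes comes from the entry above.

```c
#include <stdio.h>
#include <stddef.h>

#define FREAD_BATCH_SKETCH 100000000UL /* mirrors "100 million bytes" above */

/* Reads up to nbytes from fp into dest, at most `batch` bytes per fread call.
   Returns the number of bytes actually read. */
size_t read_in_batches(void *dest, size_t nbytes, size_t batch, FILE *fp) {
  char *p = (char *) dest;
  size_t total = 0;

  if (batch == 0) {
    return 0;
  }
  while (total < nbytes) {
    size_t want = nbytes - total;
    size_t got;
    if (want > batch) {
      want = batch;            /* full batch */
    }                          /* otherwise: final partial block */
    got = fread(p + total, 1, want, fp);
    total += got;
    if (got < want) {
      break;                   /* end of file or read error */
    }
  }
  return total;
}
```

      A caller would pass FREAD_BATCH_SKETCH as the batch argument; the final
      iteration requests only the bytes that remain.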
6757
67582014-07-01  twu
6759
6760    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, src,
6761      genome128_hr.c, gmap.c, intlist.c, intlist.h, samprint.c, sarray-read.c,
6762      splice.c, splice.h, stage1hr.c, stage3.c, stage3hr.c, stage3hr.h,
6763      substring.c, uint8list.c, uint8list.h, uintlist.c, uintlist.h: Merged
6764      revisions 140131 through 140367 from branches/2014-06-27-fix-amb to
6765      implement separate eventrim scores for start/end of read, fix
6766      cmet-stranded and cmet-nonstranded modes, implement separate
6767      sense/antisense for Splice_solve_single, and rewrite ambiguous
6768      parameters from left/right to donor/acceptor
6769
6770    * index.html: Updated for version 2014-06-10
6771
6772    * archive.html: Added link to version 2011-12-28
6773
6774    * VERSION: Updated version number
6775
6776    * stage3hr.c: In Stage3end_new_shortexon, setting amb_nmismatches_start and
6777      amb_nmismatches_end separately
6778
6779    * stage3.c: Using score_introns (which looks at splice site neighborhood),
6780      instead of score_alignment to count canonical introns.  Using defect_rate
6781      to determine whether to rely on splice site probabilities.
6782
6783    * stage1hr.c: Added blank lines
6784
6785    * gmap.c: Preventing leftpos and rightpos from exceeding query coordinates
6786      in solving for chimeras.  Not using extension in finding remaining
6787      alignment, since it makes alignment harder.
6788
67892014-06-30  twu
6790
6791    * stage3.c: Transferring microexon pairs without looking at probabilities
6792
6793    * dynprog_single.c: Using MIN_MICROEXON_LENGTH instead of 8
6794
67952014-06-25  twu
6796
6797    * stage3hr.c: Fixed assignment of amb_nmatches_start and amb_nmatches_end
6798      for shortexons on minus strand
6799
6800    * stage3hr.c: Removed debugging code
6801
6802    * sarray-write.c: In Sarray_compute_child, cleaning out stack at end,
6803      because skipping it results in an incorrect child array
6804
68052014-06-24  twu
6806
6807    * stage3hr.c: Commenting out assertions that are not always true
6808
6809    * stage1hr.c: Assigning correct values of amb_nmatches_donor and
6810      amb_nmatches_acceptor to Stage3end_new_shortexon
6811
68122014-06-11  twu
6813
6814    * VERSION: Updated version number
6815
6816    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, src,
6817      sarray-read.c, splice.c, stage1hr.c, stage3hr.c, stage3hr.h: Merged
6818      revisions 138722 through 138743 from
6819      branches/2014-06-01-amb-shortexon-fix1 to allow for ambiguous shortexons
6820      and to place a limit of MAX_LOCALSPLICING_POTENTIAL on splicing and
6821      shortexons
6822
6823    * archive.html, index.html: Put version 2014-05-15.v3 into archive
6824
6825    * stage1hr.c: Not processing splicing or shortexons if number of
6826      possibilities exceeds MAX_LOCALSPLICING_POTENTIAL
6827
6828    * gsnap.c: In memusage debugging, printing accession for each thread at the
6829      start of its processing
6830
6831    * substring.c, substring.h: Added function Substring_chimera_prob_2
6832
6833    * segmentpos.c, segmentpos.h: Added function Segmentpos_compare_order
6834
6835    * samprint.c: Removed unused variable
6836
6837    * iitdef.h: Added FILENAME_SORT as a sorting type
6838
6839    * gsnap.c: Added commas to memusage debugging output
6840
6841    * dynprog_end.c: Removed assertions, which do not hold
6842
68432014-06-09  twu
6844
6845    * gmap_build.pl.in, chrom.c, chrom.h, gmapindex.c: Added option for sorting
6846      chromosomes by order in a file
6847
68482014-06-04  twu
6849
6850    * VERSION: Updated version number
6851
6852    * README, configure.ac: Increased default MAX_READLENGTH for GSNAP from 250
6853      to 300
6854
6855    * dynprog_simd.c: Fixed formatting
6856
6857    * dynprog_genome.c, dynprog_cdna.c: Using correct interface to
6858      Dynprog_standard for non-SSE2 systems
6859
6860    * dynprog_end.c: Enabled non-SSE2 compilation to work.  Made traceback
6861      procedures follow those in dynprog_simd.c.
6862
6863    * dynprog.c, dynprog.h: Modified traceback_std (non-SIMD) to behave the same
6864      as the SIMD traceback routines.  Improved debugging output.
6865
6866    * stage3.c: Added comments
6867
6868    * dynprog.c, dynprog.h: Exposed Dynprog_standard, needed for systems without
6869      SSE2 instructions
6870
68712014-06-03  twu
6872
6873    * gmap.c, gsnap.c: In checking behavior of _mm_extract_epi8, just reporting
6874      results and not exiting based on behavior
6875
6876    * genome128_hr.c: Casting _mm_extract_epi16 to unsigned short, or a
6877      zero-extended result, which is technically the correct behavior
6878
6879    * dynprog_simd.c: Being very explicit about casting between int and Score8_T
6880      and Score16_T types
6881
6882    * dynprog.h: Removed conditionals around defining Score_8 and Score_16 types
6883
6884    * compress.c: Calling _mm_free to match _mm_malloc
6885
68862014-05-30  twu
6887
6888    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst,
6889      index.html, src, compress.c, genome128_hr.c: Merged revisions 137093
6890      through 137694 from branches/2014-05-23-genome128-32bit-shortchut to
6891      implement 32-bit shortcuts for 128-bit genomebits
6892
6893    * stage3hr.c: Not checking any more for duplicate Stage3end_T objects
6894
6895    * stage2.c: Eliminated penalty when exon length < EXON_DEFN, which misses
6896      short exons
6897
68982014-05-29  twu
6899
6900    * stage1hr.c: Restored usage of paired_usedp to avoid excess calls to GMAP
6901      for halfmapping alignments
6902
6903    * stage1hr.c: Not using paired_usedp in computing GMAP for halfmapping
6904      alignments
6905
6906    * dynprog_simd.c: Fixed traceback procedures to follow correct paths on gaps
6907
6908    * dynprog.c, dynprog.h, dynprog_genome.c, dynprog_single.c: Moved gap
6909      penalties to dynprog.h, and reduced open penalties to allow for multiple
6910      indels
6911
69122014-05-28  twu
6913
6914    * stage3.c: In merge_local_single, checking filledp to see if merge failed.
6915      When merge fails, recompute pairarray for this_left and this_right.
6916
69172014-05-27  twu
6918
6919    * dynprog_simd.c: Implemented replacement for _mm_min_epi8 for non-SSE4.1
6920      systems
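
      Note: the entry does not say which replacement was used. One common
      SSE2-only approach flips the sign bit of each lane so that the unsigned
      minimum (_mm_min_epu8) orders values the same way a signed minimum
      would, then flips it back. The scalar sketch below shows that per-lane
      arithmetic; the name min_i8_biased is hypothetical.

```c
#include <stdint.h>

/* Signed 8-bit minimum computed with an unsigned comparison.
   XOR with 0x80 maps -128..127 onto 0..255 while preserving order. */
int8_t min_i8_biased(int8_t a, int8_t b) {
  uint8_t ua = (uint8_t) ((uint8_t) a ^ 0x80u);
  uint8_t ub = (uint8_t) ((uint8_t) b ^ 0x80u);
  uint8_t um = (ua < ub) ? ua : ub;     /* unsigned minimum */
  int v = (int) (um ^ 0x80u);           /* undo the bias: 0..255 */
  return (int8_t) (v >= 128 ? v - 256 : v);
}
```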
6921
69222014-05-23  twu
6923
6924    * dynprog_simd.c: Fixed computations of E and H for values near
6925      NEG_INFINITY, to prevent horizontal or vertical jumps into the empty
6926      triangle.  Added an E_mask variable to set horizontal/vertical scores into
6927      the empty triangle to be NEG_INFINITY.  Setting directions_nogap
6928      explicitly to DIAG along the main diagonal to take care of ties between E
6929      and H.
6930
6931    * dynprog_end.c: Using computed lband and uband in find_best_endpoint
6932      procedures, instead of trying to recompute them, which led to incorrect
6933      results
6934
6935    * get-genome.c, iit-read-univ.c: Fixed zero-based behavior of -L option to
6936      one-based behavior. Zero-based behavior introduced in revision 99737 on
6937      2013-06-27.
6938
6939    * index.html: Made changes for version 2014-05-15
6940
6941    * gmap.c, gsnap.c: Added better test for behavior of max operation in SSE4.1
6942
6943    * stage1hr.c: Fixed a memory leak involving ambcoords
6944
6945    * stage1hr.c: Using nmatches_posttrim in evaluating alignments.  Doing a
6946      comparison between concordant alignments involving terminals and
6947      halfmapping alignments, to determine the best solution.
6948
6949    * stage3hr.c: Fixed assignment of amb_nmatches to correct end for minus
6950      alignments in Stage3end_new_splice.  Extending hardclips by
6951      amb_nmatches_start and amb_nmatches_end in Stage3pair_overlap.
6952
6953    * stage3hr.c, stage3hr.h: Changed Stage3end_nmatches and Stage3pair_nmatches
6954      to Stage3end_nmatches_posttrim and Stage3pair_nmatches_posttrim
6955
69562014-05-21  twu
6957
6958    * samprint.c, samprint.h: Fixed typo in keeping a parameter
6959
6960    * README: Clarified effect of --failed-input option
6961
6962    * README: Fixed typo
6963
6964    * acx_mmap_fixed.m4, acx_mmap_variable.m4: Fixed type incompatibility when
6965      char * is cast to int
6966
6967    * stage1hr.c: Changed categories for some debugging statements
6968
6969    * gmap.c, gsnap.c, outbuffer.c, outbuffer.h, samprint.c, samprint.h,
6970      stage3hr.c, stage3hr.h: Changed --fails-as-input flag to --failed-input
6971      flag, which takes an argument.  Printing failed inputs in addition to
6972      nomapping output.
6973
6974    * genome.c: Made error message clearer when genomebits128 file not found
6975
69762014-05-17  twu
6977
6978    * stage3hr.c: Setting amb_nmatches_start, amb_nmatches_end,
6979      start_ambiguous_p, and end_ambiguous_p based on amb_nmatches for halfdonor
6980      and halfacceptor splices, even when ambcoords_left and ambcoords_right are
6981      NULL
6982
69832014-05-16  twu
6984
6985    * stage1hr.c: Minor fixes to debugging statements
6986
6987    * stage1hr.c: Turned on macro for finding middle alignments.
6988
6989    * stage1hr.c: Turned on finding of middle alignments in find_terminals.  Set
6990      length threshold to be querylength/3 instead of index1part.
6991
69922014-05-15  twu
6993
6994    * stage3.c: Removed revisions of coordinates near indels, which are not
6995      needed any more with the latest dynamic programming procedures
6996
69972014-05-13  twu
6998
6999    * VERSION, index.html: Updated version number
7000
7001    * dynprog_simd.c: For systems with SSE2 but not SSE4.1, subtracting 128 from
7002      pairscore in F loop of Dynprog_simd_8, and from initial column in
7003      Dynprog_simd_8_lower, to obtain correct results
7004
7005    * dynprog_genome.c: Removed debugging statements
7006
7007    * dynprog_genome.c: Fixed an uninitialized variable, best_prob
7008
7009    * trunk, README, src, Makefile.gsnaptoo.am, bytecoding.c, samprint.c,
7010      sarray-read.c, sarray-read.h, splice.c, splice.h, stage1hr.c, stage3hr.c,
7011      stage3hr.h, substring.c, substring.h, uint8list.c, uint8list.h,
7012      uintlist.c, uintlist.h: Merged revisions 135802 through 136084 from
7013      branches/2014-05-09-novel-ambiguous to consolidate ambiguous splices to
7014      save on memory usage and to consider them as a single concordant alignment
7015
7016    * pairpool.c: Not advancing coordinate at start of Pairpool_add_queryskip
7017      and Pairpool_add_genomeskip
7018
70192014-05-09  twu
7020
7021    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst,
7022      index.html, src, sarray-read.c, splice.c, splice.h, stage1hr.c,
7023      stage3hr.c, stage3hr.h, substring.c, substring.h: Merged revisions 135797
7024      through 135801 from branches/2014-05-09-novel-ambiguous to save state
7025      before implementing novel ambiguous positions
7026
7027    * gsnap.c: Improved output for MEMUSAGE
7028
7029    * dynprog_simd.c: Speeding up traceback for upper and lower triangles, to
7030      take advantage of the fact that the main diagonal is filled with DIAGs.
7031
7032    * sarray-read.c, stage1hr.c, stage3hr.c, stage3hr.h: Grouping multiple
7033      splice segments connected to a given splice site, and selecting the best
7034      among these.  If multiple best choices are found, creating an ambiguous
7035      splice.
7036
7037    * dynprog_simd.c: In traceback procedures, added check after each indel to
7038      see if we are in row 0 or column 0, and if so, not to trust the value in
7039      directions_nogap.
7040
70412014-05-08  twu
7042
7043    * splicetrie_build.c: Fixed comparison with MAX_SITES_ALLOCATED for
7044      intron-level splicing files
7045
7046    * gsnap.c: Setting POOL_FREE_INTERVAL to be 1
7047
7048    * gsnap.c, outbuffer.c: Using new interface to memusage routines
7049
7050    * mem.c, mem.h: Changed variable names for memusage.  Added memusage report
7051      for keep pool.
7052
7053    * dynprog_simd.c: Fixed bug with infinite loop at row 1 or column 1, leading
7054      to error in memory allocation for pairpool
7055
7056    * dynprog_simd.c, pairpool.c: Adding 1 to r and c only on final indel
7057
7058    * pairpool.c: When adding indel (queryskip or genomeskip), not advancing
7059      coordinate by 1, since that can cross over chromosomal bounds
7060
70612014-05-07  twu
7062
7063    * VERSION: Updated version number
7064
7065    * Makefile.gsnaptoo.am, cellpool.c, cellpool.h, gmap.c, smooth.c,
7066      stage1hr.c, stage1hr.h, stage2.c, stage2.h, stage3.c, stage3.h,
7067      uniqscan.c: Added Cellpool_T object to handle allocations of Cell_T in
7068      stage 2
7069
7070    * gsnap.c: Calling Pairpool_reset, Diagpool_reset, and Cellpool_reset before
7071      processing each request.  Previously, this memory was not being freed
7072      until the end of the process.
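
      Note: a minimal sketch of why resetting a pool per request bounds memory
      use, assuming a simple fixed-size bump allocator for byte buffers. The
      Pool type and the pool_alloc and pool_reset functions are hypothetical
      and much simpler than Pairpool_T, Diagpool_T, or Cellpool_T.

```c
#include <stddef.h>

#define POOL_BYTES 1024

/* Hypothetical bump allocator: hands out bytes from one fixed buffer. */
typedef struct {
  unsigned char storage[POOL_BYTES];
  size_t used;
} Pool;

/* Returns n bytes from the pool, or NULL if the pool is exhausted. */
void *pool_alloc(Pool *p, size_t n) {
  void *ptr;
  if (n > POOL_BYTES - p->used) {
    return NULL;
  }
  ptr = p->storage + p->used;
  p->used += n;
  return ptr;
}

/* Makes the whole buffer available again; call before each request. */
void pool_reset(Pool *p) {
  p->used = 0;
}
```

      Without the reset, allocations from earlier requests accumulate until
      the pool itself is destroyed.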
7073
7074    * splicestringpool.c: Changed memory procedures to use standard pool instead
7075      of keep pool
7076
7077    * dynprog_simd.c: Fixed saturation bug in F loop when trying to add
7078      pairscore.  Setting E and H so non-diag directions are placed in row 0 and
7079      column 0.  At end of traceback procedures, adding final indel to (0,0).
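
      Note: a scalar sketch of signed 8-bit saturating addition, the behavior
      of SSE2 instructions such as _mm_adds_epi8, assuming that is the kind of
      saturation the entry refers to. The name adds_i8 is hypothetical.

```c
#include <stdint.h>

/* Signed 8-bit add that clamps to [-128, 127] instead of wrapping. */
int8_t adds_i8(int8_t a, int8_t b) {
  int s = (int) a + (int) b;
  if (s > INT8_MAX) {
    s = INT8_MAX;
  }
  if (s < INT8_MIN) {
    s = INT8_MIN;
  }
  return (int8_t) s;
}
```

      Once a value has been clamped, adding a score and later subtracting it
      no longer returns the original value, which is one way a saturated lane
      can yield a wrong result.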
7080
7081    * stage3.c: Having make_pairarray_merge return a boolean to indicate success
7082      or failure.  Trying to use the old pairarray in case of failure.
7083
7084    * gmap.c, gsnap.c, uniqscan.c: Using new interface to Splicetrie_retrieve
7085      procedures, using Splicestringpool_T object
7086
7087    * Makefile.gsnaptoo.am, splicestringpool.c, splicestringpool.h,
7088      splicetrie_build.c, splicetrie_build.h: Using Splicestringpool_T object to
7089      reduce number of memory allocations for Splicestring_T objects.  Using
7090      local array for sites to also reduce number of memory allocations.
7091
7092    * splicetrie_build.c: Allocating struct Interval_T when copies are needed,
7093      to reduce the number of calls to allocate memory.  Allocating triecontents
7094      as an array instead of uintlist, also to reduce the number of calls to
7095      allocate memory.
7096
7097    * interval.c, interval.h: Allocating struct Interval_T when copies are
7098      needed, to reduce the number of calls to allocate memory
7099
71002014-05-06  twu
7101
7102    * dynprog.h: Added definitions of POS_INFINITY_8 and POS_INFINITY_16
7103
7104    * pair.c: Returning correct type of NULL
7105
7106    * dynprog_cdna.c, dynprog_genome.c: Fixed potential memory leak with
7107      SNP-tolerant alignment
7108
7109    * genome.c: If glength == 0, Genome_get_segment_blocks now returns NULL
7110
7111    * stage3.c: Changed condition for not running Dynprog_single_gap from
7112      (queryjump < 0 || genomejump < 0) to (queryjump <= 0 || genomejump <= 0)
7113
71142014-05-05  twu
7115
7116    * stage1hr.c: Checking for anomalous splice (samechr splice) before trying
7117      to compute mapping position from distal end
7118
7119    * compress.c: Bypassing SSSE3 version of Compress_shift when
7120      DEFECTIVE_SSE2_COMPILER is true
7121
71222014-05-02  twu
7123
7124    * ax_ext.m4: Not allowing AVX unless immintrin.h is present
7125
7126    * indexdb.c: Freeing offsetspages for large genomes
7127
7128    * gsnap.c: For MEMUSAGE, changing pool free interval to be 1
7129
7130    * dynprog.c: Freeing nt_to_int_array
7131
7132    * dynprog_simd.c: Simplified traceback procedures
7133
71342014-05-01  twu
7135
7136    * stopwatch.c: Removed comment
7137
7138    * stage3hr.c: Allocating ambi and amb_nmismatches into output memory pool
7139
7140    * stage1hr.c: Limiting the number of terminals
7141
7142    * mem.c: Fixed memusage_reset procedure and revised debugging messages to
7143      print pool type
7144
7145    * intlist.c, intlist.h: Implemented Intlist_to_array_out
7146
7147    * compress.c, compress.h: Implemented SSSE3 procedure and fixed bug in SSE2
7148      procedure for Compress_shift
7149
7150    * genome128_hr.c: Fixed block_diff_snp procedure to perform the complete
7151      calculation for query, ref, and alt sequences.
7152
7153    * genome.c: Fixed Genome_get_segment_blocks_left and _right to provide
7154      correct alternate genomic segment
7155
7156    * dynprog_simd.c: Setting values along bands to be DIAG to avoid going out
7157      of bounds. Revised loops for gaps in traceback procedures.  Compensating
7158      for open value in comparing E vs H for dir_horiz and dir_vert.  Using
7159      nt_to_int_array. Improved debugging print procedures.
7160
7161    * dynprog.c, dynprog.h: Introduced nt_to_int_array.  Setting score for
7162      AMBIGUOUS to be 0, and setting N-N to be that score.
7163
71642014-04-24  twu
7165
7166    * ax_ext.m4: Removing -mavx and -mavx2 compiler flags for now.  They were
7167      being added on Mac OS X Mavericks, where they are causing problems.
7168
7169    * stopwatch.c: Added macro for specifying POSIX C time
7170
7171    * stage3hr.c: Allowing pair_insert_length_trimmed to handle non-concordant
7172      paired ends
7173
7174    * genome128_hr.c: Implemented code for defective SSE2 compilers that cannot
7175      handle shifts with a non-immediate scalar
7176
7177    * samprint.c: Fixed printing of XS for half-donor and half-acceptor reads
7178
71792014-04-21  twu
7180
7181    * iit-read-univ.c: Printing tp:circular for circular chromosomes
7182
7183    * stage3.c: Using new interface to Pair_circularpos
7184
7185    * stage3hr.c: Computing insertlength properly for circular chromosomes when
7186      ends have different aliases.  Handling duplicates of aliases better.
7187
7188    * pair.c, pair.h: Computing alias correctly in Pair_circularpos
7189
7190    * pair.c: Fixed bug when Pair_trim_ends is called when pairs is NULL
7191
71922014-04-19  twu
7193
7194    * trunk, src, Makefile.gsnaptoo.am, atoiindex.c, cmetindex.c, genome.c,
7195      genome.h, genome128_hr.c, genome128_hr.h, gmapindex.c, gsnap.c, indel.c,
7196      indel.h, indexdb.c, mapq.c, mapq.h, oligo.c, oligoindex_hr.c,
7197      sarray-read.c, sarray-read.h, sarray-write.c, sarray-write.h, splice.c,
7198      splice.h, splicetrie.c, splicetrie.h, stage1hr.c, stage1hr.h, stage3hr.c,
7199      stage3hr.h, substring.c, substring.h: Merged revisions 133654 through
7200      133759 from branches/2014-04-18-cmet-atoi-suffix-arrays to fix CMET and
7201      ATOI procedures
7202
7203    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
7204      number
7205
7206    * index.html: Updated for version 2014-04-17
7207
7208    * stage3.c: Fixed uninitialized variable
7209
72102014-04-17  twu
7211
7212    * indexdb.c: Creating correct bitpack filenames when snps_root is given
7213
7214    * snpindex.c: Handling the case when an IIT file is not provided on the
7215      command-line, and installation is not needed
7216
72172014-04-10  twu
7218
7219    * gmap_build.pl.in: Fixed documentation for compression types
7220
7221    * dynprog_simd.c, dynprog_simd.h, dynprog_single.c: Added traceback
7222      procedure for Score16_T variables
7223
72242014-04-09  twu
7225
7226    * snpindex.c: Creating genomebits128 format instead of genomebits format
7227
7228    * sarray-read.c: Made changes in code (commented out) to try to infer
7229      lcp-intervals for short oligomers from 12-mer index, but it still has some
7230      bugs
7231
7232    * archive.html, index.html: Made changes for version 2014-04-08 and
7233      2013-03-28.v2
7234
72352014-04-08  twu
7236
7237    * VERSION: Updated version number
7238
7239    * gmap.c: Removed unnecessary const
7240
7241    * stage2.c: Applying maxintronlen in deciding whether to create a grand
7242      lookback
7243
7244    * pairpool.c: Fixed bug in Pairpool_compact_copy
7245
7246    * stage3.c: Removed debugging statement
7247
7248    * trunk, src, Makefile.gsnaptoo.am, dynprog.c, dynprog.h, dynprog_cdna.c,
7249      dynprog_cdna.h, dynprog_end.c, dynprog_end.h, dynprog_genome.c,
7250      dynprog_genome.h, dynprog_simd.c, dynprog_simd.h, dynprog_single.c,
7251      dynprog_single.h, gmap.c, gsnap.c, sarray-read.c, sequence.c, sequence.h,
7252      splicetrie.c, stage3.c, stage3.h, uniqscan.c, util: Merged --reintegrate
7253      branches/2014-04-04-dynprog-shift to change dynprog_end routines to use
7254      upper/lower algorithm without F loops
7255
7256    * index.html: Updated for version 2014-04-06
7257
7258    * pairpool.c, pairpool.h: Implemented Pairpool_compact_copy
7259
7260    * oligoindex_hr.c: Freeing storage memory in Oligoindex_array_T object
7261
7262    * stage1.c: Checking extensions to make sure they fall within
7263      --max_totallength
7264
7265    * chimera.c, chimera.h, gmap.c, stage3.c, stage3.h: Checking chimeras to see
7266      that they satisfy --min-trimmed-coverage and --min-identity filters
7267
7268    * mem.c: Turned off DEBUG macro
7269
7270    * iit-read-univ.c: Restored warning message when IIT file cannot be read
7271
7272    * gmapindex.c, sarray-write.c, sarray-write.h: Implemented
7273      Sarray_child_uncompress
7274
7275    * gmap.c: No longer proceeding to align if Stage2_scan yields max_ncovered
7276      of less than 10% of the querylength.
7277
7278    * genome128_hr.c: Made additional changes to avoid _mm_extract_ps on
7279      non-SSE4.1 systems
7280
72812014-04-07  twu
7282
7283    * sarray-read.c: When effective querylength is less than bucket indexsize,
7284      no longer trying to infer lcp-interval from bucket array, but just
7285      starting from entire lcp-interval.
7286
72872014-04-06  twu
7288
7289    * gmap.c, gsnap.c, uniqscan.c: Checking for valid float values between 0.0
7290      and 1.0
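
      Illustrative sketch only: the entry above does not say which options
      are checked or how. Assuming the value arrives as a command-line
      string, a range check could look like the following, where
      valid_fraction_p is a hypothetical name and not a function in the
      source.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parses string as a double and accepts it only if
   the whole string was consumed and the value lies in [0.0, 1.0].
   On success, stores the value in *value when value is non-NULL. */
bool
valid_fraction_p (const char *string, double *value) {
  char *end;
  double x = strtod(string, &end);

  if (end == string || *end != '\0') {
    return false;               /* No digits, or trailing characters */
  }
  if (!(x >= 0.0 && x <= 1.0)) {
    return false;               /* Out of range; also rejects NaN */
  }
  if (value != NULL) {
    *value = x;
  }
  return true;
}
```

      Writing the range test as a negated conjunction means that a NaN,
      which fails every comparison, is rejected along with out-of-range
      values.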
7291
7292    * dynprog_genome.c, splicetrie.c: Fixed headers for dynprog routines
7293
7294    * dynprog_single.c: Using a variable in the F loop to see if H needs to be
7295      reloaded
7296
7297    * oligoindex_hr.c:  Allocating storage for Oligoindex_array_T
7298
72992014-04-05  twu
7300
7301    * trunk, src, Makefile.gsnaptoo.am, dynprog.c, dynprog.h, dynprog_genome.c,
7302      dynprog_genome.h, dynprog_single.c, dynprog_single.h, gmap.c, gsnap.c,
7303      pairpool.c, pairpool.h, stage3.c, stage3.h, uniqscan.c, util: Merge
7304      reintegrated branches/2014-04-04-dynprog-shift to use new SIMD routines
7305      that are row-first and reduce use of F loops
7306
7307    * util, src: Removed property changes
7308
7309    * VERSION: Updated version number
7310
7311    * gmap.c, gsnap.c, oligoindex_hr.c, oligoindex_hr.h, stage1hr.c, stage1hr.h,
7312      stage2.c, stage2.h, stage3.c, stage3.h, uniqscan.c: Created a separate
7313      Oligoindex_array_T object, which also holds storage
7314
7315    * configure.ac: Added AC_FUNC_ALLOCA
7316
7317    * mem.h: Added hooks for memory allocation using alloca
7318
7319    * bitpack64-readtwo.c: Added missing #endif
7320
7321    * indexdb.c: Fixed type of offsets from UINT4 back to Positionsptr_T
7322
7323    * iit-read-univ.c: Making Univ_FNode_T struct separate from FNode_T struct
7324
7325    * gmapindex.c: Checking genomelength before trying to create suffix array or
7326      LCP/child/DC arrays
7327
7328    * bitpack64-readtwo.c: Fixed bug where not enough 128-bit registers were
7329      provided for large genomes
7330
7331    * bitpack64-read.c: Eliminated an extra addition from computing offset in
7332      large genomes
7333
73342014-04-03  twu
7335
7336    * uniqscan.c: Using new interface to Oligoindex_new routines
7337
7338    * index.html: Updated for 2014-04-01 version
7339
7340    * archive.html: Moved 2014-03-28 and earlier versions to archive
7341
7342    * acinclude.m4, ax_ext.m4, ax_gcc_x86_avx_xgetbv.m4: Updated ax_ext.m4 and
7343      added ax_gcc_x86_avx_xgetbv
7344
7345    * uniqscan.c: Using new interface to Stage3_setup
7346
7347    * stage3.c, stage3.h: Providing min_end_indel_matches to end trimming
7348      procedures
7349
7350    * sarray-read.c, sarray-read.h: Added separate access control for lcpchilddc
7351
7352    * oligoindex_hr.c, oligoindex_hr.h: Allocating dedicated space needed for
7353      Oligoindex_get_mappings, to avoid memory allocation/deallocation
7354
7355    * gsnap.c: Using new interface to Oligoindex_new routines, Stage3_setup, and
7356      Sarray_new
7357
7358    * gmap.c: Using new interface to Oligoindex_new routines and Stage3_setup
7359
7360    * genome128_hr.c: Provided alternative to _mm_extract_ps, which also
7361      requires SSE4.1
7362
73632014-04-02  twu
7364
7365    * dynprog.c: Modified debugging statements
7366
7367    * ax_ext.m4: Adding warning messages when immintrin.h is not found and could
7368      be used
7369
7370    * ax_ext.m4: Checking for immintrin.h before allowing popcnt, lzcnt, or bmi1
7371
7372    * ax_ext.m4: Improved warning messages
7373
7374    * trunk, VERSION, ax_ext.m4, config.site.rescomp.prd,
7375      config.site.rescomp.tst, src, Makefile.gsnaptoo.am, atoiindex.c,
7376      bitpack64-access.c, bitpack64-read.c, bitpack64-read.h,
7377      bitpack64-readtwo.c, bitpack64-readtwo.h, bitpack64-serial-read.c,
7378      bitpack64-serial-read.h, bitpack64-serial-write.c,
7379      bitpack64-serial-write.h, bitpack64-write.c, bitpack64-write.h,
7380      bytecoding.c, bytecoding.h, cmetindex.c, compress-write.c,
7381      compress-write.h, compress.c, compress.h, compress128.c, dynprog.c,
7382      genome-write.c, genome-write.h, genome.c, genome128-write.c,
7383      genome128-write.h, genome128_hr.c, genome128_hr.h, genome_hr.c,
7384      genome_hr.h, genome_sites.c, gmap.c, gmapindex.c, gsnap.c,
7385      iit-read-univ.c, iit-read.h, iit-write-univ.h, iit-write.h, iitdef.h,
7386      indel.c, indexdb-write.c, indexdb-write.h, indexdb.c, indexdb.h,
7387      indexdb_hr.c, indexdbdef.h, mapq.c, sarray-read.c, sarray-read.h,
7388      sarray-write.c, sarray-write.h, snpindex.c, splice.c, splicetrie.c,
7389      splicetrie_build.c, stage1hr.c, stage2.c, stage3.c, stage3hr.c,
7390      substring.c, types.h, uniqscan.c, util, gmap_build.pl.in: Major change.
7391      Merged revisions 131573 to 132142 from branches/2014-03-26-bitpack-esa to
7392      use genomebits128 format, bp64-columnar, and enhanced suffix arrays
7393
7394    * VERSION: Updated version number
7395
7396    * index.html: Revised for latest version
7397
7398    * stage3hr.c: Not using number of introns to determine equivalence in
7399      hit_equiv_cmp and hitpair_equiv_cmp
7400
7401    * genome_sites.c: Resolved comparison between unsigned and signed values for
7402      -1
7403
74042014-04-01  twu
7405
7406    * boyer-moore.c, dynprog.c, stage3.c: Implemented changes to restore finding
7407      microexons
7408
7409    * shortread.c: Handling Casava reads ending in ";1".  Fixed problem where -q
7410      and --allow-pe-name-mismatch together caused a fatal bug.
7411
7412    * genome_hr.c: Changed debugging output for splice fragments to print
7413      unsigned shorts
7414
74152014-03-28  twu
7416
7417    * indexdb.c: Changed type of offsets (called only for regular bitpack
7418      procedure) from Positionsptr_T * to UINT4 *
7419
7420    * indexdb_hr.c: Calling correct procedures for LARGE_GENOMES
7421
7422    * substring.c, substring.h, stage3hr.c: Fixing comparisons of coordinates to
7423      handle circular chromosomes
7424
7425    * stage3hr.h: Removed queryseq as arguments to Stage3end_remove_duplicates
7426
7427    * stage1hr.c: Calling Stage3end_remove_duplicates after
7428      Stage3end_remove_circular_alias
7429
74302014-03-27  twu
7431
7432    * stage1hr.c, stage3hr.c, stage3hr.h: Trying to salvage alias +1 within
7433      Stage3end_remove_circular_alias, and calling that rather than
7434      Stage3end_unalias_circular
7435
7436    * stage3hr.c: In Stage3_new_splice, not trying to merge long-distance
7437      splices at this time, which can lead to bad coordinates for GMAP
7438
7439    * stage1hr.c: Initializing variable, previously not initialized
7440
7441    * pair.c: Fixed assertion on CIGAR length to include hardclips
7442
7443    * gsnap.c: Including splice.h
7444
7445    * chimera.c: Rearranged order of loops in Chimera_bestpath
7446
74472014-03-26  twu
7448
7449    * stage3hr.c: Fixed bug where overlap across circular chromosome origin is
7450      entirely trimmed, leading to an "SH" cigar string
7451
7452    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst,
7453      index.html, util: Updated version number
7454
7455    * gmap.c: Checking return value of Chimera_bestpath
7456
7457    * chimera.c, chimera.h: Chimera_bestpath returning a value to indicate if a
7458      chimera was found
7459
7460    * stage3.c: In Stage3_merge_local, traversing cDNA gap any time intronlength
7461      is less than 0
7462
7463    * pairpool.c: Created separate debugging category for Pairpool_clean_join
7464
7465    * gsnap.c, splice.c, splice.h: Using min_shortend to control splice length
7466      at ends
7467
7468    * sarray-read.c: Fixed problem where splice or deletion could extend into
7469      next chromosome
7470
74712014-03-17  twu
7472
7473    * translation.c: Fixed bug where assign_cdna_backward was returning ncdna
7474      instead of codon
7475
74762014-03-12  twu
7477
7478    * sarray-read.c: Put sarray_search loop into a new function,
7479      find_longest_match
7480
7481    * sarray-read.c: Improved sarray_search by looking up genome only when lcp
7482      advances by more than one position
7483
7484    * oligoindex_hr.c: Using faster method for checking for zero 128-bit
7485      register when SSE4.1 is available
7486
7487    * dynprog.c, gdiag.c, indexdb_hr.c, oligo.c, pair.c, sarray-read.c,
7488      spanningelt.c, stage1.c, stage1hr.c, stage2.c: Using safer method for
7489      computing average of lowi and highi in binary search
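
      A minimal sketch of the usual overflow-safe midpoint, for
      illustration; the entry does not show the code that was changed.
      The variable names lowi and highi come from the entry, while
      safe_middle and binary_search are hypothetical names.

```c
#include <assert.h>

/* Midpoint of [lowi, highi] for 0 <= lowi <= highi.
   The naive (lowi + highi) / 2 can overflow a signed int when both
   indices are large; highi - lowi cannot overflow under the stated
   precondition. */
int
safe_middle (int lowi, int highi) {
  assert(lowi >= 0 && lowi <= highi);
  return lowi + (highi - lowi) / 2;
}

/* Binary search over a sorted array of n positions.
   Returns the index of goal, or -1 if it is absent. */
int
binary_search (const unsigned int *positions, int n, unsigned int goal) {
  int lowi = 0, highi = n - 1, middlei;

  while (lowi <= highi) {
    middlei = safe_middle(lowi, highi);
    if (positions[middlei] < goal) {
      lowi = middlei + 1;
    } else if (positions[middlei] > goal) {
      highi = middlei - 1;
    } else {
      return middlei;
    }
  }
  return -1;
}
```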
7490
7491    * access.c: Handling empty files
7492
74932014-02-28  twu
7494
7495    * gmap.c, stage3.c, stage3.h: Removed alignment_score_fwd and
7496      alignment_score_rev from Stage3_T object
7497
7498    * VERSION: Updated version number
7499
7500    * stage3.h, gmap.c: Using maxintronlen_bound in Stage3_mergeable
7501
7502    * stage3.c: Not trimming exons at end based on splice site probabilities
7503
7504    * stage2.c: In D4 section, when diffdistance <= EQUAL_DISTANCE_NO_SPLICING,
7505      adding CONSEC_POINTS_PER_MATCH, to extend chains further
7506
7507    * gmap.c: Performing iterations for finding a local join, before iterations
7508      for finding a chimera.
7509
7510    * stage3.c: Revised criteria for pick_cdna_direction.  When cdna_direction
7511      == 0, assigning fwd or rev introntype instead of NONINTRON.
7512
7513    * Makefile.dna.am, Makefile.gsnaptoo.am, get-genome.c, gmap.c, outbuffer.c,
7514      outbuffer.h, stage1.c, stage1.h: Removed references to Chrsubset_T.  Made
7515      -c flag work again by setting universal coordinate bounds in stage 1.
7516
7517    * gmap.c: Checking coverage of nonchimericbest against each chimeric part,
7518      and if coverage is large enough on each side, picking non-chimeric
7519      alignment over chimera.
7520
7521    * stage3.c, stage3.h: Not determining mergeable based on cdna_directions of
7522      the left and right part.  In that situation, fixing the alignment by
7523      recomputing to find the best cdna_direction.
7524
7525    * pair.c: Disallowing transitions even 10 bp outside of alignments
7526
7527    * chimera.c: Calling Pair_pathscores directly, instead of through
7528      Stage3_pathscores
7529
7530    * gmap_build.pl.in: Removed reference to localdir.  Putting tempfiles for
7531      <genome>.coords and <genome>.sources into destination directory.
7532
75332014-02-27  twu
7534
7535    * sarray-read.c: Slight speed improvement in handling pre-alignment loop.
7536      Now using 0 instead of 4.
7537
7538    * stage3hr.c: Restored old version of Stage3end_remove_overlaps, with
7539      different algorithm from that for paired ends
7540
75412014-02-26  twu
7542
7543    * stage1hr.c: For paired-end reads, when only one end is too short, aligning
7544      the other end as part of a halfmapping alignment
7545
75462014-02-24  twu
7547
7548    * stage3hr.c: Implemented recursive, list-based approach to removing bad
7549      superstretches in paired-end alignments, instead of O(n^3) algorithm,
7550      which occasionally hung in repetitive regions
7551
7552    * gmap.c: Added debugging statements for chimera
7553
7554    * stage2.c: Reducing NINTRON_PENALTY_MISMATCH from 32 to 1, because short
7555      exons were being missed.  Also, eliminated querydist_credit and restored
7556      querydist_penalty.
7557
7558    * chimera.c: In Chimera_local_join_p, checking genomic positions to make
7559      sure they make sense
7560
75612014-02-21  twu
7562
7563    * gsnap.c, pair.c, pair.h, samprint.c, samprint.h: Implemented option
7564      --hide-soft-clips
7565
7566    * stage3.c: Returning value from merge_local_single
7567
7568    * substring.c, substring.h: Added function Substring_queryend_orig
7569
75702014-02-20  twu
7571
7572    * gsnap.c: Using new interface to Stage2_setup
7573
7574    * src, chimera.c, diagpool.c, gmap.c, oligoindex.c, oligoindex_hr.c, pair.c,
7575      stage2.c, stage2.h, stage3.c, uniqscan.c: Merged revisions 128075 to
7576      128117 from branches/2014-02-20-chimera-breakpoint to fix cases where
7577      breakpoint was found outside of alignments
7578
75792014-02-19  twu
7580
7581    * dynprog.c: Handling lower-case nucleotides correctly in dynamic
7582      programming. Handling alternate alleles equally in dynamic programming.
7583
7584    * bitpack64-read.c, indexdb.c: Fixed bugs in handling bitpackpages file for
7585      huge genomes
7586
75872014-02-18  twu
7588
7589    * pair.c, pair.h, stage3.c: Added numbers of matches, mismatches, indels and
7590      unknowns to GFF3 output
7591
7592    * VERSION: Updated version
7593
7594    * ax_ext.m4: Fixed typo
7595
7596    * gmap_build.pl.in: Not printing warning message about -T unless it is used.
7597       Deleting .sources file.
7598
7599    * chimera.c, chimera.h, gmap.c: Added donor_watsonp and acceptor_watsonp to
7600      fields for GMAP
7601
7602    * stage3hr.c: Fixed bug in printing header twice for concordant
7603      translocations in standard GSNAP output format
7604
7605    * get-genome.c: Added output for vareffect
7606
7607    * samprint.c: Added strands to XT field
7608
7609    * indexdb-write.c: When checking bitpack64, printing warning messages rather
7610      than exiting.
7611
76122014-02-14  twu
7613
7614    * sarray-read.c: Fixed a bug where r for an lcp-interval was being
7615      decremented from 0 to -1U.
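
      For illustration, the wraparound named above and one possible guard.
      This is a sketch and not the actual fix, which the entry does not
      show; UINT4 is used elsewhere in this log, but its definition here
      and the name saturating_decrement are assumptions.

```c
#include <limits.h>

typedef unsigned int UINT4;     /* Assumed definition, for illustration */

/* Decrements r but stops at 0.  For an unsigned type, 0 - 1 wraps around
   to UINT_MAX (written -1U), so a later test such as l <= r would then
   succeed for every l. */
UINT4
saturating_decrement (UINT4 r) {
  return (r == 0U) ? 0U : r - 1U;
}

/* The unguarded form, kept only to show the wraparound. */
UINT4
wrapping_decrement (UINT4 r) {
  return r - 1U;
}
```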
7616
76172014-02-13  twu
7618
7619    * samprint.c: Added distant splice information to XT field, but without
7620      strand information
7621
7622    * samprint.c: Added XC flag to indicate a circular alignment
7623
76242014-02-11  twu
7625
7626    * stage1hr.c: Removed unnecessary variables that have been made global
7627      within the file
7628
7629    * branchpoint.c, branchpoint.h, splicing-score.c: Added bpa-all option to
7630      print all marginals
7631
7632    * iit-read.c: Handling cases in signed matching where sign == 0
7633
7634    * get-genome.c: Added option --aslabel
7635
7636    * sarray-read.c: Turning off debugging
7637
76382014-02-02  twu
7639
7640    * get-genome.c: Added debugging statements
7641
76422014-01-31  twu
7643
7644    * get-genome.c: Making typed queries when user provides a typestring
7645
7646    * trunk, README, src, Makefile.dna.am, Makefile.gsnaptoo.am, branchpoint.c,
7647      branchpoint.h, gsnap.c, indel.c, indel.h, sarray-read.c, splicing-score.c,
7648      stage1hr.c, translation.c, uniqscan.c, util: Merged revisions 125127 to
7649      125307 from branches/2014-01-30-lsm-branch-point to add bpa analysis to
7650      splicing-score
7651
76522014-01-29  twu
7653
7654    * splicing-score.c: Added -v flag to handle alternate alleles
7655
76562014-01-21  twu
7657
7658    * fa_coords.pl.in: Fixed syntax error
7659
7660    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html,
7661      src, closeparen-table.pl, excess-leftward.pl, excess-rightward.pl,
7662      openparen-table.pl, select1-table.pl: Updated version number
7663
7664    * Makefile.dna.am, Makefile.gsnaptoo.am: Added SIMD_FLAGS and POPCNT_FLAGS
7665      to all programs that include compress.c
7666
7667    * sarray-read.c: Added benchmarking procedure Sarray_traverse_children
7668
7669    * oligo.c: Added "U" to end of integer constants where necessary
7670
7671    * gsnap.c: Fixed typo in --help statement
7672
7673    * gmap.c: Fixed revision of extension length relative to end-segment length
7674      for finding chimeras
7675
76762014-01-15  twu
7677
7678    * gmap_build.pl.in: Fixed variable name
7679
76802013-12-23  twu
7681
7682    * sarray-read.c: Added comments
7683
7684    * Makefile.gsnaptoo.am: Added files for bp-read and bp-write
7685
7686    * bp-read.c, bp-read.h, bp-write.c, bp-write.h, bp.h, gmapindex.c,
7687      sarray-write.c, sarray-write.h, types.h: Merged revisions 119699 to 122378
7688      from branches/2013-11-27-child-bitvector to add code for bitvector
7689      representation of child array
7690
7691    * bitpack64-read.c: Removed extraneous addition for second half of block for
7692      packsize of 32
7693
76942013-12-20  twu
7695
7696    * util, gmap_build.pl.in: Merged revisions 119700 through 122189 from
7697      branches/2013-11-27-child-bitvector/util to write genome indices in
7698      destination directory
7699
7700    * fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Writing FASTA
7701      sources to a file
7702
77032013-12-19  twu
7704
7705    * Makefile.dna.am: Added gamma-speed-test and bitpack64-speed-test
7706
7707    * splicing-score.c: Checking if coordinates are valid
7708
7709    * gamma-speed-test.c: Printing nanoseconds per query
7710
7711    * bitpack64-speed-test.c: Using new interface to bitpack64 read commands
7712
77132013-12-17  twu
7714
7715    * stage1hr.c, stage3hr.c, stage3hr.h: Requiring single or unpaired terminals
7716      to be 2/3rds of the querylength
7717
7718    * sarray-read.c: Fixed memory leak
7719
77202013-12-16  twu
7721
7722    * dynprog.c: Fixed a bug where the wrong genomic position was provided for a
7723      high-probability microexon.
7724
7725    * stage1hr.c: Not passing in max_terminal_length to find_terminals
7726
7727    * sarray-read.c: Allowing initial lcp interval for the first nucleotide
7728
7729    * gmapindex.c: Changed name of child_uncompress procedure
7730
7731    * bitpack64-write.c: Handling the case for Bitpack64_write_direct where the
7732      packsize is 0.
7733
7734    * sarray-write.c, sarray-write.h: Included code for bp version of child
7735      information
7736
77372013-12-13  twu
7738
7739    * trunk, config.site.rescomp.tst, src, Makefile.dna.am,
7740      Makefile.gsnaptoo.am, atoiindex.c, bitpack64-access.c, bitpack64-access.h,
7741      bitpack64-read.c, bitpack64-read.h, bitpack64-write.c, bitpack64-write.h,
7742      chimera.c, cmetindex.c, dynprog.c, dynprog.h, genome.c, genome.h, gmap.c,
7743      gmapindex.c, indexdb-write.c, indexdb-write.h, indexdb.c, indexdb.h,
7744      indexdb_hr.c, indexdbdef.h, oligoindex_hr.c, oligoindex_hr.h,
7745      oligoindex_pmap.c, oligoindex_pmap.h, pmapindex.c, sarray-read.c,
7746      sarray-write.c, sarray-write.h, sequence.c, sequence.h, snpindex.c,
7747      splicetrie.c, splicetrie.h, splicing-score.c, stage1hr.h, stage2.c,
7748      stage2.h, stage3.c, stage3.h, util: Merged revisions 120874 through 121506
7749      from branches/2013-12-10-pmap
7750
77512013-12-04  twu
7752
7753    * substring.c: Fixed issue where circularpos was assigned in the region of
7754      soft-clipping, leading to an improper CIGAR string
7755
7756    * pair.c: In GFF output, printing '.' instead of '?' for unknown strand
7757
77582013-11-27  twu
7759
7760    * index.html: Added statement about portability of NAN
7761
7762    * iit_store.c: Handling NAN when it is not available
7763
7764    * VERSION, index.html: Updated version number
7765
7766    * popcount.c, popcount.h: Added popcount.c and popcount.h to store tables
7767
7768    * Makefile.gsnaptoo.am: Added popcount.c and popcount.h
7769
7770    * genome_sites.c: Added include of popcount.h.
7771
7772    * genome_hr.c: Moved tables to popcount.c, and added include of popcount.h.
7773
7774    * indexdb-write.c: Revised size of memory reserved for bitpackptrs.  Moved
7775      tables to popcount.c, and added include of popcount.h.
7776
7777    * sarray-write.c: Revised size of memory reserved.  Added include of
7778      popcount.h
7779
77802013-11-22  twu
7781
7782    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
7783      Updated version number
7784
7785    * stage3.c: In build_pairs_singles, build_pairs_introns, and
7786      build_pairs_dualintrons, remove gaps at the beginning and any gap at the
7787      end of the path.
7788
7789    * outbuffer.c: Fixed bug in using --fails-as-input on single-end reads
7790
7791    * stage3.c: Replaced one call to insert_gapholders with List_reverse
7792
7793    * Makefile.dna.am: Added files needed for gmapindex and uniqscan
7794
7795    * stage3.c: Inserting gapholders before calling assign_intron_probs
7796
77972013-11-21  twu
7798
7799    * iit-read.c: Fixed issue with IIT_fieldvalue returning the previous line
7800      from what is desired
7801
78022013-11-20  twu
7803
7804    * stage3hr.c: Slight improvement in efficiency of randomization procedure
7805
7806    * stage3hr.c: Using a different formula for generating a random integer
7807
7808    * stage3hr.c: Put an explicit type conversion to double before RAND_MAX
7809
7810    * stage3hr.c: Now picking primary alignment randomly among best alignments
7811      with ties
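
      The four stage3hr.c entries above concern picking one alignment at
      random among ties. A sketch of one common formula with the cast to
      double before RAND_MAX follows; the formula actually used is not
      given in this log, and scale_random and pick_among_ties are
      hypothetical names.

```c
#include <stdlib.h>

/* Maps a value from rand(), in [0, RAND_MAX], to an index in [0, n-1].
   Casting to double before dividing avoids integer division, which
   would yield 0 for every input except RAND_MAX.  Adding 1.0 to
   RAND_MAX keeps the ratio strictly below 1.0, so the result is
   never n. */
int
scale_random (int randvalue, int n) {
  return (int) ((double) randvalue / ((double) RAND_MAX + 1.0) * (double) n);
}

/* Picks an index among nties equally good alignments. */
int
pick_among_ties (int nties) {
  return scale_random(rand(), nties);
}
```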
7812
7813    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst,
7814      index.html, src, Makefile.dna.am, Makefile.gsnaptoo.am, dynprog.c,
7815      iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_get.c, iit_store.c,
7816      iitdef.h, littleendian.h, util: Merged revisions 115687 to 115891 from
7817      branches/2013-11-19-iit-values
7818
7819    * archive.html: Moved version 2013-10-28 to archive
7820
7821    * README: Changed discussion to use gmap_build and not gmap_setup
7822
7823    * configure.ac, Makefile.am: Removing gmap_setup program
7824
7825    * gmap_build.pl.in: Added --contigs-are-mapped and --fasta-pipe options.
7826      Providing arguments to subroutines.
7827
7828    * archive.html: Added link to old database format using gamma compression
7829
7830    * index.html: Provided link to old genome database formats
7831
7832    * src, spanningelt.c: Fixed bug in code for spanning elt in regular-sized
7833      genomes
7834
7835    * index.html: Updated for version 2013-10-28
7836
78372013-11-19  twu
7838
7839    * README: Changed N1 and N2 abbreviations to NM
7840
7841    * configure.ac, Makefile.am, ensembl_genes.pl.in: Added ensembl_genes program
7842
7843    * gmap_build.pl.in: Clarified usage of --circular feature when --names
7844      feature is also used
7845
7846    * bitpack64-read.c: Declared variable needed for non-SSE2 compilation
7847
7848    * stage2.c: Revised debugging statement
7849
7850    * gmap.c: Added debugging statement
7851
7852    * oligoindex_hr.c: Made debugging statements print counts for each oligomer
7853
7854    * oligoindex.c: Resetting query_evaluated_p in Oligoindex_untally, to avoid
7855      interactions between queryseqs
7856
78572013-11-18  twu
7858
7859    * dynprog.c: Not adding dashes if gap is long.  Not performing simple genome
7860      gap if finalp is true.
7861
7862    * Makefile.dna.am, Makefile.gsnaptoo.am: Added sarray-read.c and
7863      sarray-read.h to gsnapl and uniqscanl
7864
7865    * uniqscan.c: Using new interface to Stage3hr_setup
7866
7867    * stage3hr.c, outbuffer.c: Removed unused code
7868
7869    * smooth.c, stage3.c: In traverse_dual_intron, if dual scores win over
7870      single score, protecting that exon from being smoothed by size
7871
7872    * pairdef.h: Added comment about protectedp
7873
7874    * pair.c, pair.h: Added function Pair_protect_list
7875
7876    * gmap.c: Reduced threshold for discarding gregion from 0.80*max_ncovered to
7877      0.25.
7878
7879    * stage3.c: Reduced amount of peelback.  Made fixes to improve finding of
7880      short exons, including reducing criteria for dual_canonical_p, and
7881      eliminating smoothing by size.
7882
7883    * access.c, atoiindex.c, bitpack64-read.c, bitpack64-read.h,
7884      bitpack64-speed-test.c, cmetindex.c, genome_hr.c, genome_hr.h, gmap.c,
7885      gmapindex.c, gsnap.c, iit-read-univ.c, indexdb-write.c, indexdb-write.h,
7886      indexdb.c, indexdb.h, indexdb_hr.c, indexdbdef.h, littleendian.c,
7887      littleendian.h, snpindex.c, spanningelt.c, spanningelt.h, stage1hr.c,
7888      stage2.c, types.h: Merged changes from branches/2013-10-16-huge-genomes to
7889      allow for huge genomes, where offsets (or length of positions file)
7890      exceeds 2^32 entries
7891
7892    * table.c, table.h: Changed type of keyfree procedure to take const void as
7893      argument type
7894
7895    * stage3hr.c, stage3hr.h, shortread.c, shortread.h: Allowing
7896      --fails-as-input to work with new xs files
7897
7898    * samflags.h: Changed N1 and N2 abbreviations to be NM
7899
7900    * outbuffer.c, samprint.c, samprint.h: Allowing --fastq-as-input to work
7901      with new xs files
7902
7903    * gmap_build.pl.in: Merged changes from branches/2013-10-16-huge-genomes to
7904      count offsets and to allow for huge genomes
7905
79062013-11-15  twu
7907
7908    * README: Added discussion of XS categories
7909
7910    * stage1hr.c: Setting ignore_found_score to be found_score, so the input
7911      value will be correct for those procedures
7912
7913    * stage1.c: For short sequences (< 4 times the default matchsize),
7914      performing scan_ends twice, once with a short matchsize and once with a
7915      default matchsize, to improve sensitivity and specificity
7916
7917    * outbuffer.c, samflags.h, samprint.c, samprint.h, stage3hr.c, stage3hr.h:
7918      Added xs (for quiet-if-excessive) output files and categories.  Added
7919      file_setup procedures for SAM and standard output types for GSNAP.
7920
7921    * gmap.c: Removed --quiet-if-excessive option from GMAP, because not working
7922      properly
7923
7924    * gmap_build.pl.in: Added comment
7925
7926    * indexdb-write.c: Not performing check of bitpack compression when there
7927      are no k-mers
7928
7929    * sarray-write.c: Include code for reading sarray using fread instead of mmap
7930
79312013-11-14  twu
7932
7933    * Makefile.gsnaptoo.am, gmapindex.c, sarray-write.c, sarray-write.h: Using
7934      less memory for creating LCP array and saindex.  Using permuted LCP
7935      algorithm.  Memory mapping suffix array file and compressed LCP files.
7936
7937    * genome_hr.c: Made some fixes to Genome_consecutive_matches_pair
7938
79392013-11-13  twu
7940
7941    * genome_hr.c, genome_hr.h: Implemented Genome_consecutive_matches_pair
7942
79432013-10-29  twu
7944
7945    * indexdb.c: Using a separate sanity check on positions filesize when
7946      expanding bitpack offsets
7947
7948    * stage1.c: Reducing initial matchsize for short reads to be less than half
7949      the read length
7950
7951    * stage3hr.c: Fixed bug where insert length was being computed between GMAP
7952      and a TRANSLOC_SPLICE using pair_insert_length on a NULL substring.
7953
79542013-10-25  twu
7955
7956    * samheader.c: Changed SO:unknown to SO:unsorted
7957
7958    * gregion.c: Added comment
7959
79602013-10-24  twu
7961
7962    * gregion.c: In Gregion_extend, when chrend goes past chrhigh, setting it to
7963      chrlength - 1 and not chrlength
7964
7965    * stage3.c: In peel_leftward and peel_rightward, removing initial pairs that
7966      are gaps or indels, so we don't leave a gap or indel on the top.
7967
7968    * genome_sites.c: Fixed bug in large genomes where -1U was being compared to
7969      -1UL
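
      Why such a comparison goes wrong, sketched with fixed-width types so
      that the behavior does not depend on the platform's size of long.
      The sentinel and function names are hypothetical; the entry does not
      show the affected code.

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones sentinels meaning "no site", in 32-bit and 64-bit widths. */
#define NO_SITE_32 ((uint32_t) -1)      /* 0xFFFFFFFF, like -1U */
#define NO_SITE_64 ((uint64_t) -1)      /* 0xFFFFFFFFFFFFFFFF, like -1UL on LP64 */

/* Correct test: compares a 64-bit position against the 64-bit sentinel. */
bool
site_found_p (uint64_t pos) {
  return pos != NO_SITE_64;
}

/* Buggy test: the 32-bit sentinel is zero-extended to
   0x00000000FFFFFFFF before the comparison, so it never equals the
   64-bit sentinel and a missing site is reported as found. */
bool
site_found_buggy_p (uint64_t pos) {
  return pos != NO_SITE_32;
}
```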
7970
79712013-10-23  twu
7972
7973    * samprint.c: For non-concordant pairs, setting clipdir to 0 when calling
7974      SAM_compute_chrpos.
7975
7976    * stage3hr.c: Now calling pair_insert_length_unpaired for unpaired
7977      alignments involving GMAP, which helps to eliminate duplicate alignments.
7978
7979    * README, acinclude.m4, sse2_shift_defect.m4, configure.ac: Added a test to
7980      see if compiler can handle SSE2 shift commands properly, and setting
7981      config.h variable automatically
7982
79832013-10-22  twu
7984
7985    * VERSION: Updated version number
7986
7987    * dynprog.c: Fixed bug in checking for either rlength or glength to be too
7988      long
7989
79902013-10-21  twu
7991
7992    * stage3.c: Added hooks for HMM step, but still not using it
7993
7994    * stage3.h, gmap.c: Allowing stage3debug value of middle
7995
7996    * oligoindex.c: Adjusting lookback for querypos with no hits.  Making
7997      lookback value equal for all oligoindices.
7998
79992013-10-18  twu
8000
8001    * gmap.c, stage3.c, stage3.h: Made stage3debug work
8002
80032013-10-15  twu
8004
8005    * stage1hr.c: Restored previous criteria for distant splicing
8006
8007    * stage1hr.c: Loosened criteria for distant splice probabilities.  Checking
8008      for gmap_allowance on single-end GMAP and terminal improvement GMAP
8009      alignments.
8010
8011    * samprint.c: Fixed bug in printing deletion resulting in consecutive M
8012      tokens
8013
80142013-10-11  twu
8015
8016    * indexdb.c: Fixed memory leak when snp_root is given and desired file is
8017      not found
8018
8019    * boyer-moore.c: Checking if result of Genome_get_segment_blocks_left is NULL
8020
8021    * dynprog.c, genome.c: In Genome_get_segment_blocks_right and
8022      Genome_get_segment_blocks_left, returning NULL if the entire segment is
8023      outside the chromosomal bounds
8024
8025    * psl_splices.pl.in: Replaced hard-coded path
8026
80272013-10-10  twu
8028
8029    * stage3.c: Fixed typo in variable name
8030
8031    * stage2.c: Adding query_offset to pairs in non-standard (cmet or atoi) modes
8032
80332013-10-09  twu
8034
8035    * VERSION, index.html: Updated version
8036
8037    * README: Added section on compiler issues.  Added section numbers.  Revised
8038      some sections for latest version.
8039
8040    * configure.ac: Added option --with-defective-sse2-compiler
8041
8042    * diag.c: Added debugging statements
8043
8044    * compress.c: Added code for defective SSE2 compilers
8045
8046    * stage3hr.c: In cases of overreach when concordant is expected, returning
8047      NULL
8048
8049    * gmap_build.pl.in: Added --no-sarray option to gmap_build
8050
8051    * stage3.c: Making sure not to leave an INDEL_COMP or SHORTGAP_COMP on the
8052      path/pairs after peelback.  Treating SHORTGAP_COMP the same as INDEL_COMP
8053      throughout the code.
8054
8055    * VERSION: Updated version number
8056
8057    * stage3hr.c: Fixed insert length calculation for GMAP/substring to go from
8058      start5 to end3.  Fixed overreach criteria to consider only substrings that
8059      entirely pass the end of the other hit.
8060
80612013-10-08  twu
8062
8063    * substring.c, substring.h: Added procedures for segment-based trimmed
8064      overlap
8065
8066    * stage3hr.c: Added a general segment-based algorithm for finding trimmed
8067      insert length.  Skipping cases with dual overreach.
8068
8069    * stage3hr.c: Fixed bug in computing start and end positions in computing
8070      non-GMAP trimmed insert length.  Added code for handling overreach, by
8071      changing splices to halfsplices.
8072
8073    * stage3hr.c: Fixed bug in computing overlap between non-GMAP hits
8074
80752013-10-07  twu
8076
8077    * stage3hr.c, substring.c, substring.h: Checking all possible substring
8078      combinations in computing trimmed insertlength for non-GMAP hits
8079
80802013-10-04  twu
8081
8082    * stage1hr.c: Fixed the computation of goal for finding the next chromosome
8083      bounds
8084
8085    * sarray-read.c: Handling the case where nmisses_allowed < 0.  When saindex
8086      yields no result, setting low and high to be the beginning and end of the
8087      suffix array, rather than iterating through saindex for a hit.
8088
8089    * gmap.c, stage3.c, stage3.h: In Stage3_merge_local, recovering gracefully
8090      if clipping leads to a NULL list for either Stage3_T object
8091
8092    * substring.c, substring.h: Revised calculation of substring trimmed
8093      insertlength to account for trimming of 3' end
8094
8095    * stage3hr.c: Revised calculation of GMAP-GMAP trimmed insertlength to
8096      account for trimming
8097
8098    * samprint.c: Revised criteria for printing D token in deletion.  Made
8099      compute_cigar follow the logic of print_cigar.
8100
81012013-10-03  twu
8102
8103    * samprint.c: Not computing pair overlaps when a pair involves a circular
8104      alignment
8105
8106    * stage3hr.c: Improvements to computing insertlength for calculating pair
8107      overlap.  Using querylength instead of querylength_adj.  Searching through
8108      GMAP pairarray to find intersecting genomic position.  Added checks for
8109      (left + genomiclength) > chrhigh.
8110
81112013-10-02  twu
8112
8113    * stage3.c: Reduced some parameters for looping in stage 3.  Not looping for
8114      GSNAP.
8115
8116    * bitpack64-access.c: Added an access procedure for packsize of zero
8117
8118    * bitpack64-read.c: Added a read procedure for packsize of zero
8119
8120    * sarray-write.c: Changed type of len to size_t
8121
8122    * indexdb-write.c: Handling the case when all offset differences are zero,
8123      resulting in a packsize of zero
8124
8125    * bitpack64-write.c: Allowing writing when packsize is zero
8126
8127    * stage3.c: Using new interface to Pair_print_sam
8128
8129    * config.site.rescomp.prd, config.site.rescomp.tst: Updated version number
8130
8131    * archive.html, index.html: Made changes for new version, including the one
8132      with suffix arrays
8133
8134    * outbuffer.c, pair.c, pair.h, samprint.c, samprint.h, stage3hr.c,
8135      substring.c, substring.h: Made numerous changes to fix --clip-overlap,
8136      involving trimming and GMAP alignments
8137
81382013-10-01  twu
8139
8140    * sarray-read.c: Fixed memory leak
8141
8142    * stage3.c: Turned off HMM in stage 3, because of possible core dumps
8143
8144    * VERSION, config.site.rescomp.prd: Updated version number
8145
8146    * stage3.c: Allowing iteration of outer loop based on dual_break_p.
8147      Restored filtering by HMM to remove bad sections.  Removing noncanonical
8148      end exons only if both canonical and noncanonical introns exist.
8149
8150    * index.html: Updated for latest version
8151
8152    * fa_coords.pl.in, gmap_build.pl.in: Allowing user to specify circular
8153      chromosomes using a file instead of the command line.
8154
81552013-09-30  twu
8156
8157    * bitpack64-read.c: Fixed Bitpack64_offsetptr_only to work on poly-T.
8158
8159    * trunk, src, bitpack64-read.c, dynprog.c, indexdb-write.c, sarray-write.c,
8160      util: Merged revisions from branches/2013-09-27-bidir-bitpack64 to
8161      implement bidirectional bitpack64 format
8162
8163    * stage3hr.c: Fixed Stage3pair_overlap to handle trimmed ends
8164
8165    * access.c: Disabling memory allocation on Macintoshes when single fread
8166      fails
8167
8168    * stage3hr.c, stage1hr.c: Added debugging statements
8169
8170    * resulthr.c: Handling a missing case in Pairtype_string
8171
8172    * gsnap.c: No longer setting terminal_output_minlength to be MAX_READLENGTH
8173
8174    * genome_hr.c: For determining trim mismatches, setting query_unk_mismatch_p
8175      to be false
8176
8177    * access.c: Eliminated check for i >= 0, which is always true for size_t
8178
8179    * trunk, src, dynprog.c, util: Merged revisions 109420 through 109556 from
8180      releases/internal-2013-09-27
8181
8182    * stage1hr.c, stage3hr.c, stage3hr.h: Merged revisions 109420 through 109556
8183      from releases/internal-2013-09-27 to check for concordance after running
8184      GMAP improvement on paired results
8185
8186    * madvise-flags.m4: Added check for MADV_SEQUENTIAL
8187
8188    * dynprog.c: Making NEG_INFINITY_16 and NEG_INFINITY_8 visible outside SIMD
8189      code
8190
8191    * gmap.c: No longer bounding chimera_margin by CHIMERA_SLOP
8192
8193    * stage3.c: Avoiding indels at chimeric join by cleaning ends, extending
8194      with no gaps, and then clipping at breakpoint
8195
8196    * substring.c: Fixed assertions relative to chrhigh
8197
8198    * samprint.c: Fixed bug in --clip-overlap SAM output where deletion occurs
8199      at the clip site
8200
82012013-09-27  twu
8202
8203    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
8204      number
8205
8206    * access.c: Fixed workaround for Macintosh fread bug to handle large files
8207
8208    * uniqscan.c, gsnap.c: Providing --gmap-allowance
8209
8210    * samprint.c: Fixing problem with mate flags when the mate is a translocation
8211
8212    * sarray-read.h: Passing max_end_deletions into Sarray_setup
8213
8214    * sarray-read.c: If read from beginning does not go halfway across read,
8215      trying a search from the middle of the read to see if it finds
8216      substitutions
8217
8218    * substring.c: No longer allowing donor or acceptor substrings in circular
8219      chromosomes, thereby preventing translocations to those chromosomes
8220
8221    * stage1hr.c, stage3hr.c, stage3hr.h: Using
8222      Stage3end_better_equiv_unpaired_p as a way to determine whether to do
8223      spanning set or complete set algorithms after suffix array method.
8224
8225    * stage1hr.h: Added gmap_allowance to filter bad GMAP alignments
8226
8227    * stage1hr.c: Added gmap_allowance to filter bad GMAP alignments.  No longer
8228      allowing GMAP splicing on circular chromosomes.  No GMAP improvement on
8229      sarray hits.  Doing spanning set algorithm after suffix array algorithm if
8230      nconcordant == 0.
8231
82322013-09-26  twu
8233
8234    * sarray-read.c: Applying max_end_deletions.  In collect_elt_matches,
8235      looking at multiple hits between goal and high.
8236
82372013-09-25  twu
8238
8239    * substring.c: Disallowing donor and acceptor substrings on the duplicate
8240      length of a circular chromosome
8241
82422013-09-24  twu
8243
8244    * trunk, src, dynprog.c, util: Reintegrated changes from
8245      releases/internal-2013-09-01
8246
8247    * stage3.c, stage3.h: Merged changes from releases/internal-2013-09-01 to
8248      clean ends of local merges
8249
8250    * sarray-read.c, sarray-read.h: Merged changes from
8251      releases/internal-2013-09-01 to fix problem with insertions that are too
8252      long
8253
8254    * pairpool.c: Merged changes from releases/internal-2013-09-01 to add
8255      debugging statements
8256
8257    * pair.c: Merged changes from releases/internal-2013-09-01 to fix
8258      Pair_start_bound and Pair_end_bound
8259
8260    * gsnap.c: Merged changes from releases/internal-2013-09-01 to use new
8261      interfaces to Sarray_setup
8262
8263    * gmap.c: Merged changes from releases/internal-2013-09-01 to use new
8264      interfaces to stage 3 chimera commands
8265
8266    * VERSION: Updated version number
8267
8268    * bitpack64-speed-test.c, gamma-speed-test.c: Finding both start and end of
8269      position blocks
8270
8271    * sarray-read.c, sarray-read.h: Made some fields available for benchmarking
8272      purposes
8273
82742013-09-19  twu
8275
8276    * pairpool.c: In Pairpool_clean_join, making sure that no gaps are left at
8277      the medial ends
8278
8279    * gmap.c, stage3.c, stage3.h: In Stage3_merge_local_splice, when a new
8280      intron is required, performing a full call to path_compute_dir and
8281      path_compute_final
8282
8283    * gmap.c: Fixed behavior of -n 1 in GMAP, so it does not print chimeric
8284      alignments
8285
8286    * substring.c: Added new assertions to make sure alignstart and alignend do
8287      not exceed chrhigh
8288
8289    * sarray-read.c: For indels and splices, using subtract_bounded and
8290      add_bounded to make sure the other piece is on the same chromosome
8291
82922013-09-18  twu
8293
8294    * stage3hr.c: Stage3pair_overlap inverts return value for minus alignments
8295
8296    * pair.c, stage3hr.c: Reworking of computation for pair overlap
8297
82982013-09-17  twu
8299
8300    * bitpack64-speed-test.c: Added code for uncompressing offsets
8301
8302    * stage3hr.c: Computing pair overlap based on difference between totallength
8303      and insertlength, not alignment coordinates
8304
8305    * indexdb.c: Fixed user message about expanding bitpackcomp
8306
8307    * stage1hr.c: Fixed memory leak when newpair involving GMAP is not kept
8308
8309    * stage3.c: When queryjump > nullgap and genomejump < 16 (not enough
8310      material for stage 2), performing cDNA gap rather than dual break.  In
8311      path_compute_final, resolving dual breaks as single gaps.
8312
8313    * stage3hr.c: Fixed memory leak when resolving inner splices
8314
8315    * stage3hr.c: Fixed memory leak
8316
8317    * shortread.c: Fixed invalid read in chopping adapters when read lengths are
8318      unequal
8319
8320    * stage3.c: Removed uninitialized variable
8321
83222013-09-16  twu
8323
8324    * bitpack64-speed-test.c: Initial import into SVN
8325
8326    * Makefile.dna.am: Made instructions match those for Makefile.gsnaptoo.am.
8327      Added gamma-speed-test and bitpack64-speed-test.
8328
8329    * gamma-speed-test.c: Simplified code to focus only on decoding
8330
8331    * splicing-score.c: Fixed code for universal chromosome IIT
8332
8333    * indexdb.c, indexdb.h: Exposing different types of filename retrieval
8334
8335    * gregion.c: Considering fraction of overlap
8336
8337    * samprint.c: Fixed CIGAR strings for splicing on minus strand with
8338      clip-overlap
8339
83402013-09-13  twu
8341
8342    * gmap.c: Increased default value for shortsplicedist from 200,000 to
8343      2,000,000.  For chimeric alignments that overlap, fixed bug where
8344      maxpeelback was negative, and now computing based on midpoint of the ends.
8345
8346    * stage3.h: In Stage3_mergeable, using shortsplicedist
8347
8348    * stage3.c: No longer calling Smooth_pairs_by_netgap.  In Stage3_mergeable,
8349      using shortsplicedist.  For extend5 and extend3, handling case where path
8350      or pairs is NULL after peelback.
8351
83522013-09-12  twu
8353
8354    * index.html: Updated for version 2013-09-11
8355
83562013-09-11  twu
8357
8358    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
8359      numbers
8360
8361    * stage1.c: Added debugging statement
8362
8363    * gregion.c: For determining overlaps, using chrstart and chrend instead of
8364      extentstart and extentend
8365
8366    * pairpool.c, pairpool.h, stage3.c: In merging for local splice, added
8367      procedure Pairpool_clean_join to peel back both ends to remove negative
8368      genomejumps
8369
8370    * stage3hr.c: Fixed memory leak on new pairs.  Handling the case where pair
8371      overlap is negative.
8372
8373    * stage1hr.c: Fixed memory leak on terminal alignments that fail
8374      terminal_output_minlength test
8375
8376    * shortread.c: Not checking for Illumina endings on shortreads that are
8377      skipped
8378
8379    * dynprog.c: Initializing value of introntype
8380
8381    * pair.c: Fixed memory leak under --clip-overlap feature
8382
8383    * stage3.c: Requiring 25 bp on both sides of a chimeric alignment
8384
83852013-09-10  twu
8386
8387    * sarray-read.c: Fixed bug in Elt_fill_positions_all where position could be
8388      negative
8389
83902013-09-09  twu
8391
8392    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
8393      number
8394
8395    * stage3.c: Restored finding of microexons when sequence quality is high
8396
8397    * pair.c: Making value of trim_5p dependent on trim_left
8398
8399    * iit_get.c: Fixed behavior of iit_get to work when reading queries from
8400      stdin
8401
8402    * stage3.c: In build_pairs_introns, when finalp is true, solving cDNA gaps
8403      as single gaps
8404
8405    * stage3.c: When finalp is true, not relying upon stored solutions for any
8406      traversal functions.  Performing iterative calls to trim non-canonical
8407      introns at ends.
8408
8409    * stage3.c: In traverse_single_gap, not relying upon previous result if
8410      finalp is true
8411
8412    * stage2.c: Checking to make sure calls to genome canonicalp procedures do
8413      not have negative coordinates
8414
8415    * gmap.c: Initializing genome_sites for pairalign and usersegment queries
8416
8417    * samprint.c: Allowing printing of results where chrpos == 0
8418
84192013-09-06  twu
8420
8421    * index.html: Updated for latest version
8422
8423    * stage1hr.c: Fixed bug in add_bounded where coordinates extended past
8424      chrhigh
8425
8426    * dynprog.c: Fixed problem where prob_trunc was not being initialized
8427
8428    * fa_coords.pl.in, gmap_build.pl.in: Added -n flag for substituting
8429      chromosome names
8430
84312013-09-05  twu
8432
8433    * gmap.c, gsnap.c: Allowed compilation when pthreads are disabled
8434
84352013-09-04  twu
8436
8437    * stage1hr.c: Fixed a fatal bug in find_middle_indels when floors is NULL,
8438      which results from all oligos being omitted
8439
8440    * README: Added discussion about output types
8441
8442    * stage3.c: Put an absolute limit on peelback.  In build_pairs_introns, when
8443      queryjump exceeds nullgap, traversing a dual break.
8444
8445    * pair.c: Eliminated possible infinite loop in
8446      Pair_guess_cdna_direction_array
8447
8448    * gsnap.c: Fixed error message
8449
84502013-09-03  twu
8451
8452    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
8453      number
8454
8455    * outbuffer.c, stage3.c, stage3.h: Providing XO abbrev information for SAM
8456      output from GMAP
8457
8458    * outbuffer.c, pair.c, pair.h, samflags.h, samprint.c, samprint.h: Added XO
8459      flag to SAM output
8460
84612013-08-31  twu
8462
8463    * gsnap.c, stage3hr.c, stage3hr.h, uniqscan.c: Fixed genomicstart for
8464      distant samechr splices when --merge-distant-samechr is specified
8465
8466    * gregion.c: Fixed bug where overlaps were found in error because we used
8467      chromosomal coordinates instead of universal coordinates
8468
8469    * dynprog.c: Made single and paired indel penalties the same
8470
8471    * README, configure.ac: Modified to include gvf_iit program
8472
8473    * Makefile.am, gvf_iit.pl.in: Added gvf_iit program
8474
84752013-08-30  twu
8476
8477    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, uniqscan.c:
8478      Filtering terminals with minlength less than value of
8479      --terminal-output-minlength
8480
8481    * stage3hr.c, stage3hr.h: Creating Stage3pair_T for terminals only if the
8482      terminal lengths exceed terminal_output_minlength
8483
84842013-08-29  twu
8485
8486    * trunk, src, dynprog.c, dynprog.h, gmap.c, gsnap.c, pair.c, pair.h,
8487      stage1hr.c, stage2.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h, util:
8488      Did reintegration merge (revisions 106186 to 106267) of
8489      branches/2013-08-20-stage23-work
8490
8491    * trunk, src, util: Commit property changes
8492
84932013-08-28  twu
8494
8495    * trunk, config.site.rescomp.tst, src, Makefile.gsnaptoo.am, diag.c, diag.h,
8496      doublelist.c, dynprog.c, dynprog.h, genome_sites.c, genome_sites.h,
8497      gmap.c, gsnap.c, pairpool.c, pairpool.h, smooth.c, smooth.h, stage1.c,
8498      stage1hr.c, stage2.c, stage2.h, stage3.c, stage3.h, stage3hr.c,
8499      uniqscan.c, util: Merged revisions 105272 to 106186 from
8500      branches/2013-08-20-stage23-work to improve stage 2 and stage 3 procedures
8501
85022013-08-20  twu
8503
8504    * trunk, src, dynprog.c, gmap.c, scores.h, util: Reintegrated changes from
8505      branches/2013-08-20-stage23-work to fix alignment rankings and evaluation
8506      of chimeric alignment
8507
8508    * index.html: Updated for latest version
8509
8510    * gmap_build.pl.in: Handling case where -d argument contains a directory
8511
8512    * stage3.c: Fixed infinite loop in peelback.  Made some changes in stage 3
8513      procedures.
8514
8515    * smooth.c, smooth.h: Restoring procedures for smoothing by net gap
8516
8517    * pairpool.c: If all pairs are outside bounds, returning NULL
8518
8519    * dynprog.c: Reduced indel penalties for low-quality alignments
8520
85212013-08-19  twu
8522
8523    * outbuffer.c: Adding q() line for R output
8524
85252013-08-13  twu
8526
8527    * stage3hr.c: Restoring filter where terminal alignments are removed if
8528      mismatches in trimmed region are too high.
8529
8530    * stage1hr.c: In computing genomic bounds for GMAP alignment, truncating at
8531      chromosomal boundaries
8532
8533    * chimera.c, chimera.h, gmap.c: For local chimeric joins, requiring that
8534      pieces are locally joinable in Chimera_bestpath
8535
85362013-08-07  twu
8537
8538    * dynprog.c: Added cache flush command needed for opencc compiler using -O3
8539      optimization on AMD computers
8540
85412013-08-06  twu
8542
8543    * genome_sites.c: Added lookup tables needed when popcnt is not available
8544
8545    * genome_hr.c: Made debugging statements work when popcnt is not available
8546
85472013-08-02  twu
8548
8549    * acinclude.m4, configure.ac: Removed popcnt.m4 and relying upon ACX_BUILTIN
8550
8551    * Makefile.gsnaptoo.am, outbuffer.c, samheader.c, samheader.h, samprint.c,
8552      samprint.h: Moved SAM header code to new samheader.c file
8553
8554    * gmap.c, gsnap.c, outbuffer.c, outbuffer.h, samprint.c, samprint.h: Added
8555      @HD and @PG header lines to SAM output
8556
85572013-07-25  twu
8558
8559    * trunk, NOTICE, config.site.rescomp.prd, src, Makefile.gsnaptoo.am,
8560      dynprog.c, fastlog.h, mapq.c, mapq.h, oligoindex.c, oligoindex_hr.c,
8561      pair.c, pair.h, pairpool.c, smooth.c, smooth.h, stage1hr.c, stage2.c,
8562      stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h,
8563      univinterval.c, univinterval.h, util: Reintegrated revisions from
8564      branches/2013-07-24-fastlog to use a fast approximate log function, to
8565      change calloc to malloc in several places, to eliminate a smoothing step,
8566      and to improve the oligoindex SIMD procedures slightly
8567
85682013-07-24  twu
8569
8570    * trunk, src, Makefile.gsnaptoo.am, compress.c, compress.h, dynprog.c,
8571      dynprog.h, genome_hr.c, gmap.c, pair.c, pairpool.c, pairpool.h,
8572      stage1hr.c, stage2.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h, table.c,
8573      table.h, univinterval.c, univinterval.h, util: Merged revisions 102329 to
8574      102725 from branches/2012-07-22-stage3-dir to restructure stage 3
8575      algorithm, including reduction of insert_gapholders and early
8576      determination of cdna direction; to add gmap_history to memoize GMAP
8577      alignment results; and to add SIMD instructions for Compress_shift
8578
85792013-07-23  twu
8580
8581    * stage3.c: Changing Pairpool_pop to List_transfer_one in some procedures
8582
85832013-07-22  twu
8584
8585    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, src, dynprog.c,
8586      dynprog.h, stage3.c, util: Reintegrated revisions from
8587      branches/2013-07-22-stage3-dir to determine cdna direction early in stage 3
8588
8589    * VERSION, index.html: Updated version number
8590
8591    * dynprog.c: Added memory fences at beginning of each SIMD loop
8592
8593    * stage2.c: Returned to safer code that keeps ranges 3 and 4 separate
8594
8595    * dynprog.c: Added needed variables
8596
8597    * dynprog.c: Making a single call to find splice sites from IIT file,
8598      instead of individual calls.  Removed old code.
8599
8600    * iit-read.c, iit-read.h: Added functions IIT_get_lows_signed and
8601      IIT_get_highs_signed
8602
8603    * dynprog.c: In bridge_intron_gap procedures, using genomic sequence instead
8604      of get_genomic_nt.  Also, implemented code for a single call to
8605      splicesites IIT, instead of one call for each position.
8606
86072013-07-19  twu
8608
8609    * gmap_setup.pl.in: Added backslash escape that was missing
8610
8611    * gsnap.c: Fixed help string
8612
8613    * sarray-read.c: Handling N's in query sequence
8614
8615    * types.h, compress-write.h: Added comment
8616
8617    * iit-write-univ.c: Fixed message
8618
8619    * types.h: Prefer to use unsigned long long for UINT8
8620
8621    * stage3hr.c: Changed debugging statement
8622
8623    * stage1hr.c: Changed single-end statement for terminals to look like
8624      paired-end statement
8625
8626    * gsnap.c: Turned default for --terminal-threshold back to 2 for DNA-Seq
8627      alignments, since terminal alignments are needed for some GMAP alignments
8628
8629    * bitpack64-write.c: Using _mm_store_si128 instead of type casting and a
8630      memory fence
8631
8632    * iit-write-univ.c: Added message about coordinate sizes
8633
8634    * iit-write.c: Including types.h just to make sure we have it
8635
8636    * stage1hr.c: Keeping lowprob splices, unless dominated by other splices
8637
8638    * gsnap.c, uniqscan.c: Restored --pairexpect and --pairdev flags.  Giving
8639      information to Stage3hr_setup.
8640
8641    * stage3hr.h, stage3hr.c: Using insertlength, outerlength, and splice
8642      probabilities to resolve difficult cases
8643
8644    * dynprog.c: Added memory fences between SIMD and non-SIMD code
8645
86462013-07-18  twu
8647
8648    * bitpack64-write.c: Added comment
8649
8650    * bitpack64-write.c: Added _mm_lfence to take care of incorrect SIMD
8651      behavior.
8652
8653    * sarray-write.c: Fixed procedure for checking bitpack compression
8654
8655    * indexdb.c: Fixed procedure for expanding bitpack
8656
8657    * indexdb-write.c: Added procedure for checking bitpack compression
8658
8659    * indexdb.c: Added extra information to message when expanding offsets
8660
8661    * bitpack64-access.c: Added extra information to error message
8662
8663    * bitpack64-read.c: Fixed Bitpack64_block_offsets function
8664
86652013-07-17  twu
8666
8667    * trunk, config.site.rescomp.tst, src, dynprog.c, util: Reintegrated
8668      revisions from branches/2013-07-16-faster-stage2
8669
8670    * stage2.c: Reintegrated revisions from branches/2013-07-16-faster-stage2 to
8671      speed up stage2 procedure
8672
8673    * trunk, src, dynprog.c, goby.c, goby.h, inbuffer.c, shortread.c,
8674      shortread.h, uniqscan.c, util: Reintegrated changes from
8675      branches/2013-07-17-reduce-memset to not allocate memory for short reads
8676      that are skipped
8677
8678    * VERSION, index.html: Updated version number
8679
8680    * dynprog.c: Preventing read of uninitialized variable at matrix position
8681      0,0 during traceback
8682
8683    * dynprog.c: Not initializing directions_Egap or directions_nogap
8684
8685    * stage1.c: In find_first_pair, restored decision of which end to advance
8686      based on number of hits
8687
86882013-07-16  twu
8689
8690    * gregion.c: Turned off debugging
8691
8692    * stage1.c: Allowing arbitrarily long scanning of ends to find first pair
8693
8694    * sequence.c: Modified definition of Sequence_trimlength
8695
8696    * gregion.c: Handling case where extentstart goes past genomic position 0
8697
8698    * gmap.c: Handling case where genomebits is absent
8699
8700    * genome.c: Handling case of negative genomic coordinates
8701
8702    * diag.c, oligoindex_hr.c, stage3.c: Fixed ends to capture last oligomers of
8703      sequence and chimeras exactly
8704
8705    * stage2.c: Added comment
8706
8707    * bitpack64-access.c: Fixed size of accessor table
8708
8709    * dynprog.c: For non-SSE4.1 8-bit SIMD, using a separate pairscore array
8710      incremented by 128.
8711
87122013-07-15  twu
8713
8714    * dynprog.c: Turned off debugging
8715
8716    * trunk, src, dynprog.c, util: Reintegrated revisions from
8717      branches/2013-07-15-sse2-simd8 allowing SSE computers to run 8-bit SIMD
8718      dynamic programming procedures
8719
8720    * VERSION, config.site.rescomp.prd, index.html: Updated version number
8721
8722    * dynprog.c: Before calling Boyer-Moore procedure, requiring that textlen >=
8723      querylen
8724
8725    * dynprog.c: Not calling Boyer-Moore procedure if textright <= textleft
8726
8727    * dynprog.c: Fixed variable name for SSE2 code
8728
8729    * gsnap.c: Fixed --help comment for --mode flag
8730
8731    * gmap.c: Including header file compress-write.h
8732
8733    * compress-write.c: Fixed bug in writing genomebits for user-provided
8734      segment in GMAP
8735
8736    * genomicpos.c, genomicpos.h: Fixed formatting routine to work on large files
8737
8738    * trunk, Makefile.am, src, Makefile.gsnaptoo.am, bitpack64-access.c,
8739      bitpack64-access.h, bitpack64-write.c, bitpack64-write.h, gmapindex.c,
8740      gsnap.c, indexdb-write.c, sarray-read.c, sarray-write.c, sarray-write.h,
8741      util, gmap_build.pl.in, gmap_setup.pl.in: Reintegrated revisions from
8742      branches/2013-07-11-compress-lcp to compress lcp file
8743
8744    * stage1hr.c: Again disallowing diagonals < querylength, which lead to
8745      left < 0
8746
8747    * boyer-moore.c, dynprog.c: Using new Genome_get_segment_blocks_left and
8748      Genome_get_segment_blocks_right procedures
8749
8750    * genome.c, genome.h: Added separate Genome_get_segment_blocks_left and
8751      Genome_get_segment_blocks_right
8752
87532013-07-14  twu
8754
8755    * stage1hr.c: In batch_init and identify_all_segments, allowing diagonal to
8756      be less than querylength, needed for insertions in reads at the beginning
8757      of the genome.
8758
8759    * genome.c: Fixed memory leak when genomebits file is not available
8760
8761    * Makefile.am, NOTICE: Added NOTICE file for distribution
8762
8763    * Makefile.gsnaptoo.am: Using new saca-k.c and saca-k.h files
8764
8765    * compress-write.c, genome.c: Fixed bug where coordinates above 2^31 were
8766      being treated as negative values
8767
8768    * iit_store.c: For IITs with divs (versions above 1), converting from
8769      Univinterval_T objects to Interval_T objects
8770
8771    * saca-k.c, saca-k.h, sarray-write.c: Moved suffix array construction
8772      procedures to a separate file.  Using latest version of SACA-K code.
8773
8774    * gmap_build.pl.in: Added command for making suffix array
8775
87762013-07-11  twu
8777
8778    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
8779      Makefile.gsnaptoo.am, compress-write.c, compress-write.h, compress.c,
8780      compress.h, compress128.c, compress128.h, genome-write.c, genome-write.h,
8781      genome.c, genome.h, genome128-write.c, genome128-write.h, genome128.c,
8782      genome128.h, genome128_hr.c, genome128_hr.h, genome_hr.c, genome_hr.h,
8783      genome_sites.c, genome_sites.h, get-genome.c, gmap.c, gmapindex.c,
8784      gsnap.c, indexdb-write.c, maxent128_hr.c, maxent128_hr.h, snpindex.c,
8785      splice.c, splicetrie.c, splicetrie_build.c, stage1hr.c, stage2.c,
8786      uniqscan.c, util, gmap_build.pl.in: Merged changes from
8787      branches/2013-07-05-new-genomecomp to implement 32-bit unshuffled
8788      representation of genome
8789
8790    * gmap_setup.pl.in: Added offsets, bitpackptrs, and bitpackcomp suffixes
8791
8792    * gff3_splicesites.pl.in: Removed debugging statement
8793
8794    * boyer-moore.c, dynprog.c: Using Genome_get_segment_blocks to avoid making
8795      repeated calls to Genome_get_char_blocks
8796
8797    * genome.c, genome.h: Implemented Genome_get_segment_blocks
8798
8799    * sarray-read.c: Fixed bug in known splicing, where index j was not being
8800      incremented
8801
8802    * samprint.c: Fixed hardclipping ends for circular alignments
8803
88042013-07-10  twu
8805
8806    * stage1hr.c: Fixed debugging statement
8807
88082013-07-09  twu
8809
8810    * gff3_introns.pl.in: Fixed warning statements
8811
88122013-07-05  twu
8813
8814    * setup1.test.in: Fixed test to use gmap_build, rather than gmap_setup
8815
8816    * oligoindex_hr.c: Handling special cases now within general case.  Not
8817      allowing count/store rev SIMD procedures to have ptr go below 0.  Fixed
8818      debugging comparison of std and SIMD results.
8819
8820    * gmap.c, stage2.c: Using new interfaces to Oligoindex_untally and
8821      Oligoindex_clear_inquery
8822
8823    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
8824      number
8825
8826    * sarray-read.c, sarray-write.c: Changed lcp file suffix to be .salcp.
8827      Checking to see if suffix array files are present, and returning NULL
8828      gracefully if not.
8829
8830    * oligoindex.c, oligoindex.h: Clearing Oligoindex_T data structures after
8831      alignment by going through oligomers in the query, rather than using
8832      memset on the entire structure.
8833
8834    * uniqscan.c: Using new interface to Stage1hr_setup
8835
8836    * gsnap.c: Added --use-sarray flag
8837
8838    * stage1hr.c, stage1hr.h: Added parameter use_sarray_p to setup procedure.
8839      Improved warning message for very short reads.
8840
8841    * gff3_genes.pl.in: Printing genes with a single exon
8842
8843    * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Allowing
8844      type CDS to be equivalent to exon
8845
88462013-07-04  twu
8847
8848    * oligoindex_hr.c: Removed global memcpy of oligospace in
8849      allocate_positions, and copying individual pointers at the same time as
8850      positions are assigned.
8851
8852    * gsnap.c, uniqscan.c: Using new interfaces to Stage1hr procedures
8853
8854    * stage1hr.c, stage1hr.h: Generalized mergeable list to be used for middle
8855      indels as well as splicing
8856
8857    * oligoindex_hr.c: Fixed values for startdiscard and enddiscard in special
8858      case of count_fwdrev_simd and store_fwdrev_simd when startptr + 3 ==
8859      endptr.
8860
8861    * sarray-read.c: Implemented SIMD procedure for scanning array in
8862      Elt_fill_positions_filtered
8863
88642013-07-03  twu
8865
8866    * gsnap.c: For a single read, changing sarray access to be USE_MMAP_ONLY
8867
8868    * stage1hr.c: Removed requirement of nconcordant == 0 from performing GMAP
8869      terminal alignments.  The concordant pairs can sometimes include pairs of
8870      terminal alignments.
8871
8872    * sarray-read.c: In calls to Genome_count_mismatches_limit, changed
8873      nmismatches_allowed parameter from incorrect value of 2 to nmisses_allowed.
8874
8875    * sarray-read.c: Increased value of SARRAY_EXCESS_HITS from 1000 to 100,000.
8876      Providing separate actions for USE_MMAP_PRELOAD and USE_MMAP_ONLY in
8877      Sarray_new.
8878
8879    * trunk, src, Makefile.gsnaptoo.am, access.c, genome.c, genome.h,
8880      genome_hr.c, genome_hr.h, gmapindex.c, gsnap.c, pair.c, pair.h,
8881      samprint.c, sarray-read.c, sarray-read.h, sarray-write.c, sarray-write.h,
8882      spanningelt.c, splice.c, splice.h, stage1hr.c, stage3hr.c, stage3hr.h,
8883      types.h, util: Merged revisions 100273 to 100402 from
8884      branches/2013-07-02-suffix-array-redo to implement suffix array algorithm
8885
88862013-07-02  twu
8887
8888    * gsnap.c: Added comment
8889
8890    * gmap_build.pl.in: When compression type is specified to be none, setting
8891      base size to be equal to the k-mer size.
8892
8893    * gmapindex.c: Fixed bug from failing to initialize compression_types.
8894      Changing to NO_COMPRESSION when base size is equal to k-mer size.
8895
8896    * indexdb-write.c, indexdb-write.h: When compression_type is NO_COMPRESSION,
8897      writing file as "offsets", rather than "offsetscomp".
8898
8899    * indexdb.c, indexdb_hr.c: Checking for NO_COMPRESSION case first
8900
8901    * oligoindex_hr.c: Fixed case for SIMD where startptr and endptr are 3 units
8902      apart (adjacent blocks).
8903
8904    * shortread.c: Ignoring accession and header for second queryseq in
8905      paired-end FASTA format over two files
8906
8907    * indexdb-write.c: Computing basesize separately for bitpack and gamma
8908      compression
8909
8910    * atoiindex.c, cmetindex.c, gmapindex.c, indexdb.c, indexdb.h, snpindex.c:
8911      Added offsets_only_p argument to Indexdb_get_filenames
8912
8913    * gmap_build.pl.in: Added -z flag for specifying compression types
8914
8915    * trunk, config.site.rescomp.prd, src, util: Merged changes from
8916      branches/2013-07-01-faster-splicing to speed up find_singlesplices and
8917      find_doublesplices
8918
8919    * gsnap.c, uniqscan.c: Using new interface to Stage1hr procedures
8920
8921    * stage1hr.c, stage1hr.h: Merged changes from
8922      branches/2013-07-01-faster-splicing to use a spliceable array.  Removed
8923      code for quicksort version of identify_all_segments.
8924
8925    * genome_hr.c: Added comment about unshuffle procedure
8926
8927    * VERSION, config.site.rescomp.tst: Updated version number
8928
8929    * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Handling
8930      GFF3 files that lack gene lines
8931
8932    * indexdb.c: Changed stderr message when allocating memory for output
8933      pointers
8934
8935    * bitpack64-read.c: Using a procedure dispatch table instead of a switch
8936      statement
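Note on the bitpack64-read.c entry above: a minimal sketch of what replacing a switch statement with a procedure dispatch table can look like. The names (Unpacker_T, unpack_00, unpack_08, unpack_16, unpack_table, unpack_first) and the three toy unpackers are hypothetical and are not taken from bitpack64-read.c, whose actual unpack procedures are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unpack procedures, one per supported pack size.
   Each returns the first packed value from the input word. */
typedef unsigned int (*Unpacker_T)(const unsigned int *in);

static unsigned int unpack_00(const unsigned int *in) { (void) in; return 0U; }
static unsigned int unpack_08(const unsigned int *in) { return in[0] & 0xFFU; }
static unsigned int unpack_16(const unsigned int *in) { return in[0] & 0xFFFFU; }

/* Dispatch table indexed by packsize / 8.  A switch on packsize would
   select the same procedures; the table turns that into one indexed call. */
static const Unpacker_T unpack_table[] = { unpack_00, unpack_08, unpack_16 };

#define N_UNPACKERS (sizeof(unpack_table) / sizeof(unpack_table[0]))

unsigned int unpack_first(const unsigned int *in, unsigned int packsize) {
  size_t index = packsize / 8U;

  /* Only pack sizes of 0, 8, and 16 bits are covered by this toy table */
  assert(packsize % 8U == 0U && index < N_UNPACKERS);
  return unpack_table[index](in);
}
```

The table keeps the mapping from pack size to procedure in one data structure, so adding a pack size means adding one table entry rather than one more case label.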
8937
89382013-07-01  twu
8939
8940    * atoiindex.c, cmetindex.c, indexdb-write.c, indexdb-write.h, snpindex.c:
8941      Added support for bitpack compression
8942
8943    * gmapindex.c, indexdb.c, indexdb.h, indexdb-write.c, indexdb-write.h: Added
8944      ability to write positions from offsets compressed with bitpacking method
8945
8946    * bitpack64-read.c: Implemented Bitpack64_block_offsets to compute all
8947      offsets
8948
89492013-06-30  twu
8950
8951    * bitpack64-read.c: Gave distinct variable names for debugging statements
8952
8953    * indexdb-write.c, bitpack64-write.c: Added code so bitpack offsets file can
8954      be written without SIMD instructions.
8955
8956    * bitpack16-read.c, bitpack32-read.c: Fixed debugging code
8957
8958    * bitpack64-read.c: Made some speed improvements in non-SIMD code by only
8959      adding as many terms as needed.
8960
8961    * bitpack64-read.c: Added code for decoding of bitpack files without SSE2
8962      present
8963
8964    * Makefile.gsnaptoo.am, bitpack64-read.c, bitpack64-read.h,
8965      bitpack64-write.c, bitpack64-write.h, gmapindex.c, indexdb-write.c,
8966      indexdb.c, indexdb_hr.c, indexdbdef.h: Changed back to 64-element blocks,
8967      with only two pieces of meta-information, pointer and offset0, and
8968      packsize inferred from successive pointers.
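Note on the entry above: a minimal sketch of how a packsize might be inferred from successive pointers, so that it need not be stored as a third piece of meta-information. The layout here is an assumption for illustration only: meta-information is taken to be stored as (pointer, offset0) pairs, pointers are taken to count 32-bit words of packed data, and one extra pointer is taken to follow the last block. The function name infer_packsize is hypothetical.

```c
#include <assert.h>

#define BLOCKSIZE 64U   /* elements per block, as stated in the entry */
#define WORD_BITS 32U   /* assumption: pointers count 32-bit words */

/* Assumed layout: meta[2*b] is the pointer for block b and meta[2*b + 1]
   is its offset0.  A block of BLOCKSIZE values packed at packsize bits each
   occupies BLOCKSIZE * packsize / WORD_BITS words, so the difference between
   successive pointers determines packsize. */
unsigned int infer_packsize(const unsigned int *meta, unsigned int block) {
  unsigned int ptr = meta[2U * block];
  unsigned int next_ptr = meta[2U * (block + 1U)];
  unsigned int nwords;

  assert(next_ptr >= ptr);
  nwords = next_ptr - ptr;
  return nwords * WORD_BITS / BLOCKSIZE;
}
```

Under these assumptions, a block that occupies 12 words has a packsize of 6 bits, and a block whose pointer equals the next pointer has a packsize of 0.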
8969
8970    * bitpack32-read.c, bitpack32-read.h: Performing cumulative sums within each
8971      unpack procedure
8972
89732013-06-29  twu
8974
8975    * bitpack32-read.c, bitpack32-read.h, bitpack32-write.c, bitpack32-write.h,
8976      indexdb-write.c, indexdb.c, indexdb_hr.c, indexdbdef.h, gmapindex.c,
8977      Makefile.gsnaptoo.am: Changed from 64-element blocks to 32-element
8978      half-blocks
8979
8980    * indexdb.c, indexdb.h: Added Indexdb_get_filenames procedure that returns a
8981      Filenames_T object.  Put definition of Filenames_T into header file.
8982
8983    * genome_hr.c, genome_hr.h: Using Offsetscomp_T type
8984
8985    * bitpack16-write.c: Changed write procedure to take a pointer as argument,
8986      to avoid compiler warnings
8987
8988    * bitpack16-read.h: Added procedure to compute all offsets
8989
8990    * gmapindex.c, indexdb-write.h: Determining selection of filenames based on
8991      compression type for offsets.
8992
8993    * bitpack16-read.c, indexdb-write.c: Changed format to put all metablock
8994      information, including offset and packsize, into the pointers file
8995
89962013-06-28  twu
8997
8998    * indexdbdef.h: Added values for compression types, and compression type
8999      field in Indexdb_T object.
9000
9001    * indexdb_hr.c: Calling bitpack or gamma compression at run time, as needed
9002
9003    * indexdb.c, indexdb.h: Searching for bitpack format, then gamma format, and
9004      then no compression. Created Filenames_T object to standardize routines.
9005
9006    * indexdb-write.c, indexdb-write.h: Added compression_types as a parameter,
9007      so multiple formats can be written
9008
9009    * bitpack16-read.c, bitpack16-read.h: Moved data structures to static
9010      variables, so they do not need to be passed as arguments each time.
9011
9012    * bitpack16-read.c, bitpack16-read.h: Using Blocksize_T type
9013
9014    * indexdb-write.c, indexdb-write.h: Restored lost code from large genome
9015      revisions
9016
9017    * Makefile.gsnaptoo.am: Added bitpack files for large genome programs
9018
9019    * trunk, config.site.rescomp.prd, src, Makefile.gsnaptoo.am,
9020      bitpack16-read.c, bitpack16-read.h, bitpack16-write.c, bitpack16-write.h,
9021      indexdb-write.c, indexdb-write.h, indexdb.c, indexdb_hr.c, util:
9022      Reintegrated revisions from branches/2013-06-14-bitpacking to add bitpack
9023      compression code
9024
9025    * trunk, index.html, src, util: Added link to large genomes version
9026
9027    * oligoindex_hr.c: Merged revisions 99785 to 99781 from
9028      branches/2013-06-27-simd-oligo to add SIMD code for counting and storing
9029      oligomers
9030
90312013-06-27  twu
9032
9033    * types.h: Added comment
9034
9035    * stage1.c, gsnap.c, gmap.c: Using Width_T type
9036
9037    * spanningelt.c, spanningelt.h: Using Width_T types
9038
9039    * pair.c, pair.h: Changed types for binary search to be Chrpos_T
9040
9041    * indexdbdef.h, indexdb.c, indexdb.h: Using Width_T and Blocksize_T types
9042
9043    * indexdb-write.c: Resolved compiler warnings about signed/unsigned
9044      comparisons
9045
9046    * genome_hr.c, genome_hr.h: Using Blocksize_T type for offsetscomp_blocksize
9047
9048    * block.c, block.h: Using Width_T type for oligosize
9049
9050    * types.h: Added Width_T and Blocksize_T types
9051
9052    * trunk, README, configure.ac, src, Makefile.gsnaptoo.am, access.c,
9053      alphabet.c, alphabet.h, atoi.c, atoiindex.c, bigendian.c, bigendian.h,
9054      block.c, block.h, boyer-moore.c, boyer-moore.h, chimera.c, chimera.h,
9055      chrnum.c, chrnum.h, chrom.c, chrom.h, chrsubset.c, chrsubset.h, cmet.c,
9056      cmetindex.c, compress.c, compress.h, diag.c, diag.h, diagdef.h, dynprog.c,
9057      dynprog.h, gdiag.c, genome-write.c, genome-write.h, genome.c, genome.h,
9058      genome_hr.c, genome_hr.h, genomicpos.c, genomicpos.h, genuncompress.c,
9059      get-genome.c, gmap.c, gmapindex.c, goby.c, goby.h, gregion.c, gregion.h,
9060      gsnap.c, gsnap_tally.c, iit-read-univ.c, iit-read-univ.h, iit-read.c,
9061      iit-read.h, iit-write-univ.c, iit-write-univ.h, iit-write.c, iit_dump.c,
9062      iit_get.c, iit_store.c, iitdef.h, indexdb-write.c, indexdb-write.h,
9063      indexdb.c, indexdb.h, indexdb_hr.c, indexdb_hr.h, indexdbdef.h,
9064      interval.c, interval.h, intron.c, intron.h, littleendian.c,
9065      littleendian.h, mapq.c, mapq.h, match.c, match.h, matchdef.h, matchpool.c,
9066      matchpool.h, maxent_hr.c, maxent_hr.h, oligo.c, oligo.h, oligoindex.c,
9067      oligoindex.h, oligoindex_hr.c, oligoindex_hr.h, oligop.c, oligop.h,
9068      outbuffer.c, outbuffer.h, pair.c, pair.h, pairdef.h, parserange.c,
9069      parserange.h, samprint.c, samprint.h, segmentpos.c, segmentpos.h,
9070      snpindex.c, spanningelt.c, spanningelt.h, splicetrie.c, splicetrie.h,
9071      splicetrie_build.c, splicetrie_build.h, stage1.c, stage1.h, stage1hr.c,
9072      stage1hr.h, stage2.c, stage2.h, stage3.c, stage3.h, stage3hr.c,
9073      stage3hr.h, substring.c, substring.h, tableint.c, tableuint.c,
9074      tableuint.h, tableuint8.c, tableuint8.h, types.h, uint8list.c,
9075      uint8list.h, uintlist.c, uintlist.h, uniqscan.c, univinterval.c,
9076      univinterval.h, iittest.iit.ok, util: Reintegrated changes from
9077      branches/2012-02-14-biggenomes to handle large genomes
9078
9079    * index.html: Updated comments for latest version
9080
9081    * trunk, src, dynprog.c, oligoindex.c, oligoindex.h, oligoindex_hr.c,
9082      stage2.c, util: Merged revisions 99648 to 99702 from
9083      branches/2013-06-25-simd-8 to use unsigned chars for counts, and 8-bit
9084      SIMD instructions for allocate_positions
9085
9086    * stage3.c: Removed assertion that c != g when filling in a single bp gap,
9087      which can occur with overabundant oligomers, such as poly-A
9088
90892013-06-26  twu
9090
9091    * VERSION, index.html: Updated version number
9092
9093    * trunk, ax_ext.m4, config.site, config.site.rescomp.tst, configure.ac, src,
9094      boyer-moore.c, dynprog.c, gmap.c, gsnap.c, stage3hr.c, util: Merged
9095      revisions 99462 to 99647 from branches/2013-06-25-simd-8 to allow for
9096      dynamic programming with 8-bit chars when SSE4.1 is available
9097
9098    * stage3.c: Allowing insert_gapholders to fill in single mismatches, rather
9099      than inserting a gap.  Fixed peel_rightward and peel_leftward, so the
9100      extrapeel step does not remove gaps.
9101
9102    * stage3hr.c: Fixed issues with wrong queryseq sequence for second end being
9103      printed in standard GSNAP output
9104
9105    * dynprog.c: Further check to make sure traceback_local does not give rise
9106      to negative querypos coordinates.
9107
91082013-06-25  twu
9109
9110    * dynprog.c: Distinguishing between NEG_INFINITY and NEG_INFINITY_DISPLAY
9111
9112    * dynprog.c: Made traceback_local code follow that for traceback
9113
9114    * VERSION, index.html: Updated version number
9115
9116    * dynprog.c: Fixed bug in traceback_local where r is negative
9117
91182013-06-18  twu
9119
9120    * VERSION, index.html: Updated version number
9121
9122    * stage3.c: Using new interface to Pair_print_sam, which requires two
9123      accessions
9124
9125    * gsnap.c, outbuffer.c, pair.c, pair.h, samprint.c, samprint.h, shortread.c,
9126      shortread.h, stage3hr.c: Added flag --allow-pe-name-mismatch
9127
9128    * oligoindex_hr.c: Fixed faulty cmpgt statement based on 16-bit quantities,
9129      instead of 32-bit quantities
9130
91312013-06-14  twu
9132
9133    * dynprog.c: Removed unnecessary initialization for SIMD code
9134
9135    * trunk, util, src, gsnap.c, iit-read.c, iit-read.h, stage1hr.c, stage1hr.h:
9136      Merged revisions 98458 to 98523 from branches/2013-06-13-sort-not-merge to
9137      speed up update of chromosome bounds in identify_all_segments
9138
9139    * VERSION: Removed newline from file
9140
91412013-06-13  twu
9142
9143    * VERSION, config.site.rescomp.tst, index.html: Updated version number
9144
9145    * trunk, src, gmap.c, gsnap.c, indexdb.c, stage1hr.c, util: Merged revisions
9146      98429 to 98457 from branches/2013-06-13-sort-not-merge to add code for
9147      qsort, although code not used
9148
9149    * config.site.rescomp.prd: Removed -g flag from production CFLAGS
9150
9151    * trunk, config.site.rescomp.prd, index.html, src, dynprog.c, util: Merged
9152      revisions 97749 to 98423 from branches/2013-06-05-dynprog-sse to use SIMD
9153      instructions for dynamic programming procedures
9154
91552013-06-12  twu
9156
9157    * dynprog.c: Fixed an unassigned value for rlo.  Fixed code in
9158      make_splicejunction_3 where the standard splicejunction was reverse
9159      complemented, but not splicejunction_alt.
9160
91612013-06-10  twu
9162
9163    * oligoindex_hr.c: Using compile-time constants to clarify code
9164
9165    * gmap.c, gsnap.c: Added run-time check of compiler assumptions
9166
91672013-06-09  twu
9168
9169    * dynprog.c: Fixes to debugging code.  Slightly more efficient
9170      initialization.
9171
91722013-06-08  twu
9173
9174    * oligoindex_hr.c: Computing skips of empty counts to increase speed of
9175      allocate_positions
9176
91772013-06-07  twu
9178
9179    * oligoindex_hr.c: Using a convert instruction instead of a store
9180      instruction to compute the final sum in allocate_positions.
9181
9182    * oligoindex_hr.c: In allocate_positions, using SIMD commands to check when
9183      positions need to be computed
9184
9185    * gsnap.c: Turning off terminal alignments by default for DNA-Seq alignment.
9186      Added "all" option for --gmap-mode.
9187
9188    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, src, dynprog.c,
9189      pair.c, Makefile.am, util: Merged revisions 97750 to 97963 from
9190      branches/2013-06-05-dynprog-sse for a faster and better implementation of
9191      dynamic programming procedures
9192
9193    * indexdb.c: Removed forced writing of positions by chunks for debugging
9194
9195    * Makefile.dna.am, Makefile.gsnaptoo.am: Removed PTHREAD_CFLAGS from LDFLAGS
9196      and moved SIMD_FLAGS to CFLAGS
9197
9198    * config.site: Added sections for bzlib, simd, and popcnt
9199
9200    * gmap.c: Moved free of usersegment to end, after we decide whether we need
9201      to free genome_blocks
9202
9203    * configure.ac: Moved check for SIMD to be close to the check for popcnt
9204
9205    * VERSION: Updated version number
9206
9207    * configure.ac: Added flags --enable-simd and --disable-simd to control
9208      check for SIMD features
9209
9210    * indexdb.c: Added a second attempt to write positions file in chunks if the
9211      first write fails.  Added a sanity check in reading in a genomic index
9212      that the positions file has the expected size.
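Note on the indexdb.c entry above: a minimal sketch of the two ideas it describes, a second attempt that writes in chunks when a single write fails, and a check that a file has the expected size. The function names, the chunk size, and the use of unsigned int for positions are assumptions for illustration and are not taken from indexdb.c.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_ITEMS 4096U   /* hypothetical chunk size, in items */

/* Writes the array in pieces of at most CHUNK_ITEMS items.
   Returns the number of items actually written. */
size_t write_in_chunks(FILE *fp, const unsigned int *positions, size_t nitems) {
  size_t done = 0;

  while (done < nitems) {
    size_t want = (nitems - done < CHUNK_ITEMS) ? nitems - done : CHUNK_ITEMS;
    size_t got = fwrite(&positions[done], sizeof(unsigned int), want, fp);
    done += got;
    if (got < want) break;   /* give up on a short write */
  }
  return done;
}

/* Tries one large write first; if it falls short, rewinds and makes a
   second attempt in chunks.  Returns 1 on success, 0 on failure. */
int write_positions(const char *filename, const unsigned int *positions, size_t nitems) {
  FILE *fp;
  size_t written;

  assert(positions != NULL || nitems == 0);
  if ((fp = fopen(filename, "wb")) == NULL) return 0;
  written = fwrite(positions, sizeof(unsigned int), nitems, fp);
  if (written < nitems) {
    rewind(fp);
    written = write_in_chunks(fp, positions, nitems);
  }
  if (fclose(fp) != 0) return 0;
  return written == nitems;
}

/* Sanity check: the file holds exactly nitems positions. */
int positions_size_ok(const char *filename, size_t nitems) {
  FILE *fp = fopen(filename, "rb");
  long size;

  if (fp == NULL) return 0;
  if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return 0; }
  size = ftell(fp);
  fclose(fp);
  return size >= 0 && (unsigned long) size == (unsigned long) (nitems * sizeof(unsigned int));
}
```

The size check uses ftell, which returns a long, so this sketch is only suitable for files that fit in that type on the platform at hand.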
9213
92142013-06-05  twu
9215
9216    * configure.ac: Fixed comment
9217
9218    * ax_ext.m4: Changed comment lines for config.h to be standard definitions
9219      of 1
9220
9221    * Makefile.dna.am, Makefile.gsnaptoo.am: Added SIMD_FLAGS
9222
9223    * gmap.c, gsnap.c: Added available SIMD functions to --version output
9224
9225    * indexdb.c: Added check that positions file has the expected size
9226
9227    * oligoindex.c, oligoindex.h, oligoindex_hr.c: Using SIMD functions for
9228      allocate_positions
9229
9230    * acinclude.m4, configure.ac: Added check for SIMD support
9231
9232    * ax_ext.m4: Fixed spelling errors
9233
9234    * ax_check_compile_flag.m4, ax_gcc_x86_cpuid.m4, ax_ext.m4: Initial import
9235      into SVN
9236
9237    * dynprog.c: Created separate source code for jump late and jump early
9238      conditions
9239
92402013-05-28  twu
9241
9242    * pair.c: Fixed bug in final revcomp of sequence, where genomealt was not
9243      being complemented
9244
92452013-05-22  twu
9246
9247    * archive.html: Storing version 2013-03-31 into archive
9248
9249    * index.html: Updated for version 2013-05-09
9250
9251    * iit_get.c: No longer printing total, when getting queries from stdin
9252
9253    * parserange.c: Handling a case where interpreting query as a contig, and
9254      the result is NULL
9255
9256    * pair.c: Fixed computation of coverage for GFF output
9257
92582013-05-17  michafla
9259
9260    * Makefile.am, bootstrap.dna, bootstrap.gmaponly, bootstrap.gsnaptoo,
9261      bootstrap.pmaptoo, bootstrap.three, configure.ac: Port from release: use
9262      only 'config' for m4 macros
9263
92642013-05-09  twu
9265
9266    * dynprog.c, dynprog.h, splicetrie.c: In making splice junctions, checking
9267      for junctions that go to the left of genomic position 0.
9268
92692013-05-08  twu
9270
9271    * samprint.c: Fixed SAM output for translocations, affected by changes to
9272      hardclipping
9273
92742013-05-07  twu
9275
9276    * stage2.c: In find_shifted_canonical, checking for leftpos and rightpos
9277      that exceed chrhigh
9278
92792013-05-06  twu
9280
9281    * uniqscan.c: Using new interface to Stage1_single_read
9282
9283    * gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Passing
9284      min_distantsplicing_end_matches and min_distantsplicing_identity to
9285      Stage1hr_setup.  Made GSNAP more sensitive to shorter reads, such as 50 bp.
9286
9287    * stage3.c: Removed line that is not reached.
9288
92892013-05-03  twu
9290
9291    * gmap.c: Added comment in --help output about chimeric alignments being an
9292      exception to the filtering options
9293
9294    * stage3.c, stage3.h: Added functions Stage3_recompute_coverage and
9295      Stage3_passes_filter
9296
9297    * gmap.c: Added options --min-trimmed-coverage and --min-identity
9298
9299    * stage2.c: Further check to make sure splice site does not occur in first 3
9300      bp of segment in find_shifted_canonical
9301
93022013-05-02  twu
9303
9304    * stage2.c: In find_shifted_canonical, preventing discovery of splice sites
9305      in first 3 bp of segment
9306
9307    * index.html: Made changes for version 2013-03-31 (v5)
9308
9309    * gmap.c: Checking value of Stage3_merge_chimera before creating chimera and
9310      running merge_left_and_right_transloc.
9311
9312    * stage3.c, stage3.h: Stage3_merge_chimera checks for pairs on left and
9313      right being non-NULL, and returns a bool
9314
9315    * pairpool.c: Fixed Pairpool_clip_bounded so it handles the case where the
9316      list is NULL.
9317
93182013-04-30  twu
9319
9320    * get-genome.c: Added --exact flag
9321
9322    * stage3hr.c: Removed exclusions in Stage3pair_overlap that prevented tails
9323      from being hard clipped.
9324
9325    * shortread.c: Fixed hard clipping of reads and quality strings to match new
9326      definitions of hardclip_low and hardclip_high
9327
93282013-04-26  twu
9329
9330    * iit_get.c: Added --exact option
9331
9332    * chimera.c: Adding range of chimeric overlap in XT SAM field
9333
9334    * outbuffer.c: Using new interface to SAM_print
9335
9336    * stage3hr.h, stage3hr.c: Revised Stage3pair_overlap to return hardclip5,
9337      hardclip3, and clipdir.
9338
9339    * samprint.h: Added clipdir as a parameter to SAM_print
9340
9341    * samprint.c: Handling clipping of overlaps when the low end of the first
9342      read and the high end of the second read should be clipped.  This is
9343      indicated by clipdir of -1, and occurs when the insert length is so short
9344      that the two reads have passed each other.
9345
93462013-04-12  twu
9347
9348    * pair.c, pair.h, samprint.c: Providing effective_chrnum of mate to
9349      Pair_print_sam, in case GMAP alignment has a translocation for a mate
9350
93512013-04-11  twu
9352
9353    * gmap.c: Made changes for PMAP to compile
9354
93552013-04-10  twu
9356
9357    * gmap.c: Fixed memory issues in merging middle pieces
9358
93592013-04-09  twu
9360
9361    * Makefile.dna.am: Fixed name of file
9362
93632013-04-05  twu
9364
9365    * stage1hr.c, stage3hr.c, stage3hr.h: When terminal_threshold is set to a
9366      high value, using trim_left_raw and trim_right_raw to exclude GMAP hits.
9367
9368    * samprint.c: Reverting to previous version, where chrpos of 0 indicates
9369      nomapping
9370
9371    * oligoindex_hr.c: Fixed an issue where minus oligomers were extending past
9372      the beginning
9373
9374    * stage3.c: Removed the unused variables nintrons, nnonintrons, intronlen,
9375      and nonintronlen.
9376
93772013-04-04  twu
9378
9379    * gmap.c, outbuffer.c, stage3.h: Changed name from map_genes to map_ranges,
9380      to avoid confusion
9381
9382    * iit_store.c: Handling empty files gracefully
9383
9384    * iit-write.c: If no intervals are found, then returning gracefully instead
9385      of exiting
9386
9387    * outbuffer.c: If allow_chimeras_p is false and chimera is present, then
9388      effective_maxpaths is 0.
9389
9390    * gmap.c: Added flag --no-chimeras
9391
9392    * stage3hr.c: Added debugging statement
9393
9394    * samprint.c: Using NULL hit instead of zero chrpos to indicate lack of
9395      mapping
9396
93972013-04-03  twu
9398
9399    * dynprog.c, stage3.c: Fixed uninitialized variable for g_alt in
9400      get_genomic_nt
9401
9402    * stage3hr.c, substring.c, substring.h: Updating nmismatches_bothdiff also
9403
94042013-04-02  twu
9405
9406    * stage3hr.c: Handling the new pairtype UNSPECIFIED.
9407
9408    * stage1hr.c: Allowing align_pair_with_gmap to change final_pairtype.  Calls
9409      Stage3pair_new with pairtype UNSPECIFIED.
9410
9411    * resulthr.c, resulthr.h: Added function Pairtype_string
9412
94132013-04-01  twu
9414
9415    * VERSION, config.site.rescomp.tst: Updated version number
9416
9417    * stage3.c: Added debugging statements
9418
9419    * splicetrie.c: Using computed miss_score.  Both miss_score and
9420      threshold_miss_score are negative.
9421
9422    * smooth.c, smooth.h: Added code for marking short exons, but not used
9423
9424    * pair.c, pair.h: Removed unused procedures
9425
9426    * dynprog.h, dynprog.c: Computing miss_score.  Both miss_score and
9427      threshold_miss_score are negative.
9428
94292013-03-30  twu
9430
9431    * stage3.c: Fixed bug in handling list for middle exon in
9432      build_pairs_dualintrons. In traverse_single_gap, when forcep == false, not
9433      adding pairs if finalscore < 0.
9434
94352013-03-28  twu
9436
9437    * stage3.c: Skipping pass 9 (distalmedial) and relying instead on
9438      trim_noncanonical_exons.  In pass 8, for extend_endings, not removing
9439      indel gaps, and setting quit_on_gap_p true, to preserve indels at ends.
9440      Not setting ambig_end_lengths in trim_noncanonical_exons, reserving it
9441      instead for trim_novel_spliceends.  Performing final extension of ends at
9442      end of path_compute, and not at beginning of path_trim.
9443
94442013-03-27  twu
9445
9446    * stage1hr.c: Allowing redo of GMAP pairs based on inconsistent senses.
9447
9448    * gmap.c, gsnap.c: Increased maxpeelback from 11 to 20
9449
9450    * stage3.c: Added quit_on_gap_p parameter to peel_rightward and
9451      peel_leftward. This allows smoothing procedures after traverse_single_gap
9452      to merge gaps, and dynamic programming traversal of introns to add indels.
9453      Calling remove_indel_gaps before dynamic programming solutions of introns.
9454      Added computation of max_intron_score, and using it in
9455      pick_cdna_direction to determine if sense is NULL.
9456
9457    * Makefile.dna.am: Added commands for splicing-score program
9458
9459    * stage3hr.c: Handling case in Stage3pair_new where expect_concordant_p is
9460      false for a GMAP alignment
9461
9462    * stage1hr.h, gsnap.c, uniqscan.c: Added distances_observed_p to
9463      Stage1hr_setup
9464
9465    * splicetrie_build.c: When observed distance is greater than
9466      localsplicedist, storing observed distance
9467
9468    * stage1hr.c: In find_splicepairs_distant, if splice distance is within the
9469      known maximum distance, then treating as a local splice, rather than a
9470      distant splice.
9471
9472    * substring.c: Added code for Substring_intragenic_splice_p, but using
9473      version in stage1hr.c instead
9474
9475    * stage3hr.c: In Stage3_determine_pairtype, using effective_chrnum instead
9476      of chrnum, so new pairs with PAIRED_UNSPECIFIED from GMAP runs work.
9477
9478    * stage3hr.h: Added function Stage3pair_sense_consistent_p.
9479
9480    * stage3hr.c: Changed ambig_end_interval, used in penalty, to 8.  In
9481      Stage3end_pick_cdna_direction, returning a non-zero cdna_direction. Added
9482      function Stage3pair_sense_consistent_p.
9483
9484    * stage3.c: Allowing ambig end only if medial prob > 0.95.  Moved
9485      trim_novel_spliceends earlier, so it is effective.  In
9486      pick_cdna_direction, making a final use of alignment score.  Checking for
9487      divide by zero in computing defect_rate.
9488
9489    * stage1hr.c: Using matches over entire read to determine whether to perform
9490      GMAP. Performing halfmapping of GMAP against terminals only if nconcordant
9491      is 0.  Added a redo step on align_pair_with_gmap if the senses are
9492      inconsistent.
9493
9494    * inbuffer.c, shortread.c, shortread.h: Allowing FASTA input files to be
9495      paired
9496
94972013-03-26  twu
9498
9499    * stage1hr.c: For GMAP alignments with long trimmed ends, comparing
9500      terminal_threshold against user_maxlevel in deciding whether to drop them.
9501
95022013-03-25  twu
9503
9504    * stage3hr.c, stage1hr.c: Allowing GMAP improvement to be run on paired
9505      results
9506
9507    * samprint.c: Adding PG:Z:T output for terminal alignments
9508
9509    * stage3hr.c: Switching back to old GMAP filter, where nmatches_posttrim
9510      must exceed querylength/2.
9511
9512    * stage1hr.c: Applying terminal_threshold test to GMAP alignments
9513
95142013-03-21  twu
9515
9516    * dynprog.c: Allowing only two mismatches in a distant splice
9517
9518    * gmap.c, gsnap.c: Made results of --version consistent
9519
9520    * stage1hr.c: Applying match length criterion to terminal GMAP alignments
9521
9522    * stage1hr.c: Incrementing nconcordant only for high-quality GMAP pairsearch
9523      results, where nmatches is high enough, instead of using GMAP score.
9524
95252013-03-14  twu
9526
9527    * splicetrie_build.c: Fixed type
9528
9529    * gsnap.c, shortread.c, shortread.h: Implemented --force-single-end flag
9530
9531    * index.html: Added comment about --force-single-end flag
9532
9533    * README: Added description of stranded and nonstranded cmet and atoi modes.
9534
9535    * README: Added information about --force-single-end flag and multiple FASTQ
9536      files on the command line.
9537
9538    * configure.ac: Removed --with-samtools flag
9539
95402013-03-13  twu
9541
9542    * VERSION, config.site.rescomp.tst, index.html: Updated version number
9543
9544    * Makefile.dna.am, gmap.c, oligoindex_hr.c, stage3.c, stage3.h: Made changes
9545      so PMAP would compile
9546
9547    * index.html: Made changes for release 2013-03-05
9548
9549    * splicetrie_build.c: Added code for get_exons, which may be needed for
9550      getting splice sites from a genes IIT file
9551
9552    * genome-write.h: Changed UINT4 to Genomecomp_T
9553
9554    * types.h: Added Genomecomp_T
9555
9556    * stage3.c: Added minimum length for running stage2 in a dual break
9557
9558    * stage2.c: Improved debugging statements
9559
95602013-03-12  twu
9561
9562    * genome_hr.c: Returning correct reverse chrpos for
9563      prev_dinucleotide_position_rev. Printing universal coordinates on blocks
9564      for debugging.  Improved debugging statements for finding dinucleotides.
9565
9566    * stage3.c: Fixed pass 7, so path is expected in loop, and then is converted
9567      to pairs at the end.
9568
9569    * pair.c: In Pair_start_bound and Pair_end_bound, skipping pairs with
9570      querypos < 0, which represent gaps.
9571
95722013-03-06  twu
9573
9574    * indexdb.c: Removed code for writing word-by-word
9575
9576    * gsnap.c: Fixed bug where --adapter-strip=off had no effect.
9577
95782013-03-05  twu
9579
9580    * fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Allowing FASTA file
9581      names and gzipped file names to contain spaces
9582
9583    * gsnap.c: Added entry for distant-splice-identity
9584
9585    * VERSION, config.site.rescomp.tst: Updated version number
9586
9587    * stage3hr.c: Restored check on bad GMAP alignments
9588
9589    * stage1hr.c: Taking both long and short terminals, with short terminals
9590      based on half the number of mismatches.  Always setting long and short
9591      terminal pos based on one-third of the read length.
9592
9593    * gmap.c: Fixed issues with computation of chimeric middle pieces, including
9594      memory freeing
9595
95962013-03-04  twu
9597
9598    * dynprog.c: Fixed bug with uninitialized variable introntype in
9599      Dynprog_genome_gap
9600
96012013-02-28  twu
9602
9603    * stage1hr.c: No longer using Stage3_short_alignment_p to rule out GMAP hits
9604
9605    * gsnap.c, stage3hr.c, stage3hr.h: Using min-nconsecutive criterion instead
9606      of min-coverage criterion for keeping GMAP hits.
9607
96082013-02-26  twu
9609
9610    * stage1hr.c: Introduced idea of long terminals and short terminals, with
9611      terminal length = querylength/3.  Allowing GMAP pairsearch to set
9612      nconcordant.
9613
9614    * stage1hr.c: Reverted to not recording GMAP pairsearch successes as
9615      nconcordant
9616
9617    * stage3hr.c: No longer calling Stage3_bad_stretch_p on GMAP alignment.  Not
9618      allowing end indels to set trims for alignment comparisons. Penalizing
9619      indels in alignment comparisons.
9620
9621    * stage3.c, stage3.h: Added procedure Stage3_good_part
9622
9623    * stage1hr.c: Recording GMAP pairsearch successes as nconcordant
9624
96252013-02-25  twu
9626
9627    * index.html: Updated to reflect new version number
9628
9629    * VERSION: Updated version number
9630
9631    * gsnap.c: Increased values of max_gmap_pairsearch and max_gmap_terminals
9632      from 10 and 5, respectively, to 50 and 50
9633
9634    * stage1hr.c: Sorting terminals by matches before running GMAP against a
9635      limited number of them
9636
9637    * uniqscan.c: Printing given sequence in addition to uniqueness result
9638
9639    * stage3.c: Reduced requirement for GMAP from querylength/2 to querylength/3
9640
9641    * gsnap.c: Reduced gmap_min_coverage from 0.50 to 0.33
9642
9643    * stage1hr.c: Counting nhits during subs and indels, and exiting when the
9644      value exceeds maxpaths_search
9645
96462013-02-22  twu
9647
9648    * gff3_introns.pl.in, gff3_splicesites.pl.in: Disallowing negative intron
9649      lengths
9650
9651    * gff3_genes.pl.in, gff3_introns.pl.in: Skipping blank lines.
9652
9653    * gff3_splicesites.pl.in: Fixed bug in warning message.  Skipping blank
9654      lines.
9655
9656    * stage3hr.c: Setting value of guided_insertlength for exact hits
9657
9658    * gsnap.c, outbuffer.c, outbuffer.h, samprint.c, samprint.h, stage1hr.c,
9659      stage1hr.h: Created separate variables maxpaths_search and maxpaths_report
9660
9661    * shortread.c: Fixed potential bug if sequence is longer than sequence_length
9662
9663    * uniqscan.c: Computing full sequence first, then iterating from start until
9664      we reach a unique alignment.
9665
96662013-02-21  twu
9667
9668    * samprint.c: Fixed assertion for circularpos to handle trimming at ends
9669
9670    * substring.c: Handling trimmed regions in Substring_circularpos
9671
96722013-02-15  twu
9673
9674    * VERSION, config.site.rescomp.tst, index.html: Updated version number
9675
9676    * stage1hr.c: Changed terminal length from one-half to one-third
9677
9678    * stage1hr.c: Fixed bug in the terminal position used for comparing
9679      mismatches in find_terminals
9680
96812013-02-14  twu
9682
9683    * stage3hr.c: Preventing distant splices from being considered as circular
9684      aliases and removed
9685
9686    * stage1hr.c: Fixed typo in comment
9687
96882013-02-12  twu
9689
9690    * stage1hr.c: Increased MAXCHIMERAPATHS from 3 to 100
9691
96922013-02-07  twu
9693
9694    * stage1hr.c, stage3hr.c, stage3hr.h: Added Stage3end_eval_and_sort_guided,
9695      to sort one end of unpaired alignments when the other end has a unique
9696      alignment.
9697
9698    * get-genome.c: Fixed bug in printing accession for sequence, based on
9699      coordinates, for minus strand
9700
97012013-02-06  twu
9702
9703    * uniqscan.c: Running scan for entire sequence length.  Reduced
9704      terminal-threshold to 5.
9705
97062013-02-05  twu
9707
9708    * VERSION, index.html: Updated version number
9709
9710    * genome-write.c, genome_hr.c, indexdb.c, outbuffer.c, stage1hr.c, stage3.c,
9711      substring.c: Fixed static analysis errors found by Nathan Weeks using the
9712      Clang 3.1 compiler
9713
97142013-01-24  twu
9715
9716    * VERSION, config.site.rescomp.tst, archive.html, index.html: Updated
9717      version number
9718
9719    * iit_get.c: Allowing -N flag to be used with tally IIT files
9720
9721    * outbuffer.c: Fixed bug where SAM headers were duplicated in .nomapping file
9722
97232013-01-17  twu
9724
9725    * pair.c: Added PG:Z:M flag for alignments using GMAP method within GSNAP
9726
9727    * stage3.c: Made further changes to peel_leftward and peel_rightward to
9728      prevent an indel from being on top of pairs or path
9729
9730    * Makefile.dna.am: Add bzip2 files
9731
9732    * stage3.c: Changed peel_leftward and peel_rightward so they do not leave a
9733      gap or indel at the top of the pairs or path.
9734
97352013-01-16  twu
9736
9737    * iit_store.c: Added debugging statement
9738
9739    * snpindex.c: Added information for warning statement from check_acgt
9740
9741    * configure.ac: Added configure commands for bzip2 library
9742
9743    * README: Added comment about bunzip2
9744
9745    * Makefile.gsnaptoo.am, bzip2.c, bzip2.h, gsnap.c, inbuffer.c, inbuffer.h,
9746      sequence.c, sequence.h, shortread.c, shortread.h: Added procedures for
9747      handling bunzip2
9748
9749    * snpindex.c: Fixed issue where warning messages referred to wrong labels
9750
9751    * stage1hr.c: Added debugging information about hits used as anchors for GMAP
9752
9753    * stage3hr.c: Restored checks on chromosomal bounds for Stage3_new_gmap, and
9754      returning NULL when bounds are exceeded
9755
97562012-12-19  twu
9757
9758    * VERSION, config.site.rescomp.tst, index.html: Updated version number
9759
9760    * stage3.c: In assign_gap_types, handling case where cdna_direction == 0
9761
9762    * gmap.c: Calling Stage3_guess_cdna_direction at appropriate places after
9763      merge_left_and_right_readthrough.  Using maxextension instead of
9764      max_intronlength for Stage 1 computations.
9765
9766    * stage3.h: Added function Stage3_guess_cdna_direction
9767
9768    * stage3.c: Using new interface to Pairpool_push_gapalign.  Not performing
9769      guess of cdna_direction in Stage3_merge_local_splice.
9770
9771    * stage3hr.c, stage3hr.h: Stage3_pair_up_concordant now limits number of
9772      samechr results
9773
9774    * stage1hr.c: Using new interface to Stage3_pair_up_concordant, which now
9775      takes nsamechr
9776
9777    * stage1.c, stage1.h: Using extensionlen instead of max_intronlength.
9778      Providing variables in Stage1_setup.
9779
9780    * pairpool.c, pairpool.h: Handling introntype field in Pair_T object
9781
9782    * pairdef.h: Added introntype as field for Pair_T object
9783
9784    * pair.c, pair.h: Added function Pair_fix_cdna_direction_array
9785
9786    * outbuffer.c, outbuffer.h: Adding commas to output of memory usage
9787
9788    * mem.c, mem.h: Added reporting of peak memory usage
9789
9790    * intron.c: For cdna_direction of 0, returning introntype rather than
9791      NONINTRON
9792
97932012-12-18  twu
9794
9795    * stage3.c: In Stage3_merge_local_splice, if intronlength is small or
9796      negative, calling Stage3_merge_local_single.
9797
9798    * mem.c: Increased size of hash table
9799
9800    * gmap.c: Putting results of middle search into middlepieces, and iterating
9801      through that list.  Searching for local readthroughs first, but linking
9802      local middle piece to both ends in any case.
9803
98042012-12-14  twu
9805
9806    * gsnap.c: Fixed --help documentation to show default for -a is off
9807
98082012-12-11  twu
9809
9810    * VERSION, index.html: Updated version number
9811
9812    * stage3.c: In Stage3_merge_local_splice, not advancing querypos for a
9813      deletion
9814
9815    * trunk, VERSION, src, gmap.c, pair.c, result.c, stage3.c, stage3.h, util:
9816      Merged changes from branches/2012-11-11-middle-piece to allow for
9817      searching of middle pieces in GMAP
9818
9819    * uinttable.c, table.c: Fixed memory leak
9820
9821    * samprint.c: Fixed issue with CIGAR string and hard-clipping
9822
9823    * gmap.c: Not finding chimeras if --nosplicing is requested
9824
98252012-12-10  twu
9826
9827    * indexdb.c: In writing gammas, using a buffer for offsetcomp, so we do not
9828      write words one at a time, which is slow on some filesystems
9829
9830    * samprint.c: In compute_cigar, handling D and N types based on querypos
9831
9832    * samprint.c: Rewrote compute_cigar
9833
98342012-12-07  twu
9835
9836    * gmap.c: Added separate procedure evaluate_query for use before all calls
9837      to Stage1_compute.  Turned on check for repetitivep.
9838
9839    * VERSION, config.site.rescomp.tst, index.html: Updated version number
9840
9841    * archive.html: Added entry for 2012-07-20.v2
9842
9843    * gmap_build.pl.in: Added -e or --nmessages flag
9844
9845    * samprint.c: Fixed problem with CIGAR string and clip-overlap function
9846
9847    * shortread.c: No longer assuming that slashes in the input FASTQ file are
9848      present consistently
9849
98502012-12-06  twu
9851
9852    * stage1hr.c: Added debugging messages
9853
9854    * iit_get.c: Added printing of flanking results in stdin mode.  Added -C
9855      flag to force interpretation of queries as coordinates.
9856
9857    * iit-read.c: Fixed coord_search_low and coord_search_high to prevent them
9858      from going below given chromosome
9859
9860    * gsnap.c: Revised --help message to give correct formula for fast index size
9861
9862    * gmapindex.c, genome-write.c, genome-write.h: Added nmessages as a parameter
9863
9864    * gmap.c: Using new interface to Stage3_merge_chimera
9865
9866    * get-genome.c: Added --signed flag
9867
9868    * stage3.h: Doing full trimming of inside ends of chimeras
9869
9870    * stage3.c: Doing full trimming of inside ends of chimeras.  Added commented
9871      code from revised 2012-07-20 version to prevent chimera extensions from
9872      crossing chromosomal coordinates.
9873
9874    * pair.c: Handling case for MD string in reverse query direction of I-to-D
9875      and N-to-D transitions
9876
98772012-11-28  twu
9878
9879    * get-genome.c: Fixed sign to be 0 when coordend == coordstart
9880
98812012-11-27  twu
9882
9883    * VERSION, index.html: Updated version number
9884
9885    * gmap_process.pl.in: Turning off handling of pipes in FASTA headers
9886
9887    * iit_get.c: Using new interface for IIT_get_flanking_typed
9888
9889    * iit-read.c, iit-read.h: Added procedures for getting signed results
9890
9891    * get-genome.c: Added --signed flag
9892
9893    * samprint.c: Fixed bugs in cigar strings for hard clipping and
9894      translocations
9895
98962012-11-21  twu
9897
9898    * VERSION, index.html: Updated version number
9899
9900    * gmap.c: Added a limit to iterations of chimera search to prevent an
9901      infinite loop
9902
99032012-11-19  twu
9904
9905    * VERSION, index.html: Updated version number
9906
9907    * gmap.c: On chimera search, calling original found subsequence, in case
9908      stage 1 was led astray by the ends
9909
9910    * stage2.c: Changed debugging call
9911
9912    * dynprog.c: Not aborting from Dynprog_microexon_int if cdna_direction is 0,
9913      which can happen with chimeric joins.  Returning NULL instead.
9914
9915    * stage2.c: Moved some debugging statements around
9916
9917    * stage1.c: Added debugging statements for printing final results
9918
9919    * gregion.c: Increasing MAX_GENOMICLENGTH from 1 million bp to 2 million bp
9920
9921    * stage3.c: Using make_pairarrays instead of make_pairarrays_chimera in
9922      Stage3_merge_local_splice.  Calling this even for dual break.
9923
9924    * stage2.c: Made slight differences in coordinates for Oligoindex_hr_tally
9925      on minus strand
9926
9927    * oligoindex_hr.c: Added debugging statements
9928
9929    * oligoindex.c, oligoindex.h: Added debugging procedure
9930
9931    * pair.c: Changed all types of exon_genome positions from int to Genomicpos_T
9932
99332012-11-16  twu
9934
9935    * stage3.c: Fixed bug in computing chrstart and chrend for
9936      Stage2_compute_one from traverse_dual_break.  Expressing all
9937      splice_genomepos values consistently as chromosomal coordinates.
9938
9939    * stage3hr.c: Handling case in Stage3_cdna_direction where stage3 is NULL
9940
9941    * pair.c, pair.h, samprint.c: Fixed CIGAR strings when clip-overlap hits an
9942      insertion
9943
99442012-11-15  twu
9945
9946    * pairpool.c: Added assert.h
9947
9948    * trunk, config.site.rescomp.tst, src, boyer-moore.c, boyer-moore.h, diag.c,
9949      diag.h, diagdef.h, dynprog.c, dynprog.h, genome.c, genome_hr.c,
9950      genome_hr.h, gmap.c, gregion.c, gregion.h, iit-read.c, iit_dump.c,
9951      oligoindex.c, oligoindex.h, oligoindex_hr.c, oligoindex_hr.h, pair.c,
9952      pair.h, pairpool.c, splicetrie.c, splicetrie.h, splicetrie_build.c,
9953      stage1hr.c, stage2.c, stage2.h, stage3.c, stage3.h, util: Merged revisions
9954      78884 to 79299 from branches/2012-11-11-middle-piece to use chromosomal
9955      coordinates, chroffset, and chrhigh
9956
9957    * README, VERSION, index.html: Updated version number
9958
9959    * genome.h: Changed interface to Genome_fill_buffer_simple_alt
9960
9961    * splicetrie_build.c: Using new interface to Genome_fill_buffer_simple_alt
9962
9963    * genome.c: Fixed bugs in substituting SNPs into string
9964      (uncompress_mmap_snps_subst)
9965
99662012-11-09  twu
9967
9968    * VERSION, index.html: Updated version number
9969
9970    * stage3.c, stage3.h: Implemented Stage3_set_genomicend
9971
9972    * stage2.c: Remove MAX_GENOMICLENGTH restriction
9973
9974    * gregion.c: Impose MAX_GENOMICLENGTH on gregion
9975
9976    * gmap.c: Revising genomicend after pieces are merged
9977
9978    * indexdb.h: Fixed type conflict involving Oligospace_T with definition in
9979      indexdb.c
9980
9981    * stage2.c: Raised MAX_GENOMICLENGTH from 1 million to 10 million bp
9982
9983    * stage3.c: Fixed identification of insertion in Stage3_mergeable
9984
9985    * dynprog.c: Added debugging statement
9986
9987    * gregion.c: Fixed bug where gregion could extend beyond chrhigh
9988
99892012-11-07  twu
9990
9991    * stage3hr.c: Using fact that circularp[0] == false to handle translocations
9992
9993    * iit-read.c: Fixed IIT_circularp for lookup of 1-based chrnum values
9994
9995    * VERSION, index.html, config.site.rescomp.prd: Updated version number
9996
9997    * gmap.c, gsnap.c, uniqscan.c: Using new interface to Stage3hr_setup
9998
9999    * stage3hr.c: Checking hit->alias before checking hit->plusp
10000
10001    * stage3hr.c, stage3hr.h: Using information about circularp in
10002      compute_circularpos and in computing hit->alias.
10003
10004    * iit-read.c, iit-read.h: Implemented IIT_circularp function
10005
10006    * stage1hr.c: Added checks to make sure greedy mapping positions do not
10007      result in a genomic segment with negative length
10008
100092012-11-06  twu
10010
10011    * VERSION, index.html: Updated version number
10012
10013    * access.c: Changed printf commands for off_t to use %ju
10014
10015    * genome_hr.c: Fixed computation of wildcard SNP positions to exclude
10016      positions where reference allele is 'N'.
10017
10018    * configure.ac, access.c: Handling compiler warning messages when sizeof
10019      off_t is not 8
10020
10021    * dynprog.c, dynprog.h, gsnap.c, uniqscan.c: Removed genome and genomealt
10022      from Dynprog_setup
10023
10024    * stage1.c: No longer filtering for support, since it leads to poor
10025      behavior
10026
10027    * gmap.c: Fixed a bug where a NULL genomealt was being passed to
10028      Chimera_find_exonexon
10029
100302012-10-31  twu
10031
10032    * coords1.test.ok: Changed to match current output of program, which allows
10033      for circular chromosomes
10034
10035    * Makefile.gsnaptoo.am: Made files match those of Makefile.dna.am
10036
10037    * archive.html, index.html: Made changes for new version
10038
10039    * README: Added discussion of --force-xs-flag, the XW and XV fields, and
10040      circular chromosomes.
10041
10042    * VERSION, config.site.rescomp.tst: Updated version number
10043
10044    * stage2.c: Prefer shorter intron in all cases
10045
10046    * gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Implemented variable for
10047      splice distances at novel ends
10048
10049    * stage2.c: For first intron, favoring shorter intron lengths in cases of
10050      ties
10051
10052    * stage1hr.c: Refinements made to computing GMAP genomic region, by checking
10053      if a fallback mappingstart or end is available.  Performing n best GMAP
10054      alignments for pairsearch.  Going back to sorting terminals by matches in
10055      GMAP terminals.
10056
10057    * stage3hr.c: Modifications made to score_eventrim for GMAP, by ignoring
10058      small trims and restoring indel penalty.  Fixed bug in computing
10059      non_terminal3_p.
10060
10061    * stage3.c: Calculating ambig end lengths better in trimming noncanonical
10062      ends. Not doing trim of novel splice end if ambig end length already set.
10063
10064    * samprint.c, samprint.h: Implemented --force-xs-direction flag
10065
100662012-10-30  twu
10067
10068    * dynprog.c: Limiting indels to 3 bp around splice sites
10069
10070    * stage2.c: Edited comment on debugging usage
10071
10072    * stage3.c: In trimming non-canonical end exons, when end exon is actually
10073      canonical but with poor probability, we trim the exon but set
10074      ambig_end_length, so the alignment can compete with other alignments.
10075
10076    * stage3.c: Made bad stretch algorithm more tolerant
10077
100782012-10-29  twu
10079
10080    * genome_hr.c: Fixed compile for GMAP
10081
10082    * pair.h: Added code to handle force-xs-direction.  Using
10083      mate_cdna_direction when necessary.
10084
10085    * pair.c: Added code to handle force-xs-direction.  Using
10086      mate_cdna_direction when necessary.  Counting N's as mismatches for
10087      trimming purposes.
10088
10089    * genome_hr.c, genome_hr.h, substring.c: Counting N's as mismatches for the
10090      purposes of end trimming
10091
100922012-10-28  twu
10093
10094    * substring.c, substring.h: Added function Substring_chimera_sensedir
10095
10096    * outbuffer.c, outbuffer.h: Removed snps_p as a parameter
10097
10098    * gmap.c, gsnap.c, uniqscan.c: Added --force-xs-dir flag
10099
101002012-10-27  twu
10101
10102    * stage3.c, stage3.h: Removed snps_p
10103
101042012-10-26  twu
10105
10106    * dynprog.c: Fixed debugging statements
10107
10108    * gmap.c, gsnap.c, uniqscan.c: Using new interface to Stage3_setup
10109
10110    * dynprog.c, stage3.c, stage3.h, intron.c: Commented out constants that
10111      should not be used by PMAP
10112
10113    * samprint.c, pair.c, pair.h: Printing XW and XV only when SNP-tolerant
10114      alignment is used
10115
10116    * uniqscan.c: Using new interfaces to procedures.
10117
10118    * gmap.c, gsnap.c: Added flag --md-lowercase-snp.  Using new interfaces to
10119      procedures.
10120
10121    * dynprog.c: Using jump_late_p in cdna_gap and genome_gap, which can
10122      sometimes give rise to indels
10123
10124    * stage1hr.c: Removed nsalvage from debugging statements
10125
10126    * stage3.c: Changed Stage3_bad_stretch_p to count each indel gap as one
10127      mismatch, regardless of length
10128
10129    * pair.c: Removed debugging statements
10130
101312012-10-25  twu
10132
10133    * stage3.c, stage3.h: Using new interface to Intron_type.  Obtaining
10134      alternate genomic segments in some procedures.
10135
10136    * stage2.c, stage2.h: Obtaining alternate allele and putting into Pair_T
10137      object when SNP-tolerant alignment is used
10138
10139    * splicetrie.c, splicetrie.h: Using new interfaces to Dynprog splicejunction
10140      procedures
10141
10142    * outbuffer.c, outbuffer.h: Added snps_p field
10143
10144    * iit-read.c, iit-read.h: Added function IIT_interval_type
10145
10146    * genome_hr.c: In finding dinucleotides, using alternate genome
10147
10148    * genome.c, genome.h: In function used to create splicejunctions, returning
10149      alternate genomic segment
10150
10151    * chimera.c, chimera.h, intron.c, intron.h, maxent_hr.c, maxent_hr.h,
10152      dynprog.c: Using alternate alleles in evaluating splice sites
10153
10154    * samprint.c, samprint.h, pair.c, pair.h: Printing lowercase MD for known
10155      SNP variants
10156
10157    * access.c, access.h: Added function Access_file_equal
10158
10159    * dynprog.h: Taking alternate genome sequence in splicejunction procedures
10160
10161    * dynprog.c: Using alternate allele in computing dynamic programming matrices
10162
10163    * snpindex.c: Made snpindex work with circular chromosomes
10164
10165    * snpindex.c: Checking if given IIT file and installed IIT file are the same
10166
101672012-10-23  twu
10168
10169    * trunk, VERSION, config.site.rescomp.tst, src, boyer-moore.c,
10170      boyer-moore.h, diag.c, diag.h, dynprog.c, dynprog.h, genome.c, gmap.c,
10171      oligoindex_hr.c, oligoindex_hr.h, stage1hr.c, stage2.c, stage2.h,
10172      stage3.c, stage3.h, util: Merged revisions 77378 to 77446 from
10173      branches/2012-10-21-no-genomicseg to remove genomicseg parameters
10174
10175    * stage3.c: Fixed uninitialized variables knownsplice5p, knownsplice3p, and
10176      intronlength.
10177
101782012-10-21  twu
10179
10180    * gsnap.c: Using new interface to Genome_setup and SAM_setup.
10181
10182    * gmap.c: Using new interface to Genome_setup.  Using -V and -v flags as in
10183      GSNAP.
10184
10185    * stage1.c: Fixed matchsize to be double index1part when index1part < 12
10186
10187    * dynprog.c, genome.c, pair.c, pair.h, pairdef.h, pairpool.c, pairpool.h,
10188      stage2.c, stage3.c: Added genomealt to Pair_T object and assigning this
10189      value in Pairpool_push routines.  Creating GENOMEALT_DEFERRED value until
10190      we resolve all occurrences of genomicseg.
10191
10192    * boyer-moore.c, dynprog.c, dynprog.h, genome.c, genome.h, maxent_hr.c,
10193      pair.c, stage2.c, stage3.c: Genome_get_char_blocks and get_genomic_nt now
10194      return alternate allele
10195
10196    * samprint.h: Added snps_iit to SAM_setup
10197
10198    * samprint.c: Computing mismatches for both refdiff and bothdiff, and
10199      printing XW and XV fields if running in SNP-tolerant mode
10200
102012012-10-20  twu
10202
10203    * gmap.c: Clarified that chimera search can be turned off by setting value
10204      to 0
10205
10206    * indexdb_hr.c: Fixed debugging statement
10207
10208    * snpindex.c: Fixed program to work with sampling intervals other than 3 bp.
10209      Performing file copy of IIT file to maps subdirectory.
10210
10211    * access.c, access.h: Added function Access_file_copy
10212
10213    * samprint.c: Fixed bug where circularpos was called before results arrays
10214      were retrieved
10215
102162012-10-19  twu
10217
10218    * stage3hr.c: Using hitpair score_eventrim in Stage3pair_optimal_score,
10219      instead of individual score5 and score3.
10220
10221    * trunk, VERSION, config.site.rescomp.tst, src, Makefile.dna.am, chrom.c,
10222      chrom.h, dynprog.c, gamma-speed-test.c, gdiag.c, genome-write.c,
10223      genome-write.h, genome.c, genome.h, gmap.c, gmapindex.c, gregion.c,
10224      gregion.h, gsnap.c, gsnap_tally.c, iit-read.c, iit-read.h, iit_dump.c,
10225      iit_store.c, indexdb.c, outbuffer.c, pair.c, pair.h, samprint.c,
10226      samprint.h, splicetrie_build.c, stage1.c, stage1.h, stage1hr.c, stage3.c,
10227      stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h, uniqscan.c,
10228      util, fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Merged
10229      revisions 76693 to 77345 from branches/2012-10-15-circular to handle
10230      circular chromosomes
10231
10232    * outbuffer.c: Fixed write mode for appending to a file
10233
102342012-10-18  twu
10235
10236    * dynprog.c: Allowing get_genomic_nt to retrieve negative coordinates
10237
102382012-10-15  twu
10239
10240    * gmap.c, gsnap.c, outbuffer.c, outbuffer.h: Added --append option to append
10241      results to output files
10242
10243    * memory-check.pl: Added -9 flag to print successive maximum memory usage
10244
102452012-10-12  twu
10246
10247    * types.h: Added OLIGOSPACE_NOT_LONG
10248
10249    * stage3.c: Handling PMAP case for final guess at cdna_direction
10250
10251    * stage2.c: Not doing stage 2 if genomiclength > MAX_GENOMICLENGTH
10252
10253    * indexdb.c: Fixed issues with %lu using OLIGOSPACE_NOT_LONG
10254
10255    * sequence.c, sequence.h: Sequence_read_unlimited returns nextchar
10256
10257    * inbuffer.c, inbuffer.h, outbuffer.c, outbuffer.h: Handling multiple pairs
10258      of sequences in --pairalign mode
10259
10260    * gmap.c, gsnap.c: Requiring -m to be 0.10 or less when it is a float
10261
10262    * stage3hr.c, stage3hr.h: In pair_up_concordant, treating hits and terminals
10263      separately.  When all results are double terminals, treating as if it were
10264      final.
10265
10266    * stage1hr.c: Added variable gmap_rerun_p.  Fixed memory leak.  Removed use
10267      of segment->usedp.  Changed some uses of starti and endi.  GMAP result
10268      must be significantly better than original hit (reducing misses by half).
10269      GMAP pairsearch run only if hit list is small enough.  Keeping hits and
10270      terminals in separate lists.
10271
102722012-10-10  twu
10273
10274    * substring.c: Fixed bug in not copying trim_left_splicep and
10275      trim_right_splicep
10276
102772012-10-01  twu
10278
10279    * stage3.c: Added ability for Stage3_merge_local_splice to make a deletion
10280      instead of an intron
10281
10282    * stage3.c: Made code for coordinate change in Stage3_merge_local_splice
10283      match that of Stage3_merge_local_single
10284
102852012-09-26  twu
10286
10287    * gmap.c, gsnap.c, pair.c, pair.h, stage3.c, stage3.h, uniqscan.c: Added
10288      --require-splicedir flag and code for guessing at cdna direction.
10289
102902012-09-24  twu
10291
10292    * gmap.c, gsnap.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h, uniqscan.c:
10293      Removed --pairexpect and --pairdev flags.  Removed expected_pairlength and
10294      pairlength_deviation variables.
10295
10296    * gsnap.c, uniqscan.c: Using new interface to setup procedures
10297
10298    * stage1hr.c: Turned off debugging
10299
10300    * substring.c, substring.h: Added check for splice site at trimmed position,
10301      to be used in even-trimming
10302
10303    * stage3hr.c, stage3hr.h: Added Stage3end_trim_left and Stage3end_trim_right
10304      commands.  In Stage3pair_optimal_score, eliminating hits only if both hit5
10305      and hit3 are worse than optimal in pre-final stages.  In final stages,
10306      using the sum of hit5 and hit3.  Eliminated absdifflength field.
10307
10308    * stage1hr.c: For determining GMAP bounds, computing both close
10309      mappingstart/mappingend (at distal end) and middle
10310      mappingstart/mappingend, and using close if available.  If close
10311      mappingstart/mappingend does not give a good alignment at the end, then
10312      trying full pairmax plus shortsplicedist.
10313
103142012-09-21  twu
10315
10316    * stage1hr.c: Consolidated calls to Stage2_compute and Stage3_compute into a
10317      run_gmap procedure.  Computing close_genomicstart and close_genomicend
10318      values, but using full pairmax for now.
10319
103202012-09-20  twu
10321
10322    * stage1hr.c: In computing mappingstart and mappingend for GMAP region,
10323      evaluating each diagonal for shortsplicedist vs querylength extension
10324
103252012-09-19  twu
10326
10327    * stage3hr.c: Computing trim_left_splicep and trim_right_splicep for GMAP
10328      alignments.  Using trim_left_splicep and trim_right_splicep to determine
10329      trim amount.
10330
10331    * stage1hr.c: Taking lowprob splice only if both ends have minimum support
10332      (set at 20) and no subs or indels were found previously
10333
103342012-09-18  twu
10335
10336    * stage1hr.c: Requiring one splice site to be sufficient for lowprob
10337      splices. Finding best splice first by nmismatches, and then by prob.
10338
10339    * gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Using min_intronlength to
10340      prevent deletions from showing up as lowprob splices
10341
103422012-09-17  twu
10343
10344    * gmap.c, stage3.c, stage3.h: Implemented procedures to favor paths first by
10345      best goodness score, and then by shorter genomiclength
10346
103472012-09-14  twu
10348
10349    * stage3.c: Compute best_absmq_score, even if it is negative
10350
10351    * dynprog.c, pair.c, pair.h, stage3.c: Not protecting distal indels after
10352      known splice sites
10353
10354    * gsnap.c: For cmet-stranded and cmet-nonstranded mode, make
10355      --terminal-threshold=100 the default
10356
10357    * stage3.c: Allowing for merging when there is excess query sequence at the
10358      breakpoint
10359
103602012-09-13  twu
10361
10362    * oligoindex_hr.c: For minus strand, not subtracting 1 from left if left is
10363      0, which caused the entire sequence to be skipped
10364
10365    * datadir.c: Including comment about -F flag for cmetindex and atoiindex
10366
10367    * snpindex.c: Allowing snpindex to work when there is no gammaptrs file
10368
103692012-09-12  twu
10370
10371    * stage3hr.c: Changed a free command from FREE to FREE_OUT
10372
103732012-08-10  twu
10374
10375    * oligoindex.c: Fixed assertion for PMAP to use 3*querylength
10376
103772012-07-20  twu
10378
10379    * pair.c: Restored capitalized Coverage for standard output
10380
10381    * index.html: Added note about lower-case coverage and identity tags in GFF3
10382
10383    * pair.c: Changed Coverage and Identity in GFF3 output to be lower case
10384
10385    * Makefile.three.am: Fixed Makefile instructions
10386
10387    * VERSION, config.site.rescomp.tst, index.html: Updated version number
10388
10389    * gmap_build.pl.in: Fixed use of short and long options
10390
103912012-07-18  twu
10392
10393    * stage3.c: Merging pairs list of left and right parts for local merge, so
10394      that the resulting Stage3_T object can be used iteratively to find a
10395      chimera.
10396
10397    * gmap.c: Added DEBUG2A macro to show details of chimera detection
10398
103992012-07-17  twu
10400
10401    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Implemented iterative
10402      algorithm for finding chimeras
10403
104042012-07-12  twu
10405
10406    * VERSION, config.site.rescomp.tst, index.html: Updated version number
10407
10408    * pair.h, pair.c, samprint.c: Fixed problem in SAM output with an unpaired
10409      alignment with one end being a GMAP alignment
10410
104112012-07-09  twu
10412
10413    * gff3_introns.pl.in, gff3_splicesites.pl.in: Added -Q flag to suppress
10414      messages to stderr
10415
10416    * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Checking for
10417      transcript lines as well as mRNA lines
10418
104192012-07-06  twu
10420
10421    * README: Added instructions for using vcf_iit
10422
10423    * configure.ac, Makefile.am, vcf_iit.pl.in: Added vcf_iit program for
10424      processing VCF files
10425
104262012-07-05  twu
10427
10428    * dbsnp_iit.pl.in: Added exception types InconsistentAlleles and
10429      SingleAlleleFreq
10430
10431    * dbsnp_iit.pl.in: Printing exception handling rules to stderr even if
10432      exception file not given
10433
104342012-07-03  twu
10435
10436    * VERSION, index.html: Updated version number
10437
10438    * parserange.c: No longer checking for labels that match a contig
10439
10440    * stage3.c: In extend_ending5 and extend_ending3, checking for a gap between
10441      gappairs and the rest of the read
10442
104432012-06-27  twu
10444
10445    * gmap.c: Fixed bug which prevented -1 flag from working
10446
10447    * stage1.c: Made stage 1 work for PMAP
10448
10449    * stage3hr.c: Turned off debugging
10450
10451    * stage3hr.c: For hitpair_equiv_cmp, not looking at score or nmatches anymore
10452
104532012-06-25  twu
10454
10455    * VERSION, index.html: Updated version number
10456
10457    * splicetrie_build.c: Fixed contents of splicesite_i in splicestrings, after
10458      sorting of splice sites
10459
104602012-06-20  twu
10461
10462    * uniqscan.c: Using new interface to Stage3hr_setup
10463
10464    * VERSION, index.html: Updated version number
10465
10466    * index.html, gsnap.c, stage3hr.c, stage3hr.h: Added --gmap-min-coverage flag
10467
10468    * stage2.c: Changed find_shifted_canonical to go directly to Genome_hr
10469      procedures instead of allocating memory and saving past results
10470
10471    * gsnap.c, stage1hr.c: Added indel_knownsplice as option to --gmap-mode
10472
104732012-06-19  twu
10474
10475    * gmap_build.pl.in: Added -M flag for handling NCBI MD files
10476
104772012-06-18  twu
10478
10479    * stage3hr.c: Cleaned up optimal_score commands for removing terminal
10480      alignments in final stage.  Using trim_terminals_p argument in calling
10481      compute_mapq functions.
10482
10483    * pair.c, pair.h, substring.c, substring.h: Added trim_terminals_p argument
10484      to compute_mapq functions
10485
104862012-06-15  twu
10487
10488    * index.html, pair.c, substring.c: Reverted back to old behavior for
10489      computing MAPQ in entire read, but trimming off ends of type TERM
10490
10491    * index.html: Added comments for changes
10492
10493    * gmap_build.pl.in: Using long flag names
10494
10495    * pair.c, pair.h, stage3hr.c, substring.c: Computing MAPQ scores over
10496      trim-kept region, instead of entire substring
10497
10498    * VERSION, config.site.rescomp.tst, index.html: Updated version number
10499
10500    * stage3.c: In trim_noncanonical_end_exons, keeping known introns only if
10501      nmismatches == 0
10502
105032012-06-14  twu
10504
10505    * stage3hr.c: Allowing Stage3end_remove_overlaps to work with translocations
10506
10507    * stage1hr.c: Allowing for multiple translocations to be reported.  Not
10508      updating nconcordant for GMAP pair revisions
10509
10510    * outbuffer.c, resulthr.c, samprint.c: Allowing for multiple single-end and
10511      unpaired translocations to be printed
10512
10513    * resulthr.c, samprint.c: Allowing for multiple paired translocations to be
10514      printed
10515
10516    * stage3hr.c, stage3hr.h: Changed Stage3pair_remove_overlaps and
10517      hitpair_sort_cmp, so they work on translocations
10518
10519    * stage3hr.c: Allowing multiple concordant translocations to be printed
10520
10521    * stage1hr.c: Not skipping GMAP on terminal alignments.  Performing
10522      align_concordant_with_gmap on with_terminal list.
10523
10524    * resulthr.c, resulthr.h, pair.c: Changed pairtype TRANSLOCATION to
10525      CONCORDANT_TRANSLOCATIONS
10526
10527    * stage1hr.c, stage3hr.c, stage3hr.h: Added a category of hitpairs called
10528      with_terminal, with lower priority than samechr or conc_transloc
10529
10530    * stage2.c: Increased value of SHIFT_EXTRA to fix a fatal bug
10531
105322012-06-13  twu
10533
10534    * stage3.c: Counting indels and short gaps as mismatches in
10535      Stage3_bad_stretch_p
10536
105372012-06-12  twu
10538
10539    * index.html: Added comment about improved detection of translocations
10540      within read ends
10541
10542    * stage3hr.c: Computing substring_for_concordance for both translocations
10543      (chrnum == 0) and intrachromosomal rearrangements (shortdistancep == false)
10544
10545    * stage1hr.c: Checking for bad stretch in GMAP hits, as soon as we call
10546      Stage3end_new_gmap
10547
10548    * index.html: Updated version number
10549
105502012-06-11  twu
10551
10552    * trunk, VERSION: Updated version
10553
10554    * Makefile.gsnaptoo.am: Removed extra includes of cmet and atoi files for
10555      GMAP
10556
10557    * oligoindex_hr.c: Getting the final oligomers when computing mappings
10558
10559    * stage3.c: Fixed computation of mappingstart and mappingend for traversing
10560      dual breaks on crick strand
10561
10562    * stage1.c: Restoring old scan ends algorithm
10563
10564    * stage1hr.c: Removed unused debugging macro
10565
10566    * stage3.c: In trimming novel splice ends, allowing perfect matches to
10567      extend into intron
10568
10569    * psl_introns.pl.in: Added print command
10570
10571    * Makefile.gsnaptoo.am: Added file dependencies
10572
10573    * stage3.c: Using QUERYEND_NOGAPS for pass 9a and pass 9b for GSNAP, so
10574      trimming will work.  Fixed computation of mappingstart and mappingend in
10575      traverse_dual_break.
10576
105772012-06-06  twu
10578
10579    * Makefile.dna.am, stage3hr.c: Adding an absolute sufficient minlength for a
10580      terminal, besides using querylength
10581
10582    * VERSION, config.site.rescomp.tst, index.html: Updated version number
10583
10584    * src: Committing property changes from last merge
10585
10586    * gmap.c: Increased max_nalignments from 3 to 10
10587
10588    * stage1hr.c: Fixed bug in find_terminals, where querypos3 was used to
10589      compute start_endtype and querypos5 was used to compute end_endtype,
10590      instead of querypos5 and querypos3, respectively.
10591
10592    * stage3hr.c: Allowing both ends to be of type TERM in a terminal, and
10593      checking for mismatches only between the trimmed ends.  Requiring that
10594      final length is querylength/3.
10595
10596    * dynprog.c: Dropped mismatch scores, which helps GMAP extend ends and find
10597      chimeras.
10598
10599    * stage3.c: Changed endalign for pass 9a and 9b from QUERYEND_NOGAPS to
10600      BEST_LOCAL.  This fixes an issue in GMAP where ends are truncated, and
10601      chimeras not found, as introduced in revision 64732 on 2012-05-22.
10602
10603    * stage2.c: Fixed bug in condition on suboptimal stage 2 paths, where we
10604      were requiring both fewer than max_nalignments results and score ==
10605      bestscore.  The condition should have been a disjunction, not a
10606      conjunction.
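
      Illustrative note: the difference between the two conditions can be
      sketched as below.  The helper name and its parameters are assumptions
      made for this sketch and are not taken from stage2.c.

```c
#include <stdbool.h>

/* Hypothetical helper (not from stage2.c): decides whether one more
   suboptimal stage 2 path should be kept. */
bool
keep_suboptimal_path (int nresults, int max_nalignments,
                      int score, int bestscore) {
  /* Buggy form (conjunction):
       (nresults < max_nalignments) && (score == bestscore)
     Fixed form (disjunction): keep the path while under the limit,
     or whenever it ties the best score. */
  return (nresults < max_nalignments) || (score == bestscore);
}
```

      With the conjunction, a path scoring below bestscore would be rejected
      even when fewer than max_nalignments results had been collected.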
10607
10608    * stage1hr.c: Skipping computation of GMAP on single-end terminal
10609      alignments, since that is a duplication of effort
10610
10611    * stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Restored
10612      assignment of endtypes to terminal alignments.  Using them again to
10613      determine whether to extend terminals left or right for GMAP alignments.
10614
10615    * stage1hr.c: Integrated two criteria for finding terminals: old method
10616      based on counting mismatches from ends, and new method based on width of
10617      (querypos3 - querypos5).
10618
106192012-06-05  twu
10620
10621    * stage3.c: Fixed bug in local chimera alignment with uninitialized value
10622      for genomicseg_ptr
10623
106242012-06-01  twu
10625
10626    * genome.c: Added assertions to Genome_get_char and Genome_get_char_blocks
10627      to check for negative coordinates
10628
10629    * dynprog.c: Removed debugging statements
10630
10631    * stage3.c, dynprog.c: Fixed get_genomic_nt to check for both genomicpos
10632      between 0 and genomiclength, and pos between chroffset and chrhigh
10633
10634    * VERSION, index.html, config.site.rescomp.tst: Updated version number
10635
10636    * dynprog.c, stage3.c: Checking only for genomepos < 0 in get_genomic_nt,
10637      not for chrpos between chroffset and chrhigh, which may need further
10638      debugging for chimeras.
10639
10640    * dynprog.c: Checking for genomepos < 0 in get_genomic_nt.
10641
10642    * stage3.c: For Stage3_extend_left and Stage3_extend_right, using
10643      get_genomic_nt instead of going directly to Genome_get_char.  Checking for
10644      genomepos < 0 in get_genomic_nt.
10645
10646    * stage3hr.c: In Stage3pair_remove_overlaps, allowing separate pair to
10647      subsume overlapping pair only if it is better
10648
10649    * stage3.c, dynprog.c: Fixed check of chrpos to compare genomicpos against
10650      chrhigh
10651
10652    * dynprog.c, stage3.c: Checking for chrpos between 0 and chrhigh - 1,
10653      inclusive
10654
10655    * dynprog.c, dynprog.h, gmap.c, gregion.c, gregion.h, splicetrie.c,
10656      splicetrie.h, stage1hr.c, stage3.c, stage3.h: Passing chrhigh along with
10657      chroffset to all procedures
10658
10659    * dynprog.c, genome.c: When chromosomal coordinate is negative, returns '*'
10660      instead of 'N'. Traceback procedures in dynamic programming will not add
10661      pairs with '*' genomic nucleotides.
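
      Illustrative note: a minimal sketch of the '*' sentinel idea, assuming a
      single chromosome held as a plain character array.  The function names
      and the upper-bound check are assumptions for this sketch and do not
      reproduce the code in genome.c or dynprog.c.

```c
#define OUT_OF_BOUNDS_NT '*'   /* sentinel character named in the entry */

/* Hypothetical accessor: returns '*' for any position outside
   0 .. chrlength-1, otherwise the nucleotide at chrpos. */
char
get_nt_checked (const char *chr_seq, long chrlength, long chrpos) {
  if (chrpos < 0 || chrpos >= chrlength) {
    return OUT_OF_BOUNDS_NT;
  }
  return chr_seq[chrpos];
}

/* Hypothetical traceback step: copies n genomic nucleotides starting at
   start into out (capacity at least n + 1), skipping positions that
   yield '*'.  Returns the number of nucleotides actually added. */
int
collect_pairs (char *out, const char *chr_seq, long chrlength,
               long start, int n) {
  int npairs = 0;
  for (int i = 0; i < n; i++) {
    char nt = get_nt_checked(chr_seq, chrlength, start + i);
    if (nt != OUT_OF_BOUNDS_NT) {
      out[npairs++] = nt;
    }
  }
  out[npairs] = '\0';
  return npairs;
}
```

      Using a character distinct from 'N' lets the caller tell an
      out-of-bounds position apart from a genuine ambiguous base.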
10662
10663    * util: Merged changes from last branch
10664
10665    * README: Added note that MAX_READLENGTH applies only to GSNAP
10666
10667    * stage2.c, stage3.c: Merged changes from
10668      branches/2012-06-01-merge-single-gap to fix problems with merging single
10669      gap on minus strand
10670
106712012-05-31  twu
10672
10673    * stage3.c: Protected another debugging statement from referring to
10674      genomicseg
10675
10676    * gsnap.c: Fixed documentation for --fails-as-input flag.
10677
10678    * gmap.c: Added --fails-as-input string to getopt processing.  Fixed
10679      documentation for --fails-as-input flag.
10680
10681    * dynprog.c: Added messages to stderr before all abort statements
10682
10683    * translation.c: Requiring translation_leftpos and translation_rightpos to
10684      be between 0 and querylength-1.
10685
106862012-05-25  twu
10687
10688    * gmap_build.pl.in: If -k 15 specified, but not -b, setting basesize to be 12
10689
106902012-05-24  twu
10691
10692    * Makefile.gsnaptoo.am: Added uniqscan program
10693
10694    * stage1hr.c: Decreased max_nalignments from 3 to 2
10695
10696    * dynprog.c: For known splicesites, adjusted low and high boundaries so
10697      contlength is always between 0 and endlength-1, inclusive.
10698
10699    * stage3.c: Not reducing genomejump at ends anymore
10700
107012012-05-23  twu
10702
10703    * VERSION, index.html: Updated version number
10704
10705    * gmap.c, stage1hr.c: Increased max_nalignments for stage 2 to 3
10706
10707    * stage3hr.c: Turned off check for cdna_direction != 0 and SENSE_NULL in
10708      declaring a GMAP alignment as bad
10709
10710    * stage3.c: Changed pass 9 from queryend_indels to queryend_nogaps, to avoid
10711      false positive indels at ends and to prepare for noncanonical end trimming
10712
10713    * splicetrie.c: Improved debugging statements
10714
10715    * pair.c: Added information about knowngapp and protectedp in printing pair
10716      information for debugging purposes
10717
107182012-05-22  twu
10719
10720    * stage3.c: In trimming of noncanonical introns near end, making an
10721      exception for known introns
10722
10723    * dynprog.c: Replaced noindel version of Dynprog_end_splicejunction
10724      functions with version allowing indels
10725
10726    * stage3.c, stage3.h, stage3hr.c: In Stage3_bad_stretch_p, excluding trimmed
10727      regions on ends
10728
10729    * uniqscan.c: Using new interfaces to setup procedures
10730
10731    * stage1hr.c: Added debugging statements
10732
10733    * genome_hr.c: Removed debugging statements
10734
10735    * gamma-speed-test.c: Using new interface to setup procedures
10736
10737    * dynprog.c, dynprog.h, stage3.c: Introduced new endalign type,
10738      QUERYEND_GAP, and using it in pass 8. Restored call of Dynprog_end
10739      procedures in trim ends using BEST_LOCAL, which does not try to find an
10740      intron.
10741
10742    * stage3.c: Better handling of ends: pass 8, best_local plus known splicing;
10743      pass 9, queryend_indels; pass 10, queryend_nogaps.  Medial gap not using
10744      known splicing.  Simplified trim_ends procedure.  Not removing or
10745      re-inserting known intron gaps.  Bayesian computation of mapping scores
10746      for GMAP alignments.
10747
10748    * splicetrie.c: Computing separate offsets for anchor and far splicesites
10749      for use in Dynprog_end_splicejunction procedures.  Not calling
10750      Dynprog_add_known_splice procedures.
10751
10752    * pairdef.h, pairpool.c, pairpool.h: Added knowngapp field to Pair_T object
10753
10754    * dynprog.c, dynprog.h: Allowing compute_scores procedures to work on
10755      genomicseg (for splicejunctions).  Added functions
10756      Dynprog_end5_splicejunction and Dynprog_end3_splicejunction to replace
10757      add_known_splice procedures. Calling traceback_local_nogaps in two parts.
10758      Dynprog_end_gap procedures returning final score even for QUERYEND_INDELS.
10759      Made debugging statements work without genomicseg.
10760
107612012-05-21  twu
10762
10763    * samprint.c: Checking if MD string output is empty and if so printing "0"
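
      Illustrative note: the check could look roughly like the sketch below.
      The helper name is hypothetical and is not the samprint.c code.

```c
#include <stddef.h>

/* Hypothetical helper: returns the text to print for the MD field,
   substituting "0" when the computed string is missing or empty. */
const char *
md_or_zero (const char *md_string) {
  if (md_string == NULL || md_string[0] == '\0') {
    return "0";
  }
  return md_string;
}
```

      A caller would then print the result after the "MD:Z:" tag prefix, so
      the field is never emitted with an empty value.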
10764
107652012-05-18  twu
10766
10767    * stage3hr.c: For paired-end reads, in cases of tie score, sorting results
10768      by genomic position
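
      Illustrative note: a tie-breaking comparator for qsort might be written
      as below.  The struct, its fields, and the assumption that a lower score
      is better are all made up for this sketch; they are not the Stage3pair_T
      definition.

```c
#include <stdlib.h>

/* Hypothetical record standing in for a paired-end result. */
typedef struct {
  int score;                  /* assumed: lower is better */
  unsigned long genomicpos;   /* assumed: leftmost genomic coordinate */
} Hitpair;

/* Orders by score first; equal scores are ordered by ascending genomic
   position, which makes the output order deterministic. */
int
cmp_score_then_position (const void *a, const void *b) {
  const Hitpair *x = (const Hitpair *) a;
  const Hitpair *y = (const Hitpair *) b;

  if (x->score != y->score) {
    return (x->score < y->score) ? -1 : 1;
  }
  if (x->genomicpos != y->genomicpos) {
    return (x->genomicpos < y->genomicpos) ? -1 : 1;
  }
  return 0;
}
```

      It would be passed to qsort as
      qsort(hits, nhits, sizeof(Hitpair), cmp_score_then_position).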
10769
10770    * splicetrie_build.c: For intron-level splicing information, sorting
10771      individual splicesites by ascending genomic position
10772
10773    * splicetrie.c, splicetrie.h: Computing spliceoffset needed to construct
10774      splicejunctions.  Calling Dynprog_local_nogaps procedures.  Requiring
10775      dynprog score > 0 on known splicejunctions.
10776
10777    * dynprog.c, dynprog.h: Fixed bug in making A-G and C-T ambiguous scores for
10778      all modes. Implemented traceback procedure using given sequence and no
10779      gaps for handling known splicejunctions.  End dynamic programming
10780      procedure now returns a final score for queryend_nogaps endalign.
10781      Implemented make_contjunction procedures to retrieve the continuous part
10782      of splicejunctions.  Made make_splicejunction_3 consistent with
10783      make_contjunction_3.
10784
10785    * stage3hr.c: Turned off check for min_splice_prob on GMAP alignments, since
10786      it appears not to work for known splicesites
10787
10788    * pair.c: Pair_dump_list now prints line to indicate start of list
10789
107902012-05-17  twu
10791
10792    * dynprog.c: Handling cases where length1 == 0 or length2 == 0, which
10793      otherwise cause fatal errors
10794
107952012-05-16  twu
10796
10797    * stage3.c: Setting use_genomicseg_p to false in all cases
10798
10799    * stage2.c: Using Oligoindex_hr_tally even if user genomic segment provided
10800
10801    * gmap.c: Computing genomicend even if user genomic segment provided
10802
10803    * genome-write.c: Added extra 4 words to end of genome blocks to accommodate
10804      nextlow in Oligoindex_hr procedures
10805
10806    * Makefile.gsnaptoo.am: Added source files for GMAP
10807
10808    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
10809      Updated version number
10810
10811    * stage1.c, stage1.h: Added procedure for nonstranded alignment.  Turning
10812      off scan_ends algorithm and using only sampling.  Using indexdb size limit
10813      for standard mode, but not for cmet-nonstranded mode.
10814
10815    * stage2.c: Looking up genomic nt for all alignment pairs in
10816      convert_to_nucleotides
10817
10818    * uniqscan.c, gsnap.c: Using new interfaces to setup routines
10819
10820    * gmap.c: Using different indexdb size thresholds for standard and
10821      cmet-nonstranded modes
10822
108232012-05-15  twu
10824
10825    * stage1hr.c: Using new interfaces to stage 2 procedures
10826
10827    * stage3.c, stage3.h: Handling AMBIGUOUS_COMP the same as MATCH_COMP and
10828      DYNPROG_MATCH_COMP. Removed genomic_offset argument from Stage2_compute.
10829      Fixed intermediate alignment results for debugging by returning pairs
10830      instead of path from path_compute.
10831
10832    * oligoindex_hr.c, oligoindex_hr.h: Fixed Cmet_reduce commands for
10833      CMET_STRANDED mode
10834
10835    * match.c: Fixed memory leak in a debugging procedure, Match_print
10836
10837    * gregion.c, gregion.h: Made Gregion_filter_support function available
10838      again.  Added function Gregion_genestrand.
10839
10840    * Makefile.dna.am, dynprog.c, dynprog.h, genome.c, genome.h, gmap.c,
10841      stage2.c, stage2.h: Added code to make GMAP work on cmet-stranded and
10842      cmet-nonstranded modes
10843
108442012-05-14  twu
10845
10846    * stage3hr.c: Disallowing new Stage3pair_T object if its insertlength
10847      exceeds pairmax and we expect a concordant pair
10848
10849    * trunk, config.site.rescomp.tst, src, Makefile.dna.am, block.c, block.h,
10850      boyer-moore.c, boyer-moore.h, dynprog.c, dynprog.h, genome-write.c,
10851      genome-write.h, genome.c, genome.h, genome_hr.c, genome_hr.h, gmap.c,
10852      gsnap.c, intlist.c, intlist.h, oligoindex.c, oligoindex.h,
10853      oligoindex_hr.c, oligoindex_hr.h, pair.c, pair.h, splicetrie.c, stage1.c,
10854      stage1.h, stage1hr.c, stage2.c, stage2.h, stage3.c, stage3.h, stage3hr.c,
10855      stage3hr.h, uniqscan.c, util: Merged revisions 63606 to 64016 from
10856      branches/2012-05-08-genomic-nts to read genomic nt rather than generate
10857      genomicseg
10858
108592012-05-10  twu
10860
10861    * trunk, config.site.rescomp.tst, archive.html, index.html, src, dynprog.c,
10862      util: Merged revisions 63773 through 63823 from
10863      branches/2012-05-10-better-affine-gap to make score matrix symmetric, put
10864      sequence2 in outer loop, fix boundary conditions, and improve efficiency
10865
10866    * resulthr.c: Fixed uninitialized value for X2 on halfmapping_mult alignments
10867
10868    * samprint.c: Fixed uninitialized value for X2 on halfmapping_uniq alignments
10869
108702012-05-07  twu
10871
10872    * gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Made -E flag
10873      use exon_number field
10874
10875    * gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Added -E flag
10876      to ignore exon_number fields in GTF file
10877
10878    * VERSION: Updated version
10879
10880    * oligoindex_hr.c: Reverting to version that zeroes out counts for oligomers
10881      that are overabundant or not in query
10882
10883    * gmap.c, stage1hr.c, stage2.c, stage2.h: Providing a limit on the number of
10884      suboptimal alignments returned from stage 2.  Limit set to 2 for GMAP and
10885      1 for GSNAP.
10886
10887    * gsnap.c: Added getopt handler for --sam-multiple-primaries
10888
108892012-05-03  twu
10890
10891    * trunk, VERSION, config.site.rescomp.tst, src, dynprog.c, gmap.c, pair.c,
10892      pair.h, stage1hr.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h, util:
10893      Merged revisions 63036 to 63240 from branches/2012-05-01-affine-gap to
10894      implement an affine gap algorithm for dynamic programming
10895
10896    * uniqscan.c: Using new interface to Stage1hr procedure that contains
10897      first_absmq as an argument
10898
10899    * shortread.c: Generalized handling of old Illumina paired-end format ending
10900      in :0 or :<digit>.
10901
10902    * genome.c: Fixed function Genome_fill_buffer_simple_alt so it returns
10903      ref+alt, instead of empty+alt.
10904
10905    * indexdb.c: Made writing of offsetscomp file faster when blocksize == 1 (or
10906      k == b), by using a single write command instead of looping.
10907
10908    * goby.c, outbuffer.c: Implemented patches for Goby 2.0
10909
10910    * gmap.c, gsnap.c, outbuffer.c, pair.c, pair.h, result.c, result.h,
10911      resulthr.c, resulthr.h, samprint.c, samprint.h, stage1hr.c, stage1hr.h,
10912      stage3.c, stage3.h, stage3hr.c, stage3hr.h: Added flag
10913      --sam-multiple-primaries to allow multiple alignments to be marked as
10914      primary, if their mapping scores are equally good
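
      Illustrative note: one way to picture the effect of the flag on SAM
      output is sketched below.  The function, its parameters, and the
      assumption that alignments arrive sorted best-first are hypothetical;
      only the 0x100 bit (alignment is not primary) comes from the SAM format.

```c
#include <stdbool.h>

#define SAM_NOT_PRIMARY 0x100u   /* SAM flag bit: not the primary alignment */

/* Hypothetical sketch: scores[] is assumed sorted best-first.  Without
   the option only alignment 0 is primary; with it, every alignment whose
   score equals the best score is also left as primary. */
void
mark_primaries (unsigned int *flags, const int *scores, int n,
                bool multiple_primaries_p) {
  for (int i = 0; i < n; i++) {
    bool primaryp = (i == 0) ||
      (multiple_primaries_p && scores[i] == scores[0]);
    if (primaryp) {
      flags[i] &= ~SAM_NOT_PRIMARY;
    } else {
      flags[i] |= SAM_NOT_PRIMARY;
    }
  }
}
```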
10915
10916    * shortread.c: Handling older style Illumina paired-end reads that end in
10917      ":0"
10918
109192012-05-02  twu
10920
10921    * stage3.c: Fixed debugging statements
10922
10923    * oligoindex_hr.c: Not zeroing out counts[i] for oligomers that are either
10924      overabundant or not in query.  Saves time in allocate_positions function
10925      and in store functions.
10926
109272012-04-27  twu
10928
10929    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst, index.html:
10930      Updated version
10931
10932    * gmap.c: Using new interface to Stage3_compute
10933
10934    * stage3hr.c, stage3hr.h: Using nmatches_posttrim (adjusted by scores for
10935      indels) to compare alignments, except for pre-final terminals.  Performing
10936      iteration in optimal_score procedures to use updated trim boundaries as
10937      poor alignments are removed.
10938
10939    * stage3.h: No longer using nmatches_pretrim in Stage3_compute
10940
10941    * stage3.c: Removed use of QUERYEND_INDELS at ends of reads, and using
10942      QUERYEND_NOGAPS instead, to reduce time spent in dynamic programming
10943
10944    * stage1hr.c: Added usedp field to Segment_T object, and marking it true if
10945      the segment is used in making a Stage3end_T object.  Skipping these
10946      segments in finding terminals.
10947
10948    * pair.c, pair.h: Added pos5 and pos3 arguments to Pair_nmatches
10949
10950    * genome.c: Made uncompress_mmap faster by translating high and low words 16
10951      bits at a time.  Adding N's only if flags is not zero.
10952
10953    * dynprog.c: Modified bridge_intron_gap so it searches for indels on left or
10954      right of splice, but not both.  Changes algorithm from quadratic to linear
10955      time.
10956
109572012-04-25  twu
10958
10959    * stage1hr.c, stage3hr.c, stage3hr.h: Revisions made to hit_goodness_cmp and
10960      hitpair_goodness_cmp.  Using genomiclength in hit_goodness_cmp, final
10961      round.  For terminals, returning 0 in preliminary rounds.  Not using
10962      scores for terminals, and using scores only if GMAP or indels are
10963      involved.  In pair_up procedure, not using terminal scores to update
10964      found_score.
10965
10966    * indexdb.c: Fixed uninitialized variables that caused problems with older
10967      GMAP indices
10968
109692012-04-23  twu
10970
10971    * dynprog.c: Created separate functions, compute_scores_lookup_fwd and
10972      compute_scores_lookup_rev
10973
10974    * splicetrie.c: Fixed calls to Dynprog_end5_gap and Dynprog_end3_gap to use
10975      endalign instead of to_queryend_p.
10976
109772012-04-20  twu
10978
10979    * VERSION: Updated version
10980
10981    * stage3hr.c: Added check for too many indel breaks in GMAP
10982
10983    * stage1hr.c, stage3hr.c, stage3hr.h: Storing cdna_direction in hit of type
10984      GMAP and using it instead of sensedir when printing
10985
109862012-04-19  twu
10987
10988    * VERSION, index.html: Updated version number
10989
10990    * stage3.c: Turned off debugging
10991
10992    * stage3.c: Fixed uninitialized variable in trim_novel_spliceends
10993
10994    * dynprog.c, dynprog.h, stage3.c: Using correct endalign types in
10995      Dynprog_end5_known and Dynprog_end3_known
10996
10997    * pair.c: Fixed issue with uninitialized variable in printing splicesite
10998      labels
10999
110002012-04-18  twu
11001
11002    * result.c, result.h: Added mergedp variable
11003
11004    * outbuffer.c, outbuffer.h: Handling results where mergedp is true
11005
11006    * gmap.c: Allowing chimera finding to be turned off by setting -x to be 0.
11007      Added mergedp variable so merged alignments generate only a single result.
11008
110092012-04-16  twu
11010
11011    * VERSION, config.site.rescomp.tst, index.html: Updated version
11012
11013    * samprint.c: Handling case where clip-overlap results in a NULL substring
11014
11015    * gmap.c: In call to Stage3_merge_local_single, clipping parts around
11016      breakpoint instead of chimerapos and chimeraequivpos, to avoid issues
11017      where maxpeelback is insufficient
11018
11019    * stage3.c, stage3.h: Renamed variable from extendp to max_extend_p
11020
11021    * get-genome.c: Added --forsam flag to generate header for SAM files
11022
110232012-04-10  twu
11024
11025    * VERSION: Updated version
11026
11027    * stage1hr.c: Fixed bug from uninitialized variable
11028
11029    * gmap.c, stage3.c, stage3.h: Added a criterion for extending left and right
11030      chimera ends to consecutive mismatches, based on queryjump and genomejump
11031      being unequal.
11032
110332012-04-09  twu
11034
11035    * gsnap.c, uniqscan.c: Using new interface to Stage3hr_setup
11036
11037    * stage1hr.c: Finding terminals by a new method.  Instead of counting
11038      mismatches from end, requiring only that querypos3 - querypos5 is greater
11039      than index1part.  Now searching terminals on single-end reads even if a
11040      GSNAP alignment has been found.  Removed nsalvage == 0 requirement for
11041      searching terminals on paired-end reads.
11042
11043    * substring.c, substring.h: Added procedure Substring_set_endtypes
11044
11045    * stage3hr.c, stage3hr.h: Changed optimal score procedures to use max of
11046      max-terminal and min-other for prefinal rounds, and max of max-GMAP and
11047      min-other for final rounds.  For GMAP eventrim scores, not counting
11048      indels, and adding a penalty for long ambiguous ends, by dividing by
11049      index1part + (index1interval - 1).  Terminal alignments now compute their
11050      own endtypes.
11051
110522012-04-06  twu
11053
11054    * stage2.c: Fixed fatal bug when looking for shifted canonical splice site
11055      by checking that rightpos is less than genomicend.
11056
11057    * gmap.c, stage3.c, stage3.h: For chimeras, extending ends until three
11058      consecutive mismatches are found.  At final breakpoint, cleaning indels
11059      from ends.
11060
110612012-04-04  twu
11062
11063    * stage1hr.c: For single-end reads, finding distant splicing only when no
11064      other hits have been found
11065
11066    * VERSION, index.html: Updated version
11067
11068    * pair.c, pair.h, stage3.c: Fixed bug in Stage3_mergeable where we require
11069      end1 and start2 pairs to be computed
11070
11071    * splicetrie.c: Allowing 1 mismatch in distal exon
11072
11073    * stage3hr.c: Changed debugging statement to report score_eventrim
11074
11075    * stage3.c: Rewrote trim_novel_spliceends to scan pairs first to find
11076      genomicpos bounds and then iterate through genomicpos.  Allowing
11077      pick_cdna_direction to return SENSE_NULL if no introns exist.
11078
11079    * stage1hr.c: For GMAP terminals, also checking for a bad stretch in GMAP
11080      result after the call to align_halfmapping_with_gmap
11081
11082    * stage3.c: Fixed bug in trim_novel_spliceends when pairs is NULL
11083
110842012-04-03  twu
11085
11086    * stage3.c: Turned off trimming of novel splice ends for GMAP
11087
11088    * index.html: Made changes to reflect new version
11089
11090    * VERSION: Updated version
11091
11092    * stage3hr.c: Fixed issue where no non-terminal alignment existed, resulting
11093      in using min trim length of MAX_READLENGTH
11094
11095    * splicetrie.c: Allowing only 1 mismatch (previously 2) in internal splice
11096      region of 6 bp, and no mismatches in external splice region (previously
11097      depended on extension).  This avoids bad splicing due to poor gene models.
11098
11099    * pair.c, pair.h, stage3.c: Using pairs instead of pairarray in determining
11100      whether chimera ends are connectable
11101
11102    * pair.c, pair.h, stage3hr.c: Counting indels in GMAP alignments only within
11103      eventrim region
11104
11105    * stage3.c: Added function trim_novel_spliceends
11106
11107    * pair.c: Requiring that cdna_direction not be zero when printing splice
11108      site probabilities at the ends
11109
111102012-04-02  twu
11111
11112    * trunk, VERSION, src, chimera.c, chimera.h, gmap.c, outbuffer.c, pair.c,
11113      pair.h, pairpool.c, pairpool.h, stage1hr.c, stage3.c, stage3.h,
11114      stage3hr.c, stage3hr.h, util: Merged revisions 60621 to 60936 from
11115      branches/2012-03-27-gmap-chimeras to improve GMAP chimeras and to apply a
11116      uniform eventrim procedure in stage 3 optimal score procedures
11117
111182012-03-30  twu
11119
11120    * gamma-speed-test.c: Added to SVN repository
11121
11122    * config.site.rescomp.tst, configure.ac, memory-check.pl, atoiindex.c,
11123      cmetindex.c, gmap.c, gsnap.c, indexdb.c, pdldata.c, snpindex.c: Added
11124      --enable-mmap flag to configure.  Added small fixes to allow programs to
11125      work without mmap.
11126
111272012-03-29  twu
11128
11129    * stage3hr.c: Allowing trimming on both ends of a terminal alignment
11130
11131    * pair.c: Handling case where hard clipping is not possible
11132
111332012-03-27  twu
11134
11135    * stage1hr.c: Handling the case where GMAP alignment is attempted on a
11136      translocation
11137
111382012-03-23  twu
11139
11140    * VERSION, index.html: Updated version
11141
11142    * stage1hr.c: Added multiple checks for GMAP bad stretch
11143
111442012-03-22  twu
11145
11146    * index.html: Updated for latest version
11147
11148    * stage1hr.c: Fixing fatal bug when max_end_insertions is set to less than 3
11149
111502012-03-21  twu
11151
11152    * VERSION: Updated version
11153
11154    * index.html, archive.html: Updated for new version
11155
11156    * gsnap.c: Reduced default value of distant-splice-penalty from 2 to 1
11157
11158    * gsnap.c: Reduced default value of distant-splice-penalty from 3 to 2
11159
11160    * stage1hr.c: Changed criterion for GMAP salvage from
11161      Stage3end_bad_stretch_p to Stage3end_score > cutoff_level, because
11162      previous criterion caused GSNAP to miss distant splicing
11163
11164    * dbsnp_iit.pl.in: Checking whether the exceptions field is defined in the
11165      snp file
11166
111672012-03-20  twu
11168
11169    * gmap.c, outbuffer.c, stage3.c, stage3.h: Removed some unused parameters
11170
11171    * VERSION: Updated version
11172
11173    * stage3hr.c: Changed score for GMAP alignments to be post-trim matches
11174      minus penalties for splicing and indels.  Allowing Stage3end_bad_stretch_p
11175      to handle GMAP alignments.
11176
11177    * stage3.c: Not doing any peelback on the extension after trim_ends
11178
11179    * stage1hr.c: Moved check for GMAP bad stretch from elimination as a very
11180      bad alignment to a salvage status
11181
11182    * dynprog.h: Added comment
11183
111842012-03-19  twu
11185
11186    * pair.c, pair.h, stage3.c: Added an extend step after the ends are trimmed,
11187      to get as long an extension as possible.
11188
11189    * stage1hr.c, stage3.c, stage3.h: Added a check for GMAP alignment length,
11190      in addition to the bad stretch check
11191
11192    * gmap.c: Using new interface to Stage3_compute
11193
11194    * substring.c: Added comment
11195
11196    * stage1hr.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h: Removed
11197      computation of gmap_nconsecutive, and implemented Stage3_bad_stretch_p to
11198      evaluate GMAP alignments instead.
11199
112002012-03-16  twu
11201
11202    * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Filtering
11203      out comment lines beginning with '#'.
11204
11205    * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in: Printing
11206      results for last mRNA in each gene
11207
112082012-03-15  twu
11209
11210    * stage3.c: In traversing dual break, accepting the stage 2 solution only if
11211      the entire query sequence is bridged
11212
11213    * gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in,
11214      gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Made gff3 and
11215      gtf programs handle exons in arbitrary order
11216
11217    * gtf_genes.pl.in, gtf_introns.pl.in: Checking for comment lines
11218
11219    * gtf_genes.pl.in, gtf_introns.pl.in: Allowing GTF file to lack exon_number
11220      field
11221
11222    * gtf_genes.pl.in: Checking both gene_name and gene_id to get gene name
11223
112242012-03-13  twu
11225
11226    * gmap.c, inbuffer.c, inbuffer.h: Added -1 flag for self-align feature
11227
112282012-03-12  twu
11229
11230    * configure.ac, README, config.site: Changed default value of MAX_READLENGTH
11231      from 200 to 250
11232
11233    * stage1hr.c: Added missing else statement
11234
11235    * dynprog.c, splicetrie.c, splicetrie.h: Changed Splicetrie_solve_end5 and
11236      Splicetrie_solve_end3 to take triecontents, trieoffsets, and j, rather
11237      than triestart, and to check for a null pointer.
11238
11239    * stage3hr.c: No longer returning NULL from Stage3end_new_gmap when result
11240      crosses before beginning of genome or at end of chromosome.
11241
11242    * stage1hr.c: Removed possibility of dereferencing uninitialized memory when
11243      skipping over diagonals straddling beginning of genome.
11244
112452012-03-09  twu
11246
11247    * gmap.c, stage3.c, stage3.h: Made changes in Stage3_merge_single compatible
11248      with PMAP
11249
11250    * Makefile.dna.am, gsnap_fasta.c: Moved gsnap_fasta.c and bam_fasta and
11251      sam_fasta programs to GSTRUCT repository
11252
11253    * get-genome.c: Fixed -E option for printing exons from gene map files.
11254      Added -S option for printing sequence from gene map files.
11255
11256    * get-genome.c: Removed R from getopt string
11257
11258    * get-genome.c: Removed -R flag and references to map_relativep
11259
112602012-03-08  twu
11261
11262    * gmap.c: Removed debugging statement
11263
11264    * gmap.c: Fixed test for CHIMERA_SLOP to handle minus strand alignments
11265      correctly.  Added a remove duplicates step to stage3list_from_gregions.
11266      Using new interface to Stage3_merge_single and Stage3_merge_splice.
11267
11268    * stage3.c, stage3.h: Implemented functions Stage3_merge_single and
11269      Stage3_merge_splice. The first function uses dynamic programming to solve
11270      the region between the two parts.
11271
11272    * pair.c, pair.h: Implemented function Pair_set_genomepos_list
11273
112742012-03-07  twu
11275
11276    * gsnap.c: Fixed default values in --help statement for number of GMAP runs
11277      allowed
11278
11279    * stage1hr.c: Turned bad stretchp back on as a criterion for running GMAP
11280      improvement in paired-end reads.  Implemented GMAP improvement for
11281      single-end reads.
11282
11283    * substring.c, substring.h: Implemented Substring_genestrand
11284
11285    * stage3hr.c, stage3hr.h: Added genestrand as a field for Stage3end_T object
11286
112872012-03-02  twu
11288
11289    * stage3.c: Added another check to make sure we don't try to solve for dual
11290      breaks at the ends of an alignment
11291
112922012-03-01  twu
11293
11294    * dynprog.c: Making certain that left_prob and right_prob are initialized to
11295      0.0
11296
11297    * splicetrie_build.c: Limiting distance for splicetries to shortsplicedist
11298
11299    * stage3.c: Checking for endgappairs == NULL, before trying to access
11300      *endgappairs
11301
2012-02-29  twu

    * configure.ac: Changed default .config.site to ./config.site

    * dynprog.c: Fixed cases where left_prob and right_prob were not assigned

    * stage3.c: Making singlep a local variable in traverse_dual_genome_gap

2012-02-28  twu

    * stage3.c: Setting *singlep to false to fix bug in traversing dual genome
      gaps where left goodness or right goodness was called after a dual gap win.

    * gmap.c: Calling Alphabet_setup, Oligo_setup, Oligop_setup, Indexdb_setup,
      and Stage1_setup only when genome is provided

2012-02-27  twu

    * gmap.c, stage3.c, stage3.h: Restored cdna_direction == 0 when no splices
      are present. Transferring overall cdna_direction to first Stage3_T object
      in a chimera.

    * stage3.c: Disallowing cdna_direction to be set to 0

    * gmap.c: Added call to Oligo_setup

    * indexdb.c: Looking for offsetscomp_suffix exactly

    * gmap.c: Using new interface to Stage3_compute

2012-02-24  twu

    * VERSION: Updated version number

    * stage3.c: Handling case where non-canonical splice is exactly in the
      middle of the read

    * stage1hr.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h: Replaced
      high_quality computation on gmap alignments with gmap_nconsecutive

    * outbuffer.c, gsnap.c, goby.c, goby.h: Implemented new Goby code

    * gtf_splicesites.pl.in: Skipping comment lines that begin with '#'.

    * gtf_splicesites.pl.in: Removed requirement for exon_number field in GTF
      file

    * gmap_build.pl.in: Added -T flag to specify temporary build directory

    * gmap_build.pl.in: Deleting .coords file

2012-02-14  twu

    * gmap_build.pl.in: Fixed filenames for 1-digit values

    * oligo.c: Added repetitive oligos for 6-, 7-, and 8-mers.

    * indexdb.c: Improved warning message

    * gmap_build.pl.in: Made kmer size the default value for base size

    * configure.ac: Moved read of config.site and setting of CFLAGS earlier, so
      default CFLAGS is not set by autoconf.

    * setup1.test.in, Makefile.am, setup.ref12123positions.ok,
      setup.ref123positions.ok: Changed name of positions file to reflect new
      naming scheme

    * README: Added comment about new naming for positions file

    * VERSION: Updated version

    * config.site.rescomp.prd, config.site.rescomp.tst: Changed dates

    * configure.ac: Adding check for CFLAGS and setting default to be -O3

    * config.site: Commenting out default CFLAGS variable

    * README: Added comment about adding -m64 in CFLAGS for Macintosh machines

    * gsnap.c, stage1hr.c, stage1hr.h: Added control of gmap_indel_knownsplice
      feature to gsnap program

    * stage3hr.c, stage3hr.h: Added function to help run GMAP on indels to find
      known splicing

    * stage1hr.c: Added function to run GMAP on indels to find known splicing

    * stage1.c: Using new interface to Reader_new

    * pmapindex.c: Removed -R flag for processing reverse strand of genome.
      Consolidated code for computing offsets and positions.

    * pair.c: Eliminated extra points subtracted from Pair_nmatches, so that the
      function reports the correct number of matches

    * dynprog.c: Fixed bug where known splicing was being called when length2
      was 0, resulting in bad endpoints for binary search

2012-02-03  twu

    * stage1hr.c: Using spansize in computing floors.  Reduced value of
      STAGE2_MIN_OLIGO from 5 to 3.

    * gsnap.c, uniqscan.c: Using new interface to Stage1hr_setup and
      Spanningelt_setup

    * reader.c, reader.h: Removed blocksize as a field for Reader_T object

    * oligo.c, oligo.h: Storing oligosize as a static variable, and not using it
      anymore as a parameter to Oligo_next or Oligo_skip.

    * block.c: Removed oligosize as a field from Block_T object

    * stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Revised
      Stage3end_optimal_score to compare terminal and non-terminal alignments
      using an eventrim procedure, based on the maximal trims found in the
      non-terminal alignments, and re-computing the mismatch scores in those
      regions.

    * stage1hr.c: Running Stage3end_remove_overlaps to remove terminals that
      overlap with a GSNAP alignment.  Using spansize to compute fast_level.
      Removed index1part (oligosize) from calls to Reader_new and Oligo_next.

2012-02-02  twu

    * stage1hr.c, stage1hr.h: Generalized from 6 to STAGE2_MIN_OLIGO +
      index1interval, for deciding whether a found diagonal is close enough to
      end to redefine region for GMAP alignment.  Using spansize to compute
      floors, and counting all mismatches if spansize != index1part.

    * spanningelt.c, spanningelt.h: Added function Spanningelt_setup, which
      computes spansize for a given index1part and index1interval

    * pairdef.h, pairpool.c, stage3.c: Added field end_intron_p to Pair_T
      object, and using it in traversing dual genome gaps.  When applicable,
      solving introns where left intron or right intron is omitted.

    * stage1hr.c: Improved debugging messages by printing chromosomal
      coordinates rather than universal coordinates

    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, memory-check.pl,
      src, Makefile.dna.am, atoiindex.c, cmetindex.c, genomicpos.c,
      genomicpos.h, gmapindex.c, indexdb.c, indexdb_hr.c, pmapindex.c,
      snpindex.c, spanningelt.c, spanningelt.h, stage1hr.c, types.h, util,
      gmap_build.pl.in, gmap_setup.pl.in: Merged revisions 56772 to 56962 from
      branches/2012-01-31-index1interval to allow genomic indices up to 16-mers
      and sampling intervals from 1 to 3

2012-02-01  twu

    * resulthr.c, stage3hr.c, stage3hr.h: If anomalous splices (of type
      SAMECHR_SPLICE) occur within a read for a unique mapping, changing
      resulttype from _UNIQ to _TRANSLOC.

    * outbuffer.c: Fixed issue with double opening of nomapping output file,
      first as single and then as paired

2012-01-31  twu

    * gmapindex.c: Allowing only values 1, 2, and 3 for sampling interval

2012-01-30  twu

    * gmap.c, gsnap.c, indexdb.c, indexdb.h, stage1hr.c, stage1hr.h, uniqscan.c:
      Added --sampling flag and passing index1interval to Stage1hr_setup

    * uniqscan.c: Using new interface to Indexdb_new_genome

    * indexdb.c: Fixed incorrect variable name (oligo instead of oligoi).

    * gmap_build.pl.in: Removed fixed value of -q 3 in calls to gmapindex

    * gmap_build.pl.in: Added -q flag for sampling interval

    * gmap.c, gmapindex.c, gsnap.c: Allowing user to enter -k 16

    * block.c, indexdb_dump.c, indexdb_hr.c, stage1.c, stage1hr.c: Fixed mask
      calculation to use unsigned long, so kmer of 16 works

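Note on the mask fix above: a k-mer packed at 2 bits per base needs a mask
of 2*k low-order bits, and for k = 16 that means shifting by 32 bits, which
is undefined when the operand is a 32-bit unsigned int. The following is
only a sketch of the idea, not the code from block.c or stage1hr.c. The
name kmer_mask is hypothetical, and uint64_t is used here instead of
unsigned long so the sketch does not depend on the platform's long width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask covering the low 2*kmersize bits of a packed
   oligomer (2 bits per nucleotide). */
static uint64_t
kmer_mask (int kmersize) {
  assert(kmersize >= 1 && kmersize <= 16);

  /* The shift is done in a 64-bit type, so a shift count of 32
     (kmersize == 16) is well defined. */
  return ~(~UINT64_C(0) << (2 * kmersize));
}
```

With this sketch, kmer_mask(12) yields 0xFFFFFF and kmer_mask(16) yields
0xFFFFFFFF.
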
    * atoiindex.c, cmetindex.c, indexdb.c, indexdb.h, snpindex.c: Enabled
      writing of genome indices with kmer of 16

    * oligo.c: Allowing k-mer size of 16

    * genome_hr.c: Clarified types of gamma and value to be Positionsptr_T and
      firstbit to be int.

2012-01-27  twu

    * oligo.c: Allowing k-mer size to go down to 9

    * indexdb_hr.c, indexdbdef.h, indexdb.c: Eliminated separate data storage
      for offsets when expand_offsets_p is true.  Instead, expanding gammas into
      offsetscomp and making gammaptrs just the identity function.

2012-01-19  twu

    * indexdb.c, indexdb.h: Allowing selection of base size, and returning found
      base size. Simplified logic for selecting indexdb.

    * gmap.c, gsnap.c: Added --basesize flag to allow user to select base size

    * gmapindex.c: Allowing k-mer size to be 15 or less

2012-01-18  twu

    * gmapindex.c: Added break after handling flag -b

2012-01-13  twu

    * VERSION: Updated version number

    * Makefile.three.am: Rebuilt instructions from Makefile.gsnaptoo.am, plus
      Makefile.dna.am for PMAP and PMAPINDEX

    * Makefile.gsnaptoo.am: Removed gsnap_tally from distribution

    * README: Changed URL for Web site

    * alphabet.c: No longer checking new get_codon procedures against old ones

    * pmapindex.c: Revised limit for 8-mers to be alphabet size of 13 or less

    * README.PMAP: Added a README file for PMAP

    * indexdb.c, indexdb.h: Providing alphabet and alphabet_size to caller of
      Indexdb_new_genome

    * gmap.c: Added --alphabet flag to PMAP to specify a particular alphabet

    * atoiindex.c, cmetindex.c, genome_hr.h, snpindex.c, types.h: Moved
      definition of Storedoligomer_T from indexdbdef.h to types.h

    * block.c, block.h, stage1.c: Removed msb computation and storage in Block_T
      object, and initializing instead by Oligop_setup

    * alphabet.c, alphabet.h, oligop.c, oligop.h: Providing aa_index_table to
      Oligop procedures at run time

    * Makefile.dna.am, alphabet.c, alphabet.h, indexdb.c, indexdb.h,
      indexdbdef.h, pmapindex.c: Created Alphabet_T object.  Moved relevant
      codon-based procedures from indexdb.c to alphabet.c.  Replaced
      get_codon_fwd and get_codon_rev with lookup tables.  Allowed pmapindex to
      generate indices with different alphabets and alphabet sizes.

    * gmapindex.c: Added check for k-mer size being greater than or equal to
      base size

    * gmap.c: Made default k-mer size 7 for PMAP

    * Makefile.dna.am, indexdb.c, pmapindex.c: Made PMAP index files compatible
      with compressed hash tables and added flags to be consistent with other
      auxiliary indexing programs

2012-01-11  twu

    * VERSION: Updated version

    * index.html: Updated Web page for latest version

    * mapq.c: Fixed debugging statement

    * stage3hr.c: In output_cmp procedures, sorting first by nmatches (pretrim),
      and then by mapq.  Added procedure to enforce monotonicity of mapq scores.

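Note on the output ordering above: one way to read "sorting first by
nmatches, and then by mapq" plus "enforce monotonicity" is a two-key
descending sort followed by a pass that clamps any mapq that rises above
its predecessor. This is a sketch under that reading only. The Hit struct
and both function names are hypothetical, and the actual procedure in
stage3hr.c may define monotonicity differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an alignment result. */
typedef struct {
  int nmatches;   /* number of matches before trimming */
  int mapq;       /* mapping quality */
} Hit;

/* Descending by nmatches, then descending by mapq. */
static int
hit_cmp (const void *a, const void *b) {
  const Hit *x = (const Hit *) a;
  const Hit *y = (const Hit *) b;

  if (x->nmatches != y->nmatches) {
    return (x->nmatches > y->nmatches) ? -1 : 1;
  } else if (x->mapq != y->mapq) {
    return (x->mapq > y->mapq) ? -1 : 1;
  } else {
    return 0;
  }
}

/* Sorts the hits, then makes mapq non-increasing down the list. */
static void
sort_and_enforce_mapq (Hit *hits, size_t n) {
  size_t i;

  assert(hits != NULL || n == 0);
  if (n == 0) {
    return;
  }
  qsort(hits, n, sizeof(Hit), hit_cmp);
  for (i = 1; i < n; i++) {
    if (hits[i].mapq > hits[i-1].mapq) {
      hits[i].mapq = hits[i-1].mapq;
    }
  }
}
```

The clamp matters only when a hit with fewer matches carries a higher mapq
than a hit ranked above it.
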
    * stage1hr.c, stage3hr.c, stage3hr.h: Running optimal_score first allowing
      all GMAP alignments, then removing overlaps, and then running
      optimal_score again without special provision for GMAP alignments.

    * gmap.c, gsnap.c, pair.c, pair.h, uniqscan.c: Added flag --sam-use-0M to
      control printing of 0M in CIGAR strings

    * gmap.c, gsnap.c, stage3.h, uniqscan.c: Providing output_sam_p to
      Stage3_setup

    * stage3.c: In fill_in_gaps, handling dual breaks by inserting query and
      genomic segments when output_sam_p is true.

    * pair.c: In compute_md_string, setting state to be IN_MATCHES after seeing
      I token, so we don't print two successive ^ tokens.

    * stage3.c: Added a pass to remove adjacent insertions and deletions and fix
      single gaps again.

2012-01-10  twu

    * gmap_build.pl.in: Doing chmod on gammaptrs only if kmersize > basesize

2012-01-09  twu

    * gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h: Computing GMAP alignment
      score using standard splicing penalties and indel penalties

2012-01-06  twu

    * README: Added information about maximum genome length.  Added information
      about SAM fields.

    * atoiindex.c, cmetindex.c: Eliminated memory leaks and reduced memory usage
      from 20 GB to 12 GB for a human-sized genome

    * gmap.c, gsnap.c, uniqscan.c: Using new interfaces to Dynprog_setup and
      Stage3_setup to provide information about novelsplicingp.  Passing
      fileroot instead of dbroot to Datadir_find_mapdir.

    * stage3hr.c: Stage3end_optimal_score and Stage3pair_optimal_score now
      keeping all GMAP results

    * stage3.c, stage3.h: Marking pairs as disallowed when novelsplicingp is
      false and dynamic programming cannot find a solution provided by the
      splicing_iit file. Trimming end introns that are disallowed.

    * stage1hr.c: Added debugging message

    * dynprog.c, dynprog.h: The bridge_intron_gap function now handling runs
      where novelsplicingp is false and known splicing is at intron level.
      Dynprog_genome_gap returning NULL in such cases.

    * pair.c, pairdef.h, pairpool.c: Added field disallowedp to Pair_T object

    * iit-read.c, iit-read.h: Added functions IIT_low_exists and
      IIT_high_exists, used for intron-level known splicing.  Improved warning
      message for invalid IIT files.

2012-01-05  twu

    * gmapindex.c: Added check for total genomic length exceeding 4 GB

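Note on the genome length check above: such a check has to accumulate
chromosome lengths in a type wider than the 32-bit coordinates it is
guarding. The following is a sketch only. The function name, the array
argument, and the exact limit (2^32 - 1) are assumptions for illustration
and are not taken from gmapindex.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if the summed chromosome lengths fit in
   an unsigned 32-bit coordinate, and 0 otherwise. */
static int
genome_fits_32bit (const uint64_t *chrlengths, size_t nchrs) {
  uint64_t total = 0;
  size_t i;

  assert(chrlengths != NULL || nchrs == 0);
  for (i = 0; i < nchrs; i++) {
    /* Compare before adding, so the running total itself never exceeds
       UINT32_MAX and the 64-bit sum cannot wrap. */
    if (chrlengths[i] > UINT32_MAX - total) {
      return 0;
    }
    total += chrlengths[i];
  }
  return 1;
}
```

A caller would typically report an error and stop the build when this
returns 0.
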
    * snpindex.c: Reduced maximum memory usage for human genome from 12 GB to 8
      GB. Eliminated memory leaks.

2012-01-03  twu

    * trunk, src, util: Merged revisions 50470 to 50909 from
      branches/gmap-2011-10-24-mult-stage2

    * config.site.rescomp.tst, config.site.rescomp.prd: Changed date

2011-12-28  twu

    * README: Changed comment about splicing file with known introns possibly
      being buggy

    * VERSION: Updated version number

    * index.html: Added changes for new version

    * iit-read.c: If add_iit_p is true, trying filename with .iit suffix first
      before trying filename as given

    * stage3hr.c: Added check for low coordinate of new GMAP object being to the
      left of coordinate 0

2011-12-21  twu

    * shortread.c: Allowing for /3 endings in second end of Illumina short reads

2011-12-13  twu

    * index.html: Made update for new version

    * VERSION: Updated version number

    * stage3.c: Fixed issue in trying to solve dual introns with negative query
      coordinates

2011-12-09  twu

    * spanningelt.c, spanningelt.h, stage1hr.c: Alternate fix to problem with
      positions being NULL, while npositions > 0.  Fixed Spanningelt_set so it
      updates npositions along with positions.

    * VERSION: Updated version number

    * archive.html: Moved old version here

    * index.html: Added new version.  Added information about users group.

    * stage1hr.c: Added an extra check for positions[querypos] not being NULL,
      needed for nonstranded modes

2011-12-07  twu

    * gmap_build.pl.in: Fixed bug in not assigning cmd variable

2011-12-05  twu

    * shortread.c: Fixed bug in printing null accession for second end when
      using --fails-as-input flag

    * gmap_build.pl.in: Checking return codes from system calls

2011-12-02  twu

    * VERSION, index.html: Updated version number

    * gsnap.c: Changed name of flag from --ambig-splice-notrim to
      --ambig-splice-noclip

    * pair.c: Pushing "0M" between adjacent I and D operations in a cigar string

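Note on the "0M" entry above: some SAM consumers reject an insertion that
is immediately followed by a deletion (or the reverse), and a zero-length
match between the two keeps the operations separated. The following is a
sketch of that idea only. The Cigar_op struct and the function names are
hypothetical and do not reflect the token list that pair.c actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical CIGAR operation: a length and a type such as 'M', 'I', 'D'. */
typedef struct {
  int length;
  char type;
} Cigar_op;

static int
adjacent_indels_p (char a, char b) {
  return (a == 'I' && b == 'D') || (a == 'D' && b == 'I');
}

/* Writes the CIGAR string into buf, inserting "0M" between adjacent I and
   D operations.  Returns the string length, or -1 if buf is too small. */
static int
cigar_to_string (char *buf, size_t bufsize, const Cigar_op *ops, size_t nops) {
  size_t used = 0, i;
  int n;

  assert(buf != NULL && bufsize > 0);
  buf[0] = '\0';
  for (i = 0; i < nops; i++) {
    if (i > 0 && adjacent_indels_p(ops[i-1].type, ops[i].type)) {
      n = snprintf(buf + used, bufsize - used, "0M");
      if (n < 0 || (size_t) n >= bufsize - used) {
        return -1;
      }
      used += (size_t) n;
    }
    n = snprintf(buf + used, bufsize - used, "%d%c", ops[i].length, ops[i].type);
    if (n < 0 || (size_t) n >= bufsize - used) {
      return -1;
    }
    used += (size_t) n;
  }
  return (int) used;
}
```

For the operations 10M, 2I, 3D, 20M this sketch writes 10M2I0M3D20M.
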
    * gmap.c, gsnap.c, uniqscan.c: Using new interface to Splicetrie_setup.
      Providing --ambig-splice-notrim flag in GSNAP.

    * splicetrie.c, splicetrie.h: Providing behavior to turn clipping off at
      ambiguous known splice sites, useful if trying to turn off all soft
      clipping

    * stage1hr.c: Fixed debugging statement

2011-11-30  twu

    * stage1.c: Fixed variable name so PMAP could compile

2011-11-29  twu

    * dynprog.c, maxent_hr.c, maxent_hr.h, pair.c, stage1hr.c, stage3.c,
      substring.c: Checking for case where splice_pos minus margin goes beyond
      beginning of chromosome

    * VERSION, index.html: Updated version number

    * maxent_hr.c: Checking for case where splice_pos is smaller than margin

    * pair.c, samprint.c: Changed value of NM tag in SAM output to be edit
      distance (mismatches plus gaps)

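Note on the NM change above: edit distance in the SAM sense is the number
of mismatches plus the number of inserted and deleted bases, with introns
(N operations) not counted. The following is a sketch under that
convention. The Cigar_op struct and compute_nm are hypothetical names, and
whether pair.c and samprint.c count gap bases or gap openings is not stated
in the entry, so counting bases here is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CIGAR operation: a length and a type such as 'M', 'I', 'D'. */
typedef struct {
  int length;
  char type;
} Cigar_op;

/* NM = mismatches + inserted bases + deleted bases. */
static int
compute_nm (int nmismatches, const Cigar_op *ops, size_t nops) {
  int nm = nmismatches;
  size_t i;

  assert(nmismatches >= 0);
  assert(ops != NULL || nops == 0);
  for (i = 0; i < nops; i++) {
    if (ops[i].type == 'I' || ops[i].type == 'D') {
      nm += ops[i].length;
    }
  }
  return nm;
}
```

For an alignment 30M2I40M1D28M with 3 mismatches, this sketch gives NM = 6.
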
    * stage1hr.c: Fixed debugging statement

2011-11-27  twu

    * VERSION, index.html: Updated version number

    * gsnap.c: Calling SAM_setup

    * outbuffer.c: For GMAP, when quiet_if_excessive_p is true and npaths >
      maxpaths, not printing any output

    * pair.c, pair.h, stage3.c: Printing HI tag in SAM output

    * samprint.c, samprint.h: When quiet_if_excessive_p is true and npaths_mate
      > maxpaths, setting MATE_UNMAPPED in flag.  Printing HI tag.  Added
      function SAM_setup.

2011-11-25  twu

    * VERSION, index.html: Updated version number

    * stage3.c: For GMAP, using QUERYEND_INDELS instead of QUERYEND_NOGAPS

    * stage1.c: Added check for querylength being too short

    * gmap.c: Changed information in --help about how to turn off chimera
      detection

    * stage3.c: In Stage3_mergeable, added check for firstpart_npairs or
      secondpart_npairs being 0

2011-11-23  twu

    * index.html, VERSION: Updated version number

    * stage1.c: Made direct calls to fields in Match_T objects for speed.  Made
      changes so debugging macros work.

    * dynprog.c: Made special traceback procedure for queryend_nogaps

2011-11-21  twu

    * stage3hr.c: Fixed bug where Stage3end_remove_overlaps was not keeping ends
      where paired_usedp was true.  In Stage3end_remove_overlaps and
      Stage3pair_remove_overlaps, terminal alignments lose to all other types.

    * stage3hr.h: Put GMAP hittype before TERMINAL hittype

    * gsnap.c: Increased default values for max_gmap_pairsearch,
      max_gmap_terminal, and max_gmap_improvement

    * stage3hr.c: Changed code for Stage3end_remove_overlaps to parallel that
      for Stage3pair_remove_overlaps

    * samprint.c: Allowing for GMAP alignments to be printed for single-end reads

    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
      number

    * stage3.c: Added comment about passes 8 and 9

    * index.html: Made changes for 2011-11-20 version

    * uniqscan.c: Using new interface to Stage1_single_read

    * stage3hr.c: Changed the code for Stage3end_optimal_score to be consistent
      with that of Stage3pair_optimal_score.  Allowing terminal alignments to be
      considered along with other alignments.

    * stage1hr.c: For single-end alignment, initializing done_level to
      user_maxlevel, rather than opt_level, to be consistent with the code for
      paired-end alignment.  However, the values are the same anyway.

    * gsnap.c: Changed the default value for terminal_threshold from 3 to 2.
      Expanded the --help entry for --terminal-threshold.

    * index.html: Made changes for version 2011-11-17

2011-11-20  twu

    * stage3.c: Added more restrictions on non-canonical splices at ends of read

    * stage3hr.c, stage3hr.h: Added functions Stage3end_start_endtype and
      Stage3end_end_endtype for terminal alignments

    * gsnap.c: Using new interface to Stage1_single_read

    * gmap.c, stage3.c, stage3.h: Changed name of variable

    * stage1hr.c, stage1hr.h: Implemented GMAP terminal mode for single-end reads

    * dynprog.c: Lowered rewards for canonical introns, to help find
      non-canonical introns

    * stage3.c: In fill_in_gaps, if splicingp is false, then filling in a
      deletion, not an intron

2011-11-19  twu

    * stage3.c: Moved trim_noncanonical procedures from path_trim to
      path_compute. After trim_noncanonical procedures, doing an extension using
      QUERYEND_NOGAPS.

    * stage1hr.c: Using -3*nmismatches and -4 for an indel in evaluating end
      indels, corresponding to default values for trim_mismatch_score and
      trim_indel_score.

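Note on the end indel scoring above: with the stated defaults, a candidate
end indel costs -4 plus -3 for each remaining mismatch. The following is a
sketch of that arithmetic only. Both function names are hypothetical, and
the comparison against an indel-free alignment of the same end is an
assumption about how such a score might be used, not a description of
stage1hr.c.

```c
#include <assert.h>

/* Score of a read end aligned with one indel and nmismatches mismatches.
   The entry's defaults are trim_mismatch_score = -3, trim_indel_score = -4. */
static int
end_indel_score (int nmismatches, int trim_mismatch_score, int trim_indel_score) {
  assert(nmismatches >= 0);
  return trim_mismatch_score * nmismatches + trim_indel_score;
}

/* Hypothetical use: accept the indel only if it scores strictly better than
   aligning the same end without it. */
static int
end_indel_preferred_p (int nmismatches_with_indel, int nmismatches_without,
                       int trim_mismatch_score, int trim_indel_score) {
  int with_indel, without_indel;

  assert(nmismatches_without >= 0);
  with_indel = end_indel_score(nmismatches_with_indel, trim_mismatch_score,
                               trim_indel_score);
  without_indel = trim_mismatch_score * nmismatches_without;
  return with_indel > without_indel;
}
```

Under these defaults an indel that removes three mismatches is preferred
(-4 versus -9), while one that removes a single mismatch is not (-4 versus
-3).
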
2011-11-18  twu

    * stage3.c: Using endalign instead of to_queryend_p.  When endalign is
      QUERYEND_NOGAPS, not doing peelback.  In pass 8, changed extendp from
      false to true (QUERYEND_INDELS).

    * gsnap.c, uniqscan.c: Added flag --trim-indel-score

    * dynprog.c, dynprog.h: Replaced to_queryend_p with endalign, with types
      BEST_LOCAL, QUERYEND_INDELS, and QUERYEND_NOGAPS.  For extensions to
      queryend, always returning gappairs.

    * atoiindex.c, cmetindex.c: Updated --help to indicate how --kmer and -D are
      chosen by default

2011-11-17  twu

    * VERSION: Updated version number

    * gmap.c, gsnap.c, stage1hr.c, stage3.c, stage3.h, uniqscan.c: Revised
      calculation of insertlength inside trim_noncanonical_ends procedures to
      compensate for using maximum overlap in computing genomicseg.

    * stage3.c: Made fill_in_gaps procedure replace short non-canonical introns
      with deletions.  Removed this feature from assign_gap_types.  Added
      additional checks on translation coordinates to stay within array bounds.

    * dynprog.c: Added debugging statements

    * substring.c: Added comment

    * shortread.c, shortread.h, stage1hr.c: For GMAP algorithm in GSNAP,
      assuming maximum overlap, rather than trying to compute overlap

    * outbuffer.c: Printing all paths when -E flag is given to GMAP

    * stage3hr.c: Improved error message when ambig end splicetype has an
      unexpected value

    * dynprog.c: Removed unused code for END_KNOWNSPLICING_SHORTCUT.  Always
      assigning *ambig_end_length in Dynprog_end5_known and Dynprog_end3_known.

2011-11-14  twu

    * uniqscan.c: Using new interface to Stage3hr_setup

    * stage3.c: Not substituting for long deletions at end of path_compute

    * stage1hr.c, stage3hr.c, stage3hr.h: Removed end_indel_p parameters to
      Stage3end_new_insertion and Stage3end_new_deletion

    * stage1hr.c: Preventing call to Genome_count_mismatches_limit by
      find_doublesplices where pos5 >= pos3

    * VERSION: Updated version number

    * configure.ac: Added flag --enable-popcnt

    * dynprog.c: Initializing value of ambig_end_length

    * stage3.c: Skipping gap pairs at the beginning of alignments in fill_in_gaps

    * goby.c: Using new interface to Result_array

2011-11-12  twu

    * VERSION: Updated version number

    * stage1hr.c, stage3hr.c: Checking in Stage3end_new_gmap if genomicstart or
      genomicend exceeds chrhigh, and if so, returning NULL.

2011-11-11  twu

    * stage3.c: Moved assigning of gap types to end of path_compute, rather than
      beginning of path_trim

    * stage3.c: Inserting gap pairs and adding gap types at beginning of
      path_trim

    * gsnap.c, outbuffer.c, outbuffer.h, samprint.c, samprint.h, stage1hr.c,
      stage3hr.c, stage3hr.h: Divided DISTANT_SPLICE type into SAMECHR_SPLICE
      and TRANSLOC_SPLICE. Making merge_samechr_p act only at print time, which
      allows SAMECHR_SPLICE to undergo pair_up_concordant again.  Stopping
      clip-overlap on distant splices.

2011-11-10  twu

    * VERSION: Updated version number

    * stage3hr.c: Keeping hits in optimal scoring if nmatches_posttrim is
      sufficiently high relative to the best hit

    * gsnap.c, uniqscan.c: Increased value of max_deletionlength from 30 to 50

    * gmap.c, stage1hr.c: Using new interface to Stage3_compute

    * stage3.c, stage3.h: Changed flow to path_compute on both cdna directions,
      then pick_cdna_direction, then path_trim, which removes non-canonical end
      exons.

    * stage3hr.c, stage3hr.h, substring.c, substring.h: Added nmatches_posttrim
      and using it to break ties resulting from equal values of nmatches.  Added
      a general test for Substring_new based on matches and mismatches before
      trimming.

    * pair.c: Modified Pair_nmatches to add penalties for an indel and for a
      splice site with low probabilities.  Modified compute_md_string on I
      tokens to skip only for insertion pairs.

    * dynprog.c: Fixed procedure find_best_endpoint_to_queryend to look only at
      r == length1

    * stage3hr.c: Turned off separate treatment of terminal alignments in
      Stage3pair_optimal_score.  Always trimming ends of insertions and
      deletions (previously depended on value of end_indel_p).

2011-11-09  twu

    * stage3.c: Using a sliding scale in trimming end exons

    * pair.c: Added debugging macro for compute_md_string

    * pair.c, pair.h: Taking cdna_direction as a parameter in
      Pair_print_exonsummary, instead of determining sense separately for each
      intron

    * stage3.c: Restored alignment score in pick_cdna_direction with different
      values of significant difference for GMAP and GSNAP.  Considering a
      non-canonical intron as canonical if both splice site probabilities are
      high.

    * splicing-score.c: Fixed getopt so -D takes an argument

    * pairpool.c, pairpool.h: Revised Pairpool_count_bounded to return number of
      pairs at start.

    * gmap.c: Added comments

    * stage3.c: Restored use of alignment scores in pick_cdna_direction, after
      comparing number of noncanonical splices

2011-11-08  twu

    * VERSION: Updated version number

    * shortread.c: No longer printing warning message about not finding "/1" or
      "/2" endings

    * gsnap.c, uniqscan.c: Added max_deletionlength variable

    * gmap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, samprint.c: No longer
      converting short noncanonical splices from type N to type D, since this is
      now performed in stage 3.  Removed cigar_noncanonical_splices_p variable.

    * stage3.c, stage3.h: In assign_gap_types, converting noncanonical splices
      smaller than max_deletionlength into deletions

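Note on the gap type conversion above: the rule reclassifies a genomic gap
as a deletion when it lacks canonical splice signals and is shorter than
max_deletionlength. The following is a sketch of that rule only. The Gap
struct and the function name are hypothetical, and stage3.c works on pair
lists rather than on an array like this one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical genomic gap: length in bases, CIGAR-style type ('N' for
   intron, 'D' for deletion), and whether the splice sites are canonical. */
typedef struct {
  int length;
  char type;
  int canonicalp;
} Gap;

/* Relabels short non-canonical introns as deletions, in place. */
static void
convert_short_noncanonical (Gap *gaps, size_t ngaps, int max_deletionlength) {
  size_t i;

  assert(gaps != NULL || ngaps == 0);
  for (i = 0; i < ngaps; i++) {
    if (gaps[i].type == 'N' && !gaps[i].canonicalp &&
        gaps[i].length < max_deletionlength) {
      gaps[i].type = 'D';
    }
  }
}
```

With max_deletionlength set to 50, a 25-base non-canonical gap becomes a
deletion, while a 25-base canonical intron and a 50-base non-canonical gap
are left as introns.
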
    * gsnap.c, stage3hr.c, stage3hr.h, uniqscan.c: Treating distant splices on
      the same chromosome as a translocation by default.  Added a flag
      --merge-distant-samechr to get previous behavior.

    * stage3.c: Using maxintronlen in trimming end exons

    * stage3.c: Removed alignment scores from pick_cdna_direction.  Revised
      procedures for trimming noncanonical end exons and doing distal/medial
      comparison by adding procedures canonicalp and good_end_intron_p, with
      latter using probabilities.

    * gmap.c: Increased parameter for maxpeelback_distalmedial from 24 to 100

    * dynprog.c: Added debugging statements

2011-11-07  twu

    * pair.c: Computing MD string from cigar tokens

    * stage2.c: Restored querydist penalty

    * VERSION: Revised version number

    * index.html: Added entry for new version

    * stage1hr.c: Commented out second round of terminal alignments

    * gmap.c, gsnap.c, uniqscan.c: Set trim_indel_score to -4 to be consistent
      with previous value

    * gmap.c, gsnap.c, uniqscan.c: Calling Pair_setup

    * substring.c: Extending trimming toward ends in case of ties

    * stage3.c: Extending ends completely before final trim

    * pair.h, pair.c: Extending trimming toward ends in case of ties.  Added
      Pair_setup function to use trim_mismatch_score value provided by user.

    * stage2.c: Put code for suboptimal starts into a compiler directive

    * gmap.c, gsnap.c, uniqscan.c: Provided new defaults for
      suboptimal_score_start and suboptimal_score_end, based on simulations

    * gmap.c, gsnap.c, stage2.c, stage2.h, uniqscan.c: Introduced parameters for
      suboptimal_score_end and suboptimal_score_start

2011-11-06  twu

    * gmap.c: Added flag for --suboptimal-score

    * gmap.c, gsnap.c, outbuffer.c, pair.c, pair.h, result.c, result.h,
      resulthr.c, resulthr.h, samprint.c, samprint.h, stage1hr.c, stage3.c,
      stage3.h, stage3hr.c, stage3hr.h, uniqscan.c: Restoring old MAPQ score.
      Making absolute MAPQ score a separate calculation, and printing it in an
      XQ flag.

2011-11-05  twu

    * stage1hr.c: Added comments

    * stage3hr.c: Added field indel_low, and using it to prefer indels at low
      genomic coords

    * stage1hr.c: Fixed computation of firstbound and lastbound so end indels
      are found on short reads, such as 36-mers.

2011-11-04  twu

    * pair.c: Added code to compute_cigar to merge duplicate token types

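Note on the token merge above: two neighboring CIGAR operations of the same
type, such as 10M followed by 5M, are normally written as one operation.
The following is a sketch of such a merge. The Cigar_op struct and
merge_adjacent_ops are hypothetical names, and compute_cigar in pair.c
works on its own token representation rather than on this array form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CIGAR operation: a length and a type such as 'M', 'I', 'D'. */
typedef struct {
  int length;
  char type;
} Cigar_op;

/* Merges runs of adjacent operations with the same type, in place.
   Returns the number of operations remaining. */
static size_t
merge_adjacent_ops (Cigar_op *ops, size_t nops) {
  size_t out = 0, i;

  assert(ops != NULL || nops == 0);
  for (i = 0; i < nops; i++) {
    if (out > 0 && ops[out-1].type == ops[i].type) {
      ops[out-1].length += ops[i].length;   /* extend the previous run */
    } else {
      ops[out++] = ops[i];                  /* start a new run */
    }
  }
  return out;
}
```

Applied to 10M, 5M, 2I, 3M, the sketch leaves 15M, 2I, 3M and returns 3.
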
    * stage2.c: Fixed uninitialized value for last_canonicalp

    * stage2.c: Implemented ability to generate suboptimal paths based on
      different initial positions

2011-11-03  twu

    * gmapindex.c: Removed unnecessary file open for -P flag

    * gmapindex.c: Fixed memory leaks for -G flag

    * gmapindex.c: Fixed memory leaks for -A flag

2011-11-01  twu

    * pairpool.c: Commented out copy of shortexonp

    * dynprog.c, dynprog.h, gmap.c, gsnap.c, stage3.c, uniqscan.c: Removed
      endlength requirement for microexons.  Returning prob2 and prob3 from
      Dynprog_microexon_int and applying two standards, depending on whether an
      indel was originally present

    * pair.c: Printing shortexon information

    * pairpool.c: Copying shortexon information in copying pairs

    * smooth.c: Removed unused parameters

    * stage3.c: Adding endlength requirement for finding microexons

    * smooth.c: Printing result of smoothing as pairs

    * stage3.c: Made penalties harsher for indels at end near poor splice sites

    * stage2.c: Computing best overall score during dynamic programming process

    * uniqscan.c: Using new interface to Stage3hr_setup

    * stage2.c: Made changes so PMAP could compile

    * stage3.h: Added parameter favor_mode to Stage3_compute

    * stage3.c: Counting indel near splice as 2 mismatches.  Added endlength
      requirement of 12 for indel near splice.

    * stage2.c: Going through all hits to accumulate cells.  No longer using
      number of links to set root scores.

    * gmap.c, stage1hr.c: Passing value of favor_mode to Stage3_compute

    * smooth.c: Relaxing probability requirement for end exons in GSNAP from
      0.05 to 0.10.

121182011-10-31  twu
12119
12120    * stage3.c: In trimming non-canonical end exons, not combining nearindelp
12121      with splice probs, requiring 1 mismatch or less for bingop, and allowing
12122      AT-AC introns.  Extending alignments to queryend before trimming
12123      non-canonical end exons.
12124
12125    * stage3.c: In trimming end exons, using a sliding scale based on intron
12126      length. Also penalizing for indels and mismatches close to exon-exon
12127      boundary.
12128
121292011-10-28  twu
12130
12131    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Providing
12132      expected_pairlength and pairlength_deviation values in Stage3hr_setup, and
12133      removing from Stage1hr procedures.
12134
12135    * stage3.c: In end exons, checking if indel present, and if so, requiring
12136      that splice probabilities both be greater than 0.9.
12137
12138    * gsnap.c, stage3hr.c, stage3hr.h: Restoring pairlength deviation.  Using
12139      expected pairlength and pairlength deviation to discriminate among
12140      paired-end reads.
12141
12142    * stage2.c: Adding all hits from final querypos directly to celllist, rather
12143      than updating rootscores.  Fixed update of rootscore information to use
12144      current querypos and hit.  Dynamic programming starting from querypos 0,
12145      rather than querypos 1.
12146
12147    * diag.c: Restricted update of diagonal in middle region, to avoid affecting
12148      subsequent beginning and end regions
12149
12150    * splicetrie_build.c: Fixed handling of intron intervals, by introducing
12151      INTRON_HIGH_TO_LOW
12152
121532011-10-27  twu
12154
12155    * stage3hr.c: Made other fixes to allow copying of GMAP hit types
12156
12157    * stage1hr.c: Making copies where necessary for multiple GMAP subpaths, and
12158      freeing old Stage3pair_T objects at the appropriate time.  Calling
12159      Stage3pair_remove_overlaps on double GMAP alignments.
12160
12161    * pairpool.c: Using CALLOC_OUT in Pairpool_copy_array
12162
12163    * gsnap.c: Printing sequence name when debugging memusage
12164
12165    * stage1hr.c: Handling multiple subpaths from stage 2 computation
12166
12167    * stage2.c: Removed restriction on number of positions for final non-zero
12168      querypos
12169
12170    * stage3hr.c: Allowing Stage3end_T objects of type GMAP to be copied
12171
12172    * pairpool.c, pairpool.h: Added function Pairpool_copy_array
12173
12174    * stage2.c: Fixed location of compiler directive for PMAP
12175
12176    * stage3.c: Restoring negative points for non-canonical introns in computing
12177      goodness
12178
12179    * stage2.c: Adding root scores for final non-zero and specific querypos
12180
12181    * gmap.c, list.c, list.h, stage1hr.c, stage1hr.h, stage2.c, stage2.h: Going
12182      back to changes from revision 50508 for multiple subpaths from stage2,
12183      plus revisions 50504 to 50909 from branches/gmap-2011-10-24-mult-stage2
12184      for a root position method for finding optimal and suboptimal subpaths
12185
121862011-10-25  twu
12187
12188    * src, gmap.c, list.c, list.h, stage1hr.c, stage1hr.h, stage2.c, stage2.h:
12189      Reverted to version 50507, before changes made to allow multiple paths
12190      from a stage2 computation
12191
121922011-10-24  twu
12193
12194    * VERSION: Updated version number
12195
12196    * archive.html, index.html: Added changes for 2011-10-24 version
12197
12198    * gmap.c, list.c, list.h, stage1hr.c, stage1hr.h, stage2.c, stage2.h: Merged
12199      revisions 50469 to 50504 from branches/2011-10-24-mult-stage2 to allow for
12200      multiple stage2 results from the same genomic segment for GMAP.
12201
12202    * stage3.c: Assigning sensedir in all cases, based on intron scores if
12203      necessary
12204
12205    * gmap.c, gsnap.c, outbuffer.c, pair.c, pair.h, result.c, result.h,
12206      resulthr.c, resulthr.h, samprint.c, samprint.h, stage1hr.c, stage1hr.h,
12207      stage3hr.c, stage3hr.h, uniqscan.c: Computing MAPQ score relative to best
12208      alignment.  Printing X2 field in SAM output to provide second best MAPQ
12209      score.
12210
12211    * stage3.c, stage3.h: When sense_try is provided, assigning sensedir
12212
122132011-10-16  twu
12214
12215    * VERSION: Updated version number
12216
12217    * index.html: Added comment about change to GMAP
12218
12219    * gmap.c: Sorting stage3list before evaluating for chimeras
12220
122212011-10-14  twu
12222
12223    * VERSION: Updated version number
12224
12225    * index.html: Updated for new version
12226
12227    * configure.ac: Grouped together checks for built-in procedures
12228
12229    * splicetrie_build.c, splicetrie_build.h: Checking for splice sites being
12230      beyond the chromosome length boundary
12231
12232    * splicetrie.c: Handling case where trieoffsets has a NULL_POINTER, which
12233      can occur with intron-type splicing
12234
12235    * splicetrie_build.c: Handling case where nsites from a given splice site is
12236      zero
12237
122382011-10-13  twu
12239
12240    * popcnt.m4: Checking each built-in instruction only if available
12241
12242    * asm-bsr.m4: Assigning a value to x
12243
12244    * VERSION: Updated version number
12245
12246    * gsnap.c: Removed 'S' from getopt
12247
12248    * gsnap.c: Fixed naming of --splicingdir flag
12249
12250    * samprint.c: Not printing mate chr or mate chrpos when mate has an
12251      excessive number of paths
12252
12253    * popcnt.m4: Changed test for -mpopcnt from a compiler test to a run test,
12254      to make sure instruction is legal
12255
122562011-10-12  twu
12257
12258    * dbsnp_iit.pl.in: Fixed syntax
12259
12260    * stage1hr.c: Fixed debugging statements
12261
12262    * VERSION: Updated version number
12263
12264    * README, dbsnp_iit.pl.in: Made changes to handle exceptions within the snp
12265      file
12266
12267    * stage1hr.c: Fixed detection of novel splice ends for distant splicing
12268
12269    * maxent_hr.c: Fixed debugging commands
12270
12271    * stage3hr.c: Added comment
12272
12273    * stage3hr.c: Eliminating terminals in Stage3end_optimal_score when
12274      non-terminals are present
12275
12276    * iit-read.c: Checking for case where nintervals is zero
12277
12278    * iit-write.c: Fixed memory leak in IIT_build
12279
122802011-10-10  twu
12281
12282    * stage3hr.c: Removing -nindels from calculation of querylength_adj
12283
12284    * gsnap.c: Limiting minimum value of indexdb_size_threshold
12285
12286    * indexdb.c: Fixed bug in computing Indexdb_mean_size using compressed hash
12287      table
12288
12289    * genome_hr.c: Adding +1 to ctr only when necessary
12290
122912011-10-09  twu
12292
12293    * genome_hr.c, genome_hr.h: In gamma decoding, changed from division of
12294      shift by 2 to using shift - 1, made all variables unsigned ints, and added
12295      code for branchless computation.
12296
122972011-10-07  twu
12298
12299    * gmap.c, gsnap.c, inbuffer.c, inbuffer.h: Generalized --filter-chastity to
12300      work on either or both ends of a paired-end read
12301
12302    * index.html: Changed wording
12303
12304    * index.html: Added information about --filter-chastity option
12305
12306    * goby.c, gsnap.c, inbuffer.c, shortread.c, shortread.h, uniqscan.c:
12307      Implemented --filter-chastity option
12308
12309    * genome_hr.c: Implemented a 2-shift method for decoding gammas, which
12310      avoids a branch
12311
12312    * config.site.rescomp.prd, config.site.rescomp.tst: Updated for new version
12313
12314    * index.html: Entered new version 2011-10-07
12315
12316    * VERSION: Updated version number
12317
12318    * splicetrie.c: Fixed a bug where splicesites_i was not being initialized to
12319      NULL
12320
12321    * genome_hr.c: Removed dispatch procedures.  Moved part of read_gamma to top
12322      of loop, to guarantee that the final ptr location is correct.
12323
12324    * genome_hr.c: Removed nbits.  Implemented macros for clear_lowbit and
12325      clear_highbit.
12326
12327    * genome_hr.c: Implemented both dispatch and non-dispatch methods for
12328      getting offsetptrs from gammas
12329
123302011-10-06  twu
12331
12332    * genome_hr.c: Rearranged order of gamma computations within loops
12333
12334    * genome_hr.c: For gamma commands, trying builtin clz first, then bsr in
12335      assembly, then table lookup.
12336
12337    * Makefile.dna.am, Makefile.gsnaptoo.am: Added POPCNT_CFLAGS
12338
12339    * config.site: Made CFLAGS=-O3 and added comments about testing -mpopcnt
12340
12341    * acinclude.m4, asm-bsr.m4, popcnt.m4, configure.ac: Added tests for
12342      -mpopcnt compiler flag and bsr function in assembly
12343
12344    * genome_hr.c: Added assertion statements to make sure builtin_clz is not
12345      called with a value of 0
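
      Note on the entry above: the assertion matters because the GCC/Clang
      builtin __builtin_clz has an undefined result for an argument of 0.
      The sketch below shows one way to guard the call and fall back when
      the builtin is unavailable.  The function names high_bit and
      high_bit_portable are hypothetical, and the fallback here is a plain
      shift loop rather than the bsr assembly or table lookup that
      genome_hr.c is described as using.

```c
#include <assert.h>
#include <limits.h>

/* Portable fallback: position (0-based) of the highest set bit of x,
   found by shifting until nothing is left. */
static int
high_bit_portable (unsigned int x) {
  int pos = 0;

  assert(x != 0);
  while (x >>= 1) {
    pos++;
  }
  return pos;
}

/* Position (0-based) of the highest set bit of x.  Uses the compiler
   builtin when available; x must be nonzero in either case. */
static int
high_bit (unsigned int x) {
  assert(x != 0);               /* __builtin_clz(0) is undefined */
#if defined(__GNUC__) || defined(__clang__)
  return (int) (sizeof(unsigned int) * CHAR_BIT - 1) - __builtin_clz(x);
#else
  return high_bit_portable(x);
#endif
}
```

      Both functions should agree on every nonzero input, which gives a
      cheap way to check the builtin path against the fallback.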
12346
123472011-10-05  twu
12348
12349    * VERSION: Updated version number
12350
12351    * index.html: Made changes for latest version
12352
12353    * configure.ac: Added gff3 utility programs
12354
12355    * stage3hr.c: Sped up comparison of overlapping and separate paired-end
12356      alignments by using arrays instead of lists.
12357
12358    * stage3hr.c: Simplified procedure for finding bad superstretches
12359
12360    * stage1hr.c: Fixed bugs in pushing indels to low genomic position in
12361      solve_middle_insertion and solve_middle_deletion
12362
12363    * gsnap.c: Commenting out --genes option
12364
12365    * stage3hr.c: Performing separate runs of Stage3pair_remove_overlaps on
12366      overlapping and non-overlapping alignments
12367
12368    * stage3hr.c: Added check for bad superstretches in Stage3pair_overlap
12369
12370    * gsnap.c, stage1hr.c, stage1hr.h, uniqscan.c: Implemented a fast,
12371      integrated method for novel and known double splicing, and using it by
12372      default
12373
12374    * access.c: Added error messages when mmap fails
12375
123762011-10-04  twu
12377
12378    * stage3hr.c: Picking a winner in case of ties in Stage3pair_remove_overlaps
12379
12380    * stage1hr.c: Fixed calculation of genomicstart and genomicend for GMAP
12381      alignments, to extend as if there were no trimming
12382
12383    * stage3hr.c: Added debugging statements
12384
12385    * stage1hr.c: Made finding of novel doublesplices faster
12386
12387    * psl_introns.pl.in, psl_splicesites.pl.in: Moved print_exons into a
12388      subroutine
12389
12390    * psl_introns.pl.in, psl_splicesites.pl.in: Using donor_okay_p and
12391      acceptor_okay_p subroutines
12392
12393    * gtf_introns.pl.in: Changed warning message to refer to intron, not exon
12394
12395    * gtf_genes.pl.in: Removed unused variables
12396
12397    * Makefile.am, gff3_genes.pl.in, gff3_introns.pl.in, gff3_splicesites.pl.in:
12398      Added programs to handle GFF3 files
12399
12400    * Makefile.dna.am: Removed programs for processing BAM files
12401
12402    * iitdef.h, iit-write.c: Restored NUMERIC_ALPHA_SORT to sort types
12403
12404    * stage1hr.c: Fixed issue with finding novel doublesplices where not all
12405      middle segments were tested
12406
124072011-10-03  twu
12408
12409    * iitdef.h, iit-write.c, iit-write.h: Added code for creating IIT files
12410      internally
12411
12412    * archive.html, index.html: Made changes for version 2011-10-01
12413
124142011-10-02  twu
12415
12416    * VERSION: Updated version number
12417
124182011-10-01  twu
12419
12420    * sequence.c: Removed compiler directive to undef HAVE_ZLIB
12421
12422    * stage1hr.c: Allowing reads to be equal to index1part+2, not just greater
12423      than
12424
12425    * Makefile.dna.am, uniqscan.c: Added program uniqscan
12426
124272011-09-30  twu
12428
12429    * Makefile.dna.am, dynprog.c, dynprog.h, gmap.c, gsnap.c, splicetrie.c,
12430      splicetrie.h, stage1hr.c, stage1hr.h, stage3.c, stage3.h: Moved
12431      splicesites and trieoffsets into local static variables to avoid passing
12432      them as parameters
12433
12434    * stage1hr.c: Moved check of genestrand value outside loop for retrieving
12435      oligos
12436
12437    * substring.c: Created separate procedures for mark_mismatches for stranded
12438      and non-stranded cases
12439
12440    * genome_hr.c, gsnap.c, indexdb.c, mode.h, stage1hr.c, substring.c:
12441      Implemented stranded and non-stranded versions of cmet and atoi
12442      substitutions
12443
12444    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
12445      compress.c, genome_hr.c, genome_hr.h, gsnap.c, mapq.c, mapq.h,
12446      splicetrie.c, splicetrie.h, stage1hr.c, stage3hr.c, stage3hr.h,
12447      substring.c, substring.h, util: Merged revisions 48527 to 48790 from
12448      branches/2011-09-28-atoi
12449
124502011-09-29  twu
12451
12452    * stage1hr.c: Previously returning NULL if either end of a paired-end read
12453      had no oligos.  Now returning NULL if both ends have no oligos.
12454
12455    * stage3hr.c: Requiring that overlap be greater than both ends of a
12456      paired-end read
12457
124582011-09-28  twu
12459
12460    * oligoindex_hr.c: Not computing oligoindex if left_plus_length <= left
12461
12462    * psl_introns.pl.in, psl_splicesites.pl.in: Added warning if intron lengths
12463      are negative
12464
12465    * splicetrie_build.c: Added check for negative distances in a splicing_iit
12466      file
12467
12468    * genome_hr.c: Fixed assignment to predicate inside of assert statement
12469
124702011-09-22  twu
12471
12472    * VERSION: Updated version number
12473
12474    * gsnap.c: Fixed typo in help statement
12475
12476    * table.c, uinttable.c: Allowing retrieval of keys to work, even if table is
12477      empty
12478
12479    * substring.c: Allowing case where nothing is found in splicesites_iit,
12480      because it is for introns and not splicesites
12481
12482    * splicetrie.c: Fixed case where splicetrie_obs is not NULL, but
12483      splicetrie_max is NULL, which occurs if the splicing file is for introns,
12484      and not splicesites
12485
12486    * splicetrie_build.c: Fixed memory leak in building splicetrie for introns
12487      splicing file
12488
124892011-09-20  twu
12490
12491    * cappaths.c, spliceclean.c: Moved cappaths and spliceclean programs to
12492      GSTRUCT repository
12493
12494    * Makefile.dna.am, genecompare.c: Moved program genecompare to GSTRUCT
12495      repository
12496
12497    * Makefile.dna.am: Moved cappaths to GSTRUCT repository
12498
12499    * Makefile.dna.am, splicegene.c, spliceturn.c: Moved spliceturn and
12500      splicegene to GSTRUCT repository
12501
125022011-09-16  twu
12503
12504    * gsnap.c: Provided further clarification in help statement about
12505      min-localsplice-endlength
12506
12507    * gsnap.c: Checking that min_distantsplicing_end_matches is greater than or
12508      equal to kmer size.  Clarified some help statements.
12509
12510    * README: Added recommendation to use known splice sites, rather than known
12511      introns
12512
12513    * README: Clarified that a given set of known splice sites can find
12514      alternative splicing.
12515
12516    * except.c: Fixed Except_advance_stack to return a value if pthreads not
12517      available
12518
12519    * Makefile.dna.am: Moved instructions for spliceclean and splicealt to
12520      GSTRUCT repository
12521
12522    * psl_introns.pl.in: Removed extraneous "v" at beginning of line
12523
125242011-09-14  twu
12525
12526    * VERSION: Updated version number
12527
12528    * index.html: Updated page to show version 2011-09-14
12529
12530    * inbuffer.c, sequence.c, shortread.c, shortread.h: Revised read procedures
12531      to handle multiple files correctly
12532
125332011-09-13  twu
12534
12535    * pair.c, pair.h, samprint.c: For SAM output of GMAP alignments, printing
12536      correct value of NH for number of hits
12537
12538    * stage3.c, stage3.h, gmap.c, gsnap.c: Added parameter for min_intronlength
12539
12540    * Makefile.dna.am, bam_pileup.c, bam_tally.c, bamread.c, bamread.h,
12541      gsnap_extents.c, gsnap_splices.c: Moved files to gstruct
12542
12543    * stage3.c: Reduced value of MIN_NONINTRON from 50 to 9 to avoid declaring
12544      short introns as indels
12545
12546    * pair.c: Fixed Pair_print_sam to work properly for chimeric alignments
12547
12548    * stage3.c: Cleaning up gaps and indels from ends at end of stage 3
12549
12550    * stage1hr.c: Fixed debugging statement
12551
125522011-09-09  twu
12553
12554    * VERSION: Updated version number
12555
12556    * dynprog.c: Fixed computation of lband and rband in find_best_endpoint and
12557      find_best_endpoint_to_queryend
12558
12559    * dynprog.c: Added protections against length1 being negative
12560
125612011-09-07  twu
12562
12563    * index.html: Made changes for 2011-09-07 version
12564
12565    * samprint.c, pair.c, pair.h, stage3.c: Removed almost all unused parameters
12566
12567    * outbuffer.c, stage3.c, stage3.h: Removed most unused parameters
12568
12569    * pair.c: Fixed compiler messages about comparison of signed and unsigned
12570      ints
12571
12572    * compress.c, compress.h: Fixed compiler messages about comparison of
12573      unsigned and signed ints
12574
12575    * svncl.pl: Made changes to fix merges of words between lines and preserve
12576      original numbers of spaces
12577
12578    * VERSION: Updated version number
12579
12580    * shortread.c: For input error involving line that is too long, printing
12581      accession where problem occurred.
12582
12583    * Makefile.am: Including full set of psl and gtf parsing programs with or
12584      without fulldist
12585
12586    * genome.c: Added include line for genomicpos.h
12587
12588    * Makefile.dna.am: Included genomicpos.c and genomicpos.h for
12589      extents_genebounds
12590
12591    * README: Made changes to reflect new gtf_introns program
12592
12593    * configure.ac, Makefile.am, gtf_introns.pl.in: Added gtf_introns program.
12594      Also putting psl_introns back into the public distribution.  Made changes
12595      accordingly in README file.
12596
12597    * dynprog.c, dynprog.h, stage3.c: When searching for gappairs_alt using
12598      probabilities, bounding the search based on a score threshold computed
12599      from the original score
12600
12601    * stage3.c: For GSNAP, no trimming of noncanonical ends based on
12602      probabilities, since need to compare fwd and rev directions.  Stopped
12603      final trimming of short end exons.  For gappairs_alt, accepting if it
12604      results in high-probability splice sites.  For pick_cdna_direction, using
12605      separate donor and acceptor scores and using alignment score again.
12606
12607    * stage3.c: Put final pass to find canonical introns before trimming of dual
12608      breaks at ends
12609
12610    * stage3.c: Fixed problem with trimming dual breaks where it was trimming
12611      indels. In trimming noncanonical exons at end, reduced NONCANONICAL_ACCEPT
12612      from 20 to 15, and added NONCANONICAL_PERFECT_ACCEPT.  In
12613      pick_cdna_direction, turned off use of indel_alignment_score, and added
12614      nmatches - nmismatches - 3*nindels with MATCHES_SIGDIFF after use of
12615      totalintronscore.
12616
12617    * pair.c, dynprog.c: Counting nindels correctly near splice sites
12618
12619    * gtf_splicesites.pl.in: Allowing GTF file to use tag gene_id instead of
12620      gene_name
12621
126222011-09-06  twu
12623
12624    * stage1hr.c: Fixed masking of oligos for cmet and atoi modes
12625
12626    * gsnap.c: Added 'S' flag to getopt command.  Removed 'R' flag from getopt
12627      and from help message.
12628
12629    * datadir.c: Fixed error message to remove the word default
12630
12631    * stage3.c: Peeling back 1 pair in a dual break (previously turned off) to
12632      avoid having a gap on either side.
12633
12634    * dynprog.c, dynprog.h, stage3.c: Changed bridge_intron_gap to have an
12635      explicit parameter for use_probabilities_p.  Using indel_alignment_score
12636      now in pick_cdna_direction.
12637
12638    * splicetrie.c, splicetrie.h: Providing contlength as parameter when making
12639      3' splicejunctions
12640
12641    * genome.c, genome.h: Added functions for Genome_fill_buffer_blocks that do
12642      not print final null at end of string, needed for making splicejunctions.
12643
12644    * dynprog.c, dynprog.h: Fixed creation of splicejunctions.  For 5'
12645      splicejunctions, not printing final null at end of string.  For 3'
12646      splicejunctions, printing distal sequence at splicejunction[contlength].
12647
126482011-09-02  twu
12649
12650    * atoiindex.c, cmetindex.c: Masking oligomer indices to given index1part
12651      size
12652
12653    * gmap.c: Fixed bug in pairalign where min_matches was not being adjusted
12654      downward to MIN_MATCHES
12655
126562011-09-01  twu
12657
12658    * stage3.c: Fixed bug where longer end of dual break was being trimmed, not
12659      shorter end
12660
12661    * dynprog.c: Allow for cDNA insert of up to 3 bp at splice site
12662
12663    * dynprog.c, dynprog.h, gmap.c, gsnap.c, stage3.c: Introduced
12664      --microexon-spliceprob flag and allowing microexons only if one of the
12665      splice sites exceeds this value
12666
12667    * stage3.c: Always keeping gappairs if finalp is true (was forcep)
12668
12669    * dynprog.c: Always using splice site probabilities to find introns if
12670      finalp is true, and allowing indels nearby
12671
12672    * stage3.c: Added a factor for dual break query jump to avoid dual breaks at
12673      end with small query jumps
12674
12675    * gmap.c: Fixed indentation
12676
12677    * scores.h, pair.c: Added negative points in pathscore for non-canonical
12678      intron or for when cdna direction is indeterminate
12679
12680    * chimera.c: Not allowing chimeric transition into a gap
12681
12682    * stage1hr.c: Using new interface to Stage3_compute
12683
12684    * gsnap.c: Using new interface to Stage3_setup
12685
12686    * gmap.c: Added --nosplicing flag.  Fixed memory leak when matches <
12687      min_matches.
12688
12689    * stage3.c, stage3.h: Made splicingp a static variable.  Added a step to
12690      remove dual breaks from the ends of an alignment.
12691
12692    * stage2.c: Turning off link back to grand_fwd_hit when splicingp is false
12693
12694    * stage2.c: Using macros for diffdist_penalty under splicing and
12695      non-splicing cases
12696
12697    * stage2.c: Consolidated loops for use_shifted_canonical_p == true and ==
12698      false. Removed compiler branches when SHIFT_EXTRA is not defined.
12699
127002011-08-31  twu
12701
12702    * stage3.c: In trimming noncanonical exons from end, requiring end intron to
12703      be canonical and have donor prob or acceptor prob >= 0.9.
12704
12705    * chimera.c, chimera.h, pair.c, pair.h, stage3.c, stage3.h: Prohibiting
12706      chimeric join at a query position containing a gap
12707
12708    * gsnap.c: Added comment in --help output about how to turn off terminal
12709      alignments
12710
12711    * stage1hr.c: Using terminal_threshold in paired-end reads
12712
12713    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Replaced
12714      terminal_penalty with terminal_threshold
12715
127162011-08-30  twu
12717
12718    * gmap.c: Fixed entry for --pairalign in long_options
12719
12720    * gmap.c: If min_matches exceeds MIN_MATCHES, use MIN_MATCHES
12721
12722    * outbuffer.c, pair.c, pair.h, stage3.c, stage3.h: In compressed output,
12723      printing accession of usersegment instead of null dbversion
12724
12725    * gmap.c: Implemented --pairalign flag for aligning a pair of sequences via
12726      stdin
12727
12728    * splicetrie_build.c: Fixed compiler warning about unused variable
12729
12730    * stage3.c: Improved debugging output
12731
12732    * genome.c, gmap.c, inbuffer.c, inbuffer.h, sequence.c, sequence.h:
12733      Implemented --cmdline flag to align two sequences provided on the command
12734      line
12735
12736    * shortread.c: Changed warning message when /1 or /2 endings not found
12737
12738    * access.c: Changed warning message for Macs
12739
12740    * stage3.c: Fixed add_querypos_offset, which previously excluded gap pairs
12741
127422011-08-29  twu
12743
12744    * genome.c: Fixed compiler warnings about comparing ints and unsigned ints
12745
12746    * indexdb.c: Printing genomesubdir and then individual index file names in
12747      monitoring message
12748
12749    * indexdb.c: Using commas in initial monitoring message
12750
12751    * genome.c: Using commas in initial monitoring message.  Allowing allocation
12752      if mmap not available.
12753
12754    * gsnap.c: Modified messages about RNA-Seq and DNA-Seq
12755
12756    * stage1hr.c: Put GMAP modes on a single line in monitoring message
12757
12758    * indexdb.c: Restored debugging messages
12759
12760    * iit-read.c: Casting all size_t to (unsigned long) in error messages
12761
12762    * access.c: Added check for failure of fread, which happens with Macs on
12763      large genomes
12764
12765    * pair.c: Fixed case of a negative distance in printing GMAP alignment
12766      beyond chromosomal bounds
12767
12768    * substring.c, substring.h: Using new interface to Pair_print_gsnap
12769
12770    * segmentpos.c, indexdb_hr.c: Fixed compiler warning about comparing int and
12771      unsigned int
12772
12773    * oligo.c: Commented out unused procedures for dibase
12774
12775    * stage1hr.c: Using new interfaces to Stage3end_remove_overlaps and
12776      Pair_print_gsnap
12777
12778    * stage3hr.c, stage3hr.h: Removed unused parameters in
12779      Stage3end_remove_overlaps
12780
12781    * pair.c, pair.h: Fixed bug in GSNAP standard output in printing GMAP
12782      alignments beyond chromosomal bounds
12783
12784    * stage1hr.c: For finding GMAP mapping bounds using segments, checking
12785      plus_nsegments and minus_nsegments > 0, rather than plus_segments and
12786      minus_segments == NULL.
12787
127882011-08-27  twu
12789
12790    * VERSION: Updated VERSION
12791
12792    * pair.c: Switched from aaphase_g to aaphase_e, since aaphase_e correctly
12793      codes for all three positions of the stop codon
12794
12795    * config.site: Removed references to samtools
12796
12797    * gsnap.c, shortread.c, shortread.h: Added flags --fastq-id-start and
12798      --fastq-id-end.  Stripping Illumina paired-end endings more intelligently.
12799
128002011-08-26  twu
12801
12802    * dynprog.c, dynprog.h, splicetrie.c, stage3.c: Removed some unused
12803      parameters
12804
12805    * dynprog.c, dynprog.h, stage3.c: Removed code for INTRON_HELP
12806
12807    * gmap.c: Changed author list
12808
12809    * dynprog.c, dynprog.h, splicetrie.c, splicetrie.h: Removed references to
12810      gbuffer
12811
128122011-08-25  twu
12813
12814    * gsnap.c: Fixed warning messages for -N and -s.  In --help message,
12815      notifying user that full pathnames are allowed.
12816
12817    * stage3.h: Providing accessor commands for chrnum, chrstart, and chrend
12818
12819    * stage3.c: Fixed bug in determining coordinates for Stage3_mergeable
12820
12821    * gmap.c: Modified some debugging statements for chimeras
12822
128232011-08-19  twu
12824
12825    * stage1hr.c: Adding at least querylength beyond distal mappingstart or
12826      mappingend to obtain distal genomicstart or genomicend
12827
12828    * oligoindex.c: Increased size of genomicdiag by 1.  Added assertion about
12829      exceeding those bounds.
12830
12831    * gsnap.c: Added warning messages about interpretation of -N and -s flags
12832
12833    * genome_hr.c: Fixed bugs preventing program from compiling
12834
12835    * gmap_build.pl.in: Not providing -s flag to fa_coords
12836
128372011-08-16  twu
12838
12839    * archive.html, index.html: Updated for version 2011-08-15
12840
12841    * COPYING: Changed Developer
12842
12843    * gmap_setup.pl.in: Removed instructions about gmapdb_lc and gmapdb_lc_masked
12844
12845    * README: Added default value for MAX_READLENGTH
12846
12847    * README: Revised for new features
12848
12849    * indexdb.c: Added warning message if no gammaptrs file is produced
12850
12851    * gmapindex.c: Allocating an extra two chars in the offsets file names for
12852      the basesize
12853
12854    * archive.html, index.html: Changes made for 2011-03-28 version
12855
128562011-08-15  twu
12857
12858    * fa_coords.pl.in: Limiting number of warning messages about duplicate
12859      contigs
12860
12861    * atoiindex.c, cmetindex.c: Improved monitoring messages
12862
12863    * setup1.test.in, Makefile.am, setup.ref12123positions.ok,
12864      setup.ref123positions.ok: Changed tests for new gamma file format
12865
12866    * atoiindex.c, cmetindex.c, indexdb.c: Handling special case where
12867      index1part == basesize by not writing gammaptrs file and reading
12868      offsetscomp file directly into offsets.
12869
12870    * atoiindex.c, cmetindex.c: Modified procedures to work with new compressed
12871      offsets file format
12872
12873    * indexdb.c: Handling case where kmer == basesize
12874
128752011-08-14  twu
12876
12877    * gmap_setup.pl.in: Removed -S flag and added -s flag
12878
12879    * gmap_build.pl.in: Changed message to indicate that default order is chrom
12880      order
12881
12882    * fa_coords.pl.in: Removed -S flag
12883
12884    * gmapindex.c: Made chrom sort order the default
12885
12886    * gmap.c, gsnap.c: Added --splicingdir flag
12887
12888    * gsnap_splices.c: Turned off warning messages about non-canonical splices
12889
12890    * bam_tally.c: Fixed warning message output going to stdout
12891
12892    * bam_tally.c: Fixed printing of print_allele_counts_simple.  Allocating and
12893      freeing tallies within parse_bam procedure for each chromosome.
12894
12895    * indexdb.c: Added missing closing brace
12896
12897    * gsnap_extents.c: Using find_strand procedure from gsnap_splices, which
12898      trusts strand from SAM output
12899
12900    * spliceturn.c: Fixed eliminatep to be indexed by universal IIT index
12901
129022011-08-13  twu
12903
12904    * indexdb.c, indexdb.h, indexdb_hr.c, indexdbdef.h: Allowing backward
12905      compatibility with pre-gamma genomic indices.  Using littleendian and
12906      bigendian versions of gamma procedures.  Implemented more compatibility
12907      with bigendian machines.
12908
12909    * genome_hr.c, genome_hr.h: Instead of allocated/mmapped versions of gamma
12910      procedures, creating littleendian and bigendian versions
12911
12912    * spliceclean.c: Changed wording of monitoring messages from "Resolve" to
12913      "Choose". Providing count information even when cannot choose between fwd
12914      and rev.
12915
12916    * spliceturn.c: Printing monitoring message about number of splices
12917      eliminated
12918
12919    * bam_tally.c: Improved warning messages for genotypes inconsistent with
12920      reference allele
12921
12922    * atoiindex.c, cmetindex.c: Using offsetscomp instead of offsets in variable
12923      names.
12924
12925    * snpindex.c: Fixed bug with freeing gammaptrs_filename and
12926      offsetscomp_filename too early.  Using offsetscomp instead of offsets in
12927      variable names.
12928
12929    * indexdb.h: Removed unused procedures
12930
12931    * indexdb.c: Checking for rare case that ctr == 0 after all gammas read, and
12932      not advancing ptr in that case
12933
12934    * bam_tally.c: Added Tally_T structure to simplify data structures and speed
12935      up program
12936
12937    * bam_tally.c: Removed quality_score_constant.  Handling empty quality
12938      strings correctly.
12939
12940    * bamread.c: Handling empty quality strings correctly
12941
12942    * indexdb.c, indexdb_hr.c: Handling an offsetscomp_access condition that is
12943      not possible, to eliminate compiler warnings
12944
12945    * gmap.c, stage1hr.c: Reduced value of minendexon from 12 to 9, since we are
12946      using nmatches - nmismatches.  This gives much better results at ends.
12947
12948    * stage3hr.c: Restored usage of score before nmatches in remove_overlaps
12949      procedures
12950
12951    * stage3.c: For trimming at ends, using nmatches - nmismatches to evaluate.
12952      Fixed bug in pick_cdna_direction where cdna_direction was not assigned
12953      correctly.  For indels next to bad introns, requiring each end
12954      probability to be greater than 0.9, and taking alternate gappairs if
12955      nmismatches is less.
12956
129572011-08-12  twu
12958
12959    * trunk, VERSION, config.site.rescomp.prd, config.site.rescomp.tst, src,
12960      Makefile.dna.am, Makefile.gsnaptoo.am, access.h, atoiindex.c, cmetindex.c,
12961      genome_hr.c, genome_hr.h, gmap.c, gmapindex.c, gsnap.c, iit-read.c,
12962      indexdb.c, indexdb.h, indexdb_hr.c, indexdbdef.h, pmapindex.c, snpindex.c,
12963      splicetrie_build.c, types.h, util, gmap_build.pl.in, gmap_setup.pl.in:
12964      Merged revisions 44539 to 44852 from branches/2011-08-09-elias-gamma,
12965      implementing gamma coding to represent offsets in genomic indices
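
      Note: a minimal sketch of Elias gamma coding, the scheme named in the
      entry above.  It is not the code from the elias-gamma branch.  The names
      gamma_encode and gamma_decode are hypothetical, and bits are written as
      '0'/'1' characters for readability, where a real index would pack them
      into machine words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the Elias gamma code of n (n >= 1) into out as '0'/'1' characters
   and return the number of bits written.  out needs room for 64 chars:
   at most 63 code bits for a 32-bit n, plus the terminating NUL. */
static size_t
gamma_encode (uint32_t n, char *out) {
  int nbits = 0;
  size_t k = 0;
  uint32_t v;
  int i;

  assert(n >= 1);                       /* gamma codes exist only for n >= 1 */
  for (v = n; v > 1; v >>= 1) {
    nbits++;                            /* nbits = floor(log2(n)) */
  }
  for (i = 0; i < nbits; i++) {
    out[k++] = '0';                     /* unary prefix: nbits zeros */
  }
  for (i = nbits; i >= 0; i--) {
    out[k++] = ((n >> i) & 1) ? '1' : '0';   /* n in binary, high bit first */
  }
  out[k] = '\0';
  return k;
}

/* Decode one well-formed gamma code starting at *pp and advance *pp
   past it.  No validation is done on malformed input. */
static uint32_t
gamma_decode (const char **pp) {
  const char *p = *pp;
  int nbits = 0;
  uint32_t n = 0;
  int i;

  while (*p == '0') {                   /* count the unary prefix */
    nbits++;
    p++;
  }
  for (i = 0; i <= nbits; i++) {        /* read nbits + 1 binary digits */
    n = (n << 1) | (uint32_t) (*p++ - '0');
  }
  *pp = p;
  return n;
}
```

      Small values get short codes (1 is a single bit, 5 takes five bits),
      which is why the scheme suits sequences of mostly small positive
      integers.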
12966
129672011-08-10  twu
12968
12969    * stage1hr.c: Put GMAP pairsearch back in front of distant splicing.
12970      Characterizing GMAP pairsearch results according to quality, and updating
12971      either nconcordant or nsalvage.  Distant splicing done if nconcordant is
12972      0. Terminals done if both nconcordant and nsalvage are 0.
12973
12974    * stage3.c: Not using nnoncanonical in pick_cdna_direction.  Restored
12975      assignment of SENSE_NULL if no canonical site is found.
12976
12977    * stage3hr.c: In Stage3end_remove_overlaps and Stage3pair_remove_overlaps,
12978      using nmatches rather than score as the primary measure
12979
12980    * samprint.c, stage1hr.c, stage3hr.h: Introduced new hittype DISTANT_SPLICE
12981      and not trying to do GMAP alignment on those
12982
12983    * stage1hr.c: Moved distant splicing ahead of GMAP pairsearch
12984
12985    * stage3hr.c: Added debugging information
12986
129872011-08-09  twu
12988
12989    * substring.c: Trimming terminals with fixed -3 mismatch score, while
12990      allowing other ends to be controlled with user-specified
12991      trim_mismatch_score.
12992
12993    * dynprog.c: Using probabilities to find splice sites only if finalp is true
12994
129952011-08-08  twu
12996
12997    * bamread.c: Added missing #endif statement
12998
12999    * stage3.c: Limiting SENSE_NULL only to ties between fwd and rev
13000
13001    * stage3.c: Preventing semicanonical splices at end from being trimmed.
13002      Using product of probabilities to decide whether an indel is next to a bad
13003      intron
13004
13005    * dynprog.c: If canonical intron cannot be found, using probabilities to
13006      find best splice junction
13007
13008    * bamread.c, bamread.h: Added function Bamread_splice_strand
13009
13010    * samprint.c: Making cigar_noncanonical_splices_p true
13011
13012    * sequence.c, shortread.c: Made fixes in new handling of eoln situations
13013
130142011-08-07  twu
13015
13016    * stage3hr.c, stage3hr.h, substring.c, substring.h: Using runlength IIT for
13017      resolving multiple mappings
13018
13019    * spliceturn.c: Sorting splices in order of observed counts, and eliminating
13020      in that order
13021
13022    * snpindex.c: Fixed coordinates in error messages
13023
13024    * gsnap.c: Added option for using runlength IIT to resolve multiple mappings
13025
13026    * bam_tally.c: Added option for printing runlengths
13027
13028    * bam_tally.c: Printing genotype.  Added --diffs-only flag.
13029
130302011-08-06  twu
13031
13032    * dynprog.c: Providing separate rewards for GC-AG and AT-AC introns, with
13033      stronger reward for GC-AG.
13034
13035    * stage3.c: Not removing bad non-canonical exons at end.  Computing combined
13036      probability score for donor and acceptor splice sites.
13037
13038    * spliceturn.c: Fixed program to not depend on distinction between known and
13039      new splices
13040
13041    * gsnap_splices.c: Added --minsupport flag
13042
13043    * gmap.c: Added break after case 'z'
13044
13045    * stage3hr.c: Turned off TALLY_RATIO, and checking instead for presence or
13046      absence of overlap
13047
13048    * gsnap_splices.c: Added --mincount flag
13049
13050    * gsnap.c: Adding .iit to splicesites file when searching locally
13051
13052    * bam_pileup.c: Printing accessions at start and end of reads
13053
13054    * indexdb.c: Checking for full offsets_suffix, not just "offsets"
13055
13056    * dynprog.c, dynprog.h, stage3.c: Fixed bug in assigning the wrong value to
13057      splicingp.  For indels next to bad introns, checking the alternative to
13058      see if it is free of mismatches.
13059
13060    * Makefile.dna.am, Makefile.gsnaptoo.am, atoiindex.c, cmetindex.c,
13061      indexdb.c, indexdb.h, snpindex.c: Created a general Indexdb_get_filenames
13062      procedure and using it for snpindex, cmetindex, and atoiindex, so they
13063      work on all k-mer types
13064
13065    * stage3hr.c, substring.c, substring.h: Removed some unused parameters and
13066      variables
13067
13068    * outbuffer.c, stage3.c, stage3.h: Removed some unused parameters
13069
13070    * stage3.c: Not calling pick_cdna_direction when splicingp is false
13071
130722011-08-05  twu
13073
13074    * bam_pileup.c: Initial import into SVN
13075
13076    * outbuffer.c: Restored Paths and Alignments sections to "gmap -4" output
13077      (continuous by exon).
13078
13079    * VERSION: Updated version number
13080
13081    * stage3hr.c: In Stage3end_gene_overlap, initializing foundp
13082
13083    * iit-read.c: In IIT_gene_overlap, initializing allocp and freeing matches.
13084
13085    * gtf_genes.pl.in: Revised ends and starts for genes on minus strand.
13086
13087    * psl_genes.pl.in: Fixed 0-basis of starts.  Revised ends and starts for
13088      genes on minus strand.
13089
13090    * psl_genes.pl.in: Added headers for gene format
13091
13092    * configure.ac, Makefile.am: Revised set of files distributed
13093
130942011-08-04  twu
13095
13096    * Makefile.am, gtf_genes.pl.in, psl_genes.pl.in: Added psl_genes and
13097      gtf_genes programs
13098
13099    * gmap.c, iit-read.c, stage1.c, stage1.h, gregion.c: Removed unused
13100      parameters
13101
13102    * inbuffer.h: Removed parameter pc_linefeeds_p
13103
13104    * shortread.h: Removed braces
13105
13106    * sequence.c, shortread.c: Removed call to find_bad_char, since we are
13107      checking for '\r' directly before '\n'
13108
13109    * sequence.c, shortread.c: Added checks so we don't read p[-1] when the
13110      first character in the string is already '\n'
13111
13112    * sequence.c, shortread.c: Checking for carriage return before every line
13113      feed
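
      Note: a minimal sketch of the end-of-line handling described in the
      three sequence.c/shortread.c entries above: look for '\r' directly
      before '\n', and do not read p[-1] when '\n' is the first character.
      The helper name chomp_eoln is hypothetical and does not come from the
      source files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: truncate line at its first line feed, also dropping
   a carriage return that sits directly before it.  Returns the new length. */
static size_t
chomp_eoln (char *line) {
  char *p;

  assert(line != NULL);
  if ((p = strchr(line, '\n')) == NULL) {
    return strlen(line);                /* no line feed: nothing to strip */
  }
  /* Only inspect p[-1] when the line feed is not the first character,
     so we never read before the start of the buffer. */
  if (p > line && p[-1] == '\r') {
    p--;
  }
  *p = '\0';
  return (size_t) (p - line);
}
```

      This handles Unix ("\n") and PC ("\r\n") line endings in one pass,
      without a separate option to select the file format.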
13114
13115    * gsnap.c, inbuffer.c, shortread.h: Removed --pc-lines option
13116
13117    * Makefile.am, dbsnp_iit.pl.in, fa_coords.pl.in, gmap_compress.pl.in,
13118      gmap_process.pl.in, gmap_uncompress.pl.in, gtf_splicesites.pl.in,
13119      md_coords.pl.in, psl_introns.pl.in, psl_splices.pl.in,
13120      psl_splicesites.pl.in: Stripping CR-LF from input files
13121
13122    * gsnap.c, iit-read.c, iit-read.h, pair.c, pair.h, stage3hr.c, stage3hr.h,
13123      substring.c, substring.h: Added option to favor multi-exon genes
13124
13125    * gmap.c, gsnap.c: Checking for valid int and float arguments
13126
13127    * stage3hr.c: Fixed bug in resolve_multimapping procedures
13128
13129    * stage3hr.c: Fixed bug in not initializing antistranded_penalty
13130
13131    * gsnap.c, iit-read.c, iit-read.h, pair.c, pair.h, stage1hr.c, stage3hr.c,
13132      stage3hr.h, substring.c, substring.h: Added -g flag and genes_iit.  Added
13133      procedures for resolving multimapping using known genes and tally.
13134
131352011-08-03  twu
13136
13137    * gmap.c: Making call to Splicetrie_setup, so -s flag works for known splice
13138      sites
13139
13140    * Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, Makefile.am,
13141      coords1.test.in, iit.test.in, setup1.test.in: Fixed "make check" so it
13142      works for Cygwin on Windows, where copying of programs from src does not
13143      work
13144
13145    * dynprog.c, dynprog.h, stage3.c: Computing splice site probabilities when
13146      user genomic segment is provided
13147
13148    * stage3.c: In assign_gap_types, using known splicesites_iit to assign
13149      splice site probabilities of 1.
13150
13151    * pair.c, pairdef.h, pairpool.c, stage3.c: In addition to trimming
13152      noncanonical exons close to the end, trimming bad canonical exons close to
13153      the end.  Now computing splice site probabilities in assign_gap_types.
13154
13155    * dynprog.c: Checking for indel plus bad intron only when finalp is true,
13156      because earlier passes may need some time to iterate to reach a final
13157      solution.
13158
13159    * stage3.c: Checking for pairs being NULL before calling Pair_trim_ends
13160
13161    * gmap.c, pair.c, pair.h, stage1hr.c, stage3.c, stage3.h, stage3hr.c,
13162      stage3hr.h: Using matches post-trim for deciding if the alignment has
13163      sufficient quality, but using nmatches_pretrim for ranking and scoring
13164      purposes.
13165
131662011-08-02  twu
13167
13168    * stage3.c: Turned final pass 6 back on, which was inadvertently turned off
13169
13170    * dynprog.c, pair.c, pair.h, pairdef.h, pairpool.c: Protecting pairs at end
13171      against trimming by GMAP if they are found by splicetrie at known splice
13172      sites
13173
13174    * stage3.c: Not running Dynprog_single_gap if queryjump or genomejump of a
13175      dual break is equal to 1, since that just leads to two indels.
13176
13177    * pair.c: For trimming of ends, changed penalty for indel score from -6 to
13178      -4.
13179
13180    * dynprog.c: Corrected coordinates for splice site probabilities and
13181      dinucleotides. Disallowing indels near splice sites if either probability
13182      is less than 0.9.
13183
13184    * stage3.c: Keeping non-canonical intron if there is sufficient exon
13185      evidence at the end.  Iterating trimming of non-canonical introns at the
13186      end.
13187
13188    * pair.c: Printing "method:gmap" even if assertions are turned off
13189
13190    * dynprog.c: Checking probability of intron found by bridge_intron_gap and
13191      discarding the solution if it finds both an indel and a bad intron.
13192
13193    * chrom.c, diag.c, diag.h, dynprog.c, gmap.c, oligoindex_hr.c, outbuffer.c,
13194      pair.c, pair.h, smooth.c, stage1.c, stage1hr.c, stage2.c, stage3.c,
13195      stage3.h: Removed various unused parameters
13196
13197    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Revised version
13198      number
13199
13200    * stage1hr.c: Fixed debugging statements
13201
13202    * substring.c: Added a general test of goodness for a substring based on its
13203      numbers of matches and mismatches.
13204
13205    * stage3.c: Restored trimming of non-canonical end exons, but removed code
13206      that trimmed exons less than 20 bp when known splicing was available.
13207
13208    * iit-read.c, iit-read.h: Added function IIT_exists_with_divno_typed_signed
13209
13210    * stage1hr.c: Providing an allowance for GMAP score when want_high_quality_p
13211      is true.  Turning off use of score as primary criterion in choosing
13212      splices, since it does not take advantage of known splicing.
13213
13214    * stage3.c: Not using nmismatches in pick_cdna_direction.  If
13215      splicesites_iit is available, assigning known splicesites a probability of
13216      1.0.
13217
132182011-08-01  twu
13219
13220    * stage1hr.c, stage1hr.h: When antistranded_penalty has a value, using
13221      mismatches plus penalty to decide between sense and antisense, rather than
13222      using probabilities.
13223
13224    * stage3.c: Changed to_queryend_p to be true for distalmedial_ending, since
13225      it is comparing alternatives.
13226
13227    * stage3hr.c, stage3hr.h: Adding antistranded_penalty to score
13228
13229    * stage3.c: In pick_cdna_direction, using presence of indels in combination
13230      with bad intron
13231
13232    * stage1hr.c, stage1hr.h: Using antistranded penalty to decide whether to
13233      force GMAP to look for antisense result.  Requiring GMAP to be high
13234      quality in all cases. Basing GMAP quality on nmismatches plus gap opens.
13235      Choosing best result among all paired types, so concordant no longer
13236      predominates over others.
13237
13238    * samflags.h: Added comment to show some common flags
13239
13240    * gsnap.c: Added parameter for antistranded penalty
13241
13242    * chrom.c, chrom.h, gmapindex.c, gsnap_extents.c, gsnap_splices.c,
13243      gsnap_terms.c, iit-read.c, iit_store.c, iitdef.h, segmentpos.c,
13244      segmentpos.h, spliceclean.c: Introducing new chrom sort in addition to
13245      previous numeric_alpha sort
13246
13247    * chrom.c: Fixed bugs in parsing out initial "chr" from strings
13248
132492011-07-31  twu
13250
13251    * gsnap.c: Changed default for all gmap parameters from 2 to 3
13252
13253    * stage1hr.c: Added macros add_bounded and subtract_bounded to keep
13254      computations within chromosomal bounds
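
      Note: a minimal sketch of bounded arithmetic in the spirit of the
      add_bounded and subtract_bounded macros named above.  The real macros in
      stage1hr.c are not shown in this log, so the signatures and the use of
      functions instead of macros here are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Return x + plusterm, clamped so the result never exceeds highbound.
   Assumes the caller passes a position already within bounds. */
static inline uint32_t
add_bounded (uint32_t x, uint32_t plusterm, uint32_t highbound) {
  assert(x <= highbound);
  /* Compare against the remaining room so the addition cannot wrap. */
  if (plusterm > highbound - x) {
    return highbound;
  } else {
    return x + plusterm;
  }
}

/* Return x - minusterm, clamped so the result never drops below lowbound.
   Assumes the caller passes a position already within bounds. */
static inline uint32_t
subtract_bounded (uint32_t x, uint32_t minusterm, uint32_t lowbound) {
  assert(x >= lowbound);
  /* Compare against the remaining room so the subtraction cannot wrap. */
  if (minusterm > x - lowbound) {
    return lowbound;
  } else {
    return x - minusterm;
  }
}
```

      With unsigned genomic coordinates, a plain x - minusterm near the
      start of a chromosome would wrap around to a very large value, so the
      clamp is done before the arithmetic, not after.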
13255
13256    * gmap.c, stage2.c, stage2.h: Created a specialized procedure for
13257      score_querypos when splicing is true and no shifted canonicals are used
13258
13259    * oligoindex.c: Using only 8-mers for oligoindices_major in GSNAP for stage 2
13260
13261    * trunk, config.site.rescomp.tst, configure.ac, src, gsnap.c, stage1hr.c,
13262      stage1hr.h, stage3hr.c, stage3hr.h, util, gmap_build.pl.in,
13263      gtf_splicesites.pl.in, psl_splicesites.pl.in: Merged revisions 44034 to
13264      44047 from branches/2011-07-31-gmap-then-terminals
13265
13266    * stage1hr.c: For GMAP halfmapping, when overlap is found, take widest
13267      possible starting point between the overlap calculation and the normal
13268      calculation
13269
13270    * stage3.c: Added gappairs in debugging statements
13271
13272    * shortread.c, shortread.h, stage1hr.c: In computing GMAP halfmapping,
13273      checking for existence of primers and extending GMAP region if they exist
13274
13275    * stage3hr.c, stage3hr.h: Added function Stage3end_best_score_paired
13276
13277    * stage1hr.c: Running GMAP pairsearch only on ends with scores better than
13278      those already paired
13279
13280    * gmap.c, gsnap.c: Printing arguments before they are parsed
13281
13282    * stage2.c: When shifted_canonical_p is false, not computing rev scores,
13283      since they are the same as the fwd scores
13284
13285    * stage2.c: Improved debugging statements
13286
13287    * gmap.c, stage1hr.c, stage2.c, stage2.h, stage3.c: Added parameter
13288      use_shifted_canonical_p.  Using now only for cross-species alignment in
13289      GMAP.
13290
13291    * stage3hr.c, stage3hr.h: Added function Stage3pair_sort_bymatches.  For
13292      substitutions with 0 mismatches, classifying hit as EXACT rather than SUB,
13293      so duplicates are eliminated properly.
13294
13295    * stage1hr.c, stage1hr.h: Allowing multiple concordant results to undergo
13296      GMAP improvement, up to max_gmap_improvement.  Sorting results by matches
13297      before GMAP improvement.
13298
13299    * gsnap.c: Introduced separate parameter for max_gmap_improvement.  Hid
13300      pairexpect and terminal-penalty parameters.
13301
133022011-07-30  twu
13303
13304    * stage1hr.c: Fixed errors in find_terminals with analysis of mismatches and
13305      use of floors
13306
13307    * stage1hr.c: Requiring high quality GMAP unpaired method when we observe a
13308      paired toolong alignment
13309
13310    * gsnap.c: Made halfmapping,unpaired,improve the default GMAP method
13311
13312    * stage3hr.c, stage3hr.h: Scoring GMAP based on total number of matches.
13313      Implemented Stage3end_best_score.
13314
13315    * stage1hr.c: Using new test based on Stage3end_best_score to avoid
13316      terminals in anticipation of GMAP halfmapping.  Allowing poor quality GMAP
13317      in GMAP improvement.  Filtering results by optimal score before GMAP
13318      improvement using an infinite cutoff.
13319
13320    * stage3.c: In pick_cdna_direction, allowing bad-scoring canonical introns
13321      to determine sense
13322
13323    * Makefile.dna.am, Makefile.gsnaptoo.am, gmap.c, sense.h, stage1hr.c,
13324      stage3.c, stage3.h, stage3hr.h: Created new header file sense.h.
13325      Returning sensedir from pick_cdna_direction and Stage3_compute.
13326
13327    * gsnap.c, stage1hr.c, stage1hr.h: Introduced separate parameter for
13328      trigger-score-for terminals
13329
13330    * stage1hr.c: Turned off option to avoid terminals if halfmapping
13331      anticipated later
13332
13333    * stage3.c: Increased threshold for trying microexon from 0 acceptable
13334      mismatches to 2 in high quality sequences
13335
13336    * stage1hr.c: Applying Stage3pair_optimal_score after GMAP improvement step
13337
13338    * pair.c, pair.h, samprint.c, stage3hr.c, substring.c, substring.h:
13339      Implemented computation of MAPQ scores for GMAP alignments
13340
13341    * mapq.c, mapq.h: Moved some constants to mapq.h, so pair.c can access them
13342
13343    * gsnap.c, stage1hr.h: Changed name of GMAP method from concordant_uniq to
13344      improvement
13345
13346    * stage1hr.c: Conducting search for terminals even if gmap_halfmapping_p is
13347      true, if both single ends have no hits so far
13348
13349    * splicetrie.c, splicetrie.h: Checking internal exon region at splice site
13350      before doing a search for a short-end splice.
13351
13352    * oligoindex_hr.c: Removed assertions about sequencepos
13353
13354    * stage1hr.c: In using segments to bound GMAP region, looking at querypos5
13355      and querypos3 to help decide whether to extend mapping region by
13356      shortsplicedist.
13357
13358    * stage3.c: In pick_cdna_direction, no longer using indel_alignment_score
13359
13360    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Introducing
13361      separate parameters for GMAP methods.  Sorting singlehits5 and singlehits3
13362      before running GMAP halfmapping/unpaired.
13363
13364    * gmap.c, gsnap.c, oligoindex_hr.c, oligoindex_hr.h, stage1hr.c, stage1hr.h,
13365      stage2.c, stage2.h, stage3.c: Merged revisions 43984 to 43994 from
13366      branches/2011-07-30-oligoindex-mapping-region to introduce mappingstart
13367      and mappingend in addition to genomicstart and genomicend for finding
13368      oligoindex mappings in stage 2
13369
13370    * pair.c: In trimming, counting an indel only as a single mismatch
13371
13372    * stage3.c: Turning off the step to remove noncanonical introns, since it
13373      fails in some cases
13374
13375    * stage1hr.c, stage2.c: Not doing terminals if GMAP pairsearch or
13376      halfmapping is available
13377
13378    * stage2.c: Ignore log message for 43986.  Correct log message should be:
13379      Introduced NEAR_END_LENGTH to define ends of reads where we can ignore
13380      EXON_DEFN.
13381
13382    * gsnap.c: Made halfmapping,cuniq the default for gmap-mode.
13383
13384    * gmap.c: Reduced values of CHIMERA_SLOP
13385
133862011-07-29  twu
13387
13388    * stage3hr.c: For single-end reads, favoring non-ambiguous alignments over
13389      ambiguous ones when they have identical genomicstarts or genomicends.
13390
13391    * dynprog.c: In Dynprog_end5_known and Dynprog_end3_known, if extension not
13392      found to query end and if ambiguous splicing not found, then doing dynamic
13393      programming again to best end, rather than to query end.
13394
13395    * pair.c, pair.h, stage3.c: Trimming of ends of GMAP alignments
13396
13397    * stage1hr.c: Allowing very poor alignments to be reported by GMAP method,
13398      now that we have trimming at ends
13399
13400    * gsnap.c, stage1hr.c, stage1hr.h: Avoiding duplication of chr marker
13401      segments and chr marker segment at the beginning.  Passing chromosome_iit
13402      and nchromosomes into Stage1hr_setup.
13403
134042011-07-28  twu
13405
13406    * stage1hr.c: Using binary search on segments to bound region for GMAP
13407      alignments
13408
134092011-07-27  twu
13410
13411    * stage1hr.c: Created debugging version that identifies plus_segments and
13412      minus_segments within GMAP halfmapping region
13413
13414    * stage2.c: Re-using result of find_shifted_canonical when leftpos remains
13415      the same
13416
13417    * gsnap.c: Made 2 candidates the default for GMAP halfmapping
13418
13419    * genome_hr.c: Added comments in splicesite_positions that offset - 1 is
13420      verified
13421
13422    * stage1hr.c: Fixed issues with fast_level and cutoff_level for very short
13423      reads, where fast_level < 0
13424
13425    * stage1hr.c: Made plus_segments, plus_nsegments, minus_segments, and
13426      minus_nsegments fields within the Stage1_T object, so they can be used
13427      later in limiting range for GMAP alignments
13428
13429    * genome_hr.c, genome_hr.h, stage2.c: Implemented lookup as needed of the
13430      previous dinucleotide from the genomic blocks
13431
134322011-07-26  twu
13433
13434    * stage1hr.c: Added chr marker segments to indicate boundaries between
13435      chromosomes and simplify inner loops for localsplicing and indels
13436
13437    * Makefile.dna.am: Removed gbuffer.c and gbuffer.h from file lists
13438
13439    * stage3.c, stage3.h: Multiple revisions to pick_cdna_direction
13440
13441    * stage2.c: Requiring exon length > EXON_DEFN for canonical splicing only in
13442      middle of queryseq
13443
13444    * gsnap.c, stage1hr.c, stage1hr.h: Allowing control over individual gmap
13445      modes
13446
13447    * chimera.c, dynprog.c, dynprog.h, extents_genebounds.c, genome.c, genome.h,
13448      get-genome.c, gmap.c, gsnap_tally.c, iit_plot.c, match.c, pair.c,
13449      sequence.c, sequence.h, splicetrie.c, splicetrie.h, splicing-score.c,
13450      substring.c: Making complement in place.  Using new interface to
13451      Genome_get_segment.
13452
134532011-07-21  twu
13454
13455    * Makefile.am, gtf_splicesites.pl.in: Added gtf_splicesites program
13456
13457    * config.site.rescomp.tst, VERSION: Updated version number
13458
13459    * indexdb.c: Fixed error message for PMAP
13460
13461    * trunk, config.site.rescomp.tst, src, gsnap.c, indexdb.c, outbuffer.c,
13462      outbuffer.h, resulthr.c, resulthr.h, splicealt.c, splicetrie.c,
13463      splicetrie.h, splicetrie_build.c, splicetrie_build.h, stage1hr.c,
13464      stage1hr.h, stage3hr.c, stopwatch.c, substring.c, util: Merged revisions
13465      43003 to 43362 from branches/2011-07-15-fast-knownsplices in creating
13466      splicecomp and creating specialized procedures for find_spliceends for
13467      shortends and distant splicing
13468
13469    * stage2.c: Fixed bug in older canonical dinucleotides procedure that
13470      overwrote -1 values in initial positions
13471
13472    * stage1hr.c: Revised comments
13473
13474    * Makefile.dna.am: Added command for splicealt
13475
13476    * psl_splicesites.pl.in, psl_introns.pl.in: Fixed warning messages
13477
13478    * gmap_build.pl.in: Added initial definition for -B flag
13479
134802011-07-16  twu
13481
13482    * stage2.c: Fixed problem in get_last, needed by find_shifted_canonical,
13483      when first few positions lack a value of -1.
13484
134852011-07-13  twu
13486
13487    * VERSION: Revised version number
13488
13489    * genome-write.c: Added fix for genomes in PC-DOS file format
13490
13491    * trunk, VERSION, config.site.rescomp.prd, src, atoiindex.c, cmetindex.c,
13492      gmap.c, gmapindex.c, gsnap.c, indexdb.c, oligo.c, stage1.c, stage1hr.c,
13493      stage1hr.h, stage3.c, stage3hr.c, stage3hr.h, util, gmap_build.pl.in,
13494      gmap_setup.pl.in: Merged revisions 42540 through 42858 from
13495      branches/2011-07-08-index-14mers to handle 13-mers and 14-mers, and to
13496      fix various bugs
13497
134982011-07-08  twu
13499
13500    * samprint.c: Using new interface to Pair_print_sam
13501
13502    * outbuffer.c, pair.c, pair.h, stage3.c, stage3.h: Using usersegment
13503      accession in SAM and GFF3 output when -g flag is specified.
13504
13505    * goby.h, gsnap.c, outbuffer.c, shortread.c, shortread.h, stage3hr.c,
13506      stage3hr.h, substring.h: Merged external changes for Goby from 2011-07-01
13507      and 2011-07-08
13508
13509    * Makefile.gsnaptoo.am: Added files for oligoindex_hr.c and oligoindex_hr.h
13510
13511    * goby.c: Applied external patch from 2011-07-08
13512
13513    * trunk, VERSION, config.site.rescomp.tst, src, Makefile.dna.am,
13514      genome_hr.c, genome_hr.h, gmap.c, gsnap.c, oligoindex.c, oligoindex.h,
13515      oligoindex_hr.c, oligoindex_hr.h, stage1hr.c, stage1hr.h, stage2.c,
13516      stage2.h, stage3.c, stage3hr.c, stage3hr.h, util: Merged revisions 42282
13517      to 42511 from branches/2011-07-05-oligoindex-hr to speed up GMAP
13518      oligoindex and provide options for GMAP
13519
13520    * oligoindex.c: Stop initializing all diagonals.  Initialize only when
13521      needed.
13522
13523    * splice-sites-hr.pl: Added -S flag for semicanonical splice sites
13524
13525    * indexdb.c: Fixed Indexdb_new_genome for PMAP
13526
13527    * gmap.c: Changed default index1part for PMAP
13528
135292011-07-07  twu
13530
13531    * splice-sites-hr.pl: Initial import into SVN
13532
135332011-07-05  twu
13534
13535    * shortread.c: Allows for non-digit, then "1" or "2" for paired-end reads
13536
13537    * dynprog.c: Added protection against negative genomic coordinates in
13538      dynamic programming at 5' and 3' ends
13539
13540    * shortread.c: Modified error statement
13541
13542    * stage1hr.c: Adding querylength to genomicbound in cases where no overlap
13543      is found, in case an overlap exists but was missed.
13544
13545    * stage1hr.c: Requiring high quality when aligning concordant unique hits
13546      with GMAP
13547
135482011-07-04  twu
13549
13550    * stage3.c: Trimming end exons if less than 20 bp and known splicing is
13551      available
13552
13553    * stage1hr.c: Improved debugging statements for GMAP alignments
13554
13555    * splicetrie.c: Counting mismatches correctly, without penalty, when known
13556      splicing exceeds observed distances
13557
13558    * gsnap.c, stage3hr.c, stage3hr.h: Adding ambiguous matches to nmatches when
13559      favor_ambiguous_p is true, which happens if we have known splicing without
13560      observed distances
13561
13562    * dynprog.c, dynprog.h: Implemented find_best_endpoint_to_queryend
13563
13564    * stage3.c: Trimming semi-canonical exons at end also
13565
13566    * splicetrie.c: Fixed bug in failing to push coordinates for ambiguous cases
13567
13568    * pair.c: Fixed symbol for rev semi-canonical pair in debugging output
13569
13570    * iit_store.c: Improved error message when parsing coords
13571
13572    * iit_store.c: Printing entire problematic line when a parsing error occurs
13573
135742011-07-03  twu
13575
13576    * stage3.c: Revised pick_cdna_direction to count semicanonical splices, and
13577      to use nmatches - nmismatches
13578
135792011-07-02  twu
13580
13581    * stage3hr.c: Fixed bug in always eliminating second hit in
13582      Stage3end_remove_overlaps
13583
135842011-07-01  twu
13585
13586    * Makefile.am, setup.ref123positions.ok, setup.ref3positions.ok,
13587      setup1.test.in, setup2.test.in: Made changes for renaming of ref3positions
13588      to ref123positions
13589
13590    * gmap.c, gsnap.c, stage2.c, stage2.h: Added setup for stage 2 to handle
13591      GMAP alignments without splicing
13592
13593    * pair.c: Putting "0M" between adjacent deletion and insertion in CIGAR
13594      string
13595
13596    * gsnap.c, stage1hr.c, stage1hr.h: Providing --allow-gmap flag to control
13597      whether GMAP alignments are allowed
13598
13599    * stage3.c: Adding gap holders before nonconcordant exon trimming.  Checking
13600      if nonconcordant end exon is less than halfway from end.  In picking cDNA
13601      direction, checking only for presence or absence of nonconcordant or
13602      concordant introns.
13603
13604    * stage2.c: Put back diffdist_penalty
13605
13606    * stage1hr.c: Using better genomicbounds when running GMAP alignments.
13607      Introduced check for very bad GMAP alignments (nmatches < querylength/2).
13608      Comparing GMAP against original hit for aligning concordant uniques with
13609      GMAP.  Trying top hits with GMAP to find concordant pairs, rather than
13610      checking to see if the total number of hits is less than a threshold.
13611
13612    * pair.c, pair.h, shortread.c, stage3hr.c, stage3hr.h, substring.c,
13613      substring.h: Added procedures for computing better genomicbounds on GMAP
13614      alignments, based on overlap between the paired ends
13615
13616    * gsnap.c, splicetrie.c, splicetrie.h: Added option for amb_closest_p
13617      behavior, where shortest intron among ambiguous ones is picked
13618
13619    * genome-write.c: Added index1part as parameter to some procedures
13620
136212011-06-30  twu
13622
13623    * stage3hr.c: Allowing terminal ends to win on the basis of nmatches,
13624      instead of score
13625
13626    * stage3.c: Added pass 6a to remove noncanonical end exons
13627
13628    * stage2.c: Restored NINTRON_PENALTY_MISMATCH from 4 to 8
13629
13630    * stage1hr.c: Generalized minimum querylength for one-miss algorithm to
13631      handle 15-mers.  Order is now GMAP, terminal 1, terminal 2, and distant
13632      splicing, with terminals done if found_score > trigger_score_for_gmap.
13633
13634    * gmap.c, gsnap.c, iit-read.c, iit-read.h, outbuffer.c, outbuffer.h: Added
13635      flags for read group library and platform in SAM headers
13636
13637    * compress.c, compress.h, genome-write.c, genome-write.h, gmapindex.c:
13638      Allowing for 15-mer genomic indices when writing an uncompressed genome
13639      using a file
13640
13641    * spanningelt.c: Generalized from 12-mers to 15-mers
13642
136432011-06-29  twu
13644
13645    * gmap_build.pl.in: Fixed installation for 12-mer and 15-mer indices
13646
13647    * gmap.c, gsnap.c: Added -B 5 option to allocate offsets file
13648
13649    * trunk, src, Makefile.dna.am, access.h, atoiindex.c, block.c, block.h,
13650      cmetindex.c, gdiag.c, gmap.c, gmapindex.c, gsnap.c, indexdb.c, indexdb.h,
13651      indexdb_dump.c, indexdb_hr.c, indexdb_hr.h, indexdbdef.h, oligo-count.c,
13652      oligo.c, oligo.h, oligop.c, oligop.h, outbuffer.c, pmapindex.c,
13653      snpindex.c, spanningelt.c, splicetrie.c, stage1.c, stage1.h, stage1hr.c,
13654      stage1hr.h, stage2.c, stage3.c, stage3.h, stage3hr.c, stage3hr.h,
13655      substring.c, substring.h, util, gmap_build.pl.in, gmap_setup.pl.in: Merged
13656      revisions 41633 to 41936 from branches/2011-06-22-index-15mers to allow
13657      for 15-mers in genomic indices
13658
136592011-06-24  twu
13660
13661    * stage1hr.c: Computing query_compress if necessary before deciding whether
13662      to perform halfmapping GMAP on concordant unique
13663
13664    * pair.c: Fixed bug in printing start: instead of end: when endtype2 is END
13665
136662011-06-22  twu
13667
13668    * trunk, config.site.rescomp.prd, config.site.rescomp.tst, src, stage1hr.c,
13669      util: Merged revisions 41618 to 41631 from
13670      branches/2011-06-22-gmap-earlier to move GMAP algorithm before distant
13671      splices
13672
13673    * dynprog.c, dynprog.h, gmap.c, gsnap.c, iit-read.c, intron.c, intron.h,
13674      stage1hr.c, stage1hr.h, stage3.c, stage3.h, stage3hr.c, stage3hr.h: Merged
13675      revisions 41516 to 41609 from branches/2011-06-14-terminals to try GMAP up
13676      to 5 times before terminals if no concordant matches can be found.
13677
136782011-06-21  twu
13679
13680    * dynprog.c, dynprog.h, gmap.c, iit-read.c, iit-read.h, maxent_hr.c, pair.c,
13681      pair.h, splicetrie.c, splicetrie.h, stage1hr.c, stage2.c, stage3.c,
13682      stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h: Merged
13683      revisions 41410 to 41516 from branches/2011-06-14-terminals. Applying GMAP
13684      for halfmapping multiple and unpaired unique results. Allowing ambiguous
13685      splice ends for GMAP alignments.  Fixed computation of splice junctions.
13686      Allowing canonical exons to be found for short exons in GMAP alignments.
13687      Improved pick_cdna_direction.  Made fixes to GSNAP output for GMAP
13688      alignments.
13689
136902011-06-18  twu
13691
13692    * src, stage3hr.c: Merged change from branches/2011-06-14-terminals to check
13693      for subsumption in paired alignments
13694
136952011-06-17  twu
13696
13697    * samprint.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h:
13698      Merged revisions 41223 to 41410 from branches/2011-06-14-terminals to use
13699      lexicographic comparison in Stage3end_remove_duplicates; to prevent known
13700      splicing from extending past chromosomal bounds; to prevent end indels
13701      from going past right end of chromosome; to iterate through both
13702      mismatch_positions_cont and mismatch_positions_shift simultaneously in end
13703      indel procedures; and to compute nmatches over entire substring.
13704
137052011-06-14  twu
13706
13707    * genome.c: Allowing fill_buffer_simple procedures to fill past the left end
13708      of the genome
13709
13710    * splicetrie.c, splicetrie.h, stage1hr.c: Preventing short-overlap splicing
13711      from going past chromosomal boundaries
13712
13713    * VERSION: Updated version
13714
13715    * gmap.c: Increased default chimera margin from 20 to 40
13716
13717    * trunk, src, gmap.c, stage3.c, stage3.h, util: Merged revisions 41185 to
13718      41214 from branches/2011-06-13-gmap-merge.  Merging chimeric parts when
13719      possible.
13720
13721    * trunk, config.site.rescomp.tst, src, maxent.c, stage3hr.c, util: Merged
13722      revisions 40955 to 41210 from releases/internal-2011-06-09
13723
13724    * trunk, src, dynprog.c, stage1hr.c, stage3hr.c, stage3hr.h, util: Merged
13725      revisions 40682 to 41210 from releases/internal-2011-06-06
13726
13727    * stage1hr.c: Turning on NEW_TERMINALS branch
13728
137292011-06-13  twu
13730
13731    * shortread.c, shortread.h: Added function Shortread_find_overlap
13732
137332011-06-12  twu
13734
13735    * stage1hr.c: Added hooks for computing terminals at zero penalty if
13736      necessary
13737
137382011-06-10  twu
13739
13740    * stage3hr.c: Allowing use of Stage3end_substringD and Stage3end_substringA
13741      by half splices
13742
13743    * samprint.c, stage3hr.c, substring.c, substring.h: Merged changes from
13744      releases/internal-2011-06-09 to change internal representation of splice
13745      from substring1 for donor and substring2 for acceptor.  Now substring1 and
13746      substring2 are in query order.  Needed to avoid problems when a splice was
13747      labeled both as donor/acceptor and as acceptor/donor.
13748
13749    * gsnap.c: Checking for value of adapter stripping flag.  Turning on adapter
13750      stripping by default.
13751
137522011-06-09  twu
13753
13754    * gmap.c, gsnap.c: Joining worker threads instead of detaching them, so
13755      Inbuffer_free can be called safely
13756
13757    * gmap.c, stage3.c, stage3.h: Merging chimeric parts into a single
13758      continuous alignment if possible.
13759
13760    * VERSION: Updated version
13761
13762    * gmap.c: Allowing chimera switchpoint to occur one base pair earlier
13763
13764    * chimera.c: Improved debugging statements
13765
13766    * genome.c: Made Genome_fill_buffer refer to its local genome argument, not
13767      the global one.  Needed to fix a bug in snpindex.
13768
13769    * Makefile.dna.am, Makefile.gsnaptoo.am, compress.h, dynprog.h, outbuffer.c,
13770      pair.c, pair.h, pairpool.h, samprint.c, splicetrie.h, splicetrie_build.h,
13771      stage3.c, stage3.h: Printing XT flag for translocation information in SAM
13772      output of both GMAP and GSNAP
13773
13774    * chimera.c, chimera.h, gmap.c: Changed algorithm for finding chimera
13775      boundary in GMAP to maintain best number of mismatches, and then to find
13776      highest splice site probabilities within that range
13777
137782011-06-08  twu
13779
13780    * gsnap.c: Fixed mode flag to take a required argument
13781
13782    * gmap.c, outbuffer.c, outbuffer.h: Added .transloc split output file for
13783      GMAP
13784
13785    * gmap.c: Hiding -s flag from help output
13786
13787    * gmap.c, stage3.c, stage3.h: Revising pairarray genomepos coordinates of
13788      GMAP chimeras to be chromosomal coordinates, so SAM output is correct
13789
13790    * pair.c: Implemented hard clipping in SAM output for GMAP chimeras
13791
137922011-06-06  twu
13793
13794    * samprint.c: Made the sign of insertlength for a translocation depend on
13795      the concordant substring
13796
13797    * VERSION: Updated version number
13798
13799    * stage3hr.c: Fixed computation of pair_insert_length when there is no
13800      overlap
13801
13802    * stage1hr.c: Added debugging statements
13803
13804    * stage1hr.c: Corrected privatep flags and memory freeing for halfmapping
13805      unique cases solved by GMAP.
13806
13807    * outbuffer.c: Corrected argument list for GMAP when MEMUSAGE is turned on
13808
13809    * stage3hr.c, stage3hr.h: Added function Stage3end_effective_chrnum
13810
13811    * samprint.c: For mates that are translocations, using the effective chrnum
13812      in printing the mate location
13813
13814    * resulthr.c: Fixed bug in assigning UNPAIRED_TRANSLOC category
13815
138162011-06-05  twu
13817
13818    * stage1hr.c: Added checks for new pair from Stage3pair_new being NULL
13819
13820    * stage3.c: Placed a limit on iterations of building ends using known
13821      splicing
13822
13823    * trunk, src, util: Merged property changes on subdirectories from
13824      branches/2011-06-04-gmap-genomicseg
13825
13826    * VERSION, config.site.rescomp.prd, config.site.rescomp.tst: Updated version
13827      number
13828
13829    * dynprog.c, gmap.c, pair.c, stage1hr.c, stage3.c, stage3hr.c, stage3hr.h:
13830      Merged changes from revision 40649 to 40663 from
13831      branches/2011-06-04-gmap-genomicseg to provide correct bounds on GMAP
13832      alignment in GSNAP and to improve various issues in alignment, including
13833      close indels
13834
138352011-06-03  twu
13836
13837    * stage3hr.c: Changed paired_seenp back to paired_usedp in
13838      Stage3end_remove_duplicates, because not all pairs are seen in previous
13839      pair_up procedures, resulting in a fatal bug.
13840
13841    * config.site.rescomp.prd, config.site.rescomp.tst, VERSION: Updated version
13842      number
13843
13844    * gmap.c, splicetrie.c, splicetrie.h, stage1hr.c, stage3.c, stage3.h: Making
13845      use of jump_late_p
13846
13847    * dynprog.c, dynprog.h: Added provision for jump_late_p.  Fixed issue in
13848      jump_penalty where it was not consistent with jump_penalty_init.  Now both
13849      procedures compute extend*length.
13850
138512011-06-02  twu
13852
13853    * pair.c: Fixed printing of dashes in GSNAP standard output
13854
13855    * dynprog.c, dynprog.h, gmap.c, stage1hr.c, stage3.c, stage3.h: Allowing
13856      close combinations of insertions and deletions, by allowing onesidegapp to
13857      be false and letting extraband_single equal 3 instead of 0.  Controlled by
13858      --allow-close-indels flag in GMAP.  Default set to be on in GMAP and in
13859      GSNAP.
13860
13861    * stage1hr.c: Not performing Stage3end_remove_duplicates on exact matches,
13862      which should not be necessary.
13863
13864    * stage3hr.c: In Stage3end_remove_duplicates, checking against paired_seenp,
13865      instead of paired_usedp, for speed.
13866
13867    * stage3hr.c: Reverted to revision 40489 of Stage3_pair_up_concordant,
13868      which has the pairing procedures inline.
13869
13870    * stage3hr.c: Attempted to keep different lists for old and new hits, but
13871      this seems to slow down the program.
13872
138732011-06-01  twu
13874
13875    * stage3hr.c: Moved parts of Stage3_pair_up_concordant into separate
13876      procedures
13877
13878    * stage1hr.c, stage3hr.c, stage3hr.h: Performing GMAP on concordant unique
13879      results where one end is of type TERMINAL
13880
13881    * gsnap.c: Changed default indel penalty from 1 to 2
13882
13883    * stage3hr.h: Formatting change
13884
13885    * stage3hr.c, substring.c, substring.h: Using nchimera_novel in
13886      Stage3end_remove_overlaps
13887
13888    * stage3hr.c: In Stage3pair_remove_overlap, favoring longer insert lengths,
13889      if all else is equal
13890
13891    * pair.c: Moved MD string for SAM output before NH tag to be consistent with
13892      other GSNAP SAM output
13893
13894    * stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Making a
13895      distinction between Stage3end_remove_duplicates and
13896      Stage3end_remove_overlaps
13897
13898    * stage3hr.c: Reverted to old method of finding pair insert length, where
13899      all substrings are checked.
13900
13901    * stage1hr.c: In pairing algorithm, moved short-overlap splicing and distant
13902      splicing into a single singlesplicing class, so duplicates are handled
13903      properly.
13904
13905    * gsnap.c: Added documentation for --use-tally flag
13906
139072011-05-30  twu
13908
13909    * inbuffer.c, inbuffer.h: Changed nspaces and nread to unsigned int
13910
13911    * gmap.c, gsnap.c, outbuffer.c, outbuffer.h: Made output buffer size a
13912      user-definable parameter
13913
13914    * gmap.c, gsnap.c, outbuffer.c, outbuffer.h: Made more changes to output
13915      thread.  Made noutput a local variable. Clearing backlog in ordered output
13916      when necessary.
13917
13918    * stage1hr.c: Added a dinucleotide check for repetitive sequences
13919
13920    * gmap.c, gsnap.c, mem.c, mem.h, outbuffer.c, request.c, result.c,
13921      resulthr.c, sequence.c, shortread.c, stage1hr.c, stage1hr.h, stage3.c,
13922      stage3hr.c, substring.c: Replaced LEAKCHECK system with MEMUSAGE system
13923
13924    * list.c, list.h: Added specialized procedures for using specific memory
13925      pools for memusage
13926
13927    * inbuffer.c: Added comment
13928
13929    * diagpool.c, diagpool.h, pairpool.c, pairpool.h: Added procedures for
13930      reporting memory usage.  Using memory from keep portion.
13931
13932    * outbuffer.c: Cleaned up pthread code for output thread.  Added MAXQUEUE to
13933      clear out outbuffer.
13934
139352011-05-28  twu
13936
13937    * config.site.rescomp.tst: Added -Wextra to CFLAGS
13938
13939    * gsnap.c, stage3hr.c: Eliminating hitpair duplicates based on hittypes of
13940      ends.  Allowing MAPQ score to go as high as 96.
13941
139422011-05-27  twu
13943
13944    * gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h: Implemented mapq-unique-score
13945
13946    * internal-2011-02-27, AUTHORS, COPYING, INSTALL, MAINTAINER, Makefile.am,
13947      NEWS, README, VERSION, acinclude.m4, bootstrap.dna, bootstrap.gmaponly,
13948      bootstrap.gsnaptoo, bootstrap.pmaptoo, bootstrap.three, config,
13949      acx_mmap_fixed.m4, acx_mmap_variable.m4, acx_pthread.m4, builtin.m4,
13950      config.guess, config.sub, expand.m4, fopen.m4, ltmain.sh,
13951      madvise-flags.m4, mmap-flags.m4, pagesize.m4, perl.m4, struct-stat64.m4,
13952      config.site, config.site.rescomp.prd, config.site.rescomp.tst,
13953      configure.ac, dev, maint, memory-check.pl, share, archive.html,
13954      index.html, src, Makefile.dna.am, Makefile.gmaponly.am,
13955      Makefile.gsnaptoo.am, Makefile.pmaptoo.am, Makefile.three.am,
13956      Makefile.util.am, access.c, access.h, add_rpk.c, assert.c, assert.h,
13957      atoi.c, atoi.h, atoiindex.c, backtranslation.c, backtranslation.h,
13958      bam_tally.c, bamread.c, bamread.h, bigendian.c, bigendian.h, block.c,
13959      block.h, bool.h, boyer-moore.c, boyer-moore.h, cappaths.c, changepoint.c,
13960      changepoint.h, chimera.c, chimera.h, chop_primers.c, chrnum.c, chrnum.h,
13961      chrom.c, chrom.h, chrsegment.c, chrsegment.h, chrsubset.c, chrsubset.h,
13962      cmet.c, cmet.h, cmetindex.c, color.c, color.h, comp.h, complement.h,
13963      compress.c, compress.h, convert.t.c, cum.c, datadir.c, datadir.h, datum.c,
13964      datum.h, diag.c, diag.h, diagdef.h, diagnostic.c, diagnostic.h,
13965      diagpool.c, diagpool.h, dibase.c, dibase.h, dibaseindex.c, doublelist.c,
13966      doublelist.h, dynprog.c, dynprog.h, except.c, except.h, exonscan.c,
13967      extents_genebounds.c, fopen.h, gbuffer.c, gbuffer.h, gdiag.c,
13968      geneadjust.c, genecompare.c, geneeval.c, genome-write.c, genome-write.h,
13969      genome.c, genome.h, genome_hr.c, genome_hr.h, genomepage.c, genomepage.h,
13970      genomeplot.c, genomicpos.c, genomicpos.h, genuncompress.c, get-genome.c,
13971      getopt.c, getopt.h, getopt1.c, gmap.c, gmapindex.c, goby.c, goby.h,
13972      gregion.c, gregion.h, gsnap.c, gsnap_best.c, gsnap_concordant.c,
13973      gsnap_extents.c, gsnap_fasta.c, gsnap_filter.c, gsnap_iit.c,
13974      gsnap_multiclean.c, gsnap_splices.c, gsnap_tally.c, gsnap_terms.c,
13975      gsnapread.c, gsnapread.h, hint.c, hint.h, iit-read.c, iit-read.h,
13976      iit-write.c, iit-write.h, iit_dump.c, iit_fetch.c, iit_get.c,
13977      iit_pileup.c, iit_plot.c, iit_store.c, iit_update.c, iitdef.h, inbuffer.c,
13978      inbuffer.h, indexdb.c, indexdb.h, indexdb_dibase.c, indexdb_dibase.h,
13979      indexdb_dump.c, indexdb_hr.c, indexdb_hr.h, indexdbdef.h, interval.c,
13980      interval.h, intlist.c, intlist.h, intlistdef.h, intpool.c, intpool.h,
13981      intron.c, intron.h, lgamma.c, lgamma.h, list.c, list.h, listdef.h,
13982      littleendian.c, littleendian.h, mapq.c, mapq.h, match.c, match.h,
13983      matchdef.h, matchpool.c, matchpool.h, maxent.c, maxent.h, maxent_hr.c,
13984      maxent_hr.h, md5-compute.c, md5.c, md5.h, mem.c, mem.h, memchk.c, mode.h,
13985      nmath.c, nmath.h, nr-x.c, nr-x.h, oligo-count.c, oligo.c, oligo.h,
13986      oligoindex.c, oligoindex.h, oligop.c, oligop.h, orderstat.c, orderstat.h,
13987      outbuffer.c, outbuffer.h, pair.c, pair.h, pairdef.h, pairingcum.c,
13988      pairingflats.c, pairinggene.c, pairingstrand.c, pairingtrain.c,
13989      pairpool.c, pairpool.h, parserange.c, parserange.h, pbinom.c, pbinom.h,
13990      pdl_smooth.c, pdldata.c, pdldata.h, pdlimage.c, plotdata.c, plotdata.h,
13991      plotgenes.c, plotgenes.h, pmapindex.c, random.c, random.h, rbtree.c,
13992      rbtree.h, rbtree.t.c, reader.c, reader.h, reads.c, reads.h, reads_dump.c,
13993      reads_store.c, request.c, request.h, result.c, result.h, resulthr.c,
13994      resulthr.h, revcomp.c, samflags.h, samprint.c, samprint.h, samread.c,
13995      samread.h, scores.h, segmentpos.c, segmentpos.h, segue.c, separator.h,
13996      seqlength.c, sequence.c, sequence.h, shortread.c, shortread.h, smooth.c,
13997      smooth.h, snpindex.c, spanningelt.c, spanningelt.h, spliceclean.c,
13998      spliceeval.c, splicefill.c, splicegene.c, splicegraph.c, splicescan.c,
13999      splicetrie.c, splicetrie.h, splicetrie_build.c, splicetrie_build.h,
14000      spliceturn.c, splicing-scan.c, splicing-score.c, stage1.c, stage1.h,
14001      stage1hr.c, stage1hr.h, stage2.c, stage2.h, stage3.c, stage3.h,
14002      stage3hr.c, stage3hr.h, stopwatch.c, stopwatch.h, subseq.c, substring.c,
14003      substring.h, table.c, table.h, tableint.c, tableint.h, tableuint.c,
14004      tableuint.h, tally.c, tally.h, tally_exclude.c, tally_expr.c, tallyadd.c,
14005      tallyflats.c, tallygene.c, tallyhmm.c, tallystrand.c, translation.c,
14006      translation.h, trial.c, trial.h, types.h, uintlist.c, uintlist.h,
14007      uinttable.c, uinttable.h, svncl.pl, tests, align.test.in, align.test.ok,
14008      coords1.test.in, coords1.test.ok, defs, fa.iittest, iit.test.in,
14009      iit_get.out.ok, iittest.iit.ok, map.test.ok, setup.genomecomp.ok,
14010      setup.idxpositions.ok, setup.ref3positions.ok, setup1.test.in,
14011      setup2.test.in, ss.chr17test, ss.her2, util, dbsnp_iit.pl.in,
14012      ddsgap2_compress.pl, fa_coords.pl.in, gmap_build.pl.in,
14013      gmap_compress.pl.in, gmap_process.pl.in, gmap_reassemble.pl.in,
14014      gmap_setup.pl.in, gmap_uncompress.pl.in, gmap_update.pl.in,
14015      gsnap-fetch-reads.pl, gsnap-fetch-reads.pl.in, gsnap-remap.pl,
14016      gsnap-remap.pl.in, md_coords.pl.in, psl_introns.pl.in, psl_splices.pl.in,
14017      psl_splicesites.pl.in, sam_merge.pl.in, sam_restore.pl.in,
14018      sim4_compress.pl, sim4_uncompress.pl, spidey_compress.pl, whats_on, trunk:
14019      Restored gmap trunk subdirectory
14020
14021    * VERSION: Updated version
14022
14023    * substring.c: Computing MAPQ on entire substring, not on trimmed portion
14024
14025    * dynprog.c, dynprog.h, gmap.c, stage1hr.c, stage3.c, stage3.h: For genomic
14026      GMAP alignments in GSNAP, not assigning any canonical reward, not
14027      computing pairs_rev, and not scoring introns.
14028
14029    * splicetrie.c, splicetrie.h, stage1hr.c: Removed unused parameter
14030      splicetypes
14031
14032    * gsnap.c, stage1hr.c, stage1hr.h: Removed unused parameters, including
14033      queryptr and queryrc
14034
14035    * src, Makefile.dna.am, Makefile.gsnaptoo.am, dynprog.c, dynprog.h,
14036      genome.c, genome.h, genome_hr.c, genome_hr.h, gmap.c, gsnap.c, mapq.c,
14037      mapq.h, maxent_hr.c, maxent_hr.h, oligoindex.c, outbuffer.c, outbuffer.h,
14038      pair.c, pair.h, samprint.c, samprint.h, splicetrie.c, splicetrie.h,
14039      splicetrie_build.c, splicetrie_build.h, stage1hr.c, stage1hr.h, stage3.c,
14040      stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h: Merged
14041      revisions 40182:40234 from branches/2011-05-27-no-block-vars to reduce
14042      number of parameters
14043
140442011-05-26  twu
14045
14046    * VERSION: Updated version number
14047
14048    * dynprog.c, dynprog.h, splicetrie.c, splicetrie.h, stage1hr.c, stage3.c,
14049      stage3.h: Restored complete searching of known splicesites for dynamic
14050      programming of ends
14051
14052    * stage1hr.c, dynprog.c, dynprog.h, splicetrie.c, splicetrie.h, stage3.c,
14053      stage3.h: Created hybrid procedure for performing dynamic programming at
14054      5' and 3' ends with known splicing
14055
14056    * dynprog.c, dynprog.h, splicetrie.c, splicetrie.h, stage3.c, stage3.h:
14057      Wrote faster procedure for performing dynamic programming at 5' and 3'
14058      ends with known splicing, but does not handle distal indels.
14059
14060    * inbuffer.c, gsnap.c, request.c, request.h, shortread.c, shortread.h:
14061      Performing chopping of adapters only after paired-end alignment fails to
14062      give concordant or paired result.
14063
14064    * stage1hr.c: Removed query as a parameter.  Changed knownsplice limits.
14065
14066    * uintlist.c, uintlist.h: Added procedure Uintlist_to_string
14067
14068    * mapq.c, mapq.h, stage3hr.c, stage3hr.h, substring.c, substring.h: Removed
14069      query as parameter to procedures
14070
14071    * genome_hr.c, genome_hr.h: Removed query as parameter to some procedures
14072
140732011-05-25  twu
14074
14075    * stage3hr.c: Moved assertions about private5p and private3p to correct place
14076
14077    * gsnap.c, inbuffer.c, request.c, request.h, shortread.c, shortread.h: When
14078      potential paired-end adapter is found, checking alignment first without
14079      chopping adapters, and if no concordant or paired alignments are found,
14080      re-aligning with adapters chopped.
14081
14082    * stage3hr.c: Removing only duplicates that have not been used yet in a pair
14083
14084    * stage1hr.c: Doing Stage3pair_privatize before Stage3pair_eval
14085
14086    * pair.c: Added NH tag for GMAP alignments in GSNAP
14087
14088    * get-genome.c: Enabling re-use of contig_iit
14089
14090    * shortread.c: Fixed bug in printing pairedend fasta.  Was printing both
14091      revcomp and forward sequence for queryseq2.
14092
140932011-05-24  twu
14094
14095    * stage1hr.c, stage3hr.c, stage3hr.h: Reduced amount of memory copying in
14096      making Stage3pair_T objects
14097
14098    * get-genome.c, parserange.c, parserange.h: Made operation of get-genome
14099      from stdin more efficient by opening chromosome_iit and contig_iit only
14100      once
14101
14102    * VERSION: Updated version
14103
14104    * gsnap.c: Added message to indicate when alignment is starting
14105
14106    * stage1hr.c: Doing pairing only when individual alignments are performed
14107
14108    * outbuffer.c: Fixed debugging statement
14109
14110    * gmap.c: Added information to --help output on the -f flag about other
14111      output types
14112
14113    * gsnap.c: Changed default value of genome_unk_mismatch_p to be 1
14114
141152011-05-23  twu
14116
14117    * samprint.c: Fixed sign of insert size when read and mate have identical chrpos
14118
14119    * VERSION: Updated version
14120
14121    * samprint.c: Added NH flag to indicate number of paths
14122
14123    * gsnap_concordant.c: Defined concordance to allow for overlapping reads
14124
14125    * stage1hr.c: Introduced DEBUG4K for known doublesplicing
14126
14127    * dynprog.c, dynprog.h, gmap.c, intlist.c, intlist.h, splicetrie.c,
14128      splicetrie.h, stage3.c, stage3.h: Removing duplicate results from
14129      splicetrie when SNPs are allowed
14130
14131    * samprint.c: Fixed bug in printing cigar for two-thirds shortexon on minus
14132      strand
14133
14134    * gmap.c: Removed include of mode.h
14135
14136    * Makefile.dna.am, Makefile.gsnaptoo.am, atoi.c, atoi.h, atoiindex.c,
14137      genome_hr.c, genome_hr.h, gmap.c, gsnap.c, indexdb.c, indexdb.h, mode.h,
14138      stage1hr.c, stage1hr.h, substring.c, substring.h: Using Mode_T instead of
14139      cmetp.  Incorporated atoi mode.
14140
14141    * oligo.c, oligo.h: Removed Oligo_setup
14142
14143    * convert.t.c: Initial import into SVN
14144
14145    * gsnap_fasta.c: Added code stub for handling BAM input
14146
14147    * gsnap.c: Added a thread-specific key for storing the request, and
14148      accessing it with the signal handler, which no longer throws an exception.
14149
14150    * stage3.c: Checking for case in distalmedial comparison where medial
14151      location extends past given genomicseg.
14152
14153    * stage1hr.c: Replaced indirect function calls with direct calls to
14154      read_oligos_cmet and read_oligos_standard.
14155
14156    * oligo.c: Replaced indirect function calls with direct calls to oligo_read
14157      and oligo_revise.  Not handling dibasep anymore.
14158
14159    * genome_hr.c, genome_hr.h: Replaced indirect function calls with static
14160      inline procedures
14161
141622011-05-22  twu
14163
14164    * stage1.c: Removed dibase parameter in calling Reader_new
14165
14166    * stage3hr.c: Removed assertion check for plusp equality in
14167      pair_insert_length.  For splice translocations, redefining plusp based on
14168      substring_for_concordance.
14169
14170    * gsnap.c: Fixed output of query when exception occurs
14171
14172    * except.c: Fixed handling of exceptions by removing unnecessary call to
14173      Except_advance_stack.
14174
14175    * gsnap.c: Removed dibasep and cmetp as parameters.
14176
14177    * stage1hr.c, stage1hr.h: Removed dibasep as a parameter.  Also eliminated
14178      cmetp as a parameter.
14179
14180    * substring.c, substring.h, stage3hr.c, stage3hr.h, splicetrie.c,
14181      splicetrie.h, reader.c, reader.h, mapq.c, mapq.h, genome_hr.c,
14182      genome_hr.h: Removed dibasep as a parameter
14183
14184    * oligo.c, oligo.h: Removed dibasep as a parameter.  Added setup procedure
14185      to assign procedure for dibase operation.
14186
141872011-05-21  twu
14188
14189    * genome_hr.c, genome_hr.h, gsnap.c, mapq.c, mapq.h, splicetrie.c,
14190      splicetrie.h, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
14191      substring.h: Setting block_diff procedure in genome_hr.c during setup, and
14192      removing many uses of the cmetp variable.
14193
14194    * dynprog.c: Added include of splicetrie.h
14195
141962011-05-20  twu
14197
14198    * cmet.c, cmet.h, genome_hr.c: Moved mark_a, mark_c, mark_g, and mark_t data
14199      and procedures from cmet.c to genome_hr.c
14200
14201    * Makefile.gsnaptoo.am: Made files for GMAP and GSNAP match those in
14202      Makefile.dna.am
14203
14204    * Makefile.dna.am: Minor rearrangement of filenames
14205
14206    * samprint.c: Fixed printing of SAM output for translocations
14207
14208    * stage3hr.c: Using both hit5 and hit3 end points in hitpair_equal_cmp.  Put
14209      tally ahead of score in ranking subsumed hitpairs.
14210
14211    * splicetrie.c: Constraining max_mismatches_allowed to be less than
14212      one-third of the end length
14213
14214    * substring.c: Added header file for pair.h
14215
14216    * stage3hr.c: Not using absdifflength bingo at all
14217
14218    * gsnap.c: Removed references to pairlength deviation
14219
14220    * stage3.c: In peel_leftward and peel_rightward, when running into a second
14221      gap, transferring endgappairs first before transferring peeled pairs.
14222
14223    * dynprog.c: Using List_push_existing instead of Pairpool_push_existing to
14224      save on memory
14225
14226    * stage3hr.c: Using only best splice within constraints and not pairlength
14227      in resolving ambiguous inside splices
14228
14229    * stage1hr.c: Fixed definition of collect_all_p for shortexons
14230
14231    * splicetrie.c, splicetrie.h: Removed old code.  Implemented collect_all_p
14232      in Splicetrie_search_left and Splicetrie_search_right procedures.
14233
142342011-05-19  twu
14235
14236    * stage1hr.c, stage3hr.c, stage3hr.h: Requiring concordant pairs to have a
14237      non-zero insert length
14238
14239    * stage1hr.c, stage3hr.c, stage3hr.h: Recording number of ambiguous matches
14240      after known splicing, and subtracting from nmatches when ambiguous inner
14241      splicing yields no candidates.
14242
142432011-05-18  twu
14244
14245    * gmap.c, stage1hr.c, stage2.c, stage2.h, stage3.c: Allowing stage 2 to
14246      favor either left or right part of genomicseg
14247
14248    * gsnap.c: Using the same value for middle and end indel penalties.  Changed
14249      flags to allow only one indel penalty to be specified.
14250
14251    * stage3hr.c: In Stage3pair_remove_duplicates, allowing for ties within
14252      cluster. Using hittype to rank hitpairs in hitpair_equal_cmp, but not for
14253      distinguishing hitpairs in hitpair_equal_no_hittype_cmp.
14254
14255    * stage3hr.c: Fixed bug in using hitpair_equal_cmp
14256
14257    * stage3hr.c: Using same procedure, hitpair_equal_cmp (previously called
14258      hitpair_position_cmp), both for sorting and for recognizing equal hitpairs.
14259
14260    * stage3hr.c: In Stage3pair_remove_duplicates, added hittype in sorting and
14261      removing exact duplicates.  Going through clusters separately from left
14262      and right and checking for subsumption against initial alignment, not the
14263      previous one.
14264
142652011-05-17  twu
14266
14267    * stage3hr.c: In Stage3pair_remove_duplicates, using tally within clusters.
14268      Removing absdifflength bingo from Stage3pair_optimal_score.
14269
14270    * README, config.site, configure.ac: Setting default value of MAX_READLENGTH
14271      to be 200
14272
14273    * gsnap.c: Providing value of MAX_READLENGTH in printing --version output
14274
14275    * dibase.c, inbuffer.c, mapq.c, shortread.c, stage3hr.c, substring.c:
14276      Removed unnecessary includes of stage1hr.h, needed previously to obtain
14277      MAX_READLENGTH
14278
14279    * Makefile.dna.am: Providing MAX_READLENGTH to gsnap.  Provided files for
14280      bam_fasta.
14281
14282    * README, config.site, configure.ac, Makefile.gsnaptoo.am, dibase.c,
14283      gsnap.c, inbuffer.c, mapq.c, reads_store.c, samprint.c, shortread.c,
14284      stage1hr.c, stage1hr.h, stage3hr.c, substring.c: Changed MAX_QUERYLENGTH
14285      to MAX_READLENGTH and allowing value to be defined as an argument to
14286      configure
14287
14288    * samprint.c, shortread.c, shortread.h: Printing chopped primers in SAM
14289      output
14290
14291    * stage3hr.c: Moved code for resolving inside ambiguous splices to separate
14292      procedures.  Allowing mate of a GMAP alignment to resolve its inside
14293      ambiguous splice.
14294
14295    * dynprog.c, dynprog.h, gmap.c, splicetrie.c, splicetrie.h, stage1hr.c,
14296      stage3.c, stage3.h: Limiting region of known splice extension for GMAP
14297      alignments in GSNAP involving paired-end reads.  Region now cannot extend
14298      past mate.
14299
143002011-05-16  twu
14301
14302    * stage3hr.c: Allowing overlapping of paired-end reads when resolving
14303      ambiguous splices on insides
14304
14305    * dynprog.c, dynprog.h, gmap.c, stage1hr.c, stage3.c, stage3.h: Counting
14306      ambiguous end matches in stage 3 alignment
14307
14308    * substring.c: Hid debugging statement
14309
14310    * gsnap.c: Using user-provided dir for tally IIT when available
14311
14312    * substring.c, substring.h, gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h:
14313      Implemented a multiclean procedure using a tally IIT file
14314
143152011-05-15  twu
14316
14317    * stage3hr.c: Not using hitpair type in making comparisons across multiple
14318      alignments.  Using subsumption instead of overlap in
14319      Stage3pair_remove_duplicates.
14320
143212011-05-14  twu
14322
14323    * stage3hr.c: Changed removal of Stage3pair_T duplicates within overlapping
14324      clusters from an O(n^2) algorithm to an O(n) algorithm
14325
143262011-05-13  twu
14327
14328    * stage1hr.c: Added debugging statement
14329
14330    * stage3hr.c: Assigning plusp for translocations based on overall
14331      genomestart and genomeend.  Setting substring_low and substring_high for
14332      translocations to be the part that is concordant.
14333
14334    * samprint.c: Fixed bug in printing of translocations
14335
14336    * gsnap.c, genome_hr.c, genome_hr.h: Added options --query-unk-mismatch and
14337      --genome-unk-mismatch, and made both default to false, meaning that query
14338      N and genome N no longer count as mismatches
14339
143402011-05-12  twu
14341
14342    * samprint.c, stage3hr.c, stage3hr.h: Handling case where clipping of
14343      overlap removes entire alignment
14344
14345    * pair.c: Fixed bug in number of dashes in GSNAP output on deletions
14346
14347    * substring.c: Fixed GSNAP and SAM output for bisulfite alignments
14348
14349    * stage1hr.c: Eliminating cases with bad GMAP alignments, either with
14350      non-canonical splices or with too many mismatches
14351
14352    * stage3.c: On extend_ending5 and extend_ending3, returning dynamic
14353      programming results, even if finalscore is negative
14354
143552011-05-11  twu
14356
14357    * snpindex.c: Checking for presence of IIT file at destination, and
14358      providing a better reminder message
14359
14360    * snpindex.c: Added reminder message at end to install IIT file
14361
14362    * gsnap.c: Added warnings under --use-cmet flag if cmet index files are not
14363      present
14364
14365    * substring.c: Removed references to MAX_END_DELETIONS
14366
14367    * stage1hr.c, substring.c: Allocating gbuffer when it exceeds the amount
14368      allocated statically
14369
14370    * stage3hr.c, substring.c, substring.h: Added preference in removing
14371      duplicates for known splice sites over novel ones
14372
14373    * gsnap.c: Fixed documentation for --clip-overlap flag
14374
14375    * outbuffer.c, pair.c, pair.h, samprint.c, samprint.h, stage3hr.c,
14376      stage3hr.h, substring.c, substring.h: Performing search for correct
14377      hardclipping boundary.  Computing chrpos and mate_chrpos in advance of
14378      printing SAM output.  For Pair_binary_search, performing forward and
14379      backward search of middlei to avoid gaps.  Fixed computation of overlap in
14380      some cases involving GMAP alignments.
14381
143822011-05-10  twu
14383
14384    * samprint.c: Made logic of print_cigar follow that of print_md_string
14385
14386    * outbuffer.c, samprint.c, shortread.c, stage3hr.c: Fixes made to MD string
14387      with hard clipping of overlaps
14388
143892011-05-09  twu
14390
14391    * pair.c: Fixed infinite loops in binary search procedures
14392
14393    * resulthr.c: Removed unused variable
14394
14395    * pair.c, pair.h, samprint.c: Performing hard clipping by computing a
14396      subsequence on pairarray
14397
14398    * stage3hr.c: Allowing for NULL arguments in Stage3end_substring_low and
14399      Stage3end_chrnum, now possible in samprint.c procedures
14400
14401    * stage1hr.c: Rewrote calculation for genomicseg
14402
14403    * stage3hr.c: Fixed calculation of insert length involving GMAP alignment
14404
14405    * samprint.c, pair.c: Fixed printing of SAM chromosomal pos
14406
14407    * pair.c, pair.h, samprint.c: Further implementation of hard clipping for
14408      overlapping paired-end reads
14409
144102011-05-07  twu
14411
14412    * pair.c: Fixed bug caused by wrong order of parameters
14413
14414    * gsnap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, samprint.c, samprint.h,
14415      stage3.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Added code for
14416      hardclipping overlaps between paired ends.  Improved computation of insert
14417      lengths involving GMAP alignments.  Increased default shortsplicedist to
14418      200000.
14419
14420    * gmap.c: Changed default shortsplicedist to 200000
14421
14422    * get-genome.c: Added -E option for printing exons for gene maps
14423
14424    * genome.c: Made changes to perror statements
14425
144262011-05-05  twu
14427
14428    * samprint.c: Made changes to MD string to handle hard clipping
14429
14430    * stage1hr.c: Reduced genomicseg for GMAP from pairmax + shortsplicedist to
14431      just pairmax
14432
14433    * dynprog.c: Added a new debugging category for known splicing at ends
14434
14435    * outbuffer.c, samprint.c, samprint.h, stage3hr.c, stage3hr.h: Computing
14436      cigar strings to allow for hard clipping
14437
14438    * pair.c, pair.h, stage3.c: Fixed SAM flags when printing GMAP alignment in
14439      GSNAP
14440
14441    * bam_tally.c: Made bam_tally work on entire genome
14442
144432011-05-04  twu
14444
14445    * README, chrom.c, chrom.h, gmapindex.c, fa_coords.pl.in, gmap_build.pl.in,
14446      gmap_process.pl.in, gmap_setup.pl.in: Made changes for fa_coords to sort
14447      chromosomes in .coords file, for gmap_process to provide universal
14448      coordinate information to gmapindex, and for gmapindex to sort based on
14449      this order.  Ignoring leading "chr" in sorting chromosomes.
14450
14451    * chrom.c: Ignoring leading "chr" in chromosome name for sorting purposes
14452
144532011-05-03  twu
14454
14455    * stage1hr.c: Fixed bug in calling Substring_chrhigh
14456
144572011-05-02  twu
14458
14459    * gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h:
14460      Storing both chroffset and chrhigh in Stage3end_T and Substring_T objects.
14461      Checking chromosomal boundaries when performing GMAP algorithm in GSNAP.
14462
144632011-05-01  twu
14464
14465    * gmap_build.pl.in: Added Id property
14466
14467    * gmap_build.pl.in: Added a -w flag for sleeping between steps
14468
144692011-04-27  twu
14470
14471    * splicetrie.c: Removed unused variables
14472
14473    * goby.c, gsnap.c, outbuffer.c, pair.c, pair.h, resulthr.c, resulthr.h,
14474      samprint.c, samprint.h, stage1hr.c, stage1hr.h, stage3.c, stage3.h,
14475      stage3hr.c, stage3hr.h: Changes in handling of translocations: (1) Created
14476      new "_transloc" output files.  (2) Removing XT flag from SAM output.  (3)
14477      Enforcing reported translocations to be unique.  (4) Creating a new
14478      category for translocations in pair_up procedure.
14479
14480    * gmap.c: Fixed bug in calling Genome_blocks on a user-provided segment
14481
14482    * diag.c: Added debugging information about minactive and maxactive at ends
14483      of query
14484
144852011-04-26  twu
14486
14487    * stage2.c: Not checking for pct_coverage or ncovered when querylength < 150
14488
14489    * tally_expr.c: Removed dependence on pre-computed total in tally IIT file
14490
14491    * README: Showing examples from both hg18 and hg19 in retrieving known
14492      splicesite tracks from UCSC
14493
14494    * gsnap.c, stage1hr.c, stage1hr.h: Removed min_localsplicing_end_matches
14495      parameter.  For short overlaps, now checking only that endlength >=
14496      support.
14497
14498    * pair.c: Not printing first read or second read bit in GMAP samse output
14499
145002011-04-25  twu
14501
14502    * shortread.c: Fixed bug in printing queryseq in SAM output when
14503      hardclipping is present
14504
14505    * samprint.c: Formatting changes
14506
14507    * pair.c: Removed warning message when splicesite not found in GMAP
14508
14509    * gsnap.c: Changed documentation for -N
14510
145112011-04-22  twu
14512
14513    * parserange.c: Added warning message if divstring not found in IIT file
14514
14515    * get-genome.c, iit_get.c: Removed extra linefeed introduced in old IIT
14516      versions
14517
14518    * stage1hr.c: Added check for NULL pairarray
14519
145202011-04-21  twu
14521
14522    * Makefile.dna.am, dynprog.c, dynprog.h, gmap.c, indexdb_hr.c, sequence.c,
14523      stage3.c: Made changes so PMAP would compile
14524
14525    * trunk, src, Makefile.dna.am, dynprog.c, dynprog.h, genome.c, genome.h,
14526      gmap.c, goby.c, goby.h, gregion.c, gregion.h, gsnap.c, iit-read.c,
14527      maxent.c, maxent_hr.c, outbuffer.c, pair.c, pair.h, pairdef.h, resulthr.c,
14528      samprint.c, samprint.h, sequence.c, sequence.h, shortread.c, shortread.h,
14529      splicetrie.c, splicetrie.h, stage1.c, stage1.h, stage1hr.c, stage1hr.h,
14530      stage3.c, stage3.h, stage3hr.c, stage3hr.h, substring.c, substring.h,
14531      translation.c, translation.h, util: Merged revisions 38122 to 38539 from
14532      branch 2011-04-14-halfmapping-gmap
14533
145342011-04-15  twu
14535
14536    * bam_tally.c: Changed colon to tab when printing chromosomal coordinates
14537
145382011-04-14  twu
14539
14540    * trunk, src, genome_hr.c, gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c,
14541      stage3hr.h, util: Merged revisions 37902 to 38171 from
14542      branches/2011-04-10-end-indels
14543
14544    * bam_tally.c: Fixed bug in iterating through list in position_printable_p
14545
14546    * bam_tally.c: Added -n and -X flags to control depth and variant strands
14547      required. Added -B flag to control block format output.
14548
14549    * bam_tally.c: Changed -A flag into separate -C and -Q flags to print
14550      details about cycles and quality scores
14551
14552    * stage3hr.c: Allowing optimal_score procedures to consider terminal scores
14553      if all alignments are terminal
14554
14555    * trunk, VERSION, src, dynprog.c, dynprog.h, gmap.c, intron.c, intron.h,
14556      maxent.c, maxent.h, stage3.c, stage3.h, util: Merged 38078:38120 from
14557      branch 2011-04-13-gmap-knownsplicing
14558
145592011-04-13  twu
14560
14561    * snpindex.c: Added variables for bigendian machines
14562
145632011-04-10  twu
14564
14565    * stage1hr.c: Using function Genome_fill_buffer_blocks for debugging
14566
14567    * config.site.rescomp.prd: Revised version number
14568
14569    * genome.c, genome.h: Added function Genome_fill_buffer_blocks
14570
14571    * substring.c: Made fixes to printing of SNPs in GSNAP output
14572
14573    * README, configure.ac, Makefile.am, psl_introns.pl.in: Added psl_introns
14574      program.  Explaining in README file about known site-level and known
14575      intron-level splicing.
14576
14577    * gmap.c, outbuffer.c, pair.c, pair.h, stage3.c, stage3.h: Added option for
14578      -f introns output
14579
145802011-04-09  twu
14581
14582    * gsnap.c, iit-read.c, iit-read.h, interval.c, interval.h, splicetrie.c,
14583      splicetrie.h: Added code to allow known splicing based on introns, rather
14584      than splice sites
14585
145862011-03-29  twu
14587
14588    * bam_tally.c: Fixed bug in checking too early for chrpos_high > alloc_high
14589
14590    * VERSION: Updated version number
14591
145922011-03-28  twu
14593
14594    * cmetindex.c, snpindex.c: Made programs work on bigendian machines
14595
14596    * stage3hr.c, stage3hr.h, substring.c, substring.h: Moved definition of
14597      Hittype_T from substring.h to stage3hr.h
14598
14599    * trunk, VERSION, config.site.rescomp.tst, src, goby.c, goby.h, gsnap.c,
14600      inbuffer.c, inbuffer.h, outbuffer.c, samprint.c, samprint.h, stage3hr.c,
14601      substring.c, util: Merged changes 36746 through 37246 from branch
14602      2011-03-17-goby-paired-end
14603
146042011-03-26  twu
14605
14606    * resulthr.c: Fixed bug in assignment of translocationp for single-end reads
14607
146082011-03-25  twu
14609
14610    * gmap.c, gsnap.c: Added clarification about memory mapping in message
14611
14612    * bamread.c: Fixed a memory leak caused by unnecessary copying of read
14613      string from BAM.
14614
14615    * Makefile.dna.am, bamread.c, bamread.h, gsnap_splices.c: Added bam_splices
14616      program
14617
14618    * bam_tally.c: Fixed bug with non-initialized count_plus or count_minus
14619
14620    * bam_tally.c: Fixed bug in extracting genomic reference nt from Genome_T.
14621      Added option for signed counts.
14622
14623    * configure.ac, Makefile.dna.am: Added Automake conditional to control when
14624      bam_tally can be made
14625
14626    * bamread.c: Added compiler directives to protect code when samtools is not
14627      available
14628
14629    * bam_tally.c: Fixed usage statement
14630
14631    * bam_tally.c: Added --pairmax option
14632
14633    * bam_tally.c: Fixed help message and added some printing options
14634
14635    * bamread.c: Stopped printing of chromosomes to stdout
14636
14637    * Makefile.dna.am: Added compiler commands for bam_tally
14638
14639    * bam_tally.c: Added clipping at ends of requested genomic region
14640
14641    * samread.c, samread.h: Added function Samread_print_cigar
14642
14643    * bam_tally.c: Implementation of working version
14644
146452011-03-24  twu
14646
14647    * bam_tally.c: Implementation of overall alloc and block structure
14648
14649    * bam_tally.c, bamread.c, bamread.h: Implemented memory freeing procedures
14650
14651    * bam_tally.c: Initial import into SVN
14652
146532011-03-18  twu
14654
14655    * gsnap.c: Rearranged some lines
14656
14657    * gmap.c: Added --quality-protocol flag and 'j' flag to set quality print
14658      shift
14659
14660    * gmap.c, outbuffer.c, outbuffer.h: Added --quiet-if-excessive option to GMAP
14661
14662    * gsnap.c, outbuffer.c: Adding globals for invert_first_p and
14663      invert_second_p to stage3hr.c
14664
14665    * stage1hr.c, stage3hr.c, stage3hr.h: Determining effective_chrnum,
14666      genomicstart, and genomicend for splice translocations based on inner
14667      substrings.  No longer generating copies for each substring when chrnum ==
14668      0.
14669
14670    * outbuffer.c: Changed a 0 to a false
14671
14672    * README: Added explanation of dbsnp_iit program and output reporting for
14673      translocations
14674
14675    * configure.ac, Makefile.am, dbsnp_iit.pl.in: Added dbsnp_iit program
14676
14677    * stage3.h: Fixed declaration
14678
14679    * resulthr.c: Put debugging statements into debug macro
14680
14681    * outbuffer.c, pair.c, pair.h, resulthr.c, resulthr.h, samprint.c,
14682      samprint.h, stage3.c, stage3hr.c, stage3hr.h: Removed separate fp_transloc
14683      file.  Adding (transloc) string to GSNAP output, and XT flag to SAM output
14684      for translocation results.
14685
14686    * substring.c: Added error message for endtype_string
14687
14688    * outbuffer.c, resulthr.c, resulthr.h, samprint.c, stage3hr.c: Removed
14689      SINGLEEND_TRANSLOCATION and PAIREDEND_TRANSLOCATION types
14690
146912011-03-11  twu
14692
14693    * README: Added more information about the --gunzip option, about the
14694      command-line usage for paired-end reads, and about extended FASTA inputs.
14695
14696    * README: Added information about -s flag for psl_splicesites
14697
14698    * VERSION: Changed version number
14699
14700    * snpindex.c: Added option to limit number of warning messages
14701
147022011-03-10  twu
14703
14704    * snpindex.c: Fixed warning messages that previously reported the wrong SNPs
14705      and coordinates that were problematic
14706
14707    * gsnap.c, mem.c, outbuffer.c, shortread.c: Enabled LEAKCHECK in GSNAP to
14708      check for memory leaks
14709
14710    * gmap_build.pl.in: Fixed bug in creating .maps subdirectory
14711
147122011-03-09  twu
14713
14714    * VERSION: Updated version
14715
14716    * README: Added information about latest syntax for running snpindex
14717
147182011-03-08  twu
14719
14720    * stage1hr.c, stage1hr.h: Passing new parameter nsplicepartners_skip
14721
14722    * stage3hr.c, stage3hr.h: Eliminating splice translocations if a
14723      non-translocation exists
14724
14725    * outbuffer.c, outbuffer.h, resulthr.c, resulthr.h, samprint.c, samprint.h:
14726      Providing a translocation result type and printing split output to a new
14727      file
14728
14729    * indexdbdef.h: Removing unused macro definition
14730
14731    * gsnap.c, splicetrie.c, splicetrie.h: Providing a minimum intron length in
14732      building the splicetrie
14733
14734    * snpindex.c: Providing user options to specify sourcedir and destdir
14735
14736    * cmetindex.c: Fixed amount of space allocated for filename
14737
147382011-03-06  twu
14739
14740    * gmap.c, gsnap.c: Changed warning message about memory mapping
14741
14742    * stage1hr.c: Removed a conversion for bigendian machines in
14743      Batch_init_simple.
14744
14745    * spanningelt.c: Added a necessary conversion for bigendian machines
14746
14747    * indexdb.c, indexdb_hr.c: Making FILEIO access to positions look the same
14748      as MMAPPED access for bigendian machines
14749
14750    * bigendian.c, littleendian.c: Using unsigned char instead of char
14751
147522011-03-04  twu
14753
14754    * inbuffer.c: Initializing value of pc_linefeeds_p
14755
14756    * gmap.c, gsnap.c: Changed advice message about -B 3 and -B 4
14757
14758    * iit_store.c: Using 2^32-1 as a constant instead of 2^32
14759
14760    * stage1hr.c: Added additional places where Bigendian_convert_uint should be
14761      applied
14762
14763    * bigendian.c: Removed monitoring message
14764
147652011-03-03  twu
14766
14767    * stage3hr.c: Sorting paired-end reads by insert length
14768
14769    * gsnap.c: Removed --dibase flag from --help output
14770
14771    * inbuffer.c, shortread.c, shortread.h: Added field for pc_linefeeds_p.
14772      Changed variable name from pc_line_feeds_p.
14773
14774    * gsnap.c, inbuffer.c, inbuffer.h, shortread.c, shortread.h: Added option to
14775      strip PC line feeds from input
14776
147772011-03-02  twu
14778
14779    * sequence.c: Enabled GMAP to read FASTQ files
14780
14781    * gsnap_fasta.c: Made the default behavior to print all sequences
14782
14783    * pair.c: Fixed problem with double tabs in SAM output.  Now printing NM tag
14784      in SAM output.
14785
14786    * config.site.rescomp.prd, config.site.rescomp.tst, VERSION: Changed version
14787      number
14788
14789    * sam_merge.pl.in: Printing nomapping lines from original GSNAP output
14790
14791    * configure.ac, Makefile.am, sam_restore.pl.in: Added program sam_restore
14792
14793    * spliceturn.c: Added option to print splices, rather than splicesites
14794
14795    * samread.c, samread.h: Added functions to print modified SAM reads
14796
14797    * gsnap_splices.c: Trusting SAM splice directions by default.  Added an
14798      explicit variable to indicate if the splice is canonical.
14799
14800    * gsnap_multiclean.c: Added a flag -C to pick either concordant or
14801      non-concordant behavior. Allowing nonmapped queries to be printed, if no
14802      other alignment is available.
14803
14804    * gsnap_filter.c: Modified to print nonmapped SAM entries.  Removed
14805      inconsistencies in printing some lines to fp_one and some to fp_many.
14806
14807    * gsnap_fasta.c: Prints the query from the SAM line with the most matches
14808      (i.e., not hardclipped).  Added an option --oneway to print both ends of a
14809      paired-end read in the same direction.
14810
14811    * gsnap_concordant.c: Modified to print nonmapped SAM entries, and to add
14812      mate information for concordant pairs.  Code follows that in
14813      gsnap_multiclean.c
14814
148152011-03-01  twu
14816
14817    * splicetrie.c: Increased value of MAX_DUPLICATES from 100 to 1000
14818
14819    * gsnap.c: Checking for case where splicesites_iit has no sites
14820      corresponding to given genome
14821
14822    * pair.c: Fixed MD string to print genomic nt, rather than query nt
14823
148242011-02-28  twu
14825
14826    * pair.c: Implemented MD string in SAM output for GMAP
14827
148282011-02-26  twu
14829
14830    * spliceturn.c: Added --new flag to print only new splices
14831
14832    * spliceclean.c: Added break after case 0 in getopt
14833
14834    * substring.c, substring.h: Fixed insert length calculation to be based on
14835      genomicstart and genomicend.  Removed querylength_adj field.
14836
14837    * stage3hr.c: Preventing terminal alignments from setting minscore in
14838      Stage3_optimal_score and Stage3pair_optimal_score
14839
14840    * stage1hr.c: Fixed and simplified calculation of floor_left and floor_right
14841      for diagonals
14842
148432011-02-25  twu
14844
14845    * configure.ac, Makefile.am, psl_splices.pl.in: Added psl_splices program
14846
14847    * sam_merge.pl.in: Including inserted query segment in computing number of
14848      matches.
14849
14850    * spliceturn.c: Eliminating only new splices (numeric labels).  Searching
14851      for nearest surrounding known splices.
14852
14853    * spliceclean.c: Printing original label for each splice
14854
14855    * gsnap_splices.c: Added flag to require canonical splices.  Checking for
14856      canonical dinucleotides regardless of sam XS string.
14857
14858    * samprint.c: Fixed bug in MD string for shortexon.  Added debugging
14859      statements.
14860
14861    * stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Added
14862      querylength_adj to compensate for indels when computing insert length
14863
14864    * pair.c: Fixed bug in tokens where Ilength was being printed instead of
14865      Dlength
14866
14867    * spliceturn.c: Printing intron lengths.  Added labels to output when
14868      splices are turned.
14869
14870    * shortread.c: Requiring first file to have "/1" and second file to have
14871      "/2" if slashes are present
14872
14873    * samread.c: Allowing for XS:A:? to indicate unknown splice direction
14874
14875    * pair.c: Changed minimum intron length for non-concordant splice in cigar
14876      string from 100 to 20
14877
14878    * gsnap_tally.c: Changed default handling of quality scores to be Sanger
14879      protocol
14880
14881    * gsnap_splices.c: Using available XS flag in SAM output.  Printing
14882      non-directional splices as both forward and reverse.  Printing
14883      non-canonical dinucleotide pairs.
14884
14885    * gsnap_multiclean.c: Skipping, rather than aborting on, concordant pairs
14886      with different numbers of hits, due to translocations
14887
14888    * gmap.c: Changed flag for printing noncanonical splices in cigar string
14889
14890    * samprint.c, stage3hr.c, substring.c, substring.h: Fixed MD string to
14891      exclude part that is soft-clipped
14892
14893    * spliceclean.c, tally.c, tally.h: Added default-count option.  Fixed bug
14894      where splice occurred beyond extents.  Fixed some memory leaks.
14895
148962011-02-24  twu
14897
14898    * psl_splicesites.pl.in: Added missing parentheses
14899
14900    * configure.ac, Makefile.am, sam_merge.pl.in: Added sam_merge program
14901
14902    * README: Added description of gmap_build
14903
14904    * gsnap.c, mapq.c: Made sanger the default for quality protocol
14905
149062011-02-23  twu
14907
14908    * shortread.c: Fixed bug in chopping paired-end reads of different lengths
14909
14910    * sequence.c: Fixed bug in skipping initial '<', '>', or '+' in quality
14911      string
14912
14913    * gsnap_concordant.c: Requiring chr strings in the two ends to be equal
14914
14915    * pair.c: Printing XS:A:? when splice direction is not known, because it is
14916      non-canonical
14917
14918    * sequence.c, sequence.h: Added compiler directives to hide quality string
14919      from PMAP
14920
14921    * gsnap_fasta.c: Printing extended FASTA with quality strings for GMAP output
14922
14923    * gmap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, sequence.c, sequence.h,
14924      stage3.c, stage3.h: Implemented ability to read extended FASTA with
14925      quality strings and print them in SAM format
14926
14927    * mapq.c: Added recommendation to use --quality-protocol=sanger
14928
14929    * gsnap_concordant.c, samread.c, samread.h: Printing altered mapq scores
14930
14931    * gmap.c, outbuffer.c, outbuffer.h, pair.c, pair.h, stage3.c, stage3.h:
14932      Added flag to print non-concordant splices as N in SAM cigar string,
14933      rather than as D
14934
14935    * gsnap_concordant.c, samread.c, samread.h: Added calculation and printing
14936      of insert length
14937
14938    * gsnap.c: Added information about quality protocols to --help output
14939
149402011-02-21  twu
14941
14942    * shortread.c: Fixed bugs in handling gzipped paired-end files
14943
14944    * exonscan.c, extents_genebounds.c, gdiag.c, geneadjust.c, get-genome.c,
14945      gsnap_extents.c, gsnap_splices.c, gsnap_tally.c, gsnap_terms.c,
14946      iit_plot.c, pairinggene.c, segue.c, snpindex.c, splicefill.c,
14947      splicegene.c, splicegraph.c, splicescan.c, spliceturn.c, splicing-scan.c,
14948      splicing-score.c, tally_expr.c, tallygene.c: Changed parameter for
14949      Genome_new from batchp to access mode
14950
14951    * psl_splicesites.pl.in: Added -R flag to report non-canonical splice sites
14952
14953    * gmap_build.pl.in: Fixed name of maps subdirectory
14954
14955    * sequence.c, shortread.c: Fixed further bugs in closing NULL file pointer
14956
14957    * psl_splicesites.pl.in: Fixed bugs in syntax.  Added -s flag to specify
14958      start column.
14959
14960    * util, fa_coords.pl.in, gmap_build.pl.in, gmap_process.pl.in: Merged -r
14961      35283:35500 from branches/gmapindex-multifile/util
14962
14963    * Makefile.am: Added psl_splicesites to the list of files to be cleaned
14964
14965    * README, configure.ac, Makefile.am, psl_splicesites.pl.in: Added
14966      psl_splicesites program to process UCSC alignment tracks into a splicesite
14967      file.
14968
14969    * sequence.c, shortread.c: Fixed bug from trying to close a NULL file pointer
14970
14971    * trunk, README, VERSION, config.site.rescomp.tst, maint, memory-check.pl,
14972      src, gsnap.c, interval.c, mapq.c, mapq.h, samprint.c, splicetrie.c,
14973      splicetrie.h, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c,
14974      substring.h: Merged revisions 35346:35468 from branches/tentative-splices
14975      to store and resolve ambiguous splice ends
14976
14977    * outbuffer.c: Added warning messages when an output file cannot be written
14978
149792011-02-16  twu
14980
14981    * mapq.c: Changed allowable range of quality scores to go from 0 to 96
14982
14983    * gsnap.c: Changed default min_shortend from 1 to 2
14984
14985    * stage1hr.c: Removed unused code for old maxent procedures
14986
14987    * splicetrie.c, splicetrie.h, stage1hr.c: Providing splicesites to
14988      Splicetrie_find_short procedures, useful for debugging
14989
14990    * stage3hr.c: Removed debugging statement
14991
14992    * stage1hr.c: Restored ability to find short-overlap splicing in 1-2 bp at
14993      ends of read
14994
14995    * stage3hr.c: For insert length of 0, setting absdifflength to be infinite
14996
14997    * sequence.c, shortread.c: Closing files when multiple ones are provided on
14998      the command line
14999
15000    * pair.c: Implemented printing of intron distances in splice sites (-f 6)
15001      output
15002
15003    * gsnap.c, interval.c, interval.h, splicetrie.c, splicetrie.h, stage1hr.c,
15004      stage1hr.h: Implemented usage of known splice distances for single splices
15005
15006    * get-genome.c: Printing output correctly for version 5 IIT files with
15007      information in rest of header
15008
150092011-02-15  twu
15010
15011    * stage1hr.c: Removed unnecessary checks for nsplicesites > 0
15012
15013    * stage1hr.c: Computing and storing known splicesites_i for each segment once
15014
15015    * shortread.c: Removed abort left in for debugging
15016
15017    * shortread.c: Fixed bug in selecting region of short read to look for
15018      adapter stripping
15019
15020    * stage3hr.c: Fixed indentation
15021
15022    * gsnap_concordant.c: Fixed procedure to report all concordant pairs with
15023      correct distance and orientation
15024
15025    * gsnap.c, inbuffer.c, shortread.c, shortread.h: Simplified procedure for
15026      adapter stripping using a linear algorithm, instead of dynamic programming.
15027
15028    * Makefile.dna.am, Makefile.gsnaptoo.am: Revised instructions for cmetindex
15029
15030    * cmetindex.c: Enabled -F and -D flags to specify source and destination
15031      directories.  Added --version and --help flags.
15032
15033    * gmap_build.pl.in: Added -B flag to optionally specify bindir
15034
15035    * gmap_setup.pl.in: Modified usage statement to include GSNAP
15036
15037    * configure.ac, Makefile.am, gmap_build.pl.in: Added gmap_build program
15038
150392011-02-14  twu
15040
15041    * VERSION: Updated version
15042
15043    * README: Added statement about usage of -m flag
15044
15045    * gsnap_concordant.c: Fixed typo leading to wrong output file
15046
15047    * outbuffer.c: Fixed typo
15048
15049    * stage3.c, stage3.h: Allowing flags for GMAP to indicate SAM output is
15050      paired-end
15051
15052    * shortread.c: Allowing extended FASTA format to include a quality string
15053
15054    * sequence.c: Removed debugging statement
15055
15056    * samflags.h: Clarified comment
15057
15058    * pair.c, pair.h: Allowing flags for GMAP to indicate SAM output is
15059      paired-end. Printing XS flag for strand direction.
15060
15061    * outbuffer.c, outbuffer.h: Added variables for indicating SAM output is
15062      paired-end
15063
15064    * gsnap_terms.c: Using procedure in samread.c for computing chrpos_high.
15065
15066    * gsnap_multiclean.c, samread.c, samread.h: Added file variable for printing
15067      altered flags.  Added procedure for computing chrpos_high.
15068
15069    * gsnap_filter.c: Added program for sam_filter
15070
15071    * gsnap_fasta.c: Handling GMAP and GSNAP output correctly in a single
15072      procedure
15073
15074    * gsnap_extents.c: Enabling GMAP indexdb for sam_extents
15075
15076    * gsnap_concordant.c: Recomputing all SAM flags
15077
15078    * gmap.c: Changed format names to samse and sampe
15079
15080    * Makefile.gsnaptoo.am: Include maxent_hr.c and .h for GMAP.  Removed
15081      maxent.c and .h from GSNAP.
15082
15083    * Makefile.dna.am: Include maxent_hr.c and .h for PMAP.  Included programs
15084      sam_fasta and sam_concordant.
15085
150862011-02-13  twu
15087
15088    * gsnap_concordant.c: Added file for concordant_mult.  Printing concordant
15089      results in adjacent pairs of SAM lines.
15090
15091    * gsnap_concordant.c: Initial import into SVN
15092
150932011-02-12  twu
15094
15095    * gsnap_fasta.c: Printing extended FASTA output for GMAP, using '>' and '<'
15096
15097    * gsnap_fasta.c: Provided separate output types for GMAP and for GSNAP
15098
15099    * pair.c, sequence.c, sequence.h: Enabled reading of extended FASTA using
15100      '>' and '<' to indicate first and second reads, and putting information in
15101      flag of SAM output.
15102
151032011-02-11  twu
15104
15105    * stage3hr.c: Implemented a different method for using bingo pairlengths, by
15106      calculating a minscore for those pairlengths with the bingo characteristic
15107
15108    * stage3hr.c: Using pairlength_deviation to eliminate pairs, even if
15109      non-overlapping
15110
15111    * gsnap_extents.c, gsnap_iit.c, gsnap_splices.c, gsnap_tally.c,
15112      gsnap_terms.c: Changed defaults of concordant and unique to be false in
15113      all programs
15114
15115    * gsnap_fasta.c: Added printing of quality strings.  Implemented -A flag to
15116      print all sequences.
15117
15118    * stage1hr.c: Rearranged order of singlesplice_minus procedure
15119
15120    * gsnap_terms.c: Fixed check of concordance and uniqueness for last sequence
15121
15122    * shortread.c: Moved return statement to correct place
15123
15124    * gmap.c, outbuffer.c: Added compiler directives for case when pthreads not
15125      available
15126
15127    * datadir.c: Changed warning message about needing to recompile GMAP package
15128
15129    * outbuffer.c, samprint.c, samprint.h, stage3hr.c, stage3hr.h: Split
15130      fp_paired_uniq into separate files for inversions, scrambles, and long
15131      inserts
15132
151332011-02-10  twu
15134
15135    * gsnap.c, substring.c, substring.h: Removed notion of termdonor and
15136      termacceptor typeints
15137
15138    * samprint.c: Printing insert length for paired alignments.  Changed method
15139      for determining sign.
15140
15141    * substring.c, substring.h: Added functions for finding overlaps and insert
15142      lengths between two substrings
15143
15144    * stage3hr.c: Changed criterion for scramble to be the absence of any
15145      overlap in the wrong relative positions.  Revamped computation of insert
15146      length to look for overlapping substrings.
15147
15148    * inbuffer.c: Removed second check of nleft == 0 when initially it is not
15149
15150    * substring.c: Added other fields in Substring_T that were not being copied.
15151      Removed splicesites_offset from Substring_T object.
15152
15153    * gsnap.c, outbuffer.c, outbuffer.h, stage1hr.c, stage3hr.c, stage3hr.h,
15154      substring.c, substring.h: Added compiler directives for using new
15155      maxent_hr procedures.  Fixed problem in substring.c where chimera_knownp_2
15156      was not being copied.
15157
15158    * maxent_hr.c: Using jump tables based on shift
15159
151602011-02-09  twu
15161
15162    * maxent_hr.c: Removed duplicate calls in reading genome_blocks
15163
15164    * stage3hr.h: Fixed definition of sense consistency for inversion pairs
15165
15166    * stage3hr.c, stage3hr.h: Changed splicing sense test to work for inversion
15167      pairs
15168
15169    * stage1hr.c: Distinguished a binary_search procedure to be used for
15170      bigendian computers in processing positions from the indexdb file.
15171
15172    * stage1hr.c: Added missing code for computing nmismatches for one type of
15173      splice end
15174
15175    * Makefile.dna.am, maxent_hr.c, maxent_hr.h, stage1hr.c: Implemented fast
15176      calculation of maxent splice site probabilities
15177
15178    * stage1hr.c: Removed trimpos calculation from find_spliceends.  Not useful
15179      when we can compute short-overlaps.
15180
15181    * Makefile.dna.am, gsnap_best.c, gsnap_fasta.c: Added programs for
15182      extracting alignment with best MAPQ score and converting alignment output
15183      to FASTA.
15184
15185    * stage1hr.c: Using genome-based splice site detection for splice ends
15186
15187    * gsnap.c: Fixed name of flag suboptimal-levels in help statement
15188
15189    * shortread.c: Fixed bug in not allocating space for final '\0' to contents.
15190      Commented out check for PC line feeds.
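
    Note on the shortread.c entry above: the class of bug it names (no room
    for the terminating '\0') is usually avoided by allocating length + 1
    bytes. The sketch below only illustrates that pattern. It is not the
    code from shortread.c, and the name copy_contents is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from shortread.c: copies the first `length`
   characters of src into a newly allocated, NUL-terminated buffer.
   The caller is responsible for freeing the result. */
char *
copy_contents (const char *src, size_t length) {
  char *contents;

  assert(src != NULL);

  /* length + 1: one extra byte for the final '\0' */
  contents = (char *) malloc((length + 1) * sizeof(char));
  if (contents == NULL) {
    return NULL;
  }

  memcpy(contents, src, length);
  contents[length] = '\0';
  return contents;
}
```

    Allocating only `length` bytes would make the final assignment write one
    byte past the end of the buffer.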
15191
151922011-02-08  twu
15193
15194    * genome_hr.c: Fixed excessive shift in calculating high_halfbit
15195
15196    * stage1hr.c: Put back checks for zero nsegments
15197
15198    * outbuffer.c: Some output seems to be missed on occasion.  Rewrote to use
15199      ndone and noutput.
15200
15201    * stage1hr.c: Making all loops on segments go to segments[nsegments] instead
15202      of nsegments-1
15203
15204    * stage1hr.c: Implemented novel double splice detection using new
15205      genome-based splice site detection.  Removed leftspan/rightspan test and
15206      replaced with counting mismatches.
15207
15208    * outbuffer.c: Fixed outbuffer to check for donep flag, and for inbuffer to
15209      signal when donep is set
15210
152112011-02-07  twu
15212
15213    * genome_hr.c: Fixed bug in dealing with high halfbit
15214
152152011-02-06  twu
15216
15217    * stage1hr.c: Fixed memory leak caused by computing floors twice
15218
15219    * stage1hr.c: Raised minimum splice prob support to 0.80.  Extending range
15220      for single splicing to 2 nt from each end.  Implemented double splice
15221      detection involving known splice sites.
15222
15223    * stage3hr.c: Eliminating identical Stage3_T pairs properly
15224
15225    * outbuffer.c: Fixed bug that caused GMAP to handle maxpaths parameter
15226      incorrectly
15227
15228    * genome_hr.c, genome_hr.h, stage1hr.c, substring.c: Implemented working
15229      procedure for finding splice sites from compressed genome, and merging
15230      with known sites.  Using this procedure to find single splices.  Merged
15231      single-splice procedures for plus and minus strand, and moved handling of
15232      plus and minus strands inside of Substring_T procedures.
15233
152342011-02-05  twu
15235
15236    * genome_hr.c, genome_hr.h: Implemented fast determination of splice site
15237      locations
15238
15239    * stage1hr.c: Removed unused variables
15240
15241    * genome_hr.c: Added tables for splice site positions
15242
15243    * dev: New directory for developer work
15244
152452011-02-04  twu
15246
15247    * stage1hr.c: Integrated multiple procedures for merging heaps to find
15248      segments into a single procedure (plus a specialized one for terminals
15249      only). Removed separate Splicesegment_T object and using a general
15250      Segment_T object.
15251
15252    * stage1hr.c: Divided single splicing, known double splicing, and novel
15253      double splicing into separate procedures.
15254
15255    * gsnap.c, stage1hr.c, stage1hr.h: Provided flag for detecting novel double
15256      splices, and turned the feature off by default.
15257
15258    * stage1hr.c: Implemented faster method for finding double splices.
15259      Commented out code no longer valid for setting splice_pos_start and
15260      splice_pos_end in finding single splices.
15261
15262    * stage1hr.c: Increased speed of finding local splices by storing leftmost
15263      and rightmost querypos.  Hiding double-splicing for now.
15264
152652011-02-03  twu
15266
15267    * stage1hr.c, substring.c, substring.h: Handling splicesites_offset in
15268      donor, acceptor, and shortexon Substring_T types.  Handling two values of
15269      splicesites_i in shortexon.
15270
15271    * stage1hr.c: Implemented detection of double-splicing at novel splice
15272      sites. Integrated detection of single-splicing and double-splicing.
15273      Removed some unused code based on USE_CHARS rather than nucleotides.
15274
15275    * stage3hr.h: Added missing declaration of function
15276
15277    * outbuffer.c: Finishing up printing of remaining output
15278
152792011-02-02  twu
15280
15281    * stage1hr.c: Offset knowni arrays by +1, so we can clear by setting to 0,
15282      rather than by setting to -1.
15283
15284    * outbuffer.c, pair.c, pair.h: Now GMAP prints nomapping results in SAM
15285      format
15286
15287    * pair.c: Fixed printing of splice site scores for antisense cDNAs
15288
15289    * stage1hr.c: Removed restriction on finding terminals only when nconcordant
15290      was 0.
15291
15292    * trunk, src, gsnap.c, stage1hr.c, stage1hr.h: Merged r33262:34617 from
15293      branch suboptimal-alignments, adding parameter for terminal length, and
15294      not using 10*maxpaths for computation
15295
15296    * config.site.rescomp.tst, VERSION: Updated version number
15297
15298    * README: Updated description to include information about paired alignments
15299      and the advantages of known splice sites
15300
15301    * stage1hr.c: Added better debugging statements.  Renamed "shortend"
15302      procedures to "short-overlap".
15303
15304    * splicetrie.c: Requiring that 3 * nmismatches < nmatches to report a true
15305      search result
15306
15307    * gsnap.c, outbuffer.c, resulthr.c, resulthr.h, samprint.c, samprint.h,
15308      stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Implemented three types of
15309      alignment: concordant, paired, and unpaired, with three subtypes of
15310      paired: inversion, toolong, and scramble.  Detecting paired alignments in
15311      Stage3_pair_up_concordant. Converting unpaired uniq to paired uniq when
15312      appropriate.
15313
153142011-02-01  twu
15315
15316    * stage3hr.c: Adding information about unpaired type (interchrom, toolong,
15317      scramble, inversion) for unpaired_uniq results
15318
15319    * outbuffer.c: Changed type for nread and ncomputed fields from bool to int
15320
15321    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Added parameter
15322      for pairlength_deviation
15323
15324    * outbuffer.c: Protecting print loops with surrounding lock and unlock
15325      instructions. Removed debugging flag.
15326
15327    * Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, gmap.c, gsnap.c,
15328      inbuffer.c, inbuffer.h, ioboard.c, ioboard.h, outbuffer.c, outbuffer.h:
15329      Fixed bug where multithreading was hanging.  Moved IOBoard_T information
15330      to Outbuffer_T.
15331
15332    * stage3hr.c: Rewrite of Stage3_pair_up_concordant to get all concordant
15333      pairs
15334
15335    * gmap.c, gsnap.c: Not printing program name as an arg
15336
15337    * gmap.c, gsnap.c: Printing version and calling arguments to stderr
15338
15339    * stage3hr.c: Using a pointer instead of a count to mark paired_seenp
15340
15341    * outbuffer.c: Using RRlist_T to represent queue for ordered output
15342
15343    * outbuffer.c: Replaced doubly linked list with singly linked list for queue
15344
153452011-01-31  twu
15346
15347    * stage3hr.c: Added additional check to prevent negative insert lengths
15348
15349    * outbuffer.c: Storing results in a queue, instead of a list
15350
15351    * Makefile.dna.am: Removed reads_store and reads_dump
15352
15353    * stage3hr.c: Removed keyword "quality:" from output
15354
15355    * shortread.c: Fixed bug that removed all quality strings
15356
15357    * stage3hr.c: Fixed bug in finding concordance between overlapping ends of
15358      read
15359
15360    * indexdb.c: Fixed bug in compiler directive for MMAP
15361
153622011-01-28  twu
15363
15364    * goby.c, goby.h, gsnap.c, inbuffer.c, inbuffer.h, samprint.c, shortread.c,
15365      shortread.h, stage3hr.c: Simplified procedures in shortread.c.
15366      Implemented parsing of barcodes.
15367
15368    * mapq.c, mapq.h, substring.c: Fixed calculation of MAPQ with separate
15369      coordinates for checking genomic string and quality string
15370
15371    * shortread.c, shortread.h: Initial import into SVN.  Contains Sequence_T
15372      functions specific to GSNAP.
15373
15374    * Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, chop_primers.c,
15375      genome.c, goby.c, goby.h, gsnap.c, inbuffer.c, outbuffer.c, reads_store.c,
15376      request.c, request.h, samprint.c, samprint.h, sequence.c, sequence.h,
15377      stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c, substring.h:
15378      Separated Sequence_T functions into Sequence_T and Shortread_T
15379
15380    * gmap.c, outbuffer.c, outbuffer.h: Made outbuffer work for PMAP by removing
15381      references to sam_header_p and related variables
15382
15383    * gsnap.c: Removed old sam.h include
15384
15385    * substring.c: Turned off debugging
15386
15387    * spliceclean.c: Version 34373 was an accidental reversion.  Going back to
15388      version 34369, where we are adding use of genebounds_iit and adding
15389      functionality for resolving splice directions
15390
15391    * spliceturn.c: Version 34374 was an accidental reversion.  Going back to
15392      version 34369, where information goes to stderr.
15393
15394    * splicegene.c: Version 34375 was an accidental reversion.  Going back to
15395      version 34369, which actually does stop fixing of terminalp in acceptors
15396      where next donor is terminal.
15397
15398    * splicefill.c: Version 34377 was an accidental reversion.  Going back to
15399      version 34369, which uses tally, adds smoothing, does not use slopes to
15400      find edges, and does not check for edgedistance.
15401
15402    * pair.c: Using new samflags.h
15403
15404    * Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, gsnap_extents.c,
15405      gsnap_multiclean.c, gsnap_splices.c, gsnap_tally.c, gsnap_terms.c, sam.c,
15406      sam.h, samflags.h, samprint.c, samprint.h: Change file name from sam.c to
15407      samprint.c.  Moved definitions of SAM flags to samflags.h.
15408
15409    * splicefill.c: Removed smoothing.  Using slopes to find edges.  Checking
15410      for edgedistance.
15411
15412    * splicegene.c: Stopped fixing of terminalp in acceptor cases where next
15413      donor was terminal
15414
15415    * spliceturn.c: Providing information to stdout about splices that are turned
15416
15417    * spliceclean.c: Removed use of genebounds_iit and functionality for
15418      resolving splice directions
15419
15420    * changepoint.c: Changed function for both ends, but not used anyway
15421
15422    * src, Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am,
15423      blackboard.c, blackboard.h, changepoint.c, chop_primers.c, gmap.c,
15424      gsnap.c, iit_pileup.c, inbuffer.c, inbuffer.h, ioboard.c, ioboard.h,
15425      outbuffer.c, outbuffer.h, reads_get.c, reqpost.c, reqpost.h, request.c,
15426      request.h, result.c, result.h, resulthr.c, resulthr.h, sequence.c,
15427      sequence.h, spliceclean.c, splicefill.c, splicegene.c, spliceturn.c,
15428      stage3.c, stage3.h, tableuint.c, tableuint.h, tally_exclude.c: Merging
15429      changes to threads system from new-threads branch
15430
15431    * indexdb.c: Removed extraneous allocation of memory for offsets
15432
15433    * pair.c, pair.h, stage3.c, stage3.h: Using a single parameter for
15434      sourcename for GFF3 output
15435
15436    * substring.c: Fixed calculation of region to check for MAPQ scoring
15437
15438    * mapq.c: Added check for wrong segment to check in computing MAPQ
15439
154402011-01-24  twu
15441
15442    * result.c: Altered debugging statements
15443
15444    * gmap.c: Fixed memory leak in chimera detection
15445
154462011-01-21  twu
15447
15448    * gmap.c: Fixed case where best0 or best1 was duplicated in rest of
15449      stage3list
15450
15451    * result.c: Added debugging statements
15452
15453    * gmap.c: Removed debugging comment
15454
15455    * stage3.c, stage3.h: Added function Stage3_identity_cmp to help with
15456      chimera detection
15457
15458    * gmap.c: Removed check for chimeras based on alignment break.  Handling
15459      cases where the same stage3 object is in both lists.
15460
15461    * chimera.c, pair.c, pair.h: Simplified Pair_matchscores and computing over
15462      querylength.  In Chimera_bestpair, check for cases where the same stage3
15463      object is in both lists.
15464
154652011-01-20  twu
15466
15467    * Makefile.gsnaptoo.am: Added chimera.c to build of gmap
15468
15469    * VERSION: Updated version to 2011-01-21
15470
15471    * gsnap.c: Always creating a .nomapping file with --split-output option
15472
15473    * stage1hr.c: Changed debugging statements for shortexon
15474
15475    * splicetrie.c: Changed debugging statements
15476
15477    * sequence.c: Not printing space at end of accession
15478
15479    * gsnap.c: Turning on splicetrie precomputation by default
15480
15481    * gmap.c: Fixed bug in separating chimeric paths
15482
15483    * gmap.c: Not sorting first part of stage3list when chimera is present
15484
15485    * Makefile.dna.am: Added uintlist.c to gsnap_iit
15486
15487    * chimera.c: Made detection of alignment break work again
15488
15489    * splicetrie.c: Implemented handling of duplicate leaves
15490
154912011-01-19  twu
15492
15493    * splicegene.c: Handling genebounds.iit as input
15494
15495    * gsnap.c: Added --sam-headers-batch option
15496
15497    * gsnap_iit.c: Changed output to look like gene map format
15498
15499    * gsnap_extents.c: Fixed handling of non-spliced reads in sam_extents
15500
155012011-01-18  twu
15502
15503    * sam.h: Added constant for clearing NOT_PRIMARY bit
15504
15505    * sam_tally.c: Removed from CVS
15506
15507    * gsnap_multiclean.c, samread.c, samread.h: Implemented printing of altered
15508      flag
15509
15510    * gsnap_terms.c: Made program provide same output with SAM input.
15511      Implemented filtering for concordant pairs.  Removed filtering by
15512      max_endlength.
15513
15514    * gsnap_tally.c: Implemented filtering for concordant pairs
15515
15516    * gsnap_splices.c: Made program provide same output with SAM input
15517
15518    * gsnap_extents.c: Made program work with SAM input
15519
15520    * gsnap_multiclean.c: Turned off debugging statements
15521
15522    * Makefile.dna.am, gsnap_multiclean.c: Implemented sam_multiclean
15523
155242011-01-15  twu
15525
15526    * gsnap_multiclean.c, multiclean.c: Renamed file
15527
15528    * gsnap_tally.c: Added check for concordantp in SAM input.  Fixed bug in
15529      initializing a variable.
15530
15531    * sequence.c: Made paired adapter detection more stringent, allowing only 1
15532      mismatch
15533
15534    * gsnap.c, sam.c, sequence.c, sequence.h, stage3hr.c: Fixed bugs in printing
15535      full quality string in GSNAP output
15536
15537    * gsnap.c, sam.c, sequence.c, sequence.h, stage3hr.c: Printing full quality
15538      string (not chopped for adapter) in GSNAP output
15539
155402011-01-10  twu
15541
15542    * stage3.c: Fixed compilation for PMAP
15543
15544    * gmap.c: Added compiler directives to hide SAM output which is not used in
15545      PMAP
15546
15547    * translation.c: Added compiler directives to hide functions that are not
15548      used in PMAP
15549
15550    * oligop.c: Fixed compiler warnings about array index being char
15551
15552    * Makefile.dna.am: Removed bam_pileup from being made
15553
15554    * gmap.c: Added documentation for new output flags
15555
15556    * gsnap.c: Changed output flag from -7 to --split-output
15557
15558    * chimera.c, chimera.h, genome.c, get-genome.c, gmap.c, iit-read.c,
15559      iit-read.h, md5-compute.c, pair.c, pair.h, revcomp.c, segmentpos.c,
15560      segmentpos.h, sequence.c, sequence.h, stage1.c, stage3.c, stage3.h,
15561      subseq.c, translation.c, translation.h: Implemented split output to files
15562
15563    * iit-read.c: Fixed bug in handling NULL IITs
15564
15565    * gmap.c, pair.c, pair.h, sequence.c, sequence.h, stage3.c, stage3.h:
15566      Implemented printing of chimeras in SAM output
15567
155682011-01-09  twu
15569
15570    * trunk, gmap.c, pair.c, pair.h, result.c, result.h, stage3.c, stage3.h,
15571      translation.c: Merged all changes from chimera branch
15572
15573    * Makefile.pmaptoo.am: Update commands
15574
15575    * Makefile.dna.am: Added commands for bam_pileup
15576
155772011-01-07  twu
15578
15579    * gmap.c, stage3.h: Added new debugging point for result after all cycles.
15580
15581    * stage3.c: Not forcing solution for dual breaks.  Using separate maxiter
15582      limits.
15583
15584    * stage3.c: Changed comments for fix_adjacent_indels
15585
155862011-01-06  twu
15587
15588    * Makefile.three.am: Added files to GSNAP
15589
15590    * pair.c: Changed debugging output for Pair_dump to show the comp
15591
15592    * stage2.c: Added a check for all zero scores when trying to find alignment
15593      end point
15594
155952011-01-05  twu
15596
15597    * stage3.c: Added a final cleaning of ends
15598
15599    * stage3.c: Added procedure to fix adjacent indels
15600
15601    * gmap.c, pair.c, pair.h, segmentpos.c, segmentpos.h, stage3.c, stage3.h:
15602      Removed references to zerobasedp
15603
15604    * pair.c: Using last_querypos and last_genomepos explicitly instead of
15605      prev->querypos and prev->genomepos.  Fixed issues with SAM output.
15606
156072011-01-04  twu
15608
15609    * gmap.c: Added compiler directives to prevent PMAP from seeing SAM output
15610      code
15611
15612    * backtranslation.h: Fixed typo in declaration
15613
15614    * gsnap.c: Fixed comment
15615
15616    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Printing headers and
15617      read-groups in SAM output
15618
156192011-01-03  twu
15620
15621    * MAINTAINER: Updated instructions for ChangeLog
15622
15623    * config.guess: Update of config.guess by latest autoconf
15624
15625    * INSTALL: Update of INSTALL message by latest autoconf
15626
15627    * stage3hr.c: Added assertions about sign of nindels
15628
15629    * gmap_setup.pl.in: Handling case where user gives -d argument with trailing
15630      slash
15631
15632    * gsnap.c: Added missing break after -o flag
15633
156342010-12-22  coryba
15635
15636    * gsnap_tally.c: changed compiler directives to get gmap build to work
15637
15638    * sam.c: Minor change to have the MD field output a 0 after the deletion if
15639      an insertion is adjacent to a deletion.  IGB can now parse gsnap's SAM
15640      output
15641
156422010-12-15  twu
15643
15644    * gsnap.c, mapq.c, sequence.c: Added flag --quality-protocol
15645
156462010-12-12  twu
15647
15648    * stage1hr.c: Fixed bugs in storing splicesites_i
15649
156502010-12-10  twu
15651
15652    * pair.c: Fixed bug in dealing with EXTRAEXON_COMP
15653
15654    * gsnap_tally.c: Added flag for minimum mapq
15655
15656    * gsnap.c, mapq.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h,
15657      substring.c: Merged r32485:32693 from branch gsnap-trim-penalty into the
15658      trunk
15659
156602010-12-08  twu
15661
15662    * config.site.rescomp.tst: Updated to include with_samtools
15663
15664    * bamread.c: Hid declaration of bam_init_header_hash when samtools is not
15665      enabled
15666
156672010-12-07  twu
15668
15669    * substring.c: Implemented marking of methylation changes
15670
15671    * stage1.c: Performing a single uniqueness step at end
15672
15673    * stage2.c: Using global or local winner for end of stage 2
15674
15675    * indexdb_dibase.c: Using Access_mode_T for Indexdb_new_genome
15676
15677    * indexdb.c: Minor fixes
15678
15679    * gregion.c, gregion.h: Providing hooks for Gregion_filter_clean
15680
15681    * gmap.c, gsnap.c: Using allocate as default mode if mmap not available
15682
156832010-12-06  twu
15684
15685    * gsnap_tally.c, bamread.c, bamread.h, gsnap_extents.c, gsnap_splices.c,
15686      gsnap_terms.c, samread.c, samread.h: Returning mapping quality from SAM
15687      and BAM inputs
15688
15689    * gmap.c: Improved default information for --batch feature in --help
15690
15691    * get-genome.c: Fixed mapping labels from stdin
15692
15693    * gsnap.c: Changed default memory access to be level 2
15694
156952010-12-04  twu
15696
15697    * stage3hr.c: Disallowing concordant pairs between two terminal alignments
15698
15699    * stage1hr.c, stage3hr.c, stage3hr.h: Placed restriction on terminal
15700      alignments to have fewer than allowed mismatches within region after
15701      trimming
15702
15703    * stage3hr.c: Changed Stage3pair_remove_duplicates to resolve overlaps using
15704      absdifflength
15705
15706    * gsnap.c: Changed --help output to show default batch mode of 4
15707
15708    * gmap.c: Providing more batch modes in GMAP
15709
15710    * access.h, genome.c, genome.h, gsnap.c, indexdb.c, indexdb.h: Providing
15711      more batch modes in GSNAP
15712
157132010-12-03  twu
15714
15715    * stage1hr.c: Made done_level always less than or equal to user_maxlevel
15716
157172010-12-02  twu
15718
15719    * samread.h: Added Id tag
15720
15721    * sam.c: Changed terminal alignments to use soft clipping, since hard
15722      clipping information appears to be removed in making BAM files.
15723
15724    * parserange.c, parserange.h: Implemented simple parser for regions
15725
15726    * gsnap_tally.c: Implemented limited region for indexed BAM files in
15727      bam_tally.  Added -P flag for printing probabilities.
15728
15729    * bamread.c, bamread.h: Implemented indexed BAM files
15730
15731    * Makefile.dna.am: Added parserange.c and .h to bam_tally
15732
157332010-12-01  twu
15734
15735    * Makefile.dna.am, bamread.c, bamread.h, gsnap_tally.c: Implemented
15736      bam_tally.  Changed standard tally output back to previous format.
15737
15738    * config.site, configure.ac: Made changes to include samtools library
15739
157402010-11-30  twu
15741
15742    * Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am, blackboard.c,
15743      blackboard.h, gmap.c, gsnap.c, sequence.c, sequence.h: Implemented the
15744      ability to uncompress gzip files by GSNAP
15745
15746    * README, config.site, configure.ac: Made changes to reflect a new zlib
15747      option
15748
157492010-11-29  twu
15750
15751    * gsnap.c: Fixed bug in output to multiple files where GSNAP single-end
15752      nomapping goes to stdout.
15753
157542010-11-24  twu
15755
15756    * get-genome.c: Fixed stdin input to get-genome for non-map requests
15757
157582010-11-22  twu
15759
15760    * Makefile.dna.am, Makefile.pmaptoo.am, Makefile.three.am: Added uinttable.c
15761      and uinttable.h to pmap
15762
15763    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
15764      Makefile.pmaptoo.am, Makefile.three.am: Added uinttable.c and uinttable.h
15765      to gmap
15766
15767    * stage2.c: Fixed bug in determining overall grand winner
15768
15769    * sam.c: Moved read group field to be first
15770
15771    * iit-read.c, iit-read.h: Implemented print_comment option
15772
15773    * gmap.c: Providing nchrs to stage1 procedure
15774
15775    * gregion.c, gregion.h: Implemented extentstart and extentend for comparing
15776      gregions.  Added code for a Gregion cleaning step.
15777
15778    * stage1.c, stage1.h: Added hooks for a Gregion cleaning step
15779
157802010-11-18  twu
15781
15782    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Implemented --print-comment
15783      for map output.  Removed old code for universal coordinate IIT files.
15784
15785    * genome.h, stage3hr.h: Formatting changes
15786
15787    * goby.c, goby.h, gsnap.c, sequence.c, sequence.h: Changes made for new Goby
15788      code
15789
15790    * substring.c: Always initializing trim_left and trim_right
15791
15792    * pdl_smooth.c, spliceturn.c, multiclean.c: Initial import into svn
15793
15794    * sam.c: Made fixes in printing mate information
15795
15796    * splicing-scan.c: Combining splice and terminal splicesites
15797
15798    * Makefile.dna.am: Removed hexamer-score.c and .h from extents_genebounds
15799
15800    * extents_genebounds.c: Added debugging information
15801
15802    * tally_expr.c: Using new interface to IIT_annotation.  Providing option to
15803      print gc-content.
15804
15805    * Makefile.dna.am: Removed tally_exclude from bin files
15806
15807    * Makefile.dna.am: Removed iit_pileup from bin files
15808
15809    * fopen.m4: Added _cv_ to all variable names
15810
15811    * Makefile.am: Added ACLOCAL_AMFLAGS
15812
15813    * VERSION: Updated version
15814
15815    * bootstrap.dna, bootstrap.gmaponly, bootstrap.gsnaptoo, bootstrap.three:
15816      Using autoreconf.  Added --install to some files to allow building from
15817      svn.
15818
15819    * sam.c: For unpaired_uniq, performing sorting first and then selecting mate
15820      for each end.
15821
15822    * sam.c: Restored null mate for unpaired_mult
15823
15824    * sam.c: Providing mate information in unpaired_mult
15825
15826    * iit-read.c: Printing tabs in SAM headers
15827
15828    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
15829      Makefile.three.am: Changes to Makefile.am files
15830
158312010-11-17  twu
15832
15833    * gsnap.c: Added a flag for --no-sam-headers
15834
15835    * resulthr.c, resulthr.h: Added a printing command for resulttype
15836
15837    * sam.c: For unmapped reads, always providing a mate if available
15838
15839    * sequence.c, sequence.h, gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added
15840      ability to print GMAP alignments in SAM output format
15841
15842    * gsnap.c, substring.c, substring.h: Added --show-refdiff option
15843
15844    * README: Added further information about SNP-tolerant alignment and
15845      wildcard SNPs.
15846
158472010-11-16  twu
15848
15849    * README: Made changes in instructions for -V and -v flags
15850
15851    * gsnap.c, iit-read.c, iit-read.h, sam.c, sam.h: Implemented SM and RG
15852      fields.
15853
15854    * gsnap.c: Added warning about paired-end output in Goby
15855
15856    * goby.c, goby.h: Using new interface to sequence.h
15857
15858    * datadir.c: Minor formatting change
15859
15860    * get-genome.c, sequence.c, sequence.h: Handling printing of wildcard SNPs
15861
158622010-11-15  twu
15863
15864    * sequence.c, sequence.h: Changed name of procedure
15865
15866    * revcomp.c: Added flag for --byline
15867
15868    * reads_store.c: Fixed bug in freeing memory too early
15869
15870    * gsnapread.c: Reading quality string based on presence of third tab
15871
15872    * gsnap.c: Removed short versions of some flags
15873
15874    * sam.c: Using nmismatches_refdiffs in NM output
15875
15876    * stage3hr.c, stage3hr.h, substring.c, substring.h: Fixed trimming based on
15877      SNPs.  Computing different types of nmismatches.
15878
15879    * add_rpk.c, exonscan.c, genecompare.c, plotgenes.c, tally.c, tallygene.c:
15880      Using new interface to IIT_annotation
15881
15882    * genome_hr.c, snpindex.c: Enabled representation of wildcard SNPs
15883
15884    * get-genome.c: Added -V flag to specify a directory for alternate genome
15885      information
15886
15887    * substring.c, substring.h, gsnap.c, mapq.c, mapq.h, sam.c, stage1hr.c,
15888      stage3hr.c, stage3hr.h: Added computation of mapping quality
15889
158902010-11-11  twu
15891
15892    * sequence.c, sequence.h: Fixes to printing of query sequences for failed
15893      alignments
15894
15895    * goby.c, goby.h, gsnap.c: Always shutting down Protobuf if compiled in.
15896      Calling gobyAlEntry_appendTooManyHits, even under quiet-if-excessive
15897      option. Changes to flag descriptions.
15898
15899    * genome_hr.c, genome_hr.h, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
15900      substring.h: Marking all mismatches by using query_compress,
15901      genome_blocks, and snp_blocks.
15902
159032010-11-10  twu
15904
15905    * goby.c, goby.h, gsnap.c, sam.c, sam.h, stage3hr.c, stage3hr.h: Allowing
15906      for three paired-end orientations instead of circular option. Added
15907      --fails-as-input flag.  Fixed issues with handling --failsonly option.
15908
15909    * Makefile.gsnaptoo.am: Added blank line
15910
15911    * Makefile.gmaponly.am: Added parserange.c and parserange.h
15912
15913    * iit-read.c, iit-read.h: Revised IIT_annotation to handle version 5 IIT
15914      files
15915
15916    * iit_get.c: Using new interface to IIT_annotation
15917
15918    * genome.c, genome.h: Added procedures for returning ntcounts in a segment
15919
15920    * gmap.c: Using long options without short options for --version and --help
15921
15922    * pair.c: Fixed output of -f 8 format
15923
15924    * sequence.c, sequence.h: Added procedures for printing GSNAP queries
15925
159262010-11-09  twu
15927
15928    * goby.c, goby.h, Makefile.dna.am, Makefile.gsnaptoo.am, Makefile.three.am,
15929      gsnap.c, stage3hr.c, stage3hr.h: Added functionality for Goby file formats
15930
15931    * configure.ac: Added hooks for Goby compile-time option
15932
15933    * config.site: Added information for Goby compile-time option
15934
15935    * README: Added comments for FASTQ files, -z flag, and Goby functionality
15936
15937    * gsnap_tally.c: Fixed problem with underflow in taking exp() of log
15938      likelihood
15939
15940    * gsnap_tally.c: Added flag for using a constant quality score.  Printing
15941      1-p instead of p.
15942
159432010-11-08  twu
15944
15945    * get-genome.c: Making only a single open of genome or genomealt
15946
159472010-11-07  twu
15948
15949    * genome.c, get-genome.c: Using new interface to IIT_annotation
15950
159512010-11-04  twu
15952
15953    * gsnap_terms.c: Added flags and parameters for mincount, min_endlength, and
15954      max_endlength.
15955
159562010-10-31  twu
15957
15958    * reads_get.c: Made several changes in parsing
15959
15960    * gsnap.c, sam.c, sam.h, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h,
15961      substring.c, substring.h: Made SNP and splicesite parameters local to
15962      Substring procedures
15963
159642010-10-29  twu
15965
15966    * gsnap_tally.c: Added -A flag for controlling printing of ref details.
15967      Removed unused global parameters.  Fixed bug in retrieving genomic
15968      reference.
15969
159702010-10-28  twu
15971
15972    * sam.c: Added XS flag to indicate splice direction
15973
15974    * configure.ac: Additions needed for new libtool version
15975
15976    * bootstrap.gsnaptoo: Running full set of autoconf programs
15977
15978    * iit.test.in: Revised version of test
15979
15980    * config.guess, config.sub: New version of libtool programs
15981
15982    * config.guess, config.sub, ltmain.sh: Previous version of libtool programs
15983
15984    * gsnap_tally.c: Added computation of genotype probabilities
15985
15986    * indexdb_dibase.c, indexdb_dibase.h, setup.ref3positions.ok: Initial import
15987      into SVN
15988
159892010-10-27  twu
15990
15991    * gsnap_tally.c: Sorting shifts and quality scores
15992
15993    * gsnap_tally.c: Keeping track of and reporting shifts and quality for
15994      reference matches
15995
15996    * list.c: Added tentative code for dealing with NULL lists in List_to_array
15997
15998    * chop_primers.c: Using new interface to Sequence_print_header
15999
16000    * translation.c, translation.h: Added ability to start CDS from a given
16001      position
16002
16003    * tally_expr.c: Added ability to show mincount
16004
16005    * tally.c, tally.h: Added functions Tally_mean_double and Tally_quantile
16006
16007    * seqlength.c, pair.c: Using new interface to Sequence_print_digest
16008
16009    * parserange.c: Fixed bug in returning coordstart
16010
16011    * md5-compute.c: Using new interface to MD5_print
16012
16013    * iit_store.c: Storing rest of header in annotation.  Using new interface to
16014      IIT_write.
16015
16016    * iit_get.c: Changed stats to compute mean over entire width, not just
16017      non-zero positions
16018
16019    * gsnap_iit.c: Using new interface to Gsnapread_parse_line
16020
16021    * gsnap_tally.c: Subtracting 64 from quality scores, as standard for Illumina
16022
16023    * gsnap_extents.c, gsnap_splices.c, gsnap_terms.c: Using new interface to
16024      Samread
16025
16026    * gsnap_tally.c: Printing quality scores relative to highest one seen
16027
16028    * sam.c: Changed separator for extra fields to be a tab, rather than a space.
16029
160302010-10-26  twu
16031
16032    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c,
16033      substring.h: Implemented trim-mismatch-score for controlling trimming
16034
16035    * samread.c, samread.h, gsnapread.c, gsnapread.h: Implemented retrieval of
16036      quality string
16037
16038    * gsnap_tally.c: Printing mismatch information by position and quality
16039
16040    * gmapindex.c: Using new interface to IIT_write
16041
16042    * gmap.c: Protecting against calling List_to_array with an empty list
16043
16044    * get-genome.c: Fixed bug introduced by new default snps_mode
16045
16046    * diag.c: Protecting against call to List_to_array on an empty list
16047
16048    * stage3hr.c, stage3hr.h: Reversing quality string in GSNAP output when
16049      necessary, and using quality shift.
16050
16051    * gsnap.c: Fixed memory leak when npaths is zero.  Reversing quality string
16052      in GSNAP output when necessary.
16053
160542010-10-25  twu
16055
16056    * gsnap_tally.c: Changed output format to show all signed query positions
16057
16058    * gsnap_tally.c: Changed output format to show all query positions
16059
16060    * gsnap_tally.c: Incorporated sam_tally into this source code
16061
160622010-10-24  twu
16063
16064    * splicetrie.h: Changed Splicetrie_dump
16065
16066    * splicetrie.c: Changed Splicetrie_dump.  Handling case where pos5 == pos3
16067      in short-exon splicing.  Added debugging statements.
16068
16069    * stage1hr.c: Fixed a bug in handling one case of ambiguous splice ends in
16070      short-exon splicing.
16071
16072    * splicetrie.c: Allowing only one mismatch at most for searching at ends in
16073      short-exon splicing when ends are 16 nt or shorter.
16074
16075    * splicetrie.c: Combined Trie_new and Trie_output into a single procedure
16076
160772010-10-23  twu
16078
16079    * splicetrie.c: Removed debugging statement
16080
16081    * gsnap.c, splicetrie.c, splicetrie.h, stage1hr.c, stage1hr.h: Enabled
16082      computation of splice tries on the fly
16083
16084    * gsnap.c, splicetrie.c, splicetrie.h: Divided Splicetrie_build process into
16085      two steps, with one computing nsplicepartners.
16086
16087    * gsnap.c, splicetrie.c, splicetrie.h: Using unsigned ints rather than char
16088      * to store splicestrings and compute tries.
16089
160902010-10-22  twu
16091
16092    * splicetrie.c: Ignoring cases where splice site has an N
16093
16094    * stage3hr.c: Changed assertion to use effective_chrnum rather than chrnum
16095
16096    * stage1hr.c: Using new interface to Splicetrie procedures.  Revised
16097      parameters for distant splicing.
16098
16099    * splicetrie.c, splicetrie.h: Checking short-end and short-exon splicing
16100      against extension
16101
16102    * gsnap.c: Automatically setting pairmax if not specified for RNA-Seq
16103
161042010-10-20  twu
16105
16106    * stage3hr.c: Changed ambiguous splice procedure to remove longer splice
16107
16108    * stage1hr.c: Using new interfaces to Splicetrie procedures.
16109
16110    * splicetrie.c, splicetrie.h: Fixed various bugs.  Implemented separate
16111      procedures for short-ends and for longer ends (needed for short-exon
16112      alignments).
16113
161142010-10-19  twu
16115
16116    * splicetrie.c, splicetrie.h, stage1hr.c: Checking entire subtree against
16117      splicefrags when using alternate genome and reaching a non-leaf with no
16118      string remaining.  Removed unused parameters.
16119
16120    * stage1hr.c: Using new interface to Splicetrie_dump
16121
16122    * get-genome.c: Changed flags for using SNPs
16123
16124    * genome.c, genome.h: Added function Genome_fill_buffer_simple_alt
16125
16126    * splicetrie.c, splicetrie.h: Fixed use of nmismatches from splicefrags.
16127      Fixed use of alternate genome.  Using 4 bytes instead of 2 bytes for
16128      reloffsets.  Not using suboptimal separation.
16129
161302010-10-18  twu
16131
16132    * stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Removed
16133      chrhigh from Substring_T and Stage3_T objects, and removed from segment
16134      objects in stage 1.
16135
16136    * gsnap.c, splicetrie.c, splicetrie.h, stage1hr.c: Completed transition to
16137      using splicetries.  Removed unnecessary variables and code.
16138
16139    * genome_hr.c, genome_hr.h, gsnap.c, splicetrie.c, splicetrie.h, stage1hr.c,
16140      stage1hr.h: Enabling use of splicefrags with splicetrie.  Enabled
16141      processing of alternate alleles.
16142
161432010-10-16  twu
16144
16145    * Makefile.dna.am, Makefile.gsnaptoo.am, gsnap.c, splicetrie.c,
16146      splicetrie.h, stage1hr.c, stage1hr.h: Implemented tries for short-end
16147      splicing
16148
161492010-10-15  twu
16150
16151    * gsnapread.c, gsnapread.h: Returning number of mismatches
16152
16153    * stage3.c, stage3.h, gmap.c: Added ability to specify where CDS begins
16154
16155    * get-genome.c: Added -A flag for dumping entire genome.  Handling -m flag
16156      correctly for stdin input.
16157
161582010-10-14  twu
16159
16160    * pair.c: Implemented printing of coverage, identity, and phases in GFF
16161      output
16162
16163    * iit-write.c, iit-write.h, iitdef.h: Implemented version 5 of IIT format,
16164      which allows different pointer sizes for labels and annotations
16165
16166    * gsnap.c, iit-read.c, iit-read.h, md5.c, md5.h, resulthr.c, resulthr.h,
16167      sam.c, sam.h, sequence.c, sequence.h, stage3hr.c, stage3hr.h, substring.c,
16168      substring.h: Enabled printing of output into multiple files
16169
16170    * stage1hr.c: Allowing 1 mismatch in shortexon end, but requiring a
16171      separation of 2 from next best alignment.  Consolidated code into
16172      find_left_splice and find_right_splice.
16173
161742010-10-13  twu
16175
16176    * gsnap.c: Changed default quality-shift to be 0
16177
16178    * stage3hr.c, stage3hr.h, substring.c, substring.h, sam.c: Implemented
16179      calculation and printing of MD string
16180
16181    * gsnap_tally.c: Fixed bug in freeing data.  Setting min_readlength to 0.
16182      Using new interface to Gsnapread.
16183
161842010-10-06  twu
16185
16186    * spliceclean.c: Added ability to print excluded splices
16187
16188    * spliceclean.c: In resolving splice direction, using a 10-to-1 threshold
16189      and checking adjacent splices if necessary.
16190
16191    * gsnap.c: Turned on indels by setting default indel-penalty to be 1.
16192
161932010-10-04  twu
16194
16195    * stage1hr.c: Added restriction on number of mismatches allowed in short
16196      exons
16197
161982010-10-02  twu
16199
16200    * spliceclean.c: Added ability to resolve between competing splices based on
16201      fwd and rev extents.  Added ability to print endpoints and midpoints.
16202      Added flag to bypass cleaning step.
16203
16204    * stage3hr.c: Allowing pair overlaps when splices are involved.  Using new
16205      way of computing low and high for pairs.
16206
162072010-09-30  twu
16208
16209    * gsnap_extents.c: Initial creation
16210
16211    * Makefile.dna.am: Added extents_genebounds
16212
162132010-09-29  twu
16214
16215    * spliceclean.c: Added ability to print runlengths and splicesites.  Added
16216      ability to filter based on uniqueness, concordance, or maxminsupport.
16217
16218    * extents_genebounds.c: Initial creation
16219
162202010-09-28  twu
16221
16222    * Makefile.dna.am: Added multiclean, gsnap_extents, and gsnap_terms
16223
16224    * gsnap_terms.c: Initial creation of gsnap_terms
16225
162262010-09-22  twu
16227
16228    * iit_dump.c: Implemented printing of IIT in runlength or integral output
16229
16230    * gsnap.c, sam.c, sam.h, sequence.c, stage3hr.c: Fixed handling of circular
16231      reads and implemented printing of SAM output for circular reads.
16232
16233    * gmap.c, stage3.c, stage3.h, translation.c, translation.h: Added feature to
16234      start protein coding sequence from first query position.
16235
162362010-09-20  twu
16237
16238    * splicegene.c: Added the ability to output genes as well as paths
16239
162402010-09-08  twu
16241
16242    * mem.c: Added comment about use of LEAKCHECK
16243
16244    * list.c: Removed unused variable
16245
16246    * iit-read.c, iit-read.h, iit_dump.c: Added sort functionality for iit_dump
16247
16248    * snpindex.c: Using -V flag to allow user to specify destination directory
16249
16250    * substring.c: Added check to avoid checking for mismatches past end of
16251      string
16252
16253    * stage1hr.c: Simplified computation of leftbound and rightbound in
16254      short-end splicing
16255
16256    * stage1hr.c: Using different stopi for novel and known splicing.  Fixed
16257      possible bug in reading past mismatches_left and mismatches_right.  Fixed
16258      calculation of chrend in finding right bound for short-end splicing.
16259
162602010-09-04  twu
16261
16262    * splicegene.c: Incorporated cappaths functionality
16263
16264    * splicegene.c: Removed global variables related to linear fitting
16265
16266    * splicegene.c: Fixed memory leaks.  Added filtering based on mean number of
16267      splices.
16268
162692010-09-03  twu
16270
16271    * splicegene.c: Removed global variables
16272
16273    * splicegene.c: Added ability to handle all chromosomes in a single run.
16274      Fixed some memory leaks.
16275
16276    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h, substring.c,
16277      substring.h: Introduced -V flag for specifying snpsdir, and now using -v
16278      flag for indicating SNPs file.  Removed geneprob option and procedures.
16279
16280    * stage1hr.c, stage3hr.c, substring.c, substring.h: Made terminals extend to
16281      beginning and end of read, with trimming starting from there.  Endtypes
16282      based on presence of trimming.
16283
162842010-09-02  twu
16285
16286    * gsnapread.c, sam.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
16287      substring.h: Added left and right endtypes to Substring_T object, and
16288      using them for printing exact, substitution, and terminal alignments.
16289      Renamed variables in Hittype_T enum.  Added ambiguous alignments.
16290      Restored usage of score in Stage3_remove_duplicates.  Using number of
16291      mismatches to compute nmatches in Stage3_T objects.  Revised computation
16292      of terminal alignments.
16293
162942010-09-01  twu
16295
16296    * gsnap.c, stage1hr.c, stage1hr.h: Introduced min_shortend as a parameter
16297      with flag -l.  The find_left_splice and find_right_splice procedures now
16298      compete with extension.
16299
163002010-08-31  twu
16301
16302    * gmap.c, stage3.c, stage3.h: Introducing sense_filter in addition to
16303      sense_try.  Counting non-canonical introns, rather than canonical ones, to
16304      determine sense, and adding a small penalty for introns to bias against
16305      short exons.
16306
16307    * stage1hr.c: Using new parameter to turn off concordant translocations with
16308      terminal alignments.  Clarified usage of query, queryuc_ptr, and queryrc.
16309
16310    * stage3hr.c, stage3hr.h: Added flag to control concordant translocations
16311
16312    * genome_hr.c, genome_hr.h: Fixed bug in handling fragments when query
16313      length is 16.  Removed query parameter from Genome_trim_left and
16314      Genome_trim_right procedures.
16315
16316    * stage1hr.c: Stopped placing restrictions on stopi in finding splice ends.
16317      Requiring minimum endlength for short end splicing.
16318
163192010-08-30  twu
16320
16321    * stage1hr.c: Fixed bug with donor, acceptor, and shortexons that were NULL.
16322      Fixed logic with novel splice sites in local splicing.
16323
163242010-08-26  twu
16325
16326    * stage1hr.c: Fixed bug attempting to make shortexon of length 0
16327
163282010-08-25  twu
16329
16330    * splicing-scan.c: Initial import into SVN
16331
16332    * Makefile.util.am: Renamed revcomp program to rc
16333
16334    * Makefile.gsnaptoo.am: Added gsnapread.c and gsnapread.h for gsnap_tally
16335
16336    * Makefile.dna.am: Added rc and splicing-scan
16337
16338    * get-genome.c: Removed unused parameter
16339
16340    * gsnap_splices.c, gsnapread.c: Allowing program to handle short exon
16341      alignments with multiple splices
16342
16343    * parserange.c: Added check for coordinate lengths that exceed 32 bits
16344
16345    * substring.c: Commented old location of sub: field in donor and acceptor
16346      substrings
16347
16348    * stage3hr.c, stage3hr.h: Including chrhigh in substrings
16349
16350    * stage1hr.c: Including chrhigh in segments and substrings.  Implemented
16351      usage of splice distance in short exon alignments.
16352
16353    * gsnap_iit.c, gsnap_splices.c, gsnap_tally.c, gsnapread.c, gsnapread.h:
16354      Gsnapread_parse_line returns information about types of endpoints
16355
163562010-08-24  twu
16357
16358    * substring.c, substring.h: Including chrhigh as a field in Substring_T
16359
16360    * sam.c, stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h:
16361      Implemented shortexon alignment
16362
16363    * gsnap.c: Changed message for reading splicesites IIT file
16364
16365    * genome_hr.c: Fixed bug in using flags for shortend splicing
16366
163672010-08-23  twu
16368
16369    * gsnap.c: Enabled reading of a local splicesite file
16370
163712010-08-19  twu
16372
16373    * stage1hr.c: Added check for query_compress to be non-NULL before
16374      find_terminals for single-end alignment
16375
16376    * stage1hr.c: Fixed bug where query_compress needed to be computed before
16377      finding terminals
16378
163792010-08-18  twu
16380
16381    * iit_store.c: Using string_compare and string_hash functions from table.c
16382
16383    * iit-write.c: Moved position of free() statement
16384
16385    * iit-read.c: Fixed debugging output
16386
16387    * genome_hr.c, genome_hr.h, gsnap.c, stage1hr.c, stage1hr.h: Using
16388      splicefrags to increase speed of finding short-end splicing
16389
16390    * stage3hr.c: Setting nindels to be zero for a terminal alignment
16391
163922010-08-13  twu
16393
16394    * gsnap.c, indexdb.c, indexdb.h, indexdb_hr.c, indexdb_hr.h, spanningelt.c,
16395      spanningelt.h, stage1hr.c, stage1hr.h: Allowing GSNAP to run when
16396      positions are read as FILEIO.
16397
16398    * stage1hr.c, stage3hr.c, stage3hr.h, substring.c, substring.h: Using
16399      splicesites array rather than splicesites_iit in short-end splicing
16400
16401    * stage1hr.h: Increased MAX_QUERYLENGTH from 200 to 500
16402
164032010-08-09  twu
16404
16405    * indexdb_dump.c: Added file left out in SVN conversion
16406
16407    * indexdb_dump.c: Undo addition to wrong directory
16408
16409    * stage1hr.c: Removed requirement for nconcordant == 0 in deciding to
16410      compute local splices and short-end splices.
16411
164122010-08-06  twu
16413
16414    * reads.c, reads_store.c: Eliminating labelorder in file format.  Cleaned up
16415      memory leaks.
16416
164172010-08-05  twu
16418
16419    * reads.c: Improved speed of dumping procedure
16420
16421    * reads.c, reads_get.c, reads_store.c: Allowing either 4-byte or 8-byte
16422      label and read pointers
16423
16424    * Makefile.dna.am, reads.c, reads.h, reads_get.c, reads_store.c: Enabled
16425      compression of reads
16426
16427    * Makefile.dna.am, reads.c, reads.h, reads_store.c: Using a div structure in
16428      file format
16429
16430    * Makefile.dna.am, reads.c, reads.h, reads_dump.c, reads_get.c,
16431      reads_store.c: Using our own file format for storing reads, rather than
16432      BerkeleyDB
16433
16434    * iit-read.c: Fixed bug in fileio reading of annotations
16435
16436    * access.c, access.h, add_rpk.c, assert.c, assert.h, backtranslation.c,
16437      backtranslation.h, bigendian.c, bigendian.h, blackboard.c, blackboard.h,
16438      block.c, block.h, bool.h, boyer-moore.c, boyer-moore.h, cappaths.c,
16439      changepoint.c, changepoint.h, chimera.c, chimera.h, chop_primers.c,
16440      chrnum.c, chrnum.h, chrom.c, chrom.h, chrsegment.c, chrsegment.h,
16441      chrsubset.c, chrsubset.h, cmet.c, cmet.h, cmetindex.c, color.c, color.h,
16442      comp.h, complement.h, compress.c, compress.h, cum.c, datadir.c, datadir.h,
16443      datum.c, datum.h, diag.c, diag.h, diagdef.h, diagnostic.c, diagnostic.h,
16444      diagpool.c, diagpool.h, dibase.c, dibase.h, dibaseindex.c, doublelist.c,
16445      doublelist.h, dynprog.c, dynprog.h, except.c, except.h, exonscan.c,
16446      fopen.h, gbuffer.c, gbuffer.h, gdiag.c, geneadjust.c, genecompare.c,
16447      geneeval.c, genome-write.c, genome-write.h, genome.c, genome.h,
16448      genome_hr.c, genome_hr.h, genomepage.c, genomepage.h, genomeplot.c,
16449      genomicpos.c, genomicpos.h, genuncompress.c, get-genome.c, getopt.c,
16450      getopt.h, getopt1.c, gmap.c, gmapindex.c, gregion.c, gregion.h, gsnap.c,
16451      gsnap_iit.c, gsnap_splices.c, gsnap_tally.c, gsnapread.c, gsnapread.h,
16452      hint.c, hint.h, iit-read.h, iit-write.c, iit-write.h, iit_dump.c,
16453      iit_fetch.c, iit_get.c, iit_plot.c, iit_store.c, iit_update.c, iitdef.h,
16454      indexdb.c, indexdb.h, indexdb_hr.c, indexdb_hr.h, indexdbdef.h,
16455      interval.c, interval.h, intlist.c, intlist.h, intlistdef.h, intpool.c,
16456      intpool.h, intron.c, intron.h, lgamma.c, lgamma.h, list.c, list.h,
16457      listdef.h, littleendian.c, littleendian.h, match.c, match.h, matchdef.h,
16458      matchpool.c, matchpool.h, maxent.c, maxent.h, md5-compute.c, md5.c, md5.h,
16459      mem.c, mem.h, memchk.c, nmath.c, nmath.h, nr-x.c, nr-x.h, oligo-count.c,
16460      oligo.c, oligo.h, oligoindex.c, oligoindex.h, oligop.c, oligop.h,
16461      orderstat.c, orderstat.h, pair.c, pair.h, pairdef.h, pairingcum.c,
16462      pairingflats.c, pairinggene.c, pairingstrand.c, pairingtrain.c,
16463      pairpool.c, pairpool.h, parserange.c, parserange.h, pbinom.c, pbinom.h,
16464      pdldata.c, pdldata.h, pdlimage.c, plotdata.c, plotdata.h, plotgenes.c,
16465      plotgenes.h, pmapindex.c, random.c, random.h, rbtree.c, rbtree.h,
16466      rbtree.t.c, reader.c, reader.h, reqpost.c, reqpost.h, request.c,
16467      request.h, result.c, result.h, resulthr.c, resulthr.h, revcomp.c, sam.c,
16468      sam.h, sam_tally.c, samread.c, samread.h, scores.h, segmentpos.c,
16469      segmentpos.h, segue.c, separator.h, seqlength.c, sequence.c, sequence.h,
16470      smooth.c, smooth.h, snpindex.c, spanningelt.c, spanningelt.h,
16471      spliceeval.c, splicegene.c, splicegraph.c, splicescan.c, splicing-score.c,
16472      stage1.c, stage1.h, stage1hr.c, stage1hr.h, stage2.c, stage2.h, stage3.c,
16473      stage3.h, stage3hr.c, stage3hr.h, stopwatch.c, stopwatch.h, subseq.c,
16474      substring.c, substring.h, table.c, table.h, tableint.c, tableint.h,
16475      tallyadd.c, tallyflats.c, tallygene.c, tallyhmm.c, tallystrand.c,
16476      translation.c, translation.h, trial.c, trial.h, types.h, uintlist.c,
16477      uintlist.h, uinttable.c, uinttable.h: Added keyword property for Id
16478
16479    * tally_expr.c: Fixed bug in reporting number of exons and in skipping exons
16480
164812010-08-04  twu
16482
16483    * stage3hr.c: Improved debugging statements
16484
16485    * stage3.c: Fixed bug when ngap was larger than gaps in dual_break
16486
16487    * samread.c: Added old but unused code
16488
16489    * iit-read.c, iit-read.h: Added function IIT_get_typed_signed
16490
16491    * gsnap_splices.c: Added parameter for shortsplicedist
16492
16493    * gsnapread.c, gsnapread.h: Added function Gsnapread_accession
16494
16495    * gsnap_tally.c: Fixed bug in using advance_one_hit
16496
16497    * Makefile.dna.am, gsnap_iit.c: Created gsnap_iit
16498
16499    * Makefile.dna.am, reads_get.c, reads_store.c: Created programs reads_store
16500      and reads_get.
16501
16502    * splicefill.c: Using median filtering as a first step
16503
16504    * splicefill.c: Removed probabilistic calculations
16505
16506    * splicefill.c: Version with probabilistic calculations
16507
165082010-08-03  twu
16509
16510    * splicefill.c: Using tally information to find edges.  Using Poisson and
16511      exponential models.
16512
16513    * iit-read.c, iit-read.h, snpindex.c: Providing messages about chromosomes
16514      in the genome and in the SNPs IIT file
16515
165162010-08-02  twu
16517
16518    * Makefile.dna.am, splicefill.c: Initial creation of splicefill program
16519
165202010-08-01  twu
16521
16522    * gsnap_tally.c, gsnapread.c, gsnapread.h: Able to separate low and high
16523      ends of paired-end reads
16524
165252010-07-31  twu
16526
16527    * Makefile.dna.am, gsnap_tally.c: Using parsing functions in gsnapread.c
16528
16529    * Makefile.dna.am, gsnap_splices.c, gsnapread.c, gsnapread.h: Moved parsing
16530      functions to gsnapread.c
16531
165322010-07-30  twu
16533
16534    * spliceclean.c: Preserving information in rest of header
16535
16536    * gsnap_splices.c: Printing maxminsupport and nconcordant information
16537
16538    * Makefile.dna.am, spliceclean.c: Enabled spliceclean to handle all
16539      chromosomes.  Using tables to store splices.
16540
16541    * spliceclean.c: Fixed bugs in parsing input
16542
165432010-07-29  twu
16544
16545    * spliceclean.c: Added procedure to free memory
16546
16547    * gsnap_splices.c: Fixed bug from freeing table keys too early
16548
16549    * gsnap_splices.c: Enabled program to handle all chromosomes in a single run
16550
16551    * Makefile.dna.am, gsnap_splices.c, iit-read.c, uinttable.c, uinttable.h:
16552      Using a table to store splice sites in gsnap_splices.c
16553
165542010-07-28  twu
16555
16556    * gsnap_splices.c: Removed -F and -R flags for separate strands
16557
16558    * Makefile.dna.am, gsnap_splices.c, sam_splices.c: Integrated sam_splices.c
16559      and gsnap_splices.c into a single file
16560
16561    * mem.c, genome.c: Removed unused variable
16562
16563    * iit-read.c, substring.c: Removed unused code
16564
16565    * blackboard.c: Returning bool type explicitly
16566
16567    * sequence.c: Resolving compiler warning about type casting
16568
16569    * gsnap_splices.c, sam_splices.c, spliceclean.c: Allowing -s flag to print
16570      annotations about known splicesites
16571
16572    * struct-stat64.m4: Added missing m4 file
16573
16574    * Makefile.am, cvs2cl.pl, svncl.pl: Replaced cvs2cl.pl with svncl.pl
16575
16576    * CVSROOT: Removed CVSROOT directory
16577
165782010-07-27  twu
16579
16580    * assert.h: Changed compiler variable
16581
16582    * VERSION, config.site.rescomp.prd, index.html: Revised for 2010-07-27
16583      release
16584
16585    * bootstrap.dna: Using autoreconf
16586
16587    * README: Modified statement about -m flag and about types in SNP IIT files
16588
16589    * MAINTAINER: Added statement about assert.h
16590
16591    * tally_expr.c: Standardized output format
16592
16593    * gsnap.c: Made -q flag work correctly for single-thread mode.  Printing run
16594      time at end of each run.
16595
16596    * gmap.c: Calling correct exception for a sigtrap
16597
165982010-07-26  twu
16599
16600    * Makefile.dna.am, Makefile.gsnaptoo.am: Using datadir in snpindex
16601
16602    * iit-read.c, iit-read.h: Fixed IIT_index function
16603
16604    * snpindex.c: Using datadir.  Fixed error messages.
16605
16606    * stage3hr.h, substring.c, substring.h: Removed fields for halfintrons.
16607
16608    * stage3hr.c: Fixed bug in removing duplicates.  Removed fields for
16609      halfintrons.
16610
16611    * stage1hr.c, stage1hr.h: Implemented short-end splicing for known splice
16612      sites
16613
16614    * mem.c: Changed monitoring statement to print only in debug mode
16615
16616    * iit-read.c, iit-read.h: Added procedure for typed and signed intervals
16617      based on divno
16618
16619    * gsnap.c: New interface to stage 1 procedures
16620
166212010-07-23  twu
16622
16623    * VERSION, index.html: Revised for 2010-07-23 release
16624
16625    * spliceclean.c: Processing forward and reverse splices separately
16626
16627    * gsnap.c: Fixed bug where -a flag modified trim_maxlength
16628
16629    * assert.h: Turned off assertion checking
16630
16631    * Makefile.dna.am: Added tally_exclude
16632
16633    * substring.c: Modified debugging statements for trimming
16634
16635    * stage3hr.c: Added debugging statements
16636
16637    * iit-read.c, iit-read.h: Added function IIT_interval_sign
16638
166392010-07-22  twu
16640
16641    * tally_expr.c: Allowing printing over all positions
16642
16643    * tally_expr.c: Allowing multiple tallies
16644
166452010-07-21  twu
16646
16647    * gsnap.c, sam.c, sam.h, sequence.c, sequence.h: Fixed handling of quality
16648      scores to match that of sequence.  Added -j flag to specify amount of
16649      shift for quality scores.
16650
16651    * setup1.test.in: Putting test chromosome in subdirectory
16652
16653    * setup2.test.in: Revised test for new gmapindex, but test not being used
16654      currently
16655
16656    * iit.test.in: Not testing for diff in iittest.iit
16657
16658    * align.test.ok, coords1.test.ok, map.test.ok: Changed expectations to match
16659      latest program output
16660
16661    * iittest.iit.ok: Using latest IIT version
16662
16663    * Makefile.am: Using ref3offsets and ref3positions instead of idxoffsets and
16664      idxpositions
16665
16666    * acx_mmap_fixed.m4, acx_mmap_variable.m4: Added stdlib.h and unistd.h
16667      headers
16668
16669    * bootstrap.dna, bootstrap.gsnaptoo, bootstrap.three,
16670      config.site.rescomp.prd, config.site.rescomp.tst, gsnap-fetch-reads.pl,
16671      gsnap-fetch-reads.pl.in, gsnap-remap.pl, gsnap-remap.pl.in, cum.c,
16672      dibaseindex.c, geneadjust.c, pairingtrain.c, splicegraph.c, tallyadd.c,
16673      tallygene.c, tallystrand.c: Initial import into CVS
16674
16675    * config.site.gne: Removed old config file
16676
16677    * acinclude.m4: Including builtin m4 code
16678
16679    * MAINTAINER: Added notes about checking Bigendian behavior
16680
16681    * archive.html, index.html: Revised for 2010-07-20 release
16682
16683    * configure.ac: Better checking for VERSION
16684
16685    * bootstrap.pmaptoo: Added --force flag
16686
16687    * bootstrap.gmaponly: Added autoreconf step
16688
16689    * README, VERSION: Changed for 2010-07-20 release
16690
16691    * gmap_process.pl.in: Removed check for contig version
16692
16693    * gmap_update.pl.in: Not updating chromosome or contig IIT files
16694
16695    * gmap_setup.pl.in: Providing -q and -Q flags for GMAP and PMAP indexing
16696      intervals.
16697
16698    * gsnap_splicing.pl: Program is superseded by C program gsnap_splices
16699
16700    * gsnap_splicing.pl: Various changes.  Program is superseded by C program
16701      gsnap_splices.
16702
16703    * gmap_compress.pl.in, gmap_reassemble.pl.in, gmap_uncompress.pl.in,
16704      md_coords.pl.in: Using "use warnings" instead of "-w" flag
16705
16706    * fa_coords.pl.in: Handling duplicate occurrences of a chromosome.  Limiting
16707      number of warnings.
16708
16709    * Makefile.am: Added gmap_update
16710
16711    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Implemented -f 4 GFF estmatch
16712      format based on patch from Shaun Jackman and Eoghan Harrington of British
16713      Columbia Genome Sciences Centre.
16714
16715    * chimera.c: Commented out problematic code, to be resolved later
16716
16717    * get-genome.c: Fixed coordinates when retrieving map file contents
16718
16719    * pairingstrand.c, tallyhmm.c, geneeval.c: Using new Parserange_universal
16720      function
16721
16722    * tally.c, tally.h: Treating counts as long ints
16723
16724    * splicegene.c: Changed algorithm
16725
16726    * spliceeval.c: Removed unused code
16727
16728    * plotgenes.c, plotgenes.h: Several changes, including trying to resolve
16729      fatal errors
16730
16731    * pdldata.c, pdldata.h: Implemented Pdldata_new and Pdldata_write
16732
16733    * pairinggene.c: Counting found splices as flats
16734
16735    * pairingflats.c: Changed algorithm for finding flat regions
16736
16737    * pairingcum.c: Treating high and low reads separately
16738
16739    * oligo-count.c: Using new interface to Reader_new.
16740
16741    * lgamma.c, lgamma.h: Handling counts as long ints
16742
16743    * hint.c, hint.h: Changed models
16744
16745    * genecompare.c: Separate output for forward and reverse chromosome strands.
16746
16747    * gdiag.c: Removed some output.  Using new interfaces to IIT_read.
16748
16749    * dibase.c, dibase.h, exonscan.c: Change in algorithm
16750
16751    * chimera.c: Using Path_matchscores instead of Stage3_matchscores
16752
16753    * cappaths.c: Using xintercepts instead of slopes
16754
16755    * boyer-moore.c, boyer-moore.h: Added procedures for chop_primers.c
16756
16757    * add_rpk.c: Change of output format
16758
167592010-07-20  twu
16760
16761    * stage1hr.c: Tightened requirements further for splice site probabilities
16762      on distant splicing.
16763
16764    * stage3hr.c: Using nmatches to filter pairs containing terminal alignments
16765
16766    * gsnap.c: Changed advice on RNA-Seq settings for -m.
16767
16768    * archive.html, index.html: Released version 2010-03-10
16769
16770    * substring.c, substring.h: Computing nmatches directly
16771
16772    * stage3hr.h: Removed score parameter from Stage3_new_terminal
16773
16774    * stage3hr.c: Selecting best among terminal alignments.  Computing nmatches
16775      directly.
16776
16777    * stage1hr.c: Changed algorithm for finding terminal alignments.  Requiring
16778      distant splicing to have high splice probabilities.
16779
16780    * sam_splices.c: Computing readlengths on each end of splice separately
16781
16782    * gsnap.c, gsnap_splices.c: Added debugging code
16783
167842010-07-19  twu
16785
16786    * stage1hr.c: Using sequences as numeric in some cases
16787
16788    * maxent.c, maxent.h: Added procedures to handle sequences as numeric
16789
16790    * gsnap.c: Added a comment to the --help message
16791
16792    * genome_hr.c, genome_hr.h: Added a procedure to retrieve a dinucleotide
16793
16794    * genome.c, genome.h: Added a procedure to retrieve sequences as numeric
16795
16796    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
16797      Makefile.pmaptoo.am: Revised files and programs as needed
16798
167992010-07-16  twu
16800
16801    * stage3hr.c: Requiring that dual translocations be concordant only across
16802      the same two chromosomes.
16803
16804    * smooth.c: Conserving listcells where possible
16805
16806    * oligoindex.c, oligoindex.h: Removed computation of fingerprint
16807
16808    * list.c, list.h: Implemented List_transfer_one and List_push_existing
16809
16810    * gsnap.c: Performing trimming by default
16811
16812    * dynprog.c: Ensuring that finalscore is assigned in Dynprog_genome_gap.
16813
16814    * stage3hr.c, substring.c, substring.h: Providing a minlength parameter to
16815      Substring_new, so end indels do not get eliminated.
16816
16817    * chop_primers.c: Initial import into CVS
16818
16819    * sam_tally.c: Trimming uses -3 for mismatches and resets negative scores to
16820      zero. Handling hard clipping.
16821
16822    * gsnap_tally.c: Trimming uses -3 for mismatches and resets negative scores
16823      to zero
16824
16825    * gsnap_splices.c: Sorting splices using both ends.
16826
16827    * sam_splices.c: Handling AT-AC introns.  Sorting splices using both ends.
16828
16829    * samread.c, samread.h: Returning acc in parsing line
16830
16831    * sam.h: Renamed NOT_PRIMARY bit
16832
16833    * sam.c: Implemented hard clipping of sequences for SAM output.  Enabled
16834      printing of distant splices onto two separate lines.  Using NOT_PRIMARY
16835      bit in flag.
16836
16837    * sequence.c, sequence.h: Implemented hard clipping of sequences for SAM
16838      output
16839
16840    * stage3.c: Removed unused procedures.  Conserving listcells when possible.
16841
16842    * stage2.c: Removed unused procedures
16843
16844    * gmap.c: Removed references to Intpool_T
16845
16846    * pairpool.c: Setting initial value for state
16847
16848    * matchpool.c, matchpool.h: Implemented Matchpool_free_memory procedure
16849
16850    * mem.c, mem.h: Added procedures for computing memory usage
16851
16852    * stage3hr.c, substring.c, substring.h: Now trimming from both ends of
16853      terminal alignment.  Explicitly specifying which ends to trim.
16854
16855    * substring.h: Replaced Substring_T with T
16856
16857    * substring.c: Allowing Substring_new to return NULL if trimmed alignments
16858      are poor. Replaced Substring_T with T.  Resetting score to zero when it
16859      becomes negative in trimming.
16860
16861    * stage1hr.c, stage3hr.c: Allowing Substring_new and Stage3_new to return
16862      NULL if trimmed alignments are poor.
16863
16864    * substring.c: Changed mismatch score from -1 to -3 for trimming
16865
16866    * stage1hr.c, stage3hr.h: Added notion of ambiguous splices.
16867
16868    * stage3hr.c: Added notion of ambiguous splices.  Fixing removal of
16869      duplicates.
16870
168712010-07-14  twu
16872
16873    * stage3hr.c, stage3hr.h: Implemented Stage3_substring_low
16874
16875    * samread.c: Added debugging comments
16876
16877    * sam.c, sam.h: Moved flag constants to sam.h.  Using Stage3_substring_low
16878      to print chromosomal pos.
16879
16880    * sam_splices.c: Simplified loop
16881
16882    * gsnap_splices.c: Having lines_gc return NULL
16883
16884    * gsnap_tally.c: Fixed trimming.  Turning off trimming by default.
16885
16886    * sam_tally.c: Initial import into CVS
16887
16888    * sam_splices.c, samread.c, samread.h: Fixed bug in specifying allowed
16889      dinucleotides.  Moved parsing procedures to samread.c.
16890
168912010-07-13  twu
16892
16893    * sam_splices.c: Initial import into CVS
16894
168952010-07-10  twu
16896
16897    * spliceclean.c: Changed variable names
16898
16899    * stage2.c, stage2.h, stage3.h: Removed stage2 fingerprint
16900
16901    * gmap.c: Added freeing of pairpool and diagpool memory at certain intervals.
16902
16903    * pair.c, pair.h, stage3.c: Moved HMM code from pair.c to stage3.c
16904
16905    * pairpool.c, pairpool.h: Implemented Pairpool_free_memory function
16906
16907    * diagpool.c, diagpool.h: Implemented Diagpool_free_memory function
16908
16909    * gsnap.c: Added ability to remove adapters from paired-end reads.
16910      Providing option for maxlength on trimming.
16911
16912    * gmap.c: Using Stage2_scan method to rank gregions.  Providing additional
16913      diagnostic options.
16914
16915    * diag.c, diag.h, diagpool.h: Added ability to allocate memory for
16916      diagonals, rather than using diagpool
16917
16918    * tally_expr.c: Fixed bug in using IIT index
16919
16920    * substring.h: Added handling of terminal reads
16921
16922    * substring.c: Using trimming maxlength.  Fixed printing of sequences with
16923      adapters.
16924
16925    * stage3hr.c: Fixed identification of duplicates.  Using total matches to
16926      compare results, rather than score.
16927
16928    * stage3.c, stage3.h: Using an HMM to find bad sections and fixing resulting
16929      dual breaks.
16930
16931    * stage2.c, stage2.h: Added Stage2_scan procedure.  Providing diagonals for
16932      diagnostic purposes.  Computing a fingerprint.
16933
16934    * stage1.c: Using a boolean to see if weight exists rather than depending on
16935      floating point value
16936
16937    * sequence.h: Added handling of finding adapters.  Computing sequence
16938      quality for trimming.
16939
16940    * sequence.c: Fixed bug where fastq quality line begins with ">".  Added
16941      removal of adapters from paired-end data.
16942
16943    * sam.h: Removed genome from argument lists
16944
16945    * sam.c: Fixed bugs in coordinates, especially involving trimmed reads.
16946      Handling terminal reads.
16947
16948    * result.c, result.h: Added ability to report intermediate gregions or
16949      diagonals
16950
16951    * oligoindex.h: Added computation of fingerprint
16952
16953    * oligoindex.c: Added necessary clearing of oligoindex.
16954
169552010-07-09  twu
16956
16957    * pairdef.h, pairpool.c: Added Pair_goodness_hmm procedure.
16958
16959    * pair.c, pair.h: Added Pair_goodness_hmm procedure.  Added printing of
16960      stage2 fingerprint.
16961
16962    * orderstat.c: Removed reliance on a floating point equality
16963
16964    * mem.c, mem.h: Added leak check procedures
16965
16966    * match.c, match.h, matchdef.h: Using a boolean to record whether weight is
16967      zero or not, rather than relying on floating point
16968
16969    * indexdb_hr.c: Added comment
16970
16971    * indexdb.c: Fixed printf procedure
16972
16973    * iit-read.h: Removed unused IIT_print prototype
16974
16975    * iit-read.c: Fixed print_record procedure
16976
16977    * gsnap_tally.c: Fixed trimming procedure.  Added reference nucleotide in
16978      all lines. Fixed processing of all chromosomes.
16979
16980    * gsnap_splices.c: Fixed parsing.  Made uniquep false by default.  Added
16981      info about nextensions and nunique.
16982
16983    * gregion.c, gregion.h: Added fields ncovered and source, plus function
16984      Gregion_cmp
16985
16986    * get-genome.c: Removed unused function print_map
16987
16988    * genome.c, genome.h: Added function Genome_get_char
16989
16990    * dynprog.c: Added space for formatting
16991
16992    * stage1hr.h: Setting a maxlength on trimming
16993
16994    * stage1hr.c: Finding terminals rather than halfintrons.  Fixed case where
16995      splice ends are adjacent in genome.
16996
169972010-07-02  twu
16998
16999    * stage3hr.h: Added support for a terminal alignment.
17000
17001    * stage3hr.c: Added support for a terminal alignment.  Removed
17002      halfintron_support field.
17003
170042010-05-28  twu
17005
17006    * iit_store.c: Fixed issues with removing and re-inserting null divstring.
17007
170082010-05-26  twu
17009
17010    * stage1hr.c: Added trim_maxlength.  Added nmismatches to halfintron
17011      alignments.
17012
17013    * stage3hr.h: Added trim_maxlength.
17014
17015    * stage3hr.c: Added trim_maxlength.  Checking pairlength on samechr_single
17016      to see if concordant.
17017
170182010-05-21  twu
17019
17020    * iit-read.c, iit-read.h, stage3.c: Fixed printing of chromosome in map
17021      results
17022
170232010-05-20  twu
17024
17025    * stage3hr.c: Finding concordant pairs against translocations with chrnum ==
17026      0, by making copies for each chrnum and storing in effective_chrnum.
17027
170282010-05-17  twu
17029
17030    * substring.c, substring.h: Added halfintron support field.
17031
17032    * stage3hr.c, stage3hr.h: Implemented sense consistency in paired-end
17033      alignments
17034
17035    * stage1hr.c: Fixed bugs in previous implementation of half introns
17036
170372010-05-16  twu
17038
17039    * stage1hr.c, stage3hr.c: Implemented new way of handling half introns, by
17040      storing best half intron for sense and for antisense
17041
170422010-05-14  twu
17043
17044    * resulthr.c, resulthr.h, stage1hr.c, stage3hr.c, stage3hr.h: Added
17045      procedure for finding samechr pairs if no concordant ones found. Revised
17046      result types to include PAIREDEND_SAMECHR_SINGLE and
17047      PAIREDEND_SAMECHR_MULTIPLE.
17048
170492010-05-13  twu
17050
17051    * stage1hr.c: Added conditional compilation statements for filtering
17052      halfintrons
17053
17054    * gsnap.c, stage3hr.c, stage3hr.h: Handling failsonly and nofails flags for
17055      paired-end data.  Printing FASTQ format for failsonly on single-end data.
17056
170572010-04-16  twu
17058
17059    * iit-write.c: Fixed bug in freeing data when number of intervals is zero
17060
170612010-04-12  twu
17062
17063    * iit-read.c: Commented out IIT_index function
17064
17065    * sam.c: Fixed situation where query has no mapping and mate is an
17066      interchromosomal splice
17067
170682010-04-05  twu
17069
17070    * tally_expr.c: Initial import into CVS
17071
170722010-04-02  twu
17073
17074    * iit_get.c: Added allele information to -T option
17075
170762010-03-24  twu
17077
17078    * gsnap.c, sequence.c, sequence.h: Implemented processing of FASTQ files
17079
17080    * gmap.c: Using new interface to blackboard.c
17081
17082    * blackboard.c, blackboard.h: Added input2 to Blackboard_T object
17083
17084    * stage3hr.c, stage3hr.h: Fixed classification of paired-end reads when one
17085      or both ends have a translocation.
17086
170872010-03-09  twu
17088
17089    * stage3hr.c: Revised half_intron_score.  Using that score when comparing
17090      overlapping half_introns with one another.
17091
17092    * gsnap.c, stage1hr.c, stage1hr.h: Added parameter for
17093      min_distantsplicing_identity
17094
17095    * stage1hr.c: Providing querylength information when making Stage3_T splice
17096      objects
17097
17098    * stage3hr.c, stage3hr.h: Adding a penalty to half-intron alignments based
17099      on the amount of sequence that was not aligned.
17100
17101    * stage3hr.c: Changed output for samechr results
17102
17103    * substring.c: Printing sub:0 instead of exact
17104
17105    * stage1hr.c: Checking for exact matches that cross chromosomal boundaries
17106
171072010-03-08  twu
17108
17109    * resulthr.c: Making all paired reads of type concordant
17110
17111    * stage3hr.c: Added printing of samechr as a special case of
17112      PAIREDEND_AS_SINGLES_UNIQUE.
17113
17114    * sam.h: Added mate information to nomapping result.
17115
17116    * sam.c: Removed unused code.  Fixed printing of query string.  Added mate
17117      information to nomapping result.
17118
17119    * gsnap_tally.c: Handling new output format for GSNAP
17120
17121    * gsnap.c: Using new interface for SAM_print_nomapping
17122
17123    * README: Added more information about GSNAP features and output
17124
171252010-03-04  twu
17126
17127    * iit-read.c, iit-read.h: Added function IIT_dump_sam
17128
17129    * gsnap.c: Renamed resulttypes
17130
17131    * resulthr.c, resulthr.h: Added resulttype PAIREDEND_AS_SINGLES_UNIQUE
17132
17133    * substring.c, substring.h: Added function Substring_match_length
17134
17135    * stage3hr.h: Computing chrnum, chroffset, genomicstart, and genomicpos at
17136      Stage3_T level for splices.
17137
17138    * stage3hr.c: Pairing up at each successive score level.  Computing chrnum,
17139      chroffset, genomicstart, and genomicpos at Stage3_T level for splices.
17140
17141    * stage1hr.c: Fixed bug allowing deletion to extend past genomicpos 0.
17142      Fixed cases where known splicing occurs near end of sequence.  Removing
17143      duplicate hits before pairing up ends.
17144
17145    * sam.c: Made multiple changes to generate correct SAM output
17146
171472010-03-01  twu
17148
17149    * substring.c, substring.h: Removed unnecessary parameters during printing
17150
17151    * stage3hr.h: Removed unnecessary parameters.
17152
17153    * stage3hr.c: Added support information to splices, and using it to select
17154      best half introns.  Removing unnecessary parameters during printing.
17155      Checking for abort in pairing process, based on local counts.
17156
17157    * stage1hr.c: Added support information in making splices.  Not checking for
17158      sufficiency for half introns.  Using an abort_pairing_p flag, and when
17159      true, recomputing ends as singles.
17160
17161    * splicing-score.c: Using parserange module.  Allowing range to be specified.
17162
17163    * iit-read.h: Removed unused parameter
17164
17165    * iit-read.c: Changed format strings to eliminate compiler warnings
17166
17167    * genome.c: Added parentheses around some conditional statements
17168
171692010-02-26  twu
17170
17171    * stage3.c: Removed unused parameters from print functions
17172
17173    * sequence.c: Handling sequence at end of file without line feed
17174
17175    * reader.c: Commented out unused code
17176
17177    * gsnap.c: Added flags for SAM and quiet-if-excessive.  Dropped flags for
17178      probability thresholds.
17179
17180    * datadir.h: Added external interface for a function
17181
171822010-02-25  twu
17183
17184    * sam.c: Fixed bug where number of deletions was being reported as a
17185      negative number
17186
17187    * genome_hr.c, genome_hr.h, stage1hr.c: Removed computation of snpdiffs by
17188      genome_hr
17189
17190    * genome_hr.h, genome_hr.c: Added code for performing trimming.  Using
17191      macros for clearing and setting outside regions in start and end blocks.
17192
17193    * stage1hr.c: Added trimming of splice ends to avoid extending into region
17194      of many mismatches.  Saving all splice ends that have sufficient sequence
17195      and probability support.
17196
171972010-02-23  twu
17198
17199    * substring.c: Fixed printing of splices.  Fixed bugs in retrieving SNP
17200      information.
17201
17202    * stage3hr.h: Returning found score from all functions that create a
17203      Stage3_T object
17204
17205    * stage3hr.c: Fixed computation of pair length.  Fixed search for concordant
17206      pairs.
17207
17208    * stage1hr.h: Removed unused parameters
17209
17210    * stage1hr.c: Using found score rather than found number of mismatches.
17211      Fixed cases where indel pos was outside of query range.
17212
17213    * spanningelt.c: Fixed typecast error
17214
17215    * sam.c, sam.h: Implemented SAM output for paired-end reads
17216
172172010-02-12  twu
17218
17219    * resulthr.c, resulthr.h, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
17220      substring.h: Changed output format to have separate columns for alignment
17221      information and pair information.  Standardized output routines. Three
17222      categories for paired-end reads: concordant, samechr, and unpaired.
17223
172242010-02-11  twu
17225
17226    * sam.c, sam.h, stage1hr.c, stage3hr.c, stage3hr.h, substring.c,
17227      substring.h: Rearranged and cleaned up code for making substrings
17228
172292010-02-10  twu
17230
17231    * gmap_process.pl.in: Removed code that stripped version numbers from accessions
17232
172332010-02-03  twu
17234
17235    * indexdb.c: Fixed string formatting
17236
17237    * snpindex.c: Fixed some printing statements
17238
17239    * get-genome.c: Changed call to parserange to match new interface
17240
17241    * uintlist.c, uintlist.h: Added Uintlist_find command
17242
17243    * table.c, tableint.c: Added stdlib.h header file
17244
17245    * stage3.h: Added genome to print_alignment for splice sites scores in output
17246
17247    * stage3.c: Allowing null gaps again
17248
17249    * stage2.c: Added separate data types for a 1-dimensional matrix and
17250      2-dimensional matrix representation
17251
17252    * stage1hr.c: Prevented splicing unless both dinucleotides are present
17253
17254    * stage1.c: Removed extensions of gregions
17255
17256    * sequence.c: Commented out unused functions
17257
17258    * resulthr.c, resulthr.h: Renamed result type to PAIRED_AND_PAIRABLE
17259
17260    * parserange.c, parserange.h: Implemented parse_query function
17261
17262    * pair.c, pair.h: Added donor and acceptor scores to output
17263
17264    * orderstat.c, orderstat.h: Added functions for long int
17265
17266    * oligoindex.c, oligoindex.h: Added parameter oned_matrix_p
17267
17268    * nr-x.h: Added ppois functions
17269
17270    * nr-x.c: Added ppois functions.  Fixed bug in pbinom for zero observed
17271      counts.
17272
17273    * list.c, list.h: Rewrote function for List_insert
17274
17275    * intlist.c: Handling case of empty list better for conversion to string
17276
17277    * interval.c, interval.h: Added functions for sorting intervals by position
17278
17279    * indexdb.c: Added debugging statements
17280
17281    * iit_plot.c: Using new interface to Genome_new
17282
17283    * iit_get.c: Implemented statistics function.  Using long int for tally
17284      IITs. Using parserange module.
17285
17286    * iit-read.h: Added function for divlength
17287
17288    * iit-read.c: New implementation of sorting of intervals by position
17289
172902010-02-02  twu
17291
17292    * gmapindex.c: Increased expected table size for number of chromosomes.
17293      Stopping warning messages after 100 printed.
17294
17295    * gmap.c: Added genome parameter to Stage3_print_alignment
17296
17297    * get-genome.c: Using parserange module.  Implemented flanking segments.
17298
17299    * genome_hr.c: Removed unused variables for certain compile-time conditions
17300
17301    * genome-write.c: Stopping warning messages after 50 are printed
17302
17303    * gdiag.c: Formatting changes
17304
17305    * except.c: Using pointers to exception frame objects
17306
17307    * dynprog.c: Reduced PAIRED_OPEN penalty from -24 to -18
17308
17309    * diag.h: Added function Diag_range
17310
17311    * diag.c: Reduced EXTRA_BOUNDS parameter
17312
17313    * datadir.c: Fixed bug where insufficient buffer space was provided for one
17314      string
17315
17316    * backtranslation.h: Removed void in formal parameter lists
17317
17318    * backtranslation.c: Casting character array indices to ints
17319
17320    * splicegene.c: Attempted to find genebounds on all sites
17321
173222010-02-01  twu
17323
17324    * splicegene.c: Implemented finding and reporting of alternate splice forms
17325
17326    * splicegene.c: Differentiated donor and acceptor sites.  Handling reverse
17327      strand in reverse direction.  Noting conflicts when either endpoint is
17328      close to an endpoint on the other.
17329
173302010-01-31  twu
17331
17332    * splicegene.c: Completely new rewrite based on pairinggene.c.  Attempt to
17333      assign genebounds based on tally high and tally low.
17334
173352010-01-30  twu
17336
17337    * splicegene.c: Using tally high and low.  Added hooks for alternate splice
17338      site.
17339
17340    * spliceclean.c: Performing validation based on ratio of count to maxcount
17341      over region
17342
173432010-01-29  twu
17344
17345    * spliceclean.c: Using less memory.  Attempted validation of splices based
17346      on envelope.
17347
17348    * spliceclean.c: Initial import into CVS
17349
17350    * pairingcum.c: Implemented filtering based on significance at endpoints
17351
173522010-01-27  twu
17353
17354    * pairingcum.c: Added computation on floors as well as ceilings
17355
173562010-01-25  twu
17357
17358    * pairinggene.c: Testing flat regions against the splice IIT to determine if
17359      they are intron-like.  Also adding splice edges to the original list.
17360      Splices will therefore need to be filtered.
17361
17362    * pairinggene.c: Reverted to previous version using only observed GSNAP
17363      splices
17364
17365    * pairinggene.c: Improved algorithm for distinguishing between intergenic
17366      flats and intron flats.
17367
173682010-01-24  twu
17369
17370    * geneeval.c: Initial import into CVS
17371
17372    * pairinggene.c: Improved algorithm for detecting intergenic regions.  For
17373      flats, we can use a loose criterion without a level threshold, because of
17374      the ordering constraint.  We are using both edges from flats and from
17375      gsnap splices.  We added a procedure for distinguishing between intergenic
17376      regions and long exons based on the counts_tally.
17377
17378    * pairinggene.c: Using iblocks instead of nblocks to control exon segments,
17379      so essentially all combinations of introns are considered
17380
17381    * pairinggene.c: Reading in edges from the splices_iit file, presumably
17382      after filtering
17383
17384    * pairinggene.c: Attempt to get more edges by looking up splice edges when a
17385      flat does not yield clean ones
17386
17387    * cappaths.c: Added analysis of slopes and attempt to find a flat region
17388
17389    * pairinggene.c: Fixed bug with negative unsigned int
17390
17391    * pairinggene.c: For objective function, using the count of observed splices
17392      from GSNAP.
17393
17394    * pairinggene.c: Eliminated concept of an eblock (or exon block).  Trying
17395      all intron combinations, since intergenic blocks are sufficient to contain
17396      the search space.
17397
17398    * pairinggene.c: Fixed bug where an up was a terminal, which hid the
17399      downstream down. Added some debugging code.
17400
17401    * pairinggene.c: Added checking of edges based on genome splice sites
17402
174032010-01-23  twu
17404
17405    * pairinggene.c: Made intergenic regions go between flats, and increased the
17406      length requirement.  Using auto_exonlength for adding exons.
17407
17408    * pairinggene.c: Restricted intergenic blocks to be between adjacent down to
17409      up edges
17410
17411    * pairinggene.c: Reordered procedures to minimize memory usage
17412
17413    * pairinggene.c: Implemented a new algorithm for constructing the graph,
17414      using various blocks and building the graph in stages
17415
17416    * tally.c: Added functions for the median and for adding a runlength to an
17417      existing count
17418
17419    * pairinggene.c: Fixed error in formula for computing down edge
17420
17421    * splicing-score.c: Initial import into CVS
17422
17423    * pairingcum.c: Fixed a bug where the cum was being put at the wrong
17424      position, causing the down edge to be 1 position too small.
17425
17426    * pairinggene.c: Implemented trimming of ends
17427
17428    * pairinggene.c: Implemented a new test for intergenic regions based on
17429      finding a long flat region in the counts, which should not happen in an
17430      exon.
17431
174322010-01-22  twu
17433
17434    * pairinggene.c: Added a test for sharpness based on an area ratio
17435
17436    * pairinggene.c: Fixed dynamic programming procedure
17437
17438    * pairinggene.c: Keeping a min-max test on whether introns are acceptable,
17439      but using mean levels of introns and exons for scoring.
17440
17441    * pairinggene.c: Using zero-based check on pairingfull to test for
17442      intergenic regions. Added a greedy addition of introns.
17443
17444    * pairinggene.c: Attempt to use pairing full information and gradual
17445      downsloping to find UTRs.
17446
17447    * pairinggene.c: Using intron level minus exon level to determine edges with
17448      greater sensitivity.  Implemented scores as double, rather than int,
17449      although currently using mincount and maxcount.
17450
17451    * pairinggene.c: Changed from onepath dynamic programming to multiple paths
17452      with terminals.  Using explicit objects for exons and introns.
17453
17454    * pairinggene.c: Implemented finding of initial ups
17455
174562010-01-21  twu
17457
17458    * pairinggene.c: Initial import into CVS.  Dynamic programming based on
17459      splicegene.c
17460
174612010-01-20  twu
17462
17463    * spliceeval.c: Implemented computation of reception zone and init/term
17464      status of splice sites
17465
17466    * gsnap.c, stage1hr.c, stage3hr.c, stage3hr.h: Implemented trimming of ends
17467      of sequences
17468
17469    * stage1hr.c: Allowing GC-AG splicing as well as GT-AG.  Using a sliding
17470      scale of splice site probabilities based on alignment support.
17471
17472    * stage3hr.c, stage3hr.h: Added code for using a geneprob IIT file to assist
17473      in finding splice sites
17474
17475    * gsnap.c: Added a -g flag for using a geneprob IIT file to assist in
17476      finding splice sites
17477
174782010-01-19  twu
17479
17480    * tally.c, tally.h: Added function Tally_mean()
17481
17482    * tallyflats.c: Analyzing both fwd and rev tallies and storing in a single
17483      IIT file
17484
17485    * splicegene.c: Using variability in pairing.unk rather than tallyflats to
17486      determine intragenic regions
17487
17488    * spliceeval.c: Computing slopes internally, rather than relying upon
17489      pairingflats
17490
17491    * pairingflats.c: Added median smoothing
17492
17493    * lgamma.c, lgamma.h: Added ppois function
17494
17495    * genecompare.c: Initial import into CVS
17496
17497    * geneeval.c: Changed name to genecompare.c
17498
17499    * geneeval.c: Added ability to handle comment lines in gene
17500
175012010-01-17  twu
17502
17503    * lgamma.c, lgamma.h, random.c, random.h, pairingstrand.c, tallyflats.c:
17504      Initial import into CVS
17505
17506    * tally.c, tally.h: Added functions
17507
17508    * geneeval.c: Printing goldstandard information in comment line
17509
17510    * cappaths.c: Using pairing fwd and rev iits
17511
17512    * spliceeval.c: Removed unused code.  Added procedures for merging
17513      pairingflats.
17514
17515    * splicegene.c: Added a check for validity of tally flats by looking at
17516      tally information
17517
17518    * tallyflats.c: Keeping track separately of zero regions and flat regions.
17519      Changed parameters.
17520
17521    * gsnap_splices.c: Removed unused code
17522
17523    * gsnap_tally.c: Added flags for picking specific strands and for forced
17524      trimming at ends
17525
17526    * pairingflats.c: Storing regions and then printing them.  Have three
17527      states, for zero, flat, and bumpy.
17528
17529    * pairingcum.c: Printing all run lengths, even those with level 0
17530
17531    * splicegene.c: Using tallyflats to determine boundaries for donor to
17532      acceptor
17533
17534    * splicegene.c: Removed donorprob and acceptorprob.  Recording and printing
17535      all extra information from each splice.  Removed unused paths code.
17536
175372010-01-16  twu
17538
17539    * pairingflats.c: Initial import into CVS
17540
17541    * spliceeval.c: Added an intron_transition buffer at the ends of each intron,
17542      where level changes are ignored.
17543
175442010-01-15  twu
17545
17546    * splicegene.c: Using a gap test on pairing IITs to determine whether to
17547      link donor to previous acceptor.
17548
17549    * spliceeval.c: Now computing statistics based on edge finding using Poisson
17550      model and number of consecutive zeroes.
17551
17552    * spliceeval.c: Printing mean pairing levels of each splice
17553
175542010-01-13  twu
17555
17556    * tallyhmm.c: Integrated parserange and tally modules.  Removed hints.
17557      Added edge detection.  Simplified state model.
17558
175592010-01-12  twu
17560
17561    * tally.c, tally.h: Implemented an exon test and a scanning solution for
17562      pairing information.
17563
17564    * splicegene.c: Using an exon test to determine if we can join splices
17565
17566    * littleendian.c, geneeval.c: Initial import into CVS
17567
17568    * bigendian.c: Created distinct function names for 64-bit procedures.  Added
17569      procedures for OUTPUT_BIGENDIAN.  Fixed compiler warning messages about
17570      truncating unsigned ints to chars.
17571
17572    * bigendian.h: Created distinct function names for 64-bit procedures
17573
175742010-01-11  twu
17575
17576    * cappaths.c: Initial import into CVS
17577
175782010-01-09  twu
17579
17580    * tally.c, tally.h: Initial import into CVS
17581
175822010-01-08  twu
17583
17584    * pairingcum.c: Initial import into CVS
17585
175862010-01-07  twu
17587
17588    * splicegene.c: Computing exonbounds for each donor
17589
175902010-01-05  twu
17591
17592    * tallyhmm.c: Using edges rather than edgepairs
17593
175942010-01-04  twu
17595
17596    * parserange.c, parserange.h, spliceeval.c: Initial import into CVS
17597
17598    * splicegene.c: Added iterative method to remove conflicting splices
17599
17600    * splicegene.c: Computing one path over forward and one path over reverse
17601      strands, instead of collecting terminals
17602
176032010-01-03  twu
17604
17605    * splicegene.c: Added reading and printing of probability values.  Added
17606      debugging statements for Paths_remove_dominated
17607
176082009-12-28  twu
17609
17610    * stage1hr.c, stage1hr.h: Added separate stage for half introns.  Added hook
17611      for geneprob_iit eval.
17612
176132009-12-22  twu
17614
17615    * splicegene.c: Initial import into CVS
17616
17617    * gsnap_splices.c: Added command for dumping graph
17618
176192009-12-21  twu
17620
17621    * gsnap.c, stage3hr.h: Added ability to print output in SAM format
17622
17623    * stage3hr.c: Added ability to print output in SAM format.  Fixed bug in
17624      identifying pairing.
17625
176262009-12-17  twu
17627
17628    * exonscan.c: Added function for writing edges
17629
176302009-12-10  twu
17631
17632    * stage1hr.c: Fixed bug in insertion at end of query sequence.  Removed
17633      requirement for HALF_INTRON_END_LENGTH.  Made separate done levels for 5'
17634      and 3' ends in paired alignment.
17635
176362009-12-04  twu
17637
17638    * gsnap_tally.c: Added ability to run on forward or reverse complement
17639      strand only
17640
17641    * gsnap_tally.c: Added ability to run on all chromosomes
17642
176432009-11-25  twu
17644
17645    * stage1hr.h: Added new masktypes
17646
17647    * stage1hr.c: Created a single procedure for omit_oligos.  Altered xfirst
17648      and xlast calculation.
17649
176502009-11-20  twu
17651
17652    * stage1hr.c: Made slight efficiency improvements in accessing floor->score
17653      array
17654
176552009-11-18  twu
17656
17657    * stage2.c: Combined features of versions 235 and 237 so that both GMAP
17658      and PMAP work.
17659
17660    * stage2.h: Updated interface
17661
17662    * stage2.c: Fixed bug where processed was updated too soon
17663
17664    * pairpool.c, pairpool.h: Added function Pairpool_transfer_n
17665
17666    * orderstat.c: Commented out debugging function
17667
17668    * gmap.c, oligoindex.c: Restored variables specific to gmap
17669
17670    * oligoindex.c, oligoindex.h: Added major and minor oligoindices
17671
17672    * gmap.c: Added Oligoindex_clear_inquery in all cases
17673
17674    * stage2.c: Restored stage 2 to working condition
17675
176762009-11-06  twu
17677
17678    * maxent.c, maxent.h: Added functions for reporting log odds scores
17679
17680    * littleendian.h: Added interface for WRITE_UINT
17681
17682    * list.c: Added check for NULL in List_truncate
17683
17684    * iit_fetch.c: Added flag for computing cumulative total of an iit.  Removed
17685      unused variables.
17686
17687    * iit_get.c: Added flags for computing mean and overall total of tally iit.
17688
176892009-11-04  twu
17690
17691    * add_rpk.c: Initial import into CVS
17692
176932009-10-30  twu
17694
17695    * exonscan.c, hint.c, hint.h, tallyhmm.c: Using edgepair and splice
17696      information in transitions, and tally and pairing information in
17697      emissions.  Providing separate training information for transitions and
17698      emissions.
17699
177002009-10-27  twu
17701
17702    * hint.c, hint.h: Initial import into CVS
17703
177042009-10-26  twu
17705
17706    * stage3.c: Removed unused variables
17707
177082009-10-14  twu
17709
17710    * pair.c: Fixed bug in PSL output
17711
177122009-10-08  twu
17713
17714    * exonscan.c, tallyhmm.c: Multiple changes.  Version used for rGASP
17715      submission 2.
17716
177172009-10-02  twu
17718
17719    * gsnap.c, stage1hr.c, stage1hr.h: Allowing user to specify max mismatches
17720      as a fraction of read length
17721
17722    * stage3hr.c, stage3hr.h: Made printing of score and insert length more
17723      consistent.  Made filtering of paired hits by score and duplicates
17724      consistent with filtering of single hits.
17725
17726    * resulthr.c, resulthr.h: Removed Pairedresult_T type
17727
17728    * stage3hr.c: Made printing of insert length consistent for paired-end reads
17729
17730    * gsnap.c, stage1hr.c, stage1hr.h: Added parameters for minimum end matches
17731      for local and distant splicing
17732
177332009-10-01  twu
17734
17735    * spanningelt.c: Made intersection procedures remove duplicates
17736
17737    * snpindex.c: Formatting change
17738
17739    * gsnap.c: Added parameters for second part of novel splicing and half
17740      intron minimum support
17741
17742    * genome_hr.c, genome_hr.h: Returning ncolordiffs
17743
17744    * gbuffer.h: Added procedures for allocing and freeing contents
17745
17746    * blackboard.c: Fixed problem with hanging when using -q batch feature
17747
17748    * stage1hr.c: Made min_end_matches work on middle indels
17749
17750    * stage3hr.h: Added printing of colordiffs and score.
17751
17752    * stage3hr.c: Added printing of colordiffs and score.  Fixed problem with
17753      printing splice on second, inverted read.
17754
17755    * stage1hr.h: Added half_intron_min_support parameter.
17756
17757    * stage1hr.c: Added half_intron_min_support parameter.  Fixed bug where
17758      deletion indels were mixed up with colordiffs.  Fixed bug where splice
17759      junctions were evaluated past beginning of genome.
17760
177612009-09-21  twu
17762
17763    * tallyhmm.c: Uses hints from splices iit and altexons iit files.  Added
17764      median filtering.
17765
17766    * exonscan.c: Added ability to get splice sites from splices iit and
17767      altexons iit file
17768
17769    * tallyhmm.c: Adding information from splices_iit and altexons_iit files
17770
17771    * tallyhmm.c: Implemented two-strand solution as default, with ability to
17772      force 1-strand solution.  Provided hooks for splices and altexons iit
17773      files.
17774
17775    * gsnap_tally.c: Added flag for handling 2-base encoded GSNAP output
17776
17777    * gsnap_splices.c: Eliminated printing of overlapping paths.
17778
177792009-09-20  twu
17780
17781    * gsnap_splices.c: Fixed various bugs
17782
17783    * gsnap_splices.c: Added ability to find alternate skipped or extra exons at
17784      each acceptor.
17785
17786    * gsnap_splices.c: Initial import into CVS
17787
177882009-09-19  twu
17789
17790    * tallyhmm.c: Implemented faster way of computing running percentiles
17791
17792    * tallyhmm.c: Implemented ability to read lambda parameters from a file.
17793      Attempted to add a SINGLE exon state and allow transitions from NON to
17794      SINGLE even when no edges were present.
17795
17796    * gsnap_tally.c: Made program determine its own trimming based on scoring
17797      matches and mismatches from the ends.
17798
17799    * iit_get.c: Added -M flag for reporting mean of a region in a tally IIT
17800      file.
17801
17802    * gsnap_splicing.pl: Initial import into CVS
17803
178042009-09-18  twu
17805
17806    * exonscan.c: Printing information about sharp edges
17807
17808    * exonscan.c: Fixed bug in recording history.  Added hook for allowing GC
17809      donor site.  Added splice model probability to name of splice site.
17810
17811    * tallyhmm.c: Added flag for printing lambdas
17812
17813    * tallyhmm.c: Added ability to handle multiple sites at the same position,
17814      by making a mixture of transition tables.  Wrote down transition table
17815      explicitly.
17816
178172009-09-17  twu
17818
17819    * tallyhmm.c: Added smoothing to estimation of lambdas.  Added routines for
17820      printing genes.
17821
17822    * tallyhmm.c: Consolidated separate fivefwd, fiverev, threefwd, and threerev
17823      sites back into up and down sites.
17824
17825    * tallyhmm.c: Working version of Viterbi algorithm, but still need output of
17826      segments.
17827
17828    * tallyhmm.c: Initial import into CVS
17829
17830    * segue.c: Attempt to use objective function based on sum of counts,
17831      relative to threshold.
17832
17833    * segue.c: Reduced states to be much simpler, where only one strand can be
17834      coding at a time.
17835
178362009-09-16  twu
17837
17838    * segue.c: Added functions for printing genes by exons.
17839
17840    * segue.c: Fixed lookback lengths between sites.  Added flag for specifying
17841      knownsites iit.
17842
17843    * segue.c: Optimizing using mean square error.  Simplified code for
17844      traversing graph.
17845
17846    * segue.c: Fixed bug with computing cumulative gammln.  Version works on
17847      test data set.
17848
17849    * segue.c: Complete rewrite to handle both strands simultaneously
17850
178512009-09-15  twu
17852
17853    * exonscan.c: Using LR test to take all acceptable gene ends.  Using
17854      separate end bounds for finding gene ends.
17855
17856    * exonscan.c: Using both edge algorithms, stepfunction and linear fit.  Made
17857      different objects for Edge_T and Diff_T.  Using goodness-of-fit instead of
17858      xintercept for finding gene ends.
17859
17860    * exonscan.c: No longer using bootstrap method, but relying on testing of
17861      sites using goodness of fit.  Evaluating missing edges for both ups and
17862      downs based only on greedy splice sites, and then performing both testing
17863      of sites and tracing.
17864
178652009-09-14  twu
17866
17867    * exonscan.c: Using x-intercepts instead of step function to detect edges
17868
17869    * exonscan.c: Added hooks for a history-recording mechanism.
17870
17871    * exonscan.c: Fixed some bugs with array indices.  Added flags for debugging.
17872
17873    * segue.c: Using splice model scores to evaluate introns
17874
178752009-09-13  twu
17876
17877    * exonscan.c: Made numerous tweaks to the scanning algorithm.  Incorporated
17878      finding of ends into scanning, using x-intercepts.  Always finding ends
17879      when an edge with a splice lacks a match.
17880
17881    * exonscan.c: Implemented two-phase method on stepfunction results, first
17882      picking steps with highest probabilities, and then bootstrapping
17883      neighboring steps.
17884
17885    * exonscan.c: Allowed best prob again for stepfunction results.  Fixed bug
17886      in code in scanning procedure.  Distinguishing between donor and acceptor
17887      types for matching edges.
17888
17889    * exonscan.c: Large number of changes.  Implemented scanning method,
17890      testing at positions with good splice site scores, for finding other ends.
17891      Using adjacency information to decide whether to scan.  Implemented
17892      testing procedures for ends.  Removed unused code.
17893
178942009-09-12  twu
17895
17896    * exonscan.c: Implemented a Gibbs sampling method to speed up identification
17897      of changepoint, but reverted back to testing goodness of fit exhaustively
17898      over a limited range.
17899
17900    * exonscan.c: Implemented a strategy of finding edges only for those that
17901      appear to be missing.  Using changepoint to find those edges, with
17902      maximizing goodness of fit.
17903
17904    * exonscan.c: Made reasonably good step function based on log scale.
17905
17906    * exonscan.c: Implemented rampfunction as anchored to a step result.
17907
179082009-09-11  twu
17909
17910    * exonscan.c: Implemented a ramp detector using linear fitting, but too
17911      sensitive
17912
17913    * exonscan.c: Using a cumulative tally to speed up computation of segment
17914      means
17915
17916    * exonscan.c: Added hooks for a redo changepoint step
17917
17918    * segue.c: Implemented traversal of minus strand.  Implemented reading of
17919      splicepairs.
17920
17921    * segue.c: Implemented scoring of exons using log likelihood.  Implemented
17922      dynamic programming and printing of paths.
17923
179242009-09-10  twu
17925
17926    * exonscan.c: Added splice sites based on analyzing local data, using
17927      methodology from splicescan.
17928
17929    * splicescan.c: Implemented posterior log odds calculations.
17930
17931    * exonscan.c: Added filtering of other ends.  Cleaned up unused code.
17932
17933    * exonscan.c: Made output format consistent with that of splicescan
17934
17935    * segue.c: Initial import into CVS
17936
17937    * exonscan.c: Added finding of nearest good splice sites.
17938
179392009-09-09  twu
17940
17941    * exonscan.c: Method based on finding exons.  However, will need to switch
17942      to a dynamic programming method.
17943
17944    * exonscan.c: Initial import into CVS
17945
179462009-09-08  twu
17947
17948    * splicescan.c: Added options for separate output files, training mode only,
17949      and random output.
17950
17951    * stage1hr.c: Fixed algorithm for end indels.  Provided hooks for 2-base
17952      encoding.
17953
179542009-09-07  twu
17955
17956    * splicescan.c: Fixed calculations
17957
179582009-09-06  twu
17959
17960    * splicescan.c: Added ability to use a known splice site IIT file
17961
17962    * splicescan.c: Initial import into CVS
17963
179642009-09-03  twu
17965
17966    * stage1hr.c: Fixed bug where singlehits5 and singlehits3 were not being
17967      initialized.  Set limits on local splicing hits and attempts.
17968
179692009-09-02  twu
17970
17971    * stage3.c, stage3.h: Allowing a re-do of stage 2 for bad exons in middle
17972
179732009-08-31  twu
17974
17975    * gsnap.c: Using single-end hits already computed when paired alignments not
17976      found.
17977
17978    * gmap.c: Added minor oligoindices
17979
17980    * changepoint.c: Added comment
17981
17982    * stage3hr.c, stage3hr.h: Introduced faster pair-up procedure.  Sorting
17983      paired-end solutions by score.
17984
17985    * stage1hr.h: For paired-end alignment, returning single-end hits.
17986
17987    * stage1hr.c: Fixes to paired-end alignment: (1) stopping when excessive
17988      splicing hits or paired hits found, (2) using new pair_up procedure, (3)
17989      fixed pairing code, (4) returning single-end hits.  For dibase alignment,
17990      skipping spanning set.
17991
17992    * iit_store.c: Using total label and annotation lengths to decide if format
17993      should use 8-byte quantities.
17994
17995    * iit_get.c: Added flag to explicitly indicate coordinate is a label.  Added
17996      flag to print all zeroes in tally mode.
17997
179982009-08-28  twu
17999
18000    * stage3hr.c, stage3hr.h: Taking a splicing penalty for all splices.  Added
18001      code for marking dibase mismatches.
18002
18003    * stage1hr.c, stage1hr.h: Made procedure work for 2-base encoded reads
18004
18005    * oligo.c: Added code to read 2-base encoded queries
18006
18007    * reader.c, reader.h: Added field to indicate if Reader_T is for dibase
18008      queries
18009
18010    * littleendian.h: Added code for handling 8-byte quantities
18011
18012    * iit-read.c, iit-write.c, iitdef.h: Added version 4 format, which uses
18013      8-byte quantities to store label pointers and annotation pointers.
18014
18015    * gsnap_tally.c: Added trimming on left and right
18016
18017    * gsnap.c: Added flag for 2-base mode.  Added local splice penalty.
18018
18019    * genome_hr.c, genome_hr.h: Provided hooks for dibase procedures
18020
18021    * genome.c, genome.h: Provided exposure to uncompress_mmap directly from
18022      blocks, needed by dibase procedures.
18023
18024    * dibase.c, dibase.h: Initial import into CVS
18025
18026    * compress.c: Added code for compressing 2-base color genomes, but not
18027      necessary.
18028
18029    * bigendian.c, bigendian.h: Added functions for 8-byte quantities
18030
18031    * access.c: Changed types in debugging statements for off_t
18032
180332009-08-21  twu
18034
18035    * oligo.c: If state is invalid, skipping forward until a valid state is found
18036
18037    * sequence.h: Added FILE * parameter for oneline outputs.
18038
18039    * sequence.c: Added FILE * parameter for oneline outputs.  Added hooks for
18040      skipping dashes, but appears to be buggy.
18041
18042    * gsnap.c: Added flag to turn off output (quiet) if too many are found.
18043
18044    * stage1hr.c: Classified half introns as long distance
18045
180462009-08-20  twu
18047
18048    * stage1hr.c: Moved half introns after distant splicing.  Set fast_level to
18049      be 1, if user hasn't already specified it.
18050
180512009-08-19  twu
18052
18053    * gmap_setup.pl.in: Using "use warnings" instead of -w flag
18054
180552009-08-18  twu
18056
18057    * stage3hr.h: Added quiet-if-excessive flag.
18058
18059    * stage3hr.c: Fixed problem where total_nmismatches was not being set for
18060      indels.  Made printing of excessive paths consistent with single-end
18061      behavior.
18062
18063    * stage1hr.c: Fixed problem where pair_up function was creating circular
18064      loops by calling List_append more than once.
18065
18066    * list.c, list.h: Added function List_dump
18067
180682009-08-17  twu
18069
18070    * gsnap-to-iit.c, gsnap_tally.c: Renamed gsnap-to-iit.c to gsnap_tally.c
18071
18072    * stage1hr.c: Fixed bug where program tried to find deletions at end
18073      extending past coordinate 0U.
18074
180752009-08-14  twu
18076
18077    * stage3hr.c: Revised paired-end output to show npaths and indication of
18078      paired or unpaired.
18079
18080    * bigendian.c, gsnap.c, mem.c, iit-read.c, sequence.c, genome.c, indexdb.c,
18081      indexdb_hr.c: Fixed compiler warnings from -Wall
18082
18083    * iit-read.c: Fixed bug where divno was not checking the last div.
18084
18085    * sequence.h: Added a function to the interface
18086
18087    * genome.c: Using SNP_FLAGS for getting alternate genome
18088
18089    * indexdb.c, indexdb.h: Added procedure for reading with diagterm and
18090      sizelimit
18091
18092    * indexdb_hr.c: Providing information about nmerged
18093
18094    * maxent.h: Defined a variable to specify maximum storage required
18095
18096    * maxent.c: Added code for computing splices from revcomp sequences
18097
18098    * cmet.c: Added a missing type
18099
18100    * genome_hr.c, stage3hr.c: Fixed compiler warnings from -Wall.
18101
18102    * stage1hr.h: Added Floors_free() to interface.
18103
18104    * stage1hr.c: Fixed bug where unsigned int * was assigned to signed int *.
18105      Fixed compiler warnings from -Wall.
18106
181072009-08-04  twu
18108
18109    * list.c, list.h: Added function List_truncate
18110
18111    * pair.c, pair.h: Added function Pair_fracidentity_max
18112
18113    * stage3.h: Added some interfaces
18114
18115    * stage3.c: Forward/reverse decision based on local scoring around each
18116      intron. Distal/medial step now truncates distal exon at best point, and
18117      iterates.  When edges cross in changepoint step, now chopping shortest end.
18118
18119    * stage2.c: Allowing rightward and leftward shifts in finding shifted
18120      canonical introns.  Fixed bug in scoring for reverse introns.  Adjusted
18121      scoring for canonical introns.
18122
18123    * stage1.h: Imposing a size limit on position lists, so large ones are
18124      ignored.
18125
18126    * stage1.c: Imposing a size limit on position lists, so large ones are
18127      ignored. Finding best solutions for each level of numbers of exons.
18128
181292009-07-30  twu
18130
18131    * stage3.c: Using existing pairs/path in end exons, rather than recomputing
18132      in distal/medial calculation.  Moved distal/medial after changepoint.
18133
18134    * stage3.h: Moved distal/medial calculation to be after changepoint
18135
18136    * README: Added information about processing reads from bisulfite-treated DNA
18137
18138    * gsnap.c, stage3hr.c, stage3hr.h: Added printing of SNP information
18139
18140    * stage1hr.c: Making correct decision on when to find splice ends for half
18141      introns
18142
18143    * gmap.c: Added indexdb_size_threshold.  Made -9 flag do checking, but not
18144      print full diagnostics.
18145
18146    * dynprog.c: Made gap penalties at ends the same as for middle
18147
18148    * diag.c: Added debugging statement
18149
18150    * Makefile.gsnaptoo.am: Included snpindex
18151
181522009-07-29  twu
18153
18154    * README: Added information about providing information to GSNAP about known
18155      splice sites and SNPs
18156
18157    * gsnap.c: Increased default max_middle_insertions from 6 to 9
18158
18159    * stage1hr.c: Checking entire query for nsnpdiffs when snp_blocks is present
18160
181612009-07-07  twu
18162
18163    * stage1.c: In find_best_path, calculating median of segments and requiring
18164      that medians ascend or descend.  Adjusting scores for overlaps between
18165      segments.
18166
181672009-07-01  twu
18168
18169    * gsnap.c, stage1hr.c, stage1hr.h: Added separate probability thresholds for
18170      local and distant splicing
18171
18172    * stage1hr.c, stage1hr.h: Re-implemented paired-end alignment
18173
181742009-06-29  twu
18175
18176    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Introduced a
18177      penalty for distant splicing
18178
18179    * gsnap.c: Introduced masktype
18180
18181    * stage3hr.c, stage3hr.h: Introduced chimera_prob field for Stage3_T object.
18182      Increased distantsplicing penalty from 1 to 2.
18183
18184    * stage1hr.c: For novel splice sites in local splicing, requiring canonical
18185      dinucleotides plus sufficient probability score in either donor or
18186      acceptor.  Fixed bug in recognizing plus antisense splicing. Restricted
18187      half introns to known splice sites.
18188
18189    * stage1hr.h: Added masktype
18190
18191    * stage1hr.c: Added masktype.  Added back half introns.
18192
181932009-06-28  twu
18194
18195    * stage1hr.c: Merged finding of distant splice pairs using known or novel
18196      splice sites.
18197
18198    * stage1hr.c: Consolidated finding of splice pairs using known or novel
18199      splice sites.  Making a single call to retrieve genomic segment for local
18200      splicing.
18201
182022009-06-26  twu
18203
18204    * stage1hr.c: Removed unused tournament and middle_indel_p code
18205
18206    * stage1hr.c: Using floors for novel distant splicing
18207
18208    * stage1hr.c: Using floors for finding novel local splicing
18209
18210    * stage3hr.h: Added field for paired_up
18211
18212    * stage3hr.c: Added field for paired_up.  Removing duplicates that differ in
18213      indel gap length.
18214
182152009-06-25  twu
18216
18217    * stage1hr.h: Implemented paired-end alignment
18218
18219    * stage1hr.c: Implemented paired-end alignment.  Fixed computation of floors
18220      for middle indels.  Fixed computation of find_segments_all for novel
18221      splicing.
18222
182232009-06-18  twu
18224
18225    * Makefile.dna.am, Makefile.gsnaptoo.am: Removed segmentpool files from
18226      Makefile.am
18227
18228    * gsnap.c, segmentpool.c, segmentpool.h, stage1hr.c, stage1hr.h: Removed all
18229      references to segmentpool
18230
18231    * stage1hr.c: Removed code for Segmentpool_T
18232
18233    * segmentpool.c, segmentpool.h: Removed some fields
18234
18235    * stage1hr.c: Fixed bug resulting from resetting nmismatches_all
18236      unnecessarily
18237
18238    * stage3.c: Fixed bug in computing goodness_rev
18239
182402009-06-09  twu
18241
18242    * dynprog.c: Increased penalties for mismatch at ends and for opening indels
18243      around introns.
18244
18245    * gsnap.c: Increased default definition of shortsplicedist from 200000 to
18246      500000.
18247
18248    * stage1hr.c: Fixed bugs in dealing with empty plus_segments or
18249      minus_segments.
18250
182512009-06-08  twu
18252
18253    * stage1hr.c: Created variables for faster retrieval of floor scores.
18254      Reporting half introns only if both local and distant known splicing fail.
18255
182562009-06-07  twu
18257
18258    * stage1hr.c: Fixed computation of floors, floor_xfirst, and floor_xlast
18259
18260    * stage1.c: Fixed bug in debugging variables
18261
18262    * stage1hr.c: Implemented code that does not use Segmentpool_T object
18263
18264    * pair.c: Treating ambiguous characters as mismatches for computing
18265      matchscores
18266
18267    * gmap.c: Providing an option for splicesites output.  Allowing use of a SNP
18268      genome version.
18269
18270    * dynprog.c: Giving ambiguous characters a negative score in GMAP, but not
18271      PMAP
18272
18273    * stage3.c: Computing direction using just canonical introns and indel
18274      openings
18275
18276    * stage1hr.c: Removed middle_indel_p field in Segment_T object
18277
182782009-06-06  twu
18279
18280    * segmentpool.c, segmentpool.h, stage1hr.c: Removed floor from the
18281      Segmentpool_T object
18282
18283    * segmentpool.c, segmentpool.h: Added a procedure for pushing without any
18284      floors
18285
18286    * stage3hr.c: Added a penalty for distant splicing
18287
18288    * stage1hr.c: Performing local and distant splicing in separate levels
18289
182902009-06-05  twu
18291
18292    * pair.c, pair.h, stage3.h: Added a function for printing splicesites
18293
18294    * stage3.c: Reinstated checking of goodness to determine direction, but now
18295      considering just canonical introns and indels.  Added a final call to
18296      assign_gap_types after trimming.
18297
182982009-06-04  twu
18299
18300    * stage1hr.c: Small improvements to code for binary_search and dual_search
18301
18302    * stage1hr.c: Made improvements in code for dual_search by removing lowi and
18303      lowj and updating pointers for positions1 and positions2.
18304
183052009-06-03  twu
18306
18307    * stage1hr.c: Using single splicesites list for novel local splicing
18308
18309    * stage1hr.c: Implemented slightly more efficient code for dual_search.
18310
18311    * gsnap.c, stage1hr.c, stage1hr.h: Improved speed of finding known
18312      splicesites by doing one dual_search for all splice types.
18313
18314    * stage1hr.c: Revised dual_search procedure to handle overlapping
18315      splicesites and positions correctly.
18316
183172009-06-02  twu
18318
18319    * genome_hr.c: Fixed bug in handling pos5 for Genome_mismatches_left.
18320
18321    * stage1hr.c: Fixed dual_search so it handles all overlapping splicesites
18322      and intervals
18323
18324    * stage1hr.c: Implemented faster code for finding novel splice ends
18325
18326    * stage1hr.c: Changed Floors_T to be just a single set of scores.  Using
18327      floors to prune known splice ends.
18328
183292009-06-01  twu
18330
18331    * stage3hr.c: Removed genomiclength from Stage3_T object.  Using chrnum of 0
18332      to indicate distant splicing.
18333
18334    * stage1hr.c: Removed oldindels code
18335
183362009-05-29  twu
18337
18338    * stage1hr.c: Using compressed nucleotide-level alignment for end indels.
18339      Preventing firstbound and lastbound from going past read boundaries.
18340
18341    * stage1hr.c: Fixed assignment of shortdistancep in splice pairs.  Fixed
18342      assignments of prior penalties.
18343
183442009-05-25  twu
18345
18346    * stage3hr.c: Eliminating duplicates where genomicstart and genomicend are
18347      equal
18348
18349    * stage1hr.c: Excluding splice positions at ends in novel local splicing
18350
18351    * genome_hr.c: Fixed bug in mismatches_right_snps where startdiscard was not
18352      being applied
18353
183542009-05-24  twu
18355
18356    * stage3hr.h: Added genomicend to Stage3_T object
18357
18358    * stage3hr.c: Made removal of duplicates work for splicing by adding
18359      genomicend to Stage3_T object.  Made removal of duplicate splice ends
18360      faster.
18361
18362    * stage1hr.c: Eliminating splice ends where splice position occurs too close
18363      to the beginning.  Fixed triage to treat novel splicing and known splicing
18364      equally, and to give preference to substitutions over indels over
18365      splicing when hits are found for each type.
18366
18367    * gsnap.c: Passing only one splice prob to stage 1
18368
18369    * stage3hr.c: Looking up splicesites iit only if site was known
18370
18371    * stage1hr.h: Passing only one splice prob.
18372
18373    * stage1hr.c: Implemented finding of novel distant splice pairs.  Fixed bug
18374      in dual_search.
18375
183762009-05-23  twu
18377
18378    * stage1hr.c: Implemented novel local splicing, which includes known splice
18379      sites
18380
183812009-05-22  twu
18382
18383    * indexdb_hr.c, indexdb_hr.h, spanningelt.c, stage1hr.c: Implemented gallop
18384      search
18385
18386    * stage1hr.c, stage1hr.h: Implemented a new flow through the different
18387      algorithms.
18388
18389    * gsnap.c: Made shortsplicedist an unsigned int.  Changed name of spliceprob
18390      to minspliceprob.
18391
18392    * stage3hr.c: Using total number of mismatches to score spliced reads
18393
18394    * stage1hr.c: Implemented dual intersection method for finding known splice
18395      sites
18396
18397    * genome_hr.c, genome_hr.h: Added substring parameters to
18398      Genome_count_mismatches_limit
18399
184002009-05-20  twu
18401
18402    * stage1hr.h: Changed name from spliceprob to minspliceprob
18403
18404    * stage1hr.c: Moved mismatches, indels, and splicing into a single
18405      procedure.  Using firstbound and lastbound for half introns.  Added
18406      provisions for a stretch procedure which uses a spanning set with
18407      nrequired = 1.
18408
18409    * stage2.c: Modified debugging output
18410
18411    * oligoindex.c: Removed unnecessary clearing step
18412
18413    * gmap.c: Fixed bug where program failed to clear oligoindices after a poor
18414      or repetitive sequence.
18415
184162009-05-17  twu
18417
18418    * genome_hr.c, gsnap.c: Enabled methylation mode on snp databases
18419
18420    * indexdb.c: Improved pre-loading messages for snp databases
18421
18422    * snpindex.c: Changed naming convention for snp databases
18423
18424    * stage1hr.c: Fixed bug in calling new substitution with fixed value for
18425      cmetp.
18426
18427    * cmetindex.c: Made the program work for snp databases
18428
18429    * stage3hr.c, stage3hr.h: Stage3 now handles all marking of mismatches.
18430      Implemented marking of methylation for indels.
18431
18432    * stage1hr.c: Moved functions for counting and marking mismatches to
18433      stage3hr.c
18434
18435    * gsnap.c: Removed call to specify methylation printing
18436
18437    * genome.c, genome.h: Removed function for signaling methylation printing
18438
184392009-05-16  twu
18440
18441    * stage1hr.h: Providing second indexdb and size_threshold to procedures.
18442
18443    * stage1hr.c: Omitting frequent 12-mers in the middle and poly-AT 12-mers at
18444      the ends.  Performing another round without omitting 12-mers if necessary.
18445      Added first implementation for handling methylation data.
18446
18447    * oligo.c, oligo.h: Changed definition of repetitive to mean only shifts of
18448      1, 2, or 3 nucleotides.  Added procedure to mark frequent oligos, but not
18449      used.
18450
18451    * indexdb.c, indexdb.h: Added procedure to compute indexdb mean size.
18452
18453    * gsnap.c: Added flag to deal with methylation data.  Passing size_threshold
18454      to stage 1 procedure.
18455
18456    * genome_hr.c, genome_hr.h: Added procedures to deal with methylation data.
18457
18458    * cmetindex.c: Creating two indexdb's, one for plus strand and one for minus
18459      strand. Moved conversion tables to cmet.c.  Removed conversion of genome.
18460
184612009-05-14  twu
18462
18463    * cmet.c, cmet.h: Initial import into CVS
18464
184652009-05-13  twu
18466
18467    * cmetindex.c: Initial import into CVS.  Implements reverse genome.
18468
18469    * iit-read.c: Added warning for use of IIT_string_from_position
18470
18471    * stage2.c: Fixed bug in using wrong indexsize for a given oligoindex
18472
18473    * oligoindex.c, oligoindex.h: Added a procedure to return indexsize
18474
18475    * get-genome.c: Fixed printing of coordinates
18476
18477    * stage1hr.c, stage1hr.h: Changed polyat to omitted.  Created separate
18478      procedure to mark omitted, which omits repetitive oligos in the middle
18479      but only poly-AT oligos at the ends.
18480
18481    * stage3hr.c, stage3hr.h: Made removal of duplicates faster
18482
18483    * stage1hr.c, stage1hr.h: Added ability to handle methylation data.  Using
18484      simplified version of Genome_fill_buffer that does not check chromosome
18485      bounds.
18486
18487    * snpindex.c: Changed name of SNP genome file from genome to genomecomp.
18488
18489    * genome.c, genome.h, gsnap.c: Added ability to handle methylation data
18490
184912009-05-11  twu
18492
18493    * gsnap.c: Made min_end_matches a user-adjustable parameter.  Made
18494      maxchimerapaths the same as maxpaths.
18495
18496    * stage1hr.h: Made min_end_matches a user-adjustable parameter
18497
18498    * stage1hr.c: Fixed computation of end indels so it finds maximal length
18499      from end. Made min_end_matches a user-adjustable parameter.  Removed
18500      allocation of polyat outside of 0..query_lastpos.
18501
185022009-05-10  twu
18503
18504    * stage1hr.c: Fixed bug in turning off end indels.  Implemented faster
18505      method for computing firstbound and lastbound for xfirst and xlast
18506      computation.
18507
185082009-05-08  twu
18509
18510    * stage3hr.c: Fixed sorting so it uses hittype, and not indel separation
18511
18512    * oligo.c, oligo.h: Added function Oligo_mark_repetitive
18513
18514    * stage1hr.c: Fixed computations of xfirst and xlast in presence of
18515      repetitive oligos.  Increased speed of computing nmismatches_long in end
18516      indels.
18517
18518    * genome_hr.c, genome_hr.h: Modified Genome_mismatches_left and
18519      Genome_mismatches_right to take pos5 and pos3 as arguments.
18520
185212009-05-07  twu
18522
18523    * stage1hr.c: Fixed floor formulas again.  Generalized idea of polyat to
18524      mean all repetitive oligos.  For middle indels, computing middle floor
18525      explicitly when polyat oligos are present.  For end indels, computing
18526      firstbound and lastbound to handle cases with polyat oligos at the ends.
18527      Reordered compute_end_indels to start computing from 1 deletion.
18528
185292009-05-06  twu
18530
18531    * spanningelt.c, spanningelt.h, stage1hr.c: Removed unused code
18532
18533    * spanningelt.c, spanningelt.h: Added code for spanning set computation of
18534      end indels, but not used.
18535
18536    * gsnap.c, stage1hr.h: Added floors_array
18537
18538    * stage1hr.c: Made changes to floors: (1) Made floor_middle formula handle
18539      middle indels, (2) Created Floors_T object to precompute floors and handle
18540      polyat oligomers.  Fixed computation of end indels, now can handle
18541      mismatches.  Fixed bug in computing max_indel_sep.  Added code for
18542      possible fast computation of end indels, but not used.
18543
185442009-05-05  twu
18545
18546    * stage1hr.c: Replaced arithmetic expression for smallesti with if statement.
18547
18548    * stage3hr.c: Changed sorting order so non-indel alignments rank higher than
18549      indel alignments.
18550
18551    * stage1hr.c: Computing floors between two segments in finding middle indels
18552
185532009-05-04  twu
18554
18555    * stage1hr.c: Added limit to number of middle indels found
18556
185572009-05-01  twu
18558
18559    * stage1hr.c: Implemented incremental execution of fast mismatch algorithm.
18560
18561    * stage1hr.c: Fixed setting of indel separation in triage.  Prevented
18562      boostpos from being a compoundpos.  Turned off tournament tree, since it
18563      was causing a crash.
18564
18565    * spanningelt.c: Fixed bug occurring when all spanningelts have no positions
18566
185672009-04-30  twu
18568
18569    * stage1hr.c: Fixed bug with setting mismatch levels for suboptimal results.
18570      Solving all multiple_mm solutions in a single run.
18571
18572    * stage3hr.c: Fixed bugs in Stage3_remove_duplicates
18573
18574    * stage3hr.h: Added function for counting number of optimal hits.
18575
18576    * stage3hr.c: Fixed problem with undesired removal of second splice site
18577      within a given genomic region.  Added function for counting number of
18578      optimal hits.
18579
18580    * stage1hr.c: Removed code for REFINE_MISSES.  Added back missing statement
18581      setting duplicates_possible_p.
18582
18583    * gsnap.c: Removed minlevel and maxlevel and replaced with suboptimal
18584      mismatches. Made sort always happen.  Changed flag name from invertp to
18585      circular-output.
18586
18587    * stage1hr.c, stage3hr.c, stage3hr.h: Using score in sorting procedure.
18588      Added provision for minlevel in Stage3_optimal_score.
18589
18590    * stage1hr.c, stage1hr.h: Refined miss_querypos5 and miss_querypos3
18591      boundaries.  Implemented triage for minlevel_mismatches and
18592      maxlevel_mismatches instead of minlevel and maxlevel, and accounted for
18593      fast mismatch algorithm.
18594
185952009-04-27  twu
18596
18597    * gmapindex.c: Allowing contigs with a single nucleotide
18598
185992009-04-26  twu
18600
18601    * spanningelt.c, spanningelt.h: Removed boosterset idea and returning only
18602      the minscore within the spanningset.
18603
18604    * stage1hr.c: Removed all recursion from identify_multimiss_iter.  Added
18605      feature to modify spanningset list and update a counter when a spanningelt
18606      is empty.  Removed boosterset idea and restored a single boostpos.
18607      Correctly implemented fast multimiss algorithm.
18608
18609    * spanningelt.c, spanningelt.h, stage1hr.c: Created separate scores for
18610      candidate generation and for pruning. Implemented idea of a boosterset,
18611      instead of a single boostpos, but seems to be slower.
18612
186132009-04-25  twu
18614
18615    * Makefile.dna.am, Makefile.gsnaptoo.am, indexdb_hr.c, indexdb_hr.h,
18616      spanningelt.c, spanningelt.h, stage1hr.c: Created formal Spanningelt_T
18617      object and rewrote algorithms to use it.
18618
186192009-04-24  twu
18620
18621    * stage1hr.c: Preliminary implementation of a multimiss algorithm
18622      generalized from the onemiss algorithm.
18623
186242009-04-23  twu
18625
18626    * stage3hr.c: Fixed bug that resulted in duplicate outputs
18627
18628    * stage1hr.c: Reverting to previous version
18629
18630    * stage1hr.c: Implemented vertical solution for find_segments_multiple_mm,
18631      which handles each querypos one at a time, but result is much slower.
18632
18633    * stage1hr.c: Removed code from before DELAY_READING.  Implemented
18634      tournament trees, which require slightly fewer instructions than heaps.
18635
186362009-04-22  twu
18637
18638    * Makefile.gmaponly.am, Makefile.pmaptoo.am: Included changepoint files.
18639
18640    * Makefile.dna.am: Included changepoint files.  Included indexdb_dump
18641      program.
18642
18643    * bigendian.c: Added code to output files in bigendian format
18644
18645    * stage3.c, stage3.h: Parameterized TRIM_END_PVALUE.  Fixed map feature of
18646      GMAP for new IIT format.
18647
18648    * stage3hr.c: Loosened criteria for duplicate hits that was eliminating
18649      overlapping matches with the same number of mismatches.
18650
18651    * indexdb_hr.c, indexdb_hr.h, stage1hr.c: Using binary_threshold instead of
18652      parent_ndiagonals
18653
18654    * indexdb.c: Fixed bug for bigendian machines
18655
18656    * iitdef.h: Changed type of divsort to be int, apparently for compiler
18657      warnings?
18658
18659    * iit-read.c, iit-read.h: Implemented algorithm for map feature of GMAP to
18660      use new IIT format
18661
18662    * gmap.c: Using divint crosstable for map feature.
18663
186642009-04-21  twu
18665
18666    * stage1hr.c: Implemented a delay in converting positions to diagonals until
18667      needed.
18668
186692009-04-08  twu
18670
18671    * stage3.c: Using number of matches to trim ends, not total length
18672
186732009-04-04  twu
18674
18675    * indexdb_hr.c, stage1hr.c: Made exact and sub-1 code work correctly on
18676      bigendian machines without having to copy memory.
18677
186782009-04-03  twu
18679
18680    * gmap.c, stage3.c, stage3.h: Modified -H flag to let user control
18681      minendexon length
18682
18683    * stage1hr.c: Skipping computation on poly-A or poly-T sequences
18684
186852009-04-02  twu
18686
18687    * stage3.c: Added check for null pairs before generating matchscores
18688
18689    * pair.c, pair.h: Revised procedures for producing matchscores
18690
18691    * gsnap.c: Adding number of paths to output.  Removed check for inplace
18692      being possible.
18693
18694    * snpindex.c: Added comment
18695
18696    * indexdb.c, indexdb.h: Removed procedures that create sentinels.
18697
18698    * indexdb_hr.c: Not using sentinels.  For compoundpos, always reading in
18699      place and converting to bigendian when needed.
18700
18701    * stage1hr.c, stage1hr.h: Not using sentinels.  For exact and sub:1, always
18702      reading in place and converting to bigendian when needed.
18703
18704    * stage1.c: Added limits on number of gregions to speed up program
18705
18706    * stage3.c, stage3.h: Using changepoint algorithm and iterative trimming of
18707      ends to improve ends of alignment.
18708
18709    * indexdb.c, indexdb.h: Added hooks for sentinel in indexdb files
18710
18711    * gregion.c, gregion.h: Rewrote support filtering to use either a fixed
18712      difference (for longer sequences) or a percentage difference (for shorter
18713      ones).  For extending sequences, using querylength for adequate support or
18714      short support.
18715
18716    * gmap.c: Improved documentation for --help
18717
18718    * genome-write.c: Always print out number of bad characters
18719
18720    * changepoint.c, changepoint.h: Modified procedures to ignore -1 values in
18721      input
18722
187232009-03-31  twu
18724
18725    * stage1hr.c: Using heaps for nomiss and onemiss, but not for exact.
18726      Replacing NULL lists with sentinels when necessary.
18727
18728    * stage1.c: Implemented limit on number of gregions before finding unique
18729      ones, to prune nonspecific, slow sequences.
18730
187312009-03-26  twu
18732
18733    * stage1hr.c: Using pointers and relying on sentinels to advance through
18734      lists.
18735
18736    * stage1hr.c: Removed old code
18737
18738    * stage1hr.c: Changed sub:1 recursive procedures to iterative
18739
18740    * stage1hr.c: Reverting to previous version
18741
18742    * stage1hr.c: Attempt to use information from third and later lists to
18743      advance first list
18744
18745    * stage1hr.c: Using results of second list to speed up intersection
18746
187472009-03-25  twu
18748
18749    * stage1hr.c: Implemented a faster version of performing intersection for
18750      exact matches.
18751
18752    * stage1hr.c: Starting exact matches with intersection of first two lists.
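
      Illustration for the two intersection entries above: a minimal sketch
      of intersecting two ascending position lists in one linear pass.  The
      function name, argument types, and array representation are assumptions
      made for this example and are not taken from stage1hr.c.

```c
#include <stddef.h>

/* Hypothetical helper, not from stage1hr.c: intersect two ascending arrays
   of diagonals in a single linear pass.  Writes the common values to out
   (which must hold at least min(na, nb) entries) and returns their count. */
size_t
intersect_sorted (const unsigned int *a, size_t na,
                  const unsigned int *b, size_t nb,
                  unsigned int *out) {
  size_t i = 0, j = 0, n = 0;

  while (i < na && j < nb) {
    if (a[i] < b[j]) {
      i++;                      /* a is behind: advance it */
    } else if (a[i] > b[j]) {
      j++;                      /* b is behind: advance it */
    } else {
      out[n++] = a[i];          /* value present in both lists */
      i++;
      j++;
    }
  }
  return n;
}
```

      Starting from the first two lists keeps the candidate set small before
      any further lists are consulted, since the result can be no longer than
      the shorter input.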
18753
18754    * stage1hr.c: Made non-heap compoundpos_find procedure the default
18755
18756    * indexdb_hr.c, indexdb_hr.h: Implemented a non-heap version of searching
18757      through a union of positions.
18758
18759    * indexdb_hr.c: Added hooks for handling indexdb with sentinels.  Removed
18760      unused code.
18761
18762    * stage1hr.c: Implemented iterative version of find_exact_aux.  Depending on
18763      sentinel to increase speed of loop.
18764
187652009-03-24  twu
18766
18767    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Fixed printing in psl format
18768      when user provides a segment
18769
18770    * compress.c, compress.h, genome-write.c: Fixed reporting of non-ACGTNX
18771      characters
18772
18773    * gmap.c, stage3.c, stage3.h: Removed -k flag and trimexonpct parameter
18774
18775    * iit_fetch.c: Added ability to compute ratios from PDL files
18776
18777    * iit-read.c, iit-read.h: Provided procedures for dumping version 1 IITs
18778
18779    * get-genome.c: Fixed dump output of contigs
18780
18781    * compress.c: Reduced number of error messages for non-ACGTNX characters
18782
18783    * stage3.c: No longer using non-canonical introns for trimming end exons,
18784      only binomial test.  Fixed bad behavior for theta values close to 1.0. Not
18785      reporting alignments with fewer than 20 matches.
18786
187872009-03-19  twu
18788
18789    * iit_fetch.c: Added hook for splices iits
18790
18791    * plotgenes.c, plotgenes.h: Added function for handling splices.  Made sure
18792      nbins > 0.
18793
187942009-03-18  twu
18795
18796    * plotgenes.c, plotgenes.h: Added function Plotgenes_fetch_points.  Renamed
18797      some functions.
18798
18799    * gmap_setup.pl.in: Gives files on command line to fa_coords and
18800      gmap_process, which can then add linefeed if necessary to lines.
18801
18802    * gmap_process.pl.in: Allowed program to read from either stdin or files on
18803      command line. In the latter case, it adds linefeed if necessary to lines.
18804
18805    * fa_coords.pl.in: Re-indented program.  Allowed program to read from either
18806      stdin or files on command line.
18807
18808    * iit_fetch.c: Implemented handling of PDL files.  Added -s flag to specify
18809      sample number.
18810
18811    * stage3.c: Trimming using a binomial test on end exons
18812
18813    * gdiag.c, genome.c, genome.h, get-genome.c, gsnap-to-iit.c, gsnap.c,
18814      indexdb.c, indexdb.h, oligo-count.c, snpindex.c: Modified programs to take
18815      a snp_root argument to -V
18816
18817    * gmap.c: Added "Processed" message to stderr at end of batch run
18818
18819    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
18820      Makefile.pmaptoo.am: Updated Makefiles
18821
18822    * pair.c, pair.h: Added function Pair_fracidentity_simple
18823
18824    * pbinom.c, pbinom.h: Initial import into CVS
18825
188262009-03-17  twu
18827
18828    * gsnap.c: Considering too many paths as a failure type for the --nofails
18829      and --failsonly flags
18830
188312009-03-10  twu
18832
18833    * stage1.c: Added variable needed in debugging
18834
18835    * plotgenes.c, plotgenes.h: Renamed functions relating to fetching
18836
18837    * iit_fetch.c: Removed code relating to printing
18838
18839    * iit-read.c: Fixed issues for bigendian machines and for fileio access
18840
18841    * datadir.c, datadir.h, gmap.c: Added extra functionality to show available
18842      databases and map files
18843
18844    * iit_fetch.c: Initial import into CVS.  Copied from iit_plot.c
18845
18846    * blackboard.c, changepoint.c, compress.c, diag.c, dynprog.c,
18847      genome-write.c, genome.c, get-genome.c, gmap.c, gmapindex.c, gregion.c,
18848      iit-read.c, iit-write.c, indexdb.c, indexdb_hr.c, intpool.c, oligo.c,
18849      oligoindex.c, oligoindex.h, pair.c, reader.c, sequence.c, smooth.c,
18850      stage1.c, stage2.c, stage3.c, translation.c: Removed unused variables
18851      based on SGI compiler warnings
18852
188532009-03-09  twu
18854
18855    * iit-read.c: Fixed reading of labels and annotations using fileio on
18856      bigendian machines.
18857
18858    * bigendian.c, bigendian.h: Added command for reading uint using fileio
18859
18860    * stage3.c: Fixed bug where gap was left at 3' end before extending the end.
18861
18862    * stage1hr.c: Fixed counting of mismatches on revcomp sequences
18863
18864    * indexdb_hr.c: Made heapsize an int, rather than unsigned int, in all
18865      procedures
18866
18867    * indexdb.c: Moved variable declarations above debugging statement to
18868      satisfy SGI compiler.
18869
18870    * gsnap.c: Changed variable names involving -q flag to "part_"
18871
18872    * get-genome.c, iit_get.c: Printing divstring only if ndivs > 1
18873
18874    * iit-read.h: Added ability to retrieve divstring from universal coordinates.
18875
18876    * iit-read.c: Added ability to retrieve divstring from universal
18877      coordinates.  Fixed bug occurring when reading annotations using fileio.
18878
18879    * iit-write.c: Fixed bug relating to use of null annotlist to indicate
18880      altstrain IIT.
18881
188822009-03-06  twu
18883
18884    * pair.c, pair.h, stage3.c: Fixed bug where -P and -Q flags were printing
18885      C-to-N on antisense cDNAs.
18886
188872009-02-20  twu
18888
18889    * gmap_setup.pl.in: Added FILE_ENDINGS variable
18890
18891    * get-genome.c: Removed debugging flag
18892
18893    * sequence.c: Improved handling of blank lines
18894
18895    * indexdb.c, snpindex.c: Changed offsets and positions to occur at modulo 0
18896      intervals based on chrpos, not universal position.
18897
18898    * iit_get.c: Added feature to print results from tally types of IITs
18899
18900    * iit-read.c: Fixed bug in re-fetching matches based on index
18901
18902    * gsnap.c: Cleared flags for batch and novelsplicing
18903
18904    * types.h: Removed an extraneous open brace
18905
18906    * stage3hr.c, stage3hr.h: Calculating splice site model score at print time
18907      for all splice sites.
18908
18909    * stage1hr.c: Providing sense information for donor and acceptor substrings.
18910      Testing if duplicate matches are possible due to minlevel and eliminating
18911      them.
18912
189132009-02-05  twu
18914
18915    * stage3hr.c, stage3hr.h: Distinguishing between splice objects that require
18916      copying of substrings and those that do not.
18917
18918    * stage1hr.c: Incorporated splice site probabilities into finding splice
18919      sites by distance.
18920
18921    * stage1hr.c: Initial implementation of finding splice pairs by distance
18922
18923    * gsnap.c: Retrieving and printing known splicesite information at print
18924      time.
18925
18926    * iit-read.c, iit-read.h: Added function IIT_get_typed_with_divno
18927
18928    * stage3hr.c, stage3hr.h: Providing chroffset information to Substring_T
18929      object.  Retrieving and printing known splicesite information at print
18930      time.
18931
18932    * stage1hr.c: Providing chroffset information to Substring_T object
18933
189342009-02-04  twu
18935
18936    * stage1hr.c, stage1hr.h, gsnap.c: Implemented faster method for applying
18937      known splice sites
18938
18939    * stage1hr.c, stage1hr.h: Fixed bug when allvalidp5 or allvalidp3 is false
18940      in paired end reads. Implemented Stage1_retrieve_splicesites.
18941
18942    * resulthr.c, resulthr.h: Renamed paired_translocation as paired_as_singles
18943
18944    * iit-read.c: Fixed bug in evaluating nexactmatches
18945
18946    * get-genome.c: Made retrieval of map information work with universal
18947      coordinates.
18948
18949    * stage3hr.c: Fixed bug where second read of paired_as_singles was not being
18950      printed
18951
18952    * gsnap.c: Renamed paired_translocation to paired_as_singles
18953
189542009-02-03  twu
18955
18956    * genome_hr.c: Now requiring query to match either reference or alternate
18957      allele at SNPs.  Otherwise, it counts as a mismatch.
18958
189592009-02-02  twu
18960
18961    * get-genome.c: Added feature to print snp information
18962
18963    * gdiag.c, gsnap-to-iit.c, iit_plot.c: Using new interface to Genome_new
18964
18965    * stage3hr.c: Eliminating novel splice site when it duplicates a known
18966      splice site
18967
18968    * stage1hr.c: Fixed bug where plus multiple mismatches are dropped when
18969      minus batches are all empty.  Added flexibility to floor_xfirst and
18970      floor_xlast to allow for indels adjacent to end 12-mers.  Changed
18971      condition endpoint for indels at ends.  Using genome_hr procedure to count
18972      mismatches for indels.  Finding all shortdistance splices with known
18973      splice sites first, before finding novel splice sites.
18974
18975    * snpindex.c: Writing revised version of genome.  Skipping cases where snp
18976      type is inconsistent with reference genome.  Taking snps_iit as a
18977      command-line argument.
18978
18979    * sequence.c, sequence.h: Added Sequence_print_two function for snps
18980
18981    * gsnap.c, genome_hr.c, genome_hr.h, stage1hr.c, stage1hr.h: Using genomealt
18982      instead of snps_iit
18983
18984    * gmap.c: Switched -v and -V flags.  Using new interface to Genome_new.
18985
18986    * genome_hr.c: Corrected calculation of mismatches_right.  Corrected offset
18987      and now subtracting number of leading zeroes.  Using clz_table instead of
18988      log_table.  Improved debugging statements.
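
      Illustration for the leading-zero entry above: a minimal sketch of
      locating the first mismatch from a count of leading zero bits.  It
      assumes a simplified layout (16 nucleotides per 32-bit word, 2 bits
      each, first nucleotide in the highest bits) that may differ from the
      actual genome_hr.c encoding.  clz32 and first_mismatch are hypothetical
      names, and a bit loop stands in for the clz_table lookup.

```c
#include <stdint.h>

/* Portable count of leading zero bits; returns 32 when x is 0. */
static int
clz32 (uint32_t x) {
  int n = 0;

  if (x == 0) {
    return 32;
  }
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    n++;
  }
  return n;
}

/* Hypothetical layout: 16 nucleotides per word, 2 bits each, with the first
   nucleotide in the two highest bits.  Returns the index (0..15) of the
   first mismatching nucleotide, or 16 if the two words are identical. */
int
first_mismatch (uint32_t query, uint32_t genome) {
  uint32_t diff = query ^ genome;   /* non-zero 2-bit fields mark mismatches */

  return clz32(diff) / 2;           /* two bits per nucleotide */
}
```

      Dividing the leading-zero count by two converts a bit offset into a
      nucleotide offset, whichever bit of the 2-bit field differs.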
18989
18990    * genome.c, genome.h: Added ability to read Genome_T object as alternate or
18991      snp only versions.
18992
189932009-02-01  twu
18994
18995    * gdiag.c, oligo-count.c: Using new interface to Indexdb_new_genome
18996
18997    * gmapindex.c, pmapindex.c: Removed altstrain code
18998
18999    * indexdb.c, indexdb.h: Storing positions only at 0 mod 3.
19000
19001    * gsnap.c: Made batch loading the default for multiple input sequences.
19002      Testing for both new names and old names of reference offsets and
19003      positions files.  Made flag for splicing refer to novel splicing.
19004
19005    * gmap.c: Made batch loading the default for multiple input sequences.
19006      Testing for both new names and old names of reference offsets and
19007      positions files.
19008
19009    * get-genome.c: Added debugging statement
19010
19011    * genome_hr.c: Fixed retrieval of intervals from snps_iit file, which are
19012      1-based.
19013
19014    * stage1hr.c, stage1hr.h: Implemented snp differences for splicing and exact
19015      matches.  Allowing identification of novel splicing in addition to known
19016      splice sites.
19017
19018    * stage3hr.c: Implemented reporting of snp differences for indels.
19019
19020    * snpindex.c: Fixed bug in using chromosomal position instead of universal
19021      position. Using parameterized suffix for reference offsets and positions
19022      files.
19023
190242009-01-30  twu
19025
19026    * snpindex.c: Initial import into CVS
19027
190282009-01-28  twu
19029
19030    * genome_hr.c: In Genome_mismatches_left and Genome_mismatches_right, adding
19031      a 'sentinel' mismatch position to list if haven't reached max_mismatches.
19032
19033    * stage1hr.c: Implemented tolerance and reporting of snp differences for
19034      novel splice sites.
19035
19036    * stage3hr.c, stage3hr.h: Added ability to print number of SNP differences
19037      separately.  This feature not yet implemented for indels though.
19038
19039    * stage1hr.c, stage1hr.h: Added ability to tolerate known SNPs and count
19040      differences at those sites separately.  This feature not yet implemented
19041      for novel splices though.
19042
19043    * segmentpool.c, segmentpool.h: Changed name of chrlow to chroffset
19044
19045    * iit-read.c: Fixed computation of divint_crosstable
19046
19047    * genome_hr.c, genome_hr.h, gsnap.c: Added ability to tolerate known SNPs
19048      and count differences at those sites separately.
19049
190502009-01-27  twu
19051
19052    * gsnap-to-iit.c: Increased buffer sizes
19053
19054    * get-genome.c: Made get-genome work on map files
19055
19056    * plotgenes.c: Edited comments
19057
19058    * gsnap.c: Provided user with the ability to set parameters for size of
19059      middle and end insertions and deletions.
19060
190612009-01-22  twu
19062
19063    * stage1hr.c: Added steps to remove duplicate paired-end results
19064
19065    * stage1hr.c: Fixed memory leak
19066
19067    * gsnap.c: Increased default maxpaths from 20 to 100.  Added -e flag to
19068      --help output.  Providing max_mismatches parameter to paired-end procedure.
19069
19070    * stage3hr.h: Added function Stage3_remove_old.
19071
19072    * stage3hr.c: Revised definition of paired-end length to go from
19073      beginning-of-read to beginning-of-read.  Added function Stage3_remove_old.
19074
19075    * stage1hr.c, stage1hr.h: Passing max_mismatches parameter for paired reads
19076
190772009-01-21  twu
19078
19079    * indexdb_hr.c: Fixed bug where heapify was performed after binary_search
19080      used up all available diagonals in a batch.
19081
19082    * gmap_setup.pl.in: Fixed bug in escaping a variable
19083
19084    * pagesize.m4: Made comment line clearer
19085
19086    * builtin.m4: Initial import into CVS.
19087
19088    * Makefile.am: Added gmap_reassemble
19089
19090    * fa_coords.pl.in: Made -S flag the default.  Added -C flag to look
19091      explicitly for chromosomal information.
19092
19093    * md_coords.pl.in: Added check for unmapped contigs
19094
19095    * gmap_setup.pl.in: Made -S flag the default behavior.  Added -C and -O
19096      flags.  Added clean procedure when making coords.genome.
19097
19098    * stage3hr.c, stage3hr.h: Including label as part of Substring_T
19099
19100    * stage1hr.h: Added procedures for finding splices against known splicesites
19101      iit.
19102
19103    * stage1hr.c: Added procedures for finding splices against known splicesites
19104      iit. Corrected computation of distances on inversions.
19105
19106    * indexdb.c: Cleaned code to ensure gsnap finds the right offsets file
19107
19108    * iit-write.c: Added check for a null typestring
19109
19110    * iit_dump.c: Fixed debugging output
19111
19112    * iit-read.h: Added procedures to search based on divint and to get a
19113      crosstable of divints.
19114
19115    * iit-read.c: Fixed IIT_debug.  Added procedures to search based on divint
19116      and to get a crosstable of divints.
19117
19118    * gsnap.c: Added flags for maxmismatches, splicing penalties, and splicing
19119      iit. Added flags for failsonly and nofails in output.
19120
19121    * gregion.c: Added abort if genomicend < genomicstart
19122
19123    * gmapindex.c: Eliminated reading of strain information and assignment to
19124      contigtypelist.  Increased size of chrpos string from 100 to 8192.
19125
19126    * gmap.c: Reformatted output for --help
19127
191282009-01-15  twu
19129
19130    * stage1hr.c: Created user-specified parameters for splicing probabilities
19131      and length.
19132
19133    * stage1hr.c: Fixed bug in printing coordinates of splicing results on minus
19134      strand
19135
191362009-01-14  twu
19137
19138    * gsnap.c: Added flags for excluding failed alignments, or limiting to those
19139
191402009-01-13  twu
19141
19142    * stage1hr.c: Added additional check to prevent straddling across chromosomes
19143
19144    * indexdb_hr.c, indexdb_hr.h, stage1hr.c: Implemented more efficient way of
19145      ignoring extensions past beginning of genome.
19146
191472009-01-07  twu
19148
19149    * stage1hr.c: Hacks put in to exclude diagonals that are less than
19150      querylength
19151
19152    * stage1hr.c: Fixed issues with wrong indel_pos chosen in middle insertions,
19153      and not checking up to specified number of indels.
19154
191552008-12-24  twu
19156
19157    * indexdbdef.h: Reverted to version 1.2
19158
19159    * indexdb.c: Reverted to version 1.121
19160
19161    * indexdb.c, indexdbdef.h: Attempt to use a compressed indexdb file
19162
191632008-12-22  twu
19164
19165    * stage1hr.c: Reading floor 2 during find_segments_multiple_mm.  Returning
19166      min_mismatches_seen from find_onemiss_matches.
19167
191682008-12-21  twu
19169
19170    * indexdb_hr.c: Put check for size of compoundpos outside of loop.
19171
19172    * stage1hr.c: Added bounds on location of mismatch in onemiss search
19173
19174    * stage1hr.c: Reverted back to version 1.106 that has hanging compoundpos
19175      positions for exact and onemiss matches.
19176
19177    * stage1hr.c: Version of stage 1 with hooks for disallowing compoundpos
19178      positions that hang over ends.  However, this appears to add 40% to number
19179      of instructions.
19180
191812008-12-20  twu
19182
19183    * stage1hr.c: Setting pointers->compoundpos to NULL after it becomes empty,
19184      to prevent further computation on it.
19185
19186    * indexdb_hr.c, indexdb_hr.h: Added function Compoundpos_intersect
19187
19188    * stage1hr.c: Eliminated compoundpos positions in creating segments from
19189      multiple mismatches.  Delayed sorting of segments until needed for middle
19190      insertions and deletions.  Setting floor to zero in cases where poly-AT is
19191      present.  For end indels, computing oligomer_start and oligomer_end based
19192      on results of actual mismatches found.
19193
19194    * stage1hr.c: Fixed bug in debugging statement
19195
19196    * gmapindex.c: Added more compiler checks to hide alternate strain code
19197
19198    * Makefile.util.am: Added program all-orfs
19199
19200    * Makefile.dna.am, Makefile.gsnaptoo.am: Removed chrsubset.c from gsnap
19201      sources
19202
192032008-12-18  twu
19204
19205    * stage3hr.c, stage3hr.h: Using left instead of genomicpos5 for creating
19206      Stage3_T objects.
19207
19208    * stage1hr.c: Using left instead of genomicpos5 for creating Stage3_T
19209      objects. Consolidated code for plus and minus segments.  Changed parameter
19210      list for find_segments_multiple_mm to prepare for finding hits within that
19211      procedure.
19212
19213    * stage1hr.c: Clarified calculations of floors
19214
192152008-12-17  twu
19216
19217    * stage1hr.c: Using init and search routines for Compoundpos_T objects
19218
19219    * separator.h: Changed coordinate separator from "--" to ".."
19220
19221    * intlist.c, intlist.h: Added function Intlist_sort_ascending()
19222
19223    * indexdb_hr.c, indexdb_hr.h: Implemented init and search routines for
19224      Compoundpos_T objects
19225
19226    * iit-write.c, iit_store.c: Added monitoring output
19227
19228    * stage1hr.c: Clarified processing of pointers in search for onemiss matches.
19229
192302008-12-16  twu
19231
19232    * stage1hr.c: Allowing spanning 12-mers for exact and onemiss searches to go
19233      in either forward or reverse direction, and picking the optimal direction.
19234
19235    * stage1hr.c: Allowing compoundpos positions to be used for boosting, by
19236      merging them during search for exact matches.
19237
19238    * stage1.c: Fixed a bug where the querypos of sentinel was set incorrectly.
19239      Now using querylength, not -1.  Added a check to prevent gregions with
19240      negative values for genomicstart.
19241
19242    * indexdb_hr.c, indexdb_hr.h: Removed partner_diagonals from Compoundpos_T
19243      object and removed reduce function.
19244
19245    * stage1hr.c: Fixed potential problem with sentinel.  The querypos part of
19246      sentinel now set to querylength, not -1, to guarantee it stops the loop.
19247      Introduced Pointers_T object to simplify exact and onemiss code.
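
      Illustration for the sentinel entry above: a minimal sketch of why a
      sentinel querypos of querylength stops a scan where -1 would not.  The
      struct and function below are hypothetical and are not the types used
      in stage1hr.c.

```c
#include <stddef.h>

/* Hypothetical record type for illustration only. */
struct hit {
  int querypos;
  unsigned int diagonal;
};

/* Count hits with querypos < limit, where limit <= querylength.  The array
   must end with a sentinel whose querypos equals querylength, so the loop
   stops without a separate bounds check.  A sentinel of -1 would satisfy
   (querypos < limit) and let the scan run past the end of the array. */
size_t
count_hits_before (const struct hit *hits, int limit) {
  const struct hit *p = hits;

  while (p->querypos < limit) {
    p++;
  }
  return (size_t) (p - hits);
}
```

      Because every valid querypos is below querylength, the sentinel is the
      only element that can fail the comparison for the largest possible
      limit.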
19248
192492008-12-15  twu
19250
19251    * stage1hr.c: Introduction of Compoundpos_T object for speeding up
19252      computation in exact and onemiss algorithms
19253
19254    * intlist.c, intlist.h: Added Intlist_insert_second() function
19255
19256    * indexdb_hr.c, indexdb_hr.h: Introduction of Compoundpos_T object and
19257      operations
19258
192592008-12-14  twu
19260
19261    * intlist.c: Made cell_ascending() and cell_descending() static.
19262
19263    * genome_hr.c, genome_hr.h: Created Genome_count_mismatches_limit.  Also
19264      added code for a oneloop version of Genome_count_mismatches.
19265
19266    * gsnap.c: Removed chrsubset feature
19267
19268    * stage3hr.c, stage3hr.h: Clarified that Substring_new_donor and
19269      Substring_new_acceptor should receive forward query sequence.
19270
19271    * stage1hr.c: Removed querylength from call to select_positions_for_exact().
19272
19273    * stage1hr.h: Removed Chrsubset_T object.
19274
19275    * stage1hr.c: Using new call to Genome_count_mismatches_limit.  Replaced
19276      uses of queryseq with queryuc_ptr and queryrc.  Introduced query_lastpos
19277      to replace multiple calculations of (querylength - INDEX1PART).  Removed
19278      Chrsubset_T object.
19279
19280    * stage1hr.c: Fixed onemiss algorithm so it handles short reads less than
19281      2*INDEX1PART in length.  Changed occurrences of oligobase to INDEX1PART.
19282      Removed oligobase and querylength from Stage1_T object.
19283
19284    * genome_hr.c, genome_hr.h: Made Genome_count_mismatches more efficient, by
19285      using pointers and stepping through query and genome blocks sequentially.
19286
192872008-12-13  twu
19288
19289    * stage1hr.c: Implemented new method for identifying single mismatches,
19290      similar to that for finding exact matches.
19291
192922008-12-12  twu
19293
19294    * result.h: Added a new failure type for short sequences
19295
19296    * pmapindex.c: Changed default index1interval from 3 to 6
19297
19298    * oligo.c: Added a comment.
19299
19300    * oligo-count.c: Using revised interface to indexdb.c.
19301
19302    * indexdb_hr.c, indexdb_hr.h: Removed addition of a diagterm for lookups
19303      involving a left or right shift.
19304
19305    * indexdb.c, indexdb.h: Changed some function names.  Added function to
19306      determine if inplace reading is possible.  Added parameter to require
19307      sampling of 3 in indexdb.
19308
19309    * gmap.c: Using revised interface to indexdb.c.  Added check and message for
19310      sequences shorter than INDEX1PART.
19311
19312    * gdiag.c: Using revised interface to indexdb.c
19313
19314    * block.c: Using revised function names in indexdb.c
19315
19316    * segmentpool.c, segmentpool.h: Removed break5 and break3 from Segmentpool_T
19317      object.
19318
19319    * stage1hr.c: Removed break5 and break3 from Segmentpool_T object
19320
19321    * stage1.c: Providing correct adjustment to diagonals for the minus strand,
19322      by adding the length of the oligomer.
19323
19324    * stage3hr.c, stage3hr.h: Added function to compare Stage3_T objects by
19325      genomic location
19326
19327    * gsnap.c: Added flag to sort results by genomic location
19328
19329    * stage1hr.c: No longer saving segments during check for single mismatches.
19330      Checking and saving substitution hits within each heap merge.
19331
19332    * gsnap.c: Added user flag for setting indel penalty
19333
19334    * stage1hr.c, stage1hr.h: Made indel_penalty a parameter adjustable by the
19335      user.
19336
193372008-12-11  twu
19338
19339    * stage1hr.c: Moved around code for special cases that prevent searching for
19340      end indels at beginning or end of the sequence.
19341
19342    * gsnap.c: Added information about whether inplace reading of indexdb is
19343      possible.  Added information to version command.
19344
19345    * segmentpool.c, segmentpool.h: Added floor, floor_xfirst, and floor_xlast
19346      fields to the Segment_T object.
19347
19348    * stage1hr.c: Removed separate lists of segments by floor.  Keeping only a
19349      single list of plus segments and of minus segments, with floor information
19350      stored in the Segment_T object.
19351
19352    * stage1hr.h: Providing information about whether inplace reading of
19353      diagonals is possible.
19354
19355    * stage1hr.c: Delayed addition of diagterm until after search for exact
19356      matches.  Allowing reads of diagonals from indexdb to be inplace when
19357      possible
19358
193592008-12-10  twu
19360
19361    * types.h: Added check for size of unsigned long long as an 8-byte word
19362
19363    * stage1hr.c: Using 64-bit words, if available, to speed up comparison of
19364      batches in heap merge.
19365
19366    * stage1hr.c: Replaced calls to List_head, List_next, Intlist_head, and
19367      Intlist_next with primitives.
19368
19369    * Makefile.pmaptoo.am: Revised source files
19370
19371    * Makefile.gsnaptoo.am: Added gsnap-to-iit program.  Added segmentpool.
19372
19373    * Makefile.gmaponly.am: Added chrom.c to iit utilities
19374
19375    * Makefile.dna.am: Removed test programs for Compress_T procedures
19376
19377    * Makefile.dna.am: Added test programs for Compress_T procedures.  Added
19378      segmentpool.
19379
19380    * segmentpool.c, segmentpool.h: Initial import into CVS
19381
19382    * gsnap.c, stage1hr.c, stage1hr.h: Added segmentpool
19383
193842008-12-09  twu
19385
19386    * stage1hr.c: Enforcing diagonals to be within chromosomal bounds.  Removed
19387      unused code.
19388
19389    * genome.c: Fixed check for chromosome bounds
19390
19391    * genome_hr.c, genome_hr.h: Removed checks for crossing of chromosome
19392      boundaries.  Relying upon calling procedures to enforce this.
19393
193942008-12-08  twu
19395
19396    * gsnap.c: Added flags for minlevel and maxlevel.  Cleaned up unused flags.
19397
19398    * genome_hr.c, genome_hr.h: Made functions return maximum number of
19399      mismatches if they cross a chromosome bound.
19400
194012008-12-05  twu
19402
19403    * stage1hr.c: On end indels, checking to see if indel_pos is non-positive.
19404      Passing chromosome_iit to Genome_mismatches_left and
19405      Genome_mismatches_right.
19406
19407    * gsnap.c: Added statement at end of batch processing to indicate number of
19408      queries processed.
19409
194102008-12-02  twu
19411
19412    * subseq.c: Added U and l flags
19413
19414    * stage1hr.c: For middle indels, putting indel at leftmost genomic position.
19415      Fixed filtering criteria for end indels.  Counting mismatches on the left
19416      and right to identify candidates for end indels.
19417
19418    * stage1.c: Added debugging statements
19419
19420    * gdiag.c, gsnap-to-iit.c: Using new interface to Genome_fill_buffer
19421
19422    * gmap.c: Fixed genomiclength for user-provided genomic sequence.  Stopped
19423      trimming of sequence.
19424
19425    * diag.c: Loosened criteria for MAX_DIAGONALS and MIN_SCORE.
19426
194272008-11-25  twu
19428
19429    * genome_hr.c: Made Genome_mismatches_left and Genome_mismatches_right fill
19430      mismatch_positions entries 0..max_mismatches.
19431
19432    * stage1hr.c: Implemented new algorithms for middle insertions and
19433      deletions, using Genome_mismatches_left and Genome_mismatches_right.
19434
19435    * stage1hr.c: Using new interface to Genome_count_mismatches
19436
19437    * genome_hr.c, genome_hr.h: Added functions Genome_mismatches_left and
19438      Genome_mismatches_right. Added builtin bit-vector functions.
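
Note on the bit-vector entries above: the sketch below illustrates the general idea of counting mismatches directly on a packed genome with bitwise operations. It is a minimal illustration under stated assumptions, not the code in genome_hr.c. The 2-bit encoding (A=0, C=1, G=2, T=3), the single 64-bit word limit, and the names pack_2bit, popcount64, and count_mismatches_2bit are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-bit encoding: A=0, C=1, G=2, T=3.  Any other character
   is treated as A here, only to keep the sketch short. */
static uint64_t
pack_2bit (const char *seq, size_t len) {
  uint64_t word = 0;
  size_t i;

  assert(len <= 32);            /* 32 nucleotides fit in one 64-bit word */
  for (i = 0; i < len; i++) {
    uint64_t code;
    switch (seq[i]) {
    case 'C': code = 1; break;
    case 'G': code = 2; break;
    case 'T': code = 3; break;
    default:  code = 0; break;
    }
    word |= code << (2*i);
  }
  return word;
}

/* Portable population count (Kernighan's method).  A compiler builtin
   such as __builtin_popcountll in GCC could be substituted. */
static int
popcount64 (uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;                 /* clears the lowest set bit */
    n++;
  }
  return n;
}

/* Counts positions where query and genome differ, for len <= 32 */
int
count_mismatches_2bit (const char *query, const char *genome, size_t len) {
  const uint64_t low_mask = UINT64_C(0x5555555555555555);
  uint64_t diff, any;

  /* A position mismatches if either bit of its 2-bit code differs */
  diff = pack_2bit(query,len) ^ pack_2bit(genome,len);
  /* Fold the high bit of each pair onto the low bit, then keep low bits */
  any = (diff | (diff >> 1)) & low_mask;
  return popcount64(any);
}
```

A real implementation would keep the genome packed ahead of time and step through it word by word, so only the query needs packing per read.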
19439
194402008-11-24  twu
19441
19442    * genome_hr.c, genome_hr.h: Allowed specification of pos5 and pos3 in
19443      Genome_count_mismatches
19444
19445    * genome_hr.h: Using new Compress_T object.
19446
19447    * genome_hr.c: Using new Compress_T object.  Removed unused code.
19448
19449    * stage1hr.c: Using new Compress_T object
19450
19451    * stage3hr.c: Fixed memory leak
19452
19453    * compress.c, compress.h: Introduced Compress_T object
19454
194552008-11-23  twu
19456
19457    * genome_hr.c, genome_hr.h: Initial entry into CVS
19458
19459    * gsnap.c: Added flag for handling circular-end reads
19460
19461    * stage3hr.c, stage3hr.h: Making a single invertp work on paired-end and
19462      circular-end reads
19463
19464    * stage1hr.c: Using direct comparison against compressed genome to count
19465      mismatches
19466
19467    * genome.c, genome.h: Added function Genome_blocks.  Made Genome_fill_buffer
19468      return nunknowns.
19469
19470    * compress.c, compress.h: Added functions Compress_new and Compress_shift
19471
194722008-11-20  twu
19473
19474    * sequence.c, sequence.h: For circular-end reads, keeping reverse complement
19475      of queryseq2, but swapping queryseq1 and queryseq2.
19476
19477    * stage1hr.h: Placed local exon-exon mappings after multiple substitutions
19478      and indels in the hierarchy of levels.
19479
19480    * stage2.c: Commented out unused procedure
19481
19482    * stage3.c: Added debugging statement
19483
19484    * stage3hr.c, stage3hr.h: Enabled printing of circular-end reads
19485
194862008-11-13  twu
19487
19488    * stage1hr.c: Allowing mismatches with splicing
19489
194902008-11-11  twu
19491
19492    * gsnap.c, stage1hr.h: Removed unused parameters
19493
19494    * stage1hr.c: Using new single-end read algorithms for paired-end reads.
19495      Fixed problems with query sequences that contain non-ACGT characters.
19496
19497    * stage3hr.c: Fixed problems with printing inverted sequence for paired-end
19498      reads
19499
19500    * gmap_reassemble.pl.in: Initial entry into CVS
19501
19502    * stage3hr.c: Fixed bug in Stage3_remove_duplicates
19503
19504    * stage1hr.c: Revised splicing parameters.  Fixed calculation of maxfloor.
19505
195062008-11-10  twu
19507
19508    * stage3hr.c: Favoring substitutions over equivalent indels in removing
19509      repeats.
19510
19511    * stage1hr.c: Treating max_middle_insertions and max_middle_deletions
19512      separately in solving middle indels.  Removed scores from
19513      compute_end_indels.  Not resetting min_mismatches after single_mm, because
19514      of effect of poly_at oligos.
19515
19516    * gsnap.c, stage1hr.h: Allowing separate parameters for middle and end
19517      insertions and deletions.
19518
19519    * stage3hr.c: Added check and warning messages if observed mismatches is
19520      different from the number expected
19521
19522    * stage1hr.c: Introduced floor system for computing indels.  Defined
19523      calculation of middle and end indels more clearly, with separate
19524      parameters for middle and end insertions and deletions.
19525
195262008-11-06  twu
19527
19528    * stage1hr.c: Made fixes for splicing to work
19529
19530    * indexdb_hr.h: Removed obsolete functions
19531
19532    * indexdb_hr.c: Fixed masking of left shifts
19533
19534    * indexdb.c: Commented out warning message for multiple index files
19535
19536    * gsnap.c: Partially implemented minlevel and maxlevel controls for Stage1.
19537      Removed references to Stage3chimera_T objects
19538
19539    * resulthr.c, resulthr.h: Removed references to Stage3chimera_T objects
19540
19541    * stage3hr.h: Implemented new structure for Stage3 objects: single reads may
19542      have one or more substrings.
19543
19544    * stage3hr.c: Implemented new structure for Stage3 objects: single reads may
19545      have one or more substrings.  Modified print procedure for indels to allow
19546      for mismatches.
19547
19548    * stage1hr.h: Removed chimerap variable since Stage3 single reads are all of
19549      the same type now.
19550
19551    * stage1hr.c: Fixed polyat assessment at ends of query.  Storing first and
19552      last diagonals and computing mismatches on both.  Changed ptr->indels to
19553      be consistently positive for insertions and negative for deletions. Using
19554      new Stage3 objects.  Solving middle indels with mismatches. Using minlevel
19555      and maxlevel to control computing behavior on different alignment types.
19556
195572008-11-03  twu
19558
19559    * stage1hr.h: Made max_insertions and max_deletions parameters.  Added
19560      minlevel and maxlevel.
19561
19562    * stage1hr.c: Cleaned up procedures for single mismatches and multiple
19563      mismatches. Added oligobase to minus diagonals to prevent negative
19564      coordinates. Made max_insertions and max_deletions parameters.  Added
19565      minlevel and maxlevel.
19566
195672008-10-28  twu
19568
19569    * md_coords.pl.in: Revised instructions to user
19570
195712008-10-24  twu
19572
19573    * stage3.c: In comparing paths_fwd and paths_rev, using just number of
19574      matches
19575
19576    * stage2.c: Also performing stage2 if there is a sufficient value for
19577      ncoverage
19578
19579    * stage1.c: Removed matchsize and matchinterval from Stage1_T object, and
19580      allowing option in scan_ends of iterating on different matchsizes.  In
19581      removal of repeated oligomers, now also removing neighboring oligomers.
19582      Now filtering gregions by support.
19583
19584    * reader.c: Added fields so Reader_reset_ends resets correctly
19585
19586    * gregion.c, gregion.h: Added function to filter by support
19587
19588    * gmap.c: Fixed error message for -z flag
19589
19590    * diag.c, diag.h: Returning ncovered from Diag_update_coverage
19591
19592    * block.c, block.h: Removed high-resolution option
19593
195942008-10-23  twu
19595
19596    * stage3.c: Handling case where gap is at beginning of path.  Trimming end
19597      exons until a canonical intron is reached.
19598
19599    * stage1.c: Identifying repeated oligos at the outset
19600
19601    * pair.c: Made counting of ambiguous matches more uniform
19602
19603    * gsnap-to-iit.c: Added information about unique positions.  Added ability
19604      to halt at a given position.
19605
19606    * gregion.c: Modified print statement
19607
196082008-10-10  twu
19609
19610    * pair.c: Made N's in query sequence align as mismatches in GMAP.
19611
19612    * gmapindex.c: Removing "chr" from chrsubset file
19613
19614    * stage1hr.c: Tightened criteria for finding exon-exon junctions.  Not
19615      reading 10- or 11-mers at ends if the 12-mer is invalid.
19616
19617    * stage3hr.c: Made macros for text constants
19618
19619    * match.c, match.h: Added function Match_print
19620
19621    * gsnap.c: Changed default of trim flag to be false
19622
19623    * gmap.c: Added jobdiv capability to GMAP
19624
196252008-10-02  twu
19626
19627    * blackboard.c: Simplified the fix for the hang when no input was ever received
19628
196292008-10-01  twu
19630
19631    * blackboard.c: Fixed hang that occurs when no input was ever received,
19632      which happens with the jobdiv option when the input has fewer sequences
19633      than the first batch modulus.
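
Note on the jobdiv entry above: the sketch below shows one way a modulus-based division of input could leave a worker with no sequences at all, which is the condition the fix addresses. The assignment rule (sequence i goes to batch i mod n, with batches numbered 0 to n-1) and the names in_batch_p and batch_count are assumptions for illustration; the ChangeLog does not spell out the actual rule.

```c
#include <assert.h>

/* Hypothetical rule: sequence number seqno (0-based) belongs to the
   worker whose batch equals seqno modulo nbatches. */
int
in_batch_p (long seqno, int batch, int nbatches) {
  assert(nbatches > 0 && batch >= 0 && batch < nbatches);
  return (seqno % nbatches) == batch;
}

/* Number of sequences a given worker receives out of nsequences.
   Returns 0 when the input is too short to reach this batch, which is
   the "no input was ever received" case. */
long
batch_count (long nsequences, int batch, int nbatches) {
  assert(nbatches > 0 && batch >= 0 && batch < nbatches);
  if (nsequences <= batch) {
    return 0;
  }
  return (nsequences - batch - 1)/nbatches + 1;
}
```

Under this rule, a worker assigned batch 3 of 4 sees nothing when the input holds only two sequences, so its input thread must signal completion even though it never delivered a sequence.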
19634
196352008-09-26  twu
19636
19637    * iit_get.c: Allowing retrieval of labels that contain colons, by checking
19638      first to see if the first part of the label is a divstring.
19639
19640    * iit-read.c, iit-read.h: Added function to determine divint without reading
19641      entire IIT.
19642
196432008-09-23  twu
19644
19645    * stage3.h: Added PRE_ENDS as a debugging endpoint
19646
19647    * stage3.c: Modified solutions at ends.  First, we decide between distal and
19648      medial, with distal penalized for non-canonical introns.  Then, we simply
19649      extend the ends without peelback and permitting an initial gap.
19650
19651    * stage2.c: Introducing a minimum pct_coverage
19652
19653    * oligoindex.c, oligoindex.h: Allowed suffnconsecutive to be a different
19654      value in each level of resolution.
19655
19656    * match.c, match.h: Moved function Match_get_coords to gregion.c
19657
19658    * gregion.h: Added fields for weight and support.
19659
19660    * gregion.c: Added fields for weight and support.  Duplicate gregions are
19661      now resolved in favor of the gregion with the greatest weight, or if
19662      equal, the greatest support.
19663
19664    * gmap.c: Increased extraband at end from 3 to 6
19665
19666    * dynprog.c: Allowing a gap to start the alignment of end5 and end3.
19667      Introducing a parameter init_jump_penalty_p to control this.
19668
19669    * diag.c: Replaced 0 with 0U in some cases
19670
19671    * stage1.c: Using weights on matches and on gregions to focus on genomic
19672      regions with most specificity.
19673
196742008-09-19  twu
19675
19676    * stage1.c: Added penalty for intron length in find_best_path to reduce
19677      excessively large regions.  If segments are used, then clearing gregions
19678      and starting over.
19679
196802008-09-16  twu
19681
19682    * stage1hr.c: Fixed bug with false positives on middle indel.  Fixed bug
19683      with combinations of insertions and deletions in find_segments_multiple_mm.
19684
196852008-09-15  twu
19686
19687    * gmap.c: Allowing multiple paths for alignment against user-provided
19688      segment. Explicitly recomputing goodness over all stage3 objects.
19689      Allowing user to specify direction of introns.
19690
19691    * stage2.c: Giving points for indexsize-equivalent number of matches if it
19692      starts a new chain.
19693
19694    * stage1hr.c: Fixed problem with insertions in first 12-mer.  Now treating
19695      as a mismatch, as we did for insertions in the last 12-mer.
19696
19697    * oligoindex.h: Removed debug_graphic_p from argument list
19698
19699    * oligoindex.c: Fixed memory leak
19700
19701    * md5-compute.c: Added ability to handle multiple input files
19702
19703    * diag.c, diag.h, diagdef.h: Computing scores for each diagonal and
19704      requiring a minimum score
19705
197062008-09-09  twu
19707
19708    * stage1hr.c: Fixed bug in exact match to end of chromosome, resulting in
19709      negative coordinates of the next chromosome.
19710
19711    * stage3.h: Added function for recomputing goodness
19712
19713    * stage3.c: Made widebandp true on all single gap solutions.  Extending 5'
19714      and 3' ends, rather than comparing distal with medial, when defect rate is
19715      high.  Recomputing goodness using just matches if best hit is poor.
19716
197172008-09-08  twu
19718
19719    * diag.c, diag.h: Moved some functions from oligoindex.c to diag.c
19720
19721    * oligoindex.c, oligoindex.h: Implemented different mapping resolutions by
19722      using multiple oligoindices.  Using a separate lookback for each
19723      resolution.
19724
19725    * stage2.c, stage2.h: Implemented different mapping resolutions by using
19726      multiple oligoindices.
19727
197282008-09-04  twu
19729
19730    * configure.ac: Added check for stat64
19731
19732    * acinclude.m4: Including config/acx_mmap_fixed.m4,
19733      config/acx_mmap_variable.m4, and config/struct-stat64.m4.
19734
19735    * VERSION: Updated version
19736
19737    * README: Augmented instructions for new gmap_setup flags and made mention
19738      of GSNAP.
19739
19740    * dynprog.c, dynprog.h: New functions added for dealing with an internal gap
19741
19742    * indexdb.c: Fixed problem in reading offsets and positions file based on
19743      interval of 6.
19744
19745    * gmap.c: Made default canonical mode to be 1
19746
19747    * stage3.c: Reverted to revision 1.300 with newer code kept for stage3debug.
19748
19749    * stage2.c: Reverted to revision 1.221 with newer code kept for converting
19750      oligomers to nucleotides
19751
19752    * smooth.h: Removed stage2_indexsize
19753
19754    * smooth.c: Reverted to revision 1.41, plus removal of stage2_indexsize
19755
19756    * oligoindex.h: Reverted to revision 1.47, plus most recent wobble masking
19757      and code for multiple oligoindices
19758
19759    * oligoindex.c: Reverted to revision 1.108, plus most recent wobble masking
19760      and code for multiple oligoindices
19761
19762    * diagpool.c: Removed initialization for bestscore and prev fields
19763
19764    * diagdef.h: Removed score, bestscore, and prev fields
19765
19766    * diag.h: Reverted to revision 1.5 with some functions moved from
19767      oligoindex.c.
19768
19769    * diag.c: Reverted to revision 1.7 with some functions moved from
19770      oligoindex.c.
19771
19772    * stage3.h: Calling stage 2 directly
19773
19774    * stage3.c: More attempts to rearrange steps
19775
19776    * stage2.c, stage2.h: Bypasses former stage 2 and returns best path of
19777      diagonals, converted to nucleotides
19778
19779    * smooth.c: Changed function for finding internal shorts
19780
19781    * oligoindex.c, oligoindex.h: Changed Oligoindex_get_mappings to return a
19782      list of diagonals
19783
19784    * iit-read.h: Added comments to explain arguments
19785
19786    * gmap.c: Having stage2 return a path
19787
19788    * diagpool.c: Added initialization for bestscore and prev
19789
19790    * diagdef.h: Added fields for bestscore and prev
19791
19792    * diag.c, diag.h: Added functions Diag_compare_querystart and Diag_best_path
19793
197942008-08-15  twu
19795
19796    * diag.c, diag.h, gmap.c, oligoindex.c, oligoindex.h, stage2.c, stage2.h:
19797      Implementation of oligoindex step at multiple resolutions
19798
19799    * stage2.c: Rearranged procedures in preparation for multiple oligoindices.
19800
19801    * oligoindex.c, oligoindex.h: Moved various functions from oligoindex.c to
19802      diag.c.  Added various variables to Oligoindex_T struct.  Rearranged
19803      procedures in preparation for multiple oligoindices.
19804
19805    * diag.c, diag.h: Moved various functions from oligoindex.c to diag.c
19806
19807    * stage2.h: Added a version of stage 2 that can be called from within stage
19808      3.
19809
19810    * stage2.c: Using active hits, instead of minactive and maxactive bounds.
19811      Added hooks for relying upon splice site scores.  Made conversion to
19812      nucleotides handle arbitrary masks.  Added penalty for diffdistance not a
19813      multiple of 3.  Added a version of stage 2 that can be called from within
19814      stage 3.
19815
19816    * stage1hr.c: Exiting if a single polyat 12-mer found, to prevent false
19817      indels from being found in find_segments_multiple_mm.
19818
19819    * oligoindex.h: Computing active hits around each diagonal, instead of
19820      minactive and maxactive bounds.
19821
19822    * oligoindex.c: Added wobble masking.  Computing dominance by using scores,
19823      based on number of diagonals overlapping each querypos.
19824
19825    * indexdb_hr.c: Added masking for all left shifts
19826
19827    * indexdb.c: Fixed problem where highest resolution indexdb was not being
19828      used
19829
19830    * gmap.c: Using new interface to Oligoindex_set_inquery
19831
19832    * diag.c, diag.h, diagdef.h: Added score to Diag_T object
19833
19834    * block.c: Added error message
19835
198362008-08-11  twu
19837
19838    * oligoindex.c, oligoindex.h: Passing character strings to procedures,
19839      rather than Sequence_T objects.
19840
198412008-08-10  twu
19842
19843    * gmap.c: Made changes to debug requests from stage3
19844
198452008-08-09  twu
19846
19847    * stage3.c: Rearranging steps to improve cross-species performance.  Work
19848      still in progress.
19849
198502008-08-08  twu
19851
19852    * stage1hr.c: Removed old code
19853
19854    * stage1.c: Made heap and segment algorithm work for PMAP
19855
19856    * binarray.c, binarray.h: Removed binarray source code
19857
19858    * sequence.c: Redefined trim_end for PMAP to exclude the terminal stop codon
19859      that was added
19860
19861    * pmapindex.c: Including index1interval in filename for PMAP databases
19862
19863    * matchpool.c: Removed old code that referred to positions, not diagonals
19864
19865    * match.c: Simplified a procedure
19866
19867    * indexdbdef.h: For PMAP, allowed index1interval to be determined by
19868      available databases
19869
19870    * indexdb_hr.c, indexdb_hr.h: Moved Indexdb_read_no_subst command to
19871      indexdb.c
19872
19873    * indexdb.c, indexdb.h: Moved Indexdb_read_no_subst command here.  Including
19874      index1interval into filename for PMAP databases.
19875
19876    * gmap.c: Changed variable name from samplingp to lowidentityp
19877
19878    * block.c: Bypassing oligo.c and calling Indexdb commands directly
19879
198802008-08-06  twu
19881
19882    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
19883      Makefile.pmaptoo.am, stage1.c, stage1.h: Removed binarray
19884
198852008-08-05  twu
19886
19887    * binarray.c, binarray.h, stage1.c: Transitioning away from bins and toward
19888      segments.  Intermediate code contains both sets of functions.
19889
198902008-08-01  twu
19891
19892    * block.c, block.h, gregion.c, gregion.h, matchpool.c, matchpool.h,
19893      stage1.c: Just before change to using diagonals, with directives
19894      indicating changes
19895
198962008-07-30  twu
19897
19898    * gsnap.c: Changed batch specification so it runs from 0 to n-1.
19899
19900    * stage1hr.c: Changed hierarchy of results to be exact, sub:1, local
19901      splicing, half introns, sub:2, sub:3, sub:4, indels, distant splicing.
19902      Increased speed for computing splice ends.  Limiting nmismatches for each
19903      splice end, so not checking nmismatches for splicing after that.
19904
19905    * gsnap.c: Reduced default maxpaths to 20 and maxchimerapaths to 2
19906
199072008-07-29  twu
19908
19909    * stage1hr.c: Improved identification of repetitive oligos
19910
19911    * sequence.c: Better handling of FASTA files that end with blank lines
19912
19913    * stage1hr.c, stage1hr.h: Implemented different sizes for insertions and
19914      deletions
19915
19916    * stage3hr.c, stage3hr.h: Added function Stage3_remove_duplicates
19917
19918    * stage1hr.c: Made 12-mer mod 3 strategy work for multiple mismatches,
19919      indels, and exon-exon junctions.
19920
199212008-07-28  twu
19922
19923    * stage1hr.c: Removed special variables for -2, -1, querypos+1, and
19924      querypos+2. Removed middle_indel_p.
19925
19926    * stage1hr.c: Made paired reads use new 12-mer strategy for exact and 1-sub
19927
19928    * stage1hr.h: Changed variable name to expected_pairlength
19929
19930    * datadir.c: Improved error message when genome db not found
19931
19932    * indexdb_hr.c, indexdb_hr.h, intlist.c, intlist.h, stage1hr.c: Implemented
19933      faster version of exact and 1-sub using 12-mers
19934
199352008-07-26  twu
19936
19937    * indexdb_hr.c: Removed oligo_hr.h and oligo_hr.c.  Added code for reading
19938      left and right subst of 1 and 2 nts.
19939
19940    * oligo_hr.c, oligo_hr.h: Removed oligo_hr.h and oligo_hr.c
19941
19942    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
19943      Makefile.pmaptoo.am, block.c, indexdb.c, indexdb.h, stage1hr.c: Removed
19944      oligo_hr.h and oligo_hr.c.
19945
19946    * indexdb_hr.h: Initial import into CVS
19947
199482008-07-17  twu
19949
19950    * stage3hr.c: Fixed handling of trimming for inverted hits.  Fixed handling
19951      of hits that have negative genomic coordinates.
19952
19953    * stage1hr.c: Fixed handling of trimming at ends
19954
19955    * pair.c: Changed output to show "genome" instead of "chr"
19956
19957    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
19958      Makefile.pmaptoo.am, diagnostic.c, diagnostic.h, gmap.c, result.c,
19959      result.h, stage1.c, stage1.h: Added Diagnostic_T to hold information
19960
19961    * chimera.c, chimera.h, get-genome.c, iit_plot.c, match.c, match.h,
19962      stage3.c: Using new interface to Genome_get_segment
19963
19964    * genome.c, genome.h: Printing out-of-bounds characters on all cases where
19965      coordinates exceed chromosomal boundaries.
19966
199672008-07-07  twu
19968
19969    * stage1hr.c: Added comment
19970
19971    * stage1.c: Added high-resolution sampling
19972
19973    * oligo.c, oligo.h: Removed burden of leftreadshift to caller
19974
19975    * indexdb.c, indexdb.h: Added function to provide indexing interval
19976
19977    * block.c, block.h: Added high-resolution behavior to Block_T object
19978
19979    * gmap.c: Added hybrid behavior for finding canonical introns: low reward
19980      for high-identity sequences and high reward otherwise.
19981
19982    * stage1.h: Removed obsolete functions
19983
19984    * stage1.c: Renamed variables
19985
19986    * pair.c, pair.h, stage3.c: Printing separate runtimes for stage2
19987      diagonalization and alignment
19988
19989    * stage2.h: Computing separate runtimes for stage2 diagonalization and
19990      alignment.
19991
19992    * stage2.c: Reinstating limitation on maximum number of active hits.
19993      Computing separate runtimes for stage2 diagonalization and alignment.
19994
19995    * stage1.c, stage1.h: Reporting whether sampling was used
19996
19997    * gmap.c: Using smaller stage 2 indexsize when stage 1 sampling is done
19998
19999    * plotgenes.c, plotgenes.h: Added ability to handle values
20000
20001    * pdldata.c: Using Access_mmap function
20002
20003    * gdiag.c, gsnap-to-iit.c: Using new interface to Genome_fill_buffer
20004
20005    * subseq.c: Added initial '>' to header
20006
20007    * stage3.c: Using new interface to IIT_print
20008
20009    * stage1.c: Removed references to Matchpair_T
20010
20011    * pmapindex.c: Removed -l as an input flag
20012
20013    * oligo.c, oligo.h: Added code for identifying repetitive oligos
20014
20015    * match.c, match.h: Added code for dealing with pairs of matches
20016
20017    * matchpair.c, matchpair.h: Removed Matchpair_T code
20018
20019    * indexdb.h: Restoring previous definition of sufficient support
20020
20021    * iit_update.c: Using new interface to IIT_read
20022
20023    * iit_plot.c: Handling values, in addition to counts and genes
20024
20025    * gregion.c, gregion.h: Added fields to Gregion_T
20026
20027    * gmap.c: Interpreting optarg as strings, not integers
20028
20029    * get-genome.c: Removed -F and -R flags.  Using -R flag for relative
20030      coordinates.
20031
20032    * block.h: Separated interfaces for GMAP and PMAP
20033
20034    * block.c: Added hook for removing repetitive oligos
20035
20036    * binarray.c: Taking all boxes in final step.  Reduced debugging output.
20037
20038    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am,
20039      Makefile.pmaptoo.am: Added binarray.c and .h and removed matchpair.c and .h
20040
20041    * matchpair.c: Added code for bins
20042
20043    * stage1.c: Added two levels of ntopboxes
20044
200452008-07-03  twu
20046
20047    * binarray.c, binarray.h, stage1.c: Implemented working version of binarray
20048      algorithm
20049
20050    * Makefile.util.am, revcomp.c, seqlength.c, subseq.c: Added utility programs
20051      for internal use
20052
200532008-07-02  twu
20054
20055    * binarray.c, binarray.h: Initial import into CVS
20056
200572008-07-01  twu
20058
20059    * stage1.c, stage1.h: Initial implementation of bins
20060
200612008-06-30  twu
20062
20063    * stage3.c, iit-read.c, iit-read.h: Added ability to print relative
20064      coordinates
20065
20066    * stage1hr.c: Improved handling of heaps.  Added code for handling
20067      out-of-bounds conditions.
20068
20069    * sequence.c, sequence.h: Added command for printing revcomp of sequence
20070
20071    * interval.c, indexdb.c: Added debugging statements
20072
20073    * indexdb_hr.c: Replaced separate variables for heapsize and delta with a
20074      single header.  Added code for doing all reads, then doing all writes.
20075
20076    * iitdef.h: Storing separate mmap pointers for parts of IIT
20077
20078    * gmap.c: Removed -R flag
20079
20080    * gsnap.c: Added ability to handle input sequence in batches
20081
20082    * genome.c, genome.h: Changed out-of-bounds symbol to be '*'.
20083
200842008-06-25  twu
20085
20086    * gmap.c: Made low reward for canonical sequences to be the default
20087
20088    * get-genome.c: Fixed calculation of genomiclength
20089
20090    * iit-read.h: Removed unused function.
20091
20092    * iit-read.c: Now doing memory mapping of pointers rather than reading all
20093      of them. Fixed bug in reporting second chromosomal coordinate.  Fixed bug
20094      in sorting segments by coordinate.
20095
200962008-05-19  twu
20097
20098    * indexdb_hr.c: Made process_heap inline.  Removed delta from Batch_T.
20099
20100    * indexdb_hr.c: Various code introduced to improve speed of heapify operation
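
Note on the heapify entries above: the sketch below shows a generic k-way merge of sorted position lists using a binary min-heap, which is the kind of operation these entries optimize. It is a textbook version under stated assumptions, not the code in indexdb_hr.c. The Cursor_T type and the function names sift_down and merge_sorted_lists are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over one sorted list of positions */
typedef struct {
  const unsigned int *ptr;      /* next unread element */
  const unsigned int *end;      /* one past the last element */
} Cursor_T;

/* Restores the min-heap property below node i, keyed on *ptr */
static void
sift_down (Cursor_T *heap, size_t n, size_t i) {
  for (;;) {
    size_t smallest = i, left = 2*i + 1, right = 2*i + 2;
    Cursor_T tmp;

    if (left < n && *heap[left].ptr < *heap[smallest].ptr) smallest = left;
    if (right < n && *heap[right].ptr < *heap[smallest].ptr) smallest = right;
    if (smallest == i) return;
    tmp = heap[i]; heap[i] = heap[smallest]; heap[smallest] = tmp;
    i = smallest;
  }
}

/* Merges nlists sorted lists into out, which must have room for the
   total number of elements.  Returns the number of elements written.
   The lists array is reordered and its cursors are consumed. */
size_t
merge_sorted_lists (unsigned int *out, Cursor_T *lists, size_t nlists) {
  size_t n = 0, nout = 0, i;

  assert(lists != NULL);

  /* Keep only non-empty lists, so every heap node has a valid key */
  for (i = 0; i < nlists; i++) {
    if (lists[i].ptr < lists[i].end) lists[n++] = lists[i];
  }

  /* Heapify bottom-up */
  for (i = n/2; i-- > 0; ) sift_down(lists,n,i);

  while (n > 0) {
    out[nout++] = *lists[0].ptr++;      /* emit smallest, advance its list */
    if (lists[0].ptr == lists[0].end) {
      lists[0] = lists[--n];            /* list exhausted: shrink the heap */
    }
    sift_down(lists,n,0);
  }
  return nout;
}
```

Advancing the root cursor in place and sifting down once per output element avoids a separate pop and push, which is one common way to reduce heap overhead.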
20101
201022008-05-09  twu
20103
20104    * access.c, access.h: Fixed mmap calls with offset so offset is on a page
20105      boundary
20106
20107    * stage3hr.c, stage3hr.h: Added new functions for filtering and sorting
20108      chimeras.  Fixed calls of scrambled exons.
20109
20110    * stage1hr.c: Major efficiency improvements in heapify and other heap
20111      functions for merging diagonals
20112
20113    * indexdb_hr.c: Major efficiency improvements in heapify and other heap
20114      functions for merging batches
20115
20116    * stage1hr.c: Changed filtering methodology for exon-exon junctions
20117
201182008-05-08  twu
20119
20120    * indexdb_hr.c: Using pointers to memory-mapped positions file, and adding
20121      shift-plus-diagterm as heap builds the final array of positions.
20122
201232008-05-07  twu
20124
20125    * indexdb_hr.c: Restored previous version
20126
20127    * indexdb_hr.c: Attempted to reduce D2 cache miss rate, but the change
20128      actually increased it by 10x.
20129
201302008-05-06  twu
20131
20132    * stage3hr.c: Using correct type for Stage3chimera_T objects
20133
20134    * stage1hr.c: Set chimerap flag correctly
20135
20136    * indexdb_hr.c: Faster counting of entries in cases where duplicates are not
20137      allowed
20138
20139    * indexdb.c: Minor syntactic changes
20140
20141    * gsnap.c: Turned off reading of labels for map iit files
20142
201432008-05-05  twu
20144
20145    * gmapindex.c, pmapindex.c: Providing chromosome_iit to procedures for
20146      writing offset and position files
20147
20148    * stage3hr.c, stage3hr.h: Values for chrnum are pre-computed rather than
20149      computed here
20150
20151    * stage1hr.c: Using new interfaces to Genome_fill_buffer and Stage3_new
20152      routines
20153
20154    * indexdb.c, indexdb.h: No longer storing oligomers at ends of chromosomes
20155
20156    * iit-read.c, iit-read.h, iitdef.h: Providing specific fields for memory
20157      mapping of labels and annotations.  Reading all pointers for labels and
20158      annotations.
20159
20160    * genome.c, genome.h: Trimming correctly at chromosome boundaries.
20161      Returning chrnum.
20162
20163    * access.h: Added function to mmap at a particular offset.
20164
20165    * access.c: Added function to mmap at a particular offset.  Added check for
20166      struct stat64.
20167
20168    * stage1hr.c: Searching for indels only if substitution fails
20169
201702008-05-04  twu
20171
20172    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Added trimming of
20173      mismatches at ends of substitutions
20174
201752008-04-25  twu
20176
20177    * stage3hr.c, stage3hr.h: Printing information about paired result type and
20178      about structural variations in spliced reads.
20179
20180    * stage3.c: Removed include of maxent.h
20181
20182    * stage1hr.c, stage1hr.h: Added hierarchy of paired result types.  Checking
20183      for cross repetitiveness.
20184
20185    * sequence.c: Improved debugging statements
20186
20187    * resulthr.c, resulthr.h: Storing information about paired result type
20188
20189    * iit_plot.c, plotgenes.c, plotgenes.h: Fixed handling of new IIT map format
20190
20191    * pair.c: Changed output format for IIT-readable files (-f 7)
20192
20193    * list.c, list.h: Added variant of List_to_array that reports list length
20194
20195    * iit-read.c: Fixed handling of flanking intervals
20196
20197    * gsnap.c: Adding information about paired result type and providing
20198      information about max number of paired paths.
20199
20200    * Makefile.gsnaptoo.am: Using tables for IITs.  Removed iit_update.
20201
20202    * Makefile.dna.am: Using tables for IIT.  Removing gdiag and iit_update.
20203
202042008-04-23  twu
20205
20206    * iit_get.c: Fixed bug in handling queries from stdin
20207
202082008-04-22  twu
20209
20210    * stage3hr.c: Slight improvement in efficiency in eliminating duplicates or
20211      dominated paired end solutions.
20212
20213    * dynprog.c: Reduced mismatch penalty for low quality sequences.  Equalizing
20214      extension penalty for single gaps, regardless of sequence quality.
20215
20216    * maxent.c: Added debugging statements
20217
20218    * stage3hr.c: Fixed bugs in eliminating duplicate or dominated paired-end
20219      results
20220
20221    * iit-read.c: Fixed memory leak for entire IIT structure
20222
20223    * stage3.c: In dual break, peeling pairs first.
20224
20225    * stage3.c: Improved handling of dual breaks by scanning genomic segment
20226
202272008-04-21  twu
20228
20229    * iit_store.c: Fixed bug in handling intervals without divs
20230
20231    * iit-write.c: Added error message if total_nintervals is zero.
20232
20233    * iit-read.c: Modified output for IIT_dump
20234
20235    * fa.iittest, iit_get.out.ok: Modified IIT input/output for new interval
20236      format
20237
20238    * stage3hr.c, stage3hr.h: Added code for printing half introns.  Now storing
20239      chrnum when Stage3_T objects are computed.  Using chrnum to determine
20240      whether two paired ends are connectable.
20241
20242    * stage1hr.c: Using scores to determine whether indel beats substitution.
20243      Added code for finding half introns.  Now storing chrnum when Stage3_T
20244      objects are computed.
20245
20246    * indexdb_hr.c: Added code, not currently used, for using doubles to find
20247      longer oligomers.
20248
202492008-04-15  twu
20250
20251    * stage1hr.c, stage1hr.h: Increased MAX_INDELS, and using it instead of
20252      hard-coded 3
20253
20254    * plotdata.c, segmentpos.c, stage3hr.c, chrnum.c, chrnum.h, chrsubset.c,
20255      gdiag.c, genomepage.c, genomeplot.c: Using new interface to IIT_label
20256
20257    * plotgenes.c, plotgenes.h, stage3.c: Using new interface to IIT_get and
20258      IIT_label
20259
20260    * table.c, table.h: Added functions Table_string_compare and
20261      Table_string_hash
20262
20263    * pmapindex.c, iit_dump.c, iit_plot.c: Using new interface to IIT_read
20264
20265    * match.c, pair.c: Using new interface to Chrnum_to_string
20266
20267    * indexdb.c: Using IIT_total_nintervals
20268
20269    * indexdbdef.h: Moved definition of Indexdb_T to a separate file
20270
20271    * iitdef.h: Added fields for whether labels were read, and for offsets to
20272      various parts of the iit file.
20273
20274    * iit_store.c: Using new version for reporting intervals
20275
20276    * iit_get.c: Using new interface to IIT_get and IIT_read.  Added ability to
20277      center annotations at a given column.
20278
20279    * iit-write.c: Fixed bugs for divs with no intervals
20280
20281    * iit-print.c, iit-print.h: Moved IIT_print procedures back to iit-read.c.
20282
20283    * iit-read.c, iit-read.h: Fixed bug in handling divs with no intervals.
20284      Allowing memory mapping of labels and intervals and their pointers (in
20285      addition to annotations).  Moved IIT_print procedures back to this file.
20286
20287    * gsnap.c: Providing flag for user to specify consecutive matches, to
20288      control speed
20289
20290    * gsnap-to-iit.c: Removed flag for old GSNAP version output format
20291
20292    * gmapindex.c: Using tables to provide information to IIT_write
20293
20294    * get-genome.c, gmap.c: Using new interface to IIT_get and IIT_read
20295
20296    * genome-write.c: Using new interface to IIT_get
20297
202982008-04-10  twu
20299
20300    * gregion.c, match.c, matchpool.c: Made IIT_get_one pass additional parameter
20301
203022008-04-01  twu
20303
20304    * stage1hr.c: Various methods to improve speed, including separate
20305      processing for plus and minus strands, use of threshold_noligomers and a
20306      user-specified threshold_score for finding segments for multiple
20307      mismatches.
20308
203092008-03-31  twu
20310
20311    * stage1hr.c: Removed old code based on fixed (nonrecursive) oligosize
20312
20313    * stage1hr.h: Changed variable names
20314
20315    * stage1hr.c: Using new variable names for paired-end lengths.  Generalized
20316      mask for oligosize.
20317
20318    * gsnap.c: Removed -a flag and replaced it with -S flag.  Changed flags for
20319      paired-end lengths.
20320
20321    * chrnum.c, chrom.c, chrom.h, chrsubset.c, segmentpos.c: Using new interface
20322      to IIT routines with divs.
20323
20324    * get-genome.c: Moved Chrom_string_from_position function to iit-print.c
20325
20326    * stage3hr.h: Changed variable names for paired-end lengths.
20327
20328    * stage3hr.c: Using new interface to IIT routines with divs.  Changed
20329      variable names for paired-end lengths.
20330
20331    * stage1hr.c: Made indel alignments extend inward from ends as far as
20332      possible.
20333
20334    * stage1hr.c: Added new routine for computing indels without using dynamic
20335      programming matrix.  Maximizes matches from left to right.
20336
203372008-03-27  twu
20338
20339    * iit-print.c, iit-print.h, iit-read.c, iit-read.h, iit-write.c,
20340      iit-write.h, iit_get.c, iit_store.c, iitdef.h: Introduced version 3 of IIT
20341      format, to handle multiple divs.
20342
203432008-03-20  twu
20344
20345    * Makefile.dna.am, Makefile.gsnaptoo.am: Removed block_hr and blockdef files
20346
20347    * pmapindex.c: Removed both uppercase and lowercase flags, and added -l flag
20348      to make the distinction
20349
20350    * stage3hr.c: Changed order of output so type of match comes before genomic
20351      location
20352
20353    * stage1hr.c: Handling short reads with lowercase characters.  Using
20354      Oligo_hr functions rather than Block_T functions.
20355
20356    * sequence.c, sequence.h: Added functions to handle short reads with
20357      lowercase characters
20358
20359    * oligo_hr.c, oligo_hr.h: Moved leftreadshift step out of oligo_hr functions
20360
20361    * oligo-count.c: Using new interface to Block_new
20362
20363    * indexdb_hr.c: Removed checking for duplicates
20364
20365    * indexdb.c, indexdb.h: Added ability to mask lowercase characters in genome
20366
20367    * gsnap.c: Made program work for query sequences with lower case
20368
20369    * gmapindex.c: Removed uppercase and lowercase flags and added -l flag.
20370      Making ".masked" indexdb files for masked genomes (where lowercase nts not
20371      indexed).
20372
20373    * genome.c, genome.h: Changed name of variable
20374
20375    * block.c, block_hr.c, block_hr.h, blockdef.h: Restored definition of
20376      Block_T to block.c
20377
203782008-03-05  twu
20379
20380    * gsnap.c: Using new interfaces to Stage1 procedures
20381
20382    * stage1hr.c: Deleted debugging statements that give a seg fault
20383
20384    * stage1hr.c, stage1hr.h: Generalized procedures to use arbitrary oligosize
20385
20386    * stage1.c: Using new interface to Block_new
20387
20388    * block.c, block.h, block_hr.c, blockdef.h, oligo.c, oligo.h, oligo_hr.c,
20389      oligo_hr.h: Generalized procedures to handle arbitrary oligosize
20390
20391    * indexdb_hr.c: Fixed bugs in adding wildcard nucleotides
20392
20393    * indexdb.c: Fixed bug in recognizing index file at interval 6
20394
20395    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am: Added new
20396      source files
20397
20398    * stage3hr.c: Generalized print procedures to handle arbitrarily long reads
20399
20400    * stage1hr.c, stage1hr.h: Added recursive procedures for paired end reads
20401
20402    * stage1.c: Using original GMAP calls to Block_T procedures
20403
20404    * oligo.c, oligo.h, oligo_hr.c, oligo_hr.h: Moved GSNAP-specific procedures
20405      to a separate file
20406
20407    * intlist.c, intlist.h: Added function Intlist_ascending_by_key
20408
20409    * indexdb_hr.c: Moved GSNAP-specific procedures to a separate file.
20410
20411    * indexdb.h: Using "id<number>" as file suffix for offsets and positions
20412      files.
20413
20414    * indexdb.c: Moved definition of Indexdb_T object to a separate file.
20415      Separated GSNAP-specific procedures to a separate file.  Using
20416      "id<number>" as file suffix for offsets and positions files.
20417
20418    * gmapindex.c: Removed -e flag for specifying subindexing
20419
20420    * block_hr.c, block_hr.h: Made separate file for GSNAP-specific procedures
20421
20422    * blockdef.h: Put Block_T definition into a separate file
20423
20424    * block.h: Removed Block_T procedures specific to GSNAP
20425
20426    * block.c: Put Block_T definition into a separate file.  Removed GSNAP
20427      parameters for GMAP calls to Block_T procedures.
20428
204292008-03-04  twu
20430
20431    * stage1hr.c: Implemented recursive method for finding exact matches.
20432      Binary search not yet added.
20433
204342008-03-03  twu
20435
20436    * Makefile.am, cvs2cl.pl: Made maintainer Perl machine-independent
20437
20438    * fa_coords.pl.in, gmap_process.pl.in, gmap_setup.pl.in, md_coords.pl.in:
20439      Made different make commands for gmapdb_highres and gmapdb_lowres
20440
20441    * Makefile.dna.am, Makefile.gmaponly.am, Makefile.gsnaptoo.am: Customized
20442      each Makefile.am for its specific task
20443
204442008-02-29  twu
20445
20446    * gmapindex.c, iit-read.c, segmentpos.c: Using new interface to obtain
20447      strings from Chrom_T objects
20448
20449    * chrom.c, chrom.h: Restricted criteria for considering initial part of
20450      chromosome string as numeric.  Now storing initial string directly.
20451
20452    * gsnap.c: Using new interface to print commands
20453
204542008-02-28  twu
20455
20456    * stage3hr.c, stage3hr.h: Changed output to be more uniform, in a 1-column
20457      format
20458
20459    * list.c: Added include of string.h
20460
20461    * iit_plot.c: Made program able to print counts
20462
20463    * iit-read.c: Added more informative error messages when offset appears
20464      incorrect relative to filesize.  Removed output of type in print_record.
20465
20466    * gsnap-to-iit.c: Handles new GSNAP output format.  Handles remapping to
20467      genome.
20468
20469    * get-genome.c: Made program work correctly on chromosomally tagged IIT map
20470      files
20471
20472    * genomepage.c, genomepage.h: Removed sequence as a parameter
20473
20474    * pair.c, pair.h, stage3.c: Modified output of exon map
20475
20476    * plotgenes.c, plotgenes.h: Added function for printing counts
20477
20478    * Makefile.dna.am, Makefile.gsnaptoo.am, blackboard.c, blackboard.h, gmap.c,
20479      gsnap.c, params.c, params.h, reqpost.c, reqpost.h: Removed Params_T object
20480
204812008-02-26  twu
20482
20483    * gsnap-to-iit.c: Handling new version of gsnap output (after remapping).
20484
20485    * gsnap-to-iit.c: Added -b flag to specify blocksize.  Made default
20486      blocksize 10000.
20487
204882008-02-13  twu
20489
20490    * iit_plot.c, plotgenes.c, plotgenes.h: Fixed printing of genes in ascii
20491      format
20492
204932008-02-08  twu
20494
20495    * plotgenes.c, plotgenes.h: Added binning by pixel.  Removed allgenesp for
20496      plot_counts.
20497
204982008-02-07  twu
20499
20500    * gsnap-to-iit.c, plotgenes.c: Modified count format for IITs to store
20501      information in batches
20502
20503    * plotgenes.c: Added printing of alternate counts.  Fixed problem for calls
20504      to IIT_get_typed.
20505
20506    * gsnap-to-iit.c: Initial import into CVS
20507
205082008-02-06  twu
20509
20510    * iit_plot.c: Increased top margin.  Added -V flag for handling count data.
20511
20512    * plotgenes.c, plotgenes.h: Added function for plotting count data.
20513      Handling signs for both versions 1 and 2 of IIT files.
20514
20515    * iit-read.h: Added interface for IIT_version()
20516
20517    * iit-read.c: Added abort statement for negative coordinates
20518
20519    * sequence.c: Added functions for skipping sequences
20520
20521    * indexdb.c: Commented out some information output to stderr
20522
20523    * iit_get.c: If iit file not found, try adding ".iit" suffix
20524
20525    * stage3hr.c: Printing distances for spliced reads only if distance value is
20526      nonzero
20527
20528    * stage1hr.c: Fixed calculation of distances in spliced reads
20529
205302008-02-05  twu
20531
20532    * Makefile.gmaponly.am, Makefile.gsnaptoo.am, Makefile.pmaptoo.am: Added
20533      compiler commands for iit_plot
20534
20535    * iit_plot.c: Taken from mapplot.c in gdp.
20536
20537    * genomepage.c, genomepage.h: Extracted commands from gdata-write in gdp.
20538
20539    * plotgenes.c, plotgenes.h: Incorporated changes from gdp.  Improved
20540      plotting capabilities.
20541
20542    * list.c, list.h: Incorporated changes from gdp.  Added List_from_string.
20543
20544    * color.c: Incorporated changes from gdp.  Removed yellow.
20545
205462008-01-30  twu
20547
20548    * gsnap.c: Limited reporting of exon-exon paths.  Added -E flag to turn off
20549      finding of exon-exon solutions.
20550
20551    * genome.c, genome.h: Made Genome_fill_buffer return a false value if it
20552      goes into negative genome coordinates.
20553
205542008-01-29  twu
20555
20556    * stage1hr.c: Skipping cases that result in negative genomic coordinates.
20557      Skipping cases of finding first indels when alignment doesn't extend to
20558      the end.
20559
20560    * stage3hr.c, stage3hr.h: Made fixes for handling exon-exon junctions
20561
20562    * stage1hr.c: Fixed problems in handling various combinations of
20563      sense/antisense and plus/minus strands for exon-exon junctions.
20564
20565    * gmap.c: Made finding canonical introns the default.  Made -X flag take an
20566      argument.
20567
205682008-01-16  twu
20569
20570    * stage1hr.c, stage3hr.c, stage3hr.h: Improved algorithm for finding and
20571      ranking chimeras
20572
205732008-01-14  twu
20574
20575    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added output in IIT exon map
20576      format
20577
20578    * stage1hr.c, stage3hr.c, stage3hr.h: Added printing of number of mismatches
20579      for chimeras
20580
20581    * dynprog.c: Added type cast for memset.
20582
20583    * stage1hr.c: Reduced max mismatches to 4.  Penalized mismatches further in
20584      finding breakpoints for chimeras.
20585
20586    * stage3hr.c: Reporting breakpoint coordinates for chimeras
20587
20588    * stage1hr.c: Increased penalty for mismatches to help find correct
20589      breakpoint for chimeras
20590
205912008-01-11  twu
20592
20593    * Makefile.gmaponly.am, Makefile.gsnaptoo.am, Makefile.pmaptoo.am,
20594      stage1hr.c, stage3hr.c, stage3hr.h: Added probabilistic calculations of
20595      splice sites
20596
20597    * gsnap.c, stage1hr.c, stage1hr.h, stage3hr.c, stage3hr.h: Implemented code
20598      for identifying chimeras
20599
20600    * resulthr.c, resulthr.h: Added a new type for chimeras
20601
206022008-01-09  twu
20603
20604    * stage1hr.c: Fixed typo in variable name
20605
20606    * resulthr.c, resulthr.h, stage3hr.c, stage3hr.h, gsnap.c: Added the ability
20607      to report paired-end cases that fail to co-localize
20608
20609    * gmap.c: Cleaned up some code
20610
20611    * gsnap.c, stage1hr.c, stage1hr.h: Added -o flag to specify optimum length
20612
206132008-01-08  twu
20614
20615    * gsnap.c: Added -I flag for specifying inversion of second read of paired
20616      end read
20617
20618    * sequence.c, sequence.h: Added procedure for printing revcomp of a short
20619      read
20620
20621    * stage3hr.c, stage3hr.h: Provided options for printing second read either
20622      in original direction or as reverse complement.
20623
20624    * stage1hr.c: Fixed various memory leaks
20625
20626    * gsnap.c: Added flag to print all solutions, either for single read or for
20627      paired end read.
20628
20629    * stage3hr.c, stage3hr.h: Added procedures for sorting results of single
20630      read mappings
20631
20632    * stage1hr.c, stage1hr.h: Added ability to print all solutions in single read
20633
20634    * stage3hr.c, stage3hr.h: Added sorting of results by closeness to optimal
20635      distance
20636
20637    * gsnap.c: Removed unused variables.  Removed instant printing feature.
20638
20639    * stage1hr.c, stage1hr.h: Removed instant printing feature
20640
206412008-01-07  twu
20642
20643    * resulthr.c, resulthr.h: Generalized Result_T object so it can print either
20644      single or paired end results
20645
20646    * sequence.c, sequence.h: Implemented procedure for reading short reads,
20647      either single or paired ends.
20648
20649    * request.c, request.h: Enabled storage of paired reads in Request_T object
20650
20651    * stage3hr.c, stage3hr.h: Implemented routines for storing and printing
20652      paired ends
20653
20654    * stage1hr.c: Implemented separate strategy for handling reads with poly-A
20655      or poly-T 12-mers.  In such cases, need to test 12-mers exhaustively.
20656
20657    * stage1hr.c, stage1hr.h: Initial implementation of mapping for paired
20658      reads.  For consistency, changed indel to be same rank as sub:2 for single
20659      reads.  Generalized separator used in printing results.
20660
206612008-01-04  twu
20662
20663    * stage1hr.c: Separated single read strategy into separate components
20664
20665    * sequence.c: Fixed a memory leak.
20666
20667    * stage3hr.c, stage3hr.h: Added a stage3 procedure specific for GSNAP
20668
20669    * sequence.c, sequence.h: Added a read procedure that converts input to
20670      uppercase
20671
20672    * gsnap.c, resulthr.c, resulthr.h: Made GSNAP algorithm return results
20673      rather than printing them
20674
20675    * Makefile.gmaponly.am, Makefile.gsnaptoo.am, Makefile.pmaptoo.am: Removed
20676      gregion.c and added stage3hr.c to GSNAP build
20677
20678    * stage1hr.c, stage1hr.h: Made algorithm return results rather than printing
20679      them.  Fixed a bug in handling cases with mismatches on both ends.
20680
206812007-12-19  twu
20682
20683    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added -4 flag for printing
20684      alignments per exon
20685
20686    * gsnap.c: Removed unused code
20687
20688    * stage1hr.c: Removed unused header files and objects
20689
20690    * stage1.c, stage1.h: Added function Stage1_size()
20691
20692    * resulthr.c, resulthr.h: Added specialized Result_T for GSNAP
20693
20694    * reqpost.h: Added conditional include of resulthr.h for GSNAP
20695
20696    * params.c, params.h: Removed obsolete fields
20697
20698    * matchpair.c, matchpair.h: Removed obsolete functions
20699
20700    * iit-read.c: Removed warning message about not finding a file
20701
20702    * stage3.h: Made maponly mode work with Gregion_T objects.
20703
20704    * stage3.c: Fixed bug where genomicuc_ptr was NULL.  Made maponly mode work
20705      with Gregion_T objects.
20706
20707    * gregion.c, gregion.h: Added fields to Gregion_T for maponly mode.
20708
20709    * gmap.c: Restored old maponlyp code.  Added error message if pthreads fails
20710      on operating system.
20711
20712    * genuncompress.c: Added flag to print one character per line
20713
20714    * blackboard.c, blackboard.h: Added function to see if blackboard is done
20715
20716    * Makefile.gmaponly.am, Makefile.gsnaptoo.am, Makefile.pmaptoo.am: Added
20717      commands for GSNAP
20718
207192007-12-07  twu
20720
20721    * pair.c: Fixed bug with trim_end, now an exclusive coordinate rather than
20722      an inclusive one.
20723
20724    * cvs2cl.pl: Added changelog program to CVS
20725
20726    * gmap.c: Changed calls to Stage1_compute to match new interface
20727
20728    * stage1hr.h: Removing unused parameters
20729
20730    * stage1hr.c: Corrected positions of 12-mers for sequences shorter than 36
20731      nt. Reduced final threshold score to 12.  Checking for repetitive sequence.
20732
20733    * stage1.c, stage1.h: Separated stage 1 low-resolution procedure from
20734      high-resolution procedure.
20735
20736    * block.c, block.h, indexdb.c, indexdb.h, oligo.c, oligo.h: Added procedure
20737      for counting number of genomic positions for a given oligomer
20738
20739    * gsnap.c: Now calling Stage1hr_compute directly.  Removed some unused code.
20740
20741    * gmap.c: Restored maponly mode
20742
207432007-12-06  twu
20744
20745    * stage1hr.c: Initial attempt to generalize procedure to handle oligomers
20746      shorter than 36, using initial testing of 3 12-mers.
20747
207482007-12-01  twu
20749
20750    * stage1hr.c: Fixed bugs in computation and printing of middle indels.
20751      Fixed bug when best_querypos (in terms of npositions) was -1.
20752
20753    * stage1hr.c: Added strategy of using most specific oligomer to drive search
20754      for exact matches.
20755
20756    * stage1hr.c: Added keep_score parameter to find_segments.
20757
20758    * indexdb.c: Made heapify faster.  Hard-coded left_nts and right_nts to be 1
20759      in read_shifted.
20760
20761    * genome.c: In Genome_fill_buffer, checking for negative starting
20762      coordinate, and filling with N's if necessary.
20763
207642007-11-30  twu
20765
20766    * stage1hr.c: Limited dynamic programming to just the non-matching oligomer,
20767      whenever possible.
20768
20769    * stage1hr.c: Added a triplet matching step with binary search to find exact
20770      matches.  Fixed a bug in find_segments in handling the last diagonal.
20771
207722007-11-29  twu
20773
20774    * stage1hr.c: Changed output format to show substitutions, insertions, and
20775      deletions.  Made speed improvements in heap algorithm.
20776
20777    * gsnap.c: Simplified code for handling short reads.  Stopped usage of
20778      oligoindex.
20779
20780    * stage1hr.c: Implemented version that handles indels.  Some speed
20781      improvements in reporting exact matches when found.
20782
207832007-11-27  twu
20784
20785    * stage1hr.c: Removed unused code
20786
20787    * stage1hr.c, stage1hr.h: Working version implemented for 36-mers, allowing
20788      for substitutions
20789
20790    * stage1.h: Taking queryseq as an argument for Stage1_compute (needed for
20791      gsnap).
20792
20793    * stage1.c: Using new interface to Block_process_oligo
20794
20795    * Makefile.am: Makefile.am now generated by bootstrap from other files
20796
20797    * stage2.c: Added debugging statements
20798
20799    * sequence.c, sequence.h: Added procedure Sequence_print_oneline
20800
20801    * rbtree.c, rbtree.h, rbtree.t.c, gregion.c, gregion.h: Initial import into
20802      CVS
20803
20804    * result.c, result.h: Added procedure Result_blank
20805
20806    * params.c, params.h: Removed truncstep
20807
20808    * oligoindex.c: Added correct calculation of badoligos
20809
20810    * oligo.c, oligo.h: Providing diagterm information to lookups from indexdb
20811
20812    * indexdb.c, indexdb.h: Changed high-resolution indexdb to be subclassified
20813      by adjacent nucleotides, rather than by phase.
20814
20815    * gsnap.c: Adding separate main program for gsnap.
20816
20817    * block.h: Added function Block_skipto.  Giving diagterm information to
20818      Oligo_lookup.
20819
20820    * block.c: Added function Block_skipto.  Revised coordinates assigned to
20821      last_querypos.  Giving diagterm information to Oligo_lookup.
20822
20823    * Makefile.gmaponly.am, Makefile.pmaptoo.am: Added hooks for gsnap
20824
20825    * fa_coords.pl.in: Improved handling of cases where chromosome is not parsed
20826
20827    * gmap_setup.pl.in: Added -H flag to generate high-resolution gmap dbs.
20828
208292007-11-26  twu
20830
20831    * iit_store.c: Fixed bug in handling GFF files
20832
208332007-11-14  twu
20834
20835    * indexdb.c, indexdb.h: Implemented precise positioning by organizing
20836      composite positions according to phase
20837
208382007-11-13  twu
20839
20840    * result.c, result.h: Removed stage 1 diagnostic information
20841
20842    * matchpair.c, matchpair.h: Making matchpair generate gregion as output from
20843      stage 1
20844
20845    * gmap.c: Using new interface to stage 1.  Removed maponly output.
20846
20847    * Makefile.gmaponly.am, Makefile.pmaptoo.am, stage1hr.c, stage1hr.h: Moved
20848      high-resolution stage 1 algorithm to a different file
20849
20850    * stage1.c, stage1.h: Eliminated diagnostic fields.  Made interface for
20851      low-resolution version compatible with high-resolution version.
20852
208532007-11-02  twu
20854
20855    * stage3.c, stage3.h: Removed matchpairend and Stage3_direct procedure
20856
20857    * stage1.c, stage1.h: Reverting back to 2007-09-28 version
20858
20859    * pmapindex.c: Changed order of arguments in a function call
20860
20861    * params.c, params.h: Added slots for truncstep and chromosomal transitions.
20862
20863    * list.c, list.h: Added functions List_insert and List_reinsert.
20864
20865    * indexdb.c, indexdb.h: Added function Indexdb_shiftedp.  For
20866      high-resolution indexdbs, added code to merge batches using either a queue
20867      or a heap.
20868
20869    * iit-read.c, iit-read.h: Added function IIT_transitions_subset.
20870
20871    * gmapindex.c: Added -e flag to specify high-resolution genomic indices
20872
20873    * chrsubset.c, chrsubset.h: Added function Chrsubset_transitions.  Added
20874      assumption to Chrsubset_includep.
20875
20876    * chrnum.c, chrnum.h: Added function Chrnum_print_position
20877
20878    * block.h: Added function Block_donep.
20879
20880    * block.c: Improved debugging output.  Added function Block_donep.
20881
20882    * access.h: Added function to report if file exists.
20883
20884    * access.c: Improved error messages.  Added function to report if file
20885      exists.
20886
208872007-10-16  twu
20888
20889    * stage1.c: Refined high-resolution algorithm
20890
208912007-10-11  twu
20892
20893    * orderstat.c: Included appropriate header files for memcpy
20894
208952007-10-08  twu
20896
20897    * reader.c: Made reader go all the way to the ends of the sequence
20898
20899    * sequence.c: Fixed computation of trimlength
20900
20901    * indexdb.c, indexdb.h: Implemented read and write procedures for new
20902      genomic index format (trading off position resolution for adjacent
20903      nucleotide contents).
20904
209052007-10-07  twu
20906
20907    * stage1.c: Implemented mapping at ends
20908
209092007-10-06  twu
20910
20911    * stage1.c: Completed initial mapping from middle outward
20912
20913    * stage1.c: Added computation of best subpaths
20914
209152007-10-03  twu
20916
20917    * stage1.c: Implemented high-resolution mapping, and arbitrarily long
20918      matches from the middle of the sequence outward.
20919
209202007-09-30  twu
20921
20922    * stage1.c: Attempt to use diagonals to find genomic position
20923
209242007-09-29  twu
20925
20926    * genome.c, genome.h: Added Genome_totallength function
20927
20928    * stage1.c, stage1.h: Added procedure to match doubles of truncated indexdb
20929      entries
20930
209312007-09-28  twu
20932
20933    * gmap_setup.pl.in: In -S mode (treating each contig as a chromosome),
20934      turning off sorting of chromosomes and contigs.
20935
20936    * gmapindex.c: Added -S flag to turn off sorting of chromosomes and contigs
20937
20938    * table.c, table.h, tableint.c, tableint.h: Added ability to return keys
20939      sorted by timeindex
20940
20941    * stage1.c, trial.c, trial.h: Changes made to scan query sequence from
20942      middle outward
20943
20944    * VERSION: Updated version
20945
209462007-09-27  twu
20947
20948    * gmap.c: Fixed bug for -f 9 and -E output when no paths were found
20949
209502007-09-26  twu
20951
20952    * VERSION: Updated version
20953
20954    * index.html: Revised features for 2007-09-26 version
20955
20956    * gmap_update.pl.in: Made new IIT file permissions the same as the old
20957      permissions
20958
20959    * iit-read.c: Added error messages to various conditions in IIT_read
20960
209612007-09-25  twu
20962
20963    * sequence.c: Fixed reading of sequences with multiple PC line feeds
20964
209652007-09-20  twu
20966
20967    * stage1.c: Kept code that depended on USE_MATCHPOOL and removed alternate
20968      (old) code
20969
20970    * block.c, block.h: Put save variables inside Block_T object
20971
209722007-09-19  twu
20973
20974    * VERSION: Updated version number
20975
20976    * MAINTAINER: Added reminder to do cvs tag
20977
20978    * iit-read.c, iit-read.h, iit-write.c: Moved compute_flanking procedure from
20979      iit-read.c to iit-write.c
20980
20981    * configure.ac, Makefile.am, gmap_update.pl.in: Added gmap_update program
20982
20983    * Makefile.am: Added compile instructions for iit_update
20984
20985    * iit-write.c: Made stringlen of type off_t (to handle annotations of length
20986      greater than can be handled by int).  Added check to make sure stringlen
20987      is non-zero.
20988
20989    * iit-read.c: Made stringlen of type off_t (to handle annotations of length
20990      greater than can be handled by int)
20991
20992    * archive.html, index.html: Made changes for 2007-09-20 release
20993
209942007-09-18  twu
20995
20996    * Makefile.gmaponly.am, Makefile.pmaptoo.am, iit-read.c, iit-read.h,
20997      iit-write.c, iit-write.h, iit_update.c: Implemented iit_update program
20998
20999    * iit_store.c: Added -v flag to specify desired version
21000
210012007-09-17  twu
21002
21003    * oligoindex.c, stage2.c: Changed R output for diagonal graphics
21004
210052007-09-12  twu
21006
21007    * gmap.c: Added a check to make sure we don't push NULL for Stage3_T object.
21008
21009    * dynprog.c: Fixed bug in Dynprog_dual_break; need to compute matrix scores
21010      only to the minimum of length1 and length2.
21011
210122007-09-11  twu
21013
21014    * dynprog.c: Fixed problem where Dynprog_dual_break was exiting
21015      unnecessarily; need to be concerned only about shorter distance.
21016
21017    * stage3.c: Added cDNA direction to debugging statements
21018
21019    * iit_get.c, stage3.c: Added sign argument for getting flanking entries
21020
21021    * pair.c: Added provision in PMAP to limit coverage to 100% (could exceed
21022      previously because of implicit stop codon added at end of query sequence).
21023
21024    * iit-read.c, iit-read.h: Added a sign argument for getting flanking entries
21025
21026    * get-genome.c: Added flags for accessing from map files entries of a
21027      particular direction or tag
21028
21029    * stage1.c: Performing filtering based on clustersize only if too many
21030      entries and at least one cluster is large.
21031
210322007-09-04  twu
21033
21034    * stage1.c: Removed filtering based on too many matching pairs
21035
210362007-08-30  twu
21037
21038    * gbuffer.c, gbuffer.h, gmap.c: Removed unused code and parameters from
21039      Gbuffer_T
21040
21041    * gmap.c: Allocating memory for genomicseg only as needed
21042
210432007-08-28  twu
21044
21045    * blackboard.c, blackboard.h, gmap.c, sequence.c, sequence.h: Added ability
21046      to read input from multiple sequence files
21047
21048    * oligoindex.h: Changed calls to reset oligoindex
21049
21050    * oligoindex.c: Fixed hang that resulted when no oligomer positions were
21051      found.  Eliminated an extra call to Oligoindex_set_inquery.
21052
21053    * stage2.c: Changed call to Oligoindex to reset after tally
21054
21055    * stage1.c: Modified debugging output
21056
21057    * match.c, mem.c: Enhanced debugging output
21058
21059    * stage2.c, stage2.h: Returned to previous algorithm for finding shifted
21060      canonical dinucleotides, but now allocating memory dynamically.
21061
21062    * gbuffer.c, gbuffer.h: Removed pre-allocated memory for finding shifted
21063      dinucleotides
21064
21065    * stage2.c: Attempted to conserve memory used in finding shifted canonical
21066      dinucleotides.  However, this results in a speed penalty.
21067
210682007-08-26  twu
21069
21070    * gbuffer.c, gbuffer.h: Removed unused matchscores variable and unnecessary
21071      memory allocation.
21072
210732007-08-23  twu
21074
21075    * gmap_uncompress.pl.in: Added a missing space in the output
21076
21077    * gmap_uncompress.pl.in: Added coordinates output (with flag '-f 9')
21078
210792007-08-22  twu
21080
21081    * pair.c: Fixed potential divide-by-zero bug
21082
210832007-08-20  twu
21084
21085    * gmap_setup.pl.in: Added a .SUFFIXES: command at top to prevent unexpected
21086      behaviors
21087
210882007-08-18  twu
21089
21090    * stage2.c: Added step to recover when all scores at a querypos are
21091      negative, by continuing from grand result.
21092
210932007-08-16  twu
21094
21095    * pair.c, pair.h, stage2.c, stage2.h, stage3.c: Computing defect rate in
21096      middle of stage 3, instead of in stage 2
21097
210982007-08-15  twu
21099
21100    * oligoindex.c: Restored amino acid alphabet to 20 from 18.
21101
21102    * oligoindex.c: Fixed typo in variable name
21103
21104    * stage2.c: Inactivated limit on number of active hits
21105
21106    * Makefile.gmaponly.am, Makefile.pmaptoo.am: Added orderstat.c and
21107      orderstat.h to code
21108
21109    * orderstat.c, orderstat.h: Modified procedures to compute order statistics
21110      in place and for both doubles and ints.
21111
21112    * oligoindex.c: Computing overabundance based on upper percentile of
21113      non-zero counts
21114
21115    * doublelist.h: Added Id info to header
21116
21117    * gmap.c: Fixed memory leak when user segment is provided
21118
21119    * orderstat.c, orderstat.h: Added orderstat to CVS
21120
21121    * oligoindex.c, oligoindex.h: Trial to eliminate limit on maxoligohits
21122
21123    * stage2.c: Improved output for graphical debugging
21124
21125    * oligoindex.c: Improved debugging output
21126
211272007-08-13  twu
21128
21129    * stage2.c: Fixed problem with debugging output
21130
21131    * stage1.c: Added pruning by path sizes
21132
21133    * matchpair.c, matchpair.h: Added a procedure for finding path size of a
21134      given matchpair
21135
211362007-07-16  twu
21137
21138    * Makefile.gmaponly.am, Makefile.pmaptoo.am: Created two specialized
21139      Makefile.am files
21140
21141    * Makefile.am: Preparing for iit and genome libraries
21142
21143    * bootstrap, bootstrap.gmaponly, bootstrap.pmaptoo: Created separate
21144      bootstrap routines for gmap and gmap-plus-pmap
21145
21146    * VERSION: Updated version number
21147
21148    * iit-read.c: Computing alphas and betas for iit_dump
21149
21150    * iit-read.c: Computing alphas and betas only when needed for flanking
21151
21152    * gmapindex.c, iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_get.c,
21153      iit_store.c, iitdef.h: Added fields for annotation in IITs
21154
21155    * indexdb.c: Added monitoring information
21156
211572007-06-25  twu
21158
21159    * genome-write.c, segmentpos.c: For version 2 IITs and later, getting sign
21160      directly from IIT, rather than from annotation.
21161
21162    * get-genome.c, plotgenes.c: Providing sortp parameter to IIT_get
21163
21164    * iit_store.c: Added -v flag to print IIT version
21165
21166    * iit_get.c: Added -U flag to indicate unsigned results
21167
21168    * genome.c: Added comment
21169
21170    * gmapindex.c: No longer writing segment length in contig IITs
21171
21172    * iit-write.c: Writing alphas and betas for correct calculation of flanking
21173      intervals.
21174
21175    * iit-read.c, iit-read.h: Using alphas and betas for correct calculation of
21176      flanking intervals. Added functions IIT_types, IIT_get_all, and
21177      IIT_get_all_typed.
21178
21179    * interval.h: Added interface for Interval_sign
21180
21181    * iitdef.h: Added space for alphas and betas, needed for correct calculation
21182      of flanking intervals
21183
211842007-06-22  twu
21185
21186    * dynprog.h: Defined UNKNOWNJUMP to be used for temporary gapholders during
21187      stage 3 calculations.
21188
21189    * dynprog.c: Returning NULL on all failures, without gapholders (which are
21190      now inserted by calling procedures in stage 3).  Allowing 5' and 3'
21191      extensions to work up to the maxlength allowed.
21192
211932007-06-21  twu
21194
21195    * iitdef.h: Added version to IIT_T
21196
21197    * interval.c: Storing sign for each interval
21198
21199    * iit-read.c, iit-read.h, iit-write.c, iit-write.h: Introduced version 2
21200      format, which stores sign for each interval.  Not using annotation anymore
21201      to represent sign.  Added function IIT_find_multiple.
21202
212032007-06-20  twu
21204
21205    * pairpool.c: Improved debugging statements
21206
21207    * gdiag.c, get-genome.c, gmap.c, iit-read.c, iit-read.h, iit_get.c,
21208      interval.c, interval.h, plotgenes.c, segmentpos.c, stage3.c: Added ability
21209      to sort intervals by coordinates in IIT_get routines
21210
21211    * mem.c: In comments, showing how TRAP should be defined
21212
21213    * stage3.c: Whenever dynprog procedure returns NULL, make sure to put back
21214      peeled pairs and insert a gapholder.  Fixes a bug in BQ672778 against
21215      hg18. Jumps in gapholders now calculated only in certain procedures.
21216
212172007-06-06  twu
21218
21219    * stage3.c: Removed abort commands on peels that run into gaps
21220
21221    * dynprog.c: Expanded on comment
21222
212232007-06-04  twu
21224
21225    * VERSION, index.html: Updated version
21226
21227    * dynprog.c: Removed insertion of gapholder for a single gap that is too
21228      long to solve.
21229
212302007-06-02  twu
21231
21232    * stage1.c: Added debugging statement to signal end of stage 1
21233
21234    * dynprog.c: Eliminated allocation of temporary Dynprog_T objects
21235
21236    * pair.c: Now printing query_skip in -A and -S output
21237
212382007-05-29  twu
21239
21240    * dynprog.c: Lowered gap penalties for single gaps.  Fixed bug in solving
21241      dynamic programming for lower-case input sequences.
21242
21243    * stage3.c: Added procedures for cleaning non-matches at ends of alignment,
21244      which are always called.
21245
21246    * stage3.c: Removing all nonmatches at 5' and 3' ends
21247
212482007-05-25  twu
21249
21250    * VERSION: Updated version
21251
21252    * index.html, archive.html: Made changes to reflect 2007-05-25 version
21253
21254    * pair.c, pair.h, stage3.c: Added coverage and identity information to GFF3
21255      output
21256
212572007-05-24  twu
21258
21259    * configure.ac: Checking both fixed and variable mapping for mmap
21260
21261    * fa_coords.pl.in: Fixed problem in parsing lines containing the word
21262      "chromosome"
21263
21264    * stage3.c: Fixed bug in solving dual breaks
21265
21266    * pair.c: Fixed PSL output so query and target gaps are computed directly
21267      from the block starts and lengths.
21268
21269    * gmap.c: Added flag -j for showing dual breaks
21270
21271    * dynprog.c: Fixed bug where solution of dual break exceeded minimum gap
21272
212732007-05-15  twu
21274
21275    * stage3.h: Added do_final_p parameter to Stage3_compute
21276
21277    * stage3.c: Incorporating procedure to trim bad middle exons
21278
21279    * stage1.c: Performing salvage if total number of matches is relatively low
21280
21281    * smooth.c: Added to exon length in smoothing for short exons
21282
21283    * pair.c: Added test code to print information about extra exons
21284
21285    * intron.c: Removing call to abort.
21286
21287    * dynprog.c: Added code to bridge dual break with rewards for canonical
21288      introns. Tweaked some parameters, including less penalty for gap
21289      extensions.
21290
212912007-04-25  twu
21292
21293    * mem.c: Added error messages for memory allocation problems
21294
21295    * gmap.c: Added -X flag for heavily favoring canonical and semi-canonical
21296      introns
21297
212982007-04-23  twu
21299
21300    * gmap.c: Made strict translation the default again
21301
21302    * indexdb.c: Added "U" to integers in bit operations
21303
21304    * genome.c: Using two arrays instead of one for translate.
21305
21306    * compress.c: Using two arrays instead of one for translate.  Added various
21307      abort checks.
21308
21309    * translation.c: Removed unnecessary checks of extraexonp
21310
21311    * pair.c: Fixed Pair_dump_one so it handles extraexonp flag
21312
21313    * pairpool.c: Fixed bug in assigning extraexonp
21314
21315    * pair.c: Changed add_intronlengths slightly
21316
21317    * acx_mmap_variable.m4: Moved AC_DEFINE out of macro, to be called
21318      explicitly in configure.ac
21319
21320    * acx_mmap_fixed.m4: Added macro for testing mmap with MAP_FIXED
21321
21322    * gmap.c: Made frameshift-tolerant translation the default
21323
21324    * pair.c: Changed output of CDS in gff3 mode to produce an in-frame protein
21325      sequence
21326
21327    * translation.h, stage3.h: Added strictp for PMAP.
21328
21329    * translation.c: Added strictp for PMAP.  Handling extraexonp items in
21330      alignment.
21331
21332    * stage3.c: Giving extraexonp information to pairpool procedures for
21333      gapalign items
21334
21335    * pair.h: Added procedure for computing fractional error.
21336
21337    * pair.c: Added provisions for handling extraexonp flag.  Added procedure
21338      for computing fractional error.
21339
21340    * pairpool.c, pairpool.h: Added provisions for handling extraexonp flag
21341
21342    * pairdef.h: Added flag for extra cDNA exon
21343
21344    * comp.h: Added comp for extra cDNA exon
21345
21346    * stage3.c: Added procedures for trimming internal exons with poor matches,
21347      and for finding extra exons in a dual break.  Added strictp for PMAP.
21348
21349    * dynprog.c, dynprog.h: Added procedures for finding extra exons in a dual
21350      break
21351
21352    * compress.c: Provided more informative error message
21353
21354    * oligoindex.c: Handled an arithmetic error caused by divide by zero
21355
21356    * gmap.c: Added flag -H for handling trimming of middle exons and reporting
21357      of dual breaks.  Removed flag -j.  Allowed strictp to be used in PMAP.
21358
213592007-04-16  twu
21360
21361    * align.test.ok, map.test.ok: Made test output match current output
21362
21363    * acx_mmap_variable.m4, configure.ac: Added check for mmap using
21364      MAP_VARIABLE (because AIX fails on MAP_FIXED)
21365
21366    * gmap.c: Fixed global variables not used in PMAP
21367
21368    * stage2.c: Fixed check for negative scores to work for PMAP
21369
21370    * stage1.c: Placing bound on number of samples taken over query sequence
21371
21372    * stage2.c: For very long sequences, pruning based on clear coverage
21373
21374    * oligoindex.c, oligoindex.h: Added computation of clear coverage (percent
21375      of query sequence covered by relatively few diagonals).
21376
21377    * matchpair.c: Added debugging statements
21378
213792007-04-06  twu
21380
21381    * stage2.c: Added restriction on distance for grand lookback.  Added check
21382      on nactive for scoring at querypos.
21383
21384    * stage2.c: Added nactive as a filter in stage 2
21385
213862007-04-02  twu
21387
21388    * stage2.c: Added comment about a way to restrict debugging output
21389
21390    * iit-read.c: Added start/end information to endpoints in dumping counts
21391
213922007-03-22  yunli
21393
21394    * modules: *** empty log message ***
21395
213962007-03-21  yunli
21397
21398    * modules: *** empty log message ***
21399
214002007-03-08  twu
21401
21402    * stage2.c: Quitting if totalpositions is zero
21403
21404    * gmap.c: Enabled table output (-f 9) for relative alignment mode (-w).
21405
214062007-03-03  twu
21407
21408    * stage2.c: Fixed bug which led to the partial fill problem in filling
21409      oligomers
21410
214112007-03-02  twu
21412
21413    * Makefile.am: Added pthread_libs for iit utilities
21414
214152007-03-01  twu
21416
21417    * gdiag.c: Added printing of centromere regions.  Printing marginals more
21418      efficiently.
21419
21420    * Makefile.am: Added chrnum and chrsubset files to gdiag
21421
21422    * iit-read.c, iit-read.h: Made IIT_transitions function return signs
21423
214242007-02-28  twu
21425
21426    * gdiag.c: Added flag for ignoring main diagonal.  Fixed problem with
21427      printing revcomp diagonals.  Added code for computing different types of
21428      patterns.
21429
214302007-02-21  twu
21431
21432    * gdiag.c: Added hooks for user to enter chromosomal subsets
21433
21434    * complement.h: Added a character code string that doesn't convert to
21435      uppercase, for gdiag.
21436
214372007-02-20  twu
21438
21439    * gdiag.c: Added ability to use map iit files.  Skipping over masked regions
21440      in determining lookback.
21441
21442    * gdiag.c: Fixed bug in retrieving last part of sequence from gmapdb.
21443      Providing flag to ignore lowercase (e.g., masked) characters in query.
21444
21445    * gdiag.c: Added ability for user to provide genomic segment
21446
21447    * Makefile.am: Added iit-write source files for gdiag
21448
21449    * gdiag.c: Fixed bug where genomestart could be less than genomeend in a
21450      diagonal.  Made separate procedures for updating forward and revcomp
21451      diagonals.
21452
21453    * indexdb.c, indexdb.h: Added procedure IIT_read_inplace for gdiag.
21454
21455    * iit-write.h: Added procedure IIT_new to allow creation and use of iit in
21456      the same program.
21457
21458    * iit-write.c: Added procedure IIT_new to allow creation and use of iit in
21459      the same program.  Simplified code for Node_fwrite.
21460
21461    * iit-read.c, iit-read.h: Added procedure IIT_transitions
21462
214632007-02-18  twu
21464
21465    * gdiag.c: Added a ring structure to increase speed
21466
21467    * genome.c, genome.h: Added procedures for gdiag
21468
21469    * gdiag.c: Made speed improvements by not storing full 24-mers, but rather
21470      storing results of previous 12-mers
21471
21472    * gdiag.c: Added calculation of diagonals, ability to read query from
21473      gmapdb, and storage of intervals.
21474
214752007-02-16  twu
21476
21477    * Makefile.am, gdiag.c: Added program gdiag
21478
214792007-02-12  twu
21480
21481    * gmap.c: Increased parameter for maxoligohits
21482
21483    * stage3.c: Lowered parameter for intronlen
21484
21485    * stage2.h: Removed unused function
21486
21487    * stage2.c: Changed distance penalty to consider both gendistance (now
21488      linearly, instead of logarithmically) and querydistance (quadratically).
21489      Using both maxnconsecutive and pct_coverage to decide whether to continue
21490      with stage 2.
21491
21492    * stage1.c: Reduced parameter for number of trials
21493
21494    * pair.c: Fixed calculation of coverage for PMAP
21495
21496    * oligoindex.h: Moved parameters here
21497
21498    * oligoindex.c: Implemented algorithm for PMAP.  Allowing a diagonal to
21499      dominate only if it is completely consecutive.
21500
215012007-02-09  twu
21502
21503    * iit-read.c, iit-read.h: Added function to dump labels
21504
21505    * gmap.c: Fixed bugs with map iit files: bad test for distinguishing between
21506      universal map files and chromosomal map files, and incorrectly checking
21507      map tags against chromosomal iit.
21508
215092007-02-08  twu
21510
21511    * oligoindex.c, oligoindex.h: Added computation of percent coverage by
21512      diagonals
21513
21514    * access.c: Added a debugging statement
21515
21516    * gmap.c: Fixed floating point error when trimoligos is zero
21517
21518    * gmap.c, oligoindex.c, oligoindex.h, stage2.c, stage2.h: Added graphical
21519      debugging output for stage 2
21520
215212007-02-07  twu
21522
21523    * smooth.c: Distinguishing use of genomejump and queryjump lengths in
21524      pre-single-gap smoothing versus post-single-gap smoothing.
21525
21526    * gmap.c: Fixed floating exception when sequence has no oligos
21527
21528    * stage2.c: Initializing guide in 5' trim region, until first hits are found.
21529
21530    * gmap.c: Changed default pruning behavior to be no pruning.
21531
21532    * oligoindex.c: Made speed improvements in scanning diagonals.  Removed old
21533      code for computing maxconsecutive.
21534
21535    * oligoindex.h, stage2.c: Changed name of variable
21536
21537    * oligoindex.c: Eliminated convex hull algorithm and implemented method
21538      based on ordering of diagonals.
21539
21540    * diag.c, diag.h: Removed unused procedures
21541
215422007-02-06  twu
21543
21544    * oligoindex.c: Implemented a convex hull algorithm to determine minactive
21545      and maxactive bounds.
21546
21547    * diag.c, diag.h: Added a procedure to sort diagonals based on closeness to
21548      origin
21549
21550    * oligoindex.c, oligoindex.h, stage2.c: Restored computation of
21551      maxgoodconsecutive to filter out bad stage1 candidates
21552
21553    * stage2.c: Allowing fill of nucleotides to occur even when
21554      querypos/lastquerypos or genomepos/lastgenomepos are too close.
21555
21556    * oligoindex.c: Added new procedure for determining dominance among diagonals
21557
21558    * diag.c, diag.h: Added procedure for sorting by nconsecutive
21559
21560    * stage2.c: Using minactive and maxactive to bound current querypos, and
21561      active to determine available hits for previous querypos.
21562
21563    * oligoindex.c, oligoindex.h: Computing diagonals inside
21564      Oligoindex_get_mappings procedure. Implemented simple dominance procedure.
21565
21566    * diag.c, diag.h, diagdef.h, diagpool.c, diagpool.h: Added fields for
21567      nconsecutive and dominancep
21568
215692007-02-05  twu
21570
21571    * oligoindex.c: Changes in parameters
21572
21573    * stage2.c: Changes to debugging output
21574
21575    * oligoindex.c: Speed improvements by inlining calls to Intlist accessors
21576
21577    * stage2.h: Removed unused function
21578
21579    * oligoindex.h, stage2.c: Using length as a criterion instead of
21580      nconsecutive for proceeding to dynamic programming
21581
21582    * oligoindex.c: Reduced requirement for nconsecutive in scanning diagonals.
21583      Keeping cumulative track of highest and lowest diagonals.  Added
21584      extra_bounds to diagonal bounds.
21585
21586    * Makefile.am, diag.c, diag.h, diagdef.h, diagpool.c, diagpool.h, gmap.c:
21587      Adding Diag_T and Diagpool_T objects for scanning diagonals
21588
21589    * oligoindex.c, oligoindex.h, stage2.c, stage2.h: Scanning diagonals to set
21590      bounds on active oligomers
21591
215922007-02-04  twu
21593
21594    * Makefile.am, gmap.c, intpool.c, intpool.h, oligoindex.c, oligoindex.h,
21595      stage2.c, stage2.h: Added Intpool_T object to manage storage for Intlist_T
21596      objects
21597
21598    * gmap.c: Using new interface to Stage3_compute
21599
21600    * stage2.c: In dynamic programming, added a lookback to the grand best
21601      querypos and hit
21602
21603    * intlist.c, intlistdef.h: Provided exposure to internal structure of
21604      Intlist_T
21605
216062007-02-03  twu
21607
21608    * oligoindex.c, oligoindex.h, stage2.c: Implemented faster algorithm for
21609      identifying active stage 2 oligomers.
21610
21611    * stage2.h: Removed unused procedures
21612
21613    * oligoindex.c, oligoindex.h, stage2.c: Implemented procedures to skip over
21614      unused mappings, based on active
21615
216162007-02-02  twu
21617
21618    * gmap.c, oligoindex.c, oligoindex.h, stage2.c, stage2.h: Implemented new
21619      stage 2 procedure.  Now using oligoindex at minindexsize and filtering
21620      those hits according to a local search. Initial search is based on
21621      maxindexsize.  Uncovered ends of the alignment receive a looser local
21622      search criterion.  Increased stage 2 lookback from 60 to 100.
21623
21624    * translation.c: Fixed bug where first cDNA amino acid appeared under a cDNA
21625      space.
21626
21627    * stage1.c: Adding one more trial for long sequences
21628
21629    * pair.h, stage3.h: Computing coverage and now trimmed coverage at print
21630      time.
21631
21632    * pair.c: Computing coverage and now trimmed coverage at print time.  Added
21633      output line for trimmed coverage.  Added trim information in compressed
21634      (-Z) output.  Adjusting output of trim boundaries based on alignment.
21635
21636    * stage3.c: Computing coverage and now trimmed coverage at print time.
21637      Increased parameter for minintronlength.
21638
21639    * gmap.c: Removed -X (cross-species) flag
21640
216412007-01-31  twu
21642
21643    * oligoindex.c: Added code for 18-amino-acid alphabet
21644
21645    * gmap.c: Made user_stage1p false
21646
21647    * stage2.c: Trying to make penalties consistent across different cases
21648
21649    * stage3.c: Initialized variables in Stage3_T object
21650
21651    * pair.c: Fixed memory leak in gff exon mode (-f 2)
21652
216532007-01-07  twu
21654
21655    * stage3.c: If both single and dual gap solutions are canonical, picking
21656      solution with best score.
21657
216582007-01-06  twu
21659
21660    * dynprog.c: Made reward for final canonical intron uniform across defect
21661      rates. Boosted reward for final semicanonical intron to match that for
21662      canonical intron.
21663
216642007-01-05  twu
21665
21666    * pair.c: Fixed dinucleotide output in compressed (-Z) format when
21667      user-provided genomic segment has lower-case characters.
21668
216692007-01-04  twu
21670
21671    * stage2.c: Reduced value of EQUAL_DISTANCE, to favor better local alignment
21672      over longer global alignment
21673
21674    * stage3.c: Counting exons only after gaps filled in
21675
21676    * pair.c, pair.h: Added procedure for counting exons after gaps filled in
21677
216782007-01-03  twu
21679
21680    * dynprog.c: Made one-sided gap behavior true only for single gaps and end
21681      gaps
21682
21683    * stage3.c: Removed fix_pmap_holes function and all references to it
21684
216852006-12-18  twu
21686
21687    * index.html, VERSION: Updated version
21688
21689    * translation.c: Prevented assignment of incomplete last codon on cDNA side
21690      in strict mode
21691
21692    * stage1.c: Removed unused variables
21693
21694    * matchpair.c: Increased EXTRA_SHORTEND
21695
21696    * gmap.c: Reduced default trimexonpct.  Changed bandwidths for single and
21697      end gaps.
21698
21699    * dynprog.c: Added onesidegap behavior, which allows gaps on either genomic
21700      or cDNA side, but not both.  Added concept of fixeddestp, which is not
21701      true for the ends.
21702
217032006-12-15  twu
21704
21705    * VERSION: Updated version
21706
21707    * index.html: Made changes to reflect new version and strict translation as
21708      default
21709
21710    * gmap.c: Reduced extraband_end
21711
21712    * dynprog.c, gmap.c: Reduced extraband_single to prevent gaps from being
21713      inserted on both sides
21714
21715    * gmap.c, pair.c, pair.h, sequence.c, sequence.h, stage3.c: For PMAP, adding
21716      an implicit stop codon at end of sequence if not already present, and
21717      distinguishing between computational fulllength and given fulllength.
21718
21719    * smooth.c: Changed probability threshold for identifying short exons
21720
217212006-12-14  twu
21722
21723    * gmap.c: Made strict translation the default, and tolerant translation
21724      turned on by -Y flag
21725
21726    * stage3.c: Providing pound signs in dual breaks in diagnostic output.
21727      Replacing backtranslation characters with 'N' in PMAP output.  Removed
21728      microexon search from PMAP.  Using single gap procedure instead of
21729      fix_pmap_holes procedure for PMAP.
21730
21731    * pair.c: Counting ambiguous characters as matches in all instances of
21732      computing percent identity
21733
21734    * indexdb.h: Added variables for 5-aa mers
21735
21736    * dynprog.c: Limiting bandwidth in single gap alignment to be dependent on
21737      differences in segment lengths
21738
21739    * backtranslation.c: Not performing backtranslation if any genomic codon
21740      position is blank.
21741
21742    * md5.t.c: Removed unused file
21743
217442006-12-13  twu
21745
21746    * dynprog.c: Reduced width of band in single gaps when lengths are equal
21747
21748    * translation.c: Fixed strict translation mode, so it begins at the same
21749      location as genomic translation.
21750
21751    * stage3.c: Removed step of merging adjacent dynamic programming.  Using two
21752      different smoothing steps.  Protected small introns from being solved as
21753      single gaps in final intron pass.
21754
21755    * smooth.c, smooth.h: Created two separate smoothing procedures, one based
21756      on net gap, and one based on size.
21757
21758    * dynprog.c: For cDNA gaps, inserting indel pairs only if both gaps are small
21759
217602006-12-12  twu
21761
21762    * map.test.ok: Added blank line at end
21763
21764    * VERSION: Updated version
21765
21766    * MAINTAINER: Added reminder to check cvs log to make sure files are all up
21767      to date
21768
21769    * stage3.c: Removing gaps at 5' and 3' ends after end extensions.  Checking
21770      for division by zero in trim_bad_exons.
21771
21772    * Makefile.am: Simplified list of source files
21773
217742006-12-08  twu
21775
21776    * archive.html: Updated to reflect 2006-12-08 version
21777
21778    * archive.html, Makefile.am, align.test.in, align.test.ok, coords1.test.in,
21779      coords1.test.ok, iit_dump.test.in, iit_get.test.in, iit_store.test.in,
21780      map.test.in, map.test.ok, setup1.test.in, setup2.test.in: Merged into trunk
21781
21782    * defs: Initial import into CVS
21783
21784    * VERSION, config.site.gne, share, index.html, pmap_setup.pl.in: Merging
21785      into trunk
21786
21787    * MAINTAINER: Merging into main trunk
21788
21789    * iit.test.in: Combined iit_store, iit_get and iit_dump tests into one script
21790
21791    * stage1.c: Increased definition of short sequence (for allowing cluster
21792      mode) for PMAP
21793
21794    * match.c: Fixed printing of sequences in debugging statements
21795
217962006-12-05  twu
21797
21798    * stage3.c: Fixed miscount problem with filling in short introns.  Increased
21799      MININTRONLEN_FINAL significantly.
21800
218012006-12-01  twu
21802
21803    * stage1.c: Printing chromosome name in debugging statements
21804
21805    * match.c, match.h: Added procedure Match_chr
21806
218072006-11-30  twu
21808
21809    * stage3.c: Allowing and correcting for gaps after gaps
21810
21811    * smooth.c: Using difference between genomejump and queryjump to define
21812      introns for the purposes of smoothing.
21813
21814    * configure.ac: Added check for sigaction
21815
21816    * README: Updated README file
21817
21818    * md_coords.pl.in: Fixed behavior when user wants only the reference strain
21819
21820    * gmap_setup.pl.in: Changed name from raw to fullascii.  Changed default for
21821      PMAP from 7 to 6.
21822
21823    * gmap_process.pl.in: Added check to see that all contigs are processed
21824
21825    * stage3.c: Assigning gap pairs after final extensions of 5' and 3' ends
21826
21827    * smooth.c: Removed include of unused header file
21828
21829    * genome-write.c, gmapindex.c, indexdb.c, indexdb.h, pmapindex.c: Added
21830      genome name to monitoring statements
21831
21832    * gmap.c: Stopped warning message for -B when flag was not provided
21833
218342006-11-28  twu
21835
21836    * stage1.c: Revised heuristics for determining maxtotallen and lengths for
21837      extensions
21838
21839    * gmap.c: Ignoring batch flag if only a single sequence is given
21840
218412006-11-27  twu
21842
21843    * stage1.c: Removed unused code
21844
21845    * matchpair.c: Adding extra extension length when continuousp is false
21846
21847    * gmap.c: Revised default lengths for single intron length and total genomic
21848      length
21849
21850    * dynprog.c: Added checks for genomic segment at ends being shorter than
21851      query segment
21852
21853    * indexdb.h: Using 6-mers with full alphabet in PMAP
21854
21855    * indexdb.c: Improved monitoring statements
21856
21857    * matchpair.c, matchpair.h: Revised procedures for computing support and
21858      extensions.  Integrated procedures for filtering of unique and duplicate
21859      matchpairs.
21860
21861    * oligoindex.c: Returned to 20 amino acids in stage 2
21862
21863    * params.c, params.h, stage1.h: Removed unused variable
21864
21865    * stage1.c: Integrated matchpairs into a single list.  Revised procedures
21866      for extending genomic region based on 12-mers.
21867
21868    * stage3.c: Allowed arbitrarily long incursion into previous dynprog during
21869      peelback.
21870
21871    * stage2.c: Separated fwd_consecutive and rev_consecutive.  Made values
21872      consistent regardless of indexsize.
21873
218742006-11-21  twu
21875
21876    * stage1.c: Fixed extensions for PMAP
21877
21878    * indexdb.c: Reverted previous changes that tried to make idxpositions file
21879      point to end of oligomer for reverse strand matches.
21880
21881    * indexdb.c: Made idxpositions file point to end of oligomer for reverse
21882      strand matches.  Improved debugging output.
21883
21884    * stage1.c: Added a binary search routine
21885
218862006-11-20  twu
21887
21888    * indexdb.h, stage1.c: Made changes for PMAP to work with 6-mer pmapdb
21889
21890    * oligoindex.c: Fixed debugging statements for PMAP
21891
21892    * pair.c: Revised psl protein output for matches to the negative genome
21893      strand
21894
21895    * backtranslation.c, backtranslation.h: Made an extern procedure for
21896      computing consistent codon for a given amino acid.
21897
21898    * translation.c, translation.h: Made get_codon an extern procedure
21899
21900    * stage3.c: Added procedure for fixing alignment holes in PMAP.  Applying
21901      higher standard for accepting dual intron solutions.
21902
21903    * stage2.c: Fixing bugs in identifying stage 2 candidates to abort
21904
21905    * gmap.c: Setting trim variables appropriately in maponly mode
21906
21907    * dynprog.c: In PMAP, rounding up or down to finish codon
21908
219092006-11-16  twu
21910
21911    * stage3.c: Using intron types to evaluate bad exons at ends.  Adding
21912      another round of extensions at ends after trimming of bad exons.  Restored
21913      correction for genomepos at left end skip when filling in introns.
21914
21915    * dynprog.c: Assigning intron type for microexons added at ends of alignment
21916
21917    * gbuffer.c, gbuffer.h: Removed unused variables
21918
21919    * stage2.c: Removed unused variables.  Using correct value for
21920      maxconsecutive instead of last one.
21921
219222006-11-15  twu
21923
21924    * stage3.c: Using uppercase string, with U-to-T conversion, to identify
21925      mismatches in peelback procedures.
21926
21927    * backtranslation.c, translation.c: Using uppercase string, with U-to-T
21928      conversion, instead of toupper().
21929
21930    * sequence.c: Using new complement and uppercase strings
21931
21932    * pair.c: Using new name for (lowercase) complement string.  Including 'U'
21933      and 'u' as known bases for computing percent identity.
21934
21935    * indexdb.c: Using uppercase string, which also performs U-to-T conversion,
21936      instead of toupper().
21937
21938    * genome-write.c, genome.c: Using new name for (lowercase) complement string.
21939
21940    * compress.c: Using uppercase string instead of toupper.
21941      Compress_get_char() no longer converts characters to uppercase.
21942
21943    * dynprog.c: Made U and T a matching pair.  Commented out old code dealing
21944      with lowercase characters.
21945
21946    * complement.h: Added strings for uppercase of complement, and for U-to-T
21947      conversion during uppercase
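      The 2006-11-15 entries replace per-character toupper() calls with a
      lookup string that also folds U into T.  A minimal sketch of that idea
      follows.  It assumes ASCII input; the table name, both function names,
      and the run-time initialization are hypothetical, since the actual
      strings in complement.h are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: maps every byte to its uppercase form, with U/u -> T.
   The real code reportedly uses a constant string; this sketch fills a
   table at start-up instead, which is an assumption made for brevity. */
static unsigned char uppercase_u2t[256];

void
init_uppercase_u2t (void) {
  int c;

  for (c = 0; c < 256; c++) {
    if (c >= 'a' && c <= 'z') {
      /* Assumes ASCII, where lowercase letters are contiguous */
      uppercase_u2t[c] = (unsigned char) (c - 'a' + 'A');
    } else {
      uppercase_u2t[c] = (unsigned char) c;
    }
  }
  /* RNA input: treat uracil as thymine so U and T compare as equal */
  uppercase_u2t['U'] = 'T';
  uppercase_u2t['u'] = 'T';
}

/* Normalizes a NUL-terminated sequence in place with one lookup per
   character, instead of a toupper() call plus a separate U-to-T test. */
void
normalize_sequence (char *seq) {
  assert(seq != NULL);
  for (; *seq != '\0'; seq++) {
    *seq = (char) uppercase_u2t[(unsigned char) *seq];
  }
}
```

      Call init_uppercase_u2t() once before the first normalize_sequence().
      Indexing through unsigned char keeps bytes above 127 from producing a
      negative index on platforms where char is signed.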
21948
21949    * sequence.c: Enabled removal of spaces in read procedure
21950
219512006-11-14  twu
21952
21953    * dynprog.c: Reduced extension penalties for single gaps
21954
21955    * stage3.c: Fixed bug in filling in gaps where leftpair has a genome gap.
21956      Increased size of MININTRONLEN to avoid finding introns in single gap
21957      regions.
21958
219592006-11-12  twu
21960
21961    * stage3.h: Added parameter for number of flanking sequences to
21962      Stage3_print_map
21963
21964    * stage3.c: In 5' and 3' extensions, evaluating continuations before and
21965      after a gap if one is found during peeling, and performing microexon
21966      search medial to the gap
21967
21968    * dynprog.c: Returning null in genome gap if queryjump <= 1
21969
21970    * get-genome.c: Added -u flag for printing flanking intervals
21971
21972    * iit-read.c, iit-read.h: Added option to print iit entries in reverse order
21973
21974    * indexdb.h: Restored previous parameters
21975
21976    * pair.c: Added pointer to pair in debugging output
21977
219782006-11-11  twu
21979
21980    * stage3.c: Fixed computation of bad end exons.  Included short end exons in
21981      definition of bad end exons.  Finding bad end exons after 5' and 3'
21982      extensions.  Fixed declaration of sense/antisense when no canonical or
21983      semicanonical introns are present.  Removing end introns during peelback
21984      before 5' and 3' extensions.  Removed unused code for trimming alignment
21985      at ends.
21986
219872006-11-06  twu
21988
21989    * gmap.c: Added flags for printing flanking IIT hits and for trimming end
21990      exons
21991
21992    * stage3.c: Fixed bug in trimming empty alignment
21993
21994    * smooth.c: Fixed bug in handling lower-case query sequences
21995
219962006-11-01  twu
21997
21998    * translation.c: Fixed bug in strict translation
21999
22000    * iit-read.h: Added procedures for finding flanking hits.
22001
22002    * iit-read.c: Made IIT_get more efficient.  Added procedures for finding
22003      flanking hits.
22004
22005    * iit_get.c: Added -u flag for printing flanking hits
22006
22007    * stage3.c, stage3.h: Allowing trimming of bad exons at ends.  Increased
22008      peelback at ends. Added iterative cycles of intron finding within
22009      smoothing and dual intron cycles.
22010
22011    * smooth.c: Relaxing requirements for short exons at ends, because of later
22012      trimming of poor exons at ends
22013
22014    * pair.c: Adding printing of intron type for debugging
22015
22016    * gmap.c: Stopping deletion of global_except_key, because worker threads may
22017      still need it.  Increasing standards for defining a sequence to be
22018      repetitive.  Eliminating -U flag for trimming alignments, and adding -k
22019      flag for specifying trimming of exons at ends.
22020
22021    * except.c: Stopping deletion of global_except_key, because worker threads
22022      may still need it
22023
22024    * blackboard.c: Letting each thread destroy its own reqpost
22025
220262006-10-31  twu
22027
22028    * gmap.c: Added -Y flag for performing strict translation of cDNA sequence.
22029      Removed worker_assignments variable, and using global blackboard variable
22030      instead to handle exceptions.
22031
22032    * stage3.c, stage3.h, translation.c, translation.h: Added strictp flag for
22033      protein translation
22034
22035    * oligoindex.c: Dropped oligospace requirements for PMAP by reducing amino
22036      acid alphabet in stage 2 from 20 to 16.
22037
22038    * gmapindex.c, indexdb.c, pmapindex.c: Fixed memory allocation for filename
22039
22040    * except.c: Fixed location of compiler directive
22041
22042    * blackboard.c: Put mutex locks outside of updates to input counter and
22043      output counter.  This is to be cautious, since only input thread and
22044      output thread, respectively, should be affecting these counters.
22045
220462006-10-24  twu
22047
22048    * stage3.c: Fixed undefine_nucleotides to handle gapholders
22049
22050    * oligoindex.c: Using calloc instead of malloc for initializing oligoindex
22051      space
22052
22053    * gmap.c: Reduced indexsizes in PMAP, so they won't overflow on some machines
22054
22055    * backtranslation.c: Fixed usage of translation_start and translation_end
22056
220572006-10-20  twu
22058
22059    * gmap.c: Printing messages to stderr when no paths are found, in all cases
22060      where sequence headers are not printed.
22061
22062    * translation.c: Fixed coordinates for translation start and end
22063
22064    * stage3.c: Fixed bug with NULL path passed to undefine_nucleotides
22065
22066    * pair.c: Changed gff3 procedures to treat translation start and end values
22067      as query positions, not alignment indices.
22068
220692006-10-16  twu
22070
22071    * stage3.c: Making sure that gaps are inserted after smoothing procedure
22072      deletes exons
22073
22074    * stage2.c: Clarified differences between amino acid indexsize and
22075      nucleotide indexsize.  Cleaned up code for filling in oligomers.
22076
22077    * smooth.c: Reduced definition of a gap between exons
22078
22079    * oligop.c: Included possibility of 12-amino acid alphabet for 8-mers.
22080
22081    * indexdb.h: Included possibility of 12-amino acid alphabet for 8-mers.
22082      Provided compile-time values for file suffixes.
22083
22084    * indexdb.c: Included possibility of 12-amino acid alphabet for 8-mers
22085
22086    * pmapindex.c: Performing complete build with a single command
22087
22088    * gmapindex.c: Using compile-time value for suffix
22089
22090    * gmap.c: Printing value of INDEX1PART in help output for PMAP
22091
220922006-10-13  twu
22093
22094    * oligoindex.c: Added debugging statements
22095
22096    * translation.c: Fixed bug with translating cDNA beyond the genomic stop
22097
22098    * stage3.c: Reorganized passes through the alignment.  Made peelback
22099      routines more robust.
22100
22101    * smooth.h: Using stage 2 indexsize in smoothing procedures
22102
22103    * smooth.c: Major rewrite of smoothing procedures
22104
22105    * dynprog.c: Added another mechanism to prevent microexon from having a gap
22106      at either end
22107
221082006-10-12  twu
22109
22110    * gmap.c: Allowing "-t 0" to mean non-threaded behavior.  Using new
22111      thread-safe exception handler.
22112
22113    * dynprog.c, dynprog.h: Fixed traceback for cDNA gaps
22114
22115    * except.c, except.h: Re-implemented thread-safe exception handler to remove
22116      memory leaks. Now using exception frames in stack rather than in heap.
22117
22118    * stage3.c: Fixed peelback to codon boundaries for PMAP.  Relaxed forcep
22119      requirement for single gaps.  Recognizing cases where prior genome or cDNA
22120      gap solution was obtained.
22121
22122    * stage3.h: Removed ngap from parameter lists when possible
22123
22124    * stage1.c: Initialized a diagnostic variable
22125
22126    * pair.c, pair.h: Removed unused code and variables
22127
221282006-10-11  twu
22129
22130    * except.c, except.h: Implemented thread-safe version of exception handler
22131
22132    * gmap.c: Added -j flag to control printing of dual breaks
22133
22134    * except.c, except.h: Reformatted exception handling code.  Using pointers
22135      to frames.
22136
221372006-10-10  twu
22138
22139    * stage3.c, stage3.h: Rewrote peelback routines.  More accurate handling of
22140      coordinates and checking of coordinates and gaps.
22141
22142    * dynprog.c: Advancing query and genome coordinates in cases of skips
22143
22144    * smooth.c: Revised trimming at ends to use individual exons, rather than
22145      the sum of exon and intron lengths
22146
22147    * pairpool.c: Showing pointer to pair in debugging statements
22148
22149    * pair.c: Showing queryjump and genomejump in debugging statements
22150
22151    * gmap.c: Added -0 flag to inactivate exception handler, and -7 and -8 flags
22152      to show results of stage 2 and smoothing, respectively.
22153
22154    * except.c, except.h: Added mechanism to inactivate exception handler
22155
221562006-10-09  twu
22157
22158    * access.c: Fixed compiler warning about reference to void *.
22159
22160    * block.c, chimera.c, oligo.c, sequence.c, stage1.c: Removed unused variables
22161
22162    * compress.c, dynprog.c, genome-write.c, intron.c: Added necessary header
22163      file
22164
22165    * genome.c: Fixed compiler warning about mismatched variable types.
22166
22167    * gmap.c: Added flag for pruning level.  Inactivated conversion of signals
22168      to exceptions with diagnostic flag.  Removed references to badoligos.
22169
22170    * indexdb.c: Added necessary header file.  Fixed compiler warning about
22171      mismatched variable types.
22172
22173    * matchpool.h: Added declarations of external functions
22174
22175    * oligoindex.c, oligoindex.h: Computing estimate of maxconsecutive when
22176      mappings are obtained
22177
22178    * pair.c, pair.h: Added diagnostic information about stage 2 maxconsecutive.
22179
22180    * result.c, result.h: Added diagnostic information about initial query check
22181
22182    * smooth.c: Handling possible gaps at ends of alignment
22183
22184    * stage2.c: Using maxconsecutive estimate from Oligoindex_get_mappings to
22185      determine whether to proceed with stage 2.
22186
22187    * stage3.c: Added diagnostic information about stage 2 maxconsecutive.
22188      Fixed procedure for removing adjacent dynamic programming to remove all
22189      gaps, and then to reinsert them later.
22190
221912006-10-06  twu
22192
22193    * oligoindex.c, oligoindex.h: Added counting of replicate oligos
22194
22195    * md_coords.pl.in: Added information about number of contigs in each strain
22196
22197    * configure.ac: Removed obsolete tests.  Fixed problem in setting share
22198      directory. Added maintainer option.
22199
22200    * gmap.c: Distinguishing between poor and repetitive sequences.  Providing
22201      -p flag to control pruning behavior.
22202
22203    * result.h: Distinguishing between poor and repetitive sequences
22204
22205    * sequence.c: Set skiplength correctly on empty sequences
22206
22207    * gmap.c: Added -W flag to force GMAP to compute repetitive or poor sequences
22208
22209    * oligoindex.c: Limited definition of badoligo to consider only non-ACGT
22210      characters, and not to consider number of hits.
22211
22212    * stage3.c: Fixed bug arising from gaps left at ends of alignment
22213
22214    * dynprog.c: Disallowing bridges of introns and cDNA insertions that lead to
22215      coordinate errors
22216
22217    * gmap.c: Changed thread-based exception handling to kill all other threads
22218      and to report all worker assignments
22219
222202006-10-05  twu
22221
22222    * stage3.h: Made checking of coordinates occur in diagnostic mode.
22223
22224    * stage3.c: Made checking of coordinates occur in diagnostic mode.  Fixed
22225      case where cDNA gap turned into a single gap after peelback.
22226
22227    * stage1.c: Fixed memory leak
22228
22229    * smooth.c: Fixed bug resulting from apparent negative exon and negative
22230      intron lengths.
22231
22232    * oligoindex.c: Restored pruning of sequences with bad oligos.
22233
22234    * gmap.c: Added handlers to convert signals into exceptions, to indicate the
22235      problematic sequence.  Restored pruning of sequences with bad oligos.
22236
222372006-10-04  twu
22238
22239    * result.c, stage1.c, stage1.h: Added reporting of more diagnostic
22240      information
22241
22242    * stage2.c: Fixed problems with uninitialized variable
22243
22244    * matchpair.c: Fixed problem with uninitialized variable
22245
22246    * gmap.c, pair.c, pair.h, result.c, result.h: Printing diagnostic
22247      information upon request
22248
22249    * access.c: Using a Stopwatch_T object
22250
22251    * stage1.c, stage1.h, stage2.c, stage2.h, stage3.c, stage3.h: Storing
22252      diagnostic information
22253
22254    * smooth.c: Fixed memory leak
22255
22256    * oligoindex.c: Stopped initializing data buffer for Oligoindex_T object
22257
22258    * stopwatch.c, stopwatch.h: Created a Stopwatch_T object
22259
222602006-10-03  twu
22261
22262    * stage3.c: Removed unnecessary list reversal
22263
22264    * pair.c: Allowing jump in querypos in pair check procedure
22265
222662006-10-02  twu
22267
22268    * gmap.c: Provide stage 2 information in diagnostic output.  Use stage 2
22269      information to prune bad alignments before stage 3.
22270
22271    * stage3.c: Provide stage 2 information in diagnostic output.  Allow a
22272      single open in scoring a single intron compared with dual introns.
22273
22274    * stage2.h: Interface provides number of canonical and non-canonical introns
22275
22276    * stage2.c: Returned to using gendistance for computing penalties, except
22277      for diffdistance in deadp.  Fixed bug in tallying unknown types of introns.
22278
22279    * sequence.c: Fixed problems with reading control-M characters (carriage
22280      returns from PC line endings) in input.
22281
22282    * pair.h: Reporting stage 2 information in diagnostic output.
22283
22284    * pair.c: Reporting stage 2 information in diagnostic output.  Counting
22285      indels in computing percent identity for each exon.
22286
22287    * dynprog.c: Eliminated extra reward for finding semicanonical introns in
22288      final pass
22289
22290    * stage2.c: Need to take abs() when measuring diffdistance.  Scoring
22291      behavior checked against revision 1.157.  Making stage2 information
22292      available for diagnostic output.
22293
222942006-09-30  twu
22295
22296    * gmap.c: Added -8 flag to show results of stage 2 calculation
22297
22298    * boyer-moore.c: Revised procedure to handle ambiguous characters for PMAP.
22299
22300    * dynprog.h: Added dynprogindex information.
22301
22302    * dynprog.c: Added table for performing Boyer-Moore searches of microexons
22303      for PMAP.  Reduced penalties for extending gaps.  Added separate rewards
22304      for final pass of finding canonical introns.  Added dynprogindex
22305      information.
22306
22307    * smooth.c: Systematically checking ends for smoothing.  Using matches
22308      instead of lengths to evaluate exons.  Added probabilistic checking for
22309      marking middle exons.
22310
22311    * stage3.h: Passing stage2p as a parameter to Stage3_compute.
22312
22313    * stage3.c: Major changes to algorithm.  Added iteration through smoothing,
22314      dual intron, and single intron passes.  Checking peel back to determine if
22315      canonical intron needs to be recomputed.  Added final pass to find introns
22316      with higher reward.  Added dynprogindex information.  Using dynprogindex
22317      information in peeling leftward and rightward.
22318
22319    * stage1.c: New criterion for setting usep to false, namely, if support is
22320      less than a certain fraction of the maximum observed support.
22321
22322    * pair.h: Pair_check_array now returns a bool.
22323
22324    * pair.c: Handling more cases of short gaps as indels.  Printing
22325      dynprogindex in diagnostic and debugging output.
22326
22327    * stage2.c: Reverted to algorithm from revision 1.157.  Using diffdistance
22328      instead of gendistance.  Making sufflookback depend on mapfraction.
22329
223302006-09-28  twu
22331
22332    * stage2.c: Changes made to scoring algorithm, but not well-motivated.
22333      Fixed bugs in predicting cDNA direction.
22334
223352006-09-18  twu
22336
22337    * stage3.c: Fixed bug when recomputing over adjacent dynamic programming
22338      regions at end of sequence
22339
22340    * stage2.c: Revised rules for giving credit for query distance, giving none
22341      if difference in distance is greater than min intron length.
22342
22343    * stage1.c: Doubling genomic region with each iteration, until sufficient
22344      support found for a matchpair.
22345
22346    * matchpair.c, matchpair.h: Computing and storing fraction of stage 1 support
22347
22348    * gbuffer.c, gbuffer.h, gmap.c: Allowed Genome_T object to exceed default
22349      length of genomic segment
22350
22351    * dynprog.c: Reduced penalties for gap extension, to match reductions in
22352      mismatch penalties
22353
22354    * Makefile.am: Provided target machine during compilation
22355
223562006-09-11  twu
22357
22358    * gmap.c: Included build target in version output.  Increased oligomer size
22359      in PMAP from 3-4 to 4-5.
22360
223612006-09-08  twu
22362
22363    * stage2.c: Added oligos to output of debugging statements
22364
22365    * configure.ac: Using AC_FUNC_FSEEKO to check for fseeko.  Added comment
22366      line for $Id$.
22367
223682006-09-07  twu
22369
22370    * stage2.c: Added debugging statements for finding shifted canonical introns
22371
22372    * stage1.c: Increased trimlength and extension past ends for PMAP
22373
22374    * gmap.c: Increased maxextension to 120000
22375
223762006-09-01  twu
22377
22378    * translation.c: Making sure to assign values to variables when number of
22379      alignment pairs is fewer than the minimum
22380
22381    * pair.c: Fixed bug in printing CDS of GFF3 format
22382
22383    * stage3.c: For PMAP, trimming ends of alignment to codon boundaries
22384
22385    * translation.c: Removed check for minimum number of pairs for PMAP
22386
22387    * dynprog.c: Changed calls to Pairpool_push.  Added dynprogindex
22388      information. Reduced penalty for mismatches.
22389
22390    * dynprog.h: Changed calls to Pairpool_push.  Added dynprogindex information.
22391
22392    * stage3.c: Fixed bug where peeling back yielded wrong coordinates.  Changed
22393      calls to Pairpool_push.  Added dynprogindex information.  Recomputing
22394      regions with adjacent dynamic programming solutions.
22395
22396    * matchpair.c, stage2.c: Changed calls to Pairpool_push
22397
22398    * smooth.c: Added debugging statements for exon and intron lengths
22399
22400    * sequence.c: Fixed bug where return type should be int, not bool.
22401
22402    * pairpool.c, pairpool.h: Distinguished between gapholder and gapalign
22403      elements.  Added dynprogindex to Pairpool_push.
22404
22405    * pair.c: Added debugging option for printing dynprogindex
22406
22407    * pairdef.h: Added dynprogindex to struct.  Reordered fields.
22408
22409    * bool.h: Defining bool to be an unsigned char instead of an enumerated type
22410
224112006-08-03  twu
22412
22413    * fa_coords.pl.in: Added pattern Chr_ seen in some TIGR genomes.  Changed
22414      variable name from chronlyp to concatenatedp.
22415
22416    * oligoindex.c: Added check for query lengths shorter than index size
22417
22418    * get-genome.c, iit_get.c: Allowed program to take coordinate requests from
22419      stdin
22420
22421    * iit-read.c, iit-read.h, iit_dump.c: Added option to dump counts of each
22422      segment
22423
22424    * gmap.c: Printing calling arguments in gff mode
22425
224262006-06-12  twu
22427
22428    * pair.c, gmap_compress.pl.in, gmap_uncompress.pl.in: Using bp to denote
22429      query length, instead of nt
22430
22431    * stage3.c: Turned off gap checking
22432
22433    * gmap_compress.pl.in: Putting cDNA length into the Coverage field
22434
22435    * gmap_uncompress.pl.in: Getting cDNA length from the Coverage field
22436
224372006-05-31  twu
22438
22439    * params.c, params.h: Adding maxoligohits as a parameter
22440
22441    * oligoindex.c, oligoindex.h: Using maxoligohits parameter, and reducing it
22442      for cross-species alignment (to avoid random and misleading matches)
22443
22444    * stage2.h: Using maxoligohits parameter
22445
22446    * stage2.c: For cross-species alignment, increasing enough_consecutive
22447      parameter and not opportunistically increasing sampling interval
22448
22449    * stage1.h: Reduced SINGLEINTRONLENGTH to 100000
22450
22451    * stage1.c: Using maxextension parameter instead of SINGLEINTRONLENGTH
22452      directly
22453
22454    * gmap.c: Limited crossspecies parameters to maxextension and maxoligohits.
22455
224562006-05-25  twu
22457
22458    * stage2.c: Introduced detection of semicanonical introns and penalty for
22459      these. Removed distpenalty_dead, and introduced distpenalty_noncanonical;
22460      motivated by ENST0356720.  Decreased distpenalty; motivated by
22461      ENST0356222.  Introduced procedure for querydist_credit, bounded below by
22462      zero.  Decreased querydist points when gendistance equals querydistance;
22463      motivated by ENST0354988.
22464
22465    * gmap.c: Using single intron length as basis for maxextension
22466
22467    * stage3.c: Made initial pass of build_pairs_singles work only when
22468      genomejump equals queryjump; motivated by ENST0341339.  Made acceptable
22469      mismatches for dual introns depend again on defect rate.
22470
22471    * smooth.c: Removed deletion of longest middle exon in a series of short
22472      exons. Motivated by ENST0348697.
22473
22474    * stage1.c, stage1.h: Using single intron length to extend genomic segment
22475      at ends. Motivated by ENST0358972.
22476
22477    * oligoindex.c: Increased thetadiff for trimming repetitive oligos from 2 to
22478      20. Motivated by ENST0357282.
22479
22480    * gbuffer.c, gbuffer.h: Added data structures for storing positions of
22481      semicanonical dinucleotides
22482
22483    * dynprog.c, dynprog.h: Made microexon p-value threshold depend on the
22484      defect rate.  Increased genomejump needed for single gap penalties to
22485      apply.  Motivated by ENST0262608.
22486
224872006-05-23  twu
22488
22489    * stage2.c: Moved preprocessor directive outside of macro (needed for gcc3
22490      compiler).
22491
224922006-05-22  twu
22493
22494    * gmap.c, stage3.c, stage3.h: Changed variable name from extend_mismatch_p
22495      to trimalignmentp
22496
22497    * changepoint.c: Changed criterion from a difference in theta to a ratio
22498
224992006-05-19  twu
22500
22501    * stage3.c: Removed trimming of alignments in PMAP
22502
22503    * stage1.c: Changed some parameters to increase sensitivity
22504
22505    * chrsubset.c, chrsubset.h: Added function Chrsubset_make
22506
22507    * translation.c: Fixed assignment of amino acids to genomic sequence in PMAP
22508
22509    * stage3.h: Minor formatting change
22510
22511    * stage3.c: Printing trimmed query coordinates in path summary.  Pruning
22512      stage 3 result if coverage is less than MINCOVERAGE.
22513
22514    * sequence.h: Added appropriate MAXSEQLEN for PMAP
22515
22516    * reader.c, reader.h: Allowing reading in each direction to proceed to the
22517      ends of the query sequence
22518
22519    * pairpool.c: Setting initial value for aa_g and aa_e
22520
22521    * pair.c, pair.h: Printing trimmed query coordinates in path summary
22522
22523    * oligoindex.c: Reinstated trimming of query sequence based on changepoint
22524      analysis
22525
22526    * mem.c: Fixed compiler warning about pointer arithmetic on void *.
22527
22528    * matchpair.c: Added comments
22529
22530    * gmap.c: Performing trimming of query sequence in more cases.  Changed name
22531      of "mutation reference" to "reference sequence".
22532
22533    * dynprog.c: Removed step function penalty based on codons.  Reduced
22534      extension penalty to obtain better behavior.
22535
22536    * stage2.c: Changed position for starting to compute mismatch gaps.  Added
22537      trimming at ends for PMAP.
22538
22539    * stage1.c: Improved calculation of genome segment length, based on expected
22540      exon and intron sizes.  In sampling mode, continuing sampling at current
22541      position of block pointers.
22542
22543    * block.c, block.h: Added procedures for saving and restoring blocks
22544
225452006-05-15  twu
22546
22547    * Makefile.am, gbuffer.c, gbuffer.h, gmap.c, matchpair.c, matchpair.h,
22548      pair.c, pair.h, stage2.c, stage2.h, stage3.c, stage3.h: Created Gbuffer_T
22549      object to use as workspace for various calculations
22550
22551    * translation.c: Fixed uninitialized variable
22552
22553    * dynprog.c, pair.c: Made cDNA gaps into type SHORTGAP_COMP instead of
22554      INDEL_COMP, so they get treated properly by the changepoint analysis
22555
22556    * stage3.c: Fixed memory leak
22557
22558    * changepoint.c: Changed changepoint parameters slightly
22559
225602006-05-14  twu
22561
22562    * stage3.c: Added checks to make sure both qgenome lengths are adequately
22563      long in dual intron gaps
22564
22565    * dynprog.c: Increased penalties for mismatches
22566
22567    * gmap.c: Changed 'U' flag to mean no trimming of poor alignments at ends
22568
225692006-05-13  twu
22570
22571    * gmap.c: Changed interfaces to some Stage3_T functions
22572
22573    * pair.c, pair.h: Added query length to Coverage line
22574
22575    * stage1.c: Fixed bug where maxtrial wasn't set
22576
22577    * stage2.c: Removed final assignment of dinucleotide positions
22578
22579    * stage3.h: Changed some interfaces to Pair_T functions
22580
22581    * stage3.c: Added some shortcuts for changepoint analysis.  Changed some
22582      interfaces to Pair_T functions.
22583
225842006-05-12  twu
22585
22586    * gmap.c: Provided initial values to some variables
22587
22588    * oligoindex.c: Reduced MAXHITS parameter from 200 to 20
22589
22590    * stage1.c: Limiting trials for same-species alignment.  Limiting salvage
22591      algorithm to short sequences and cross-species alignment.
22592
22593    * stage2.c: Implemented faster method for finding shifted canonical introns
22594
22595    * stage2.c: Saving mappings for each indexsize, and going back to best one.
22596      Introduced idea of sufficient and minimum map fraction, and aborting if
22597      minimum map fraction not satisfied.
22598
22599    * gmap.c, stage3.c, stage3.h: Added option to print output in IIT FASTA map
22600      format
22601
22602    * pair.c, pair.h: Removed parameter from Pair_print_iit_map
22603
22604    * pair.c, pair.h: Removed old code.  Added a procedure for printing an IIT
22605      map.
22606
22607    * sequence.c: Removed printing of '>' from Sequence_print_header
22608
22609    * iit-read.c: Fixed bug in printing results from map iit
22610
22611    * stage2.c: Added debugging statements
22612
226132006-05-11  twu
22614
22615    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Included filename of
22616      user-provided genomic seg as source in gff3 output
22617
22618    * iit_dump.c: Included header for getopt_long
22619
22620    * README: Added more information about IIT utilities
22621
22622    * iit-read.c, iit-read.h: Added annotation-only option to IIT_dump
22623
22624    * get-genome.c: Changed program description statement slightly
22625
22626    * Makefile.am, iit_dump.c, iit_get.c, iit_store.c: Added long options and
22627      documentation for the IIT utilities
22628
22629    * iit_store.c: Added support for quotation marks in gff3 features
22630
22631    * chrnum.c, chrnum.h, chrsubset.c: Using new Chrnum_to_string interface
22632
22633    * gmap.c, stage3.c, stage3.h: Added cDNA_match option for gff3 output
22634
22635    * pair.c, pair.h: Added cDNA_match option for gff3 output, including Gap
22636      attribute. Using new Chrnum_to_string interface.
22637
22638    * gmap.c, params.c, params.h, stage3.c, stage3.h: Added procedures for
22639      allowing chromosome-tagged IIT map files, in addition to strand-tagged IIT
22640      map files.
22641
22642    * iit-read.c, iit-read.h: Added functions for retrieving multiple types and
22643      for getting label when access mode is fileio.
22644
226452006-05-10  twu
22646
22647    * iit_get.c: Allowing user to specify multiple types
22648
22649    * iit_store.c: Modified gff3 parsing to assign only one tag for each row.
22650      Using feature column as a source for labels.
22651
22652    * pair.c, pair.h: Added routines for output in GFF3 format
22653
22654    * Makefile.am: Added trimming of alignment based on changepoint analysis
22655
22656    * stage2.c: Fixed bug in scanning for reverse canonical intron
22657
22658    * backtranslation.c, pairdef.h, pairpool.c, translation.c: Introduced phase
22659
22660    * gmap.c: Added flags for gff3 output
22661
22662    * stage3.c, stage3.h: Added procedure for trimming pairs.  Added gap when
22663      bounds don't make sense for dual intron gaps.  Introduced gff3 format and
22664      phase.
22665
22666    * changepoint.c, changepoint.h: Initial import into CVS
22667
22668    * stage2.c: Added debugging statements for the result of stage 2 prior to
22669      trimming
22670
22671    * pair.c, pair.h: Added a function for computing matchscores from an
22672      alignment
22673
22674    * dynprog.c: Changed various gap penalties, especially at ends of sequence
22675
22676    * iit_get.c: Changed atol() to strtoul(), because atol() was truncating
22677      numbers above 2^31 on machines with long ints of 4 bytes.
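      The reasoning behind the atol() change can be sketched as follows.
      atol() returns a signed long, so with a 4-byte long it cannot
      represent coordinates above 2^31 - 1, and it has no way to report
      an error.  strtoul() returns an unsigned long, which reaches
      2^32 - 1 at the same width, and it reports overflow through errno.
      The helper name below is hypothetical and is not taken from
      iit_get.c.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parses a non-negative decimal coordinate.
   Returns false on empty input, a leading minus sign, trailing
   characters, or a value too large for unsigned long. */
bool
parse_coordinate (const char *text, unsigned long *value) {
  char *end;

  assert(text != NULL && value != NULL);

  /* strtoul() accepts "-1" and silently negates it, so reject a
     minus sign before converting */
  while (isspace((unsigned char) *text)) {
    text++;
  }
  if (*text == '-' || *text == '\0') {
    return false;
  }

  errno = 0;
  *value = strtoul(text, &end, 10);
  if (end == text || *end != '\0') {
    return false;		/* no digits, or trailing characters */
  }
  if (errno == ERANGE) {
    return false;		/* value does not fit in unsigned long */
  }
  return true;
}
```

      With this check, a coordinate such as 3000000000 is accepted even
      where long is 4 bytes, and malformed input is rejected instead of
      being converted to an arbitrary number.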
22678
226792006-05-08  twu
22680
22681    * stage2.c: Changed condition for re-computing dead links from an "or" to an
22682      "and" on both directions.  Added trimming of ends, based on consecutive
22683      matches.
22684
226852006-05-07  twu
22686
22687    * stage2.c: Cleaned up procedure for finding introns in PMAP, which can be
22688      shifted.  Cleaned up counts of canonical and total introns.
22689
22690    * stage3.c: Fixed problems with shortcut for existing introns and with
22691      coordinates for dual genome pairs
22692
22693    * oligoindex.c: Added debugging statement
22694
226952006-05-06  twu
22696
22697    * stage2.c: Reworking of stage 2 scoring to make it more robust for
22698      low-identity sequences.  Includes identification of possible canonical
22699      sites by shifting boundaries.
22700
22701    * stage3.c, stage3.h: Using dynamic programming paths computed for dual
22702      intron gaps
22703
22704    * gmap.c, stage2.c, stage2.h: Computing indexsize adaptively
22705
22706    * smooth.c, smooth.h: Removed indexsize from smoothing procedures
22707
22708    * params.c, params.h: Added minindexsize and maxindexsize to params
22709
22710    * oligoindex.c, oligoindex.h: Changed PMAP indexsize to be in aa.
22711      Calculating mapfraction.
22712
22713    * pair.c: Keeping ambiguous character and comp in PMAP alignments
22714
22715    * intron.c, intron.h: Added function to return string for printing intron
22716      type
22717
22718    * gmap.c, oligoindex.c, oligoindex.h: Put indexsize parameters inside of
22719      Oligoindex_T object
22720
227212006-05-05  twu
22722
22723    * md_coords.pl.in: Improved handling of alternate strains
22724
227252006-05-04  twu
22726
22727    * iit_store.c: Implemented parsing of gff format
22728
22729    * gmap.c: Incremented stuttercycles for PMAP
22730
22731    * indexdb.c: In monitoring commands, printing positions with commas
22732
22733    * matchpair.c, matchpair.h, stage1.c: Using maxintronlen instead of querylen
22734      as criterion for removing hits before clustering
22735
22736    * stage2.c: Checking two possible query positions for intron in PMAP
22737
22738    * dynprog.c, dynprog.h: Removed obsolete parameters for computing genome gaps
22739
22740    * stage3.c: Fixed boundaries for check of dual introns
22741
22742    * fa_coords.pl.in: Improved monitoring messages to indicate when coordinates
22743      are parsed and when they are concatenated.
22744
22745    * md_coords.pl.in: Fixed bug in handling alternate strains
22746
227472006-05-03  twu
22748
22749    * md5-compute.c, oligo-count.c: Using new version of Sequence_read
22750
227512006-05-02  twu
22752
22753    * stage2.c: Removed statement that does not apply to PMAP
22754
22755    * matchpair.c: Fixed computation of support for PMAP.  Added debugging
22756      statements
22757
22758    * match.c: Fixed genomic segment retrieved for debugging
22759
22760    * iit-read.c: Minor editing changes
22761
22762    * datadir.c: Improved error message when genome directory isn't found
22763
22764    * compress.c: Removing spaces from reading of uncompressed sequence
22765
22766    * stage1.c: Increased matchpairs allowed.  Fixed position adjustment for
22767      reverse strand matches on PMAP.
22768
22769    * indexdb.c: Shifted positions for .prxpositions down by one.
22770
227712006-04-21  twu
22772
22773    * gmap_setup.pl.in: Fixed bugs in printing of instructions
22774
22775    * fa_coords.pl.in: Augmented patterns allowed for specifying chromosomal
22776      location of contigs
22777
22778    * Makefile.am, chrsubset.c, chrsubset.h, get-genome.c: Added ability to
22779      print all chromosomal subsets from get-genome
22780
22781    * datadir.c: Improved error message
22782
22783    * README: Added information about the -q flag.  Added additional forms for
22784      specifying chromosomal location of contigs.
22785
227862006-04-18  twu
22787
22788    * gmap_setup.pl.in: Added the -q and -Q flags for specifying indexing
22789      intervals
22790
227912006-04-07  twu
22792
22793    * gmap_setup.pl.in: Fixed bugs with new install statements
22794
22795    * gmap_setup.pl.in: Added comment about editing .chrsubset file.  Creating
22796      genome.maps directory.
22797
22798    * stage3.c: Turned off CHECK
22799
22800    * README: Added comment about editing .chrsubset file
22801
228022006-04-06  twu
22803
22804    * gmap_compress.pl.in: Changed program to handle intron lengths in exon
22805      summary
22806
22807    * stage2.c, stage2.h: Introduced limit on individual intron lengths
22808
22809    * stage1.c, stage1.h: Changed variable name from maxintronlen to maxtotallen
22810
22811    * gmap.c: Added separate flag for limiting individual intron lengths
22812
22813    * pair.c: Added intronlengths to exon summary
22814
22815    * sequence.h: Increased maximum sequence length to be 1000000.
22816
228172006-04-04  twu
22818
22819    * stage3.c: Building singles if a short exon is deleted during smoothing
22820
22821    * smooth.c: Improved debugging statements
22822
228232006-03-24  twu
22824
22825    * gmap_setup.pl.in: Printing a copy of the install procedure to a file
22826
228272006-03-20  twu
22828
22829    * match.c: Match_npairings returns an int
22830
22831    * md_coords.pl.in: Passing back maxwidth as a result.
22832
22833    * gmap.c: Made changes for compatibility with PMAP.
22834
22835    * stage3.c, stage3.h: Giving maxpeelback information to dynamic programming
22836      routine, so it can use single gap penalties for long intron gaps.  Made
22837      changes for compatibility with PMAP.
22838
22839    * smooth.c, smooth.h: Changed smoothing routine to be based on net intron
22840      lengths. Sequences of small exons are removed if they yield a net intron
22841      length of approximately zero.
22842
22843    * dynprog.c, dynprog.h: Disallowing intron or cDNA gaps to be placed at the
22844      edge of the segment, which caused an error to occur in the check_gaps
22845      routine. Using single gap penalties for long intron gaps.
22846
228472006-03-17  twu
22848
22849    * sequence.c: Added handler for cases where requested subsequence start and
22850      end are beyond the bounds of the sequence
22851
22852    * gmap.c, stage1.c, stage1.h: Added concept of maxtrial, to be used for
22853      chimera (subsequence) problems
22854
22855    * stage3.c: Added an exception handler for errors in checking gaps
22856
22857    * dynprog.c: Disallowed insertion of intron or cDNA gaps at the ends of the
22858      subsequence, which had resulted in an unexpected gap.
22859
228602006-03-05  twu
22861
22862    * gmap.c: Providing maponlyp information to Sequence_read, to turn
22863      skiplength warning message on or off.
22864
22865    * sequence.h: Set MAXSEQLEN to be 200000
22866
22867    * pair.c, stage3.c: Revision of procedures to handle sequences with
22868      skiplength
22869
22870    * stage1.c: Expanded maxintronlen to include skiplength
22871
22872    * sequence.c, sequence.h: Addition of skiplength.  Rewriting of code for
22873      reading sequences to handle skipping of middle correctly.
22874
228752006-03-04  twu
22876
22877    * gmap.c: Reworking of maponlyp case to generate a Stage3_T object
22878
22879    * stage3.c, stage3.h: Implemented Stage3_direct function for maponlyp case.
22880       Cleaned up merge function for combining two Stage3_T objects.
22881
22882    * stage1.c, stage1.h: Cleaned up various procedures in stage 1 computation.
22883      Simplified function identify_matches.  Eliminating extensions for maponlyp
22884      case.
22885
22886    * matchpair.c, matchpair.h: Added function for making a path from a
22887      matchpair object
22888
22889    * matchpool.c: Simplified code for handling positions on reverse genomic
22890      strands.
22891
22892    * match.c, match.h: Added function for printing the oligomer for a match.
22893      Simplified code for handling positions on reverse genomic strands.
22894
22895    * oligoindex.c: Turned off code for changepoint analysis for trimming ends
22896
22897    * pair.c, pair.h: Modified printing of path summary for maponlyp
22898
22899    * result.c, result.h: Removed Stage1_T objects from Result_T
22900
22901    * genome.c: Added debugging statements
22902
22903    * block.c, oligo.c, oligo.h: Fixed problem where oligomers read from left
22904      side need to be shifted down to low 12-mer.  This corrects problem with
22905      match coordinates being off by 4.
22906
229072006-03-02  twu
22908
22909    * gmap.c: Revised code for computing chimeras
22910
22911    * chimera.c, chimera.h: Made Chimera_T object created only when completely
22912      specified
22913
22914    * stage3.c: Added a step to allow for subseq_offset, if present
22915
22916    * sequence.c, sequence.h: Added subseq_offset to Sequence_T
22917
229182006-03-01  twu
22919
22920    * dynprog.c, dynprog.h: Restored one gap behavior on ends.  Using
22921      cdna_direction information on single gaps.
22922
22923    * stage3.c: Forcing single gaps to be solved.  Adding cdna_direction
22924      information for single gaps.  Fixed problem with short indels being
22925      inserted backward.
22926
22927    * oligoindex.c, oligoindex.h: Implemented new scheme for detecting
22928      repetitive sequence on ends, based on changepoint analysis
22929
22930    * smooth.c: Fixed memory leak.
22931
22932    * translation.c: Added check so we won't go beyond ends.  Assigned variables
22933      when npairs is too few.
22934
229352006-02-27  twu
22936
22937    * stage3.c, stage3.h: Minor bug fixes
22938
229392006-02-26  twu
22940
22941    * match.c, match.h, matchdef.h, matchpool.c, stage1.c: Keeping track of
22942      number of pairings for each match, and placing a limit on the number of
22943      matchpairs generated for each match with a "promiscuous" variable
22944
229452006-02-25  twu
22946
22947    * stage2.c: Made behavior similar for sequence and reverse complement,
22948      including bug fix and using diffdistance rather than querydistance
22949
229502006-02-24  twu
22951
22952    * pairpool.c, pairpool.h: Added procedure for counting result of bounding
22953      operation
22954
22955    * pair.c, pair.h: Counting amino acids directly for protein PSL output.
22956      Fixed problem in coordinates output where chrstring was NULL.
22957
22958    * dynprog.c: Increased penalty for gaps in single alignments and made them
22959      uniform across sequence quality
22960
22961    * smooth.c, smooth.h: Rewrite of code to use arrays instead of lists.
22962      Reduced definition of short exon.  Now deleting consecutive strings of
22963      short exons.
22964
22965    * translation.c: Noting large insertions and deletions of amino acids, even
22966      if not a multiple of 3
22967
229682006-02-23  twu
22969
22970    * chimera.c, chimera.h, gmap.c, stage3def.h: Moved various functions back to
22971      stage3.c
22972
22973    * stage3.c, stage3.h: Performing substitution of gaps only for final cDNA
22974      direction
22975
22976    * oligoindex.c, oligoindex.h: Turned off trimming of sequence for reference
22977      sequences and for protein sequences
22978
22979    * intron.c, intron.h: Using cdna_direction information in assigning
22980      Intron_type
22981
22982    * dynprog.c, pairpool.c, pairpool.h, stage2.c: Passing in gapp as a
22983      parameter to Pairpool_push
22984
22985    * translation.c: Fixed bug with marking backwards cDNAs relative to
22986      reference sequence
22987
229882006-02-22  twu
22989
22990    * translation.c: Fixed minor bugs in new implementation
22991
22992    * Makefile.am: Rewrite of code for determining mutations and for printing
22993      the results.  Removed mutation.c and mutation.h.
22994
22995    * mutation.c, mutation.h, pair.c, pair.h, translation.c: Rewrite of code for
22996      determining mutations and for printing the results
22997
229982006-02-21  twu
22999
23000    * stage3.c: Moved some chimera functions from stage3.c to chimera.c.  Set
23001      acceptable_mismatches for microexons to be 2.
23002
23003    * Makefile.am, chimera.c, chimera.h, stage3.h, stage3def.h: Moved some
23004      chimera functions from stage3.c to chimera.c
23005
23006    * dynprog.c: Increased probability standard for finding microexons
23007
230082006-02-19  twu
23009
23010    * translation.c: Fixed bug where cDNA translation was incomplete
23011
23012    * stage3.c: Fixed bug in substitution for gaps when ngap is not 3
23013
23014    * stage3.c, stage3.h: Complete rewrite of stage 3 to use gap pairs
23015
23016    * translation.c: Increased parameter for ignoring amino acid mismatches at
23017      ends of query sequence
23018
23019    * smooth.c, smooth.h: Made changes to handle new gap pairs
23020
23021    * pair.c: No longer assigning coordinates for query sequence and genomic
23022      segment within gaps
23023
23024    * matchpair.c, matchpair.h: Limiting 12-mer hits that are considered in
23025      clustering method to those that have a neighboring hit within the query
23026      length
23027
23028    * dynprog.c, dynprog.h: Inserting a single gap pair for introns and cDNA
23029      insertions instead of filling in nucleotides
23030
23031    * stage1.c: Reduced extension of genomic segment when cluster mode is
23032      required
23033
23034    * gmap.c: Put output to stderr when path not found in compressed output
23035
23036    * intron.c, intron.h: Moved Intron_type function here
23037
23038    * pairpool.c, pairpool.h: Added explicit functions for handling gap pairs
23039
23040    * pairdef.h: Added fields for queryjump and genomejump, to be used for gaps
23041
230422006-02-08  twu
23043
23044    * translation.c: Set minimum number of pairs required for a translation
23045
230462006-02-07  twu
23047
23048    * gmap.c: Now checking for existence of -g or -d flag before proceeding
23049
23050    * stage3.c: Fixed problem when solving an intron and unable to peel back
23051      anything.
23052
230532006-02-05  twu
23054
23055    * dynprog.c: Fixed problem with extending 5' and 3' ends with assumption of
23056      no gap.  Added extra efficiency based on this assumption.
23057
230582006-01-19  twu
23059
23060    * README: Enhanced usage statement for gmap_setup
23061
23062    * gmap_setup.pl.in: Cleaned up flags.  Added messages after each make
23063      procedure.  Enhanced usage statement.
23064
23065    * gmap_process.pl.in: Removed code for a separate strain file
23066
23067    * gmap_process.pl.in: Added provision for a separate strain file, but
23068      commented out code
23069
23070    * md_coords.pl.in: Fixed problem when MD file has fewer than 6 lines.  Put
23071      output into an array for printing out in one batch.  Improved handling of
23072      strains.
23073
23074    * fa_coords.pl.in: Put output into an array for printing out in one batch.
23075
23076    * pmap_setup.pl.in, Makefile.am: Removed pmap_setup program
23077
23078    * stage3.c: Added procedure to fix short gaps
23079
23080    * gmapindex.c: Added ability to read reference strain from coords file
23081
23082    * gmap.c: Added provision for different stage 2 index size for PMAP
23083
230842006-01-17  twu
23085
23086    * pair.c: Fixed problem with protein PSL coordinates
23087
230882005-12-15  twu
23089
23090    * backtranslation.c, backtranslation.h: Fixed problems in backtranslation
23091      when genomic segment has lower case characters
23092
23093    * gmap.c, stage3.c, stage3.h: Preserved diagnostic info in PMAP through
23094      backtranslation
23095
23096    * pair.c: Changed printing of cDNA on ambiguous comps to be lower case if
23097      appropriate
23098
23099    * dynprog.c: Changed ends from 1 gap to no gaps.  Changed open/extend
23100      penalties at ends (which may be irrelevant now).
23101
23102    * matchpool.c, stage1.c: Fixed problems with genomic position in reverse
23103      complement matches in PMAP.
23104
23105    * translation.c: Fixed problems with ends of cDNA and genomic translation
23106      for PMAP. Set margin to zero for computing amino acid changes.
23107
23108    * iit-read.c: Commented out abort
23109
231102005-12-14  twu
23111
23112    * sequence.c: Fixed uninitialized heap
23113
231142005-12-13  twu
23115
23116    * gmap.c, stage2.c, stage2.h: Added pruning before stage 2 based on number
23117      of potentially consecutive hits and short paths
23118
23119    * oligoindex.c, oligoindex.h: Added computation of potentially consecutive
23120      hits in the query
23121
23122    * stage1.c: Added filtering of matchlist based on support
23123
23124    * matchpair.c, matchpair.h: Added storage of support and usep in Matchpair_T
23125      object
23126
231272005-12-09  twu
23128
23129    * gmap.c, stage2.c: Removed code for finding PMAP unaligned access error
23130
23131    * gmap.c, stage2.c: Added code for finding PMAP unaligned access error
23132
23133    * backtranslation.c, oligoindex.c: Removed code for checking assertions
23134
23135    * backtranslation.c, oligoindex.c: Added code for checking assertions
23136
231372005-12-08  twu
23138
23139    * pair.c: Streamlined determination of amino acid coordinates in alignment
23140      output
23141
23142    * indexdb.c: Fixed bug in handling offsets in alternative strains in PMAP
23143
23144    * dynprog.c: Reformulated assignment of pointers in two-dimensional array
23145
231462005-12-06  twu
23147
23148    * translation.c: Formatting change
23149
23150    * stage1.c: Turned on use of matchpool.  Fixed problem where list was not
23151      reset to NULL.
23152
23153    * pair.c: Changed dir:unknown to dir:indet
23154
23155    * oligoindex.c: Fixed uninitialized variable in GMAP
23156
23157    * matchpool.c: Improved debugging statements
23158
23159    * matchpair.c: Increased standard for stage 1 support
23160
231612005-12-05  twu
23162
23163    * oligoindex.c: Made code compatible with both GMAP and PMAP
23164
23165    * backtranslation.c, dynprog.c: Reduced memory allocation for
23166      two-dimensional array into a one-dimensional array
23167
23168    * matchpool.c, pairpool.c: Removed initial creation of chunks
23169
23170    * oligoindex.c: Fixed bug in PMAP where stop codon in the genomic sequence
23171      created a value that exceeded oligospace
23172
23173    * pair.c, pair.h: Added a way for the thread worker id to be printed with
23174      the result. Removed ambiguous comp characters from gmap.
23175
23176    * gmap.c, reqpost.c, reqpost.h, result.c, result.h, stage3.c, stage3.h:
23177      Added a way for the thread worker id to be printed with the result
23178
23179    * matchpool.c: Added commands for saving and restoring pointers, so memory
23180      can be re-used
23181
23182    * match.c, match.h, stage1.c: Added compiler conditions for using matchpool
23183      method.
23184
23185    * genome.c: Fixed messages to user
23186
23187    * chrsubset.c: Changed format of output
23188
23189    * translation.c: Fixed bug in translating backward cDNAs.  Extended
23190      translation all the way to the end.
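
    Note on the backtranslation.c, dynprog.c entry above (two-dimensional
    array reduced to a one-dimensional allocation): a minimal sketch of the
    general technique, assuming an int matrix. The names matrix_alloc and
    matrix_free are hypothetical, and the actual layout used in dynprog.c
    may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate all cells in one zeroed block and point
   each row at its slice, so m[r][c] works without one malloc per row. */
int **matrix_alloc(size_t nrows, size_t ncols) {
    int **rows;
    int *cells;
    size_t r;

    assert(nrows > 0 && ncols > 0);
    rows = malloc(nrows * sizeof(*rows));
    cells = calloc(nrows * ncols, sizeof(*cells));
    if (rows == NULL || cells == NULL) {
        free(rows);
        free(cells);
        return NULL;
    }
    for (r = 0; r < nrows; r++) {
        rows[r] = &cells[r * ncols];
    }
    return rows;
}

/* Release the single cell block (reachable through rows[0]) and the
   row-pointer array. */
void matrix_free(int **rows) {
    if (rows != NULL) {
        free(rows[0]);
        free(rows);
    }
}
```

    With this layout the cell storage costs one allocation regardless of
    the number of rows, and freeing takes two calls instead of nrows + 1.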
23191
231922005-12-04  twu
23193
23194    * Makefile.am, gmap.c, matchpair.c, matchpair.h, matchpairdef.h,
23195      matchpairpool.c, matchpairpool.h, stage1.c, stage1.h: Removed special
23196      memory allocation routines for matchpairs
23197
23198    * Makefile.am, gmap.c, match.c, match.h, matchpair.c, matchpair.h,
23199      matchpairdef.h, matchpairpool.c, matchpairpool.h, matchpool.c,
23200      matchpool.h, stage1.c, stage1.h: Added special memory allocation routines
23201      for matches and matchpairs
23202
23203    * iit-read.c: Added an exception handler
23204
23205    * pair.c: Commented out unused procedure
23206
23207    * genome.c: Added include of except.h
23208
232092005-12-02  twu
23210
23211    * gmap.c: Fixed memory leak
23212
23213    * translation.c: Added separate routine for printing list of mutations.
23214      Fixed problem where number of cDNA nucleotides in codon is 4 or 5.
23215
23216    * stage2.c: Clarified different code for gmap and pmap
23217
23218    * stage1.c: Added checking routine for Stage1 object
23219
23220    * access.c, mem.c: Augmented debugging statements
23221
23222    * sequence.c: Fixed case where first sequence of FASTA file has no header,
23223      but subsequent sequences do.
23224
23225    * nr-x.c: Initial import into CVS
23226
23227    * pair.c: Added printing of aapos to all positions in "-f 9" mode
23228
23229    * mutation.c: Simplified logic of merge functions
23230
232312005-11-29  twu
23232
23233    * match.h: Provided interface for new functions
23234
23235    * gmap.c: Fixed bug due to switched parameters
23236
23237    * stage3.c: Added comment
23238
23239    * config.site: Added information about defaults
23240
23241    * README: Added information about Cygwin and defaults
23242
232432005-11-28  twu
23244
23245    * stage1.c: Added include of match.h
23246
232472005-11-23  twu
23248
23249    * acinclude.m4, fopen.m4, configure.ac: Added commands to check for 'b' or
23250      't' flag to fopen
23251
23252    * pmap.c: Removed obsolete file
23253
23254    * Makefile.am, access.c, chrsubset.c, datadir.c, fopen.h, genome-write.c,
23255      genomeplot.c, gmap.c, gmapindex.c, iit-read.c, iit-write.c, iit_store.c,
23256      indexdb.c, oligo-count.c, pdldata.c, pmapindex.c: All calls to fopen now
23257      generalized to handle systems that allow or disallow the 'b' or 't' flag
23258
232592005-11-22  twu
23260
23261    * VERSION: Updated version
23262
23263    * Makefile.am: Removed coords1.test, which is now performed by setup1.test
23264      and setup2.test
23265
23266    * setup1.test.in, setup2.test.in: Added prerequisite of fa_coords program
23267      for setup tests
23268
23269    * README: Made instructions for raw genome build match changes in gmap_setup
23270
23271    * gmap_setup.pl.in: Changed name of make command
23272
23273    * gmap_setup.pl.in: Clarified comments
23274
23275    * gmap.c: Made npaths output correct when user provides a segment
23276
23277    * match.c, matchdef.h, stage1.c: Storing reciprocal of nentries to avoid
23278      repeating this calculation multiple times later
23279
23280    * setup1.test.in, setup2.test.in: Made changes in test to match changes in
23281      program
23282
23283    * align.test.ok, map.test.ok: Made change in output from Mutations to Amino
23284      acid changes
23285
23286    * Makefile.am: Made change in name of coords file
23287
23288    * README: Made instructions consistent with changes in programs
23289
232902005-11-21  twu
23291
23292    * fa_coords.pl.in: Changed a flag.  Output now going to stdout rather than
23293      stderr.
23294
23295    * gmap_setup.pl.in: Now making the call to fa_coords or md_coords within the
23296      Makefile
23297
23298    * matchpair.c: Turned off debugging
23299
23300    * match.c, match.h, matchdef.h: Storing number of entries for each match
23301
23302    * indexdb.c: Moved one type of debug macro into its own category
23303
23304    * stage1.c: Weighted dangling computation according to number of entries for
23305      each match
23306
23307    * translation.c: Fixed bug where pointer went past end of sequence
23308
233092005-11-19  twu
23310
23311    * chrsubset.c, gmap.c: Added printing of chrsubset information.
23312      Consolidated printing of npaths information into a single function.
23313
23314    * backtranslation.c: Using the two aamarkers.  Allowing matches to codons
23315      even for frameshifts.
23316
23317    * mutation.c: Allowed merging of adjacent insertions
23318
23319    * translation.c: Made PMAP assignment of genomic amino acids conform to GMAP
23320      code for assignment of cDNA amino acids
23321
23322    * translation.c: Added further translation of cDNA beyond genomic stop
23323      codon, if possible
23324
23325    * translation.c: Streamlined code for amino acids to cDNA sequence
23326
23327    * translation.c: Overhaul of method for assigning amino acids to cDNA
23328      sequence, now based on separate marking and assignment of codons.
23329
23330    * pair.c, pairdef.h, pairpool.c: Created separate aamarkers for genomic and
23331      cDNA sequence
23332
233332005-11-18  twu
23334
23335    * gmap.c, params.c, params.h, stage3.c, stage3.h, translation.c,
23336      translation.h: Added flag for specifying maximum number of amino acid
23337      changes to show
23338
23339    * stage1.c: Fixed memory leak
23340
23341    * matchpair.c: Fixed read of uninitialized heap when bestsize == 0
23342
23343    * matchpair.c, matchpair.h: Removed storage of support value
23344
23345    * gmap.c, matchpair.c, matchpair.h, stage1.c: Moved sequence pruning
23346      procedures from gmap.c to matchpair.c
23347
23348    * stage1.c: Fixed bug which caused loop to continue unnecessarily
23349
23350    * gmap.c, result.c, stage1.h: Added complete option for freeing Stage 1
23351      objects
23352
23353    * stage1.c: Introduced idea of stepping through trials to identify poor
23354      genomic matches
23355
23356    * matchpair.c, matchpair.h: Introduced method for salvaging individual
23357      12-mer hits
23358
23359    * gmap.c, stage1.c, stage1.h: Simplified call to Stage1_matchlist
23360
23361    * stage1.c: Cleaning up parameters in preparation for cycling through stage 1
23362
233632005-11-17  twu
23364
23365    * indexdb.c: Added forward/backward to pre-loading messages for pmap
23366
23367    * translation.c: Skipping mutation calls on non-standard amino acids
23368
23369    * backtranslation.c: Fixed bug when trying to backtranslate non-standard
23370      amino acids
23371
23372    * gmap_setup.pl.in: Added intermediate commands to Makefile
23373
233742005-11-11  twu
23375
23376    * backtranslation.c: Improved matching of genomic codon to cdna codon.
23377
23378    * translation.c: Added debugging statement
23379
23380    * pair.c: Restored printing of genomic sequence for ambiguous matches in pmap
23381
233822005-11-10  twu
23383
23384    * genome-write.c: Added read of linefeed after FASTA entry in raw genome
23385      files. Improved speed of writing blocks of zeros or X's.
23386
23387    * gmapindex.c: Fixed bug in skip_sequence for raw genome files
23388
23389    * get-genome.c: Implemented printing of raw genome files
23390
23391    * gmap.c, stage3.c, stage3.h: Moved final translation and backtranslation
23392      steps into print procedures
23393
23394    * Makefile.am, backtranslation.c, backtranslation.h, translation.c,
23395      translation.h: Moved nucleotide consistency procedures for pmap into
23396      backtranslation.c
23397
23398    * pair.c: Removed consistency conversion.  Now being done by backtranslation
23399      procedures.  Removed meaning of AMBIGUOUS_COMP for compressed output of
23400      pmap.
23401
23402    * dynprog.c: Added actual coordinates to debugging statements
23403
23404    * translation.h: Made backtranslation procedure more rigorous.
23405
23406    * translation.c: Made backtranslation procedure more rigorous.  Added
23407      debugging statements.
23408
234092005-11-09  twu
23410
23411    * get-genome.c: Changed -r flag to also indicate use of the uncompressed
23412      genome file
23413
23414    * get-genome.c, sequence.c, sequence.h: Added uncompressed raw format for
23415      printing genome segment
23416
23417    * genome-write.c, genome-write.h, gmapindex.c: Added uncompressed raw format
23418      for genome file
23419
234202005-11-08  twu
23421
23422    * pair.c: Reformulated printing of protein-based PSL output
23423
23424    * intlist.c: Added include of stdio.h
23425
23426    * chimera.c: Removed include of nmath.h
23427
23428    * gmap.c: Allowed coordinate output for pmap.  Changed flag to -f 9.
23429
234302005-11-04  twu
23431
23432    * gmap_compress.pl.in: Allowed handling of PMAP output
23433
23434    * gmap_uncompress.pl.in: Fixed bug in printing last line of alignment
23435
234362005-11-01  twu
23437
23438    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Allowed introns to be printed
23439      in exon mode
23440
23441    * matchpair.c: Imposed the requirement that minsize be 2 or more away from
23442      bestsize
23443
234442005-10-31  twu
23445
23446    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added ability to print exons
23447      using genomic sequence
23448
234492005-10-28  twu
23450
23451    * indexdb.c: Using lseek instead of fseek/fseeko for writing a positions
23452      file on disk
23453
23454    * access.c, access.h: Added a function for opening a file as read/write
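
    Note on the two entries above (lseek for the positions file, and opening
    a file as read/write): a minimal POSIX sketch of writing one value at a
    computed offset through a file descriptor. The name write_position_at
    and the 4-byte slot layout are assumptions for illustration, not taken
    from indexdb.c or access.c, and the entries do not state why lseek was
    preferred.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store one unsigned int in slot `index` of a file,
   seeking with lseek() on a descriptor opened read/write.
   Returns 0 on success, -1 on any failure. */
int write_position_at(const char *path, off_t index, unsigned int value) {
    off_t offset;
    int fd;
    int ok;

    assert(path != NULL && index >= 0);
    fd = open(path, O_RDWR | O_CREAT, 0600);  /* create if absent */
    if (fd < 0) {
        return -1;
    }
    offset = index * (off_t) sizeof(value);
    ok = lseek(fd, offset, SEEK_SET) != (off_t) -1
        && write(fd, &value, sizeof(value)) == (ssize_t) sizeof(value);
    if (close(fd) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

    Because lseek() takes an off_t, the offset arithmetic is done in that
    type rather than in long, which is what fseek() accepts.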
23455
234562005-10-27  twu
23457
23458    * VERSION: Updated version
23459
23460    * Makefile.am: Restored setup2.test
23461
23462    * setup2.test.in: Made test use new gmap_setup script
23463
23464    * setup1.test.in: Removed install step
23465
23466    * gmap_setup.pl.in: Fixed bug in clean statement
23467
23468    * configure.ac: Added check for fseeko
23469
23470    * gmap.c: Added information about various type sizes to -V flag
23471
23472    * compress.c, genome-write.c, indexdb.c: Using fseeko if available
23473
234742005-10-25  twu
23475
23476    * gmap_setup.pl.in: Fixed bug where -W flag was in the wrong branch
23477
23478    * pair.c: Removed extraneous linefeed in compressed output
23479
23480    * VERSION: Updated version number
23481
23482    * pair.c: Fixed psl output
23483
23484    * README: Clarified use of ./configure flags.  Added instructions for the -C
23485      flag in fa_coords.
23486
23487    * gmap_setup.pl.in: Restored -W flag for writing directly to file
23488
23489    * gmap_setup.pl.in: Added instructions for clean to Makefile
23490
23491    * fa_coords.pl.in: Added -C flag to make each sequence a separate chromosome
23492
23493    * pair.c, pair.h, stage3.c: Added printing of cDNA direction in compressed
23494      output
23495
23496    * bigendian.h, mem.h: Added include of config.h.
23497
234982005-10-21  twu
23499
23500    * VERSION: Updated version
23501
23502    * README: Added instructions for running make after gmap_setup
23503
23504    * MAINTAINER: Added reminder to check for DEBUG mode
23505
23506    * pair.c, pair.h, stage3.c, stage3.h: Restored printing of strain information
23507
23508    * pdldata.c: Fixed typo
23509
23510    * oligo-count.c: Using new interface to indexdb
23511
23512    * gmap.c: Added error message if user tries to use strain information and
23513      file is not found
23514
23515    * gmap.c: Restored printing of strain information.  Added conversion to
23516      upper case for altstrain sequence.
23517
23518    * genome.c, indexdb.c: Added printing of number of bytes
23519
23520    * access.h: Added MAX32BIT
23521
23522    * access.c: Added debugging statements
23523
23524    * Makefile.am: Added needed files
23525
235262005-10-18  twu
23527
23528    * configure.ac: Added warning message if mmap not available
23529
23530    * fa_coords.pl.in: Added ability to read from stdin
23531
23532    * setup1.test.in: Added install command
23533
23534    * access.c, access.h: Made Access_filesize an external routine
23535
23536    * genuncompress.c, pdldata.c: Using routines from access.c
23537
23538    * Makefile.am: Added access.c and access.h to programs with IIT_T object
23539
23540    * chrsubset.c: Added include of config.h
23541
23542    * genome-write.c, genomeplot.c, get-genome.c, iit_dump.c, iit_get.c,
23543      pmapindex.c, segmentpos.c: Changed calls to IIT_free and IIT_annotation
23544
23545    * gmapindex.c: Removing free of accsegmentpos_table, which fails on some
23546      computers
23547
23548    * gmap.c: Reading user-provided genomic segment and reference sequence
23549      before FASTA query
23550
23551    * iit-write.c, iit-write.h: Made write version of IIT_free static and
23552      renamed it.
23553
23554    * iit-read.h: Changed interface to IIT_annotation.
23555
23556    * iit-read.c: Added FILEIO mode for reading IIT_T objects.  Changed
23557      interface to IIT_annotation.
23558
23559    * iitdef.h: Made mutex part of IIT_T object.  Added offset to IIT_T object
23560      for FILEIO mode.
23561
23562    * indexdb.c: Made mutexes part of Indexdb_T object.  Changed calls to
23563      IIT_annotation.
23564
23565    * genome.c: Made mutex part of Genome_T object
23566
23567    * access.h: Added flag for randomp.  Added function for read/write mmap.
23568
23569    * access.c: Moved file size determination to a separate function
23570
2005-10-14  twu
23572
23573    * gmap.c: Moved reading of input sequences to beginning
23574
23575    * indexdb.c: Minor fixes
23576
23577    * access.h: Returning length and time for Access_immediate
23578
23579    * access.c: Returning length and time for Access_immediate.  Forcing read of
23580      pages during pre-load.
23581
23582    * datadir.c, gmap.c: Removed unused variables
23583
23584    * genome.c, stage3.c: Added necessary include file
23585
23586    * result.c: Addressed compiler warning
23587
23588    * pair.c: Fixed faulty print statement in pslformat_nt
23589
23590    * indexdb.c: Added necessary include file.  Removed unnecessary variables.
23591
    * dynprog.c: Applied type conversion to char used to index array
23593
23594    * Makefile.am: Added access.c and access.h
23595
23596    * blackboard.c, blackboard.h, gmap.c: Added nextchar to Blackboard_T object
23597
23598    * gmap.c: Now reading first sequence in main thread, and using existence of
23599      a second sequence to determine whether to start threads and to pre-read
23600      offsets file for GMAP.  Conditioning some flags based on existence of mmap
23601      and threading support.
23602
23603    * datadir.h: Removed unnecessary include
23604
23605    * intlist.c: Minor fix to resolve gcc compiler warning
23606
23607    * access.c, access.h, genome.c, indexdb.c: Standardized file access routines
23608      and moved them to access.c
23609
2005-10-13  twu
23611
23612    * genomeplot.c, plotdata.c, plotdata.h: Fixed ASCII printing of universal
23613      coordinates when a range is selected
23614
23615    * matchpair.c: Fixed calculation of genome length for segment
23616
23617    * sequence.c: Fixed Sequence_read_unlimited to handle sequences without a
23618      header line.
23619
2005-10-12  twu
23621
23622    * gmap_setup.pl.in: Changed program to generate a Makefile
23623
23624    * fa_coords.pl.in, md_coords.pl.in: Deleted comment about gmap_setup running
23625      time
23626
23627    * gmap_process.pl.in: Initial import into CVS
23628
23629    * Makefile.am: Added instructions for gmap_process
23630
23631    * setup1.test.in: Modified setup test for new interface to utility programs
23632
23633    * Makefile.am: Modified setup test to put binary files in tests directory
23634
23635    * MAINTAINER: Minor note to self
23636
23637    * Makefile.am: Made FULLDIST work for gmap sources
23638
23639    * gmap.c: Made separate flags for batch for offsets and batch for positions
23640      file.  Simplified input thread.
23641
23642    * indexdb.h: Made separate flags for batch for offsets and batch for
23643      positions file
23644
23645    * indexdb.c: Added memory mapping for offsets files under PMAP.  Made
23646      separate flags for batch for offsets and batch for positions file.
23647
23648    * oligoindex.c: Removed stop codon from oligomers in stage 2
23649
23650    * Makefile.am: Moved beta code for GMAP into a separate program
23651
23652    * stage2.c: Moved PMAP conditionals out of debugging statements
23653
2005-10-11  twu
23655
23656    * Makefile.am: Removed conditional distribution of files
23657
23658    * translation.c: Including comp.h header
23659
23660    * gmap.c, pmapindex.c: Changed PMAP indexing interval to be based on amino
23661      acids.
23662
23663    * oligop.c: Removed STOP from amino acid alphabet.
23664
23665    * gmapindex.c: Generating chrsubset file at same time as chromosome file
23666
23667    * indexdb.h: Removed STOP from amino acid alphabet.  Changed PMAP interval
23668      to be based on amino acids.
23669
23670    * indexdb.c: Simplified conversion of oligomer to amino acid index for PMAP.
23671      Removed STOP from amino acid alphabet.  Computing each protein frame
23672      separately.
23673
23674    * configure.ac: Added large file support with AC_SYS_LARGEFILE.  Removed
23675      setup test number 2.  Added gmap_process.
23676
23677    * acinclude.m4: Removed macros for O_LARGEFILE
23678
23679    * open-flags.m4: Removed file open-flags.m4
23680
2005-10-10  twu
23682
23683    * acinclude.m4, open-flags.m4: Added check for O_LARGEFILE in open
23684
2005-10-07  twu
23686
23687    * gmap_setup.pl.in: Restored -W flag and improved it
23688
2005-10-06  twu
23690
23691    * VERSION: Updated version
23692
23693    * configure.ac: Added hook for pmap_setup.pl
23694
23695    * README: Added explanation of full, uncompressed genome, and of batch modes
23696
23697    * gmap_setup.pl.in: Added checks to make sure desired files are built.
23698      Added printing of commands to stdout.
23699
23700    * pmap_setup.pl.in: Added checks to make sure desired files are built
23701
23702    * gmap_compress.pl.in, gmap_uncompress.pl.in, pair.c: Altered format of
23703      compressed output to indicate ambiguous matches
23704
23705    * genome.c: Fixed batch loading of full genomes greater than 2 gigabytes
23706
23707    * gmap.c: Modified message about batch mode and multiple threads mode
23708
23709    * stage2.c: Parameterized alignment characters and defined them centrally in
23710      comp.h.  Restored previous intron penalties based on length.
23711
23712    * pair.c: Parameterized alignment characters and defined them centrally in
23713      comp.h.  Now printing ambiguous nucleotide matches.
23714
23715    * dynprog.c: Parameterized alignment characters and defined them centrally
23716      in comp.h.  Added separate table for consistent nucleotide pairs.
23717
23718    * Makefile.am, comp.h, pairpool.c, stage3.c, translation.c: Parameterized
23719      alignment characters and defined them centrally in comp.h
23720
2005-10-05  twu
23722
23723    * gmap.c: Changed batch mode to be of two types: pre-loading of indices
23724      only, and pre-loading of both indices and genome.
23725
23726    * gmap.c: Clarified various user messages
23727
23728    * indexdb.c: Added an explicit check for a nonsensical offsets file
23729
    * pair.c: Now determining margin width dynamically when printing alignments
23731
2005-10-03  twu
23733
23734    * dynprog.c: Removed reverse intron possibilities from PMAP
23735
23736    * gmapindex.c: Restored monitoring output for logging contigs
23737
23738    * indexdb.c: Added fwd/rev to monitoring commands for indexing offsets and
23739      position files
23740
23741    * compress.c: Added monitoring commands for compressing and uncompressing
23742      files
23743
23744    * gmap_setup.pl.in: Clarified behavior and instructions for building a full
23745      (uncompressed) genome file
23746
23747    * fa_coords.pl.in: Abbreviated monitoring output, with a parameter that
23748      controls which contigs to ignore
23749
23750    * Makefile.am: Added make instructions for pmap_setup
23751
23752    * pmap_setup.pl.in: Initial import into CVS
23753
2005-10-01  twu
23755
23756    * gmap.c: Performing translation of query sequence and genomic segment to
23757      upper case.  Turned off stage 1 for user-provided genomic segment in PMAP.
23758      Provided -G flag for specifying full genome, if it exists.
23759
23760    * genome.c: Turned warning into error, if user wishes to use a full genome
23761      and none exists
23762
23763    * dynprog.c: Allowed intron gap parameter to be arbitrarily large
23764
23765    * pair.c, stage2.c, translation.c: Fixed handling of user-provided genomic
23766      segment with lower case characters for PMAP
23767
23768    * stage1.c: Improved debugging statement
23769
23770    * oligoindex.c: Minor formatting change
23771
23772    * mem.c: Enhanced trap features
23773
23774    * boyer-moore.c: Removed assertions
23775
2005-09-30  twu
23777
23778    * boyer-moore.c, dynprog.c, dynprog.h, stage3.c, stage3.h: Made stage 3 use
23779      upper case for query sequence and genomic segment when needed, but
23780      original sequences for building alignment
23781
23782    * oligoindex.c, oligoindex.h, stage2.c, stage2.h: Made stage 2 use upper
23783      case for query sequence and genomic segment for oligomer chaining, but
23784      original sequences for building alignment
23785
23786    * oligo.c, oligop.c, stage1.c, stage1.h: Made stage 1 assume upper case
23787      query sequence
23788
23789    * pair.c: Removed call to toupper
23790
23791    * complement.h, sequence.c, sequence.h: Provided utilities for making
23792      uppercase and alias versions of sequences
23793
23794    * compress.c: Added toupper as reason for including ctype.h
23795
23796    * plotdata.c: Revised autoscale function
23797
2005-09-29  twu
23799
23800    * stage2.c: For PMAP, fixed bug where C terminus of query sequence was not
23801      aligned.  Eliminated computation of reverse intron direction for PMAP.
23802
23803    * oligoindex.c: Modified comments
23804
23805    * gmap.c: Removed Sequence_trim for PMAP, and reduced stage 2 indexsize.
23806
23807    * pair.c: Changed psl output to reflect definition of a block to be a region
23808      without indels or gaps, instead of an exon
23809
2005-09-22  twu
23811
    * oligoindex.c: Made amino acid index for stage 2 (with 21 amino acids)
      distinct from that of stage 1 (with 16)
23814
23815    * indexdb.c, indexdb.h, oligop.c: Compressing 21 amino acids into 16 to
23816      allow offsets of amino acid 7-mers to fit into less than 2 GB
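
      Worked arithmetic for the entry above, as a rough sketch rather than the
      actual indexdb.c code.  It assumes one 4-byte offset per 7-mer (the entry
      does not state the entry size); kmer_count and offsets_bytes are
      hypothetical helper names.

```c
#include <stdint.h>

/* Number of distinct k-mers over an alphabet of the given size. */
static uint64_t kmer_count(unsigned alphabet_size, unsigned k) {
    uint64_t n = 1;
    for (unsigned i = 0; i < k; i++) {
        n *= alphabet_size;
    }
    return n;
}

/* Bytes needed for one offset per k-mer.  entry_bytes is an assumption
   (4 would correspond to a 32-bit offset). */
static uint64_t offsets_bytes(unsigned alphabet_size, unsigned k,
                              unsigned entry_bytes) {
    return kmer_count(alphabet_size, k) * (uint64_t)entry_bytes;
}
```

      Under that assumption, offsets_bytes(21, 7, 4) is 7204354164 bytes (about
      7.2 GB), while offsets_bytes(16, 7, 4) is 1073741824 bytes (1 GiB), which
      fits under the 2 GB limit.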
23817
2005-09-21  twu
23819
23820    * stage1.c: Parameterized size of oligomers for PMAP
23821
23822    * gmap.c, indexdb.h: Parameterized interval for stage 1 when user provides a
23823      genomic segment
23824
23825    * pmapindex.c: Parameterized size of oligomers
23826
23827    * matchpair.c: Turned off debugging
23828
23829    * indexdb.c, indexdb.h: Introduced indexing of 7-mers by PMAP
23830
23831    * VERSION: Updated version
23832
23833    * gmap_setup.pl.in: Commented out -W flag for forcing write to file.  Added
23834      option -G for making an uncompressed version of the genome (.genome file).
23835
23836    * fa_coords.pl.in: Allowed both chr and Chr in parsing for chromosomal
23837      mapping
23838
23839    * config.site: Clarified possible choices for LDFLAGS
23840
23841    * matchpair.c: Penalizing clusters spread out in repetitive genomic regions
23842
23843    * pdlimage.c: Made images in color
23844
2005-09-20  twu
23846
23847    * gmapindex.c: Commented out monitoring statement about logging contigs
23848
23849    * stage1.c: Fixed a bug involving subtraction of two unsigned ints into a
23850      signed int, occurring for chromosomes greater than 2^31 in length.
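
      A minimal sketch of the class of bug described above, not the actual
      stage1.c code.  The function names span_narrow and span_wide are
      hypothetical, and 32-bit unsigned genomic positions are an assumption.

```c
#include <stdint.h>

/* Problem pattern: the difference of two unsigned 32-bit positions is
   stored in a signed 32-bit int.  If the true difference exceeds
   INT32_MAX (possible once a chromosome is longer than 2^31), the
   conversion is implementation-defined and typically yields a negative
   number. */
static int32_t span_narrow(uint32_t high, uint32_t low) {
    return (int32_t)(high - low);
}

/* One possible fix: widen both operands to a signed 64-bit type before
   subtracting, so every difference of two 32-bit positions is
   representable, including negative ones. */
static int64_t span_wide(uint32_t high, uint32_t low) {
    return (int64_t)high - (int64_t)low;
}
```

      For small differences the two functions agree; they diverge only when
      the positions are more than INT32_MAX apart, which is why such a bug can
      stay hidden until a very long chromosome is indexed.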
23851
2005-09-19  twu
23853
23854    * stage2.c: Fixed bug when stage 2 fails
23855
23856    * pair.c: Fixed assessment of unknown bases for PMAP queries
23857
23858    * matchpair.c: Fixed computation of stretch for PMAP protein queries
23859
23860    * indexdb.c: Removed debugging flag
23861
23862    * iit_get.c: Added termination message and flushing output when input coming
23863      from stdin
23864
23865    * gmap.c: Added debugging messages
23866
2005-09-16  twu
23868
23869    * Makefile.am, pdlimage.c: Initial addition of pdlimage to CVS.
23870
2005-09-08  twu
23872
23873    * iit-read.c, iit-read.h, stage3.c: Added option to print levels of map
23874      results
23875
23876    * intlist.c, intlist.h: Added command for Intlist_to_string
23877
23878    * gmap.c: Modified directory printing to go to a given file pointer.  Added
23879      information about default directory to print_version command.
23880
23881    * datadir.c, datadir.h, get-genome.c: Modified directory printing to go to a
23882      given file pointer
23883
23884    * genomeplot.c, plotdata.c, plotdata.h: Changed format of positions file.
23885      Changed title for summary genome plots.
23886
2005-09-06  twu
23888
23889    * get-genome.c: Added ability to print levels of map contents.  Fixed bug in
23890      interpreting an entire chromosome.
23891
2005-09-02  twu
23893
23894    * genomeplot.c, pdldata.c, pdldata.h, plotdata.c, plotdata.h: Generalized
23895      variable for transform and added reciprocal
23896
23897    * genomeplot.c, plotdata.c, plotdata.h: Added option to autoscale
23898
23899    * genomeplot.c, pdldata.c, pdldata.h, plotdata.c, plotdata.h: Added options
23900      for computing summary of multiple samples
23901
23902    * genomeplot.c, plotdata.c, plotdata.h: Added functions for printing a
23903      threshold line, and for printing output in ascii format.
23904
23905    * pdldata.c: Now removing line feeds from annotations.  If no annotations
23906      are available, using sample numbers.
23907
2005-08-29  twu
23909
23910    * plotgenes.c, plotgenes.h: Improved display of genes
23911
23912    * genomeplot.c: Made changes so PDL file is read only when necessary.  Added
23913      extra room for showing genes.
23914
2005-08-26  twu
23916
23917    * genomeplot.c, plotdata.c, plotdata.h: Printing accession header only if
23918      one sample per page.  Reduced default number of genomes per page to 12.
23919
23920    * genomeplot.c, pdldata.c, pdldata.h, plotdata.c: Added ability to read
23921      sample identifiers from a separate file for PDL input
23922
23923    * genomeplot.c: Added ability to plot a single page
23924
2005-08-23  twu
23926
23927    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: For PMAP, allowed PSL output
23928      in both nucleotide coordinates and protein coordinates
23929
23930    * Makefile.am, genomeplot.c, plotgenes.c, plotgenes.h: Added ability to plot
23931      genes
23932
2005-08-18  twu
23934
23935    * gmap.c, stage3.c, stage3.h: Added option for printing coordinates
23936
23937    * plotdata.c, plotdata.h: Added options for printing dots and overlapping
23938      samples
23939
23940    * pair.c, pair.h: Added option for printing coordinates.  Trying to fix PSL
23941      output for PMAP.
23942
23943    * genomeplot.c: Added option for printing dots and overlapping samples
23944
23945    * color.c, color.h: Added color brewer palette
23946
23947    * plotdata.c: Prevented printing of empty strings
23948
2005-08-16  twu
23950
    * plotdata.c, plotdata.h: Fixed bug when only a subset of genes is selected.
      Added commands for gif output.
23953
23954    * get-genome.c, gmap.c: Showing available map files when valid one is not
23955      entered
23956
23957    * datadir.c, datadir.h: Added function to list directory contents
23958
23959    * genomeplot.c: Allowed user to specify a list of samples to plot
23960
23961    * intlist.c, intlist.h: Added function Intlist_from_string
23962
23963    * stage3.c: Fixed mapping to account for cDNA direction
23964
2005-08-10  twu
23966
23967    * Makefile.am, genomeplot.c, pdldata.c, pdldata.h, plotdata.c, plotdata.h:
23968      Allowed genomeplot to read PDL files
23969
2005-08-07  twu
23971
23972    * iit-read.c: Improved debugging statements
23973
23974    * gmap.c, stage3.c, stage3.h: Added option to map by exons
23975
23976    * pair.c, pair.h: Added function to retrieve exon bounds
23977
2005-08-04  twu
23979
23980    * Makefile.am, gmap.c: Merged pmap main code into gmap.c
23981
23982    * sequence.c, sequence.h: Added functionality for pmap chimeras
23983
23984    * stage3.c, stage3.h: Changed function to take queryntlength instead of
23985      queryseq.  Made function work with both gmap and pmap.
23986
23987    * chimera.c, chimera.h: Changed functions to take queryntlength instead of
23988      queryseq
23989
23990    * pair.c: Made PSL format for proteins print protein coordinates
23991
2005-08-03  twu
23993
23994    * chimera.c, chimera.h, gmap.c: Changed chimera algorithm to potentially
23995      search both sides of an incomplete alignment
23996
2005-08-02  twu
23998
23999    * stage3.c: Increased size of merge length for chimeric exon-exon junctions
24000
24001    * sequence.c: Restored trimming of subsequences
24002
2005-08-02  gcavet
24004
    * modules: Put back to original state

    * modules: Added dev module
24008
2005-08-01  twu
24010
24011    * stage3.c, stage3.h: Changed chimeric margin detection to work on both ends
24012
24013    * stage1.c: Added debugging statements
24014
24015    * chimera.c, chimera.h, gmap.c: Changed chimeric search to work on both ends
24016      that fail to align
24017
24018    * sequence.c: Turned trimming off for subsequences
24019
24020    * pair.c, pair.h: Added indel penalties at appropriate end for chimeric path
24021      scores
24022
24023    * Makefile.am, get-genome.c: Allowed user to look up information in map iit
24024      files
24025
24026    * chimera.c: Tested code for checking if breakpoint is outside the alignment
24027
2005-07-29  twu
24029
24030    * datum.c, datum.h, genomeplot.c, plotdata.c, plotdata.h: Allowed colors to
24031      be specified in input file
24032
24033    * chrsubset.c, chrsubset.h, genomeplot.c, plotdata.c: Implemented
24034      user-selected genomic range
24035
2005-07-28  twu
24037
24038    * plotdata.c: Allowed program to handle NaNQs
24039
24040    * genomeplot.c, plotdata.c, plotdata.h: Added log and signed cube root
24041      functions
24042
24043    * genomeplot.c, plotdata.c, plotdata.h: Added ability to print genome on a
24044      single line
24045
2005-07-27  twu
24047
24048    * genomeplot.c: Added ability to handle multiple samples
24049
24050    * plotdata.c, plotdata.h: Added ability to detect and read header lines.
24051      Made code for starting and ending pages extern.
24052
24053    * genomeplot.c, plotdata.c, plotdata.h: Implemented printing of circular
24054      genome
24055
2005-07-26  twu
24057
24058    * pair.c: Corrected query coordinates of chimera in compressed mode
24059
24060    * pairpool.c: Fixed problem where a gap was left at the 5' end of a bounded
24061      transfer.
24062
24063    * datum.c, datum.h, genomeplot.c, plotdata.c, plotdata.h: Added ability to
24064      read specified colors for each line
24065
24066    * datum.c, datum.h, genomeplot.c, plotdata.c, plotdata.h: Allowed printing
24067      of segments between chromosomes
24068
2005-07-25  twu
24070
24071    * pmap.c, stage3.c, stage3.h: Allowed chimeric pieces to be merged over
24072      longer length if ends have strong splice sites
24073
24074    * stage1.c: Restored reader overlap for longer sequences
24075
24076    * gmap.c, iit-read.c, iit-read.h: Added chromosomal positions to map
24077      information
24078
24079    * chrnum.c, chrnum.h: Added function to get offset for a chrnum
24080
24081    * Makefile.am: Added chrsegment.h to sources for genomeplot
24082
2005-07-22  twu
24084
24085    * gmap.c, pmap.c, stage3.c, stage3.h: Allowed two parts of chimera to merge
24086      if close on the genome
24087
2005-07-21  twu
24089
24090    * stage3.c, stage3.h, translation.c, translation.h: Clarified code specific
24091      to PMAP and GMAP
24092
24093    * sequence.c: Changed sequence header for PMAP to refer to amino acids
24094
24095    * pair.c, pair.h: Added ability to print inferred nucleotide sequence for
24096      PMAP
24097
24098    * oligop.c: Clarified meaning of INDEX1PART to be number of amino acids
24099
24100    * pmapindex.c: Clarified meaning of INDEX1PART to be number of nucleotides
24101
24102    * oligo-count.c: Using new interface for Reader_new
24103
24104    * indexdb.c, indexdb.h: Clarified meaning of INDEX1PART to be number of
24105      nucleotides for PMAP.
24106
    * pmap.c: Removed -q flag for specifying stage 1 interval, and removed -T
      flag for truncating sequence at full-length protein.  Defined -Q flag to
      print inferred nucleotide sequence.
24110
24111    * gmap.c: Removed -q flag
24112
24113    * Makefile.am: Added beta source files for pmap
24114
24115    * stage1.c: Introduced min intron length.  Clarified meaning of INDEX1PART
24116      to be number of amino acids.  Added debugging statements.
24117
24118    * reader.c, reader.h: Allowed crossover of start pointer and end pointer so
24119      that middle oligomers will be read.  Should help in mapping of short
24120      sequences.
24121
24122    * translation.c, translation.h: Moved combinatorial testing of codons to
24123      translation step
24124
24125    * stage3.h: Performing protein translation only when necessary.
24126
24127    * stage3.c: Considering only forward intron directions for pmap.  Performing
24128      protein translation only when necessary.
24129
24130    * stage2.c: Considering only forward intron directions for pmap
24131
24132    * sequence.c: Changed header line for pmap
24133
24134    * pair.c: Made further changes to accommodate plus sign in alignment
24135
24136    * pair.c: Introduced plus sign in alignment
24137
24138    * oligoindex.c: Improved efficiency of analyzing genomic segment, by storing
24139      indices for each frame
24140
24141    * genome.c: Added debugging statements
24142
24143    * dynprog.c, dynprog.h: Changed combinatorial instantiation of codons to a
24144      single instantiation
24145
24146    * Makefile.am, params.c, params.h, pmap.c: Gave pmap the same overall
24147      behavior as gmap, including multi-threading and flag options
24148
2005-07-19  twu
24150
24151    * Makefile.am, block.c, block.h, dynprog.c, dynprog.h, gmap.c, indexdb.c,
24152      indexdb.h, oligoindex.c, oligop.c, oligop.h, pair.c, pmap.c, pmapindex.c,
24153      sequence.c, sequence.h, stage1.c, stage1.h, stage2.c, stage2.h, stage3.c,
24154      stage3.h, translation.c, translation.h: Introduced pmap and pmapindex
24155
2005-07-15  twu
24157
24158    * get-genome.c: Added range format to allow negative lengths
24159
24160    * genome.c: Added exception when requested length exceeds allocated buffer
24161      length
24162
24163    * except.c: Added printing of exception message
24164
24165    * chimera.c: Fixed problem when donor or acceptor length exceeded allocated
24166      buffer length
24167
24168    * Makefile.am: Fixed handling of non-distributed source code
24169
24170    * align.test.ok: Changed genomic coordinate to match new computation of
24171      coordinates in gaps
24172
24173    * VERSION: Updated version number
24174
2005-07-13  twu
24176
24177    * dynprog.c: Made genomic positions on left and right ends of gap constant,
24178      to avoid problems in stage 3 computations
24179
24180    * memchk.c: Made procedures thread-safe
24181
    * stage3.c, stage3.h: Fixed genomic positions on left and right ends of gap.
      Removing gaps at 5' end, possibly introduced by smoothing.
24184
24185    * mem.c, mem.h: Improved memory trap procedures
24186
24187    * matchpair.c: Added check before freeing some possibly null structures
24188
24189    * genome.c: Removed duplicate FREE of filename
24190
24191    * gmap.c: Fixed genomic positions on left and right ends of gap.  Fixed bug
24192      when chimera was not reset to NULL.
24193
24194    * pair.c, pair.h: Fixed genomic positions on left and right ends of gap
24195
2005-07-12  twu
24197
24198    * stage3.c: Fixed bugs in computing dual introns, dealing with previously
24199      computed gaps, and returning coordinates for empty peelbacks.
24200
24201    * stage1.c: Increased parameters for maximum number of matching pairs
24202      considered
24203
24204    * pairpool.c: Enhanced debugging output
24205
24206    * pair.c, pair.h: Added procedure for printing a single pair
24207
24208    * mutation.c: Changed unnamed unions to named unions
24209
2005-07-08  twu
24211
24212    * stage3.c: Added extra check to make sure pairs is non-empty
24213
24214    * gmap.c: Initialized chimera to be NULL
24215
24216    * oligoindex.c: Fixed bug caused by writing to a random location when
24217      indexsize < 8.
24218
24219    * mem.c: Improved trap code
24220
24221    * memchk.c: Changed types to be consistent with regular version of memory
24222      manager
24223
24224    * memchk.c: Added checking implementation of memory manager
24225
24226    * stage3.c: Fixed a segmentation fault bug.
24227
    * stage2.c: Changed distpenalty to ignore query distance and
      max_intronlength, and simplified computation.  These values probably did
      not affect the previous computation anyway.
24231
24232    * mutation.c: Fixed problem caused by removal of unnamed union
24233
24234    * iitdef.h: Included header file for off_t type.
24235
24236    * gmap.c: Changed maxpeelback for cross-species mode back to previous value
24237
24238    * VERSION: Updated version
24239
24240    * configure.ac: Added check for caddr_t type.  Added check for madvise flags.
24241
24242    * gmap.c: Removed unnecessary variables and arguments.  Changed variable
24243      type of nworkers.
24244
24245    * oligoindex.h, pairpool.c, pairpool.h, stopwatch.c, stopwatch.h: Added
24246      formal void argument
24247
24248    * compress.c, genome.c, genome.h, matchpair.c, oligoindex.c, stage2.c:
24249      Removed unnecessary variables
24250
24251    * block.c, boyer-moore.c, dynprog.c, dynprog.h, matchpair.h, oligo.c,
24252      oligo.h, stage1.c, stage1.h: Removed unnecessary arguments
24253
24254    * sequence.c, sequence.h: Added formal void argument.  Changed some variable
24255      types.
24256
24257    * indexdb.c, match.c, segmentpos.c: Changed print statement
24258
24259    * pair.c: Added static specification to some functions
24260
24261    * iit-read.c, iitdef.h: Changed some variable types
24262
24263    * indexdb.c: Increased interval of monitoring output from 1 million nt to 10
24264      million nt
24265
24266    * gmap_uncompress.pl.in: Fixed bug in argument list
24267
24268    * bigendian.c, boyer-moore.c, chrom.c, chrsubset.c, chrsubset.h, compress.c,
24269      datadir.c, dynprog.c, except.c, genome-write.c, genome.c, get-genome.c,
24270      gmap.c, gmapindex.c, iit-read.c, iit-write.c, iit_dump.c, iit_get.c,
24271      iit_store.c, indexdb.c, interval.h, intlist.c, list.c, match.c,
24272      matchpair.c, md5-compute.c, md5.c, mutation.c, oligo-count.c, oligo.c,
24273      oligoindex.c, pair.c, pairpool.c, params.h, reqpost.c, segmentpos.c,
24274      segmentpos.h, sequence.c, smooth.c, stage1.c, stage2.c, stage3.c,
24275      translation.c, uintlist.c: Made changes to satisfy pedantic gcc compiler
24276      warnings and to comply with ANSI C
24277
24278    * acinclude.m4: Added autoconf macro for madvise flags
24279
24280    * MAINTAINER: Added comment about strict compiler checking
24281
24282    * madvise-flags.m4: Initial import into CVS
24283
2005-07-07  twu
24285
24286    * gmap.c: Changed parameters to prevent segmentation fault in cross-species
24287      mode
24288
24289    * gmap.c, stage3.c, stage3.h, pair.c, pair.h: Added psl output format
24290
24291    * stage1.c: Increased matchpairs allowed at pre-unique stage
24292
24293    * match.c, match.h: Trivial formatting change
24294
24295    * gmapindex.c: Removed some unused variables
24296
24297    * get-genome.c: Changed usage statement for coordinate interval
24298
24299    * datadir.c: Added error message when genome subdirectory is not readable
24300
24301    * chrnum.c, chrnum.h: Added Chrnum_length command, needed for psl output
24302      format
24303
2005-06-23  twu
24305
24306    * VERSION: Updated version for release
24307
24308    * stage2.c: Increased cross-species penalty for intron length
24309
24310    * gmap.c: Added other constraints on using oligo depth.  Reporting failure
24311      type.  Separated out beta source files from gmap.
24312
24313    * result.c, result.h: Added failure type
24314
24315    * chrsubset.c, chrsubset.h, plotdata.c: Added checks if chrsubset is NULL.
24316
24317    * genomeplot.c: Added getopt to genomeplot.  Added mode for printing
24318      segments.
24319
24320    * Makefile.am: Added getopt to genomeplot.  Separated out beta source files
24321      from gmap.
24322
2005-06-21  twu
24324
24325    * genomeplot.c, plotdata.c, plotdata.h: Fixed coloring of raw data,
24326      depending on whether segmentation is performed.
24327
24328    * Makefile.am: Moved chrsegment functionality to genomeplot
24329
24330    * gmap.c: Giving crossspecies flag to Stage 2
24331
24332    * genomeplot.c: Getting segment breakpoints back in three separate lists
24333
24334    * chrsegment.c, chrsegment.h: Added re-checking of segment breakpoints
24335
24336    * intlist.c, intlist.h: Added Intlist_delete function
24337
24338    * stage2.c, stage2.h: Implemented different intron penalties for
24339      crossspecies mode
24340
24341    * stage1.c: Restored full functionality for crossspecies mode
24342
2005-06-16  twu
24344
24345    * stage1.c: Added check for too many matchpairs before applying
24346      Matchpair_filter_unique
24347
24348    * chrsegment.c, chrsegment.h: Using chromosomal positions in calculations
24349
24350    * genomeplot.c, plotdata.c, plotdata.h: Modified calls to Plotdata_values
24351      and Plotdata_chrpositions
24352
24353    * genomeplot.c, iit-read.c, iit-read.h: Added function IIT_length
24354
24355    * plotdata.c: Storing chrpositions and values as individual arrays
24356
24357    * genomeplot.c: Using a tree structure to store segment results.
24358
24359    * chrsegment.c, chrsegment.h: Using a tree structure to store segment
24360      results.  Added check for single breakpoint in addition to double
24361      breakpoints.
24362
2005-06-15  twu
24364
24365    * chrsegment.c, chrsegment.h, genomeplot.c: Implemented recursive
24366      segmentation, generating a list of segments
24367
24368    * iit-read.c: Fixed problem with memory fault
24369
24370    * Makefile.am, chrsegment.c, chrsegment.h, genomeplot.c, plotdata.c,
24371      plotdata.h: Merged chrsegment functionality into genomeplot
24372
24373    * genomeplot.c: Fixed some memory leaks
24374
24375    * datum.c, datum.h: Added Datum_T object for use by Plotdata_T
24376
24377    * Makefile.am: Added program chrsegment and added Datum_T object to
24378      genomeplot
24379
24380    * chrsegment.c, nr-x.h: Added program chrsegment
24381
24382    * plotdata.c, plotdata.h: Now storing data as sorted within each chromosome
24383
24384    * chrsubset.c, chrsubset.h: Added function to compute and retrieve old
24385      indices
24386
2005-06-14  twu
24388
24389    * Makefile.am, chrsubset.c, chrsubset.h, color.c, color.h, doublelist.c,
24390      doublelist.h, genomeplot.c, plotdata.c, plotdata.h: Added program
24391      genomeplot
24392
24393    * uintlist.c: Fixed typo
24394
24395    * iit-read.c: Skipping freeing of memory, since it sometimes gives a memory
24396      fault.
24397
24398    * Makefile.am, chimera.c, maxent.c, maxent.h, splice-site.c, splice-site.h:
24399      Changed splice site predictor from scoring matrix to maxent method
24400
2005-06-10  twu
24402
24403    * indexdb.c: Added error message when user-provided genomic segment is
24404      invalid
24405
2005-06-03  twu
24407
24408    * chimera.c, chimera.h: Added detection of exon-exon boundary for chimeras
24409      in both forward and reverse directions
24410
24411    * gmap.c, pair.c, pair.h, stage3.c, stage3.h: Added output of cDNA direction
24412      of exon-exon boundary for chimeras
24413
2005-06-02  twu
24415
24416    * dynprog.c, dynprog.h, gmap.c, stage3.c: Restored previous behavior for
24417      finding microexons.  Changed meaning of end_microexons_p to be an
24418      allowance for longer introns at the ends.
24419
24420    * chimera.c: Improved debugging output
24421
2005-06-01  twu
24423
24424    * stage3.c, stage3.h: Added utilities for new chimera functions.
24425
24426    * gmap.c: Added stage 3 calls for truncating full length.  Using Chimera_T
24427      objects and new chimera functions.
24428
24429    * stage1.c, stage1.h: Using ends only for cross-species mode in stage 1
24430
24431    * result.c, result.h: Created Chimera_T object.
24432
24433    * pair.c, pair.h: Added utility programs for chimera evaluation.
24434
24435    * chimera.c, chimera.h: Added search for exon-exon boundaries in chimeras.
24436      Created Chimera_T object.
24437
24438    * Makefile.am, splice-site.c, splice-site.h: Added splice site calculations
24439      to chimera evaluation.
24440
2005-05-24  twu
24442
24443    * gmap.c: Moved translate calls up to gmap.c.  Added hook for -T flag for
24444      truncating full-length sequence.
24445
24446    * stage3.c, stage3.h: Using function Pairpool_transfer_bounded.  Moved
24447      translate calls up to gmap.c.
24448
24449    * pairpool.c, pairpool.h: Added function Pairpool_transfer_bounded.
24450
2005-05-20  twu
24452
24453    * VERSION: Revised version
24454
24455    * stage3.c, stage3.h: Turned off default microexon finding at ends.  Cleaned
24456      up margin function for identifying chimeras.
24457
24458    * pair.c: Changed computation of matchscores
24459
24460    * oligoindex.c: Changed definition of oligodepth.
24461
24462    * gmap.c: Added -U flag to turn on microexons at ends.  Changed code for
24463      chimeras, and changed meaning of -x flag.
24464
24465    * chrsubset.c: Added check on freeing object.
24466
24467    * chimera.c: Fixed debugging statement
24468
24469    * Makefile.am: Added beta testing flag.
24470
2005-05-09  twu
24472
24473    * Makefile.am: Added compiler instructions for pthreads to various programs
24474
24475    * VERSION: Modified version number.
24476
24477    * README: Added information about -E feature of fa_coords and gmap_setup,
24478      and information about editing coords.txt.
24479
24480    * MAINTAINER: Added reminder to modify VERSION.
24481
24482    * gmap_setup.pl.in: Removed reverse complement procedures here; now being
24483      done by gmapindex.  Allowed specification of a command.
24484
24485    * fa_coords.pl.in: Introduced chromosome NA for headers that cannot be
24486      parsed.  Allowed specification of a command.  Improved handling of Celera
24487      genomes.
24488
24489    * genome.c, indexdb.c: Put mutexes around read procedures for the
24490      combination of multi-threading and non-memory mapped reading of file.
24491
24492    * gmap.c: Fixed bug from uninitialized querysubseq.
24493
24494    * pair.c, pair.h, result.c, result.h, stage3.c, stage3.h: Allowed printing
24495      of range of chimera breakpoints
24496
24497    * interval.c, interval.h: Changed interface to some functions
24498
24499    * iit-read.c, iit-write.c: Fixed bug in debug version of dump.  Changed
24500      calls to Interval_T functions.
24501
24502    * gmapindex.c: Changed count_sequence() to read a line at a time
24503
24504    * genome-write.c: Properly handling contigs marked as reverse complement.
24505
24506    * gmap.c: Using fscore threshold to determine statistical significance.
24507      Reporting equivalent positions for breakpoint.
24508
24509    * chimera.c, chimera.h: Using fscore threshold to determine statistical
24510      significance
24511
2005-05-06  twu
24513
24514    * gmap_setup.pl.in: Handling other NCBI cases where version numbers are
24515      missing
24516
24517    * genome-write.c, indexdb.c: Minor changes in monitoring output
24518
24519    * VERSION: Updated version number
24520
24521    * README: Added explanation of output ordering with multiple threads
24522
24523    * coords1.test.ok: Changed to add new comment line in coords.txt
24524
24525    * README: Minor textual change
24526
24527    * gmap_setup.pl.in: Added -q flag for specifying indexing interval.  Allowed
24528      comment lines to be in coords.txt.
24529
24530    * md_coords.pl.in: Improved messages to user.
24531
24532    * fa_coords.pl.in: Added handling of unmapped contigs for Ensembl genomes.
24533      Improved messages to user.  Added check for possible conversions of
24534      alternate chromosomes to alternate strains.
24535
24536    * gmap_uncompress.pl.in: Fixed bug due to old code that referred to the -R
24537      flag
24538
24539    * gmap.c: Enhanced result to show number of matches, mismatches, and indels
24540      in alternative to chimera.  Introduced maxpaths of 0 to indicate output of
24541      both paths of chimera if present, otherwise one path.
24542
24543    * pair.c, pair.h, sequence.c, sequence.h, stage3.c, stage3.h: Removed
24544      references to ntrimmed
24545
24546    * result.c, result.h: Enhanced result to show number of matches, mismatches,
24547      and indels in alternative to chimera
24548
    * gmap.c: Removed check for badoligos.  Modified logic for computing
      chimeras. Made calls to initialization and termination routines for
      Dynprog_T.
24552
24553    * chimera.c: Fixed memory leak
24554
24555    * pair.c: Removed printing of ntrimmed nucleotides
24556
24557    * stage3.c, stage3.h: Added functions for reporting matches, mismatches,
24558      indels, and margin of a Stage3_T object
24559
24560    * translation.c: Added initial values for translation_start and
24561      translation_end
24562
24563    * stage2.c: Removed computation of stage2 support.  Simplified loop
24564      conditions.
24565
24566    * oligoindex.c, oligoindex.h: Removed computation of stage2 support
24567
24568    * dynprog.c, dynprog.h: Replaced functions with arrays for computing
24569      pairdistances and jump penalties
24570
2005-05-05  twu
24572
24573    * oligoindex.c, oligoindex.h: Changed memory allocation scheme, by setting
24574      ALLOCSIZE == MAXHITS. Assigning blocks in ascending order of available
24575      slots.  Computing trim_start and trim_end.  Reporting support for stage 2.
24576
24577    * gmap.c: Changed calls to Sequence_read().  Using oligomer-based method for
24578      trimming query sequence.
24579
24580    * md5-compute.c, oligo-count.c: Changed calls to Sequence_read()
24581
24582    * sequence.c, sequence.h: Removed poly-A and poly-T detection in favor of
24583      oligomer-based trimming at ends.
24584
24585    * stage2.c, stage2.h: Added check for stage 2 support.
24586
24587    * stage1.c: Restored terminal sampling for short sequences.  Fixed potential
24588      bug with subtracting unsigned ints.  Enhanced debugging messages.
24589
24590    * md5-compute.c, oligo-count.c, sequence.c, sequence.h: Modified functions
24591      to report next char in input.
24592
24593    * matchpair.c, matchpair.h: Added reporting of stage1 support and stage1
24594      stretch.
24595
24596    * gmap.c: Added checks for bad input sequences based on oligo depth, bad
24597      oligos, stage1 support, and stage2 support.  Moved message about batch
24598      mode earlier, if evidence of a second sequence is present.
24599
24600    * gmap.c: Chopping chimeras at breakpoint, and providing a flag to allow
24601      overlaps at the breakpoint.
24602
24603    * stage3.c, stage3.h: Simplified interface to Stage3_copy.
24604
24605    * pair.c, pair.h: Removed coverage correction for genomic gaps.  Added way
24606      to turn off merge_gaps during copying of pairs.
24607
2005-05-04  twu
24609
24610    * stage2.c: Made changes in individual instructions to improve speed
24611
24612    * oligoindex.c: Added overabundant field
24613
    * chimera.c: Sped up computation
24615
24616    * gmap.c: Using explicit step for marking oligos in the query.  Terminating
24617      attempt at mapping if oligo depth exceeds 2.  Fixed memory leak.
24618
24619    * stage2.c, stage2.h: The variable badsequencep is now fed into
24620      Stage2_compute.
24621
24622    * stage1.c: Killed terminal sampling for short sequences.  Reduced values
24623      for maxentries.  Both done to improve speed.
24624
24625    * oligoindex.c, oligoindex.h: Added an explicit step for marking oligos in
24626      the query, which needs to be done only once for each query sequence.
24627
24628    * chimera.h: Added computation of margin.
24629
24630    * chimera.c: Added computation of margin.  Improved debugging output.
24631
24632    * gmap.c: Fixed bug where bestfrom == bestto.  Added check for sufficient
24633      margin at ends before finding chimera.
24634
24635    * gmap_compress.pl.in: Changed compression routine to handle chimera
24636      information
24637
24638    * chrsubset.c: Fixed bug where stdin was closed if .chrsubset file didn't
24639      exist
24640
24641    * stage3.h: Added function to compute matchscores for chimera detection.
24642
24643    * stage3.c: Changed calls to Sequence_T functions.  Added function to
24644      compute matchscores for chimera detection.
24645
24646    * stage2.c: Performing Stage 2 from trim start to trim end, instead of
24647      entire sequence.  Changed calls to Sequence_T functions.
24648
24649    * stage1.c: Changed calls to Sequence_T and Reader_T functions
24650
24651    * sequence.c, sequence.h: Cleaned up interface.  Added ability to print
24652      trimmed part of sequence.
24653
24654    * Makefile.am, chimera.c, chimera.h, gmap.c, nmath.c, nmath.h, pair.c,
24655      pair.h: Added chimera detection based on Chow test
24656
24657    * md5-compute.c: Changed call to Sequence_T function.  Using full sequence
24658      now for MD5 computation.
24659
24660    * matchpair.c: Removed call to Sequence_T function
24661
24662    * oligoindex.c: Changed calls to Sequence_T functions
24663
24664    * oligo-count.c: Changed call to Reader_new
24665
24666    * get-genome.c: Changed call to Sequence_print
24667
24668    * reader.c, reader.h: Storing querystart and queryend in Reader_T object
24669
24670    * block.c, block.h: Removed unnecessary field
24671
2005-05-03  twu
24673
24674    * gmap.c, gmapindex.c, indexdb.c, indexdb.h: Allowed indexing interval of
24675      12-mers to be specified at run time
24676
24677    * configure.ac: Added check for madvise function
24678
24679    * README: Added Ensembl format as a recognized coordinate format
24680
24681    * md_coords.pl.in: Improved prompt for alternate chromosomes
24682
24683    * genome.c, iit-read.c, indexdb.c: Put compiler flags around madvise
24684
24685    * datadir.c: Deleted line that was causing problems when the GMAPDB
24686      environment variable was set
24687
2005-05-01  twu
24689
24690    * fa_coords.pl.in: Further fixed coordinates
24691
24692    * fa_coords.pl.in: Removed addition of 1 to coordinates.  Added parsing for
24693      Ensembl format.
24694
24695    * gmap_setup.pl.in: Testing accessions with and without version numbers
24696
24697    * md_coords.pl.in: Making -U and -A flags standard.  Can exclude unmapped
24698      contigs and alternate chromosomes with chrsubsets.
24699
24700    * md_coords.pl.in: Fixed case where direction eq "0".
24701
24702    * oligoindex.c: Modified memory allocation scheme to have a fixed block of
24703      memory that expands when necessary.
24704
24705    * iit_get.c: Added -A back to allowed flags.
24706
24707    * chrsubset.c: Added debug statements
24708
24709    * VERSION: Updated version
24710
2005-04-20  twu
24712
24713    * sequence.c: Kept poly-A and poly-T limits when specifying subsequences.
24714
24715    * pair.c: Added an exception handler.  Removed minor bug where first pair
24716      was handled twice.
24717
24718    * gmapindex.c: Allowed compress and uncompress routines to take a filename
24719      as an argument.  Added wraplength option for uncompress.
24720
24721    * gmap.c: Fixed bug in specifying wrong sequence length for computing
24722      chimeras. Removed limit on number of paths for finding chimeras.  Added
24723      exception handler.
24724
24725    * except.c: Modified behavior of exception handler
24726
24727    * genuncompress.c: Fixed problem if positions were greater than allowed for
24728      signed ints.
24729
24730    * compress.c, compress.h: Added wraplength option to Compress_uncompress.
24731
2005-04-19  twu
24733
24734    * sequence.c, stage1.c: Added checks for null before freeing memory.
24735
24736    * gmap.c: Made IIT_get return an array of ints, rather than an Intlist, to
24737      reduce repeated small memory allocations.  Placed a limit on npaths for
24738      finding chimeras.
24739
24740    * get-genome.c, iit-read.c, iit-read.h, iit_get.c, segmentpos.c, stage3.c:
24741      Made IIT_get return an array of ints, rather than an Intlist, to reduce
24742      repeated small memory allocations.
24743
24744    * mem.c: Added debugging statements.
24745
2005-04-18  twu
24747
24748    * dynprog.c: Added memory allocation routines in cases where problem size
24749      exceeds maxlength of Dynprog_T.  Removed unused code for affine gap
24750      penalties.
24751
2005-04-12  vivekr
24753
24754    * cvswrappers: Added binary extensions
24755
2005-03-11  twu
24757
24758    * gmap_setup.pl.in, md_coords.pl.in: Allowed for contigs to be reverse
24759      complement
24760
24761    * fa_coords.pl.in: Removed unused functions
24762
24763    * gmap.c, get-genome.c: Moved dump functions to get-genome
24764
24765    * segmentpos.c: Fixed bug when alternate strain contig exists but reference
24766      is to reference strain
24767
24768    * get-genome.c, iit-read.c, iit-read.h: Changed output of dump functions
24769
24770    * README: Added instructions for specifying reverse coordinates
24771
24772    * VERSION: Changed version number
24773
2005-03-09  twu
24775
24776    * gmapindex.c, iit-read.c: Now storing information about reverse
24777      complementing of contigs
24778
24779    * match.c, pair.c, pair.h, segmentpos.c, segmentpos.h, stage3.c: Limited
24780      printing of contigs to those that are relevant for a given strain.
24781
24782    * get-genome.c, gmap.c: Fixed bug when using the -R release flag.
24783
2005-03-08  twu
24785
    * get-genome.c: Changed default behavior to print just the reference strain.
      Added a flag to print all strains.
24788
2005-03-04  twu
24790
24791    * chrsubset.c: Fixed minor memory leak
24792
24793    * VERSION: Updated version
24794
24795    * README: Added explanation of chromosome subsets
24796
24797    * chrsubset.c: Changed Chrsubset_T object to be NULL when a blank list is
24798      read in .chrsubset file.
24799
24800    * gmap.c: Incorporated chrsubset.  Fixed printing of option flags.
24801
24802    * gmap_setup.pl.in: Added creation of chrsubset file
24803
24804    * whats_on: Changed directories where genomic maps are located
24805
24806    * Makefile.am, chrsubset.c, chrsubset.h, params.c, params.h, stage1.c,
24807      stage1.h: Added capability to search on chromosome subsets
24808
24809    * separator.h: Changed separator back to dashes
24810
24811    * iit-read.c: Changed format of dumping typestrings for .altstrain.type file.
24812
24813    * gmapindex.c: Added writing of .altstrain.type file.
24814
24815    * stage1.c: Removed unused code.  Using stage1size instead of INDEX1PART in
24816      some places.
24817
24818    * gmap.c: Added error message.
24819
24820    * datadir.c: Removed unused error message.
24821
2005-03-03  twu
24823
24824    * stage1.c: Introduced idea of dangling matches at ends, and using it to
24825      determine when to sample further at each end, and when to sample from the
24826      middle.
24827
2005-03-02  twu
24829
24830    * separator.h: Changed separator from dashes to dots.
24831
24832    * stage1.c: Fixed a bug in find_3prime_matches.  Changed sampling to avoid
24833      terminal sampling, and to redo sampling just before nskip is zero. This is
24834      done to avoid long computation times with terminal sampling on long cDNAs.
24835
2005-03-01  twu
24837
24838    * matchpair.c, matchpair.h: Added a boundmethod type.
24839
24840    * stage1.c: Added code for finding matches using triplets, but not using it.
24841      Removing terminal sampling, and performing a redo of last sampling instead.
24842
2005-02-18  twu
24844
24845    * params.c, params.h: Removed maxintronlen from the params structure.
24846
24847    * gmap.c: Increased default maxintronlen to 1.2M, and provided a flag to
24848      allow user to change this value.
24849
24850    * perl.m4, configure.ac: Changed name of macro
24851
24852    * configure.ac: Added check for Perl with needed modules.  Added warning
24853      messages to bottom of configure script.
24854
24855    * config.site: Added option for user to specify a value for PERL
24856
24857    * acinclude.m4: Added check for Perl with needed modules
24858
24859    * perl.m4: Added check for Perl with appropriate modules
24860
24861    * VERSION: Set version number
24862
24863    * COPYING, config.site: Changed wording slightly
24864
24865    * README: Removed optional comment after make check
24866
2005-02-17  twu
24868
24869    * datadir.c, gmap_setup.pl.in: Allowed subdirectory to be present in the -d
24870      flag
24871
24872    * config.site: Fixed advice on installing in build directory
24873
24874    * README: Fixed some textual errors
24875
2005-02-16  twu
24877
24878    * gmap_setup.pl.in: Modified instruction text
24879
24880    * md_coords.pl.in: Added guessing of columns
24881
24882    * genome.c, genuncompress.c, iit-read.c, indexdb.c: Added type cast to avoid
24883      compiler warnings for munmap.
24884
24885    * configure.ac: Removed capitalization
24886
24887    * VERSION: Updated version
24888
24889    * configure.ac: Capitalized message when compilation of pthreads fails
24890
24891    * Makefile.am: Added subdirectories
24892
24893    * iit_get.out.ok, iittest.iit.ok: Added okay files for IIT programs
24894
24895    * AUTHORS: Minor text change
24896
24897    * gmap_setup.pl.in: Changed usage statement
24898
24899    * acx_pthread.m4: Updated macro to latest version
24900
24901    * configure.ac: Added tests for IIT programs.  Changed call to ACX_PTHREAD.
24902
24903    * config.site.gne: Changed name from genomedir to gmapdb
24904
24905    * config.site: Added lines for PTHREAD_CFLAGS and PTHREAD_LIBS
24906
24907    * MAINTAINER: Added instructions for building .ok files for tests
24908
24909    * align.test.in, coords1.test.in, map.test.in, setup1.test.in,
24910      setup2.test.in: Added ${srcdir} where necessary to make distcheck happy
24911
24912    * Makefile.am, fa.iittest, iit.test.in, iit_dump.test.in, iit_get.test.in,
24913      iit_store.test.in: Added tests for IIT programs
24914
24915    * gmap.c: Changed ENABLE_PTHREADS to HAVE_PTHREAD.  Added reporting of
24916      features to version command.
24917
24918    * blackboard.c, reqpost.c: Changed ENABLE_PTHREADS to HAVE_PTHREAD
24919
24920    * iit_store.c: Changed flags and calling convention
24921
24922    * Makefile.am: Removed ENABLE_PTHREADS and POPT_LIBS.
24923
24924    * acinclude.m4, acx-pthread.m4, acx_pthread.m4: Changed name of file
24925
24926    * README: Completed instructions
24927
24928    * COPYING: Completed license terms
24929
24930    * acinclude.m4, config, acx-pthread.m4, expand.m4, mmap-flags.m4,
24931      pagesize.m4: Put m4 macros into separate files
24932
24933    * configure.ac: Commented out code for AC_PROG_LIBTOOL.  Added some compiler
24934      checks.
24935
2005-02-15  twu
24937
    * gmap_setup.pl.in: Removed IO::Dir.  Changed behavior if -I flag not given.
      Added -9 for debugging behavior.
24940
24941    * fa_coords.pl.in, md_coords.pl.in: Removed IO::Dir
24942
24943    * iit-read.h, iit-write.h: Fixed compiler complaint about double typedef for
24944      IIT_T
24945
24946    * iit-read.c: Fixed one-off problem with IIT_totallength.
24947
    * genome-write.c: Fixed monitoring statements.
24949
24950    * gmap.c: Put pthreads information in version text.
24951
24952    * gmapindex.c: Fixed problem in comparing an int (255) with EOF (-1) on some
24953      machines.
24954
24955    * Makefile.am, align.test.in, align.test.ok, coords1.test.in,
24956      coords1.test.ok, map.test.in, map.test.ok, setup.genomecomp.ok,
24957      setup.idxpositions.ok, setup1.test.in, setup2.test.in, ss.chr17test:
24958      Expanded test suite
24959
24960    * Makefile.am, ss.cdna, ss.chr17test, ss.her2: Initial addition to CVS
24961      repository.
24962
2005-02-14  twu
24964
24965    * gmap_setup.pl.in, md_coords.pl.in: Moved functionality to separate
24966      md_coords program
24967
24968    * Makefile.am: Added fa_coords program
24969
24970    * fa_coords.pl.in: Added file to CVS repository.
24971
24972    * block.c, block.h, compress.c, dynprog.c, dynprog.h, genome-write.c,
24973      iit-read.c, iit-write.c, indexdb.c, interval.c, intron.c, match.c,
24974      match.h, matchpair.c, matchpair.h, md5.c, md5.h, md5.t.c, oligo.c,
24975      oligo.h, pair.h, pairpool.c, pairpool.h, reader.c, request.c, result.h,
24976      segmentpos.c, segmentpos.h, smooth.c, smooth.h, stage1.c, stage3.c,
24977      stopwatch.c, translation.h: Cleaned up included headers
24978
24979    * table.c, table.h, tableint.c, tableint.h, chrom.c, chrom.h: Clarified
24980      meaning of unsigned type.
24981
24982    * reqpost.h: Using Blackboard_T in interface.
24983
24984    * oligo-count.c: Fixed call to Block_new.
24985
24986    * listdef.h: Added a define for T.
24987
24988    * iitdef.h: Moved typedef to iit-read.h and iit-write.h.
24989
24990    * iit_get.c: Removed popt library calls.
24991
24992    * iit-read.h, iit-write.h: Moved include of iitdef.h to .c files.
24993
24994    * get-genome.c: Using SEPARATOR now instead of DASH.
24995
24996    * datadir.c, datadir.h: Formatting changes.
24997
24998    * gmap.c, oligoindex.c, oligoindex.h, params.c, params.h, stage2.c,
24999      stage2.h: Moved get_mappings command to be in oligoindex.c.  Moved
25000      indexsize to be stored in Params_T.
25001
25002    * complement.c, complement.h, genome.c, pair.c, sequence.c, translation.c:
25003      Changed complement table to be a macro.
25004
25005    * blackboard.h: Added comments about include of reqpost.h.
25006
25007    * Makefile.am: Cleaned up source files needed for each binary.
25008
25009    * shortoligomer.h: Removed file.  Definition needed only by oligoindex.c.
25010
25011    * bigendian.h, genuncompress.c, iit-write.c, littleendian.h: Conditionally
25012      include littleendian.h.
25013
25014    * iit-read.h: Added function to compute total length.
25015
25016    * iit-read.c: Conditionally include littleendian.h.  Added function to
25017      compute total length.
25018
25019    * indexdb.h: Allow user to force building of positions file in file.
25020
25021    * indexdb.c: Conditionally include littleendian.h.  Allow user to force
25022      building of positions file in file.
25023
25024    * genome-write.c: Added explanation of file format.
25025
25026    * genome.c: Changed type from unsigned int to UINT4.  Conditionally include
25027      littleendian.h.
25028
25029    * compress.c, compress.h: Added ability to create genome file in memory, if
25030      enough is available. Changed type from unsigned int to UINT4.
25031
25032    * Makefile.am, genome-write.c, genome-write.h, gmapindex.c: Moved procedures
25033      for writing genome file to a new file.  Added ability to create genome
25034      file in memory, if enough is available.
25035
2005-02-10  twu
25037
25038    * iit_get.c: Added include for strings.h to handle rindex.
25039
25040    * bigendian.h, genome.c, genuncompress.c, indexdb.c, sequence.c: Added
25041      includes for stddef.h to handle size_t
25042
25043    * genome.c, genuncompress.c, iit-read.c, indexdb.c: Added check for
25044      HAVE_SYS_STAT_H
25045
25046    * gmap.c, gmapindex.c, oligo-count.c: Removed include of sys/stat.h
25047
25048    * iit-read.c: Commented out include of sys/param.h
25049
25050    * genome.c, indexdb.c: Commented out include of errno.h
25051
25052    * except.c: Removed code for mailing error messages to developer.
25053
25054    * genome.c, genuncompress.c, gmapindex.c, iit-read.c, iit_store.c,
25055      indexdb.c, md5-compute.c, stopwatch.c: Added checks for HAVE_UNISTD_H and
25056      HAVE_FCNTL_H.
25057
25058    * blackboard.c, compress.c, datadir.c, genome.c, genuncompress.c, gmap.c,
25059      gmapindex.c, iit-read.c, iit_store.c, indexdb.c, oligo-count.c, reqpost.c,
25060      stopwatch.c: Added check for HAVE_SYS_TYPES_H
25061
25062    * genome.c, genomicpos.c, iit-write.c, indexdb.c, match.c, md5.c,
25063      oligoindex.c, pair.c, sequence.c: Created separate macros for handling
25064      absence of memcpy and memmove.
25065
25066    * genome.c, genomicpos.c, iit-write.c, indexdb.c, match.c, md5.c,
25067      oligoindex.c, pair.c, sequence.c: Included macros for handling computers
25068      without memcpy or memmove.
25069
25070    * datadir.c: Included macros for handling computers without dirent.h.
25071
2005-02-10  jmurray
25073
25074    * cvswrappers: Added binary extensions
25075
2005-02-07  twu
25077
25078    * chimera.c, translation.c: Fixed rcsid lines
25079
25080    * Makefile.am, uinttable.c, uinttable.h: Removed files uinttable.c and
25081      uinttable.h
25082
25083    * bigendian.c: Added ending quotation mark to rcsid.
25084
25085    * bigendian.h, chimera.h, scores.h, separator.h: Added Id comment to
25086      beginning of header files.
25087
25088    * assert.c, assert.h, bigendian.c, bigendian.h, blackboard.c, blackboard.h,
25089      block.c, block.h, boyer-moore.c, chimera.c, chrnum.c, chrnum.h, chrom.c,
25090      chrom.h, complement.c, complement.h, compress.c, datadir.h, dynprog.c,
25091      dynprog.h, except.c, except.h, genome.c, genome.h, genomicpos.c,
25092      genomicpos.h, get-genome.c, gmap.c, gmapindex.c, iit-read.c, iit-read.h,
25093      iit-write.c, iit-write.h, iit_dump.c, iit_get.c, iit_store.c, indexdb.c,
25094      indexdb.h, interval.c, interval.h, intlist.c, intlist.h, intron.c,
25095      intron.h, list.c, list.h, match.c, match.h, matchpair.c, matchpair.h,
25096      md5-compute.c, md5.c, md5.h, mem.c, mem.h, mutation.c, mutation.h,
25097      oligo-count.c, oligo.c, oligo.h, oligoindex.c, oligoindex.h, pair.c,
25098      pair.h, pairpool.c, pairpool.h, params.c, params.h, reader.c, reader.h,
25099      reqpost.c, reqpost.h, request.c, request.h, result.c, result.h,
25100      segmentpos.c, segmentpos.h, sequence.c, sequence.h, smooth.c, smooth.h,
25101      stage1.c, stage1.h, stage2.c, stage2.h, stage3.c, stage3.h, stopwatch.c,
25102      stopwatch.h, table.c, table.h, tableint.c, tableint.h, translation.c,
25103      translation.h, uintlist.c, uintlist.h, uinttable.c, uinttable.h: Moved
25104      HAVE_CONFIG_H from .h file to .c file.
25105
25106    * datadir.c: Added check to see if closedir succeeded.
25107
25108    * Makefile.am: Augmented list of bin programs.
25109
25110    * get-genome.c, match.c, pair.c, pair.h, stage3.c, stage3.h: Changing
25111      variable names to genomesubdir, fileroot, and dbversion.
25112
25113    * gmap.c: Added -g flag.  Changing variable names to genomesubdir, fileroot,
25114      and dbversion.
25115
25116    * params.c, params.h: Made dbversion a static variable.
25117
25118    * genome.c, genome.h, indexdb.c, indexdb.h: Changing variable names to
25119      genomesubdir and fileroot.
25120
25121    * datadir.c, datadir.h: Now searching subdirectory to find name of fileroot,
25122      which can be different from subdirectory name.
25123
25124    * pair.c: Removed unnecessary math.h header.  Added initialization of donor
25125      and acceptor arrays.
25126
25127    * getopt.c: Removed internationalization code.
25128
25129    * gmap.c: Removed unnecessary math.h header.  Changed location of map
25130      directory for each genome.
25131
25132    * matchpair.c, oligoindex.c, segmentpos.c, smooth.c, stage3.c: Removed
25133      unnecessary math.h header.
25134
25135    * indexdb.c, indexdb.h: Allowed user to build positions file directly to
25136      disk, if sufficient memory is unavailable.
25137
25138    * mem.c, mem.h: Added procedures for allocating memory without throwing an
25139      exception.
25140
25141    * gmapindex.c: Changed flags.  Allowed user to build positions file directly
25142      to disk, if sufficient memory is unavailable.
25143
25144    * chrom.c: Eliminated printing of initial zero on non-numeric chromosomes.
25145
2005-02-03  twu
25147
25148    * gmap_setup.pl.in: Removed -R flag, and symbolic links.  Fixed problems
25149      with parsing unmapped contigs in seq_contig.md files.
25150
25151    * gmapindex.c: Added debugging statements.
25152
2005-01-27  twu
25154
25155    * config.site: Added warning about non-absolute paths.
25156
25157    * README: Added comments about downloading a genome database.
25158
25159    * Makefile.am: Added extra commands for "make distcheck" to be happy.
25160      Removed genome example.
25161
25162    * MAINTAINER: Added comment about --enable-fulldist
25163
2005-01-26  twu
25165
25166    * config.site, configure.ac, Makefile.am, datadir.c: Changed GENOMEDIR to
25167      GMAPDB.
25168
2005-01-25  twu
25170
25171    * ss.AA005326, ss.cdna: Changed name of example cDNA sequence.
25172
2005-01-24  twu
25174
25175    * MAINTAINER: Added recommended steps for creating a distribution.
25176
25177    * Makefile.am, chrnum.c, chrnum.h, chrom.c, chrom.h, genome.c, genome.h,
25178      get-genome.c, gmap.c, gmapindex.c, match.c, match.h, matchdef.h,
25179      matchpair.c, pair.c, pair.h, segmentpos.c, segmentpos.h, stage1.c,
25180      stage3.c, stage3.h: Made changes to allow chromosome names to be
25181      arbitrarily long
25182
25183    * gmap_setup.pl.in: Removed restriction on chromosome name length.  Stripped
25184      spaces from beginning and end of input.  Added step to create initial
25185      genomedir.
25186
25187    * config.site: Changed defaults in config.site.
25188
    * getopt.c, getopt.h, getopt1.c: Added GNU getopt_long function
25190
25191    * tests, ss.AA005326: Added test sequence.
25192
25193    * Makefile.am: Created Makefile.am in util subdirectory
25194
25195    * gmap_setup.pl.in: Fixed bug due to missing quotation mark
25196
25197    * configure.ac: Removed dependence upon popt library
25198
    * Makefile.am, get-genome.c, gmap.c: Added GNU getopt_long procedure
25200
25201    * README: Changed prerequisites.  Improved formatting.
25202
2005-01-23  twu
25204
25205    * gmap_setup.pl.in: Added procedures for handling UCSC genomes.
25206
25207    * iit_store.c: Using Tableint_T instead of Table_T for types.
25208
25209    * Makefile.am: Removed some unnecessary source files.
25210
25211    * configure.ac: Added ACX_EXPAND, turned off popt, and fixed problem when no
25212      threads compilation is possible.
25213
25214    * config.site.gne: Added comments for profiling and making .third file.
25215
25216    * acinclude.m4: Added macro for ACX_EXPAND.
25217
25218    * README: Added mention of examples and make check.
25219
25220    * Makefile.am: Added extra dist files for examples.
25221
    * oligoindex.c: Created a union type to make clear the possible storage of
      either a position or a pointer to an array of positions.
25224
25225    * datadir.c: Removed unused function.
25226
25227    * gmapindex.c, table.c, table.h, tableint.c, tableint.h, uinttable.c,
25228      uinttable.h: Added an end value to avoid problems when table length is 0.
25229
25230    * Makefile.am, tableint.c, tableint.h, uinttable.c, uinttable.h: Made
25231      specific table types.
25232
25233    * gmap.c: Removed duplicate getopt line.
25234
25235    * iit_get.c: Fixed compilation bug when popt not available.
25236
25237    * gmapindex.c: Used specific table types and keys/values functions.
25238
25239    * table.c, table.h: Made functions Table_keys and Table_values
25240
25241    * gmap_uncompress.pl.in: Using BINDIR for substitution.
25242
25243    * Makefile.am: Removed Makefile.am
25244
25245    * gmap_setup.pl.in: Major changes made to provide both interactive and
25246      command-line use.
25247
252482005-01-22  twu
25249
25250    * configure.ac: Allowed hyphens to be in the version number
25251
25252    * MAINTAINER, bootstrap, config.site, config.site.gne: Added local
25253      config.site to CVS directory
25254
25255    * MAINTAINER: Added notes for maintainer
25256
25257    * README: Simplifying the installation instructions
25258
25259    * configure.ac: Made configuration easier by adding VERSION and config.site
25260      files. Removed MAPDIR.  Added Perl scripts.
25261
25262    * VERSION, config.site: Made configuration easier by adding VERSION and
25263      config.site files.
25264
25265    * gmap_compress.pl.in, gmap_uncompress.pl.in: Changed file from .pl version
25266      to .pl.in version.
25267
25268    * Makefile.am: Moved Perl scripts to util subdirectory.
25269
25270    * datadir.c, datadir.h, gmap.c: Moved map files to a subdirectory in genome
25271      directory.
25272
25273    * gmapsetup.pl.in: Moved file to util subdirectory.
25274
25275    * whats_on: Changed location of map files to be inside genome directories.
25276
25277    * gmap_compress.pl, gmap_uncompress.pl: Changing scripts from .pl to .pl.in
25278      version
25279
25280    * README, configure.ac, Makefile.am, compress.c, datadir.c, genome.c,
25281      get-genome.c, gmap.c, gmapsetup.pl.in, iit-read.c, indexdb.c,
25282      segmentpos.c, gmap_compress.pl, gmap_compress.pl.in, gmap_setup.pl.in,
25283      snap.c, snapbuild.pl.in, snapindex.c, snap_compress.pl,
25284      snap_uncompress.pl: Renamed program from snap to gmap
25285
25286    * gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl: Better handling
25287      of MD5 info and aa lines.
25288
25289    * gmap_uncompress.pl, gmap_uncompress.pl.in, snap_uncompress.pl: Handling
25290      arbitrary flags in compression.
25291
25292    * mutation.c, mutation.h, pair.c, pair.h, pairdef.h, translation.c: Added
25293      refquerypos to print nucleotide position of mutations.
25294
25295    * get-genome.c: Fixed problem with empty header for reference sequence when
25296      specific strain is requested.
25297
252982005-01-19  twu
25299
25300    * translation.c: Fixed problem with printing of an AA in an intron.
25301
25302    * mutation.c: Consolidated point mutations near a segmental mutation.
25303
253042005-01-06  twu
25305
25306    * translation.c: Fixed detection of deletion mutations where aapos was
25307      advancing in a gap.
25308
25309    * mutation.c, mutation.h, translation.c: Fixed cases where a single-position
25310      mutation was reported next to a segmental mutation.
25311
25312    * stage3.c: Added debugging statements for relative alignment.
25313
253142005-01-05  twu
25315
25316    * translation.c: Allowed lower case letters to translate appropriately to a
25317      codon.
25318
253192004-12-21  twu
25320
25321    * stage3.c: Performing microexon search for all defect rates.  Adjusted
25322      acceptable mismatches for low-quality sequences.
25323
253242004-12-20  twu
25325
25326    * translation.c: Increased IGNORE_MARGIN to deal with nucleotide coordinates.
25327
25328    * stage3.c: Changed criteria for performing microexon search.
25329
25330    * translation.c, translation.h: Fixed detection of large deletions relative
25331      to reference sequence. Fixed printing of cDNA aa in a gap.
25332
25333    * stage3.c: Changed criterion for starting microexon search to add
25334      mismatches and indels.  Fixed detection of large deletions relative to
25335      reference sequence.
25336
25337    * gmap.c, snap.c: Set chimera threshold to 0 for default.  Reduced band from
25338      10 to 7.
25339
25340    * dynprog.c: Reduced pvalue thresholds for microexons.
25341
253422004-12-19  twu
25343
25344    * gmap.c, snap.c: Turned on chimera functionality.  Increased dynamic
25345      programming band from 7 to 10.
25346
25347    * stage1.c: Changed function for maxintronlen.
25348
25349    * smooth.c: Increased SHORTMIDEXON_LEN from 40 to 80.
25350
25351    * dynprog.c: Removed definition for INFINITY, which wasn't being used.
25352
253532004-12-18  twu
25354
25355    * stage2.c: Created define parameter SAMPLE_INTERVAL.
25356
25357    * gmap.c, snap.c: Changed maxintronlen to be maxintronlen_bound, and now
25358      computing new maxintronlen depending on current query length.  Increased
25359      size of extraband_single and extraband_paired.
25360
25361    * params.c, params.h, stage1.c, stage1.h: Changed maxintronlen to be
25362      maxintronlen_bound, and now computing new maxintronlen depending on
25363      current query length.
25364
25365    * dynprog.c: Changed compute_scores_affine to have parameter list compatible
25366      with compute_scores (with codon penalty).
25367
25368    * stage3.c: Subtracting points for non-canonical introns in determining
25369      direction. Doing middle introns of sequence before doing 3' and 5' ends.
25370
25371    * dynprog.c: Increased pvalue thresholds.
25372
253732004-12-13  twu
25374
25375    * gmap.c, pair.c, pair.h, snap.c, stage3.h: Added function for printing cDNA
25376      exons.
25377
25378    * stage3.c: Fixed problem where single-gap dynamic programming was performed
25379      even when unable to peel forward and peel back.
25380
25381    * translation.c: Created separate mutation types for substitution,
25382      insertion, and deletion.  Allowed filling in of last amino acid.
25383
25384    * mutation.c, mutation.h: Created separate mutation types for substitution,
25385      insertion, and deletion.
25386
253872004-12-09  twu
25388
25389    * stage3.c: Changed calls to Translate module.
25390
25391    * translation.c: Simplified code for computing protein bounds.  Handled the
25392      case where full length is specified, but no full length protein exists.
25393
25394    * mutation.c, mutation.h: Added procedures for handling multiple insertions
25395      and deletions.
25396
25397    * translation.c, translation.h: Changed algorithm for translate_est_forward
25398      and translate_est_backward.
25399
254002004-12-08  twu
25401
25402    * translation.c, translation.h: Changed algorithms for translate_est_forward
25403      and translation_est_backward.  Added printing of nucleotide differences.
25404
25405    * gmap.c, snap.c, stage3.c, stage3.h: Added options for printing either
25406      genomic or cDNA version of protein.
25407
25408    * pair.c, pair.h: Added function Pair_dump_aapos.
25409
25410    * mutation.c, mutation.h: Added functions for retrieving amino acids from
25411      mutation.
25412
25413    * dynprog.c: Added slight penalty against gaps next to an intron.
25414
254152004-12-05  twu
25416
25417    * Makefile.am, mutation.c, mutation.h, stage3.c, translation.c,
25418      translation.h: Simplified computation of translations and mutations.
25419
25420    * pair.c, pair.h, pairdef.h, pairpool.c: Now printing both genomic and cDNA
25421      proteins.
25422
254232004-12-02  twu
25424
25425    * stage3.h: Removed unused chimera code.
25426
25427    * stage3.c: Removed unused chimera code.  Changed criteria for finding
25428      microexons at end; now performed only when extension is poor and sequence
25429      quality is high.
25430
25431    * dynprog.h, gmap.c, snap.c: Allowed user option to extend alignment past
25432      last match.
25433
25434    * dynprog.c: Fixed bug in adding gap to replace dashes.
25435
25436    * pairpool.c: Added debugging statement for creation of pairs.
25437
25438    * smooth.c: Added check for negative exon length.
25439
254402004-11-29  twu
25441
25442    * dynprog.c, dynprog.h: Added symbols for an intron if applicable to a large
25443      horizontal jump. Increased maximum microexon size.
25444
25445    * stage3.c: Added peel_back and peel_forward to 5' and 3' ends before doing
25446      search for microexons.
25447
254482004-11-22  twu
25449
25450    * scores.h, stage3.c: Added credit for dual half-canonical introns.
25451
25452    * stage1.c, stage1.h: Removed unused code.
25453
25454    * dynprog.c: Added parameters for PVALUE for microexon and end exon searches.
25455
254562004-11-18  twu
25457
25458    * result.h: Changed interface for Result_new to match implementation.
25459
25460    * Makefile.am: Added scores.h to Makefile.am.
25461
254622004-11-15  twu
25463
25464    * stage3.c, stage3.h: Commented out code for extending pairs in a chimera.
25465
25466    * gmap.c, snap.c: Fixed problem in rearranging best two paths for chimera.
25467
25468    * pair.c: Stopped printing of the terminal amino acid '*'.
25469
25470    * genome.c, indexdb.c: Added printing of a dot every 10000 pages.
25471
254722004-10-12  twu
25473
25474    * gmap.c, snap.c: For chimeras that extend too long, now chopping off the
25475      extra part.
25476
25477    * stage3.c, stage3.h: Added procedure for doing a bounded copy of a Stage 3
25478      object.
25479
25480    * pairpool.c, pairpool.h: Added procedure for doing a bounded copy of a path.
25481
254822004-10-06  twu
25483
25484    * Makefile.am, chimera.c, chimera.h, gmap.c, snap.c: Changed procedure for
25485      chimeras to find best pair and to order the chimeras according to query
25486      sequence.
25487
25488    * scores.h: Moved scores for determining goodness into a separate file.
25489
25490    * stage3.c, stage3.h: Added procedure for copying a Stage3 object.
25491
25492    * result.c, result.h: Changed chimera information to be a position, rather
25493      than a boolean.
25494
25495    * dynprog.c: Added code for allowing right angles, but not using at present.
25496
25497    * pairpool.c: Changed print statement to work only in debug mode.
25498
25499    * pair.c, pair.h: Added procedure for computing scores along a path.
25500
255012004-09-30  twu
25502
25503    * gmap.c, match.c, pair.c, pair.h, sequence.c, sequence.h, snap.c, stage3.c,
25504      stage3.h: Added MD5 checksum for compressed output.
25505
25506    * stage1.c: Added notation about using position for revcomp matches in IITs.
25507
25508    * iit-read.c: Changed debugging statements to print unsigned ints.
25509
255102004-09-27  twu
25511
25512    * stage3.c: Fixed problem when peeling an extra pair if it's a gap.
25513
255142004-09-09  twu
25515
25516    * stage3.c: Restored behavior of crossing just one short exon for dual
25517      genome gap.
25518
25519    * stage3.c: Peeled back one more matching pair.  For dual intron gap, now
25520      skipping multiple short exons and keeping the longest one.
25521
25522    * gmap.c, snap.c: Increased maxpeelback from 10 to 11.
25523
25524    * pairpool.c, pairpool.h: Added command Pairpool_transfer_copy, although not
25525      currently used.
25526
25527    * pair.c, pair.h: Added command Pair_check_list.
25528
25529    * dynprog.c: Added end reward for bridging a cDNA gap.
25530
25531    * md5-compute.c: Changed behavior from a single sequence to a FASTA file of
25532      multiple sequences.
25533
25534    * Makefile.am: Added object file for md5-compute.
25535
255362004-09-02  twu
25537
25538    * stage3.c: Fixed floating exception bug when middle_exonlength is
25539      non-positive.
25540
25541    * stage2.c: Fixed problem of reading uninitialized value.
25542
255432004-08-30  twu
25544
25545    * dynprog.c: Added check for non-positive span.
25546
255472004-07-28  twu
25548
25549    * stage2.c: Changed some penalties.  Using bad sequence information to
25550      increase lookback.
25551
25552    * oligoindex.c, oligoindex.h: Added check for bad sequences (with several
25553      non-ACGTN characters).
25554
25555    * dynprog.c: Added check for zero span.
25556
255572004-06-25  twu
25558
25559    * gmap.c, snap.c: Added flag to search only reference strain.
25560
25561    * stage2.c: Increased definition of ENOUGH_CONSECUTIVE.  Added penalties for
25562      deadp.
25563
25564    * stage3.c: Penalizing noncanonical introns in comparing across different
25565      paths.
25566
25567    * segmentpos.c: Changed output of contig length.
25568
25569    * pair.c, pair.h: Reporting number of noncanonical introns.  Allowing
25570      goodness to be reported during debugging.
25571
255722004-06-23  jtang
25573
25574    * cvswrappers: Added binary extensions
25575
255762004-06-20  twu
25577
25578    * stage2.c: Simplified decision making for mismatch gaps.  Increased
25579      penalties on gendistance and querydistance.
25580
255812004-06-16  twu
25582
25583    * whats_on: Added get_sequences function.
25584
25585    * gmap_uncompress.pl, gmap_uncompress.pl.in, snap_uncompress.pl: Implemented
25586      code to interpret new compression scheme.
25587
25588    * stage2.c: Changed penalty functions into macros for speed.  Made some
25589      other changes to improve speed.
25590
25591    * dynprog.c: Fixed bug involving uninitialized variable.
25592
255932004-06-15  twu
25594
25595    * stage2.c: Penalizing intron length per 2000 nt instead of 1000.
25596
25597    * stage3.c: Using goodness scores to decide between single and dual introns.
25598      Searching for microexons only when sequence quality is medium to high.
25599
25600    * dynprog.c, dynprog.h: Reporting nopens and nindels from Dynprog_genome_gap.
25601
256022004-06-12  twu
25603
25604    * stage3.c: Added evaluation of middle exon length in deciding between
25605      single and dual introns.
25606
25607    * stage2.c: Reduced size of INTRON_DEFN.  Further penalized large query
25608      distances.
25609
25610    * stage1.c: Increased length of additional ends of genome segment.
25611
25612    * oligoindex.c: Improved debugging statements.
25613
25614    * dynprog.c, dynprog.h: Allowing up to 1 mismatch on either side for
25615      microexon search. Reporting position of exonhead in Dynprog_genome_gap for
25616      use in traversing dual genome gap.
25617
256182004-06-10  twu
25619
25620    * stage2.c: Added procedure to determine maximum intron length at a given
25621      querypos, and made penalties linear in that length.
25622
25623    * smooth.c: Added check for nullness of intronlengths.
25624
25625    * dynprog.c: Added probabilistic check on microexon length for a given
25626      genomic span.
25627
256282004-06-09  twu
25629
25630    * pair.c, pair.h, stage3.c: Keeping track of semicanonical introns and
25631      scoring them to decide on strand.
25632
25633    * smooth.c: Made decision about deleting end exons based on probability.
25634
256352004-06-08  twu
25636
25637    * get-genome.c: Fixed a bug in determining whether a query is a range.
25638
256392004-06-07  twu
25640
25641    * stage3.c: Added debugging statement for microexons.
25642
25643    * dynprog.c: Added minimum length for introns when looking for microexons.
25644
25645    * stage2.c: Changed penalties to be more consistent on mismatches between
25646      different conditions, including deadp.  For deadp, now requiring that
25647      abs(gendistance - querydistance) or querydistance be less than INTRON_DEFN.
25648
25649    * smooth.c: Increased threshold on ends to be 20.
25650
25651    * pair.c: For determining fracidentity (and selecting between forward and
25652      reverse strands), now counting semicanonical introns as canonical.
25653
25654    * stage2.c: For deadp, increased lookback.
25655
25656    * gmap.c, snap.c: Increased maxintronlen from 1 million bp to 2 million bp.
25657      Motivated by HER4 (NM_005235).
25658
25659    * gmap.c, snap.c: Increased nullgap from 80 to 600.
25660
25661    * stage2.c: Modified stage 2 scoring for mismatch alignments.  Invoked deadp
25662      when fwd or rev score is zero.
25663
256642004-06-04  twu
25665
25666    * translation.c: Further fixed the bug involving uninitialized heap
25667      (translation_start/translation_end extending beyond sequence boundaries).
25668
25669    * stage2.c, stage2.h: Rewrote code into separate procedure.  Increased
25670      gendistance penalty. Changed penalties when querypos is dead.
25671
25672    * gmap.c, snap.c: Created separate parameters for extraband_end and
25673      extraband_paired. Renamed maxlookback to nullgap.  Created nsufflookback
25674      parameter. Removed repetition of stage 2.
25675
25676    * params.c, params.h, stage3.c, stage3.h: Created separate parameters for
25677      extraband_end and extraband_paired. Renamed maxlookback to nullgap.
25678      Created nsufflookback parameter.
25679
25680    * dynprog.h: Created separate parameters for extraband_end and
25681      extraband_paired.
25682
25683    * dynprog.c: Created separate parameters for extraband_end and
25684      extraband_paired. Extending last nucleotide at ends if possible.  Removing
25685      gaps at ends.
25686
25687    * translation.c: Fixed a bug involving reading/writing of uninitialized heap.
25688
256892004-06-02  twu
25690
25691    * stage3.c: Doubled intron space required for a paired gap solution to be
25692      attempted.
25693
25694    * dynprog.c: Implemented gap penalties that are non-affine, with extensions
25695      being the same within a codon.
25696
25697    * translation.c: Fixed bug where codon was assigned improperly at a cDNA gap.
25698
25699    * dynprog.c, dynprog.h, stage3.c: Added a conservative search for microexons
25700      at the 5' and 3' ends.
25701
25702    * smooth.c: Increased pruning of ends from 8 back to 16.
25703
257042004-06-01  twu
25705
25706    * stage2.c, stage2.h: Added Stage2_pathlength function.
25707
25708    * gmap.c, snap.c: Increased maxpeelback from 8 to 10 and allowed program to
25709      redo stage2 with increased suflookback if cDNA not covered.
25710
25711    * stage3.h: Changed MININTRONLEN from 9 to 6 and moved definition into .c
25712      file.
25713
25714    * stage3.c: Made search for microexon dependent on number of mismatches.
25715
25716    * dynprog.c, dynprog.h: Made Dynprog_genome_gap return number of matches and
25717      mismatches.
25718
257192004-05-26  twu
25720
25721    * dynprog.c, dynprog.h, stage3.c: Made microexon search work in reverse
25722      direction.  Fixed memory leak.
25723
25724    * boyer-moore.h: Added RCS Id.
25725
25726    * boyer-moore.c: Removed debugging statement.  Added RCS Id.
25727
25728    * Makefile.am, boyer-moore.c, boyer-moore.h, dynprog.c, dynprog.h, stage3.c:
25729      Added procedure for finding microexons.  Works for forward direction only.
25730
25731    * stage3.c: Increased goodness score for canonical intron when deciding
25732      between forward and reverse directions.
25733
25734    * sequence.c: Fixed read procedure to handle PC line feeds.
25735
25736    * dynprog.c: Changed end extension to allow one gap and to proceed if number
25737      of matches is greater than or equal to number of mismatches.
25738
25739    * gmap.c, snap.c, stage3.c, stage3.h, translation.c, translation.h: Added
25740      option for assuming a full-length sequence.
25741
25742    * pair.c: Changed printing of protein coordinates to correspond to first
25743      amino acid on each line.
25744
25745    * gmap.c, snap.c: Added ability to print protein sequence.  Fixed some flags.
25746
25747    * translation.c, translation.h: Fixed calculation of translation coordinates.
25748
25749    * pair.c, pair.h, stage3.c, stage3.h: Added printing of protein coordinates.
25750
25751    * indexdb.c: Revised monitoring statement.
25752
25753    * dynprog.c: Revised extensions of 5' and 3' ends to use best score with no
25754      gap, even if negative.  This extends ends when there is one match and one
25755      mismatch.
25756
257572004-05-14  vivekr
25758
25759    * cvswrappers: Added binary extensions
25760
257612004-05-05  twu
25762
25763    * compress.c, gmap.c, indexdb.c, pair.c, snap.c, stage3.c, stage3.h,
25764      translation.c, translation.h: Made improvements to relative translation
25765      routines.
25766
25767    * Makefile.am, compress.c, compress.h, gmapindex.c, indexdb.c, snapindex.c:
25768      Moved compress and uncompress routines to a new file.
25769
257702004-04-22  twu
25771
25772    * translation.c: Fixed frameshift-tolerant protein computation for cases
25773      where cDNA deletion is 3 or more nt.
25774
25775    * gmap.c, snap.c, stage3.c, stage3.h, translation.c, translation.h: Added
25776      feature for fixing frameshifts in reference-based protein computation and
25777      made it the default.
25778
25779    * stage3.c, translation.c, translation.h: Changed internal data format for
25780      calculating translations.
25781
25782    * translation.c: Fixed array bounds bug in translating from reference.
25783
25784    * whats_on: Added flag for showing original headers.
25785
257862004-04-21  twu
25787
25788    * gmap.c, pair.c, pair.h, pairdef.h, pairpool.c, params.c, params.h, snap.c,
25789      stage3.c, stage3.h, translation.c, translation.h: Added protein
25790      calculation for ESTs based on a reference mRNA.
25791
257922004-04-19  twu
25793
25794    * gmap.c, pair.c, pair.h, params.c, params.h, snap.c, stage3.c, stage3.h,
25795      translation.c, translation.h: Changes to allow calculation of mutation
25796      effect given a specific mutation
25797
257982004-04-18  twu
25799
25800    * genome.c, genome.h: Fixed code for patching strains.
25801
25802    * genomicpos.c: Cleaned up code for adding commas.
25803
25804    * stage1.c: Changed variable name from stutter to stutterdist.
25805
25806    * gmap.c, snap.c: Added internal flag to control strain searching feature.
25807
258082004-04-17  twu
25809
25810    * whats_on: Added ability to print align.iit files, rather than map.iit
25811      files.
25812
25813    * gmap_uncompress.pl, gmap_uncompress.pl.in, snap_uncompress.pl: Added
25814      inversion mode.
25815
25816    * gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl: Added code to
25817      skip protein sequence lines.
25818
258192004-03-30  twu
25820
25821    * Makefile.am, gmap.c, snap.c: Added routines for printing protein sequence.
25822
25823    * iit-read.c, iit-read.h: Added procedure for listing all types.
25824
25825    * stage3.c, stage3.h: Fixed memory leak and bug when stage3 result is NULL.
25826
25827    * translation.c, translation.h: Added routines for printing peptide sequence.
25828
25829    * get-genome.c: Allowed user to select a particular strain to align against.
25830
25831    * pair.c, pair.h: Added routine for printing peptide.  Clarified code for
25832      handling inversions on minus strand.  Fixed bug in compression for '#'
25833      character.
25834
258352004-02-23  twu
25836
25837    * stage3.c: Added special cases for single mismatch and single cDNA
25838      insertion.
25839
25840    * stage1.c: Defined maximum on finding match pairs, to eliminate slow
25841      response on nonsense sequences, such as poly-G.
25842
25843    * stage3.c: In pass 3, force single gap to be crossed even if finalscore is
25844      negative, to complete alignment.
25845
258462004-02-19  twu
25847
25848    * stage3.c, stage3.h: Removed unused variable minendtrigger.
25849
25850    * gmap.c, params.c, params.h, snap.c: Removed global user-specified
25851      parameters from Params_T.
25852
25853    * gmap.c, snap.c: Removed fraction_threshold parameter.  Changed default
25854      chimera_threshold to 0.50.
25855
25856    * params.c, params.h: Removed fraction_threshold parameter.
25857
258582004-02-18  twu
25859
25860    * gmap.c, snap.c: Made chimera threshold definable by user, and set default
25861      to 0.70.
25862
25863    * stage3.c: Re-defined criterion for a gap to be when queryjump <= 0 and
25864      genomejump <= 0, which holds true after single gaps are filled. Prevented
25865      filling in a genome gap when its alignment score is negative.
25866
25867    * smooth.c, smooth.h: Re-defined criterion for a gap to be when queryjump <=
25868      0 and genomejump <= 0, which holds true after single gaps are filled.
25869
25870    * matchpair.c, matchpair.h: Made size bound a fraction of the best, rather
25871      than subtraction.
25872
25873    * stage1.c: Increased maxentries parameters.  Made size bound a fraction of
25874      the best, rather than subtraction.
25875
25876    * gmap.c, snap.c: Removed calls to Stage1_matchpairlist.  Now performing
25877      sampling by default.
25878
25879    * smooth.c: Reduced size definition of short intron.  Made intron definition
25880      depend only on genome distance, which now includes single gaps that
25881      weren't filled in in stage 3.
25882
25883    * stage2.c: Made a macro for query distance penalty.
25884
25885    * stage3.c: Fixed problem when peeled is NULL.  Added decision to not fill
25886      in single gap if the score is negative, and to restore peeled pairs in
25887      that case.
25888
25889    * smooth.c: Fixed memory leak.
25890
258912004-02-17  twu
25892
25893    * stage3.c: Fixed bug where program would skip over a pair after gappairs
25894      was added.
25895
25896    * stage3.c: Made peel_back and peel_forward end at a non-gap, by
25897      backtracking from peeled.
25898
258992004-02-16  twu
25900
25901    * stage3.c: Increased reward for canonical introns from 5 to 8.
25902
259032004-02-15  twu
25904
25905    * stage3.c: Fixed bugs in peel_back and peel_forward.  Fixed bug in
25906      computing goodness scores.
25907
25908    * smooth.c, smooth.h, stage3.c: Giving information about number of short
25909      exons found in smoothing to stage 3 to help improve speed.
25910
25911    * stage3.c: Removed occurrences of indexsize.  Cleaned up procedure for
25912      finding middle exons in dual intron procedure.
25913
25914    * smooth.c, smooth.h: Rewrote smoothing procedure to be cleaner.  Now
25915      analyzing both ends to prune short exons.
25916
25917    * pair.c, pair.h: Added Pair_debug_alignment procedure.
25918
259192004-02-14  twu
25920
25921    * smooth.c, smooth.h, stage2.c, stage3.c: Changed stage 2 to produce a
25922      nucleotide-based path, rather than 8-mer path.  Changed smoothing and
25923      stage 3 accordingly.  Made all intron distance penalties equal in stage 2.
25924
25925    * dynprog.c, dynprog.h, stage3.c: Rearranged stage 3 to solve dual introns
25926      before other introns and large gaps.  Performing smoothing iteratively
25927      with dual introns.
25928
25929    * stage2.c: Replaced calculations of gendistance_penalty with macros.
25930
25931    * smooth.c: Increased minexonlen for smoothing, because smoothing has been
25932      made iterative.
25933
25934    * pair.c: Using memcpy commands instead of copying individual fields.  Added
25935      diagnostic printing of short exons.
25936
25937    * Makefile.am, smooth.c, smooth.h, stage3.c: Added files smooth.c and
25938      smooth.h and moved Smooth_path there
25939
25940    * matchpair.c: Fixed memory leak.
25941
259422004-02-13  twu
25943
25944    * gmap.c, snap.c, stage2.c, stage2.h, stage3.c, stage3.h: Changed stage 2
25945      and stage 3 algorithms to interleave in the following order: dynamic
25946      programming on single gaps, then smoothing, then dynamic programming on
25947      ends and large gaps.  Allows dual intron algorithm to work even when
25948      middle exon has small mismatches or gaps.
25949
25950    * pair.c: Fixed merge_one_gap to handle user-selected ngap != 3.
25951
25952    * dynprog.c, dynprog.h: Moved definitions of defect rate boundaries to
25953      header file.
25954
25955    * stage2.c: Subtracting (querydistance+7)/8 on mismatches to penalize once
25956      per 8-mer.  Subtracting 1 for each intron to reduce number of introns,
25957      especially when a/1000 + b/1000 < (a+b)/1000.
25958
25959    * stage2.c: Added separate intron penalties for consistent, unknown, and
25960      inconsistent introns.  Increased lengths of short middle exons marked for
25961      dual genome gap.
25962
25963    * stage2.h, stage3.c: Added intron length to goodness score.
25964
25965    * stage2.c: Implemented two parallel computations in Stage 2 under forward
25966      and reverse assumptions.  Removed firstregion and lastregion computations
25967      from smoothing.
25968
259692004-02-12  twu
25970
25971    * gmap.c, snap.c: Removed universalp flag (-U).
25972
25973    * iit_get.c: Added annotation only mode (-A).
25974
259752004-02-11  twu
25976
25977    * whats_on: Simplified code greatly.
25978
25979    * gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl: Removed space
25980      before first token.  Fixed bug in reporting genomic exon length rather
25981      than cDNA exon length.
25982
25983    * dynprog.c, dynprog.h, stage3.c: Added computation of nonintronlen for
25984      goodness ranking.
25985
25986    * stage2.c: Introduced Link_T to hold dynamic programming data.
25987
25988    * stage2.c: Modified smoothing to have keep, delete, and mark options.
25989
25990    * stage2.c, stage2.h, stage3.c: Added category for introns of unknown
25991      direction.
25992
25993    * gmap.c, snap.c: Made default for chimerasearchp false again.  Added an
25994      automatic mode for chimera search if coverage is less than 50%.
25995
259962004-02-10  twu
25997
25998    * gmap.c, snap.c: Made chimera search the default.
25999
26000    * stage2.c: Added counts of forward, reverse, and non-canonical introns to
26001      the dynamic programming procedure, and used consistency in computing
26002      scores.
26003
26004    * stage3.c: Added debugging macros.
26005
26006    * stage2.c: Added intron penalty only for noncanonical introns.
26007
26008    * stage1.c: Added debugging statements.
26009
26010    * matchpair.c: Reduced MAXCANDIDATES from 30 to 10.
26011
26012    * dynprog.c, pair.c, pairdef.h: Added '~' character for non-canonical gaps
26013      converted to insertions, to avoid penalizing them as non-intron gaps.
26014
260152004-02-09  twu
26016
26017    * gmap.c, iit-read.c, iit-read.h, snap.c, stage3.c, stage3.h: Changed map
26018      output to include strand if both strands are requested.
26019
26020    * dynprog.c: Restored horizontal or vertical jump of 1 next to intron.
26021
26022    * datadir.c: Changed error message.
26023
26024    * stage2.c: Added penalty for number of non-canonical introns.  Now
26025      accumulating best score for introns, even if negative, and using that if
26026      no other score exceeds 0.
26027
26028    * iit-read.h: Added name to IIT structure.
26029
26030    * gmap.c, pair.c, pair.h, params.c, params.h, snap.c, stage3.h: Added
26031      compression feature.
26032
26033    * stage3.c: Added compression feature.  Added debug mode to show output from
26034      stage 2.
26035
26036    * gmapindex.c, snapindex.c, get-genome.c, iit-read.c, iit_dump.c, iit_get.c,
26037      iitdef.h: Added optional name to IIT structure.
26038
260392004-02-06  twu
26040
26041    * dynprog.c: Advanced counter within gaps to the next position.
26042
26043    * pair.c, pair.h, pairdef.h, pairpool.c: Added the shortexonp field for
26044      pairs.
26045
26046    * stage2.c: For smoothing of short exons, marking positions as short, rather
26047      than deleting them.  Increased length threshold for short exons, because
26048      we now have a mechanism for handling them well.
26049
26050    * gmap.c, snap.c: Added a dynprogM for handling short exons.
26051
26052    * stage3.c, stage3.h: Removed special procedure for dual genome gaps.
26053      Instead, comparing a single genome gap with two half genome gaps for short
26054      exons.
26055
26056    * dynprog.c: Removed special procedure for dual genome gaps.  Instead, for
26057      short exons, comparing a single genome gap with two half genome gaps.
26058
26059    * dynprog.c: Passing pointers to revsequence and revoffset from stage3 to
26060      dynprog procedures where appropriate.  Added preliminary code for dual
26061      genome gap.
26062
26063    * dynprog.h, stage3.c: Passing pointers to revsequence and revoffset from
26064      stage3 to dynprog procedures where appropriate.
26065
26066    * get-genome.c, gmap.c, pair.c, pair.h, params.c, params.h, sequence.c,
26067      sequence.h, snap.c, stage3.c, stage3.h: Added option for specifying wrap
26068      length.
26069
26070    * dynprog.c: Fixed problem with sequence being short by 1 nt in conversion
26071      of gap to insertion.
26072
26073    * dynprog.c: Convert short non-canonical introns into insertions.
26074
26075    * dynprog.c: Removed reverse_sequence and creation of reverse sequence.  Now
26076      using a boolean to determine whether to use negative indices.
26077
260782004-02-05  twu
26079
26080    * README, configure.ac, Makefile.am, datadir.c, datadir.h, gmap.c, params.c,
26081      params.h, snap.c, stage3.c, stage3.h: Changed references to "bounds" to
26082      "map".
26083
26084    * ddsgap2_compress.pl: Made much faster.
26085
26086    * get-genome.c: Fixed get-genome for reverse complement.  Added debugging
26087      statements.
26088
26089    * dynprog.c: Added specific constraints on whether to allow gaps adjacent to
26090      the intron, depending on sequence quality.
26091
260922004-02-03  twu
26093
26094    * dynprog.c, dynprog.h: Removed conservative option.  Added comments to
26095      explain rationale behind scoring scheme.
26096
26097    * gmap.c, params.c, params.h, snap.c, stage3.c, stage3.h: Removed
26098      conservative option.
26099
26100    * stage3.c: Removed peelback on sequence ends.  Continued peelback through
26101      small gaps and mismatches.  Included comp of '-' in pruning of gaps at end.
26102
26103    * iit-read.c: Added debugging code.
26104
26105    * genome.c: Fixed faulty reasoning when patch has expansion or contraction.
26106
26107    * dynprog.c: Raised penalties on paired gap alignment to prevent
26108      gap-match-gap being preferred to two mismatches.  Added checks to bridging
26109      across introns to prevent genomic insertion or more than one cDNA
26110      insertion.
26111
261122004-02-02  twu
26113
26114    * pairdef.h: Revised comment about definition of gapp.
26115
26116    * pair.c: Removed comment.
26117
26118    * dynprog.c: Fixed debugging statements for pairs pushed on horizontal or
26119      vertical moves.
26120
26121    * gmap.c, pair.c, pair.h, snap.c, stage3.c, stage3.h: Added printing of
26122      bounds information as a separate section.
26123
261242004-01-31  twu
26125
26126    * Makefile.am: Added uintlist.c and uintlist.h to source lists where
26127      necessary.
26128
26129    * gmapindex.c, snapindex.c: Made contig intervals inclusive.
26130
26131    * iit_get.c: Changed isnumber to isnumberp to avoid conflict on some Unix
26132      machines.
26133
26134    * iit_get.c: Handle case where strlen of annotation is 0.  Add carriage
26135      return after annotation if necessary.  If one numeric argument given, try
26136      as a label, then as a number.
26137
26138    * iit-read.c: Handle case where strlen of annotation is 0.
26139
26140    * genome.c, get-genome.c: Reverted to previous IIT format, where we don't
26141      store lengths explicitly.  For sequences, can determine actual length from
26142      annotation strlen.
26143
26144    * iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_store.c, interval.c,
26145      interval.h: Reverted to previous format, where we don't store lengths
26146      explicitly.
26147
26148    * iit_dump.c: Added warning if IIT_read fails.
26149
26150    * gmapindex.c, snapindex.c: Reverted to previous format, where we don't
26151      store lengths explicitly. For FASTA files, count sequence length and store
26152      as annotation in contig_iit.
26153
26154    * stage3.c: Added Pair_check procedure.
26155
26156    * dynprog.c: Fixed problem with dynamic programming not going back to
26157      beginning. Fixed bridging across cDNA gaps.
26158
26159    * datadir.c, datadir.h: Created two data directories, one for genome files
26160      and one for bounds files.
26161
26162    * pair.c, pair.h: Added Pair_check function.
26163
26164    * configure.ac, Makefile.am, gmap.c, snap.c: Created two data directories,
26165      one for genome files and one for batch files.
26166
261672004-01-27  twu
26168
26169    * dynprog.c: Reduced mismatch and gap penalties at ends to extend ends more
26170      completely.
26171
26172    * stage1.c: Increased length of very small sequences from 30 to 40.
26173
261742004-01-26  twu
26175
26176    * gmap.c, snap.c, stage1.h: Changed criterion for good alignment on short
26177      sequences to be based on coverage rather than percent identity.
26178
26179    * stage1.c: Sampling exhaustively on short sequences.
26180
26181    * stage2.c: Removed tiebreaker based on genomic distance.  Ignoring
26182      gendistance penalty if no better score can be found, which allows program
26183      to find distant 5' exons.
26184
26185    * pair.c, pairpool.c, stage3.c: Restored large gap and '#' character when
26186      queryjump exceeds maxlookback.
26187
26188    * match.c: Fixed bug where accessions were looked up on chromosomal
26189      coordinates instead of universal coordinates.
26190
26191    * Makefile.am, datadir.c, datadir.h, snapconfig.c: Removed snapconfig and
26192      run-time configuration of SNAP, which doesn't work on statically built
26193      binaries.
26194
261952004-01-23  twu
26196
26197    * gmap.c, snap.c: Updated print_usage statement for non-popt systems.
26198
26199    * snapconfig.c: Added a usage statement.
26200
26201    * iit_dump.c: Added a debug flag.
26202
26203    * iit-write.c: Writing out elements of structs individually, instead of
26204      depending on an fwrite of the struct.
26205
26206    * iit-read.c, iit-read.h: Fixed problem with Bigendian reads of iit files.
26207      Added IIT_debug function.
26208
26209    * Makefile.am: Provided different dist and nodist instructions depending on
26210      FULLDIST.
26211
26212    * stage1.c: Set maxentries during sampling to be 10 times that of scanning.
26213      Set stage1size for short sequences to be 12-mers for < 40 nt, and 18-mers
26214      for 40-80 nt.
26215
26216    * pair.c, pair.h, pairpool.c: Removed '#' as a character in alignment.
26217
26218    * dynprog.c, dynprog.h, stage3.c: Treated cDNA gaps (extra cDNA material) in
26219      a way analogous to genome gaps.
26220
26221    * get-genome.c: Changed name of function from isnumber to isnumberp to avoid
26222      name conflict with some systems (like MacOSX) that define isnumber in
26223      ctype.h.
26224
262252004-01-20  twu
26226
26227    * stage3.c: Fixed bug where dynamic programming of ends wouldn't go all the
26228      way to the end of the genomic segment.
26229
26230    * dynprog.c: Fixed debug statement.
26231
26232    * Makefile.am: Added file matchdef.h
26233
26234    * dynprog.c, dynprog.h, gmap.c, pair.c, pair.h, params.c, params.h, snap.c,
26235      stage3.c, stage3.h: Added parameter for length of intron gap shown.
26236
26237    * stage1.c: Added a second maxentries parameter to prevent slowness on long
26238      repeated inputs, like CA...CA.
26239
262402004-01-19  twu
26241
26242    * stage3.c: Allowed cDNA direction to be indeterminate.
26243
26244    * matchpair.c, stage1.c: Fixed clustering to work with minsize of 1.
26245
26246    * dynprog.c: Reduced points for match, which improves some alignments.
26247
262482004-01-16  twu
26249
26250    * gmap.c, params.c, params.h, snap.c, stage1.c, stage1.h: Removed nsamples
26251      as a global parameter.
26252
26253    * bootstrap, configure.ac, Makefile.am: Added libtool and
26254      --enable-static-linking feature.
26255
26256    * gmap.c, snap.c: Implemented incremental clustering based on progressively
26257      smaller sampling intervals.  Added ability to print alignment continuously.
26258
26259    * matchpair.c, matchpair.h, stage1.c: Implemented incremental clustering
26260      based on progressively smaller sampling intervals.
26261
26262    * match.c, matchdef.h: Moved structure definition to matchdef.h
26263
26264    * block.c, block.h, reader.c, reader.h: Added ability to reset ends of block.
26265
26266    * stage3.c, stage3.h: Added printing of number of unknowns and of gap
26267      openings in cDNA and genome.
26268
26269    * params.c, params.h: Added parameter for continuous output of alignment.
26270
26271    * pair.c, pair.h: Added output of number of unknowns.  Added procedure for
26272      continuous output of alignment.
26273
26274    * dynprog.c: Created different penalties for gaps in single and paired gaps.
26275
262762004-01-14  desany
26277
26278    * loginfo: Added Yan to e-mail notifications.
26279
26280    * loginfo: Finally figured out where to put the quote (I think).
26281
26282    * loginfo: e-mail command tweak
26283
26284    * loginfo: Tweaking the e-mail command.
26285
26286    * loginfo: Sending log messages to desany when cgh module updates are
26287      committed.
26288
262892004-01-14  twu
26290
26291    * configure.ac: Added feature for static linking.
26292
26293    * params.c, params.h: Using two parameters for stutter: stuttercycles and
26294      stutterhits.
26295
26296    * gmap.c, snap.c: Performing sampling only when necessary.  Using popt help
26297      when available.
26298
26299    * stage1.c, stage1.h: Performing sampling only when necessary.  Limiting
26300      size and changing parameters for bestlist.
26301
26302    * matchpair.c, matchpair.h: Eliminated unused code in filtering procedure.
26303
26304    * indexdb.c: Fixed fread_int to be fread_uint.
26305
26306    * iit-read.c: Added abort statement when more than one interval retrieved by
26307      IIT_get_one.
26308
26309    * get-genome.c: Fixed bug with accessing chromosome_iit after being freed.
26310      Using popt help when available.
26311
26312    * oligo.c, oligo.h: Added Oligo_skip function.
26313
26314    * block.c, block.h: Removed maxtries and added Block_skip.
26315
263162004-01-12  twu
26317
26318    * gmap.c, snap.c, stage1.c: Changed strategy to use clusters of matches,
26319      after first pair found.
26320
26321    * gmapindex.c, snapindex.c: Eliminated check for genome database in
26322      compression mode.
26323
26324    * stage2.c: Changed distance penalty to 1 point per 1000 nt.
26325
26326    * pair.c, pair.h, stage3.c: Keeping separate track of query indels and
26327      target indels.
26328
26329    * genome.c, genome.h, get-genome.c: Implemented check for gbufferlen when
26330      shifting old sequence.
26331
26332    * separator.h: Added file for separator information.
26333
263342004-01-09  twu
26335
26336    * Makefile.am, get-genome.c: Changed program to use chromosome_iit and
26337      contig_iit, rather than text files.
26338
26339    * genome.c: Fixed bug from call to madvise on NULL region.
26340
26341    * iit-read.c, iit-read.h: Added function IIT_read_linear.
26342
26343    * gmapindex.c, snapindex.c: Storing length in interval of contig_iit, rather
26344      than in annotation.
26345
26346    * stage1.c: Changed paired algorithm to use sum of reciprocals of number of
26347      hits.
26348
26349    * get-genome.c: Removed unnecessary decompression functions (now in
26350      genome.c).
26351
26352    * gmap.c, snap.c: Fixed bug where fraction_threshold was declared as int
26353      rather than double.
26354
26355    * stage1.c: Revised algorithm to count number of query hits on 5' and 3'
26356      ends.
26357
26358    * Makefile.am, datadir.c, datadir.h, get-genome.c, gmap.c, snap.c,
26359      snapconfig.c: Moved datadir functions to a separate file.
26360
26361    * gmapindex.c, snapindex.c: Changed format of text files .chromosome and
26362      .contig.
26363
263642004-01-08  twu
26365
26366    * genome.c, genome.h, get-genome.c, gmap.c, iit-write.c, iit-write.h,
26367      snap.c: Allowed genomic patches to be longer or shorter than their
26368      endpoints.
26369
26370    * gmapindex.c, snapindex.c: Allowed intervals to have length that is
26371      different from their endpoints.  Changed format for fasta file input to
26372      snapindex.
26373
26374    * iit_store.c, interval.c, interval.h, segmentpos.c, segmentpos.h: Allowed
26375      intervals to have length that is different from their endpoints.
26376
26377    * iit-read.c: Added carriage returns to annotations, if absent.
26378
263792004-01-07  twu
26380
26381    * gmap.c, params.c, params.h, snap.c: Made fraction_threshold a parameter.
26382
26383    * stage2.c: Changed calculation of penalty for large genome distances to be
26384      done only when necessary.
26385
26386    * snapconfig.c: Changed feedback message.
26387
26388    * genome.c, indexdb.c: Improved warning messages when memory mapping fails.
26389
263902004-01-05  twu
26391
26392    * snapdir.c: Changed name of snapdir to snapconfig.
26393
26394    * gmap.c, match.c, match.h, params.c, params.h, result.c, result.h, snap.c:
26395      Restored alignment using stage 1 only.
26396
26397    * stage1.c, stage1.h: Moved decision of stage1size and maxentries to here.
26398
26399    * genome.c: Added warning message if memory mapping of genome fails.
26400
26401    * genome.c: Restored batch memory mapping of genome.
26402
26403    * stage1.c: Greatly increased MAXENTRIES parameter.
26404
26405    * gmap.c, params.c, params.h, snap.c: Made stage1size dependent upon
26406      sequence length, with short sequences getting stage1size of 12.
26407
26408    * gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl, whats_on:
26409      Generalized parse for coordinate separator.
26410
26411    * get-genome.c: Restored -- as coordinate separator.
26412
264132003-12-19  twu
26414
26415    * gmap.c, sequence.c, sequence.h, snap.c, stage3.c, stage3.h: New approach
26416      to chimeras, involving a subsequence and new stage1 procedure.
26417
26418    * stage2.c: Added distance penalty for long introns.
26419
26420    * Makefile.am, pair.c, segmentpos.c: Included separator.h
26421
26422    * pair.c, segmentpos.c, segmentpos.h: Removed unnecessary parameters in
26423      Segmentpos_print_accessions.
26424
26425    * get-genome.c: Change in coordinate separator from -- to ..
26426
264272003-12-17  twu
26428
26429    * gmap.c, match.c, match.h, matchpair.c, matchpair.h, snap.c, stage1.c,
26430      stage1.h: Changed procedures for finding chimeras to try singlelist of the
26431      appropriate side.
26432
26433    * pair.c, segmentpos.c: Changing coordinate output from -- to ..
26434
26435    * stage3.c, stage3.h: Changed procedures for finding chimeras to try
26436      singlelist of the appropriate side.  Fixed bug in computing chimeric
26437      goodness.
26438
26439    * dynprog.c, dynprog.h: Provided separate parameters for ends, removed
26440      multiplicative reward, and changed all score calculations to be integers.
26441
264422003-12-16  twu
26443
26444    * matchpair.c, stage1.c, stage1.h: Fixed bug with position calculations on
26445      large chromosomes (> 2 Gig).
26446
26447    * gmap.c, matchpair.c, matchpair.h, snap.c, stage1.c, stage1.h: Based
26448      algorithm for finding extensions on 12-mers.
26449
26450    * chrnum.c, chrnum.h: Added function for computing chromosomal string and
26451      position from genomic position.
26452
264532003-12-15  twu
26454
26455    * gmap.c, matchpair.c, matchpair.h, params.c, params.h, snap.c: Made
26456      extension linear depending on query length.
26457
26458    * stage1.c, stage1.h: Made cluster list depend on size of largest cluster.
26459
264602003-12-14  twu
26461
26462    * stage2.c: Added a minimum exon length for ends during smoothing.
26463
26464    * stage1.c, stage1.h: Added a last-resort procedure for trying all matches
26465      found in stage 1.  Enhanced debugging statements.
26466
26467    * gmap.c, snap.c: Added a last-resort procedure for trying all matches found
26468      in stage 1.
26469
26470    * oligoindex.c, shortoligomer.h: Returned to old method for store_positions,
26471      because it appears to be faster.
26472
26473    * genome.c, genome.h, get-genome.c: Enhanced debugging statements.
26474
26475    * matchpair.c: Added assertions about strands and relative position of
26476      matches.
26477
26478    * stage2.c: Returned to old method for store_positions.  Fixed smoothing for
26479      a single exon.
26480
264812003-12-13  twu
26482
26483    * oligoindex.c, shortoligomer.h, types.h: Further attempt to increase speed
26484      of store_positions.
26485
26486    * gmap.c, snap.c: Fixed memory leak when stage3array is recomputed.
26487
26488    * oligoindex.c, oligoindex.h, stage2.c: Increasing speed of store_positions
26489      by reducing number of calls to calloc.
26490
26491    * gmap.c, matchpair.c, matchpair.h, params.c, params.h, snap.c, stage1.c,
26492      stage1.h: Changed cluster algorithm to rank clusters based on size and
26493      process the top ones based on sum of sizes.
26494
264952003-12-12  twu
26496
26497    * genome.c: Added check for enddiscard being 0.
26498
26499    * stage2.c: Did an in-lining of intron_score.
26500
26501    * gmap.c, params.c, params.h, snap.c, stage1.c, stage1.h, stage3.c,
26502      stage3.h: Added new cluster algorithm for stage 1, used when paired
26503      algorithm fails to produce an alignment with high identity.
26504
26505    * gmap.c, snap.c: Added ability to modify binary file to include default
26506      genome directory.
26507
26508    * snapconfig.c, snapdir.c: Initial import into CVS.
26509
265102003-12-10  twu
26511
26512    * gmapindex.c, indexdb.c, indexdb.h, snapindex.c: Added ability to generate
26513      idxoffsets and idxpositions files from compressed genome.
26514
26515    * gmap.c, snap.c: Changed the uncompressed flag from -G to -g.
26516
26517    * gmapindex.c, snapindex.c: Implemented direct writing of compressed genome
26518      file.
26519
265202003-12-09  twu
26521
26522    * iit_store.c: Fixed bug where non-copied string is entered into table.
26523
26524    * iit_get.c: Improved error message.
26525
26526    * iit_dump.c: Added function for showing all types.
26527
26528    * table.c: Added debugging statements.
26529
26530    * gmap.c, params.c, params.h, snap.c: For user-provided segments, skipping
26531      stage 1 (although can be specified by the user), to achieve increased
26532      speed.
26533
26534    * sequence.c, sequence.h: Restored function Sequence_revcomp.
26535
265362003-12-04  twu
26537
26538    * stage1.c: Restored cluster algorithm for short sequences.
26539
26540    * gmap.c, snap.c: Generalized definition of chimera, and reduced percentage
26541      to 80%.
26542
265432003-12-03  twu
26544
26545    * Makefile.am, iit-read.c, iit-read.h, iit_get.c: Augmented iit_get to
26546      handle types and file input.
26547
26548    * gmap.c, intlist.c, intlist.h, sequence.c, sequence.h, snap.c: Allowed
26549      user-specified genomic segment to have arbitrary length.
26550
26551    * gmap.c, snap.c: Restored -U flag for reporting in universal coordinates.
26552
26553    * iit-read.c: Fixed bug in IIT_dump_formatted.
26554
26555    * Makefile.am, md5-compute.c: Added program md5-compute.
26556
265572003-12-01  twu
26558
26559    * gmap.c, params.c, params.h, snap.c: Added message to user when FASTA file
26560      is run without batch mode.
26561
265622003-11-28  twu
26563
26564    * oligo.c: Changed debug statements.
26565
26566    * reader.c: Cleaned up pointer calculation.
26567
26568    * sequence.h: Removed Sequence_revcomp, which is not used.
26569
26570    * sequence.c: Revised comments.
26571
26572    * stage2.c: In-lined gap_score.
26573
26574    * indexdb.c: More bug fixes for bigendian machines on user-provided segments.
26575
26576    * indexdb.c: Fixed a problem with bigendian machines for user-provided
26577      segments.
26578
26579    * gmap.c, snap.c: Added releasestring in attempt to find version file.
26580
26581    * genome.c, genome.h: Added option for replacing X's with N's.
26582
26583    * get-genome.c: Added option for replacing X's with N's.  Fixed bug when
26584      closing a null file pointer.
26585
26586    * iit_store.c: Append .iit to given filename, instead of replacing existing
26587      suffix.
26588
265892003-11-26  twu
26590
26591    * gmapindex.c, snapindex.c: Removed -U flag.
26592
26593    * gmapindex.c, indexdb.c, indexdb.h, snapindex.c: Reverted to using
26594      uncompressed genome for making idxoffsets and idxpositions.
26595
26596    * gmap.c, snap.c: Changed flag for uncompressed genome from -G to -U.
26597
26598    * gmapindex.c, indexdb.c, indexdb.h, snapindex.c: Attempted to build
26599      idxoffsets and idxpositions from genomecomp, but this has problems.
26600
26601    * genome.c: Added automated switching between compressed and uncompressed
26602      genome, if the requested one cannot be found.
26603
26604    * iit_store.c: Keeping last carriage return of annotation.
26605
26606    * iit_get.c: If iit file cannot be found, try appending .iit.
26607
26608    * gmapindex.c, snapindex.c: Finding labels in IIT directly instead of
26609      converting to a table.
26610
26611    * iit-read.c, iit-read.h, iit-write.c, iitdef.h: Changed IIT format to store
26612      alphabetic order of labels, so that labels can be found by binary search.
26613
266142003-11-25  twu
26615
26616    * genome.c, genome.h, gmap.c, pair.c, pair.h, params.c, params.h, snap.c,
26617      stage3.c, stage3.h: Added popt handling of options.  Renamed various
26618      program options.
26619
26620    * iit-read.c, iit-read.h, iit_get.c: Added ability to search IITs by label.
26621
26622    * get-genome.c: Changed usage statement for popt autohelp.
26623
26624    * Makefile.am: Changed name of variable to POPT_LIBS.
26625
26626    * acinclude.m4, configure.ac: Added AC_DEFINE for HAVE_LIBPOPT.  Set various
26627      defines to have value 1.
26628
26629    * gmapindex.c, iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_get.c,
26630      iit_store.c, iitdef.h, match.c, segmentpos.c, snapindex.c: Change made to
26631      format of IIT file.  Now allowing each interval to be labeled.
26632
26633    * indexdb.c: Fix made for the case where an oligomer earlier than TT...TT is
26634      the last one and points to totalcounts.
26635
266362003-11-24  twu
26637
26638    * gmap_compress.pl, gmap_compress.pl.in, snap_compress.pl: Added notation
26639      for chimeric sequences.
26640
26641    * acinclude.m4: Added check for MAP_FAILED.  Added sys/types.h when checking
26642      for pthreads (needed for Sun compiler).
26643
26644    * assert.h, bigendian.h, blackboard.h, block.h, chrnum.h, complement.h,
26645      dynprog.h, except.h, genome.h, genomicpos.h, iit-read.h, iit-write.c,
26646      iit-write.h, iit_dump.c, iit_get.c, iit_store.c, indexdb.h, interval.h,
26647      intlist.h, intron.h, list.h, match.h, matchpair.h, md5.h, mem.h, oligo.h,
26648      oligoindex.h, pair.h, pairpool.h, params.h, reader.h, reqpost.h,
26649      request.h, result.h, segmentpos.h, sequence.h, stage1.h, stage2.h,
26650      stage3.h, stopwatch.h, table.h, uintlist.h: Included config.h in all
26651      header files, to catch redefinition of const, which is needed for the Sun
26652      compiler.
26653
26654    * stage3.c: Commented out code that is never reached.
26655
26656    * genome.c, indexdb.c: Modified messages to stderr for batch mode.
26657
26658    * blackboard.c, gmap.c, reqpost.c, snap.c: Added sys/types.h to handle
26659      pthread_t, needed by Sun compiler.
26660
26661    * assert.c: Kept only the header file definition of assert, due to problem
26662      with Sun compiler.
26663
26664    * iit-read.c, table.c: For functions passed as arguments, added pointer and
26665      parentheses around parameter list.
26666
26667    * stage2.c: Changed some exon length parameters.
26668
266692003-11-19  twu
26670
26671    * gmap.c, snap.c, stage3.c, stage3.h: Added additional check for chimeras,
26672      based on top two hits.
26673
26674    * bigendian.c, indexdb.c: Moved masking to the logical or statements to
26675      address a bug on MacOSX.
26676
266772003-11-18  twu
26678
26679    * gmap.c, snap.c: Made directory searching process more flexible, by looking
26680      for version file at toplevel and subdirectory of datadir.
26681
26682    * genome.c, indexdb.c: Fixed calls to mmap and munmap when mmap fails.
26683      Moved stopwatch start before madvise command.
26684
26685    * bigendian.c, genome.c, indexdb.c: Added masks to chars when converting to
26686      an int or unsigned int, due to problem observed on DEC Alpha.
26687
26688    * genome.c, indexdb.c: Corrected conversion of littleendian to bigendian
26689      numbers.  Added lseek and read procedures when mmap is not present or
26690      fails.
26691
26692    * bigendian.c: Corrected conversion of littleendian to bigendian numbers.
26693
26694    * Makefile.am: Generate ChangeLog only when CVS directory present.
26695
266962003-11-17  twu
26697
26698    * Makefile.am: Used LDADD instructions to call libraries instead of LDFLAGS.
26699      (Required for program to load on SGI.)  Moved SCRIPTS under FULLDIST.
26700
26701    * configure.ac: Renamed POPT_LDFLAGS to POPT_LIBS.
26702
26703    * bootstrap: Added --copy flag to automake.
26704
26705    * Makefile.am: Added dist-hook to make ChangeLog up to date.
26706
26707    * config: Removed secondary config files generated by automake.
26708
26709    * gmapindex.c, snapindex.c: Fixed bug where X's were not being filled in,
26710      because variable declared as int, rather than unsigned int.
26711
26712    * block.c, block.h: Removed obsolete function.
26713
26714    * acinclude.m4: Moved to top-level directory.
26715
26716    * ChangeLog: Removed from repository.  Can be generated as needed.
26717
26718    * bootstrap: Added --add-missing flag.
26719
26720    * README: Added message about config.site.
26721
26722    * bootstrap: Initial import into CVS.  Added because autoreconf doesn't work
26723      with a config subdirectory.
26724
26725    * configure.ac: Made toplevel configure.ac work with a config subdirectory.
26726
26727    * gmap.c, snap.c, stage1.c, stage1.h: Changed algorithm to declare chimera
26728      only after alignment is done, and to use salvaged matches in that case.
26729
26730    * stage3.c, stage3.h: Stored genomicstart and genomicend as part of Stage3_T
26731      structure.
26732
26733    * ddsgap2_compress.pl: Initial import into CVS.
26734
26735    * whats_on, install-sh, missing, mkinstalldirs, sim4_compress.pl,
26736      sim4_uncompress.pl, snap_compress.pl, snap_uncompress.pl, snapbuild.pl.in,
26737      spidey_compress.pl: Moved to subdirectory.
26738
26739    * compile, config.guess, config.sub, depcomp: Removed secondary config files
26740      (generated by automake).
26741
26742    * Makefile.am: Adding top-level Makefile.am
26743
26744    * assert.c, assert.h, bigendian.c, bigendian.h, blackboard.c, blackboard.h,
26745      block.c, block.h, bool.h, chrnum.c, chrnum.h, complement.c, complement.h,
26746      dynprog.c, dynprog.h, except.c, except.h, genome.c, genome.h,
26747      genomicpos.c, genomicpos.h, genuncompress.c, get-genome.c, iit-read.c,
26748      iit-read.h, iit-write.c, iit-write.h, iit_dump.c, iit_get.c, iit_store.c,
26749      iitdef.h, indexdb.c, indexdb.h, interval.c, interval.h, intlist.c,
26750      intlist.h, intron.c, intron.h, list.c, list.h, listdef.h, match.c,
26751      match.h, matchpair.c, matchpair.h, md5.c, md5.h, md5.t.c, mem.c, mem.h,
26752      oligo-count.c, oligo.c, oligo.h, oligoindex.c, oligoindex.h, pair.c,
26753      pair.h, pairdef.h, pairpool.c, pairpool.h, params.c, params.h, reader.c,
26754      reader.h, reqpost.c, reqpost.h, request.c, request.h, result.c, result.h,
26755      segmentpos.c, segmentpos.h, sequence.c, sequence.h, shortoligomer.h,
26756      snap.c, snapindex.c, stage1.c, stage1.h, stage2.c, stage2.h, stage3.c,
26757      stage3.h, stopwatch.c, stopwatch.h, table.c, table.h, types.h, uintlist.c,
26758      uintlist.h: Moved source files to subdirectory.
26759
26760    * iit-read.c, iit-read.h: Added function IIT_get_typed.
26761
26762    * indexdb.c: Removed debugging message.
26763
26764    * snap.c, gmap.c, stage3.c, stage3.h: Improved determination of when an
26765      alternate strain applies, based on the aligned genomic segment.  Added
26766      strain type to sorting of results.
26767
26768    * stage1.c: Bypassing the cluster algorithm.
26769
26770    * snap.c, gmap.c: Added ability to determine datadir from environment
26771      variable or configuration file.
26772
26773    * get-genome.c: Added popt processing of command-line options.
26774
26775    * genome.c: Added bigendian conversions for compressed genome, which is
26776      memory mapped.
26777
26778    * configure.ac, Makefile.am: Added check for popt library.
26779
267802003-11-15  twu
26781
26782    * snapindex.c, gmapindex.c: Fixed pointer bug.
26783
26784    * stage2.c, stage2.h: Removed directional check on stage 2 smoothing.
26785      Introduced separate length criterion for first long exon.
26786
26787    * stage3.c, stage3.h: Implemented checks and procedures for chimeric
26788      sequences.  Removed directional check on stage 2 smoothing.
26789
26790    * pair.c, pair.h, result.c, result.h, snap.c, gmap.c, stage1.c, stage1.h:
26791      Implemented checks and procedures for chimeric sequences.
26792
26793    * genome.c: Changed debug statements from stderr to stdout.
26794
267952003-11-14  twu
26796
26797    * stage1.c: Changed identify_matches to assume the absence of duplicates.
26798
26799    * stage2.c: Changed criterion for short first and last exon during smoothing
26800      to be half of the corresponding region.
26801
26802    * stage3.c: Fixed debugging statements.
26803
26804    * snap.c, gmap.c, stage3.c, stage3.h: Fixed bug where a strain was falsely
26805      reported due to duplicate stage 3 objects and deletion of the one for the
26806      reference.
26807
26808    * sequence.c: Reduced poly-A tail left from 7 to 1.
26809
26810    * pair.c: Made print procedure backward compatible with old altstrain_iits.
26811
26812    * pair.c, params.c, params.h, snap.c, snapindex.c, gmap.c, gmapindex.c: Made
26813      changes to include name of reference strain.
26814
26815    * snap.c, gmap.c: Fixed typo in comment.
26816
26817    * iit-read.c: Fixed memory leak when altstrain_iit doesn't exist.
26818
26819    * Makefile.am: Integrated get-genome into snap code.
26820
26821    * get-genome.c, sequence.c, sequence.h: Major rewrite of get-genome, to
26822      integrate it into existing snap code.
26823
26824    * genome.c, genome.h, snap.c, gmap.c: Handled case where more than one patch
26825      from a given strain is applicable to a given genomic segment.
26826
26827    * intlist.c: Added check for null list in Intlist_to_array.
26828
26829    * indexdb.c: Changed idxpositions to eliminate duplicates during writing and
26830      to skip bad values during reading.
26831
268322003-11-13  twu
26833
26834    * snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Revised program
26835      to parse strain info.
26836
26837    * stage1.c: Added some comments.
26838
26839    * block.c, block.h, oligo-count.c, oligo.c, oligo.h, Makefile.am: Revised
26840      oligo-count to use the new code.
26841
26842    * Makefile.am: Added build for get-genome.
26843
26844    * get-genome.c: Major cleaning of code.  Added ability to read from
26845      compressed genome files.
26846
26847    * oligo-count.c: Initial import into CVS.  Dated 2003-07-16.
26848
26849    * genome.c, genome.h, genomicpos.c, genomicpos.h, matchpair.c, matchpair.h,
26850      pair.c, pair.h, snap.c, gmap.c, stage3.c, stage3.h: Added ability to align
26851      to multiple strains.
26852
26853    * stage1.c: Cleaned up some bugs on handling stutter.  Implemented check for
26854      duplicates in idxpositions.
26855
26856    * indexdb.c, indexdb.h, snapindex.c, gmapindex.c: Changed strategy for
26857      idxoffsets and idxpositions for strains.  Now storing the union of all
26858      strains.
26859
2003-11-12  twu
26861
26862    * snapbuild.pl.in, Makefile.am, gmapsetup.pl.in, gmap_setup.pl.in: Fixed
26863      procedure for making snapbuild.
26864
26865    * Makefile.am: Added procedure for making snapbuild script.
26866
26867    * configure.ac: Added feature for enabling full distribution.
26868
26869    * snapbuild.pl: Changed file from snapbuild.pl to snapbuild.pl.in.
26870
26871    * configure.ac, params.h, Makefile.am: Cleaned up specification of data
26872      directory and version file.
26873
26874    * params.c: Added provisions for reading altstrain IIT.
26875
26876    * snap.c, gmap.c: Cleaned up specification of data directory and version
26877      file.  Added provisions for reading altstrain IIT.
26878
26879    * snapindex.c, gmapindex.c: Fixed problem with slashes in alternate strain
26880      name.
26881
26882    * stage1.c: Cleaned up code for stage1.c.  Fixed memory leak for paired
26883      algorithm.  Added chromosomal constraint for cluster algorithm.
26884
2003-11-11  twu
26886
26887    * iit-read.c, iit-read.h, iit-write.c, iit-write.h, iit_dump.c, iit_get.c,
26888      iit_store.c, iitdef.h, indexdb.c, indexdb.h, interval.c, interval.h,
26889      segmentpos.c, segmentpos.h, snap.c, snapbuild.pl, snapbuild.pl.in,
26890      snapindex.c, Makefile.am, gmap.c, gmapindex.c, gmapsetup.pl.in,
26891      gmap_setup.pl.in: Changes made to introduce types into IITs, and to build
26892      SNAP databases with alternate strain information.
26893
26894    * match.c, match.h, stage1.c: Changes to stage 1 algorithm: (1) choice of 5'
26895      or 3' advancement based on number of hits, (2) stutter based on positions
26896      with hits, (3) computed fraction of paired hits on each end.
26897
2003-11-10  twu
26899
26900    * snapbuild.pl, snapbuild.pl.in, gmapsetup.pl.in, gmap_setup.pl.in: Initial
26901      import into CVS.
26902
26903    * Makefile.am: Added object files for bigendian.
26904
    * iit-read.c: Added header file for bigendian.h.
26906
26907    * bigendian.c, indexdb.c: Fixed problem in bigendian conversion.
26908
26909    * sequence.c: Fixed problem in handling sequence files without headers.
26910
26911    * iit-read.c: Changed most elements of IIT_T to be fread, rather than
26912      mmapped. Added code for program to work on bigendian architectures.
26913
26914    * indexdb.c: Changed offsets file to be fread, rather than mmapped.  Added
26915      code for program to work on bigendian architectures.
26916
26917    * iitdef.h: Added comments.
26918
26919    * bigendian.c, bigendian.h, configure.ac, genuncompress.c, iit-write.c,
26920      snapindex.c, Makefile.am, gmapindex.c: Added code for program to work on
26921      bigendian architectures.
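
      A minimal sketch of the kind of conversion such support needs,
      assuming the index files hold 32-bit values in little-endian byte
      order.  The byte order of the files and all function names here are
      assumptions for illustration; the entry does not describe the
      actual interface of bigendian.c.

```c
#include <stdint.h>

/* Reverses the byte order of a 32-bit value. */
uint32_t swap_uint32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}

/* Returns 1 on a big-endian host and 0 on a little-endian host, by
   inspecting the first byte of a known 16-bit value. */
int host_is_bigendian(void) {
    const uint16_t probe = 1;
    return *(const unsigned char *) &probe == 0;
}

/* Converts a value read verbatim from a little-endian file into the
   host's native byte order. */
uint32_t from_little_endian(uint32_t stored) {
    return host_is_bigendian() ? swap_uint32(stored) : stored;
}
```

      A build system can also make this decision at compile time rather
      than at run time; the run-time probe is used here only to keep the
      sketch self-contained.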
26922
2003-11-08  twu
26924
26925    * acinclude.m4, configure.ac: Made VERSION automatically equal the current
26926      date.
26927
26928    * Makefile.am: Removed reference to iit_convert.
26929
26930    * genome.c: Turned off batch loading of genome.
26931
26932    * sequence.c, snap.c, gmap.c: Rest of header printed in output.  Exceptional
26933      file terminations handled better.
26934
2003-11-07  twu
26936
26937    * params.c, params.h, snap.c, gmap.c, stage1.c, stage1.h: Added cluster
26938      algorithm for short query sequences.
26939
26940    * block.c, block.h, longoligomer.c, longoligomer.h, match.c, match.h,
26941      matchpair.c, matchpair.h, oligo.c, oligo.h, Makefile.am: Removed
26942      longoligomers.
26943
26944    * genome.c: Fixed print statement for batch mode.
26945
26946    * snap.c, gmap.c: Restored dump_segs functionality.
26947
26948    * snapindex.c, gmapindex.c: Changed name of table from chroffset to
26949      chrlength.
26950
26951    * iit-read.c, iit-read.h: Added function IIT_dump_formatted.
26952
2003-10-27  twu
26954
26955    * iit_get.c, iit_store.c: Removed carriage return at end of annotation.
26956
26957    * iit-read.c, iit-read.h, iit_dump.c, Makefile.am: Added a program for
26958      dumping IIT files.
26959
26960    * snapindex.c, gmapindex.c: Added better comments.
26961
26962    * whats_on: Fixed program to use new IIT file format.
26963
26964    * table.c: Removed assertion checks for key being non-zero, which doesn't
26965      work for a chromosome of 0.
26966
26967    * INSTALL: Copied generic installation instructions.
26968
26969    * COPYING: Created copyright notice.
26970
2003-10-25  twu
26972
26973    * iit-write.c: Made Node_make static.
26974
2003-10-24  twu
26976
26977    * indexdb.c: Fixed format of batch statement.
26978
26979    * iit-read.c, iit-write.c, iit_get.c, match.c, segmentpos.c, snapindex.c,
26980      gmapindex.c: Changed annotations in .iit files to have '\0' characters at
26981      the ends, so they can be used in the file, without copying.
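
      Because each annotation ends in '\0', a pointer into the file
      contents is already a valid C string and no copy is needed.  The
      layout below (annotations stored back to back, with an offsets
      array) and the name annotation_at are assumptions for illustration
      only.

```c
#include <stddef.h>

/* Hypothetical layout: all annotations are stored back to back in one
   block, each terminated by '\0', and offsets[i] is the byte offset at
   which annotation i starts.  The returned pointer refers directly to
   the block; nothing is allocated or copied. */
const char *annotation_at(const char *block, const size_t *offsets,
                          size_t i) {
    return block + offsets[i];
}
```

      The caller must keep the block (for example, a memory-mapped file)
      alive for as long as the returned pointer is in use.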
26982
2003-10-23  twu
26984
26985    * interval.c: Added comment about sorting procedures.
26986
26987    * iit_get.c, iit_store.c: Changed program to use the IIT implementation in
26988      this directory.
26989
26990    * iit-read.c: Added madvise command.
26991
26992    * genome.c, indexdb.c: Changed reporting of touching pages under batch mode.
26993
26994    * Makefile.am: Added iit_store and iit_get.
26995
26996    * genome.c, indexdb.c: Revised touching of pages for batch mode.
26997
26998    * assert.h, blackboard.h, block.h, bool.h, chrnum.h, complement.h,
26999      dynprog.h, except.h, genome.h, genomicpos.h, iit-read.h, iit-write.h,
27000      iitdef.h, indexdb.h, interval.h, intlist.h, intron.h, list.h, listdef.h,
27001      longoligomer.h, match.h, matchpair.h, md5.h, mem.h, oligo.h, oligoindex.h,
27002      pair.h, pairdef.h, pairpool.h, params.h, reader.h, reqpost.h, request.h,
27003      result.h, segmentpos.h, sequence.h, shortoligomer.h, stage1.h, stage2.h,
27004      stage3.h, stopwatch.h, table.h, types.h, uintlist.h: Added RCS Id string
27005      to header files.
27006
27007    * snap.c, gmap.c: Removed call to strdup.
27008
27009    * snapindex.c, gmapindex.c: Removed printing of superaccessions for NCBI
27010      genomes.
27011
27012    * segmentpos.c: Removed unused procedures based on Berkeley DB.
27013
27014    * chrnum.c: Fixed problem with numeric-alpha ordering of chromosomes.  XU
27015      now follows X and precedes Y.
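
      One ordering consistent with this entry is sketched below:
      chromosomes with a numeric prefix sort first by number, and the
      rest sort alphabetically, which places XU after X and before Y.
      The comparator chrom_cmp is hypothetical and is not the chrnum.c
      implementation.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator: returns <0, 0, or >0 like strcmp.
   Names starting with a digit ("2", "10", "2L") come before names
   starting with a letter ("X", "XU", "Y"). */
int chrom_cmp(const char *a, const char *b) {
    int a_num = isdigit((unsigned char) a[0]) != 0;
    int b_num = isdigit((unsigned char) b[0]) != 0;
    char *a_rest, *b_rest;
    long a_val, b_val;

    if (a_num && b_num) {
        /* Compare the numeric prefixes, then any alphabetic suffix. */
        a_val = strtol(a, &a_rest, 10);
        b_val = strtol(b, &b_rest, 10);
        if (a_val != b_val) {
            return (a_val < b_val) ? -1 : 1;
        }
        return strcmp(a_rest, b_rest);
    }
    if (a_num != b_num) {
        return a_num ? -1 : 1;
    }
    /* Both alphabetic: "X" is a prefix of "XU", so X < XU < Y. */
    return strcmp(a, b);
}
```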
27016
2003-10-22  twu
27018
27019    * acinclude.m4, genome.c, indexdb.c: Added macros to check for pagesize
27020      determination.
27021
27022    * config.h.in: Removed derived file.
27023
27024    * configure.ac: Cleaned up unnecessary autoconf macros.
27025
27026    * acinclude.m4, config.h.in, genome.c, genuncompress.c, iit-read.c,
27027      indexdb.c: Improved autoconf checks and header files for mmap.
27028
27029    * snapindex.c, gmapindex.c: Fixed problem with freeing memory.
27030
27031    * segmentpos.c: Fixed small error with printing accession bounds.
27032
27033    * chrnum.c, iit-read.c, iit-read.h, iit-write.c, iit-write.h, segmentpos.c,
27034      snap.c, snapindex.c, gmap.c, gmapindex.c: Fixed memory leaks.
27035
27036    * acinclude.m4, block.h, chrnum.c, chrnum.h, config.h.in, configure.ac,
27037      database.c, database.h, genomicpos.c, genomicpos.h, get-genome.c,
27038      iit-read.c, iit-read.h, iit-write.c, iit-write.h, interval.c, interval.h,
27039      match.c, match.h, offset.c, offset.h, offsetdb.c, offsetdb.h, oligo.c,
27040      oligo.h, pair.c, pair.h, params.c, params.h, segmentpos.c, segmentpos.h,
27041      sequence.c, snap.c, snapindex.c, Makefile.am, gmap.c, gmapindex.c,
27042      stage1.c, stage1.h, stage3.c, stage3.h, table.c, table.h: Eliminated
27043      dependence upon Berkeley DB.
27044
27045    * table.c, table.h: Initial import into CVS.
27046
2003-10-21  twu
27048
27049    * acinclude.m4, config.h.in, configure.ac, genome.c, genuncompress.c,
27050      iit-read.c, indexdb.c: Added checks for various mmap flags.
27051
    * iitdef.h: Restructured IIT_T commands.
27053
27054    * iit-read.c, iit-read.h, iit-write.c, iit-write.h, interval-read.c,
27055      interval-read.h, interval.c, interval.h, pair.c, snap.c, Makefile.am,
27056      gmap.c: Restructured Interval_T and IIT_T implementations so they don't
27057      depend on BerkeleyDB, and added ability to write IITs.
27058
27059    * acinclude.m4, database.c: Added provision for BerkeleyDB version 4.1.
27060
27061    * iit_store.c: Changed format of input file to have only intervals on the
27062      header line.
27063
27064    * iit_get.c: Changed program to use new IIT format.
27065
27066    * iit_store.c: Fixed problem with annotlist being reversed.
27067
27068    * iit_store.c: Changed format of iit file to include annotations.
27069
2003-10-20  twu
27071
27072    * sequence.c: Corrected type for return value of fgetc.
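
      The entry does not say which type was used before the fix.  The
      usual form of this bug is storing the result of fgetc in a char,
      which cannot represent EOF distinctly from a data byte.  A minimal
      sketch of the correct pattern, with a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in a stream. */
long count_bytes(FILE *fp) {
    int c;      /* int, not char: must hold EOF as well as 0..255 */
    long n = 0;

    while ((c = fgetc(fp)) != EOF) {
        n++;
    }
    return n;
}
```

      With a signed char, a data byte of 0xFF would compare equal to EOF
      and end the loop early; with an unsigned char, the comparison with
      EOF would never succeed.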
27073
27074    * oligo.c: Corrected type for return value of Reader_getc.
27075
27076    * stage1.h: Removed db.h as an included header.
27077
27078    * acinclude.m4: Added -rpath flag during linking of Berkeley DB.
27079
27080    * Makefile.in, configure: Removing from CVS.
27081
27082    * Makefile.in, configure: Result of autoreconf.
27083
27084    * Makefile.am: Added header files to SOURCES.
27085
27086    * configure.ac: Added no-dependencies option.
27087
27088    * iit-read.c: Removed MAP_VARIABLE from mmap call, because not recognized by
27089      Linux.
27090
27091    * sequence.c: Renamed variable strlen to avoid compiler error on Linux.
27092
27093    * Makefile.in: Added various auxiliary files.
27094
27095    * Makefile.in, compile, config.guess, config.sub, depcomp, install-sh,
27096      configure: Initial import into CVS.
27097
27098    * missing, mkinstalldirs: Provided updated version.
27099
27100    * genome.c, genomicpos.c, iit-read.c, iit-read.h, indexdb.c, intlist.c,
27101      intlist.h, match.c, md5.c, mem.c, offset.c, offsetdb.c, oligoindex.c,
27102      pair.c, segmentpos.c, segmentpos.h, Makefile.am, uintlist.c, uintlist.h:
27103      Addressed compiler warnings from gcc.
27104
2003-10-19  twu
27106
27107    * acinclude.m4, blackboard.c, configure.ac, reqpost.c, snap.c, Makefile.am,
27108      gmap.c: Allowed pthreads to be enabled or disabled.
27109
2003-10-18  twu
27111
27112    * assert.c, block.c, chrnum.c, complement.c, database.c, dynprog.c,
27113      except.c, genome.c, genomicpos.c, genuncompress.c, get-genome.c,
27114      iit-read.c, indexdb.c, interval-read.c, intron.c, list.c, longoligomer.c,
27115      match.c, matchpair.c, md5.c, offset.c, offsetdb.c, oligo.c, oligoindex.c,
27116      pair.c, pairpool.c, params.c, reader.c, reqpost.c, request.c, result.c,
27117      segmentpos.c, sequence.c, snap.c, snapindex.c, gmap.c, gmapindex.c,
      stage1.c, stage2.c, stage3.c, stopwatch.c: Added RCS Id string correctly.
27119
27120    * assert.c, block.c, chrnum.c, complement.c, database.c, dynprog.c,
27121      except.c, genome.c, genomicpos.c, genuncompress.c, get-genome.c,
27122      iit-read.c, indexdb.c, interval-read.c, intron.c, list.c, longoligomer.c,
27123      match.c, matchpair.c, md5.c, offset.c, offsetdb.c, oligo.c, oligoindex.c,
27124      pair.c, pairpool.c, params.c, reader.c, reqpost.c, request.c, result.c,
27125      segmentpos.c, sequence.c, snap.c, snapindex.c, gmap.c, gmapindex.c,
27126      stage1.c, stage2.c, stage3.c, stopwatch.c: Added rcsid strings.
27127
27128    * blackboard.c, block.c, complement.c, database.c, dynprog.c, except.c,
27129      iit-read.c, interval-read.c, intron.c, list.c, matchpair.c, md5.c, mem.c,
27130      mem.h, oligo.c, oligoindex.c, pair.c, pairpool.c, params.c, reader.c,
27131      reqpost.c, request.c, result.c, sequence.c, stage2.c, stopwatch.c:
27132      Rearranged header includes.
27133
27134    * longoligomer.h: Defined T for both cases of HAVE_64_BIT.
27135
27136    * longoligomer.c: Added conditional compiling based on HAVE_64_BIT.
27137
27138    * offset.h: Added necessary header file stdio.h.
27139
27140    * types.h: Added compiler directives from config.h.
27141
27142    * configure.ac: Initial changes to configure.scan to make autoconf and
27143      automake work for the cc compiler.
27144
27145    * Makefile: Removed Makefile from CVS, because it is now generated from
27146      Makefile.am by automake, and then from Makefile.in by configure.
27147
27148    * AUTHORS, COPYING, ChangeLog, INSTALL, NEWS, README, acinclude.m4, config,
27149      missing, mkinstalldirs, config.h.in, Makefile.am: Added files for autoconf
27150      and automake to work.
27151
27152    * configure.ac: Initial configure.ac from configure.scan produced by
27153      autoscan.
27154
2003-10-17  twu
27156
27157    * gencompress.c, snapindex.c, gmapindex.c: Moved gencompress function inside
27158      snapindex (previously in gencompress.c).
27159
27160    * segmentpos.c: Changed type of relstart and relend to int, due to problems
27161      with long.
27162
2003-10-16  twu
27164
27165    * dynprog.c: Removed splice-site.c.
27166
27167    * commafmt.c, commafmt.h, genomicpos.c, genomicpos.h, match.c, pair.c,
27168      segmentpos.c: Moved commafmt command to genomicpos.c.
27169
27170    * types.h: Defined UINT8 only if HAVE_64_BIT is defined.
27171
27172    * splice-site.c, splice-site.h: Removed splice-site.c from CVS.
27173
27174    * readcirc.c, readcirc.h: Removing readcirc from CVS.
27175
27176    * radixsort.c, radixsort.h: Removing radixsort from CVS.
27177
27178    * boyer-moore.c, boyer-moore.h: Removed Boyer-Moore procedures from CVS.
27179
27180    * longoligomer.h: Introduced constants and procedures for Longoligomer_T on
27181      32-bit systems.
27182
27183    * snapindex.c, gmapindex.c: Changed output type of write_genome_file.
27184
27185    * indexdb.c, indexdb.h: Introduced Storedoligomer_T.
27186
27187    * iit-read.c: Added type cast from void * to char *.
27188
27189    * oligo.c, oligo.h: Created 32-bit versions of procedures.
27190
27191    * match.c: Removed functions Match_print() and oligo_nt().
27192
27193    * stage1.c: Removed mask from Block_T.  Removed function Match_print().
27194
27195    * block.c, block.h: Removed mask from Block_T.
27196
27197    * longoligomer.c: Added object Longoligomer_T for 32-bit systems.
27198
2003-10-13  twu
27200
27201    * Makefile, chrnum.c, database.c, gencompress.c, genome.c, genome.h,
27202      genuncompress.c, stage1.c, chrnum.h, match.c, match.h, offset.c, offset.h,
27203      pair.c, pair.h, segmentpos.c, segmentpos.h, snap.c, snapindex.c, gmap.c,
27204      gmapindex.c, stage3.c, stage3.h: Changed unsigned int to more descriptive
27205      types.
27206
27207    * genomicpos.c, genomicpos.h, longoligomer.h, shortoligomer.h: Added new
27208      types.
27209
27210    * offsetdb.c, offsetdb.h: Added type for Chrnum_T.  Removed function
27211      Offset_position_to_chr.
27212
27213    * oligoindex.c, stage1.c, stage2.c: Changed unsigned long and unsigned int
27214      to more descriptive types.
27215
27216    * add-chrpos-to-endpoints.c: Removed file used for prototyping.
27217
27218    * rsort-check.c, rsort-test.c: Removed utility files for radixsort.
27219
27220    * sequence.c: Removed code for computing CRC32 checksum.
27221
27222    * sample-oligos.c: Removed sample-oligos.c, which was used for prototyping.
27223
    * Makefile: Removed cksum-fa.
27225
27226    * prb.c, prb.h: Removed prb.c and prb.h, which implemented red-black trees.
27227
27228    * block.c, block.h, indexdb.c, indexdb.h, match.c, match.h, oligo.c,
27229      oligo.h, oligoindex.c, oligoindex.h: Changed unsigned long and unsigned
27230      int to more informative types.
27231
27232    * cksum.c: Removed cksum.c, which is now computed in sequence.c
27233
27234    * cksum-fa.c: Removed cksum-fa.c, which was a utility program.
27235
27236    * cell.c, cell.h: Removed Cell_T, which was designed for the HashDB storage
27237      scheme for genomic oligomers.
27238
27239    * pair.c, pair.h, sequence.c, sequence.h, stage3.c: Added provision for
27240      correcting coverage in the presence of genomic gaps at the ends.
27241
27242    * chrnum.c: Fixed a bug in printing output.
27243
2003-10-09  twu
27245
27246    * stage3.c, stage3.h: Added reward for spliced cDNAs based on number of
27247      exons, if it's greater than 2.  Also, added flag for conservative behavior
27248      for splice site prediction, by reducing the reward for canonical splice
27249      sites. Note, however, that such behavior causes SNAP to perform poorly in
27250      the presence of sequence errors.
27251
27252    * dynprog.c, dynprog.h, params.c, params.h, snap.c, gmap.c: Added flag for
27253      conservative behavior for splice site prediction, by reducing the reward
27254      for canonical splice sites.  Note, however, that such behavior causes SNAP
27255      to perform poorly in the presence of sequence errors.
27256
2003-10-07  twu
27258
    * iit-read.c: Adapted to new format of bounds database contents.
27260
    * pair.c: Made correct call to IIT_get when coordinates are in reverse
      order.
27263
2003-08-19  twu
27265
27266    * Makefile, iit-read.c, iit.c, iit.h, interval-read.c, interval.c,
27267      interval.h, pair.h, params.h, snap.c, gmap.c, stage3.h: Changed filenames
27268      from iit.c and interval.c to iit-read.c and interval-read.c
27269
27270    * whats_on: Generalized procedure for identifying FASTA files containing
27271      ESTs.
27272
27273    * sequence.c: Fixed conversion of char to unsigned char.
27274
27275    * Makefile, bounds.c, bounds.h, iit-read.c, iit-read.h, iit.c, iit.h,
27276      interval-read.c, interval-read.h, interval.c, interval.h, pair.c, pair.h,
27277      params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h: Changed calls to
27278      iit to open the files just once.
27279
27280    * bounds.c, bounds.h: Adding bounds.c file to compute bounds.
27281
27282    * Makefile, database.c, database.h, iit-read.c, iit.c: Added ability to use
27283      a gene bounds iit file.
27284
27285    * interval-read.c, interval-read.h, interval.c, interval.h: Revised version
27286      from berkeleydb CVS repository.
27287
27288    * pair.c, pair.h, params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h:
27289      Added capability to use a gene bounds iit file.
27290
2003-08-18  twu
27292
27293    * iit-read.h, iit.h: Initial import into CVS.
27294
27295    * iit_get.c: Compare only to query length.
27296
    * get-genome.c: Fixed procedure isrange to make a copy of the string.
27298
2003-07-07  twu
27300
27301    * whats_on: Changed behavior to not die if directory isn't found.
27302
27303    * chrnum.c, chrnum.h, segmentpos.c: Fixed sorting and printing for
27304      chromosomes like 2L.
27305
27306    * stage3.h: Removed Stage3_goodness as an external procedure.
27307
27308    * stage3.c: Changed goodness within a given chromosomal segment to include
27309      canonical introns, but goodness between chromosomal segments to exclude
27310      this.
27311
27312    * stage2.c: Increased MAXHITS from 20 to 1000.  Previous value was too low
27313      and led to splicing errors.
27314
27315    * get-genome.c: Changed program to try segment first as a chromosome, then
27316      as a contig.
27317
27318    * offsetdb.c: Improved output statements to print beginning and ending of
27319      chromosomes.
27320
2003-06-19  twu
27322
27323    * dynprog.c: Changed penalties.  Made reward for extension multiplicative.
27324
27325    * Makefile, md5.c, md5.h, md5.t.c, params.c, params.h, sequence.c,
27326      sequence.h, snap.c, snap_compress.pl, snap_uncompress.pl, gmap.c, types.h,
27327      gmap_compress.pl, gmap_compress.pl.in, gmap_uncompress.pl,
27328      gmap_uncompress.pl.in: Added MD5 calculations.
27329
27330    * stage3.c: Added debugging statements for finalscore.
27331
27332    * cksum-fa.c: Added comments.
27333
2003-06-17  twu
27335
27336    * Makefile: Rearranged lines.
27337
27338    * snap.c, gmap.c: Fixed calculation of indexdb to occur only once for
27339      user-provided segment.
27340
27341    * sequence.c, sequence.h: Added computation for crc32.
27342
2003-06-13  twu
27344
27345    * dynprog.c: Changed reward for partial match to be zero.
27346
27347    * stage3.c: Fixed bug where pairs_fwd or pairs_rev might be NULL.
27348
2003-06-03  twu
27350
27351    * stage2.c, stage2.h: Created separate paths for forward and revcomp
27352      directions after smoothing.  Added back intron score during calculations.
27353
27354    * stage3.c: Separated calculations of forward and revcomp paths.
27355
27356    * snap.c, gmap.c: Increased size of maxlookback.
27357
27358    * pair.c, pair.h: Added calculation of number of canonical exons.
27359
27360    * dynprog.c: Setting finalscore as a return value.
27361
27362    * stage3.c: Added number of canonical exons to goodness criterion.  Added
27363      "Stage 3" to debug statements.
27364
2003-05-27  twu
27366
27367    * snap.c, gmap.c, stage2.c, stage2.h, stage3.c, stage3.h: Moved alignment of
27368      different cDNA direction from stage 2 to stage 3.
27369
27370    * pair.c, pair.h: Changed Pair_fracidentity to work on a list, rather than
27371      an array.
27372
2003-05-22  twu
27374
27375    * stage3.c: Changed goodness function to ignore number of canonical introns.
27376
27377    * snap.c, gmap.c: Added parameter for sufflookback, potentially different
27378      from maxlookback, but found that setting maxlookback >> sufflookback led
27379      to long, poor alignments, so set maxlookback = sufflookback.
27380
27381    * params.c, params.h, stage2.c, stage2.h: Added separate parameter for
27382      sufflookback, to be used in stage 2, and possibly different from
27383      maxlookback, used in stage 3.
27384
2003-05-03  twu
27386
27387    * genome.c, genuncompress.c, iit-read.c, iit.c, indexdb.c: Removed
27388      MAP_VARIABLE from mmap command, because it is not available in Linux.
27389
27390    * hash-test.c, hashdb-read.c, hashdb-read.h, hashdb-write.c, hashdb-write.h,
27391      hashdb.c, hashdb.h: Removed hashdb files, which have been replaced by
27392      indexdb.
27393
27394    * whats_on: Added error message.
27395
27396    * Makefile: Removed old Makefile commands.
27397
27398    * snapgenerate.c, snapindex.c, gmapindex.c: Moved functions from
27399      snapgenerate.c to snapindex.c, so only snapindex is needed to create SNAP
27400      genome files.
27401
27402    * genuncompress.c: Initial import into CVS.
27403
2003-04-30  twu
27405
27406    * stage2.c: Added check for MAXHITS in stage 2, to prevent slowness problems
27407      from repetitive cDNAs in repetitive genomic segments (such as AA704019).
27408
27409    * stage1.c: Added debugging statement.
27410
27411    * snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Fixed
27412      problem where gpos was not handled correctly for the minus strand.
27413
27414    * chrnum.c: Fixed problem where signed chromosomes were being printed
27415      incorrectly.
27416
2003-04-29  twu
27418
27419    * whats_on: Fixed problem where genomic coordinates were in the order of
27420      largest, then smallest (reverse strand).
27421
2003-04-27  twu
27423
27424    * stage3.c: Removed queryoffset.
27425
    * stage2.c, stage2.h: Removed queryoffset.  Made sampling interval variable.
      Added bounding of a querypos to a single hit if its top score exceeds its
      second highest score.
27429
27430    * snap.c, gmap.c: Changed lookback and extramaterial_paired.
27431
27432    * sequence.c: Changed trimming to leave non-poly-A/T oligomers.
27433
27434    * chrnum.c, database.c, get-genome.c, segmentpos.c: Changed interpretation
27435      of chromosome numbers to allow all single letters and all numbers.
27436
27437    * offsetdb.c: Extra blank line.
27438
2003-04-16  twu
27440
27441    * Makefile, accpos.c: Removed file accpos.c, which isn't being used anymore.
27442
27443    * genome.c, sequence.c, sequence.h: Removed offset as a parameter for
27444      Sequence_genomic_new.
27445
27446    * mem.c: Removed upper limit check on allocating memory.
27447
27448    * pair.h: Removed queryoffset from print routines.
27449
27450    * pair.c: Removed queryoffset from print routines.  Fixed calculation of
27451      genomic distances for Crick strand.
27452
2003-04-14  rkh
27454
27455    * config, cvswrappers: *** empty log message ***
27456
2003-04-09  rkh
27458
27459    * config: *** empty log message ***
27460
2003-04-07  twu
27462
27463    * dynprog.c: Reduced rewards for canonical introns.
27464
27465    * pair.c: Added conversion to uppercase.
27466
27467    * mem.c: Added check for unexpectedly large allocations.
27468
2003-04-02  twu
27470
27471    * stage3.c: Made separate procedures for 3' and 5' ends.  Turned off
27472      Boyer-Moore extension at ends.  Added checks to prevent dynamic
27473      programming past end of sequence.
27474
27475    * params.c: Removed freeing of version.
27476
27477    * pairpool.c: Added additional debugging checks.
27478
27479    * pair.c: Improved output for user-provided segments.
27480
27481    * indexdb.c, indexdb.h, match.c, matchpair.c, matchpair.h, offset.c,
27482      sequence.c, sequence.h, snap.c, gmap.c: Now performing stage 1 on
27483      user-provided segments.  This eliminates poor alignments when the
27484      user-provided segment is longer than stage 1 would have provided.
27485
27486    * segmentpos.c: Added limit to number of accessions reported.
27487
    * request.c, request.h, blackboard.c, blackboard.h: Changed name from
      genomicseg to usersegment.
27490
27491    * stage1.c: Removed offset from call to Block_T procedures.
27492
27493    * genome.c: Renamed some procedures.
27494
27495    * dynprog.c: Increased penalties for mismatch.
27496
27497    * chrnum.c: Allowed chromosome 0.
27498
27499    * block.c, block.h: Removed offset from list of parameters.
27500
2003-03-27  twu
27502
27503    * pair.c, snap_compress.pl, snap_uncompress.pl, stage3.c, gmap_compress.pl,
27504      gmap_compress.pl.in, gmap_uncompress.pl, gmap_uncompress.pl.in: Changed
27505      alignment output for dual breaks.
27506
2003-03-25  twu
27508
27509    * dynprog.c: Created an inline procedure and scheme for scoring canonical
27510      and alternate introns.  Increased penalties for mismatches.
27511
27512    * intron.c, intron.h: Moved most functions to other files, to increase speed.
27513
27514    * pair.c, pair.h, pairdef.h, pairpool.c, stage3.c: Added field to Pair_T
27515      object to denote a gap.
27516
27517    * sequence.c: Fixed bug that caused large amounts of memory to be allocated.
27518
2003-03-24  twu
27520
27521    * snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Introduced a
27522      better error statement.
27523
    * stage2.c: Changed sampling to start at -1 after the first 8-mer missed,
      and then go back by the Nyquist rate.
27526
27527    * stage3.c: Introduced peelback for single gaps.
27528
2003-03-21  twu
27530
27531    * genome.c, pair.c, pair.h, sequence.c, sequence.h, snap.c, gmap.c,
27532      stage3.c: Fixed algorithm to handle poly-T starts as well as poly-A ends.
27533      Added extra information to Sequence_T structure and output procedures to
27534      handle this correctly.
27535
27536    * snap.c, gmap.c: Fixed problems with Stage3_T objects that were not
27537      assigned to NULL. Added flushing of output for debugging.
27538
27539    * dynprog.c: Fixed dynamic programming on ends so the genomic segment won't
27540      stick out.
27541
2003-03-20  twu
27543
27544    * spidey_compress.pl: Modified routine to look for spaces of at least 10,
27545      instead of 20.
27546
27547    * dynprog.c: Added a separate reward for canonical introns, depending on the
27548      defect rate.
27549
27550    * list.c, list.h: Added a command for setting the head of a list.
27551
27552    * pair.c: Fixed counting of indels.
27553
27554    * pairpool.c: Created new debugging commands.
27555
27556    * snap.c, gmap.c: Added trimming of first or last exon in stage 2 if the
27557      defect rate is high enough and the exons are too long.  Increased lookback
27558      from 60 to 90.
27559
27560    * stage3.c: Modified peelback to go past nonconsecutive hits, stopping only
27561      at an intron.
27562
27563    * stage2.c, stage2.h: Added trimming of first or last exon if the defect
27564      rate is high enough and the exons are too long.
27565
2003-03-16  twu
27567
27568    * stage2.c: Added hooks for making smooth_path depend on defect_rate, but
27569      this appears to be a bad idea.
27570
27571    * pair.c: Improved consistency check to work when cdna_direction is
27572      initially zero.
27573
27574    * dynprog.c, dynprog.h, stage3.c: Changed effect of defect rate to be on
27575      mismatches and gaps, rather than intron scores.
27576
2003-03-15  twu
27578
27579    * pair.c, pair.h, snap.c, gmap.c, stage2.c, stage2.h, stage3.c, stage3.h:
27580      Added check for consistency of intron directions, and ability to back
27581      track to stage 2 with forced cdna_directions if the stage 3 result is
27582      inconsistent.
27583
2003-03-14  twu
27585
27586    * dynprog.c, dynprog.h, pair.c, pair.h, stage2.c, stage2.h, stage3.c: Added
27587      estimation of defect_rate in stage 2, and used it to change parameters in
27588      dynamic programming and extension of ends.
27589
2003-03-12  twu
27591
27592    * stage3.c: Changed limitation on Boyer-Moore search to be a certain number
27593      of hits.  This compensates for the fact that smaller oligomers will occur
27594      more frequently than longer ones, and that longer ones are more
27595      statistically significant.
27596
27597    * stage3.c, stage3.h: Limited length of Boyer-Moore search at ends.  Changed
27598      name of minendsearch to minendtrigger.
27599
27600    * params.c, params.h, snap.c, gmap.c: Changed name of minendsearch to
27601      minendtrigger.
27602
2003-03-11  twu
27604
27605    * dynprog.c: Extended the search range of bridge_gap, so that it finds
27606      introns even at the bounds of the dynamic programming.
27607
27608    * params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h: Added parameter
27609      for minendsearch.
27610
27611    * dynprog.c: Fixed safety check in intron_score for reading off end of
27612      segment.
27613
27614    * stage3.c: Rearranged computation of stage 3, such that middle is computed
27615      first, then cDNA direction is recomputed, then 5' and 3' ends are computed.
27616
27617    * pair.c, pair.h: Added function for computing cDNA direction from list of
27618      pairs.
27619
27620    * dynprog.c: Adjusted various dynamic programming scores.  Fixed coordinates
27621      in gap.  Added check for very short introns.
27622
27623    * stage3.c: Discriminated between paired gap dynamic programming at ends
27624      and in middle.
27625
27626    * dynprog.c, dynprog.h: Major rewrite of dynamic programming procedures.
27627      Changed from Gotoh algorithm to pure banded procedure.  Reversing
27628      sequences when necessary, so all computations are symmetric.
27629
276302003-03-10  twu
27631
27632    * sim4_compress.pl, spidey_compress.pl: Added output of the number of exons.
27633
27634    * stage3.c: Added check for genomejump being zero or negative, which would
27635      give rise to a position beyond the genomic segment.
27636
27637    * pair.c: Added check for zero denominator.
27638
27639    * boyer-moore.c: Added check for sequence to consist entirely of valid
27640      nucleotides.
27641
276422003-03-09  twu
27643
27644    * spidey_compress.pl: Added printing of exon lengths, intron lengths, and
27645      dinucleotides, to match new output of snap_compress.pl.  Fixed problems
27646      with parsing Spidey output.
27647
27648    * sim4_compress.pl: Added printing of exon lengths, intron lengths, and
27649      dinucleotides, to match new output of snap_compress.pl.
27650
27651    * snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Fixed problem
27652      when reverse intron is GT-AG.
27653
27654    * snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Fixed bug
27655      that occurs when snap was called with -N, without printing intron lengths.
27656
276572003-03-08  twu
27658
27659    * stage1.c: Fixed a memory leak from not freeing Stage1_T object.
27660
27661    * block.c: Fixed bug which caused a memory leak because we were overwriting
27662      a previous querypos.
27663
27664    * oligo.c: Fixed debug message.
27665
276662003-03-07  twu
27667
27668    * snap.c, gmap.c: Reduced stage1size for short query sequences (< 60 bp).
27669
27670    * match.c, match.h, stage1.c, stage1.h: Fixed Match_print to print the
27671      correct oligo.
27672
27673    * get-genome.c: Changed header to contain the version number.
27674
27675    * snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Added exon
27676      lengths to compressed output.
27677
27678    * snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Added exon
27679      lengths to compressed output.  Removed printing of dinucleotides for
27680      canonical introns.
27681
276822003-03-06  twu
27683
27684    * stage3.c: Cleaned up code extensively.  Added Boyer-Moore searches on both
27685      ends of cDNA.
27686
27687    * dynprog.c, dynprog.h: Cleaned up code by making separate procedures for
27688      single gap in middle, and 5' and 3' ends.
27689
27690    * pair.c, pair.h: Added procedure for dumping a list of pairs.
27691
27692    * boyer-moore.c: Removed debugging statements.
27693
276942003-03-04  twu
27695
27696    * Makefile, boyer-moore.c, boyer-moore.h: Added Boyer-Moore string
27697      search.
27698
27699    * stage3.c: Consolidated peelback code.  Beginning to insert Boyer-Moore
27700      code.
27701
27702    * stage2.c: Fixed bug where index was -1.
27703
27704    * block.c, block.h, oligo.c, oligo.h, stage1.c: Fixed code to use stage1size
27705      instead of INDEX1PART in certain places.
27706
27707    * snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Fixed code to
27708      handle genomic accession when genomic sequence is provided by the user.
27709
27710    * pair.c, sequence.c, sequence.h, snap.c, gmap.c: Fixed code to print out
27711      genomic accession when genomic sequence is provided by the user.
27712
27713    * match.c: Fixed code to print just forward oligo.
27714
277152003-03-03  twu
27716
27717    * intron.c, pair.c: Changed '===...===' to represent a non-canonical intron.
27718
27719    * dynprog.c: Reduced reward to semi-canonical introns to be slightly less
27720      than that for canonical introns.
27721
27722    * stage3.c: Changed output of large gaps from '=========' to '###...###'.
27723
27724    * snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Made
27725      changes to accommodate enhancements to SNAP, namely use of '#' for large
27726      gaps and switch of intronends and intronlengths info.
27727
27728    * snap_compress.pl, gmap_compress.pl, gmap_compress.pl.in: Initial import
27729      into CVS.
27730
277312003-03-02  twu
27732
27733    * snap_uncompress.pl, gmap_uncompress.pl, gmap_uncompress.pl.in: Initial
27734      import into CVS.
27735
27736    * Makefile, dynprog.c, intron.c, intron.h, pair.c, splice-site.c: Removed
27737      use of splice site matrices and added identification of semi-canonical
27738      dinucleotides.
27739
277402003-02-28  twu
27741
27742    * params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h: Changed
27743      extramaterial at the end and for paired to be parameters.
27744
27745    * get-genome.c: Changed program to check only first four letters of genomic
27746      name.
27747
277482003-02-11  twu
27749
27750    * pair.c, pair.h, stage3.c: Adjusted goodness score of alignment by number
27751      of canonical introns.
27752
27753    * dynprog.c, dynprog.h, params.c, params.h, snap.c, gmap.c, stage3.c,
27754      stage3.h: Parameterized band size in dynamic programming and increased
27755      bands for cross-species alignment.
27756
27757    * oligoindex.c, oligoindex.h, params.c, params.h, snap.c, gmap.c, stage2.c,
27758      stage3.c, stage3.h: Parameterized INDEXSIZE and made it different for
27759      cross-species alignment.
27760
27761    * stage2.c: Added smooth_path step in stage 2 to remove short spurious exon
27762      hits.
27763
277642003-02-03  twu
27765
27766    * params.c, params.h: Replaced dbroot with version.
27767
27768    * snap.c, gmap.c: Added reporting of version to program.
27769
277702003-01-27  twu
27771
27772    * stage3.c: Fixed problem where a base pair was missed on the 5' end.
27773
27774    * stage2.c: Fixed problems where genomic matches can overlap.
27775
27776    * pair.c: Fixed problems in computing exon endpoints.
27777
277782003-01-22  twu
27779
27780    * stage3.c: Reverted to old method of building pairs in the middle.
27781
27782    * pair.c: Added post-processing check for a gap at the end of the alignment.
27783
27784    * oligoindex.c: Added check for poly-T.
27785
277862003-01-03  twu
27787
27788    * stage3.c: Made some changes to eliminate large gaps at the 3' end.
27789
27790    * snap.c, gmap.c: Improved handling of case where user provides both cDNA
27791      and genomic files.
27792
27793    * stage3.c: Fixed bug when no pairs are found.
27794
27795    * sequence.c: Fixed bug in failing to initialize.
27796
277972002-12-30  twu
27798
27799    * params.c, params.h: Added parameter for fwdonlyp.
27800
27801    * pairpool.c: Fixed small memory leak.
27802
27803    * params.h: Changed genomeinvert from a bool to an int.
27804
27805    * pair.c: Fixed bug where pointer was advanced before freeing it.
27806
278072002-12-11  twu
27808
27809    * snap.c, gmap.c: Fixed problem where complement table was not initialized
27810      early enough.
27811
278122002-12-10  twu
27813
27814    * sequence.c, sequence.h, snap.c, gmap.c: Improved procedure for trimming
27815      poly-A tails.
27816
27817    * pair.c: Increased space for positions from 12 to 14.
27818
27819    * complement.h: Removed extraneous semicolon.
27820
27821    * stage2.c: Fixed problem where no matching 8-mers are found.
27822
278232002-12-04  twu
27824
27825    * snapindex.c, gmapindex.c: Write accession names to .aux file, even if they
27826      do not start with NT_ or GA_.
27827
27828    * snap.c, gmap.c: Added routines for adding signs to chromosomes, inverting
27829      the genome, printing intron lengths, and trimming poly-A tails.
27830
27831    * pair.c, pair.h, params.c, params.h, stage3.c, stage3.h: Added routines for
27832      adding signs to chromosomes, inverting the genome, and printing intron
27833      lengths.
27834
27835    * sequence.c, sequence.h: Added routines for trimming poly-A tails.
27836
27837    * chrnum.c, chrnum.h, match.c: Added routines for adding signs to
27838      chromosomes.
27839
27840    * Makefile, complement.c, complement.h, genome.c: Added files for handling
27841      complements.
27842
278432002-11-26  twu
27844
27845    * match.c, pair.c: Changed printing of FWD/REV to +/-.  Added printing of
27846      intron lengths.
27847
27848    * pair.c, pair.h, params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h:
27849      Added ability to print genome first in alignment.
27850
278512002-11-25  twu
27852
27853    * snap.c, gmap.c: Added iteration code for cross-species alignments.
27854
27855    * Makefile, matchpair.c, matchpair.h: Added an object Matchpair_T to hold
27856      pairs of Match_T objects.
27857
27858    * segmentpos.c: Added a check against freeing a null value.
27859
27860    * result.c, result.h: Created a Stage1_T object that can hold state, for
27861      resuming stage 1 calculations later.
27862
27863    * pair.c, stage3.c: Changed definition of coverage to be based on length of
27864      query sequence that aligns.
27865
27866    * dynprog.c: Changed allocation procedures for Matrix_T and Directions_T.
27867      Provided hooks for doing band-limited memory clearing, but this won't work
27868      with the Gotoh P1 and Q1 matrices.
27869
27870    * stage2.c: Added a seenone check to protect against long stretches of N's
27871      in the genome.
27872
27873    * stage1.c, stage1.h: Created a Stage1_T object that can hold state, for
27874      resuming stage 1 calculations later.  Stage1_T contains a list of
27875      Matchpair_T objects, and some procedures have been moved to matchpair.c.
27876
27877    * params.c, params.h: Added parameters for crossspecies and changed name of
27878      maxextend to maxstutter.
27879
278802002-11-20  twu
27881
27882    * dynprog.c, dynprog.h, stage3.c: When calling dynprog, now passing pointers
27883      to subsequence rather than copying subsequences.
27884
27885    * block.c, block.h: Simplified procedure for processing oligos by Block_T
27886      object.
27887
27888    * params.c, params.h: Added parameters for stage1size and maxlookback.
27889
27890    * pair.c, pair.h, stage3.c, stage3.h: Added counts for unknowns and
27891      reporting of coverage.
27892
27893    * oligoindex.c: Increased size of memory blocks from 10 to 50.
27894
27895    * oligoindex.c: Replaced realloc function with explicit calls to calloc and
27896      free, because Third Degree reported occasional errors with realloc.
27897
27898    * indexdb.c, indexdb.h, oligo.c, oligo.h, snap.c, gmap.c, stage1.c,
27899      stage1.h: Major change to stage 1 procedure to work on either 24-mers or
27900      18-mers.
27901
27902    * mem.c: Added blank line.
27903
27904    * match.c, match.h: Added procedure for Match_copy and simplified Match_new.
27905
27906    * genome.c: Inlined procedure fill_buffer.
27907
27908    * genome.c: Simplified routine for fill_buffer.
27909
279102002-11-15  twu
27911
27912    * indexdb.c, indexdb.h: Added code for ignoring poly A hits.  Added
27913      procedure for reading 12-mer positions.
27914
27915    * pair.c: Removed debugging statement.
27916
27917    * block.c, match.c, match.h, oligo.c, stage1.c, stage1.h: Parameterized
27918      stage1size.
27919
279202002-11-11  twu
27921
27922    * pair.c, pair.h: Distinguished between mismatches and indels.  Fixed cases
27923      where gaps need to be merged (e.g., affy.HGU95A.34233_i_at, which created
27924      ===...======...=== when an 8-mer fell into a gap and was then aligned to
27925      either end of the gap by dynamic programming).
27926
27927    * params.c, params.h: Added flag for low stringency.
27928
27929    * stage3.c, stage3.h: Changed definition of LARGEQUERYGAP to be maxlookback.
27930      Distinguished between mismatches and indels.
27931
27932    * snap.c, gmap.c: Changed definition of LARGEQUERYGAP to be maxlookback.
27933      Added flag for lowstringency (12-mers).
27934
27935    * dynprog.c, dynprog.h, stage2.c: Changed definition of LARGEQUERYGAP to be
27936      maxlookback.
27937
27938    * result.c: Improved check on whether to free array in result.
27939
27940    * params.c, params.h, stage2.c, stage2.h: Made maxlookback a parameter.
27941
27942    * snap.c, gmap.c: Introduced heap memory for each thread for dynamic
27943      programming. Made maxlookback a parameter.
27944
27945    * stage3.c, stage3.h: Introduced heap memory for each thread for dynamic
27946      programming. Restricted peelback for consecutive positions.
27947
27948    * dynprog.c, dynprog.h: Introduced heap memory for each thread for dynamic
27949      programming.
27950
279512002-11-08  twu
27952
27953    * dynprog.c: Changed dynamic programming procedure to be banded.
27954
27955    * stage2.c, stage3.c: Revised stage 2 procedure to jump every INDEXSIZE,
27956      keep track of consecutive matches, and have a maximum lookback.  Changed
27957      stage 3 procedure accordingly, including increasing peelback to INDEXSIZE.
27958
27959    * snap.c, gmap.c: Changed default behavior to be ordered output.
27960
27961    * genome.c: Removed pre-loading for genome, and used madvise(MADV_DONTNEED)
27962      instead.
27963
279642002-11-07  twu
27965
27966    * dynprog.c: Added a zero gap penalty on the ends.  Changed mismatch penalty
27967      to be less than a match penalty, and reduced intron reward accordingly.
27968
27969    * stage3.c: Added a peelback on the 5' end, because it's just like half of a
27970      paired gap alignment.
27971
27972    * snap.c, gmap.c: Removed hack used for debugging.
27973
27974    * stage2.c: Introduced concept of a maximum lookback, and will now go beyond
27975      the previous limit if no hit has been found.
27976
27977    * genome.c: Changed genome to be pre-paged when user specifies it.
27978
27979    * indexdb.c: Changed type of i from int to size_t.
27980
279812002-11-06  twu
27982
27983    * pair.c, pair.h, stage3.c: Changed pairs in stage 3 object to be allocated
27984      as a separate block, so they can be output at a later time.
27985
27986    * blackboard.c, blackboard.h, reqpost.c, reqpost.h, request.c, request.h,
27987      result.c, result.h, snap.c, gmap.c: Updated multithreading system to
27988      handle ordered output with better throughput by adding an output queue.
27989
27990    * genome.c, indexdb.c, pair.c: Added header for string.h to eliminate
27991      compiler warnings about strlen type.
27992
27993    * pairpool.c: Increased chunk size from 10000 to 20000.
27994
27995    * stage2.c: Added debugging comments to generate a graph.
27996
27997    * stage3.c: In-lined calls to List_T and Pair_T accessor functions.
27998
27999    * pair.c, stage1.c: In-lined calls to List_T accessor functions.
28000
28001    * list.c, list.h: Added function to return value of the last element of a
28002      list.
28003
280042002-11-05  twu
28005
28006    * pairpool.c, pairpool.h: Removed calls to realloc(), because they do not
28007      preserve pointer values.  Replaced with allocation of chunks of memory as
28008      needed.
28009
28010    * dynprog.c, stage2.c: Changed two-dimensional matrices to be
28011      one-dimensional with pointer.
28012
28013    * blackboard.h, snap.c, gmap.c: Made minor tweaks to blackboard object,
28014      primarily altering ninputs and noutputs outside the lock, and changing
28015      signal of end of output to be a null result.
28016
28017    * reqpost.c, reqpost.h: Added orderedp flag to send only appropriate signals.
28018
28019    * blackboard.c: Made minor tweaks to blackboard object, primarily altering
28020      ninputs and noutputs outside the lock.
28021
280222002-11-04  twu
28023
28024    * dynprog.c, list.c, listdef.h, pairpool.c, pairpool.h, stage2.c, stage3.c:
28025      Added a pool of List_T cells for each thread to reduce heap contention.
28026
28027    * Makefile, blackboard.c, dynprog.c, dynprog.h, genome.c, genome.h, pair.c,
28028      pairdef.h, pairpool.c, pairpool.h, reqpost.c, reqpost.h, request.c,
28029      request.h, sequence.c, sequence.h, snap.c, gmap.c, stage2.c, stage2.h,
28030      stage3.c, stage3.h: Provided each worker thread with separate sources of
28031      heap memory for genomic sequence and for Pair_T objects.  Intended to
28032      reduce heap contention.
28033
28034    * stage2.c: Created define for MAXLOOKBACK.
28035
28036    * stage1.c, stage1.h: Changed constants to be based on those in indexdb.h.
28037
28038    * indexdb.c, indexdb.h: Changed stage 1 lookup to be based on 12-mers,
28039      rather than 8-mers.
28040
280412002-11-02  twu
28042
28043    * indexdb.c: Implemented binary search on third 8-mer.
28044
28045    * snap.c, gmap.c: Allowed user to specify full path of database in the -d
28046      flag.
28047
28048    * Makefile: Added stopwatch to Makefile.
28049
28050    * indexdb.c: Changed preloading of indexdb to touch each page effectively,
28051      not by using memcpy(), which fails to load in pages.
28052
28053    * sequence.c: Added check for first call to fgetc(input) being EOF.
28054
280552002-11-01  twu
28056
28057    * pair.c: Changed print routine to work properly on user-supplied genomic
28058      segments.
28059
28060    * genome.c, indexdb.c: Changed pre-load to use fread/fopen/fwrite, rather
28061      than memcpy, which fails to load the pages into memory.
28062
28063    * Makefile, block.c, block.h, oligo.c, oligo.h, params.c, params.h, snap.c,
28064      snapindex.c, gmap.c, gmapindex.c, stage1.c, stage1.h: Changed stage 1
28065      database to use index table of 8-mers, rather than a hash table of 24-mers.
28066
28067    * stopwatch.c, stopwatch.h: Added stopwatch function to program.
28068
28069    * genome.c, genome.h: Added batch mode by using mmap/memcpy, but this
28070      appears to fail on a clustered file system.
28071
28072    * indexdb.c, indexdb.h: Implemented Indexdb_T as a substitute for Hashdb_T.
28073
280742002-10-31  twu
28075
28076    * stage2.c: Changed stage 2 procedure to consider both forward and reverse
28077      complement introns in one pass.  Fixed a small bug in intron_score to
28078      require position >= 2.
28079
280802002-10-29  twu
28081
28082    * stage2.c, stage3.c: Replaced calls to Sequence_char with direct array
28083      access.
28084
28085    * Makefile, blackboard.h, genome.c, genome.h, hashdb-read.c, match.c,
28086      offset.c, oligoindex.c, pair.c, reqpost.c, reqpost.h, segmentpos.c,
28087      snap.c, gmap.c, stage2.c, stage3.c: Made various fixes for compiler
28088      warnings.
28089
28090    * stage3.c: Separated procedures for middle single gap and end single gap.
28091      Decreased size of single gap dynamic programming procedure for 5' and 3'
28092      ends to have genomejump = 2*queryjump.
28093
28094    * snap.c, gmap.c: Increased default extension to 30000 nt.
28095
28096    * dynprog.c: Prevented horizontal jumps on 3' end of splice site.  Adjusted
28097      score parameters.
28098
280992002-10-28  twu
28100
28101    * dynprog.c, oligoindex.h, stage2.c: Changed oligomer size in stage 2 from
28102      10 to 8, and adjusted dynamic programming parameters accordingly.
28103      Prevented genomic gap at the 5' edge of an intron.  Made initial
28104      cdna_direction test more robust.
28105
28106    * snap.c, gmap.c: Fixed calls to SNAP that don't involve any sequence (the
28107      -C and -L flags).
28108
28109    * stage3.c: Reduced minimum intron size from 10 to 9.
28110
28111    * stage1.c: Substituted the constant HASHSIZE for 24.
28112
28113    * dynprog.c: Increased reward for intron.  A score of 10 fails to identify a
28114      canonical intron with a gap.
28115
28116    * pair.c: Fixed misreporting of query start coordinate.
28117
28118    * snap.c, gmap.c: Fixed small memory leak.
28119
281202002-10-27  twu
28121
28122    * Makefile, dynprog.c, splice-site.c, splice-site.h: Added splice site
28123      calculations to find best intron.
28124
28125    * dynprog.c, dynprog.h, stage2.c, stage2.h, stage3.c, stage3.h: Improved
28126      stage 2 dynamic programming procedure to consider introns (only for
28127      consecutive query positions), to compute gap penalty based on difference
28128      of genomejump and queryjump, and to consider cDNA directions separately.
28129
28130    * snap.c, gmap.c: Improved handling of arguments for database search and for
28131      alignment to genomic segment.
28132
28133    * stage1.c, stage1.h: Fixed stage 1 to consider Watson and Crick strands
28134      separately.
28135
28136    * params.c, params.h, snap.c, gmap.c, stage1.c, stage1.h: Made extension in
28137      stage 1 a user-definable parameter.
28138
28139    * blackboard.c, blackboard.h, pair.c, request.c, request.h, sequence.c,
28140      sequence.h, snap.c, gmap.c: Provided ability to align cDNA against
28141      user-provided genomic segment.
28142
28143    * dynprog.c: Gave credit to half introns.
28144
281452002-10-26  twu
28146
28147    * genome.c, genome.h, oligoindex.c, oligoindex.h, snap.c, gmap.c, stage1.c,
28148      stage1.h, stage2.c, stage2.h, stage3.c, stage3.h: Changed genomicseg to be
28149      of type Sequence_T.
28150
28151    * block.c: Changed debug flag.
28152
281532002-10-25  twu
28154
28155    * Makefile, request.c, request.h, sequence.c, sequence.h, snap.c, gmap.c,
28156      stage1.c, stage1.h, stage2.c, stage2.h, stage3.c, stage3.h: Renamed
28157      Queryseq_T to Sequence_T.
28158
28159    * queryseq.c, queryseq.h: Renamed Queryseq_T to Sequence_T, to allow genomic
28160      sequences to be represented this way.
28161
28162    * pair.c, pair.h, stage3.c, stage3.h: Simplified argument lists of some
28163      functions.
28164
28165    * params.c, params.h, result.c, result.h: Allowed first-order approximation
28166      using stage1 results.
28167
28168    * stage2.c: Increased extension on left and right to find small terminal
28169      exons.
28170
28171    * stage1.c, stage1.h: Fixed assessment of whether getpair succeeded or
28172      failed.
28173
28174    * match.c, match.h, snap.c, gmap.c: Added first-order approximation, to use
28175      just stage1 results.
28176
28177    * block.c, block.h, oligo.c: Fixed bugs in Block_next_to_stoppos when the
28178      query sequence has many non-ACGT characters.
28179
28180    * Makefile, snap.c, gmap.c: Made compressed genome the default.
28181
28182    * hashdb-read.c, hashdb-write.c: Reverted to old hashtable format, which
28183      contains only two arrays.
28184
281852002-10-24  twu
28186
28187    * dynprog.c, dynprog.h, pair.c, pair.h, params.c, params.h, snap.c, gmap.c,
28188      stage3.c, stage3.h: Added diagnostic mode to print out asterisks instead
28189      of vertical bars where dynamic programming was done.
28190
28191    * genome.c, genome.h: Added ability to read compressed genomes.
28192
28193    * Makefile, gencompress.c: Added compression routine for genomes.
28194
28195    * blackboard.c, blackboard.h, reqpost.c, reqpost.h, snap.c, gmap.c: Added
28196      anyorder behavior to blackboard, and made it default.
28197
281982002-10-22  twu
28199
28200    * stage2.c: Removed code for memory freeing of positions, which is now
28201      performed by Oligoindex_T.
28202
28203    * oligoindex.c: Changed type of positions from void ** to unsigned int **,
28204      to make code clearer and more robust.
28205
28206    * pair.c, pair.h, params.c, params.h, snap.c, gmap.c, stage3.c, stage3.h:
28207      Added option to print universal genomic coordinates.
28208
28209    * genome.c, hashdb-read.c: Changed mmap from MAP_PRIVATE to MAP_SHARED.
28210
28211    * genome.c, genome.h, params.c, params.h, snap.c, gmap.c, stage2.c,
28212      stage2.h: Changed Genome_T to be memory-mapped, rather than using fopen,
28213      which is needed for multithreading.
28214
28215    * stage1.c: Fixed bug where salvage procedure fails to find anything.
28216
28217    * mem.c: Enhanced mem.c to give actual location of failure.
28218
28219    * oligoindex.c, oligoindex.h, snap.c, gmap.c, stage2.c, stage2.h: Changed
28220      algorithm for stage 2 to allocate genomic positions dynamically in
28221      Oligoindex_T.  To limit number of positions stored, we prescan the
28222      queryseq to see what oligomers are relevant.
28223
28224    * stage2.c: Reverted to previous Stage 2 strategy where we stored
28225      genomic sequence in oligoindex and scanned query sequence.
28226
28227    * oligoindex.c, oligoindex.h: Simplified routines greatly.
28228
282292002-10-21  twu
28230
28231    * stage2.c: Changed stage 2 strategy to index the query sequence rather than
28232      the genomic sequence.  This should result in some speed up.
28233
28234    * reader.c, reader.h, stage1.c: Made Reader_new function depend on sequence
28235      rather than Queryseq_T object.
28236
28237    * list.c, list.h: Added function List_index.
28238
28239    * oligoindex.c, oligoindex.h: Hard-coded interval, rather than passing it in.
28240
28241    * pair.c, snap.c, gmap.c: Added flag to avoid showing contig coordinates.
28242
28243    * params.c, params.h: Added Params_T object.
28244
28245    * Makefile, blackboard.c, blackboard.h, reqpost.c, reqpost.h, request.c,
28246      request.h, result.c, result.h, snap.c, gmap.c: Major change to make
28247      program multithreaded.  Introduced Blackboard_T, new Reqpost_T, and new
28248      Result_T objects.
28249
28250    * stage3.h: Added header file.
28251
28252    * stage2.c, stage2.h: Cleaned up procedures.  Passed in querylength via
28253      queryseq.
28254
28255    * stage1.c, stage1.h: Cleaned up procedures.  Made hashinterval a constant.
28256
28257    * queryseq.c, sequence.c: Removed macro for DEBUG2.
28258
28259    * oligoindex.c, oligoindex.h: Allowed offsets for oligoindex to be created
28260      separately (for individual worker threads).
28261
28262    * block.c: Changed name from Result_T to Match_T.
28263
28264    * block.c, match.c, match.h, stage1.c, stage1.h, stage2.c, stage2.h: Changed
28265      name of Result_T to Match_T.
28266
28267    * result.c, result.h: Renamed Result_T to Match_T.
28268
28269    * stage3.c: Turned off debug statements.
28270
282712002-10-20  twu
28272
28273    * queryseq.c, sequence.c: Fixed small memory leak.
28274
28275    * Makefile, align.c, align.h, snap.c, gmap.c, stage2.c, stage2.h, stage3.c,
28276      stage3.h: Created Stage 3 and moved part of Stage 2 commands there.
28277
28278    * Makefile, align.c, align.h, queryseq.c, queryseq.h, reader.c, reader.h,
28279      sequence.c, sequence.h, snap.c, gmap.c, stage1.c, stage1.h, stage2.c,
28280      stage2.h: Added a separate Queryseq_T object, and moved some functions
28281      from Reader to Querypos.
28282
28283    * pair.c: Removed bottom ruler.
28284
28285    * block.c, block.h, snap.c, gmap.c, stage1.c, stage1.h: Removed
28286      multithreading from stage 1 (hash table reads).
28287
28288    * dynprog.c: Made maximize_entry inline to speed up dynamic programming.
28289
28290    * stage2.c: Added another 24 (hashsize) to extensions.  Without this, for
28291      some reason, we miss the ends.
28292
28293    * Makefile, dynprog.c, penalties.c, penalties.h: Removed Penalties_T object
28294      in order to increase speed.
28295
28296    * stage2.c: Changed extension to be based on remaining distance from end.
28297
282982002-10-19  twu
28299
28300    * snap.c, gmap.c: Eliminated parameters maxentries and indexsize.
28301
28302    * oligoindex.c, oligoindex.h: Made changes to improve speed, by eliminating
28303      unnecessary arrays.
28304
28305    * stage2.c, stage2.h: Made changes to improve speed, including making
28306      build_pairs_middle iterative and using the fill_oligo function where
28307      possible.
28308
28309    * align.c: Changed Pair_T object to reflect the actual case of the query
28310      sequence.
28311
28312    * oligoindex.c: Simplified construction of Oligoindex_T object.
28313
28314    * Makefile, align.c, align.h, oligoindex.c, oligoindex.h, snap.c, gmap.c,
28315      stage2.c, stage2.h: Made indexsize a hardcoded parameter.  Allocated space
28316      for Oligoindex once at beginning of program.
28317
28318    * align.c, stage2.c: Using Pair_T object instead of Result_T object
28319      throughout stage 2.
28320
28321    * stage1.c: Fixed memory leak.
28322
28323    * align.c, align.h, stage2.c: Major change to improve stage 2 efficiency.
28324      Using arrays instead of lists for the dynamic programming alignment.
28325
28326    * oligoindex.c: Changed debug statements from fprintf to printf.
28327
28328    * offset.c, offset.h: Added back to repository.
28329
283302002-10-18  twu
28331
28332    * stage1.c, stage2.c: Allowed alignments even if we can't find a matching
28333      pair on the 5' and 3' ends.
28334
28335    * oligoindex.c: Changed debug statements.
28336
28337    * align.c: Removed debug statement.
28338
28339    * Makefile: Added rule for counting lines of code.
28340
283412002-10-17  twu
28342
28343    * Makefile, segmentpos.c, segmentpos.h, snap.c, snapgenerate.c, gmap.c:
28344      Generated dump procedure to work on either the text offset file or the
28345      offset BerkeleyDB.  Added accession length to the output.
28346
28347    * match.c, match.h, result.c, result.h: Cleaned up unused or obsolete
28348      procedures.
28349
28350    * hashdb-read.c: Made hashindex memory mapped again.  Added madvise()
28351      commands to help with memory mapping.
28352
28353    * stage1.c: Changed algorithm for stage 1 to extend for 2 hash intervals
28354      past the first connectable pair of hits.
28355
28356    * block.c, block.h: Added ability to stop block at a certain position.
28357
283582002-10-15  twu
28359
28360    * dynprog.c, dynprog.h, stage1.c, stage2.c: Added counts of matches and
28361      mismatches on dynamic programming of single gaps, and used this to exclude
28362      dynamic programming results on 5' and 3' ends.
28363
28364    * snap.c, gmap.c, stage2.c, stage2.h: Increased EXTENSION from 90 to 1000.
28365      Included check for genomicpos2 against chromosomal length.
28366
28367    * hashdb-read.c, hashdb-write.c: Reading hashindex into memory instead of
28368      memory mapping it.
28369
28370    * dynprog.c: Cosmetic changes to debug macro.
28371
28372    * align.c: Cleaned out unused code.
28373
28374    * snap.c, gmap.c: Added flag for specifying maxentries (in stage 2).
28375
28376    * stage2.c: Fixed off-by-one error on requesting dynamic programming of 5' end.
28377
283782002-10-14  twu
28379
28380    * hashdb-write.h: Added log file.
28381
28382    * snapindex.c, gmapindex.c: Fixed minor bug relating to log file.
28383
28384    * hashdb-write.c: Added file pointer for a log file.
28385
28386    * snapindex.c, gmapindex.c: Raised default maxentries value from 5 to 20.
28387      Added file pointer for a log file.
28388
28389    * hashdb-write.c: Commented out monitoring statements.
28390
28391    * hashdb.c: Developed new hash function to give the same hash value for an
28392      oligo and its reverse complement.  This should improve page access for the
28393      hash lookup.
28394
28395    * stage2.c: Improved debugging statement.
28396
28397    * snap.c, gmap.c: Removed effect of maxentries in stage 2, which was causing
28398      some alignments to be short.
28399
28400    * align.c: Changed debug statements from fprintf to printf.
28401
28402    * hashdb-read.c: Fixed problem in binary search where we subtracted 1U from
28403      0U.
28404
28405    * dynprog.c: Allowed dynamic programming to identify introns even if
28406      lowercase.
28407
28408    * database.c: Reformatting.
28409
28410    * hashdb-read.c: Fixed bug where function was returning NULL prematurely.
28411
28412    * snapindex.c, gmapindex.c: Made second pass read only on auxfile.
28413
284142002-10-13  twu
28415
28416    * Makefile, block.h, database.c, database.h, match.c, oligo.h, request.h,
28417      result.c, snap.c, snapindex.c, gmap.c, gmapindex.c, stage1.h: Removed
28418      traces of PureDB package.
28419
28420    * Makefile, cell.c, cell.h, snapindex.c, gmapindex.c: Changed snapindex
28421      program to a two-pass process.  The first pass saves the .aux file, and
28422      the second pass creates the hash table.  This simplifies the Cell_T object
28423      greatly.
28424
28425    * cell.c, cell.h, hashdb-write.c, snapindex.c, gmapindex.c: Sorting cell
28426      entries by hashvalue then by oligo.
28427
284282002-10-12  twu
28429
28430    * hashdb-read.c: Fixed minor bug in binary search routine.
28431
284322002-10-11  twu
28433
28434    * Makefile, block.c, hashdb-read.c, hashdb-write.c: Changed structure of
28435      hashdb to have three tables: oligo_offset, oligos, and positions.
28436
28437    * hashdb-write.c: Checking totalsize of contents and setting file size
28438      initially to that.
28439
28440    * hashdb-read.c: Memory mapping hash contents now.
28441
284422002-10-10  twu
28443
28444    * snapgenerate.c: Fixed memory leaks.
28445
28446    * snap.c, gmap.c: Fixed memory leak from failure to free Offset_T object.
28447
28448    * snap.c, gmap.c: Fixed bug where datadir was freed.
28449
28450    * hashdb-read.c: Fixed bug where memory mapped offsets were freed.
28451
28452    * dynprog.c: Fixed bug where cL+1 or cL+2 exceeded length2L.
28453
28454    * oligo.c: Revised procedures to handle lowercase letters in the query
28455      sequence.
28456
28457    * hashdb-read.c: Fixed bug where nentries wasn't being set.
28458
28459    * Makefile: Divided hashdb into separate read and write files.
28460
28461    * block.c, block.h, oligo.c, oligo.h, request.c, request.h, snap.c, gmap.c:
28462      Changing from PureDB to our own Hashdb_T.
28463
28464    * stage1.c, stage1.h: Changing from PureDB to our own Hashdb_T.  Also fixed
28465      bug where results3 was not being initialized to NULL.
28466
28467    * snapindex.c, gmapindex.c: Divided hashdb file into a separate read and
28468      write file.
28469
28470    * snapgenerate.c: Using Offset_T object now, after we have written the
28471      chromosome file.
28472
28473    * hashdb-write.c: Fixed bug causing unaligned access errors, by splitting
28474      header into two 4-byte unsigned ints.
28475
28476    * hashdb-read.c: Fixed bug where length = 0.  Also fixed bug causing
28477      unaligned access errors, by splitting header into two 4-byte unsigned ints.
28478
28479    * hashdb-read.c, hashdb-read.h, hashdb-write.c, hashdb-write.h, hashdb.c,
28480      hashdb.h: Split Hashdb functions into separate read and write files.
28481
28482    * hashdb.c, hashdb.h: Provided option to switch between unsigned long and
28483      unsigned int for hashoffset_t.
28484
28485    * hashdb.c, hashdb.h: Changed offsets to be memory-mapped rather than read
28486      from file.
28487
28488    * Makefile, hashdb.c, hashdb.h, snapindex.c, gmapindex.c: Changing hash
28489      database to our own format.
28490
28491    * snapindex.c, gmapindex.c: Fixed bug where last oligo would not get stored.
28492
284932002-10-09  twu
28494
28495    * pair.c, segmentpos.c: Added missing header file for commafmt.
28496
28497    * Makefile: Revised object files needed for snapindex and snapgenerate.
28498
28499    * Makefile, accpos.c, add-chrpos-to-endpoints.c, block.c, block.h, cell.c,
28500      database.c, database.h, match.c, match.h, offsetdb.c, offsetdb.h,
28501      result.c, result.h, segmentpos.c, segmentpos.h, snap.c, snapgenerate.c,
28502      snapindex.c, gmap.c, gmapindex.c, stage1.c, stage1.h: Changed offset reads
28503      from a database to a structure read from a flat file.
28504
28505    * get-genome.c: Added -U flag to generate unmasked sequences.
28506
28507    * offset.c, offset.h: Renamed files from offset.* to offsetdb.*.
28508
28509    * snap.c, gmap.c: Implemented print_details.
28510
28511    * stage1.c, stage1.h: Implemented print_details.  Fixed problem where
28512      dominated bounds were not being eliminated.
28513
28514    * stage2.c: Increased the peelback to identify introns.  Added debugging
28515      statements.
28516
28517    * align.c: Fixed greediness for finding introns.  Removed gap penalty and
28518      reward for intron.  Instead, implemented a tie breaker for scores based on
28519      genomic distance.  Increased the peelback to identify introns.
28520
285212002-10-08  twu
28522
28523    * align.c, block.c, match.c, match.h, result.c, result.h, snap.c, gmap.c,
28524      stage1.c, stage1.h, stage2.c: Changed stage 1 of algorithm to find bounds
28525      using 5' and 3' hits.
28526
28527    * Makefile, pair.c, pair.h, stage2.c: Changed goodness to be differences of
28528      matches and mismatches.
28529
285302002-10-07  twu
28531
28532    * dynprog.c: Changed recursive functions of traceback and scoreback to be
28533      iterative.
28534
28535    * snap.c, gmap.c, stage2.c: Added check for large query gaps and avoided
28536      doing dynamic programming on those.  Also added check for allpaths being
28537      NULL from stage 2.
28538
28539    * align.c: Toggled DEBUG.
28540
28541    * Makefile, snap.c, gmap.c, stage2.c, stage2.h: Added ability to print
28542      alignment summaries only.
28543
28544    * pair.c: Ignored N's in computing percent identity.
28545
28546    * Makefile, pair.c, pair.h, stage2.c: Added number of exons to calculations
28547      and output.
28548
28549    * snap.c, gmap.c, stage2.c, stage2.h: Made alignment procedure the default.
28550      Now sorting paths based on the goodness of the alignment.
28551
28552    * pair.c, pair.h, stage2.c: Removed npairs from some parameter lists.
28553
28554    * pair.c, pair.h, stage2.c, stage2.h: Added calculation for goodness, based
28555      on percent identity.
28556
28557    * match.c, pair.c, pair.h, result.c, segmentpos.c, segmentpos.h, snap.c,
28558      gmap.c, stage2.c, stage2.h: Now printing endpoints based on alignments, if
28559      available.
28560
28561    * list.c: Fixed bug in List_last.
28562
28563    * list.c, list.h: Added a List_last procedure.
28564
28565    * Makefile, match.c, match.h, result.c, result.h, snap.c, gmap.c, stage1.c,
28566      stage1.h, stage2.c, stage2.h: Created a Stage2_T object and reorganized
28567      calculations, in preparation for using the alignments to rank the results.
28568
28569    * Makefile, snap.c, gmap.c: Added parameter for maxaligns, the maximum
28570      number of alignments to print.
28571
285722002-10-06  twu
28573
28574    * dynprog.c: Fixed read of unallocated hash.
28575
28576    * align.c: Fixed read of uninitialized variable.
28577
28578    * align.c, dynprog.c, pair.c, pair.h, stage2.c: Added ability to recognize
28579      introns in revcomp direction, and to print correct indices for Crick
28580      strand matches.
28581
28582    * match.c, result.c: Simplified use of zerobasedp.
28583
28584    * snap.c, gmap.c, stage1.c, stage1.h, stage2.c, stage2.h: Changed variable
28585      names to distinguish between hashsize and indexsize.
28586
28587    * dynprog.c, genome.c, stage2.c: Fixed errors with the sequence and genomic
28588      indices.
28589
28590    * stage1.c: Removed list reversal to match new scheme for doing stage 1
28591      dynamic programming.
28592
28593    * align.c, pair.c: Enhanced debugging information.
28594
28595    * align.c: Revised code to make sure that we don't pick unwanted paths after
28596      the first.  We set the usedp flags and recompute dynamic programming on
28597      subsequent rounds to avoid using those results.  This should affect only
28598      stage 1, because maxpaths equals 1 on stage2.
28599
28600    * align.c: Removed gappenalty for stage 1 computation.  This was causing
28601      problems with multiple paths for HER2.
28602
28603    * pair.c, pair.h, stage2.c: Added procedure for summary of exons.
28604
28605    * snap.c, gmap.c: Made printout slightly better.
28606
28607    * match.c, result.c: Fixed miscount on number of matches.
28608
28609    * pair.c: No change.
28610
28611    * genome.c, genome.h: Added modules to retrieve genome sequences.
28612
28613    * dynprog.c: Minor restructuring of procedures.
28614
28615    * dynprog.c: Fixed coordinates in gap.  Changed gap output for non-introns.
28616
28617    * pair.c: Added printing of rulers in alignments.
28618
28619    * stage2.c: Fixed memory leaks.
28620
286212002-10-05  twu
28622
28623    * dynprog.c: Fixed major problem in paired gap assessments.  Need to
28624      subtract, not add, the entry in the right matrix.
28625
28626    * stage2.c: Changed criteria for single and paired gaps, based on a minimum
28627      intron length.  Created special case for the 3' end.
28628
28629    * penalties.c: Changed middle gap penalties to have bigger opening and
28630      smaller extend penalties.
28631
28632    * dynprog.c, dynprog.h: Changed concepts from short and long gaps to single
28633      and paired gaps.
28634
28635    * stage2.c: Added peelback procedure to help identify correct intron.
28636      Otherwise, the greedy oligo matching procedure can mask the intron
28637      boundaries.
28638
28639    * dynprog.c: Fixed bug for traceback on longgap, where we didn't start from
28640      the lower right cell.
28641
28642    * align.c, stage2.c: Increased size of stage 2 oligos from 8 to 10.
28643
28644    * align.c, oligoindex.c, oligoindex.h, snap.c, gmap.c, stage2.c, stage2.h:
28645      Added ability to limit maxentries in stage 2.
28646
28647    * dynprog.c: Changed alignment character for dynamic programming to help
28648      with debugging.
28649
28650    * dynprog.c, stage2.c: Implemented dynamic programming across long gaps.
28651
28652    * dynprog.c: Reordered priorities in traceback to be (1) continue in same
28653      direction, (2) diagonal, (3) vertical, and (4) horizontal.
28654
28655    * dynprog.c, dynprog.h, penalties.c, penalties.h, stage2.c: Cleaned up
28656      dynamic programming code for the three cases of FIVE, MIDDLE, and THREE.
28657      Added stub for dynamic programming of long gaps.
28658
28659    * pair.c, pair.h, stage2.c: Made improvements to the alignment output.
28660
286612002-10-04  twu
28662
28663    * match.c, pair.c, pair.h, result.c, snap.c, gmap.c, stage2.c: Added
28664      improvements to the alignment output.
28665
28666    * stage2.c: Added code to handle the 5' end properly.
28667
28668    * penalties.c: Changed some values for the penalty parameters.
28669
28670    * dynprog.c: Changed opening penalties to not include the extension.  Added
28671      special procedures for 5' and 3' ends of sequence, essentially
28672      implementing part of Smith-Waterman on each end.  Added special cases in
28673      traceback for 5' and 3' ends, but may not be necessary in light of the
28674      other changes.
28675
28676    * stage2.c: Added querypos and genomepos to the Pair object.  Reorganized
28677      various functions.
28678
28679    * pair.c, pair.h: Added querypos and genomepos to the Pair object.
28680
28681    * reader.h: Added another option to cDNAEnd_T.
28682
28683    * align.c: Fixed the precise bounds around an intron.
28684
28685    * Makefile, dynprog.c, dynprog.h, penalties.c, penalties.h: Added penalties
28686      object.  Provided ability to specify different penalties for left, middle,
28687      and right part of sequence.
28688
28689    * stage2.c: Moved printing procedure to another file.  Fixed small bug that
28690      caused us to miss printing a base.
28691
28692    * pair.c, pair.h: Removed printing of loci names from alignment.
28693
28694    * Makefile, align.c, dynprog.c, dynprog.h, pair.c, stage2.c: Added dynamic
28695      programming routine to take care of small gaps.
28696
28697    * Makefile, align.c, align.h, match.c, match.h, matrix.c, matrix.h,
28698      oligoindex.c, oligoindex.h, pair.c, pair.h, path.c, path.h, penalties.c,
28699      penalties.h, reader.c, reader.h, result.c, result.h, snap.c, gmap.c,
28700      stage1.c, stage1.h, stage2.c, stage2.h: Major change to algorithm to have
28701      two stages: one using hash table (24-mers) and another using an index
28702      table (8-mers).  Still need to incorporate a dynamic programming step for
28703      gaps in the final alignment.
28704
287052002-10-02  twu
28706
28707    * whats_on: Changed program to work with new data directory for alignment
28708      results.
28709
287102002-10-01  twu
28711
28712    * snap.c, gmap.c: Fixed problem where intronlen == 0.  Now requiring
28713      intronlen > 0.  Added extra carriage return when zero paths found.
28714
287152002-09-27  twu
28716
28717    * Makefile: Reduced number of object files used in SNAP.
28718
28719    * get-genome.c: Fixed use of fscanf to match the .chromosome and .contig
28720      file format.
28721
28722    * path.c, path.h: Simplified call to Path_compute to eliminate scoremat.
28723
28724    * penalties.c, penalties.h: Added procedure to create a default penalties
28725      object.
28726
28727    * match.c, result.c: Added line for number of matches.
28728
28729    * snap.c, gmap.c: Fixed bug where resultlist was uninitialized.  Allowed
28730      resultstring of 0.  Simplified call to Path_compute.
28731
28732    * whats_on: Added -R flag for release number.
28733
287342002-09-25  twu
28735
28736    * intlist.c, intlist.h, scoremat.c, scoremat.h: No longer need Intlist_T or
28737      Scoremat_T.
28738
28739    * Makefile, path.c, path.h, snap.c, gmap.c: Removing Sequence_T.  Using char
28740      * instead to represent sequences.
28741
28742    * reader.c, reader.h: Added Reader_pointer function.
28743
28744    * penalties.c, intlist.c, matrix.c, scoremat.c: Using CALLOC/FREE macros.
28745
28746    * snap.c, gmap.c: Inadvertent commit.  Adding routines to perform
28747      nucleotide-level dynamic programming.
28748
28749    * ring.c, ring.h: Removed Ring_T.  Apparently not used by other seqalign
28750      files.
28751
28752    * path.c, path.h: Premature commit.  Adding routines to analyze only
28753      submatrices.
28754
28755    * Makefile: Adding files from seqalign.
28756
28757    * intlist.c, intlist.h: Added files from seqalign to do nucleotide-level
28758      dynamic programming.
28759
28760    * get-genome.c: Added flag for release string.  Changed type of positions
28761      from long to unsigned int.
28762
28763    * offset.c, offset.h, offsetdb.c, offsetdb.h: Added datadir to
28764      Offset_read_file.
28765
28766    * match.c, match.h, result.c, result.h: Added Result_path command.
28767
28768    * snapgenerate.c, snapindex.c, gmapindex.c: Simplified strcpy/strcat calls
28769      to sprintf.
28770
28771    * matrix.c, matrix.h, penalties.c, penalties.h, ring.c, ring.h, scoremat.c,
28772      scoremat.h, path.c, path.h: Added to program for doing nucleotide-level
28773      dynamic programming.  Taken from seqalign.
28774
28775    * path.c: Inadvertent commit.  Still editing.
28776
287772002-09-24  twu
28778
28779    * segmentpos.c, segmentpos.h, snapindex.c, gmapindex.c: Added
28780      superaccessions to accsegmentpos_db.
28781
287822002-09-23  twu
28783
28784    * radixsort.c: Fixed syntax error when monitoring is turned off.
28785
28786    * Makefile, radixsort.c: Added monitoring routine for radix sort.
28787
287882002-09-19  twu
28789
28790    * snapgenerate.c: Removed debug line.
28791
28792    * snapindex.c, gmapindex.c: Added option for using lowercase characters.
28793
287942002-09-18  twu
28795
28796    * Makefile, snapgenerate.c: Added program snapgenerate, to create text
28797      .chromosome, .contig, and .chromosome files.
28798
28799    * snapindex.c, gmapindex.c: Clarified the variable auxfile.
28800
28801    * snapindex.c, gmapindex.c: Clarified the variable dbroot.
28802
288032002-09-17  twu
28804
28805    * oligo.c: Made comment to explain Third Degree warning.
28806
28807    * Makefile, block.c, block.h, match.c, match.h, reqpost.c, reqpost.h,
28808      result.c, result.h, snap.c, gmap.c: Made changes to sample query sequence
28809      at a test interval and perform dynamic programming.
28810
288112002-09-16  twu
28812
28813    * endpoints.c, endpoints.h: Removed from source.
28814
288152002-09-12  twu
28816
28817    * block.c: Changed debug flag.
28818
28819    * Makefile: Changed C compiler flags.
28820
28821    * snap.c, gmap.c: Changed default directory to be in /usr/seqdb2_nb.
28822
288232002-08-30  twu
28824
28825    * endpoints.c, match.c, match.h, result.c, result.h, snap.c, gmap.c: Made
28826      changes to facilitate garbage collection, including adding a matchedp flag
28827      to results, and putting singleton results into an endpoint.
28828
28829    * block.c: Changed debug messages.
28830
28831    * endpoints.c, endpoints.h: Changed print routine.  Added code for query
28832      length.
28833
28834    * snap.c, gmap.c: Added consolidation of endpoints, and ranking of those to
28835      generate a single result.
28836
288372002-08-29  twu
28838
28839    * block.c, match.c, oligo.c, result.c, segmentpos.c: Added debug macros.
28840
28841    * endpoints.c, endpoints.h: Added commands for sorting endpoints and testing
28842      for adjacency.
28843
28844    * reader.c: Fixed test when startptr == endptr.
28845
28846    * snap.c, gmap.c: Implemented divide-and-conquer strategy on query sequence.
28847
288482002-08-28  twu
28849
28850    * snapindex.c, gmapindex.c: Turned off printing of subaccession messages.
28851
288522002-08-22  twu
28853
28854    * snapindex.c, gmapindex.c: Added timing statistics.
28855
28856    * block.c, oligo.c: Fixed coordinate calculations.  May need to check.
28857
28858    * snapindex.c, gmapindex.c: Added dump function.
28859
28860    * radixsort.c, radixsort.h: Changed accessor function to get a character
28861      rather than a pointer.  Fixed algorithm for case where byte equals strlen.
28862
28863    * cell.c, cell.h: Changed accessor function to get a character rather than a
28864      pointer.
28865
28866    * Makefile, rsort-check.c, rsort-test.c: Added a test and check routine for
28867      radixsort.
28868
28869    * Makefile, snapindex.c, gmapindex.c: Removed unnecessary files for
28870      snapindex.
28871
288722002-08-21  twu
28873
28874    * Makefile, block.c, block.h, endpoints.c, endpoints.h, match.c, match.h,
28875      offset.c, offsetdb.c, oligo.c, oligo.h, readcirc.c, reader.c, reader.h,
28876      request.h, result.c, result.h, snap.c, gmap.c: Major change to implement
28877      divide-and-conquer strategy.
28878
28879    * read.c, read.h: Changed name of file from read.c to readcirc.c.
28880
28881    * block.c, block.h, endpoints.c: Partial changes to implement
28882      divide-and-conquer strategy.
28883
28884    * snapindex.c, gmapindex.c: Improved diagnostic messages.
28885
28886    * Makefile, offset.c, offsetdb.c, radixsort.c, radixsort.h, read.c,
28887      readcirc.c, segmentpos.c, snapindex.c, gmapindex.c: Fixed minor compiler
28888      warnings.
28889
28890    * cell.c: Using pointers rather than lists to store multiple positions for
28891      an oligo.  Fixed quicksort compare function accordingly.
28892
28893    * snapindex.c, gmapindex.c: Using pointers rather than lists to store
28894      multiple positions for an oligo.
28895
28896    * radixsort.c: Added small speed hacks.
28897
28898    * Makefile: Added quicksort as an option.
28899
28900    * Makefile, cell.c, cell.h, radixsort.c, radixsort.h, snapindex.c,
28901      gmapindex.c: Added radix sort as a replacement for quicksort.
28902
289032002-08-20  twu
28904
28905    * oligo.c: Fixed key_size for partial bytes.
28906
28907    * snapindex.c, gmapindex.c: Changed location of oligo file to be in dbenv
28908      directory, not a subdirectory.
28909
289102002-08-15  twu
28911
28912    * Makefile, block.c, block.h, oligo.c, oligo.h, request.c, request.h,
28913      snap-withenv.c, snap.c, snapindex.c, gmap.c, gmapindex.c: Made changes to
28914      accommodate sizes less than 32-mers.
28915
28916    * match.c, match.h, offset.c, offset.h, offsetdb.c, offsetdb.h, result.c,
28917      result.h, snap.c, gmap.c: Added ability to read chromosome information
28918      from file, but not done by default right now.
28919
289202002-08-11  twu
28921
28922    * Makefile, block.c, block.h, dpentry.c, dpentry.h, endpoints.c,
28923      endpoints.h, match.c, match.h, result.c, result.h, snap.c, gmap.c: Changed
28924      algorithm to work inward from both ends and find a single match.
28925
289262002-08-10  twu
28927
28928    * Makefile, snapindex.c, gmapindex.c: Fixed program to handle cases where
28929      interval is less than size.
28930
289312002-07-19  twu
28932
28933    * segmentpos.c: Changed type of querylen.
28934
28935    * whats_on: Changed suffix for db filenames.
28936
28937    * iit-read.c, iit.c, interval-read.c, interval-read.h, interval.c,
28938      interval.h: Changed binary storage format to be a single file.
28939
28940    * get-genome.c: Changed input format to accept a single string.
28941
289422002-07-13  twu
28943
28944    * iit_get.c: Added ability to query symbolic db.
28945
289462002-07-12  twu
28947
28948    * iit_get.c: Fixed bugs in the algorithm.
28949
28950    * iit_get.c: Allowed user to specify a single point, rather than an interval.
28951
28952    * iit_store.c: Added an output message when Berkeley DB file is done.
28953
28954    * iit_get.c, iit_store.c: Integrated interval tree into
28955      db_load/retrieve_endpoints.
28956
28957    * Makefile, basic.h, iit-read.c, iit.c, interval-read.c, interval-read.h,
28958      interval.c, interval.h: Rewrote interval tree to handle interval queries
28959      and to write tree to and read tree from files.
28960
289612002-07-11  twu
28962
28963    * basic.h, iit-read.c, iit.c: Added code for integer interval trees from
28964      Edelsbrunner's alpha shapes.
28965
28966    * get-genome.c: Added ability to convert a single coordinate.
28967
28968    * prb.c: Fixed minor typos.
28969
28970    * prb.c, prb.h: Revised format and separated interface from implementation.
28971
28972    * prb.c, prb.h: Added routines for red-black trees with parent pointers from
28973      libavl 2.0
28974
289752002-07-10  twu
28976
28977    * dpentry.c: Changed criterion to consider query coverage.
28978
28979    * endpoints.c: Fixed problem with negative relative positions.
28980
289812002-07-09  twu
28982
28983    * endpoints.c, endpoints.h, snap.c, gmap.c: Revised output format of SNAP.
28984      Coordinates are now given for each accession.
28985
28986    * offset.c, offset.h, offsetdb.c, offsetdb.h, snap.c, gmap.c: Revised
28987      chromosome dump procedure to print lengths as well as offsets.
28988
28989    * Makefile, add-chrpos-to-endpoints.c: Added program for adding chromosomal
28990      position to endpoints.
28991
28992    * whats_on: Modified program to work with new version of SNAP.
28993
289942002-07-08  twu
28995
28996    * dpentry.c, match.c, match.h, result.c, result.h, snap.c, gmap.c: Restored
28997      nleads as a criterion in dynamic programming.  Added features to help with
28998      debugging.
28999
29000    * Makefile, get-genome.c: Freed get-genome from using BerkeleyDB databases,
29001      which are too slow to open.
29002
29003    * get-genome.c: Added ability to report coordinates.
29004
29005    * iit_get.c: Added check for zero matches.
29006
29007    * whats_on: Preliminary changes (inadvertent checkin).
29008
29009    * Makefile, get-genome.c: Created get-genome program.
29010
29011    * accpos.c: Added offset for chromosomes.
29012
29013    * accpos.c, database.c, database.h, snap.c, snapindex.c, gmap.c,
29014      gmapindex.c: Changed location of data files.
29015
290162002-07-07  twu
29017
29018    * iit_get.c: Fixed bug with testing dumpp.
29019
290202002-07-06  twu
29021
29022    * endpoints.c, endpoints.h, match.c, result.c: Added check for boomerang
29023      paths, where the genomic length is 0.
29024
290252002-07-05  twu
29026
29027    * whats_on: Added whats_on from ../snap.
29028
29029    * spidey_compress.pl: Added meta-level compression.
29030
29031    * spidey_compress.pl: Added spidey_compress.pl from ../snap.
29032
29033    * iit_get.c: Added dump utility.
29034
29035    * iit_get.c, iit_store.c: Added programs for storing and retrieving records
29036      based on endpoints.
29037
290382002-07-04  twu
29039
29040    * sim4_uncompress.pl: Added retrieval function for get-genome.
29041
290422002-07-03  twu
29043
29044    * sim4_compress.pl, sim4_uncompress.pl: Added further compression by
29045      counting repeated tokens.
29046
29047    * sim4_compress.pl, sim4_uncompress.pl, util: Added sim4
29048      compression/uncompression routines from snap CVS archive.
29049
29050    * dpentry.c, match.c, result.c: Changed from using slopes (quotients) to
29051      intron measurements (differences).
29052
29053    * endpoints.c, snap.c, gmap.c: Changed output slightly, e.g., en-dash for
29054      number ranges.
29055
29056    * Makefile, accpos.c: Created program accpos, for finding genomic position
29057      of accessions.
29058
29059    * segmentpos.c, segmentpos.h: Added procedure for finding partially matching
29060      accessions.
29061
29062    * snapindex.c, gmapindex.c: Made creation of aux-only database faster.
29063
29064    * snap.c, gmap.c: Fixed small bug in error message.
29065
29066    * segmentpos.c, segmentpos.h: Made Segmentpos_print extern.
29067
29068    * database.c: Changed accsegmentpos_db from hash to B-tree.
29069
29070    * database.c, database.h, snap.c, snapindex.c, gmap.c, gmapindex.c: Merged
29071      two database procedures.
29072
29073    * segmentpos.c, segmentpos.h: Added procedure for reading from
29074      accsegmentpos_db.
29075
29076    * Makefile, cell.c, cell.h, database.c, database.h, endpoints.c,
29077      endpoints.h, segmentpos.c, segmentpos.h, snap.c, snapindex.c, gmap.c,
29078      gmapindex.c: Added another database, from accession name to segmentpos,
29079      and renamed databases.
29080
290812002-07-02  twu
29082
29083    * Makefile, block.c, block.h, snap.c, gmap.c: Added specification for
29084      minimum separation between leads.
29085
29086    * Makefile: Removed segmentpos dump flag from db.test
29087
29088    * endpoints.c, endpoints.h, snap.c, gmap.c: Changed to 1-based coordinates
29089      as default.
29090
29091    * snapindex.c, gmapindex.c: Removed segment dump, because it can be
29092      performed by snap.
29093
29094    * snap.c, gmap.c: Added several command-line options.
29095
29096    * segmentpos.c, segmentpos.h: Enhanced dump procedure to report absolute
29097      genomic positions.
29098
29099    * match.c, match.h, result.c, result.h: Storing signed genome_coverage into
29100      dpentry and checking for impossible slopes (< 0.9).
29101
29102    * offset.c, offset.h, offsetdb.c, offsetdb.h: Added dump procedure.
29103
29104    * endpoints.c: Added printing of subaccessions for Celera genome.  Added
29105      commas to output of positions.  Changed dominated function to look for any
29106      overlap instead of complete coverage.
29107
29108    * Makefile, dpentry.c, dpentry.h: Changed comparison function to use slopes.
29109
29110    * chrnum.c: Added check for uninitialized chromosome.
29111
291122002-07-01  twu
29113
29114    * block.h, buffer-thread-attempt.c, buffer-thread-attempt.h, buffer.c,
29115      buffer.h, dbentry.c, dbentry.h, entry.c, entry.h, hits.c, hits.h, oligo.c,
29116      sort.c, sort.h, table.c, table.h: Removed unused files.
29117
29118    * Makefile, block.c, block.h, database.c, database.h, endpoints.c,
29119      endpoints.h, hash-oligos.c, hit.c, hit.h, match.c, match.h, oligo.c,
29120      oligo.h, request.c, request.h, result.c, result.h, scan.c, scan.h,
29121      segmentpos.c, snap.c, snapindex.c, gmap.c, gmapindex.c: Changed oligo_db
29122      from BerkeleyDB to PureDB.  Created object for endpoints.  Removed unused
29123      files.
29124
291252002-06-29  twu
29126
29127    * segmentpos.c, segmentpos.h, snap.c, gmap.c: Added genomic position to the
29128      output.
29129
29130    * match.c, result.c, snap.c, gmap.c: Fixed memory leaks.
29131
29132    * block.c, block.h, dpentry.c, dpentry.h, match.c, match.h, request.c,
29133      request.h, result.c, result.h, snap.c, gmap.c: Added minimum spanning
29134      tree.  Version appears to work well.
29135
29136    * Makefile, block.c, block.h, dpentry.c, dpentry.h, match.c, match.h,
29137      request.c, request.h, result.c, result.h, snap.c, gmap.c: Early version of
29138      dynamic programming that stores H best paths at each hit.
29139
291402002-06-28  twu
29141
29142    * match.c, match.h, result.c, result.h, segmentpos.c, snap.c, gmap.c: Added
29143      simple dynamic programming and best pair techniques.
29144
29145    * Makefile, block.c, block.h, database.c, database.h, match.c, match.h,
29146      oligo.c, oligo.h, reqpost.c, request.c, request.h, result.c, result.h,
29147      snap.c, snapindex.c, gmap.c, gmapindex.c: Implemented working version of
29148      snap that uses multiple oligo_dbs with requests and strings results
29149      together from 5' and 3' ends.
29150
29151    * commafmt.c, commafmt.h: Added source code for adding commas to numbers.
29152
291532002-06-25  twu
29154
29155    * Makefile: Added specification of directory for dbenv.
29156
29157    * database.c, snapindex.c, gmapindex.c: Added provisions for transactions,
29158      to try to speed up build of database.
29159
291602002-05-29  twu
29161
29162    * snapindex.c, gmapindex.c: Allowed the user to specify a directory for the
29163      BerkeleyDB environment.
29164
291652002-05-27  twu
29166
29167    * Makefile, segmentpos.c, snapindex.c, gmapindex.c: Added specification of
29168      segmentfile as flag -g.
29169
29170    * Makefile, database.c, database.h, snapindex.c, gmapindex.c: Removed
29171      genome_db and delta_db from snapindex.
29172
29173    * Makefile, segmentpos.c, segmentpos.h, snapindex.c, gmapindex.c: Added
29174      ability to dump segments (in order) from segmentpos_db.
29175
29176    * Makefile: Changed flags for C compiler.
29177
29178    * segmentpos.c: Added check to get previous segment only in some cases.
29179
291802002-05-22  twu
29181
29182    * Makefile, oligo.c, oligo.h, read.c, read.h, readcirc.c, readcirc.h,
29183      scan.c, scan.h, segmentpos.c, segmentpos.h, snap.c, gmap.c: Working
29184      version of snap using a scan of genomic and delta information.
29185
291862002-05-21  twu
29187
29188    * Makefile, cell.c, cell.h, database.c, database.h, hit.c, hit.h, offset.c,
29189      offset.h, offsetdb.c, offsetdb.h, oligo.c, oligo.h, scan.c, scan.h,
29190      snap.c, snapindex.c, gmap.c, gmapindex.c, table.c, table.h: Made changes
29191      to store delta position of genomic oligos and to store oligos of query
29192      sequence.
29193
291942002-05-08  twu
29195
29196    * Makefile, database.c, database.h, hash-oligos.c, snap.c, snapindex.c,
29197      gmap.c, gmapindex.c: Consolidated sample-oligos and hash-oligos into
29198      snapindex.  Specified oligo dbtype by using Berkeley DB constants.
29199
292002002-05-03  twu
29201
29202    * Makefile, cell.c, cell.h, database.c, database.h, hit.c, read.c,
29203      readcirc.c, sample-oligos.c: Separated database commands for oligos from
29204      the other database (aux).
29205
292062002-04-26  twu
29207
29208    * Makefile, hit.c, oligo.c, read.c, readcirc.c, snap.c, gmap.c: Removed
29209      environment.  Began implementation of dynamic programming.
29210
29211    * Makefile, chrnum.c, chrnum.h, database.c, database.h, hash-oligos.c,
29212      hit.c, hit.h, offset.c, offset.h, offsetdb.c, offsetdb.h, oligo.c,
29213      oligo.h, read.c, read.h, readcirc.c, readcirc.h, snap.c, gmap.c:
29214      Re-implementation of SNAP using new database created by hash-oligos.
29215
292162002-04-24  twu
29217
29218    * segmentpos.c, segmentpos.h: Handled problems with chromosome string to
29219      integer conversions.
29220
29221    * hash-oligos.c: Handled problems with chromosome string to integer
29222      conversions.  Rearranged calls to db->open so that each db is opened only
29223      once.
29224
29225    * Makefile, btree.c, btree.h, hash-oligos.c, hash.c, hash.h, oligo.c,
29226      oligo.h: Consolidated code into fewer files.
29227
29228    * Makefile: Changed CFLAGS to optimize speed.
29229
29230    * cell.h: Matched up .h file with .c file.
29231
29232    * Makefile, genomicpos.c, genomicpos.h, hash-oligos.c, segmentpos.c,
29233      segmentpos.h: Now storing genomic locations as global positions, which
29234      require keeping track of chromosomal offsets.
29235
29236    * Makefile, btree.c, btree.h, cell.c, cell.h, entry.c, entry.h,
29237      genomicpos.c, genomicpos.h, hash-oligos.c, sample-oligos.c: Major change
29238      to allow B-trees, to avoid storing adjacent oligos, to store genomic
29239      positions, and to write oligos in binary format.
29240
292412002-04-22  twu
29242
29243    * Makefile, assert.c, assert.h, block.c, block.h, bool.h,
29244      buffer-thread-attempt.c, buffer-thread-attempt.h, buffer.c, buffer.h,
29245      cksum-fa.c, cksum.c, dbentry.c, dbentry.h, entry.c, entry.h, except.c,
29246      except.h, hash-oligos.c, hash-test.c, hash.c, hash.h, hits.c, hits.h,
29247      list.c, list.h, match.c, match.h, mem.c, mem.h, oligo.c, oligo.h, read.c,
29248      read.h, readcirc.c, readcirc.h, reqpost.c, reqpost.h, request.c,
29249      request.h, result.c, result.h, sample-oligos.c, snap-withenv.c, snap.c,
29250      sort.c, sort.h, src, gmap.c: Initial import into CVS.
29251
292522000-05-08  paf
29253
29254    * config, cvswrappers, loginfo, modules, CVSROOT, checkoutlist, commitinfo,
29255      editinfo, notify, rcsinfo, taginfo, verifymsg: initial checkin
29256
292572000-05-08  (no author)
29258
29259    * branches, tags, trunk: Standard project directories initialized by cvs2svn.