| Name | | Date | Size | #Lines | LOC |
| COPYRIGHT | H A D | 03-May-2022 | 605 | 18 | 13 |
| FileDlog.c | H A D | 03-May-2022 | 4.3 KiB | 213 | 182 |
| Makefile | H A D | 03-May-2022 | 8.4 KiB | 277 | 170 |
| README.versions | H A D | 03-May-2022 | 1.1 KiB | 69 | 36 |
| aacomp.c | H A D | 03-May-2022 | 2.4 KiB | 123 | 87 |
| aamap.gbl | H A D | 03-May-2022 | 422 | 15 | 9 |
| ag_stats.c | H A D | 03-May-2022 | 4.3 KiB | 195 | 155 |
| align.1 | H A D | 03-May-2022 | 3.4 KiB | 137 | 131 |
| align.c | H A D | 03-May-2022 | 8.8 KiB | 378 | 308 |
| alt_parms.h | H A D | 03-May-2022 | 10.1 KiB | 379 | 350 |
| altlib.h | H A D | 03-May-2022 | 1.1 KiB | 55 | 48 |
| apam.c | H A D | 03-May-2022 | 4.3 KiB | 181 | 141 |
| bestscor.c | H A D | 03-May-2022 | 2.7 KiB | 144 | 114 |
| blosum50.mat | H A D | 03-May-2022 | 2.1 KiB | 32 | 31 |
| bovgh.seq | H A D | 03-May-2022 | 2.5 KiB | 39 | 38 |
| bovprl.rev | H A D | 03-May-2022 | 920 | 14 | 13 |
| bovprl.seq | H A D | 03-May-2022 | 986 | 18 | 17 |
| checkevent.c | H A D | 03-May-2022 | 3.3 KiB | 187 | 150 |
| chofas | H A D | 03-May-2022 | 1.7 KiB | | |
| chofas.c | H A D | 03-May-2022 | 8.1 KiB | 346 | 286 |
| chofas.htm | H A D | 03-May-2022 | 1.9 KiB | 66 | 52 |
| codaa.mat | H A D | 03-May-2022 | 1,012 | 31 | 29 |
| crandseq.c | H A D | 03-May-2022 | 8 KiB | 404 | 338 |
| crck.c | H A D | 03-May-2022 | 745 | 28 | 16 |
| dna.mat | H A D | 03-May-2022 | 489 | 23 | 21 |
| ecoli.comp | H A D | 03-May-2022 | 658 | 27 | 26 |
| egmsmg.aa | H A D | 03-May-2022 | 1.3 KiB | 20 | 19 |
| extractp.c | H A D | 03-May-2022 | 7.5 KiB | 367 | 309 |
| f_band.c | H A D | 03-May-2022 | 2.3 KiB | 118 | 84 |
| faatran.c | H A D | 03-May-2022 | 2.8 KiB | 143 | 116 |
| fasta.1 | H A D | 03-May-2022 | 13.3 KiB | 493 | 474 |
| fasta.rsp | H A D | 03-May-2022 | 72 | 2 | 1 |
| fasta20.doc | H A D | 03-May-2022 | 47.8 KiB | 1,131 | 895 |
| fasta20.me | H A D | 03-May-2022 | 47.8 KiB | 1,103 | 1,086 |
| fastx.rsp | H A D | 03-May-2022 | 67 | 2 | 1 |
| fffasta.c | H A D | 03-May-2022 | 75.9 KiB | 3,237 | 2,635 |
| find.gbl | H A D | 03-May-2022 | 680 | 36 | 26 |
| fldispn.c | H A D | 03-May-2022 | 7.1 KiB | 409 | 345 |
| format.doc | H A D | 03-May-2022 | 4.1 KiB | 119 | 84 |
| fromgb.c | H A D | 03-May-2022 | 4.3 KiB | 150 | 101 |
| g_band.c | H A D | 03-May-2022 | 10 KiB | 474 | 386 |
| garnier.c | H A D | 03-May-2022 | 5 KiB | 207 | 169 |
| garnier.cgi | H A D | 03-May-2022 | 2.1 KiB | 112 | 67 |
| garnier.h | H A D | 03-May-2022 | 5.7 KiB | 95 | 90 |
| garnier.htm | H A D | 03-May-2022 | 1.9 KiB | 67 | 52 |
| gen.rsp | H A D | 03-May-2022 | 40 | 2 | 1 |
| getenv.c | H A D | 03-May-2022 | 1 KiB | 57 | 48 |
| getopt.c | H A D | 03-May-2022 | 1.2 KiB | 62 | 57 |
| gonnet.mat | H A D | 03-May-2022 | 952 | 29 | 27 |
| grease.c | H A D | 03-May-2022 | 2.7 KiB | 135 | 107 |
| grease.htm | H A D | 03-May-2022 | 1.6 KiB | 60 | 51 |
| idnaa.mat | H A D | 03-May-2022 | 1,019 | 31 | 29 |
| idpaa.mat | H A D | 03-May-2022 | 836 | 28 | 27 |
| jtr_gst.aa | H A D | 03-May-2022 | 508 | 26 | 25 |
| l_band.c | H A D | 03-May-2022 | 9.1 KiB | 437 | 365 |
| laacomp.c | H A D | 03-May-2022 | 2.7 KiB | 138 | 97 |
| lalign.1 | H A D | 03-May-2022 | 5.7 KiB | 185 | 179 |
| lalign.cgi | H A D | 03-May-2022 | 3.2 KiB | 154 | 94 |
| lalign.htm | H A D | 03-May-2022 | 2.7 KiB | 100 | 81 |
| lalign2.c | H A D | 03-May-2022 | 14.1 KiB | 595 | 492 |
| lcbo.aa | H A D | 03-May-2022 | 271 | 6 | 5 |
| lcbo.vms | H A D | 03-May-2022 | 272 | 7 | 6 |
| lfasta.rsp | H A D | 03-May-2022 | 64 | 2 | 1 |
| llmax.c | H A D | 03-May-2022 | 7.3 KiB | 337 | 271 |
| llmax0.c | H A D | 03-May-2022 | 8.9 KiB | 423 | 361 |
| lsim2.c | H A D | 03-May-2022 | 26.2 KiB | 1,001 | 841 |
| lsim3.c | H A D | 03-May-2022 | 25 KiB | 1,047 | 841 |
| lx_align3.c | H A D | 03-May-2022 | 17.1 KiB | 661 | 520 |
| lx_band2.c | H A D | 03-May-2022 | 4 KiB | 164 | 136 |
| make_vms.com | H A D | 03-May-2022 | 5.1 KiB | 113 | 111 |
| makefile.32 | H A D | 03-May-2022 | 6.3 KiB | 196 | 117 |
| makefile.tc | H A D | 03-May-2022 | 6.6 KiB | 204 | 122 |
| makefile.unx | H A D | 03-May-2022 | 7.3 KiB | 257 | 170 |
| mchu.aa | H A D | 03-May-2022 | 213 | 5 | 4 |
| mgstm1.aa | H A D | 03-May-2022 | 264 | 9 | 8 |
| mgstm1.e05 | H A D | 03-May-2022 | 1.2 KiB | 21 | 20 |
| mgstm1.eeq | H A D | 03-May-2022 | 1.1 KiB | 21 | 20 |
| mgstm1.esq | H A D | 03-May-2022 | 1.1 KiB | 21 | 20 |
| mgstm1.ran | H A D | 03-May-2022 | 7.4 KiB | 121 | 120 |
| mgstm1.rev | H A D | 03-May-2022 | 1.1 KiB | 17 | 16 |
| mgstm1.rsq | H A D | 03-May-2022 | 1.2 KiB | 21 | 20 |
| mgstm1.rsq2 | H A D | 03-May-2022 | 1.2 KiB | 21 | 20 |
| mgstm1.seq | H A D | 03-May-2022 | 1.1 KiB | 21 | 20 |
| mplotsub.c | H A D | 03-May-2022 | 3.1 KiB | 168 | 131 |
| mstm1.ssq | H A D | 03-May-2022 | 1.1 KiB | 21 | 20 |
| mtdispn.c | H A D | 03-May-2022 | 7.9 KiB | 413 | 324 |
| musplfm.aa | H A D | 03-May-2022 | 272 | 7 | 6 |
| mwkw.aa | H A D | 03-May-2022 | 2 KiB | 32 | 31 |
| mwrtc1.aa | H A D | 03-May-2022 | 500 | 9 | 8 |
| ncbl_head.h | H A D | 03-May-2022 | 948 | 31 | 21 |
| ncbl_lib.c | H A D | 03-May-2022 | 11.9 KiB | 457 | 379 |
| ndispn.c | H A D | 03-May-2022 | 7.9 KiB | 343 | 290 |
| nrand.c | H A D | 03-May-2022 | 497 | 37 | 30 |
| nrand48.c | H A D | 03-May-2022 | 386 | 27 | 20 |
| nrandom.c | H A D | 03-May-2022 | 368 | 26 | 19 |
| ntcomp.c | H A D | 03-May-2022 | 2.2 KiB | 119 | 84 |
| nxgetaa.c | H A D | 03-May-2022 | 31.7 KiB | 1,459 | 1,226 |
| oohu.aa | H A D | 03-May-2022 | 378 | 7 | 6 |
| oohu.raa | H A D | 03-May-2022 | 401 | 8 | 7 |
| pam.c | H A D | 03-May-2022 | 2.8 KiB | 141 | 102 |
| pam120.mat | H A D | 03-May-2022 | 1 KiB | 30 | 29 |
| pam250.mat | H A D | 03-May-2022 | 1 KiB | 30 | 29 |
| pgrease.cgi | H A D | 03-May-2022 | 3.3 KiB | 136 | 83 |
| plalign.cgi | H A D | 03-May-2022 | 4.3 KiB | 190 | 121 |
| plalign.htm | H A D | 03-May-2022 | 2.6 KiB | 97 | 86 |
| pll.rsp | H A D | 03-May-2022 | 58 | 2 | 1 |
| plotsub.c | H A D | 03-May-2022 | 1.9 KiB | 110 | 93 |
| prdf.1 | H A D | 03-May-2022 | 3.4 KiB | 118 | 115 |
| prdf.c | H A D | 03-May-2022 | 29.7 KiB | 1,261 | 1,020 |
| prot.mat | H A D | 03-May-2022 | 1 KiB | 32 | 29 |
| prss.1 | H A D | 03-May-2022 | 3.4 KiB | 113 | 110 |
| prss.c | H A D | 03-May-2022 | 15.1 KiB | 668 | 550 |
| ps_dispn.c | H A D | 03-May-2022 | 9 KiB | 434 | 351 |
| ps_plotsub.c | H A D | 03-May-2022 | 2.8 KiB | 141 | 115 |
| pscore.c | H A D | 03-May-2022 | 2.3 KiB | 120 | 96 |
| qrhuld.aa | H A D | 03-May-2022 | 914 | 16 | 15 |
| qsubs.c | H A D | 03-May-2022 | 554 | 31 | 22 |
| qsubs.h | H A D | 03-May-2022 | 162 | 12 | 8 |
| randlib.c | H A D | 03-May-2022 | 8.8 KiB | 411 | 330 |
| randseq.1 | H A D | 03-May-2022 | 958 | 46 | 42 |
| randseq.c | H A D | 03-May-2022 | 9.3 KiB | 452 | 375 |
| randtest.c | H A D | 03-May-2022 | 239 | 15 | 9 |
| readme.v15 | H A D | 03-May-2022 | 3 KiB | 63 | 52 |
| readme.v16 | H A D | 03-May-2022 | 5.2 KiB | 132 | 101 |
| readme.v17 | H A D | 03-May-2022 | 504 | 16 | 11 |
| readme.v20 | H A D | 03-May-2022 | 3.7 KiB | 130 | 81 |
| readme.v20u4 | H A D | 03-May-2022 | 5.8 KiB | 202 | 178 |
| readme.v20u5 | H A D | 03-May-2022 | 1.7 KiB | 57 | 37 |
| readme.v20u6 | H A D | 03-May-2022 | 2.5 KiB | 93 | 55 |
| readme.v21u0 | H A D | 03-May-2022 | 1.6 KiB | 56 | 34 |
| relate.c | H A D | 03-May-2022 | 8.3 KiB | 388 | 317 |
| release.v16 | H A D | 03-May-2022 | 1.3 KiB | 50 | 30 |
| release.v17 | H A D | 03-May-2022 | 726 | 29 | 16 |
| res_stats.c | H A D | 03-May-2022 | 11.7 KiB | 534 | 458 |
| revcomp.c | H A D | 03-May-2022 | 2.8 KiB | 133 | 99 |
| rweibull.c | H A D | 03-May-2022 | 4.6 KiB | 186 | 114 |
| scalesw2.c | H A D | 03-May-2022 | 16 KiB | 689 | 481 |
| scalesws.c | H A D | 03-May-2022 | 15.9 KiB | 686 | 479 |
| score_al.c | H A D | 03-May-2022 | 9.4 KiB | 415 | 338 |
| simlib.h | H A D | 03-May-2022 | 1.2 KiB | 44 | 28 |
| sindex.c | H A D | 03-May-2022 | 9.5 KiB | 470 | 372 |
| ssearch.1 | H A D | 03-May-2022 | 6 KiB | 219 | 211 |
| ssearch.c | H A D | 03-May-2022 | 39.2 KiB | 1,702 | 1,466 |
| test.seq | H A D | 03-May-2022 | 133 | 4 | 3 |
| test.sh | H A D | 03-May-2022 | 549 | 23 | 21 |
| tgrease.c | H A D | 03-May-2022 | 5.1 KiB | 214 | 176 |
| time.c | H A D | 03-May-2022 | 640 | 44 | 36 |
| tldispn.c | H A D | 03-May-2022 | 6 KiB | 344 | 283 |
| tplotsub.c | H A D | 03-May-2022 | 2.7 KiB | 150 | 126 |
| translate.c | H A D | 03-May-2022 | 2.4 KiB | 119 | 94 |
| ttdispn.c | H A D | 03-May-2022 | 7.4 KiB | 403 | 336 |
| uascii.gbl | H A D | 03-May-2022 | 1.6 KiB | 50 | 43 |
| upam.gbl | H A D | 03-May-2022 | 10.8 KiB | 332 | 299 |
| uwgetaa.c | H A D | 03-May-2022 | 14.8 KiB | 584 | 469 |
| vmsgeten.c | H A D | 03-May-2022 | 1 KiB | 43 | 40 |
| xurt8c.aa | H A D | 03-May-2022 | 292 | 6 | 5 |
| zs_exp.c | H A D | 03-May-2022 | 1.1 KiB | 49 | 29 |
| zxlgmata.c | H A D | 03-May-2022 | 6.7 KiB | 339 | 251 |
| zzgmata.gbl | H A D | 03-May-2022 | 324 | 15 | 11 |
| zzlgmata.c | H A D | 03-May-2022 | 12 KiB | 551 | 428 |
README.versions

May 13, 1997

Version overview

Currently, fasta2u65.shar.Z is the latest complete fasta package,
and it has a complete set of searching programs (fasta, ssearch,
fastx, etc.). However, the fasta2 searching programs are in
maintenance mode: bug fixes only.

The fasta3 series, which has ONLY the searching programs, has the
latest versions of the algorithms and statistical methods. fasta3
also runs the exact same functions threaded (fasta3, fasta3_t) and in
parallel using PVM.

Here is a list of the programs, and where they can be found:

program          fasta2  fasta3                                  replaced by

fasta            yes     fasta3, fasta3_t
ssearch          yes     ssearch3, ssearch3_t
tfasta           yes     tfasta3, tfasta3_t (tfastx3 preferred)
fastx            yes     fastx3, fastx3_t
tfastx           yes     tfastx3, tfastx3_t
rdf2 (obsolete)  no      no                                      prdf2
rss (obsolete)   no      no                                      prss
prdf2            yes     no
prss             yes     no
lfasta           yes     no
lalign           yes     no
plalign          yes     no
flalign          yes     no
align            yes     no
align0           yes     no
randseq          yes     no
crandseq         yes     no
aacomp           yes     no
bestscor         yes     no
fromgb           yes     no
grease           yes     no
tgrease          yes     no
garnier          yes     no
zs_exp           yes     no

readme.v15

Changes with version 1.5

    FASTA version 1.5 includes a number of substantial revisions
to improve the performance and sensitivity of the program. Two
changes are apparent. It is now possible to tell the program to
optimize all of the init1 scores greater than a threshold. The
threshold is set at the same value as the old FASTA cutoff score
(approximately 0.5 standard deviations above the mean for average
length sequences). For highest sensitivity, you can use the -c option
to set the threshold to 1. (This will slow the search down about
5-fold.) In addition, you can tell FASTA to sort the results by the
"init1" score, rather than the "initn" score, by using the "-1"
option. FASTA -1 ... will report the results the way the older FASTP
program did.

    A new method has been provided for selecting libraries. In the
past, one could enter the name of a sequence file to be searched or a
single letter that would specify a library from the list included in
the $FASTLIBS file. Now, you can specify a set of library files with a
string of letters preceded by a '%'. Thus, if the FASTLIBS file has
the lines:

    Genbank 64 primates$1P/seqlib/gbpri.seq
    Genbank 64 rodents$1R/seqlib/gbrod.seq
    Genbank 64 other mammals$1M/seqlib/gbmam.seq
    Genbank 64 vertebrates $1B/seqlib/gbvrt.seq

then the string "%PRMB" would tell FASTA to search the four libraries
listed above. The %PRMB string can be entered either on the command
line or when the program asks for a filename or library letter.
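
The selection rule above can be sketched in a few lines of Python. This
is an illustration only, not code from the FASTA package: the names
parse_fastlibs and select_libraries are hypothetical, and the reading of
a FASTLIBS line (description before the '$', then one type digit, then
the library letter, then the file name) is an assumption inferred from
the four sample lines.

```python
FASTLIBS_SAMPLE = [
    "Genbank 64 primates$1P/seqlib/gbpri.seq",
    "Genbank 64 rodents$1R/seqlib/gbrod.seq",
    "Genbank 64 other mammals$1M/seqlib/gbmam.seq",
    "Genbank 64 vertebrates $1B/seqlib/gbvrt.seq",
]


def parse_fastlibs(lines):
    """Map each one-letter library code to its file name.

    Assumed layout: <description>$<digit><letter><file name>.
    """
    libs = {}
    for line in lines:
        _description, sep, rest = line.rstrip("\n").partition("$")
        if not sep or len(rest) < 3:
            continue  # not a library entry
        letter, path = rest[1], rest[2:].strip()
        libs[letter] = path
    return libs


def select_libraries(spec, libs):
    """Resolve what the user typed into a list of library files."""
    if spec.startswith("%"):
        # "%PRMB" selects several libraries, one per letter.
        return [libs[letter] for letter in spec[1:]]
    if len(spec) == 1 and spec in libs:
        # A single letter selects one library.
        return [libs[spec]]
    # Anything else is taken to be a sequence file name.
    return [spec]


if __name__ == "__main__":
    libs = parse_fastlibs(FASTLIBS_SAMPLE)
    for path in select_libraries("%PRMB", libs):
        print(path)
```

With the sample entries, "%PRMB" resolves to the four gb*.seq files in
the order the letters were given, and a plain file name passes through
unchanged.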

    FASTA1.5 also provides additional flexibility for specifying
the number of results and alignments to be displayed with the -Q
(quiet) option. The "-b number" option allows you to specify the number of
sequence scores to show when the search is finished. Thus

    FASTA -b 100 ...

would tell the program to display the top 100 sequence scores. In the
past, if you displayed 100 scores (in -Q mode), you would also have
to store 100 alignments. The "-d" option allows you to limit the number
of alignments shown. FASTA -b 100 -d 20 would show 100 scores and 20
alignments.

    The old "CUTOFF" parameter is no longer used. The program
stores the best 2000 (IBM-PC, MAC) or 6000 (Unix, VMS) scores and then
throws out the lowest 25%, stores the next 500 (1500) scores better
than the threshold determined when the first scores were discarded,
and repeats the process as the library is scanned. As a result, the
best 1500 - 2000 (4500 - 6000) scores are saved. The old cut-off
parameter was also used to set the joining threshold for the
calculation of the initn score from initial regions. This joining
threshold can now be set with the -g option or the GAPCUT parameter.
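
The score-saving scheme described above can be sketched as follows.
This is not the FASTA source: scan_scores is a hypothetical name, and
the sketch assumes that the threshold is the lowest score kept at the
most recent discard and that later scores must exceed it to be stored.

```python
def scan_scores(scores, capacity=2000, discard_fraction=0.25):
    """Keep roughly the best `capacity` scores while scanning a library.

    When the buffer fills, the lowest `discard_fraction` of the scores is
    thrown out and the lowest surviving score becomes the threshold that
    later scores must beat (an assumption about the exact rule).
    """
    saved = []
    threshold = None  # no cutoff until the buffer has filled once
    keep = capacity - int(capacity * discard_fraction)
    for score in scores:
        if threshold is not None and score <= threshold:
            continue  # not better than the current threshold
        saved.append(score)
        if len(saved) >= capacity:
            saved.sort(reverse=True)
            threshold = saved[keep - 1]  # lowest score that survives
            del saved[keep:]             # drop the lowest 25%
    return sorted(saved, reverse=True)


if __name__ == "__main__":
    kept = scan_scores(range(1, 21), capacity=8)
    print(kept)
```

With the defaults the buffer holds between 1500 and 2000 scores once it
has filled for the first time, which matches the "best 1500 - 2000"
range quoted above; capacity=6000 gives the 4500 - 6000 range.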

    Finally, FASTA can provide a complete list of all of the
sequences and scores calculated to a file with the "-r" (results)
option. FASTA -r results.out ... creates a file with a list of scores
for every sequence in the library. The list is not sorted, and only
includes those scores calculated during the initial scan of the
library (the optimized score is not calculated unless the -o option is
used).

readme.v16
Changes with 1.6c31a

    (August, 1993) Released support for NCBI SEARCH and
    BLASTP/BLASTN formats.

    (November, 1993) Changes to nxgetaa.c to accommodate changes in
    the EMBL library format. Changes to ncbl_lib.c to work on DNA
    sequences.

Changes with 1.6c24

    (December 1992) Added -e option for more selective scores.

    (April 1993) Added #define SUPERFAMNUM for genpept.fasta
    users. By default, superfamily numbers are not returned from
    fasta format (libtype=1) files.

    (May 1993) Changed window shuffle routine in rdf2, rss, to
    preserve locality of shuffle.
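
As an illustration of what a locality-preserving shuffle can look like,
here is a generic window shuffle in Python. The note above does not
describe how the rdf2/rss routine works internally, so this is only a
sketch of the general idea: window_shuffle and its parameters are
hypothetical, and the fixed, non-overlapping windows are an assumption.

```python
import random


def window_shuffle(seq, window=10, rng=None):
    """Shuffle residues only within consecutive windows of `window` residues.

    Each residue stays inside its original window, so local composition
    is preserved while the order within the window is randomized.
    """
    rng = rng or random.Random()
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        out.extend(block)
    return "".join(out)


# A fixed seed keeps the example reproducible.
WINDOW_SAMPLE = "PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMG"
WINDOW_SHUFFLED = window_shuffle(WINDOW_SAMPLE, 10, random.Random(1))

if __name__ == "__main__":
    print(WINDOW_SAMPLE)
    print(WINDOW_SHUFFLED)
```

A uniform shuffle of the whole sequence is the special case where the
window is at least as long as the sequence.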

Changes with version 1.6b

    FASTA version 1.6b uses a new method for calculating optimal
scores in a band (the optimization or last step in the FASTA
algorithm). In addition, it uses a linear-space method for calculating
the actual alignments. The FASTA package also includes four new
programs:

    ssearch     a program to search a sequence database using
                the rigorous Smith-Waterman algorithm (this
                program is about 100-fold slower than FASTA
                with ktup=2 for proteins).

    rss         a version of rdf2 that uses a rigorous
                Smith-Waterman calculation to score
                similarities

    lalign      a rigorous local sequence alignment program
                that will display the N-best local alignments
                (N=10 by default).

    plalign     a version of lalign that plots the local alignments.


    The lalign/plalign programs incorporate the "sim" algorithm
described by Huang and Miller (1991) Adv. Appl. Math. 12:337-357.
The ssearch and rss programs incorporate algorithms described by
Huang, Hardison, and Miller (1990) CABIOS 6:373-381.

    Two new command line options are available:

    -n  indicates that the query file is a nucleotide
        sequence. This option can be very useful when
        searching with consensus regulatory sequences.

    -x "off1 off2" allows you to specify an offset for the
        beginning of a DNA or protein sequence. For example,
        if you are comparing upstream regions for two genes, and
        the first sequence contains 500 nt of upstream
        sequence while the second contains 300 nt of upstream
        sequence, you might try:

            fasta -x "-500 -300" seq1.nt seq2.nt

        This option will not work properly with the translated
        library sequence with tfasta.

        (You should double check to be certain the negative
        numbering works properly.)

readme.v17
Changes with 1.7

    (February 1994) Replaced rdf2 and rss with prdf, prss, which
    calculate informative score probabilities by fitting the
    distribution of shuffled scores to an extreme value
    distribution. The curve fitting routines in rweibull.c were
    provided by Phil Green, Washington U., St. Louis.
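
To make the idea concrete, the sketch below fits an extreme value
(Gumbel) distribution to a list of shuffled-sequence scores and turns an
observed score into a tail probability. It is not the method used in
rweibull.c, which is not described here: the method-of-moments fit is a
simple stand-in, and the function names and the sample scores are made
up for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

# Hypothetical scores from comparing the query with shuffled sequences.
SHUFFLED_SCORES = [31, 28, 35, 30, 27, 33, 29, 38, 32, 30, 26, 34]


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) from mean and s.d."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    beta = sd * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def tail_probability(score, mu, beta):
    """P(S >= score) for a Gumbel distribution with parameters mu, beta."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 for accuracy when z is large.
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    mu, beta = fit_gumbel_moments(SHUFFLED_SCORES)
    print(tail_probability(60, mu, beta))
```

A real score far out in the right tail of the shuffled distribution gets
a small probability; a score typical of the shuffled sequences does not.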

    "-i" switch to reverse-complement query sequence if it is
    DNA.

    Fix bug in zzlgmata.c that caused problems with alignments.

    Fix histogram routine to work properly with ln() normalized
    scores.

readme.v20

Changes with 2.0 (March, 1995)

    WARNING - Optimization is now turned on by default. The
    meaning of the "-o" option has been reversed. "-o" now turns
    off optimization, reverting to the earlier method of sorting
    by "initn" scores.

    Change default protein matrix to BLOSUM50. PAM250 is
    still available with -s 250. Change program to accept gap
    penalties from the command line with "-f" (-12) and "-g" (-2).

    Provide MARKX=4, which allows one to display the conserved
    regions of the query sequence after a library search.

    Calculate explicit probability estimates for FASTA, TFASTA,
    and SSEARCH. Estimates assume that the library contains a large
    number of unrelated sequences. If this is not correct, the
    estimates are useless (and should be turned off with the -z
    flag).

    The width of the band used to calculate optimized scores is
    now variable. For proteins with ktup=1, 32 residues are used;
    otherwise 16 residues are used. For DNA, 16 residues are
    used. This value can be changed with the "-y" option.
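
The band-width rule just described, written out as a small Python
function. The function name and argument names are hypothetical; only
the numbers (32, 16) and the existence of a "-y" override come from the
note above.

```python
def default_band_width(sequence_type, ktup, y_option=None):
    """Return the band width used for optimized scores.

    sequence_type is "protein" or "dna"; y_option models a value given
    with "-y" and, when present, replaces the default.
    """
    if y_option is not None:
        return y_option
    if sequence_type == "protein" and ktup == 1:
        return 32
    return 16  # proteins with ktup != 1, and all DNA searches


if __name__ == "__main__":
    print(default_band_width("protein", 1))
    print(default_band_width("dna", 6))
```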

    FASTA alignments now use the Smith-Waterman algorithm; there
    is no longer a limit on gap size for FASTA alignments.

    Fixed a rare bug in lalign/plalign for low gap penalties.

    Fixed lfasta to read one letter filenames in second position.

April 5, 1995

    Fixed bug in blast-format file reading so that sequences that do
    not end in "*" are treated properly.

May, 1995

    More accurate display of the expected value histogram. The
    quality of the fit is now quantitated with the
    Kolmogorov-Smirnov statistic.

    DNA match/mismatch penalties changed to +5/-4.

    An expectation threshold (-E) is provided for displaying
    scores.

July, 1995

    Corrected a very serious bug in ssearch E()-score calculation
    for large databases.

    Corrected a minor problem with histogram scaling.

    Removed Kolmogorov-Smirnov statistic if histogram not shown.

    Show correct scoring matrix if specified matrix is not
    found.

August, 1995

    Some corrections so that "-z" flag works properly and statistical
    calculations fall back properly when no distribution of lengths
    is available.

2.0x3   Change default DNA and TFASTA alignments to older band-limited
    Smith-Waterman rather than full Smith-Waterman. Now DNA
    sequence searches are as fast as before (with Smith-Waterman
    alignments, they were often 50 times slower). Full Smith-Waterman
    alignments are available with the "-A" option.

    Small changes in the way that memory is allocated for
    alignments in FASTA, TFASTA, LFASTA/PLFASTA, and SSEARCH.

    The DOS/BorlandC and UNIX versions have been merged. All
    files necessary for compilation on DOS/WinNT are included.

    Added -O option to FASTA, TFASTA, LFASTA, PRSS, PRDF, LALIGN, ALIGN
    to specify output file.

2.0u3   Merge of Mac FASTA with Win/DOS, Unix FASTA to a single set of files.

Sept, 1995

    Add -Q option to prss, prdf. Fix bug in -O option for those
    programs.

    Allow longer lengths for filenames. Use QFILE_SIZE and LFILE_SIZE
    to define lengths for query and library file names (40, 80 for
    microcomputers, 256 for Unix).

November, 1995

    Fix bug in nxgetaa.c that prevented reading multiple
    blast-formatted files.

February, 1996

    see readme.v20u4 for more information

    added -m 10 option for parseable output

    added library_type 6 for GCG formatted files

    added -L option for long descriptions of library sequences

    "randseq" random shuffle program now available.

March, 1996

    modified nxgetaa for 12 character locus names.

    fixed a bug in lfasta that appears with very long sequences

April, 1996

    Make certain '-z' option really works (required for libraries with
    sequences < 10 aa).

    Removed duplicate sw_score: in ssearch with -m 10.

    Added -DPROGRESS to report progress of search with "....".

Mar, 1996

    Added "fastx", see readme.v20u5. The histogram option "-h" is
    now "-H".

readme.v20u4
Changes with 2.0u4 (February, 1996)

Added '-L' option, which provides a longer description of the library
sequence.

Fixed a bug in the -m 10 parseable output.

Support is now provided for version 8.0 GCG libraries, both protein
and DNA. Use library type 6.

Changes with 2.0x4 (January, 1996)

The major change with 2.0x4 is the ability to get a parseable
output from FASTA/TFASTA/SSEARCH. This can be done using output
option -m 10. With -m 10, the initial histogram and list of best
scores are unchanged, but the alignments are now in a parseable form:

>>>mgstm1.aa, 217 aa vs s library
; pg_name: FASTA
; pg_ver: version 2.0x4 Jan., 1996
; pg_matrix: BLOSUM50
; pg_gap-pen: -12 -2
; pg_ktup: 1
; pg_optcut: 30
; pg_cgap: 42
>>GTB1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2.5.1.18
; fa_initn: 1490
; fa_init1: 1490
; fa_opt: 1490
; fa_z-score: 1916.0
; fa_expect: 0
; sw_score: 1490
; sw_ident: 1.000
; sw_overlap: 217
>GT8.7 ..
; sq_len: 217
; sq_type: p
; al_start: 1
; al_stop: 217
; al_display_start: 1
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPYLIDGSHKITQSNAILRYLARKHHLDGETEEERIRADIVE
NQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKRPWFAGD
KVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKSS
RYIATPIFSKMAHWSNK
>GTB1_MOUSE ..
; sq_len: 217
; sq_type: p
; al_start: 1
; al_stop: 217
; al_display_start: 1
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPYLIDGSHKITQSNAILRYLARKHHLDGETEEERIRADIVE
NQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKRPWFAGD
KVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKSS
RYIATPIFSKMAHWSNK
>>GT28_SCHJA GLUTATHIONE S-TRANSFERASE 28 KD (EC 2.5.1.18
; fa_initn: 190
; fa_init1: 97
; fa_opt: 169
; fa_z-score: 217.9
; fa_expect: 1.1e-05
; sw_score: 169
; sw_ident: 0.277
; sw_overlap: 228
>GT8.7 ..
; sq_len: 217
; sq_type: p
; al_start: 4
; al_stop: 180
; al_display_start: 1
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPY--LID--GSHK-ITQSNAILRYLARKHHLDGETEEERIR
ADIVENQVMDTRMQLIMLCYNPDFEKQK--PEFLK-TIPEKMKLYSEFLG
KRP--WFAGDKVTYVDFLAYDILDQYRMFEPKCLDA-FPNLRDFLARFEG
LKKISAYMKSSRYIATPIFSKMAHWSNK
>GT28_SCHJA ..
; sq_len: 206
; sq_type: p
; al_start: 3
; al_stop: 180
; al_display_start: 1
-VKLIYFNGRGRAEPIRMILVAAGVEFEDERIEFQDWP----------KI
KPTIPGGRLPIVKITDKRGDVKTMSESLAIARFIARKHNMMGDTDDEYYI
IEKMIGQVEDVESEYHKTLIKPPEEKEKISKEILNGKVPILLQAICETLK
ESTGNLTVGDKVTLADVVLIASIDHITDLDKEFLTGKYPEIHKHRKHLLA
TSPKLAKYLSERHATAF
>>GT2_DROME GLUTATHIONE S-TRANSFERASE 2 (EC 2.5.1.18).
; fa_initn: 124
; fa_init1: 124
; fa_opt: 164
; fa_z-score: 210.1
; fa_expect: 2.9e-05
; sw_score: 164
; sw_ident: 0.248
; sw_overlap: 251
>GT8.7 ..
; sq_len: 217
; sq_type: p
; al_start: 4
; al_stop: 198
; al_display_start: 1
---------------------------PMILGYWNVRGLTHPIRMLLEYT
DSSYDEKRYTMGDAPDFDRSQWLNEKFKLGLDFPNLPYL-IDGSHKITQS
NAILRYLARKHHLDGETEEERIRADIVENQVMDTRMQLIMLCYNPDFEKQ
KPEFLKTIPEKMKLYSEFLGKR-----PWFAGDKVTYVDFLAYDILDQYR
-MFEPKCLDAFPNLRDFLARFEGLKKISAYMKSSRYIATPIFSKMAHWSN
K
>GT2_DROME ..
; sq_len: 247
; sq_type: p
; al_start: 52
; al_stop: 240
; al_display_start: 22
PPAEGAEGAVEGGEAAPPAEPAEPIKHSYTLFYFNVKALPSPC------A
TCSDGNQEYE--DVAHPRRVPALKPTMPMG----QMPVLEVDGK-RVHQS
ISMARFLAKTVGLCGATPWEDLQIDIVVDTINDFRLKIAVVSYEPEDEIK
EKKLVTLNAEVIPFYLEKLEQTVKDNDGHLALGKLTWADVYFAGITDYMN
YMVKRDLLEPYPAVRGVVDAVNALEPIKAWIEKRPVTEV


Note that the parseable output starts with ">>>" and that each
alignment record starts with ">>", while each aligned sequence record
starts with ">".

All parameters produced by the fasta package will be of the form:

    ; xx_yyyyy

In this version, xx is one of:

    pg - program parameters (name, version, matrix)
    fa - fasta scores, expect values, etc.
    sw - Smith-Waterman scores, expect values.
    sq - sequence length, type
    al - alignment start, stop, display_offset

Other FASTA distributors may choose to add additional fields. If they
do, they should use a tag with more than two characters, e.g.:

    ebi_access:
or
    gcg_?????

The FASTA tags will be limited to two characters followed by a "_".
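
As a rough illustration of how this layout can be read by a program,
here is a minimal Python sketch. It is not part of the FASTA package:
parse_m10 and the dictionary keys "title", "description", "sequences",
"name" and "residues" are hypothetical, the sample is an abridged copy
of the output shown above, and the sketch assumes that every ">" record
follows a ">>" record.

```python
M10_SAMPLE = """\
>>>mgstm1.aa, 217 aa vs s library
; pg_name: FASTA
; pg_matrix: BLOSUM50
>>GT28_SCHJA GLUTATHIONE S-TRANSFERASE 28 KD (EC 2.5.1.18
; fa_opt: 169
; sw_ident: 0.277
>GT8.7 ..
; sq_len: 217
; al_start: 4
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPY--LID--GSHK-ITQSNAILRYLARKHHLDGETEEERIR
>GT28_SCHJA ..
; sq_len: 206
; al_start: 3
-VKLIYFNGRGRAEPIRMILVAAGVEFEDERIEFQDWP----------KI
"""


def parse_m10(text):
    """Split -m 10 output into a header dict and a list of alignment records.

    "; xx_yyyyy: value" lines are stored, as strings, on whichever record
    (">>>", ">>" or ">") was opened most recently.
    """
    header, records = {}, []
    tags = header  # dict that receives the next "; key: value" line
    seq = None     # sequence record currently collecting residues
    for line in text.splitlines():
        if line.startswith(">>>"):
            header["title"] = line[3:].strip()
            tags, seq = header, None
        elif line.startswith(">>"):
            record = {"description": line[2:].strip(), "sequences": []}
            records.append(record)
            tags, seq = record, None
        elif line.startswith(">"):
            seq = {"name": line[1:].strip(), "residues": ""}
            records[-1]["sequences"].append(seq)
            tags = seq
        elif line.startswith(";"):
            key, _, value = line[1:].partition(":")
            tags[key.strip()] = value.strip()
        elif seq is not None and line.strip():
            seq["residues"] += line.strip()
    return header, records


if __name__ == "__main__":
    header, records = parse_m10(M10_SAMPLE)
    print(header["pg_matrix"], records[0]["fa_opt"])
```

Because unknown tags are simply stored under their own names, fields
added by other distributors (for example an "ebi_access" tag) would pass
through without changes to the parser.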

All of the output parameters correspond to values that are presented
in other FASTA output formats, with the exception of the "al_"
parameters.

al_start        gives the location of the alignment start in the
                original sequence

al_stop         gives the location of the end of the alignment in the
                original sequence

al_display_start
        gives the location of the first displayed amino acid residue
        in the original sequence. The -m 10 alignments are the same
        as those produced in the other modes. In particular,
        FASTA/SSEARCH provide some context for the alignment; if the
        "-a" option is not used, FASTA/SSEARCH will try to provide
        about 30 residues on either side of the actual local
        alignment, if the alignment is in the middle of one or the other
        sequence. If the beginning of the query sequence aligns with
        the 10th residue of the library sequence, then the query
        sequence will be padded with ten leading "-" to produce the
        alignment. The leading '-' are a formatting convenience only;
        they are not considered in the numbering system for
        al_display_start, al_start, or al_stop.

        Thus:

    >GT8.7 ..
    ; sq_len: 217
    ; sq_type: p
    ; al_start: 3
    ; al_stop: 180
    ; al_display_start: 1
    ---PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLN
    EKFKLGLDFPNLPYLIDGSHKITQSNAILRYLARKHH---LDGETEEERI
    RADIVENQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKR
    PWFAGDKVTYVDFLAYDILDQYRMFEPKCLDA------FPNLRDFLARFE
    GLKKISAYMKSSRYIATPIFSKMAHWSNK
    >ARP2_TOBAC ..
    ; sq_len: 223
    ; sq_type: p
    ; al_start: 6
    ; al_stop: 181
    ; al_display_start: 1
    MAEVKLLGFW-YSPFSHRVEWALKIKGVKYE---YIEEDRD--NKSSLLL
    QSNPV---YKKVPVLIHNGKPIVESMIILEYIDETFEGPSILPKDPYDRA
    LARFWAKFLDDKVAAVVNTFFRKGEEQEKGK--EEVYEMLKVLDNELKDK
    KFFAGDKFGFADIAANLVGFWLGVFEEGYGDVLVKSEKFPNFSKWRDEYI
    NCSQVNESLPPRDELLAFFRARFQAVVASRSAPK

        This says that, to align the two sequences, the first 'P' of
        GT8.7 must line up with the first 'V' (residue 4) in
        ARP2_TOBAC, but that the actual best local alignment starts
        with the first 'I' in GT8.7 and the first 'L' in ARP2_TOBAC.


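The numbering convention for the "al_" parameters can be checked with a
short Python sketch. The function names are hypothetical; the rule they
implement is the one stated above, namely that '-' characters (leading
padding or gaps) are never counted and that numbering begins at
al_display_start. The two strings are the first display lines of the
GT8.7 / ARP2_TOBAC example.

```python
QUERY_LINE = "---PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLN"
LIBRARY_LINE = "MAEVKLLGFW-YSPFSHRVEWALKIKGVKYE---YIEEDRD--NKSSLLL"


def residue_numbers(aligned, display_start):
    """Residue number for each display column, or None for a '-' column."""
    numbers = []
    position = display_start
    for char in aligned:
        if char == "-":
            numbers.append(None)  # padding and gaps are not numbered
        else:
            numbers.append(position)
            position += 1
    return numbers


def column_of(aligned, display_start, residue):
    """Zero-based display column that holds the given residue number."""
    return residue_numbers(aligned, display_start).index(residue)


if __name__ == "__main__":
    q_col = column_of(QUERY_LINE, 1, 3)    # al_start of GT8.7
    l_col = column_of(LIBRARY_LINE, 1, 6)  # al_start of ARP2_TOBAC
    print(q_col, QUERY_LINE[q_col], l_col, LIBRARY_LINE[l_col])
```

Both al_start values land in the same display column, which holds 'I' in
GT8.7 and 'L' in ARP2_TOBAC, in agreement with the description above.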
readme.v20u5

**Changes with release 20u5 - May 1996

This version of the FASTA package includes FASTX - a program that
compares a DNA sequence with a protein sequence library by translating
the DNA sequence in three frames and finding the best match, with
frame-shifts, between the translated DNA and protein sequence.
(Unlike BLASTX, FASTX only does a three-frame translation. To search
all six frames, do a second search with the "-i" option.)
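
For readers unfamiliar with three-frame translation, the Python sketch
below shows the translation step only; it does not attempt the
frameshift-aware alignment that FASTX performs, and it is not taken
from the FASTX source. The function names are hypothetical, the
standard genetic code is assumed, and rendering any codon that contains
a non-ACGT letter as 'X' is a choice made for this example.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate one forward frame (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def three_frames(dna):
    """Translations of the three forward frames."""
    return [translate(dna, frame) for frame in range(3)]


def reverse_complement(dna):
    """Reverse-complement a DNA string (letters other than ACGT are kept)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    dna = "ATGGCCTGA"
    print(three_frames(dna))
    print(three_frames(reverse_complement(dna)))
```

Translating the reverse complement as well gives the other three
frames, which corresponds to the second search with "-i" mentioned
above.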

The code for aligning a three-frame protein sequence with a normal
protein sequence was provided by Zheng Zhang and W. Miller of the
Pennsylvania State University.

A third gap parameter, the frameshift penalty, is provided with the
'-h' option. The default gap penalties are -15 for the first residue
in a gap, -3 for each additional residue, and -30 for a frameshift.
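
The arithmetic implied by these defaults is shown in the Python sketch
below. The function names are hypothetical and the code is only a
worked example of the stated penalties, not the FASTX scoring routine.

```python
def gap_cost(length, first=-15, extend=-3):
    """Penalty for one gap: `first` for the first residue in the gap,
    `extend` for each additional residue."""
    if length <= 0:
        return 0
    return first + extend * (length - 1)


def total_penalty(gap_lengths, frameshifts, first=-15, extend=-3, frameshift=-30):
    """Sum of all gap penalties plus a fixed penalty per frameshift."""
    gaps = sum(gap_cost(n, first, extend) for n in gap_lengths)
    return gaps + frameshift * frameshifts


if __name__ == "__main__":
    # One 1-residue gap, one 4-residue gap and one frameshift.
    print(total_penalty([1, 4], 1))
```

A four-residue gap therefore costs -15 + 3 * (-3) = -24, less than a
single frameshift at -30.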

The '-h' option used to prevent the histogram display; that function is
now invoked with '-H'.

Much of the FASTX code is new and has not been tested nearly as
extensively as the other fasta programs. Please inform me of bugs as
you find them.

================
**Changes with release 20u51 - June, 1996

Fixes to showalign for SHOWALL.

Fixes to routines that read fasta format files for long DNA sequences.

================
**Changes with release 20u52 - July, 1996

Fixes to lalign/plalign for setting gap penalties on DNA.

Fixes to fastx to correct bug in alignment routine.

================
**Changes with release 20u53 - July, 1996

Another fix to fastx.

Added flalign, a version of plalign that produces a GCG fig file for
local alignment graphics.

Increased the size of sequences that can be aligned by lalign to
120,000 residues with BIGMEM.

First release of Mac version with FASTX.

================

Bill Pearson
wrp@virginia.edu

readme.v20u6
** Changes with release 20u67 - May, 1999

Corrected serious problem with fastx alignments.

** Changes with release 20u66 - September, 1998

Adapted plalign, plfasta, and psgrease for use with WWW pages.

plalign and plfasta now generate PostScript graphics, rather
than Tektronix graphics.

psgrease makes PostScript Kyte-Doolittle plots.

Provide various *.htm and *.cgi files to implement WWW pages for
lalign, plalign, grease, chofas, garnier.

Updated grease (tgrease), chofas, and garnier for consistent
user interface.

** Changes with release 20u65 - May 1998

Various minor bug tweaks to the fastx function, faatran.c, and other
programs associated with fastx.

** Changes with release 20u64 - May, 1998

Programs have been modified to accept query sequences from STDIN for
WWW interfaces. prss, prdf, lalign, and plalign should accept input
from STDIN. This makes it relatively easy to set up a prss WWW site.

The translation routine used by FASTX has been modified to translate
ambiguous nucleotides as 'X'.
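
A minimal sketch of this behaviour is shown below. It is not the
faatran.c implementation: it assumes the standard genetic code, treats
any character other than A, C, G, T or U as ambiguous, and emits 'X'
for every codon that contains such a character. The names CODE,
base_index and translate_frame are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Standard genetic code, indexed as 16*b1 + 4*b2 + b3
   with bases ordered T, C, A, G. '*' marks a stop codon. */
static const char CODE[] =
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG";

/* Map a nucleotide to 0..3, or -1 if it is ambiguous. */
static int base_index(char c)
{
    switch (toupper((unsigned char)c)) {
    case 'T': case 'U': return 0;
    case 'C': return 1;
    case 'A': return 2;
    case 'G': return 3;
    default:  return -1;
    }
}

/* Translate `dna` starting at offset `frame` (0, 1 or 2) into `out`.
   Codons with an ambiguous base become 'X'. Returns the number of
   residues written; `out` is always NUL-terminated. */
int translate_frame(const char *dna, int frame, char *out, int out_size)
{
    assert(frame >= 0 && frame <= 2 && out_size > 0);
    int n = (int)strlen(dna), k = 0;
    for (int i = frame; i + 2 < n && k < out_size - 1; i += 3) {
        int b1 = base_index(dna[i]);
        int b2 = base_index(dna[i + 1]);
        int b3 = base_index(dna[i + 2]);
        out[k++] = (b1 < 0 || b2 < 0 || b3 < 0)
                       ? 'X'
                       : CODE[16 * b1 + 4 * b2 + b3];
    }
    out[k] = '\0';
    return k;
}
```

Calling translate_frame with frame set to 0, 1 and 2 in turn gives the
three forward translations that FASTX compares against the library.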

Problems with specifying gap penalties with DNA sequences have been
corrected in prss.

**Changes with release 20u6 - August 1996

Another new program - TFASTX - compares a protein sequence to a
translated, potentially frameshifted, DNA library. TFASTX is a
substantial improvement over TFASTA, although TFASTX is slower.

The LALIGN/PLALIGN family now includes FLALIGN, which will write out
alignment plots in GCG's Figure format.

The Mac version now uses System 7 Standard File routines.

BLOSUM62 support finally included.

** September, 1996

Fixed another bug in fastx/tfastx.

** September, 1996

Fixed problem with query subsequence selection.

Fixed problem with selectbestz().

** September 23, 1996

Fixed problem with -m=10 Smith-Waterman alignments, pointed
out and corrected by Erik Wallen (erikw@biokemi.su.se).

**Changes with release 20u61 - November, 1996

Made corrections to fffasta.c, nxgetaa.c to support alternative
protein scoring matrices with fastx.

**Changes with release 20u62 - December, 1996

A fix to nxgetaa.c to allow -i with fastx (bug caused by 20u61
changes).

**Changes with release 20u63 - December, 1996

Corrected some problems with lfasta using -m 10 (parseable) output.

**Changes with release 20u64 - September, 1997

Modified faatran.c so that fastx query sequences with many "X"s are
translated to 'X', not 'K'.

Added -m 5 (MARKX), which combines -m 0 and -m 4.

Corrected problem with lalign/plalign/flalign and
lfasta/plfasta/flfasta when reverse complemented DNA sequences were
compared to the same file.


Bill Pearson
wrp@virginia.edu

readme.v21u0
** January, 2007

Modify align.c (used for global alignments) to use the same scoring
matrices, and other options, as lalign2.c.

Makefile has been modified so that "make all" only makes FASTA2 programs
that are not part of FASTA3. The search programs are no longer made
by default.

** December, 2006

Modify fffasta.c to support the **pam2 global in upam.gbl, rather than
pam2[][] - **pam2 was introduced for lalign2.c/lsim3.c.

** August, October 2006, 21u08 (lalign2.c)

Make some efforts to remove global variables from lsim2.c, by
replacing it with lsim3.c. Initial efforts in August, 2006,
introduced a bug, which was detected and fixed in October.

Provide option to show identical alignments.

** May, 2005, 21u07 (lalign.c)

Modify the code that checks for identical sequences to not assume
sequences are identical just because the filenames are. They may
be different because of sub-setting.
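
One way to picture the corrected check is to compare the residues that
were actually loaded instead of the file names. The sketch below is an
assumption about the idea, not the lalign.c code; struct seq_view and
same_sequence are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* A loaded (possibly sub-setted) sequence. Hypothetical structure. */
struct seq_view {
    const char *filename;  /* file the sequence was read from */
    const char *residues;  /* first residue of the selected range */
    int length;            /* number of residues in the range */
};

/* Return 1 only if both views hold the same residues. The file name
   is deliberately ignored: two different subsets of one file share a
   name but are not the same sequence. */
int same_sequence(const struct seq_view *a, const struct seq_view *b)
{
    assert(a != NULL && b != NULL);
    if (a->length != b->length)
        return 0;
    return memcmp(a->residues, b->residues, (size_t)a->length) == 0;
}
```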

Add -I option to show identical alignment.

Update lalign.1 documentation.

** April, 2004, 21u06 (lalign2.c)

Fix problem reading external scoring matrix files. The file
was not read, and then sequences were not read properly.

Incorporate GAP_OPEN gap matrix options.

Changes to allow G:U RNA base matches.

** March, 2000, 21u02 (lalign.c)

Added '-N length' option to limit query, library sequences to
"length". Corrected problems with sequence numbering when
subsequences were specified.

Modifications to nrand.c to keep more bits and return random numbers
from 0..n-1. Use "nrandom.c" rather than nrand.c if random() is
available.
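
For readers unfamiliar with the 0..n-1 convention, the sketch below
shows one way to draw such a number from the POSIX random() function.
It is not the nrandom.c code: it uses rejection sampling, a technique
chosen here for illustration to avoid modulo bias, and nrand_sketch is
a hypothetical name.

```c
#define _XOPEN_SOURCE 500   /* expose random() and srandom() */
#include <assert.h>
#include <stdlib.h>

/* random() returns values in 0 .. 2^31 - 1, i.e. 2^31 distinct values. */
#define RANDOM_RANGE 2147483648UL

/* Return a uniformly distributed integer in 0 .. n-1. Draws that fall
   in the incomplete final block of size n are rejected and redrawn. */
long nrand_sketch(long n)
{
    assert(n > 0 && (unsigned long)n <= RANDOM_RANGE);
    unsigned long limit = RANDOM_RANGE - RANDOM_RANGE % (unsigned long)n;
    unsigned long r;
    do {
        r = (unsigned long)random();
    } while (r >= limit);
    return (long)(r % (unsigned long)n);
}
```

Seeding with srandom() before the first call makes a shuffle
reproducible from run to run.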

Fixes to shuffling routines in randlib.c.

** November, 2003

Add Blosum80 matrix to lalign.c, upam.gbl.
