Name              Date         Size       #Lines  LOC
..                03-May-2022  -
COPYRIGHT         03-May-2022  605        18      13
FileDlog.c        03-May-2022  4.3 KiB    213     182
Makefile          03-May-2022  8.4 KiB    277     170
README.versions   03-May-2022  1.1 KiB    69      36
aacomp.c          03-May-2022  2.4 KiB    123     87
aamap.gbl         03-May-2022  422        15      9
ag_stats.c        03-May-2022  4.3 KiB    195     155
align.1           03-May-2022  3.4 KiB    137     131
align.c           03-May-2022  8.8 KiB    378     308
alt_parms.h       03-May-2022  10.1 KiB   379     350
altlib.h          03-May-2022  1.1 KiB    55      48
apam.c            03-May-2022  4.3 KiB    181     141
bestscor.c        03-May-2022  2.7 KiB    144     114
blosum50.mat      03-May-2022  2.1 KiB    32      31
bovgh.seq         03-May-2022  2.5 KiB    39      38
bovprl.rev        03-May-2022  920        14      13
bovprl.seq        03-May-2022  986        18      17
checkevent.c      03-May-2022  3.3 KiB    187     150
chofas            03-May-2022  1.7 KiB
chofas.c          03-May-2022  8.1 KiB    346     286
chofas.htm        03-May-2022  1.9 KiB    66      52
codaa.mat         03-May-2022  1,012      31      29
crandseq.c        03-May-2022  8 KiB      404     338
crck.c            03-May-2022  745        28      16
dna.mat           03-May-2022  489        23      21
ecoli.comp        03-May-2022  658        27      26
egmsmg.aa         03-May-2022  1.3 KiB    20      19
extractp.c        03-May-2022  7.5 KiB    367     309
f_band.c          03-May-2022  2.3 KiB    118     84
faatran.c         03-May-2022  2.8 KiB    143     116
fasta.1           03-May-2022  13.3 KiB   493     474
fasta.rsp         03-May-2022  72         2       1
fasta20.doc       03-May-2022  47.8 KiB   1,131   895
fasta20.me        03-May-2022  47.8 KiB   1,103   1,086
fastx.rsp         03-May-2022  67         2       1
fffasta.c         03-May-2022  75.9 KiB   3,237   2,635
find.gbl          03-May-2022  680        36      26
fldispn.c         03-May-2022  7.1 KiB    409     345
format.doc        03-May-2022  4.1 KiB    119     84
fromgb.c          03-May-2022  4.3 KiB    150     101
g_band.c          03-May-2022  10 KiB     474     386
garnier.c         03-May-2022  5 KiB      207     169
garnier.cgi       03-May-2022  2.1 KiB    112     67
garnier.h         03-May-2022  5.7 KiB    95      90
garnier.htm       03-May-2022  1.9 KiB    67      52
gen.rsp           03-May-2022  40         2       1
getenv.c          03-May-2022  1 KiB      57      48
getopt.c          03-May-2022  1.2 KiB    62      57
gonnet.mat        03-May-2022  952        29      27
grease.c          03-May-2022  2.7 KiB    135     107
grease.htm        03-May-2022  1.6 KiB    60      51
idnaa.mat         03-May-2022  1,019      31      29
idpaa.mat         03-May-2022  836        28      27
jtr_gst.aa        03-May-2022  508        26      25
l_band.c          03-May-2022  9.1 KiB    437     365
laacomp.c         03-May-2022  2.7 KiB    138     97
lalign.1          03-May-2022  5.7 KiB    185     179
lalign.cgi        03-May-2022  3.2 KiB    154     94
lalign.htm        03-May-2022  2.7 KiB    100     81
lalign2.c         03-May-2022  14.1 KiB   595     492
lcbo.aa           03-May-2022  271        6       5
lcbo.vms          03-May-2022  272        7       6
lfasta.rsp        03-May-2022  64         2       1
llmax.c           03-May-2022  7.3 KiB    337     271
llmax0.c          03-May-2022  8.9 KiB    423     361
lsim2.c           03-May-2022  26.2 KiB   1,001   841
lsim3.c           03-May-2022  25 KiB     1,047   841
lx_align3.c       03-May-2022  17.1 KiB   661     520
lx_band2.c        03-May-2022  4 KiB      164     136
make_vms.com      03-May-2022  5.1 KiB    113     111
makefile.32       03-May-2022  6.3 KiB    196     117
makefile.tc       03-May-2022  6.6 KiB    204     122
makefile.unx      03-May-2022  7.3 KiB    257     170
mchu.aa           03-May-2022  213        5       4
mgstm1.aa         03-May-2022  264        9       8
mgstm1.e05        03-May-2022  1.2 KiB    21      20
mgstm1.eeq        03-May-2022  1.1 KiB    21      20
mgstm1.esq        03-May-2022  1.1 KiB    21      20
mgstm1.ran        03-May-2022  7.4 KiB    121     120
mgstm1.rev        03-May-2022  1.1 KiB    17      16
mgstm1.rsq        03-May-2022  1.2 KiB    21      20
mgstm1.rsq2       03-May-2022  1.2 KiB    21      20
mgstm1.seq        03-May-2022  1.1 KiB    21      20
mplotsub.c        03-May-2022  3.1 KiB    168     131
mstm1.ssq         03-May-2022  1.1 KiB    21      20
mtdispn.c         03-May-2022  7.9 KiB    413     324
musplfm.aa        03-May-2022  272        7       6
mwkw.aa           03-May-2022  2 KiB      32      31
mwrtc1.aa         03-May-2022  500        9       8
ncbl_head.h       03-May-2022  948        31      21
ncbl_lib.c        03-May-2022  11.9 KiB   457     379
ndispn.c          03-May-2022  7.9 KiB    343     290
nrand.c           03-May-2022  497        37      30
nrand48.c         03-May-2022  386        27      20
nrandom.c         03-May-2022  368        26      19
ntcomp.c          03-May-2022  2.2 KiB    119     84
nxgetaa.c         03-May-2022  31.7 KiB   1,459   1,226
oohu.aa           03-May-2022  378        7       6
oohu.raa          03-May-2022  401        8       7
pam.c             03-May-2022  2.8 KiB    141     102
pam120.mat        03-May-2022  1 KiB      30      29
pam250.mat        03-May-2022  1 KiB      30      29
pgrease.cgi       03-May-2022  3.3 KiB    136     83
plalign.cgi       03-May-2022  4.3 KiB    190     121
plalign.htm       03-May-2022  2.6 KiB    97      86
pll.rsp           03-May-2022  58         2       1
plotsub.c         03-May-2022  1.9 KiB    110     93
prdf.1            03-May-2022  3.4 KiB    118     115
prdf.c            03-May-2022  29.7 KiB   1,261   1,020
prot.mat          03-May-2022  1 KiB      32      29
prss.1            03-May-2022  3.4 KiB    113     110
prss.c            03-May-2022  15.1 KiB   668     550
ps_dispn.c        03-May-2022  9 KiB      434     351
ps_plotsub.c      03-May-2022  2.8 KiB    141     115
pscore.c          03-May-2022  2.3 KiB    120     96
qrhuld.aa         03-May-2022  914        16      15
qsubs.c           03-May-2022  554        31      22
qsubs.h           03-May-2022  162        12      8
randlib.c         03-May-2022  8.8 KiB    411     330
randseq.1         03-May-2022  958        46      42
randseq.c         03-May-2022  9.3 KiB    452     375
randtest.c        03-May-2022  239        15      9
readme.v15        03-May-2022  3 KiB      63      52
readme.v16        03-May-2022  5.2 KiB    132     101
readme.v17        03-May-2022  504        16      11
readme.v20        03-May-2022  3.7 KiB    130     81
readme.v20u4      03-May-2022  5.8 KiB    202     178
readme.v20u5      03-May-2022  1.7 KiB    57      37
readme.v20u6      03-May-2022  2.5 KiB    93      55
readme.v21u0      03-May-2022  1.6 KiB    56      34
relate.c          03-May-2022  8.3 KiB    388     317
release.v16       03-May-2022  1.3 KiB    50      30
release.v17       03-May-2022  726        29      16
res_stats.c       03-May-2022  11.7 KiB   534     458
revcomp.c         03-May-2022  2.8 KiB    133     99
rweibull.c        03-May-2022  4.6 KiB    186     114
scalesw2.c        03-May-2022  16 KiB     689     481
scalesws.c        03-May-2022  15.9 KiB   686     479
score_al.c        03-May-2022  9.4 KiB    415     338
simlib.h          03-May-2022  1.2 KiB    44      28
sindex.c          03-May-2022  9.5 KiB    470     372
ssearch.1         03-May-2022  6 KiB      219     211
ssearch.c         03-May-2022  39.2 KiB   1,702   1,466
test.seq          03-May-2022  133        4       3
test.sh           03-May-2022  549        23      21
tgrease.c         03-May-2022  5.1 KiB    214     176
time.c            03-May-2022  640        44      36
tldispn.c         03-May-2022  6 KiB      344     283
tplotsub.c        03-May-2022  2.7 KiB    150     126
translate.c       03-May-2022  2.4 KiB    119     94
ttdispn.c         03-May-2022  7.4 KiB    403     336
uascii.gbl        03-May-2022  1.6 KiB    50      43
upam.gbl          03-May-2022  10.8 KiB   332     299
uwgetaa.c         03-May-2022  14.8 KiB   584     469
vmsgeten.c        03-May-2022  1 KiB      43      40
xurt8c.aa         03-May-2022  292        6       5
zs_exp.c          03-May-2022  1.1 KiB    49      29
zxlgmata.c        03-May-2022  6.7 KiB    339     251
zzgmata.gbl       03-May-2022  324        15      11
zzlgmata.c        03-May-2022  12 KiB     551     428

README.versions

May 13, 1997

Version overview

Currently, fasta2u65.shar.Z is the latest complete fasta package,
and it has a complete set of searching programs (fasta, ssearch,
fastx, etc).  However, the searching programs are now in maintenance
mode (bug fixes only).

The fasta3 series, which has ONLY the searching programs, has the
latest versions of the algorithms and statistical methods.  fasta3
also runs the exact same functions threaded (fasta3, fasta3_t) and in
parallel using PVM.

Here is a list of the programs, and where they can be found:

program          fasta2   fasta3                replaced by

fasta            yes      fasta3, fasta3_t
ssearch          yes      ssearch3, ssearch3_t
tfasta           yes      tfasta3, tfasta3_t    (tfastx3 preferred)
fastx            yes      fastx3, fastx3_t
tfastx           yes      tfastx3, tfastx3_t
rdf2 (obsolete)  no       no                    prdf2
rss  (obsolete)  no       no                    prss
prdf2            yes      no
prss             yes      no
lfasta           yes      no
lalign           yes      no
plalign          yes      no
flalign          yes      no
align            yes      no
align0           yes      no
randseq          yes      no
crandseq         yes      no
aacomp           yes      no
bestscor         yes      no
fromgb           yes      no
grease           yes      no
tgrease          yes      no
garnier          yes      no
zs_exp           yes      no

readme.v15

Changes with version 1.5

	FASTA version 1.5 includes a number of substantial revisions
to improve the performance and sensitivity of the program. Two
changes are apparent.  It is now possible to tell the program to
optimize all of the init1 scores greater than a threshold.  The
threshold is set at the same value as the old FASTA cutoff score
(approximately 0.5 standard deviations above the mean for average
length sequences).  For highest sensitivity, you can use the -c option
to set the threshold to 1.  (This will slow the search down about
5-fold).  In addition, you can tell FASTA to sort the results by the
"init1" score, rather than the "initn" score, by using the "-1"
option.  FASTA -1 ... will report the results the way the older FASTP
program did.

	A new method has been provided for selecting libraries. In the
past, one could enter the name of a sequence file to be searched or a
single letter that would specify a library from the list included in
the $FASTLIBS file. Now, you can specify a set of library files with a
string of letters preceded by a '%'.  Thus, if the FASTLIBS file has
the lines:

	Genbank 64 primates$1P/seqlib/gbpri.seq
	Genbank 64 rodents$1R/seqlib/gbrod.seq
	Genbank 64 other mammals$1M/seqlib/gbmam.seq
	Genbank 64 vertebrates $1B/seqlib/gbvrt.seq

then the string "%PRMB" would tell FASTA to search the four libraries
listed above.  The %PRMB string can be entered either on the command
line or when the program asks for a filename or library letter.
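
The library-selection rule above can be sketched in Python.  This is
only an illustration, not the package's own code: the function names
are hypothetical, and the entry layout (description, '$', one digit,
one library letter, then the path) is inferred from the four sample
lines rather than from a format specification.

```python
def parse_fastlibs(lines):
    """Map each one-letter library code to (description, path).

    Assumes entries look like 'description$<digit><letter><path>',
    as in the four sample lines; anything else is skipped.
    """
    libs = {}
    for line in lines:
        desc, sep, rest = line.strip().partition("$")
        if not sep or len(rest) < 3 or not rest[0].isdigit():
            continue
        letter, path = rest[1], rest[2:]
        libs[letter] = (desc.strip(), path)
    return libs


def expand_selection(selection, libs):
    """Expand a '%PRMB'-style string into library paths, in order."""
    if not selection.startswith("%"):
        raise ValueError("selection must start with '%'")
    return [libs[letter][1] for letter in selection[1:]]


FASTLIBS = [
    "Genbank 64 primates$1P/seqlib/gbpri.seq",
    "Genbank 64 rodents$1R/seqlib/gbrod.seq",
    "Genbank 64 other mammals$1M/seqlib/gbmam.seq",
    "Genbank 64 vertebrates $1B/seqlib/gbvrt.seq",
]
libs = parse_fastlibs(FASTLIBS)
for path in expand_selection("%PRMB", libs):
    print(path)
```

A letter that is not listed in the file raises KeyError in this
sketch; how FASTA itself reports that case is not stated here.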

	FASTA1.5 also provides additional flexibility for specifying
the number of results and alignments to be displayed with the -Q
(quiet) option.  The "-b number" option allows you to specify the number of
sequence scores to show when the search is finished.  Thus

	FASTA -b 100 ...

would tell the program to display the top 100 sequence scores. In the
past, if you displayed 100 scores (in -Q mode), you would also have
to store 100 alignments. The "-d" option allows you to limit the number
of alignments shown.  FASTA -b 100 -d 20 would show 100 scores and 20
alignments.

	The old "CUTOFF" parameter is no longer used.  The program
stores the best 2000 (IBM-PC, MAC) or 6000 (Unix, VMS) scores and then
throws out the lowest 25%, stores the next 500 (1500) that are better
than the threshold determined when the first scores were discarded, and
repeats the process as the library is scanned.  As a result, the best
1500 - 2000 (4500 - 6000) scores are saved.  The old cut-off parameter was
also used to set the joining threshold for the calculation of the
initn score from initial regions.  This joining threshold can now be
set with the -g option or the GAPCUT parameter.
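
The score-saving scheme described above can be sketched as follows.
This is an illustration under stated assumptions, not the code in the
package: keep_best and its parameters are hypothetical names, and
scores are plain numbers rather than the records FASTA keeps.

```python
import random


def keep_best(scores, capacity=2000, discard_fraction=0.25):
    """Keep roughly the best `capacity` scores seen while scanning.

    When the buffer fills, the lowest `discard_fraction` is thrown
    out and the lowest surviving score becomes the threshold that
    later scores must beat to be stored.
    """
    keep_after_prune = capacity - int(capacity * discard_fraction)
    kept = []
    threshold = None
    for score in scores:
        if threshold is not None and score <= threshold:
            continue  # not better than the current threshold
        kept.append(score)
        if len(kept) == capacity:
            kept.sort(reverse=True)
            del kept[keep_after_prune:]
            threshold = kept[-1]
    return sorted(kept, reverse=True)


rng = random.Random(1)
library_scores = rng.sample(range(100000), 20000)  # distinct toy scores
saved = keep_best(library_scores)
print(len(saved), saved[0])
```

With the default capacity of 2000, the sketch ends with between 1500
and 2000 saved scores, which matches the range quoted in the text.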

	Finally, FASTA can write a complete list of all of the
sequences and scores calculated to a file with the "-r" (results)
option.  FASTA -r results.out ... creates a file with a list of scores
for every sequence in the library.  The list is not sorted, and only
includes those scores calculated during the initial scan of the
library (the optimized score is not calculated unless the -o option is
used).

readme.v16

Changes with 1.6c31a

	(August, 1993) Released support for NCBI SEARCH and
	BLASTP/BLASTN formats.

	(November, 1993) Changes to nxgetaa.c to accommodate changes in
	the EMBL library format. Changes to ncbl_lib.c to work on DNA
	sequences.

Changes with 1.6c24

	(December 1992)  Added -e option for more selective scores.

	(April 1993) Added #define SUPERFAMNUM for genpept.fasta
	users.  By default, superfamily numbers are not returned from
	fasta format (libtype=1) files.

	(May 1993) Changed window shuffle routine in rdf2, rss, to
	preserve locality of shuffle.

Changes with version 1.6b

	FASTA version 1.6b uses a new method for calculating optimal
scores in a band (the optimization or last step in the FASTA
algorithm). In addition, it uses a linear-space method for calculating
the actual alignments.  The FASTA package also includes four new
programs:

	ssearch		a program to search a sequence database using
			the rigorous Smith-Waterman algorithm (this
			program is about 100-fold slower than FASTA
			with ktup=2 for proteins).

	rss		a version of rdf2 that uses a rigorous
			Smith-Waterman calculation to score
			similarities.

	lalign		a rigorous local sequence alignment program
			that will display the N-best local alignments
			(N=10 by default).

	plalign		a version of lalign that plots the local alignments.


	The lalign/plalign programs incorporate the "sim" algorithm
described by Huang and Miller (1991) Adv. Appl. Math. 12:337-357.
The ssearch and rss programs incorporate algorithms described by
Huang, Hardison, and Miller (1990) CABIOS 6:373-381.

	Two new command line options are available:

	-n	indicates that the query file is a nucleotide
		sequence.  This option can be very useful when
		searching with consensus regulatory sequences.

	-x "off1 off2"  allows you to specify an offset for the
		beginning of a DNA or protein sequence.  For example,
		if you are comparing upstream regions for two genes, and
		the first sequence contains 500 nt of upstream
		sequence while the second contains 300 nt of upstream
		sequence, you might try:

		fasta -x "-500 -300" seq1.nt seq2.nt

		This option will not work properly with the translated
		library sequence with tfasta.

		(You should double check to be certain the negative
		numbering works properly.)

readme.v17

Changes with 1.7

	(February 1994) Replaced rdf2 and rss with prdf, prss, which
	calculate informative score probabilities by fitting the
	distribution of shuffled scores to an extreme value
	distribution.  The curve fitting routines in rweibull.c were
	provided by Phil Green, Washington U., St. Louis.
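
The idea of fitting shuffled scores to an extreme value distribution
can be sketched as below.  This is not the rweibull.c fitting code:
it swaps in a simple method-of-moments Gumbel fit, and the scoring
function, sequences, and names are all hypothetical stand-ins chosen
only to make the example self-contained.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def toy_local_score(a, b):
    """Best ungapped segment over all diagonals (+2 match, -1 mismatch)."""
    best = 0
    for offset in range(-(len(a) - 1), len(b)):
        run = 0
        for i in range(len(a)):
            j = i + offset
            if 0 <= j < len(b):
                run = max(0, run + (2 if a[i] == b[j] else -1))
                best = max(best, run)
    return best


def fit_gumbel(scores):
    """Method-of-moments estimate of Gumbel location mu and scale beta."""
    beta = statistics.stdev(scores) * math.sqrt(6.0) / math.pi
    if beta <= 0:
        raise ValueError("scores have no spread; cannot fit")
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """P(S >= score) under the fitted extreme value distribution."""
    return 1.0 - math.exp(-math.exp(-(score - mu) / beta))


rng = random.Random(0)
query = "PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF"
target = "KLGLDFPNLPYLIDGSHKITQSNAILRYLARKHHLDGETEEERIRADIVE"
null_scores = []
for _ in range(200):
    residues = list(target)
    rng.shuffle(residues)  # shuffle keeps composition, destroys order
    null_scores.append(toy_local_score(query, "".join(residues)))
mu, beta = fit_gumbel(null_scores)
observed = toy_local_score(query, target)
print(observed, p_value(observed, mu, beta))
```

A small probability suggests the unshuffled score is unlikely to
arise from composition alone; with a toy score like this one the
number is only illustrative.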

	Added "-i" switch to reverse-complement the query sequence if it is
	DNA.

	Fix bug in zzlgmata.c that caused problems with alignments.

	Fix histogram routine to work properly with ln() normalized
	scores.

readme.v20

Changes with 2.0  (March, 1995)

	WARNING - Optimization is now turned on by default.  The
	meaning of the "-o" option has been reversed.  "-o" now turns
	off optimization, reverting to the earlier method of sorting
	by "initn" scores.

	Change default protein matrix to BLOSUM50.  PAM250 is
	still available with -s 250.  Change program to accept gap
	penalties from the command line with "-f" (-12) and "-g" (-2).

	Provide MARKX=4, which allows one to display the conserved
	regions of the query sequence after a library search.

	Calculate explicit probability estimates for FASTA, TFASTA,
	and SSEARCH.  Estimates assume that the library contains a large
	number of unrelated sequences.  If this is not correct, the
	estimates are useless (and should be turned off with the -z
	flag).

	The width of the band used to calculate optimized scores is
	now variable.  For proteins and ktup=1, 32 residues are used,
	otherwise 16 residues are used.  For DNA, 16 residues are
	used. This value can be changed with the "-y" option.

	FASTA alignments now use the Smith-Waterman algorithm; there
	is no longer a limit on gap size for FASTA alignments.

	Fixed a rare bug in lalign/plalign for low gap penalties.

	Fixed lfasta to read one letter filenames in second position.

April 5, 1995

	Fixed bug in blast-format file reading to treat sequences that do
	not end in "*" properly.

May, 1995

	More accurate display of the expected value histogram.  The
	quality of the fit is now quantitated with the
	Kolmogorov-Smirnov statistic.

	DNA match/mismatch penalties changed to +5/-4.

	An expectation threshold (-E) is provided for displaying
	scores.

July, 1995

	Corrected a very serious bug in ssearch E()-score calculation
	for large databases.

	Corrected a minor problem with histogram scaling.

	Removed Kolmogorov-Smirnov statistic if histogram not shown.

	Show correct scoring matrix if specified matrix is not
	found.

August, 1995

	Some corrections so that "-z" flag works properly and statistical
	calculations fall back properly when no distribution of lengths
	is available.

2.0x3	Change default DNA and TFASTA alignments to older band-limited
	Smith-Waterman rather than full Smith-Waterman.  Now DNA
	sequence searches are as fast as before (with Smith-Waterman
	alignments, they were often 50 times slower).  Full Smith-Waterman
	alignments are available with the "-A" option.

	Small changes in the way that memory is allocated for
	alignments in FASTA, TFASTA, LFASTA/PLFASTA, and SSEARCH.

	The DOS/BorlandC and UNIX versions have been merged.  All
	files necessary for compilation on DOS/WinNT are included.

	Added -O option to FASTA, TFASTA, LFASTA, PRSS, PRDF, LALIGN, ALIGN
	to specify output file.

2.0u3	Merge of Mac FASTA with Win/DOS, Unix FASTA to a single set of files.

Sept, 1995

	Add -Q option to prss, prdf.  Fix bug in -O option for those
	programs.

	Allow longer lengths for filenames.  Use QFILE_SIZE and LFILE_SIZE
	to define lengths for query and library file names (40, 80 for
	microcomputers, 256 for Unix).

November, 1995

	Fix bug in nxgetaa.c that prevented reading multiple
	blast-formatted files.

February, 1996

	See readme.v20u4 for more information.

	Added -m 10 option for parseable output.

	Added library_type 6 for GCG formatted files.

	Added -L option for long descriptions of library sequences.

	"randseq" random shuffle program now available.

March, 1996

	Modified nxgetaa for 12 character locus names.

	Fixed a bug in lfasta that appears with very long sequences.

April, 1996

	Make certain '-z' option really works (required for libraries with
	sequences < 10 aa).

	Removed duplicate sw_score: in ssearch with -m 10.

	Added -DPROGRESS to report progress of search with "....".

Mar, 1996

	Added "fastx", see readme.v20u5.  The old "-h" option is now "-H".

readme.v20u4

Changes with 2.0u4 (February, 1996)

Added '-L' option, which provides a longer description of the library
sequence.

Fixed a bug in the -m 10 parseable output.

Support is now provided for version 8.0 GCG libraries, both protein
and DNA. Use library type 6.

Changes with 2.0x4  (January, 1996)

The major change with 2.0x4 is the ability to get a parseable
output from FASTA/TFASTA/SSEARCH.  This can be done using output
option -m 10.  With -m 10, the initial histogram and list of best
scores are unchanged, but the alignments are now in a parseable form:

>>>mgstm1.aa, 217 aa vs s library
; pg_name: FASTA
; pg_ver: version 2.0x4 Jan., 1996
; pg_matrix: BLOSUM50
; pg_gap-pen: -12 -2
; pg_ktup: 1
; pg_optcut: 30
; pg_cgap: 42
>>GTB1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2.5.1.18
; fa_initn: 1490
; fa_init1: 1490
; fa_opt: 1490
; fa_z-score: 1916.0
; fa_expect:      0
; sw_score: 1490
; sw_ident: 1.000
; sw_overlap: 217
>GT8.7  ..
; sq_len: 217
; sq_type: p
; al_start: 1
; al_stop: 217
; al_display_start: 1
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPYLIDGSHKITQSNAILRYLARKHHLDGETEEERIRADIVE
NQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKRPWFAGD
KVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKSS
RYIATPIFSKMAHWSNK
>GTB1_MOUSE ..
; sq_len: 217
; sq_type: p
; al_start: 1
; al_stop: 217
; al_display_start: 1
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPYLIDGSHKITQSNAILRYLARKHHLDGETEEERIRADIVE
NQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKRPWFAGD
KVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKSS
RYIATPIFSKMAHWSNK
>>GT28_SCHJA GLUTATHIONE S-TRANSFERASE 28 KD (EC 2.5.1.18
; fa_initn: 190
; fa_init1: 97
; fa_opt: 169
; fa_z-score: 217.9
; fa_expect: 1.1e-05
; sw_score: 169
; sw_ident: 0.277
; sw_overlap: 228
>GT8.7  ..
; sq_len: 217
; sq_type: p
; al_start: 4
; al_stop: 180
; al_display_start: 1
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
KLGLDFPNLPY--LID--GSHK-ITQSNAILRYLARKHHLDGETEEERIR
ADIVENQVMDTRMQLIMLCYNPDFEKQK--PEFLK-TIPEKMKLYSEFLG
KRP--WFAGDKVTYVDFLAYDILDQYRMFEPKCLDA-FPNLRDFLARFEG
LKKISAYMKSSRYIATPIFSKMAHWSNK
>GT28_SCHJA ..
; sq_len: 206
; sq_type: p
; al_start: 3
; al_stop: 180
; al_display_start: 1
-VKLIYFNGRGRAEPIRMILVAAGVEFEDERIEFQDWP----------KI
KPTIPGGRLPIVKITDKRGDVKTMSESLAIARFIARKHNMMGDTDDEYYI
IEKMIGQVEDVESEYHKTLIKPPEEKEKISKEILNGKVPILLQAICETLK
ESTGNLTVGDKVTLADVVLIASIDHITDLDKEFLTGKYPEIHKHRKHLLA
TSPKLAKYLSERHATAF
>>GT2_DROME GLUTATHIONE S-TRANSFERASE 2 (EC 2.5.1.18).
; fa_initn: 124
; fa_init1: 124
; fa_opt: 164
; fa_z-score: 210.1
; fa_expect: 2.9e-05
; sw_score: 164
; sw_ident: 0.248
; sw_overlap: 251
>GT8.7  ..
; sq_len: 217
; sq_type: p
; al_start: 4
; al_stop: 198
; al_display_start: 1
---------------------------PMILGYWNVRGLTHPIRMLLEYT
DSSYDEKRYTMGDAPDFDRSQWLNEKFKLGLDFPNLPYL-IDGSHKITQS
NAILRYLARKHHLDGETEEERIRADIVENQVMDTRMQLIMLCYNPDFEKQ
KPEFLKTIPEKMKLYSEFLGKR-----PWFAGDKVTYVDFLAYDILDQYR
-MFEPKCLDAFPNLRDFLARFEGLKKISAYMKSSRYIATPIFSKMAHWSN
K
>GT2_DROME ..
; sq_len: 247
; sq_type: p
; al_start: 52
; al_stop: 240
; al_display_start: 22
PPAEGAEGAVEGGEAAPPAEPAEPIKHSYTLFYFNVKALPSPC------A
TCSDGNQEYE--DVAHPRRVPALKPTMPMG----QMPVLEVDGK-RVHQS
ISMARFLAKTVGLCGATPWEDLQIDIVVDTINDFRLKIAVVSYEPEDEIK
EKKLVTLNAEVIPFYLEKLEQTVKDNDGHLALGKLTWADVYFAGITDYMN
YMVKRDLLEPYPAVRGVVDAVNALEPIKAWIEKRPVTEV

Note that the parseable output starts with ">>>" and that each
alignment record starts with ">>" while each aligned sequence record
starts with ">".

All parameters produced by the fasta package will be of the form:

	; xx_yyyyy

In this version, the xx prefixes are:

	pg - program parameters (name, version, matrix)
	fa - fasta scores, expect values, etc.
	sw - Smith-Waterman scores, expect values.
	sq - sequence length, type
	al - alignment start, stop, display_offset

Other FASTA distributors may choose to add additional fields.  If they
do, they should use a tag with more than two characters, e.g.:

	ebi_access:
or
	gcg_?????

The FASTA tags will be limited to two characters followed by a "_".
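
A reader for this record structure might be sketched in Python as
follows.  This is a minimal illustration, not a tool shipped with the
package: parse_m10 and the dictionary layout are hypothetical, the
sample is an abridged copy of the GT28_SCHJA record, and all values
are kept as strings.

```python
def parse_m10(text):
    """Parse -m 10 style output into nested dictionaries.

    '>>>' starts the query block, '>>' an alignment record, '>' an
    aligned sequence record; '; key: value' lines attach to whichever
    record was opened last, and other lines are sequence text.
    """
    result = {"query": None, "params": {}, "hits": []}
    target = result["params"]
    seq = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">>>"):
            result["query"] = line[3:].strip()
            target, seq = result["params"], None
        elif line.startswith(">>"):
            hit = {"title": line[2:].strip(), "params": {}, "sequences": []}
            result["hits"].append(hit)
            target, seq = hit["params"], None
        elif line.startswith(">"):
            seq = {"name": line[1:].split()[0], "params": {}, "residues": ""}
            result["hits"][-1]["sequences"].append(seq)
            target = seq["params"]
        elif line.startswith(";"):
            key, _, value = line[1:].partition(":")
            target[key.strip()] = value.strip()
        elif seq is not None:
            seq["residues"] += line
    return result


SAMPLE = """\
>>>mgstm1.aa, 217 aa vs s library
; pg_name: FASTA
; pg_matrix: BLOSUM50
>>GT28_SCHJA GLUTATHIONE S-TRANSFERASE 28 KD (EC 2.5.1.18
; fa_opt: 169
; sw_ident: 0.277
>GT8.7  ..
; sq_len: 217
; al_start: 4
PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKF
>GT28_SCHJA ..
; sq_len: 206
; al_start: 3
-VKLIYFNGRGRAEPIRMILVAAGVEFEDERIEFQDWP----------KI
"""

result = parse_m10(SAMPLE)
for hit in result["hits"]:
    print(hit["title"], hit["params"]["fa_opt"])
```

Because the prefixes nest, the sketch tests for ">>>" before ">>"
and ">>" before ">".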

All of the output parameters correspond to values that are presented
in other FASTA output formats, with the exception of the "al_"
parameters.

al_start gives the location of the alignment start in the
	original sequence

al_stop gives the location of the end of the alignment in the
	original sequence

al_display_start
	gives the location of the first displayed amino acid residue
	in the original sequence.  The -m 10 alignments are the same
	as those produced in the other modes. In particular,
	FASTA/SSEARCH provide some context for the alignment; if the
	"-a" option is not used, FASTA/SSEARCH will try to provide
	about 30 residues on either side of the actual local
	alignment, if the alignment is in the middle of one or the other
	sequence.  If the beginning of the query sequence aligns with
	the 10'th residue of the library sequence, then the query
	sequence will be padded with nine leading "-" to produce the
	alignment.  The leading '-' are a formatting convenience only;
	they are not considered in the numbering system for
	al_display_start, al_start, or al_stop.

	Thus:

	>GT8.7  ..
	; sq_len: 217
	; sq_type: p
	; al_start: 3
	; al_stop: 180
	; al_display_start: 1
	---PMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLN
	EKFKLGLDFPNLPYLIDGSHKITQSNAILRYLARKHH---LDGETEEERI
	RADIVENQVMDTRMQLIMLCYNPDFEKQKPEFLKTIPEKMKLYSEFLGKR
	PWFAGDKVTYVDFLAYDILDQYRMFEPKCLDA------FPNLRDFLARFE
	GLKKISAYMKSSRYIATPIFSKMAHWSNK
	>ARP2_TOBAC ..
	; sq_len: 223
	; sq_type: p
	; al_start: 6
	; al_stop: 181
	; al_display_start: 1
	MAEVKLLGFW-YSPFSHRVEWALKIKGVKYE---YIEEDRD--NKSSLLL
	QSNPV---YKKVPVLIHNGKPIVESMIILEYIDETFEGPSILPKDPYDRA
	LARFWAKFLDDKVAAVVNTFFRKGEEQEKGK--EEVYEMLKVLDNELKDK
	KFFAGDKFGFADIAANLVGFWLGVFEEGYGDVLVKSEKFPNFSKWRDEYI
	NCSQVNESLPPRDELLAFFRARFQAVVASRSAPK

	This says that, to align the two sequences, the first 'P' of GT8.7
	must line up with the first 'V' (residue 4) in ARP2_TOBAC, but that
	the actual best local alignment starts with the first 'I' in
	GT8.7 and the first 'L' in ARP2_TOBAC.
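
The numbering rule can be checked with a short Python sketch.  The
helper name residue_numbers is hypothetical; the two strings are the
first 26 display columns of the example above.

```python
def residue_numbers(aligned, display_start):
    """Residue number shown in each display column, or None for '-'.

    Dashes (leading padding or gaps) are skipped when counting, so
    the first real residue gets the number `display_start`.
    """
    numbers = []
    next_number = display_start
    for ch in aligned:
        if ch == "-":
            numbers.append(None)
        else:
            numbers.append(next_number)
            next_number += 1
    return numbers


query = "---PMILGYWNVRGLTHPIRMLLEYT"    # GT8.7, al_display_start 1
library = "MAEVKLLGFW-YSPFSHRVEWALKIK"  # ARP2_TOBAC, al_display_start 1

q_numbers = residue_numbers(query, 1)
l_numbers = residue_numbers(library, 1)

q_col = q_numbers.index(3)  # column holding al_start of GT8.7
l_col = l_numbers.index(6)  # column holding al_start of ARP2_TOBAC
print(q_col, query[q_col], l_col, library[l_col])
```

Both al_start values land in the same display column, which is the
consistency the text describes.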

readme.v20u5

**Changes with release 20u5 - May 1996

This version of the FASTA package includes FASTX - a program that
compares a DNA sequence with a protein sequence library by translating
the DNA sequence in three frames and finding the best match, with
frame-shifts, between the translated DNA and protein sequence.
(Unlike BLASTX, FASTX only does a three-frame translation.  To search
all six frames, do a second search with the "-i" option.)
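
Three-frame translation can be illustrated with a short Python
sketch.  It is not the faatran.c code: it assumes the standard
genetic code, the function names are hypothetical, and codons
containing anything other than A, C, G, or T are shown as 'X', in the
spirit of the note for release 20u64.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, indexed by base order T, C, A, G.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate one forward frame; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def three_frames(dna):
    """Translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


def reverse_complement(dna):
    """Reverse complement, giving the strand searched with "-i"."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


dna = "ATGGCCTAA"
print(three_frames(dna))                      # forward strand
print(three_frames(reverse_complement(dna)))  # other three frames
```

Calling three_frames on the reverse complement corresponds to the
second search suggested above for covering all six frames.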

The code for aligning a three-frame protein sequence with a normal
protein sequence was provided by Zheng Zhang and W. Miller of the
Pennsylvania State University.

A third gap parameter, the frameshift penalty, is provided with the
'-h' option.  The default gap penalties are -15 for the first residue
in a gap, -3 for each additional residue, and -30 for a frameshift.
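
As a worked example of these defaults, the Python sketch below adds
up the penalties for a set of gaps and frameshifts.  The function
name and argument layout are hypothetical; only the three default
values come from the text.

```python
def fastx_gap_penalty(gap_lengths, frameshifts=0,
                      first=-15, extend=-3, frameshift=-30):
    """Total penalty for the given gap lengths and frameshift count.

    Each gap costs `first` for its first residue plus `extend` for
    every additional residue; each frameshift costs `frameshift`.
    """
    total = 0
    for length in gap_lengths:
        if length > 0:
            total += first + extend * (length - 1)
    return total + frameshift * frameshifts


print(fastx_gap_penalty([4]))        # one gap of four residues
print(fastx_gap_penalty([1, 3], 1))  # two gaps plus one frameshift
```

A four-residue gap costs -15 + 3 * (-3) = -24, so a single
frameshift (-30) is priced slightly above it.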

The '-h' option was previously used to suppress the histogram display;
that option is now invoked with '-H'.

Much of the FASTX code is new and has not been tested nearly as
extensively as the other fasta programs. Please inform me of bugs as
you find them.

================
**Changes with release 20u51 - June, 1996

Fixes to showalign for SHOWALL.

Fixes to routines that read fasta format files for long DNA sequences.

================
**Changes with release 20u52 - July, 1996

Fixes to lalign/plalign for setting gap penalties on DNA.

Fixes to fastx to correct bug in alignment routine.

================
**Changes with release 20u53 - July, 1996

Another fix to fastx.

Added flalign, a version of plalign that produces a GCG fig file for
local alignment graphics.

Increased the size of sequences that can be aligned by lalign to
120,000 residues with BIGMEM.

First release of Mac version with FASTX.

================

Bill Pearson
wrp@virginia.edu

readme.v20u6

** Changes with release 20u67 - May, 1999

Corrected serious problem with fastx alignments.

** Changes with release 20u66 - September, 1998

Made plalign, plfasta, and psgrease usable with WWW pages.

plalign and plfasta now generate PostScript graphics, rather
than Tektronix graphics.

psgrease makes PostScript Kyte-Doolittle plots.

Provide various *.htm and *.cgi files to implement WWW pages for
lalign, plalign, grease, chofas, garnier.

Updated grease (tgrease), chofas, and garnier for consistent
user interface.

** Changes with release 20u65 - May 1998

Various minor bug tweaks to the fastx function, faatran.c, and other
programs associated with fastx.

** Changes with release 20u64 - May, 1998

Programs have been modified to accept query sequences from STDIN for
WWW interfaces.  prss, prdf, lalign, and plalign should accept input
from STDIN. This makes it relatively easy to set up a prss WWW site.

The translation routine used by FASTX has been modified to translate
ambiguous nucleotides as 'X'.

Problems with specifying gap penalties with DNA sequences have been
corrected in prss.

**Changes with release 20u6 - August 1996

Another new program - TFASTX - compares a protein sequence to a
translated, potentially frameshifted, DNA library.  TFASTX is a
substantial improvement over TFASTA, although TFASTX is slower.

The LALIGN/PLALIGN family now includes FLALIGN, which will write out
alignment plots in GCG's Figure format.

Mac version now uses System 7 Standard File routines.

BLOSUM62 support finally included.

** September, 1996

Fixed another bug in fastx/tfastx.

** September, 1996

Fixed problem with query subsequence selection.

Fixed problem with selectbestz().

** September 23, 1996

Fixed a problem with -m 10 Smith-Waterman alignments, pointed
out and corrected by Erik Wallen (erikw@biokemi.su.se).

**Changes with release 20u61 - November, 1996

Made corrections to fffasta.c, nxgetaa.c to support alternative
protein scoring matrices with fastx.

**Changes with release 20u62 - December, 1996

A fix to nxgetaa.c to allow -i with fastx (bug caused by 20u61
changes).

**Changes with release 20u63 - December, 1996

Corrected some problems with lfasta using -m 10 (parseable) output.

**Changes with release 20u64 - September, 1997

Modified faatran.c so that fastx searches with many "X"'s are
translated to 'X', not 'K'.

Added -m 5 (MARKX), which combines -m 0 and -m 4.

Corrected problem with lalign/plalign/flalign and
lfasta/plfasta/flfasta when reverse complemented DNA sequences were
compared to the same file.


Bill Pearson
wrp@virginia.edu

readme.v21u0

** January, 2007

Modify align.c (used for global alignments) to use the same scoring
matrices, and other options, as lalign2.c.

Makefile has been modified so that "make all" only makes FASTA2 programs
that are not part of FASTA3.  The search programs are no longer made
by default.

** December, 2006

Modify fffasta.c to support the **pam2 global in upam.gbl, rather than
pam2[][] - **pam2 was introduced for lalign2.c/lsim3.c.

** August, October 2006, 21u08 (lalign2.c)

Make some efforts to remove global variables from lsim2.c, by
replacing it with lsim3.c.  Initial efforts in August, 2006,
introduced a bug, which was detected and fixed in October.

Provide option to show identical alignments.

** May, 2005,  21u07 (lalign.c)

Modify the code that checks for identical sequences to not assume
sequences are identical just because the filenames are.  They may
be different because of sub-setting.

Add -I option to show identical alignment.

Update lalign.1 documentation.

** April, 2004, 21u06 (lalign2.c)

Fix problem reading external scoring matrix files.  The file
was not read, and then sequences were not read properly.

Incorporate GAP_OPEN gap matrix options.

Changes to allow G:U RNA base matches.

** March, 2000, 21u02 (lalign.c)

Added '-N length' option to limit query, library sequences to
"length".  Corrected problems with sequence numbering when
subsequences were specified.

Modifications to nrand.c to keep more bits and return random numbers
from 0..n-1.  Use "nrandom.c" rather than nrand.c if random() is
available.

Fixes to shuffling routines in randlib.c.

** November, 2003

Add Blosum80 matrix to lalign.c, upam.gbl.
52
53** November, 2003
54
55Add Blosum80 matrix to lalign.c, upam.gbl.
56