FASTA TEST DATA FILES
=====================

This directory contains various data files for testing the
Fasta-related code in Biopython.

The following are examples of the common FASTA sequence file format,
originally introduced as the input file format for Bill Pearson's FASTA
tools. They are used for testing Bio.SeqIO and Bio.AlignIO (where the
format is called "fasta"), as well as older parts of Biopython such as
the Bio.Fasta module.

ID    Description
f001  1 protein sequence
f002  3 DNA sequences
f003  2 proteins, with comments
fa01  FASTA alignment
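
As a rough illustration of the record layout these files exercise, the
sketch below reads FASTA text into (title, sequence) pairs using only the
standard library. It is a hypothetical helper (read_fasta is not part of
Biopython), and it assumes, following the original Pearson convention,
that lines starting with ";" are comments to be skipped. The tests
themselves go through Bio.SeqIO.parse(handle, "fasta").

```python
def read_fasta(text):
    """Hypothetical sketch: return a list of (title, sequence) tuples.

    Assumes '>' starts a new record and that lines beginning with ';'
    are comments (old Pearson convention), which are skipped.
    """
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and ';' comment lines
        if line.startswith(">"):
            if title is not None:
                # Close off the previous record before starting a new one
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if title is not None:
        records.append((title, "".join(chunks)))
    return records
```

For example, read_fasta(">a x\nAC\nGT\n>b\nMK\n") gives
[("a x", "ACGT"), ("b", "MK")]: wrapped sequence lines are joined and the
text after ">" becomes the title.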

The following are example "machine readable" pairwise alignment
output files from the FASTA tools when using the -m 10 command
line option. These are for testing the Bio.AlignIO and Bio.SearchIO
code, where the format is called "fasta-m10".

output001.m10 - fasta35 protein-protein, 3 query sequences,
                no histogram, with expectation threshold
output002.m10 - fasta34 protein-protein, 3 query sequences,
                with offsets and word size, max 2 hits per query
output003.m10 - fasta34 protein-protein, 5 query sequences,
                very strict threshold so not all have hits.
output004.m10 - fasta35 nucleotide-nucleotide, 3 queries where
                only the middle one has a single hit.
output005.m10 - ssearch35 protein-protein, 3 queries where
                only the middle one has a single hit.
output006.m10 - fasta35 nucleotide-nucleotide, 1 query; in the
                alignment the query has been reversed.
output007.m10 - recreation of output001.m10 using fasta-36.3.4 (note that
                the histogram is now off by default, -H now turns it on, and
                the -Q quiet flag no longer exists). More hits are reported
                due to revised e-value calculations; also, ">>><<<" now marks
                the end of each query, not just the end of the file.
output008.m10 - tfastx36 protein-nucleotide, 4 queries, some with no hits,
                some matches with multiple HSPs (new feature in FASTA v36)
output009.m10 - fasta36, multiple DNA queries
output010.m10 - fasta36, single DNA query, no hits
output011.m10 - fasta36, single DNA query, each hit contains a single HSP
output012.m10 - fasta36, single DNA query, some hits contain multiple HSPs
output013.m10 - fasta36, multiple protein queries
output014.m10 - fasta36, single protein query, no hits
output015.m10 - fasta36, single protein query, each hit contains a single HSP
output016.m10 - fasta36, single protein query, some hits contain multiple HSPs
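
To give a feel for how the -m 10 files are organised, here is a
hypothetical standard-library sketch (summarize_m10 is not a Biopython
function) that counts the ">>" entries under each query. It assumes that
lines beginning with ">>>" introduce a query, that ">>><<<" is the
end-of-query marker, and that lines beginning with exactly ">>" introduce
a hit. Where a hit has several HSPs the count may not equal the number of
distinct hits; proper parsing is what Bio.SearchIO.parse(handle,
"fasta-m10") is for.

```python
def summarize_m10(text):
    """Hypothetical sketch: return (query_line, entry_count) tuples.

    Assumes '>>>' starts a query, '>>><<<' ends a query, and '>>'
    (but not '>>>') starts a hit entry. Single '>' lines (aligned
    sequences) and ';' metadata lines are ignored.
    """
    summary = []
    for line in text.splitlines():
        if line.startswith(">>><<<"):
            continue  # end-of-query marker, not a new query
        if line.startswith(">>>"):
            summary.append([line[3:].strip(), 0])
        elif line.startswith(">>") and summary:
            summary[-1][1] += 1  # one more '>>' entry for the current query
    return [(query, count) for query, count in summary]
```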