               *** ALF - Alignment Free Sequence Comparison ***
                       http://www.seqan.de/projects/alf
                                January, 2012

---------------------------------------------------------------------------
Table of Contents
---------------------------------------------------------------------------

  1.   Overview
  2.   Installation
  3.   Usage
  4.   Output Format
  5.   Example
  6.   Contact and Reference
  7.   Version History

---------------------------------------------------------------------------
1. Overview
---------------------------------------------------------------------------

ALF calculates the pairwise similarity of sequences using alignment-free
methods. All implemented methods are based on k-mer counts. More details
can be found in the online documentation of the alignment-free methods
(www.seqan.de). By default, ALF uses the N2 similarity measure.

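To make "based on k-mer counts" concrete, here is a minimal Python sketch
that counts overlapping k-mers and computes the classic, unnormalised D2
statistic (the sum over all words of the product of their counts in the two
sequences). It is an illustration only, not ALF's implementation: the
function names count_kmers and d2_score are made up for this example, and
the normalisation and background model used by N2, D2Star and D2z are not
shown.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping word of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def d2_score(seq_a, seq_b, k):
    """Unnormalised D2: sum over words of count_a(word) * count_b(word)."""
    counts_a = count_kmers(seq_a, k)
    counts_b = count_kmers(seq_b, k)
    # Counter returns 0 for words that are absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(count_kmers("ACGTACGT", 4))
    print(d2_score("ACGTACGT", "TTACGTAA", 4))
```
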
---------------------------------------------------------------------------
2. Installation
---------------------------------------------------------------------------

ALF is distributed with SeqAn - The C++ Sequence Analysis Library (see
http://www.seqan.de). To build ALF from Git do the following:

  1) git clone https://github.com/seqan/seqan.git
  2) mkdir -p build/Release
  3) cd build/Release
  4) cmake ../../seqan -DCMAKE_BUILD_TYPE=Release
  5) make alf
  6) ./apps/alf/alf --help

On success, the executable file alf has been built and a brief usage
description is printed.

For more information about retrieving SeqAn and prerequisites please visit

  https://www.seqan.de/getting-started/

---------------------------------------------------------------------------
3. Usage
---------------------------------------------------------------------------

For a short usage description of ALF, run alf -h or alf --help.

Usage: alf [OPTION]... -i <MULTI FASTA FILE>

ALF expects one DNA (multi-)FASTA file. The pairwise score is computed for
every pair of sequences in the file, and a matrix of pairwise scores is
returned. The default behaviour can be modified by specifying the following
options at the command line:

---------------------------------------------------------------------------
3.1. Main Options
---------------------------------------------------------------------------

  [ -i ],   [ --input-file ]

  Name of the multi-FASTA input file. Mandatory.

  [ -o ],   [ --output-file ]

  Name of the file to which the tab delimited matrix with pairwise scores
  will be written. Default: stdout.

  [ -m ],   [ --method ]

  Method that will be used for sequence comparison.
  Default: N2 [N2, D2, D2Star, D2z]

  [ -k ],   [ --k-mer-size ]

  Size of the k-mers that will be counted. Default: 4 [integer]

  [ -mo ],  [ --bg-model-order ]

  Order of background Markov model for N2, D2Star, D2z. Default: 1 [integer]

  [ -rc ],  [ --reverse-complement ]

  N2 only. Specify how the k-mer counts from the forward and reverse
  strand should be combined. By default, only the input sequence is used
  for the comparison. Select 'both_strands' to calculate the pairwise score
  using both strands of the input sequences. Default: input sequence
  only. ['both_strands','mean','min','max']

  [ -mm ],  [ --mismatches ]

  N2 only. Select -mm 1 to include all words with one mismatch in the
  k-mer neighbourhood. Default: Exact counts only [0,1]

  [ -mmw ], [ --mismatch-weight ]

  N2 only. Weight of counts for words with mismatches, only used in
  combination with -mm 1. Default: 0.1 [double]

  [ -kwf ], [ --k-mer-weights-file ]

  N2 only. Print k-mer weights for every sequence to this file.

  [ -v ],   [ --verbose ]

  Specify this option to print details on progress to the screen.

  [ -h ],   [ --help ]

  Display the help message.

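The idea behind the -mm and -mmw options listed above can be sketched in a
few lines of Python. The sketch enumerates all words that differ from a
k-mer at exactly one position and adds their counts, scaled by a mismatch
weight, to the exact count. This is one plausible reading of the option
descriptions and an assumption, not ALF's actual code; the names
one_mismatch_neighbourhood and weighted_count are hypothetical.

```python
from collections import Counter


def one_mismatch_neighbourhood(word, alphabet="ACGT"):
    """Return every word that differs from `word` at exactly one position."""
    neighbours = []
    for i, original in enumerate(word):
        for letter in alphabet:
            if letter != original:
                neighbours.append(word[:i] + letter + word[i + 1:])
    return neighbours


def weighted_count(word, counts, mismatch_weight):
    """Exact count plus down-weighted counts of all one-mismatch neighbours.

    Assumption: exact matches have weight 1 and every mismatch neighbour
    contributes `mismatch_weight` times its own count.
    """
    neighbour_total = sum(counts[n] for n in one_mismatch_neighbourhood(word))
    return counts[word] + mismatch_weight * neighbour_total


if __name__ == "__main__":
    counts = Counter({"ACG": 2, "ACT": 1, "TTT": 5})
    print(len(one_mismatch_neighbourhood("ACG")))
    print(weighted_count("ACG", counts, 0.5))
```

A DNA word of length k has 3k such neighbours, so the neighbourhood grows
linearly with k when only one mismatch is allowed.
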
---------------------------------------------------------------------------
4. Output Format
---------------------------------------------------------------------------

The program returns a (tab delimited) matrix with pairwise scores for all
sequences from the input FASTA file, for example:

  1         0.046   0.052
  0.046     1       0.992
  0.052     0.992   1

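For downstream analysis, the score matrix can be read back with a few lines
of Python. This is a minimal sketch that assumes the output contains only
numeric fields separated by tabs or spaces, as in the example above, with
no header row or sequence labels; parse_score_matrix is a hypothetical
helper name.

```python
def parse_score_matrix(text):
    """Parse whitespace-delimited rows of numbers into a list of float lists."""
    rows = []
    for line in text.splitlines():
        fields = line.split()  # splits on tabs and spaces alike
        if fields:  # skip empty lines
            rows.append([float(value) for value in fields])
    return rows


if __name__ == "__main__":
    example = "1\t0.046\t0.052\n0.046\t1\t0.992\n0.052\t0.992\t1\n"
    matrix = parse_score_matrix(example)
    # Row i, column j holds the score for sequences i and j of the input.
    print(matrix[1][2])
```
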
---------------------------------------------------------------------------
5. Example
---------------------------------------------------------------------------

These examples use the FASTA file "small.fasta" which can be found in
seqan/apps/alf/example/. Copy this file to the directory where you
execute alf.

(1) Run ALF with default settings on two sequences:

  ./alf -i small.fasta

Output:

  1           0.0463497
  0.0463497   1

(2) Calculate scores using N2 (-m N2), counting words of length 5 (-k 5) on
both strands (-rc both_strands), including words with one mismatch in the
word neighbourhood (-mm 1) with a weight of 0.5 (-mmw 0.5) and a background
Markov model of order 1 (-mo 1), writing the output to a file
(-o results.txt), and saving all k-mer weights to a file
(-kwf kmerWeights.txt):

  ./alf -m N2 -k 5 -mo 1 -rc both_strands -mm 1 -mmw 0.5 -i small.fasta \
      -o results.txt -kwf kmerWeights.txt

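When ALF is run from a script, for instance once per value of k, the
command line can be assembled programmatically. The following Python sketch
uses only options documented in section 3.1 and assumes the alf executable
is in the current directory; build_alf_command is a hypothetical helper,
not part of ALF or SeqAn.

```python
import shlex


def build_alf_command(input_file, output_file=None, method="N2", k=4,
                      executable="./alf"):
    """Assemble an argument list for alf from documented options."""
    command = [executable, "-m", method, "-k", str(k), "-i", input_file]
    if output_file is not None:
        command += ["-o", output_file]
    return command


if __name__ == "__main__":
    command = build_alf_command("small.fasta", output_file="results.txt", k=5)
    print(shlex.join(command))
    # To actually run it, pass the list to subprocess.run(command, check=True).
```

Passing the arguments as a list avoids shell quoting problems with file
names that contain spaces.
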
---------------------------------------------------------------------------
6. Contact and Reference
---------------------------------------------------------------------------

For questions or comments, contact:
  Jonathan Goeke <goeke@molgen.mpg.de>

Please reference the following publication if you use ALF or the N2 method
for your analysis:

  Jonathan Goeke, Marcel H. Schulz, Julia Lasserre, and Martin Vingron.
  Estimation of Pairwise Sequence Similarity of Mammalian Enhancers with
  Word Neighbourhood Counts. Bioinformatics (2012).

---------------------------------------------------------------------------
7. Version History
---------------------------------------------------------------------------

* 2012-07-17: Version 1.1
  - Updated ALF to use the new ArgumentParser for command line parsing.
  - Changed long parameter names to use --parameter-name instead of
    --parameterName.

* 2012-01-05: Version 1.0
  - Initial Release of ALF.
