README.md

**export2graphlan** is a conversion tool that produces both the annotation file and the tree file for GraPhlAn. In particular, the annotation file tries to highlight specific sub-trees by automatically deriving from the input file which nodes are important. The two output files of **export2graphlan** should then be passed to ``graphlan_annotate.py``, which attaches the derived annotations to the tree; finally, running ``graphlan.py`` produces the output image.
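
The three steps described above can be sketched as a list of command lines. This is only an illustration: it builds the commands without running them, the input file names are placeholders, the output names are borrowed from the EXAMPLES section, and the argument order for ``graphlan_annotate.py`` and ``graphlan.py`` is assumed from GraPhlAn's own documentation rather than stated in this README.

```python
import shlex


def build_pipeline(lefse_input, lefse_output, tree="tree.txt", annot="annot.txt",
                   annotated_tree="outtree.txt", image="outimg.png"):
    """Return the three workflow commands as argv lists (nothing is executed)."""
    return [
        # Step 1: derive the tree and annotation files from the LEfSe data.
        ["export2graphlan.py", "-i", lefse_input, "-o", lefse_output,
         "-t", tree, "-a", annot],
        # Step 2: attach the annotations to the tree (assumed GraPhlAn syntax).
        ["graphlan_annotate.py", "--annot", annot, tree, annotated_tree],
        # Step 3: render the annotated tree as an image (assumed GraPhlAn syntax).
        ["graphlan.py", annotated_tree, image],
    ]


if __name__ == "__main__":
    # Placeholder input names; replace them with your own LEfSe files.
    for argv in build_pipeline("lefse_input.txt", "lefse_output.txt"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each inner list can be handed to ``subprocess.run`` once the scripts are installed and on the system path.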

# PREREQUISITES #

**export2graphlan** requires the following additional libraries:

* pandas ver. 0.13.1 ([pandas](http://pandas.pydata.org/index.html))
* BIOM ver. 2.0.1 ([biom-format](http://biom-format.org), only if you have input files in BIOM format)
* SciPy ([scipy](http://www.scipy.org), required by hclust2)
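
A quick way to see whether these libraries are visible to your Python interpreter is sketched below. It is not part of **export2graphlan**; the import names are assumptions (in particular, that the biom-format package is imported as ``biom``), and the check does not verify version numbers.

```python
import importlib.util

# Display name -> import name. The import names are assumptions.
REQUIRED = {"pandas": "pandas", "SciPy": "scipy"}
OPTIONAL = {"BIOM": "biom"}  # only needed for input files in BIOM format


def missing_modules(modules):
    """Return the display names of the modules that cannot be located."""
    return [label for label, name in modules.items()
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label in missing_modules(REQUIRED):
        print("missing required library:", label)
    for label in missing_modules(OPTIONAL):
        print("missing optional library:", label)
```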

# INSTALLATION #

**export2graphlan** can be obtained using [Mercurial](http://mercurial.selenic.com/) and is available on Bitbucket: [export2graphlan repository](https://bitbucket.org/CibioCM/export2graphlan).

In a Unix environment, type:
```
#!bash

$ hg clone ssh://hg@bitbucket.org/CibioCM/export2graphlan
```
or, alternatively:
```
#!bash

$ hg clone https://hg@bitbucket.org/CibioCM/export2graphlan
```

This will download the **export2graphlan** repository into the local ``export2graphlan`` subfolder. You then have to add this subfolder to the system path, so that you can use **export2graphlan** from anywhere in your system:
```
#!bash

$ export PATH=`pwd`/export2graphlan/:$PATH
```
Adding the above line to your bash configuration file makes the path addition permanent. On Windows or macOS systems a similar procedure should be followed.
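
To confirm that the path change took effect, a check along these lines may help. It is a minimal sketch using only the Python standard library; note that ``shutil.which`` only reports files that are marked as executable.

```python
import shutil


def find_script(name="export2graphlan.py"):
    """Return the full path of `name` if it is an executable on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_script()
    print(location if location else "export2graphlan.py is not on PATH")
```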

# USAGE #
```
#!
usage: export2graphlan.py [-h] [-i LEFSE_INPUT] [-o LEFSE_OUTPUT] -t TREE -a
                          ANNOTATION [--annotations ANNOTATIONS]
                          [--external_annotations EXTERNAL_ANNOTATIONS]
                          [--background_levels BACKGROUND_LEVELS]
                          [--background_clades BACKGROUND_CLADES]
                          [--background_colors BACKGROUND_COLORS]
                          [--title TITLE] [--title_font_size TITLE_FONT_SIZE]
                          [--def_clade_size DEF_CLADE_SIZE]
                          [--min_clade_size MIN_CLADE_SIZE]
                          [--max_clade_size MAX_CLADE_SIZE]
                          [--def_font_size DEF_FONT_SIZE]
                          [--min_font_size MIN_FONT_SIZE]
                          [--max_font_size MAX_FONT_SIZE]
                          [--annotation_legend_font_size ANNOTATION_LEGEND_FONT_SIZE]
                          [--abundance_threshold ABUNDANCE_THRESHOLD]
                          [--most_abundant MOST_ABUNDANT]
                          [--least_biomarkers LEAST_BIOMARKERS]
                          [--discard_otus] [--internal_levels] [--sep SEP]
                          [--out_table OUT_TABLE] [--fname_row FNAME_ROW]
                          [--sname_row SNAME_ROW]
                          [--metadata_rows METADATA_ROWS]
                          [--skip_rows SKIP_ROWS] [--sperc SPERC]
                          [--fperc FPERC] [--stop STOP] [--ftop FTOP]
                          [--def_na DEF_NA]

export2graphlan.py (ver. 0.17 of 21th August 2014). Convert MetaPhlAn, LEfSe,
and/or HUMAnN output to GraPhlAn input format. Authors: Francesco Asnicar
(francesco.asnicar@gmail.com)

optional arguments:
  -h, --help            show this help message and exit
  --annotations ANNOTATIONS
                        List which levels should be annotated in the tree. Use
                        a comma separate values form, e.g.,
                        --annotation_levels 1,2,3. Default is None
  --external_annotations EXTERNAL_ANNOTATIONS
                        List which levels should use the external legend for
                        the annotation. Use a comma separate values form,
                        e.g., --annotation_levels 1,2,3. Default is None
  --background_levels BACKGROUND_LEVELS
                        List which levels should be highlight with a shaded
                        background. Use a comma separate values form, e.g.,
                        --background_levels 1,2,3
  --background_clades BACKGROUND_CLADES
                        Specify the clades that should be highlight with a
                        shaded background. Use a comma separate values form
                        and surround the string with " if it contains spaces.
                        Example: --background_clades "Bacteria.Actinobacteria,
                        Bacteria.Bacteroidetes.Bacteroidia,
                        Bacteria.Firmicutes.Clostridia.Clostridiales"
  --background_colors BACKGROUND_COLORS
                        Set the color to use for the shaded background. Colors
                        can be either in RGB or HSV (using a semi-colon to
                        separate values, surrounded with ()) format. Use a
                        comma separate values form and surround the string
                        with " if it contains spaces. Example:
                        --background_colors "#29cc36, (150; 100; 100), (280;
                        80; 88)"
  --title TITLE         If specified set the title of the GraPhlAn plot.
                        Surround the string with " if it contains spaces,
                        e.g., --title "Title example"
  --title_font_size TITLE_FONT_SIZE
                        Set the title font size. Default is 15
  --def_clade_size DEF_CLADE_SIZE
                        Set a default size for clades that are not found as
                        biomarkers by LEfSe. Default is 10
  --min_clade_size MIN_CLADE_SIZE
                        Set the minimum value of clades that are biomarkers.
                        Default is 20
  --max_clade_size MAX_CLADE_SIZE
                        Set the maximum value of clades that are biomarkers.
                        Default is 200
  --def_font_size DEF_FONT_SIZE
                        Set a default font size. Default is 10
  --min_font_size MIN_FONT_SIZE
                        Set the minimum font size to use. Default is 8
  --max_font_size MAX_FONT_SIZE
                        Set the maximum font size. Default is 12
  --annotation_legend_font_size ANNOTATION_LEGEND_FONT_SIZE
                        Set the font size for the annotation legend. Default
                        is 10
  --abundance_threshold ABUNDANCE_THRESHOLD
                        Set the minimun abundace value for a clade to be
                        annotated. Default is 20.0
  --most_abundant MOST_ABUNDANT
                        When only lefse_input is provided, you can specify how
                        many clades highlight. Since the biomarkers are
                        missing, they will be chosen from the most abundant
  --least_biomarkers LEAST_BIOMARKERS
                        When only lefse_input is provided, you can specify the
                        minimum number of biomarkers to extract. The taxonomy
                        is parsed, and the level is choosen in order to have
                        at least the specified number of biomarkers
  --discard_otus        If specified the OTU ids will be discarde from the
                        taxonmy. Default behavior keep OTU ids in taxonomy
  --internal_levels     If specified sum-up from leaf to root the abundances
                        values. Default behavior do not sum-up abundances on
                        the internal nodes

input parameters:
  You need to provide at least one of the two arguments

  -i LEFSE_INPUT, --lefse_input LEFSE_INPUT
                        LEfSe input data
  -o LEFSE_OUTPUT, --lefse_output LEFSE_OUTPUT
                        LEfSe output result data

output parameters:
  -t TREE, --tree TREE  Output filename where save the input tree for GraPhlAn
  -a ANNOTATION, --annotation ANNOTATION
                        Output filename where save GraPhlAn annotation

Input data matrix parameters:
  --sep SEP
  --out_table OUT_TABLE
                        Write processed data matrix to file
  --fname_row FNAME_ROW
                        row number containing the names of the features
                        [default 0, specify -1 if no names are present in the
                        matrix
  --sname_row SNAME_ROW
                        column number containing the names of the samples
                        [default 0, specify -1 if no names are present in the
                        matrix
  --metadata_rows METADATA_ROWS
                        Row numbers to use as metadata[default None, meaning
                        no metadata
  --skip_rows SKIP_ROWS
                        Row numbers to skip (0-indexed, comma separated) from
                        the input file[default None, meaning no rows skipped
  --sperc SPERC         Percentile of sample value distribution for sample
                        selection
  --fperc FPERC         Percentile of feature value distribution for sample
                        selection
  --stop STOP           Number of top samples to select (ordering based on
                        percentile specified by --sperc)
  --ftop FTOP           Number of top features to select (ordering based on
                        percentile specified by --fperc)
  --def_na DEF_NA       Set the default value for missing values [default None
                        which means no replacement]
```
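
To make the ``--internal_levels`` description more concrete, the sketch below shows one way of summing abundances from the leaves up to the root. It is an illustration, not the code used by ``export2graphlan.py``: the function name is hypothetical, and it assumes that the input holds leaf clades only, named with dot-separated levels as in the ``--background_clades`` example.

```python
from collections import defaultdict


def sum_up_internal(leaf_abundances, sep="."):
    """Add each leaf's abundance to every clade on its path from the root."""
    totals = defaultdict(float)
    for clade, value in leaf_abundances.items():
        levels = clade.split(sep)
        # Every prefix of the clade name is a node between the root and the leaf.
        for depth in range(1, len(levels) + 1):
            totals[sep.join(levels[:depth])] += value
    return dict(totals)


if __name__ == "__main__":
    # Made-up abundances, for illustration only.
    leaves = {
        "Bacteria.Firmicutes.Clostridia": 30.0,
        "Bacteria.Firmicutes.Bacilli": 10.0,
        "Bacteria.Actinobacteria": 5.0,
    }
    for clade, total in sorted(sum_up_internal(leaves).items()):
        print(clade, total)
```

Without summing up, the internal nodes (here ``Bacteria`` and ``Bacteria.Firmicutes``) would keep whatever abundance the input file assigns to them.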

*Note*: the last group of parameters (``Input data matrix parameters``) refers to the **DataMatrix** class contained in the [hclust2](https://bitbucket.org/nsegata/hclust2/overview) repository.
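
As a rough illustration of what the ``--fperc`` and ``--ftop`` pair describes, the sketch below ranks features by a percentile of their values and keeps the top ones. It is a pure-Python stand-in written for this README, not the **DataMatrix** code: the function names are hypothetical, every feature is assumed to have at least one value, and the linear interpolation used for the percentile may differ from what hclust2 does.

```python
def percentile(values, perc):
    """Percentile of a non-empty list, with linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * perc / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def top_features(table, fperc, ftop):
    """Return the names of the `ftop` features with the highest `fperc` percentile."""
    scores = {name: percentile(values, fperc) for name, values in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:ftop]


if __name__ == "__main__":
    # Made-up feature values across three samples.
    table = {"a": [1, 2, 3], "b": [10, 0, 0], "c": [5, 5, 5]}
    print(top_features(table, fperc=50, ftop=2))
    print(top_features(table, fperc=100, ftop=2))
```

With a low percentile, a feature that is abundant in a single sample (``b`` above) ranks poorly; with the 100th percentile it ranks first.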

# EXAMPLES #
The ``examples`` folder contains the following sub-folders: ``hmp_aerobiosis``, ``hmp_metahit_metabolic``, and ``hmp_metahit_mp2``.
Each example should run by typing the following command in a terminal window (provided that you are inside one of the example folders):
```
#!bash

$ ./PIPELINE.sh
```

If everything goes well, you should find six new files in the example's folder: ``annot.txt``, ``outimg.png``, ``outimg_annot.png``, ``outimg_legend.png``, ``outtree.txt``, and ``tree.txt``, where:

* ``annot.txt``: contains the annotation that will be used by GraPhlAn, produced by the export2graphlan.py script
* ``outimg.png``: is the circular tree produced by GraPhlAn
* ``outimg_annot.png``: contains the annotation legend of the circular tree
* ``outimg_legend.png``: contains the legends of the highlighted biomarkers in the circular tree
* ``outtree.txt``: is the annotated tree produced by graphlan_annotate.py
* ``tree.txt``: is the tree produced by the export2graphlan.py script
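
After running an example, a short check such as the following can confirm that all six files listed above were written. It is a convenience sketch, not part of the repository; the function name is hypothetical.

```python
from pathlib import Path

# File names as listed above.
EXPECTED_OUTPUTS = ("annot.txt", "outimg.png", "outimg_annot.png",
                    "outimg_legend.png", "outtree.txt", "tree.txt")


def missing_outputs(example_dir):
    """Return the expected output files that are absent from `example_dir`."""
    folder = Path(example_dir)
    return [name for name in EXPECTED_OUTPUTS if not (folder / name).is_file()]


if __name__ == "__main__":
    # Run from inside one of the example folders.
    missing = missing_outputs(".")
    print("all outputs present" if not missing else "missing: " + ", ".join(missing))
```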

# CONTACTS #
Francesco Asnicar ([francescoasnicar@gmail.com](mailto:francescoasnicar@gmail.com))