# xenaPython
Python API for Xena Hub

---------

#### Requirement
    Supports Python 2 and Python 3

#### Installation
    pip install xenaPython

#### Upgrade Installation
    pip install --upgrade xenaPython

#### Usage
    >>> import xenaPython as xena

#### Examples

##### 1: Query expression values for four samples and three identifiers
    import xenaPython as xena

    hub = "https://toil.xenahubs.net"
    dataset = "tcga_RSEM_gene_tpm"
    samples = ["TCGA-02-0047-01", "TCGA-02-0055-01", "TCGA-02-2483-01", "TCGA-02-2485-01"]
    probes = ['ENSG00000282740.1', 'ENSG00000000005.5', 'ENSG00000000419.12']
    [position, [ENSG00000282740_1, ENSG00000000005_5, ENSG00000000419_12]] = xena.dataset_probe_values(hub, dataset, samples, probes)
    # show the values returned for the first identifier
    ENSG00000282740_1

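The call in example 1 returns the probe positions followed by one list of values per requested probe. Below is a minimal sketch of labeling those values by sample. It assumes each row holds one value per sample, in the order the samples were requested, which is worth checking against your hub's output. `rows_by_probe` is a hypothetical helper, not part of xenaPython, and the IDs and numbers are placeholders rather than real expression values.

```python
def rows_by_probe(samples, probes, values):
    """Map each probe to a {sample: value} dict.

    Hypothetical helper: assumes values[i] is the row for probes[i] and
    values[i][j] is the value for samples[j].
    """
    if len(values) != len(probes):
        raise ValueError("expected one row of values per probe")
    table = {}
    for probe, row in zip(probes, values):
        if len(row) != len(samples):
            raise ValueError("expected one value per sample for " + probe)
        table[probe] = dict(zip(samples, row))
    return table


# Placeholder data standing in for a real query result.
samples = ["S1", "S2"]
probes = ["P1", "P2"]
values = [[1.0, 2.0], [3.0, 4.0]]

table = rows_by_probe(samples, probes, values)
print(table["P2"]["S1"])  # 3.0
```

With live data, pass the `samples` and `probes` lists from example 1 together with the second element returned by `xena.dataset_probe_values`.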
##### 2: Query expression values for four samples and three genes, when the dataset has an identifier-to-gene mapping (i.e. a xena probeMap)
    hub = "https://toil.xenahubs.net"
    dataset = "tcga_RSEM_gene_tpm"
    samples = ["TCGA-02-0047-01", "TCGA-02-0055-01", "TCGA-02-2483-01", "TCGA-02-2485-01"]
    genes = ["TP53", "RB1", "PIK3CA"]
    xena.dataset_gene_probe_avg(hub, dataset, samples, genes)

##### 3: If the dataset has no id-to-gene mapping but uses gene names as its identifiers, query gene expression as in example 1; example 2 will not work
    hub = "https://toil.xenahubs.net"
    dataset = "tcga_RSEM_Hugo_norm_count"
    samples = ["TCGA-02-0047-01", "TCGA-02-0055-01", "TCGA-02-2483-01", "TCGA-02-2485-01"]
    probes = ["TP53", "RB1", "PIK3CA"]
    [position, [TP53, RB1, PIK3CA]] = xena.dataset_probe_values(hub, dataset, samples, probes)
    # show the values returned for TP53
    TP53

##### 4: Find out the samples in a dataset
    hub = "https://tcga.xenahubs.net"
    dataset = "TCGA.BLCA.sampleMap/HiSeqV2"
    # limit the result to 10 samples
    xena.dataset_samples(hub, dataset, 10)
    # no limit: all samples in the dataset
    xena.dataset_samples(hub, dataset, None)

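One use of the sample list is checking which requested IDs a dataset actually contains before querying values. A minimal sketch, assuming `xena.dataset_samples` returns a plain list of sample ID strings. `split_by_membership` is a hypothetical helper, not part of xenaPython, and the IDs below are placeholders.

```python
def split_by_membership(requested, available):
    """Return (present, missing), preserving the order of `requested`."""
    available_set = set(available)
    present = [s for s in requested if s in available_set]
    missing = [s for s in requested if s not in available_set]
    return present, missing


# Placeholder IDs; in practice `available` would be the list returned by
# xena.dataset_samples(hub, dataset, None).
available = ["TCGA-AA-0001-01", "TCGA-AA-0002-01", "TCGA-AA-0003-01"]
requested = ["TCGA-AA-0002-01", "TCGA-ZZ-9999-01"]

present, missing = split_by_membership(requested, available)
print(present)  # ['TCGA-AA-0002-01']
print(missing)  # ['TCGA-ZZ-9999-01']
```

Passing only `present` to a value query avoids asking the hub for samples it does not hold.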
##### 5: Find out the identifiers in a dataset
    hub = "https://tcga.xenahubs.net"
    dataset = "TCGA.BLCA.sampleMap/HiSeqV2"
    xena.dataset_field(hub, dataset)

##### 6: Find out the number of identifiers in a dataset
    hub = "https://tcga.xenahubs.net"
    dataset = "TCGA.BLCA.sampleMap/HiSeqV2"
    xena.dataset_field_n(hub, dataset)

##### 7: Find out hub id, dataset id
Use the Xena Browser datasets tool: https://xenabrowser.net/datapages/

#### Help
    >>> import xenaPython
    >>> help(xenaPython)

Help on package xenaPython:

NAME

    xenaPython - Methods for querying data from UCSC Xena hubs

DESCRIPTION

    Data rows are associated with "sample" IDs.
    Sample IDs are unique within a "cohort".
    A "dataset" is a particular assay of a cohort, e.g. gene expression.
    Datasets have associated metadata, specifying their data type and cohort.

    There are three primary data types: dense matrix (samples by probes),
    sparse (sample, position, variant), and segmented (sample, position, value).

    Dense matrices can be genotypic or phenotypic. Phenotypic matrices have
    associated field metadata (descriptive names, codes, etc.).

    Genotypic matrices may have an associated probeMap, which maps probes to
    genomic locations. If a matrix has a hugo probeMap, the probes themselves
    are gene names. Otherwise, a probeMap is used to map a gene location to a
    set of probes.

FUNCTIONS

    all_cohorts(host, exclude)

    all_datasets(host)

    all_datasets_n(host)
        Count the number of datasets with a non-null cohort

    all_field_metadata(host, dataset)
        Metadata for all dataset fields (phenotypic datasets)

    cohort_samples(host, cohort, limit)
        All samples in cohort

    cohort_summary(host, exclude)
        Count datasets per-cohort, excluding the given dataset types

        xena.cohort_summary(xena.PUBLIC_HUBS["pancanAtlasHub"], ["probeMap"])

    dataset_fetch(host, dataset, samples, probes)
        Probe values for given samples

    dataset_field(host, dataset)
        All field (probe) names in dataset

    dataset_field_examples(host, dataset, count)
        Field names in dataset, up to <count>

    dataset_field_n(host, dataset)
        Number of fields in dataset

    dataset_gene_probe_avg(host, dataset, samples, genes)
        Probe average, per-gene, for given samples

    dataset_gene_probes_values(host, dataset, samples, genes)
        Probe values in gene, and probe genomic positions, for given samples

    dataset_list(host, cohorts)
        Dataset metadata for datasets in the given cohorts

    dataset_metadata(host, dataset)
        Dataset metadata

    dataset_probe_signature(host, dataset, samples, probes, weights)
        Computed probe signature for given samples and weight array

    dataset_probe_values(host, dataset, samples, probes)
        Probe values for given samples, and probe genomic positions

        host = xena.PUBLIC_HUBS["pancanAtlasHub"]
        dataset = "EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena"
        samples = xena.dataset_samples(host, dataset, None)
        [position, [foxm1, tp53]] = xena.dataset_probe_values(host, dataset, samples, ["FOXM1", "TP53"])

    dataset_samples(host, dataset, limit)
        All samples in dataset (optional limit)

        samples = xena.dataset_samples(xena.PUBLIC_HUBS["pancanAtlasHub"], "EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena", None)

    dataset_samples_n_dense_matrix(host, dataset)
        All samples in dataset (faster, for dense matrix dataset only)

    datasets_null_rows(host)

    feature_list(host, dataset)
        Dataset field names and long titles (phenotypic datasets)

    field_codes(host, dataset, fields)
        Codes for categorical fields

    field_metadata(host, dataset, fields)
        Metadata for given fields (phenotypic datasets)

    gene_transcripts(host, dataset, gene)
        Gene transcripts

    match_fields(host, dataset, names)
        Find fields matching names (must be lower-case)

    probe_count(host, dataset)

    probemap_list(host)
        Find probemaps

    ref_gene_exons(host, dataset, genes)
        Gene model

    ref_gene_position(host, dataset, gene)
        Gene position from gene model

    ref_gene_range(host, dataset, chr, start, end)
        Gene models overlapping range

    segment_data_examples(host, dataset, count)
        Initial segmented data rows, with limit

    segmented_data_range(host, dataset, samples, chr, start, end)
        Segmented (copy number) data overlapping range

    sparse_data(host, dataset, samples, genes)
        Sparse (mutation) data rows for genes

    sparse_data_examples(host, dataset, count)
        Initial sparse data rows, with limit

    sparse_data_match_field(host, field, dataset, names)
        Genes in sparse (mutation) dataset matching given names

    sparse_data_match_field_slow(host, field, dataset, names)
        Genes in sparse (mutation) dataset matching given names, case-insensitive (names must be lower-case)

    sparse_data_match_partial_field(host, field, dataset, names, limit)
        Partial match genes in sparse (mutation) dataset

    sparse_data_range(host, dataset, samples, chr, start, end)
        Sparse (mutation) data rows overlapping the given range, for the given samples

    transcript_expression(host, transcripts, studyA, subtypeA, studyB, subtypeB, dataset)

DATA

    LOCAL_HUB = 'https://local.xena.ucsc.edu:7223'
    PUBLIC_HUBS = {'gdcHub': 'https://gdc.xenahubs.net', 'icgcHub': 'https...
    excludeType = ['probeMap', 'probemap', 'genePredExt']

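The DESCRIPTION in the help text names three data types by the shape of their records. As a rough mental model only, they can be pictured with the hypothetical Python structures below. The type names, the probe-to-values orientation of the dense matrix, the position strings, and all values are illustrative assumptions, and do not reflect how xenaPython or a hub actually encodes its results.

```python
from collections import namedtuple

# Sparse (mutation) record: sample, position, variant.
SparseRow = namedtuple("SparseRow", ["sample", "position", "variant"])

# Segmented (copy number) record: sample, position, value.
SegmentRow = namedtuple("SegmentRow", ["sample", "position", "value"])

# Dense matrix, pictured here as probe -> list of values, one per sample.
dense = {"probe_a": [1.5, 2.0], "probe_b": [0.0, 3.25]}

# Placeholder records; the position format is made up for illustration.
sparse = [SparseRow("sample_1", "chr1:100-100", "A>T")]
segments = [SegmentRow("sample_1", "chr1:100-500", -0.4)]

print(sparse[0].variant)   # A>T
print(segments[0].value)   # -0.4
```

The dense form is addressed by probe and sample, while the sparse and segmented forms are lists of per-sample records tied to a genomic position.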
#### Contact
    http://xena.ucsc.edu/
    https://groups.google.com/forum/#!forum/ucsc-cancer-genomics-browser
    genome-cancer@soe.ucsc.edu