## Site-wise relative rate estimator for protein multiple sequence alignments

_Written by Sergei L Kosakovsky Pond [spond@temple.edu] and Stephanie J Spielman_

> 2017-01-25: v0.1. Initial release.


### Motivation

This analysis performs a "non-parametric" estimation of site-level substitution rates in a multiple sequence alignment of protein sequences. The resulting rates let one evaluate the extent of substitution rate heterogeneity across sites and, by extension, site-level conservation. The analysis is based on the ["Rate4Site" method](http://www.tau.ac.il/~itaymay/cp/rate4site.html).

### Analysis workflow

#### Input

1. A protein sequence alignment (**file**)
2. A phylogenetic tree

#### Output

1. **standard output**: a Markdown file (see sample.md)
2. **file.json**: a JSON object (see sample.json), which contains the following keys
    * `Relative site rate estimates`: for each site, a record like <pre>"1":{
        "LB":0.9712850593725352,
        "MLE":1.343044028469595,
        "UB":1.821044718637831
        }</pre> is provided. **MLE** is the point estimate of the relative rate at the site, **UB** and **LB** are the upper and lower bounds, respectively, of the profile likelihood confidence interval.
    * `alignment`: file path for the alignment used to infer the rates, e.g. _/Users/sergei/Dropbox/Work/Collaborations/rates4sites/sim178.fasta_
    * `analysis`: an object describing the version of the analysis run
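
The JSON output can be post-processed with any JSON library. The sketch below is one way to flatten the `Relative site rate estimates` object into rows sorted by site. The key names `Relative site rate estimates`, `LB`, `MLE` and `UB` come from the description above; the function names and the example file name in the comment are illustrative assumptions, not part of the analysis itself.

```python
import json


def site_rates(results):
    """Flatten the per-site records into (site, LB, MLE, UB) tuples.

    `results` is the parsed JSON object. Site keys are strings such as "1",
    so they are converted to integers before sorting.
    """
    records = results["Relative site rate estimates"]
    return [
        (int(site), record["LB"], record["MLE"], record["UB"])
        for site, record in sorted(records.items(), key=lambda item: int(item[0]))
    ]


def load_site_rates(json_path):
    """Read a results file from disk and return the flattened rows."""
    with open(json_path) as handle:
        return site_rates(json.load(handle))


# Hypothetical usage; replace the path with the JSON file written by the analysis:
# for site, lb, mle, ub in load_site_rates("alignment.fasta.json"):
#     print(site, lb, mle, ub)
```

Sorting on the integer value of the key keeps site 10 after site 9, which a plain string sort would not.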

#### Procedure

1. Fit a protein model of sequence evolution to the entire alignment to obtain branch lengths.
2. For each site, holding all other model parameters fixed, estimate a site-level scaling factor for the branch lengths, **r<sub>i</sub>**. That is, for site **i** and each branch **b**, the following relationship holds: **length(b | data @ site i) = r<sub>i</sub> × length(b | entire alignment)**
3. Obtain the MLE for **r<sub>i</sub>**, along with a profile likelihood confidence interval.
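
To illustrate what a point estimate with a profile likelihood interval looks like for a single scaling parameter, here is a minimal sketch. It is not the analysis code. The per-site phylogenetic likelihood is replaced by a toy Poisson log-likelihood (a hypothetical stand-in), and the 95% confidence level is an assumption, since the level used by the analysis is not stated here. All function names are illustrative.

```python
import math

# Assumed confidence level: 95% quantile of the chi-square distribution with 1 d.f.
CHI2_95_DF1 = 3.841458820694124


def toy_log_likelihood(r, substitutions, tree_length):
    """Toy stand-in for the site likelihood: Poisson log-likelihood (up to a
    constant) of observing `substitutions` events on a tree whose branch
    lengths are all scaled by r."""
    expected = r * tree_length
    return substitutions * math.log(expected) - expected


def maximize(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if f(c) < f(d):
            a = c  # the maximum lies in [c, b]
        else:
            b = d  # the maximum lies in [a, d]
    return (a + b) / 2


def find_crossing(f, target, inside, outside, tol=1e-9):
    """Bisect for the point where f crosses `target`, given
    f(inside) >= target > f(outside)."""
    while abs(outside - inside) > tol:
        mid = (inside + outside) / 2
        if f(mid) >= target:
            inside = mid
        else:
            outside = mid
    return (inside + outside) / 2


def profile_interval(f, lo=1e-8, hi=100.0):
    """Return (LB, MLE, UB): the MLE of r and the range of r whose
    log-likelihood is within CHI2_95_DF1 / 2 of the maximum."""
    mle = maximize(f, lo, hi)
    target = f(mle) - CHI2_95_DF1 / 2
    lb = find_crossing(f, target, mle, lo) if f(lo) < target else lo
    ub = find_crossing(f, target, mle, hi) if f(hi) < target else hi
    return lb, mle, ub


# Toy data: 6 substitutions on a tree of total length 4.0 (made-up numbers).
log_l = lambda r: toy_log_likelihood(r, substitutions=6, tree_length=4.0)
lb, mle, ub = profile_interval(log_l)
print(f"LB={lb:.4f} MLE={mle:.4f} UB={ub:.4f}")
```

For this toy likelihood the maximum is at substitutions / tree_length, which gives a quick sanity check on the search. The interval is asymmetric around the MLE, as is typical of profile likelihood intervals for a rate bounded below by zero.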

#### Features

* MPI Enabled