# peak-classifier

## Description

peak-classifier classifies ChIP/ATAC-Seq peaks based on features provided in
a GFF file.

Peaks are provided in a BED file sorted by chromosome and position.  Typically
these are output from a peak caller such as MACS2, or the differential
analysis that follows.  The GFF must also be sorted by chromosome, position,
and subfeature, which is the default for common data sources.

Peak-classifier generates features that are not explicitly identified in the
GFF, such as introns and potential promoter regions, and outputs the augmented
feature list to a BED file.  It then identifies overlapping features by
running bedtools intersect on the augmented feature list and peak list,
outputting an annotated BED-like TSV file with additional columns to describe
the feature.  If a peak overlaps multiple features, a separate line is output
for each.

Alternative approaches to this problem include R scripting with a tool such
as ChIPpeakAnno or multistage processing of the GFF using awk and bedtools.

In contrast, peak-classifier is a simple Unix command that takes a BED file
and a GFF file as inputs and reports all peak classifications in a matter of
seconds.

Admittedly, an optimal C program isn't really necessary to solve this problem,
since the crappiest implementation I can imagine would take hours at most
to run for a typical ATAC-Seq peak set.  However:

* It's an opportunity to develop and test biolibc code that will be
  useful for other problems and bigger data
* It's more about making peak classification convenient than fast
* It never hurts to hone your C skills
* There's no such thing as a program that's too fast

## Design and Implementation

The code is organized following basic object-oriented design principles, but
implemented in C to minimize overhead and keep the source code accessible to
scientists who don't have time to master the complexities of C++.
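
As a rough illustration of object-oriented style in plain C, the sketch below
treats a structure as a class whose members are reached only through accessor
macros and mutator functions.  The `peak_t` type and every `peak_*` / `PEAK_*`
name are hypothetical, invented for this example; they are not the biolibc or
peak-classifier API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PEAK_CHROM_MAX  32

/* Hypothetical "class": callers should treat the members as private */
typedef struct
{
    char        chrom[PEAK_CHROM_MAX + 1];
    uint64_t    start;      /* 0-based, inclusive (BED convention) */
    uint64_t    end;        /* exclusive */
}   peak_t;

/* Accessor macros: read members without naming them in client code */
#define PEAK_CHROM(p)   ((p)->chrom)
#define PEAK_START(p)   ((p)->start)
#define PEAK_END(p)     ((p)->end)

/* Constructor-like initializer: puts the object in a known state */
void    peak_init(peak_t *peak)
{
    assert(peak != NULL);
    peak->chrom[0] = '\0';
    peak->start = 0;
    peak->end = 0;
}

/* Mutator: returns 0 on success, -1 if the name does not fit */
int     peak_set_chrom(peak_t *peak, const char *chrom)
{
    assert(peak != NULL && chrom != NULL);
    if ( strlen(chrom) > PEAK_CHROM_MAX )
        return -1;
    strcpy(peak->chrom, chrom);
    return 0;
}

/* Mutator: rejects empty or inverted ranges, leaving the object unchanged */
int     peak_set_range(peak_t *peak, uint64_t start, uint64_t end)
{
    assert(peak != NULL);
    if ( start >= end )
        return -1;
    peak->start = start;
    peak->end = end;
    return 0;
}

/* Derived property computed from the private members */
uint64_t    peak_length(const peak_t *peak)
{
    assert(peak != NULL);
    return peak->end - peak->start;
}
```

Client code would call `peak_init()`, then `peak_set_chrom()` and
`peak_set_range()`, and read values back with `PEAK_START()` and friends.
Routing writes through mutators gives one place to validate input, and the
structure layout can change later without touching callers.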

Structures are treated as classes, with accessor and mutator functions
(or macros) provided, so dependent applications and libraries need not access
structure members directly.  Since the C language cannot enforce this, it's
up to application programmers to exercise self-discipline.

## Building and installing

peak-classifier is intended to build cleanly in any POSIX environment on
any CPU architecture.  Please don't hesitate to open an issue if you
encounter problems on any Unix-like system.

Primary development is done on FreeBSD with clang, but the code is frequently
tested on CentOS, MacOS, and NetBSD as well.  MS Windows is not supported,
unless using a POSIX environment such as Cygwin or Windows Subsystem for Linux.

The Makefile is designed to be friendly to package managers, such as
[Debian packages](https://www.debian.org/distrib/packages),
[FreeBSD ports](https://www.freebsd.org/ports/),
[MacPorts](https://www.macports.org/), [pkgsrc](https://pkgsrc.org/), etc.
End users should install via one of these if at all possible.

I maintain a FreeBSD port and a pkgsrc package.

### Installing peak-classifier on FreeBSD:

FreeBSD is a highly underrated platform for scientific computing, with over
1,900 scientific libraries and applications in the FreeBSD ports collection
(of more than 30,000 total), modern clang compiler, fully-integrated ZFS
filesystem, and renowned security, performance, and reliability.
FreeBSD has a somewhat well-earned reputation for being difficult to set up
and manage compared to user-friendly systems like [Ubuntu](https://ubuntu.com/).
However, if you're a little bit Unix-savvy, you can very quickly set up a
workstation, laptop, or VM using
[desktop-installer](http://www.acadix.biz/desktop-installer.php).
If you're new to Unix, you can also reap the benefits of FreeBSD by running
[GhostBSD](https://ghostbsd.org/), a FreeBSD distribution augmented with a
graphical installer and management tools.  GhostBSD does not offer as many
options as desktop-installer, but it may be more comfortable for Unix novices.

```
pkg install peak-classifier
```

### Installing via pkgsrc

pkgsrc is a cross-platform package manager that works on any Unix-like
platform.  It is native to [NetBSD](https://www.netbsd.org/) and well-supported
on [Illumos](https://illumos.org/), [MacOS](https://www.apple.com/macos/),
[RHEL](https://www.redhat.com)/[CentOS](https://www.centos.org/), and
many other Linux distributions.
Using pkgsrc does not require admin privileges.  You can install a pkgsrc
tree in any directory to which you have write access and easily install any
of the nearly 20,000 packages in the collection.  The
[auto-pkgsrc-setup](http://netbsd.org/~bacon/) script can assist you with
basic setup.

First bootstrap pkgsrc using auto-pkgsrc-setup or any
other method.  Then run the following commands:

```
cd pkgsrc-dir/biology/peak-classifier
bmake install clean
```

There may also be binary packages available for your platform.  If this is
the case, you can install by running:

```
pkgin install peak-classifier
```

See the [Joyent Cloud Services Site](https://pkgsrc.joyent.com/) for
available package sets.

### Building peak-classifier locally

Below are cave man install instructions for development purposes, not
recommended for regular use.

peak-classifier depends on [biolibc](https://github.com/auerlab/biolibc).
Install biolibc before attempting to build peak-classifier.

1. Clone the repository
2. Run "make depend" to update Makefile.depend
3. Run "make install"

The default install prefix is ../local.
Clone peak-classifier, biolibc, and dependent
apps into sibling directories so that ../local represents a common path to all
of them.

To facilitate incorporation into package managers, the Makefile respects
standard make/environment variables such as CC, CFLAGS, PREFIX, etc.

Add-on libraries required for the build, such as biolibc, should be found
under ${LOCALBASE}, which defaults to ../local.
The library, headers, and man pages are installed under
${DESTDIR}${PREFIX}.  DESTDIR is empty by default and is primarily used by
package managers to stage installations.  PREFIX defaults to ${LOCALBASE}.

To install directly to /myprefix, assuming biolibc is installed there as well,
using a make variable:

```
make LOCALBASE=/myprefix clean depend install
```

Using an environment variable:

```
# C-shell and derivatives
setenv LOCALBASE /myprefix
make clean depend install

# Bourne shell and derivatives
LOCALBASE=/myprefix
export LOCALBASE
make clean depend install
```

View the Makefile for full details.