| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 28-Apr-2020 | - | | |
| Python/ | 03-May-2022 | - | 1,711 | 1,538 |
| Tests/ | 28-Apr-2020 | - | 621 | 456 |
| build_dynamic/ | 28-Apr-2020 | - | 161 | 120 |
| build_win/ | 28-Apr-2020 | - | 91 | 65 |
| cindex/ | 28-Apr-2020 | - | 730 | 578 |
| include/ | 28-Apr-2020 | - | 38,021 | 29,497 |
| libdeflate/ | 28-Apr-2020 | - | 9,153 | 4,959 |
| pgenlibr/ | 28-Apr-2020 | - | 2,662 | 2,210 |
| zstd/ | 03-May-2022 | - | 28,219 | 18,952 |
| .gitignore | 28-Apr-2020 | 91 | 8 | 7 |
| LICENSE | 28-Apr-2020 | 34.3 KiB | 675 | 553 |
| Makefile | 28-Apr-2020 | 1.9 KiB | 76 | 43 |
| Makefile.src | 28-Apr-2020 | 2.9 KiB | 35 | 23 |
| ReadMe.md | 28-Apr-2020 | 6.1 KiB | 97 | 92 |
| build.sh | 28-Apr-2020 | 1.6 KiB | 65 | 36 |
| pgen_compress.cc | 28-Apr-2020 | 8.7 KiB | 249 | 235 |
| pgenlib_ffi_support.cc | 28-Apr-2020 | 23.2 KiB | 583 | 494 |
| pgenlib_ffi_support.h | 28-Apr-2020 | 5.6 KiB | 103 | 38 |
| plink2.cc | 28-Apr-2020 | 521.2 KiB | 10,162 | 9,460 |
| plink2_adjust.cc | 28-Apr-2020 | 35.3 KiB | 924 | 845 |
| plink2_adjust.h | 28-Apr-2020 | 3 KiB | 87 | 57 |
| plink2_cmdline.cc | 28-Apr-2020 | 163.1 KiB | 4,440 | 3,628 |
| plink2_cmdline.h | 28-Apr-2020 | 72.1 KiB | 1,833 | 1,195 |
| plink2_common.cc | 28-Apr-2020 | 124.4 KiB | 3,142 | 2,460 |
| plink2_common.h | 28-Apr-2020 | 46.3 KiB | 1,178 | 767 |
| plink2_compress_stream.cc | 28-Apr-2020 | 8.3 KiB | 218 | 181 |
| plink2_compress_stream.h | 28-Apr-2020 | 4.5 KiB | 122 | 63 |
| plink2_cpu.cc | 28-Apr-2020 | 4.9 KiB | 139 | 70 |
| plink2_data.cc | 28-Apr-2020 | 384.1 KiB | 8,292 | 6,814 |
| plink2_data.h | 28-Apr-2020 | 9 KiB | 106 | 59 |
| plink2_decompress.cc | 28-Apr-2020 | 2.5 KiB | 64 | 40 |
| plink2_decompress.h | 28-Apr-2020 | 6.5 KiB | 160 | 99 |
| plink2_export.cc | 28-Apr-2020 | 463 KiB | 10,011 | 8,883 |
| plink2_export.h | 28-Apr-2020 | 2.8 KiB | 66 | 38 |
| plink2_fasta.cc | 28-Apr-2020 | 29.7 KiB | 713 | 637 |
| plink2_fasta.h | 28-Apr-2020 | 1.5 KiB | 42 | 18 |
| plink2_filter.cc | 28-Apr-2020 | 192.6 KiB | 4,414 | 4,063 |
| plink2_filter.h | 28-Apr-2020 | 9.3 KiB | 118 | 65 |
| plink2_glm.cc | 28-Apr-2020 | 621.8 KiB | 12,601 | 10,995 |
| plink2_glm.h | 28-Apr-2020 | 4.9 KiB | 129 | 85 |
| plink2_help.cc | 28-Apr-2020 | 134.1 KiB | 2,215 | 2,009 |
| plink2_help.h | 28-Apr-2020 | 1 KiB | 36 | 12 |
| plink2_import.cc | 28-Apr-2020 | 706.1 KiB | 16,277 | 14,145 |
| plink2_import.h | 28-Apr-2020 | 5.7 KiB | 116 | 76 |
| plink2_ld.cc | 28-Apr-2020 | 168.2 KiB | 3,578 | 3,003 |
| plink2_ld.h | 28-Apr-2020 | 2.5 KiB | 64 | 34 |
| plink2_matrix.cc | 28-Apr-2020 | 50.8 KiB | 1,366 | 1,108 |
| plink2_matrix.h | 28-Apr-2020 | 20.2 KiB | 473 | 294 |
| plink2_matrix_calc.cc | 28-Apr-2020 | 356.4 KiB | 7,919 | 7,170 |
| plink2_matrix_calc.h | 28-Apr-2020 | 8.9 KiB | 196 | 149 |
| plink2_misc.cc | 28-Apr-2020 | 392.4 KiB | 9,223 | 8,665 |
| plink2_misc.h | 28-Apr-2020 | 20.3 KiB | 352 | 272 |
| plink2_psam.cc | 28-Apr-2020 | 68.2 KiB | 1,486 | 1,346 |
| plink2_psam.h | 28-Apr-2020 | 4.4 KiB | 88 | 20 |
| plink2_pvar.cc | 28-Apr-2020 | 95 KiB | 2,101 | 1,820 |
| plink2_pvar.h | 28-Apr-2020 | 5.5 KiB | 98 | 29 |
| plink2_random.cc | 28-Apr-2020 | 9.8 KiB | 278 | 235 |
| plink2_random.h | 28-Apr-2020 | 1.7 KiB | 50 | 19 |
| plink2_set.cc | 28-Apr-2020 | 17.7 KiB | 434 | 378 |
| plink2_set.h | 28-Apr-2020 | 1.2 KiB | 34 | 11 |
| pvar_ffi_support.cc | 28-Apr-2020 | 15.4 KiB | 400 | 353 |
| pvar_ffi_support.h | 28-Apr-2020 | 1.9 KiB | 63 | 28 |

ReadMe.md

The include/ subdirectory contains two (LGPL3-licensed) major libraries, while
the immediate directory contains the PLINK 2.0 application built on top of
them.  These are carefully written to be valid C99 (from gcc and clang's
perspective, anyway) to simplify FFI development, while still taking advantage
of quite a few C++-specific affordances to improve safety and occasionally
performance.  They are currently x86-specific, but there are annotations to
facilitate a possible future port to ARM.

The first library is plink2_text, which provides a pair of classes designed to
replace std::getline(), fgets(), and similar ways of iterating over text lines.
Key properties:
* Instead of copying every line to your buffer, one at a time, these classes
  just return a pointer to the beginning of each line in the underlying binary
  stream, and give you access to a pointer to the end.  In exchange, the line
  is invalidated when you iterate to the next one; it's like being forced to
  pass the same string to std::getline(), or the same buffer to fgets(), on
  every call.  But whenever that's problematic, you can always copy the line
  before iterating to the next; on all systems I've seen, that *still* exhibits
  better throughput than getline/fgets.  And in the many situations where
  there's no need to copy, you get a fundamentally lower-latency abstraction.
* They automatically detect and decompress gzipped and Zstd-compressed
  (https://facebook.github.io/zstd/ ) files, in a manner that works with pipe
  file descriptors.
* The primary TextStream class automatically reads *and decompresses* ahead for
  you.  Decompression is even multithreaded by default when the file is
  BGZF-compressed.  (And the textFILE class covers the setting where you don't
  want to launch any more threads.)
* They do not support network input as of this writing, but that would not be
  difficult to add.  The existing code uses FILE* in a very straightforward
  manner.
* As for text parsing, the ScanadvDouble() utility function in the
  plink2_string component is a very efficient string-to-double converter.
  While it does not support perfect string<->double round-trips (that's what
  C++17 std::from_chars is for; https://abseil.io/ has a working implementation
  while we wait for gcc/clang...), or long-tail features like locale-specific
  decimal separators or hex floats, it has been incredibly useful for speeding
  up the basic job of scanning standard-locale printf("%g")-formatted and
  similar output.  (Note that you lose roughly a billion times as much accuracy
  to %g's 6-digit limit as you do to imperfect string->double conversion in
  that setting.)

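To make the pointer-returning design in the first bullet concrete, here is a
minimal C++ sketch over an in-memory buffer.  It is *not* the plink2_text API:
`BufferLineReader`, `NextLine()`, `line_start()`, `line_end()`, and
`LongestLineCopy()` are hypothetical names invented for this illustration, and
there is no decompression, read-ahead, or buffer refill here.  With a real
stream the buffer gets reused, which is why the previous line's pointers have
to be treated as invalid after advancing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical illustration; not the plink2_text API.
// Walks the lines of an in-memory buffer and exposes each line as a
// [line_start, line_end) pointer pair into that buffer, without copying.
class BufferLineReader {
 public:
  BufferLineReader(const char* buf, std::size_t len)
      : next_(buf), end_(buf + len), line_start_(buf), line_end_(buf) {
    assert(buf != nullptr);
  }

  // Advances to the next line; returns false once the input is exhausted.
  // Treat pointers from the previous line as invalid after this call.
  bool NextLine() {
    if (next_ == end_) {
      return false;
    }
    line_start_ = next_;
    const void* newline =
        std::memchr(next_, '\n', static_cast<std::size_t>(end_ - next_));
    if (newline != nullptr) {
      line_end_ = static_cast<const char*>(newline);
      next_ = line_end_ + 1;  // skip the '\n'
    } else {
      line_end_ = end_;  // final line has no trailing newline
      next_ = end_;
    }
    return true;
  }

  const char* line_start() const { return line_start_; }
  // One past the last character of the line; the '\n' is excluded.
  const char* line_end() const { return line_end_; }

 private:
  const char* next_;
  const char* end_;
  const char* line_start_;
  const char* line_end_;
};

// Copies a line only when it has to outlive the current iteration step.
std::string LongestLineCopy(const char* buf, std::size_t len) {
  BufferLineReader reader(buf, len);
  std::string longest;
  while (reader.NextLine()) {
    const std::size_t cur_len =
        static_cast<std::size_t>(reader.line_end() - reader.line_start());
    if (cur_len > longest.size()) {
      longest.assign(reader.line_start(), cur_len);
    }
  }
  return longest;
}
```

Most lines are only inspected through the pointer pair; `LongestLineCopy()`
pays for a copy only when a line is a new maximum, which is the "copy only
when that's problematic" pattern described above.
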
(Coming soon: example text-processing programs using plink2_text.)

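The remark about %g's 6-digit limit can be checked with a few lines of C++.
This is only a sketch: it uses the standard library's strtod() as the parser
(not ScanadvDouble()), the helper names `RelErrorAfterG()` and
`RelErrorAfter17Digits()` are made up for this example, and it assumes the
default "C" locale.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>

// Relative error after printing x with "%g" (6 significant digits) and
// parsing the text back with strtod().
double RelErrorAfterG(double x) {
  assert(x != 0.0);
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%g", x);
  return std::fabs(std::strtod(buf, nullptr) - x) / std::fabs(x);
}

// Same measurement with 17 significant digits, which is enough to identify
// any IEEE-754 double uniquely.
double RelErrorAfter17Digits(double x) {
  assert(x != 0.0);
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.17g", x);
  return std::fabs(std::strtod(buf, nullptr) - x) / std::fabs(x);
}
```

For x = 0.123456789012345, "%g" prints 0.123457, so the relative error is
about 1.7e-6.  A parser that is off by one unit in the last place loses
on the order of 1e-16 in relative terms, so the formatting step dominates by
roughly nine to ten orders of magnitude.
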
The second library is pgenlib.  This supports reading and writing of PLINK 2.x
genotype files (".pgen").  A draft specification for this format is under
https://github.com/chrchang/plink-ng/tree/master/pgen_spec ; here are some key
properties:
* A PLINK 1 .bed is a valid .pgen.
* In addition, .pgen can represent multiallelic, phased, and/or dosage
  information.  As of this writing, software support for multiallelic dosages
  does not exist yet, but it does for the other attribute pairs
  (multiallelic+phased, phased+dosage).
* **.pgen CANNOT represent genotype probability triplets.  It also cannot store
  read depths, per-call quality scores, etc.**  While plink2 can *filter* on
  the aforementioned BGEN/VCF fields during import, it cannot re-export or do
  anything else with them.  Use other software, such as bcftools
  (https://samtools.github.io/bcftools/bcftools.html ) or qctool2
  (www.well.ox.ac.uk/~gav/qctool_v2/ ) when you must retain any of these
  fields.
* .pgen is compressed, but in a domain-specific manner that supports very fast
  compression and decompression.  It is even practical to perform several key
  computations (e.g. allele frequency) directly on the compressed
  representation, and this capability is exposed by the pgenlib library.
* Python/pgenlib.pyx is the Python wrapper (see Python/python_api.txt for
  details), and pgenlibr/ is the R wrapper.  These are somewhat incomplete as
  of this writing, but it would not take much effort to fill in key components;
  that work is scheduled for roughly the time of the beta release, but if you
  could really use a specific feature earlier, you have good odds of getting it
  by asking at https://groups.google.com/forum/#!forum/plink2-dev .
  (plink2-dev is also the place to ask other questions about any of this code.)

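As a rough illustration of what "computing directly on the packed
representation" can look like, the C++ sketch below counts genotypes in a
plain 2-bit-per-sample array with a handful of bitwise operations per 32
samples, without unpacking to one value per sample.  Everything in it is an
assumption made for the example: the code assignment (0, 1, 2 = number of
alternate alleles, 3 = missing), the names `PackGenotypes()`, `CountPacked()`,
`AltAlleleFreq()`, and `Popcount64()`, and the 64-bit word layout.  It is not
pgenlib's interface, and real .pgen records are more involved; the draft
specification linked above is the authority on the actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct GenoCounts {
  std::size_t hom_ref;
  std::size_t het;
  std::size_t hom_alt;
  std::size_t missing;
};

// Portable population count (number of set bits in a 64-bit word).
inline unsigned Popcount64(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
}

// Packs 2-bit codes, 32 per word, lowest-order bits first.
// Assumed coding for this sketch: 0/1/2 = alt allele count, 3 = missing.
std::vector<std::uint64_t> PackGenotypes(
    const std::vector<std::uint8_t>& codes) {
  std::vector<std::uint64_t> words((codes.size() + 31) / 32, 0);
  for (std::size_t i = 0; i < codes.size(); ++i) {
    assert(codes[i] < 4);
    words[i / 32] |= static_cast<std::uint64_t>(codes[i]) << (2 * (i % 32));
  }
  return words;
}

// Counts each genotype class without unpacking individual samples.
GenoCounts CountPacked(const std::vector<std::uint64_t>& words,
                       std::size_t sample_ct) {
  const std::uint64_t kLowBits = 0x5555555555555555ULL;
  std::size_t het = 0;
  std::size_t hom_alt = 0;
  std::size_t missing = 0;
  for (std::uint64_t w : words) {
    const std::uint64_t lo = w & kLowBits;         // low bit of each code
    const std::uint64_t hi = (w >> 1) & kLowBits;  // high bit of each code
    het += Popcount64(lo & ~hi);      // code 1
    hom_alt += Popcount64(hi & ~lo);  // code 2
    missing += Popcount64(lo & hi);   // code 3
  }
  // Zero padding in the last word looks like code 0, so derive the hom_ref
  // count from the true sample count instead of counting zeros.
  return {sample_ct - het - hom_alt - missing, het, hom_alt, missing};
}

// Alternate allele frequency among non-missing calls (0.0 if none).
double AltAlleleFreq(const GenoCounts& c) {
  const std::size_t nonmissing = c.hom_ref + c.het + c.hom_alt;
  if (nonmissing == 0) {
    return 0.0;
  }
  return (static_cast<double>(c.het) + 2.0 * static_cast<double>(c.hom_alt)) /
         (2.0 * static_cast<double>(nonmissing));
}
```

The inner loop touches each 64-bit word once and does three population counts
per 32 samples, which is the kind of saving that makes frequency-style
computations practical on packed data.
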
As for the PLINK 2.0 application:
* build_dynamic/ contains a Makefile suitable for Linux and macOS dynamic
  builds.  On Linux, if Intel MKL is installed using the instructions at e.g.
  https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-apt-repo ,
  you can dynamically link to it.
* build_win/ contains a Makefile for producing static Windows builds.  This
  requires MinGW[-w64] and zlib; a prebuilt OpenBLAS package from
  https://sourceforge.net/projects/openblas/files/ is also strongly
  recommended.
* GPUs are not exploited, and there are currently no plans to write much
  GPU-specific code before PLINK 2.0's core function set is completed around
  2021.  However, a few linear-algebra-heavy workloads may benefit
  significantly from a simple replacement of Intel MKL by cuBLAS + cuSOLVER.
  This can probably be supported earlier; feel free to open a GitHub issue
  about it if it would make a big difference to you.
* The LGPL3-licensed plink2_stats component may be of independent interest.  It
  includes a function for computing the 2x2 Fisher's exact test p-value in
  approximately O(sqrt(n)) time--much faster than the O(n) algorithms employed
  by other libraries as of this writing--as well as several log-p-value
  computations (Z-score/chi-square, T-test, F-test) that remain accurate well
  beyond the limits of most other statistical library functions.  (No, you
  don't want to take a 10^{-1000000} p-value literally, but it can be useful to
  distinguish it from 10^{-325}, and both of these numbers can naturally arise
  when analyzing biobank-scale data.)
* More documentation is at www.cog-genomics.org/plink/2.0/ .

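To see why log-p-values matter at this scale, the C++ sketch below compares a
direct double-precision tail probability with a log-space computation for a
one-sided Z-score.  It is not the plink2_stats code: `NormalUpperTail()` and
`Log10NormalUpperTailLargeZ()` are hypothetical names, and the log-space
version is just the first three terms of the textbook asymptotic series
Q(z) ~ phi(z)/z * (1 - 1/z^2 + 3/z^4), which I only trust here for z >= 10.

```cpp
#include <cassert>
#include <cmath>

// One-sided upper-tail probability of a standard normal, computed directly.
// Underflows to exactly 0 once the true value drops below the smallest
// representable double (about 4.9e-324).
double NormalUpperTail(double z) {
  return 0.5 * std::erfc(z / std::sqrt(2.0));
}

// log10 of the same tail probability for large z, evaluated in log space via
// the asymptotic series Q(z) ~ phi(z)/z * (1 - 1/z^2 + 3/z^4).
// The truncation error is on the order of 15/z^6, hence the z >= 10 guard.
double Log10NormalUpperTailLargeZ(double z) {
  assert(z >= 10.0);
  const double kHalfLog2Pi = 0.91893853320467274178;  // 0.5 * ln(2*pi)
  const double inv_z2 = 1.0 / (z * z);
  const double series = 1.0 - inv_z2 + 3.0 * inv_z2 * inv_z2;
  const double ln_p =
      -0.5 * z * z - std::log(z) - kHalfLog2Pi + std::log(series);
  return ln_p / std::log(10.0);
}
```

At z = 40 the direct computation returns 0, while the log-space version
reports a log10 p-value of about -349.4; at z = 10, where both are usable,
they agree to roughly four decimal places.
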