## Running the Germline CNV WDL

### Which WDL should you use?

- Cohort WDL: Calling a cohort of samples and building a model for denoising further case samples: ``cnv_germline_cohort_workflow.wdl``
- Case WDL: Calling case samples using a previously built model for denoising: ``cnv_germline_case_workflow.wdl``
- Scattered case WDL (recommended): Functionally equivalent to the case WDL, but written to reduce cloud compute cost (see below) and wall-clock time: ``cnv_germline_case_scattered_workflow.wdl``

#### Setting up the parameter JSON file for a run

To get started, create the JSON template (using ``java -jar wdltool.jar inputs <workflow>``) for the workflow you wish to run and adjust the parameters accordingly.

*Please note that there are optional workflow-level and task-level parameters that do not appear in the template file.  These are set to reasonable values by default, but can also be adjusted if desired.*

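As a rough illustration of adjusting the template, the sketch below merges user-supplied values into a template dictionary and reports which keys were not present in the template (these would be the optional parameters mentioned above). The helper names `fill_inputs_template` and `optional_keys`, and the template contents shown, are hypothetical; a real template comes from the `wdltool.jar inputs` command.

```python
import json


def fill_inputs_template(template: dict, overrides: dict) -> dict:
    """Return a copy of the template with values replaced by overrides.

    Keys in `overrides` that are absent from the template are kept too,
    because optional parameters do not appear in the generated template.
    """
    filled = dict(template)
    filled.update(overrides)
    return filled


def optional_keys(template: dict, overrides: dict) -> list:
    """Sorted list of override keys that the template did not contain."""
    return sorted(key for key in overrides if key not in template)


# Hypothetical template content (assumption, for illustration only).
template = {
    "CNVGermlineCohortWorkflow.cohort_entity_id": "String",
    "CNVGermlineCohortWorkflow.gatk_docker": "String",
}
overrides = {
    "CNVGermlineCohortWorkflow.cohort_entity_id": "my_cohort",
    "CNVGermlineCohortWorkflow.gatk_docker": "broadinstitute/gatk:latest",
    "CNVGermlineCohortWorkflow.do_explicit_gc_correction": True,
}

inputs = fill_inputs_template(template, overrides)
print(json.dumps(inputs, indent=2, sort_keys=True))
```

The printed JSON can be saved to a file and passed to the workflow runner as the inputs file.
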
#### Required parameters in the germline cohort workflow

The same reference must be used for the cohort (PoN) samples and the case samples.

- ``CNVGermlineCohortWorkflow.cohort_entity_id`` -- Name of the cohort.  Will be used as a prefix for output filenames.
- ``CNVGermlineCohortWorkflow.contig_ploidy_priors`` -- TSV file containing prior probabilities for the ploidy of each contig, with column headers: CONTIG_NAME, PLOIDY_PRIOR_0, PLOIDY_PRIOR_1, ...
- ``CNVGermlineCohortWorkflow.gatk_docker`` -- GATK Docker image (e.g., ``broadinstitute/gatk:latest``).
- ``CNVGermlineCohortWorkflow.intervals`` -- Picard or GATK-style interval list.  For WGS, this should typically only include the chromosomes of interest.
- ``CNVGermlineCohortWorkflow.normal_bais`` -- List of BAI files.  This list must correspond to `normal_bams`.  For example, `["Sample1.bai", "Sample2.bai"]`.
- ``CNVGermlineCohortWorkflow.normal_bams`` -- List of BAM files.  This list must correspond to `normal_bais`.  For example, `["Sample1.bam", "Sample2.bam"]`.
- ``CNVGermlineCohortWorkflow.num_intervals_per_scatter`` -- Number of intervals (i.e., targets or bins) in each scatter for GermlineCNVCaller.  If the total number of intervals is not divisible by the value provided, the last scatter will contain the remainder.
- ``CNVGermlineCohortWorkflow.ref_fasta_dict`` -- Path to reference dict file.
- ``CNVGermlineCohortWorkflow.ref_fasta_fai`` -- Path to reference fasta fai file.
- ``CNVGermlineCohortWorkflow.ref_fasta`` -- Path to reference fasta file.
- ``CNVGermlineCohortWorkflow.maximum_number_events_per_sample`` -- Threshold on the maximum number of events per sample, used for sample QC (the recommended value for WES is ~100).

In addition, there are optional workflow-level and task-level parameters that may be set by advanced users; for example:

- ``CNVGermlineCohortWorkflow.do_explicit_gc_correction`` -- (optional) If true, perform explicit GC-bias correction when creating PoN and in subsequent denoising of case samples.  If false, rely on PCA-based denoising to correct for GC bias.
- ``CNVGermlineCohortWorkflow.PreprocessIntervals.bin_length`` -- Size of bins (in bp) for coverage collection.  *This must be the same value used for all case samples.*
- ``CNVGermlineCohortWorkflow.PreprocessIntervals.padding`` -- Amount of padding (in bp) to add to both sides of targets for WES coverage collection.  *This must be the same value used for all case samples.*

Further explanation of other task-level parameters may be found in the ``--help`` documentation available in the gatk.jar for each tool.

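The required cohort parameters listed above can be assembled programmatically. The following is a minimal sketch, not part of the workflow itself: it builds the inputs dictionary and checks that `normal_bams` and `normal_bais` correspond entry by entry. All file paths, the value 5000 for `num_intervals_per_scatter`, and the helper names are hypothetical. The pairing check assumes that an index is named either `Sample1.bai` or `Sample1.bam.bai`, which is a convention, not something the workflow enforces.

```python
import json
from pathlib import PurePosixPath

PREFIX = "CNVGermlineCohortWorkflow"


def sample_stem(path: str) -> str:
    """File name with a trailing .bai and/or .bam removed (assumed naming)."""
    name = PurePosixPath(path).name
    for suffix in (".bai", ".bam"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def check_bam_bai_pairs(bams: list, bais: list) -> None:
    """Raise ValueError unless the two lists line up entry by entry."""
    if len(bams) != len(bais):
        raise ValueError(f"{len(bams)} BAMs but {len(bais)} BAIs")
    for bam, bai in zip(bams, bais):
        if sample_stem(bam) != sample_stem(bai):
            raise ValueError(f"{bai} does not look like the index of {bam}")


def build_cohort_inputs(cohort_id, bams, bais, paths,
                        num_intervals_per_scatter,
                        maximum_number_events_per_sample):
    """Return the required cohort inputs keyed by fully qualified name."""
    check_bam_bai_pairs(bams, bais)
    values = {
        "cohort_entity_id": cohort_id,
        "contig_ploidy_priors": paths["contig_ploidy_priors"],
        "gatk_docker": "broadinstitute/gatk:latest",
        "intervals": paths["intervals"],
        "normal_bais": bais,
        "normal_bams": bams,
        "num_intervals_per_scatter": num_intervals_per_scatter,
        "ref_fasta_dict": paths["ref_fasta_dict"],
        "ref_fasta_fai": paths["ref_fasta_fai"],
        "ref_fasta": paths["ref_fasta"],
        "maximum_number_events_per_sample": maximum_number_events_per_sample,
    }
    return {f"{PREFIX}.{key}": value for key, value in values.items()}


# Hypothetical paths and values, for illustration only.
paths = {
    "contig_ploidy_priors": "contig_ploidy_priors.tsv",
    "intervals": "targets.interval_list",
    "ref_fasta_dict": "reference.dict",
    "ref_fasta_fai": "reference.fasta.fai",
    "ref_fasta": "reference.fasta",
}
cohort_inputs = build_cohort_inputs(
    "my_cohort",
    ["Sample1.bam", "Sample2.bam"],
    ["Sample1.bai", "Sample2.bai"],
    paths,
    num_intervals_per_scatter=5000,
    maximum_number_events_per_sample=100,
)
print(json.dumps(cohort_inputs, indent=2))
```
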
#### Required parameters in the germline case workflow

The reference, number of intervals per scatter, and bins (if specified) must be the same between cohort and case samples.

- ``CNVGermlineCaseWorkflow.normal_bais`` -- List of BAI files.  This list must correspond to `normal_bams`.  For example, `["Sample1.bai", "Sample2.bai"]`.
- ``CNVGermlineCaseWorkflow.normal_bams`` -- List of BAM files.  This list must correspond to `normal_bais`.  For example, `["Sample1.bam", "Sample2.bam"]`.
- ``CNVGermlineCaseWorkflow.contig_ploidy_model_tar`` -- Path to tar of the contig-ploidy model directory generated by the DetermineGermlineContigPloidyCohortMode task.
- ``CNVGermlineCaseWorkflow.gatk_docker`` -- GATK Docker image (e.g., ``broadinstitute/gatk:latest``).
- ``CNVGermlineCaseWorkflow.gcnv_model_tars`` -- Array of paths to tars of the gCNV model directories generated by the GermlineCNVCallerCohortMode tasks.
- ``CNVGermlineCaseWorkflow.intervals`` -- Picard or GATK-style interval list.  For WGS, this should typically only include the chromosomes of interest.
- ``CNVGermlineCaseWorkflow.num_intervals_per_scatter`` -- Number of intervals (i.e., targets or bins) in each scatter for GermlineCNVCaller.  If the total number of intervals is not divisible by the value provided, the last scatter will contain the remainder.
- ``CNVGermlineCaseWorkflow.ref_fasta_dict`` -- Path to reference dict file.
- ``CNVGermlineCaseWorkflow.ref_fasta_fai`` -- Path to reference fasta fai file.
- ``CNVGermlineCaseWorkflow.ref_fasta`` -- Path to reference fasta file.
- ``CNVGermlineCaseWorkflow.maximum_number_events_per_sample`` -- Threshold on the maximum number of events per sample, used for sample QC (the recommended value for WES is ~100).

In addition, there are several task-level parameters that may be set by advanced users, as above.

Further explanation of these task-level parameters may be found in the ``--help`` documentation available in the gatk.jar for each tool.

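To make the scatter arithmetic concrete, the sketch below computes how many intervals land in each scatter for a given `num_intervals_per_scatter`, with the last scatter holding the remainder as described above. It also includes a sanity check that the number of entries in `gcnv_model_tars` equals the number of scatters; this assumes that each cohort-mode scatter produces exactly one gCNV model tar, which this README does not state explicitly. The function names and the interval counts are hypothetical.

```python
def scatter_sizes(total_intervals: int, num_intervals_per_scatter: int) -> list:
    """Number of intervals in each scatter; the last one holds the remainder."""
    if total_intervals <= 0 or num_intervals_per_scatter <= 0:
        raise ValueError("both arguments must be positive")
    full, remainder = divmod(total_intervals, num_intervals_per_scatter)
    sizes = [num_intervals_per_scatter] * full
    if remainder:
        sizes.append(remainder)
    return sizes


def check_model_tar_count(gcnv_model_tars: list, total_intervals: int,
                          num_intervals_per_scatter: int) -> None:
    """Assumption: one gCNV model tar per scatter from the cohort run."""
    expected = len(scatter_sizes(total_intervals, num_intervals_per_scatter))
    if len(gcnv_model_tars) != expected:
        raise ValueError(
            f"expected {expected} model tars, got {len(gcnv_model_tars)}"
        )


# Hypothetical numbers: 12345 intervals split into scatters of 5000.
sizes = scatter_sizes(12345, 5000)
print(sizes)  # [5000, 5000, 2345]
```

A mismatch reported by `check_model_tar_count` would suggest that the case run uses a different `num_intervals_per_scatter` or interval list than the cohort run that produced the model.
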
#### Required parameters in the scattered germline case workflow

The required parameters are the same as in the germline case workflow. However, to reduce wall-clock time and compute cost, it is recommended to tune the following parameters:

- ``CNVGermlineCaseScatteredWorkflow.num_samples_per_scatter_block`` -- (recommended WES value=25) number of samples to process in a single block; blocks of this size will be sent to the germline case workflow and processed in a batch
- ``CNVGermlineCaseScatteredWorkflow.preemptible_attempts`` -- (recommended value=5) this reduces cost by using preemptible instances
- ``CNVGermlineCaseScatteredWorkflow.mem_gb_for_determine_germline_contig_ploidy`` -- amount of memory allotted for ploidy determination tasks (the lower the cheaper)
- ``CNVGermlineCaseScatteredWorkflow.cpu_for_determine_germline_contig_ploidy`` -- number of CPU cores allotted for ploidy determination tasks (the lower the cheaper)
- ``CNVGermlineCaseScatteredWorkflow.disk_for_determine_germline_contig_ploidy`` -- amount of storage allotted for ploidy determination tasks (the lower the cheaper)
- ``CNVGermlineCaseScatteredWorkflow.mem_gb_for_germline_cnv_caller`` -- amount of memory allotted for gCNV caller tasks (the lower the cheaper)
- ``CNVGermlineCaseScatteredWorkflow.cpu_for_germline_cnv_caller`` -- number of CPU cores allotted for gCNV caller tasks (the lower the cheaper)
- ``CNVGermlineCaseScatteredWorkflow.disk_for_germline_cnv_caller`` -- amount of storage allotted for gCNV caller tasks (the lower the cheaper)

Note that lowering disk and memory too much will eventually cause the workflow to fail. Lowering the number of CPU cores could increase wall-clock time.

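The effect of `num_samples_per_scatter_block` can be pictured with a small sketch that splits a sample list into consecutive blocks. This is an illustration only: it assumes the last block simply holds the leftover samples, and the actual partitioning performed by the scattered WDL may differ. The function name and sample names are hypothetical.

```python
def sample_blocks(samples: list, num_samples_per_scatter_block: int) -> list:
    """Split samples into consecutive blocks of at most the given size."""
    if num_samples_per_scatter_block <= 0:
        raise ValueError("block size must be positive")
    step = num_samples_per_scatter_block
    return [samples[i:i + step] for i in range(0, len(samples), step)]


# Hypothetical cohort of 60 case samples, using the recommended WES value of 25.
samples = [f"Sample{i}.bam" for i in range(1, 61)]
blocks = sample_blocks(samples, 25)
print([len(block) for block in blocks])  # [25, 25, 10]
```

Under this assumption, each block corresponds to one invocation of the germline case workflow, so larger blocks mean fewer invocations while smaller blocks mean more parallelism.
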