Run the Battenberg pipeline

battenberg(
  analysis = "paired",
  samplename,
  normalname,
  sample_data_file,
  normal_data_file,
  imputeinfofile,
  g1000prefix,
  problemloci,
  gccorrectprefix = NULL,
  repliccorrectprefix = NULL,
  g1000allelesprefix = NA,
  ismale = NA,
  data_type = "wgs",
  impute_exe = "impute2",
  allelecounter_exe = "alleleCounter",
  nthreads = 8,
  platform_gamma = 1,
  phasing_gamma = 1,
  segmentation_gamma = 10,
  segmentation_kmin = 3,
  phasing_kmin = 1,
  clonality_dist_metric = 0,
  ascat_dist_metric = 1,
  min_ploidy = 1.6,
  max_ploidy = 4.8,
  min_rho = 0.1,
  max_rho = 1,
  min_goodness = 0.63,
  uninformative_BAF_threshold = 0.51,
  min_normal_depth = 10,
  min_base_qual = 20,
  min_map_qual = 35,
  max_allowed_state = 250,
  cn_upper_limit = 1000,
  calc_seg_baf_option = 3,
  skip_allele_counting = F,
  skip_preprocessing = F,
  skip_phasing = F,
  externalhaplotypefile = NA,
  usebeagle = FALSE,
  beaglejar = NA,
  beagleref.template = NA,
  beagleplink.template = NA,
  beaglemaxmem = 10,
  beaglenthreads = 1,
  beaglewindow = 40,
  beagleoverlap = 4,
  javajre = "java",
  write_battenberg_phasing = T,
  multisample_relative_weight_balanced = 0.25,
  multisample_maxlag = 100,
  segmentation_gamma_multisample = 5,
  snp6_reference_info_file = NA,
  apt.probeset.genotype.exe = "apt-probeset-genotype",
  apt.probeset.summarize.exe = "apt-probeset-summarize",
  norm.geno.clust.exe = "normalize_affy_geno_cluster.pl",
  birdseed_report_file = "birdseed.report.txt",
  heterozygousFilter = "none",
  prior_breakpoints_file = NULL,
  genomebuild = "hg19",
  chrom_coord_file = NULL,
  enhanced_grid_search = F
)
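
As an illustration of the signature above, the following is a minimal sketch of a paired WGS call. The sample names, BAM names, reference directory and the file names inside it are all hypothetical placeholders, not paths shipped with the package. The call is only attempted when the Battenberg package is installed and both input files exist, so the block can be sourced safely without any data present.

```r
# Hypothetical reference location and file names; replace with your own.
ref_dir <- "/path/to/battenberg_reference"

bb_args <- list(
  analysis            = "paired",
  samplename          = "tumour01",       # hypothetical identifier
  normalname          = "normal01",       # hypothetical identifier
  sample_data_file    = "tumour01.bam",   # hypothetical BAM
  normal_data_file    = "normal01.bam",   # hypothetical BAM
  imputeinfofile      = file.path(ref_dir, "impute_info.txt"),
  g1000prefix         = file.path(ref_dir, "1000G_loci_chr"),
  g1000allelesprefix  = file.path(ref_dir, "1000G_alleles_chr"),
  gccorrectprefix     = file.path(ref_dir, "1000G_GC_chr"),
  repliccorrectprefix = file.path(ref_dir, "1000G_replication_chr"),
  problemloci         = file.path(ref_dir, "problem_loci.txt"),
  ismale              = TRUE,
  data_type           = "wgs",
  nthreads            = 8
)

# Only run when the package and the input files are actually available.
inputs_present <- file.exists(bb_args$sample_data_file) &&
  file.exists(bb_args$normal_data_file)

if (requireNamespace("Battenberg", quietly = TRUE) && inputs_present) {
  do.call(Battenberg::battenberg, bb_args)
} else {
  message("Battenberg or the input files are not available; call skipped.")
}
```

Collecting the arguments in a list and passing them through `do.call` keeps the settings inspectable before a long-running analysis is started. All arguments not listed keep the defaults shown in the usage.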

Arguments

analysis: The mode of Battenberg copy number analysis to run: 'paired' for a tumour-normal pair, 'cell_line' for a tumour-only cell line and 'germline' for germline CNV calling on a normal sample (Default: 'paired')
samplename: Sample identifier (tumour or germline), used as a prefix for the output files. If allele counts are supplied separately, they are expected to have this identifier as prefix.
normalname: Matched normal identifier, used as a prefix for the output files. If allele counts are supplied separately, they are expected to have this identifier as prefix.
sample_data_file: A BAM or CEL file for the sample
normal_data_file: A BAM or CEL file for the matched normal (paired analysis)
imputeinfofile: Full path to a Battenberg impute info file with pointers to Impute2 reference data
g1000prefix: Full prefix path to 1000 Genomes SNP loci data, as part of the Battenberg reference data
problemloci: Full path to a problem loci file that contains SNP loci that should be filtered out
gccorrectprefix: Full prefix path to GC content files, as part of the Battenberg reference data, not required for SNP6 data (Default: NULL)
repliccorrectprefix: Full prefix path to replication timing files, as part of the Battenberg reference data, not required for SNP6 data (Default: NULL)
g1000allelesprefix: Full prefix path to 1000 Genomes SNP alleles data, as part of the Battenberg reference data, not required for SNP6 data (Default: NA)
ismale: A boolean set to TRUE if the donor is male, set to FALSE if female, not required for SNP6 data (Default: NA)
data_type: String that contains either wgs or snp6 depending on the supplied input data (Default: wgs)
impute_exe: Pointer to the Impute2 executable (Default: impute2, i.e. expected in $PATH)
allelecounter_exe: Pointer to the alleleCounter executable (Default: alleleCounter, i.e. expected in $PATH)
nthreads: The number of concurrent processes to use while running the Battenberg pipeline (Default: 8)
platform_gamma: Platform scaling factor; suggested values are 1 for wgs and 0.55 for snp6 (Default: 1)
phasing_gamma: Gamma parameter used when correcting phasing mistakes (Default: 1)
segmentation_gamma: The gamma parameter controls the size of the penalty of starting a new segment during segmentation. It is therefore the key parameter for controlling the number of segments (Default: 10)
segmentation_kmin: Kmin represents the minimum number of probes/SNPs that a segment should consist of (Default: 3)
phasing_kmin: Kmin used when correcting for phasing mistakes (Default: 1)
clonality_dist_metric: Distance metric to use when choosing purity/ploidy combinations (Default: 0)
ascat_dist_metric: Distance metric to use when choosing purity/ploidy combinations (Default: 1)
min_ploidy: Minimum ploidy to be considered (Default: 1.6)
max_ploidy: Maximum ploidy to be considered (Default: 4.8)
min_rho: Minimum purity to be considered (Default: 0.1)
max_rho: Maximum purity to be considered (Default: 1.0)
min_goodness: Minimum goodness of fit required for a purity/ploidy combination to be accepted as a solution (Default: 0.63)
uninformative_BAF_threshold: The threshold beyond which BAF becomes uninformative (Default: 0.51)
min_normal_depth: Minimum depth required in the matched normal for a SNP to be considered as part of the wgs analysis (Default: 10)
min_base_qual: Minimum base quality required for a read to be counted when allele counting (Default: 20)
min_map_qual: Minimum mapping quality required for a read to be counted when allele counting (Default: 35)
max_allowed_state: The maximum CN state allowed (Default: 250)
cn_upper_limit: Maximum copy number that can be called (Default: 1000)
calc_seg_baf_option: Sets the way BAF is calculated per segment: 1=mean, 2=median, 3=mean if the median equals 0 or 1, otherwise median (Default: 3 for paired, 1 for cell_line and germline)
skip_allele_counting: Provide TRUE when allele counting can be skipped (i.e. it's already done) (Default: FALSE)
skip_preprocessing: Provide TRUE when preprocessing is already complete (Default: FALSE)
skip_phasing: Provide TRUE when phasing is already complete (Default: FALSE)
externalhaplotypefile: VCF file containing externally obtained haplotype blocks (Default: NA)
usebeagle: Set to TRUE to use beagle5 instead of impute2 (Default: FALSE)
beaglejar: Full path to the Beagle java jar file (Default: NA)
beagleref.template: Full path template to Beagle reference files where the chromosome is replaced by 'CHROMNAME' (Default: NA)
beagleplink.template: Full path template to Beagle plink files where the chromosome is replaced by 'CHROMNAME' (Default: NA)
beaglemaxmem: Integer Beagle max heap size in Gb (Default: 10)
beaglenthreads: Integer number of threads used by beagle5 (Default: 1)
beaglewindow: Integer size of the genomic window for beagle5 in cM (Default: 40)
beagleoverlap: Integer size of the overlap between beagle5 windows (Default: 4)
javajre: Path to the Java JRE executable, only required for haplotype reconstruction with Beagle (Default: java, i.e. expected in $PATH)
write_battenberg_phasing: Write the Battenberg phasing results as VCF to disk, e.g. for multisample cases (Default: TRUE)
multisample_relative_weight_balanced: Relative weight to give to haplotype info from a sample without allelic imbalance in the region (Default: 0.25)
multisample_maxlag: Maximal number of upstream SNPs used in the multisample haplotyping to inform the haplotype at another SNP (Default: 100)
segmentation_gamma_multisample: The gamma parameter controls the size of the penalty of starting a new segment during multisample segmentation. It is the key parameter for controlling the number of segments (Default: 5)
snp6_reference_info_file: Reference files for the SNP6 pipeline only (Default: NA)
apt.probeset.genotype.exe: Helper tool for extracting data from CEL files, SNP6 pipeline only (Default: apt-probeset-genotype)
apt.probeset.summarize.exe: Helper tool for extracting data from CEL files, SNP6 pipeline only (Default: apt-probeset-summarize)
norm.geno.clust.exe: Helper tool for extracting data from CEL files, SNP6 pipeline only (Default: normalize_affy_geno_cluster.pl)
birdseed_report_file: Sex inference output file, SNP6 pipeline only (Default: birdseed.report.txt)
heterozygousFilter: Legacy option to set a heterozygous SNP filter, SNP6 pipeline only (Default: "none")
prior_breakpoints_file: A two column file with prior breakpoints to be used during segmentation (Default: NULL)
genomebuild: Genome build upon which the 1000G SNP coordinates were obtained (Default: hg19; options: "hg19" or "hg38")
enhanced_grid_search: Set to TRUE to use a multi-start, parallelized and multi-approach grid search (Default: FALSE)
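
Two of the arguments above describe small rules that are easier to read as code. The sketch below is illustrative only and is not taken from the package source: `segment_baf` and `fill_chrom_template` are hypothetical helper names, and the template path is a placeholder. The first helper applies the three `calc_seg_baf_option` rules to the BAF values of a single segment. The second shows how a 'CHROMNAME' placeholder in `beagleref.template` or `beagleplink.template` could be expanded into a per-chromosome path.

```r
# Hypothetical helper: summarise the BAF of one segment following the
# three options described for calc_seg_baf_option.
segment_baf <- function(baf, option = 3) {
  seg_mean <- mean(baf)
  seg_median <- median(baf)
  switch(as.character(option),
    "1" = seg_mean,
    "2" = seg_median,
    # Option 3: fall back to the mean when the median is exactly 0 or 1.
    "3" = if (seg_median == 0 || seg_median == 1) seg_mean else seg_median,
    stop("option must be 1, 2 or 3")
  )
}

# Hypothetical helper: replace the literal 'CHROMNAME' placeholder in a
# path template with a chromosome name.
fill_chrom_template <- function(template, chrom) {
  gsub("CHROMNAME", chrom, template, fixed = TRUE)
}

# Example inputs (made-up values and a placeholder path).
baf_values <- c(1, 1, 0.7)
summary_baf <- segment_baf(baf_values, option = 3)
ref_path <- fill_chrom_template("/path/to/beagle/chrCHROMNAME.vcf.gz", "7")
```

Under option 3 a median pinned at 0 or 1 is replaced by the mean, which still responds to the remaining SNPs in the segment. That is one reading of why it is the default for paired analyses.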

Author

sd11, jdemeul, Naser Ansari-Pour, Julio Cesar Cortes Rios