Run the Battenberg pipeline
battenberg(
  analysis = "paired",
  samplename,
  normalname,
  sample_data_file,
  normal_data_file,
  imputeinfofile,
  g1000prefix,
  problemloci,
  gccorrectprefix = NULL,
  repliccorrectprefix = NULL,
  g1000allelesprefix = NA,
  ismale = NA,
  data_type = "wgs",
  impute_exe = "impute2",
  allelecounter_exe = "alleleCounter",
  nthreads = 8,
  platform_gamma = 1,
  phasing_gamma = 1,
  segmentation_gamma = 10,
  segmentation_kmin = 3,
  phasing_kmin = 1,
  clonality_dist_metric = 0,
  ascat_dist_metric = 1,
  min_ploidy = 1.6,
  max_ploidy = 4.8,
  min_rho = 0.1,
  max_rho = 1,
  min_goodness = 0.63,
  uninformative_BAF_threshold = 0.51,
  min_normal_depth = 10,
  min_base_qual = 20,
  min_map_qual = 35,
  max_allowed_state = 250,
  cn_upper_limit = 1000,
  calc_seg_baf_option = 3,
  skip_allele_counting = F,
  skip_preprocessing = F,
  skip_phasing = F,
  externalhaplotypefile = NA,
  usebeagle = FALSE,
  beaglejar = NA,
  beagleref.template = NA,
  beagleplink.template = NA,
  beaglemaxmem = 10,
  beaglenthreads = 1,
  beaglewindow = 40,
  beagleoverlap = 4,
  javajre = "java",
  write_battenberg_phasing = T,
  multisample_relative_weight_balanced = 0.25,
  multisample_maxlag = 90,
  segmentation_gamma_multisample = 5,
  snp6_reference_info_file = NA,
  apt.probeset.genotype.exe = "apt-probeset-genotype",
  apt.probeset.summarize.exe = "apt-probeset-summarize",
  norm.geno.clust.exe = "normalize_affy_geno_cluster.pl",
  birdseed_report_file = "birdseed.report.txt",
  heterozygousFilter = "none",
  prior_breakpoints_file = NULL,
  genomebuild = "hg19",
  chrom_coord_file = NULL,
  enhanced_grid_search = F
)

analysis: The mode of Battenberg copy number analysis to run: 'paired' for a tumour-normal pair, 'cell_line' for a tumour-only cell line, or 'germline' for germline CNVs of a normal sample (Default: 'paired')
samplename: Sample identifier (tumour or germline), used as a prefix for the output files. If allele counts are supplied separately, they are expected to carry this identifier as a prefix.
normalname: Matched normal identifier, used as a prefix for the output files. If allele counts are supplied separately, they are expected to carry this identifier as a prefix.
sample_data_file: A BAM or CEL file for the sample
normal_data_file: A BAM or CEL file for the matched normal (paired analysis)
imputeinfofile: Full path to a Battenberg impute info file with pointers to the Impute2 reference data
g1000prefix: Full prefix path to the 1000 Genomes SNP loci data, part of the Battenberg reference data
problemloci: Full path to a problem loci file listing SNP loci that should be filtered out
gccorrectprefix: Full prefix path to the GC content files, part of the Battenberg reference data; not required for SNP6 data (Default: NULL)
repliccorrectprefix: Full prefix path to the replication timing files, part of the Battenberg reference data; not required for SNP6 data (Default: NULL)
g1000allelesprefix: Full prefix path to the 1000 Genomes SNP alleles data, part of the Battenberg reference data; not required for SNP6 data (Default: NA)
ismale: TRUE if the donor is male, FALSE if female; not required for SNP6 data (Default: NA)
data_type: Either "wgs" or "snp6", depending on the supplied input data (Default: "wgs")
impute_exe: Path to the Impute2 executable (Default: impute2, i.e. expected in $PATH)
allelecounter_exe: Path to the alleleCounter executable (Default: alleleCounter, i.e. expected in $PATH)
nthreads: Number of concurrent processes to use while running the Battenberg pipeline (Default: 8)
platform_gamma: Platform scaling factor; suggested values are 1 for wgs and 0.55 for snp6 (Default: 1)
phasing_gamma: Gamma parameter used when correcting phasing mistakes (Default: 1)
segmentation_gamma: Size of the penalty for starting a new segment during segmentation, and therefore the key parameter for controlling the number of segments (Default: 10)
segmentation_kmin: Minimum number of probes/SNPs a segment must contain (Default: 3)
phasing_kmin: Kmin used when correcting phasing mistakes (Default: 1)
clonality_dist_metric: Distance metric to use when choosing purity/ploidy combinations (Default: 0)
ascat_dist_metric: Distance metric to use when choosing purity/ploidy combinations (Default: 1)
min_ploidy: Minimum ploidy to be considered (Default: 1.6)
max_ploidy: Maximum ploidy to be considered (Default: 4.8)
min_rho: Minimum purity to be considered (Default: 0.1)
max_rho: Maximum purity to be considered (Default: 1.0)
min_goodness: Minimum goodness of fit required for a purity/ploidy combination to be accepted as a solution (Default: 0.63)
uninformative_BAF_threshold: The threshold beyond which BAF becomes uninformative (Default: 0.51)
min_normal_depth: Minimum depth required in the matched normal for a SNP to be included in the wgs analysis (Default: 10)
min_base_qual: Minimum base quality required for a read to be counted during allele counting (Default: 20)
min_map_qual: Minimum mapping quality required for a read to be counted during allele counting (Default: 35)
max_allowed_state: Maximum copy number state allowed (Default: 250)
cn_upper_limit: Maximum copy number that can be called (Default: 1000)
calc_seg_baf_option: How the BAF per segment is calculated: 1 = mean, 2 = median, 3 = mean if the median is 0 or 1, otherwise median (Default: 3 for paired; 1 for cell_line and germline)
skip_allele_counting: TRUE when allele counting can be skipped (i.e. it has already been done) (Default: FALSE)
skip_preprocessing: TRUE when preprocessing is already complete (Default: FALSE)
skip_phasing: TRUE when phasing is already complete (Default: FALSE)
externalhaplotypefile: VCF containing externally obtained haplotype blocks (Default: NA)
usebeagle: Whether to use Beagle5 instead of Impute2 (Default: FALSE)
beaglejar: Full path to the Beagle Java jar file (Default: NA)
beagleref.template: Full path template to the Beagle reference files, with the chromosome replaced by 'CHROMNAME' (Default: NA)
beagleplink.template: Full path template to the Beagle plink files, with the chromosome replaced by 'CHROMNAME' (Default: NA)
beaglemaxmem: Maximum Java heap size for Beagle, in GB, as an integer (Default: 10)
beaglenthreads: Integer number of threads used by Beagle5 (Default: 1)
beaglewindow: Integer size of the Beagle5 genomic window, in cM (Default: 40)
beagleoverlap: Integer size of the overlap between Beagle5 windows (Default: 4)
javajre: Path to the Java JRE executable, only required for haplotype reconstruction with Beagle (Default: java, i.e. expected in $PATH)
write_battenberg_phasing: Write the Battenberg phasing results to disk as VCF, e.g. for multisample cases (Default: TRUE)
multisample_relative_weight_balanced: Relative weight given to haplotype information from a sample without allelic imbalance in the region (Default: 0.25)
multisample_maxlag: Maximum number of upstream SNPs used in multisample haplotyping to inform the haplotype at another SNP (Default: 90)
segmentation_gamma_multisample: Size of the penalty for starting a new segment during multisample segmentation; the key parameter for controlling the number of segments (Default: 5)
snp6_reference_info_file: Reference files for the SNP6 pipeline only (Default: NA)
apt.probeset.genotype.exe: Helper tool for extracting data from CEL files, SNP6 pipeline only (Default: apt-probeset-genotype)
apt.probeset.summarize.exe: Helper tool for extracting data from CEL files, SNP6 pipeline only (Default: apt-probeset-summarize)
norm.geno.clust.exe: Helper tool for extracting data from CEL files, SNP6 pipeline only (Default: normalize_affy_geno_cluster.pl)
birdseed_report_file: Sex inference output file, SNP6 pipeline only (Default: birdseed.report.txt)
heterozygousFilter: Legacy option to set a heterozygous SNP filter, SNP6 pipeline only (Default: "none")
prior_breakpoints_file: A two-column file with prior breakpoints to be used during segmentation (Default: NULL)
genomebuild: Genome build on which the 1000 Genomes SNP coordinates are based (Default: "hg19"; options: "hg19" or "hg38")
enhanced_grid_search: Whether to use a multi-start, parallelized, multi-approach grid search (Default: FALSE)
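segmentation_gamma and segmentation_kmin set a per-segment penalty and a minimum segment length. The R sketch below shows how those two settings interact. It performs exact penalised least-squares segmentation (optimal partitioning, in the spirit of piecewise constant fitting): each segment adds gamma to the squared-error cost, and no segment may hold fewer than kmin points. pcf_sketch() is a hypothetical helper, not Battenberg's segmentation code, and the real pipeline may scale the penalty differently (for example by the noise level of the data).

```r
# Generic penalised least-squares segmentation (optimal partitioning).
# Hypothetical illustration only: not Battenberg's implementation.
pcf_sketch <- function(y, gamma, kmin) {
  stopifnot(kmin >= 1)
  n <- length(y)
  cs  <- c(0, cumsum(y))      # prefix sums for O(1) segment means
  cs2 <- c(0, cumsum(y^2))    # prefix sums of squares
  # Squared error of y[(s + 1):t] around its own mean
  seg_cost <- function(s, t) {
    total <- cs[t + 1] - cs[s + 1]
    (cs2[t + 1] - cs2[s + 1]) - total^2 / (t - s)
  }
  best <- c(0, rep(Inf, n))   # best[t + 1]: optimal cost of y[1:t]
  last <- integer(n + 1)      # last[t + 1]: end of the previous segment
  for (t in seq_len(n)) {
    if (t < kmin) next        # y[1:t] is too short to form any segment
    for (s in 0:(t - kmin)) { # final segment y[(s + 1):t] has >= kmin points
      if (!is.finite(best[s + 1])) next
      cand <- best[s + 1] + seg_cost(s, t) + gamma  # gamma per new segment
      if (cand < best[t + 1]) {
        best[t + 1] <- cand
        last[t + 1] <- s
      }
    }
  }
  # Walk back from the end to recover the segment boundaries
  ends <- integer(0)
  t <- n
  while (t > 0) {
    ends <- c(t, ends)
    t <- last[t + 1]
  }
  starts <- c(1, head(ends, -1) + 1)
  data.frame(start = starts, end = ends,
             mean = mapply(function(a, b) mean(y[a:b]), starts, ends))
}

y <- c(rep(0, 10), rep(5, 10))
pcf_sketch(y, gamma = 1, kmin = 3)           # two segments: 1-10 (mean 0), 11-20 (mean 5)
nrow(pcf_sketch(y, gamma = 1000, kmin = 3))  # 1: the penalty outweighs the split
nrow(pcf_sketch(y, gamma = 1, kmin = 12))    # 1: any split leaves a side with < 12 points
```

In the last two calls, a large gamma makes a single segment cheaper than paying the penalty twice. Setting kmin to 12 rules out every split of 20 points.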
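min_ploidy, max_ploidy, min_rho, max_rho and min_goodness limit which purity/ploidy combinations can be accepted as a solution. The sketch below applies those bounds, at their default values, to a small table of made-up candidate solutions. The column names rho, psi and goodness, the inclusive comparisons and the candidate values are all assumptions for illustration. The sketch does not reproduce Battenberg's grid search or its distance metrics.

```r
# Default bounds from the battenberg() signature
min_ploidy <- 1.6; max_ploidy <- 4.8
min_rho <- 0.1;    max_rho <- 1
min_goodness <- 0.63

# Hypothetical candidate solutions (made-up values for illustration only)
candidates <- data.frame(
  rho      = c(0.05, 0.45, 0.70, 0.90),  # purity
  psi      = c(2.0,  2.1,  5.2,  3.0),   # ploidy
  goodness = c(0.95, 0.91, 0.97, 0.60)   # goodness of fit
)

# Keep candidates inside the purity/ploidy bounds with sufficient goodness
# (inclusive comparisons are an assumption)
in_bounds <- with(candidates,
  rho >= min_rho & rho <= max_rho &
  psi >= min_ploidy & psi <= max_ploidy &
  goodness >= min_goodness)

accepted <- candidates[in_bounds, ]
accepted  # only the rho = 0.45, psi = 2.1 row survives
```

The first candidate fails on purity, the third on ploidy, and the fourth on goodness of fit.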
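The three calc_seg_baf_option modes can be illustrated with a small base-R helper. seg_baf_sketch() is hypothetical and only mirrors the rule as documented: 1 = mean, 2 = median, and 3 = the mean when the median is exactly 0 or 1, otherwise the median. It is not Battenberg's internal code. The input is assumed to be a numeric vector of BAF values from a single segment.

```r
# Hypothetical helper mirroring the documented calc_seg_baf_option rule;
# not taken from the Battenberg source.
seg_baf_sketch <- function(baf, option = 3) {
  baf <- baf[!is.na(baf)]           # dropping missing values is an assumption
  m <- median(baf)
  switch(as.character(option),
    "1" = mean(baf),                                # option 1: mean
    "2" = m,                                        # option 2: median
    "3" = if (m == 0 || m == 1) mean(baf) else m,   # option 3: mean only at 0 or 1
    stop("option must be 1, 2 or 3")
  )
}

baf <- c(0, 0, 0, 0.2, 0.6)
seg_baf_sketch(baf, 1)  # mean: 0.16
seg_baf_sketch(baf, 2)  # median: 0
seg_baf_sketch(baf, 3)  # median is 0, so the mean (0.16) is used
```

Option 3 exists because a median stuck at exactly 0 or 1 hides any residual signal in the segment, while the mean still reflects it.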
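beagleref.template and beagleplink.template are path templates in which the chromosome is written as 'CHROMNAME'. The sketch below expands such a template for each chromosome with base R's gsub(). The paths, the helper fill_template() and the chromosome naming ("1" to "22" plus "X") are hypothetical. Battenberg may name chromosomes differently when it fills the template.

```r
# Hypothetical template paths; 'CHROMNAME' marks where the chromosome name
# goes, as documented for beagleref.template and beagleplink.template.
beagleref_template   <- "/path/to/beagle_ref/chrCHROMNAME.bref3"
beagleplink_template <- "/path/to/beagle_plink/plink.chrCHROMNAME.map"

# Replace the placeholder with one chromosome name (fixed = TRUE: literal match)
fill_template <- function(template, chrom) {
  gsub("CHROMNAME", chrom, template, fixed = TRUE)
}

# Assumed chromosome naming: "1".."22" plus "X"
chroms <- c(as.character(1:22), "X")
ref_files <- vapply(chroms, function(ch) fill_template(beagleref_template, ch),
                    character(1))

ref_files[["X"]]                           # "/path/to/beagle_ref/chrX.bref3"
fill_template(beagleplink_template, "7")   # "/path/to/beagle_plink/plink.chr7.map"
```

Using fixed = TRUE treats the placeholder as a literal string. This way, characters in the path are never interpreted as regular-expression syntax.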