This function performs part of the Battenberg WGS pipeline: counting alleles, generating BAF and logR, reconstructing normal-pair allele counts for the cell line, and performing GC content correction.

prepare_wgs_cell_line(
  chrom_names,
  chrom_coord,
  tumourbam,
  tumourname,
  g1000lociprefix,
  g1000allelesprefix,
  gamma_ivd = 100000,
  kmin_ivd = 50,
  centromere_noise_seg_size = 1000000,
  centromere_dist = 500000,
  min_het_dist = 100000,
  gamma_logr = 100,
  length_adjacent = 50000,
  gccorrectprefix,
  repliccorrectprefix,
  min_base_qual,
  min_map_qual,
  allelecounter_exe,
  min_normal_depth,
  skip_allele_counting
)
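
A call might look like the following minimal sketch. Every path, the sample name and the quality and depth thresholds are hypothetical placeholders rather than recommended values, and the `Battenberg::` namespace prefix is an assumption about where the function is exported. Arguments with defaults are left at their defaults. The call is only attempted when the package and the BAM file are actually present.

```r
# Hypothetical inputs: every path and threshold below is a placeholder.
args <- list(
  chrom_names          = 1:22,                             # assumed naming scheme
  chrom_coord          = "ref/chromosome_coordinates.txt",
  tumourbam            = "data/cell_line.bam",
  tumourname           = "cell_line",
  g1000lociprefix      = "ref/1000G_loci_chr",
  g1000allelesprefix   = "ref/1000G_alleles_chr",
  gccorrectprefix      = "ref/GC_correction_chr",
  repliccorrectprefix  = "ref/replication_timing_chr",
  min_base_qual        = 20,                               # illustrative value
  min_map_qual         = 35,                               # illustrative value
  allelecounter_exe    = "alleleCounter",
  min_normal_depth     = 10,                               # illustrative value
  skip_allele_counting = FALSE
)

# Run only if the package is installed and the input BAM exists.
if (requireNamespace("Battenberg", quietly = TRUE) &&
    file.exists(args$tumourbam)) {
  do.call(Battenberg::prepare_wgs_cell_line, args)
}
```

Collecting the arguments in a list keeps the placeholders in one place and makes it easy to reuse them across samples.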

Arguments

chrom_names

A vector containing the names of chromosomes to be included

chrom_coord

Full path to the file with chromosome coordinates, including start, end and left/right centromere positions

tumourbam

Full path to the tumour BAM file

tumourname

Identifier to be used for tumour output files (i.e. the cell line BAM file name without the '.bam' extension).

g1000lociprefix

Prefix path to the 1000 Genomes loci reference files

g1000allelesprefix

Prefix path to the 1000 Genomes SNP allele reference files

gamma_ivd

The PCF gamma value for segmentation of 1000G hetSNP IVD values (Default 1e5).

kmin_ivd

The minimum number of SNPs required to support a segment in PCF of 1000G hetSNP IVD values (Default 50)

centromere_noise_seg_size

The maximum size of a PCF segment to be removed as noise when it overlaps the centromere, where the data are inherently noisy (Default 1e6)

centromere_dist

The distance from the centromere within which data are excluded from the analysis, owing to the noisy nature of data in the vicinity of centromeres (Default 5e5)
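
To illustrate the effect of centromere_dist, the sketch below masks SNPs that lie within that distance of a centromere. The positions and centromere boundaries are invented, and this is not the package's own code, only an assumption about how such a mask can be expressed.

```r
# Hypothetical SNP positions (bp) and centromere boundaries on one chromosome.
snp_pos         <- c(1.0e6, 4.8e6, 5.2e6, 6.9e6, 7.3e6, 9.0e6)
cent_start      <- 5.0e6   # left centromere boundary (invented)
cent_end        <- 7.0e6   # right centromere boundary (invented)
centromere_dist <- 5e5     # documented default

# A SNP is masked if it falls inside the centromere or within
# centromere_dist of either boundary.
near_centromere <- snp_pos >= (cent_start - centromere_dist) &
                   snp_pos <= (cent_end + centromere_dist)

kept <- snp_pos[!near_centromere]
print(kept)
```

With these invented values only the SNPs at 1 Mb and 9 Mb fall outside the masked interval of 4.5 Mb to 7.5 Mb.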

min_het_dist

The minimum distance for detecting higher resolution inter-hetSNP regions with potential LOH while accounting for inherent homozygote stretches (Default 1e5)
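
As a rough illustration of min_het_dist, the sketch below flags gaps between consecutive heterozygous SNPs that are at least that long as candidate regions. The positions are invented, and treating the threshold as inclusive is an assumption, not something the documentation states.

```r
# Hypothetical sorted positions (bp) of heterozygous SNPs on one chromosome.
het_pos      <- c(10000, 40000, 65000, 300000, 320000, 350000, 900000)
min_het_dist <- 1e5   # documented default

# Distance between each het SNP and the next one.
gaps <- diff(het_pos)

# Gaps at least min_het_dist long are candidate inter-hetSNP regions.
is_candidate <- gaps >= min_het_dist
candidates <- data.frame(
  start = het_pos[-length(het_pos)][is_candidate],
  end   = het_pos[-1][is_candidate]
)
candidates$length <- candidates$end - candidates$start
print(candidates)
```

Shorter gaps are treated here as ordinary homozygous stretches and are ignored.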

gamma_logr

The PCF gamma value for confirming LOH within each inter-hetSNP candidate segment (Default 100)

length_adjacent

The length of adjacent regions either side of a candidate inter-hetSNP LOH region to be plotted (Default 5e4)
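
The plotting window implied by length_adjacent can be sketched as follows. The region coordinates and chromosome length are invented, and clamping the window to the chromosome ends is an assumption made for the example.

```r
# Hypothetical candidate region and chromosome length (bp).
region_start    <- 65000
region_end      <- 300000
chrom_length    <- 1e6
length_adjacent <- 5e4   # documented default

# Extend the region on both sides, staying within the chromosome.
plot_start <- max(1, region_start - length_adjacent)
plot_end   <- min(chrom_length, region_end + length_adjacent)
print(c(plot_start, plot_end))
```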

gccorrectprefix

Prefix path to GC content reference data

repliccorrectprefix

Prefix path to replication timing reference data (supply NULL if no replication timing correction is to be applied)

min_base_qual

Minimum base quality required for a read to be counted

min_map_qual

Minimum mapping quality required for a read to be counted

allelecounter_exe

Path to the allele counter executable (a bare executable name is sufficient if it can be found in $PATH)
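
The sketch below shows how the two quality thresholds and the executable might be combined into an allele counter command line for one chromosome. It assumes the alleleCounter command-line tool and its `-b`, `-l`, `-o`, `-m` and `-q` options; the file names, including the loci and output naming patterns, and the threshold values are hypothetical. The command is only run if the executable is on $PATH and the BAM file exists.

```r
# Hypothetical settings; thresholds are illustrative, not recommendations.
allelecounter_exe <- "alleleCounter"
min_base_qual     <- 20
min_map_qual      <- 35
bam_file          <- "data/cell_line.bam"
loci_file         <- "ref/1000G_loci_chr1.txt"              # assumed naming
out_file          <- "cell_line_alleleFrequencies_chr1.txt" # assumed naming

# Assumed alleleCounter options: -b BAM, -l loci, -o output,
# -m minimum base quality, -q minimum mapping quality.
cmd_args <- c("-b", bam_file, "-l", loci_file, "-o", out_file,
              "-m", min_base_qual, "-q", min_map_qual)
cmd <- paste(c(allelecounter_exe, cmd_args), collapse = " ")
print(cmd)

# Sys.which() returns "" when the executable is not found on $PATH.
if (nzchar(Sys.which(allelecounter_exe)) && file.exists(bam_file)) {
  system2(allelecounter_exe, args = cmd_args)
}
```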

min_normal_depth

Minimum depth required in the normal for a SNP to be included
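
A depth filter of this kind can be sketched as below. The counts table and threshold are invented, and keeping SNPs whose depth equals the threshold is an assumption.

```r
# Hypothetical per-SNP depths in the (reconstructed) normal.
counts <- data.frame(
  pos          = c(100, 200, 300, 400),
  normal_depth = c(4, 12, 10, 35)
)
min_normal_depth <- 10   # illustrative value

# Keep SNPs with sufficient coverage in the normal.
kept <- counts[counts$normal_depth >= min_normal_depth, ]
print(kept)
```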

skip_allele_counting

Flag; set to TRUE if allele counting has already been completed (the output files are expected in the working directory)
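
One way to decide the value of skip_allele_counting is to check whether per-chromosome allele count files already exist in the working directory, as in the sketch below. The file naming pattern, the sample name and the chromosome set are assumptions made for the example, so adjust them to match the files your run actually produces.

```r
# Hypothetical sample name and chromosome set.
tumourname  <- "cell_line"
chrom_names <- 1:22

# Assumed naming pattern for per-chromosome allele count files.
expected <- paste0(tumourname, "_alleleFrequencies_chr", chrom_names, ".txt")

# Skip counting only if every expected file is already present.
skip_allele_counting <- all(file.exists(expected))
print(skip_allele_counting)
```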

Author

Naser Ansari-Pour (BDI, Oxford)