Function to generate normal-pair allele count files based on IVD-PCF and inter-hetSNP logR-based LOH detection (IVD: Inter-Variant Distance, het: heterozygous).

This method reconstructs the normal-pair counts using the allele counts of the cell line as a template. It fills the detected LOH regions with evenly distributed hetSNPs at a density estimated for each chromosome in each tumour sample. In effect, it informs Battenberg of the location of hetSNPs across the genome in the tumour sample.
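
The filling step described above can be illustrated with a minimal sketch. It is not the package's implementation: the helper `fill_loh_region`, the toy positions, and the choice to estimate density from the non-LOH part of the chromosome are all assumptions made for illustration.

```r
# Hypothetical helper (not part of Battenberg): place evenly spaced
# positions inside one LOH region at a given hetSNP density.
fill_loh_region <- function(region_start, region_end, het_density) {
  # het_density: hetSNPs per base pair, assumed to be estimated per chromosome
  n_fill <- floor((region_end - region_start) * het_density)
  if (n_fill < 1) return(numeric(0))
  # Ask for n_fill + 2 points and drop the two region boundaries
  pos <- seq(region_start, region_end, length.out = n_fill + 2)
  round(pos[-c(1, length(pos))])
}

# Toy chromosome: observed hetSNP positions on either side of a 1 Mb LOH region
het_pos <- c(seq(1e5, 2e6, by = 5e4), seq(3e6, 5e6, by = 5e4))
loh_start <- 2e6
loh_end <- 3e6

# Density taken from the non-LOH part of the chromosome (an assumption)
non_loh_length <- (max(het_pos) - min(het_pos)) - (loh_end - loh_start)
het_density <- length(het_pos) / non_loh_length

filled <- fill_loh_region(loh_start, loh_end, het_density)
reconstructed <- sort(c(het_pos, filled))
```

Here `reconstructed` holds the observed positions plus the synthetic ones inside the LOH region, which is the kind of hetSNP map the description says is passed on to Battenberg.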

cell_line_reconstruct_normal(
  TUMOURNAME,
  NORMALNAME,
  chrom_coord,
  chrom,
  CL_OHET,
  CL_AL,
  CL_AC,
  CL_LogR,
  GAMMA_IVD,
  KMIN_IVD,
  CENTROMERE_NOISE_SEG_SIZE,
  CENTROMERE_DIST,
  MIN_HET_DIST,
  GAMMA_LOGR,
  LENGTH_ADJACENT
)
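
The usage above shows no default values in the signature, so it may be safest to pass the tuning arguments explicitly. The sketch below collects the defaults documented in the Arguments section into a named list. The list name `reconstruct_defaults`, the sample names and the file path are hypothetical, and the call itself is left commented out because it needs the `CL_*` objects from an earlier `cell_line_baf_logR` run.

```r
# Documented defaults for the tuning arguments, collected in one place
reconstruct_defaults <- list(
  GAMMA_IVD = 1e5,
  KMIN_IVD = 50,
  CENTROMERE_NOISE_SEG_SIZE = 1e6,
  CENTROMERE_DIST = 5e5,
  MIN_HET_DIST = 1e5,
  GAMMA_LOGR = 100,
  LENGTH_ADJACENT = 5e4
)

# Hypothetical sample-specific inputs; CL_OHET, CL_AL, CL_AC and CL_LogR
# are assumed to exist already and are not created here.
# args <- c(
#   list(TUMOURNAME = "my_cell_line", NORMALNAME = "my_cell_line_normal",
#        chrom_coord = "/path/to/chrom_coord.txt", chrom = 1,
#        CL_OHET = CL_OHET, CL_AL = CL_AL, CL_AC = CL_AC, CL_LogR = CL_LogR),
#   reconstruct_defaults
# )
# do.call(cell_line_reconstruct_normal, args)
```

Keeping the tuning values in one list makes it easier to apply the same settings to every chromosome.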

Arguments

TUMOURNAME

The tumour name used for Battenberg (i.e. the cell line BAM file name without the '.bam' extension).

NORMALNAME

The normal name used for naming the generated normal-pair allele counts files.

chrom_coord

Full path to the file with chromosome coordinates, including start, end and left/right centromere positions.

chrom

Chromosome number for which the normal-pair counts will be reconstructed (1, 2, etc.).

CL_OHET

List of observed heterozygous SNPs across all chromosomes, generated within the cell_line_baf_logR function.

CL_AL

List of alleles at SNPs across all chromosomes, generated within the cell_line_baf_logR function.

CL_AC

List of allele counts at SNPs across all chromosomes, generated within the cell_line_baf_logR function.

CL_LogR

Data frame of genome-wide LogR values for SNPs across all chromosomes, generated within the cell_line_baf_logR function.

GAMMA_IVD

The PCF gamma value for segmentation of 1000G hetSNP IVD values (Default 1e5).

KMIN_IVD

The minimum number of SNPs to support a segment in PCF of 1000G hetSNP IVD values (Default 50).

CENTROMERE_NOISE_SEG_SIZE

The maximum size of a PCF segment to be removed as noise when it overlaps the centromere, where the data are noisy (Default 1e6).

CENTROMERE_DIST

The distance from the centromere within which data are ignored in the analysis, due to the noisy nature of data in the vicinity of centromeres (Default 5e5).

MIN_HET_DIST

The minimum distance for detecting higher-resolution inter-hetSNP regions with potential LOH while accounting for inherent homozygous stretches (Default 1e5).

GAMMA_LOGR

The PCF gamma value for confirming LOH within each inter-hetSNP candidate segment (Default 100).

LENGTH_ADJACENT

The length of the adjacent regions on either side of a candidate inter-hetSNP LOH region to be plotted (Default 5e4).
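
To make the roles of MIN_HET_DIST and CENTROMERE_DIST more concrete, the following sketch lists gaps between consecutive hetSNPs that are long enough to be LOH candidates and lie away from the centromere. The helper `candidate_gaps`, the toy positions, the `>=` threshold and the way the centromere window is padded are assumptions for illustration, not the package's actual logic. The logR-based confirmation step (GAMMA_LOGR) is not shown.

```r
# Hypothetical helper (not part of Battenberg): list gaps between
# consecutive hetSNPs that are at least min_het_dist long and do not
# overlap the centromere padded by centromere_dist on each side.
candidate_gaps <- function(het_pos, cen_left, cen_right,
                           min_het_dist = 1e5, centromere_dist = 5e5) {
  het_pos <- sort(het_pos)
  gaps <- data.frame(
    start = head(het_pos, -1),
    end = tail(het_pos, -1)
  )
  gaps$ivd <- gaps$end - gaps$start  # inter-variant distance
  # Flag gaps that overlap the padded centromere window
  near_cen <- gaps$end > (cen_left - centromere_dist) &
    gaps$start < (cen_right + centromere_dist)
  gaps[gaps$ivd >= min_het_dist & !near_cen, , drop = FALSE]
}

# Toy hetSNP positions with a centromere assumed at 1.5 Mb to 2 Mb
het_pos <- c(1e5, 1.2e5, 1.5e5, 4e5, 4.3e5, 3.0e6, 3.05e6, 3.4e6)
res <- candidate_gaps(het_pos, cen_left = 1.5e6, cen_right = 2e6)
```

In this toy input the long gap spanning the centromere is discarded, while the two remaining gaps of at least 1e5 bp are kept as candidates.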

Author

Naser Ansari-Pour (BDI, Oxford)