R/prepare_wgs_germline.R
germline_reconstruct_normal.Rd
Function to generate normal-pair allele count files based on IVD-PCF and inter-hetSNP logR-based LOH detection (IVD: Inter-Variant Distance, het: heterozygous)
This method reconstructs the normal-pair counts by using the allele counts of the germline as a template. It fills the detected LOH regions with evenly distributed hetSNPs, with the density estimated for each chromosome in each germline sample. It essentially informs Battenberg of the location of hetSNPs across the genome in the germline sample.
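The fill step described above can be pictured with a small sketch. This is not the Battenberg implementation: the helper name `fill_loh_region`, its arguments and the way density is estimated (observed hetSNPs divided by the chromosome length outside the LOH region) are all assumptions made for illustration.

```r
# Hypothetical helper, not part of Battenberg: fill one LOH region, bounded by
# two flanking hetSNPs, with evenly spaced positions at the hetSNP density
# observed on the rest of the chromosome.
fill_loh_region <- function(het_pos, loh_start, loh_end, chrom_length) {
  # hetSNPs observed at or outside the region boundaries
  outside <- het_pos[het_pos <= loh_start | het_pos >= loh_end]
  # Assumed density estimate: hetSNPs per base pair outside the LOH region
  density <- length(outside) / (chrom_length - (loh_end - loh_start))
  n_fill <- floor((loh_end - loh_start) * density)
  if (n_fill < 1) {
    return(sort(outside))
  }
  # Evenly spaced grid across the region; drop the two end points, which are
  # the flanking hetSNPs themselves
  grid <- round(seq(loh_start, loh_end, length.out = n_fill + 2))
  filled <- grid[-c(1, n_fill + 2)]
  sort(c(outside, filled))
}

# Toy chromosome of 1 Mb with one hetSNP per kb, except for a gap between
# positions 400000 and 601000
het_pos <- c(seq(1000, 400000, by = 1000), seq(601000, 1000000, by = 1000))
reconstructed <- fill_loh_region(het_pos, 400000, 601000, 1e6)
```

In this toy input the gap between the two flanking hetSNPs is filled at roughly the same one-per-kb spacing seen elsewhere on the chromosome, which is the kind of template the description says is passed on to Battenberg.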
germline_reconstruct_normal(
  GERMLINENAME,
  NORMALNAME,
  chrom_coord,
  chrom,
  GL_OHET,
  GL_AL,
  GL_AC,
  GL_LogR,
  GAMMA_IVD,
  KMIN_IVD,
  CENTROMERE_NOISE_SEG_SIZE,
  CENTROMERE_DIST,
  MIN_HET_DIST,
  GAMMA_LOGR,
  LENGTH_ADJACENT
)
GERMLINENAME: The germline name used for Battenberg (i.e. the germline BAM file name without the '.bam' extension)
NORMALNAME: The normal name used for naming the generated normal-pair allele counts files
chrom_coord: Full path to the file with chromosome coordinates including start, end and left/right centromere positions
chrom: Chromosome number for which the normal-pair will be reconstructed
GL_OHET: List of observed heterozygous SNPs across all chromosomes generated within the germline_baf_logR function
GL_AL: List of alleles at SNPs across all chromosomes generated within the germline_baf_logR function
GL_AC: List of allele counts at SNPs across all chromosomes generated within the germline_baf_logR function
GL_LogR: Data frame of genome-wide LogR values for SNPs across all chromosomes generated within the germline_baf_logR function
GAMMA_IVD: The PCF gamma value for segmentation of 1000G hetSNP IVD values (Default 1e5)
KMIN_IVD: The minimum number of SNPs to support a segment in PCF of 1000G hetSNP IVD values (Default 50)
CENTROMERE_DIST: The minimum distance from the centromere to ignore in analysis due to the noisy nature of data in the vicinity of centromeres (Default 5e5)
MIN_HET_DIST: The minimum distance for detecting higher resolution inter-hetSNP regions with potential LOH while accounting for inherent homozygote stretches (Default 1e5)
GAMMA_LOGR: The PCF gamma value for confirming LOH within each inter-hetSNP candidate segment (Default 100)
LENGTH_ADJACENT: The length of adjacent regions either side of a candidate inter-hetSNP LOH region to be plotted (Default 5e4)
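To make the distance-based arguments more concrete, the following sketch computes inter-variant distances (IVD) between consecutive hetSNPs and flags long inter-hetSNP gaps that lie away from the centromere. It is a simplified illustration only: the helper `candidate_gaps` and its arguments `cen_left` and `cen_right` are hypothetical, and a plain threshold is used in place of the PCF segmentation that the function itself relies on. The two default values are the ones documented above.

```r
# Documented defaults for the two distance arguments
MIN_HET_DIST    <- 1e5
CENTROMERE_DIST <- 5e5

# Hypothetical helper, not part of Battenberg: list inter-hetSNP gaps longer
# than min_het_dist that do not come within centromere_dist of the centromere.
candidate_gaps <- function(het_pos, cen_left, cen_right,
                           min_het_dist = MIN_HET_DIST,
                           centromere_dist = CENTROMERE_DIST) {
  het_pos <- sort(het_pos)
  gaps <- data.frame(start = het_pos[-length(het_pos)],
                     end   = het_pos[-1])
  # Inter-variant distance between consecutive hetSNPs
  gaps$ivd <- gaps$end - gaps$start
  # A gap is treated as centromeric if it overlaps the padded centromere
  near_cen <- gaps$end > (cen_left - centromere_dist) &
    gaps$start < (cen_right + centromere_dist)
  out <- gaps[gaps$ivd > min_het_dist & !near_cen, ]
  rownames(out) <- NULL
  out
}

# Toy data: hetSNPs every 10 kb, one 300 kb gap on the chromosome arm and one
# 3 Mb gap spanning an assumed centromere at 4 to 5 Mb
het_pos <- c(seq(1e4, 1e6, by = 1e4),
             seq(1.3e6, 3e6, by = 1e4),
             seq(6e6, 8e6, by = 1e4))
cand <- candidate_gaps(het_pos, cen_left = 4e6, cen_right = 5e6)
```

Here only the 300 kb gap on the arm is returned as a candidate; the gap across the centromere is discarded because it falls inside the padded centromere interval. In the documented workflow, such candidates would then be confirmed using logR segmentation (GAMMA_LOGR) rather than accepted on distance alone.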