Battenberg is a subclonal copy number caller for whole genome sequencing data. It estimates clonal and subclonal copy number alterations from matched tumor-normal samples and provides estimates of tumor purity and ploidy.
Battenberg requires several dependencies. Install them first:
# Install CRAN and Bioconductor dependencies via BiocManager
# (splines and parallel ship with base R and need no installation)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("devtools", "readr", "doParallel",
                       "ggplot2", "RColorBrewer", "gridExtra", "gtools",
                       "VariantAnnotation", "GenomicRanges"))
# Install modified copynumber package
devtools::install_github("igordot/copynumber")
# Install ASCAT
devtools::install_github("VanLoo-lab/ascat/ASCAT")
# Install Battenberg from GitHub (pre_3.0 branch)
devtools::install_github("Wedge-lab/battenberg", ref = "pre_3.0")

Before running Battenberg, you need to download the reference data it uses.
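Because a full run can take a long time, it may help to confirm that the reference data is in place before starting. The following is a minimal sketch, not part of Battenberg: `prefix_has_files()` and `missing_reference_data()` are hypothetical helpers, and the paths are placeholders. It assumes that a "prefix" argument refers to a set of files whose names start with that prefix.

```r
# Hypothetical helper: TRUE if at least one file in the prefix's directory
# has a name starting with the prefix's base name.
prefix_has_files <- function(prefix) {
  candidates <- list.files(dirname(prefix))
  any(startsWith(candidates, basename(prefix)))
}

# Hypothetical helper: returns the names of the entries that were not found.
# `files` and `prefixes` are named character vectors.
missing_reference_data <- function(files = character(), prefixes = character()) {
  file_ok <- file.exists(files)
  prefix_ok <- vapply(prefixes, prefix_has_files, logical(1))
  c(names(files)[!file_ok], names(prefixes)[!prefix_ok])
}

# Placeholder paths; replace them with your own locations.
not_found <- missing_reference_data(
  files = c(imputeinfofile = "path/to/impute_info.txt",
            problemloci = "path/to/probloci.txt"),
  prefixes = c(g1000prefix = "path/to/1000genomes/prefix")
)
if (length(not_found) > 0) {
  message("Reference data not found for: ", paste(not_found, collapse = ", "))
}
```

The check only looks at whether files are present; it does not validate their contents or genome build.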
The main function battenberg() runs the complete analysis pipeline:
# Load the package
library(Battenberg)
# Define sample names and file paths
TUMOURNAME <- "sample_tumor"
NORMALNAME <- "sample_normal"
TUMOURBAM <- "path/to/tumor.bam"
NORMALBAM <- "path/to/normal.bam"
# Reference file paths (adjust to your setup)
IMPUTEINFOFILE <- "path/to/impute_info.txt"
G1000PREFIX <- "path/to/1000genomes/prefix"
PROBLEMLOCI <- "path/to/probloci.txt"
GCCORRECTPREFIX <- "path/to/gc_correction/prefix"
REPLICCORRECTPREFIX <- "path/to/replic_correction/prefix"
G1000PREFIX_AC <- "path/to/1000genomes_alleles/prefix"
# Determine if sample is male (TRUE) or female (FALSE)
IS_MALE <- FALSE
# Number of threads to use
NTHREADS <- 8
# Run Battenberg
result <- battenberg(
  tumourname = TUMOURNAME,
  normalname = NORMALNAME,
  tumour_data_file = TUMOURBAM,
  normal_data_file = NORMALBAM,
  imputeinfofile = IMPUTEINFOFILE,
  g1000prefix = G1000PREFIX,
  problemloci = PROBLEMLOCI,
  gccorrectprefix = GCCORRECTPREFIX,
  repliccorrectprefix = REPLICCORRECTPREFIX,
  g1000allelesprefix = G1000PREFIX_AC,
  ismale = IS_MALE,
  data_type = "wgs",
  nthreads = NTHREADS,
  platform_gamma = 1,
  phasing_gamma = 1,
  segmentation_gamma = 10,
  segmentation_kmin = 3,
  phasing_kmin = 1,
  min_ploidy = 1.6,
  max_ploidy = 4.8,
  min_rho = 0.1,
  min_goodness = 0.63,
  uninformative_BAF_threshold = 0.51,
  min_normal_depth = 10,
  min_base_qual = 20,
  min_map_qual = 35
)

Battenberg produces several key output files:
- [samplename]_copynumber.txt: Copy number segments with clonal/subclonal states
- [samplename]_rho_and_psi.txt: Tumor purity and ploidy estimates
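The `[samplename]` placeholder stands for the tumour sample name passed to the run. As a small sketch, the file names listed above can be built from that name and checked for existence after a run. `expected_outputs()` is a hypothetical helper, and the assumption that the files are written to the working directory should be adjusted to your setup.

```r
# Hypothetical helper: build the output paths listed above for one sample.
expected_outputs <- function(samplename, dir = ".") {
  suffixes <- c(copynumber = "_copynumber.txt",
                rho_and_psi = "_rho_and_psi.txt")
  paths <- file.path(dir, paste0(samplename, suffixes))
  names(paths) <- names(suffixes)
  paths
}

outputs <- expected_outputs("sample_tumor")
print(outputs)

# Report any expected file that is not present yet.
absent <- outputs[!file.exists(outputs)]
if (length(absent) > 0) {
  message("Not found: ", paste(absent, collapse = ", "))
}
```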
# Read the main results file
cn_data <- read.delim("sample_tumor_copynumber.txt")
# Examine the structure
head(cn_data)
str(cn_data)
# Read purity and ploidy estimates
rho_psi <- read.delim("sample_tumor_rho_and_psi.txt")
# Extract purity (rho) and ploidy (psi) from the second row (FRAC_GENOME)
tumor_purity <- rho_psi$rho[2]
tumor_ploidy <- rho_psi$psi[2]
cat("Estimated tumor purity:", tumor_purity, "\n")
cat("Estimated tumor ploidy:", tumor_ploidy, "\n")
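Selecting the row by position breaks silently if the file layout ever differs. A name-based lookup is one alternative, sketched below under stated assumptions: the file is tab-delimited, its rows carry labels such as `FRAC_GENOME`, and it has `rho` and `psi` columns (the real file may contain more). The inline table uses made-up values purely for illustration, and `get_fit()` is a hypothetical helper, not a Battenberg function.

```r
# Made-up values in the assumed layout. The header has one field fewer than
# the data rows, so read.delim() uses the first column as row names.
example_text <- paste(
  "rho\tpsi",
  "ASCAT\t0.70\t2.10",
  "FRAC_GENOME\t0.72\t2.05",
  "REF_SEG\t0.71\t2.08",
  sep = "\n"
)
rho_psi_example <- read.delim(text = example_text)

# Hypothetical helper: look up purity and ploidy by row label and fail
# loudly if the label is absent.
get_fit <- function(tbl, row = "FRAC_GENOME") {
  if (!row %in% rownames(tbl)) {
    stop("Row '", row, "' not found; check the file layout")
  }
  list(purity = tbl[row, "rho"], ploidy = tbl[row, "psi"])
}

fit <- get_fit(rho_psi_example)
cat("Purity:", fit$purity, "\n")
cat("Ploidy:", fit$ploidy, "\n")
```

With a real run, the same helper could be applied to the data frame returned by `read.delim()` on the `_rho_and_psi.txt` file, provided the row labels are read in as row names.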