We introduced DECODE (Deciphering Cancer Origin from DNA Evolution), a novel tumor deconvolution method that detects and characterizes the neutral tail and mutation clusters in the site frequency spectrum (SFS) of a single DNA-sequencing sample. DECODE can be installed from GitHub.

DECODE builds on our mathematical framework for the SFS, which corrects for sample-specific sequencing coverage and mutation-calling biases. To infer the tail and cluster parameters accurately and efficiently, it uses ABC-SMC-DRF, our general likelihood-free inference method, which incorporates random forests into the sequential Monte Carlo framework and is also available as a stand-alone R package.
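
To illustrate what likelihood-free inference means in this setting, the following toy sketch recovers the frequency of a single mutation cluster by plain rejection ABC. It is not ABC-SMC-DRF and not DECODE's model: the binomial read-count simulator, the uniform prior, the mean-VAF summary statistic, and all function names and parameter values are assumptions chosen only for illustration.

```python
import random


def simulate_vafs(cluster_freq, n_mutations, depth, rng):
    """Simulate variant allele frequencies (VAFs) for one mutation cluster.

    Assumption: each mutation is covered by exactly `depth` reads and the
    number of variant reads is Binomial(depth, cluster_freq).
    """
    vafs = []
    for _ in range(n_mutations):
        alt_reads = sum(rng.random() < cluster_freq for _ in range(depth))
        vafs.append(alt_reads / depth)
    return vafs


def summary(vafs):
    """Summary statistic used to compare simulated and observed data."""
    return sum(vafs) / len(vafs)


def abc_rejection(observed_vafs, depth, n_draws, tolerance, rng):
    """Plain rejection ABC: keep prior draws whose simulation matches the data."""
    observed_summary = summary(observed_vafs)
    accepted = []
    for _ in range(n_draws):
        # Hypothetical prior: cluster frequency uniform on (0, 0.5).
        theta = rng.uniform(0.0, 0.5)
        simulated = simulate_vafs(theta, len(observed_vafs), depth, rng)
        if abs(summary(simulated) - observed_summary) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
true_freq = 0.25  # ground truth for the synthetic "observed" sample
observed = simulate_vafs(true_freq, n_mutations=50, depth=40, rng=rng)
posterior = abc_rejection(observed, depth=40, n_draws=2000, tolerance=0.01, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean = {posterior_mean:.3f}")
```

Plain rejection discards most prior draws and relies on a hand-picked summary statistic and tolerance. Sequential Monte Carlo schemes refine the proposal distribution over successive iterations instead, and random forests can take over the comparison between simulated and observed data, which is the general direction ABC-SMC-DRF takes.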

On synthetic data, DECODE outperformed existing methods across multiple metrics for intra-tumor heterogeneity (ITH) and accurately detected and characterized the SFS neutral tail, the shape of which reflects the tumor’s expansion mode. In acute myeloid leukemia, accounting for the tail yielded more parsimonious clonal decompositions that are better aligned with the subclonal dynamics that drive relapse. Applied to The Cancer Genome Atlas, DECODE detected a neutral SFS tail in most samples across tumor types and uncovered a clinically meaningful link between ITH and survival in low-grade glioma. By jointly inferring clonality and expansion mode, DECODE provides two complementary readouts of tumor evolution from a single sample.