Abstract: Current methods for annotating and interpreting human genetic variation typically exploit only a single information type (e.g., conservation) and/or are restricted in scope (e.g., to missense changes). Here, a method is described for objectively integrating many diverse annotations into a single measure (an integrated deleteriousness score, or C-score) for each variant. The method may be implemented as a support vector machine (SVM) trained to differentiate high-frequency human-derived alleles from simulated variants. C-scores were precomputed for all 8.6 billion possible human single-nucleotide variants, and the method also allows scoring of short insertions and deletions. C-scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects, and complex trait associations, and they rank known pathogenic variants highly within individual genomes.
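The training idea in the abstract (an SVM that separates high-frequency human-derived alleles from simulated variants, with the decision value serving as a per-variant score) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the application: the two feature columns, the randomly generated toy data, the label convention (+1 for simulated, -1 for observed), the hyperparameters, and the helper names `make_toy_data`, `train_linear_svm`, and `c_score`. It fits a plain linear SVM by stochastic subgradient descent on the regularized hinge loss and uses only the Python standard library.

```python
import random


def make_toy_data(n_per_class=50, seed=1):
    """Generate synthetic two-feature 'annotations' (purely illustrative).

    Label -1: stand-in for observed high-frequency human-derived alleles.
    Label +1: stand-in for simulated variants.
    The feature distributions are invented; they are not real annotations.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)])
        y.append(-1)
        X.append([rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)])
        y.append(1)
    return X, y


def c_score(w, b, x):
    """Linear decision value w.x + b, used here as a toy score for one variant."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * c_score(w, b, X[i])
            # L2 regularization shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # The hinge-loss subgradient is nonzero only inside the margin.
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


if __name__ == "__main__":
    X, y = make_toy_data()
    w, b = train_linear_svm(X, y)
    # Score one hypothetical variant described by its two toy annotations.
    print(c_score(w, b, [0.8, 1.2]))
```

Under the assumed label convention, a larger decision value means a variant looks more like the simulated class than like the observed high-frequency class. A practical implementation would use many more annotations and an established SVM library rather than this hand-written loop.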
Type:
Application
Filed:
September 20, 2014
Publication date:
December 8, 2016
Applicants:
University of Washington through its Center for Commercialization, HudsonAlpha Institute for Biotechnology
Inventors:
Jay Shendure, Gregory M Cooper, Martin Kircher, Daniela Witten