Patents by Inventor Kai-How FARH

Kai-How FARH has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Artificial intelligence-based analysis of protein three-dimensional (3D) structures

Patent number: 12217829

Abstract: The technology disclosed relates to determining pathogenicity of variants. In particular, the technology disclosed relates to generating amino acid-wise distance channels for a plurality of amino acids in a protein. Each of the amino acid-wise distance channels has voxel-wise distance values for voxels in a plurality of voxels. A tensor includes the amino acid-wise distance channels and at least an alternative allele of the protein expressed by a variant. A deep convolutional neural network determines a pathogenicity of the variant based at least in part on processing the tensor. The technology disclosed further augments the tensor with supplemental information like a reference allele of the protein, evolutionary conservation data about the protein, annotation data about the protein, and structure confidence data about the protein.
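
One way to picture the amino acid-wise distance channels described in this abstract is the minimal sketch below. It makes several assumptions that the abstract does not state: each residue is reduced to a single representative coordinate, the voxel grid is a small cube centered on an origin point, and each voxel stores the distance from its center to the nearest residue of a given amino acid type. The function name `distance_channels` and its parameters are hypothetical.

```python
import math
from itertools import product


def distance_channels(residues, grid_size=3, spacing=2.0, origin=(0.0, 0.0, 0.0)):
    """Build one distance channel per amino acid type (illustrative only).

    residues: list of (amino_acid_code, (x, y, z)) pairs, one coordinate
              per residue (an assumption, not taken from the abstract).
    Returns {amino_acid_code: {voxel_index: distance_to_nearest_residue}}.
    """
    # Voxel centers of a cubic grid centered on `origin`.
    offset = (grid_size - 1) / 2
    centers = {
        idx: tuple(origin[axis] + (idx[axis] - offset) * spacing for axis in range(3))
        for idx in product(range(grid_size), repeat=3)
    }

    channels = {}
    for code in sorted({c for c, _ in residues}):
        coords = [xyz for c, xyz in residues if c == code]
        # Each voxel holds the distance to the nearest residue of this type.
        channels[code] = {
            idx: min(math.dist(center, xyz) for xyz in coords)
            for idx, center in centers.items()
        }
    return channels


# Two residues: an alanine at the origin and a glycine 2 units along x.
example_residues = [("A", (0.0, 0.0, 0.0)), ("G", (2.0, 0.0, 0.0))]
example_channels = distance_channels(example_residues)
```

Stacking the per-type channels gives a tensor with one channel per amino acid type, to which further channels (for example, an encoding of the alternative allele) could be appended before a convolutional network processes it.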

Type: Grant

Filed: April 15, 2021

Date of Patent: February 4, 2025

Assignee: Illumina, Inc.

Inventors: Tobias Hamp, Hong Gao, Kai-How Farh

Deep learning-based variant classifier

Patent number: 12217832

Abstract: The technology disclosed directly operates on sequencing data and derives its own feature filters. It processes a plurality of aligned reads that span a target base position. It combines elegant encoding of the reads with a lightweight analysis to produce good recall and precision using lightweight hardware. For instance, one million training examples of target base variant sites with 50 to 100 reads each can be trained on a single GPU card in less than 10 hours with good recall and precision. A single GPU card is desirable because a computer with a single GPU is inexpensive and almost universally within reach for users looking at genetic data. It is readily available on cloud-based platforms.

Type: Grant

Filed: May 9, 2023

Date of Patent: February 4, 2025

Assignees: Illumina, Inc., Illumina Cambridge Limited

Inventors: Ole Schulz-Trieglaff, Anthony James Cox, Kai-How Farh

Splicing site classification using neural networks

Patent number: 12165742

Abstract: The technology disclosed relates to splice site prediction and aberrant splicing detection. In particular, it relates to a splice site predictor that includes a convolutional neural network trained on training examples of donor splice sites, acceptor splice sites, and non-splicing sites. An input stage of the convolutional neural network feeds an input sequence of nucleotides for evaluation of target nucleotides in the input sequence. An output stage of the convolutional neural network translates analysis by the convolutional neural network into classification scores for likelihoods that each of the target nucleotides is a donor splice site, an acceptor splice site, and a non-splicing site.

Type: Grant

Filed: September 29, 2023

Date of Patent: December 10, 2024

Assignee: Illumina, Inc.

Inventors: Kishore Jaganathan, Kai-How Farh, Jeremy Francis McRae, Sofia Kyriazopoulou Panagiotopoulou

CALIBRATING PATHOGENICITY SCORES FROM A VARIANT PATHOGENICITY MACHINE-LEARNING MODEL

Publication number: 20240290425

Abstract: This disclosure describes methods, non-transitory computer-readable media, and systems that can identify and apply a temperature weight to a pathogenicity prediction for an amino-acid variant at a particular protein position to calibrate and improve the accuracy of such a prediction. For example, in some cases, a variant pathogenicity machine-learning model generates an initial pathogenicity score for a protein or a target amino acid at a particular protein position based on an amino-acid sequence of the protein. The disclosed system further identifies a temperature weight that estimates a degree of certainty for pathogenicity scores output by the variant pathogenicity machine-learning model. To generate such a weight, the disclosed system can use a new triangle attention neural network as a temperature prediction machine-learning model.
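
The abstract does not say how the temperature weight is combined with the initial score. As a rough illustration only, the sketch below assumes conventional temperature scaling, in which a raw logit is divided by a positive temperature before the sigmoid is applied. The function name `calibrate` and the example values are hypothetical.

```python
import math


def calibrate(logit: float, temperature: float) -> float:
    """Return a sigmoid probability computed from logit / temperature.

    Assumes standard temperature scaling; the disclosed system may
    apply its temperature weight differently.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return 1.0 / (1.0 + math.exp(-logit / temperature))


# The same raw logit under a neutral and a high (less certain) temperature.
raw_probability = calibrate(2.0, 1.0)
softened_probability = calibrate(2.0, 4.0)
```

Under this assumption, a temperature above 1 pulls the score toward 0.5, which reflects lower certainty, while a temperature of 1 leaves the score unchanged.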

Type: Application

Filed: February 28, 2024

Publication date: August 29, 2024

Inventors: Tobias Hamp, Jeffrey Mark Ede, Kai-How Farh

Deep learning-based framework for identifying sequence patterns that cause sequence-specific errors (SSEs)

Patent number: 12073922

Abstract: The technology disclosed presents a deep learning-based framework, which identifies sequence patterns that cause sequence-specific errors (SSEs). Systems and methods train a variant filter on large-scale variant data to learn causal dependencies between sequence patterns and false variant calls. The variant filter has a hierarchical structure built on deep neural networks such as convolutional neural networks and fully-connected neural networks. Systems and methods implement a simulation that uses the variant filter to test known sequence patterns for their effect on variant filtering. The premise of the simulation is as follows: when a pair of a repeat pattern under test and a called variant is fed to the variant filter as part of a simulated input sequence and the variant filter classifies the called variant as a false variant call, then the repeat pattern is considered to have caused the false variant call and is identified as SSE-causing.

Type: Grant

Filed: July 8, 2019

Date of Patent: August 27, 2024

Assignee: Illumina, Inc.

Inventors: Dorna Kashefhaghighi, Amirali Kia, Kai-How Farh

Deep Learning-Based Pathogenicity Classifier for Promoter Single Nucleotide Variants (pSNVs)

Publication number: 20240242075

Abstract: We disclose computational models that alleviate the effects of human ascertainment biases in curated pathogenic non-coding variant databases by generating pathogenicity scores for variants occurring in the promoter regions (referred to herein as promoter single nucleotide variants (pSNVs)). We train deep learning networks (referred to herein as pathogenicity classifiers) using a semi-supervised approach to discriminate between a set of labeled benign variants and an unlabeled set of variants that were matched to remove biases.

Type: Application

Filed: November 17, 2023

Publication date: July 18, 2024

Inventors: Sofia Kyriazopoulou Panagiotopoulou, Kai-How Farh

ANALYZING EXPRESSION OF PROTEIN-CODING VARIANTS IN CELLS

Publication number: 20240167020

Abstract: Analyzing expression of protein-coding variants in cells is provided herein. A method may include replacing a protein-coding region of the DNA in a cell with a donor vector including a variant of the protein-coding region and a first barcode identifying that variant. The cell may generate mRNA including an expression of the variant and an expression of the first barcode. A second barcode corresponding to the cell may be coupled to the mRNA. The mRNA, having the second barcode coupled thereto, may be reverse transcribed into complementary DNA (cDNA). The cDNA may be sequenced. The donor vector or cDNA may be sequenced using amplicon sequencing. The donor vector sequence and the cDNA sequence may be correlated to identify the variant and the cell's expression of the variant.

Type: Application

Filed: March 8, 2022

Publication date: May 23, 2024

Applicant: Illumina, Inc.

Inventors: Hongxia Xu, Tong Liu, Shi Min Xiao, Dan Cao, Victor Quijano, Kai-How Farh, Mohan Sun

MACHINE LEARNING PIPELINE FOR GENOME-WIDE ASSOCIATION STUDIES

Publication number: 20240120024

Abstract: Genome-wide association studies may allow for detection of variants that are statistically significantly associated with disease risk. However, inferring which genes underlie these variant associations may be difficult. The presently disclosed approaches utilize machine learning techniques to predict genes from genome-wide association study summary statistics, which substantially improves causal gene identification in terms of both precision and recall compared to other techniques.

Type: Application

Filed: October 9, 2023

Publication date: April 11, 2024

Inventors: Yair Field, Jacob Christopher Ulirsch, Cinzia Malangone, Miguel Madrid-Mencia, Geoffrey Nilsen, Pam Tang Cheng, Ileena Mitra, Petko Plamenov Fiziev, Sabrina Rashid, Anthonius Petrus Nicolaas de Boer, Pierrick Wainschtein, Vlad Mihai Sima, Francois Aguet, Kai-How Farh

SPLICING SITE CLASSIFICATION USING NEURAL NETWORKS

Publication number: 20240055072

Abstract: The technology disclosed relates to splice site prediction and aberrant splicing detection. In particular, it relates to a splice site predictor that includes a convolutional neural network trained on training examples of donor splice sites, acceptor splice sites, and non-splicing sites. An input stage of the convolutional neural network feeds an input sequence of nucleotides for evaluation of target nucleotides in the input sequence. An output stage of the convolutional neural network translates analysis by the convolutional neural network into classification scores for likelihoods that each of the target nucleotides is a donor splice site, an acceptor splice site, and a non-splicing site.

Type: Application

Filed: September 29, 2023

Publication date: February 15, 2024

Inventors: Kishore Jaganathan, Kai-How Farh, Jeremy Francis McRae, Sofia Kyriazopoulou Panagiotopoulou

Federated systems and methods for medical data sharing

Patent number: 11875237

Abstract: Systems, computer-implemented methods, and non-transitory computer readable media are provided for sharing medical data. The disclosed systems may be configured to create a first workgroup having a first knowledgebase. This first knowledgebase may be federated with a common knowledgebase, and with a second knowledgebase of a second workgroup. At least one of the first knowledgebase, common knowledgebase, and second knowledgebase may be configured to store data items comprising associations, signs, and evidence. The signs may comprise measurements and contexts, and the associations may describe the relationships between the measurements and contexts. The evidence may support these associations. The disclosed systems may be configured to receive a request from a user in the first workgroup, retrieve matching data items, and optionally then output to the user at least some of the retrieved matching data items. The request may comprise at least one of a first association and a first measurement.

Type: Grant

Filed: February 7, 2022

Date of Patent: January 16, 2024

Assignee: Illumina, Inc.

Inventors: Kai-How Farh, Donavan Cheng, John Casey Shon, Jorg Hakenberg, Eugene Bolotin, James Geaney, Hong Gao, Pam Cheng, Inderjit Singh, Daniel Roche, Milan Karangutkar

Splicing Site Classification Using Neural Networks

Publication number: 20240013856

Abstract: The technology disclosed relates to splice site prediction and aberrant splicing detection. In particular, it relates to a splice site predictor that includes a convolutional neural network trained on training examples of donor splice sites, acceptor splice sites, and non-splicing sites. An input stage of the convolutional neural network feeds an input sequence of nucleotides for evaluation of target nucleotides in the input sequence. An output stage of the convolutional neural network translates analysis by the convolutional neural network into classification scores for likelihoods that each of the target nucleotides is a donor splice site, an acceptor splice site, and a non-splicing site.

Type: Application

Filed: July 26, 2022

Publication date: January 11, 2024

Applicant: Illumina, Inc.

Inventors: Kishore Jaganathan, Kai-how Farh, Jeremy F. McRAE, Sofia Kyriazopoulou Panagiotopoulou

Deep learning-based pathogenicity classifier for promoter single nucleotide variants (pSNVs)

Patent number: 11861491

Abstract: We disclose computational models that alleviate the effects of human ascertainment biases in curated pathogenic non-coding variant databases by generating pathogenicity scores for variants occurring in the promoter regions (referred to herein as promoter single nucleotide variants (pSNVs)). We train deep learning networks (referred to herein as pathogenicity classifiers) using a semi-supervised approach to discriminate between a set of labeled benign variants and an unlabeled set of variants that were matched to remove biases.

Type: Grant

Filed: September 20, 2019

Date of Patent: January 2, 2024

Assignee: Illumina, Inc.

Inventors: Sofia Kyriazopoulou Panagiotopoulou, Kai-How Farh

Deep learning-based aberrant splicing detection

Patent number: 11837324

Abstract: The technology disclosed relates to constructing a convolutional neural network-based classifier for variant classification. In particular, it relates to training a convolutional neural network-based classifier on training data using a backpropagation-based gradient update technique that progressively matches outputs of the convolutional neural network-based classifier with corresponding ground truth labels. The convolutional neural network-based classifier comprises groups of residual blocks. Each group of residual blocks is parameterized by a number of convolution filters in the residual blocks, a convolution window size of the residual blocks, and an atrous convolution rate of the residual blocks. The convolution window size and the atrous convolution rate both vary between groups of residual blocks. The training data includes benign training examples and pathogenic training examples of translated sequence pairs generated from benign variants and pathogenic variants.

Type: Grant

Filed: October 15, 2018

Date of Patent: December 5, 2023

Assignee: Illumina, Inc.

Inventors: Kishore Jaganathan, Kai-How Farh, Sofia Kyriazopoulou Panagiotopoulou, Jeremy Francis McRae

DEEP LEARNING-BASED VARIANT CLASSIFIER

Publication number: 20230386611

Abstract: The technology disclosed directly operates on sequencing data and derives its own feature filters. It processes a plurality of aligned reads that span a target base position. It combines elegant encoding of the reads with a lightweight analysis to produce good recall and precision using lightweight hardware. For instance, one million training examples of target base variant sites with 50 to 100 reads each can be trained on a single GPU card in less than 10 hours with good recall and precision. A single GPU card is desirable because a computer with a single GPU is inexpensive and almost universally within reach for users looking at genetic data. It is readily available on cloud-based platforms.

Type: Application

Filed: May 9, 2023

Publication date: November 30, 2023

Inventors: Ole SCHULZ-TRIEGLAFF, Anthony James COX, Kai-How FARH

PROTEIN STRUCTURE-BASED PROTEIN LANGUAGE MODELS

Publication number: 20230343413

Abstract: The technology disclosed relates to determining pathogenicity of nucleotide variants. In particular, the technology disclosed relates to specifying a particular amino acid at a particular position in a protein as a gap amino acid, and specifying remaining amino acids at remaining positions in the protein as non-gap amino acids. The technology disclosed further relates to generating a gapped spatial representation of the protein that includes spatial configurations of the non-gap amino acids, and excludes a spatial configuration of the gap amino acid, and determining a pathogenicity of a nucleotide variant based at least in part on the gapped spatial representation, and a representation of an alternate amino acid encoded by the nucleotide variant at the particular position.

Type: Application

Filed: November 15, 2022

Publication date: October 26, 2023

Inventors: Tobias HAMP, Hong GAO, Kai-How FARH

Semi-supervised learning for training an ensemble of deep convolutional neural networks

Patent number: 11798650

Abstract: The technology disclosed relates to constructing a convolutional neural network-based classifier for variant classification. In particular, it relates to training a convolutional neural network-based classifier on training data using a backpropagation-based gradient update technique that progressively matches outputs of the convolutional neural network-based classifier with corresponding ground truth labels. The convolutional neural network-based classifier comprises groups of residual blocks. Each group of residual blocks is parameterized by a number of convolution filters in the residual blocks, a convolution window size of the residual blocks, and an atrous convolution rate of the residual blocks. The convolution window size and the atrous convolution rate both vary between groups of residual blocks. The training data includes benign training examples and pathogenic training examples of translated sequence pairs generated from benign variants and pathogenic variants.

Type: Grant

Filed: October 15, 2018

Date of Patent: October 24, 2023

Assignee: Illumina, Inc.

Inventors: Laksshman Sundaram, Kai-How Farh, Hong Gao, Jeremy Francis McRae

IMAGE-BASED VARIANT PATHOGENICITY DETERMINATION

Publication number: 20230245305

Abstract: Described herein are technologies for classifying a protein structure (such as technologies for classifying the pathogenicity of a protein structure related to a nucleotide variant). Such a classification is based on two-dimensional images taken from a three-dimensional image of the protein structure. With respect to some implementations, described herein are multi-view convolutional neural networks (CNNs) for classifying a protein structure based on inputs of two-dimensional images taken from a three-dimensional image of the protein structure. In some implementations, a computer-implemented method of determining pathogenicity of variants includes accessing a structural rendition of amino acids, capturing images of those parts of the structural rendition that contain a target amino acid from the amino acids, and, based on the images, determining pathogenicity of a nucleotide variant that mutates the target amino acid into an alternate amino acid.

Type: Application

Filed: January 27, 2023

Publication date: August 3, 2023

Inventors: Tobias Hamp, Hong Gao, Kai-How Farh

INDEL PATHOGENICITY DETERMINATION

Publication number: 20230245717

Abstract: Described herein are technologies for converting context of an artificial neural network (ANN) or context of another type of computing system that is trainable through machine learning. In some implementations, the technologies convert a first context of a computing system (such as an ANN), which is to provide pathogenicity of variants of genomes of a population, to a second context of the computing system, which is to provide pathogenicity of indels of the genomes of the population.

Type: Application

Filed: January 27, 2023

Publication date: August 3, 2023

Inventors: Jeremy Francis McRae, Yanshen Yang, Marc Fasnacht, Kai-How Farh

Deep learning-based variant classifier

Patent number: 11705219

Abstract: The technology disclosed directly operates on sequencing data and derives its own feature filters. It processes a plurality of aligned reads that span a target base position. It combines elegant encoding of the reads with a lightweight analysis to produce good recall and precision using lightweight hardware. For instance, one million training examples of target base variant sites with 50 to 100 reads each can be trained on a single GPU card in less than 10 hours with good recall and precision. A single GPU card is desirable because a computer with a single GPU is inexpensive and almost universally within reach for users looking at genetic data. It is readily available on cloud-based platforms.

Type: Grant

Filed: January 14, 2019

Date of Patent: July 18, 2023

Assignees: Illumina, Inc., Illumina Cambridge Limited

Inventors: Ole Schulz-Trieglaff, Anthony James Cox, Kai-How Farh

INTER-MODEL PREDICTION SCORE RECALIBRATION

Publication number: 20230223100

Abstract: The technology disclosed relates to inter-model prediction score recalibration. In one implementation, the technology disclosed relates to a system including a first model that generates, based on evolutionary conservation summary statistics of amino acids in a target protein sequence, a first pathogenicity score-to-rank mapping for a set of variants in the target protein sequence; and a second model that generates, based on epistasis expressed by amino acid patterns spanning the target protein sequence and a plurality of non-target protein sequences aligned in a multiple sequence alignment, a second pathogenicity score-to-rank mapping for the set of variants. The system also includes a reassignment logic that reassigns pathogenicity scores from the first set of pathogenicity scores to the set of variants based on the first and second score-to-rank mappings, and an output logic to generate a ranking of the set of variants based on the reassigned scores.
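
The reassignment step in this abstract can be read in more than one way. The sketch below shows one plausible reading, offered as an assumption rather than the claimed method: each variant keeps the rank given by the second model but receives the first-model score that occupies that same rank. The function name `reassign_scores` and the toy variant labels are hypothetical.

```python
def reassign_scores(first_scores, second_scores):
    """Give each variant the first-model score found at its second-model rank.

    first_scores, second_scores: dicts mapping variant -> score, where a
    higher score means more pathogenic. Both must cover the same variants.
    """
    if first_scores.keys() != second_scores.keys():
        raise ValueError("both models must score the same set of variants")

    # First-model scores in rank order (rank 0 = highest score).
    ranked_first = sorted(first_scores.values(), reverse=True)
    # Variants in rank order according to the second model.
    second_order = sorted(second_scores, key=second_scores.get, reverse=True)

    return {variant: ranked_first[rank] for rank, variant in enumerate(second_order)}


# Toy scores for three hypothetical variants.
first_model = {"A": 0.9, "B": 0.5, "C": 0.1}
second_model = {"A": 0.2, "B": 0.8, "C": 0.6}
reassigned = reassign_scores(first_model, second_model)
# reassigned == {"B": 0.9, "C": 0.5, "A": 0.1}
```

Under this reading, the output keeps the score distribution of the first model while adopting the ordering of the second, so sorting the variants by reassigned score gives the final ranking.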

Type: Application

Filed: September 16, 2022

Publication date: July 13, 2023

Applicants: Illumina, Inc., Illumina Cambridge Limited

Inventors: Tobias HAMP, Kai-How FARH
