Patents by Inventor Bonnie Berger Leighton

Bonnie Berger Leighton has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Quality score compression for improving downstream genotyping accuracy

Publication number: 20240004838

Abstract: This disclosure provides for a highly-efficient and scalable compression tool that compresses quality scores, preferably by capitalizing on sequence redundancy. In one embodiment, compression is achieved by smoothing a large fraction of quality score values based on k-mer neighborhood of their corresponding positions in read sequences. The approach exploits the intuition that any divergent base in a k-mer likely corresponds to either a single-nucleotide polymorphism (SNP) or sequencing error; thus, a preferred approach is to only preserve quality scores for probable variant locations and compress quality scores of concordant bases, preferably by resetting them to a default value. By viewing individual read datasets through the lens of k-mer frequencies in a corpus of reads, the approach herein ensures that compression “lossiness” does not affect accuracy in a deleterious way.

Type: Application

Filed: September 19, 2023

Publication date: January 4, 2024

Inventors: Bonnie Berger Leighton, Deniz Yorukoglu, Yun William Yu, Jian Peng
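The k-mer-based smoothing described in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the Hamming-distance-1 neighborhood, the frequency cutoff `min_count`, and the default score `"I"` are all assumptions made for illustration. Positions where a read k-mer diverges from a frequent corpus k-mer keep their quality scores; concordant positions are reset to the default.

```python
from collections import Counter

def frequent_kmers(reads, k, min_count):
    """Collect k-mers seen at least min_count times across the read corpus."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return {kmer for kmer, n in counts.items() if n >= min_count}

def divergent_offsets(kmer, dictionary, alphabet="ACGT"):
    """Return None if the k-mer has no dictionary match within Hamming
    distance 1; otherwise return the offsets where it differs from a
    dictionary k-mer (empty set for an exact match with no neighbors)."""
    offsets = set()
    matched = kmer in dictionary
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base and kmer[:i] + alt + kmer[i + 1:] in dictionary:
                offsets.add(i)
                matched = True
    return offsets if matched else None

def smooth_qualities(seq, quals, dictionary, k, default_q="I"):
    """Reset scores of concordant bases; preserve scores at divergent bases."""
    smoothable, preserve = set(), set()
    for start in range(len(seq) - k + 1):
        offsets = divergent_offsets(seq[start:start + k], dictionary)
        if offsets is None:
            continue  # no evidence from the corpus: leave these scores alone
        for off in range(k):
            (preserve if off in offsets else smoothable).add(start + off)
    return "".join(
        default_q if i in smoothable and i not in preserve else q
        for i, q in enumerate(quals)
    )
```

In this sketch a position is smoothed only if some window marks it concordant and no window marks it divergent, which errs on the side of preserving scores near possible variants.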

Quality score compression apparatus and method for improving downstream accuracy

Patent number: 11762813

Abstract: This disclosure provides for a highly-efficient and scalable compression tool that compresses quality scores, preferably by capitalizing on sequence redundancy. In one embodiment, compression is achieved by smoothing a large fraction of quality score values based on k-mer neighborhood of their corresponding positions in read sequences. The approach exploits the intuition that any divergent base in a k-mer likely corresponds to either a single-nucleotide polymorphism (SNP) or sequencing error; thus, a preferred approach is to only preserve quality scores for probable variant locations and compress quality scores of concordant bases, preferably by resetting them to a default value. By viewing individual read datasets through the lens of k-mer frequencies in a corpus of reads, the approach herein ensures that compression “lossiness” does not affect accuracy in a deleterious way.

Type: Grant

Filed: February 5, 2019

Date of Patent: September 19, 2023

Inventors: Bonnie Berger Leighton, Deniz Yorukoglu, Yun William Yu, Jian Peng

Memory-efficient whole genome assembly of long reads

Publication number: 20230178179

Abstract: A method for computation- and memory-efficient DNA sequencing. In one embodiment, the approach herein is used to facilitate genome assembly for state-of-the-art and low-error long-read data. In this embodiment, the approach herein implements a minimizer-space de Bruijn graph, which—instead of building an assembly over sequence bases (in a base-space wherein an alphabet sequence comprises nucleotide letters)—performs assembly in a minimizer-space (wherein an alphabet sequence comprises an ordered sequence of minimizers), and later converts the assembly back to base-space assemblies. Specifically, and in a preferred implementation, each read is initially converted to an ordered sequence of its minimizers. The order of the minimizers is maintained to facilitate reconstructing the entire genome as an ordered list.

Type: Application

Filed: September 6, 2022

Publication date: June 8, 2023

Inventors: Baris Ekim, Bonnie Berger Leighton, Rayan Chikhi
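As a rough illustration of the minimizer-space idea in the abstract above, the sketch below converts each read into an ordered list of minimizers and then builds de Bruijn graph nodes from runs of consecutive minimizers. The selection rule (keep an l-mer when its hash falls below a density threshold), the BLAKE2 hash, and all function and parameter names are assumptions for illustration; the abstract does not specify them.

```python
import hashlib

def lmer_hash(lmer):
    """Deterministic 64-bit hash of an l-mer (assumed hash choice)."""
    digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def to_minimizer_space(read, l, density):
    """Convert a read into its ordered sequence of selected l-mers."""
    threshold = density * 2**64
    return [
        read[i:i + l]
        for i in range(len(read) - l + 1)
        if lmer_hash(read[i:i + l]) < threshold
    ]

def kminmers(minimizers, k):
    """Graph nodes: tuples of k consecutive minimizers, in read order."""
    return [tuple(minimizers[i:i + k]) for i in range(len(minimizers) - k + 1)]

def build_graph(reads, l, density, k):
    """Link each node to the node that follows it in some read."""
    edges = {}
    for read in reads:
        nodes = kminmers(to_minimizer_space(read, l, density), k)
        for a, b in zip(nodes, nodes[1:]):
            edges.setdefault(a, set()).add(b)
    return edges
```

Because the alphabet is now minimizers rather than nucleotides, each node stands for a much longer stretch of sequence, which is where the memory savings in this sketch would come from. Converting the assembled paths back to base-space is not shown.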

Realizing private and practical pharmacological collaboration using a neural network architecture configured for reduced computation overhead

Publication number: 20230154630

Abstract: Computationally-efficient techniques facilitate secure pharmacological collaboration with respect to private drug target interaction (DTI) data. In one embodiment, a method begins by receiving, via a secret sharing protocol, observed DTI data from individual participating entities. A secure computation then is executed against the secretly-shared data to generate a pooled DTI dataset. For increased computational efficiency, at least a part of the computation is executed over dimensionality-reduced data. The resulting pooled DTI dataset is then used to train a neural network model. The model is then used to provide one or more DTI predictions that are then returned to the participating entities (or other interested parties).

Type: Application

Filed: September 20, 2022

Publication date: May 18, 2023

Inventors: Brian Hie, Bonnie Berger Leighton, Hyunghoon Cho
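The secret-sharing step in the abstract above can be sketched with additive shares over a prime field. This is a generic textbook construction, not the protocol claimed in the patent; the modulus, the function names, and the choice to model pooling as a sum of per-entity counts are assumptions for illustration.

```python
import secrets

PRIME = 2**61 - 1  # assumed field modulus for this sketch

def share(value, n_parties):
    """Split a value into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; any incomplete subset reveals nothing useful."""
    return sum(shares) % PRIME

def pool(shared_inputs):
    """Each computing party adds the shares it holds, one per data owner.
    No party ever sees an individual owner's raw value."""
    return [sum(column) % PRIME for column in zip(*shared_inputs)]
```

For example, three entities holding counts 3, 5, and 11 would each call `share(count, 3)`, send one share to each computing party, and only the pooled total of 19 would be recoverable from `pool(...)`.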

Protein database search using learned representations

Publication number: 20230123770

Abstract: A method for efficient search of protein sequence databases for proteins that have sequence, structural, and/or functional homology with respect to information derived from a search query. The method involves transforming the protein sequences into vector representations and searching in a vector space. Given a database of protein sequences and a learned embedding model, the embedding model is applied to each amino acid sequence to transform it into a sequence of vector representations. A query sequence is also transformed into a sequence of vector representations, preferably using the same learned embedding model. Once the query has been embedded in this manner, proteins are retrieved from the database based on distance between the query embedding and the protein embeddings contained within the database. Rapid and accurate search of the vector space is carried out using exact search using metric data structures, or approximate search using locality sensitive hashing.

Type: Application

Filed: December 20, 2022

Publication date: April 20, 2023

Inventors: Tristan Bepler, Bonnie Berger Leighton
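The approximate vector-space search mentioned in the abstract above can be illustrated with random-hyperplane locality sensitive hashing. This is a sketch under stated assumptions: the learned embedding model is not shown, each protein's sequence of vectors is reduced to one vector by mean pooling (an assumption, not stated in the abstract), and the class and function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def mean_pool(vectors):
    """Collapse a sequence of per-residue vectors into a single vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

class HyperplaneLSH:
    """Bucket vectors by the sign pattern of their random projections."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in self.planes
        )

    def add(self, key, vec):
        self.buckets[self._signature(vec)].append((key, vec))

    def query(self, vec, top_k=5):
        """Rank only the candidates in the query's bucket by Euclidean distance."""
        candidates = self.buckets.get(self._signature(vec), [])
        ranked = sorted(candidates, key=lambda kv: math.dist(kv[1], vec))
        return [key for key, _ in ranked[:top_k]]
```

Nearby vectors tend to share a signature, so a query inspects one bucket instead of the whole database. A practical index would typically use several hash tables to reduce missed neighbors.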

Compressively-accelerated read mapping framework for next-generation sequencing

Patent number: 11632125

Abstract: A method of compressive read mapping. A high-resolution homology table is created for the reference genomic sequence, preferably by mapping the reference to itself. Once the homology table is created, the reads are compressed to eliminate full or partial redundancies across reads in the dataset. Preferably, compression is achieved through self-mapping of the read dataset. Next, a coarse mapping from the compressed read data to the reference is performed. Each read link generated represents a cluster of substrings from one or more reads in the dataset and stores their differences from a locus in the reference. Preferably, read links are further expanded to obtain final mapping results through traversal of the homology table, and final mapping results are reported. As compared to prior techniques, substantial speed-up gains are achieved through the compressive read mapping technique due to efficient utilization of redundancy within read sequences as well as the reference.

Type: Grant

Filed: June 8, 2021

Date of Patent: April 18, 2023

Inventors: Bonnie Berger Leighton, Deniz Yorukoglu, Jian Peng

Protein database search using learned representations

Patent number: 11532378

Abstract: A method for efficient search of protein sequence databases for proteins that have sequence, structural, and/or functional homology with respect to information derived from a search query. The method involves transforming the protein sequences into vector representations and searching in a vector space. Given a database of protein sequences and a learned embedding model, the embedding model is applied to each amino acid sequence to transform it into a sequence of vector representations. A query sequence is also transformed into a sequence of vector representations, preferably using the same learned embedding model. Once the query has been embedded in this manner, proteins are retrieved from the database based on distance between the query embedding and the protein embeddings contained within the database. Rapid and accurate search of the vector space is carried out using exact search using metric data structures, or approximate search using locality sensitive hashing.

Type: Grant

Filed: November 23, 2021

Date of Patent: December 20, 2022

Assignee: NE47 Bio, Inc.

Inventors: Tristan Bepler, Bonnie Berger Leighton

Multi-resolution modeling of discrete stochastic processes for computationally-efficient information search and retrieval

Publication number: 20220374425

Abstract: An activity of interest is modeled by a non-stationary discrete stochastic process, such as a pattern of mutations across a cancer genome. Initially, input genomic data is used to train a model to predict rate parameters and their associated uncertainty estimation for each of a set of process regions. For any arbitrary set of indexed positions of the stochastic process that are identified in an information query, the rate parameters and their associated estimation uncertainties are scaled using the model to obtain a distribution of the events of interest and their associated estimation uncertainties for the set of indexed positions. In one practical application, and in response to a search query associated with one or more base-pairs, a result is then returned. The result, which represents deviations between the estimated and observed mutation rates, is used to identify genomic elements that have more mutations than expected and therefore constitute previously unknown driver mutations.

Type: Application

Filed: April 18, 2022

Publication date: November 24, 2022

Inventors: Bonnie Berger Leighton, Maxwell Aaron Sherman, Adam Uri Yaari

Realizing private and practical pharmacological collaboration using a neural network architecture configured for reduced computation overhead

Patent number: 11450439

Abstract: Computationally-efficient techniques facilitate secure pharmacological collaboration with respect to private drug target interaction (DTI) data. In one embodiment, a method begins by receiving, via a secret sharing protocol, observed DTI data from individual participating entities. A secure computation then is executed against the secretly-shared data to generate a pooled DTI dataset. For increased computational efficiency, at least a part of the computation is executed over dimensionality-reduced data. The resulting pooled DTI dataset is then used to train a neural network model. The model is then used to provide one or more DTI predictions that are then returned to the participating entities (or other interested parties).

Type: Grant

Filed: December 28, 2018

Date of Patent: September 20, 2022

Inventors: Brian Hie, Bonnie Berger Leighton, Hyunghoon Cho

Protein database search using learned representations

Publication number: 20220165356

Abstract: A method for efficient search of protein sequence databases for proteins that have sequence, structural, and/or functional homology with respect to information derived from a search query. The method involves transforming the protein sequences into vector representations and searching in a vector space. Given a database of protein sequences and a learned embedding model, the embedding model is applied to each amino acid sequence to transform it into a sequence of vector representations. A query sequence is also transformed into a sequence of vector representations, preferably using the same learned embedding model. Once the query has been embedded in this manner, proteins are retrieved from the database based on distance between the query embedding and the protein embeddings contained within the database. Rapid and accurate search of the vector space is carried out using exact search using metric data structures, or approximate search using locality sensitive hashing.

Type: Application

Filed: November 23, 2021

Publication date: May 26, 2022

Inventors: Tristan Bepler, Bonnie Berger Leighton

Multi-resolution modeling of discrete stochastic processes for computationally-efficient information search and retrieval

Patent number: 11308101

Abstract: An activity of interest is modeled by a non-stationary discrete stochastic process, such as a pattern of mutations across a cancer genome. Initially, input genomic data is used to train a model to predict rate parameters and their associated uncertainty estimation for each of a set of process regions. For any arbitrary set of indexed positions of the stochastic process that are identified in an information query, the rate parameters and their associated estimation uncertainties are scaled using the model to obtain a distribution of the events of interest and their associated estimation uncertainties for the set of indexed positions. In one practical application, and in response to a search query associated with one or more base-pairs, a result is then returned. The result, which represents deviations between the estimated and observed mutation rates, is used to identify genomic elements that have more mutations than expected and therefore constitute previously unknown driver mutations.

Type: Grant

Filed: August 27, 2021

Date of Patent: April 19, 2022

Inventors: Bonnie Berger Leighton, Maxwell Aaron Sherman, Adam Uri Yaari

Multi-resolution modeling of discrete stochastic processes for computationally-efficient information search and retrieval

Publication number: 20220092065

Abstract: An activity of interest is modeled by a non-stationary discrete stochastic process, such as a pattern of mutations across a cancer genome. Initially, input genomic data is used to train a model to predict rate parameters and their associated uncertainty estimation for each of a set of process regions. For any arbitrary set of indexed positions of the stochastic process that are identified in an information query, the rate parameters and their associated estimation uncertainties are scaled using the model to obtain a distribution of the events of interest and their associated estimation uncertainties for the set of indexed positions. In one practical application, and in response to a search query associated with one or more base-pairs, a result is then returned. The result, which represents deviations between the estimated and observed mutation rates, is used to identify genomic elements that have more mutations than expected and therefore constitute previously unknown driver mutations.

Type: Application

Filed: August 27, 2021

Publication date: March 24, 2022

Inventors: Bonnie Berger Leighton, Maxwell Aaron Sherman, Adam Uri Yaari

Escape profiling for therapeutic and vaccine development

Publication number: 20220013194

Abstract: A method of viral escape profiling is used in association with antiviral or vaccine development. The method begins by training a language-based model against training data comprising a corpus of viral protein sequences of a given viral protein to model a viral escape profile. The viral escape profile represents, for one or more regions of the given viral protein, a relative viral escape potential of a mutation, the relative viral escape potential being derived as a function that combines both “semantic change,” representing a degree to which the mutation is recognized by the human immune system (i.e., antigenic change), and “grammaticality,” representing a degree to which the mutation affects viral infectivity (i.e., viral fitness). Using the model, a region of the given viral protein having an escape potential of interest is identified. Information regarding the region is then output to a vaccine or anti-viral therapeutic design and development workflow.

Type: Application

Filed: May 17, 2021

Publication date: January 13, 2022

Inventors: Brian Hie, Bonnie Berger Leighton, Bryan D. Bryson
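The abstract above says only that escape potential is a function combining "semantic change" and "grammaticality". One simple way to combine two such scores is a weighted sum of their ranks, sketched below. The rank-sum form, the weight `beta`, and the function names are assumptions for illustration, and the per-mutation scores are taken as given rather than computed by a language model.

```python
def rank(values):
    """Rank values from 1 (smallest) to n (largest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def escape_priority(semantic_change, grammaticality, beta=1.0):
    """Score each candidate mutation; higher means both scores rank high."""
    sc_rank = rank(semantic_change)
    gr_rank = rank(grammaticality)
    return [s + beta * g for s, g in zip(sc_rank, gr_rank)]
```

Using ranks rather than raw values keeps the two scores comparable even when they are on different scales.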

Secure genome crowdsourcing for large-scale association studies

Publication number: 20210398611

Abstract: Computationally-efficient techniques facilitate secure crowdsourcing of genomic and phenotypic data, e.g., for large-scale association studies. In one embodiment, a method begins by receiving, via a secret sharing protocol, genomic and phenotypic data of individual study participants. Another data set, comprising results of pre-computation over random number data, e.g., mutually independent and uniformly-distributed random numbers and results of calculations over those random numbers, is also received via secret sharing. A secure computation then is executed against the secretly-shared genomic and phenotypic data, using the secretly-shared results of the pre-computation over random number data, to generate a set of genome-wide association study (GWAS) statistics. For increased computational efficiency, at least a part of the computation is executed over dimensionality-reduced genomic data.

Type: Application

Filed: February 1, 2021

Publication date: December 23, 2021

Inventors: Hyunghoon Cho, Bonnie Berger Leighton, David J. Wu
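The abstract above mentions secret-shared results of pre-computation over random numbers. A well-known construction of that kind is the Beaver multiplication triple, sketched below for additive shares. Treating the patent's pre-computation as Beaver triples is an assumption, as are the modulus and all names; the triple is generated in the clear here only to keep the sketch short.

```python
import secrets

PRIME = 2**61 - 1  # assumed field modulus for this sketch

def share(value, n_parties):
    """Split a value into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

def beaver_triple(n_parties):
    """Pre-computation: shares of random a, b and of their product c = a*b."""
    a, b = secrets.randbelow(PRIME), secrets.randbelow(PRIME)
    return share(a, n_parties), share(b, n_parties), share(a * b % PRIME, n_parties)

def secure_multiply(x_sh, y_sh, triple):
    """Multiply two shared values using one triple.
    Only the masked differences d = x - a and e = y - b are opened."""
    a_sh, b_sh, c_sh = triple
    d = reconstruct([(x - a) % PRIME for x, a in zip(x_sh, a_sh)])
    e = reconstruct([(y - b) % PRIME for y, b in zip(y_sh, b_sh)])
    z_sh = [(c + d * b + e * a) % PRIME for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % PRIME  # public term added by one party only
    return z_sh
```

The identity behind it is xy = c + d·b + e·a + d·e, so the expensive randomness can be prepared ahead of time and the online step needs only additions and two openings.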

Compressively-accelerated read mapping framework for next-generation sequencing

Publication number: 20210297090

Abstract: A method of compressive read mapping. A high-resolution homology table is created for the reference genomic sequence, preferably by mapping the reference to itself. Once the homology table is created, the reads are compressed to eliminate full or partial redundancies across reads in the dataset. Preferably, compression is achieved through self-mapping of the read dataset. Next, a coarse mapping from the compressed read data to the reference is performed. Each read link generated represents a cluster of substrings from one or more reads in the dataset and stores their differences from a locus in the reference. Preferably, read links are further expanded to obtain final mapping results through traversal of the homology table, and final mapping results are reported. As compared to prior techniques, substantial speed-up gains are achieved through the compressive read mapping technique due to efficient utilization of redundancy within read sequences as well as the reference.

Type: Application

Filed: June 8, 2021

Publication date: September 23, 2021

Inventors: Bonnie Berger Leighton, Deniz Yorukoglu, Jian Peng

Compressively-accelerated read mapping framework for next-generation sequencing

Patent number: 11031950

Abstract: A method of compressive read mapping. A high-resolution homology table is created for the reference genomic sequence, preferably by mapping the reference to itself. Once the homology table is created, the reads are compressed to eliminate full or partial redundancies across reads in the dataset. Preferably, compression is achieved through self-mapping of the read dataset. Next, a coarse mapping from the compressed read data to the reference is performed. Each read link generated represents a cluster of substrings from one or more reads in the dataset and stores their differences from a locus in the reference. Preferably, read links are further expanded to obtain final mapping results through traversal of the homology table, and final mapping results are reported. As compared to prior techniques, substantial speed-up gains are achieved through the compressive read mapping technique due to efficient utilization of redundancy within read sequences as well as the reference.

Type: Grant

Filed: March 12, 2019

Date of Patent: June 8, 2021

Inventors: Bonnie Berger Leighton, Deniz Yorukoglu, Jian Peng

Escape profiling for therapeutic and vaccine development

Patent number: 11011253

Abstract: A method of viral escape profiling is used in association with antiviral or vaccine development. The method begins by training a language-based model against training data comprising a corpus of viral protein sequences of a given viral protein to model a viral escape profile. The viral escape profile represents, for one or more regions of the given viral protein, a relative viral escape potential of a mutation, the relative viral escape potential being derived as a function that combines both “semantic change,” representing a degree to which the mutation is recognized by the human immune system (i.e., antigenic change), and “grammaticality,” representing a degree to which the mutation affects viral infectivity (i.e., viral fitness). Using the model, a region of the given viral protein having an escape potential of interest is identified. Information regarding the region is then output to a vaccine or anti-viral therapeutic design and development workflow.

Type: Grant

Filed: January 13, 2021

Date of Patent: May 18, 2021

Inventors: Brian Hie, Bonnie Berger Leighton, Bryan D. Bryson

Secure secret-sharing-based crowdsourcing for large-scale association studies of genomic and phenotypic data

Patent number: 10910087

Abstract: Computationally-efficient techniques facilitate secure crowdsourcing of genomic and phenotypic data, e.g., for large-scale association studies. In one embodiment, a method begins by receiving, via a secret sharing protocol, genomic and phenotypic data of individual study participants. Another data set, comprising results of pre-computation over random number data, e.g., mutually independent and uniformly-distributed random numbers and results of calculations over those random numbers, is also received via secret sharing. A secure computation then is executed against the secretly-shared genomic and phenotypic data, using the secretly-shared results of the pre-computation over random number data, to generate a set of genome-wide association study (GWAS) statistics. For increased computational efficiency, at least a part of the computation is executed over dimensionality-reduced genomic data.

Type: Grant

Filed: June 27, 2018

Date of Patent: February 2, 2021

Inventors: Hyunghoon Cho, Bonnie Berger Leighton, David J. Wu

Compressing, storing and searching sequence data

Publication number: 20200411138

Abstract: The redundancy in genomic sequence data is exploited by compressing sequence data in such a way as to allow direct computation on the compressed data using methods that are referred to herein as “compressive” algorithms. This approach reduces the task of computing on many similar genomes to only slightly more than that of operating on just one. In this approach, the redundancy among genomes is translated into computational acceleration by storing genomes in a compressed format that respects the structure of similarities and differences important to analysis. Specifically, these differences are the nucleotide substitutions, insertions, deletions, and rearrangements introduced by evolution. Once such a compressed library has been created, analysis is performed on it in time proportional to its compressed size, rather than having to reconstruct the full data set every time one wishes to query it.

Type: Application

Filed: September 15, 2020

Publication date: December 31, 2020

Inventors: Michael H. Baym, Bonnie Berger Leighton, Po-Ru Loh

Compressing, storing and searching sequence data

Patent number: 10777304

Abstract: The redundancy in genomic sequence data is exploited by compressing sequence data in such a way as to allow direct computation on the compressed data using methods that are referred to herein as “compressive” algorithms. This approach reduces the task of computing on many similar genomes to only slightly more than that of operating on just one. In this approach, the redundancy among genomes is translated into computational acceleration by storing genomes in a compressed format that respects the structure of similarities and differences important to analysis. Specifically, these differences are the nucleotide substitutions, insertions, deletions, and rearrangements introduced by evolution. Once such a compressed library has been created, analysis is performed on it in time proportional to its compressed size, rather than having to reconstruct the full data set every time one wishes to query it.

Type: Grant

Filed: July 24, 2017

Date of Patent: September 15, 2020

Inventors: Michael H. Baym, Bonnie Berger Leighton, Po-Ru Loh