Patents by Inventor Tristan Bepler

Tristan Bepler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Automating cryo-electron microscopy data collection

Publication number: 20240395497

Abstract: A method of automated control of a microscope in cryogenic electron microscopy (cryo-EM), wherein the microscope is configured to collect high-magnification micrographs of particles suspended in vitreous ice. Such particles are found in grid squares, and a square contains holes from which high-magnification micrographs are imaged. The method is carried out during an active data collection session, leveraging a pipeline that comprises a set of models. The pipeline evaluates a set of collection locations to determine whether to continue collection at a current grid/square or instead at a new grid/square. The evaluation is based on a set of one or more quality scores derived from one or more pretrained models and machine learning-based active learning. Based on the determination, control information is provided to automatically control the microscope to move to a next target for data collection.

Type: Application

Filed: August 6, 2024

Publication date: November 28, 2024

Inventors: Paul T. Kim, Tristan Bepler

Protein engineering workflow using a generative model of protein families

Publication number: 20240282404

Abstract: A retrieval-augmented framework leverages a generative protein language model of whole protein families. The model is configured and trained on homologous sequences and learns to generate sets of related proteins as sequences-of-sequences across very large numbers (e.g., tens of millions) of natural protein sequence clusters. In order to capture conditioning between sequences in an order independent manner (typically, the order of sequences within a family is arbitrary) and to generalize to large context lengths, the model leverages a transformer layer that models order-dependence between tokens within sequences and order-independence between sequences. Upon training, the model is used in protein engineering workflows, such as controllable design of protein sequences and variant effect prediction.

Type: Application

Filed: February 16, 2024

Publication date: August 22, 2024

Inventors: Tristan Bepler, Timothy F. Truong, Jr.

Automating cryo-electron microscopy data collection

Patent number: 12057289

Abstract: A method of automated control of a microscope in cryogenic electron microscopy (cryo-EM), wherein the microscope is configured to collect high-magnification micrographs of particles suspended in vitreous ice. Such particles are found in grid squares, and a square contains holes from which high-magnification micrographs are imaged. The method is carried out during an active data collection session, leveraging a pipeline that comprises a set of models. The pipeline evaluates a set of collection locations to determine whether to continue collection at a current grid/square or instead at a new grid/square. The evaluation is based on a set of one or more quality scores derived from one or more pretrained models and machine learning-based active learning. Based on the determination, control information is provided to automatically control the microscope to move to a next target for data collection.

Type: Grant

Filed: May 3, 2023

Date of Patent: August 6, 2024

Assignee: New York Structural Biology Center

Inventors: Paul T. Kim, Tristan Bepler

Protein database search using learned representations

Publication number: 20230123770

Abstract: A method for efficient search of protein sequence databases for proteins that have sequence, structural, and/or functional homology with respect to information derived from a search query. The method involves transforming the protein sequences into vector representations and searching in a vector space. Given a database of protein sequences and a learned embedding model, the embedding model is applied to each amino acid sequence to transform it into a sequence of vector representations. A query sequence is also transformed into a sequence of vector representations, preferably using the same learned embedding model. Once the query has been embedded in this manner, proteins are retrieved from the database based on distance between the query embedding and the protein embeddings contained within the database. Rapid and accurate search of the vector space is carried out using exact search using metric data structures, or approximate search using locality sensitive hashing.

Type: Application

Filed: December 20, 2022

Publication date: April 20, 2023

Inventors: Tristan Bepler, Bonnie Berger Leighton
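
The retrieval step described in the abstract above (embed every database sequence, embed the query, rank by distance) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the per-residue vectors come from a fixed random table rather than a learned embedding model, the per-residue vectors are mean-pooled into one vector per protein, and the search is a brute-force exact scan rather than the metric data structures or locality sensitive hashing named in the abstract. The names `make_toy_table`, `embed`, and `search` are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_table(dim=8, seed=0):
    """Stand-in for a learned embedding model: one fixed random vector per residue."""
    rng = random.Random(seed)
    return {aa: [rng.gauss(0.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}


def embed(sequence, table):
    """Map a sequence to per-residue vectors, then mean-pool them into one vector."""
    vectors = [table[aa] for aa in sequence]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def search(query, database, table, k=2):
    """Return the k database entries closest to the query in Euclidean distance."""
    q = embed(query, table)
    scored = [(math.dist(q, embed(seq, table)), name) for name, seq in database.items()]
    scored.sort()
    return scored[:k]


if __name__ == "__main__":
    table = make_toy_table()
    database = {"a": "ACDEFG", "b": "WWWWYY", "c": "ACDEFH"}
    for distance, name in search("ACDEFG", database, table):
        print(name, round(distance, 3))
```

Because mean pooling discards residue order, this toy embedding cannot distinguish a sequence from its permutations; a learned model that keeps a sequence of vectors per protein, as the abstract describes, would not have that limitation.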

Protein database search using learned representations

Patent number: 11532378

Abstract: A method for efficient search of protein sequence databases for proteins that have sequence, structural, and/or functional homology with respect to information derived from a search query. The method involves transforming the protein sequences into vector representations and searching in a vector space. Given a database of protein sequences and a learned embedding model, the embedding model is applied to each amino acid sequence to transform it into a sequence of vector representations. A query sequence is also transformed into a sequence of vector representations, preferably using the same learned embedding model. Once the query has been embedded in this manner, proteins are retrieved from the database based on distance between the query embedding and the protein embeddings contained within the database. Rapid and accurate search of the vector space is carried out using exact search using metric data structures, or approximate search using locality sensitive hashing.

Type: Grant

Filed: November 23, 2021

Date of Patent: December 20, 2022

Assignee: NE47 Bio, Inc.

Inventors: Tristan Bepler, Bonnie Berger Leighton

Protein database search using learned representations

Publication number: 20220165356

Abstract: A method for efficient search of protein sequence databases for proteins that have sequence, structural, and/or functional homology with respect to information derived from a search query. The method involves transforming the protein sequences into vector representations and searching in a vector space. Given a database of protein sequences and a learned embedding model, the embedding model is applied to each amino acid sequence to transform it into a sequence of vector representations. A query sequence is also transformed into a sequence of vector representations, preferably using the same learned embedding model. Once the query has been embedded in this manner, proteins are retrieved from the database based on distance between the query embedding and the protein embeddings contained within the database. Rapid and accurate search of the vector space is carried out using exact search using metric data structures, or approximate search using locality sensitive hashing.

Type: Application

Filed: November 23, 2021

Publication date: May 26, 2022

Inventors: Tristan Bepler, Bonnie Berger Leighton

Latent Representations of Phylogeny to Predict Organism Phenotype

Publication number: 20190130999

Abstract: Genetic sequence information representative of a first set of organisms is accessed. The first set of organisms can include organisms that include an organism feature and organisms that do not include the organism feature. A latent space representation of k-mers within the first genetic sequence information is generated, for instance using a generative topic model. A generative interpolation model is generated using the latent space representation. The generative interpolation model is configured to classify genetic sequence information representation of a target organism to determine whether the target organism includes the organism feature. Second genetic sequence information representation of each of a second set of organisms is accessed. The generative interpolation model is then applied to the second genetic sequence information to identify which of the second set of organisms are likely to include the organism feature.

Type: Application

Filed: October 25, 2018

Publication date: May 2, 2019

Inventors: Jacob N. Oppenheim, Tristan Bepler
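
The abstract above builds its latent space from k-mers in each organism's genetic sequence. As a rough sketch of only that first step, the code below counts overlapping k-mers and arranges them into one count row per organism, which is the kind of input a topic model could consume. The topic model and the generative interpolation model are not shown, and the function names `kmer_counts` and `kmer_matrix` are hypothetical, not taken from the application.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_matrix(genomes, k):
    """Return a sorted k-mer vocabulary and one count row per organism."""
    counts = {name: kmer_counts(seq, k) for name, seq in genomes.items()}
    vocab = sorted(set().union(*counts.values()))
    rows = {name: [c[kmer] for kmer in vocab] for name, c in counts.items()}
    return vocab, rows


if __name__ == "__main__":
    vocab, rows = kmer_matrix({"x": "ACGT", "y": "ACAC"}, k=2)
    print(vocab)
    print(rows)
```

Each row is a bag-of-k-mers view of one organism, analogous to a word-count vector for a document.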
