Patents by Inventor Tristan Bepler

Tristan Bepler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240395497
    Abstract: A method of automated control of a microscope in cryogenic electron microscopy (cryo-EM), wherein the microscope is configured to collect high-magnification micrographs of particles suspended in vitreous ice. Such particles are found in grid squares, and a square contains holes from which high-magnification micrographs are imaged. The method is carried out during an active data collection session, leveraging a pipeline that comprises a set of models. The pipeline evaluates a set of collection locations to determine whether to continue collection at a current grid/square or instead at a new grid/square. The evaluation is based on a set of one or more quality scores derived from one or more pretrained models and machine learning-based active learning. Based on the determination, control information is provided to automatically control the microscope to move to a next target for data collection.
    Type: Application
    Filed: August 6, 2024
    Publication date: November 28, 2024
    Inventors: Paul T. Kim, Tristan Bepler
  • Publication number: 20240282404
    Abstract: A retrieval-augmented framework leverages a generative protein language model of whole protein families. The model is configured and trained on homologous sequences and learns to generate sets of related proteins as sequences-of-sequences across very large numbers (e.g., tens of millions) of natural protein sequence clusters. In order to capture conditioning between sequences in an order independent manner (typically, the order of sequences within a family is arbitrary) and to generalize to large context lengths, the model leverages a transformer layer that models order-dependence between tokens within sequences and order-independence between sequences. Upon training, the model is used in protein engineering workflows, such as controllable design of protein sequences and variant effect prediction.
    Type: Application
    Filed: February 16, 2024
    Publication date: August 22, 2024
    Inventors: Tristan Bepler, Timothy F. Truong, Jr.
  • Patent number: 12057289
    Abstract: A method of automated control of a microscope in cryogenic electron microscopy (cryo-EM), wherein the microscope is configured to collect high-magnification micrographs of particles suspended in vitreous ice. Such particles are found in grid squares, and a square contains holes from which high-magnification micrographs are imaged. The method is carried out during an active data collection session, leveraging a pipeline that comprises a set of models. The pipeline evaluates a set of collection locations to determine whether to continue collection at a current grid/square or instead at a new grid/square. The evaluation is based on a set of one or more quality scores derived from one or more pretrained models and machine learning-based active learning. Based on the determination, control information is provided to automatically control the microscope to move to a next target for data collection.
    Type: Grant
    Filed: May 3, 2023
    Date of Patent: August 6, 2024
    Assignee: New York Structural Biology Center
    Inventors: Paul T. Kim, Tristan Bepler
  • Publication number: 20230123770
    Abstract: A method for efficient search of protein sequence databases for proteins that have sequence, structural, and/or functional homology with respect to information derived from a search query. The method involves transforming the protein sequences into vector representations and searching in a vector space. Given a database of protein sequences and a learned embedding model, the embedding model is applied to each amino acid sequence to transform it into a sequence of vector representations. A query sequence is also transformed into a sequence of vector representations, preferably using the same learned embedding model. Once the query has been embedded in this manner, proteins are retrieved from the database based on distance between the query embedding and the protein embeddings contained within the database. Rapid and accurate search of the vector space is carried out using exact search using metric data structures, or approximate search using locality sensitive hashing.
    Type: Application
    Filed: December 20, 2022
    Publication date: April 20, 2023
    Inventors: Tristan Bepler, Bonnie Berger Leighton
  • Patent number: 11532378
    Abstract: A method for efficient search of protein sequence databases for proteins that have sequence, structural, and/or functional homology with respect to information derived from a search query. The method involves transforming the protein sequences into vector representations and searching in a vector space. Given a database of protein sequences and a learned embedding model, the embedding model is applied to each amino acid sequence to transform it into a sequence of vector representations. A query sequence is also transformed into a sequence of vector representations, preferably using the same learned embedding model. Once the query has been embedded in this manner, proteins are retrieved from the database based on distance between the query embedding and the protein embeddings contained within the database. Rapid and accurate search of the vector space is carried out using exact search using metric data structures, or approximate search using locality sensitive hashing.
    Type: Grant
    Filed: November 23, 2021
    Date of Patent: December 20, 2022
    Assignee: NE47 Bio, Inc.
    Inventors: Tristan Bepler, Bonnie Berger Leighton
  • Publication number: 20220165356
    Abstract: A method for efficient search of protein sequence databases for proteins that have sequence, structural, and/or functional homology with respect to information derived from a search query. The method involves transforming the protein sequences into vector representations and searching in a vector space. Given a database of protein sequences and a learned embedding model, the embedding model is applied to each amino acid sequence to transform it into a sequence of vector representations. A query sequence is also transformed into a sequence of vector representations, preferably using the same learned embedding model. Once the query has been embedded in this manner, proteins are retrieved from the database based on distance between the query embedding and the protein embeddings contained within the database. Rapid and accurate search of the vector space is carried out using exact search using metric data structures, or approximate search using locality sensitive hashing.
    Type: Application
    Filed: November 23, 2021
    Publication date: May 26, 2022
    Inventors: Tristan Bepler, Bonnie Berger Leighton
  • Publication number: 20190130999
    Abstract: Genetic sequence information representative of a first set of organisms is accessed. The first set of organisms can include organisms that include an organism feature and organisms that do not include the organism feature. A latent space representation of k-mers within the first genetic sequence information is generated, for instance using a generative topic model. A generative interpolation model is generated using the latent space representation. The generative interpolation model is configured to classify genetic sequence information representative of a target organism to determine whether the target organism includes the organism feature. Second genetic sequence information representative of each of a second set of organisms is accessed. The generative interpolation model is then applied to the second genetic sequence information to identify which of the second set of organisms are likely to include the organism feature.
    Type: Application
    Filed: October 25, 2018
    Publication date: May 2, 2019
    Inventors: Jacob N. Oppenheim, Tristan Bepler
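
For readers unfamiliar with the term "k-mers" used in the abstract of publication 20190130999: a k-mer is a contiguous substring of length k taken from a sequence. The sketch below only illustrates how overlapping k-mers might be counted for one sequence, the kind of count table a topic model could take as input. It is not the method claimed in the application; the function name `kmer_counts` and the example sequence are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a sequence.

    Hypothetical helper for illustration only; not taken from the patent.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one position at a time.
    # If the sequence is shorter than k, the range is empty and so is the result.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Made-up example sequence, not real genomic data.
counts = kmer_counts("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Counting per organism in this way would give one row of a k-mer count matrix; how the application's generative topic model and interpolation model use such counts is described only at the level of the abstract above.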