Abstract: Systems and methods for generating indexes and fast searching of “approximate”, “fuzzy”, or “homologous” matches for a large quantity of data in a metric space are provided. The data is indexed to generate a search tree taxonomy. Once the index is generated, a query can be provided to report all hits within a certain neighborhood of the query. In an even faster implementation, the invention may be used together with existing approximate sequence comparison algorithms, such as FASTA and BLAST. Here, a local distance of a local metric space is used to generate local search tree branches. Applications of this invention may include homology search for DNA and/or protein sequences, textual or byte-based searches, literature search based on lists of keywords, and vector and matrix based indexing and searching.
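The indexing and neighborhood query described in this abstract can be illustrated with a minimal sketch. The abstract does not say which tree structure the invention uses, so the sketch assumes a BK-tree, one well-known metric tree, with Levenshtein edit distance as the metric. The names `edit_distance`, `BKTree`, `add`, and `search` are hypothetical and chosen only for illustration; this is not the patented method.

```python
from typing import Callable, Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, which satisfies the metric axioms on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# A node is (item, children), where children maps a distance to a child node.
Node = Tuple[str, Dict[int, "Node"]]


class BKTree:
    """Assumed stand-in for the abstract's search tree (not the patented index)."""

    def __init__(self, distance: Callable[[str, str], int]) -> None:
        self.distance = distance
        self.root: Optional[Node] = None

    def add(self, item: str) -> None:
        """Index one item by walking down edges labelled with distances."""
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            d = self.distance(item, node[0])
            if d == 0:
                return  # identical item already indexed
            child = node[1].get(d)
            if child is None:
                node[1][d] = (item, {})
                return
            node = child

    def search(self, query: str, radius: int) -> List[Tuple[int, str]]:
        """Report all indexed items within `radius` of `query`."""
        hits: List[Tuple[int, str]] = []
        stack: List[Node] = [self.root] if self.root is not None else []
        while stack:
            item, children = stack.pop()
            d = self.distance(query, item)
            if d <= radius:
                hits.append((d, item))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - radius, d + radius] can contain a hit.
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return sorted(hits)
```

Under these assumptions, a caller would build the index once with repeated `add` calls and then call `search(query, radius)` to list every indexed item in the query's neighborhood. The pruning step relies only on the triangle inequality, so any other true metric could be passed in place of `edit_distance`.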
Abstract: The invention provides a computerized storage and retrieval system for storing biological information organized as a protein pathways database and methods for performing pathway searches on nodes (proteins or other molecules), modes (interactions), and nodes-and-modes. The protein pathways database is a relational database that integrates protein sequence, genomic sequence, gene-expression, protein interactions, protein-protein association and pathway data and can be searched using a query pathway to predict homologous or orthologous nodes, modes, and pathways.
Type: Application
Filed: February 20, 2002
Publication date: May 29, 2003
Applicant: Genmetrics
Inventors: Yonghong Yang, John Tillinghast, Christopher Piercy