Patents Assigned to Claritech
  • Patent number: 7356604
    Abstract: A delivery ratio r (a fraction between 0 and 1) partitions a stream of documents into the top-scoring r-fraction of documents and the remainder. In this way, a set of successively bigger delivery ratios, r1, r2, r3, ..., sections the stream into tiers. Any given document is assigned to a tier according to how many delivery ratio thresholds it matched or surpassed and how many it failed to reach. This creates a scoring structure which reflects the specificity of the document with respect to a profile in terms of density of relevant documents in the stream. In other words, a document in the kth tier is such that it failed to be classified in the top rk ratio of the stream (thus rk fraction of the stream is more relevant to the given profile than the document under consideration). At the same time this document was classified as being in the top rk−1 part of the stream.
    Type: Grant
    Filed: April 18, 2000
    Date of Patent: April 8, 2008
    Assignee: Claritech Corporation
    Inventor: Norbert Roma
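    Illustration: the tiering idea in the abstract above can be sketched as follows. This is a minimal sketch and not the patented method; the function names, the use of a reference sample to derive score cutoffs, and the convention that tier 0 is the most selective tier are all assumptions made for the example.

```python
import math

def tier_cutoffs(reference_scores, ratios):
    """For each delivery ratio r, return the lowest score that is still
    inside the top r-fraction of the reference sample (assumed setup)."""
    ranked = sorted(reference_scores, reverse=True)
    cutoffs = []
    for r in sorted(ratios):
        k = max(1, math.ceil(r * len(ranked)))  # documents in the top r-fraction
        cutoffs.append(ranked[k - 1])
    return cutoffs

def assign_tier(score, cutoffs):
    """Tier index = number of delivery-ratio thresholds the score failed to reach."""
    return sum(1 for cutoff in cutoffs if score < cutoff)

# Hypothetical reference scores and ratios, chosen only for illustration.
reference = [8, 7, 6, 5, 4, 3, 2, 1]
cutoffs = tier_cutoffs(reference, [0.25, 0.5])
tiers = [assign_tier(s, cutoffs) for s in (7.5, 6, 1)]
```

    Here a document scoring 6 misses the cutoff for the top quarter but reaches the cutoff for the top half, so it lands in the middle tier.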
  • Patent number: 6915308
    Abstract: The present invention combines a data processing structure with a graphical user interface (GUI) to create an information analysis tool wherein multiple functions are combined in a network to extract information from multiple data sources. The functional network is created, and graphically represented to the user, by linking individual operations together. The combination of individual operations is not limited by the input or output characteristic of any single operation. The form of the input to or output from an individual operation, whether from a database or from another operation, is the same. That is, both the input to and the output from an analysis function is a list of document identifiers and corresponding document characteristics. Because the form of the input and output from each operation is the same, arbitrary combinations of operations may be created. Moreover, functional networks of individual operations can then be used for database retrieval as well as to filter data streams.
    Type: Grant
    Filed: April 6, 2000
    Date of Patent: July 5, 2005
    Assignee: Claritech Corporation
    Inventors: David A. Evans, Michael L. Horowitz, Christopher C. Lichti, Thomas P. Neuendorffer
  • Patent number: 6876998
    Abstract: The present invention provides a method and apparatus for retrieving documents that are stored in a language other than the language that is used to formulate a search query. This invention decomposes the query into terms and then translates each of the terms into terms of the language of the database. Once the database language terms have been listed, a series of subqueries is formed by creating all the possible combinations of the listed terms. Each subquery is then scored on each of the documents in the target language database. Only those subqueries that return meaningful scores are relevant to the query. Thus, the semantic meaning of the query is determined against the database itself and those documents in the database language that are most relevant to that semantic meaning are returned.
    Type: Grant
    Filed: March 13, 2001
    Date of Patent: April 5, 2005
    Assignee: Claritech Corporation
    Inventor: David A. Evans
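    Illustration: the subquery construction described in the abstract above might look roughly like the sketch below. The translation table, the toy documents, the overlap-based score, and the `min_score` cutoff standing in for a "meaningful" score are all hypothetical; the abstract does not specify how scores are computed.

```python
from itertools import product

def subqueries(query_terms, translations):
    """All combinations that pick one candidate translation per query term."""
    candidates = [translations.get(term, [term]) for term in query_terms]
    return [tuple(combo) for combo in product(*candidates)]

def score(subquery, doc_tokens):
    """Assumed score: fraction of subquery terms present in the document."""
    return sum(1 for term in subquery if term in doc_tokens) / len(subquery)

def meaningful_subqueries(query_terms, translations, docs, min_score=1.0):
    """Keep subqueries whose best score over the database reaches min_score."""
    tokenized = [set(doc.split()) for doc in docs]
    kept = []
    for sq in subqueries(query_terms, translations):
        if max(score(sq, tokens) for tokens in tokenized) >= min_score:
            kept.append(sq)
    return kept

# Hypothetical bilingual dictionary and target-language database.
translations = {"bank": ["banque", "rive"], "account": ["compte", "recit"]}
docs = ["ouvrir un compte a la banque", "la rive du fleuve"]
kept = meaningful_subqueries(["bank", "account"], translations, docs)
```

    Only the combination that actually co-occurs in the database survives, which is how the database itself disambiguates the query in this toy setup.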
  • Patent number: 6820079
    Abstract: A method and apparatus for retrieving similar or identical textual passages among different documents is disclosed. Normal discourse structures along with textual content attributes are used to encode a known passage with “marker sequences” that give a characterizing “signature” to the passage. The encoded known passage is then evaluated against similarly encoded passages appearing in a database of documents. If it is determined that there is a possible match between the encoded known passage and an encoded passage in a database document, a sequential string search is performed to determine whether the two passages are likely to be similar or identical. If the sequential string search records a probable match between the known passage and the database passage, the database passage is displayed for further review.
    Type: Grant
    Filed: January 18, 2000
    Date of Patent: November 16, 2004
    Assignee: Claritech Corporation
    Inventor: David A. Evans
  • Patent number: 6728701
    Abstract: A technique for optimizing the number of terms in a profile used for information extraction. This optimization is performed by estimating the number of terms which will substantively affect the information extraction process. That is, the technique estimates the point in a term weight curve where that curve becomes flat. A term generally is important and remains part of the profile as long as its weight and the weight of the next term may be differentiated. When terms' weights are not differentiable, then they are not significant and may be cut off. Reducing the number of terms used in a profile increases the efficiency and effectiveness of the information retrieval process.
    Type: Grant
    Filed: April 18, 2000
    Date of Patent: April 27, 2004
    Assignee: Claritech Corporation
    Inventor: Emilia Stoica
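    Illustration: one simple reading of the cutoff rule in the abstract above is to walk the terms in decreasing weight order and stop at the first term whose weight cannot be told apart from the next one. The sketch below assumes a fixed tolerance `epsilon` as the test for "differentiable"; that parameter and the function name are hypothetical, and the patent's actual estimator of the flat point may differ.

```python
def prune_profile(weighted_terms, epsilon=0.01):
    """Keep terms while each weight exceeds the next one by at least epsilon.

    weighted_terms: iterable of (term, weight) pairs.
    Returns the retained (term, weight) pairs, highest weight first.
    """
    ranked = sorted(weighted_terms, key=lambda tw: tw[1], reverse=True)
    for i in range(len(ranked) - 1):
        if ranked[i][1] - ranked[i + 1][1] < epsilon:
            # The curve has gone flat here: drop this term and everything after it.
            return ranked[:i]
    return ranked

# Hypothetical term weights for illustration.
profile = [("a", 0.9), ("b", 0.6), ("c", 0.3), ("d", 0.295), ("e", 0.29)]
kept_terms = [term for term, _ in prune_profile(profile)]
```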
  • Patent number: 6721734
    Abstract: A technique for analyzing affect in which ambiguity in both emotion and natural language is explicitly represented and processed through fuzzy logic. In particular, textual information is processed to i) isolate a vocabulary of words belonging to an emotion, ii) represent the meaning of each word belonging to that emotion using multiple categories and scalar metrics, iii) compute profiles for text documents based on the categories and scores of their component words, and iv) manipulate the profiles to visualize the texts. The representation vehicle in the system is a set of fuzzy semantic categories (affect categories) followed by their respective centralities (degrees of relatedness between lexicon entries and their various categories) and intensities (representative of the strength of the affect level described by that word) called an affect set. A graphical representation of the affect set can also be used as a tool for decision making.
    Type: Grant
    Filed: April 18, 2000
    Date of Patent: April 13, 2004
    Assignee: Claritech Corporation
    Inventors: Pero Subasic, Alison Huettner
  • Patent number: 6587850
    Abstract: A novel approach for filtering documents involves the use of delivery ratio threshold setting technique to set an initial profile score threshold and the use of beta-gamma regulation for dynamic threshold updating. A group of documents is scored pursuant to a user profile. The score for each document is indicative of the relevance of the corresponding document to the user profile. The score can be compared with a profile score threshold to decide if the document should be accepted or rejected. According to one aspect of the invention, the initial threshold is set to a score threshold that approximates an expected ratio of acceptable documents calibrated with respect to a set of reference documents. According to another aspect of the invention, the score threshold can be updated based on the accumulated example documents, user's relevance judgment, and the user's utility function. The accumulated example documents are first scored against a profile and a ranked list of scored documents is obtained.
    Type: Grant
    Filed: June 5, 2002
    Date of Patent: July 1, 2003
    Assignee: Claritech Corporation
    Inventor: Chengxiang Zhai
  • Patent number: 6523030
    Abstract: The present invention is a method for operating a computer system to minimize the number of disk storage access operations used in creating an inverted database. This method divides a database into several smaller subdatabases. The documents of the subdatabases are decomposed into subdocuments. A postings list for each subdatabase is then created in which all the terms for the subdatabase are associated with the identity of each subdocument of the subdatabase in which the terms occur. The resulting postings lists for the subdatabases are then merged. The merge process sorts the postings of the subdatabases and merges common terms. The non-common terms are merged after the common terms. The process of sorting the postings list and then merging the common terms followed by the non-common terms minimizes the number of disk storage access operations required for creating the inverted database from a series of inverted subdatabases.
    Type: Grant
    Filed: October 24, 2000
    Date of Patent: February 18, 2003
    Assignee: Claritech Corporation
    Inventor: Michael L. Horowitz
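    Illustration: the general idea of building one inverted index by merging sorted per-subdatabase postings lists can be sketched with a streaming merge. This sketch uses Python's `heapq.merge` and does not reproduce the patent's specific ordering of common and non-common terms or its disk-access strategy; the data layout of `(term, subdocument_id)` pairs is an assumption.

```python
import heapq
from itertools import groupby

def merge_postings(*sub_postings):
    """Merge postings lists of (term, subdoc_id) pairs into one inverted index.

    Each input list is sorted first, then all lists are merged in a single
    streaming pass, so postings for the same term end up adjacent.
    """
    merged = heapq.merge(*(sorted(postings) for postings in sub_postings))
    return {
        term: [subdoc for _, subdoc in group]
        for term, group in groupby(merged, key=lambda posting: posting[0])
    }

# Hypothetical postings from two subdatabases.
sub_a = [("cat", 1), ("dog", 2)]
sub_b = [("cat", 3), ("emu", 4)]
index = merge_postings(sub_a, sub_b)
```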
  • Patent number: 6505198
    Abstract: The present invention is a method for operating a computer system to retrieve information from a computer database. This method decomposes documents from the database into subdocuments and then inverts the database. Also, a query for retrieving documents from a database is decomposed into terms. The subdocuments from the inverted database and the terms from the query are then used to compute a score that indicates a quantitative relation between terms in the query and the subdocuments. The resulting list of the subdocuments and their scores is then reorganized into a heap form. The highest ranking subdocument is then selected by the computer and the text associated with this subdocument is displayed. The remainder of the subdocument score list is concurrently reheapified.
    Type: Grant
    Filed: August 21, 2001
    Date of Patent: January 7, 2003
    Assignee: Claritech Corporation
    Inventor: Michael L. Horowitz
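    Illustration: the heap step in the abstract above lets the top-ranked subdocument be shown without fully sorting the score list. A minimal sketch using Python's `heapq` follows; the subdocument identifiers and scores are made up, and the sketch re-establishes the heap inside `heappop` rather than concurrently with display as the abstract describes.

```python
import heapq

def build_score_heap(scores):
    """Turn {subdoc_id: score} into a heap ordered by highest score first.

    heapq is a min-heap, so scores are negated.
    """
    heap = [(-score, subdoc_id) for subdoc_id, score in scores.items()]
    heapq.heapify(heap)
    return heap

def next_best(heap):
    """Pop the highest-scoring subdocument; the remainder stays a valid heap."""
    neg_score, subdoc_id = heapq.heappop(heap)
    return subdoc_id, -neg_score

# Hypothetical subdocument scores.
heap = build_score_heap({"d1#0": 0.2, "d1#1": 0.9, "d2#0": 0.5})
first = next_best(heap)
second = next_best(heap)
```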
  • Patent number: 6473755
    Abstract: The present invention is a method and apparatus for retrieving information from a database. Initially, the documents within the database are divided into mutually exclusive subdocuments that generally correspond to paragraphs of text. The present invention further creates a second set of subdocuments that overlap adjacent paragraphs of text. In particular, the location of the overlapping subdocuments depends on the size of the initial paragraphs. This second set of overlapping subdocuments is scored just as the mutually exclusive subdocuments are scored. The scores from both the mutually exclusive and overlapping subdocuments are used in ranking the relevance of documents to a query. The use of both sets of subdocument scores improves the effectiveness of the scoring algorithm.
    Type: Grant
    Filed: March 19, 2001
    Date of Patent: October 29, 2002
    Assignee: Claritech Corporation
    Inventor: David A. Evans
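    Illustration: one way to picture the second, overlapping set of subdocuments is as windows that straddle each paragraph boundary. The sketch below assumes each window runs from the midpoint of one paragraph to the midpoint of the next, so its location depends on paragraph size; the midpoint rule and the function name are assumptions, since the abstract does not give the exact placement.

```python
def overlapping_subdocuments(paragraphs):
    """Build one overlapping subdocument per pair of adjacent paragraphs.

    Each overlap is the second half of the left paragraph followed by the
    first half of the right paragraph (word-level split, assumed rule).
    """
    overlaps = []
    for left, right in zip(paragraphs, paragraphs[1:]):
        left_words, right_words = left.split(), right.split()
        window = left_words[len(left_words) // 2:] + right_words[:len(right_words) // 2]
        overlaps.append(" ".join(window))
    return overlaps

# Toy paragraphs for illustration.
paragraphs = ["a b c d", "e f g h", "i j"]
overlaps = overlapping_subdocuments(paragraphs)
```

    Both the original paragraphs and these overlaps would then be scored against the query, so a passage split across a paragraph break can still score well.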
  • Patent number: 6463434
    Abstract: A novel approach for filtering documents involves the use of delivery ratio threshold setting technique to set an initial profile score threshold and the use of beta-gamma regulation for dynamic threshold updating. A group of documents is scored pursuant to a user profile. The score for each document is indicative of the relevance of the corresponding document to the user profile. The score can be compared with a profile score threshold to decide if the document should be accepted or rejected. According to one aspect of the invention, the initial threshold is set to a score threshold that approximates an expected ratio of acceptable documents calibrated with respect to a set of reference documents. According to another aspect of the invention, the score threshold can be updated based on the accumulated example documents, user's relevance judgment, and the user's utility function. The accumulated example documents are first scored against a profile and a ranked list of scored documents is obtained.
    Type: Grant
    Filed: December 17, 2001
    Date of Patent: October 8, 2002
    Assignee: Claritech Corporation
    Inventor: Chengxiang Zhai
  • Patent number: 6453079
    Abstract: A document image that is the source of Optical Character Recognition (OCR) output is displayed. Recognition confidence parameters are determined for regions of the document image corresponding to words in the OCR output. The regions are displayed in a manner (e.g., highlighted in various colors) that is indicative of the respective recognition confidence parameter. Preferably, a user can select a region of the displayed document image. When the region is selected, text of the OCR output corresponding to the selected region is displayed in a pop-up menu.
    Type: Grant
    Filed: July 11, 2000
    Date of Patent: September 17, 2002
    Assignee: Claritech Corporation
    Inventor: Michael J. McInerny
  • Patent number: 6446066
    Abstract: The present invention provides a method and apparatus for generating a database search result. The creation of the search result is achieved by representing the subdocument lists of an inverted database with encoded bit strings. The encoded bit strings are space-efficient methods of storing the correspondence between terms in the database and their occurrence in subdocuments. Logical combinations of these bit strings are then obtained by identifying the intersection, union, and/or inversion of a plurality of the bit strings. Since keywords for a database search can be identified by selecting the terms of the inverted database, the logical combinations of bit strings represent search results over the database. This technique for generating a search result is computationally efficient because computers combine bit strings very efficiently. Also, the search elements of the present invention are not just limited to keywords. The search elements also include types of fields (e.g.
    Type: Grant
    Filed: August 25, 2000
    Date of Patent: September 3, 2002
    Assignee: Claritech Corporation
    Inventor: Michael L. Horowitz
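    Illustration: the intersection, union, and inversion of bit strings described in the abstract above map directly onto bitwise AND, OR, and NOT. The sketch below uses plain Python integers as uncompressed bit strings, which is a simplification; the patent refers to encoded bit strings, and the helper names and sample index are hypothetical.

```python
def to_bits(subdoc_ids):
    """Set bit i for every subdocument id i in which a term occurs."""
    bits = 0
    for i in subdoc_ids:
        bits |= 1 << i
    return bits

def from_bits(bits):
    """Recover the sorted list of subdocument ids from a bit string."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Hypothetical inverted index over 6 subdocuments (ids 0..5).
num_subdocs = 6
all_docs = (1 << num_subdocs) - 1  # mask used to bound inversion
index = {"cat": to_bits([0, 2, 3]), "dog": to_bits([2, 3, 5])}

both = from_bits(index["cat"] & index["dog"])      # intersection
either = from_bits(index["cat"] | index["dog"])    # union
not_cat = from_bits(~index["cat"] & all_docs)      # inversion
```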
  • Patent number: 6430559
    Abstract: A novel approach for filtering documents involves the use of delivery ratio threshold setting technique to set an initial profile score threshold and the use of beta-gamma regulation for dynamic threshold updating. A group of documents is scored pursuant to a user profile. The score for each document is indicative of the relevance of the corresponding document to the user profile. The score can be compared with a profile score threshold to decide if the document should be accepted or rejected. According to one aspect of the invention, the initial threshold is set to a score threshold that approximates an expected ratio of acceptable documents calibrated with respect to a set of reference documents. According to another aspect of the invention, the score threshold can be updated based on the accumulated example documents, user's relevance judgment, and the user's utility function. The accumulated example documents are first scored against a profile and a ranked list of scored documents is obtained.
    Type: Grant
    Filed: November 2, 1999
    Date of Patent: August 6, 2002
    Assignee: Claritech Corporation
    Inventor: Chengxiang Zhai
  • Patent number: 6418455
    Abstract: The present invention is a computer system for modifying a database which comprises a computer that modifies records stored in a database. In the process for modifying records in the database, addresses to memory locations in a disk storage unit are accessed during the commit phase by first checking the address space in a transaction log. The computer system of the present invention operates by committing transactions without locking out readers. This is possible because any changed data in the database is reflected in the transaction log and the log must be accessed prior to reading from the disk storage unit. As a result, the user sees changed data when the log is accessed, or if data has not been changed, the log merely directs the computer to the address in the original database storage where unchanged data is stored.
    Type: Grant
    Filed: October 17, 2000
    Date of Patent: July 9, 2002
    Assignee: Claritech Corporation
    Inventors: Michael L. Horowitz, Michael J. McInerny, Stewart M. Clamen
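    Illustration: the read path described in the abstract above, where every read consults the transaction log before the original storage, can be pictured as an overlay lookup. The sketch below is a toy in-memory model; the class name, the dict-based "disk", and the method names are assumptions, and it does not model durability or concurrency.

```python
class LoggedStore:
    """Toy store in which committed changes live in a log that readers check first."""

    def __init__(self, disk):
        self.disk = dict(disk)  # original database storage (address -> data)
        self.log = {}           # transaction log (address -> changed data)

    def commit(self, changes):
        """Record changed data in the log without touching the original storage."""
        self.log.update(changes)

    def read(self, address):
        """Check the log first; fall back to the original storage if unchanged."""
        if address in self.log:
            return self.log[address]
        return self.disk[address]

# Hypothetical addresses and records.
store = LoggedStore({0: "old A", 1: "old B"})
store.commit({1: "new B"})
changed = store.read(1)
unchanged = store.read(0)
```

    Because readers always go through the log, a commit only has to update the log, which is why readers need not be locked out in this model.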
  • Patent number: 6377947
    Abstract: In a novel approach for retrieving information a set of sub-documents first is established based upon a set of documents. A query is processed which operates on the set of sub-documents, causing a score to be generated for each sub-document. The score for each sub-document is indicative of the relevance of the corresponding sub-document to the query. The scores are reviewed and the best sub-document is retrieved. According to one aspect of the invention, the best sub-document has a score that indicates the highest relevance between the sub-document and the query. According to another aspect of the invention, in response to a user selection, the next best sub-document is identified and retrieved. The sub-documents are also presented to the user in an order based upon the scores. According to another aspect of the invention, the document containing the sub-document having the best score is displayed and automatically scrolled to the location of the sub-document having the best score.
    Type: Grant
    Filed: August 25, 2000
    Date of Patent: April 23, 2002
    Assignee: Claritech Corporation
    Inventor: David A. Evans
  • Patent number: 6363179
    Abstract: Document texts are produced by recognizing characters in document images by an Optical Character Recognition (OCR) process. When such a document text matches one or more search terms of a query, the corresponding document image is displayed. Regions of the document image, corresponding to words of the document text that match the search terms, are displayed in a visually distinctive manner. The display of the document image may be augmented by displaying a region corresponding to a reference text within the document text in another visually distinctive manner.
    Type: Grant
    Filed: January 11, 1999
    Date of Patent: March 26, 2002
    Assignee: Claritech Corporation
    Inventors: David A. Evans, Michael J. McInerny
  • Patent number: 6278990
    Abstract: The present invention is a method for operating a computer system to retrieve information from a computer database. This method decomposes documents from the database into subdocuments and then inverts the database. Also, a query for retrieving documents from a database is decomposed into terms. The subdocuments from the inverted database and the terms from the query are then used to compute a score that indicates a quantitative relation between terms in the query and the subdocuments. The resulting list of the subdocuments and their scores is then reorganized into a heap form. The highest ranking subdocument is then selected by the computer and the text associated with this subdocument is displayed. The remainder of the subdocument score list is concurrently reheapified.
    Type: Grant
    Filed: July 25, 1997
    Date of Patent: August 21, 2001
    Assignee: Claritech Corporation
    Inventor: Michael L. Horowitz
  • Patent number: 6263329
    Abstract: The present invention provides a method and apparatus for retrieving documents that are stored in a language other than the language that is used to formulate a search query. This invention decomposes the query into terms and then translates each of the terms into terms of the language of the database. Once the database language terms have been listed, a series of subqueries is formed by creating all the possible combinations of the listed terms. Each subquery is then scored on each of the documents in the target language database. Only those subqueries that return meaningful scores are relevant to the query. Thus, the semantic meaning of the query is determined against the database itself and those documents in the database language that are most relevant to that semantic meaning are returned.
    Type: Grant
    Filed: September 3, 1999
    Date of Patent: July 17, 2001
    Assignee: Claritech
    Inventor: David A. Evans
  • Patent number: 6226631
    Abstract: A document image that is the source of Optical Character Recognition (OCR) output is displayed so that a user can select a region of the displayed document image. When the region is selected, text of the OCR output corresponding to the selected region is submitted as an input to a search engine.
    Type: Grant
    Filed: September 3, 1999
    Date of Patent: May 1, 2001
    Assignee: Claritech Corporation
    Inventor: David A. Evans