Patents Assigned to Claritech
-
Patent number: 7356604
Abstract: The delivery ratio of r (which is a fraction between 0 and 1) partitions a stream of documents into a section of top scoring r-fraction of documents and the remainder. This way a set of successively bigger delivery ratios, r1, r2, r3, ... sections the stream into tiers. Any given document is assigned to a tier according to how many delivery ratio thresholds it matched or surpassed and how many it failed to reach. This creates a scoring structure which reflects the specificity of the document with respect to a profile in terms of density of relevant documents in the stream. In other words, a document in the kth tier is such that it failed to be classified in the top rk ratio of the stream (thus rk fraction of the stream is more relevant to the given profile than the document under consideration). At the same time this document was classified as being in the top rk?1 part of the stream.
Type: Grant
Filed: April 18, 2000
Date of Patent: April 8, 2008
Assignee: Claritech Corporation
Inventor: Norbert Roma
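
The tiering idea in the abstract of patent 7356604 can be illustrated with a minimal sketch. It assumes that each delivery ratio is turned into a score cutoff by ranking a reference sample of scores; the function names, the sample scores, and the ratios are hypothetical and are not taken from the patent.

```python
import math

def tier_cutoffs(reference_scores, ratios):
    """For each delivery ratio r, return the lowest score inside the top r-fraction."""
    ranked = sorted(reference_scores, reverse=True)
    n = len(ranked)
    cutoffs = []
    for r in ratios:
        # Number of documents in the top r-fraction (at least one);
        # the small guard absorbs floating-point noise in r * n.
        k = max(1, math.ceil(r * n - 1e-9))
        cutoffs.append(ranked[k - 1])
    return cutoffs

def assign_tier(score, cutoffs):
    """Tier index = how many delivery-ratio thresholds the score fails to reach."""
    return sum(1 for cutoff in cutoffs if score < cutoff)

# Hypothetical reference stream and successively bigger delivery ratios.
sample = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
cutoffs = tier_cutoffs(sample, [0.1, 0.3, 0.5])
```

In this sketch, tier 0 holds documents that reach every threshold, and a higher tier index means the document missed more of the selective thresholds.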
-
Patent number: 6915308
Abstract: The present invention combines a data processing structure with a graphical user interface (GUI) to create an information analysis tool wherein multiple functions are combined in a network to extract information from multiple data sources. The functional network is created, and graphically represented to the user, by linking individual operations together. The combination of individual operations is not limited by the input or output characteristic of any single operation. The form of the input to or output from an individual operation, whether from a database or from another operation, is the same. That is, both the input to and the output from an analysis function is a list of document identifiers and corresponding document characteristics. Because the form of the input and output from each operation is the same, arbitrary combinations of operations may be created. Moreover, functional networks of individual operations can then be used for database retrieval as well as to filter data streams.
Type: Grant
Filed: April 6, 2000
Date of Patent: July 5, 2005
Assignee: Claritech Corporation
Inventors: Davis A. Evans, Michael L. Horowitz, Christopher C. Lichti, Thomas P. Neuendorffer
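
The uniform input/output form described in the abstract of patent 6915308 is what makes arbitrary chaining possible. The sketch below illustrates that point only: every operation takes and returns a list of (document identifier, characteristics) pairs. The operation names, the record layout, and the sample data are hypothetical, and the GUI part of the invention is not modeled.

```python
def keyword_filter(term):
    """Operation: keep records whose text contains the given term."""
    def op(records):
        return [(doc_id, info) for doc_id, info in records if term in info["text"]]
    return op

def top_n(n, key):
    """Operation: keep the n records with the highest value for `key`."""
    def op(records):
        return sorted(records, key=lambda rec: rec[1][key], reverse=True)[:n]
    return op

def chain(*ops):
    """Link operations into a network; the result is itself an operation."""
    def op(records):
        for step in ops:
            records = step(records)
        return records
    return op

# Hypothetical records: (document identifier, document characteristics).
records = [
    (1, {"text": "heap sort", "score": 0.2}),
    (2, {"text": "heap merge", "score": 0.9}),
    (3, {"text": "index", "score": 0.5}),
]
pipeline = chain(keyword_filter("heap"), top_n(1, "score"))
result_ids = [doc_id for doc_id, _ in pipeline(records)]
```

Because `chain` returns an operation with the same signature, a chained network can be nested inside another network without any adapter code.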
-
Patent number: 6876998
Abstract: The present mechanism provides a method and apparatus for retrieving documents that are stored in a language other than the language that is used to formulate a search query. This invention decomposes the query into terms and then translates each of the terms into terms of the language of the database. Once the database language terms have been listed, a series of subqueries is formed by creating all the possible combinations of the listed terms. Each subquery is then scored on each of the documents in the target language database. Only those subqueries that return meaningful scores are relevant to the query. Thus, the semantic meaning of the query is determined against the database itself and those documents in the database language that are most relevant to that semantic meaning are returned.
Type: Grant
Filed: March 13, 2001
Date of Patent: April 5, 2005
Assignee: Claritech Corporation
Inventor: David A. Evans
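
A rough sketch of the subquery step in the abstract of patent 6876998 follows. It assumes that "all the possible combinations" means choosing one candidate translation per query term, and it uses a toy score (the number of documents containing every term of the subquery). The dictionary, the documents, and the function names are hypothetical.

```python
from itertools import product

def candidate_subqueries(query_terms, translations):
    """One subquery per way of choosing a single translation for each query term."""
    options = [translations.get(term, []) for term in query_terms]
    return list(product(*options))

def score_subquery(subquery, documents):
    """Toy score: count the documents that contain every term of the subquery."""
    return sum(1 for doc in documents if all(t in doc.split() for t in subquery))

def meaningful_subqueries(query_terms, translations, documents):
    """Keep only the subqueries that score above zero against the database."""
    scored = [(sq, score_subquery(sq, documents))
              for sq in candidate_subqueries(query_terms, translations)]
    return [(sq, s) for sq, s in scored if s > 0]

# Hypothetical bilingual dictionary and target-language documents.
translations = {"bank": ["banco", "orilla"], "river": ["rio"]}
documents = ["la orilla del rio", "el banco central"]
kept = meaningful_subqueries(["bank", "river"], translations, documents)
```

Here the database itself disambiguates "bank": only the riverbank reading co-occurs with the translation of "river", so only that subquery survives.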
-
Patent number: 6820079
Abstract: A method and apparatus for retrieving similar or identical textual passages among different documents is disclosed. Normal discourse structures along with textual content attributes are used to encode a known passage with “marker sequences” that give a characterizing “signature” to the passage. The encoded known passage is then evaluated against similarly encoded passages appearing in a database of documents. If it is determined that there is a possible match between the encoded known passage and an encoded passage in a database document, a sequential string search is performed to determine whether the two passages are likely to be similar or identical. If the sequential string search records a probable match between the known passage and the database passage, the database passage is displayed for further review.
Type: Grant
Filed: January 18, 2000
Date of Patent: November 16, 2004
Assignee: Claritech Corporation
Inventor: David A. Evans
-
Patent number: 6728701
Abstract: A technique for optimizing the number of terms in a profile used for information extraction. This optimization is performed by estimating the number of terms which will substantively affect the information extraction process. That is, the technique estimates the point in a term weight curve where that curve becomes flat. A term generally is important and remains part of the profile as long as its weight and the weight of the next term may be differentiated. When terms' weights are not differentiable, then they are not significant and may be cut off. Reducing the number of terms used in a profile increases the efficiency and effectiveness of the information retrieval process.
Type: Grant
Filed: April 18, 2000
Date of Patent: April 27, 2004
Assignee: Claritech Corporation
Inventor: Emilia Stoica
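
The cutoff rule in the abstract of patent 6728701 can be approximated with a very simple sketch: rank terms by weight and stop at the first term whose weight cannot be told apart from the next one. The fixed `epsilon` tolerance, the function name, and the sample weights are assumptions; the patent's actual estimate of where the curve flattens may differ.

```python
def cutoff_terms(weighted_terms, epsilon=0.01):
    """Keep terms while each weight differs from the next by at least epsilon."""
    ranked = sorted(weighted_terms, key=lambda tw: tw[1], reverse=True)
    kept = []
    for i, (term, weight) in enumerate(ranked):
        has_next = i + 1 < len(ranked)
        if has_next and weight - ranked[i + 1][1] < epsilon:
            break  # the curve has gone flat; drop this term and everything after it
        kept.append(term)
    return kept

# Hypothetical profile: the last three weights are nearly indistinguishable.
profile = [("heap", 0.9), ("sort", 0.6), ("merge", 0.4), ("list", 0.395), ("item", 0.393)]
kept_terms = cutoff_terms(profile)
```

This version stops at the first flat step, so a short plateau near the top of the curve would truncate the profile early; a smoothed slope estimate would be more robust.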
-
Patent number: 6721734
Abstract: A technique for analyzing affect in which ambiguity in both emotion and natural language is explicitly represented and processed through fuzzy logic. In particular, textual information is processed to i) isolate a vocabulary of words belonging to an emotion, ii) represent the meaning of each word belonging to that emotion using multiple categories and scalar metrics, iii) compute profiles for text documents based on the categories and scores of their component words, and iv) manipulate the profiles to visualize the texts. The representation vehicle in the system is a set of fuzzy semantic categories (affect categories) followed by their respective centralities (degrees of relatedness between lexicon entries and their various categories) and intensities (representative of the strength of the affect level described by that word) called an affect set. A graphical representation of the affect set can also be used as a tool for decision making.
Type: Grant
Filed: April 18, 2000
Date of Patent: April 13, 2004
Assignee: Claritech Corporation
Inventors: Pero Subasic, Alison Huettner
-
Patent number: 6587850
Abstract: A novel approach for filtering documents involves the use of a delivery ratio threshold setting technique to set an initial profile score threshold and the use of beta-gamma regulation for dynamic threshold updating. A group of documents is scored pursuant to a user profile. The score for each document is indicative of the relevance of the corresponding document to the user profile. The score can be compared with a profile score threshold to decide if the document should be accepted or rejected. According to one aspect of the invention, the initial threshold is set to a score threshold that approximates an expected ratio of acceptable documents calibrated with respect to a set of reference documents. According to another aspect of the invention, the score threshold can be updated based on the accumulated example documents, the user's relevance judgment, and the user's utility function. The accumulated example documents are first scored against a profile and a ranked list of scored documents is obtained.
Type: Grant
Filed: June 5, 2002
Date of Patent: July 1, 2003
Assignee: Claritech Corporation
Inventor: Chengxiang Zhai
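
The threshold update in the abstract of patent 6587850 starts from a ranked list of judged example documents. The sketch below shows only that first step: walking down the ranked list and finding the score cutoff that maximizes a linear utility. The gain and cost values, the function name, and the examples are assumptions, and the beta-gamma interpolation itself is not given in the abstract and is not shown here.

```python
def best_utility_threshold(scored_examples, gain=2.0, cost=1.0):
    """Return (threshold, utility) for the rank cutoff with the highest utility.

    scored_examples: iterable of (score, is_relevant) pairs.
    Accepting a relevant document earns `gain`; a non-relevant one loses `cost`.
    """
    ranked = sorted(scored_examples, key=lambda ex: ex[0], reverse=True)
    best_threshold, best_utility = float("inf"), 0.0  # accepting nothing scores 0
    running = 0.0
    for score, is_relevant in ranked:
        running += gain if is_relevant else -cost
        if running > best_utility:
            best_threshold, best_utility = score, running
    return best_threshold, best_utility

# Hypothetical accumulated examples with the user's relevance judgments.
examples = [(0.9, True), (0.8, False), (0.7, True),
            (0.6, False), (0.5, False), (0.4, True)]
threshold, utility = best_utility_threshold(examples)
```

Ties keep the higher (more selective) threshold, which is a choice of this sketch rather than something the abstract specifies.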
-
Patent number: 6523030
Abstract: The present invention is a method for operating a computer system to minimize the number of disk storage access operations used in creating an inverted database. This method divides a database into several smaller subdatabases. The documents of the subdatabases are decomposed into subdocuments. A postings list for each subdatabase is then created in which all the terms for the subdatabase are associated with the identity of each subdocument of the subdatabase in which the terms occur. The resulting postings lists for the subdatabases are then merged. The merge process sorts the postings of the subdatabases and merges common terms. The non-common terms are merged after the common terms. The process of sorting the postings list and then merging the common terms followed by the non-common terms minimizes the number of disk storage access operations required for creating the inverted database from a series of inverted subdatabases.
Type: Grant
Filed: October 24, 2000
Date of Patent: February 18, 2003
Assignee: Claritech Corporation
Inventor: Michael L. Horowitz
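
To make the merge step in the abstract of patent 6523030 concrete, here is a small in-memory sketch. It builds a sorted postings list per subdatabase and then combines them with a generic k-way merge from the standard library; this is a substitute for, not a reproduction of, the patent's common-terms-first ordering, and it does not model disk access. All names and sample data are hypothetical.

```python
import heapq
from itertools import groupby

def invert(subdocs):
    """Build a sorted postings list [(term, [subdoc ids])] for one subdatabase."""
    postings = {}
    for subdoc_id, text in subdocs:
        for term in set(text.split()):
            postings.setdefault(term, []).append(subdoc_id)
    return sorted((term, sorted(ids)) for term, ids in postings.items())

def merge_postings(*sorted_lists):
    """Merge sorted postings lists, combining the id lists of common terms."""
    merged = heapq.merge(*sorted_lists, key=lambda posting: posting[0])
    return {term: sorted(i for _, ids in group for i in ids)
            for term, group in groupby(merged, key=lambda posting: posting[0])}

# Two hypothetical subdatabases of (subdocument id, text) pairs.
sub_a = invert([(1, "cat dog"), (2, "dog")])
sub_b = invert([(3, "dog emu")])
index = merge_postings(sub_a, sub_b)
```

Because each input list is already sorted by term, the merge reads every list once from front to back, which is the property that makes sequential (rather than random) disk access possible in an on-disk version.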
-
Patent number: 6505198
Abstract: The present invention is a method for operating a computer system to retrieve information from a computer database. This method decomposes documents from the database into subdocuments and then inverts the database. Also, a query for retrieving documents from a database is decomposed into terms. The subdocuments from the inverted database and the terms from the query are then used to compute a score that indicates a quantitative relation between terms in the query and the subdocuments. The resulting list of the subdocuments and their scores is then reorganized into a heap form. The highest ranking subdocument is then selected by the computer and the text associated with this subdocument is displayed. The remainder of the subdocument score list is concurrently reheapified.
Type: Grant
Filed: August 21, 2001
Date of Patent: January 7, 2003
Assignee: Claritech Corporation
Inventor: Michael L. Horowitz
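
The heap-based selection in the abstract of patent 6505198 can be sketched with Python's `heapq` module. The sketch is sequential: each pop restores the heap property before the next result is requested, whereas the abstract describes reheapifying concurrently with display. The subdocument identifiers and scores are hypothetical.

```python
import heapq

def ranked_subdocuments(scores):
    """Yield (subdoc_id, score) pairs from highest to lowest score, lazily."""
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-score, subdoc_id) for subdoc_id, score in scores.items()]
    heapq.heapify(heap)  # O(n) reorganization into heap form
    while heap:
        neg_score, subdoc_id = heapq.heappop(heap)  # pop re-heapifies the remainder
        yield subdoc_id, -neg_score

# Hypothetical subdocument scores for one query.
scores = {"doc1#2": 0.4, "doc2#1": 0.9, "doc1#1": 0.7}
results = ranked_subdocuments(scores)
best = next(results)
```

The benefit over a full sort is that the top result is available after linear-time heap construction, and each further result costs only a logarithmic pop.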
-
Patent number: 6473755
Abstract: The present invention is a method and apparatus for retrieving information from a database. Initially, the documents within the database are divided into mutually exclusive subdocuments that generally correspond to paragraphs of text. The present invention further creates a second set of subdocuments that overlap adjacent paragraphs of text. In particular, the location of the overlapping subdocuments depends on the size of the initial paragraphs. This second set of overlapping subdocuments is scored just as the mutually exclusive subdocuments are scored. The scores from both the mutually exclusive and overlapping subdocuments are used in ranking the relevance of documents to a query. The use of both sets of subdocument scores improves the effectiveness of the scoring algorithm.
Type: Grant
Filed: March 19, 2001
Date of Patent: October 29, 2002
Assignee: Claritech Corporation
Inventor: David A. Evans
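
The two sets of subdocuments in the abstract of patent 6473755 can be illustrated as follows. The sketch assumes that each overlapping subdocument runs from the midpoint of one paragraph to the midpoint of the next, that the score is a simple term density, and that a document's score is the maximum over both sets. None of these specifics come from the abstract, and the names are hypothetical.

```python
def overlapping_subdocs(paragraphs):
    """Build one window per paragraph boundary: second half of left + first half of right."""
    windows = []
    for left, right in zip(paragraphs, paragraphs[1:]):
        lw, rw = left.split(), right.split()
        windows.append(" ".join(lw[len(lw) // 2:] + rw[:(len(rw) + 1) // 2]))
    return windows

def score(text, query_terms):
    """Toy score: fraction of words in the text that are query terms."""
    words = text.lower().split()
    return sum(words.count(t) for t in query_terms) / (len(words) or 1)

def document_score(paragraphs, query_terms):
    """Score both subdocument sets the same way and keep the best score."""
    candidates = list(paragraphs) + overlapping_subdocs(paragraphs)
    return max(score(c, query_terms) for c in candidates)

# Hypothetical document: the query terms straddle a paragraph boundary.
paragraphs = ["alpha beta gamma heap", "sort delta epsilon zeta"]
windows = overlapping_subdocs(paragraphs)
doc_score = document_score(paragraphs, ["heap", "sort"])
```

In this example each paragraph alone scores 0.25, while the overlapping window captures both query terms and lifts the document score to 0.5.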
-
Patent number: 6463434
Abstract: A novel approach for filtering documents involves the use of a delivery ratio threshold setting technique to set an initial profile score threshold and the use of beta-gamma regulation for dynamic threshold updating. A group of documents is scored pursuant to a user profile. The score for each document is indicative of the relevance of the corresponding document to the user profile. The score can be compared with a profile score threshold to decide if the document should be accepted or rejected. According to one aspect of the invention, the initial threshold is set to a score threshold that approximates an expected ratio of acceptable documents calibrated with respect to a set of reference documents. According to another aspect of the invention, the score threshold can be updated based on the accumulated example documents, the user's relevance judgment, and the user's utility function. The accumulated example documents are first scored against a profile and a ranked list of scored documents is obtained.
Type: Grant
Filed: December 17, 2001
Date of Patent: October 8, 2002
Assignee: Claritech Corporation
Inventor: Chengxiang Zhai
-
Patent number: 6453079
Abstract: A document image that is the source of Optical Character Recognition (OCR) output is displayed. Recognition confidence parameters are determined for regions of the document image corresponding to words in the OCR output. The regions are displayed in a manner (e.g., highlighted in various colors) that is indicative of the respective recognition confidence parameter. Preferably, a user can select a region of the displayed document image. When the region is selected, text of the OCR output corresponding to the selected region is displayed in a pop-up menu.
Type: Grant
Filed: July 11, 2000
Date of Patent: September 17, 2002
Assignee: Claritech Corporation
Inventor: Michael J. McInerny
-
Patent number: 6446066
Abstract: The present invention provides a method and apparatus for generating a database search result. The creation of the search result is achieved by representing the subdocument lists of an inverted database with encoded bit strings. The encoded bit strings are space efficient methods of storing the correspondence between terms in the database and their occurrence in subdocuments. Logical combinations of these bit strings are then obtained by identifying the intersection, union, and/or inversion of a plurality of the bit strings. Since keywords for a database search can be identified by selecting the terms of the inverted database, the logical combinations of bit strings represent search results over the database. This technique for generating a search result is computationally efficient because computers combine bit strings very efficiently. Also, the search elements of the present invention are not just limited to keywords. The search elements also include types of fields (e.g.
Type: Grant
Filed: August 25, 2000
Date of Patent: September 3, 2002
Assignee: Claritech Corporation
Inventor: Michael L. Horowitz
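
The bit-string combinations in the abstract of patent 6446066 map directly onto bitwise operators. The sketch below uses plain Python integers as uncompressed bit strings (bit i set means the term occurs in subdocument i); the patent's encoded, space-efficient representation is not reproduced. Function names and sample subdocuments are hypothetical.

```python
def build_bitmaps(subdocs):
    """Map each term to an integer whose bit i is set if subdocument i contains it."""
    bitmaps = {}
    for i, text in enumerate(subdocs):
        for term in set(text.split()):
            bitmaps[term] = bitmaps.get(term, 0) | (1 << i)
    return bitmaps

def members(bits):
    """Decode a bit string back into a list of subdocument indexes."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Hypothetical subdocuments, indexed 0..3.
subdocs = ["heap sort", "heap merge", "merge sort", "index"]
bitmaps = build_bitmaps(subdocs)
all_docs = (1 << len(subdocs)) - 1  # mask with one bit per subdocument

both = members(bitmaps["heap"] & bitmaps["sort"])    # intersection (AND)
either = members(bitmaps["heap"] | bitmaps["merge"])  # union (OR)
without = members(~bitmaps["heap"] & all_docs)        # inversion (NOT)
```

The mask in the inversion step matters: Python integers are unbounded, so `~bits` alone would set every bit above the last subdocument as well.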
-
Patent number: 6430559
Abstract: A novel approach for filtering documents involves the use of a delivery ratio threshold setting technique to set an initial profile score threshold and the use of beta-gamma regulation for dynamic threshold updating. A group of documents is scored pursuant to a user profile. The score for each document is indicative of the relevance of the corresponding document to the user profile. The score can be compared with a profile score threshold to decide if the document should be accepted or rejected. According to one aspect of the invention, the initial threshold is set to a score threshold that approximates an expected ratio of acceptable documents calibrated with respect to a set of reference documents. According to another aspect of the invention, the score threshold can be updated based on the accumulated example documents, the user's relevance judgment, and the user's utility function. The accumulated example documents are first scored against a profile and a ranked list of scored documents is obtained.
Type: Grant
Filed: November 2, 1999
Date of Patent: August 6, 2002
Assignee: Claritech Corporation
Inventor: Chengxiang Zhai
-
Patent number: 6418455
Abstract: The present invention is a computer system for modifying a database which comprises a computer that modifies records stored in a database. In the process for modifying records in the database, addresses to memory locations in a disk storage unit are accessed during the commit phase by first checking the address space in a transaction log. The computer system of the present invention operates by committing transactions without locking out readers. This is possible because any changed data in the database is reflected in the transaction log and the log must be accessed prior to reading from the disk storage unit. As a result, the user sees changed data when the log is accessed, or if data has not been changed, the log merely directs the computer to the address in the original database storage where unchanged data is stored.
Type: Grant
Filed: October 17, 2000
Date of Patent: July 9, 2002
Assignee: Claritech Corporation
Inventors: Michael L. Horowitz, Michael J. McInerny, Stewart M. Clamen
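
The read path in the abstract of patent 6418455 (consult the transaction log first, fall back to the original storage) can be sketched in a few lines. This is an in-memory illustration under simplifying assumptions: dictionaries stand in for the disk storage unit and the log, and a commit publishes a new log with a single reference swap. The class and method names are hypothetical.

```python
class LoggedStore:
    """Toy store where every read consults the transaction log before the disk."""

    def __init__(self, disk):
        self.disk = dict(disk)  # stands in for the original database storage
        self.log = {}           # address -> changed data from committed transactions

    def commit(self, changes):
        # Build the new log off to the side, then publish it with one assignment,
        # so a reader sees either the old log or the new one, never a partial update.
        self.log = {**self.log, **changes}

    def read(self, address):
        log = self.log  # take one snapshot of the log for this read
        if address in log:
            return log[address]    # changed data is served from the log
        return self.disk[address]  # unchanged data comes from the original storage

# Hypothetical usage: address 2 is modified, address 1 is not.
store = LoggedStore({1: "a", 2: "b"})
store.commit({2: "B"})
```

Since the original storage is never overwritten during a commit in this sketch, readers do not need to wait for writers.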
-
Patent number: 6377947
Abstract: In a novel approach for retrieving information, a set of sub-documents is first established based upon a set of documents. A query is processed which operates on the set of sub-documents, causing a score to be generated for each sub-document. The score for each sub-document is indicative of the relevance of the corresponding sub-document to the query. The scores are reviewed and the best sub-document is retrieved. According to one aspect of the invention, the best sub-document has a score that indicates the highest relevance between the sub-document and the query. According to another aspect of the invention, in response to a user selection, the next best sub-document is identified and retrieved. The sub-documents are also presented to the user in an order based upon the scores. According to another aspect of the invention, the document containing the sub-document having the best score is displayed and automatically scrolled to the location of the sub-document having the best score.
Type: Grant
Filed: August 25, 2000
Date of Patent: April 23, 2002
Assignee: Claritech Corporation
Inventor: David A. Evans
-
Patent number: 6363179
Abstract: Document texts are produced by recognizing characters in document images by an Optical Character Recognition (OCR) process. When such a document text matches one or more search terms of a query, the corresponding document image is displayed. Regions of the document image, corresponding to words of the document text that match the search terms, are displayed in a visually distinctive manner. The display of the document image may be augmented by displaying a region corresponding to a reference text within the document text in another visually distinctive manner.
Type: Grant
Filed: January 11, 1999
Date of Patent: March 26, 2002
Assignee: Claritech Corporation
Inventors: David A. Evans, Michael J. McInerny
-
Patent number: 6278990
Abstract: The present invention is a method for operating a computer system to retrieve information from a computer database. This method decomposes documents from the database into subdocuments and then inverts the database. Also, a query for retrieving documents from a database is decomposed into terms. The subdocuments from the inverted database and the terms from the query are then used to compute a score that indicates a quantitative relation between terms in the query and the subdocuments. The resulting list of the subdocuments and their scores is then reorganized into a heap form. The highest ranking subdocument is then selected by the computer and the text associated with this subdocument is displayed. The remainder of the subdocument score list is concurrently reheapified.
Type: Grant
Filed: July 25, 1997
Date of Patent: August 21, 2001
Assignee: Claritech Corporation
Inventor: Michael L. Horowitz
-
Patent number: 6263329
Abstract: The present invention provides a method and apparatus for retrieving documents that are stored in a language other than the language that is used to formulate a search query. This invention decomposes the query into terms and then translates each of the terms into terms of the language of the database. Once the database language terms have been listed, a series of subqueries is formed by creating all the possible combinations of the listed terms. Each subquery is then scored on each of the documents in the target language database. Only those subqueries that return meaningful scores are relevant to the query. Thus, the semantic meaning of the query is determined against the database itself and those documents in the database language that are most relevant to that semantic meaning are returned.
Type: Grant
Filed: September 3, 1999
Date of Patent: July 17, 2001
Assignee: Claritech
Inventor: David A. Evans
-
Patent number: 6226631
Abstract: A document image that is the source of Optical Character Recognition (OCR) output is displayed so that a user can select a region of the displayed document image. When the region is selected, text of the OCR output corresponding to the selected region is submitted as an input to a search engine.
Type: Grant
Filed: September 3, 1999
Date of Patent: May 1, 2001
Assignee: Claritech Corporation
Inventor: David A. Evans