Patents Assigned to Content Analyst Company, LLC
-
Patent number: 8024344
Abstract: Presented are systems and methods for securely sharing confidential information. In such a method, term vectors corresponding to ones of a plurality of confidential terms included in a plurality of confidential documents are received. Each of the received term vectors is mapped into a vector space. Non-confidential documents are mapped into the vector space to generate a document vector corresponding to each non-confidential document, wherein the generation of each document vector is based on a subset of the received term vectors. At least one of the non-confidential documents is identified in response to a query mapped into the vector space.
Type: Grant
Filed: June 5, 2008
Date of Patent: September 20, 2011
Assignee: Content Analyst Company, LLC
Inventor: Roger Bradford
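The flow in this abstract (receive term vectors, build document vectors for non-confidential documents from them, then match a query) can be illustrated with a minimal Python sketch. It assumes a document vector is simply the sum of the received term vectors for the terms the document contains, which the abstract does not specify. The names `fold_in` and `search` and all vector values are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def fold_in(tokens, term_vectors):
    """Sum the received term vectors of the tokens that have one.

    Assumption: summation stands in for whatever mapping the patent uses.
    Tokens without a received term vector are ignored.
    """
    dims = len(next(iter(term_vectors.values())))
    vec = [0.0] * dims
    for tok in tokens:
        tv = term_vectors.get(tok)
        if tv is not None:
            vec = [v + t for v, t in zip(vec, tv)]
    return vec

# Hypothetical term vectors received from the confidential side (invented values).
term_vectors = {
    "reactor": [0.9, 0.1],
    "coolant": [0.8, 0.2],
    "budget": [0.1, 0.9],
    "invoice": [0.2, 0.8],
}

# Non-confidential documents are mapped using only the received term vectors.
public_docs = {
    "doc_a": "reactor coolant pump".split(),
    "doc_b": "budget invoice total".split(),
}
doc_vectors = {name: fold_in(toks, term_vectors) for name, toks in public_docs.items()}

def search(query_tokens):
    """Return document names ordered by similarity to the query."""
    q = fold_in(query_tokens, term_vectors)
    return sorted(doc_vectors, key=lambda n: cosine(q, doc_vectors[n]), reverse=True)
```

In this sketch only term vectors cross the boundary; the confidential documents themselves are never seen by the code that builds `doc_vectors`.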
-
Patent number: 7844566
Abstract: An embodiment of the present invention provides a computer-based method for automatically identifying clusters of conceptually-related documents in a collection of documents, including the following steps: generating a document-representation of each document in an abstract mathematical space; identifying a plurality of document clusters in the collection of documents based on a conceptual similarity between respective pairs of the document-representations, wherein each document cluster is associated with an exemplary document and a plurality of other documents; and identifying a non-intersecting document cluster from among the plurality of document clusters based on (i) a conceptual similarity between the document-representation of the exemplary document and the document-representation of each document in the non-intersecting cluster and (ii) a conceptual dissimilarity between a cluster-representation of the non-intersecting document cluster and a cluster-representation of each other document cluster.
Type: Grant
Filed: May 11, 2006
Date of Patent: November 30, 2010
Assignee: Content Analyst Company, LLC
Inventor: Janusz Wnek
-
Patent number: 7765098
Abstract: An embodiment of the present invention provides a method for automatically translating text. First, a conceptual representation space is generated based on source-language documents and target-language documents, wherein respective terms from the source-language and target-language documents have a representation in the conceptual representation space. Second, a new source-language document is represented in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space. Then, a term in the new source-language document is automatically translated into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.
Type: Grant
Filed: April 24, 2006
Date of Patent: July 27, 2010
Assignee: Content Analyst Company, LLC
Inventor: Roger Burrowes Bradford
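The final step of this abstract, choosing a target-language term by similarity in a shared space, might look like the following Python sketch. It assumes the shared conceptual space already exists and that cosine similarity is the similarity measure; the abstract names neither. The dictionaries, their values, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical term representations in one shared conceptual space (invented values).
source_terms = {"chien": [0.9, 0.1, 0.0], "maison": [0.1, 0.9, 0.1]}
target_terms = {
    "dog": [0.8, 0.2, 0.0],
    "house": [0.0, 1.0, 0.1],
    "river": [0.0, 0.1, 0.9],
}

def translate_term(term):
    """Return the most similar target-language term, or None if the term is unknown."""
    vec = source_terms.get(term)
    if vec is None:
        return None
    return max(target_terms, key=lambda t: cosine(vec, target_terms[t]))

def translate(tokens):
    """Translate the subset of tokens that have a representation; keep the rest as-is."""
    return [translate_term(tok) or tok for tok in tokens]
```

Only the subset of terms with a representation in the space is translated, mirroring the wording of the abstract; unknown tokens pass through unchanged in this sketch.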
-
Patent number: 7720792
Abstract: Disclosed are methods and computer program products for automatically identifying and compensating for stop words in a text processing system. This automatic stop word compensation allows such operations as performing queries on an abstract mathematical space built using all words from all texts, with the ability to compensate for the skew that the inclusion of the stop words may have introduced into the space. Documents are represented by document vectors in the abstract mathematical space. To compensate for stop words, a weight function is applied to a predetermined component of the document vectors associated with frequently occurring word(s) contained in the documents. The weight function may be applied dynamically during query processing. Alternatively, the weight function may be applied statically to all document vectors.
Type: Grant
Filed: February 7, 2006
Date of Patent: May 18, 2010
Assignee: Content Analyst Company, LLC
Inventor: Robert Jenson Price
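The compensation idea in this abstract can be sketched in Python as follows. The sketch assumes that component 0 of each document vector is the one associated with frequently occurring words and that the weight function is a constant scale factor; both are illustrative assumptions, as are the names `compensate` and `rank` and all vector values.

```python
import math

STOP_COMPONENT = 0  # assumption: this component carries the stop-word skew

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def compensate(doc_vec, weight, component=STOP_COMPONENT):
    """Scale the predetermined component of a document vector by `weight`."""
    out = list(doc_vec)
    out[component] *= weight
    return out

def rank(query, docs, weight=1.0):
    """Dynamic variant: apply the weight while the query is being processed."""
    scores = {name: cosine(query, compensate(vec, weight)) for name, vec in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Invented vectors: [stop-word component, topic A, topic B].
docs = {"d1": [6.0, 0.0, 1.0], "d2": [1.0, 2.0, 0.0]}
query = [2.0, 1.0, 0.0]

# Static variant: weight every document vector once, ahead of any query.
static_docs = {name: compensate(vec, 0.1) for name, vec in docs.items()}
```

With these values, `rank(query, docs)` puts the stop-word-heavy `d1` first, while `rank(query, docs, weight=0.1)` and `rank(query, static_docs)` both put the on-topic `d2` first. The dynamic form leaves the stored vectors untouched, whereas the static form pays the cost once at indexing time.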
-
Publication number: 20090328226
Abstract: Presented are systems and methods for securely sharing confidential information. In such a method, term vectors corresponding to ones of a plurality of confidential terms included in a plurality of confidential documents are received. Each of the received term vectors is mapped into a vector space. Non-confidential documents are mapped into the vector space to generate a document vector corresponding to each non-confidential document, wherein the generation of each document vector is based on a subset of the received term vectors. At least one of the non-confidential documents is identified in response to a query mapped into the vector space.
Type: Application
Filed: June 5, 2008
Publication date: December 31, 2009
Applicant: Content Analyst Company, LLC
Inventor: Roger Bradford
-
Patent number: 7580910
Abstract: A text processing method is provided that includes the following steps. First, an abstract mathematical vector space is generated based on a collection of documents. Respective documents in the collection of documents have a representation in the abstract mathematical vector space and respective terms contained in the collection of documents have a representation in the abstract mathematical vector space. Then, the abstract mathematical vector space is perturbed to produce a perturbed abstract mathematical vector space that is stored in an electronic format accessible to a user. Perturbing the abstract mathematical vector space may include modifying the representation of a document with a newly computed representation for that document, or modifying the representation of a term with a newly computed representation for that term.
Type: Grant
Filed: March 31, 2006
Date of Patent: August 25, 2009
Assignee: Content Analyst Company, LLC
Inventor: Robert Jenson Price
-
Patent number: 7415462
Abstract: Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
Type: Grant
Filed: January 20, 2006
Date of Patent: August 19, 2008
Assignee: Content Analyst Company, LLC
Inventor: Roger B. Bradford
-
Publication number: 20060294101
Abstract: A system and method for the automated classification of documents. To generate a function for the automatic classification of documents, a set of similarity scores is calculated for each document in a set of exemplary documents, wherein a similarity score is calculated by measuring the similarity in a conceptual representation space between a document vector representing the document and a centroid vector representing a category. The set of similarity scores is then used by an inductive learning from examples classifier to generate the function for the automatic classification of documents.
Type: Application
Filed: June 23, 2006
Publication date: December 28, 2006
Applicant: Content Analyst Company, LLC
Inventor: Janusz Wnek
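The two stages in this abstract (similarity scores against category centroids, then a learner trained on those scores) can be illustrated with a small Python sketch. The learner here is a deliberately simple stand-in, a single threshold learned from examples on one score, and is not the inductive classifier the application describes. Category names, vectors, and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical exemplary documents, already mapped into the conceptual space.
examples = {
    "sports": [[0.9, 0.1], [0.8, 0.3]],
    "finance": [[0.1, 0.9], [0.2, 0.7]],
}
centroids = {label: centroid(vecs) for label, vecs in examples.items()}

def similarity_scores(vec):
    """One similarity score per category centroid, in sorted label order."""
    return [cosine(vec, centroids[label]) for label in sorted(centroids)]

def learn_threshold(category):
    """Stand-in learner: midpoint between the weakest positive and strongest negative score."""
    pos = [cosine(v, centroids[category]) for v in examples[category]]
    neg = [cosine(v, centroids[category])
           for label, vecs in examples.items() if label != category
           for v in vecs]
    return (min(pos) + max(neg)) / 2

sports_threshold = learn_threshold("sports")

def is_sports(vec):
    """Learned decision function for the 'sports' category."""
    return cosine(vec, centroids["sports"]) >= sports_threshold
```

A fuller version would feed the whole `similarity_scores` vector to a rule or tree learner; the threshold rule only shows where the learned function sits in the pipeline.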
-
Publication number: 20060265362
Abstract: A method for ranking data-objects retrieved from a plurality of databases is provided. First, data-objects are retrieved from a plurality of databases, wherein each database is accessed using a retrieval product. Second, a representation of each of the data-objects retrieved from the plurality of databases is generated in a conceptual representation space. Third, a representation of a query is generated in the conceptual representation space. Then, the data-objects are ranked with respect to the query based on a similarity between the representation of each data-object and the representation of the query.
Type: Application
Filed: May 17, 2006
Publication date: November 23, 2006
Applicant: Content Analyst Company, LLC
Inventor: Roger Bradford
-
Publication number: 20060265209
Abstract: An embodiment of the present invention provides a method for automatically translating text. First, a conceptual representation space is generated based on source-language documents and target-language documents, wherein respective terms from the source-language and target-language documents have a representation in the conceptual representation space. Second, a new source-language document is represented in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space. Then, a term in the new source-language document is automatically translated into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.
Type: Application
Filed: April 24, 2006
Publication date: November 23, 2006
Applicant: Content Analyst Company, LLC
Inventor: Roger Bradford
-
Publication number: 20060242140
Abstract: An embodiment of the present invention provides a computer-based method for automatically identifying clusters of conceptually-related documents in a collection of documents, including the following steps: generating a document-representation of each document in an abstract mathematical space; identifying a plurality of document clusters in the collection of documents based on a conceptual similarity between respective pairs of the document-representations, wherein each document cluster is associated with an exemplary document and a plurality of other documents; and identifying a non-intersecting document cluster from among the plurality of document clusters based on (i) a conceptual similarity between the document-representation of the exemplary document and the document-representation of each document in the non-intersecting cluster and (ii) a conceptual dissimilarity between a cluster-representation of the non-intersecting document cluster and a cluster-representation of each other document cluster.
Type: Application
Filed: May 11, 2006
Publication date: October 26, 2006
Applicant: Content Analyst Company, LLC
Inventor: Janusz Wnek
-
Publication number: 20060242098
Abstract: A method for automatically selecting representative exemplars from a collection of documents. The method includes generating a representation of each document in the collection of documents in an abstract mathematical space, measuring a similarity between the representation of each document in the collection of documents and the representation of at least one other document in the collection of documents, identifying clusters of conceptually similar documents based on the similarity measurements, and identifying at least one exemplary document within each cluster.
Type: Application
Filed: November 1, 2005
Publication date: October 26, 2006
Applicant: Content Analyst Company, LLC
Inventor: Janusz Wnek
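The steps listed in this abstract (represent, measure similarity, cluster, pick an exemplar per cluster) can be sketched in Python. The clustering here is a greedy single-pass scheme with a fixed cosine threshold, and the exemplar is the member most similar to the rest of its cluster. Both choices are assumptions made for illustration, not the method of the application, and the document vectors are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, threshold=0.9):
    """Greedy clustering: join the first cluster whose seed document is similar enough."""
    clusters = []
    for name, vec in vectors.items():
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([name])
    return clusters

def exemplar(members, vectors):
    """Member with the highest total similarity to the other members of its cluster."""
    def total_similarity(m):
        return sum(cosine(vectors[m], vectors[o]) for o in members if o != m)
    return max(members, key=total_similarity)

# Hypothetical document representations in the abstract space.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.8, 0.2],
    "x": [0.0, 1.0],
    "y": [0.1, 0.9],
}
clusters = cluster(vectors)
```

For the first cluster, `exemplar(clusters[0], vectors)` selects `"b"`, the document lying between the other two, which matches the intuition that an exemplar should be central to its cluster.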
-
Publication number: 20060224572
Abstract: Computer-based methods for automatically identifying and compensating for stop words contained in documents are described. The method for compensating for stop words includes: generating an abstract mathematical space based on documents included in a collection of documents, wherein each document has a representation in the abstract mathematical space; receiving a user query; generating a representation of the user query in the abstract mathematical space; computing a similarity between the representation of the user query and the representation of each document, wherein computing a similarity between the representation of the user query and the representation of a first document in the collection of documents comprises applying a weighting function to a value associated with a frequently occurring word contained in the first document, thereby automatically compensating for the frequently occurring word contained in the first document; and displaying a result based on the similarity computations.
Type: Application
Filed: February 7, 2006
Publication date: October 5, 2006
Applicant: Content Analyst Company, LLC
Inventor: Robert Price
-
Publication number: 20060224584
Abstract: An embodiment of the present invention provides a method for automatically subdividing a document into conceptually cohesive segments. The method includes the following steps: subdividing the document into contiguous blocks of text; generating an abstract mathematical space based on the blocks of text, wherein each block of text has a representation in the abstract mathematical space; computing similarity scores for adjacent blocks of text based on their representations in the abstract mathematical space; and aggregating similar adjacent blocks of text based on the similarity scores.
Type: Application
Filed: December 27, 2005
Publication date: October 5, 2006
Applicant: Content Analyst Company, LLC
Inventor: Robert Price
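The segmentation steps in this abstract can be sketched in Python. As a simplification, plain bag-of-words counts stand in for the abstract mathematical space, and adjacent blocks are merged whenever their cosine similarity reaches a fixed threshold. The threshold value and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def segment(blocks, threshold=0.3):
    """Aggregate adjacent blocks whose similarity score reaches the threshold."""
    if not blocks:
        return []
    # Stand-in representation: one word-count vector per block.
    vecs = [Counter(block.lower().split()) for block in blocks]
    segments = [[blocks[0]]]
    for prev, cur, text in zip(vecs, vecs[1:], blocks[1:]):
        if cosine(prev, cur) >= threshold:
            segments[-1].append(text)   # similar to its neighbor: same segment
        else:
            segments.append([text])     # dissimilar: a new segment starts here
    return segments

blocks = [
    "the cat chased the mouse",
    "the mouse hid from the cat",
    "stock prices fell sharply",
    "investors sold stock as prices fell",
]
segments = segment(blocks)
```

On this input the first two blocks form one segment and the last two form another, because the middle pair shares no words. A real system would compare blocks in a reduced concept space, so that blocks with related but different vocabulary could still be merged.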
-
Patent number: 7113943
Abstract: Extensions to latent semantic indexing (LSI), including: phrase processing, creation of generalized entities, elaboration of entities, replacement of idiomatic expressions, and use of data fusion methods to combine the aforementioned extensions in a synergistic fashion. Additionally, novel methods tailored to specific applications of LSI are disclosed.
Type: Grant
Filed: December 5, 2001
Date of Patent: September 26, 2006
Assignee: Content Analyst Company, LLC
Inventors: Roger Burrowes Bradford, Janusz Wnek
-
Publication number: 20060117052
Abstract: Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
Type: Application
Filed: January 20, 2006
Publication date: June 1, 2006
Applicant: Content Analyst Company, LLC
Inventor: Roger Bradford
-
Patent number: 7024407
Abstract: Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
Type: Grant
Filed: August 24, 2001
Date of Patent: April 4, 2006
Assignee: Content Analyst Company, LLC
Inventor: Roger B. Bradford
-
Patent number: 6954750
Abstract: Refining a current query. Receiving information regarding the relevancy of documents retrieved from a document collection in response to a current query. Ranking the retrieved documents in accordance with the relevancy information. Forming a candidate query based on the rankings and analysis of locations of the retrieved documents in a latent semantic index vector space formed from the retrieved document. Applying the candidate query to the document collection. Ranking the documents retrieved in response to the candidate query in accordance with the received relevancy information. Comparing the ranking of documents retrieved in response to the candidate query and the ranking of documents retrieved in response to the current query with the received relevancy information. Choosing the query that produces the best ranking.
Type: Grant
Filed: October 21, 2003
Date of Patent: October 11, 2005
Assignee: Content Analyst Company, LLC
Inventor: Roger Burrowes Bradford