Patents Assigned to Content Analyst Company, LLC

Vector space method for secure information sharing

Patent number: 8024344

Abstract: Presented are systems and methods for securely sharing confidential information. In such a method, term vectors corresponding to ones of a plurality of confidential terms included in a plurality of confidential documents are received. Each of the received term vectors is mapped into a vector space. Non-confidential documents are mapped into the vector space to generate a document vector corresponding to each non-confidential document, wherein the generation of each document vector is based on a subset of the received term vectors. At least one of the non-confidential documents is identified in response to a query mapped into the vector space.

Type: Grant

Filed: June 5, 2008

Date of Patent: September 20, 2011

Assignee: Content Analyst Company, LLC

Inventor: Roger Bradford
Latent semantic clustering

Patent number: 7844566

Abstract: An embodiment of the present invention provides a computer-based method for automatically identifying clusters of conceptually-related documents in a collection of documents, including the following steps: generating a document-representation of each document in an abstract mathematical space; identifying a plurality of document clusters in the collection of documents based on a conceptual similarity between respective pairs of the document-representations, wherein each document cluster is associated with an exemplary document and a plurality of other documents; and identifying a non-intersecting document cluster from among the plurality of document clusters based on (i) a conceptual similarity between the document-representation of the exemplary document and the document-representation of each document in the non-intersecting cluster and (ii) a conceptual dissimilarity between a cluster-representation of the non-intersecting document cluster and a cluster-representation of each other document cluster.

Type: Grant

Filed: May 11, 2006

Date of Patent: November 30, 2010

Assignee: Content Analyst Company, LLC

Inventor: Janusz Wnek
Machine translation using vector space representations

Patent number: 7765098

Abstract: An embodiment of the present invention provides a method for automatically translating text. First, a conceptual representation space is generated based on source-language documents and target-language documents, wherein respective terms from the source-language and target-language documents have a representation in the conceptual representation space. Second, a new source-language document is represented in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space. Then, a term in the new source-language document is automatically translated into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.

Type: Grant

Filed: April 24, 2006

Date of Patent: July 27, 2010

Assignee: Content Analyst Company, LLC

Inventor: Roger Burrowes Bradford
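
The term-translation step described in this abstract can be illustrated with a minimal sketch. It assumes the shared conceptual representation space has already been built and that cosine similarity is the similarity measure; the abstract specifies neither. The two-dimensional vectors, the vocabulary, and the names `cosine` and `translate_term` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def translate_term(source_vec, target_terms):
    """Return the target-language term whose vector is most similar to source_vec.

    target_terms maps each target-language term to its vector in the shared space.
    """
    return max(target_terms, key=lambda term: cosine(source_vec, target_terms[term]))

# Hypothetical term vectors in a shared two-dimensional space (assumption).
space_en = {"dog": [0.9, 0.1], "house": [0.1, 0.9]}
space_fr = {"chien": [0.8, 0.2], "maison": [0.2, 0.8]}

for term, vec in space_en.items():
    print(term, "->", translate_term(vec, space_fr))
```

A source term with no representation in the space has no vector to compare, which is presumably why the abstract speaks of a subset of terms being represented.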
Automatic stop word identification and compensation

Patent number: 7720792

Abstract: Disclosed are methods and computer program products for automatically identifying and compensating for stop words in a text processing system. This automatic stop word compensation allows such operations as performing queries on an abstract mathematical space built using all words from all texts, with the ability to compensate for the skew that the inclusion of the stop words may have introduced into the space. Documents are represented by document vectors in the abstract mathematical space. To compensate for stop words, a weight function is applied to a predetermined component of the document vectors associated with frequently occurring word(s) contained in the documents. The weight function may be applied dynamically during query processing. Alternatively, the weight function may be applied statically to all document vectors.

Type: Grant

Filed: February 7, 2006

Date of Patent: May 18, 2010

Assignee: Content Analyst Company, LLC

Inventor: Robert Jenson Price
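
As a rough illustration of the compensation described in this abstract, the sketch below scales one component of each document vector before computing similarity. The choice of component 0, the weight 0.1, the use of cosine similarity, and the names `compensate` and `cosine` are all assumptions made for the example, not details taken from the patent.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def compensate(doc_vec, component=0, weight=0.1):
    """Return a copy of doc_vec with one component scaled by weight.

    The component is assumed to be the one dominated by frequent words.
    """
    out = list(doc_vec)
    out[component] *= weight
    return out

# Hypothetical document vectors: component 0 is inflated by stop words.
docs = {"a": [5.0, 1.0, 0.0], "b": [5.0, 0.0, 1.0]}
query = [1.0, 1.0, 0.0]

# No compensation: the shared first component makes both documents look similar.
raw = {name: cosine(query, vec) for name, vec in docs.items()}

# Dynamic variant: the weight is applied while the query is processed.
dynamic = {name: cosine(query, compensate(vec)) for name, vec in docs.items()}

# Static variant: the weight is applied once to every stored document vector.
static_index = {name: compensate(vec) for name, vec in docs.items()}
static = {name: cosine(query, vec) for name, vec in static_index.items()}

print(raw)
print(dynamic)
```

Both variants give the same scores here; the difference is only whether the weighting cost is paid per query or once at indexing time.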
Vector Space Method for Secure Information Sharing

Publication number: 20090328226

Abstract: Presented are systems and methods for securely sharing confidential information. In such a method, term vectors corresponding to ones of a plurality of confidential terms included in a plurality of confidential documents are received. Each of the received term vectors is mapped into a vector space. Non-confidential documents are mapped into the vector space to generate a document vector corresponding to each non-confidential document, wherein the generation of each document vector is based on a subset of the received term vectors. At least one of the non-confidential documents is identified in response to a query mapped into the vector space.

Type: Application

Filed: June 5, 2008

Publication date: December 31, 2009

Applicant: Content Analyst Company, LLC

Inventor: Roger Bradford
Perturbing latent semantic indexing spaces

Patent number: 7580910

Abstract: A text processing method is provided that includes the following steps. First, an abstract mathematical vector space is generated based on a collection of documents. Respective documents in the collection of documents have a representation in the abstract mathematical vector space and respective terms contained in the collection of documents have a representation in the abstract mathematical vector space. Then, the abstract mathematical vector space is perturbed to produce a perturbed abstract mathematical vector space that is stored in an electronic format accessible to a user. Perturbing the abstract mathematical vector space may include modifying the representation of a document with a newly computed representation for that document, or modifying the representation of a term with a newly computed representation for that term.

Type: Grant

Filed: March 31, 2006

Date of Patent: August 25, 2009

Assignee: Content Analyst Company, LLC

Inventor: Robert Jenson Price
Word sense disambiguation

Patent number: 7415462

Abstract: Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.

Type: Grant

Filed: January 20, 2006

Date of Patent: August 19, 2008

Assignee: Content Analyst Company, LLC

Inventor: Roger B. Bradford
Multi-strategy document classification system and method

Publication number: 20060294101

Abstract: A system and method for the automated classification of documents. To generate a function for the automatic classification of documents, a set of similarity scores is calculated for each document in a set of exemplary documents, wherein a similarity score is calculated by measuring the similarity in a conceptual representation space between a document vector representing the document and a centroid vector representing a category. The set of similarity scores is then used by an inductive learning-from-examples classifier to generate the function for the automatic classification of documents.

Type: Application

Filed: June 23, 2006

Publication date: December 28, 2006

Applicant: Content Analyst Company, LLC

Inventor: Janusz Wnek
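
The two stages in this abstract (similarity scores against category centroids, then a learner trained on those scores) might be sketched as follows. The abstract does not say which inductive learner is used, so a 1-nearest-neighbour rule over the score vectors stands in for it here; the example vectors, the labels, and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def score_features(doc_vec, centroids):
    """One similarity score per category, in sorted category order."""
    return [cosine(doc_vec, centroids[label]) for label in sorted(centroids)]

def train(exemplars):
    """exemplars is a list of (document vector, category label) pairs."""
    by_label = {}
    for vec, label in exemplars:
        by_label.setdefault(label, []).append(vec)
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}
    examples = [(score_features(vec, centroids), label) for vec, label in exemplars]
    return centroids, examples

def classify(doc_vec, centroids, examples):
    """Stand-in learner (assumption): nearest training example in score space."""
    feats = score_features(doc_vec, centroids)
    def squared_distance(example):
        return sum((a - b) ** 2 for a, b in zip(feats, example[0]))
    return min(examples, key=squared_distance)[1]

# Hypothetical exemplary documents in a two-dimensional space.
exemplars = [([1.0, 0.0], "sports"), ([0.9, 0.2], "sports"),
             ([0.0, 1.0], "finance"), ([0.1, 0.8], "finance")]
centroids, examples = train(exemplars)
print(classify([0.8, 0.3], centroids, examples))
```

Any learner that accepts a fixed-length feature vector could replace the nearest-neighbour rule without changing the scoring stage.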
Machine translation using vector space representations

Publication number: 20060265209

Abstract: An embodiment of the present invention provides a method for automatically translating text. First, a conceptual representation space is generated based on source-language documents and target-language documents, wherein respective terms from the source-language and target-language documents have a representation in the conceptual representation space. Second, a new source-language document is represented in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space. Then, a term in the new source-language document is automatically translated into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.

Type: Application

Filed: April 24, 2006

Publication date: November 23, 2006

Applicant: Content Analyst Company, LLC

Inventor: Roger Bradford
Federated queries and combined text and relational data

Publication number: 20060265362

Abstract: A method for ranking data-objects retrieved from a plurality of databases is provided. First, data-objects are retrieved from a plurality of databases, wherein each database is accessed using a retrieval product. Second, a representation of each of the data-objects retrieved from the plurality of databases is generated in a conceptual representation space. Third, a representation of a query is generated in the conceptual representation space. Then, the data-objects are ranked with respect to the query based on a similarity between the representation of each data-object and the representation of the query.

Type: Application

Filed: May 17, 2006

Publication date: November 23, 2006

Applicant: Content Analyst Company, LLC

Inventor: Roger Bradford
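
The ranking step in this abstract can be sketched as below, assuming each retrieved data-object already has a vector in the conceptual representation space and that cosine similarity is the measure. The database names, object identifiers, vectors, and the function `rank` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, results_by_db):
    """Merge results from every database and sort by similarity to the query.

    results_by_db maps a database name to a list of (object id, vector) pairs.
    Returns (score, database, object id) tuples, best match first.
    """
    scored = [
        (cosine(query_vec, vec), db, obj_id)
        for db, items in results_by_db.items()
        for obj_id, vec in items
    ]
    return sorted(scored, reverse=True)

# Hypothetical results retrieved from two separate sources.
results = {
    "text_db": [("t1", [0.9, 0.1]), ("t2", [0.2, 0.8])],
    "relational_db": [("r1", [0.7, 0.7])],
}
ranking = rank([1.0, 0.0], results)
for score, db, obj_id in ranking:
    print(f"{obj_id} from {db}: {score:.3f}")
```

Because every object is scored in the same space, results from different retrieval products become directly comparable.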
Latent semantic clustering

Publication number: 20060242140

Abstract: An embodiment of the present invention provides a computer-based method for automatically identifying clusters of conceptually-related documents in a collection of documents, including the following steps: generating a document-representation of each document in an abstract mathematical space; identifying a plurality of document clusters in the collection of documents based on a conceptual similarity between respective pairs of the document-representations, wherein each document cluster is associated with an exemplary document and a plurality of other documents; and identifying a non-intersecting document cluster from among the plurality of document clusters based on (i) a conceptual similarity between the document-representation of the exemplary document and the document-representation of each document in the non-intersecting cluster and (ii) a conceptual dissimilarity between a cluster-representation of the non-intersecting document cluster and a cluster-representation of each other document cluster.

Type: Application

Filed: May 11, 2006

Publication date: October 26, 2006

Applicant: Content Analyst Company, LLC

Inventor: Janusz Wnek
Generating representative exemplars for indexing, clustering, categorization and taxonomy

Publication number: 20060242098

Abstract: A method for automatically selecting representative exemplars from a collection of documents. The method includes generating a representation of each document in the collection of documents in an abstract mathematical space, measuring a similarity between the representation of each document in the collection of documents and the representation of at least one other document in the collection of documents, identifying clusters of conceptually similar documents based on the similarity measurements, and identifying at least one exemplary document within each cluster.

Type: Application

Filed: November 1, 2005

Publication date: October 26, 2006

Applicant: Content Analyst Company, LLC

Inventor: Janusz Wnek
Automatic linear text segmentation

Publication number: 20060224584

Abstract: An embodiment of the present invention provides a method for automatically subdividing a document into conceptually cohesive segments. The method includes the following steps: subdividing the document into contiguous blocks of text; generating an abstract mathematical space based on the blocks of text, wherein each block of text has a representation in the abstract mathematical space; computing similarity scores for adjacent blocks of text based on their representations in the abstract mathematical space; and aggregating similar adjacent blocks of text based on the similarity scores.

Type: Application

Filed: December 27, 2005

Publication date: October 5, 2006

Applicant: Content Analyst Company, LLC

Inventor: Robert Price
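
The steps in this abstract can be sketched as follows. For brevity, raw term-count vectors stand in for the abstract mathematical space (no dimensionality reduction is performed), and the merge threshold of 0.3 is an arbitrary assumption; the function names and sample blocks are hypothetical.

```python
import math
from collections import Counter

def block_vector(text):
    """Term-count vector for one block (stand-in for a point in the space)."""
    return Counter(text.lower().split())

def cosine_counts(a, b):
    """Cosine similarity of two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

def segment(blocks, threshold=0.3):
    """Merge each block into the current segment when it resembles its predecessor."""
    if not blocks:
        return []
    vectors = [block_vector(block) for block in blocks]
    segments = [[blocks[0]]]
    for i in range(1, len(blocks)):
        if cosine_counts(vectors[i - 1], vectors[i]) >= threshold:
            segments[-1].append(blocks[i])   # similar: extend current segment
        else:
            segments.append([blocks[i]])     # dissimilar: start a new segment
    return segments

blocks = [
    "the cat chased the mouse",
    "the mouse fled the cat",
    "stocks rose on strong earnings",
    "earnings lifted stocks again",
]
for seg in segment(blocks):
    print(seg)
```

With these sample blocks the first two share most of their terms and the last two share some, while the middle pair shares none, so the document splits into two segments.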
Automatic stop word identification and compensation

Publication number: 20060224572

Abstract: Computer-based methods for automatically identifying and compensating for stop words contained in documents are described. The method for compensating for stop words includes: generating an abstract mathematical space based on documents included in a collection of documents, wherein each document has a representation in the abstract mathematical space; receiving a user query; generating a representation of the user query in the abstract mathematical space; computing a similarity between the representation of the user query and the representation of each document, wherein computing a similarity between the representation of the user query and the representation of a first document in the collection of documents comprises applying a weighting function to a value associated with a frequently occurring word contained in the first document, thereby automatically compensating for the frequently occurring word contained in the first document; and displaying a result based on the similarity computations.

Type: Application

Filed: February 7, 2006

Publication date: October 5, 2006

Applicant: Content Analyst Company, LLC

Inventor: Robert Price
Method for document comparison and selection

Patent number: 7113943

Abstract: Extensions to latent semantic indexing (LSI), including: phrase processing, creation of generalized entities, elaboration of entities, replacement of idiomatic expressions, and use of data fusion methods to combine the aforementioned extensions in a synergistic fashion. Additionally, novel methods tailored to specific applications of LSI are disclosed.

Type: Grant

Filed: December 5, 2001

Date of Patent: September 26, 2006

Assignee: Content Analyst Company, LLC

Inventors: Roger Burrowes Bradford, Janusz Wnek
Word sense disambiguation

Publication number: 20060117052

Abstract: Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.

Type: Application

Filed: January 20, 2006

Publication date: June 1, 2006

Applicant: Content Analyst Company, LLC

Inventor: Roger Bradford
Word sense disambiguation

Patent number: 7024407

Abstract: Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.

Type: Grant

Filed: August 24, 2001

Date of Patent: April 4, 2006

Assignee: Content Analyst Company, LLC

Inventor: Roger B. Bradford
Method and system for facilitating the refinement of data queries

Patent number: 6954750

Abstract: Refining a current query. Receiving information regarding the relevancy of documents retrieved from a document collection in response to a current query. Ranking the retrieved documents in accordance with the relevancy information. Forming a candidate query based on the rankings and analysis of locations of the retrieved documents in a latent semantic index vector space formed from the retrieved documents. Applying the candidate query to the document collection. Ranking the documents retrieved in response to the candidate query in accordance with the received relevancy information. Comparing the ranking of documents retrieved in response to the candidate query and the ranking of documents retrieved in response to the current query with the received relevancy information. Choosing the query that produces the best ranking.

Type: Grant

Filed: October 21, 2003

Date of Patent: October 11, 2005

Assignee: Content Analyst Company, LLC

Inventor: Roger Burrowes Bradford