Patents Assigned to Content Analyst Company, LLC
  • Patent number: 8024344
    Abstract: Presented are systems and methods for securely sharing confidential information. In such a method, term vectors corresponding to ones of a plurality of confidential terms included in a plurality of confidential documents are received. Each of the received term vectors is mapped into a vector space. Non-confidential documents are mapped into the vector space to generate a document vector corresponding to each non-confidential document, wherein the generation of each document vector is based on a subset of the received term vectors. At least one of the non-confidential documents is identified in response to a query mapped into the vector space.
    Type: Grant
    Filed: June 5, 2008
    Date of Patent: September 20, 2011
    Assignee: Content Analyst Company, LLC
    Inventor: Roger Bradford
  • Patent number: 7844566
    Abstract: An embodiment of the present invention provides a computer-based method for automatically identifying clusters of conceptually-related documents in a collection of documents, including the following steps: generating a document-representation of each document in an abstract mathematical space; identifying a plurality of document clusters in the collection of documents based on a conceptual similarity between respective pairs of the document-representations, wherein each document cluster is associated with an exemplary document and a plurality of other documents; and identifying a non-intersecting document cluster from among the plurality of document clusters based on (i) a conceptual similarity between the document-representation of the exemplary document and the document-representation of each document in the non-intersecting cluster and (ii) a conceptual dissimilarity between a cluster-representation of the non-intersecting document cluster and a cluster-representation of each other document cluster.
    Type: Grant
    Filed: May 11, 2006
    Date of Patent: November 30, 2010
    Assignee: Content Analyst Company, LLC
    Inventor: Janusz Wnek
  • Patent number: 7765098
    Abstract: An embodiment of the present invention provides a method for automatically translating text. First, a conceptual representation space is generated based on source-language documents and target-language documents, wherein respective terms from the source-language and target-language documents have a representation in the conceptual representation space. Second, a new source-language document is represented in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space. Then, a term in the new source-language document is automatically translated into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.
    Type: Grant
    Filed: April 24, 2006
    Date of Patent: July 27, 2010
    Assignee: Content Analyst Company, LLC
    Inventor: Roger Burrowes Bradford
  • Patent number: 7720792
    Abstract: Disclosed are methods and computer program products for automatically identifying and compensating for stop words in a text processing system. This automatic stop word compensation allows such operations as performing queries on an abstract mathematical space built using all words from all texts, with the ability to compensate for the skew that the inclusion of the stop words may have introduced into the space. Documents are represented by document vectors in the abstract mathematical space. To compensate for stop words, a weight function is applied to a predetermined component of the document vectors associated with frequently occurring word(s) contained in the documents. The weight function may be applied dynamically during query processing. Alternatively, the weight function may be applied statically to all document vectors.
    Type: Grant
    Filed: February 7, 2006
    Date of Patent: May 18, 2010
    Assignee: Content Analyst Company, LLC
    Inventor: Robert Jenson Price
  • Publication number: 20090328226
    Abstract: Presented are systems and methods for securely sharing confidential information. In such a method, term vectors corresponding to ones of a plurality of confidential terms included in a plurality of confidential documents are received. Each of the received term vectors is mapped into a vector space. Non-confidential documents are mapped into the vector space to generate a document vector corresponding to each non-confidential document, wherein the generation of each document vector is based on a subset of the received term vectors. At least one of the non-confidential documents is identified in response to a query mapped into the vector space.
    Type: Application
    Filed: June 5, 2008
    Publication date: December 31, 2009
    Applicant: Content Analyst Company, LLC
    Inventor: Roger Bradford
  • Patent number: 7580910
    Abstract: A text processing method is provided that includes the following steps. First, an abstract mathematical vector space is generated based on a collection of documents. Respective documents in the collection of documents have a representation in the abstract mathematical vector space and respective terms contained in the collection of documents have a representation in the abstract mathematical vector space. Then, the abstract mathematical vector space is perturbed to produce a perturbed abstract mathematical vector space that is stored in an electronic format accessible to a user. Perturbing the abstract mathematical vector space may include modifying the representation of a document with a newly computed representation for that document, or modifying the representation of a term with a newly computed representation for that term.
    Type: Grant
    Filed: March 31, 2006
    Date of Patent: August 25, 2009
    Assignee: Content Analyst Company, LLC
    Inventor: Robert Jenson Price
  • Patent number: 7415462
    Abstract: Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
    Type: Grant
    Filed: January 20, 2006
    Date of Patent: August 19, 2008
    Assignee: Content Analyst Company, LLC
    Inventor: Roger B. Bradford
  • Publication number: 20060294101
    Abstract: A system and method for the automated classification of documents. To generate a function for the automatic classification of documents, a set of similarity scores is calculated for each document in a set of exemplary documents, wherein a similarity score is calculated by measuring the similarity in a conceptual representation space between a document vector representing the document and a centroid vector representing a category. The set of similarity scores is then used by an inductive learning-from-examples classifier to generate the function for the automatic classification of documents.
    Type: Application
    Filed: June 23, 2006
    Publication date: December 28, 2006
    Applicant: Content Analyst Company, LLC
    Inventor: Janusz Wnek
  • Publication number: 20060265362
    Abstract: A method for ranking data-objects retrieved from a plurality of databases is provided. First, data-objects are retrieved from a plurality of databases, wherein each database is accessed using a retrieval product. Second, a representation of each of the data-objects retrieved from the plurality of databases is generated in a conceptual representation space. Third, a representation of a query is generated in the conceptual representation space. Then, the data-objects are ranked with respect to the query based on a similarity between the representation of each data-object and the representation of the query.
    Type: Application
    Filed: May 17, 2006
    Publication date: November 23, 2006
    Applicant: Content Analyst Company, LLC
    Inventor: Roger Bradford
  • Publication number: 20060265209
    Abstract: An embodiment of the present invention provides a method for automatically translating text. First, a conceptual representation space is generated based on source-language documents and target-language documents, wherein respective terms from the source-language and target-language documents have a representation in the conceptual representation space. Second, a new source-language document is represented in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space. Then, a term in the new source-language document is automatically translated into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.
    Type: Application
    Filed: April 24, 2006
    Publication date: November 23, 2006
    Applicant: Content Analyst Company, LLC
    Inventor: Roger Bradford
  • Publication number: 20060242140
    Abstract: An embodiment of the present invention provides a computer-based method for automatically identifying clusters of conceptually-related documents in a collection of documents, including the following steps: generating a document-representation of each document in an abstract mathematical space; identifying a plurality of document clusters in the collection of documents based on a conceptual similarity between respective pairs of the document-representations, wherein each document cluster is associated with an exemplary document and a plurality of other documents; and identifying a non-intersecting document cluster from among the plurality of document clusters based on (i) a conceptual similarity between the document-representation of the exemplary document and the document-representation of each document in the non-intersecting cluster and (ii) a conceptual dissimilarity between a cluster-representation of the non-intersecting document cluster and a cluster-representation of each other document cluster.
    Type: Application
    Filed: May 11, 2006
    Publication date: October 26, 2006
    Applicant: Content Analyst Company, LLC
    Inventor: Janusz Wnek
  • Publication number: 20060242098
    Abstract: A method for automatically selecting representative exemplars from a collection of documents. The method includes generating a representation of each document in the collection of documents in an abstract mathematical space, measuring a similarity between the representation of each document in the collection of documents and the representation of at least one other document in the collection of documents, identifying clusters of conceptually similar documents based on the similarity measurements, and identifying at least one exemplary document within each cluster.
    Type: Application
    Filed: November 1, 2005
    Publication date: October 26, 2006
    Applicant: Content Analyst Company, LLC
    Inventor: Janusz Wnek
  • Publication number: 20060224572
    Abstract: Computer-based methods for automatically identifying and compensating for stop words contained in documents are described. The method for compensating for stop words includes: generating an abstract mathematical space based on documents included in a collection of documents, wherein each document has a representation in the abstract mathematical space; receiving a user query; generating a representation of the user query in the abstract mathematical space; computing a similarity between the representation of the user query and the representation of each document, wherein computing a similarity between the representation of the user query and the representation of a first document in the collection of documents comprises applying a weighting function to a value associated with a frequently occurring word contained in the first document, thereby automatically compensating for the frequently occurring word contained in the first document; and displaying a result based on the similarity computations.
    Type: Application
    Filed: February 7, 2006
    Publication date: October 5, 2006
    Applicant: Content Analyst Company, LLC
    Inventor: Robert Price
  • Publication number: 20060224584
    Abstract: An embodiment of the present invention provides a method for automatically subdividing a document into conceptually cohesive segments. The method includes the following steps: subdividing the document into contiguous blocks of text; generating an abstract mathematical space based on the blocks of text, wherein each block of text has a representation in the abstract mathematical space; computing similarity scores for adjacent blocks of text based on their representations in the abstract mathematical space; and aggregating similar adjacent blocks of text based on the similarity scores.
    Type: Application
    Filed: December 27, 2005
    Publication date: October 5, 2006
    Applicant: Content Analyst Company, LLC
    Inventor: Robert Price
  • Patent number: 7113943
    Abstract: Extensions to latent semantic indexing (LSI), including: phrase processing, creation of generalized entities, elaboration of entities, replacement of idiomatic expressions, and use of data fusion methods to combine the aforementioned extensions in a synergistic fashion. Additionally, novel methods tailored to specific applications of LSI are disclosed.
    Type: Grant
    Filed: December 5, 2001
    Date of Patent: September 26, 2006
    Assignee: Content Analyst Company, LLC
    Inventors: Roger Burrowes Bradford, Janusz Wnek
  • Publication number: 20060117052
    Abstract: Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
    Type: Application
    Filed: January 20, 2006
    Publication date: June 1, 2006
    Applicant: Content Analyst Company, LLC
    Inventor: Roger Bradford
  • Patent number: 7024407
    Abstract: Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
    Type: Grant
    Filed: August 24, 2001
    Date of Patent: April 4, 2006
    Assignee: Content Analyst Company, LLC
    Inventor: Roger B. Bradford
  • Patent number: 6954750
    Abstract: Refining a current query. Receiving information regarding the relevancy of documents retrieved from a document collection in response to a current query. Ranking the retrieved documents in accordance with the relevancy information. Forming a candidate query based on the rankings and analysis of locations of the retrieved documents in a latent semantic index vector space formed from the retrieved document. Applying the candidate query to the document collection. Ranking the documents retrieved in response to the candidate query in accordance with the received relevancy information. Comparing the ranking of documents retrieved in response to the candidate query and the ranking of documents retrieved in response to the current query with the received relevancy information. Choosing the query that produces the best ranking.
    Type: Grant
    Filed: October 21, 2003
    Date of Patent: October 11, 2005
    Assignee: Content Analyst Company, LLC
    Inventor: Roger Burrowes Bradford
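
Several of the abstracts above share one step: documents and a query are represented in a vector space, and documents are ranked by their similarity to the query. The following is a minimal sketch of that step only, not the method claimed in any of the listed patents. It assumes plain term-count vectors and cosine similarity as a stand-in for the latent semantic index space the abstracts describe; the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Map text to a sparse term-count vector (a stand-in for an LSI vector)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (score, document) pairs sorted by descending similarity."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample collection, for illustration only.
documents = [
    "latent semantic indexing of documents",
    "stop words skew the vector space",
    "automatic translation of source language terms",
]
ranked = rank("semantic indexing", documents)
best_score, best_doc = ranked[0]
```

Raw term counts match only on shared words; a latent semantic index would first project the vectors into a lower-dimensional space so that related terms can also contribute to the similarity score.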