Patents Assigned to PureDiscovery Corporation
  • Patent number: 8788516
    Abstract: A method includes determining a plurality of social interactions associated with a plurality of people, generating a social object matrix using the determined social interactions, and generating a social brain by performing Singular Value Decomposition (SVD) on the social object matrix. The method further includes determining text from the social objects of the determined social interactions, generating a term-document matrix (TDM) using the determined text, generating a semantic brain by performing SVD on the TDM, generating an index using the determined text, and performing a query using the social brain, the semantic brain, and the index. The social brain is a singular value representation of the social object matrix and the semantic brain is a singular value representation of the TDM. Each social interaction is a particular person interacting with a particular social object.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: July 22, 2014
    Assignee: PureDiscovery Corporation
    Inventor: Paul A. Jakubik
  • Patent number: 8639496
    Abstract: A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.
    Type: Grant
    Filed: January 2, 2013
    Date of Patent: January 28, 2014
    Assignee: PureDiscovery Corporation
    Inventor: Paul A. Jakubik
  • Patent number: 8635225
    Abstract: A method includes accessing a set of documents and a set of representative documents, determining distances from each document to a nearest representative document, and selecting a subset of documents using an algorithm for choosing initial seed values and the determined distances to the nearest representative document. The method further includes repeating the following for each particular document of the subset of documents: adding the particular document to the set of representative documents to create a new set of representative documents, removing the particular document from the set of documents to create a new set of documents, and calculating a sum of distances from each document of the new set of documents to a nearest document in the new set of representative documents. The particular document of the subset that resulted in the lowest sum of distances is selected as a new representative document.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: January 21, 2014
    Assignee: PureDiscovery Corporation
    Inventor: Paul A. Jakubik
  • Publication number: 20130158979
    Abstract: A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.
    Type: Application
    Filed: December 14, 2011
    Publication date: June 20, 2013
    Applicant: PureDiscovery Corporation
    Inventor: Paul A. Jakubik
  • Publication number: 20130159313
    Abstract: A method includes accessing text, identifying a plurality of terms from the text, determining a plurality of term vectors associated with the identified plurality of terms, and clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors. The method further includes creating a first pseudo-document according to the first cluster, creating a second pseudo-document according to the second cluster, identifying a first set of terms associated with the first cluster using latent semantic analysis (LSA) of the first pseudo-document, identifying a second set of terms associated with the second cluster using LSA of the second pseudo-document, and combining the first and second sets of terms into a list of output terms.
    Type: Application
    Filed: December 14, 2011
    Publication date: June 20, 2013
    Applicant: PureDiscovery Corporation
    Inventor: Paul A. Jakubik
  • Patent number: 8312034
    Abstract: A concept bridge employable with a search engine, method of operating the same and computer information system employing the concept bridge and method. In one embodiment, the concept bridge includes an extractor configured to derive concept terms by extracting significant terms from search text and inferring relevant terms therefrom. The concept bridge also includes a query generator configured to generate a query consistent with an index of a search engine as a function of the concept terms.
    Type: Grant
    Filed: June 21, 2006
    Date of Patent: November 13, 2012
    Assignee: PureDiscovery Corporation
    Inventors: David Adam Hagar, Stephen Scott Jernigan, David Seigert Copps
  • Publication number: 20100114890
    Abstract: A computerized method of querying an array of vectors includes receiving a first matrix, partitioning the first matrix into a plurality of subset matrices, and processing each subset matrix with a natural language analysis process to create a plurality of processed subset matrices. The first matrix includes a first plurality of terms and represents one or more data objects to be queried, each subset matrix includes similar vectors from the first matrix, and each processed subset matrix relates terms in each subset matrix to each other.
    Type: Application
    Filed: October 31, 2008
    Publication date: May 6, 2010
    Applicant: PureDiscovery Corporation
    Inventors: David A. Hagar, Paul A. Jakubik, Stephen S. Jernigan
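
The abstract of publication 20100114890 describes partitioning a matrix into subsets of similar vectors and then processing each subset so that its terms are related to each other. The following is a minimal illustrative sketch of that general idea, not the method claimed in the application. It assumes that rows are data objects and columns are term counts, that similarity is cosine similarity against the first member of each subset, and that a term-by-term co-occurrence table can stand in for the unspecified natural language analysis process. The names `partition_similar`, `relate_terms`, and the `threshold` parameter are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def partition_similar(vectors, threshold=0.5):
    """Greedy partition (an assumption, not taken from the abstract).

    Each vector joins the first subset whose first member it resembles
    (cosine >= threshold); otherwise it starts a new subset.
    Returns a list of subsets, each a list of row indices.
    """
    subsets = []
    for idx, vec in enumerate(vectors):
        for subset in subsets:
            if cosine(vectors[subset[0]], vec) >= threshold:
                subset.append(idx)
                break
        else:
            subsets.append([idx])
    return subsets


def relate_terms(vectors, subset, n_terms):
    """Stand-in analysis step: term-by-term co-occurrence within one subset.

    Entry [i][j] sums count_i * count_j over the rows in the subset, so a
    nonzero value means terms i and j appear together in that subset.
    """
    cooc = [[0] * n_terms for _ in range(n_terms)]
    for idx in subset:
        vec = vectors[idx]
        for i in range(n_terms):
            for j in range(n_terms):
                cooc[i][j] += vec[i] * vec[j]
    return cooc


# Toy data: four data objects described by counts of four terms.
terms = ["patent", "claim", "matrix", "vector"]
doc_vectors = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 1],
]

subsets = partition_similar(doc_vectors)
processed = [relate_terms(doc_vectors, s, len(terms)) for s in subsets]

for subset, table in zip(subsets, processed):
    print("rows", subset)
    for term, row in zip(terms, table):
        print(f"  {term:>6}: {row}")
```

In this toy data the first two rows share only "patent" and "claim", and the last two share only "matrix" and "vector", so each subset yields a small table that relates just the terms appearing in it. A production system would more likely use a proper clustering algorithm and a decomposition such as the SVD mentioned in the other abstracts; the co-occurrence table is used here only to keep the sketch dependency-free.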