Patents Assigned to Stratify, Inc.
  • Patent number: 8938384
    Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
    Type: Grant
    Filed: July 16, 2012
    Date of Patent: January 20, 2015
    Assignee: Stratify, Inc.
    Inventor: Sauraj Goswami
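The second embodiment above (segment a document at transitions between non-overlapping character sets, then score each segment only against the candidate languages written in that character set) can be sketched in Python. Two assumptions stand in for real components: the script of a letter is taken from the first word of its Unicode name, and `LANGUAGE_MODELS` is a hypothetical toy table of frequent words rather than a trained language model. Languages that share a script (English and French here) are told apart only by the per-segment scores.

```python
import unicodedata
from typing import List, Optional, Tuple

# Hypothetical toy models: each language is tied to one script and a tiny
# set of frequent words. Real systems would use trained n-gram profiles.
LANGUAGE_MODELS = {
    "english": ("LATIN", {"the", "and", "is", "of", "to"}),
    "french": ("LATIN", {"le", "la", "et", "est", "les"}),
    "russian": ("CYRILLIC", {"и", "в", "не", "на", "это"}),
}


def char_script(ch: str) -> Optional[str]:
    """Coarse script of a letter, from the first word of its Unicode name."""
    if not ch.isalpha():
        return None  # spaces, digits and punctuation never start a new segment
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def segment_by_script(text: str) -> List[Tuple[Optional[str], str]]:
    """Split text wherever the script of consecutive letters changes."""
    segments: List[list] = []  # each entry is [script, text]
    for ch in text:
        script = char_script(ch)
        if segments and (script is None or script == segments[-1][0]):
            segments[-1][1] += ch
        elif segments and segments[-1][0] is None:
            # leading neutral characters adopt the first script that follows
            segments[-1][0] = script
            segments[-1][1] += ch
        else:
            segments.append([script, ch])
    return [(script, seg.strip()) for script, seg in segments]


def score(segment: str, common_words: set) -> float:
    """Fraction of tokens in the segment that are frequent words of a language."""
    tokens = [t.strip(".,;:!?") for t in segment.lower().split()]
    if not tokens:
        return 0.0
    return sum(t in common_words for t in tokens) / len(tokens)


def identify_languages(text: str) -> List[str]:
    """Score each segment only against languages written in its script."""
    found: List[str] = []
    for script, segment in segment_by_script(text):
        candidates = {
            lang: words
            for lang, (s, words) in LANGUAGE_MODELS.items()
            if s == script
        }
        if not candidates or not segment:
            continue
        best = max(candidates, key=lambda lang: score(segment, candidates[lang]))
        if best not in found:
            found.append(best)
    return found


print(identify_languages("The cat is on the mat. Это кот и собака."))
# prints ['english', 'russian']
```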
  • Patent number: 8862670
    Abstract: A pool of messages, e.g., e-mails and/or other electronic documents that each correspond to a communication from a sender to a recipient, is analyzed to identify communication chains between a source and a target. Sender and recipient identifiers extracted from the messages are used to detect direct and indirect communication links between pairs of entities. Information related to the identified communication chains can be presented to a user via an interactive network graph that supports iterative analysis of the communication-chain data.
    Type: Grant
    Filed: January 26, 2007
    Date of Patent: October 14, 2014
    Assignee: Stratify, Inc.
    Inventors: Hakan Ancin, David Bayer, Kumar Maddali, Joy Thomas
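Detecting direct and indirect links from extracted sender and recipient identifiers amounts to building a directed graph and searching it. The sketch below assumes messages are already reduced to `(sender, [recipients])` pairs and uses breadth-first search to return the shortest chain; the function names and the `max_hops` cap are illustrative assumptions, and the interactive network graph is out of scope.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_links(messages: Iterable[Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """Map each sender to everyone they wrote to (addresses lower-cased)."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for sender, recipients in messages:
        for recipient in recipients:
            links[sender.lower()].add(recipient.lower())
    return links


def find_chain(links: Dict[str, Set[str]], source: str, target: str,
               max_hops: int = 4) -> Optional[List[str]]:
    """Breadth-first search for the shortest sender->recipient chain, or None.

    A chain of two entries is a direct link; longer chains are indirect.
    """
    source, target = source.lower(), target.lower()
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            chain = []
            while node is not None:  # walk back from target to source
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        if hops == max_hops:
            continue
        for nxt in sorted(links.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append((nxt, hops + 1))
    return None
```

Breadth-first search is chosen so the first chain found is also the shortest, which keeps the reported path easy to inspect.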
  • Patent number: 8788601
    Abstract: Improved techniques of fulfilling a request to perform a task involve a master computer placing the request in a first queue and a copy of the request in a second queue, the second queue being frequently accessed by a set of worker computers which rapidly scans the second queue for requests to fulfill. If, during the scanning, a worker computer determines that it has a capability to fulfill the request, the worker computer removes the copy of the request from the second queue. Furthermore, if the copy of the request remains in the second queue after a brief time period, it is clear that the set of worker computers is unable to perform the task. In this case, the master computer takes a remedial action such as notifying a client computer which sent the request that the worker computers, as currently configured, are unable to perform the task.
    Type: Grant
    Filed: May 26, 2011
    Date of Patent: July 22, 2014
    Assignee: Stratify, Inc.
    Inventors: Anand Rajasekar, Pankaj Nayal
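The two-queue protocol in this abstract can be illustrated with a single-process simulation. Everything here is an assumption for illustration: the `Master` and `Worker` classes, requests as dicts with `id` and `task` keys, and a fixed number of scan passes standing in for the "brief time period". A real deployment would use a shared queue service and wall-clock timeouts.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    capabilities: set

    def scan(self, offers: deque):
        """Claim (remove) the first offered request this worker can fulfill."""
        for offer in list(offers):
            if offer["task"] in self.capabilities:
                offers.remove(offer)
                return offer
        return None


@dataclass
class Master:
    requests: deque = field(default_factory=deque)  # first queue: master's record
    offers: deque = field(default_factory=deque)    # second queue: scanned by workers
    notices: list = field(default_factory=list)     # remedial messages for clients

    def submit(self, request: dict) -> None:
        self.requests.append(request)
        self.offers.append(dict(request))  # copy of the request for the workers

    def check_unclaimed(self) -> None:
        """After the grace period, copies still in the second queue cannot be served."""
        for offer in list(self.offers):
            self.offers.remove(offer)
            self.notices.append(
                f"request {offer['id']}: no worker can run {offer['task']!r}")


def simulate(master: Master, workers: list, scan_passes: int = 3) -> dict:
    """Run a few scan passes (standing in for the time window), then check."""
    assignments = {}
    for _ in range(scan_passes):
        for worker in workers:
            claimed = worker.scan(master.offers)
            if claimed is not None:
                assignments[claimed["id"]] = worker.name
    master.check_unclaimed()
    return assignments
```

With one worker able to run `"ocr"` and none able to run `"translate"`, the OCR request is claimed and the translation request produces a client notice.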
  • Patent number: 8781817
    Abstract: Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
    Type: Grant
    Filed: March 4, 2013
    Date of Patent: July 15, 2014
    Assignee: Stratify, Inc.
    Inventors: Joy Thomas, Karthik Ramachandran
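As one concrete instance of "a statistical metric such as a mutual information metric", the sketch below scores adjacent word pairs by pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), so pairs that co-occur far more often than chance rank highest. Choosing PMI over bigrams, the whitespace tokenizer, and the `min_count` cutoff are assumptions; the candidate-list storage optimization from the abstract is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bigram_pmi(docs: List[str], min_count: int = 2) -> Dict[Tuple[str, str], float]:
    """PMI of each adjacent word pair seen at least min_count times."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # pairs never cross documents
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to judge reliably
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


docs = [
    "the report from new york",
    "the office in new york",
    "new york and the report",
    "the meeting and the office",
]
scores = bigram_pmi(docs)
print(max(scores, key=scores.get))  # prints ('new', 'york')
```

Here "new york" scores about 3.06 bits, while pairs built around the frequent word "the" score about 2.32, so the meaningful phrase stands out.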
  • Patent number: 8527436
    Abstract: An automated parser for e-mail messages identifies component parts such as header, body, signature, and disclaimer. The parser uses a hidden Markov model (HMM) in which the lines making up an e-mail are treated as a sequence of observations of a system that evolves according to a Markov chain having states corresponding to the component parts. The HMM is trained using a manually-annotated set of e-mail messages, then applied to parse other e-mail messages. HMM-based parsing can be further refined or expanded using heuristic post-processing techniques that exploit redundancy of some component parts (e.g., signatures, disclaimers) across a corpus of e-mail messages.
    Type: Grant
    Filed: August 30, 2010
    Date of Patent: September 3, 2013
    Assignee: Stratify, Inc.
    Inventors: Vamsi Salaka, Joy Thomas
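The HMM parsing step can be sketched with a small Viterbi decoder that treats each line as one observation. The three states, the four line features produced by `line_feature`, and every probability below are hand-set, hypothetical values; in the patent the model is trained on manually annotated e-mail, and disclaimer handling and post-processing are omitted here.

```python
import math
import re
from typing import List

STATES = ("header", "body", "signature")
# Hypothetical hand-set parameters (no zeros, so log() is always defined).
START = {"header": 0.8, "body": 0.15, "signature": 0.05}
TRANS = {
    "header": {"header": 0.69, "body": 0.3, "signature": 0.01},
    "body": {"header": 0.01, "body": 0.8, "signature": 0.19},
    "signature": {"header": 0.01, "body": 0.04, "signature": 0.95},
}
EMIT = {
    "header": {"field": 0.85, "blank": 0.05, "short": 0.05, "text": 0.05},
    "body": {"field": 0.02, "blank": 0.2, "short": 0.18, "text": 0.6},
    "signature": {"field": 0.03, "blank": 0.05, "short": 0.7, "text": 0.22},
}


def line_feature(line: str) -> str:
    """Reduce a line to a coarse observation symbol."""
    if not line.strip():
        return "blank"
    if re.match(r"^[A-Za-z-]+: ", line):
        return "field"  # e.g. "From: ..." or "Subject: ..."
    if len(line.split()) <= 3:
        return "short"
    return "text"


def viterbi(lines: List[str]) -> List[str]:
    """Most likely state for each line, computed in log space."""
    obs = [line_feature(line) for line in lines]
    if not obs:
        return []
    # best[s] = (log-probability, path) of the best path ending in state s
    best = {s: (math.log(START[s]) + math.log(EMIT[s][obs[0]]), [s]) for s in STATES}
    for o in obs[1:]:
        new_best = {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p][0] + math.log(TRANS[p][s]))
            logp = best[prev][0] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            new_best[s] = (logp, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


email = [
    "From: alice@example.com",
    "To: bob@example.com",
    "Subject: Budget",
    "",
    "Please find the revised budget attached for review.",
    "Let me know if the numbers look right to you.",
    "--",
    "Alice Smith",
]
print(viterbi(email))
```

For this message the decoder labels the first three lines as header, the blank line and two text lines as body, and the last two lines as signature.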
  • Publication number: 20130191111
    Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
    Type: Application
    Filed: July 16, 2012
    Publication date: July 25, 2013
    Applicant: Stratify, Inc.
    Inventor: Sauraj Goswami
  • Publication number: 20130185060
    Abstract: Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
    Type: Application
    Filed: March 4, 2013
    Publication date: July 18, 2013
    Applicant: Stratify, Inc.
    Inventors: Joy Thomas, Karthik Ramachandran
  • Patent number: 8484221
    Abstract: Documents are assigned to one or more indexes in a document indexing system on the basis of document properties such as total number of tokens in the document, number of numeric tokens in the document, number of alphabetic tokens in the document, size of the document, and metadata associated with the document. Based on statistical distributions of document properties (over a large number of documents), different indexes can be defined, and a document router can direct a particular document to one index or another based on the properties of the particular document. In some implementations, certain document properties may be used to identify a nonrelevant document, or garbage document, so that it is either not indexed or assigned to an index dedicated for such documents.
    Type: Grant
    Filed: May 25, 2010
    Date of Patent: July 9, 2013
    Assignee: Stratify, Inc.
    Inventors: Kumar Maddali, Joy Thomas
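A document router of this kind can be sketched as a function from extracted properties to an index name. The index names and the thresholds `large_tokens`, `min_alpha`, and `max_numeric` are hypothetical; the abstract derives index boundaries from statistical distributions over many documents, and metadata is left out of this sketch.

```python
import re
from dataclasses import dataclass

NUMERIC = re.compile(r"[-+]?\d+(?:[.,]\d+)*")


@dataclass
class DocProperties:
    total_tokens: int
    numeric_tokens: int
    alpha_tokens: int
    size_bytes: int


def extract_properties(text: str) -> DocProperties:
    tokens = text.split()
    return DocProperties(
        total_tokens=len(tokens),
        numeric_tokens=sum(bool(NUMERIC.fullmatch(t)) for t in tokens),
        alpha_tokens=sum(t.isalpha() for t in tokens),
        size_bytes=len(text.encode("utf-8")),
    )


def route(text: str, large_tokens: int = 5000,
          min_alpha: float = 0.2, max_numeric: float = 0.5) -> str:
    """Return the name of the (hypothetical) index a document should go to."""
    p = extract_properties(text)
    if p.total_tokens == 0 or p.alpha_tokens / p.total_tokens < min_alpha:
        return "garbage"  # skip indexing, or send to a dedicated index
    if p.numeric_tokens / p.total_tokens > max_numeric:
        return "numeric"  # e.g. spreadsheet exports, tables of figures
    if p.total_tokens > large_tokens:
        return "large"
    return "standard"
```

In practice a threshold such as `large_tokens` could be set from an observed percentile of token counts across the corpus rather than fixed by hand.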
  • Patent number: 8392175
    Abstract: Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
    Type: Grant
    Filed: May 21, 2010
    Date of Patent: March 5, 2013
    Assignee: Stratify, Inc.
    Inventors: Joy Thomas, Karthik Ramachandran
  • Patent number: 8244767
    Abstract: Reliable identification of highly similar documents allows such documents to be treated as identical for purposes of document analysis. Identification of highly similar documents can be based on a composite hash value or other value for which the likelihood of two documents having the same value is high if and only if the documents have a high degree of similarity. Prior to performing content-based analysis, the composite hash value for the current document is determined and compared to composite hash values of previously analyzed documents. If a match is found, the results of the analysis of the previous document can be applied to the current document. If no match is found, the current document is analyzed.
    Type: Grant
    Filed: May 21, 2010
    Date of Patent: August 14, 2012
    Assignee: Stratify, Inc.
    Inventors: Hakan Ancin, Rajashekhar Goli, Ankita Bakshi, Kumar Maddali, Joy Thomas, Karthik Ramachandran
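The check-before-analyze flow can be sketched as a cache keyed by a composite value. The particular composite used here, a SHA-1 hash of the normalized vocabulary combined with a coarse token-count bucket, is only an assumed example of a value that tends to match for highly similar documents. Because it ignores word order and repetition, it can also match documents that differ in more than formatting.

```python
import hashlib
import re
from typing import Any, Callable, Dict


def composite_key(text: str, bucket: int = 10) -> str:
    """Combine a hash of the normalized vocabulary with a coarse length bucket."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    vocab = " ".join(sorted(set(tokens)))
    vocab_hash = hashlib.sha1(vocab.encode("utf-8")).hexdigest()
    return f"{vocab_hash}:{len(tokens) // bucket}"


class AnalysisCache:
    """Reuse analysis results for documents whose composite keys match."""

    def __init__(self, analyze: Callable[[str], Any]):
        self.analyze = analyze
        self.results: Dict[str, Any] = {}

    def get(self, text: str) -> Any:
        key = composite_key(text)
        if key not in self.results:  # no match: analyze the current document
            self.results[key] = self.analyze(text)
        return self.results[key]     # match: reuse the earlier document's result
```

Usage: wrap an expensive analysis function in `AnalysisCache(analyze)` and call `.get(text)` for each incoming document.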
  • Patent number: 8224642
    Abstract: An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
    Type: Grant
    Filed: November 20, 2008
    Date of Patent: July 17, 2012
    Assignee: Stratify, Inc.
    Inventor: Sauraj Goswami
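One way to read the impostor-profile test is as an outlier check. For each language, record the mean and standard deviation of every impostor model's score on documents known to be in that language. Then reject a test document as "no language" if its impostor scores fall too many standard deviations from those expectations. The z-score rule, the `z_max` threshold, and the convention that higher scores are better are all assumptions made for this sketch.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, Tuple[float, float]]  # impostor -> (mean, std) of its scores


def build_impostor_profile(impostor_scores: Dict[str, List[float]]) -> Profile:
    """impostor_scores[imp] holds the scores that impostor imp's model gave to
    training documents known to be in the target language."""
    return {imp: (mean(s), stdev(s)) for imp, s in impostor_scores.items()}


def identify(scores: Dict[str, float], profiles: Dict[str, Profile],
             z_max: float = 3.0) -> Optional[str]:
    """Most likely language, or None ("no language") when the document's
    impostor scores look unlike those of genuine documents in that language.
    Assumes the document was scored by every impostor model in the profile."""
    best = max(scores, key=scores.get)
    for imp, (mu, sigma) in profiles.get(best, {}).items():
        z = (scores[imp] - mu) / max(sigma, 1e-9)  # guard against zero spread
        if abs(z) > z_max:
            return None
    return best


profiles = {
    "english": build_impostor_profile({
        "french": [0.30, 0.34, 0.32, 0.36, 0.28],
        "german": [0.20, 0.22, 0.18, 0.24, 0.16],
    })
}
```

A genuine English document scores low in the impostor models, as training documents do. Gibberish that scores similarly in every model is flagged, even though English is still its top-scoring language.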
  • Patent number: 8224641
    Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
    Type: Grant
    Filed: November 19, 2008
    Date of Patent: July 17, 2012
    Assignee: Stratify, Inc.
    Inventor: Sauraj Goswami
  • Publication number: 20120054135
    Abstract: An automated parser for e-mail messages identifies component parts such as header, body, signature, and disclaimer. The parser uses a hidden Markov model (HMM) in which the lines making up an e-mail are treated as a sequence of observations of a system that evolves according to a Markov chain having states corresponding to the component parts. The HMM is trained using a manually-annotated set of e-mail messages, then applied to parse other e-mail messages. HMM-based parsing can be further refined or expanded using heuristic post-processing techniques that exploit redundancy of some component parts (e.g., signatures, disclaimers) across a corpus of e-mail messages.
    Type: Application
    Filed: August 30, 2010
    Publication date: March 1, 2012
    Applicant: Stratify, Inc.
    Inventors: Vamsi Salaka, Joy Thomas
  • Publication number: 20110191098
    Abstract: Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
    Type: Application
    Filed: May 21, 2010
    Publication date: August 4, 2011
    Applicant: Stratify, Inc.
    Inventors: Joy Thomas, Karthik Ramachandran
  • Publication number: 20110191347
    Abstract: Documents are assigned to one or more indexes in a document indexing system on the basis of document properties such as total number of tokens in the document, number of numeric tokens in the document, number of alphabetic tokens in the document, size of the document, and metadata associated with the document. Based on statistical distributions of document properties (over a large number of documents), different indexes can be defined, and a document router can direct a particular document to one index or another based on the properties of the particular document. In some implementations, certain document properties may be used to identify a nonrelevant document, or garbage document, so that it is either not indexed or assigned to an index dedicated for such documents.
    Type: Application
    Filed: May 25, 2010
    Publication date: August 4, 2011
    Applicant: Stratify, Inc.
    Inventors: Kumar Maddali, Joy Thomas
  • Patent number: 7945600
    Abstract: Techniques for organizing a corpus of electronic documents. The electronic documents are organized in a manner that facilitates review of the documents. The documents are organized into a concept-based hierarchical collection of folders based upon contents of the documents.
    Type: Grant
    Filed: March 4, 2005
    Date of Patent: May 17, 2011
    Assignee: Stratify, Inc.
    Inventors: Joy Aloysius Thomas, Mohana Krishna Lakhamraju, George Manianghat Mathew, Pangal Pandurang Nayak, Gollakota Venkata Ramana, John O. Lamping
  • Publication number: 20110087668
    Abstract: Documents likely to be near-duplicates are clustered based on document vectors that represent word-occurrence patterns in a relatively low-dimensional space. Edit distance between documents is defined based on comparing their document vectors. In one process, initial clusters are formed by applying a first edit-distance constraint relative to a root document of each cluster. The initial clusters can be merged subject to a second edit-distance constraint that limits the maximum edit distance between any two documents in the cluster. The second edit-distance constraint can be defined such that whether it is satisfied can be determined by comparing cluster structures rather than individual documents.
    Type: Application
    Filed: August 27, 2010
    Publication date: April 14, 2011
    Applicant: Stratify, Inc.
    Inventors: Joy Thomas, Sauraj Goswami, Vamsi Salaka
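The two-stage process can be sketched as follows. As simplifying assumptions, the "document vector" is a plain word-count vector (not a reduced low-dimensional one) and "edit distance" is the L1 distance between those vectors. Because L1 is a metric, each cluster can keep just a radius around its root and a diameter bound. The triangle inequality then lets the merge step test the second constraint from cluster structures alone, without comparing individual documents. The function names and the thresholds `t1` and `t2` are hypothetical.

```python
import itertools
from collections import Counter
from typing import List


def doc_vector(text: str) -> Counter:
    return Counter(text.lower().split())


def distance(a: Counter, b: Counter) -> int:
    """L1 distance: word insertions plus deletions, ignoring word order."""
    return sum(abs(a[w] - b[w]) for w in a.keys() | b.keys())


def near_duplicate_clusters(texts: List[str], t1: int = 2, t2: int = 4) -> List[List[int]]:
    vecs = [doc_vector(t) for t in texts]
    clusters = []  # dicts: root index, members, radius and diameter bounds
    # Pass 1: join the first cluster whose root is within t1, else start one.
    for i, v in enumerate(vecs):
        for c in clusters:
            d = distance(v, vecs[c["root"]])
            if d <= t1:
                c["members"].append(i)
                c["radius"] = max(c["radius"], d)
                break
        else:
            clusters.append({"root": i, "members": [i], "radius": 0})
    for c in clusters:
        c["diameter"] = 2 * c["radius"]  # any two members are within 2 * radius
    # Pass 2: merge while the bound on the merged diameter stays within t2;
    # only roots and stored bounds are compared, never member documents.
    merged = True
    while merged:
        merged = False
        for a, b in itertools.combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            d = distance(vecs[ca["root"]], vecs[cb["root"]])
            diameter = max(ca["diameter"], cb["diameter"],
                           ca["radius"] + d + cb["radius"])
            if diameter <= t2:
                ca["members"] += cb["members"]
                ca["radius"] = max(ca["radius"], d + cb["radius"])
                ca["diameter"] = diameter
                del clusters[b]
                merged = True
                break  # restart: cluster indices have shifted
    return [sorted(c["members"]) for c in clusters]
```

The bound is conservative. With `["a b c", "a b d", "a b e"]`, `t1=1`, and `t2=2`, the third document stays separate even though every pair is actually within distance 2.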
  • Publication number: 20110087669
    Abstract: Reliable identification of highly similar documents allows such documents to be treated as identical for purposes of document analysis. Identification of highly similar documents can be based on a composite hash value or other value for which the likelihood of two documents having the same value is high if and only if the documents have a high degree of similarity. Prior to performing content-based analysis, the composite hash value for the current document is determined and compared to composite hash values of previously analyzed documents. If a match is found, the results of the analysis of the previous document can be applied to the current document. If no match is found, the current document is analyzed.
    Type: Application
    Filed: May 21, 2010
    Publication date: April 14, 2011
    Applicant: Stratify, Inc.
    Inventors: Hakan Ancin, Rajashekhar Goli, Ankita Bakshi, Kumar Maddali, Joy Thomas
  • Patent number: 7877388
    Abstract: A method (and system) for clustering a plurality of items, each of which includes information. The method includes inputting the plurality of items into a clustering process. The method also inputs an initial organization structure into the clustering process. The initial organization structure includes one or more categories, at least one of the categories being associated with one of the items. The method processes the plurality of items based upon at least the initial organization structure and the information in each of the items, and determines a resulting organization structure based upon the processing. The resulting organization structure relates to the initial organization structure.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: January 25, 2011
    Assignee: Stratify, Inc.
    Inventors: John O. Lamping, Ramana Venkata, Shashidhar Thakur, Sameer Siruguri
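A minimal reading of "processing items based upon an initial organization structure" is seeded clustering. Each existing category's already-filed items define a centroid, and each remaining item joins the most similar category by cosine similarity. Items below a similarity floor go to a new bucket, so the resulting structure extends the initial one. The bag-of-words vectors, the `min_sim` threshold, the `"unfiled"` bucket, and the choice to keep centroids fixed after seeding are all assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Dict, List


def vec(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def seeded_clustering(items: Dict[int, str], initial: Dict[str, List[int]],
                      min_sim: float = 0.2) -> Dict[str, List[int]]:
    """items: id -> text; initial: category -> ids already filed there."""
    result = {cat: list(ids) for cat, ids in initial.items()}
    centroids = {}
    for cat, ids in initial.items():
        centroid: Counter = Counter()
        for i in ids:
            centroid.update(vec(items[i]))  # summed counts; cosine ignores scale
        centroids[cat] = centroid
    placed = {i for ids in initial.values() for i in ids}
    for item_id, text in items.items():
        if item_id in placed:
            continue
        v = vec(text)
        best = max(centroids, key=lambda cat: cosine(v, centroids[cat]), default=None)
        if best is not None and cosine(v, centroids[best]) >= min_sim:
            result[best].append(item_id)
        else:
            result.setdefault("unfiled", []).append(item_id)
    return result
```

Starting from one finance item and one IT item, related items join those categories and an unrelated item lands in `"unfiled"`.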
  • Patent number: 7822812
    Abstract: Techniques for sharing content information between members of a virtual user group without compromising the privacy of the members. A user can identify content information to be shared with other members of a virtual user group using a user computer system. The content information is then communicated to the other members of the virtual user group and can be accessed by members of the virtual user group in such a manner that the privacy of the user and of the other members of the virtual user group is not compromised. The present invention preserves user privacy by controlling and minimizing the amount of user-related information available/accessible to server systems hosting the virtual user groups.
    Type: Grant
    Filed: January 3, 2007
    Date of Patent: October 26, 2010
    Assignee: Stratify, Inc.
    Inventors: Rakesh Mathur, Ramesh Subramonian, Ramana Venkata, Pangal P. Nayak, Joy A. Thomas