Patents Assigned to Stratify, Inc.
-
Patent number: 8938384
Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
Type: Grant
Filed: July 16, 2012
Date of Patent: January 20, 2015
Assignee: Stratify, Inc.
Inventor: Sauraj Goswami
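The second embodiment in this abstract (segmenting at transitions between character sets) can be illustrated with a minimal sketch. It assumes that the first word of a character's Unicode name (e.g. LATIN, CYRILLIC) is an adequate stand-in for its character set; the function names `script_of` and `segment_by_script` are hypothetical, and the per-segment language scoring is not shown.

```python
import unicodedata

def script_of(ch):
    """Approximate a character's script by the first word of its Unicode name.
    Assumption: this is only a rough proxy for a 'character set'."""
    if not ch.isalpha():
        return None  # spaces, digits, punctuation carry no script
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def segment_by_script(text):
    """Split text wherever the script of alphabetic characters changes."""
    segments = []
    current_script, start = None, 0
    for i, ch in enumerate(text):
        s = script_of(ch)
        if s is None:
            continue  # neutral characters stay with the current segment
        if current_script is None:
            current_script = s
        elif s != current_script:
            segments.append((current_script, text[start:i]))
            current_script, start = s, i
    if current_script is not None:
        segments.append((current_script, text[start:]))
    return segments
```

Each resulting segment could then be scored only against the candidate languages written in that script.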
-
Patent number: 8862670
Abstract: A pool of messages, e.g., e-mails and/or other electronic documents that each correspond to a communication from a sender to a recipient, is analyzed to identify communication chains between a source and a target. Sender and recipient identifiers extracted from the messages are used to detect direct and indirect communication links between pairs of entities. Information related to the identified communication chains can be presented to a user via an interactive network graph that supports iterative analysis of the communication-chain data.
Type: Grant
Filed: January 26, 2007
Date of Patent: October 14, 2014
Assignee: Stratify, Inc.
Inventors: Hakan Ancin, David Bayer, Kumar Maddalli, Joy Thomas
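One way to picture the detection of direct and indirect links is a breadth-first search over a directed graph built from (sender, recipient) pairs. This is a sketch under that assumption, not the patented method; `find_chain` and the tuple format for messages are hypothetical.

```python
from collections import defaultdict, deque

def find_chain(messages, source, target):
    """Return the shortest sender-to-recipient chain from source to target,
    or None if no chain exists. messages is an iterable of (sender, recipient)."""
    links = defaultdict(set)
    for sender, recipient in messages:
        links[sender].add(recipient)  # direct link: sender -> recipient

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(links[path[-1]]):  # sorted for deterministic output
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

A chain of length two is a direct link; longer chains are indirect links through intermediaries.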
-
Patent number: 8788601
Abstract: Improved techniques of fulfilling a request to perform a task involve a master computer placing the request in a first queue and a copy of the request in a second queue, the second queue being frequently accessed by a set of worker computers which rapidly scans the second queue for requests to fulfill. If, during the scanning, a worker computer determines that it has a capability to fulfill the request, the worker computer removes the copy of the request from the second queue. Furthermore, if the copy of the request remains in the second queue after a brief time period, it is clear that the set of worker computers is unable to perform the task. In this case, the master computer takes a remedial action such as notifying a client computer which sent the request that the worker computers, as currently configured, are unable to perform the task.
Type: Grant
Filed: May 26, 2011
Date of Patent: July 22, 2014
Assignee: Stratify, Inc.
Inventors: Anand Rajasekar, Pankaj Nayal
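The two-queue idea can be sketched in a single process. In this simplified illustration the "brief time period" is modeled as one scan pass by the workers rather than a real timer, and the request format (`id`, `task`) and all function names are assumptions made for the example.

```python
def submit(pending, offers, request):
    """Master: keep the request in the first queue, put a copy in the second."""
    pending.append(request)
    offers.append(dict(request))  # independent copy scanned by workers

def worker_scan(offers, capabilities):
    """Worker: remove and return every offered request it is able to fulfill."""
    claimed = [r for r in offers if r["task"] in capabilities]
    for r in claimed:
        offers.remove(r)
    return claimed

def unfulfillable_ids(offers):
    """Master: copies still in the second queue were claimed by no worker."""
    return [r["id"] for r in offers]
```

After the workers have scanned, the master could notify the clients whose request ids are returned by `unfulfillable_ids`.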
-
Patent number: 8781817
Abstract: Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
Type: Grant
Filed: March 4, 2013
Date of Patent: July 15, 2014
Assignee: Stratify, Inc.
Inventors: Joy Thomas, Karthik Ramachandran
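As an illustration of a mutual-information style metric, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). PMI is one assumed choice of metric, the whitespace tokenizer is deliberately naive, and `bigram_pmi` is a hypothetical name; the candidate-list storage optimization mentioned in the abstract is not modeled.

```python
import math
from collections import Counter

def bigram_pmi(docs):
    """Score each adjacent word pair by log2(P(a,b) / (P(a) * P(b)))."""
    words, pairs = Counter(), Counter()
    for doc in docs:
        tokens = doc.lower().split()
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))

    n_words = sum(words.values())
    n_pairs = sum(pairs.values())
    scores = {}
    for (a, b), count in pairs.items():
        p_ab = count / n_pairs
        p_a = words[a] / n_words
        p_b = words[b] / n_words
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores
```

Pairs that co-occur more often than their individual frequencies predict receive higher scores and are better phrase candidates.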
-
Patent number: 8527436
Abstract: An automated parser for e-mail messages identifies component parts such as header, body, signature, and disclaimer. The parser uses a hidden Markov model (HMM) in which the lines making up an e-mail are treated as a sequence of observations of a system that evolves according to a Markov chain having states corresponding to the component parts. The HMM is trained using a manually-annotated set of e-mail messages, then applied to parse other e-mail messages. HMM-based parsing can be further refined or expanded using heuristic post-processing techniques that exploit redundancy of some component parts (e.g., signatures, disclaimers) across a corpus of e-mail messages.
Type: Grant
Filed: August 30, 2010
Date of Patent: September 3, 2013
Assignee: Stratify, Inc.
Inventors: Vamsi Salaka, Joy Thomas
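To make the HMM framing concrete, here is a small Viterbi decoder that labels each line of a message with a component state. Every probability below is hand-set for illustration (the abstract describes training on annotated e-mails instead), only three states are modeled, and the line features in `observe` are hypothetical.

```python
import math

STATES = ["header", "body", "signature"]
# Assumed, hand-set parameters; a real model would be learned from annotated mail.
START = {"header": 0.8, "body": 0.15, "signature": 0.05}
TRANS = {
    "header": {"header": 0.7, "body": 0.29, "signature": 0.01},
    "body": {"header": 0.01, "body": 0.84, "signature": 0.15},
    "signature": {"header": 0.01, "body": 0.04, "signature": 0.95},
}
EMIT = {
    "header": {"field": 0.9, "text": 0.05, "dashes": 0.05},
    "body": {"field": 0.05, "text": 0.94, "dashes": 0.01},
    "signature": {"field": 0.1, "text": 0.5, "dashes": 0.4},
}

def observe(line):
    """Map a raw line to a coarse observation symbol."""
    if line.strip() in ("--", "---"):
        return "dashes"
    if ":" in line and line.split(":", 1)[0].isalpha():
        return "field"  # looks like "Name: value"
    return "text"

def viterbi(lines):
    """Return the most likely state for each line under the model above."""
    if not lines:
        return []
    obs = [observe(line) for line in lines]
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            pointers[s] = best
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # follow back-pointers to the first line
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Because decoding considers the whole sequence, a separator line such as "--" can pull the lines after it into the signature state even when they look like ordinary text.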
-
Publication number: 20130191111
Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
Type: Application
Filed: July 16, 2012
Publication date: July 25, 2013
Applicant: Stratify, Inc.
Inventor: Sauraj GOSWAMI
-
Publication number: 20130185060
Abstract: Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
Type: Application
Filed: March 4, 2013
Publication date: July 18, 2013
Applicant: STRATIFY, INC.
Inventor: STRATIFY, INC.
-
Patent number: 8484221
Abstract: Documents are assigned to one or more indexes in a document indexing system on the basis of document properties such as total number of tokens in the document, number of numeric tokens in the document, number of alphabetic tokens in the document, size of the document, and metadata associated with the document. Based on statistical distributions of document properties (over a large number of documents), different indexes can be defined, and a document router can direct a particular document to one index or another based on the properties of the particular document. In some implementations, certain document properties may be used to identify a nonrelevant document, or garbage document, so that it is either not indexed or assigned to an index dedicated for such documents.
Type: Grant
Filed: May 25, 2010
Date of Patent: July 9, 2013
Assignee: Stratify, Inc.
Inventors: Kumar Maddali, Joy Thomas
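A document router of this kind can be sketched as a function from token statistics to an index name. The index names, the thresholds, and the rule that a mostly numeric document counts as garbage are all assumptions chosen for illustration; the abstract derives such boundaries from statistical distributions over many documents.

```python
def document_properties(text):
    """Compute a few of the token-level properties named in the abstract."""
    tokens = text.split()
    return {
        "total": len(tokens),
        "numeric": sum(t.isdigit() for t in tokens),
        "alphabetic": sum(t.isalpha() for t in tokens),
    }

def route(text, numeric_ratio_limit=0.8, large_doc_tokens=1000):
    """Pick an index name for a document (thresholds are hypothetical)."""
    props = document_properties(text)
    if props["total"] == 0:
        return "garbage"
    if props["numeric"] / props["total"] > numeric_ratio_limit:
        return "garbage"  # assumed rule: mostly numbers is treated as nonrelevant
    return "large" if props["total"] >= large_doc_tokens else "standard"
```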
-
Patent number: 8392175
Abstract: Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
Type: Grant
Filed: May 21, 2010
Date of Patent: March 5, 2013
Assignee: Stratify, Inc.
Inventors: Joy Thomas, Karthik Ramachandran
-
Patent number: 8244767
Abstract: Reliable identification of highly similar documents allows such documents to be treated as identical for purposes of document analysis. Identification of highly similar documents can be based on a composite hash value or other value for which the likelihood of two documents having the same value is high if and only if the documents have a high degree of similarity. Prior to performing content based analysis, the composite hash value for the current document is determined and compared to composite hash values of previously analyzed documents. If a match is found, the results of the analysis of the previous document can be applied to the current document. If no match is found, the current document is analyzed.
Type: Grant
Filed: May 21, 2010
Date of Patent: August 14, 2012
Assignee: Stratify, Inc.
Inventors: Hakan Ancin, Rajashekhar Goli, Ankita Bakshi, Kumar Maddali, Joy Thomas, Karthik Ramachandran
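The check-before-analyze flow can be sketched with a cache keyed by a similarity-tolerant hash. Here the "composite hash" is approximated by hashing case-folded, punctuation-stripped text, which is a much cruder notion of similarity than the abstract implies; `composite_hash` and `AnalysisCache` are hypothetical names.

```python
import hashlib
import re

def composite_hash(text):
    """Assumed stand-in: hash of the text after lowercasing and dropping
    punctuation and extra whitespace, so trivial variants collide."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()

class AnalysisCache:
    """Run the expensive analysis only for documents with an unseen hash."""

    def __init__(self, analyze):
        self.analyze = analyze
        self.results = {}
        self.calls = 0  # number of times the real analysis ran

    def get(self, text):
        key = composite_hash(text)
        if key not in self.results:
            self.calls += 1
            self.results[key] = self.analyze(text)
        return self.results[key]  # reused for highly similar documents
```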
-
Patent number: 8224642
Abstract: An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
Type: Grant
Filed: November 20, 2008
Date of Patent: July 17, 2012
Assignee: Stratify, Inc.
Inventor: Sauraj Goswami
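One possible reading of the impostor-profile test is a consistency check: a document truly in the most likely language should score against each impostor language roughly as the profile predicts. The sketch below assumes the profile stores a mean and standard deviation per impostor language and uses a z-score cutoff; that representation, the cutoff, and the name `accept_language` are assumptions, not details from the abstract.

```python
def accept_language(impostor_scores, impostor_profile, max_z=3.0):
    """Return True if every impostor-language score lies within max_z standard
    deviations of the profile's expectation, else False ("no language").

    impostor_scores:  {impostor_language: score of the test document}
    impostor_profile: {impostor_language: (expected_mean, expected_std)}
    """
    for lang, score in impostor_scores.items():
        mean, std = impostor_profile[lang]
        if abs(score - mean) > max_z * std:
            return False
    return True
```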
-
Patent number: 8224641
Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
Type: Grant
Filed: November 19, 2008
Date of Patent: July 17, 2012
Assignee: Stratify, Inc.
Inventor: Sauraj Goswami
-
Publication number: 20120054135
Abstract: An automated parser for e-mail messages identifies component parts such as header, body, signature, and disclaimer. The parser uses a hidden Markov model (HMM) in which the lines making up an e-mail are treated as a sequence of observations of a system that evolves according to a Markov chain having states corresponding to the component parts. The HMM is trained using a manually-annotated set of e-mail messages, then applied to parse other e-mail messages. HMM-based parsing can be further refined or expanded using heuristic post-processing techniques that exploit redundancy of some component parts (e.g., signatures, disclaimers) across a corpus of e-mail messages.
Type: Application
Filed: August 30, 2010
Publication date: March 1, 2012
Applicant: Stratify, Inc.
Inventors: Vamsi Salaka, Joy Thomas
-
Publication number: 20110191098
Abstract: Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
Type: Application
Filed: May 21, 2010
Publication date: August 4, 2011
Applicant: Stratify, Inc.
Inventors: Joy Thomas, Karthik Ramachandran
-
Publication number: 20110191347
Abstract: Documents are assigned to one or more indexes in a document indexing system on the basis of document properties such as total number of tokens in the document, number of numeric tokens in the document, number of alphabetic tokens in the document, size of the document, and metadata associated with the document. Based on statistical distributions of document properties (over a large number of documents), different indexes can be defined, and a document router can direct a particular document to one index or another based on the properties of the particular document. In some implementations, certain document properties may be used to identify a nonrelevant document, or garbage document, so that it is either not indexed or assigned to an index dedicated for such documents.
Type: Application
Filed: May 25, 2010
Publication date: August 4, 2011
Applicant: Stratify, Inc.
Inventors: Kumar Maddali, Joy Thomas
-
Patent number: 7945600
Abstract: Techniques for organizing a corpus of electronic documents. The electronic documents are organized in a manner that facilitates review of the documents. The documents are organized into a concept-based hierarchical collection of folders based upon contents of the documents.
Type: Grant
Filed: March 4, 2005
Date of Patent: May 17, 2011
Assignee: Stratify, Inc.
Inventors: Joy Aloysius Thomas, Mohana Krishna Lakhamraju, George Manianghat Mathew, Pangal Pandurang Nayak, Gollakota Venkata Ramana, John O. Lamping
-
Publication number: 20110087668
Abstract: Documents likely to be near-duplicates are clustered based on document vectors that represent word-occurrence patterns in a relatively low-dimensional space. Edit distance between documents is defined based on comparing their document vectors. In one process, initial clusters are formed by applying a first edit-distance constraint relative to a root document of each cluster. The initial clusters can be merged subject to a second edit-distance constraint that limits the maximum edit distance between any two documents in the cluster. The second edit-distance constraint can be defined such that whether it is satisfied can be determined by comparing cluster structures rather than individual documents.
Type: Application
Filed: August 27, 2010
Publication date: April 14, 2011
Applicant: Stratify, Inc.
Inventors: Joy Thomas, Sauraj Goswami, Vamsi Salaka
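The first stage (initial clusters constrained by distance to a root document) might look like the sketch below. It assumes word counts hashed into a small fixed number of buckets as the low-dimensional document vector and an L1 difference as the vector-based edit distance; both choices, the threshold, and the function names are illustrative. The second-stage merge under a pairwise constraint is not shown.

```python
import zlib

def doc_vector(text, dims=16):
    """Hash each word into one of dims buckets and count occurrences."""
    vec = [0] * dims
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dims] += 1
    return vec

def distance(u, v):
    """Assumed proxy for edit distance: L1 difference of the two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def initial_clusters(docs, max_distance=2):
    """Greedily attach each document to the first cluster whose root is within
    max_distance; otherwise the document becomes the root of a new cluster."""
    clusters = []  # list of (root_vector, [document indices])
    for i, text in enumerate(docs):
        vec = doc_vector(text)
        for root, members in clusters:
            if distance(root, vec) <= max_distance:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]
```

Replacing one word changes at most two buckets by one each, so a single-word substitution keeps a document within distance 2 of its root under this proxy.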
-
Publication number: 20110087669
Abstract: Reliable identification of highly similar documents allows such documents to be treated as identical for purposes of document analysis. Identification of highly similar documents can be based on a composite hash value or other value for which the likelihood of two documents having the same value is high if and only if the documents have a high degree of similarity. Prior to performing content based analysis, the composite hash value for the current document is determined and compared to composite hash values of previously analyzed documents. If a match is found, the results of the analysis of the previous document can be applied to the current document. If no match is found, the current document is analyzed.
Type: Application
Filed: May 21, 2010
Publication date: April 14, 2011
Applicant: Stratify, Inc.
Inventors: Hakan Ancin, Rajashekhar Goli, Ankita Bakshi, Kumar Maddali, Joy Thomas
-
Patent number: 7877388
Abstract: A method (and system) for clustering a plurality of items. Each of the items includes information. The method includes inputting a plurality of items. Each of the items includes information. The items are provided into a clustering process. The method also inputs an initial organization structure into the clustering process. The initial organization structure includes one or more categories, at least one of the categories being associated with one of the items. The method processes the plurality of items based upon at least the initial organization structure and the information in each of the items; and determines a resulting organization structure based upon the processing. The resulting organization structure relates to the initial organization structure.
Type: Grant
Filed: October 31, 2007
Date of Patent: January 25, 2011
Assignee: Stratify, Inc.
Inventors: John O. Lamping, Ramana Venkata, Shashidhar Thakur, Samdeer Siruguri
-
Patent number: 7822812
Abstract: Techniques for sharing content information between members of a virtual user group without compromising the privacy of the members. A user can identify content information to be shared with other members of a virtual user group using a user computer system. The content information is then communicated to the other members of the virtual user group and can be accessed by members of the virtual user group in such a manner that the privacy of the user and of the other members of the virtual user group is not compromised. The present invention preserves user privacy by controlling and minimizing the amount of user-related information available/accessible to server systems hosting the virtual user groups.
Type: Grant
Filed: January 3, 2007
Date of Patent: October 26, 2010
Assignee: Stratify, Inc.
Inventors: Rakesh Mathur, Ramesh Subramonian, Ramana Venkata, Pangal P. Nayak, Joy A. Thomas