Abstract: An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
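The accept-or-reject step described in this abstract can be illustrated with a minimal sketch. It assumes the impostor profile stores a mean and standard deviation of the score each impostor language model gives to documents truly in the candidate language, and that a simple z-score threshold is used for the decision. The abstract does not specify the statistic or the threshold; the names `accept_language`, `profile`, and `max_z` are hypothetical.

```python
def accept_language(impostor_scores, profile, max_z=3.0):
    """Decide whether a document plausibly is in the most likely language.

    impostor_scores: {impostor_lang: score of the test document under that model}
    profile: {impostor_lang: (mean, std)} gathered from documents known to be
             in the candidate language (assumed representation, not from the patent)
    max_z: hypothetical cutoff on how far a score may deviate from the profile
    """
    for lang, (mean, std) in profile.items():
        z = abs(impostor_scores[lang] - mean) / std
        if z > max_z:
            # The impostor score is unlike that of genuine documents: report "no language".
            return False
    return True


# Illustrative numbers only: a made-up profile for English with two impostor models.
english_profile = {"fr": (-5.0, 0.5), "de": (-6.0, 0.5)}
looks_english = accept_language({"fr": -5.2, "de": -6.3}, english_profile)
looks_like_noise = accept_language({"fr": -9.0, "de": -6.0}, english_profile)
```

Here the first document's impostor scores sit within one standard deviation of the profile, so it is accepted; the second deviates strongly on the French model and would be labeled as being in no language.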
Abstract: Multiple non-overlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under two competing hypotheses: that the whole document is in one language, and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
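The second embodiment, segmenting at character-set transitions, might look roughly like the sketch below. It approximates a "character set" by the first word of each letter's Unicode name (for example LATIN or CYRILLIC), which is an assumption made for illustration and not the patent's definition. Digits, spaces, and punctuation are treated as neutral and stay with the current segment. Scoring each segment against candidate languages is left out.

```python
import unicodedata


def script_of(ch):
    """Return a rough script label for a letter, or None for neutral characters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def segment_by_script(text):
    """Split text into (script, substring) pieces at script transitions."""
    segments = []
    current_script, buf = None, []
    for ch in text:
        script = script_of(ch)
        if script is not None and current_script is not None and script != current_script:
            # A letter from a different script starts a new segment.
            segments.append((current_script, "".join(buf)))
            buf = []
        if script is not None:
            current_script = script
        buf.append(ch)
    if buf:
        segments.append((current_script, "".join(buf)))
    return segments


pieces = segment_by_script("Hello мир")
```

Each resulting piece could then be passed only to the language models whose character set matches its label, which is one way to read "a subset of candidate languages".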
Abstract: A pool of messages, e.g., e-mails and/or other electronic documents that each correspond to a communication from a sender to a recipient, is analyzed to identify communication chains between a source and a target. Sender and recipient identifiers extracted from the messages are used to detect communication links between pairs of entities. Indirect chains of any desired length can be found by iteratively tracing a communication path one step forward from the source, then one step backward from the target, and so on; at each new step, entities at end points of the forward paths and backward paths are compared to detect any entities that complete a communication chain from source to target. Information related to the identified communication chains can be presented to a user via an interactive report that supports iterative analysis of the communication-chain data.
Type:
Grant
Filed:
January 26, 2007
Date of Patent:
February 16, 2010
Assignee:
Stratify, Inc.
Inventors:
Hakan Ancin, David Bayer, Kumar Maddalli, Joy Thomas
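The alternating forward and backward tracing described in the communication-chain abstract above can be sketched as a bidirectional search over sender-to-recipient links. This is a minimal illustration, not the patented method: messages are assumed to be already reduced to `(sender, recipient)` pairs, and the function name `find_chain` and the `max_steps` limit are hypothetical.

```python
from collections import defaultdict


def find_chain(messages, source, target, max_steps=6):
    """Return one chain [source, ..., target] of communication links, or None."""
    fwd_links = defaultdict(set)  # sender -> recipients
    bwd_links = defaultdict(set)  # recipient -> senders
    for sender, recipient in messages:
        fwd_links[sender].add(recipient)
        bwd_links[recipient].add(sender)

    if source == target:
        return [source]

    # Parent maps record how each entity was reached, so a chain can be rebuilt.
    fwd_parent, bwd_parent = {source: None}, {target: None}
    fwd_frontier, bwd_frontier = {source}, {target}

    def build(meet):
        chain, node = [], meet
        while node is not None:          # walk back to the source
            chain.append(node)
            node = fwd_parent[node]
        chain.reverse()
        node = bwd_parent[meet]
        while node is not None:          # walk on to the target
            chain.append(node)
            node = bwd_parent[node]
        return chain

    for step in range(max_steps):
        forward = step % 2 == 0          # even steps go forward, odd steps backward
        links = fwd_links if forward else bwd_links
        parent = fwd_parent if forward else bwd_parent
        other = bwd_parent if forward else fwd_parent
        frontier = fwd_frontier if forward else bwd_frontier

        next_frontier = set()
        for node in frontier:
            for nxt in links[node]:
                if nxt not in parent:
                    parent[nxt] = node
                    next_frontier.add(nxt)

        # Compare the new end points with everything reached from the other side.
        meets = next_frontier & other.keys()
        if meets:
            return build(sorted(meets)[0])
        if forward:
            fwd_frontier = next_frontier
        else:
            bwd_frontier = next_frontier
    return None


links = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
chain = find_chain(links, "a", "d")
```

With the sample links, the forward and backward searches meet at `c`, and the reconstructed chain runs from `a` to `d`. A real system would also keep the underlying messages for each link so that a report can show them.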
Abstract: A method (and system) for associating an item with fewer than ten categories from a collection of more than one hundred categories. Each category includes a model, which determines a process. The process determines a degree of closeness to a set of features of items. The method includes receiving an item (e.g., a document) and determining a set of features (e.g., word, number, author) associated with the item. The method also uses the set of features with two or more processes in respective categories out of the collection of more than one hundred categories to determine how well the set of features matches the two or more categories. A step of forming a blended model from two or more categories to derive a blended process is included. The blended process yields a closer fit to the set of features of the item than that of any single process of a category from the collection of categories. The method associates the item with each of the two or more categories that were used to form the blended process.
Abstract: A method (and system) for clustering a plurality of items, each of which includes information. The method includes inputting the plurality of items into a clustering process. The method also inputs an initial organization structure into the clustering process. The initial organization structure includes one or more categories, at least one of the categories being associated with one of the items. The method processes the plurality of items based upon at least the initial organization structure and the information in each of the items, and determines a resulting organization structure based upon the processing. The resulting organization structure relates to the initial organization structure.
Type:
Grant
Filed:
December 14, 2001
Date of Patent:
December 11, 2007
Assignee:
Stratify, Inc.
Inventors:
John O. Lamping, Ramana Venkata, Shashidhar Thakur, Sameer Siruguri
Abstract: Techniques for sharing content information between members of a virtual user group without compromising the privacy of the members. A user can identify content information to be shared with other members of a virtual user group using a user computer system. The content information is then communicated to the other members of the virtual user group and can be accessed by members of the virtual user group in such a manner that the privacy of the user and of the other members of the virtual user group is not compromised. The present invention preserves user privacy by controlling and minimizing the amount of user-related information available/accessible to server systems hosting the virtual user groups.
Type:
Application
Filed:
January 3, 2007
Publication date:
July 12, 2007
Applicant:
Stratify, Inc.
Inventors:
Rakesh Mathur, Ramesh Subramonian, Ramana Venkata, Pangal Nayak, Joy Thomas
Abstract: Techniques for sharing content information between members of a virtual user group without compromising the privacy of the members. A user can identify content information to be shared with other members of a virtual user group using a user computer system. The content information is then communicated to the other members of the virtual user group and can be accessed by members of the virtual user group in such a manner that the privacy of the user and of the other members of the virtual user group is not compromised. The present invention preserves user privacy by controlling and minimizing the amount of user-related information available/accessible to server systems hosting the virtual user groups.
Type:
Grant
Filed:
May 17, 2001
Date of Patent:
February 13, 2007
Assignee:
Stratify, Inc.
Inventors:
Rakesh Mathur, Ramesh Subramonian, Ramana Venkata, Pangal P. Nayak, Joy A. Thomas