Patents Examined by Jeese Pullias

Phrase based document clustering with automatic phrase extraction

Patent number: 8781817

Abstract: Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.

Type: Grant

Filed: March 4, 2013

Date of Patent: July 15, 2014

Assignee: Stratify, Inc.

Inventors: Joy Thomas, Karthik Ramachandran
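
The abstract of patent 8781817 describes using a statistical metric, such as a mutual information metric, to separate meaningful phrases from chance word sequences. As a rough illustration only, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one common metric of that kind. It is not the patented method: the function name `score_bigrams`, the `min_count` threshold, the whitespace tokenization, and the sample documents are all assumptions made for this example, and the multiple candidate-phrase lists mentioned in the abstract are not modeled.

```python
import math
from collections import Counter


def score_bigrams(documents, min_count=2):
    """Return {(w1, w2): pmi} for adjacent word pairs seen at least min_count times.

    Hypothetical helper for illustration; PMI = log2(P(w1, w2) / (P(w1) * P(w2))).
    """
    unigrams = Counter()
    bigrams = Counter()
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs within one document

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable estimates, so skip them
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Made-up sample documents, used only to exercise the function.
docs = [
    "new york is a large city",
    "she moved to new york last year",
    "a large dog ran to a new park",
    "the city of new york is large",
]
scores = score_bigrams(docs)

# Print candidate phrases from highest to lowest PMI.
for pair, pmi in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(pmi, 2))
```

A positive score means the pair co-occurs more often than its individual word frequencies would predict, so high-scoring pairs are candidate phrases. In a pipeline like the one the abstract outlines, such phrases could then be combined with single words as features for document clustering.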