Patents by Inventor Harr Chen

Harr Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods and systems for automatically summarizing semantic properties from documents with freeform textual annotations

Publication number: 20100153318

Abstract: Some embodiments are directed to identifying semantic properties of documents using free-text annotations associated with the documents. Semantic properties of documents may be identified by using a model that is trained on a corpus of training documents where one or more of the training documents may include free-text annotations. In some embodiments, the model may identify semantic topics expressed only in free-text annotations or only in the body of a document. The model may be applied to identify semantic topics associated with a work document or to summarize the semantic topics present in a plurality of work documents.

Type: Application

Filed: November 19, 2009

Publication date: June 17, 2010

Applicant: Massachusetts Institute of Technology

Inventors: Satchuthananthavale Rasiah Kuhan Branavan, Harr Chen, Jacob Richard Eisenstein, Regina Barzilay

Building and using subwebs for focused search

Patent number: 7392278

Abstract: A system that facilitates performance of a focused search over a collection of sites comprises a subweb that corresponds to a topic and/or user characteristic(s) that are of interest to the user. The subweb includes a plurality of domains and/or paths (e.g. sites) that are related to the topic and/or the user characteristic(s). Each of the sites within the subweb is assigned a weight that indicates relevance of the site to the desirable topic and/or user characteristic(s). A search engine employs the subweb to facilitate focusing a search over a collection of sites. The search engine receives a query, and utilizes the subweb to focus a search over the selection of sites corresponding to the topic and/or user characteristic(s) represented by the subweb. The results from the search are returned to the user based at least in part upon the relevance weights assigned to the sites within the subweb.

Type: Grant

Filed: February 13, 2004

Date of Patent: June 24, 2008

Assignee: Microsoft Corporation

Inventors: Harr Chen, Raman Chandrasekar, Simon H. Corston, Eric D. Brill
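
The abstract above describes a subweb as a set of sites with relevance weights that a search engine uses to focus and order results. The following is only an illustrative sketch of that idea, not the patented method: the names `SUBWEB`, `site_weight`, and `focused_search`, the example domains, and the choice to multiply a base score by the site weight are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical subweb: site prefix (domain or domain/path) -> relevance weight.
SUBWEB = {
    "example-birds.org": 0.9,
    "example-news.com/nature": 0.6,
}

def site_weight(url, subweb):
    """Return the weight of the longest subweb prefix matching the URL, or 0.0."""
    parsed = urlparse(url)
    target = parsed.netloc + parsed.path
    best_len, best_weight = -1, 0.0
    for prefix, weight in subweb.items():
        base = prefix.rstrip("/")
        matches = target == base or target.startswith(base + "/")
        if matches and len(prefix) > best_len:
            best_len, best_weight = len(prefix), weight
    return best_weight

def focused_search(results, subweb):
    """Keep results inside the subweb; rank by base score times site weight (an assumed rule)."""
    scored = []
    for url, base_score in results:
        weight = site_weight(url, subweb)
        if weight > 0.0:
            scored.append((url, base_score * weight))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical raw results from an underlying search engine: (URL, base score).
raw_results = [
    ("https://example-birds.org/owls", 0.5),
    ("https://example-news.com/nature/owls", 0.9),
    ("https://example-news.com/sports", 1.0),
]
ranked = focused_search(raw_results, SUBWEB)
```

In this sketch the sports page is dropped because it lies outside the subweb, even though its base score is the highest.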

Machine-learned approach to determining document relevance for search over large electronic collections of documents

Patent number: 7287012

Abstract: The present invention relates to a system and methodology that applies automated learning procedures for determining document relevance and assisting information retrieval activities. A system is provided that facilitates a machine-learned approach to determine document relevance. The system includes a storage component that receives a set of human selected items to be employed as positive test cases of highly relevant documents. A training component trains at least one classifier with the human selected items as positive test cases and one or more other items as negative test cases in order to provide a query-independent model, wherein the other items can be selected by a statistical search, for example. Also, the trained classifier can be employed to aid an individual in identifying and selecting new positive cases or utilized to filter or re-rank results from a statistical-based search.

Type: Grant

Filed: January 9, 2004

Date of Patent: October 23, 2007

Assignee: Microsoft Corporation

Inventors: Simon H. Corston, Raman Chandrasekar, Harr Chen
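
The abstract above describes training a classifier on human-selected positive cases and other items as negative cases, then using it to re-rank search results. A minimal sketch of that workflow follows. The abstract does not name a classifier, so a small naive Bayes model with add-one smoothing is substituted here as an assumption; the function names and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(positives, negatives):
    """Train a two-class multinomial naive Bayes model (an assumed classifier choice)."""
    counts = {True: Counter(), False: Counter()}
    for label, docs in ((True, positives), (False, negatives)):
        for doc in docs:
            counts[label].update(tokenize(doc))
    vocab = set(counts[True]) | set(counts[False])
    total_docs = len(positives) + len(negatives)
    priors = {True: len(positives) / total_docs, False: len(negatives) / total_docs}
    return counts, vocab, priors

def relevance(model, text):
    """Log-odds that the text is relevant; no query is involved in the score."""
    counts, vocab, priors = model
    score = {}
    for label in (True, False):
        total = sum(counts[label].values())
        logp = math.log(priors[label])
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                logp += math.log((counts[label][tok] + 1) / (total + len(vocab)))
        score[label] = logp
    return score[True] - score[False]

def rerank(model, results):
    """Order search results from most to least relevant under the model."""
    return sorted(results, key=lambda doc: relevance(model, doc), reverse=True)

# Hypothetical data: human-selected positives, negatives drawn from another search.
positives = ["install printer driver guide", "printer setup troubleshooting guide"]
negatives = ["printer sale discount", "cheap ink discount offer"]
model = train(positives, negatives)
reranked = rerank(model, ["cheap printer discount", "printer troubleshooting guide"])
```

Because the score depends only on the document text, the same trained model can be applied to the results of any query, which is one way to read "query-independent" in the abstract.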

System and method for using anchor text as training data for classifier-based search systems

Publication number: 20060143254

Abstract: A computer implemented information retrieval system is provided. The system includes a user input configured to receive a user query relative to the corpus. A machine learning classifier is trained with a first set of training data comprising anchor text relative to at least some of the documents in the corpus. A processing unit is adapted to interact with the classifier to obtain search results relative to the query using the machine learning classifier. In some aspects, the classifier is also trained with a second set of training data. A method of integrating a new document into a corpus of documents is also provided. A method of training a machine learning classifier for retrieving documents from a corpus using two distinct types of training data is also provided.

Type: Application

Filed: December 24, 2004

Publication date: June 29, 2006

Applicant: Microsoft Corporation

Inventors: Harr Chen, Adwait Ratnaparkhi, Sonja Knoll, Hsiao-Wuen Hon

Building and using subwebs for focused search

Publication number: 20050165753

Abstract: A system that facilitates performance of a focused search over a collection of sites comprises a subweb that corresponds to a topic and/or user characteristic(s) that are of interest to the user. The subweb includes a plurality of domains and/or paths (e.g. sites) that are related to the topic and/or the user characteristic(s). Each of the sites within the subweb is assigned a weight that indicates relevance of the site to the desirable topic and/or user characteristic(s). A search engine employs the subweb to facilitate focusing a search over a collection of sites. The search engine receives a query, and utilizes the subweb to focus a search over the selection of sites corresponding to the topic and/or user characteristic(s) represented by the subweb. The results from the search are returned to the user based at least in part upon the relevance weights assigned to the sites within the subweb.

Type: Application

Filed: February 13, 2004

Publication date: July 28, 2005

Inventors: Harr Chen, Raman Chandrasekar, Simon Corston, Eric Brill

Machine-learned approach to determining document relevance for search over large electronic collections of documents

Publication number: 20050154686

Abstract: The present invention relates to a system and methodology that applies automated learning procedures for determining document relevance and assisting information retrieval activities. A system is provided that facilitates a machine-learned approach to determine document relevance. The system includes a storage component that receives a set of human selected items to be employed as positive test cases of highly relevant documents. A training component trains at least one classifier with the human selected items as positive test cases and one or more other items as negative test cases in order to provide a query-independent model, wherein the other items can be selected by a statistical search, for example. Also, the trained classifier can be employed to aid an individual in identifying and selecting new positive cases or utilized to filter or re-rank results from a statistical-based search.

Type: Application

Filed: January 9, 2004

Publication date: July 14, 2005

Inventors: Simon Corston, Raman Chandrasekar, Harr Chen