Patents by Inventor Harr Chen

Harr Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20100153318
    Abstract: Some embodiments are directed to identifying semantic properties of documents using free-text annotations associated with the documents. Semantic properties of documents may be identified by using a model that is trained on a corpus of training documents where one or more of the training documents may include free-text annotations. In some embodiments, the model may identify semantic topics expressed only in free-text annotations or only in the body of a document. The model may be applied to identify semantic topics associated with a work document or to summarize the semantic topics present in a plurality of work documents.
    Type: Application
    Filed: November 19, 2009
    Publication date: June 17, 2010
    Applicant: Massachusetts Institute of Technology
    Inventors: Satchuthananthavale Rasiah Kuhan Branavan, Harr Chen, Jacob Richard Eisenstein, Regina Barzilay
  • Patent number: 7392278
    Abstract: A system that facilitates performance of a focused search over a collection of sites comprises a subweb that corresponds to a topic and/or user characteristic(s) that are of interest to the user. The subweb includes a plurality of domains and/or paths (e.g. sites) that are related to the topic and/or the user characteristic(s). Each of the sites within the subweb is assigned a weight that indicates relevance of the site to the desirable topic and/or user characteristic(s). A search engine employs the subweb to facilitate focusing a search over a collection of sites. The search engine receives a query, and utilizes the subweb to focus a search over the selection of sites corresponding to the topic and/or user characteristic(s) represented by the subweb. The results from the search are returned to the user based at least in part upon the relevance weights assigned to the sites within the subweb.
    Type: Grant
    Filed: February 13, 2004
    Date of Patent: June 24, 2008
    Assignee: Microsoft Corporation
    Inventors: Harr Chen, Raman Chandrasekar, Simon H. Corston, Eric D. Brill
  • Patent number: 7287012
    Abstract: The present invention relates to a system and methodology that applies automated learning procedures for determining document relevance and assisting information retrieval activities. A system is provided that facilitates a machine-learned approach to determine document relevance. The system includes a storage component that receives a set of human selected items to be employed as positive test cases of highly relevant documents. A training component trains at least one classifier with the human selected items as positive test cases and one or more other items as negative test cases in order to provide a query-independent model, wherein the other items can be selected by a statistical search, for example. Also, the trained classifier can be employed to aid an individual in identifying and selecting new positive cases or utilized to filter or re-rank results from a statistical-based search.
    Type: Grant
    Filed: January 9, 2004
    Date of Patent: October 23, 2007
    Assignee: Microsoft Corporation
    Inventors: Simon H. Corston, Raman Chandrasekar, Harr Chen
  • Publication number: 20060143254
    Abstract: A computer implemented information retrieval system is provided. The system includes a user input configured to receive a user query relative to the corpus. A machine learning classifier is trained with a first set of training data comprising anchor text relative to at least some of the documents in the corpus. A processing unit is adapted to interact with the classifier to obtain search results relative to the query using the machine learning classifier. In some aspects, the classifier is also trained with a second set of training data. A method of integrating a new document into a corpus of documents is also provided. A method of training a machine learning classifier for retrieving documents from a corpus using two distinct types of training data is also provided.
    Type: Application
    Filed: December 24, 2004
    Publication date: June 29, 2006
    Applicant: Microsoft Corporation
    Inventors: Harr Chen, Adwait Ratnaparkhi, Sonja Knoll, Hsiao-Wuen Hon
  • Publication number: 20050165753
    Abstract: A system that facilitates performance of a focused search over a collection of sites comprises a subweb that corresponds to a topic and/or user characteristic(s) that are of interest to the user. The subweb includes a plurality of domains and/or paths (e.g. sites) that are related to the topic and/or the user characteristic(s). Each of the sites within the subweb is assigned a weight that indicates relevance of the site to the desirable topic and/or user characteristic(s). A search engine employs the subweb to facilitate focusing a search over a collection of sites. The search engine receives a query, and utilizes the subweb to focus a search over the selection of sites corresponding to the topic and/or user characteristic(s) represented by the subweb. The results from the search are returned to the user based at least in part upon the relevance weights assigned to the sites within the subweb.
    Type: Application
    Filed: February 13, 2004
    Publication date: July 28, 2005
    Inventors: Harr Chen, Raman Chandrasekar, Simon Corston, Eric Brill
  • Publication number: 20050154686
    Abstract: The present invention relates to a system and methodology that applies automated learning procedures for determining document relevance and assisting information retrieval activities. A system is provided that facilitates a machine-learned approach to determine document relevance. The system includes a storage component that receives a set of human selected items to be employed as positive test cases of highly relevant documents. A training component trains at least one classifier with the human selected items as positive test cases and one or more other items as negative test cases in order to provide a query-independent model, wherein the other items can be selected by a statistical search, for example. Also, the trained classifier can be employed to aid an individual in identifying and selecting new positive cases or utilized to filter or re-rank results from a statistical-based search.
    Type: Application
    Filed: January 9, 2004
    Publication date: July 14, 2005
    Inventors: Simon Corston, Raman Chandrasekar, Harr Chen
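
As a rough illustration of the subweb idea described in the abstract for patent 7392278 (and its application, publication 20050165753), the sketch below re-ranks search results using per-site relevance weights. It is not the patented method: the `SUBWEB` contents, the helper names `site_weight` and `focused_search`, the longest-prefix matching rule, and the choice to multiply a base score by the site weight are all assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical subweb: each site (a domain or a domain/path prefix)
# maps to a relevance weight for one topic. Values are invented.
SUBWEB = {
    "example.org/astronomy": 0.9,
    "example.edu": 0.6,
}


def site_weight(url, subweb):
    """Return the weight of the longest subweb site that contains the URL.

    Returns 0.0 when the URL falls outside the subweb.
    """
    parsed = urlparse(url)
    location = parsed.netloc + parsed.path
    best_len, best_weight = -1, 0.0
    for site, weight in subweb.items():
        prefix = site.rstrip("/")
        # Match the site itself or anything beneath it on a path boundary.
        if location == prefix or location.startswith(prefix + "/"):
            if len(prefix) > best_len:
                best_len, best_weight = len(prefix), weight
    return best_weight


def focused_search(results, subweb):
    """Keep only results inside the subweb and rank them.

    `results` is a list of (url, base_score) pairs from some underlying
    search engine. The combined score is base_score * site weight, which
    is an assumption of this sketch.
    """
    scored = []
    for url, base_score in results:
        weight = site_weight(url, subweb)
        if weight > 0.0:
            scored.append((url, base_score * weight))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    raw_results = [
        ("https://example.org/astronomy/stars", 0.5),
        ("https://example.edu/physics", 0.8),
        ("https://other.example.com/page", 1.0),
    ]
    for url, score in focused_search(raw_results, SUBWEB):
        print(url, round(score, 2))
```

Results from sites outside the subweb are dropped here, which is one way to "focus" a search; a softer variant could keep them with a small default weight instead.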