Patents by Inventor Yunbo Cao

Yunbo Cao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20070136280
    Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
    Type: Application
    Filed: December 13, 2005
    Publication date: June 14, 2007
    Applicant: Microsoft Corporation
    Inventors: Hang Li, Jianfeng Gao, Yunbo Cao
  • Publication number: 20070136281
    Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
    Type: Application
    Filed: January 5, 2006
    Publication date: June 14, 2007
    Applicant: Microsoft Corporation
    Inventors: Hang Li, Jianfeng Gao, Yunbo Cao
  • Publication number: 20070130263
    Abstract: A cascaded processing approach is used to clean noisy electronic mail or other text messaging data. Non-text filtering is first performed on the noisy data to filter out non-text items in the data. Text normalization is then performed on the filtered data to provide cleaned data. The cleaned data can be used in one or more of a wide variety of other applications or processing systems.
    Type: Application
    Filed: December 2, 2005
    Publication date: June 7, 2007
    Applicant: Microsoft Corporation
    Inventors: Hang Li, Yunbo Cao, ZhaoHui Tang
  • Publication number: 20070112720
    Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.
    Type: Application
    Filed: November 14, 2005
    Publication date: May 17, 2007
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li
  • Publication number: 20060277173
    Abstract: An information extraction model is trained on format features identified within labeled training documents. Information from a document is extracted by assigning labels to units based on format features of the units within the document. A begin label and end label are identified and the information is extracted between the begin label and the end label. The extracted information can be used in various document processing tasks such as ranking.
    Type: Application
    Filed: July 29, 2005
    Publication date: December 7, 2006
    Applicant: Microsoft Corporation
    Inventors: Hang Li, Ruihua Song, Yunbo Cao, Dmitriy Meyerzon
  • Publication number: 20060248049
    Abstract: A method of processing information is provided. The method includes collecting text strings of definition candidates from a data source. The definition candidates are ranked based on the text strings.
    Type: Application
    Filed: April 27, 2005
    Publication date: November 2, 2006
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li, Jun Xu
  • Publication number: 20060206306
    Abstract: A method for extracting key terms and associated key terms for use in text mining is provided. The method includes receiving unstructured text documents, such as emails over a customer service system. Term candidates are extracted based on identifying consecutive word strings satisfying a context independency threshold. Term candidates are weighted using mutual information to generate a list of weighted terms. The weighted terms are then recounted. Terms are associated based on Chi-square values. Associated terms can then be used for information retrieval. A user interface can be personalized with individual user profiles.
    Type: Application
    Filed: February 9, 2005
    Publication date: September 14, 2006
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li, Olivier Ribet, Benjamin Martin
  • Publication number: 20060047637
    Abstract: The present invention is a system for answering questions. The present invention uses a data mining module to mine data, such as enterprise data, and to configure the data to answer a predetermined number of questions each having a predefined form. The present invention also provides a user interface component for receiving user queries and responding to those queries.
    Type: Application
    Filed: September 2, 2004
    Publication date: March 2, 2006
    Applicant: Microsoft Corporation
    Inventors: Dmitriy Meyerzon, Hang Li, Joseph Sherman, Yunbo Cao, Zheng Chen
  • Publication number: 20050283357
    Abstract: A method for performing data mining is provided. The method includes selecting at least one data source of unstructured text. Additionally, a transformation is selected to identify a list of terms in the unstructured text. A run-time path is established to connect the data source to the transformation to load the list of terms identified into a destination database.
    Type: Application
    Filed: October 21, 2004
    Publication date: December 22, 2005
    Applicant: Microsoft Corporation
    Inventors: C. MacLennan, Hang Li, Ming Zhou, Yunbo Cao, ZhaoHui Tang
  • Publication number: 20050131896
    Abstract: The present invention relates to extracting information from an information source. During extraction, strings in the information source are accessed. These strings in the information source are matched with generalized extraction patterns that include words and wildcards. The wildcards denote that at least one word in an individual string can be skipped in order to match the individual string to an individual generalized extraction pattern.
    Type: Application
    Filed: December 11, 2003
    Publication date: June 16, 2005
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li
  • Publication number: 20050131850
    Abstract: Collaborative bootstrapping with uncertainty reduction for increased classifier performance. One classifier selects a portion of data that is uncertain with respect to the classifier and a second classifier labels the portion. Uncertainty reduction includes parallel processing where the second classifier also selects an uncertain portion for the first classifier to label. Uncertainty reduction can be incorporated into existing or new co-training or bootstrapping, including bilingual bootstrapping.
    Type: Application
    Filed: December 10, 2003
    Publication date: June 16, 2005
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li
  • Publication number: 20050108266
    Abstract: A computer-implemented method is provided that includes receiving a document and determining a file type for the document. In addition, the document is segmented into blocks of text as a function of the file type, and at least one keyword and a summary are generated for the document.
    Type: Application
    Filed: November 14, 2003
    Publication date: May 19, 2005
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li
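
Illustrative note on publication 20050131896 (generalized extraction patterns): the abstract describes matching strings against patterns made of words and wildcards, where a wildcard lets at least one word be skipped. The following is a minimal sketch of that idea only, not the method claimed in the application. The `*` wildcard marker, the function names `match_at` and `find_match`, whitespace tokenization, and case-insensitive comparison are all assumptions made for illustration.

```python
from typing import List, Optional

WILDCARD = "*"  # hypothetical marker; the abstract does not specify a syntax


def match_at(pattern: List[str], words: List[str], start: int) -> Optional[int]:
    """Return the end index if `pattern` matches `words` beginning at `start`.

    Assumption: each wildcard skips one or more words; pattern words are
    lowercase and compared case-insensitively against the text.
    """
    def helper(p: int, w: int) -> Optional[int]:
        if p == len(pattern):
            return w  # whole pattern consumed: match ends here
        if w >= len(words):
            return None  # pattern remains but the text is exhausted
        if pattern[p] == WILDCARD:
            # Try skipping 1, 2, ... words before matching the rest.
            for skip in range(1, len(words) - w + 1):
                end = helper(p + 1, w + skip)
                if end is not None:
                    return end
            return None
        if pattern[p] == words[w].lower():
            return helper(p + 1, w + 1)
        return None

    return helper(0, start)


def find_match(pattern: List[str], text: str) -> Optional[List[str]]:
    """Return the first span of words in `text` matched by `pattern`, or None."""
    words = text.split()
    for start in range(len(words)):
        end = match_at(pattern, words, start)
        if end is not None:
            return words[start:end]
    return None


if __name__ == "__main__":
    print(find_match(["is", "*", "of"], "Paris is the capital city of France"))
```

In this sketch the pattern `is * of` matches "is the capital city of" because the wildcard absorbs the three intervening words, while the same pattern without the wildcard would fail on that string.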