Patents by Inventor Yunbo Cao

Yunbo Cao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Factoid-based searching

Publication number: 20070136280

Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
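
As a rough illustration of the flow this abstract describes, the sketch below builds a small index keyed by factoid type, retrieves passages of the selected type that share terms with the query, and returns them in rank order. The passage layout, the names `build_index` and `search`, and the term-overlap score are assumptions made for illustration; the abstract does not specify how the score is calculated.

```python
from collections import defaultdict


def build_index(passages):
    """Group passages by each factoid type they contain (hypothetical layout)."""
    index = defaultdict(list)
    for passage in passages:
        for factoid_type in passage["factoid_types"]:
            index[factoid_type].append(passage)
    return index


def search(index, query, factoid_type):
    """Return passage texts of the selected factoid type, best match first.

    The score is a plain count of shared terms, an assumed stand-in for
    whatever scoring the application actually uses.
    """
    terms = set(query.lower().split())
    scored = []
    for passage in index.get(factoid_type, []):
        score = len(terms & set(passage["text"].lower().split()))
        if score:
            scored.append((score, passage["text"]))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, text in scored]


passages = [
    {"text": "The meeting is on June 14 in Redmond", "factoid_types": ["time", "place"]},
    {"text": "Project meeting moved to Friday", "factoid_types": ["time"]},
    {"text": "The meeting room is in Building 5", "factoid_types": ["place"]},
]
index = build_index(passages)
results = search(index, "project meeting", "time")
```

Here `results` lists only the passages tagged with the `time` factoid type, with the passage matching both query terms ranked first.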

Type: Application

Filed: December 13, 2005

Publication date: June 14, 2007

Applicant: Microsoft Corporation

Inventors: Hang Li, Jianfeng Gao, Yunbo Cao

Training a ranking component

Publication number: 20070136281

Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.

Type: Application

Filed: January 5, 2006

Publication date: June 14, 2007

Applicant: Microsoft Corporation

Inventors: Hang Li, Jianfeng Gao, Yunbo Cao

Electronic mail data cleaning

Publication number: 20070130263

Abstract: A cascaded processing approach is used to clean noisy electronic mail or other text messaging data. Non-text filtering is first performed on the noisy data to filter out non-text items in the data. Text normalization is then performed on the filtered data to provide cleaned data. The cleaned data can be used in one or more of a wide variety of other applications or processing systems.
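
The cascade in this abstract can be pictured as two functions applied in sequence. The sketch below is a minimal illustration under stated assumptions: the specific rules (dropping header fields, quoted replies, and separator lines, then collapsing whitespace) and the function names are hypothetical choices, not rules taken from the application.

```python
import re


def filter_non_text(lines):
    """Stage 1: drop lines that look like non-text items (assumed rules)."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(">"):  # quoted reply text
            continue
        if re.match(r"^(From|To|Subject|Date):", stripped):  # header fields
            continue
        if re.fullmatch(r"[-=_*]{3,}", stripped):  # separator rules
            continue
        kept.append(line)
    return kept


def normalize_text(lines):
    """Stage 2: collapse runs of whitespace and drop empty lines."""
    normalized = []
    for line in lines:
        collapsed = " ".join(line.split())
        if collapsed:
            normalized.append(collapsed)
    return normalized


def clean_email(raw):
    """Run the two stages in order and return the cleaned text."""
    return "\n".join(normalize_text(filter_non_text(raw.splitlines())))


raw = "From: a@b.com\nSubject: hi\n\nHello   there,\n> old text\n-----\nsee   you  soon\n"
cleaned = clean_email(raw)
```

Keeping the stages as separate functions mirrors the cascaded structure: normalization only ever sees data that has already passed the non-text filter.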

Type: Application

Filed: December 2, 2005

Publication date: June 7, 2007

Applicant: Microsoft Corporation

Inventors: Hang Li, Yunbo Cao, ZhaoHui Tang

Two stage search

Publication number: 20070112720

Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.
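
One way to picture the two stages is sketched below. Both models are simple stand-ins chosen for illustration: the relevance model is the fraction of query terms found in a document, the co-occurrence model splits a document's weight equally among the people it names, and the two scores are combined by multiplication and summed per person. The abstract does not say how the scores are combined, so these choices and all names are assumptions.

```python
from collections import defaultdict


def relevance_scores(query, documents):
    """Stage 1 (stand-in): fraction of query terms present in each document."""
    terms = set(query.lower().split())
    scores = {}
    for doc_id, doc in documents.items():
        words = set(doc["text"].lower().split())
        overlap = len(terms & words) / len(terms) if terms else 0.0
        if overlap > 0:
            scores[doc_id] = overlap
    return scores


def cooccurrence_scores(doc):
    """Stage 2 (stand-in): people named in a document share its weight equally."""
    people = doc["people"]
    return {person: 1.0 / len(people) for person in people} if people else {}


def rank_experts(query, documents):
    """Combine both scores per person and return a rank-ordered list."""
    totals = defaultdict(float)
    for doc_id, relevance in relevance_scores(query, documents).items():
        for person, weight in cooccurrence_scores(documents[doc_id]).items():
            totals[person] += relevance * weight
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


documents = {
    "d1": {"text": "ranking models for search", "people": ["Ana", "Bo"]},
    "d2": {"text": "search index design", "people": ["Ana"]},
    "d3": {"text": "cooking recipes", "people": ["Cy"]},
}
ranking = rank_experts("search ranking", documents)
```

People who appear only in documents that the first stage did not retrieve, such as `Cy` here, never reach the second stage and are absent from the ranking.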

Type: Application

Filed: November 14, 2005

Publication date: May 17, 2007

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li

Extraction of information from documents

Publication number: 20060277173

Abstract: An information extraction model is trained on format features identified within labeled training documents. Information from a document is extracted by assigning labels to units based on format features of the units within the document. A begin label and end label are identified and the information is extracted between the begin label and the end label. The extracted information can be used in various document processing tasks such as ranking.
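
The begin/end extraction step can be sketched as follows. The trained model is replaced here by a hand-written rule over a single format feature (font size), purely as an assumption for illustration: the begin position is the first unit in the largest font, and the end position is the last unit of that same consecutive run. The unit layout and the names `find_span` and `extract` are hypothetical.

```python
def find_span(units):
    """Return (begin, end) indices of the units to extract.

    Hand-written stand-in for a trained model: the span is the first run
    of consecutive units set in the largest font size.
    """
    largest = max(unit["font_size"] for unit in units)
    begin = next(i for i, unit in enumerate(units) if unit["font_size"] == largest)
    end = begin
    while end + 1 < len(units) and units[end + 1]["font_size"] == largest:
        end += 1
    return begin, end


def extract(units):
    """Join the text of every unit from the begin position to the end position."""
    begin, end = find_span(units)
    return " ".join(unit["text"] for unit in units[begin:end + 1])


units = [
    {"text": "ACME Corp", "font_size": 10},
    {"text": "Annual Report", "font_size": 18},
    {"text": "Fiscal Year 2005", "font_size": 18},
    {"text": "Body text starts here.", "font_size": 10},
]
title = extract(units)
```

A trained labeler would take the place of `find_span`; the extraction between the two positions would stay the same.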

Type: Application

Filed: July 29, 2005

Publication date: December 7, 2006

Applicant: Microsoft Corporation

Inventors: Hang Li, Ruihua Song, Yunbo Cao, Dmitriy Meyerzon

Ranking and accessing definitions of terms

Publication number: 20060248049

Abstract: A method of processing information is provided. The method includes collecting text strings of definition candidates from a data source. The definition candidates are ranked based on the text strings.

Type: Application

Filed: April 27, 2005

Publication date: November 2, 2006

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li, Jun Xu

Text mining apparatus and associated methods

Publication number: 20060206306

Abstract: A method for extracting key terms and associated key terms for use in text mining is provided. The method includes receiving unstructured text documents, such as emails over a customer service system. Term candidates are extracted based on identifying consecutive word strings satisfying a context independency threshold. Term candidates are weighted using mutual information to generate a list of weighted terms. The weighted terms are then recounted. Terms are associated based on Chi-square values. Associated terms can then be used for information retrieval. A user interface can be personalized with individual user profiles.
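
The term-association step mentioned in this abstract relies on Chi-square values. The sketch below computes the standard Chi-square statistic for a 2x2 contingency table of two terms across a document collection. Treating each document as a set of terms, and the function name `chi_square`, are assumptions for illustration; the abstract does not describe how the counts are gathered.

```python
def chi_square(term_a, term_b, documents):
    """Chi-square statistic for the co-occurrence of two terms.

    Each document is a set of terms. The 2x2 table counts documents with
    both terms (a), only term_a (b), only term_b (c), and neither (d).
    """
    n = len(documents)
    a = sum(1 for doc in documents if term_a in doc and term_b in doc)
    b = sum(1 for doc in documents if term_a in doc and term_b not in doc)
    c = sum(1 for doc in documents if term_a not in doc and term_b in doc)
    d = n - a - b - c
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # one of the terms appears in all or none of the documents
    return n * (a * d - b * c) ** 2 / denominator


associated = [
    {"printer", "driver"},
    {"printer", "driver"},
    {"login", "password"},
    {"login", "password"},
]
independent = [{"printer", "driver"}, {"printer"}, {"driver"}, set()]

strong = chi_square("printer", "driver", associated)
none = chi_square("printer", "driver", independent)
```

Note that the statistic is also large when two terms systematically avoid each other, so a practical system would likely check the direction of the association as well.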

Type: Application

Filed: February 9, 2005

Publication date: September 14, 2006

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li, Olivier Ribet, Benjamin Martin

System and method for managing information by answering a predetermined number of predefined questions

Publication number: 20060047637

Abstract: The present invention is a system for answering questions. The present invention uses a data mining module to mine data, such as enterprise data, and to configure the data to answer a predetermined number of questions each having a predefined form. The present invention also provides a user interface component for receiving user queries and responding to those queries.

Type: Application

Filed: September 2, 2004

Publication date: March 2, 2006

Applicant: Microsoft Corporation

Inventors: Dmitriy Meyerzon, Hang Li, Joseph Sherman, Yunbo Cao, Zheng Chen

Text mining method

Publication number: 20050283357

Abstract: A method for performing data mining is provided. The method includes selecting at least one data source of unstructured text. Additionally, a transformation is selected to identify a list of terms in the unstructured text. A run-time path is established to connect the data source to the transformation to load the list of terms identified into a destination database.

Type: Application

Filed: October 21, 2004

Publication date: December 22, 2005

Applicant: Microsoft Corporation

Inventors: C. MacLennan, Hang Li, Ming Zhou, Yunbo Cao, ZhaoHui Tang

Learning and using generalized string patterns for information extraction

Publication number: 20050131896

Abstract: The present invention relates to extracting information from an information source. During extraction, strings in the information source are accessed. These strings in the information source are matched with generalized extraction patterns that include words and wildcards. The wildcards denote that at least one word in an individual string can be skipped in order to match the individual string to an individual generalized extraction pattern.
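
Matching a string against a pattern of words and wildcards might look like the sketch below. It assumes a pattern is a list of tokens in which `*` skips one or more words and every other token must match a word exactly; the `*` notation, the one-or-more interpretation, and the function name `matches` are illustrative assumptions rather than details from the application.

```python
def matches(pattern, words):
    """Return True if the whole word list matches the pattern.

    '*' skips one or more words (an assumed reading of the wildcard);
    any other pattern token must equal the next word exactly.
    """
    if not pattern:
        return not words
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Try skipping 1, 2, ... words and matching the remainder.
        return any(matches(rest, words[k:]) for k in range(1, len(words) + 1))
    return bool(words) and words[0] == head and matches(rest, words[1:])


pattern = ["founded", "in", "*", "by"]
skips_two = matches(pattern, "founded in april 1975 by".split())
skips_none = matches(pattern, "founded in by".split())
```

The first string matches because the wildcard skips `april 1975`; the second does not, since under this reading the wildcard must skip at least one word.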

Type: Application

Filed: December 11, 2003

Publication date: June 16, 2005

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li

Uncertainty reduction in collaborative bootstrapping

Publication number: 20050131850

Abstract: Collaborative bootstrapping with uncertainty reduction for increased classifier performance. One classifier selects a portion of data that is uncertain with respect to the classifier and a second classifier labels the portion. Uncertainty reduction includes parallel processing where the second classifier also selects an uncertain portion for the first classifier to label. Uncertainty reduction can be incorporated into existing or new co-training or bootstrapping, including bilingual bootstrapping.

Type: Application

Filed: December 10, 2003

Publication date: June 16, 2005

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li

Method and apparatus for browsing document content

Publication number: 20050108266

Abstract: A computer-implemented method is provided that includes receiving a document and determining a file type for the document. In addition, the document is segmented into blocks of text as a function of the file type, and at least one keyword and a summary are generated for the document.

Type: Application

Filed: November 14, 2003

Publication date: May 19, 2005

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li
