Patents by Inventor Yunbo Cao

Yunbo Cao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Domain constraint path based data record extraction

Patent number: 9171080

Abstract: Described herein are techniques for extracting data records containing user-generated content from documents. The documents may be processed into document trees in which sub-trees represent the data records of the document. Domain constraints may be used to locate structured portions of the document tree. For example, anchor trees may be located as being sets of sibling sub-trees with similar tag paths that contain the domain constraints. The anchor trees may then be used to determine a record boundary (e.g., the start offset and length) of the data records. Finally, the data records may be extracted based on the anchor trees and the record boundaries.

Type: Grant

Filed: January 23, 2012

Date of Patent: October 27, 2015

Assignee: Microsoft Technology Licensing, LLC

Inventors: Xinying Song, Zhiyuan Chen, Yunbo Cao, Chin-Yew Lin
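The anchor-tree idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: it assumes a well-formed page, a hypothetical domain constraint (an ISO-style post date in element text), identical rather than merely similar tag paths, and records that each occupy exactly one sub-tree. The names POST_DATE, constraint_paths and find_anchor_trees are invented for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical domain constraint: a forum post carries an ISO-style post date.
POST_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def constraint_paths(node, path=()):
    """Yield the tag path of every node under `node` whose text satisfies the constraint."""
    path = path + (node.tag,)
    if node.text and POST_DATE.search(node.text):
        yield path
    for child in node:
        yield from constraint_paths(child, path)

def find_anchor_trees(root):
    """Return (parent, siblings) for the first parent with at least two child
    sub-trees that contain the constraint at an identical tag path."""
    for parent in root.iter():
        groups = {}
        for child in parent:
            # set() keeps a child from being listed twice under the same path.
            for path in set(constraint_paths(child)):
                groups.setdefault(path, []).append(child)
        for siblings in groups.values():
            if len(siblings) >= 2:
                return parent, siblings
    return None, []

PAGE = """<html><body><div id="posts">
<div class="post"><span>2011-01-02</span><p>First reply</p></div>
<div class="post"><span>2011-01-03</span><p>Second reply</p></div>
</div></body></html>"""

parent, anchors = find_anchor_trees(ET.fromstring(PAGE))
# Simplifying assumption: each record is one anchor tree (start offset 0, length 1).
records = [ET.tostring(tree, encoding="unicode").strip() for tree in anchors]
```

Here the two post sub-trees are picked as anchor trees because both hold a date at the path div/span under the same parent. A fuller treatment would also search for record boundaries that span several sibling sub-trees.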
Search query user interface

Patent number: 9092509

Abstract: This disclosure describes, in part, techniques for operating a search query user interface to allow seamless creating, editing and/or refining of a search query using various interactive functions. The techniques described herein may display a search query divided into segments. A selection of a segment of the search query may then be received. One or more alternatives to the selected segment may then be presented. Next, a selection of one of the presented alternatives may be received. As a result, the search query may be altered using the selected alternative. Furthermore, the techniques described herein allow a user to operate on a search query using query substitution, expansion, association and/or history functions.

Type: Grant

Filed: November 19, 2012

Date of Patent: July 28, 2015

Assignee: Microsoft Technology Licensing, LLC

Inventors: Shuming Shi, Chin-Yew Lin, Yunbo Cao
Domain constraint based data record extraction

Patent number: 8983980

Abstract: Embodiments for a Mining Data Records based on Anchor Trees (MiBAT) process are disclosed. In accordance with at least one embodiment, the MiBAT process extracts data records containing user-generated content from web documents. The web document is processed into a Document Object Model (DOM) tree in which sub-trees of the DOM tree represent the data records of the web document. Domain constraints are used to locate structured portions of the DOM tree. Anchor trees are then located as being sets of sibling sub-trees that contain the domain constraints. The anchor trees are then used to determine a record boundary (i.e., the start offset and length) of the data records. Finally, the data records are extracted based on the anchor trees and the record boundaries.

Type: Grant

Filed: November 12, 2010

Date of Patent: March 17, 2015

Assignee: Microsoft Technology Licensing, LLC

Inventors: Xinying Song, Yunbo Cao, Chin-Yew Lin
Two stage search

Patent number: 8849787

Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.

Type: Grant

Filed: January 4, 2012

Date of Patent: September 30, 2014

Assignee: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li
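A minimal sketch of the two stage model described in the abstract above follows. The scoring functions are assumptions made for illustration only: relevance is taken as the fraction of query terms found in a document, and co-occurrence gives each person named in a retrieved document an equal share of that document. The function name rank_experts and the sample data are hypothetical.

```python
from collections import defaultdict

def rank_experts(query, documents, top_k=2):
    """documents is a list of (text, people) pairs; returns (person, score) pairs in rank order."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return []
    # Stage 1, relevance model (assumed): share of query terms present in the document.
    retrieved = []
    for text, people in documents:
        relevance = len(query_terms & set(text.lower().split())) / len(query_terms)
        if relevance > 0:
            retrieved.append((relevance, people))
    retrieved.sort(key=lambda pair: pair[0], reverse=True)
    # Stage 2, co-occurrence model (assumed): each person in a retrieved
    # document is associated with the query with weight 1 / number of people.
    totals = defaultdict(float)
    for relevance, people in retrieved[:top_k]:
        for person in people:
            totals[person] += relevance * (1.0 / len(people))
    # Combine the two scores by multiplication and output a rank ordered list.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

DOCS = [
    ("query expansion for web search", ["Ann", "Bob"]),
    ("web search ranking", ["Ann"]),
    ("cooking pasta", ["Cy"]),
]
ranking = rank_experts("web search", DOCS)
```

With this sample data, Ann ranks above Bob because she appears in both retrieved documents, and Cy is never scored because the only document naming him is not retrieved.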
Record linkage based on a trained blocking scheme

Patent number: 8843492

Abstract: Some implementations disclosed herein provide techniques and arrangements to train a blocking scheme using both labeled data and unlabeled data. For example, training the blocking scheme may include iteratively: learning a conjunction, identifying first matches in the labeled data and the unlabeled data that are uncovered by the conjunction, and identifying second matches in the labeled data and the unlabeled data that are covered by the conjunction. The conjunctions learned across iterations may be combined using a disjunction. A search engine may use the blocking scheme when searching for records that match an entity.

Type: Grant

Filed: February 13, 2012

Date of Patent: September 23, 2014

Assignee: Microsoft Corporation

Inventors: Yunbo Cao, Chin-Yew Lin, Pei Yue, Zhiyuan Chen
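The iterative training loop in the abstract above resembles sequential covering, which can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: a predicate is taken to be exact agreement on a field, unlabeled pairs are treated as probable non-matches, and the score gained / (1 + noise) is a hypothetical choice. The names covers and learn_blocking_scheme are invented here.

```python
from itertools import combinations

def covers(conjunction, pair):
    """A conjunction covers a pair when both records agree on every field in it."""
    first, second = pair
    return all(first[field] == second[field] for field in conjunction)

def learn_blocking_scheme(fields, matches, non_matches, unlabeled, max_size=2):
    """Greedily learn conjunctions until every labeled match is covered;
    the returned list is read as a disjunction of conjunctions."""
    candidates = [c for k in range(1, max_size + 1) for c in combinations(fields, k)]
    scheme, uncovered = [], list(matches)
    while uncovered:
        def score(conjunction):
            gained = sum(covers(conjunction, p) for p in uncovered)
            # Assumption: covered unlabeled pairs count as noise, like known non-matches.
            noise = sum(covers(conjunction, p) for p in non_matches + unlabeled)
            return gained / (1 + noise)
        best = max(candidates, key=score)
        if score(best) == 0:
            break  # no candidate covers any remaining match
        scheme.append(best)
        uncovered = [p for p in uncovered if not covers(best, p)]
    return scheme

def rec(name, zip_code, city):
    return {"name": name, "zip": zip_code, "city": city}

MATCHES = [(rec("ann lee", "10001", "nyc"), rec("ann lee", "10001", "new york")),
           (rec("bob ray", "94105", "sf"), rec("robert ray", "94105", "sf"))]
NON_MATCHES = [(rec("cy", "94105", "sf"), rec("di", "94105", "sf"))]
UNLABELED = [(rec("ed", "1", "sf"), rec("flo", "2", "sf"))]

scheme = learn_blocking_scheme(["name", "zip", "city"], MATCHES, NON_MATCHES, UNLABELED)
```

On this toy data the learned scheme reads "same name OR same zip". The city field is never chosen because it also covers the non-matching and unlabeled pairs.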
Search query user interface

Publication number: 20140143223

Abstract: This disclosure describes, in part, techniques for operating a search query user interface to allow seamless creating, editing and/or refining of a search query using various interactive functions. The techniques described herein may display a search query divided into segments. A selection of a segment of the search query may then be received. One or more alternatives to the selected segment may then be presented. Next, a selection of one of the presented alternatives may be received. As a result, the search query may be altered using the selected alternative. Furthermore, the techniques described herein allow a user to operate on a search query using query substitution, expansion, association and/or history functions.

Type: Application

Filed: November 19, 2012

Publication date: May 22, 2014

Applicant: Microsoft Corporation

Inventors: Shuming Shi, Chin-Yew Lin, Yunbo Cao
Record linkage based on a trained blocking scheme

Publication number: 20130212103

Abstract: Some implementations disclosed herein provide techniques and arrangements to train a blocking scheme using both labeled data and unlabeled data. For example, training the blocking scheme may include iteratively: learning a conjunction, identifying first matches in the labeled data and the unlabeled data that are uncovered by the conjunction, and identifying second matches in the labeled data and the unlabeled data that are covered by the conjunction. The conjunctions learned across iterations may be combined using a disjunction. A search engine may use the blocking scheme when searching for records that match an entity.

Type: Application

Filed: February 13, 2012

Publication date: August 15, 2013

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Chin-Yew Lin, Pei Yue, Zhiyuan Chen
Domain constraint based data record extraction

Publication number: 20120124077

Abstract: Embodiments for a Mining Data Records based on Anchor Trees (MiBAT) process are disclosed. In accordance with at least one embodiment, the MiBAT process extracts data records containing user-generated content from web documents. The web document is processed into a Document Object Model (DOM) tree in which sub-trees of the DOM tree represent the data records of the web document. Domain constraints are used to locate structured portions of the DOM tree. Anchor trees are then located as being sets of sibling sub-trees that contain the domain constraints. The anchor trees are then used to determine a record boundary (i.e., the start offset and length) of the data records. Finally, the data records are extracted based on the anchor trees and the record boundaries.

Type: Application

Filed: November 12, 2010

Publication date: May 17, 2012

Applicant: Microsoft Corporation

Inventors: Xinying Song, Yunbo Cao, Chin-Yew Lin
Domain constraint path based data record extraction

Publication number: 20120124086

Abstract: Described herein are techniques for extracting data records containing user-generated content from documents. The documents may be processed into document trees in which sub-trees represent the data records of the document. Domain constraints may be used to locate structured portions of the document tree. For example, anchor trees may be located as being sets of sibling sub-trees with similar tag paths that contain the domain constraints. The anchor trees may then be used to determine a record boundary (e.g., the start offset and length) of the data records. Finally, the data records may be extracted based on the anchor trees and the record boundaries.

Type: Application

Filed: January 23, 2012

Publication date: May 17, 2012

Applicant: Microsoft Corporation

Inventors: Xinying Song, Zhiyuan Chen, Yunbo Cao, Chin-Yew Lin
Two stage search

Publication number: 20120109949

Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.

Type: Application

Filed: January 4, 2012

Publication date: May 3, 2012

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li
Two stage search

Patent number: 8156097

Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.

Type: Grant

Filed: November 14, 2005

Date of Patent: April 10, 2012

Assignee: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li
Determining utility of a question

Patent number: 8112269

Abstract: A question search system provides a collection of questions having words for use in evaluating the utility of the questions based on a language model. The question search system calculates n-gram probabilities for words within the questions of the collection. The n-gram probability of a word for a sequence of n−1 words indicates the probability of that word being next after that sequence in the collection of questions. The n-gram probabilities for the words of the collection represent the language model of the collection. The question search system calculates a language model utility score for each question within a collection that indicates the likelihood that a question is repeatedly asked by users. The question search system derives the language model utility score for a question from the n-gram probabilities of the words within that question.

Type: Grant

Filed: August 25, 2008

Date of Patent: February 7, 2012

Assignee: Microsoft Corporation

Inventors: Yunbo Cao, Chin-Yew Lin
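The language model utility score described in the abstract above can be sketched with a bigram model (n = 2). The choices below are assumptions made for illustration, not details taken from the patent: whitespace tokenization, add-one smoothing, and a score equal to the average log-probability per word so that long questions are not penalized for length. The names train_bigram_model and utility are hypothetical.

```python
import math
from collections import Counter

def train_bigram_model(questions):
    """Build a smoothed bigram probability function from a non-empty question collection."""
    contexts, bigrams, vocabulary = Counter(), Counter(), set()
    for question in questions:
        tokens = ["<s>"] + question.lower().split()  # <s> marks the question start
        vocabulary.update(tokens[1:])
        contexts.update(tokens[:-1])                 # words that have a successor
        bigrams.update(zip(tokens, tokens[1:]))

    def probability(previous, word):
        # Add-one smoothing (assumed) keeps unseen bigrams from scoring zero.
        return (bigrams[(previous, word)] + 1) / (contexts[previous] + len(vocabulary))

    return probability

def utility(question, probability):
    """Average log-probability of the question's words under the bigram model."""
    tokens = ["<s>"] + question.lower().split()
    logs = [math.log(probability(p, w)) for p, w in zip(tokens, tokens[1:])]
    return sum(logs) / len(logs) if logs else float("-inf")

COLLECTION = ["how to reset password", "how to reset router", "why is sky blue"]
model = train_bigram_model(COLLECTION)
common = utility("how to reset password", model)
unusual = utility("blue sky is why", model)
```

A question built from word sequences that recur in the collection receives a higher score than one whose word order never appears, which is the sense in which the score approximates how likely a question is to be asked repeatedly.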
Searching questions based on topic and focus

Patent number: 8027973

Abstract: A method and system for determining the relevance of questions to a queried question based on topics and focuses of the questions is provided. A question search system provides a collection of questions with topics and focuses. Upon receiving a queried question, the question search system identifies a queried topic and queried focus of the queried question. The question search system generates a score indicating the relevance of a question of the collection to the queried question based on a language model of the topic of the question and a language model of the focus of the question.

Type: Grant

Filed: August 4, 2008

Date of Patent: September 27, 2011

Assignee: Microsoft Corporation

Inventors: Yunbo Cao, Chin-Yew Lin
Clustering question search results based on topic and focus

Patent number: 8024332

Abstract: A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.

Type: Grant

Filed: August 4, 2008

Date of Patent: September 20, 2011

Assignee: Microsoft Corporation

Inventors: Yunbo Cao, Chin-Yew Lin
Question type-sensitive answer summarization

Patent number: 7966316

Abstract: In a question answering system, the system identifies a type of question input by a user. The system then generates answer summaries that summarize answers to the input question in a format that is determined based on the type of question asked by the user. The answer summaries are output, in the corresponding format, in answer to the input question.

Type: Grant

Filed: April 15, 2008

Date of Patent: June 21, 2011

Assignee: Microsoft Corporation

Inventors: Yunbo Cao, Chin-Yew Lin
Ranking and accessing definitions of terms

Patent number: 7877383

Abstract: A method of processing information is provided. The method includes collecting text strings of definition candidates from a data source. The definition candidates are ranked based on the text strings.

Type: Grant

Filed: April 27, 2005

Date of Patent: January 25, 2011

Assignee: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li, Jun Xu
Mining latent associations of objects using a typed mixture model

Patent number: 7849097

Abstract: A typed separable mixture model is used to mine associative relationships between sets of objects. Instead of modeling only one type of co-occurrence among the sets of objects, the typed separable mixture model can model multiple different types of co-occurrences among more than two sets of objects, and co-occurrences that exist in different contexts.

Type: Grant

Filed: April 12, 2007

Date of Patent: December 7, 2010

Assignee: Microsoft Corporation

Inventors: Yunbo Cao, Hang Li
Question and answer search

Publication number: 20100235311

Abstract: Exemplary methods, computer-readable media, and systems are presented for leveraging question-answering knowledge from community sites by complementing product search services with a search of questions, answers, reviews and other Internet accessible content including user-generated content. Product or service information is obtained by crawling Internet-accessible Web sites including community sites. An integrated index of such information is generated. A user is able to browse questions by product or service feature, by topic, by identified comparative questions, and by question ranking (for example, interestingness or popularity).

Type: Application

Filed: March 13, 2009

Publication date: September 16, 2010

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Chin-Yew Lin, Bo Wang
Predicting interestingness of questions in community question answering

Publication number: 20100235343

Abstract: Exemplary methods, computer-readable media, and systems are presented for learning to recommend questions and other user-generated submissions to community sites based on user ratings. The size of available training data is enlarged by taking into consideration questions without user ratings, which in turn benefits the learned model. Questions or other user-generated submissions are obtained by crawling Internet-accessible Web sites including community sites. Questions and other submissions, even when not tagged, voted or indicated as “popular” or “interesting” by users, are quantitatively identified as “interesting.”

Type: Application

Filed: September 29, 2009

Publication date: September 16, 2010

Applicant: Microsoft Corporation

Inventors: Yunbo Cao, Chin-Yew Lin, Young-In Song
Training a ranking component

Patent number: 7783629

Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.

Type: Grant

Filed: January 5, 2006

Date of Patent: August 24, 2010

Assignee: Microsoft Corporation

Inventors: Hang Li, Jianfeng Gao, Yunbo Cao
