Patents by Inventor Yunbo Cao
Yunbo Cao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9171080
Abstract: Described herein are techniques for extracting data records containing user-generated content from documents. The documents may be processed into document trees in which sub-trees represent the data records of the document. Domain constraints may be used to locate structured portions of the document tree. For example, anchor trees may be located as being sets of sibling sub-trees with similar tag paths that contain the domain constraints. The anchor trees may then be used to determine a record boundary (e.g., the start offset and length) of the data records. Finally, the data records may be extracted based on the anchor trees and the record boundaries.
Type: Grant
Filed: January 23, 2012
Date of Patent: October 27, 2015
Assignee: Microsoft Technology Licensing LLC
Inventors: Xinying Song, Zhiyuan Chen, Yunbo Cao, Chin-Yew Lin
-
Patent number: 9092509
Abstract: This disclosure describes, in part, techniques for operating a search query user interface to allow seamless creating, editing and/or refining of a search query using various interactive functions. The techniques described herein may display a search query divided into segments. A selection of a segment of the search query may then be received. One or more alternatives to the selected segment may then be presented. Next, a selection of one of the presented alternatives may be received. As a result, the search query may be altered using the selected alternative. Furthermore, the techniques described herein allow a user to operate on a search query using query substitution, expansion, association and/or history functions.
Type: Grant
Filed: November 19, 2012
Date of Patent: July 28, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shuming Shi, Chin-Yew Lin, Yunbo Cao
-
Patent number: 8983980
Abstract: Embodiments for a Mining Data Records based on Anchor Trees (MiBAT) process are disclosed. In accordance with at least one embodiment, the MiBAT process extracts data records containing user-generated content from web documents. The web document is processed into a Document Object Model (DOM) tree in which sub-trees of the DOM tree represent the data records of the web document. Domain constraints are used to locate structured portions of the DOM tree. Anchor trees are then located as being sets of sibling sub-trees which contain the domain constraints. The anchor trees are then used to determine a record boundary (i.e., the start offset and length) of the data records. Finally, the data records are extracted based on the anchor trees and the record boundaries.
Type: Grant
Filed: November 12, 2010
Date of Patent: March 17, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Xinying Song, Yunbo Cao, Chin-Yew Lin
-
Patent number: 8849787
Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.
Type: Grant
Filed: January 4, 2012
Date of Patent: September 30, 2014
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li
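The two-stage idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the term-overlap relevance score, the uniform 1/len(people) co-occurrence weight, the product-and-sum combination, and the names `relevance` and `rank_experts` are all assumptions chosen for brevity.

```python
from collections import defaultdict


def relevance(query, text):
    """Stage 1 (assumed scoring): fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rank_experts(query, documents):
    """Stage 2 (assumed scoring): credit each person named in a relevant document.

    `documents` is a hypothetical list of dicts with "text" and "people" keys.
    Each person's score is the sum over documents of
    relevance * co-occurrence weight.
    """
    scores = defaultdict(float)
    for doc in documents:
        rel = relevance(query, doc["text"])
        if rel == 0.0:
            continue  # document was not retrieved for this query
        people = doc["people"]
        for person in people:
            cooc = 1.0 / len(people)  # uniform weight, an illustrative choice
            scores[person] += rel * cooc
    # Rank-ordered list: highest score first, ties broken by name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `rank_experts("language models", docs)` returns (person, score) pairs in descending score order; people who appear only in documents that share no term with the query are left out.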
-
Patent number: 8843492
Abstract: Some implementations disclosed herein provide techniques and arrangements to train a blocking scheme using both labeled data and unlabeled data. For example, training the blocking scheme may include iteratively: learning a conjunction, identifying first matches in the labeled data and the unlabeled data that are uncovered by the conjunction, and identifying second matches in the labeled data and the unlabeled data that are covered by the conjunction. The conjunction learned in each iteration may be combined using a disjunction. A search engine may use the blocking scheme when searching for records that match an entity.
Type: Grant
Filed: February 13, 2012
Date of Patent: September 23, 2014
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin, Pei Yue, Zhiyuan Chen
-
Publication number: 20140143223
Abstract: This disclosure describes, in part, techniques for operating a search query user interface to allow seamless creating, editing and/or refining of a search query using various interactive functions. The techniques described herein may display a search query divided into segments. A selection of a segment of the search query may then be received. One or more alternatives to the selected segment may then be presented. Next, a selection of one of the presented alternatives may be received. As a result, the search query may be altered using the selected alternative. Furthermore, the techniques described herein allow a user to operate on a search query using query substitution, expansion, association and/or history functions.
Type: Application
Filed: November 19, 2012
Publication date: May 22, 2014
Applicant: MICROSOFT CORPORATION
Inventors: Shuming Shi, Chin-Yew Lin, Yunbo Cao
-
Publication number: 20130212103
Abstract: Some implementations disclosed herein provide techniques and arrangements to train a blocking scheme using both labeled data and unlabeled data. For example, training the blocking scheme may include iteratively: learning a conjunction, identifying first matches in the labeled data and the unlabeled data that are uncovered by the conjunction, and identifying second matches in the labeled data and the unlabeled data that are covered by the conjunction. The conjunction learned in each iteration may be combined using a disjunction. A search engine may use the blocking scheme when searching for records that match an entity.
Type: Application
Filed: February 13, 2012
Publication date: August 15, 2013
Applicant: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin, Pei Yue, Zhiyuan Chen
-
Publication number: 20120124077
Abstract: Embodiments for a Mining Data Records based on Anchor Trees (MiBAT) process are disclosed. In accordance with at least one embodiment, the MiBAT process extracts data records containing user-generated content from web documents. The web document is processed into a Document Object Model (DOM) tree in which sub-trees of the DOM tree represent the data records of the web document. Domain constraints are used to locate structured portions of the DOM tree. Anchor trees are then located as being sets of sibling sub-trees which contain the domain constraints. The anchor trees are then used to determine a record boundary (i.e., the start offset and length) of the data records. Finally, the data records are extracted based on the anchor trees and the record boundaries.
Type: Application
Filed: November 12, 2010
Publication date: May 17, 2012
Applicant: MICROSOFT CORPORATION
Inventors: Xinying Song, Yunbo Cao, Chin-Yew Lin
-
Publication number: 20120124086
Abstract: Described herein are techniques for extracting data records containing user-generated content from documents. The documents may be processed into document trees in which sub-trees represent the data records of the document. Domain constraints may be used to locate structured portions of the document tree. For example, anchor trees may be located as being sets of sibling sub-trees with similar tag paths that contain the domain constraints. The anchor trees may then be used to determine a record boundary (e.g., the start offset and length) of the data records. Finally, the data records may be extracted based on the anchor trees and the record boundaries.
Type: Application
Filed: January 23, 2012
Publication date: May 17, 2012
Applicant: MICROSOFT CORPORATION
Inventors: Xinying Song, Zhiyuan Chen, Yunbo Cao, Chin-Yew Lin
-
Publication number: 20120109949
Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.
Type: Application
Filed: January 4, 2012
Publication date: May 3, 2012
Applicant: Microsoft Corporation
Inventors: Yunbo CAO, Hang LI
-
Patent number: 8156097
Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.
Type: Grant
Filed: November 14, 2005
Date of Patent: April 10, 2012
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li
-
Patent number: 8112269
Abstract: A question search system provides a collection of questions having words for use in evaluating the utility of the questions based on a language model. The question search system calculates n-gram probabilities for words within the questions of the collection. The n-gram probability of a word for a sequence of n-1 words indicates the probability of that word being next after that sequence in the collection of questions. The n-gram probabilities for the words of the collection represent the language model of the collection. The question search system calculates a language model utility score for each question within a collection that indicates the likelihood that a question is repeatedly asked by users. The question search system derives the language model utility score for a question from the n-gram probabilities of the words within that question.
Type: Grant
Filed: August 25, 2008
Date of Patent: February 7, 2012
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin
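As a rough illustration of deriving a utility score from n-gram probabilities, the sketch below uses a bigram model (n = 2). The abstract does not specify the details, so the add-one smoothing, the length-normalized log-probability, the `<s>` start marker, and the function names are all assumptions.

```python
import math
from collections import Counter


def train_bigram_model(questions):
    """Count unigrams and bigrams over whitespace-tokenized questions."""
    unigrams, bigrams = Counter(), Counter()
    for question in questions:
        tokens = ["<s>"] + question.lower().split()  # "<s>" marks the start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(word, prev, unigrams, bigrams):
    """P(word | prev) with add-one smoothing (an assumed smoothing choice)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def utility_score(question, unigrams, bigrams):
    """Average log-probability per word; higher suggests a more typical question."""
    tokens = ["<s>"] + question.lower().split()
    log_prob = sum(
        math.log(bigram_prob(word, prev, unigrams, bigrams))
        for prev, word in zip(tokens, tokens[1:])
    )
    return log_prob / max(len(tokens) - 1, 1)
```

Under these assumptions, a question whose word sequences recur across the collection scores higher than a one-off phrasing, which matches the abstract's notion of a question that is likely to be asked repeatedly.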
-
Patent number: 8027973
Abstract: A method and system for determining the relevance of questions to a queried question based on topics and focuses of the questions is provided. A question search system provides a collection of questions with topics and focuses. Upon receiving a queried question, the question search system identifies a queried topic and queried focus of the queried question. The question search system generates a score indicating the relevance of a question of the collection to the queried question based on a language model of the topic of the question and a language model of the focus of the question.
Type: Grant
Filed: August 4, 2008
Date of Patent: September 27, 2011
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin
-
Patent number: 8024332
Abstract: A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.
Type: Grant
Filed: August 4, 2008
Date of Patent: September 20, 2011
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin
-
Patent number: 7966316
Abstract: In a question answering system, the system identifies a type of question input by a user. The system then generates answer summaries that summarize answers to the input question in a format that is determined based on the type of question asked by the user. The answer summaries are output, in the corresponding format, in answer to the input question.
Type: Grant
Filed: April 15, 2008
Date of Patent: June 21, 2011
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin
-
Patent number: 7877383
Abstract: A method of processing information is provided. The method includes collecting text strings of definition candidates from a data source. The definition candidates are ranked based on the text strings.
Type: Grant
Filed: April 27, 2005
Date of Patent: January 25, 2011
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li, Jun Xu
-
Patent number: 7849097
Abstract: A typed separable mixture model is used to mine associative relationships between sets of objects. Instead of modeling only one type of co-occurrence among the sets of objects, the typed separable mixture model can model multiple different types of co-occurrences among more than two sets of objects, and co-occurrences that exist in different contexts.
Type: Grant
Filed: April 12, 2007
Date of Patent: December 7, 2010
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li
-
Publication number: 20100235311
Abstract: Exemplary methods, computer-readable media, and systems are presented for leveraging question-answering knowledge from community sites by complementing product search services with a search of questions, answers, reviews and other Internet accessible content including user-generated content. Product or service information is obtained by crawling Internet-accessible Web sites including community sites. An integrated index of such information is generated. A user is able to browse questions by product or service feature, by topic, by identified comparative questions, and by question ranking (for example, interestingness or popularity).
Type: Application
Filed: March 13, 2009
Publication date: September 16, 2010
Applicant: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin, Bo Wang
-
Publication number: 20100235343
Abstract: Exemplary methods, computer-readable media, and systems are presented for learning to recommend questions and other user-generated submissions to community sites based on user ratings. The size of available training data is enlarged by taking into consideration questions without user ratings, which in turn benefits the learned model. Questions or other user-generated submissions are obtained by crawling Internet-accessible Web sites including community sites. Questions and other submissions, even when not tagged, voted or indicated as “popular” or “interesting” by users, are quantitatively identified as “interesting.”
Type: Application
Filed: September 29, 2009
Publication date: September 16, 2010
Applicant: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin, Young-In Song
-
Patent number: 7783629
Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
Type: Grant
Filed: January 5, 2006
Date of Patent: August 24, 2010
Assignee: Microsoft Corporation
Inventors: Hang Li, Jianfeng Gao, Yunbo Cao