Patents by Inventor Yunbo Cao

Yunbo Cao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9171080
    Abstract: Described herein are techniques for extracting data records containing user-generated content from documents. The documents may be processed into document trees in which sub-trees represent the data records of the document. Domain constraints may be used to locate structured portions of the document tree. For example, anchor trees may be identified as sets of sibling sub-trees with similar tag paths that contain the domain constraints. The anchor trees may then be used to determine a record boundary (e.g., the start offset and length) of the data records. Finally, the data records may be extracted based on the anchor trees and the record boundaries.
    Type: Grant
    Filed: January 23, 2012
    Date of Patent: October 27, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Xinying Song, Zhiyuan Chen, Yunbo Cao, Chin-Yew Lin
  • Patent number: 9092509
    Abstract: This disclosure describes, in part, techniques for operating a search query user interface to allow seamless creating, editing and/or refining of a search query using various interactive functions. The techniques described herein may display a search query divided into segments. A selection of a segment of the search query may then be received. One or more alternatives to the selected segment may then be presented. Next, a selection of one of the presented alternatives may be received. As a result, the search query may be altered using the selected alternative. Furthermore, the techniques described herein allow a user to operate on a search query using query substitution, expansion, association and/or history functions.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: July 28, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Shuming Shi, Chin-Yew Lin, Yunbo Cao
  • Patent number: 8983980
    Abstract: Embodiments for a Mining Data Records based on Anchor Trees (MiBAT) process are disclosed. In accordance with at least one embodiment, the MiBAT process extracts data records containing user-generated content from web documents. The web document is processed into a Document Object Model (DOM) tree in which sub-trees of the DOM tree represent the data records of the web document. Domain constraints are used to locate structured portions of the DOM tree. Anchor trees are then identified as sets of sibling sub-trees that contain the domain constraints. The anchor trees are then used to determine a record boundary (i.e., the start offset and length) of the data records. Finally, the data records are extracted based on the anchor trees and the record boundaries.
    Type: Grant
    Filed: November 12, 2010
    Date of Patent: March 17, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Xinying Song, Yunbo Cao, Chin-Yew Lin
  • Patent number: 8849787
    Abstract: A two-stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank-ordered list.
    Type: Grant
    Filed: January 4, 2012
    Date of Patent: September 30, 2014
    Assignee: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li
  • Patent number: 8843492
    Abstract: Some implementations disclosed herein provide techniques and arrangements to train a blocking scheme using both labeled data and unlabeled data. For example, training the blocking scheme may include iteratively: learning a conjunction, identifying first matches in the labeled data and the unlabeled data that are uncovered by the conjunction, and identifying second matches in the labeled data and the unlabeled data that are covered by the conjunction. The conjunctions learned in the iterations may be combined using a disjunction. A search engine may use the blocking scheme when searching for records that match an entity.
    Type: Grant
    Filed: February 13, 2012
    Date of Patent: September 23, 2014
    Assignee: Microsoft Corporation
    Inventors: Yunbo Cao, Chin-Yew Lin, Pei Yue, Zhiyuan Chen
  • Publication number: 20140143223
    Abstract: This disclosure describes, in part, techniques for operating a search query user interface to allow seamless creating, editing and/or refining of a search query using various interactive functions. The techniques described herein may display a search query divided into segments. A selection of a segment of the search query may then be received. One or more alternatives to the selected segment may then be presented. Next, a selection of one of the presented alternatives may be received. As a result, the search query may be altered using the selected alternative. Furthermore, the techniques described herein allow a user to operate on a search query using query substitution, expansion, association and/or history functions.
    Type: Application
    Filed: November 19, 2012
    Publication date: May 22, 2014
    Applicant: Microsoft Corporation
    Inventors: Shuming Shi, Chin-Yew Lin, Yunbo Cao
  • Publication number: 20130212103
    Abstract: Some implementations disclosed herein provide techniques and arrangements to train a blocking scheme using both labeled data and unlabeled data. For example, training the blocking scheme may include iteratively: learning a conjunction, identifying first matches in the labeled data and the unlabeled data that are uncovered by the conjunction, and identifying second matches in the labeled data and the unlabeled data that are covered by the conjunction. The conjunctions learned in the iterations may be combined using a disjunction. A search engine may use the blocking scheme when searching for records that match an entity.
    Type: Application
    Filed: February 13, 2012
    Publication date: August 15, 2013
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Chin-Yew Lin, Pei Yue, Zhiyuan Chen
  • Publication number: 20120124077
    Abstract: Embodiments for a Mining Data Records based on Anchor Trees (MiBAT) process are disclosed. In accordance with at least one embodiment, the MiBAT process extracts data records containing user-generated content from web documents. The web document is processed into a Document Object Model (DOM) tree in which sub-trees of the DOM tree represent the data records of the web document. Domain constraints are used to locate structured portions of the DOM tree. Anchor trees are then identified as sets of sibling sub-trees that contain the domain constraints. The anchor trees are then used to determine a record boundary (i.e., the start offset and length) of the data records. Finally, the data records are extracted based on the anchor trees and the record boundaries.
    Type: Application
    Filed: November 12, 2010
    Publication date: May 17, 2012
    Applicant: Microsoft Corporation
    Inventors: Xinying Song, Yunbo Cao, Chin-Yew Lin
  • Publication number: 20120124086
    Abstract: Described herein are techniques for extracting data records containing user-generated content from documents. The documents may be processed into document trees in which sub-trees represent the data records of the document. Domain constraints may be used to locate structured portions of the document tree. For example, anchor trees may be identified as sets of sibling sub-trees with similar tag paths that contain the domain constraints. The anchor trees may then be used to determine a record boundary (e.g., the start offset and length) of the data records. Finally, the data records may be extracted based on the anchor trees and the record boundaries.
    Type: Application
    Filed: January 23, 2012
    Publication date: May 17, 2012
    Applicant: Microsoft Corporation
    Inventors: Xinying Song, Zhiyuan Chen, Yunbo Cao, Chin-Yew Lin
  • Publication number: 20120109949
    Abstract: A two-stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank-ordered list.
    Type: Application
    Filed: January 4, 2012
    Publication date: May 3, 2012
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li
  • Patent number: 8156097
    Abstract: A two-stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank-ordered list.
    Type: Grant
    Filed: November 14, 2005
    Date of Patent: April 10, 2012
    Assignee: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li
  • Patent number: 8112269
    Abstract: A question search system provides a collection of questions having words for use in evaluating the utility of the questions based on a language model. The question search system calculates n-gram probabilities for words within the questions of the collection. The n-gram probability of a word for a sequence of n−1 words indicates the probability of that word being next after that sequence in the collection of questions. The n-gram probabilities for the words of the collection represent the language model of the collection. The question search system calculates a language model utility score for each question within the collection that indicates the likelihood that a question is repeatedly asked by users. The question search system derives the language model utility score for a question from the n-gram probabilities of the words within that question.
    Type: Grant
    Filed: August 25, 2008
    Date of Patent: February 7, 2012
    Assignee: Microsoft Corporation
    Inventors: Yunbo Cao, Chin-Yew Lin
  • Patent number: 8027973
    Abstract: A method and system for determining the relevance of questions to a queried question based on topics and focuses of the questions is provided. A question search system provides a collection of questions with topics and focuses. Upon receiving a queried question, the question search system identifies a queried topic and queried focus of the queried question. The question search system generates a score indicating the relevance of a question of the collection to the queried question based on a language model of the topic of the question and a language model of the focus of the question.
    Type: Grant
    Filed: August 4, 2008
    Date of Patent: September 27, 2011
    Assignee: Microsoft Corporation
    Inventors: Yunbo Cao, Chin-Yew Lin
  • Patent number: 8024332
    Abstract: A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.
    Type: Grant
    Filed: August 4, 2008
    Date of Patent: September 20, 2011
    Assignee: Microsoft Corporation
    Inventors: Yunbo Cao, Chin-Yew Lin
  • Patent number: 7966316
    Abstract: In a question answering system, the system identifies a type of question input by a user. The system then generates answer summaries that summarize answers to the input question in a format that is determined based on the type of question asked by the user. The answer summaries are output, in the corresponding format, in answer to the input question.
    Type: Grant
    Filed: April 15, 2008
    Date of Patent: June 21, 2011
    Assignee: Microsoft Corporation
    Inventors: Yunbo Cao, Chin-Yew Lin
  • Patent number: 7877383
    Abstract: A method of processing information is provided. The method includes collecting text strings of definition candidates from a data source. The definition candidates are ranked based on the text strings.
    Type: Grant
    Filed: April 27, 2005
    Date of Patent: January 25, 2011
    Assignee: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li, Jun Xu
  • Patent number: 7849097
    Abstract: A typed separable mixture model is used to mine associative relationships between sets of objects. Instead of modeling only one type of co-occurrence among the sets of objects, the typed separable mixture model can model multiple different types of co-occurrences among more than two sets of objects, and co-occurrences that exist in different contexts.
    Type: Grant
    Filed: April 12, 2007
    Date of Patent: December 7, 2010
    Assignee: Microsoft Corporation
    Inventors: Yunbo Cao, Hang Li
  • Publication number: 20100235311
    Abstract: Exemplary methods, computer-readable media, and systems are presented for leveraging question-answering knowledge from community sites by complementing product search services with a search of questions, answers, reviews and other Internet accessible content including user-generated content. Product or service information is obtained by crawling Internet-accessible Web sites including community sites. An integrated index of such information is generated. A user is able to browse questions by product or service feature, by topic, by identified comparative questions, and by question ranking (for example, interestingness or popularity).
    Type: Application
    Filed: March 13, 2009
    Publication date: September 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Chin-Yew Lin, Bo Wang
  • Publication number: 20100235343
    Abstract: Exemplary methods, computer-readable media, and systems are presented for learning to recommend questions and other user-generated submissions to community sites based on user ratings. The size of available training data is enlarged by taking into consideration questions without user ratings, which in turn benefits the learned model. Questions or other user-generated submissions are obtained by crawling Internet-accessible Web sites including community sites. Questions and other submissions, even when not tagged, voted or indicated as “popular” or “interesting” by users, are quantitatively identified as “interesting.”
    Type: Application
    Filed: September 29, 2009
    Publication date: September 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Yunbo Cao, Chin-Yew Lin, Young-In Song
  • Patent number: 7783629
    Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
    Type: Grant
    Filed: January 5, 2006
    Date of Patent: August 24, 2010
    Assignee: Microsoft Corporation
    Inventors: Hang Li, Jianfeng Gao, Yunbo Cao
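The blocking-scheme entries (patent 8843492 and publication 20130212103) describe a learned scheme formed as a disjunction of conjunctions. The sketch below shows only how such a scheme might be applied to propose candidate record pairs, not how it is trained from labeled and unlabeled data. The predicates `same_prefix` and `same_value`, the record fields and the sample records are hypothetical.

```python
from itertools import combinations


# Hypothetical blocking predicates: each returns a key derived from one attribute.
def same_prefix(attr, n):
    """Key on the first n characters of an attribute, case-insensitively."""
    return lambda record: record[attr][:n].lower()


def same_value(attr):
    """Key on the full attribute value, case-insensitively."""
    return lambda record: record[attr].lower()


def candidate_pairs(records, scheme):
    """Apply a blocking scheme given as a list of conjunctions (combined as a disjunction).

    Two records become a candidate pair if, for at least one conjunction,
    every predicate in that conjunction yields the same key for both records.
    """
    pairs = set()
    for conjunction in scheme:
        blocks = {}
        for i, record in enumerate(records):
            key = tuple(pred(record) for pred in conjunction)
            blocks.setdefault(key, []).append(i)
        for members in blocks.values():
            # Indices are appended in increasing order, so each pair is (i, j) with i < j.
            pairs.update(combinations(members, 2))
    return pairs


records = [
    {"name": "Jane Smith", "city": "Seattle"},
    {"name": "Jan Smith", "city": "Seattle"},
    {"name": "Jane Smith", "city": "Boston"},
    {"name": "Bob Jones", "city": "Seattle"},
]
scheme = [
    [same_prefix("name", 3), same_value("city")],  # conjunction 1
    [same_value("name")],                           # conjunction 2
]
print(sorted(candidate_pairs(records, scheme)))  # [(0, 1), (0, 2)]
```

Record 3 shares a city with record 0 but fails the name-prefix predicate, so no conjunction pairs them. Blocking trades some recall for far fewer comparisons than checking every pair; in the patents, choosing which conjunctions to keep is the learned part.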
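Patent 8112269 derives a question's language model utility score from the n-gram probabilities of its words. A minimal sketch follows, assuming bigrams (n = 2), add-one smoothing, a hypothetical `<s>` start marker and a length-normalized log probability as the score; none of these choices is stated in the abstract.

```python
import math
from collections import Counter

BOS = "<s>"  # assumed start-of-question marker


def train_bigram_model(questions):
    """Count bigrams and their left contexts over a collection of questions."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for question in questions:
        tokens = [BOS] + question.lower().split()
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, model):
    """P(word | prev) with add-one smoothing; the extra +1 reserves mass for unseen words."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab) + 1)


def utility_score(question, model):
    """Average log bigram probability of the question's words (higher = more typical)."""
    tokens = [BOS] + question.lower().split()
    if len(tokens) == 1:
        return float("-inf")  # empty question: no evidence of utility
    log_p = sum(math.log(bigram_prob(p, w, model)) for p, w in zip(tokens, tokens[1:]))
    return log_p / (len(tokens) - 1)


collection = [
    "how do i reset my password",
    "how do i reset my router",
    "how do i change my password",
    "why is the sky blue",
]
model = train_bigram_model(collection)
common = utility_score("how do i reset my password", model)
rare = utility_score("why is the sky blue", model)
print(common > rare)  # True
```

Normalizing by length keeps longer questions from being penalized simply for having more factors in the product of probabilities.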
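The two-stage expert-finding entries (patents 8156097 and 8849787, publication 20120109949) combine a relevance model with a co-occurrence model. The sketch below assumes one plausible combination, summing P(d | q) times P(e | d) over relevant documents. The term-overlap relevance score, the mention-share co-occurrence score, the document fields and the sample data are stand-ins, not the models the patents claim.

```python
from collections import Counter, defaultdict


def relevance_model(query, docs):
    """Stage 1 (assumed form): P(d | q) from query-term overlap, normalized over relevant docs."""
    q_terms = set(query.lower().split())
    if not q_terms:
        return {}
    scores = {}
    for doc_id, doc in docs.items():
        overlap = len(q_terms & set(doc["text"].lower().split())) / len(q_terms)
        if overlap > 0:
            scores[doc_id] = overlap
    total = sum(scores.values())
    return {doc_id: s / total for doc_id, s in scores.items()}


def cooccurrence_model(doc):
    """Stage 2 (assumed form): P(e | d) as each person's share of the document's mentions."""
    mentions = Counter(doc["people"])
    total = sum(mentions.values())
    return {person: n / total for person, n in mentions.items()} if total else {}


def rank_experts(query, docs):
    """Score each person as the sum over d of P(d | q) * P(e | d); return a rank-ordered list."""
    scores = defaultdict(float)
    for doc_id, p_doc in relevance_model(query, docs).items():
        for person, p_person in cooccurrence_model(docs[doc_id]).items():
            scores[person] += p_doc * p_person
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "d1": {"text": "question retrieval with language models", "people": ["alice", "alice", "bob"]},
    "d2": {"text": "web data record extraction", "people": ["carol"]},
    "d3": {"text": "language models for question answering", "people": ["bob"]},
}
for person, score in rank_experts("question language models", docs):
    print(person, round(score, 3))
# bob 0.667
# alice 0.333
```

Because both stages are normalized, the expert scores themselves form a distribution over people, which makes them directly comparable across queries.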