Patents by Inventor Yunbo Cao
Yunbo Cao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7707204
Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
Type: Grant
Filed: December 13, 2005
Date of Patent: April 27, 2010
Assignee: Microsoft Corporation
Inventors: Hang Li, Jianfeng Gao, Yunbo Cao
-
Publication number: 20100049498
Abstract: A question search system provides a collection of questions having words for use in evaluating the utility of the questions based on a language model. The question search system calculates n-gram probabilities for words within the questions of the collection. The n-gram probability of a word for a sequence of n-1 words indicates the probability of that word being next after that sequence in the collection of questions. The n-gram probabilities for the words of the collection represent the language model of the collection. The question search system calculates a language model utility score for each question within a collection that indicates the likelihood that a question is repeatedly asked by users. The question search system derives the language model utility score for a question from the n-gram probabilities of the words within that question.
Type: Application
Filed: August 25, 2008
Publication date: February 25, 2010
Applicant: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin
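The scoring idea in this abstract can be illustrated with a minimal sketch. It assumes a bigram model (n = 2), add-one smoothing, and a length-normalized log-probability as the utility score; none of these choices are stated in the abstract, and the function names `train_bigram_model` and `utility_score` are hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(questions):
    """Count bigrams over a question collection (assumed: n = 2, whitespace tokens)."""
    context_counts = Counter()   # how often each word appears as a preceding context
    bigram_counts = Counter()    # how often each (previous word, word) pair appears
    vocab = set()
    for question in questions:
        tokens = ["<s>"] + question.lower().split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, len(vocab)


def utility_score(question, model):
    """Average log-probability of the question under the bigram model.

    Add-one smoothing and length normalization are illustrative assumptions.
    """
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + question.lower().split()
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        log_prob += math.log(p)
    return log_prob / max(len(tokens) - 1, 1)


collection = [
    "how do i reset my password",
    "how do i reset my router",
    "how do i change my password",
    "why is the sky blue",
]
model = train_bigram_model(collection)
common = utility_score("how do i reset my password", model)
unusual = utility_score("blue sky the is why", model)
```

Under these assumptions, a question built from word sequences that recur in the collection receives a higher score than one built from unseen sequences.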
-
Publication number: 20100030769
Abstract: A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.
Type: Application
Filed: August 4, 2008
Publication date: February 4, 2010
Applicant: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin
-
Publication number: 20100030770
Abstract: A method and system for determining the relevance of questions to a queried question based on topics and focuses of the questions is provided. A question search system provides a collection of questions with topics and focuses. Upon receiving a queried question, the question search system identifies a queried topic and queried focus of the queried question. The question search system generates a score indicating the relevance of a question of the collection to the queried question based on a language model of the topic of the question and a language model of the focus of the question.
Type: Application
Filed: August 4, 2008
Publication date: February 4, 2010
Applicant: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin
-
Patent number: 7644074
Abstract: A method of finding documents. A method of finding documents comprising, ranking documents according to relevance to form a ranked relevance list, ranking documents according to type to form a ranked type list, and interpolating the ranked relevance list and the ranked type list to form a list of documents ranked by relevance and type.
Type: Grant
Filed: December 22, 2005
Date of Patent: January 5, 2010
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li, Jun Xu
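As a rough illustration of combining a relevance ranking with a type ranking, the sketch below uses linear interpolation of per-document scores. The abstract does not specify the interpolation formula; the weight `lam`, the score dictionaries, and the function name `interpolate_rankings` are assumptions made for this example.

```python
def interpolate_rankings(relevance_scores, type_scores, lam=0.5):
    """Rank documents by a weighted mix of relevance and type scores.

    lam = 1.0 ranks by relevance only; lam = 0.0 ranks by type only.
    Documents missing from one list get a score of 0.0 for that list (assumption).
    """
    doc_ids = set(relevance_scores) | set(type_scores)
    combined = {
        doc: lam * relevance_scores.get(doc, 0.0)
        + (1 - lam) * type_scores.get(doc, 0.0)
        for doc in doc_ids
    }
    # Sort by descending combined score; break ties by document id for stability.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


relevance = {"a": 0.9, "b": 0.6, "c": 0.2}   # hypothetical relevance scores
doc_type = {"a": 0.1, "b": 0.8, "c": 0.9}    # hypothetical "is the wanted type" scores
mixed = interpolate_rankings(relevance, doc_type, lam=0.5)
relevance_only = interpolate_rankings(relevance, doc_type, lam=1.0)
```

With equal weighting, document "b" moves to the top because it scores reasonably on both lists, while relevance-only weighting keeps the original relevance order.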
-
Publication number: 20090259642
Abstract: In a question answering system, the system identifies a type of question input by a user. The system then generates answer summaries that summarize answers to the input question in a format that is determined based on the type of question asked by the user. The answer summaries are output, in the corresponding format, in answer to the input question.
Type: Application
Filed: April 15, 2008
Publication date: October 15, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Yunbo Cao, Chin-Yew Lin
-
Publication number: 20090253112
Abstract: The present system graphs topic terms in stored cQA questions and also converts a submitted question into a graph of topic terms. Topic terms that correspond to a question topic are delineated from topic terms that correspond to question focus. New questions are recommended to the user based on a comparison between the topics of the new questions and the topic of the submitted question as well as the focus of the new questions and the focus of the submitted question.
Type: Application
Filed: April 7, 2008
Publication date: October 8, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Yunbo Cao, Chin-Yew Lin
-
Patent number: 7593934
Abstract: A method and system for generating a ranking function to rank the relevance of documents to a query is provided. The ranking system learns a ranking function from training data that includes queries, resultant documents, and relevance of each document to its query. The ranking system learns a ranking function using the training data by weighting incorrect rankings of relevant documents more heavily than the incorrect rankings of not relevant documents so that more emphasis is placed on correctly ranking relevant documents. The ranking system may also learn a ranking function using the training data by normalizing the contribution of each query to the ranking function so that it is independent of the number of relevant documents of each query.
Type: Grant
Filed: July 28, 2006
Date of Patent: September 22, 2009
Assignee: Microsoft Corporation
Inventors: Hang Li, Jun Xu, Yunbo Cao, Tie-Yan Liu
-
Patent number: 7590608
Abstract: A cascaded processing approach is used to clean noisy electronic mail or other text messaging data. Non-text filtering is first performed on the noisy data to filter out non-text items in the data. Text normalization is then performed on the filtered data to provide cleaned data. The cleaned data can be used in one or more of a wide variety of other applications or processing systems.
Type: Grant
Filed: December 2, 2005
Date of Patent: September 15, 2009
Assignee: Microsoft Corporation
Inventors: Hang Li, Yunbo Cao, ZhaoHui Tang
-
Patent number: 7512582
Abstract: Collaborative bootstrapping with uncertainty reduction for increased classifier performance. One classifier selects a portion of data that is uncertain with respect to the classifier and a second classifier labels the portion. Uncertainty reduction includes parallel processing where the second classifier also selects an uncertain portion for the first classifier to label. Uncertainty reduction can be incorporated into existing or new co-training or bootstrapping, including bilingual bootstrapping.
Type: Grant
Filed: December 10, 2003
Date of Patent: March 31, 2009
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li
-
Publication number: 20090083096
Abstract: A method for handling product reviews can detect a first quality product review from a second quality product review. The first and second quality product reviews can be associated with a product. The first quality product review can be filtered. An opinion segment in the second quality product review can be identified and the polarity can be determined of the opinion segment. An opinion set can be generated with the opinion segment for a product feature. A score (or weight) can be aggregated of segments in the opinion set for the product feature.
Type: Application
Filed: September 20, 2007
Publication date: March 26, 2009
Applicant: Microsoft Corporation
Inventors: Yunbo Cao, Chin-Yew Lin, Ming Zhou
-
Patent number: 7469251
Abstract: An information extraction model is trained on format features identified within labeled training documents. Information from a document is extracted by assigning labels to units based on format features of the units within the document. A begin label and end label are identified and the information is extracted between the begin label and the end label. The extracted information can be used in various document processing tasks such as ranking.
Type: Grant
Filed: July 29, 2005
Date of Patent: December 23, 2008
Assignee: Microsoft Corporation
Inventors: Hang Li, Ruihua Song, Yunbo Cao, Dmitriy Meyerzon
-
Patent number: 7461056
Abstract: A method for extracting key terms and associated key terms for use in text mining is provided. The method includes receiving unstructured text documents, such as emails over a customer service system. Term candidates are extracted based on identifying consecutive word strings satisfying a context independency threshold. Term candidates are weighted using mutual information to generate a list of weighted terms. The weighted terms are then recounted. Terms are associated based on Chi-square values. Associated terms can then be used for information retrieval. A user interface can be personalized with individual user profiles.
Type: Grant
Filed: February 9, 2005
Date of Patent: December 2, 2008
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li, Olivier Ribet, Benjamin Martin
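The abstract mentions associating terms based on Chi-square values without giving details. As a hedged illustration only, the sketch below computes the standard Pearson chi-square statistic for a 2x2 co-occurrence table of two terms; how the patented method builds its counts or applies thresholds is not stated, and the function name and count layout are assumptions.

```python
def chi_square_2x2(both, only_first, only_second, neither):
    """Pearson chi-square statistic for a 2x2 term co-occurrence table.

    both:        documents containing both terms
    only_first:  documents containing only the first term
    only_second: documents containing only the second term
    neither:     documents containing neither term
    """
    total = both + only_first + only_second + neither
    denominator = (
        (both + only_first)
        * (only_second + neither)
        * (both + only_second)
        * (only_first + neither)
    )
    if denominator == 0:
        # A term that never occurs (or always occurs) carries no association signal.
        return 0.0
    return total * (both * neither - only_first * only_second) ** 2 / denominator


strongly_associated = chi_square_2x2(10, 0, 0, 10)  # terms always appear together
independent = chi_square_2x2(5, 5, 5, 5)            # terms co-occur at chance level
```

A larger value indicates that the two terms co-occur more (or less) often than independence would predict, which is the usual reason for using the statistic to pair related terms.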
-
Publication number: 20080249764
Abstract: A sentiment classifier is described. In one implementation, a system applies both full text and complex feature analyses to sentences of a product review. Each analysis is weighted prior to linear combination into a final sentiment prediction. A full text model and a complex features model can be trained separately offline to support online full text analysis and complex features analysis. Complex features include opinion indicators, negation patterns, sentiment-specific sections of the product review, user ratings, sequence of text chunks, and sentence types and lengths. A Conditional Random Field (CRF) framework provides enhanced sentiment classification for each segment of a complex sentence to enhance sentiment prediction.
Type: Application
Filed: December 5, 2007
Publication date: October 9, 2008
Applicant: Microsoft Corporation
Inventors: Shen Huang, Ling Bao, Yunbo Cao, Zheng Chen, Chin-Yew Lin, Christoph R. Ponath, Jian-Tao Sun, Ming Zhou, Jian Wang
-
Publication number: 20080147654
Abstract: A typed separable mixture model is used to mine associative relationships between sets of objects. Instead of modeling only one type of co-occurrence among the sets of objects, the typed separable mixture model can model multiple different types of co-occurrences among more than two sets of objects, and co-occurrences that exist in different contexts.
Type: Application
Filed: April 12, 2007
Publication date: June 19, 2008
Applicant: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li
-
Publication number: 20080027925
Abstract: A method and system for generating a ranking function to rank the relevance of documents to a query is provided. The ranking system learns a ranking function from training data that includes queries, resultant documents, and relevance of each document to its query. The ranking system learns a ranking function using the training data by weighting incorrect rankings of relevant documents more heavily than the incorrect rankings of not relevant documents so that more emphasis is placed on correctly ranking relevant documents. The ranking system may also learn a ranking function using the training data by normalizing the contribution of each query to the ranking function so that it is independent of the number of relevant documents of each query.
Type: Application
Filed: July 28, 2006
Publication date: January 31, 2008
Applicant: Microsoft Corporation
Inventors: Hang Li, Jun Xu, Yunbo Cao, Tie-Yan Liu
-
Patent number: 7299228
Abstract: The present invention relates to extracting information from an information source. During extraction, strings in the information source are accessed. These strings in the information source are matched with generalized extraction patterns that include words and wildcards. The wildcards denote that at least one word in an individual string can be skipped in order to match the individual string to an individual generalized extraction pattern.
Type: Grant
Filed: December 11, 2003
Date of Patent: November 20, 2007
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li
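The matching behavior described here, where a wildcard lets at least one word be skipped, can be sketched as follows. The pattern representation (a list of tokens with `"*"` as the wildcard), the whole-string matching rule, and the function name `matches` are assumptions for illustration, not the patent's own notation.

```python
def matches(pattern, words):
    """Return True if the word list matches the pattern exactly.

    A "*" token in the pattern skips one or more words; every other
    token must equal the corresponding word.
    """
    if not pattern:
        return not words
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Try skipping 1, 2, ... words, then match the remaining pattern.
        return any(matches(rest, words[k:]) for k in range(1, len(words) + 1))
    return bool(words) and words[0] == head and matches(rest, words[1:])


pattern = ["acquired", "*", "for"]  # hypothetical generalized extraction pattern
skips_three = matches(pattern, "acquired the small startup for".split())
skips_none = matches(pattern, "acquired for".split())
```

Because the wildcard must consume at least one word, the second string does not match even though both literal tokens are present.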
-
Patent number: 7284006
Abstract: A computer-implemented method is provided that includes receiving a document and determining a file type for the document. In addition, the document is segmented into blocks of text as a function of the file type and at least one keyword and a summary is generated for the document.
Type: Grant
Filed: November 14, 2003
Date of Patent: October 16, 2007
Assignee: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li
-
Publication number: 20070150473
Abstract: A method of finding documents. A method of finding documents comprising, ranking documents according to relevance to form a ranked relevance list, ranking documents according to type to form a ranked type list, and combining the ranked relevance list and the ranked type list to form a list of documents ranked by relevance and type.
Type: Application
Filed: May 16, 2006
Publication date: June 28, 2007
Applicant: Microsoft Corporation
Inventors: Hang Li, Yunbo Cao, Jun Xu
-
Publication number: 20070150472
Abstract: A method of finding documents. A method of finding documents comprising, ranking documents according to relevance to form a ranked relevance list, ranking documents according to type to form a ranked type list, and interpolating the ranked relevance list and the ranked type list to form a list of documents ranked by relevance and type.
Type: Application
Filed: December 22, 2005
Publication date: June 28, 2007
Applicant: Microsoft Corporation
Inventors: Yunbo Cao, Hang Li, Jun Xu