Patents by Inventor Wen-tau Yih
Wen-tau Yih has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11250331
Abstract: A technique is described herein for processing documents in a time-efficient and accurate manner. In a training phase, the technique generates a set of initial training examples by associating entity mentions in a text corpus with corresponding entity identifiers. Each entity identifier uniquely identifies an entity in a particular ontology. The technique then removes noisy training examples from the set of initial training examples, to provide a set of filtered training examples. The technique then applies a machine-learning process to generate a linking component based, in part, on the set of filtered training examples. In an application phase, the technique uses the linking component to link input entity mentions with corresponding entity identifiers. Various application systems can leverage the capabilities of the linking component, including a search system, a document-creation system, etc.
Type: Grant
Filed: October 31, 2017
Date of Patent: February 15, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Christopher Brian Quirk, Hoifung Poon, Wen-tau Yih, Hai Wang
-
Patent number: 10592519
Abstract: A processing unit can determine multiple representations associated with a statement, e.g., subject or predicate representations. In some examples, the representations can lack representation of semantics of the statement. The computing device can determine a computational model of the statement based at least in part on the representations. The computing device can receive a query, e.g., via a communications interface. The computing device can determine at least one query representation, e.g., a subject, predicate, or entity representation. The computing device can then operate the model using the query representation to provide a model output. The model output can represent a relationship between the query representations and information in the model. The computing device can, e.g., transmit an indication of the model output via the communications interface.
Type: Grant
Filed: March 29, 2016
Date of Patent: March 17, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Xiaodong He, Li Deng, Jianfeng Gao, Wen-tau Yih, Moontae Lee, Paul Smolensky
-
Publication number: 20190130282
Abstract: A technique is described herein for processing documents in a time-efficient and accurate manner. In a training phase, the technique generates a set of initial training examples by associating entity mentions in a text corpus with corresponding entity identifiers. Each entity identifier uniquely identifies an entity in a particular ontology. The technique then removes noisy training examples from the set of initial training examples, to provide a set of filtered training examples. The technique then applies a machine-learning process to generate a linking component based, in part, on the set of filtered training examples. In an application phase, the technique uses the linking component to link input entity mentions with corresponding entity identifiers. Various application systems can leverage the capabilities of the linking component, including a search system, a document-creation system, etc.
Type: Application
Filed: October 31, 2017
Publication date: May 2, 2019
Inventors: Christopher Brian QUIRK, Hoifung POON, Wen-tau YIH, Hai WANG
-
Patent number: 10255269
Abstract: Long short term memory units that accept a non-predefined number of inputs are used to provide natural language relation extraction over a user-specified range on content. Content written for human consumption is parsed with distant supervision in segments (e.g., sentences, paragraphs, chapters) to determine relationships between various words within and between those segments.
Type: Grant
Filed: December 30, 2016
Date of Patent: April 9, 2019
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Christopher Brian Quirk, Kristina Nikolova Toutanova, Wen-tau Yih, Hoifung Poon, Nanyun Peng
-
Publication number: 20180189269
Abstract: Long short term memory units that accept a non-predefined number of inputs are used to provide natural language relation extraction over a user-specified range on content. Content written for human consumption is parsed with distant supervision in segments (e.g., sentences, paragraphs, chapters) to determine relationships between various words within and between those segments.
Type: Application
Filed: December 30, 2016
Publication date: July 5, 2018
Applicant: Microsoft Technology Licensing, LLC
Inventors: Christopher Brian Quirk, Kristina Nikolova Toutanova, Wen-tau Yih, Hoifung Poon, Nanyun Peng
-
Publication number: 20170286494
Abstract: A processing unit can determine multiple representations associated with a statement, e.g., subject or predicate representations. In some examples, the representations can lack representation of semantics of the statement. The computing device can determine a computational model of the statement based at least in part on the representations. The computing device can receive a query, e.g., via a communications interface. The computing device can determine at least one query representation, e.g., a subject, predicate, or entity representation. The computing device can then operate the model using the query representation to provide a model output. The model output can represent a relationship between the query representations and information in the model. The computing device can, e.g., transmit an indication of the model output via the communications interface.
Type: Application
Filed: March 29, 2016
Publication date: October 5, 2017
Inventors: Xiaodong He, Li Deng, Jianfeng Gao, Wen-tau Yih, Moontae Lee, Paul Smolensky
-
Publication number: 20170193157
Abstract: Drug combinations offer promising treatment for some conditions such as cancer. However, the large number of available drug combinations makes it impractical to try all possible combinations. Machine-learning techniques described in this disclosure train a classification algorithm. Once trained, the classification algorithm uses genomic data from a specific patient to perform in silico tests of drugs and drug combinations against the genomic data to determine which therapies are likely to be effective for treating a condition of the specific patient.
Type: Application
Filed: December 30, 2015
Publication date: July 6, 2017
Inventors: Christopher B. Quirk, Wen-tau Yih, Hoifung Poon, Kristina Toutanova, Stephen William Mayhew, Sheng Wang
-
Patent number: 9183173
Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near duplicate documents.
Type: Grant
Filed: March 2, 2010
Date of Patent: November 10, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Wen-tau Yih, Christopher A. Meek, Hannaneh Hajishirzi
-
Patent number: 9009148
Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.
Type: Grant
Filed: December 19, 2011
Date of Patent: April 14, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jianfeng Gao, Kristina Toutanova, Wen-tau Yih
-
Patent number: 8996515
Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in and out-of document phrases should be close to each other.
Type: Grant
Filed: September 11, 2012
Date of Patent: March 31, 2015
Assignee: Microsoft Corporation
Inventors: Wen-tau Yih, Christopher A. Meek
-
Publication number: 20140249799
Abstract: Relational similarity measuring embodiments are presented that generally involve creating a relational similarity model that, given two pairs of words, is used to measure a degree of relational similarity between the two relations respectively exhibited by these word pairs. In one exemplary embodiment this involves creating a combined relational similarity model from a plurality of relational similarity models. This is generally accomplished by first selecting a plurality of relational similarity models, each of which measures relational similarity between two pairs of words, and each of which is trained or created using a different method or linguistic/textual resource. The selected models are then combined to form the combined relational similarity model. The combined model inputs two pairs of words and outputs a relational similarity indicator representing a measure of the degree of relational similarity between the word pairs.
Type: Application
Filed: March 4, 2013
Publication date: September 4, 2014
Applicant: Microsoft Corporation
Inventors: Wen-tau Yih, Geoffrey Zweig, Christopher Meek, Alisa Zhila, Tomas Mikolov
-
Publication number: 20140067368
Abstract: A document-term matrix may be generated based on a corpus. A term representation matrix may be generated based on modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix.
Type: Application
Filed: August 29, 2012
Publication date: March 6, 2014
Applicant: Microsoft Corporation
Inventors: Wen-tau Yih, Geoffrey G. Zweig, John C. Platt
-
Publication number: 20130159320
Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.
Type: Application
Filed: December 19, 2011
Publication date: June 20, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Jianfeng Gao, Kristina Toutanova, Wen-tau Yih
-
Publication number: 20120330978
Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in and out-of document phrases should be close to each other.
Type: Application
Filed: September 11, 2012
Publication date: December 27, 2012
Applicant: Microsoft Corporation
Inventors: Wen-tau Yih, Christopher A. Meek
-
Publication number: 20120323968
Abstract: A model for mapping the raw text representation of a text object to a vector space is disclosed. A function is defined for computing a similarity score given two output vectors. A loss function is defined for computing an error based on the similarity scores and the labels of pairs of vectors. The parameters of the model are tuned to minimize the loss function. The label of two vectors indicates a degree of similarity of the objects. The label may be a binary number or a real-valued number. The function for computing similarity scores may be a cosine, Jaccard, or differentiable function. The loss function may compare pairs of vectors to their labels. Each element of the output vector is a linear or non-linear function of the terms of an input vector. The text objects may be different types of documents and two different models may be trained concurrently.
Type: Application
Filed: June 14, 2011
Publication date: December 20, 2012
Applicant: Microsoft Corporation
Inventors: Wen-tau Yih, Kristina N. Toutanova, Christopher A. Meek, John C. Platt
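The abstract above can be illustrated with a minimal sketch of the forward computation only: a linear map from term vectors to output vectors, a cosine similarity score, and a loss comparing that score with a label. The function names, the toy weight matrix, and the squared-error loss are assumptions chosen for illustration; the abstract does not specify them, and the parameter-tuning step is not shown.

```python
import math

def project(weights, x):
    """Linear map: each output element is a weighted sum of the input terms."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def squared_loss(score, label):
    """Assumed loss: half the squared gap between similarity score and label."""
    return 0.5 * (score - label) ** 2

# Hypothetical 2x3 weight matrix and two term vectors over a 3-term vocabulary.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
x1 = [1.0, 2.0, 0.0]
x2 = [2.0, 4.0, 0.0]

score = cosine(project(W, x1), project(W, x2))
loss = squared_loss(score, label=1.0)  # label 1.0 marks the pair as similar
```

In a full system the entries of `W` would be adjusted, for example by gradient descent, so that the loss over many labeled pairs decreases.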
-
Patent number: 8290946
Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in and out-of document phrases should be close to each other.
Type: Grant
Filed: June 24, 2008
Date of Patent: October 16, 2012
Assignee: Microsoft Corporation
Inventors: Wen-tau Yih, Christopher A. Meek
-
Patent number: 8135728
Abstract: Extraction analysis techniques biased, in part, by query frequency information from a query log file and/or search engine cache are employed along with machine learning processes to determine candidate keywords and/or phrases of web documents. Web oriented features associated with the candidate keywords and/or phrases are also utilized to analyze the web documents. A keyword and/or phrase extraction mechanism can be utilized to score keywords and/or phrases in a web document and estimate a likelihood that the keywords and/or phrases are relevant, for example, in an advertising system and the like.
Type: Grant
Filed: January 3, 2007
Date of Patent: March 13, 2012
Assignee: Microsoft Corporation
Inventors: Wen-tau Yih, Joshua T. Goodman, Vitor Rocha de Carvalho
-
Publication number: 20110219012
Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near duplicate documents.
Type: Application
Filed: March 2, 2010
Publication date: September 8, 2011
Inventors: Wen-tau Yih, Christopher A. Meek, Hannaneh Hajishirzi
-
Patent number: 7788292
Abstract: The claimed subject matter provides systems and/or methods for normalizing document representations for use with Naïve Bayes. The system can include devices and components that determine norms associated with documents by aggregating absolute term weight values associated with the documents, and further ascertain term weights for features associated with the documents, and thereafter divide the term weights for the features associated with the documents by the norms associated with the documents to produce a normalized document representation that can be utilized by arbitrary linear classifiers.
Type: Grant
Filed: December 12, 2007
Date of Patent: August 31, 2010
Assignee: Microsoft Corporation
Inventors: Aleksander Kolcz, Wen-tau Yih
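The normalization described in this abstract can be sketched as follows, assuming that "aggregating absolute term weight values" means summing them (an L1 norm). The function name `l1_normalize` and the sample weights are hypothetical and serve only as an illustration.

```python
def l1_normalize(term_weights):
    """Divide each term weight by the sum of absolute weights in the document."""
    norm = sum(abs(w) for w in term_weights.values())
    if norm == 0:
        # Empty or all-zero document: there is nothing to scale.
        return dict(term_weights)
    return {term: w / norm for term, w in term_weights.items()}

# Hypothetical term weights for one document.
doc = {"patent": 2.0, "search": 1.0, "spam": -1.0}
normalized = l1_normalize(doc)
```

After normalization the absolute weights of a document sum to 1, so long and short documents contribute on a comparable scale to a linear classifier.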
-
Publication number: 20100211641
Abstract: Techniques and systems are described that utilize a scalable, “light-weight” user model, which can be combined with a traditional global email spam filter, to determine whether an email message sent to a target user is a desired email. A global email model is trained with a set of email messages to detect desired emails, and a user email model is also trained to detect desired emails. Training the user email model may comprise one or more of: using labeled training emails; using target user-based information; and using information from the global email model. Global and user model scores for an email sent to a target user can be combined to produce an email score. The email score can be compared with a desired email threshold to determine whether the email message sent to the target user is desired or not.
Type: Application
Filed: February 16, 2009
Publication date: August 19, 2010
Applicant: Microsoft Corporation
Inventors: Wen-tau Yih, Christopher A. Meek, Robert L. McCann, Ming-Wei Chang
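The final two steps of this abstract, combining the global and user model scores and comparing the result with a threshold, can be sketched as below. The abstract does not say how the scores are combined; the weighted average, the function names, and the default values are assumptions made for illustration.

```python
def combined_email_score(global_score, user_score, user_weight=0.5):
    """Assumed combination rule: a weighted average of the two model scores."""
    return (1.0 - user_weight) * global_score + user_weight * user_score

def is_desired(global_score, user_score, threshold=0.5, user_weight=0.5):
    """Treat the email as desired when the combined score reaches the threshold."""
    return combined_email_score(global_score, user_score, user_weight) >= threshold

# Hypothetical scores in [0, 1], where higher means more likely to be desired.
keep = is_desired(global_score=0.9, user_score=0.3)
drop = is_desired(global_score=0.2, user_score=0.4)
```

Raising `user_weight` lets the per-user model override the global filter more often, which is one way to express the personalization the abstract describes.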