Patents by Inventor Wen-tau Yih

Wen-tau Yih has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Distant supervision for entity linking with filtering of noise

Patent number: 11250331

Abstract: A technique is described herein for processing documents in a time-efficient and accurate manner. In a training phase, the technique generates a set of initial training examples by associating entity mentions in a text corpus with corresponding entity identifiers. Each entity identifier uniquely identifies an entity in a particular ontology. The technique then removes noisy training examples from the set of initial training examples, to provide a set of filtered training examples. The technique then applies a machine-learning process to generate a linking component based, in part, on the set of filtered training examples. In an application phase, the technique uses the linking component to link input entity mentions with corresponding entity identifiers. Various application systems can leverage the capabilities of the linking component, including a search system, a document-creation system, etc.

Type: Grant

Filed: October 31, 2017

Date of Patent: February 15, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Christopher Brian Quirk, Hoifung Poon, Wen-tau Yih, Hai Wang
Computational-model operation using multiple subject representations

Patent number: 10592519

Abstract: A processing unit can determine multiple representations associated with a statement, e.g., subject or predicate representations. In some examples, the representations can lack representation of semantics of the statement. The computing device can determine a computational model of the statement based at least in part on the representations. The computing device can receive a query, e.g., via a communications interface. The computing device can determine at least one query representation, e.g., a subject, predicate, or entity representation. The computing device can then operate the model using the query representation to provide a model output. The model output can represent a relationship between the query representations and information in the model. The computing device can, e.g., transmit an indication of the model output via the communications interface.

Type: Grant

Filed: March 29, 2016

Date of Patent: March 17, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Xiaodong He, Li Deng, Jianfeng Gao, Wen-tau Yih, Moontae Lee, Paul Smolensky
Distant Supervision for Entity Linking with Filtering of Noise

Publication number: 20190130282

Abstract: A technique is described herein for processing documents in a time-efficient and accurate manner. In a training phase, the technique generates a set of initial training examples by associating entity mentions in a text corpus with corresponding entity identifiers. Each entity identifier uniquely identifies an entity in a particular ontology. The technique then removes noisy training examples from the set of initial training examples, to provide a set of filtered training examples. The technique then applies a machine-learning process to generate a linking component based, in part, on the set of filtered training examples. In an application phase, the technique uses the linking component to link input entity mentions with corresponding entity identifiers. Various application systems can leverage the capabilities of the linking component, including a search system, a document-creation system, etc.

Type: Application

Filed: October 31, 2017

Publication date: May 2, 2019

Inventors: Christopher Brian QUIRK, Hoifung POON, Wen-tau YIH, Hai WANG
Graph long short term memory for syntactic relationship discovery

Patent number: 10255269

Abstract: Long short term memory units that accept a non-predefined number of inputs are used to provide natural language relation extraction over a user-specified range on content. Content written for human consumption is parsed with distant supervision in segments (e.g., sentences, paragraphs, chapters) to determine relationships between various words within and between those segments.

Type: Grant

Filed: December 30, 2016

Date of Patent: April 9, 2019

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Christopher Brian Quirk, Kristina Nikolova Toutanova, Wen-tau Yih, Hoifung Poon, Nanyun Peng
GRAPH LONG SHORT TERM MEMORY FOR SYNTACTIC RELATIONSHIP DISCOVERY

Publication number: 20180189269

Abstract: Long short term memory units that accept a non-predefined number of inputs are used to provide natural language relation extraction over a user-specified range on content. Content written for human consumption is parsed with distant supervision in segments (e.g., sentences, paragraphs, chapters) to determine relationships between various words within and between those segments.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Microsoft Technology Licensing, LLC

Inventors: Christopher Brian Quirk, Kristina Nikolova Toutanova, Wen-tau Yih, Hoifung Poon, Nanyun Peng
COMPUTATIONAL-MODEL OPERATION USING MULTIPLE SUBJECT REPRESENTATIONS

Publication number: 20170286494

Abstract: A processing unit can determine multiple representations associated with a statement, e.g., subject or predicate representations. In some examples, the representations can lack representation of semantics of the statement. The computing device can determine a computational model of the statement based at least in part on the representations. The computing device can receive a query, e.g., via a communications interface. The computing device can determine at least one query representation, e.g., a subject, predicate, or entity representation. The computing device can then operate the model using the query representation to provide a model output. The model output can represent a relationship between the query representations and information in the model. The computing device can, e.g., transmit an indication of the model output via the communications interface.

Type: Application

Filed: March 29, 2016

Publication date: October 5, 2017

Inventors: Xiaodong He, Li Deng, Jianfeng Gao, Wen-tau Yih, Moontae Lee, Paul Smolensky
Testing of Medicinal Drugs and Drug Combinations

Publication number: 20170193157

Abstract: Drug combinations offer promising treatment for some conditions such as cancer. However, the large number of available drug combinations makes it impractical to try all possible combinations. Machine-learning techniques described in this disclosure train a classification algorithm. Once trained, the classification algorithm uses genomic data from a specific patient to perform in silico tests of drugs and drug combinations against the genomic data to determine which therapies are likely to be effective for treating a condition of the specific patient.

Type: Application

Filed: December 30, 2015

Publication date: July 6, 2017

Inventors: Christopher B. Quirk, Wen-tau Yih, Hoifung Poon, Kristina Toutanova, Stephen William Mayhew, Sheng Wang
Learning element weighting for similarity measures

Patent number: 9183173

Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near duplicate documents.

Type: Grant

Filed: March 2, 2010

Date of Patent: November 10, 2015

Assignee: Microsoft Technology Licensing, LLC

Inventors: Wen-tau Yih, Christopher A. Meek, Hannaneh Hajishirzi
Clickthrough-based latent semantic model

Patent number: 9009148

Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.

Type: Grant

Filed: December 19, 2011

Date of Patent: April 14, 2015

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jianfeng Gao, Kristina Toutanova, Wen-tau Yih
Consistent phrase relevance measures

Patent number: 8996515

Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in and out-of document phrases should be close to each other.

Type: Grant

Filed: September 11, 2012

Date of Patent: March 31, 2015

Assignee: Microsoft Corporation

Inventors: Wen-tau Yih, Christopher A. Meek
RELATIONAL SIMILARITY MEASUREMENT

Publication number: 20140249799

Abstract: Relational similarity measuring embodiments are presented that generally involve creating a relational similarity model that, given two pairs of words, is used to measure a degree of relational similarity between the two relations respectively exhibited by these word pairs. In one exemplary embodiment this involves creating a combined relational similarity model from a plurality of relational similarity models. This is generally accomplished by first selecting a plurality of relational similarity models, each of which measures relational similarity between two pairs of words, and each of which is trained or created using a different method or linguistic/textual resource. The selected models are then combined to form the combined relational similarity model. The combined model inputs two pairs of words and outputs a relational similarity indicator representing a measure of the degree of relational similarity between the word pairs.

Type: Application

Filed: March 4, 2013

Publication date: September 4, 2014

Applicant: Microsoft Corporation

Inventors: Wen-tau Yih, Geoffrey Zweig, Christopher Meek, Alisa Zhila, Tomas Mikolov
DETERMINING SYNONYM-ANTONYM POLARITY IN TERM VECTORS

Publication number: 20140067368

Abstract: A document-term matrix may be generated based on a corpus. A term representation matrix may be generated based on modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix.

Type: Application

Filed: August 29, 2012

Publication date: March 6, 2014

Applicant: Microsoft Corporation

Inventors: Wen-tau Yih, Geoffrey G. Zweig, John C. Platt
CLICKTHROUGH-BASED LATENT SEMANTIC MODEL

Publication number: 20130159320

Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.

Type: Application

Filed: December 19, 2011

Publication date: June 20, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Jianfeng Gao, Kristina Toutanova, Wen-tau Yih
CONSISTENT PHRASE RELEVANCE MEASURES

Publication number: 20120330978

Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in and out-of document phrases should be close to each other.

Type: Application

Filed: September 11, 2012

Publication date: December 27, 2012

Applicant: Microsoft Corporation

Inventors: Wen-tau Yih, Christopher A. Meek
Learning Discriminative Projections for Text Similarity Measures

Publication number: 20120323968

Abstract: A model for mapping the raw text representation of a text object to a vector space is disclosed. A function is defined for computing a similarity score given two output vectors. A loss function is defined for computing an error based on the similarity scores and the labels of pairs of vectors. The parameters of the model are tuned to minimize the loss function. The label of two vectors indicates a degree of similarity of the objects. The label may be a binary number or a real-valued number. The function for computing similarity scores may be a cosine, Jaccard, or differentiable function. The loss function may compare pairs of vectors to their labels. Each element of the output vector is a linear or non-linear function of the terms of an input vector. The text objects may be different types of documents and two different models may be trained concurrently.

Type: Application

Filed: June 14, 2011

Publication date: December 20, 2012

Applicant: Microsoft Corporation

Inventors: Wen-tau Yih, Kristina N. Toutanova, Christopher A. Meek, John C. Platt
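Illustrative sketch (not taken from the publication): the abstract of publication 20120323968 describes mapping term vectors to a vector space, scoring pairs with a similarity function such as cosine, and computing a loss against pair labels. The Python below is one minimal reading of that description. The linear projection matrix, the squared-error loss, and all function names are assumptions chosen for illustration; the step that tunes the parameters to minimize the loss is omitted.

```python
import math


def project(matrix, term_vector):
    """Map a raw term vector to the output space with a linear projection.

    Each output element is a linear function of the input terms
    (one row of `matrix` per output dimension).
    """
    return [sum(w * x for w, x in zip(row, term_vector)) for row in matrix]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def squared_error_loss(matrix, labeled_pairs):
    """Mean squared difference between similarity scores and pair labels.

    `labeled_pairs` holds (term_vector_a, term_vector_b, label) tuples.
    The squared-error form is an assumption, not the publication's loss.
    """
    total = 0.0
    for vec_a, vec_b, label in labeled_pairs:
        score = cosine(project(matrix, vec_a), project(matrix, vec_b))
        total += (score - label) ** 2
    return total / len(labeled_pairs)


# Hypothetical 2x3 projection that keeps the first two terms and drops the third.
projection = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]
pairs = [
    ([1.0, 0.0, 5.0], [2.0, 0.0, 9.0], 1.0),  # labeled similar
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.0),  # labeled dissimilar
]
loss = squared_error_loss(projection, pairs)
```

In this toy setup the projection already separates the two labeled pairs, so the loss is zero; a training procedure would adjust the matrix entries when it is not.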
Consistent phrase relevance measures

Patent number: 8290946

Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in and out-of document phrases should be close to each other.

Type: Grant

Filed: June 24, 2008

Date of Patent: October 16, 2012

Assignee: Microsoft Corporation

Inventors: Wen-tau Yih, Christopher A. Meek
Web document keyword and phrase extraction

Patent number: 8135728

Abstract: Extraction analysis techniques biased, in part, by query frequency information from a query log file and/or search engine cache are employed along with machine learning processes to determine candidate keywords and/or phrases of web documents. Web oriented features associated with the candidate keywords and/or phrases are also utilized to analyze the web documents. A keyword and/or phrase extraction mechanism can be utilized to score keywords and/or phrases in a web document and estimate a likelihood that the keywords and/or phrases are relevant, for example, in an advertising system and the like.

Type: Grant

Filed: January 3, 2007

Date of Patent: March 13, 2012

Assignee: Microsoft Corporation

Inventors: Wen-tau Yih, Joshua T. Goodman, Vitor Rocha de Carvalho
Learning Element Weighting for Similarity Measures

Publication number: 20110219012

Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near duplicate documents.

Type: Application

Filed: March 2, 2010

Publication date: September 8, 2011

Inventors: Wen-tau Yih, Christopher A. Meek, Hannaneh Hajishirzi
Raising the baseline for high-precision text classifiers

Patent number: 7788292

Abstract: The claimed subject matter provides systems and/or methods for normalizing document representations for use with Naïve Bayes. The system can include devices and components that determine norms associated with documents by aggregating absolute term weight values associated with the documents, further ascertain term weights for features associated with the documents, and thereafter divide the term weights for the features associated with the documents by the norms associated with the documents to produce a normalized document representation that can be utilized by arbitrary linear classifiers.

Type: Grant

Filed: December 12, 2007

Date of Patent: August 31, 2010

Assignee: Microsoft Corporation

Inventors: Aleksander Kolcz, Wen-tau Yih
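Illustrative sketch (not taken from the patent): the abstract of patent 7788292 describes computing a document norm by aggregating absolute term weight values and then dividing each term weight by that norm. Assuming the aggregation is a plain sum (an L1 norm) and that a document is a mapping from term to weight, the step might look like the following. The function name and the sample weights are hypothetical.

```python
def l1_normalize(term_weights):
    """Divide each term weight by the sum of absolute term weights.

    Assumes the "aggregation" in the abstract is a simple sum (L1 norm).
    A document whose weights are all zero is returned unchanged.
    """
    norm = sum(abs(weight) for weight in term_weights.values())
    if norm == 0:
        return dict(term_weights)
    return {term: weight / norm for term, weight in term_weights.items()}


# Hypothetical term weights for one document.
doc = {"patent": 2.0, "classifier": 1.0, "spam": -1.0}
normalized = l1_normalize(doc)
```

After normalization the absolute weights of a non-empty document sum to one, so long and short documents contribute on a comparable scale to a linear classifier.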
PERSONALIZED EMAIL FILTERING

Publication number: 20100211641

Abstract: Techniques and systems are described that utilize a scalable, “light-weight” user model, which can be combined with a traditional global email spam filter, to determine whether an email message sent to a target user is a desired email. A global email model is trained with a set of email messages to detect desired emails, and a user email model is also trained to detect desired emails. Training the user email model may comprise one or more of: using labeled training emails; using target user-based information; and using information from the global email model. Global and user model scores for an email sent to a target user can be combined to produce an email score. The email score can be compared with a desired email threshold to determine whether the email message sent to the target user is desired or not.

Type: Application

Filed: February 16, 2009

Publication date: August 19, 2010

Applicant: Microsoft Corporation

Inventors: Wen-tau Yih, Christopher A. Meek, Robert L. McCann, Ming-Wei Chang
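Illustrative sketch (not taken from the publication): the abstract of publication 20100211641 says that global and user model scores can be combined into an email score, which is then compared with a desired-email threshold. The abstract does not state how the scores are combined. The Python below assumes a weighted average of two scores in [0, 1]; the function names, the default weight, and the default threshold are all hypothetical.

```python
def combine_scores(global_score, user_score, user_weight=0.5):
    """Blend the global-model and user-model scores into one email score.

    The weighted average is an assumption; the abstract only says the
    two scores "can be combined".
    """
    if not 0.0 <= user_weight <= 1.0:
        raise ValueError("user_weight must be between 0 and 1")
    return (1.0 - user_weight) * global_score + user_weight * user_score


def is_desired(global_score, user_score, threshold=0.5, user_weight=0.5):
    """Return True when the combined email score meets the threshold."""
    return combine_scores(global_score, user_score, user_weight) >= threshold


# Hypothetical scores: the global filter is unsure, the user model is confident.
decision = is_desired(global_score=0.4, user_score=0.9, user_weight=0.7)
```

Raising `user_weight` lets the per-user model override the global filter more often, which is one way to express the personalization the abstract describes.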