Patents by Inventor Wen-tau Yih

Wen-tau Yih has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11250331
    Abstract: A technique is described herein for processing documents in a time-efficient and accurate manner. In a training phase, the technique generates a set of initial training examples by associating entity mentions in a text corpus with corresponding entity identifiers. Each entity identifier uniquely identifies an entity in a particular ontology. The technique then removes noisy training examples from the set of initial training examples, to provide a set of filtered training examples. The technique then applies a machine-learning process to generate a linking component based, in part, on the set of filtered training examples. In an application phase, the technique uses the linking component to link input entity mentions with corresponding entity identifiers. Various application systems can leverage the capabilities of the linking component, including a search system, a document-creation system, etc.
    Type: Grant
    Filed: October 31, 2017
    Date of Patent: February 15, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Christopher Brian Quirk, Hoifung Poon, Wen-tau Yih, Hai Wang
  • Patent number: 10592519
    Abstract: A processing unit can determine multiple representations associated with a statement, e.g., subject or predicate representations. In some examples, the representations can lack representation of semantics of the statement. The computing device can determine a computational model of the statement based at least in part on the representations. The computing device can receive a query, e.g., via a communications interface. The computing device can determine at least one query representation, e.g., a subject, predicate, or entity representation. The computing device can then operate the model using the query representation to provide a model output. The model output can represent a relationship between the query representations and information in the model. The computing device can, e.g., transmit an indication of the model output via the communications interface.
    Type: Grant
    Filed: March 29, 2016
    Date of Patent: March 17, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Xiaodong He, Li Deng, Jianfeng Gao, Wen-tau Yih, Moontae Lee, Paul Smolensky
  • Publication number: 20190130282
    Abstract: A technique is described herein for processing documents in a time-efficient and accurate manner. In a training phase, the technique generates a set of initial training examples by associating entity mentions in a text corpus with corresponding entity identifiers. Each entity identifier uniquely identifies an entity in a particular ontology. The technique then removes noisy training examples from the set of initial training examples, to provide a set of filtered training examples. The technique then applies a machine-learning process to generate a linking component based, in part, on the set of filtered training examples. In an application phase, the technique uses the linking component to link input entity mentions with corresponding entity identifiers. Various application systems can leverage the capabilities of the linking component, including a search system, a document-creation system, etc.
    Type: Application
    Filed: October 31, 2017
    Publication date: May 2, 2019
    Inventors: Christopher Brian Quirk, Hoifung Poon, Wen-tau Yih, Hai Wang
  • Patent number: 10255269
    Abstract: Long short term memory units that accept a non-predefined number of inputs are used to provide natural language relation extraction over a user-specified range on content. Content written for human consumption is parsed with distant supervision in segments (e.g., sentences, paragraphs, chapters) to determine relationships between various words within and between those segments.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: April 9, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Christopher Brian Quirk, Kristina Nikolova Toutanova, Wen-tau Yih, Hoifung Poon, Nanyun Peng
  • Publication number: 20180189269
    Abstract: Long short term memory units that accept a non-predefined number of inputs are used to provide natural language relation extraction over a user-specified range on content. Content written for human consumption is parsed with distant supervision in segments (e.g., sentences, paragraphs, chapters) to determine relationships between various words within and between those segments.
    Type: Application
    Filed: December 30, 2016
    Publication date: July 5, 2018
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Christopher Brian Quirk, Kristina Nikolova Toutanova, Wen-tau Yih, Hoifung Poon, Nanyun Peng
  • Publication number: 20170286494
    Abstract: A processing unit can determine multiple representations associated with a statement, e.g., subject or predicate representations. In some examples, the representations can lack representation of semantics of the statement. The computing device can determine a computational model of the statement based at least in part on the representations. The computing device can receive a query, e.g., via a communications interface. The computing device can determine at least one query representation, e.g., a subject, predicate, or entity representation. The computing device can then operate the model using the query representation to provide a model output. The model output can represent a relationship between the query representations and information in the model. The computing device can, e.g., transmit an indication of the model output via the communications interface.
    Type: Application
    Filed: March 29, 2016
    Publication date: October 5, 2017
    Inventors: Xiaodong He, Li Deng, Jianfeng Gao, Wen-tau Yih, Moontae Lee, Paul Smolensky
  • Publication number: 20170193157
    Abstract: Drug combinations offer promising treatment for some conditions such as cancer. However, the large number of available drug combinations makes it impractical to try all possible combinations. Machine-learning techniques described in this disclosure train a classification algorithm. Once trained, the classification algorithm uses genomic data from a specific patient to perform in silico tests of drugs and drug combinations against the genomic data to determine which therapies are likely to be effective for treating a condition of the specific patient.
    Type: Application
    Filed: December 30, 2015
    Publication date: July 6, 2017
    Inventors: Christopher B. Quirk, Wen-tau Yih, Hoifung Poon, Kristina Toutanova, Stephen William Mayhew, Sheng Wang
  • Patent number: 9183173
    Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near duplicate documents.
    Type: Grant
    Filed: March 2, 2010
    Date of Patent: November 10, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Wen-tau Yih, Christopher A. Meek, Hannaneh Hajishirzi
  • Patent number: 9009148
    Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.
    Type: Grant
    Filed: December 19, 2011
    Date of Patent: April 14, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jianfeng Gao, Kristina Toutanova, Wen-tau Yih
  • Patent number: 8996515
    Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in and out-of document phrases should be close to each other.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: March 31, 2015
    Assignee: Microsoft Corporation
    Inventors: Wen-tau Yih, Christopher A. Meek
  • Publication number: 20140249799
    Abstract: Relational similarity measuring embodiments are presented that generally involve creating a relational similarity model that, given two pairs of words, is used to measure a degree of relational similarity between the two relations respectively exhibited by these word pairs. In one exemplary embodiment this involves creating a combined relational similarity model from a plurality of relational similarity models. This is generally accomplished by first selecting a plurality of relational similarity models, each of which measures relational similarity between two pairs of words, and each of which is trained or created using a different method or linguistic/textual resource. The selected models are then combined to form the combined relational similarity model. The combined model inputs two pairs of words and outputs a relational similarity indicator representing a measure of the degree of relational similarity between the word pairs.
    Type: Application
    Filed: March 4, 2013
    Publication date: September 4, 2014
    Applicant: Microsoft Corporation
    Inventors: Wen-tau Yih, Geoffrey Zweig, Christopher Meek, Alisa Zhila, Tomas Mikolov
  • Publication number: 20140067368
    Abstract: A document-term matrix may be generated based on a corpus. A term representation matrix may be generated based on modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix.
    Type: Application
    Filed: August 29, 2012
    Publication date: March 6, 2014
    Applicant: Microsoft Corporation
    Inventors: Wen-tau Yih, Geoffrey G. Zweig, John C. Platt
  • Publication number: 20130159320
    Abstract: There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.
    Type: Application
    Filed: December 19, 2011
    Publication date: June 20, 2013
    Applicant: Microsoft Corporation
    Inventors: Jianfeng Gao, Kristina Toutanova, Wen-tau Yih
  • Publication number: 20120330978
    Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in and out-of document phrases should be close to each other.
    Type: Application
    Filed: September 11, 2012
    Publication date: December 27, 2012
    Applicant: Microsoft Corporation
    Inventors: Wen-tau Yih, Christopher A. Meek
  • Publication number: 20120323968
    Abstract: A model for mapping the raw text representation of a text object to a vector space is disclosed. A function is defined for computing a similarity score given two output vectors. A loss function is defined for computing an error based on the similarity scores and the labels of pairs of vectors. The parameters of the model are tuned to minimize the loss function. The label of two vectors indicates a degree of similarity of the objects. The label may be a binary number or a real-valued number. The function for computing similarity scores may be a cosine, Jaccard, or differentiable function. The loss function may compare pairs of vectors to their labels. Each element of the output vector is a linear or non-linear function of the terms of an input vector. The text objects may be different types of documents and two different models may be trained concurrently.
    Type: Application
    Filed: June 14, 2011
    Publication date: December 20, 2012
    Applicant: Microsoft Corporation
    Inventors: Wen-tau Yih, Kristina N. Toutanova, Christopher A. Meek, John C. Platt
  • Patent number: 8290946
    Abstract: Two methods for measuring keyword-document relevance are described. The methods receive a keyword and a document as input and output a probability value for the keyword. The first method is a similarity-based approach which uses techniques for measuring similarity between two short-text segments to measure relevance between the keyword and the document. The second method is a regression-based approach based on an assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then relevance scores of the in and out-of document phrases should be close to each other.
    Type: Grant
    Filed: June 24, 2008
    Date of Patent: October 16, 2012
    Assignee: Microsoft Corporation
    Inventors: Wen-tau Yih, Christopher A. Meek
  • Patent number: 8135728
    Abstract: Extraction analysis techniques biased, in part, by query frequency information from a query log file and/or search engine cache are employed along with machine learning processes to determine candidate keywords and/or phrases of web documents. Web oriented features associated with the candidate keywords and/or phrases are also utilized to analyze the web documents. A keyword and/or phrase extraction mechanism can be utilized to score keywords and/or phrases in a web document and estimate a likelihood that the keywords and/or phrases are relevant, for example, in an advertising system and the like.
    Type: Grant
    Filed: January 3, 2007
    Date of Patent: March 13, 2012
    Assignee: Microsoft Corporation
    Inventors: Wen-tau Yih, Joshua T. Goodman, Vitor Rocha de Carvalho
  • Publication number: 20110219012
    Abstract: Described is a technology for measuring the similarity between two objects (e.g., documents), via a framework that learns the term-weighting function from training data, e.g., labeled pairs of objects, to develop a learned model. A learning procedure tunes the model parameters by minimizing a defined loss function of the similarity score. Also described is using the learning procedure and learned model to detect near duplicate documents.
    Type: Application
    Filed: March 2, 2010
    Publication date: September 8, 2011
    Inventors: Wen-tau Yih, Christopher A. Meek, Hannaneh Hajishirzi
  • Patent number: 7788292
    Abstract: The claimed subject matter provides systems and/or methods for normalizing document representations for use with Naïve Bayes. The system can include devices and components that determine norms associated with documents by aggregating absolute term weight values associated with the documents, further ascertain term weights for features associated with the documents, and thereafter divide the term weights for the features associated with the documents by the norms associated with the documents to produce a normalized document representation that can be utilized by arbitrary linear classifiers.
    Type: Grant
    Filed: December 12, 2007
    Date of Patent: August 31, 2010
    Assignee: Microsoft Corporation
    Inventors: Aleksander Kolcz, Wen-tau Yih
  • Publication number: 20100211641
    Abstract: Techniques and systems are described that utilize a scalable, “light-weight” user model, which can be combined with a traditional global email spam filter, to determine whether an email message sent to a target user is a desired email. A global email model is trained with a set of email messages to detect desired emails, and a user email model is also trained to detect desired emails. Training the user email model may comprise one or more of: using labeled training emails; using target user-based information; and using information from the global email model. Global and user model scores for an email sent to a target user can be combined to produce an email score. The email score can be compared with a desired email threshold to determine whether the email message sent to the target user is desired or not.
    Type: Application
    Filed: February 16, 2009
    Publication date: August 19, 2010
    Applicant: Microsoft Corporation
    Inventors: Wen-tau Yih, Christopher A. Meek, Robert L. McCann, Ming-Wei Chang
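
The abstract of the last entry above (publication 20100211641) describes combining a global model score and a user model score into one email score, then comparing that score with a "desired email" threshold. The abstract does not say how the scores are combined, so the following is only a minimal sketch that assumes a weighted linear mix. The class name `ScoreCombiner`, the `user_weight` parameter, and the default values are all hypothetical and are not taken from the publication.

```python
from dataclasses import dataclass


@dataclass
class ScoreCombiner:
    """Hypothetical combiner: the abstract does not specify the combination rule."""

    user_weight: float = 0.5        # assumed mixing weight in [0, 1] for the user model
    desired_threshold: float = 0.5  # assumed "desired email" threshold

    def combine(self, global_score: float, user_score: float) -> float:
        # Assumption: a weighted average of the two model scores.
        return (1.0 - self.user_weight) * global_score + self.user_weight * user_score

    def is_desired(self, global_score: float, user_score: float) -> bool:
        # The email is treated as desired when the combined score
        # meets or exceeds the threshold.
        return self.combine(global_score, user_score) >= self.desired_threshold


if __name__ == "__main__":
    combiner = ScoreCombiner(user_weight=0.25, desired_threshold=0.5)
    # Scores here are made-up illustrative values, not results from the publication.
    print(combiner.combine(0.8, 0.4))
    print(combiner.is_desired(0.8, 0.4))
    print(combiner.is_desired(0.2, 0.4))
```

A small `user_weight` keeps the global filter dominant, which matches the abstract's framing of the user model as a "light-weight" addition; other combination rules (for example, a learned one) would fit the abstract equally well.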