Patents by Inventor Dengyong Zhou

Dengyong Zhou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Neural symbolic reader

Patent number: 11954442

Abstract: The present disclosure is directed to systems and methods for performing reading comprehension with machine learning. More specifically, the present disclosure is directed to a Neural Symbolic Reader (example implementations of which may be referred to as NeRd), which includes a reader to encode the passage and question, and a programmer to generate a program for multi-step reasoning. By using operators like span selection, the program can be executed over a natural language text passage to generate an answer to a natural language text question. NeRd is domain-agnostic such that the same neural architecture works for different domains. Further, NeRd is compositional such that complex programs can be generated by compositionally applying the symbolic operators.

Type: Grant

Filed: August 6, 2020

Date of Patent: April 9, 2024

Assignee: GOOGLE LLC

Inventors: Chen Liang, Wei Yu, Quoc V. Le, Xinyun Chen, Dengyong Zhou
Extreme Language Model Compression with Optimal Sub-Words and Shared Projections

Publication number: 20240013059

Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.

Type: Application

Filed: September 21, 2023

Publication date: January 11, 2024

Inventors: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
Prompting Machine-Learned Models Using Chains of Thought

Publication number: 20230394328

Abstract: Example embodiments of aspects of the present disclosure provide an example computer-implemented method for improved prompting of a machine-learned model. The example method can include obtaining an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response. The example method can include inputting, to a machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence. The example method can include generating, using the machine-learned model and responsive to the operative query, an operative response.
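
As a rough illustration of the abstract above, the sketch below assembles a prompt from instructive sequences (query, trace of intermediate steps, response) followed by an operative query. The class name, the function name, and the "Q:/A:" text layout are assumptions made for this example and are not taken from the application; the call to an actual machine-learned model is left out.

```python
from dataclasses import dataclass


@dataclass
class InstructiveExample:
    """One instructive sequence: a query, its intermediate trace, and its response."""
    query: str
    trace: str
    response: str


def build_prompt(examples, operative_query):
    """Concatenate instructive sequences, then append the operative query.

    The 'Q:' / 'A:' layout is a hypothetical formatting choice.
    """
    parts = []
    for ex in examples:
        parts.append(f"Q: {ex.query}\nA: {ex.trace} The answer is {ex.response}.")
    # The operative query is left open so the model can continue from 'A:'.
    parts.append(f"Q: {operative_query}\nA:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = InstructiveExample(
        query="Tom has 3 apples and buys 2 more. How many apples does he have?",
        trace="Tom starts with 3 apples. 3 + 2 = 5.",
        response="5",
    )
    print(build_prompt([demo], "A shelf holds 4 books and 6 more are added. How many books?"))
```

The resulting string would be passed to a model, which attends over the instructive sequence while generating the operative response.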

Type: Application

Filed: August 5, 2022

Publication date: December 7, 2023

Inventors: Jason Weng Wei, Dengyong Zhou, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean Andrè Bousquet, Le Hou, Nathan Kemp Sekiguchi Scales, David J. Bieber, Charles Aloysius Sutton, Nathanael Martin Schärli, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, Aitor Lewkowycz, Jiageng Luan, David Martin Dohan, Henryk Michalewski, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Xuezhi Wang
Extreme language model compression with optimal sub-words and shared projections

Patent number: 11797862

Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.

Type: Grant

Filed: January 22, 2020

Date of Patent: October 24, 2023

Assignee: GOOGLE LLC

Inventors: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
Knowledge Graph Completion and Multi-Hop Reasoning in Knowledge Graphs at Scale

Publication number: 20230289626

Abstract: Provided are computing systems, methods, and platforms for negative sampling in knowledge graphs with improved efficiency. A knowledge graph comprising entities and links between the entities can be obtained. A query computation graph comprising nodes and edges can be generated based on the knowledge graph. The nodes of the query computation graph can include anchor nodes, a root node, and intermediate nodes positioned in paths between the anchor nodes and the root node. A node cut of a query of the query computation graph can be determined and can include at least one node that cuts at least one path between each anchor node and the root node of the query computation graph. Negative samples can be identified by bidirectionally traversing the query computation graph in a first direction from the anchor nodes to the node cut and in a second direction from the root node to the node cut.

Type: Application

Filed: March 14, 2023

Publication date: September 14, 2023

Inventors: Hanjun Dai, Dale Eric Schuurmans, Xinyun Chen, Dengyong Zhou, Bo Dai, Hongyu Ren
Using Chains of Thought to Prompt Machine-Learned Models Pre-Trained on Diversified Objectives

Publication number: 20230244938

Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples.

Type: Application

Filed: January 27, 2023

Publication date: August 3, 2023

Inventors: Jason Weng Wei, Dengyong Zhou, Xuezhi Wang, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean Andrè Bousquet, Le Hou, Charles Aloysius Sutton, Nathanael Martin Schärli, Nathan Kemp Sekiguchi Scales, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, David Martin Dohan, Aitor Lewkowycz, Henryk Michalewski, Jiageng Luan, David J. Bieber, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Yi Tay, Mostafa Dehghani
Systems And Methods For Parameter Sharing To Reduce Computational Costs Of Training Machine-Learned Models

Publication number: 20220108221

Abstract: Systems and methods of the present disclosure are directed to a computer-implemented method. The method can include obtaining a machine-learned model comprising a plurality of model units, wherein each model unit comprises a plurality of parameters that are tied to a shared plurality of parameters. The method can include performing a first plurality of training iterations with the machine-learned model to adjust parameters of the shared plurality of parameters. The method can include detecting, based on the first plurality of training iterations, an occurrence of an untying condition. The method can include untying the parameters of one or more model units from the shared plurality of parameters. The method can include performing a second plurality of training iterations with the machine-learned model to adjust parameters of the one or more model units independent of the shared plurality of parameters.
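
The tie, train, untie, train sequence in the abstract above can be sketched in plain Python. Everything concrete here is an assumption for illustration: the `TiedModel` class, the placeholder gradient function, and the untying condition (a fixed step count) are hypothetical and are not specified by the application.

```python
class TiedModel:
    """Model units whose parameters all refer to one shared parameter list."""

    def __init__(self, num_units, shared_params):
        self.shared = list(shared_params)
        # While tied, every unit is the same list object as self.shared.
        self.units = [self.shared for _ in range(num_units)]
        self.tied = True

    def untie(self):
        # Give each unit its own copy so later updates are independent.
        self.units = [list(self.shared) for _ in self.units]
        self.tied = False


def fake_grad(unit_index):
    """Hypothetical per-unit gradient; a real model would compute this from data."""
    return [float(unit_index + 1), 0.0]


def sgd_step(params, grads, lr):
    for i, g in enumerate(grads):
        params[i] -= lr * g


def train(model, total_steps, untie_at_step, lr=0.5):
    for step in range(total_steps):
        # Assumed untying condition: a fixed number of steps has elapsed.
        if model.tied and step >= untie_at_step:
            model.untie()
        if model.tied:
            # Gradients from all units accumulate into the shared parameters.
            grads = [fake_grad(k) for k in range(len(model.units))]
            total = [sum(col) for col in zip(*grads)]
            sgd_step(model.shared, total, lr)
        else:
            for k, unit in enumerate(model.units):
                sgd_step(unit, fake_grad(k), lr)
    return model


if __name__ == "__main__":
    m = train(TiedModel(2, [8.0, 1.0]), total_steps=2, untie_at_step=1)
    print(m.units)
```

In this sketch the first phase updates a single parameter set, and only after untying do the units diverge.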

Type: Application

Filed: October 4, 2021

Publication date: April 7, 2022

Inventors: Dengyong Zhou, Xiaodan Song, Shuo Yang, Qiang Liu, Le Hou
Neural Symbolic Reader

Publication number: 20220043981

Abstract: The present disclosure is directed to systems and methods for performing reading comprehension with machine learning. More specifically, the present disclosure is directed to a Neural Symbolic Reader (example implementations of which may be referred to as NeRd), which includes a reader to encode the passage and question, and a programmer to generate a program for multi-step reasoning. By using operators like span selection, the program can be executed over a natural language text passage to generate an answer to a natural language text question. NeRd is domain-agnostic such that the same neural architecture works for different domains. Further, NeRd is compositional such that complex programs can be generated by compositionally applying the symbolic operators.

Type: Application

Filed: August 6, 2020

Publication date: February 10, 2022

Inventors: Chen Liang, Wei Yu, Quoc V. Le, Xinyun Chen, Dengyong Zhou
Extreme Language Model Compression with Optimal Sub-Words and Shared Projections

Publication number: 20210224660

Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.

Type: Application

Filed: January 22, 2020

Publication date: July 22, 2021

Inventors: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
Machine-Learned State Space Model for Joint Forecasting

Publication number: 20210065066

Abstract: A deep state space generative model is augmented with intervention prediction. The state space model provides a principled way to capture the interactions among observations, interventions, critical event occurrences, true states, and associated uncertainty. The state space model can include a discrete-time hazard rate model that provides flexible fitting of general survival time distributions. The state space model can output a joint prediction of event risk, observation and intervention trajectories based on patterns in temporal progressions, and correlations between past measurements and interventions.
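
For readers unfamiliar with discrete-time hazard rates, the following minimal sketch shows the standard relationship between per-step hazards, survival probabilities, and event-time probabilities. The function name and the example hazard values are assumptions for illustration; in the application the hazards would come from the learned state space model rather than a fixed list.

```python
def survival_and_event_probs(hazards):
    """Convert per-step hazards h_t into survival and event-time probabilities.

    survival[t]    = P(no event up to and including step t) = prod_{k<=t} (1 - h_k)
    event_probs[t] = P(event occurs exactly at step t)      = h_t * survival[t-1]
    """
    survival = []
    event_probs = []
    alive = 1.0  # probability that no event has occurred before the current step
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must be a probability in [0, 1]")
        event_probs.append(alive * h)
        alive *= 1.0 - h
        survival.append(alive)
    return survival, event_probs


if __name__ == "__main__":
    # Hypothetical hazards for three consecutive time steps.
    s, p = survival_and_event_probs([0.5, 0.5, 0.25])
    print("survival:", s)
    print("event probabilities:", p)
```

Because each step has its own hazard, this parameterization can represent arbitrary survival-time distributions over a discrete horizon, which is the flexibility the abstract refers to.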

Type: Application

Filed: August 31, 2020

Publication date: March 4, 2021

Inventors: Yuan Xue, Dengyong Zhou, Nan Du, Andrew Mingbo Dai, Zhen Xu, Kun Zhang, Yingwei Cui
Neural network for program synthesis

Patent number: 10795645

Abstract: Described are systems, methods, and computer-readable media for program generation in a domain-specific language based on input-output examples. In accordance with various embodiments, a neural-network-based program generation model conditioned on an encoded set of input-output examples is used to generate a program tree by iteratively expanding a partial program tree, beginning with a root node and ending when all leaf nodes are terminal.
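
The iterative expansion described in the abstract above can be illustrated with a toy example. The grammar, the `Node` class, and the `choose` callback are hypothetical: `choose` stands in for the neural program generation model conditioned on input-output examples, which this sketch does not implement.

```python
from dataclasses import dataclass, field

# Hypothetical toy DSL: each nonterminal maps to a list of productions.
GRAMMAR = {
    "Expr": [["Concat", "Expr", "Expr"], ["Input"], ["Const"]],
}


@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)


def open_leaves(node):
    """Return unexpanded nonterminal leaves, left to right."""
    if not node.children:
        return [node] if node.symbol in GRAMMAR else []
    leaves = []
    for child in node.children:
        leaves.extend(open_leaves(child))
    return leaves


def synthesize(choose, max_steps=50):
    """Expand a partial program tree from the root until all leaves are terminal."""
    root = Node("Expr")
    for _ in range(max_steps):
        leaves = open_leaves(root)
        if not leaves:
            return root  # every leaf is a terminal symbol
        leaf = leaves[0]
        production_index = choose(root, leaf)  # stand-in for the learned model
        leaf.children = [Node(s) for s in GRAMMAR[leaf.symbol][production_index]]
    raise RuntimeError("expansion did not terminate within max_steps")


def to_tokens(node):
    """Flatten a finished tree into its terminal symbols."""
    if not node.children:
        return [node.symbol]
    tokens = []
    for child in node.children:
        tokens.extend(to_tokens(child))
    return tokens


if __name__ == "__main__":
    scripted = iter([0, 1, 2])  # fixed choices in place of model predictions
    tree = synthesize(lambda root, leaf: next(scripted))
    print(to_tokens(tree))
```

A learned model would replace the scripted choices by scoring each (leaf, production) pair given the encoded examples and the current partial tree.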

Type: Grant

Filed: March 27, 2017

Date of Patent: October 6, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Abdelrahman S. A. Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli, Emilio Parisotto
SEQUENCE MODELING VIA SEGMENTATIONS

Publication number: 20190266246

Abstract: In neural-network-based approaches to sequence modeling, an output sequence may be modeled via segmentations, the probability of the output sequence being constructed as a sum of products of output-segment probabilities, taken over all valid output-sequence segmentations. A set of artificial neural networks may model the distribution of the output-sequence probability with a recurrent neural network modeling the distributions of the individual output-segment probabilities, optionally in conjunction with a second recurrent neural network modeling concatenations of output segments. In various embodiments, this approach is applied to neural phrase-based machine translation.
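
The "sum of products over all valid segmentations" in the abstract above can be computed without enumerating every segmentation by using a forward dynamic program. The sketch below is a minimal illustration under stated assumptions: `toy_segment_prob` is a placeholder score rather than a recurrent neural network, and conditioning on an input sequence is omitted.

```python
def sequence_probability(sequence, segment_prob, max_segment_len=None):
    """Sum over all segmentations of the product of segment probabilities.

    alpha[j] holds the total over all segmentations of sequence[:j]:
        alpha[j] = sum_{i < j} alpha[i] * segment_prob(sequence[i:j])
    """
    n = len(sequence)
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, n + 1):
        first = 0 if max_segment_len is None else max(0, end - max_segment_len)
        for start in range(first, end):
            alpha[end] += alpha[start] * segment_prob(sequence[start:end])
    return alpha[n]


def brute_force(sequence, segment_prob):
    """Reference implementation that explicitly enumerates every segmentation."""
    if not sequence:
        return 1.0
    return sum(
        segment_prob(sequence[:k]) * brute_force(sequence[k:], segment_prob)
        for k in range(1, len(sequence) + 1)
    )


def toy_segment_prob(segment):
    """Hypothetical placeholder score; a real model would use a neural network."""
    return 0.5 ** len(segment)


if __name__ == "__main__":
    print(sequence_probability("abc", toy_segment_prob))
    print(brute_force("abc", toy_segment_prob))
```

The dynamic program needs a number of segment evaluations quadratic in the sequence length, whereas explicit enumeration grows exponentially.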

Type: Application

Filed: February 23, 2018

Publication date: August 29, 2019

Inventors: Chong Wang, Yining Wang, Po-Sen Huang, Abdelrahman Samir Abdelrahman Mohamed, Dengyong Zhou, Li Deng, Sitao Huang
NEURAL NETWORK FOR PROGRAM SYNTHESIS

Publication number: 20180275967

Abstract: Described are systems, methods, and computer-readable media for program generation in a domain-specific language based on input-output examples. In accordance with various embodiments, a neural-network-based program generation model conditioned on an encoded set of input-output examples is used to generate a program tree by iteratively expanding a partial program tree, beginning with a root node and ending when all leaf nodes are terminal.

Type: Application

Filed: March 27, 2017

Publication date: September 27, 2018

Inventors: Abdelrahman S.A. Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli, Emilio Parisotto
MULTIPLICATIVE INCENTIVE MECHANISMS

Publication number: 20150262313

Abstract: A user interface may include instructions to complete a task (including a plurality of task items) and rule(s) that indicate to a worker how a payment associated with the task is to be calculated. The worker may provide information associated with the individual task items via the user interface. The payment may be calculated based on the rule(s), where the payment is determined based at least in part on a multiplicative payment component. In some implementations, the user interface may include an option for the worker to skip question(s), and the worker may be incentivized to skip question(s) when the worker does not know the answer. Further, in some implementations, the user interface may allow the worker to specify a confidence value when the worker chooses to answer the question, and the worker may be incentivized to provide an accurate confidence value.
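
To make the idea of a multiplicative payment component concrete, the sketch below uses one assumed rule: the payment starts at a base amount and is multiplied by a factor for each evaluated task item, with a neutral factor for skipped items and a zero factor for wrong answers. The specific factors (2, 1, 0), the outcome labels, and the function name are hypothetical and are not taken from the application.

```python
# Assumed multiplicative factors per evaluated task item (illustrative only).
FACTORS = {"correct": 2.0, "skipped": 1.0, "wrong": 0.0}


def multiplicative_payment(outcomes, base_pay=0.1, max_pay=None):
    """Compute a payment as base_pay times the product of per-item factors."""
    pay = base_pay
    for outcome in outcomes:
        pay *= FACTORS[outcome]  # raises KeyError on an unknown outcome label
    if max_pay is not None:
        pay = min(pay, max_pay)
    return pay


if __name__ == "__main__":
    print(multiplicative_payment(["correct", "skipped", "correct"], base_pay=1.0))
    print(multiplicative_payment(["correct", "wrong", "correct"], base_pay=1.0))
```

Under a rule of this shape, a single wrong answer forfeits the payment while a skip leaves it unchanged, so a worker who is unsure of an answer is better off skipping than guessing.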

Type: Application

Filed: March 12, 2014

Publication date: September 17, 2015

Applicant: Microsoft Corporation

Inventors: Nihar Bhadresh Shah, Dengyong Zhou
Learning-based image webpage index selection

Patent number: 9070046

Abstract: Architecture that performs image page index selection. A learning-based framework learns a statistical model based on the hyperlink (URL-uniform resource locator) previous click information obtained from the image search users. The learned model can combine the features of a newly discovered URL to predict the possibility of the newly-discovered URL being clicked in the future image search. In addition to existing web index selection features, image clicks are added as features, and the image clicks are aggregated over different URL segments, as well as the site modeling pattern trees to reduce the sparse problem of the image click information.

Type: Grant

Filed: October 17, 2012

Date of Patent: June 30, 2015

Assignee: Microsoft Technology Licensing, LLC

Inventors: Bo Geng, Xian-Sheng Hua, Zhong Wu, Dengyong Zhou
Learning with noisy labels from multiple judges

Patent number: 8914321

Abstract: A system and method infer true labels for multiple items. The inferred labels are generated from judgments. Multiple judges select the judgments from a specified choice of labels for each item. The method includes determining a characterization of judge expertise and item difficulties based on the judgments. The method also includes determining, using maximum entropy, a probability distribution over the specified choice of labels for each judge and item, based on the judgments. The method further includes selecting improved labels for the items from the specified choice such that the entropy over the probability distribution is reduced. The improved labels represent an improvement from the judgments toward the true labels. Additionally, the method includes performing an iterative procedure to determine the true labels, the characterizations of judge expertise, and the labeling difficulties.

Type: Grant

Filed: February 3, 2013

Date of Patent: December 16, 2014

Assignee: Microsoft Corporation

Inventors: Dengyong Zhou, Sumit Basu, Yi Mao, John C. Platt
Link spam detection using smooth classification function

Patent number: 8805754

Abstract: A spam detection system is disclosed. The system includes a classifier training component that receives a first set of training pages labeled as normal pages and a second set of training pages labeled as spam pages. The training component trains a web page classifier based on both the first set of training pages and the second set of training pages. A spam detector then receives unlabeled web pages and uses the web page classifier to classify the unlabeled web pages as spam pages or normal pages.

Type: Grant

Filed: June 19, 2013

Date of Patent: August 12, 2014

Assignee: Microsoft Corporation

Inventors: Dengyong Zhou, Christopher Burges, Tao Tao
LEARNING WITH NOISY LABELS FROM MULTIPLE JUDGES

Publication number: 20140222747

Abstract: A system and method infer true labels for multiple items. The inferred labels are generated from judgments. Multiple judges select the judgments from a specified choice of labels for each item. The method includes determining a characterization of judge expertise and item difficulties based on the judgments. The method also includes determining, using maximum entropy, a probability distribution over the specified choice of labels for each judge and item, based on the judgments. The method further includes selecting improved labels for the items from the specified choice such that the entropy over the probability distribution is reduced. The improved labels represent an improvement from the judgments toward the true labels. Additionally, the method includes performing an iterative procedure to determine the true labels, the characterizations of judge expertise, and the labeling difficulties.

Type: Application

Filed: February 3, 2013

Publication date: August 7, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Dengyong Zhou, Sumit Basu, Yi Mao, John C. Platt
BUDGET OPTIMAL CROWDSOURCING

Publication number: 20140172767

Abstract: To optimize the number of correct decisions made by a crowdsourcing system given a fixed budget, tasks for multiple decisions are allocated to workers in a sequence. A task is allocated to a worker based on results already achieved for that task from other workers. Such allocation addresses the different levels of difficulty of decisions. A task also can be allocated to a worker based on results already received for other tasks from that worker. Such allocation addresses the different levels of reliability of workers. The process of allocating tasks to workers can be modeled as a Bayesian Markov decision process. Given the information already received for each item and worker, an estimate of the number of correct labels received can be determined. At each step, the system attempts to maximize the estimated number of correct labels it expects to have given the inputs so far.

Type: Application

Filed: December 14, 2012

Publication date: June 19, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Xi Chen, Qihang Lin, Dengyong Zhou
LEARNING-BASED IMAGE PAGE INDEX SELECTION

Publication number: 20140105488

Abstract: Architecture that performs image page index selection. A learning-based framework learns a statistical model based on the hyperlink (URL-uniform resource locator) previous click information obtained from the image search users. The learned model can combine the features of a newly discovered URL to predict the possibility of the newly-discovered URL being clicked in the future image search. In addition to existing web index selection features, image clicks are added as features, and the image clicks are aggregated over different URL segments, as well as the site modeling pattern trees to reduce the sparse problem of the image click information.

Type: Application

Filed: October 17, 2012

Publication date: April 17, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Bo Geng, Xian-Sheng Hua, Zhong Wu, Dengyong Zhou
