Patents by Inventor Dengyong Zhou
Dengyong Zhou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11954442Abstract: The present disclosure is directed to systems and methods for performing reading comprehension with machine learning. More specifically, the present disclosure is directed to a Neural Symbolic Reader (example implementations of which may be referred to as NeRd), which includes a reader to encode the passage and question, and a programmer to generate a program for multi-step reasoning. By using operators like span selection, the program can be executed over a natural language text passage to generate an answer to a natural language text question. NeRd is domain-agnostic such that the same neural architecture works for different domains. Further, NeRd is compositional such that complex programs can be generated by compositionally applying the symbolic operators.Type: GrantFiled: August 6, 2020Date of Patent: April 9, 2024Assignee: GOOGLE LLCInventors: Chen Liang, Wei Yu, Quoc V. Le, Xinyun Chen, Dengyong Zhou
-
Publication number: 20240013059Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.Type: ApplicationFiled: September 21, 2023Publication date: January 11, 2024Inventors: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
-
Publication number: 20230394328Abstract: Example embodiments of aspects of the present disclosure provide an example computer-implemented method for improved prompting of a machine-learned model. The example method can include obtaining an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response. The example method can include inputting, to a machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence. The example method can include generating, using the machine-learned model and responsive to the operative query, an operative response.Type: ApplicationFiled: August 5, 2022Publication date: December 7, 2023Inventors: Jason Weng Wei, Dengyong Zhou, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean Andrè Bousquet, Le Hou, Nathan Kemp Sekiguchi Scales, David J. Bieber, Charles Aloysius Sutton, Nathanael Martin Schärli, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, Aitor Lewkowycz, Jiageng Luan, David Martin Dohan, Henryk Michalewski, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Xuezhi Wang
-
Patent number: 11797862Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.Type: GrantFiled: January 22, 2020Date of Patent: October 24, 2023Assignee: GOOGLE LLCInventors: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
-
Publication number: 20230289626Abstract: Provided are computing systems, methods, and platforms for negative sampling in knowledge graphs with improved efficiency. A knowledge graph comprising entities and links between the entities can be obtained. A query computation graph comprising nodes and edges can be generated based on the knowledge graph. The nodes of the query computation graph can include anchor nodes, a root node, and intermediate nodes positioned in paths between the anchor nodes and the root node. A node cut of a query of the query computation graph can be determined and can include at least one node that cuts at least one path between each anchor node and the root node of the query computation graph. Negative samples can be identified by bidirectionally traversing the query computation graph in a first direction from the anchor nodes to the node cut and in a second direction from the root node to the node cut.Type: ApplicationFiled: March 14, 2023Publication date: September 14, 2023Inventors: Hanjun Dai, Dale Eric Schuurmans, Xinyun Chen, Dengyong Zhou, Bo Dai, Hongyu Ren
-
Publication number: 20230244938Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples.Type: ApplicationFiled: January 27, 2023Publication date: August 3, 2023Inventors: Jason Weng Wei, Dengyong Zhou, Xuezhi Wang, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean Andrè Bousquet, Le Hou, Charles Aloysius Sutton, Nathanael Martin Schärli, Nathan Kemp Sekiguchi Scales, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, David Martin Dohan, Aitor Lewkowycz, Henryk Michalewski, Jiageng Luan, David J. Bieber, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Yi Tay, Mostafa Dehghani
-
Publication number: 20220108221Abstract: Systems and methods of the present disclosure are directed to a computer-implemented method. The method can include obtaining a machine-learned model comprising a plurality of model units, wherein each model unit comprises a plurality of parameters that are tied to a shared plurality of parameters. The method can include performing a first plurality of training iterations with the machine-learned model to adjust parameters of the shared plurality of parameters. The method can include detecting, based on the first plurality of training iterations, an occurrence of an untying condition. The method can include untying the parameters of one or more model units from the shared plurality of parameters. The method can include performing a second plurality of training iterations with the machine-learned model to adjust parameters of the one or more model units independent of the shared plurality of parameters.Type: ApplicationFiled: October 4, 2021Publication date: April 7, 2022Inventors: Dengyong Zhou, Xiaodan Song, Shuo Yang, Qiang Liu, Le Hou
-
Publication number: 20220043981Abstract: The present disclosure is directed to systems and methods for performing reading comprehension with machine learning. More specifically, the present disclosure is directed to a Neural Symbolic Reader (example implementations of which may be referred to as NeRd), which includes a reader to encode the passage and question, and a programmer to generate a program for multi-step reasoning. By using operators like span selection, the program can be executed over a natural language text passage to generate an answer to a natural language text question. NeRd is domain-agnostic such that the same neural architecture works for different domains. Further, NeRd it is compositional such that complex programs can be generated by compositionally applying the symbolic operators.Type: ApplicationFiled: August 6, 2020Publication date: February 10, 2022Inventors: Chen Liang, Wei Yu, Quoc V. Le, Xinyun Chen, Dengyong Zhou
-
Publication number: 20210224660Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBAsE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.Type: ApplicationFiled: January 22, 2020Publication date: July 22, 2021Inventors: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
-
Publication number: 20210065066Abstract: A deep state space generative model is augmented with intervention prediction. The state space model provides a principled way to capture the interactions among observations, interventions, critical event occurrences, true states, and associated uncertainty. The state space model can include a discrete-time hazard rate model that provides flexible fitting of general survival time distributions. The state space model can output a joint prediction of event risk, observation and intervention trajectories based on patterns in temporal progressions, and correlations between past measurements and interventions.Type: ApplicationFiled: August 31, 2020Publication date: March 4, 2021Inventors: Yuan Xue, Dengyong Zhou, Nan Du, Andrew Mingbo Dai, Zhen Xu, Kun Zhang, Yingwei Cui
-
Patent number: 10795645Abstract: Described are systems, methods, and computer-readable media for program generation in a domain-specific language based on input-output examples. In accordance with various embodiments, a neural-network-based program generation model conditioned on an encoded set of input-output examples is used to generate a program tree by iteratively expanding a partial program tree, beginning with a root node and ending when all leaf nodes are terminal.Type: GrantFiled: March 27, 2017Date of Patent: October 6, 2020Assignee: Microsoft Technology Licensing, LLCInventors: Abdelrahman S. A. Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli, Emilio Parisotto
-
Publication number: 20190266246Abstract: In neural-network-based approaches to sequence modeling, an output sequence may be modeled via segmentations, the probability of the output sequence being constructed as a sum of products of output-segment probabilities, taken over all valid output-sequence segmentations. A set of artificial neural networks may model the distribution of the output-sequence probability with a recurrent neural network modeling the distributions of the individual output-segment probabilities, optionally in conjunction with a second recurrent neural network modeling concatenations of output segments. In various embodiments, this approach is applied to neural phrase-based machine translation.Type: ApplicationFiled: February 23, 2018Publication date: August 29, 2019Inventors: Chong Wang, Yining Wang, Po-Sen Huang, Abdelrahman Samir Abdelrahman Mohamed, Dengyong Zhou, Li Deng, Sitao Huang
-
Publication number: 20180275967Abstract: Described are systems, methods, and computer-readable media for program generation in a domain-specific language based on input-output examples. In accordance with various embodiments, a neural-network-based program generation model conditioned on an encoded set of input-output examples is used to generate a program tree by iteratively expanding a partial program tree, beginning with a root node and ending when all leaf nodes are terminal.Type: ApplicationFiled: March 27, 2017Publication date: September 27, 2018Inventors: Abdelrahman S.A. Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli, Emilio Parisotto
-
Publication number: 20150262313Abstract: A user interface may include instructions to complete a task (including a plurality of task items) and rule(s) that indicate to a worker how a payment associated with the task is to be calculated. The worker may provide information associated with the individual task items via the user interface. The payment may be calculated based on the rule(s), where the payment is determined based at least in part on a multiplicative payment component. In some implementations, the user interface may include an option for the worker to skip question(s), and the worker may be incentivized to skip question(s) when the worker does not know the answer. Further, in some implementations, the user interface may allow the worker to specify a confidence value when the worker chooses to answer the question, and the worker may be incentivized to provide an accurate confidence value.Type: ApplicationFiled: March 12, 2014Publication date: September 17, 2015Applicant: Microsoft CorporationInventors: Nihar Bhadresh Shah, Dengyong Zhou
-
Patent number: 9070046Abstract: Architecture that performs image page index selection. A learning-based framework learns a statistical model based on the hyperlink (URL-uniform resource locator) previous click information obtained from the image search users. The learned model can combine the features of a newly discovered URL to predict the possibility of the newly-discovered URL being clicked in the future image search. In addition to existing web index selection features, image clicks are added as features, and the image clicks are aggregated over different URL segments, as well as the site modeling pattern trees to reduce the sparse problem of the image click information.Type: GrantFiled: October 17, 2012Date of Patent: June 30, 2015Assignee: Microsoft Technology Licensing, LLCInventors: Bo Geng, Xian-Sheng Hua, Zhong Wu, Dengyong Zhou
-
Patent number: 8914321Abstract: A system and method infer true labels for multiple items. The inferred labels are generated from judgments. Multiple judges select the judgments from a specified choice of labels for each item. The method includes determining a characterization of judge expertise and item difficulties based on the judgments. The method also includes determining, using maximum entropy, a probability distribution over the specified choice of labels for each judge and item, based on the judgments. The method further includes selecting improved labels for the items from the specified choice such that the entropy over the probability distribution is reduced. The improved labels represent an improvement from the judgments toward the true labels. Additionally, the method includes performing iterative procedure to determine the true labels, the characterizations of judge expertise and the labeling difficulties.Type: GrantFiled: February 3, 2013Date of Patent: December 16, 2014Assignee: Microsoft CorporationInventors: Dengyong Zhou, Sumit Basu, Yi Mao, John C. Platt
-
Patent number: 8805754Abstract: A spam detection system is disclosed. The system includes a classifier training component that receives a first set of training pages labeled as normal pages and a second set of training pages labeled as spam pages. The training component trains a web page classifier based on both the first set of training pages and the second set of training pages. A spam detector then receives unlabeled web pages uses the web page classifier to classify the unlabeled web pages as spam pages or normal pages.Type: GrantFiled: June 19, 2013Date of Patent: August 12, 2014Assignee: Microsoft CorporationInventors: Dengyong Zhou, Christopher Burges, Tao Tao
-
Publication number: 20140222747Abstract: A system and method infer true labels for multiple items. The inferred labels are generated from judgments. Multiple judges select the judgments from a specified choice of labels for each item. The method includes determining a characterization of judge expertise and item difficulties based on the judgments. The method also includes determining, using maximum entropy, a probability distribution over the specified choice of labels for each judge and item, based on the judgments. The method further includes selecting improved labels for the items from the specified choice such that the entropy over the probability distribution is reduced. The improved labels represent an improvement from the judgments toward the true labels. Additionally, the method includes performing iterative procedure to determine the true labels, the characterizations of judge expertise and the labeling difficulties.Type: ApplicationFiled: February 3, 2013Publication date: August 7, 2014Applicant: MICROSOFT CORPORATIONInventors: Dengyong Zhou, Sumit Basu, Yi Mao, John C. Platt
-
Publication number: 20140172767Abstract: To optimize the number of correct decisions made by a crowdsourcing system given a fixed budget, tasks for multiple decisions are allocated to workers in a sequence. A task is allocated to a worker based on results already achieved for that task from other workers. Such allocation addresses the different levels of difficulty of decisions. A task also can be allocated to a worker based on results already received for other tasks from that worker. Such allocation addresses the different levels of reliability of workers. The process of allocating tasks to workers can be modeled as a Bayesian Markov decision process. Given the information already received for each item and worker, an estimate of the number of correct labels received can be determined. At each step, the system attempts to maximize the estimated number of correct labels it expects to have given the inputs so far.Type: ApplicationFiled: December 14, 2012Publication date: June 19, 2014Applicant: MICROSOFT CORPORATIONInventors: Xi Chen, Qihang Lin, Dengyong Zhou
-
Publication number: 20140105488Abstract: Architecture that performs image page index selection. A learning-based framework learns a statistical model based on the hyperlink (URL-uniform resource locator) previous click information obtained from the image search users. The learned model can combine the features of a newly discovered URL to predict the possibility of the newly-discovered URL being clicked in the future image search. In addition to existing web index selection features, image clicks are added as features, and the image clicks are aggregated over different URL segments, as well as the site modeling pattern trees to reduce the sparse problem of the image click information.Type: ApplicationFiled: October 17, 2012Publication date: April 17, 2014Applicant: MICROSOFT CORPORATIONInventors: Bo Geng, Xian-Sheng Hua, Zhong Wu, Dengyong Zhou