Patents by Inventor Dengyong Zhou

Dengyong Zhou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11954442
    Abstract: The present disclosure is directed to systems and methods for performing reading comprehension with machine learning. More specifically, the present disclosure is directed to a Neural Symbolic Reader (example implementations of which may be referred to as NeRd), which includes a reader to encode the passage and question, and a programmer to generate a program for multi-step reasoning. By using operators like span selection, the program can be executed over a natural language text passage to generate an answer to a natural language text question. NeRd is domain-agnostic such that the same neural architecture works for different domains. Further, NeRd is compositional such that complex programs can be generated by compositionally applying the symbolic operators.
    Type: Grant
    Filed: August 6, 2020
    Date of Patent: April 9, 2024
    Assignee: GOOGLE LLC
    Inventors: Chen Liang, Wei Yu, Quoc V. Le, Xinyun Chen, Dengyong Zhou
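The symbolic-operator idea in the abstract above can be illustrated with a minimal sketch: a tiny interpreter executes a compositional program of operators (a span selector plus arithmetic) over a tokenized passage. The operator names, program encoding, and passage are all invented for illustration; the patented system generates such programs with a neural programmer.

```python
# Minimal sketch: executing a compositional symbolic program over a text
# passage. Operator names (SPAN, DIFF, VALUE, COUNT) are hypothetical.
def execute(program, tokens):
    op, args = program[0], list(program[1:])
    if op == "SPAN":                 # select tokens[i:j] from the passage
        i, j = args
        return " ".join(tokens[i:j])
    if op == "COUNT":                # count the sub-programs in the argument list
        return len(args)
    if op == "DIFF":                 # arithmetic over sub-program results
        a, b = (execute(p, tokens) for p in args)
        return a - b
    if op == "VALUE":                # literal extracted from the passage
        return args[0]
    raise ValueError(f"unknown operator: {op}")

tokens = "the home team scored 21 points and the visitors scored 14".split()
assert execute(("SPAN", 3, 5), tokens) == "scored 21"
assert execute(("DIFF", ("VALUE", 21), ("VALUE", 14)), tokens) == 7
```

Because programs nest arbitrarily, complex multi-step reasoning falls out of composing these few operators, which is the compositionality property the abstract claims.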
  • Publication number: 20240013059
    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
    Type: Application
    Filed: September 21, 2023
    Publication date: January 11, 2024
    Inventors: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
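The layer-wise transfer described above can be sketched numerically: a projection matrix maps the smaller student hidden state into the teacher's space so a distillation loss can be computed and minimized. The dimensions, data, and single-pair gradient step are invented for illustration and are not the patented training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d_teacher, d_student = 8, 4

# Hypothetical hidden states for one token from corresponding layers.
h_teacher = rng.normal(size=d_teacher)
h_student = rng.normal(size=d_student)

# A shared projection maps the student's hidden state into the teacher's
# space so a layer-wise distillation loss can be computed.
U = rng.normal(size=(d_teacher, d_student)) * 0.1

def layer_loss(h_t, h_s, U):
    return float(np.mean((h_t - U @ h_s) ** 2))

# One gradient-descent step on U for this single pair of hidden states.
lr = 0.01
residual = h_teacher - U @ h_student
grad = -2.0 * np.outer(residual, h_student) / d_teacher
before = layer_loss(h_teacher, h_student, U)
U -= lr * grad
after = layer_loss(h_teacher, h_student, U)
assert after < before   # the projection improves the layer-wise match
```

In the dual-training mechanism the abstract describes, updates of this kind would run jointly with the usual language-model losses for both models.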
  • Publication number: 20230394328
    Abstract: Example embodiments of aspects of the present disclosure provide an example computer-implemented method for improved prompting of a machine-learned model. The example method can include obtaining an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response. The example method can include inputting, to a machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence. The example method can include generating, using the machine-learned model and responsive to the operative query, an operative response.
    Type: Application
    Filed: August 5, 2022
    Publication date: December 7, 2023
    Inventors: Jason Weng Wei, Dengyong Zhou, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean Andrè Bousquet, Le Hou, Nathan Kemp Sekiguchi Scales, David J. Bieber, Charles Aloysius Sutton, Nathanael Martin Schärli, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, Aitor Lewkowycz, Jiageng Luan, David Martin Dohan, Henryk Michalewski, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Xuezhi Wang
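The prompting scheme above can be sketched as plain string assembly: an instructive query, a trace of intermediate states, and an instructive response are concatenated ahead of the operative query, and the model attends over the whole sequence. The example content and field names are invented.

```python
# Minimal sketch of building the prompt the abstract describes. The
# instructive sequence and operative query below are invented examples.
instructive = {
    "query": "Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many now?",
    "trace": "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
    "response": "A: 11",
}
operative_query = "Q: A baker had 23 apples and used 20 of them. How many are left?"

def build_prompt(instructive, operative_query):
    # The model processes the operative query with attention over the
    # instructive sequence that precedes it.
    return "\n".join([
        instructive["query"],
        instructive["trace"],
        instructive["response"],
        operative_query,
    ])

prompt = build_prompt(instructive, operative_query)
assert prompt.endswith(operative_query)
assert instructive["trace"] in prompt
```

The key design point is that the instructive trace of intermediate states, not just the final answer, is in the model's context when it generates the operative response.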
  • Patent number: 11797862
    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
    Type: Grant
    Filed: January 22, 2020
    Date of Patent: October 24, 2023
    Assignee: GOOGLE LLC
    Inventors: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
  • Publication number: 20230289626
    Abstract: Provided are computing systems, methods, and platforms for negative sampling in knowledge graphs with improved efficiency. A knowledge graph comprising entities and links between the entities can be obtained. A query computation graph comprising nodes and edges can be generated based on the knowledge graph. The nodes of the query computation graph can include anchor nodes, a root node, and intermediate nodes positioned in paths between the anchor nodes and the root node. A node cut of a query of the query computation graph can be determined and can include at least one node that cuts at least one path between each anchor node and the root node of the query computation graph. Negative samples can be identified by bidirectionally traversing the query computation graph in a first direction from the anchor nodes to the node cut and in a second direction from the root node to the node cut.
    Type: Application
    Filed: March 14, 2023
    Publication date: September 14, 2023
    Inventors: Hanjun Dai, Dale Eric Schuurmans, Xinyun Chen, Dengyong Zhou, Bo Dai, Hongyu Ren
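The bidirectional-traversal idea above can be sketched on a toy graph: entities reachable forward from the anchors up to the node cut, minus those consistent with the root when traversing backward, serve as negative samples. The graph, relation, and cut placement are invented for illustration.

```python
# Minimal sketch of negative sampling at a node cut. The toy knowledge
# graph (one relation, head -> set of tails) is invented.
edges = {
    "a1": {"x", "y"},
    "a2": {"y", "z"},
    "x": {"r"},
    "y": {"r"},
    "z": {"q"},
}

def reach(nodes, edges):
    # One-hop forward reachability from a set of nodes.
    return set().union(*(edges.get(n, set()) for n in nodes)) if nodes else set()

# Forward pass: anchors -> cut (here the cut sits one hop above the anchors).
cut_candidates = reach({"a1", "a2"}, edges)            # {"x", "y", "z"}

# Backward pass: keep cut nodes whose next hop reaches the root answer "r".
cut_positive = {n for n in cut_candidates if "r" in edges.get(n, set())}

# Candidates reachable from the anchors but inconsistent with the root
# are the negative samples.
negatives = cut_candidates - cut_positive
assert negatives == {"z"}
```

The efficiency claim in the abstract comes from meeting in the middle: neither direction has to enumerate complete anchor-to-root paths.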
  • Publication number: 20230244938
    Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples.
    Type: Application
    Filed: January 27, 2023
    Publication date: August 3, 2023
    Inventors: Jason Weng Wei, Dengyong Zhou, Xuezhi Wang, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean Andrè Bousquet, Le Hou, Charles Aloysius Sutton, Nathanael Martin Schärli, Nathan Kemp Sekiguchi Scales, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, David Martin Dohan, Aitor Lewkowycz, Henryk Michalewski, Jiageng Luan, David J. Bieber, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Yi Tay, Mostafa Dehghani
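The pretraining framework above can be sketched by generating corrupted examples under different combinations of configuration parameters. The parameter names (`span_len`, `rate`), mask token, and corruption scheme are assumptions of this sketch, not the patented framework.

```python
import random

# Two invented combinations of configuration parameters: short sparse
# spans versus long dense spans.
configs = [
    {"span_len": 1, "rate": 0.15},
    {"span_len": 3, "rate": 0.5},
]

def corrupt(tokens, span_len, rate, rng):
    """Replace spans with a mask token; return the corrupted sequence and
    the subportions the model must reconstruct."""
    out, targets, i = [], [], 0
    while i < len(tokens):
        if rng.random() < rate and i + span_len <= len(tokens):
            out.append("<mask>")                     # corrupted subportion
            targets.append(tokens[i:i + span_len])   # uncorrupted target
            i += span_len
        else:
            out.append(tokens[i])
            i += 1
    return out, targets

rng = random.Random(0)
tokens = "the quick brown fox jumps over the lazy dog".split()
examples = [corrupt(tokens, c["span_len"], c["rate"], rng) for c in configs]

# Invariant: one reconstruction target per mask, for every configuration.
for out, targets in examples:
    assert out.count("<mask>") == len(targets)
```

Each configuration yields a differently corrupted view of the same training text, which is what lets a single model be pretrained against a mixture of objectives.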
  • Publication number: 20220108221
    Abstract: Systems and methods of the present disclosure are directed to a computer-implemented method. The method can include obtaining a machine-learned model comprising a plurality of model units, wherein each model unit comprises a plurality of parameters that are tied to a shared plurality of parameters. The method can include performing a first plurality of training iterations with the machine-learned model to adjust parameters of the shared plurality of parameters. The method can include detecting, based on the first plurality of training iterations, an occurrence of an untying condition. The method can include untying the parameters of one or more model units from the shared plurality of parameters. The method can include performing a second plurality of training iterations with the machine-learned model to adjust parameters of the one or more model units independent of the shared plurality of parameters.
    Type: Application
    Filed: October 4, 2021
    Publication date: April 7, 2022
    Inventors: Dengyong Zhou, Xiaodan Song, Shuo Yang, Qiang Liu, Le Hou
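The tie-then-untie schedule above can be sketched with dictionaries standing in for model units: while tied, one update to the shared parameters moves every unit; once the untying condition fires, each unit gets an independent copy. The weights and the untying condition here are invented.

```python
# Phase 1: all model units are tied to one shared parameter object.
shared = {"w": 1.0}
units = [shared, shared, shared]

# A single update to the shared parameters moves every tied unit.
shared["w"] += 0.5
assert all(u["w"] == 1.5 for u in units)

# Untying condition detected (assumed here, e.g. a training milestone):
# give each unit its own copy of the shared parameters.
units = [dict(shared) for _ in units]

# Phase 2: updates to one unit no longer affect the others.
units[0]["w"] += 1.0
assert units[0]["w"] == 2.5 and units[1]["w"] == 1.5
```

The tied phase amortizes early training across units; untying then lets each unit specialize, which is the two-stage structure the abstract claims.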
  • Publication number: 20220043981
    Abstract: The present disclosure is directed to systems and methods for performing reading comprehension with machine learning. More specifically, the present disclosure is directed to a Neural Symbolic Reader (example implementations of which may be referred to as NeRd), which includes a reader to encode the passage and question, and a programmer to generate a program for multi-step reasoning. By using operators like span selection, the program can be executed over a natural language text passage to generate an answer to a natural language text question. NeRd is domain-agnostic such that the same neural architecture works for different domains. Further, NeRd is compositional such that complex programs can be generated by compositionally applying the symbolic operators.
    Type: Application
    Filed: August 6, 2020
    Publication date: February 10, 2022
    Inventors: Chen Liang, Wei Yu, Quoc V. Le, Xinyun Chen, Dengyong Zhou
  • Publication number: 20210224660
    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
    Type: Application
    Filed: January 22, 2020
    Publication date: July 22, 2021
    Inventors: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
  • Publication number: 20210065066
    Abstract: A deep state space generative model is augmented with intervention prediction. The state space model provides a principled way to capture the interactions among observations, interventions, critical event occurrences, true states, and associated uncertainty. The state space model can include a discrete-time hazard rate model that provides flexible fitting of general survival time distributions. The state space model can output a joint prediction of event risk, observation and intervention trajectories based on patterns in temporal progressions, and correlations between past measurements and interventions.
    Type: Application
    Filed: August 31, 2020
    Publication date: March 4, 2021
    Inventors: Yuan Xue, Dengyong Zhou, Nan Du, Andrew Mingbo Dai, Zhen Xu, Kun Zhang, Yingwei Cui
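The discrete-time hazard component above has a simple closed form worth making explicit: given a per-step hazard (the probability the event occurs at step t conditional on survival so far), survival and event-time probabilities follow directly. The hazard values below are invented.

```python
# Minimal sketch of a discrete-time hazard model. hazards[t] is the
# probability the event occurs at step t given survival up to step t.
hazards = [0.1, 0.2, 0.4]

def survival_and_event_probs(hazards):
    surv, p_event, s = [], [], 1.0
    for h in hazards:
        p_event.append(s * h)   # event occurs exactly at this step
        s *= 1.0 - h            # survive past this step
        surv.append(s)
    return surv, p_event

surv, p_event = survival_and_event_probs(hazards)

# Sanity check: P(event by T) + P(survive past T) = 1.
assert abs(sum(p_event) + surv[-1] - 1.0) < 1e-12
```

Because the hazards are free parameters per step, this parameterization can fit general survival-time distributions, which is the flexibility the abstract cites.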
  • Patent number: 10795645
    Abstract: Described are systems, methods, and computer-readable media for program generation in a domain-specific language based on input-output examples. In accordance with various embodiments, a neural-network-based program generation model conditioned on an encoded set of input-output examples is used to generate a program tree by iteratively expanding a partial program tree, beginning with a root node and ending when all leaf nodes are terminal.
    Type: Grant
    Filed: March 27, 2017
    Date of Patent: October 6, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Abdelrahman S. A. Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli, Emilio Parisotto
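The iterative expansion above can be sketched with a toy grammar: starting from the root non-terminal, the leftmost non-terminal leaf is repeatedly expanded until only terminals remain. The grammar is invented, and a seeded random choice stands in for the neural model that ranks candidate expansions.

```python
import random

# Toy DSL grammar: non-terminal -> list of possible expansions.
grammar = {
    "E": [["E", "+", "E"], ["num"]],
}

def generate(rng, max_expand=10):
    tree = ["E"]                       # start from the root non-terminal
    for _ in range(max_expand):
        i = next((k for k, s in enumerate(tree) if s in grammar), None)
        if i is None:
            return tree                # all leaves are terminal
        rule = rng.choice(grammar[tree[i]])
        tree[i:i + 1] = rule           # expand the leftmost non-terminal
    # Budget exhausted: force remaining non-terminals to a terminal.
    return [s if s not in grammar else "num" for s in tree]

program = generate(random.Random(0))
assert "E" not in program              # generation ends at terminal leaves
```

In the patented system the expansion choice is conditioned on an encoding of the input-output examples rather than drawn at random.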
  • Publication number: 20190266246
    Abstract: In neural-network-based approaches to sequence modeling, an output sequence may be modeled via segmentations, the probability of the output sequence being constructed as a sum of products of output-segment probabilities, taken over all valid output-sequence segmentations. A set of artificial neural networks may model the distribution of the output-sequence probability with a recurrent neural network modeling the distributions of the individual output-segment probabilities, optionally in conjunction with a second recurrent neural network modeling concatenations of output segments. In various embodiments, this approach is applied to neural phrase-based machine translation.
    Type: Application
    Filed: February 23, 2018
    Publication date: August 29, 2019
    Inventors: Chong Wang, Yining Wang, Po-Sen Huang, Abdelrahman Samir Abdelrahman Mohamed, Dengyong Zhou, Li Deng, Sitao Huang
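The sum-over-segmentations construction above admits a compact dynamic program: the probability of a prefix is the sum, over the length of its last segment, of the prefix probability before that segment times the segment's probability. The per-segment model below is an invented stand-in for the recurrent network.

```python
# Toy segment model: longer segments are exponentially less likely.
def seg_prob(segment):
    return 0.5 ** len(segment)

def sequence_prob(seq, max_seg=3):
    # alpha[i] = total probability of generating seq[:i], summed over all
    # valid segmentations of that prefix.
    alpha = [0.0] * (len(seq) + 1)
    alpha[0] = 1.0
    for i in range(1, len(seq) + 1):
        for k in range(1, min(max_seg, i) + 1):
            alpha[i] += alpha[i - k] * seg_prob(seq[i - k:i])
    return alpha[-1]

# For a length-2 sequence there are two segmentations:
# P("ab") + P("a")P("b") = 0.25 + 0.25 = 0.5
assert abs(sequence_prob("ab") - 0.5) < 1e-12
```

This recurrence is what makes training tractable: the exponentially many segmentations are summed in time linear in the sequence length (times the maximum segment length).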
  • Publication number: 20180275967
    Abstract: Described are systems, methods, and computer-readable media for program generation in a domain-specific language based on input-output examples. In accordance with various embodiments, a neural-network-based program generation model conditioned on an encoded set of input-output examples is used to generate a program tree by iteratively expanding a partial program tree, beginning with a root node and ending when all leaf nodes are terminal.
    Type: Application
    Filed: March 27, 2017
    Publication date: September 27, 2018
    Inventors: Abdelrahman S.A. Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli, Emilio Parisotto
  • Publication number: 20150262313
    Abstract: A user interface may include instructions to complete a task (including a plurality of task items) and rule(s) that indicate to a worker how a payment associated with the task is to be calculated. The worker may provide information associated with the individual task items via the user interface. The payment may be calculated based on the rule(s), where the payment is determined based at least in part on a multiplicative payment component. In some implementations, the user interface may include an option for the worker to skip question(s), and the worker may be incentivized to skip question(s) when the worker does not know the answer. Further, in some implementations, the user interface may allow the worker to specify a confidence value when the worker chooses to answer the question, and the worker may be incentivized to provide an accurate confidence value.
    Type: Application
    Filed: March 12, 2014
    Publication date: September 17, 2015
    Applicant: Microsoft Corporation
    Inventors: Nihar Bhadresh Shah, Dengyong Zhou
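A payment rule of the multiplicative kind described above can be sketched as follows: a skipped question leaves the payment unchanged, a correct answer multiplies it up, and any incorrect answer collapses it, so a worker who does not know an answer is better off skipping. The base amount, multiplier, and forfeit-on-error rule are invented for illustration.

```python
def payment(responses, truths, base=10.0, boost=1.2):
    """Multiplicative payment sketch: None means the worker skipped."""
    pay = base
    for response, truth in zip(responses, truths):
        if response is None:
            continue                  # skipping neither helps nor hurts
        if response == truth:
            pay *= boost              # correct answer multiplies the payment
        else:
            return 0.0                # a wrong answer forfeits the payment
    return pay

truths = ["A", "B", "C"]
assert payment(["A", None, "C"], truths) == 10.0 * 1.2 * 1.2
assert payment(["A", "B", "X"], truths) == 0.0
```

Under such a rule, answering a question one is unsure about risks the entire accumulated payment, which is the incentive to skip that the abstract describes; confidence-weighted variants scale the multiplier with the reported confidence.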
  • Patent number: 9070046
    Abstract: Architecture that performs image page index selection. A learning-based framework learns a statistical model from previous click information for hyperlinks (URLs, uniform resource locators) obtained from image search users. The learned model can combine the features of a newly discovered URL to predict the likelihood of that URL being clicked in future image searches. In addition to existing web index selection features, image clicks are added as features and aggregated over different URL segments, as well as over the site-modeling pattern trees, to reduce the sparsity of the image click information.
    Type: Grant
    Filed: October 17, 2012
    Date of Patent: June 30, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bo Geng, Xian-Sheng Hua, Zhong Wu, Dengyong Zhou
  • Patent number: 8914321
    Abstract: A system and method infer true labels for multiple items. The inferred labels are generated from judgments. Multiple judges select the judgments from a specified choice of labels for each item. The method includes determining a characterization of judge expertise and item difficulties based on the judgments. The method also includes determining, using maximum entropy, a probability distribution over the specified choice of labels for each judge and item, based on the judgments. The method further includes selecting improved labels for the items from the specified choice such that the entropy over the probability distribution is reduced. The improved labels represent an improvement from the judgments toward the true labels. Additionally, the method includes performing an iterative procedure to determine the true labels, the characterizations of judge expertise, and the labeling difficulties.
    Type: Grant
    Filed: February 3, 2013
    Date of Patent: December 16, 2014
    Assignee: Microsoft Corporation
    Inventors: Dengyong Zhou, Sumit Basu, Yi Mao, John C. Platt
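The iterate-between-labels-and-expertise structure above can be sketched with a much simpler stand-in for the maximum-entropy formulation: judge weights are estimated from agreement with the current label estimates, and labels are re-estimated by weighted vote. The judgment data are invented, and this weighted-vote loop is a simplification, not the patented method.

```python
from collections import Counter

# Invented judgments: item -> {judge: label}.
judgments = {
    "i1": {"j1": "cat", "j2": "cat", "j3": "dog"},
    "i2": {"j1": "dog", "j2": "dog", "j3": "dog"},
    "i3": {"j1": "cat", "j2": "dog", "j3": "dog"},
}

def infer(judgments, iters=5):
    # Initialize labels by simple majority vote.
    labels = {i: Counter(v.values()).most_common(1)[0][0]
              for i, v in judgments.items()}
    for _ in range(iters):
        # Judge expertise: agreement count with current label estimates.
        weight = {j: 1e-6 for v in judgments.values() for j in v}
        for i, v in judgments.items():
            for j, lab in v.items():
                weight[j] += lab == labels[i]
        # Re-estimate labels by expertise-weighted vote.
        for i, v in judgments.items():
            score = Counter()
            for j, lab in v.items():
                score[lab] += weight[j]
            labels[i] = score.most_common(1)[0][0]
    return labels

labels = infer(judgments)
assert labels["i1"] == "cat" and labels["i2"] == "dog"
```

The patented approach replaces the weighted vote with maximum-entropy distributions over labels and explicitly models item difficulty as well as judge expertise.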
  • Patent number: 8805754
    Abstract: A spam detection system is disclosed. The system includes a classifier training component that receives a first set of training pages labeled as normal pages and a second set of training pages labeled as spam pages. The training component trains a web page classifier based on both the first set of training pages and the second set of training pages. A spam detector then receives unlabeled web pages and uses the web page classifier to classify the unlabeled web pages as spam pages or normal pages.
    Type: Grant
    Filed: June 19, 2013
    Date of Patent: August 12, 2014
    Assignee: Microsoft Corporation
    Inventors: Dengyong Zhou, Christopher Burges, Tao Tao
  • Publication number: 20140222747
    Abstract: A system and method infer true labels for multiple items. The inferred labels are generated from judgments. Multiple judges select the judgments from a specified choice of labels for each item. The method includes determining a characterization of judge expertise and item difficulties based on the judgments. The method also includes determining, using maximum entropy, a probability distribution over the specified choice of labels for each judge and item, based on the judgments. The method further includes selecting improved labels for the items from the specified choice such that the entropy over the probability distribution is reduced. The improved labels represent an improvement from the judgments toward the true labels. Additionally, the method includes performing an iterative procedure to determine the true labels, the characterizations of judge expertise, and the labeling difficulties.
    Type: Application
    Filed: February 3, 2013
    Publication date: August 7, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Dengyong Zhou, Sumit Basu, Yi Mao, John C. Platt
  • Publication number: 20140172767
    Abstract: To optimize the number of correct decisions made by a crowdsourcing system given a fixed budget, tasks for multiple decisions are allocated to workers in a sequence. A task is allocated to a worker based on results already achieved for that task from other workers. Such allocation addresses the different levels of difficulty of decisions. A task also can be allocated to a worker based on results already received for other tasks from that worker. Such allocation addresses the different levels of reliability of workers. The process of allocating tasks to workers can be modeled as a Bayesian Markov decision process. Given the information already received for each item and worker, an estimate of the number of correct labels received can be determined. At each step, the system attempts to maximize the estimated number of correct labels it expects to have given the inputs so far.
    Type: Application
    Filed: December 14, 2012
    Publication date: June 19, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Xi Chen, Qihang Lin, Dengyong Zhou
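The allocation idea above can be sketched with a one-step heuristic: given a Beta posterior over each item's binary label, the next judgment goes to the item whose current decision is least certain, since that is where another label most improves the expected number of correct decisions. The counts are invented, and this greedy rule is a simplification of the Bayesian Markov decision process in the abstract.

```python
def uncertainty(pos, neg):
    """Distance of the posterior mean from a confident 0/1 decision,
    under a uniform Beta(1, 1) prior."""
    p = (pos + 1) / (pos + neg + 2)
    return min(p, 1 - p)

# Invented state: item -> (positive judgments, negative judgments).
items = {"i1": (4, 0), "i2": (2, 2), "i3": (5, 1)}

# Allocate the next task to the most uncertain item.
next_item = max(items, key=lambda i: uncertainty(*items[i]))
assert next_item == "i2"   # a 2-2 split is the least settled decision
```

A full policy would also fold in each worker's estimated reliability when valuing a prospective judgment, per the abstract.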
  • Publication number: 20140105488
    Abstract: Architecture that performs image page index selection. A learning-based framework learns a statistical model from previous click information for hyperlinks (URLs, uniform resource locators) obtained from image search users. The learned model can combine the features of a newly discovered URL to predict the likelihood of that URL being clicked in future image searches. In addition to existing web index selection features, image clicks are added as features and aggregated over different URL segments, as well as over the site-modeling pattern trees, to reduce the sparsity of the image click information.
    Type: Application
    Filed: October 17, 2012
    Publication date: April 17, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Bo Geng, Xian-Sheng Hua, Zhong Wu, Dengyong Zhou