Patents by Inventor Ruofei Zhang
Ruofei Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11966428
Abstract: A training system produces a resource-efficient machine-trained model via a training architecture that employs plural processing paths. Some of the processing paths incorporate the use of auxiliary information that imparts external knowledge about source items being processed. The training architecture also employs contrastive learning that operates at different respective levels within the training architecture. For instance, the training architecture uses encoder-level contrastive learning to compare output information generated by different encoders within the training architecture. The training architecture uses decoder-level contrastive learning to compare output information produced by different decoders within the training architecture. An inference-stage system performs an application task using the model produced by the training system.
Type: Grant
Filed: July 1, 2021
Date of Patent: April 23, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jian Jiao, Yeyun Gong, Nan Duan, Ruofei Zhang
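The patent does not publish code for its contrastive learning objective; a minimal InfoNCE-style sketch of comparing one encoder's (or decoder's) output against another's might look like the following, where the function names, the cosine similarity choice, and the temperature value are all illustrative assumptions rather than details from the patent:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the anchor's representation toward its
    positive counterpart and push it away from the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

In this sketch, encoder-level contrastive learning would pass two encoders' outputs for the same source item as `anchor` and `positive`, with outputs for other items as `negatives`; decoder-level learning would do the same with decoder outputs.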
-
Publication number: 20240129437
Abstract: A method can include selecting, from at least a first avatar and a second avatar based on at least one attribute of a calendar event associated with a user, a session avatar, the first avatar being based on a first set of images of a user wearing a first outfit and the second avatar being based on a second set of images of the user wearing a second outfit, and presenting the session avatar during a videoconference, the presentation of the session avatar changing based on audio input received from the user during the videoconference.
Type: Application
Filed: October 18, 2022
Publication date: April 18, 2024
Inventors: Yinda Zhang, Ruofei Du
-
Patent number: 11960573
Abstract: Neural network-based categorization can be improved by incorporating graph neural networks that operate on a graph representing the taxonomy of the categories into which a given input is to be categorized by the neural network-based categorization. The output of a graph neural network, operating on a graph representing the taxonomy of categories, can be combined with the output of a neural network operating upon the input to be categorized, such as through an interaction of multidimensional output data, such as a dot product of output vectors. In such a manner, information conveying the explicit relationships between categories, as defined by the taxonomy, can be incorporated into the categorization. To recapture information, incorporate new information, or reemphasize information, a second neural network can also operate upon the input to be categorized, with the output of such a second neural network being merged with the output of the interaction.
Type: Grant
Filed: November 7, 2022
Date of Patent: April 16, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Tianchuan Du, Keng-Hao Chang, Ruofei Zhang, Paul Liu
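The dot-product interaction the abstract describes can be sketched in a few lines; the following toy version (with hypothetical names, and taking the graph neural network's per-category embeddings as given) scores each category by the dot product between the input encoder's output and that category's taxonomy-aware embedding:

```python
def categorize(input_vec, category_embeddings):
    """Score each category as the dot product of the input network's
    output vector with that category's graph-network embedding, and
    return the best-scoring category along with all scores."""
    scores = {cat: sum(a * b for a, b in zip(input_vec, emb))
              for cat, emb in category_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Because the category embeddings come from a graph neural network run over the taxonomy graph, categories that are close in the taxonomy get similar embeddings, so the dot-product scores reflect the explicit category relationships the abstract mentions.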
-
Patent number: 11954899
Abstract: Systems and methods for training models to predict dense correspondences across images such as human images. A model may be trained using synthetic training data created from one or more 3D computer models of a subject. In addition, one or more geodesic distances derived from the surfaces of one or more of the 3D models may be used to generate one or more loss values, which may in turn be used in modifying the model's parameters during training.
Type: Grant
Filed: March 11, 2021
Date of Patent: April 9, 2024
Assignee: GOOGLE LLC
Inventors: Yinda Zhang, Feitong Tan, Danhang Tang, Mingsong Dou, Kaiwen Guo, Sean Ryan Francesco Fanello, Sofien Bouaziz, Cem Keskin, Ruofei Du, Rohit Kumar Pandey, Deqing Sun
-
Patent number: 11921766
Abstract: Described herein are technologies related to constructing supplemental content items that summarize electronic landing pages. A sequence-to-sequence model that is configured to construct supplemental content items is trained based upon a corpus of electronic landing pages and supplemental content items that have been constructed by domain experts, wherein each landing page has a respective supplemental content item assigned thereto. The sequence-to-sequence model is additionally trained using self-critical sequence training, where estimated click-through rates of supplemental content items generated by the sequence-to-sequence model are employed to train the sequence-to-sequence model.
Type: Grant
Filed: September 2, 2022
Date of Patent: March 5, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Keng-hao Chang, Ruofei Zhang, John Weston Hughes
-
Publication number: 20240054326
Abstract: Systems and methods are provided for learning classifiers for annotating a document with predicted labels under extreme classification where there are over a million labels. The learning includes receiving a joint graph including documents and labels as nodes. Multi-dimensional vector representations of a document (i.e., document representations) are generated based on graph convolution of the joint graph. Each document representation varies an extent of reliance on neighboring nodes to accommodate context. The document representations are feature-transformed using a residual layer. Per-label document representations are generated from the transformed document representations based on neighboring label attention. A classifier is trained for each of over a million labels based on joint learning using training data and the per-label document representation. The trained classifier performs highly efficiently as compared to other classifiers trained using disjoint graphs of documents and labels.
Type: Application
Filed: April 12, 2021
Publication date: February 15, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Kushal DAVE, Deepak SAINI, Arnav Kumar JAIN, Jian JIAO, Amit Kumar Rambachan SINGH, Ruofei ZHANG, Manik VARMA
-
Publication number: 20240046037
Abstract: Systems and methods are provided for training a data model based on training data. The training includes pre-training and fine-tuning the data model based on a combination of an autoregressive (AR) model and a non-autoregressive (NAR) model. Training data may be received and encoded into streams of tokens. A pre-trainer during decoding generates a continuum of data structures of the AR and NAR combined model including a main stream and a series of predicting streams. Masked tokens in predicting streams reference or attend to one or more preceding tokens in the main stream or the preceding predicting streams. A fine-tuner selects streams to generate a trained model according to a target data model. The target data model is determined based on balancing an accuracy constraint and an efficiency constraint for predicting tokens. The decoder acts as a bridge between the AR and NAR models in generating a trained data model.
Type: Application
Filed: December 25, 2020
Publication date: February 8, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jian JIAO, Yeyun GONG, Nan DUAN, Weizhu CHEN, Kewen TANG, Qiang LOU, Ruofei ZHANG, Yu YAN, Jiusheng CHEN
-
Publication number: 20230394333
Abstract: A knowledge injection model for generative commonsense reasoning. In examples, an encoder-decoder model is used to generate a model output (204), a plausible description for a set of concepts. A prototype (218) is generated from an in-domain or out-of-domain knowledge corpus, which is further used as input (202) for the encoder-decoder model. Concept input tokens and prototype input tokens are scaled to limit potential skew that may be introduced by the prototype (218). Additionally, position indicators are generated for each input token, which indicate the relative position of each respective input token as compared to other input tokens. As such, when decoding the scaled encoded input tokens, the decoder (214) may be more attuned to the scenario bias that is introduced by the prototype (218) when generating a model output (204). Thus, the encoder-decoder model need not rely solely on the set of concepts when generating the model output (204).
Type: Application
Filed: November 12, 2020
Publication date: December 7, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jian JIAO, Yeyun GONG, Nan DUAN, Yameng HUANG, Ruofei ZHANG, Ming ZHOU
-
Publication number: 20230385315
Abstract: Systems and methods are provided for generating a keyword sequence from an input query. A first text sequence corresponding to an input query may be received and encoded into a source sequence representation using an encoder of a machine learning model. A keyword sequence may then be generated from the source sequence representation using a decoder of the machine learning model. The decoder may generate a modified generation score for a plurality of prediction tokens, wherein the modified generation score is based on the respective prediction token generation score and a maximum generation score for a suffix of each prediction token. The decoder may then select the prediction token of the plurality of prediction tokens based on the modified generation score, and add the selected prediction token to the previously decoded partial hypothesis provided by the decoder.
Type: Application
Filed: October 14, 2020
Publication date: November 30, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jian JIAO, Yeyun GONG, Nan DUAN, Ruofei ZHANG, Ming ZHOU
-
Publication number: 20230334320
Abstract: A neural architecture search (NAS) system generates a machine-trained model that satisfies specified real-time latency objectives by selecting among a collection of layer-wise sparse candidate models. In operation, the NAS system selects a parent model from among the candidate models. The NAS system then identifies a particular layer of the parent model, and then determines how the layer is to be mutated, to yield a child model. The NAS system calculates a reward score for the child model based on its latency and accuracy. The NAS system then uses reinforcement learning to update the trainable logic used to perform the mutating based on the reward score. The NAS system repeats the above process a plurality of times. An online application system can use the machine-trained model eventually produced by the NAS system to deliver real-time responses to user queries.
Type: Application
Filed: April 15, 2022
Publication date: October 19, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Li ZHANG, Youkow HOMMA, Yujing WANG, Min WU, Mao YANG, Ruofei ZHANG, Ting CAO, Wei SHEN
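The abstract says the reward score combines latency and accuracy but not how; one common way to express such a latency-constrained reward, offered here purely as an illustrative assumption (the function name, penalty form, and weight are not from the publication), is to subtract a penalty proportional to how far a child model overshoots its latency budget:

```python
def reward(accuracy, latency_ms, target_ms, penalty_weight=1.0):
    """Reward for a candidate (child) model in a latency-aware NAS loop:
    accuracy, minus a penalty proportional to how much the measured
    latency exceeds the real-time latency target."""
    overshoot = max(0.0, latency_ms - target_ms) / target_ms
    return accuracy - penalty_weight * overshoot
```

A reinforcement-learning controller maximizing this reward is pushed toward models that stay under the latency budget while keeping accuracy high, matching the trade-off the abstract describes.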
-
Publication number: 20230334350
Abstract: A computing device including a processor configured to receive data indicating, for a query category within a sampled time period, a matching density defined as a number of matches per query. The processor may generate a structural causal model (SCM) of the data within the sampled time period. The SCM may include a plurality of structural equations. Based at least in part on the plurality of structural equations, the processor may estimate a structural equation error value for the matching density. The processor may update a value of a target SCM output variable to a counterfactual updated value. Based at least in part on the SCM, the counterfactual updated value, and the structural equation error value, the processor may compute a predicted matching density when the target SCM output variable has the counterfactual updated value. The processor may output the predicted matching density.
Type: Application
Filed: April 14, 2022
Publication date: October 19, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Hua LI, Amit SHARMA, Jian JIAO, Ruofei ZHANG
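The estimate-the-error-then-recompute procedure in the abstract is the standard abduction/action/prediction recipe for counterfactuals. A minimal sketch with a single linear structural equation (the real SCM has a plurality of equations; the linear form, names, and coefficients below are assumptions for illustration only):

```python
def counterfactual_density(observed_x, observed_density, new_x,
                           slope, intercept):
    """Counterfactual prediction from one linear structural equation
    density = slope * x + intercept + error:
      1. abduction: recover the error term from the observed data,
      2. action: set x to its counterfactual updated value,
      3. prediction: recompute the density keeping the same error."""
    error = observed_density - (slope * observed_x + intercept)
    return slope * new_x + intercept + error
```

Keeping the recovered error term fixed is what makes the result a counterfactual for the observed unit, rather than a population-level prediction for a new input.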
-
Publication number: 20230267308
Abstract: Knowledge graphs can greatly improve the quality of content recommendation systems. There is a broad variety of knowledge graphs in the domain including clicked user-ad graphs, clicked query-ad graphs, keyword-display URL graphs etc. A hierarchical Transformer model learns entity embeddings in knowledge graphs. The model consists of two different Transformer blocks where the bottom block generates relation-dependent embeddings for the source entity and its neighbors, and the top block aggregates the outputs from the bottom block to produce the target entity embedding. To balance the information from contextual entities and the source entity itself, a masked entity model (MEM) task is combined with a link prediction task in model training.
Type: Application
Filed: May 4, 2023
Publication date: August 24, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jian JIAO, Xiaodong LIU, Ruofei ZHANG, Jianfeng GAO
-
Patent number: 11676001
Abstract: Knowledge graphs can greatly improve the quality of content recommendation systems. There is a broad variety of knowledge graphs in the domain including clicked user-ad graphs, clicked query-ad graphs, keyword-display URL graphs etc. A hierarchical Transformer model learns entity embeddings in knowledge graphs. The model consists of two different Transformer blocks where the bottom block generates relation-dependent embeddings for the source entity and its neighbors, and the top block aggregates the outputs from the bottom block to produce the target entity embedding. To balance the information from contextual entities and the source entity itself, a masked entity model (MEM) task is combined with a link prediction task in model training.
Type: Grant
Filed: November 9, 2020
Date of Patent: June 13, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jian Jiao, Xiaodong Liu, Ruofei Zhang, Jianfeng Gao
-
Publication number: 20230081624
Abstract: A training technique trains a neural network having sparsely-activated sub-networks. It does so by processing plural batches of training data in two respective passes of the neural network, yielding first prediction information and second prediction information. For each batch, the technique randomly assigns different sub-networks in the first and second passes of the neural network to process the batch. Over the course of training, the technique attempts to minimize loss information, which describes the difference between the first prediction information and ground-truth information, and the difference between the second prediction information and the ground-truth information. Simultaneously, the technique attempts to minimize divergence information, which describes the divergence of the first prediction information from the second prediction information (and vice versa).
Type: Application
Filed: October 11, 2021
Publication date: March 16, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jian JIAO, Xiaodong LIU, Jianfeng GAO, Ruofei ZHANG
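The two-pass objective the abstract describes (ground-truth loss for each pass plus a symmetric divergence between the two passes' predictions) can be written down directly. A toy sketch over probability vectors, where the use of cross-entropy, symmetric KL divergence, and the weight `alpha` are illustrative assumptions rather than details from the publication:

```python
import math

def cross_entropy(pred, target_idx):
    """Negative log-probability assigned to the true class."""
    return -math.log(pred[target_idx])

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def two_pass_loss(pred1, pred2, target_idx, alpha=1.0):
    """Loss for one batch routed through two randomly chosen
    sub-networks: both passes' cross-entropy against the ground truth,
    plus a symmetric divergence term pulling the two predictions
    toward each other."""
    ce = cross_entropy(pred1, target_idx) + cross_entropy(pred2, target_idx)
    div = 0.5 * (kl(pred1, pred2) + kl(pred2, pred1))
    return ce + alpha * div
```

When the two sub-networks agree, the divergence term vanishes and only the ground-truth losses remain; disagreement between the randomly assigned sub-networks is penalized, which regularizes the sparsely-activated network.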
-
Patent number: 11603017
Abstract: The present application describes a system and method for converting a natural language query to a standard query using a sequence-to-sequence neural network. As described herein, when a natural language query is received, the natural language query is converted to a standard query using a sequence-to-sequence model. In some cases, the sequence-to-sequence model is associated with an attention layer. A search using the standard query is performed and various documents may be returned. The documents that result from the search are scored based, at least in part, on a determined conditional entropy of the document. The conditional entropy is determined using the natural language query and the document.
Type: Grant
Filed: May 18, 2020
Date of Patent: March 14, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Keng-hao Chang, Ruofei Zhang, Zi Yin
-
Patent number: 11551039
Abstract: Neural network-based categorization can be improved by incorporating graph neural networks that operate on a graph representing the taxonomy of the categories into which a given input is to be categorized by the neural network-based categorization. The output of a graph neural network, operating on a graph representing the taxonomy of categories, can be combined with the output of a neural network operating upon the input to be categorized, such as through an interaction of multidimensional output data, such as a dot product of output vectors. In such a manner, information conveying the explicit relationships between categories, as defined by the taxonomy, can be incorporated into the categorization. To recapture information, incorporate new information, or reemphasize information, a second neural network can also operate upon the input to be categorized, with the output of such a second neural network being merged with the output of the interaction.
Type: Grant
Filed: April 28, 2020
Date of Patent: January 10, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Tianchuan Du, Keng-hao Chang, Ruofei Zhang, Paul Liu
-
Publication number: 20230004588
Abstract: A training system produces a resource-efficient machine-trained model via a training architecture that employs plural processing paths. Some of the processing paths incorporate the use of auxiliary information that imparts external knowledge about source items being processed. The training architecture also employs contrastive learning that operates at different respective levels within the training architecture. For instance, the training architecture uses encoder-level contrastive learning to compare output information generated by different encoders within the training architecture. The training architecture uses decoder-level contrastive learning to compare output information produced by different decoders within the training architecture. An inference-stage system performs an application task using the model produced by the training system.
Type: Application
Filed: July 1, 2021
Publication date: January 5, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jian JIAO, Yeyun GONG, Nan DUAN, Ruofei ZHANG
-
Publication number: 20220414134
Abstract: Described herein are technologies related to constructing supplemental content items that summarize electronic landing pages. A sequence-to-sequence model that is configured to construct supplemental content items is trained based upon a corpus of electronic landing pages and supplemental content items that have been constructed by domain experts, wherein each landing page has a respective supplemental content item assigned thereto. The sequence-to-sequence model is additionally trained using self-critical sequence training, where estimated click-through rates of supplemental content items generated by the sequence-to-sequence model are employed to train the sequence-to-sequence model.
Type: Application
Filed: September 2, 2022
Publication date: December 29, 2022
Inventors: Keng-hao CHANG, Ruofei ZHANG, John Weston HUGHES
-
Publication number: 20220318601
Abstract: Computing technology is described herein that provides an attention mechanism, implemented by a neural network, that generates attention information based on head-specific query information and shared key and value (KV) information, without computing head-specific key information and head-specific value information, and without caching the head-specific key information and the head-specific value information in memory. This manner of operation allows the computing technology to make efficient use of processing and memory resources. In some implementations, the attention mechanism is part of a decoder of an encoder-decoder system, or a standalone decoder system. In some implementations, the computing technology leverages the attention information to generate synthesized text based on input text.
Type: Application
Filed: April 3, 2021
Publication date: October 6, 2022
Inventors: Yu YAN, Jiusheng CHEN, Nikhil BHENDAWADE, Yeyun GONG, Nan DUAN, Ruofei ZHANG
-
Patent number: 11461415
Abstract: A technique is described herein for processing a given query item in a latency-efficient and resource-efficient manner. The technique uses a first transformer-based encoder to transform the given query item into an encoded query item. In one case, the given query item is an expression that includes one or more query-expression linguistic tokens. The technique includes a second transformer-based encoder for transforming a given target item into an encoded target item. The given target item may likewise correspond to an expression that includes one or more target-expression linguistic tokens. A similarity-assessing mechanism then assesses the semantic similarity between the given query item and the given target item based on the encoded query item and the encoded target item. Each transformer-based encoder uses one or more self-attention mechanisms. The second transformer-based encoder can optionally perform its work in an offline manner, prior to receipt of the given query item.
Type: Grant
Filed: February 6, 2020
Date of Patent: October 4, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Wenhao Lu, Jian Jiao, Ruofei Zhang
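The latency win in this dual-encoder design comes from encoding targets offline so that only the query is encoded at serving time. A minimal sketch of that split, in which `encode` stands in for a transformer-based encoder and the dot-product-on-normalized-vectors similarity is an assumption (the patent's similarity-assessing mechanism is not specified here):

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot product equals cosine."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def build_index(targets, encode):
    """Offline step: encode and normalize every target item once,
    before any query arrives."""
    return {name: normalize(encode(text)) for name, text in targets.items()}

def rank(query, index, encode):
    """Online step: encode only the query, then rank precomputed
    targets by dot product (cosine similarity on unit vectors)."""
    q = normalize(encode(query))
    scores = {name: sum(a * b for a, b in zip(q, t))
              for name, t in index.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Only one encoder forward pass happens per query, which is what makes the scheme latency- and resource-efficient at serving time.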