Patents by Inventor Nanshan Zeng

Nanshan Zeng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ANNOTATING IMAGES FOR TRAINING COMPUTER VISION MODELS

Publication number: 20250148765

Abstract: A method for annotating images to create a corpus for training a multi-task computer vision machine learning model is presented. The method comprises receiving, at one or more annotation specialist models, a plurality of images to be annotated. Via operation of the one or more annotation specialist models, pre-filtered annotations are generated for the plurality of images. Via operation of a data filtering and enhancement module, the pre-filtered annotations are filtered in accordance with predefined noise criteria so as to output candidate annotations for the plurality of images. The method further comprises, for each of one or more candidate annotations, selectively (1) storing the candidate annotation into the corpus as a final annotation for its associated image, or (2) adding the candidate annotation to its associated image using the one or more annotation specialist models and the data filtering and enhancement module for subsequent iterative annotation and filtering.
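The store-or-iterate loop in this abstract can be sketched in Python. This is a minimal illustration, not the applicant's implementation. The specialist models, the noise criterion (`is_noisy`) and the store-or-iterate decision (`is_final`) are hypothetical callables. "Adding the candidate annotation to its associated image" is modeled here as appending it to a per-image context list that the specialists see in the next round.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a specialist takes an image id plus the annotations
# already attached to it and returns new (pre-filtered) annotations.
Specialist = Callable[[str, List[str]], List[str]]


def annotate_corpus(
    images: Sequence[str],
    specialists: Sequence[Specialist],
    is_noisy: Callable[[str], bool],   # hypothetical predefined noise criterion
    is_final: Callable[[str], bool],   # hypothetical store-or-iterate decision
    max_rounds: int = 3,
) -> Dict[str, List[str]]:
    corpus: Dict[str, List[str]] = {img: [] for img in images}   # final annotations
    context: Dict[str, List[str]] = {img: [] for img in images}  # fed back to the models
    pending = list(images)
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending = []
        for img in pending:
            # 1. Specialist models produce pre-filtered annotations.
            pre_filtered = [a for model in specialists for a in model(img, context[img])]
            # 2. The filtering step drops annotations that meet the noise criterion.
            candidates = [a for a in pre_filtered if not is_noisy(a)]
            # 3. Each candidate is stored as final or attached for another round.
            needs_another_round = False
            for a in candidates:
                if is_final(a):
                    corpus[img].append(a)
                else:
                    context[img].append(a)
                    needs_another_round = True
            if needs_another_round:
                still_pending.append(img)
        pending = still_pending
    return corpus
```

The `max_rounds` cap is an added safeguard so that images whose candidates never become final do not loop forever.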

Type: Application

Filed: January 30, 2024

Publication date: May 8, 2025

Applicant: Microsoft Technology Licensing, LLC

Inventors: Lu YUAN, Bin XIAO, Haiping WU, Weijian XU, Xiyang DAI, Houdong HU, Yumao LU, Nanshan ZENG, Ce Christopher LIU

AUTOMATIC LANGUAGE MODEL (LM) INPUT OPTIMIZATION USING TEXTUAL GRADIENTS

Publication number: 20250111147

Abstract: Systems and methods are provided for implementing automatic prompt optimization using textual gradients. In various embodiments, a feedback prompt, input into a large language model (“LLM”), is used to generate textual gradients that criticize a current prompt. The feedback prompt includes the current prompt and predictions that are incorrect compared with corresponding labels associated with minibatch data processed by the LLM using the current prompt. The textual gradients and current prompt are used in an editing prompt to the LLM to obtain a set of optimized prompts, which may be expanded using a paraphrasing prompt that is input into the LLM to generate a set of paraphrased prompts. A selection algorithm is used to select one or more optimized prompts from the set of optimized prompts and/or the set of paraphrased prompts, and the process is repeated with the selected one or more optimized prompts replacing the current prompt.
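The optimization loop described in this abstract can be sketched as follows. Each LLM call (running the task, the feedback prompt, the editing prompt, the paraphrasing prompt) is represented by a hypothetical callable. The selection algorithm is assumed to be a greedy top-k by a caller-supplied score such as minibatch accuracy; the abstract does not specify it.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]        # (input, label)
Error = Tuple[str, str, str]     # (input, label, wrong prediction)


def optimize_prompt(
    prompt: str,
    minibatch: Sequence[Example],
    predict: Callable[[str, str], str],             # task LLM call (hypothetical)
    criticize: Callable[[str, List[Error]], str],   # feedback prompt -> textual gradient
    edit: Callable[[str, str], List[str]],          # editing prompt -> optimized prompts
    paraphrase: Callable[[str], List[str]],         # paraphrasing prompt -> variants
    score: Callable[[str], float],                  # used by the selection step
    rounds: int = 3,
    beam_width: int = 1,
) -> str:
    beam = [prompt]
    for _ in range(rounds):
        candidates = list(beam)
        for current in beam:
            predictions = [(x, y, predict(current, x)) for x, y in minibatch]
            errors = [e for e in predictions if e[2] != e[1]]
            if not errors:
                continue  # nothing to criticize for this prompt
            gradient = criticize(current, errors)
            for new_prompt in edit(current, gradient):
                candidates.append(new_prompt)
                candidates.extend(paraphrase(new_prompt))
        # Selection: the top-scoring distinct prompts replace the current prompt(s).
        distinct = list(dict.fromkeys(candidates))
        beam = sorted(distinct, key=score, reverse=True)[:beam_width]
    return beam[0]
```

Keeping the current prompt among the candidates means a round can never make the selected prompt score worse under `score`.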

Type: Application

Filed: September 29, 2023

Publication date: April 3, 2025

Applicant: Microsoft Technology Licensing, LLC

Inventors: Reid Allen PRYZANT, Jerry Zheng LI, Dan ITER, Yin Tat LEE, Chenguang ZHU, Nanshan ZENG, Anup Shirgaonkar

MULTI-GRANULARITY MEETING SUMMARIZATION MODELS

Publication number: 20250111133

Abstract: Generally discussed herein are devices, systems, and methods for multi-granularity meeting summarization. A method can include receiving, from a user through a user interface, a segmentation granularity value indicating a number of events in a meeting transcript to be included in a summary; extracting, by a ranker model and from the transcript, a number of hints equal to the number of events; generating, by a summarizer model that includes a re-trained language model, respective summaries, one for each event, of a portion of the transcript corresponding to the event; and providing the respective summaries as an overall summary of the transcript.
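A rough sketch of the granularity-to-summaries flow follows. The ranker and summarizer are hypothetical callables. How the abstract maps each event to "a portion of the transcript" is not stated, so this sketch simply assumes each hint opens a portion that runs until the next hint.

```python
from typing import Callable, List, Sequence


def multi_granularity_summary(
    utterances: Sequence[str],
    granularity: int,                            # number of events requested by the user
    rank: Callable[[str], float],                # hypothetical ranker score per utterance
    summarize: Callable[[Sequence[str]], str],   # hypothetical summarizer model
) -> List[str]:
    if not 1 <= granularity <= len(utterances):
        raise ValueError("granularity must be between 1 and the number of utterances")
    # Ranker step: keep as many hints as there are events, then restore transcript order.
    by_score = sorted(range(len(utterances)), key=lambda i: rank(utterances[i]), reverse=True)
    hints = sorted(by_score[:granularity])
    # Assumption: each hint opens its event's portion, which runs until the next hint;
    # the first portion also absorbs any utterances before the first hint.
    starts = [0] + hints[1:]
    ends = hints[1:] + [len(utterances)]
    # Summarizer step: one summary per event; together they form the overall summary.
    return [summarize(utterances[s:e]) for s, e in zip(starts, ends)]
```

Raising `granularity` yields more, shorter per-event summaries from the same transcript.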

Type: Application

Filed: March 25, 2022

Publication date: April 3, 2025

Inventors: Chenguang Zhu, Yang LIU, Nanshan ZENG, Xuedong HUANG, Ming ZHONG, Yuantao Wang, Wei XIONG

Speaker attributed transcript generation

Patent number: 12243534

Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices.

Type: Grant

Filed: April 4, 2022

Date of Patent: March 4, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang

Generation of optimized spoken language understanding model through joint training with integrated acoustic knowledge-speech module

Patent number: 12243513

Abstract: A speech module is joint trained with a knowledge module by transforming a first knowledge graph into an acoustic knowledge graph. The knowledge module is trained on the acoustic knowledge graph. Then, the knowledge module is integrated with the speech module to generate an integrated knowledge-speech module. In some instances, the speech module included in the integrated knowledge-speech module is aligned with a language module to generate an optimized speech model configured to leverage acoustic information and acoustic-based knowledge information, along with language information.

Type: Grant

Filed: May 18, 2021

Date of Patent: March 4, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chenguang Zhu, Nanshan Zeng

TRAINING AND USING A DEEP LEARNING MODEL FOR TRANSCRIPT TOPIC SEGMENTATION

Publication number: 20250061277

Abstract: The disclosure herein describes using a deep learning model to identify topic segments of a communication transcript. A communication transcript including a set of utterances is obtained. The set of utterances is divided into a plurality of utterance windows, wherein each utterance window of the plurality of utterance windows includes a different subset of utterances of the set of utterances, and wherein each utterance of the set of utterances is included in at least one utterance window of the plurality of utterance windows. For each utterance window of the plurality of utterance windows, each utterance in the utterance window is classified as a topic boundary or a non-boundary using a deep learning model. Topic segments of the communication transcript are identified based on utterances of the set of utterances that are classified as topic boundaries. A communication transcript summary is generated using the communication transcript and the identified topic segments.
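The windowing and boundary-classification steps can be sketched in Python. The classifier is a hypothetical stand-in for the deep learning model, and the window size and stride are illustrative parameters. Merging predictions from overlapping windows with a logical OR is an assumption; the abstract only requires that every utterance fall in at least one window.

```python
from typing import Callable, List, Sequence


def make_windows(n: int, size: int, stride: int) -> List[range]:
    """Overlapping windows over n utterances; stride <= size guarantees full coverage."""
    if size < 1 or not 1 <= stride <= size:
        raise ValueError("need size >= 1 and 1 <= stride <= size")
    windows: List[range] = []
    start = 0
    while start < n:
        end = min(start + size, n)
        windows.append(range(start, end))
        if end == n:
            break
        start += stride
    return windows


def segment_topics(
    utterances: Sequence[str],
    classify: Callable[[List[str]], List[bool]],  # hypothetical model: True = topic boundary
    size: int = 4,
    stride: int = 2,
) -> List[List[str]]:
    n = len(utterances)
    is_boundary = [False] * n
    for window in make_windows(n, size, stride):
        labels = classify([utterances[i] for i in window])
        for i, label in zip(window, labels):
            # Assumption: an utterance counts as a boundary if any window flags it.
            is_boundary[i] = is_boundary[i] or label
    # Each boundary utterance (other than the first utterance) opens a new topic segment.
    starts = [0] + [i for i in range(1, n) if is_boundary[i]]
    ends = starts[1:] + [n]
    return [list(utterances[s:e]) for s, e in zip(starts, ends)] if n else []
```

Overlapping windows give the classifier context on both sides of most utterances, which is one plausible reason for requiring coverage rather than a single pass.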

Type: Application

Filed: December 15, 2021

Publication date: February 20, 2025

Inventors: Chenguang ZHU, Yang LIU, David Peace HUNG, Nanshan ZENG

Unified speech representation learning

Patent number: 12217745

Abstract: A system obtains a first training data set comprising labeled speech data or both labeled and unlabeled data corresponding to a high-resource data set as well as latent speech representations based on the first training data set. The system trains a machine learning model on the first training data set to learn phonetically aware speech representations corresponding to the first training data set. The system applies the latent speech representations to a transformer context network to generate contextual representations. The system aligns each of the contextual representations with a phoneme label to generate phonetically-aware contextual representations. The system causes a refinement engine to further refine the machine learning model.

Type: Grant

Filed: July 3, 2023

Date of Patent: February 4, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yao Qian, Yu Wu, Kenichi Kumatani, Shujie Liu, Furu Wei, Nanshan Zeng, Xuedong David Huang, Chengyi Wang

Dynamic gradient aggregation for training neural networks

Patent number: 12136034

Abstract: The disclosure herein describes training a global model based on a plurality of data sets. The global model is applied to each data set of the plurality of data sets and a plurality of gradients is generated based on that application. At least one gradient quality metric is determined for each gradient of the plurality of gradients. Based on the determined gradient quality metrics of the plurality of gradients, a plurality of weight factors is calculated. The plurality of gradients is transformed into a plurality of weighted gradients based on the calculated plurality of weight factors and a global gradient is generated based on the plurality of weighted gradients. The global model is updated based on the global gradient, wherein the updated global model, when applied to a data set, performs a task based on the data set and provides model output based on performing the task.
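The weighting step can be illustrated with a small Python sketch. The abstract does not name the gradient quality metric, so this sketch assumes it is each data set's training loss and converts losses to weight factors with a softmax over negative losses. That choice is illustrative only; gradients are plain lists of floats.

```python
import math
from typing import List, Sequence, Tuple


def dynamic_aggregate(
    gradients: Sequence[Sequence[float]],  # one flattened gradient per data set
    losses: Sequence[float],               # assumed quality metric: loss on that data set
    temperature: float = 1.0,
) -> Tuple[List[float], List[float]]:
    if not gradients or len(gradients) != len(losses):
        raise ValueError("need one loss per gradient")
    # Weight factors: softmax over negative loss, so lower-loss gradients weigh more.
    scaled = [-loss / temperature for loss in losses]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Global gradient = sum of the weighted per-data-set gradients.
    dim = len(gradients[0])
    global_grad = [sum(w * g[d] for w, g in zip(weights, gradients)) for d in range(dim)]
    return global_grad, weights


def sgd_step(params: Sequence[float], global_grad: Sequence[float], lr: float) -> List[float]:
    """Update the global model with plain gradient descent."""
    return [p - lr * g for p, g in zip(params, global_grad)]
```

With equal losses the weights are uniform and this reduces to ordinary gradient averaging; `temperature` controls how sharply the weights favor low-loss data sets.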

Type: Grant

Filed: July 31, 2020

Date of Patent: November 5, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Dimitrios B. Dimitriadis, Kenichi Kumatani, Robert Peter Gmyr, Masaki Itagaki, Yashesh Gaur, Nanshan Zeng, Xuedong Huang

NATURAL LANGUAGE TRAINING AND/OR AUGMENTATION WITH LARGE LANGUAGE MODELS

Publication number: 20240346254

Abstract: The techniques described herein enhance the operations of natural language generation systems through training and/or augmentation by a large language model. In a first example, the large language model can execute training operations by processing a training dataset to produce a natural language output. The natural language generation system can analyze the training dataset and the natural language output to generate a natural language output mimicking the output of the large language model. The large language model can then evaluate the output of the natural language generation system to iteratively adjust and improve the quality of natural language outputs. In a second example, the large language model can augment a small language model in executing natural language tasks. This is accomplished by retrieving external information using the large language model to generate an augmentation input to provide context and a language framework to the small language model to enhance overall outputs.

Type: Application

Filed: April 12, 2023

Publication date: October 17, 2024

Inventors: Yang LIU, Yichong XU, Dan ITER, Chenguang ZHU, Nanshan ZENG, Shuohang WANG, Hiteshi SHARMA

SYSTEMS AND METHODS FOR REAL-TIME MEETING SUMMARIZATION

Publication number: 20240340193

Abstract: Systems and methods are provided for processing electronic content and generating corresponding output. Electronic content is received from a meeting, including recognizable speech content. This content is then summarized into real-time summary output by processing and encoding the meeting content while selectively alternating between unidirectional attention and bidirectional attention that is applied to the meeting contents.

Type: Application

Filed: April 10, 2023

Publication date: October 10, 2024

Inventors: Chenguang ZHU, Xuedong HUANG, Zong Zong YUAN, Wei XIONG, Nanshan ZENG, Yuantao WANG

QUALITY ASSURANCE FOR DIGITAL TECHNOLOGIES USING LARGE LANGUAGE MODELS

Publication number: 20240330165

Abstract: Systems and methods are provided for implementing quality assurance for digital technologies using language model (“LM”)-based artificial intelligence (“AI”) and/or machine learning (“ML”) systems. In various embodiments, a first prompt is provided to an LM actor or attacker to cause the LM actor or attacker to generate interaction content for interacting with test software. Responses from the test software are then evaluated by an LM evaluator to produce evaluation results. In some examples, a second prompt is generated that includes the responses from the test software along with the evaluation criteria for the test software. When the second prompt is provided to the LM evaluator, the LM evaluator generates the evaluation results.

Type: Application

Filed: April 3, 2023

Publication date: October 3, 2024

Applicant: Microsoft Technology Licensing, LLC

Inventors: Reid Allen PRYZANT, Yin Tat LEE, Chenguang ZHU, Sebastien BUBECK, Ronen ELDAN, Yuwei FANG, Dan ITER, Yichong XU, Yuanzhi LI, Yi ZHANG, Lijuan QIN, Nanshan ZENG, Xuedong HUANG

Processing overlapping speech from distributed devices

Patent number: 12051422

Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.

Type: Grant

Filed: September 13, 2021

Date of Patent: July 30, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang

Automated meeting minutes generator

Patent number: 11990132

Abstract: A transcription of audio speech included in electronic content associated with a meeting is created by an ASR model trained on speech-to-text data. The transcription is post-processed by modifying text included in the transcription, for example, by modifying punctuation, grammar, or formatting introduced by the ASR model and by changing or omitting one or more words that were included in both the audio speech and the transcription. After the transcription is post-processed, output based on the post-processed transcription is generated in the form of a meeting summary and/or template.

Type: Grant

Filed: February 28, 2023

Date of Patent: May 21, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chenguang Zhu, Yu Shi, William Isaac Hinthorn, Nanshan Zeng, Ruochen Xu, Liyang Lu, Xuedong Huang

PRE-TRAINING A UNIFIED NATURAL LANGUAGE MODEL WITH CORRUPTED SPAN AND REPLACED TOKEN DETECTION

Publication number: 20240062018

Abstract: Systems and methods are provided for training and using a novel unified language foundation model. An encoder-decoder natural language model is obtained and various training data is obtained and used for training. The training process integrates a combination of replaced token detection, corrupted span reconstruction, and disentangled attention methodologies to produce a unified encoder-decoder model. The trained model is trained for performing both natural language understanding (NLU) tasks and natural language generation (NLG) tasks. Attention applied to the model is applied discretely to segmented chunks of encoded data during processing to improve the efficiency of applying attention by the model.
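Applying attention "discretely to segmented chunks" can be illustrated with block-local self-attention. This sketch assumes fixed-size, non-overlapping chunks and uses the raw vectors as queries, keys, and values (no learned projections). It does not model the aggregate attention named in the companion application's title. With chunk size c, each token scores only c keys instead of all n, which is where the efficiency gain comes from.

```python
import math
from typing import List, Sequence

Vector = List[float]


def attend(queries: Sequence[Vector], keys: Sequence[Vector], values: Sequence[Vector]) -> List[Vector]:
    """Scaled dot-product attention, written out without libraries."""
    d = len(queries[0])
    out: List[Vector] = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) / z for j in range(len(values[0]))])
    return out


def chunked_self_attention(x: Sequence[Vector], chunk_size: int) -> List[Vector]:
    """Apply attention separately inside each chunk; tokens never attend across chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    out: List[Vector] = []
    for start in range(0, len(x), chunk_size):
        chunk = list(x[start:start + chunk_size])
        out.extend(attend(chunk, chunk, chunk))  # Q = K = V = chunk (no learned projections)
    return out
```

Setting `chunk_size` to the sequence length recovers ordinary full self-attention, so the chunk size acts as a cost/context trade-off.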

Type: Application

Filed: October 20, 2022

Publication date: February 22, 2024

Inventors: Pengcheng HE, Jianfeng GAO, Nanshan ZENG, Xuedong HUANG, Wei XIONG, Baolin PENG

UNIFIED NATURAL LANGUAGE MODEL WITH SEGMENTED AND AGGREGATE ATTENTION

Publication number: 20240062020

Abstract: Systems and methods are provided for training and using a novel unified language foundation model. An encoder-decoder natural language model is obtained and various training data is obtained and used for training. The training process integrates a combination of replaced token detection, corrupted span reconstruction, and disentangled attention methodologies to produce a unified encoder-decoder model. The trained model is trained for performing both natural language understanding (NLU) tasks and natural language generation (NLG) tasks. Attention applied to the model is applied discretely to segmented chunks of encoded data during processing to improve the efficiency of applying attention by the model.

Type: Application

Filed: October 20, 2022

Publication date: February 22, 2024

Inventors: Pengcheng HE, Jianfeng GAO, Nanshan ZENG, Xuedong HUANG, Wei XIONG, Baolin PENG

Audio-visual diarization to identify meeting attendees

Patent number: 11875796

Abstract: A computer implemented method includes receiving information streams on a meeting server from a set of multiple distributed devices included in a meeting, receiving audio signals representative of speech by at least two users in at least two of the information streams, receiving at least one video signal of at least one user in the information streams, associating a specific user with speech in the received audio signals as a function of the received audio and video signals, and generating a transcript of the meeting with an indication of the specific user associated with the speech.

Type: Grant

Filed: April 30, 2019

Date of Patent: January 16, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, William Isaac Hinthorn, Xuedong Huang

Synthetic data generation for training of natural language understanding models

Patent number: 11875787

Abstract: This document relates to machine learning. One example includes a method or technique that can be performed on a computing device. The method or technique can include obtaining a task-semantically-conditioned generative model that has been pretrained based at least on a first training data set having unlabeled training examples and semantically conditioned based at least on a second training data set having dialog act-labeled utterances. The method or technique can also include inputting dialog acts into the semantically-conditioned generative model and obtaining synthetic utterances that are output by the semantically-conditioned generative model. The method or technique can also include outputting the synthetic utterances.

Type: Grant

Filed: October 11, 2022

Date of Patent: January 16, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Nanshan Zeng, Jianfeng Gao

AUTOMATIC RULE INDUCTION FOR SEMI-SUPERVISED TEXT CLASSIFICATION

Publication number: 20230376789

Abstract: Systems and techniques are provided for facilitating the automatic discovery and application of rules for refining the training of pretrained models, such as natural language processing models. Weak symbolic rules are automatically generated from the identification and processing of sparse labeled data by the pretrained model(s). Once the weak rules are generated, they are integrated into the model(s) via an attention mechanism to supplement the direct training performed by the sparse labeled data and to thereby boost a supervision signal generated by the sparse labeled data on any newly processed unlabeled data in the intended runtime environment(s) where the models are applied.

Type: Application

Filed: June 10, 2022

Publication date: November 23, 2023

Inventors: Reid Allen PRYZANT, Chenguang ZHU, Ziyi YANG, Yichong XU, Nanshan ZENG

UNIFIED SPEECH REPRESENTATION LEARNING

Publication number: 20230368782

Abstract: Systems and methods are provided for training a machine learning model to learn speech representations. Labeled speech data, or both labeled and unlabeled data sets, are applied to a feature extractor of a machine learning model to generate latent speech representations. The latent speech representations are applied to a quantizer to generate quantized latent speech representations and to a transformer context network to generate contextual representations. Each contextual representation included in the contextual representations is aligned with a phoneme label to generate phonetically-aware contextual representations. Quantized latent representations are aligned with phoneme labels to generate phonetically aware latent speech representations.
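The quantizer step can be illustrated with nearest-neighbor vector quantization. This is a stand-in: the abstract does not say how its quantizer works (wav2vec 2.0-style models, for example, learn codebooks and select entries with a Gumbel-softmax), and the codebook here is a fixed, hypothetical list of vectors.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each latent frame to its nearest codebook entry (squared Euclidean distance)."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    indices: List[int] = []
    quantized: List[Vector] = []
    for z in latents:
        best = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(z, codebook[i])),
        )
        indices.append(best)                  # discrete code per frame
        quantized.append(list(codebook[best]))  # quantized latent representation
    return indices, quantized
```

The returned indices are the discrete units that could then be paired with per-frame phoneme labels in the alignment step the abstract describes.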

Type: Application

Filed: July 3, 2023

Publication date: November 16, 2023

Inventors: Yao QIAN, Yu WU, Kenichi KUMATANI, Shujie LIU, Furu WEI, Nanshan ZENG, Xuedong David HUANG, Chengyi WANG

Generation of optimized knowledge-based language model through knowledge graph multi-alignment

Patent number: 11798529

Abstract: A language module is joint trained with a knowledge module for natural language understanding by aligning a first knowledge graph with a second knowledge graph. The knowledge module is trained on the aligned knowledge graphs. Then, the knowledge module is integrated with the language module to generate an integrated knowledge-language module.

Type: Grant

Filed: May 18, 2021

Date of Patent: October 24, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chenguang Zhu, Nanshan Zeng
