Patents by Inventor Xuedong Huang

Xuedong Huang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ARRAY GEOMETRY AGNOSTIC MULTI-CHANNEL PERSONALIZED SPEECH ENHANCEMENT

Publication number: 20230116052

Abstract: Examples of array geometry agnostic multi-channel personalized speech enhancement (PSE) extract speaker embeddings, which represent acoustic characteristics of one or more target speakers, from target speaker enrollment data. Spatial features (e.g., inter-channel phase difference) are extracted from input audio captured by a microphone array. The input audio includes a mixture of speech data of the target speaker(s) and one or more interfering speaker(s). The input audio, the extracted speaker embeddings, and the extracted spatial features are provided to a trained geometry-agnostic PSE model. Output data is produced, which comprises estimated clean speech data of the target speaker(s) that has a reduction (or elimination) of speech data of the interfering speaker(s), without the trained PSE model requiring geometry information for the microphone array.

Type: Application

Filed: December 17, 2021

Publication date: April 13, 2023

Inventors: Sefik Emre ESKIMEZ, Takuya YOSHIOKA, Huaming WANG, Hassan TAHERIAN, Zhuo CHEN, Xuedong HUANG
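
Note on the abstract above: it names inter-channel phase difference (IPD) as an example spatial feature but gives no formula. The following is a minimal sketch of one common way to compute IPD from complex short-time Fourier transform (STFT) values, not the method claimed in the application. The function names, the `frame[channel][bin]` layout, and the choice of channel 0 as the reference are all assumptions made for illustration.

```python
import cmath
from typing import List, Sequence


def ipd(ref_bin: complex, other_bin: complex) -> float:
    """Phase of other_bin relative to ref_bin, in radians within [-pi, pi].

    Multiplying by the conjugate of the reference subtracts its phase.
    """
    return cmath.phase(other_bin * ref_bin.conjugate())


def ipd_features(
    frame: Sequence[Sequence[complex]], ref_channel: int = 0
) -> List[List[float]]:
    """Compute IPD for every non-reference channel of one STFT frame.

    frame[c][f] is assumed to hold the complex STFT value of microphone
    channel c at frequency bin f (hypothetical layout).
    Returns one list of per-bin IPD values per non-reference channel.
    """
    ref = frame[ref_channel]
    features = []
    for c, channel in enumerate(frame):
        if c == ref_channel:
            continue  # the reference channel has zero IPD with itself
        features.append([ipd(r, x) for r, x in zip(ref, channel)])
    return features
```

Because each feature depends only on a pair of channels and not on microphone positions, features of this kind can be computed without array geometry information, which is consistent with the geometry-agnostic framing of the abstract.
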
Automated meeting minutes generator

Patent number: 11615799

Abstract: A transcription of audio speech included in electronic content associated with a meeting is created by an ASR model trained on speech-to-text data. The transcription is post-processed by modifying text included in the transcription, for example, by modifying punctuation, grammar, or formatting introduced by the ASR model and by changing or omitting one or more words that were included in both the audio speech and the transcription. After the transcription is post-processed, output based on the post-processed transcription is generated in the form of a meeting summary and/or template.

Type: Grant

Filed: May 29, 2020

Date of Patent: March 28, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chenguang Zhu, Yu Shi, William Isaac Hinthorn, Nanshan Zeng, Ruochen Xu, Liyang Lu, Xuedong Huang
Distributed device meeting initiation

Patent number: 11468895

Abstract: A computer implemented method includes receiving audio streams at a meeting server from two distributed devices that are streaming audio captured during an ad-hoc meeting between at least two users, comparing the received audio streams to determine that the received audio streams are representative of sound from the ad-hoc meeting, generating a meeting instance to process the audio streams in response to the comparing determining that the audio streams are representative of sound from the ad-hoc meeting, and processing the received audio streams to generate a transcript of the ad-hoc meeting.

Type: Grant

Filed: April 30, 2019

Date of Patent: October 11, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
CONTROLLED TRAINING AND USE OF TEXT-TO-SPEECH MODELS AND PERSONALIZED MODEL GENERATED VOICES

Publication number: 20220310058

Abstract: Systems are configured for generating text-to-speech data in a personalized voice by training a neural text-to-speech machine learning model on natural speech data collected from a particular user, validating the identity of the user from which data is collected, and authorizing requests from users to use the personalized voice in generating new speech data. The systems are further configured to train a machine learning model as a neural text-to-speech model with generated personalized speech data.

Type: Application

Filed: November 3, 2020

Publication date: September 29, 2022

Inventors: Sheng ZHAO, Li JIANG, Xuedong HUANG, Lijuan QIN, Lei HE, Binggong DING, Bo YAN, Chunling MA, Raunak OBEROI
Speaker Attributed Transcript Generation

Publication number: 20220230642

Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices.

Type: Application

Filed: April 4, 2022

Publication date: July 21, 2022

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan ZENG, Lijuan QIN, William Isaac Hinthorn, Xuedong HUANG
SYSTEMS, METHODS, AND COMPUTER-READABLE STORAGE DEVICE FOR GENERATING NOTES FOR A MEETING BASED ON PARTICIPANT ACTIONS AND MACHINE LEARNING

Publication number: 20220180869

Abstract: Systems, methods, and computer-readable storage devices are disclosed for generating smart notes for a meeting based on participant actions and machine learning. One method including: receiving meeting data from a plurality of participant devices participating in an online meeting; continuously generating text data based on the received audio data from each participant device of the plurality of participant devices; iteratively performing the following steps until receiving meeting data for the meeting has ended, the steps including: receiving an indication that a predefined action has occurred on the first participating device; generating a participant segment of the meeting data for at least the first participant device from a first predetermined time before when the predefined action occurred to when the predefined action occurred; determining whether the receiving meeting data of the meeting has ended; and generating a summary of the meeting.

Type: Application

Filed: November 18, 2021

Publication date: June 9, 2022

Inventors: Heiko Rahmel, Li-Juan Qin, Xuedong Huang, Wei Xiong
Speaker attributed transcript generation

Patent number: 11322148

Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices.

Type: Grant

Filed: April 30, 2019

Date of Patent: May 3, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
DYNAMIC GRADIENT AGGREGATION FOR TRAINING NEURAL NETWORKS

Publication number: 20220036178

Abstract: The disclosure herein describes training a global model based on a plurality of data sets. The global model is applied to each data set of the plurality of data sets and a plurality of gradients is generated based on that application. At least one gradient quality metric is determined for each gradient of the plurality of gradients. Based on the determined gradient quality metrics of the plurality of gradients, a plurality of weight factors is calculated. The plurality of gradients is transformed into a plurality of weighted gradients based on the calculated plurality of weight factors and a global gradient is generated based on the plurality of weighted gradients. The global model is updated based on the global gradient, wherein the updated global model, when applied to a data set, performs a task based on the data set and provides model output based on performing the task.

Type: Application

Filed: July 31, 2020

Publication date: February 3, 2022

Inventors: Dimitrios B. DIMITRIADIS, Kenichi KUMATANI, Robert Peter GMYR, Masaki ITAGAKI, Yashesh GAUR, Nanshan ZENG, Xuedong HUANG
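
Note on the abstract above: the steps it describes (per-data-set gradients, a quality metric per gradient, weight factors, weighted gradients, a global gradient, a model update) can be sketched as follows. This is an illustration under stated assumptions, not the method claimed in the application: the abstract does not say how weight factors are calculated, so a softmax over the quality metrics (higher meaning better) is assumed here, and the function names and plain gradient-descent update are hypothetical.

```python
import math
from typing import List, Sequence


def weight_factors(quality: Sequence[float]) -> List[float]:
    """Map gradient quality metrics to weights that sum to 1.

    Assumption: softmax weighting, with higher quality giving more weight.
    """
    top = max(quality)  # subtract the max for numerical stability
    exps = [math.exp(q - top) for q in quality]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(
    gradients: Sequence[Sequence[float]], quality: Sequence[float]
) -> List[float]:
    """Form the global gradient as the weighted sum of the gradients."""
    weights = weight_factors(quality)
    global_grad = [0.0] * len(gradients[0])
    for w, grad in zip(weights, gradients):
        for i, g in enumerate(grad):
            global_grad[i] += w * g  # weighted gradient contribution
    return global_grad


def update(
    params: Sequence[float], global_grad: Sequence[float], lr: float = 0.1
) -> List[float]:
    """Apply one gradient-descent step to the global model parameters."""
    return [p - lr * g for p, g in zip(params, global_grad)]
```

With equal quality metrics the weights are uniform and the global gradient reduces to the plain average of the per-data-set gradients; unequal metrics shift the update toward the gradients judged more reliable.
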
Processing Overlapping Speech from Distributed Devices

Publication number: 20210407516

Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.

Type: Application

Filed: September 13, 2021

Publication date: December 30, 2021

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
AUTOMATED MEETING MINUTES GENERATOR

Publication number: 20210375289

Abstract: A transcription of audio speech included in electronic content associated with a meeting is created by an ASR model trained on speech-to-text data. The transcription is post-processed by modifying text included in the transcription, for example, by modifying punctuation, grammar, or formatting introduced by the ASR model and by changing or omitting one or more words that were included in both the audio speech and the transcription. After the transcription is post-processed, output based on the post-processed transcription is generated in the form of a meeting summary and/or template.

Type: Application

Filed: May 29, 2020

Publication date: December 2, 2021

Inventors: Chenguang Zhu, Yu Shi, William Isaac Hinthorn, Nanshan Zeng, Ruochen Xu, Liyang Lu, Xuedong Huang
Systems, methods, and computer-readable storage device for generating notes for a meeting based on participant actions and machine learning

Patent number: 11183192

Abstract: Systems, methods, and computer-readable storage devices are disclosed for generating smart notes for a meeting based on participant actions and machine learning. One method including: receiving meeting data from a plurality of participant devices participating in an online meeting; continuously generating text data based on the received audio data from each participant device of the plurality of participant devices; iteratively performing the following steps until receiving meeting data for the meeting has ended, the steps including: receiving an indication that a predefined action has occurred on the first participating device; generating a participant segment of the meeting data for at least the first participant device from a first predetermined time before when the predefined action occurred to when the predefined action occurred; determining whether the receiving meeting data of the meeting has ended; and generating a summary of the meeting.

Type: Grant

Filed: November 13, 2019

Date of Patent: November 23, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Heiko Rahmel, Li-Juan Qin, Xuedong Huang, Wei Xiong
Conversational virtual assistant

Patent number: 11157490

Abstract: Conversational virtual assistance for delivering relevant query solutions is provided. A virtual assistant system comprises various components associated with developing a knowledge database that can be searched for finding documents that fulfill the user's intent. The virtual assistant system further comprises components for receiving a query from a user, extracting entities for understanding the user's intent, and for searching a knowledge database for documents responsive to the query. When additional information is needed for determining more relevant results, a conversation strategy is determined, and a question is formulated for generating a conversation with the user for clarifying the user's intent, confirming a solution, or obtaining additional information. The user is enabled to provide a follow-up response that is related to a previously identified entity. The entity is edited in the query, and responses are refined responsive to the edited query.

Type: Grant

Filed: February 16, 2017

Date of Patent: October 26, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chenguang Zhu, Weizhu Chen, Jianwen Zhang, Xuedong Huang, Zheng Chen
Processing overlapping speech from distributed devices

Patent number: 11138980

Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.

Type: Grant

Filed: April 30, 2019

Date of Patent: October 5, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
Computerized Intelligent Assistant for Conferences

Publication number: 20210210097

Abstract: A method for facilitating a remote conference includes receiving a digital video and a computer-readable audio signal. A face recognition machine is operated to recognize a face of a first conference participant in the digital video, and a speech recognition machine is operated to translate the computer-readable audio signal into a first text. An attribution machine attributes the text to the first conference participant. A second computer-readable audio signal is processed similarly, to obtain a second text attributed to a second conference participant. A transcription machine automatically creates a transcript including the first text attributed to the first conference participant and the second text attributed to the second conference participant.

Type: Application

Filed: December 8, 2020

Publication date: July 8, 2021

Inventors: Adi DIAMANT, Karen MASTER BEN-DOR, Eyal KRUPKA, Raz HALALY, Yoni SMOLIN, Ilya GURVICH, Aviv HURVITZ, Lijuan QIN, Wei XIONG, Shixiong ZHANG, Lingfeng WU, Xiong XIAO, Ido LEICHTER, Moshe DAVID, Xuedong HUANG, Amit Kumar AGARWAL
Customized output to optimize for user preference in a distributed system

Patent number: 11023690

Abstract: Systems and methods for providing customized output based on a user preference in a distributed system are provided. In example embodiments, a meeting server or system receives audio streams from a plurality of distributed devices involved in an intelligent meeting. The meeting system identifies a user corresponding to a distributed device of the plurality of distributed devices and determines a preferred language of the user. A transcript from the received audio streams is generated. The meeting system translates the transcript into the preferred language of the user to form a translated transcript. The translated transcript is provided to the distributed device of the user.

Type: Grant

Filed: April 30, 2019

Date of Patent: June 1, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
CUSTOMIZED TRANSCRIBED CONVERSATIONS

Publication number: 20210042477

Abstract: Systems and methods may be used to provide transcription and translation services. A method may include initializing a plurality of user devices with respective language output selections in a translation group by receiving a shared identifier from the plurality of user devices and transcribing the audio stream to transcribed text. The method may include translating the transcribed text to one or more of the respective language output selections when an original language of the transcribed text differs from the one or more of the respective language output selections. The method may include sending, to a user device in the translation group, the transcribed text including translated text in a language corresponding to the respective language output selection for the user device. In an example, the method may include customizing the transcription or the translation, such as to a particular topic, location, user, or the like.

Type: Application

Filed: October 23, 2020

Publication date: February 11, 2021

Inventors: William D. LEWIS, Ivo José Garcia Dos SANTOS, Tanvi Surti, Arul A. Menezes, Olivier Nano, Christian Wendt, Xuedong Huang
Computerized intelligent assistant for conferences

Patent number: 10867610

Abstract: A method for facilitating a remote conference includes receiving a digital video and a computer-readable audio signal. A face recognition machine is operated to recognize a face of a first conference participant in the digital video, and a speech recognition machine is operated to translate the computer-readable audio signal into a first text. An attribution machine attributes the text to the first conference participant. A second computer-readable audio signal is processed similarly, to obtain a second text attributed to a second conference participant. A transcription machine automatically creates a transcript including the first text attributed to the first conference participant and the second text attributed to the second conference participant.

Type: Grant

Filed: June 29, 2018

Date of Patent: December 15, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Adi Diamant, Karen Master Ben-Dor, Eyal Krupka, Raz Halaly, Yoni Smolin, Ilya Gurvich, Aviv Hurvitz, Lijuan Qin, Wei Xiong, Shixiong Zhang, Lingfeng Wu, Xiong Xiao, Ido Leichter, Moshe David, Xuedong Huang, Amit Kumar Agarwal
CUSTOMIZED OUTPUT TO OPTIMIZE FOR USER PREFERENCE IN A DISTRIBUTED SYSTEM

Publication number: 20200349230

Abstract: Systems and methods for providing customized output based on a user preference in a distributed system are provided. In example embodiments, a meeting server or system receives audio streams from a plurality of distributed devices involved in an intelligent meeting. The meeting system identifies a user corresponding to a distributed device of the plurality of distributed devices and determines a preferred language of the user. A transcript from the received audio streams is generated. The meeting system translates the transcript into the preferred language of the user to form a translated transcript. The translated transcript is provided to the distributed device of the user.

Type: Application

Filed: April 30, 2019

Publication date: November 5, 2020

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
Speaker Attributed Transcript Generation

Publication number: 20200349950

Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices.

Type: Application

Filed: April 30, 2019

Publication date: November 5, 2020

Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
Audio-visual diarization to identify meeting attendees

Publication number: 20200349953

Abstract: A computer implemented method includes receiving information streams on a meeting server from a set of multiple distributed devices included in a meeting, receiving audio signals representative of speech by at least two users in at least two of the information streams, receiving at least one video signal of at least one user in the information streams, associating a specific user with speech in the received audio signals as a function of the received audio and video signals, and generating a transcript of the meeting with an indication of the specific user associated with the speech.

Type: Application

Filed: April 30, 2019

Publication date: November 5, 2020

Inventors: Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, William Isaac Hinthorn, Xuedong Huang
