Patents by Inventor Hemin YANG

Hemin YANG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHODS AND SYSTEMS FOR TRAINING AN ARTIFICIAL INTELLIGENCE (AI) TOTAL DURATION-AWARE MODEL TO CONTROL THE TOTAL DURATION OF SPEECH UTTERANCES BY A TEXT-TO-SPEECH (TTS) COMPUTING SYSTEM

Publication number: 20250378816

Abstract: Systems and methods are provided for training and using a total duration-aware (TDA) model to control the duration of speech utterances by a text-to-speech computing system when converting text into speech. During use, text to be converted into speech and target output speech time duration are used as inputs into the TDA model. The text is then tokenized into phonemes, and the TDA model predicts frame durations for each phoneme. The TDA model is trained on phonemes derived from text, corresponding actual frame durations for the phonemes, and a target output speech time duration. The TDA model masks a subset of the actual frame durations, and generates predicted frame durations for the subset. A loss between the actual and predicted frame durations is calculated, and used to adjust parameters of the TDA model to control future generation of predicted frame durations.

Type: Application

Filed: July 30, 2024

Publication date: December 11, 2025

Inventors: Sefik Emre ESKIMEZ, Xiaofei WANG, Manthan THAKKER, Naoyuki KANDA, Hemin YANG, Zirun ZHU, Min TANG
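
The training procedure summarized in the abstract above (mask a subset of the actual frame durations, predict them given the target total duration, then compute a loss on the masked positions) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the 50% mask ratio, the L1 loss, and the example phonemes and durations do not come from the patent application. In particular, `predict_masked` is a fixed placeholder with no trainable parameters; in the described system a learned model would produce these predictions and the loss would be used to adjust its parameters.

```python
import random


def mask_durations(durations, mask_ratio, rng):
    """Hide a random subset of frame durations.

    Returns the list with hidden entries set to None, plus the hidden indices.
    """
    n_mask = max(1, int(len(durations) * mask_ratio))
    idx = sorted(rng.sample(range(len(durations)), n_mask))
    masked = [None if i in idx else d for i, d in enumerate(durations)]
    return masked, idx


def predict_masked(masked, total_frames):
    """Placeholder predictor (NOT the patented model).

    Splits the frames not accounted for by the visible durations evenly
    across the hidden positions, so the output always sums to total_frames.
    """
    known = sum(d for d in masked if d is not None)
    n_hidden = sum(1 for d in masked if d is None)
    fill = (total_frames - known) / n_hidden
    return [fill if d is None else d for d in masked]


def masked_l1_loss(actual, predicted, idx):
    """Mean absolute error computed over the hidden positions only."""
    return sum(abs(actual[i] - predicted[i]) for i in idx) / len(idx)


# Hypothetical example data: four phonemes with made-up frame counts.
phonemes = ["HH", "AH", "L", "OW"]
actual = [3, 7, 4, 10]
total = sum(actual)  # stands in for the target output speech duration, in frames

rng = random.Random(0)
masked, idx = mask_durations(actual, 0.5, rng)
predicted = predict_masked(masked, total)
loss = masked_l1_loss(actual, predicted, idx)

for ph, a, p in zip(phonemes, actual, predicted):
    print(ph, a, p)
print("masked positions:", idx, "loss:", loss)
```

Because the placeholder fills the hidden positions with whatever frames remain, the predicted durations always add up to the requested total, which is the property a total duration-aware model is meant to provide.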

Systems and methods for human listening and live captioning

Patent number: 11922963

Abstract: Systems and methods are provided for generating and operating a speech enhancement model optimized for generating noise-suppressed speech outputs for improved human listening and live captioning. A computing system obtains a speech enhancement model trained on a first training dataset to generate noise-suppressed speech outputs and an automatic speech recognition model trained on a second training dataset to generate transcription labels for spoken language utterances. A third training dataset comprising a set of spoken language utterances is applied to the speech enhancement model to obtain a first noise-suppressed speech output which is applied to the automatic speech recognition model to generate a noise-suppressed transcription output for the set of spoken language utterances.

Type: Grant

Filed: May 26, 2021

Date of Patent: March 5, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Xiaofei Wang, Sefik Emre Eskimez, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

SYSTEMS AND METHODS FOR HUMAN LISTENING AND LIVE CAPTIONING

Publication number: 20220383887

Abstract: Systems and methods are provided for generating and operating a speech enhancement model optimized for generating noise-suppressed speech outputs for improved human listening and live captioning. A computing system obtains a speech enhancement model trained on a first training dataset to generate noise-suppressed speech outputs and an automatic speech recognition model trained on a second training dataset to generate transcription labels for spoken language utterances. A third training dataset comprising a set of spoken language utterances is applied to the speech enhancement model to obtain a first noise-suppressed speech output which is applied to the automatic speech recognition model to generate a noise-suppressed transcription output for the set of spoken language utterances.

Type: Application

Filed: May 26, 2021

Publication date: December 1, 2022

Inventors: Xiaofei WANG, Sefik Emre ESKIMEZ, Min TANG, Hemin YANG, Zirun ZHU, Zhuo CHEN, Huaming WANG, Takuya YOSHIOKA
