Patents by Inventor Sefik Emre ESKIMEZ

Sefik Emre ESKIMEZ has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Joint Acoustic Echo Cancellation (AEC) and Personalized Noise Suppression (PNS)

Publication number: 20240135949

Abstract: A data processing system implements receiving a far-end signal associated with a first computing device participating in an online communication session and receiving a near-end signal associated with a second computing device participating in the online communication session. The near-end signal includes speech of a target speaker, a first interfering speaker, and an echo signal. The system further implements providing the far-end signal, the near-end signal, and an indication of the target speaker as an input to a machine learning model. The machine learning model is trained to analyze the far-end signal and the near-end signal to perform personalized noise suppression (PNS) to remove speech from one or more interfering speakers and acoustic echo cancellation (AEC) to remove echoes. The model is trained to output an audio signal comprising speech of the target speaker. The system obtains the audio signal comprising the speech of the target speaker from the model.

Type: Application

Filed: February 21, 2023

Publication date: April 25, 2024

Applicant: Microsoft Technology Licensing, LLC

Inventors: Sefik Emre ESKIMEZ, Takuya YOSHIOKA, Huaming WANG, Alex Chenzhi JU, Min TANG, Tanel PÄRNAMAA

Systems and methods for human listening and live captioning

Patent number: 11922963

Abstract: Systems and methods are provided for generating and operating a speech enhancement model optimized for generating noise-suppressed speech outputs for improved human listening and live captioning. A computing system obtains a speech enhancement model trained on a first training dataset to generate noise-suppressed speech outputs and an automatic speech recognition model trained on a second training dataset to generate transcription labels for spoken language utterances. A third training dataset comprising a set of spoken language utterances is applied to the speech enhancement model to obtain a first noise-suppressed speech output which is applied to the automatic speech recognition model to generate a noise-suppressed transcription output for the set of spoken language utterances.

Type: Grant

Filed: May 26, 2021

Date of Patent: March 5, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Xiaofei Wang, Sefik Emre Eskimez, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

ARRAY GEOMETRY AGNOSTIC MULTI-CHANNEL PERSONALIZED SPEECH ENHANCEMENT

Publication number: 20230116052

Abstract: Examples of array geometry agnostic multi-channel personalized speech enhancement (PSE) extract speaker embeddings, which represent acoustic characteristics of one or more target speakers, from target speaker enrollment data. Spatial features (e.g., inter-channel phase difference) are extracted from input audio captured by a microphone array. The input audio includes a mixture of speech data of the target speaker(s) and one or more interfering speaker(s). The input audio, the extracted speaker embeddings, and the extracted spatial features are provided to a trained geometry-agnostic PSE model. Output data is produced, which comprises estimated clean speech data of the target speaker(s) that has a reduction (or elimination) of speech data of the interfering speaker(s), without the trained PSE model requiring geometry information for the microphone array.

Type: Application

Filed: December 17, 2021

Publication date: April 13, 2023

Inventors: Sefik Emre ESKIMEZ, Takuya YOSHIOKA, Huaming WANG, Hassan TAHERIAN, Zhuo CHEN, Xuedong HUANG

SYSTEMS AND METHODS FOR HUMAN LISTENING AND LIVE CAPTIONING

Publication number: 20220383887

Abstract: Systems and methods are provided for generating and operating a speech enhancement model optimized for generating noise-suppressed speech outputs for improved human listening and live captioning. A computing system obtains a speech enhancement model trained on a first training dataset to generate noise-suppressed speech outputs and an automatic speech recognition model trained on a second training dataset to generate transcription labels for spoken language utterances. A third training dataset comprising a set of spoken language utterances is applied to the speech enhancement model to obtain a first noise-suppressed speech output which is applied to the automatic speech recognition model to generate a noise-suppressed transcription output for the set of spoken language utterances.

Type: Application

Filed: May 26, 2021

Publication date: December 1, 2022

Inventors: Xiaofei WANG, Sefik Emre ESKIMEZ, Min TANG, Hemin YANG, Zirun ZHU, Zhuo CHEN, Huaming WANG, Takuya YOSHIOKA
