Patents by Inventor Sefik Emre ESKIMEZ

Sefik Emre ESKIMEZ has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240135949
    Abstract: A data processing system implements receiving a far-end signal associated with a first computing device participating in an online communication session and receiving a near-end signal associated with a second computing device participating in the online communication session. The near-end signal includes speech of a target speaker, a first interfering speaker, and an echo signal. The system further implements providing the far-end signal, the near-end signal, and an indication of the target speaker as an input to a machine learning model. The machine learning model is trained to analyze the far-end signal and the near-end signal to perform personalized noise suppression (PNS) to remove speech from one or more interfering speakers and acoustic echo cancellation (AEC) to remove echoes. The model is trained to output an audio signal comprising speech of the target speaker. The system obtains the audio signal comprising the speech of the target speaker from the model.
    Type: Application
    Filed: February 21, 2023
    Publication date: April 25, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Sefik Emre ESKIMEZ, Takuya YOSHIOKA, Huaming WANG, Alex Chenzhi JU, Min TANG, Tanel PÄRNAMAA
  • Patent number: 11922963
    Abstract: Systems and methods are provided for generating and operating a speech enhancement model optimized for generating noise-suppressed speech outputs for improved human listening and live captioning. A computing system obtains a speech enhancement model trained on a first training dataset to generate noise-suppressed speech outputs and an automatic speech recognition model trained on a second training dataset to generate transcription labels for spoken language utterances. A third training dataset comprising a set of spoken language utterances is applied to the speech enhancement model to obtain a first noise-suppressed speech output which is applied to the automatic speech recognition model to generate a noise-suppressed transcription output for the set of spoken language utterances.
    Type: Grant
    Filed: May 26, 2021
    Date of Patent: March 5, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Xiaofei Wang, Sefik Emre Eskimez, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka
  • Publication number: 20230116052
    Abstract: Examples of array geometry agnostic multi-channel personalized speech enhancement (PSE) extract speaker embeddings, which represent acoustic characteristics of one or more target speakers, from target speaker enrollment data. Spatial features (e.g., inter-channel phase difference) are extracted from input audio captured by a microphone array. The input audio includes a mixture of speech data of the target speaker(s) and one or more interfering speaker(s). The input audio, the extracted speaker embeddings, and the extracted spatial features are provided to a trained geometry-agnostic PSE model. Output data is produced, which comprises estimated clean speech data of the target speaker(s) that has a reduction (or elimination) of speech data of the interfering speaker(s), without the trained PSE model requiring geometry information for the microphone array.
    Type: Application
    Filed: December 17, 2021
    Publication date: April 13, 2023
    Inventors: Sefik Emre ESKIMEZ, Takuya YOSHIOKA, Huaming WANG, Hassan TAHERIAN, Zhuo CHEN, Xuedong HUANG
  • Publication number: 20220383887
    Abstract: Systems and methods are provided for generating and operating a speech enhancement model optimized for generating noise-suppressed speech outputs for improved human listening and live captioning. A computing system obtains a speech enhancement model trained on a first training dataset to generate noise-suppressed speech outputs and an automatic speech recognition model trained on a second training dataset to generate transcription labels for spoken language utterances. A third training dataset comprising a set of spoken language utterances is applied to the speech enhancement model to obtain a first noise-suppressed speech output which is applied to the automatic speech recognition model to generate a noise-suppressed transcription output for the set of spoken language utterances.
    Type: Application
    Filed: May 26, 2021
    Publication date: December 1, 2022
    Inventors: Xiaofei WANG, Sefik Emre ESKIMEZ, Min TANG, Hemin YANG, Zirun ZHU, Zhuo CHEN, Huaming WANG, Takuya YOSHIOKA