Patents by Inventor DIMITRIOS B. DIMITRIADIS

Dimitrios B. Dimitriadis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11645473
    Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: May 9, 2023
    Assignees: International Business Machines Corporation; The Regents of the University of Michigan
    Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
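The abstract above describes a network that concurrently consumes a turn-transition type and a dialogue act to predict the source of the next utterance. The sketch below illustrates that idea in miniature; the label inventories, the single softmax layer, and the random weights are all illustrative assumptions, not details from the patent.

```python
import numpy as np

# Hypothetical label inventories (not from the patent text): dialogue acts,
# turn-transition types, and candidate sources of the next utterance.
DIALOGUE_ACTS = ["question", "statement", "backchannel"]
TRANSITION_TYPES = ["hold", "switch", "overlap"]
SOURCES = ["first_entity", "second_entity"]

def one_hot(label, inventory):
    """Encode a categorical label as a one-hot vector."""
    vec = np.zeros(len(inventory))
    vec[inventory.index(label)] = 1.0
    return vec

def predict_next_source(dialogue_act, transition_type, weights):
    """Process both cues jointly: concatenate their encodings and apply a
    single softmax layer to score each candidate source."""
    features = np.concatenate([one_hot(dialogue_act, DIALOGUE_ACTS),
                               one_hot(transition_type, TRANSITION_TYPES)])
    logits = features @ weights            # shape: (len(SOURCES),)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return SOURCES[int(np.argmax(probs))], probs

# Toy weights standing in for a trained network.
rng = np.random.default_rng(0)
W = rng.normal(size=(len(DIALOGUE_ACTS) + len(TRANSITION_TYPES), len(SOURCES)))
source, probs = predict_next_source("question", "switch", W)
```

A trained system would replace the random weights with parameters learned from labeled dialogues, but the joint encoding of both cues is the point the abstract emphasizes.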
  • Publication number: 20220036178
    Abstract: The disclosure herein describes training a global model based on a plurality of data sets. The global model is applied to each data set of the plurality of data sets and a plurality of gradients is generated based on that application. At least one gradient quality metric is determined for each gradient of the plurality of gradients. Based on the determined gradient quality metrics of the plurality of gradients, a plurality of weight factors is calculated. The plurality of gradients is transformed into a plurality of weighted gradients based on the calculated plurality of weight factors and a global gradient is generated based on the plurality of weighted gradients. The global model is updated based on the global gradient, wherein the updated global model, when applied to a data set, performs a task based on the data set and provides model output based on performing the task.
    Type: Application
    Filed: July 31, 2020
    Publication date: February 3, 2022
    Inventors: Dimitrios B. Dimitriadis, Kenichi Kumatani, Robert Peter Gmyr, Masaki Itagaki, Yashesh Gaur, Nanshan Zeng, Xuedong Huang
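The gradient-weighting scheme described in this abstract can be sketched as follows. The patent does not specify the gradient quality metric, so this illustration uses the inverse gradient norm as a hypothetical stand-in.

```python
import numpy as np

def aggregate_gradients(gradients):
    """Aggregate per-dataset gradients into a single global gradient,
    weighting each by a quality metric (here: inverse gradient norm,
    a stand-in for the patent's unspecified metric)."""
    norms = np.array([np.linalg.norm(g) for g in gradients])
    quality = 1.0 / (norms + 1e-8)       # higher quality for smaller norm
    weights = quality / quality.sum()    # normalize into weight factors
    weighted = [w * g for w, g in zip(weights, gradients)]
    return np.sum(weighted, axis=0)      # the global gradient

# Toy example: three data sets' gradients for a 2-parameter model.
grads = [np.array([0.5, -0.2]), np.array([1.5, 0.3]), np.array([0.4, -0.1])]
global_grad = aggregate_gradients(grads)
theta = np.zeros(2)
theta -= 0.1 * global_grad               # one global model update step
```

Down-weighting outlier gradients this way keeps a single noisy data set from dominating the global update, which is the motivation the abstract describes.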
  • Patent number: 11120802
    Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream to produce the spoken words, to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, each ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each segment.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: September 14, 2021
    Assignee: International Business Machines Corporation
    Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
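The STD and clustering stages of this pipeline can be illustrated with a toy sketch; the per-word change scores, the threshold, and the one-dimensional segment embeddings are hypothetical stand-ins for the VAD/ASR/STD outputs the patent leaves unspecified.

```python
def speaker_turn_detection(words, change_scores, threshold=0.5):
    """Split recognized words into speaker segments, ending each segment
    at a word boundary where the change score exceeds the threshold."""
    segments, current = [], []
    for word, score in zip(words, change_scores):
        current.append(word)
        if score > threshold:        # a speaker change after this word
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

def cluster_segments(embeddings, n_speakers=2):
    """Toy clustering: assign each segment to the nearest of n_speakers
    centroids (1-D embeddings here, purely for illustration)."""
    centroids = embeddings[:n_speakers]
    return [min(range(n_speakers), key=lambda s: abs(e - centroids[s]))
            for e in embeddings]

words = ["hello", "there", "hi", "how", "are", "you"]
scores = [0.1, 0.9, 0.8, 0.1, 0.1, 0.2]   # hypothetical STD scores
segs = speaker_turn_detection(words, scores)
labels = cluster_segments([0.2, 0.8, 0.25])
```

Because segments always end at word boundaries produced by the ASR step, the speaker labels line up cleanly with the transcript.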
  • Patent number: 11031027
    Abstract: A system for providing an acoustic environment recognizer for optimal speech processing is disclosed. In particular, the system may utilize metadata obtained from various acoustic environments to assist in suppressing ambient noise interfering with a desired audio signal. In order to do so, the system may receive an audio stream including an audio signal associated with a user and including ambient noise obtained from an acoustic environment of the user. The system may obtain first metadata associated with the ambient noise, and may determine if the first metadata corresponds to second metadata in a profile for the acoustic environment. If the first metadata corresponds to the second metadata, the system may select a processing scheme for suppressing the ambient noise from the audio stream, and process the audio stream using the processing scheme. Once the audio stream is processed, the system may provide the audio stream to a destination.
    Type: Grant
    Filed: March 5, 2018
    Date of Patent: June 8, 2021
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Horst J. Schroeter, Donald J. Bowen, Dimitrios B. Dimitriadis, Lusheng Ji
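The metadata-matching step in this abstract can be sketched as a lookup against stored environment profiles; the profile contents and scheme names below are illustrative assumptions, not details from the patent.

```python
# Hypothetical environment profiles mapping ambient-noise metadata to a
# noise-suppression scheme (all names are illustrative).
PROFILES = {
    "car":  {"metadata": {"noise_type": "engine", "level_db": 70},
             "scheme": "wiener_filter"},
    "cafe": {"metadata": {"noise_type": "babble", "level_db": 65},
             "scheme": "spectral_subtraction"},
}

def select_scheme(observed_metadata, default="generic_denoiser"):
    """Compare the observed ambient-noise metadata against each stored
    environment profile; on a match, return that profile's processing
    scheme, otherwise fall back to a generic one."""
    for profile in PROFILES.values():
        if profile["metadata"] == observed_metadata:
            return profile["scheme"]
    return default

scheme = select_scheme({"noise_type": "engine", "level_db": 70})
```

A real system would match metadata approximately (e.g. within a level tolerance) rather than by exact equality, but the profile lookup is the core idea.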
  • Patent number: 10984814
    Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time varying projection utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal utilizing the time varying projection.
    Type: Grant
    Filed: February 24, 2020
    Date of Patent: April 20, 2021
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
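A minimal sketch of the dictionary-based denoising idea, assuming spectrogram-like frames: activations are solved over a joint clean-plus-noisy dictionary and the signal is resynthesized from the clean part only, so the projection varies from frame to frame. The randomly sampled "dictionaries" stand in for learned ones (e.g. from NMF), which the patent does not specify.

```python
import numpy as np

def learn_dictionary(frames, n_atoms, seed=0):
    """Toy dictionary: sample n_atoms spectral frames as building blocks
    (a stand-in for a properly learned dictionary)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(frames.shape[1], size=n_atoms, replace=False)
    return frames[:, idx]

def denoise(noisy_frames, D_clean, D_noisy):
    """Time-varying projection: solve for per-frame activations over the
    joint [clean | noisy] dictionary, then resynthesize using only the
    clean atoms and their activations."""
    D_joint = np.hstack([D_clean, D_noisy])
    activations, *_ = np.linalg.lstsq(D_joint, noisy_frames, rcond=None)
    clean_act = activations[:D_clean.shape[1], :]
    return D_clean @ clean_act

# Toy spectrogram-like data: 8 frequency bins, 20 frames.
rng = np.random.default_rng(1)
clean = np.abs(rng.normal(size=(8, 20)))
noise = np.abs(rng.normal(size=(8, 20)))
D_c = learn_dictionary(clean, n_atoms=4)
D_n = learn_dictionary(noise, n_atoms=4, seed=2)
estimate = denoise(clean + noise, D_c, D_n)
```

Because the activations are recomputed for every frame, the effective clean/noise decomposition adapts over time, which is what "time varying projection" captures.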
  • Publication number: 20210110829
    Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
    Type: Application
    Filed: December 23, 2020
    Publication date: April 15, 2021
    Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
  • Patent number: 10957320
    Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
    Type: Grant
    Filed: January 25, 2019
    Date of Patent: March 23, 2021
    Assignees: International Business Machines Corporation; The Regents of the University of Michigan
    Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
  • Patent number: 10902843
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
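The frame-clustering stage of this abstract can be sketched with plain k-means; Euclidean distance stands in here for the Mahalanobis measure, and the resulting cluster identifiers are the frame-level targets on which the RNN would then be trained.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain k-means over per-frame feature vectors (Euclidean distance
    used in place of the patent's Mahalanobis measure)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Distance of every frame to every centroid, then reassign.
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels

# Toy frames: two speakers plus silence, as 2-D feature vectors
# (standing in for PLP features and their time derivatives).
rng = np.random.default_rng(3)
frames = np.vstack([rng.normal(0, 0.1, (30, 2)),      # silence
                    rng.normal(3, 0.1, (30, 2)),      # speaker A
                    rng.normal(-3, 0.1, (30, 2))])    # speaker B
cluster_ids = kmeans(frames, k=3)
```

Each frame's cluster identifier becomes a training label; after training, the RNN segments new audio by emitting one of these identifiers per frame.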
  • Publication number: 20200243073
    Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
    Type: Application
    Filed: January 25, 2019
    Publication date: July 30, 2020
    Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
  • Publication number: 20200202879
    Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time varying projection utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal utilizing the time varying projection.
    Type: Application
    Filed: February 24, 2020
    Publication date: June 25, 2020
    Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
  • Patent number: 10657980
    Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time varying projection utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal utilizing the time varying projection.
    Type: Grant
    Filed: October 25, 2017
    Date of Patent: May 19, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
  • Patent number: 10629221
    Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time varying projection utilizing the clean dictionary and the noisy dictionary; denoising a second noisy signal utilizing the time varying projection; and expanding the clean and noisy dictionaries by updating them to include new clean and noisy spectro-temporal building blocks created utilizing additional clean and noisy signals.
    Type: Grant
    Filed: April 9, 2019
    Date of Patent: April 21, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
  • Patent number: 10614797
    Abstract: A diarization embodiment may include a system that clusters data up to the current point in time, consolidates the result with past decisions, and returns the labeling that minimizes the difference from those past decisions. The consolidation may be achieved by permuting the possible labels and comparing distances. For speaker diarization, the distance may be a minimum edit or Hamming distance, or another measure. The clustering may be performed over a finite time window of analysis.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: April 7, 2020
    Assignee: International Business Machines Corporation
    Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Jason W. Pelecanos, Weizhong Zhu
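The consolidation step in this abstract can be sketched directly: because cluster labels are arbitrary, try every permutation of the speaker labels on the new clustering and keep the permutation that minimizes the Hamming distance to past decisions.

```python
from itertools import permutations

def hamming(a, b):
    """Number of positions where two label sequences disagree."""
    return sum(x != y for x, y in zip(a, b))

def consolidate(past_labels, new_labels, n_speakers=2):
    """Re-map the new labeling through every permutation of speaker
    labels; return the remapping closest to the past decisions."""
    best = None
    for perm in permutations(range(n_speakers)):
        remapped = [perm[l] for l in new_labels]
        d = hamming(past_labels, remapped)
        if best is None or d < best[0]:
            best = (d, remapped)
    return best[1]

past = [0, 0, 1, 1, 0]
new = [1, 1, 0, 0, 1]          # the same clustering, with labels flipped
consolidated = consolidate(past, new)
# consolidated → [0, 0, 1, 1, 0]
```

Exhaustive permutation is fine for a handful of speakers; with many speakers a real system would solve the label assignment with e.g. the Hungarian algorithm instead.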
  • Publication number: 20200082809
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Application
    Filed: November 15, 2019
    Publication date: March 12, 2020
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Patent number: 10546575
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: January 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Patent number: 10468031
    Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream to produce the spoken words, to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, each ending at a word boundary. The STD process analyzes the speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each segment.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: November 5, 2019
    Assignee: International Business Machines Corporation
    Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
  • Publication number: 20190237090
    Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time varying projection utilizing the clean dictionary and the noisy dictionary; denoising a second noisy signal utilizing the time varying projection; and expanding the clean and noisy dictionaries by updating them to include new clean and noisy spectro-temporal building blocks created utilizing additional clean and noisy signals.
    Type: Application
    Filed: April 9, 2019
    Publication date: August 1, 2019
    Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
  • Patent number: 10347270
    Abstract: According to one embodiment, a computer program product for denoising a signal comprises a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se. The program instructions are executable by a processor to cause the processor to perform a method comprising: creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time varying projection utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal utilizing the time varying projection.
    Type: Grant
    Filed: July 28, 2016
    Date of Patent: July 9, 2019
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
  • Publication number: 20190156835
    Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream to produce the spoken words, to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, each ending at a word boundary. The STD process analyzes the speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each segment.
    Type: Application
    Filed: November 21, 2017
    Publication date: May 23, 2019
    Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
  • Publication number: 20190156832
    Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream to produce the spoken words, to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, each ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each segment.
    Type: Application
    Filed: November 21, 2017
    Publication date: May 23, 2019
    Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon