Patents by Inventor Dimitrios B. Dimitriadis
Dimitrios B. Dimitriadis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11645473
Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
Type: Grant
Filed: December 23, 2020
Date of Patent: May 9, 2023
Assignees: International Business Machines Corporation; The Regents of the University of Michigan
Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
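The abstract above describes a network that jointly uses a transition type and a dialogue act to predict who speaks next. A minimal sketch of that idea, assuming a simple linear model with hypothetical features and hand-set weights (the actual patented network is not specified at this level of detail):

```python
import numpy as np

# Illustrative sketch, not the patented network: a dialogue-act feature
# and a transition-type feature are processed together to predict the
# source (speaker) of the next utterance. All labels and weights below
# are hypothetical.

DIALOGUE_ACTS = ["question", "statement", "backchannel"]
TRANSITIONS = ["hold", "switch"]

def one_hot(value, vocab):
    """Encode a categorical label as a one-hot vector."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(value)] = 1.0
    return vec

def predict_next_source(dialogue_act, transition_type, weights, bias):
    """Return the probability that the *other* entity produces the next utterance."""
    features = np.concatenate([one_hot(dialogue_act, DIALOGUE_ACTS),
                               one_hot(transition_type, TRANSITIONS)])
    logit = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid

# Hypothetical weights: questions and "switch" transitions raise the
# chance that the floor passes to the other speaker.
w = np.array([2.0, -1.0, -2.0, -1.5, 1.5])
b = 0.0

p = predict_next_source("question", "switch", w, b)
print(f"P(other speaker next) = {p:.3f}")
```

In a trained system the weights would be learned and the two feature streams processed concurrently by a network rather than concatenated into a linear model; the sketch only shows the shape of the prediction task.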
-
Publication number: 20220036178
Abstract: The disclosure herein describes training a global model based on a plurality of data sets. The global model is applied to each data set of the plurality of data sets and a plurality of gradients is generated based on that application. At least one gradient quality metric is determined for each gradient of the plurality of gradients. Based on the determined gradient quality metrics of the plurality of gradients, a plurality of weight factors is calculated. The plurality of gradients is transformed into a plurality of weighted gradients based on the calculated plurality of weight factors and a global gradient is generated based on the plurality of weighted gradients. The global model is updated based on the global gradient, wherein the updated global model, when applied to a data set, performs a task based on the data set and provides model output based on performing the task.
Type: Application
Filed: July 31, 2020
Publication date: February 3, 2022
Inventors: Dimitrios B. Dimitriadis, Kenichi Kumatani, Robert Peter Gmyr, Masaki Itagaki, Yashesh Gaur, Nanshan Zeng, Xuedong Huang
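The aggregation step described in this abstract can be sketched as follows. This is an illustrative reading only: the quality metric here (inverse gradient norm, which down-weights outlier updates) is an assumption, not necessarily the metric used in the filing:

```python
import numpy as np

# Hedged sketch of the described aggregation: per-dataset gradients are
# scored by a quality metric, converted to normalized weight factors,
# and combined into a single global gradient.

def aggregate_gradients(gradients):
    """Combine per-dataset gradients into one global gradient."""
    grads = [np.asarray(g, dtype=float) for g in gradients]
    # Assumed quality metric: smaller-norm gradients are treated as
    # more reliable, so outlier updates contribute less.
    qualities = np.array([1.0 / (1.0 + np.linalg.norm(g)) for g in grads])
    weights = qualities / qualities.sum()        # weight factors sum to 1
    weighted = [w * g for w, g in zip(weights, grads)]
    return np.sum(weighted, axis=0)              # global gradient

local_grads = [[0.2, -0.1], [0.3, 0.0], [5.0, 5.0]]  # third is an outlier
global_grad = aggregate_gradients(local_grads)
print(global_grad)
```

The model update itself would then be an ordinary gradient step using `global_grad` in place of a single-dataset gradient.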
-
Patent number: 11120802
Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream, with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, with each speaker segment ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
Type: Grant
Filed: November 21, 2017
Date of Patent: September 14, 2021
Assignee: International Business Machines Corporation
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Patent number: 11031027
Abstract: A system for providing an acoustic environment recognizer for optimal speech processing is disclosed. In particular, the system may utilize metadata obtained from various acoustic environments to assist in suppressing ambient noise interfering with a desired audio signal. In order to do so, the system may receive an audio stream including an audio signal associated with a user and including ambient noise obtained from an acoustic environment of the user. The system may obtain first metadata associated with the ambient noise, and may determine if the first metadata corresponds to second metadata in a profile for the acoustic environment. If the first metadata corresponds to the second metadata, the system may select a processing scheme for suppressing the ambient noise from the audio stream, and process the audio stream using the processing scheme. Once the audio stream is processed, the system may provide the audio stream to a destination.
Type: Grant
Filed: March 5, 2018
Date of Patent: June 8, 2021
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Horst J. Schroeter, Donald J. Bowen, Dimitrios B. Dimitriadis, Lusheng Ji
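The metadata-matching step described here reduces to a profile lookup. A minimal sketch, in which the profile contents, scheme names, and exact-match criterion are all hypothetical (the patent itself does not specify how "corresponds" is tested):

```python
# Illustrative sketch of the metadata-matching step: if incoming
# ambient-noise metadata matches a stored acoustic-environment profile,
# the profile's noise-suppression scheme is selected; otherwise a
# default scheme is used. All profile data below is hypothetical.

PROFILES = {
    "subway": {"metadata": {"noise_type": "rumble", "level_db": 80},
               "scheme": "spectral_subtraction"},
    "office": {"metadata": {"noise_type": "chatter", "level_db": 55},
               "scheme": "wiener_filter"},
}

def select_scheme(observed_metadata, profiles, default="generic_denoiser"):
    """Pick the suppression scheme whose profile metadata matches the observation."""
    for profile in profiles.values():
        if profile["metadata"] == observed_metadata:
            return profile["scheme"]
    return default

print(select_scheme({"noise_type": "rumble", "level_db": 80}, PROFILES))
```

A deployed recognizer would presumably use a tolerance-based or learned match rather than dictionary equality; the sketch only shows the select-by-profile control flow.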
-
Patent number: 10984814
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal, utilizing the time varying projection.
Type: Grant
Filed: February 24, 2020
Date of Patent: April 20, 2021
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
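One plausible reading of the clean/noisy-dictionary method is a paired-dictionary projection: solve for how a new noisy frame decomposes over the noisy dictionary, then resynthesize with the paired clean dictionary. The sketch below assumes a least-squares projection; the filing's actual projection and dictionary-learning steps are not specified here:

```python
import numpy as np

# Illustrative sketch (not the patented algorithm): columns of D_noisy
# are noisy spectral building blocks paired with clean counterparts in
# D_clean. For each frame of a new noisy signal we solve for
# activations against the noisy dictionary (a time-varying projection),
# then rebuild the frame with the clean dictionary.

rng = np.random.default_rng(0)

n_freq, n_atoms, n_frames = 8, 4, 5
D_clean = np.abs(rng.normal(size=(n_freq, n_atoms)))   # clean building blocks
noise = 0.1 * np.abs(rng.normal(size=(n_freq, n_atoms)))
D_noisy = D_clean + noise                              # paired noisy blocks

def denoise(spectrogram, d_noisy, d_clean):
    """Project each frame onto the noisy dictionary, rebuild with the clean one."""
    # Per-frame least-squares activations: the time-varying projection.
    activations, *_ = np.linalg.lstsq(d_noisy, spectrogram, rcond=None)
    return d_clean @ activations

true_act = np.abs(rng.normal(size=(n_atoms, n_frames)))
noisy_spec = D_noisy @ true_act
clean_est = denoise(noisy_spec, D_noisy, D_clean)
print(np.linalg.norm(clean_est - D_clean @ true_act))  # small reconstruction error
```

Because the synthetic noisy spectrogram lies exactly in the span of `D_noisy`, the projection recovers the true activations and the clean reconstruction is near-exact; real signals would leave a residual.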
-
Publication number: 20210110829
Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
Type: Application
Filed: December 23, 2020
Publication date: April 15, 2021
Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
-
Patent number: 10957320
Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
Type: Grant
Filed: January 25, 2019
Date of Patent: March 23, 2021
Assignees: International Business Machines Corporation; The Regents of the University of Michigan
Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
-
Patent number: 10902843
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
Type: Grant
Filed: November 15, 2019
Date of Patent: January 26, 2021
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
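Two steps in this pipeline, assigning frames to clusters and collapsing per-frame cluster identifiers into contiguous segments, can be sketched compactly. The sketch uses Euclidean distance and toy 2-D vectors standing in for PLP features; the abstract's Mahalanobis distance and the RNN training stage are omitted:

```python
import numpy as np

# Minimal sketch of the clustering/segmentation steps: assign each
# frame-level feature vector to its nearest cluster centroid, then
# collapse the per-frame cluster IDs into contiguous segments.

def assign_clusters(frames, centroids):
    """Nearest-centroid assignment for each frame (Euclidean, for brevity)."""
    dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def to_segments(labels):
    """Collapse per-frame labels into (label, start_frame, end_frame) runs."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((int(labels[start]), start, i))
            start = i
    return segments

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])   # stand-ins for cluster means
frames = np.array([[0.1, 0.2], [0.0, -0.1],
                   [4.9, 5.2], [5.1, 4.8], [0.2, 0.0]])
labels = assign_clusters(frames, centroids)
print(to_segments(labels))  # → [(0, 0, 2), (1, 2, 4), (0, 4, 5)]
```

In the patented method an RNN trained on features plus cluster identifiers performs the frame labeling; here nearest-centroid assignment stands in for that model so the segmentation step is visible end to end.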
-
Publication number: 20200243073
Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
Type: Application
Filed: January 25, 2019
Publication date: July 30, 2020
Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
-
Publication number: 20200202879
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal, utilizing the time varying projection.
Type: Application
Filed: February 24, 2020
Publication date: June 25, 2020
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Patent number: 10657980
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal, utilizing the time varying projection.
Type: Grant
Filed: October 25, 2017
Date of Patent: May 19, 2020
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Patent number: 10629221
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; denoising a second noisy signal, utilizing the time varying projection; and expanding the clean dictionary and the noisy dictionary by updating them to include new clean spectro-temporal building blocks and new noisy spectro-temporal building blocks created utilizing additional clean and noisy signals.
Type: Grant
Filed: April 9, 2019
Date of Patent: April 21, 2020
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Patent number: 10614797
Abstract: A diarization embodiment may include a system that clusters data up to a current point in time, consolidates it with past decisions, and then returns the result that minimizes the difference with past decisions. The consolidation may be achieved by performing a permutation of the different possible labels and comparing the distance. For speaker diarization, a distance may be determined based on a minimum edit or Hamming distance. The distance may alternatively be a measure other than the minimum edit or Hamming distance. The clustering may have a finite time window over which the analysis is performed.
Type: Grant
Filed: November 30, 2017
Date of Patent: April 7, 2020
Assignee: International Business Machines Corporation
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Jason W. Pelecanos, Weizhong Zhu
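The consolidation step in this abstract (permute the labels, keep the permutation closest to past decisions) is concrete enough to sketch directly. A minimal illustration using Hamming distance, one of the distances the abstract names; the brute-force search over permutations is only practical for small speaker counts:

```python
from itertools import permutations

# Sketch of the described consolidation step: cluster labels from the
# current pass are arbitrary, so we try every relabeling and keep the
# one whose Hamming distance to the previous pass's decisions is
# smallest. Illustrative, not the patented implementation.

def hamming(a, b):
    """Number of positions where two label sequences disagree."""
    return sum(x != y for x, y in zip(a, b))

def consolidate(new_labels, past_labels):
    """Permute the new labels to best match past decisions."""
    label_set = sorted(set(new_labels))
    best, best_dist = new_labels, hamming(new_labels, past_labels)
    for perm in permutations(label_set):
        mapping = dict(zip(label_set, perm))
        candidate = [mapping[l] for l in new_labels]
        dist = hamming(candidate, past_labels)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best

past = [0, 0, 1, 1, 0]
new = [1, 1, 0, 0, 1]          # same clustering, labels flipped
print(consolidate(new, past))   # → [0, 0, 1, 1, 0]
```

For larger numbers of speakers the optimal relabeling is usually found with an assignment algorithm (e.g. Hungarian matching) rather than exhaustive permutation, but the objective is the same.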
-
Publication number: 20200082809
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
Type: Application
Filed: November 15, 2019
Publication date: March 12, 2020
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
-
Patent number: 10546575
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
Type: Grant
Filed: December 14, 2016
Date of Patent: January 28, 2020
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
-
Patent number: 10468031
Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream, with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
Type: Grant
Filed: November 21, 2017
Date of Patent: November 5, 2019
Assignee: International Business Machines Corporation
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Publication number: 20190237090
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; denoising a second noisy signal, utilizing the time varying projection; and expanding the clean dictionary and the noisy dictionary by updating them to include new clean spectro-temporal building blocks and new noisy spectro-temporal building blocks created utilizing additional clean and noisy signals.
Type: Application
Filed: April 9, 2019
Publication date: August 1, 2019
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Patent number: 10347270
Abstract: According to one embodiment, a computer program product for denoising a signal comprises a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, and where the program instructions are executable by a processor to cause the processor to perform a method comprising: creating, utilizing a processor, a clean dictionary, utilizing a clean signal; creating, utilizing the processor, a noisy dictionary, utilizing a first noisy signal; determining, utilizing the processor, a time varying projection, utilizing the clean dictionary and the noisy dictionary; and denoising, utilizing the processor, a second noisy signal, utilizing the time varying projection.
Type: Grant
Filed: July 28, 2016
Date of Patent: July 9, 2019
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Publication number: 20190156835
Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream, with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
Type: Application
Filed: November 21, 2017
Publication date: May 23, 2019
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Publication number: 20190156832
Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream, with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, with each speaker segment ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
Type: Application
Filed: November 21, 2017
Publication date: May 23, 2019
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon