Patents by Inventor Dimitrios B. Dimitriadis
Dimitrios B. Dimitriadis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11645473
Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
Type: Grant
Filed: December 23, 2020
Date of Patent: May 9, 2023
Assignees: International Business Machines Corporation; The Regents of the University of Michigan
Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
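The abstract above describes a network that jointly uses a transition type and a dialogue act to predict who speaks next. A minimal sketch of that idea, assuming a simple linear model with hypothetical features and hand-set weights (the actual patented network is not specified at this level of detail):

```python
import numpy as np

# Illustrative sketch, not the patented network: a dialogue-act feature
# and a transition-type feature are processed together to predict the
# source (speaker) of the next utterance. All labels and weights below
# are hypothetical.

DIALOGUE_ACTS = ["question", "statement", "backchannel"]
TRANSITIONS = ["hold", "switch"]

def one_hot(value, vocab):
    """Encode a categorical label as a one-hot vector."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(value)] = 1.0
    return vec

def predict_next_source(dialogue_act, transition_type, weights, bias):
    """Return the probability that the *other* entity produces the next utterance."""
    features = np.concatenate([one_hot(dialogue_act, DIALOGUE_ACTS),
                               one_hot(transition_type, TRANSITIONS)])
    logit = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid

# Hypothetical weights: questions and "switch" transitions raise the
# chance that the floor passes to the other speaker.
w = np.array([2.0, -1.0, -2.0, -1.5, 1.5])
b = 0.0

p = predict_next_source("question", "switch", w, b)
print(f"P(other speaker next) = {p:.3f}")
```

In a trained system the weights would be learned and the two feature streams processed concurrently by a network rather than concatenated into a linear model; the sketch only shows the shape of the prediction task.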
-
Publication number: 20220036178
Abstract: The disclosure herein describes training a global model based on a plurality of data sets. The global model is applied to each data set of the plurality of data sets and a plurality of gradients is generated based on that application. At least one gradient quality metric is determined for each gradient of the plurality of gradients. Based on the determined gradient quality metrics of the plurality of gradients, a plurality of weight factors is calculated. The plurality of gradients is transformed into a plurality of weighted gradients based on the calculated plurality of weight factors and a global gradient is generated based on the plurality of weighted gradients. The global model is updated based on the global gradient, wherein the updated global model, when applied to a data set, performs a task based on the data set and provides model output based on performing the task.
Type: Application
Filed: July 31, 2020
Publication date: February 3, 2022
Inventors: Dimitrios B. Dimitriadis, Kenichi Kumatani, Robert Peter Gmyr, Masaki Itagaki, Yashesh Gaur, Nanshan Zeng, Xuedong Huang
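The aggregation step described in this abstract can be sketched as follows. This is an illustrative reading only: the quality metric here (inverse gradient norm, which down-weights outlier updates) is an assumption, not necessarily the metric used in the filing:

```python
import numpy as np

# Hedged sketch of the described aggregation: per-dataset gradients are
# scored by a quality metric, converted to normalized weight factors,
# and combined into a single global gradient.

def aggregate_gradients(gradients):
    """Combine per-dataset gradients into one global gradient."""
    grads = [np.asarray(g, dtype=float) for g in gradients]
    # Assumed quality metric: smaller-norm gradients are treated as
    # more reliable, so outlier updates contribute less.
    qualities = np.array([1.0 / (1.0 + np.linalg.norm(g)) for g in grads])
    weights = qualities / qualities.sum()        # weight factors sum to 1
    weighted = [w * g for w, g in zip(weights, grads)]
    return np.sum(weighted, axis=0)              # global gradient

local_grads = [[0.2, -0.1], [0.3, 0.0], [5.0, 5.0]]  # third is an outlier
global_grad = aggregate_gradients(local_grads)
print(global_grad)
```

The model update itself would then be an ordinary gradient step using `global_grad` in place of a single-dataset gradient.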
-
Patent number: 11120802
Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream, with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, with each speaker segment ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
Type: Grant
Filed: November 21, 2017
Date of Patent: September 14, 2021
Assignee: International Business Machines Corporation
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Patent number: 11031027
Abstract: A system for providing an acoustic environment recognizer for optimal speech processing is disclosed. In particular, the system may utilize metadata obtained from various acoustic environments to assist in suppressing ambient noise interfering with a desired audio signal. In order to do so, the system may receive an audio stream including an audio signal associated with a user and including ambient noise obtained from an acoustic environment of the user. The system may obtain first metadata associated with the ambient noise, and may determine if the first metadata corresponds to second metadata in a profile for the acoustic environment. If the first metadata corresponds to the second metadata, the system may select a processing scheme for suppressing the ambient noise from the audio stream, and process the audio stream using the processing scheme. Once the audio stream is processed, the system may provide the audio stream to a destination.
Type: Grant
Filed: March 5, 2018
Date of Patent: June 8, 2021
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Horst J. Schroeter, Donald J. Bowen, Dimitrios B. Dimitriadis, Lusheng Ji
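The metadata-matching step described here reduces to a profile lookup. A minimal sketch, in which the profile contents, scheme names, and exact-match criterion are all hypothetical (the patent itself does not specify how "corresponds" is tested):

```python
# Illustrative sketch of the metadata-matching step: if incoming
# ambient-noise metadata matches a stored acoustic-environment profile,
# the profile's noise-suppression scheme is selected; otherwise a
# default scheme is used. All profile data below is hypothetical.

PROFILES = {
    "subway": {"metadata": {"noise_type": "rumble", "level_db": 80},
               "scheme": "spectral_subtraction"},
    "office": {"metadata": {"noise_type": "chatter", "level_db": 55},
               "scheme": "wiener_filter"},
}

def select_scheme(observed_metadata, profiles, default="generic_denoiser"):
    """Pick the suppression scheme whose profile metadata matches the observation."""
    for profile in profiles.values():
        if profile["metadata"] == observed_metadata:
            return profile["scheme"]
    return default

print(select_scheme({"noise_type": "rumble", "level_db": 80}, PROFILES))
```

A deployed recognizer would presumably use a tolerance-based or learned match rather than dictionary equality; the sketch only shows the select-by-profile control flow.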
-
Patent number: 10984814
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal, utilizing the time varying projection.
Type: Grant
Filed: February 24, 2020
Date of Patent: April 20, 2021
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
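One plausible reading of the clean/noisy-dictionary method is a paired-dictionary projection: solve for how a new noisy frame decomposes over the noisy dictionary, then resynthesize with the paired clean dictionary. The sketch below assumes a least-squares projection; the filing's actual projection and dictionary-learning steps are not specified here:

```python
import numpy as np

# Illustrative sketch (not the patented algorithm): columns of D_noisy
# are noisy spectral building blocks paired with clean counterparts in
# D_clean. For each frame of a new noisy signal we solve for
# activations against the noisy dictionary (a time-varying projection),
# then rebuild the frame with the clean dictionary.

rng = np.random.default_rng(0)

n_freq, n_atoms, n_frames = 8, 4, 5
D_clean = np.abs(rng.normal(size=(n_freq, n_atoms)))   # clean building blocks
noise = 0.1 * np.abs(rng.normal(size=(n_freq, n_atoms)))
D_noisy = D_clean + noise                              # paired noisy blocks

def denoise(spectrogram, d_noisy, d_clean):
    """Project each frame onto the noisy dictionary, rebuild with the clean one."""
    # Per-frame least-squares activations: the time-varying projection.
    activations, *_ = np.linalg.lstsq(d_noisy, spectrogram, rcond=None)
    return d_clean @ activations

true_act = np.abs(rng.normal(size=(n_atoms, n_frames)))
noisy_spec = D_noisy @ true_act
clean_est = denoise(noisy_spec, D_noisy, D_clean)
print(np.linalg.norm(clean_est - D_clean @ true_act))  # small reconstruction error
```

Because the synthetic noisy spectrogram lies exactly in the span of `D_noisy`, the projection recovers the true activations and the clean reconstruction is near-exact; real signals would leave a residual.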
-
Publication number: 20210110829
Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
Type: Application
Filed: December 23, 2020
Publication date: April 15, 2021
Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
-
Patent number: 10957320
Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
Type: Grant
Filed: January 25, 2019
Date of Patent: March 23, 2021
Assignees: International Business Machines Corporation; The Regents of the University of Michigan
Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
-
Patent number: 10902843
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
Type: Grant
Filed: November 15, 2019
Date of Patent: January 26, 2021
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
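Two steps in this pipeline, assigning frames to clusters and collapsing per-frame cluster identifiers into contiguous segments, can be sketched compactly. The sketch uses Euclidean distance and toy 2-D vectors standing in for PLP features; the abstract's Mahalanobis distance and the RNN training stage are omitted:

```python
import numpy as np

# Minimal sketch of the clustering/segmentation steps: assign each
# frame-level feature vector to its nearest cluster centroid, then
# collapse the per-frame cluster IDs into contiguous segments.

def assign_clusters(frames, centroids):
    """Nearest-centroid assignment for each frame (Euclidean, for brevity)."""
    dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def to_segments(labels):
    """Collapse per-frame labels into (label, start_frame, end_frame) runs."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((int(labels[start]), start, i))
            start = i
    return segments

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])   # stand-ins for cluster means
frames = np.array([[0.1, 0.2], [0.0, -0.1],
                   [4.9, 5.2], [5.1, 4.8], [0.2, 0.0]])
labels = assign_clusters(frames, centroids)
print(to_segments(labels))  # → [(0, 0, 2), (1, 2, 4), (0, 4, 5)]
```

In the patented method an RNN trained on features plus cluster identifiers performs the frame labeling; here nearest-centroid assignment stands in for that model so the segmentation step is visible end to end.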
-
Publication number: 20200243073
Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
Type: Application
Filed: January 25, 2019
Publication date: July 30, 2020
Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
-
Publication number: 20200202879
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal, utilizing the time varying projection.
Type: Application
Filed: February 24, 2020
Publication date: June 25, 2020
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Patent number: 10657980
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal, utilizing the time varying projection.
Type: Grant
Filed: October 25, 2017
Date of Patent: May 19, 2020
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Patent number: 10629221
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; denoising a second noisy signal, utilizing the time varying projection; and expanding the clean dictionary and the noisy dictionary by updating them to include new clean spectro-temporal building blocks and new noisy spectro-temporal building blocks created utilizing additional clean and noisy signals.
Type: Grant
Filed: April 9, 2019
Date of Patent: April 21, 2020
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Patent number: 10614797
Abstract: A diarization embodiment may include a system that clusters data up to a current point in time, consolidates it with past decisions, and then returns the result that minimizes the difference with past decisions. The consolidation may be achieved by performing a permutation of the different possible labels and comparing the distance. For speaker diarization, a distance may be determined based on a minimum edit or Hamming distance. The distance may alternatively be a measure other than the minimum edit or Hamming distance. The clustering may have a finite time window over which the analysis is performed.
Type: Grant
Filed: November 30, 2017
Date of Patent: April 7, 2020
Assignee: International Business Machines Corporation
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Jason W. Pelecanos, Weizhong Zhu
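The consolidation step in this abstract (permute the labels, keep the permutation closest to past decisions) is concrete enough to sketch directly. A minimal illustration using Hamming distance, one of the distances the abstract names; the brute-force search over permutations is only practical for small speaker counts:

```python
from itertools import permutations

# Sketch of the described consolidation step: cluster labels from the
# current pass are arbitrary, so we try every relabeling and keep the
# one whose Hamming distance to the previous pass's decisions is
# smallest. Illustrative, not the patented implementation.

def hamming(a, b):
    """Number of positions where two label sequences disagree."""
    return sum(x != y for x, y in zip(a, b))

def consolidate(new_labels, past_labels):
    """Permute the new labels to best match past decisions."""
    label_set = sorted(set(new_labels))
    best, best_dist = new_labels, hamming(new_labels, past_labels)
    for perm in permutations(label_set):
        mapping = dict(zip(label_set, perm))
        candidate = [mapping[l] for l in new_labels]
        dist = hamming(candidate, past_labels)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best

past = [0, 0, 1, 1, 0]
new = [1, 1, 0, 0, 1]          # same clustering, labels flipped
print(consolidate(new, past))   # → [0, 0, 1, 1, 0]
```

For larger numbers of speakers the optimal relabeling is usually found with an assignment algorithm (e.g. Hungarian matching) rather than exhaustive permutation, but the objective is the same.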
-
Publication number: 20200082809
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
Type: Application
Filed: November 15, 2019
Publication date: March 12, 2020
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
-
Patent number: 10546575
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
Type: Grant
Filed: December 14, 2016
Date of Patent: January 28, 2020
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
-
Patent number: 10468031
Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream, with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
Type: Grant
Filed: November 21, 2017
Date of Patent: November 5, 2019
Assignee: International Business Machines Corporation
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Publication number: 20190237090
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary, utilizing a clean signal; creating a noisy dictionary, utilizing a first noisy signal; determining a time varying projection, utilizing the clean dictionary and the noisy dictionary; denoising a second noisy signal, utilizing the time varying projection; and expanding the clean dictionary and the noisy dictionary by updating them to include new clean spectro-temporal building blocks and new noisy spectro-temporal building blocks created utilizing additional clean and noisy signals.
Type: Application
Filed: April 9, 2019
Publication date: August 1, 2019
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Patent number: 10347270
Abstract: According to one embodiment, a computer program product for denoising a signal comprises a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, and where the program instructions are executable by a processor to cause the processor to perform a method comprising: creating, utilizing a processor, a clean dictionary, utilizing a clean signal; creating, utilizing the processor, a noisy dictionary, utilizing a first noisy signal; determining, utilizing the processor, a time varying projection, utilizing the clean dictionary and the noisy dictionary; and denoising, utilizing the processor, a second noisy signal, utilizing the time varying projection.
Type: Grant
Filed: July 28, 2016
Date of Patent: July 9, 2019
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, Samuel Thomas, Colin C. Vaz
-
Publication number: 20190156835
Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream, with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
Type: Application
Filed: November 21, 2017
Publication date: May 23, 2019
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Publication number: 20190156832
Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream, with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments, with each speaker segment ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
Type: Application
Filed: November 21, 2017
Publication date: May 23, 2019
Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon