Patents by Inventor George Saon

George Saon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10902843
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment the audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments. (A minimal code sketch of this clustering-and-RNN pipeline appears after this listing.)
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Patent number: 10546575
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment the audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: January 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Patent number: 10249292
    Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker. (A minimal code sketch of this LSTM-based diarization appears after this listing.)
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: April 2, 2019
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Publication number: 20180166067
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment the audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Application
    Filed: December 14, 2016
    Publication date: June 14, 2018
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Publication number: 20180166066
    Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
    Type: Application
    Filed: December 14, 2016
    Publication date: June 14, 2018
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Patent number: 9378464
    Abstract: A system and an article of manufacture for discriminative learning via hierarchical transformations, which includes obtaining a model of a first set of data, two or more data transformations, and a second set of data, evaluating the two or more data transformations to determine which data transformation will most effectively modify the second set of data to match the model, and selecting the data transformation that will most effectively modify the second set of data to match the model based on the evaluation. (A minimal code sketch of this transformation-selection loop appears after this listing.)
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: June 28, 2016
    Assignee: International Business Machines Corporation
    Inventors: Sasha P. Caskey, Dimitri Kanevsky, Brian Kingsbury, Tara N. Sainath, George Saon
  • Publication number: 20140032570
    Abstract: Techniques for discriminative learning via hierarchical transformations. A method includes obtaining a model of a first set of data, two or more data transformations, and a second set of data, evaluating the two or more data transformations to determine which data transformation will most effectively modify the second set of data to match the model, and selecting the data transformation that will most effectively modify the second set of data to match the model based on the evaluation.
    Type: Application
    Filed: July 30, 2012
    Publication date: January 30, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sasha P. Caskey, Dimitri Kanevsky, Brian Kingsbury, Tara N. Sainath, George Saon
  • Publication number: 20140032571
    Abstract: A system and an article of manufacture for discriminative learning via hierarchical transformations, which includes obtaining a model of a first set of data, two or more data transformations, and a second set of data, evaluating the two or more data transformations to determine which data transformation will most effectively modify the second set of data to match the model, and selecting the data transformation that will most effectively modify the second set of data to match the model based on the evaluation.
    Type: Application
    Filed: July 30, 2012
    Publication date: January 30, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sasha P. Caskey, Dimitri Kanevsky, Brian Kingsbury, Tara N. Sainath, George Saon
  • Publication number: 20050119885
    Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition. (A minimal code sketch of such a log-linear posterior appears after this listing.)
    Type: Application
    Filed: November 28, 2003
    Publication date: June 2, 2005
    Inventors: Scott Axelrod, Sreeram Balakrishnan, Stanley Chen, Yuqing Gao, Ramesh Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Picheny, George Saon, Geoffrey Zweig
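
The code sketches referenced in the abstracts above follow. For patents 10902843 and 10546575 (and publication 20180166067), the described pipeline clusters per-frame acoustic features and then trains a recurrent network to predict each frame's cluster identifier. The sketch below is a minimal illustration under assumptions that are not taken from the patents: a toy log-spectrum feature stands in for PLP features with time derivatives and LDA, plain Euclidean k-means from scikit-learn stands in for the Mahalanobis-distance clustering of frame means and variances, and a small PyTorch GRU stands in for the RNN.

```python
# Minimal sketch (not the patented implementation) of the pipeline in
# patents 10902843 / 10546575: cluster per-frame acoustic features with
# k-means, then train an RNN to predict each frame's cluster identifier.
# Assumptions: toy log-spectrum features instead of PLP + derivatives + LDA,
# Euclidean k-means instead of Mahalanobis-distance clustering, a small GRU.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def frame_features(audio, frame_len=400, hop=160, n_feats=13):
    """Toy per-frame features; a real system would use PLP features
    and their time derivatives, possibly projected with LDA."""
    frames = []
    for start in range(0, len(audio) - frame_len, hop):
        spec = np.abs(np.fft.rfft(audio[start:start + frame_len]))[:n_feats]
        frames.append(np.log(spec + 1e-8))
    return np.array(frames, dtype=np.float32)

# Synthetic "training audio" stands in for multi-speaker speech plus silence.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000 * 10).astype(np.float32)
feats = frame_features(audio)                     # shape (T, n_feats)

# Step 1: cluster the frames; each cluster id becomes a frame-level label.
k = 4
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)

# Step 2: train an RNN to map the feature sequence to cluster identifiers.
class FrameRNN(nn.Module):
    def __init__(self, n_feats, n_clusters, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(n_feats, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_clusters)

    def forward(self, x):                         # x: (batch, T, n_feats)
        h, _ = self.rnn(x)
        return self.out(h)                        # (batch, T, n_clusters)

model = FrameRNN(feats.shape[1], k)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.from_numpy(feats).unsqueeze(0)
y = torch.from_numpy(labels).long().unsqueeze(0)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x).transpose(1, 2), y)
    loss.backward()
    opt.step()

# Step 3: apply the RNN to audio; contiguous runs of the same predicted
# cluster identifier form the segments handed to speech recognition.
pred = model(x).argmax(dim=-1).squeeze(0).numpy()
```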
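
For patent 10249292 and publication 20180166066, an LSTM assigns each frame one of three labels (first speaker, second speaker, silence), and the change points are simply the frames where the predicted label switches. The sketch below is again a hedged illustration: the PyTorch model, the 40-dimensional random features, and the untrained forward pass are placeholders, not the patented system.

```python
# Minimal sketch (not the patented implementation) of the diarization in
# patent 10249292: an LSTM labels every frame as speaker 1, speaker 2, or
# silence; change points are the frames where the predicted label switches.
# Assumptions: PyTorch LSTM, random 40-dim features as stand-in inputs.
import torch
import torch.nn as nn

LABELS = ["speaker_1", "speaker_2", "silence"]

class DiarizationLSTM(nn.Module):
    def __init__(self, n_feats=40, hidden=64, n_labels=len(LABELS)):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_labels)

    def forward(self, x):                      # x: (batch, T, n_feats)
        h, _ = self.lstm(x)
        return self.out(h)                     # (batch, T, n_labels) logits

def change_points(frame_labels):
    """Frame indices where the label switches, i.e. the boundaries between
    the first speaker, the second speaker, and silence."""
    return [t for t in range(1, len(frame_labels))
            if frame_labels[t] != frame_labels[t - 1]]

# Untrained forward pass, just to show the shapes; a real system would be
# trained on frames annotated with the three labels (as in the sketch above).
model = DiarizationLSTM()
x = torch.randn(1, 200, 40)                    # 200 frames of 40-dim features
pred = model(x).argmax(dim=-1).squeeze(0).tolist()
boundaries = change_points(pred)
print(boundaries[:5], [LABELS[i] for i in pred[:5]])
```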
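
For patent 9378464 and publications 20140032570 and 20140032571, the abstract describes evaluating two or more candidate data transformations and selecting the one that most effectively modifies the second data set to match a model of the first. The sketch below assumes a single Gaussian model, simple affine candidate transformations, and average log-likelihood as the scoring criterion; none of these specifics come from the patent.

```python
# Minimal sketch (not the patented implementation) of the selection loop in
# patent 9378464: score each candidate transformation by how well it maps
# the second data set onto a model of the first, and keep the best one.
# Assumptions: a single Gaussian model, affine candidate transformations,
# average log-likelihood as the evaluation criterion.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
data_1 = rng.normal(loc=5.0, scale=1.0, size=(500, 2))   # first data set
data_2 = rng.normal(loc=0.0, scale=2.0, size=(500, 2))   # second data set

# Model of the first data set: a Gaussian with its mean and covariance.
model = multivariate_normal(mean=data_1.mean(axis=0),
                            cov=np.cov(data_1, rowvar=False))

# Two or more candidate data transformations.
transforms = {
    "identity": lambda x: x,
    "shift_to_model_mean": lambda x: x - x.mean(axis=0) + data_1.mean(axis=0),
    "shift_and_rescale": lambda x: ((x - x.mean(axis=0)) / x.std(axis=0)
                                    * data_1.std(axis=0) + data_1.mean(axis=0)),
}

# Evaluate each transformation and select the one that most effectively
# modifies the second data set to match the model.
scores = {name: model.logpdf(t(data_2)).mean() for name, t in transforms.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy setup the "shift_and_rescale" candidate typically scores highest, since it aligns both the mean and the spread of the second data set with the model fit on the first.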
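
For publication 20050119885, the posterior probability of a linguistic unit given the observed speech features is modeled log-linearly: a weighted sum of feature functions is exponentiated and normalized over the competing units, P(unit | features) ∝ exp(Σᵢ λᵢ fᵢ(unit, features)). The toy feature functions and hand-set weights in the sketch below are illustrative assumptions; in the described approach the parameters are estimated from data and the features may be asynchronous, overlapping, and statistically non-independent.

```python
# Minimal sketch (not the published system) of the log-linear posterior in
# publication 20050119885: P(unit | features) is proportional to
# exp(sum_i lambda_i * f_i(unit, features)), normalized over all units.
# Assumptions: toy binary feature functions and hand-set weights; the real
# model learns the weights, and its features may overlap and be dependent.
import numpy as np

UNITS = ["hello", "world", "<silence>"]

def feature_functions(unit, observed):
    """Each feature function pairs the hypothesized unit with an observed
    speech feature; not every training feature must appear at test time."""
    return np.array([
        1.0 if ("voiced" in observed and unit != "<silence>") else 0.0,
        1.0 if ("high_energy" in observed and unit == "hello") else 0.0,
        1.0 if ("low_energy" in observed and unit == "<silence>") else 0.0,
    ])

weights = np.array([1.5, 0.8, 2.0])            # the lambda parameters

def posterior(observed):
    """Log-linear posterior over the linguistic units given the features."""
    scores = np.array([weights @ feature_functions(u, observed) for u in UNITS])
    unnorm = np.exp(scores)
    return dict(zip(UNITS, unnorm / unnorm.sum()))

print(posterior({"voiced", "high_energy"}))
```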