Patents by Inventor Petr Fousek

Petr Fousek has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Diarization driven by the ASR based segmentation

Patent number: 11120802

Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.

Type: Grant

Filed: November 21, 2017

Date of Patent: September 14, 2021

Assignee: International Business Machines Corporation

Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
Prefix methods for diarization in streaming mode

Patent number: 10614797

Abstract: A diarization embodiment may include a system that clusters data up to a current point in time and consolidates it with the past decisions, and then returns the result that minimizes the difference with past decisions. The consolidation may be achieved by performing a permutation of the different possible labels and comparing the distance. For speaker diarization, a distance may be determined based on a minimum edit or hamming distance. The distance may alternatively be a measure other than the minimum edit or hamming distance. The clustering may have a finite time window over which the analysis is performed.

Type: Grant

Filed: November 30, 2017

Date of Patent: April 7, 2020

Assignee: International Business Machines Corporation

Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Jason W. Pelecanos, Weizhong Zhu
Diarization driven by meta-information identified in discussion content

Patent number: 10468031

Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.

Type: Grant

Filed: November 21, 2017

Date of Patent: November 5, 2019

Assignee: International Business Machines Corporation

Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
Diarization Driven by the ASR Based Segmentation

Publication number: 20190156832

Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.

Type: Application

Filed: November 21, 2017

Publication date: May 23, 2019

Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
Diarization Driven by Meta-Information Identified in Discussion Content

Publication number: 20190156835

Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.

Type: Application

Filed: November 21, 2017

Publication date: May 23, 2019

Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
PREFIX METHODS FOR DIARIZATION IN STREAMING MODE

Publication number: 20180158451

Abstract: A diarization embodiment may include a system that clusters data up to a current point in time and consolidates it with the past decisions, and then returns the result that minimizes the difference with past decisions. The consolidation may be achieved by performing a permutation of the different possible labels and comparing the distance. For speaker diarization, a distance may be determined based on a minimum edit or hamming distance. The distance may alternatively be a measure other than the minimum edit or hamming distance. The clustering may have a finite time window over which the analysis is performed.

Type: Application

Filed: November 30, 2017

Publication date: June 7, 2018

Inventors: Kenneth W. CHURCH, Dimitrios B. DIMITRIADIS, Petr FOUSEK, Jason W. PELECANOS, Weizhong ZHU
System and method for dynamic noise adaptation for robust automatic speech recognition

Patent number: 9741341

Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.

Type: Grant

Filed: January 20, 2015

Date of Patent: August 22, 2017

Assignee: Nuance Communications, Inc.

Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
Deep scattering spectrum in acoustic modeling for speech recognition

Patent number: 9640186

Abstract: Deep scattering spectral features are extracted from an acoustic input signal to generate a deep scattering spectral feature representation of the acoustic input signal. The deep scattering spectral feature representation is input to a speech recognition engine. The acoustic input signal is decoded based on at least a portion of the deep scattering spectral feature representation input to a speech recognition engine.

Type: Grant

Filed: May 2, 2014

Date of Patent: May 2, 2017

Assignee: International Business Machines Corporation

Inventors: Petr Fousek, Vaibhava Goel, Brian E. D. Kingsbury, Etienne Marcheret, Shay Maymon, David Nahamoo, Vijayaditya Peddinti, Bhuvana Ramabhadran, Tara N. Sainath
DEEP SCATTERING SPECTRUM IN ACOUSTIC MODELING FOR SPEECH RECOGNITION

Publication number: 20150317990

Abstract: Deep scattering spectral features are extracted from an acoustic input signal to generate a deep scattering spectral feature representation of the acoustic input signal. The deep scattering spectral feature representation is input to a speech recognition engine. The acoustic input signal is decoded based on at least a portion of the deep scattering spectral feature representation input to a speech recognition engine.

Type: Application

Filed: May 2, 2014

Publication date: November 5, 2015

Inventors: Petr Fousek, Vaibhava Goel, Brian E.D. Kingsbury, Etienne Marcheret, Shay Maymon, David Nahamoo, Tara N. Sainath, Bhuvana Ramabhadran
System and Method for Dynamic Noise Adaptation for Robust Automatic Speech Recognition

Publication number: 20150199964

Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.

Type: Application

Filed: January 20, 2015

Publication date: July 16, 2015

Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
System and method for dynamic noise adaptation for robust automatic speech recognition

Patent number: 8972256

Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.

Type: Grant

Filed: October 17, 2011

Date of Patent: March 3, 2015

Assignee: Nuance Communications, Inc.

Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
System and Method for Dynamic Noise Adaptation for Robust Automatic Speech Recognition

Publication number: 20130096915

Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.

Type: Application

Filed: October 17, 2011

Publication date: April 18, 2013

Applicant: NUANCE COMMUNICATIONS, INC.

Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
Compressing feature space transforms

Patent number: 8386249

Abstract: Methods for compressing a transform associated with a feature space are presented. For example, a method for compressing a transform associated with a feature space includes obtaining the transform including a plurality of transform parameters, assigning each of a plurality of quantization levels for the plurality of transform parameters to one of a plurality of quantization values, and assigning each of the plurality of transform parameters to one of the plurality of quantization values to which one of the plurality of quantization levels is assigned. One or more of obtaining the transform, assigning of each of the plurality of quantization levels, and assigning of each of the transform parameters are implemented as instruction code executed on a processor device. Further, a Viterbi algorithm may be employed for use in non-uniform level/value assignments.

Type: Grant

Filed: December 11, 2009

Date of Patent: February 26, 2013

Assignee: International Business Machines Corporation

Inventors: Petr Fousek, Vaibhava Goel, Etienne Marcheret, Peder Andreas Olsen
Compressing Feature Space Transforms

Publication number: 20110144991

Abstract: Methods for compressing a transform associated with a feature space are presented. For example, a method for compressing a transform associated with a feature space includes obtaining the transform including a plurality of transform parameters, assigning each of a plurality of quantization levels for the plurality of transform parameters to one of a plurality of quantization values, and assigning each of the plurality of transform parameters to one of the plurality of quantization values to which one of the plurality of quantization levels is assigned. One or more of obtaining the transform, assigning of each of the plurality of quantization levels, and assigning of each of the transform parameters are implemented as instruction code executed on a processor device. Further, a Viterbi algorithm may be employed for use in non-uniform level/value assignments.

Type: Application

Filed: December 11, 2009

Publication date: June 16, 2011

Applicant: International Business Machines Corporation

Inventors: Petr Fousek, Vaibhava Goel, Etienne Marcheret, Peder Andreas Olsen