Patents by Inventor JOSEF VOPICKA

Josef Vopicka has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10181325
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: January 15, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
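The pipeline this abstract describes (scattering operations over visual features, fed to a neural network, combined with audio features) can be illustrated with a minimal sketch. This is not the patented implementation: the Morlet-like wavelets, the two scales, the pooling width, and the synthetic `visual_trace` are all illustrative assumptions; only the overall shape of the pipeline (wavelet convolution, modulus, local averaging, then concatenation with audio features into a combined vector) follows the abstract.

```python
import numpy as np

def scattering_coefficients(signal, wavelets, pool=8):
    """First-order scattering sketch: wavelet convolution, modulus,
    then local averaging. `signal` is a 1-D visual feature trace
    (e.g., a mouth-opening measurement per video frame)."""
    coeffs = []
    for w in wavelets:
        # modulus of the wavelet response removes phase
        u = np.abs(np.convolve(signal, w, mode="same"))
        # local averaging (low-pass pooling) adds translation stability
        pooled = u[: len(u) - len(u) % pool].reshape(-1, pool).mean(axis=1)
        coeffs.append(pooled)
    return np.concatenate(coeffs)

def morlet(scale, n=16):
    """Hypothetical Morlet-like wavelet; the real filter bank is unspecified."""
    t = np.linspace(-2, 2, n)
    return np.exp(-t**2 / (2 * scale**2)) * np.cos(5 * t / scale)

rng = np.random.default_rng(0)
visual_trace = rng.standard_normal(64)     # stand-in for extracted visual features
feats = scattering_coefficients(visual_trace, [morlet(0.5), morlet(1.0)])

# combined feature vector: visual scattering coefficients concatenated
# with audio features, as the abstract's last sentence describes
audio_feats = rng.standard_normal(16)
combined = np.concatenate([feats, audio_feats])
```

In a full system, `combined` would be the input to the neural network that predicts the subject's speech status; the network itself is omitted here.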
  • Patent number: 10147438
Abstract: Embodiments of the invention include methods, systems, and computer program products for role modeling. Aspects of the invention include receiving, by a processor, audio data, wherein the audio data includes a plurality of audio conversations for one or more speakers. Each of the plurality of audio conversations is partitioned into one or more segments. A speaker is associated with each of the one or more segments. The one or more segments of each audio conversation are labeled with roles utilizing a speaker recognition engine. Speakers are clustered based at least in part on the number of times the speakers are present in an audio conversation.
    Type: Grant
    Filed: March 2, 2017
    Date of Patent: December 4, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth W. Church, Jason W. Pelecanos, Josef Vopicka, Weizhong Zhu
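The clustering step in this abstract, grouping speakers by how often they appear across conversations, can be sketched in a few lines. This is a simplified stand-in, not the patented method: the two-role scheme (`agent` vs. `customer`), the majority threshold, and the toy data are assumptions; segmentation and speaker recognition are taken as already done upstream.

```python
from collections import Counter

def assign_roles(conversations):
    """conversations: list of conversations, each a list of
    (segment_text, speaker_id) pairs. Speakers present in more than
    half of the conversations are labeled 'agent'; the rest 'customer'
    (a hypothetical two-role scheme)."""
    appearances = Counter()
    for conv in conversations:
        for spk in {spk for _, spk in conv}:   # count once per conversation
            appearances[spk] += 1
    n = len(conversations)
    return {spk: ("agent" if cnt > n / 2 else "customer")
            for spk, cnt in appearances.items()}

calls = [
    [("hello, support desk", "s1"), ("my card is blocked", "s2")],
    [("hello, support desk", "s1"), ("I need a refund", "s3")],
    [("hello, support desk", "s1"), ("wrong charge", "s4")],
]
roles = assign_roles(calls)
```

Here `s1` recurs in every call and is clustered into the recurring role, while the one-off speakers fall into the other cluster.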
  • Patent number: 10109277
Abstract: Methods and apparatus for using visual information to facilitate a speech recognition process. The method comprises dividing received audio information into a plurality of audio frames; determining, for each of the plurality of audio frames, whether the audio information in the audio frame comprises speech from a foreground speaker, wherein the determining is based, at least in part, on received visual information; and transmitting the audio frame to an automatic speech recognition (ASR) engine for speech recognition when it is determined that the audio frame comprises speech from the foreground speaker.
    Type: Grant
    Filed: April 27, 2015
    Date of Patent: October 23, 2018
    Assignee: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
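The per-frame gating logic in this abstract, forwarding an audio frame to the ASR engine only when visual evidence says the foreground speaker is talking, reduces to a simple filter. A minimal sketch follows; the per-frame `visual_speech_prob` scores and the 0.5 threshold are assumptions standing in for whatever visual classifier the method actually uses.

```python
def frames_for_asr(audio_frames, visual_speech_prob, threshold=0.5):
    """Gate audio frames on visual evidence: keep a frame for the ASR
    engine only when the visual speech score meets the threshold.
    `visual_speech_prob[i]` is a hypothetical per-frame score from an
    upstream visual classifier."""
    return [frame for frame, p in zip(audio_frames, visual_speech_prob)
            if p >= threshold]

audio = ["f0", "f1", "f2", "f3"]
probs = [0.9, 0.2, 0.7, 0.1]
selected = frames_for_asr(audio, probs)   # only frames f0 and f2 pass the gate
```

Frames that fail the gate are simply never transmitted, so the ASR engine spends no work on background speech or silence.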
  • Publication number: 20180254051
Abstract: Embodiments of the invention include methods, systems, and computer program products for role modeling. Aspects of the invention include receiving, by a processor, audio data, wherein the audio data includes a plurality of audio conversations for one or more speakers. Each of the plurality of audio conversations is partitioned into one or more segments. A speaker is associated with each of the one or more segments. The one or more segments of each audio conversation are labeled with roles utilizing a speaker recognition engine. Speakers are clustered based at least in part on the number of times the speakers are present in an audio conversation.
    Type: Application
    Filed: March 2, 2017
    Publication date: September 6, 2018
    Inventors: Kenneth W. Church, Jason W. Pelecanos, Josef Vopicka, Weizhong Zhu
  • Publication number: 20180025729
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Application
    Filed: June 30, 2017
    Publication date: January 25, 2018
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Patent number: 9697833
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Grant
    Filed: August 25, 2015
    Date of Patent: July 4, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Publication number: 20170061966
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Application
    Filed: August 25, 2015
    Publication date: March 2, 2017
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Publication number: 20160314789
Abstract: Methods and apparatus for using visual information to facilitate a speech recognition process. The method comprises dividing received audio information into a plurality of audio frames; determining, for each of the plurality of audio frames, whether the audio information in the audio frame comprises speech from a foreground speaker, wherein the determining is based, at least in part, on received visual information; and transmitting the audio frame to an automatic speech recognition (ASR) engine for speech recognition when it is determined that the audio frame comprises speech from the foreground speaker.
    Type: Application
    Filed: April 27, 2015
    Publication date: October 27, 2016
    Applicant: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Publication number: 20090198490
Abstract: The present invention discloses a solution for a speech processing system to determine end-of-utterance (EOU) events. The solution is a modified dual-factor technique, where one factor is based upon the number of silence frames received and a second factor is based upon an end-of-path occurrence. The solution permits a set of configurable timeout delay values to be established, which can be configured on an application-specific basis by application developers. The solution can speed up EOU determinations made through a dual-factor technique by situationally making the finalization determination before the silence frame window is full.
    Type: Application
    Filed: February 6, 2008
    Publication date: August 6, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: John W. Eckhart, Jonathan Palgon, Josef Vopicka
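The dual-factor EOU logic this last abstract describes can be sketched as a small state loop: normally the system waits for a full window of silence frames, but when the recognizer reports an end-of-path (no further continuation in the grammar), it finalizes after a shorter configurable timeout. This is an illustrative reconstruction, not the patented implementation: the frame representation, the timeout values, and the `(is_silence, end_of_path)` pairs are all assumptions.

```python
def end_of_utterance(frames, silence_timeout=5, end_of_path_timeout=2):
    """Dual-factor EOU sketch. `frames` is a sequence of
    (is_silence, end_of_path) pairs, a simplified stand-in for decoder
    state. Both timeouts are configurable, matching the abstract's
    application-specific timeout delay values. Returns the index of the
    frame where EOU is declared, or None if the utterance never ends."""
    silence_run = 0
    for i, (is_silence, end_of_path) in enumerate(frames):
        silence_run = silence_run + 1 if is_silence else 0
        # end-of-path shortens the required silence window,
        # finalizing before the full silence window is consumed
        limit = end_of_path_timeout if end_of_path else silence_timeout
        if silence_run >= limit:
            return i
    return None

# speech, then silence with an end-of-path flag: EOU fires early
frames = [(False, False), (True, False), (True, True), (True, True)]
early = end_of_utterance(frames)          # declared at the shorter timeout

# pure silence with no end-of-path: the full silence window is required
late = end_of_utterance([(True, False)] * 5)
```

The contrast between the two calls is the point of the technique: the end-of-path factor lets the system commit in 2 silence frames instead of 5.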