Patents by Inventor Olivier Siohan

Olivier Siohan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12334054
    Abstract: A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: June 17, 2025
    Assignee: Google LLC
    Inventors: Olivier Siohan, Takaki Makino, Richard Rose, Otavio Braga, Hank Liao, Basilio Garcia Castillo
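The selection step this abstract describes — scoring each candidate transcription by how well its synthesized speech agrees with the observed lip motion, then emitting the best-scoring candidate — can be sketched roughly as follows. This is a minimal illustration, not the patented implementation; the `Candidate` type, the field names, and the example scores are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    transcription: str
    agreement_score: float  # likelihood the synthesized speech matches the lip motion

def select_transcription(candidates: list[Candidate]) -> str:
    """Return the candidate transcription with the highest agreement score."""
    if not candidates:
        raise ValueError("no candidate transcriptions")
    best = max(candidates, key=lambda c: c.agreement_score)
    return best.transcription

# Hypothetical candidates for one utterance, with made-up agreement scores.
candidates = [
    Candidate("recognize speech", 0.82),
    Candidate("wreck a nice beach", 0.35),
    Candidate("recognized speech", 0.61),
]
print(select_transcription(candidates))  # prints "recognize speech"
```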
  • Publication number: 20250173461
    Abstract: A method for a privacy-aware transcription includes receiving audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.
    Type: Application
    Filed: January 30, 2025
    Publication date: May 29, 2025
    Applicant: Google LLC
    Inventors: Olivier Siohan, Takaki Makino, Richard Rose, Otavio Braga, Hank Liao, Basilio Garcia Castillo
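The per-segment logic in this abstract — identify the speaker of each segment, and apply the privacy condition to segments spoken by the requesting participant — might look like the sketch below. The `Segment` type, the redaction behavior, and the speaker identifiers are assumptions for illustration; the patent leaves the concrete privacy condition open.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker_id: str  # identity inferred for this segment (e.g., from image data)
    text: str

def apply_privacy(segments, protected_speaker, redaction="[redacted]"):
    """Apply the privacy condition (here: redaction) to segments spoken
    by the protected participant; leave other segments untouched."""
    return [
        Segment(seg.speaker_id, redaction) if seg.speaker_id == protected_speaker else seg
        for seg in segments
    ]

def transcript(segments):
    """Join the processed segments into a transcript for the audio data."""
    return " ".join(seg.text for seg in segments)
```

Usage: `transcript(apply_privacy(segments, "participant_2"))` would yield a transcript in which only the protected participant's speech is redacted.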
  • Publication number: 20250150295
    Abstract: Implementations relate to an application that can bias automatic speech recognition for meetings using data that may be associated with the meeting and/or meeting participants. A transcription of inputs provided during a meeting can additionally and/or alternatively be processed to determine whether the inputs should be incorporated into a meeting document, which can provide a summary for the meeting. In some instances, entries into a meeting document can be designated as action items, and those action items can optionally have conditions for reminding meeting participants about the action items and/or for determining whether an action item has been fulfilled. In this way, various tasks that may typically be manually performed by meeting participants, such as creating a meeting summary, can be automated in a more accurate manner. This can preserve resources that may otherwise be wasted during video conferences, in-person meetings, and/or other gatherings.
    Type: Application
    Filed: January 13, 2025
    Publication date: May 8, 2025
    Inventors: Olivier Siohan, Takaki Makino, Joshua Maynez, Ryan Mcdonald, Benyah Shaparenko, Joseph Nelson, Kishan Sachdeva, Basilio Garcia
  • Patent number: 12199783
    Abstract: Implementations relate to an application that can bias automatic speech recognition for meetings using data that may be associated with the meeting and/or meeting participants. A transcription of inputs provided during a meeting can additionally and/or alternatively be processed to determine whether the inputs should be incorporated into a meeting document, which can provide a summary for the meeting. In some instances, entries into a meeting document can be designated as action items, and those action items can optionally have conditions for reminding meeting participants about the action items and/or for determining whether an action item has been fulfilled. In this way, various tasks that may typically be manually performed by meeting participants, such as creating a meeting summary, can be automated in a more accurate manner. This can preserve resources that may otherwise be wasted during video conferences, in-person meetings, and/or other gatherings.
    Type: Grant
    Filed: February 23, 2022
    Date of Patent: January 14, 2025
    Assignee: Google LLC
    Inventors: Olivier Siohan, Takaki Makino, Joshua Maynez, Ryan Mcdonald, Benyah Shaparenko, Joseph Nelson, Kishan Sachdeva, Basilio Garcia
  • Publication number: 20230267922
    Abstract: Implementations relate to an application that can bias automatic speech recognition for meetings using data that may be associated with the meeting and/or meeting participants. A transcription of inputs provided during a meeting can additionally and/or alternatively be processed to determine whether the inputs should be incorporated into a meeting document, which can provide a summary for the meeting. In some instances, entries into a meeting document can be designated as action items, and those action items can optionally have conditions for reminding meeting participants about the action items and/or for determining whether an action item has been fulfilled. In this way, various tasks that may typically be manually performed by meeting participants, such as creating a meeting summary, can be automated in a more accurate manner. This can preserve resources that may otherwise be wasted during video conferences, in-person meetings, and/or other gatherings.
    Type: Application
    Filed: February 23, 2022
    Publication date: August 24, 2023
    Inventors: Olivier Siohan, Takaki Makino, Joshua Maynez, Ryan Mcdonald, Benyah Shaparenko, Joseph Nelson, Kishan Sachdeva, Basilio Garcia
  • Patent number: 11527248
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
    Type: Grant
    Filed: May 27, 2020
    Date of Patent: December 13, 2022
    Assignee: Google LLC
    Inventors: Brian Strope, Francoise Beaufays, Olivier Siohan
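The control flow this abstract describes — run several recognizers on the same audio, stop as soon as one result's confidence meets a threshold, and abort the recognizers that have not finished — can be sketched with standard-library concurrency. This is an assumed shape, not the patented system: each recognizer is modeled as a callable returning a `(text, confidence)` pair, and `Future.cancel` only aborts tasks that have not yet started.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def recognize(audio, recognizers, confidence_threshold):
    """Run recognizers in parallel; return the best (text, confidence) seen,
    stopping early once a result meets the confidence threshold."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(r, audio) for r in recognizers]
        best = None
        for fut in as_completed(futures):
            text, confidence = fut.result()
            if best is None or confidence > best[1]:
                best = (text, confidence)
            if confidence >= confidence_threshold:
                # Abort the remaining portion of the recognition tasks.
                for other in futures:
                    other.cancel()
                break
        return best
```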
  • Publication number: 20220392439
    Abstract: A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.
    Type: Application
    Filed: November 18, 2019
    Publication date: December 8, 2022
    Applicant: Google LLC
    Inventors: Olivier Siohan, Takaki Makino, Richard Rose, Otavio Braga, Hank Liao, Basilio Garcia Castillo
  • Publication number: 20200357413
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
    Type: Application
    Filed: May 27, 2020
    Publication date: November 12, 2020
    Applicant: Google LLC
    Inventors: Brian Strope, Francoise Beaufays, Olivier Siohan
  • Patent number: 10699714
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
    Type: Grant
    Filed: July 20, 2018
    Date of Patent: June 30, 2020
    Assignee: Google LLC
    Inventors: Brian Strope, Francoise Beaufays, Olivier Siohan
  • Patent number: 10204619
    Abstract: Methods, systems, and apparatus are described that receive audio data for an utterance. Association data is accessed that indicates associations between data corresponding to uncorrupted audio segments, and data corresponding to corrupted versions of the uncorrupted audio segments, where the associations are determined before receiving the audio data for the utterance. Using the association data and the received audio data for the utterance, data corresponding to at least one uncorrupted audio segment is selected. A transcription of the utterance is determined based on the selected data corresponding to the at least one uncorrupted audio segment.
    Type: Grant
    Filed: February 22, 2016
    Date of Patent: February 12, 2019
    Assignee: Google LLC
    Inventors: Olivier Siohan, Pedro J. Moreno Mengibar
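The lookup this abstract describes — precomputed associations from corrupted audio segments back to their uncorrupted originals, consulted at recognition time to select a clean segment for the incoming audio — might be sketched as a nearest-neighbor match over features. The feature vectors, distance measure, and segment identifiers below are all illustrative assumptions.

```python
def build_associations(pairs):
    """pairs: iterable of (corrupted_features, clean_segment_id) tuples,
    computed before any utterance audio is received."""
    return list(pairs)

def select_clean_segment(associations, observed_features):
    """Match the observed (possibly corrupted) audio features against the
    corrupted side of each association; return the linked clean segment."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, clean_id = min(associations, key=lambda p: distance(p[0], observed_features))
    return clean_id
```

A transcription would then be determined from the selected uncorrupted segment(s) rather than from the corrupted input directly.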
  • Publication number: 20180330735
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
    Type: Application
    Filed: July 20, 2018
    Publication date: November 15, 2018
    Applicant: Google LLC
    Inventors: Brian Strope, Francoise Beaufays, Olivier Siohan
  • Patent number: 10049672
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
    Type: Grant
    Filed: June 2, 2016
    Date of Patent: August 14, 2018
    Assignee: Google LLC
    Inventors: Brian Patrick Strope, Francoise Beaufays, Olivier Siohan
  • Patent number: 9472187
    Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer.
    Type: Grant
    Filed: May 25, 2016
    Date of Patent: October 18, 2016
    Assignee: Google Inc.
    Inventors: Olga Kapralova, John Paul Alex, Eugene Weinstein, Pedro J. Moreno Mengibar, Olivier Siohan, Ignacio Lopez Moreno
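The final step in this abstract — training the production recognizer on a selected subset of the offline recognizer's transcriptions — implies some selection criterion. One plausible (and here purely assumed) criterion is a confidence cutoff on the offline transcriptions:

```python
def select_training_subset(items, min_confidence=0.9):
    """Keep only offline transcriptions confident enough to serve as
    training targets for the production recognizer.

    items: iterable of (audio_id, transcription, confidence) tuples.
    """
    return [(audio, text) for audio, text, conf in items if conf >= min_confidence]
```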
  • Publication number: 20160275951
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
    Type: Application
    Filed: June 2, 2016
    Publication date: September 22, 2016
    Inventors: Brian Patrick Strope, Francoise Beaufays, Olivier Siohan
  • Publication number: 20160267903
    Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer.
    Type: Application
    Filed: May 25, 2016
    Publication date: September 15, 2016
    Inventors: Olga Kapralova, John Paul Alex, Eugene Weinstein, Pedro J. Moreno Mengibar, Olivier Siohan, Ignacio Lopez Moreno
  • Patent number: 9378731
    Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer.
    Type: Grant
    Filed: April 22, 2015
    Date of Patent: June 28, 2016
    Assignee: Google Inc.
    Inventors: Olga Kapralova, John Paul Alex, Eugene Weinstein, Pedro J. Moreno Mengibar, Olivier Siohan, Ignacio Lopez Moreno
  • Patent number: 9373329
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
    Type: Grant
    Filed: October 28, 2013
    Date of Patent: June 21, 2016
    Assignee: Google Inc.
    Inventors: Brian Strope, Francoise Beaufays, Olivier Siohan
  • Publication number: 20160171977
    Abstract: Methods, systems, and apparatus are described that receive audio data for an utterance. Association data is accessed that indicates associations between data corresponding to uncorrupted audio segments, and data corresponding to corrupted versions of the uncorrupted audio segments, where the associations are determined before receiving the audio data for the utterance. Using the association data and the received audio data for the utterance, data corresponding to at least one uncorrupted audio segment is selected. A transcription of the utterance is determined based on the selected data corresponding to the at least one uncorrupted audio segment.
    Type: Application
    Filed: February 22, 2016
    Publication date: June 16, 2016
    Inventors: Olivier Siohan, Pedro J. Moreno Mengibar
  • Publication number: 20160093294
    Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer.
    Type: Application
    Filed: April 22, 2015
    Publication date: March 31, 2016
    Inventors: Olga Kapralova, John Paul Alex, Eugene Weinstein, Pedro J. Moreno Mengibar, Olivier Siohan, Ignacio Lopez Moreno
  • Patent number: 9299347
    Abstract: Methods, systems, and apparatus are described that receive audio data for an utterance. Association data is accessed that indicates associations between data corresponding to uncorrupted audio segments, and data corresponding to corrupted versions of the uncorrupted audio segments, where the associations are determined before receiving the audio data for the utterance. Using the association data and the received audio data for the utterance, data corresponding to at least one uncorrupted audio segment is selected. A transcription of the utterance is determined based on the selected data corresponding to the at least one uncorrupted audio segment.
    Type: Grant
    Filed: April 14, 2015
    Date of Patent: March 29, 2016
    Assignee: Google Inc.
    Inventors: Olivier Siohan, Pedro J. Moreno Mengibar