Patents by Inventor OTAVIO BRAGA
OTAVIO BRAGA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240104247
Abstract: A method for a privacy-aware transcription includes receiving an audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment, where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.
Type: Application
Filed: December 11, 2023
Publication date: March 28, 2024
Applicant: Google LLC
Inventors: Olivier Siohan, Takaki Makino, Richard Rose, Otavio Braga, Hank Liao, Basilio Garcia Castillo
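The privacy-aware transcription method above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: `identify_speaker` and `transcribe` stand in for the face-based speaker-identification and ASR components the abstract assumes, and redaction is just one possible privacy condition.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    audio: bytes  # audio samples for this segment
    image: bytes  # image data aligned with the segment


def privacy_aware_transcript(
    segments: List[Segment],
    identify_speaker: Callable[[Segment], str],   # hypothetical image-based speaker ID
    transcribe: Callable[[bytes], str],           # hypothetical ASR backend
    private_participant: str,
) -> List[str]:
    """Transcribe each segment, applying the privacy condition (here,
    redaction) to segments spoken by the participant who requested it."""
    transcript = []
    for seg in segments:
        speaker = identify_speaker(seg)           # identity determined from image data
        if speaker == private_participant:
            transcript.append("[REDACTED]")       # apply the privacy condition
        else:
            transcript.append(transcribe(seg.audio))
    return transcript
```

Any per-segment speaker identifier and transcriber can be plugged in; the control flow mirrors the claim's segment-by-segment structure.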
-
Patent number: 11900919
Abstract: An audio-visual automated speech recognition model for transcribing speech from audio-visual data includes an encoder frontend and a decoder. The encoder includes an attention mechanism configured to receive an audio track of the audio-visual data and a video portion of the audio-visual data. The video portion of the audio-visual data includes a plurality of video face tracks each associated with a face of a respective person. For each video face track of the plurality of video face tracks, the attention mechanism is configured to determine a confidence score indicating a likelihood that the face of the respective person associated with the video face track includes a speaking face of the audio track. The decoder is configured to process the audio track and the video face track of the plurality of video face tracks associated with the highest confidence score to determine a speech recognition result of the audio track.
Type: Grant
Filed: March 21, 2023
Date of Patent: February 13, 2024
Assignee: Google LLC
Inventor: Otavio Braga
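The face-track selection step this family of patents describes can be illustrated as follows. The sketch assumes dot-product attention over fixed-size embeddings purely for concreteness; the abstract specifies only "an attention mechanism" that produces a confidence score per face track.

```python
import numpy as np


def select_speaking_face(audio_emb: np.ndarray,
                         face_track_embs: list) -> tuple:
    """Score each face track against the audio embedding and return the
    index of the most likely speaking face plus the softmax confidences.

    Dot-product scoring is an assumption for illustration only."""
    logits = np.array([float(audio_emb @ f) for f in face_track_embs])
    confidences = np.exp(logits - logits.max())   # numerically stable softmax
    confidences /= confidences.sum()
    best = int(np.argmax(confidences))            # highest-confidence face track
    return best, confidences
```

The decoder would then consume the audio track together with the face track at the returned index, as the abstract describes.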
-
Publication number: 20230223012
Abstract: An audio-visual automated speech recognition model for transcribing speech from audio-visual data includes an encoder frontend and a decoder. The encoder includes an attention mechanism configured to receive an audio track of the audio-visual data and a video portion of the audio-visual data. The video portion of the audio-visual data includes a plurality of video face tracks each associated with a face of a respective person. For each video face track of the plurality of video face tracks, the attention mechanism is configured to determine a confidence score indicating a likelihood that the face of the respective person associated with the video face track includes a speaking face of the audio track. The decoder is configured to process the audio track and the video face track of the plurality of video face tracks associated with the highest confidence score to determine a speech recognition result of the audio track.
Type: Application
Filed: March 21, 2023
Publication date: July 13, 2023
Applicant: Google LLC
Inventor: Otavio Braga
-
Patent number: 11615781
Abstract: A single audio-visual automated speech recognition model for transcribing speech from audio-visual data includes an encoder frontend and a decoder. The encoder includes an attention mechanism configured to receive an audio track of the audio-visual data and a video portion of the audio-visual data. The video portion of the audio-visual data includes a plurality of video face tracks each associated with a face of a respective person. For each video face track of the plurality of video face tracks, the attention mechanism is configured to determine a confidence score indicating a likelihood that the face of the respective person associated with the video face track includes a speaking face of the audio track. The decoder is configured to process the audio track and the video face track of the plurality of video face tracks associated with the highest confidence score to determine a speech recognition result of the audio track.
Type: Grant
Filed: October 2, 2020
Date of Patent: March 28, 2023
Assignee: Google LLC
Inventor: Otavio Braga
-
Publication number: 20220392439
Abstract: A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.
Type: Application
Filed: November 18, 2019
Publication date: December 8, 2022
Applicant: Google LLC
Inventors: Olivier Siohan, Takaki Makino, Richard Rose, Otavio Braga, Hank Liao, Basilio Garcia Castillo
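The rescoring loop in this abstract (synthesize each candidate, score its agreement with the lip motion, keep the best) is straightforward to sketch. The `synthesize` and `agreement` callables below are placeholders for the TTS and lip-sync scoring components the method assumes; they are not part of the listing.

```python
from typing import Callable, List


def rescore_with_lips(
    candidates: List[str],
    lip_video,                                     # lip-motion video of the user
    synthesize: Callable[[str], object],           # hypothetical TTS frontend
    agreement: Callable[[object, object], float],  # hypothetical lip-sync scorer
) -> str:
    """Return the candidate transcription whose synthesized speech best
    agrees with the observed lip motion."""
    scores = [agreement(synthesize(c), lip_video) for c in candidates]
    return candidates[scores.index(max(scores))]   # argmax over agreement scores
```

This mirrors the claim structure: one agreement score per candidate, then a selection based on those scores.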
-
Publication number: 20220382907
Abstract: A method for a privacy-aware transcription includes receiving an audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment, where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.
Type: Application
Filed: November 18, 2019
Publication date: December 1, 2022
Applicant: Google LLC
Inventors: Olivier Siohan, Takaki Makino, Richard Rose, Otavio Braga, Hank Liao, Basilio Garcia Castillo
-
Publication number: 20210118427
Abstract: A single audio-visual automated speech recognition model for transcribing speech from audio-visual data includes an encoder frontend and a decoder. The encoder includes an attention mechanism configured to receive an audio track of the audio-visual data and a video portion of the audio-visual data. The video portion of the audio-visual data includes a plurality of video face tracks each associated with a face of a respective person. For each video face track of the plurality of video face tracks, the attention mechanism is configured to determine a confidence score indicating a likelihood that the face of the respective person associated with the video face track includes a speaking face of the audio track. The decoder is configured to process the audio track and the video face track of the plurality of video face tracks associated with the highest confidence score to determine a speech recognition result of the audio track.
Type: Application
Filed: October 2, 2020
Publication date: April 22, 2021
Applicant: Google LLC
Inventor: Otavio Braga
-
Publication number: 20130097194
Abstract: A method for displaying visual information corresponding to at least one user can include receiving a selection of at least one attribute to be viewed, with a computer arrangement, tracking at least one user pose of the at least one user in real-time using a marker-less capture procedure to generate tracking information, matching the at least one user pose with at least one database pose provided in a database based on the tracking information, and displaying the at least one database pose in combination with the at least one attribute.
Type: Application
Filed: August 6, 2012
Publication date: April 18, 2013
Applicant: New York University
Inventors: Otavio Braga, Davi Geiger
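The pose-matching step of this method can be illustrated with a nearest-neighbor lookup. This is only a sketch: the abstract does not specify the pose representation or distance metric, so the flattened-keypoint vectors and Euclidean distance used here are assumptions.

```python
import numpy as np


def match_database_pose(tracked_pose: np.ndarray,
                        database_poses: list) -> int:
    """Return the index of the database pose closest to the tracked pose.

    Poses are assumed to be flattened keypoint vectors; Euclidean
    distance is an illustrative choice of matching criterion."""
    dists = [float(np.linalg.norm(tracked_pose - p)) for p in database_poses]
    return dists.index(min(dists))  # best-matching database pose
```

The matched database pose would then be displayed together with the user-selected attribute, as the abstract describes.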