Patents by Inventor Sourish Chaudhuri

Sourish Chaudhuri has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230253009
    Abstract: Hot-word free adaptation of one or more function(s) of an automated assistant. Sensor data, from one or more sensor components of an assistant device that provides an automated assistant interface (graphical and/or audible), is processed to determine occurrence and/or confidence metric(s) of various attributes of a user that is proximal to the assistant device. Whether to adapt each of one or more of the function(s) of the automated assistant is based on the occurrence and/or the confidence of one or more of the various attributes. For example, certain processing of at least some of the sensor data can be initiated, such as initiating previously dormant local processing of at least some of the sensor data and/or initiating transmission of at least some of the audio data to remote automated assistant component(s).
    Type: Application
    Filed: April 17, 2023
    Publication date: August 10, 2023
    Inventors: Jaclyn Konzelmann, Kenneth Mixter, Sourish Chaudhuri, Tuan Nguyen, Hideaki Matsui, Caroline Pantofaru, Vinay Bettadapura
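
The abstract above describes gating assistant functions on attribute occurrence/confidence metrics rather than a hot-word. A minimal sketch of that decision logic is below; the attribute names, function names, and thresholds are all illustrative assumptions, not details from the patent:

```python
# Hypothetical sketch of hot-word-free adaptation: per-attribute confidence
# metrics (e.g., proximity, gaze, mouth movement) gate which assistant
# functions are activated. Thresholds and names are illustrative only.

def adapt_functions(attributes, thresholds):
    """Return the set of assistant functions to activate, given
    per-attribute confidence metrics in [0, 1]."""
    activated = set()
    # Initiate previously dormant local processing on weaker evidence.
    if attributes.get("proximity", 0.0) >= thresholds["proximity"]:
        activated.add("local_audio_processing")
    # Only transmit audio to remote components on stronger, combined evidence.
    if (attributes.get("gaze", 0.0) >= thresholds["gaze"]
            and attributes.get("mouth_movement", 0.0) >= thresholds["mouth"]):
        activated.add("remote_transmission")
    return activated
```

The two-tier structure mirrors the abstract's example: cheap local processing can start on partial evidence, while transmission to remote components requires more confident attribute detections.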
  • Patent number: 11688417
    Abstract: Hot-word free adaptation of one or more function(s) of an automated assistant. Sensor data, from one or more sensor components of an assistant device that provides an automated assistant interface (graphical and/or audible), is processed to determine occurrence and/or confidence metric(s) of various attributes of a user that is proximal to the assistant device. Whether to adapt each of one or more of the function(s) of the automated assistant is based on the occurrence and/or the confidence of one or more of the various attributes. For example, certain processing of at least some of the sensor data can be initiated, such as initiating previously dormant local processing of at least some of the sensor data and/or initiating transmission of at least some of the audio data to remote automated assistant component(s).
    Type: Grant
    Filed: May 2, 2019
    Date of Patent: June 27, 2023
    Assignee: GOOGLE LLC
    Inventors: Jaclyn Konzelmann, Kenneth Mixter, Sourish Chaudhuri, Tuan Nguyen, Hideaki Matsui, Caroline Pantofaru, Vinay Bettadapura
  • Publication number: 20230103060
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining the number of speakers in a video and a corresponding audio using visual context. In one aspect, a method includes detecting within the video multiple speakers, determining a bounding box for each detected speaker that includes the detected person and objects within a threshold distance of the detected person in an image frame, determining a unique descriptor for that person based in part on image information depicting the objects within the bounding box, determining a cardinality of unique speakers in the video, and providing to the speaker diarization system the cardinality of unique speakers.
    Type: Application
    Filed: March 13, 2020
    Publication date: March 30, 2023
    Inventors: Sourish Chaudhuri, Lev Finkelstein
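
The abstract above counts unique speakers by building a descriptor from objects near each detected person. The sketch below illustrates that flow with a deliberately crude descriptor (the set of object labels inside the expanded box); the real descriptor uses image information, so everything here beyond the box-expansion-and-count structure is an assumption:

```python
# Illustrative sketch: expand each person detection by a threshold margin,
# describe the person by the object labels falling inside the expanded box,
# and count distinct descriptors as the cardinality of unique speakers.

def expand_box(box, margin):
    """Grow an (x1, y1, x2, y2) box by `margin` on every side."""
    x1, y1, x2, y2 = box
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)

def contains(box, point):
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2

def count_unique_speakers(detections, objects, margin=10):
    """detections: list of person boxes; objects: list of (label, center)."""
    descriptors = set()
    for box in detections:
        expanded = expand_box(box, margin)
        nearby = frozenset(label for label, center in objects
                           if contains(expanded, center))
        descriptors.add(nearby)
    return len(descriptors)
```

Note the simplification: two distinct people surrounded by identical objects would collapse into one descriptor here, which is exactly why the patented descriptor also draws on image appearance.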
  • Patent number: 11587319
    Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: February 21, 2023
    Assignee: Google LLC
    Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
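
The gating method above has three cheap preprocessing steps (frame subsampling, spectrogram extraction, resolution reduction) followed by a gate that decides whether full annotation analysis runs at all. A rough sketch of that pipeline shape, with a simple energy threshold standing in for the machine-learning gating model:

```python
# Pipeline sketch of the gating approach. The gate below is a stand-in
# threshold on audio energy, NOT the patented ML model; frame rates and
# the downscaling scheme are illustrative.

def sample_frames(frames, source_fps, target_fps):
    """Keep every k-th frame so the result approximates target_fps."""
    step = max(1, round(source_fps / target_fps))
    return frames[::step]

def downscale(frame, factor):
    """Reduce a 2D frame (list of pixel rows) by striding in both axes."""
    return [row[::factor] for row in frame[::factor]]

def should_analyze(frames, spectrogram_energy, threshold=0.5):
    """Stand-in gate: analyze only if mean audio energy clears a threshold."""
    if not frames or not spectrogram_energy:
        return False
    return sum(spectrogram_energy) / len(spectrogram_energy) >= threshold
```

The point of the design is cost: the expensive annotation model only runs on videos the cheap gate admits, and the gate itself only sees subsampled, low-resolution input.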
  • Publication number: 20210216778
    Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
    Type: Application
    Filed: March 30, 2021
    Publication date: July 15, 2021
    Applicant: Google LLC
    Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
  • Patent number: 11011184
    Abstract: The technology disclosed herein may determine timing windows for speech captions of an audio stream. In one example, the technology may involve accessing audio data comprising a plurality of segments; determining, by a processing device, that one or more of the plurality of segments comprise speech sounds; identifying a time duration for the speech sounds; and providing a user interface element corresponding to the time duration for the speech sounds, wherein the user interface element indicates an estimate of a beginning and ending of the speech sounds and is configured to receive caption text associated with the speech sounds of the audio data.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: May 18, 2021
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Nebojsa Ciric, Khiem Pham
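
The method above turns per-segment speech detections into estimated begin/end windows that a caption-entry UI element can represent. One way to sketch the windowing step, assuming fixed-length segments and an already-computed speech mask:

```python
# Sketch of deriving caption timing windows: merge runs of consecutive
# speech-classified segments into (start, end) windows. Segment length and
# the upstream classifier are assumptions, not patent specifics.

def speech_windows(is_speech, segment_seconds=1.0):
    """Turn a per-segment speech mask into merged (start, end) windows."""
    windows = []
    start = None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i * segment_seconds          # speech run begins
        elif not flag and start is not None:
            windows.append((start, i * segment_seconds))  # run ends
            start = None
    if start is not None:                        # close a run at end of audio
        windows.append((start, len(is_speech) * segment_seconds))
    return windows
```

Each returned tuple corresponds to one UI element: an estimated beginning and ending of a speech sound, into which the user types the caption text.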
  • Patent number: 10984246
    Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
    Type: Grant
    Filed: March 13, 2019
    Date of Patent: April 20, 2021
    Assignee: Google LLC
    Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
  • Patent number: 10846522
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating predictions for whether a target person is speaking during a portion of a video. In one aspect, a method includes obtaining one or more images which each depict a mouth of a given person at a respective time point. The images are processed using an image embedding neural network to generate a latent representation of the images. Audio data corresponding to the images is processed using an audio embedding neural network to generate a latent representation of the audio data. The latent representation of the images and the latent representation of the audio data is processed using a recurrent neural network to generate a prediction for whether the given person is speaking.
    Type: Grant
    Filed: October 16, 2018
    Date of Patent: November 24, 2020
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Ondrej Klejch, Joseph Edward Roth
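
The abstract above fuses an image embedding (mouth crops), an audio embedding, and a recurrent network into a speaking/not-speaking prediction. A toy numeric sketch of that architecture's shape, where weighted sums stand in for the learned embedding networks and a decayed running state stands in for the recurrent network:

```python
# Toy sketch of audio-visual active-speaker prediction. The linear "embeddings",
# the decay-based recurrent update, and all weights are stand-ins for the
# learned models described in the abstract.

import math

def embed(vector, weights):
    """Stand-in embedding: weighted sum of the input features."""
    return sum(v * w for v, w in zip(vector, weights))

def speaking_probability(mouth_features, audio_features,
                         img_w=(0.8, 0.2), aud_w=(0.5, 0.5), decay=0.5):
    """Run a minimal recurrent pass over paired visual/audio timesteps."""
    state = 0.0
    for mouth, audio in zip(mouth_features, audio_features):
        fused = embed(mouth, img_w) + embed(audio, aud_w)
        state = decay * state + (1 - decay) * fused   # recurrent update
    return 1.0 / (1.0 + math.exp(-state))             # sigmoid output
```

The structure matters more than the numbers: both modalities are embedded per timestep, the fused evidence is accumulated over time, and the final state is squashed into a speaking probability.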
  • Publication number: 20200349966
    Abstract: Hot-word free adaptation of one or more function(s) of an automated assistant. Sensor data, from one or more sensor components of an assistant device that provides an automated assistant interface (graphical and/or audible), is processed to determine occurrence and/or confidence metric(s) of various attributes of a user that is proximal to the assistant device. Whether to adapt each of one or more of the function(s) of the automated assistant is based on the occurrence and/or the confidence of one or more of the various attributes. For example, certain processing of at least some of the sensor data can be initiated, such as initiating previously dormant local processing of at least some of the sensor data and/or initiating transmission of at least some of the audio data to remote automated assistant component(s).
    Type: Application
    Filed: May 2, 2019
    Publication date: November 5, 2020
    Inventors: Jaclyn Konzelmann, Kenneth Mixter, Sourish Chaudhuri, Tuan Nguyen, Hideaki Matsui, Caroline Pantofaru, Vinay Bettadapura
  • Publication number: 20200293783
    Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
    Type: Application
    Filed: March 13, 2019
    Publication date: September 17, 2020
    Applicant: Google LLC
    Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
  • Publication number: 20200117887
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating predictions for whether a target person is speaking during a portion of a video. In one aspect, a method includes obtaining one or more images which each depict a mouth of a given person at a respective time point. The images are processed using an image embedding neural network to generate a latent representation of the images. Audio data corresponding to the images is processed using an audio embedding neural network to generate a latent representation of the audio data. The latent representation of the images and the latent representation of the audio data is processed using a recurrent neural network to generate a prediction for whether the given person is speaking.
    Type: Application
    Filed: October 16, 2018
    Publication date: April 16, 2020
    Inventors: Sourish Chaudhuri, Ondrej Klejch, Joseph Edward Roth
  • Publication number: 20200090678
    Abstract: The technology disclosed herein may determine timing windows for speech captions of an audio stream. In one example, the technology may involve accessing audio data comprising a plurality of segments; determining, by a processing device, that one or more of the plurality of segments comprise speech sounds; identifying a time duration for the speech sounds; and providing a user interface element corresponding to the time duration for the speech sounds, wherein the user interface element indicates an estimate of a beginning and ending of the speech sounds and is configured to receive caption text associated with the speech sounds of the audio data.
    Type: Application
    Filed: November 15, 2019
    Publication date: March 19, 2020
    Inventors: Sourish Chaudhuri, Nebojsa Ciric, Khiem Pham
  • Patent number: 10566009
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for audio classifiers. In one aspect, a method includes obtaining a plurality of video frames from a plurality of videos, wherein each of the plurality of video frames is associated with one or more image labels of a plurality of image labels determined based on image recognition; obtaining a plurality of audio segments corresponding to the plurality of video frames, wherein each audio segment has a specified duration relative to the corresponding video frame; and generating an audio classifier trained using the plurality of audio segment and the associated image labels as input, wherein the audio classifier is trained such that the one or more groups of audio segments are determined to be associated with respective one or more audio labels.
    Type: Grant
    Filed: July 24, 2019
    Date of Patent: February 18, 2020
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Achal D. Dave, Bryan Andrew Seybold
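
The training trick above transfers image labels onto the temporally aligned audio segments, so an audio classifier can be trained without hand-labeled audio. The sketch below keeps that label-transfer structure but substitutes a trivial nearest-centroid model over a 1-D feature for whatever classifier is actually trained:

```python
# Sketch of the label-transfer idea: image labels on video frames serve as
# training labels for the co-occurring audio segments. The nearest-centroid
# "classifier" and scalar audio feature are illustrative stand-ins.

def train_audio_classifier(labeled_segments):
    """labeled_segments: list of (image_label, audio_feature) pairs.
    Returns per-label centroids of the audio feature."""
    sums, counts = {}, {}
    for label, feature in labeled_segments:
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify_audio(centroids, feature):
    """Predict the audio label whose centroid is closest to the feature."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))
```

After training, the image labels have effectively become audio labels: a new audio segment is classified purely from its audio feature.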
  • Patent number: 10497382
    Abstract: A computer-implemented method for speech diarization is described. The method comprises determining temporal positions of separate faces in a video using face detection and clustering. Voice features are detected in the speech sections of the video. The method further includes generating a correlation between the determined separate faces and separate voices based at least on the temporal positions of the separate faces and the separate voices in the video. This correlation is stored in a content store with the video.
    Type: Grant
    Filed: April 26, 2017
    Date of Patent: December 3, 2019
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Kenneth Hoover
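
The diarization method above correlates face tracks with voice segments using their temporal positions. A minimal sketch of one plausible correlation rule, matching each face track to the voice segment it overlaps most (the overlap-based matching is an assumption about how the correlation is computed):

```python
# Sketch of the audio-visual correlation step: face tracks and voice
# segments are matched by temporal overlap, producing a face-to-voice map
# that could be stored with the video. Matching rule is an assumption.

def overlap(a, b):
    """Length of overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def correlate_faces_voices(face_tracks, voice_segments):
    """face_tracks / voice_segments: {id: (start, end)} in seconds."""
    mapping = {}
    for face_id, f_span in face_tracks.items():
        best = max(voice_segments,
                   key=lambda v: overlap(f_span, voice_segments[v]),
                   default=None)
        if best is not None and overlap(f_span, voice_segments[best]) > 0:
            mapping[face_id] = best
    return mapping
```

Faces with no temporally overlapping voice (e.g. silent bystanders) simply get no entry in the map.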
  • Patent number: 10490209
    Abstract: A content system accessing an audio stream. The content system inputs segments of the audio stream into a speech classifier for classification, the speech classifier generating, for the segments of the audio stream, raw scores representing likelihoods that the respective segment of the audio stream includes an occurrence of a speech sound. The content system generates binary scores for the audio stream based on the set of raw scores, each binary score generated based on an aggregation of raw scores from consecutive series of the segments of the audio stream. The content system generates one or more timing windows for the speech sounds in the audio stream based on the binary scores, each timing window indicating an estimate of a beginning and ending timestamps of one or more speech sounds in the audio stream.
    Type: Grant
    Filed: August 1, 2016
    Date of Patent: November 26, 2019
    Assignee: GOOGLE LLC
    Inventors: Sourish Chaudhuri, Nebojša Ćirić, Khiem Pham
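
The abstract above aggregates raw per-segment speech likelihoods over runs of consecutive segments and thresholds them into binary scores. One simple aggregation that fits that description is a trailing-window mean; the window size and threshold below are illustrative:

```python
# Sketch of the raw-to-binary scoring step: each segment's binary score is
# the thresholded mean of a trailing window of raw scores. Window length
# and threshold are assumptions, not values from the patent.

def binarize(raw_scores, window=3, threshold=0.5):
    """Binary score per segment from the mean of a trailing window."""
    binary = []
    for i in range(len(raw_scores)):
        chunk = raw_scores[max(0, i - window + 1): i + 1]
        binary.append(sum(chunk) / len(chunk) >= threshold)
    return binary
```

The aggregation suppresses single-segment blips in the raw classifier output, so the timing windows derived from the binary scores have cleaner begin/end estimates.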
  • Patent number: 10381022
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for audio classifiers. In one aspect, a method includes obtaining a plurality of video frames from a plurality of videos, wherein each of the plurality of video frames is associated with one or more image labels of a plurality of image labels determined based on image recognition; obtaining a plurality of audio segments corresponding to the plurality of video frames, wherein each audio segment has a specified duration relative to the corresponding video frame; and generating an audio classifier trained using the plurality of audio segment and the associated image labels as input, wherein the audio classifier is trained such that the one or more groups of audio segments are determined to be associated with respective one or more audio labels.
    Type: Grant
    Filed: February 11, 2016
    Date of Patent: August 13, 2019
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Achal D. Dave, Bryan Andrew Seybold
  • Patent number: 10356469
    Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: July 16, 2019
    Assignee: Google LLC
    Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri
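
The wind-noise method above keys the choice of replacement operation to an SNR-derived intensity. A sketch of that selection logic; the operation names and dB cutoffs are invented for illustration:

```python
# Illustrative sketch of intensity-based selection: lower SNR means more
# intense wind noise, which selects a more aggressive replacement
# operation. Cutoffs and operation names are assumptions.

import math

def wind_noise_intensity(signal_power, noise_power):
    """SNR in dB; lower values indicate more intense wind noise."""
    return 10 * math.log10(signal_power / noise_power)

def select_replacement(snr_db):
    """Choose a replacement operation based on wind-noise intensity."""
    if snr_db >= 20:
        return "spectral_subtraction"   # mild: attenuate the noise band
    if snr_db >= 5:
        return "inpaint_from_context"   # moderate: resynthesize the band
    return "replace_segment"            # severe: swap in clean audio
```

The escalation is the key idea: light wind can be filtered in place, while wind that swamps the signal forces outright replacement of the affected segment.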
  • Patent number: 10037313
    Abstract: A content server accessing an audio stream, and inputs portions of the audio stream into one or more non-speech classifiers for classification, the non-speech classifiers generating, for portions of the audio stream, a set of raw scores representing likelihoods that the respective portion of the audio stream includes an occurrence of a particular class of non-speech sounds associated with each of the non-speech classifiers. The content server generates binary scores for the sets of raw scores, the binary scores generated based on a smoothing of a respective set of raw scores. The content server applies a set of non-speech captions to portions of the audio stream in time, each of the sets of non-speech captions based on a different one of the set binary scores of the corresponding portion of the audio stream.
    Type: Grant
    Filed: August 23, 2016
    Date of Patent: July 31, 2018
    Assignee: GOOGLE LLC
    Inventors: Fangzhou Wang, Sourish Chaudhuri, Daniel Ellis, Nathan Reale
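
The abstract above runs several class-specific non-speech classifiers and applies captions from each classifier's binary scores. A sketch of the final application step, where the set of classes active at each audio portion becomes that portion's caption set (class names are illustrative):

```python
# Sketch of applying non-speech captions: each class-specific classifier
# contributes a binary score per audio portion, and the active classes at
# each portion form its caption set. Class names are illustrative.

def captions_per_portion(binary_scores):
    """binary_scores: {class_name: [bool per portion]} -> list of caption sets."""
    n = max((len(scores) for scores in binary_scores.values()), default=0)
    captions = []
    for i in range(n):
        active = {name for name, scores in binary_scores.items()
                  if i < len(scores) and scores[i]}
        captions.append(active)
    return captions
```

Because each class has its own binary score track, overlapping sounds (applause over music, say) naturally yield multiple simultaneous captions.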
  • Publication number: 20180174600
    Abstract: A computer-implemented method for speech diarization is described. The method comprises determining temporal positions of separate faces in a video using face detection and clustering. Voice features are detected in the speech sections of the video. The method further includes generating a correlation between the determined separate faces and separate voices based at least on the temporal positions of the separate faces and the separate voices in the video. This correlation is stored in a content store with the video.
    Type: Application
    Filed: April 26, 2017
    Publication date: June 21, 2018
    Inventors: Sourish Chaudhuri, Kenneth Hoover
  • Publication number: 20180084301
    Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
    Type: Application
    Filed: November 29, 2017
    Publication date: March 22, 2018
    Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri