Patents by Inventor Sourish Chaudhuri
Sourish Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230253009
Abstract: Hot-word free adaptation of one or more function(s) of an automated assistant. Sensor data, from one or more sensor components of an assistant device that provides an automated assistant interface (graphical and/or audible), is processed to determine occurrence and/or confidence metric(s) of various attributes of a user that is proximal to the assistant device. Whether to adapt each of one or more of the function(s) of the automated assistant is based on the occurrence and/or the confidence of one or more of the various attributes. For example, certain processing of at least some of the sensor data can be initiated, such as initiating previously dormant local processing of at least some of the sensor data and/or initiating transmission of at least some of the audio data to remote automated assistant component(s).
Type: Application
Filed: April 17, 2023
Publication date: August 10, 2023
Inventors: Jaclyn Konzelmann, Kenneth Mixter, Sourish Chaudhuri, Tuan Nguyen, Hideaki Matsui, Caroline Pantofaru, Vinay Bettadapura
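The adaptation logic described in the abstract above can be illustrated with a minimal sketch. This is not Google's implementation; the attribute names (`proximity`, `directed_gaze`), thresholds, and function names are all hypothetical, chosen only to show the idea of gating each assistant function on the confidence of detected user attributes.

```python
# Hypothetical per-function attribute requirements: a function adapts only
# when every attribute it depends on meets its confidence threshold.
FUNCTION_REQUIREMENTS = {
    "local_audio_processing": {"proximity": 0.5},
    "transmit_audio_remote": {"proximity": 0.5, "directed_gaze": 0.7},
}

def functions_to_adapt(attribute_confidences):
    """Return the assistant functions whose required attributes all pass."""
    adapted = []
    for function, requirements in FUNCTION_REQUIREMENTS.items():
        if all(attribute_confidences.get(attr, 0.0) >= threshold
               for attr, threshold in requirements.items()):
            adapted.append(function)
    return adapted

# A nearby user gazing at the device triggers both adaptations; a user
# merely walking past triggers only the cheaper local processing.
print(functions_to_adapt({"proximity": 0.9, "directed_gaze": 0.8}))
print(functions_to_adapt({"proximity": 0.9, "directed_gaze": 0.1}))
```

The key property matching the abstract is that "previously dormant" processing (here, remote transmission) only begins once the relevant attribute confidences are high enough, with no hot-word required.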
-
Patent number: 11688417
Abstract: Hot-word free adaptation of one or more function(s) of an automated assistant. Sensor data, from one or more sensor components of an assistant device that provides an automated assistant interface (graphical and/or audible), is processed to determine occurrence and/or confidence metric(s) of various attributes of a user that is proximal to the assistant device. Whether to adapt each of one or more of the function(s) of the automated assistant is based on the occurrence and/or the confidence of one or more of the various attributes. For example, certain processing of at least some of the sensor data can be initiated, such as initiating previously dormant local processing of at least some of the sensor data and/or initiating transmission of at least some of the audio data to remote automated assistant component(s).
Type: Grant
Filed: May 2, 2019
Date of Patent: June 27, 2023
Assignee: GOOGLE LLC
Inventors: Jaclyn Konzelmann, Kenneth Mixter, Sourish Chaudhuri, Tuan Nguyen, Hideaki Matsui, Caroline Pantofaru, Vinay Bettadapura
-
Publication number: 20230103060
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining the number of speakers in a video and its corresponding audio using visual context. In one aspect, a method includes detecting multiple speakers within the video, determining a bounding box for each detected speaker that includes the detected person and objects within a threshold distance of the detected person in an image frame, determining a unique descriptor for that person based in part on image information depicting the objects within the bounding box, determining a cardinality of unique speakers in the video, and providing the cardinality of unique speakers to the speaker diarization system.
Type: Application
Filed: March 13, 2020
Publication date: March 30, 2023
Inventors: Sourish Chaudhuri, Lev Finkelstein
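The counting step in the abstract above can be sketched as descriptor clustering. This is assumed logic, not the patented system: real descriptors would come from image features, and the distance threshold is an illustrative parameter.

```python
import math

def count_unique_speakers(descriptors, threshold=1.0):
    """Greedily group per-detection descriptor vectors by distance to an
    existing cluster representative; the cluster count is the cardinality
    of unique speakers."""
    representatives = []
    for vector in descriptors:
        if not any(math.dist(vector, rep) <= threshold
                   for rep in representatives):
            representatives.append(vector)
    return len(representatives)

# Two detections of the same person plus one distinct person -> 2 speakers.
detections = [(0.1, 0.2), (0.15, 0.22), (5.0, 5.0)]
print(count_unique_speakers(detections))
```

The resulting cardinality is what the method hands to the downstream speaker diarization system as a prior on the number of speakers.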
-
Patent number: 11587319
Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
Type: Grant
Filed: March 30, 2021
Date of Patent: February 21, 2023
Assignee: Google LLC
Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
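The gating pipeline in the abstract above reduces the cost of deciding whether full analysis is worthwhile. A minimal sketch follows; the frame representation, downsampling factor, and threshold are assumptions, and the "gating model" is stood in for by a plain score threshold rather than a learned model.

```python
def sample_frames(frames, source_fps, target_fps):
    """Subsample frames to approximate the target frame rate."""
    step = max(1, source_fps // target_fps)
    return frames[::step]

def reduce_resolution(frame, factor=2):
    """Cheaply downsample a 2-D pixel grid by dropping rows and columns."""
    return [row[::factor] for row in frame[::factor]]

def should_analyze(gating_score, threshold=0.5):
    """Gate the expensive annotation pass on a cheap model's score."""
    return gating_score >= threshold

# 30 tiny 4x4 frames at 30 fps, sampled down to 5 fps, then downsampled.
frames = [[[i] * 4 for _ in range(4)] for i in range(30)]
subset = sample_frames(frames, source_fps=30, target_fps=5)
print(len(subset))                        # frames surviving sampling
print(len(reduce_resolution(subset[0])))  # rows after downsampling
```

The point of the design is that both sampling and resolution reduction happen *before* the gating model runs, so the decision about whether to do full video annotation is itself inexpensive.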
-
Publication number: 20210216778
Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
Type: Application
Filed: March 30, 2021
Publication date: July 15, 2021
Applicant: Google LLC
Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
-
Patent number: 11011184
Abstract: The technology disclosed herein may determine timing windows for speech captions of an audio stream. In one example, the technology may involve accessing audio data comprising a plurality of segments; determining, by a processing device, that one or more of the plurality of segments comprise speech sounds; identifying a time duration for the speech sounds; and providing a user interface element corresponding to the time duration for the speech sounds, wherein the user interface element indicates an estimate of a beginning and ending of the speech sounds and is configured to receive caption text associated with the speech sounds of the audio data.
Type: Grant
Filed: November 15, 2019
Date of Patent: May 18, 2021
Assignee: Google LLC
Inventors: Sourish Chaudhuri, Nebojsa Ciric, Khiem Pham
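The window-derivation step described above can be sketched as merging runs of speech-classified segments into (start, end) spans. This is assumed logic for illustration, with a fixed per-segment duration; the real system works from classifier output over an audio stream.

```python
def speech_windows(is_speech, segment_seconds=1.0):
    """Merge consecutive speech-classified segments into timing windows
    that a caption editor could pre-fill with start/end estimates."""
    windows = []
    start = None
    for index, speech in enumerate(is_speech):
        if speech and start is None:
            start = index * segment_seconds          # window opens
        elif not speech and start is not None:
            windows.append((start, index * segment_seconds))  # window closes
            start = None
    if start is not None:                            # speech ran to the end
        windows.append((start, len(is_speech) * segment_seconds))
    return windows

# Speech in segments 1-2 and 4 yields two caption windows.
print(speech_windows([False, True, True, False, True]))
```

Each returned window corresponds to the user interface element in the abstract: an estimated beginning and ending into which caption text can be entered.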
-
Patent number: 10984246
Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
Type: Grant
Filed: March 13, 2019
Date of Patent: April 20, 2021
Assignee: Google LLC
Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
-
Patent number: 10846522
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating predictions for whether a target person is speaking during a portion of a video. In one aspect, a method includes obtaining one or more images which each depict a mouth of a given person at a respective time point. The images are processed using an image embedding neural network to generate a latent representation of the images. Audio data corresponding to the images is processed using an audio embedding neural network to generate a latent representation of the audio data. The latent representation of the images and the latent representation of the audio data are processed using a recurrent neural network to generate a prediction for whether the given person is speaking.
Type: Grant
Filed: October 16, 2018
Date of Patent: November 24, 2020
Assignee: Google LLC
Inventors: Sourish Chaudhuri, Ondrej Klejch, Joseph Edward Roth
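The fusion structure in the abstract above can be illustrated with stand-in arithmetic rather than real neural networks: a toy "embedding" of each modality, a per-step fusion, and a simple recurrent accumulator in place of the recurrent network. Every function here is a placeholder for a learned model.

```python
def embed(values):
    """Toy embedding: the mean of raw values (the real system uses an
    image or audio embedding neural network here)."""
    return sum(values) / len(values)

def speaking_score(mouth_frames, audio_frames):
    """Accumulate fused visual/audio evidence over time, recurrent-style."""
    state = 0.0
    for mouth, audio in zip(mouth_frames, audio_frames):
        fused = embed(mouth) * embed(audio)   # co-occurring motion and sound
        state = 0.5 * state + 0.5 * fused     # simple recurrent update
    return state

# Mouth movement aligned with audio energy scores higher than mouth
# movement over silence, which is the behavior active speaker detection
# relies on.
aligned = speaking_score([[1.0, 1.0], [1.0, 1.0]], [[1.0], [1.0]])
silent = speaking_score([[1.0, 1.0], [1.0, 1.0]], [[0.0], [0.0]])
print(aligned > silent)
```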
-
Publication number: 20200349966
Abstract: Hot-word free adaptation of one or more function(s) of an automated assistant. Sensor data, from one or more sensor components of an assistant device that provides an automated assistant interface (graphical and/or audible), is processed to determine occurrence and/or confidence metric(s) of various attributes of a user that is proximal to the assistant device. Whether to adapt each of one or more of the function(s) of the automated assistant is based on the occurrence and/or the confidence of one or more of the various attributes. For example, certain processing of at least some of the sensor data can be initiated, such as initiating previously dormant local processing of at least some of the sensor data and/or initiating transmission of at least some of the audio data to remote automated assistant component(s).
Type: Application
Filed: May 2, 2019
Publication date: November 5, 2020
Inventors: Jaclyn Konzelmann, Kenneth Mixter, Sourish Chaudhuri, Tuan Nguyen, Hideaki Matsui, Caroline Pantofaru, Vinay Bettadapura
-
Publication number: 20200293783
Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
Type: Application
Filed: March 13, 2019
Publication date: September 17, 2020
Applicant: Google LLC
Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
-
Publication number: 20200117887
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating predictions for whether a target person is speaking during a portion of a video. In one aspect, a method includes obtaining one or more images which each depict a mouth of a given person at a respective time point. The images are processed using an image embedding neural network to generate a latent representation of the images. Audio data corresponding to the images is processed using an audio embedding neural network to generate a latent representation of the audio data. The latent representation of the images and the latent representation of the audio data are processed using a recurrent neural network to generate a prediction for whether the given person is speaking.
Type: Application
Filed: October 16, 2018
Publication date: April 16, 2020
Inventors: Sourish Chaudhuri, Ondrej Klejch, Joseph Edward Roth
-
Publication number: 20200090678
Abstract: The technology disclosed herein may determine timing windows for speech captions of an audio stream. In one example, the technology may involve accessing audio data comprising a plurality of segments; determining, by a processing device, that one or more of the plurality of segments comprise speech sounds; identifying a time duration for the speech sounds; and providing a user interface element corresponding to the time duration for the speech sounds, wherein the user interface element indicates an estimate of a beginning and ending of the speech sounds and is configured to receive caption text associated with the speech sounds of the audio data.
Type: Application
Filed: November 15, 2019
Publication date: March 19, 2020
Inventors: Sourish Chaudhuri, Nebojsa Ciric, Khiem Pham
-
Patent number: 10566009
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for audio classifiers. In one aspect, a method includes obtaining a plurality of video frames from a plurality of videos, wherein each of the plurality of video frames is associated with one or more image labels of a plurality of image labels determined based on image recognition; obtaining a plurality of audio segments corresponding to the plurality of video frames, wherein each audio segment has a specified duration relative to the corresponding video frame; and generating an audio classifier trained using the plurality of audio segments and the associated image labels as input, wherein the audio classifier is trained such that the one or more groups of audio segments are determined to be associated with respective one or more audio labels.
Type: Grant
Filed: July 24, 2019
Date of Patent: February 18, 2020
Assignee: Google LLC
Inventors: Sourish Chaudhuri, Achal D. Dave, Bryan Andrew Seybold
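The weak-supervision idea above — audio segments inheriting the image labels of their co-occurring video frames — can be sketched with toy 1-D audio features and a nearest-centroid classifier standing in for the trained model. The features and labels are fabricated for illustration only.

```python
def train_audio_classifier(audio_features, image_labels):
    """Train a nearest-centroid 'classifier' on (audio feature, image
    label) pairs, where the labels came from image recognition on the
    frames the audio co-occurred with."""
    sums, counts = {}, {}
    for feature, label in zip(audio_features, image_labels):
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: sums[label] / counts[label] for label in sums}

    def classify(feature):
        # Predict the label whose centroid is nearest the audio feature.
        return min(centroids, key=lambda lab: abs(centroids[lab] - feature))
    return classify

# Frames labeled "dog" co-occur with high-energy audio in this toy data,
# so the classifier learns an audio label from image supervision alone.
classify = train_audio_classifier([0.9, 0.8, 0.1, 0.2],
                                  ["dog", "dog", "rain", "rain"])
print(classify(0.85))
print(classify(0.15))
```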
-
Patent number: 10497382
Abstract: A computer-implemented method for speech diarization is described. The method comprises determining temporal positions of separate faces in a video using face detection and clustering. Voice features are detected in the speech sections of the video. The method further includes generating a correlation between the determined separate faces and separate voices based at least on the temporal positions of the separate faces and the separate voices in the video. This correlation is stored in a content store with the video.
Type: Grant
Filed: April 26, 2017
Date of Patent: December 3, 2019
Assignee: Google LLC
Inventors: Sourish Chaudhuri, Kenneth Hoover
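The correlation step above can be sketched as temporal-overlap matching: each voice is associated with the face track that overlaps it most in time. This is assumed logic, not the patented method; the face/voice intervals below are fabricated.

```python
def overlap(a, b):
    """Length of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def correlate_faces_voices(face_tracks, voice_segments):
    """Map each voice id to the face id whose on-screen interval has the
    greatest temporal overlap with that voice's segment."""
    mapping = {}
    for voice_id, voice_span in voice_segments.items():
        best = max(face_tracks,
                   key=lambda face_id: overlap(face_tracks[face_id],
                                               voice_span))
        mapping[voice_id] = best
    return mapping

faces = {"face_a": (0.0, 5.0), "face_b": (5.0, 10.0)}
voices = {"voice_1": (1.0, 4.0), "voice_2": (6.0, 9.0)}
print(correlate_faces_voices(faces, voices))
```

The resulting face-to-voice mapping is the correlation that, per the abstract, gets stored in the content store alongside the video.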
-
Patent number: 10490209
Abstract: A content system accessing an audio stream. The content system inputs segments of the audio stream into a speech classifier for classification, the speech classifier generating, for the segments of the audio stream, raw scores representing likelihoods that the respective segment of the audio stream includes an occurrence of a speech sound. The content system generates binary scores for the audio stream based on the set of raw scores, each binary score generated based on an aggregation of raw scores from consecutive series of the segments of the audio stream. The content system generates one or more timing windows for the speech sounds in the audio stream based on the binary scores, each timing window indicating an estimate of the beginning and ending timestamps of one or more speech sounds in the audio stream.
Type: Grant
Filed: August 1, 2016
Date of Patent: November 26, 2019
Assignee: GOOGLE LLC
Inventors: Sourish Chaudhuri, Nebojša Ćirić, Khiem Pham
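The raw-to-binary aggregation step above can be sketched as a moving average followed by a threshold, so that an isolated noisy spike in the raw classifier scores does not produce a spurious timing window. The window length and threshold are assumed parameters.

```python
def binarize_scores(raw_scores, window=3, threshold=0.5):
    """Aggregate raw per-segment speech scores over a trailing window of
    consecutive segments, then threshold into binary scores."""
    binary = []
    for i in range(len(raw_scores)):
        lo = max(0, i - window + 1)
        mean = sum(raw_scores[lo:i + 1]) / (i + 1 - lo)
        binary.append(1 if mean >= threshold else 0)
    return binary

# The isolated spike (0.8 at index 1) is smoothed away, while the
# sustained run of high scores at the end survives as speech.
print(binarize_scores([0.1, 0.8, 0.1, 0.7, 0.8, 0.9]))
```

The binary scores are then what a window-merging step (as in the caption-timing entries elsewhere in this listing) would turn into beginning/ending timestamp estimates.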
-
Patent number: 10381022
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for audio classifiers. In one aspect, a method includes obtaining a plurality of video frames from a plurality of videos, wherein each of the plurality of video frames is associated with one or more image labels of a plurality of image labels determined based on image recognition; obtaining a plurality of audio segments corresponding to the plurality of video frames, wherein each audio segment has a specified duration relative to the corresponding video frame; and generating an audio classifier trained using the plurality of audio segments and the associated image labels as input, wherein the audio classifier is trained such that the one or more groups of audio segments are determined to be associated with respective one or more audio labels.
Type: Grant
Filed: February 11, 2016
Date of Patent: August 13, 2019
Assignee: Google LLC
Inventors: Sourish Chaudhuri, Achal D. Dave, Bryan Andrew Seybold
-
Patent number: 10356469
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
Type: Grant
Filed: November 29, 2017
Date of Patent: July 16, 2019
Assignee: Google LLC
Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri
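The intensity-based selection step above can be sketched as a tiered dispatch: the worse the wind artifact's signal-to-noise ratio, the more aggressive the replacement operation. The tier boundaries and operation names here are hypothetical, invented for illustration.

```python
def wind_noise_intensity(signal_power, noise_power):
    """Intensity proxy: the inverse of the segment's signal-to-noise
    ratio, so louder wind relative to signal means higher intensity."""
    snr = signal_power / noise_power
    return 1.0 / snr

def select_replacement_operation(intensity):
    """Pick a replacement operation by intensity tier (hypothetical tiers)."""
    if intensity < 0.5:
        return "spectral_denoise"      # mild wind: clean the segment in place
    elif intensity < 2.0:
        return "interpolate_segment"   # moderate: rebuild from neighbors
    return "replace_with_ambient"      # severe: substitute ambient audio

print(select_replacement_operation(wind_noise_intensity(10.0, 1.0)))  # mild
print(select_replacement_operation(wind_noise_intensity(1.0, 10.0)))  # severe
```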
-
Patent number: 10037313
Abstract: A content server accessing an audio stream, and inputs portions of the audio stream into one or more non-speech classifiers for classification, the non-speech classifiers generating, for portions of the audio stream, a set of raw scores representing likelihoods that the respective portion of the audio stream includes an occurrence of a particular class of non-speech sounds associated with each of the non-speech classifiers. The content server generates binary scores for the sets of raw scores, the binary scores generated based on a smoothing of a respective set of raw scores. The content server applies a set of non-speech captions to portions of the audio stream in time, each of the sets of non-speech captions based on a different one of the sets of binary scores of the corresponding portion of the audio stream.
Type: Grant
Filed: August 23, 2016
Date of Patent: July 31, 2018
Assignee: GOOGLE LLC
Inventors: Fangzhou Wang, Sourish Chaudhuri, Daniel Ellis, Nathan Reale
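The final application step above — one caption class per classifier, applied wherever that class's binary scores fire — can be sketched directly. The class names and caption strings are illustrative placeholders.

```python
def apply_nonspeech_captions(binary_scores_by_class, caption_text):
    """For each non-speech class, emit (portion_index, caption) pairs at
    every audio portion where that class's smoothed binary score fired."""
    applied = []
    for cls, scores in binary_scores_by_class.items():
        for index, fired in enumerate(scores):
            if fired:
                applied.append((index, caption_text[cls]))
    return sorted(applied)  # order captions by position in the stream

# Separate classifiers for applause and music each produce binary scores
# over four audio portions; captions are applied where each class fires.
scores = {"applause": [0, 1, 1, 0], "music": [1, 0, 0, 0]}
print(apply_nonspeech_captions(scores, {"applause": "[applause]",
                                        "music": "[music]"}))
```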
-
Publication number: 20180174600
Abstract: A computer-implemented method for speech diarization is described. The method comprises determining temporal positions of separate faces in a video using face detection and clustering. Voice features are detected in the speech sections of the video. The method further includes generating a correlation between the determined separate faces and separate voices based at least on the temporal positions of the separate faces and the separate voices in the video. This correlation is stored in a content store with the video.
Type: Application
Filed: April 26, 2017
Publication date: June 21, 2018
Inventors: Sourish Chaudhuri, Kenneth Hoover
-
Publication number: 20180084301
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
Type: Application
Filed: November 29, 2017
Publication date: March 22, 2018
Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri