Patents by Inventor Sourish Chaudhuri

Sourish Chaudhuri has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230253009
    Abstract: Hot-word free adaptation of one or more function(s) of an automated assistant. Sensor data, from one or more sensor components of an assistant device that provides an automated assistant interface (graphical and/or audible), is processed to determine occurrence and/or confidence metric(s) of various attributes of a user that is proximal to the assistant device. Whether to adapt each of one or more of the function(s) of the automated assistant is based on the occurrence and/or the confidence of one or more of the various attributes. For example, certain processing of at least some of the sensor data can be initiated, such as initiating previously dormant local processing of at least some of the sensor data and/or initiating transmission of at least some of the audio data to remote automated assistant component(s).
    Type: Application
    Filed: April 17, 2023
    Publication date: August 10, 2023
    Inventors: Jaclyn Konzelmann, Kenneth Mixter, Sourish Chaudhuri, Tuan Nguyen, Hideaki Matsui, Caroline Pantofaru, Vinay Bettadapura
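
The abstract above describes gating assistant functions on attribute occurrence/confidence metrics rather than a hot-word. A minimal sketch of that decision logic is below; the attribute names, function names, and thresholds are all illustrative assumptions, not details from the patent:

```python
# Hypothetical sketch of hot-word-free adaptation: per-attribute confidence
# metrics (e.g., proximity, gaze, mouth movement) gate which assistant
# functions are activated. Thresholds and names are illustrative only.

def adapt_functions(attributes, thresholds):
    """Return the set of assistant functions to activate, given
    per-attribute confidence metrics in [0, 1]."""
    activated = set()
    # Initiate previously dormant local processing on weaker evidence.
    if attributes.get("proximity", 0.0) >= thresholds["proximity"]:
        activated.add("local_audio_processing")
    # Only transmit audio to remote components on stronger, combined evidence.
    if (attributes.get("gaze", 0.0) >= thresholds["gaze"]
            and attributes.get("mouth_movement", 0.0) >= thresholds["mouth"]):
        activated.add("remote_transmission")
    return activated
```

The two-tier structure mirrors the abstract's example: cheap local processing can start on partial evidence, while transmission to remote components requires more confident attribute detections.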
  • Patent number: 11688417
    Abstract: Hot-word free adaptation of one or more function(s) of an automated assistant. Sensor data, from one or more sensor components of an assistant device that provides an automated assistant interface (graphical and/or audible), is processed to determine occurrence and/or confidence metric(s) of various attributes of a user that is proximal to the assistant device. Whether to adapt each of one or more of the function(s) of the automated assistant is based on the occurrence and/or the confidence of one or more of the various attributes. For example, certain processing of at least some of the sensor data can be initiated, such as initiating previously dormant local processing of at least some of the sensor data and/or initiating transmission of at least some of the audio data to remote automated assistant component(s).
    Type: Grant
    Filed: May 2, 2019
    Date of Patent: June 27, 2023
    Assignee: GOOGLE LLC
    Inventors: Jaclyn Konzelmann, Kenneth Mixter, Sourish Chaudhuri, Tuan Nguyen, Hideaki Matsui, Caroline Pantofaru, Vinay Bettadapura
  • Publication number: 20230103060
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining the number of speakers in a video and a corresponding audio using visual context. In one aspect, a method includes detecting within the video multiple speakers, determining a bounding box for each detected speaker that includes the detected person and objects within a threshold distance of the detected person in an image frame, determining a unique descriptor for that person based in part on image information depicting the objects within the bounding box, determining a cardinality of unique speakers in the video, and providing to the speaker diarization system the cardinality of unique speakers.
    Type: Application
    Filed: March 13, 2020
    Publication date: March 30, 2023
    Inventors: Sourish Chaudhuri, Lev Finkelstein
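
The abstract above counts unique speakers by building a descriptor from objects near each detected person. The sketch below illustrates that flow with a deliberately crude descriptor (the set of object labels inside the expanded box); the real descriptor uses image information, so everything here beyond the box-expansion-and-count structure is an assumption:

```python
# Illustrative sketch: expand each person detection by a threshold margin,
# describe the person by the object labels falling inside the expanded box,
# and count distinct descriptors as the cardinality of unique speakers.

def expand_box(box, margin):
    """Grow an (x1, y1, x2, y2) box by `margin` on every side."""
    x1, y1, x2, y2 = box
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)

def contains(box, point):
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2

def count_unique_speakers(detections, objects, margin=10):
    """detections: list of person boxes; objects: list of (label, center)."""
    descriptors = set()
    for box in detections:
        expanded = expand_box(box, margin)
        nearby = frozenset(label for label, center in objects
                           if contains(expanded, center))
        descriptors.add(nearby)
    return len(descriptors)
```

Note the simplification: two distinct people surrounded by identical objects would collapse into one descriptor here, which is exactly why the patented descriptor also draws on image appearance.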
  • Patent number: 11587319
    Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: February 21, 2023
    Assignee: Google LLC
    Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
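
The gating method above has three cheap preprocessing steps (frame subsampling, spectrogram extraction, resolution reduction) followed by a gate that decides whether full annotation analysis runs at all. A rough sketch of that pipeline shape, with a simple energy threshold standing in for the machine-learning gating model:

```python
# Pipeline sketch of the gating approach. The gate below is a stand-in
# threshold on audio energy, NOT the patented ML model; frame rates and
# the downscaling scheme are illustrative.

def sample_frames(frames, source_fps, target_fps):
    """Keep every k-th frame so the result approximates target_fps."""
    step = max(1, round(source_fps / target_fps))
    return frames[::step]

def downscale(frame, factor):
    """Reduce a 2D frame (list of pixel rows) by striding in both axes."""
    return [row[::factor] for row in frame[::factor]]

def should_analyze(frames, spectrogram_energy, threshold=0.5):
    """Stand-in gate: analyze only if mean audio energy clears a threshold."""
    if not frames or not spectrogram_energy:
        return False
    return sum(spectrogram_energy) / len(spectrogram_energy) >= threshold
```

The point of the design is cost: the expensive annotation model only runs on videos the cheap gate admits, and the gate itself only sees subsampled, low-resolution input.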
  • Publication number: 20210216778
    Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
    Type: Application
    Filed: March 30, 2021
    Publication date: July 15, 2021
    Applicant: Google LLC
    Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
  • Patent number: 11011184
    Abstract: The technology disclosed herein may determine timing windows for speech captions of an audio stream. In one example, the technology may involve accessing audio data comprising a plurality of segments; determining, by a processing device, that one or more of the plurality of segments comprise speech sounds; identifying a time duration for the speech sounds; and providing a user interface element corresponding to the time duration for the speech sounds, wherein the user interface element indicates an estimate of a beginning and ending of the speech sounds and is configured to receive caption text associated with the speech sounds of the audio data.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: May 18, 2021
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Nebojsa Ciric, Khiem Pham
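
The method above turns per-segment speech detections into estimated begin/end windows that a caption-entry UI element can represent. One way to sketch the windowing step, assuming fixed-length segments and an already-computed speech mask:

```python
# Sketch of deriving caption timing windows: merge runs of consecutive
# speech-classified segments into (start, end) windows. Segment length and
# the upstream classifier are assumptions, not patent specifics.

def speech_windows(is_speech, segment_seconds=1.0):
    """Turn a per-segment speech mask into merged (start, end) windows."""
    windows = []
    start = None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i * segment_seconds          # speech run begins
        elif not flag and start is not None:
            windows.append((start, i * segment_seconds))  # run ends
            start = None
    if start is not None:                        # close a run at end of audio
        windows.append((start, len(is_speech) * segment_seconds))
    return windows
```

Each returned tuple corresponds to one UI element: an estimated beginning and ending of a speech sound, into which the user types the caption text.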
  • Patent number: 10984246
    Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
    Type: Grant
    Filed: March 13, 2019
    Date of Patent: April 20, 2021
    Assignee: Google LLC
    Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
  • Patent number: 10846522
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating predictions for whether a target person is speaking during a portion of a video. In one aspect, a method includes obtaining one or more images which each depict a mouth of a given person at a respective time point. The images are processed using an image embedding neural network to generate a latent representation of the images. Audio data corresponding to the images is processed using an audio embedding neural network to generate a latent representation of the audio data. The latent representation of the images and the latent representation of the audio data is processed using a recurrent neural network to generate a prediction for whether the given person is speaking.
    Type: Grant
    Filed: October 16, 2018
    Date of Patent: November 24, 2020
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Ondrej Klejch, Joseph Edward Roth
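
The abstract above fuses an image embedding (mouth crops), an audio embedding, and a recurrent network into a speaking/not-speaking prediction. A toy numeric sketch of that architecture's shape, where weighted sums stand in for the learned embedding networks and a decayed running state stands in for the recurrent network:

```python
# Toy sketch of audio-visual active-speaker prediction. The linear "embeddings",
# the decay-based recurrent update, and all weights are stand-ins for the
# learned models described in the abstract.

import math

def embed(vector, weights):
    """Stand-in embedding: weighted sum of the input features."""
    return sum(v * w for v, w in zip(vector, weights))

def speaking_probability(mouth_features, audio_features,
                         img_w=(0.8, 0.2), aud_w=(0.5, 0.5), decay=0.5):
    """Run a minimal recurrent pass over paired visual/audio timesteps."""
    state = 0.0
    for mouth, audio in zip(mouth_features, audio_features):
        fused = embed(mouth, img_w) + embed(audio, aud_w)
        state = decay * state + (1 - decay) * fused   # recurrent update
    return 1.0 / (1.0 + math.exp(-state))             # sigmoid output
```

The structure matters more than the numbers: both modalities are embedded per timestep, the fused evidence is accumulated over time, and the final state is squashed into a speaking probability.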
  • Publication number: 20200349966
    Abstract: Hot-word free adaptation of one or more function(s) of an automated assistant. Sensor data, from one or more sensor components of an assistant device that provides an automated assistant interface (graphical and/or audible), is processed to determine occurrence and/or confidence metric(s) of various attributes of a user that is proximal to the assistant device. Whether to adapt each of one or more of the function(s) of the automated assistant is based on the occurrence and/or the confidence of one or more of the various attributes. For example, certain processing of at least some of the sensor data can be initiated, such as initiating previously dormant local processing of at least some of the sensor data and/or initiating transmission of at least some of the audio data to remote automated assistant component(s).
    Type: Application
    Filed: May 2, 2019
    Publication date: November 5, 2020
    Inventors: Jaclyn Konzelmann, Kenneth Mixter, Sourish Chaudhuri, Tuan Nguyen, Hideaki Matsui, Caroline Pantofaru, Vinay Bettadapura
  • Publication number: 20200293783
    Abstract: Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
    Type: Application
    Filed: March 13, 2019
    Publication date: September 17, 2020
    Applicant: Google LLC
    Inventors: Sharadh Ramaswamy, Sourish Chaudhuri, Joseph Roth
  • Publication number: 20200117887
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating predictions for whether a target person is speaking during a portion of a video. In one aspect, a method includes obtaining one or more images which each depict a mouth of a given person at a respective time point. The images are processed using an image embedding neural network to generate a latent representation of the images. Audio data corresponding to the images is processed using an audio embedding neural network to generate a latent representation of the audio data. The latent representation of the images and the latent representation of the audio data is processed using a recurrent neural network to generate a prediction for whether the given person is speaking.
    Type: Application
    Filed: October 16, 2018
    Publication date: April 16, 2020
    Inventors: Sourish Chaudhuri, Ondrej Klejch, Joseph Edward Roth
  • Publication number: 20200090678
    Abstract: The technology disclosed herein may determine timing windows for speech captions of an audio stream. In one example, the technology may involve accessing audio data comprising a plurality of segments; determining, by a processing device, that one or more of the plurality of segments comprise speech sounds; identifying a time duration for the speech sounds; and providing a user interface element corresponding to the time duration for the speech sounds, wherein the user interface element indicates an estimate of a beginning and ending of the speech sounds and is configured to receive caption text associated with the speech sounds of the audio data.
    Type: Application
    Filed: November 15, 2019
    Publication date: March 19, 2020
    Inventors: Sourish Chaudhuri, Nebojsa Ciric, Khiem Pham
  • Patent number: 10566009
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for audio classifiers. In one aspect, a method includes obtaining a plurality of video frames from a plurality of videos, wherein each of the plurality of video frames is associated with one or more image labels of a plurality of image labels determined based on image recognition; obtaining a plurality of audio segments corresponding to the plurality of video frames, wherein each audio segment has a specified duration relative to the corresponding video frame; and generating an audio classifier trained using the plurality of audio segment and the associated image labels as input, wherein the audio classifier is trained such that the one or more groups of audio segments are determined to be associated with respective one or more audio labels.
    Type: Grant
    Filed: July 24, 2019
    Date of Patent: February 18, 2020
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Achal D. Dave, Bryan Andrew Seybold
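
The training trick above transfers image labels onto the temporally aligned audio segments, so an audio classifier can be trained without hand-labeled audio. The sketch below keeps that label-transfer structure but substitutes a trivial nearest-centroid model over a 1-D feature for whatever classifier is actually trained:

```python
# Sketch of the label-transfer idea: image labels on video frames serve as
# training labels for the co-occurring audio segments. The nearest-centroid
# "classifier" and scalar audio feature are illustrative stand-ins.

def train_audio_classifier(labeled_segments):
    """labeled_segments: list of (image_label, audio_feature) pairs.
    Returns per-label centroids of the audio feature."""
    sums, counts = {}, {}
    for label, feature in labeled_segments:
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify_audio(centroids, feature):
    """Predict the audio label whose centroid is closest to the feature."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))
```

After training, the image labels have effectively become audio labels: a new audio segment is classified purely from its audio feature.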
  • Patent number: 10497382
    Abstract: A computer-implemented method for speech diarization is described. The method comprises determining temporal positions of separate faces in a video using face detection and clustering. Voice features are detected in the speech sections of the video. The method further includes generating a correlation between the determined separate faces and separate voices based at least on the temporal positions of the separate faces and the separate voices in the video. This correlation is stored in a content store with the video.
    Type: Grant
    Filed: April 26, 2017
    Date of Patent: December 3, 2019
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Kenneth Hoover
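
The diarization method above correlates face tracks with voice segments using their temporal positions. A minimal sketch of one plausible correlation rule, matching each face track to the voice segment it overlaps most (the overlap-based matching is an assumption about how the correlation is computed):

```python
# Sketch of the audio-visual correlation step: face tracks and voice
# segments are matched by temporal overlap, producing a face-to-voice map
# that could be stored with the video. Matching rule is an assumption.

def overlap(a, b):
    """Length of overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def correlate_faces_voices(face_tracks, voice_segments):
    """face_tracks / voice_segments: {id: (start, end)} in seconds."""
    mapping = {}
    for face_id, f_span in face_tracks.items():
        best = max(voice_segments,
                   key=lambda v: overlap(f_span, voice_segments[v]),
                   default=None)
        if best is not None and overlap(f_span, voice_segments[best]) > 0:
            mapping[face_id] = best
    return mapping
```

Faces with no temporally overlapping voice (e.g. silent bystanders) simply get no entry in the map.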
  • Patent number: 10490209
    Abstract: A content system accessing an audio stream. The content system inputs segments of the audio stream into a speech classifier for classification, the speech classifier generating, for the segments of the audio stream, raw scores representing likelihoods that the respective segment of the audio stream includes an occurrence of a speech sound. The content system generates binary scores for the audio stream based on the set of raw scores, each binary score generated based on an aggregation of raw scores from consecutive series of the segments of the audio stream. The content system generates one or more timing windows for the speech sounds in the audio stream based on the binary scores, each timing window indicating an estimate of a beginning and ending timestamps of one or more speech sounds in the audio stream.
    Type: Grant
    Filed: August 1, 2016
    Date of Patent: November 26, 2019
    Assignee: GOOGLE LLC
    Inventors: Sourish Chaudhuri, Nebojša Ćirić, Khiem Pham
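
The abstract above aggregates raw per-segment speech likelihoods over runs of consecutive segments and thresholds them into binary scores. One simple aggregation that fits that description is a trailing-window mean; the window size and threshold below are illustrative:

```python
# Sketch of the raw-to-binary scoring step: each segment's binary score is
# the thresholded mean of a trailing window of raw scores. Window length
# and threshold are assumptions, not values from the patent.

def binarize(raw_scores, window=3, threshold=0.5):
    """Binary score per segment from the mean of a trailing window."""
    binary = []
    for i in range(len(raw_scores)):
        chunk = raw_scores[max(0, i - window + 1): i + 1]
        binary.append(sum(chunk) / len(chunk) >= threshold)
    return binary
```

The aggregation suppresses single-segment blips in the raw classifier output, so the timing windows derived from the binary scores have cleaner begin/end estimates.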
  • Patent number: 10381022
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for audio classifiers. In one aspect, a method includes obtaining a plurality of video frames from a plurality of videos, wherein each of the plurality of video frames is associated with one or more image labels of a plurality of image labels determined based on image recognition; obtaining a plurality of audio segments corresponding to the plurality of video frames, wherein each audio segment has a specified duration relative to the corresponding video frame; and generating an audio classifier trained using the plurality of audio segment and the associated image labels as input, wherein the audio classifier is trained such that the one or more groups of audio segments are determined to be associated with respective one or more audio labels.
    Type: Grant
    Filed: February 11, 2016
    Date of Patent: August 13, 2019
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Achal D. Dave, Bryan Andrew Seybold
  • Patent number: 10356469
    Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: July 16, 2019
    Assignee: Google LLC
    Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri
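
The wind-noise method above keys the choice of replacement operation to an SNR-derived intensity. A sketch of that selection logic; the operation names and dB cutoffs are invented for illustration:

```python
# Illustrative sketch of intensity-based selection: lower SNR means more
# intense wind noise, which selects a more aggressive replacement
# operation. Cutoffs and operation names are assumptions.

import math

def wind_noise_intensity(signal_power, noise_power):
    """SNR in dB; lower values indicate more intense wind noise."""
    return 10 * math.log10(signal_power / noise_power)

def select_replacement(snr_db):
    """Choose a replacement operation based on wind-noise intensity."""
    if snr_db >= 20:
        return "spectral_subtraction"   # mild: attenuate the noise band
    if snr_db >= 5:
        return "inpaint_from_context"   # moderate: resynthesize the band
    return "replace_segment"            # severe: swap in clean audio
```

The escalation is the key idea: light wind can be filtered in place, while wind that swamps the signal forces outright replacement of the affected segment.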
  • Patent number: 10037313
    Abstract: A content server accessing an audio stream, and inputs portions of the audio stream into one or more non-speech classifiers for classification, the non-speech classifiers generating, for portions of the audio stream, a set of raw scores representing likelihoods that the respective portion of the audio stream includes an occurrence of a particular class of non-speech sounds associated with each of the non-speech classifiers. The content server generates binary scores for the sets of raw scores, the binary scores generated based on a smoothing of a respective set of raw scores. The content server applies a set of non-speech captions to portions of the audio stream in time, each of the sets of non-speech captions based on a different one of the set binary scores of the corresponding portion of the audio stream.
    Type: Grant
    Filed: August 23, 2016
    Date of Patent: July 31, 2018
    Assignee: GOOGLE LLC
    Inventors: Fangzhou Wang, Sourish Chaudhuri, Daniel Ellis, Nathan Reale
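
The abstract above runs several class-specific non-speech classifiers and applies captions from each classifier's binary scores. A sketch of the final application step, where the set of classes active at each audio portion becomes that portion's caption set (class names are illustrative):

```python
# Sketch of applying non-speech captions: each class-specific classifier
# contributes a binary score per audio portion, and the active classes at
# each portion form its caption set. Class names are illustrative.

def captions_per_portion(binary_scores):
    """binary_scores: {class_name: [bool per portion]} -> list of caption sets."""
    n = max((len(scores) for scores in binary_scores.values()), default=0)
    captions = []
    for i in range(n):
        active = {name for name, scores in binary_scores.items()
                  if i < len(scores) and scores[i]}
        captions.append(active)
    return captions
```

Because each class has its own binary score track, overlapping sounds (applause over music, say) naturally yield multiple simultaneous captions.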
  • Publication number: 20180174600
    Abstract: A computer-implemented method for speech diarization is described. The method comprises determining temporal positions of separate faces in a video using face detection and clustering. Voice features are detected in the speech sections of the video. The method further includes generating a correlation between the determined separate faces and separate voices based at least on the temporal positions of the separate faces and the separate voices in the video. This correlation is stored in a content store with the video.
    Type: Application
    Filed: April 26, 2017
    Publication date: June 21, 2018
    Inventors: Sourish Chaudhuri, Kenneth Hoover
  • Publication number: 20180084301
    Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
    Type: Application
    Filed: November 29, 2017
    Publication date: March 22, 2018
    Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri