Patents by Inventor Sachin Kajarekar

Sachin Kajarekar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220093095
    Abstract: An example process includes: receiving an audio stream; determining a plurality of acoustic representations of the audio stream, where each acoustic representation of the plurality of acoustic representations corresponds to a respective frame of the audio stream; obtaining a respective plurality of scores indicating whether each respective frame of the audio stream is directed to an electronic device, where the obtaining includes: determining, using a triggering model operating on the electronic device, for each acoustic representation, a score indicating whether the respective frame of the audio stream is directed to the electronic device; determining, based on the respective plurality of scores, a likelihood that the audio stream is directed to the electronic device; determining whether the likelihood is above or below a threshold; and in response to determining that the likelihood is below the threshold, ceasing to process the audio stream.
    Type: Application
    Filed: December 16, 2020
    Publication date: March 24, 2022
    Inventors: Pranay DIGHE, Erik MARCHI, Srikanth VISHNUBHOTLA, Sachin KAJAREKAR, Devang K. NAIK
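The abstract above describes per-frame scoring by an on-device triggering model, aggregation into a stream-level likelihood, and a threshold test. A minimal sketch of that flow, where `frame_score` is a hypothetical stand-in for the triggering model (not the patented model itself):

```python
def frame_score(acoustic_representation):
    """Hypothetical stand-in for the on-device triggering model: returns a
    score in [0, 1] indicating whether this frame is directed at the device.
    Here it just reads a placeholder feature."""
    return acoustic_representation["energy"]

def is_directed(audio_frames, threshold=0.5):
    """Score every frame, aggregate the scores into a stream-level
    likelihood (a simple mean here), and compare against the threshold;
    below the threshold, processing of the stream would cease."""
    scores = [frame_score(f) for f in audio_frames]
    likelihood = sum(scores) / len(scores)
    return likelihood >= threshold
```

The patent leaves the aggregation function unspecified; a mean over frame scores is only one plausible choice.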
  • Publication number: 20210350810
    Abstract: Systems and processes for operating an intercom system via a digital assistant are provided. The intercom system is trigger-free, in that users communicate, in real-time, via devices without employing a trigger to speak. Acoustic fingerprints are employed to associate users with devices. Acoustic fingerprints include vector embeddings of speech input in an acoustic-feature vector space. Speech heard at multiple devices, as embedded in a fingerprint, may be clustered in the vector space, and the structure of the clusters is employed to associate users and devices. Based on the fingerprints, a device is mapped to a user, and the user employs that device to participate in a conversation, via the intercom service.
    Type: Application
    Filed: October 16, 2020
    Publication date: November 11, 2021
    Inventors: Benjamin S. PHIPPS, Sachin KAJAREKAR, Eugene RAY, Mahesh Ramaray SHANBHAG, Kisun YOU, Patrick L. COFFMAN
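The intercom abstract maps devices to users by comparing vector embeddings (acoustic fingerprints). A toy sketch of that association step, assuming fingerprints are already available as plain vectors; the names and nearest-neighbor rule are illustrative, not the patented clustering:

```python
import math

def map_devices_to_users(user_fingerprints, device_fingerprints):
    """For each device, pick the enrolled user whose acoustic fingerprint
    (a vector embedding in the acoustic-feature space) lies closest to the
    speech heard at that device."""
    mapping = {}
    for device, vec in device_fingerprints.items():
        mapping[device] = min(
            user_fingerprints,
            key=lambda user: math.dist(user_fingerprints[user], vec))
    return mapping
```

With users `{"alice": [0, 0], "bob": [10, 10]}` and a kitchen device hearing speech near `[1, 1]`, the kitchen maps to alice.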
  • Publication number: 20210248804
    Abstract: Systems and processes for animating an avatar are provided. An example process of animating an avatar includes at an electronic device having one or more processors and memory, receiving text, determining an emotional state, and generating, using a neural network, a speech data set representing the received text and a set of parameters representing one or more movements of an avatar based on the received text and the determined emotional state.
    Type: Application
    Filed: January 20, 2021
    Publication date: August 12, 2021
    Inventors: Ahmed Serag El Din HUSSEN ABDELAZIZ, Justin BINDER, Sachin KAJAREKAR, Anushree PRASANNA KUMAR, Chloé Ann SEIVWRIGHT
  • Publication number: 20210090314
    Abstract: Systems and methods for animating an avatar are provided. An example method of animating an avatar includes at an electronic device having one or more processors and memory, receiving an audio input, receiving a video input including at least a portion of a user's face, wherein the video input is separate from the audio input, determining one or more movements of the user's face based on the received audio input and received video input, and generating, using a neural network separately trained with a set of audio training data and a set of video training data, a set of characteristics for controlling an avatar representing the one or more movements of the user's face.
    Type: Application
    Filed: December 20, 2019
    Publication date: March 25, 2021
    Inventors: Ahmed Serag El Din HUSSEN ABDELAZIZ, Nicholas APOSTOLOFF, Justin BINDER, Paul Richard DIXON, Sachin KAJAREKAR, Reinhard KNOTHE, Sebastian MARTIN, Barry-John THEOBALD, Thibaut WEISE
  • Patent number: 10186282
    Abstract: Systems and processes for robust end-pointing of speech signals using speaker recognition are provided. In one example process, a stream of audio having a spoken user request can be received. A first likelihood that the stream of audio includes user speech can be determined. A second likelihood that the stream of audio includes user speech spoken by an authorized user can be determined. A start-point or an end-point of the spoken user request can be determined based at least in part on the first likelihood and the second likelihood.
    Type: Grant
    Filed: April 30, 2015
    Date of Patent: January 22, 2019
    Assignee: Apple Inc.
    Inventors: Devang K. Naik, Sachin Kajarekar
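The endpointing abstract combines two likelihoods, speech presence and authorized-speaker match, to locate start- and end-points. A minimal sketch under the assumption that both likelihoods are available per frame and are fused by multiplication (the patent only says "based at least in part on" both):

```python
def find_endpoints(frames, speech_model, speaker_model, threshold=0.5):
    """Fuse the per-frame likelihood of user speech with the likelihood
    that the speech comes from the authorized user; the first and last
    frames whose combined score clears the threshold give the start-point
    and end-point of the spoken request."""
    combined = [speech_model(f) * speaker_model(f) for f in frames]
    active = [i for i, score in enumerate(combined) if score >= threshold]
    if not active:
        return None
    return active[0], active[-1]
```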
  • Publication number: 20150371665
    Abstract: Systems and processes for robust end-pointing of speech signals using speaker recognition are provided. In one example process, a stream of audio having a spoken user request can be received. A first likelihood that the stream of audio includes user speech can be determined. A second likelihood that the stream of audio includes user speech spoken by an authorized user can be determined. A start-point or an end-point of the spoken user request can be determined based at least in part on the first likelihood and the second likelihood.
    Type: Application
    Filed: April 30, 2015
    Publication date: December 24, 2015
    Inventors: Devang K. NAIK, Sachin KAJAREKAR
  • Patent number: 9058806
    Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: June 16, 2015
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Ananth Sankar, Sachin Kajarekar, Satish K. Gannu
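The diarization abstract constrains recognition to an approximate list of potential speakers. A sketch of the labeling step, assuming segments and candidate speaker models are both represented as embeddings and matched by distance (the actual modeling is unspecified in the abstract):

```python
import math

def recognize_speakers(segment_vectors, candidate_models):
    """candidate_models: speaker name -> mean embedding, built from the
    approximate list of potential speakers. Each segment is labeled with
    the closest candidate, so recognition never strays outside the list."""
    return [min(candidate_models,
                key=lambda name: math.dist(candidate_models[name], vec))
            for vec in segment_vectors]
```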
  • Patent number: 8902274
    Abstract: A method is provided and includes discovering active participants and passive participants from a meeting recording, generating an active notification that includes an option to manipulate the meeting recording and a passive notification without the option to manipulate the meeting recording, and sending the active notification and the passive notification to the active participants and the passive participants, respectively. The method can also include discovering followers from the meeting recording, generating a followers notification that omits the option to manipulate the meeting recording but includes access to a portion of the meeting recording, and sending the followers notification to the followers.
    Type: Grant
    Filed: December 4, 2012
    Date of Patent: December 2, 2014
    Assignee: Cisco Technology, Inc.
    Inventors: Ashutosh A. Malegaonkar, Paul Quinn, Sachin Kajarekar
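The notification logic above distinguishes three roles. A minimal sketch of that per-role policy, with hypothetical field names (`can_edit`, `access`) standing in for the "option to manipulate" and "portion of the recording":

```python
def build_notifications(participants):
    """participants: name -> role ('active', 'passive', or 'follower').
    Active participants get the option to manipulate the recording;
    passive participants and followers do not, and followers see only
    a portion of the recording."""
    notifications = {}
    for name, role in participants.items():
        notifications[name] = {
            "can_edit": role == "active",
            "access": "partial" if role == "follower" else "full",
        }
    return notifications
```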
  • Patent number: 8886011
    Abstract: An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, and counting a number of questions in the video bitstream, among other features.
    Type: Grant
    Filed: December 7, 2012
    Date of Patent: November 11, 2014
    Assignee: Cisco Technology, Inc.
    Inventors: Jim Chen Chou, Ananth Sankar, Sachin Kajarekar
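The question-tagging abstract detects questions in decoded audio and tags their locations, using keywords to characterize content. A toy sketch over a timestamped transcript; the interrogative-word heuristic is an illustrative stand-in for the patent's detection method:

```python
QUESTION_WORDS = {"what", "why", "how", "when", "where", "who"}

def tag_questions(transcript_segments):
    """transcript_segments: list of (timestamp, text) pairs decoded from the
    audio portion of the bitstream. Returns tags marking where questions
    occur, with the keywords that characterize each question's content."""
    tags = []
    for ts, text in transcript_segments:
        words = text.lower().rstrip("?").split()
        if text.strip().endswith("?") or (words and words[0] in QUESTION_WORDS):
            keywords = [w for w in words if w in QUESTION_WORDS]
            tags.append({"time": ts, "keywords": keywords})
    return tags
```

Counting questions, as the abstract mentions, is then just `len(tag_questions(segments))`.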
  • Patent number: 8831403
    Abstract: An example method includes receiving a search query that includes one or more attributes; evaluating a plurality of video files; identifying video clips within the video files that have one or more of the search attributes; and creating a video report comprising a contiguous sequence of the video clips, where the video clips are stitched together according to a stitch criterion. In more particular embodiments, the method can include providing a user interface configured for receiving feedback associated with the plurality of video files. Additionally, the method may include tagging the video files with tags corresponding to predefined attributes; and identifying the predefined attributes in response to the search query. Furthermore, the method can include matching the tags with the one or more search attributes, where at least one video clip in a particular one of the video files has at least some of the one or more search attributes.
    Type: Grant
    Filed: February 1, 2012
    Date of Patent: September 9, 2014
    Assignee: Cisco Technology, Inc.
    Inventors: Deepti Patil, Satish K. Gannu, Sachin Kajarekar
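The tag-matching and stitching steps above can be sketched in a few lines, assuming clips carry a `tags` list and a `start` time, and taking "earliest start time first" as one possible stitch criterion (the patent leaves the criterion open):

```python
def stitch_report(video_clips, search_attributes):
    """Select clips whose tags match any search attribute, then stitch the
    matches into one contiguous sequence ordered by start time."""
    matched = [clip for clip in video_clips
               if set(clip["tags"]) & set(search_attributes)]
    return sorted(matched, key=lambda clip: clip["start"])
```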
  • Publication number: 20140161416
    Abstract: An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, and counting a number of questions in the video bitstream, among other features.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 12, 2014
    Applicant: Cisco Technology, Inc.
    Inventors: Jim Chen Chou, Ananth Sankar, Sachin Kajarekar
  • Publication number: 20140152757
    Abstract: A method is provided and includes discovering active participants and passive participants from a meeting recording, generating an active notification that includes an option to manipulate the meeting recording and a passive notification without the option to manipulate the meeting recording, and sending the active notification and the passive notification to the active participants and the passive participants, respectively. The method can also include discovering followers from the meeting recording, generating a followers notification that omits the option to manipulate the meeting recording but includes access to a portion of the meeting recording, and sending the followers notification to the followers.
    Type: Application
    Filed: December 4, 2012
    Publication date: June 5, 2014
    Inventors: Ashutosh A. Malegaonkar, Paul Quinn, Sachin Kajarekar
  • Publication number: 20140074471
    Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
    Type: Application
    Filed: September 10, 2012
    Publication date: March 13, 2014
    Applicant: CISCO TECHNOLOGY, INC.
    Inventors: Ananth Sankar, Sachin Kajarekar, Satish K. Gannu
  • Publication number: 20130300939
    Abstract: An example method is provided and includes receiving a media file that includes video data and audio data; determining an initial scene sequence in the media file; determining an initial speaker sequence in the media file; and updating a selected one of the initial scene sequence and the initial speaker sequence in order to generate an updated scene sequence and an updated speaker sequence, respectively. The initial scene sequence is updated based on the initial speaker sequence, and the initial speaker sequence is updated based on the initial scene sequence.
    Type: Application
    Filed: May 11, 2012
    Publication date: November 14, 2013
    Inventors: Jim Chen Chou, Sachin Kajarekar, Jason J. Catchpole, Ananth Sankar
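The mutual scene/speaker update above can be sketched as one round of boundary snapping, where each sequence's change-points are pulled toward the other's. The snap-to-nearest rule and tolerance are illustrative assumptions, not the patented update:

```python
def snap(boundaries, anchors, tol=1.0):
    """Move each boundary (in seconds) onto the nearest anchor when the
    anchor lies within `tol` seconds; otherwise leave it unchanged."""
    out = []
    for b in boundaries:
        nearest = min(anchors, key=lambda a: abs(a - b))
        out.append(nearest if abs(nearest - b) <= tol else b)
    return out

def refine(scene_bounds, speaker_bounds, tol=1.0):
    """One round of the mutual update: scene cuts snap to nearby speaker
    changes, then speaker changes snap to the updated scene cuts."""
    scenes = snap(scene_bounds, speaker_bounds, tol)
    speakers = snap(speaker_bounds, scenes, tol)
    return scenes, speakers
```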
  • Publication number: 20130195422
    Abstract: An example method includes receiving a search query that includes one or more attributes; evaluating a plurality of video files; identifying video clips within the video files that have one or more of the search attributes; and creating a video report comprising a contiguous sequence of the video clips, where the video clips are stitched together according to a stitch criterion. In more particular embodiments, the method can include providing a user interface configured for receiving feedback associated with the plurality of video files. Additionally, the method may include tagging the video files with tags corresponding to predefined attributes; and identifying the predefined attributes in response to the search query. Furthermore, the method can include matching the tags with the one or more search attributes, where at least one video clip in a particular one of the video files has at least some of the one or more search attributes.
    Type: Application
    Filed: February 1, 2012
    Publication date: August 1, 2013
    Inventors: Deepti Patil, Satish K. Gannu, Sachin Kajarekar
  • Publication number: 20130144414
    Abstract: In one embodiment, an audio stream is partitioned into a plurality of segments such that the plurality of segments are clustered into one or more clusters, each of the one or more clusters identifying a subset of the plurality of segments in the audio stream and corresponding to one of a first set of one or more speaker models, each speaker model in the first set of speaker models representing one of a first set of hypothetical speakers. The speaker models in the first set of speaker models are compared with a second set of one or more speaker models, where each speaker model in the second set of speaker models represents one of a second set of hypothetical speakers. Labels associated with one or more speaker models in the second set of speaker models are propagated to one or more speaker models in the first set of speaker models according to a result of the comparing step.
    Type: Application
    Filed: December 6, 2011
    Publication date: June 6, 2013
    Applicant: Cisco Technology, Inc.
    Inventors: Sachin Kajarekar, Ananth Sankar, Satish Gannu, Aparna Khare
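The label-propagation step above, comparing a new set of per-cluster speaker models against a labeled set and carrying labels across, can be sketched as nearest-neighbor matching over embeddings. The distance metric and cutoff are illustrative assumptions:

```python
import math

def propagate_labels(new_models, labeled_models, max_dist=2.0):
    """new_models: cluster id -> embedding for the current audio stream.
    labeled_models: known speaker label -> embedding. Each new cluster
    inherits the label of the closest known model, but only when that
    model is within `max_dist` (otherwise the cluster stays unlabeled)."""
    labels = {}
    for cid, vec in new_models.items():
        name = min(labeled_models,
                   key=lambda n: math.dist(labeled_models[n], vec))
        if math.dist(labeled_models[name], vec) <= max_dist:
            labels[cid] = name
    return labels
```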
  • Publication number: 20110153326
    Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection module (VAD) that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector including predetermined portions of the voice activity detection indication and extracted features from the subscriber unit to the server. The system also includes a module to generate additional feature vectors on the server from the received features using a feed-forward multilayer perceptron (MLP) and providing the same to the speech server.
    Type: Application
    Filed: February 9, 2011
    Publication date: June 23, 2011
    Applicant: QUALCOMM INCORPORATED
    Inventors: HARINATH GARUDADRI, HYNEK HERMANSKY, LUKAS BURGET, PRATIBHA JAIN, SACHIN KAJAREKAR, SUNIL SIVADAS, STEPHANE N. DUPONT, MARIA CARMEN BENITEZ ORTUZAR, NELSON H. MORGAN
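The front end described above includes framing/windowing and a voice activity detection (VAD) module whose indication travels to the server alongside the extracted features. A toy sketch of those two stages; the frame sizes and the energy heuristic are illustrative, not the patented front end:

```python
def frame_signal(samples, frame_len=160, hop=80):
    """Split a sample stream into overlapping frames, the framing stage
    of the front end (160 samples / 80-sample hop is a common choice at
    an 8 kHz sampling rate)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def energy_vad(frames, threshold=0.01):
    """A toy energy-based voice activity decision per frame, standing in
    for the VAD module's indication."""
    return [sum(s * s for s in f) / len(f) >= threshold for f in frames]
```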
  • Publication number: 20080010065
    Abstract: A method and apparatus for speaker recognition is provided. One embodiment of a method for determining whether a given speech signal is produced by an alleged speaker, where a plurality of statistical models (including at least one support vector machine) have been produced for the alleged speaker based on a previous speech signal received from the alleged speaker, includes receiving the given speech signal, the speech signal representing an utterance made by a speaker claiming to be the alleged speaker, scoring the given speech signal using at least two modeling systems, where at least one of the modeling systems is a support vector machine, combining scores produced by the modeling systems, with equal weights, to produce a final score, and determining, in accordance with the final score, whether the speaker is likely the alleged speaker.
    Type: Application
    Filed: June 5, 2007
    Publication date: January 10, 2008
    Inventors: Harry BRATT, Luciana Ferrer, Martin Graciarena, Sachin Kajarekar, Elizabeth Shriberg, Mustafa Sonmez, Andreas Stolcke, Gokhan Tur, Anand Venkataraman
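The abstract above scores a speech signal with at least two modeling systems and combines the scores with equal weights into a final accept/reject decision. That fusion step is simple enough to sketch directly; the acceptance threshold is an illustrative assumption:

```python
def fuse_scores(system_scores):
    """Combine per-system speaker-verification scores with equal weights,
    as the abstract describes, to produce the final score."""
    weight = 1.0 / len(system_scores)
    return sum(weight * s for s in system_scores)

def is_alleged_speaker(system_scores, accept_threshold=0.0):
    """Decide whether the speaker is likely the alleged speaker by
    comparing the fused score against a threshold."""
    return fuse_scores(system_scores) >= accept_threshold
```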
  • Patent number: 7089178
    Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station, and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provides benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.
    Type: Grant
    Filed: April 30, 2002
    Date of Patent: August 8, 2006
    Assignee: Qualcomm Inc.
    Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek
  • Publication number: 20030204394
    Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station, and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provides benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.
    Type: Application
    Filed: April 30, 2002
    Publication date: October 30, 2003
    Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek