Patents by Inventor Sachin S. Kajarekar

Sachin S. Kajarekar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230368783
    Abstract: An example process includes: receiving a speech input representing a user utterance; determining, based on a textual representation of the speech input, a first score corresponding to a type of the user utterance; determining, based on the textual representation of the speech input, a second score representing a correspondence between the user utterance and a domain recognized by a digital assistant; determining, based on the first score and the second score, whether the speech input is intended for the digital assistant; in accordance with a determination that the speech input is intended for the digital assistant: initiating, by the digital assistant, a task based on the speech input; and providing an output indicative of the initiated task.
    Type: Application
    Filed: September 23, 2022
    Publication date: November 16, 2023
    Inventors: Eric MARCHI, Ognjen RUDOVIC, Pranay DIGHE, Sachin S. KAJAREKAR, Saurabh ADYA, Barry-John THEOBALD, Seyedmahdad MIRSAMADI, Ahmed S. HUSSEN ABDELAZIZ
  • Publication number: 20230368812
    Abstract: An example process includes: receiving a speech input representing a user utterance; determining, based on a textual representation of the speech input, a first score corresponding to a type of the user utterance; determining, based on the textual representation of the speech input, a second score representing a correspondence between the user utterance and a domain recognized by a digital assistant; determining, based on the first score and the second score, whether the speech input is intended for the digital assistant; in accordance with a determination that the speech input is intended for the digital assistant: initiating, by the digital assistant, a task based on the speech input; and providing an output indicative of the initiated task.
    Type: Application
    Filed: September 23, 2022
    Publication date: November 16, 2023
    Inventors: Eric MARCHI, Ognjen RUDOVIC, Sachin S. KAJAREKAR, Saurabh ADYA, Barry-John THEOBALD, Ahmed S. HUSSEN ABDELAZIZ
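    Illustrative sketch (applies to 20230368783 and 20230368812 above): a minimal Python rendering of the claimed two-score gating, in which an utterance-type score and a domain-correspondence score are combined to decide whether speech is intended for the assistant. The score functions, equal weighting, and 0.5 threshold are assumptions for illustration, not values from the filings.

        # Hypothetical utterance-type score: device-directed commands score high.
        def utterance_type_score(text: str) -> float:
            directive_words = {"play", "call", "set", "open", "remind"}
            words = text.lower().split()
            return 1.0 if words and words[0] in directive_words else 0.2

        # Hypothetical domain-correspondence score against toy assistant domains.
        def domain_match_score(text: str) -> float:
            domains = {"music": {"play", "song"}, "phone": {"call", "dial"}}
            tokens = set(text.lower().split())
            return max(len(tokens & kw) / len(kw) for kw in domains.values())

        def is_intended_for_assistant(text: str, threshold: float = 0.5) -> bool:
            # A simple average stands in for whatever combination is claimed.
            combined = 0.5 * utterance_type_score(text) + 0.5 * domain_match_score(text)
            return combined > threshold

        if __name__ == "__main__":
            for utterance in ("play some jazz", "I had lunch with Sam"):
                print(utterance, "->", is_intended_for_assistant(utterance))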
  • Patent number: 11423898
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. An example method includes receiving, from one or more external electronic devices, a plurality of speaker profiles for a plurality of users; receiving a natural language speech input; determining, based on comparing the natural language speech input to the plurality of speaker profiles: a first likelihood that the natural language speech input corresponds to a first user of the plurality of users; and a second likelihood that the natural language speech input corresponds to a second user of the plurality of users; determining whether the first likelihood and the second likelihood are within a first threshold; and in accordance with determining that the first likelihood and the second likelihood are not within the first threshold: providing a response to the natural language speech input, the response being personalized for the first user.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: August 23, 2022
    Assignee: Apple Inc.
    Inventors: Stephen H. Shum, Corey J. Peterson, Sachin S. Kajarekar, Benjamin S. Phipps, Erik Marchi, Jessica Peck, Anumita Biswas, Chaitanya Mannemala
  • Publication number: 20200380980
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. An example method includes receiving, from one or more external electronic devices, a plurality of speaker profiles for a plurality of users; receiving a natural language speech input; determining, based on comparing the natural language speech input to the plurality of speaker profiles: a first likelihood that the natural language speech input corresponds to a first user of the plurality of users; and a second likelihood that the natural language speech input corresponds to a second user of the plurality of users; determining whether the first likelihood and the second likelihood are within a first threshold; and in accordance with determining that the first likelihood and the second likelihood are not within the first threshold: providing a response to the natural language speech input, the response being personalized for the first user.
    Type: Application
    Filed: March 11, 2020
    Publication date: December 3, 2020
    Inventors: Stephen H. SHUM, Corey J. PETERSON, Sachin S. KAJAREKAR, Benjamin S. PHIPPS, Erik MARCHI, Jessica PECK, Anumita BISWAS, Chaitanya MANNEMALA
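    Illustrative sketch (applies to 11423898 and 20200380980 above): the claimed check is that a personalized response is given only when the top two speaker likelihoods are not within a threshold of each other. Modeling speaker profiles as embedding vectors, likelihood as cosine similarity, and the 0.1 margin are assumptions.

        import numpy as np

        def cosine(a: np.ndarray, b: np.ndarray) -> float:
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        def respond(utterance_emb, profiles, margin=0.1):
            """profiles: dict of user -> profile embedding (needs at least two users)."""
            scored = sorted(
                ((cosine(utterance_emb, emb), user) for user, emb in profiles.items()),
                reverse=True,
            )
            (first, user1), (second, _) = scored[0], scored[1]
            # Personalize only when the best match wins by a clear gap, i.e. the
            # two likelihoods are NOT within the margin.
            if first - second > margin:
                return f"personalized response for {user1}"
            return "non-personalized response"

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            profiles = {"alice": rng.normal(size=8), "bob": rng.normal(size=8)}
            print(respond(profiles["alice"] + 0.01, profiles))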
  • Publication number: 20200312315
    Abstract: An acoustic environment aware method for selecting a high quality audio stream during multi-stream speech recognition. A number of input audio streams are processed to determine whether a voice trigger is detected and, if so, a voice trigger score is calculated for each stream. An acoustic environment measurement is also calculated for each audio stream. The trigger score and acoustic environment measurement are combined for each audio stream, and the audio stream with the highest combined score is selected as the preferred audio stream. The preferred audio stream is output to an automatic speech recognizer. Other aspects are also described and claimed.
    Type: Application
    Filed: March 28, 2019
    Publication date: October 1, 2020
    Inventors: Feipeng Li, Mehrez Souden, Joshua D. Atkins, John Bridle, Charles P. Clark, Stephen H. Shum, Sachin S. Kajarekar, Haiying Xia, Erik Marchi
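    Illustrative sketch (for 20200312315 above): each stream that fires the voice trigger gets a combined score from its trigger confidence and acoustic-environment measurement, and the highest-scoring stream is forwarded to the recognizer. The scoring callables and the equal weighting are assumptions.

        def select_preferred_stream(streams, trigger_score, environment_score, w=0.5):
            """streams: iterable of audio buffers; returns the preferred stream or None."""
            best, best_score = None, float("-inf")
            for stream in streams:
                t = trigger_score(stream)        # e.g., wake-word detector confidence
                if t is None:                    # no voice trigger detected in this stream
                    continue
                e = environment_score(stream)    # e.g., normalized SNR estimate
                combined = w * t + (1 - w) * e
                if combined > best_score:
                    best, best_score = stream, combined
            return best  # the caller hands this stream to the automatic speech recognizer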
  • Patent number: 10789959
    Abstract: Techniques for training a speaker recognition model used for interacting with a digital assistant are provided. In some examples, user authentication information is obtained at a first time. At a second time, a user utterance representing a user request is received. A voice print is generated from the user utterance. A determination is made as to whether a plurality of conditions are satisfied. The plurality of conditions includes a first condition that the user authentication information corresponds to one or more authentication credentials assigned to a registered user of an electronic device. The plurality of conditions further includes a second condition that the first time and the second time are not separated by more than a predefined time period. In accordance with a determination that the plurality of conditions are satisfied, a speaker profile assigned to the registered user is updated based on the voice print.
    Type: Grant
    Filed: June 4, 2018
    Date of Patent: September 29, 2020
    Assignee: Apple Inc.
    Inventor: Sachin S. Kajarekar
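    Illustrative sketch (for 10789959 above and the related publication 20190272831 below): the speaker profile is updated only when both claimed conditions hold, namely that the credentials match a registered user and that authentication and utterance are close in time. The 30-second window and the embedding-averaging update are assumptions.

        from datetime import datetime, timedelta

        MAX_GAP = timedelta(seconds=30)  # assumed stand-in for the predefined time period

        def maybe_update_profile(auth_user, registered_users, auth_time: datetime,
                                 utterance_time: datetime, voice_print, profiles):
            credentials_ok = auth_user in registered_users          # condition 1
            timely = abs(utterance_time - auth_time) <= MAX_GAP     # condition 2
            if credentials_ok and timely:
                old = profiles.get(auth_user, voice_print)
                # A naive running average of embeddings stands in for the real update.
                profiles[auth_user] = [(a + b) / 2 for a, b in zip(old, voice_print)]
                return True
            return False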
  • Patent number: 10438595
    Abstract: Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
    Type: Grant
    Filed: October 9, 2018
    Date of Patent: October 8, 2019
    Assignee: Apple Inc.
    Inventors: Yoon Kim, Sachin S. Kajarekar
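    Illustrative sketch (for 10438595 above; the same abstract appears under 20190051309, 10127911, and 20160093304 below): matching speech enrolls into the predetermined user's profile and triggers the assistant, while non-matching speech enrolls into an alternate profile without triggering. The similarity callable and 0.7 threshold are assumptions.

        def handle_speech(embedding, user_profile, alt_profile, similarity, thresh=0.7):
            if similarity(embedding, user_profile) >= thresh:
                user_profile.append(embedding)   # grow the predetermined user's profile
                return True                      # trigger the virtual assistant
            alt_profile.append(embedding)        # track other speakers separately
            return False                         # do not trigger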
  • Publication number: 20190272831
    Abstract: Techniques for training a speaker recognition model used for interacting with a digital assistant are provided. In some examples, user authentication information is obtained at a first time. At a second time, a user utterance representing a user request is received. A voice print is generated from the user utterance. A determination is made as to whether a plurality of conditions are satisfied. The plurality of conditions includes a first condition that the user authentication information corresponds to one or more authentication credentials assigned to a registered user of an electronic device. The plurality of conditions further includes a second condition that the first time and the second time are not separated by more than a predefined time period. In accordance with a determination that the plurality of conditions are satisfied, a speaker profile assigned to the registered user is updated based on the voice print.
    Type: Application
    Filed: June 4, 2018
    Publication date: September 5, 2019
    Inventor: Sachin S. KAJAREKAR
  • Patent number: 10255907
    Abstract: Systems and processes for automatic accent detection are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a user input, determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, and determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models. The method further includes determining whether the first similarity is greater than the second similarity. In accordance with a determination that the first similarity is greater than the second similarity, the first acoustic model may be selected; and in accordance with a determination that the first similarity is not greater than the second similarity, the second acoustic model may be selected.
    Type: Grant
    Filed: September 4, 2015
    Date of Patent: April 9, 2019
    Assignee: Apple Inc.
    Inventors: Udhyakumar Nallasamy, Sachin S. Kajarekar, Matthias Paulik, Matthew Seigel
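    Illustrative sketch (for 10255907 above; the same abstract appears under 20160358600 below): the input is scored against two acoustic models and the more similar model is selected. Treating each model as a single Gaussian and similarity as log-likelihood is an assumption.

        import math

        def log_likelihood(x: float, mean: float, var: float) -> float:
            return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

        def select_acoustic_model(feature: float, model_a, model_b):
            """Each model is a (mean, variance) pair; returns the better match."""
            sim_a = log_likelihood(feature, *model_a)
            sim_b = log_likelihood(feature, *model_b)
            # Per the claim language, model A is chosen only when strictly greater.
            return model_a if sim_a > sim_b else model_b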
  • Publication number: 20190051309
    Abstract: Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
    Type: Application
    Filed: October 9, 2018
    Publication date: February 14, 2019
    Inventors: Yoon KIM, Sachin S. KAJAREKAR
  • Patent number: 10186254
    Abstract: The present disclosure generally relates to context-based endpoint detection in user speech input. A method for identifying an endpoint of a spoken request by a user may include receiving user input of natural language speech including one or more words; identifying at least one context associated with the user input; generating a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determining whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identifying the location in the user input as the endpoint.
    Type: Grant
    Filed: September 4, 2015
    Date of Patent: January 22, 2019
    Assignee: Apple Inc.
    Inventors: Shaun E. Williams, Henry G. Mason, Mahesh Krishnamoorthy, Matthias Paulik, Neha Agrawal, Sachin S. Kajarekar, Selen Uguroglu, Ali S. Mohamed
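    Illustrative sketch (for 10186254 above; the same abstract appears under 20160358598 below): the endpoint probability at a candidate location is conditioned on context and compared to a threshold. The context adjustments, base probability, and 0.6 threshold are invented for illustration.

        CONTEXT_BOOST = {"dictation": -0.2, "command": 0.2, "search": 0.1}

        def is_endpoint(base_prob: float, contexts, threshold: float = 0.6) -> bool:
            prob = base_prob + sum(CONTEXT_BOOST.get(c, 0.0) for c in contexts)
            prob = min(max(prob, 0.0), 1.0)  # clamp to a valid probability
            return prob > threshold

        # The same pause is an endpoint after a short command but not mid-dictation:
        assert is_endpoint(0.65, ["command"]) and not is_endpoint(0.65, ["dictation"])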
  • Patent number: 10127911
    Abstract: Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
    Type: Grant
    Filed: August 25, 2015
    Date of Patent: November 13, 2018
    Assignee: Apple Inc.
    Inventors: Yoon Kim, Sachin S. Kajarekar
  • Publication number: 20170365249
    Abstract: A method of performing automatic speech recognition (ASR) using end-pointing markers generated by an accelerometer-based voice activity detector starts with a voice activity detector (VAD) generating an accelerometer VAD output (VADa) based on data output by at least one accelerometer that is included in at least one earbud. The at least one accelerometer detects vibration of the user's vocal cords. A voice processor detects a speech signal based on acoustic signals from at least one microphone. An end-pointer generates the end-pointing markers based on the VADa output, and an ASR engine performs ASR on the speech signal based on the end-pointing markers. Other embodiments are also described.
    Type: Application
    Filed: June 21, 2016
    Publication date: December 21, 2017
    Inventors: Sorin V. Dusan, Devang K. Naik, Sachin S. Kajarekar
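    Illustrative sketch (for 20170365249 above): an accelerometer-based VAD marks frames where the wearer's vocal cords vibrate, and those markers bound the region handed to the ASR engine. The per-frame energy representation and 0.05 threshold are assumptions.

        def vad_from_accelerometer(frames, energy_thresh=0.05):
            """frames: per-frame accelerometer energy values -> VADa bits."""
            return [1 if e > energy_thresh else 0 for e in frames]

        def end_point_markers(vada):
            """Return (start, end) frame indices of the voiced region, or None."""
            try:
                start = vada.index(1)
            except ValueError:
                return None
            end = len(vada) - vada[::-1].index(1)
            return start, end  # ASR decodes the speech signal between these markers

        print(end_point_markers(vad_from_accelerometer([0.0, 0.2, 0.3, 0.01])))  # (1, 3)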
  • Publication number: 20160358598
    Abstract: The present disclosure generally relates to context-based endpoint detection in user speech input. A method for identifying an endpoint of a spoken request by a user may include receiving user input of natural language speech including one or more words; identifying at least one context associated with the user input; generating a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determining whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identifying the location in the user input as the endpoint.
    Type: Application
    Filed: September 4, 2015
    Publication date: December 8, 2016
    Inventors: Shaun E. WILLIAMS, Henry G. MASON, Mahesh KRISHNAMOORTHY, Matthias PAULIK, Neha AGRAWAL, Sachin S. KAJAREKAR, Selen UGUROGLU, Ali S. MOHAMED
  • Publication number: 20160358600
    Abstract: Systems and processes for automatic accent detection are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a user input, determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, and determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models. The method further includes determining whether the first similarity is greater than the second similarity. In accordance with a determination that the first similarity is greater than the second similarity, the first acoustic model may be selected; and in accordance with a determination that the first similarity is not greater than the second similarity, the second acoustic model may be selected.
    Type: Application
    Filed: September 4, 2015
    Publication date: December 8, 2016
    Inventors: Udhyakumar NALLASAMY, Sachin S. KAJAREKAR, Matthias PAULIK, Matthew SEIGEL
  • Publication number: 20160093304
    Abstract: Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
    Type: Application
    Filed: August 25, 2015
    Publication date: March 31, 2016
    Inventors: Yoon KIM, Sachin S. KAJAREKAR
  • Patent number: 9282284
    Abstract: Videoconferencing may be provided. A participant may be identified from audio information and in video information. From the video information, a plurality of images may be captured of the participant identified in the video information. A unique identifier may be associated with the captured plurality of images. The unique identifier may correspond to the participant identified from the audio information. The captured plurality of images and the associated unique identifier may be saved in a database.
    Type: Grant
    Filed: May 20, 2013
    Date of Patent: March 8, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Sachin S. Kajarekar, Mainak Sen
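    Illustrative sketch (for 9282284 above): face images captured from the video stream are stored in a database under a unique identifier corresponding to the participant identified from the audio. The in-memory dict standing in for the database and the UUID scheme are assumptions.

        import uuid

        def index_participant(audio_identity: str, face_crops, database: dict) -> str:
            """face_crops: images of the participant captured from the video stream."""
            unique_id = str(uuid.uuid5(uuid.NAMESPACE_DNS, audio_identity))
            database.setdefault(unique_id, []).extend(face_crops)
            return unique_id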
  • Patent number: 9240181
    Abstract: An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment labeled with the corresponding speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the transcription, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the transcription are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: January 19, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Aparna Khare, Neha Agrawal, Sachin S. Kajarekar, Matthias Paulik
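    Illustrative sketch (for 9240181 above; the same abstract appears under 20150058005 below): low-confidence ASR word regions that NER rules flag as likely names are replaced with the speaker name from the overlapping SSR segment. The data shapes and 0.5 confidence cutoff are assumptions.

        def recover_names(asr_regions, ssr_segments, looks_like_name, low_conf=0.5):
            """asr_regions: list of (start_time, word, confidence);
            ssr_segments: list of (start_time, end_time, speaker_name)."""
            def speaker_at(t):
                for s, e, name in ssr_segments:
                    if s <= t < e:
                        return name
                return None

            out = []
            for start, word, conf in asr_regions:
                if conf < low_conf and looks_like_name(word):
                    name = speaker_at(start)
                    if name is not None:
                        # NER rule: substitute the segment speaker's name. A fuller
                        # version would also consult the previous and next segments.
                        word = name
                out.append((start, word))
            return out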
  • Patent number: 9165182
    Abstract: In one embodiment, a method includes obtaining media that includes a video stream and an audio stream. The method also includes detecting a number of faces visible in the video stream, and performing a speaker segmentation on the media. Performing the speaker segmentation on the media includes utilizing the number of faces visible in the video stream to augment the speaker segmentation.
    Type: Grant
    Filed: August 19, 2013
    Date of Patent: October 20, 2015
    Assignee: Cisco Technology, Inc.
    Inventors: Sachin S. Kajarekar, Mainak Sen
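    Illustrative sketch (for 9165182 above): the face count from the video stream bounds the number of speakers the audio-only segmentation may hypothesize. The capping rule is an assumed form of the claimed augmentation.

        def cap_speaker_count(audio_speaker_estimate: int, faces_visible: int) -> int:
            # Never hypothesize more speakers than faces visible in the frame,
            # but keep at least one speaker when the audio says someone is talking.
            if audio_speaker_estimate == 0:
                return 0
            return max(1, min(audio_speaker_estimate, faces_visible))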
  • Publication number: 20150058005
    Abstract: An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment labeled with the corresponding speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the transcription, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the transcription are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.
    Type: Application
    Filed: August 20, 2013
    Publication date: February 26, 2015
    Applicant: Cisco Technology, Inc.
    Inventors: Aparna Khare, Neha Agrawal, Sachin S. Kajarekar, Matthias Paulik