Patents by Inventor Vijendra Raj Apsingekar

Vijendra Raj Apsingekar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

AUTOMATIC UPDATING OF AUTOMATIC SPEECH RECOGNITION FOR NAMED ENTITIES

Publication number: 20250149031

Abstract: A method includes identifying, using an automated speech recognition (ASR) system, at least one named entity hypothesis from at least one audio input. The method also can include providing, using the ASR system, the identified at least one named entity to a large language model (LLM). The method also can include generating a prompt using an automated prompt generator. The method also can include processing, using the LLM, the identified at least one named entity hypothesis and the prompt to generate updated named entity recognition data. The method also can include providing the updated named entity recognition data back to the ASR system.

Type: Application

Filed: August 27, 2024

Publication date: May 8, 2025

Inventors: Aditya Jajodia, Akash Sahoo, Patrick Hegarty, Divya Neelagiri, Vijendra Raj Apsingekar
System and method for accent-agnostic frame-level wake word detection

Patent number: 12272357

Abstract: A method includes accessing, using at least one processor of an electronic device, a machine learning model. The machine learning model is a trained student model that is trained using audio samples in a plurality of accent types. The method also includes receiving, using the at least one processor, an audio input from an audio input device. The method further includes providing, using the at least one processor, the audio input to the trained student model. The method also includes receiving, using the at least one processor, an output from the trained student model including frame-level probabilities associated with the audio input. In addition, the method includes instructing, using the at least one processor, at least one action based on the frame-level probabilities associated with the audio input.

Type: Grant

Filed: September 1, 2022

Date of Patent: April 8, 2025

Assignee: Samsung Electronics Co., Ltd.

Inventors: Sivakumar Balasubramanian, Gowtham Srinivasan, Srinivasa Rao Ponakala, Vijendra Raj Apsingekar, Anil Sunder Yadav
JOINT END-TO-END SPOKEN LANGUAGE UNDERSTANDING AND AUTOMATIC SPEECH RECOGNITION

Publication number: 20250078824

Abstract: A method includes receiving an utterance from an audio input device. The method also includes determining a context associated with the utterance. The method also includes providing the utterance as an input to a joint model for automatic speech recognition (ASR) and spoken language understanding (SLU), wherein the joint model operates in a single mode to perform both ASR and SLU or a dual mode to perform one of ASR or SLU depending on the context. The method also includes using an output of the joint model to perform an action requested in the utterance. The joint model is trained by training a shared encoder and a shared decoder using a text-to-text task and, after training the shared encoder and the shared decoder, training a speech encoder and the shared encoder using a speech self-supervised learning (SSL) learning task and a text-to-text task with a masked prediction loss.

Type: Application

Filed: August 23, 2024

Publication date: March 6, 2025

Inventors: Euisung Kim, Yun Tang, Taeyeon Ki, Divya Neelagiri, Vijendra Raj Apsingekar
System and method for improving named entity recognition

Patent number: 12170079

Abstract: A method includes training a set of teacher models. Training the set of teacher models includes, for each individual teacher model of the set of teacher models, training the individual teacher model to transcribe unlabeled audio samples and predict a pseudo labeled dataset having multiple labels. At least some of the unlabeled audio samples contain named entity (NE) audio data. At least some of the labels include transcribed NE labels corresponding to the NE audio data. The method also includes correcting at least some of the transcribed NE labels using user-specific NE textual data. The method further includes retraining the set of teacher models based on the pseudo labeled dataset from a selected one of the teacher models, where the selected one of the teacher models predicts the pseudo labeled dataset more accurately than other teacher models of the set of teacher models.

Type: Grant

Filed: August 3, 2021

Date of Patent: December 17, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventors: Divya Neelagiri, Taeyeon Ki, Vijendra Raj Apsingekar
EFFICIENT ADAPTATION OF SPOKEN LANGUAGE UNDERSTANDING BASED ON AUTOMATIC SPEECH RECOGNITION USING MULTI-TASK LEARNING

Publication number: 20240304179

Abstract: A method includes receiving, by an automatic speech recognition (ASR)-based spoken language understanding (SLU) model, an input utterance using an audio input device. The method also includes, for each token of the input utterance, generating, using a shared ASR encoder of the ASR-based SLU model, an acoustic representation of acoustic features of the token (the shared ASR encoder including a first adapter layer); determining, using an ASR decoder of the ASR-based SLU model, a text representation of the token using the acoustic representation and any previous tokens (the ASR decoder including a second adapter layer); combining, using a fusion model of the ASR-based SLU model, the text representation and the acoustic representation to generate a joint representation, and determining, using an SLU decoder of the ASR-based SLU model, a semantic label associated with the token based on the joint representation and any previous semantic labels.

Type: Application

Filed: March 5, 2024

Publication date: September 12, 2024

Inventors: Euisung Kim, Aditya Jajodia, Cindy Sushen Tseng, Divya Neelagiri, Taeyeon Ki, Vijendra Raj Apsingekar
Method and apparatus for performing speaker diarization on mixed-bandwidth speech signals

Patent number: 12087307

Abstract: An apparatus for processing speech data may include a processor configured to: separate an input speech into speech signals; identify a bandwidth of each of the speech signals; extract speaker embeddings from the speech signals based on the bandwidth of each of the speech signals, using at least one neural network configured to receive the speech signals and output the speaker embeddings; and cluster the speaker embeddings into one or more speaker clusters, each speaker cluster corresponding to a speaker identity.

Type: Grant

Filed: November 30, 2021

Date of Patent: September 10, 2024

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Myungjong Kim, Vijendra Raj Apsingekar, Aviral Anshu, Taeyeon Ki
CONTEXT-AWARE FALSE TRIGGER MITIGATION FOR AUTOMATIC SPEECH RECOGNITION (ASR) SYSTEMS OR OTHER SYSTEMS

Publication number: 20240054999

Abstract: A method includes obtaining an audio input and a location associated with an electronic device. The method also includes generating an audio embedding associated with the audio input. The method further includes determining a first difference between the audio embedding associated with the audio input and an audio embedding associated with a known user. The method also includes determining a second difference between the location associated with the electronic device and a known location associated with the known user. The method further includes generating, using a false trigger mitigation (FTM) system, a probability of the audio input including a false trigger for automatic speech recognition based on the audio input, the first difference, and the second difference. In addition, the method includes determining whether to perform automatic speech recognition based on the probability.

Type: Application

Filed: April 7, 2023

Publication date: February 15, 2024

Inventors: Cindy Sushen Tseng, Srinivasa Rao Ponakala, Myungjong Kim, Taeyeon Ki, Vijendra Raj Apsingekar
THREE-DIMENSIONAL (3D) SOUND RENDERING WITH MULTI-CHANNEL AUDIO BASED ON MONO AUDIO INPUT

Publication number: 20240056761

Abstract: A method includes obtaining video content and associated substantially mono audio content. The method also includes determining at least one of a position or a motion trajectory of each of one or more objects detected in the video content and classifying each of the one or more objects into one of multiple object classes. The method further includes separating audio streams within the audio content based on the video content. Each of the audio streams is associated with one of multiple audio sources. The method also includes classifying each of the audio sources into one of the object classes. In addition, the method includes, for each audio source classified into the same object class as one of the one or more objects, distributing the audio stream associated with that audio source into multiple audio channels based on at least one of the position or the motion trajectory of that object.

Type: Application

Filed: June 15, 2023

Publication date: February 15, 2024

Inventors: Vijendra Raj Apsingekar, Akash Sahoo, Anil S. Yadav, Sivakumar Balasubramanian
SYSTEM AND METHOD FOR COMMAND FULFILLMENT WITHOUT WAKE WORD

Publication number: 20240029723

Abstract: A method comprises obtaining an audio input. The method also includes providing at least a portion of the audio input to a frame-level detector model. The method also includes obtaining a first output of the frame-level detector model including frame-level predictions associated with at least the portion of the audio input. The method also includes providing at least one chunked audio frame to a word-level verifier model. The method also includes obtaining a second output of the word-level verifier model including word-level probabilities associated with the at least one chunked audio frame. The method also includes instructing performance of automatic speech recognition on the audio input based on the word-level probabilities associated with the at least one chunked audio frame.

Type: Application

Filed: September 30, 2022

Publication date: January 25, 2024

Inventors: Sivakumar Balasubramanian, Gowtham Srinivasan, Srinivasa Rao Ponakala, Vijendra Raj Apsingekar, Anil Sunder Yadav
SYSTEM AND METHOD FOR SPEAKER VERIFICATION FOR VOICE ASSISTANT

Publication number: 20230419962

Abstract: A method includes obtaining audio data and identifying an utterance of a wake word or phrase in the audio data. The method also includes generating an embedding vector based on the utterance from the audio data and accessing a set of previously-generated vectors representing previous utterances of the wake word or phrase. The method further includes performing clustering on the embedding vector and the set of previously-generated vectors to identify a cluster including the embedding vector, where the identified cluster is associated with a speaker. The method also includes updating a speaker vector associated with the speaker based on the embedding vector and determining, using a speaker verification model, a similarity score between the updated speaker vector and the embedding vector. In addition, the method includes determining, based on the similarity score, whether a speaker providing the utterance matches the speaker associated with the identified cluster.

Type: Application

Filed: October 18, 2022

Publication date: December 28, 2023

Inventors: Myungjong Kim, Taeyeon Ki, Cindy Sushen Tseng, Srinivasa Rao Ponakala, Vijendra Raj Apsingekar
PERSONALIZED MULTI-MODAL SPOKEN LANGUAGE IDENTIFICATION

Publication number: 20230419958

Abstract: A method includes obtaining an audio input of a person speaking, where the audio input is captured by an electronic device. The method also includes, for each of multiple language types, (i) determining a first probability that the person is speaking in the language type by applying a trained spoken language identification model to the audio input, (ii) determining at least one second probability that the person is speaking in the language type based on at least one characteristic of the person or the electronic device, and (iii) determining a score for the language type based on a weighted sum of the first and second probabilities. The method further includes identifying the language type associated with a highest score as a spoken language of the person in the audio input.

Type: Application

Filed: October 3, 2022

Publication date: December 28, 2023

Inventors: Divya Neelagiri, Cindy Sushen Tseng, Vijendra Raj Apsingekar
ONLINE SPEAKER DIARIZATION USING LOCAL AND GLOBAL CLUSTERING

Publication number: 20230419979

Abstract: A method includes obtaining at least a portion of an audio stream containing speech activity. At least the portion of the audio stream includes multiple segments. The method also includes, for each of the multiple segments, generating an embedding vector that represents the segment. The method further includes, within each of multiple local windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Different clusters correspond to different speakers. The method also includes presenting at least one first sequence of speaker identities based on the speaker identification performed for the local windows. The method further includes, within each of multiple global windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Each global window includes two or more local windows.

Type: Application

Filed: October 12, 2022

Publication date: December 28, 2023

Inventors: Myungjong Kim, Taeyeon Ki, Vijendra Raj Apsingekar, Sungjae Park, SeungBeom Ryu, Hyuk Oh
SYSTEM AND METHOD FOR ACCENT-AGNOSTIC FRAME-LEVEL WAKE WORD DETECTION

Publication number: 20230368786

Abstract: A method includes accessing, using at least one processor of an electronic device, a machine learning model. The machine learning model is a trained student model that is trained using audio samples in a plurality of accent types. The method also includes receiving, using the at least one processor, an audio input from an audio input device. The method further includes providing, using the at least one processor, the audio input to the trained student model. The method also includes receiving, using the at least one processor, an output from the trained student model including frame-level probabilities associated with the audio input. In addition, the method includes instructing, using the at least one processor, at least one action based on the frame-level probabilities associated with the audio input.

Type: Application

Filed: September 1, 2022

Publication date: November 16, 2023

Inventors: Sivakumar Balasubramanian, Gowtham Srinivasan, Srinivasa Rao Ponakala, Vijendra Raj Apsingekar, Anil Sunder Yadav
METHOD AND APPARATUS FOR PERFORMING SPEAKER DIARIZATION BASED ON LANGUAGE IDENTIFICATION

Publication number: 20230169988

Abstract: An apparatus for processing speech data may include a processor configured to: separate speech signals from an input speech; identify a language of each of the speech signals that are separated from the input speech; extract speaker embeddings from the speech signals based on the language of each of the speech signals, using at least one neural network configured to receive the speech signals and output the speaker embeddings; and identify a speaker of each of the speech signals by iteratively clustering the speaker embeddings.

Type: Application

Filed: November 30, 2021

Publication date: June 1, 2023

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Myungjong KIM, Vijendra Raj APSINGEKAR, Divya NEELAGIRI, Taeyeon KI
METHOD AND APPARATUS FOR PERFORMING SPEAKER DIARIZATION ON MIXED-BANDWIDTH SPEECH SIGNALS

Publication number: 20230169981

Abstract: An apparatus for processing speech data may include a processor configured to: separate an input speech into speech signals; identify a bandwidth of each of the speech signals; extract speaker embeddings from the speech signals based on the bandwidth of each of the speech signals, using at least one neural network configured to receive the speech signals and output the speaker embeddings; and cluster the speaker embeddings into one or more speaker clusters, each speaker cluster corresponding to a speaker identity.

Type: Application

Filed: November 30, 2021

Publication date: June 1, 2023

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Myungjong KIM, Vijendra Raj APSINGEKAR, Aviral ANSHU, Taeyeon KI
METHOD AND SYSTEM FOR DEVICE FEATURE ANALYSIS TO IMPROVE USER EXPERIENCE

Publication number: 20230117535

Abstract: A method and system are provided. The method includes receiving an audio input, in response to the audio input being unrecognized by an audio recognition model, identifying contextual information, determining whether the contextual information corresponds to the audio input, and in response to determining that the contextual information corresponds to the audio input, causing training of a neural network associated with the audio recognition model based on the contextual information and the audio input.

Type: Application

Filed: October 15, 2021

Publication date: April 20, 2023

Inventors: Vijendra Raj Apsingekar, Myungjong Kim, Anil Yadav
SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION

Publication number: 20230040181

Abstract: A method includes training a set of teacher models. Training the set of teacher models includes, for each individual teacher model of the set of teacher models, training the individual teacher model to transcribe unlabeled audio samples and predict a pseudo labeled dataset having multiple labels. At least some of the unlabeled audio samples contain named entity (NE) audio data. At least some of the labels include transcribed NE labels corresponding to the NE audio data. The method also includes correcting at least some of the transcribed NE labels using user-specific NE textual data. The method further includes retraining the set of teacher models based on the pseudo labeled dataset from a selected one of the teacher models, where the selected one of the teacher models predicts the pseudo labeled dataset more accurately than other teacher models of the set of teacher models.

Type: Application

Filed: August 3, 2021

Publication date: February 9, 2023

Inventors: Divya Neelagiri, Taeyeon Ki, Vijendra Raj Apsingekar