Patents by Inventor Ashish Panda
Ashish Panda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240071373
Abstract: State-of-the-art Acoustic Models (AM), trained using data from one environment, may fail to adapt to another environment, restricting their application. The disclosure herein generally relates to speech signal processing, and, more particularly, to a method and system for Automatic Speech Recognition (ASR) using Multi-Task Learned (MTL) embeddings. In this approach, MTL embeddings are extracted from an MTL neural network that has been trained using feature vectors from a plurality of speech files. The MTL embeddings are then used, along with the feature vectors, to generate an acoustic model, which may then be used for Automatic Speech Recognition.
Type: Application
Filed: August 11, 2023
Publication date: February 29, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunil Kumar Kopparapu, Aditya Raikar, Meetkumar Hemakshu Soni
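The publication does not disclose its network details; the following is a minimal numpy sketch of the general idea, with assumed dimensions (40-dim features, a 32-dim embedding) and a randomly initialized encoder standing in for a pre-trained multi-task network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: 40-dim filterbank-like features, 32-dim MTL embedding.
FEAT_DIM, EMB_DIM = 40, 32

# Shared encoder of a multi-task network. In practice this would be trained
# jointly against several task heads; random weights stand in for it here.
W_enc = rng.standard_normal((FEAT_DIM, EMB_DIM)) * 0.1

def mtl_embedding(frames):
    """Project feature frames through the shared encoder (tanh nonlinearity)."""
    return np.tanh(frames @ W_enc)

def acoustic_model_input(frames):
    """Concatenate the original feature vectors with a pooled MTL embedding,
    forming the augmented input for the acoustic model."""
    emb = mtl_embedding(frames).mean(axis=0)            # utterance-level pooling
    return np.hstack([frames, np.tile(emb, (len(frames), 1))])

frames = rng.standard_normal((100, FEAT_DIM))           # 100 feature vectors
x = acoustic_model_input(frames)                        # shape (100, 72)
```

The design choice illustrated is that the embedding is computed once per utterance and appended to every frame, so the acoustic model sees both frame-level detail and an environment-level summary.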
-
Patent number: 11340863
Abstract: Audio-based transactions are becoming more popular and are envisaged to become common in the years to come. With the rise of data protection regulations, muting portions of audio files is necessary to hide sensitive information from an eavesdropper or from accidental hearing by an entity that gains unauthorized access to these audio files. However, deleting transaction information from a muted audio file makes auditing the transaction challenging, if not impossible. Embodiments of the present disclosure provide systems and methods for muting audio information in multimedia files and for retrieval of the masked content, allowing reconstruction of the original audio conversation, or restoration of Private to an Entity (P2aE) information without full audio reconstruction, when an audit is exercised.
Type: Grant
Filed: February 26, 2020
Date of Patent: May 24, 2022
Assignee: Tata Consultancy Services Limited
Inventors: Sunil Kumar Kopparapu, Ashish Panda
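The patent's actual masking scheme is not reproduced here; this sketch only illustrates the mute-but-archive pattern, with a toy XOR mask standing in for proper encryption (an assumption, not the patented method):

```python
import numpy as np

def mute_and_archive(samples, start, end, key=0x5A):
    """Zero out a sensitive span in the audio for ordinary listeners,
    but keep a masked copy so an auditor holding `key` can restore it
    without reconstructing the full recording. XOR is illustrative only."""
    samples = samples.copy()
    archived = samples[start:end] ^ np.int16(key)   # masked P2aE record
    samples[start:end] = 0                          # muted in the public file
    return samples, archived

def audit_restore(archived, key=0x5A):
    """Recover only the archived span during an audit."""
    return archived ^ np.int16(key)

audio = np.arange(10, dtype=np.int16)               # toy audio samples
muted, record = mute_and_archive(audio, 3, 6)
restored = audit_restore(record)
```

The point of the pattern is that the public file carries silence where the sensitive span was, while the audit trail carries only the masked span, never the whole conversation.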
-
Patent number: 11335329
Abstract: Robustness of Automatic Speech Recognition (ASR) against real-world noises and channel distortions is critical. Embodiments herein provide a method and system for generating synthetic multi-conditioned data sets, covering additive noise and channel distortion, for training multi-conditioned acoustic models for robust ASR. The method provides a generative noise model that produces a plurality of types of noise signals: additive noise is generated as a weighted linear combination of a plurality of noise basis signals, and channel distortion is based on estimated channel responses. The generative noise model is parametric; the basis function selection, the number of basis functions combined linearly, and the weights applied to the combinations are all tunable, enabling generation of a wide variety of noise signals. Further, the noise signals are added to a set of training speech utterances under a set of constraints, providing multi-conditioned data sets that imitate real-world effects.
Type: Grant
Filed: March 24, 2020
Date of Patent: May 17, 2022
Assignee: Tata Consultancy Services Limited
Inventors: Meetkumar Hemakshu Soni, Sonal Joshi, Ashish Panda
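A minimal sketch of the two stated ingredients, assuming random noise bases, a made-up 3-tap channel impulse response, and a conventional SNR-based scaling (the patent's actual constraints are not public):

```python
import numpy as np

rng = np.random.default_rng(1)

def synth_noise(bases, weights):
    """Additive noise as a weighted linear combination of noise basis signals."""
    return np.tensordot(weights, bases, axes=1)

def apply_channel(signal, channel_ir):
    """Channel distortion as convolution with an estimated impulse response."""
    return np.convolve(signal, channel_ir)[: len(signal)]

def multi_condition(speech, bases, weights, channel_ir, snr_db=10.0):
    """Add synthetic noise at a target SNR, then pass through the channel."""
    noise = synth_noise(bases, weights)
    scale = np.sqrt(np.mean(speech**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    return apply_channel(speech + scale * noise, channel_ir)

speech = rng.standard_normal(16000)            # 1 s of toy speech at 16 kHz
bases = rng.standard_normal((3, 16000))        # 3 noise basis signals
noisy = multi_condition(speech, bases, np.array([0.5, 0.3, 0.2]),
                        channel_ir=np.array([1.0, 0.4, 0.1]))
```

Because the number of bases, the weights, and the impulse response are all parameters, sweeping them yields the "wide variety of noise signals" the abstract describes.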
-
Patent number: 11322156
Abstract: With recent real-world applications of speaker and speech recognition systems, robust features for degraded speech have become a necessity. In general, degraded speech results in poor performance of any speech-based system. This poor performance can be attributed to the feature extraction functionality of a speech-based system, which takes an input speech file and converts it into a representation called a feature. Embodiments of the present disclosure provide systems and methods that compute the distance between each degraded speech feature extracted from an input speech signal and each clean speech feature stored in a memory of the system, to obtain a set of matched clean speech features. At least a subset of the clean speech features is dynamically selected based on a pre-defined threshold and the computed distance, and statistics are computed for the dynamically selected clean speech feature set for use in a speech recognition system, a speaker recognition system, or both.
Type: Grant
Filed: December 26, 2019
Date of Patent: May 3, 2022
Assignee: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunilkumar Kopparapu, Sonal Sunil Joshi
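The matching step can be sketched as follows, assuming Euclidean distance, 13-dim MFCC-like features, and a simple mean/variance as the "statistics" (the patent does not specify these choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def matched_clean_stats(degraded, clean_memory, threshold):
    """For each degraded feature, select clean features in memory whose
    Euclidean distance falls below `threshold`, then return mean and
    variance over the dynamically selected set (None if nothing matches)."""
    # Pairwise distances: (n_degraded, n_clean)
    dists = np.linalg.norm(degraded[:, None, :] - clean_memory[None, :, :],
                           axis=-1)
    matched_idx = np.unique(np.where(dists < threshold)[1])
    if matched_idx.size == 0:
        return None
    selected = clean_memory[matched_idx]
    return selected.mean(axis=0), selected.var(axis=0)

clean = rng.standard_normal((500, 13))                 # clean feature memory
degraded = clean[:20] + 0.1 * rng.standard_normal((20, 13))  # noisy versions
stats = matched_clean_stats(degraded, clean, threshold=1.0)
```

The threshold makes the selection dynamic: a tighter value keeps only near-identical clean features, a looser one averages over a broader neighborhood.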
-
Publication number: 20210065681
Abstract: Robustness of Automatic Speech Recognition (ASR) against real-world noises and channel distortions is critical. Embodiments herein provide a method and system for generating synthetic multi-conditioned data sets, covering additive noise and channel distortion, for training multi-conditioned acoustic models for robust ASR. The method provides a generative noise model that produces a plurality of types of noise signals: additive noise is generated as a weighted linear combination of a plurality of noise basis signals, and channel distortion is based on estimated channel responses. The generative noise model is parametric; the basis function selection, the number of basis functions combined linearly, and the weights applied to the combinations are all tunable, enabling generation of a wide variety of noise signals. Further, the noise signals are added to a set of training speech utterances under a set of constraints, providing multi-conditioned data sets that imitate real-world effects.
Type: Application
Filed: March 24, 2020
Publication date: March 4, 2021
Applicant: Tata Consultancy Services Limited
Inventors: Meetkumar Hemakshu Soni, Sonal Joshi, Ashish Panda
-
Publication number: 20200310746
Abstract: Audio-based transactions are becoming more popular and are envisaged to become common in the years to come. With the rise of data protection regulations, muting portions of audio files is necessary to hide sensitive information from an eavesdropper or from accidental hearing by an entity that gains unauthorized access to these audio files. However, deleting transaction information from a muted audio file makes auditing the transaction challenging, if not impossible. Embodiments of the present disclosure provide systems and methods for muting audio information in multimedia files and for retrieval of the masked content, allowing reconstruction of the original audio conversation, or restoration of Private to an Entity (P2aE) information without full audio reconstruction, when an audit is exercised.
Type: Application
Filed: February 26, 2020
Publication date: October 1, 2020
Applicant: Tata Consultancy Services Limited
Inventors: Sunil Kumar Kopparapu, Ashish Panda
-
Publication number: 20200211568
Abstract: With recent real-world applications of speaker and speech recognition systems, robust features for degraded speech have become a necessity. In general, degraded speech results in poor performance of any speech-based system. This poor performance can be attributed to the feature extraction functionality of a speech-based system, which takes an input speech file and converts it into a representation called a feature. Embodiments of the present disclosure provide systems and methods that compute the distance between each degraded speech feature extracted from an input speech signal and each clean speech feature stored in a memory of the system, to obtain a set of matched clean speech features. At least a subset of the clean speech features is dynamically selected based on a pre-defined threshold and the computed distance, and statistics are computed for the dynamically selected clean speech feature set for use in a speech recognition system, a speaker recognition system, or both.
Type: Application
Filed: December 26, 2019
Publication date: July 2, 2020
Applicant: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunilkumar Kopparapu, Sonal Sunil Joshi
-
Patent number: 10460732
Abstract: A system and method to insert visual subtitles in videos is described. The method comprises segmenting an input video signal to extract the speech segments and music segments. Next, a speaker representation is associated with each speech segment corresponding to a speaker visible in the frame. Further, the speech segments are analyzed to compute the phones and the duration of each phone. The phones are mapped to corresponding visemes, and a viseme-based language model is created with a corresponding score. The most relevant viseme is selected for each speech segment by computing a total viseme score. Further, a speaker representation sequence is created such that phones and emotions in the speech segments are represented as reconstructed lip movements and eyebrow movements. The speaker representation sequence is then integrated with the music segments and superimposed on the input video signal to create the subtitles.
Type: Grant
Filed: March 29, 2017
Date of Patent: October 29, 2019
Assignee: Tata Consultancy Services Limited
Inventors: Chitralekha Bhat, Sunil Kumar Kopparapu, Ashish Panda
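The phone-to-viseme step can be illustrated with a toy mapping table; the table below is hypothetical (real systems use standardized viseme sets), and durations are in milliseconds:

```python
# Hypothetical phone-to-viseme table; the patent does not publish its mapping.
PHONE_TO_VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
                   "f": "labiodental", "v": "labiodental",
                   "aa": "open", "iy": "spread"}

def viseme_sequence(phones):
    """Map (phone, duration_ms) pairs to visemes, merging consecutive
    identical visemes and accumulating duration so that each lip shape
    can be timed against the speech segment."""
    seq = []
    for phone, dur in phones:
        vis = PHONE_TO_VISEME.get(phone, "neutral")
        if seq and seq[-1][0] == vis:
            seq[-1] = (vis, seq[-1][1] + dur)   # extend current lip shape
        else:
            seq.append((vis, dur))              # new lip shape
    return seq

phones = [("p", 60), ("b", 50), ("aa", 120), ("m", 70)]
result = viseme_sequence(phones)
# [('bilabial', 110), ('open', 120), ('bilabial', 70)]
```

Merging adjacent identical visemes matters because many distinct phones share one lip shape, and the rendered mouth animation should hold rather than re-trigger.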
-
Patent number: 10319377
Abstract: A method and system are provided for estimating clean speech parameters from noisy speech parameters. The method is performed by acquiring speech signals, estimating noise from the acquired speech signals, computing speech features from the acquired speech signals, estimating model parameters from the computed speech features, and estimating clean parameters from the estimated noise and the estimated model parameters.
Type: Grant
Filed: February 28, 2017
Date of Patent: June 11, 2019
Assignee: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunil Kumar Kopparapu
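The abstract does not state which estimator is used; as one classical instance of the noisy-to-clean pipeline, a spectral-subtraction-style sketch (an assumption, not the patented method) looks like this:

```python
import numpy as np

rng = np.random.default_rng(3)

def estimate_clean_power(noisy_frames, noise_frames, floor=1e-3):
    """Estimate clean power spectra by subtracting the average noise power
    (estimated from noise-only frames) from each noisy frame, with a
    spectral floor to keep the result positive."""
    noise_psd = np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2, axis=0)
    noisy_psd = np.abs(np.fft.rfft(noisy_frames, axis=1)) ** 2
    return np.maximum(noisy_psd - noise_psd, floor * noisy_psd)

clean = rng.standard_normal((50, 256))        # 50 toy frames of 256 samples
noise = 0.3 * rng.standard_normal((50, 256))  # noise estimated from silence
est = estimate_clean_power(clean + noise, noise)
```

The structure mirrors the claimed steps: noise is estimated from the acquired signal, features (power spectra) are computed, and clean parameters follow from the two.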
-
Publication number: 20170287481
Abstract: A system and method to insert visual subtitles in videos is described. The method comprises segmenting an input video signal to extract the speech segments and music segments. Next, a speaker representation is associated with each speech segment corresponding to a speaker visible in the frame. Further, the speech segments are analyzed to compute the phones and the duration of each phone. The phones are mapped to corresponding visemes, and a viseme-based language model is created with a corresponding score. The most relevant viseme is selected for each speech segment by computing a total viseme score. Further, a speaker representation sequence is created such that phones and emotions in the speech segments are represented as reconstructed lip movements and eyebrow movements. The speaker representation sequence is then integrated with the music segments and superimposed on the input video signal to create the subtitles.
Type: Application
Filed: March 29, 2017
Publication date: October 5, 2017
Applicant: Tata Consultancy Services Limited
Inventors: Chitralekha Bhat, Sunil Kumar Kopparapu, Ashish Panda
-
Publication number: 20170270952
Abstract: A method and system are provided for estimating clean speech parameters from noisy speech parameters. The method is performed by acquiring speech signals, estimating noise from the acquired speech signals, computing speech features from the acquired speech signals, estimating model parameters from the computed speech features, and estimating clean parameters from the estimated noise and the estimated model parameters.
Type: Application
Filed: February 28, 2017
Publication date: September 21, 2017
Applicant: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunil Kumar Kopparapu
-
Patent number: 9659578
Abstract: The present disclosure envisages a computer-implemented system for identifying significant speech frames within speech signals to facilitate speech recognition. The system receives an input speech signal having a plurality of feature vectors, which is passed through a spectrum analyzer. The spectrum analyzer divides the input speech signal into a plurality of speech frames and computes the spectral magnitude of each speech frame. A suitability engine computes a suitability measure for each speech frame based on the spectral flatness measure (SFM), energy normalized variance (ENV), entropy, signal-to-noise ratio (SNR), and a similarity measure. The suitability engine further computes a weighted suitability measure for each speech frame.
Type: Grant
Filed: March 26, 2015
Date of Patent: May 23, 2017
Assignee: TATA CONSULTANCY SERVICES LTD.
Inventors: Ashish Panda, Sunil Kumar Kopparapu
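One of the named per-frame measures, the spectral flatness measure, has a standard definition that can be sketched directly; the weights in the combined score below are illustrative placeholders, not the patent's values:

```python
import numpy as np

def spectral_flatness(frame):
    """SFM: ratio of geometric to arithmetic mean of the power spectrum.
    Near 1 for noise-like frames, near 0 for tonal/voiced frames."""
    psd = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(psd))) / np.mean(psd)

def weighted_suitability(measures, weights):
    """Combine per-frame measures (e.g. SFM, ENV, entropy, SNR, similarity)
    into a single weighted suitability score for the frame."""
    return sum(w * m for w, m in zip(weights, measures))

t = np.arange(512)
tone = np.sin(2 * np.pi * 0.1 * t)                       # voiced-like frame
noise = np.random.default_rng(4).standard_normal(512)    # noise-like frame
sf_tone = spectral_flatness(tone)                        # close to 0
sf_noise = spectral_flatness(noise)                      # closer to 1
```

Frames whose weighted score passes a threshold would be kept as "significant" for recognition; the rest are discarded, reducing the influence of uninformative frames.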
-
Publication number: 20160155441
Abstract: The present disclosure envisages a computer-implemented system for identifying significant speech frames within speech signals to facilitate speech recognition. The system receives an input speech signal having a plurality of feature vectors, which is passed through a spectrum analyzer. The spectrum analyzer divides the input speech signal into a plurality of speech frames and computes the spectral magnitude of each speech frame. A suitability engine computes a suitability measure for each speech frame based on the spectral flatness measure (SFM), energy normalized variance (ENV), entropy, signal-to-noise ratio (SNR), and a similarity measure. The suitability engine further computes a weighted suitability measure for each speech frame.
Type: Application
Filed: March 26, 2015
Publication date: June 2, 2016
Applicant: TATA CONSULTANCY SERVICES LTD.
Inventors: Ashish Panda, Sunil Kumar Kopparapu