Patents by Inventor Ashish Panda
Ashish Panda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240071373
Abstract: State-of-the-art Acoustic Models (AM), trained using data from one environment, may fail to adapt to another environment, restricting their application. The disclosure herein generally relates to speech signal processing, and, more particularly, to a method and system for Automatic Speech Recognition (ASR) using Multi-Task Learned (MTL) embeddings. In this approach, MTL embeddings are extracted from an MTL neural network that has been trained using feature vectors from a plurality of speech files. The MTL embeddings are then used, along with the feature vectors, to generate an acoustic model, which may then be used for Automatic Speech Recognition.
Type: Application
Filed: August 11, 2023
Publication date: February 29, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunil Kumar Kopparapu, Aditya Raikar, Meetkumar Hemakshu Soni
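The publication does not disclose its network details; the following is a minimal numpy sketch of the general idea, with assumed dimensions (40-dim features, a 32-dim embedding) and a randomly initialized encoder standing in for a pre-trained multi-task network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: 40-dim filterbank-like features, 32-dim MTL embedding.
FEAT_DIM, EMB_DIM = 40, 32

# Shared encoder of a multi-task network. In practice this would be trained
# jointly against several task heads; random weights stand in for it here.
W_enc = rng.standard_normal((FEAT_DIM, EMB_DIM)) * 0.1

def mtl_embedding(frames):
    """Project feature frames through the shared encoder (tanh nonlinearity)."""
    return np.tanh(frames @ W_enc)

def acoustic_model_input(frames):
    """Concatenate the original feature vectors with a pooled MTL embedding,
    forming the augmented input for the acoustic model."""
    emb = mtl_embedding(frames).mean(axis=0)            # utterance-level pooling
    return np.hstack([frames, np.tile(emb, (len(frames), 1))])

frames = rng.standard_normal((100, FEAT_DIM))           # 100 feature vectors
x = acoustic_model_input(frames)                        # shape (100, 72)
```

The design choice illustrated is that the embedding is computed once per utterance and appended to every frame, so the acoustic model sees both frame-level detail and an environment-level summary.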
-
Patent number: 11340863
Abstract: Audio-based transactions are becoming more popular and are envisaged to become common in the years to come. With the rise of data protection regulations, muting portions of audio files is necessary to hide sensitive information from an eavesdropper or from accidental hearing by an entity that gains unauthorized access to these audio files. However, deleting transaction information from a muted audio file makes auditing the transaction challenging, if not impossible. Embodiments of the present disclosure provide systems and methods for muting audio information in multimedia files and for retrieval of the masked content, allowing reconstruction of the original audio conversation, or restoration of Private to an Entity (P2aE) information without full audio reconstruction, when an audit is exercised.
Type: Grant
Filed: February 26, 2020
Date of Patent: May 24, 2022
Assignee: Tata Consultancy Services Limited
Inventors: Sunil Kumar Kopparapu, Ashish Panda
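The patent's actual masking scheme is not reproduced here; this sketch only illustrates the mute-but-archive pattern, with a toy XOR mask standing in for proper encryption (an assumption, not the patented method):

```python
import numpy as np

def mute_and_archive(samples, start, end, key=0x5A):
    """Zero out a sensitive span in the audio for ordinary listeners,
    but keep a masked copy so an auditor holding `key` can restore it
    without reconstructing the full recording. XOR is illustrative only."""
    samples = samples.copy()
    archived = samples[start:end] ^ np.int16(key)   # masked P2aE record
    samples[start:end] = 0                          # muted in the public file
    return samples, archived

def audit_restore(archived, key=0x5A):
    """Recover only the archived span during an audit."""
    return archived ^ np.int16(key)

audio = np.arange(10, dtype=np.int16)               # toy audio samples
muted, record = mute_and_archive(audio, 3, 6)
restored = audit_restore(record)
```

The point of the pattern is that the public file carries silence where the sensitive span was, while the audit trail carries only the masked span, never the whole conversation.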
-
Patent number: 11335329
Abstract: Robustness of Automatic Speech Recognition (ASR) against real-world noises and channel distortions is critical. Embodiments herein provide a method and system for generating synthetic multi-conditioned data sets, covering additive noise and channel distortion, for training multi-conditioned acoustic models for robust ASR. The method provides a generative noise model that produces a plurality of types of noise signals: additive noise is generated as a weighted linear combination of a plurality of noise basis signals, and channel distortion is based on estimated channel responses. The generative noise model is parametric; the basis function selection, the number of basis functions combined linearly, and the weights applied to the combinations are all tunable, enabling generation of a wide variety of noise signals. Further, the noise signals are added to a set of training speech utterances under a set of constraints, providing multi-conditioned data sets that imitate real-world effects.
Type: Grant
Filed: March 24, 2020
Date of Patent: May 17, 2022
Assignee: Tata Consultancy Services Limited
Inventors: Meetkumar Hemakshu Soni, Sonal Joshi, Ashish Panda
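A minimal sketch of the two stated ingredients, assuming random noise bases, a made-up 3-tap channel impulse response, and a conventional SNR-based scaling (the patent's actual constraints are not public):

```python
import numpy as np

rng = np.random.default_rng(1)

def synth_noise(bases, weights):
    """Additive noise as a weighted linear combination of noise basis signals."""
    return np.tensordot(weights, bases, axes=1)

def apply_channel(signal, channel_ir):
    """Channel distortion as convolution with an estimated impulse response."""
    return np.convolve(signal, channel_ir)[: len(signal)]

def multi_condition(speech, bases, weights, channel_ir, snr_db=10.0):
    """Add synthetic noise at a target SNR, then pass through the channel."""
    noise = synth_noise(bases, weights)
    scale = np.sqrt(np.mean(speech**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    return apply_channel(speech + scale * noise, channel_ir)

speech = rng.standard_normal(16000)            # 1 s of toy speech at 16 kHz
bases = rng.standard_normal((3, 16000))        # 3 noise basis signals
noisy = multi_condition(speech, bases, np.array([0.5, 0.3, 0.2]),
                        channel_ir=np.array([1.0, 0.4, 0.1]))
```

Because the number of bases, the weights, and the impulse response are all parameters, sweeping them yields the "wide variety of noise signals" the abstract describes.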
-
Patent number: 11322156
Abstract: With recent real-world applications of speaker and speech recognition systems, robust features for degraded speech have become a necessity. In general, degraded speech results in poor performance of any speech-based system. This poor performance can be attributed to the feature extraction functionality of a speech-based system, which takes an input speech file and converts it into a representation called a feature. Embodiments of the present disclosure provide systems and methods that compute the distance between each degraded speech feature extracted from an input speech signal and each clean speech feature stored in a memory of the system, to obtain a set of matched clean speech features. At least a subset of the clean speech features is dynamically selected based on a pre-defined threshold and the computed distance, and statistics are computed for the dynamically selected clean speech feature set for use in a speech recognition system, a speaker recognition system, or both.
Type: Grant
Filed: December 26, 2019
Date of Patent: May 3, 2022
Assignee: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunilkumar Kopparapu, Sonal Sunil Joshi
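The matching step can be sketched as follows, assuming Euclidean distance, 13-dim MFCC-like features, and a simple mean/variance as the "statistics" (the patent does not specify these choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def matched_clean_stats(degraded, clean_memory, threshold):
    """For each degraded feature, select clean features in memory whose
    Euclidean distance falls below `threshold`, then return mean and
    variance over the dynamically selected set (None if nothing matches)."""
    # Pairwise distances: (n_degraded, n_clean)
    dists = np.linalg.norm(degraded[:, None, :] - clean_memory[None, :, :],
                           axis=-1)
    matched_idx = np.unique(np.where(dists < threshold)[1])
    if matched_idx.size == 0:
        return None
    selected = clean_memory[matched_idx]
    return selected.mean(axis=0), selected.var(axis=0)

clean = rng.standard_normal((500, 13))                 # clean feature memory
degraded = clean[:20] + 0.1 * rng.standard_normal((20, 13))  # noisy versions
stats = matched_clean_stats(degraded, clean, threshold=1.0)
```

The threshold makes the selection dynamic: a tighter value keeps only near-identical clean features, a looser one averages over a broader neighborhood.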
-
Publication number: 20210065681
Abstract: Robustness of Automatic Speech Recognition (ASR) against real-world noises and channel distortions is critical. Embodiments herein provide a method and system for generating synthetic multi-conditioned data sets, covering additive noise and channel distortion, for training multi-conditioned acoustic models for robust ASR. The method provides a generative noise model that produces a plurality of types of noise signals: additive noise is generated as a weighted linear combination of a plurality of noise basis signals, and channel distortion is based on estimated channel responses. The generative noise model is parametric; the basis function selection, the number of basis functions combined linearly, and the weights applied to the combinations are all tunable, enabling generation of a wide variety of noise signals. Further, the noise signals are added to a set of training speech utterances under a set of constraints, providing multi-conditioned data sets that imitate real-world effects.
Type: Application
Filed: March 24, 2020
Publication date: March 4, 2021
Applicant: Tata Consultancy Services Limited
Inventors: Meetkumar Hemakshu Soni, Sonal Joshi, Ashish Panda
-
Publication number: 20200310746
Abstract: Audio-based transactions are becoming more popular and are envisaged to become common in the years to come. With the rise of data protection regulations, muting portions of audio files is necessary to hide sensitive information from an eavesdropper or from accidental hearing by an entity that gains unauthorized access to these audio files. However, deleting transaction information from a muted audio file makes auditing the transaction challenging, if not impossible. Embodiments of the present disclosure provide systems and methods for muting audio information in multimedia files and for retrieval of the masked content, allowing reconstruction of the original audio conversation, or restoration of Private to an Entity (P2aE) information without full audio reconstruction, when an audit is exercised.
Type: Application
Filed: February 26, 2020
Publication date: October 1, 2020
Applicant: Tata Consultancy Services Limited
Inventors: Sunil Kumar Kopparapu, Ashish Panda
-
Publication number: 20200211568
Abstract: With recent real-world applications of speaker and speech recognition systems, robust features for degraded speech have become a necessity. In general, degraded speech results in poor performance of any speech-based system. This poor performance can be attributed to the feature extraction functionality of a speech-based system, which takes an input speech file and converts it into a representation called a feature. Embodiments of the present disclosure provide systems and methods that compute the distance between each degraded speech feature extracted from an input speech signal and each clean speech feature stored in a memory of the system, to obtain a set of matched clean speech features. At least a subset of the clean speech features is dynamically selected based on a pre-defined threshold and the computed distance, and statistics are computed for the dynamically selected clean speech feature set for use in a speech recognition system, a speaker recognition system, or both.
Type: Application
Filed: December 26, 2019
Publication date: July 2, 2020
Applicant: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunilkumar Kopparapu, Sonal Sunil Joshi
-
Patent number: 10460732
Abstract: A system and method to insert visual subtitles in videos is described. The method comprises segmenting an input video signal to extract the speech segments and music segments. Next, a speaker representation is associated with each speech segment corresponding to a speaker visible in the frame. Further, the speech segments are analyzed to compute the phones and the duration of each phone. The phones are mapped to corresponding visemes, and a viseme-based language model is created with a corresponding score. The most relevant viseme is selected for each speech segment by computing a total viseme score. Further, a speaker representation sequence is created such that phones and emotions in the speech segments are represented as reconstructed lip movements and eyebrow movements. The speaker representation sequence is then integrated with the music segments and superimposed on the input video signal to create the subtitles.
Type: Grant
Filed: March 29, 2017
Date of Patent: October 29, 2019
Assignee: Tata Consultancy Services Limited
Inventors: Chitralekha Bhat, Sunil Kumar Kopparapu, Ashish Panda
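The phone-to-viseme step can be illustrated with a toy mapping table; the table below is hypothetical (real systems use standardized viseme sets), and durations are in milliseconds:

```python
# Hypothetical phone-to-viseme table; the patent does not publish its mapping.
PHONE_TO_VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
                   "f": "labiodental", "v": "labiodental",
                   "aa": "open", "iy": "spread"}

def viseme_sequence(phones):
    """Map (phone, duration_ms) pairs to visemes, merging consecutive
    identical visemes and accumulating duration so that each lip shape
    can be timed against the speech segment."""
    seq = []
    for phone, dur in phones:
        vis = PHONE_TO_VISEME.get(phone, "neutral")
        if seq and seq[-1][0] == vis:
            seq[-1] = (vis, seq[-1][1] + dur)   # extend current lip shape
        else:
            seq.append((vis, dur))              # new lip shape
    return seq

phones = [("p", 60), ("b", 50), ("aa", 120), ("m", 70)]
result = viseme_sequence(phones)
# [('bilabial', 110), ('open', 120), ('bilabial', 70)]
```

Merging adjacent identical visemes matters because many distinct phones share one lip shape, and the rendered mouth animation should hold rather than re-trigger.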
-
Patent number: 10319377
Abstract: A method and system are provided for estimating clean speech parameters from noisy speech parameters. The method is performed by acquiring speech signals, estimating noise from the acquired speech signals, computing speech features from the acquired speech signals, estimating model parameters from the computed speech features, and estimating clean parameters from the estimated noise and the estimated model parameters.
Type: Grant
Filed: February 28, 2017
Date of Patent: June 11, 2019
Assignee: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunil Kumar Kopparapu
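The abstract does not state which estimator is used; as one classical instance of the noisy-to-clean pipeline, a spectral-subtraction-style sketch (an assumption, not the patented method) looks like this:

```python
import numpy as np

rng = np.random.default_rng(3)

def estimate_clean_power(noisy_frames, noise_frames, floor=1e-3):
    """Estimate clean power spectra by subtracting the average noise power
    (estimated from noise-only frames) from each noisy frame, with a
    spectral floor to keep the result positive."""
    noise_psd = np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2, axis=0)
    noisy_psd = np.abs(np.fft.rfft(noisy_frames, axis=1)) ** 2
    return np.maximum(noisy_psd - noise_psd, floor * noisy_psd)

clean = rng.standard_normal((50, 256))        # 50 toy frames of 256 samples
noise = 0.3 * rng.standard_normal((50, 256))  # noise estimated from silence
est = estimate_clean_power(clean + noise, noise)
```

The structure mirrors the claimed steps: noise is estimated from the acquired signal, features (power spectra) are computed, and clean parameters follow from the two.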
-
Publication number: 20170287481
Abstract: A system and method to insert visual subtitles in videos is described. The method comprises segmenting an input video signal to extract the speech segments and music segments. Next, a speaker representation is associated with each speech segment corresponding to a speaker visible in the frame. Further, the speech segments are analyzed to compute the phones and the duration of each phone. The phones are mapped to corresponding visemes, and a viseme-based language model is created with a corresponding score. The most relevant viseme is selected for each speech segment by computing a total viseme score. Further, a speaker representation sequence is created such that phones and emotions in the speech segments are represented as reconstructed lip movements and eyebrow movements. The speaker representation sequence is then integrated with the music segments and superimposed on the input video signal to create the subtitles.
Type: Application
Filed: March 29, 2017
Publication date: October 5, 2017
Applicant: Tata Consultancy Services Limited
Inventors: Chitralekha Bhat, Sunil Kumar Kopparapu, Ashish Panda
-
Publication number: 20170270952
Abstract: A method and system are provided for estimating clean speech parameters from noisy speech parameters. The method is performed by acquiring speech signals, estimating noise from the acquired speech signals, computing speech features from the acquired speech signals, estimating model parameters from the computed speech features, and estimating clean parameters from the estimated noise and the estimated model parameters.
Type: Application
Filed: February 28, 2017
Publication date: September 21, 2017
Applicant: Tata Consultancy Services Limited
Inventors: Ashish Panda, Sunil Kumar Kopparapu
-
Patent number: 9659578
Abstract: The present disclosure envisages a computer-implemented system for identifying significant speech frames within speech signals to facilitate speech recognition. The system receives an input speech signal having a plurality of feature vectors, which is passed through a spectrum analyzer. The spectrum analyzer divides the input speech signal into a plurality of speech frames and computes the spectral magnitude of each speech frame. A suitability engine computes a suitability measure for each speech frame based on the spectral flatness measure (SFM), energy normalized variance (ENV), entropy, signal-to-noise ratio (SNR), and a similarity measure. The suitability engine further computes a weighted suitability measure for each speech frame.
Type: Grant
Filed: March 26, 2015
Date of Patent: May 23, 2017
Assignee: TATA CONSULTANCY SERVICES LTD.
Inventors: Ashish Panda, Sunil Kumar Kopparapu
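One of the named per-frame measures, the spectral flatness measure, has a standard definition that can be sketched directly; the weights in the combined score below are illustrative placeholders, not the patent's values:

```python
import numpy as np

def spectral_flatness(frame):
    """SFM: ratio of geometric to arithmetic mean of the power spectrum.
    Near 1 for noise-like frames, near 0 for tonal/voiced frames."""
    psd = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(psd))) / np.mean(psd)

def weighted_suitability(measures, weights):
    """Combine per-frame measures (e.g. SFM, ENV, entropy, SNR, similarity)
    into a single weighted suitability score for the frame."""
    return sum(w * m for w, m in zip(weights, measures))

t = np.arange(512)
tone = np.sin(2 * np.pi * 0.1 * t)                       # voiced-like frame
noise = np.random.default_rng(4).standard_normal(512)    # noise-like frame
sf_tone = spectral_flatness(tone)                        # close to 0
sf_noise = spectral_flatness(noise)                      # closer to 1
```

Frames whose weighted score passes a threshold would be kept as "significant" for recognition; the rest are discarded, reducing the influence of uninformative frames.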
-
Publication number: 20160155441
Abstract: The present disclosure envisages a computer-implemented system for identifying significant speech frames within speech signals to facilitate speech recognition. The system receives an input speech signal having a plurality of feature vectors, which is passed through a spectrum analyzer. The spectrum analyzer divides the input speech signal into a plurality of speech frames and computes the spectral magnitude of each speech frame. A suitability engine computes a suitability measure for each speech frame based on the spectral flatness measure (SFM), energy normalized variance (ENV), entropy, signal-to-noise ratio (SNR), and a similarity measure. The suitability engine further computes a weighted suitability measure for each speech frame.
Type: Application
Filed: March 26, 2015
Publication date: June 2, 2016
Applicant: TATA CONSULTANCY SERVICES LTD.
Inventors: Ashish Panda, Sunil Kumar Kopparapu