Patents by Inventor Ashish Panda

Ashish Panda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240071373
    Abstract: State-of-the-art Acoustic Models (AM), which are trained using data from one environment, may fail to adapt to another environment, restricting their application. The disclosure herein generally relates to speech signal processing, and, more particularly, to a method and system for Automatic Speech Recognition (ASR) using Multi-task Learned (MTL) embeddings. In this approach, MTL embeddings are extracted from an MTL neural network that has been trained using feature vectors from a plurality of speech files. The MTL embeddings are then used for generating an acoustic model, which may then be used for Automatic Speech Recognition, along with the feature vectors and the MTL embeddings.
    Type: Application
    Filed: August 11, 2023
    Publication date: February 29, 2024
    Applicant: Tata Consultancy Services Limited
    Inventors: Ashish Panda, Sunil Kumar Kopparapu, Aditya Raikar, Meetkumar Hemakshu Soni
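The idea in the abstract above — extracting embeddings from a shared multi-task network and feeding them to the acoustic model alongside the raw features — can be sketched as follows. This is an illustrative toy, not the patented implementation: the dimensions, the two task heads, and the random weights are all hypothetical stand-ins for a trained MTL network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 40-dim acoustic features, 16-dim MTL embedding.
FEAT_DIM, EMB_DIM = 40, 16

# A shared trunk and two task-specific heads (e.g. phone and speaker
# classification); weights are random stand-ins for a trained network.
W_trunk = rng.standard_normal((FEAT_DIM, EMB_DIM)) * 0.1
W_task1 = rng.standard_normal((EMB_DIM, 10)) * 0.1
W_task2 = rng.standard_normal((EMB_DIM, 5)) * 0.1

def mtl_embedding(feats):
    # The trunk output, shared across tasks, serves as the MTL embedding.
    return np.tanh(feats @ W_trunk)

def acoustic_model_input(feats):
    # The abstract feeds both the raw feature vectors and the MTL
    # embeddings to the acoustic model; here they are concatenated.
    return np.concatenate([feats, mtl_embedding(feats)], axis=-1)

frames = rng.standard_normal((100, FEAT_DIM))   # 100 speech frames
am_in = acoustic_model_input(frames)
print(am_in.shape)                               # (100, 56)
```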
  • Patent number: 11340863
    Abstract: Audio-based transactions are becoming more popular and are envisaged to become common in the years to come. With the rise in data protection regulations, muting portions of audio files is necessary to hide sensitive information from an eavesdropper or from accidental hearing by an entity who gets unauthorized access to these files. However, deleting transaction information in a muted audio file makes auditing the transaction challenging, if not impossible. Embodiments of the present disclosure provide systems and methods for muting audio information in multimedia files and retrieving the masked information, which further allow reconstruction of the original audio conversation, or restoration of Private to an Entity (P2aE) information without reconstructing the original audio, when an audit is exercised.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: May 24, 2022
    Assignee: Tata Consultancy Services Limited
    Inventors: Sunil Kumar Kopparapu, Ashish Panda
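The core trade-off in the abstract above — muting sensitive audio while keeping the transaction auditable — can be illustrated with a minimal sketch. The vault, key derivation, and restore function are hypothetical simplifications; a real system would encrypt the vault contents and enforce access control rather than store raw samples.

```python
import hashlib
import numpy as np

def mute_segment(audio, start, end, vault, secret):
    # Zero out a sensitive region and store the removed samples in a
    # separate audit vault, keyed by a digest of an auditor secret.
    key = hashlib.sha256(secret.encode()).hexdigest()
    vault[key] = (start, end, audio[start:end].copy())
    muted = audio.copy()
    muted[start:end] = 0.0
    return muted, key

def restore_segment(muted, key, vault):
    # An authorized auditor holding the key can reconstruct the
    # original conversation from the muted file plus the vault entry.
    start, end, samples = vault[key]
    restored = muted.copy()
    restored[start:end] = samples
    return restored

audio = np.sin(np.linspace(0, 20, 1000))
vault = {}
muted, key = mute_segment(audio, 200, 300, vault, "auditor-passphrase")
restored = restore_segment(muted, key, vault)
```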
  • Patent number: 11335329
    Abstract: Robustness of Automatic Speech Recognition (ASR) against real-world noise and channel distortion is critical to its performance. Embodiments herein provide a method and system for generating synthetic multi-conditioned data sets, covering additive noise and channel distortion, for training multi-conditioned acoustic models for robust ASR. The method provides a generative noise model that generates a plurality of types of noise signals: additive noise based on a weighted linear combination of a plurality of noise basis signals, and channel distortion based on estimated channel responses. The generative noise model is a parametric model, wherein the basis function selection, the number of basis functions to be combined linearly, and the weights applied to the combinations are tunable, thereby enabling generation of a wide variety of noise signals. Further, the noise signals are added to a set of training speech utterances under a set of constraints, providing multi-conditioned data sets that imitate real-world effects.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: May 17, 2022
    Assignee: Tata Consultancy Services Limited
    Inventors: Meetkumar Hemakshu Soni, Sonal Joshi, Ashish Panda
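The generative noise model described above — a weighted linear combination of noise basis signals plus convolution with an estimated channel response — can be sketched as below. The basis signals, weights, impulse response, and SNR constraint are all illustrative assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_noise(bases, weights):
    # Additive noise as a weighted linear combination of basis signals.
    return np.tensordot(weights, bases, axes=1)

def apply_channel(speech, channel_ir):
    # Channel distortion simulated by convolving with an estimated
    # channel impulse response.
    return np.convolve(speech, channel_ir, mode="same")

def multi_condition(speech, bases, weights, channel_ir, snr_db):
    distorted = apply_channel(speech, channel_ir)
    noise = generate_noise(bases, weights)
    # Scale the noise to a target SNR, one typical mixing constraint.
    p_s = np.mean(distorted ** 2)
    p_n = np.mean(noise ** 2) + 1e-12
    noise *= np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return distorted + noise

speech = rng.standard_normal(8000)           # 0.5 s at 16 kHz
bases = rng.standard_normal((3, 8000))       # three noise basis signals
out = multi_condition(speech, bases, np.array([0.5, 0.3, 0.2]),
                      np.array([1.0, 0.4, 0.1]), snr_db=10)
```

Tuning the number of bases and their weights, as the abstract notes, yields a wide variety of synthetic noise conditions from a small set of basis signals.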
  • Patent number: 11322156
    Abstract: With recent real-world applications of speaker and speech recognition systems, robust features for degraded speech have become a necessity. In general, degraded speech results in poor performance of any speech-based system. This poor performance can be attributed to the feature extraction functionality of the speech-based system, which takes an input speech file and converts it into a representation called a feature. Embodiments of the present disclosure provide systems and methods that compute the distance between each degraded speech feature extracted from an input speech signal and each clean speech feature comprised in a memory of the system to obtain a set of matched clean speech features, wherein at least a subset of the clean speech features is dynamically selected based on a pre-defined threshold and the computed distance; statistics are then computed for the dynamically selected clean speech feature set for use in at least one of a speech recognition system and a speaker recognition system.
    Type: Grant
    Filed: December 26, 2019
    Date of Patent: May 3, 2022
    Assignee: Tata Consultancy Services Limited
    Inventors: Ashish Panda, Sunilkumar Kopparapu, Sonal Sunil Joshi
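The matching step in the abstract above — distance-based dynamic selection of clean features from memory, followed by statistics over the selected set — can be sketched as follows. The Euclidean distance, 13-dim features, and mean/variance statistics are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def match_clean_features(degraded, clean_memory, threshold):
    # For each degraded feature, dynamically select clean features whose
    # Euclidean distance falls below the threshold, then compute
    # statistics (here, mean and variance) over the selected set.
    selected = []
    for d in degraded:
        dist = np.linalg.norm(clean_memory - d, axis=1)
        selected.append(clean_memory[dist < threshold])
    pool = np.vstack([s for s in selected if len(s)])
    return pool.mean(axis=0), pool.var(axis=0)

degraded = rng.standard_normal((20, 13))      # e.g. 13-dim MFCC features
clean_memory = rng.standard_normal((500, 13)) # clean features in memory
mean, var = match_clean_features(degraded, clean_memory, threshold=4.5)
```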
  • Publication number: 20210065681
    Abstract: Robustness of Automatic Speech Recognition (ASR) against real-world noise and channel distortion is critical to its performance. Embodiments herein provide a method and system for generating synthetic multi-conditioned data sets, covering additive noise and channel distortion, for training multi-conditioned acoustic models for robust ASR. The method provides a generative noise model that generates a plurality of types of noise signals: additive noise based on a weighted linear combination of a plurality of noise basis signals, and channel distortion based on estimated channel responses. The generative noise model is a parametric model, wherein the basis function selection, the number of basis functions to be combined linearly, and the weights applied to the combinations are tunable, thereby enabling generation of a wide variety of noise signals. Further, the noise signals are added to a set of training speech utterances under a set of constraints, providing multi-conditioned data sets that imitate real-world effects.
    Type: Application
    Filed: March 24, 2020
    Publication date: March 4, 2021
    Applicant: Tata Consultancy Services Limited
    Inventors: Meetkumar Hemakshu Soni, Sonal Joshi, Ashish Panda
  • Publication number: 20200310746
    Abstract: Audio-based transactions are becoming more popular and are envisaged to become common in the years to come. With the rise in data protection regulations, muting portions of audio files is necessary to hide sensitive information from an eavesdropper or from accidental hearing by an entity who gets unauthorized access to these files. However, deleting transaction information in a muted audio file makes auditing the transaction challenging, if not impossible. Embodiments of the present disclosure provide systems and methods for muting audio information in multimedia files and retrieving the masked information, which further allow reconstruction of the original audio conversation, or restoration of Private to an Entity (P2aE) information without reconstructing the original audio, when an audit is exercised.
    Type: Application
    Filed: February 26, 2020
    Publication date: October 1, 2020
    Applicant: Tata Consultancy Services Limited
    Inventors: Sunil Kumar Kopparapu, Ashish Panda
  • Publication number: 20200211568
    Abstract: With recent real-world applications of speaker and speech recognition systems, robust features for degraded speech have become a necessity. In general, degraded speech results in poor performance of any speech-based system. This poor performance can be attributed to the feature extraction functionality of the speech-based system, which takes an input speech file and converts it into a representation called a feature. Embodiments of the present disclosure provide systems and methods that compute the distance between each degraded speech feature extracted from an input speech signal and each clean speech feature comprised in a memory of the system to obtain a set of matched clean speech features, wherein at least a subset of the clean speech features is dynamically selected based on a pre-defined threshold and the computed distance; statistics are then computed for the dynamically selected clean speech feature set for use in at least one of a speech recognition system and a speaker recognition system.
    Type: Application
    Filed: December 26, 2019
    Publication date: July 2, 2020
    Applicant: Tata Consultancy Services Limited
    Inventors: Ashish Panda, Sunilkumar Kopparapu, Sonal Sunil Joshi
  • Patent number: 10460732
    Abstract: A system and method to insert visual subtitles in videos is described. The method comprises segmenting an input video signal to extract the speech segments and music segments. Next, a speaker representation is associated with each speech segment corresponding to a speaker visible in the frame. Further, the speech segments are analyzed to compute the phones and the duration of each phone. The phones are mapped to corresponding visemes, and a viseme-based language model is created with a corresponding score. The most relevant viseme is selected for each speech segment by computing a total viseme score. Further, a speaker representation sequence is created such that phones and emotions in the speech segments are represented as reconstructed lip movements and eyebrow movements. The speaker representation sequence is then integrated with the music segments and superimposed on the input video signal to create the subtitles.
    Type: Grant
    Filed: March 29, 2017
    Date of Patent: October 29, 2019
    Assignee: Tata Consultancy Services Limited
    Inventors: Chitralekha Bhat, Sunil Kumar Kopparapu, Ashish Panda
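The phone-to-viseme mapping and score-based selection described above can be sketched minimally as below. The tiny phone inventory, viseme labels, and scores are hypothetical; real systems use a standard viseme inventory (roughly a dozen visemes for English) and scores from a viseme-based language model.

```python
# Hypothetical phone-to-viseme map for a handful of phones.
PHONE_TO_VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
                   "f": "labiodental", "v": "labiodental",
                   "aa": "open", "iy": "spread"}

def best_viseme(phones_with_scores):
    # Accumulate per-phone scores per viseme, then select the viseme
    # with the highest total score for the segment.
    totals = {}
    for phone, score in phones_with_scores:
        viseme = PHONE_TO_VISEME.get(phone)
        if viseme:
            totals[viseme] = totals.get(viseme, 0.0) + score
    return max(totals, key=totals.get)

segment = [("p", 0.4), ("b", 0.3), ("aa", 0.5)]
print(best_viseme(segment))  # bilabial (total 0.7 beats open at 0.5)
```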
  • Patent number: 10319377
    Abstract: A method and system are provided for estimating clean speech parameters from noisy speech parameters. The method is performed by acquiring speech signals, estimating noise from the acquired speech signals, computing speech features from the acquired speech signals, estimating model parameters from the computed speech features, and estimating clean parameters from the estimated noise and the estimated model parameters.
    Type: Grant
    Filed: February 28, 2017
    Date of Patent: June 11, 2019
    Assignee: Tata Consultancy Services Limited
    Inventors: Ashish Panda, Sunil Kumar Kopparapu
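The pipeline in the abstract above — estimate noise, then derive clean parameters from the noisy ones — can be illustrated with a spectral-subtraction-style sketch. The noise-estimation heuristic (averaging initial non-speech frames) and the flooring constant are assumptions for illustration, not the patented method's specifics.

```python
import numpy as np

rng = np.random.default_rng(3)

def estimate_noise(power_spec, n_init_frames=10):
    # Assume the first few frames contain no speech and average them
    # to estimate the noise spectrum (a common simplification).
    return power_spec[:n_init_frames].mean(axis=0)

def estimate_clean(power_spec, noise_est, floor=1e-3):
    # Subtract the noise estimate and floor the result so the clean
    # parameter estimates stay positive.
    return np.maximum(power_spec - noise_est, floor * power_spec)

noisy = np.abs(rng.standard_normal((50, 64))) + 0.5  # 50 frames, 64 bins
noise = estimate_noise(noisy)
clean = estimate_clean(noisy, noise)
```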
  • Publication number: 20170287481
    Abstract: A system and method to insert visual subtitles in videos is described. The method comprises segmenting an input video signal to extract the speech segments and music segments. Next, a speaker representation is associated with each speech segment corresponding to a speaker visible in the frame. Further, the speech segments are analyzed to compute the phones and the duration of each phone. The phones are mapped to corresponding visemes, and a viseme-based language model is created with a corresponding score. The most relevant viseme is selected for each speech segment by computing a total viseme score. Further, a speaker representation sequence is created such that phones and emotions in the speech segments are represented as reconstructed lip movements and eyebrow movements. The speaker representation sequence is then integrated with the music segments and superimposed on the input video signal to create the subtitles.
    Type: Application
    Filed: March 29, 2017
    Publication date: October 5, 2017
    Applicant: Tata Consultancy Services Limited
    Inventors: Chitralekha Bhat, Sunil Kumar Kopparapu, Ashish Panda
  • Publication number: 20170270952
    Abstract: A method and system are provided for estimating clean speech parameters from noisy speech parameters. The method is performed by acquiring speech signals, estimating noise from the acquired speech signals, computing speech features from the acquired speech signals, estimating model parameters from the computed speech features, and estimating clean parameters from the estimated noise and the estimated model parameters.
    Type: Application
    Filed: February 28, 2017
    Publication date: September 21, 2017
    Applicant: Tata Consultancy Services Limited
    Inventors: Ashish Panda, Sunil Kumar Kopparapu
  • Patent number: 9659578
    Abstract: The present disclosure envisages a computer-implemented system for identifying significant speech frames within speech signals to facilitate speech recognition. The system receives an input speech signal having a plurality of feature vectors, which is passed through a spectrum analyzer. The spectrum analyzer divides the input speech signal into a plurality of speech frames and computes the spectral magnitude of each speech frame. A suitability engine computes a suitability measure for each speech frame based on the spectral flatness measure (SFM), energy normalized variance (ENV), entropy, signal-to-noise ratio (SNR), and a similarity measure. The suitability engine further computes a weighted suitability measure for each speech frame.
    Type: Grant
    Filed: March 26, 2015
    Date of Patent: May 23, 2017
    Assignee: TATA CONSULTANCY SERVICES LTD.
    Inventors: Ashish Panda, Sunil Kumar Kopparapu
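The per-frame measures named in the abstract above can be sketched as follows. SFM and entropy follow their standard definitions; the ENV term is approximated by the variance of the energy-normalized spectrum, and the SNR and similarity terms are omitted since they require a noise reference. The weights are hypothetical.

```python
import numpy as np

def spectral_flatness(mag):
    # SFM: geometric mean over arithmetic mean of the magnitude spectrum.
    mag = np.maximum(mag, 1e-12)
    return np.exp(np.mean(np.log(mag))) / np.mean(mag)

def spectral_entropy(mag):
    # Shannon entropy of the spectrum normalized to a distribution.
    p = np.maximum(mag / mag.sum(), 1e-12)
    return -np.sum(p * np.log(p))

def weighted_suitability(frame_mag, weights):
    # Weighted combination of the per-frame measures; a higher-level
    # scorer would combine all five measures from the abstract.
    measures = np.array([spectral_flatness(frame_mag),
                         np.var(frame_mag / frame_mag.sum()),
                         spectral_entropy(frame_mag)])
    return float(weights @ measures)

flat = np.ones(64)  # a perfectly flat spectrum has SFM = 1
print(weighted_suitability(flat, np.array([1.0, 0.0, 0.0])))  # 1.0
```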
  • Publication number: 20160155441
    Abstract: The present disclosure envisages a computer-implemented system for identifying significant speech frames within speech signals to facilitate speech recognition. The system receives an input speech signal having a plurality of feature vectors, which is passed through a spectrum analyzer. The spectrum analyzer divides the input speech signal into a plurality of speech frames and computes the spectral magnitude of each speech frame. A suitability engine computes a suitability measure for each speech frame based on the spectral flatness measure (SFM), energy normalized variance (ENV), entropy, signal-to-noise ratio (SNR), and a similarity measure. The suitability engine further computes a weighted suitability measure for each speech frame.
    Type: Application
    Filed: March 26, 2015
    Publication date: June 2, 2016
    Applicant: TATA CONSULTANCY SERVICES LTD.
    Inventors: Ashish Panda, Sunil Kumar Kopparapu