Patents by Inventor Dushyant Sharma

Dushyant Sharma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12289595
    Abstract: A method, computer program product, and computing system for generating a plurality of acoustic relative transfer functions associated with a plurality of audio acquisition devices of an audio recording system deployed in an acoustic environment. Acoustic relative transfer functions of at least a pair of audio acquisition devices of the plurality of audio acquisition devices may be compared. Location information associated with an acoustic source within the acoustic environment may be determined based upon, at least in part, the comparison of the acoustic relative transfer functions of the at least a pair of audio acquisition devices of the plurality of audio acquisition devices.
    Type: Grant
    Filed: February 11, 2022
    Date of Patent: April 29, 2025
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dushyant Sharma, Patrick A. Naylor, Uwe Helmut Jost
  • Patent number: 12260866
    Abstract: A method, computer program product, and computing system for processing audio information associated with a speech processing system and encoding a watermark in a non-disruptive portion of the audio information.
    Type: Grant
    Filed: August 30, 2022
    Date of Patent: March 25, 2025
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick Aubrey Naylor, Dushyant Sharma, William Francis Ganong, III, Uwe Helmut Jost, Ljubomir Milanovic
  • Publication number: 20250087230
    Abstract: A method, computer program product, and computing system for enhancement of audio signals received from a plurality of microphones. A multichannel audio signal is received from a plurality of microphones and is processed with a short-time discrete cosine transform (STDCT) to generate a real-valued spectral representation of the multichannel signal encoding both magnitude and phase information. Magnitude- and phase-dependent weights are generated, and an enhanced single-channel signal is produced based upon, at least in part, the spectral representation of the multichannel signal and the magnitude- and phase-dependent weights.
    Type: Application
    Filed: September 13, 2023
    Publication date: March 13, 2025
    Inventors: Stanislav Kruchinin, Dushyant Sharma, Rong Gong
  • Publication number: 20250087231
    Abstract: There is provided a speech processing system that includes a neural encoder module. A processor that receives an audio signal; and the memory that contains instructions that control said processor to perform operations that process speech. In an implementation, a front end module can include a Neural Spatial RTF Estimator and a neural spatial and residual encoder (NSRE) configured accept as inputs a spectral encoded reference channel stream to output Neural Transfer Functions (NTFs). In another implementation, a front end module encodes and outputs a Ch1 bitstream; computes a plurality of relative transfer functions (RTFs) for an N-Channel signal and outputs an N−1 RTFs or an RTF codebook ids and computes and processes an N−1 residual stream; and a back end module comprising a neural encoder module configured to accept the RTFs and output an encoded speech signal comprising an embedding that comprises features extracted from RTFs.
    Type: Application
    Filed: September 20, 2024
    Publication date: March 13, 2025
    Inventors: Dushyant Sharma, Patrick Naylor, Daniel T. Jones
  • Publication number: 20250078851
    Abstract: A method, computer program product, and computing system for disentangling background information from speaker information in a speech signal. Background information is extracted from the speech signal to generate a background acoustics embedding and speaker information is extracted from the speech signal to generate a speaker acoustics embedding. A first loss factor is applied to the background acoustics embedding to decrease speaker information therein to generate a processed background acoustics embedding using machine learning and a second loss factor is applied to the speaker acoustics embedding to decrease background information therein to generate a processed speaker acoustics embedding using machine learning. At least one of the processed background acoustics embedding and the processed speaker acoustics embedding is output to a speech processing system.
    Type: Application
    Filed: December 7, 2023
    Publication date: March 6, 2025
    Inventors: Dushyant Sharma, Patrick A. Naylor, Sri Harsha Dumpala, Chandramouli Shama Sastry
  • Patent number: 12243514
    Abstract: A method, computer program product, and computing system for obtaining one or more speech signals from a first device, thus defining one or more first device speech signals. One or more speech signals may be obtained from a second device, thus defining one or more second device speech signals. A noise component model may be selected from a plurality of noise component models based upon, at least in part, the one or more first device speech signals and the one or more second device speech signals. The one or more second device speech signals may be augmented, at run-time, based upon, at least in part, the noise component model.
    Type: Grant
    Filed: January 20, 2022
    Date of Patent: March 4, 2025
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dushyant Sharma, Ljubomir Milanovic, Philipp Salletmayr, Rong Gong, Patrick A. Naylor
  • Patent number: 12190888
    Abstract: A method, computer program product, and computing system for generating an obscured speech signal from an input speech signal and an obscured transcription from a transcription of the input speech signal. A speaker embedding may be extracted from the input speech signal. A speaker embedding delta may be generated based upon, at least in part, the extracted speaker embedding and a synthetic speaker embedding. A synthetic speech signal may be generated from the obscured speech signal using the synthetic speaker embedding. A residual signal may be generated based upon, at least in part, the obscured speech signal and the speaker embedding delta. A speech processing system may be trained using the obscured transcription, the synthetic speech signal, the speaker embedding delta, and the residual signal.
    Type: Grant
    Filed: June 15, 2022
    Date of Patent: January 7, 2025
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Shou-Chun Yin, Junho Park, Dushyant Sharma, DoYeong Kim, Francesco Nespoli
  • Publication number: 20250005546
    Abstract: Methods, apparatuses, and computer program products are described for presenting an interactive audio-visual presentation of transaction documents. A method can include receiving a bill associated with a payor and payee, using a textual language processor or the like to identify content fields from the bill and assign markups and/or metadata to content fields, and using the content fields, markups, and/or metadata to generate an audio-visual presentation associated with the bill. This audio-visual presentation can be presented to the payor. The payee may then interact with the audio-visual presentation, for instance by verbal, visual, manual, or textual response. A verbal language processing engine, natural language processing engine, audio-visual language processing engine, or visual-manual language processing engine can be initiated to facilitate interpretation of the payee response and generate a further audio-visual presentation.
    Type: Application
    Filed: September 9, 2024
    Publication date: January 2, 2025
    Inventor: Dushyant Sharma
  • Publication number: 20240420722
    Abstract: A method, computer program product, and computing system for processing an audio signal by converting the audio signal to the modulation domain. The modulation domain audio signal is encoded with a plurality of carrier signals and a plurality of modulator signals derived from the modulation domain audio signal. The encoded modulation domain audio signal is converted to the time domain.
    Type: Application
    Filed: June 14, 2023
    Publication date: December 19, 2024
    Inventors: Dushyant Sharma, Patrick Aubrey Naylor, William Francis Ganong, III
  • Publication number: 20240420677
    Abstract: A method, computer program product, and computing system for receiving an input speech signal. A transcription of the input speech signal may be received. A speaker embedding may be extracted from the input speech signal. Acoustic properties from the input speech signal may be extracted. An obscured transcription may be generated from the transcription, where the obscured transcription includes obscured representations of sensitive content from the transcription. An obscured speech signal may be generated based upon, at least in part, the extracted speaker embedding and the obscured transcription, where the obscured speech signal includes obscured representations of sensitive content from the input speech signal. The obscured speech signal may be augmented based upon, at least in part, the extracted acoustic properties.
    Type: Application
    Filed: July 8, 2024
    Publication date: December 19, 2024
    Inventors: Dushyant Sharma, Patrick Aubrey Naylor, Francesco Nespoli
  • Patent number: 12165668
    Abstract: A method of performing at least de-reverberation and noise-reduction of an input sound signal of at least one input channel includes: performing, using at least one filter element, at least one of de-reverberation and noise-reduction of the input sound signal to generate a clean output sound signal; and determining, by a non-intrusive measure (NIM) estimation element, at least one non-intrusive measure (NIM) from the sound signal, wherein the at least one NIM includes at least one of voice activity detection (VAD) posterior, reverberation time, clarity index, direct-to-reverberant ratio (DRR), and signal-to-noise ratio (SNR); the de-reverberation is achieved by applying at least one channel shortening (CS) filter component of the at least one filter element in conjunction with the at least one NIM; and the noise reduction is performed in combination with the de-reverberation by the channel shortening (CS) filter component.
    Type: Grant
    Filed: February 18, 2022
    Date of Patent: December 10, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dushyant Sharma, James Fosburgh, Patrick Naylor
  • Publication number: 20240404504
    Abstract: A method, computer program product, and computing system for processing a speech signal. A sensitive portion of the speech signal is identified. A pseudo-speech representation of the sensitive portion is generated using a voice converter system. Speech processing is performed on the speech signal and the pseudo-speech representation of the sensitive portion using a speech processing system.
    Type: Application
    Filed: May 31, 2023
    Publication date: December 5, 2024
    Inventors: Dushyant Sharma, William Francis Ganong, III, Daniel Paulino Almendro Barreda, Patrick Aubrey Naylor, Alvaro Martin Iturralde Zurita, Francesco Nespoli
  • Patent number: 12154541
    Abstract: A method, computer program product, and computing system for receiving feature-based voice data associated with a first acoustic domain. One or more reverberation-based augmentations may be performed on at least a portion of the feature-based voice data, thus defining reverberation-augmented feature-based voice data.
    Type: Grant
    Filed: March 10, 2021
    Date of Patent: November 26, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dushyant Sharma, Patrick A. Naylor, James W. Fosburgh, Do Yeong Kim
  • Patent number: 12149914
    Abstract: A method, computer program product, and computing system for obtaining machine vision encounter information using one or more machine vision systems. Audio encounter information may be obtained using a plurality of audio acquisition devices of an audio recording system. The audio encounter information may be encoded using an audio codec. The encoding of the audio encounter information by the audio codec may be adapted based upon, at least in part, the machine vision encounter information.
    Type: Grant
    Filed: February 11, 2022
    Date of Patent: November 19, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dushyant Sharma, Patrick A. Naylor, Uwe Helmut Jost
  • Patent number: 12148437
    Abstract: A method of processing speech includes: providing a first set of audio data having audio features in a first bandwidth; down-sampling the first set of audio data to a second bandwidth lower than the first bandwidth; producing, by a high frequency reconstruction network (HFRN), an estimate of audio features in the first bandwidth for the first set of audio data, based on at least the down-sampled audio data; inputting, into the HFRN, a second set of audio data having audio features in the second bandwidth; producing, by the HFRN, based on a second set of audio data having audio features in the second bandwidth, an estimate of audio features in the first bandwidth for the second set of audio data; and training a speech processing system (SPS) using the estimates of audio features in the first bandwidth for the first and second sets of audio data.
    Type: Grant
    Filed: December 10, 2021
    Date of Patent: November 19, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Dushyant Sharma
  • Patent number: 12143798
    Abstract: A method, computer program product, and computing system for generating a plurality of acoustic relative transfer functions associated with a plurality of audio acquisition devices of an audio recording system deployed in an acoustic environment. At least a pair of the plurality of acoustic relative transfer functions from time frames may be compared. A change in the acoustic environment may be detected based upon, at least in part, the comparison of the plurality of acoustic relative transfer functions from at least the pair of time frames.
    Type: Grant
    Filed: February 11, 2022
    Date of Patent: November 12, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dushyant Sharma, Patrick A. Naylor, Uwe Helmut Jost
  • Patent number: 12142293
    Abstract: There is provided a speech processing system that includes a neural encoder module. A processor that receives an audio signal; and the memory that contains instructions that control said processor to perform operations that process speech. In an implementation, a front end module can include a Neural Spatial RTF Estimator and a neural spatial and residual encoder (NSRE) configured accept as inputs a spectral encoded reference channel stream to output Neural Transfer Functions (NTFs). In another implementation, a front end module encodes and outputs a Ch1 bitstream; computes a plurality of relative transfer functions (RTFs) for an N-Channel signal and outputs an N−1 RTFs or an RTF codebook ids and computes and processes an N−1 residual stream; and a back end module comprising a neural encoder module configured to accept the RTFs and output an encoded speech signal comprising an embedding that comprises features extracted from RTFs.
    Type: Grant
    Filed: June 30, 2022
    Date of Patent: November 12, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dushyant Sharma, Patrick Naylor, Daniel T. Jones
  • Patent number: 12118522
    Abstract: Methods, apparatuses, and computer program products are described for presenting an interactive audio-visual presentation of transaction documents. A method can include receiving a bill associated with a payor and payee, using a textual language processor or the like to identify content fields from the bill and assign markups and/or metadata to content fields, and using the content fields, markups, and/or metadata to generate an audio-visual presentation associated with the bill. This audio-visual presentation can be presented to the payor. The payee may then interact with the audio-visual presentation, for instance by verbal, visual, manual, or textual response. A verbal language processing engine, natural language processing engine, audio-visual language processing engine, or visual-manual language processing engine can be initiated to facilitate interpretation of the payee response and generate a further audio-visual presentation.
    Type: Grant
    Filed: August 24, 2020
    Date of Patent: October 15, 2024
    Assignee: PAYMENTUS CORPORATION
    Inventor: Dushyant Sharma
  • Patent number: 12112741
    Abstract: A method, computer program product, and computing system for defining a model representative of a plurality of acoustic variations to a speech signal, thus defining a plurality of time-varying spectral modifications. The plurality of time-varying spectral modifications may be applied to a reference signal using a filtering operation, thus generating a time-varying spectrally-augmented signal.
    Type: Grant
    Filed: February 18, 2021
    Date of Patent: October 8, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick A. Naylor, Dushyant Sharma, Uwe Helmut Jost, William F. Ganong, III
  • Patent number: 12114147
    Abstract: A method, computer program product, and computing system for generating a plurality of acoustic relative transfer functions between a plurality of audio acquisition devices of an audio recording system based upon, at least in part, one or more of a predefined speech processing application and a predefined acoustic environment. An acoustic relative transfer function codebook may be generated using the plurality of acoustic relative transfer functions. One or more channels from the plurality of audio acquisition devices of the audio recording system may be encoded using the acoustic relative transfer function codebook.
    Type: Grant
    Filed: February 11, 2022
    Date of Patent: October 8, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dushyant Sharma, Patrick A. Naylor, Uwe Helmut Jost