Patents by Inventor Dushyant Sharma

Dushyant Sharma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for secure transcription generation

Patent number: 12367860

Abstract: A method, computer program product, and computing system for receiving an input speech signal. A transcription of the input speech signal may be generated via an automated speech recognition (ASR) system. One or more splitting points between one or more sensitive content portions and one or more non-sensitive content portions from the transcription may be identified. The input speech signal may be split into the one or more sensitive content portions and the one or more non-sensitive content portions based upon, at least in part, the one or more splitting points, thus defining one or more sensitive content signals and one or more non-sensitive content signals.

Type: Grant

Filed: June 3, 2022

Date of Patent: July 22, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: William F. Ganong, III, Uwe Helmut Jost, Dushyant Sharma
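
Illustrative sketch (not taken from patent 12367860): the abstract above describes finding splitting points in a transcription and cutting the speech signal at those points. The Python below shows one way that idea could look, assuming the ASR system returns word-level timestamps and that some upstream step has already flagged each word as sensitive or not. The `Word` class, the midpoint rule for placing a splitting point, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Word:
    """Hypothetical ASR output unit: text, start/end times in seconds, sensitivity flag."""
    text: str
    start: float
    end: float
    sensitive: bool


def splitting_points(words: List[Word]) -> List[float]:
    """Return the times at which the sensitivity label changes between adjacent words."""
    points = []
    for prev, cur in zip(words, words[1:]):
        if prev.sensitive != cur.sensitive:
            # Assumption: place the split halfway through the gap between the two words.
            points.append((prev.end + cur.start) / 2.0)
    return points


def split_signal(
    samples: Sequence[float], sample_rate: int, words: List[Word]
) -> Tuple[List[list], List[list]]:
    """Cut the samples at the splitting points; return (sensitive, non_sensitive) segments."""
    if not words:
        return [], [list(samples)]
    bounds = [0] + [round(t * sample_rate) for t in splitting_points(words)] + [len(samples)]
    sensitive, non_sensitive = [], []
    label = words[0].sensitive
    for a, b in zip(bounds, bounds[1:]):
        (sensitive if label else non_sensitive).append(list(samples[a:b]))
        label = not label  # labels alternate at every splitting point
    return sensitive, non_sensitive
```

The two returned lists correspond to the "sensitive content signals" and "non-sensitive content signals" named in the abstract; how each is then handled is outside this sketch.
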
Multi-Channel Speech Compression System and Method

Publication number: 20250220378

Abstract: A method, computer program product, and computing system for generating a plurality of acoustic relative transfer functions associated with a plurality of audio acquisition devices of an audio recording system deployed in an acoustic environment. Acoustic relative transfer functions of at least a pair of audio acquisition devices of the plurality of audio acquisition devices may be compared. Location information associated with an acoustic source within the acoustic environment may be determined based upon, at least in part, the comparison of the acoustic relative transfer functions of the at least a pair of audio acquisition devices of the plurality of audio acquisition devices.

Type: Application

Filed: March 24, 2025

Publication date: July 3, 2025

Inventors: Dushyant Sharma, Patrick A. Naylor, Uwe Helmut Jost
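
Illustrative sketch (not taken from publication 20250220378): the abstract above compares acoustic relative transfer functions (RTFs) of a pair of audio acquisition devices to obtain location information. The abstract does not say how the RTFs are estimated or compared. The Python below assumes one textbook approach: estimate each RTF as a cross-spectrum ratio against a reference device, read a relative delay from the phase of the lowest non-zero frequency bin, and compare the two delays. All function names are hypothetical, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform, O(n^2), adequate for a short illustration."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def relative_transfer_function(
    mic: List[float], ref: List[float], eps: float = 1e-12
) -> List[complex]:
    """Estimate the RTF of `mic` relative to `ref` as a cross-spectrum ratio per bin."""
    mic_spec, ref_spec = dft(mic), dft(ref)
    return [m * r.conjugate() / (abs(r) ** 2 + eps) for m, r in zip(mic_spec, ref_spec)]


def delay_from_rtf(rtf: List[complex]) -> float:
    """Relative delay in samples, read from the phase of bin 1.

    For a pure delay d the RTF at bin 1 is exp(-2*pi*i*d/n), so this is only
    unambiguous while |d| < n/2.
    """
    n = len(rtf)
    return -cmath.phase(rtf[1]) * n / (2 * math.pi)


def compare_rtfs(rtf_a: List[complex], rtf_b: List[complex]) -> float:
    """Positive result: the source reaches device A later than device B."""
    return delay_from_rtf(rtf_a) - delay_from_rtf(rtf_b)
```

Under these assumptions, a positive value from `compare_rtfs` suggests the acoustic source is nearer to device B than to device A. A real system would use more than a single frequency bin and would have to cope with noise and reverberation.
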
System and Method for Securely Transmitting Voice Signals

Publication number: 20250191597

Abstract: A method, computer program product, and computing system for securely transmitting voice signals. A speech signal including a content component and a speaker component of a first voice is received at an encoder. The speaker component of the speech signal is processed, using machine learning, to generate a speaker embedding. The content component of the voice signal is processed, using machine learning and based at least on the speaker embedding, to generate a content embedding having minimized speaker information. The content embedding is transmitted to a decoder for restoring the received speech signal.

Type: Application

Filed: December 7, 2023

Publication date: June 12, 2025

Inventors: Dushyant Sharma, Patrick A. Naylor, Chandramouli Shama Sastry
System and Method for Secure Speech Feature Extraction

Publication number: 20250191599

Abstract: A method, computer program product, and computing system for secure speech feature extraction. A speech signal comprising content information and speaker information is received and a component of the speaker information is altered to generate an augmented voice signal. In a first neural network, first embeddings of the received voice signal are generated. In a second neural network, second embeddings of the received voice signal having minimized speaker information based on the augmented voice signal are generated. The second neural network is trained to generate the second embeddings to be similar to the first embeddings generated by the first neural network.

Type: Application

Filed: December 7, 2023

Publication date: June 12, 2025

Inventors: Dushyant Sharma, Patrick A. Naylor, Sri Harsha Dumpala, Chandramouli Shama Sastry
Methods, apparatuses, and systems for dynamically navigating interactive communication systems

Patent number: 12301757

Abstract: Methods, apparatuses, and systems are described for dynamically navigating interactive communication systems. An example method may comprise: receiving, from a user device, sound waves or audio information, the sound waves or audio information indicative of a request to initiate an interactive communication session with a communication system of a biller or merchant; interpreting, based on the sound waves or audio information, an intent of the communication session and an identity of the biller or merchant; retrieving a predetermined interaction coding associated with the biller or merchant; and initiating the interactive communication session with the communication system of the biller or merchant based on the predetermined interaction coding.

Type: Grant

Filed: January 11, 2024

Date of Patent: May 13, 2025

Assignee: PAYMENTUS CORPORATION

Inventor: Dushyant Sharma
Multi-channel speech compression system and method

Patent number: 12289595

Abstract: A method, computer program product, and computing system for generating a plurality of acoustic relative transfer functions associated with a plurality of audio acquisition devices of an audio recording system deployed in an acoustic environment. Acoustic relative transfer functions of at least a pair of audio acquisition devices of the plurality of audio acquisition devices may be compared. Location information associated with an acoustic source within the acoustic environment may be determined based upon, at least in part, the comparison of the acoustic relative transfer functions of the at least a pair of audio acquisition devices of the plurality of audio acquisition devices.

Type: Grant

Filed: February 11, 2022

Date of Patent: April 29, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Dushyant Sharma, Patrick A. Naylor, Uwe Helmut Jost
System and method for watermarking audio data for automated speech recognition (ASR) systems

Patent number: 12260866

Abstract: A method, computer program product, and computing system for processing audio information associated with a speech processing system and encoding a watermark in a non-disruptive portion of the audio information.

Type: Grant

Filed: August 30, 2022

Date of Patent: March 25, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Patrick Aubrey Naylor, Dushyant Sharma, William Francis Ganong, III, Uwe Helmut Jost, Ljubomir Milanovic
System and Method for Speech Enhancement in Multichannel Audio Processing Systems

Publication number: 20250087230

Abstract: A method, computer program product, and computing system for enhancement of audio signals received from a plurality of microphones. A multichannel audio signal is received from a plurality of microphones and is processed with a short-time discrete cosine transform (STDCT) to generate a real-valued spectral representation of the multichannel signal encoding both magnitude and phase information. Magnitude- and phase-dependent weights are generated, and an enhanced single-channel signal is produced based upon, at least in part, the spectral representation of the multichannel signal and the magnitude- and phase-dependent weights.

Type: Application

Filed: September 13, 2023

Publication date: March 13, 2025

Inventors: Stanislav Kruchinin, Dushyant Sharma, Rong Gong
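
Illustrative sketch (not taken from publication 20250087230): the abstract above processes a multichannel signal with a short-time discrete cosine transform (STDCT) to obtain a real-valued spectral representation. The Python below shows only that transform stage, assuming a sine analysis window and an unnormalized DCT-II per frame. The window choice, frame length, hop size, and function names are assumptions, and the weight-generation step from the abstract is not modeled.

```python
import math
from typing import List


def dct_ii(frame: List[float]) -> List[float]:
    """Unnormalized DCT-II of one frame; every output coefficient is real."""
    n = len(frame)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(frame))
        for k in range(n)
    ]


def stdct(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Short-time DCT: window overlapping frames, then apply the DCT-II to each frame."""
    # Assumption: a sine window; the source does not name a window.
    window = [math.sin(math.pi * (i + 0.5) / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dct_ii(segment))
    return frames


def stdct_multichannel(
    channels: List[List[float]], frame_len: int, hop: int
) -> List[List[List[float]]]:
    """Apply the STDCT to each microphone channel independently."""
    return [stdct(channel, frame_len, hop) for channel in channels]
```

Because the coefficients are signed real numbers rather than complex values, inverting the polarity of the input flips the sign of every coefficient. This is one simple way to see that a real-valued representation of this kind retains phase-related information alongside magnitude.
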
SPEECH DIALOG SYSTEM AND RECIPROCITY ENFORCED NEURAL RELATIVE TRANSFER FUNCTION ESTIMATOR

Publication number: 20250087231

Abstract: There is provided a speech processing system that includes a neural encoder module. A processor that receives an audio signal; and the memory that contains instructions that control said processor to perform operations that process speech. In an implementation, a front end module can include a Neural Spatial RTF Estimator and a neural spatial and residual encoder (NSRE) configured to accept as inputs a spectral encoded reference channel stream to output Neural Transfer Functions (NTFs). In another implementation, a front end module encodes and outputs a Ch1 bitstream; computes a plurality of relative transfer functions (RTFs) for an N-Channel signal and outputs an N-1 RTFs or an RTF codebook ids and computes and processes an N-1 residual stream; and a back end module comprising a neural encoder module configured to accept the RTFs and output an encoded speech signal comprising an embedding that comprises features extracted from RTFs.

Type: Application

Filed: September 20, 2024

Publication date: March 13, 2025

Inventors: Dushyant SHARMA, Patrick NAYLOR, Daniel T. JONES
System and Method for Disentangling Audio Signal Information

Publication number: 20250078851

Abstract: A method, computer program product, and computing system for disentangling background information from speaker information in a speech signal. Background information is extracted from the speech signal to generate a background acoustics embedding and speaker information is extracted from the speech signal to generate a speaker acoustics embedding. A first loss factor is applied to the background acoustics embedding to decrease speaker information therein to generate a processed background acoustics embedding using machine learning and a second loss factor is applied to the speaker acoustics embedding to decrease background information therein to generate a processed speaker acoustics embedding using machine learning. At least one of the processed background acoustics embedding and the processed speaker acoustics embedding is output to a speech processing system.

Type: Application

Filed: December 7, 2023

Publication date: March 6, 2025

Inventors: Dushyant Sharma, Patrick A. Naylor, Sri Harsha Dumpala, Chandramouli Shama Sastry
Data augmentation system and method for multi-microphone systems

Patent number: 12243514

Abstract: A method, computer program product, and computing system for obtaining one or more speech signals from a first device, thus defining one or more first device speech signals. One or more speech signals may be obtained from a second device, thus defining one or more second device speech signals. A noise component model may be selected from a plurality of noise component models based upon, at least in part, the one or more first device speech signals and the one or more second device speech signals. The one or more second device speech signals may be augmented, at run-time, based upon, at least in part, the noise component model.

Type: Grant

Filed: January 20, 2022

Date of Patent: March 4, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Dushyant Sharma, Ljubomir Milanovic, Philipp Salletmayr, Rong Gong, Patrick A. Naylor
System and method for secure training of speech processing systems

Patent number: 12190888

Abstract: A method, computer program product, and computing system for generating an obscured speech signal from an input speech signal and an obscured transcription from a transcription of the input speech signal. A speaker embedding may be extracted from the input speech signal. A speaker embedding delta may be generated based upon, at least in part, the extracted speaker embedding and a synthetic speaker embedding. A synthetic speech signal may be generated from the obscured speech signal using the synthetic speaker embedding. A residual signal may be generated based upon, at least in part, the obscured speech signal and the speaker embedding delta. A speech processing system may be trained using the obscured transcription, the synthetic speech signal, the speaker embedding delta, and the residual signal.

Type: Grant

Filed: June 15, 2022

Date of Patent: January 7, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Shou-Chun Yin, Junho Park, Dushyant Sharma, DoYeong Kim, Francesco Nespoli
SYSTEMS AND METHODS FOR INTERACTIVE VIDEO PRESENTATION OF TRANSACTIONAL INFORMATION

Publication number: 20250005546

Abstract: Methods, apparatuses, and computer program products are described for presenting an interactive audio-visual presentation of transaction documents. A method can include receiving a bill associated with a payor and payee, using a textual language processor or the like to identify content fields from the bill and assign markups and/or metadata to content fields, and using the content fields, markups, and/or metadata to generate an audio-visual presentation associated with the bill. This audio-visual presentation can be presented to the payor. The payee may then interact with the audio-visual presentation, for instance by verbal, visual, manual, or textual response. A verbal language processing engine, natural language processing engine, audio-visual language processing engine, or visual-manual language processing engine can be initiated to facilitate interpretation of the payee response and generate a further audio-visual presentation.

Type: Application

Filed: September 9, 2024

Publication date: January 2, 2025

Inventor: Dushyant Sharma
System and Method for Secure Data Augmentation for Speech Processing Systems

Publication number: 20240420677

Abstract: A method, computer program product, and computing system for receiving an input speech signal. A transcription of the input speech signal may be received. A speaker embedding may be extracted from the input speech signal. Acoustic properties from the input speech signal may be extracted. An obscured transcription may be generated from the transcription, where the obscured transcription includes obscured representations of sensitive content from the transcription. An obscured speech signal may be generated based upon, at least in part, the extracted speaker embedding and the obscured transcription, where the obscured speech signal includes obscured representations of sensitive content from the input speech signal. The obscured speech signal may be augmented based upon, at least in part, the extracted acoustic properties.

Type: Application

Filed: July 8, 2024

Publication date: December 19, 2024

Inventors: Dushyant Sharma, Patrick Aubrey Naylor, Francesco Nespoli
System and Method for Modulation Domain-Based Audio Signal Encoding

Publication number: 20240420722

Abstract: A method, computer program product, and computing system for processing an audio signal by converting the audio signal to the modulation domain. The modulation domain audio signal is encoded with a plurality of carrier signals and a plurality of modulator signals derived from the modulation domain audio signal. The encoded modulation domain audio signal is converted to the time domain.

Type: Application

Filed: June 14, 2023

Publication date: December 19, 2024

Inventors: Dushyant Sharma, Patrick Aubrey Naylor, William Francis Ganong, III
Method for neural beamforming, channel shortening and noise reduction

Patent number: 12165668

Abstract: A method of performing at least de-reverberation and noise-reduction of an input sound signal of at least one input channel includes: performing, using at least one filter element, at least one of de-reverberation and noise-reduction of the input sound signal to generate a clean output sound signal; and determining, by a non-intrusive measure (NIM) estimation element, at least one non-intrusive measure (NIM) from the sound signal, wherein the at least one NIM includes at least one of voice activity detection (VAD) posterior, reverberation time, clarity index, direct-to-reverberant ratio (DRR), and signal-to-noise ratio (SNR); the de-reverberation is achieved by applying at least one channel shortening (CS) filter component of the at least one filter element in conjunction with the at least one NIM; and the noise reduction is performed in combination with the de-reverberation by the channel shortening (CS) filter component.

Type: Grant

Filed: February 18, 2022

Date of Patent: December 10, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Dushyant Sharma, James Fosburgh, Patrick Naylor
System and Method for Secure Processing of Speech Signals using Pseudo-speech Representations

Publication number: 20240404504

Abstract: A method, computer program product, and computing system for processing a speech signal. A sensitive portion of the speech signal is identified. A pseudo-speech representation of the sensitive portion is generated using a voice converter system. Speech processing is performed on the speech signal and the pseudo-speech representation of the sensitive portion using a speech processing system.

Type: Application

Filed: May 31, 2023

Publication date: December 5, 2024

Inventors: Dushyant Sharma, William Francis Ganong, III, Daniel Paulino Almendro Barreda, Patrick Aubrey Naylor, Alvaro Martin Iturralde Zurita, Francesco Nespoli
System and method for data augmentation of feature-based voice data

Patent number: 12154541

Abstract: A method, computer program product, and computing system for receiving feature-based voice data associated with a first acoustic domain. One or more reverberation-based augmentations may be performed on at least a portion of the feature-based voice data, thus defining reverberation-augmented feature-based voice data.

Type: Grant

Filed: March 10, 2021

Date of Patent: November 26, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Dushyant Sharma, Patrick A. Naylor, James W. Fosburgh, Do Yeong Kim
Multi-channel speech compression system and method

Patent number: 12149914

Abstract: A method, computer program product, and computing system for obtaining machine vision encounter information using one or more machine vision systems. Audio encounter information may be obtained using a plurality of audio acquisition devices of an audio recording system. The audio encounter information may be encoded using an audio codec. The encoding of the audio encounter information by the audio codec may be adapted based upon, at least in part, the machine vision encounter information.

Type: Grant

Filed: February 11, 2022

Date of Patent: November 19, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Dushyant Sharma, Patrick A. Naylor, Uwe Helmut Jost
Feature domain bandwidth extension and spectral rebalance for ASR data augmentation

Patent number: 12148437

Abstract: A method of processing speech includes: providing a first set of audio data having audio features in a first bandwidth; down-sampling the first set of audio data to a second bandwidth lower than the first bandwidth; producing, by a high frequency reconstruction network (HFRN), an estimate of audio features in the first bandwidth for the first set of audio data, based on at least the down-sampled audio data; inputting, into the HFRN, a second set of audio data having audio features in the second bandwidth; producing, by the HFRN, based on a second set of audio data having audio features in the second bandwidth, an estimate of audio features in the first bandwidth for the second set of audio data; and training a speech processing system (SPS) using the estimates of audio features in the first bandwidth for the first and second sets of audio data.

Type: Grant

Filed: December 10, 2021

Date of Patent: November 19, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventor: Dushyant Sharma
