Patents by Inventor Prashant SRIDHAR

Prashant SRIDHAR has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11922951
    Abstract: Techniques are disclosed that enable processing of audio data to generate one or more refined versions of the audio data, where each refined version isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by applying a frequency transformation to the audio data) with a mask; the mask is produced by a trained voice filter model from the spectrogram and a speaker embedding for that speaker. Output generated by the trained voice filter model is then processed with the inverse of the frequency transformation to generate the refined audio data.
    Type: Grant
    Filed: January 3, 2022
    Date of Patent: March 5, 2024
    Assignee: GOOGLE LLC
    Inventors: Quan Wang, Prashant Sridhar, Ignacio Lopez Moreno, Hannah Muckenhirn
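    The masking pipeline described in the abstract above can be sketched end to end. This is a minimal illustration, not the patented implementation: the real voice filter model is a trained neural network, so `mask_model` below is a hypothetical stand-in that takes the spectrogram magnitude and a speaker embedding and returns a soft mask, and an STFT plays the role of the frequency transformation.

    ```python
    import numpy as np

    FRAME, HOP = 256, 128
    # Periodic Hann window: with 50% overlap it sums to exactly 1,
    # so plain overlap-add inverts the transform in the interior.
    WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME)

    def stft(x):
        # Frequency transformation: windowed frames -> complex spectrogram (T, F).
        starts = range(0, len(x) - FRAME + 1, HOP)
        return np.array([np.fft.rfft(WINDOW * x[s:s + FRAME]) for s in starts])

    def istft(spec):
        # Inverse of the frequency transformation via overlap-add.
        x = np.zeros(HOP * (len(spec) - 1) + FRAME)
        for k, frame in enumerate(spec):
            x[k * HOP:k * HOP + FRAME] += np.fft.irfft(frame, FRAME)
        return x

    def voice_filter(audio, speaker_embedding, mask_model):
        # 1. Spectrogram of the mixed audio.
        spec = stft(audio)
        # 2. Mask from the (stand-in) voice filter model, conditioned on the
        #    spectrogram magnitude and the target speaker's embedding.
        mask = mask_model(np.abs(spec), speaker_embedding)
        # 3. Apply the mask, then invert the transform to get refined audio.
        return istft(spec * mask)
    ```

    With an all-ones mask the pipeline reconstructs the interior of its input, a convenient sanity check that the inverse transformation is implemented correctly.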
  • Patent number: 11646011
    Abstract: Methods and systems are provided for training and/or using a language selection model to determine the particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language is selected based on the generated probabilities. Speech recognition results for the selected language can then be utilized. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses, which can make training more efficient and/or yield a more accurate and robust model, thereby mitigating erroneous language selections for spoken utterances.
    Type: Grant
    Filed: June 22, 2022
    Date of Patent: May 9, 2023
    Assignee: GOOGLE LLC
    Inventors: Li Wan, Yang Yu, Prashant Sridhar, Ignacio Lopez Moreno, Quan Wang
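    The tuple loss mentioned in the abstract can be contrasted with ordinary cross-entropy on a single logit vector. The sketch below assumes a pairwise ("tuplemax"-style) formulation that averages a two-way softmax loss over every (target language, other language) pair rather than normalizing over all N languages at once; the exact loss used in the patent may differ.

    ```python
    import numpy as np

    def cross_entropy(logits, target):
        # Standard softmax cross-entropy over all N languages at once.
        z = logits - logits.max()
        return float(np.log(np.exp(z).sum()) - z[target])

    def tuple_loss(logits, target):
        # Tuple-style loss: average the two-way softmax loss over every
        # (target, other-language) pair instead of one N-way softmax.
        pair_losses = []
        for j in range(len(logits)):
            if j == target:
                continue
            pair = np.array([logits[target], logits[j]])
            z = pair - pair.max()
            pair_losses.append(float(np.log(np.exp(z).sum()) - z[0]))
        return float(np.mean(pair_losses))
    ```

    Because each pair is normalized independently, a confusable non-target language cannot "steal" probability mass from the comparison between the target and the remaining languages, which is one intuition for why tuple losses can train more robust selection models.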
  • Publication number: 20220328035
    Abstract: Methods and systems are provided for training and/or using a language selection model to determine the particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language is selected based on the generated probabilities. Speech recognition results for the selected language can then be utilized. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses, which can make training more efficient and/or yield a more accurate and robust model, thereby mitigating erroneous language selections for spoken utterances.
    Type: Application
    Filed: June 22, 2022
    Publication date: October 13, 2022
    Inventors: Li Wan, Yang Yu, Prashant Sridhar, Ignacio Lopez Moreno, Quan Wang
  • Publication number: 20220319501
    Abstract: The amount of future context used in a speech processing application allows a tradeoff between performance and the delay in providing results to users. Existing speech processing applications may be trained with a specified future-context size and perform poorly when used in production with a different one. Training with a stochastic future context instead allows a trained neural network to be used in production with varying amounts of future context. During each update step in training, a future-context size may be sampled from a probability distribution, used to mask the neural network, and used to compute an output of the masked network. The output may then be used to compute a loss value and update the network's parameters. The trained network can then serve production speech processing applications with different amounts of future context, providing greater flexibility.
    Type: Application
    Filed: November 18, 2021
    Publication date: October 6, 2022
    Inventors: Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Jeong Han
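    The stochastic-future-context idea above can be illustrated with a toy self-attention layer: sample a right-context size r for each update step and mask out frames more than r steps in the future. The attention layer here is a hypothetical minimal stand-in for whatever network a real speech application would use, and the loss/parameter update is omitted.

    ```python
    import numpy as np

    def future_context_mask(T, r):
        # Frame t may attend to frame j only when j <= t + r,
        # i.e. at most r frames of future context.
        idx = np.arange(T)
        return idx[None, :] <= idx[:, None] + r

    def masked_self_attention(X, r):
        # Toy single-head self-attention restricted by the mask above.
        scores = X @ X.T / np.sqrt(X.shape[1])
        scores = np.where(future_context_mask(len(X), r), scores, -np.inf)
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        return w @ X

    def training_step(X, rng, max_future=8):
        # Stochastic future context: sample r from a (here uniform)
        # distribution for this update, then run the masked forward pass.
        r = int(rng.integers(0, max_future + 1))
        return r, masked_self_attention(X, r)
    ```

    At inference time the same network can be run with whatever fixed r the latency budget allows, since training has exposed it to the full range of context sizes.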
  • Patent number: 11410641
    Abstract: Methods and systems are provided for training and/or using a language selection model to determine the particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language is selected based on the generated probabilities. Speech recognition results for the selected language can then be utilized. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses, which can make training more efficient and/or yield a more accurate and robust model, thereby mitigating erroneous language selections for spoken utterances.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: August 9, 2022
    Assignee: GOOGLE LLC
    Inventors: Li Wan, Yang Yu, Prashant Sridhar, Ignacio Lopez Moreno, Quan Wang
  • Patent number: 11379792
    Abstract: An inventory management server is provided. The server includes at least one processor and at least one memory. The memory includes computer program code configured to cause the server at least to: receive, from a payment network, tracking data assigned to a product; interrogate a mapping table of product-to-tracking-data assignments for the presence of the received tracking data; update an inventory database for the product stocked in the merchant's inventory in response to detecting the received tracking data; and transmit acknowledgement data indicative of the inventory database update. The tracking data is transmitted by a merchant via a payment terminal in communication with the payment network.
    Type: Grant
    Filed: June 16, 2017
    Date of Patent: July 5, 2022
    Assignee: MASTERCARD ASIA/PACIFIC PTE. LTD.
    Inventors: Hao Tang, Senxian Zhuo, Xijing Wang, Bensam Joyson, Naman Aggarwal, Donghao Huang, Prashant Sridhar, Martin Collings, Perry Kick
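    The server behaviour recited in the abstract maps onto a small sketch: look the received tracking data up in the mapping table, update the inventory database on a hit, and return an acknowledgement. The decrement-on-sale update and the dictionary-backed "database" are illustrative assumptions, not details from the patent.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class InventoryServer:
        # Mapping table: tracking data -> product id, assigned at stocking time.
        mapping_table: dict = field(default_factory=dict)
        # Inventory "database": product id -> stock count.
        inventory: dict = field(default_factory=dict)

        def on_tracking_data(self, tracking_data):
            # Interrogate the mapping table for the received tracking data.
            product = self.mapping_table.get(tracking_data)
            if product is None:
                # Unknown tracking data: acknowledge without updating.
                return {"ack": False}
            # Assumed update rule: the payment terminal reported a sale,
            # so decrement the stock of the matched product.
            self.inventory[product] = self.inventory.get(product, 0) - 1
            return {"ack": True, "product": product,
                    "stock": self.inventory[product]}
    ```

    In the patented system the tracking data would arrive over a payment network from a merchant's payment terminal; here `on_tracking_data` is simply called directly.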
  • Publication number: 20220122611
    Abstract: Techniques are disclosed that enable processing of audio data to generate one or more refined versions of the audio data, where each refined version isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by applying a frequency transformation to the audio data) with a mask; the mask is produced by a trained voice filter model from the spectrogram and a speaker embedding for that speaker. Output generated by the trained voice filter model is then processed with the inverse of the frequency transformation to generate the refined audio data.
    Type: Application
    Filed: January 3, 2022
    Publication date: April 21, 2022
    Inventors: Quan Wang, Prashant Sridhar, Ignacio Lopez Moreno, Hannah Muckenhirn
  • Patent number: 11217254
    Abstract: Techniques are disclosed that enable processing of audio data to generate one or more refined versions of the audio data, where each refined version isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by applying a frequency transformation to the audio data) with a mask; the mask is produced by a trained voice filter model from the spectrogram and a speaker embedding for that speaker. Output generated by the trained voice filter model is then processed with the inverse of the frequency transformation to generate the refined audio data.
    Type: Grant
    Filed: October 10, 2019
    Date of Patent: January 4, 2022
    Assignee: GOOGLE LLC
    Inventors: Quan Wang, Prashant Sridhar, Ignacio Lopez Moreno, Hannah Muckenhirn
  • Publication number: 20200335083
    Abstract: Methods and systems are provided for training and/or using a language selection model to determine the particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language is selected based on the generated probabilities. Speech recognition results for the selected language can then be utilized. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses, which can make training more efficient and/or yield a more accurate and robust model, thereby mitigating erroneous language selections for spoken utterances.
    Type: Application
    Filed: November 27, 2019
    Publication date: October 22, 2020
    Inventors: Li Wan, Yang Yu, Prashant Sridhar, Ignacio Lopez Moreno, Quan Wang
  • Publication number: 20200202869
    Abstract: Techniques are disclosed that enable processing of audio data to generate one or more refined versions of the audio data, where each refined version isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by applying a frequency transformation to the audio data) with a mask; the mask is produced by a trained voice filter model from the spectrogram and a speaker embedding for that speaker. Output generated by the trained voice filter model is then processed with the inverse of the frequency transformation to generate the refined audio data.
    Type: Application
    Filed: October 10, 2019
    Publication date: June 25, 2020
    Inventors: Quan Wang, Prashant Sridhar, Ignacio Lopez Moreno, Hannah Muckenhirn
  • Publication number: 20170372264
    Abstract: An inventory management server is provided. The server includes at least one processor and at least one memory. The memory includes computer program code configured to cause the server at least to: receive, from a payment network, tracking data assigned to a product; interrogate a mapping table of product-to-tracking-data assignments for the presence of the received tracking data; update an inventory database for the product stocked in the merchant's inventory in response to detecting the received tracking data; and transmit acknowledgement data indicative of the inventory database update. The tracking data is transmitted by a merchant via a payment terminal in communication with the payment network.
    Type: Application
    Filed: June 16, 2017
    Publication date: December 28, 2017
    Inventors: Hao Tang, Senxian Zhuo, Xijing Wang, Bensam Joyson, Naman Aggarwal, Donghao Huang, Prashant Sridhar, Martin Collings, Perry Kick
  • Publication number: 20170201377
    Abstract: There is provided a data-processor-implemented method for dynamic authentication of an object, along with non-transitory computer-readable storage media and systems for carrying out such dynamic authentication.
    Type: Application
    Filed: January 9, 2017
    Publication date: July 13, 2017
    Applicant: MASTERCARD ASIA/PACIFIC PTE LTD
    Inventors: Hao TANG, Xijing WANG, Senxian ZHUO, Yong-How CHIN, Jiaming LI, Bensam JOYSON, Donghao HUANG, Martin COLLINGS, Prashant SRIDHAR, Perry KICK