Patents by Inventor Piyush BEHRE

Piyush BEHRE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SMART AUDIO SEGMENTATION USING LOOK-AHEAD BASED ACOUSTO-LINGUISTIC FEATURES

Publication number: 20250054491

Abstract: Systems and methods are provided for smart audio segmentation using look-ahead based acousto-linguistic features. For example, systems and methods are provided for obtaining audio, processing the audio, identifying a potential segmentation boundary within the audio, and determining whether to generate a segment break at the potential segmentation boundary. One or more look-ahead words occurring after the potential segmentation boundary are identified, and an acoustic segmentation score and a language segmentation score associated with the potential segmentation boundary and the one or more look-ahead words are generated. Systems then either refrain from generating a segment break at the potential segmentation boundary or generate the segment break there, based on whether the acoustic and/or language segmentation score meets or exceeds a segmentation score threshold.

Type: Application

Filed: December 22, 2021

Publication date: February 13, 2025

Inventors: Sayan Dev PATHAK, Hosam Adel KHALIL, Naveen PARIHAR, Piyush BEHRE, Shuangyu CHANG, Christopher Hakan BASOGLU, Sharman W TAN, Eva SHARMA, Jian WU, Yang LIU, Edward C LIN, Amit Kumar AGARWAL
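The decision rule in this abstract (score a candidate boundary acoustically and linguistically, using the words heard after it, then break only if a score meets a threshold) can be illustrated with a toy Python sketch. This is not the patented method: the pause-length feature, the cue-word list standing in for a language model, the score values, and combining the two scores with `max` (one reading of "and/or") are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cue words that often begin a new sentence (illustrative only).
SENTENCE_STARTERS = {"so", "then", "next", "okay", "anyway"}


@dataclass
class Candidate:
    """A potential segmentation boundary and the words that follow it."""
    pause_ms: float                                       # silence at the boundary (assumed feature)
    look_ahead: List[str] = field(default_factory=list)   # look-ahead words after the boundary


def acoustic_score(c: Candidate, full_pause_ms: float = 800.0) -> float:
    """Toy acoustic score: longer pauses score higher, clipped to [0, 1]."""
    return min(c.pause_ms / full_pause_ms, 1.0)


def language_score(c: Candidate) -> float:
    """Toy language score: high if the first look-ahead word is a sentence starter."""
    if not c.look_ahead:
        return 0.0
    return 0.9 if c.look_ahead[0].lower() in SENTENCE_STARTERS else 0.1


def should_break(c: Candidate, threshold: float = 0.7) -> bool:
    """Generate a segment break only if either score meets or exceeds the threshold."""
    return max(acoustic_score(c), language_score(c)) >= threshold
```

With these toy values, a short 200 ms pause followed by "so" produces a break through the language score, the same pause followed by "the" does not, and a 900 ms pause produces a break regardless of the look-ahead words. The look-ahead is what lets a short pause still count as a boundary.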
CUSTOM DISPLAY POST PROCESSING IN SPEECH RECOGNITION

Publication number: 20240403539

Abstract: Solutions for custom display post processing (DPP) in speech recognition (SR) use a customized multi-stage DPP pipeline that transforms a stream of SR tokens from lexical form to display form. A first transformation stage of the DPP pipeline passes the stream of tokens, in turn, through an upstream filter, a base model stage, and a downstream filter, and transforms a first aspect of the stream of tokens (e.g., disfluency, inverse text normalization (ITN), capitalization, etc.) from lexical form into display form. The upstream filter and/or the downstream filter alter the stream of tokens to change the default behavior of the DPP pipeline into custom behavior. Additional transformation stages of the DPP pipeline perform further transforms, allowing for outputting final text in a display format that is customized for a specific user. This permits each user to efficiently leverage a common baseline DPP pipeline to produce a custom output.

Type: Application

Filed: July 3, 2024

Publication date: December 5, 2024

Inventors: Wei LIU, Padma VARADHARAJAN, Piyush BEHRE, Nicholas KIBRE, Edward C. LIN, Shuangyu CHANG, Che ZHAO, Khuram SHAHID, Heiko Willy RAHMEL
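The stage structure described in this abstract (upstream filter, shared base model, downstream filter, with the filters supplying per-user behavior) can be sketched as function composition. In this hypothetical Python example, the toy rules `remove_fillers` and `capitalize_first` stand in for trained base models, and the "keep the word um" customization is invented for illustration; none of these names come from the application.

```python
from typing import Callable, List

Tokens = List[str]
Transform = Callable[[Tokens], Tokens]


def identity(tokens: Tokens) -> Tokens:
    return tokens


class Stage:
    """One DPP stage: upstream filter -> shared base model -> downstream filter."""

    def __init__(self, base: Transform, upstream: Transform = identity,
                 downstream: Transform = identity) -> None:
        self.base, self.upstream, self.downstream = base, upstream, downstream

    def __call__(self, tokens: Tokens) -> Tokens:
        return self.downstream(self.base(self.upstream(tokens)))


# Shared "base model" behaviors (toy rules standing in for trained models).
FILLERS = {"um", "uh"}

def remove_fillers(tokens: Tokens) -> Tokens:
    return [t for t in tokens if t not in FILLERS]

def capitalize_first(tokens: Tokens) -> Tokens:
    return [tokens[0].capitalize()] + tokens[1:] if tokens else tokens


# Per-user customization: hide "um" from the base model, then restore it.
def protect_um(tokens: Tokens) -> Tokens:
    return ["<um>" if t == "um" else t for t in tokens]

def restore_um(tokens: Tokens) -> Tokens:
    return ["um" if t == "<um>" else t for t in tokens]


def run_pipeline(stages: List[Stage], tokens: Tokens) -> str:
    for stage in stages:
        tokens = stage(tokens)
    return " ".join(tokens)


DEFAULT = [Stage(remove_fillers), Stage(capitalize_first)]
CUSTOM = [Stage(remove_fillers, upstream=protect_um, downstream=restore_um),
          Stage(capitalize_first)]
```

For the tokens `["um", "send", "uh", "the", "file"]`, the default pipeline yields "Send the file", while the customized one yields "Um send the file". The base models are untouched; only the filters around them differ, which mirrors the abstract's point that each user reuses a common baseline pipeline.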
SYSTEMS AND METHODS FOR GPT GUIDED NEURAL PUNCTUATION FOR CONVERSATIONAL SPEECH

Publication number: 20240144931

Abstract: Some disclosed embodiments are directed to obtaining decoded audio data including a spoken language utterance recognized in audio data and identifying a disfluency in the decoded audio data. Upon determining that correcting the disfluency would improve a readability score of the decoded audio data, the system generates a particular correction to correct the disfluency and applies the particular correction to the decoded audio data. Updated decoded audio data is then generated which reflects the particular correction. The updated decoded audio data has improved readability over the decoded audio data.

Type: Application

Filed: November 1, 2022

Publication date: May 2, 2024

Inventors: Sayan Dev PATHAK, Ayush VIKRAM, Zoltan ROMOCSA, Amy Parag SHAH, Piyush BEHRE, Sharman W TAN, Amit Kumar AGARWAL, Christopher Hakan BASOGLU
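The control flow in this abstract (propose a correction for a disfluency and apply it only if it improves a readability score) can be shown with a small Python sketch. The readability metric, the restriction to immediate word repetitions ("I I think"), and the regex-based correction are all assumptions made for illustration; the application does not specify them here.

```python
import re

FILLERS = {"um", "uh"}
# Matches a word immediately followed by a repetition of itself ("I I think").
REPEAT = re.compile(r"\b(\w+) (?=\1\b)")


def readability(text: str) -> float:
    """Toy readability score: 1 minus the number of fillers and
    immediate repetitions per word."""
    words = text.lower().split()
    if not words:
        return 1.0
    fillers = sum(w in FILLERS for w in words)
    repeats = sum(a == b for a, b in zip(words, words[1:]))
    return 1.0 - (fillers + repeats) / len(words)


def correct_disfluency(text: str) -> str:
    """Propose one correction (drop the first repeated word) and apply it
    only if it raises the readability score."""
    candidate = REPEAT.sub("", text, count=1)
    if candidate != text and readability(candidate) > readability(text):
        return candidate
    return text
```

Here `correct_disfluency("I I think we should go")` returns "I think we should go", and text without a repetition is returned unchanged. Keeping the scoring step separate from the correction step means a stronger readability model could be swapped in without changing the gating logic.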
SYSTEMS AND METHODS FOR SEMANTIC SEGMENTATION FOR SPEECH

Publication number: 20240087572

Abstract: Systems are configured to obtain streaming audio data comprising language utterances, continuously decode the streaming audio data in order to generate decoded streaming audio data and determine whether a linguistic boundary exists within an initial segment of decoded streaming audio data. When a linguistic boundary is determined to exist, the systems apply a punctuation at the linguistic boundary and output a first portion of the initial segment of the streaming audio data ending at the linguistic boundary while refraining from outputting a second portion of the initial segment which is located temporally subsequent to the first portion of the initial segment. Systems are also configured to delay the output until predetermined punctuation validation processes have been performed.

Type: Application

Filed: November 14, 2022

Publication date: March 14, 2024

Inventors: Sayan Dev PATHAK, Amit Kumar AGARWAL, Amy Parag SHAH, Sourish CHATTERJEE, Zoltan ROMOCSA, Christopher Hakan BASOGLU, Piyush BEHRE, Shuangyu CHANG, Emilian Yordanov STOIMENOV
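The output policy in this abstract (punctuate and emit the decoded text up to a linguistic boundary, while holding back the portion after it) can be sketched as a small streaming generator. The boundary detector below is a hypothetical cue-word rule standing in for a trained model, and the punctuation validation delay mentioned in the abstract is omitted.

```python
from typing import Iterable, Iterator, List

# Hypothetical cue words; a real system would use a trained boundary model.
SENTENCE_STARTERS = {"so", "then", "also"}


def last_boundary(buffer: List[str]) -> int:
    """Index of the last linguistic boundary in the buffer, or 0 if none.
    Toy rule: a boundary falls just before a cue word (never at index 0)."""
    for i in range(len(buffer) - 1, 0, -1):
        if buffer[i] in SENTENCE_STARTERS:
            return i
    return 0


def punctuate(words: List[str]) -> str:
    text = " ".join(words)
    return text[:1].upper() + text[1:] + "."


def stream_segments(decoded: Iterable[str]) -> Iterator[str]:
    """Emit text up to each boundary; hold back the words after it."""
    buffer: List[str] = []
    for word in decoded:
        buffer.append(word)
        cut = last_boundary(buffer)
        if cut:
            first, buffer = buffer[:cut], buffer[cut:]
            yield punctuate(first)       # first portion is output now
    if buffer:                           # flush the held-back portion at end of stream
        yield punctuate(buffer)
```

Feeding the words of "we shipped it so now we test" yields "We shipped it." as soon as "so" arrives, and "So now we test." only when the stream ends, because no later boundary is found for the held-back words.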
STREAMING PUNCTUATION FOR LONG-FORM DICTATION

Publication number: 20230352009

Abstract: Systems generate segments of spoken language utterances based on different sets of segmentation boundaries. The systems are also configured to generate one or more formatted segments by assigning punctuation tags at segmentation boundaries and to generate one or more final sentences from the one or more segments.

Type: Application

Filed: April 29, 2022

Publication date: November 2, 2023

Inventors: Piyush BEHRE, Sharman W TAN, Shuangyu CHANG, Padma VARADHARAJAN, Sayan Dev PATHAK, Ravikant GUPTA
CUSTOM DISPLAY POST PROCESSING IN SPEECH RECOGNITION

Publication number: 20230351098

Abstract: Solutions for custom display post processing (DPP) in speech recognition (SR) use a customized multi-stage DPP pipeline that transforms a stream of SR tokens from lexical form to display form. A first transformation stage of the DPP pipeline passes the stream of tokens, in turn, through an upstream filter, a base model stage, and a downstream filter, and transforms a first aspect of the stream of tokens (e.g., disfluency, inverse text normalization (ITN), capitalization, etc.) from lexical form into display form. The upstream filter and/or the downstream filter alter the stream of tokens to change the default behavior of the DPP pipeline into custom behavior. Additional transformation stages of the DPP pipeline perform further transforms, allowing for outputting final text in a display format that is customized for a specific user. This permits each user to efficiently leverage a common baseline DPP pipeline to produce a custom output.

Type: Application

Filed: July 26, 2022

Publication date: November 2, 2023

Inventors: Wei LIU, Padma VARADHARAJAN, Piyush BEHRE, Nicholas KIBRE, Edward C. LIN, Shuangyu CHANG, Che ZHAO, Khuram SHAHID, Heiko Willy RAHMEL
LOW-RESOURCE, MULTI-LINGUAL TRANSFORMER MODELS

Publication number: 20230297606

Abstract: Generally discussed herein are devices, systems, and methods for multi-lingual model generation. A method can include determining, for low-resource languages, a respective language similarity value indicating the similarity between each pair of the low-resource languages; clustering the low-resource languages into groups based on the respective language similarity values; aggregating training data of the languages corresponding to a given group, resulting in aggregated training data; and training a re-ranking language model based on the aggregated training data, resulting in a trained re-ranking language model.

Type: Application

Filed: June 14, 2022

Publication date: September 21, 2023

Inventors: Li MIAO, Jian WU, Shuangyu CHANG, Piyush BEHRE, Sarangarajan PARTHASARATHY
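The first three steps of this abstract (pairwise similarity values, clustering into groups, pooling each group's training data) can be sketched in Python. The choice of single-linkage clustering with a fixed threshold is an assumption, not the application's stated method, and the language labels and similarity values in the example are made up; training the re-ranking language model itself is out of scope here.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def cluster_languages(langs: List[str], sim: Dict[Tuple[str, str], float],
                      threshold: float) -> List[List[str]]:
    """Group languages whose pairwise similarity meets the threshold
    (single-linkage clustering via union-find)."""
    parent = {lang: lang for lang in langs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in combinations(langs, 2):
        score = sim.get((a, b), sim.get((b, a), 0.0))
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for lang in langs:
        groups.setdefault(find(lang), []).append(lang)
    return list(groups.values())


def aggregate(groups: List[List[str]], corpora: Dict[str, List[str]]) -> List[List[str]]:
    """Pool the training sentences of every language in each group."""
    return [[s for lang in group for s in corpora[lang]] for group in groups]
```

With placeholder languages `A` to `D`, similarities `{("A", "B"): 0.8, ("C", "D"): 0.75, ("A", "C"): 0.2}` and a threshold of 0.7, the result is `[["A", "B"], ["C", "D"]]`, and each group's corpora are concatenated into one training set per group.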
ON-DEVICE STREAMING INVERSE TEXT NORMALIZATION (ITN)

Publication number: 20230289536

Abstract: Solutions for on-device streaming inverse text normalization (ITN) include: receiving a stream of tokens, each token representing an element of human speech; tagging, by a tagger that can work in a streaming manner (e.g., a neural network), the stream of tokens with one or more tags of a plurality of tags to produce a tagged stream of tokens, each tag of the plurality of tags representing a different normalization category of a plurality of normalization categories; based on at least a first tag representing a first normalization category, converting, by a first language converter of a plurality of category-specific natural language converters (e.g., weighted finite state transducers, WFSTs), at least one token of the tagged stream of tokens, from a first lexical language form, to a first natural language form; and based on at least the first natural language form, outputting a natural language representation of the stream of tokens.

Type: Application

Filed: March 11, 2022

Publication date: September 14, 2023

Inventors: Yashesh GAUR, Nicholas KIBRE, Issac J. ALPHONSO, Jian XUE, Jinyu LI, Piyush BEHRE, Shawn CHANG
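The two-step design in this abstract (tag each token with a normalization category, then hand tagged runs to a category-specific converter) can be sketched in Python. The abstract names a neural tagger and weighted finite state transducers as examples; this sketch swaps in a toy dictionary tagger and a hand-written converter for cardinal numbers up to 99, so the category name `CARDINAL` and every function here are hypothetical. The tagger labels tokens independently, so it could run token by token, but `itn` processes a whole utterance for brevity.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple

ONES = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
NUMBER_WORDS = ONES.keys() | TENS.keys()


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Toy stand-in for the tagger: label each token independently,
    so it could run token by token on a stream."""
    return [(t, "CARDINAL" if t in NUMBER_WORDS else "O") for t in tokens]


def convert_cardinal(words: List[str]) -> str:
    """Category-specific converter for number words up to 99:
    'twenty three' -> '23', 'four five' -> '4 5'."""
    numbers: List[int] = []
    i = 0
    while i < len(words):
        w = words[i]
        nxt = words[i + 1] if i + 1 < len(words) else ""
        if w in TENS and 0 < ONES.get(nxt, 0) < 10:
            numbers.append(TENS[w] + ONES[nxt])
            i += 2
        else:
            numbers.append(TENS[w] if w in TENS else ONES[w])
            i += 1
    return " ".join(map(str, numbers))


CONVERTERS: Dict[str, Callable[[List[str]], str]] = {"CARDINAL": convert_cardinal}


def itn(tokens: List[str]) -> str:
    """Tag tokens, convert each tagged run with its category's converter,
    and pass untagged ('O') runs through unchanged."""
    out: List[str] = []
    for label, run in groupby(tag(tokens), key=lambda pair: pair[1]):
        words = [w for w, _ in run]
        converter = CONVERTERS.get(label)
        out.append(converter(words) if converter else " ".join(words))
    return " ".join(out)
```

For example, the lexical tokens of "i owe you twenty three dollars" come out as "i owe you 23 dollars". Adding a new category (dates, currency) would mean adding a tag and one more entry in `CONVERTERS`, which is the modularity the tag-then-convert split provides.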