Patents by Inventor Xiaozhuo Cheng

Xiaozhuo Cheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Systems and methods for configuring and using an audio transcript correction machine learning model

Patent number: 11922947

Abstract: A system, method, and computer-program product includes constructing a transcript correction training data corpus that includes a plurality of labeled audio transcription training data samples, wherein each of the plurality of labeled audio transcription training data samples includes: an incorrect audio transcription of a target piece of audio data; a correct audio transcription of the target piece of audio data; and a transcript correction identifier that, when applied to a model input that includes a likely incorrect audio transcript, defines a text-to-text transformation objective causing an audio transcript correction machine learning model to predict a corrected audio transcript based on the likely incorrect audio transcript; configuring the audio transcript correction machine learning model based on a training of a machine learning text-to-text transformer model using the transcript correction training data corpus; and executing the audio transcript correction machine learning model within a speech-to-

Type: Grant

Filed: June 26, 2023

Date of Patent: March 5, 2024

Assignee: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
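The abstract describes a training corpus in which an identifier, applied to the model input, selects the text-to-text objective. This resembles the task-prefix convention of T5-style text-to-text transformers. Assuming the transcript correction identifier is such a prefix, corpus construction could be sketched in Python as follows; the prefix string and the `TextToTextSample` type are hypothetical, and the transformer training itself is not shown.

```python
from dataclasses import dataclass

# Hypothetical identifier; the abstract does not say what string is used.
CORRECTION_PREFIX = "correct transcript: "


@dataclass(frozen=True)
class TextToTextSample:
    source: str  # model input: identifier followed by the incorrect transcript
    target: str  # training label: the correct transcript


def build_correction_corpus(pairs):
    """Turn (incorrect, correct) transcript pairs into labeled text-to-text samples."""
    return [TextToTextSample(CORRECTION_PREFIX + wrong, right) for wrong, right in pairs]


def make_model_input(likely_incorrect):
    """At inference time, the same identifier selects the correction objective."""
    return CORRECTION_PREFIX + likely_incorrect


corpus = build_correction_corpus([
    ("the whether is nice today", "the weather is nice today"),
    ("i scream for ice cream", "I scream for ice cream"),
])
for sample in corpus:
    print(sample.source, "->", sample.target)
```

Reusing one prefix constant for both training samples and inference inputs keeps the two consistent, which is what lets a single text-to-text model host several objectives.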

Multithreaded speech data preprocessing

Patent number: 11862171

Abstract: An apparatus includes a processor to: receive, from a requesting device, a request to perform speech-to-text conversion of a speech data set; within a first thread of a thread pool, perform a first pause detection technique to identify a first set of likely sentence pauses; within a second thread of the thread pool, perform a second pause detection technique to identify a second set of likely sentence pauses; perform a speaker diarization technique to identify a set of likely speaker changes; divide the speech data set into data segments representing speech segments based on a combination of at least the first set of likely sentence pauses, the second set of likely sentence pauses, and the set of likely speaker changes; use at least an acoustic model with each data segment to identify likely speech sounds; and generate a transcript based, at least in part, on the identified likely speech sounds.

Type: Grant

Filed: November 23, 2022

Date of Patent: January 2, 2024

Assignee: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Samuel Norris Henderson, Xu Yang
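A minimal sketch of the multithreaded preprocessing idea, under stated assumptions: the two pause detection techniques are hypothetical energy rules run in separate threads of a `ThreadPoolExecutor`, per-frame energies stand in for the speech data set, speaker changes are passed in as if already produced by a diarization step, and the combination rule (pauses both techniques agree on, plus every speaker change) is an illustrative choice rather than the patented method.

```python
from concurrent.futures import ThreadPoolExecutor


def absolute_pauses(energies, threshold=0.1):
    """Hypothetical technique 1: frames whose energy is below a fixed threshold."""
    return {i for i, e in enumerate(energies) if e < threshold}


def relative_pauses(energies, ratio=0.2):
    """Hypothetical technique 2: frames quieter than a fraction of the loudest frame."""
    peak = max(energies)
    return {i for i, e in enumerate(energies) if e < ratio * peak}


def collapse_runs(indices):
    """Reduce each run of consecutive frame indices to the index where it starts."""
    cuts, prev = [], None
    for i in sorted(indices):
        if prev is None or i != prev + 1:
            cuts.append(i)
        prev = i
    return cuts


def segment(energies, speaker_changes):
    # Run the two pause detectors concurrently in a small thread pool.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(absolute_pauses, energies)
        second = pool.submit(relative_pauses, energies)
        pauses_a, pauses_b = first.result(), second.result()
    # Assumed combination rule: pauses both techniques agree on, plus speaker changes.
    cuts = collapse_runs((pauses_a & pauses_b) | set(speaker_changes))
    edges = [0] + [c for c in cuts if 0 < c < len(energies)] + [len(energies)]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


energies = [0.9, 0.8, 0.05, 0.04, 0.7, 0.9, 0.6, 0.02, 0.8]
print(segment(energies, speaker_changes=[5]))
```

Each returned tuple is a half-open frame range that would then be passed to the acoustic model.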

Method for configuring and using a numeric-to-alphabetic expression machine learning model

Publication number: 20230386473

Abstract: A system, method, and computer-program product includes constructing a transcript adaptation training data corpus that includes a plurality of transcript normalization training data samples, wherein each of the plurality of transcript normalization training data samples includes: a predicted audio transcript that includes at least one numerical expression, an adapted audio transcript that includes an alphabetic representation of the at least one numerical expression, and a transcript normalization identifier that, when applied to a model input comprising a target audio transcript, defines a text-to-text transformation objective causing a numeric-to-alphabetic expression machine learning model to predict an alphabetic-equivalent audio transcript that represents each numerical expression included in the target audio transcript in one or more alphabetic tokens; configuring the numeric-to-alphabetic expression machine learning model based on a training of a machine learning text-to-text transformer model using th

Type: Application

Filed: July 11, 2023

Publication date: November 30, 2023

Applicant: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
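The abstract leaves open how the alphabetic side of each training sample is produced. Assuming a simple rule-based converter, one (model input, label) pair could be built as below; the `spell out numbers: ` identifier is hypothetical, and the converter handles only non-negative integers below one million, so decimals, currency, and dates would need more rules.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical identifier; the abstract does not specify its text.
NORMALIZATION_PREFIX = "spell out numbers: "


def int_to_words(n):
    """Spell out a non-negative integer below one million in English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + ("" if rest == 0 else " " + int_to_words(rest))
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        return int_to_words(thousands) + " thousand" + ("" if rest == 0 else " " + int_to_words(rest))
    raise ValueError("this sketch only handles values below one million")


def adapt_transcript(predicted):
    """Replace every run of digits with its alphabetic tokens."""
    return re.sub(r"\d+", lambda m: int_to_words(int(m.group())), predicted)


def make_sample(predicted):
    """Build one (model input, training label) pair for the text-to-text model."""
    return NORMALIZATION_PREFIX + predicted, adapt_transcript(predicted)


print(make_sample("the meeting moved to room 305 at 2 pm"))
```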

Systems and methods for configuring and using an audio transcript correction machine learning model

Publication number: 20230360652

Abstract: A system, method, and computer-program product includes constructing a transcript correction training data corpus that includes a plurality of labeled audio transcription training data samples, wherein each of the plurality of labeled audio transcription training data samples includes: an incorrect audio transcription of a target piece of audio data; a correct audio transcription of the target piece of audio data; and a transcript correction identifier that, when applied to a model input that includes a likely incorrect audio transcript, defines a text-to-text transformation objective causing an audio transcript correction machine learning model to predict a corrected audio transcript based on the likely incorrect audio transcript; configuring the audio transcript correction machine learning model based on a training of a machine learning text-to-text transformer model using the transcript correction training data corpus; and executing the audio transcript correction machine learning model within a speech-to-

Type: Application

Filed: June 26, 2023

Publication date: November 9, 2023

Applicant: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

Multi-threaded speaker identification

Patent number: 11810572

Abstract: A system, method, and computer-program product includes distributing a plurality of audio data files of a speech data corpus to a plurality of computing nodes that each implement a plurality of audio processing threads, executing the plurality of audio processing threads associated with each of the plurality of computing nodes to detect a plurality of tentative speakers participating in each of the plurality of audio data files, generating, via a clustering algorithm, a plurality of clusters of embedding signatures based on a plurality of embedding signatures associated with the plurality of tentative speakers in each of the plurality of audio data files, and detecting a plurality of global speakers associated with the speech data corpus based on the plurality of clusters of embedding signatures.

Type: Grant

Filed: June 8, 2023

Date of Patent: November 7, 2023

Assignee: SAS Institute Inc.

Inventors: Xiaozhuo Cheng, Xiaolong Li, Xu Yang
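The clustering step can be sketched in Python, assuming each tentative speaker already has a fixed-length embedding signature. The greedy cosine-similarity clustering below is a simple stand-in for whatever clustering algorithm the patent uses; the 0.8 threshold, file names, and two-dimensional embeddings are hypothetical, and the distribution of files across computing nodes and threads is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def assign_global_speakers(tentative, threshold=0.8):
    """Map each (file, local speaker) embedding to a corpus-wide speaker id.

    Greedy online clustering: join the most similar existing cluster if its
    centroid similarity reaches the threshold, otherwise start a new cluster.
    """
    centroids, counts, labels = [], [], {}
    for key in sorted(tentative):
        emb = tentative[key]
        best, best_sim = None, threshold
        for gid, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = gid, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            n = counts[best]
            # Update the running mean of the cluster's embeddings.
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
        labels[key] = best
    return labels


tentative = {
    ("call_a.wav", 0): [1.0, 0.0],
    ("call_a.wav", 1): [0.0, 1.0],
    ("call_b.wav", 0): [0.95, 0.1],
    ("call_b.wav", 1): [0.1, 0.9],
}
print(assign_global_speakers(tentative))
```

Because clusters are built online, results can depend on processing order; sorting the keys keeps the sketch deterministic. Embeddings must be non-zero for the cosine similarity to be defined.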

Multi-threaded speaker identification

Publication number: 20230317083

Abstract: A system, method, and computer-program product includes distributing a plurality of audio data files of a speech data corpus to a plurality of computing nodes that each implement a plurality of audio processing threads, executing the plurality of audio processing threads associated with each of the plurality of computing nodes to detect a plurality of tentative speakers participating in each of the plurality of audio data files, generating, via a clustering algorithm, a plurality of clusters of embedding signatures based on a plurality of embedding signatures associated with the plurality of tentative speakers in each of the plurality of audio data files, and detecting a plurality of global speakers associated with the speech data corpus based on the plurality of clusters of embedding signatures.

Type: Application

Filed: June 8, 2023

Publication date: October 5, 2023

Applicant: SAS Institute Inc.

Inventors: Xiaozhuo Cheng, Xiaolong Li, Xu Yang

Multithreaded speech-to-text processing

Patent number: 11776545

Abstract: An apparatus includes a processor to: receive a request to perform speech-to-text conversion of a speech data set; perform pause detection to identify a set of likely sentence pauses and/or a speaker diarization technique to identify a set of likely speaker changes; based on the set of likely sentence pauses and/or the set of likely speaker changes, divide the speech data set into data segments representing speech segments; use an acoustic model with the data segments to derive sets of probabilities of speech sounds uttered; store the sets of probabilities in temporal order within a buffer queue; distribute the sets of probabilities from the buffer queue in temporal order among threads of a thread pool; and within each thread, based on one or more sets of probabilities, derive one candidate word and select either the candidate word or an alternate candidate word derived from a language model as the next word most likely spoken.

Type: Grant

Filed: November 28, 2022

Date of Patent: October 3, 2023

Assignee: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Samuel Norris Henderson, Xu Yang
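A minimal sketch of the buffer-queue and thread-pool arrangement, assuming word-level probabilities in place of the speech-sound probabilities the abstract describes. The language-model scores, the alternate-word table, and the 50/50 weighting are hypothetical.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical language-model scores and alternates, standing in for a real model.
LM_SCORES = {"the": 0.5, "weather": 0.6, "whether": 0.1, "is": 0.5, "nice": 0.5, "mice": 0.05}
LM_ALTERNATES = {"whether": "weather", "mice": "nice"}


def decode_one(am_probs, lm_weight=0.5):
    """Choose the acoustic candidate or the language model's alternate, whichever scores higher."""
    candidate = max(am_probs, key=am_probs.get)
    alternate = LM_ALTERNATES.get(candidate)

    def score(word):
        return (1 - lm_weight) * am_probs.get(word, 0.0) + lm_weight * LM_SCORES.get(word, 0.0)

    if alternate is not None and score(alternate) > score(candidate):
        return alternate
    return candidate


def transcribe(prob_sets, workers=4):
    buffer = deque(prob_sets)  # buffer queue holding probability sets in temporal order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so words stay in temporal order
        # even if the threads finish out of order.
        words = list(pool.map(decode_one, buffer))
    return " ".join(words)


prob_sets = [
    {"the": 0.9, "a": 0.1},
    {"whether": 0.55, "weather": 0.45},
    {"is": 0.8, "as": 0.2},
    {"nice": 0.7, "mice": 0.3},
]
print(transcribe(prob_sets))
```

A real decoder would also feed the preceding words into the language model; this sketch scores each word independently to keep the threading structure visible.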

Multithreaded speech data preprocessing

Publication number: 20230107312

Abstract: An apparatus includes a processor to: receive, from a requesting device, a request to perform speech-to-text conversion of a speech data set; within a first thread of a thread pool, perform a first pause detection technique to identify a first set of likely sentence pauses; within a second thread of the thread pool, perform a second pause detection technique to identify a second set of likely sentence pauses; perform a speaker diarization technique to identify a set of likely speaker changes; divide the speech data set into data segments representing speech segments based on a combination of at least the first set of likely sentence pauses, the second set of likely sentence pauses, and the set of likely speaker changes; use at least an acoustic model with each data segment to identify likely speech sounds; and generate a transcript based, at least in part, on the identified likely speech sounds.

Type: Application

Filed: November 23, 2022

Publication date: April 6, 2023

Applicant: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Samuel Norris Henderson, Xu Yang

Multithreaded speech-to-text processing

Publication number: 20230098063

Abstract: An apparatus includes a processor to: receive a request to perform speech-to-text conversion of a speech data set; perform pause detection to identify a set of likely sentence pauses and/or a speaker diarization technique to identify a set of likely speaker changes; based on the set of likely sentence pauses and/or the set of likely speaker changes, divide the speech data set into data segments representing speech segments; use an acoustic model with the data segments to derive sets of probabilities of speech sounds uttered; store the sets of probabilities in temporal order within a buffer queue; distribute the sets of probabilities from the buffer queue in temporal order among threads of a thread pool; and within each thread, based on one or more sets of probabilities, derive one candidate word and select either the candidate word or an alternate candidate word derived from a language model as the next word most likely spoken.

Type: Application

Filed: November 28, 2022

Publication date: March 30, 2023

Applicant: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Samuel Norris Henderson, Xu Yang

Speech segmentation based on combination of pause detection and speaker diarization

Patent number: 11538481

Abstract: An apparatus includes at least one processor to, in response to a request to perform speech-to-text conversion: perform a pause detection technique including analyzing speech audio to identify pauses, and analyzing lengths of the pauses to identify likely sentence pauses; perform a speaker diarization technique including dividing the speech audio into fragments, analyzing vocal characteristics of speech sounds of each fragment to identify a speaker of a set of speakers, and identifying instances of a change in speakers between each temporally consecutive pair of fragments to identify likely speaker changes; and perform speech-to-text operations including dividing the speech audio into segments based on at least the likely sentence pauses and likely speaker changes, using at least an acoustic model with each segment to identify likely speech sounds in the speech audio, and generating a transcript of the speech audio based at least on the likely speech sounds.

Type: Grant

Filed: June 28, 2022

Date of Patent: December 27, 2022

Assignee: SAS Institute Inc.

Inventors: Xiaolong Li, Samuel Norris Henderson, Xiaozhuo Cheng, Xu Yang
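The combination of the two techniques can be sketched as follows, assuming the detected pauses and the diarized fragments are already available as time ranges in seconds. The 0.5-second minimum pause length and cutting at the midpoint of each long pause are hypothetical choices.

```python
def likely_sentence_pauses(pauses, min_length=0.5):
    """pauses: (start_s, end_s) tuples. Long enough pauses become cut points at their midpoints."""
    return [(start + end) / 2 for start, end in pauses if end - start >= min_length]


def likely_speaker_changes(fragments):
    """fragments: (start_s, end_s, speaker) tuples in temporal order."""
    return [nxt[0] for prev, nxt in zip(fragments, fragments[1:]) if prev[2] != nxt[2]]


def segment_bounds(duration, pauses, fragments, min_length=0.5):
    """Cut the audio at every likely sentence pause and every likely speaker change."""
    cuts = set(likely_sentence_pauses(pauses, min_length)) | set(likely_speaker_changes(fragments))
    edges = [0.0] + sorted(c for c in cuts if 0 < c < duration) + [duration]
    return list(zip(edges, edges[1:]))


pauses = [(2.0, 2.2), (4.0, 5.0)]
fragments = [(0.0, 3.0, "A"), (3.0, 6.0, "A"), (6.0, 10.0, "B")]
print(segment_bounds(10.0, pauses, fragments))
```

The short pause at 2.0 s is treated as a within-sentence gap, while the change from speaker A to B forces a cut even without a pause.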

Speech segmentation based on combination of pause detection and speaker diarization

Publication number: 20220335947

Abstract: An apparatus includes at least one processor to, in response to a request to perform speech-to-text conversion: perform a pause detection technique including analyzing speech audio to identify pauses, and analyzing lengths of the pauses to identify likely sentence pauses; perform a speaker diarization technique including dividing the speech audio into fragments, analyzing vocal characteristics of speech sounds of each fragment to identify a speaker of a set of speakers, and identifying instances of a change in speakers between each temporally consecutive pair of fragments to identify likely speaker changes; and perform speech-to-text operations including dividing the speech audio into segments based on at least the likely sentence pauses and likely speaker changes, using at least an acoustic model with each segment to identify likely speech sounds in the speech audio, and generating a transcript of the speech audio based at least on the likely speech sounds.

Type: Application

Filed: June 28, 2022

Publication date: October 20, 2022

Applicant: SAS Institute Inc.

Inventors: Xiaolong Li, Samuel Norris Henderson, Xiaozhuo Cheng, Xu Yang

Speech-to-analytics framework with support for large n-gram corpora

Patent number: 11404053

Abstract: An apparatus includes processor(s) to: generate a set of candidate n-grams based on probability distributions from an acoustic model for candidate graphemes of a next word most likely spoken following at least one preceding word spoken within speech audio; provide the set of candidate n-grams to multiple devices; provide, to each node device, an indication of which candidate n-grams are to be searched for within the n-gram corpus by each node device to enable searches for multiple candidate n-grams to be performed, independently and at least partially in parallel, across the node devices; receive, from each node device, an indication of a probability of occurrence of at least one candidate n-gram within the speech audio; based on the received probabilities of occurrence, identify the next word most likely spoken within the speech audio; and add the next word most likely spoken to a transcript of the speech audio.

Type: Grant

Filed: July 8, 2021

Date of Patent: August 2, 2022

Assignee: SAS Institute Inc.

Inventors: Xiaozhuo Cheng, Xu Yang, Xiaolong Li, Biljana Belamaric Wilsey, Haipeng Liu, Jared Peterson
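A sketch of the distributed n-gram lookup, simulating node devices with threads and using a tiny in-memory bigram table as the n-gram corpus. The corpus contents, the round-robin assignment of candidate n-grams to nodes, and the candidate words (which the patent derives from acoustic-model grapheme probabilities) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical bigram corpus: n-gram -> probability of occurrence.
NGRAM_CORPUS = {
    ("the", "weather"): 0.012,
    ("the", "whether"): 0.0004,
    ("the", "feather"): 0.001,
}


def candidate_ngrams(preceding, candidate_words):
    """Pair the preceding word with each candidate for the next word."""
    return [(preceding, word) for word in candidate_words]


def node_search(corpus, assigned):
    """Work done by one simulated node device: look up only its assigned n-grams."""
    return {ngram: corpus.get(ngram, 0.0) for ngram in assigned}


def next_word(preceding, candidate_words, n_nodes=2):
    ngrams = candidate_ngrams(preceding, candidate_words)
    # Round-robin assignment tells each node which candidate n-grams to search for.
    assignments = [ngrams[i::n_nodes] for i in range(n_nodes)]
    probabilities = {}
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        for partial in pool.map(lambda assigned: node_search(NGRAM_CORPUS, assigned), assignments):
            probabilities.update(partial)
    best = max(probabilities, key=probabilities.get)
    return best[-1]


transcript = ["the"]
transcript.append(next_word("the", ["whether", "weather", "feather"]))
print(" ".join(transcript))
```

Splitting the candidate list rather than the corpus keeps each lookup independent, so the searches can run in parallel without coordination.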

Dual use of acoustic model in speech-to-text framework

Patent number: 11373655

Abstract: An apparatus includes processor(s) to: perform preprocessing operations of a segmentation technique, including dividing a speech data set into data chunks representing chunks of speech audio, using an acoustic model with each data chunk to identify pauses in the speech audio, and analyzing the length of time of each identified pause to identify a candidate set of likely sentence pauses in the speech audio; and perform speech-to-text operations, including dividing the speech data set into data segments that each represent a segment of the speech audio based on the candidate set of likely sentence pauses, using the acoustic model with each data segment to identify likely speech sounds in the speech audio, analyzing the identified likely speech sounds to identify candidate sets of words likely spoken in the speech audio, and generating a transcript of the speech data set based at least on the candidate sets of words likely spoken.

Type: Grant

Filed: October 12, 2021

Date of Patent: June 28, 2022

Assignee: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
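To show one acoustic model serving both the preprocessing pass and the speech-to-text pass, this sketch simulates audio as a string in which each character is one frame and a space is a silent frame. The `toy_acoustic_model` function, the 0.1-second frame duration, and the 0.3-second pause threshold are hypothetical.

```python
FRAME_SECONDS = 0.1  # hypothetical duration of one frame


def toy_acoustic_model(frames):
    """Hypothetical stand-in for an acoustic model: one symbol per frame, '_' for no speech."""
    return ["_" if f == " " else f for f in frames]


def sentence_pauses(chunks, model, min_seconds=0.3):
    """Preprocessing pass: run the model over each chunk and keep long silent runs."""
    min_frames = round(min_seconds / FRAME_SECONDS)
    symbols = [s for chunk in chunks for s in model(chunk)]
    pauses, start = [], None
    for i, s in enumerate(symbols + [None]):  # None closes a trailing run
        if s == "_":
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                pauses.append((start, i))  # frame range [start, i)
            start = None
    return pauses


def transcribe(audio, chunk_frames, model, min_seconds=0.3):
    chunks = [audio[i:i + chunk_frames] for i in range(0, len(audio), chunk_frames)]
    edges = [0]
    for start, end in sentence_pauses(chunks, model, min_seconds):
        edges += [start, end]
    edges.append(len(audio))
    sentences = []
    # Speech-to-text pass: reuse the same model on each segment between pauses.
    for a, b in zip(edges[::2], edges[1::2]):
        text = "".join(" " if s == "_" else s for s in model(audio[a:b])).strip()
        if text:
            sentences.append(text)
    return sentences


print(transcribe("hi there   bye now", chunk_frames=4, model=toy_acoustic_model))
```

Flattening the chunk outputs before measuring silence lets a pause that straddles a chunk boundary still be measured as one run.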

Dual use of audio noise level in speech-to-text framework

Patent number: 11335350

Abstract: An apparatus includes processor(s) to: perform pre-processing operations, including deriving an audio noise level of speech audio of a speech data set, deriving, based on the audio noise level, a first relative weighting for first and second segmentation techniques for identifying likely sentence pauses in the speech audio, and selecting likely sentence pauses for a converged set of likely sentence pauses from likely sentence pauses identified by the first and/or second segmentation techniques based on the first relative weighting; and perform speech-to-text processing operations, including dividing the speech data set into data segments representing speech segments of the speech audio based on the converged set of likely sentence pauses, and deriving a second relative weighting based on the audio noise level for selecting words indicated by an acoustic model or by a language model as being most likely spoken in the speech audio for inclusion in a transcript.

Type: Grant

Filed: October 12, 2021

Date of Patent: May 17, 2022

Assignee: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
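The two uses of the noise level can be sketched in Python. Everything numeric here is an assumption: the noise floor is estimated as the RMS of the quietest frame, the weighting is a linear ramp between two hypothetical noise bounds, pauses are kept when their weighted vote reaches 0.5, and the same weight shifts word selection from the acoustic model toward the language model.

```python
import math


def noise_level(samples, frame=4):
    """Assumed heuristic: RMS of the quietest frame approximates the noise floor."""
    levels = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        levels.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return min(levels)


def weight_from_noise(noise, low=0.01, high=0.1):
    """Hypothetical linear ramp: 0.0 for clean audio, 1.0 for very noisy audio."""
    return min(1.0, max(0.0, (noise - low) / (high - low)))


def converge_pauses(pauses_first, pauses_second, w):
    """Keep a pause if its weighted vote reaches 0.5; larger w favors the second technique."""
    votes = {}
    for p in pauses_first:
        votes[p] = votes.get(p, 0.0) + (1 - w)
    for p in pauses_second:
        votes[p] = votes.get(p, 0.0) + w
    return sorted(p for p, v in votes.items() if v >= 0.5)


def choose_word(am_word, am_score, lm_word, lm_score, w):
    """Larger w (noisier audio) shifts trust from the acoustic model to the language model."""
    return lm_word if w * lm_score > (1 - w) * am_score else am_word


samples = [0.5, -0.5, 0.5, -0.5, 0.05, -0.05, 0.05, -0.05]
w = weight_from_noise(noise_level(samples))
print(w, converge_pauses([1.2, 3.4], [3.4, 5.0], w), choose_word("whether", 0.6, "weather", 0.4, w))
```

Deriving both weightings from one measured quantity is the "dual use": a single noise estimate steers segmentation and word selection.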

Dual use of audio noise level in speech-to-text framework

Publication number: 20220028396

Abstract: An apparatus includes processor(s) to: perform pre-processing operations, including deriving an audio noise level of speech audio of a speech data set, deriving, based on the audio noise level, a first relative weighting for first and second segmentation techniques for identifying likely sentence pauses in the speech audio, and selecting likely sentence pauses for a converged set of likely sentence pauses from likely sentence pauses identified by the first and/or second segmentation techniques based on the first relative weighting; and perform speech-to-text processing operations, including dividing the speech data set into data segments representing speech segments of the speech audio based on the converged set of likely sentence pauses, and deriving a second relative weighting based on the audio noise level for selecting words indicated by an acoustic model or by a language model as being most likely spoken in the speech audio for inclusion in a transcript.

Type: Application

Filed: October 12, 2021

Publication date: January 27, 2022

Applicant: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

Dual use of acoustic model in speech-to-text framework

Publication number: 20220028395

Abstract: An apparatus includes processor(s) to: perform preprocessing operations of a segmentation technique, including dividing a speech data set into data chunks representing chunks of speech audio, using an acoustic model with each data chunk to identify pauses in the speech audio, and analyzing the length of time of each identified pause to identify a candidate set of likely sentence pauses in the speech audio; and perform speech-to-text operations, including dividing the speech data set into data segments that each represent a segment of the speech audio based on the candidate set of likely sentence pauses, using the acoustic model with each data segment to identify likely speech sounds in the speech audio, analyzing the identified likely speech sounds to identify candidate sets of words likely spoken in the speech audio, and generating a transcript of the speech data set based at least on the candidate sets of words likely spoken.

Type: Application

Filed: October 12, 2021

Publication date: January 27, 2022

Applicant: SAS Institute Inc.

Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

Speech-to-analytics framework with support for large n-gram corpora

Patent number: 11217233

Abstract: An apparatus includes processor(s) to: generate a set of candidate n-grams based on probability distributions from an acoustic model for candidate graphemes of a next word most likely spoken following at least one preceding word spoken within speech audio; provide the set of candidate n-grams to multiple devices; provide, to each node device, an indication of which candidate n-grams are to be searched for within the n-gram corpus by each node device to enable searches for multiple candidate n-grams to be performed, independently and at least partially in parallel, across the node devices; receive, from each node device, an indication of a probability of occurrence of at least one candidate n-gram within the speech audio; based on the received probabilities of occurrence, identify the next word most likely spoken within the speech audio; and add the next word most likely spoken to a transcript of the speech audio.

Type: Grant

Filed: July 8, 2021

Date of Patent: January 4, 2022

Assignee: SAS Institute Inc.

Inventors: Xiaozhuo Cheng, Xu Yang, Xiaolong Li, Biljana Belamaric Wilsey, Haipeng Liu, Jared Peterson

Speech audio pre-processing segmentation

Patent number: 11138979

Abstract: An apparatus includes processor(s) to: divide a speech data set into multiple data chunks that each represent a chunk of speech audio; derive a threshold amplitude based on at least one peak amplitude of the speech audio; designate each data chunk with a peak amplitude below the threshold amplitude as a pause data chunk; within a set of temporally consecutive data chunks of the multiple data chunks, identify a longest subset of temporally consecutive pause data chunks; within the set of temporally consecutive data chunks, designate the longest subset of temporally consecutive pause data chunks as a likely sentence pause of a candidate set of likely sentence pauses; based on at least the candidate set, divide the speech data set into multiple data segments that each represent a speech segment of the speech audio; and perform speech-to-text conversion to identify a sentence spoken in each speech segment.

Type: Grant

Filed: December 30, 2020

Date of Patent: October 5, 2021

Assignee: SAS Institute Inc.

Inventors: Xiaozhuo Cheng, Xu Yang, Xiaolong Li
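The amplitude-based pause search can be sketched as follows. The threshold as a fixed fraction (10%) of the overall peak, the chunk size, and the size of each window of consecutive chunks are hypothetical parameters; the abstract only says the threshold is derived from at least one peak amplitude.

```python
def chunk_peaks(samples, chunk_size):
    """Peak absolute amplitude of each chunk of samples."""
    return [max(abs(x) for x in samples[i:i + chunk_size])
            for i in range(0, len(samples), chunk_size)]


def longest_pause_run(is_pause, lo, hi):
    """Longest run of consecutive pause chunks within is_pause[lo:hi], as (start, end) or None."""
    best, start = None, None
    for i in range(lo, hi + 1):
        if i < hi and is_pause[i]:
            if start is None:
                start = i
        elif start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def candidate_sentence_pauses(samples, chunk_size=2, window=4, ratio=0.1):
    peaks = chunk_peaks(samples, chunk_size)
    threshold = ratio * max(peaks)  # assumed: a fixed fraction of the overall peak
    is_pause = [p < threshold for p in peaks]
    pauses = []
    # Each window is one set of temporally consecutive chunks.
    for lo in range(0, len(peaks), window):
        run = longest_pause_run(is_pause, lo, min(lo + window, len(peaks)))
        if run is not None:
            pauses.append(run)  # chunk range [start, end)
    return pauses


samples = [0.8, -0.7, 0.01, 0.0, 0.02, -0.01, 0.9, 0.5,
           0.6, 0.4, 0.01, 0.0, 0.7, 0.3, 0.0, 0.0]
print(candidate_sentence_pauses(samples))
```

Picking only the longest run per window yields at most one candidate sentence pause per window, which keeps short within-sentence gaps from fragmenting the segments.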

Speech audio pre-processing segmentation

Publication number: 20210295845

Abstract: An apparatus includes processor(s) to: divide a speech data set into multiple data chunks that each represent a chunk of speech audio; derive a threshold amplitude based on at least one peak amplitude of the speech audio; designate each data chunk with a peak amplitude below the threshold amplitude as a pause data chunk; within a set of temporally consecutive data chunks of the multiple data chunks, identify a longest subset of temporally consecutive pause data chunks; within the set of temporally consecutive data chunks, designate the longest subset of temporally consecutive pause data chunks as a likely sentence pause of a candidate set of likely sentence pauses; based on at least the candidate set, divide the speech data set into multiple data segments that each represent a speech segment of the speech audio; and perform speech-to-text conversion to identify a sentence spoken in each speech segment.

Type: Application

Filed: December 30, 2020

Publication date: September 23, 2021

Applicant: SAS Institute Inc.

Inventors: Xiaozhuo Cheng, Xu Yang, Xiaolong Li

Speech audio pre-processing segmentation

Patent number: 11049502

Abstract: An apparatus includes processor(s) to: divide a speech data set into multiple data chunks that each represent a chunk of speech audio; configure a neural network to implement an acoustic model that includes a CTC output; provide each data chunk to the neural network and monitor the CTC output for a string of blank symbols; designate each string of blank symbols from the CTC output that is at least as long as a predetermined blank threshold length as a likely sentence pause of a candidate set of likely sentence pauses; based on at least the candidate set, divide the speech data set into multiple data segments that each represent a speech segment of the speech audio; and perform speech-to-text conversion to identify a sentence spoken in a selected language in each speech segment.

Type: Grant

Filed: December 30, 2020

Date of Patent: June 29, 2021

Assignee: SAS Institute Inc.

Inventors: Xiaozhuo Cheng, Xu Yang, Xiaolong Li
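This sketch simulates the CTC output as a string of per-frame best symbols with `_` as the blank, finds blank strings at least a threshold long, and decodes each resulting segment with standard greedy CTC decoding (collapse repeated symbols, then drop blanks). The frame string and the threshold of four blanks are hypothetical.

```python
BLANK = "_"


def blank_runs(ctc_symbols, min_blank_run):
    """Return (start, end) frame ranges of blank strings at least min_blank_run long."""
    runs, start = [], None
    for i, s in enumerate(list(ctc_symbols) + [None]):  # None closes a trailing run
        if s == BLANK:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_blank_run:
                runs.append((start, i))
            start = None
    return runs


def greedy_ctc_decode(ctc_symbols):
    """Greedy CTC decoding: collapse repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in ctc_symbols:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return "".join(out)


def segment_and_decode(ctc_symbols, min_blank_run=4):
    """Cut at each long blank string and decode the segments in between."""
    edges = [0]
    for start, end in blank_runs(ctc_symbols, min_blank_run):
        edges += [start, end]
    edges.append(len(ctc_symbols))
    texts = [greedy_ctc_decode(ctc_symbols[a:b]) for a, b in zip(edges[::2], edges[1::2])]
    return [t for t in texts if t]


print(segment_and_decode("hh_e_ll_lo_____wwo_rr_ld"))
```

Single blanks between repeated letters (as in `ll_l`) are what let greedy decoding keep a genuine double letter, so only blank strings above the threshold are treated as pauses.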