Patents by Inventor Chung-Cheng Chiu

Chung-Cheng Chiu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ENHANCED ATTENTION MECHANISMS

Publication number: 20220083743

Abstract: A method includes receiving a sequence of audio features characterizing an utterance and processing, using an encoder neural network, the sequence of audio features to generate a sequence of encodings. At each of a plurality of output steps, the method also includes determining a corresponding hard monotonic attention output to select an encoding from the sequence of encodings, identifying a proper subset of the sequence of encodings based on a position of the selected encoding in the sequence of encodings, and performing soft attention over the proper subset of the sequence of encodings to generate a context vector at the corresponding output step. The method also includes processing, using a decoder neural network, the context vector generated at the corresponding output step to predict a probability distribution over possible output labels at the corresponding output step.

Type: Application

Filed: November 30, 2021

Publication date: March 17, 2022

Applicant: Google LLC

Inventors: Chung-Cheng Chiu, Colin Abraham Raffel
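
Illustrative note: the attention step in the abstract above can be sketched as follows. This is a minimal sketch, not the patented implementation. The function name `chunkwise_attention`, the 0.5 selection threshold, the fixed `chunk_size` window ending at the selected encoding, and the `select_probs` and `energies` inputs are all assumptions made for illustration. It also assumes `chunk_size` is smaller than the sequence length, so the window is a proper subset.

```python
import math


def chunkwise_attention(encodings, select_probs, energies, start, chunk_size):
    """Return (selected_index, context_vector) for one output step.

    encodings: list of equal-length float vectors, one per input frame.
    select_probs: hypothetical per-frame selection probabilities in [0, 1].
    energies: hypothetical per-frame soft-attention scores.
    start: index where the previous step stopped (keeps selection monotonic).
    """
    dim = len(encodings[0])

    # Hard monotonic attention: scan forward from `start` and stop at the
    # first frame whose selection probability reaches the assumed threshold.
    selected = None
    for i in range(start, len(encodings)):
        if select_probs[i] >= 0.5:
            selected = i
            break
    if selected is None:
        # Nothing selected: return an all-zero context vector.
        return None, [0.0] * dim

    # Proper subset: up to `chunk_size` frames ending at the selected frame.
    first = max(0, selected - chunk_size + 1)
    window = list(range(first, selected + 1))

    # Soft attention: softmax over the energies inside the window only.
    peak = max(energies[i] for i in window)
    weights = [math.exp(energies[i] - peak) for i in window]
    total = sum(weights)
    context = [
        sum(w * encodings[i][d] for w, i in zip(weights, window)) / total
        for d in range(dim)
    ]
    return selected, context
```

Passing the returned index back as `start` on the next output step keeps the selections monotonic through the sequence.
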
Augmentation of Audiographic Images for Improved Machine Learning

Publication number: 20220012537

Abstract: Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.

Type: Application

Filed: September 28, 2021

Publication date: January 13, 2022

Inventors: Daniel Sung-Joon Park, Quoc V. Le, William Chan, Ekin Dogus Cubuk, Barret Zoph, Yu Zhang, Chung-Cheng Chiu
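
Illustrative note: the abstract above does not name specific augmentation operations. The sketch below assumes one plausible choice, zeroing a random band of time frames and a random band of frequency bins in a spectrogram. The function name `mask_spectrogram` and its parameters are hypothetical.

```python
import random


def mask_spectrogram(spec, max_time_width, max_freq_width, rng):
    """Return a copy of `spec` with one time band and one frequency band zeroed.

    spec: list of frames (time axis), each a list of frequency-bin values.
    rng: a random.Random instance, passed in so results are reproducible.
    """
    num_frames = len(spec)
    num_bins = len(spec[0])
    out = [list(frame) for frame in spec]  # leave the input untouched

    # Time mask (assumed operation): zero a run of consecutive frames.
    t_width = rng.randint(0, min(max_time_width, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for t in range(t_start, t_start + t_width):
        out[t] = [0.0] * num_bins

    # Frequency mask (assumed operation): zero a run of bins in every frame.
    f_width = rng.randint(0, min(max_freq_width, num_bins))
    f_start = rng.randint(0, num_bins - f_width)
    for frame in out:
        for f in range(f_start, f_start + f_width):
            frame[f] = 0.0
    return out
```

Because the masks are applied to the image rather than to the raw audio, the same function could be applied to any two-dimensional time-frequency representation.
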
SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20220005465

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

Type: Application

Filed: September 20, 2021

Publication date: January 6, 2022

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
Enhanced attention mechanisms

Patent number: 11210475

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for enhanced attention mechanisms. In some implementations, data indicating an input sequence is received. The data is processed using an encoder neural network to generate a sequence of encodings. A series of attention outputs is determined using one or more attender modules. Determining each attention output can include (i) selecting an encoding from the sequence of encodings and (ii) determining attention over a proper subset of the sequence of encodings, where the proper subset of encodings is determined based on a position of the selected encoding in the sequence of encodings. The selections of encodings are also monotonic through the sequence of encodings. An output sequence is generated by processing the attention outputs using a decoder neural network. An output is provided that indicates a language sequence determined from the output sequence.

Type: Grant

Filed: July 22, 2019

Date of Patent: December 28, 2021

Assignee: Google LLC

Inventors: Chung-Cheng Chiu, Colin Abraham Raffel
MINIMUM WORD ERROR RATE TRAINING FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20210358491

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.

Type: Application

Filed: July 27, 2021

Publication date: November 18, 2021

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
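
Illustrative note: a loss over N-best lists of decoded hypotheses, as mentioned in the abstract above, could look like the sketch below. The formulation is an assumption rather than the claimed method: it takes the expected number of word errors under probabilities renormalized over the list and subtracts the list's mean error count. The names `word_errors` and `mwer_loss` are hypothetical.

```python
import math


def word_errors(hypothesis, reference):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        curr = [i]
        for j, ref_word in enumerate(reference, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def mwer_loss(nbest, reference):
    """Expected relative word errors over an N-best list.

    nbest: list of (words, log_probability) pairs for the decoded hypotheses.
    """
    # Renormalize the hypothesis probabilities over the N-best list only.
    peak = max(log_prob for _, log_prob in nbest)
    weights = [math.exp(log_prob - peak) for _, log_prob in nbest]
    total = sum(weights)

    errors = [word_errors(words, reference) for words, _ in nbest]
    mean_errors = sum(errors) / len(errors)

    # Hypotheses with fewer errors than average contribute negative terms.
    return sum(w / total * (e - mean_errors) for w, e in zip(weights, errors))
```

Under this formulation the loss falls as probability mass shifts toward hypotheses with fewer word errors than the list average.
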
Speech recognition with sequence-to-sequence models

Patent number: 11145293

Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

Type: Grant

Filed: July 19, 2019

Date of Patent: October 12, 2021

Assignee: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
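
Illustrative note: the training-time selection described in the abstract above, where a decoder input is chosen between a known value and the decoder's own output with a predetermined probability, can be sketched as below. The function names and the per-token independent draw are assumptions for illustration.

```python
import random


def choose_decoder_input(known_token, predicted_token, sampling_prob, rng):
    """Pick one decoder input for a training step.

    With probability `sampling_prob` the decoder's own earlier prediction is
    fed back; otherwise the known value from the training example is used.
    """
    if rng.random() < sampling_prob:
        return predicted_token
    return known_token


def build_decoder_inputs(known_tokens, predicted_tokens, sampling_prob, rng):
    """Apply the choice independently at every position of a sequence."""
    return [
        choose_decoder_input(known, predicted, sampling_prob, rng)
        for known, predicted in zip(known_tokens, predicted_tokens)
    ]
```

Setting `sampling_prob` to 0 always feeds the known values, and setting it to 1 always feeds the decoder's own outputs.
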
Augmentation of audiographic images for improved machine learning

Patent number: 11138471

Abstract: Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.

Type: Grant

Filed: May 20, 2019

Date of Patent: October 5, 2021

Assignee: Google LLC

Inventors: Daniel Sung-Joon Park, Quoc Le, William Chan, Ekin Dogus Cubuk, Barret Zoph, Yu Zhang, Chung-Cheng Chiu
Minimum word error rate training for attention-based sequence-to-sequence models

Patent number: 11107463

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.

Type: Grant

Filed: August 1, 2019

Date of Patent: August 31, 2021

Assignee: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
Attention-Based Joint Acoustic and Text On-Device End-to-End Model

Publication number: 20210225362

Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.

Type: Application

Filed: January 21, 2021

Publication date: July 22, 2021

Applicant: Google LLC

Inventors: Tara N. Sainath, Ruoming Pang, Ron Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman
Recurrent neural networks for online sequence generation

Patent number: 10656605

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence from a source sequence. In one aspect, the system includes a recurrent neural network configured to, at each time step, receive an input for the time step and process the input to generate a progress score and a set of output scores; and a subsystem configured to, at each time step, generate the recurrent neural network input and provide the input to the recurrent neural network; determine, from the progress score, whether or not to emit a new output at the time step; and, in response to determining to emit a new output, select an output using the output scores and emit the selected output as the output at a next position in the output order.

Type: Grant

Filed: May 2, 2019

Date of Patent: May 19, 2020

Assignee: Google LLC

Inventors: Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever, Yuping Luo
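
Illustrative note: the emit-or-wait loop described in the abstract above can be sketched as below. The `step_fn` callable is a hypothetical stand-in for the recurrent neural network, and the fixed threshold on the progress score and the highest-score selection rule are assumptions; the abstract only states that the decision is determined from the progress score and the selection from the output scores.

```python
def generate_online(inputs, step_fn, threshold=0.5):
    """Emit outputs while consuming inputs one time step at a time.

    step_fn: hypothetical stand-in for the recurrent network. It takes
    (state, input) and returns (new_state, progress_score, output_scores),
    where output_scores maps each candidate output to a score.
    """
    state = None
    emitted = []
    for x in inputs:
        state, progress, output_scores = step_fn(state, x)
        # The progress score decides whether a new output is emitted now.
        if progress >= threshold:
            # Assumed selection rule: take the highest-scoring candidate.
            best = max(output_scores, key=output_scores.get)
            emitted.append(best)
    return emitted
```

Because each decision uses only the inputs seen so far, outputs can be produced before the full source sequence is available.
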
RECURRENT NEURAL NETWORKS FOR ONLINE SEQUENCE GENERATION

Publication number: 20200151544

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence from a source sequence. In one aspect, the system includes a recurrent neural network configured to, at each time step, receive an input for the time step and process the input to generate a progress score and a set of output scores; and a subsystem configured to, at each time step, generate the recurrent neural network input and provide the input to the recurrent neural network; determine, from the progress score, whether or not to emit a new output at the time step; and, in response to determining to emit a new output, select an output using the output scores and emit the selected output as the output at a next position in the output order.

Type: Application

Filed: May 3, 2018

Publication date: May 14, 2020

Inventors: Chung-Cheng Chiu, Navdeep Jaitly, John Dieterich Lawson, George Jay Tucker
SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20200126538

Abstract: A method includes obtaining audio data for a long-form utterance and segmenting the audio data for the long-form utterance into a plurality of overlapping segments. The method also includes, for each overlapping segment of the plurality of overlapping segments: providing features indicative of acoustic characteristics of the long-form utterance represented by the corresponding overlapping segment as input to an encoder neural network; processing an output of the encoder neural network using an attender neural network to generate a context vector; and generating word elements using the context vector and a decoder neural network. The method also includes generating a transcription for the long-form utterance by merging the word elements from the plurality of overlapping segments and providing the transcription as an output of the automated speech recognition system.

Type: Application

Filed: December 17, 2019

Publication date: April 23, 2020

Applicant: Google LLC

Inventors: Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Patrick Nguyen, Sergey Kishchenko
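
Illustrative note: the segment-then-merge flow in the abstract above can be sketched as below. The fixed-length windows and the merge rule, which drops words repeated where one segment's ending matches the next segment's beginning, are assumptions for illustration; the abstract does not specify how word elements are merged. The function names are hypothetical.

```python
def split_overlapping(samples, segment_len, overlap):
    """Cut `samples` into windows of `segment_len` that share `overlap` items."""
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be in [0, segment_len)")
    step = segment_len - overlap
    segments = []
    start = 0
    while True:
        segments.append(samples[start:start + segment_len])
        if start + segment_len >= len(samples):
            break
        start += step
    return segments


def merge_words(segment_words):
    """Join per-segment word lists, dropping words repeated at each seam."""
    merged = []
    for words in segment_words:
        # Longest suffix of `merged` that is also a prefix of `words`.
        best = 0
        for k in range(1, min(len(merged), len(words)) + 1):
            if merged[-k:] == words[:k]:
                best = k
        merged.extend(words[best:])
    return merged
```

In a full system each segment would pass through the encoder, attender, and decoder before `merge_words` combines the per-segment results.
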
MINIMUM WORD ERROR RATE TRAINING FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20200043483

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.

Type: Application

Filed: August 1, 2019

Publication date: February 6, 2020

Inventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20200027444

Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

Type: Application

Filed: July 19, 2019

Publication date: January 23, 2020

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
ENHANCED ATTENTION MECHANISMS

Publication number: 20200026760

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for enhanced attention mechanisms. In some implementations, data indicating an input sequence is received. The data is processed using an encoder neural network to generate a sequence of encodings. A series of attention outputs is determined using one or more attender modules. Determining each attention output can include (i) selecting an encoding from the sequence of encodings and (ii) determining attention over a proper subset of the sequence of encodings, where the proper subset of encodings is determined based on a position of the selected encoding in the sequence of encodings. The selections of encodings are also monotonic through the sequence of encodings. An output sequence is generated by processing the attention outputs using a decoder neural network. An output is provided that indicates a language sequence determined from the output sequence.

Type: Application

Filed: July 22, 2019

Publication date: January 23, 2020

Inventors: Chung-Cheng Chiu, Colin Abraham Raffel
Augmentation of Audiographic Images for Improved Machine Learning

Publication number: 20190354808

Abstract: Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.

Type: Application

Filed: May 20, 2019

Publication date: November 21, 2019

Inventors: Daniel Sung-Joon Park, Quoc Le, William Chan, Ekin Dogus Cubuk, Barret Zoph, Yu Zhang, Chung-Cheng Chiu
Recurrent neural networks for online sequence generation

Patent number: 10281885

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence from a source sequence. In one aspect, the system includes a recurrent neural network configured to, at each time step, receive an input for the time step and process the input to generate a progress score and a set of output scores; and a subsystem configured to, at each time step, generate the recurrent neural network input and provide the input to the recurrent neural network; determine, from the progress score, whether or not to emit a new output at the time step; and, in response to determining to emit a new output, select an output using the output scores and emit the selected output as the output at a next position in the output order.

Type: Grant

Filed: May 19, 2017

Date of Patent: May 7, 2019

Assignee: Google LLC

Inventors: Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever, Yuping Luo
Method for tracking and processing image

Patent number: 8331623

Abstract: The invention relates to a method for image processing, which can be used to calibrate the background quickly. When the external environment is changed due to the switch of light, the color of background is calibrated quickly, and the background can be updated together. The method not only is used to update the background, but also can be used to eliminate the convergence of background again.

Type: Grant

Filed: December 23, 2008

Date of Patent: December 11, 2012

Assignee: National Chiao Tung University

Inventors: Bing-Fei Wu, Chao-Jung Chen, Chih-Chung Kao, Meng-Liang Chung, Chung-Cheng Chiu, Min-Yu Ku, Chih-Chun Liu, Cheng-Yen Yang
Asynchronous photography automobile-detecting apparatus

Patent number: 8284239

Abstract: The invention discloses the asynchronous photography for dual camera apparatus and processing the method for real-time forward vehicle detection. Image is captured by a pair of monochrome camera and stored into a computer. After the video pre-process, the edge information is used to locate the forward vehicle position, and then obtained the disparity from a fast comparison search algorithm by the stereo vision methodology. Proposed algorithm calculation of the invention can conquer the asynchronous exposure problem from dual camera and lower the hardware cost.

Type: Grant

Filed: July 8, 2009

Date of Patent: October 9, 2012

Assignee: National Defense University

Inventors: Chung-Cheng Chiu, Wen-Chung Chen, Meng-Liang Chung
Tracking vehicle method by using image processing

Patent number: 8218877

Abstract: The invention relates to a method for image processing. First, establish the initial image background information. And retrieve the instant image information. Then calculate the initial image background information and color intensity information of the instant image. Furthermore, adjust the instant image information. Then calculate the moving-object information. Finally, track the moving-object information. It can improve the accuracy rate of detection without the influence of erected height.

Type: Grant

Filed: December 23, 2008

Date of Patent: July 10, 2012

Assignee: National Chiao Tung University

Inventors: Bing-Fei Wu, Chao-Jung Chen, Chih-Chung Kao, Meng-Liang Chung, Chung-Cheng Chiu, Min-Yu Ku, Chih-Chun Liu, Cheng-Yen Yang
