Patents by Inventor Kaizhi Qian
Kaizhi Qian has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12211491
Abstract: One or more computer processors obtain an initial subnetwork at a target sparsity and an initial pruning mask from a pre-trained self-supervised learning (SSL) speech model. The one or more computer processors finetune the initial subnetwork, comprising: the one or more computer processors zero out one or more masked weights in the initial subnetwork specified by the initial pruning mask; the one or more computer processors train a new subnetwork from the zeroed out subnetwork; the one or more computer processors prune one or more weights of lowest magnitude in the new subnetwork regardless of network structure to satisfy the target sparsity. The one or more computer processors classify an audio segment with the finetuned subnetwork.
Type: Grant
Filed: May 9, 2022
Date of Patent: January 28, 2025
Assignee: International Business Machines Corporation
Inventors: Cheng-I Lai, Yang Zhang, Kaizhi Qian, Chuang Gan, James R. Glass, Alexander Haojan Liu
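The finetuning loop in this abstract (zero out masked weights, train, then prune the lowest-magnitude weights to the target sparsity) can be illustrated with a minimal sketch. This is not the patented implementation: the flat weight list, the function names, and the `train_step` callable are all hypothetical stand-ins chosen for illustration, and the "training" is whatever function the caller supplies.

```python
def apply_mask(weights, mask):
    """Zero out every weight whose mask entry is 0 (1 = keep, 0 = pruned)."""
    return [w * m for w, m in zip(weights, mask)]


def magnitude_prune(weights, target_sparsity):
    """Build a 0/1 mask that drops the lowest-magnitude weights.

    The weights are treated as one flat list, so pruning ignores any
    layer or block structure (unstructured pruning).
    """
    n_prune = int(target_sparsity * len(weights))
    # Indices ordered by absolute value, smallest first (stable for ties).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def finetune_round(weights, mask, train_step, target_sparsity):
    """One round: zero masked weights, train, re-prune to the target sparsity.

    `train_step` is a hypothetical callable standing in for real training;
    it may move previously zeroed weights away from zero.
    """
    zeroed = apply_mask(weights, mask)
    trained = train_step(zeroed)
    new_mask = magnitude_prune(trained, target_sparsity)
    return apply_mask(trained, new_mask), new_mask


# Toy usage: "training" just shifts every weight by 0.25.
new_weights, new_mask = finetune_round(
    [0.5, -0.1, 0.03, -0.8], [1, 1, 0, 0],
    lambda w: [x + 0.25 for x in w], 0.5,
)
```

In this toy run the mask changes between rounds, which is the point of re-pruning after training: weights zeroed by the initial mask are allowed to regrow and compete on magnitude.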
-
Publication number: 20240303508
Abstract: Techniques of video processing for action detection using machine learning. An action depicted in a video is identified. A type of the action is predicted based on a classification module of one or more machine learning models. A video clip depicting the action is predicted in the video. To that end, a starting point and an ending point of the video clip in the video are determined. The video clip is predicted based on a localization module of the one or more machine learning models. A refinement is performed that includes refining the type of the action based on the video clip or refining the video clip based on the type of the action. An indication of the refined type or of the refined video clip is output.
Type: Application
Filed: March 8, 2023
Publication date: September 12, 2024
Inventors: Bo WU, Chuang GAN, Kaizhi QIAN, Pin-Yu CHEN
-
Patent number: 11996083
Abstract: A computer-implemented method is provided of using a machine learning model for disentanglement of prosody in spoken natural language. The method includes encoding, by a computing device, the spoken natural language to produce content code. The method further includes resampling, by the computing device without text transcriptions, the content code to obscure the prosody by applying an unsupervised technique to the machine learning model to generate prosody-obscured content code. The method additionally includes decoding, by the computing device, the prosody-obscured content code to synthesize speech indirectly based upon the content code.
Type: Grant
Filed: June 3, 2021
Date of Patent: May 28, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox
-
Publication number: 20240170007
Abstract: A method, computer system and computer program product is presented for providing a self-supervised speech representation. In one embodiment, audio input is received including speech utterances. A label sequence is generated from these speech utterances by a teacher label generator. A speech representation is generated of a partially masked version of the speech utterance using a speech representation network. The speech utterance is passed into two random transformations that alter only speaker information prior to the partial masking. A predictor will then predict the label sequence. In one embodiment performance-based assessment is made on a cross-entropy loss between the generated label sequence and a predicted label sequence.
Type: Application
Filed: November 7, 2022
Publication date: May 23, 2024
Inventors: Kaizhi Qian, Yang Zhang, Chuang Gan, Dakuo Wang, Bo Wu
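The cross-entropy loss mentioned in this abstract compares the teacher's label sequence with the predictor's output. As a rough sketch only, assuming the predictor emits one probability distribution per frame and the teacher emits one integer label per frame (both assumptions, as is the function name), the loss could be computed like this:

```python
import math


def sequence_cross_entropy(predicted_probs, teacher_labels):
    """Mean negative log-probability assigned to the teacher's labels.

    predicted_probs: one probability distribution (list of floats) per frame.
    teacher_labels:  one integer class index per frame.
    """
    total = 0.0
    for probs, label in zip(predicted_probs, teacher_labels):
        total += -math.log(probs[label])
    return total / len(teacher_labels)


# Two frames, two classes; the teacher labels are class 0 then class 1.
loss = sequence_cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
```

The loss falls toward zero as the predictor puts more probability on the teacher's label at every frame.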
-
Publication number: 20240127001
Abstract: Techniques for audio understanding using fixed language models are provided. In one aspect, a system for performing audio understanding tasks includes: a fixed text embedder for, on receipt of a prompt sequence having (e.g., from 0-10) demonstrations of an audio understanding task followed by a new question, converting the prompt sequence into text embeddings; a pretrained audio encoder for converting the prompt sequence into audio embeddings; and a fixed autoregressive language model for answering the new question using the text embeddings and the audio embeddings. A method for performing audio understanding tasks is also provided.
Type: Application
Filed: October 12, 2022
Publication date: April 18, 2024
Inventors: Kaizhi Qian, Yang Zhang, Chuang Gan, Bo Wu, Zhenfang Chen
-
Patent number: 11854305
Abstract: A bi-directional spatial-temporal transformer neural network (BDSTT) is trained to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames. Obtain a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints. Produce a spatially masked frame by masking the original coordinates of the skeletal joint. Provide the specific frame, the spatially masked frame, and at least one more frame to a coordinate prediction head of the BDSTT. Obtain, from the coordinate prediction head, a prediction of coordinates for the skeletal joint. Adjust parameters of the BDSTT until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
Type: Grant
Filed: May 9, 2021
Date of Patent: December 26, 2023
Assignee: International Business Machines Corporation
Inventors: Bo Wu, Chuang Gan, Dakuo Wang, Kaizhi Qian
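Two steps of this abstract, spatial masking of one joint and the mean-squared-error training signal, can be sketched in isolation. The transformer itself is omitted. The frame layout (a list of (x, y, z) tuples), the zero fill value, and the function names are assumptions made for illustration, not details taken from the patent.

```python
def mask_joint(frame, joint_index, fill=(0.0, 0.0, 0.0)):
    """Return a copy of the frame with one joint's coordinates replaced."""
    masked = list(frame)
    masked[joint_index] = fill
    return masked


def mean_squared_error(predicted, original):
    """Average squared difference across the coordinates of one joint."""
    return sum((p - o) ** 2 for p, o in zip(predicted, original)) / len(original)


# A two-joint frame; joint 1 is hidden from the model.
frame = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
masked_frame = mask_joint(frame, 1)

# A hypothetical prediction for the masked joint, scored against the original.
error = mean_squared_error((1.0, 2.0, 6.0), frame[1])
```

Training would repeat this scoring while adjusting model parameters until the error stops decreasing.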
-
Publication number: 20230360642
Abstract: One or more computer processors obtain an initial subnetwork at a target sparsity and an initial pruning mask from a pre-trained self-supervised learning (SSL) speech model. The one or more computer processors finetune the initial subnetwork, comprising: the one or more computer processors zero out one or more masked weights in the initial subnetwork specified by the initial pruning mask; the one or more computer processors train a new subnetwork from the zeroed out subnetwork; the one or more computer processors prune one or more weights of lowest magnitude in the new subnetwork regardless of network structure to satisfy the target sparsity. The one or more computer processors classify an audio segment with the finetuned subnetwork.
Type: Application
Filed: May 9, 2022
Publication date: November 9, 2023
Inventors: Cheng-I Lai, Yang Zhang, Kaizhi Qian, Chuang Gan, James R. Glass, Alexander Haojan Liu
-
Publication number: 20220392429
Abstract: A computer-implemented method is provided of using a machine learning model for disentanglement of prosody in spoken natural language. The method includes encoding, by a computing device, the spoken natural language to produce content code. The method further includes resampling, by the computing device without text transcriptions, the content code to obscure the prosody by applying an unsupervised technique to the machine learning model to generate prosody-obscured content code. The method additionally includes decoding, by the computing device, the prosody-obscured content code to synthesize speech indirectly based upon the content code.
Type: Application
Filed: June 3, 2021
Publication date: December 8, 2022
Inventors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox
-
Publication number: 20220374629
Abstract: A bi-directional spatial-temporal transformer neural network (BDSTT) is trained to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames. Obtain a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints. Produce a spatially masked frame by masking the original coordinates of the skeletal joint. Provide the specific frame, the spatially masked frame, and at least one more frame to a coordinate prediction head of the BDSTT. Obtain, from the coordinate prediction head, a prediction of coordinates for the skeletal joint. Adjust parameters of the BDSTT until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
Type: Application
Filed: May 9, 2021
Publication date: November 24, 2022
Inventors: Bo Wu, Chuang Gan, Dakuo Wang, Kaizhi Qian
-
Patent number: 11295762
Abstract: A method, a structure, and a computer system for decomposing speech. The exemplary embodiments may include one or more encoders for generating one or more encodings of a speech input comprising rhythm information, pitch information, timbre information, and content information, and a decoder for decoding the one or more encodings.
Type: Grant
Filed: April 20, 2020
Date of Patent: April 5, 2022
Assignee: International Business Machines Corporation
Inventors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Chuang Gan, David Cox
-
Publication number: 20210327460
Abstract: A method, a structure, and a computer system for decomposing speech. The exemplary embodiments may include one or more encoders for generating one or more encodings of a speech input comprising rhythm information, pitch information, timbre information, and content information, and a decoder for decoding the one or more encodings.
Type: Application
Filed: April 20, 2020
Publication date: October 21, 2021
Inventors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Chuang Gan, David Cox
-
Patent number: 10709390
Abstract: A system that detects heartbeats includes a sensor or a transducer and algorithms based on deep learning. The algorithms employ techniques of artificial intelligence that enable the system to extract heartbeat features under low signal-to-noise-ratio (SNR) conditions when a user is exercising. The algorithms can be applied to various technologies for heart rate monitoring such as ultrasound Doppler, photoplethysmogram (PPG), electrocardiogram (EKG), acoustic, pressure/force sensing and laser/RF Doppler, among other types of sensing methods.
Type: Grant
Filed: November 21, 2017
Date of Patent: July 14, 2020
Assignee: LOGOS CARE, INC.
Inventors: Kaizhi Qian, Yang Zhang, Thomas Y. Lo
-
Publication number: 20180249964
Abstract: A system that detects heartbeats includes a sensor or a transducer and algorithms based on deep learning. The algorithms employ techniques of artificial intelligence that enable the system to extract heartbeat features under low signal-to-noise-ratio (SNR) conditions when a user is exercising. The algorithms can be applied to various technologies for heart rate monitoring such as ultrasound Doppler, photoplethysmogram (PPG), electrocardiogram (EKG), acoustic, pressure/force sensing and laser/RF Doppler, among other types of sensing methods.
Type: Application
Filed: November 21, 2017
Publication date: September 6, 2018
Inventors: Kaizhi Qian, Yang Zhang, Thomas Y. Lo