Patents by Inventor Kaizhi Qian

Kaizhi Qian has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240127001
    Abstract: Techniques for audio understanding using fixed language models are provided. In one aspect, a system for performing audio understanding tasks includes: a fixed text embedder for, on receipt of a prompt sequence having (e.g., from 0-10) demonstrations of an audio understanding task followed by a new question, converting the prompt sequence into text embeddings; a pretrained audio encoder for converting the prompt sequence into audio embeddings; and a fixed autoregressive language model for answering the new question using the text embeddings and the audio embeddings. A method for performing audio understanding tasks is also provided.
    Type: Application
    Filed: October 12, 2022
    Publication date: April 18, 2024
    Inventors: Kaizhi Qian, Yang Zhang, Chuang Gan, Bo Wu, Zhenfang Chen
  • Patent number: 11854305
    Abstract: A bi-directional spatial-temporal transformer neural network (BDSTT) is trained to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames. Obtain a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints. Produce a spatially masked frame by masking the original coordinates of the skeletal joint. Provide the specific frame, the spatially masked frame, and at least one more frame to a coordinate prediction head of the BDSTT. Obtain, from the coordinate prediction head, a prediction of coordinates for the skeletal joint. Adjust parameters of the BDSTT until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
    Type: Grant
    Filed: May 9, 2021
    Date of Patent: December 26, 2023
    Assignee: International Business Machines Corporation
    Inventors: Bo Wu, Chuang Gan, Dakuo Wang, Kaizhi Qian
  • Publication number: 20230360642
    Abstract: One or more computer processors obtain an initial subnetwork at a target sparsity and an initial pruning mask from a pre-trained self-supervised learning (SSL) speech model. The one or more computer processors finetune the initial subnetwork, comprising: the one or more computer processors zero out one or more masked weights in the initial subnetwork specified by the initial pruning mask; the one or more computer processors train a new subnetwork from the zeroed-out subnetwork; and the one or more computer processors prune one or more weights of lowest magnitude in the new subnetwork, regardless of network structure, to satisfy the target sparsity. The one or more computer processors classify an audio segment with the finetuned subnetwork.
    Type: Application
    Filed: May 9, 2022
    Publication date: November 9, 2023
    Inventors: Cheng-I Lai, Yang Zhang, Kaizhi Qian, Chuang Gan, James R. Glass, Alexander Haojan Liu
  • Publication number: 20220392429
    Abstract: A computer-implemented method is provided of using a machine learning model for disentanglement of prosody in spoken natural language. The method includes encoding, by a computing device, the spoken natural language to produce content code. The method further includes resampling, by the computing device without text transcriptions, the content code to obscure the prosody by applying an unsupervised technique to the machine learning model to generate prosody-obscured content code. The method additionally includes decoding, by the computing device, the prosody-obscured content code to synthesize speech indirectly based upon the content code.
    Type: Application
    Filed: June 3, 2021
    Publication date: December 8, 2022
    Inventors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox
  • Publication number: 20220374629
    Abstract: A bi-directional spatial-temporal transformer neural network (BDSTT) is trained to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames. Obtain a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints. Produce a spatially masked frame by masking the original coordinates of the skeletal joint. Provide the specific frame, the spatially masked frame, and at least one more frame to a coordinate prediction head of the BDSTT. Obtain, from the coordinate prediction head, a prediction of coordinates for the skeletal joint. Adjust parameters of the BDSTT until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
    Type: Application
    Filed: May 9, 2021
    Publication date: November 24, 2022
    Inventors: Bo Wu, Chuang Gan, Dakuo Wang, Kaizhi Qian
  • Patent number: 11295762
    Abstract: A method, a structure, and a computer system for decomposing speech. The exemplary embodiments may include one or more encoders for generating one or more encodings of a speech input comprising rhythm information, pitch information, timbre information, and content information, and a decoder for decoding the one or more encodings.
    Type: Grant
    Filed: April 20, 2020
    Date of Patent: April 5, 2022
    Assignee: International Business Machines Corporation
    Inventors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Chuang Gan, David Cox
  • Publication number: 20210327460
    Abstract: A method, a structure, and a computer system for decomposing speech. The exemplary embodiments may include one or more encoders for generating one or more encodings of a speech input comprising rhythm information, pitch information, timbre information, and content information, and a decoder for decoding the one or more encodings.
    Type: Application
    Filed: April 20, 2020
    Publication date: October 21, 2021
    Inventors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Chuang Gan, David Cox
  • Patent number: 10709390
    Abstract: A system that detects heartbeats includes a sensor or a transducer and algorithms based on deep learning. The algorithms employ techniques of artificial intelligence that enable the system to extract heartbeat features under low signal-to-noise-ratio (SNR) conditions when a user is exercising. The algorithms can be applied to various technologies for heart rate monitoring such as ultrasound Doppler, photoplethysmogram (PPG), electrocardiogram (EKG), acoustic, pressure/force sensing and laser/RF Doppler, among other types of sensing methods.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: July 14, 2020
    Assignee: LOGOS CARE, INC.
    Inventors: Kaizhi Qian, Yang Zhang, Thomas Y. Lo
  • Publication number: 20180249964
    Abstract: A system that detects heartbeats includes a sensor or a transducer and algorithms based on deep learning. The algorithms employ techniques of artificial intelligence that enable the system to extract heartbeat features under low signal-to-noise-ratio (SNR) conditions when a user is exercising. The algorithms can be applied to various technologies for heart rate monitoring such as ultrasound Doppler, photoplethysmogram (PPG), electrocardiogram (EKG), acoustic, pressure/force sensing and laser/RF Doppler, among other types of sensing methods.
    Type: Application
    Filed: November 21, 2017
    Publication date: September 6, 2018
    Inventors: Kaizhi Qian, Yang Zhang, Thomas Y. Lo
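
The abstract of publication 20230360642 above describes a finetuning loop: zero out the masked weights, train, then prune the lowest-magnitude weights regardless of structure until the target sparsity is met. The Python sketch below is one possible reading of that loop, not the patented implementation. It assumes a flat list of floats in place of real model weights, and the names `magnitude_prune`, `apply_mask`, `finetune_step`, and `train_fn` are hypothetical placeholders introduced only for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero the lowest-magnitude fraction of weights, ignoring any structure.

    Returns (pruned_weights, mask), where mask[i] is 0 for a pruned weight
    and 1 for a kept weight.
    """
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    mask = [0 if i in pruned else 1 for i in range(len(weights))]
    return [w * m for w, m in zip(weights, mask)], mask


def apply_mask(weights, mask):
    """Zero out the weights that the pruning mask marks with 0."""
    return [w * m for w, m in zip(weights, mask)]


def finetune_step(weights, mask, train_fn, sparsity):
    """One round: zero masked weights, train, then re-prune to the target sparsity.

    train_fn is a hypothetical stand-in for real training. In this sketch it
    may update every weight, including ones that were zeroed, so the mask
    can change from one round to the next.
    """
    zeroed = apply_mask(weights, mask)
    trained = train_fn(zeroed)
    return magnitude_prune(trained, sparsity)


# Toy usage with four weights and a 50% target sparsity.
initial_weights = [0.5, -0.1, 0.03, -0.9]
subnetwork, initial_mask = magnitude_prune(initial_weights, 0.5)

# Placeholder "training" that shrinks weight 0 and revives weight 1.
toy_train = lambda w: [0.05, 0.7, w[2], w[3]]
new_weights, new_mask = finetune_step(subnetwork, initial_mask, toy_train, 0.5)
```

In the toy run, the weight at index 1 starts out pruned but is kept after the training round, which shows how re-pruning by magnitude lets the mask adapt rather than staying fixed. A real system would operate on tensors of a speech model and use an actual optimizer in place of `toy_train`.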