Patents Assigned to KWAI INC.
  • Patent number: 11562734
    Abstract: The present disclosure relates to an automatic speech recognition system and a method thereof. The system includes a conformer encoder and a pair of ping-pong buffers. The encoder includes a plurality of encoder layers sequentially executed by one or more graphic processing units. At least one encoder layer includes a first feed forward module, a multi-head self-attention module, a convolution module, and a second feed forward module. The convolution module and the multi-head self-attention module are sandwiched between the first feedforward module and the second feed forward module. The four modules respectively include a plurality of encoder sublayers fused into one or more encoder kernels. The one or more encoder kernels respectively read from one of the pair of ping-pong buffers and write into the other of the pair of ping-pong buffers.
    Type: Grant
    Filed: January 4, 2021
    Date of Patent: January 24, 2023
    Assignee: KWAI INC.
    Inventors: Yongxiong Ren, Yang Liu, Heng Liu, Lingzhi Liu, Jie Li, Kaituo Xu, Xiaorui Wang
  • Publication number: 20230010197
    Abstract: A method to implement a fixed-point batchnorm layer in a neural network for data processing is provided in the present disclosure. The method includes: receiving floating-point input data over a channel of a standalone floating-point batchnorm layer, and converting the floating-point input data into fixed-point input data of the standalone floating-point batchnorm layer; obtaining fixed-point quantization parameters in each channel based on the input data and floating-point parameters ?i, ?i, ?i in each channel; converting the standalone floating-point batchnorm layer based on the fixed-point quantization parameters into a fixed-point batchnorm layer for processing the fixed-point input data to generate fixed-point output data; and mapping the fixed-point batchnorm layer to a fixed-point convolution layer whose convolution is computed by matrix multiplication that can be executed on a GEMM engine.
    Type: Application
    Filed: July 6, 2021
    Publication date: January 12, 2023
    Applicant: KWAI INC.
    Inventors: Ming Kai HSU, Sikai WANG
  • Publication number: 20230010981
    Abstract: A method to implement a fixed-point scale layer in a neural network for data processing is provided in the present disclosure. The method includes: receiving floating-point input data over a channel of a standalone floating-point scale layer, and converting the floating-point input data into fixed-point input data of the standalone floating-point scale layer; obtaining fixed-point quantization parameters in each channel based on the input data and floating-point parameters ?i, ?i in each channel; converting the standalone floating-point scale layer based on the fixed-point quantization parameters into a fixed-point scale layer for processing the fixed-point input data to generate fixed-point output data; and mapping the fixed-point scale layer to a fixed-point convolution layer whose convolution is computed by matrix multiplication that can be executed on a GEMM engine.
    Type: Application
    Filed: July 6, 2021
    Publication date: January 12, 2023
    Applicant: KWAI INC.
    Inventors: Ming Kai HSU, Sitong FENG
  • Publication number: 20220335250
    Abstract: A method and an apparatus for training a generative adversarial network (GAN) and a method and an apparatus for processing an image are provided. The method for training the GAN includes: obtaining a fine-grained style label (FGSL) associated with the image and inputting the FGSL and a latent vector into a style-based generator in the GAN; the style-based generator generating a first output image based on the FGSL and the latent vector; the projection discriminator determining whether the first output image matches the image based on the FGSL; and adjusting one or more parameters of the GAN and regenerating, by the style-based generator, a second output image based on the FGSL, the latent vector, and the adjusted GAN in response to determining that the first output image does not match the image based on the FGSL.
    Type: Application
    Filed: April 19, 2021
    Publication date: October 20, 2022
    Applicant: KWAI INC.
    Inventors: Xin MIAO, Huayan WANG
  • Publication number: 20220310068
    Abstract: Methods and apparatuses for automatic speech recognition are provided. The method includes: generating a weight matrix for a layer of a plurality of layers in a neural network; dividing the weight matrix into a plurality of blocks, each block including a plurality of weights; selecting a set of blocks from the plurality of blocks for block-wise pruning by minimizing a cost function subject to a pre-determined block-wise constraint; and generating a block-wise pruned weight matrix by setting one or more weights in the set of blocks to zero. The weight matrix includes a set of weights associated with the layer, the plurality of layers includes a first layer receiving a first input associated with one or more audio feature sequences, and the plurality of layers are executed on one or more processors.
    Type: Application
    Filed: March 25, 2021
    Publication date: September 29, 2022
    Applicant: KWAI INC.
    Inventors: Yongxiong REN, Bingbing LI, Yang LIU, Lingzhi LIU
  • Publication number: 20220310069
    Abstract: A method and an apparatus for automatic speech recognition are provided. The method includes: generating a weight matrix for a layer of a plurality of layers in a neural network; dividing the weight matrix into a plurality of blocks, each block including a plurality of weights; selecting a pre-determined percentage of weights from at least one block for block-wise pruning; and generating a block-wise pruned weight matrix by setting the pre-determined percentage of weights selected from the at least one block to zero. The weight matrix includes a set of weights associated with the layer, the plurality of layers includes a first layer receiving a first input associated with one or more audio feature sequences, and the plurality of layers are executed on one or more processors.
    Type: Application
    Filed: March 25, 2021
    Publication date: September 29, 2022
    Applicant: KWAI INC.
    Inventors: Yongxiong REN, Bingbing LI, Yang LIU, Lingzhi LIU
  • Publication number: 20220292727
    Abstract: A class-specific neural network for video compressed sensing and methods for training and testing the class-specific neural network are provided. The class-specific neural network includes a Gaussian-mixture model (GMM) and a plurality of encoders, where the GMM classifies video frame blocks with a plurality of clusters and assigns the video frame blocks to the plurality of clusters. Further, the plurality of encoders receive the video frame blocks and generate a plurality of compressed-sensed frame block vectors, where the plurality of encoders correspond to the plurality of clusters.
    Type: Application
    Filed: March 15, 2022
    Publication date: September 15, 2022
    Applicants: KWAI INC., SANTA CLARA UNIVERSITY
    Inventors: Yifei PEI, Ying LIU, Nam LING, Lingzhi LIU, Yongxiong REN, Ming Kai HSU
  • Publication number: 20220262349
    Abstract: Systems and methods are provided for automatic speech recognition. In the method, the system obtains a padded sequence by processing a plurality of acoustic signals. The system compresses the padded sequence by reducing the size of the padded sequence to obtain a compressed sequence. The system inputs the compressed sequence into a pre-trained encoder neural network to obtain an encoded sequence and then decompresses the encoded sequence by recovering the encoded sequence to an original sequential ordering. The system inputs the encoded sequence to a decoding module to obtain recognition texts.
    Type: Application
    Filed: February 17, 2021
    Publication date: August 18, 2022
    Applicant: KWAI INC.
    Inventors: Yongxiong REN, Yang LIU, Heng LIU, Lingzhi LIU
  • Publication number: 20220245447
    Abstract: Systems and methods are provided for quantization aware training of a neural network for a heterogeneous hardware platform. In the method, the system acquires hardware profiles with respect to a plurality of hardware components of a heterogeneous hardware platform. The system determines a plurality of hardware configurations based on the hardware profiles. The system acquires a set of training data and performs a quantization aware training using the training data on a network model based on the hardware configurations. The system obtains the network model with model weights for the heterogeneous hardware platform.
    Type: Application
    Filed: February 2, 2021
    Publication date: August 4, 2022
    Applicant: KWAI INC.
    Inventors: Yang LIU, Yongxiong REN, Lingzhi LIU
  • Publication number: 20220215832
    Abstract: The present disclosure relates to an automatic speech recognition system and a method thereof. The system includes a conformer encoder and a pair of ping-pong buffers. The encoder includes a plurality of encoder layers sequentially executed by one or more graphic processing units. At least one encoder layer includes a first feed forward module, a multi-head self-attention module, a convolution module, and a second feed forward module. The convolution module and the multi-head self-attention module are sandwiched between the first feedforward module and the second feed forward module. The four modules respectively include a plurality of encoder sublayers fused into one or more encoder kernels. The one or more encoder kernels respectively read from one of the pair of ping-pong buffers and write into the other of the pair of ping-pong buffers.
    Type: Application
    Filed: January 4, 2021
    Publication date: July 7, 2022
    Applicant: KWAI INC.
    Inventors: Yongxiong REN, Yang LIU, Heng LIU, Lingzhi LIU, Jie LI, Kaituo XU, Xiaorui WANG
  • Publication number: 20220215843
    Abstract: An automatic speech recognition system and a method thereof are provided. The system includes an encoder and a decoder. The encoder comprises a plurality of encoder layers. At least one encoder layer includes a plurality of encoder sublayers fused into one or more encoder kernels. The system further comprises a first pair of ping-pong buffers communicating with the one or more encoder kernels. The decoder comprises a plurality of decoder layers. At least one decoder layer includes a plurality of decoder sublayers fused into one or more decoder kernels. The decoder receives a decoder input related to the encoder output and generates a decoder output. The decoder sends the decoder output to a beam search kernel.
    Type: Application
    Filed: January 4, 2021
    Publication date: July 7, 2022
    Applicant: KWAI INC.
    Inventors: Yongxiong REN, Heng LIU, Yang LIU, Lingzhi LIU, Jie LI, Yuanyuan ZHAO, Xiaorui WANG
  • Publication number: 20220164630
    Abstract: A method for detecting moving objects in video frames, an apparatus and a non-transitory computer-readable storage medium thereof are provided. The method includes that: an encoder in a 3-dimensional (3D) separable convolutional neural network with multi-input multi-output (3DS_MM) receives a first input including multiple video frames, where the encoder includes a plurality of encoder layers including 3D separable convolutional neural network (CNN) layers; the encoder generates a first encoder output; and a decoder in the 3DS_MM receives the first encoder output and generates a first output including multiple first binary masks related to the first input, where the decoder includes a plurality of decoder layers comprising 3D separable transposed CNN layers.
    Type: Application
    Filed: November 22, 2021
    Publication date: May 26, 2022
    Applicants: KWAI INC., SANTA CLARA UNIVERSITY
    Inventors: Bingxin HOU, Ying LIU, Nam LING, Lingzhi LIU, Yongxiong REN, Ming Kai HSU
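
Several of the speech recognition abstracts above (patent 11562734 and publications 20220215832 and 20220215843) describe fused kernels that read from one of a pair of ping-pong buffers and write into the other. The following is a minimal sketch of that access pattern only, in plain Python rather than GPU code. The names `run_with_ping_pong`, `scale_by_two`, and `add_one` are hypothetical and do not come from the patents, and the toy kernels stand in for the fused encoder kernels, whose actual contents the abstracts do not specify.

```python
from typing import Callable, List, Sequence

# A "kernel" here is any function that reads src and fills dst in place.
Kernel = Callable[[List[float], List[float]], None]


def run_with_ping_pong(kernels: Sequence[Kernel], x: Sequence[float]) -> List[float]:
    """Run kernels in order, alternating between two preallocated buffers."""
    # Two buffers of equal size are allocated once, up front.
    buffers = [list(x), [0.0] * len(x)]
    src = 0  # index of the buffer that currently holds the input
    for kernel in kernels:
        dst = 1 - src                       # the other buffer receives the output
        kernel(buffers[src], buffers[dst])  # read from one, write into the other
        src = dst                           # swap roles for the next kernel
    return list(buffers[src])


# Toy stand-ins for fused kernels (assumptions for illustration only).
def scale_by_two(src: List[float], dst: List[float]) -> None:
    for i, v in enumerate(src):
        dst[i] = 2.0 * v


def add_one(src: List[float], dst: List[float]) -> None:
    for i, v in enumerate(src):
        dst[i] = v + 1.0


if __name__ == "__main__":
    print(run_with_ping_pong([scale_by_two, add_one, scale_by_two], [1.0, 2.0, 3.0]))
```

Because each kernel's output buffer becomes the next kernel's input buffer, no intermediate storage is allocated per kernel in this sketch; whether the patented systems gain the same benefit in the same way is not stated in the abstracts.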