Patents by Inventor Wei Tsung Lu

Wei Tsung Lu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250078814
    Abstract: The present disclosure provides a multi-modal encoder processing method and apparatus, a computer device, and a storage medium. The method includes: acquiring a pair of mask samples to be processed, the pair of mask samples including a text sample and an audio sample associated with each other, at least one of which is masked; based on a multi-modal encoder, generating a text encoding feature of the text sample and an audio encoding feature of the audio sample, a linear spectrum feature of the audio sample being fused into the text encoding feature, and a linear word feature of the text sample being fused into the audio encoding feature; and predicting the masked information according to the text encoding feature and the audio encoding feature, and correcting the multi-modal encoder based on the accuracy of the prediction. (A code sketch follows this entry.)
    Type: Application
    Filed: August 29, 2024
    Publication date: March 6, 2025
    Inventors: Dong Guo, Zihao He, Weituo Hao, Xuchen Song, Zongyu Yin, Jingsong Gao, Wei Tsung Lu, Junyu Dai
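
The masked-prediction loop in the abstract above maps onto a compact pattern. Below is a minimal PyTorch sketch of that idea; the module names, the [MASK] token id, and the pooled-feature addition are illustrative stand-ins for the patent's linear spectrum/word feature fusion, not its actual architecture.

```python
# Minimal sketch (PyTorch) of masked multi-modal pretraining as described in
# the abstract above. All names, sizes, and the pooled-feature fusion are
# illustrative assumptions, not the patent's architecture.
import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    def __init__(self, vocab=10_000, n_mels=80, d=256):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, d)
        self.audio_proj = nn.Linear(n_mels, d)   # "linear spectrum feature"
        self.word_proj = nn.Linear(d, d)         # "linear word feature"
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.text_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.audio_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.text_head = nn.Linear(d, vocab)     # predicts masked token ids

    def forward(self, tokens, spec):
        t = self.text_emb(tokens)                # (B, T_text, d)
        a = self.audio_proj(spec)                # (B, T_audio, d)
        # Cross-fusion, simplified: each stream receives a pooled linear
        # feature of the other before encoding.
        t = t + a.mean(dim=1, keepdim=True)
        a = a + self.word_proj(t.mean(dim=1, keepdim=True))
        return self.text_enc(t), self.audio_enc(a)

model = MultiModalEncoder()
tokens = torch.randint(1, 10_000, (2, 12))
tokens[:, 3] = 0                                 # assume id 0 is the [MASK] token
spec = torch.randn(2, 50, 80)                    # stand-in spectrogram frames
text_feat, _ = model(tokens, spec)
logits = model.text_head(text_feat[:, 3])        # predict the masked position
loss = nn.functional.cross_entropy(logits, torch.randint(1, 10_000, (2,)))
loss.backward()                                  # "correct the encoder" step
```
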
  • Publication number: 20250078857
    Abstract: The present disclosure describes techniques for implementing improved audio source separation. A complex spectrum X, a time-frequency representation of the audio signals, is split into K bands along the frequency axis by band-split operations. Each band is denoted Xk, k=1, . . . , K, and comprises one or more frequency bins. A separate multilayer perceptron is applied to each band Xk to extract latent representations Hk0. A time-domain transformer and a frequency-domain transformer are applied to the stacked representation H0, and the two transformers are applied repeatedly in an interleaved manner L times to obtain the output HL of the transformer blocks. HL is input into a multi-band mask estimation sub-model, and a complex ideal ratio mask is generated based on its outputs. (A code sketch follows this entry.)
    Type: Application
    Filed: August 31, 2023
    Publication date: March 6, 2025
    Inventors: Wei Tsung LU, Ju-Chiang WANG
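
A minimal PyTorch sketch of the band-split / interleaved-transformer pipeline follows. The band edges, model width, and L are illustrative; sharing one transformer layer across the L iterations and applying the mask with an elementwise product are simplifications of the complex ideal ratio mask described above.

```python
# Minimal sketch (PyTorch) of band-split source separation with interleaved
# time/frequency transformers. Band edges, width d, and L are illustrative;
# the elementwise mask product is a simplification of complex multiplication.
import torch
import torch.nn as nn

B, T, F_bins, d, L = 2, 100, 1025, 128, 3
edges = [0, 256, 512, 1025]                      # K = 3 bands (assumption)
X = torch.randn(B, T, F_bins, 2)                 # complex spectrum as (re, im)

# One MLP per band extracts latent representations H_k^0.
bands = list(zip(edges[:-1], edges[1:]))
mlps = nn.ModuleList(
    nn.Sequential(nn.Linear((hi - lo) * 2, d), nn.GELU(), nn.Linear(d, d))
    for lo, hi in bands
)
H = torch.stack(
    [mlp(X[:, :, lo:hi].flatten(2)) for mlp, (lo, hi) in zip(mlps, bands)],
    dim=2,
)                                                # stacked H^0: (B, T, K, d)

K = H.shape[2]
t_layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
f_layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
for _ in range(L):                               # interleave L times -> H^L
    H = t_layer(H.permute(0, 2, 1, 3).reshape(B * K, T, d))  # attend over time
    H = H.reshape(B, K, T, d).permute(0, 2, 1, 3)
    H = f_layer(H.reshape(B * T, K, d))                      # attend over bands
    H = H.reshape(B, T, K, d)

# Multi-band mask estimation: one head per band, concatenated into a
# full-spectrum complex ratio mask.
heads = nn.ModuleList(nn.Linear(d, (hi - lo) * 2) for lo, hi in bands)
mask = torch.cat(
    [head(H[:, :, k]).view(B, T, -1, 2) for k, head in enumerate(heads)], dim=2
)
separated = X * mask                             # simplified mask application
```
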
  • Publication number: 20240404494
    Abstract: The present disclosure describes techniques for implementing automatic music audio transcription. A deep neural network model may be configured. The deep neural network model comprises a spectral cross-attention sub-model configured to project a spectral representation of each time step t, denoted as St, into a set of latent arrays at the time step t, denoted as ẑth, h representing the h-th iteration. The deep neural network model comprises a plurality of latent transformers configured to perform self-attention on the set of latent arrays ẑth. The deep neural network model further comprises a set of temporal transformers configured to enable communication between any pair of latent arrays ẑth at different time steps. Training data may be augmented by randomly mixing a plurality of types of datasets comprising a vocal dataset and an instrument dataset. The deep neural network model may be trained using the augmented training data. (A code sketch follows this entry.)
    Type: Application
    Filed: June 1, 2023
    Publication date: December 5, 2024
    Inventors: Wei Tsung LU, Ju-Chiang WANG, Yun-Ning HUNG
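
The latent-array design above resembles a Perceiver-style encoder. A minimal PyTorch sketch follows; the number of latents, the sizes, and the use of a single iteration h are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of the spectral cross-attention / latent-array
# design, in the spirit of a Perceiver-style encoder. Sizes and a single
# iteration h are illustrative assumptions.
import torch
import torch.nn as nn

B, T, F_bins, n_latents, d = 2, 20, 128, 8, 64
S = torch.randn(B, T, F_bins, d)                 # spectral representation S_t

z0 = nn.Parameter(torch.randn(n_latents, d))     # learned latent arrays
cross = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
latent_tf = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
temporal_tf = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)

# Spectral cross-attention: project each S_t onto the latent arrays.
S_flat = S.reshape(B * T, F_bins, d)
q = z0.unsqueeze(0).expand(B * T, -1, -1)
z, _ = cross(q, S_flat, S_flat)                  # (B*T, n_latents, d)

z = latent_tf(z)                                 # self-attention within a step

# Temporal transformer: the same latent index communicates across steps.
z = z.reshape(B, T, n_latents, d).permute(0, 2, 1, 3)
z = temporal_tf(z.reshape(B * n_latents, T, d)).reshape(B, n_latents, T, d)
```
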
  • Patent number: 12106740
    Abstract: Devices, systems, and methods related to implementing supervised metric learning during training of a deep neural network model are disclosed herein. In examples, audio input may be received, where the audio input includes a plurality of song fragments from a plurality of songs. For each song fragment, an aligning function may be performed to center the song fragment based on determined beat information, thereby creating a plurality of aligned song fragments. For each of the aligned song fragments, an embedding vector may be obtained from the deep neural network model. A batch of aligned song fragments may then be selected, from which a training tuple is drawn. A loss metric may be generated based on the selected training tuple, and one or more weights of the deep neural network model may be updated based on the loss metric. (A code sketch follows this entry.)
    Type: Grant
    Filed: October 15, 2021
    Date of Patent: October 1, 2024
    Assignee: Lemon Inc.
    Inventors: Ju-Chiang Wang, Jordan Smith, Wei Tsung Lu
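
A minimal PyTorch sketch of the described training step follows: beat-center each fragment, embed it, form a training tuple, and update the weights from the loss. The triplet loss, the toy embedder, and the alignment policy are illustrative assumptions, not the patented method's exact choices.

```python
# Minimal sketch (PyTorch) of the training step described above: beat-center
# each fragment, embed it, form a training tuple, and update from the loss.
# The triplet loss, toy embedder, and alignment policy are assumptions.
import torch
import torch.nn as nn

def align(fragment: torch.Tensor, beat_sample: int) -> torch.Tensor:
    """Shift the fragment so the detected beat lands at its center."""
    return torch.roll(fragment, fragment.shape[-1] // 2 - beat_sample, dims=-1)

embedder = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(), nn.Linear(256, 128))
opt = torch.optim.Adam(embedder.parameters(), lr=1e-4)
triplet = nn.TripletMarginLoss(margin=0.2)

# Toy batch: anchors and positives from the same songs, negatives from others.
frags = torch.randn(3, 8, 4096)                  # (anchor, positive, negative)
beats = torch.randint(0, 4096, (3, 8))           # stand-in beat positions
a, p, n = (torch.stack([align(f, int(b)) for f, b in zip(group, bs)])
           for group, bs in zip(frags, beats))

loss = triplet(embedder(a), embedder(p), embedder(n))
opt.zero_grad()
loss.backward()
opt.step()                                       # weight update from the loss metric
```
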
  • Publication number: 20240153474
    Abstract: The present disclosure describes techniques for melody extraction. The techniques comprise receiving a polyphonic symbolic music file comprising a plurality of notes. The polyphonic symbolic music file may be converted to a plurality of feature vectors, each a multidimensional vector corresponding to a particular note of the plurality of notes. The feature vectors may then be classified using a model that is trained to determine, based on the plurality of feature vectors, whether each of the plurality of notes belongs to a melody. (A code sketch follows this entry.)
    Type: Application
    Filed: November 14, 2022
    Publication date: May 9, 2024
    Inventors: Katerina KOSTA, Wei Tsung LU
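
A minimal sketch of the per-note classification step follows. The four-dimensional feature vector and the tiny classifier are illustrative guesses; the abstract does not specify the feature set.

```python
# Minimal sketch (PyTorch) of per-note melody classification. The
# four-dimensional feature vector and the tiny classifier are illustrative
# guesses; the abstract does not specify the feature set.
import torch
import torch.nn as nn

def note_features(pitch, onset, duration, velocity):
    """Hypothetical per-note feature vector from a parsed symbolic file."""
    return torch.tensor([pitch / 127.0, onset, duration, velocity / 127.0])

# (pitch, onset in beats, duration in beats, velocity)
notes = [(72, 0.0, 0.5, 90), (48, 0.0, 1.0, 60), (76, 0.5, 0.5, 95)]
X = torch.stack([note_features(*n) for n in notes])   # (num_notes, 4)

clf = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
is_melody = torch.sigmoid(clf(X)).squeeze(-1) > 0.5   # per-note decision
melody = [n for n, keep in zip(notes, is_melody) if keep]
```
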
  • Patent number: 11854558
    Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, comprising a spectral transformer and a temporal transformer, are disclosed herein. (A code sketch follows this entry.)
    Type: Grant
    Filed: October 15, 2021
    Date of Patent: December 26, 2023
    Assignee: Lemon Inc.
    Inventors: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song
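
The multilevel idea can be sketched in a few lines of PyTorch: one transformer attends across frequency within each frame, another attends across frames over time. The mean-pooling hand-off between levels and all dimensions are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of a two-level transformer: a spectral transformer
# attends across frequency within each frame, a temporal transformer attends
# across frames. Pooling between levels and all sizes are assumptions.
import torch
import torch.nn as nn

B, T, F_bins, d = 2, 64, 96, 128
spec = torch.randn(B, T, F_bins, d)              # framed spectral input

spectral_tf = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
temporal_tf = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)

h = spectral_tf(spec.reshape(B * T, F_bins, d))  # within-frame attention
h = h.reshape(B, T, F_bins, d).mean(dim=2)       # summarize each frame
h = temporal_tf(h)                               # across-frame attention
music_info = nn.Linear(d, 10)(h)                 # e.g. per-frame tag logits
```
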
  • Publication number: 20230121764
    Abstract: Devices, systems, and methods related to implementing supervised metric learning during training of a deep neural network model are disclosed herein. In examples, audio input may be received, where the audio input includes a plurality of song fragments from a plurality of songs. For each song fragment, an aligning function may be performed to center the song fragment based on determined beat information, thereby creating a plurality of aligned song fragments. For each of the aligned song fragments, an embedding vector may be obtained from the deep neural network model. A batch of aligned song fragments may then be selected, from which a training tuple is drawn. A loss metric may be generated based on the selected training tuple, and one or more weights of the deep neural network model may be updated based on the loss metric. (See the code sketch under patent 12106740 above.)
    Type: Application
    Filed: October 15, 2021
    Publication date: April 20, 2023
    Inventors: Ju-Chiang Wang, Jordan Smith, Wei Tsung Lu
  • Publication number: 20230124006
    Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, comprising a spectral transformer and a temporal transformer, are disclosed herein. (See the code sketch under patent 11854558 above.)
    Type: Application
    Filed: October 15, 2021
    Publication date: April 20, 2023
    Inventors: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song