Patents by Inventor Wei Tsung Lu

Wei Tsung Lu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250078814
    Abstract: The present disclosure provides a multi-modal encoder processing method and apparatus, a computer device, and a storage medium. The method includes: acquiring a pair of mask samples to be processed, the pair of mask samples including a text sample and an audio sample associated with each other, at least one of which is masked; based on a multi-modal encoder, generating a text encoding feature of the text sample and an audio encoding feature of the audio sample, a linear spectrum feature of the audio sample being fused into the text encoding feature, and a linear word feature of the text sample being fused into the audio encoding feature; and predicting the masked information according to the text encoding feature and the audio encoding feature, and correcting the multi-modal encoder based on the accuracy of the prediction. (A code sketch follows this entry.)
    Type: Application
    Filed: August 29, 2024
    Publication date: March 6, 2025
    Inventors: Dong Guo, Zihao He, Weituo Hao, Xuchen Song, Zongyu Yin, Jingsong Gao, Wei Tsung Lu, Junyu Dai
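
The masked-prediction loop in the abstract above maps onto a compact pattern. Below is a minimal PyTorch sketch of that idea; the module names, the [MASK] token id, and the pooled-feature addition are illustrative stand-ins for the patent's linear spectrum/word feature fusion, not its actual architecture.

```python
# Minimal sketch (PyTorch) of masked multi-modal pretraining as described in
# the abstract above. All names, sizes, and the pooled-feature fusion are
# illustrative assumptions, not the patent's architecture.
import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    def __init__(self, vocab=10_000, n_mels=80, d=256):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, d)
        self.audio_proj = nn.Linear(n_mels, d)   # "linear spectrum feature"
        self.word_proj = nn.Linear(d, d)         # "linear word feature"
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.text_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.audio_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.text_head = nn.Linear(d, vocab)     # predicts masked token ids

    def forward(self, tokens, spec):
        t = self.text_emb(tokens)                # (B, T_text, d)
        a = self.audio_proj(spec)                # (B, T_audio, d)
        # Cross-fusion, simplified: each stream receives a pooled linear
        # feature of the other before encoding.
        t = t + a.mean(dim=1, keepdim=True)
        a = a + self.word_proj(t.mean(dim=1, keepdim=True))
        return self.text_enc(t), self.audio_enc(a)

model = MultiModalEncoder()
tokens = torch.randint(1, 10_000, (2, 12))
tokens[:, 3] = 0                                 # assume id 0 is the [MASK] token
spec = torch.randn(2, 50, 80)                    # stand-in spectrogram frames
text_feat, _ = model(tokens, spec)
logits = model.text_head(text_feat[:, 3])        # predict the masked position
loss = nn.functional.cross_entropy(logits, torch.randint(1, 10_000, (2,)))
loss.backward()                                  # "correct the encoder" step
```
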
  • Publication number: 20250078857
    Abstract: The present disclosure describes techniques for implementing improved audio source separation. A complex spectrum X, a time-frequency representation of the audio signals, is split into K bands along the frequency axis by band-split operations. Each band is denoted Xk, k=1, . . . , K, and comprises one or more frequency bins. A separate multilayer perceptron is applied to each band Xk to extract latent representations Hk0. A time-domain transformer and a frequency-domain transformer are applied to the stacked representation H0, and the two transformers are applied repeatedly in an interleaved manner L times to obtain the output HL of the transformer blocks. HL is input into a multi-band mask estimation sub-model, and a complex ideal ratio mask is generated based on its outputs. (A code sketch follows this entry.)
    Type: Application
    Filed: August 31, 2023
    Publication date: March 6, 2025
    Inventors: Wei Tsung LU, Ju-Chiang WANG
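
A minimal PyTorch sketch of the band-split / interleaved-transformer pipeline follows. The band edges, model width, and L are illustrative; sharing one transformer layer across the L iterations and applying the mask with an elementwise product are simplifications of the complex ideal ratio mask described above.

```python
# Minimal sketch (PyTorch) of band-split source separation with interleaved
# time/frequency transformers. Band edges, width d, and L are illustrative;
# the elementwise mask product is a simplification of complex multiplication.
import torch
import torch.nn as nn

B, T, F_bins, d, L = 2, 100, 1025, 128, 3
edges = [0, 256, 512, 1025]                      # K = 3 bands (assumption)
X = torch.randn(B, T, F_bins, 2)                 # complex spectrum as (re, im)

# One MLP per band extracts latent representations H_k^0.
bands = list(zip(edges[:-1], edges[1:]))
mlps = nn.ModuleList(
    nn.Sequential(nn.Linear((hi - lo) * 2, d), nn.GELU(), nn.Linear(d, d))
    for lo, hi in bands
)
H = torch.stack(
    [mlp(X[:, :, lo:hi].flatten(2)) for mlp, (lo, hi) in zip(mlps, bands)],
    dim=2,
)                                                # stacked H^0: (B, T, K, d)

K = H.shape[2]
t_layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
f_layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
for _ in range(L):                               # interleave L times -> H^L
    H = t_layer(H.permute(0, 2, 1, 3).reshape(B * K, T, d))  # attend over time
    H = H.reshape(B, K, T, d).permute(0, 2, 1, 3)
    H = f_layer(H.reshape(B * T, K, d))                      # attend over bands
    H = H.reshape(B, T, K, d)

# Multi-band mask estimation: one head per band, concatenated into a
# full-spectrum complex ratio mask.
heads = nn.ModuleList(nn.Linear(d, (hi - lo) * 2) for lo, hi in bands)
mask = torch.cat(
    [head(H[:, :, k]).view(B, T, -1, 2) for k, head in enumerate(heads)], dim=2
)
separated = X * mask                             # simplified mask application
```
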
  • Publication number: 20240404494
    Abstract: The present disclosure describes techniques for implementing automatic music audio transcription. A deep neural network model may be configured. The deep neural network model comprises a spectral cross-attention sub-model configured to project a spectral representation of each time step t, denoted as St, into a set of latent arrays at the time step t, denoted as ẑth, h representing the h-th iteration. The deep neural network model comprises a plurality of latent transformers configured to perform self-attention on the set of latent arrays ẑth. The deep neural network model further comprises a set of temporal transformers configured to enable communication between any pair of latent arrays ẑth at different time steps. Training data may be augmented by randomly mixing a plurality of types of datasets comprising a vocal dataset and an instrument dataset. The deep neural network model may be trained using the augmented training data. (A code sketch follows this entry.)
    Type: Application
    Filed: June 1, 2023
    Publication date: December 5, 2024
    Inventors: Wei Tsung LU, Ju-Chiang WANG, Yun-Ning HUNG
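
The latent-array design above resembles a Perceiver-style encoder. A minimal PyTorch sketch follows; the number of latents, the sizes, and the use of a single iteration h are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of the spectral cross-attention / latent-array
# design, in the spirit of a Perceiver-style encoder. Sizes and a single
# iteration h are illustrative assumptions.
import torch
import torch.nn as nn

B, T, F_bins, n_latents, d = 2, 20, 128, 8, 64
S = torch.randn(B, T, F_bins, d)                 # spectral representation S_t

z0 = nn.Parameter(torch.randn(n_latents, d))     # learned latent arrays
cross = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
latent_tf = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
temporal_tf = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)

# Spectral cross-attention: project each S_t onto the latent arrays.
S_flat = S.reshape(B * T, F_bins, d)
q = z0.unsqueeze(0).expand(B * T, -1, -1)
z, _ = cross(q, S_flat, S_flat)                  # (B*T, n_latents, d)

z = latent_tf(z)                                 # self-attention within a step

# Temporal transformer: the same latent index communicates across steps.
z = z.reshape(B, T, n_latents, d).permute(0, 2, 1, 3)
z = temporal_tf(z.reshape(B * n_latents, T, d)).reshape(B, n_latents, T, d)
```
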
  • Patent number: 12106740
    Abstract: Devices, systems, and methods related to implementing supervised metric learning during training of a deep neural network model are disclosed herein. In examples, audio input may be received, where the audio input includes a plurality of song fragments from a plurality of songs. For each song fragment, an aligning function may be performed to center the song fragment based on determined beat information, thereby creating a plurality of aligned song fragments. For each of the aligned song fragments, an embedding vector may be obtained from the deep neural network model. A batch of aligned song fragments may then be selected, from which a training tuple is drawn. A loss metric may be generated based on the selected training tuple, and one or more weights of the deep neural network model may be updated based on the loss metric. (A code sketch follows this entry.)
    Type: Grant
    Filed: October 15, 2021
    Date of Patent: October 1, 2024
    Assignee: Lemon Inc.
    Inventors: Ju-Chiang Wang, Jordan Smith, Wei Tsung Lu
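
A minimal PyTorch sketch of the described training step follows: beat-center each fragment, embed it, form a training tuple, and update the weights from the loss. The triplet loss, the toy embedder, and the alignment policy are illustrative assumptions, not the patented method's exact choices.

```python
# Minimal sketch (PyTorch) of the training step described above: beat-center
# each fragment, embed it, form a training tuple, and update from the loss.
# The triplet loss, toy embedder, and alignment policy are assumptions.
import torch
import torch.nn as nn

def align(fragment: torch.Tensor, beat_sample: int) -> torch.Tensor:
    """Shift the fragment so the detected beat lands at its center."""
    return torch.roll(fragment, fragment.shape[-1] // 2 - beat_sample, dims=-1)

embedder = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(), nn.Linear(256, 128))
opt = torch.optim.Adam(embedder.parameters(), lr=1e-4)
triplet = nn.TripletMarginLoss(margin=0.2)

# Toy batch: anchors and positives from the same songs, negatives from others.
frags = torch.randn(3, 8, 4096)                  # (anchor, positive, negative)
beats = torch.randint(0, 4096, (3, 8))           # stand-in beat positions
a, p, n = (torch.stack([align(f, int(b)) for f, b in zip(group, bs)])
           for group, bs in zip(frags, beats))

loss = triplet(embedder(a), embedder(p), embedder(n))
opt.zero_grad()
loss.backward()
opt.step()                                       # weight update from the loss metric
```
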
  • Publication number: 20240153474
    Abstract: The present disclosure describes techniques for melody extraction. The techniques comprise receiving a polyphonic symbolic music file comprising a plurality of notes. The polyphonic symbolic music file may be converted to a plurality of feature vectors, each a multidimensional vector corresponding to a particular note of the plurality of notes. The feature vectors may then be classified using a model that is trained to determine, based on the plurality of feature vectors, whether each of the plurality of notes belongs to a melody. (A code sketch follows this entry.)
    Type: Application
    Filed: November 14, 2022
    Publication date: May 9, 2024
    Inventors: Katerina KOSTA, Wei Tsung LU
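
A minimal sketch of the per-note classification step follows. The four-dimensional feature vector and the tiny classifier are illustrative guesses; the abstract does not specify the feature set.

```python
# Minimal sketch (PyTorch) of per-note melody classification. The
# four-dimensional feature vector and the tiny classifier are illustrative
# guesses; the abstract does not specify the feature set.
import torch
import torch.nn as nn

def note_features(pitch, onset, duration, velocity):
    """Hypothetical per-note feature vector from a parsed symbolic file."""
    return torch.tensor([pitch / 127.0, onset, duration, velocity / 127.0])

# (pitch, onset in beats, duration in beats, velocity)
notes = [(72, 0.0, 0.5, 90), (48, 0.0, 1.0, 60), (76, 0.5, 0.5, 95)]
X = torch.stack([note_features(*n) for n in notes])   # (num_notes, 4)

clf = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
is_melody = torch.sigmoid(clf(X)).squeeze(-1) > 0.5   # per-note decision
melody = [n for n, keep in zip(notes, is_melody) if keep]
```
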
  • Patent number: 11854558
    Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, comprising a spectral transformer and a temporal transformer, are disclosed herein. (A code sketch follows this entry.)
    Type: Grant
    Filed: October 15, 2021
    Date of Patent: December 26, 2023
    Assignee: Lemon Inc.
    Inventors: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song
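
The multilevel idea can be sketched in a few lines of PyTorch: one transformer attends across frequency within each frame, another attends across frames over time. The mean-pooling hand-off between levels and all dimensions are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of a two-level transformer: a spectral transformer
# attends across frequency within each frame, a temporal transformer attends
# across frames. Pooling between levels and all sizes are assumptions.
import torch
import torch.nn as nn

B, T, F_bins, d = 2, 64, 96, 128
spec = torch.randn(B, T, F_bins, d)              # framed spectral input

spectral_tf = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
temporal_tf = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)

h = spectral_tf(spec.reshape(B * T, F_bins, d))  # within-frame attention
h = h.reshape(B, T, F_bins, d).mean(dim=2)       # summarize each frame
h = temporal_tf(h)                               # across-frame attention
music_info = nn.Linear(d, 10)(h)                 # e.g. per-frame tag logits
```
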
  • Publication number: 20230121764
    Abstract: Devices, systems, and methods related to implementing supervised metric learning during training of a deep neural network model are disclosed herein. In examples, audio input may be received, where the audio input includes a plurality of song fragments from a plurality of songs. For each song fragment, an aligning function may be performed to center the song fragment based on determined beat information, thereby creating a plurality of aligned song fragments. For each of the aligned song fragments, an embedding vector may be obtained from the deep neural network model. A batch of aligned song fragments may then be selected, from which a training tuple is drawn. A loss metric may be generated based on the selected training tuple, and one or more weights of the deep neural network model may be updated based on the loss metric. (See the code sketch under patent 12106740 above.)
    Type: Application
    Filed: October 15, 2021
    Publication date: April 20, 2023
    Inventors: Ju-Chiang Wang, Jordan Smith, Wei Tsung Lu
  • Publication number: 20230124006
    Abstract: Devices, systems and methods related to causing an apparatus to generate music information of audio data using a transformer-based neural network model with a multilevel transformer for audio analysis, comprising a spectral transformer and a temporal transformer, are disclosed herein. (See the code sketch under patent 11854558 above.)
    Type: Application
    Filed: October 15, 2021
    Publication date: April 20, 2023
    Inventors: Wei Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song