Patents by Inventor Gus XIA

Gus XIA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method of GPT driven cinematic music generation through text processing

Patent number: 12488772

Abstract: A method, and non-transitory computer readable medium that perform a method for generating background music tailored for a movie scene, in a smart audio-visual display device. The method includes receiving video of the movie scene. Processing circuitry detects speech signals in the movie scene and extracts visual information and, when speech signals are detected, spoken dialogue from the movie scene. Descriptive text is generated from the visual information. Emotion categories are detected based on the visual information. The spoken dialogue is transcribed into transcribed text. A large language model (LLM) translates the descriptive text, emotion categories and transcribed text into text-based low-level musical instrument conditions. A text-to-music model is guided, by the low-level musical instrument conditions, to generate audio tokens that resonate with the movie scene. Music signals are output in accordance with the audio tokens in synchronism with the movie scene.

Type: Grant

Filed: October 1, 2024

Date of Patent: December 2, 2025

Assignee: Mohamed bin Zayed University of Artificial Intelligence

Inventors: Gus Xia, Muhammad Taimoor Haseeb, Ahmad Hammoudeh

SYSTEM AND METHOD OF GPT DRIVEN CINEMATIC MUSIC GENERATION THROUGH TEXT PROCESSING

Publication number: 20250252943

Abstract: A method, and non-transitory computer readable medium that perform a method for generating background music tailored for a movie scene, in a smart audio-visual display device. The method includes receiving video of the movie scene. Processing circuitry detects speech signals in the movie scene and extracts visual information and, when speech signals are detected, spoken dialogue from the movie scene. Descriptive text is generated from the visual information. Emotion categories are detected based on the visual information. The spoken dialogue is transcribed into transcribed text. A large language model (LLM) translates the descriptive text, emotion categories and transcribed text into text-based low-level musical instrument conditions. A text-to-music model is guided, by the low-level musical instrument conditions, to generate audio tokens that resonate with the movie scene. Music signals are output in accordance with the audio tokens in synchronism with the movie scene.

Type: Application

Filed: October 1, 2024

Publication date: August 7, 2025

Applicant: Mohamed bin Zayed University of Artificial Intelligence

Inventors: Gus XIA, Muhammad Taimoor HASEEB, Ahmad HAMMOUDEH
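
The data flow described in the abstract above can be illustrated with a minimal sketch. This is not the patented implementation: the names SceneAnalysis, build_llm_prompt, and generate_scene_music, and the two callables passed in for the LLM and the text-to-music model, are hypothetical placeholders chosen for illustration. The sketch assumes the upstream steps (visual description, emotion detection, speech transcription) have already produced text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SceneAnalysis:
    """Text derived from one movie scene (hypothetical container)."""
    descriptive_text: str             # generated from the visual information
    emotion_categories: List[str]     # detected from the visual information
    transcribed_text: Optional[str]   # None when no speech was detected


def build_llm_prompt(analysis: SceneAnalysis) -> str:
    """Combine the available text inputs into a single prompt for the LLM."""
    parts = [
        f"Scene description: {analysis.descriptive_text}",
        f"Emotions: {', '.join(analysis.emotion_categories)}",
    ]
    # Dialogue is included only when speech was detected in the scene.
    if analysis.transcribed_text is not None:
        parts.append(f"Dialogue: {analysis.transcribed_text}")
    return "\n".join(parts)


def generate_scene_music(
    analysis: SceneAnalysis,
    llm: Callable[[str], str],
    text_to_music: Callable[[str], List[int]],
) -> List[int]:
    """Translate scene text into musical conditions, then into audio tokens.

    `llm` and `text_to_music` are assumed stand-ins for real models.
    """
    prompt = build_llm_prompt(analysis)
    conditions = llm(prompt)           # text-based musical conditions
    return text_to_music(conditions)   # audio tokens guided by the conditions
```

Passing the models in as callables keeps the sketch runnable with simple stubs; synchronizing the generated audio with the scene is outside its scope.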
