Patents by Inventor Gus XIA

Gus XIA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12488772
    Abstract: A method, and non-transitory computer readable medium that perform a method for generating background music tailored for a movie scene, in a smart audio-visual display device. The method includes receiving video of the movie scene. Processing circuitry detects speech signals in the movie scene and extracts visual information and, when speech signals are detected, spoken dialogue from the movie scene. Descriptive text is generated from the visual information. Emotion categories are detected based on the visual information. The spoken dialogue is transcribed into transcribed text. A large language model (LLM) translates the descriptive text, emotion categories and transcribed text into text-based low-level musical instrument conditions. A text-to-music model is guided, by the low-level musical instrument conditions, to generate audio tokens that resonate with the movie scene. Music signals are output in accordance with the audio tokens in synchronism with the movie scene.
    Type: Grant
    Filed: October 1, 2024
    Date of Patent: December 2, 2025
    Assignee: Mohamed bin Zayed University of Artificial Intelligence
    Inventors: Gus Xia, Muhammad Taimoor Haseeb, Ahmad Hammoudeh
  • Publication number: 20250252943
    Abstract: A method, and non-transitory computer readable medium that perform a method for generating background music tailored for a movie scene, in a smart audio-visual display device. The method includes receiving video of the movie scene. Processing circuitry detects speech signals in the movie scene and extracts visual information and, when speech signals are detected, spoken dialogue from the movie scene. Descriptive text is generated from the visual information. Emotion categories are detected based on the visual information. The spoken dialogue is transcribed into transcribed text. A large language model (LLM) translates the descriptive text, emotion categories and transcribed text into text-based low-level musical instrument conditions. A text-to-music model is guided, by the low-level musical instrument conditions, to generate audio tokens that resonate with the movie scene. Music signals are output in accordance with the audio tokens in synchronism with the movie scene.
    Type: Application
    Filed: October 1, 2024
    Publication date: August 7, 2025
    Applicant: Mohamed bin Zayed University of Artificial Intelligence
    Inventors: Gus XIA, Muhammad Taimoor HASEEB, Ahmad HAMMOUDEH
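
The data flow described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: every name here (`SceneAnalysis`, `analyze_scene`, `build_llm_prompt`, `generate_scene_music`) is hypothetical, the prompt wording is invented, and the speech detector, captioner, emotion classifier, transcriber, LLM, and text-to-music model are passed in as plain callables because the abstract does not name specific models. Decoding audio tokens into music signals and synchronizing them with the scene are outside the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SceneAnalysis:
    """Hypothetical container for what the abstract extracts from a scene."""
    description: str           # descriptive text generated from visual information
    emotions: List[str]        # emotion categories detected from visual information
    transcript: Optional[str]  # transcribed dialogue, or None when no speech is detected


def analyze_scene(frames: Any, audio: Any, *,
                  detect_speech: Callable[[Any], bool],
                  describe: Callable[[Any], str],
                  classify_emotions: Callable[[Any], List[str]],
                  transcribe: Callable[[Any], str]) -> SceneAnalysis:
    # Dialogue is transcribed only when speech signals are detected.
    transcript = transcribe(audio) if detect_speech(audio) else None
    return SceneAnalysis(describe(frames), classify_emotions(frames), transcript)


def build_llm_prompt(analysis: SceneAnalysis) -> str:
    # Assumed prompt format; the abstract only says the LLM receives the
    # descriptive text, emotion categories, and transcribed text.
    parts = [
        f"Scene description: {analysis.description}",
        f"Emotions: {', '.join(analysis.emotions)}",
    ]
    if analysis.transcript is not None:
        parts.append(f"Dialogue: {analysis.transcript}")
    parts.append("Write low-level musical instrument conditions for this scene.")
    return "\n".join(parts)


def generate_scene_music(frames: Any, audio: Any, *,
                         detect_speech: Callable[[Any], bool],
                         describe: Callable[[Any], str],
                         classify_emotions: Callable[[Any], List[str]],
                         transcribe: Callable[[Any], str],
                         llm: Callable[[str], str],
                         text_to_music: Callable[[str], Any]) -> Any:
    analysis = analyze_scene(
        frames, audio,
        detect_speech=detect_speech,
        describe=describe,
        classify_emotions=classify_emotions,
        transcribe=transcribe,
    )
    # The LLM turns the scene analysis into text-based musical conditions,
    # which then guide the text-to-music model that produces audio tokens.
    conditions = llm(build_llm_prompt(analysis))
    return text_to_music(conditions)
```

Passing the models in as callables keeps the sketch independent of any particular library; a caller would supply its own components for each stage.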