Patents by Inventor Zeyu Jin
Zeyu Jin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11915714
Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
Type: Grant
Filed: December 21, 2021
Date of Patent: February 27, 2024
Assignees: Adobe Inc., Northwestern University
Inventors: Maxwell Morrison, Juan Pablo Caceres Chomali, Zeyu Jin, Nicholas Bryan, Bryan A. Pardo
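The equal-width-in-cents pitch binning described in the abstract can be illustrated with a short sketch. The bin count, pitch range, and reference frequency below are hypothetical; the abstract does not give the patented system's actual parameters.

```python
import math

def hz_to_cents(f_hz, f_ref):
    """Convert a pitch in Hz to cents above a reference frequency."""
    return 1200.0 * math.log2(f_hz / f_ref)

def quantize_pitch(f_hz, n_bins=256, f_min=50.0, f_max=1000.0):
    """Assign a pitch value to one of n_bins bins of equal width in cents."""
    cents = hz_to_cents(f_hz, f_min)
    total_cents = 1200.0 * math.log2(f_max / f_min)
    bin_width = total_cents / n_bins
    index = int(cents // bin_width)
    return max(0, min(n_bins - 1, index))  # clamp to the valid bin range
```

Because the bins are equal-width in cents rather than in Hz, any two pitches an octave apart are always the same number of bins apart, which matches how pitch perception scales.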
-
Patent number: 11830481
Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.
Type: Grant
Filed: November 30, 2021
Date of Patent: November 28, 2023
Assignee: Adobe Inc.
Inventors: Maxwell Morrison, Zeyu Jin, Nicholas Bryan, Juan Pablo Caceres Chomali, Lucas Rencker
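Applying predicted phoneme durations and pitch values to an edit region amounts to computing, per phoneme, a time-stretch ratio and a pitch shift. This sketch is an illustrative reduction of that step; the tuple representation and the use of a single pitch value per phoneme are assumptions, not the patented method.

```python
import math

def prosody_corrections(edit_phones, predicted):
    """Per-phoneme corrections needed to impose predicted prosody on an edit region.

    edit_phones / predicted: lists of (duration_seconds, pitch_hz) tuples.
    Returns (time_stretch_ratio, pitch_shift_cents) per phoneme.
    """
    out = []
    for (dur, f0), (p_dur, p_f0) in zip(edit_phones, predicted):
        stretch = p_dur / dur                       # >1 means lengthen the phoneme
        shift_cents = 1200.0 * math.log2(p_f0 / f0)  # +1200 cents = one octave up
        out.append((stretch, shift_cents))
    return out
```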
-
Publication number: 20230343312
Abstract: In implementations of music enhancement systems, a computing device implements an enhancement system to receive input data describing a recorded acoustic waveform of a musical instrument. The recorded acoustic waveform is represented as an input mel spectrogram. The enhancement system generates an enhanced mel spectrogram by processing the input mel spectrogram using a first machine learning model trained on a first type of training data to generate enhanced mel spectrograms based on input mel spectrograms. An acoustic waveform of the musical instrument is generated by processing the enhanced mel spectrogram using a second machine learning model trained on a second type of training data to generate acoustic waveforms based on mel spectrograms. The acoustic waveform of the musical instrument does not include an acoustic artifact that is included in the recorded waveform of the musical instrument.
Type: Application
Filed: April 21, 2022
Publication date: October 26, 2023
Applicant: Adobe Inc.
Inventors: Nikhil Kandpal, Oriol Nieto-Caballero, Zeyu Jin
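The mel-spectrogram representation in this abstract rests on the mel frequency scale. As a minimal sketch, the center frequencies of a mel-spaced filterbank can be computed with the common HTK formula below; the publication does not necessarily use this exact variant or these parameters.

```python
import math

def hz_to_mel(f):
    """HTK mel scale: perceptually motivated warping of frequency."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels filters, equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]
```

Equal spacing in mel means the filters are dense at low frequencies and sparse at high frequencies, which is why mel spectrograms are a compact input for audio enhancement models.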
-
Publication number: 20230197093
Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
Type: Application
Filed: December 21, 2021
Publication date: June 22, 2023
Inventors: Maxwell Morrison, Juan Pablo Caceres Chomali, Zeyu Jin, Nicholas Bryan, Bryan A. Pardo
-
Publication number: 20230169961
Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.
Type: Application
Filed: November 30, 2021
Publication date: June 1, 2023
Inventors: Maxwell Morrison, Zeyu Jin, Nicholas Bryan, Juan Pablo Caceres Chomali, Lucas Rencker
-
Patent number: 11636342
Abstract: In implementations of searching for music, a music search system can receive a music search request that includes a music file including music content. The music search system can also receive a selected musical attribute from a plurality of musical attributes. The music search system includes a music search application that can generate musical features of the music content, where a respective one or more of the musical features correspond to a respective one of the musical attributes. The music search application can then compare the musical features that correspond to the selected musical attribute to audio features of audio files, and determine similar audio files to the music file based on the comparison of the musical features to the audio features of the audio files.
Type: Grant
Filed: October 3, 2022
Date of Patent: April 25, 2023
Assignee: Adobe Inc.
Inventors: Jongpil Lee, Nicholas J. Bryan, Justin J. Salamon, Zeyu Jin
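The final comparison step, ranking stored audio files by how closely their features match the query's features for the selected attribute, can be sketched with cosine similarity. The abstract does not name a specific distance measure, so the similarity function here is an illustrative choice.

```python
import math

def rank_by_similarity(query_features, audio_features):
    """Rank audio files by cosine similarity of their features to the query.

    query_features: feature vector for the selected musical attribute.
    audio_features: dict mapping file id -> feature vector.
    Returns file ids, most similar first.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    return sorted(audio_features,
                  key=lambda fid: cosine(query_features, audio_features[fid]),
                  reverse=True)
```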
-
Publication number: 20230097356
Abstract: In implementations of searching for music, a music search system can receive a music search request that includes a music file including music content. The music search system can also receive a selected musical attribute from a plurality of musical attributes. The music search system includes a music search application that can generate musical features of the music content, where a respective one or more of the musical features correspond to a respective one of the musical attributes. The music search application can then compare the musical features that correspond to the selected musical attribute to audio features of audio files, and determine similar audio files to the music file based on the comparison of the musical features to the audio features of the audio files.
Type: Application
Filed: October 3, 2022
Publication date: March 30, 2023
Applicant: Adobe Inc.
Inventors: Jongpil Lee, Nicholas J. Bryan, Justin J. Salamon, Zeyu Jin
-
Patent number: 11514925
Abstract: Operations of a method include receiving a request to enhance a new source audio. Responsive to the request, the new source audio is input into a prediction model that was previously trained. Training the prediction model includes providing a generative adversarial network including the prediction model and a discriminator. Training data is obtained including tuples of source audios and target audios, each tuple including a source audio and a corresponding target audio. During training, the prediction model generates predicted audios based on the source audios. Training further includes applying a loss function to the predicted audios and the target audios, where the loss function incorporates a combination of a spectrogram loss and an adversarial loss. The prediction model is updated to optimize that loss function. After training, based on the new source audio, the prediction model generates a new predicted audio as an enhanced version of the new source audio.
Type: Grant
Filed: April 30, 2020
Date of Patent: November 29, 2022
Assignees: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Jiaqi Su, Adam Finkelstein
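The combined loss can be sketched as a weighted sum of the two terms. The L1 spectrogram distance and least-squares adversarial term below are common choices in audio GANs, assumed here because the abstract does not specify the exact forms.

```python
def combined_loss(pred_spec, target_spec, disc_score, adv_weight=1.0):
    """Combine an L1 spectrogram loss with a least-squares adversarial term.

    pred_spec / target_spec: flattened magnitude spectrograms (lists of floats).
    disc_score: discriminator output on the predicted audio, near 1.0 for "real".
    """
    # Spectrogram loss: mean absolute difference between predicted and target.
    spec_loss = sum(abs(p - t) for p, t in zip(pred_spec, target_spec)) / len(pred_spec)
    # Adversarial loss: the generator wants the discriminator to output "real" (1.0).
    adv_loss = (disc_score - 1.0) ** 2
    return spec_loss + adv_weight * adv_loss
```

The spectrogram term anchors the output to the target's content while the adversarial term pushes it toward the distribution of natural-sounding audio, which is the usual motivation for combining them.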
-
Patent number: 11461649
Abstract: In implementations of searching for music, a music search system can receive a music search request that includes a music file including music content. The music search system can also receive a selected musical attribute from a plurality of musical attributes. The music search system includes a music search application that can generate musical features of the music content, where a respective one or more of the musical features correspond to a respective one of the musical attributes. The music search application can then compare the musical features that correspond to the selected musical attribute to audio features of audio files, and determine similar audio files to the music file based on the comparison of the musical features to the audio features of the audio files.
Type: Grant
Filed: March 19, 2020
Date of Patent: October 4, 2022
Assignee: Adobe Inc.
Inventors: Jongpil Lee, Nicholas J. Bryan, Justin J. Salamon, Zeyu Jin
-
Patent number: 11170793
Abstract: Embodiments provide systems, methods, and computer storage media for secure audio watermarking and audio authenticity verification. An audio watermark detector may include a neural network trained to detect a particular audio watermark and embedding technique, which may indicate source software used in a workflow that generated an audio file under test. For example, the watermark may indicate an audio file was generated using voice manipulation software, so detecting the watermark can indicate manipulated audio such as deepfake audio and other attacked audio signals. In some embodiments, the audio watermark detector may be trained as part of a generative adversarial network in order to make the underlying audio watermark more robust to neural network-based attacks. Generally, the audio watermark detector may evaluate time domain samples from chunks of an audio clip under test to detect the presence of the audio watermark and generate a classification for the audio clip.
Type: Grant
Filed: February 13, 2020
Date of Patent: November 9, 2021
Assignee: Adobe Inc.
Inventors: Zeyu Jin, Oona Shigeno Risse-Adams
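The chunk-wise evaluation in the last sentence, scoring chunks of time-domain samples and aggregating them into a clip-level classification, can be sketched as follows. The chunk size, mean aggregation, and threshold are illustrative assumptions; `detect_chunk` stands in for the trained neural detector.

```python
def classify_clip(samples, detect_chunk, chunk_size=16384, threshold=0.5):
    """Chunk-wise watermark detection: score each chunk, aggregate to a clip label.

    samples: time-domain audio samples of the clip under test.
    detect_chunk: scores one chunk in [0, 1], higher = watermark more likely.
    Returns True if the aggregated score indicates the watermark is present.
    """
    chunks = [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]
    chunks = [c for c in chunks if len(c) == chunk_size]  # drop the short tail
    if not chunks:
        return False  # clip too short to evaluate
    mean_score = sum(detect_chunk(c) for c in chunks) / len(chunks)
    return mean_score >= threshold
```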
-
Publication number: 20210343305
Abstract: Operations of a method include receiving a request to enhance a new source audio. Responsive to the request, the new source audio is input into a prediction model that was previously trained. Training the prediction model includes providing a generative adversarial network including the prediction model and a discriminator. Training data is obtained including tuples of source audios and target audios, each tuple including a source audio and a corresponding target audio. During training, the prediction model generates predicted audios based on the source audios. Training further includes applying a loss function to the predicted audios and the target audios, where the loss function incorporates a combination of a spectrogram loss and an adversarial loss. The prediction model is updated to optimize that loss function. After training, based on the new source audio, the prediction model generates a new predicted audio as an enhanced version of the new source audio.
Type: Application
Filed: April 30, 2020
Publication date: November 4, 2021
Inventors: Zeyu Jin, Jiaqi Su, Adam Finkelstein
-
Publication number: 20210294840
Abstract: In implementations of searching for music, a music search system can receive a music search request that includes a music file including music content. The music search system can also receive a selected musical attribute from a plurality of musical attributes. The music search system includes a music search application that can generate musical features of the music content, where a respective one or more of the musical features correspond to a respective one of the musical attributes. The music search application can then compare the musical features that correspond to the selected musical attribute to audio features of audio files, and determine similar audio files to the music file based on the comparison of the musical features to the audio features of the audio files.
Type: Application
Filed: March 19, 2020
Publication date: September 23, 2021
Applicant: Adobe Inc.
Inventors: Jongpil Lee, Nicholas J. Bryan, Justin J. Salamon, Zeyu Jin
-
Publication number: 20210256978
Abstract: Embodiments provide systems, methods, and computer storage media for secure audio watermarking and audio authenticity verification. An audio watermark detector may include a neural network trained to detect a particular audio watermark and embedding technique, which may indicate source software used in a workflow that generated an audio file under test. For example, the watermark may indicate an audio file was generated using voice manipulation software, so detecting the watermark can indicate manipulated audio such as deepfake audio and other attacked audio signals. In some embodiments, the audio watermark detector may be trained as part of a generative adversarial network in order to make the underlying audio watermark more robust to neural network-based attacks. Generally, the audio watermark detector may evaluate time domain samples from chunks of an audio clip under test to detect the presence of the audio watermark and generate a classification for the audio clip.
Type: Application
Filed: February 13, 2020
Publication date: August 19, 2021
Inventors: Zeyu Jin, Oona Shigeno Risse-Adams
-
Patent number: 10770063
Abstract: Techniques for a recursive deep-learning approach for performing speech synthesis using a repeatable structure that splits an input tensor into a left half and right half similar to the operation of the Fast Fourier Transform, performs a 1-D convolution on each respective half, performs a summation and then applies a post-processing function. The repeatable structure may be utilized in a series configuration to operate as a vocoder or perform other speech processing functions.
Type: Grant
Filed: August 22, 2018
Date of Patent: September 8, 2020
Assignees: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Gautham J. Mysore, Jingwan Lu, Adam Finkelstein
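The repeatable split-convolve-sum-postprocess structure can be sketched in a few lines. The valid-mode convolution, hand-supplied kernels, and ReLU post-processing below are stand-ins for whatever the trained network actually uses; only the block's shape follows the abstract.

```python
def conv1d(x, kernel):
    """Valid-mode 1-D convolution (cross-correlation) of list x with a kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]

def fft_like_block(x, left_kernel, right_kernel, post=lambda v: max(v, 0.0)):
    """One repeatable block: split the input in half (as in an FFT butterfly),
    convolve each half with its own kernel, sum the results, and apply a
    post-processing function (ReLU here as a stand-in)."""
    half = len(x) // 2
    left, right = x[:half], x[half:]
    summed = [a + b for a, b in
              zip(conv1d(left, left_kernel), conv1d(right, right_kernel))]
    return [post(v) for v in summed]
```

Stacking such blocks in series halves the working length at each stage, mirroring the FFT's divide-and-combine recursion, which is the structural analogy the abstract draws.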
-
Publication number: 20190318726
Abstract: Techniques for a recursive deep-learning approach for performing speech synthesis using a repeatable structure that splits an input tensor into a left half and right half similar to the operation of the Fast Fourier Transform, performs a 1-D convolution on each respective half, performs a summation and then applies a post-processing function. The repeatable structure may be utilized in a series configuration to operate as a vocoder or perform other speech processing functions.
Type: Application
Filed: August 22, 2018
Publication date: October 17, 2019
Applicants: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Gautham J. Mysore, Jingwan Lu, Adam Finkelstein
-
Patent number: 10347238
Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
Type: Grant
Filed: October 27, 2017
Date of Patent: July 9, 2019
Assignees: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein
-
Publication number: 20190130894
Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
Type: Application
Filed: October 27, 2017
Publication date: May 2, 2019
Applicants: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein