Patents by Inventor Zeyu Jin
Zeyu Jin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11915714
Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
Type: Grant
Filed: December 21, 2021
Date of Patent: February 27, 2024
Assignees: Adobe Inc., Northwestern University
Inventors: Maxwell Morrison, Juan Pablo Caceres Chomali, Zeyu Jin, Nicholas Bryan, Bryan A. Pardo
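The equal-width-in-cents pitch binning described in the abstract can be illustrated with a short sketch. The bin count, pitch range, and reference frequency below are hypothetical; the abstract does not give the patented system's actual parameters.

```python
import math

def hz_to_cents(f_hz, f_ref):
    """Convert a pitch in Hz to cents above a reference frequency."""
    return 1200.0 * math.log2(f_hz / f_ref)

def quantize_pitch(f_hz, n_bins=256, f_min=50.0, f_max=1000.0):
    """Assign a pitch value to one of n_bins bins of equal width in cents."""
    cents = hz_to_cents(f_hz, f_min)
    total_cents = 1200.0 * math.log2(f_max / f_min)
    bin_width = total_cents / n_bins
    index = int(cents // bin_width)
    return max(0, min(n_bins - 1, index))  # clamp to the valid bin range
```

Because the bins are equal-width in cents rather than in Hz, any two pitches an octave apart are always the same number of bins apart, which matches how pitch perception scales.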
-
Patent number: 11830481
Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.
Type: Grant
Filed: November 30, 2021
Date of Patent: November 28, 2023
Assignee: Adobe Inc.
Inventors: Maxwell Morrison, Zeyu Jin, Nicholas Bryan, Juan Pablo Caceres Chomali, Lucas Rencker
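Applying predicted phoneme durations and pitch values to an edit region amounts to computing, per phoneme, a time-stretch ratio and a pitch shift. This sketch is an illustrative reduction of that step; the tuple representation and the use of a single pitch value per phoneme are assumptions, not the patented method.

```python
import math

def prosody_corrections(edit_phones, predicted):
    """Per-phoneme corrections needed to impose predicted prosody on an edit region.

    edit_phones / predicted: lists of (duration_seconds, pitch_hz) tuples.
    Returns (time_stretch_ratio, pitch_shift_cents) per phoneme.
    """
    out = []
    for (dur, f0), (p_dur, p_f0) in zip(edit_phones, predicted):
        stretch = p_dur / dur                       # >1 means lengthen the phoneme
        shift_cents = 1200.0 * math.log2(p_f0 / f0)  # +1200 cents = one octave up
        out.append((stretch, shift_cents))
    return out
```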
-
Publication number: 20230343312
Abstract: In implementations of music enhancement systems, a computing device implements an enhancement system to receive input data describing a recorded acoustic waveform of a musical instrument. The recorded acoustic waveform is represented as an input mel spectrogram. The enhancement system generates an enhanced mel spectrogram by processing the input mel spectrogram using a first machine learning model trained on a first type of training data to generate enhanced mel spectrograms based on input mel spectrograms. An acoustic waveform of the musical instrument is generated by processing the enhanced mel spectrogram using a second machine learning model trained on a second type of training data to generate acoustic waveforms based on mel spectrograms. The acoustic waveform of the musical instrument does not include an acoustic artifact that is included in the recorded waveform of the musical instrument.
Type: Application
Filed: April 21, 2022
Publication date: October 26, 2023
Applicant: Adobe Inc.
Inventors: Nikhil Kandpal, Oriol Nieto-Caballero, Zeyu Jin
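The mel-spectrogram representation in this abstract rests on the mel frequency scale. As a minimal sketch, the center frequencies of a mel-spaced filterbank can be computed with the common HTK formula below; the publication does not necessarily use this exact variant or these parameters.

```python
import math

def hz_to_mel(f):
    """HTK mel scale: perceptually motivated warping of frequency."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels filters, equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]
```

Equal spacing in mel means the filters are dense at low frequencies and sparse at high frequencies, which is why mel spectrograms are a compact input for audio enhancement models.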
-
Publication number: 20230197093
Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
Type: Application
Filed: December 21, 2021
Publication date: June 22, 2023
Inventors: Maxwell Morrison, Juan Pablo Caceres Chomali, Zeyu Jin, Nicholas Bryan, Bryan A. Pardo
-
Publication number: 20230169961
Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.
Type: Application
Filed: November 30, 2021
Publication date: June 1, 2023
Inventors: Maxwell Morrison, Zeyu Jin, Nicholas Bryan, Juan Pablo Caceres Chomali, Lucas Rencker
-
Patent number: 11636342
Abstract: In implementations of searching for music, a music search system can receive a music search request that includes a music file including music content. The music search system can also receive a selected musical attribute from a plurality of musical attributes. The music search system includes a music search application that can generate musical features of the music content, where a respective one or more of the musical features correspond to a respective one of the musical attributes. The music search application can then compare the musical features that correspond to the selected musical attribute to audio features of audio files, and determine similar audio files to the music file based on the comparison of the musical features to the audio features of the audio files.
Type: Grant
Filed: October 3, 2022
Date of Patent: April 25, 2023
Assignee: Adobe Inc.
Inventors: Jongpil Lee, Nicholas J. Bryan, Justin J. Salamon, Zeyu Jin
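The final comparison step, ranking stored audio files by how closely their features match the query's features for the selected attribute, can be sketched with cosine similarity. The abstract does not name a specific distance measure, so the similarity function here is an illustrative choice.

```python
import math

def rank_by_similarity(query_features, audio_features):
    """Rank audio files by cosine similarity of their features to the query.

    query_features: feature vector for the selected musical attribute.
    audio_features: dict mapping file id -> feature vector.
    Returns file ids, most similar first.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    return sorted(audio_features,
                  key=lambda fid: cosine(query_features, audio_features[fid]),
                  reverse=True)
```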
-
Publication number: 20230097356
Abstract: In implementations of searching for music, a music search system can receive a music search request that includes a music file including music content. The music search system can also receive a selected musical attribute from a plurality of musical attributes. The music search system includes a music search application that can generate musical features of the music content, where a respective one or more of the musical features correspond to a respective one of the musical attributes. The music search application can then compare the musical features that correspond to the selected musical attribute to audio features of audio files, and determine similar audio files to the music file based on the comparison of the musical features to the audio features of the audio files.
Type: Application
Filed: October 3, 2022
Publication date: March 30, 2023
Applicant: Adobe Inc.
Inventors: Jongpil Lee, Nicholas J. Bryan, Justin J. Salamon, Zeyu Jin
-
Patent number: 11514925
Abstract: Operations of a method include receiving a request to enhance a new source audio. Responsive to the request, the new source audio is input into a prediction model that was previously trained. Training the prediction model includes providing a generative adversarial network including the prediction model and a discriminator. Training data is obtained including tuples of source audios and target audios, each tuple including a source audio and a corresponding target audio. During training, the prediction model generates predicted audios based on the source audios. Training further includes applying a loss function to the predicted audios and the target audios, where the loss function incorporates a combination of a spectrogram loss and an adversarial loss. The prediction model is updated to optimize that loss function. After training, based on the new source audio, the prediction model generates a new predicted audio as an enhanced version of the new source audio.
Type: Grant
Filed: April 30, 2020
Date of Patent: November 29, 2022
Assignees: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Jiaqi Su, Adam Finkelstein
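The combined loss can be sketched as a weighted sum of the two terms. The L1 spectrogram distance and least-squares adversarial term below are common choices in audio GANs, assumed here because the abstract does not specify the exact forms.

```python
def combined_loss(pred_spec, target_spec, disc_score, adv_weight=1.0):
    """Combine an L1 spectrogram loss with a least-squares adversarial term.

    pred_spec / target_spec: flattened magnitude spectrograms (lists of floats).
    disc_score: discriminator output on the predicted audio, near 1.0 for "real".
    """
    # Spectrogram loss: mean absolute difference between predicted and target.
    spec_loss = sum(abs(p - t) for p, t in zip(pred_spec, target_spec)) / len(pred_spec)
    # Adversarial loss: the generator wants the discriminator to output "real" (1.0).
    adv_loss = (disc_score - 1.0) ** 2
    return spec_loss + adv_weight * adv_loss
```

The spectrogram term anchors the output to the target's content while the adversarial term pushes it toward the distribution of natural-sounding audio, which is the usual motivation for combining them.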
-
Patent number: 11461649
Abstract: In implementations of searching for music, a music search system can receive a music search request that includes a music file including music content. The music search system can also receive a selected musical attribute from a plurality of musical attributes. The music search system includes a music search application that can generate musical features of the music content, where a respective one or more of the musical features correspond to a respective one of the musical attributes. The music search application can then compare the musical features that correspond to the selected musical attribute to audio features of audio files, and determine similar audio files to the music file based on the comparison of the musical features to the audio features of the audio files.
Type: Grant
Filed: March 19, 2020
Date of Patent: October 4, 2022
Assignee: Adobe Inc.
Inventors: Jongpil Lee, Nicholas J. Bryan, Justin J. Salamon, Zeyu Jin
-
Patent number: 11170793
Abstract: Embodiments provide systems, methods, and computer storage media for secure audio watermarking and audio authenticity verification. An audio watermark detector may include a neural network trained to detect a particular audio watermark and embedding technique, which may indicate source software used in a workflow that generated an audio file under test. For example, the watermark may indicate an audio file was generated using voice manipulation software, so detecting the watermark can indicate manipulated audio such as deepfake audio and other attacked audio signals. In some embodiments, the audio watermark detector may be trained as part of a generative adversarial network in order to make the underlying audio watermark more robust to neural network-based attacks. Generally, the audio watermark detector may evaluate time domain samples from chunks of an audio clip under test to detect the presence of the audio watermark and generate a classification for the audio clip.
Type: Grant
Filed: February 13, 2020
Date of Patent: November 9, 2021
Assignee: Adobe Inc.
Inventors: Zeyu Jin, Oona Shigeno Risse-Adams
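The chunk-wise evaluation in the last sentence, scoring chunks of time-domain samples and aggregating them into a clip-level classification, can be sketched as follows. The chunk size, mean aggregation, and threshold are illustrative assumptions; `detect_chunk` stands in for the trained neural detector.

```python
def classify_clip(samples, detect_chunk, chunk_size=16384, threshold=0.5):
    """Chunk-wise watermark detection: score each chunk, aggregate to a clip label.

    samples: time-domain audio samples of the clip under test.
    detect_chunk: scores one chunk in [0, 1], higher = watermark more likely.
    Returns True if the aggregated score indicates the watermark is present.
    """
    chunks = [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]
    chunks = [c for c in chunks if len(c) == chunk_size]  # drop the short tail
    if not chunks:
        return False  # clip too short to evaluate
    mean_score = sum(detect_chunk(c) for c in chunks) / len(chunks)
    return mean_score >= threshold
```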
-
Publication number: 20210343305
Abstract: Operations of a method include receiving a request to enhance a new source audio. Responsive to the request, the new source audio is input into a prediction model that was previously trained. Training the prediction model includes providing a generative adversarial network including the prediction model and a discriminator. Training data is obtained including tuples of source audios and target audios, each tuple including a source audio and a corresponding target audio. During training, the prediction model generates predicted audios based on the source audios. Training further includes applying a loss function to the predicted audios and the target audios, where the loss function incorporates a combination of a spectrogram loss and an adversarial loss. The prediction model is updated to optimize that loss function. After training, based on the new source audio, the prediction model generates a new predicted audio as an enhanced version of the new source audio.
Type: Application
Filed: April 30, 2020
Publication date: November 4, 2021
Inventors: Zeyu Jin, Jiaqi Su, Adam Finkelstein
-
Publication number: 20210294840
Abstract: In implementations of searching for music, a music search system can receive a music search request that includes a music file including music content. The music search system can also receive a selected musical attribute from a plurality of musical attributes. The music search system includes a music search application that can generate musical features of the music content, where a respective one or more of the musical features correspond to a respective one of the musical attributes. The music search application can then compare the musical features that correspond to the selected musical attribute to audio features of audio files, and determine similar audio files to the music file based on the comparison of the musical features to the audio features of the audio files.
Type: Application
Filed: March 19, 2020
Publication date: September 23, 2021
Applicant: Adobe Inc.
Inventors: Jongpil Lee, Nicholas J. Bryan, Justin J. Salamon, Zeyu Jin
-
Publication number: 20210256978
Abstract: Embodiments provide systems, methods, and computer storage media for secure audio watermarking and audio authenticity verification. An audio watermark detector may include a neural network trained to detect a particular audio watermark and embedding technique, which may indicate source software used in a workflow that generated an audio file under test. For example, the watermark may indicate an audio file was generated using voice manipulation software, so detecting the watermark can indicate manipulated audio such as deepfake audio and other attacked audio signals. In some embodiments, the audio watermark detector may be trained as part of a generative adversarial network in order to make the underlying audio watermark more robust to neural network-based attacks. Generally, the audio watermark detector may evaluate time domain samples from chunks of an audio clip under test to detect the presence of the audio watermark and generate a classification for the audio clip.
Type: Application
Filed: February 13, 2020
Publication date: August 19, 2021
Inventors: Zeyu Jin, Oona Shigeno Risse-Adams
-
Patent number: 10770063
Abstract: Techniques for a recursive deep-learning approach for performing speech synthesis using a repeatable structure that splits an input tensor into a left half and right half similar to the operation of the Fast Fourier Transform, performs a 1-D convolution on each respective half, performs a summation and then applies a post-processing function. The repeatable structure may be utilized in a series configuration to operate as a vocoder or perform other speech processing functions.
Type: Grant
Filed: August 22, 2018
Date of Patent: September 8, 2020
Assignees: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Gautham J. Mysore, Jingwan Lu, Adam Finkelstein
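The repeatable split-convolve-sum-postprocess structure can be sketched in a few lines. The valid-mode convolution, hand-supplied kernels, and ReLU post-processing below are stand-ins for whatever the trained network actually uses; only the block's shape follows the abstract.

```python
def conv1d(x, kernel):
    """Valid-mode 1-D convolution (cross-correlation) of list x with a kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]

def fft_like_block(x, left_kernel, right_kernel, post=lambda v: max(v, 0.0)):
    """One repeatable block: split the input in half (as in an FFT butterfly),
    convolve each half with its own kernel, sum the results, and apply a
    post-processing function (ReLU here as a stand-in)."""
    half = len(x) // 2
    left, right = x[:half], x[half:]
    summed = [a + b for a, b in
              zip(conv1d(left, left_kernel), conv1d(right, right_kernel))]
    return [post(v) for v in summed]
```

Stacking such blocks in series halves the working length at each stage, mirroring the FFT's divide-and-combine recursion, which is the structural analogy the abstract draws.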
-
Publication number: 20190318726
Abstract: Techniques for a recursive deep-learning approach for performing speech synthesis using a repeatable structure that splits an input tensor into a left half and right half similar to the operation of the Fast Fourier Transform, performs a 1-D convolution on each respective half, performs a summation and then applies a post-processing function. The repeatable structure may be utilized in a series configuration to operate as a vocoder or perform other speech processing functions.
Type: Application
Filed: August 22, 2018
Publication date: October 17, 2019
Applicants: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Gautham J. Mysore, Jingwan Lu, Adam Finkelstein
-
Patent number: 10347238
Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
Type: Grant
Filed: October 27, 2017
Date of Patent: July 9, 2019
Assignees: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein
-
Publication number: 20190130894
Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
Type: Application
Filed: October 27, 2017
Publication date: May 2, 2019
Applicants: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein