Patents by Inventor Fabian David Caba Heilbron

Fabian David Caba Heilbron has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240127820
    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for music-aware speaker diarization. In an example embodiment, one or more audio classifiers detect speech and music independently of each other, which facilitates detecting regions in an audio track that contain music but do not contain speech. These music-only regions are compared to the transcript, and any transcribed text and speaker labels that overlap in time with the music-only regions are removed from the transcript. In some embodiments, rather than having the transcript display the text from this detected music, a visual representation of the audio waveform is included in the corresponding regions of the transcript.
    Type: Application
    Filed: October 17, 2022
    Publication date: April 18, 2024
    Inventors: Justin Jonathan Salamon, Fabian David Caba Heilbron, Xue Bai, Aseem Omprakash Agarwala, Hijung Shin, Lubomira Assenova Dontcheva
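
A minimal sketch of the filtering step this abstract describes: given detected speech and music intervals plus a time-stamped transcript, compute music-only regions and drop transcript segments that overlap them. The `Segment` fields and the interval representation are illustrative assumptions, not the patent's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float
    speaker: str
    text: str

def subtract_intervals(music, speech):
    """Return the parts of each music interval not covered by any speech interval."""
    music_only = []
    for m_start, m_end in music:
        pieces = [(m_start, m_end)]
        for s_start, s_end in speech:
            next_pieces = []
            for p_start, p_end in pieces:
                # Keep the portions of (p_start, p_end) outside (s_start, s_end).
                if s_end <= p_start or s_start >= p_end:
                    next_pieces.append((p_start, p_end))
                else:
                    if s_start > p_start:
                        next_pieces.append((p_start, s_start))
                    if s_end < p_end:
                        next_pieces.append((s_end, p_end))
            pieces = next_pieces
        music_only.extend(pieces)
    return music_only

def overlaps(seg, regions):
    return any(seg.start < r_end and seg.end > r_start for r_start, r_end in regions)

def filter_transcript(transcript, music, speech):
    """Drop transcript segments that fall inside music-only regions."""
    music_only = subtract_intervals(music, speech)
    return [seg for seg in transcript if not overlaps(seg, music_only)]

# Hypothetical classifier outputs: (start, end) intervals in seconds.
music = [(10.0, 45.0)]
speech = [(0.0, 12.0), (40.0, 60.0)]
transcript = [
    Segment(2.0, 11.0, "A", "Welcome to the show."),
    Segment(20.0, 30.0, "B", "la la la"),   # lyrics mis-transcribed as speech
    Segment(46.0, 55.0, "A", "Back to our guest."),
]
print([s.text for s in filter_transcript(transcript, music, speech)])
# ['Welcome to the show.', 'Back to our guest.']
```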
  • Publication number: 20240127857
    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for face-aware speaker diarization. In an example embodiment, an audio-only speaker diarization technique is applied to generate an audio-only speaker diarization of a video, an audio-visual speaker diarization technique is applied to generate a face-aware speaker diarization of the video, and the audio-only speaker diarization is refined using the face-aware speaker diarization to generate a hybrid speaker diarization that links detected faces to detected voices. In some embodiments, to accommodate videos with small faces that appear pixelated, a cropped image of any given face is extracted from each frame of the video, and the size of the cropped image is used to select a corresponding active speaker detection model to predict an active speaker score for the face in the cropped image.
    Type: Application
    Filed: October 17, 2022
    Publication date: April 18, 2024
    Inventors: Fabian David Caba Heilbron, Xue Bai, Aseem Omprakash Agarwala, Haoran Cai, Lubomira Assenova Dontcheva
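
The abstract's size-based model routing, sketched under assumptions: the pixel threshold and the two stand-in models are hypothetical, since the abstract only says that the size of the cropped face image selects among active speaker detection models.

```python
import numpy as np

# Hypothetical size threshold in pixels; the abstract does not give a cutoff.
SMALL_FACE_THRESHOLD = 64

def score_fullres(face_crop):
    """Stand-in for an active speaker detection model trained on clear faces."""
    return 0.9  # placeholder score

def score_lowres(face_crop):
    """Stand-in for a model suited to small, pixelated faces."""
    return 0.7  # placeholder score

def active_speaker_score(face_crop):
    """Route the cropped face to a model chosen by its size, per the abstract."""
    height, width = face_crop.shape[:2]
    model = score_lowres if min(height, width) < SMALL_FACE_THRESHOLD else score_fullres
    return model(face_crop)

tiny_face = np.zeros((32, 28, 3), dtype=np.uint8)    # pixelated background face
big_face = np.zeros((128, 120, 3), dtype=np.uint8)   # clear foreground face
print(active_speaker_score(tiny_face), active_speaker_score(big_face))
```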
  • Patent number: 11854206
    Abstract: A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and, based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks are used to extract the features to be used for video segmentation, and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation capturing the entirety of the features is produced for each video frame in the sequence of video frames by aggregating the output features extracted by the multiple neural networks.
    Type: Grant
    Filed: May 3, 2022
    Date of Patent: December 26, 2023
    Assignee: Adobe Inc.
    Inventors: Federico Perazzi, Zhe Lin, Ping Hu, Oliver Wang, Fabian David Caba Heilbron
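
A minimal sketch of the temporally distributed idea (shared by the related filings below): each frame passes through only one of N lightweight sub-networks, and the per-frame representation aggregates the N most recent outputs. The sub-network design and the concatenation-based aggregation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TemporallyDistributedExtractor(nn.Module):
    """Round-robin frames over N shallow sub-networks; aggregate the last N outputs."""

    def __init__(self, num_subnets=4, in_ch=3, feat_ch=16):
        super().__init__()
        self.subnets = nn.ModuleList(
            [nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1)
             for _ in range(num_subnets)]
        )
        self.buffer = []  # most recent sub-network outputs

    def forward(self, frame, t):
        # Each frame runs through only one cheap sub-network...
        feat = self.subnets[t % len(self.subnets)](frame)
        self.buffer.append(feat)
        self.buffer = self.buffer[-len(self.subnets):]
        # ...but the per-frame representation aggregates all recent outputs.
        return torch.cat(self.buffer, dim=1)

extractor = TemporallyDistributedExtractor()
video = torch.randn(8, 3, 64, 64)  # 8 frames
for t, frame in enumerate(video):
    strong_feat = extractor(frame.unsqueeze(0), t)
print(strong_feat.shape)  # torch.Size([1, 64, 64, 64]) once the buffer is full
```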
  • Publication number: 20230325685
    Abstract: A model training system is described that obtains a training dataset including videos and text labels. The model training system generates a video-text classification model by causing a model having a dual image-text encoder architecture to predict which of the text labels describes each video in the training dataset. Predictions output by the model are compared to the training dataset to determine distillation and contrastive losses, which are used to adjust internal weights of the model during training. The internal weights of the model are then combined with internal weights of a trained image-text classification model to generate the video-text classification model. The video-text classification model is configured to generate a video or text output that classifies a video or text input.
    Type: Application
    Filed: April 12, 2022
    Publication date: October 12, 2023
    Applicant: Adobe Inc.
    Inventors: Fabian David Caba Heilbron, Santiago Castro Serra
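
A hedged sketch of the two training ingredients the abstract names: a contrastive loss plus a distillation term, followed by combining the fine-tuned weights with the original image-text weights. The CLIP-style loss forms and the per-parameter linear interpolation are assumptions; the abstract does not specify either.

```python
import torch
import torch.nn.functional as F

def training_loss(video_emb, text_emb, teacher_logits, temperature=0.07):
    """Contrastive loss plus a distillation term.

    The exact loss forms are illustrative assumptions; the abstract only
    names "distillation and contrastive losses".
    """
    logits = video_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    # Symmetric video-to-text / text-to-video contrastive term.
    contrastive = (F.cross_entropy(logits, targets)
                   + F.cross_entropy(logits.t(), targets)) / 2
    # KL divergence against a teacher's soft predictions.
    distill = F.kl_div(F.log_softmax(logits, dim=-1),
                       F.softmax(teacher_logits, dim=-1),
                       reduction="batchmean")
    return contrastive + distill

def merge_weights(video_text_state, image_text_state, alpha=0.5):
    """Per-parameter linear interpolation; how the two sets of internal
    weights are combined is not specified in the abstract."""
    return {name: alpha * video_text_state[name] + (1 - alpha) * image_text_state[name]
            for name in video_text_state}

# Toy usage with random embeddings and matching toy state dicts.
v = F.normalize(torch.randn(4, 8), dim=-1)
t = F.normalize(torch.randn(4, 8), dim=-1)
print(training_loss(v, t, teacher_logits=torch.randn(4, 4)).item())
print(merge_weights({"w": torch.ones(2)}, {"w": torch.zeros(2)})["w"])  # tensor([0.5, 0.5])
```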
  • Publication number: 20230276084
    Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that generate a temporally remapped video that satisfies a desired target duration while preserving natural video dynamics. In certain instances, the disclosed systems utilize a playback speed prediction machine-learning model that recognizes and localizes temporally varying changes in video playback speed to re-time a digital video with varying frame-change speeds. For instance, to re-time the digital video, the disclosed systems utilize the playback speed prediction machine-learning model to infer the slowness of individual video frames. Subsequently, in certain embodiments, the disclosed systems determine, from frames of a digital video, a temporal frame sub-sampling that is consistent with the slowness predictions and fits within a target video duration.
    Type: Application
    Filed: March 16, 2023
    Publication date: August 31, 2023
    Inventors: Simon Jenni, Markus Woodson, Fabian David Caba Heilbron
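
One way to read the re-timing step (shared with the granted patent below), sketched under assumptions: treat per-frame slowness predictions as a sampling density and pick frames at equal increments of cumulative slowness, so slower content keeps more frames within the target length. The patent's actual optimization over temporal sub-samplings is more involved; this stand-in is only illustrative.

```python
import numpy as np

def retime(num_frames, slowness, target_len):
    """Pick target_len frame indices so sampling density follows slowness.

    Frames with high slowness scores are kept densely, fast ones sparsely.
    Equal steps in cumulative slowness is an illustrative stand-in for the
    patent's optimization over temporal frame sub-samplings.
    """
    cum = np.cumsum(slowness)
    cum = cum / cum[-1]                        # normalize to [0, 1]
    quantiles = np.linspace(0, 1, target_len)  # equal increments of slowness
    return np.searchsorted(cum, quantiles, side="left").clip(0, num_frames - 1)

# Hypothetical slowness predictions for a 10-frame clip: the middle is
# "slow" (detail worth keeping), the ends are "fast".
slowness = np.array([0.2, 0.2, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2])
print(retime(10, slowness, target_len=5))
# [0 3 4 6 9] -- denser sampling over the slow middle than at the boundaries.
```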
  • Patent number: 11610606
    Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that generate a temporally remapped video that satisfies a desired target duration while preserving natural video dynamics. In certain instances, the disclosed systems utilize a playback speed prediction machine-learning model that recognizes and localizes temporally varying changes in video playback speed to re-time a digital video with varying frame-change speeds. For instance, to re-time the digital video, the disclosed systems utilize the playback speed prediction machine-learning model to infer the slowness of individual video frames. Subsequently, in certain embodiments, the disclosed systems determine, from frames of a digital video, a temporal frame sub-sampling that is consistent with the slowness predictions and fits within a target video duration.
    Type: Grant
    Filed: February 25, 2022
    Date of Patent: March 21, 2023
    Assignee: Adobe Inc.
    Inventors: Simon Jenni, Markus Woodson, Fabian David Caba Heilbron
  • Publication number: 20220270370
    Abstract: A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and, based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks are used to extract the features to be used for video segmentation, and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation capturing the entirety of the features is produced for each video frame in the sequence of video frames by aggregating the output features extracted by the multiple neural networks.
    Type: Application
    Filed: May 3, 2022
    Publication date: August 25, 2022
    Inventors: Federico Perazzi, Zhe Lin, Ping Hu, Oliver Wang, Fabian David Caba Heilbron
  • Patent number: 11354906
    Abstract: A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and, based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks are used to extract the features to be used for video segmentation, and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation capturing the entirety of the features is produced for each video frame in the sequence of video frames by aggregating the output features extracted by the multiple neural networks.
    Type: Grant
    Filed: April 13, 2020
    Date of Patent: June 7, 2022
    Assignee: Adobe Inc.
    Inventors: Federico Perazzi, Zhe Lin, Ping Hu, Oliver Wang, Fabian David Caba Heilbron
  • Publication number: 20210319232
    Abstract: A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and, based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks are used to extract the features to be used for video segmentation, and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation capturing the entirety of the features is produced for each video frame in the sequence of video frames by aggregating the output features extracted by the multiple neural networks.
    Type: Application
    Filed: April 13, 2020
    Publication date: October 14, 2021
    Inventors: Federico Perazzi, Zhe Lin, Ping Hu, Oliver Wang, Fabian David Caba Heilbron
  • Patent number: 10726313
    Abstract: Various embodiments describe active learning methods for training temporal action localization models used to localize actions in untrimmed videos. A trainable active learning selection function is used to select unlabeled samples that can improve the temporal action localization model the most. The selected unlabeled samples are then annotated and used to retrain the temporal action localization model. In some embodiments, the trainable active learning selection function includes a trainable performance prediction model that maps a video sample and a temporal action localization model to a predicted performance improvement for the temporal action localization model.
    Type: Grant
    Filed: April 19, 2018
    Date of Patent: July 28, 2020
    Assignee: Adobe Inc.
    Inventors: Joon-Young Lee, Hailin Jin, Fabian David Caba Heilbron
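
A minimal sketch of the active learning loop described above (and in the related publication below): score unlabeled videos with a performance prediction function, annotate the highest-scoring ones, and retrain. The scoring and training stubs are placeholders for the patent's trainable selection function and temporal action localization model.

```python
import random

def predicted_improvement(video, model):
    """Stand-in for the trainable performance prediction model: maps a
    (video, current model) pair to an estimated gain from labeling it."""
    return random.random()  # placeholder; a learned regressor in the patent

def train(labeled):
    """Stand-in for (re)training a temporal action localization model."""
    return {"num_labels": len(labeled)}

def active_learning_loop(unlabeled, budget_per_round=2, rounds=3):
    labeled, model = [], train([])
    for _ in range(rounds):
        # Score every unlabeled video by how much labeling it should help.
        scored = sorted(unlabeled,
                        key=lambda v: predicted_improvement(v, model),
                        reverse=True)
        batch, unlabeled = scored[:budget_per_round], scored[budget_per_round:]
        # Human annotation step, then retrain on the enlarged labeled set.
        labeled += [(v, f"annotation-for-{v}") for v in batch]
        model = train(labeled)
    return model

print(active_learning_loop([f"video{i}" for i in range(10)]))
# {'num_labels': 6}
```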
  • Publication number: 20190325275
    Abstract: Various embodiments describe active learning methods for training temporal action localization models used to localize actions in untrimmed videos. A trainable active learning selection function is used to select unlabeled samples that can improve the temporal action localization model the most. The selected unlabeled samples are then annotated and used to retrain the temporal action localization model. In some embodiments, the trainable active learning selection function includes a trainable performance prediction model that maps a video sample and a temporal action localization model to a predicted performance improvement for the temporal action localization model.
    Type: Application
    Filed: April 19, 2018
    Publication date: October 24, 2019
    Inventors: Joon-Young Lee, Hailin Jin, Fabian David Caba Heilbron