Patents by Inventor Bryan Russell

Bryan Russell has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11949964
    Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize in generating tags for an input digital video. For instance, the disclosed systems can extract a set of frames from the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital video by aggregating one or more tags corresponding to the identified similar tagged feature vectors.
    Type: Grant
    Filed: September 9, 2021
    Date of Patent: April 2, 2024
    Assignee: Adobe Inc.
    Inventors: Bryan Russell, Ruppesh Nalwaya, Markus Woodson, Joon-Young Lee, Hailin Jin
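
The tag-transfer step in the abstract above amounts to a nearest-neighbor lookup followed by tag pooling. Below is a minimal sketch of that step, assuming cosine similarity and majority voting; all names, shapes, and values are illustrative rather than taken from the patent.

```python
# Hypothetical sketch: given an aggregated feature vector for an input
# video, find the most similar tagged feature vectors and pool their tags.
import numpy as np
from collections import Counter

def tag_video(query_vec, bank_vecs, bank_tags, k=5, n_tags=3):
    """bank_vecs: (N, D) tagged feature vectors; bank_tags: N tag lists."""
    q = query_vec / np.linalg.norm(query_vec)
    b = bank_vecs / np.linalg.norm(bank_vecs, axis=1, keepdims=True)
    sims = b @ q                         # cosine similarity to each tagged vector
    top = np.argsort(-sims)[:k]          # k most similar tagged feature vectors
    votes = Counter(t for i in top for t in bank_tags[i])
    return [tag for tag, _ in votes.most_common(n_tags)]

# Example: a 512-d aggregated feature vector queried against a toy bank.
rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 512))
tags = [["running", "outdoor"] if i % 2 else ["cooking"] for i in range(100)]
print(tag_video(rng.normal(size=512), bank, tags))
```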
  • Publication number: 20230368503
    Abstract: Embodiments are disclosed for correlating video sequences and audio sequences by a media recommendation system using a trained encoder network.
    Type: Application
    Filed: May 11, 2022
    Publication date: November 16, 2023
    Applicant: Adobe Inc.
    Inventors: Justin Salamon, Bryan Russell, Didac Suris Coll-Vinent
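
The abstract above does not say how the encoder network is trained to correlate the two modalities; a common formulation is a symmetric contrastive loss over paired video and audio embeddings. The sketch below assumes that formulation; the temperature and batch layout are illustrative.

```python
# A minimal sketch, assuming an InfoNCE-style objective over paired clips.
import torch
import torch.nn.functional as F

def av_contrastive_loss(video_emb, audio_emb, temperature=0.07):
    """video_emb, audio_emb: (B, D) embeddings of paired video/audio clips."""
    v = F.normalize(video_emb, dim=1)
    a = F.normalize(audio_emb, dim=1)
    logits = v @ a.t() / temperature       # (B, B) cross-modal similarity matrix
    targets = torch.arange(v.size(0))      # matching pairs lie on the diagonal
    # Pull each video toward its own audio and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```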
  • Patent number: 11721056
    Abstract: In some embodiments, a model training system obtains a set of animation models. For each of the animation models, the model training system renders the animation model with a set of rendering parameters to generate a sequence of video frames containing a character, and extracts joint points of the character from each frame of the sequence of video frames. The model training system further determines, for each frame of the sequence of video frames, whether a subset of the joint points is in contact with a ground plane in a three-dimensional space and generates contact labels for the subset of the joint points. The model training system trains a contact estimation model using training data containing the joint points extracted from the sequences of video frames and the generated contact labels. The contact estimation model can be used to refine a motion model for a character.
    Type: Grant
    Filed: January 12, 2022
    Date of Patent: August 8, 2023
    Assignee: Adobe Inc.
    Inventors: Jimei Yang, Davis Rempe, Bryan Russell, Aaron Hertzmann
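
A plausible reading of the contact-labeling step above is that a joint counts as "in contact" when it is near the ground plane and nearly stationary, which the renderer's ground truth makes easy to check. The sketch below encodes that rule; the thresholds are illustrative and the patent does not specify exact values.

```python
# Hypothetical labeling rule: close to the ground plane and barely moving.
import numpy as np

def contact_labels(joints, height_eps=0.02, vel_eps=0.005):
    """joints: (T, J, 3) joint positions over T frames, ground plane at y=0."""
    heights = joints[..., 1]                          # y-coordinate per joint
    vel = np.zeros_like(heights)
    vel[1:] = np.linalg.norm(np.diff(joints, axis=0), axis=-1)
    return (heights < height_eps) & (vel < vel_eps)   # (T, J) boolean labels
```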
  • Publication number: 20230115551
    Abstract: Methods, systems, and computer storage media are provided for multi-modal localization. Input data comprising two modalities, such as image data and corresponding text or audio data, may be received. A phrase may be extracted from the text or audio data, and a neural network system may be utilized to spatially and temporally localize the phrase within the image data. The neural network system may include a plurality of cross-modal attention layers that each compare features across the first and second modalities without comparing features of the same modality. Using the cross-modal attention layers, a region or subset of pixels within one or more frames of the image data may be identified as corresponding to the phrase, and a localization indicator may be presented for display with the image data. Embodiments may also include unsupervised training of the neural network system.
    Type: Application
    Filed: October 12, 2021
    Publication date: April 13, 2023
    Inventors: Hailin Jin, Bryan Russell, Reuben Xin Hong Tan
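
A cross-modal attention layer of the kind described above, where each modality compares only against the other, can be sketched with standard multi-head attention. The layer sizes are assumptions, and the returned attention weights over video positions stand in for the localization signal.

```python
# A minimal sketch: text attends only to video and video only to text,
# so no same-modality comparisons occur. Dimensions are illustrative.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.txt_to_vid = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vid_to_txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_feats, video_feats):
        """text_feats: (B, L, D) phrase tokens; video_feats: (B, T*H*W, D)."""
        # The text-query attention weights over spatio-temporal video
        # positions give the localization signal.
        t, attn = self.txt_to_vid(text_feats, video_feats, video_feats)
        v, _ = self.vid_to_txt(video_feats, text_feats, text_feats)
        return t, v, attn    # attn: (B, L, T*H*W) weights over video positions
```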
  • Patent number: 11514632
    Abstract: This disclosure describes methods, non-transitory computer readable storage media, and systems that utilize a contrastive perceptual loss to modify neural networks for generating synthetic digital content items. For example, the disclosed systems generate a synthetic digital content item based on a guide input to a generative neural network. The disclosed systems utilize an encoder neural network to generate encoded representations of the synthetic digital content item and a corresponding ground-truth digital content item. Additionally, the disclosed systems sample patches from the encoded representations of the digital content items and then determine a contrastive loss based on the perceptual distances between the patches in the encoded representations. Furthermore, the disclosed systems jointly update the parameters of the generative neural network and the encoder neural network utilizing the contrastive loss.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: November 29, 2022
    Assignee: Adobe Inc.
    Inventors: Bryan Russell, Taesung Park, Richard Zhang, Junyan Zhu, Alexander Andonian
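
The patch-based contrastive loss above can be read as an InfoNCE-style objective in which a patch of the synthetic item is pulled toward the co-located patch of the ground-truth item and pushed away from the other sampled patches. The sketch below assumes that reading; the temperature is illustrative.

```python
# A minimal sketch over patch features sampled at the same P locations
# in the encoded synthetic and ground-truth representations.
import torch
import torch.nn.functional as F

def patch_contrastive_loss(fake_patches, real_patches, temperature=0.07):
    """fake_patches, real_patches: (P, D) encoded patch features."""
    f = F.normalize(fake_patches, dim=1)
    r = F.normalize(real_patches, dim=1)
    logits = f @ r.t() / temperature       # (P, P) patch-to-patch similarities
    targets = torch.arange(f.size(0))      # co-located patches are positives
    return F.cross_entropy(logits, targets)
```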
  • Publication number: 20220148242
    Abstract: This disclosure describes methods, non-transitory computer readable storage media, and systems that utilize a contrastive perceptual loss to modify neural networks for generating synthetic digital content items. For example, the disclosed systems generate a synthetic digital content item based on a guide input to a generative neural network. The disclosed systems utilize an encoder neural network to generate encoded representations of the synthetic digital content item and a corresponding ground-truth digital content item. Additionally, the disclosed systems sample patches from the encoded representations of the digital content items and then determine a contrastive loss based on the perceptual distances between the patches in the encoded representations. Furthermore, the disclosed systems jointly update the parameters of the generative neural network and the encoder neural network utilizing the contrastive loss.
    Type: Application
    Filed: November 6, 2020
    Publication date: May 12, 2022
    Inventors: Bryan Russell, Taesung Park, Richard Zhang, Junyan Zhu, Alexander Andonian
  • Publication number: 20220139019
    Abstract: In some embodiments, a model training system obtains a set of animation models. For each of the animation models, the model training system renders the animation model with a set of rendering parameters to generate a sequence of video frames containing a character, and extracts joint points of the character from each frame of the sequence of video frames. The model training system further determines, for each frame of the sequence of video frames, whether a subset of the joint points is in contact with a ground plane in a three-dimensional space and generates contact labels for the subset of the joint points. The model training system trains a contact estimation model using training data containing the joint points extracted from the sequences of video frames and the generated contact labels. The contact estimation model can be used to refine a motion model for a character.
    Type: Application
    Filed: January 12, 2022
    Publication date: May 5, 2022
    Inventors: Jimei Yang, Davis Rempe, Bryan Russell, Aaron Hertzmann
  • Patent number: 11308329
    Abstract: A computer system is trained to understand audio-visual spatial correspondence using audio-visual clips having multi-channel audio. The computer system includes an audio subnetwork, video subnetwork, and pretext subnetwork. The audio subnetwork receives the two channels of audio from the audio-visual clips, and the video subnetwork receives the video frames from the audio-visual clips. In a subset of the audio-visual clips the audio-visual spatial relationship is misaligned, causing the audio-visual spatial cues for the audio and video to be incorrect. The audio subnetwork outputs an audio feature vector for each audio-visual clip, and the video subnetwork outputs a video feature vector for each audio-visual clip. The audio and video feature vectors for each audio-visual clip are merged and provided to the pretext subnetwork, which is configured to classify the merged vector as either having a misaligned audio-visual spatial relationship or not.
    Type: Grant
    Filed: May 7, 2020
    Date of Patent: April 19, 2022
    Assignee: Adobe Inc.
    Inventors: Justin Salamon, Bryan Russell, Karren Yang
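
Constructing the pretext training data described above comes down to deliberately breaking the audio-visual spatial correspondence for a random subset of clips, e.g. by swapping the left and right audio channels, and labeling those clips as misaligned. Below is a minimal sketch of that batch construction, with illustrative shapes.

```python
# Hypothetical pretext-batch construction; the pretext subnetwork must
# later detect the mismatch from the merged audio/video feature vector.
import torch

def make_pretext_batch(video, audio, p_misalign=0.5):
    """video: (B, T, C, H, W) frames; audio: (B, 2, S) two-channel waveforms."""
    flip = torch.rand(audio.size(0)) < p_misalign
    audio = audio.clone()
    audio[flip] = audio[flip].flip(dims=[1])   # swap left/right channels
    labels = flip.long()                       # 1 = spatially misaligned clip
    return video, audio, labels
```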
  • Patent number: 11257298
    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for reconstructing three-dimensional meshes from two-dimensional images of objects with automatic coordinate system alignment. For example, the disclosed system can generate feature vectors for a plurality of images having different views of an object. The disclosed system can process the feature vectors to generate coordinate-aligned feature vectors aligned with a coordinate system associated with an image. The disclosed system can generate a combined feature vector from the feature vectors aligned to the coordinate system. Additionally, the disclosed system can then generate a three-dimensional mesh representing the object from the combined feature vector.
    Type: Grant
    Filed: March 18, 2020
    Date of Patent: February 22, 2022
    Assignee: Adobe Inc.
    Inventors: Vladimir Kim, Pierre-Alain Langlois, Oliver Wang, Matthew Fisher, Bryan Russell
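
One way to realize the coordinate alignment described above is to condition each per-view feature vector on that view's rotation relative to a reference view before pooling, as sketched below. The module shapes, the linear alignment layer, and max pooling are all assumptions for illustration, not the patent's architecture.

```python
# A minimal sketch of the fusion step: align per-view features to a
# shared coordinate frame, pool, and decode into mesh vertices.
import torch
import torch.nn as nn

class MultiViewMeshDecoder(nn.Module):
    def __init__(self, feat_dim=512, n_verts=2562):
        super().__init__()
        self.align = nn.Linear(feat_dim + 9, feat_dim)   # 9 = flat 3x3 rotation
        self.decode = nn.Linear(feat_dim, n_verts * 3)

    def forward(self, view_feats, rel_rotations):
        """view_feats: (V, F); rel_rotations: (V, 3, 3) to the reference view."""
        cond = torch.cat([view_feats, rel_rotations.flatten(1)], dim=1)
        aligned = torch.relu(self.align(cond))           # coordinate-aligned vectors
        combined = aligned.max(dim=0).values             # order-invariant pooling
        return self.decode(combined).view(-1, 3)         # (n_verts, 3) vertices
```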
  • Patent number: 11238634
    Abstract: In some embodiments, a motion model refinement system receives an input video depicting a human character and an initial motion model describing motions of individual joint points of the human character in a three-dimensional space. The motion model refinement system identifies foot joint points of the human character that are in contact with a ground plane using a trained contact estimation model. The motion model refinement system determines the ground plane based on the foot joint points and the initial motion model and constructs an optimization problem for refining the initial motion model. The optimization problem minimizes the difference between the refined motion model and the initial motion model under a set of plausibility constraints including constraints on the contact foot joint points and a time-dependent inertia tensor-based constraint. The motion model refinement system obtains the refined motion model by solving the optimization problem.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: February 1, 2022
    Assignee: Adobe Inc.
    Inventors: Jimei Yang, Davis Rempe, Bryan Russell, Aaron Hertzmann
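
The optimization problem above can be sketched with the plausibility constraints relaxed into soft penalties: stay close to the initial motion while keeping labeled contact feet on the ground plane and free of sliding. The sketch below omits the inertia-tensor term; the weights and step count are illustrative.

```python
# A minimal sketch of the refinement objective as penalized least squares.
import torch

def refine_motion(initial, contacts, ground_y=0.0, steps=200, w_contact=10.0):
    """initial: (T, J, 3) joint trajectories; contacts: (T, J) bool labels."""
    refined = initial.clone().requires_grad_(True)
    opt = torch.optim.Adam([refined], lr=1e-2)
    for _ in range(steps):
        data_term = ((refined - initial) ** 2).mean()
        # Contact joints should sit on the plane and not slide between frames.
        height = ((refined[..., 1] - ground_y) ** 2) * contacts
        slide = ((refined[1:] - refined[:-1]) ** 2).sum(-1) * contacts[1:]
        loss = data_term + w_contact * (height.mean() + slide.mean())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return refined.detach()
```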
  • Publication number: 20210409836
    Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize in generating tags for an input digital video. For instance, the disclosed systems can extract a set of frames from the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital video by aggregating one or more tags corresponding to the identified similar tagged feature vectors.
    Type: Application
    Filed: September 9, 2021
    Publication date: December 30, 2021
    Inventors: Bryan Russell, Ruppesh Nalwaya, Markus Woodson, Joon-Young Lee, Hailin Jin
  • Patent number: 11189066
    Abstract: Embodiments disclosed herein describe systems, methods, and products that train one or more neural networks and execute the trained neural networks across various applications. The one or more neural networks are trained to optimize a loss function comprising a pixel-level comparison between the outputs generated by the neural networks and a ground-truth dataset generated from a bubble view methodology or an explicit importance maps methodology. Each of these methodologies may be more efficient than, and may closely approximate, the more expensive but accurate human eye gaze measurements. The embodiments herein leverage an existing process for training neural networks to generate importance maps of a plurality of graphic objects to offer interactive applications for graphics designs and data visualizations.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: November 30, 2021
    Assignee: Adobe Inc.
    Inventors: Zoya Bylinskii, Aaron Hertzmann, Bryan Russell
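
The training objective above reduces to a pixel-level comparison between a predicted importance map and a ground-truth map derived from BubbleView-style clicks or explicit annotations. The sketch below assumes mean squared error; the patent only requires some pixel-level comparison.

```python
# A minimal sketch of the pixel-level loss named in the abstract.
import torch
import torch.nn.functional as F

def importance_loss(model, image, gt_importance):
    """image: (B, 3, H, W); gt_importance: (B, 1, H, W) values in [0, 1]."""
    pred = model(image)                    # (B, 1, H, W) predicted map logits
    return F.mse_loss(torch.sigmoid(pred), gt_importance)
```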
  • Patent number: 11189094
    Abstract: Techniques are disclosed for 3D object reconstruction using photometric mesh representations. A decoder is pretrained to transform points sampled from 2D patches of representative objects into 3D polygonal meshes. An image frame of the object is fed into an encoder to get an initial latent code vector. For each frame and camera pair from the sequence, a polygonal mesh is rendered at the given viewpoints. The mesh is optimized by creating a virtual viewpoint and rasterizing it to obtain a depth map. The 3D mesh projections are aligned by projecting the coordinates corresponding to the polygonal face vertices of the rasterized mesh to both selected viewpoints. The photometric error is determined from RGB pixel intensities sampled from both frames. Gradients from the photometric error are backpropagated into the vertices of the assigned polygonal indices by relating the barycentric coordinates of each image, updating the latent code vector.
    Type: Grant
    Filed: August 5, 2020
    Date of Patent: November 30, 2021
    Assignee: Adobe Inc.
    Inventors: Oliver Wang, Vladimir Kim, Matthew Fisher, Elya Shechtman, Chen-Hsuan Lin, Bryan Russell
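
The photometric term above compares RGB intensities sampled at the two projections of the same mesh surface points, which keeps the error differentiable in the mesh. In the sketch below, the camera projection functions are placeholders assumed to return normalized image coordinates, and the frames are assumed to be (1, 3, H, W) tensors.

```python
# A minimal sketch of the photometric error between two frames.
import torch
import torch.nn.functional as F

def photometric_error(points, cam_a, cam_b, frame_a, frame_b):
    """points: (N, 3) mesh surface points; cam_* map points to [-1, 1] coords."""
    uv_a = cam_a(points).view(1, 1, -1, 2)       # (1, 1, N, 2) sampling grid
    uv_b = cam_b(points).view(1, 1, -1, 2)
    rgb_a = F.grid_sample(frame_a, uv_a, align_corners=False)
    rgb_b = F.grid_sample(frame_b, uv_b, align_corners=False)
    return (rgb_a - rgb_b).abs().mean()          # differentiable in the points
```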
  • Publication number: 20210350135
    Abstract: A computer system is trained to understand audio-visual spatial correspondence using audio-visual clips having multi-channel audio. The computer system includes an audio subnetwork, video subnetwork, and pretext subnetwork. The audio subnetwork receives the two channels of audio from the audio-visual clips, and the video subnetwork receives the video frames from the audio-visual clips. In a subset of the audio-visual clips the audio-visual spatial relationship is misaligned, causing the audio-visual spatial cues for the audio and video to be incorrect. The audio subnetwork outputs an audio feature vector for each audio-visual clip, and the video subnetwork outputs a video feature vector for each audio-visual clip. The audio and video feature vectors for each audio-visual clip are merged and provided to the pretext subnetwork, which is configured to classify the merged vector as either having a misaligned audio-visual spatial relationship or not.
    Type: Application
    Filed: May 7, 2020
    Publication date: November 11, 2021
    Applicant: Adobe Inc.
    Inventors: Justin Salamon, Bryan Russell, Karren Yang
  • Publication number: 20210335028
    Abstract: In some embodiments, a motion model refinement system receives an input video depicting a human character and an initial motion model describing motions of individual joint points of the human character in a three-dimensional space. The motion model refinement system identifies foot joint points of the human character that are in contact with a ground plane using a trained contact estimation model. The motion model refinement system determines the ground plane based on the foot joint points and the initial motion model and constructs an optimization problem for refining the initial motion model. The optimization problem minimizes the difference between the refined motion model and the initial motion model under a set of plausibility constraints including constraints on the contact foot joint points and a time-dependent inertia tensor-based constraint. The motion model refinement system obtains the refined motion model by solving the optimization problem.
    Type: Application
    Filed: April 28, 2020
    Publication date: October 28, 2021
    Inventors: Jimei Yang, Davis Rempe, Bryan Russell, Aaron Hertzmann
  • Patent number: 11146862
    Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize in generating tags for an input digital video. For instance, the disclosed systems can extract a set of frames from the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital video by aggregating one or more tags corresponding to the identified similar tagged feature vectors.
    Type: Grant
    Filed: April 16, 2019
    Date of Patent: October 12, 2021
    Assignee: Adobe Inc.
    Inventors: Bryan Russell, Ruppesh Nalwaya, Markus Woodson, Joon-Young Lee, Hailin Jin
  • Publication number: 20210304799
    Abstract: Certain embodiments involve transcript-based techniques for facilitating insertion of secondary video content into primary video content. For instance, a video editor presents a video editing interface having a primary video section displaying a primary video, a text-based navigation section having navigable portions of a primary video transcript, and a secondary video menu section displaying candidate secondary videos. In some embodiments, candidate secondary videos are obtained by using target terms detected in the transcript to query a remote data source for the candidate secondary videos. In embodiments involving video insertion, the video editor identifies a portion of the primary video corresponding to a portion of the transcript selected within the text-based navigation section. The video editor inserts a secondary video, which is selected from the candidate secondary videos based on an input received at the secondary video menu section, at the identified portion of the primary video.
    Type: Application
    Filed: June 11, 2021
    Publication date: September 30, 2021
    Inventors: Bernd Huber, Bryan Russell, Gautham Mysore, Hijung Valentina Shin, Oliver Wang
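
The retrieval step above, detecting target terms in the selected transcript portion and using them to query a remote data source, can be sketched as follows. `search_stock_videos` is a hypothetical placeholder for the remote query, and the term-detection heuristic is an illustrative stand-in for whatever detector the system uses.

```python
# Hypothetical sketch of candidate secondary-video retrieval.
import re

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "is", "that"}

def candidate_secondary_videos(transcript_portion, search_stock_videos, k=5):
    """Detect target terms in the transcript text and query a remote source."""
    words = re.findall(r"[a-z]+", transcript_portion.lower())
    terms = [w for w in words if w not in STOP_WORDS and len(w) > 3]
    # Deduplicate while preserving order, then query with the target terms.
    return search_stock_videos(query=" ".join(dict.fromkeys(terms)), limit=k)
```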
  • Publication number: 20210295606
    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for reconstructing three-dimensional meshes from two-dimensional images of objects with automatic coordinate system alignment. For example, the disclosed system can generate feature vectors for a plurality of images having different views of an object. The disclosed system can process the feature vectors to generate coordinate-aligned feature vectors aligned with a coordinate system associated with an image. The disclosed system can generate a combined feature vector from the feature vectors aligned to the coordinate system. Additionally, the disclosed system can then generate a three-dimensional mesh representing the object from the combined feature vector.
    Type: Application
    Filed: March 18, 2020
    Publication date: September 23, 2021
    Inventors: Vladimir Kim, Pierre-Alain Langlois, Oliver Wang, Matthew Fisher, Bryan Russell
  • Patent number: 11049525
    Abstract: Certain embodiments involve transcript-based techniques for facilitating insertion of secondary video content into primary video content. For instance, a video editor presents a video editing interface having a primary video section displaying a primary video, a text-based navigation section having navigable portions of a primary video transcript, and a secondary video menu section displaying candidate secondary videos. In some embodiments, candidate secondary videos are obtained by using target terms detected in the transcript to query a remote data source for the candidate secondary videos. In embodiments involving video insertion, the video editor identifies a portion of the primary video corresponding to a portion of the transcript selected within the text-based navigation section. The video editor inserts a secondary video, which is selected from the candidate secondary videos based on an input received at the secondary video menu section, at the identified portion of the primary video.
    Type: Grant
    Filed: February 21, 2019
    Date of Patent: June 29, 2021
    Assignee: Adobe Inc.
    Inventors: Bernd Huber, Bryan Russell, Gautham Mysore, Hijung Valentina Shin, Oliver Wang
  • Patent number: 10937237
    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for reconstructing three-dimensional object meshes from two-dimensional images of objects using multi-view cycle projection. For example, the disclosed system can determine a multi-view cycle projection loss across a plurality of images of an object via an estimated three-dimensional object mesh of the object. For example, the disclosed system uses a pixel mapping neural network to project a sampled pixel location across a plurality of images of an object and via a three-dimensional mesh representing the object. The disclosed system determines a multi-view cycle consistency loss based on a difference between the sampled pixel location and a cycle projection of the sampled pixel location and uses the loss to update the pixel mapping neural network, a latent vector representing the object, or a shape generation neural network that uses the latent vector to generate the object mesh.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: March 2, 2021
    Assignee: Adobe Inc.
    Inventors: Vladimir Kim, Pierre-Alain Langlois, Matthew Fisher, Bryan Russell, Oliver Wang
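
The multi-view cycle above can be written out directly: lift a sampled pixel to the mesh, project it into the other view, lift it again, project it back, and penalize the distance from the starting location. In the sketch below, every callable is a placeholder for one of the patent's learned or fixed components, not a real API.

```python
# A minimal sketch of the multi-view cycle consistency loss.
import torch

def cycle_projection_loss(pix_a, map_to_mesh, project, cam_a, cam_b):
    """pix_a: (N, 2) tensor of pixel locations sampled in view A."""
    surf_ab = map_to_mesh(pix_a, cam_a)      # 3D surface points for view-A pixels
    pix_b = project(surf_ab, cam_b)          # their locations in view B
    surf_ba = map_to_mesh(pix_b, cam_b)      # lift the view-B pixels back to 3D
    pix_a_cycle = project(surf_ba, cam_a)    # and project into view A again
    return ((pix_a_cycle - pix_a) ** 2).sum(dim=1).mean()
```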