Patents by Inventor Hailin Jin
Hailin Jin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11977829
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly generating scalable and semantically editable font representations utilizing a machine learning approach. For example, the disclosed systems generate a font representation code from a glyph utilizing a particular neural network architecture. Specifically, the disclosed systems utilize a glyph appearance propagation model and perform an iterative process to generate a font representation code from an initial glyph. Additionally, using a glyph appearance propagation model, the disclosed systems automatically propagate the appearance of the initial glyph from the font representation code to generate additional glyphs corresponding to respective glyph labels. In some embodiments, the disclosed systems propagate edits or other changes in appearance of a glyph to other glyphs within a glyph set (e.g., to match the appearance of the edited glyph).
Type: Grant
Filed: June 29, 2021
Date of Patent: May 7, 2024
Assignee: Adobe Inc.
Inventors: Zhifei Zhang, Zhaowen Wang, Hailin Jin, Matthew Fisher
-
Patent number: 11966849
Abstract: Techniques and systems are provided for configuring neural networks to perform certain image manipulation operations. For instance, in response to obtaining an image for manipulation, an image manipulation system determines the fitness scores for a set of neural networks resulting from the processing of a noise map. Based on these fitness scores, the image manipulation system selects a subset of the set of neural networks for cross-breeding into a new generation of neural networks. The image manipulation system evaluates the performance of this new generation of neural networks and continues cross-breeding these neural networks until a fitness threshold is satisfied. From the final generation of neural networks, the image manipulation system selects a neural network that provides a desired output and uses the neural network to generate the manipulated image.
Type: Grant
Filed: February 20, 2020
Date of Patent: April 23, 2024
Assignee: Adobe Inc.
Inventors: John Collomosse, Hailin Jin
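The score-select-cross-breed loop this abstract describes can be sketched as a simple genetic algorithm. The encoding of candidates as weight vectors, the fitness function, and the crossover rule below are illustrative assumptions for a toy example, not the patented method.

```python
import random

def evolve(population, fitness, crossover, threshold, generations=50, keep=4):
    """Iteratively score candidates, keep the fittest subset, and
    cross-breed them into a new generation until the fitness threshold
    is satisfied (or the generation budget runs out)."""
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) >= threshold:
            break
        parents = scored[:keep]  # fittest subset survives (elitism)
        population = parents + [
            crossover(random.choice(parents), random.choice(parents))
            for _ in range(len(population) - keep)
        ]
    return max(population, key=fitness)

# Toy stand-in: "networks" are weight vectors, fitness is closeness to a target.
target = [0.2, 0.8, 0.5]
fitness = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
crossover = lambda a, b: [random.choice(pair) for pair in zip(a, b)]

random.seed(0)
pool = [[random.random() for _ in range(3)] for _ in range(12)]
best = evolve(pool, fitness, crossover, threshold=-0.01)
```

Because the fittest parents are carried over unchanged each generation, the best fitness in the population never decreases.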
-
Patent number: 11949964
Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize to generate tags for an input digital video. For instance, the disclosed systems can extract a set of frames for the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital video by aggregating one or more tags corresponding to identified similar tagged feature vectors.
Type: Grant
Filed: September 9, 2021
Date of Patent: April 2, 2024
Assignee: Adobe Inc.
Inventors: Bryan Russell, Ruppesh Nalwaya, Markus Woodson, Joon-Young Lee, Hailin Jin
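The retrieval step can be sketched with plain vectors: mean-pool per-frame features into one video-level vector, rank a bank of tagged feature vectors by cosine similarity, and aggregate the tags of the top matches. The feature values, tags, and pooling choice below are invented for illustration; how the abstract's systems actually compute frame features is not specified here.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def aggregate_frames(frame_features):
    """Mean-pool per-frame feature vectors into one video-level vector."""
    n = len(frame_features)
    return [sum(col) / n for col in zip(*frame_features)]

def tag_video(frame_features, tagged_bank, k=2):
    """Return tags voted for by the k most similar tagged feature vectors."""
    query = aggregate_frames(frame_features)
    ranked = sorted(tagged_bank, key=lambda e: cosine(query, e["vec"]), reverse=True)
    votes = Counter(t for entry in ranked[:k] for t in entry["tags"])
    return [tag for tag, _ in votes.most_common()]

# Toy bank of tagged feature vectors (values and tags are invented).
bank = [
    {"vec": [1.0, 0.1, 0.0], "tags": ["running"]},
    {"vec": [0.9, 0.2, 0.1], "tags": ["running", "outdoor"]},
    {"vec": [0.0, 0.1, 1.0], "tags": ["cooking"]},
]
frames = [[0.95, 0.15, 0.05], [1.0, 0.1, 0.0]]
tags = tag_video(frames, bank)  # the two "running" entries win the vote
```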
-
Patent number: 11875435
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for accurately and flexibly generating scalable fonts utilizing multi-implicit neural font representations. For instance, the disclosed systems combine deep learning with differentiable rasterization to generate a multi-implicit neural font representation of a glyph. For example, the disclosed systems utilize an implicit differentiable font neural network to determine a font style code for an input glyph as well as distance values for locations of the glyph to be rendered based on a glyph label and the font style code. Further, the disclosed systems rasterize the distance values utilizing a differentiable rasterization model and combine the rasterized distance values to generate a permutation-invariant version of the glyph corresponding to the glyph set.
Type: Grant
Filed: October 12, 2021
Date of Patent: January 16, 2024
Assignee: Adobe Inc.
Inventors: Chinthala Pradyumna Reddy, Zhifei Zhang, Matthew Fisher, Hailin Jin, Zhaowen Wang, Niloy J Mitra
-
Patent number: 11836932
Abstract: Technology is disclosed herein for learning motion in video. In an implementation, an artificial neural network extracts features from a video. A correspondence proposal (CP) module performs, for at least some of the features, a search for corresponding features in the video based on a semantic similarity of a given feature to others of the features. The CP module then generates a joint semantic vector for each of the features based at least on the semantic similarity of the given feature to one or more of the corresponding features and a spatiotemporal distance of the given feature to the one or more of the corresponding features. The artificial neural network is able to identify motion in the video using the joint semantic vectors generated for the features extracted from the video.
Type: Grant
Filed: June 17, 2021
Date of Patent: December 5, 2023
Assignee: Adobe Inc.
Inventors: Xingyu Liu, Hailin Jin, Joonyoung Lee
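The core pairing the abstract describes, semantic similarity combined with spatiotemporal displacement, can be illustrated with a toy correspondence search. Each feature carries a (t, y, x) location; for every feature we find its most semantically similar peers and build a joint vector from the feature itself plus (similarity, offset) pairs. The k value, the concatenation layout, and the feature values are assumptions for illustration, not the patented CP module.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def joint_semantic_vectors(features, k=2):
    """For each (location, feature) pair, find the k most semantically
    similar features elsewhere in the clip and append [similarity,
    dt, dy, dx] for each correspondence to the feature itself."""
    out = []
    for i, (loc_i, f_i) in enumerate(features):
        scored = sorted(
            ((cosine(f_i, f_j), loc_j)
             for j, (loc_j, f_j) in enumerate(features) if j != i),
            reverse=True,
        )
        joint = list(f_i)
        for sim, loc_j in scored[:k]:
            offset = [b - a for a, b in zip(loc_i, loc_j)]  # (dt, dy, dx)
            joint.extend([sim] + offset)
        out.append(joint)
    return out

# Toy features: ((t, y, x) location, feature vector).
feats = [
    ((0, 1, 1), [1.0, 0.0]),
    ((1, 1, 2), [0.9, 0.1]),  # same appearance, shifted right one frame later
    ((0, 5, 5), [0.0, 1.0]),  # unrelated feature
]
joints = joint_semantic_vectors(feats, k=1)
```

For the first feature, the best correspondence is the second one, so its joint vector encodes a displacement of one frame and one pixel to the right, exactly the kind of motion signal the abstract says the network reads off.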
-
Publication number: 20230386208
Abstract: Systems and methods for video segmentation and summarization are described. Embodiments of the present disclosure receive a video and a transcript of the video; generate visual features representing frames of the video using an image encoder; generate language features representing the transcript using a text encoder, wherein the image encoder and the text encoder are trained based on a correlation between training visual features and training language features; and segment the video into a plurality of video segments based on the visual features and the language features.
Type: Application
Filed: May 31, 2022
Publication date: November 30, 2023
Inventors: Hailin Jin, Jielin Qiu, Zhaowen Wang, Trung Huu Bui, Franck Dernoncourt
-
Publication number: 20230386054
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize deep learning to identify regions of an image that have been editorially modified. For example, the image comparison system includes a deep image comparator model that compares a pair of images and localizes regions that have been editorially manipulated relative to an original or trusted image. More specifically, the deep image comparator model generates and surfaces visual indications of the location of such editorial changes on the modified image. The deep image comparator model is robust and ignores discrepancies due to benign image transformations that commonly occur during electronic image distribution. The image comparison system optionally includes an image retrieval model that utilizes a visual search embedding robust to minor manipulations or benign modifications of images. The image retrieval model utilizes a visual search embedding for an image to robustly identify near-duplicate images.
Type: Application
Filed: May 27, 2022
Publication date: November 30, 2023
Inventors: John Collomosse, Alexander Black, Van Tu Bui, Hailin Jin, Viswanathan Swaminathan
-
Patent number: 11823322
Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for utilizing an encoder-decoder architecture to learn a volumetric 3D representation of an object using digital images of the object from multiple viewpoints to render novel views of the object. For instance, the disclosed systems can utilize patch-based image feature extraction to extract lifted feature representations from images corresponding to different viewpoints of an object. Furthermore, the disclosed systems can model view-dependent transformed feature representations using learned transformation kernels. In addition, the disclosed systems can recurrently and concurrently aggregate the transformed feature representations to generate a 3D voxel representation of the object. Furthermore, the disclosed systems can sample frustum features using the 3D voxel representation and transformation kernels.
Type: Grant
Filed: June 16, 2022
Date of Patent: November 21, 2023
Assignee: Adobe Inc.
Inventors: Tong He, John Collomosse, Hailin Jin
-
Patent number: 11810374
Abstract: In implementations of recognizing text in images, text recognition systems are trained using noisy images that have nuisance factors applied, and corresponding clean images (e.g., without nuisance factors). Clean images serve as supervision at both feature and pixel levels, so that text recognition systems are trained to be feature invariant (e.g., by requiring features extracted from a noisy image to match features extracted from a clean image), and feature complete (e.g., by requiring that features extracted from a noisy image be sufficient to generate a clean image). Accordingly, text recognition systems generalize to text not included in training images, and are robust to nuisance factors. Furthermore, since clean images are provided as supervision at feature and pixel levels, training requires fewer training images than text recognition systems that are not trained with a supervisory clean image, thus saving time and resources.
Type: Grant
Filed: April 26, 2021
Date of Patent: November 7, 2023
Assignee: Adobe Inc.
Inventors: Zhaowen Wang, Hailin Jin, Yang Liu
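The two supervision signals in this abstract translate naturally into two loss terms: a feature-level loss pulling noisy-image features toward clean-image features (feature invariance) and a pixel-level loss requiring the noisy features to reconstruct the clean image (feature completeness). The mean-squared-error form, the plain lists standing in for tensors, and all numeric values below are illustrative assumptions.

```python
def feature_invariance_losses(f_noisy, f_clean, reconstructed, clean_pixels):
    """Return (feature-level MSE, pixel-level MSE): the first pushes
    features of a noisy image toward those of its clean counterpart,
    the second requires those features to reconstruct the clean image."""
    feat_loss = sum((a - b) ** 2 for a, b in zip(f_noisy, f_clean)) / len(f_clean)
    pixel_loss = sum((a - b) ** 2
                     for a, b in zip(reconstructed, clean_pixels)) / len(clean_pixels)
    return feat_loss, pixel_loss

# Illustrative values only: short "feature" and "pixel" vectors.
feat_loss, pixel_loss = feature_invariance_losses(
    f_noisy=[0.9, 0.1],
    f_clean=[1.0, 0.0],
    reconstructed=[0.4, 0.6, 0.5],
    clean_pixels=[0.5, 0.5, 0.5],
)
total = feat_loss + pixel_loss  # in training, both terms would be minimized jointly
```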
-
Patent number: 11776180
Abstract: Embodiments of the present disclosure are directed towards improved models trained using unsupervised domain adaptation. In particular, a style-content adaptation system provides improved translation during unsupervised domain adaptation by controlling the alignment of conditional distributions of a model during training such that content (e.g., a class) from a target domain is correctly mapped to content (e.g., the same class) in a source domain. The style-content adaptation system improves unsupervised domain adaptation using independent control over content (e.g., related to a class) as well as style (e.g., related to a domain) to control alignment when translating between the source and target domain. This independent control over content and style can also allow for images to be generated using the style-content adaptation system that contain desired content and/or style.
Type: Grant
Filed: February 26, 2020
Date of Patent: October 3, 2023
Assignee: Adobe Inc.
Inventors: Ning Xu, Bayram Safa Cicek, Hailin Jin, Zhaowen Wang
-
Patent number: 11709885
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly identifying digital images with similar style to a query digital image using fine-grain style determination via weakly supervised style extraction neural networks. For example, the disclosed systems can extract a style embedding from a query digital image using a style extraction neural network such as a novel two-branch autoencoder architecture or a weakly supervised discriminative neural network. The disclosed systems can generate a combined style embedding by combining complementary style embeddings from different style extraction neural networks. Moreover, the disclosed systems can search a repository of digital images to identify digital images with similar style to the query digital image.
Type: Grant
Filed: September 18, 2020
Date of Patent: July 25, 2023
Assignee: Adobe Inc.
Inventors: John Collomosse, Zhe Lin, Saeid Motiian, Hailin Jin, Baldo Faieta, Alex Filipkowski
-
Patent number: 11676060
Abstract: Digital content interaction prediction and training techniques that address imbalanced classes are described. In one or more implementations, a digital medium environment is described to predict user interaction with digital content that addresses an imbalance of numbers included in first and second classes in training data used to train a model using machine learning. The training data is received that describes the first class and the second class. A model is trained using machine learning. The training includes sampling the training data to include at least one subset of the training data from the first class and at least one subset of the training data from the second class. Iterative selections are made of a batch from the sampled training data. The iteratively selected batches are iteratively processed by a classifier implemented using machine learning to train the model.
Type: Grant
Filed: January 20, 2016
Date of Patent: June 13, 2023
Assignee: Adobe Inc.
Inventors: Anirban Roychowdhury, Hung H. Bui, Trung H. Bui, Hailin Jin
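The sampling idea in this abstract, drawing each training batch from both classes so the minority class is not swamped, can be sketched in a few lines. The equal 50/50 split, batch size, and toy data below are illustrative assumptions; the classifier that would consume these batches is omitted.

```python
import random

def balanced_batches(class_a, class_b, batch_size=4, num_batches=3, seed=0):
    """Yield batches that draw equally from each class, so an imbalance
    in the raw training data does not carry over into the batches the
    classifier sees."""
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(num_batches):
        batch = rng.sample(class_a, half) + rng.sample(class_b, half)
        rng.shuffle(batch)  # avoid a fixed class ordering within the batch
        yield batch

# Toy imbalanced data: many negatives, very few positives.
negatives = [("neg", i) for i in range(100)]
positives = [("pos", i) for i in range(5)]
batches = list(balanced_batches(negatives, positives))
```

Each batch contains the same number of examples from both classes even though the underlying data is 20:1 imbalanced.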
-
Patent number: 11636147
Abstract: The present disclosure relates to a tag-based font recognition system that utilizes a multi-learning framework to develop and improve tag-based font recognition using deep learning neural networks. In particular, the tag-based font recognition system jointly trains a font tag recognition neural network with an implicit font classification attention model to generate font tag probability vectors that are enhanced by implicit font classification information. Indeed, the font recognition system weights the hidden layers of the font tag recognition neural network with implicit font information to improve the accuracy and predictability of the font tag recognition neural network, which results in improved retrieval of fonts in response to a font tag query. Accordingly, using the enhanced tag probability vectors, the tag-based font recognition system can accurately identify and recommend one or more fonts in response to a font tag query.
Type: Grant
Filed: January 26, 2022
Date of Patent: April 25, 2023
Assignee: Adobe Inc.
Inventors: Zhaowen Wang, Tianlang Chen, Ning Xu, Hailin Jin
-
Publication number: 20230115551
Abstract: Methods, system, and computer storage media are provided for multi-modal localization. Input data comprising two modalities, such as image data and corresponding text or audio data, may be received. A phrase may be extracted from the text or audio data, and a neural network system may be utilized to spatially and temporally localize the phrase within the image data. The neural network system may include a plurality of cross-modal attention layers that each compare features across the first and second modalities without comparing features of the same modality. Using the cross-modal attention layers, a region or subset of pixels within one or more frames of the image data may be identified as corresponding to the phrase, and a localization indicator may be presented for display with the image data. Embodiments may also include unsupervised training of the neural network system.
Type: Application
Filed: October 12, 2021
Publication date: April 13, 2023
Inventors: Hailin Jin, Bryan Russell, Reuben Xin Hong Tan
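The distinguishing pattern here, attention layers where features of one modality attend only to features of the other modality, can be shown with a minimal single-head dot-product attention sketch. The two-dimensional features and their values are invented for illustration; a real system would use learned projections and many such layers.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_modal_attention(queries, keys_values):
    """One attention layer in which queries from one modality attend
    only to keys/values from the other modality; features of the same
    modality are never compared with each other."""
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        attended = [sum(w * v[d] for w, v in zip(weights, keys_values))
                    for d in range(len(keys_values[0]))]
        out.append(attended)
    return out

# Toy example: one text-phrase feature attends over two image-region features.
text_feats = [[1.0, 0.0]]
image_feats = [[4.0, 0.0], [0.0, 4.0]]
attended = cross_modal_attention(text_feats, image_feats)
```

The phrase feature ends up dominated by the image region it aligns with, which is the localization signal the abstract describes surfacing as an indicator.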
-
Publication number: 20230110114
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for accurately and flexibly generating scalable fonts utilizing multi-implicit neural font representations. For instance, the disclosed systems combine deep learning with differentiable rasterization to generate a multi-implicit neural font representation of a glyph. For example, the disclosed systems utilize an implicit differentiable font neural network to determine a font style code for an input glyph as well as distance values for locations of the glyph to be rendered based on a glyph label and the font style code. Further, the disclosed systems rasterize the distance values utilizing a differentiable rasterization model and combine the rasterized distance values to generate a permutation-invariant version of the glyph corresponding to the glyph set.
Type: Application
Filed: October 12, 2021
Publication date: April 13, 2023
Inventors: Chinthala Pradyumna Reddy, Zhifei Zhang, Matthew Fisher, Hailin Jin, Zhaowen Wang, Niloy J Mitra
-
Patent number: 11615308
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for generating a response to a question received from a user during display or playback of a video segment by utilizing a query-response-neural network. The disclosed systems can extract a query vector from a question corresponding to the video segment using the query-response-neural network. The disclosed systems further generate context vectors representing both visual cues and transcript cues corresponding to the video segment using context encoders or other layers from the query-response-neural network. By utilizing additional layers from the query-response-neural network, the disclosed systems generate (i) a query-context vector based on the query vector and the context vectors, and (ii) candidate-response vectors representing candidate responses to the question from a domain-knowledge base or other source.
Type: Grant
Filed: December 28, 2021
Date of Patent: March 28, 2023
Assignee: Adobe Inc.
Inventors: Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin
-
Publication number: 20230075087
Abstract: The disclosed invention includes systems and methods for training and employing equivariant models for generating representations (e.g., vector representations) of temporally-varying content, such as but not limited to video content. The trained models are equivariant to temporal transformations applied to the input content (e.g., video content). The trained models are additionally invariant to non-temporal transformations (e.g., spatial and/or color-space transformations) applied to the input content. Such representations are employed in various machine learning tasks, such as but not limited to video retrieval (e.g., video search engine applications), identification of actions depicted in video, and temporally ordering clips of the video.
Type: Application
Filed: September 3, 2021
Publication date: March 9, 2023
Inventors: Simon Jenni, Hailin Jin
-
Publication number: 20220414314
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly generating scalable and semantically editable font representations utilizing a machine learning approach. For example, the disclosed systems generate a font representation code from a glyph utilizing a particular neural network architecture. Specifically, the disclosed systems utilize a glyph appearance propagation model and perform an iterative process to generate a font representation code from an initial glyph. Additionally, using a glyph appearance propagation model, the disclosed systems automatically propagate the appearance of the initial glyph from the font representation code to generate additional glyphs corresponding to respective glyph labels. In some embodiments, the disclosed systems propagate edits or other changes in appearance of a glyph to other glyphs within a glyph set (e.g., to match the appearance of the edited glyph).
Type: Application
Filed: June 29, 2021
Publication date: December 29, 2022
Inventors: Zhifei Zhang, Zhaowen Wang, Hailin Jin, Matthew Fisher
-
Publication number: 20220414338
Abstract: Systems and methods for a text summarization system are described. In one example, a text summarization system receives an input utterance and determines whether the utterance should be included in a summary of the text. The text summarization system includes an embedding network, a convolution network, an encoding component, and a summary component. The embedding network generates a semantic embedding of an utterance. The convolution network generates a plurality of feature vectors based on the semantic embedding. The encoding component identifies a plurality of latent codes respectively corresponding to the plurality of feature vectors. The summary component identifies a prominent code among the latent codes and selects the utterance as a summary utterance based on the prominent code.
Type: Application
Filed: June 29, 2021
Publication date: December 29, 2022
Inventors: Sangwoo Cho, Franck Dernoncourt, Timothy Jeewun Ganter, Trung Huu Bui, Nedim Lipka, Varun Manjunatha, Walter Chang, Hailin Jin, Jonathan Brandt
-
Publication number: 20220327767
Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for utilizing an encoder-decoder architecture to learn a volumetric 3D representation of an object using digital images of the object from multiple viewpoints to render novel views of the object. For instance, the disclosed systems can utilize patch-based image feature extraction to extract lifted feature representations from images corresponding to different viewpoints of an object. Furthermore, the disclosed systems can model view-dependent transformed feature representations using learned transformation kernels. In addition, the disclosed systems can recurrently and concurrently aggregate the transformed feature representations to generate a 3D voxel representation of the object. Furthermore, the disclosed systems can sample frustum features using the 3D voxel representation and transformation kernels.
Type: Application
Filed: June 16, 2022
Publication date: October 13, 2022
Inventors: Tong He, John Collomosse, Hailin Jin