Patents by Inventor Hailin Jin
Hailin Jin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11977829
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly generating scalable and semantically editable font representations utilizing a machine learning approach. For example, the disclosed systems generate a font representation code from a glyph utilizing a particular neural network architecture. Specifically, the disclosed systems utilize a glyph appearance propagation model and perform an iterative process to generate a font representation code from an initial glyph. Additionally, using a glyph appearance propagation model, the disclosed systems automatically propagate the appearance of the initial glyph from the font representation code to generate additional glyphs corresponding to respective glyph labels. In some embodiments, the disclosed systems propagate edits or other changes in appearance of a glyph to other glyphs within a glyph set (e.g., to match the appearance of the edited glyph).
Type: Grant
Filed: June 29, 2021
Date of Patent: May 7, 2024
Assignee: Adobe Inc.
Inventors: Zhifei Zhang, Zhaowen Wang, Hailin Jin, Matthew Fisher
-
Patent number: 11966849
Abstract: Techniques and systems are provided for configuring neural networks to perform certain image manipulation operations. For instance, in response to obtaining an image for manipulation, an image manipulation system determines the fitness scores for a set of neural networks resulting from the processing of a noise map. Based on these fitness scores, the image manipulation system selects a subset of the set of neural networks for cross-breeding into a new generation of neural networks. The image manipulation system evaluates the performance of this new generation of neural networks and continues cross-breeding these neural networks until a fitness threshold is satisfied. From the final generation of neural networks, the image manipulation system selects a neural network that provides a desired output and uses the neural network to generate the manipulated image.
Type: Grant
Filed: February 20, 2020
Date of Patent: April 23, 2024
Assignee: Adobe Inc.
Inventors: John Collomosse, Hailin Jin
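The score-select-cross-breed loop this abstract describes can be sketched as a simple genetic algorithm. The encoding of candidates as weight vectors, the fitness function, and the crossover rule below are illustrative assumptions for a toy example, not the patented method.

```python
import random

def evolve(population, fitness, crossover, threshold, generations=50, keep=4):
    """Iteratively score candidates, keep the fittest subset, and
    cross-breed them into a new generation until the fitness threshold
    is satisfied (or the generation budget runs out)."""
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) >= threshold:
            break
        parents = scored[:keep]  # fittest subset survives (elitism)
        population = parents + [
            crossover(random.choice(parents), random.choice(parents))
            for _ in range(len(population) - keep)
        ]
    return max(population, key=fitness)

# Toy stand-in: "networks" are weight vectors, fitness is closeness to a target.
target = [0.2, 0.8, 0.5]
fitness = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
crossover = lambda a, b: [random.choice(pair) for pair in zip(a, b)]

random.seed(0)
pool = [[random.random() for _ in range(3)] for _ in range(12)]
best = evolve(pool, fitness, crossover, threshold=-0.01)
```

Because the fittest parents are carried over unchanged each generation, the best fitness in the population never decreases.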
-
Patent number: 11949964
Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize to generate tags for an input digital video. For instance, the disclosed systems can extract a set of frames for the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital video by aggregating one or more tags corresponding to identified similar tagged feature vectors.
Type: Grant
Filed: September 9, 2021
Date of Patent: April 2, 2024
Assignee: Adobe Inc.
Inventors: Bryan Russell, Ruppesh Nalwaya, Markus Woodson, Joon-Young Lee, Hailin Jin
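The retrieval step can be sketched with plain vectors: mean-pool per-frame features into one video-level vector, rank a bank of tagged feature vectors by cosine similarity, and aggregate the tags of the top matches. The feature values, tags, and pooling choice below are invented for illustration; how the abstract's systems actually compute frame features is not specified here.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def aggregate_frames(frame_features):
    """Mean-pool per-frame feature vectors into one video-level vector."""
    n = len(frame_features)
    return [sum(col) / n for col in zip(*frame_features)]

def tag_video(frame_features, tagged_bank, k=2):
    """Return tags voted for by the k most similar tagged feature vectors."""
    query = aggregate_frames(frame_features)
    ranked = sorted(tagged_bank, key=lambda e: cosine(query, e["vec"]), reverse=True)
    votes = Counter(t for entry in ranked[:k] for t in entry["tags"])
    return [tag for tag, _ in votes.most_common()]

# Toy bank of tagged feature vectors (values and tags are invented).
bank = [
    {"vec": [1.0, 0.1, 0.0], "tags": ["running"]},
    {"vec": [0.9, 0.2, 0.1], "tags": ["running", "outdoor"]},
    {"vec": [0.0, 0.1, 1.0], "tags": ["cooking"]},
]
frames = [[0.95, 0.15, 0.05], [1.0, 0.1, 0.0]]
tags = tag_video(frames, bank)  # the two "running" entries win the vote
```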
-
Patent number: 11875435
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for accurately and flexibly generating scalable fonts utilizing multi-implicit neural font representations. For instance, the disclosed systems combine deep learning with differentiable rasterization to generate a multi-implicit neural font representation of a glyph. For example, the disclosed systems utilize an implicit differentiable font neural network to determine a font style code for an input glyph as well as distance values for locations of the glyph to be rendered based on a glyph label and the font style code. Further, the disclosed systems rasterize the distance values utilizing a differentiable rasterization model and combine the rasterized distance values to generate a permutation-invariant version of the glyph corresponding to the glyph set.
Type: Grant
Filed: October 12, 2021
Date of Patent: January 16, 2024
Assignee: Adobe Inc.
Inventors: Chinthala Pradyumna Reddy, Zhifei Zhang, Matthew Fisher, Hailin Jin, Zhaowen Wang, Niloy J Mitra
-
Patent number: 11836932
Abstract: Technology is disclosed herein for learning motion in video. In an implementation, an artificial neural network extracts features from a video. A correspondence proposal (CP) module performs, for at least some of the features, a search for corresponding features in the video based on a semantic similarity of a given feature to others of the features. The CP module then generates a joint semantic vector for each of the features based at least on the semantic similarity of the given feature to one or more of the corresponding features and a spatiotemporal distance of the given feature to the one or more of the corresponding features. The artificial neural network is able to identify motion in the video using the joint semantic vectors generated for the features extracted from the video.
Type: Grant
Filed: June 17, 2021
Date of Patent: December 5, 2023
Assignee: Adobe Inc.
Inventors: Xingyu Liu, Hailin Jin, Joonyoung Lee
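The core pairing the abstract describes, semantic similarity combined with spatiotemporal displacement, can be illustrated with a toy correspondence search. Each feature carries a (t, y, x) location; for every feature we find its most semantically similar peers and build a joint vector from the feature itself plus (similarity, offset) pairs. The k value, the concatenation layout, and the feature values are assumptions for illustration, not the patented CP module.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def joint_semantic_vectors(features, k=2):
    """For each (location, feature) pair, find the k most semantically
    similar features elsewhere in the clip and append [similarity,
    dt, dy, dx] for each correspondence to the feature itself."""
    out = []
    for i, (loc_i, f_i) in enumerate(features):
        scored = sorted(
            ((cosine(f_i, f_j), loc_j)
             for j, (loc_j, f_j) in enumerate(features) if j != i),
            reverse=True,
        )
        joint = list(f_i)
        for sim, loc_j in scored[:k]:
            offset = [b - a for a, b in zip(loc_i, loc_j)]  # (dt, dy, dx)
            joint.extend([sim] + offset)
        out.append(joint)
    return out

# Toy features: ((t, y, x) location, feature vector).
feats = [
    ((0, 1, 1), [1.0, 0.0]),
    ((1, 1, 2), [0.9, 0.1]),  # same appearance, shifted right one frame later
    ((0, 5, 5), [0.0, 1.0]),  # unrelated feature
]
joints = joint_semantic_vectors(feats, k=1)
```

For the first feature, the best correspondence is the second one, so its joint vector encodes a displacement of one frame and one pixel to the right, exactly the kind of motion signal the abstract says the network reads off.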
-
Publication number: 20230386208
Abstract: Systems and methods for video segmentation and summarization are described. Embodiments of the present disclosure receive a video and a transcript of the video; generate visual features representing frames of the video using an image encoder; generate language features representing the transcript using a text encoder, wherein the image encoder and the text encoder are trained based on a correlation between training visual features and training language features; and segment the video into a plurality of video segments based on the visual features and the language features.
Type: Application
Filed: May 31, 2022
Publication date: November 30, 2023
Inventors: Hailin Jin, Jielin Qiu, Zhaowen Wang, Trung Huu Bui, Franck Dernoncourt
-
Publication number: 20230386054
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize deep learning to identify regions of an image that have been editorially modified. For example, the image comparison system includes a deep image comparator model that compares a pair of images and localizes regions that have been editorially manipulated relative to an original or trusted image. More specifically, the deep image comparator model generates and surfaces visual indications of the location of such editorial changes on the modified image. The deep image comparator model is robust and ignores discrepancies due to benign image transformations that commonly occur during electronic image distribution. The image comparison system optionally includes an image retrieval model that utilizes a visual search embedding robust to minor manipulations or benign modifications of images. The image retrieval model utilizes a visual search embedding for an image to robustly identify near-duplicate images.
Type: Application
Filed: May 27, 2022
Publication date: November 30, 2023
Inventors: John Collomosse, Alexander Black, Van Tu Bui, Hailin Jin, Viswanathan Swaminathan
-
Patent number: 11823322
Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for utilizing an encoder-decoder architecture to learn a volumetric 3D representation of an object using digital images of the object from multiple viewpoints to render novel views of the object. For instance, the disclosed systems can utilize patch-based image feature extraction to extract lifted feature representations from images corresponding to different viewpoints of an object. Furthermore, the disclosed systems can model view-dependent transformed feature representations using learned transformation kernels. In addition, the disclosed systems can recurrently and concurrently aggregate the transformed feature representations to generate a 3D voxel representation of the object. Furthermore, the disclosed systems can sample frustum features using the 3D voxel representation and transformation kernels.
Type: Grant
Filed: June 16, 2022
Date of Patent: November 21, 2023
Assignee: Adobe Inc.
Inventors: Tong He, John Collomosse, Hailin Jin
-
Patent number: 11810374
Abstract: In implementations of recognizing text in images, text recognition systems are trained using noisy images that have nuisance factors applied, and corresponding clean images (e.g., without nuisance factors). Clean images serve as supervision at both feature and pixel levels, so that text recognition systems are trained to be feature invariant (e.g., by requiring features extracted from a noisy image to match features extracted from a clean image), and feature complete (e.g., by requiring that features extracted from a noisy image be sufficient to generate a clean image). Accordingly, text recognition systems generalize to text not included in training images, and are robust to nuisance factors. Furthermore, since clean images are provided as supervision at feature and pixel levels, training requires fewer training images than text recognition systems that are not trained with a supervisory clean image, thus saving time and resources.
Type: Grant
Filed: April 26, 2021
Date of Patent: November 7, 2023
Assignee: Adobe Inc.
Inventors: Zhaowen Wang, Hailin Jin, Yang Liu
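The two supervision signals in this abstract translate naturally into two loss terms: a feature-level loss pulling noisy-image features toward clean-image features (feature invariance) and a pixel-level loss requiring the noisy features to reconstruct the clean image (feature completeness). The mean-squared-error form, the plain lists standing in for tensors, and all numeric values below are illustrative assumptions.

```python
def feature_invariance_losses(f_noisy, f_clean, reconstructed, clean_pixels):
    """Return (feature-level MSE, pixel-level MSE): the first pushes
    features of a noisy image toward those of its clean counterpart,
    the second requires those features to reconstruct the clean image."""
    feat_loss = sum((a - b) ** 2 for a, b in zip(f_noisy, f_clean)) / len(f_clean)
    pixel_loss = sum((a - b) ** 2
                     for a, b in zip(reconstructed, clean_pixels)) / len(clean_pixels)
    return feat_loss, pixel_loss

# Illustrative values only: short "feature" and "pixel" vectors.
feat_loss, pixel_loss = feature_invariance_losses(
    f_noisy=[0.9, 0.1],
    f_clean=[1.0, 0.0],
    reconstructed=[0.4, 0.6, 0.5],
    clean_pixels=[0.5, 0.5, 0.5],
)
total = feat_loss + pixel_loss  # in training, both terms would be minimized jointly
```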
-
Patent number: 11776180
Abstract: Embodiments of the present disclosure are directed towards improved models trained using unsupervised domain adaptation. In particular, a style-content adaptation system provides improved translation during unsupervised domain adaptation by controlling the alignment of conditional distributions of a model during training such that content (e.g., a class) from a target domain is correctly mapped to content (e.g., the same class) in a source domain. The style-content adaptation system improves unsupervised domain adaptation using independent control over content (e.g., related to a class) as well as style (e.g., related to a domain) to control alignment when translating between the source and target domain. This independent control over content and style can also allow for images to be generated using the style-content adaptation system that contain desired content and/or style.
Type: Grant
Filed: February 26, 2020
Date of Patent: October 3, 2023
Assignee: Adobe Inc.
Inventors: Ning Xu, Bayram Safa Cicek, Hailin Jin, Zhaowen Wang
-
Patent number: 11709885
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly identifying digital images with similar style to a query digital image using fine-grain style determination via weakly supervised style extraction neural networks. For example, the disclosed systems can extract a style embedding from a query digital image using a style extraction neural network such as a novel two-branch autoencoder architecture or a weakly supervised discriminative neural network. The disclosed systems can generate a combined style embedding by combining complementary style embeddings from different style extraction neural networks. Moreover, the disclosed systems can search a repository of digital images to identify digital images with similar style to the query digital image.
Type: Grant
Filed: September 18, 2020
Date of Patent: July 25, 2023
Assignee: Adobe Inc.
Inventors: John Collomosse, Zhe Lin, Saeid Motiian, Hailin Jin, Baldo Faieta, Alex Filipkowski
-
Patent number: 11676060
Abstract: Digital content interaction prediction and training techniques that address imbalanced classes are described. In one or more implementations, a digital medium environment is described to predict user interaction with digital content that addresses an imbalance of numbers included in first and second classes in training data used to train a model using machine learning. The training data is received that describes the first class and the second class. A model is trained using machine learning. The training includes sampling the training data to include at least one subset of the training data from the first class and at least one subset of the training data from the second class. Iterative selections are made of a batch from the sampled training data. The iteratively selected batches are iteratively processed by a classifier implemented using machine learning to train the model.
Type: Grant
Filed: January 20, 2016
Date of Patent: June 13, 2023
Assignee: Adobe Inc.
Inventors: Anirban Roychowdhury, Hung H. Bui, Trung H. Bui, Hailin Jin
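The sampling idea in this abstract, drawing each training batch from both classes so the minority class is not swamped, can be sketched in a few lines. The equal 50/50 split, batch size, and toy data below are illustrative assumptions; the classifier that would consume these batches is omitted.

```python
import random

def balanced_batches(class_a, class_b, batch_size=4, num_batches=3, seed=0):
    """Yield batches that draw equally from each class, so an imbalance
    in the raw training data does not carry over into the batches the
    classifier sees."""
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(num_batches):
        batch = rng.sample(class_a, half) + rng.sample(class_b, half)
        rng.shuffle(batch)  # avoid a fixed class ordering within the batch
        yield batch

# Toy imbalanced data: many negatives, very few positives.
negatives = [("neg", i) for i in range(100)]
positives = [("pos", i) for i in range(5)]
batches = list(balanced_batches(negatives, positives))
```

Each batch contains the same number of examples from both classes even though the underlying data is 20:1 imbalanced.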
-
Patent number: 11636147
Abstract: The present disclosure relates to a tag-based font recognition system that utilizes a multi-learning framework to develop and improve tag-based font recognition using deep learning neural networks. In particular, the tag-based font recognition system jointly trains a font tag recognition neural network with an implicit font classification attention model to generate font tag probability vectors that are enhanced by implicit font classification information. Indeed, the font recognition system weights the hidden layers of the font tag recognition neural network with implicit font information to improve the accuracy and predictability of the font tag recognition neural network, which results in improved retrieval of fonts in response to a font tag query. Accordingly, using the enhanced tag probability vectors, the tag-based font recognition system can accurately identify and recommend one or more fonts in response to a font tag query.
Type: Grant
Filed: January 26, 2022
Date of Patent: April 25, 2023
Assignee: Adobe Inc.
Inventors: Zhaowen Wang, Tianlang Chen, Ning Xu, Hailin Jin
-
Publication number: 20230115551
Abstract: Methods, system, and computer storage media are provided for multi-modal localization. Input data comprising two modalities, such as image data and corresponding text or audio data, may be received. A phrase may be extracted from the text or audio data, and a neural network system may be utilized to spatially and temporally localize the phrase within the image data. The neural network system may include a plurality of cross-modal attention layers that each compare features across the first and second modalities without comparing features of the same modality. Using the cross-modal attention layers, a region or subset of pixels within one or more frames of the image data may be identified as corresponding to the phrase, and a localization indicator may be presented for display with the image data. Embodiments may also include unsupervised training of the neural network system.
Type: Application
Filed: October 12, 2021
Publication date: April 13, 2023
Inventors: Hailin Jin, Bryan Russell, Reuben Xin Hong Tan
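The distinguishing pattern here, attention layers where features of one modality attend only to features of the other modality, can be shown with a minimal single-head dot-product attention sketch. The two-dimensional features and their values are invented for illustration; a real system would use learned projections and many such layers.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_modal_attention(queries, keys_values):
    """One attention layer in which queries from one modality attend
    only to keys/values from the other modality; features of the same
    modality are never compared with each other."""
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        attended = [sum(w * v[d] for w, v in zip(weights, keys_values))
                    for d in range(len(keys_values[0]))]
        out.append(attended)
    return out

# Toy example: one text-phrase feature attends over two image-region features.
text_feats = [[1.0, 0.0]]
image_feats = [[4.0, 0.0], [0.0, 4.0]]
attended = cross_modal_attention(text_feats, image_feats)
```

The phrase feature ends up dominated by the image region it aligns with, which is the localization signal the abstract describes surfacing as an indicator.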
-
Publication number: 20230110114
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for accurately and flexibly generating scalable fonts utilizing multi-implicit neural font representations. For instance, the disclosed systems combine deep learning with differentiable rasterization to generate a multi-implicit neural font representation of a glyph. For example, the disclosed systems utilize an implicit differentiable font neural network to determine a font style code for an input glyph as well as distance values for locations of the glyph to be rendered based on a glyph label and the font style code. Further, the disclosed systems rasterize the distance values utilizing a differentiable rasterization model and combine the rasterized distance values to generate a permutation-invariant version of the glyph corresponding to the glyph set.
Type: Application
Filed: October 12, 2021
Publication date: April 13, 2023
Inventors: Chinthala Pradyumna Reddy, Zhifei Zhang, Matthew Fisher, Hailin Jin, Zhaowen Wang, Niloy J Mitra
-
Patent number: 11615308
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for generating a response to a question received from a user during display or playback of a video segment by utilizing a query-response-neural network. The disclosed systems can extract a query vector from a question corresponding to the video segment using the query-response-neural network. The disclosed systems further generate context vectors representing both visual cues and transcript cues corresponding to the video segment using context encoders or other layers from the query-response-neural network. By utilizing additional layers from the query-response-neural network, the disclosed systems generate (i) a query-context vector based on the query vector and the context vectors, and (ii) candidate-response vectors representing candidate responses to the question from a domain-knowledge base or other source.
Type: Grant
Filed: December 28, 2021
Date of Patent: March 28, 2023
Assignee: Adobe Inc.
Inventors: Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin
-
Publication number: 20230075087
Abstract: The disclosed invention includes systems and methods for training and employing equivariant models for generating representations (e.g., vector representations) of temporally-varying content, such as but not limited to video content. The trained models are equivariant to temporal transformations applied to the input content (e.g., video content). The trained models are additionally invariant to non-temporal transformations (e.g., spatial and/or color-space transformations) applied to the input content. Such representations are employed in various machine learning tasks, such as but not limited to video retrieval (e.g., video search engine applications), identification of actions depicted in video, and temporally ordering clips of the video.
Type: Application
Filed: September 3, 2021
Publication date: March 9, 2023
Inventors: Simon Jenni, Hailin Jin
-
Publication number: 20220414314
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly generating scalable and semantically editable font representations utilizing a machine learning approach. For example, the disclosed systems generate a font representation code from a glyph utilizing a particular neural network architecture. Specifically, the disclosed systems utilize a glyph appearance propagation model and perform an iterative process to generate a font representation code from an initial glyph. Additionally, using a glyph appearance propagation model, the disclosed systems automatically propagate the appearance of the initial glyph from the font representation code to generate additional glyphs corresponding to respective glyph labels. In some embodiments, the disclosed systems propagate edits or other changes in appearance of a glyph to other glyphs within a glyph set (e.g., to match the appearance of the edited glyph).
Type: Application
Filed: June 29, 2021
Publication date: December 29, 2022
Inventors: Zhifei Zhang, Zhaowen Wang, Hailin Jin, Matthew Fisher
-
Publication number: 20220414338
Abstract: Systems and methods for a text summarization system are described. In one example, a text summarization system receives an input utterance and determines whether the utterance should be included in a summary of the text. The text summarization system includes an embedding network, a convolution network, an encoding component, and a summary component. The embedding network generates a semantic embedding of an utterance. The convolution network generates a plurality of feature vectors based on the semantic embedding. The encoding component identifies a plurality of latent codes respectively corresponding to the plurality of feature vectors. The summary component identifies a prominent code among the latent codes and selects the utterance as a summary utterance based on the prominent code.
Type: Application
Filed: June 29, 2021
Publication date: December 29, 2022
Inventors: Sangwoo Cho, Franck Dernoncourt, Timothy Jeewun Ganter, Trung Huu Bui, Nedim Lipka, Varun Manjunatha, Walter Chang, Hailin Jin, Jonathan Brandt
-
Publication number: 20220327767
Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for utilizing an encoder-decoder architecture to learn a volumetric 3D representation of an object using digital images of the object from multiple viewpoints to render novel views of the object. For instance, the disclosed systems can utilize patch-based image feature extraction to extract lifted feature representations from images corresponding to different viewpoints of an object. Furthermore, the disclosed systems can model view-dependent transformed feature representations using learned transformation kernels. In addition, the disclosed systems can recurrently and concurrently aggregate the transformed feature representations to generate a 3D voxel representation of the object. Furthermore, the disclosed systems can sample frustum features using the 3D voxel representation and transformation kernels.
Type: Application
Filed: June 16, 2022
Publication date: October 13, 2022
Inventors: Tong He, John Collomosse, Hailin Jin