Patents by Inventor Hailin Jin

Hailin Jin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11977829
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly generating scalable and semantically editable font representations utilizing a machine learning approach. For example, the disclosed systems generate a font representation code from a glyph utilizing a particular neural network architecture. Specifically, the disclosed systems utilize a glyph appearance propagation model and perform an iterative process to generate a font representation code from an initial glyph. Additionally, using the glyph appearance propagation model, the disclosed systems automatically propagate the appearance of the initial glyph from the font representation code to generate additional glyphs corresponding to respective glyph labels. In some embodiments, the disclosed systems propagate edits or other changes in appearance of a glyph to other glyphs within a glyph set (e.g., to match the appearance of the edited glyph).
    Type: Grant
    Filed: June 29, 2021
    Date of Patent: May 7, 2024
    Assignee: Adobe Inc.
    Inventors: Zhifei Zhang, Zhaowen Wang, Hailin Jin, Matthew Fisher
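
A minimal sketch of the propagation idea described in patent 11977829 above: encode one glyph into a font representation code, then decode that single code against every glyph label to propagate the font's appearance across a glyph set. The architecture, layer sizes, 32x32 glyph resolution, and 52-letter label set are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

NUM_GLYPHS = 52          # assumed label vocabulary (A-Z, a-z)
CODE_DIM = 128           # assumed size of the font representation code

class GlyphPropagator(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: glyph image -> font representation code
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 32, 256), nn.ReLU(),
            nn.Linear(256, CODE_DIM),
        )
        self.label_embed = nn.Embedding(NUM_GLYPHS, CODE_DIM)
        # Decoder: (font code, glyph label) -> glyph image
        self.decoder = nn.Sequential(
            nn.Linear(2 * CODE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 32 * 32), nn.Sigmoid(),
        )

    def forward(self, glyph_image, target_labels):
        code = self.encoder(glyph_image)                 # (1, CODE_DIM)
        codes = code.expand(target_labels.shape[0], -1)  # share one font code
        labels = self.label_embed(target_labels)
        return self.decoder(torch.cat([codes, labels], dim=1)).view(-1, 32, 32)

model = GlyphPropagator()
edited_glyph = torch.rand(1, 32, 32)       # e.g. a user-edited letter "A"
targets = torch.arange(NUM_GLYPHS)         # propagate to every glyph label
glyph_set = model(edited_glyph, targets)   # (52, 32, 32) rendered glyph set
```
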
  • Patent number: 11966849
    Abstract: Techniques and systems are provided for configuring neural networks to perform certain image manipulation operations. For instance, in response to obtaining an image for manipulation, an image manipulation system determines the fitness scores for a set of neural networks resulting from the processing of a noise map. Based on these fitness scores, the image manipulation system selects a subset of the set of neural networks for cross-breeding into a new generation of neural networks. The image manipulation system evaluates the performance of this new generation of neural networks and continues cross-breeding these networks until a fitness threshold is satisfied. From the final generation of neural networks, the image manipulation system selects a neural network that provides a desired output and uses that neural network to generate the manipulated image.
    Type: Grant
    Filed: February 20, 2020
    Date of Patent: April 23, 2024
    Assignee: Adobe Inc.
    Inventors: John Collomosse, Hailin Jin
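
The abstract of patent 11966849 above describes an evolutionary search over neural networks. Below is a hedged sketch of that loop with each candidate network reduced to a flat parameter vector; the stand-in fitness function, population size, mutation scale, and threshold are assumptions for illustration, not the patented method.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(params, noise_map):
    # Stand-in for scoring a candidate network's output on a noise map.
    return -float((params @ noise_map) ** 2)

def crossover(a, b):
    mask = rng.random(a.shape) < 0.5                  # uniform crossover
    return np.where(mask, a, b) + rng.normal(0.0, 0.01, a.shape)  # + mutation

POP, DIM, THRESHOLD = 20, 16, -1e-4                   # assumed hyperparameters
noise_map = rng.normal(size=DIM)
population = [rng.normal(size=DIM) for _ in range(POP)]

for generation in range(200):
    scores = [fitness(p, noise_map) for p in population]
    if max(scores) >= THRESHOLD:                      # fitness threshold satisfied
        break
    # Select the fittest half for cross-breeding into the next generation.
    parents = [population[i] for i in np.argsort(scores)[POP // 2:]]
    population = []
    for _ in range(POP):
        i, j = rng.choice(len(parents), size=2, replace=False)
        population.append(crossover(parents[i], parents[j]))

scores = [fitness(p, noise_map) for p in population]
best = population[int(np.argmax(scores))]   # network used to produce the output
```
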
  • Patent number: 11949964
    Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize to generate tags for an input digital video. For instance, the disclosed systems can extract a set of frames for the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital video by aggregating one or more tags corresponding to identified similar tagged feature vectors.
    Type: Grant
    Filed: September 9, 2021
    Date of Patent: April 2, 2024
    Assignee: Adobe Inc.
    Inventors: Bryan Russell, Ruppesh Nalwaya, Markus Woodson, Joon-Young Lee, Hailin Jin
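
A small sketch of the retrieval-and-aggregation step described in patent 11949964 above: aggregate per-frame features into one vector, find the most similar tagged vectors in a corpus, and vote their tags. The mean-pooling, cosine similarity, and k=3 are assumptions for illustration; the patent's feature extractor is not shown.

```python
import numpy as np

def aggregate_frames(frame_features):
    # Mean-pool per-frame features into one aggregated feature vector.
    return frame_features.mean(axis=0)

def tag_video(frame_features, corpus_vecs, corpus_tags, k=3):
    q = aggregate_frames(frame_features)
    q = q / np.linalg.norm(q)
    sims = corpus_vecs @ q                    # cosine similarity (rows unit-norm)
    top = np.argsort(sims)[-k:]
    votes = {}
    for i in top:                             # aggregate neighbors' tags
        for t in corpus_tags[i]:
            votes[t] = votes.get(t, 0.0) + float(sims[i])
    return sorted(votes, key=votes.get, reverse=True)

rng = np.random.default_rng(1)
corpus = rng.normal(size=(100, 64))
corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
tags = [[f"tag{i % 10}"] for i in range(100)]  # toy tag inventory
video = rng.normal(size=(30, 64))              # 30 frames of 64-d features
print(tag_video(video, corpus, tags))
```
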
  • Patent number: 11875435
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for accurately and flexibly generating scalable fonts utilizing multi-implicit neural font representations. For instance, the disclosed systems combine deep learning with differentiable rasterization to generate a multi-implicit neural font representation of a glyph. For example, the disclosed systems utilize an implicit differentiable font neural network to determine a font style code for an input glyph as well as distance values for locations of the glyph to be rendered based on a glyph label and the font style code. Further, the disclosed systems rasterize the distance values utilizing a differentiable rasterization model and combine the rasterized distance values to generate a permutation-invariant version of the glyph for the corresponding glyph set.
    Type: Grant
    Filed: October 12, 2021
    Date of Patent: January 16, 2024
    Assignee: Adobe Inc.
    Inventors: Chinthala Pradyumna Reddy, Zhifei Zhang, Matthew Fisher, Hailin Jin, Zhaowen Wang, Niloy J Mitra
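
A toy in the spirit of the differentiable rasterization described in patent 11875435 above (the smooth step and the circle signed-distance field are assumptions, not the patented model): distance values at pixel locations are mapped to soft coverage, so a pixel loss backpropagates to the shape parameters, just as it would to an implicit font network.

```python
import torch

def rasterize(sdf, sharpness=50.0):
    # Soft step: negative distance (inside) -> ~1, positive (outside) -> ~0.
    return torch.sigmoid(-sharpness * sdf)

# Sample pixel locations on a 64x64 canvas in [-1, 1]^2.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 64),
                        torch.linspace(-1, 1, 64), indexing="ij")
center = torch.tensor([0.0, 0.0], requires_grad=True)
radius = torch.tensor(0.5, requires_grad=True)

# Signed distance of each pixel to a circle (stand-in for a glyph SDF).
sdf = torch.sqrt((xs - center[0]) ** 2 + (ys - center[1]) ** 2) - radius
image = rasterize(sdf)

# Gradients flow through the rasterizer back to the distance field's parameters.
loss = ((image - torch.ones_like(image)) ** 2).mean()
loss.backward()
print(radius.grad)
```
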
  • Patent number: 11836932
    Abstract: Technology is disclosed herein for learning motion in video. In an implementation, an artificial neural network extracts features from a video. A correspondence proposal (CP) module performs, for at least some of the features, a search for corresponding features in the video based on a semantic similarity of a given feature to others of the features. The CP module then generates a joint semantic vector for each of the features based at least on the semantic similarity of the given feature to one or more of the corresponding features and a spatiotemporal distance of the given feature to the one or more of the corresponding features. The artificial neural network is able to identify motion in the video using the joint semantic vectors generated for the features extracted from the video.
    Type: Grant
    Filed: June 17, 2021
    Date of Patent: December 5, 2023
    Assignee: Adobe Inc.
    Inventors: Xingyu Liu, Hailin Jin, Joon-Young Lee
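
A sketch of the correspondence-proposal step described in patent 11836932 above: for each feature, find its top-k most semantically similar features elsewhere in the video, then build a joint semantic vector from the matched features and their spatiotemporal offsets. The k, dimensions, and simple mean/concatenation are assumptions; the patent's CP module is a trained component.

```python
import torch
import torch.nn.functional as F

T, H, W, C, K = 4, 8, 8, 32, 3             # frames, height, width, channels, k
feats = torch.randn(T * H * W, C)

# Spatiotemporal position (t, y, x) of every feature.
pos = torch.stack(torch.meshgrid(
    torch.arange(T), torch.arange(H), torch.arange(W), indexing="ij"),
    dim=-1).reshape(-1, 3).float()

normed = F.normalize(feats, dim=1)
sim = normed @ normed.t()                  # semantic similarity matrix
sim.fill_diagonal_(-1.0)                   # exclude self-matches
topk = sim.topk(K, dim=1).indices          # top-k corresponding features

# Joint semantic vector: own feature, mean matched feature, mean offset.
matched = feats[topk].mean(dim=1)
offset = (pos[topk] - pos[:, None, :]).mean(dim=1)
joint = torch.cat([feats, matched, offset], dim=1)   # (T*H*W, 2C + 3)
print(joint.shape)
```
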
  • Publication number: 20230386208
    Abstract: Systems and methods for video segmentation and summarization are described. Embodiments of the present disclosure receive a video and a transcript of the video; generate visual features representing frames of the video using an image encoder; generate language features representing the transcript using a text encoder, wherein the image encoder and the text encoder are trained based on a correlation between training visual features and training language features; and segment the video into a plurality of video segments based on the visual features and the language features.
    Type: Application
    Filed: May 31, 2022
    Publication date: November 30, 2023
    Inventors: Hailin Jin, Jielin Qiu, Zhaowen Wang, Trung Huu Bui, Franck Dernoncourt
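
An illustrative boundary-detection pass over fused visual and language features, loosely following publication 20230386208 above. The concatenation fusion, cosine threshold, and random stand-in features are assumptions; the patent application trains real image and text encoders on cross-modal correlation.

```python
import numpy as np

def segment(visual, language, threshold=0.5):
    fused = np.concatenate([visual, language], axis=1)
    fused = fused / np.linalg.norm(fused, axis=1, keepdims=True)
    sims = (fused[:-1] * fused[1:]).sum(axis=1)    # adjacent cosine similarity
    cuts = np.flatnonzero(sims < threshold) + 1    # cut after low-sim pairs
    return np.split(np.arange(len(visual)), cuts)  # frame indices per segment

rng = np.random.default_rng(2)
visual = rng.normal(size=(120, 32))     # per-frame visual features
language = rng.normal(size=(120, 32))   # transcript features aligned to frames
for seg in segment(visual, language)[:3]:
    print(seg[0], "-", seg[-1])
```
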
  • Publication number: 20230386054
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize deep learning to identify regions of an image that have been editorially modified. For example, the image comparison system includes a deep image comparator model that compares a pair of images and localizes regions that have been editorially manipulated relative to an original or trusted image. More specifically, the deep image comparator model generates and surfaces visual indications of the location of such editorial changes on the modified image. The deep image comparator model is robust and ignores discrepancies due to benign image transformations that commonly occur during electronic image distribution. The image comparison system optionally includes an image retrieval model that utilizes a visual search embedding robust to minor manipulations or benign modifications of images, enabling it to robustly identify near-duplicate images.
    Type: Application
    Filed: May 27, 2022
    Publication date: November 30, 2023
    Inventors: John Collomosse, Alexander Black, Van Tu Bui, Hailin Jin, Viswanathan Swaminathan
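
A toy localization pass in the spirit of publication 20230386054 above: score differences per patch against a trusted image and surface them as a map. The patch size and raw pixel differencing are assumptions; the patented comparator is a trained network that also ignores benign transforms, which this toy does not.

```python
import numpy as np

def change_heatmap(trusted, modified, patch=16):
    h, w = trusted.shape[:2]
    heat = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            a = trusted[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            b = modified[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            heat[i, j] = np.abs(a - b).mean()  # stand-in for learned comparison
    return heat / (heat.max() + 1e-8)

rng = np.random.default_rng(3)
original = rng.random((128, 128, 3))
edited = original.copy()
edited[32:64, 32:64] += 0.5                    # simulated editorial manipulation
print(np.unravel_index(change_heatmap(original, edited).argmax(), (8, 8)))
```
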
  • Patent number: 11823322
    Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for utilizing an encoder-decoder architecture to learn a volumetric 3D representation of an object using digital images of the object from multiple viewpoints to render novel views of the object. For instance, the disclosed systems can utilize patch-based image feature extraction to extract lifted feature representations from images corresponding to different viewpoints of an object. Furthermore, the disclosed systems can model view-dependent transformed feature representations using learned transformation kernels. In addition, the disclosed systems can recurrently and concurrently aggregate the transformed feature representations to generate a 3D voxel representation of the object. Furthermore, the disclosed systems can sample frustum features using the 3D voxel representation and transformation kernels.
    Type: Grant
    Filed: June 16, 2022
    Date of Patent: November 21, 2023
    Assignee: Adobe Inc.
    Inventors: Tong He, John Collomosse, Hailin Jin
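
A much-simplified sketch of aggregating multi-view features into a voxel grid, after patent 11823322 above. The toy "projection" and average pooling are assumptions; the patent uses learned transformation kernels, recurrent and concurrent aggregation, and frustum feature sampling, none of which are reproduced here.

```python
import torch

V, C, G = 4, 16, 8                        # views, feature channels, grid size
view_feats = torch.randn(V, C)            # one lifted feature per view (toy)
voxels = torch.zeros(C, G, G, G)
counts = torch.zeros(1, G, G, G)

for v in range(V):
    # Toy "projection": each view covers a different half of the grid.
    mask = torch.zeros(G, G, G, dtype=torch.bool)
    if v % 2 == 0:
        mask[: G // 2] = True
    else:
        mask[G // 2:] = True
    voxels[:, mask] += view_feats[v][:, None]
    counts[:, mask] += 1

voxels = voxels / counts.clamp(min=1)     # aggregate across views
print(voxels.shape)                       # (C, G, G, G) volumetric representation
```
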
  • Patent number: 11810374
    Abstract: In implementations of recognizing text in images, text recognition systems are trained using noisy images that have nuisance factors applied, and corresponding clean images (e.g., without nuisance factors). Clean images serve as supervision at both feature and pixel levels, so that text recognition systems are trained to be feature invariant (e.g., by requiring features extracted from a noisy image to match features extracted from a clean image), and feature complete (e.g., by requiring that features extracted from a noisy image be sufficient to generate a clean image). Accordingly, text recognition systems generalize to text not included in training images, and are robust to nuisance factors. Furthermore, since clean images are provided as supervision at feature and pixel levels, training requires fewer training images than text recognition systems that are not trained with a supervisory clean image, thus saving time and resources.
    Type: Grant
    Filed: April 26, 2021
    Date of Patent: November 7, 2023
    Assignee: Adobe Inc.
    Inventors: Zhaowen Wang, Hailin Jin, Yang Liu
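
A sketch of the two supervision signals named in patent 11810374 above: features from the noisy image must match features from the clean image (feature invariance), and must suffice to reconstruct the clean image (feature completeness). The network shapes and unit loss weights are assumptions; the patent's recognition loss is omitted.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64))
decoder = nn.Sequential(nn.Linear(64, 32 * 32), nn.Sigmoid())

clean = torch.rand(8, 32, 32)             # clean text images (supervision)
noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(0, 1)  # nuisance factors

f_noisy, f_clean = encoder(noisy), encoder(clean)
recon = decoder(f_noisy).view(-1, 32, 32)

feature_loss = (f_noisy - f_clean.detach()).pow(2).mean()  # feature invariance
pixel_loss = (recon - clean).pow(2).mean()                 # feature completeness
loss = feature_loss + pixel_loss
loss.backward()
```
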
  • Patent number: 11776180
    Abstract: Embodiments of the present disclosure are directed towards improved models trained using unsupervised domain adaptation. In particular, a style-content adaptation system provides improved translation during unsupervised domain adaptation by controlling the alignment of conditional distributions of a model during training such that content (e.g., a class) from a target domain is correctly mapped to content (e.g., the same class) in a source domain. The style-content adaptation system improves unsupervised domain adaptation using independent control over content (e.g., related to a class) as well as style (e.g., related to a domain) to control alignment when translating between the source and target domain. This independent control over content and style can also allow for images to be generated using the style-content adaptation system that contain desired content and/or style.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: October 3, 2023
    Assignee: Adobe Inc.
    Inventors: Ning Xu, Bayram Safa Cicek, Hailin Jin, Zhaowen Wang
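
A sketch of the independent content/style control described in patent 11776180 above: one latent carries the class (content), another the domain (style), so translation can change the domain while holding the class fixed. The embedding-plus-MLP generator and all sizes are assumptions, not the patented adaptation system.

```python
import torch
import torch.nn as nn

CONTENT, STYLE = 16, 8
content_embed = nn.Embedding(10, CONTENT)   # class -> content code
style_embed = nn.Embedding(2, STYLE)        # domain -> style code
generator = nn.Sequential(nn.Linear(CONTENT + STYLE, 256), nn.ReLU(),
                          nn.Linear(256, 28 * 28), nn.Tanh())

digit = torch.tensor([3])                   # content: class "3"
source, target = torch.tensor([0]), torch.tensor([1])

c = content_embed(digit)
in_source = generator(torch.cat([c, style_embed(source)], dim=1))
in_target = generator(torch.cat([c, style_embed(target)], dim=1))
# Same content code, different style code: class 3 rendered in both domains.
print(in_source.shape, in_target.shape)
```
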
  • Patent number: 11709885
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly identifying digital images with similar style to a query digital image using fine-grain style determination via weakly supervised style extraction neural networks. For example, the disclosed systems can extract a style embedding from a query digital image using a style extraction neural network such as a novel two-branch autoencoder architecture or a weakly supervised discriminative neural network. The disclosed systems can generate a combined style embedding by combining complementary style embeddings from different style extraction neural networks. Moreover, the disclosed systems can search a repository of digital images to identify digital images with similar style to the query digital image.
    Type: Grant
    Filed: September 18, 2020
    Date of Patent: July 25, 2023
    Assignee: Adobe Inc.
    Inventors: John Collomosse, Zhe Lin, Saeid Motiian, Hailin Jin, Baldo Faieta, Alex Filipkowski
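
A sketch of style search with a combined embedding, after patent 11709885 above. Both "extractors" here are random projections standing in for the patent's trained two-branch autoencoder and weakly supervised network; combining by concatenation is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
proj_a = rng.normal(size=(512, 64))   # stand-in: autoencoder-style extractor
proj_b = rng.normal(size=(512, 64))   # stand-in: weakly supervised extractor

def style_embedding(image_features):
    # Combine complementary embeddings from the two extractors.
    e = np.concatenate([image_features @ proj_a, image_features @ proj_b])
    return e / np.linalg.norm(e)

repository = rng.normal(size=(1000, 512))
index = np.stack([style_embedding(x) for x in repository])

query = repository[7] + 0.01 * rng.normal(size=512)  # near-duplicate style
scores = index @ style_embedding(query)
print(np.argsort(scores)[-5:])        # images most similar in style
```
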
  • Patent number: 11676060
    Abstract: Digital content interaction prediction and training techniques that address imbalanced classes are described. In one or more implementations, a digital medium environment is described for predicting user interaction with digital content while addressing an imbalance between the numbers of training samples included in first and second classes of the training data used to train a model using machine learning. The training data is received that describes the first class and the second class. A model is trained using machine learning. The training includes sampling the training data to include at least one subset of the training data from the first class and at least one subset of the training data from the second class. Iterative selections are made of a batch from the sampled training data. The iteratively selected batches are iteratively processed by a classifier implemented using machine learning to train the model.
    Type: Grant
    Filed: January 20, 2016
    Date of Patent: June 13, 2023
    Assignee: Adobe Inc.
    Inventors: Anirban Roychowdhury, Hung H. Bui, Trung H. Bui, Hailin Jin
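
A sketch of class-balanced batch sampling for imbalanced training data, after patent 11676060 above: sample equal-size subsets from each class, then iteratively select batches for a classifier. The batch size, the 2% positive rate, and the toy logistic update are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(10_000, 8))
y = (rng.random(10_000) < 0.02).astype(int)    # 2% positives: imbalanced

pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)

def balanced_batch(batch_size=64):
    # Sample equal-size subsets from each class, then shuffle together.
    half = batch_size // 2
    idx = np.concatenate([rng.choice(pos, half), rng.choice(neg, half)])
    rng.shuffle(idx)
    return X[idx], y[idx]

w = np.zeros(8)
for step in range(100):                        # iterative batch selection
    xb, yb = balanced_batch()
    margin = xb @ w
    grad = xb.T @ (1 / (1 + np.exp(-margin)) - yb) / len(yb)  # logistic grad
    w -= 0.1 * grad
```
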
  • Patent number: 11636147
    Abstract: The present disclosure relates to a tag-based font recognition system that utilizes a multi-learning framework to develop and improve tag-based font recognition using deep learning neural networks. In particular, the tag-based font recognition system jointly trains a font tag recognition neural network with an implicit font classification attention model to generate font tag probability vectors that are enhanced by implicit font classification information. Indeed, the font recognition system weights the hidden layers of the font tag recognition neural network with implicit font information to improve the accuracy and predictability of the font tag recognition neural network, which results in improved retrieval of fonts in response to a font tag query. Accordingly, using the enhanced tag probability vectors, the tag-based font recognition system can accurately identify and recommend one or more fonts in response to a font tag query.
    Type: Grant
    Filed: January 26, 2022
    Date of Patent: April 25, 2023
    Assignee: Adobe Inc.
    Inventors: Zhaowen Wang, Tianlang Chen, Ning Xu, Hailin Jin
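
A sketch of weighting a tag network's hidden layer with implicit font classification information, after patent 11636147 above. The layer sizes and the sigmoid gating form are assumptions; the patent jointly trains both networks, which this snippet does not show.

```python
import torch
import torch.nn as nn

HIDDEN, FONTS, TAGS = 128, 50, 200

tag_trunk = nn.Linear(512, HIDDEN)          # font tag recognition backbone
font_attention = nn.Sequential(             # implicit font classification
    nn.Linear(512, FONTS), nn.Softmax(dim=1),
    nn.Linear(FONTS, HIDDEN), nn.Sigmoid(), # font posterior -> hidden gate
)
tag_head = nn.Linear(HIDDEN, TAGS)

x = torch.randn(4, 512)                     # glyph image features (assumed)
hidden = torch.relu(tag_trunk(x))
gated = hidden * font_attention(x)          # weight hidden units by font info
tag_probs = torch.sigmoid(tag_head(gated))  # enhanced tag probability vector
print(tag_probs.shape)
```
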
  • Publication number: 20230115551
    Abstract: Methods, systems, and computer storage media are provided for multi-modal localization. Input data comprising two modalities, such as image data and corresponding text or audio data, may be received. A phrase may be extracted from the text or audio data, and a neural network system may be utilized to spatially and temporally localize the phrase within the image data. The neural network system may include a plurality of cross-modal attention layers that each compare features across the first and second modalities without comparing features of the same modality. Using the cross-modal attention layers, a region or subset of pixels within one or more frames of the image data may be identified as corresponding to the phrase, and a localization indicator may be presented for display with the image data. Embodiments may also include unsupervised training of the neural network system.
    Type: Application
    Filed: October 12, 2021
    Publication date: April 13, 2023
    Inventors: Hailin Jin, Bryan Russell, Reuben Xin Hong Tan
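
A sketch of one cross-modal attention step, after publication 20230115551 above: the phrase embedding attends over per-frame region features, never over features of its own modality, yielding a spatial relevance map for localization. The dimensions and single-frame setup are assumptions.

```python
import torch
import torch.nn.functional as F

D, REGIONS = 64, 8 * 8               # feature dim, regions per frame
phrase = torch.randn(1, D)           # encoded phrase (one query)
regions = torch.randn(REGIONS, D)    # one frame's region features

attn = F.softmax(phrase @ regions.t() / D ** 0.5, dim=1)  # (1, REGIONS)
heat = attn.view(8, 8)               # spatial localization map
print(heat.argmax())                 # region best matching the phrase
```
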
  • Publication number: 20230110114
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for accurately and flexibly generating scalable fonts utilizing multi-implicit neural font representations. For instance, the disclosed systems combine deep learning with differentiable rasterization to generate a multi-implicit neural font representation of a glyph. For example, the disclosed systems utilize an implicit differentiable font neural network to determine a font style code for an input glyph as well as distance values for locations of the glyph to be rendered based on a glyph label and the font style code. Further, the disclosed systems rasterize the distance values utilizing a differentiable rasterization model and combine the rasterized distance values to generate a permutation-invariant version of the glyph for the corresponding glyph set.
    Type: Application
    Filed: October 12, 2021
    Publication date: April 13, 2023
    Inventors: Chinthala Pradyumna Reddy, Zhifei Zhang, Matthew Fisher, Hailin Jin, Zhaowen Wang, Niloy J Mitra
  • Patent number: 11615308
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for generating a response to a question received from a user during display or playback of a video segment by utilizing a query-response-neural network. The disclosed systems can extract a query vector from a question corresponding to the video segment using the query-response-neural network. The disclosed systems further generate context vectors representing both visual cues and transcript cues corresponding to the video segment using context encoders or other layers from the query-response-neural network. By utilizing additional layers from the query-response-neural network, the disclosed systems generate (i) a query-context vector based on the query vector and the context vectors, and (ii) candidate-response vectors representing candidate responses to the question from a domain-knowledge base or other source.
    Type: Grant
    Filed: December 28, 2021
    Date of Patent: March 28, 2023
    Assignee: Adobe Inc.
    Inventors: Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin
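
A sketch of the final scoring step described in patent 11615308 above: fuse the query vector with visual and transcript context vectors into a query-context vector, then rank candidate-response vectors against it. The vector sizes and the concatenation-plus-projection fusion are assumptions.

```python
import torch
import torch.nn as nn

D = 64
fuse = nn.Linear(3 * D, D)                 # query + visual + transcript

query = torch.randn(1, D)                  # encoded user question
visual_ctx = torch.randn(1, D)             # visual cues of the video segment
transcript_ctx = torch.randn(1, D)         # transcript cues of the segment
candidates = torch.randn(10, D)            # candidate-response vectors

query_context = fuse(torch.cat([query, visual_ctx, transcript_ctx], dim=1))
scores = candidates @ query_context.t()    # (10, 1) response scores
print(scores.argmax().item())              # index of the selected response
```
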
  • Publication number: 20230075087
    Abstract: The disclosed invention includes systems and methods for training and employing equivariant models for generating representations (e.g., vector representations) of temporally-varying content, such as but not limited to video content. The trained models are equivariant to temporal transformations applied to the input content (e.g., video content). The trained models are additionally invariant to non-temporal transformations (e.g., spatial and/or color-space transformations) applied to the input content. Such representations are employed in various machine learning tasks, such as but not limited to video retrieval (e.g., video search engine applications), identification of actions depicted in video, and temporally ordering clips of the video.
    Type: Application
    Filed: September 3, 2021
    Publication date: March 9, 2023
    Inventors: Simon Jenni, Hailin Jin
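
A sketch of the training signal described in publication 20230075087 above: the representation must change predictably when the clip is temporally transformed (equivariance) and must not change under a non-temporal transform such as a color shift (invariance). The transform set, tiny encoder, and losses are assumptions.

```python
import torch
import torch.nn as nn

encode = nn.Sequential(nn.Flatten(), nn.Linear(16 * 3 * 8 * 8, 64))
predict_shift = nn.Linear(2 * 64, 1)       # recover the temporal shift

clip = torch.rand(1, 16, 3, 8, 8)          # (batch, frames, C, H, W)
shift = 4
shifted = torch.roll(clip, shifts=shift, dims=1)   # temporal transform
recolored = (clip * 0.8 + 0.1).clamp(0, 1)         # non-temporal transform

z, z_shift, z_color = encode(clip), encode(shifted), encode(recolored)
equiv_loss = (predict_shift(torch.cat([z, z_shift], 1)) - shift).pow(2).mean()
invar_loss = (z - z_color).pow(2).mean()
(equiv_loss + invar_loss).backward()
```
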
  • Publication number: 20220414314
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly generating scalable and semantically editable font representations utilizing a machine learning approach. For example, the disclosed systems generate a font representation code from a glyph utilizing a particular neural network architecture. Specifically, the disclosed systems utilize a glyph appearance propagation model and perform an iterative process to generate a font representation code from an initial glyph. Additionally, using the glyph appearance propagation model, the disclosed systems automatically propagate the appearance of the initial glyph from the font representation code to generate additional glyphs corresponding to respective glyph labels. In some embodiments, the disclosed systems propagate edits or other changes in appearance of a glyph to other glyphs within a glyph set (e.g., to match the appearance of the edited glyph).
    Type: Application
    Filed: June 29, 2021
    Publication date: December 29, 2022
    Inventors: Zhifei Zhang, Zhaowen Wang, Hailin Jin, Matthew Fisher
  • Publication number: 20220414338
    Abstract: Systems and methods for a text summarization system are described. In one example, a text summarization system receives an input utterance and determines whether the utterance should be included in a summary of the text. The text summarization system includes an embedding network, a convolution network, an encoding component, and a summary component. The embedding network generates a semantic embedding of an utterance. The convolution network generates a plurality of feature vectors based on the semantic embedding. The encoding component identifies a plurality of latent codes respectively corresponding to the plurality of feature vectors. The summary component identifies a prominent code among the latent codes and selects the utterance as a summary utterance based on the prominent code.
    Type: Application
    Filed: June 29, 2021
    Publication date: December 29, 2022
    Inventors: Sangwoo Cho, Franck Dernoncourt, Timothy Jeewun Ganter, Trung Huu Bui, Nedim Lipka, Varun Manjunatha, Walter Chang, Hailin Jin, Jonathan Brandt
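
A sketch of the prominent-code selection described in publication 20220414338 above: each utterance's feature vectors are snapped to their nearest latent codes, and utterances dominated by the corpus-wide prominent code are kept as summary utterances. The nearest-neighbor quantizer, random codebook, and 0.2 prominence cutoff are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
codebook = rng.normal(size=(16, 32))       # learned latent codes (assumed)

def codes_for(feature_vectors):
    # Nearest latent code for each feature vector of one utterance.
    d = ((feature_vectors[:, None] - codebook[None]) ** 2).sum(-1)
    return d.argmin(axis=1)

utterances = [rng.normal(size=(10, 32)) for _ in range(50)]
all_codes = np.concatenate([codes_for(u) for u in utterances])
prominent = np.bincount(all_codes).argmax()  # most frequent latent code

summary = [i for i, u in enumerate(utterances)
           if (codes_for(u) == prominent).mean() > 0.2]
print(summary[:5])
```
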
  • Publication number: 20220327767
    Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for utilizing an encoder-decoder architecture to learn a volumetric 3D representation of an object using digital images of the object from multiple viewpoints to render novel views of the object. For instance, the disclosed systems can utilize patch-based image feature extraction to extract lifted feature representations from images corresponding to different viewpoints of an object. Furthermore, the disclosed systems can model view-dependent transformed feature representations using learned transformation kernels. In addition, the disclosed systems can recurrently and concurrently aggregate the transformed feature representations to generate a 3D voxel representation of the object. Furthermore, the disclosed systems can sample frustum features using the 3D voxel representation and transformation kernels.
    Type: Application
    Filed: June 16, 2022
    Publication date: October 13, 2022
    Inventors: Tong He, John Collomosse, Hailin Jin