Patents by Inventor Kshitiz Garg

Kshitiz Garg has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

GENERATING CUSTOM ANIMATIONS UTILIZING GENERATIVE ARTIFICIAL INTELLIGENCE

Publication number: 20260105670

Abstract: Systems, methods, and non-transitory computer-readable media generate custom animations that comprise a structure of a coarse animation prompt. For example, the disclosed systems receive a style prompt and receive a coarse animation prompt. The disclosed systems generate, utilizing a media generation model, a custom animation having a structure and timing of the coarse animation prompt and a style informed by the style prompt. The disclosed systems also provide the custom animation for display via a graphical user interface.

Type: Application

Filed: September 30, 2025

Publication date: April 16, 2026

Inventors: Yangtuanfeng Wang, Li-Yi Wei, Wilmot Wei-Mau Li, Valerie Head, Seth Walker, Lakshya Lnu, Kshitiz Garg, Kazi Rubaiat Habib, Jun Saito, James Ratliff, Duygu Ceylan Aksit, Dafei Qin, Cameron Smith

Generating tile-able patterns from text

Patent number: 12579608

Abstract: Systems and methods for generating tile-able patterns from text include obtaining a text prompt and generating, by a generation prior model, a latent vector based on the text prompt, where the generation prior model is trained to output vectors within a distribution of tile-able patterns. An image generation model then generates an output image based on the latent vector. The output image comprises a tile-able pattern including an element from the text prompt.

Type: Grant

Filed: December 1, 2023

Date of Patent: March 17, 2026

Assignee: ADOBE INC.

Inventors: Vineet Batra, Sumit Chaturvedi, Abhishek Rai, Pranav Vineet Aggarwal, Ajinkya Gorakhnath Kale, Aman Jeph, Ankit Phogat, Sumit Dhingra, Fengbin Chen, Kshitiz Garg, Milos Hasan, Midhun Harikumar, Gaurav Suresh Pathak, Souymodip Chakraborty

CUSTOMIZABLE FRAMEWORK TO EXTRACT MOMENTS OF INTEREST

Publication number: 20260044563

Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for extracting moments of interest (e.g., video frames, video segments) from a video. In an example embodiment, independent and/or orthogonal machine learning models are used to extract different types of features considering different modalities, and each frame in the video is assigned an importance score for each model. The importance scores for each model are combined into an aggregated importance score for each frame in the video. Depending on the embodiment, the aggregated importance scores are used to visualize the score per frame, identify moments of interest, automatically crop down the video into a highlight reel, browse or visualize the moments of interest within the video, and/or search across multiple videos.

Type: Application

Filed: October 16, 2025

Publication date: February 12, 2026

Inventors: Ali AMINIAN, William Lawrence MARINO, Kshitiz GARG, Aseem Omprakash AGARWALA
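
The score aggregation described in the "Customizable framework to extract moments of interest" abstract above can be illustrated with a minimal sketch. It assumes a weighted average as the combination rule and a fixed threshold for picking moments; the abstract specifies neither, and the model names, weights, and scores below are hypothetical.

```python
from typing import Dict, List, Tuple


def aggregate_scores(per_model: Dict[str, List[float]],
                     weights: Dict[str, float]) -> List[float]:
    """Combine per-model, per-frame scores into one weighted average per frame."""
    n_frames = len(next(iter(per_model.values())))
    total_weight = sum(weights[name] for name in per_model)
    aggregated = []
    for i in range(n_frames):
        frame_sum = sum(weights[name] * scores[i]
                        for name, scores in per_model.items())
        aggregated.append(frame_sum / total_weight)
    return aggregated


def moments_of_interest(scores: List[float],
                        threshold: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges whose score meets the threshold."""
    moments, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            moments.append((start, i - 1))
            start = None
    if start is not None:
        moments.append((start, len(scores) - 1))
    return moments


# Hypothetical importance scores from two models over six frames.
per_model = {
    "faces": [0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
    "audio": [0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
}
weights = {"faces": 1.0, "audio": 1.0}

aggregated = aggregate_scores(per_model, weights)
moments = moments_of_interest(aggregated, threshold=0.5)
print(aggregated)  # [0.0, 1.0, 0.5, 0.0, 0.5, 1.0]
print(moments)     # [(1, 2), (4, 5)]
```

The frame ranges returned by `moments_of_interest` are the kind of output that could drive a highlight reel or a per-frame score visualization.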

Time domain video extension

Patent number: 12513369

Abstract: Embodiments are disclosed for generating a temporally coherent video extension. The method includes displaying, on a graphical user interface, a user interface element representing a video to be extended, where the video includes a number of frames. The method further includes receiving an input via the graphical user interface associated with the user interface element. The input causes a visual change to the user interface element which represents a duration of an extension to be made to the video. The method further includes generating frames based on the duration of the extension. The generated frames use motion information determined from frames of the video. The motion information represents a per-pixel motion between at least a pair of frames of the video. The method further includes providing, for display on the graphical user interface, an extended video including the frames of the video and the generated frames.

Type: Grant

Filed: September 10, 2024

Date of Patent: December 30, 2025

Assignee: Adobe Inc.

Inventors: Gabriela Duncombe, Xue Bai, Lakshya Lnu, Kshitiz Garg, Gunjan Aggarwal, Feng Liu, Aseem Agarwala, Ali Aminian, Jui-Hsien Wang, Zhe Wang

Customizable framework to extract moments of interest

Patent number: 12468760

Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for extracting moments of interest (e.g., video frames, video segments) from a video. In an example embodiment, independent and/or orthogonal machine learning models are used to extract different types of features considering different modalities, and each frame in the video is assigned an importance score for each model. The importance scores for each model are combined into an aggregated importance score for each frame in the video. Depending on the embodiment, the aggregated importance scores are used to visualize the score per frame, identify moments of interest, automatically crop down the video into a highlight reel, browse or visualize the moments of interest within the video, and/or search across multiple videos.

Type: Grant

Filed: October 28, 2021

Date of Patent: November 11, 2025

Assignee: Adobe Inc.

Inventors: Ali Aminian, William Lawrence Marino, Kshitiz Garg, Aseem Agarwala

Efficiently inferencing digital videos utilizing machine-learning models

Patent number: 12450504

Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.

Type: Grant

Filed: July 12, 2024

Date of Patent: October 21, 2025

Assignee: Adobe Inc.

Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg
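
The second architecture in the abstract above (several CPUs preprocessing videos in parallel while one GPU runs inference on them sequentially) can be sketched roughly as follows. This is an illustration only: worker threads stand in for separate CPUs, `preprocess` and `infer` are hypothetical placeholders for decoding and for a GPU model call, and nothing here reflects the patented implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def preprocess(video: List[int]) -> List[float]:
    """Stand-in for CPU work such as decoding and normalizing frames."""
    return [pixel / 255.0 for pixel in video]


def infer(frames: List[float]) -> float:
    """Stand-in for a GPU model call; here it simply averages the frames."""
    return sum(frames) / len(frames)


def run_pipeline(videos: List[List[int]], cpu_workers: int = 4) -> List[float]:
    results = []
    with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
        # map() preprocesses videos concurrently and yields results in input
        # order, so the single inference loop consumes them one at a time.
        for frames in pool.map(preprocess, videos):
            results.append(infer(frames))
    return results


# Three tiny hypothetical "videos", each a list of pixel intensities.
videos = [[0, 255], [255, 255], [0, 0]]
results = run_pipeline(videos)
print(results)  # [0.5, 1.0, 0.0]
```

Keeping inference in a single loop means the accelerator is never contended, while preprocessing for later videos overlaps with inference on earlier ones.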

Tracking unique face identities in videos

Patent number: 12412419

Abstract: Some aspects of the technology described herein perform identity identification on faces in a video. Object tracking is performed on detected faces in frames of a video to generate tracklets. Each tracklet comprises a sequence of consecutive frames in which each frame includes a detected face for a person. The tracklets are clustered using face feature vectors for detected faces of each tracklet to generate a plurality of clusters. Information is stored in an identity datastore, including a first identifier for a first identity in association with an indication of frames from tracklets in a first cluster from the plurality of clusters.

Type: Grant

Filed: November 7, 2022

Date of Patent: September 9, 2025

Assignee: ADOBE INC.

Inventors: Ali Aminian, Aashish Kumar Misraa, Kshitiz Garg, Aseem Agarwala
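
The tracklet clustering step in the abstract above can be sketched as follows. The sketch assumes each tracklet is summarized by the mean of its face feature vectors and uses a greedy cosine-similarity rule; the abstract does not name a clustering algorithm, so the method, the threshold, and the tracklet data are hypothetical. For brevity the store maps identities to tracklet IDs rather than to individual frames.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Average the feature vectors of one tracklet, dimension by dimension."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_tracklets(tracklets: Dict[str, List[Vector]],
                      threshold: float = 0.9) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each item is (seed_vector, [tracklet ids])
    for tracklet_id, features in tracklets.items():
        centre = mean_vector(features)
        for seed, members in clusters:
            if cosine(seed, centre) >= threshold:
                members.append(tracklet_id)
                break
        else:
            # No existing cluster matched, so this tracklet seeds a new one.
            clusters.append((centre, [tracklet_id]))
    return [members for _, members in clusters]


# Hypothetical 2-D face feature vectors, one per frame in each tracklet.
tracklets = {
    "t1": [[1.0, 0.0], [0.9, 0.1]],
    "t2": [[0.0, 1.0]],
    "t3": [[1.0, 0.05]],
}
clusters = cluster_tracklets(tracklets)
identity_store = {f"identity_{i}": members for i, members in enumerate(clusters)}
print(identity_store)  # {'identity_0': ['t1', 't3'], 'identity_1': ['t2']}
```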

TEXT-BASED FRAMEWORK FOR VIDEO OBJECT SELECTION

Publication number: 20250252741

Abstract: Embodiments are disclosed for receiving a user input and an input video comprising multiple frames. The method may include extracting a text feature from the user input. The method may further include extracting a plurality of image features from the frames. The method may further include identifying one or more keyframes from the frames that include the object. The method may further include clustering one or more groups of the one or more keyframes. The method may further include generating a plurality of segmentation masks for each group. The method may further include determining a set of reference masks corresponding to the user input and the object. The method may further include generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks. The method may further include propagating the set of fusion masks and outputting a final set of masks.

Type: Application

Filed: March 31, 2025

Publication date: August 7, 2025

Applicant: Adobe Inc.

Inventors: Shivam Nalin PATEL, Kshitiz GARG, Han GUO, Ali AMINIAN, Aashish MISRAA

Text-based framework for video object selection

Patent number: 12266181

Abstract: Embodiments are disclosed for receiving a user input and an input video comprising multiple frames. The method may include extracting a text feature from the user input. The method may further include extracting a plurality of image features from the frames. The method may further include identifying one or more keyframes from the frames that include the object. The method may further include clustering one or more groups of the one or more keyframes. The method may further include generating a plurality of segmentation masks for each group. The method may further include determining a set of reference masks corresponding to the user input and the object. The method may further include generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks. The method may further include propagating the set of fusion masks and outputting a final set of masks.

Type: Grant

Filed: November 19, 2021

Date of Patent: April 1, 2025

Assignee: Adobe Inc.

Inventors: Shivam Nalin Patel, Kshitiz Garg, Han Guo, Ali Aminian, Aashish Misraa

GENERATING TILE-ABLE PATTERNS FROM TEXT

Publication number: 20240420389

Abstract: Systems and methods for generating tile-able patterns from text include obtaining a text prompt and generating, by a generation prior model, a latent vector based on the text prompt, where the generation prior model is trained to output vectors within a distribution of tile-able patterns. An image generation model then generates an output image based on the latent vector. The output image comprises a tile-able pattern including an element from the text prompt.

Type: Application

Filed: December 1, 2023

Publication date: December 19, 2024

Inventors: Vineet Batra, Sumit Chaturvedi, Abhishek Rai, Pranav Vineet Aggarwal, Ajinkya Gorakhnath Kale, Aman Jeph, Ankit Phogat, Sumit Dhingra, Fengbin Chen, Kshitiz Garg, Milos Hasan, Midhun Harikumar, Gaurav Suresh Pathak, Souymodip Chakraborty

EFFICIENTLY INFERENCING DIGITAL VIDEOS UTILIZING MACHINE-LEARNING MODELS

Publication number: 20240362506

Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.

Type: Application

Filed: July 12, 2024

Publication date: October 31, 2024

Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg

Increasing efficiency of inferencing digital videos utilizing machine-learning models

Patent number: 12067499

Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.

Type: Grant

Filed: November 2, 2020

Date of Patent: August 20, 2024

Assignee: Adobe Inc.

Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg

Open-domain trending hashtag recommendations

Patent number: 12050647

Abstract: Techniques for recommending hashtags, including trending hashtags, are disclosed. An example method includes accessing a graph. The graph includes video nodes representing videos, historical hashtag nodes representing historical hashtags, and edges indicating associations among the video nodes and the historical hashtag nodes. A trending hashtag is identified. An edge is added to the graph between a historical hashtag node representing a historical hashtag and a trending hashtag node representing the trending hashtag, based on a semantic similarity between the historical hashtag and the trending hashtag. A new video node representing a new video is added to the video nodes of the graph. A graph neural network (GNN) is applied to the graph, and the GNN predicts a new edge between the trending hashtag node and the new video node. The trending hashtag is recommended for the new video based on prediction of the new edge.

Type: Grant

Filed: July 29, 2022

Date of Patent: July 30, 2024

Assignee: Adobe Inc.

Inventors: Somdeb Sarkhel, Xiang Chen, Viswanathan Swaminathan, Swapneel Mehta, Saayan Mitra, Ryan Rossi, Han Guo, Ali Aminian, Kshitiz Garg
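
The graph-update step in the abstract above, where a trending hashtag node is linked to semantically similar historical hashtag nodes, can be sketched as below. The sketch covers only that step and omits the graph neural network that predicts the final video-to-hashtag edge. It assumes cosine similarity over hashtag embeddings as the similarity measure; the embeddings, threshold, and node names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_trending_hashtag(edges: Set[Edge],
                          embeddings: Dict[str, List[float]],
                          historical: List[str],
                          trending: str,
                          threshold: float = 0.8) -> None:
    """Add an edge from each sufficiently similar historical hashtag to the trending one."""
    for tag in historical:
        if cosine(embeddings[tag], embeddings[trending]) >= threshold:
            edges.add((tag, trending))


# Existing graph: videos connected to the historical hashtags they used.
edges: Set[Edge] = {("video1", "#soccer"), ("video2", "#baking")}

# Hypothetical 2-D embeddings; a real system would use a learned text encoder.
embeddings = {
    "#soccer": [1.0, 0.0],
    "#baking": [0.0, 1.0],
    "#worldcup": [0.9, 0.1],
}

link_trending_hashtag(edges, embeddings, ["#soccer", "#baking"], "#worldcup")
print(sorted(edges))
```

After this step the trending hashtag is reachable from videos that used related historical hashtags, which is the structure a link-prediction model would then operate on.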

TRACKING UNIQUE FACE IDENTITIES IN VIDEOS

Publication number: 20240153303

Abstract: Some aspects of the technology described herein perform identity identification on faces in a video. Object tracking is performed on detected faces in frames of a video to generate tracklets. Each tracklet comprises a sequence of consecutive frames in which each frame includes a detected face for a person. The tracklets are clustered using face feature vectors for detected faces of each tracklet to generate a plurality of clusters. Information is stored in an identity datastore, including a first identifier for a first identity in association with an indication of frames from tracklets in a first cluster from the plurality of clusters.

Type: Application

Filed: November 7, 2022

Publication date: May 9, 2024

Inventors: Ali AMINIAN, Aashish Kumar MISRAA, Kshitiz GARG, Aseem AGARWALA

OPEN-DOMAIN TRENDING HASHTAG RECOMMENDATIONS

Publication number: 20240037149

Abstract: Techniques for recommending hashtags, including trending hashtags, are disclosed. An example method includes accessing a graph. The graph includes video nodes representing videos, historical hashtag nodes representing historical hashtags, and edges indicating associations among the video nodes and the historical hashtag nodes. A trending hashtag is identified. An edge is added to the graph between a historical hashtag node representing a historical hashtag and a trending hashtag node representing the trending hashtag, based on a semantic similarity between the historical hashtag and the trending hashtag. A new video node representing a new video is added to the video nodes of the graph. A graph neural network (GNN) is applied to the graph, and the GNN predicts a new edge between the trending hashtag node and the new video node. The trending hashtag is recommended for the new video based on prediction of the new edge.

Type: Application

Filed: July 29, 2022

Publication date: February 1, 2024

Inventors: Somdeb Sarkhel, Xiang Chen, Viswanathan Swaminathan, Swapneel Mehta, Saayan Mitra, Ryan Rossi, Han Guo, Ali Aminian, Kshitiz Garg

PROCESSING FRAMEWORK FOR TEMPORAL-CONSISTENT FACE MANIPULATION IN VIDEOS

Publication number: 20230377339

Abstract: Embodiments are disclosed for generating temporally consistent manipulated videos. A method of generating temporally consistent manipulated videos comprises receiving a target appearance and an input digital video including a plurality of frames, generating a plurality of target appearance frames from the plurality of frames, training a video prediction network to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance, providing the input digital video to the video prediction network, and generating, by the video prediction network, an output digital video wherein the subject of the output digital video has its appearance modified to match the target appearance.

Type: Application

Filed: May 23, 2022

Publication date: November 23, 2023

Applicant: Adobe Inc.

Inventors: Han GUO, Kshitiz GARG, Ali AMINIAN, Aashish MISRAA, William MARINO, Nicolas HUYNH THIEN

CUSTOMIZABLE FRAMEWORK TO EXTRACT MOMENTS OF INTEREST

Publication number: 20230140369

Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for extracting moments of interest (e.g., video frames, video segments) from a video. In an example embodiment, independent and/or orthogonal machine learning models are used to extract different types of features considering different modalities, and each frame in the video is assigned an importance score for each model. The importance scores for each model are combined into an aggregated importance score for each frame in the video. Depending on the embodiment, the aggregated importance scores are used to visualize the score per frame, identify moments of interest, automatically crop down the video into a highlight reel, browse or visualize the moments of interest within the video, and/or search across multiple videos.

Type: Application

Filed: October 28, 2021

Publication date: May 4, 2023

Inventors: Ali Aminian, William Lawrence Marino, Kshitiz Garg, Aseem Agarwala

INCREASING EFFICIENCY OF INFERENCING DIGITAL VIDEOS UTILIZING MACHINE-LEARNING MODELS

Publication number: 20220138596

Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.

Type: Application

Filed: November 2, 2020

Publication date: May 5, 2022

Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg

Sensor fusion and deep learning

Patent number: 10762440

Abstract: Some embodiments provide a sensor data-processing system which detects and classifies objects detected in an environment via fusion of sensor data representations generated by multiple separate sensors. The sensor data-processing system can fuse sensor data representations generated by multiple sensor devices into a fused sensor data representation and can further detect and classify features in the fused sensor data representation. Feature detection can be implemented based at least in part upon utilizing a feature-detection model generated via one or more of deep learning and traditional machine learning. The sensor data-processing system can adjust sensor data processing of representations generated by sensor devices based on external factors including indications of sensor health and environmental conditions.

Type: Grant

Filed: September 23, 2016

Date of Patent: September 1, 2020

Assignee: Apple Inc.

Inventors: Kshitiz Garg, Ahmad Al-Dahle

Shared sensor data across sensor processing pipelines

Patent number: 10671068

Abstract: Sensor data captured by different sensors may be shared across different sensor processing pipelines. Sensor processing pipelines may process captured sensor data from respective sensors. Some of the sensor data that is received or processed at one sensor data processing pipeline may be provided to another sensor data processing pipeline so that subsequent processing stages at the recipient sensor processing pipeline may process the combined sensor data in order to determine a perception decision. Different types of sensor data may be shared, including raw sensor data, processed sensor data, or data derived from sensor data. A control system may perform control actions based on the perception decisions determined by the sensor processing pipelines that share sensor data.

Type: Grant

Filed: September 19, 2017

Date of Patent: June 2, 2020

Assignee: Apple Inc.

Inventors: Xinyu Xu, Ahmad Al-Dahle, Kshitiz Garg
