Patents by Inventor Kshitiz Garg
Kshitiz Garg has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260105670
Abstract: Systems, methods, and non-transitory computer-readable media generate custom animations that comprise a structure of a coarse animation prompt. For example, the disclosed systems receive a style prompt and receive a coarse animation prompt. The disclosed systems generate, utilizing a media generation model, a custom animation having a structure and timing of the coarse animation prompt and a style informed by the style prompt. The disclosed systems also provide the custom animation for display via a graphical user interface.
Type: Application
Filed: September 30, 2025
Publication date: April 16, 2026
Inventors: Yangtuanfeng Wang, Li-Yi Wei, Wilmot Wei-Mau Li, Valerie Head, Seth Walker, Lakshya Lnu, Kshitiz Garg, Kazi Rubaiat Habib, Jun Saito, James Ratliff, Duygu Ceylan Aksit, Dafei Qin, Cameron Smith
-
Patent number: 12579608
Abstract: Systems and methods for generating tile-able patterns from text include obtaining a text prompt and generating, by a generation prior model, a latent vector based on the text prompt, where the generation prior model is trained to output vectors within a distribution of tile-able patterns. An image generation model then generates an output image based on the latent vector. The output image comprises a tile-able pattern including an element from the text prompt.
Type: Grant
Filed: December 1, 2023
Date of Patent: March 17, 2026
Assignee: ADOBE INC.
Inventors: Vineet Batra, Sumit Chaturvedi, Abhishek Rai, Pranav Vineet Aggarwal, Ajinkya Gorakhnath Kale, Aman Jeph, Ankit Phogat, Sumit Dhingra, Fengbin Chen, Kshitiz Garg, Milos Hasan, Midhun Harikumar, Gaurav Suresh Pathak, Souymodip Chakraborty
-
Publication number: 20260044563
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for extracting moments of interest (e.g., video frames, video segments) from a video. In an example embodiment, independent and/or orthogonal machine learning models are used to extract different types of features considering different modalities, and each frame in the video is assigned an importance score for each model. The importance scores for each model are combined into an aggregated importance score for each frame in the video. Depending on the embodiment, the aggregated importance scores are used to visualize the score per frame, identify moments of interest, automatically crop down the video into a highlight reel, browse or visualize the moments of interest within the video, and/or search across multiple videos.
Type: Application
Filed: October 16, 2025
Publication date: February 12, 2026
Inventors: Ali AMINIAN, William Lawrence MARINO, Kshitiz GARG, Aseem Omprakash AGARWALA
-
Patent number: 12513369
Abstract: Embodiments are disclosed for generating a temporally coherent video extension. The method includes displaying, on a graphical user interface, a user interface element representing a video to be extended, where the video includes a number of frames. The method further includes receiving an input via the graphical user interface associated with the user interface element. The input causes a visual change to the user interface element which represents a duration of an extension to be made to the video. The method further includes generating frames based on the duration of the extension. The generated frames use motion information determined from frames of the video. The motion information represents a per-pixel motion between at least a pair of frames of the video. The method further includes providing, for display on the graphical user interface, an extended video including the frames of the video and the generated frames.
Type: Grant
Filed: September 10, 2024
Date of Patent: December 30, 2025
Assignee: Adobe Inc.
Inventors: Gabriela Duncombe, Xue Bai, Lakshya Lnu, Kshitiz Garg, Gunjan Aggarwal, Feng Liu, Aseem Agarwala, Ali Aminian, Jui-Hsien Wang, Zhe Wang
-
Patent number: 12468760
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for extracting moments of interest (e.g., video frames, video segments) from a video. In an example embodiment, independent and/or orthogonal machine learning models are used to extract different types of features considering different modalities, and each frame in the video is assigned an importance score for each model. The importance scores for each model are combined into an aggregated importance score for each frame in the video. Depending on the embodiment, the aggregated importance scores are used to visualize the score per frame, identify moments of interest, automatically crop down the video into a highlight reel, browse or visualize the moments of interest within the video, and/or search across multiple videos.
Type: Grant
Filed: October 28, 2021
Date of Patent: November 11, 2025
Assignee: Adobe Inc.
Inventors: Ali Aminian, William Lawrence Marino, Kshitiz Garg, Aseem Agarwala
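Illustration (not taken from the patent): the per-frame score aggregation described in the abstract above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: each model yields one score per frame, scores are combined by a weighted mean, and a moment of interest is a run of frames at or above a threshold. The function names, the weights, and the threshold are all hypothetical; the patent may combine scores quite differently.

```python
from typing import Dict, List, Optional, Tuple


def aggregate_scores(per_model: Dict[str, List[float]],
                     weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Combine per-model, per-frame scores into one score per frame.

    Assumption: a weighted mean is used; equal weights if none are given.
    """
    if not per_model:
        return []
    lengths = {len(scores) for scores in per_model.values()}
    if len(lengths) != 1:
        raise ValueError("every model must score the same number of frames")
    if weights is None:
        weights = {name: 1.0 for name in per_model}
    total = sum(weights[name] for name in per_model)
    n_frames = lengths.pop()
    return [
        sum(weights[name] * scores[i] for name, scores in per_model.items()) / total
        for i in range(n_frames)
    ]


def moments_of_interest(scores: List[float],
                        threshold: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges whose score is >= threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a new run begins
        elif score < threshold and start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:                  # run extends to the last frame
        segments.append((start, len(scores) - 1))
    return segments
```

The returned frame ranges could then feed a highlight reel or a per-frame score visualization, which are among the uses the abstract lists.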
-
Patent number: 12450504
Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.
Type: Grant
Filed: July 12, 2024
Date of Patent: October 21, 2025
Assignee: Adobe Inc.
Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg
-
Patent number: 12412419
Abstract: Some aspects of the technology described herein perform identity identification on faces in a video. Object tracking is performed on detected faces in frames of a video to generate tracklets. Each tracklet comprises a sequence of consecutive frames in which each frame includes a detected face for a person. The tracklets are clustered using face feature vectors for detected faces of each tracklet to generate a plurality of clusters. Information is stored in an identity datastore, including a first identifier for a first identity in association with an indication of frames from tracklets in a first cluster from the plurality of clusters.
Type: Grant
Filed: November 7, 2022
Date of Patent: September 9, 2025
Assignee: ADOBE INC.
Inventors: Ali Aminian, Aashish Kumar Misraa, Kshitiz Garg, Aseem Agarwala
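Illustration (not taken from the patent): the tracklet clustering step in the abstract above might look something like the sketch below. It assumes each tracklet is summarized by the mean of its face feature vectors and swaps in a simple greedy, threshold-based cosine-similarity clustering; the abstract does not say which clustering method is used. The function names, the tracklet identifiers, and the 0.8 threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a tracklet's face feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_tracklets(tracklets: Dict[str, List[List[float]]],
                      threshold: float = 0.8) -> List[List[str]]:
    """Greedily group tracklets whose mean features are similar.

    Each tracklet joins the first cluster whose representative (the mean
    vector of that cluster's first tracklet) is at least `threshold` similar;
    otherwise it starts a new cluster.
    """
    clusters: List[Tuple[List[float], List[str]]] = []
    for tracklet_id, features in tracklets.items():
        vec = mean_vector(features)
        for representative, members in clusters:
            if cosine(representative, vec) >= threshold:
                members.append(tracklet_id)
                break
        else:
            clusters.append((vec, [tracklet_id]))
    return [members for _, members in clusters]
```

Each resulting cluster could then be given an identifier and stored alongside the frames of its tracklets, in the spirit of the identity datastore the abstract mentions.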
-
Publication number: 20250252741
Abstract: Embodiments are disclosed for receiving a user input and an input video comprising multiple frames. The method may include extracting a text feature from the user input. The method may further include extracting a plurality of image features from the frames. The method may further include identifying one or more keyframes from the frames that include the object. The method may further include clustering one or more groups of the one or more keyframes. The method may further include generating a plurality of segmentation masks for each group. The method may further include determining a set of reference masks corresponding to the user input and the object. The method may further include generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks. The method may further include propagating the set of fusion masks and outputting a final set of masks.
Type: Application
Filed: March 31, 2025
Publication date: August 7, 2025
Applicant: Adobe Inc.
Inventors: Shivam Nalin PATEL, Kshitiz GARG, Han GUO, Ali AMINIAN, Aashish MISRAA
-
Patent number: 12266181
Abstract: Embodiments are disclosed for receiving a user input and an input video comprising multiple frames. The method may include extracting a text feature from the user input. The method may further include extracting a plurality of image features from the frames. The method may further include identifying one or more keyframes from the frames that include the object. The method may further include clustering one or more groups of the one or more keyframes. The method may further include generating a plurality of segmentation masks for each group. The method may further include determining a set of reference masks corresponding to the user input and the object. The method may further include generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks. The method may further include propagating the set of fusion masks and outputting a final set of masks.
Type: Grant
Filed: November 19, 2021
Date of Patent: April 1, 2025
Assignee: Adobe Inc.
Inventors: Shivam Nalin Patel, Kshitiz Garg, Han Guo, Ali Aminian, Aashish Misraa
-
Publication number: 20240420389
Abstract: Systems and methods for generating tile-able patterns from text include obtaining a text prompt and generating, by a generation prior model, a latent vector based on the text prompt, where the generation prior model is trained to output vectors within a distribution of tile-able patterns. An image generation model then generates an output image based on the latent vector. The output image comprises a tile-able pattern including an element from the text prompt.
Type: Application
Filed: December 1, 2023
Publication date: December 19, 2024
Inventors: Vineet Batra, Sumit Chaturvedi, Abhishek Rai, Pranav Vineet Aggarwal, Ajinkya Gorakhnath Kale, Aman Jeph, Ankit Phogat, Sumit Dhingra, Fengbin Chen, Kshitiz Garg, Milos Hasan, Midhun Harikumar, Gaurav Suresh Pathak, Souymodip Chakraborty
-
Publication number: 20240362506
Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.
Type: Application
Filed: July 12, 2024
Publication date: October 31, 2024
Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg
-
Patent number: 12067499
Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.
Type: Grant
Filed: November 2, 2020
Date of Patent: August 20, 2024
Assignee: Adobe Inc.
Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg
-
Patent number: 12050647
Abstract: Techniques for recommending hashtags, including trending hashtags, are disclosed. An example method includes accessing a graph. The graph includes video nodes representing videos, historical hashtag nodes representing historical hashtags, and edges indicating associations among the video nodes and the historical hashtag nodes. A trending hashtag is identified. An edge is added to the graph between a historical hashtag node representing a historical hashtag and a trending hashtag node representing the trending hashtag, based on a semantic similarity between the historical hashtag and the trending hashtag. A new video node representing a new video is added to the video nodes of the graph. A graph neural network (GNN) is applied to the graph, and the GNN predicts a new edge between the trending hashtag node and the new video node. The trending hashtag is recommended for the new video based on prediction of the new edge.
Type: Grant
Filed: July 29, 2022
Date of Patent: July 30, 2024
Assignee: Adobe Inc.
Inventors: Somdeb Sarkhel, Xiang Chen, Viswanathan Swaminathan, Swapneel Mehta, Saayan Mitra, Ryan Rossi, Han Guo, Ali Aminian, Kshitiz Garg
-
Publication number: 20240153303
Abstract: Some aspects of the technology described herein perform identity identification on faces in a video. Object tracking is performed on detected faces in frames of a video to generate tracklets. Each tracklet comprises a sequence of consecutive frames in which each frame includes a detected face for a person. The tracklets are clustered using face feature vectors for detected faces of each tracklet to generate a plurality of clusters. Information is stored in an identity datastore, including a first identifier for a first identity in association with an indication of frames from tracklets in a first cluster from the plurality of clusters.
Type: Application
Filed: November 7, 2022
Publication date: May 9, 2024
Inventors: Ali AMINIAN, Aashish Kumar MISRAA, Kshitiz GARG, Aseem AGARWALA
-
Publication number: 20240037149
Abstract: Techniques for recommending hashtags, including trending hashtags, are disclosed. An example method includes accessing a graph. The graph includes video nodes representing videos, historical hashtag nodes representing historical hashtags, and edges indicating associations among the video nodes and the historical hashtag nodes. A trending hashtag is identified. An edge is added to the graph between a historical hashtag node representing a historical hashtag and a trending hashtag node representing the trending hashtag, based on a semantic similarity between the historical hashtag and the trending hashtag. A new video node representing a new video is added to the video nodes of the graph. A graph neural network (GNN) is applied to the graph, and the GNN predicts a new edge between the trending hashtag node and the new video node. The trending hashtag is recommended for the new video based on prediction of the new edge.
Type: Application
Filed: July 29, 2022
Publication date: February 1, 2024
Inventors: Somdeb Sarkhel, Xiang Chen, Viswanathan Swaminathan, Swapneel Mehta, Saayan Mitra, Ryan Rossi, Han Guo, Ali Aminian, Kshitiz Garg
-
Publication number: 20230377339
Abstract: Embodiments are disclosed for generating temporally consistent manipulated videos. A method of generating temporally consistent manipulated videos comprises receiving a target appearance and an input digital video including a plurality of frames, generating a plurality of target appearance frames from the plurality of frames, training a video prediction network to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance, providing the input digital video to the video prediction network, and generating, by the video prediction network, an output digital video wherein the subject of the output digital video has its appearance modified to match the target appearance.
Type: Application
Filed: May 23, 2022
Publication date: November 23, 2023
Applicant: Adobe Inc.
Inventors: Han GUO, Kshitiz GARG, Ali AMINIAN, Aashish MISRAA, William MARINO, Nicolas HUYNH THIEN
-
Publication number: 20230140369
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for extracting moments of interest (e.g., video frames, video segments) from a video. In an example embodiment, independent and/or orthogonal machine learning models are used to extract different types of features considering different modalities, and each frame in the video is assigned an importance score for each model. The importance scores for each model are combined into an aggregated importance score for each frame in the video. Depending on the embodiment, the aggregated importance scores are used to visualize the score per frame, identify moments of interest, automatically crop down the video into a highlight reel, browse or visualize the moments of interest within the video, and/or search across multiple videos.
Type: Application
Filed: October 28, 2021
Publication date: May 4, 2023
Inventors: Ali Aminian, William Lawrence Marino, Kshitiz Garg, Aseem Agarwala
-
Publication number: 20220138596
Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.
Type: Application
Filed: November 2, 2020
Publication date: May 5, 2022
Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg
-
Patent number: 10762440
Abstract: Some embodiments provide a sensor data-processing system which detects and classifies objects detected in an environment via fusion of sensor data representations generated by multiple separate sensors. The sensor data-processing system can fuse sensor data representations generated by multiple sensor devices into a fused sensor data representation and can further detect and classify features in the fused sensor data representation. Feature detection can be implemented based at least in part upon utilizing a feature-detection model generated via one or more of deep learning and traditional machine learning. The sensor data-processing system can adjust sensor data processing of representations generated by sensor devices based on external factors including indications of sensor health and environmental conditions.
Type: Grant
Filed: September 23, 2016
Date of Patent: September 1, 2020
Assignee: Apple Inc.
Inventors: Kshitiz Garg, Ahmad Al-Dahle
-
Patent number: 10671068
Abstract: Sensor data captured by different sensors may be shared across different sensor processing pipelines. Sensor processing pipelines may process captured sensor data from respective sensors. Some of the sensor data that is received or processed at one sensor data processing pipeline may be provided to another sensor data processing pipeline so that subsequent processing stages at the recipient sensor processing pipeline may process the combined sensor data in order to determine a perception decision. Different types of sensor data may be shared, including raw sensor data, processed sensor data, or data derived from sensor data. A control system may perform control actions based on the perception decisions determined by the sensor processing pipelines that share sensor data.
Type: Grant
Filed: September 19, 2017
Date of Patent: June 2, 2020
Assignee: Apple Inc.
Inventors: Xinyu Xu, Ahmad Al-Dahle, Kshitiz Garg