Patents by Inventor Yaman Kumar

Yaman Kumar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240086457
    Abstract: A content analysis system provides content understanding for a content item using an attention-aware multi-modal model. Given a content item, feature extractors extract features from content components of the content item, where the content components comprise multiple modalities. A cross-modal attention encoder of the attention-aware multi-modal model generates an embedding of the content item using features extracted from the content components. A decoder of the attention-aware multi-modal model generates an action-reason statement using the embedding of the content item from the cross-modal attention encoder.
    Type: Application
    Filed: September 14, 2022
    Publication date: March 14, 2024
    Inventors: Yaman Kumar, Vaibhav Ahlawat, Ruiyi Zhang, Milan Aggarwal, Ganesh Karbhari Palwe, Balaji Krishnamurthy, Varun Khurana
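
The encoder-decoder pipeline this abstract describes can be pictured with a short sketch. The sketch below is purely illustrative, assuming pre-extracted per-modality features and off-the-shelf Transformer layers; the class name, dimensions, and modality-tagging scheme are assumptions, not the patent's actual implementation.

```python
# Illustrative sketch only; names, sizes, and layer choices are assumptions.
import torch
import torch.nn as nn

class CrossModalAttentionModel(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=4,
                 vocab_size=30000, n_modalities=2):
        super().__init__()
        # Learned embedding that tags each feature with its source modality.
        self.modality_embed = nn.Embedding(n_modalities, d_model)
        enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, n_layers)
        dec = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec, n_layers)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, modality_feats, target_tokens):
        # modality_feats: list of (batch, seq_i, d_model) tensors, one per modality.
        tagged = [f + self.modality_embed(
                      torch.full(f.shape[:2], i, dtype=torch.long))
                  for i, f in enumerate(modality_feats)]
        # Self-attention over the concatenated sequence lets every modality
        # attend to every other one ("cross-modal attention").
        memory = self.encoder(torch.cat(tagged, dim=1))
        # Autoregressively decode the action-reason statement tokens.
        tgt = self.token_embed(target_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        return self.lm_head(self.decoder(tgt, memory, tgt_mask=mask))
```

The step the sketch tries to capture is the cross-modal attention: concatenating modality-tagged feature sequences lets the encoder relate, say, image regions to text tokens before the decoder emits the action-reason statement.
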
  • Patent number: 11907508
    Abstract: Content creation techniques are described that leverage content analytics to provide insight and guidance as part of content creation. To do so, content features are extracted by a content analytics system from a plurality of content items and used by the content analytics system as a basis to generate a content dataset. Event data is also collected by the content analytics system from an event data source; the event data describes user interaction with respective items of content, including subsequent activities in both online and physical environments, and is used to generate an event dataset. An analytics user interface is then generated by the content analytics system using the content dataset and the event dataset and is usable to guide subsequent content creation and editing.
    Type: Grant
    Filed: April 12, 2023
    Date of Patent: February 20, 2024
    Assignee: Adobe Inc.
    Inventors: Yaman Kumar, Somesh Singh, William Brandon George, Timothy Chia-chi Liu, Suman Basetty, Pranjal Prasoon, Nikaash Puri, Mihir Naware, Mihai Corlan, Joshua Marshall Butikofer, Abhinav Chauhan, Kumar Mrityunjay Singh, James Patrick O'Reilly, Hyman Chung, Lauren Dest, Clinton Hansen Goudie-Nice, Brandon John Pack, Balaji Krishnamurthy, Kunal Kumar Jain, Alexander Klimetschek, Matthew William Rozen
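
As a minimal illustration of the two datasets this abstract pairs, the sketch below joins per-item content features with aggregated interaction events using pandas; every column name and value is invented for this example, and nothing here is taken from the patented system.

```python
# Illustrative sketch only; all columns and values are invented.
import pandas as pd

# Content dataset: one row per content item with extracted features.
content = pd.DataFrame({
    "content_id": ["c1", "c2"],
    "dominant_color": ["blue", "red"],
    "has_faces": [True, False],
})

# Event dataset: user interactions with each item, which in the patent also
# covers subsequent activity in online and physical environments.
events = pd.DataFrame({
    "content_id": ["c1", "c1", "c2"],
    "event": ["click", "purchase", "click"],
})

# Aggregate events per item and join onto the content features, yielding the
# kind of table an analytics interface could surface to guide creation.
engagement = (events.groupby("content_id")["event"]
              .count().rename("event_count").reset_index())
report = content.merge(engagement, on="content_id", how="left")
print(report)
```
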
  • Publication number: 20230252993
    Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that recognize speech from a digital video utilizing an unsupervised machine learning model, such as a generative adversarial network (GAN) model. In one or more implementations, the disclosed systems utilize an image encoder to generate self-supervised deep visual speech representations from frames of an unlabeled (or unannotated) digital video. Subsequently, in one or more embodiments, the disclosed systems generate viseme sequences from the deep visual speech representations (e.g., via segmented visemic speech representations from clusters of the deep visual speech representations) utilizing the adversarially trained GAN model. In some instances, the disclosed systems decode the viseme sequences belonging to the digital video to generate an electronic transcription and/or digital audio for the digital video.
    Type: Application
    Filed: February 4, 2022
    Publication date: August 10, 2023
    Inventors: Yaman Kumar, Balaji Krishnamurthy
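
A heavily simplified sketch of the pipeline described above follows. It substitutes k-means for the clustering of self-supervised representations, omits adversarial training entirely, and uses made-up sizes (40 speech units, 14 visemes), so it should be read as a shape-level illustration rather than the disclosed method.

```python
# Illustrative sketch only; k-means replaces the self-supervised clustering
# and the adversarial training of the generator is omitted.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Placeholder for deep visual speech representations of 200 video frames.
frame_feats = np.random.randn(200, 256)

# Cluster frame features into discrete speech units, then merge runs of the
# same unit into segments (a crude form of visemic segmentation).
unit_ids = KMeans(n_clusters=40, n_init=10).fit_predict(frame_feats)
segments = [int(unit_ids[0])] + [int(u) for prev, u in
                                 zip(unit_ids, unit_ids[1:]) if u != prev]

# Generator mapping unit sequences to viseme logits; in the full method this
# would be trained adversarially against a discriminator.
N_VISEMES = 14
generator = nn.Sequential(nn.Embedding(40, 64), nn.Linear(64, N_VISEMES))
viseme_seq = generator(torch.tensor(segments, dtype=torch.long)).argmax(dim=-1)
print(viseme_seq)
```
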
  • Publication number: 20230085466
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for determining user affinities by tracking historical user interactions with tagged digital content and using the user affinities in content generation applications. Accordingly, the system may track user interactions with published digital content in order to generate user interaction reports whenever a user engages with the digital content. The system may aggregate the interaction reports to generate an affinity profile for a user or audience of users. A marketer may then generate digital content for a target user or audience of users and the system may process the digital content to generate a set of tags for the digital content. Based on the set of tags, the system may then evaluate the digital content in view of the affinity profile for the target user/audience to determine similarities or differences between the digital content and the affinity profile.
    Type: Application
    Filed: September 16, 2021
    Publication date: March 16, 2023
    Inventors: Yaman Kumar, Vinh Ngoc Khuc, Vijay Srivastava, Umang Moorarka, Sukriti Verma, Simra Shahid, Shirsh Bansal, Shankar Venkitachalam, Sean Steimer, Sandipan Karmakar, Nimish Srivastav, Nikaash Puri, Mihir Naware, Kunal Kumar Jain, Kumar Mrityunjay Singh, Hyman Chung, Horea Bacila, Florin Silviu Iordache, Deepak Pai, Balaji Krishnamurthy
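
At its simplest, the affinity mechanism this abstract describes reduces to weighting tags by observed engagement and scoring new content against the aggregate. In the sketch below, the event weights, tag names, and scoring function are all assumptions made for illustration, not the patented design.

```python
# Illustrative sketch only; tag names, event weights, and the scoring
# function are invented for this example.
from collections import Counter

# Each interaction report pairs an engagement weight with the content's tags.
interaction_reports = [
    {"tags": ["outdoor", "hiking"], "weight": 1.0},   # e.g. a view
    {"tags": ["outdoor", "camping"], "weight": 3.0},  # e.g. a click
    {"tags": ["indoor", "cooking"], "weight": 1.0},
]

# Aggregate the reports into a per-tag affinity profile.
profile = Counter()
for report in interaction_reports:
    for tag in report["tags"]:
        profile[tag] += report["weight"]

def affinity_score(content_tags, profile):
    """Share of the profile's total affinity covered by the content's tags."""
    total = sum(profile.values())
    return sum(profile[t] for t in content_tags) / total if total else 0.0

# Evaluate draft content against the target audience's profile.
print(affinity_score(["outdoor", "hiking"], profile))  # relatively high
print(affinity_score(["indoor", "finance"], profile))  # relatively low
```
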
  • Patent number: 10937428
    Abstract: A pose-invariant visual speech recognition system obtains a single view input of a speaker, such as a single video stream captured by a single camera. The single view input provides a particular pose of the speaker, that is, the view angle, relative to the lens or image capture component of the camera, at which the speaker's face is captured. The pose of the speaker is used to select a visual speech recognition model that generates a text label representing the words spoken by the speaker. One or more additional view angles of the speaker are also generated from the single view input. These additional view angles, along with the single view input, are used by the selected visual speech recognition model to generate the text label for the speaker.
    Type: Grant
    Filed: March 11, 2019
    Date of Patent: March 2, 2021
    Assignee: Adobe Inc.
    Inventor: Yaman Kumar
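
A minimal sketch of the pose-driven model selection described in this patent (the same disclosure appears as publication 20200294507 below): the pose bins, the mirrored stand-in for view synthesis, and the placeholder recognizers are all assumptions for illustration.

```python
# Illustrative sketch only; pose bins, the mirrored "extra view", and the
# placeholder recognizers are assumptions, not the patented components.
import torch
import torch.nn as nn

POSE_BINS = [0, 30, 60, 90]  # view angles (degrees), one recognizer per bin

def select_model(pose_deg, models):
    # Pick the recognizer whose training pose is closest to the input pose.
    return models[min(POSE_BINS, key=lambda b: abs(b - pose_deg))]

class ViewSynthesizer(nn.Module):
    """Stand-in for a network that renders additional view angles."""
    def forward(self, frames):
        return [frames, torch.flip(frames, dims=[-1])]  # original + mirrored

models = {b: nn.Identity() for b in POSE_BINS}  # placeholder recognizers
frames = torch.randn(1, 16, 3, 96, 96)          # (batch, time, C, H, W)

views = ViewSynthesizer()(frames)
recognizer = select_model(pose_deg=42.0, models=models)
# A real recognizer would fuse all views into one text label; the placeholder
# just demonstrates the data flow from pose estimate to model choice.
outputs = [recognizer(v) for v in views]
```
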
  • Publication number: 20200294507
    Abstract: A pose-invariant visual speech recognition system obtains a single view input of a speaker, such as a single video stream captured by a single camera. The single view input provides a particular pose of the speaker, that is, the view angle, relative to the lens or image capture component of the camera, at which the speaker's face is captured. The pose of the speaker is used to select a visual speech recognition model that generates a text label representing the words spoken by the speaker. One or more additional view angles of the speaker are also generated from the single view input. These additional view angles, along with the single view input, are used by the selected visual speech recognition model to generate the text label for the speaker.
    Type: Application
    Filed: March 11, 2019
    Publication date: September 17, 2020
    Applicant: Adobe Inc.
    Inventor: Yaman Kumar