Patents by Inventor Vimal Bhat

Vimal Bhat has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11935170
    Abstract: Systems, methods, and computer-readable media are disclosed for systems and methods for automated generation and presentation of sign language avatars for video content. Example methods may include determining, by one or more computer processors coupled to memory, a first segment of video content, the first segment including a first set of frames, first audio content, and first subtitle data, where the first subtitle data comprises a first word and a second word. Methods may include determining, using a first machine learning model, a first sign gesture associated with the first word, determining first motion data associated with the first sign gesture, and determining first facial expression data. Methods may include generating an avatar configured to perform the first sign gesture using the first motion data, where a facial expression of the avatar while performing the first sign gesture is based on the first facial expression data.
    Type: Grant
    Filed: November 18, 2021
    Date of Patent: March 19, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Abhinav Jain, Avijit Vajpayee, Vimal Bhat, Arjun Cholkar, Louis Kirk Barker
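The word-to-gesture step in the abstract above can be illustrated with a minimal sketch. The dictionary and fingerspelling fallback here are hypothetical stand-ins for the first machine learning model the patent describes; none of these names come from the patent itself.

```python
# Hypothetical lookup standing in for the learned word-to-gesture model.
GESTURE_LOOKUP = {"hello": "wave_open_palm", "thanks": "chin_forward"}

def gestures_for_subtitle(subtitle_words):
    """Return (word, gesture) pairs for a subtitle segment.

    Unknown words fall back to a fingerspelling placeholder, a common
    convention in sign-language synthesis.
    """
    out = []
    for word in subtitle_words:
        gesture = GESTURE_LOOKUP.get(word.lower(), f"fingerspell:{word.lower()}")
        out.append((word, gesture))
    return out

print(gestures_for_subtitle(["Hello", "world"]))
```

In the full system each gesture would then drive motion data and facial expression data for the rendered avatar.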
  • Patent number: 11776273
    Abstract: Techniques for automatic scene change detection are described. As one example, a computer-implemented method includes receiving a request to train an ensemble of machine learning models on a training dataset of videos having labels that indicate scene changes to detect a scene change in a video, partitioning each video file of the training dataset of videos into a plurality of shots, training the ensemble of machine learning models into a trained ensemble of machine learning models based at least in part on the plurality of shots of the training dataset of videos and the labels that indicate scene changes, receiving an inference request for an input video, partitioning the input video into a plurality of shots, generating, by the trained ensemble of machine learning models, an inference of one or more scene changes in the input video based at least in part on the plurality of shots of the input video, and transmitting the inference to a client application or to a storage location.
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: October 3, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Shixing Chen, Muhammad Raffay Hamid, Vimal Bhat, Shiva Krishnamurthy
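The inference step of the ensemble described above can be sketched by averaging per-model scores over shot boundaries. This is a toy illustration, not the patented method: the real models are trained on labeled videos, while here each model is represented only by a list of scene-change probabilities.

```python
def ensemble_scene_changes(model_predictions, threshold=0.5):
    """Combine an ensemble's per-shot-boundary scene-change probabilities.

    model_predictions: one list of probabilities per model, aligned by
    shot boundary. Boundaries whose averaged score clears the threshold
    are reported as scene changes.
    """
    n_models = len(model_predictions)
    changes = []
    for i in range(len(model_predictions[0])):
        avg = sum(preds[i] for preds in model_predictions) / n_models
        if avg >= threshold:
            changes.append(i)
    return changes
```

Averaging is one simple way to combine an ensemble; weighted voting or a learned combiner would be drop-in alternatives.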
  • Patent number: 11748988
    Abstract: Techniques for automatic scene change detection in a video are described. As one example, a computer-implemented method includes extracting features of a query shot and its neighboring shots of a first set of shots without labels with a query model, determining a key shot of the neighboring shots which is most similar to the query shot based at least in part on the features of the query shot and its neighboring shots, extracting features of the key shot with a key model, training the query model into a trained query model based at least in part on a comparison of the features of the query shot and the features of the key shot, extracting features of a second set of shots with labels with the trained query model, and training a temporal model into a trained temporal model based at least in part on the features extracted from the second set of shots and the labels of the second set of shots.
    Type: Grant
    Filed: April 21, 2021
    Date of Patent: September 5, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Shixing Chen, Xiaohan Nie, David Jiatian Fan, Dongqing Zhang, Vimal Bhat, Muhammad Raffay Hamid
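The key-shot selection step above (picking the neighboring shot most similar to the query shot) can be sketched with cosine similarity over feature vectors. The feature vectors here are plain lists standing in for the query model's learned features.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def most_similar_neighbor(query_feat, neighbor_feats):
    """Index of the neighboring shot whose features best match the query shot."""
    return max(range(len(neighbor_feats)),
               key=lambda i: cosine(query_feat, neighbor_feats[i]))
```

In the patented pipeline the selected key shot is then re-encoded by a separate key model, and the comparison drives training of the query model.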
  • Patent number: 11659217
    Abstract: Techniques are described for detecting desynchronization between an audio component and a video component of a media presentation. Feature sets may be determined for portions of the audio component and portions of the video component, which may then be used to generate correlations between portions of the audio component and portions of the video component. Synchronization may then be assessed based on the correlations.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: May 23, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Hooman Mahyar, Avijit Vajpayee, Abhinav Jain, Arjun Cholkar, Vimal Bhat

  • Patent number: 11610610
    Abstract: Systems and methods are provided for detecting and correcting synchronization errors in multimedia content comprising a video stream and a non-original audio stream. Techniques for directly detecting synchronization of video and audio streams may be inadequate to detect synchronization errors for non-original audio streams, particularly where such non-original audio streams contain audio not reflective of events within the video stream, such as speaking dialog in a different language than the speakers of the video stream. To overcome this problem, the present disclosure enables synchronization of a non-original audio stream to another audio stream, such as an original audio stream, that is synchronized to the video stream. By comparison of signatures, the non-original and other audio stream are aligned to determine an offset that can be used to synchronize the non-original audio stream to the video stream.
    Type: Grant
    Filed: December 10, 2021
    Date of Patent: March 21, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Avijit Vajpayee, Hooman Mahyar, Vimal Bhat, Abhinav Jain, Zhikang Zhang
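The signature-comparison step above, which finds the offset aligning the non-original audio stream to the original one, can be sketched as a brute-force lag search. The dot-product score is a simplified stand-in for whatever signature similarity the patent uses.

```python
def best_offset(sig_a, sig_b, max_lag=5):
    """Lag (in signature frames) that best aligns sig_b to sig_a.

    Scores each candidate lag by the dot product over the overlapping
    region and returns the highest-scoring lag.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            sig_a[i] * sig_b[i - lag]
            for i in range(len(sig_a))
            if 0 <= i - lag < len(sig_b)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

The recovered lag, converted to seconds, is the offset that would be applied to synchronize the non-original audio stream with the video.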
  • Patent number: 11582519
    Abstract: Techniques are disclosed for performing video synthesis of audiovisual content. In an example, a computing system may determine first parameters of a face and body of a source person from a first frame in a video shot. The system also determines second parameters of a face and body of a target person. The system determines that the target person is a replacement for the source person in the first frame. The system generates third parameters of the target person based on merging the first parameters with the second parameters. The system then performs deferred neural rendering of the target person based on a neural texture that corresponds to a texture space of the video shot. The system then outputs a second frame that shows the target person as the replacement for the source person.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: February 14, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Vimal Bhat, Sunil Sharadchandra Hadap, Abhinav Jain
  • Patent number: 11581020
    Abstract: Techniques are disclosed for performing video synthesis of audiovisual content. In an example, a computing system may determine first facial parameters of a face of a particular person from a first frame in a video shot, whereby the video shot shows the particular person speaking a message. The system may determine second facial parameters based on an audio file that corresponds to the message being spoken in a different way from the video shot. The system may generate third facial parameters by merging the first and the second facial parameters. The system may identify a region of the face that is associated with a difference between the first and second facial parameters, render the region of the face based on a neural texture of the video shot, and then output a new frame showing the face of the particular person speaking the message in the different way.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: February 14, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Sunil Sharadchandra Hadap, Vimal Bhat, Abhinav Jain
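The step above that identifies which facial region changed between the two parameter sets can be sketched as a per-parameter difference check. Region names and the tolerance are illustrative assumptions, not values from the patent.

```python
def changed_regions(first_params, second_params, tol=0.05):
    """Names of facial regions whose parameters differ beyond tol.

    In the described pipeline, only these regions would need to be
    re-rendered from the neural texture; unchanged regions keep the
    original frame's pixels.
    """
    return [
        region for region in first_params
        if abs(first_params[region] - second_params[region]) > tol
    ]
```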
  • Patent number: 11216684
    Abstract: Techniques are described for detecting and replacing burned-in subtitles in image and video content.
    Type: Grant
    Filed: February 4, 2020
    Date of Patent: January 4, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Hooman Mahyar, Vimal Bhat, Harshal Dilip Wanjari, Sushanta Das, Neha Aggarwal
  • Patent number: 11093781
    Abstract: A user may indicate an interest relating to events such as objects, persons, or activities, where the events are included in content depicted in a video. The user may also indicate a configurable action associated with the user interest, including receiving a notification via an electronic device. A video item, for example a live-streaming sporting event, may be broken into frames and analyzed frame-by-frame to determine a region of interest. The region of interest is then analyzed to identify objects, persons, or activities depicted in the frame. In particular, the region of interest is compared to stored images that are known to depict different objects, persons, or activities. When a region of interest is determined to be associated with the user interest, the configurable action is triggered.
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: August 17, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Vimal Bhat, Shai Ben Nun, Hooman Mahyar, Harshal Wanjari
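The matching-and-trigger step above can be sketched by comparing a region-of-interest signature against stored reference signatures. Signatures here are toy equal-length lists and the similarity measure (fraction of matching elements) is an assumption; the patent compares regions against stored images of known objects, persons, or activities.

```python
def check_user_interest(roi_signature, reference_signatures, threshold=0.8):
    """Return labels of stored references the region of interest matches.

    reference_signatures maps a label (e.g. a tracked player) to a
    signature; a match above the threshold would trigger the user's
    configured action, such as a notification.
    """
    matches = []
    for label, ref in reference_signatures.items():
        same = sum(1 for a, b in zip(roi_signature, ref) if a == b)
        if same / len(ref) >= threshold:
            matches.append(label)
    return matches
```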
  • Patent number: 10999566
    Abstract: Systems, methods, and computer-readable media are disclosed for systems and methods for automated generation of textual descriptions of video content. Example methods may include determining, by one or more computer processors coupled to memory, a first segment of video content, the first segment including a first set of frames and first audio content, determining, using a first neural network, a first action that occurs in the first set of frames, and determining a first sound present in the first audio content. Some methods may include generating a vector representing the first action and the first sound, and generating, using a second neural network and the vector, a first textual description of the first segment, where the first textual description includes words that describe events of the first segment.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: May 4, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Hooman Mahyar, Vimal Bhat, Jatin Jain, Udit Bhatia, Roya Hosseini
  • Patent number: 10860648
    Abstract: Systems, methods, and computer-readable media are disclosed for detecting a mismatch between the spoken language in an audio file and the audio language that is tagged as the spoken language in the audio file metadata. Example methods may include receiving a media file including spoken language metadata. Certain methods include generating an audio sample from the media file. Certain methods include generating a text translation of the audio sample based on the spoken language metadata. Certain methods include determining that the spoken language metadata does not match a spoken language in the audio sample based on the text translation. Certain methods include sending an indication that the spoken language metadata does not match the spoken language.
    Type: Grant
    Filed: September 12, 2018
    Date of Patent: December 8, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Manolya McCormick, Vimal Bhat, Shai Ben Nun
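The mismatch check above can be illustrated with a toy stopword-based language detector compared against the metadata tag. The stopword sets and two-language scope are assumptions for the sketch; the patent works from a text translation of an audio sample rather than raw word lists.

```python
# Tiny stopword sets standing in for a real language-identification model.
STOPWORDS = {
    "en": {"the", "and", "is"},
    "es": {"el", "la", "es"},
}

def detect_language(words):
    """Pick the language whose stopwords appear most often in the sample."""
    scores = {lang: sum(w.lower() in sw for w in words)
              for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)

def metadata_matches(tagged_lang, words):
    """True if the tagged spoken-language metadata agrees with detection."""
    return detect_language(words) == tagged_lang
```

When `metadata_matches` returns False, the system described above would send an indication that the spoken language metadata does not match the audio.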
  • Publication number: 20200175303
    Abstract: A user may indicate an interest relating to events such as objects, persons, or activities, where the events are included in content depicted in a video. The user may also indicate a configurable action associated with the user interest, including receiving a notification via an electronic device. A video item, for example a live-streaming sporting event, may be broken into frames and analyzed frame-by-frame to determine a region of interest. The region of interest is then analyzed to identify objects, persons, or activities depicted in the frame. In particular, the region of interest is compared to stored images that are known to depict different objects, persons, or activities. When a region of interest is determined to be associated with the user interest, the configurable action is triggered.
    Type: Application
    Filed: December 3, 2018
    Publication date: June 4, 2020
    Inventors: Vimal Bhat, Shai Ben Nun, Hooman Mahyar, Harshal Wanjari
  • Patent number: 10455297
    Abstract: Systems, methods, and computer-readable media are disclosed for systems and methods for customized video content summary generation. Example methods may include determining a first segment of digital content including a first set of frames, first textual content, and first audio content. Example methods may include determining a first event that occurs in the first set of frames, determining a first theme of the first event, generating first metadata indicative of the first theme, and determining a meaning of a first sentence that occurs in the first textual content. Some methods may include determining a second theme of the first sentence, generating second metadata indicative of the second theme, determining that user preference data associated with an active user profile includes the first theme and the second theme, generating a video summary that includes a portion of the first segment of digital content, and presenting the video summary.
    Type: Grant
    Filed: August 29, 2018
    Date of Patent: October 22, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Hooman Mahyar, Harshal Dilip Wanjari, Vimal Bhat
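The preference-matching step of the summary generation described above can be sketched by intersecting per-segment themes with a user profile's themes. Segment and theme names are illustrative only.

```python
def select_summary_segments(segments, user_themes):
    """Pick segments for a customized video summary.

    segments: list of (segment_id, set_of_themes) pairs, where themes
    come from metadata generated for each segment's frames and text.
    A segment is kept if it shares at least one theme with the user's
    preference data.
    """
    user_themes = set(user_themes)
    return [sid for sid, themes in segments if themes & user_themes]
```

The selected segments would then be concatenated into the video summary presented to the active user profile.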