Abstract: This application discloses a motion vector obtaining method and apparatus, a computer device, and a storage medium. In the method, an initial motion vector of a to-be-processed picture block is determined by using a location relationship between a reference block and the to-be-processed picture block. When the reference block and the to-be-processed picture block are located in a same coding tree block, a decoder uses an initial motion vector of the reference block as the initial motion vector of the to-be-processed picture block. When the reference block and the to-be-processed picture block are located in different coding tree blocks, the decoder uses a final motion vector of the reference block as the initial motion vector of the to-be-processed picture block.
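The selection rule in this abstract can be illustrated with a minimal sketch. The `Block` type, the `ctb_index` helper, and the 128-sample coding tree block size are assumptions made for illustration only; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components


@dataclass
class Block:
    """Hypothetical picture block: top-left position plus its two motion vectors."""
    x: int
    y: int
    initial_mv: MV = (0, 0)
    final_mv: MV = (0, 0)


def ctb_index(block: Block, ctb_size: int = 128) -> Tuple[int, int]:
    """Return the (column, row) of the coding tree block containing the block.

    The 128-sample CTB size is an assumed default, not taken from the abstract.
    """
    return (block.x // ctb_size, block.y // ctb_size)


def inherit_initial_mv(current: Block, reference: Block, ctb_size: int = 128) -> MV:
    """Pick the initial MV of the current block from its reference block."""
    if ctb_index(current, ctb_size) == ctb_index(reference, ctb_size):
        # Same coding tree block: reuse the reference block's initial MV.
        return reference.initial_mv
    # Different coding tree blocks: use the reference block's final MV.
    return reference.final_mv
```

One plausible reading of this design is that a neighbor in the same coding tree block may not have finished refining its motion vector yet, so only its initial vector is used, whereas a block in another coding tree block already has a final vector available.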
Abstract: This invention classifies an action that appears in a video clip by receiving a video clip for analysis; applying a convolutional neural network (CNN) mechanism to the frames in the clip to generate a 4D embedding tensor for each frame in the clip; applying a multi-resolution CNN mechanism to each of the frames in the clip to generate a sequence of reduced-resolution blocks; computing a kinematic attention weight that estimates the amount of motion in each block; applying the attention weights to the embedding tensors for each frame in the clip to generate a weighted embedding tensor, or context, that represents all the frames in the clip at that resolution; combining the contexts across all resolutions to generate a multi-resolution context; performing 3D pooling to obtain a 1D feature vector; and classifying a primary action of the video clip based on the feature vector.
Type:
Grant
Filed:
October 26, 2023
Date of Patent:
October 22, 2024
Assignee:
BEN GROUP, INC.
Inventors:
Schubert R. Carvalho, Tyler Folkman, Richard Ray Butler
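The attention-weighting step described in this abstract can be sketched in a heavily simplified form. Everything below is an assumption for illustration: frames and embeddings are flat lists of numbers rather than tensors, motion is estimated as the mean absolute difference between consecutive frames, and weights are normalized with a softmax. The abstract does not state how the kinematic attention weight is actually computed.

```python
import math
from typing import List


def motion_scores(frames: List[List[float]]) -> List[float]:
    """Assumed motion estimate: mean absolute difference to the previous frame.

    The first frame has no predecessor and gets a score of 0.
    """
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        scores.append(sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur))
    return scores


def softmax(values: List[float]) -> List[float]:
    """Normalize scores into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_context(embeddings: List[List[float]], weights: List[float]) -> List[float]:
    """Combine per-frame embeddings into one context vector using the weights."""
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings)) for i in range(dim)]
```

In this sketch, frames with more motion contribute more to the context vector; repeating the computation at several resolutions and combining the results would correspond to the multi-resolution context mentioned in the abstract.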
Abstract: Systems and methods are provided herein for minimizing obstruction of a media asset by an overlay by predicting a path of movement of an object of interest of the media asset and avoiding placement of the overlay in the path of movement. To this end, a media guidance application may detect an object of interest in a first frame of a media asset, and may determine a first location of the object of interest in the first frame and a second location of the object of interest in a second frame. The media guidance application may calculate, based on the first location and the second location, a projected location of the object of interest in a third frame of the media asset, and may generate for display an overlay in a location that does not overlap with any of the first location, the second location, and the projected location.
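The placement logic in this abstract can be sketched as follows. The sketch assumes axis-aligned boxes in `(x, y, width, height)` form, constant-velocity linear extrapolation for the projected location, and a caller-supplied list of candidate overlay positions; none of these details come from the abstract, which only says the projection is based on the first and second locations.

```python
from typing import Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def project(first: Box, second: Box) -> Box:
    """Extrapolate the next location, assuming constant velocity between frames."""
    dx = second[0] - first[0]
    dy = second[1] - first[1]
    return (second[0] + dx, second[1] + dy, second[2], second[3])


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share any interior area."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def place_overlay(candidates: Iterable[Box], first: Box, second: Box) -> Optional[Box]:
    """Return the first candidate that avoids the first, second, and projected boxes."""
    avoid = (first, second, project(first, second))
    for candidate in candidates:
        if not any(overlaps(candidate, box) for box in avoid):
            return candidate
    return None  # no candidate is clear of the object's path
```

For example, an object moving right from `(0, 0, 10, 10)` to `(10, 0, 10, 10)` is projected to `(20, 0, 10, 10)`, so an overlay candidate anywhere along that row would be rejected in favor of one elsewhere on screen.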
Abstract: This invention classifies an action that appears in a video clip by receiving a video clip for analysis; applying a convolutional neural network (CNN) mechanism to the frames in the clip to generate a 4D embedding tensor for each frame in the clip; applying a multi-resolution CNN mechanism to each of the frames in the clip to generate a sequence of reduced-resolution blocks; computing a kinematic attention weight that estimates the amount of motion in each block; applying the attention weights to the embedding tensors for each frame in the clip to generate a weighted embedding tensor, or context, that represents all the frames in the clip at that resolution; combining the contexts across all resolutions to generate a multi-resolution context; performing 3D pooling to obtain a 1D feature vector; and classifying a primary action of the video clip based on the feature vector.
Type:
Grant
Filed:
November 16, 2021
Date of Patent:
December 12, 2023
Assignee:
BEN GROUP, INC.
Inventors:
Schubert R. Carvalho, Tyler Folkman, Richard Ray Butler
Abstract: A method for implementing an adaptive color transform (ACT) during image/video encoding and decoding comprises determining, for a conversion between a video comprising a block and a bitstream of the video, that a size of the block is greater than a maximum allowed size for an ACT mode, and performing, based on the determining, the conversion, wherein, in response to the size of the block being greater than the maximum allowed size for the ACT mode, the block is partitioned into multiple sub-blocks, and wherein each of the multiple sub-blocks shares a same prediction mode, and the ACT mode is enabled at a sub-block level.
Type:
Grant
Filed:
July 5, 2022
Date of Patent:
December 5, 2023
Assignees:
BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD., BYTEDANCE INC.
Inventors:
Weijia Zhu, Jizheng Xu, Li Zhang, Kai Zhang, Yue Wang
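The partitioning behavior in this abstract can be sketched as below. The `SubBlock` type, the tiling strategy, the interpretation of "size" as either dimension exceeding the limit, and the maximum ACT size of 32 samples are all assumptions for illustration; the abstract does not give concrete values or a partitioning scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubBlock:
    """Hypothetical sub-block produced when a block exceeds the ACT size limit."""
    x: int
    y: int
    width: int
    height: int
    pred_mode: str
    act_enabled: bool


def split_for_act(width: int, height: int, pred_mode: str, use_act: bool,
                  max_act_size: int = 32) -> List[SubBlock]:
    """Tile a block into sub-blocks no larger than max_act_size on either side.

    Every sub-block inherits the parent's prediction mode, and ACT is
    signalled per sub-block. A block within the limit yields one sub-block.
    """
    sub_blocks = []
    for y in range(0, height, max_act_size):
        for x in range(0, width, max_act_size):
            sub_blocks.append(SubBlock(
                x, y,
                min(max_act_size, width - x),
                min(max_act_size, height - y),
                pred_mode,
                use_act,
            ))
    return sub_blocks
```

Under these assumptions, a 64x64 block with a 32-sample limit produces four 32x32 sub-blocks that all carry the same prediction mode.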
Abstract: Systems and methods are provided herein for minimizing obstruction of a media asset by an overlay by predicting a path of movement of an object of interest of the media asset and avoiding placement of the overlay in the path of movement. To this end, a media guidance application may detect an object of interest in a first frame of a media asset, and may determine a first location of the object of interest in the first frame and a second location of the object of interest in a second frame. The media guidance application may calculate, based on the first location and the second location, a projected location of the object of interest in a third frame of the media asset, and may generate for display an overlay in a location that does not overlap with any of the first location, the second location, and the projected location.
Abstract: A syntax parsing apparatus includes a plurality of syntax parsing circuits and a dispatcher. Each of the syntax parsing circuits has at least entropy decoding capability. The syntax parsing circuits generate a plurality of entropy decoding results of a plurality of image regions within a same frame, respectively. The dispatcher assigns bitstream start points of the image regions to the syntax parsing circuits, and triggers the syntax parsing circuits to start entropy decoding, respectively.
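The dispatcher role in this abstract can be sketched in software, although the abstract describes hardware circuits. The sketch assumes that start points are sorted byte offsets and that each image region ends where the next one begins; `parse_region` is a hypothetical placeholder that returns raw byte values instead of performing real entropy decoding.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def parse_region(bitstream: bytes, start: int, end: int) -> List[int]:
    """Placeholder for one syntax parsing circuit.

    A real parser would entropy-decode the region; this stand-in only
    returns the byte values between start and end.
    """
    return list(bitstream[start:end])


def dispatch(bitstream: bytes, start_points: List[int], num_parsers: int = 2) -> List[List[int]]:
    """Assign each region's start point to a parser and collect the results in order."""
    # Assumption: each region runs until the next region's start point.
    end_points = start_points[1:] + [len(bitstream)]
    with ThreadPoolExecutor(max_workers=num_parsers) as pool:
        futures = [pool.submit(parse_region, bitstream, start, end)
                   for start, end in zip(start_points, end_points)]
        return [future.result() for future in futures]
```

Calling `dispatch(bytes([1, 2, 3, 4, 5]), [0, 2, 4])` splits the stream into three regions of the same frame and hands them to the parser pool, returning one result list per region.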