INTRA-BLOCK COPY SEARCHING
A video coding device may encode a video signal using intra-block copy prediction. A first picture prediction unit of a first picture may be identified. A second picture may be coded and identified. The second picture may be temporally related to the first picture, and the second picture may include second picture prediction units. A second picture prediction unit that is collocated with the first picture prediction unit may be identified. Prediction information for the first picture prediction unit may be generated. The prediction information may be based on a block vector of the second picture prediction unit that is collocated with the first picture prediction unit.
Latest VID SCALE, INC. Patents:
CROSS REFERENCE
This application claims the benefit of U.S. Provisional Application No. 62/109548, filed on Jan. 29, 2015, which is incorporated herein by reference as if fully set forth.
BACKGROUND
In recent years, screen content sharing applications have become more popular with the advancing capabilities of desktop and mobile computing systems and the corresponding increase in the use of remote desktop, video conferencing, and mobile media presentation applications. The quality of content shared across devices using such applications depends on efficient screen content compression. This is even more important when devices with higher resolutions (e.g., high definition and ultra-high definition) are used.
SUMMARY
Intra-block copy searching may be performed. A video coding device may encode a video signal using intra-block copy prediction. A first picture prediction unit of a first picture may be identified. A second picture may be coded and identified. The second picture may be temporally related to the first picture, and the second picture may include second picture prediction units. A second picture prediction unit that is collocated with the first picture prediction unit may be identified. Prediction information for the first picture prediction unit may be generated. The prediction information may be based on a block vector of the second picture prediction unit that is collocated with the first picture prediction unit.
A hash may be generated for use in searching an image. A visual content block may be partitioned into sub-blocks. A direct current (DC) value of a sub-block may be determined, and a DC value of the visual content block may be determined. The DC value of the sub-block and the DC value of the visual content block may be compared. A hash value may be generated, where the hash value is based on the comparison.
A more detailed understanding may be had from the following description, given by way of example in conjunction with the accompanying drawings.
A detailed description of illustrative embodiments will now be provided with reference to the various figures. Although this description provides a detailed example of possible implementations, it should be noted that the details are intended to be examples and in no way limit the scope of the application.
Screen content sharing applications have become more popular with the advancing capabilities of desktop and mobile computing systems and the corresponding increase in the use of remote desktop, video conferencing, and mobile media presentation applications.
Coding in red, green, blue (“RGB”) space may be used for high-fidelity image display(s). Video compression may be used to encode screen content and/or to transmit such content to a receiver. Video compression may be used for natural video encoding and/or for screen content other than natural video. In natural video coding, coding distortion may be distributed, e.g., over an image and/or a picture. Where one or more large coding errors are present, such errors may be located around “strong” edges that have large gradient values (e.g., edges between sections of such an image and/or picture that have a high contrast to one another, contours of text), making artifacts associated with such errors more visible, even when an associated peak signal-to-noise ratio (PSNR) is relatively high for the whole image and/or picture.
Screen content compression technologies may compress screen content and/or may provide a relatively high quality of service in the associated screen content sharing application(s). Efficient screen content compression methods and systems may be useful, e.g., where users share device content for media presentations and/or remote desktop uses. Screen resolutions of mobile devices have increased to high definition and/or ultra-high definition. Video encoding tools (e.g., block-based hybrid coding technologies) may be optimized for natural video coding and/or may be designed for screen content coding.
An encoder may have mode decision logic 204 that may choose a form of prediction. Selecting a form of prediction may be based on one or more criteria, such as a combination of rate and distortion considerations. Such an encoder may then transform 206 and/or quantize 208 a prediction residual (e.g., a signal representing a difference between an input signal and a prediction signal). Such a quantized residual, together with mode information (e.g., intra- or inter-prediction) and/or prediction information (e.g., block vectors, motion vectors, reference picture indexes, intra prediction modes, etc.) may be compressed at an entropy coder 210 and/or packed into an output video bitstream 216. As shown in
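The mode decision described above, based on a combination of rate and distortion, can be sketched as a Lagrangian choice that minimizes J = D + λ·R. This is a minimal illustration, assuming each candidate mode already has a measured distortion and an estimated bit cost; the `Candidate` type, the mode labels, and the λ values are hypothetical and are not taken from any particular encoder.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    mode: str          # hypothetical label, e.g., "intra", "inter", "intrabc"
    distortion: float  # e.g., SSE between the input block and its reconstruction
    bits: int          # estimated bits for mode, prediction information, and residual


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.bits


def choose_mode(candidates: Sequence[Candidate], lam: float) -> Candidate:
    """Return the candidate with the smallest rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


def residual(block: List[List[int]], pred: List[List[int]]) -> List[List[int]]:
    """Prediction residual: input signal minus prediction signal, sample by sample."""
    return [[s - p for s, p in zip(brow, prow)] for brow, prow in zip(block, pred)]
```

With λ = 5, an "intrabc" candidate with D = 120 and 2 bits (J = 130) beats an "intra" candidate with D = 100 and 10 bits (J = 150); with λ = 0 the lower-distortion "intra" candidate wins. Larger λ values therefore favor cheaper-to-signal modes.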
Video coding standards (e.g., Moving Picture Experts Group (“MPEG”) standards) may be used to reduce transmission bandwidth and/or storage. For example, block based hybrid video coding may be implemented. An encoder and/or decoder may be configured to operate as shown in the systems of
Temporal prediction, which may also be referred to as motion compensation, may be applied to reconstruct inter-coded PUs. Linear filters may be applied to obtain pixel values at fractional positions. For example, interpolation filters may have seven (7) or eight (8) taps for luma and/or four (4) taps for chroma. A deblocking filter may be content based, where different deblocking filter operations may be applied at the TU and PU boundaries, depending on one or more factors (e.g., a coding mode difference, a motion difference, a reference picture difference, a pixel value difference, similar factors, and/or a combination thereof). For entropy coding, context-adaptive binary arithmetic coding (“CABAC”) may be used for block-level syntax elements, but may not be used for high-level parameters. In CABAC coding, one or more bins may be context-coded regular bins, and/or one or more other bins may be bypass-coded bins without context.
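As an illustration of obtaining a pixel value at a fractional position, the sketch below applies the 8-tap half-sample luma filter used in HEVC to one row of samples. It is deliberately simplified: it filters in one dimension only, replicates edge samples (an assumption), and rounds and clips straight to the sample bit depth, whereas a real two-dimensional HEVC interpolator keeps higher intermediate precision between its horizontal and vertical stages.

```python
from typing import Sequence

# HEVC 8-tap luma filter for the half-sample position (coefficients sum to 64).
HALF_PEL_LUMA = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_sample(row: Sequence[int], i: int, bit_depth: int = 8) -> int:
    """Interpolate the value halfway between row[i] and row[i + 1].

    The taps cover row[i - 3] .. row[i + 4]; indices outside the row are
    clamped to the nearest edge sample (edge replication, an assumption).
    """
    n = len(row)
    acc = 0
    for k, coeff in enumerate(HALF_PEL_LUMA):
        idx = min(max(i - 3 + k, 0), n - 1)
        acc += coeff * row[idx]
    value = (acc + 32) >> 6  # divide by 64 with rounding
    return min(max(value, 0), (1 << bit_depth) - 1)  # clip to the valid sample range
```

Because the coefficients sum to 64, a flat row interpolates to the same flat value, and a linear ramp such as 0, 10, 20, ..., 70 yields 35 halfway between 30 and 40.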
Various block coding modes may be used. However, such modes may not fully exploit the spatial redundancy in screen content, e.g., because they were designed with a focus on continuous-tone video content, such as content in 4:2:0 format. Mode decision and/or transform coding tools may not be optimized for discrete-tone screen content that may be captured in 4:4:4 video format.
Screen content material (e.g., text and/or graphics) may show different characteristics. One or more coding tools may be used to increase the coding efficiency of screen content coding. For example, 1-D string copy, palette coding, intra-block copy (“IntraBC”), adaptive color space conversion, and/or adaptive motion vector resolution coding tools may be used.
Screen content such as text and/or graphics may have repetitive patterns in terms of line segments and/or small blocks, and/or may have homogeneous small regions (e.g., mono-color regions or dual-color regions). Few colors may exist within a small block associated with text and/or graphics screen content. In a small block associated with natural video, there may be many colors.
A color value (e.g., a color value at each position) may be repeated from its above or left/right pixel. A 1-D string copy may copy a string (e.g., with variable length) from previously reconstructed pixel buffers. A position and/or string length may also be signaled. For palette coding, a palette table may be used as a dictionary to record significant colors in a coding block. A corresponding palette index map may be used to represent the color value of each pixel within a coding block and/or may be signaled to a decoder, e.g., so that the decoder may reconstruct pixel values (e.g., using a look-up table with the decoded index). Run values may be used to indicate the length of consecutive pixels that may have the same palette index in a horizontal and/or a vertical direction, e.g., to reduce the spatial redundancy of the palette index map. Palette coding may be efficient for large blocks that may contain sparse colors. Redundancy among the components of the native RGB color space may be reduced, e.g., by using an adaptive color space conversion that may convert and/or encode a signal in another color space (e.g., a YCgCo color model (“YCgCo”)).
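The palette idea can be illustrated with a toy encoder that builds a palette table, maps each pixel to a palette index, and run-length codes the index map. This is a minimal sketch, assuming a horizontal raster scan and a palette ordered by color frequency; it omits escape colors, copy-above runs, traverse scans, and palette prediction, which a real screen content codec may use.

```python
from collections import Counter
from typing import List, Tuple

Run = Tuple[int, int]  # (palette index, run length)


def palette_encode(block: List[List[int]]) -> Tuple[List[int], List[Run]]:
    """Return (palette table, run-length coded palette index map)."""
    flat = [c for row in block for c in row]  # horizontal raster scan (assumption)
    # Palette table: distinct colors, most frequent first (ties keep first-seen order).
    palette = [color for color, _ in Counter(flat).most_common()]
    index_of = {color: i for i, color in enumerate(palette)}
    runs: List[Run] = []
    for color in flat:
        idx = index_of[color]
        if runs and runs[-1][0] == idx:
            runs[-1] = (idx, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((idx, 1))  # start a new run
    return palette, runs


def palette_decode(palette: List[int], runs: List[Run], width: int) -> List[List[int]]:
    """Rebuild pixel values by looking up each decoded index in the palette table."""
    flat = [palette[idx] for idx, length in runs for _ in range(length)]
    return [flat[r:r + width] for r in range(0, len(flat), width)]
```

For the two-color block `[[5, 5, 5, 5], [5, 9, 9, 5]]`, the palette is `[5, 9]` and the eight pixels collapse into three runs, `[(0, 5), (1, 2), (0, 1)]`, which is why palette coding suits blocks with few colors.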
Intra-block copy (“IntraBC”) may use reconstructed pixels before in-loop filtering, e.g., to predict one or more current prediction units in a current coding block within a same picture. Displacement information, which may be referred to as a block vector (“BV”), may be coded.
An example IntraBC search 700 is shown in
A BV (e.g., a current block vector, “currBV”), at 804, may be taken from the predictor list. The current block vector may be checked to determine whether it is valid (e.g., whether it may be used for current prediction unit coding), at 806. If currBV is determined to be invalid, it may be determined, at 816, whether other BVs in the predictor list remain to be checked, e.g., whether all BVs in the predictor list have been checked. If not all BVs in the predictor list have been checked, a next BV may be examined. If the BVs in the predictor list have been checked, the most suitable BV may be selected. For example, the best block vector, “Best_BV,” may be set equal to the best BV in a best BV list (e.g., the list “N_best_BV”), at 818. The best BV may be set equal to the most suitable (e.g., first) BV found in a list of BVs sorted by cost. The Best_BV and the N_best_BV list may be identified, at 820.
If currBV is determined to be valid, at 806, the prediction error SADY(currBV) of the luma component with currBV may be calculated, at 808. The cost of the current BV (“Cost(currBV)”) may be set equal to SADY(currBV)+lambda*Bits(currBV), at 810. If, at 812, the cost of currBV is not smaller than the cost of the last BV in the existing best BV list (Cost(currBV)>=Cost(N_best_BV[N-1])), 816 may be followed. If, at 812, the cost of currBV is smaller than the cost of the last BV in the existing best BV list (Cost(currBV)<Cost(N_best_BV[N-1])), currBV may be inserted into a list (e.g., list N_best_BV), at 814. The currBV may be inserted into the list in ascending order, e.g., by cost. If, at 816, the BVs (e.g., all BVs) in the predictor list have been checked, 818 may be followed. If not all BVs in the predictor list have been checked, 804 may be followed.
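The predictor-based search flow (804 through 820) can be sketched as a loop that keeps an ascending-cost list of the N best BVs. The validity check, the luma SAD, and the bit estimate are passed in as hypothetical callables, since the text does not specify how they are computed. The list may hold fewer than N entries at the start, which stands in for comparing against Cost(N_best_BV[N-1]) while that slot is still empty.

```python
import bisect
from typing import Callable, Iterable, List, Optional, Tuple

BV = Tuple[int, int]


def predictor_based_search(
    predictors: Iterable[BV],
    is_valid: Callable[[BV], bool],   # hypothetical: may currBV be used for this PU? (806)
    sad_y: Callable[[BV], float],     # hypothetical: luma SAD with this BV (808)
    bits: Callable[[BV], int],        # hypothetical: estimated bits to code this BV
    lam: float,
    n: int,
) -> Tuple[Optional[BV], List[BV]]:
    """Return (Best_BV, N_best_BV) following the cost-ordered insertion flow."""
    n_best: List[Tuple[float, BV]] = []  # (cost, bv), kept in ascending cost order
    for bv in predictors:                              # 804: take currBV
        if not is_valid(bv):                           # 806: skip invalid BVs
            continue
        cost = sad_y(bv) + lam * bits(bv)              # 808/810: SAD_Y + lambda * Bits
        if len(n_best) < n or cost < n_best[-1][0]:    # 812: beats the last entry?
            bisect.insort(n_best, (cost, bv))          # 814: insert in ascending cost order
            del n_best[n:]                             # keep at most N entries
    best = n_best[0][1] if n_best else None            # 818: lowest-cost BV
    return best, [bv for _, bv in n_best]              # 820
```

Keeping the list sorted means the comparison at 812 only needs the last entry, and Best_BV is simply the first entry once all predictors have been checked.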
The list of N-best BVs may be used in an IntraBC spatial search. For example, the list of N-best BVs may be selected based on a predictor based search, such as the search shown in
In a full-frame IntraBC search, a hash-based intra-block search may be used to reduce search complexity at an encoder. A hash-based search may build a set of hash tables for a given block size (e.g., 8×8). For the K-th candidate block found in such a hash table, the BV relative to the current block may be calculated as:
$$BV.x = \mathrm{Blk\_Pos}[K].x - \mathrm{Curr\_Blk\_Pos}.x \tag{1}$$
$$BV.y = \mathrm{Blk\_Pos}[K].y - \mathrm{Curr\_Blk\_Pos}.y \tag{2}$$
where Blk_Pos[K] may be the (x, y) coordinates of the top left corner of the K-th block and Curr_Blk_Pos may be the (x, y) coordinates of the top left corner of the current block.
The hash value of a block may be generated based on the block's characteristics, such as a horizontal and/or a vertical gradient, direct current (“DC”) information (e.g., a mean value of a signal), or a cyclic redundancy check (CRC) value of the block's pixels. In SCM-3.0, a 16-bit hash value may be generated, e.g., based on original pixel values for 8×8 luma blocks. In equation (3) below, let Grad denote the gradient of one 8×8 block and let DC0, DC1, DC2, and DC3 denote the DC values of the four (4) 4×4 blocks inside the 8×8 block, as shown in diagram 1300 of
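$$H = \left(\mathrm{MSB}(DC_0,3) \ll 13\right) + \left(\mathrm{MSB}(DC_1,3) \ll 10\right) + \left(\mathrm{MSB}(DC_2,3) \ll 7\right) + \left(\mathrm{MSB}(DC_3,3) \ll 4\right) + \mathrm{MSB}(Grad,4) \tag{3}$$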
where MSB(X, n) may be a function of extracting the n most significant bits of X.
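Equation (3) can be sketched for one 8×8 luma block as follows. The sketch rests on several assumptions. Samples are 8-bit. Each DC value is the integer mean of a 4×4 quadrant. DC0 through DC3 are taken in raster order (top-left, top-right, bottom-left, bottom-right), since the referenced diagram is not reproduced here. Grad has already been computed and fits in 8 bits.

```python
from typing import List


def msb(x: int, n: int, bit_depth: int = 8) -> int:
    """MSB(X, n): the n most significant bits of an unsigned bit_depth-bit value."""
    return (x >> (bit_depth - n)) & ((1 << n) - 1)


def quadrant_dc(block: List[List[int]], qx: int, qy: int) -> int:
    """Integer mean of the 4x4 quadrant whose top-left corner is (qx, qy)."""
    total = sum(block[y][x] for y in range(qy, qy + 4) for x in range(qx, qx + 4))
    return total // 16


def hash16(block: List[List[int]], grad: int) -> int:
    """16-bit hash of an 8x8 block per Equation (3); quadrant order is an assumption."""
    dc0 = quadrant_dc(block, 0, 0)
    dc1 = quadrant_dc(block, 4, 0)
    dc2 = quadrant_dc(block, 0, 4)
    dc3 = quadrant_dc(block, 4, 4)
    return ((msb(dc0, 3) << 13) + (msb(dc1, 3) << 10) + (msb(dc2, 3) << 7)
            + (msb(dc3, 3) << 4) + msb(grad, 4))
```

The four 3-bit DC fields and the 4-bit gradient field occupy 3 × 4 + 4 = 16 bits, which matches the 16-bit hash size described above.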
In a hash-based search, several processes may be used, independently and/or together in any combination. For example, a process may include updating a hash table during an encoding process. A process may include using a hash-based search for CU encoding. In a hash table update, after a coding tree unit (CTU) finishes encoding, the top-left position of a reconstructed block (e.g., a block whose bottom-right corner is in the CTU) may be added to a hash table based on the reconstructed block's hash value for future IntraBC coding. In a hash-based search, a current block's hash value may be calculated and the hash table may be searched using that hash value. If the current block's hash value is found in the hash table, the list of corresponding block positions in the hash table may be used for a block vector (“BV”) search. For each block position in that list, the sum of absolute differences (“SAD”) of the luma component between the original signal and the reference block may be compared to find a list of the N best block vectors, i.e., the N having the least cost. Such a cost may be determined in the same manner as a cost is determined in the spatial searches described herein. A best block vector may then be determined by comparing costs computed over the color components. For example, if a hash value of a current block is equal to Hash_1 as shown in hash table 1200 of
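The hash table update and the hash-based BV search can be sketched as follows. For brevity, the sketch uses a hypothetical toy hash (the block mean shifted right by four bits) in place of Equation (3). It ranks candidates by luma SAD only, although the cost described above may also include a λ·Bits term. It also takes the reconstructed picture as an input separate from the original picture. The block vector for each candidate is computed with Equations (1) and (2).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pos = Tuple[int, int]
Picture = Sequence[Sequence[int]]


def toy_hash(pic: Picture, x: int, y: int, size: int) -> int:
    """Hypothetical stand-in hash: mean sample value >> 4."""
    total = sum(pic[y + j][x + i] for j in range(size) for i in range(size))
    return (total // (size * size)) >> 4


def update_hash_table(table: Dict[int, List[Pos]], recon: Picture,
                      positions: Sequence[Pos], size: int) -> None:
    """Add top-left positions of reconstructed blocks, keyed by their hash value."""
    for x, y in positions:
        table[toy_hash(recon, x, y, size)].append((x, y))


def block_sad(a: Picture, ax: int, ay: int, b: Picture, bx: int, by: int, size: int) -> int:
    """Luma SAD between the block of a at (ax, ay) and the block of b at (bx, by)."""
    return sum(abs(a[ay + j][ax + i] - b[by + j][bx + i])
               for j in range(size) for i in range(size))


def hash_search(table: Dict[int, List[Pos]], orig: Picture, recon: Picture,
                cur: Pos, size: int, n: int) -> List[Pos]:
    """Return up to n BVs whose reference blocks share the current block's hash."""
    cx, cy = cur
    scored = []
    for bx, by in table.get(toy_hash(orig, cx, cy, size), []):
        bv = (bx - cx, by - cy)  # Equations (1) and (2)
        scored.append((block_sad(orig, cx, cy, recon, bx, by, size), bv))
    scored.sort()  # ascending SAD; ties broken by BV
    return [bv for _, bv in scored[:n]]
```

The table is filled once per coded CTU and then consulted for each block, so only the candidate positions that share the current block's hash need a SAD evaluation.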
Table 1 below provides example searches and search windows in terms of CU size and partition type for IntraBC search, e.g., as may be used in SCM-3.0. To reduce complexity, a predictor-based search may first be applied to a 2N×2N partition. If there is no non-zero coefficient for 2N×2N IntraBC coding (which may indicate that a 2N×2N search is sufficient), then a spatial and/or hash search for a 2N×2N partition and/or an IntraBC search for other partition types (i.e., non-2N×2N) may be skipped (e.g., may not be performed). Otherwise, partitions including a 2N×2N partition with spatial search and hash-based search may be applied. A hash-based search may be applicable to a 2N×2N partition of an 8×8 CU. Various block vector prediction and block vector coding methods may be used, e.g., to improve BV coding efficiency.
IntraBC searching may use a BV predictor from temporal pictures in predictor-based searches. Screen content may contain regions (e.g., many still regions) that do not change from picture to picture. For such regions, the block vector of a block in the current picture and the block vector of the collocated block in a previous picture may be correlated (e.g., highly correlated).
A hash generation may affect an IntraBC search. For locality-sensitive hash codes, a distance in hash space (e.g., a Hamming distance) may be related to the similarity distance in a feature space. For example, a smaller Hamming distance between the hashes of two blocks may indicate that the two blocks are closer in a feature space measured by Euclidean distance. It may be desirable to cluster such blocks and/or to match them quickly.
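The Hamming distance between two hash values is the number of bit positions in which they differ, i.e., the population count of their XOR. The following is a minimal sketch. The `near_hashes` helper is hypothetical, and it returns the stored hashes that lie within a given distance of a query hash.

```python
from typing import Iterable, List


def hamming(h1: int, h2: int) -> int:
    """Number of differing bits between two non-negative hash values."""
    return bin(h1 ^ h2).count("1")


def near_hashes(query: int, hashes: Iterable[int], max_dist: int) -> List[int]:
    """Hypothetical helper: stored hashes within max_dist bits of the query."""
    return [h for h in hashes if hamming(query, h) <= max_dist]
```

For locality-sensitive hashes, such a neighborhood query groups blocks that are likely to be similar even when their hash values are not identical.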
For RGB coding, an IntraBC spatial search may be performed in YCgCo space to improve the accuracy of a block vector, e.g., because the prediction error may be checked for the luma component rather than for the G component. More energy may be concentrated in the luma component than in the G component, which may cause the luma signal to be sharper than the G signal. A hash generation process for hash-based search may still use sample values of the G component.
If the quantized residuals of the best mode (e.g., the current best mode) are not all zeros, at 1408, whether the coding unit width (e.g., “CU_width”) is less than or equal to 32 may be determined. If CU_width is less than or equal to 32, in inter-slice coding (e.g., inter-mode), a 2N×2N IntraBC with predictor-based search and/or intra-prediction modes may be checked (e.g., checked first), at 1410. At 1412, whether the quantized residuals of the best mode are all zeros may be determined. If they are all zeros, the remaining IntraBC coding modes, including a 2N×2N partition with spatial and/or hash-based search and other partitions (e.g., N×2N, 2N×N, N×N), may not be performed, and 1416 may be followed. IntraBC coding may not signal a prediction direction (e.g., forward uni-prediction, backward uni-prediction, bi-prediction) and/or reference indices. Block vector coding may also be efficient because block vectors may use integer resolution, whereas motion vectors may use quarter-sample precision and/or integer precision. If, at 1412, it is determined that the quantized residuals of the best mode are not all zeros, Intra mode may be checked, at 1414.
At 1416, it may be determined whether the quantized residuals of the best mode are all zeros. If they are all zeros, the remaining steps may be skipped. If they are not all zeros, it may be determined, at 1418, whether CU_width is less than 32. 1418 may be evaluated to reduce encoding complexity (e.g., to reduce complexity for N×2N and/or 2N×N partitions for large CU sizes, such as 32×32 and/or 16×16). If CU_width is greater than or equal to 32, the palette mode may be checked, at 1426. N×2N and/or 2N×N partitions for large CU sizes (e.g., 32×32 and/or 16×16) may not be evaluated (see, e.g., Table 1 above). Not evaluating such partitions may reduce coding efficiency. Such partitions may be selectively evaluated, e.g., to limit encoding complexity while improving coding efficiency. If CU_width is less than 32, at 1418, 2N×2N IntraBC mode may be checked, e.g., with spatial and/or hash searches, at 1420. It may be determined, at 1422, whether the coding unit depth (e.g., “CU_depth”) is equal to max_depth-1. If CU_depth is not equal to max_depth-1, the palette mode may be checked, at 1426. If CU_depth is equal to max_depth-1, IntraBC mode with intra block copy search may be checked for other partitions (e.g., N×2N, 2N×N, N×N), and 1426 may then be performed.
A search order of 1-D search in a vertical direction, as shown in
One or more BV predictors from a previously coded picture may be used in IntraBC search. BV predictors may be generated in one or more ways, including from parent blocks, from spatial neighboring blocks, from reference blocks used for BV derivation, and/or from BVs of collocated blocks in a previously coded picture. For intra-coding configurations (e.g., where pictures are coded as intra-pictures), there may be a correlation (e.g., a strong correlation) between the BV of a current block and the BV of its collocated block in the nearest coded picture. This may be because the current picture and its previously coded picture may be close in temporal distance and/or may be coded at a similar quality using similar quantization parameters. Available BVs from temporally collocated blocks, e.g., in a previously coded picture, may be added in the predictor generation process for predictor-based IntraBC search. Such predictors may be effective in such an intra-coding configuration, e.g., because such pictures may be coded in the same order as the associated display order and/or may be quantized using similar quantization parameters. The temporally collocated positions used for such predictors may be the same as those used for temporal motion vector prediction (“TMVP”).
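Adding a temporally collocated BV to the predictor list can be sketched as below. Several details are assumptions made for illustration: the candidate order (parent, spatial neighbors, derived, then temporal), the pruning of duplicates, and the dictionary that stores the previous picture's BVs by block position. A collocated block that was not IntraBC-coded contributes no predictor.

```python
from typing import Dict, Iterable, List, Optional, Tuple

BV = Tuple[int, int]
Pos = Tuple[int, int]


def collocated_bv(prev_bvs: Dict[Pos, BV], pos: Pos) -> Optional[BV]:
    """BV of the block at the same position in the previously coded picture, if any."""
    return prev_bvs.get(pos)


def build_bv_predictors(parent: Iterable[BV], spatial: Iterable[BV],
                        derived: Iterable[BV], temporal: Iterable[Optional[BV]]) -> List[BV]:
    """Concatenate candidate BVs in a fixed order, dropping missing and duplicate ones."""
    out: List[BV] = []
    seen = set()
    for group in (parent, spatial, derived, temporal):
        for bv in group:
            if bv is None or bv in seen:
                continue
            seen.add(bv)
            out.append(bv)
    return out
```

For a still region, the collocated BV is often identical to the one the current block needs, so appending it costs one list entry and can let the predictor-based search succeed without a spatial or hash search.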
A hash generation method for IntraBC hash-based search may use more local features. Such local features may be calculated, e.g., based on relatively small sub-blocks, and the local features within the associated block may be used for hash generation. Using local features may improve the clustering of blocks. Global features, such as the gradient of such a block, may also be used in hash generation, which may further improve clustering.
A block (e.g., an 8×8 block) may be partitioned into twelve (12) sub-blocks as shown in
For the i-th sub-block SBi of a block B, the associated DC value may be denoted as DC(SBi). The DC value of block B, denoted as DC(B), may then be calculated. A one (1) bit flag F(SBi) may be calculated by comparing DC(SBi) with DC(B) as shown in equation (4):
A 15-bit hash value of block B may be generated by combining a local feature represented by F(SBi) and a global feature represented by the gradient of the block:
$$H(B) = \left(\sum_{i=0}^{11} F(SB_i) \ll (i+3)\right) + \mathrm{MSB}\left(Grad(B), 3\right) \tag{5}$$
Grad(B) may be the gradient of block B. MSB(X, n) may be a function of extracting the n most significant bits of the variable X. The luma value at position (x,y) may be denoted as L(x,y). For a pixel p at position (x,y), an associated gradient grad(p) may be calculated by determining and averaging the gradients in the horizontal and vertical directions:
$$grad(p) = \frac{\left|L(x,y) - L(x-1,y)\right| + \left|L(x,y) - L(x,y-1)\right|}{2} \tag{6}$$
Grad(B) may be calculated as an average of the gradients of pixels p within the block B.
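Equations (5) and (6) can be sketched as below. Because the twelve-sub-block layout and Equation (4) are not reproduced here, the sketch makes three labeled assumptions. First, the caller supplies the sub-block rectangles; the default is a hypothetical 4×3 tiling of the 8×8 block into 2-pixel-wide columns and rows of height 3, 3, and 2. Second, F(SB_i) is 1 when DC(SB_i) is greater than DC(B) and 0 otherwise. Third, a difference term in Equation (6) whose neighbor lies outside the picture is taken as zero. Samples are assumed to be 8-bit, so Grad(B) fits in 8 bits.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) relative to the block
Picture = Sequence[Sequence[int]]

# Hypothetical 12-sub-block layout of an 8x8 block (NOT the layout of the referenced figure).
EXAMPLE_LAYOUT: List[Rect] = [(2 * c, y, 2, h) for y, h in ((0, 3), (3, 3), (6, 2)) for c in range(4)]


def region_mean(pic: Picture, x0: int, y0: int, w: int, h: int) -> float:
    """DC value: mean of the w x h region whose top-left corner is (x0, y0)."""
    return sum(pic[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)) / (w * h)


def pixel_grad(pic: Picture, x: int, y: int) -> float:
    """Equation (6); a difference with a neighbor outside the picture counts as zero."""
    gx = abs(pic[y][x] - pic[y][x - 1]) if x > 0 else 0
    gy = abs(pic[y][x] - pic[y - 1][x]) if y > 0 else 0
    return (gx + gy) / 2


def hash15(pic: Picture, bx: int, by: int,
           layout: Sequence[Rect] = EXAMPLE_LAYOUT, size: int = 8) -> int:
    """15-bit hash of the size x size block at (bx, by), per Equation (5)."""
    dc_b = region_mean(pic, bx, by, size, size)
    h = 0
    for i, (sx, sy, w, hgt) in enumerate(layout):
        flag = 1 if region_mean(pic, bx + sx, by + sy, w, hgt) > dc_b else 0  # assumed Eq. (4)
        h += flag << (i + 3)
    grad_b = sum(pixel_grad(pic, x, y) for y in range(by, by + size)
                 for x in range(bx, bx + size)) / (size * size)
    return h + (int(grad_b) >> 5)  # MSB(Grad(B), 3) for an 8-bit gradient
```

Each sub-block flag records only whether that sub-block is brighter than the whole block. Two blocks with the same spatial brightness pattern therefore receive the same local-feature bits even when their absolute levels differ slightly.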
In a hash-based search, a hash table may be generated for possible positions within a picture, and the hash table may be used in the search. Computing the hash values may be accelerated by first calculating integral images for the DC and/or gradient. The value of an integral image at a given position (x,y) is the sum of the pixel values in the rectangle whose top-left corner is the top-left corner of the image and whose bottom-right corner is (x,y). A hash value may be calculated using the DC integral image and/or the gradient integral image. S_I(x,y) may denote the integral image of a signal I(x,y), which may be luminance and/or gradient. S_I(x,y) may be defined to be zero, e.g., if x and/or y is negative. S_I(x,y) may be calculated using I(x,y) and the neighboring locations as shown in equation (7) and as further illustrated in
$$S_I(x,y) = S_I(x,y-1) + S_I(x-1,y) - S_I(x-1,y-1) + I(x,y) \tag{7}$$
Where the top left position of block B is denoted as (XTL, YTL) and the bottom right position of block B is denoted as (XBR, YBR), the sum of I(x,y) within block B may be calculated as follows:
$$\sum_{(x,y)\in B} I(x,y) = S_I(X_{BR},Y_{BR}) + S_I(X_{TL}-1,Y_{TL}-1) - S_I(X_{TL}-1,Y_{BR}) - S_I(X_{BR},Y_{TL}-1) \tag{8}$$
Equation (8) may be used to calculate a DC value and a gradient of a block efficiently (e.g., more efficiently). The computation complexity for hash value generation may be reduced. Such a fast hash calculation may also be applicable to hash generation as shown in Equation (3) above.
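Equations (7) and (8) can be sketched as follows. Out-of-range terms are treated as zero, as stated above for negative x or y. A block sum then costs four table lookups regardless of block size, which allows DC values to be computed quickly for every candidate position. Gradients can be handled the same way, given a per-pixel gradient image.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """S_I(x, y) per Equation (7), stored as s[y][x]."""
    h, w = len(img), len(img[0])
    s = [[0] * w for _ in range(h)]

    def at(x: int, y: int) -> int:
        return s[y][x] if x >= 0 and y >= 0 else 0  # S_I is zero for negative x or y

    for y in range(h):
        for x in range(w):
            s[y][x] = at(x, y - 1) + at(x - 1, y) - at(x - 1, y - 1) + img[y][x]
    return s


def block_sum(s: List[List[int]], x_tl: int, y_tl: int, x_br: int, y_br: int) -> int:
    """Sum of I(x, y) over the inclusive block [x_tl..x_br] x [y_tl..y_br], per Equation (8)."""

    def at(x: int, y: int) -> int:
        return s[y][x] if x >= 0 and y >= 0 else 0

    return at(x_br, y_br) + at(x_tl - 1, y_tl - 1) - at(x_tl - 1, y_br) - at(x_br, y_tl - 1)
```

The DC value of a block is then `block_sum(...)` divided by the number of pixels in the block. Because the integral image is built once per picture, the per-position hash cost no longer grows with block size.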
The hash generations may be extended to a general method. A region R (e.g., a block or an ellipse) may be partitioned into multiple sub-regions SR using a partitioning rule. For a sub-region SRi, a local feature L(SRi) may be calculated, for example, as the DC and/or gradients within that sub-region, and a clipped value F(SRi) may be generated by comparing the value of the local feature L(SRi) to the value of a global feature G0(R) as shown below in Equations (9) or (10). The value of the global feature may be calculated in a similar manner as the local feature (e.g., as DC or gradients), but evaluated over the region R. M may be a maximum value of the variable F. If F is represented by p bits, M may be set equal to 2p−1 for Equation (9), whereas M may be set equal to 2p−1 for Equation (10).
Parameter S(i) may be a scaling factor for sub-region SRi. Such a scaling factor may be constant across sub-regions, such that all sub-regions are treated equally. Alternatively, the scaling factor may take into account a difference between boundary areas and center areas. For example, the center of a region may be more reliable and/or more important than other parts of the region. Therefore, S(i) may be larger for center sub-regions and/or smaller for boundary sub-regions. A hash value of region R, H(R), may be calculated as shown below in Equation (11):
$$H(R) = \left(\sum_{i=0}^{N} F(SR_i) \ll (i \cdot p + q)\right) + \mathrm{MSB}\left(G_1(R), q\right) \tag{11}$$
where p may be a number of bits for variable F and q may be a number of bits for a second global feature G1(R). The use of a second feature G1(R) may depend on the associated application. A second global feature may be omitted and/or multiple (e.g., multiple additional) global features may be included in the hash.
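Equation (11) packs the p-bit clipped feature values above a q-bit global feature. The sketch below assumes that the clipped values F(SR_i) have already been computed, since Equations (9) and (10) are not reproduced here. It also assumes that G1(R) is an unsigned value of known bit depth (8 bits by default). With p = 1 and q = 3, it reduces to the layout of Equation (5).

```python
from typing import Sequence


def region_hash(f_values: Sequence[int], p: int, g1: int, q: int, g1_bit_depth: int = 8) -> int:
    """H(R) = sum_i F(SR_i) << (i*p + q) + MSB(G1(R), q), per Equation (11)."""
    h = 0
    for i, f in enumerate(f_values):
        if not 0 <= f < (1 << p):
            raise ValueError("each F(SR_i) must fit in p bits")
        h += f << (i * p + q)  # place F(SR_i) in its own p-bit field above the q low bits
    msb_g1 = (g1 >> (g1_bit_depth - q)) & ((1 << q) - 1)  # MSB(G1(R), q)
    return h + msb_g1
```

Because every F(SR_i) occupies its own p-bit field, no two sub-region features overlap, and the total hash width is (number of sub-regions) × p + q bits.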
The hash generations may provide improved clustering capabilities due, at least in part, to the use of local features extracted from sub-blocks. Such hash generations may be used by a variety of applications, e.g., because the computation complexity required by the disclosed methods is low (e.g., relatively low) due to the fast implementation approach described herein that utilizes one or more integral images.
The lower complexity hash generation may be utilized in clustering and/or patch/block matching related applications. For example, the disclosed systems and/or methods may be used for error concealment. In
A block matching related application may be used for point matching in 3D and/or multi-view processing, such as view synthesis and/or depth estimation. A corresponding point in another view may be found when one feature point in a first view is available. Hash-based matching using hash generations may be applicable. For fast face detection, there may be a weak classifier that may be used to remove samples (e.g., unlikely samples) at a stage (e.g., an early stage) of such a process. The samples that pass the weak classifier may be verified by one or more classifiers that may be cascaded. The early removal weak classifier may be trained in the hash space where the hash value is generated using the disclosed methods, reducing the complexity at an early stage of processing, e.g., making face detection faster.
IntraBC search may be used for RGB coding. Where encoding is performed using the RGB format, the color spaces used in spatial search and/or hash-based search may be different. Spatial search may use a YCgCo space, while hash-based search may use an RGB color space. A YCgCo color space may be used for spatial and/or hash-based search, as the luma (Y) component may represent more critical (e.g., high frequency, edge, texture, etc.) information than the G component.
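For reference, the forward and inverse YCgCo transforms can be written as below. This is the plain, non-lifting floating-point form. An encoder that needs exact integer reversibility may instead use a lifting-based variant such as YCgCo-R. The sketch only shows how Y combines all three RGB components, with G weighted most heavily.

```python
from typing import Tuple


def rgb_to_ycgco(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Forward YCgCo transform: Y = R/4 + G/2 + B/4, Cg = -R/4 + G/2 - B/4, Co = R/2 - B/2."""
    y = 0.25 * r + 0.5 * g + 0.25 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    co = 0.5 * r - 0.5 * b
    return y, cg, co


def ycgco_to_rgb(y: float, cg: float, co: float) -> Tuple[float, float, float]:
    """Inverse transform, returning (R, G, B)."""
    t = y - cg  # equals (R + B) / 2
    return t + co, y + cg, t - co
```

For example, RGB (200, 100, 40) maps to Y = 110, Cg = −10, Co = 80, and the inverse returns (200, 100, 40).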
IntraBC search early termination may be used. A 2N×2N partitioned IntraBC with spatial/hash search and/or a non-2N×2N partitioned IntraBC may be skipped, e.g., if the quantized residuals are zeros (e.g., all zeros). Such an early termination condition may be too aggressive and may degrade coding efficiency. If the best mode before the spatial/hash search and/or before the non-2N×2N partition search is IntraBC (with a 2N×2N partition using predictor-based search) with residuals equal to zero (e.g., all zeros), or if that best mode is inter-skip mode, the remaining IntraBC search with other partitions may be skipped. If the best mode before the spatial/hash search and/or before the non-2N×2N partition search is the intra-prediction mode and/or the non-merge inter-mode, the remaining spatial/hash search and/or non-2N×2N partition searches may still be conducted, which may improve coding efficiency.
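The refined early-termination rule can be sketched as a predicate evaluated before the spatial/hash search and the non-2N×2N partition search. The mode labels are hypothetical strings chosen for illustration.

```python
def skip_remaining_intrabc(best_mode: str, residual_all_zero: bool) -> bool:
    """Return True if the remaining IntraBC searches may be skipped.

    best_mode is the best mode found before the spatial/hash search and the
    non-2Nx2N partition search (hypothetical labels):
      "intrabc_2Nx2N_predictor", "inter_skip", "intra", "inter_non_merge".
    """
    if best_mode == "intrabc_2Nx2N_predictor" and residual_all_zero:
        return True   # predictor-based 2Nx2N IntraBC with all-zero residuals
    if best_mode == "inter_skip":
        return True   # inter-skip mode
    return False      # e.g., intra or non-merge inter: keep searching
```

Unlike a rule that stops whenever the residuals are zero, this predicate keeps searching when an intra or non-merge inter mode is currently best, even if its residuals happen to be zero.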
As shown in Table 1, N×2N and 2N×N partitions may not be checked for large CU sizes (e.g., 32×32, 16×16) due to encoding complexity considerations. Such partitions may be checked using predictor-based search for large CU sizes, which may improve coding efficiency without significantly increasing complexity. For intra-slice coding, N×2N partitions may be checked for a 32×32 CU, e.g., using predictor-based search before spatial/hash-based 2N×2N IntraBC checking. A 2N×N partition for a CU (e.g., a large CU) may not be checked for intra-slice coding. For inter-slice coding, N×2N and/or 2N×N partitions may be checked using, e.g., predictor-based search. For example,
The vertical search order for 1-D spatial search may be modified. In
As shown in
The communications systems 2000 may also include a base station 2014a and a base station 2014b. Each of the base stations 2014a, 2014b may be any type of device configured to wirelessly interface with at least one of the WTRUs 2002a, 2002b, 2002c, 2002d to facilitate access to one or more communication networks, such as the core network 2006/2007/2009, the Internet 2010, and/or the networks 2012. By way of example, the base stations 2014a, 2014b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 2014a, 2014b are each depicted as a single element, it will be appreciated that the base stations 2014a, 2014b may include any number of interconnected base stations and/or network elements.
The base station 2014a may be part of the RAN 2003/2004/2005, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc. The base station 2014a and/or the base station 2014b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be divided into cell sectors. For example, the cell associated with the base station 2014a may be divided into three sectors. Thus, in one embodiment, the base station 2014a may include three transceivers, i.e., one for each sector of the cell. In another embodiment, the base station 2014a may employ multiple-input multiple-output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.
The base stations 2014a, 2014b may communicate with one or more of the WTRUs 2002a, 2002b, 2002c, 2002d over an air interface 2015/2016/2017, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface 2015/2016/2017 may be established using any suitable radio access technology (RAT).
More specifically, as noted above, the communications system 2000 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 2014a in the RAN 2003/2004/2005 and the WTRUs 2002a, 2002b, 2002c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 2015/2016/2017 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).
The base station 2014a and the WTRUs 2002a, 2002b, 2002c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 2015/2016/2017 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).
The base station 2014a and the WTRUs 2002a, 2002b, 2002c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
The base station 2014b in
The RAN 2003/2004/2005 may be in communication with the core network 2006/2007/2009, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 2002a, 2002b, 2002c, 2002d. For example, the core network 2006/2007/2009 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in
The core network 2006/2007/2009 may also serve as a gateway for the WTRUs 2002a, 2002b, 2002c, 2002d to access the PSTN 2008, the Internet 2010, and/or other networks 2012. The PSTN 2008 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 2010 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 2012 may include wired or wireless communications networks owned and/or operated by other service providers. For example, the networks 2012 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 2003/2004/2005 or a different RAT.
Some or all of the WTRUs 2002a, 2002b, 2002c, 2002d in the communications system 2000 may include multi-mode capabilities, i.e., the WTRUs 2002a, 2002b, 2002c, 2002d may include multiple transceivers for communicating with different wireless networks over different wireless links. For example, the WTRU 2002c shown in
The processor 2018 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 2018 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 2002 to operate in a wireless environment. The processor 2018 may be coupled to the transceiver 2020, which may be coupled to the transmit/receive element 2022. While
The transmit/receive element 2022 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 2014a) over the air interface 2015/2016/2017. For example, the transmit/receive element 2022 may be an antenna configured to transmit and/or receive RF signals. The transmit/receive element 2022 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. The transmit/receive element 2022 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 2022 may be configured to transmit and/or receive any combination of wireless signals.
In addition, although the transmit/receive element 2022 is depicted in
The transceiver 2020 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 2022 and to demodulate the signals that are received by the transmit/receive element 2022. As noted above, the WTRU 2002 may have multi-mode capabilities. Thus, the transceiver 2020 may include multiple transceivers for enabling the WTRU 2002 to communicate via multiple RATs, such as UTRA and IEEE 802.11, for example.
The processor 2018 of the WTRU 2002 may be coupled to, and may receive user input data from, the speaker/microphone 2024, the keypad 2026, and/or the display/touchpad 2028 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 2018 may also output user data to the speaker/microphone 2024, the keypad 2026, and/or the display/touchpad 2028. In addition, the processor 2018 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 2030 and/or the removable memory 2032. The non-removable memory 2030 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 2032 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. The processor 2018 may access information from, and store data in, memory that is not physically located on the WTRU 2002, such as on a server or a home computer (not shown).
The processor 2018 may receive power from the power source 2034, and may be configured to distribute and/or control the power to the other components in the WTRU 2002. The power source 2034 may be any suitable device for powering the WTRU 2002. For example, the power source 2034 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
The processor 2018 may also be coupled to the GPS chipset 2036, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 2002. In addition to, or in lieu of, the information from the GPS chipset 2036, the WTRU 2002 may receive location information over the air interface 2015/2016/2017 from a base station (e.g., base stations 2014a, 2014b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 2002 may acquire location information by way of any suitable location-determination method.
The processor 2018 may further be coupled to other peripherals 2038, which may include one or more software and/or hardware modules that provide additional features, functionality, and/or wired or wireless connectivity. For example, the peripherals 2038 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
As shown in
The core network 2006 shown in
The RNC 2042a in the RAN 2003 may be connected to the MSC 2046 in the core network 2006 via an IuCS interface. The MSC 2046 may be connected to the MGW 2044. The MSC 2046 and the MGW 2044 may provide the WTRUs 2002a, 2002b, 2002c with access to circuit-switched networks, such as the PSTN 2008, to facilitate communications between the WTRUs 2002a, 2002b, 2002c and traditional land-line communications devices.
The RNC 2042a in the RAN 2003 may also be connected to the SGSN 2048 in the core network 2006 via an IuPS interface. The SGSN 2048 may be connected to the GGSN 2050. The SGSN 2048 and the GGSN 2050 may provide the WTRUs 2002a, 2002b, 2002c with access to packet-switched networks, such as the Internet 2010, to facilitate communications between the WTRUs 2002a, 2002b, 2002c and IP-enabled devices.
As noted above, the core network 2006 may also be connected to the networks 2012, which may include other wired or wireless networks that are owned and/or operated by other service providers.
The RAN 2004 may include eNode-Bs 2060a, 2060b, 2060c, though it will be appreciated that the RAN 2004 may include any number of eNode-Bs. The eNode-Bs 2060a, 2060b, 2060c may each include one or more transceivers for communicating with the WTRUs 2002a, 2002b, 2002c over the air interface 2016. The eNode-Bs 2060a, 2060b, 2060c may implement MIMO technology. Thus, the eNode-B 2060a, for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 2002a.
Each of the eNode-Bs 2060a, 2060b, 2060c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the uplink and/or downlink, and the like. As shown in
The core network 2007 shown in
The MME 2062 may be connected to each of the eNode-Bs 2060a, 2060b, 2060c in the RAN 2004 via an S1 interface and may serve as a control node. For example, the MME 2062 may be responsible for authenticating users of the WTRUs 2002a, 2002b, 2002c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 2002a, 2002b, 2002c, and the like. The MME 2062 may also provide a control plane function for switching between the RAN 2004 and other RANs (not shown) that employ other radio technologies, such as GSM or WCDMA.
The serving gateway 2064 may be connected to each of the eNode-Bs 2060a, 2060b, 2060c in the RAN 2004 via the S1 interface. The serving gateway 2064 may generally route and forward user data packets to/from the WTRUs 2002a, 2002b, 2002c. The serving gateway 2064 may also perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when downlink data is available for the WTRUs 2002a, 2002b, 2002c, managing and storing contexts of the WTRUs 2002a, 2002b, 2002c, and the like.
The serving gateway 2064 may also be connected to the PDN gateway 2066, which may provide the WTRUs 2002a, 2002b, 2002c with access to packet-switched networks, such as the Internet 2010, to facilitate communications between the WTRUs 2002a, 2002b, 2002c and IP-enabled devices.
The core network 2007 may facilitate communications with other networks. For example, the core network 2007 may provide the WTRUs 2002a, 2002b, 2002c with access to circuit-switched networks, such as the PSTN 2008, to facilitate communications between the WTRUs 2002a, 2002b, 2002c and traditional land-line communications devices. For example, the core network 2007 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the core network 2007 and the PSTN 2008. In addition, the core network 2007 may provide the WTRUs 2002a, 2002b, 2002c with access to the networks 2012, which may include other wired or wireless networks that are owned and/or operated by other service providers.
As shown in
The air interface 2017 between the WTRUs 2002a, 2002b, 2002c and the RAN 2005 may be defined as an R1 reference point that implements the IEEE 802.16 specification. In addition, each of the WTRUs 2002a, 2002b, 2002c may establish a logical interface (not shown) with the core network 2009. The logical interface between the WTRUs 2002a, 2002b, 2002c and the core network 2009 may be defined as an R2 reference point, which may be used for authentication, authorization, IP host configuration management, and/or mobility management.
The communication link between each of the base stations 2080a, 2080b, 2080c may be defined as an R8 reference point that includes protocols for facilitating WTRU handovers and the transfer of data between base stations. The communication link between the base stations 2080a, 2080b, 2080c and the ASN gateway 2082 may be defined as an R6 reference point. The R6 reference point may include protocols for facilitating mobility management based on mobility events associated with each of the WTRUs 2002a, 2002b, 2002c.
As shown in
The MIP-HA may be responsible for IP address management, and may enable the WTRUs 2002a, 2002b, 2002c to roam between different ASNs and/or different core networks. The MIP-HA 2084 may provide the WTRUs 2002a, 2002b, 2002c with access to packet-switched networks, such as the Internet 2010, to facilitate communications between the WTRUs 2002a, 2002b, 2002c and IP-enabled devices. The AAA server 2086 may be responsible for user authentication and for supporting user services. The gateway 2088 may facilitate interworking with other networks. For example, the gateway 2088 may provide the WTRUs 2002a, 2002b, 2002c with access to circuit-switched networks, such as the PSTN 2008, to facilitate communications between the WTRUs 2002a, 2002b, 2002c and traditional land-line communications devices. In addition, the gateway 2088 may provide the WTRUs 2002a, 2002b, 2002c with access to the networks 2012, which may include other wired or wireless networks that are owned and/or operated by other service providers.
Although not shown in
Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, terminal, base station, RNC, or any host computer.
Claims
1-35. (canceled)
36. A method for coding videos, the method comprising:
- partitioning a parent block of video data into a plurality of sub-blocks;
- determining a respective local feature of each of the sub-blocks;
- determining a global feature of the parent block of video data;
- calculating a hash value based on the respective local features of the sub-blocks and the global feature of the parent block of video data; and
- using the hash value in a video coding operation.
37. The method of claim 36, wherein determining the respective local feature of each of the sub-blocks comprises determining a respective direct current (DC) value for each of the sub-blocks.
38. The method of claim 37, further comprising:
- determining a DC value for the parent block of video data;
- determining a respective flag value for each of the sub-blocks by comparing the DC value of each corresponding sub-block to the DC value of the parent block of video data, wherein the flag value is determined to be one if the DC value of the corresponding sub-block is greater than the DC value of the parent block of video data and the flag value is determined to be zero if the DC value of the corresponding sub-block is equal to or less than the DC value of the parent block of video data;
- calculating a sum of the flag values for the sub-blocks; and
- using at least the sum of the flag values to calculate the hash value.
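Claims 37 and 38 describe a DC-based local feature. The sketch below is one minimal reading of those steps, assuming a square parent block split into an n x n grid of equal sub-blocks and taking the DC value as the mean sample value. The names `partition`, `dc`, and `dc_flags` are hypothetical and are not taken from the application.

```python
from typing import List, Tuple

Block = List[List[int]]  # rows of sample values (assumed square block)


def partition(block: Block, n: int) -> List[Block]:
    """Split a square block into n x n equal sub-blocks, in raster order."""
    size = len(block)
    if size % n:
        raise ValueError("block size must be divisible by n")
    step = size // n
    subs = []
    for by in range(n):
        for bx in range(n):
            subs.append([row[bx * step:(bx + 1) * step]
                         for row in block[by * step:(by + 1) * step]])
    return subs


def dc(block: Block) -> float:
    """DC value, assumed here to be the mean sample value of the block."""
    return sum(map(sum, block)) / sum(map(len, block))


def dc_flags(block: Block, n: int = 2) -> Tuple[List[int], int]:
    """Flag is 1 if a sub-block's DC exceeds the parent DC, otherwise 0."""
    parent_dc = dc(block)
    flags = [1 if dc(sub) > parent_dc else 0 for sub in partition(block, n)]
    return flags, sum(flags)


block = [[200, 200, 10, 10],
         [200, 200, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
print(dc_flags(block))  # -> ([1, 0, 0, 0], 1)
```

A sub-block whose DC equals the parent DC gets a zero flag, matching the "equal to or less than" condition of claim 38; a flat block therefore yields a flag sum of zero.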
39. The method of claim 36, wherein determining the respective local feature of each of the sub-blocks comprises applying a respective scaling factor to the local feature.
40. The method of claim 39, wherein the respective scaling factors applied to different sub-blocks have different values.
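Claims 39 and 40 leave the scaling factors unspecified. As a hypothetical illustration only, the sketch below weights each sub-block's DC flag by a distinct power of two (an assumed choice for a 2x2 split), so that the scaled sum records which sub-blocks are brighter than the parent rather than only how many are. The names `POSITION_SCALES` and `scaled_local_features` are illustrative.

```python
from typing import List, Sequence

# Hypothetical per-position scaling factors for a 2x2 split, in raster order.
# Distinct powers of two make each sub-block contribute a different bit.
POSITION_SCALES = (1, 2, 4, 8)


def scaled_local_features(flags: Sequence[int],
                          scales: Sequence[int] = POSITION_SCALES) -> List[int]:
    """Apply a respective scaling factor to each sub-block's local feature."""
    if len(flags) != len(scales):
        raise ValueError("one scaling factor is needed per sub-block")
    return [f * s for f, s in zip(flags, scales)]


# Unscaled, both flag patterns sum to 1; scaled, they sum to 1 and 8.
print(sum(scaled_local_features([1, 0, 0, 0])))  # -> 1
print(sum(scaled_local_features([0, 0, 0, 1])))  # -> 8
```

With equal scaling factors the two patterns above would collide; using different factors per sub-block, as claim 40 allows, is one way to keep them apart.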
41. The method of claim 36, wherein determining the global feature of the parent block of video data comprises determining a gradient value for the parent block of video data, and calculating the hash value based on the respective local features of the sub-blocks and the global feature of the parent block of video data comprises extracting a number of bits from the gradient value of the parent block and using at least the number of bits to calculate the hash value.
42. The method of claim 41, wherein the gradient value for the parent block of video data is determined as an average of the respective gradient values associated with a plurality of pixels in the parent block of video data.
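Claims 41 and 42 take the global feature from an averaged gradient. The following sketch assumes a per-pixel gradient of |dx| + |dy| computed with forward differences (only where both neighbours exist), integer averaging, the low `num_bits` bits of the average as the extracted bits, and a hash layout that places the flag sum of claim 38 above those bits. All of these choices, and the names `average_gradient` and `combine_hash`, are illustrative assumptions.

```python
from typing import List

Block = List[List[int]]  # rows of sample values


def average_gradient(block: Block) -> int:
    """Integer mean of |dx| + |dy| over pixels that have right and lower neighbours."""
    h, w = len(block), len(block[0])
    total, count = 0, 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = abs(block[y][x + 1] - block[y][x])
            gy = abs(block[y + 1][x] - block[y][x])
            total += gx + gy
            count += 1
    return total // count if count else 0


def combine_hash(flag_sum: int, avg_grad: int, num_bits: int = 8) -> int:
    """Hypothetical layout: flag sum in the high bits, gradient bits below."""
    grad_bits = avg_grad & ((1 << num_bits) - 1)  # extract num_bits bits
    return (flag_sum << num_bits) | grad_bits


block = [[200, 200, 10, 10],
         [200, 200, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
print(average_gradient(block))  # -> 84
print(combine_hash(1, 84))      # -> 340
```

Which bits to extract is a design choice: low bits separate blocks finely but are sensitive to noise, while higher bits (obtained by shifting before masking) group similar blocks together.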
43. The method of claim 36, wherein using the hash value in a video coding operation comprises using the hash value to conduct a hash-based search.
44. The method of claim 43, wherein the hash-based search is conducted in an intra-block copy mode.
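For claims 43 and 44, a hash-based search can be sketched as a table that maps block hashes to the positions of already-coded candidate blocks in the current picture. The sketch below assumes the caller supplies the list of coded candidate positions and a pluggable `hash_fn` (a trivial stand-in hash is used in the usage lines). Because different blocks may share a hash, each candidate is verified sample by sample before its block vector is returned. All names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Block = List[List[int]]
Pos = Tuple[int, int]  # (x, y) of a block's top-left sample


def get_block(pic: Block, x: int, y: int, size: int) -> Block:
    """Extract the size x size block whose top-left sample is at (x, y)."""
    return [row[x:x + size] for row in pic[y:y + size]]


def build_hash_table(pic: Block, size: int, coded: Sequence[Pos],
                     hash_fn: Callable[[Block], int]) -> Dict[int, List[Pos]]:
    """Index already-coded candidate blocks by their hash value."""
    table: Dict[int, List[Pos]] = defaultdict(list)
    for x, y in coded:
        table[hash_fn(get_block(pic, x, y, size))].append((x, y))
    return table


def hash_search(pic: Block, cur: Pos, size: int, table: Dict[int, List[Pos]],
                hash_fn: Callable[[Block], int]) -> Optional[Pos]:
    """Return a block vector to an identical coded block, or None."""
    cx, cy = cur
    target = get_block(pic, cx, cy, size)
    for x, y in table.get(hash_fn(target), []):
        # Different blocks can share a hash, so confirm an exact match.
        if get_block(pic, x, y, size) == target:
            return (x - cx, y - cy)
    return None


def stand_in_hash(b: Block) -> int:
    """Placeholder hash for the example only."""
    return sum(map(sum, b)) % 256


pic = [[1, 2, 3, 4, 1, 2, 3, 4],
       [5, 6, 7, 8, 5, 6, 7, 8],
       [9, 9, 9, 9, 9, 9, 9, 9],
       [0, 1, 0, 1, 0, 1, 0, 1]]
table = build_hash_table(pic, 4, [(0, 0)], stand_in_hash)
print(hash_search(pic, (4, 0), 4, table, stand_in_hash))  # -> (-4, 0)
```

A practical intra-block copy encoder would also restrict candidates to valid reference regions and would typically choose among several matches by coding cost; those steps are omitted here.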
45. The method of claim 36, wherein the parent block of video data comprises multiple coding units.
46. A video coding device, comprising:
- a processor configured to:
- partition a parent block of video data into a plurality of sub-blocks;
- determine a respective local feature of each of the sub-blocks;
- determine a global feature of the parent block of video data;
- calculate a hash value based on the respective local features of the sub-blocks and the global feature of the parent block of video data; and
- use the hash value in a video coding operation.
47. The video coding device of claim 46, wherein the processor being configured to determine the respective local feature of each of the sub-blocks comprises the processor being configured to determine a respective direct current (DC) value for each of the sub-blocks.
48. The video coding device of claim 47, wherein the processor is further configured to:
- determine a DC value for the parent block of video data;
- determine a respective flag value for each of the sub-blocks by comparing the DC value of each corresponding sub-block to the DC value of the parent block of video data, wherein the flag value is determined to be one if the DC value of the corresponding sub-block is greater than the DC value of the parent block of video data and the flag value is determined to be zero if the DC value of the corresponding sub-block is equal to or less than the DC value of the parent block of video data;
- calculate a sum of the flag values for the sub-blocks; and
- use at least the sum of the flag values to calculate the hash value.
49. The video coding device of claim 46, wherein the processor being configured to determine the global feature of the parent block of video data comprises the processor being configured to determine a gradient value for the parent block of video data, and wherein the processor being configured to calculate the hash value based on the respective local features of the sub-blocks and the global feature of the parent block of video data comprises the processor being configured to extract a number of bits from the gradient value of the parent block and use at least the number of bits to calculate the hash value.
50. The video coding device of claim 49, wherein the processor being configured to use the hash value in a video coding operation comprises the processor being configured to use the hash value to conduct a hash-based search.
Type: Application
Filed: Nov 7, 2019
Publication Date: Mar 5, 2020
Applicant: VID SCALE, INC. (Wilmington, DE)
Inventors: Yuwen He (San Diego, CA), Xiaoyu Xiu (San Diego, CA), Yan Ye (San Diego, CA)
Application Number: 16/676,708