International organisation for standardisation organisation internationale de normalisation


5.19 Memory bandwidth reduction

5.19.1.1.1.1.1.1.1 JCTVC-H0052 Coding with Group of LCUs (GOC) [L. Liang, M. Li, J. Lin, N. Wang, W. Zuo (ZTE)]

This contribution introduces a coding concept called "coding with group of LCUs" (GOC). A picture, slice or tile is partitioned into rows, each containing more than one LCU row and forming one GOC, and the scan order of the LCUs within a GOC changes from raster scan to zigzag scan. GOCs are processed in raster scan order within a picture, slice or tile. It was argued that this allows better bandwidth savings for ASIC encoders with limited on-chip memory. The goc_height value, indicating the fixed number of LCU rows in a GOC, would be signalled in the sequence parameter set or in the picture parameter set. It was asserted that the GOC scheme can co-exist with slices and tiles, would be optional in the encoder, and would have a negligible impact on the decoder design. Experimental results using the compulsory configurations with one tile and one slice per picture and 3 LCU rows per GOC reportedly show that the BD bit rate gain when using GOC is 0.1% on average, and 0.2% for the class E sequences.

A memory bandwidth saving of 50% was reportedly achieved in motion estimation (on the encoder side). At the decoder side, the same zigzag scan can be performed by using wavefront processing with a single thread (exploiting the independence of diagonal blocks), which does not need to be normative.
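The notes do not specify the exact zigzag pattern used in JCTVC-H0052. As a rough sketch only, the following Python function enumerates LCUs GOC by GOC and, inside each GOC, visits them along diagonals so that the left, top and top-right neighbours of every LCU are visited first, which is the dependency pattern that wavefront processing relies on. The function name, the diagonal rule (`col + 2 * row`) and the handling of a shorter last GOC are assumptions made for illustration, not details taken from the contribution.

```python
def goc_scan_order(width_in_lcus, height_in_lcus, goc_height):
    """Return (row, col) LCU coordinates in an ASSUMED GOC coding order.

    GOCs are taken top to bottom (raster order of GOCs). Inside a GOC the
    LCUs are visited by increasing step = col + 2 * row, so the left, top
    and top-right neighbours of an LCU always come earlier in the order.
    """
    order = []
    for goc_top in range(0, height_in_lcus, goc_height):
        # Assumption: the last GOC is simply shorter if the picture height
        # is not a multiple of goc_height.
        rows = min(goc_height, height_in_lcus - goc_top)
        last_step = (width_in_lcus - 1) + 2 * (rows - 1)
        for step in range(last_step + 1):
            for row in range(rows):
                col = step - 2 * row
                if 0 <= col < width_in_lcus:
                    order.append((goc_top + row, col))
    return order


if __name__ == "__main__":
    # A picture 4 LCUs wide and 2 LCU rows high, coded as a single GOC.
    print(goc_scan_order(4, 2, 2))
```

With `goc_height` set to 1 the order degenerates to the ordinary raster scan, which matches the statement that the scheme would be optional in the encoder.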

There was no support for this among the other experts.

5.19.1.1.1.1.1.1.2 JCTVC-H0089 AHG7: Feasibility study results on virtual motion compensation memory bandwidth verifier (VMBV) [H. Kim, M. Zhou (TI)]

This contribution reports feasibility study results on a virtual motion compensation memory bandwidth verifier (VMBV) aimed at lowering the worst-case motion compensation memory bandwidth requirements, as proposed in JCTVC-G095. To impose VMBV constraints on bitstreams, a motion compensation memory bandwidth control algorithm was implemented in the encoder motion estimation of HM5.0 to guarantee that the resulting bitstreams meet the memory bandwidth budget. The control algorithm works by adjusting the number of reference frames and the CU depth, and operates on an LCU-by-LCU basis. The memory bandwidth measurement tool used is the one described in JCTVC-C007. In the experiment, the memory bandwidth budget was set to 4 bytes per pixel; 13 sequences in the common test conditions consumed more bandwidth than the budget. The algorithm was able to bring all of these sequences into compliance with the VMBV constraints. However, some sequences had relatively high loss (up to 5.3%). Although the control algorithm was described as simple and not mature, the results were asserted to show that it is feasible to impose VMBV compliance without causing a significant burden on the encoder.

The idea of a normative restriction of bitstreams by a VMBV model was discussed. Several experts expressed the opinion that this could be an interesting idea, but that the concrete model would need very careful investigation. It was also questioned whether this might penalize compression performance too much.

A side remark was that this could restrict the occurrence of some "evil bitstreams" which would by default not comply. It was suggested that this could also be defined more locally.

Further study (in an AHG) was encouraged.

5.19.1.1.1.1.1.1.3 JCTVC-H0096 AHG7: Controllable memory bandwidth reduction with bi-pred to uni-pred conversion [T. Ikai (Sharp)]

This contribution presents a bi-prediction restriction to reduce motion compensation complexity. In the proposed method, the inter_4x4_enabled_flag is replaced with an inter_prediction_restriction_flag that controls both the PU size restriction and the bi-prediction PU size restriction. If inter_prediction_restriction_flag is enabled, bi-prediction of 4x8 and 8x4 PUs or of 8x8 PUs is prohibited, depending on MinCUSize. Specifically, 4x8 and 8x4 bi-prediction is prohibited when MinCUSize is 8, while 8x8 bi-prediction is prohibited when MinCUSize is 16. If bi-prediction is disabled, the related syntax element inter_pred_flag is omitted from the bitstream and bi-prediction to uni-prediction conversion is performed in the merge motion parameter derivation process. For the 4x8 and 8x4 bi-prediction restriction, the average BD bit rate loss was reported to be 0.1% to 0.3% with 6% to 10% less encoding time, while the average loss in the encoder-only case is 0.2% to 0.8%. For the 8x8 bi-prediction restriction, the BD bit rate loss was reported to be 0.9% to 3.7% with 21% to 31% less encoding time, while the average loss in the encoder-only case is 1.2% to 4.6%.
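The restriction rule described above can be sketched as a small predicate. This is only an illustration of the rule as summarized in these notes; the function name and argument layout are hypothetical, and the actual proposal text in JCTVC-H0096 may define further cases.

```python
def bi_prediction_allowed(pu_width, pu_height, min_cu_size,
                          inter_prediction_restriction_flag):
    """Return True if a PU of the given size may use bi-prediction.

    Encodes only the two cases stated in the notes:
      * MinCUSize == 8  -> 4x8 and 8x4 bi-prediction is prohibited
      * MinCUSize == 16 -> 8x8 bi-prediction is prohibited
    """
    if not inter_prediction_restriction_flag:
        return True  # restriction switched off: nothing is prohibited
    size = (pu_width, pu_height)
    if min_cu_size == 8 and size in {(4, 8), (8, 4)}:
        return False
    if min_cu_size == 16 and size == (8, 8):
        return False
    return True


if __name__ == "__main__":
    print(bi_prediction_allowed(4, 8, 8, True))    # prohibited case
    print(bi_prediction_allowed(16, 16, 8, True))  # larger PU, unaffected
```

When the predicate returns False, the notes indicate that inter_pred_flag would not be sent for that PU, so a decoder following this scheme would have to evaluate the same condition during parsing.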

This seemed similar to H0221. Practical encoders may not use 4x4/4x8/8x4 with bi-prediction at all, and if bi-prediction is not used, the merge mode may not be too useful either. Less loss in coding efficiency is observed when this is implemented in a decoder-normative way (compared to being a bitstream restriction).

5.19.1.1.1.1.1.1.4 JCTVC-H0421 AHG7: Cross-verification of controllable memory bandwidth reduction with bi-pred to uni-pred conversion (JCTVC-H0096 from Sharp) [M. Zhou (TI)] [late]

5.19.1.1.1.1.1.1.5 JCTVC-H0181 AHG7: A restriction of motion vector for small PU size [T. Chujoh, T. Yamakage (Toshiba)]

Experimental results for a restriction of motion vectors for small PU sizes were reported. This is a non-normative technology to reduce memory bandwidth for motion compensation. The worst case for the memory bandwidth of the interpolation process occurs at two-dimensional interpolation positions for both luma and chroma of a bi-prediction PU. Therefore, in order to reduce the worst-case memory bandwidth, an encoding method was proposed in which, for example, at least one of the L0 and L1 motion vectors is restricted to an integer position for both luma and chroma. The reported loss in coding efficiency is 0.45% on average, which is smaller than the loss from prohibiting both 4x8 and 8x4 bi-prediction. The worst-case memory bandwidths of the two approaches are reportedly almost the same.
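A minimal sketch of how an encoder might test this condition is given below. It assumes motion vectors stored in quarter-luma-sample units and 4:2:0 chroma, so that the same vector addresses chroma in eighth-sample units; under those assumptions a component is at an integer position for both luma and chroma only if it is a multiple of 8. The function names are hypothetical and do not come from the contribution.

```python
def is_integer_for_luma_and_chroma(mv):
    """True if a motion vector needs no interpolation for luma or chroma.

    Assumes quarter-sample luma units and 4:2:0 chroma (eighth-sample
    chroma units), so both components must be multiples of 8.
    """
    mv_x, mv_y = mv
    return mv_x % 8 == 0 and mv_y % 8 == 0


def satisfies_small_pu_restriction(mv_l0, mv_l1):
    """True if at least one of the L0/L1 vectors of a bi-predicted PU
    points to an integer position for both luma and chroma."""
    return (is_integer_for_luma_and_chroma(mv_l0)
            or is_integer_for_luma_and_chroma(mv_l1))


if __name__ == "__main__":
    print(satisfies_small_pu_restriction((8, -16), (3, 5)))  # L0 is integer
    print(satisfies_small_pu_restriction((4, 0), (3, 5)))    # neither is
```

An encoder applying the restriction would only evaluate this check for the small bi-predicted PU sizes it wants to constrain and would reject motion vector pairs that fail it during motion estimation.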

Examples of restrictions were prohibiting bi-prediction for 4x8/8x4 PUs, or constraining the MV of one of L0 or L1 such that no fractional position needs to be interpolated for chroma.

This would be a normative constraint imposed on bitstreams, but it would not change the normative specification of syntax, semantics and decoding. Such constraints could be imposed by level restrictions.

5.19.1.1.1.1.1.1.6 JCTVC-H0104 AHG7: Crosscheck of Toshiba memory bandwidth reduction proposal (JCTVC-H0181) [T. Ikai (Sharp)] [late]

5.19.1.1.1.1.1.1.7 JCTVC-H0221 AHG7: Modification of merge candidate derivation to reduce MC memory bandwidth [K. Kondo, T. Suzuki (Sony)]

This contribution proposes to replace bi-prediction merge candidates with uni-prediction candidates when the block size is small (e.g. 4x4, 4x8, 8x4 and 8x8). The technique aims to avoid the coding efficiency loss that results from restricting bi-prediction for small blocks. Restricting bi-prediction for small blocks is a way to limit the maximum memory bandwidth. It was asserted that when bi-prediction for small blocks is restricted by level, the encoder cannot choose bi-prediction merge candidates, which makes it difficult to use merge and skip mode and introduces coding efficiency loss. The proposal keeps merge and skip mode usable by changing the prediction direction of such candidates to L0. For level 3, without the proposed method the BD BR impact is reportedly 1.1%, 1.6%, 0.8%, 1.4% and 2.3% for RA-HE, RA-LC, RA-HE10, LB-HE and LB-LC. With the proposed method, the BD BR impact is reportedly 0.8%, 1.2%, 0.6%, 0.8% and 1.4%.
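The candidate replacement can be sketched as follows. The data structure, names and the set of "small" block sizes are assumptions for illustration (the sizes are the examples listed above); the actual derivation process proposed in JCTVC-H0221 is not reproduced in these notes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MotionVector = Tuple[int, int]

# Block sizes treated as "small"; taken from the examples in the notes.
SMALL_BLOCKS = {(4, 4), (4, 8), (8, 4), (8, 8)}


@dataclass(frozen=True)
class MergeCandidate:
    """Hypothetical merge candidate: one optional motion per reference list."""
    mv_l0: Optional[MotionVector]
    ref_idx_l0: Optional[int]
    mv_l1: Optional[MotionVector]
    ref_idx_l1: Optional[int]

    @property
    def is_bi_pred(self) -> bool:
        return self.mv_l0 is not None and self.mv_l1 is not None


def restrict_merge_candidate(cand: MergeCandidate,
                             pu_width: int, pu_height: int) -> MergeCandidate:
    """For small PUs, turn a bi-prediction candidate into L0 uni-prediction."""
    if (pu_width, pu_height) in SMALL_BLOCKS and cand.is_bi_pred:
        return MergeCandidate(cand.mv_l0, cand.ref_idx_l0, None, None)
    return cand


if __name__ == "__main__":
    bi = MergeCandidate((4, -2), 0, (-3, 1), 1)
    print(restrict_merge_candidate(bi, 8, 4))    # converted to L0 only
    print(restrict_merge_candidate(bi, 16, 16))  # left unchanged
```

Because the conversion happens inside the candidate derivation, the merge index can still be signalled for small blocks, which is the reason given for the smaller loss compared with a pure level restriction.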

Two syntax elements were suggested: Disable bi-prediction for 4x4/8x4 etc.; and Disable merge mode. It was suggested to remove the inter 4x4 enabled flag instead.

By introducing a syntax element that prohibits usage of bandwidth-intense operations for small PUs, the loss in compression efficiency is reduced.

However, no results of actual bandwidth reduction were given here.

Several experts expressed concern on the removal of the inter 4x4 flag.

Further study appeared needed in an AHG to get an assessment about:

how much memory bandwidth is saved versus loss of compression, and by which constraints
whether it is sufficient to impose constraints on bitstreams or is necessary to introduce syntax elements which change the decoding process
the complexity implications, e.g. complicating the parsing process

Only then can the answer be determined regarding what actions to take.

5.19.1.1.1.1.1.1.8 JCTVC-H0105 AHG7: Crosscheck of Sony memory bandwidth reduction proposal (JCTVC-H0221) [T. Ikai (Sharp)] [late]

5.19.1.1.1.1.1.1.9 JCTVC-H0267 AHG15: Constraint the number of motion vector for memory bandwidth reduction [C.S. Park, T. Kosuge, J.H. Kim, K.H. Lee, J.H. Park, C. Kim (Samsung)]

In HEVC, one 64x64 LCU can have 512 motion vectors when all partitions are 4x4 inter partitions with bi-directional prediction. It was asserted that, if all partitions use 4x4 inter mode with two motion vectors each for bi-directional prediction, it cannot be guaranteed that a real-time decoder could be implemented for very high resolution video (beyond full HD), due to the memory bandwidth requirement. An SPS flag controlling 4x4 inter partitions (inter_4x4_enabled_flag), which leaves them disabled in the default configuration, was reportedly adopted at the Torino meeting. There was supplementary information on restricting specific block sizes to reduce memory bandwidth. However, it was asserted that restricting block-size modes is not the only solution for the memory bandwidth problem.
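The figure of 512 follows from simple counting, as the short sketch below shows: a 64x64 LCU holds 16 x 16 = 256 partitions of size 4x4, and bi-directional prediction gives each of them two motion vectors. The function name and parameters are illustrative only.

```python
def max_motion_vectors(lcu_size=64, pu_size=4, mvs_per_pu=2):
    """Worst-case number of motion vectors in one LCU made of square PUs."""
    pus_per_side = lcu_size // pu_size
    return pus_per_side * pus_per_side * mvs_per_pu


if __name__ == "__main__":
    print(max_motion_vectors())            # all 4x4, bi-directional
    print(max_motion_vectors(pu_size=8))   # all 8x8, bi-directional
```

Running the same count with 8x8 partitions gives 128 vectors per LCU, a quarter of the 4x4 worst case.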

It was commented that a similar constraint is in AVC; however, the issue may be more difficult to control in the case of HEVC.

5.19.1.1.1.1.1.1.10 JCTVC-H0706 Cross-check report for Samsung constraint the number of motion vector for memory bandwidth reduction (128-MV restriction in JCTVC-H0267) [S.-C. Lim, H. Y. Kim, J. Lee (ETRI)] [late]

5.19.1.1.1.1.1.1.11 JCTVC-H0441 Motion Compensation Restrictions to Alleviate Memory Bandwidth Concerns for High Resolution Video [T. Hellman, W. Wan (Broadcom)]

This proposal recommended adding a profile-independent level limit on the types of motion compensation prediction units, to alleviate worst-case motion compensation bandwidth. It notes that the AVC standard has a level limit on the number of motion vectors per MB pair, but the proposal claims that restricting 4x8 and 8x4 PUs to uni-prediction for HD and larger-sized pictures would be a preferred method to address the same concerns for HEVC.

4x8 and 8x4 PUs cause the main waste of bandwidth, particularly with bi-prediction.
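A back-of-the-envelope calculation illustrates why small bi-predicted PUs dominate the worst case. The sketch below counts the reference samples that must be fetched per predicted luma pixel, assuming an 8-tap luma filter, a 4-tap chroma filter and 4:2:0 sampling (as in the final HEVC design), and ignoring memory alignment, burst size and caching. It is an idealized model written for this illustration, not the measurement tool used by the contributors or the one described in JCTVC-C007.

```python
LUMA_TAPS = 8     # assumed luma interpolation filter length
CHROMA_TAPS = 4   # assumed chroma interpolation filter length


def reference_samples_per_pixel(width, height, bi_pred):
    """Reference samples fetched per luma pixel of a width x height PU.

    An N-tap separable filter needs N - 1 extra rows and columns around
    the block. Chroma is 4:2:0, i.e. two planes at half resolution.
    """
    luma = (width + LUMA_TAPS - 1) * (height + LUMA_TAPS - 1)
    chroma = 2 * (width // 2 + CHROMA_TAPS - 1) * (height // 2 + CHROMA_TAPS - 1)
    predictions = 2 if bi_pred else 1
    return predictions * (luma + chroma) / (width * height)


if __name__ == "__main__":
    for w, h in [(4, 8), (8, 4), (8, 8), (16, 16), (64, 64)]:
        uni = reference_samples_per_pixel(w, h, bi_pred=False)
        bi = reference_samples_per_pixel(w, h, bi_pred=True)
        print(f"{w}x{h}: uni {uni:.2f}, bi {bi:.2f} samples per pixel")
```

Under these assumptions the filter margins are a fixed overhead, so the ratio grows quickly as the block shrinks, and bi-prediction doubles it. With 8-bit samples the values can be read directly as bytes per pixel.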

Such constraints could be imposed just at higher levels. (The loss is smaller in classes A and B.)

Assessment reportedly shows that worst-case decoder memory bandwidth then is only 20% higher than in AVC (using the contributor's own tool for bandwidth measurement).

The loss in compression should stay acceptable.

Several experts expressed the opinion that prohibiting bi-prediction with 4x8/8x4 PUs over all levels would be the simplest and most effective solution. This would not even require definition by high-level syntax flags.

5.19.1.1.1.1.1.1.12 JCTVC-H0600 AHG7: Level definition to limit memory bandwidth of MC [K. Kondo, Y. Morigami, T. Suzuki (Sony)] [late]

This was suggested to be considered in the context of level definition discussions.
