6.17 IBDI and memory compression
6.17.1.1.1.1.1.1.1 JCTVC-F073 Joint Luma-Chroma adaptive reference picture memory compression [S. Liu, X. Zhang, S. Lei (MediaTek)]
This contribution proposes a localized adaptive scaling mechanism for compressing reference pictures, which can be used to reduce the hardware cost of the IBDI technique in HEVC. The proposed mechanism compresses luma and chroma pixels jointly and was implemented in two versions. For the version 1 method, experimental results report an average 0.07% BD-rate increase for random access (HE) and an average 0.70% BD-rate increase for low delay (HE). For the version 2 method, experimental results report an average 0.01% BD-rate increase for random access (HE) and an average 0.29% BD-rate increase for low delay (HE). The average decoding time increases by about 5-6% with the current software implementation. The effect on encoding time is negligible. Furthermore, an offset was applied to each luma block in the proposed version 1 method (similar to the method proposed in JCTVC-D035). Experimental results report an average 0.03% BD-rate increase for random access (HE) and an average 0.28% BD-rate increase for low delay (HE).
This technique still causes significant losses in LD cases, and seems to add some complexity. No action.
More general question: Is this topic relevant?
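As a rough illustration of the class of block-wise scale-and-offset reference memory compression discussed in this and the following contributions, a minimal sketch is given below. The 4x4 block size, the power-of-two scaling, the rounding and the layout of the per-block side information are assumptions for illustration only and do not reproduce the joint luma-chroma design of JCTVC-F073.

#include <stdint.h>

#define BLK 16  /* one 4x4 block = 16 samples */

typedef struct {
    uint8_t  code[BLK]; /* 8-bit codes kept in external memory        */
    uint16_t offset;    /* per-block offset (the block minimum)       */
    uint8_t  shift;     /* per-block scale, restricted to powers of 2 */
} CompressedBlock;

/* Compress sixteen 10-bit samples to 8-bit codes plus per-block side info. */
void compress_block(const uint16_t *src, CompressedBlock *dst)
{
    uint16_t mn = src[0], mx = src[0];
    for (int i = 1; i < BLK; i++) {
        if (src[i] < mn) mn = src[i];
        if (src[i] > mx) mx = src[i];
    }
    uint8_t shift = 0;
    while (((mx - mn) >> shift) > 255)  /* smallest shift so the range fits 8 bits */
        shift++;
    dst->offset = mn;
    dst->shift  = shift;
    for (int i = 0; i < BLK; i++)
        dst->code[i] = (uint8_t)((src[i] - mn) >> shift);
}

/* Reconstruct the block; the error is bounded by the chosen scaling step. */
void decompress_block(const CompressedBlock *src, uint16_t *dst)
{
    uint16_t half = src->shift ? (uint16_t)(1u << (src->shift - 1)) : 0;
    for (int i = 0; i < BLK; i++) {
        uint32_t v = ((uint32_t)src->code[i] << src->shift) + src->offset + half;
        dst[i] = (uint16_t)(v > 1023 ? 1023 : v);  /* clip to the 10-bit range */
    }
}

Storing one 8-bit code per sample plus a small amount of per-block side information is what yields the slightly-above-8-bit storage (ratios such as 9:8) quoted for schemes of this kind.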
6.17.1.1.1.1.1.1.2 JCTVC-F079 Crosscheck of JCTVC-F073 proposal for joint luma-chroma adaptive reference picture memory compression [D. Hoang (Zenverge)] [late upload 07-07]
6.17.1.1.1.1.1.1.3 JCTVC-F075 Unified scaling with adaptive offset for reference frame compression [D. Hoang (Zenverge)]
Internal Bit Depth Increase (IBDI) is a technique that can often yield improved coding efficiency by increasing the arithmetic precision of the predictors, transforms, and loop filters in a video codec. The main drawback is an increase in memory storage and bandwidth requirements. Several reference frame compression techniques have been proposed to reduce the memory penalty of IBDI. This document describes two lightweight compression algorithms that operate on 4×4 blocks and can achieve compression ratios from 9:8 to 14:8. Unified Scaling with Adaptive Offset, the simpler of the two, introduces an average coding loss of about 0.5% for LB-HE and 0.1% for RA-HE. Unified Scaling with Adaptive Offset and DPCM, the better performing algorithm, reduces the coding loss to 0.3% for LB-HE and 0.1% for RA-HE.
Has it been tested with more evil structures? Could it cause visual artifacts?
The argument was made that the MSE per block cannot become larger than 4 in Toshiba’s previous proposal, whereas here it could be higher.
We should not only look at the MSE, but also at the maximum pixel deviation that could occur in “evil” cases. With this method it is said to be 13 (with Toshiba’s original it is said to be 4).
For 10-bit input (steam locomotive), a small gain (0.2%) is reported even when compared to 10-bit internal memory. This is, however, most likely due to the fact that the method acts like an additional loop filter.
As the same buffer is usually used as both the DPB and the reference buffer, the decompression would again need to be performed before display.
How was the PSNR calculated? 8-bit or 10-bit reference?
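The worst-case question raised above could be approached with a brute-force stress test of the kind sketched below, which searches for the largest per-block MSE and the largest single-sample deviation produced by a compression scheme. It reuses the CompressedBlock type and the compress_block() / decompress_block() functions from the sketch given earlier (compiled as a separate file); plain rand() is only a stand-in for genuinely adversarial (“evil”) test patterns.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define BLK 16

/* Declarations matching the earlier sketch; link against that file. */
typedef struct { uint8_t code[BLK]; uint16_t offset; uint8_t shift; } CompressedBlock;
void compress_block(const uint16_t *src, CompressedBlock *dst);
void decompress_block(const CompressedBlock *src, uint16_t *dst);

int main(void)
{
    double worst_mse = 0.0;
    int worst_dev = 0;
    for (int trial = 0; trial < 1000000; trial++) {
        uint16_t src[BLK], rec[BLK];
        CompressedBlock cb;
        for (int i = 0; i < BLK; i++)
            src[i] = (uint16_t)(rand() & 1023);  /* random 10-bit sample */
        compress_block(src, &cb);
        decompress_block(&cb, rec);

        long sse = 0;
        for (int i = 0; i < BLK; i++) {
            int d = abs((int)src[i] - (int)rec[i]);
            sse += (long)d * d;
            if (d > worst_dev) worst_dev = d;
        }
        if ((double)sse / BLK > worst_mse)
            worst_mse = (double)sse / BLK;
    }
    printf("worst per-block MSE %.2f, worst pixel deviation %d\n",
           worst_mse, worst_dev);
    return 0;
}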
6.17.1.1.1.1.1.1.4 JCTVC-F548 Cross-verification report of JCTVC-F075 [G. Li (Santa Clara University), L. Liu (Hisilicon)] [late upload 07-07]
6.17.1.1.1.1.1.1.5 JCTVC-F078 Cross verification report for JCTVC-F075 proposed by Zenverge [X. Zhang, S. Liu (MediaTek)]
6.17.1.1.1.1.1.1.6 JCTVC-F319 Adaptive scaling with offset for reference pictures memory compression [T. Chujoh, T. Yamakage (Toshiba)]
A reference picture memory compression from N bits to 8 bits on the high-efficiency anchor and a definition for standardization are proposed. This contribution merges two methods: adaptive scaling proposed in JCTVC-E133 (Toshiba) and adaptive offset proposed in JCTVC-E432 (Zenverge), and a definition based on compression distortion control is shown. Experimental results show an average bit-rate loss of 0.49% for adaptive scaling with offset, compared with 2.24% for fixed rounding and 2.47% for 8-bit internal processing.
The proposal suggests that it is not necessary to define the compression and decompression processes, but only a “distortion control”. The method is lossless for 8-bit content.
Decoding time increases by 5% / 8%.
Why do it this way? It is said that the number of operations is decreased, but this is not noticeable from the software runtime.
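The “distortion control” idea, i.e. normatively bounding the error that the stored reference may introduce rather than specifying the compression and decompression processes themselves, could be expressed as a conformance check of the kind sketched below. The per-block MSE limit is an assumed placeholder, not the bound proposed in JCTVC-F319.

#include <stdbool.h>
#include <stdint.h>

#define BLK 16  /* one 4x4 block */

/* Returns true if the stored (compressed and decompressed) reference block
 * keeps its squared error within mse_limit per sample.  Under such a
 * definition an implementation would be free to use any compression scheme
 * that passes this check. */
bool reference_block_conformant(const uint16_t *orig, const uint16_t *stored,
                                int mse_limit)
{
    int64_t sse = 0;
    for (int i = 0; i < BLK; i++) {
        int d = (int)orig[i] - (int)stored[i];
        sse += (int64_t)d * d;
    }
    return sse <= (int64_t)mse_limit * BLK;  /* compare SSE to avoid a division */
}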
6.17.1.1.1.1.1.1.7 JCTVC-F620 Crosscheck of JCTVC-F319 Toshiba Adaptive Scaling with Offset RFC [D. Hoang (Zenverge)] [late upload 07-06]
6.17.1.1.1.1.1.1.8 JCTVC-F496 2x2 block-based reference pictures memory compression [Lijuan Kang, Yanzhuo Ma, Sixin Lin]
In this contribution, two adaptive reference picture memory compression schemes from 10 bits/pixel to 8 bits/pixel based on 2x2 blocks are proposed. The contribution aims to reduce redundant memory bandwidth access, especially when the align/burst parameters are low. In the second scheme, both adaptive scaling and offset compensation are used on top of the first scheme. Results show that, compared with the fixed rounding method, better coding efficiency is obtained, and compared with the existing 4x4 block based methods, little performance loss is introduced. In terms of memory access bandwidth, the proposed method decreases the memory access bandwidth by 15-17% compared with the 4x4 block based methods when the burst parameter is smaller than 64, and by about 9% when the burst parameter is 128. For large burst parameters the memory bandwidth can be kept the same as for the 4x4 block based compression methods by the non-normative trick of merging four 2x2 blocks into one 4x4 block as the basic unit for storage and access.
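The bandwidth argument can be made concrete with a toy overfetch model of the kind below: a motion-compensation access must fetch every compression unit it overlaps, and each row read is rounded up to the memory burst size. The 1 byte/sample compressed payload, the neglect of per-unit side information and of burst alignment of the row start, and the 15x15 example region are assumptions for illustration; they do not reproduce the bandwidth methodology of JCTVC-F496.

#include <stdio.h>

static int round_up(int v, int m) { return ((v + m - 1) / m) * m; }

/* Bytes fetched for a (w x h)-sample reference region starting at (x, y),
 * when the reference is stored in unit x unit compression blocks and memory
 * is read in bursts of 'burst' bytes (one compressed byte per sample). */
static int fetch_bytes(int x, int y, int w, int h, int unit, int burst)
{
    int x0 = (x / unit) * unit, x1 = round_up(x + w, unit);
    int y0 = (y / unit) * unit, y1 = round_up(y + h, unit);
    int line = round_up(x1 - x0, burst);  /* each row read in whole bursts */
    return (y1 - y0) * line;
}

int main(void)
{
    /* 8x8 block plus a 7-sample interpolation margin -> 15x15 region. */
    for (int burst = 16; burst <= 128; burst *= 2)
        printf("burst %3d: 2x2 units %4d bytes, 4x4 units %4d bytes\n",
               burst, fetch_bytes(3, 5, 15, 15, 2, burst),
                      fetch_bytes(3, 5, 15, 15, 4, burst));
    return 0;
}

In this simplified model the 2x2 unit reduces the fetch only at the smallest burst size; the 15-17% and 9% figures in the contribution come from measurements on real access patterns rather than from such a model.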
General conclusion on memory compression:
- The added complexity of the current schemes would be a burden for software; the topic is mainly of interest for hardware implementations. In software it would rather be implemented as an additional loop filter (which decreases the quality).
- Even for hardware, how to measure the benefit in terms of memory bandwidth is not completely clear. For some access sizes, the reported average bandwidth reductions diverge widely among the different sequence classes. It would be more relevant to study worst cases (either the worst conceivable case or the worst case found in any frame of the entire test set) than gross averages of memory bandwidth reduction.
- IBDI is not a good use case for memory compression. The topic could eventually become relevant for higher bit-depth sources such as 12- or 14-bit material. We currently have only two 10-bit sequences, so an assessment is difficult.
- Further study when higher bit-depth material is assessed in the future.