14.2 Contributions
14.2.1.1.1.1.1.1.1 JCTVC-D061 CE11: Evaluation of Transform Coding tools in HE configuration [T. Nguyen, D. Marpe, H. Schwarz, T. Wiegand]
This document reports results from Fraunhofer HHI for the tests of core experiment 11.
Issues tested (BR increase for intra/RA/LD):
- Disabling adaptive scan order (direction of diagonal scanning) as in HM 1.0 for coding of the significance map: increase of 0.35/0.22/0.19
- Disabling context model selection as in HM 1.0 for the significance flag: increase of 1.76/1.41/2.78
- Disabling context model selection as in HM 1.0 for the last significance flag: increase of 0.04/0.27/0.86
- Disabling context model selection as in HM 1.0 for the absolute transform coefficient levels: increase of 0.58/0.20/0.74
Among these four, adaptive scanning gives the least benefit; some arguments were raised about its complexity.
Total bit rate loss for all four: 2.59/1.92/3.75.
Encoding/decoding time was reported to change only marginally (within 1-2%).
Decision: Disable/replace adaptive scan order (see JCTVC-D239).
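A quick arithmetic check on the figures above suggests that the four losses are not simply additive: the column-wise sums of the individual increases come out higher than the reported combined loss in all three configurations. The sketch below uses only the numbers listed in this report; the variable and function names are illustrative.

```python
# Bit-rate increases (%) reported above, ordered intra / RA / LD.
individual = {
    "adaptive scan order": (0.35, 0.22, 0.19),
    "significance flag contexts": (1.76, 1.41, 2.78),
    "last significance flag contexts": (0.04, 0.27, 0.86),
    "absolute level contexts": (0.58, 0.20, 0.74),
}
# Combined loss (%) reported for disabling all four tools together.
reported_total = (2.59, 1.92, 3.75)


def summed_losses(table):
    """Column-wise sum of the per-tool bit-rate increases, rounded to 2 digits."""
    return tuple(round(sum(column), 2) for column in zip(*table.values()))


naive = summed_losses(individual)
for name, s, t in zip(("intra", "RA", "LD"), naive, reported_total):
    print(f"{name}: sum of individual = {s:.2f}, reported combined = {t:.2f}")
```

The gap between the two rows indicates some overlap between the tools, so the individual numbers should not be read as independent contributions.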
14.2.1.1.1.1.1.1.2 JCTVC-D113 CE11: Cross-check report for Fraunhofer HHI’s proposal [Yukinobu Yasugi, Tomoyuki Yamamoto]
14.2.1.1.1.1.1.1.3 JCTVC-D190 CE11: Coding efficiency of tools in HHI_TRANSFORM_CODING (JCTVC-A116) [V. Sze (TI)]
14.2.1.1.1.1.1.1.4 JCTVC-D236 CE11: Cross-check report from Motorola Mobility for HHI's adaptive scan (JCTVC-A116) [Jian Lou, Krit Panusopone, Limin Wang]
The cross-checker reported that disabling adaptive scan reduces encoding time by 4%.
14.2.1.1.1.1.1.1.5 JCTVC-D195 CE11: Simplified context selection for significant_coeff_flag (JCTVC-C227) [V. Sze, M. Budagavi (TI)]
HHI_TRANSFORM_CODING uses a highly adaptive context modeling approach for the significance map, where context selection for significant_coeff_flag depends on coefficients in 10 neighboring positions. While it provides coding gains between 1.4% and 2.8%, it greatly increases the complexity of context selection when coding significant_coeff_flag. In this contribution, a simplification of the context selection was proposed. Neighbor dependency was reduced from 10 to 8, then to 4. The number of contexts required for significant_coeff_flag is also reduced from 128 to 108. These modifications were implemented in TMuC-0.9 and their coding efficiencies were evaluated. Reducing from 10 to 8 has no reported coding loss, while reducing from 10 to 4 (reducing the dependency by more than half) has a coding loss between 0% and 0.2%. This simplified context selection had a negligible effect on the number of significant_coeff_flag bins.
Reducing from 10 to 8 neighbors does not lose compression efficiency.
Reducing from 10 to 4 neighbors loses compression efficiency of 0.2%/0.1%/0% for intra/RA/LD.
Encoding time is reported to be reduced to 98/92/92 and 95/93/91 for the 10->8 and 10->4 reduction cases, respectively.
Decoding time changes are within noise; also encoding time numbers could be questionable (confirmed by one cross-checker, not confirmed by the other).
The number of contexts most probably is less relevant for software than hardware implementation.
JCTVC-D244, JCTVC-D260 & JCTVC-D262 are related.
Where is the optimum cutoff point between 8 and 4? Is a lower number of contexts useful for parallelization?
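To illustrate what "neighbor dependency" means here, the following is a minimal sketch of neighborhood-based context selection for a significance flag. The 4-neighbor template, the clipping value and the function name are hypothetical and are not taken from TMuC or from JCTVC-C227; the sketch only shows that each bin's context depends on previously coded flags, and that a smaller template means fewer reads per bin.

```python
# Hypothetical 4-neighbor template, given as (dx, dy) offsets toward
# positions on earlier anti-diagonals (assumed to be already coded).
TEMPLATE_4 = [(-1, 0), (0, -1), (-1, -1), (-2, 0)]


def context_index(sig_map, x, y, template, max_ctx):
    """Count significant neighbors in the template, clipped to max_ctx.

    sig_map[y][x] is 1 if the coefficient at (x, y) is significant.
    Neighbors outside the block are treated as not significant.
    """
    count = 0
    for dx, dy in template:
        nx, ny = x + dx, y + dy
        if nx >= 0 and ny >= 0:
            count += sig_map[ny][nx]
    return min(count, max_ctx)


sig = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(context_index(sig, 1, 1, TEMPLATE_4, 3))
```

Because the context for one flag depends on the decoded values of earlier flags, the lookup sits inside the CABAC feedback loop; this is the dependency the contribution aims to shorten.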
14.2.1.1.1.1.1.1.6 JCTVC-D062 CE11: Cross-check report from HHI for TI's proposal JCTVC-C227 [T. Nguyen, D. Marpe, H. Schwarz, T. Wiegand]
14.2.1.1.1.1.1.1.7 JCTVC-D075 CE11: Cross-check report from Panasonic for TI’s proposal [Hisao Sasai, Takahiro Nishi]
14.2.1.1.1.1.1.1.8 JCTVC-D244 Context selection complexity in HEVC CABAC [V. Sze (TI)]
CABAC is a well-known throughput bottleneck in video coding implementations. This is due to the many feedback loops in the CABAC, several of which are in the context selection. A common technique used in practice is speculative computation, which aims to increase the throughput of CABAC. However, this comes at the cost of an increased number of computations per bin. With the introduction of HHI Transform Coding into HM1, the data dependencies in context selection have become even stronger. The number of contexts for the significance map in HEVC is almost 2x that of AVC, thus tightening this bottleneck in the context selection of CABAC. This also increases implementation area cost, as the context memory needs to be larger.
This contribution discussed how the "speculative computation" approach calls for the lowest possible number of contexts. This method is said to be widely used when implementing CABAC in a pipelined architecture.
14.2.1.1.1.1.1.1.9 JCTVC-D260 Parallel processing friendly simplified context selection of significance map [C. Auyeung, W. Liu (Sony)]
This contribution proposed a reportedly parallel processing friendly and lower complexity alternative to the context selection of significance map in TMuC 0.9. The proposed context selection removes the data dependency on the left boundary, bottom boundary and scan direction. It reduces the maximum size of the neighborhood needed for context selection from 10 to 5 pixels. It reportedly resulted in % BD BR improvements of 0.0, 0.0, -0.1 relative to the anchor for Intra HE, Random Access HE, and Low Delay HE test cases respectively.
In contrast to JCTVC-D194, the number of context models was not changed; only the number of neighbors used to derive the probabilities was reduced.
JCTVC-D242 provides a cross-check and confirms the results.
Various similar contributions were submitted; this solution reduces the number of neighbors used in the context to half without affecting compression performance.
Decision: Adopted.
14.2.1.1.1.1.1.1.10 JCTVC-D242 Cross-check of Sony's simplified context selection (JCTVC-D260) [V. Sze (TI)]
JCTVC-D242 provides a cross-check and confirms the results.
14.2.1.1.1.1.1.1.11 JCTVC-D262 Parallel Context Processing for the significance map in high coding efficiency [J. Sole, R. Joshi, I.S. Chong, M. Coban, M. Karczewicz]
This proposal presents a technique for the parallelization of context processing to improve the throughput of the entropy coder for the high efficiency case. The position of the last significant coefficient is encoded before the position of the other significant coefficients within a block. The position of the last coefficient is encoded explicitly by signaling its X and Y coordinates with a unary code. The X and Y signaling is independent. The context derivation for the significance map is simplified to further enhance parallelization. The parallelization improvements of the proposal reportedly come at no cost in performance. The BD BR for the high efficiency intra, random access, and low-delay configurations is reported as 0.06%, 0.01%, and -0.17%, respectively.
If the last significant coefficient is found within the first 10 coefficients, then a set of contexts depending solely on the scan position is used. Otherwise, the usual contexts are used.
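The last-position signaling described above can be sketched as follows. This is only an illustration under stated assumptions: the unary convention (ones terminated by a zero), the 0-based scan position, and the names `unary`, `binarize_last_position` and `context_set` are hypothetical; the proposal's actual binarization and context assignment may differ.

```python
def unary(value):
    """Unary binarization (assumed convention): `value` ones, then a zero."""
    return [1] * value + [0]


def binarize_last_position(x, y):
    """Binarize the X and Y coordinates of the last significant coefficient.

    The two coordinates are binarized independently of each other.
    """
    return unary(x), unary(y)


def context_set(last_scan_pos, threshold=10):
    """Pick the context set for the significance map.

    If the last significant coefficient lies within the first `threshold`
    scan positions (0-based index assumed), contexts depend only on the
    scan position; otherwise the usual contexts are used.
    """
    return "position_based" if last_scan_pos < threshold else "usual"


print(binarize_last_position(2, 0))
print(context_set(9), context_set(10))
```

Since the last position is known before any significance flag is parsed, the choice of context set can be made up front, outside the per-bin loop.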
14.2.1.1.1.1.1.1.12 JCTVC-D212 Verification results of Qualcomm's Proposal JCTVC-D262 on Parallel Context Processing [T. Yamakage (Toshiba)]
14.2.1.1.1.1.1.1.13 JCTVC-D239 CE11: Report on zigzag scan performance for CABAC on TMuC0.9 [Jian Lou, Krit Panusopone, Limin Wang]
The target of this document is to verify the performance of zig-zag scan with CABAC in transform coding for high efficiency test conditions. The simulations were conducted using TMuC0.9 software with Motorola Mobility’s modifications. The code modifications reportedly clean up the redundancies introduced by adaptive scan in order to obtain accurate speed measurements. Rate-distortion performance bit-exact with that provided in JCTVC-D236 was reportedly achieved. The anchor results were obtained with adaptive scan and the tested results were obtained with zig-zag scan.
In principle, this is the same as disabling adaptive scan order; however, a different zig-zag scan implementation was used. Reported encoding time goes down to 91-95%; decoding time does not change.
This is certainly due to multiple encoding passes in RDO.
The direct implementation of the zig-zag scan is more efficient than disabling adaptive scanning by a switch in the current reference SW, but it produces the same bitstream and compression results.
Decision: Adopt – see discussion and conclusions section.
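For reference, a fixed zig-zag scan can be generated directly, without any adaptive direction switching. The sketch below follows the conventional zig-zag pattern (for a 4x4 block it reproduces the familiar frame zig-zag order known from AVC); it is not the Motorola Mobility implementation, and the function names are illustrative.

```python
def zigzag_scan(n):
    """Return the (x, y) positions of an n x n block in zig-zag order."""
    order = []
    for d in range(2 * n - 1):  # anti-diagonal index: x + y == d
        xs = range(max(0, d - n + 1), min(d, n - 1) + 1)
        diagonal = [(x, d - x) for x in xs]  # ordered by increasing x
        if d % 2 == 1:
            diagonal.reverse()  # odd diagonals run from top-right to bottom-left
        order.extend(diagonal)
    return order


def scan_coefficients(block):
    """Serialize a square block (list of rows) in zig-zag order."""
    return [block[y][x] for x, y in zigzag_scan(len(block))]


# Raster indices (y * 4 + x) visited for a 4x4 block.
print([y * 4 + x for x, y in zigzag_scan(4)])
```

Since the order depends only on the block size, it can be precomputed once per size and stored as a table.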
14.2.1.1.1.1.1.1.14 JCTVC-D309 CE11: Cross-check of Motorola’s proposal JCTVC-D239 by Huawei [H. Yang, J. Zhou]
14.2.1.1.1.1.1.1.15 JCTVC-D400 CE11: Cross verification of Motorola’s zigzag scan for CABAC/PIPE [M. Coban]
14.2.1.1.1.1.1.1.16 JCTVC-D360 CE11: Low-complexity adaptive coefficients scanning [V. Seregin, J. Chen, W.-J. Han (Samsung)]
In this document, Adaptive Coefficient Scanning (ACS) with three scanning patterns is investigated and tested. In the method proposed in JCTVC-C205, the scanning index for every Transform Unit (TU) is explicitly signalled to the decoder side. In a different method, the mode dependent coefficient scanning proposed in JCTVC-D393 is used to derive the scan mode for intra 4x4 and 8x8 blocks to reduce encoder complexity. Experimental results reportedly show that the proposed ACS provides 1.0%, 0.6% and 0.6% BD BR gain in high efficiency (HE) configurations and 0.1%, 0.2% and 1.1% BD BR gain in low complexity (LC) configurations, respectively for intra-only, random access, and low-delay test conditions.
- In intra coding, scanning pattern is derived from the mode.
- In inter coding, scanning pattern is explicitly signalled.
- Loss of performance in LC case is explained by inefficiency of LCEC for intra (as reported elsewhere and resolved by new codenumber mapping method).
- "optimized" version still uses the direction adaptive scanning of HM for diagonal.
- Part of the gain in inter (particularly RA) also comes from intra coded pictures or blocks.
14.2.1.1.1.1.1.1.17 JCTVC-D146 Cross-verification report for Samsung and Qualcomm's proposal from Microsoft [J. Xu (Microsoft)] (missing prior, uploaded Thursday 20th, before meeting)
14.2.1.1.1.1.1.1.18 JCTVC-D189 CE11: Cross-verification of Samsung's low-complexity adaptive coefficients scanning (JCTVC-C205) [V. Sze (TI)]
14.2.1.1.1.1.1.1.19 JCTVC-D320 CE11: Cross verification of Samsung's proposal JCTVC-C205 [K. Ugur (Nokia)]
14.2.1.1.1.1.1.1.20 JCTVC-D382 CE11: Cross-verification of Samsung's low-complexity adaptive coefficients scanning (JCTVC-C205) [C. Auyeung]
14.2.1.1.1.1.1.1.21 JCTVC-D393 CE11: Mode Dependent Coefficient Scanning [Yunfei Zheng, Muhammed Coban, Joel Sole, Rajan Joshi, Marta Karczewicz]
In this contribution, a mode dependent coefficient scanning order selection scheme was proposed in order to improve the HEVC coding performance. In the proposed scheme, the "best" scanning order for the significance map is selected among HHI transform coding (TC) zig-zag, horizontal, and vertical scans based on the transform unit size and intra prediction mode. The proposed scheme reportedly achieves 1.0% BD BR reduction on average for the high efficiency intra configuration. There was reportedly no encoder/decoder complexity increase when compared to the TMuC 0.9 default setting.
- 3 scan directions
- Use direction-adaptive scanning from HM for diagonal scan case
- Did not use context-adaptive scanning for inter
- Explicit signalling was tested as well; matches with JCTVC-D360 results
- No test for LC case.
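The selection logic of a mode dependent scan can be sketched as a table lookup keyed on block type, TU size and intra prediction mode. The sketch below is an assumption-laden illustration: the coarse direction classes and the mapping from direction to scan are hypothetical and not taken from JCTVC-D393; only the three scan types and the restriction to intra 4x4 and 8x8 blocks come from the notes in this report.

```python
from enum import Enum


class Scan(Enum):
    ZIGZAG = "zig-zag"
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"


# Hypothetical mapping from a coarse intra prediction direction class to a
# scan; the actual per-mode table of the proposal is not reproduced here.
SCAN_FOR_DIRECTION = {
    "vertical": Scan.HORIZONTAL,
    "horizontal": Scan.VERTICAL,
}


def select_scan(tu_size, is_intra, direction_class):
    """Derive the scan from data the decoder already has (no extra signaling)."""
    if not is_intra or tu_size not in (4, 8):
        return Scan.ZIGZAG  # all other blocks keep the default scan
    return SCAN_FOR_DIRECTION.get(direction_class, Scan.ZIGZAG)


print(select_scan(4, True, "vertical"))
print(select_scan(16, True, "vertical"))
```

Because the scan is derived from the intra mode rather than searched and signalled, the encoder does not need to try several scans per TU, which is consistent with the reported absence of a complexity increase.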
14.2.1.1.1.1.1.1.22 JCTVC-D453 Cross-check report of mode dependent transform coefficients scanning from JCTVC-D393 [J. Chen, V. Sze, K. Panusopone, A. Tabatabai, M. Coban] (late registration Wednesday 26th after start of meeting, uploaded Wednesday 26th, near the end of the meeting)
This contribution summarized cross-check activities on the transform coefficient scanning method in CE11, namely the Mode Dependent Coefficient Scanning scheme of JCTVC-D393.
The software source code was verified to match the proposal. High efficiency configuration results were partially cross-checked by Texas Instruments (intra-only configuration results were fully cross-checked). Low complexity configuration results were cross-checked fully by Samsung and partially by Sony.
- Was also checked in comparison with the coefficient VLC fix from JCTVC-D374
- Gains for HE confirmed
- For LC, gains are 0.6%, 0.3% and 0.1%
- Only affects 4x4 and 8x8
Decision: Adopt the adaptive scanning method from JCTVC-D393 as tested here. (Revised in Track B Wednesday afternoon.)
14.2.1.1.1.1.1.1.23 JCTVC-D456 CE11: Crosscheck of Qualcomm’s Mode Dependent Coefficient Scanning in JCTVC-D393 by MediaTek [C.-Y. Chen, Y.-W. Huang] (late registration Thursday 27th after start of meeting, uploaded Thursday 27th, near the end of the meeting)
No detailed discussion of this contribution was considered necessary.
14.2.1.1.1.1.1.1.24 JCTVC-D359 CE11: Cross-verification of Qualcomm’s low-complexity adaptive coefficients scanning [V. Seregin, J. Chen (Samsung)]
14.2.1.1.1.1.1.1.25 JCTVC-D424 CE11: Cross-check results for Qualcomm’s Proposal JCTVC-D393 [T. Nguyen, D. Marpe, H. Schwarz, T. Wiegand] (late registration Thursday 20th after start of meeting, uploaded Friday 21st, second day of meeting)