14.11 TE12 TMuC entropy coding
14.11.1 PIPE vs. CABAC vs. LCEC
14.11.1.1.1.1.1.1.1 JCTVC-C048 TE12.6: PIPE and LCEC tested against CABAC by Fraunhofer HHI [B. Bross (Fraunhofer HHI)]
This document reported results of testing the PIPE and LCEC entropy coders against CABAC within the scope of tool experiment 12 evaluating TMuC Tools. In a high efficiency scenario, CABAC was tested against the PIPE default. Bit rate savings of up to 0.3% were shown when using CABAC. In a low complexity scenario, CABAC was tested against the LCEC default. CABAC performed better than LCEC in terms of coding efficiency, with average bit rate savings of up to 15.5%, while encoder runtimes of up to 170% and decoder runtimes of up to 113% of the LCEC runtimes were measured.
- Intra: CABAC vs. PIPE 0.2%, PIPE vs. LCEC 12.5%.
- Random access: CABAC vs. PIPE 0.3%, PIPE vs. LCEC 15.2%.
- Low delay: CABAC vs. PIPE 0.3%, PIPE vs. LCEC 15.5%.
It was remarked that the RDO encoder decision-making uses CABAC. It seems like this could make a difference.
14.11.1.1.1.1.1.1.2 JCTVC-C051 TE12.6: PIPE tested against LCEC by Fraunhofer HHI [B. Bross (Fraunhofer HHI)]
This contribution presented test results that were essentially consistent with those in JCTVC-C048.
14.11.1.1.1.1.1.1.3 JCTVC-C058 TE12: Evaluation of entropy coders: PIPE tested against CABAC [V. Sze, M. Budagavi (TI)]
This document reported performance evaluation results of PIPE (Probability Interval Partitioning) in the Test Model under Consideration (TMuC) as part of TE12.
The coding efficiency loss of PIPE relative to CABAC was reported to be between 0.2% and 0.3% using TMuC-0.7.
These results were reported to have been verified to match the results provided by HHI.
Several outstanding issues pertaining to the throughput and complexity of PIPE were discussed in the contribution. The contribution suggested that the difficulty of parallelizing context modeling indicated that more investigation is required to determine how PIPE impacts the overall throughput of the entropy coder. The reported estimated memory size requirement for PIPE is 5x larger than for CABAC. The cost of the additional buffers required by PIPE was suggested to also merit additional analysis. More investigation was also requested on how the parallel bitstreams will be handled (i.e., interleaved or separate); if they are interleaved, the complexity and latency implications of the various approaches were requested to be analyzed. The contributor advocated that further study is required on PIPE.
Regarding interleaved versus separate modes of PIPE operation, at the moment the TMuC encoder makes a decision after buffering the bits for the picture, and the decoder follows what the encoder selected.
It was remarked that reduced power requirements may be more important than reduced area requirements.
It was remarked that TE8 studied some related implementation aspects.
14.11.1.1.1.1.1.1.4 JCTVC-C184 TE12.6: PIPE tested against LCEC by Cisco [A. Fuldseth (Cisco)]
This document presented TE12 results on PIPE vs. LCEC comparisons. Depending on the configuration (low delay/random access/intra and low complexity/high efficiency), average BD-Rate figures between 12.3% and 18.0% were reported.
It was noted that the size of this compression difference may be affected by other proposed modifications.
It was remarked that some degree of encoding optimization (e.g., whether the encoder resets the statistical measures at the beginning of each picture or in some R-D decision-making) may also affect the coding efficiency difference.
14.11.2 LCEC "phase 2" testing
14.11.2.1.1.1.1.1.1 JCTVC-C196 TE12.6: Results for LCEC_PHASE2 tests by Nokia [K. Ugur, A. Hallapuro, J. Lainema (Nokia)]
This contribution presented Tool Experiment 12 results on LCEC_PHASE2. LCEC_PHASE2 represents the second integration phase of the low complexity entropy coding (LCEC).
A bit rate improvement was reported as follows: Intra 0.3%, random access 0.4%, low delay 1.6%.
The contributor recommended adopting Phase 2 into the TMuC.
14.11.2.1.1.1.1.1.2 JCTVC-C160 TE12: Crosscheck on LCEC phase 2 [P. Chen, M. Karczewicz (Qualcomm)]
This contribution presented the test results for the LCEC tool (phase 2) listed in Tool Experiment 12: Evaluation of TMuC Tools. Gains similar to those described in JCTVC-C196 were reported.
14.12 TE12 TMuC IBDI and transform precision extension
14.12.1.1.1.1.1.1.1 JCTVC-C057 TE12: Evaluation of IBDI and TPE (transform precision extension) [M. Zhou (TI)]
This document reported evaluation results of the IBDI and TPE (Transform Precision Extension) for TE12. Experimental results on TMuC-0.7 reportedly show that the total gain with both IBDI and TPE turned on is 0.9% to 5.0% on average; the average IBDI gain when TPE is off is 0.8% to 4.7%, and it is reduced to -0.1% to 3.3% when TPE is turned on; the average TPE gain when IBDI is off is 0.8% to 1.4%, but it vanishes entirely when IBDI is on. The results also reportedly show that the additional left-shift bits in the TPE (1-bit shift in T32x32, 2-bit shift in T16x16, and 3-bit shift in T8x8, respectively) do not provide additional gain compared to the 4-bit left shift across the transforms.
IBDI (Internal Bit-Depth Increase)
- Input data is left shifted by 4 bits before being passed to the video codec
- Internal data precision of each video coding processing stage is increased by 4 bits
- Reference frames have higher precision (8 bits -> 12 bits), which leads to higher memory bandwidth and increased off-chip storage requirements
- IBDI is on in the reference HE configuration, off in LC configurations
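The IBDI bullets above can be illustrated with a minimal sketch. It is not TMuC code: the function names, the round-to-nearest output conversion, and the tightly packed storage model are assumptions made for illustration. Only the 4-bit shift and the 8-bit to 12-bit widening come from the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: the 4-bit internal bit-depth increase described above.
const int kIbdiShift = 4;

// Scale an 8-bit input sample to the 12-bit internal range (left shift by 4).
inline std::uint16_t ibdiUpshift(std::uint8_t sample8) {
    return static_cast<std::uint16_t>(sample8 << kIbdiShift);
}

// Map a 12-bit internal sample back to 8 bits for output.
// Round-to-nearest and clipping to 255 are assumptions of this sketch.
inline std::uint8_t ibdiDownshift(std::uint16_t sample12) {
    assert(sample12 <= 4095);  // must fit in 12 bits
    const int rounded = (sample12 + (1 << (kIbdiShift - 1))) >> kIbdiShift;
    return static_cast<std::uint8_t>(rounded > 255 ? 255 : rounded);
}

// Bits needed to store one sample plane at a given bit depth,
// assuming tightly packed storage (an assumption; real layouts may pad).
inline std::uint64_t planeBits(std::uint64_t width, std::uint64_t height, int bitDepth) {
    return width * height * static_cast<std::uint64_t>(bitDepth);
}
```

Under these assumptions, a 12-bit reference plane needs 1.5 times the bits of an 8-bit plane of the same dimensions, which is the storage and bandwidth cost noted in the list. If 12-bit samples are stored in 16-bit words instead, the factor becomes 2.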
TPE (Transform Precision Extension)
- Expands the intermediate data precision after the first stage of the transform.
- The final transform output data precision remains unchanged.
- Number of TPE extension bits in LC configurations (IBDI off): 7 for T8x8, 6 for T16x16, 5 for T32x32 and 4 for T64x64
- Number of TPE extension bits in HE configurations (IBDI on): 3 for T8x8, 2 for T16x16, 1 for T32x32 and - for T64x64
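The idea of carrying extra precision between the two transform stages while leaving the output precision unchanged can be sketched as follows. This is an illustrative model, not the TMuC transform: the function names, the shift values of 7, and the way the extension bits are compensated in the second-stage shift are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Right shift with round-to-nearest; shift must be at least 1.
inline std::int64_t roundShift(std::int64_t value, int shift) {
    assert(shift >= 1);
    return (value + (std::int64_t{1} << (shift - 1))) >> shift;
}

// Hypothetical two-stage scaling. Keeping extBits more bits after stage one
// and removing them after stage two leaves the total shift (shift1 + shift2)
// unchanged, so the output precision is the same with or without extension.
inline std::int64_t twoStageScale(std::int64_t stage1Sum, std::int64_t coeff,
                                  int shift1, int shift2, int extBits) {
    assert(extBits >= 0 && extBits < shift1);
    const std::int64_t intermediate = roundShift(stage1Sum, shift1 - extBits);
    return roundShift(intermediate * coeff, shift2 + extBits);
}
```

With the illustrative inputs `stage1Sum = 1100`, `coeff = 64` and both shifts equal to 7, the call without extension bits yields 5, while the call with 4 extension bits yields 4, which matches the single-rounding result `roundShift(1100 * 64, 14)`. The smaller intermediate rounding error is the kind of effect the extension is meant to capture, at the cost of wider intermediate storage.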
It was suggested to study the use of 8 bit reference pictures with 12 bit prediction signals and 12 bit residual signals. It should be easy to modify the software to support it if it is not already configurable in this fashion.
Results are shown in the table below (with rounding control enabled), indicating the range of benefit averaged across all test sequences within a particular configuration, out of 6 configurations total – HE & LC for all-intra, low delay, and random access. The italicized boldface entries are suggested as the particularly important ones:
| Anchor combination | Combination under Test | Average Gain (BD-rate) |
| IBDI off, TPE off | IBDI on, TPE (5,6,7) on | 0.8% to 5.0% |
| IBDI off, TPE off | IBDI on, TPE off | 0.8% to 4.7% |
| IBDI off, TPE (5,6,7) on | IBDI on, TPE (5,6,7) on | -0.1% to 3.3% |
| IBDI off, TPE off | IBDI off, TPE (5,6,7) on | 0.8% to 1.4% |
| IBDI on, TPE off | IBDI on, TPE (5,6,7) on | No gain |
| IBDI off, TPE (4,4,4) on | IBDI off, TPE (5,6,7) on | No gain |
Recommendations from the proponent:
- TPE be turned on in HE and LC configurations by replacing
    const int g_iShift8x8 = 7;
    const int g_iShift16x16 = 6;
    const int g_iShift32x32 = 5;
    const int g_iShift64x64 = 4;
  in TlibCommon\Source\CommonDef.h with
    const int g_iShift8x8 = 4;
    const int g_iShift16x16 = 4;
    const int g_iShift32x32 = 4;
    const int g_iShift64x64 = 4;
It was remarked that the increased bit depth between the horizontal and vertical transform stages is somewhat burdensome for implementation.
It was also suggested to consider IBDI with 2 bit expansion rather than 4 bit expansion.
The rounding control feature for bidirectional averaging (adopted into TMuC in Geneva) was noted to affect the degree of benefit for these techniques. It was remarked that when the rounding control feature was integrated into the software, an additional encoder distortion measurement improvement for biprediction was also included, which made the encoder substantially slower while improving its R-D behavior. JCTVC-C253 was suggested to be related to this topic.
It was concluded that the TM should include 12 bit support (i.e., effectively IBDI) and 4-bit TPE. For the LC configuration, use TPE; for the HE configuration, use IBDI (without additional TPE).
14.12.1.1.1.1.1.1.2 JCTVC-C088 TE12.7: Experimental results of IBDI [T. Chujoh, T. Yamakage (Toshiba)]
In this contribution, results of IBDI (Internal Bit-Depth Increase) and TPE (Transform Precision Extension) testing were reported. This is one of the reports in Tool Experiment 12: Evaluation of TMuC Tools. The experimental results reportedly show that the coding gain of TPE in the intra coding case averages about 1% and that the coding gain of IBDI in the inter case totals 3%. The encoding and decoding times were reportedly stable.
14.12.1.1.1.1.1.1.3 JCTVC-C252 TE12: Report on evaluation of internal bit depth increase and transform precision extension [R. Joshi, M. Karczewicz (Qualcomm)]
Similar results were presented in this contribution as in JCTVC-C057 and JCTVC-C088.