
10 TE8: Parallel entropy coding


10.1.1.1.1.1.1.1.1 JCTVC-C223 TE8 report [G. Martin-Cocher (RIM), M. Budagavi (TI)]

This contribution was a summary of tool experiment 8, Parallel Entropy Coding. Five proposed tools had been evaluated according to the conditions defined in document JCTVC-B308r1, and their performances had reportedly been verified. One of the test conditions defined in JCTVC-B308 was an optional hardware test; the hardware cross-verifications had not been performed.

The TE participants, as a group, recommended that the following two tools be adopted into TM and TMuC:



  • Coefficient Sign PCP (JCTVC-B088 Section 3.2)

  • Coeff Level BinIdx 0 PCP (JCTVC-B088 Section 3.3)

Decision: Agreed.

They also recommended that a core/tool experiment be started on the following two tools for improving the context processing of the significance map, since the interaction with HHI_TRANSFORM_CODING needed to be studied more carefully:



  • Significance map PCP (JCTVC-B088 Section 3.4)

  • Coding order for significance map on bin decoding throughput (JCTVC-B036 Section 2)

10.2 Coefficient Sign PCP, Coeff Level BinIdx 0 PCP, and significance map PCP


10.2.1.1.1.1.1.1.1 JCTVC-C062 TE8: TI parallel context processing (PCP) proposal [M. Budagavi (TI)]

Context-Adaptive Binary Arithmetic Coding (CABAC) is one of the two entropy coding engines used by the AVC video coding standard. The processing in the CABAC engine is highly serial in nature. Consequently, in order to decode high bit rate video bitstreams in real time, the CABAC engine needs to be run at extremely high frequencies, which consumes a significant amount of power and in the worst case may not be feasible. Techniques to parallelize CABAC can be broadly classified into three categories: bin-level parallelism, syntax element-level parallelism, and slice-level parallelism. Bin-level parallelism techniques such as NBAC/PIPE/V2V parallelize the binary arithmetic coder (BAC) of CABAC. However, due to serial bottlenecks in context processing, the overall throughput improvement of the entropy coder is limited. To address this issue, several techniques that parallelize context processing were advocated. The following three techniques for parallelization of context processing (PCP) were presented at the last JCT-VC meeting in JCTVC-B088:



  • Coefficient Sign PCP (JCTVC-B088 Section 3.2)

  • Coeff Level BinIdx 0 PCP (JCTVC-B088 Section 3.3)

  • Significance map PCP (JCTVC-B088 Section 3.4)

The first two of these design elements were reported to have no impact on coding efficiency. Further study was suggested for the third technique.
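
As a toy illustration of the bottleneck and of the grouping remedy (a minimal sketch with hypothetical names, not the JCTVC-B088 design itself): each context-coded bin must read and update shared context state before the next bin can be decoded, whereas bypass-coded bins such as coefficient signs carry no context state, so once they are grouped together a decoder can pull several of them per cycle.

    # Toy model (hypothetical and greatly simplified; not the actual CABAC engine).
    class Context:
        def __init__(self):
            self.state = 0                  # stand-in for a CABAC probability state
        def decode(self, stream):
            b = next(stream)                # serial: needs the state left by the last bin
            self.state += 1 if b else -1    # state update before the next bin can start
            return b

    def decode_block(stream, n_ctx_bins, n_sign_bins):
        ctx = Context()
        # Context-coded bins: inherently one at a time.
        ctx_bins = [ctx.decode(stream) for _ in range(n_ctx_bins)]
        # Grouped bypass bins (e.g. coefficient signs): no context state,
        # so a hardware decoder could process several per cycle.
        sign_bins = [next(stream) for _ in range(n_sign_bins)]
        return ctx_bins, sign_bins

    bits = iter([1, 0, 1, 1, 0, 0, 1, 0])
    print(decode_block(bits, n_ctx_bins=5, n_sign_bins=3))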

10.2.1.1.1.1.1.1.2 JCTVC-C141 TE8: Crosscheck on TI's proposal of parallel context processing by MediaTek [Y.-L. Chang, Y.-W. Huang, S. Lei (MediaTek)]

This document reported cross-check results for JCTVC-B088 on parallelization of context processing submitted by Texas Instruments (TI). The first two tools listed above were checked. The software source code for the cross-checking was obtained from TI and studied, and it was concluded that it algorithmically matched the proposed technology design. The verification task was reportedly completed successfully, and the results reportedly matched those provided by TI exactly.

10.2.1.1.1.1.1.1.3 JCTVC-C245 TE8: Cross verification of TI-PCP proposal for significance map [J. Zan, J. Meng, M. T. Islam, D. He (RIM)]

This contribution reported the results of cross-checking activities performed by RIM on the TI parallel context processing proposal for the significance map (described in JCTVC-B088 Section 3.4). It was reported that the software was studied and determined to match the TI proposal, and that the results of the experiments matched those reported by TI.

10.3 Coding order for significance map on bin decoding throughput (JCTVC-B036 Section 2) and V2V coding tree (JCTVC-B034)


10.3.1.1.1.1.1.1.1 JCTVC-C249 TE8: Reports on V2V coding and context modeling by RIM [J. Zan, G. Korodi, J. Meng, M. T. Islam, D. He (RIM)]

This contribution discussed the coding order for the significance map with respect to bin decoding throughput (JCTVC-B036 Section 2) and the V2V coding tree (JCTVC-B034). It reported experiment results on V2V codes and modified context processing, as described in JCTVC-B308 and proposed by the contributor.

According to the plans established in JCTVC-B308, the following tools were tested:



  • V2V codes proposed by RIM

  • Improved context processing in HEVC

Specifically, the following three tests were run.

  • Bernoulli tests of RIM V2V codes against PIPE V2V codes

  • Modified context processing in TMuC0.7 against the high efficiency anchors as defined in JCTVC-B300

  • RIM V2V codes in TMuC0.7 against the high efficiency anchors as defined in JCTVC-B300

It was remarked that Huffman coding can be sped up by various techniques – when the VLC table is short, the codewords can be concatenated to create a larger table that requires fewer data fetches to process.
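As a sketch of that table-expansion idea (a hypothetical construction for illustration, not the specific design evaluated in the TE): for a VLC whose longest codeword is K bits, a table indexed by the next K bits of the stream returns the decoded symbol and its codeword length in a single fetch, and widening the index further would let one fetch return several concatenated symbols at once.

    # Minimal sketch of single-fetch VLC decoding via an expanded table
    # (hypothetical code table, chosen only for illustration).
    vlc = {"0": "a", "10": "b", "110": "c", "111": "d"}
    K = max(len(c) for c in vlc)              # longest codeword length

    # Every K-bit pattern maps to (first symbol, its codeword length).
    table = {}
    for i in range(2 ** K):
        pattern = format(i, "0{}b".format(K))
        for code, sym in vlc.items():
            if pattern.startswith(code):
                table[pattern] = (sym, len(code))
                break

    def decode(bits, nsym):
        bits += "0" * K                       # pad so the final fetch is full width
        out, pos = [], 0
        for _ in range(nsym):                 # one table fetch per decoded symbol
            sym, length = table[bits[pos:pos + K]]
            out.append(sym)
            pos += length
        return out

    assert decode("010110111", 4) == ["a", "b", "c", "d"]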

Generally speaking, the RIM-proposed code tables are larger than the HHI-proposed code tables, and they capture more source bits per codeword. It was noted that this causes extra latency/buffering and increases the overhead for flushing out partially filled codewords when flushing is necessary.

In terms of overall coding efficiency for coding video data, the HHI tables seemed very slightly better for predictive coding cases (due to needing fewer bits for flushing at the end of the codestream), and the RIM tables seemed very slightly better for intra (due to fitting the source entropy better as a result of the larger tables).
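
A rough way to see the flushing cost (a back-of-envelope bound; the partition count and codeword lengths here are assumptions, not figures from the contribution): with $N$ parallel V2V bin coders, each coder that ends a codestream mid-codeword must pad out its current codeword, so the worst-case flush overhead is about

    $$\text{flush overhead} \le \sum_{i=1}^{N}\left(L_i^{\max}-1\right)\ \text{bits},$$

which grows with both the number of partitions (e.g. the 12 probabilities discussed in this TE) and the maximum codeword length $L_i^{\max}$ of each table, so larger tables pay more at every flush.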

Significant overlap with JCTVC-C134 was noted.

In regard to the proposed modified context processing, the motivation is to increase how often the same context is used for the decoding of consecutive bins, so that multiple bins could be processed at once through the finite state machine. For the modified proposal, a 0.0% to 0.1% degradation of coding efficiency was reported with a 1.4x to 2.1x effect on the "throughput" for the part of the bitstream that is devoted to these symbols. (See the contribution and JCTVC-C063 for the definition of throughput.)
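
As a toy illustration of the effect being measured (hypothetical bin labels; see the contribution for the actual reordering): grouping bins by context turns a stream of context switches into a few long same-context runs, which is what allows several bins to be handled per pass through the state machine.

    # Hypothetical illustration: run lengths of same-context bins before and
    # after grouping by context (the stable sort preserves the within-context
    # order that a decoder would need to reconstruct the original stream).
    from itertools import groupby

    bins = [("ctxA", 1), ("ctxB", 0), ("ctxA", 0), ("ctxB", 1), ("ctxA", 1)]

    def run_lengths(seq):
        return [len(list(g)) for _, g in groupby(seq, key=lambda x: x[0])]

    print(run_lengths(bins))                              # [1, 1, 1, 1, 1]
    print(run_lengths(sorted(bins, key=lambda x: x[0])))  # [3, 2]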

10.3.1.1.1.1.1.1.2 JCTVC-C134 Comments on V2V coding for TM/TMuC [D. He, G. Korodi, P. Imthurn, J. Jamias, D. O'Loughlin, G. Martin-Cocher (RIM)]

This document discussed V2V coding. It was asserted that the design of a V2V tree directly impacts not only its compression performance but also its throughput, in both software and hardware implementations. The contributor recommended further evaluation of V2V codes by measuring their compression performance, encoding and decoding throughput, sizes, and power consumption in hardware.

The contribution discussed the "Codeset 1" and "Codeset 2" code tables currently in the TMuC.

In Bernoulli tests at the 12 probabilities used in the design of the V2V codes, CABAC and the two codesets were compared. It was remarked that this penalizes CABAC in the comparison, since CABAC was not customized for those specific probabilities.

It was noted that the Bernoulli tests do not include the work needed to select which V2V code to apply in the V2V cases, which is another type of apparent bias against CABAC in the test.

The "codeset 1" tables are generally somewhat smaller, and (perhaps as a result) have slightly less coding efficiency than the "codeset 2" tables (on the specific tested probabilities) – although both have very little excess entropy (0.15% versus 0.12%).

It was remarked that some assumptions may be built into this analysis about how the VLC would be implemented that may not reflect the various ways that optimized encoders and decoders would perform.

It was suggested that some measure such as average bits per look-up operation may be beneficial to collect in experiments.

It was noted that some of the bit rates discussed in the contribution were very high.

It was suggested that some aspects of power consumption may not be captured in the provided power analysis.

It was remarked that area versus power, for example, has a different priority balance in different applications.

The contributor asserted that using a code that has a small number of distinct codeword lengths is desirable.

A participant remarked that the "codeset 2" tables have reduced coding efficiency (approximately 0.5%) under some coding conditions in video test classes that result in the use of small slices, due to the increased overhead bits needed for flushing the codewords at the end of the slice.

10.3.1.1.1.1.1.1.3 JCTVC-C063 TE8: Evaluation of RIM parallel context processing (PCP) proposal [V. Sze, M. Budagavi (TI)]

The coding efficiency loss of RIM PCP was reported to have been evaluated to be between 0.0% and 0.1% using TMuC-0.7 with QC_MDDT disabled. The implemented RIM PCP includes reordering of significant_coeff_flag and last_significant_coeff_flag by context. The update skipping mentioned in the original proposal was not included in the code provided for evaluation; the context updates were still done serially. The software reported the RIM PCP throughput impact to be between 1.4x and 1.9x for significant_coeff_flag and between 1.5x and 2.1x for last_significant_coeff_flag. Test results were reported to have been verified to mostly match the results provided by RIM.

The contributor indicated that using the HHI transform coefficient coding proposal may make it more difficult to take advantage of the throughput increase opportunity.

The contributor indicated that they had studied the software algorithm and that it functioned according to the proposal. It was remarked that some of the computed throughput increase might be difficult to achieve in practice.

Further study of the context modeling aspect was encouraged.

10.3.1.1.1.1.1.1.4 JCTVC-C064 TE8: Evaluation of RIM-V2V entropy coding [V. Sze, M. Budagavi (TI)]

This document reported on the Bernoulli test used to compare the coding efficiency and the encoding and decoding throughput of various proposed entropy coding approaches – namely RIM-V2V, HHI-PIPE, and BAC. It also provided an additional complexity comparison of the RIM-V2V tables versus the HHI-PIPE tables. To measure the coding efficiency, Bernoulli sequences of one billion samples were generated for 12 distinct probabilities. Each of the entropy coding approaches was used to encode the same bin sequence, and the total number of encoded bits was measured. While the RIM and HHI proposed tables were designed with the specific 12 probabilities in mind, the BAC used its existing states nearest to the 12 probabilities (i.e., the BAC was not customized for the probabilities). Furthermore, the tests were not performed in the TMuC context with actual video data. Thus, it was reported to be difficult to draw a conclusion on coding efficiency. The encoding and decoding throughput were estimated based on simulation time. However, since the software code for each approach was written in a very different style/structure, it was reported that no conclusion could be drawn on throughput based on the proposed use of simulation time. Further investigation was recommended. Test results given by the Bernoulli test software had been verified to match the results provided by RIM.
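
A minimal sketch of this kind of measurement (a hypothetical harness with a stand-in coder; the TE used the proponents' actual implementations and one-billion-sample sequences): generate Bernoulli(p) bins, pass them to each coder under test, and compare the coded size against the entropy bound n·H(p).

    # Hypothetical Bernoulli-test harness: ratio of coded bits to the entropy
    # bound n*H(p); 1.0 is ideal, anything above it is redundancy.
    import math
    import random

    def entropy(p):
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def measure(encode, p, n=10**6, seed=1):
        rng = random.Random(seed)
        bins = [1 if rng.random() < p else 0 for _ in range(n)]
        return encode(bins, p) / (n * entropy(p))

    # Stand-in "coder" that reports the empirical-entropy length, so the
    # harness itself can be sanity-checked; a real test would plug in the
    # candidate entropy coder here.
    def ideal_bits(bins, p):
        q = bins.count(1) / len(bins)
        return len(bins) * entropy(q)

    print(measure(ideal_bits, p=0.2))   # prints a value very close to 1.0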

The TMuC software contains 24 tables from RIM and 12 tables from HHI; the TMuC document contains only the 12 tables from HHI, while the 12 tables now proposed by RIM were first documented in a TE plan document.

The Bernoulli test measurements reported in this contribution were similar to those reported by RIM.

The tested software was reported to have had differences in coding style and degree of optimization. The contributor indicated that it was difficult to determine whether this affected the analysis results.

The contribution indicated that the extra buffering needed by PIPE (relative to CABAC) was an issue.

Area estimates for PIPE were 15x higher than for CABAC.

Further study of the various issues was suggested.

10.3.1.1.1.1.1.1.5 JCTVC-C280 TE8: Crosscheck result of the transcoder for JCTVC-B034 source selection for V2V entropy coding in HEVC [Y.-L. Chang, Y.-W. Huang, S. Lei (MediaTek)]

This document reported cross-check activity for the transcoder part of JCTVC-B034 for V2V entropy coding as proposed by Research In Motion (RIM). The verification task was reported to have been completed successfully, and the results reportedly matched those provided by RIM, with some small variations (up to −0.12%) reported under random access conditions.

The source code provided by RIM was studied to verify the proposed algorithmic functionality.

10.3.1.1.1.1.1.1.6 JCTVC-C313 TE8: Crosscheck Result of the Transcoder of JCTVC-B034 for V2V Entropy Coding in HEVC [Y. Zheng, R. Joshi, M. Coban, M. Karczewicz (Qualcomm)] (late reg.)

The purpose of this late information document was to cross-check the transcoder implementation of JCTVC-B034 on source selection for V2V entropy coding previously submitted by Research In Motion (RIM). The verification task had reportedly been completed successfully, and the results were reported to closely match those provided by RIM.


