Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11


6.12 CE12: Mapping for HDR content (4)

Contributions in this category were discussed XXday XX July XXXX–XXXX (chaired by XXX).

JVET-K0032 CE12: Summary report on HDR coding [E. François, D. Rusanovskyy, P. Yin]
JVET-K0298 CE12: Report of dynamic range adaptation (DRA) and DRA refinement [E. François (Technicolor), D. Rusanovskyy (Qualcomm)] [late]
JVET-K0308 CE12: HDR In-loop Reshaping (CE12-5, 12-6, 12-7 and 12-8) [T. Lu, F. Pu, P. Yin, W. Husak, S. McCarthy, T. Chen (Dolby)]
JVET-K0392 Cross-check for CE12.6.1 and CE12.6.2 [J. Zhao, K. Misra] [late]

6.13 CE13: Projection formats (8)

Contributions in this category were discussed XXday XX July XXXX–XXXX (chaired by XXX).

JVET-K0033 CE13: Summary report on projection formats [P. Hanhart, J.-L. Lin]
JVET-K0131 CE13: Modified Cubemap Projection in JVET-J0019 (Test 5) [Y.-H. Lee, J.-L. Lin, S.-K. Chang, C.-C. Ju (MediaTek)]
JVET-K0182 CE13: Parallel-to-Axis Uniform cubemap projection (PAU) in JVET-J0033 (Test 7) [Y. Sun, X. Huangfu, B. Wang, L. Yu (Zhejiang Univ.)] [late]
JVET-K0328 CE13: Cubemap projection (Tests 2.1 and 2.2) [P. Hanhart, Y. He, Y. Ye (InterDigital)]
JVET-K0329 CE13: Equi-angular cubemap projection (Tests 3.1 and 3.2) [P. Hanhart, Y. He, Y. Ye (InterDigital)]
JVET-K0330 CE13: Hybrid angular cubemap projection (Tests 4.1 and 4.2) [P. Hanhart, Y. He, Y. Ye (InterDigital)]
JVET-K0331 CE13: Adaptive frame packing (Tests 4.3 and 4.4) [P. Hanhart, Y. He, Y. Ye (InterDigital)]
JVET-K0387 CE13: Rotated Sphere Projection (Tests 8.1, 8.2 and 8.3) [C. Pujara, A. Singh, A. Konda (Samsung)] [late]

7 Non-CE Technology proposals

7.1 CE1 related – Partitioning (9)

Contributions in this category were discussed XXday XX July XXXX–XXXX (chaired by XXX).

JVET-K0145 Non-CE1: On Transform Unit Partition-Uniform Transform Unit Structure [J. Zhu, J. Yao, W. Cai, K. Kazui (Fujitsu)]

This contribution was discussed Saturday 14 July 1610 (GJS).

This contribution proposes a “uniform TU” (UTU) structure. A CU is proposed to be partitioned into TUs uniformly, i.e. each TU in a CU has the same size. A syntax element, utu_mode, would be signalled in CU syntax. When utu_mode is zero, it would mean no partitioning. Otherwise, the value of utu_mode would indicate the partition structure. The UTU structure is only applied on intra CUs. Test results reportedly show gain of 0.65% (Y), 1.26% (Cb) and 1.5% (Cr) in the case of UTU only performed on the luma component of I-slices. These results are from shortened (40-frame) test sequences.

The prediction process would operate on a TU basis (as in HEVC) rather than on the CU basis.

The encoding time is roughly doubled and the decoding time is increased by about 10%. The contributor said the amount of decoding time increase may be primarily a code optimization issue, estimating that the increase should really be about 4%.

It was commented that since this is splitting the tree deeper, it should be compared to using a deeper tree depth, which also provides gain.

Further study is requested.

JVET-K0464 Crosscheck of JVET-K0145: Non-CE1: On Transform Unit Partition-Uniform Transform Unit Structure [P.-H. Lin, C.-H. Yao, S.-P. Wang, C.-C. Lin, C.-L. Lin (ITRI)] [late]
JVET-K0220 Non-CE1: Proposal for a partitioning method by Fraunhofer HHI and Technicolor [J. Ma, A. Wieckowski, H. Schwarz, D. Marpe, T. Wiegand (HHI), F. Le Léannec, T. Poirer (Technicolor)]

This contribution was discussed Saturday 14 July 1445 (GJS).

This contribution proposes a partitioning scheme for VVC as a combination of different partitioning aspects tested in the Core Experiment 1: Partitioning (JVET-J1021). The proposed partitioner is configurable to reach different trade-off points.

The VTM encoder uses a maximum BTT depth of 3 (although the decoder also supports other depths and the encoder can be configured differently as well).

About 0.5%/0.7%/0.9% for AI/RA/LB gain was measured (configuration “C2”) without changing the partitioning structure or tree searching depth and with faster encoding, although changing the syntax and changing the boundary handling. It was estimated that about 0.3% for RA was from the boundary handling. It was commented that the encoder optimization and boundary handling rather than the other syntax difference seemed likely to be primarily responsible for the difference.

With the additional 4-way splits included, an additional 0.0%/0.2%/0.5% gain was reported.

It was noted that the particular variation of QT+BTT had been intended to be a "placeholder" with no presumptive status.

It was commented that the decoding time reported in the contribution had increased by 7% for AI.

This was further discussed Saturday 2000 (GJS). The proponent indicated that they had concluded there was no difference between the proposal without the additional 4-way splits and the current VTM QT+BTT and offered their encoder optimization for the current design. This was welcomed.

Decision: It was agreed that the QT+BTT as per draft 1 now does have presumptive status; it is not just a placeholder.

JVET-K0434 Crosscheck of JVET-K0220 Non-CE1: Proposal for a partitioning method by Fraunhofer HHI and Technicolor [X. Li (Tencent)] [late]
JVET-K0230 CE1-related: Separate tree partitioning at 64x64-luma/32x32-chroma unit level [T.-D. Chuang, C.-Y. Chen, Y.-W. Huang, S.-M. Lei (MediaTek)]

In a typical pipelined hardware decoder architecture, the data is pipelined with NxN luma blocks and MxM chroma blocks, where NxN and MxM are the same as the maximum luma transform block (TB) size and the maximum chroma TB size. In VTM-1.0, the maximum luma and chroma TB sizes are 64x64 and 32x32. In separate tree partitioning, for each coding tree unit (CTU), the luma coding tree block (CTB) is signalled first, then the chroma CTBs are signalled, which forces the hardware decoder architecture to change from maximum TB pipelining to CTU pipelining (i.e., the data is pipelined with luma CTBs and chroma CTBs). In VTM-1.0, the luma CTB size is 128x128, and the chroma CTB size is 64x64. The number of samples that a decoder pipeline stage must hold to support CTU-level separate tree partitioning is four times that without separate tree partitioning, which will lead to a significant increase in hardware area for the pipeline stages.

In this contribution, it is proposed to start separate tree partitioning at 64x64-luma/32x32-chroma units instead of at the CTU. In the proposed method, each intra-slice CTU is first implicitly split into 64x64-luma/32x32-chroma units. Then the coding tree under each 64x64-luma/32x32-chroma unit is separated, and luma syntax is signalled before chroma syntax within each 64x64-luma/32x32-chroma unit. It is claimed that the proposed method can be easily supported by a hardware decoder architecture that pipelines data in 64x64 luma blocks and 32x32 chroma blocks. Compared against the CTU-level separate tree partitioning, simulation results reportedly show negligible BD-rate differences for the 64x64-luma/32x32-chroma unit-level separate tree partitioning.

It is also reported that the 64x64-luma/32x32-chroma unit-level separate tree partitioning with multiple intra chroma direct modes (multi-DMs) can achieve 0.99% and 0.81% Y BD-rates, 11.24% and 9.80% U BD-rates, 11.20% and 9.80% V BD-rates, and 17% and 40% encoding time decreases for VTM-1.0-AI and BMS-1.0-AI, respectively, when both the anchor and the test include linear model (LM) chroma mode and the 65 intra angular modes. The impact on decoding time seems negligible.
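The "four times" figure for the pipeline stage can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes one luma block plus two chroma blocks (Cb and Cr) per pipeline stage, which matches the 2:1 luma/chroma block dimensions quoted above.

```python
def stage_samples(luma_size: int, chroma_size: int, chroma_planes: int = 2) -> int:
    """Samples one pipeline stage holds: one luma block plus the chroma blocks.

    Assumption (not stated in the contribution): two chroma planes per unit.
    """
    return luma_size * luma_size + chroma_planes * chroma_size * chroma_size


# CTU-level separate trees in VTM-1.0: 128x128 luma CTB, 64x64 chroma CTBs.
ctu_level = stage_samples(128, 64)
# Maximum-TB-level pipelining: 64x64 luma, 32x32 chroma.
tb_level = stage_samples(64, 32)

print(ctu_level, tb_level, ctu_level // tb_level)  # prints: 24576 6144 4
```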

It was commented that the current shared tree scheme also is not friendly to 64x64 pipeline architecture if there is a top-level split that is a ternary split.

It was suggested that we could just have a smaller maximum CU size for intra than inter. The contributor said this would likely have no impact on coding efficiency.

It was noted that there is an interaction with CCLM.

For separate tree operation, the contribution proposed that all luma would be sent before all chroma on a 64x64 basis or a CTU basis, whichever is smaller.
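The resulting signalling order can be sketched as follows. This is a minimal illustration, not the VTM syntax: the function name is hypothetical, positions are given in luma samples, and the raster visiting order of the 64x64 units inside a CTU is an assumption.

```python
from typing import Iterator, Tuple


def separate_tree_order(ctu_size: int = 128,
                        unit_size: int = 64) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (x, y, size, component) in the order the trees would be signalled.

    The unit is the smaller of the CTU and the 64x64 unit; within each unit
    the luma tree comes before the chroma tree.
    Assumption: units inside the CTU are visited in raster order.
    """
    unit = min(ctu_size, unit_size)
    for y in range(0, ctu_size, unit):
        for x in range(0, ctu_size, unit):
            yield (x, y, unit, "luma")
            yield (x, y, unit, "chroma")


for entry in separate_tree_order(128):
    print(entry)
```

With a 128x128 CTU this produces four luma/chroma pairs, one per 64x64 unit; with a 64x64 CTU it produces a single pair covering the whole CTU.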

Decision: Adopt separate trees for intra slices (without multiple intra chroma direct modes) with an implicit split to 64x64 (into both VTM and BMS).

Decision: Prohibit ternary split of something bigger than 64 in width or height (and not send the bit to indicate ternary type at that level). See also later contribution K0556 and the notes of the plenary discussion in section 12.2, in which it was agreed that this prohibition affects both intra and inter.

As a change of the software and CTC configuration, it was suggested to increase the chroma QP for intra when the trees are separate. The contributor had tested increasing the chroma QP offset by 1 for intra (with CCLM in both the anchor and test) and said this showed an increase to 3%/1.5%/0.4% over the VTM, and that they would provide the test results in a revision of the contribution. Decision (SW & CTC): Agreed.

There was further discussion on Monday 1500 (GJS) about why separate trees are only planned for intra slices rather than also intra CTUs in inter slices. It was commented that the intra/inter switch is at the CU level rather than the CTU level, so in the current scheme the tree ends before the intra/inter decision is made. A proposed approach (JVET-K0354) that was studied in CE1 was to add a flag at the CU level that would continue the tree, and the two trees would separate from that point downward. It was remarked that the way current-picture referencing works means that if we only support separate trees for intra slices, we cannot combine separate tree usage with current picture referencing.

This was further discussed Tuesday 0945 (GJS). It was noted that supporting separate trees for intra CUs in inter slices shows little improvement in the CTC and has an encoder complexity impact, but it was reported that there was little decoder complexity impact, and enabling this would make intra more consistent between inter slices and intra slices (e.g., an encoder would not need to change the slice type in order to get access to the separate tree functionality, and some encoders may seldom use intra slices). There was some questioning of decoder complexity impact, but there was no clearly significant impact on decoder complexity.

In track A, it was initially agreed to enable separate trees for intra CUs in inter slices (relating to K0354).

Decision: There should be a high-level (e.g., SPS) flag to enable or disable the use of separate trees (for intra CUs in intra slices).

For intra CUs in inter slices, it was agreed that if the use of separate trees was enabled, there should be another flag to enable it or disable it in inter slices. The use in inter slices was to be disabled in the CTC. However, since experiments appeared to show some complexity impact on the decoder, further study in a CE was planned to be performed to confirm the decoder complexity impact before taking action on enabling separate trees for intra CTUs or CUs in inter slices.

This initial agreement was further discussed in JVET plenary Tuesday 1330 (GJS & JRO). It was suggested to consider a different approach for inter slices – which is that, in an inter slice, a flag at the whole-CTU level would indicate that the CTU is intra with separate trees. If the flag is zero, there would be the current scheme with a common tree. It was agreed that this and the other described scheme should be tested in a CE.

It was also suggested to consider a local switch at the CTU or CU level for whether a separate tree or a single tree or common tree is used. Such a local switch could be used in intra slices as well as in inter slices. This should also be tested in the CE.

As a matter of design principle, it was agreed that having intra work similarly in an inter slice as in an intra slice is desirable (in the absence of some justification to do otherwise). However, no action was taken to choose a particular method of enabling separate trees in inter slices for current inclusion in the VTM or BMS.

JVET-K0402 Crosscheck of JVET-K0230: CE1-related: Separate tree partitioning at 64x64-luma/32x32-chroma unit level [X. Xu, J. Ye (Tencent)] [late]
JVET-K0320 CE1-related: Zero-Unit for Picture Boundary Handling [K. Zhang, L. Zhang, H. Liu, Y. Wang, P. Zhao, D. Hong (Bytedance)]

Considered in BoG on picture boundary handling.

JVET-K0535 Cross-check of JVET-K0320: CE1-related: Zero-Unit for Picture Boundary Handling [M. Xu (Tencent)] [late]
JVET-K0362 CE1-related: Context modelling for coding CU split decisions [S.-T. Hsiang, S.-M. Lei (MediaTek)]

This contribution proposes a modified method for entropy coding the coding unit (CU) split decisions. The proposed method reportedly reduces the total number of contexts by 1 and reduces the number of spatial neighbouring CUs used for context selection from four to two. The proposed method reportedly leads to 0.10%, 0.15%, and 0.14% luma BD-rate gains for the AI, RA, and LB settings, respectively, under the VTM-1.0 CTCs (with all BMS tools off). The proposed method reportedly leads to 0.10%, 0.12%, and 0.20% luma BD-rate gains for the AI, RA, and LB settings, respectively, under the BMS-1.0 CTCs (with all BMS tools on).

It was commented that although this looks logical, it seems like a very small refinement that is not necessary to consider at this time and changing it could interfere with other ongoing work. This should be kept in mind if it remains relevant as the project proceeds.

JVET-K0414 CE1-related: Cross-check of JVET-K0362 [S. Jeong (Samsung)] [late]
JVET-K0366 CE1-related: Partial CU for picture boundary [M. Xu, X. Li, S. Liu (Tencent)]

Considered in BoG on picture boundary handling.

JVET-K0523 Cross-check of JVET-K0366: CE1-related: Partial CU for picture boundary [K. Zhang (Bytedance)] [late]
JVET-K0497 CE1.4 related: Evidence of Split Unit Coding Order [Y. Piao, J. Chen, C. Kim (Samsung)] [late]

This contribution presents evidence of potential gain of split unit coding order (SUCO) described in CE1 subtest 1.4. Since the result of SUCO on VTM in JVET-K0133 is not consistent with results observed in other contexts, some evidence of potential gain from SUCO is presented in this contribution to justify further study of SUCO for VVC. SUCO on the HM reportedly provides 2.1% and 2.1% BD-rate gains in AI and RA, respectively. SUCO on JEM3.1 reportedly provides 2.9% gain in class A2 in RA configuration. In the IFVC software (JVET-J0072) of the CfP response JVET-J0024, a maximum 1.7% gain was reported with less complexity in the IFVC software than in the HM and JEM because of encoder optimization.

Further study (although not currently in a CE, because it is not yet clear exactly how to test it and there is an interaction with partitioning modifications) was encouraged.

JVET-K0554 CE1-related: Joint proposal for picture boundary partitioning by Fraunhofer HHI and Huawei [A. Wieckowski, J. Ma, H. Schwarz, D. Marpe, T. Wiegand (HHI), H. Gao, S. Esenlik, J. Chen (Huawei)] [late]

Considered in BoG on picture boundary handling.

JVET-K0556 CE1-related: Constraint for binary and ternary partitions [C.-W. Hsu, T.-D. Chuang, C.-Y. Chen, Y.-W. Huang, S.-M. Lei (MediaTek)] [late]

This contribution was discussed Tuesday 1000 (GJS).

Virtual pipeline data units (VPDUs) are defined as non-overlapping MxM-luma(L)/NxN-chroma(C) units in a picture. In hardware decoders, successive VPDUs are processed by multiple pipeline stages at the same time; different stages process different VPDUs simultaneously. The VPDU size is roughly proportional to the buffer size in most pipeline stages, so it is said to be very important to keep the VPDU size small. In HEVC hardware decoders, the VPDU size is set to the maximum transform block (TB) size. Enlarging the maximum TB size from 32x32-L/16x16-C (as in HEVC) to 64x64-L/32x32-C (as in the current VVC) can bring coding gains, and is expected to result in a VPDU size (64x64-L/32x32-C) that is 4X that of HEVC. However, in addition to quadtree (QT) coding unit (CU) partitioning, ternary tree (TT) and binary tree (BT) partitioning are adopted in VVC for achieving additional coding gains, and TT and BT splits can be applied to 128x128-L/64x64-C coding tree units (CTUs) recursively, which is said to lead to a VPDU size (128x128-L/64x64-C) that is 16X that of HEVC. To reduce the VPDU size in VVC, a constraint for TT and BT is proposed, with the VPDU size defined as 64x64-L/32x32-C and the following two conditions.

  • Cond. 1: For each VPDU containing one or more CUs, the CUs are completely contained in the VPDU.

  • Cond. 2: For each CU containing one or more VPDUs, the VPDUs are completely contained in the CU.

The contribution proposed to impose the constraint that, for each CTU, the above two conditions shall not be violated, and the processing order of CUs shall not leave a VPDU and re-visit it later.
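The two conditions can be expressed as a geometric check on a single CU. The sketch below is illustrative only: the function name and the (x, y, w, h) rectangle convention in luma samples are assumptions, and it does not model the additional requirement that the CU processing order must not leave a VPDU and re-visit it later.

```python
VPDU = 64  # luma VPDU size proposed in the contribution


def cu_respects_vpdus(x: int, y: int, w: int, h: int, vpdu: int = VPDU) -> bool:
    """True if every VPDU the CU overlaps either contains the CU (Cond. 1)
    or is completely contained in the CU (Cond. 2)."""
    # Visit each VPDU on the grid that overlaps the CU rectangle.
    for vy in range(y // vpdu * vpdu, y + h, vpdu):
        for vx in range(x // vpdu * vpdu, x + w, vpdu):
            cu_in_vpdu = (vx <= x and x + w <= vx + vpdu and
                          vy <= y and y + h <= vy + vpdu)
            vpdu_in_cu = (x <= vx and vx + vpdu <= x + w and
                          y <= vy and vy + vpdu <= y + h)
            if not (cu_in_vpdu or vpdu_in_cu):
                return False
    return True


# A 128x64 CU covers two whole VPDUs, so both conditions hold.
print(cu_respects_vpdus(0, 0, 128, 64))   # True
# A 32x128 CU (outer part of a vertical ternary split of a 128x128 block)
# straddles two VPDUs without containing either.
print(cu_respects_vpdus(0, 0, 32, 128))   # False
```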

For the current scheme, these constraints would be satisfied if three constraints are applied.

  • Prohibit ternary split of edges longer than 64 (32 for chroma)

  • Prohibit vertical split when width is 64 and height is 128 (half these for chroma)

  • Prohibit horizontal split when width is 128 and height is 64 (half these for chroma)
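The three prohibitions above can be sketched as a predicate on luma block dimensions. This is not VTM code: the function name and the split labels are hypothetical, and "edges longer than 64" is read here as either block dimension exceeding 64, in line with the earlier decision on ternary splits. Chroma would use half these sizes.

```python
SPLITS = ("BT_VER", "BT_HOR", "TT_VER", "TT_HOR")  # hypothetical labels


def split_allowed(width: int, height: int, split: str) -> bool:
    """Return False for splits ruled out by the three listed constraints."""
    if split not in SPLITS:
        raise ValueError(f"unknown split type: {split}")
    # 1. No ternary split when an edge is longer than 64 (assumed: either edge).
    if split.startswith("TT") and max(width, height) > 64:
        return False
    # 2. No vertical split of a 64-wide, 128-high block.
    if split.endswith("VER") and width == 64 and height == 128:
        return False
    # 3. No horizontal split of a 128-wide, 64-high block.
    if split.endswith("HOR") and width == 128 and height == 64:
        return False
    return True


print(split_allowed(128, 128, "TT_VER"))  # False (rule 1)
print(split_allowed(64, 128, "BT_VER"))   # False (rule 2)
print(split_allowed(64, 128, "BT_HOR"))   # True, yields two 64x64 blocks
```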

It was reported that imposing these three constraints would have a significant coding efficiency impact. Another way to meet the constraint would be to set MAX_TT_SIZE and MAX_BT_SIZE to 64, likely accompanied by increasing the BT/TT depth.

Further study in a CE is needed to test some approaches and determine the coding efficiency impact.
