Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11


CE10 related – Combined and multi-hypothesis prediction (5)




7.10 CE10 related – Combined and multi-hypothesis prediction (5)


Contributions in this category were discussed Saturday 14 July 2040–2145 (chaired by JRO).

JVET-K0148 CE10 related: Combining multi-hypothesis prediction with triangular prediction unit mode [R.-L. Liao, C. S. Lim (Panasonic)]

This contribution provides test results for combining multi-hypothesis prediction with the triangular prediction unit mode. Two multi-hypothesis prediction tests, CE10.1.4 proposed by MediaTek and CE10.1.8 proposed by Fraunhofer HHI, are each combined with the CE10.3.2 triangular prediction unit mode. The two combinations were tested, and their coding results are reported as follows:



  • CE10.1.4 plus CE10.3.2 (VTM configuration):

    • (RA) -1.88% BD-rate with 139% encoding time and 107% decoding time

    • (LB) -2.04% BD-rate with 149% encoding time and 106% decoding time

  • CE10.1.8 plus CE10.3.2 (VTM configuration):

    • (RA) -2.55% BD-rate with 150% encoding time and 106% decoding time

    • (LB) -3.75% BD-rate with 169% encoding time and 106% decoding time

Already discussed in context of CE – to be investigated in next round of CE10.

No results on BMS were available.
JVET-K0258 CE10-related: OBMC complexity reduction and parallel blending [C.-C. Chen, C.-W. Hsu, Y.-W. Huang, S.-M. Lei (MediaTek)]

Two aspects of overlapped block motion compensation (OBMC) are proposed. The first is data reuse, which achieves a lossless runtime reduction for the original JEM-based OBMC. The second replaces sequential computation with parallel computation in the sample blending process of the original JEM-based OBMC. For the first aspect, with OBMC additionally enabled, encoder runtime is reduced by 9% for RA VTM-1.0, 10% for LB VTM-1.0, 2% for RA BMS-1.0, and 3% for LB BMS-1.0; decoder runtime is reduced by 12% for RA VTM-1.0, 12% for LB VTM-1.0, 5% for RA BMS-1.0, and 5% for LB BMS-1.0. The proposed parallel blending causes no noticeable change in BD-rate or runtime.

When the proposed lossless runtime reduction techniques and parallel blending are applied, OBMC achieves -1.04%, -1.41%, -1.26%, and -1.93% luma BD-rates with 5%, 7%, 3%, and 5% encoding time increases and 11%, 13%, 25%, and 33% decoding time increases for RA VTM-1.0, LB VTM-1.0, RA BMS-1.0, and LB BMS-1.0, respectively. Chroma BD-rate savings are about 1% higher than luma BD-rate savings.

When the proposed lossless runtime reduction techniques and parallel blending are applied, CU-boundary-only OBMC (i.e., no sub-block OBMC) achieves -1.04%, -1.41%, -0.89%, and -1.38% luma BD-rates with 5%, 7%, 1%, and 3% encoding time increases and 11%, 13%, 6%, and 7% decoding time increases for RA VTM-1.0, LB VTM-1.0, RA BMS-1.0, and LB BMS-1.0, respectively. Chroma BD-rate savings are about 1% higher than luma BD-rate savings.

The first aspect is non-normative, but clearly helps to reduce both encoder and decoder runtime of OBMC. The second aspect changes the blending procedure to enable parallel processing.

The worst-case memory bandwidth increase of OBMC is still approx. 2.5x.
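
For orientation, the kind of boundary blending under discussion can be sketched as follows. This is a minimal illustration and not the blending design of JVET-K0258: it assumes the JEM-style weights of 1/4, 1/8, 1/16, and 1/32 for the neighbouring-motion prediction on the first four lines next to the top block boundary, and the function name and the `num_lines` parameter are hypothetical. Each line is computed independently of the others, so the loop body could run in parallel.

```python
def blend_top_rows(cur_pred, above_pred, num_lines=4):
    """Blend the top rows of cur_pred with a prediction made from the
    above neighbour's motion (both given as lists of sample rows).

    Assumption: row i uses weight 1/2^(i+2) for the neighbour prediction
    and 1 - 1/2^(i+2) for the current prediction (JEM-style OBMC weights).
    """
    out = [row[:] for row in cur_pred]          # rows below stay unchanged
    for i in range(min(num_lines, len(cur_pred))):
        shift = i + 2                           # 1/4, 1/8, 1/16, 1/32
        w_cur = (1 << shift) - 1                # 3, 7, 15, 31
        rounding = 1 << (shift - 1)
        out[i] = [(w_cur * c + n + rounding) >> shift
                  for c, n in zip(cur_pred[i], above_pred[i])]
    return out


cur = [[100, 100] for _ in range(5)]
nbr = [[200, 200] for _ in range(5)]
blended = blend_top_rows(cur, nbr)              # 4 blended lines
blended_two_lines = blend_top_rows(cur, nbr, num_lines=2)
```

Calling the function with `num_lines=2` blends only the two lines nearest the boundary and leaves all other rows untouched.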


JVET-K0474 Crosscheck of JVET-K0258: OBMC complexity reduction and parallel blending [J. Ye, X. Xu (Tencent)] [late]
JVET-K0422 Cross-check of JVET-K0258: CE10-related: OBMC complexity reduction and parallel blending [R.-L. Liao, C. S. Lim (Panasonic)] [late]
JVET-K0259 CE10-related: OBMC bandwidth reduction and line buffer reduction [Z.-Y. Lin, T.-D. Chuang, C.-Y. Chen, Y.-W. Huang, S.-M. Lei (MediaTek)]

This contribution proposes methods to reduce the memory bandwidth and line buffer requirements of overlapped block motion compensation (OBMC). To remove the additional memory bandwidth required by OBMC, the additional reference picture samples in the right-most w’ columns and bottom-most h’ rows are padded: they are not fetched, but are replaced with samples generated by replicating the right-most column and the bottom-most row of the reference picture samples originally fetched for fractional-sample-accuracy motion compensation (MC), where w’ and h’ are the width and height of the OBMC region, respectively. For blocks coded with sub-block mode, the same padding method can be applied to reduce the required bandwidth. As for line buffer usage, the number of blended lines at the above block boundary is reduced from 4 to 2 if the current block is at a coding tree unit (CTU) row boundary.

Applying padding for OBMC at the coding unit (CU) boundary results in BD-rate losses of 0.03%, 0.06%, 0.01%, and 0.03% for VTM-1.0-RA, VTM-1.0-LB, BMS-1.0-RA, and BMS-1.0-LB, respectively. Applying padding for OBMC at the CU boundary and reducing the number of blended lines at the CTU row boundary results in BD-rate losses of 0.05%, 0.07%, 0.03%, and 0.02% for VTM-1.0-RA, VTM-1.0-LB, BMS-1.0-RA, and BMS-1.0-LB, respectively. Applying padding for OBMC at the CU boundary and sub-block boundary and reducing the number of blended lines at the CTU row boundary results in BD-rate losses of 0.05%, 0.07%, 0.04%, and 0.06% for VTM-1.0-RA, VTM-1.0-LB, BMS-1.0-RA, and BMS-1.0-LB, respectively.

Due to padding, worst case memory bandwidth is reduced to approx. 2x. Loss compared to “normal” OBMC is <0.1%.
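
The replication padding described in the abstract can be sketched roughly as below. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, the block is a plain list of sample rows, and the details of the actual proposal (for example, how the padded region interacts with the interpolation filter) are not covered.

```python
def pad_reference_block(block, extra_cols, extra_rows):
    """Extend a fetched reference block without fetching more samples.

    Assumption: the extra_cols right-most columns are generated by
    repeating the last fetched column, and the extra_rows bottom-most
    rows by repeating the last (already widened) row.
    """
    padded = [row + [row[-1]] * extra_cols for row in block]
    padded += [padded[-1][:] for _ in range(extra_rows)]
    return padded


fetched = [[1, 2],
           [3, 4]]
padded = pad_reference_block(fetched, extra_cols=2, extra_rows=1)
```

Here `extra_cols` and `extra_rows` play the role of w’ and h’ in the text above.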

Investigate in CE together with K0258.



JVET-K0537 Cross check of CE10-related: OBMC bandwidth reduction and line buffer reduction (K0259) [M. Siekmann (HHI)] [late] [miss]
JVET-K0270 CE10-related: Diagonal motion partitions on top of MTT block structure [Y. Ahn, D. Sim (Digital Insights)]
JVET-K0526 Cross-check of JVET-K0270 (CE10-related: Diagonal motion partitions on top of MTT block structure) [T. Na, J. Lim (SK Telecom), J. Shin (PIXTREE)] [late] [miss]

In JVET-H0087, diagonal motion partitions (DMPs) were proposed on top of quadtree plus binary tree (QTBT) block structure. In this contribution, the same method for inter prediction is proposed on top of multi-type tree (MTT) block structure. In the proposed partitioning method, a coding unit (CU) is split into two diagonal motion partitions. The proposed method includes only two diagonal directions, but it can represent various arbitrary partitions on top of MTT block structure. The proposed DMPs can achieve 1.28% and 1.75% BD-rate reduction over VTM-1.1 for random access and low-delay B configurations, respectively.
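
As a rough illustration of splitting a CU into two diagonal partitions, the sketch below builds a per-sample partition mask for the two diagonal directions. It is not taken from JVET-K0270: the function name, the direction numbering, and the rule that samples on the diagonal belong to partition 0 are all assumptions, and the blending along the partition edge is omitted.

```python
def diagonal_partition_mask(width, height, direction):
    """Return a height x width mask of 0/1 partition indices.

    Assumptions: direction 0 splits along the top-left to bottom-right
    diagonal, direction 1 along the top-right to bottom-left diagonal;
    samples lying on the diagonal are assigned to partition 0.
    """
    if direction not in (0, 1):
        raise ValueError("direction must be 0 or 1")
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Mirror the column index for the second diagonal direction.
            xx = x if direction == 0 else width - 1 - x
            row.append(0 if xx * height >= y * width else 1)
        mask.append(row)
    return mask


mask_dir0 = diagonal_partition_mask(4, 4, 0)
mask_dir1 = diagonal_partition_mask(4, 4, 1)
```

Because the comparison scales by `width` and `height`, the same rule applies to non-square blocks such as those produced by binary and ternary splits.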

The slide deck showed additional information not in the Word document; it should be uploaded.

Performance is similar to or better than that of the CE10 contributions on diagonal partitioning; however, there is also a significant increase in encoder runtime, which should be decreased.

In the presentation, preliminary results were shown indicating that usage with uni-prediction still preserves the major part of the gain.

Blending of the two diagonal partitions was used, but that does not increase memory bandwidth.

Study in CE together with other geometric partitioning approaches.
JVET-K0485 CE9-related: A simplified bi-directional optical flow (BIO) design based on the combination of CE9.5.2 test 1 and CE9.5.3 [X. Xiu, Y. He, Y. Ye (InterDigital), C.-Y. Chen, C.-Y. Lai, Y.-W. Huang, S.-M. Lei (MediaTek)] [late]
JVET-K0531 CE10-related: Combined test of CE10.1.4 and CE10.1.8 [M.-S. Chiang, C.-W. Hsu, Y.-W. Huang, S.-M. Lei (MediaTek)] [late]

In this contribution, a combined test of the multi-hypothesis prediction methods of JVET-J0018 and JVET-J0014 is proposed. In JVET-J0018, multi-hypothesis prediction is applied to advanced motion vector prediction (AMVP) mode, skip or merge mode, and intra mode, which are tested in CE10.1.1, CE10.1.2, and CE10.1.3, respectively. CE10.1.4 is the combined test of CE10.1.1, CE10.1.2, and CE10.1.3. In JVET-J0014, multi-hypothesis prediction is applied to merge mode, which is tested in CE10.1.5 to CE10.1.8 with different parameter settings. In this contribution, combined results of CE10.1.4 and CE10.1.8 are reported. It is reported that, compared to VTM-1.0, this proposal achieves -2.45% and -2.58% luma BD-rates for RA and LB, respectively, with 39% and 54% encoding time increases and 9% and 7% decoding time increases. Compared to BMS-1.0, this proposal achieves xxx% and xxx% luma BD-rates for RA and LB, respectively, with xx% and xx% encoding time increases and xx% and xx% decoding time increases.

Only partial results on BMS were available before the meeting closed.

Worst case complexity is the same as the individual tools, as max. number of hypotheses stays the same. Also worst case memory bandwidth is not increased compared to the individual tools.
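
The general idea of forming a prediction from several hypotheses can be sketched as a weighted sum of prediction signals. This is a generic illustration only: the weights actually used in CE10.1.4 and CE10.1.8 are not given in this report, so the function name, the integer weights, and the `shift` parameter below are hypothetical.

```python
def combine_hypotheses(hypotheses, weights, shift):
    """Combine prediction signals (lists of samples) with integer weights.

    Assumption: the weights sum to 2**shift, so the result is a rounded
    weighted average computed with integer arithmetic only.
    """
    if len(hypotheses) != len(weights):
        raise ValueError("one weight per hypothesis is required")
    if sum(weights) != 1 << shift:
        raise ValueError("weights must sum to 2**shift")
    rounding = (1 << (shift - 1)) if shift > 0 else 0
    num_samples = len(hypotheses[0])
    return [
        (sum(w * h[i] for w, h in zip(weights, hypotheses)) + rounding) >> shift
        for i in range(num_samples)
    ]


base = [100, 50]          # e.g. a conventional merge prediction
extra = [200, 51]         # one additional hypothesis
equal_mix = combine_hypotheses([base, extra], [1, 1], shift=1)
biased_mix = combine_hypotheses([base, extra], [3, 1], shift=2)
```

The cost of this combination grows with the number of hypotheses, which is why the maximum number of hypotheses determines the worst case.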

Further study in CE.

JVET-K0543 Cross-check of JVET-K0531: CE10-related: Combined tests of CE10.1.4 and CE10.1.8 [X. Xiu (InterDigital)] [late] [miss]

