Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11




7.6 Partitioning (2)


Contributions in this category were discussed Sunday 15 April 1430–1520 (chaired by JRO).

JVET-J0035 Quadtree plus binary tree with shifting [J. Ma, A. Wieckowski, V. George, T. Hinz, J. Brandenburg, S. de Luxan Hernandez, H. Kirchhoffer, R. Skupin, H. Schwarz, D. Marpe, T. Schierl, T. Wiegand (HHI)]

This contribution proposes a quadtree plus binary tree with shifting (QT+BTS). The proposed technology extends the QTBT split modes by asymmetric modes with split ratios of 1:2, 1:3, 1:4, 2:3, and 3:5.

The document contains a link to a reportedly clean software implementation, which does not include any new coding tools (compared to HEVC) but provides three block partitioning options: QTBT, QTBT plus triple splits (MTT), and QT+BTS as proposed in this contribution. In combination with an adaptation of the chroma QP, the following BD-rate savings were reported relative to HEVC (HM) for simulations with short video sequences containing 49 frames:

Random access SDR-A:

  • QTBT: 16.85%, 5.83%, 6.55% (Y,U,V) at 87% encoder and 84% decoder run time;

  • MTT: 19.23%, 9.32%, 10.55% (Y,U,V) at 196% encoder and 83% decoder run time;

  • QT+BTS-A: 17.80%, 7.08%, 8.15% (Y,U,V) at 87% encoder and 86% decoder run time;

  • QT+BTS-B: 18.90%, 8.75%, 9.94% (Y,U,V) at 133% encoder and 87% decoder run time;

  • QT+BTS-C: 19.62%, 10.05%, 11.29% (Y,U,V) at 222% encoder and 87% decoder run time.

Random access SDR-B:

  • QTBT: 10.12%, 5.55%, 5.28% (Y,U,V) at 78% encoder and 88% decoder run time;

  • MTT: 12.72%, 9.52%, 9.60% (Y,U,V) at 170% encoder and 89% decoder run time;

  • QT+BTS-A: 11.86%, 7.91%, 7.61% (Y,U,V) at 81% encoder and 91% decoder run time;

  • QT+BTS-B: 13.02%, 9.35%, 9.32% (Y,U,V) at 120% encoder and 90% decoder run time;

  • QT+BTS-C: 14.04%, 11.13%, 11.30% (Y,U,V) at 204% encoder and 93% decoder run time.

Random access HDR:

  • QTBT: 7.89%, 6.84%, 15.53% (Y,U,V) at 59% encoder and 85% decoder run time;

  • MTT: 10.23%, 11.93%, 20.43% (Y,U,V) at 111% encoder and 85% decoder run time;

  • QT+BTS-A: 9.53%, 11.17%, 19.62% (Y,U,V) at 63% encoder and 86% decoder run time;

  • QT+BTS-B: 10.30%, 12.49%, 20.98% (Y,U,V) at 88% encoder and 86% decoder run time;

  • QT+BTS-C: 11.16%, 14.68%, 23.48% (Y,U,V) at 147% encoder and 89% decoder run time.

Low delay SDR-B:

  • QTBT: 10.28%, 14.72%, 15.31% (Y,U,V) at 61% encoder and 93% decoder run time;

  • MTT: 12.79%, 18.53%, 19.21% (Y,U,V) at 147% encoder and 94% decoder run time;

  • QT+BTS-A: 12.33%, 17.29%, 18.25% (Y,U,V) at 70% encoder and 98% decoder run time;

  • QT+BTS-B: 13.39%, 18.81%, 19.60% (Y,U,V) at 110% encoder and 97% decoder run time;

  • QT+BTS-C: 14.22%, 19.82%, 20.94% (Y,U,V) at 196% encoder and 98% decoder run time.

It was further reported that an adaptation of the luma-chroma QP relationship can have a significant impact on the obtained luma BD rate savings. For the QT+BTS-B configuration, a modification of the chroma QP setting reportedly increases the average luma BD rate savings by 5.8% for RA SDR-A and 2.0% for RA SDR-B.

Discussed aspects included:



  • Chroma and luma segmentations can be separate in intra slices (as proposed).

  • The minimum block size was 4x4.

  • This includes transforms with sizes that are not powers of 2. For luma, the sizes included powers of 2 and also lengths 12, 20, 24, 40, 48, 80, and 96.

A quadtree was used on top and a binary tree with a shift was used at the leaves, and the depth of both is signalled.

In the BT, depending on the side length before the split, 2, 3 or 4 split options are possible per direction. The total number of options is the sum of the options for the horizontal and vertical directions, plus the no-split case.
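For illustration, a minimal Python sketch of this option counting is given below. The ratio set is taken from the proposal summary above; the admissibility constraints used here (integer child sizes, a minimum side length) are assumptions, since the exact QT+BTS rules (e.g. which child lengths have supported transforms) are not detailed in this report.

```python
# Enumerate binary split options for one side, assuming the asymmetric ratios
# from the proposal plus the symmetric 1:1 split, and assuming a split is
# admissible when both child sizes are integers no smaller than a minimum size.

RATIOS = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 3), (3, 5)]

def split_options(side, min_size=4):
    """Admissible (first, second) child sizes when splitting one side."""
    options = []
    for a, b in RATIOS:
        if side % (a + b):
            continue                     # child sizes would not be integers
        first = side * a // (a + b)
        if min(first, side - first) >= min_size:
            options.append((first, side - first))
    return options

def num_split_modes(width, height, min_size=4):
    """Horizontal options + vertical options + the no-split case."""
    return (len(split_options(width, min_size))
            + len(split_options(height, min_size)) + 1)

print(split_options(32))        # [(16, 16), (8, 24), (12, 20)]
print(num_split_modes(32, 32))  # 3 + 3 + 1 = 7
```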


Further study of this was requested.

JVET-J0048 Non-Square CTU on top of Qualcomm’s CfP response [X. Li, X. Zhao, X. Xu, S. Liu (Tencent)]

It is asserted that non-square CTUs are desired in some applications. In this contribution, non-square CTUs are implemented on top of JVET-J0021. It was reported that coding performance similar to that of square CTUs is achieved with the help of a newly introduced tree type called SplitToSquare. It is proposed to further study the use of non-square CTUs and the SplitToSquare tree type.

It was reported that splitting a non-square 512x128 CTU into four 256x64 blocks is worse than splitting it into four 128x128 blocks (with a loss of around 3%). However, a comparison against using 128x128 CTUs was not shown. Further, the need for CTUs of size 512x128 is not obvious.
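For illustration only, one plausible reading of the SplitToSquare tree type is sketched below; the function and its exact semantics are hypothetical, inferred from the 512x128-to-four-128x128 example above rather than taken from the contribution.

```python
def split_to_square(width, height):
    """Hypothetical reading of SplitToSquare: partition a non-square CTU
    into the largest square blocks that tile it (e.g. 512x128 -> four
    128x128 blocks). Returns (x, y, size) tuples."""
    side = min(width, height)
    assert width % side == 0 and height % side == 0
    return [(x, y, side)
            for y in range(0, height, side)
            for x in range(0, width, side)]

print(split_to_square(512, 128))
# [(0, 0, 128), (128, 0, 128), (256, 0, 128), (384, 0, 128)]
```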

7.7 NN based technology (4)


Contributions in this category were discussed Sunday 15 April 1520–1640 (chaired by JRO).

JVET-J0034 AHG9: CNN-based driving of block partitioning for intra slices encoding [F. Galpin, F. Racapé, P. Bordes, F. Le Léannec, E. François (Technicolor)]

This contribution describes in more detail the convolutional neural network (CNN) based algorithm, used in the MTT codec presented in the CfP response JVET-J0022, for driving the block partitioning in intra slice encoding.

A CNN-based encoding approach is explored to partly substitute heuristics-based encoder speed-ups with a systematic and automatic process. The approach allows controlling the trade-off between complexity and coding gain in intra slices with a single parameter. This contribution reports, in the AI configuration, a BD-rate gain of 6% for the method in the MTT codec presented in JVET-J0022 compared to JEM7 at the same encoding runtime; for the same BD-rate performance as JEM7, the average encoding runtime is reportedly reduced by a factor of 4.3.
The luma CTU size is 256x256, with the first split inferred and the second split based on RDO; splits from 64x64 and lower are CNN based.

The CNN was derived from the ResNet architecture, with one CNN for luma and one shared for Cb and Cr. The input to the CNN is 65x65 patches plus associated QP values. The output is a vector of partition boundary probabilities.

The network is trained with partition boundary choices obtained by conventional RDO.
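For illustration, a minimal PyTorch-style sketch of such a network is given below. Only the ResNet-derived structure, the 65x65-patch-plus-QP input, and the boundary-probability output are taken from the contribution; the layer counts, channel widths, pooling, QP conditioning, and output dimensionality are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Basic residual block (assumed structure; details are illustrative)."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class PartitionNet(nn.Module):
    """Maps a 65x65 patch plus QP to partition-boundary probabilities."""
    def __init__(self, channels=32, blocks=4, num_boundaries=480):
        super().__init__()
        # The QP is fed in as an extra constant input plane (one of several
        # plausible ways to condition the network on QP).
        self.stem = nn.Conv2d(2, channels, 3, stride=2, padding=1)
        self.body = nn.Sequential(*[ResBlock(channels) for _ in range(blocks)])
        self.pool = nn.AdaptiveAvgPool2d(4)
        self.head = nn.Linear(channels * 16, num_boundaries)

    def forward(self, patch, qp):
        # patch: (B, 1, 65, 65) luma samples; qp: (B,) quantization parameters
        qp_plane = qp.view(-1, 1, 1, 1).expand(-1, 1, 65, 65) / 51.0
        x = self.stem(torch.cat([patch, qp_plane], dim=1))
        x = self.pool(self.body(x)).flatten(1)
        return torch.sigmoid(self.head(x))  # per-boundary probabilities

net = PartitionNet()
probs = net(torch.randn(1, 1, 65, 65), torch.tensor([32.0]))
print(probs.shape)  # torch.Size([1, 480])
```

Training against the RDO-derived boundary choices would then amount to a per-boundary binary classification (e.g. with a binary cross-entropy loss), although the actual loss used is not stated in this report.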

With CNN-based decisions, QT/ABT reportedly outperforms the JEM starting from 50% of the JEM encoding runtime.

Questions and comments from the discussion included:


  • Currently this was only used for intra coding; an investigation for inter was reportedly ongoing.

  • It was asked why ResNet was selected; no other architectures had been investigated.

  • It was asked what influences the runtime. Ultimately, the control mechanism is the number of split candidates checked by conventional RDO, which are selected as the ones that the network marks as most probable (see the sketch below).
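To make this control mechanism concrete, a minimal Python sketch follows; the ranking-and-prune logic is an assumption about the mechanics, and the function names are hypothetical.

```python
# The network's split-mode scores are ranked and only the k most probable
# modes are evaluated with full RDO; k (or a probability threshold) acts as
# the single complexity/gain trade-off parameter mentioned above.

def select_rdo_candidates(split_scores, k):
    """split_scores: dict mapping split mode -> network probability."""
    ranked = sorted(split_scores, key=split_scores.get, reverse=True)
    return ranked[:k]  # only these candidates go through full RDO

def encode_block(block, split_scores, k, rdo_cost):
    """Pick the best of the k most probable splits by conventional RDO."""
    candidates = select_rdo_candidates(split_scores, k)
    return min(candidates, key=lambda mode: rdo_cost(block, mode))

# Example: with k=2, only the two most probable modes are RDO-checked.
scores = {"no_split": 0.55, "hor_1_1": 0.30, "ver_1_1": 0.10, "hor_1_3": 0.05}
print(select_rdo_candidates(scores, 2))  # ['no_split', 'hor_1_1']
```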

JVET-J0037 Intra Prediction Modes based on Neural Networks [J. Pfaff, P. Helle, D. Maniry, S. Kaltenstadler, B. Stallenberger, P. Merkle, M. Siekmann, H. Schwarz, D. Marpe, T. Wiegand (HHI)]

In this contribution, intra prediction modes that generate an intra-picture prediction signal on a rectangular block in a future video codec are proposed. These intra prediction modes are processed in two main steps: First, a set of features is extracted from the decoded samples. Second, these features are used to select an affine linear combination of predefined image patterns as the prediction signal. Also, a specific signalling scheme for the intra prediction modes is proposed. Since the proposed predictors are non-linear, they can be represented neither by the angular prediction nor by the DC or planar prediction modes of the HEVC and JEM techniques.

The proposed intra prediction modes are based on fully connected neural networks with several layers. Such networks come with additional computational complexity relative to the traditional intra prediction modes. To deal with this complexity, the proposed predictors have the following properties: For a given block shape, all predictors share all but the last layer of the neural network. Moreover, as a further development of the technique, it is proposed that for large blocks the target space of the neural network based prediction signal is the frequency domain, where many frequency components of the prediction signal are constrained to a constant value.
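For illustration, a minimal PyTorch-style sketch of the shared-layer structure is given below, following the layer layout reported in the discussion notes further down (two shared NxN layers, a shared third layer reducing to the prediction size, and a mode-specific output layer); the widths, mode count, and block size shown are assumptions.

```python
import torch
import torch.nn as nn

class NNIntraPredictor(nn.Module):
    """Sketch of a shared-trunk fully connected intra predictor for one
    block shape. All modes share every layer except the last; each mode's
    output layer yields the prediction samples. Sizes are illustrative
    (N = 144 reference samples, a 16x16 = 256-sample block, 35 modes)."""
    def __init__(self, num_inputs=144, num_modes=35, block_samples=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(num_inputs, num_inputs), nn.ReLU(),     # shared, N x N
            nn.Linear(num_inputs, num_inputs), nn.ReLU(),     # shared, N x N
            nn.Linear(num_inputs, block_samples), nn.ReLU(),  # shared, to output size
        )
        # Mode-specific last layers (affine maps on the shared features).
        self.heads = nn.ModuleList(
            nn.Linear(block_samples, block_samples) for _ in range(num_modes)
        )

    def forward(self, reference_samples, mode):
        # reference_samples: (B, num_inputs) decoded neighbouring samples
        return self.heads[mode](self.trunk(reference_samples))

pred = NNIntraPredictor()
block = pred(torch.randn(1, 144), mode=0)  # one 16x16 block prediction
print(block.shape)                         # torch.Size([1, 256])
```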

It is also proposed that for each of the above intra prediction modes a set of non-separable orthogonal transforms is available, which are secondary transforms for large blocks. These transforms can be applied in the transform coding of the prediction residual of the corresponding mode.


Results were reported relative to a configuration HM+QT+BTS+SOT (as of contributions JVET-J0035 and JVET-J0040); the reported bit rate reduction is 2.16%, while decoder runtime increases by 33%.

Questions and comments in the discussion included:



  • It was asked whether the gain would be retained when other intra coding tools were used. The proponent said this would approximately be the case.

  • It was asked how large the network is. The proponent used a fully connected network, for which the largest network is for 32x32 (for 64x64, downsampling is used). The first two layers (the same for all modes) require NxN multiplications, where N is the number of input samples (N is 144 for 16x16 blocks, which was suggested to be the worst case). The third layer reduces the number to the output prediction size. The output layer is specific for each mode.

  • It was asked whether the implementation was in floating point; this was not the case. Integer processing was used with 32 bit arithmetic and 16 bit weights.

  • It was asked why less gain was apparent in the frequency domain, and the proponent said this was due to quantization.

  • Two networks were used: one for reordering the mode list and one for sample prediction. It was asked what the benefit of the first aspect is. The proponent said this could not be answered, as both networks were trained jointly.

  • The number of modes was 35 for block sizes < 32 and 11 otherwise. The modes are not directional; they are just trained from the network.

  • It was asked how many input samples were used: 2 rows/columns for large blocks and 4 for small blocks.

  • It was asked what loss function was used; the proponent said it was self-designed (no clearer answer was given).

  • It was asked how many models were used. Models were designed for the sizes 4x4, 8x8, 16x16, 32x32 and all rectangular blocks, and were identical for transposes.

JVET-J0043 AHG9: Convolutional Neural Network Filter for inter frame [J. Yao, X. Song, S. Fang, L. Wang (Hikvision)]

This contribution proposed a convolutional neural network filter (CNNF) for inter frames. For intra frames, it keeps the same scheme as JVET-I0022. For inter frames, a flag is coded to indicate whether to use CNNF or the traditional filters of JEM 7.1, i.e., bilateral filter (BF), deblocking filter (DF), sample adaptive offset (SAO) and adaptive loop filter (ALF). Reported simulation results were −2.71%, −10.66% and −11.52% BD-rate deltas for luma and the two chroma components compared with JEM 7.1 in the RA configuration, −2.82%, −9.41%, −9.56% for the LDP configuration, and −2.43%, −8.01%, −8.75% for the LDB configuration.



The presentation deck was requested to be uploaded.

Eight convolutional layers were used; the first is 5x5 and the others are 3x3. Finally, the network output is added to the unfiltered input, i.e., the network tries to compute the residual error.
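A minimal PyTorch-style sketch of this structure follows. The layer count, kernel sizes, and global skip connection are as described above; the channel width and single-channel input are assumptions.

```python
import torch
import torch.nn as nn

class CNNF(nn.Module):
    """Residual in-loop filter sketch: 8 conv layers (one 5x5, then 3x3),
    with the output added to the unfiltered input."""
    def __init__(self, ch=64):
        super().__init__()
        layers = [nn.Conv2d(1, ch, 5, padding=2), nn.ReLU(inplace=True)]
        for _ in range(6):
            layers += [nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(ch, 1, 3, padding=1)]  # 8th layer, no activation
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: (B, 1, H, W) unfiltered reconstruction; the network predicts
        # the residual error and adds it back (global skip connection).
        return x + self.net(x)

cnnf = CNNF()
recon = torch.randn(1, 1, 64, 64)
print(cnnf(recon).shape)  # torch.Size([1, 1, 64, 64])
```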

The CNN filter is operated in parallel with the conventional loop filters. For each 64x64 region, it is decided whether to use the output of the conventional filter path or the output of the CNNF.

The reported gain is less than in the case of using CNNF only for intra frames (as reported in JVET-I0022).

It was asked whether the chained combination used in JVET-I0022, where the CNNF only replaced deblocking and SAO and was followed by ALF, had been tried here; the proponent had not tried that.

It was remarked that ALF could potentially bring some additional gain if operated after a CNNF, as the filter parameters of ALF are selected with knowledge of the original picture.

It was also pointed out that the results may be very specific to a given codec and its compression characteristics.

JVET-J0076 AHG9: Crosscheck of CNN filter in JVET-I0022 as in-loop filter and post-processing filter [L. Zhao, X. Li, S. Liu (Tencent), H. Dou, Z. Deng (Intel)] [late]

The cross-checker noted that the decoding time was approximately 100× that of the JEM.



