7.11 CE11 related – Composite reference pictures (3)
Contributions in this category were discussed Friday 13 July 1940–2010 (Track B chaired by JRO).
JVET-K0157 CE11: HEVC-like encoder only solution for composite reference picture [W. Li, X. Zheng (DJI)]
This contribution provides an HEVC-like encoder-only solution for composite reference that is evaluated in CE11. Implementation details and test results following the common test conditions are provided in the document. Simulations show that the proposed technique can achieve -2.46% and -1.46% coding gain over VTM1.0 and BMS1.0, respectively, in the low-delay B Main10 (LDB) configuration, with around 20% encoding time increase.
Decision (SW): Adopted – see further notes under CE11.
This contribution proposes a composite reference design that allows the composite reference to be updated at block level once a CTU’s coding is finished, which is said to show benefits for hardware design. It is said that the proposed method can achieve almost the same coding performance as composite reference with a lower encoding runtime increase.
Relative to the method in CE11, the encoding run time is not decreased, and results are almost identical. It is claimed to be beneficial for encoder hardware implementation, but there is no need to investigate this in CE11, which should target investigating improved compression benefit of composite reference pictures.
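The CTU-level update described above can be sketched as follows. This is a hedged illustration, not the contribution's actual rule: the notes only state that the composite reference region co-located with a finished CTU is updated, so the weighted-average update, the function name, and the CTU size parameter here are all assumptions for illustration.

```python
import numpy as np

def update_composite_ctu(composite, recon, ctu_x, ctu_y, ctu_size=128, alpha=0.75):
    """Illustrative block-level composite-reference update: once the CTU at
    (ctu_x, ctu_y) finishes coding, blend the co-located region of the
    composite reference with the reconstruction (weighted average is an
    assumption; the actual update rule is not given in these notes)."""
    h, w = composite.shape
    y0, y1 = ctu_y, min(ctu_y + ctu_size, h)
    x0, x1 = ctu_x, min(ctu_x + ctu_size, w)
    composite[y0:y1, x0:x1] = (alpha * composite[y0:y1, x0:x1]
                               + (1 - alpha) * recon[y0:y1, x0:x1])
    return composite
```

Updating per CTU rather than per picture is what is claimed to help hardware: only a bounded region of the composite reference changes at a time.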
JVET-K0447 CE11 Related Work: Long-term Reference Simulated Implementation [C. Ma, D. Liu, Y. Li, F. Wu (USTC)] [late]
This contribution reports simulation results of long-term reference. Compared with BMS1.0, simulation results show that this tool achieves 2.95% BD rate reduction in LDB configuration. Compared with VTM1.0, simulation results show that this tool achieves 2.43% BD rate reduction in LDB configuration.
The approach is to define the first I picture as a long-term reference picture.
The gain comes again from sequences with static background.
The approach also provides gain for RA test cases. The IDR picture of each IDR period is defined as a long-term reference in this case. However, the contribution does not show results for the entire test set in the IDR case, only for selected sequences that provide coding gain.
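The reference-list behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions: the notes only say that the first I picture (or each IDR picture) is kept as a long-term reference alongside the usual short-term references, so the function name, the sliding-window size, and the list ordering here are illustrative, not from the contribution.

```python
def build_reference_list(decoded_pocs, current_poc, num_short_term=3):
    """Return reference POCs for the current picture: the nearest
    short-term pictures (sliding window) plus the first I/IDR picture
    of the period kept as a long-term reference."""
    prior = [p for p in decoded_pocs if p < current_poc]
    if not prior:
        return []
    long_term = prior[0]                              # first I/IDR picture
    short_term = sorted(prior[1:])[-num_short_term:]  # sliding window
    refs = short_term[::-1]                           # nearest pictures first
    if long_term not in refs:
        refs.append(long_term)                        # long-term ref kept
    return refs

# Example: pictures 0..8 decoded, coding picture 9
print(build_reference_list(list(range(9)), 9))  # [8, 7, 6, 0]
```

With a static background, the long-term reference (the old I picture) remains a good predictor for background blocks long after the sliding window has moved on, which is where the reported gain comes from.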
For LDB, the results are worse than those of CE11.
No methods superior to those of CE11 were shown, neither here nor in CE11 itself. Discontinue CE11 – potentially take it up again when evidence of benefit is shown for sequences with non-static background, which might require (normative) decoder-side tools.
7.12 CE12 related – Mapping for HDR content (21)
Contributions in this category were discussed XXday XX July XXXX–XXXX (chaired by XXX).
JVET-K0309 In-loop Reshaping for SDR Video [F. Pu, T. Lu, P. Yin, W. Husak, S. McCarthy, T. Chen (Dolby)]
(following notes by JRO when the document was discussed in context of BoG review Tue 17 track B)
Performs coding in a re-shaped domain, requires operating reshaper and inverse reshaper in the loop. Provides gain of 2% bit rate reduction, also for luma.
Reshaping function is computed for the IDR picture and then used over the whole IDR period.
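The mechanism described above can be sketched as follows. This is a hedged illustration, not the actual reshaper of JVET-K0309: the notes only state that coding operates in a reshaped sample domain with forward and inverse mappings in the loop, derived at the IDR picture, so the gamma-like mapping curve and all function names below are assumptions.

```python
import numpy as np

def make_forward_lut(bit_depth=10, gamma=0.8):
    """Illustrative forward reshaping LUT (an arbitrary gamma-like curve
    standing in for the mapping derived at the IDR picture)."""
    maxv = (1 << bit_depth) - 1
    x = np.arange(maxv + 1)
    return np.round(maxv * (x / maxv) ** gamma).astype(np.int64)

def make_inverse_lut(fwd_lut):
    """Invert the monotonic forward LUT for the in-loop inverse reshaper."""
    maxv = len(fwd_lut) - 1
    return np.searchsorted(fwd_lut, np.arange(maxv + 1), side="left").clip(0, maxv)

fwd = make_forward_lut()
inv = make_inverse_lut(fwd)
samples = np.array([0, 100, 512, 1023])
reshaped = fwd[samples]    # coding operates on these reshaped samples
restored = inv[reshaped]   # inverse reshaping in the loop, close to the input
```

Because both LUTs stay fixed for the whole IDR period, only the forward mapping needs to be signalled once per IDR picture.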
It should be investigated whether this has an impact on visual quality, and what the interrelationship is with other tools of VTM and BMS (e.g. CCLM).
To be investigated in CE12 (renamed as “mapping functions”).
JVET-K0468 CE12-related: In-loop chroma refinement [E. François, C. Chevance, F. Hiron (Technicolor)] [late]
Discussed in BoG – applied for SDR and HDR as well. Test in CE12 along with K0309.
7.13 CE13 related – Projection formats (1)
Contributions in this category were discussed XXday XX July XXXX–XXXX (chaired by XXX).
JVET-K0332 CE13-related: Adaptive frame packing on top of CMP, MCP, and PAU [P. Hanhart, Y. He, Y. Ye (InterDigital)]
JVET-K0522 Crosscheck of JVET-K0332: CE13-related: Adaptive frame packing on top of CMP and MCP [P. Wang (MediaTek)] [late]
7.14 NN technology related (5)
Note: JVET-K0266 also relates to NN technology.
Contributions in this category were discussed Monday 16 July track B 1540–XXXX (chaired by JRO).
JVET-K0158 AHG9: Separable Convolutional Neural Network Filter with Squeeze-and-Excitation block [T. Hashimoto, E. Sasaki, T. Ikai (Sharp)]
This contribution presents a “separable convolutional neural network filter with squeeze-and-excitation block” (SESCNNF), which has fewer parameters than the network structure proposed in JVET-I0022. The current BMS software has multiple filters such as the deblocking filter (DF), sample adaptive offset (SAO), and the adaptive loop filter (ALF). In this contribution, these three filters are replaced with SESCNNF, which shows 3.08%, 4.62%, and 5.73% gain on average for Y, Cb, and Cr in the BMS AI configuration.
Results are for the AI configuration only.
Comparison is against the normal BMS anchor. The test is with the other loop filters turned off and CNNLF always enabled. Decoding time is >500× that of the anchor. Various configurations are shown; e.g., when the decoding time is reduced to about 140–150× of the anchor, the luma gain goes down to 2%.
Network is trained for different QP values (from CTC). Network does not know the current QP.
C++ implementation, floating point.
JVET-K0443 Crosscheck of JVET-K0158: AHG9: Separable Convolutional Neural Network Filter with Squeeze-and-Excitation block [X. Song, L. Wang (Hikvision)] [late]
JVET-K0222 AHG9: Convolution neural network loop filter [Y.-L. Hsiao, T.-D. Chuang, C.-Y. Chen, C.-W. Hsu, Y.-W. Huang, S.-M. Lei (MediaTek)]
This document presents three modifications of the convolution neural network loop filter (CNNLF) scheme proposed in JVET-J0018. The first modification is to train CNNLF parameters for each random access segment (RAS, roughly 1 second in the conducted tests) of the video sequence and signal the CNN parameters in the I-slice headers of each RAS, instead of training the CNNLF parameters for the entire video sequence and signalling them in the picture parameter set (PPS). The second modification is to simplify the CNNLF network from “eight layers with reconstructed samples, prediction samples, and residual samples as the input signals” to “four layers with reconstructed sample as the input signal” only. The third modification is that only those pictures with temporal ID equal to 0 or 1 are used in the training process to derive the CNNLF parameters. Compared with VTM-1.0, the proposed CNNLF reportedly achieves 2.57%, 18.52%, and 18.29% BD-rate reductions for Y, U, and V, respectively, for the RA configuration, with 89% decoding time increase. Compared to BMS-1.0, the proposed CNNLF reportedly achieves 0.88%, 13.76%, and 13.19% BD-rate reductions for Y, U, and V, respectively, for the RA configuration, with 29% decoding time increase. Since the chroma BD-rate savings are much higher than the luma BD-rate savings, increasing the chroma QP offset by 1 for both Cb and Cr is tested, and the BD-rate results are reported as follows: 3.67%, 10.10%, and 9.72% for Y, U, and V, respectively, for VTM-1.0 in the RA configuration; and 1.96%, 4.00%, and 3.48% for Y, U, and V, respectively, for BMS-1.0 in the RA configuration. Further research on complexity reduction and training enhancement for improving coding efficiency is suggested. It is suggested that such CNNs are a promising research direction for further study.
The network is trained for each random access segment of a sequence, and the network parameters are sent in the slice headers of the I slices for each random access segment. Thus, the random access characteristics of the scheme are more in the spirit of what is intended for the random access configuration, and the algorithmic delay is not as extreme (avoiding whole-sequence pre-analysis). One CNN is trained for Y, and another for Cb/Cr. Filter sizes in the 4 layers are 1x1/3x3/1x1/3x3, with coefficients represented by 6-bit integers, 16/16/8/4 filters in the layers, and ReLU after each layer. The output of the network is added to the original decoder output.
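The network structure described above can be sketched as follows. This is an illustrative sketch, not the contribution's trained filter: it follows the 1x1/3x3/1x1/3x3 kernel pattern with ReLU after each layer and a residual connection to the reconstruction, but for simplicity it uses a single filter in the last layer (the notes report 16/16/8/4 filters) so the output adds directly onto one luma plane, and float weights instead of the proposal's 6-bit integers.

```python
import numpy as np

def conv2d(x, w):
    """Naive convolution. x: (C_in, H, W); w: (C_out, C_in, k, k);
    stride 1, 'same' zero padding."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(c_in):
            for dy in range(k):
                for dx in range(k):
                    out[o] += w[o, i, dy, dx] * xp[i, dy:dy + h, dx:dx + wd]
    return out

def cnn_loop_filter(recon, weights):
    x = recon[None, :, :]                   # one input channel (luma)
    for w in weights:
        x = np.maximum(conv2d(x, w), 0.0)   # conv + ReLU after each layer
    return recon + x[0]                     # network output added to output

# Random illustrative weights with the 1x1/3x3/1x1/3x3 kernel pattern
rng = np.random.default_rng(0)
kernels, chans = [1, 3, 1, 3], [1, 16, 16, 8, 1]
weights = [rng.normal(0.0, 0.05, (chans[i + 1], chans[i], kernels[i], kernels[i]))
           for i in range(4)]
filtered = cnn_loop_filter(rng.uniform(0, 1023, (16, 16)), weights)
```

Per-RAS training then amounts to fitting `weights` on pictures of that segment (temporal ID 0 or 1 only, per the contribution) and transmitting them in the I-slice header.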
The time for the training was not included in the reported encoding time (and the scheme would, in any case, not be expected to be usable in real-time applications).
The chroma QP offset is increased by 1 relative to the CTC when using the scheme.
Higher gain is reported for higher resolutions (likely due to the fact that the number of parameters is relatively less for higher resolutions).
Regarding subjective quality, it is suggested that this shows a visual improvement for the CampFire sequence (due to higher gain for chroma), and sometimes for other sequences.
Theoretically, it could be done for low delay as well, either using pre-trained networks or training with a previous segment; this would, however, be difficult to do in real time with current technology.
Several experts expressed the opinion that this was very interesting. This was the first time that the reported decoding time on a CPU had become a realistic value.
JVET-K0391 AHG9: Dense Residual Convolutional Neural Network based In-Loop Filter [Y. Wang, Z. Chen, Y. Li (Wuhan Univ.), L. Zhao, S. Liu, X. Li (Tencent)] [late]
This contribution provides a dense residual convolutional network based in-loop filter (DRNLF) for VVC. In-loop filters, such as the deblocking filter (DF) and sample adaptive offset (SAO), are employed in VTM to suppress compression artefacts, which contributes to coding performance improvement. In this contribution, the proposed DRNLF is introduced as an additional filter before SAO. Simulation results report BD-rate savings for the luma and the two chroma components of -5.75%, -17.56%, -18.74% compared with VTM1.1 under the AI configuration; -6.11%, -15.85%, -14.36% for the RA configuration; -6.05%, -11.53%, -12.30% for the LDB configuration; and -7.14%, -12.16%, -12.72% for the LDP configuration.
Operated between DBF and SAO. The decoding times reported are 17× for AI, 38× for RA, and 37× for LDB (run on a GPU). When run on a CPU, the AI decoding time increases to >800×.
Generally, the bit rate savings reported are higher than those reported at the previous meeting. However, further study (in an AHG) is necessary before a CE could be defined. At the next meeting, subjective viewing should also be done to better understand the impact of the CNN in comparison to other loop filters.
JVET-K0444 Crosscheck of JVET-K0391: AHG9: Dense Residual Convolutional Neural Network based In-Loop Filter [X. Song, L. Wang (Hikvision)] [late]