17.5 Loop Filtering
17.5.1.1.1.1.1.1.1 JCTVC-D039 ALF decode complexity analysis and reduction [M. Budagavi, V. Sze, M. Zhou (TI)]
This contribution analyzes the implementation complexity of the Adaptive Loop Filter (ALF) at the decoder.
The contribution only studies the luma filtering.
The current filter allows up to 16 luma filters to be selected per slice, with support regions up to a 9x9 diamond.
It was remarked that the symmetry property of the current filter kernels seems unusual and questionable, and may be the result of testing performance on our training set (although the test sets have changed since that design decision was made) or of raster-flow decision-making processes. It was remarked that enforcing quadrant symmetry does seem to cause some loss in performance (although we don't know how much).
Implementation complexity analysis involves not just analysis of computations but also analysis of memory bandwidth and memory size (area).
There are reportedly two ways to implement ALF at the decoder – frame-based filtering and LCU-based filtering. For both frame-based and LCU-based filtering, line buffers that store previous lines of the deblocked image are needed to reduce memory bandwidth requirements. LCU-based filtering was suggested to be preferable.
The vertical extent of the filter was indicated to be especially important.
This contribution presents two ALF filter sets that reduce the vertical size of the filter, thereby reducing the line buffer/memory bandwidth requirements. Both filter sets have a maximum vertical extent of 5 lines. Filter set 1 reportedly reduces memory bandwidth/memory size requirements by 50% and worst-case computations by 20%. Filter set 2 reportedly has computational complexity similar to that of the existing HM ALF filters but reduces memory bandwidth/memory size requirements by 50%. Existing HM ALF filters reportedly provide average BD BR savings in the range of 3.3%-4.1%. Filter set 1 and set 2 reportedly provide average BD BR savings in the range of 3.2%-4%.
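The link between vertical filter extent and line-buffer size can be illustrated with a small calculation. The sketch below is not taken from the contribution: it assumes a simple accounting in which a filter that is V rows tall needs V - 1 rows of previously deblocked samples to be retained, and the function names are hypothetical. Under that assumption, going from 9 rows to 5 rows halves the buffer, which is consistent with the 50% figure reported above, while a 7-row limit saves a quarter.

```python
def line_buffer_rows(vertical_extent: int) -> int:
    """Rows of deblocked samples to retain for a filter of the given height.

    Assumed accounting (not from JCTVC-D039): extent - 1 rows.
    """
    if vertical_extent < 1 or vertical_extent % 2 == 0:
        raise ValueError("expected a positive odd vertical extent")
    return vertical_extent - 1


def buffer_saving(old_extent: int, new_extent: int) -> float:
    """Fractional reduction in retained rows when the extent shrinks."""
    old_rows = line_buffer_rows(old_extent)
    new_rows = line_buffer_rows(new_extent)
    return 1.0 - new_rows / old_rows


if __name__ == "__main__":
    for new_extent in (7, 5):
        saving = buffer_saving(9, new_extent)
        print(f"9 rows -> {new_extent} rows: {saving:.0%} fewer buffered rows")
```

Actual savings depend on the decoder architecture (for example, whether line buffers are shared with the deblocking stage), which this accounting ignores.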
The Nx5-Set2 filter support scheme was reportedly cross-checked in JCTVC-D188.
The proponent indicated a preference for the proposed "Nx5-Set1" variant, and this view had some support.
A participant remarked that rapid adoption without examining chroma might be a bit of an ad hoc way to move forward, such that perhaps we should do further study instead.
Another participant remarked that the Nx7 approach would be a conservative way to embrace this concept with minimal short-term impact on performance.
Decision: Adopted "Nx7" variation.
17.5.1.1.1.1.1.1.2 JCTVC-D188 In-loop and post-processing filtering AHG: Verification results of TI's Proposal JCTVC-D039 [T. Yamakage, T. Chujoh (Toshiba)]
Cross-check of JCTVC-D039. The verification only checked "Nx5-Set2", but included studying the software as well as its results.
17.5.1.1.1.1.1.1.3 JCTVC-D116 Region-based adaptive loop filter using two-dimensional feature [T. Ikai, Yukinobu Yasugi (Sharp)]
In this contribution, a region-based adaptive loop filter using two-dimensional feature was proposed. In the proposal, a hierarchical region division was proposed using two-dimensional features: activity and direction.
In the proposed technique, up to eighteen sets of filter coefficients can be sent. The technique was reported to be compatible with the DF-combined ALF where the number of input signals is two.
The experimental results reportedly show that the proposed technique using one input provides 0.5% BR reduction for HE and that the two input scheme (with DF-combined ALF) provides 1.4% BR reduction compared with the anchor (QC_ALF).
The scheme is primarily for inter coded areas. There was only 0.1% gain shown for AI HE.
The adaptivity in the proposal operates on a 4x4 region basis, and the proponent indicated that this aspect is better than the sample-by-sample adaptivity in the current reference design.
No cross-verification was provided.
Further study was encouraged.
17.5.1.1.1.1.1.1.4 JCTVC-D213 In-loop reference frame denoising in HEVC reference software [Peter Amon, Eugen Wige, Andreas Hutter, André Kaup]
This proposal presents an algorithm for in-loop denoising of the reference frame. The algorithm modifies the temporal predictor while the decoded picture is unchanged. Knowledge of the noise power within the reference frame is used in order to improve the inter frame prediction. For noise filtering of the reference frame, a denoising algorithm is implemented. In JCT-VC219, the results for the implementation of the algorithm in the AVC reference software (JM 15.1) were presented. It was reportedly shown that the BR can be decreased for (high resolution) noisy image sequences, especially for higher qualities at medium to high data-rates. This contribution reportedly shows that the gains are nearly preserved when implementing the scheme in the HEVC reference software (HM 0.9) for coding of similar test material. In addition, results for coding two sequences of the standard test set for HEVC are shown.
It was remarked that since this proposes a different processing for the picture that is output than for the picture that is stored for reference purposes, the storage requirements for out-of-order decoding would likely be increased (in the present form of the proposal).
The experimental results were provided for different sequences than are used in the common conditions (only two of the current common conditions test sequences were tested).
The testing was performed for the LD HE case.
The gain shown in the experiments was primarily at high coding fidelities. Compression impacts were reported that ranged from 0.7% (or more) degradation to 3.2% improvement, depending on the sequence. Theoretically, degradations should not be observed with ideal use of the concept.
It was remarked that current broadcast encoders perform some denoising of video source material prior to encoding, which may affect the result obtained by such methods.
Currently, the tested method does not use syntax – only decoder-side inference of the filter to be applied is performed.
This work was reported at a rather preliminary stage of maturity, although it was beneficial to see that progress had been made on moving the technique into the group's common software for testing in that context.
Further study was encouraged.
17.5.1.1.1.1.1.1.5 JCTVC-D385 Adaptive Loop Filtering Using Multiple Filter Shapes [F. Kossentini, N. Mahdi, H. Guermazi, M. A. BenAyed (eBrisk Video Inc)]
This contribution presents the advantages of utilizing different filter shapes in the Adaptive Loop Filtering (ALF) part (only the ALF part that is applied to the luminance samples) of the TMuC 0.9 encoder/decoder. More specifically, it was reported that the simultaneous use of 7x7 diamond-shaped and 13x13 cross-shaped filters yields both an improvement in coding efficiency and a reduction in decoder complexity.
A 40% decrease in the worst-case number of multiplies and adds was reported.
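As a rough illustration of how a figure of this size can arise, the sketch below counts filter taps for the shapes named above. It rests on assumptions that are not stated in the contribution: a diamond of size N covers the positions with |dx| + |dy| <= N // 2, a cross of size N is one full row plus one full column, coefficient symmetry is not exploited, and the worst case being replaced is the 9x9 diamond mentioned earlier in this section. The function names are hypothetical.

```python
def diamond_taps(size: int) -> int:
    """Taps in a size x size diamond, assumed to be |dx| + |dy| <= size // 2."""
    r = size // 2
    return sum(
        1
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if abs(dx) + abs(dy) <= r
    )


def cross_taps(size: int) -> int:
    """Taps in a cross: one row and one column sharing the centre sample."""
    return 2 * size - 1


def worst_case_reduction(old_taps: int, new_taps: int) -> float:
    """Fractional reduction in multiply-adds per filtered sample."""
    return 1.0 - new_taps / old_taps


if __name__ == "__main__":
    old = diamond_taps(9)
    new = max(diamond_taps(7), cross_taps(13))
    print(old, new, f"{worst_case_reduction(old, new):.0%}")
```

Under these assumptions the 7x7 diamond and the 13x13 cross both have 25 taps against 41 for the 9x9 diamond, a reduction of about 39% per filtered sample.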
Approximately 10% decoder runtime decrease was observed.
Some testing was reported for reduced vertical extent (reduced to 7 or 9 samples vertically).
Encoding time somewhat increased, but encoding was asserted to be parallel-friendly.
Results were reported as follows: HE LD 0.9% improvement, RA 0.4% improvement, AI 0.1% degradation (from reduced worst-case filter region of support).
It was remarked that the filter on/off decision method was more complex in this proposal, since in the HM the decision of on/off is always made for 5x5 rather than testing the results of different filters to make this decision.
A participant remarked that the lack of full 2D support in the cross-shaped filter design might produce visual artifacts.
No cross-verification was provided.
Further study was encouraged.
17.5.1.1.1.1.1.1.6 JCTVC-D221 Loop filter with directional similarity mapping (DSM) [P. Lai, F. C. Fernandes (Samsung)]
This contribution presented an adaptive loop filtering design which combines linear spatial filtering and a directional "similarity filter" with a mapping function. To suppress ringing artifacts and preserve edges, the filters exploit directional features in video frames, and the mapping function avoids using pixels with large differences from the pixel to be filtered. The filter design uses only 4 filters with up to 7x7 window size. Using TMuC 0.9 with no ALF as anchor, experimental results reportedly demonstrate average BD rate reductions of 3.5% and 3.3% for Intra HE and Random Access HE, respectively. As compared to the ALF in TMuC 0.9, lower encoding complexity (135% to 118% for Intra HE against anchor) and improvements in visual quality around edges were reported in the Intra and RA settings.
The goals are to reduce computational complexity and to improve visual quality near edges.
Filter symmetries are designed to match the directional filtering classifications.
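The idea of excluding samples that differ strongly from the sample being filtered can be sketched as follows. This is only an illustration of the general concept and not the DSM design from the contribution: the hard threshold, the renormalization by the sum of the retained weights, and all names are assumptions made for the example.

```python
def similarity_filter(img, y, x, weights, threshold):
    """Weighted average over a window, skipping dissimilar neighbours.

    img       : list of rows of sample values
    weights   : dict mapping (dy, dx) offsets to positive weights;
                must contain (0, 0) so the centre is always used
    threshold : neighbours whose absolute difference from the centre
                exceeds this value are left out (assumed mapping rule)
    """
    centre = img[y][x]
    acc = 0.0
    norm = 0.0
    for (dy, dx), w in weights.items():
        yy, xx = y + dy, x + dx
        if not (0 <= yy < len(img) and 0 <= xx < len(img[0])):
            continue  # outside the picture
        if abs(img[yy][xx] - centre) > threshold:
            continue  # likely across an edge, do not mix it in
        acc += w * img[yy][xx]
        norm += w
    return acc / norm


if __name__ == "__main__":
    # A vertical edge between the second and third columns.
    picture = [[10, 10, 200], [10, 12, 200], [10, 10, 200]]
    box = {(dy, dx): 1.0 for dy in (-1, 0, 1) for dx in (-1, 0, 1)}
    print(similarity_filter(picture, 1, 1, box, threshold=20))
    print(similarity_filter(picture, 1, 1, box, threshold=255))
```

With the smaller threshold the bright column is ignored and the result stays close to the dark side of the edge, whereas with the larger threshold the plain average smears the edge.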
It was remarked that this may provide perceptual benefits, and that the perceptual effects should be studied.
The report for the technique was somewhat preliminary.
No control map was used.
No cross-verification was provided.
Further study was encouraged.
17.5.1.1.1.1.1.1.7 JCTVC-D270 Low Complexity Parametric Adaptive Loop Filter [E. Maani, W. Liu, L. Dong (Sony)]
This contribution proposes a type of Adaptive Loop Filter (ALF) intended to remove coding errors and improve compression efficiency. In this approach, a set of fixed filters was used instead of the traditional trained Wiener filters. The encoder chooses the best (in an RD sense) filter for a Coding Unit (CU) and transmits the index of the filter to the decoder within the bitstream. The selection of the filter at the encoder uses a single pass processing for each LCU, thereby reportedly reducing the encoder delay significantly compared to traditional ALF approaches which were asserted to require one frame delay.
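The single-pass selection described above can be sketched as a rate-distortion search over a fixed filter bank. The example below is a simplified illustration under stated assumptions, not the proposed method: it uses a 1-D signal, a two-entry placeholder filter bank, sum of squared errors as the distortion, a fixed-length index cost, and hypothetical function names.

```python
def apply_filter(samples, taps):
    """Apply an odd-length 1-D FIR filter with edge replication."""
    r = len(taps) // 2
    n = len(samples)
    out = []
    for i in range(n):
        acc = 0.0
        for k, t in enumerate(taps):
            j = min(max(i + k - r, 0), n - 1)  # clamp at the borders
            acc += t * samples[j]
        out.append(acc)
    return out


def choose_filter(original, reconstructed, filter_bank, lam, index_bits):
    """Return the index of the filter with the lowest cost D + lam * R."""
    best_index = None
    best_cost = None
    for idx, taps in enumerate(filter_bank):
        filtered = apply_filter(reconstructed, taps)
        sse = sum((o - f) ** 2 for o, f in zip(original, filtered))
        cost = sse + lam * index_bits  # same rate for every index here
        if best_cost is None or cost < best_cost:
            best_index, best_cost = idx, cost
    return best_index


if __name__ == "__main__":
    bank = [[0.0, 1.0, 0.0], [0.25, 0.5, 0.25]]  # identity, mild low-pass
    original = [10, 10, 10, 10, 10]
    reconstructed = [10, 14, 6, 10, 10]
    print(choose_filter(original, reconstructed, bank, lam=1.0, index_bits=1))
```

Because each unit is decided from its own samples only, the choice can be made as soon as the unit is reconstructed, which is the property the contribution relies on to avoid a frame of encoder delay. With a fixed-length index the rate term does not affect the choice; it would matter if the indices were entropy coded.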
A participant asked whether the common conditions sequences had been used for the training of the filter tap values. The presenter said that additional sequences were used, but he was unsure whether any of the common conditions sequences had been used.
The results were only shown for the LC cases (in which ALF is disabled).
Results were reported as follows: AI LC: 2.3% improvement, RA LC: 2.5% improvement, LD LC: 3.9% improvement.
Further study was encouraged (e.g., in CE).
17.5.1.1.1.1.1.1.8 JCTVC-D406 Cross-verification of SONY's proposal on low complexity parametric adaptive loop filter (JCTVC-D270) [Minhua Zhou] (missing prior, uploaded Monday 17th, before meeting)
Cross-verification of JCTVC-D270.
This contribution also provided results for use with ALF enabled. A 3.1% loss was reported in the RA case, with a reported 20% reduction in decoding time.
It was reported that the proposed parametric adaptive loop filter algorithm has the potential of improving the coding efficiency and reducing the encoder-side complexity; however, its decoder complexity deserves further investigation. The large number of pre-defined Wiener filters (i.e. 32 9x9, 32 5x5), longer filter tap length (i.e. 41 tap and 25 tap), and higher precision of filter coefficients (i.e. 18 bits) could reportedly make the PALF more expensive than the HM1 ALF on the decoder side. The contributor recommended carrying out CEs on the ALF proposals, including the PALF, to seek a new ALF design that addresses both the encoder and decoder ALF complexity concerns raised in the various contributions.
17.5.1.1.1.1.1.1.9 JCTVC-D377 Development of HEVC deblocking filter [A. Norkin, K. Andersson, R. Sjöberg (Ericsson)]
This contribution proposed a deblocking filter developed from the current HEVC deblocking filter. The proposed filter reportedly outperforms the TMuC0.9 anchor for all six common test configurations, with an average BD BR reduction of 1.2%. The decoding time is approximately unaffected. The subjective performance of the proposal was reportedly similar to that of the anchor.
The proposal considers applying different modifications on each side of an edge.
The proposed changes reportedly do not modify the current deblocking framework.
Less filtering is conducted for smaller blocks (modifying only 1 sample on each side of the boundary). The smallest filtered block boundaries are still 8x8.
Some table values were also modified.
17.5.1.1.1.1.1.1.10 JCTVC-D192 Analysis on the interaction between deblocking filtering and in-loop filtering [T. Yamakage, S. Asaka, A. Tanizawa, T. Watanabe, T. Chujoh (Toshiba)]
This contribution reports some analysis of the relationship among various in-loop filtering techniques, in particular the interaction between deblocking filtering and Wiener-based in-loop filtering (QC_ALF). This item was one of the mandates of the In-loop and post-processing filtering ad-hoc group.
In this contribution, a parameter () that controls filter on/off and strength of filtering for deblocking was adjusted (thus weakening the filtering) for both the with-QC_ALF and without-QC_ALF cases. By comparing the BD BR gain of each case with the best performing values, QC_ALF reportedly showed additional BR reduction of 0.7% for the AI case, 0.5% for the RA case and 0.7% for the LD case under the HE coding condition.
The contribution advocated changing the values for deblocking filtering according to the usage of QC_ALF so that the default value can provide the best performance.
The presenter noted that excessive weakening of the filter might occur if we exclusively optimize the parameters for PSNR performance. The subjective effect of the filtering is very important.
17.5.1.1.1.1.1.1.11 JCTVC-D293 Toshiba : Crosscheck of Toshiba’s deblocking filter in JCTVC-D192 by MediaTek [Jicheng An, Xun Guo]
Cross-verification of JCTVC-D192.
17.5.1.1.1.1.1.1.12 JCTVC-D214 Reduction of operations in the critical path of the deblocking filter [Matthias Narroschke, Hisao Sasai, Thomas Wedi]
In the current HM, the deblocking of images is performed based on CUs. For the deblocking of a current coding unit, it was proposed to perform all required decisions based on the unfiltered signal of the current coding unit. This removes dependencies present in the current deblocking filter design. The removed dependencies increase the parallel processing possibilities for the deblocking filter. As a consequence, the sequential operations required in the critical path for the decision and filtering operations of the deblocking filter were reduced – reportedly by 30%. The BD-BR stays unchanged for the luma signal and approximately unchanged for the chroma signals. An average BR reduction of 0.2% is reported for the two chroma signals in the case of LD HE and of 0.1% in the case of LD LC (up to 0.5% for certain classes). The effect on the subjective quality was reportedly not noticeable.
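The effect of taking all decisions from the unfiltered signal can be shown with a 1-D toy example. The sketch below is not the HM deblocking filter or the proposed design: the decision rule (a step smaller than a threshold beta), the two-sample smoothing, and the function name are assumptions chosen only to show that the decisions for all edges can be evaluated up front and that the filtering result then does not depend on the order in which edges are visited.

```python
def deblock_edges(samples, edges, beta, order=None):
    """Filter block edges of a 1-D signal.

    edges : indices e where a block starts (edge lies between e-1 and e);
            edges are assumed to be at least two samples apart
    beta  : steps of at least this size are treated as real picture
            edges and left alone (hypothetical decision rule)
    order : optional visiting order for the filtering stage
    """
    unfiltered = list(samples)
    # Stage 1: every decision reads only the unfiltered snapshot, so all
    # of them could be evaluated in parallel.
    decisions = {
        e: abs(unfiltered[e] - unfiltered[e - 1]) < beta for e in edges
    }
    # Stage 2: filtering touches one sample on each side of the edge.
    out = list(samples)
    for e in (order if order is not None else edges):
        if decisions[e]:
            delta = (out[e] - out[e - 1]) / 4
            out[e - 1] += delta
            out[e] -= delta
    return out


if __name__ == "__main__":
    signal = [10, 10, 10, 10, 14, 14, 14, 14, 40, 40, 40, 40]
    forward = deblock_edges(signal, [4, 8], beta=8)
    backward = deblock_edges(signal, [4, 8], beta=8, order=[8, 4])
    print(forward)
    print(forward == backward)
```

In this toy case the small step at index 4 is smoothed and the large step at index 8 is preserved, and both visiting orders give the same output.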
17.5.1.1.1.1.1.1.13 JCTVC-D455 Cross-check results of the deblocking filter of Panasonic (JCTVC-D214) [Teruhiko Suzuki, Masaru Ikeda] (late registration Wednesday 26th after start of meeting, uploaded Thursday 27th, near the end of the meeting)
No detailed discussion of this cross-check contribution was considered necessary.
17.5.1.1.1.1.1.1.14 JCTVC-D263 Parallel deblocking filter [Masaru Ikeda, Junichi Tanaka, Teruhiko Suzuki]
This contribution proposed a modification of the deblocking filter design in TMuC-0.9. First, the filter process is made parallel by changing the order of horizontal filtering (across vertical edge) and vertical filtering (across horizontal edge). Second, the edge judgment of the block boundary for the deblocking filter is made parallel by changing the position of the lines which are used for the judgment.
The contribution proposes to effectively perform all horizontal filtering before any vertical filtering.
It proposes to modify which lines are used in the decision of where to apply the filtering, to avoid having the first stage of filtering affect the second stage of filtering.
With the modifications, it is not necessary to perform the DF processing on an LCU-by-LCU basis. The decoder would obtain the same result with different ordering of the DF processing.
A slight loss in BD BR performance was reported – about 0.0-0.2% average, somewhat more in Class E.
The loss is primarily due to the modification of which lines are used for the filtering decisions.
17.5.1.1.1.1.1.1.15 JCTVC-D454 Cross-check results of the parallel deblocking filter of Sony (JCTVC-D263) [Matthias Narroschke (Panasonic)] (late registration Wednesday 26th after start of meeting, uploaded Thursday 27th, near the end of the meeting)
No detailed discussion was considered necessary.
17.5.1.1.1.1.1.1.16 JCTVC-D434 Cross verification of Ericsson's proposal JCTVC-D377 by Nokia [K.Ugur (Nokia)] (late registration Saturday 22nd after start of meeting, uploaded Saturday 22nd, third day of meeting)
Cross-verification of JCTVC-D377.