6.2 CE2: Loop filters (30)
Contributions in this category were discussed Wednesday 11 July in Track B 1130–1300 and 1430–1900 (chaired by JRO).
JVET-K0022 CE2: Summary Report on In-Loop Filters [L. Zhang, K. Andersson, C.-Y. Chen]
This contribution provides a summary report of Core Experiment 2 on in-loop filters. Five categories are covered by this CE: 1) bilateral filter, 2) deblocking filters, 3) sample adaptive offset (SAO) filter, 4) adaptive loop filter (ALF), and 5) non-local filters.
Test conditions are specified for each category. The coding performance of each tool evaluated in CE2 is summarized in this contribution. In addition, answers to the questions raised in the CE description and crosschecking results are also integrated in this contribution.
CE2.1: Bilateral Filter
| Test# | Description | Document# |
|---|---|---|
| 2.1.1 | Same as BMS/JEM version | |
| 2.1.2 | Bilateral filter – spatial filter strength adjustment | JVET-K0231 |
| 2.1.3 | In-loop bilateral filter (also operated after block reconstruction, i.e. affecting subsequent intra prediction) | JVET-K0384 |
| Test# | # of filter taps | Sample difference calculation | Parallel friendly (each sample can be filtered independently from other samples) | Table size |
|---|---|---|---|---|
| 2.1.1 | 5 | 1×1 | Y | 2778 bytes |
| 2.1.2 | 5 | 1×1 | Y | 2778 bytes |
| 2.1.3 | 5 | 1×1 for intra, 3×3 for inter | Y | 30000 × 18 bits |
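For orientation, all the variants above use a small plus-shaped (5-tap) kernel whose weights depend on the sample differences between the centre and its neighbours. The sketch below is a generic bilateral weighting in floating point, not the CE2.1 designs themselves: the actual tools replace the exponentials with precomputed integer lookup tables (e.g. the 2778-byte tables listed above), and the `sigma_d`/`sigma_r` parameters here are purely illustrative.

```python
# Illustrative plus-shaped (5-tap) bilateral filter for one sample.
# NOT the exact CE2.1 design: the real tools use precomputed weight
# tables indexed by QP, block size and sample difference.
import math

def bilateral_5tap(img, x, y, sigma_d=1.0, sigma_r=10.0):
    """Filter sample (x, y) using its 4 plus-shaped neighbours.

    sigma_d: spatial strength, sigma_r: range (sample-difference)
    strength. Both are hypothetical parameters for illustration.
    """
    h, w = len(img), len(img[0])
    center = img[y][x]
    num, den = float(center), 1.0  # centre tap has weight 1
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h:
            s = img[ny][nx]
            # weight falls off with spatial distance and sample difference,
            # so flat areas are smoothed while strong edges are preserved
            wgt = math.exp(-(dx * dx + dy * dy) / (2 * sigma_d ** 2)
                           - (s - center) ** 2 / (2 * sigma_r ** 2))
            num += wgt * s
            den += wgt
    return num / den
```

The "1×1 vs. 3×3 sample difference" column above refers to how many samples are aggregated before the difference lookup; the sketch corresponds to the 1×1 case.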
Results vs. VTM:

| Test# | AI Y | AI U | AI V | AI EncT | AI DecT | RA Y | RA U | RA V | RA EncT | RA DecT | LDB Y | LDB U | LDB V | LDB EncT | LDB DecT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.1.1 | -0.11% | -0.09% | -0.12% | 108% | 107% | -0.41% | -0.14% | -0.29% | 105% | 102% | -0.37% | 0.26% | 0.31% | 105% | 101% |
| 2.1.2 | -0.10% | -0.10% | -0.15% | 103% | 109% | -0.40% | -0.18% | -0.29% | 102% | 103% | -0.37% | 0.12% | 0.14% | 101% | 101% |
| 2.1.3 | -0.29% | -0.08% | -0.15% | 105% | 109% | -0.79% | -0.24% | -0.28% | 105% | 107% | -0.63% | 0.30% | 0.42% | 105% | 109% |
Results vs. BMS:

| Test# | AI Y | AI U | AI V | AI EncT | AI DecT | RA Y | RA U | RA V | RA EncT | RA DecT | LDB Y | LDB U | LDB V | LDB EncT | LDB DecT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.1.1 | -0.37% | 0.26% | 0.31% | 105% | 104% | -0.41% | -0.38% | -0.47% | 107% | 102% | -0.51% | -0.12% | 0.08% | 102% | 102% |
| 2.1.2 | -0.26% | -0.39% | -0.39% | 103% | 104% | -0.44% | -0.45% | -0.52% | 104% | 100% | -0.46% | 0.06% | 0.06% | 103% | 101% |
| 2.1.3 | -0.33% | -0.25% | -0.24% | 105% | 106% | -0.54% | -0.27% | -0.28% | 107% | 103% | -0.62% | 0.20% | 0.38% | 106% | 104% |
The complexity impact should be further studied, in particular regarding:
- the pipelining aspects with small intra prediction blocks
- the table size for solution 2.1.3
- further simplification by re-using difference computations
Some of these aspects are reportedly addressed in CE-related documents.
Further study should be performed (continuation of the CE).
CE2.2: Deblocking filter

| Test# | Description | Document# |
|---|---|---|
| 2.2.1.1.a | Long deblocking filters and fixes (only luma) | JVET-K0307 |
| 2.2.1.1.b | Long deblocking filters and fixes (version which only applies fixes for luma, no long deblocking filter) | JVET-K0307 |
| 2.2.1.2 | Extended deblocking filter (only luma) | JVET-K0393 |
| 2.2.1.3 | Long deblocking filters (only luma) | JVET-K0232 |
| 2.2.1.4 | Tests on long deblocking (only long for luma; also filtering chroma when long filters are used for luma) | JVET-K0334 |
| 2.2.1.5 | Long-tap deblocking filter (only luma) | JVET-K0112 |
| 2.2.1.6.a | Long-tap deblocking filter for luma | JVET-K0152 |
| 2.2.1.6.b | Long-tap deblocking filter for chroma | JVET-K0152 |
| 2.2.1.6.c | Long-tap deblocking filter for luma and chroma | JVET-K0152 |
| 2.2.1.7 | Deblocking improvements for large CUs, both luma and chroma | JVET-K0315 |
| 2.2.2.1 | Deblocking filter with asymmetric weighting (weak filter modification) | JVET-K0129 |
| 2.2.2.2 | Luma-adaptive deblocking filter (QP offset change based on luma level) | JVET-K0386 |
An analysis of design aspects of the different proposals is included in the CE report (v3), but it was not fully agreed among participants. This should be resolved offline; the tables from section 2.3 are to be inserted when confirmed.
Performance vs. VTM (very similar for BMS):

| Test# | AI Y | AI U | AI V | AI EncT | AI DecT | RA Y | RA U | RA V | RA EncT | RA DecT | LDB Y | LDB U | LDB V | LDB EncT | LDB DecT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.2.1.1.a | -0.20% | 0.00% | 0.00% | 104%** | 103% | -0.17% | -0.05% | -0.01% | 103%** | 103% | -0.05% | 0.04% | -0.15% | 99%** | 102% |
| 2.2.1.1.b | -0.19% | 0.00% | 0.00% | 103%** | 101% | -0.12% | -0.01% | 0.06% | 104%** | 101% | -0.11% | 0.12% | 0.02% | 102%** | 100% |
| 2.2.1.2 | 0.32% | -0.01% | -0.01% | 100% | 102% | 0.14% | 0.01% | 0.03% | 100% | 101% | 0.04% | 0.08% | -0.01% | 100% | 101% |
| 2.2.1.3 | 0.00% | 0.00% | 0.00% | 100% | 104% | -0.02% | -0.02% | -0.01% | 100% | 103% | 0.01% | 0.06% | 0.05% | 100% | 103% |
| 2.2.1.4 | 0.01% | 0.01% | -0.01% | 103%** | 105% | -0.09% | -0.39% | -0.44% | 105%** | 104% | 0.02% | -0.35% | -0.47% | 112%** | 106% |
| 2.2.1.5 | 0.03% | 0.00% | 0.00% | 98% | 100% | -0.02% | -0.03% | 0.03% | 99% | 105% | 0.10% | 0.12% | 0.17% | 100% | 98% |
| 2.2.1.6.a | -0.03% | 0.00% | 0.00% | 105%** | 102% | -0.08% | -0.03% | 0.04% | 95%** | 101% | 0.01% | -0.04% | 0.03% | 104%** | 101% |
| 2.2.1.6.b | 0.00% | -1.06% | -0.83% | 100%** | 102% | -0.04% | -1.93% | -1.88% | 102%** | 100% | -0.07% | -1.65% | -1.79% | 104%** | 101% |
| 2.2.1.6.c | -0.04% | -1.06% | -0.83% | 113%** | 104% | -0.10% | -1.95% | -1.89% | 115%** | 103% | -0.04% | -1.66% | -1.76% | 124%** | 103% |
| 2.2.1.7 | -0.01% | 0.33% | 0.33% | 105% | 106% | -0.08% | 0.22% | 0.26% | 105% | 107% | 0.02% | 0.20% | 0.20% | 104% | 90% |
| 2.2.2.1 | -0.11% | 0.00% | 0.01% | 99% | 99% | 0.08% | 0.02% | 0.05% | 100% | 100% | 0.13% | 0.15% | 0.16% | 100% | 101% |
| 2.2.2.2 | 0.00% | 0.00% | 0.00% | 100% | 101% | 0.01% | -0.04% | 0.01% | 100% | 100% | 0.03% | 0.02% | 0.03% | 100% | 101% |
For deblocking, subjective viewing is needed; PSNR does not provide evidence in the case of deblocking.
Subjective viewing with QPs 32 and 37, compared to the VTM.
Candidate sequences:
UHD: Food Market, Campfire, Tango
HD: Ritual Dance, Kristen and Sara
RA configuration for UHD, LD for HD sequences.
From 2.2.1.1, both a and b should be tested (b is the fixes only, no long filter).
From 2.2.1.6, only c should be tested.
This was further discussed Saturday 14 July at 1715, after the viewing. A report was given as follows:
A decision was taken during the JVET meeting to perform an expert subjective assessment to evaluate the performance of the contributions to CE2.2.
The Test Chair was asked to design a test assessing the anchor (VTM 1.0) against all the received submissions.
The test was performed with the participation of 15 JVET experts (5 more participated as informal viewers).
The CE experts asked for a visual assessment comparing the anchor with each submission at UHD resolution, using three test sequences coded at two QP rates.
A total of 9 submissions were considered and labelled with the P-codes P10 to P19 (P18 was not considered because its data was not available), two QPs were used (QP32 and QP37), and three test sequences were encoded (Campfire, Market, Tango).
The test site was reasonably acceptable: a room isolated from visual and audible external noise; the lighting was dimmable from a full 100-candle peak down to complete darkness, and no light hit the surface of the monitor.
The monitor was a mid-to-low-end consumer 55" TV set; all local post-processing features were disabled, and light and brightness were set to the top values to allow better visibility of artefacts.
Five viewers were seated in front of the monitor at 2H, arranged within a 60° angle from the screen centre. An analysis of the collected data showed no significant difference between including or excluding the two viewers seated on the outer sides.
The A vs. B test was done by alternately presenting the anchor and the sequence under test on the screen; the presentation order was randomized so as to distribute content and quality equally across the viewing sessions.
The Basic Test Cell (BTC) of this test presented the label "A" on the screen followed by the anchor, then the letter "B" followed by a coded clip; a "vote N" message was then shown for four seconds to allow the viewers to fill out the scoring sheet. The viewers were told that the ordering was random, i.e. they did not know that "A" was the anchor.
With a total of 54 cases to assess and each BTC 25 seconds long, the total test time was 23'20"; this led to the design of two test sessions, each including the evaluation of 27 coded clips. To examine the behaviour of the viewers, six dummy cases were inserted in the test comparing (for each test sequence and QP) the anchor against itself.
All scores were collected on paper scoring sheets; the viewers were asked to score 1 when they considered sequence "A" (the anchor) better than "B" (the coded clip), 2 when "A" was worse than "B", and 0 when "A" and "B" were judged equal.
The MUP player was used together with a high-speed PC to provide a smooth flow of UHD content.
The coded video clips were all 300 frames long. This gave a viewing time of 10 seconds for the sequence "Campfire" and of only 5 seconds for the 60 fps sequences "FoodMarket4" and "Tango".
It was also noted that:
- in general the compression ratio was rather low and the overall quality was rather high,
- 5 seconds was too short a time to assess some impairments when watching "FoodMarket4" and "Tango".
For the above reasons it was decided (together with many CE2.2 experts) to decrease the frame rate of "FoodMarket4" and "Tango" from 60 to 30 fps, giving a sequence length of 10 seconds and allowing better detection of any possible difference between the anchor and the coded video clips.
The test sessions were conducted from 11:15 am to 2:20 pm on Saturday 14 July 2018, in the test room.
Results of the test are shown in the table below.
All the scores of "2" (i.e. B better than A) were converted to "−1".
Then all the scores for each test case were added up to obtain a quality index.
The values of the indexes ranged from 9 to −8 for the sequence "FoodMarket4", from 9 to −10 for the sequence "Campfire" and from 6 to −8 for the sequence "Tango".
The difficulty and the relatively low reliability of this testing procedure (for which reason the Test Chair had discouraged the experts from proceeding) is shown by the "trap" cases inserted in the test, in which the viewers were asked to compare the anchor against itself.
A score of 0 (or at least close to 0) was expected for all six "trap" cases; only two of the traps got scores of 0 and 1, while the other four got scores of −6 and −5 (two times).
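The aggregation described above can be sketched as follows. One assumption is made explicit here: the raw report is ambiguous about the remapping, so the sketch maps a "2" vote (B better) to −1 before summing, which is the only mapping consistent with the reported negative index values; under it, a positive index means the anchor was preferred.

```python
# Hypothetical sketch of the CE2.2 viewing-score aggregation.
# Vote encoding from the scoring sheets: 1 = A (anchor) better,
# 2 = B (coded clip) better, 0 = no visible difference.
# Assumption: a '2' vote is remapped to -1 before summing, so a
# per-case index ranges over [-N, +N] for N viewers, matching the
# reported ranges (e.g. +9 to -10 for "Campfire").

def quality_index(votes):
    """Sum the remapped votes for one test case (one clip, one QP)."""
    remap = {0: 0, 1: 1, 2: -1}
    return sum(remap[v] for v in votes)

# Example: 15 viewers, 9 preferring the anchor, 3 the clip, 3 undecided.
votes = [1] * 9 + [2] * 3 + [0] * 3
print(quality_index(votes))  # 9 - 3 = 6
```

On this reading, a "trap" case (anchor vs. itself) should produce an index near 0, which is why the observed trap scores of −6 and −5 indicate low reliability.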
The graphs with the results are reported below.
P01 was the result of comparing the anchor against itself, i.e. identical sequences. This result might be taken as the uncertainty of the test; e.g. in the case of Campfire, any result less than ±6 seems to be random.
The results of the visual test do not allow reasonable conclusions to be drawn. The test chair suggested repeating the test by the next meeting (or, better, before the meeting).
The CE description already mentioned that results should be prepared with QP42; however, most participants did not provide them. During the current meeting, some results with QP42 should be assessed together with Vittorio, to judge whether that would be a better operating point for comparison.
It was also mentioned that for the next round of viewing, 10 s sequences should be used rather than slowing 5 s sequences down.
Preferred sequences would be Campfire, Food Market, Park Running.
No conclusion possible – continue the CE, AND PLEASE READ THE CE DESCRIPTION CAREFULLY.
Decision (VTM/BMS): Apply the following fixes suggested in JVET-K0307, JVET-K0237, JVET-K0369, JVET-K0232 and JVET-K0315:
- Perform deblocking at boundaries of TUs with any size >= 64.
There is also a suggestion to avoid duplicate filtering at 4x4 CU boundaries by reducing the deblocking to only 1 sample at the boundary. The current VTM software (the draft text does not specify deblocking) simply applies filtering at CU boundaries (which can have a minimum size of 4x4), whereas the original deblocking of HEVC operated on 8x8 boundaries. It is not clear whether deblocking on a 4x4 grid is necessary, as HEVC already had 4x4 TUs and PUs and they were never deblocked. 4x4 deblocking quadruples the worst-case complexity and also impacts parallelism.
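To make the grid-granularity point concrete, the sketch below (illustrative only, not VTM code) counts the candidate internal edge lines in a picture area for a given minimum filtering grid: halving the grid spacing from 8 to 4 roughly doubles the candidate edge lines in each direction, which together with the tighter sample spacing drives the worst-case cost and parallelism concerns noted above.

```python
# Illustrative count of candidate deblocking edge lines in a W x H
# luma area for a given minimum filtering grid. Not VTM code; it only
# shows how grid granularity scales the worst-case number of edges.

def edge_lines(width, height, grid):
    """Internal vertical/horizontal edge lines on a 'grid' lattice."""
    vertical = (width // grid) - 1    # edges at x = grid, 2*grid, ...
    horizontal = (height // grid) - 1
    return vertical, horizontal

v8, h8 = edge_lines(64, 64, 8)   # HEVC-style 8x8 grid -> (7, 7)
v4, h4 = edge_lines(64, 64, 4)   # 4x4 grid -> (15, 15)
print(v8, h8, v4, h4)
```

In HEVC the edges on the 8x8 grid could additionally be filtered up to 3 samples deep on each side without neighbouring edges interacting; on a 4x4 grid that independence no longer holds, which is the parallelism issue referenced in the discussion.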
Decision (SW): It is suggested in this context to implement the VTM/BMS software like the original HEVC deblocking, filtering on an 8x8 grid as the minimum size. This was discussed in the JVET plenary on Sunday and agreed. Kenneth will provide the SW update.
Another suggested fix relates to BMS, where it is proposed to apply deblocking at subblock boundaries as well.
Further study is necessary on the latter two aspects; include this in CE2.