There was not enough feedback from CE participants to take action on an adoption recommendation for the above proposals.
5.6 CE6: Intra prediction improvement
5.6.1 Summary
5.6.1.1.1.1.1.1.1 JCTVC-F026 CE6: Summary report of Core Experiment on intra prediction improvements [A. Tabatabai, M. Budagavi, K. Chono, R. Joshi, A. Segall, H. Yu (CE coordinators)]
Intra prediction improvement core experiments were divided into 5 categories:
BUDI: The description of BUDI seems to be reasonably understandable for the first time. The proponents suggested selecting version 2a (saying this balances the tradeoffs for luma/chroma). Encoding time is reduced to 97%, decoding time is unchanged, and the bitrate reduction is 0.1-0.2% for the different 2x versions.
Question: Harmonization with MDIS? Is used for block sizes 8x8, 16x16 and 32x32.
Gain by BUDI is much lower than reported in the last meeting.
The reduction in encoding time (skipping the checking of planar mode) is not BUDI specific.
Conclusion: No action.
SDIP: Different harmonizations were investigated.
- RQT is roughly unchanged (same for rectangular TU) but a flag is embedded to support SDIP. This comes with same performance as un-harmonized version for HE and small gain (0.2%) for LC. Another method is case 2 which is less harmonized in introducing structures such as 8x32 in RQT -> case 1.
- Harmonization with mode-dependent DCT/DST.
- Harmonized version is using MDCS unchanged from HM 3.0 for the case of square blocks. This comes with same compression performance of non-harmonized version. Further results unveil that MDCS in combination with SDIP is a bit less well-performing (due to use of SDIP modes?).
- Harmonization of LM mode: 2 different versions, both with unchanged performance. Question: How is division handled if the sum of hor/ver block lengths is not a power of 2? A: LM used only for square blocks -> LM mode with 1st position.
- Harmonization with MDIS: No problem with normal MDIS. There is also a combination suggestion with an MDIS modification in CE6e.
- Combination with planar prediction gives small gain (0.1%).
- Combination with de-blocking causes losses (used now for both square and non-square blocks whereas original SDIP did not use de-blocking). Question: Is this useful for small partitions?
- (Note: There is another version provided by Qualcomm which is actually new and should not have been considered in this CE.)
- Combination with DC prediction filtering (no harmonization filtering) does retain gains of both methods.
The harmonization effort was judged to be satisfactory.
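The LM-mode question above (what happens when the sum of the horizontal and vertical block lengths is not a power of 2) comes down to simple arithmetic, sketched here in Python. This is only an illustration of that arithmetic, not the HM derivation: the function names are hypothetical, and it assumes that the normalization divides by a count equal to width plus height, which can be replaced by a right shift only when that count is a power of 2.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def shift_for_block(width: int, height: int):
    """Return the right-shift equivalent to dividing by (width + height),
    or None when that sum is not a power of two and a true division
    (or some workaround) would be needed.

    Assumption: the divisor is the sum of the block's horizontal and
    vertical lengths, as in the question recorded in the notes.
    """
    count = width + height
    if not is_power_of_two(count):
        return None
    return count.bit_length() - 1  # log2(count) for a power of two


# Square blocks: the sum 2N is always a power of two when N is.
square = {n: shift_for_block(n, n) for n in (4, 8, 16, 32)}

# Non-square SDIP shapes mentioned in the notes: sums 10, 17, 40, 34.
non_square = {
    (w, h): shift_for_block(w, h)
    for (w, h) in ((2, 8), (1, 16), (8, 32), (32, 2))
}
```

Under these assumptions every square block size yields a shift, while each of the listed non-square shapes does not, which is consistent with the recorded answer that LM is used only for square blocks.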
Concern was raised w.r.t. the throughput that can be achieved for the very small non-square blocks (2x8, 1x16). There is another doc (JCTVC-F343) that reports almost zero loss when 1x16 is disabled; however it is reported verbally that omitting 2x8 may cause more severe losses.
Question: Is the new mode 32x2 included in harmonized result? A: Yes. Some of the gain of the harmonized result may be due to that (it was not in the SDIP branch).
Gain seems to be lower than reported last meeting? Not completely true, as the higher-gain versions of SDIP included modes such as 1x4 which are not used any more. The lowest gains reported last meeting were 2.0% and 3.2% for the simplest HE and LC versions; now the gains are 1.4% and 2.1%.
Encoding time increase seems to be roughly 36% and 55% for the HE and LC cases. This is approximately double the increase in the previous reports (with HM2). The implementers explain that this is due to lack of encoder optimization in HM3. Reason: Method of generating intra prediction references.
Currently, numbers comparing SDIP branch versus HM3 are given in JCTVC-F026, and numbers comparing harmonized SDIP versus SDIP branch in JCTVC-F532. Exact numbers of gains and encoder/decoder run times for complete harmonized version vs. HM3.0 to be provided: 1.6%/2.6% BR red., 39%/56% enc. runtime increase, 3%/6% decoder runtime increase.
Conclusion: Further study: Performance when 1x16 and 2x8 TU sizes are disabled, potentially include 2x16 which is assumed to be (according to expert’s opinion) less problematic than 2x8 in terms of throughput. Furthermore, reduction of encoder runtime should be investigated.
Breakout (coordinated by W. Gao) will informally discuss the throughput issue. (See notes under JCTVC-F755.)
Preliminary decision (Monday): Adopt SDIP harmonized (RQT without 2x32/32x2, LM mode 1st position, conventional de-blocking not new method of JCTVC-F556, other items as above), without 1x16/16x1, provided that software implementation in HM3.3 is provided by Wed. 2359 with sufficient quality according to the guidelines, and appropriate WD text.
The WD text submitted was assessed to be of sufficient quality (confirmed by WJ Han Thu. 07-21 p.m.)
The Friday plenary assessed the results of the integration in HM 3.3, which indicated that the encoder runtime for HE was increased by 38%, for LC by 61%, whereas the compression gain became 0.2% lower than expected in both intra configurations. The preliminary decision was later revised – SDIP will be further studied in a CE (not included in the WD & HM).
An early-skip search method was noted in the encoder software for non-square cases. This was mentioned in v3 of JCTVC-E278, although not described in detail. It was noted that encoder search algorithms should be documented in proposals. A proper description would be needed for the HM text.
It was also pointed out that possibly a harmonization with non-square inter coding would be desirable.
JCTVC-F532 should be updated with the additional results HM3.0 vs. harmonized version, as reported elsewhere in the report.
A disable flag shall be implemented.
Regarding DCIM, see notes below (no action).