5.17.2.2 SDIP-related
5.17.2.3 Intra prediction complexity reduction
5.17.2.3.1.1.1.1.1 JCTVC-H0176 Border adaptive decimation for LM mode [C. Gisquet, E. François (Canon)]
In JCTVC-G358, additional luma-based modes were presented to improve coding efficiency, and in JCTVC-G126, simplifications to the OLS computations of the original luma-based modes were presented. The present contribution merges the two concepts to evaluate the cumulative improvements. It was reported that the technique brings up to 1.3% and 1.2% BD BR gains in AI HE and 1.9% and 2.0% in AI HE 10 bits, respectively, for U and V, while reducing the encoding runtime by up to 1% and the decoding runtime by 2%.
This introduces a second LM mode, but reduces LMS computation for both.
There was no support for this among the experts.
5.17.2.3.1.1.1.1.2 JCTVC-H0206 Crosscheck of JCTVC-H0176 [K. Sato (Sony)] [late]
5.17.2.3.1.1.1.1.3 JCTVC-H0244 Non-CE6: Intra-mode bypass parallelism (IMBP) [C. Rosewarne, M. Maeda (Canon)]
This contribution presented a method of encoding CU data for intra mode. The method aims to improve CU decoding performance by grouping data into chunks, which allows higher processing throughput and parallel processing with other CU data. Simulation results reportedly show no coding efficiency loss. The modification was implemented on top of HM-5.0 and resulted in degradations of 0.0% for AI HE, 0.0% for RA HE, 0.0% for LB HE, and 0.0% for LP HE configurations in the luma channel (relative to HM-5.0).
There was no support for this among the experts.
5.17.2.3.1.1.1.1.4 JCTVC-H0686 Crosscheck of JCTVC-H0244 on Intra-mode bypass parallelism [W.-J. Chien (Qualcomm)] [late]
5.17.2.3.1.1.1.1.5 JCTVC-H0464 Non-CE6a: Using averaged down-sampling reference pixels in LM parameter generation [L. Liu, Jianhua Zheng, P. Zhang (HiSilicon), G. Li, Nam Ling, Li Song (SCU)]
A different method of intra chroma from luma (LM mode) prediction was proposed. The simplification of intra LM mode reduces the number of multiplication operations needed to calculate the parameters alpha and beta by using averaged down-sampled reference luma and chroma samples. It was reported that, in all intra high efficiency tests, 1:2 down-sampling of 16x16 PUs gives 0.00% change for Y and Cb and 0.02% coding loss for Cr, while 1:4 down-sampling of 16x16 PUs combined with 1:2 down-sampling of 8x8 PUs gives 0.00% change for Y and 0.04% and 0.05% coding loss for Cb and Cr.
There was no support for this among the experts.
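The idea of deriving alpha and beta from averaged down-sampled border samples can be illustrated with a minimal sketch. It is not the proposed integer algorithm: it uses a floating-point least-squares fit for readability, and the function names and the sample values are hypothetical, chosen only for illustration. The point it shows is that halving the number of reference samples halves the per-sample multiplications in the two product sums.

```python
def average_downsample(samples, factor):
    """Replace each group of `factor` neighbouring samples by its rounded mean."""
    return [
        (sum(samples[i:i + factor]) + factor // 2) // factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


def lm_parameters(luma, chroma):
    """Least-squares fit chroma ~ alpha * luma + beta (floating point for clarity)."""
    n = len(luma)
    sum_l = sum(luma)
    sum_c = sum(chroma)
    sum_ll = sum(l * l for l in luma)                  # one multiplication per sample
    sum_lc = sum(l * c for l, c in zip(luma, chroma))  # one more per sample
    denom = n * sum_ll - sum_l * sum_l
    if denom == 0:
        # Flat luma border: fall back to a constant (mean chroma) predictor.
        return 0.0, sum_c / n
    alpha = (n * sum_lc - sum_l * sum_c) / denom
    beta = (sum_c - alpha * sum_l) / n
    return alpha, beta


# Hypothetical border samples in which chroma is exactly 0.5 * luma + 10.
luma_ref = [100, 104, 108, 112, 116, 120, 124, 128]
chroma_ref = [60, 62, 64, 66, 68, 70, 72, 74]

full = lm_parameters(luma_ref, chroma_ref)
half = lm_parameters(average_downsample(luma_ref, 2),
                     average_downsample(chroma_ref, 2))
```

With this perfectly linear toy data, both the full and the 1:2 down-sampled borders yield the same parameters; on real content the two fits differ slightly, which is consistent with the small chroma losses reported above.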
5.17.2.3.1.1.1.1.6 JCTVC-H0490 Non-CE6a: Reduce the look-up table entries for LM mode calculation [L. Liu (HiSilicon), G. Li (SCU)]
This contribution suggests reducing the look-up table used in LM mode calculation from 63 entries to 56 entries, or further to 32 entries. It can be derived that entries 1 to 7 are theoretically never used in the calculation of alpha, and entries 1 to 31 are seldom used in any of the test sequences. Simulations were reported to show that the performance is the same as the HM5.0 anchor in all intra high efficiency tests.
For information – the contribution was noted.
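Why the low table indices can be rare is easier to see with a generic sketch of table-based division. This is not the HM-5.0 derivation of alpha: the 15-bit reciprocal precision, the 6-bit normalisation, the direct-division fallback, and all names are assumptions made for illustration. If the denominator is shifted down to 6 significant bits before the look-up, every denominator of 32 or more maps to an index in the range 32 to 63, so a 32-entry table covers those cases.

```python
RECIP_BITS = 15  # assumed table precision, chosen for illustration only

# Trimmed table: rounded reciprocals for indices 32..63 only (32 entries).
RECIP = [((1 << RECIP_BITS) + i // 2) // i for i in range(32, 64)]


def table_index(den):
    """Return (index, shift) after normalising `den` to at most 6 significant bits."""
    shift = max(den.bit_length() - 6, 0)
    return den >> shift, shift


def approx_divide(num, den):
    """Approximate num / den with the trimmed reciprocal table."""
    if den <= 0:
        raise ValueError("denominator must be positive")
    idx, shift = table_index(den)
    if idx < 32:
        # Small denominators never reach the table in this sketch; divide directly.
        return num // den
    return (num * RECIP[idx - 32]) >> (RECIP_BITS + shift)
```

In this sketch only denominators below 32 would need the dropped entries, which is one way a table can shrink without changing results on typical content.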
5.17.2.3.1.1.1.1.7 JCTVC-H0624 Cross-check of JCTVC-H0464 and JCTVC-H0490 [D. Hoang (Zenverge)] [late]
5.17.2.3.1.1.1.1.8 JCTVC-H0491 Non-CE6a: Remove the large multiplier for LM mode calculation [L. Liu, Yongbing Lin (HiSilicon), G. Li (SCU)]
This contribution proposed a change to the chroma from luma intra prediction (LM mode). The proposed method replaces the 13-bit (AI-HE) or 15-bit (AI-HE10) multipliers with 8-bit or 10-bit multipliers through algorithm optimization. The BD bit rate for this optimization was reported as: AI-HE: Y: 0.0%, U: 0.1%, V: 0.0% and AI-HE10: Y: 0.0%, U: 0.1%, V: 0.0%.
Several experts expressed interest, but the detailed description was not available in the first version. Further checking seemed necessary (opinions of T. Hellman, M. Budhagavi, and the crosschecker were requested). The inspection revealed that a second multiplier is still retained, so the current benefit is minimal. It was nevertheless emphasized that the basic direction is very useful, and further investigation was recommended on whether the second multiplier could be removed as well.
5.17.2.3.1.1.1.1.9 JCTVC-H0549 Enabling Sub-LCU level parallel decoding [S. Kumar, H.-C. Chuang, S. Xiao, X. Wang, W.-J. Chien, M. Karczewicz (Qualcomm)]
In HEVC (HM5.0), parallel decoding options include parallel decoding of different LCU rows. Finer-grained parallelism is not possible in the current scheme due to dependencies among blocks inside an LCU. For example, it is not possible to split an LCU row into a top half and a bottom half for parallel decoding. This contribution identifies the minimal set of dependencies (in intra prediction) that prevents this granularity of parallelism (assuming parsing and MV prediction can be done ahead of time). Simulation results reportedly show that breaking this minimal set of dependencies has no obvious impact on coding performance. Practical advantages were asserted to include better performance of any cache subsystem, arising from the reduced distance between different in-flight processing units and the consequent improved locality of reference.
Parallelism is achieved by partially disallowing prediction across LCU boundaries. It was asked whether this causes artefacts. Proponents say this is not the case, but only refer to PSNR testing.
There was no support for this among the experts.
5.17.2.3.1.1.1.1.10 JCTVC-H0604 Cross-check of JCTVC-H0549 [T. Davies (Cisco)] [late]