7.5.1.1.1 JVT-Y030 (Prop 2.2/3.1) [H. Kimata, S. Shimizu (NTT)] Exper results on MVC down-sampled inter-view pred
This contribution reports experimental results for down-sampled inter-view prediction for MVC, as was requested at the last JVT meeting. The down-sampled inter-view prediction scheme was asserted to have been proposed for mixed spatial resolution views to reduce bit rate and complexity for multi-view video coding. This document also reports performance analysis of coding efficiency in terms of the generation method of the down-sampled images.
A low-resolution view is predicted from a high-resolution view for inter-view prediction.
Downsampling is performed by just dropping samples without using any anti-alias filtering (this enables use of the same memory area for storing the down-sampled picture as well as the higher-resolution picture) – this scheme was reported not to harm the coding efficiency of inter-view prediction for the low-resolution video.
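The memory-sharing property noted above can be illustrated with a minimal sketch. It assumes 2:1 decimation in each dimension and uses NumPy strided slicing as a stand-in for the picture buffer; the function name `drop_sample_view` and the 8x8 test picture are hypothetical and not taken from the contribution.

```python
import numpy as np

def drop_sample_view(picture: np.ndarray, factor: int = 2) -> np.ndarray:
    """Keep every `factor`-th sample in each dimension, with no filtering.

    Basic slicing returns a strided view, so no second buffer is allocated.
    """
    return picture[::factor, ::factor]

# Hypothetical 8x8 "high-resolution" picture.
full = np.arange(8 * 8, dtype=np.uint8).reshape(8, 8)
low = drop_sample_view(full)

# The low-resolution picture addresses the same memory as the full picture.
shares = np.shares_memory(full, low)
```

Because nothing is filtered, each low-resolution sample is an existing sample of the high-resolution picture, which is what makes a shared storage area possible.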
Comparison was provided to mixed-resolution coding without inter-view prediction across resolutions.
Presentation? Uploaded later.
A 47% bit rate savings was reported for the low-resolution views, relative to not using inter-view prediction across resolutions.
Remark: That seems like an unexpectedly high degree of benefit.
The bit rate saving for the low-resolution part (every second view), compared to separate MVC encoding of the low- and high-resolution views, was reported to be 47% on average. Assuming that the low-resolution part, in the case of separate MVC encoding, would still be 1/3 of the overall bit rate, this would still be more than a 15% overall bit rate saving. The loss due to not performing filtering when generating the down-sampled reference was reported to be very low (<1 dB on average).
Note: This saving appears unreasonably high, given that inter-view prediction is typically applied less often than temporal prediction.
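The scaling from the low-resolution saving to an overall figure is simple arithmetic, sketched here. The 1/3 share is the assumption stated in the discussion, not a measured value, the high-resolution bit rate is assumed to be unchanged, and the function name is hypothetical.

```python
def overall_saving(low_res_saving: float, low_res_share: float) -> float:
    """Overall bit rate saving when only the low-resolution part shrinks.

    low_res_saving: fractional saving on the low-resolution views.
    low_res_share: fraction of the total bit rate spent on those views
                   in the separately coded reference case.
    """
    return low_res_saving * low_res_share

# 47% saving on views assumed to consume 1/3 of the total bit rate.
overall = overall_saving(0.47, 1.0 / 3.0)
print(f"{overall:.1%}")  # prints 15.7%
```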
Note: The application where every-other camera has a different resolution may be very specific.
Question: What is the relationship to JVT-Y052 and JVT-Y054?
Remark: It might be more interesting to know the amount of improvement from the technique, multiplied by the fraction of the total bit rate that is consumed by the lower-resolution views – e.g., reduce the 47% by a factor of 3.
7.5.1.1.2 JVT-Y054 (Prop 2.2.1/3.1) [Y. Chen, S. Liu, Y.-K. Wang, M. M. Hannuksela, H. Li (Nokia)] Low complexity asymmetric MVC
“Asymmetric MVC” refers to coding of two views of a stereoscopic video source with different qualities. In this proposal, a new asymmetric coding technique was proposed for MVC by enabling inter-picture prediction from a higher resolution picture to a lower resolution picture, when the current picture is one fourth of the size of the inter-view reference picture. It was asserted that the proposed scheme has low complexity and requires a smaller decoded picture buffer size than downsampled inter-view prediction. (The preceding sentence may be unclear.) Simulation results reportedly showed that it provides coding efficiency comparable to that of downsampled inter-view prediction, and both reportedly outperform the “simulcast MVC” coding scheme, wherein two sets of views with different resolutions are independently coded into two MVC bitstreams.
This is the same basic idea as JVT-Y030, but with a 2-tap anti-alias filtering applied in some cases when downsampling.
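To make the contrast with plain sample dropping concrete, the following is a minimal sketch of 2:1 downsampling with a 2-tap filter. The tap values (an equal-weight rounded average applied horizontally and then vertically) are an assumption made for illustration only; the actual filter in JVT-Y054, and the cases in which it is applied, are not described here. Even picture dimensions are assumed.

```python
import numpy as np

def downsample_2tap(picture: np.ndarray) -> np.ndarray:
    """2:1 downsampling with an assumed [1, 1]/2 filter in each direction."""
    p = picture.astype(np.uint16)              # widen to avoid uint8 overflow
    h = (p[:, 0::2] + p[:, 1::2] + 1) >> 1     # horizontal 2-tap average
    v = (h[0::2, :] + h[1::2, :] + 1) >> 1     # vertical 2-tap average
    return v.astype(np.uint8)

# A 2x2 checkerboard: the highest spatial frequency the picture can hold.
checker = np.array([[0, 255], [255, 0]], dtype=np.uint8)
filtered = downsample_2tap(checker)   # mid-gray sample
dropped = checker[::2, ::2]           # keeps only the top-left sample
```

On this input the filtered result is mid-gray while sample dropping keeps a single black sample, which is the aliasing behavior that the filter is meant to reduce.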
Remark: Is it acceptable to have different views with different resolutions? For stereo, perhaps. Stereo seems to be the main argument for support of this.
The primary motivator is complexity reduction (relative to what – exactly? – relative to only coding high-resolution views). As far as implementation effort is concerned, this increases effort.
Remark: So far we have desired to focus on achieving similar quality in all views.
7.5.1.1.3 JVT-Y082 (Info 2.0) [C. Fehn, P. Kauff, A. Smolic (HHI), S. Cho, N. Hur, J. Kim, S.-I. Lee (ETRI)] Asymmetric coding of stereoscopic video for mobile 3DTV
This contribution described the use of reference picture resampling (RPR) for “asymmetric coding” with inter-view prediction for the compression of stereoscopic video sequences. It reported that downsampling one of the views before coding with inter-view prediction can lead to additional coding gains compared to the simulcast case. It was asserted that according to earlier psychological studies of stereoscopic vision, such processing should not reduce the overall visual quality of the resulting three-dimensional (3D) percept.
The suggestion is to consider RPR as a low-overhead way to enable stereoscopic applications. The contribution reports that a bit rate overhead of 10-30% is required when using quarter-resolution downsampling (e.g., 320x240 QVGA and 160x120 QQVGA).
The contribution referred to JVT-W094 for binocular suppression theory.
The contribution suggested further study.
Remark: The motivation seems to be stereo rather than the general N-view case.
Disposition: Plan to continue AHG to further investigate.
7.5.2 Reduced-resolution update (RRU) for MVC
7.5.2.1.1 JVT-Y052 (Prop 2.2/3.1) [S. Cho, N. Hur, J. Kim, S.-I. Lee (ETRI), C. Fehn (HHI)] Resid-downsampled stereoscopic video for mobile 3DTV
This contribution proposes a stereoscopic video coding scheme using a residual-downsampling algorithm that is applied on a macroblock basis. The proposed video coding algorithm was reported to be more efficient at lower bit rates for mobile 3DTV services such as T-DMB (Terrestrial Digital Multimedia Broadcasting). To support the proposed coding algorithm in the JMVM, the contribution also proposes residual-downsampling related syntax in the sequence_parameter_mvc_extension( ) syntax structure.
The contribution uses an “IPPP” coding structure.
The design is conceptually the same as reduced resolution update (RRU) (e.g., as in H.263 Annex Q or MPEG-4 Part 2 RRU) applied to the entire sequence.
For four two-view 320x240 sequences (not common conditions sequences), around a 0.5 or 0.6 dB average gain was reported.
The residual was encoded at half spatial resolution.
Remark: The PSNR values of up to 45 dB (as compared to the original full resolution) appear unreasonably high.
Remark: How is it possible to achieve 40+ dB (and even a benefit in that range) without any ability to encode high-frequency residual differences?
Presentation? Uploaded later.
Cross-verification? Apparently not.
The design was based on the Baseline profile, so no 8x8 transform block size capability was assumed.
The design couples the RRU invocation to the segmentation used for prediction of each region. If a large block size is applied for prediction, RRU is invoked (regardless of whether high frequency residuals are present or not). If a small block size is applied for prediction, RRU is not invoked.
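The coupling described above can be sketched as a simple decision rule. The notes do not say which partition sizes count as "large", so the threshold below (`LARGE_PARTITION_MIN_SAMPLES`, set to 16x8 samples) is purely hypothetical, as are the `Partition` type and the `uses_rru` function name.

```python
from dataclasses import dataclass

# Hypothetical threshold: the report does not define "large" block sizes.
LARGE_PARTITION_MIN_SAMPLES = 16 * 8

@dataclass(frozen=True)
class Partition:
    width: int
    height: int

def uses_rru(partition: Partition,
             min_samples: int = LARGE_PARTITION_MIN_SAMPLES) -> bool:
    """RRU follows the prediction partition size only.

    The residual content is never inspected, so a large partition
    invokes RRU even when high-frequency residual is present.
    """
    return partition.width * partition.height >= min_samples

decisions = {
    "16x16": uses_rru(Partition(16, 16)),  # large partition: RRU invoked
    "4x4": uses_rru(Partition(4, 4)),      # small partition: RRU not invoked
}
```

The function takes no residual input at all, which makes the dependence on prediction segmentation alone explicit.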
Remark: Tying the RRU decision to the transform size flag (or adding modes or something like that rather than coupling directly to prediction block size) would seem to make more sense.
Remark: Or perhaps just enabling use of an 8x8 transform (perhaps with discarding of high-frequency components – possibly as an encoder-only decision without special syntax) rather than using RRU at all would be a better design.
Response from proponent: This also switches from quarter-sample motion to half-sample motion whenever the block size is relatively large.
Remark: Such switching could be applied separately from RRU concepts – would it provide gain? – we speculate not.
Further study would be needed to determine whether there is value in this, and how much.
Disposition: Include in the same AHG with RPR.