6.10.5.1.1.1.1.1.1 JCTVC-F063 Wavefront Parallel Processing with Tiles [C.-W. Hsu, C.-Y. Tsai, Y.-W. Huang, S. Lei (MediaTek)]
This contribution extended the concept of wavefront parallel processing (WPP) in JCTVC-E196 and applied it to the tiles of JCTVC-E408 for parallel processing. In experiments designed by the contributor (not using the common test conditions), the proposed WPP with tiles was reported to achieve BD-rate savings relative to tiles and WPP, respectively, of 0.5% and 0.6% with two parallel threads, 0.8% and 0.5% with three parallel threads, and 1.2% and 0.5% with four parallel threads. In addition, the number of causality checks of the proposed WPP with tiles was reportedly less than 16%, 22%, and 33% of that of WPP for class A (2560x1600), class B (1920x1080), and class E (1280x720), respectively.
Further study was encouraged.
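The wavefront dependency that underlies both WPP and WPP inside tiles can be sketched as follows. This is a minimal Python model under stated assumptions, not the contributor's implementation: one time step per LCU, unlimited threads, the usual WPP rule that an LCU waits for its left neighbour and for the upper-right LCU of the line above, and tiles modeled as independent, equal-width tile columns that each run their own wavefront. The function names and the 64x64 LCU size used in the example are hypothetical.

```python
from typing import List


def wavefront_steps(width: int, height: int) -> List[List[int]]:
    """Earliest time step at which each LCU (x, y) of a width x height region can be coded.

    Assumptions (illustrative only): one LCU per step, unlimited threads, and
    the WPP rule that LCU (x, y) waits for its left neighbour (x - 1, y) and for
    the upper-right LCU (x + 1, y - 1), clamped at the right edge of the region.
    """
    step = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ready = 0
            if x > 0:
                ready = max(ready, step[y][x - 1] + 1)
            if y > 0:
                ready = max(ready, step[y - 1][min(x + 1, width - 1)] + 1)
            step[y][x] = ready
    return step


def picture_steps_with_tiles(width: int, height: int, tile_cols: int) -> int:
    """Total steps when the picture is split into tile_cols tile columns.

    Hypothetical model: tiles are independent, run concurrently, and each tile
    runs its own wavefront, so the slowest tile determines the total.
    """
    if not 1 <= tile_cols <= width:
        raise ValueError("tile_cols must be between 1 and the picture width in LCUs")
    bounds = [i * width // tile_cols for i in range(tile_cols + 1)]
    total = 0
    for left, right in zip(bounds, bounds[1:]):
        tile = wavefront_steps(right - left, height)
        total = max(total, tile[-1][-1] + 1)  # last LCU finishes last
    return total


if __name__ == "__main__":
    # Class A (2560x1600) with hypothetical 64x64 LCUs -> 40x25 LCUs.
    print(picture_steps_with_tiles(40, 25, 1))  # 88
    print(picture_steps_with_tiles(40, 25, 2))  # 68
```

In this model each LCU line starts two steps after the line above, so a region W LCUs wide and H lines high needs W + 2H - 2 steps; splitting into narrower tile columns shortens each wavefront. The model does not capture thread limits or the causality-check counts reported above.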
6.10.5.1.1.1.1.1.2 JCTVC-F450 Cross-check of MediaTek’s contribution on wavefront parallel processing with tiles (JCTVC-F063) [A. Fuldseth (Cisco), M. Zhou (TI)]
6.10.5.1.1.1.1.1.3 JCTVC-F274 Wavefront parallel processing for HEVC encoding and decoding [C. Gordon, F. Henry, S. Pateux (Orange FT)]
This contribution describes a method for parallel encoding and decoding of video using HEVC, in which LCU lines are processed in parallel by encoding/decoding threads. To limit performance degradation, a wavefront processing pattern ensures that spatial and motion vector dependencies are fully preserved, as recommended in JCTVC-D073. In addition, a single additional probability buffer is used to synchronize the CABAC probabilities at the start of each LCU line with those after the second LCU of the line above. The average BD-rate degradation is +0.7% (all-intra +0.2%, random access +0.7%, low delay +1.2%). The contribution also describes a parallel software implementation of the HM3.0 decoder using wavefront parallel processing: on the larger sequences (classes A, B, E), the average decoding time relative to the anchor (sequential HM3.0) is reportedly 55% with 2 decoding threads and 33% with 4 decoding threads. Finally, it describes a combination of wavefront parallel processing with the tiles of JCTVC-E408.
Previously proposed in JCTVC-E196.
Entry point indicators are proposed to be added in slice syntax.
Enables parallelism in both encoder and decoder.
Transcoding is possible between parallel and non-parallel entropy coding (without full decode).
Draft text was provided. It was reported that software can be provided on request.
A participant asked about the complexity penalty for a non-parallel decoder; the answer was that it amounts to storing one set of CABAC contexts per LCU line, carried vertically to the next line.
A participant commented that the memory bandwidth requirements on the encoder and decoder may be high.
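The single extra CABAC buffer needed by a non-parallel decoder can be illustrated with a toy model. In this Python sketch, which is an assumption and not the HM3.0 software, the CABAC context state is replaced by a small dictionary of counters (`initial_contexts` and `code_lcu` are hypothetical stand-ins). LCU lines are processed in raster order; the state is saved after the second LCU of each line and restored at the start of the next line. Because each line finishes before the next one starts, one saved buffer is enough.

```python
from typing import Dict, List

# Hypothetical stand-in for the CABAC context state. Real CABAC keeps
# per-syntax-element probability states; simple counters are used here.
Contexts = Dict[str, int]


def initial_contexts() -> Contexts:
    return {"split_flag": 0, "mvd": 0}


def code_lcu(ctx: Contexts, x: int, y: int) -> None:
    """Stand-in for coding one LCU: adapts the context state in place."""
    ctx["split_flag"] += 1
    ctx["mvd"] += x + y


def sequential_wpp(width: int, height: int) -> List[Contexts]:
    """Code LCU lines one after another while keeping WPP context inheritance.

    Returns the context state used at the start of each LCU line.
    """
    saved = initial_contexts()          # the single additional buffer
    line_starts: List[Contexts] = []
    for y in range(height):
        ctx = dict(saved)               # inherit from the line above (or initial state)
        line_starts.append(dict(ctx))   # record a copy of this line's start state
        for x in range(width):
            code_lcu(ctx, x, y)
            if x == min(1, width - 1):  # after the second LCU (or the only one)
                saved = dict(ctx)       # overwritten once per line
    return line_starts


if __name__ == "__main__":
    for y, start in enumerate(sequential_wpp(3, 3)):
        print(y, start)
```

A parallel decoder applies the same rule by letting the thread for line y+1 start once line y has finished its second LCU; the sequential loop reproduces the same line-start states with no extra threads.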
6.10.5.1.1.1.1.1.4 JCTVC-F486 Cross-check - Wavefront Parallel Processing for HEVC Encoding and Decoding (JCTVC-F274) [A. Henkel (Fraunhofer HHI)] [late upload 07-04]
Tested "experiment 1" on the impact on RD performance.
6.10.5.1.1.1.1.1.5 JCTVC-F527 Cross-check of JCTVC-F274 from Orange Labs [V. Drugeon (Panasonic)] [late upload 07-08]
Studied the software and identified a couple of small issues. Tested "experiment 3" on wavefront processing inside of tiles.
6.10.5.1.1.1.1.1.6 JCTVC-F588 Cross-check report of JCTVC-F274 wavefront parallel processing [M. Coban] [upload 07-14 after opening]
Did not study the software. Tested "experiment 2" on multithreaded decoding.
6.10.5.1.1.1.1.1.7 JCTVC-F275 Wavefront and CABAC Flush: Different degrees of parallelism without transcoding [G. Clare, F. Henry, S. Pateux (Orange FT)]
In JCTVC-F274, wavefront parallel processing (WPP) is proposed for parallel encoding and decoding. WPP consists of synchronizing the CABAC probabilities of the first LCU in each line from the second LCU of the line above, while maintaining inter-block dependencies. Parallel encoding and decoding are reported to have an average BD-rate degradation of +0.7% (intra +0.2%, random access +0.7%, low delay +1.2%). In WPP, converting a compressed video from one level of parallelism to another requires entropy transcoding. The present contribution proposes to combine WPP with a flush and re-initialization of the internal CABAC state variables at the end of each line of LCUs. Each line of LCUs is thus compressed into a “chunk” of bits that is independent of the desired level of parallelism. Consequently, it is asserted that a bitstream can be converted from one level of parallelism to another either by re-ordering the chunks in the bitstream or by providing chunk entry points in SEI messages. Using this approach, the reported average BD-rate degradation is +0.9% (intra +0.2%, random access +0.9%, low delay +1.6%).
This proposal is a combination of wavefront parallel processing with end-of-line CABAC flush.
Considers transcoding between different levels of parallelism.
In this proposal CABAC is flushed at the end of each LCU row (without byte aligning), producing a separate chunk of data for each row.
These chunks can be rearranged to support different levels of parallel processing.
It was suggested that byte aligning might be desirable.
But it is necessary to keep track of where each chunk begins in the bitstream, which is not included in the bit costs reported here. This is suggested to be stored in an SEI message (that is not delivered to the terminating end-point decoder).
It is not actually necessary to represent the chunk sizes/locations with extra data: if removed, they can be re-identified by parsing the bitstream. But it is necessary to at least know how much parallelism is in the current form of the bitstream (and one would probably have at least the N entry points, although that is not strictly necessary).
The decoder needs (only) N entry point indicators for N parallel structured decoding.
A non-parallel encoder could deliver a parallelizable bitstream (and vice versa); more generally, a bitstream from any M-parallel encoder can be converted for N-parallel decoding.
The CABAC flush impact is reportedly about 0.2%.
Remark: This is especially useful when the parallelism of the decoder is known to the encoder/server, but such knowledge may not be common in practice.
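The chunk re-ordering idea can be sketched with one possible layout. The layout is an assumption for illustration and is not taken from JCTVC-F275: with N decoding threads, thread k handles LCU rows k, k+N, k+2N, ..., and the chunks of each thread are stored contiguously, so N entry points (bit offsets) suffice. Chunks are modeled as bit strings because they are not byte aligned. `arrange_chunks` and `recover_rows` are hypothetical names, and `recover_rows` takes the chunk lengths as input, whereas a real converter could re-identify chunk boundaries by parsing.

```python
from typing import List, Tuple


def arrange_chunks(chunks: List[str], n_threads: int) -> Tuple[str, List[int]]:
    """Concatenate per-row chunks so that thread k's rows (k, k+n, k+2n, ...) are contiguous.

    chunks[y] is the bit string for LCU row y (no byte alignment).
    Returns the rearranged bitstream and the n_threads entry points, in bits.
    """
    parts: List[str] = []
    entry_points: List[int] = []
    offset = 0
    for k in range(n_threads):
        entry_points.append(offset)          # where thread k starts reading
        for y in range(k, len(chunks), n_threads):
            parts.append(chunks[y])
            offset += len(chunks[y])
    return "".join(parts), entry_points


def recover_rows(stream: str, chunk_lengths: List[int], n_threads: int) -> List[str]:
    """Inverse of arrange_chunks: rebuild the per-row chunks in raster order."""
    rows = [""] * len(chunk_lengths)
    pos = 0
    for k in range(n_threads):
        for y in range(k, len(chunk_lengths), n_threads):
            rows[y] = stream[pos:pos + chunk_lengths[y]]
            pos += chunk_lengths[y]
    if pos != len(stream):
        raise ValueError("chunk lengths do not match the bitstream length")
    return rows


if __name__ == "__main__":
    chunks = ["10", "111", "0", "0101", "1"]       # toy chunks for 5 LCU rows
    lengths = [len(c) for c in chunks]
    two_par, ep2 = arrange_chunks(chunks, 2)       # layout for 2 threads
    # Convert the 2-thread layout to a 3-thread layout by re-ordering chunks only.
    three_par, ep3 = arrange_chunks(recover_rows(two_par, lengths, 2), 3)
    print(two_par, ep2)
    print(three_par, ep3)
```

Converting between any M-parallel and N-parallel layout is then `arrange_chunks(recover_rows(stream, lengths, m), n)`, which only moves chunks and never re-runs the entropy coder.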
6.10.5.1.1.1.1.1.8 JCTVC-F351 Cross-check report on Orange-FT wavefront parallel processing (JCTVC-F275) [Hendry, J. Park, S. Park (JCTVC-F275)] [late upload 07-11]
6.10.5.1.1.1.1.1.9 Discussion
Existing:
Proposed:
- Tiles (JCTVC-F335)
  - With and without some tile-crossing coding dependencies (flag in JCTVC-F335)
  - Tiles with entry point identifiers for decoder parallelism (JCTVC-F594)
- Wavefronts (JCTVC-F274)
  - Wavefronts with end-of-row CABAC flush (JCTVC-F275)
No text was (unfortunately) available for the combination of tiles and wavefronts. However, the relationship was considered to be understood.
Decision: Adopt both tiles and wavefronts, with the sub-bullet variants (not in common conditions).