5.13.1 High-level systems usage of bitstreams
5.13.1.1 JCTVC-H0423 Indication of the temporal structure of coded video sequences [M. M. Hannuksela, S. M. Gopalakrishna (Nokia)]
The contribution discusses the topic of describing the temporal structure of coded video sequences, which was proposed earlier in documents JCTVC-D200 and JCTVC-E279. A primary use case for knowing the temporal structure a priori is to assist trans-raters and bitstream extractors in pruning the coded video sequence.
By referring to the reference picture set index, the proposal is asserted to provide a more accurate description of the temporal structure than what was proposed in JCTVC-D200 and JCTVC-E279. Furthermore, by specifying the temporal structure in the sequence parameter set, it is reportedly possible to omit certain syntax elements from the slice header and hence reduce the bit count of the slice header.
The proposal can be considered to consist of the following ingredients:
- The short-term reference picture set syntax structure is moved from PPS to SPS.
- A "structure of pictures" (SOP) description syntax structure is added in the SPS. The syntax structure describes a SOP in decoding order.
- A coded video sequence can be described as part of the SPS with reference to described SOPs.
- When the SOP description is used, the slice header includes a SOP index referring to a particular SOP description and a picture index within the SOP structure. Certain syntax elements of the slice header are excluded when the SOP description is used.
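To illustrate how a SOP index and a picture index could stand in for the omitted slice header syntax, the following Python sketch models an SPS-level SOP table. All class, field, and function names are hypothetical, and the POC derivation (a per-SOP POC period plus a per-picture offset, with a SOP instance counter maintained by the decoder) is an assumption made for illustration, not the syntax of JCTVC-H0423. The second function shows the a-priori pruning use case: a bitstream extractor can decide which pictures to keep for a target temporal layer from the SOP description alone.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SopEntry:
    """One picture of a hypothetical SOP description, listed in decoding order."""
    slice_type: str   # "I", "P" or "B"
    poc_offset: int   # POC relative to the start of the SOP instance (assumption)
    temporal_id: int  # temporal layer of the picture
    rps_idx: int      # index into the short-term RPS list carried in the SPS


@dataclass(frozen=True)
class SopDescription:
    entries: Tuple[SopEntry, ...]
    poc_period: int   # POC advance between consecutive SOP instances (assumption)


def derive_slice_params(sops: List[SopDescription], sop_idx: int,
                        pic_idx: int, sop_instance: int) -> Tuple[str, int, int]:
    """Recover values that would otherwise be sent in the slice header.

    sop_idx and pic_idx are the two indices carried in the slice header;
    sop_instance is a counter the decoder is assumed to maintain itself.
    """
    sop = sops[sop_idx]
    entry = sop.entries[pic_idx]
    poc = sop_instance * sop.poc_period + entry.poc_offset
    return entry.slice_type, poc, entry.rps_idx


def pics_to_keep(sop: SopDescription, max_tid: int) -> List[int]:
    """Picture indices an extractor would keep for a target temporal layer."""
    return [i for i, e in enumerate(sop.entries) if e.temporal_id <= max_tid]


# Dyadic hierarchy of four pictures (POC offsets 4, 2, 1, 3 in decoding order).
dyadic = SopDescription(
    entries=(
        SopEntry("P", 4, 0, 0),
        SopEntry("B", 2, 1, 1),
        SopEntry("B", 1, 2, 2),
        SopEntry("B", 3, 2, 3),
    ),
    poc_period=4,
)

print(derive_slice_params([dyadic], sop_idx=0, pic_idx=2, sop_instance=1))  # ('B', 5, 2)
print(pics_to_keep(dyadic, max_tid=1))  # [0, 1]
```

Because slice_type, POC, and RPS are all looked up from the table, the slice header in this model only needs the two indices, which is where the claimed bit savings would come from.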
It is asserted that up to about 3% bit rate savings may be possible at a selected low bit rate and high frame rate relating to the test case "dyadic and nested temporal scalability for high frame rate" in the common test conditions specified in JCTVC-G1036, while typical bit rate savings are expected to be an order of magnitude smaller.
It was noted that, when used, having such a structure established in the SPS would prohibit adaptive structuring decisions within the CVS. If it is sent at the picture level (e.g. in SEI), an encoder could make dynamic changes.
A participant commented that a possible approach could be to have syntax in the SPS which establishes a default behaviour that can be overridden at the PPS or slice level.
A suggestion was to put this syntax in the PPS instead of in the SPS.
The proposal would remove from the slice header the slice_type, pic_order_cnt_lsb, and all reference picture set syntax. Long-term reference pictures would not be supported when the SOP syntax is used.
The proposal seemed interesting to the group.
Some modifications were made in a revision of the contribution, which put the syntax in the PPS to enable greater flexibility.
In subsequent discussion, placing the multiple SOP descriptions and RPSs in the SPS was suggested.
In subsequent discussion, it was commented that the value of this information may be limited (e.g. for trans-raters and bitstream extractors) if it is not guaranteed that the pictures in the bitstream will use it.
Just establishing descriptive information about the characteristics of a CVS could be accomplished with VUI, SEI, or file/system-level metadata, whereas this proposal also uses those characteristics to improve coding efficiency. The coding efficiency improvement, at least in typical uses, seemed minimal or negligible.
The proposed degree of rearchitecting of normative high-level syntax for this purpose seemed unnecessary. Some metadata approach (perhaps as VUI with a presence flag) was suggested as the appropriate way forward.
Moving the signalling of multiple RPSs from the PPS to the SPS was suggested, which seemed like a good idea. Decision: Agreed. (J. Samuelsson volunteered the editing effort.)
The metadata approach was worked on offline, and a revision was uploaded (-v4). It proposed an SEI message syntax. Decision: Adopted.
An additional suggestion was made to include the display orientation SEI message, as is currently planned for inclusion in AVC by both parent bodies (see prior JCTVC-G079, VCEG-AR12, MPEG m23499, and ISO/IEC 14496-10/PDAM 1). Decision: Adopted.
5.13.1.2 JCTVC-H0496 On bitstreams starting with CRA pictures [Y.-K. Wang, Y. Chen, M. Karczewicz (Qualcomm)]
This document re-proposed the proposal in JCTVC-G319. In the current HEVC design, "clean random access" (CRA) pictures are identified by a new NAL unit type. Random access at a CRA picture by devices with conforming decoders is anticipated to be very common. However, a bitstream starting with a CRA picture is considered non-conforming, so a formally conforming decoder might not be able to handle such bitstreams properly.
It was proposed to specify that a bitstream starting with a CRA picture could be conforming. Such a conforming bitstream may or may not contain "leading pictures" associated with the CRA picture. A leading picture associated with a CRA picture is a coded picture that follows the CRA picture in decoding order but precedes the CRA picture in output order. The proposed normative changes include: 1) skipping the decoding and output of the leading pictures associated with the starting CRA picture, when present; and 2) HRD modifications to specify bitstream conformance conditions to be fulfilled by a conforming bitstream starting with a CRA picture, regardless of whether the leading pictures associated with the CRA picture are present.
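The leading-picture definition above can be expressed directly in terms of decoding order and POC. The following Python sketch (hypothetical names; NAL unit types reduced to string labels) illustrates proposed change 1) only: it drops the leading pictures associated with a starting CRA picture, i.e. the pictures that follow it in decoding order, precede it in output order, and come before the next CRA or IDR picture. The HRD aspects are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Picture:
    poc: int
    nal_type: str  # hypothetical labels: "CRA", "IDR" or "OTHER"


def pictures_to_decode(pics: List[Picture]) -> List[Picture]:
    """Skip leading pictures of the starting CRA picture (input in decoding order)."""
    if not pics or pics[0].nal_type != "CRA":
        return list(pics)  # not a CRA start: nothing to skip
    cra_poc = pics[0].poc
    kept = [pics[0]]
    associated = True  # still within the pictures associated with the starting CRA
    for p in pics[1:]:
        if p.nal_type in ("CRA", "IDR"):
            associated = False  # later pictures belong to a later random access point
        if associated and p.poc < cra_poc:
            continue  # leading picture: after the CRA in decoding order, before it in output order
        kept.append(p)
    return kept


stream = [Picture(8, "CRA"), Picture(4, "OTHER"), Picture(2, "OTHER"),
          Picture(6, "OTHER"), Picture(16, "OTHER"), Picture(12, "OTHER")]
print([p.poc for p in pictures_to_decode(stream)])  # [8, 16, 12]
```

Pictures with POC 4, 2, and 6 are discarded without any attempt to decode them, since their references may precede the CRA picture in decoding order and are therefore unavailable after random access.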
It was commented that bitstream conformance may be somewhat tricky to specify. One suggestion was to specify a reference decoding process to be applied to the "leading pictures", as in the prior recovery point SEI message description, to enforce bitstream conformance while that process operates, and then to discard the resulting decoded "leading pictures" without outputting them.
A modification of the buffering period SEI message was proposed, to assist with the HRD CPB flow modification that is introduced if the "leading pictures" are removed from the bitstream.
It was asserted that there should be no problem with the case where the "leading pictures" are not removed from the bitstream.
It was first agreed that, in spirit, we would like to adopt this, except for the modification of the buffering period SEI message.
It was asked whether, if we allow a bitstream to start with a CRA picture, there is still a purpose for IDR pictures. Further study of that question was encouraged.
It was noted and agreed that a specification of SPS activation from a CRA picture is needed, and that POC needs to be established for the CRA picture.
It was suggested that "non-existing" is a better word than "missing".
Decision: Adopt (modified version, with SPS activation, with setting of MSBs of POC to zero for the starting CRA picture, and with a note to say that the aspects relating to "non-existing" pictures are unnecessary for obtaining correct output and are only specified for purposes of establishing bitstream conformance – i.e., decoders may simply discard all leading pictures without any attempt to decode them).
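Setting the POC MSBs to zero for the starting CRA picture means that its POC equals its POC LSB value, and subsequent pictures derive their MSBs relative to it. The following Python sketch uses the AVC-style wrap-around rule for the POC MSBs. As a simplification, every picture serves as the predecessor of the next one, whereas the actual specification restricts which previous picture is used; the function and parameter names are hypothetical.

```python
from typing import List


def derive_poc(lsb: int, prev_lsb: int, prev_msb: int,
               max_poc_lsb: int, is_starting_cra: bool) -> int:
    """POC = MSB + LSB, with the MSB forced to 0 for the starting CRA picture."""
    if is_starting_cra:
        msb = 0
    elif lsb < prev_lsb and prev_lsb - lsb >= max_poc_lsb // 2:
        msb = prev_msb + max_poc_lsb  # LSB wrapped around upwards
    elif lsb > prev_lsb and lsb - prev_lsb > max_poc_lsb // 2:
        msb = prev_msb - max_poc_lsb  # LSB wrapped around downwards
    else:
        msb = prev_msb
    return msb + lsb


def decode_pocs(lsbs: List[int], max_poc_lsb: int) -> List[int]:
    """Derive POCs for a bitstream whose first picture is a CRA picture."""
    pocs = []
    prev_lsb = prev_msb = 0
    for i, lsb in enumerate(lsbs):
        poc = derive_poc(lsb, prev_lsb, prev_msb, max_poc_lsb, is_starting_cra=(i == 0))
        pocs.append(poc)
        prev_lsb, prev_msb = lsb, poc - lsb  # simplification: every picture is the predecessor
    return pocs


print(decode_pocs([12, 0, 14, 4], max_poc_lsb=16))  # [12, 16, 14, 20]
```

The absolute POC values may differ from those the encoder used, but only POC differences matter for output order and reference picture identification, so setting the starting MSBs to zero is sufficient.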
5.13.1.3 JCTVC-H0567 AHG15: Syntax controlled output process [J. Samuelsson, R. Sjöberg (Ericsson)]
This document proposes to add two syntax elements to the slice header to control the output process. A new output process was proposed to replace the "bumping" process, and it was claimed that the proposed output process, controlled by syntax elements in the slice header, reduces unnecessary picture output delay. Further, it was claimed that the syntax-controlled output process is better suited for temporal scalability, since the encoder would not need separate decoded picture buffer (DPB) models for each temporal layer and since network nodes and decoders could rely on the proposed temporal layer switching information. The document describes how the flexible and efficient DPB usage provided by the "bumping" process is maintained by the proposed output process. A restriction was proposed to prohibit those cases that make the DPB status inconsistent among different temporal layers and that violate the concept of temporal layer switching points. It was also proposed that max_dec_frame_buffering and max_num_ref_frames can have different values for different temporal layers. The document also contains a proposed sequence parameter set (SPS) flag that, it was asserted, could be used to indicate the most common case of output process usage, in which both output delay and DPB size are minimized. With the proposed SPS flag, the total bit rate increase for the common test conditions was reported to be one bit per sequence.
The contribution proposes having different max_dec_frame_buffering and max_num_ref_frames (if it exists – see below) for different temporal layers. It was commented that num_reorder_frames and max_latency_increase should be treated this way too. Decision: Agreed (to be expressed in a loop in SPS syntax).
It was commented that the bumping process should use num_reorder_frames and max_latency_increase, not just max_dec_frame_buffering. Decision: Agreed (depending on potential later decisions on related issues).
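The per-temporal-layer parameters and the bumping conditions discussed above can be sketched as follows. This is a simplified, hypothetical model: the SPS loop is represented as a list indexed by temporal layer, and the latency limit assumes MaxLatencyPictures = num_reorder_frames + max_latency_increase, with a value of 0 meaning that no limit is expressed (an assumption about the draft semantics). Pictures are output in POC order while too many pictures wait for output, a waiting picture has reached the latency limit, or the DPB is full.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpsLayerParams:
    """Hypothetical SPS values for one temporal layer (one iteration of the SPS loop)."""
    max_dec_frame_buffering: int
    num_reorder_frames: int
    max_latency_increase: int  # assumed: 0 means no latency limit


@dataclass
class DpbPicture:
    poc: int
    needed_for_output: bool = True
    used_for_reference: bool = True
    latency_count: int = 0  # pictures decoded since this one (assumed maintained by the caller)


def bumping_needed(dpb: List[DpbPicture], prm: SpsLayerParams) -> bool:
    waiting = [p for p in dpb if p.needed_for_output]
    if not waiting:
        return False
    if len(waiting) > prm.num_reorder_frames:
        return True  # more pictures waiting than reordering allows
    if prm.max_latency_increase > 0:
        max_latency = prm.num_reorder_frames + prm.max_latency_increase
        if any(p.latency_count >= max_latency for p in waiting):
            return True  # a waiting picture has been delayed too long
    return len(dpb) >= prm.max_dec_frame_buffering  # no free slot for the next picture


def bump(dpb: List[DpbPicture], prm: SpsLayerParams) -> List[int]:
    """Output pictures in POC order while bumping is needed; return the output POCs."""
    output = []
    while bumping_needed(dpb, prm):
        pic = min((p for p in dpb if p.needed_for_output), key=lambda p: p.poc)
        pic.needed_for_output = False
        output.append(pic.poc)
        # Remove pictures that are neither waiting for output nor used for reference.
        dpb[:] = [p for p in dpb if p.needed_for_output or p.used_for_reference]
    return output


# Parameters indexed by the highest temporal layer being decoded.
layers = [SpsLayerParams(2, 0, 0), SpsLayerParams(4, 1, 0), SpsLayerParams(6, 2, 0)]


def make_dpb() -> List[DpbPicture]:
    return [DpbPicture(8), DpbPicture(4), DpbPicture(2)]


print(bump(make_dpb(), layers[1]))  # [2, 4]
print(bump(make_dpb(), layers[0]))  # [2, 4, 8]
```

With per-layer values, a decoder operating only on the base layer applies the tighter layer-0 limits, while one decoding all layers uses the larger values.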
It was commented that we may no longer need max_num_ref_frames at all, since we no longer have a sliding window reference picture marking process. Decision: Remove it.
The contribution proposed to define a variable called OutputDistance for each picture (with its value sent at the slice level), and to introduce an SPS-level flag that, when set to 1, sets that variable to 0 for all pictures and causes the slice-level syntax to be omitted. It was reported that the "always 0" case would be sufficient for most usage (and would be sufficient for our common test conditions, CTC).
It was asked whether the new syntax could be pushed up to the APS or PPS level, rather than residing in the slice header, and this seemed feasible.
It was remarked that this is only for output-order-based decoding, as it would not be needed for timestamp-based decoding operation.
Some participants expressed the view that such picture-level output control (using slice/APS/PPS-level syntax) did not seem really necessary and would be "overkill" (when considering the recent improvement in other high-level syntax for handling latency).
Further study was encouraged.
5.13.1.4 JCTVC-H0500 Bit depth of output pictures [Y. Chen, Y.-K. Wang, X. Wang, I. S. Chong, M. Karczewicz (Qualcomm)]
This document re-proposed the proposal in JCTVC-G328.
In the current HEVC design, the decoded picture and the output picture always have the bit depth signalled in the bitstream (e.g., 10 bits), regardless of the bit depth of the hypothetical encoder input video (e.g., 8 bits). This document proposes that: 1) a conforming decoder would output the decoded video sequence at a target output bit depth (signalled in the bitstream); and 2) if the output bit depth is lower than the bit depth of the decoded pictures, a picture that is never used, or is no longer used, for reference could be converted to the lower bit depth immediately, which it is asserted would reduce memory consumption.
The proposal was intended to make the DPB capacity for picture storage dependent on the bit depth of the stored pictures. However, this part of the text was missing from the proposal. The practicality of an actual decoder taking advantage of this was questioned.
The decoder would be required to perform bit depth downconversion by truncation. It was remarked that prohibiting more sophisticated downconversion may be undesirable.
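The truncation requirement and the memory argument can be made concrete with a short Python sketch. The rounding variant is one example of the "more sophisticated downconversion" mentioned above and is not part of the proposal. The storage estimate assumes 4:2:0 sampling and a decoder that stores each sample in whole bytes (8-bit samples in one byte, 10-bit samples in two); that storage layout is an implementation assumption, not something the proposal specified.

```python
def downconvert_truncate(sample: int, in_depth: int, out_depth: int) -> int:
    """Bit depth reduction by truncation (dropping LSBs), as the proposal required."""
    return sample >> (in_depth - out_depth)


def downconvert_round(sample: int, in_depth: int, out_depth: int) -> int:
    """Alternative (not in the proposal): round to nearest, then clip to the output range."""
    shift = in_depth - out_depth
    if shift == 0:
        return sample
    return min((sample + (1 << (shift - 1))) >> shift, (1 << out_depth) - 1)


def picture_bytes_420(width: int, height: int, bit_depth: int) -> int:
    """Storage for one 4:2:0 picture, assuming each sample occupies whole bytes."""
    samples = width * height + 2 * (width // 2) * (height // 2)
    return samples * ((bit_depth + 7) // 8)


print(downconvert_truncate(514, 10, 8), downconvert_round(514, 10, 8))  # 128 129
print(picture_bytes_420(1920, 1080, 10), picture_bytes_420(1920, 1080, 8))  # 6220800 3110400
```

Under this storage assumption, converting a non-reference 1080p picture from 10 to 8 bits halves its DPB footprint. A decoder that packs 10-bit samples more tightly would gain less, which is one reason the practical benefit was questioned.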
Further study was encouraged.