Thank you. We intend to use this as a baseline for future revisions.
m34961 Study of cross-sample variants in 23001-12
On the question of whether it is ever feasible to share data between variants: some experts felt that sharing looked unlikely, as they thought it would require either that the exact same original image data occurs in multiple frames (in which case a compressor would remove the redundancy) or (even less likely) that we get accidental byte-equality after compression. However, this rests on the misunderstanding that all watermarking is ‘imperceptible’. There are forensic cases that actually alter the pixel data (e.g. the first line becomes ‘noise’, like teletext; different textures; minor elements such as stars move; and so on). Under those circumstances, sharing might be useful. It may also be desirable to have a data pool that has ‘too much’ data, to obfuscate what is actually being used by any given variant for any given sample (the constructors are encrypted, so the pointers are not easily found).
The challenge is how to enable this data sharing at the media level while preserving the ability to adjust the file: re-fragment, de-fragment, insert or remove other data (e.g. a copyright notice) and so on. We really would like to keep this a ‘media level’ operation if we can, and stay away from the ‘transport level’ of boxes, fragments, and so on. The box in the fragment at least means that we’re not ‘global’ (so copyright notices have no effect), but de/re-fragmentation are still hard.
We could consider two relative structures that the file format does offer: (a) we can use one or more time-parallel tracks containing data pools, each lasting for as long as the sharing lasts; (b) we can use sample-relative numbers (i.e. “draw data from the sample that’s two before the one containing me”); or both (hint tracks use both). We have to confess that people found the sample-relative numbering slightly fragile in hint tracks.
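The combination of (a) and (b) can be sketched as follows. This is only an illustration, not text from 23001-12 or from the hint track definitions: the function names, the list of decode times, and the error handling are all assumptions made for the example. It finds the pool sample that is time-parallel to the current sample, then applies a signed sample-number offset.

```python
from bisect import bisect_right


def time_parallel_index(pool_decode_times, decode_time):
    """Return the index of the pool sample whose decode interval covers decode_time.

    pool_decode_times is an ascending list of the pool track's sample decode
    times (a hypothetical in-memory form of the track's timing tables).
    """
    index = bisect_right(pool_decode_times, decode_time) - 1
    if index < 0:
        raise LookupError("no pool sample at or before this decode time")
    return index


def resolve_pool_sample(pool_decode_times, decode_time, sample_offset=0):
    """Apply a signed sample-number offset to the time-parallel pool sample."""
    index = time_parallel_index(pool_decode_times, decode_time) + sample_offset
    if not 0 <= index < len(pool_decode_times):
        # One place where the relative form can break if samples are edited.
        raise LookupError("relative offset points outside the pool track")
    return index


pool_times = [0, 1000, 2000, 3000]
print(resolve_pool_sample(pool_times, 2500))      # time-parallel sample: 2
print(resolve_pool_sample(pool_times, 2500, -2))  # "two before": 0
```

The explicit range check is there because a relative offset silently depends on the neighbouring samples staying where they are, which may be part of why the numbering was found fragile.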
Tracks are, of course, slightly ‘heavier’ data structures, but on the other hand there can be several of them and (if this is OK from the point of view of obfuscation) they could be served separately (if track references are not needed).
We could use the sample auxiliary information (a) not at all; (b) merely to say ‘variants happen here’; (c) as an index of possible places to find constructors; or (d) to carry the constructors themselves. Incidentally, both hint tracks and aggregators locate their data source using the decode-time-parallel sample plus a sample-number-relative offset.
We sketch the overall design, using separate tracks, ‘some’ sample auxiliary information, track references and so on.
use track references for at least the data pool, and time-parallel tracks
we are not sure if we need the +/-N samples in the constructors
we are not sure how much is in aux info and how much is in the parallel track structures (if we can keep the aux info under 255 bytes, that solves a separate question we have, by the way).
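One possible reading of the sketched design is shown below, purely as an illustration. Everything in it is hypothetical: the `PoolConstructor` fields, the 10-byte packed layout, the one-byte count prefix, and the choice to carry constructors in the aux info (option (d)) are assumptions for the example and not agreed design. Encryption of the constructors is left out. The 255-byte bound is the one mentioned above, treated here as inclusive.

```python
import struct
from dataclasses import dataclass

MAX_AUX_INFO_BYTES = 255  # bound taken from the notes; inclusive by assumption


@dataclass(frozen=True)
class PoolConstructor:
    track_ref_index: int  # which referenced pool track to read (hypothetical)
    sample_offset: int    # signed +/-N; 0 means the time-parallel sample
    byte_offset: int      # start of the byte range inside the pool sample
    length: int           # number of bytes to copy

    def pack(self) -> bytes:
        # Hypothetical big-endian layout: u8, s8, u32, u32 (10 bytes).
        return struct.pack(">BbII", self.track_ref_index, self.sample_offset,
                           self.byte_offset, self.length)


def pack_aux_info(constructors):
    """Serialize constructors as aux info and enforce the size bound."""
    payload = bytes([len(constructors)]) + b"".join(c.pack() for c in constructors)
    if len(payload) > MAX_AUX_INFO_BYTES:
        raise ValueError("aux info exceeds %d bytes" % MAX_AUX_INFO_BYTES)
    return payload


def gather(constructors, pools, parallel_index):
    """Build sample data from pool tracks.

    pools maps a track reference index to that pool track's list of samples;
    parallel_index is the time-parallel sample number in the pool tracks.
    """
    out = bytearray()
    for c in constructors:
        samples = pools[c.track_ref_index]
        index = parallel_index + c.sample_offset
        if not 0 <= index < len(samples):
            raise LookupError("constructor points outside the pool track")
        out += samples[index][c.byte_offset:c.byte_offset + c.length]
    return bytes(out)


pools = {1: [b"abcdefgh", b"ijklmnop", b"qrstuvwx"]}
ctors = [PoolConstructor(1, 0, 2, 3), PoolConstructor(1, -1, 0, 2)]
print(gather(ctors, pools, 1))     # b'klmab'
print(len(pack_aux_info(ctors)))   # 21
```

With this particular layout at most 25 constructors fit per sample; whether that is enough, and whether the +/-N field is needed at all, are exactly the open points listed above.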
The authors agreed to provide a new text, with a view to issuing the DIS.