3D Audio
3D Audio Phase 2
Clemens Par, Swissaudec, presented
m33272 | Status update on the Ecma standardization project ‘Scalable Sparse Spatial Sound System (S5)’ | Clemens Par
The contribution reports on a new Ecma standard that is to be finalized this April 18th. The zip archive of this contribution contains a number of other documents. A general theme is that there is a market need for 3D Audio technology that can operate at Phase 2 bitrates (i.e. 128, 96, 64, 48 kb/s), and that the technology in the Ecma S5 standard appears to be able to operate at such bitrates and provide a quality that will be acceptable to the marketplace.
Juergen Herre, FhG-IIS/AudioLabs
m33206 | Thoughts on MPEG-H 3DA Phase 2 | Juergen Herre, Jan Plogsties
The contribution reviewed the vision and accomplishments of 3D Audio Phase 1, and the timeline for Phase 2 submissions and evaluations. The presenter noted that Phase 1 can deliver excellent quality, and that Phase 2 should be able to deliver quality that is well received in the marketplace, e.g. above 40 MUSHRA points. In addition, Phase 2 technology should deliver performance that is better than existing MPEG technology.
The presenter concluded that
-
There should be a minimum performance requirement, e.g. a MUSHRA score, for consideration for standardization.
-
“Benchmarks” should be included in the Phase 2 evaluation test such that Audio experts can understand the performance of submissions with respect to e.g. current MPEG technology.
The Chair stated that MPEG Audio has never brought “bad” (e.g. below MUSHRA 40) technology to the market. While HE-AAC v2 and USAC may provide less than transparent audio quality at their lowest bitrate range, they have been extremely successful in the marketplace.
Furthermore, the Chair agreed that supplying MPEG “benchmark” technology should be viewed as independent of making a submission to the Phase 2 call.
Clemens Par, Swissaudec, noted that Phase 2 has been viewed as a Core Experiment with a common timeline (as defined by the Call). In this respect, benchmarking of RM vs. RM+CE should be encouraged. On the other hand, extension to lower bitrates may be viewed as a CE with a “new functionality” such that a comparison to RM is not applicable.
The Chair hoped that many of the issues raised by this contribution can be addressed by writing a detailed test logistics document, e.g. the progression of m33259 (see below).
Werner Oomen, Philips, presented
m33133 | Extending Phase I to lower bitrates | Aki Härmä, Werner de Bruijn, Werner Oomen, Robert Brondijk
The contribution noted that 3D Audio supports a wide range of bit rates, and this will be extended in Phase 2. Furthermore, SAOC is a powerful tool for addressing the highest levels of compression. Combining SAOC and individual objects can be done with Phase 1 3D Audio, but may result in a rate/distortion curve that is not very smooth, as shown here (© Werner Oomen):
The contribution proposes to “smooth out” the rate/distortion curve by “allocating” objects to SAOC objects or individually coded objects on a QMF tile by QMF tile basis, e.g. as shown here (© Werner Oomen):
SAOC already permits this structure, using the mechanism of residual coding. The current 3D Audio specification already supports this structure, but lacks the additional syntax to support manipulating the linked SAOC/individual object in the sound scene.
The contribution proposes to extend the 3D Audio objectConfig() with additional syntax that will support interactivity. The presenter envisioned that such interactivity could encompass both object movement and also e.g. dialog enhancement.
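As a purely illustrative sketch of the tile-wise allocation idea, one could imagine selecting, per QMF time/frequency tile, whether an object is coded individually or folded into the SAOC downmix. The energy-based selection rule and the tile-count budget below are assumptions for illustration, not part of m33133:

```python
import numpy as np

def allocate_tiles(tile_energy, n_discrete_tiles):
    """Toy per-tile allocation for one audio object (illustrative only).

    tile_energy      : 2D array [n_frames, n_bands] of QMF tile energies
    n_discrete_tiles : how many tiles the assumed bit budget allows to be
                       coded as an individual object; the rest go into
                       the SAOC downmix
    Returns a boolean map of the same shape: True = individual, False = SAOC.
    """
    flat = tile_energy.ravel()
    # Pick the (here: highest-energy) tiles for individual coding;
    # everything else is coded parametrically via SAOC.
    order = np.argsort(flat)[::-1]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[order[:n_discrete_tiles]] = True
    return mask.reshape(tile_energy.shape)

# Example: 32 QMF frames x 64 bands, budget for 200 individually coded tiles
energy = np.abs(np.random.randn(32, 64)) ** 2
alloc = allocate_tiles(energy, n_discrete_tiles=200)
print(alloc.sum(), "tiles coded individually,", (~alloc).sum(), "via SAOC")
```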
The Chair noted that some experts would like more information about the envisioned performance of the technology, specifically as a means to “smooth out” the rate/distortion curve, and encouraged the presenter to do a pencil-and-paper analysis of how this might work.
This will continue to be discussed.
Later in the week Werner Oomen, Philips, gave a presentation offering additional information on this proposal.
The critical crossover frequency was investigated for the transition of objects between SCE and an SAOC downmix, which, with respect to SBR, proved to be functional at 7250 Hz:
Philips proposed to include the considerations of m33133 into the current metadata work item.
It was the consensus of the ASG to include a generalization of this technology in the CD.
Phase 2
The Chair presented
m33259 | DRAFT Listening Test Logistics for 3D Audio Phase 2 | Schuyler Quackenbush
The contribution is a draft of the output document that will organize the listening tests that will be used in part to evaluate responses to the Phase 2 Call. It was discussed, and experts suggested to add appropriate Anchors and Benchmarks to the set of systems under test to
-
Stabilize the subjective scale
-
Verify that submissions have performance better than current MPEG technology
This will continue to be discussed
Later in the week, this document was reviewed, edited and accepted by the consensus of the Audio subgroup as an output document.
Binauralization
Taegyu Lee, Yonsei, presented
The contribution notes that in the previous AdHoc period the FD binauralization engine was integrated into the WD text and Ref Sw. In the course of that effort, bugs were discovered in the software, and this contribution reports on those bugs and recommends fixes:
-
Error in curve fitting that specifies subband filter length. Bitstreams and decoded waveforms are not changed.
-
Error in RT20 calculation. Fix causes new bitstreams and decoded waveforms.
-
Delay compensation for QMF Tapped Delay Line (QTDL). Fix causes new bitstreams and decoded waveforms.
-
Propagation time in Sparse Frequency Reverberator (SFR). Fix causes new bitstreams and decoded waveforms.
In addition, the RM2 integrated architecture resulted in a slightly different QMF interface that resulted in different decoded output.
The contribution presented listening test results that verified that applying all bugfixes resulted in performance that was not different from RM2.
It was the consensus of the Audio subgroup to incorporate all recommended bugfixes into the RM2 Reference Software.
Gregory Pallone, Orange, presented
m33202 | RM1-HOA Binaural Parameterization | Gregory Pallone
The contribution has a description of the algorithm used to convert BRIRs to binauralization parameters for the TD binauralization. It is recommended that this text be a normative section of the 3D Audio CD text to be produced at this meeting. A matching software implementation is available for integration into the 3D Audio Reference Software.
In the course of creating the software implementation of the parameterization, a bug was identified in the binauralization engine: that the cutoff frequency for diffuse block needs to be a vector, since there now can be more than one diffuse block.
In some cases, the automatic parameterization resulted in binauralization parameters that were different from (and longer than) the RM0 manual parameterization. The contribution reports the results of a listening test that verified that the automatic parameterization produced quality that was not different from the RM0 performance (in fact, the means for the automatic parameterization tended to be higher).
It was the consensus of the Audio subgroup to incorporate all recommendations of the contribution for additions to CD text and Reference Software.
Simone Fontana, Huawei, presented
m33170 | Mixing Time estimation for Binauralization | Simone Fontana, Peter Grosche, Panji Setiawan
The contribution presented a scenario for binauralization, and recommends that the parameterization from BRIR be automatic and normative.
An important part of the automatic parameterization is identifying the “Mixing Time”, which is the transition point between the Direct and Early (D&E) part and the late Reverberant (RIR) part. The contribution reviewed possible means to identify the Mixing Time, and recommends a hybrid method that is data-based but verified to agree with perceptual results.
It notes that currently in 3D Audio, the TD engine uses T20 reverberation time and the FD engine uses T60 reverberation time, both based on the Schroeder Energy Decay Curve (EDC).
The contribution notes that if a room is assumed to be “ergodic”, then at some point in time the energy flow in the room due to the BRIR reverberant part will be uniform in all directions. This assumption leads to the conclusion that at and after the Mixing Time, all BRIR curves can be considered identical. Based on this observation, the contribution describes a simple algorithm to determine the Mixing Time.
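The algorithm itself is not reproduced in these notes. A minimal sketch of the underlying idea, using the Schroeder EDC mentioned above and a hypothetical convergence tolerance, could look as follows: after the Mixing Time, the normalized decay curves of different BRIR measurements should be practically indistinguishable.

```python
import numpy as np

def schroeder_edc_db(h):
    """Schroeder Energy Decay Curve of one impulse response, in dB."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]   # backward integration of energy
    edc /= edc[0]                         # normalize to 0 dB at t = 0
    return 10.0 * np.log10(np.maximum(edc, 1e-12))

def estimate_mixing_time(brirs, fs, tol_db=1.0):
    """Toy mixing-time estimate; tol_db is an assumed tolerance.

    brirs : array [n_measurements, n_samples] of BRIRs (directions/ears)
    Returns the earliest time (s) after which all normalized EDCs stay
    within tol_db of each other, i.e. the responses look 'identical'.
    """
    edcs = np.array([schroeder_edc_db(h) for h in brirs])
    spread = edcs.max(axis=0) - edcs.min(axis=0)   # per-sample disagreement
    converged = spread < tol_db
    idx = len(converged)                           # default: never converges
    for n in range(len(converged) - 1, -1, -1):    # find start of the final
        if not converged[n]:                       # all-converged run
            break
        idx = n
    return idx / fs

# toy usage with synthetic exponentially decaying responses
fs = 48000
brirs = np.random.randn(4, fs) * np.exp(-np.linspace(0, 8, fs))
print(estimate_mixing_time(brirs, fs))
```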
The presenter stated that the advantage of the proposed technology is that it provides a consistent and high-quality method to identify the point between D&E and RIR. The proposed technology would work with both TD and FD binauralization engines.
The Chair concluded that there is no consensus in the Audio subgroup to make a decision at this time, and encouraged interested experts to continue to discuss this proposal.
Jeongil Seo, ETRI, presented
m33136 | Consideration on Binaural Renderer Architecture | Jeongil Seo, Kyeongok Kang, Taegyu Lee, Henney Oh
The contribution summarized the state of normative technology for binauralization in the integrated 3D Audio decoder architecture. It noted that there are three means for binauralization:
-
TD
-
FD
-
H2B/TD (an efficient means if number of coefficient channels is less than number of virtual loudspeaker signals)
Amongst these, two (TD, FD) are conceptually different only with respect to input format. It further notes that the mixer and DRC may operate in the QMF domain. It presents complexity figures for TD and FD binauralization with various assumptions about the domain of the processing chain.
Based on these observations, it proposes that FD be the normative binauralization engine in most cases. In the special case that there are only HOA signals, the TD engine should be normative.
Oliver Wuebbolt, Technicolor, reminded the group that at the highest bitrates the USAC decoder will deliver time domain signals and so TD binauralization might be most appropriate.
Henney Oh, WILUS STRC, noted that it may be a burden on implementations to have to store two different sets of binauralization parameters, one for each of the FD and TD engines.
Gregory Pallone, Orange, presented
m33200 | Proposal for Binauralization | Gregory Pallone
The presentation proposed that the binauralization domain (TD/FD) follow the mixing domain. With this assumption, it reviewed the complexity for each binauralization engine. In this case, there is no TD/FD conversion step and so this complexity can be neglected.
The contribution proposes that
-
If mixer is in time domain, TD binauralization is used.
-
If mixer is in QMF domain, FD binauralization is used.
The Chair noted that an implementation could be free to use either binauralization engine and that conformance could accommodate this flexibility.
Discussion on binauralization
The Chair noted that none of the proposals advocates only one of the two binauralization engines. The differences are only in when a given engine would be used. It would be interesting to give implementations some amount of freedom to use either binauralization engine in some situations.
Experts will draft some normative text that clearly states a way forward. This will be reviewed later in the week.
Profiles
Gregory Pallone, Orange, presented
m33203 | Thoughts on MPEG-H Profile | Gregory Pallone
This is a joint contribution from Orange, Huawei and Technicolor. It anticipates that MPEG will specify a 3D Audio profile at the 109th meeting. The contribution observes that the marketplace will want 3D Audio in all of Channel, Object and HOA signal modes and proposes that a profile should unify the marketplace by supporting all signal modes. Example use cases are:
TV with Channel (current), HOA (possibly sports or live TV), objects (possibly dialog)
In such an envisioned profile, an open issue is what levels are appropriate, e.g.:
-
Number of HOA components
-
Maximum number of objects
-
Maximum number of channels
Specifying the number of core coder channels could accommodate these constraints.
Yeshwant Muthusamy, Samsung Telecom America, stated that Samsung is very strongly in support of a single profile for all signal modes.
Thomas Sporer, FhG-IDMT, noted that 32 simultaneous objects is the limit of what humans can perceive, while 16 might be a realistic number.
Jan Plogsties, FhG-IIS, noted that the question of levels might be posed to possible application standard customers this meeting via a liaison statement. Werner Oomen, Philips, noted that the DVB liaison contains use cases that could already inform Audio experts on what levels might be appropriate in a first profile.
The Chair suggested that a “Thoughts on 3D Audio Profile” document might be useful to focus our thoughts for a response to the ATSC and DVB liaisons.
Interfaces
Jan Plogsties, FhG-IIS, presented
m33209 | Progress on workplan for 3D Audio interfaces | Jan Plogsties
The contribution reviewed the information the decoder needs in order to decode the transmitted signal for the user’s environment.
The contribution notes that the transmitted loudspeaker layout is currently fully specified in the WD text as SpeakerConfig3d(), which can reference:
-
One CICP layout index
-
Set of CICP loudspeaker indexes
-
Set of loudspeaker positions
Concerning BRIR data, there is an intent to interoperate with AES X212 file. However, the interface to 3D Audio could be:
-
Set of BRIR as FIR filters
-
MPEG-H binauralization parameters, as some text file or binary data structure
Finally, 3D Audio decoder data structures must be flagged as readable by an interface or as writeable by an interface.
Experts noted that at the Hannover AhG meeting, it was agreed how to signal loudspeaker position in the bitstream, i.e. the configuration used in production.
Audio experts will draft a straw-man document on normative interfaces for discussion during the week.
Taegyu Lee, Yonsei University, presented
m33139 | Consideration on the Interface of Binaural Renderer | Taegyu Lee, Henney Oh, Young-cheol Park, Dae Hee Youn, Jeongil Seo, Kyeongok Kang
The contribution notes that there is a need for a normative interface for 3D Audio binauralization BRIR data. It notes that this can be as FIR filters or as parameters. It also gives a concrete example of the binary data needs for the FD binauralization engine.
Werner de Bruijn, Philips, presented
m33134 | 3D Audio Decoder Interfaces | Aki Härmä, Werner de Bruijn, Werner Oomen
The contribution notes that Philips has previously presented, in m29145, m30249, m31367 and m32241, information on possible normative interfaces. Any interface should be simple, with low-complexity readers and writers. The interfaces considered are:
-
Playback-side loudspeaker configuration
-
BRIR data
-
Object control data, e.g. to modify object gains
It proposes syntax for an interface parameter packet. The syntax is a compact binary representation in the form of a bitstream element. An annex of the contribution gives an equivalent representation using XML format. Concerning BRIR data, the contribution asserts that XML is not an efficient or perhaps even feasible container for BRIR data, and that the WAV file format has many variations, which might make interoperability problematic.
The Chair noted that it might not be the job of MPEG to define an interface that serves proprietary renderers.
Interested experts will discuss this topic in a break-out and bring some text back to the group for further discussion.
CD of ISO/IEC 23008-3 3D Audio
Max Neuendorf, FhG-IIS, presented
m33178 | Proposed CD of ISO/IEC 23008-3 3D Audio | Max Neuendorf, Oliver Wübbolt, Andreas Hölzer, Johannes Böhm, Sven Kordon, Florian Keiler, Peter Jax, Alexander Krüger, Deep Sen, Nils Peters
The presenter highlighted what has changed between the WD from the 107th meeting and this contribution.
Loudspeaker configuration
-
Loudspeaker layout for encoded content
-
SpeakerConfig3d() – syntax element to describe loudspeaker layout
SAOC editorial changes
-
remove text that was not appropriate for SAOC 3D (i.e. in the context of 3D Audio)
-
re-organized text
-
remove text if a reference to MPEG-D SAOC is sufficient
Binaural
-
add normative text for FD and TD engines
It was the consensus of the Audio subgroup to accept the proposed changes and to use this contribution as the basis for the CD text that will issue this week.
Max Neuendorf, FhG-IIS, presented
m33182 | Software for MPEG-H 3D Audio RM2 | Christian Ertel, Sascha Dick, Andreas Hölzer, Florian Keiler, Alexander Krüger, Sven Kordon, Deep Sen, Nils Peters
The contribution has the RM2 software in a zip archive within the contribution zip archive. The document text lists the changes incorporated into the Reference Software since RM1. It also gives an overview of what software modules changed or were added as a result of the C/O/HOA architecture merge. The presenter noted that, as of just prior to the 108th MPEG meeting, the Ref Sw was fully in line with the Draft CD text.
Gregory Pallone noted that he could incorporate the parameterization software for the TD binauralization into the output document for 3D Audio Ref Sw.
Max Neuendorf, FhG-IIS, presented
m33198 | Proposed corrections to MPEG-H 3D Audio | Max Neuendorf, Christian Helmrich, Adrian Murtaza
The contribution proposes corrections to the Draft CD contribution text that go beyond editorial changes. In general, such changes can affect
-
Only specification text (there were none)
-
Only reference software (1, related to SAOC)
-
Both text and reference software
Concerning Ref Sw, the SAOC object metadata is not aligned with the signal, and the correction is to add a delay to the metadata.
Concerning both text and Ref Sw, the following table summarizes the proposed changes:
(Columns indicate whether the errata correction affects the bit stream, the decoded waveforms through an algorithm change, or the decoded waveforms through processing side effects such as bit reservoir variations, random noise seeds, rounding etc.)

# | label / issue | bit stream | dec. waveforms, algorithm change | dec. waveforms, processing side effects
1 | RM2_3D_BUGFIX_SAOC_1 | n | y | n
2 | RM2_3D_BUGFIX_IGF_11 (??) | n | y | n
3 | RM2_3D_BUGFIX_IGF_12 (correct calculation of target energy) | n | y | n
4 | RM2_3D_BUGFIX_IGF_13 (preserve maximum dynamic range) | n | n | y
5 | Length field in elem loop | y | n | y
6 | RM2_3D_BUGFIX_SAOC_2 (Wet path energy correction) | n | y | n
7 | RM2_3D_BUGFIX_SAOC_3 (Use only maximum number of decorrelators) | y | n | n
8 | FC table updates | y | n | y
9 | CICP codepoint | y | n | n
The Chair noted that the “length field in elem loop” is perhaps less clear cut than the other proposed changes, in that the 16-bit length field imposes a bitrate burden that can be appreciable at lower (e.g. Phase 2, 48 kb/s) bitrates.
Concerning RM2_3D_BUGFIX_SAOC_3, the Chair noted that m33192 assumed the use of level 0 signalling. Christof Fersch, Dolby, stated that it is not clear that the capability to reduce complexity (i.e. reduce the number of decorrelators) should be given up. The Chair noted that in this case, we know that lower complexity leads to lower quality. The actual reduction in 3D Audio decoder complexity will be studied.
The revision to CICP codepoint is to add additional configurations to support what are envisioned to become widely deployed configurations.
Finally, it is proposed to use the new CICP codepoint LoudspeakerGeometry in the 3D Audio decoder text and reference software.
It was the consensus of the Audio subgroup to accept the proposed changes for all BUT items 5, 7, 9 and to give the editor the mandate to include these changes into the CD text and reference software.
Further discussion
Concerning point 7, above, Juergen Herre confirmed the following:
bsDcorrelationLevel | Decorrelators | Relative Complexity
0 | 11 | 100%
1 | 9 | 99%
2 | 7 | 98%
3 | 5 | 97%
Furthermore, he confirmed that at index 3, the subjective quality performance drops by 10 MUSHRA points.
The Audio Subgroup agreed to remove the 2-bit bsDcorrelationLevel field and to always make the 11 decorrelators available.
Max Neuendorf, FhG-IIS, presented a proposal on point 5, above. In brief, it transmits an “elementLengthPresent” flag in the Config. This permits the length field to conditionally appear prior to each SCE, CPE, QCE.
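A minimal sketch of the idea follows, using a toy bit writer; the exact field names and placement remain subject to the syntax refinements mentioned below, so everything here is illustrative only:

```python
class BitWriter:
    """Toy MSB-first bit writer (illustrative only)."""
    def __init__(self):
        self.bits = []
    def write(self, value, n_bits):
        self.bits += [(value >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]

def write_frame(bw, element_payloads, element_length_present):
    """Sketch of the conditional per-element length field.

    element_payloads       : payloads (as bytes) of the SCE/CPE/QCE elements
    element_length_present : the 1-bit flag assumed to be carried in the Config
    """
    for payload in element_payloads:
        if element_length_present:
            bw.write(len(payload), 16)   # 16-bit length only when flagged
        for byte in payload:
            bw.write(byte, 8)

bw = BitWriter()
write_frame(bw, [b"\x01\x02", b"\x03"], element_length_present=True)
print(len(bw.bits), "bits written")      # 2*16 + 3*8 = 56
```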
The Audio Subgroup agreed to accept the proposal, subject to some further refinements of the syntax.
There was no sustained objection to point 9, above, so the Audio Subgroup agreed to accept the proposal as presented in the contribution.
3D Audio and Systems
Robert Brondijk, Philips, presented
m33130 | Addressing multi language in MPEG-H | Robert Brondijk, Frans de Bont
The presenter noted that e.g. DVB uses a “dual decoder” architecture that permits decoding of a main program and a language program, which are mixed in the receiver.
Robert Brondijk, Philips, presented
m33131 | Flexible System signaling for 3D audio in MPEG-H | Robert Brondijk, Frans de Bont
The presenter noted that there is a Systems layer that typically “advertises” the program types and characteristics, while the MPEG layer carries the media representation and associated metadata. In between is a shared layer that might be jointly specified by e.g. DVB and MPEG. This is illustrated in the following figure:
DVB has
-
Service Description Table, which lists “TV Channels”
-
Event Information Table, which lists the “TV Shows” that are broadcast at a specific time on a specific TV Channel
MPEG has
-
Program Map Table, which lists PIDs, where each PID shall contain only one language type
-
Media Descriptor, which indicates at a high level the features of the media
The essential issue is that the PMT and PID can indicate only one language.
The contribution recommends a specific descriptor syntax, which defines the concept of a media group, where, e.g., one group is a main program with a default language, and other groups carry other languages.
A key concept is a Stream Multiplexer that receives a main program and a supplemental program. This then feeds all streams (and objects) into a single 3D Audio decoder.
Audio experts will continue to discuss this proposal in conjunction with the presentation of other relevant contributions.
Stephan Schreiner, FhG-IIS, presented
m33187 | Proposal for MPEG-H 3D Audio in MPEG-2 Systems | Stephan Schreiner, Harald Fuchs
The contribution proposes a new stream_id and stream_type for MPEG-H 3D Audio. It describes a new MPEG-H_3dAudioDescriptor.
The Chair noted that the proposed descriptor is detailed, and hopes that Systems experts can indicate the appropriate level of detail for such a descriptor.
Stephan Schreiner, FhG-IIS, presented
m33190 | Proposed MPEG-H 3D Audio stream format | Stephan Schreiner, Harald Fuchs, Stefan Doehla
The contribution proposes a synchronizing transport for carriage of 3D Audio in MPEG-2 Systems transport streams. It defines mpeghAudioTransportStream() and mpeghAudioTransportStreamPacket(). The packets can be of type Sync, Config and 3D Audio payload. This permits a synchronizing stream to be constructed as a sequence of packets, as illustrated here:
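As a toy sketch of such a packet sequence (the packet type codes and the absence of any length/CRC fields are simplifying assumptions, not the proposed mpeghAudioTransportStreamPacket() syntax):

```python
from dataclasses import dataclass
from typing import List

# Hypothetical packet type codes, for illustration only.
SYNC, CONFIG, PAYLOAD = 0, 1, 2

@dataclass
class Packet:
    ptype: int        # Sync, Config or 3D Audio payload
    data: bytes = b""

def build_stream(config: bytes, frames: List[bytes]) -> List[Packet]:
    """Assemble a self-synchronizing packet sequence: a Sync packet and
    the Config precede the payload packets so that a receiver can tune
    in and start decoding mid-stream."""
    stream = [Packet(SYNC), Packet(CONFIG, config)]
    stream += [Packet(PAYLOAD, f) for f in frames]
    return stream

stream = build_stream(b"cfg", [b"frame0", b"frame1"])
print([p.ptype for p in stream])   # [0, 1, 2, 2]
```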
It was the consensus of the Audio subgroup to accept this proposal into the CD text.
3D Audio
Johannes Boehm, Technicolor, presented
m33196 | HOA decoder – changes and proposed modifications | Johannes Boehm, Peter Jax, Florian Keiler, Sven Kordon, Alexander Krueger, Oliver Wuebbolt
The contribution documents the changes to the HOA technology that are incorporated into the draft CD contribution:
-
Bugfixes and syntax cleanup.
-
Removal of LP prediction mode. This reduces complexity and reduces decoding delay by 1 frame. Listening test results showed that this change resulted in no reduction in performance.
-
Replace the HE-AAC core coder with the USAC core coder and re-structure the carriage of HOA compression information to be a USAC extension payload.
-
Removal of hoaDependencyFlag, using usacIndependencyFlag instead. This permits the HOA processing structure to have the same frame independency behaviour as the USAC core.
It was the consensus of the Audio subgroup to accept the recommendations of the AhG to incorporate these changes into the CD text to be produced at this meeting.
The contribution proposes additional changes to the HOA text:
-
Remove the signaling of HOA rendering matrix ID. This was due to the three sources of HOA material (Technicolor, Qualcomm, Orange). The ID might be useful, but only for exactly the normative 22.2 loudspeaker setup. The presenter noted that such a downmix matrix can be transmitted via the Passive Downmix capability.
-
Bitstream cleanup, including limiting the HOA order to at most 29. This is reasonable as it has spatial resolution comparable to that of the quantized Predominant Sound component.
-
Add some reserved bits in the HOA bitstream syntax that permits some (currently unknown) extensibility. A specific proposal will be made at the next meeting.
Other topics
Max Neuendorf, FhG-IIS, presented
m33191 | Technical description of stream-switching with MPEG-H SAOC 3D | Adrian Murtaza, Jouni Paulus, Leon Terentiv, Falko Ridderbusch, Harald Fuchs, Juergen Herre
The proposal brings more information on seamless switching with SAOC-3D.
It was the consensus of the Audio subgroup to incorporate the proposed text and technology into the CD text to be issued this week.
Max Neuendorf, FhG-IIS, presented
m33231 | Codec behavior at reconfiguration events | Max Neuendorf
The presentation noted that, unlike previous MPEG technology (e.g. AAC), MPEG-H 3D Audio has decoder elements that may be used in response to decoder-side information or user interaction, and that these decoder elements cannot be known by the MPEG-H 3D Audio encoder. For example, consider:
-
22.2 channel reference config in bitstream and 22.2 target channel config (at decoder)
-
22.2 channel reference config in bitstream and 5.1 target channel config (at decoder)
In the latter case, the format converter must be used, which entails additional delay (e.g. QMF analysis/synthesis).
The contribution proposes to have a delay replace any tool that is “not active.” The presence of such a delay is optional, but its presence (or not) is normatively signalled in the mpegh3dConfig(). The presenter noted that some delays may be significant and others not.
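A minimal sketch of the delay-substitution idea; the tool model and the delay value are made-up placeholders, and the actual signalling in mpegh3dConfig() is as proposed above, not as coded here:

```python
import numpy as np

def apply_or_delay(signal, tool_active, tool_fn, tool_delay):
    """If a decoder tool (e.g. the format converter) is not needed for the
    current playback configuration, substitute a pure delay of the same
    length so the overall decoder latency stays constant."""
    if tool_active:
        return tool_fn(signal)
    # replace the inactive tool by its nominal delay
    return np.concatenate([np.zeros(tool_delay), signal])[: len(signal)]

# toy example: an inactive "format converter" modeled as a 577-sample delay
# (577 is a made-up QMF analysis/synthesis figure, for illustration only)
x = np.random.randn(4096)
y = apply_or_delay(x, tool_active=False, tool_fn=lambda s: s, tool_delay=577)
```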
The Chair suggested that the group draft a workplan to study which processing tools may require delay compensation. This may include use cases e.g. cinema or music only.
Jan Plogsties, FhG-IIS, presented
m33224 | Object Interaction Use Cases and Technology | Simone Füg, Jan Plogsties
The contribution reviewed the requirements listed in the CfP, and notes that the following requirements have not yet been met:
-
The system should enable user control of objects (e.g. dialog enhancement, alter sound scene by object replacement).
The contribution proposes new syntax and semantics for managing objects. There was considerable discussion of the proposal.
A break-out group will examine what might be agreed to this week, and what should be transferred to a workplan form this meeting for further study.
MPEG-H 3D Audio DRC
Fabian Kuech, FhG-IIS, presented
m33151 | Proposed Text on DRC and Loudness Technology in MPEG-H 3D Audio | Michael Kratschmer, Fabian Kuech, Johannes Boehm
The contribution proposes:
-
Alignment of text between MPEG-D DRC and MPEG-H 3D Audio, such that 3D Audio can in general refer to MPEG-D DRC
-
A single section of the document that discusses the semantics of DRC in 3D Audio
-
New syntax items for new capability
It was the consensus of the Audio subgroup to accept all proposals in the contribution except
-
Downmix Type that refers to a default BRIR (this will be left as reserved value)
Samsung CE on Immersive Audio
Sang Bae, Samsung, gave a presentation with additional information on the Samsung CE. It reviewed the proposed technology and indicated how listening rooms with different upper ring loudspeaker placement might result in a sub-optimal listener experience. The presenter suggested how the technology could be modified so that the different listening rooms could get the desired experience. Finally, the presenter reviewed the proposed syntax to support the technology.
It is proposed to conduct an additional cross-check during the AhG period, with listening tests at FhG-IIS and ETRI.
It was the consensus of the Audio subgroup to incorporate this technology into an Annex of the CD, marked subject to verification and decision at the June AhG meeting.
Panasonic CE on low complexity rendering for HOA
It was the consensus of the Audio subgroup to include a description of this technology as an informative annex of the 3D Audio CD.
Huawei CE on mixing time
A workplan was presented for further study of this technology. This workplan will be incorporated into the Workplan on 3D Audio, and the Audio subgroup looks forward to additional information at the 109th meeting.
DRC
Toru Chinen, Sony, presented
m33118 | Comparison Test about Dynamic Range Control Technology | Runyu Shi, Toru Chinen, Yuki Yamamoto, Hiroyuki Honma, Masayuki Nishiguchi
The contribution reports the results of several experiments. The first used all gain sequences from Apple, FhG-IIS and Sony.
-
RM is DRC WD1 with all bug fixes
-
CE is DRC WD1 with all bug fixes plus the Sony technology (i.e. linear rather than spline interpolation, and prediction)
Experiment results showed that the CE technology reduced the average (-20%) and peak (-40%) bitrate of the DRC information. When considering the total 3D Audio bitrate, the average bitrate reduction was approximately 0.05%. The presenter asserted that it is very important to reduce the peak bitrate, which this CE technology can do.
A second experiment reported the distribution of bitrates. These results show a performance gain for the LIN technology with respect to RM1. The increase in performance of Sony CE as compared to LIN is much less.
Fabian Kuech, FhG-IIS, presented
m33142 | Core Experiment on Redundancy Reduction in MPEG-D DRC Gain Sequence Coding | Bernhard Neugebauer, Fabian Kuech
The contribution stated that it is important to reduce both average and peak DRC bitrate. The CE technology is composed of three components
The first is a set of syntax changes that always reduce rate. The resulting sequences produce bit-identical decoded output:
-
timeDeltaCode: introduce a “frame end flag” and an adaptive word length
-
slopeCode: able to reduce rate by 1 bit/frame
Compared to RM1, the RM1+CE technology reduced the average rate by 5% and the peak rate by 17%.
The results have been cross-checked by Apple.
The presenter noted that the last change (slopeCode) permits either spline or linear interpolation such that the either the current RM1 technology or the Sony CE technology could be used.
The Chair noted that there seems to be support for accepting this CE, but a decision will be made after all DRC CE presentations and subsequent discussion.
Fabian Kuech, FhG-IIS, presented
m33145 | Core Experiment on Low-Bitrate Gain Sequence Coding for MPEG-D DRC | Bernhard Neugebauer, Fabian Kuech
The contribution proposes a new gainInterpolationType flag. This permits either spline interpolation or linear interpolation. With this flag, the CE then tests the performance of linear interpolation. This results in an average reduction in bitrate of 17%, and a 23% reduction for the “highest rate” frames (i.e. the 95th percentile).
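A minimal sketch of the two interpolation modes such a gainInterpolationType flag would select between; the node representation and the mapping of flag values are assumptions for illustration, not the normative MPEG-D DRC reconstruction:

```python
import numpy as np

def interpolate_gains(node_pos, node_gain_db, n_samples, gain_interpolation_type):
    """Reconstruct a per-sample gain curve from transmitted nodes.

    Assumed mapping (illustrative): 1 = linear interpolation (this CE),
    0 = smooth interpolation standing in for the RM1 spline mode.
    """
    t = np.arange(n_samples)
    if gain_interpolation_type == 1:
        return np.interp(t, node_pos, node_gain_db)
    from scipy.interpolate import CubicSpline   # only needed for the spline path
    return CubicSpline(node_pos, node_gain_db)(t)

nodes_t  = [0, 512, 1024]      # node positions in samples
nodes_db = [0.0, -6.0, -3.0]   # gain values at the nodes, in dB
g_linear = interpolate_gains(nodes_t, nodes_db, 1024, gain_interpolation_type=1)
```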
The contribution reports on a second experiment. For the experiments, the following abbreviations are used:
RM1: Reference Model 1
CE: This Core Experiment (gainInterpolationType = 1)
BR: Core Experiment on Bitrate Reduction [M33142]
LP: Linear Prediction [Sony M32229]
MT: Modified timeDelta coding [Sony M32229]
The bitrate reduction with respect to RM1 is as follows:
Technology | Avg. | Highest 5%
CE+MT | 19% | 29%
LP | 19% | 27%
CE+BR | 22% | 40%
Toru Chinen, Sony, stated that the Sony linear predictive coding is able to reduce the highest 1% by a significant amount.
Fabian Kuech, FhG-IIS, presented
m33147 | Core Experiment on Improving MPEG-D DRC Technology | Michael Meier, Fabian Kuech
The CE introduces the concept of a “node reservoir.” Similar to the “bit reservoir” used in USAC, this can smooth out the number of nodes transmitted per frame. It proposes a one-frame reservoir and notes that the default mode DRC nodes are used one frame after they are transmitted (to match the USAC decoder overlap-add delay). The CE requires no syntax changes and produces a decoded result that is bit-exact as compared to RM1. The CE technology is not possible in DRC low-delay mode because in this case the DRC sequences do not have the one-frame delay.
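A minimal sketch of the one-frame node reservoir idea under the bit-reservoir analogy: a surplus node of a crowded frame is transported in an adjacent, emptier frame while its value and time position are unchanged, so the decoded result is unaffected. The per-frame budget and the shifting rule below are assumptions, not the CE's exact algorithm:

```python
def smooth_nodes(nodes_per_frame, max_nodes_per_frame):
    """Toy one-frame node reservoir (illustrative only).

    nodes_per_frame     : list of lists of DRC gain nodes, per core frame
    max_nodes_per_frame : assumed transmission budget per frame
    Surplus nodes of a crowded frame are transported one frame early;
    their values and time positions are untouched, so the reconstructed
    gain curve (and the decoded output) is unchanged.
    """
    tx = [list(f) for f in nodes_per_frame]
    for n in range(1, len(tx)):
        while len(tx[n]) > max_nodes_per_frame and \
              len(tx[n - 1]) < max_nodes_per_frame:
            tx[n - 1].append(tx[n].pop(0))   # send one node a frame early
    return tx

schedule = smooth_nodes([[1], [2, 3, 4, 5], [6]], max_nodes_per_frame=3)
print([len(f) for f in schedule])   # [2, 3, 1]
```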
The performance of the technology is as follows:
Technology | Average | High 5% | High 1%
RM1+CE | -1% | 6% | 14.5%
The presenter noted that the slight decrease in performance could be due to the fact that the shifted node case makes less use of node position differential coding.
Masayuki Nishiguchi, Sony, noted that the node buffer might result in worse performance in the case of packet loss or bit errors.
Fabian Kuech, FhG-IIS, presented
m33149 | Technical Description of a Tool for DRC Technology | Michael Kratschmer, Fabian Kuech
The contribution proposes a DRC “payload splitter” that has properties of a bit reservoir in that it can spread DRC information bits over several core coder frames. The proposed reservoir can spread information up to 15 core coder frames into the future. The underlying DRC information is identical, so the decoded output is identical. The proposal is enabled by a 1-bit flag in the config().
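A minimal sketch of the payload-splitter concept: a DRC payload is cut into chunks carried over the current and up to 15 following core coder frames and reassembled before use, leaving the decoded output unchanged. The chunking rule below is an assumption for illustration, not the proposed syntax:

```python
def split_payload(payload: bytes, n_frames: int, max_spread: int = 15):
    """Spread one DRC payload over the current and up to max_spread
    following core coder frames (toy model, not the proposed syntax)."""
    n_parts = min(max_spread + 1, n_frames, max(1, len(payload)))
    size = max(1, -(-len(payload) // n_parts))     # ceiling division
    return [payload[i:i + size] for i in range(0, len(payload), size)]

def reassemble(chunks):
    """Decoder side: concatenate the chunks back into the original payload,
    so the DRC information (and hence the output) is identical."""
    return b"".join(chunks)

p = bytes(range(40))
chunks = split_payload(p, n_frames=8)
assert reassemble(chunks) == p
print(len(chunks), "chunks of sizes", [len(c) for c in chunks])
```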
The Chair noted that there are no proposals on limits on reservoir size, and that this might be best considered when using DRC in 3D Audio.
Additional information was requested on the impact of the payload splitter's peak rate smoothing in the context of 3D Audio using e.g. more than 16 DRC gain sequences.
Conclusion
It was the consensus of the Audio subgroup to accept into the DRC CE text:
-
Accept gainInterpolationType flag in m33145
-
Accept linear interpolation from Sony
-
Accept syntax changes as in m33142
It was decided to put the following technology into the “Candidate Technologies for 3D Audio” document:
-
Node Reservoir as in m33147 – accept into CD subject to cross-check
-
Payload splitter tool proposed in m33149 will have a workplan for further study.
Exploration
Takehiro Sugimoto, NHK, presented
m33079 | Cross-check report of audio synchronization | Takehiro Sugimoto, Yasushige Nakayama
The contribution reported on a cross-check of the Sony technology.
Jürgen Herre, Fraunhofer-IIS, asked how the synchronization had been achieved and whether the system was consistent when different streams were used. Takehiro Sugimoto replied that Sony had supplied the relevant software and that the microphone and the stream synchronized in all cases.
Several conditions with different ambience noise were tested in a 3D lab of NHK. Regarding the number of tests, Sony referred to its last report; regarding the number of ambiences NHK referred to its crosscheck report.
Thomas Sporer, Fraunhofer-IDMT, asked for the average distance between loudspeaker and microphone, which was reported by NHK to be 3 m.
Masayuki Nishiguchi, Sony Corporation, presented
m33154 | Cross check report of audio synchronization experiment using MP4 file format | Shusuke Takahashi, Akira Inoue, Masayuki Nishiguchi, Toru Chinen
Sony proposed a scheme for synchronization of multiple audio streams for standardization at the 106th meeting in Geneva. At the 107th meeting in San Jose, the system aspect and syntax structure defining an Audio Object Type (AOT) for the audio feature stream were proposed for accommodation into an MPEG system, and a work plan [N14270] was issued. The present contribution describes the experimental setup and cross-check results of the experiment conducted by Sony and NHK. AOT = 46 was used for the synchronization stream; Sony proposed this new Audio Object Type within MPEG-4 in order to transmit the synchronization feature as an elementary stream.
At the same time the following syntax was proposed. For the configuration data:
Syntax | No. of bits | Mnemonic
AudioSyncFeatureSpecificConfig() { | |
    audio_sync_feature_type | 4 | uimsbf
    audio_sync_feature_frame_length_index | 4 | uimsbf
    audio_sync_feature_time_resolution_index | 4 | uimsbf
    audio_sync_number_of_streams_index | 4 | uimsbf
    reserved | 16 | uimsbf
} | |
For the bitstream payload:
Syntax | No. of bits | Mnemonic
AudioSyncFeatureFrame() { | |
    for (i = 0; i < audio_sync_number_of_streams; i++) { | |
        for (j = 0; j < … ; j++) { | |
            audio_sync_feature | 1 | uimsbf
        } | |
    } | |
} | |
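A minimal parsing sketch of the configuration syntax above; the bit-reader helper is a toy, the index-to-value tables are not given in these minutes, and the payload loop bound left elided above is therefore not filled in:

```python
class BitReader:
    """Toy MSB-first bit reader (illustrative only)."""
    def __init__(self, data: bytes):
        self.bits = [(b >> (7 - i)) & 1 for b in data for i in range(8)]
        self.pos = 0
    def read(self, n):
        v = 0
        for _ in range(n):
            v = (v << 1) | self.bits[self.pos]
            self.pos += 1
        return v

def parse_audio_sync_feature_specific_config(br):
    """Parse the fields tabled above; the *_index fields are kept as raw
    indices because their value tables are not given in these minutes."""
    cfg = {
        "audio_sync_feature_type":                  br.read(4),
        "audio_sync_feature_frame_length_index":    br.read(4),
        "audio_sync_feature_time_resolution_index": br.read(4),
        "audio_sync_number_of_streams_index":       br.read(4),
    }
    br.read(16)   # reserved
    return cfg

print(parse_audio_sync_feature_specific_config(BitReader(bytes([0x01, 0x23, 0x00, 0x00]))))
```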
Sony hinted that in IPTV, synchronization mechanisms for multi-screen content were now under discussion.
Jürgen Herre, Fraunhofer-IIS, asked how quantization could be achieved for AOT = 46 and requested further information on how the matching was done and where it was described. Werner Oomen, Philips, asked whether the extraction and the fingerprint feature were both available.
Sony Corporation volunteered to provide further information in case of approval, and in particular the extraction source code. Jürgen Herre noted that it is good MPEG tradition to provide the source code only after a successful core experiment. However, he and other experts observed that very little technical information was provided about the semantics, the extraction and the matching.
Sony reiterated that it would provide source code after approval. Werner Oomen, Philips, asked whether in such a case the whole source code would be provided, which Sony confirmed while asking at the same time whether the fingerprinting could be made normative.
Christof Fersch, Dolby, asked whether the system could also be used for other fingerprint mechanisms. Sony Corporation replied that the system need not be constrained to a specific algorithm and that improved systems would be possible. In any case the detection algorithm should be made normative.
Werner Oomen, Philips, observed that the detection algorithm could be made normative whilst the fingerprinting could stay informative. Sony Corporation observed that such an approach might break compatibility and suggested a single normative part.
Jürgen Herre observed that without the specification, interoperability could not be achieved. Sony claimed that the whole algorithm should be made normative, whilst Jürgen Herre wished that some interoperability be enabled.
Werner Oomen observed that the feature extraction algorithm and the comparison belong together. Different fingerprints would require different matching. He asked whether the synchronization could be made interoperable. Jürgen Herre observed that different solutions could be possible.
Werner Oomen observed that the system should work in closed applications, particularly with regard to fingerprinting, and should not open up the available standard. Sony replied that the algorithm would have to be identified.
Jürgen Herre stated that one should start with one matching algorithm, which probably should be made informative. Sony consented that this could be an option.
Jürgen Herre stated that Fraunhofer-IIS would be willing to show support for the proposal if informal information were given on the feature extraction and the matching procedure.
Thomas Sporer asked who would be in charge of assigning the right identity to this new kind of fingerprint. Christof Fersch observed that a registration authority would be needed.
Max Neuendorf noted that it would be sufficient to proceed with the usual amendment process. Werner Oomen asked what would happen if one had the fingerprint and would like to use the transport mechanism. Max Neuendorf and Jürgen Herre both showed a preference for proceeding in the usual way instead of using a registry.
Sony urged for approval during the present WG 11 Meeting.
Jürgen Herre stated that Sony might provide WD with an informative section on feature extraction and matching during this meeting.
Sony replied that such procedure should start with the normative part approved.
Jürgen Herre noted that the present focus was on AOT = 46 and Fraunhofer-IIS was ready for support if above mentioned information would be provided in informative sections.
Sony claimed that likewise one of the types should be made normative and agreed, upon inquiry from Jürgen Herre, to the proposal of making the whole algorithm normative. Jürgen Herre observed that MPEG-7 would be an excellent candidate for including these synchronization features.
Sony replied that any solution other than AOT = 46 would be unpredictable.
Thomas Sporer observed that the present solution would not represent a complete system with respect to severe delays, switches on both screens, and Internet applications where an offset of 10 seconds may be observed. German TV programs could run with a few minutes' lag, and matching the second screen could pose a problem.
Sony noted that the present system could not be used for live broadcast applications. The second screen content would have to be imported first and the main stream delayed. However, the matching window currently had a 20 s length, depending on complexity. Hierarchical matching would, however, be possible.
Christof Fersch asked what Systems' reaction to the present system's presentation by Sony was and how it could be made sure that the current proposal works with Systems. Sony replied that further assessment results should be provided to Systems. Dolby showed further concern and requested an input contribution on how the proposal would work together with time synchronization in Systems.
Sony Corporation replied that complementary technology could be chosen for hybrid systems and even with exchanged time code one would have to take care of delay issues. Sony Corporation confirmed that in case of acceptance of the new object type full disclosure would be made.
Jürgen Herre and Thomas Sporer noted that there would be support under the condition that enough information is provided for understanding the proposal. Jürgen Herre noted that, in case of disclosure of the extraction and matching features, still the proposal would have to conform to Systems and timestamps.
Sony Corporation asked for the schedule for AMD5. Max Neuendorf, who observed that an editing period would not be needed, and Christof Fersch provided the detailed schedule.
Thomas Sporer asked within which standard all the other descriptions would be incorporated. Sony Corporation articulated its preference for MPEG-D. Thomas Sporer observed that a test for MPEG-D could be provided at the 109th MPEG Meeting in a synchronized way.
It was the consensus of the Audio Subgroup that the Sony proposal for AOT = 46 is conditionally approved together with audio_sync_feature_type set to 0, provided that the supplied information is complete enough to allow implementation, whilst in such case the extraction would be made normative and the matching informative. The syntax should be made normative.
It was the consensus of the Audio Subgroup that the synchronization technology be accepted as a part of the MPEG standard conditionally, that means:
-
Audio Object Type (AOT=46) is defined for the audio feature stream in the MPEG-4 standard.
-
The proposed audio feature extraction algorithm is accepted as a normative part of the MPEG-4 standard.
-
The proposed audio feature and its syntax are accepted as a normative part of the MPEG-4 standard.
-
The proposed feature matching algorithm is accepted as an informative part of the MPEG-4 standard.
On condition that:
-
Sony provides complete technical description of the proposed synchronization scheme.
Sony Corporation will present more information at the 109th MPEG Meeting.
Maintenance
Noboru Harada, NTT, presented
m33298 | Proposed levels for ALS simple profile | Noboru Harada, Takehiro Moriya, Yutaka Kamamoto
The contribution describes a new service in Japan under the auspices of ARIB:
Video | Audio
8K MPEG-H HEVC, 100 Mbps | Up to 22.2 channel, 1.4 Mb/s, MPEG-4 ALS, MPEG-4 AAC
4K MPEG-H HEVC, 40 Mbps | 5.1 channel, 4.8 Mb/s, MPEG-4 ALS
The contribution requests two new levels (levels 2 and 3) for the MPEG-4 ALS Simple Profile. These new levels will be mirrored in MPEG-4 Conformance and MPEG-2 Systems, so that the contribution requests:
-
New levels in MPEG-4 ALS Simple Profile
-
New ProfileAndLevels in MPEG-2 Systems
-
New Conformance in MPEG-4 Audio Conformance.
The Chair noted that a PDAM on MPEG-4 Audio will issue at this meeting, and that it will issue as an IS in February 2015.
Audio experts look forward to more information from ARIB or the JNB on this topic at the July MPEG meeting.
Max Neuendorf, FhG-IIS, presented
m33168 | Proposed Updates to MPEG-4 Audio | Arne Borsum, Nikolaus Rettelbach
It was the consensus of the Audio subgroup to accept the proposals and add them to 14496-3/PDAM 5. It was noted that, with this change, the DVB and ARIB application standards may be in conflict with MPEG. Interested experts should check on this and consider whether MPEG should take some action at the 109th MPEG meeting.
Max Neuendorf, FhG-IIS, presented
m33172 | Proposed AMD to MPEG-4 Reference Software | Nikolaus Rettelbach, Tobias Schwegler, Michael Haertl
It was the consensus of the Audio subgroup to take the contribution and make it ISO/IEC 14496-5:2001/PDAM 37, “New levels for AAC profiles and uniDRC support”.
Max Neuendorf, FhG-IIS, presented
m33186 | Corrections to MPEG-D USAC | Andreas Niedermeier, Matthias Hildenbrand, Daniel Fischer, Max Neuendorf
The contribution listed a range of errata and inconsistencies in the specification text of the MPEG-D Unified speech and audio coding (USAC) or the corresponding software. Furthermore, USAC conformance issues as outlined in the defect report were addressed in this contribution, where issues were listed as those that:
-
affect only the specification text but not the reference software
-
affect only the reference software but not the specification text
-
affect both the specification text and reference software
-
affect only conformance test sequences
-
affect only conformance amendment text and conformance tables
Software and text changes
Fraunhofer-IIS requested the mandate for fixing reported issues and volunteered to put the relevant documents on the reflector for review by the Audio Subgroup.
Christof Fersch, Dolby, expressed concern that no precise information for clause 4.1 of the contribution (“Additional eSBR header requirements”) was available in the contribution. The eSBR tool provides the SBR patching method known from MPEG-4 SBR. He proposed to delete the issues raised in clause 4.1 from the list and wait for an NB Comment that would specifically address a proposed solution.
Max Neuendorf noted that the ballot for the present amendments would close after the 109th MPEG Meeting in Sapporo, after which there should be a Study on the document that addresses clause 4.1.
It was the consensus of the Audio Subgroup to approve the set of DCOR without the issues raised by clause 4.1.