AhG on 3D Audio and DRC
The AhG on Dynamic Range Control (DRC), 3D Audio, and Audio Maintenance met Saturday, March 29, 2014, 1200-1800 hrs and Sunday, March 30, 2014, 1000-1800 hrs at the MPEG meeting venue.
Ongoing CEs
Werner Oomen, Philips, presented
m33128 "CE on IPF Crossfade" (Werner Oomen, Johannes Hilpert, Andreas Hölzer, Max Neuendorf, Frans de Bont)
It was the recommendation of the AhG to adopt the 0-qmf slot cross-fade for IPF.
Thomas Sporer, FhG-IDMT, presented
m33086 "CE Immersive Audio (cross check site - Fraunhofer IDMT)" (Thomas Sporer, Sara Kepplinger, Judith Liebetrau, Christina Mittag)
The contribution reports the results of a listening test of the CE technology. For 5.1 channel presentation, one item was worse at the 95% level of significance (although the confidence interval came very close to overlapping zero).
Achim Kuntz, FhG-IIS, presented
m33193 "FhG crosscheck results for immersive audio rendering CE" (Hanne Stenzel, Achim Kuntz)
The contribution reports the results of a listening test of the CE technology. For both 5.1 channel and Rand5 presentations, the RM and CE systems were not different at the 95% level of significance. Looking at differential scores, the CE technology was worse than the RM technology for two items in 5.1 presentation, and better than the RM technology for one item in Rand5 presentation.
Jeongil Seo, ETRI, presented
m33135 "ETRI listening test report for Immersive Audio Rendering CE" (Jeongil Seo, Kyeongok Kang)
The contribution reports the results of a listening test of the CE technology. For both 5.1 channel and Rand5 presentations, when averaged over all items, the RM and CE systems are not different.
Sang Bae Chon, Samsung, presented
m33138 "Crosscheck Report on Immersive Audio Rendering" (Sang Bae Chon, Sunmin Kim)
The presentation gave a brief overview of the CE technology. It also reported the results of a listening test of the CE technology, including results with all listeners pooled. In 5.1 channel presentation, the item "Signal 5" and the mean over all items were better; in Rand5 presentation, the mean over all items was better.
When looking at only the Samsung results, items 4 and 5 were significantly better in both 5.1 and Rand5 presentations. The presenter noted that Samsung's room had a 45 degree elevation to the upper ring of speakers, which might explain the differences between the Samsung, IDMT, and IIS results.
Discussion
Experts noted that Samsung appeared to have designed the algorithm for its 45 degree top ring rather than for the nominal reference layout. Juergen Herre, FhG-IIS/AudioLabs, noted that the documents for this CE did not include all CE requirements, e.g. Working Draft text, and so did not constitute a complete CE proposal.
The Chair proposed that this CE result be further discussed during the week and that the presenter report back to the Audio group.
New CEs
Oliver Wuebbolt, Technicolor, presented
m33195 "Scalable Decoding Mode for MPEG-H 3D Audio HOA" (Johannes Boehm, Peter Jax, Florian Keiler, Sven Kordon, Alexander Krueger, Oliver Wuebbolt)
The contribution notes that the inherent structure of HOA supports an embedded, layered coding strategy in which increased sound scene “detail” is achieved by decoding an increasing number of HOA coefficient channels. This might have two applications:
- Unequal error protection, e.g. in which higher protection is applied to the lower layers
- Adaptive decoding "power," in which higher layers are not decoded when a portable device is running low on battery power.
However, the predominant sound tool in the HOA RM breaks this inherent layered HOA structure.
The contribution proposes, e.g., to keep the 0-order HOA as is and to extract the predominant sound only from the 1st and higher order signals. Note that the 0-order signal is an omnidirectional mono signal. However, the proposed syntax supports an arbitrary-order HOA base layer. Furthermore, the VVec directional tool (from Qualcomm) already supports the scalable architecture (if the mode CodedVVecLength = 0 is excluded).
The syntax change required is a single bit in the HOADecoderConfig(). The changes to the semantics are provided in the contribution, but not in the exact form needed in the WD. There is no change in decoder complexity if all layers are decoded, while there is the desired reduction in complexity if only the lower layers are decoded.
A base layer could be:
- One SCE containing the 0-order HOA.
- One SCE containing the 0-order HOA and one SCE containing a "voice commentary" audio object.
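To make the layering concrete, the following is a minimal sketch of how an order-N HOA scene decomposes into a base layer plus enhancement layers. The helper names are hypothetical and do not reproduce the normative MPEG-H bitstream syntax, which lives in HOADecoderConfig().

```python
# Illustrative sketch of layered HOA decoding; helper names are hypothetical
# and do not reproduce the normative MPEG-H syntax.

def hoa_channel_count(order: int) -> int:
    """An order-N HOA scene carries (N + 1)**2 coefficient channels."""
    return (order + 1) ** 2

def decode_up_to_order(coeff_channels: list, max_order: int) -> list:
    """Keep only the coefficient channels up to max_order.

    Layer 0 (order 0) is the single omnidirectional signal; each further
    layer adds the next order's 2n + 1 coefficient channels.
    """
    return coeff_channels[:hoa_channel_count(max_order)]

# Example: a full order-3 scene has 16 coefficient channels.
scene = [f"ch{i}" for i in range(hoa_channel_count(3))]
assert decode_up_to_order(scene, 0) == ["ch0"]   # base layer: omni mono only
assert len(decode_up_to_order(scene, 1)) == 4    # base + 1st-order layer
assert len(decode_up_to_order(scene, 3)) == 16   # all layers decoded
```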
Werner Oomen, Philips, asked why the scalable mode should not always be used. The presenter replied that the scalable mode codes some sound components twice, once in the base layer and once (perhaps) in the directional parameters. This might lead to noise unmasking since, in general, the noise in the two codings will be uncorrelated.
The presenter noted that:
- In a "low-power" mode, this is already supported in the specification; what is further needed is a user interface to indicate whether the enhancement layer is decoded.
- In a "robust" mode, the application layer must present only the "error-free" layers to the 3D Audio decoder.
Max Neuendorf, FhG-IIS, stated that the coding efficiency might be reduced since, in scalable mode, the same sound components must be coded in both the base layer and the enhancement layer. There may be two issues:
- Noise unmasking, although Technicolor experts have not been able to identify conclusive evidence of this.
- Extra bit requirements for coding a sound in both layers.
Nils Peters, Qualcomm, noted that the USAC CPE or QCE have decorrelators that could mitigate the noise unmasking.
The Chair noted that there are several open issues in this contribution that should be discussed further during the week, and the results of those discussions brought back to the group.
There was no consensus to take an action; however, the Audio subgroup looks forward to possible additional information at the next MPEG meeting.
Toru Chinen, Sony, presented
m33137 "Proposal on Complexity Reduction of the MPEG-H 3D Audio CO Object Renderer" (Yuki Yamamoto, Toru Chinen, Masayuki Nishiguchi, Mitsuyuki Hatanaka, Runyu Shi)
The contribution proposed a way to reduce complexity for the case that objects are "not active" in the sound scene. There might be a need for three levels of object "importance": 0) not active, 1) active but not essential (e.g. reverberation), 2) essential.
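As a rough illustration of how such importance levels might be used by a renderer, here is a sketch in which field names and the culling policy are assumptions, not the syntax proposed in the contribution:

```python
# Sketch of importance-based object culling; field names and the policy are
# illustrative assumptions, not the syntax proposed in m33137.
NOT_ACTIVE, ACTIVE_NOT_ESSENTIAL, ESSENTIAL = 0, 1, 2

def objects_to_render(objects: list, low_power: bool = False) -> list:
    """Always skip inactive objects; also skip non-essential ones in low-power mode."""
    threshold = ESSENTIAL if low_power else ACTIVE_NOT_ESSENTIAL
    return [obj for obj in objects if obj["importance"] >= threshold]

scene = [
    {"name": "dialog", "importance": ESSENTIAL},
    {"name": "reverberation", "importance": ACTIVE_NOT_ESSENTIAL},
    {"name": "muted object", "importance": NOT_ACTIVE},  # "digital zero" signal
]
print([o["name"] for o in objects_to_render(scene)])                  # dialog, reverberation
print([o["name"] for o in objects_to_render(scene, low_power=True)])  # dialog only
```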
The Chair noted that experts seemed to support the need to remove inefficiency in processing audio objects, e.g. to not mix "digital zero" object signals into the final mix. There may be several ways to accomplish this, one of which is the Sony proposal. This issue will continue to be discussed.
It was the decision of the group to put the proposed technology into the output document Technology under Consideration for CD.
Achim Kuntz, FhG-IIS, presented
m33205 "Report on Investigation of Imaginary Loudspeaker Placement" (Christian Borß, Achim Kuntz)
At the Hannover 3D Audio AhG meeting, the generalized VBAP algorithm was recommended for inclusion in the WD. However, the AhG also recommended a follow-up study on placing an imaginary speaker in the center of the video screen and also elsewhere, e.g. on the left side of the room. This contribution reports on that study.
A listening test was performed, using dynamic single objects as stimuli and the following systems under test:
- gVBAP, middle of screen
- gVBAP, RM2
- VBAP, two possible triangulations
The results of the test were that gVBAP, middle of screen, had the highest mean score of the systems under test. Using differential scores relative to gVBAP, middle of screen, VBAP was worse at the 95% level of significance. For objects placed to the side, however, VBAP had the best performance.
Based on these results, the contribution recommends:
1) to add a rule for
- gVBAP using a center front imaginary speaker, and
- gVBAP using a center rear imaginary speaker if there is no physical speaker there;
2) to remove the tables for imaginary loudspeaker placement from the WD;
3) to remove the tables listing the triangulations for 22.2, 10.1 and 8.1 from the WD, since they can be calculated by the QuickHull algorithm.
It was the recommendation of the AhG to adopt recommendations 1) and 2). Concerning recommendation 3), the presenter will investigate the 22.2, 10.1 and 8.1 triangulations created by QuickHull and report back to the group.
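For reference, QuickHull is the algorithm implemented by Qhull, which SciPy exposes, so a triangulation of a loudspeaker layout can be derived roughly as sketched below. The five-speaker layout is a toy example, not one of the normative 22.2/10.1/8.1 tables.

```python
# Sketch: computing a loudspeaker triangulation with QuickHull via SciPy's
# Qhull binding. The layout below is a toy example, not a normative table.
import numpy as np
from scipy.spatial import ConvexHull

def unit_vector(azimuth_deg: float, elevation_deg: float) -> np.ndarray:
    """Convert a speaker's (azimuth, elevation) to a point on the unit sphere."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

# (azimuth, elevation) in degrees: L, R, C plus two height speakers
layout = [(30, 0), (-30, 0), (0, 0), (45, 35), (-45, 35)]
points = np.array([unit_vector(az, el) for az, el in layout])

hull = ConvexHull(points)  # Qhull implements the QuickHull algorithm
print(hull.simplices)      # each row lists the speaker indices of one triangle
```

For an actual layout, the imaginary speakers from recommendation 1) would presumably be inserted before running the triangulation, so that gaps in the physical layout are covered.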
Achim Kuntz, FhG-IIS, provided further information. After some discussion it was agreed that:
- the QuickHull triangulation tables were accepted by the Audio subgroup, and
- a slight revision of the processing rules was accepted by the Audio subgroup.
Achim Kuntz, FhG-IIS, presented
m33228 "Active downmix setting signaling" (Alexander Adami, Achim Kuntz)
The contribution reviews what is currently supported in the RM: transmission of a downmix matrix with active downmixing in the decoder (i.e. active downmix with phase alignment and energy preservation), which has been demonstrated to give high-quality results.
However, it notes that broadcasters have requested support for passive downmix (i.e. applying the downmix without any adaptation).
The contribution proposes two new parameters in the downmix config of the bitstream (see the sketch below):
- Phase-align strength (phaseAlignStrength): permits use of additional mappings from ICC to phase alignment.
- Adaptive EQ strength (adaptiveEqStrength): weights between passive and active downmixes.
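A minimal sketch of how an adaptiveEqStrength-style parameter could weight between the two downmix flavors. The linear mapping from the bitstream field to a blend weight is an assumption for illustration; the contribution defines the actual semantics.

```python
# Sketch of blending passive and active downmix outputs; the linear mapping
# from the 3-bit field to a blend weight is an illustrative assumption.
import numpy as np

def blend_downmix(passive_out: np.ndarray, active_out: np.ndarray,
                  adaptive_eq_strength: int, num_levels: int = 7) -> np.ndarray:
    """0 selects the purely passive downmix, num_levels the fully active one."""
    w = adaptive_eq_strength / num_levels
    return (1.0 - w) * passive_out + w * active_out

# Example with dummy downmixed signals (2 output channels, 4 samples)
passive = np.ones((2, 4))
active = 2.0 * np.ones((2, 4))
print(blend_downmix(passive, active, 0))   # all ones: passive only
print(blend_downmix(passive, active, 7))   # all twos: active only
```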
Takehiro Sugimoto, NHK, indicated that broadcasters, including NHK, are very interested in this functionality. He further asked whether the two controls, phaseAlignStrength and adaptiveEqStrength, can be controlled by the user, i.e. as an interactivity feature. The presenter replied that interactivity is not supported by the proposal.
Johannes Boehm, Technicolor, asked what would happen if a downmix matrix is transmitted but the user's loudspeakers are not in the expected locations. The presenter stated that this is not clear in the current WD text.
Werner Oomen, Philips, questioned the value of the 3-bit scale values as opposed to a simple 1-bit on/off value, i.e. no active downmix or default active downmix.
Gregory Pallone, Orange, noted that the 3-bit scale values would be set in response to the artistic intent, and that this intent would have to be communicated, e.g. via workflow metadata, to the MPEG-H 3D Audio compression engine.
Yeshwant Muthusamy, Samsung, asked whether broadcasters need the 3-bit granularity in the parameters. He noted that it would be good to request information from e.g. ATSC.
It was the recommendation of the AhG to implement a 1-bit flag in the downmix config to indicate active or passive downmix with an additional 7 reserved bits, and to seek input from broadcasters as to whether the proposed 7-level active/passive granularity is useful. If so, then this functionality can be added in response to a CD ballot comment.
Takehiro Sugimoto, NHK, presented
m33080 "Latest mixing method for 22.2 ch and derived 5.1 ch downmixing coefficients" (Takehiro Sugimoto, Kensuke Irie, Yasushige Nakayama)
The contribution notes that in the initial development of Super Hi-Vision, the sound scene was primarily ambient. However, to support cinema or television content, the sound scene must contain highly localized sound objects, e.g. dialog or foley sounds. Most importantly, depending on how the 22.2 program is mixed (i.e. the nature and placement of sound objects), it may be required to use a specific downmix matrix, e.g. for producing a 5.1 channel output. Finally, NHK requests that 3D Audio support:
- a transmitted downmix matrix that must be used in a passive manner, so that the output can be confirmed by the broadcaster during production, and
- an adaptive downmix that is e.g. a best effort by a 3D Audio decoder.
NHK anticipates that the user can switch between the two modes.
Achim Kuntz, FhG-IIS, noted that if a 3D Audio decoder receives a description of loudspeaker positions, then it operates in adaptive downmix mode; otherwise it operates in passive downmix mode. If the user does not have e.g. an ITU-R 5.1 setup but wishes to use the transmitted downmix matrix, then the implementation could give the "false" information that the user does have an ITU-R 5.1 setup, so that the 3D Audio decoder uses the transmitted downmix matrix. Such decoder interfaces and controls may be at the application layer and would be outside the scope of the 3D Audio decoder.
The Chair concluded that all concerns raised by NHK have been met by that portion of the technology of the previous contribution (m33228) that was accepted into the CD.
Juergen Herre, FhG-IIS/International Audio Laboratories Erlangen (AudioLabs), presented
m33192 "Proposal for generic rendering support for MPEG-H SAOC 3D" (Adrian Murtaza, Jouni Paulus, Leon Terentiv, Juergen Herre, Harald Fuchs)
The contribution notes that the RM0 SAOC-3D can render to a designated set of loudspeaker layouts, e.g. 22.2, 10.1, 8.1, 7.1, 5.1 and 2.1. For other configurations, SAOC-3D renders to the next "higher" layout and the format converter renders to the target output layout. The integrated architecture agreed at the Hannover 3D Audio AhG meeting removed the format converter from the SAOC-3D processing chain, which has necessitated the changes proposed in the contribution.
The proposed new method is to specify a set of rules for allocating decorrelators to full-bandwidth audio channels, and an algorithm for sharing decorrelators in the case that there are more full-bandwidth audio channels than decorrelators. The presenter noted that m33198 addresses the issue of how many decorrelators are available in the 3D Audio decoder.
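One simple sharing rule would be round-robin reuse, sketched below. The policy is an illustrative placeholder for the shape of the problem; the actual allocation rules are those proposed in the contribution.

```python
# Sketch of decorrelator sharing when full-bandwidth output channels
# outnumber the available decorrelators; the round-robin policy is an
# illustrative placeholder, not the rule set proposed in m33192.
def assign_decorrelators(num_channels: int, num_decorrelators: int) -> dict:
    """Map each full-bandwidth output channel to a decorrelator index."""
    return {ch: ch % num_decorrelators for ch in range(num_channels)}

# e.g. 22 full-band channels shared over 8 decorrelators
mapping = assign_decorrelators(22, 8)
print(mapping[0], mapping[8], mapping[21])  # 0 0 5: channels 0 and 8 share one
```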
The Chair noted that it is clear that modifications to SAOC-3D are needed. There were concerns raised by experts that it was not clear exactly what new syntax and semantics are being proposed. The presenter will get more detail and make an additional presentation later in the week.
Further discussion
The presenter had break-out sessions with interested experts to address concerns. These concerns and their resolution were summarized for Audio experts.
It was the consensus of the Audio subgroup to adopt the proposals into the CD text.
Richard Furse, Blue Ripple Sound, presented
m33176 "Support for Proprietary Renderers in MPEG-H Audio" (Richard Furse)
The contribution noted statements from industry (e.g. DCI and BS.2266) that industry wishes to have audio decoders interoperate with alternative, proprietary renderers. In the final audio mix, the mix engineer may work with an MPEG 3D Audio renderer or some other renderer. In the latter case, MPEG 3D Audio would have to accept content produced with that other renderer, and this content may be best rendered by a non-MPEG 3D Audio renderer.
The Chair noted that it seems likely that a proprietary renderer would need (see the illustrative sketch below):
- USAC decoded output and the USAC element to loudspeaker mapping, for channel-based signals;
- USAC decoded output and location metadata, for objects;
- reconstructed HOA N-order representations and nearfield compensation metadata;
- SAOC fully decoded and rendered signals (but this needs to be checked).
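Purely to illustrate the shape of such an interface, a sketch follows; every field name is hypothetical, and nothing here is proposed text.

```python
# Hypothetical container for handing decoded signals and metadata to an
# external, proprietary renderer; all field names are illustrative only.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class RendererInterfacePayload:
    channel_signals: Dict[str, List[float]]       # USAC output keyed by loudspeaker label
    object_signals: Dict[str, List[float]]        # USAC output per audio object
    object_positions: Dict[str, Tuple[float, float, float]]  # azimuth, elevation, distance
    hoa_coefficients: List[List[float]]           # reconstructed order-N HOA representation
    nearfield_compensation: dict = field(default_factory=dict)   # HOA NFC metadata
    saoc_rendered: Dict[str, List[float]] = field(default_factory=dict)  # if confirmed needed
```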
The Chair noted that Audio experts endorse the concept of supporting proprietary renderers; however, on this issue there are more questions than answers. He further noted that contributions m33134 and m33131 should be studied, as they may address how proprietary rendering interfaces might work.
This topic will continue to be discussed during the week.
Zongxian LIU, Panasonic, presented
m33348 "Core Experiment Proposal on Low Complexity HOA Rendering" (Zongxian LIU, Naoya TANAKA)
The contribution proposes methods to reduce the complexity of RM1 HOA decoding. The changes do not result in any change in the decoded output, but do request one additional bit in the bitstream.
Oliver Wuebbolt, Technicolor, noted that there is no need for an additional bit in the bitstream, since the HOA extension payload already has equivalent information.
The Chair noted that there is no need for the proposed new bit in the bitstream, as equivalent information is already present. He proposed three possible actions to take based on the contribution:
1) No action. The contribution is noted but is for information only.
2) The technology in the contribution is put in an informative annex of the CD.
3) Take action 2) and also put an implementation of the technology into the Reference Software.
The group will continue to discuss which of the three actions is to be selected.
Later in the week Panasonic experts made a proposal for option 2), above.
It was the consensus of the Audio subgroup to put this technology into an informative annex of the CD.
Frank Baumgarte, Apple, gave a presentation that covered the following contributions:
m33249 "WD Text on Dynamic Range Control" (Frank Baumgarte, David Singer, Fabian Kuech, Michael Kratschmer, Christian Uhle, Bernhard Neugebauer, Michael Meier)
m33250 "List of Modifications of Dynamic Range Control Tool" (Frank Baumgarte, David Singer, Fabian Kuech, Michael Kratschmer, Christian Uhle, Bernhard Neugebauer, Michael Meier)
m33251 "Dynamic Range Control Reference Software" (Frank Baumgarte, Michael Kratschmer, Bernhard Neugebauer, Michael Meier)
m33252 "Dynamic Range Control Reference Software: List of Bugfixes" (Frank Baumgarte, Bernhard Neugebauer, Michael Meier)
m33253 "Dynamic Range Control Reference Software: List of Modifications" (Frank Baumgarte, Bernhard Neugebauer, Michael Meier)
The presentation listed the modifications to the DRC tools, all of which are incorporated into the WD text contribution.
The presentation listed the bugfixes to the Reference Software, all of which are incorporated into SVN revision 226.
The presentation listed the modifications to the Reference Software that increase functionality so that it more fully supports normative aspects of the WD text.
It was the recommendation of the AhG to accept all proposals into the WD on DRC.
The presenter noted that, at the 107th meeting, there was a WD on 14496-3/AMD 5, Support for Dynamic Range Control. This could progress to CD at this meeting.
Frank Baumgarte, Apple, presented
m33254 "Dynamic Range Control Tool Extension Proposal" (Frank Baumgarte)
The contribution proposes an extension to the DRC tools to support "ducking," as might be used to attenuate the main program when a narration commentary is active. Each narration channel group would have an associated ducking gain sequence. Ducking would be applied to all channels that are not in the narration channel group, with a scaling factor that can vary per target channel group. In this way, e.g., L and R can have one ducking scaling while Ls and Rs have a different ducking scaling.
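A minimal sketch of the described behavior follows; the names and the "gain of 1.0 means no attenuation" convention are assumptions for illustration, not the normative syntax.

```python
# Sketch of per-channel-group ducking; names and the gain convention
# (1.0 = no attenuation) are illustrative assumptions.
import numpy as np

def apply_ducking(groups: dict, narration: str,
                  ducking_gain: np.ndarray, group_scaling: dict) -> dict:
    """Attenuate every channel group except the narration group.

    ducking_gain is the per-sample gain sequence associated with the
    narration group; group_scaling scales its depth per target group,
    e.g. one value for L/R and another for Ls/Rs.
    """
    out = {}
    for name, samples in groups.items():
        if name == narration:
            out[name] = samples  # the narration itself is not ducked
        else:
            scale = group_scaling.get(name, 1.0)
            out[name] = samples * (1.0 - scale * (1.0 - ducking_gain))
    return out

# Example: full ducking depth on L/R, half depth on Ls/Rs
gain = np.array([1.0, 0.5, 0.5, 1.0])  # 0.5 = half amplitude while narrating
groups = {"narration": np.ones(4), "LR": np.ones(4), "LsRs": np.ones(4)}
ducked = apply_ducking(groups, "narration", gain, {"LR": 1.0, "LsRs": 0.5})
print(ducked["LR"])    # [1.   0.5  0.5  1.  ]
print(ducked["LsRs"])  # [1.   0.75 0.75 1.  ]
```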
It was the recommendation of the AhG to accept the proposals into the WD on DRC.
Recommendations and review of AhG Report
The AhG report was reviewed and was approved by the AhG members present.