Joint Meeting of Audio and Systems on 3D Audio, DASH and A/V Synchronization (Tue 1130-1230)
Max Neuendorf, FhG-IIS, gave a presentation on the proposed 3D Audio “Immediate Play-out Frame” (IPF). The key attributes of the IPF are:
- An IPF contains all Access Units (AUs) needed for both Random Access and pre-roll.
- It is therefore larger (i.e. more bits) than a typical AU and requires more decoder processing.
- Some overlap-add (e.g. 128 samples) is required to blend the signals at the cross-over point.
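The overlap-add at the IPF cross-over point can be pictured as a short crossfade. The sketch below is a minimal illustration, assuming a sine-squared (amplitude-complementary) fade over the 128-sample overlap mentioned above; the window shape and the helper name `crossfade` are hypothetical, not taken from the IPF proposal.

```python
import math


def crossfade(old_tail, new_head, n=128):
    """Blend the last n samples of the outgoing signal into the first n samples
    of the signal decoded from the IPF.

    Hypothetical helper: the sine-squared window is an assumption. Its fade-in
    and fade-out gains sum to 1 at every sample (amplitude-complementary).
    """
    if len(old_tail) < n or len(new_head) < n:
        raise ValueError("need at least n samples on each side of the cross-over")
    out = []
    for i in range(n):
        theta = (i + 0.5) / n * (math.pi / 2)  # sweeps 0..pi/2 across the overlap
        fade_in = math.sin(theta) ** 2
        fade_out = 1.0 - fade_in              # equals cos(theta) ** 2
        out.append(old_tail[len(old_tail) - n + i] * fade_out + new_head[i] * fade_in)
    return out


# Usage: a hard switch from 1.0 to 0.0 becomes a smooth 128-sample transition.
blended = crossfade([1.0] * 128, [0.0] * 128)
print(round(blended[0], 4), round(blended[-1], 4))
```

An amplitude-complementary window suits correlated signals (both sides decode the same content); for uncorrelated material a power-complementary window would be the usual alternative.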
Thomas Stockhammer noted that this could be a general mechanism for Random Access and splicing.
Ingo Hofmann, FhG-IIS, noted that a DASH client may fetch segments well in advance of when they are needed, so a bitrate peak at the SAP is not an issue.
Masayuki Nishiguchi, Sony, presented an application scenario for two-screen presentation.
Conclusions
- See the MPEG-2 TS extension “13818-1 PDAM 6 Delivery of timeline for external data.” Also see “Thoughts on timeline alignment.”
- If each stream uses the same MPEG-2 Systems Transport Stream 27 MHz clock, and a PLL locked to this clock can control the device’s local clock, then synchronization just works.
- Use PTS that have a valid “zero” synchronization.
Or perhaps:
- Lock the clocks of the two devices to a common external source (e.g. GPS).
- Have timestamps for each stream.
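As background for the PTS and 27 MHz clock points above: in MPEG-2 Systems, PTS values count a 90 kHz clock in a 33-bit field, and the PCR carries the 27 MHz system clock as a 33-bit base (90 kHz units) plus a 9-bit extension (0 to 299). The sketch below assumes, as the conclusions require, that both streams share one time base; it converts a PTS difference to seconds while handling the 33-bit wrap-around. The function names are illustrative only.

```python
PTS_CLOCK_HZ = 90_000        # PTS/DTS tick rate
PTS_MODULUS = 1 << 33        # PTS is a 33-bit field and wraps around
PCR_EXT_PER_BASE_TICK = 300  # 27 MHz / 90 kHz


def pcr_to_27mhz_ticks(pcr_base, pcr_ext):
    """Combine the 33-bit PCR base and the 9-bit extension into 27 MHz ticks."""
    if not 0 <= pcr_ext < PCR_EXT_PER_BASE_TICK:
        raise ValueError("PCR extension must be in 0..299")
    return pcr_base * PCR_EXT_PER_BASE_TICK + pcr_ext


def pts_offset_seconds(pts_a, pts_b):
    """Signed offset of stream A relative to stream B, in seconds.

    Assumes both PTS values refer to the same clock. The difference is taken
    modulo 2**33 and mapped into [-2**32, 2**32) so that a wrap-around is not
    mistaken for a huge offset.
    """
    delta = (pts_a - pts_b) % PTS_MODULUS
    if delta >= PTS_MODULUS // 2:
        delta -= PTS_MODULUS
    return delta / PTS_CLOCK_HZ


# Usage: main screen at PTS 180000, second screen at PTS 177000.
print(pts_offset_seconds(180_000, 177_000))
```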
Fingerprinting is an Audio matter, and information on it is for out-of-scope implementations.
SMPTE also has an effort on this topic.
It was decided that the Sony contribution would be an exploration.
File format issues (Tue 1700-1800)
The Audio Chair presented the EBU “Audio Model” document.
Audio will generate a Liaison statement saying “MPEG is happy to support EBU in mapping the EBU Audio Model to the MPEG-4 File Format.”
David Singer, Apple, presented the WD on Improved audio support in the ISO Base Media File Format, which will be a WG11 output document from this meeting.
The Audio subgroup agreed to study this WD as part of the DRC workplan and to propose answers to the “questions” explicitly posed in the WD text (Section 2.5). For example, File Format boxes may be expressed in an equivalent but more compact form for use in Audio elementary streams.
MPEG Assets: Conformance (Wed 1130-1200)
All MPEG Audio conformance data will be put on the FTP account at wg11.sc29.org.
This will be done by the Audio Chair (see Christian Tulvan email of April 26, 2013).
This includes:
- Completed work: copy SC29 freely available standards to this repository
- On-going work: copy cross-checked data to this repository
This conformance information can be viewed and downloaded at this URL:
http://wg11.sc29.org/conformance
Augmented Reality (Wed 1200-1300)
The 3DG Chair reviewed the conclusions of the joint meeting at the previous MPEG meeting. He gave an overview of MPEG-4 Advanced Audio BIFS nodes that might be used in Augmented Reality.
Experts noted that estimates of the real-world acoustic environment could be done by:
- Visually observing the room geometry and making an assumption about acoustic reflectivity
- Having the user emit a “probe signal” (e.g. a click) whose acoustic impulse response is then used to estimate the room’s acoustic properties
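To make the probe-signal idea concrete: once an impulse response has been measured, a common way to summarize it is the reverberation time. The sketch below uses Schroeder backward integration with a T20 fit (decay from -5 dB to -25 dB, extrapolated to 60 dB). This textbook method is chosen for illustration and was not proposed at the meeting; it assumes a clean, noise-free impulse response `ir` sampled at `fs` Hz.

```python
import math


def rt60_from_impulse_response(ir, fs):
    """Estimate reverberation time (seconds) from an impulse response.

    Schroeder backward integration gives the energy decay curve (EDC); the
    time taken to fall from -5 dB to -25 dB (a T20 fit) is multiplied by 3 to
    extrapolate to a 60 dB decay.
    """
    # Backward-integrated energy: edc[n] = sum of ir[k] ** 2 for k >= n.
    edc = [0.0] * len(ir)
    acc = 0.0
    for n in range(len(ir) - 1, -1, -1):
        acc += ir[n] * ir[n]
        edc[n] = acc
    total = edc[0]
    if total <= 0.0:
        raise ValueError("impulse response has no energy")

    def first_index_below(level_db):
        # First sample at which the normalized EDC has dropped to level_db.
        for n, e in enumerate(edc):
            if e <= 0.0 or 10.0 * math.log10(e / total) <= level_db:
                return n
        raise ValueError("impulse response does not decay by %g dB" % -level_db)

    return 3.0 * (first_index_below(-25.0) - first_index_below(-5.0)) / fs
```

In a real room measurement, background noise flattens the tail of the EDC, so practical tools truncate or noise-compensate the response before integrating.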
The 3DG Chair noted that Augmented Reality could enhance or process aspects of real-world audio.
The locations of (possibly many) microphones could be used to perform generalized beamforming as a means of generalized audio scene analysis.
Task Group discussions 3D Audio
Monday - 3D Audio Information
The Chair presented:
m31416 | Timeline and Objectives for MPEG-H 3D Audio | Schuyler Quackenbush
Nils Peters, Qualcomm, presented:
m31215 | Description of Qualcomm’s 3D-Audio listening room | npeters@qti.qualcomm.com, mmorrell@qti.qualcomm.com, dsen@qti.qualcomm.com
Binauralization formats
Werner Oomen, Philips, presented:
m31367 | HRTF input file format update | Aki Härmä, Werner de Bruijn, Werner Oomen
The contribution points out that different use cases (e.g. battery powered vs. line powered) motivate different binauralization computational complexity (e.g. parametric vs. FIR). Hence, there is a need for different file-based representations of a BRIR.
Thomas Sporer, FhG-IDMT, and Clemens Par, Suissaudec, reported on the status of X212 standardization at the 135th AES meeting. There will be a draft in December 2013, followed by a ballot. Any AES member can comment during the ballot period.
Marc Emerit, Orange, presented:
m31427 | Thoughts on Binaural Decoder Parameterization | Marc Emerit, Gregory Pallone
The contribution poses the question: should MPEG-H 3D Audio be required to read the AES X212 SOFA file format? There could be another interface to the 3D Audio decoder, which could read:
- binaural parameters (e.g. DE FIR, LR parameters)
- head tracker parameters
- loudspeaker positions
Clemens Par, Suisaudec, noted that SOFA (which is the X212 format) can have additional restrictions imposed by MPEG 3D Audio so that the file format is more “light weight.”
Marc Emerit, Orange, proposed the following:
- A module reads the X212 SOFA file and produces, e.g., parameter sets A and B in a specified data structure.
- 3D Audio receives, e.g., parameter sets A and B in that specified data structure so that it can perform binauralization.
Audio Subgroup Experts will work to harmonize the Orange 3D Audio interface and the Philips 3D Audio interface and report back later in the week.
Jan P, FhG-IIS, noted that we should pick a normative binauralization technology and support its low-level parameterization. Werner Oomen, Philips, supported the idea of also supporting additional, generic, parametric representations.
Tuesday - 3D Audio RM0 Technical Descriptions
CO
Johannes Hilpert, FhG-IIS, presented:
m31450 | Working Draft Text of MPEG-H 3D Audio CO | Andreas Hölzer, Max Neuendorf, Johannes Hilpert, Michael Fischer, Christian Borß, Sascha Dick, Christian Helmrich, Adrian Murtaza, Simone Füg, Achim Kuntz, Sascha Disch, Andreas Niedermeier, Christian Neukam
m31449 | Software for MPEG-H 3D Audio CO | Christian Ertel, Michael Kratschmer, Adrian Murtaza, Sascha Dick, Andreas Hölzer, Andreas Niedermeier, Michael Fischer, Simone Füg, Christian Borß, Bernhard Neugebauer
The contributions give an overview of the 3D Audio encoder and decoder as WD text and RM software.
WD Text
The decoder supports:
- Discrete coding of
  - Channels
  - Prerendered objects (to channels)
  - Objects
- SAOC coding of objects
USAC-3D has the following new tools (with respect to USAC):
- Vertical channel pair coding
- Quad (vertical/horizontal) channel element coding
- The MPS 212 residual signal can be transmitted for designated time/frequency tiles only.
- Transform splitting: two 512-length transforms are interleaved and processed as one 1024 block. Only TNS processing requires that the data be de-interleaved; all other processing is unchanged.
- Enhanced noise filling: noise filling uses actual noise patterns from another (e.g. highly correlated) frequency tile of the same channel (which might be L, R or Mid).
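The transform-splitting item can be illustrated with a simple coefficient interleave. The sketch assumes that the two 512-coefficient spectra are interleaved bin by bin (even positions from the first transform, odd positions from the second); the exact ordering used in USAC-3D is not given in these minutes, so treat the layout and the function names as hypothetical.

```python
def interleave(first, second):
    """Merge two equal-length spectra into one block of twice the length:
    first[k] -> position 2k, second[k] -> position 2k + 1 (assumed layout)."""
    if len(first) != len(second):
        raise ValueError("both transforms must have the same length")
    block = [0.0] * (2 * len(first))
    block[0::2] = first
    block[1::2] = second
    return block


def deinterleave(block):
    """Inverse of interleave(): recover the two spectra, e.g. before a tool
    (TNS, per the minutes) that must see each transform separately."""
    return block[0::2], block[1::2]


# Usage: two 512-coefficient spectra become one 1024-coefficient block.
a = [float(k) for k in range(512)]
b = [float(-k) for k in range(512)]
block = interleave(a, b)
print(len(block))
```

Because every other tool sees one 1024-coefficient block, only the stage that needs per-transform data has to call the inverse.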
Object metadata:
- Support for RAP
- Typically 2 to 3 kb/s per object
SAOC object coding:
- Extension of SAOC to support more objects and more downmix channels
- The downmix is not “compatible,” in that it is not meant to be listened to
- Up to 128 objects to a flexible number of downmix channels (e.g. 8 downmix channels at 256 kb/s)
- Uses SAOC parameter coding
Mix objects onto channels to get combined loudspeaker signals
Renderer:
- Done in the QMF domain
- Mapping
  - Mix channel mapped directly to an output loudspeaker
  - Panned between two or more output loudspeakers using VBAP (e.g. Voice of God to all loudspeakers in a 5.1 output configuration)
- Phase alignment and energy preservation
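For the VBAP mapping step, the pairwise gain computation can be sketched as follows. This is the classic two-dimensional (azimuth-only) VBAP formulation with an energy normalization, given only to illustrate the idea; the RM0 renderer works in the QMF domain and in three dimensions, and its details (loudspeaker triangulation, phase alignment) are not reproduced here. The function name is hypothetical.

```python
import math


def vbap_pair_gains(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Gains (g1, g2) for a source panned between two loudspeakers.

    Solves p = g1 * l1 + g2 * l2 for the unit direction vectors (2-D VBAP),
    then normalizes so that g1**2 + g2**2 == 1 (energy preservation).
    """
    def unit(az_deg):
        r = math.radians(az_deg)
        return math.cos(r), math.sin(r)

    px, py = unit(source_az_deg)
    l1x, l1y = unit(spk1_az_deg)
    l2x, l2y = unit(spk2_az_deg)
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-9:
        raise ValueError("loudspeaker directions are collinear")
    # Cramer's rule for the 2x2 system [l1 l2] [g1 g2]^T = p.
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# Usage: a source at 10 degrees between loudspeakers at +30 and -30 degrees.
print(vbap_pair_gains(10.0, 30.0, -30.0))
```

If either gain comes out negative, the source lies outside the chosen pair, and a different loudspeaker pair should be selected.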
Software:
- Modular libraries
- Connection via files: time-domain samples or QMF T/F values
Johannes Hilpert, FhG-IIS, presented:
m31445 | Additional Information on MPEG-H 3D Audio CO Technology | Sascha Dick, Sascha Disch, Leon Terentiv, Christian Helmrich, Johannes Hilpert
The contribution describes the changes to USAC to create USAC-3D and the changes to SAOC-DE to create SAOC-3D. Changes are described above.
There was some discussion on whether to create an MPEG-D USAC “v2” that would be USAC-3D. USAC-3D will persist and can perform the role of “MPEG-D USAC v2”; one option is to move it to MPEG-D. Alternatively, if USAC does not have significant market penetration, MPEG could amend MPEG-D to incorporate MPEG-3D such that today’s MPEG-D no longer exists and there is only one USAC coder that has all features of MPEG-3D.
Johannes Hilpert, FhG-IIS, presented:
m31440 | Information on the Software for MPEG-H 3D Audio CO | Andreas Hölzer, Achim Kuntz, Adrian Murtaza, Sascha Dick, Michael Fischer
The contribution reports on bugs found in the RM0 software. The presenter felt that the bug fixes do not have a significant perceptible impact on the decoded output. Fixed code with a quality assurance test could be available at the next meeting. If so, CEs could proceed with RM0 bitstreams and decoded waveforms.
It was the consensus of the Audio subgroup to incorporate all the bug fixes into the reference software. The Audio subgroup anticipates that FhG-IIS will do a quality check on the bug fixes during the AhG period and bring the following to the next MPEG meeting:
- RM0.5 reference software
- RM0.5 bitstreams and decoded waveforms
A second point in the contribution was that RM0 uses the LD-SAOC filterbank for the format converter. At, e.g., the 512 kb/s rate, the core coder (USAC-3D) uses “classic” QMF and was interfaced to the renderer via time-domain signals. Evidence was presented (as listening test data) that using “classic” QMF in the renderer at this bit rate does not harm the performance.
Draft a workplan for a listening test to explore whether LD-SAOC provides an advantage vs. classic QMF.
HOA
Oliver Wuebbolt, Technicolor, presented:
m31408 | RM0-HOA Working Draft Text | Johannes Boehm, Peter Jax, Florian Keiler, Sven Kordon, Alexander Krueger, Oliver Wuebbolt, Gregory Pallone, Marc Emerit, Jerome Daniel
The HOA 3D Audio decoder consists of:
- Bitstream unpacking
- Parameter decoding
- Core codec channel decoding
- Mapping of core codec channels to Predominant/Ambient channels
- Synthesis of HOA channel coefficients using the Predominant/Ambient channels, with an overlap-add to smooth the transition between adjacent blocks
Binauralization can use as input either loudspeaker feeds or HOA coefficient channels (e.g. if there are fewer HOA coefficient channels than loudspeaker feeds). Binauralization is via the Orange algorithm.
Predominant sounds are identified using 900 possible points (directions) on the sphere.
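The minutes do not say how the 900 candidate directions are laid out. As an illustration only, the sketch below builds a near-uniform grid of 900 unit vectors with a Fibonacci (golden-angle) spiral and picks the grid point closest to an estimated direction; both the spiral construction and the helper names are assumptions, not the Technicolor RM0 design.

```python
import math


def fibonacci_sphere(n=900):
    """Return n roughly uniformly spread unit vectors (x, y, z)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)         # radius of the circle at height z
        phi = i * golden_angle             # rotate by the golden angle each step
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def nearest_direction(grid, v):
    """Index of the grid direction with the largest dot product with v."""
    return max(range(len(grid)), key=lambda k: sum(a * b for a, b in zip(grid[k], v)))


# Usage: snap an estimated source direction (straight up) to the grid.
grid = fibonacci_sphere(900)
print(nearest_direction(grid, (0.0, 0.0, 1.0)))
```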
Johannes Boehm, Technicolor, presented:
m31410 | RM0-HOA Reference Software | Johannes Boehm, Peter Jax, Florian Keiler, Sven Kordon, Alexander Krueger, Oliver Wuebbolt, Gregory Pallone, Marc Emerit, Jerome Daniel
The zip archive contains two software archives: the RM0 decoder and an informative encoder.
Additional Functionalities for Version 1
Max Neuendorf, FhG-IIS, presented:
m31360 | Contribution to MPEG-H 3D Audio Version 1 | Max Neuendorf, Jan Plogsties
The contribution proposes several new functionalities for MPEG-H 3D Audio in response to m14855, “Timeline and Requirements for MPEG-H 3D Audio Version 1,” from the 105th meeting. Proposed are architectures and/or coding schemes for:
- DRC, Loudness Control, Clipping Prevention, Peak Limiting: architecture
- Downmix: coding method
- Synchronization: problem statement and candidate solution
  - Can use ISO/IEC 13818-1:2013/AMD 6 “Delivery of timeline for external data”
- Random Access: problem statement and candidate solution
- Bitrate Adaptation: problem statement using the Random Access solution
  - The solution does not require continuity of “bitstream state” across the SAP boundary
Experts should draft a workplan to:
- harmonize the DRC/LC/gPL/PL tools with the selected DRC technology
- continue to discuss
  - meet with other MPEG experts in Video and Systems on A/V synchronization
Masayuki Nishiguchi, Sony, presented:
m31368 | Proposed scheme for synchronization of multiple audio streams | Shusuke Takahashi, Akira Inoue, Masayuki Nishiguchi, Toru Chinen
The contribution aims to leave a legacy “main screen” broadcast system completely unchanged; it is not possible to carry timestamps in the legacy system. However, it is also desired to have a second presentation, e.g. on a tablet, that can be synchronized with the legacy main-screen broadcast. The contribution identifies the following requirements:
- 32 ms time resolution
- Capability to work at very low SNR of the main-screen acoustic presentation
Evidence was presented indicating that the proposed technology will work in an acoustic environment with an SNR as low as -18 dB (i.e. the SNR at the tablet microphone).
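For scale, the 32 ms resolution requirement corresponds to 1536 samples at 48 kHz. The Sony scheme itself is not described in these minutes; the sketch below instead shows a generic brute-force cross-correlation search for the delay between a reference signal and a captured copy, then rounds the result to the 32 ms grid. The function names and the 48 kHz rate are assumptions for illustration.

```python
import random

FS_HZ = 48_000                       # assumed sample rate
RESOLUTION_S = 0.032                 # 32 ms requirement from the contribution
STEP = round(RESOLUTION_S * FS_HZ)   # 1536 samples


def estimate_lag(reference, captured, max_lag):
    """Return the lag (in samples) in -max_lag..max_lag that maximizes
    sum(reference[i] * captured[i + lag]) over the overlapping samples."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(captured):
                score += r * captured[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def quantize_to_resolution(lag_samples, step=STEP):
    """Round a lag to the nearest multiple of the 32 ms step."""
    return step * round(lag_samples / step)


# Usage: a pseudo-random reference and a copy delayed by 7 samples.
rng = random.Random(1)
ref = [rng.choice((-1.0, 1.0)) for _ in range(200)]
cap = [0.0] * 7 + ref
print(estimate_lag(ref, cap, 20))
```

A brute-force search is quadratic; practical systems would use FFT-based correlation or, as in fingerprinting schemes, compare compact features instead of raw samples, which is what makes low-SNR operation feasible.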
This discussion will be continued in the Audio, Systems joint meeting.
New CE
Werner Oomen, Philips, presented:
m31361 | Announcement of CE on 3D-Audio rendering | Werner de Bruijn, Aki Härmä, Werner Oomen
The contribution announces the intent to conduct a CE on rendering. Specifically, it proposes a CE that tests rendering for the case of very different loudspeaker positions, e.g. a front loudspeaker sound bar plus 2 surround speakers. The envisioned subjective test requires:
- The target loudspeaker geometry, e.g. a loudspeaker bar
- Typical room acoustics, e.g. a more typical reflectivity
It proposes that the rendering algorithm be chosen based on loudspeaker position and/or geometry. This assumes that the decoder can be informed of the target loudspeaker geometry.
The CE would use a MUSHRA test in which:
- The open reference is the 22.2 presentation
- There is no hidden reference
- RM0 renders to the CE speaker geometry
- RM0+CE renders to the CE speaker geometry
Philips would make the target loudspeaker device available to a cross-check site.
Richard Furse, Muse440, asked whether such an alternative rendering would be normative. The presenter stated that MPEG could choose to standardize additional renderers or just to support proprietary renderers via some normative interface.
The Chair noted that this CE is a vehicle to clarify:
- An API to an alternative renderer
- Whether an alternative renderer should be standardized
The Audio subgroup looks forward to more information at the next meeting.
3D Audio Information
The Chair presented:
m31679 | Some requirements on future audio codec | Schuyler Quackenbush, for Matthieu Parmentier, francetelevisions
- Head tracking is needed, and the specification needs an appropriate API.
- What is the loudness of > 5.1 presentations?
  - Consider a Liaison to ITU-R regarding BS.1770.
- Synchronization for a second device, e.g. to play a program in an alternate language
- Dialog Enhancement
  - Additional metadata is needed in CO 3D Audio to identify dialog objects
HOA CE/Merge
Nils Peters, Qualcomm, presented information on the following contributions:
m31216 | Technical Description of Qualcomm’s Candidate for the Accelerated HOA Core Experiment | npeters@qti.qualcomm.com, dsen@qti.qualcomm.com
m31411 | RM0-HOA-QCOM listening test results from Technicolor | Johannes Boehm, Peter Jax, Florian Keiler, Sven Kordon, Alexander Krueger, Oliver Wuebbolt
m31425 | Orange listening tests results for the CE on HOA | Gregory Pallone
The contribution reports on a CE in which the Technicolor “Predominant Sounds” module is replaced by the Qualcomm “Significant Sounds” module.
Results of a MUSHRA listening test with 24 listeners were presented. Analysis of the differential scores shows that:
- 5 items are better
- no items are worse
- the mean is better
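Classifying items as better, worse, or not different is typically based on confidence intervals of the per-listener score differences. A minimal sketch, assuming a two-sided 95% Student-t interval with 23 degrees of freedom (24 listeners, critical value about 2.069); the helper name and the decision rule are illustrative, not the exact analysis used by the test sites.

```python
import math
import statistics

T_CRIT_95_DF23 = 2.069  # two-sided 95% Student-t critical value for 23 degrees of freedom


def classify_item(diffs, t_crit=T_CRIT_95_DF23):
    """diffs: per-listener MUSHRA differences (CE minus RM0) for one item.

    The default t_crit assumes 24 listeners; pass another value for other
    listener counts. Returns (mean, half_width, verdict): 'better' if the
    whole confidence interval is above zero, 'worse' if it is below zero,
    'no significant difference' otherwise.
    """
    mean = statistics.mean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(len(diffs))
    if mean - half_width > 0:
        verdict = "better"
    elif mean + half_width < 0:
        verdict = "worse"
    else:
        verdict = "no significant difference"
    return mean, half_width, verdict


# Usage: 24 hypothetical listener differences for one item.
print(classify_item([5.0, 6.0, 4.0, 5.0] * 6))
```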
These items tended to be “dense” sound fields. In addition, the CE technology decoder, as compared to the RM0 decoder, had:
- 48% complexity reduction
- 35% memory reduction
Thomas Sporer, FhG-IDMT, asked whether the presenter had checked, on statistical grounds, whether the listening test sites could be pooled. The presenter confirmed that he had checked, and that an ANOVA does not show test site as a significant factor.
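The site-pooling check amounts to asking whether “test site” explains a significant share of the score variance. A minimal one-way ANOVA sketch, assuming one list of scores per site; a full analysis would usually include item and system as additional factors and compare F against the appropriate critical value or p-value, which is left out here (SciPy’s `scipy.stats.f_oneway` computes both F and p for the one-way case).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of scores per test site.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of the site means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own site mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    if ss_within == 0.0:
        raise ValueError("no within-group variance")
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Usage: two hypothetical sites with three scores each.
print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

With these example numbers, F = 13.5 on (1, 4) degrees of freedom; identical sites give F = 0.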
Max Neuendorf, FhG-IIS, asked for the raw listener data. The Chair said that it would be made available. The Chair noted that there appears to be support for:
A final decision will be made after experts have the opportunity to review the data.
Continued Discussion
Werner Oomen, Philips, stated there were significant differences between listening test sites.
Bernhard Grill, FhG-IIS, reminded the group that the goal is to build the best standard, and suggested drafting a workplan to investigate further, e.g., the performance of the integrated system.
Deep Sen, Qualcomm, noted that Qualcomm has an algorithm that selects between the Technicolor and the Qualcomm tools in a signal-responsive manner.
The Chair suggested that on Friday morning Qualcomm present information on what the discriminator tool would do in terms of deciding which tool to use (Qualcomm or Technicolor) in intervals of some TBD length for each CfP test item. If that information permits Audio experts to infer the quality rating of the integrated system including the discriminator tool, then a decision could be made. Otherwise, Qualcomm should conduct a listening test to determine the performance of the integrated system including the discriminator tool and report this at the next meeting.
General Discussion
Max Neuendorf, FhG-IIS, made a presentation on the “Contributions to V1” document.
It was the consensus of the Audio subgroup to:
- Add to the Workplan on DRC that the DRC to be produced is extended and adapted to support the FhG-IIS proposal (i.e. Loudness Normalization, DRC, guided Clipping Prevention, Peak Limiter) and should work with the 3D Audio architecture. Use of the Peak Limiter is optional, but strongly encouraged.
- Add to the Workplan on DRC a study of how to represent, store and transmit Downmix Matrix Coding and optional equalization parameters for use prior to downmix.
- Note in the Workplan on 3D Audio: Downmix Matrix Coding and equalization parameter coding are added to the RM0 text and will be available at the next MPEG meeting.
- Note in the Workplan on 3D Audio: Immediate Playout Frame coding is added to the RM0 text and will be available at the next MPEG meeting.
- Note in the Workplan on 3D Audio: Audio experts should study MPEG-2 Transport and how it is able to support two-screen synchronization.
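Downmix matrix coding ultimately describes a gain matrix applied to the decoded channels. The sketch below applies such a matrix to one frame of samples; the 5.0-to-stereo coefficients (1 for the front channels, about 0.707 for center and surrounds, as in the familiar ITU-R BS.775 downmix) are an example assumption here, not the coefficients or the coding method proposed by FhG-IIS. The optional equalization step is not modelled.

```python
import math

G = 1.0 / math.sqrt(2.0)  # about -3 dB

# Rows: output channels (Lo, Ro); columns: input channels (L, R, C, Ls, Rs).
DOWNMIX_5_0_TO_STEREO = [
    [1.0, 0.0, G, G, 0.0],
    [0.0, 1.0, G, 0.0, G],
]


def apply_downmix(matrix, frame):
    """frame: list of input channels, each a list of samples of equal length.

    Returns the output channels: output[o][t] = sum over c of matrix[o][c] * frame[c][t].
    """
    n_in = len(frame)
    if any(len(row) != n_in for row in matrix):
        raise ValueError("matrix width must match the number of input channels")
    n_samples = len(frame[0])
    return [
        [sum(row[c] * frame[c][t] for c in range(n_in)) for t in range(n_samples)]
        for row in matrix
    ]


# Usage: a center-only signal appears at -3 dB in both stereo outputs.
frame = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.5], [0.0, 0.0], [0.0, 0.0]]
lo, ro = apply_downmix(DOWNMIX_5_0_TO_STEREO, frame)
print(lo, ro)
```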
Joe H made a presentation on testing the significance of the LD-SAOC update issues
It was the consensus of the Audio subgroup to conduct the test as indicated and to document the test process in the Workplan on 3D Audio.
A/V Synchronization
Masayuki Nishiguchi, Sony, presented the Workplan on Audio Synchronization.
Oliver Wuebbolt, Technicolor, noted that there are two classes of applicable technology:
- Fingerprinting
- Watermarking
He further noted that the test conditions for the evaluation should be defined.
The workplan will be edited and reviewed again on Friday.
The Chair encourages Audio experts to bring contributions to the next MPEG meeting on the topic of 3D Audio Renderer API.
Workplan on DRC
Fabian Kuech, FhG-IIS, presented a draft workplan which was reviewed.
Binauralization CE
Jan Plogsties, FhG-IIS, made a presentation with the following points:
- 5 of 6 systems under test are similar
- There needs to be an API for alternative binauralization algorithms, including
  - Output of the MPEG-H decoder
  - BRIR input
- FhG-IIS can make the “Mozart” BRIR available as part of the 3D Audio Reference Software
- Check some or all Binauralization CE technology with other BRIRs
- Define an API (or APIs) so that use of an alternate BRIR with normative 3D Audio binauralization is possible
- Define an API so that alternate binauralizations are possible
The Chair summed up the consensus points in the group. Audio experts agree that 3D Audio:
- Should support alternate, proprietary binauralization to headphones.
- Should support alternate, proprietary rendering.
3D Audio should support a normative interface to decoded channel/object/HOA signals and metadata. The Audio subgroup looks forward to contributions on these topics at the next meeting.
3D Audio should support a normative interface to BRIR. There was discussion on “AES X212” or “parametric” interfaces, and this needs more discussion. The Audio subgroup looks forward to contributions on this topic at the next meeting. Audio experts agree that there should be a path so that users have access to a potentially large market of BRIRs.
Selection of Binauralization Technology
Gregory Pallone, Orange, gave a presentation.
He proposed revised FFT and QMF complexity figures.
He noted that a way forward could be to select a time-domain and a QMF domain binauralization, with selection based on whether data from decoded channel/object/HOA signals is time-domain or QMF domain.
He further noted that the marketplace BRIRs could be time-domain BRIR, QMF domain representations or other parameterized representations. One algorithm for converting time-domain BRIR to 3D Audio parameterized representations should be normative. Alternate parameterizations would be available via normative parameter interfaces.
Jeongil Seo, ETRI, stated a different view: that time-domain BRIR in the marketplace is enough, and that a further parameterization block will deliver the necessary parameters to the 3D Audio binauralization.
Taegyu Lee, Yonsei and Henney Oh, Wilus, gave a presentation on complexity of the various binauralization algorithms.
Jan Plogsties, FhG-IIS, stated that computational complexity analysis in the MPEG Audio subgroup is traditionally paper-and-pencil analysis. He further proposed that Audio either:
- Test the performance of systems using other BRIRs, or
- Analyse systems to determine their flexibility for other BRIRs
Werner proposed selecting one technology now and analyzing it with respect to BRIRs.
The Chair proposed that:
- A Workplan be drafted containing additional details to be reported in contributions to the next meeting, covering:
  - Long and short BRIRs
  - A listening test for IIS, YU, QUAL, ETRI, HUA, ORA
- Experts bring more information on FFT and QMF complexity to the next meeting.
SAOC
Harald Fuchs, FhG-IIS, presented information on the following contributions:
m31270 | Comments on SAOC Dialog Enhancement Profile | Joonil Lee, Jeongil Seo, Henney Oh
m31407 | Information on Dialog Enhancement profile for SAOC | Oliver Hellmuth, Harald Fuchs, Jürgen Herre, Sascha Disch, Jouni Paulus, Leon Terentiv, Falko Ridderbusch, Adrian Murtaza
m31406 | Consideration of downmix mastering effect compensation processing for SAOC-DE | Oliver Hellmuth, Harald Fuchs, Jürgen Herre, Sascha Disch, Jouni Paulus, Leon Terentiv, Falko Ridderbusch, Adrian Murtaza
The presenter proposed:
- PDAM text modifications based on a common view of all contribution authors
- Disabling Low Power mode in the SAOC-DE Profile (since there was virtually no LP complexity advantage in the context of SAOC-DE)
- Disabling the MCU tool
- Disabling Send effects; Insert effects are restricted
- Addition of an inverse PDG tool (to permit clean object manipulation even when the downmix has post-mastering)
It was the consensus of the Audio Subgroup to incorporate all recommended changes in the presentation into the SAOC-DE PDAM text. The Chair noted that the DoC on ISO/IEC 23003-2:2010/PDAM 3, Dialog Enhancement must still be reviewed by the group.
Leon Terentiv, FhG-IIS, presented:
m31409 | Report on corrections for MPEG SAOC | Oliver Hellmuth, Harald Fuchs, Leon Terentiv, Jouni Paulus, Falko Ridderbusch, Adrian Murtaza
The contribution proposes several corrections to the MPEG SAOC text; most align the text with the reference software, and some are editorial.
It was the consensus of the Audio Subgroup to issue these corrections as a Defect Report on MPEG SAOC text. The Chair noted that there must be an NB ballot comment that asks to “fix all known errors” in order to incorporate the Defect Report corrections into ISO/IEC 23003-2:2010/COR 2, SAOC that is expected to issue at the next meeting.
The Chair also noted that the editor should remember to correct the reference software error in some future DCOR.
USAC
The Chair presented:
m31390 | Quality evaluation results of the stereo expansion module with the USAC Common Encoder, JAME | Jeongook Song, Henney Oh, Schuyler Quackenbush, Hong-Goo Kang
The contribution reports on the final step in the JAME project, revising the stereo coding modules. Subjective listening test results are given that, in summary, show that JAME is 30 MUSHRA points better than the MPEG USAC reference encoder, and that the USAC Reference Quality Encoder (i.e. the USAC RM11 bitstreams) is 10 MUSHRA points better than JAME. The contribution notes that JAME1.13.5 (committed on September 26, 2013) is available on the MPEG SVN server.
The Chair noted that MPEG-H 3D Audio CE proponents can use the JAME code base to develop and verify a desired API, and then use the modified JAME code base to communicate that API to the 3D Audio RM proponent, so that the proponent can modify the Reference Quality codec to support it.
Daniel Fischer, FhG-IIS, presented:
m31363 | Proposed corrections to ISO/IEC 23003-3:2012/Amd.1 and Amd.2 | Daniel Fischer, Max Neuendorf
The contribution documents:
- Several conformance stream errors
- Several reference source code errors
It was the consensus of the Audio subgroup to issue this information as a Defect Report on USAC Conformance and Reference Software.