Joint Meetings
Raw Audio and Video, with 3DV, Wednesday 11:30 – 12:00
The Audio Chair presented
m28325 | Thoughts on ISO/IEC 14496-1:2010/FDAM 2 Support for raw audio-visual data | Schuyler Quackenbush
The contribution contained recommended changes to the Systems components to fully describe a raw audio or video signal. There was discussion on the tradeoffs of a simple Registration Authority description versus a simple Systems syntax.
Audio experts will propose a revised, more complete Systems syntax that is compatible with a simple Registration Authority description. David Singer, Apple, noted that Motion JPEG 2000 may have Systems syntax for carriage of Linear PCM.
MP4 FF extensions for Audio, with Systems, Thursday 9:00 – 10:00
David Singer, Apple, presented
m28160 | Enhanced Audio Support for MP4 File Format | David Singer
The presentation notes three issues in the MP4 FF and proposes solutions:
- Support for high sampling rates. The proposal is to have a 32-bit number for the sampling rate.
- Support for richer metadata: loudness and peak value. This would be “anchor” loudness, program loudness at various time window lengths, and peak value. This would be in a high-level box that is easily accessible.
- Support for richer metadata: dynamic range control.
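As an illustration of the first issue (not taken from the contribution): to the best of our understanding, the existing audio sample entry in the ISO Base Media File Format carries the sample rate as a 16.16 fixed-point value, which leaves only 16 bits for the integer part. The sketch below, with hypothetical function names, shows that 96 kHz does not fit in such a field while a plain 32-bit integer holds it easily.

```python
import struct

# Largest integer part of a 16.16 fixed-point value (assumed field layout).
MAX_16_16_RATE = (1 << 16) - 1


def pack_rate_16_16(rate_hz: int) -> bytes:
    """Pack a sampling rate as a big-endian 16.16 fixed-point field."""
    if not 0 <= rate_hz <= MAX_16_16_RATE:
        raise ValueError(f"{rate_hz} Hz does not fit in a 16.16 field")
    return struct.pack(">I", rate_hz << 16)


def pack_rate_uint32(rate_hz: int) -> bytes:
    """Pack a sampling rate as a plain big-endian 32-bit unsigned integer."""
    return struct.pack(">I", rate_hz)
```

With these helpers, 48000 Hz packs in either form, whereas `pack_rate_16_16(96000)` raises `ValueError` and `pack_rate_uint32(192000)` succeeds.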
Issues to be addressed by Audio experts
- Review
- Recommend a path to harmonizing with duplicate technology in Audio
- Consider what information is transmitted and whether the representation is appropriate
It was agreed to incorporate the 32-bit sample rate capability into the Technologies Under Consideration for 14496-12, ISO Base Media FF.
David Singer, Apple, presented
The presenter used the opportunity of the joint meeting to draft the DoC on this item.
3D Audio in Augmented Reality, with 3DV, Thursday 10:00 – 11:00
The Audio Chair and 3DV experts discussed what is needed in Augmented Reality. Experts noted that Audio technology coordinates are typically listener-centric, while Augmented Reality is scene-centric.
With objects and a listener both placed in a virtual scene, the objects' positions are generated in listener coordinates and then rendered.
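The conversion from scene-centric to listener-centric coordinates can be sketched as follows. This is a minimal 2D illustration and not something discussed at the meeting: the axis conventions (listener frame with x forward and y to the left, yaw measured counter-clockwise from the scene x-axis) and the function names are assumptions.

```python
import math


def scene_to_listener(obj_xy, listener_xy, listener_yaw_rad):
    """Express a scene-coordinate object position in the listener's frame.

    Returns (forward, left) in the same length unit as the inputs.
    """
    dx = obj_xy[0] - listener_xy[0]
    dy = obj_xy[1] - listener_xy[1]
    c, s = math.cos(listener_yaw_rad), math.sin(listener_yaw_rad)
    # Rotate the offset by -yaw so that "forward" lines up with the listener's gaze.
    forward = c * dx + s * dy
    left = -s * dx + c * dy
    return forward, left


def azimuth_and_distance(forward, left):
    """Listener-centric azimuth in degrees (positive to the left) and distance."""
    return math.degrees(math.atan2(left, forward)), math.hypot(forward, left)
```

A renderer working in listener coordinates would then take the azimuth and distance as its input for each object.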
Possible Audio work: add directivity and distance rendering for objects.
3D Audio Call for Proposals, with Requirements, Thursday 14:00 – 15:00
The 3D Audio Call for Proposals was presented and discussed. Gregory Pallone, Orange Labs, reminded experts that Peter Grosche, Huawei, had raised an inconsistency in the “envisioned standard” paragraph of the Call for Proposals (3.2), which was silent about HOA. That was fixed by adding the sentence that it “may support HOA inputs.” The Call was approved by Requirements. In addition, the 3D Audio CE Methodology was reviewed and the encoder source code requirements for CE integration were highlighted.
Task Group discussions MPEG-4, MPEG Surround, USAC Maintenance
Nikolaus Rettelbach, FhG-IIS, presented
m28129 | Interoperability Tests regarding "AMENDMENT 4: New levels for AAC profiles" | Nikolaus Rettelbach
The contribution zip archive contains an Excel spreadsheet with an extensive report on interoperability of 7.1 channel bitstreams and legacy products. Only implicit signalling was investigated.
Christof Fersch, Dolby, suggested a further investigation of the problematic bitstream/product combinations, this time using explicit channel configuration.
Daniel Fischer, FhG-IIS, presented
m28139 | Proposed update to MPEG Surround Reference Software | Julien Robilliard, Andreas Holzer
The contribution proposes fixes to the MPEG Surround Reference Software, which are described in the contribution and supplied as corrected software in the contribution zip archive.
It was the consensus of the Audio subgroup to issue the contribution as a DCOR.
Daniel Fischer, FhG-IIS, presented
m28113 | Proposed corrections to USAC | Daniel Fischer, Julien Robilliard, Andreas Niedermeier
The contribution proposes corrections to the specification text and the reference software.
Corrections to reference software only:
- Remove restrictions for lower sampling frequencies (always use 64 bands of the QMF filterbank)
- Correct the same two bugs reported in m28139
Corrections to reference software and text:
- In certain cases (i.e. when the arithmetic decoder is called and there is nothing to decode) the final state of the arithmetic coder is indeterminate. Fixes are provided for text and code to bring it into a deterministic state.
- Force phase angle to modulo(2*pi) for phase interpolation.
- Forbid combination of Unified Stereo and Complex Stereo Prediction.
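The phase-angle correction can be pictured with a generic wrapping routine. The sketch below is only an illustration of reducing a phase modulo 2*pi before interpolating; it is not the USAC reference software, and the shortest-arc interpolation and function names are assumptions.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap_phase(phi):
    """Reduce a phase angle modulo 2*pi into the range [0, 2*pi)."""
    wrapped = phi % TWO_PI
    # Floating-point rounding can return exactly 2*pi for tiny negative inputs.
    return 0.0 if wrapped >= TWO_PI else wrapped


def interpolate_phase(phi_a, phi_b, t):
    """Interpolate from phi_a to phi_b along the shorter arc, for t in [0, 1]."""
    delta = (phi_b - phi_a + math.pi) % TWO_PI - math.pi
    return wrap_phase(phi_a + t * delta)
```

Without the wrapping step, two angles that differ by a multiple of 2*pi would interpolate through unrelated intermediate phases even though they describe the same rotation.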
It was the consensus of the Audio subgroup to:
- Issue a Study on USAC Reference Software DAM
- Issue a DCOR on USAC text.
Daniel Fischer, FhG-IIS, presented
m28111 | Proposed study on ISO/IEC 23003-3:2012/DAM 1, USAC Conformance | Daniel Fischer, Max Neuendorf
The contribution has updates to the conformance spreadsheet and the conformance text. The text changes are:
- A number of bitstream restrictions
- Added descriptions of bitstreams that are supplied at this meeting
- Modified descriptions of existing bitstreams
The conformance spreadsheet is modified:
- Added “status” sheet to show available streams and cross-checked streams.
Current status: 95% of defined bitstreams are available and 50% of available bitstreams are cross-checked.
Support for MPEG Audio in the Marketplace
A Liaison statement to DVB was reviewed and approved.
3D Audio
Thomas Sporer, FhG-IDMT, presented
m28124 | Testing Non-Sweet Spot Listening Positions | Thomas Sporer, Christoph Sladeczek, Robert Steffens
The contribution presented ideas on how to test for audio quality in “non-sweet-spot” listening positions. It proposes a two-step approach:
- MUSHRA test at sweet spot
- Multi-stimulus test at non-sweet-spot locations (i.e. MUSHRA without open reference)
The results are subject to this analysis:
- Off sweet spot score for a system under test cannot be higher than its score in the sweet spot (higher scores are limited to score value of sweet spot).
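The limiting rule can be written down in a few lines. The sketch below is an illustration only: the report does not say at which level (per listener, per item, or on mean scores) the limit is applied, so the per-system dictionaries and function names are assumptions.

```python
def limit_to_sweet_spot(sweet_spot_score, off_sweet_spot_score):
    """Return the off-sweet-spot score, capped at the sweet-spot score."""
    return min(off_sweet_spot_score, sweet_spot_score)


def limit_all(sweet_spot_scores, off_sweet_spot_scores):
    """Apply the cap per system; both arguments map system name -> score."""
    return {
        system: limit_to_sweet_spot(sweet_spot_scores[system], score)
        for system, score in off_sweet_spot_scores.items()
    }
```

For example, a system scoring 80 in the sweet spot and 85 off the sweet spot would be recorded as 80 at the off-sweet-spot position, while a lower off-sweet-spot score is left unchanged.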
It proposes to modify the “alternate listener positions” as specified in BS.1116
- Proposed off-sweet-spot distance is less than what is defined in BS.1116
- 0.25 B in left, right and front-back directions
- It proposes two off-sweet-spot locations: Front-Left and Back-Right.
The contribution notes that sweet-spot evaluation is done in Test 1.1, and Test 1.2 therefore need only assess off-sweet-spot listener locations. Experts noted this proposal requires that a listener take both Test 1.1 and Test 1.2 and that it would be most robust if Test 1.1 immediately follows Test 1.2.
The presenter noted that Test 1.2 should have the same objectives, e.g. assessment of high-quality audio, as Test 1.1. Further, he noted that the listener instructions should not require that one item (i.e. usually the hidden reference) be scored at 100. If need be, Test 1.2 could use fewer test items or fewer listeners at each off-sweet-spot position.
There was broad agreement to adopt the proposed shorter distance from sweet-spot to off-sweet-spot locations.
However, these were left as open issues:
- 2 or 4 positions
- 1 or 2 bitrates.
Gregory Pallone, Orange Labs, presented
m28159 | Proposed Evaluation Procedures for 3D Audio | Gregory Pallone
The presenter noted that Test 1.2, off-sweet-spot, is very important because it represents a very common use case of many people in a home theatre setting.
The contribution proposes that Test 1.2 clearly state that it “will use same decoded wavefile as is used in Test 1.1.”
It was the consensus of the Audio subgroup to
- Test 1.2: modify the Call so that Test 1.2 clearly states that it “will use same decoded wavefile as is used in Test 1.1.”
For Test 1.3, the contribution does not support using a loudspeaker reference since:
- Listener may not be in BRIR room
- Listener can move the head to hear a different sound field when listening to speakers, but cannot when listening to headphones.
Four sites have contributed low-resolution BRIR: FhG-IIS, FhG-IDMT, Technicolor, Orange Labs. The presentation noted the advantages of low-resolution vs. high-resolution in binauralizing channel or object based content.
The contribution proposes:
- MUSHRA with open reference
- HR-HRTF for objects
- Subset of HR-BRIR (effectively LR-BRIR) for channels and HR-HRTF for both C+O and HOA (a different approach would be to render these to 22.2, which are then binauralized as for channel signals).
The contribution notes that this is equivalent to just choosing a HR-BRIR.
Werner Oomen, Philips, stated that since the HRTF are not individualized, the low-resolution is sufficient when one uses HRTF interpolation. Thomas Sporer, FhG-IDMT, noted that based on the experience of his lab, BRIR is better in that it captures aspects of the room, and that LR-BRIR is all that we have. Note that “better” is the ability of a subject to correctly localize a sound in space. He further noted that BRIR often helps subjects to externalize the sound scene.
Clemens Par, Suissaudec, asked the presenter if he had evaluated his proposal with respect to localization and coloration. The presenter said that the evaluation was performed both objectively with the figures shown during the presentation, and also with informal listening tests where the difference between HR-HRTF and LR-HRTF was audible.
The presenter stated that the use of HR-BRIR instead of LR-BRIR is probably equivalent to what m28181 shows (a decrease of 30 MUSHRA points), but acknowledges that HR-BRIR are not available. However since HR-HRTF are available, they could be used to create the reference for some objects and HOA items (the ones that have the most reverberation for example). All the audio experts agreed that if there were HR-BRIR available, they would have selected them.
The Chair noted that, as stated in the contribution, the LR-BRIR has deficiencies for object synthesis, and asked whether this deficiency is greater than the deficiency due to a “non-personalized” BRIR. The presenter responded that LR-BRIR can give incorrect Inter-aural Time Differences (when interpolating between loudspeaker positions), and hence might be a greater deficiency as compared to the lack of personalization.
This discussion topic came up again later, and Jean-Marc Jot, DTS, supported the use of HR-HRTF instead of LR-BRIR for items whose sources are available. Gregory Pallone, Orange Labs, stated that for several HOA items, he could provide the original mono sources and HOA panning laws.
However, it was finally agreed to not use the HR-HRTF because they were:
- unfamiliar to experts (especially with dynamic objects)
- research topic
The contribution notes that Test 1.4 is very important. The contribution has a MATLAB script in the Annex that can perform the randomization.
It was agreed to have a break-out to agree on “sectors” of the 22.2 loudspeaker set and that there should be different “sectors” for the choice of 5 of 22 and 10 of 22.
Jan Plogsties, FhG-IIS, presented
m28180 | Proposed test method for headphone rendering in MPEG-H 3D Audio | Jan Plogsties
Test 1.3 (headphones): The contribution proposes that
- Use two bitrates, e.g. high and low
- The set of selected test material requires BRIR from loudspeakers at 30 locations for the two ears. These should be diffuse-field equalized. (Contribution cites a reference for doing this.)
- The group needs to select a room for recording a set of BRIR.
- The Open and Hidden Reference is Original + BRIR convolution.
- The System under test is the Test 1.1 bitstream decoded and rendered using the BRIR data.
- 2-channel output signals for all systems are normalized in loudness.
- Use high-quality open-back headphone with diffuse field equalization.
- Listening is in a sound booth.
- Subject can adjust listening volume.
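The "Original + BRIR convolution" reference can be sketched as follows: each loudspeaker signal is convolved with the left-ear and right-ear BRIR for that loudspeaker, and the results are summed per ear. This is a minimal pure-Python illustration with hypothetical function names. It omits the diffuse-field equalization and loudness normalization listed above, and a practical implementation would use FFT-based convolution.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def binauralize(channels, brirs):
    """Render loudspeaker signals to two ears.

    channels: list of sample lists, one per loudspeaker.
    brirs:    list of (h_left, h_right) impulse responses, one pair per loudspeaker.
    Returns (left, right) sample lists.
    """
    if len(channels) != len(brirs):
        raise ValueError("need one BRIR pair per loudspeaker channel")
    rendered = [(convolve(x, hl), convolve(x, hr))
                for x, (hl, hr) in zip(channels, brirs)]
    n_out = max((max(len(l), len(r)) for l, r in rendered), default=0)
    left, right = [0.0] * n_out, [0.0] * n_out
    for l, r in rendered:
        for i, v in enumerate(l):
            left[i] += v
        for i, v in enumerate(r):
            right[i] += v
    return left, right
```

The same routine would serve for the system under test, with the decoded Test 1.1 channels taking the place of the original channels.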
Open issues
- What is the best BRIR to use?
- What are the bitrates to use?
Clemens Par, Suissaudec, presented
m28183 | Evaluation of Binauralized Signals | Clemens Par
The contribution proposes a process to determine the subjective listening test methodology for Test 1.3:
- Select a BRIR
- Conduct the test proposed in the contribution as a means to validate the headphone BRIR as reference
- If headphone BRIR reference is validated, then use headphone BRIR as reference in a MUSHRA test
- If headphone BRIR reference is not validated, then use the loudspeaker presentation as reference in a “MUSHRA” test without hidden reference
Thomas Sporer, FhG-IDMT, noted that the proposed test assumes decorrelated loudspeaker signals, and that the 22.2 presentation is expected to have high correlation between adjacent loudspeaker pairs. He further noted that a test with headphones on or off is flawed in its ability to localize sound sources.
Jan Plogsties, FhG-IIS, offered the anecdotal evidence that he cannot tell the difference between loudspeakers and headphones with personalized BRIR.
Andreas Silzle, FhG-IIS, presented
m28181 | Reference for 3D Panning Experiment | Andreas Silzle, Jan Plogsties
The contribution presents the results of an experiment in object-based signal rendering. The reference was a point source object played from an actual loudspeaker. It was compared with a phantom source at the same position created using VBAP between the four adjacent loudspeakers, and with an experimental renderer. Subjects were asked to judge:
- Source location (and if different at different frequencies)
- Source width
- Coloration (timbre)
The presenter noted that the VBAP and experimental renderer were fully 30 MUSHRA points below the reference.
The contribution proposes an additional test to specifically assess rendering under the restriction of audio objects with static locations in the sound stage. The test would consist of:
- A real loudspeaker at target sound source location. The loudspeaker is the same as is used for the 22 channel presentation.
- Four test items. All are mono coded at e.g. 68 kb/s. Items should have low frequency content (which was felt to be the primary degradation in VBAP).
- Low anchor is 3.5 kHz LPF, but broadened to be played out of the speakers adjacent to reference speaker
- Testing in sweet spot and perhaps off sweet spot
Werner Oomen, Philips, supported the test proposal since it is able to correctly test the rendering performance. Jean-Marc Jot, DTS, noted that there could be a multi-channel reference setup (i.e. multiple speakers and multiple sound objects). Experts suggested that this test could be incorporated into Test 1.1. It was observed that the proposed test item could be 22 channels of low-level pink noise plus 1 object that is the proposed test signal.
Johannes Boehm, Technicolor, suggested that this proposed test be added to Test 1.4.
Peter Grosche, Huawei, presented
m28035 | Comments on MPEG-H 3D audio activity | Peter Grosche, David Virette
Huawei feels that the Personal TV and Mobile TV use cases are very important.
The presenter noted that all tests should have the listeners in the BS.1116 reference position. If off-sweet-spot listening, then the position must be clearly defined.
For the use cases of Personal TV and Mobile TV, the contribution proposes to evaluate two loudspeaker configurations, but using loudspeakers in the “Home Theatre” setting. If it is necessary to reduce the test workload, Huawei proposes to remove the 8.1 or 7.1 channel configurations. The most important aspect of the proposal is to test reproduction using only loudspeakers located in front of the subject.
Henney Oh, WILUS STRC, stated support for the Huawei “front-speaker-evaluation” proposal. Clemens Par, Suissaudec, and Andreas Silzle, FhG-IIS, were sceptical that the test would give information that is relevant to real product implementations.
Concerning the CfP, Huawei proposes to make the Phase 2 timetable use “symbolic” meeting numbers.
On the Draft 3D Audio CE Methodology, Huawei proposes:
- Concerning the source code, that there be an open source code platform, perhaps including object modules, that is the only platform used for Core Experiments.
- Concerning the “soft threshold,” that there be a “high threshold” above which a proposal is very much favoured for acceptance.
This last proposal from Huawei is a comment on the “Draft MPEG Audio CE methodology for 3D Audio Work” which is a contribution authored by the Chair. This contribution is the product of considerable effort on his part to get a document that could gain the consensus approval of the Audio subgroup. In his opinion, it would be difficult to incorporate the first item (CE source code platform) and gain consensus of the group.
3D Audio Content
Deep Sen, Qualcomm, presented
m27347 | Description of Qualcomm's HoA audio content | Deep Sen
The contribution provides details of the Qualcomm HOA recording process and format. The presenter noted that Qualcomm proposed to eliminate the SFD2 format, since SFD1 is fully equivalent.
Gregory Pallone, Orange Labs, presented
m27762 | Update of Orange content | Gregory Pallone
The contribution documents the items submitted by Orange Labs and the HOA file format parameters for all items, where some items are new and not previously documented. The renderer coefficients are available as part of the content.
Jeongil Seo, ETRI, presented
m28221 | Information on ETRI Test Items for 3D Audio | seoji@etri.re.kr, skbeack@etri.re.kr, tjlee@etri.re.kr, jmseong@etri.re.kr, kokang@etri.re.kr
The contribution describes the ETRI test items content. In addition, it proposes a clarification of audio object metadata and a description of the ETRI object renderer.
Robert Steffens, Iosono, presented
m28229 | Update of IOSONO/Fraunhofer IDMT content | Robert Steffens, Christoph Sladeczek, Thomas Sporer
The contribution documents a description of the Iosono/FhG-IDMT content. In particular, the new items are in OA3D format and are shortened to approximately 20 seconds.
The presenter noted that if a few hundred ms of the Betty3 item are trimmed off, the number of active items in the digital never exceeds 26. This edited item can be made available this week.
Thomas Sporer, FhG-IDMT, noted that many items may need to be modified and re-submitted. Andreas Silzle, FhG-IIS, reported that FhG-IIS has investigated digital headroom of the items and headroom after 5.1 and stereo downmix. In order to not clip for downmix, he proposed these options:
- Reduce only the critical items by the minimum necessary (about 1 dB).
- Reduce all items by e.g. 3 dB.
- Do nothing if stereo presentation is not used.
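The headroom check behind these options can be sketched as below. The report does not state which downmix was used, so the 5-channel-to-stereo downmix with gains of 1/sqrt(2) on the centre and surround channels is an assumption, as are the function names. The sketch finds the peak after downmix and the attenuation needed to bring it back to full scale.

```python
import math

# Assumed downmix gain for centre and surround channels.
G = 1.0 / math.sqrt(2.0)


def stereo_downmix(l, r, c, ls, rs):
    """Downmix five equally long channels (sample lists) to stereo."""
    left = [a + G * b + G * d for a, b, d in zip(l, c, ls)]
    right = [a + G * b + G * d for a, b, d in zip(r, c, rs)]
    return left, right


def required_attenuation_db(signals, full_scale=1.0):
    """Attenuation in dB needed so that no sample exceeds full scale (0.0 if none)."""
    peak = max((abs(s) for ch in signals for s in ch), default=0.0)
    if peak <= full_scale:
        return 0.0
    return 20.0 * math.log10(peak / full_scale)
```

Running this per item would show which items are critical and by how much they would need to be reduced, which is the information needed to choose between the first two options.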
Johannes Boehm, Technicolor, presented
m28100 | Study on HOA content loudness level | Johannes Boehm
The contribution gives information on rendering for HOA content.
Experts were in agreement that listeners should be instructed to grade down any differences in loudness.
Robert Steffens, Iosono, presented
m28130 | Interactivity in MPEG-H 3D Audio Content - Proposal for Extension of OAM Format and Test Procedure | Robert Steffens, Thomas Sporer, Christoph Sladeczek
The contribution gave several examples of scenarios where object interactivity would be valuable. It proposes to add an interactivity flag to the object metadata in the 3D Audio file format. It further proposes to define a test to assess the ability of a technology to implement object interactivity, for example:
- All objects presented as in the reference rendering
- A designated interactive object attenuated by 20 dB
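The two test conditions can be illustrated with a simple object mix in which one object receives a gain offset. This sketch ignores spatial rendering entirely and only shows the level change, with 20 dB of attenuation corresponding to a linear gain of 0.1. The object names and function names are hypothetical.

```python
def db_to_gain(level_db):
    """Convert a level change in dB to a linear amplitude gain."""
    return 10.0 ** (level_db / 20.0)


def mix_objects(objects, gains_db=None):
    """Sum object signals, applying an optional per-object gain in dB.

    objects:  dict mapping object name -> list of samples.
    gains_db: dict mapping object name -> gain in dB (objects not listed get 0 dB).
    """
    gains_db = gains_db or {}
    length = max((len(s) for s in objects.values()), default=0)
    out = [0.0] * length
    for name, samples in objects.items():
        gain = db_to_gain(gains_db.get(name, 0.0))
        for i, sample in enumerate(samples):
            out[i] += gain * sample
    return out
```

Calling `mix_objects(objects)` gives the reference condition, and `mix_objects(objects, {"commentary": -20.0})` gives the condition with the designated interactive object attenuated.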
The Chair noted that the range of interactivity could be from a tennis match (variable level of commentator and crowd) to karaoke (infinite attenuation of main vocals). Oliver Wuebbolt, Technicolor, noted that dubbing into an alternate language is another use case that has infinite attenuation. Jan Plogsties, FhG-IIS, expressed that interactivity supports important use cases.
Discussion of CfP
There was extensive discussion on the CfP, particularly on the specification of the subjective performance tests. Hours of drawing on two flip charts finally converged to the text found in the CfP document.
It was agreed that Phase 2 is essentially a CE with a very specific timeline and subjective performance tests.
It was noted that a CE proponent should be able to choose the test set that is most appropriate for testing the proposed technology.
Response to Swiss NB Comment
The issued CfP document specifies subjective tests that assess the performance of presentation of 22.2 channel test items on loudspeakers over a broad range of bit rates: 1.2 Mb/s, 512 kb/s, 256 kb/s, 128 kb/s, 96 kb/s, 64 kb/s and 48 kb/s. In order to manage the workload of this broad range of tested bitrates, some of the lower bitrates (e.g. 96 kb/s and 48 kb/s) may have fewer listeners than the other bitrates, for example: 20 listeners for most rates, but only 10 listeners for 96 kb/s and 48 kb/s.