International organisation for standardisation organisation internationale de normalisation

Task group activities Joint Meetings

Date: 25.10.2017

Task group activities

Joint Meetings
Raw Audio and Video, with 3DV, Wednesday 11:30 – 12:00

The Audio Chair presented

m28325

Thoughts on ISO/IEC 14496-1:2010/FDAM 2 Support for raw audio-visual data

Schuyler Quackenbush

The contribution contained recommended changes to the Systems components to fully describe a raw audio or video signal. There was discussion on the tradeoffs of a simple Registration Authority description versus a simple Systems syntax.

Audio experts will propose a revised, more complete Systems syntax that is compatible with a simple Registration Authority description. David Singer, Apple, noted that Motion JPEG 2000 may have Systems syntax for carriage of Linear PCM.

MP4 FF extensions for Audio, with Systems, Thursday 9:00 – 10:00

David Singer, Apple, presented

m28160

Enhanced Audio Support for MP4 File Format

David Singer

The presentation notes three issues in the MP4 FF and proposes solutions:

Support for high sampling rates. The proposal is to have a 32-bit number for the sampling rate.
Support for richer metadata: loudness and peak value. This would be “anchor” loudness, program loudness at various time window lengths, and peak value. This would be in a high-level box that is easily accessible.
Support for richer metadata: dynamic range control.
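As background to the first issue: the audio sample entry in the ISO Base Media File Format carries the sampling rate as a 16.16 fixed-point value, so the integer part cannot exceed 65535 Hz. The sketch below illustrates that limit and what a plain 32-bit integer field would allow. The function names are hypothetical, and the second layout is only an illustration, not the syntax proposed in m28160.

```python
import struct

def pack_rate_16_16(rate_hz: int) -> bytes:
    """Pack a rate as 16.16 fixed point (integer part in the upper 16 bits)."""
    if not 0 <= rate_hz <= 0xFFFF:
        raise ValueError(f"{rate_hz} Hz does not fit in the 16-bit integer part")
    return struct.pack(">I", rate_hz << 16)

def pack_rate_u32(rate_hz: int) -> bytes:
    """Pack a rate as a plain 32-bit unsigned integer (hypothetical layout)."""
    return struct.pack(">I", rate_hz)

def fits_16_16(rate_hz: int) -> bool:
    """Report whether a rate can be represented in the 16.16 field."""
    try:
        pack_rate_16_16(rate_hz)
        return True
    except ValueError:
        return False
```

With these helpers, 48000 Hz fits the 16.16 field, while 96000 Hz and 192000 Hz only fit the 32-bit integer form.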

Issues to be addressed by Audio experts

Review
Recommend a path to harmonizing with duplicate technology in Audio
Consider what information is transmitted and whether the representation is appropriate

It was agreed to incorporate the 32-bit sample rate capability into an existing document, Technologies Under Consideration in 14496-12, ISO Base Media FF.

David Singer, Apple, presented

N13051

Study on ISO/IEC 23001-8:DIS on Codec Independent Code Points

David Singer

The presenter used the opportunity of the joint meeting to draft the DoC on this item.

3D Audio in Augmented Reality, with 3DV, Thursday 10:00 – 11:00

The Audio Chair and 3DV experts discussed what is needed in Augmented Reality. Experts noted that Audio technology coordinates are typically listener-centric, while Augmented Reality is scene-centric.

Given objects in a virtual scene and a listener in that scene, the approach is to generate each object's position in listener coordinates and then render.
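The conversion from scene-centric to listener-centric coordinates can be sketched as follows. This is a minimal illustration only: the axis conventions (x and y horizontal, z up, yaw measured counter-clockwise from +x, azimuth positive to the left) and the function name are assumptions, and listener pitch and roll are ignored.

```python
import math

def world_to_listener(obj_xyz, listener_xyz, listener_yaw_rad):
    """Return (azimuth_deg, elevation_deg, distance) of an object as seen by the listener.

    Assumed conventions: x, y horizontal, z up; yaw is counter-clockwise
    from the +x axis; azimuth 0 is straight ahead, positive to the left.
    """
    dx = obj_xyz[0] - listener_xyz[0]
    dy = obj_xyz[1] - listener_xyz[1]
    dz = obj_xyz[2] - listener_xyz[2]
    # Rotate the offset by -yaw so the listener's look direction becomes +x.
    c, s = math.cos(-listener_yaw_rad), math.sin(-listener_yaw_rad)
    fx = c * dx - s * dy
    fy = s * dx + c * dy
    distance = math.sqrt(fx * fx + fy * fy + dz * dz)
    azimuth = math.degrees(math.atan2(fy, fx))
    elevation = math.degrees(math.atan2(dz, math.hypot(fx, fy)))
    return azimuth, elevation, distance
```

An object two metres along +y appears at 90 degrees azimuth to a listener facing +x, and straight ahead once the listener turns to face +y.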

Possible Audio work: add directivity and distance rendering for objects.

3D Audio Call for Proposals, with Requirements, Thursday 14:00 – 15:00

The 3D Audio Call for Proposals was presented and discussed. Gregory Pallone, Orange Labs, reminded experts that Peter Grosche, Huawei, raised an inconsistency in the “envisioned standard” paragraph of the Call for Proposals (3.2), which was silent about HOA. That was fixed by adding the sentence that it “may support HOA inputs”. The Call was approved by Requirements. In addition, the 3D Audio CE Methodology was reviewed and the encoder source code requirements for CE integration were highlighted.

Task Group discussions
MPEG-4, MPEG Surround, USAC Maintenance

Nikolaus Rettelbach, FhG-IIS, presented

m28129

Interoperability Tests regarding "AMENDMENT 4: New levels for AAC profiles"

Nikolaus Rettelbach

The contribution zip archive contains an Excel spreadsheet with an extensive report on the interoperability of 7.1 channel bitstreams with legacy products. Only implicit signalling was investigated.

Christof Fersch, Dolby, suggested a further investigation of the problematic bitstream/product combinations, this time using explicit channel configuration.

Daniel Fischer, FhG-IIS, presented

m28139

Proposed update to MPEG Surround Reference Software

Julien Robilliard, Andreas Holzer

The contribution proposes fixes to the MPEG Surround Reference Software, which are described in the contribution and supplied as corrected software in the contribution zip archive.

It was the consensus of the Audio subgroup to issue the contribution as a DCOR.

Daniel Fischer, FhG-IIS, presented

m28113

Proposed corrections to USAC

Daniel Fischer, Julien Robilliard, Andreas Niedermeier

The contribution proposes corrections to the specification text and the reference software.

Corrections to reference software only:

Remove restrictions for lower sampling frequencies (always use 64 bands of the QMF filterbank)
Correct the same two bugs reported in m28139

Corrections to reference software and text:

In certain cases (e.g. when the arithmetic decoder is called and there is nothing to decode) the final state of the arithmetic coder is indeterminate. Fixes are provided for text and code to bring it into a deterministic state.
Force phase angle to modulo(2*pi) for phase interpolation.
Forbid combination of Unified Stereo and Complex Stereo Prediction.
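The phase correction can be pictured with a generic sketch of reducing an angle modulo 2*pi before interpolating. This is not the USAC reference code or the corrected specification text; the function names are hypothetical, and interpolating along the shorter arc is an assumption made for the example.

```python
import math

TWO_PI = 2.0 * math.pi

def wrap_phase(angle):
    """Reduce an angle in radians to the range [0, 2*pi)."""
    wrapped = angle % TWO_PI
    # Guard against rounding that can return exactly 2*pi for tiny negatives.
    return 0.0 if wrapped >= TWO_PI else wrapped

def interpolate_phase(prev, curr, t):
    """Interpolate between two phases along the shorter arc (an assumption)."""
    prev, curr = wrap_phase(prev), wrap_phase(curr)
    # Signed difference mapped into [-pi, pi).
    diff = (curr - prev + math.pi) % TWO_PI - math.pi
    return wrap_phase(prev + t * diff)
```

Without the wrap, two phases that differ by almost a full turn (for example 0.1 and 2*pi - 0.1) would interpolate through pi instead of through 0.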

It was the consensus of the Audio subgroup to

Issue a Study on USAC Reference Software DAM

Issue a DCOR on USAC text.

Daniel Fischer, FhG-IIS, presented

m28111

Proposed study on ISO/IEC 23003-3:2012/DAM 1, USAC Conformance

Daniel Fischer, Max Neuendorf

The contribution has updates to the conformance spreadsheet and the conformance text. The text changes are:

A number of bitstream restrictions
Added descriptions of bitstreams that are supplied at this meeting

Modified descriptions of existing bitstreams

The conformance spreadsheet is modified

Added “status” sheet to show available streams and cross-checked streams.

Current status: 95% of the defined streams are available and 50% of the available streams are cross-checked.

Support for MPEG Audio in the Marketplace

A Liaison statement to DVB was reviewed and approved.

3D Audio

Thomas Sporer, FhG-IDMT, presented

m28124

Testing Non-Sweet Spot Listening Positions

Thomas Sporer, Christoph Sladeczek, Robert Steffens

The contribution presented ideas on how to test for audio quality in “non-sweet-spot” listening positions. It proposes a two-step approach:

MUSHRA test at sweet spot
Multi-stimulus test at non-sweet-spot locations (i.e. MUSHRA without open reference)

The results are subject to this analysis:

The off-sweet-spot score for a system under test cannot be higher than its score in the sweet spot (higher scores are limited to the sweet-spot score).
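The limiting rule amounts to a per-system minimum. The sketch below assumes one score per system and position; whether the contribution applies the limit per listener, per item, or to mean scores is not stated here, and the names and numbers are hypothetical.

```python
def limit_off_sweet_spot(sweet_scores, off_scores):
    """Cap each system's off-sweet-spot score at its sweet-spot score.

    Both arguments map a system name to a MUSHRA score (0-100).
    """
    return {
        system: min(score, sweet_scores[system])
        for system, score in off_scores.items()
    }

sweet = {"sys_a": 82.0, "sys_b": 64.0}  # hypothetical sweet-spot scores
off = {"sys_a": 75.0, "sys_b": 70.0}    # hypothetical off-sweet-spot scores
limited = limit_off_sweet_spot(sweet, off)
```

Here sys_a keeps its score of 75, while sys_b is limited from 70 down to its sweet-spot score of 64.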

It proposes to modify the “alternate listener positions” as specified in BS.1116

Proposed off-sweet-spot distance is less than what is defined in BS.1116

0.25 B in left, right and front-back directions

It proposes two off-sweet-spot locations: Front-Left and Back-Right.

The contribution notes that sweet-spot evaluation is done in Test 1.1, and Test 1.2 therefore need only assess off-sweet-spot listener locations. Experts noted this proposal requires that a listener take both Test 1.1 and Test 1.2 and that it would be most robust if Test 1.1 immediately follows Test 1.2.

The presenter noted that Test 1.2 should have the same objectives as Test 1.1, e.g. assessment of high-quality audio. Further, he noted that the listener instructions should not require that one item (i.e. usually the hidden reference) be scored at 100. If need be, Test 1.2 could use fewer test items or fewer listeners at each off-sweet-spot position.

There was broad agreement to adopt the proposed lower distance from sweet-spot to off-sweet-spot locations.

However, these were left as open issues:

2 or 4 positions
1 or 2 bitrates.

Gregory Pallone, Orange Labs, presented

m28159

Proposed Evaluation Procedures for 3D Audio

Gregory Pallone

The presenter noted that Test 1.2, off-sweet-spot, is very important because it represents a very common use case of many people in a home theatre setting.

The contribution proposes that Test 1.2 clearly state that it “will use same decoded wavefile as is used in Test 1.1.”

It was the consensus of the Audio subgroup to

Test 1.2: modify the Call so that Test 1.2 clearly states that it “will use same decoded wavefile as is used in Test 1.1.”

For Test 1.3, the contribution does not support using a loudspeaker reference, since:

Listener may not be in BRIR room

Listener can move the head to hear a different sound field when listening to speakers, but cannot when listening to headphones.

Four sites have contributed low-resolution BRIR: FhG-IIS, FhG-IDMT, Technicolor, Orange Labs. The presentation noted the advantages of low-resolution vs. high-resolution in binauralizing channel or object based content.

The contribution proposes:

MUSHRA with open reference
HR-HRTF for objects
Subset of HR-BRIR (effectively LR-BRIR) for channels and HR-HRTF for both C+O and HOA (a different approach would be to render these to 22.2, which are then binauralized as for channel signals).

The contribution notes that this is equivalent to just choosing a HR-BRIR.

Werner Oomen, Philips, stated that since the HRTF are not individualized, the low-resolution is sufficient when one uses HRTF interpolation. Thomas Sporer, FhG-IDMT, noted that based on the experience of his lab, BRIR is better in that it captures aspects of the room, and that LR-BRIR is all that we have. Note that “better” is the ability of a subject to correctly localize a sound in space. He further noted that BRIR often helps subjects to externalize the sound scene.

Clemens Par, Suissaudec, asked the presenter if he had evaluated his proposal with respect to localization and coloration. The presenter said that the evaluation was performed both objectively with the figures shown during the presentation, and also with informal listening tests where the difference between HR-HRTF and LR-HRTF was audible.

The presenter stated that the use of HR-BRIR instead of LR-BRIR is probably equivalent to what m28181 shows (a decrease of 30 MUSHRA points), but acknowledges that HR-BRIR are not available. However since HR-HRTF are available, they could be used to create the reference for some objects and HOA items (the ones that have the most reverberation for example). All the audio experts agreed that if there were HR-BRIR available, they would have selected them.

The Chair noted that, as stated in the contribution, the LR-BRIR has deficiencies for object synthesis. Is this deficiency greater than the deficiency due to a “non-personalized” BRIR? The presenter responded that LR-BRIR can give incorrect Inter-aural Time Differences (when interpolating between loudspeaker positions), and hence might be a greater deficiency as compared to the lack of personalization.

This discussion topic came up again later, and Jean-Marc Jot, DTS, supported the use of HR-HRTF instead of LR-BRIR for items whose sources are available. Gregory Pallone, Orange Labs, stated that for several HOA items, he could provide the original mono sources and HOA panning laws.

However, it was finally agreed to not use the HR-HRTF because they were:

unfamiliar to experts (especially with dynamic objects)
a research topic

The contribution notes that Test 1.4 is very important. The contribution has a MATLAB script in the Annex that can perform the randomization.

It was agreed to have a break-out to agree on “sectors” of the 22.2 loudspeaker set and that there should be different “sectors” for the choice of 5 of 22 and 10 of 22.

Jan Plogsties, FhG-IIS, presented

m28180

Proposed test method for headphone rendering in MPEG-H 3D Audio

Jan Plogsties

Test 1.3 (headphones): The contribution proposes that

Use two bitrates, e.g. high and low
The set of selected test material requires BRIR from loudspeakers at 30 locations for the two ears. These should be diffuse-field equalized. (Contribution cites a reference for doing this.)
The group needs to select a room for recording a set of BRIR.
The Open and Hidden Reference is Original + BRIR convolution.
The System under test is the Test 1.1 bitstream decoded and rendered using the BRIR data.
2-channel output signals for all systems are normalized in loudness.
Use high-quality open-back headphone with diffuse field equalization.
Listening is in a sound booth.
Subject can adjust listening volume.
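The “Original + BRIR convolution” reference can be sketched as follows: each loudspeaker feed is convolved with the left-ear and right-ear BRIR for that loudspeaker position, and the results are summed per ear. This is only an illustration of the principle with hypothetical function names; it uses direct-form convolution for clarity (a practical implementation would use FFT-based convolution) and omits the equalization and loudness normalization steps listed above.

```python
def convolve(signal, impulse):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out

def binauralize(channels, brirs):
    """Sum each loudspeaker feed convolved with its left/right BRIR.

    channels: list of per-loudspeaker sample lists (equal length).
    brirs: list of (left_ir, right_ir) pairs, one per loudspeaker (equal length).
    """
    n = len(channels[0]) + len(brirs[0][0]) - 1
    left, right = [0.0] * n, [0.0] * n
    for feed, (ir_l, ir_r) in zip(channels, brirs):
        for k, v in enumerate(convolve(feed, ir_l)):
            left[k] += v
        for k, v in enumerate(convolve(feed, ir_r)):
            right[k] += v
    return left, right
```

The same function would serve for the system under test by passing the decoded loudspeaker feeds instead of the originals.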

Open issues

What is the best BRIR to use?

What are the bitrates to use?

Clemens Par, Suissaudec, presented

m28183

Evaluation of Binauralized Signals

Clemens Par

The contribution proposes a process to determine the subjective listening test methodology for Test 1.3

Select a BRIR
Conduct the test proposed in the contribution as a means to validate the headphone BRIR as reference
If headphone BRIR reference is validated, then use headphone BRIR as reference in a MUSHRA test
If headphone BRIR reference is not validated, then use the loudspeaker presentation as reference in a “MUSHRA” test without hidden reference

Thomas Sporer, FhG-IDMT, noted that the proposed test assumes decorrelated loudspeaker signals, and that the 22.2 presentation is expected to have high correlation between adjacent loudspeaker pairs. He further noted that a test with headphones on or off is flawed in its ability to localize sound sources.

Jan Plogsties, FhG-IIS, offered the anecdotal evidence that he cannot tell the difference between loudspeakers and headphones with personalized BRIR.

Andreas Silzle, FhG-IIS, presented

m28181

Reference for 3D Panning Experiment

Andreas Silzle, Jan Plogsties

The contribution presents the results of an experiment in object-based signal rendering. The experiment used point-source objects rendered at the position of an actual loudspeaker as the reference, and compared this with a phantom source created using VBAP between the four adjacent loudspeakers and with an experimental renderer. Subjects were asked to judge:

Source location (and if different at different frequencies)
Source width
Coloration (timbre)

The presenter noted that the VBAP and experimental renderer were fully 30 MUSHRA points below the reference.
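For readers unfamiliar with VBAP, the gain computation can be sketched in its simplest two-dimensional, two-loudspeaker form. This is a simplification: the experiment used the four loudspeakers adjacent to the target position, and neither its layout nor its renderer is reproduced here. The function name and angle convention are assumptions.

```python
import math

def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """Gains for a phantom source between two loudspeakers (2-D VBAP).

    Angles are azimuths in degrees. Gains are normalized to unit power.
    A source outside the arc between the loudspeakers yields a negative gain.
    """
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    px, py = unit(source_deg)
    l1x, l1y = unit(spk1_deg)
    l2x, l2y = unit(spk2_deg)
    det = l1x * l2y - l1y * l2x
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    # Solve p = g1 * l1 + g2 * l2 for (g1, g2) by Cramer's rule.
    g1 = (px * l2y - py * l2x) / det
    g2 = (l1x * py - l1y * px) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A source midway between the two loudspeakers receives equal gains, and a source at a loudspeaker position is reproduced by that loudspeaker alone, which corresponds to the reference condition in the experiment.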

The contribution proposes an additional test to specifically assess rendering under the restriction of audio objects with static locations in the sound stage. The test would consist of:

A real loudspeaker at target sound source location. The loudspeaker is the same as is used for the 22 channel presentation.

MUSHRA test methodology

Four test items. All are mono coded at e.g. 68 kb/s. Items should have low frequency content (which was felt to be the primary degradation in VBAP).

Low anchor is 3.5 kHz LPF, but broadened to be played out of the speakers adjacent to reference speaker
Testing in sweet spot and perhaps off sweet spot

Werner Oomen, Philips, supported the test proposal since it is able to correctly test the rendering performance. Jean-Marc Jot, DTS, noted that there could be a multi-channel reference setup (i.e. multiple speakers and multiple sound objects). Experts suggested that this test could be incorporated into Test 1.1. It was observed that the proposed test item could be 22 channels of low-level pink noise plus 1 object that is the proposed test signal.

Johannes Boehm, Technicolor, suggested that this proposed test be added to Test 1.4.

Peter Grosche, Huawei, presented

m28035

Comments on MPEG-H 3D audio activity

Peter Grosche, David Virette

Huawei feels that the Personal TV and Mobile TV use cases are very important.

The presenter noted that all tests should have the listeners in the BS.1116 reference position. If off-sweet-spot listening, then the position must be clearly defined.

For the use cases of Personal TV and Mobile TV, the contribution proposes to evaluate two loudspeaker configurations, but using loudspeakers in the “Home Theatre” setting. If it is necessary to reduce the test workload, Huawei proposes to remove the 8.1 or 7.1 channel configurations. The most important aspect of the proposal is to test reproduction using only loudspeakers located in front of the subject.

Henney Oh, WILUS STRC, stated support for the Huawei “front-speaker-evaluation” proposal. Clemens Par, Suissaudec, and Andreas Silzle, FhG-IIS, were sceptical that the test would give information that is relevant to real product implementations.

Concerning the CfP, Huawei proposes to make the Phase 2 timetable use “symbolic” meeting numbers.

On the Draft 3D Audio CE Methodology, Huawei proposes:

Concerning the source code, that there be an open source code platform, perhaps including object modules, that is the only platform used for Core Experiments.
Concerning the “soft threshold,” that there be a “high threshold” for which a proposal is very much favoured for acceptance.

This last proposal from Huawei is a comment on the “Draft MPEG Audio CE methodology for 3D Audio Work” which is a contribution authored by the Chair. This contribution is the product of considerable effort on his part to get a document that could gain the consensus approval of the Audio subgroup. In his opinion, it would be difficult to incorporate the first item (CE source code platform) and gain consensus of the group.

3D Audio Content

Deep Sen, Qualcomm, presented

m27347

Description of Qualcomm's HoA audio content

Deep Sen

The contribution provides details of the Qualcomm HOA recording process and format. The presenter noted that Qualcomm proposed to eliminate the SFD2 format, since SFD1 is fully equivalent.

Gregory Pallone, Orange Labs, presented

m27762

Update of Orange content

Gregory Pallone

The contribution documents the items submitted by Orange Labs and the HOA file format parameters for all items, where some items are new and not previously documented. The renderer coefficients are available as part of the content.

Jeongil Seo, ETRI, presented

m28221

Information on ETRI Test Items for 3D Audio

seoji@etri.re.kr, skbeack@etri.re.kr, tjlee@etri.re.kr, jmseong@etri.re.kr, kokang@etri.re.kr

The contribution describes the ETRI test items content. In addition, it proposes a clarification of audio object metadata and a description of the ETRI object renderer.

Robert Steffens, Iosono, presented

m28229

Update of IOSONO/Fraunhofer IDMT content

Robert Steffens, Christoph Sladeczek, Thomas Sporer

The contribution documents a description of the Iosono/FhG-IDMT content. In particular, the new items are in OA3D format and are shortened to approximately 20 seconds.

The presenter noted that if a few hundred ms of the Betty3 item are trimmed off, the number of active items in the digital never exceeds 26. This edited item can be made available this week.

Thomas Sporer, FhG-IDMT, noted that many items may need to be modified and re-submitted. Andreas Silzle, FhG-IIS, reported that FhG-IIS has investigated digital headroom of the items and headroom after 5.1 and stereo downmix. In order to not clip for downmix, he proposed these options:

Reduce only the critical items by the minimum necessary (about 1 dB).
Reduce all items by e.g. 3 dB.
Do nothing if stereo presentation is not used.
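The headroom check behind these options can be sketched as a peak measurement on the downmixed signal. The downmix gains in the usage below are placeholders, not the coefficients used by FhG-IIS, and the function name is hypothetical.

```python
import math

def required_attenuation_db(channels, downmix_gains):
    """Attenuation in dB needed so the downmix peak does not exceed full scale (1.0).

    channels: list of per-channel sample lists (equal length).
    downmix_gains: one linear gain per channel for a single downmix output.
    Returns 0.0 when the downmix already fits.
    """
    peak = 0.0
    for n in range(len(channels[0])):
        mixed = sum(g * ch[n] for g, ch in zip(downmix_gains, channels))
        peak = max(peak, abs(mixed))
    return 20.0 * math.log10(peak) if peak > 1.0 else 0.0

# Two hypothetical channels mixed with equal placeholder gains.
needed = required_attenuation_db([[0.8, -0.2], [0.8, 0.1]], [0.7071, 0.7071])
```

Running this per item and per downmix output would identify the critical items and the minimum reduction each one needs.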

Johannes Boehm, Technicolor, presented

m28100

Study on HOA content loudness level

Johannes Boehm

The contribution gives information on rendering for HOA content.

Experts were in agreement that listeners should be instructed to grade down any differences in loudness.

Robert Steffens, Iosono, presented

m28130

Interactivity in MPEG-H 3D Audio Content - Proposal for Extension of OAM Format and Test Procedure

Robert Steffens, Thomas Sporer, Christoph Sladeczek

The contribution gave several examples of scenarios where object interactivity would be valuable. It proposes to add an interactivity flag to the object metadata in the 3D Audio file format. It further proposes to define a test to assess the ability of a technology to implement object interactivity, for example:

All objects presented as in the reference rendering
A designated interactive object attenuated by 20 dB
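The second condition can be illustrated with a simple gain applied to flagged objects before mixing. This sketch sums mono objects directly and ignores spatial rendering; the boolean in each tuple stands in for the proposed interactivity flag, and the function name is hypothetical.

```python
def mix_objects(objects, attenuation_db=20.0):
    """Mix mono objects, attenuating those flagged as interactive.

    objects: list of (samples, interactive) tuples of equal sample length.
    """
    gain = 10.0 ** (-attenuation_db / 20.0)  # 20 dB corresponds to a factor of 0.1
    length = len(objects[0][0])
    out = [0.0] * length
    for samples, interactive in objects:
        g = gain if interactive else 1.0
        for n in range(length):
            out[n] += g * samples[n]
    return out
```

Passing an attenuation of 0 dB reproduces the first condition, in which all objects are presented as in the reference rendering.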

The Chair noted that the range of interactivity could be from a tennis match (variable level of commentator and crowd) to karaoke (infinite attenuation of main vocals). Oliver Wuebbolt, Technicolor, noted that dubbing into an alternate language is another use case that has infinite attenuation. Jan Plogsties, FhG-IIS, stated that interactivity supports important use cases.

Discussion of CfP

There was extensive discussion on the CfP, particularly on the specification of the subjective performance tests. Hours of drawing on two flip charts finally converged to the text found in the CfP document.

It was agreed that Phase 2 is essentially a CE with a very specific timeline and subjective performance tests.

It was noted that a CE proponent should be able to choose the test set that is most appropriate for testing the proposed technology.

Response to Swiss NB Comment

The issued CfP document specifies subjective tests that assess the performance of presentation of 22.2 channel test items on loudspeakers over a broad range of bit rates: 1.2 Mb/s, 512 kb/s, 256 kb/s, 128 kb/s, 96 kb/s, 64 kb/s and 48 kb/s. In order to manage the workload of this broad range of tested bitrates, some of the lower bitrates (e.g. 96 kb/s and 48 kb/s) may have fewer listeners than the other bitrates, for example: 20 listeners for most rates, but only 10 listeners for 96 kb/s and 48 kb/s.
