4.2.3 3D Audio
Gregory Pallone, Orange Labs, presented
m23748 | Use cases and possible material for 3D Audio | Gregory Pallone, Pierrick Philippe
The contribution presents a framework that defines “3D Audio” as an evolution from 0D, 1D and 2D Audio. 3D Audio is the combination of 2D, as the azimuth and elevation location of sound sources, and distance as indicated by the relative timing of the direct path sound and the reverberant sound. Near-field effects are yet an additional issue. For example, objects may be perceived as nearby if they subtend a larger solid angle in the sound stage and are louder in volume.
It describes the European Project “FascinatE”, which targets 3D Audio. Finally, it presents a vision of what MPEG-H 3D Audio might be: a combination of channel, object and scene based coding coupled with flexible rendering for presentation.
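The framework's description of a source position (azimuth and elevation for direction, plus distance) can be illustrated with a small sketch. The class name `SourcePosition`, the field names, and the angle convention (azimuth 0 straight ahead and positive to the left, elevation measured up from the horizontal plane) are assumptions made for this example only; the contribution does not specify a coordinate convention.

```python
import math
from dataclasses import dataclass


@dataclass
class SourcePosition:
    """Hypothetical container for a sound source position (illustrative only)."""
    azimuth_deg: float    # horizontal angle; 0 = front, positive = left (assumed)
    elevation_deg: float  # angle above the horizontal plane (assumed)
    distance_m: float     # distance from the listener, in metres

    def to_cartesian(self):
        """Convert to (x, y, z) with x = front, y = left, z = up (assumed axes)."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        x = self.distance_m * math.cos(el) * math.cos(az)
        y = self.distance_m * math.cos(el) * math.sin(az)
        z = self.distance_m * math.sin(el)
        return (x, y, z)


# A source 2 m away, 30 degrees to the left and 45 degrees above the listener.
overhead_left = SourcePosition(azimuth_deg=30.0, elevation_deg=45.0, distance_m=2.0)
x, y, z = overhead_left.to_cartesian()
```

In this picture, a 2D system would carry only the two angles; adding the distance value is what makes the representation 3D in the sense used by the contribution.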
There was considerable discussion on encoding and presentation of signals that are already binaural 2-channel signals. It was noted that current MPEG encoders (e.g. AAC) might have to be restricted to achieve good performance (e.g. never use intensity stereo). It was proposed that a use case might be to present 2-channel binaural signals on e.g. 5 loudspeakers. The Chair noted that experts should verify that this functionality represents a sufficiently large market to justify the investment in standardization.
Jean-Marc Jot, DTS, noted that it is critical to select test items that embrace and exercise the notion of “3D” signals, such that objects are nearby and far away in the audio scenes.
The Chair proposed that the ideas presented in this contribution should be further discussed and possibly incorporated into the “Thoughts on 3D Audio” document that will be revised and re-issued at this meeting.
Juergen Herre, International Audio Laboratories Erlangen and FhG, presented
m23696 | Further Thoughts On Evaluation Procedures for 3D Audio | Juergen Herre, Andreas Silzle
The presentation recalled that, at the 98th meeting, the “Thoughts On” document incorporated some evaluation guidelines. This contribution aims to:
- simplify the structure of the test suite
- limit the actual testing effort as much as possible
- make the individual tests as meaningful as possible
Test procedures:
- High Quality
- Localization and Envelopment
- Flexible Rendering
- Transcoding for low bandwidth channels and limited capability devices
These can be represented schematically as:
The contribution notes that it is not appropriate to average the scores for the four tests, and states: “As a rule of thumb, the timbral fidelity of spatial sound receives a weight of 2/3 compared to 1/3 for the remaining spatial attributes (localization, envelopment, reverberation etc.) in human sound perception.” This notion is based on the reference: Rumsey et al., “On the relative importance of spatial and timbral fidelities of degraded multichannel audio quality.” J. Acoust. Soc. Am., 118(2), pp. 968-976.
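To illustrate why a plain average can mislead, the rule of thumb quoted above can be written as a weighted combination. This is only a sketch: the linear form, the function name `combined_fidelity`, and the example scores are assumptions for illustration. The contribution gives the 2/3 and 1/3 weights but does not prescribe a formula.

```python
def combined_fidelity(timbral, spatial, timbral_weight=2.0 / 3.0):
    """Weighted combination of a timbral and a spatial score (assumed linear form).

    timbral_weight defaults to the 2/3 rule of thumb quoted in the contribution;
    the spatial score receives the remaining 1/3.
    """
    if not 0.0 <= timbral_weight <= 1.0:
        raise ValueError("timbral_weight must lie in [0, 1]")
    return timbral_weight * timbral + (1.0 - timbral_weight) * spatial


# Hypothetical scores on a 0-100 scale: good timbre, weaker spatial impression.
timbral_score = 90.0
spatial_score = 60.0

plain_mean = (timbral_score + spatial_score) / 2.0          # 75.0
weighted = combined_fidelity(timbral_score, spatial_score)  # approximately 80
```

With these made-up numbers the weighted result sits closer to the timbral score than the plain mean does, which is the effect the rule of thumb describes.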
The contribution proposes two test procedures:
No Reference (Type 1):
- Sequential presentation of A and B (no instantaneous switching)
- Grade A and B on an absolute scale (or on a comparative scale). There is NO reference signal.
Gregory Pallone, Orange Labs, and Werner Oomen, Philips, noted that there is some danger in subjective testing without a reference. In particular, Werner Oomen noted that some systems try to promote the “wow factor” far more than retaining the sound stage of the original signal.
Namsuk Lee, Samsung, noted that in some cases the goal might be to have the processed signal be as close to the reference as possible, while in other cases it might be more important to the user to have an “immersive” experience.
Reference (Type 2):
- Presentation of A and B, possibly with instantaneous switching
- Signal A shall be considered the Reference and all differences in B are to be considered degradations.
- Grade A and B on an absolute scale (or on a comparative scale).
The Chair noted that the test set could be segmented into sets, each of which addresses distinct aspects to be assessed. A set encompassing source motion, source localization and sound scene detail may address aspects of Test 2 even in the Test 1 methodology. There seemed to be wide agreement on this.
The Chair also noted that audio/visual test items could make the task of assessment of “sound localization” much easier.
Clemens Par, Swissaudec, presented
m23762 | Evidence of Performance of Swissaudec’s VoiCode® Technology | Clemens Par
First, the presenter reviewed the Swissaudec technology that was presented in a contribution to the previous MPEG meeting. The technology is based on inverse modelling. This finds application in geophysics, e.g. inference of geologic layers from seismic signals.
The technology uses time and level modelling to interpolate actual or virtual mono/sum (M/S) signals. The presenter acknowledged that the technology requires some amount of “cross-talk” and is therefore not applicable to discrete, uncorrelated signals.
The contribution reports on a listening test of the technology. The test signal is a 5.0 channel item, a violin concerto sampled at 48 kHz and 16 bits/sample. All systems under test produced 5.0 channel signals at the same sampling rate and word length. The presenter noted that the test stimuli are available for download and the URL is given in the contribution.
The systems under test were: