Task groups were convened for the duration of the MPEG meeting, as shown in . Results of task group activities are reported below.
Approval of previous meeting report
The Chair asked for approval of the 106th Audio Subgroup meeting report, which was registered as a contribution. The report was approved.
| 106th MPEG Audio Report | Schuyler Quackenbush
Review of AHG reports
There were no requests to review any of the AHG reports.
Ballots to process
Title | Ballot
ISO/IEC 23003-2:2010/DCOR 2, SAOC (no ballot comments) | m31787
Received National Body Comments and Liaison matters
No. | Body | Title
31801 | ITU-R SG 6/WP 6C | Liaison Statement from ITU-R SG 6/WP 6C on MPEG-H 3D-Audio
31805 | ITU-R SG 6/WP 6B | Liaison Statement from ITU-R SG 6/WP 6B on a METADATA model for audio formats
Joint meetings
Groups | What | Where | Day | Time
Audio, Systems | Improved audio support in the ISO base media file format (DRC); Proposed audio CICP changes; Understanding on how to synchronize Audio ES in FF | Audio | Wed | 1130 – 1230
Audio, Systems | Possible Systems descriptors for DRC and 3D Audio; Understanding on how to synchronize Audio ES in MP2-TS | Audio | Wed | 1230 – 1300
Audio, 3DG | Binauralization in 3D Audio and reference software | Audio | Wed | 1700 – 1730
Audio, Systems | Green metadata | Audio | Thu | 0900 – 0930
Plenary Discussions
There were none.
Record of AhG meetings
AhG on 3D Audio
The AHG on Dynamic Range Control (DRC) and 3D Audio and Audio Maintenance met Sunday January 12, 2014 1000-1800 hrs at the MPEG meeting venue.
3D Audio Binauralization CE
Listening Test Site Reports
The reports from each listening test site are listed below. Since all experts could read the reports, it was agreed that there was no need to make presentations.
m32224 | ETRI listening test report for MPEG-H 3D Audio Binaural CE | Taejin Lee, Jeongil Seo, Kyeongok Kang, Hochong Park
m31831 | Fraunhofer IIS Binaural CE Listening Test Report for MPEG-H 3D Audio | Simone Füg, Jan Plogsties
m31911 | Huawei listening test report for the binauralization CE | Peter Grosche, Simone Fontana
m32277 | Orange listening tests report for the second CE on RM0-CO binauralization | Gregory Pallone
m32194 | Yonsei/WILUS listening test report for MPEG-H 3D Audio Binaural CE | Taegyu Lee, Henney Oh, Young-cheol Park, Dae Hee Youn
The Chair presented a spreadsheet with the combined subjective data (in the zip archive of the AhG report):
m31764 | AHG on 3D Audio and Audio Maintenance | Schuyler Quackenbush
The spreadsheet presents a statistical analysis for each of the SHORT, MEDIUM and LONG BRIRs, and also a statistical analysis of all subjective data taken together. The identities of the systems were revealed as:
Proponent | System Number
ETRI | 2
IIS | 4
HUA | 1
ORA | 3
Technical Descriptions
Jan Plogsties, FhG-IIS, and Jeongil Seo, ETRI, gave a joint presentation on
m32223 | Technical Description of ETRI/Yonsei/WILUS Binaural CE Proposal in MPEG-H 3D Audio | Jeongil Seo, Yong Ju Lee, Taejin Lee, Seungkwon Beack, Kyeongok Kang, Taegyu Lee, Young-cheol Park, Dae Hee Youn, Henney Oh
m32188 | Fraunhofer IIS Binaural CE proposal in MPEG-H 3D Audio | Simone Füg, Jan Plogsties
The presentation reviewed the technology of the two proposals, one from ETRI/Yonsei/WILUS and one from FhG-IIS. The difference is that the FhG-IIS proposal used FFT-based convolution for 48 subbands, while the ETRI/Yonsei/WILUS proposal used FFT-based convolution for 32 subbands and a 1-tap delay line filter for bands 33-48.
The performance of all proponent technologies was shown, both averaged over all items and for each item. This analysis was consistent with the one in the Excel spreadsheet attached to the AhG report.
The differences between the two systems are summarized here, where:
D&E: Direct and Early reflections
LR: Late reflections
TDL: Tapped delay line
VOFF: Variable Order Filtering in Frequency domain
| ETRI/Yonsei/WILUS | FhG-IIS
D&E | VOFF, band 1-32 (1) | VOFF, band 1-48
LR | Sparse Freq. Reverb, band 1-32 | Sparse Freq. Reverb, band 1-48
TDL | 1-tap TDL, band 33-48 | -
Note 1: The first subband is number 1 (not 0).
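As a rough illustration of the band split summarized above, the following sketch maps each subband to the processing named in the table and shows what a 1-tap delay line amounts to. It is only a sketch under stated assumptions: the function names, the `gain` and `delay` parameters and the way they would be derived from a BRIR are hypothetical and are not taken from either proposal.

```python
# Subband numbering follows Note 1: the first subband is number 1.
NUM_BANDS = 48
TDL_FIRST_BAND = 33  # ETRI/Yonsei/WILUS uses a 1-tap TDL from this band up


def band_processing(band, proposal):
    """Return the processing applied to one subband, as listed in the table."""
    if not 1 <= band <= NUM_BANDS:
        raise ValueError("band must be in 1..48")
    if proposal == "FhG-IIS":
        return "VOFF + sparse frequency reverb"
    if proposal == "ETRI/Yonsei/WILUS":
        if band < TDL_FIRST_BAND:
            return "VOFF + sparse frequency reverb"
        return "1-tap TDL"
    raise ValueError("unknown proposal")


def one_tap_tdl(subband, gain, delay):
    """Single delayed, scaled tap: y[n] = gain * x[n - delay], delay >= 0.

    `gain` and `delay` are hypothetical per-band parameters.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    out = []
    for n in range(len(subband)):
        out.append(gain * subband[n - delay] if n >= delay else 0j)
    return out
```

A single tap costs one multiply per subband sample, which is why replacing convolution with a TDL in bands 33-48 lowers the complexity.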
Gregory Pallone, Orange, presented
m32278 | Orange proposal for the second CE on RM0-CO binauralization | Gregory Pallone, Marc Emerit
The contribution reviews the technology which is in HOA and which was in a contribution to the 106th MPEG meeting. The technology uses parameters that are obtained by a fully automatic method. In addition, it presents complexity estimations for SHORT, MEDIUM and LONG BRIRs.
It documents that the automatic filter pre-processing provides the following parameters:
| SHORT | MID | LONG
Direct length (in samples) | 128 | 4096 | 8192
Diffuse length (in samples) | - | 4096 | 8192
FcDirect (in kHz) | 24 | 18 | 18
FcDiffuse (in kHz) | - | 12 | 8
The presenter noted that performance was lower for the SHORT case. However, if the automatic processing were modified so that, for BRIR lengths less than or equal to 4096, there is no truncation and just a direct convolution with the 558 length BRIR (and no diffuse component), then the complexity for all lengths is as follows:
Length | Complexity per sample
SHORT | 481
MEDIUM | 922.00
LONG | 958.67
Jan Plogsties, FhG-IIS and Jeongil Seo, ETRI, noted that the results showed that the Orange proposal did have issues for the SHORT case, and the proposed “fix” is effectively a hand-tuned optimization. Jan Plogsties further noted that, even at high bit rates, there may be QMF domain analysis if:
- Formatter requires QMF analysis/synthesis
- DRC requires QMF analysis/synthesis
Simone Fontana, Huawei, presented
m31914 | Technical Description of the Huawei Binaural CE proposal | Simone Fontana, Karim Helwani, Peter Grosche
The presenter noted that, compared to the technology description of the previous meeting, this proposal
- Integrates a QMF interface
- Defines interfaces between different processing modules
The presenter noted that the system did incur some decrease in subjective quality for LONG BRIR due to the low-complexity algorithm.
The presenter further noted that it is physically “incorrect” to truncate an HRTF (i.e. a BRIR measured in an anechoic environment). He envisions that the BRIR input data would have a flag indicating whether or not it is a true HRTF.
The technology implements the filtering in the subband domain. An automatic algorithm identifies a time-point in each subband signal that separates the Early Decay Time (EDT) response from the reverberant response. EDT is intended to provide “perceptually lossless” binauralization. Late Reverberation is an average over all BRIRs.
When the FFT complexity is 5*N*Log2(N), the system complexity is (assuming that QMF data is available):
Length | Complexity per sample
SHORT | 1353
MEDIUM | 1779
LONG | 1777
Taegyu Lee, Yonsei, presented
m32225 | Comments on the complexity evaluation for MPEG-H 3D Audio binaural CE | Jeongil Seo, Yong Ju Lee, Taejin Lee, Seungkwon Beack, Kyeongok Kang, Taegyu Lee, Young-cheol Park, Dae Hee Youn, Henney Oh
The contribution notes that RM0-CO may have QMF data after core coder decoding or not, depending on the total bitrate. This appears to require the following additional complexity for binauralization:
Rate | Multi-band Binauralization (i.e. QMF) | Single-band Binauralization
1.2 Mb/s | Add complexity of QMF analysis/synthesis | -
512, 256 kb/s | - | Add complexity of QMF analysis/synthesis
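The accounting rule in this table can be written down as a small lookup. The sketch below is illustrative only: `binaural_cost` and `qmf_cost` are hypothetical inputs (no QMF analysis/synthesis figure is given in this report), and the dictionary simply transcribes the table rows.

```python
# Transcription of the table: for each total bitrate (kb/s), which type of
# binauralizer has to pay for an extra QMF analysis/synthesis.
EXTRA_QMF = {
    1200: {"multi-band": True, "single-band": False},
    512: {"multi-band": False, "single-band": True},
    256: {"multi-band": False, "single-band": True},
}


def total_complexity(rate_kbps, kind, binaural_cost, qmf_cost):
    """Binauralization cost plus the QMF cost where the table requires it.

    `binaural_cost` and `qmf_cost` are hypothetical per-sample figures
    supplied by the caller; they are not values from the contribution.
    """
    extra = qmf_cost if EXTRA_QMF[rate_kbps][kind] else 0
    return binaural_cost + extra
```

With made-up figures, `total_complexity(1200, "multi-band", 100, 10)` adds the QMF cost, while `total_complexity(1200, "single-band", 100, 10)` does not.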
Discussion
The Chair summarized his view on the open issues:
- Decide on tapped delay line (TDL) or not in Yonsei/ETRI/WILUS/FhG-IIS technology
- What is complexity of complex FFT?
- How to evaluate complexity of multi-band binauralization systems that receive single-band input?
- How to evaluate complexity of single-band binauralization systems that receive multi-band input?
Werner Oomen, Philips, noted that Sys2 is never worse, which suggests that the lower-complexity technology with TDL should be selected.
It was the consensus of the AhG to select Sys2 (Yonsei with TDL) over Sys4 (IIS without TDL).
It was the consensus of the AhG to not adopt any Sys1 (Huawei) technology at this meeting. The Chair noted that the Huawei stereo reverberant technology could be proposed as a subsequent CE.
The Chair suggested using 5*(N/2)*log2(N) = 2.5*N*log2(N) as the FFT complexity (measured in DSP operations or MACs), where
5 | is the number of operations per butterfly
N/2 | is the number of butterflies per stage
log2(N) | is the number of stages
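The Chair's formula can be checked numerically with a few lines of code. This is a minimal sketch assuming a radix-2 FFT whose length N is a power of two, which is what the breakdown into N/2 butterflies per stage and log2(N) stages suggests; the function name and the `ops_per_butterfly` parameter are illustrative.

```python
def fft_complexity(n, ops_per_butterfly=5):
    """Operation count for a length-n FFT: ops * (n/2) * log2(n).

    Assumes n is a power of two (radix-2 structure).
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1       # log2(n) for a power of two
    butterflies_per_stage = n // 2
    return ops_per_butterfly * butterflies_per_stage * stages


# For N = 1024: 5 * 512 * 10 operations, i.e. 2.5 * N * log2(N).
print(fft_complexity(1024))  # 25600
```

For the same N, the 5*N*log2(N) figure used in the Huawei complexity estimate is exactly twice this count (51200 for N = 1024), so the choice of FFT model directly affects how the proponents' complexity numbers compare.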
Werner Oomen, Philips and Henney Oh, WILUS, questioned why there should be two normative binaural rendering systems. This goes against the MPEG “one function, one tool” philosophy. Henney Oh, WILUS, restated the worst-case complexity analysis presented by Yonsei. Gregory Pallone, Orange, noted that two technologies for one function might be valid if they have different complexities. Jan Plogsties, FhG-IIS, stated his support for a “one function, one tool” philosophy. Failure to do this could undermine the technical credibility of MPEG 3D Audio.
It was the consensus of the AhG to select Sys2 as a normative binauralization technology for RM0-CO, but the specific case in which QMF data is not available needs further discussion.