International Organisation for Standardisation / Organisation Internationale de Normalisation




Creation of Task Groups


Task groups were convened for the duration of the MPEG meeting, as shown in . Results of task group activities are reported below.
    1. Approval of previous meeting report


The Chair asked for approval of the 105th Audio Subgroup meeting report, which was registered as a contribution. The report was approved.




106th MPEG Audio Report

Schuyler Quackenbush
    1. Review of AHG reports


There were no requests to review any of the AHG reports.
    1. Ballots to process


| Title | Ballot |
| --- | --- |
| ISO/IEC 23003-2:2010/DCOR 2, SAOC (no ballot comments) | m31787 |
    2. Received National Body Comments and Liaison matters


| No. | Body | Title |
| --- | --- | --- |
| 31801 | ITU-R SG 6/WP 6C | Liaison Statement from ITU-R SG 6/WP 6C on MPEG-H 3D-Audio |
| 31805 | ITU-R SG 6/WP 6B | Liaison Statement from ITU-R SG 6/WP 6B on a METADATA model for audio formats |

    3. Joint meetings


| Groups | What | Where | Day | Time |
| --- | --- | --- | --- | --- |
| Audio, Systems | Improved audio support in the ISO base media file format (DRC); Proposed audio CICP changes; Understanding on how to synchronize Audio ES in FF | Audio | Wed | 1130 – 1230 |
| Audio, Systems | Possible Systems descriptors for DRC and 3D Audio; Understanding on how to synchronize Audio ES in MP2-TS | Audio | Wed | 1230 – 1300 |
| Audio, 3DG | Binauralization in 3D Audio and reference software | Audio | Wed | 1700 – 1730 |
| Audio, Systems | Green metadata | Audio | Thu | 0900 – 0930 |
    4. Plenary Discussions


There were none.
  1. Record of AhG meetings

    1. AhG on 3D Audio


The AHG on Dynamic Range Control (DRC) and 3D Audio and Audio Maintenance met on Sunday, January 12, 2014, from 1000 to 1800 hrs at the MPEG meeting venue.
      1. 3D Audio Binauralization CE


Listening Test Site Reports

The reports from each listening test site are listed below. Since all experts could read the reports, it was agreed that there was no need to make presentations.



m32224

ETRI listening test report for MPEG-H 3D Audio Binaural CE

Taejin Lee, Jeongil Seo, Kyeongok Kang, Hochong Park




m31831

Fraunhofer IIS Binaural CE Listening Test Report for MPEG-H 3D Audio

Simone Füg, Jan Plogsties




m31911

Huawei listening test report for the binauralization CE

Peter Grosche, Simone Fontana




m32277

Orange listening tests report for the second CE on RM0-CO binauralization

Gregory Pallone




m32194

Yonsei/WILUS listening test report for MPEG-H 3D Audio Binaural CE

Taegyu Lee, Henney Oh, Young-cheol Park, Dae Hee Youn



The Chair presented a spreadsheet with the combined subjective data (in the zip archive of the AhG report):



m31764

AHG on 3D Audio and Audio Maintenance

Schuyler Quackenbush




The spreadsheet presents a statistical analysis for each of the SHORT, MEDIUM and LONG BRIRs, as well as a statistical analysis of all subjective data taken together. The identities of the systems were revealed as:

| Proponent | System Number |
| --- | --- |
| ETRI | 2 |
| IIS | 4 |
| HUA | 1 |
| ORA | 3 |


Technical Descriptions

Jan Plogsties, FhG-IIS, and Jeongil Seo, ETRI, gave a joint presentation on:



m32223

Technical Description of ETRI/Yonsei/WILUS Binaural CE Proposal in MPEG-H 3D Audio

Jeongil Seo, Yong Ju Lee, Taejin Lee, Seungkwon Beack, Kyeongok Kang, Taegyu Lee, Young-cheol Park, Dae Hee Youn, Henney Oh




m32188

Fraunhofer IIS Binaural CE proposal in MPEG-H 3D Audio

Simone Füg, Jan Plogsties




The presentation reviewed the technology of the two proposals, from ETRI/Yonsei/WILUS and from FhG-IIS. The difference is that the FhG-IIS proposal used FFT-based convolution for all 48 subbands, while the ETRI/Yonsei/WILUS proposal used FFT-based convolution for subbands 1-32 and a 1-tap delay line filter for subbands 33-48.

The performance of all proponent technologies was shown, both averaged over all items and for each item. This analysis was consistent with the one in the Excel spreadsheet attached to the AhG report.

The differences between the two systems are summarized here, where:

D&E: Direct and Early reflections
LR: Late reflections
TDL: Tapped delay line
VOFF: Variable Order Filtering in Frequency domain

| | ETRI/Yonsei/WILUS | FhG-IIS |
| --- | --- | --- |
| D&E | VOFF, bands 1-32 (1) | VOFF, bands 1-48 |
| LR | Sparse Freq. Reverb, bands 1-32 | Sparse Freq. Reverb, bands 1-48 |
| TDL | 1-tap TDL, bands 33-48 | - |

Note 1: The first subband is number 1 (not 0).
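
The band split between the two proposals can be illustrated with a minimal sketch. It is not taken from either contribution: the function names, the data layout and the way the per-band gain and delay are supplied are all assumptions, and plain direct convolution stands in for the FFT-based fast convolution used by the proponents.

```python
def convolve(x, h):
    """Direct linear convolution (a stand-in for FFT-based fast convolution)."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def one_tap(x, gain, delay):
    """1-tap delay line: y[n] = gain * x[n - delay]."""
    return [0j] * delay + [gain * xi for xi in x]


def filter_subbands(subbands, filters, taps, split=32):
    """Filter each subband signal (band 1 first).

    subbands: one sequence of samples per subband (hypothetical layout).
    filters:  filter coefficients for bands 1..split.
    taps:     (gain, delay) pairs for bands split+1 and above.
    With split=32 this mimics the ETRI/Yonsei/WILUS band split; with
    split=48 every band is convolved, as in the FhG-IIS proposal.
    """
    out = []
    for k, x in enumerate(subbands, start=1):
        if k <= split:
            out.append(convolve(x, filters[k - 1]))
        else:
            gain, delay = taps[k - split - 1]
            out.append(one_tap(x, gain, delay))
    return out


# Toy input: 48 subbands carrying a unit impulse.
bands = [[1, 0]] * 48
result = filter_subbands(bands, [[1, 1]] * 32, [(0.5, 1)] * 16)
print(result[0], result[47])
```

The saving comes from the upper bands: a 1-tap delay line costs one multiplication per sample, whereas convolution cost grows with the filter length.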

Gregory Pallone, Orange, presented

m32278

Orange proposal for the second CE on RM0-CO binauralization

Gregory Pallone, Marc Emerit




The contribution reviews the technology which is in HOA and which was in a contribution to the 106th MPEG meeting. The technology uses parameters that are obtained by a fully automatic method. In addition, it presents complexity estimations for SHORT, MEDIUM and LONG BRIRs.

It documents that the automatic filter pre-processing provides the following parameters:






| | SHORT | MID | LONG |
| --- | --- | --- | --- |
| Direct length (in samples) | 128 | 4096 | 8192 |
| Diffuse length (in samples) | - | 4096 | 8192 |
| FcDirect (in kHz) | 24 | 18 | 18 |
| FcDiffuse (in kHz) | - | 12 | 8 |

The presenter noted that performance was lower for the SHORT case. However, if the automatic processing were modified so that, for BRIR lengths less than or equal to 4096, there is no truncation and just a direct convolution with the 558-length BRIR (and no diffuse component), then the complexity for all lengths is as follows:



| Length | Complexity per sample |
| --- | --- |
| SHORT | 481 |
| MEDIUM | 922.00 |
| LONG | 958.67 |

Jan Plogsties, FhG-IIS and Jeongil Seo, ETRI, noted that the results showed that the Orange proposal did have issues for the SHORT case, and the proposed “fix” is effectively a hand-tuned optimization. Jan Plogsties further noted that, even at high bit rates, there may be QMF domain analysis if:



  • Formatter requires QMF analysis/synthesis

  • DRC requires QMF analysis/synthesis

Simone Fontana, Huawei, presented



m31914

Technical Description of the Huawei Binaural CE proposal

Simone Fontana, Karim Helwani, Peter Grosche




The presenter noted that, compared to the technology description of the previous meeting, this proposal

  • Integrates a QMF interface

  • Defines interfaces between different processing modules

The presenter noted that the system did incur some decrease in subjective quality for LONG BRIR due to the low-complexity algorithm.

The presenter further noted that it is physically “incorrect” to truncate an HRTF (i.e. a BRIR measured in an anechoic environment). He envisions that the BRIR input data would have a flag indicating whether or not it is a true HRTF.

The technology implements the filtering in the subband domain. An automatic algorithm identifies a time-point in each subband signal that separates the Early Decay Time (EDT) response from the reverberant response. EDT is intended to provide “perceptually lossless” binauralization. Late Reverberation is an average over all BRIRs.

When the FFT complexity is 5*N*Log2(N), the system complexity is (assuming that QMF data is available):



| Length | Complexity per sample |
| --- | --- |
| SHORT | 1353 |
| MEDIUM | 1779 |
| LONG | 1777 |

Taegyu Lee, Yonsei, presented



m32225

Comments on the complexity evaluation for MPEG-H 3D Audio binaural CE

Jeongil Seo, Yong Ju Lee, Taejin Lee, Seungkwon Beack, Kyeongok Kang, Taegyu Lee, Young-cheol Park, Dae Hee Youn, Henney Oh




The contribution notes that RM0-CO may have QMF data after core coder decoding or not, depending on the total bitrate. This appears to require the following additional complexity for binauralization:

| Rate | Multi-band Binauralization (i.e. QMF) | Single-band Binauralization |
| --- | --- | --- |
| 1.2 Mb/s | Add complexity of QMF analysis/synthesis | - |
| 512, 256 kb/s | - | Add complexity of QMF analysis/synthesis |
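
The table can be read as a simple rule: QMF analysis/synthesis is added to the complexity count whenever the domain of the core decoder output differs from the domain in which the binauralizer operates. The sketch below encodes that reading. The rate-to-domain mapping is inferred from the table rather than stated in the contribution, and all names are hypothetical.

```python
# Assumption inferred from the table: whether the core decoder output is
# already in the QMF domain at each total bitrate (in kb/s).
QMF_AVAILABLE_AT_RATE = {1200: False, 512: True, 256: True}


def needs_qmf_conversion(rate_kbps, multiband_binauralizer):
    """Return True if QMF analysis/synthesis must be added to the count."""
    qmf_available = QMF_AVAILABLE_AT_RATE[rate_kbps]
    # Extra conversion is needed only when the binauralizer works in a
    # different domain from the one the core decoder delivers.
    return qmf_available != multiband_binauralizer


for rate in (1200, 512, 256):
    for multiband in (True, False):
        kind = "multi-band" if multiband else "single-band"
        print(rate, kind, needs_qmf_conversion(rate, multiband))
```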


Discussion

The Chair summarized his view on the open issues:



  • Decide on tapped delay line (TDL) or not in Yonsei/ETRI/WILUS/FhG-IIS technology

  • What is complexity of complex FFT?

  • How to evaluate complexity of multi-band binauralization systems that receive single-band input?

  • How to evaluate complexity of single-band binauralization systems that receive multi-band input?

Werner Oomen, Philips, noted that Sys2 is never worse, which suggests that the lower-complexity technology with TDL should be selected.

It was the consensus of the AhG to select Sys2 (Yonsei with TDL) over Sys4 (IIS without TDL).

It was the consensus of the AhG to not adopt any Sys1 (Huawei) technology at this meeting. The Chair noted that the Huawei stereo reverberant technology could be proposed as a subsequent CE.

The Chair suggested using 5*(N/2)*Log2(N) = 2.5*N*Log2(N) as the FFT complexity (measured in DSP operations or MACs), where

| Term | Meaning |
| --- | --- |
| 5 | is the number of operations per butterfly |
| N/2 | is the number of butterflies per stage |
| Log2(N) | is the number of stages |
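
As a worked illustration of the Chair's formula, the following sketch evaluates it for a few FFT lengths. It assumes a radix-2 FFT whose length is a power of two. The function name is hypothetical, and the lengths shown are arbitrary examples rather than values from any proposal.

```python
def fft_ops(n, ops_per_butterfly=5):
    """Estimated operation count for one radix-2 complex FFT of length n."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1        # Log2(N) for a power of two
    butterflies_per_stage = n // 2     # N/2
    return ops_per_butterfly * butterflies_per_stage * stages


for n in (64, 1024, 4096):
    # Total operations, and operations per FFT point.
    print(n, fft_ops(n), fft_ops(n) / n)
```

For N = 1024 this gives 5 * 512 * 10 = 25600 operations, half the value obtained with the 5*N*Log2(N) figure assumed in the Huawei complexity estimate.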

Werner Oomen, Philips and Henney Oh, WILUS, questioned why there should be two normative binaural rendering systems. This goes against the MPEG “one function, one tool” philosophy. Henney Oh, WILUS, restated the worst-case complexity analysis presented by Yonsei. Gregory Pallone, Orange, noted that two technologies for one function might be valid if they have different complexities. Jan Plogsties, FhG-IIS, stated his support for a “one function, one tool” philosophy. Failure to do this could undermine the technical credibility of MPEG 3D Audio.

It was the consensus of the AhG to select Sys2 as a normative binauralization technology for RM0-CO, but the specific case in which QMF data is not available needs further discussion.
