International organisation for standardisation organisation internationale de normalisation


Audio plenary, joint meeting and task group activities




4 Audio plenary, joint meeting and task group activities

4.1 Review of AHG reports


There were no requests to review any of the AHG reports.

4.2 Received national body comments and liaison matters


NB Comment and Liaison documents were reviewed and the drafting of the responses was delegated.

4.3 Joint meetings and documents from other groups


Requirements

Hiroya Nakamura, JVC, presented



14975

Sadahiro Yasura
Hiroya Nakamura
Motoharu Ueda

Requirements for FTV Audio

This contribution presented a number of new audio-related requirements for FTV. These include

  • Channel-oriented representation (i.e. loudspeakers)

    • Recording-oriented parameters

    • Listener-oriented parameters

  • Object-oriented representations (i.e. sound sources)

    • Position-oriented recording

The Chair noted that it is very difficult for audio experts to endorse, revise or expand the text of the contribution, since there is very little information on the overall goals and application scenarios of the FTV work. Hence no action was taken at this meeting.

4.4 Task Group discussions

4.4.1 MPEG-4 audio, conformance, reference software


Andreas Schneider, Coding Technologies, presented

14986

Klaus Peichl

Proposed correction to SBR conformance testing

The contribution reviewed the SBR conformance procedure and then explained a pathological issue with one of the monophonic conformance signals. The solution is to create the bitstream at a higher bitrate so as to make the AAC-encoded base band better conditioned for the subsequent SBR operation. The Chair noted that the old bitstream could be labelled as “deprecated” as of an indicated date, with manufacturers directed to use the new bitstream for testing of new products.

It was the consensus of the ASG to create a DCOR to MPEG-4 Conformance AMD 8 based on this contribution.

Heiko Purnhagen, Coding Technologies, presented

14988

Heiko Purnhagen gel@codingtechnologies.com Alexander Groeschel hrc@codingtechnologies.com Holger Hoerich

Proposed clarification for MPEG-4 Audio

This contribution notes that, although it is possible to signal an arbitrary sampling rate, a ProgramConfigElement offers no way to do so. The contribution proposes adding a paragraph to clarify this issue.

It was the consensus of the Audio Subgroup to incorporate this into the existing DCOR on DST and MP3on4 and LTP.

Tilman Liebchen, LG, presented

14928

Tilman Liebchen

Proposed Text of ISO/IEC 14496-3:2005/Amd.2:2006/COR 3, ALS

It was the consensus of the Audio Subgroup to make this contribution the text of Cor. 3. It is anticipated that 14496-3/Cor 5, 14496-3/AMD2/Cor 3 and 14496-3/AMD3/Cor 1 will be incorporated into 14496-3 Fourth Edition.

4.4.2 MPEG-D MPS


Heiko Purnhagen, Coding Technologies, presented

14856

Heiko Purnhagen
Jeroen Koppens
Matthias Neusinger
Kristofer Kjörling

Further corrections to MPEG Surround DCOR1

The contribution raises additional corrections that it proposes to incorporate into 23003/DCOR1. Most are editorial in that they clarify text or correct editing errors. One correction proposes a mechanism to extend the CRC that signals the integrity of the SBR payload from 8 bits to 16 bits.

It was the consensus of the Audio Subgroup to incorporate this into 23003/COR1.

Andreas Schneider, Coding Technologies, presented

14860

Andreas Schneider

Update on MPEG Surround conformance

This contribution presented the current status of MPEG Surround Conformance, which has the status of FPDAM. There are now 19 bitstreams defined for each of two base coders, and 36 of the 38 bitstreams are available. The contribution reports that there are still bugs in the MPEG Surround ANSI-C encoder and the MPEG Surround conformance tool, but it is expected that these issues will be resolved by the next MPEG meeting.

4.4.3 MPEG-D SAOC


Oliver Hellmuth, FhG, and Erik Schuijers, Philips, presented

14989

J. Engdegård
H. Purnhagen
B. Resch
L. Villemoes
C. Falch
O. Hellmuth
J. Herre
J. Hilpert
A. Hölzer
L. Terentiev
J. Breebaart
J. Koppens

Proposed SAOC Working Draft Document

This contribution is the proposed text of the Spatial Audio Object Coding (SAOC) Working Draft. The text comprises the following sections:

  • Encoder (informative)

  • Multipoint Control Unit (MCU) stream combiner

  • SAOC to MPEG Surround transcoder

The presentation gave a brief overview of the technology in SAOC, including

  • SAOC to MPEG Surround transcoding

  • Stereo to stereo processing, typically with object re-positioning

  • Binaural rendering of the virtual multichannel presentation

  • Effects interface (Insert and Send)

It was the consensus of the Audio Subgroup to convert this contribution into an output document.

Oliver Hellmuth, FhG, presented

14974

Andreas Hoelzer
Jeroen Koppens
Leif Sehlstrom

SAOC Reference software

The contribution contains the reference code in an attached *.zip file. The transcoder compiled from the ANSI-C code decodes the bitstreams submitted as part of the CfP and produces MPEG Surround streams that match those associated with the CfP submissions bit for bit, apart from occasional rounding errors caused by the transition from MATLAB-based code to ANSI-C code. These rounding errors are at most one quantization level and occur for no more than 0.3% of the parameters.
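The tolerance described above (differences of at most one quantization level, affecting no more than 0.3% of the parameters) can be expressed as a simple check. The following is only a sketch: the function name, the flat list of integer quantization indices and the returned dictionary are assumptions made for illustration, and it is not the comparison procedure used by the contributors.

```python
def compare_quantized_parameters(reference, candidate,
                                 max_level_diff=1, max_mismatch_ratio=0.003):
    """Compare two equally long sequences of integer quantization indices.

    Hypothetical helper: the thresholds default to the figures quoted in the
    minutes (one quantization level, 0.3% of the parameters).
    """
    if len(reference) != len(candidate):
        raise ValueError("parameter sequences must have the same length")
    if not reference:
        raise ValueError("parameter sequences must not be empty")

    # Absolute difference, in quantization levels, for every parameter.
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    mismatches = sum(1 for d in diffs if d != 0)
    mismatch_ratio = mismatches / len(diffs)
    largest = max(diffs)

    return {
        "max_diff": largest,
        "mismatch_ratio": mismatch_ratio,
        "within_tolerance": (largest <= max_level_diff
                             and mismatch_ratio <= max_mismatch_ratio),
    }


# Example with made-up data: 1000 parameters, two of them off by one level.
ref = list(range(1000))
cand = list(ref)
cand[10] += 1
cand[500] -= 1
report = compare_quantized_parameters(ref, cand)
```

In this made-up example two of 1000 parameters differ by one level, so the report marks the streams as within tolerance; a single difference of two levels would fail the check.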

Jeongil Seo, ETRI, presented



14908

Jeongil Seo
Seungkwon Beack
Kyeongok Kang
Kwangki Kim
Minsoo Hahn

Comments on SAOC architecture for MBO

This contribution proposed two Core Experiments. After looking into the SAOC WD, it was discovered that CE1 describes exactly how the SAOC WD currently operates, and so this CE is not meaningful. ETRI expects to propose the second CE at the next meeting.

Oliver Hellmuth, FhG, presented



14985

Oliver Hellmuth
Jürgen Herre
Leonid Terentiev
Andreas Hölzer
Cornelia Falch
Johannes Hilpert

Proposed Improvement for MPEG SAOC

This contribution proposes an additional tool that enhances performance for the Karaoke application. The additional tool is the Two-to-Three (TTT) element, which combines and splits a foreground object and a background object, where the background is L, R (stereo downmix) and the foreground object is C (mono object). FhG expects to propose this CE at the next meeting.

The contribution presented listening test results for the cases of Mute/Karaoke and Solo, each with varying levels of residual coding. Results showed that, with 24 kb/s of residual, performance was improved from “fair” for RM0 to “excellent” for the CE proposal. An audio demonstration is available at this meeting.

FhG expects to present a cross-check listening test at the next meeting to make a complete CE proposal.

Osamu Shimada, NEC, presented



15009

Osamu Shimada
Toshiyuki Nomura
Akihiko Sugiyama
Osamu Hoshuyama

Information on an additional SAOC functionality of separating real-environment signals into multiple objects

The contribution gives information on a new tool that could be incorporated into SAOC that permits monophonic signals to be separated into individual objects, e.g. speech and undesired background noise. The new tool permits separate control of object level and object localization. This separation would be done in the SAOC decoder.

NEC expects to present this as a CE proposal at the next MPEG meeting.


4.4.4 MPEG-4 AAC-ELD


Pierrick Philippe, France Telecom, presented

14979

Pierrick Philippe
David Virette

Proposed changes to ELD AAC

The contribution covers three broad topics:

  • Block Switching CE – changes in FPDAM text to support incorporation of Block Switching CE

  • Delayless mixing – corrections and clarifications to the proposed informative annex.

  • Syntax changes

It is the FT recommendation that the AAC-ELD syntax be as close to the MPEG-4 syntax as possible. This includes the following issues:

  • Backward compatibility is not possible, so backward compatible signalling is not required.

  • Retain method of embedding SBR data in extension payload

  • Suggested small syntax changes may not result in significant bitrate savings.

Markus Schnell, FhG, noted that the FPDAM text has already removed the embedding of SBR data in the extension payload. Concerning syntax changes that result in bit savings, the Chair noted that a saving of up to 1 kb/s corresponds to approximately 3% when operating at 32 kb/s, which is significant.

Markus Schnell, FhG, presented



14999

Markus Schnell
Ralf Geiger

Proposed changes on ELD

This contribution proposes the following

  • Transmission of multiple SBR headers in ELDSpecificConfig to support multichannel signals. This permits the system to be correctly initialized after reading the ELDSpecificConfig.

  • Simplification of syntax. This proposes to transmit the SBR payload after the AAC data, and to do this in a way that supports multichannel.

  • Error sensitivity categories. This proposes to assign SBR data to two categories of error resilience.

  • Clarification of text describing filter bank.

Markus Schnell, FhG, presented

15000

Markus Schnell

Proposed Signaling Extension for AAC ELD

This contribution proposes changes to the ELDSpecificConfig such that all information needed to instantiate an AAC-ELD decoder is known after parsing the ELDSpecificConfig (e.g. output sampling rate, number of channels, single or dual rate). It also proposes an extensible mechanism in the Config that permits new tools (e.g. SAOC) to have their configuration information carried in the ELDSpecificConfig.

Discussion of m14979, m14999 and m15000

It was the consensus of the Audio Subgroup to:



  • Since backward compatibility is not possible, avoid backward compatible signalling methods for AAC-ELD. The task group will draft text for subpart 1, which will be reviewed when the FPDAM text is reviewed.

Incorporate the following into the AAC-ELD FPDAM text:

  • new syntax for extensible ELDSpecificConfig (m15000)

  • new error sensitivity categories (m14999)

  • clarification of informative text describing encoder filterbank (m14999)

  • transmission of multiple SBR headers in ELDSpecificConfig to support multichannel signals (m14999)

Continue to discuss

  • Block Switching CE

  • Transport of SBR payload

  • Small syntax changes aimed at bitrate savings

Jeff Huang, Qualcomm, presented

15008

Yuriy Reznik
Ravi Chivukula
Jeff Huang

Proposed informative addition to AAC ELD

This contribution presents information on the relationship between the MDCT filterbank in AAC and the filterbank in AAC-ELD. In particular, it is shown that the ELD filterbank can be computed using an MDCT core with some simple indexing and sign inversions as pre- and post-processing steps.
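For readers unfamiliar with the "MDCT core" referred to above, the sketch below shows a direct-form MDCT and inverse MDCT with a sine window and overlap-add. It is a textbook O(N²) formulation chosen for clarity; it is not the AAC-ELD filterbank, it does not reproduce the indexing and sign-inversion steps of m15008, and it is not a fast algorithm. The function names and the choice of sine window are assumptions made for illustration.

```python
import math


def sine_window(n_half):
    """Sine window of length 2*n_half (satisfies the Princen-Bradley condition)."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def mdct(block):
    """Direct-form MDCT: 2N input samples produce N coefficients."""
    n_half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Direct-form inverse MDCT: N coefficients produce 2N aliased samples.

    The 2/N scaling is the one that yields unity gain when the same
    Princen-Bradley window is applied before the MDCT and after the IMDCT.
    """
    n_half = len(coeffs)
    return [
        (2.0 / n_half) * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half))
        for n in range(2 * n_half)
    ]


def analysis_synthesis(signal, n_half):
    """Window, transform, inverse-transform and overlap-add with hop size N."""
    window = sine_window(n_half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        frame = [signal[start + n] * window[n] for n in range(2 * n_half)]
        rec = imdct(mdct(frame))
        for n in range(2 * n_half):
            out[start + n] += rec[n] * window[n]
    return out
```

Samples covered by two overlapping frames are reconstructed exactly (up to floating-point error) because the time-domain aliasing of adjacent frames cancels; the first and last N samples are covered by only one frame and are not reconstructed.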

The consensus of the Audio Subgroup is to continue to discuss ideas for fast forms of filterbanks and other computationally complex elements of MPEG specifications, with the possibility of creating informative text in the future that would address such issues in a comprehensive way (e.g. for both ELD and SBR filterbanks, etc). Additionally, the Audio Subgroup encourages the authors to publish the ideas presented in this contribution in the literature so as to make them available to implementers of MPEG specifications.



Further Discussion: Thursday Afternoon

Jeff Huang, Qualcomm, presented a candidate output document that might become an MPEG-4 Technical Report. This document is intended to present fast implementations of filterbanks found in MPEG-4 specifications, such as the filterbank in AAC, the filterbank in AAC-ELD and the SBR filterbank. Ralph Sperschneider, FhG, and Werner Oomen, Philips, felt that this was not an appropriate topic to pursue in the Audio Subgroup. Mohammed Mansour, TI, raised the issue of how the Audio Subgroup would select the “best” implementation for inclusion in this document. Ralf Geiger, FhG, noted that the current content of the document, based on contribution m15008, may not even be the best implementation for the ELD filterbank. The Chair noted that one could have “core experiments” on such a document, but that might not be the best use of Audio Subgroup resources.

It was the consensus of the Audio Subgroup that the proposal not be an output document and that such a technical report will not be entertained at this time. However, the Chair will investigate whether the Implementation Studies Group might be an appropriate forum for such work. Reference code for such fast implementations may be collected as part of the MPEG-4 VM.

David Virette, France Telecom, presented



14973

Pierrick Philippe
David Virette

Proposal for MPEG-4 AAC ELD Verification Tests

The contribution proposes the following as a framework for the AAC-ELD verification test:

  • Mono signals

  • G722.1 Annex C at 32 kb/s

  • MPEG-4 AAC-ELD at 32 kb/s

  • MPEG-4 AAC-LD at 32 kb/s and 38 kb/s

  • MPEG-4 HE-AAC at 32 kb/s

  • MUSHRA methodology

  • Use Speech and Audio database as test items

The contribution proposed to have a pre-selection phase that will select final test items from the set of candidate coded items, and FT

Ralf Geiger, FhG, presented



15001

Markus Schnell
Ralf Geiger

Proposal for MPEG-4 ER AAC ELD Verification Test

The contribution proposes the following as the framework for a verification test:

  • Mono signals

  • G722.1 Annex C at 32 kb/s

  • G722.2 (AMR-WB) at 23 kb/s

  • MPEG-4 AAC-LD at 32 kb/s

  • MPEG-4 AAC-ELD at 32 kb/s

  • MPEG-4 HE-AAC at 32 kb/s

  • MUSHRA methodology

  • Use MPEG-4 test set as test items

Discussion

Kristofer Kjörling, Coding Technologies, noted that there should be application-driven and technology-driven tests. He proposed that an application-driven test might be



  • AAC-ELD at 32 kb/s and 48 kb/s and even 64 kb/s

  • G722.1 Annex C at 32 kb/s

It was decided to continue to discuss this in a break-out group with the goal of drafting a test workplan that will be an output document.

Discussion Thursday 9AM

The Chair presented a summary of the core experiment results from the past several meetings. Bernhard Grill, FhG, presented his summary of the core experiment results from the past several meetings. The proponents of AAC-ELD and the block switching CE discussed the pros and cons of the block switching CE, including level of performance improvement and impact on complexity and implementation.

The Chair summarized the discussion by noting that the non-symmetric windows in the AAC-ELD filterbank, together with the TNS tool, provide the means for effectively coding transients. Added to this base architecture, the block switching tool provides an increment in performance. However, the increment is small, and the tool has an impact on some implementation platforms, such as low-cost DSP chips, and on some use cases, such as those that might exploit frequency-domain mixing.

It was the consensus of the Audio Subgroup that block switching provides an improvement, but that the amount of improvement is small relative to the potential complexity increase and the applicability of the proposed technology to identified scenarios.

It was the consensus of the Audio Subgroup that the Block Switching technology not be adopted into the AAC-ELD FDAM text.

Discussion Thursday 2:30PM

The remaining open issues on AAC-ELD are:



  • Transport of SBR payload

  • Small syntax changes aimed at bitrate savings

Transport of SBR payload. The FPDAM text has syntax in which the SBR payload is part of the AAC-ELD payload rather than in the extension payload. There were no opinions expressed to change the FPDAM text, so it was the consensus of the Audio Subgroup to keep the FPDAM text as is.

Syntax Changes. Markus Schnell, FhG, reviewed the set of syntax changes that are presented in contribution m14999, and which are the same as those presented in Chapter 7 of N9237, Technology under Consideration for AAC-ELD. Pierrick Philippe, France Telecom, raised an objection to the proposed changes, and reminded audio experts that these objections are presented in his contribution m14979. The Chair confirmed that the proposed changes result in a saving of 11 bits/frame, or 0.55 kb/s when operating AAC-ELD in 48 kHz/24 kHz dual mode. The Chair further noted that there is no backward compatibility in AAC-ELD, and hence no compelling reason to retain a given structure in the bitstream. Kristofer Kjörling, Coding Technologies, felt that it would be inappropriate for the Audio Subgroup to neglect these potential bitrate savings when the primary objective of its work is compression efficiency.

It was the consensus of the Audio Subgroup to incorporate the proposed syntax changes into the AAC-ELD FDAM text.
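The bitrate figures quoted in this discussion can be checked with a short calculation. The sketch below assumes a frame length of 480 samples at the 24 kHz core rate; the minutes do not state the frame length, and 480 is used here only because it reproduces the quoted 0.55 kb/s (a 512-sample frame would give roughly 0.52 kb/s). The function names are illustrative.

```python
def saving_kbps(bits_per_frame, core_sample_rate_hz, frame_length_samples):
    """Convert a per-frame bit saving into kb/s.

    frame_length_samples is an assumption supplied by the caller; it is not
    given in the minutes.
    """
    frames_per_second = core_sample_rate_hz / frame_length_samples
    return bits_per_frame * frames_per_second / 1000.0


def saving_percent(saving_in_kbps, total_bitrate_kbps):
    """Express a saving as a percentage of the total operating bitrate."""
    return 100.0 * saving_in_kbps / total_bitrate_kbps


# 11 bits/frame at a 24 kHz core rate with an assumed 480-sample frame.
syntax_saving = saving_kbps(11, 24000, 480)

# The Chair's earlier remark: 1 kb/s out of 32 kb/s.
one_kbps_share = saving_percent(1.0, 32.0)

# Share of the 0.55 kb/s saving at the same 32 kb/s operating point.
syntax_share = saving_percent(syntax_saving, 32.0)
```

Under these assumptions the syntax changes save 0.55 kb/s, a little under 2% of a 32 kb/s stream, while the 1 kb/s figure mentioned earlier corresponds to 3.125%, consistent with the "approximately 3%" stated by the Chair.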


Thursday 5PM

Markus Schnell, FhG, presented the AAC-ELD FDAM text. There was consensus on the FDAM text. The Chair advised that approval Friday morning will be a formality only with minimum discussion, but noted that there will be a 1-month editing period to check and re-check the correctness of the text. The Chair also noted that he has the expectation that all National Bodies will approve the AAC-ELD FPDAM text in MPEG closing plenary.


Ralf Geiger, FhG, presented the DoC on AAC-ELD FPDAM. The responses to the FNB, GNB and USNB were discussed; it was agreed that revisions are required, and these will be made in a task group this evening.

4.4.5 Speech and Audio Coding


The Chair presented the CfP text several times for review and revision by the group. Likewise, the Chair presented the Evaluation Guidelines text for review and several revisions were made. After review, there was consensus on the text of the CfP and final text will be reviewed and approved Friday morning. The open issues in the Evaluation Guidelines will be identified Friday morning prior to review and approval.
