1 Opening Audio Plenary
The MPEG Audio Subgroup meeting was held during the 94th meeting of WG11, October 11-15, 2010, Guangzhou, China. The list of participants is given in 1.
2 Administrative matters
2.1 Communications from the Chair
The Chair summarised the issues raised at the Sunday evening Chair’s meeting, proposed task groups for the week, and proposed agenda items for discussion in Audio plenary.
2.2 Approval of agenda and allocation of contributions
The agenda and schedule for the meeting were discussed, edited and approved. The agenda lists the documents contributed to this meeting and presented to the Audio Subgroup, either in the task groups or in Audio plenary. The Chair brought relevant documents from the Requirements and Systems subgroups to the attention of the group. The agenda was revised in the course of the week to reflect the progress of the meeting, and the final version is shown in 2.
2.3 Creation of Task Groups
Task groups were convened for the duration of the MPEG meeting, as shown in . Results of task group activities are reported below.
2.4 Approval of previous meeting report
The Chair asked for approval of the 93rd Audio Subgroup meeting report, which was registered as a contribution. After some discussion, the wording concerning discussion in the Audio closing plenary was modified, and the revised report was approved.
2.5 Review of AHG reports
There were no requests to review any of the AHG reports.
2.6 Joint meetings
Who | What | Where | When
Req, Audio | Audio for HEVC | Audio | Wed, 1400-1500
2.7 Received National Body Comments and Liaison matters
Num. | Source | Respond
M17985 | ITU-T SG 16 to SC 29/WG 11: Approval of a New Question on Telepresence systems | S. Quackenbush
2.8 Plenary Discussion
3 Record of AhG meetings
3.1 AhG Meeting on USAC, Sunday 1000-1800
USAC Text and Reference Software corrections
Max Neuendorf, FhG, presented
m18456 | Corrections to Reference Software and CD of USAC | Max Neuendorf
This contribution proposes changes to the USAC CD text and Reference Software. The proposed changes are grouped into the following three categories:
- Changes to the CD text (which are only editorial)
  - Description of spectral noiseless coding
  - Consistent use of variables
  - Clarification of arithmetic coding context
  - Correction to numerical constants
  - Description of overlap-add for synthesis of output from adjacent coded frames
  - Clarifications to MDCT-based TCX description
  - Decoding coefficients for complex stereo prediction in MDCT
  - Combination of SBR and MPS 212
  - Re-organization of CD text to conform to ISO directives
- Changes to the Reference Software
  - Recalculation of pitch gain for use in the bass post-filter in TCX to ACELP transitions
  - Corrections to handling of higher bands in the 32-band QMF analysis filterbank
  - High QMF bands should not be initialized to zero in unified stereo coding
- Changes to both CD text and Reference Software
  - Simplification of AVQ bitstream syntax. This re-organizes the bitstream but does not affect the decoded waveform
  - Make all integer serialization MSB first
  - Remove interleaving of AVQ coefficients
The AhG makes the following recommendations:
- Adopt the proposed changes to the CD text (which are only editorial) into a “Study on USAC CD” output document.
- Make the proposed changes to the USAC reference software.
- Adopt the proposed changes to the CD text and make the associated changes to the USAC reference software.
Schuyler Quackenbush, ARL, presented
m18077 | Draft USAC CE Status and Workplan | S. Quackenbush
The presenter asked the USAC CE proponents to check the CE status tables and bring them up to date, so that producing the output document on Friday would be easier.
Improved bass post-filter
Philippe Gournay, VoiceAge, presented
m18435 | VoiceAge listening test results for the enhanced bass-postfilter CE | Philippe Gournay, Roch Lefebvre
The contribution presented 16 kb/s listening test results for this CE. The systems under test are RM7 and RM7+CE. Seven items containing singing voice with background music were used in addition to the CfP test items. There was no difference for absolute scores. When considering differential scores, 4 of the 19 items and the mean are better at the 95% level of significance.
Max Neuendorf, FhG, presented
m18451 | FhG listening test report on CE on improving the USAC bass post-filter | Guillaume Fuchs, Max Neuendorf
The contribution presented 16 kb/s mono listening test results for this CE. There is no difference for absolute scores. When considering differential scores, 2 items are better, 1 item worse at the 95% level of significance.
The contribution also reports that the decoder with the CE software integrated was run on the listening test bitstreams and reproduced the listening test decoded waveforms exactly.
David Virette, Huawei, presented
m18467 | Report on cross-check listening test for the CE on improved Bass-post filter for USAC | David Virette
The contribution presented 16 kb/s mono listening test results for this CE. There is no difference for absolute scores. When considering differential scores, 1 item (es01) is worse at the 95% level of significance.
Kristofer Kjörling, Dolby, presented
m18379 | Finalization of CE on an improved bass-post filter operation for the ACELP of USAC | Barbara Resch, Leif Sehlström, Heiko Purnhagen, Lars Villemoes, Kristofer Kjörling, Bruno Bessette, Philippe Gournay
The contribution presented 16 kb/s mono listening test results for this CE. There is no difference for absolute scores. When considering differential scores, 4 items better, the mean better, none worse at the 95% level of significance.
The contribution also presents an overview of the technology and a summary of the cross-check results.
It notes that the current bass post-filter, used in ACELP coding mode, helps enhance the vocal signal. However, when there is background music with strong low-frequency harmonics and the coder switches between ACELP, TCX and FD modes, the suppression of the low musical harmonics comes and goes and is quite audible.
When considering individual sites, for 2 items and for the mean, 2 of 4 sites agree on improvement.
When all data is pooled, there are a total of 32 listeners. When considering differential scores, 4 items better, the mean better and none worse at the 95% level of significance.
The AhG recommends adopting the CE technology into the “Study on USAC CD” output document.
Complexity reduction for time warping
Takeshi Norimatsu, Panasonic, presented
m18360 | Panasonic cross check report on complexity reduction for time warping in USAC | Takeshi Norimatsu, Tomokazu Ishikawa, Haishan Zhong, Dan Zhao, Kok Seng Chong
The contribution reports results for a 64 kb/s stereo listening test. It showed no significant differences in either absolute or differential scores, for any item or for the mean.
Heiko Purnhagen, Dolby, presented
m18375 | Dolby listening test results for CE on complexity reduction for time warping in USAC | Heiko Purnhagen, Kristofer Kjörling
The contribution reports results for a 64 kb/s stereo listening test. It showed no significant differences for absolute scores. For differential scores, 1 item was better.
Markus Multrus, FhG, presented
m18452 | Completion of the Core Experiment on Reducing the Complexity of the USAC Time Warping | Stefan Bayer
The contribution gives an overview of the CE technology and a summary of the listening test results.
The technology uses time warping to reduce the pitch interval variations over a frame such that the pitch epochs have a more nearly harmonic frequency representation and thus can be more efficiently coded by the FD coder mode.
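To illustrate the idea described above, the following Python sketch warps the time axis of a signal with a rising pitch so that its pitch becomes constant. It is a simplified illustration under stated assumptions, not the CE or RM8 algorithm: the pitch contour is assumed to be known for every sample, resampling uses linear interpolation, and the helper name `time_warp` is hypothetical. USAC applies warping within its time-warped MDCT, which is not reproduced here.

```python
import math


def time_warp(x, f0):
    """Resample x onto a warped time axis on which the pitch contour f0
    (one positive value in Hz per input sample) becomes constant.

    Hypothetical helper; simplified illustration with linear interpolation.
    Returns the warped signal and the constant (mean) pitch it ends up with.
    """
    f_mean = sum(f0) / len(f0)
    # Warped position of each input sample: accumulated pitch relative to
    # the mean, so fast-pitch regions are stretched and slow ones compressed.
    tau = [0.0]
    for k in range(len(x) - 1):
        tau.append(tau[-1] + f0[k] / f_mean)
    # Read the signal at uniformly spaced positions on the warped axis.
    y, n, m = [], 0, 0
    while m <= tau[-1]:
        while tau[n + 1] < m:
            n += 1
        frac = (m - tau[n]) / (tau[n + 1] - tau[n])
        y.append(x[n] + frac * (x[n + 1] - x[n]))
        m += 1
    return y, f_mean


# Test signal: a sinusoid whose pitch rises linearly from 200 Hz to ~300 Hz.
fs, N = 8000, 800
f0 = [200 + 100 * n / N for n in range(N)]
x, phase = [], 0.0
for n in range(N):
    x.append(math.sin(phase))
    phase += 2 * math.pi * f0[n] / fs

y, f_mean = time_warp(x, f0)
# After warping, the signal should be close to a constant-pitch sinusoid.
ref = [math.sin(2 * math.pi * f_mean * m / fs) for m in range(len(y))]
err = max(abs(a - b) for a, b in zip(y, ref))
print(f"constant pitch {f_mean:.1f} Hz, max deviation {err:.4f}")
```

The small remaining deviation comes from the linear interpolation; a constant-pitch frame like `y` has partials at exact multiples of the pitch, which is what makes it cheaper to code in the frequency domain.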
The following table presents computational complexity (all numbers in WMOPS):
Scope | RM8 | RM8+CE
TW tool only | 19.6 | 9.5
Full decoder | 33.4 | 23.3
The contribution reports results for a 64 kb/s stereo listening test conducted at FhG. It showed no significant differences in either absolute or differential scores, for any item or for the mean.
When listening test data are pooled over all test sites (25 listeners), there are no significant differences in either absolute or differential scores, for any item or for the mean.
Heiko Purnhagen, Dolby, reported that he decoded the listening test bitstreams (which were the RM8 bitstreams) using the CE proponent decoder and produced the waveforms that were used in the listening test. He then decoded the bitstreams using the RM8 decoder and produced the reference RM8 decoded waveforms.
The AhG recommends adopting the CE technology into the “Study on USAC CD” output document.
Improved SBR in USAC
Toru Chinen, Sony, presented
m18398 | Sony listening test report on improved SBR in USAC | Toru Chinen, Masayuki Nishiguchi
The contribution reports the results of a 12 kb/s mono listening test. It showed no significant differences in either absolute or differential scores, for any item or for the mean.
Max Neuendorf, FhG, presented
m18431 | FhG Listening Test Report – improved SBR | Stephan Wilde, Max Neuendorf
The contribution reports the results of a 12 kb/s mono listening test. It showed no significant differences in absolute scores. For differential scores, 2 items were better at the 95% level of significance.
The contribution also reports on software verification: It confirms that the listening test bitstreams did decode to the listening test waveforms. Furthermore, it confirms that WD7 bitstreams decoded to the WD7 reference waveforms.
Kristofer Kjörling, Dolby, presented
m18378 | Finalization of CE on improved SBR | Kristofer Kjörling, Leif Sehlström
The contribution gives an overview of the CE technology, presents new listening test results and presents all listening test results as pooled data.
In RM7, SBR uses much of the machinery present in MPEG-4 SBR. One shortcoming of SBR is that the copy-up envelope can have considerable discontinuities (as shown, for example, in es01), which might be so large that the adjustment limiters prevent the target envelope from being achieved. The CE technology is a “pre-adjustment” gain stage which ensures that the high-band envelope adjuster is able to make the adjustments needed in the high band.
Computational complexity is quite low, approximately 0.1 WMOPS.
A control bit in the SBR header signals that the tool should be used from that point forward. Toru Chinen, Sony, noted that ARIB in Japan currently uses HE-AAC and sends an SBR header every 500 ms. Kristofer Kjörling, Dolby, noted that the SBR encoder would usually send the SBR header whenever it was advantageous to change the SBR configuration (which is specified in the SBR header).
For the pooled data, two items are better at the 95% level of significance.
The AhG recommends adopting the CE technology into the “Study on USAC CD” output document.
Harmonic transposer
Kihyun Choo, Samsung, presented
m18372 | Crosscheck report on harmonic transposer CEs | Kihyun Choo, Miyoung Kim, Eunmi Oh
The contribution reports the results of a 16 kb/s mono listening test that compares all configurations of the two CEs. There were no differences with respect to the absolute scores.
With respect to differential scores, it reports the following:
- QMF – FFT: 1 better, 1 worse
- QMFxp – QMF: 1 worse
- FFTxp – FFT: 1 worse
Kimitaka Tsutsumi, NTT DOCOMO, presented
m18466 | NTT DOCOMO Cross-check Report on Improved Harmonic Transposer in USAC | Kimitaka Tsutsumi, Kei Kikuiri, Nobuhiko Naka
The contribution reports the results of a 16 kb/s mono listening test that compares all configurations of the two CEs. There were no differences with respect to the absolute scores.
With respect to differential scores, it reports the following:
- QMF – FFT: 3 better
- QMFxp – QMF: no difference
- FFTxp – FFT: 3 better, mean better
In addition, the contribution reports that NTT DOCOMO confirms that the listening test bitstreams decode exactly to the listening test decoded waveforms.
Jeff Huang, Qualcomm, presented
m18459 | Crosscheck listening test report for USAC on FFT and QMF harmonic transposers | Jeff Huang
The contribution reports the results of a 16 kb/s mono listening test that compares all configurations of the two CEs. There were no differences with respect to the absolute scores.
With respect to differential scores, it reports the following:
David Virette, Huawei, presented
m18468 | Report on cross-check listening test for the CEs on QMF based harmonic transposer and improved harmonic transposer in USAC | David Virette
The contribution reports the results of a 16 kb/s mono listening test that compares all configurations of the two CEs. There were no differences with respect to the absolute scores.
With respect to differential scores, it reports the following:
- QMF – FFT: 1 better
- QMFxp – QMF: no difference
- FFTxp – FFT: no difference
In addition, the contribution reports that Huawei confirms that the listening test bitstreams decode exactly to the listening test decoded waveforms.
Zhong Haishan, Panasonic, presented
m18501 | Panasonic crosscheck report on improved harmonic transposer | Zhong Haishan, Chong Kok Seng, Zhao Dan, Takeshi Norimatsu, Tomokazu Ishikawa, Neo Sua Hong
The contribution reports the results of a 16 kb/s mono listening test that compares all configurations of the two CEs. There were no differences with respect to the absolute scores.
With respect to differential scores, it reports the following:
- QMFxp – QMF: 2 better
- FFTxp – FFT: 5 better
- QMFxp – FFTxp: no difference
Zhong Haishan, Panasonic, presented
m18386 | Finalization of CE on QMF based harmonic transposer | Haishan Zhong, Kok Seng Chong, Takeshi Norimatsu, Tomokazu Ishikawa, Lars Villemoes, Per Ekstrand, Kristofer Kjörling, Stephan Wilde, Sascha Disch, Frederik Nagel, Max Neuendorf
The contribution gives an overview of the CE technology, including complexity information, and also presents a summary of all listening test results.
The FFT transposer has high frequency resolution but also high complexity, while the QMF transposer has much lower complexity, as shown in the following table:
Configuration | Transposer only (WMOPS) | Transposer (% of WD7) | Full decoder (WMOPS) | Full decoder (% of WD7)
FFT-based harmonic transposer with 10% oversampling frames (WD7) | 5.79 | 100% | 9.42 | 100%
QMF-based harmonic transposer | 0.86 | 14.8% | 4.49 | 47.7%
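The percentages in the complexity table can be recomputed from its WMOPS entries. The short Python check below uses only the numbers in the table; it also shows that the QMF transposer saves the same 4.93 WMOPS in the transposer alone and in the full decoder, which suggests that the decoder saving comes entirely from the transposer. Recomputing from the rounded entries gives 14.9% rather than the listed 14.8% for the transposer, a difference presumably due to rounding of the underlying figures. The helper name `percent_of` is hypothetical.

```python
# WMOPS values copied from the complexity table.
FFT_TRANSPOSER, FFT_DECODER = 5.79, 9.42  # FFT transposer, 10% oversampling (WD7)
QMF_TRANSPOSER, QMF_DECODER = 0.86, 4.49  # QMF-based transposer


def percent_of(value, baseline):
    """Return value as a percentage of baseline, rounded to one decimal."""
    return round(100.0 * value / baseline, 1)


transposer_pct = percent_of(QMF_TRANSPOSER, FFT_TRANSPOSER)
decoder_pct = percent_of(QMF_DECODER, FFT_DECODER)
# Absolute savings: 4.93 WMOPS both for the transposer and the full decoder.
transposer_saving = FFT_TRANSPOSER - QMF_TRANSPOSER
decoder_saving = FFT_DECODER - QMF_DECODER
print(f"transposer: {transposer_pct}% of WD7, saving {transposer_saving:.2f} WMOPS")
print(f"decoder:    {decoder_pct}% of WD7, saving {decoder_saving:.2f} WMOPS")
```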
When all listening test data is combined, analysis of differential MUSHRA scores shows:
Harmonic transposer – Cross products technology
Kristofer Kjörling, Dolby, presented
m18384 | Finalization of CE on improved harmonic transposer in USAC | Lars Villemoes, Per Ekstrand, Sascha Disch, Frederik Nagel
The contribution presented an overview of the RM8 transposer technology as compared to the CE transposer technologies. It makes the point that the RM8 transposer results in many missing harmonics, which are perceived as “ghost” higher fundamentals. The proposed cross-product technology permits filling in the missing harmonics by constructing filterbank signals as combinations of adjacent low-band filterbank signals.
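The missing-harmonics argument can be illustrated with simple frequency bookkeeping. The following Python sketch assumes an idealized second-order transposer acting on individual harmonic partials rather than on filterbank subbands; the helper `transposed_partials` and the chosen fundamental and band limits are hypothetical. In this idealized model, plain transposition doubles each partial, so the output partials are spaced 2·f0 apart (the “ghost” higher fundamental), while cross products of adjacent partials, at the sum of the two source frequencies, fill in the odd harmonics.

```python
def transposed_partials(f0, source_orders, cross_products=False):
    """Partial frequencies (Hz) produced by an idealized second-order
    transposer from source partials k * f0, k in source_orders.

    Hypothetical bookkeeping helper, not the USAC filterbank algorithm:
    plain transposition maps each partial f to 2 * f; a cross product of
    two adjacent partials f_a, f_b yields a partial at f_a + f_b.
    """
    src = [k * f0 for k in source_orders]
    out = {2 * f for f in src}
    if cross_products:
        out |= {fa + fb for fa, fb in zip(src, src[1:])}
    return sorted(out)


def spacings(freqs):
    """Distinct gaps between consecutive partials."""
    return sorted({b - a for a, b in zip(freqs, freqs[1:])})


f0 = 100                  # fundamental in Hz (illustrative)
low_band = range(20, 40)  # source harmonics 2.0 to 3.9 kHz (illustrative)
plain = transposed_partials(f0, low_band)
with_xp = transposed_partials(f0, low_band, cross_products=True)
print(spacings(plain))    # [200]: only even harmonics, heard as a 200 Hz "ghost" pitch
print(spacings(with_xp))  # [100]: odd harmonics filled in, spacing matches f0
```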
QMF suffers when signals have very low fundamentals, since its low frequency resolution results in many distinct fundamentals mapping to the same quantized representation.
When all listening test data is combined, analysis of differential MUSHRA scores shows:
- QMFxp – QMF: 4 better, mean better
- FFTxp – FFT: 4 better, mean better
When looking at individual test sites:
- QMFxp – QMF: no strong consensus
- FFTxp – FFT: 3 items for which at least 3 of 6 sites agree
The AhG recommends adopting the cross-product technology into the “Study on USAC CD” output document.
Harmonic transposer – QMF technology
Kristofer Kjörling, Dolby, presented
m18389 | Overview of performance of transposer proposals, and suggested decoding modes | Kristofer Kjörling, Haishan Zhong, Kok Seng Chong, Takeshi Norimatsu, Tomokazu Ishikawa, Lars Villemoes, Per Ekstrand, Stephan Wilde, Sascha Disch, Frederik Nagel, Max Neuendorf
The contribution presents an overview of the proposed transposer technology. There are two CEs which propose the following replacements or additions to the current RM7 FFT transposer technology:
- QMF (replacement or addition)
- Cross products (addition)
It focuses on the 16 kb/s mono and stereo operating points, because at these points the transposer requires the greatest fraction of decoder resources.
For mono:
- QMF – 4.5 MOPS (reduces total decoder complexity to 48% of RM7)
- Cross products
  - FFT – 7.4 MOPS (reduces total decoder complexity to 78% of RM7)
  - QMF – 4.7 MOPS (reduces total decoder complexity to 50% of RM7)
For stereo:
- QMF – 8.6 MOPS (reduces total decoder complexity to 63% of RM7)
- Cross products
  - FFT – 11.5 MOPS (reduces total decoder complexity to 85% of RM7)
  - QMF – 8.8 MOPS (reduces total decoder complexity to 65% of RM7)
It notes that the quality of the QMF transposer is comparable to that of the FFT transposer. It further notes that incorporating cross products into the transposer provides a significant increase in quality while giving either a decrease in complexity (FFT) or a modest increase in complexity (QMF). If a decoder with FFT and cross products is used as a baseline of 100%, then the QMF transposer with cross products results in a decoder that is 65% of the baseline complexity.
In terms of quality, when all data is pooled, 1 item is better for the differential score QMFxp – FFTxp; however, there is no strong agreement amongst the results from individual test sites.
The contribution proposes that there be “Low Power” and “High Quality” decoding modes, such that a single bitstream syntax can be decoded in either mode. The transposers for each decoding mode would be:
- Low Power: QMFxp
- High Quality: FFTxp
Discussion
The Chair noted that no “low power” mode is defined in USAC. Hence, the Chair felt that the question is whether to have an FFTxp transposer (low complexity), a QMFxp transposer (very low complexity), or both.
Max Neuendorf, FhG, noted that when the individual test sites compare QMFxp with FFTxp, many sites judged one or more items to be worse (i.e. QMFxp worse than FFTxp).
Werner Oomen, Philips, felt that there should be only one transposer in USAC and that the group should pick one.
The Chair felt that there was no consensus in the group to make a decision at this time. The topic will be brought up again later in the MPEG week.
T/F domain post-processing
Kihyun Choo, Samsung, presented
m18371 | Crosscheck report on adaptive T/F domain post-processing for USAC | Kihyun Choo, Miyoung Kim, Eunmi Oh
The contribution presents results of a listening test comparing WD6+CE and WD6. The operating points and test results for the statistic (WD6+CE - WD6) were:
- 12 kb/s mono
  - Absolute scores: no difference
  - Differential scores: 1 better
- 8 kb/s mono
  - Absolute scores: no difference
  - Differential scores: 2 better
Heiko Purnhagen, Dolby, presented
m18373 | Dolby listening test results for CE on T/F post-processing in USAC | Heiko Purnhagen, Kristofer Kjörling
The contribution presents results of a listening test comparing WD6+CE and WD6. The operating points and test results for the statistic (WD6+CE - WD6) were:
- 12 kb/s mono
  - Absolute scores: no difference
  - Differential scores: no difference
- 8 kb/s mono
  - Absolute scores: no difference
  - Differential scores: 2 better, 1 worse (normal distribution) or 1 better (Student t distribution)
Jeff Huang, Qualcomm, presented
m18461 | Crosscheck listening test report for USAC on time frequency domain post-processing | Jeff Huang
The contribution presents results of a listening test comparing WD6+CE and WD6. The operating points and test results for the statistic (WD6+CE - WD6) were:
- 12 kb/s mono
  - Absolute scores: no difference
  - Differential scores: 3 better, 1 worse
David Virette, Huawei, presented
m18471 | Finalization of CE on adaptive T/F domain post-processing for USAC | David Virette, Wei Xiao
The contribution presents results of a listening test comparing WD6+CE and WD6. The operating points and test results for the statistic (WD6+CE - WD6) were:
- 12 kb/s mono
  - Absolute scores: no difference
  - Differential scores: 3 better, mean better
- 8 kb/s mono
  - Absolute scores: no difference
  - Differential scores: 5 better, mean better
When data from all test sites is pooled:
- 12 kb/s mono
  - Absolute scores: no difference
  - Differential scores: 4 better, mean better
- 8 kb/s mono
  - Absolute scores: no difference
  - Differential scores: 5 better, mean better
It reviewed the complexity of the CE technology, which is shown in the following table:
Configuration | Average PCU | Maximum PCU
RM6, mono@8kbps | 0.24 | 0.56
RM6, mono@12kbps | 0.31 | 0.73
The presenter noted that the post-processing control bits are transmitted in the bit stream only if the coding mode is LP.
Kristofer Kjörling, Dolby, noted that the Spectrum Flattening Post Processing seems similar to the “Improved SBR” tool, which also helps to flatten the SBR HF envelope. The Chair asked how this post-processor compares to the “Improved Bass Post-Filter”. Philippe Gournay, VoiceAge, noted that the bass post-filter is limited to processing the signal below 500 Hz. The presenter noted that this technology performs noise shaping across the spectrum.
The Chair felt that there was no consensus in the group to make a decision at this time. The topic will be brought up again later in the MPEG week.