Opening Audio Plenary
The MPEG Audio Subgroup meeting was held during the 92nd meeting of WG11, April 19-23, 2010 in Dresden, Germany. The list of participants is given in Annex A.
Administrative matters
Communications from the Chair
The Chair noted that the April 15, 2010 eruption of the volcano Eyjafjallajökull in Iceland made travel to this MPEG meeting difficult or impossible for most of the audio experts. Air travel over most of Europe was prohibited due to the risk posed by the resultant ash cloud. For the most part, delegates who were scheduled to arrive before the 15th and those who were able to travel by train were able to get to the meeting by Monday. A few additional delegates arrived on Tuesday, Wednesday and Thursday. Other delegates, particularly those from North America, Korea and Japan, were not able to attend. Because of the low participation, the Chair indicated that the only decisions for the week would be those for which there is no controversy. The Chair kept all delegates informed of the decisions taken in the Audio room and gave those not present an opportunity to indicate opposing opinions via email. This was done on a daily basis as a means to confirm that decisions made truly represent the consensus of all delegates.
The Chair summarised the issues raised at the Sunday evening Chair’s meeting, proposed task groups for the week, and proposed agenda items for discussion in Audio plenary.
Approval of agenda and allocation of contributions
The agenda and schedule for the meeting were discussed, edited and approved. The agenda shows the documents contributed to this meeting and presented to the Audio Subgroup, either in the task groups or in Audio plenary. The Chair brought relevant documents from Requirements and Systems to the attention of the group. The agenda was revised in the course of the week to reflect the progress of the meeting, and the final version is shown in Annex B.
Creation of Task Groups
Task groups were convened for the duration of the MPEG meeting, as shown in . Results of task group activities are reported below.
Approval of previous meeting report
The 91st Audio Subgroup meeting report was registered as a contribution, and was approved.
Review of AHG reports
There were no requests to review any of the AHG reports.
Joint meetings
There were no joint meetings.
Received National Body Comments and Liaison matters
There were no NB or Liaison statements generated at the meeting.
Plenary Discussion
The Chair presented a draft schedule for the 92nd meeting. Delegates present discussed what items (e.g. contributions that request an action, CE status or other open issues) could be decided at this meeting and what should be deferred to the next meeting, when all delegates can attend and participate in the discussion. The plan for this meeting is for delegates present to take a consensus position, communicate this via email to delegates not present, and gather feedback and comments. The positions on which all agree will be taken as consensus positions of the Audio subgroup.
Record of AhG meetings
AhG Meeting on USAC -- Sunday 1000-1800
This meeting was cancelled because many delegates were unable to arrive for the Sunday meeting.
Task group activities
Task Group discussions
MPEG-2, MPEG-4, MPEG-7, Audio Conformance, Reference Software, MPEG Surround
Julien Robilliard, FhG, presented
m17518 | Comments on MPEG Surround | Andreas Hoelzer, Dominik Will, Johannes Hilpert
This document proposed to correct three issues concerning MPEG Surround:
MPEG Surround text (ISO/IEC 23003-1)
- Correct text to align with reference software
- Proposes to issue correction as ISO/IEC 23003-1/DCOR 3 at this meeting
MPEG Surround Conformance
- Add one bitstream restriction
- Proposed to add correction to Study on ISO/IEC 23003-1:2006/AMD 1/DCOR 1
MPEG Surround Reference Software
- Identified an error, but as yet no solution.
- Proposed to add paragraph to Study on ISO/IEC 23003-1:2007/AMD 2/DCOR 2 that raises the awareness of this issue.
It was the consensus of the Audio subgroup to adopt the proposed actions.
The Chair presented
m17568 | Defect Report on ISO/IEC13818-7:2006 | Ralph Sperschneider
It was the consensus of the Audio subgroup to turn the contribution into a DCOR, ISO/IEC 13818-7:2006/DCOR 1.
The Chair presented
m17520 | Updates on BSAC Conformance for Broadcasting | Miyoung Kim, Eunmi Oh
It was the consensus of the Audio subgroup to issue a DCOR, ISO/IEC 14496-26:2009/DCOR 1, with an electronic annex containing the corrected bitstreams.
The author of the following document was not able to attend the meeting. The Chair urged delegates to study it and communicate questions and concerns directly to the author.
m17453 | Proposal for Mood Description of Music | Kyoungro Yoon
Stefan Doehla, FhG, presented
m17584 | Encoder/Decoder round-trip | Stefan Doehla
The contribution reviewed the current technical report on this topic. It notes that the edit list mechanism is too complicated for a simple “round-trip” or splicing operation. The Systems specification assumes that the first timestamp is 1 and cannot support “negative” timestamps. This is a problem for “pre-roll” access units. The contribution proposes specific extensions to the bitstream to support these functionalities and suggests that bitstreams could be added to Conformance to show the functionality.
Heiko Purnhagen, Dolby, noted that “fractional” composition units could be problematic, and also emphasised that we must be sure that suggested implementations do not “break” anything in the field.
The Chair noted that it would be very good to use the HE-AAC or USAC code base to construct a “round-trip” encode/decode implementation with appropriate pre-roll and post-roll. The group should study appropriate meta-data (in the audio bitstream) that can signal and control “round-trip,” splicing and “gapless playback” operations. As a first step, bitstreams and possible extensions to syntax (i.e. additional meta-data) and to the reference software can be made available that show the implementation.
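As a rough illustration of the sample bookkeeping behind such a round-trip, the sketch below trims a decoded signal back to the original length. It assumes that the decoder output is simply the original signal preceded by a known number of pre-roll samples and followed by padding up to a whole number of frames. The function name, the parameter names and the toy frame size are hypothetical and are not taken from the contribution or from any MPEG specification.

```python
def trim_round_trip(decoded, pre_roll, original_length):
    """Drop pre-roll samples at the start and padding at the end.

    decoded         -- list of decoded samples (hypothetical decoder output)
    pre_roll        -- samples emitted before the first original sample
    original_length -- number of samples in the signal before encoding
    """
    if pre_roll < 0 or original_length < 0:
        raise ValueError("pre_roll and original_length must be non-negative")
    if pre_roll + original_length > len(decoded):
        raise ValueError("decoded signal is too short for the requested trim")
    return decoded[pre_roll:pre_roll + original_length]


# Toy example (all values are assumptions chosen for illustration):
# 10 original samples, 3 samples of pre-roll, frames of 8 samples.
original = list(range(1, 11))
frame_size = 8
pre_roll = 3

# Round the total length up to a whole number of frames.
padded_length = -(-(pre_roll + len(original)) // frame_size) * frame_size
post_roll = padded_length - pre_roll - len(original)
decoded = [0] * pre_roll + original + [0] * post_roll

restored = trim_round_trip(decoded, pre_roll, len(original))
```

In this simplified model a player needs only two numbers, the pre-roll and the original length, to recover the signal without an edit list; whether the proposed metadata takes this form is not stated in the minutes.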
MPEG-D Spatial Audio Object Coding
Leonid Terentiev, FhG, presented
m17539 | Fraunhofer IIS listening test report on SAOC for teleconferencing applications | Oliver Hellmuth, Jürgen Herre, Johannes Hilpert, Leonid Terentiev, Cornelia Falch
The contribution reports on a listening test of candidate SAOC teleconferencing setups.
Coder name | Description | DMX
SoA | State-of-the-Art system architecture | Stereo
SAOC-A | Selective downmix (3 objects); SAOC encoding at the MCU side | Mono
SAOC-B | Common downmix (4 objects); SAOC encoding at the MCU side | Mono
SAOC-C | Common downmix (4 objects); SAOC encoding at the user side | Mono
SAOC-D | Selective downmix (3 objects); SAOC encoding at the user side | Mono
Listening test results show that performance is comparable for all individual items. When pooling over all items, SAOC-A and SAOC-D have performance comparable to that of SoA, while SAOC-B and SAOC-C have significantly lower performance (at 95% level of significance). This makes sense, as B and C both rely on SAOC “karaoke” processing (without residual) to remove the local talker from the downmix signal.
The main conclusion of the contribution is that SAOC-A and SAOC-D have performance that is comparable to that of the state-of-the-art teleconferencing system.
Juergen Herre, FhG, noted that tests B and C do not reflect a real-world use scenario, in that a teleconference participant would be talking at the same time as the SAOC MCU presents a downmix having “karaoke”-like processing (with a level of perceptible distortion). Hence the auditory peripheral masking effects and higher-level cognitive processing would make this distortion less perceptible to the local talker.
Gregory Pallone, Orange Labs, presented
m17540 | Evaluation of different SAOC configurations for Teleconferencing Use Cases | Gregory Pallone, Pierrick Philippe
The contribution reports on a listening test of the same candidate SAOC teleconferencing setups as were presented in the previous contribution (m17539). The test shows very similar results: SAOC-A and SAOC-D have performance that is comparable to that of the state-of-the-art teleconferencing system, while SAOC-B and SAOC-C have significantly lower performance (at 95% level of significance).
The presenter noted that B and C provide for low complexity operation, that may be of interest to industry. For example, in B and C there is a single downmix for every participant in the conference so that the MCU has lower complexity. In B SAOC encode/decode complexity is at MCU, while in C SAOC encode/decode complexity is at endpoints.
The Chair and Mohammed Raad, RaadTech, noted that a real product would offer an individual downmix to those who are actually talking and a common downmix to those who are not talking, so that the complexity issues of scaling up to many conference participants may be much less than envisioned. Juergen Herre and Leonid Terentiev, FhG, noted that DCU could help the B and C configurations by making the local talker leakage less distorted (e.g. less “musical noise”).
The Chair noted that a real product might combine modes A and B or D and C, or even A, B, C and D. In this case, a product might use mode A when a participant is actually talking or mode B when the participant is not talking. Hence the quality of systems using only B and C is not relevant.
Juergen Herre, FhG, noted that SAOC technology is orthogonal to the issue of downmix tandeming, and hence outside the scope of the verification test report. He advocated testing SAOC-D in that it shows the full value of SAOC with a parameter-domain MCU. Leonid Terentiev, FhG, noted that an even better “showcase” would be multiple objects at a given endpoint.
The presenter noted that if the verification test report presents the complexity of B and C, it still does not give an assessment of the quality that might be achieved with a combination of modes A and B in a hypothetical product.
Oliver Hellmuth, FhG, advocated testing only SAOC-D and noted that SAOC-A can be covered by other test configurations in the verification test.
Mohammed Raad, RaadTech, supported the idea that the verification test report contain a discussion addressing how one might use SAOC to build a practical teleconferencing system that scales to a large number of participants.
The Chair summarized that one conclusion from the two presentations is to:
- Test SAOC-D. It is noted that it may be better to have multiple talkers at one (or each) endpoint.
- Discuss how one might use SAOC to build a practical teleconferencing system that scales to a large number of participants.
There will be a breakout to define more precisely the “SAOC-D” configuration that will be tested.
Oliver Hellmuth, FhG, presented
m17537 | Proposal for SAOC verification test | Jonas Engdegård, Heiko Purnhagen, Oliver Hellmuth, Jürgen Herre, Johannes Hilpert, Leonid Terentiev, Cornelia Falch, Werner Oomen
This contribution is a refinement of the output from the last MPEG meeting. Test scenarios are:
- T1 Remix: low bit rate, stereo headphones with HE-AAC core coder (64 kb/s)
- T2 Karaoke: high bit rate, stereo headphones with AAC core coder (128 kb/s)
- T3 Stereo teleconferencing: stereo loudspeakers with AAC-ELD core coder (40 kb/s)
- T4 Multichannel teleconferencing: playback over ITU-T 5.0 (or ITU-T 3 front) with G.722.2 core coder (24 kb/s)
- T5 Multichannel teleconferencing: playback over ITU-T 5.0 (or ITU-T 3 front) with AAC-ELD core coder (40 kb/s)
- T6 Binaural teleconferencing: playback to headphones with G.722.2 core coder (24 kb/s)
There was considerable discussion on where the SAOC-A, B, C, D configurations (particularly SAOC-D) fit into the test workplan. The Chair proposed that Gregory Pallone, Orange Labs, study whether his proposal can fit into the test workplan and respond to the group on Wednesday after MPEG plenary.
The Chair asked delegates to make, where possible, commitments for the number of listeners for tests T1 – T6, and it was agreed to have a breakout to draft details of the test logistics (e.g. who, what, where and when) and bring that information back to the group.
The Chair will bring forward the final test workplan on Friday during document approval.
Oliver Hellmuth, FhG, presented
m17538 | Contributions to SAOC conformance and reference software | Jonas Engdegård, Heiko Purnhagen, Oliver Hellmuth, Jürgen Herre, Johannes Hilpert, Leonid Terentiev, Andreas Hölzer, Cornelia Falch, Werner Oomen
It was the consensus of the Audio subgroup to issue the contribution as WD for SAOC conformance and reference software.
MPEG-D Unified Speech and Audio Coding
Max Neuendorf, FhG, presented
m17481 | Corrections to Reference Software and WD6 of USAC | Max Neuendorf
The contribution proposes a number of changes. Some are purely editorial; the others are minor technical changes.
Proposed editorial changes are
- Nq[i] correction
- Indices in arithmetic coding payload
- Correct indices in pseudo-code
- Remove pseudo-code and references for 1152 window
- Indicate in tables of window sequences that DCT window slope changes if adjacent frame is LPD
- Interpolation of LSP parameters: correct interpolation formula in text to agree with reference software operation.
- Correct LP synthesis formula to clarify that decoder synthesis uses quantized LP coefficients.
- Correct MDCT based TCX tool description to correctly describe operation of FDNS tool.
- Clarification of FDNS text so that description shows weighted LPC rather than just LPC.
It was the consensus of the Audio subgroup to adopt the proposed editorial changes into WD7.
Changes affecting the Reference Software
- Phase unwrapping in reference code is incorrect when “mod(angle, 2*PI) > 2*PI.” A correction is supplied. The contribution notes that the proposed correction changes the decoded waveform, but that the changes result in decoded waveforms with SNR with respect to the previous decoding of
It was the consensus of the Audio subgroup to adopt this proposed change into WD7.
- Window generation in eSBR: a bug was discovered in the reference software that applied the wrong window shape. The impact of the corrected versus previous decoded waveforms is 60 dB SNR.
Mohammed Raad, RaadTech, requested SNR figures for each test item. Max Neuendorf, FhG, agreed to supply that by Tuesday morning of the MPEG week.
It was the consensus of the Audio subgroup to adopt this proposed change into WD7.
- Clarification of FAC decoding process: In one of the four transition cases in which FAC is used (TCX to ACELP) the encoder and decoder reverse the FAC signal prior to compression and coding. In the other three cases they do not. It is proposed to “correct” the TCX to ACELP processing so it is like the other three.
Mohammed Raad, RaadTech, requested SNR figures for each test item that compare the old processing and the proposed new processing. Max Neuendorf, FhG, agreed to supply that by Tuesday morning of the MPEG week.
It was the consensus of the Audio subgroup to adopt this proposed change into WD7.
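The SNR figures discussed above compare a waveform decoded with the corrected software against the waveform from the previous decoding. A minimal sketch of such a comparison follows, assuming the usual definition of SNR as the ratio of the reference energy to the energy of the sample-by-sample difference, expressed in dB. The function name and the toy waveforms are hypothetical; the minutes do not specify how the proponents computed their figures.

```python
import math


def snr_db(reference, test):
    """SNR of `test` relative to `reference`, in dB.

    Treats (reference - test) as the noise. Both inputs are
    equal-length sequences of samples.
    """
    if len(reference) != len(test):
        raise ValueError("waveforms must have the same length")
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise_energy == 0.0:
        return math.inf          # identical waveforms
    if signal_energy == 0.0:
        return -math.inf         # silent reference, non-silent difference
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy data (assumed, not from the contribution): one sample differs by 0.001.
previous_decoding = [1.0, -1.0, 1.0, -1.0]
corrected_decoding = [1.001, -1.0, 1.0, -1.0]
value = snr_db(previous_decoding, corrected_decoding)   # about 66 dB
```

Running the same function per test item would yield the kind of per-item figures that were requested.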
Changes affecting Working Draft and Reference Software
- The current text and reference software implement the MDCT/IMDCT. The proposed change is to use DCT-IV.
It was the consensus of the Audio subgroup to adopt this proposed change into WD7.
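The minutes do not say how the DCT-IV formulation is worded in the contribution. As background only, the sketch below checks a standard textbook identity relating the two transforms: the MDCT of a 2N-sample block (a, b, c, d) equals the DCT-IV of the N-sample folded block (-c_r - d, a - b_r), where the subscript r denotes time reversal. The unnormalized transform definitions and all function names are assumptions made for this illustration and may differ in scaling from those in the USAC working draft.

```python
import math


def mdct(x):
    """Direct MDCT: 2N input samples -> N coefficients (O(N^2), for checking)."""
    n_coef = len(x) // 2
    return [
        sum(
            x[n] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n in range(2 * n_coef)
        )
        for k in range(n_coef)
    ]


def dct_iv(u):
    """Direct DCT-IV: N input samples -> N coefficients."""
    size = len(u)
    return [
        sum(u[m] * math.cos(math.pi / size * (m + 0.5) * (k + 0.5)) for m in range(size))
        for k in range(size)
    ]


def fold(x):
    """Fold 2N samples (a, b, c, d) into N samples (-c_r - d, a - b_r)."""
    if len(x) % 4 != 0:
        raise ValueError("block length must be a multiple of 4")
    q = len(x) // 4
    a, b, c, d = (x[i * q:(i + 1) * q] for i in range(4))
    first = [-c[q - 1 - i] - d[i] for i in range(q)]
    second = [a[i] - b[q - 1 - i] for i in range(q)]
    return first + second


# Arbitrary 8-sample block (N = 4), chosen only for the check.
block = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1, 0.0, -0.9]
direct = mdct(block)
via_dct_iv = dct_iv(fold(block))
max_error = max(abs(p - q) for p, q in zip(direct, via_dct_iv))
```

The two results agree to within floating-point rounding, which is why a transform stage can be described with either formulation.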
Hervé Taddei, Huawei, presented
m17575 | Proposed CE on adaptive T/F domain post-processing for USAC | Wei Xiao, David Virette, Hervé Taddei, Anisse Taleb
There was good discussion on the presentation. Because many experts were not present, aspects of the discussion were deferred to the AhG period and the next MPEG meeting.
Hervé Taddei, Huawei, presented
m17576 | Proposed CE on additional bandwidth extension for USAC at low bit rates | Wei Xiao, David Virette, Hervé Taddei, Anisse Taleb
There was good discussion on the presentation.
On behalf of the lead authors (who were not present), Max Neuendorf, FhG, presented
m17484 | Progress report on Time Warping Core Experiment for USAC | Zhong Haishan, Chong Kok Seng, Zhou Huan, Takeshi Norimatsu, Tomokazu Ishikawa, Neo Sua Hong
The authors envision that this CE will be complete at the next meeting. The technology aims to increase the accuracy of the transposition when using time warping. The contribution presents a listening test that demonstrates the performance of the proposal. The results show that, for differential score analysis, 4 of 7 items are better at the 95% level of significance. In the processing for this listening test, the time warping tool was active more often than in WD6 processing. In frames where time warping is active, the bit rate saving is 1%, leading to an overall rate saving of 0.2%.
On behalf of the authors (who were not present), Max Neuendorf, FhG, presented
m17496 | Progress Report on QMF based Harmonic Transposer CE | Zhou Huan, Zhong Haishan, Chong Kok Seng, Tomokazu Ishikawa, Takeshi Norimatsu, Lars Villemoes, Per Ekstrand, Kristofer Kjörling, Frederik Nagel, Stephan Wilde, Sascha Disch, Max Neuendorf
The contribution gives a progress report on the CE on a low complexity harmonic transposer for eSBR using QMF domain processing rather than FFT processing. It reports that complexity is reduced from 5.7 MOPS to 1.1 MOPS. The proponents expect a complete CE at the next MPEG meeting.
Spectral Noiseless Coding
Markus Multrus, FhG, presented
m17558 | Extra Information Regarding the CE on the Spectral Noiseless Coding in USAC | Guillaume Fuchs, KiHyun Choo, Markus Multrus, JungHoe Kim, Nikolaus Rettelbach, Eunmi Oh, Vignesh Subbaraman
The contribution summarizes the current status of the CE:
- CE proposal including cross-checks was brought to the 91st meeting, and additional information on Wednesday of the 91st meeting made this a complete proposal. Two experts requested additional time to review this information and so a decision on the CE was deferred until the 92nd meeting.
- Significant reduction in ROM: 16900 to 1500 words and RAM: 666 to 72 words
- Complexity is 0.61 PCU based on WD5, not expected to change for WD6.
- In addition offers some additional compression: on average 1.81%
As an update, it reports how performance changes for WD5 or WD6 as compared to WD3 (which was reported in the last contribution). This is summarized here:
Version | WD3 (% of total bitrate) | WD5 (% of total bitrate) | WD6 (% of total bitrate)
Base Version | -1.81 | -1.53 | -1.74
Enhanced Version | -1.87 | -1.60 | -1.81
The presenter noted that compression offered by the CE is comparable (i.e. within 5%) when based on WD6 or WD3.
If all tables are re-trained based on WD6, the following performance is obtained:
Mode | Base version: Saving (kbps) | Base version: Saving (% total data rate) | Enhanced version: Saving (kbps) | Enhanced version: Saving (% total data rate)
64s | -1.97 | -3.07 | -2.03 | -3.17
32s | -0.68 | -2.13 | -0.71 | -2.21
24s | -0.47 | -1.97 | -0.49 | -2.03
20s | -0.43 | -2.13 | -0.44 | -2.20
16s | -0.35 | -2.19 | -0.36 | -2.23
24m | -0.52 | -2.18 | -0.54 | -2.25
20m | -0.46 | -2.30 | -0.47 | -2.37
16m | -0.38 | -2.39 | -0.39 | -2.43
12m | -0.27 | -2.25 | -0.27 | -2.28
Average | | -2.29 | | -2.35
It is noted that these results were not crosschecked.
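The relationship between the two kinds of saving in the re-trained results can be checked with a few lines of arithmetic. The sketch below reads each row as base saving in kbps, base saving in percent, enhanced saving in kbps and enhanced saving in percent, and assumes that the number in each mode label (e.g. 64 in “64s”) is the total bit rate in kb/s. Both readings are assumptions; under them, the percentage columns agree with the kbps columns to within rounding, and the averages of the percentage columns reproduce the reported -2.29 and -2.35.

```python
# (mode, assumed total rate in kb/s, base kbps, base %, enhanced kbps, enhanced %)
ROWS = [
    ("64s", 64, -1.97, -3.07, -2.03, -3.17),
    ("32s", 32, -0.68, -2.13, -0.71, -2.21),
    ("24s", 24, -0.47, -1.97, -0.49, -2.03),
    ("20s", 20, -0.43, -2.13, -0.44, -2.20),
    ("16s", 16, -0.35, -2.19, -0.36, -2.23),
    ("24m", 24, -0.52, -2.18, -0.54, -2.25),
    ("20m", 20, -0.46, -2.30, -0.47, -2.37),
    ("16m", 16, -0.38, -2.39, -0.39, -2.43),
    ("12m", 12, -0.27, -2.25, -0.27, -2.28),
]


def percent_of_total(saving_kbps, total_kbps):
    """Express a saving in kb/s as a percentage of the total bit rate."""
    return 100.0 * saving_kbps / total_kbps


def worst_mismatch(rows):
    """Largest gap between recomputed and reported percentages."""
    worst = 0.0
    for _, rate, base_kbps, base_pct, enh_kbps, enh_pct in rows:
        worst = max(
            worst,
            abs(percent_of_total(base_kbps, rate) - base_pct),
            abs(percent_of_total(enh_kbps, rate) - enh_pct),
        )
    return worst


base_average = sum(row[3] for row in ROWS) / len(ROWS)
enhanced_average = sum(row[5] for row in ROWS) / len(ROWS)
mismatch = worst_mismatch(ROWS)
```

The small residual mismatch is consistent with the kbps figures having been rounded to two decimals before publication.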
The presenter further noted that this CE has been ongoing for some time and was complete at the last meeting, and asked for guidance on how to proceed with this CE. The Chair acknowledged that the CE has been ongoing for some time and that keeping a CE “up to date” is a drain on proponent resources. The Chair offered two ways to proceed: 1) decide on this CE now (at the 92nd meeting), or 2) decide as the first item of business at the next meeting, but with the understanding that no additional information is required as a contribution to the 93rd meeting.
Hervé Taddei, Huawei, felt that there is an opportunity to harmonize the FhG/Samsung and Huawei proposals and objected to making a decision on the FhG/Samsung CE at this meeting. It was the position of the other delegates present in the Audio room to make a decision on the FhG/Samsung CE as the first item of business at the 93rd meeting.
Gregory Pallone, Orange Labs, presented
m17569 | Cross check report for Huawei proposal on Spectral Noiseless Coding for USAC | Gregory Pallone, Pierrick Philippe
The contribution verified that the Huawei proposal was lossless with respect to WD6, obtained the same compression efficiency figures as reported in m17574, and confirmed that the bit reservoir limits were fulfilled.
Hervé Taddei, Huawei, presented
m17574 | Progress report on Spectral Noiseless Coding for USAC | Wei Xiao, David Virette, Hervé Taddei, Anisse Taleb
The contribution reviews the technology comprising the proposal. It observes that 1-tuple coding is most appropriate for tonal signals, 2-tuple for typical signals and 4-tuple for noise-like signals. It reports the relative frequency of the selection of the various tuple lengths, in which 4-tuple is most likely. The presenter noted that WD3 training database is used for this proposal.
- ROM reduced from 16894 words to 1847 words
- Average increase in compression 2.1%
Markus Multrus, FhG, asked about the complexity of the proposed CE. The presenter noted that WD6 has a complexity of 0.5 and the CE has a complexity of 1.33, that the complexity of the implementation was not optimized, and that the measurement methodology was not the same as was used in m17558.
Unified Stereo
Julien Robilliard, FhG, presented
m17557 | Corrections to Unified Stereo Coding | Erik Schuijers, Werner Oomen, Julien Robilliard, Heiko Purnhagen, Pontus Carlsson
The contribution proposes two corrections:
- Concerning gain clipping, to keep the reference software implementation and to change the WD6 to reflect the software implementation (which is to include the gain clipping).
- Concerning upmix matrix, to change the upmix matrix so that it always has an invertible form that can be used as the downmix matrix in the encoder. This change affects both the WD text and reference software.
The Chair verified that the combination of quantized parameters (in the second
It was the consensus of the Audio subgroup to adopt the proposed gain clipping change into WD7.
The proponents are requested to bring evidence of encoder/decoder operation using the second proposed change, as a bitstream and decoded wav file for (perhaps) a synthetic signal room. The decision to adopt the second proposed change into WD7 will be made after a review of the evidence.
Subsequently, this new evidence was reviewed and it was the consensus of the Audio subgroup to adopt the proposed change to the upmix matrix so that it always has an invertible form that can be used as the downmix matrix in the encoder. This change affects both the WD7 text and reference software.
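The reason an invertible upmix matrix matters can be seen in a small example: when the decoder upmix is invertible, the encoder can derive its downmix as the matrix inverse, so that downmix followed by upmix returns the input. The sketch below uses a plain 2x2 sum/difference matrix as a stand-in; the actual USAC upmix matrix is not given in these minutes, and the function names and the determinant threshold are hypothetical.

```python
def invert_2x2(matrix, min_abs_det=1e-9):
    """Return the inverse of a 2x2 matrix, or raise if it is (near) singular."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if abs(det) < min_abs_det:
        raise ValueError("upmix matrix is singular; no downmix matrix exists")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_2x2(matrix, vector):
    """Multiply a 2x2 matrix by a 2-element vector."""
    (a, b), (c, d) = matrix
    x, y = vector
    return [a * x + b * y, c * x + d * y]


# Stand-in upmix (an assumption): left = m + s, right = m - s.
upmix = [[1.0, 1.0], [1.0, -1.0]]
downmix = invert_2x2(upmix)

left_right = [0.8, -0.2]
mid_side = apply_2x2(downmix, left_right)       # encoder side
reconstructed = apply_2x2(upmix, mid_side)      # decoder side
```

With a singular upmix matrix the inversion fails, which corresponds to the situation the proposed change is intended to rule out.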
On behalf of the authors, the Chair presented
m17511 | Thoughts on Gain Clipping in Unified Stereo Coding | Miyoung Kim, Eunmi Oh, Hwan Shim
It was the understanding of those present in the Audio room that this contribution supports the adoption of the gain clipping constant into WD7, and so is in line with what is proposed in the previous document.
Heiko Purnhagen, Dolby, presented
m17556 | Core experiment on improved stereo coding in USAC | Heiko Purnhagen, Pontus Carlsson, Lars Villemoes, Julien Robilliard, Johannes Hilpert, Christian Helmrich
The CE addresses the case of stereo coding when the SBR tool is not used. It observes that the current specification requires four 64-band QMF filters (analysis or synthesis) in order to do the upmix. It proposes instead to do the upmix in the USAC decoder MDCT time/frequency coefficient domain.
It presents listening test results that compare the following:
- USAC WD6 (operating with stereo coding tools as in MPEG-4 AAC)
- USAC WD6 + CE (but no SBR and no QMF domain upmix)
The results show that, when analyzing absolute scores, one item shows a significant increase in performance: a mono item that is panned halfway between center and right, with a 6 dB level difference between left and right channels. When analyzing difference scores, there is a significant improvement for all 7 test items and for the mean score.
The presenter anticipates additional information at the next meeting, including independent cross-check over the full set of USAC test items.
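Absolute and difference score analyses can give different verdicts on the same data, because taking per-listener differences removes the variation between listeners. The sketch below illustrates this with made-up scores from eight hypothetical listeners. It uses a normal approximation for the 95% confidence interval for brevity; a Student-t interval would be more appropriate for so few listeners, and the minutes do not state which method the test sites used.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci95(values):
    """Mean and 95% confidence interval (normal approximation)."""
    z = NormalDist().inv_cdf(0.975)
    centre = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return centre, centre - half_width, centre + half_width


# Made-up scores for one item, one value per listener (not real test data).
wd6_scores = [60, 70, 80, 55, 65, 75, 85, 50]
ce_scores = [63, 72, 83, 58, 66, 78, 87, 53]
diff_scores = [c - w for c, w in zip(ce_scores, wd6_scores)]

wd6_mean, wd6_low, wd6_high = mean_ci95(wd6_scores)
ce_mean, ce_low, ce_high = mean_ci95(ce_scores)
diff_mean, diff_low, diff_high = mean_ci95(diff_scores)

# Absolute analysis: do the two intervals overlap?
absolute_intervals_overlap = ce_low <= wd6_high and wd6_low <= ce_high
# Difference analysis: does the interval of the differences exclude zero?
difference_is_significant = diff_low > 0 or diff_high < 0
```

With these numbers the absolute intervals overlap while the interval of the differences excludes zero, the same pattern as an item that is significant only in the difference analysis.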
Low Bitrate Stereo
Julien Robilliard, FhG, presented
m17534 | FhG listening test report for CE on improved downmix/upmix for USAC | Julien Robilliard
The contribution presents listening test results for the systems WD6 and WD6+downmix/upmix.
For the 32 kb/s stereo test, results of score analysis, at the 95% level of significance, are:
- Absolute score values (WD6+downmix/upmix – WD6)
  - Improvement for one item (harmonics)
- Differential score values (WD6+downmix/upmix – WD6)
  - Improvement of 5 items and also an improvement in the mean.
Gregory Pallone, Orange Labs, presented
m17541 | Cross-check report on proposed improvements for low bitrate stereo in USAC | Gregory Pallone, Pierrick Philippe
The contribution presents listening test results for the systems WD6, WD6+downmix and WD6+downmix/upmix.
For the 32 kb/s stereo test, results of score analysis, at the 95% level of significance, are:
- Absolute score values (WD6+downmix/upmix – WD6)
- Differential score values (WD6+downmix/upmix – WD6)
- Differential score values (WD6+downmix – WD6)
  - Improvement of 1 item and a degradation for 1 item
- Differential score values (WD6+downmix/upmix – WD6+downmix)
  - Improvement of 1 item and a degradation for 1 item
On behalf of the authors, the Chair presented
m17509 | Cross-check report on proposed improvements for low bitrate stereo in USAC | Miyoung Kim, Eunmi Oh
The contribution presents listening test results for the systems WD6, WD6+downmix and WD6+downmix/upmix.
For absolute score analysis, no item can be found where the 95% confidence intervals of the mean scores do not overlap over the three codecs under test, at both 24 and 32 kb/s stereo.
For differential score analysis at the 24 kb/s stereo test, results of score analysis, at the 95% level of significance, are:
- (WD6+downmix/upmix – WD6)
- (WD6+downmix – WD6)
- (WD6+downmix/upmix – WD6+downmix)
  - Improvement of 1 item (harmonics)
For differential score analysis at the 32 kb/s stereo test, results of score analysis, at the 95% level of significance, are:
- (WD6+downmix/upmix – WD6)
- (WD6+downmix – WD6)
- (WD6+downmix/upmix – WD6+downmix)
  - Improvement of 1 item (harmonics)
Werner Oomen, Philips, presented
m17494 | CE on improvements to low bitrate stereo in USAC | Erik Schuijers, Werner Oomen, Heiko Purnhagen, Pontus Carlsson
The presenter gave an overview of the CE technology. The key observation is that there might be cancellation in the encoder downmix such that the decoder upmix is not the same and that the resultant L/R in the decoder does not result in the correct decoder OPD (i.e. phase between reconstructed L and R signals).
The contribution presents listening test results for the systems WD6, WD6+downmix and WD6+downmix/upmix.
For differential score analysis at the 24 kb/s stereo test, results of score analysis, at the 95% level of significance, are:
- (WD6+downmix/upmix – WD6)
- (WD6+downmix – WD6)
For differential score analysis at the 32 kb/s stereo test, results of score analysis, at the 95% level of significance, are:
- (WD6+downmix/upmix – WD6)
- (WD6+downmix – WD6)
The presenter offered additional analysis of the test data as pooled over all four sites. The analysis was presented as mean scores and 95% confidence intervals for the pooled data, and as a count of how many items are better at each test site for each test condition and each bit rate.
Discussion
Werner Oomen, Philips, and Julien Robilliard, FhG, both supported the adoption of downmix/upmix processing since this would be the most appropriate engineering solution. Heiko Purnhagen, Dolby, noted that Orange Labs reported that one item was degraded (for each bit rate), but that was for (WD6+downmix/upmix – WD6+downmix), not for (WD6+downmix/upmix – WD6). He further noted that pooling of data generally decreases the confidence intervals, but in the additional analysis done by Werner Oomen, adding the Orange Labs data actually increased the confidence intervals. Werner Oomen, Philips, noted that for some signals (e.g. harmonics), an incorrect OPD in the decoder can result in serious quality degradation. Pierrick Philippe, Orange Labs, noted that WD6+downmix seems fine and that WD6+downmix/upmix is good engineering practice, but was not rated well by Orange Labs. However, he noted that the Orange Labs data is ambiguous and that if he is the only dissenter then he would not oppose adoption.
It is the position of those present in the Audio room to adopt downmix/upmix, although it is noted that Pierrick Philippe, Orange Labs, has a slightly different position, as stated above. It is further noted that Eunmi Oh, Samsung, sent an email indicating that she only supports adoption of downmix. Because of the special nature of this meeting, the Chair indicated that there is no consensus at the meeting on adopting the technology.
USAC Encoder/Decoder Reference Software
Jeongook Song, Yonsei University, presented
m17571 | Yonsei-LG Contribution to MPEG USAC Reference Software | Jeongook Song, Hong-Goo Kang, Henney Oh
The contribution presents work done at Yonsei University and LG on simplifying the USAC reference software, with the simplified software called “JAME.” The simplifications resulted in significant reduction in folders and files, as shown here:
Metric | RM5 | JAME 0.5 | Ratio (%)
Repository Size (mpeg4audio) | 20.9 MB | 14.9 MB | 71 %
Repository Size (mp4AudVm_Rewrite) | 11.7 MB | 5.8 MB | 50 %
# of Folders | 152 | 19 | 13 %
# of Files | 1144 | 385 | 34 %
# of .c Files | 412 | 120 | 29 %
# of Functions | 23xx | 8xx | 3x %
There was considerable discussion of and interest in this software project. The presenter showed a listening test that indicated that the mean performance of the JAME codec is better than that of the Reference Quality Encoder, although it appears that the original signals in the test were band-limited to 3.5 kHz. Details of the software project will be discussed in a break-out.
USAC Encoder CE Information
Max Neuendorf, FhG, presented
m17513 | Encoder Description of the USAC Forward Aliasing Cancellation Tool | Philippe Gournay, Bruno Bessette, Roch Lefebvre, Max Neuendorf
The presenter noted that the contribution builds on information in m17167 that was already incorporated into WD6 in an informative annex. The contribution was reviewed. The presenter noted that the USAC MPEG reference software (i.e. encoder and decoder in the trunk of the SVN server) already implements the FAC and FDNS tools, such that a full implementation of the CE is available as source code.
It was the consensus of the Audio subgroup to accept this information as sufficient and hence this CE is successfully concluded.
On behalf of the authors, the Chair presented
m17514 | Contribution to the MPEG USAC Reference Encoder : Phase Coding | Eunmi Oh
The contribution contains a textual overview of mapping tables for 10 and 28 IPD parameter bands and of coarse and fine quantization tables. In addition, a zip archive in the contribution contains encoder software that implements the encoder functionality of the CE.
Max Neuendorf, FhG, and Markus Multrus, FhG, noted that the code is delivered in a zip archive, and perhaps is not on the SVN server. The Chair suggested the additional step of asking Samsung to
- check the relevant files into the MPEG SVN server in the MPEG-D/USAC/trunk directory tree
- verify that the USAC code compiles and links
- verify that the resultant encoder and decoder can produce a bitstream and decode that bitstream
and post a message to the mpeg-audio-call reflector on the successful completion of these checks.
It was the consensus of the Audio subgroup to accept this information as sufficient and hence this CE is successfully concluded.
Reference Encoder
On behalf of the authors, the Chair presented
m17588
|
Listening test results for proposed MPS encoder for USAC
|
Hyunkook Lee
Sungyong Yoon
Tacksung Choi
|
Audio experts agreed that the contribution represents a good process to use to revise the MPEG USAC reference encoder.
On behalf of the authors, the Chair presented
m17573
|
Contribution to MPEG USAC Open Reference Encoder
|
Henney Oh
|
Audio experts support this work, but request that LG experts please
-
check the relevant code files into the MPEG SVN server in the MPEG-D/USAC/trunk directory tree
-
confirm that the USAC code compiles and links
-
confirm that the resultant encoder and decoder can produce a bitstream and decode that bitstream
and post a message to mpeg-audio-call reflector on the successful completion of these checks.
USAC CE Comments
Schuyler Quackenbush, Audio Research Labs, presented
m17547
|
Summary of USAC CE Performance
|
S. Quackenbush
|
This contribution primarily consisted of a spreadsheet that gave details of every core experiment in USAC. The Chair asked proponents to review the information and to correct it as appropriate. He noted that the spreadsheet helps him
-
check which components of a CE have been delivered by proponents
-
allocate time on the ASG agenda for CE discussion
He expressed the hope that the information may help CE decision making.
Pierrick Philippe, Orange Labs, presented
m17577
|
Discussion on the progress of USAC
|
David Virette
|
The contribution notes that USAC can operate both in the target range of e.g. 12 kb/s to 48 kb/s, and also at higher bit rates. It requests the ASG to investigate USAC performance at additional operating points, for example
-
96 kb/s to 128 kb/s for stereo
-
160 kb/s multi-channel (5.0)
It also requests a demonstration of certain functionalities
-
Surround and stereo (which may be addressed by bullet points above)
-
Higher sampling rates
Finally, it requests an analysis of complexity of the components of USAC.
The contribution notes that having this information as soon as possible, but not later than the 94th meeting, would be very desirable.
Bernhard Grill, FhG, expects that at higher bitrates USAC will have performance that is virtually identical to that of MPEG-4 AAC. Hence, the likelihood of a “surprise” at higher bitrates is very low.
Herve T, H, noted that it may be very interesting to investigate performance of multichannel at e.g. 48 kb/s or 96 kb/s.
Heiko Purnhagen, Dolby and Bernhard Grill, FhG, noted that multichannel in USAC can be achieved by using either USAC for discrete channel coding or by using USAC as a downmix coder for use with MPEG Surround.
Concerning higher sampling rates, the presenter noted that at higher sampling rates with higher bitrates, AAC has already demonstrated good performance and that USAC inherits that capability. The Chair noted that at high sampling rates and low bit rates, a practical system might just rate convert to 48 kHz sampling rate as a first step.
Herve T, H, asked whether 96 kb/s and 128 kb/s bitstreams might be made available as part of the regular release of WD information. The Chair observed that, on a successful outcome of the improved stereo coding CE, those 96 kb/s bitstreams can be made available on the SVN server.
There was considerable discussion on what was asked in the contribution. Almost all issues were addressed as already known: e.g. performance of USAC at 128 kb/s (similar to AAC performance) or USAC stereo into MPEG Surround to produce multi-channel (similar to HE-AAC and MPEG Surround).
What remains as an open issue is to make available
-
Complexity information, either as PCU/MCU of candidate profiles and levels or as MOPS per component tool, as can be found for MPEG-2 AAC in N2957.
The Chair noted that experts should begin to consider what might be in the USAC verification test and what the timeline for that test might be.
Werner Oomen, Philips, presented
m17590
|
Further Thoughts on USAC CE process
|
Bernhard Grill
Johannes Hilpert
Werner Oomen
Kristofer Kjörling
Heiko Purnhagen
Philippe Gournay
|
The contribution presents observations on the current CE process as it is progressing in USAC and offers possible guidelines that might assist the process such that it is both more efficient (in terms of timeline and discussion) and will result in a better USAC standard.
It notes that CEs, in general, can be categorized as follows
-
Sensible engineering – resolve bugs, or inconsistencies in text and reference software. Resolution is best achieved via discussion based on technical merit.
-
Performance gain in quality, bitrate or complexity – improvement in quality, reduction in bitrate or reduction in complexity.
If quality, then the following guidelines are from the CE process document (N7140)
-
1 item better
-
No items worse
If bitrate reduction (i.e. compression)
-
The real objective is that bitrate savings give rise to an increase in sound quality (i.e. “1 better”). The contribution notes that typically this requires an increase in compression efficiency of 2% to 5%. It proposes a “soft threshold” such that proposals well below the “soft threshold” are unlikely to impact quality, while those well above the “soft threshold” are very likely to impact quality (see figure below).
If complexity reduction
-
Complexity needs to be presented in relation to the complexity of the entire USAC decoder. Furthermore, it notes that a measurement of execution time on a specific platform (as an isolated data point) is not sufficient evidence of any complexity increase or reduction. However, providing complexity numbers in relation to technology that is well known or widely adopted in the marketplace allows experts in the group to better judge the proposal.
The contribution presented the following figure that visually motivates the arguments summarized above. The light blue vertical line is meant to be the “soft threshold” point.
The contribution asks that the major points in the contribution be incorporated into an output document that can be used in the USAC CE process.
Mohammed Raad, RaadTech, asked whether the contribution authors use these guidelines as their “internal compass” [Chair’s paraphrase] when judging CEs. Philips, Dolby and FhG experts confirmed that the contribution reflects their view of how CEs are evaluated. Bernhard Grill, FhG, noted that what is paramount is to create a specification that will be successful in the marketplace; the Audio group has a good track record in creating standards that are widely deployed in the marketplace, and Audio experts should use a process that ensures that this success continues.
He further stated that he feels that USAC is “behind schedule” in terms of the number of CEs that have been successfully concluded and incorporated into USAC. In this respect there is some frustration in not being able to get as much work done as might be possible with a better environment for discussion and decision.
The Chair notes that the guidelines have an impact on his allocation of time to discussion. Mohammed Raad, RaadTech, stated his agreement with the Chair on this point.
The Chair notes that in the past consensus has very often been unanimous. However, in USAC work it may not be possible to achieve unanimous positions, so that a position with one, two or even more dissenting opinions may, of necessity, represent the consensus position.
Juergen Herre, FhG, noted that fairness must be preserved, and a common expectation amongst the group is critical to all getting a feeling of being treated fairly in the Audio subgroup. Mohammed Raad, RaadTech, endorsed this perspective.
In conclusion, the Chair felt this was a very worthwhile discussion and suggested he will begin incorporating aspects of this discussion into how he manages presentation and discussion time in the subgroup and how he declares consensus positions. He urges experts to continue to think about these issues and bring ideas as contributions to the next meeting so as to facilitate a continuing discussion of this critically important topic.
Pitch Coding
Takehiro Moriya, NTT, presented the following contributions
m17528
|
VoiceAge listening test report on pitch coding for USAC
|
Philippe Gournay
Roch Lefebvre
|
m17535
|
Huawei Listening Test Report on lossless coding of pitch lag for ACELP
|
Zhengzhong Du
Wei Xiao
David Virette
|
m17515
|
Additional information of the CE proposal on lossless coding of pitch lag for ACELP in USAC
|
Takehiro Moriya
Yutaka Kamamoto
Noboru Harada
|
The presenter noted that the CE technology can save 5 to 6 bits per ACELP frame. Two cross-check listening tests were performed at 12 kb/s using 15 USAC items and 9 additional items, all speech, including clean speech, mixed speech and speech with background noise.
-
VoiceAge listening test using differential analysis showed no items improved and 1 item degraded.
-
Huawei listening test using differential analysis showed 1 item improved and no items degraded.
-
NTT listening test using differential analysis showed 3 items improved and mean improved.
The presenter showed results for pooled scores. Differential analysis showed:
-
For VoiceAge and Huawei there were no significant differences.
-
For VoiceAge, Huawei and NTT, 1 item improved and none degraded.
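The per-item differential analysis reported above can be illustrated with a minimal sketch. The minutes do not state the exact statistical procedure used by the test labs, so this assumes a common approach: per-listener difference scores (CE minus reference) for each item, with a 95% confidence interval based on Student's t. The function names, the item names and all scores below are hypothetical and are not data from any of the listening tests.

```python
import math
import statistics

def classify_item(diff_scores, t_crit):
    """Classify one item from per-listener difference scores (CE minus reference).

    t_crit is the two-sided critical value of Student's t for len(diff_scores) - 1
    degrees of freedom; it is passed in so the sketch needs only the standard library.
    """
    n = len(diff_scores)
    mean = statistics.mean(diff_scores)
    half_width = t_crit * statistics.stdev(diff_scores) / math.sqrt(n)
    if mean - half_width > 0:
        return "improved"      # whole interval lies above zero
    if mean + half_width < 0:
        return "degraded"      # whole interval lies below zero
    return "no significant difference"

def summarize(items, t_crit):
    """Return (number of items improved, number of items degraded)."""
    verdicts = [classify_item(scores, t_crit) for scores in items.values()]
    return verdicts.count("improved"), verdicts.count("degraded")

# Hypothetical difference scores for 8 listeners per item.
T_CRIT_95_DOF7 = 2.365  # two-sided 95% value for 7 degrees of freedom
example_items = {
    "item_a": [5, 7, 6, 8, 4, 6, 7, 5],
    "item_b": [2, -3, 1, -1, 0, 3, -2, 0],
    "item_c": [-4, -6, -5, -3, -5, -4, -6, -7],
}
improved, degraded = summarize(example_items, T_CRIT_95_DOF7)
print(f"{improved} item(s) improved, {degraded} item(s) degraded")
```

Under the quality guideline quoted earlier in this report (1 item better, no items worse), a result would count as supporting evidence only when the first count is at least one and the second is zero.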
The presenter acknowledged that the evidence for the CE is not that compelling. He expects to bring additional information to the next meeting. He further offered the possibility that this technology could be combined with ACELP pulse indexing to provide even as much as 3% improvement in compression efficiency at low bit rates.
Toru Chinen, Sony, presented
m17522
|
Proposal for improvement of SBR envelope coding
|
Toru Chinen
Yuki Yamamoto
Mitsuyuki Hatanaka
Masayuki Nishiguchi
|
The contribution describes a new CE on prediction coefficient vector coding (PVC). The decoder architecture for the tool is shown in the following figure:
The main idea is to exploit correlation between the lower band (in the core coder) and the higher band (in the SBR coder) so as to gain additional compression efficiency. It predicts the SBR envelope scalefactors based on the energy of QMF subband samples below the SBR crossover frequency. It is able to reduce the data rate of SBR parameters by up to 26%. The CE work consisted of modifying the WD6 decoder to incorporate the proposed tool, and driving the decoder with a transcoded WD6 bitstream according to the bitstream syntax in the contribution.
The contribution presented a listening test for 12 kb/s mono operating point using the 15 USAC test items. Analysis of absolute scores showed 1 item better (te1-mg54_speech) and none worse.
WD6 has an SBR bit rate, as allocated in sbr_grid(), sbr_dtdf() and sbr_data(), of
-
1.81 kb/s peak, 0.64 kb/s average
While WD6+CE has a bit rate of
-
0.67 kb/s peak, 0.47 kb/s average
The CE delivers an average increase in compression efficiency of 1.4% for the set of USAC test items.
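As a rough check on how these figures relate, the sketch below recomputes the savings from the bit rates quoted above. It assumes, which the minutes do not state explicitly, that the 1.4% figure is the average SBR saving taken relative to the 12 kb/s total bit rate of the listening test operating point. The variable and function names are illustrative only.

```python
# Bit rates quoted in the minutes, in kb/s.
TOTAL_KBPS = 12.0      # mono operating point of the listening test (assumed reference)
WD6_SBR_AVG = 0.64     # WD6 average SBR side information
CE_SBR_AVG = 0.47      # WD6+CE average SBR side information

def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole

avg_saving_kbps = WD6_SBR_AVG - CE_SBR_AVG                   # about 0.17 kb/s
sbr_reduction_pct = percent(avg_saving_kbps, WD6_SBR_AVG)    # reduction of SBR data itself
total_gain_pct = percent(avg_saving_kbps, TOTAL_KBPS)        # gain relative to the total rate

print(f"average saving: {avg_saving_kbps:.2f} kb/s")
print(f"SBR data reduction: {sbr_reduction_pct:.1f}%")
print(f"gain relative to {TOTAL_KBPS:.0f} kb/s: {total_gain_pct:.1f}%")
```

Under this assumption the average saving of about 0.17 kb/s corresponds to roughly a quarter of the SBR data and to about 1.4% of the total rate, which is consistent with the figures reported by the presenter.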
Heiko Purnhagen, Dolby, noted that it was a conscious design choice to have the high-band coding independent of low-band coding. He further noted that the proposed technology would require a core low-band decoder (to the subband coefficient level) in the encoder in order to get the quantized low-band spectrum.
The Chair noted that SBR is a key technology in MPEG audio such that any proposed change at any operating point should be thoroughly checked to ensure that consistent performance is achieved.
The Audio subgroup looks forward to additional information at the next MPEG meeting.
Other CEs
The proponents of the following CEs were either not present and hence could not present or chose not to present the contributions. The Chair urged audio experts to study these contributions. Due to this special situation, the Chair noted that contribution authors may choose to present them at the next meeting. The Chair notes that, concerning eTES, substantial discussions have been conducted via email during the 92nd MPEG meeting week.
|
Pulse Indexing in ACELP
|
|
m17516
|
Subjective listening test of CE on pulse indexing of ACELP in USAC
|
Takehiro Moriya
Yutaka Kamamoto
Noboru Harada
|
m17533
|
VoiceAge listening test report on ACELP pulse indexing for USAC
|
Philippe Gournay
Roch Lefebvre
|
m17536
|
Progress report on Enhanced Pulse Indexing CE for ACELP in USAC
|
Dejun Zhang
Fuwei Ma
David Virette
Hervé Taddei
Anisse Taleb
|
|
TCX
|
|
m17483
|
FhG Listening Test Report: TCX windowing
|
Max Neuendorf
|
m17486
|
ETRI Listening test results about TCX CE
|
Taejin Lee
Seungkwon Beack
Minje Kim
Kyeongok Kang
|
m17587
|
LGE listening test results for USAC CE on TCX windowing
|
Hyunkook Lee
Sungyong Yoon
Tacksung Choi
|
m17487
|
Report on TCX window CE
|
Taejin Lee
Seungkwon Beack
Minje Kim
Kyeongok Kang
|
|
eTES
|
|
m17482
|
FhG Listening Test Report eTes
|
Max Neuendorf
|
m17491
|
Report on cross-check listening test for the USAC CE on eTES
|
Kristofer Kjörling
Heiko Purnhagen
|
m17510
|
Cross-check report on Enhanced Temporal Envelope Shaping in USAC
|
Miyoung Kim
Eunmi Oh
|
m17586
|
LGE listening test results for USAC CE on eTES
|
Hyunkook Lee
Sungyong Yoon
Tacksung Choi
|
m17502
|
Report on Enhanced Temporal Envelope Shaping CE for USAC
|
Kei Kikuiri
Atsushi Yamaguchi
|
|
New CEs
|
|
m17572
|
Proposed CE on Enhanced Long Term Prediction for USAC
|
Jeongook Song
Hong-Goo Kang
Henney Oh
|
m17589
|
Core experiment on noise shaping method in USAC
|
Hyunkook Lee
Sungyong Yoon
Tacksung Choi
|