3.1.2 USAC
Max Neuendorf, FhG, presented
m16324 | Comments on new USAC reference bitstreams | Max Neuendorf, Markus Multrus
This contribution reported on the results of integrating the new arithmetic codebooks into the proponent USAC encoder and the RM decoder (denoted here as the Reference Quality System, or RQS). It noted that, at the previous MPEG meeting, the CE demonstrated a bit saving, averaged over all signals, from using the new arithmetic coding tables. When this bit saving was fed back into the bit reservoir, the wLPT tool was selected more often than the TCX tool, relative to the previous arithmetic tables. Hence it was not possible to maintain a bit-identical decoded waveform in the RQS.
The Chair asked whether, in creating a lossless decoding, the new tables strictly obeyed the bit buffer requirements. Max Neuendorf confirmed that this was the case.
Eunmi Oh, Samsung, noted that informal listening done at Samsung suggested that some items sounded worse as a result of this change. Herve Taddei, Huawei Technologies, noted that listening tests in their lab also concluded that the new tables resulted in a slight degradation in audio quality. It was agreed that Samsung and Huawei will give details on what test items were judged to be degraded. The Chair requested that FhG report at the next MPEG meeting any new information or insight gained in the use of the new arithmetic tables.
Max Neuendorf, FhG, presented
m16323 | Report on Merge of sys2 Technology into RM0: SBR Improvements | Max Neuendorf, Taejin Lee
The contribution reported on incorporating technology from Sys2 into the RM. Listening tests were presented that showed that the new RQS had a higher mean score than the RM0 system, but the difference was not significant at the 95% level. However, the enhancements related only to the encoder implementation. There was considerable discussion as to how such enhancements would be incorporated into the RQS. The Chair noted that what was of paramount importance was simply that all CE proponents have access to the same RQS.
Taejin Lee, ETRI, presented
m16383 | Report on Merge of sys2 Technology into RM0: TCX Improvements | Taejin Lee, Max Neuendorf
This contribution proposed to change the window shape for frames using TCX encoding. It presented listening test results for the proposed changes, in which a test over 6 items showed better performance than RM0. It was noted that, for these items and at 12 kb/s mono, the encoder was modified such that only TCX mode was used.
ETRI proposes to do additional work and bring a complete CE proposal to the next MPEG meeting. The Chair noted that the CE process requires an independent cross-check listening test report. The Chair further noted that ETRI has the burden of bringing sufficiently compelling evidence on the merit of their technology, and that they should use this week to understand the concerns of the group.
Heiko Purnhagen, Dolby, presented
The contribution presented listening test results at 32 kb/s stereo which showed a tendency in the mean for the performance of RM+CE to be better than the performance of RM. The presenter suggested that more conclusive results could be achieved if the data from all test results were pooled.
Werner Oomen, Philips, presented
m16339 | Philips Listening Test Results for USAC CE on Phase Coding in MPS | Werner Oomen, Jeroen Koppens
The contribution presented listening test results at 24 kb/s stereo. The results showed a tendency in the mean for the performance of RM+CE to be better than the performance of RM. The presenter also suggested that more conclusive results could be achieved if the data from all test results were pooled.
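The pooling suggestion rests on a general property of confidence intervals: the half-width shrinks roughly with the square root of the number of scores. The Python sketch below illustrates this with invented difference scores (RM+CE minus RM) and a normal approximation. The data, the function names, and the choice of approximation are assumptions for illustration and are not taken from any of the listening test reports. It also assumes that scores from different labs are comparable enough to be pooled.

```python
import math
import statistics

def ci95_half_width(diffs):
    """Half-width of an approximate 95% CI on the mean (normal approximation)."""
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    return z * statistics.stdev(diffs) / math.sqrt(len(diffs))

def is_significant(diffs):
    """True if the approximate 95% CI on the mean difference excludes zero."""
    return abs(statistics.mean(diffs)) > ci95_half_width(diffs)

# Hypothetical per-listener difference scores from two labs (illustrative only).
lab_a = [5, -2, 7, 2, -1, 6, 4, -3]
lab_b = [3, -2, 5, 2, -3, 6, 4, -1]
pooled = lab_a + lab_b

for name, data in [("lab A", lab_a), ("lab B", lab_b), ("pooled", pooled)]:
    print(name, round(statistics.mean(data), 2),
          "+/-", round(ci95_half_width(data), 2),
          "significant" if is_significant(data) else "not significant")
```

With these invented numbers, each lab alone shows only a tendency in the mean, while the pooled data gives a narrower interval that excludes zero.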
Markus Multrus, FhG, presented
m16456 | Fraunhofer Listening Test Results for USAC CE on Phase Coding in MPS | Julien Robilliard, Matthias Neusinger, Johannes Hilpert
The listening test showed results at 24 kb/s and 32 kb/s stereo. The results showed that the mean performance of RM+CE is better than the mean performance of RM at the 95% level of confidence when averaged over all test items. When looking at each test item individually, none performed worse and some performed better at the 95% level of confidence.
Eunmi Oh, Samsung, presented
m16374 | Report on Phase Coding in MPEG Surround for USAC | JungHoe Kim, Julien Robilliard, Eunmi Oh, Bernhard Grill
The contribution described the proposed technology and presented listening test data. A summary of the technology follows:
- can select fine or coarse phase quantization tables
- decoder applies smoothing to unwrapped phase and uses linear interpolation in magnitude/phase domain
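The decoder-side steps named above (phase unwrapping, smoothing, and interpolation in the magnitude/phase domain) can be illustrated generically. The Python sketch below is not the CE algorithm: the minutes do not give the smoothing filter or the interpolation details, so the one-pole smoother, its `alpha` parameter, and all function names are assumptions.

```python
import cmath
import math

def unwrap(phases):
    """Remove 2*pi jumps so that consecutive phases differ by at most pi."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2.0 * math.pi * round(d / (2.0 * math.pi))
        out.append(out[-1] + d)
    return out

def smooth(phases, alpha=0.5):
    """One-pole smoothing of an unwrapped phase track (alpha is hypothetical)."""
    out = [phases[0]]
    for p in phases[1:]:
        out.append(alpha * out[-1] + (1.0 - alpha) * p)
    return out

def interp_mag_phase(z0, z1, t):
    """Interpolate between z0 and z1 linearly in magnitude and in phase."""
    p0, p1 = unwrap([cmath.phase(z0), cmath.phase(z1)])
    mag = (1.0 - t) * abs(z0) + t * abs(z1)
    return cmath.rect(mag, (1.0 - t) * p0 + t * p1)

# Halfway between 1 and 1j: the magnitude stays 1, whereas a straight
# average of the two complex values would have magnitude of about 0.707.
mid = interp_mag_phase(1 + 0j, 1j, 0.5)
print(abs(mid), abs((1 + 1j) / 2))
```

Interpolating magnitude and phase separately avoids the level dip that a direct average of complex values produces when the two phases differ.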
The technology provides the following performance:
- at 32 kb/s the IPD rate is 0.479 kb/s
- at 24 kb/s the IPD rate is 0.271 kb/s
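To put the two IPD rates in proportion, the short Python sketch below expresses each as a share of the total bitrate. The rates are those listed above; the function name is an assumption for illustration.

```python
def side_info_share(side_rate_kbps, total_rate_kbps):
    """Fraction of the total bitrate spent on the side information."""
    return side_rate_kbps / total_rate_kbps

for total, ipd in [(32.0, 0.479), (24.0, 0.271)]:
    share = 100.0 * side_info_share(ipd, total)
    print(f"{total:.0f} kb/s: IPD uses {share:.2f}% of the bitrate")
# prints 1.50% for 32 kb/s and 1.13% for 24 kb/s
```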
The Samsung listening test result showed clear improvement for several items at the 95% level of significance for both bit rates and a clear improvement when scores for all items are pooled together.
Kristofer Kjörling, Dolby, expressed some concern with the details of the decoding process, specifically that the phase smoothing component of the proposed technology was not discussed in previous contributions. Heiko Purnhagen, Dolby, and Werner Oomen, Philips, expressed similar concerns.
The Chair asked Samsung, Dolby and Philips experts to have an off-line discussion and report back to the group on how best to proceed.
Heiko Purnhagen, Dolby, presented
m16311 | Dolby Listening Test Results for USAC CE on AVQ-based LPC | Heiko Purnhagen, Kristofer Kjörling
The contribution presented listening test results at 16 kb/s mono that showed no statistically significant differences between RM and RM+CE at the 95% level of significance.
Markus Multrus, FhG, presented
The contribution presented listening test results at 16 kb/s mono. The test included a 7.0 kHz LPF anchor and the two codecs comprising the VC (i.e. AMR-WB+ and HE-AAC) and showed no statistically significant differences between RM and RM+CE at the 95% level of significance.
Philippe Gournay, VoiceAge, presented
m16316 | CE Report on LPC Quantization for USAC | Philippe Gournay, Bruno Bessette, Roch Lefebvre, Redwan Salami
The contribution reviewed the CE technology, which is to replace the RM0 quantizer (based on trained codebooks) with an algebraic vector quantizer (AVQ). The advantages of AVQ are:
- it uses less ROM (19456 32-bit words for RM0 vs. 4096 32-bit words for the first stage and 1150 16-bit words for the AVQ quantizer)
- it permits better control of spectral distortion (i.e. fewer outliers)
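Because the two ROM figures mix 32-bit and 16-bit words, they are easier to compare in bytes. The Python sketch below does that conversion using only the word counts quoted above; the function and variable names are assumptions for illustration.

```python
def rom_bytes(tables):
    """Total ROM in bytes for a list of (word_count, bits_per_word) tables."""
    return sum(words * bits // 8 for words, bits in tables)

rm0_rom = rom_bytes([(19456, 32)])             # RM0 trained codebooks
avq_rom = rom_bytes([(4096, 32), (1150, 16)])  # first stage + AVQ quantizer

print(rm0_rom)  # 77824
print(avq_rom)  # 18684
print(f"reduction factor: {rm0_rom / avq_rom:.2f}")
```

On these figures the proposed quantizer needs a little under a quarter of the RM0 ROM.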
The contribution presented objective data on spectral distortion for the AVQ coded excitation. The AVQ quantizer showed an order of magnitude fewer outliers (more than 1.5 dB spectral distortion) than the RM0 trained VQ quantizer.
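The minutes do not state how spectral distortion was computed. A common definition is the RMS difference, in dB, between the log power spectra of the unquantized and quantized LPC filters. The Python sketch below uses that definition and counts frames above the 1.5 dB threshold. The full-band frequency grid, the function names, and the example values are assumptions.

```python
import cmath
import math

def lpc_log_spectrum_db(a, n_bins=256):
    """Log power spectrum in dB of the all-pole filter 1/A(z), a = [1, a1, ..., ap]."""
    spec = []
    for k in range(n_bins):
        w = math.pi * (k + 0.5) / n_bins  # frequencies strictly inside (0, pi)
        A = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        spec.append(-20.0 * math.log10(abs(A)))  # 10*log10(1/|A|^2)
    return spec

def spectral_distortion_db(a_ref, a_quant, n_bins=256):
    """RMS difference in dB between the log spectra of two LPC filters."""
    s0 = lpc_log_spectrum_db(a_ref, n_bins)
    s1 = lpc_log_spectrum_db(a_quant, n_bins)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s0, s1)) / n_bins)

def outlier_fraction(sd_values, threshold_db=1.5):
    """Fraction of frames whose spectral distortion exceeds the threshold."""
    return sum(sd > threshold_db for sd in sd_values) / len(sd_values)

# Hypothetical first-order filters, before and after quantization.
sd = spectral_distortion_db([1.0, -0.90], [1.0, -0.88])
print(round(sd, 3), outlier_fraction([0.8, 1.2, 1.6, 2.0]))
```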
It is the consensus of the group to incorporate this technology into the USAC WD.
Philippe Gournay, VoiceAge, presented
m16325 | VoiceAge Test Report for USAC CE on Unvoiced Coding | Philippe Gournay, Roch Lefebvre
The contribution reports listening test results for two operating modes: 12 kb/s mono and 16 kb/s mono.
Analysis of difference scores indicates that, at 12 kb/s mono, one test item is worse for the CE technology (at the 95% level of significance), while at 16 kb/s there are no statistically significant differences. Single-sided tests of the differences between scores showed that, for one item (es01) at 12 kb/s mono, the hypothesis that the means of the two systems were the same was rejected.
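An analysis of difference scores for a single item typically forms the per-listener difference between the two systems and checks whether a 95% confidence interval on the mean difference excludes zero. The Python sketch below does this with a two-sided Student-t interval. The scores are invented, the names are assumptions, and the report's own procedure (including its single-sided tests) may differ.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
T_975 = {5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
         11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131}

def diff_ci95(scores_rm, scores_ce):
    """95% CI on the mean of per-listener differences (RM+CE minus RM)."""
    diffs = [c - r for r, c in zip(scores_rm, scores_ce)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    half = T_975[n - 1] * statistics.stdev(diffs) / math.sqrt(n)
    return mean - half, mean + half

def verdict(ci):
    """Classify the result from the sign of the interval bounds."""
    low, high = ci
    if low > 0:
        return "CE better"
    if high < 0:
        return "CE worse"
    return "no significant difference"

# Hypothetical scores of 8 listeners for one item (illustrative only).
rm = [62, 58, 70, 65, 55, 60, 68, 63]
ce = [55, 54, 66, 60, 50, 57, 61, 58]
print(diff_ci95(rm, ce), verdict(diff_ci95(rm, ce)))
```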
The presenter acknowledged that the proposed technology is able to save bits, which is quite valuable. However, it may be that the hypothesis of modelling unvoiced speech by linear filtered Gaussian noise is not appropriate.
Eunmi Oh, Samsung, presented
The contribution presented the proposed CE technology, which consists of:
- Gaussian codebook of excitation vectors and gain factor for Low Energy segments (LEN)
- Gaussian codebook of excitation vectors, gain factor and an additional LP filter for Unvoiced segments (UV)
It reviewed the bit savings possible by using the CE technology:
- UV: 408 bits saved per superframe (as compared to LPD mode)
- LEN: 88 bits saved per superframe (as compared to LPD mode)
The contribution noted that the experiment could produce bit-identical results (except for UV and LEN coded segments), but because the saved bits were fed back to the bit reservoir, the entire decoded waveform was different. It showed that frames coded in UV mode are a significant minority of all frames, with LEN frames considerably fewer. At 12 kb/s, the bitrate saving was as large as 0.92 kb/s, with 5 items saving more than 0.5 kb/s. At 16 kb/s, the bitrate saving was as large as 1.15 kb/s, with 5 items saving more than 0.5 kb/s.
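The link between per-superframe bit savings and an average bitrate saving can be sketched as follows. Only the 408-bit and 88-bit figures come from the contribution. The superframe rate and the mode fractions in this Python sketch are hypothetical placeholders, since the minutes do not give them, and the function name is an assumption.

```python
def bitrate_saving_kbps(bits_saved, mode_fraction, superframes_per_second):
    """Average saving in kb/s when a fraction of superframes uses a cheaper mode."""
    return bits_saved * mode_fraction * superframes_per_second / 1000.0

# From the contribution: bits saved per superframe relative to LPD mode.
UV_BITS_SAVED = 408
LEN_BITS_SAVED = 88

# Hypothetical values, chosen only to make the arithmetic concrete.
SUPERFRAMES_PER_SECOND = 12.5
UV_FRACTION = 0.15
LEN_FRACTION = 0.05

total = (bitrate_saving_kbps(UV_BITS_SAVED, UV_FRACTION, SUPERFRAMES_PER_SECOND)
         + bitrate_saving_kbps(LEN_BITS_SAVED, LEN_FRACTION, SUPERFRAMES_PER_SECOND))
print(f"{total:.2f} kb/s")
```

Because the saving scales with the fraction of UV and LEN superframes, it varies from item to item, which is consistent with the per-item spread reported above.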
It presented listening test results for 12 kb/s mono and 16 kb/s mono. At 12 kb/s, an analysis of the differences between RM and RM+CE showed better performance for RM+CE on two items, at the 95% level of significance. At 16 kb/s, the same analysis showed better performance for RM+CE on one item, at the 95% level of significance. The presenter noted that these results are quite different from those in the previous contribution. Samsung and VoiceAge will investigate the reason for these differences and report back at the next meeting.
Samsung plans to bring more information to the next meeting that will make the CE a complete proposal, including a cross-check.