International organisation for standardisation organisation internationale de normalisation


3.1.2 USAC

Max Neuendorf, FhG, presented

m16324

Comments on new USAC reference bitstreams

Max Neuendorf
Markus Multrus

This contribution reported on the results of integrating the new arithmetic codebooks into the proponent USAC encoder and the RM decoder (denoted here as the Reference Quality System, or RQS). It notes that, at the previous MPEG meeting, the CE demonstrated a bit savings, averaged over all signals, from using the new arithmetic coding tables. When these saved bits were fed back into the bit reservoir, the wLPT tool was selected more often than the TCX tool, compared with the previous arithmetic tables. Hence it was not possible to maintain a bit-identical decoded waveform in the RQS.

The Chair asked whether, in creating a lossless decoding, the new tables strictly obeyed the bit buffer requirements. Max Neuendorf confirmed that this was the case.

Eunmi Oh, Samsung, noted that informal listening done at Samsung suggested that some items sounded worse as a result of this change. Herve Taddei, Huawei Technologies, noted that listening tests in their lab also concluded that the new tables resulted in a slight degradation in audio quality. It was agreed that Samsung and Huawei will give details on which test items were judged to be degraded. The Chair requested that FhG report at the next MPEG meeting any new information or insight gained in the use of the new arithmetic tables.

Max Neuendorf, FhG, presented

m16323

Report on Merge of sys2 Technology into RM0: SBR Improvements

Max Neuendorf
Taejin Lee

The contribution reported on incorporating technology from Sys2 into the RM. Listening tests were presented that showed that the new RQS had a higher mean score than the RM0 system, but the difference was not significant at the 95% level. However, the enhancements related only to the encoder implementation. There was considerable discussion as to how such enhancements would be incorporated into the RQS. The Chair noted that what was of paramount importance was simply that all CE proponents have access to the same RQS.

Taejin Lee, ETRI, presented

m16383

Report on Merge of sys2 Technology into RM0: TCX Improvements

Taejin Lee
Max Neuendorf

This contribution proposes to change the window shape for frames using TCX encoding. It presented listening test results for the proposed changes, in which a test over 6 items showed better performance relative to RM0. It was noted that, for these items and at 12 kb/s mono, the encoder was modified such that only TCX mode was used.

ETRI proposes to do additional work and bring a complete CE proposal to the next MPEG meeting. The Chair noted that the CE process requires an independent listening test report as a cross-check. The Chair further noted that ETRI has the burden of bringing evidence that is sufficiently compelling on the merit of their technology, and that they should use this week to understand the concerns of the group.

Heiko Purnhagen, Dolby, presented

m16312

Dolby Listening Test Results for USAC CE on Phase Coding in MPS

Heiko Purnhagen
Kristofer Kjörling

The contribution presented listening test results at 32 kb/s stereo which showed a tendency in the mean for the performance of RM+CE to be better than the performance of RM. The presenter suggested that more conclusive results could be achieved if the data from all test results were pooled.

Werner Oomen, Philips, presented

m16339

Philips Listening Test Results for USAC CE on Phase Coding in MPS

Werner Oomen
Jeroen Koppens

The contribution presented listening test results at 24 kb/s stereo. The results showed a tendency in the mean for the performance of RM+CE to be better than the performance of RM. The presenter also suggested that more conclusive results could be achieved if the data from all test results were pooled.

Markus Multrus, FhG, presented

m16456

Fraunhofer Listening Test Results for USAC CE on Phase Coding in MPS

Julien Robilliard
Matthias Neusinger
Johannes Hilpert

The listening test showed results at 24 kb/s and 32 kb/s stereo. The results showed that the mean performance of RM+CE is better than the mean performance of RM at the 95% level of confidence when averaged over all test items. When looking at each test item individually, none performed worse and some performed better at the 95% level of confidence.
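The kind of per-item comparison reported in these listening tests can be illustrated with a minimal sketch of a 95% confidence interval on paired difference scores. The listener scores below are hypothetical and are not taken from any contribution; the function names and the small table of Student-t critical values are likewise assumptions made for illustration.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, indexed by degrees of freedom.
# Only a few entries are listed; extend as needed for other panel sizes.
T_CRIT_95 = {5: 2.571, 7: 2.365, 9: 2.262, 11: 2.201}


def mean_diff_ci(scores_rm, scores_ce):
    """Return (mean, low, high) of the 95% CI on the paired difference CE - RM."""
    diffs = [ce - rm for rm, ce in zip(scores_rm, scores_ce)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(diffs) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


def verdict(low, high):
    """Classify an item by whether the interval excludes zero."""
    if low > 0:
        return "better"
    if high < 0:
        return "worse"
    return "no significant difference"


# Hypothetical scores from eight listeners for one test item.
rm_scores = [60, 55, 70, 65, 58, 62, 68, 61]
ce_scores = [66, 59, 73, 70, 60, 69, 71, 66]

mean, low, high = mean_diff_ci(rm_scores, ce_scores)
print(f"mean difference {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]: {verdict(low, high)}")
```

An item counts as "better" or "worse" only when the interval excludes zero; otherwise only a tendency in the mean can be reported.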

Eunmi Oh, Samsung, presented

m16374

Report on Phase Coding in MPEG Surround for USAC

JungHoe Kim
Julien Robilliard
Eunmi Oh
Bernhard Grill

The contribution described the proposed technology and presented listening test data. A summary of the technology follows:

can select fine or coarse phase quantization tables

decoder applies smoothing to unwrapped phase and uses linear interpolation in magnitude/phase domain
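The decoder-side steps summarized above (phase unwrapping, smoothing, and linear interpolation in the magnitude/phase domain) can be sketched generically as follows. This is an illustration of the general techniques only, not the algorithm of the contribution: the first-order smoother, its `alpha` constant, and all function names are assumptions.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps so that consecutive phases differ by less than pi."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        # Fold the step into [-pi, pi) by subtracting whole turns.
        delta -= 2.0 * math.pi * math.floor((delta + math.pi) / (2.0 * math.pi))
        out.append(out[-1] + delta)
    return out


def smooth(values, alpha=0.5):
    """First-order recursive smoothing; alpha is a hypothetical constant."""
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def interpolate(mag0, ph0, mag1, ph1, steps):
    """Interpolate magnitude and phase separately, then rebuild complex values."""
    result = []
    for n in range(1, steps + 1):
        t = n / steps
        mag = (1.0 - t) * mag0 + t * mag1
        ph = (1.0 - t) * ph0 + t * ph1
        result.append(complex(mag * math.cos(ph), mag * math.sin(ph)))
    return result


# Hypothetical dequantized phases (radians) that wrap around +/- pi.
raw = [3.0, -3.0, -2.8]
smoothed = smooth(unwrap(raw))
print(smoothed)
```

Interpolating magnitude and phase separately, rather than the real and imaginary parts, keeps the magnitude from dipping when the two end points differ mainly in phase.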

The technology provides the following performance:

at 32 kb/s the IPD rate is 0.479 kb/s
at 24 kb/s the IPD rate is 0.271 kb/s
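To put these figures in proportion, the following short calculation expresses each IPD rate as a share of the total bitrate. It uses only the numbers quoted above; the variable and function names are illustrative.

```python
# (total bitrate, IPD side-information rate), both in kb/s, as quoted above.
operating_points = [(32.0, 0.479), (24.0, 0.271)]


def ipd_share_percent(total_kbps, ipd_kbps):
    """IPD rate as a percentage of the total bitrate."""
    return 100.0 * ipd_kbps / total_kbps


shares = {total: ipd_share_percent(total, ipd) for total, ipd in operating_points}
for total, share in shares.items():
    print(f"{total:.0f} kb/s: IPD uses {share:.2f}% of the bitrate")
```

At both operating points the phase side information accounts for roughly 1 to 1.5 percent of the total rate.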

The Samsung listening test result showed clear improvement for several items at the 95% level of significance for both bit rates and a clear improvement when scores for all items are pooled together.

Kristofer Kjörling, Dolby, expressed some concern with the details of the decoding process, specifically that the phase smoothing component of the proposed technology was not discussed in previous contributions. Heiko Purnhagen, Dolby, and Werner Oomen, Philips, expressed similar concerns.

The Chair asked Samsung, Dolby and Philips experts to have an off-line discussion and report back to the group on how best to proceed.

Heiko Purnhagen, Dolby, presented

m16311

Dolby Listening Test Results for USAC CE on AVQ-based LPC

Heiko Purnhagen
Kristofer Kjörling

The contribution presented listening test results at 16 kb/s mono that showed no statistically significant differences between RM and RM+CE at the 95% level of significance.

Markus Multrus, FhG, presented

m16322

Fraunhofer IIS Listeningtest Results on USAC CE for AVQ-based LPC Quantizer

Markus Multrus
Ralf Geiger

The contribution presented listening test results at 16 kb/s mono. The test included a 7.0 kHz LPF anchor and the two codecs comprising the VC (i.e. AMR-WB+ and HE-AAC) and showed no statistically significant differences between RM and RM+CE at the 95% level of significance.

Philippe Gournay, VoiceAge, presented

m16316

CE Report on LPC Quantization for USAC

Philippe Gournay
Bruno Bessette
Roch Lefebvre
Redwan Salami

The contribution reviewed the CE technology, which is to replace the RM0 quantizer (based on trained codebooks) with an algebraic vector quantizer (AVQ). The advantages of AVQ are:

it uses less ROM: 19456 32-bit words for RM0 versus 4096 32-bit words (first stage) plus 1150 16-bit words (AVQ quantizer)
it permits better control of spectral distortion (i.e. fewer outliers)
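Taking the word counts in the list above at face value, the ROM comparison can be converted to bytes with a few lines of arithmetic. The variable names are illustrative; the only inputs are the figures quoted in the minutes.

```python
BYTES_PER_32BIT_WORD = 4
BYTES_PER_16BIT_WORD = 2

# RM0 trained-codebook quantizer: 19456 words of 32 bits.
rm0_bytes = 19456 * BYTES_PER_32BIT_WORD

# AVQ-based quantizer: 4096 32-bit words (first stage) plus 1150 16-bit words.
avq_bytes = 4096 * BYTES_PER_32BIT_WORD + 1150 * BYTES_PER_16BIT_WORD

print(f"RM0: {rm0_bytes} bytes, AVQ: {avq_bytes} bytes, "
      f"ratio {rm0_bytes / avq_bytes:.2f}")
```

On these figures the AVQ-based quantizer needs a little under a quarter of the ROM used by the RM0 quantizer.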

The contribution presented objective data on spectral distortion for the AVQ coded excitation. The AVQ quantizer showed an order of magnitude fewer outliers (more than 1.5 dB spectral distortion) than the RM0 trained VQ quantizer.
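For readers unfamiliar with the metric, the sketch below computes the commonly used log-spectral distortion between an unquantized and a quantized LPC filter, and counts outliers against the 1.5 dB threshold mentioned above. It assumes the standard RMS log-spectral definition, which the minutes do not spell out; the function names, the frequency grid size, and the example coefficients are hypothetical.

```python
import cmath
import math


def lpc_power_db(a, num_bins=256):
    """Power spectrum in dB of the all-pole filter 1/A(z), a = [1, a1, ..., ap]."""
    spectrum = []
    for k in range(num_bins):
        w = math.pi * k / num_bins
        a_of_w = sum(coef * cmath.exp(-1j * w * i) for i, coef in enumerate(a))
        # 10*log10(|1/A|^2) equals -20*log10(|A|).
        spectrum.append(-20.0 * math.log10(abs(a_of_w)))
    return spectrum


def spectral_distortion_db(a_ref, a_quant, num_bins=256):
    """RMS difference in dB between the two log power spectra."""
    ref = lpc_power_db(a_ref, num_bins)
    qnt = lpc_power_db(a_quant, num_bins)
    return math.sqrt(sum((r - q) ** 2 for r, q in zip(ref, qnt)) / num_bins)


def outlier_fraction(sd_values, threshold_db=1.5):
    """Fraction of frames whose spectral distortion exceeds the threshold."""
    return sum(1 for sd in sd_values if sd > threshold_db) / len(sd_values)


# Hypothetical first-order filters standing in for one frame.
sd = spectral_distortion_db([1.0, -0.9], [1.0, -0.8])
print(f"spectral distortion: {sd:.2f} dB")
```

Running `outlier_fraction` over the per-frame distortions of each quantizer gives the kind of outlier statistic compared in the contribution.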

It is the consensus of the group to incorporate this technology into the USAC WD.

Philippe Gournay, VoiceAge, presented

m16325

VoiceAge Test Report for USAC CE on Unvoiced Coding

Philippe Gournay
Roch Lefebvre

The contribution reports listening test results for two operating modes: 12 kb/s mono and 16 kb/s mono.

Analysis of difference scores indicates that, at 12 kb/s mono, one test item is worse for the CE technology (at the 95% level of significance), while at 16 kb/s there are no statistically significant differences. Single-sided tests of the differences between scores showed that, for one item (es01) at 12 kb/s mono, the hypothesis that the means of the two systems were the same was rejected.

The presenter acknowledged that the proposed technology is able to save bits, which is quite valuable. However, it may be that the hypothesis of modelling unvoiced speech by linear filtered Gaussian noise is not appropriate.

Eunmi Oh, Samsung, presented

m16373

Report on Unvoiced Speech Coding for USAC

Hosang Sung
Eunmi Oh
Miyoung Kim

The contribution presented the proposed CE technology, which consists of

Gaussian codebook of excitation vectors and gain factor for Low Energy segments (LEN)
Gaussian codebook of excitation vectors, gain factor and an additional LP filter for Unvoiced segments (UV)

It reviewed the bit savings possible by using the CE technology

UV 408 bits saved per superframe (as compared to LPD mode)
LEN 88 bits saved per superframe (as compared to LPD mode)
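How per-superframe savings translate into an average bitrate saving depends on how often each mode is selected and on the superframe duration, neither of which is stated in these minutes. The sketch below makes that relationship explicit; the 80 ms superframe duration and the mode fractions are hypothetical values chosen only for illustration, as is the function name.

```python
def savings_kbps(bits_saved_per_superframe, mode_fraction, superframe_ms):
    """Average bitrate saving in kb/s (bits per millisecond equals kb/s)."""
    return bits_saved_per_superframe * mode_fraction / superframe_ms


# Bits saved per superframe, as quoted above.
UV_BITS_SAVED = 408
LEN_BITS_SAVED = 88

# Hypothetical assumptions: 80 ms superframes, 10% UV and 5% LEN superframes.
SUPERFRAME_MS = 80.0
uv_saving = savings_kbps(UV_BITS_SAVED, 0.10, SUPERFRAME_MS)
len_saving = savings_kbps(LEN_BITS_SAVED, 0.05, SUPERFRAME_MS)

print(f"UV: {uv_saving:.3f} kb/s, LEN: {len_saving:.3f} kb/s, "
      f"total: {uv_saving + len_saving:.3f} kb/s")
```

The saving scales linearly with the fraction of superframes coded in each mode, which is why the reported savings vary from item to item.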

The contribution noted that the experiment could produce bit-identical results (except for UV and LEN coded segments), but because saved bits were fed back to the bit reservoir, the entire decoded waveform was different. It showed that the fraction of frames coded in UV mode is a significant minority, with LEN quite a bit less. At 12 kb/s, the bitrate savings were as large as 0.92 kb/s, with 5 items showing savings larger than 0.5 kb/s. At 16 kb/s, the bitrate savings were as large as 1.15 kb/s, with 5 items showing savings larger than 0.5 kb/s.

It presented listening test results for 12 kb/s mono and 16 kb/s mono. At 12 kb/s, an analysis of the differences between RM and RM+CE showed better performance for RM+CE on two items at the 95% level of significance. At 16 kb/s, the same analysis showed better performance for RM+CE on one item at the 95% level of significance. The presenter noted that these results are quite different from those in the previous contribution. Samsung and VoiceAge will investigate the reason for these differences and report back at the next meeting.

Samsung plans to bring more information to the next meeting that will make the CE a complete proposal, including a cross-check.
