Element | RM8 | CE | Reduction
Decorrelator | 3.8 | 1.7 | 54%
Decoder | 13.2 | 11.2 | 15.2%
Hence the CE reduces the total complexity (WMOPS) by more than 15% without any new tool or any change in bitstream syntax elements.
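As a quick check of the arithmetic, the relative savings can be recomputed from the reported WMOPS figures. This is only an illustrative sketch; the function name `reduction_percent` is hypothetical and not part of the contribution. The decoder figure reproduces the reported 15.2%, while the decorrelator figure comes out slightly above the reported 54%, presumably because the tabulated WMOPS values are rounded.

```python
def reduction_percent(baseline: float, proposed: float) -> float:
    """Relative saving of `proposed` against `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# WMOPS figures as reported for RM8 and for the CE: (RM8, CE)
figures = {"Decorrelator": (3.8, 1.7), "Decoder": (13.2, 11.2)}

for element, (rm8, ce) in figures.items():
    print(f"{element}: {reduction_percent(rm8, ce):.1f}% reduction")
```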
The contribution reports listening test results pooled over all data (25 subjects). At both the 16 kb/s stereo and the 32 kb/s stereo operating points, differential MOS score analysis showed no differences at the 95% level of significance.
Samsung volunteered to cross-check the complexity figures and will report Monday afternoon.
Kristofer Kjörling, Dolby, expressed a concern that the CE proposes to remove the fractional delay capability from the proposed decorrelator, which may conceivably bring a quality increase for some signals. Heiko Purnhagen noted that in this case there could be one bitstream and two possible decoding modes, one with a fractional delay capability and one without.
It was decided to continue discussion on this topic Tuesday after lunch.
Eunmi Oh, Samsung, presented
m19233 | Crosscheck report on improved applause coding | Miyoung Kim, Eunmi Oh | USAC-APPL
The contribution reports on listening tests at two bit rates. At 16 kb/s stereo analysis of differential MUSHRA scores shows 1 item better and mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores shows 3 items better and mean better, at the 95% level of significance.
Kei Kikuiri, NTT DOCOMO, presented
m19258 | NTT DOCOMO Listening Test Report on improved applause coding CE in USAC | Kei Kikuiri | USAC-APPL
The contribution reports on listening tests at two bit rates. At 16 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 3 items better and mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 3 items better and mean better, at the 95% level of significance.
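The differential analysis used throughout these reports can be sketched as follows. This is a minimal illustration, assuming a paired analysis in which each listener's sysB score is subtracted from the same listener's sysA score and a Student-t 95% confidence interval is formed on the mean difference; the contributions do not spell out the exact procedure, and the function names and scores below are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values for 1..30 degrees of freedom.
T_975 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
         2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
         2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042]


def diff_confidence_interval(sys_a, sys_b):
    """95% confidence interval of the mean per-listener difference (A - B)."""
    if len(sys_a) != len(sys_b):
        raise ValueError("scores must be paired listener by listener")
    diffs = [a - b for a, b in zip(sys_a, sys_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two listeners")
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean
    dof = n - 1
    # Fall back to the normal approximation beyond the tabulated range.
    t = T_975[dof - 1] if dof <= len(T_975) else 1.96
    return mean - t * sem, mean + t * sem


def verdict(sys_a, sys_b):
    """'better' or 'worse' only if the interval excludes zero."""
    low, high = diff_confidence_interval(sys_a, sys_b)
    if low > 0:
        return "better"
    if high < 0:
        return "worse"
    return "no difference"


# Hypothetical MUSHRA scores (0-100) of six listeners for one item.
cand = [70, 65, 80, 75, 68, 72]
ref = [60, 62, 71, 70, 66, 64]
print(verdict(cand, ref))
```

Under this reading, an item counts as "better" only when the whole interval lies above zero, which is why wide intervals lead to "no difference" even when the mean difference is positive.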
David Virette, Huawei, presented
m19264 | Report on cross-check listening test for the CE on Improved Applause Coding for USAC | David Virette | USAC-APPL
The contribution reports on listening tests at two bit rates. At 16 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows the mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 3 items better and mean better, at the 95% level of significance.
Julien Robilliard, FhG, presented
m19311 | Finalization of CE proposal on improved applause coding in USAC | Sascha Disch, Achim Kuntz, Tom Bäckström, Erik Schuijers, Werner Oomen | USAC-APPL
The contribution reviews the shortcomings of RM8 for applause coding:
- Narrowed sound stage
- Lack of envelopment
The CE proposes a new tool, the Transient Steering Decorrelator (TSD). It consists of a block that separates the transient portions from the rest of the signal, a decorrelator block dedicated to transients, and a block that recombines the transient signal with the rest to form the complete signal. A flag for enabling the TSD tool and the side information for the TSD are new bitstream syntax elements. The presenter noted that whenever the TSD tool is enabled, the additional bits for TSD are more than offset by a reduction in MPEG Surround parametric side information (since only MPS broadband cues are then needed). This reflects an adaptive trade-off between temporal and spectral resolution.
The contribution reports on FhG listening tests at two bit rates. The presenter noted that SysA is the CE system. At 16 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 4 items better and mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 4 items better and mean better, at the 95% level of significance. The presenter noted that some diff scores were as large as 10 MUSHRA points.
When FhG and cross-check data are pooled, at 16 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 4 items better and mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 4 items better and mean better, at the 95% level of significance.
The computational complexity comprises two components: transient slot position decoding and the transient decorrelator. Together the two operations increase the USAC decoder WMOPS by less than 1%.
It was the consensus of the AhG to recommend to the Audio Subgroup to incorporate this CE technology into the USAC DIS. As always, CE acceptance is contingent on completing the CE by
- Integrating into the decoder source code to implement the tool.
- Providing sufficient support, as educational text and/or exemplary source code, so as to enable others to implement the tool in the MPEG Reference Encoder. The “support” component must be accepted by the consensus of the Audio Subgroup.
Toru Chinen, Sony, presented
m19253 | Sony listening test report on Enhanced Performance at Mid Bitrates CE | Yuki Yamamoto, Hiroyuki Honma, Toru Chinen, Masayuki Nishiguchi | USAC-MBR
The contribution reports the results of a listening test.
Systems under test were:
- System A: WD7 operating at 24 kbit/s and I/O sampling rate of 34.15 kHz
- System B: WD7 operating at 24 kbit/s and I/O sampling rate of 44.1 kHz
- System C: WD7 + 768 CE operating at 24 kbit/s and I/O sampling rate of 44.1 kHz
Listening test results are summarized under contribution m19330.
Kei Kikuiri, NTT DOCOMO, presented
m19257 | NTT DOCOMO Cross-Check Report on Enhanced Performance at Mid Bitrates in USAC | Kei Kikuiri | USAC-MBR
Listening test results are summarized under contribution m19330.
The contribution also reports that the CE decoder executable was verified to decode the CE bitstreams to the CE waveforms to within +/- 1 lsb.
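A cross-check of this kind amounts to comparing two decoded waveforms sample by sample. The following is a minimal sketch, assuming both waveforms are available as equal-length sequences of integer PCM samples; the helper names `max_abs_lsb_diff` and `conforms` are hypothetical and do not come from the contribution or from any MPEG conformance tool.

```python
def max_abs_lsb_diff(decoded, reference):
    """Largest absolute sample difference, in LSBs, between two PCM waveforms."""
    if len(decoded) != len(reference):
        raise ValueError("waveforms differ in length")
    return max((abs(d - r) for d, r in zip(decoded, reference)), default=0)


def conforms(decoded, reference, tolerance_lsb=1):
    """True if every sample is within +/- tolerance_lsb of the reference."""
    return max_abs_lsb_diff(decoded, reference) <= tolerance_lsb


# Hypothetical 16-bit samples: each differs from the reference by at most 1 LSB.
print(conforms([0, 100, -32768], [1, 99, -32767]))
```

Setting `tolerance_lsb=0` turns the same comparison into a bit-exactness check.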
Markus Multrus, FhG, presented
m19330 | Finalization of USAC CE on Enhanced Performance at Mid-Bitrates | Markus Multrus, Philippe Gournay, Nikolaus Rettelbach, Bruno Bessette, Bernhard Grill | USAC-MBR
The contribution reviewed the motivation for the CE. It notes that the optimal sampling rate for the ACELP tool is lower than that for the FD (i.e. AAC) tools. The result is that USAC runs at 34/17 kHz sampling rate. This requires an additional step of sampling rate conversion to widely supported 44.1 or 48 kHz values.
The CE proposes the following:
- Reduce the core-coder frame length by a factor of 3/4, to 768 samples
- Reduce the ACELP frame length to 192 samples
- Upsample SBR by a factor of 8/3, with a 24-band analysis filterbank and a 64-band synthesis filterbank
This is shown in the following figure:
[Figure not reproduced: block diagram of the proposed signal path, with stages labeled by sampling rate, either f = 3/8 fout or f = fout.]
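The ratios in the proposal are consistent with one another, as the short calculation below illustrates. It is only a sketch built from the numbers quoted above (24 and 64 QMF bands, 768-sample frames, the 3/4 reduction factor); the constant and function names are hypothetical, and the previous frame length is inferred from the stated factor rather than quoted from the contribution.

```python
from fractions import Fraction

SBR_ANALYSIS_BANDS = 24    # QMF analysis filterbank size from the proposal
SBR_SYNTHESIS_BANDS = 64   # QMF synthesis filterbank size from the proposal
CORE_FRAME_LENGTH = 768    # proposed core-coder frame length in samples

# Ratio of synthesis to analysis bands gives the SBR upsampling factor (8/3).
upsampling = Fraction(SBR_SYNTHESIS_BANDS, SBR_ANALYSIS_BANDS)

# The core coder therefore runs at 3/8 of the output sampling rate.
core_rate_ratio = 1 / upsampling

# Inferred from "reduce by a factor of 3/4, to 768": the earlier frame length.
previous_frame_length = CORE_FRAME_LENGTH / Fraction(3, 4)


def core_sampling_rate(f_out_hz):
    """Core-coder sampling rate for a given output sampling rate."""
    return core_rate_ratio * f_out_hz


def output_samples_per_frame():
    """Output samples produced per core-coder frame after SBR upsampling."""
    return CORE_FRAME_LENGTH * upsampling


print(upsampling, core_rate_ratio, previous_frame_length)
print(float(core_sampling_rate(44100)), output_samples_per_frame())
```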
The computational complexity of the proposed new operating point is approximately 10% less than RM9 at 44.1 kHz output sampling rate, and approximately 10% greater than RM9 at 34 kHz output sampling rate.
The CE requires 900 additional 32-bit words of ROM tables for the 768 block length window function. This is approximately a 3.5% increase with respect to the entire decoder ROM.
The results of the FhG listening test were presented. Analysis of differential MUSHRA scores shows that speech signals suffer quite a bit in quality when the coder output sampling is raised from 34 to 44.1 kHz. Conversely, the music items have a slightly increased quality when the coder output sampling is raised from 34 to 44.1 kHz. The new operating mode greatly increases the quality of the speech items without reducing the quality of the music items. In summary, the CE technology provides 5 items better and mean score better.
Finally, the results of pooling data for all listening test sites were presented. In differential analysis of the MUSHRA scores, 6 items are better and the mean score is better. The pooled data also exhibits the same effect for speech and music signals as in the FhG test result. When investigating consistency across test sites, 4 items are graded better for 2 of 3 test sites and also in the mean for the pooled data.
The contribution concluded with the following summary:
- Temporal resolution is increased for FD mode while not impairing ACELP performance.
- Output sampling rate is maintained at industry standard values (e.g. 32, 44.1 or 48 kHz).
It was the consensus of the AhG to recommend to the Audio Subgroup to incorporate this CE technology into the USAC DIS. As always, CE acceptance is contingent on completing the CE by
- Integrating into the decoder source code to implement the tool.
- Providing sufficient support, as educational text and/or exemplary source code, so as to enable others to implement the tool in the MPEG Reference Encoder. The “support” component must be accepted by the consensus of the Audio Subgroup. The proponent expects to provide source code for the MPEG Reference Encoder such that the LP mode runs at the new framing and the encoder produces a conformant bitstream that causes the decoder to run at the new framing.
Kei Kikuiri, NTT DOCOMO, presented
m19255 | NTT DOCOMO Listening Test Report on QMF Based Harmonic Transposer CE in USAC | Kei Kikuiri | USAC-HT-QMF
The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores, 1 item was worse at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores, 3 items are better and mean is better at 95% level of significance.
Werner Oomen, Philips, presented
m19271 | Philips Listening test results on USAC CE on QMF Based Harmonic Transposer | Werner Oomen, Jeroen Koppens, Erik Schuijers | USAC-HT-QMF
The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores, there was no difference at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores, there was no difference at 95% level of significance.
Toru Chinen, Sony, presented
m19254 | Sony listening test report on QMF based harmonic transposer CE | Mitsuyuki Hatanaka, Hiroyuki Honma, Toru Chinen, Masayuki Nishiguchi | USAC-HT-QMF
The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores, 1 item was worse at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores, 1 item is worse but mean is better at 95% level of significance.
Haishan Zhong, Panasonic, presented
m19301 | Finalization of CE on QMF based harmonic transposer | Haishan Zhong, Kok Seng Chong, Dan Zhao, Takeshi Norimatsu, Tomokazu Ishikawa, Lars Villemoes, Per Ekstrand, Kristofer Kjörling, Max Neuendorf, Stephan Wilde, Sascha Disch, Frederik Nagel | USAC-HT-QMF
The contribution reviewed the goal of the CE technology, which is to provide a low-complexity harmonic transposer operating in the QMF domain. It also summarized the history of this CE over the last five MPEG meetings.
The contribution summarized the set of listening test data in the following charts:
The QMF transposer reduces the complexity of the entire decoder by approximately 35% for the 8 kb/s and 12 kb/s operating points.
Max Neuendorf, FhG, noted that the FFT transposer increases the quality of signals such as sio1 (harpsichord). Kristofer Kjörling, Dolby, noted that the relative complexity of the FFT transposer decreases as the operating point bitrate increases.
Kristofer Kjörling, Dolby, proposed to carry out further complexity analysis and, in addition, to host a “harpsichord” listening test session.
It was decided to continue discussion on this topic Tuesday after lunch.
Jeff Huang, Qualcomm, presented
The contribution reports on a listening test. At 12 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 1 item is better for Sys1 and mean is better for Sys2, at 95% level of significance.
The systems under test are:
- Sys1: RM8 + SFC + TFPP
- Sys2: RM8 + iSBR + TFPP
- Sys3: RM8 + iSBR
It also reports that Qualcomm verified that the CE decoder was able to decode the CE bitstreams and produce the CE waveforms in a bit-exact manner.
Takehiro Moriya, NTT, presented
m19222 | NTT listening test of CE on adaptive T/F domain post-processing in USAC | Takehiro Moriya, Noboru Harada, Yutaka Kamamoto | USAC-ATFPP
The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 1 item is better for Sys2, at the 95% level of significance.
Heiko Purnhagen, Dolby, presented
m19265 | Dolby listening test results for CE on T/F post-processing in USAC | Heiko Purnhagen, Kristofer Kjörling | USAC-ATFPP
The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), there are no differences at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), there are no differences at 95% level of significance.
David Virette, Huawei, presented
m19303 | Finalization of CE on adaptive T/F domain post-processing for USAC | David Virette, Wei Xiao, Qing Zhang | USAC-ATFPP
The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 6 items are better and mean better for Sys2 at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 4 items are better and mean better for Sys2 at 95% level of significance.
When all listening data is pooled, at 8 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 3 items are better and mean better for Sys2 at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 3 items are better and mean better for Sys2 at 95% level of significance.
The presenter noted that generally the improvement occurs for speech items and that these quality improvements correlate well with tool activation.
The complexity of the TFPP tool averages 0.18 WMOPS at 8 kb/s and 0.24 WMOPS at 12 kb/s. At 12 kb/s mono the peak complexity of TFPP is 8.2%.
The contribution proposed to adopt the Sys2 technology into the USAC DIS.
Heiko Purnhagen, Dolby, noted that cross-check sites did not agree with the proponent listening test results. A closer look at the data showed that this is due to large confidence intervals, which further suggests that there is a lack of agreement of what is “better” amongst the cross-check listeners.
When pooling over only the cross-check sites, there is one item (lion) where Sys2 is better than Sys3 at 8 kb/s, but there is no significant difference for the mean. At 12 kb/s, the mean over all items is better for Sys2 than Sys3, but there are no significant differences for any individual items.
It was decided to continue discussion on this topic Tuesday after lunch.
The Chair presented the draft AhG report to the group. This was reviewed and it was agreed that it was ready for upload.