Organisation internationale de normalisation


Element	RM8	CE	Reduction
Decorrelator	3.8	1.7	54%
Decoder	13.2	11.2	15.2%

Hence the CE reduces the total complexity (WMOPS) by more than 15% without any new tool or any change in bitstream syntax elements.
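As a quick check of the reduction column, the relative savings can be recomputed from the printed figures. This is a minimal sketch that assumes the RM8 and CE columns are both in WMOPS and uses only the rounded values from the table; the helper name `relative_reduction` is illustrative. Recomputing from the rounded decorrelator figures gives roughly 55% rather than the tabulated 54%, presumably because the table was derived from unrounded measurements.

```python
def relative_reduction(before, after):
    """Relative reduction, in percent, when going from `before` to `after`."""
    return 100.0 * (before - after) / before


# WMOPS figures as printed (rounded) in the table above.
rm8 = {"Decorrelator": 3.8, "Decoder": 13.2}
ce = {"Decorrelator": 1.7, "Decoder": 11.2}

for element in rm8:
    pct = relative_reduction(rm8[element], ce[element])
    print(f"{element}: {pct:.1f}% reduction")
```

The decoder row supports the statement that total complexity drops by more than 15%.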

The contribution reports listening test results pooled over all data (25 subjects). At both operating points, 16 kb/s stereo and 32 kb/s stereo, differential MOS score analysis showed no differences at the 95% level of significance.

Samsung volunteered to cross-check the complexity figures and will report Monday afternoon.

Kristofer Kjörling, Dolby, expressed a concern that the CE proposes to remove the fractional delay capability from the proposed decorrelator, a capability that may conceivably bring a quality increase for some signals. Heiko Purnhagen noted that in this case there could be one bitstream and two possible decoding modes, one with a fractional delay capability and one without.

It was decided to continue discussion on this topic Tuesday after lunch.

Eunmi Oh, Samsung, presented

m19233

Crosscheck report on improved applause coding

Miyoung Kim, Eunmi Oh

USAC-APPL

The contribution reports on listening tests at two bit rates. At 16 kb/s stereo analysis of differential MUSHRA scores shows 1 item better and mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores shows 3 items better and mean better, at the 95% level of significance.

Kei Kikuiri, NTT DOCOMO, presented

m19258

NTT DOCOMO Listening Test Report on improved applause coding CE in USAC

Kei Kikuiri

USAC-APPL

The contribution reports on listening tests at two bit rates. At 16 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 3 items better and mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 3 items better and mean better, at the 95% level of significance.

David Virette, Huawei, presented

m19264

Report on cross-check listening test for the CE on Improved Applause Coding for USAC

David Virette

USAC-APPL

The contribution reports on listening tests at two bit rates. At 16 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows the mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 3 items better and mean better, at the 95% level of significance.

Julien Robilliard, FhG, presented

m19311

Finalization of CE proposal on improved applause coding in USAC

Sascha Disch, Achim Kuntz, Tom Bäckström, Erik Schuijers, Werner Oomen

USAC-APPL

The contribution reviews the shortcomings of RM8 for applause coding:

Narrowed sound stage
Lack of envelopment

The CE proposes a new tool, the Transient Steering Decorrelator (TSD). It consists of a block to separate transient portions from the rest of the signal, a decorrelator block specifically for transients, and a block to recombine the transient signal with the rest to recreate the whole signal. A flag for enabling the TSD tool and the side information for the TSD are new bitstream syntax elements. The presenter noted that whenever the TSD tool is enabled, the additional bits for TSD are more than offset by a reduction in MPEG Surround parametric side information (since these need only be MPS broadband cues). This reflects an adaptive trade-off between temporal and spectral resolution.

The contribution reports on FhG listening tests at two bit rates. The presenter noted that SysA is the CE system. At 16 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 4 items better and mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 4 items better and mean better, at the 95% level of significance. The presenter noted that some diff scores were as large as 10 MUSHRA points.

When FhG and cross-check data are pooled, at 16 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 4 items better and mean better, at the 95% level of significance. At 32 kb/s stereo analysis of differential MUSHRA scores (sysA – sysB) for applause items shows 4 items better and mean better, at the 95% level of significance.
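For readers unfamiliar with the phrasing, "better at the 95% level of significance" in a differential analysis can be read as: the 95% confidence interval of the per-listener score difference (sysA – sysB) lies entirely above zero. The following is a minimal sketch of that idea, not the analysis actually used by the test sites. The scores are hypothetical, the function names are illustrative, pooling is modelled simply as concatenating the listeners from each site, and a normal approximation is used where a Student-t quantile would be more appropriate for small panels.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def differential_ci(scores_a, scores_b, confidence=0.95):
    """Return (mean_diff, lower, upper) for paired per-listener scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    m = mean(diffs)
    # Normal approximation (assumption); use a t quantile for small panels.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(diffs) / sqrt(len(diffs))
    return m, m - half_width, m + half_width


def verdict(lower, upper):
    """Classify one item from the confidence interval of its difference."""
    if lower > 0:
        return "better"
    if upper < 0:
        return "worse"
    return "no significant difference"


# Hypothetical MUSHRA scores for one item at two test sites.
site1_a, site1_b = [72, 68, 80, 75], [65, 66, 71, 70]
site2_a, site2_b = [70, 77, 74, 69], [68, 70, 69, 66]

# Pooling: concatenate listeners across sites before the analysis.
pooled_a, pooled_b = site1_a + site2_a, site1_b + site2_b
m, lo, hi = differential_ci(pooled_a, pooled_b)
print(f"mean diff {m:.2f}, 95% CI [{lo:.2f}, {hi:.2f}] -> {verdict(lo, hi)}")
```

Wide confidence intervals, for example from listeners who disagree about which system is preferable, push the interval across zero and yield "no significant difference" even when the mean difference is positive.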

The computational complexity is comprised of two components: transient slot position decoding and transient decorrelator. Together the two operations increase the USAC decoder WMOPS by less than 1%.

It was the consensus of the AhG to recommend to the Audio Subgroup to incorporate this CE technology into the USAC DIS. As always, CE acceptance is contingent on completing the CE by:

Integrating into the decoder source code to implement the tool.
Providing sufficient support, as educational text and/or exemplary source code, so as to enable others to implement the tool in the MPEG Reference Encoder. The “support” component must be accepted by the consensus of the Audio Subgroup.

Toru Chinen, Sony, presented

m19253

Sony listening test report on Enhanced Performance at Mid Bitrates CE

Yuki Yamamoto, Hiroyuki Honma, Toru Chinen, Masayuki Nishiguchi

USAC-MBR

The contribution reports the results of a listening test.

Systems under test were:

System A : WD7 operating at 24 kbit/s and I/O sampling rate of 34.15 kHz
System B : WD7 operating at 24 kbit/s and I/O sampling rate of 44.1 kHz
System C : WD7 + 768 CE operating at 24 kbit/s and I/O sampling rate of 44.1 kHz

Listening test results are summarized under contribution m19330.

Kei Kikuiri, NTT DOCOMO, presented

m19257

NTT DOCOMO Cross-Check Report on Enhanced Performance at Mid Bitrates in USAC

Kei Kikuiri

USAC-MBR

Listening test results are summarized under contribution m19330.

The contribution also reports that the CE decoder executable was verified to decode the CE bitstreams to the CE waveforms to within +/- 1 lsb.
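A cross-check of this kind amounts to comparing two decoded waveforms sample by sample. The sketch below illustrates the idea under stated assumptions: the waveforms are available as lists of integer PCM samples, and the function names and sample values are hypothetical. It is not the tool used by the cross-checker.

```python
def max_abs_difference(reference, decoded):
    """Largest absolute sample difference between two integer PCM waveforms."""
    if len(reference) != len(decoded):
        raise ValueError("waveforms differ in length")
    return max((abs(r - d) for r, d in zip(reference, decoded)), default=0)


def matches_within(reference, decoded, tolerance_lsb=1):
    """True if every sample agrees to within `tolerance_lsb` LSBs."""
    return max_abs_difference(reference, decoded) <= tolerance_lsb


# Hypothetical 16-bit PCM samples.
reference = [0, 120, -340, 32767, -32768]
decoded = [0, 121, -340, 32766, -32768]

print(matches_within(reference, decoded))                   # +/- 1 LSB check
print(matches_within(reference, decoded, tolerance_lsb=0))  # bit-exact check
```

Setting `tolerance_lsb=0` gives the stricter bit-exact criterion.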

Markus Multrus, FhG, presented

m19330

Finalization of USAC CE on Enhanced Performance at Mid-Bitrates

Markus Multrus, Philippe Gournay, Nikolaus Rettelbach, Bruno Bessette, Bernhard Grill

USAC-MBR

The contribution reviewed the motivation for the CE. It noted that the optimal sampling rate for the ACELP tool is lower than that for the FD (i.e. AAC) tools. As a result, USAC runs at a 34/17 kHz sampling rate, which requires an additional sampling rate conversion step to reach the widely supported 44.1 or 48 kHz values.

The CE proposes the following:

Reduce core-coder frame length by a factor of ¾, to 768 samples
Reduce ACELP frame length to 192 samples
Upsample SBR by factor of 8/3, with 24-band analysis filterbank and 64-band synthesis filterbank
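The arithmetic behind the three items above can be sketched as follows. The baseline frame length of 1024 samples is an assumption, chosen because 768 is ¾ of 1024; the variable and function names are illustrative only.

```python
from fractions import Fraction

BASELINE_FRAME = 1024  # assumed baseline core-coder frame length, in samples
FRAME_SCALE = Fraction(3, 4)
SBR_ANALYSIS_BANDS = 24
SBR_SYNTHESIS_BANDS = 64

# New core-coder frame length: 3/4 of the assumed baseline.
new_frame = BASELINE_FRAME * FRAME_SCALE

# SBR upsampling factor is the ratio of synthesis to analysis bands.
sbr_upsampling = Fraction(SBR_SYNTHESIS_BANDS, SBR_ANALYSIS_BANDS)

# The core coder therefore runs at 1 / (8/3) = 3/8 of the output rate.
core_rate_ratio = 1 / sbr_upsampling


def core_sampling_rate(f_out_hz):
    """Core-coder sampling rate in Hz for a given output sampling rate."""
    return float(f_out_hz * core_rate_ratio)


print(int(new_frame))             # 768
print(sbr_upsampling)             # 8/3
print(core_rate_ratio)            # 3/8
print(core_sampling_rate(44100))  # 16537.5
print(core_sampling_rate(48000))  # 18000.0
```

Note also that the proposed ACELP frame length of 192 samples is one quarter of 768.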

This is shown in the following figure:

[Figure not reproduced: block diagram with signal paths labelled f = 3/8 f_out and f = f_out.]

The computational complexity of the proposed new operating point is approximately 10% less than RM9 at 44.1 kHz output sampling rate, and approximately 10% greater than RM9 at 34 kHz output sampling rate.

The CE requires 900 additional 32-bit words of ROM tables for the 768 block length window function. This is approximately a 3.5% increase with respect to the entire decoder ROM.
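The two figures in this statement imply a rough size for the whole decoder ROM. The sketch below is only a back-of-the-envelope estimate, since 3.5% is stated as approximate. The variable names are illustrative, and the result is implied rather than reported by the contribution.

```python
ADDED_WORDS = 900          # additional 32-bit ROM words for the 768 window
RELATIVE_INCREASE = 0.035  # "approximately 3.5%" of the entire decoder ROM
BYTES_PER_WORD = 4         # 32-bit words

added_bytes = ADDED_WORDS * BYTES_PER_WORD
implied_total_words = ADDED_WORDS / RELATIVE_INCREASE
implied_total_bytes = implied_total_words * BYTES_PER_WORD

print(f"added ROM: {added_bytes} bytes")
print(f"implied decoder ROM: about {implied_total_words:.0f} words "
      f"({implied_total_bytes / 1024:.0f} KiB)")
```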

The results of the FhG listening test were presented. Analysis of differential MUSHRA scores shows that speech signals suffer considerably in quality when the coder output sampling rate is raised from 34 to 44.1 kHz. Conversely, the music items have a slightly increased quality when the coder output sampling rate is raised from 34 to 44.1 kHz. The new operating mode greatly increases the quality of the speech items without reducing the quality of the music items. In summary, the CE technology provides 5 items better and mean score better.

Finally, the results of pooling data for all listening test sites were presented. In differential analysis of the MUSHRA scores, 6 items are better and the mean score is better. The pooled data also exhibits the same effect for speech and music signals as in the FhG test result. When investigating consistency across test sites, 4 items are graded better for 2 of 3 test sites and also in the main for the pooled data.

The benefits of the CE were summarized as:

Temporal resolution is increased for FD mode while not impairing ACELP performance.
Output sampling rate is maintained at industry standard values (e.g. 32, 44.1 or 48 kHz).

It was the consensus of the AhG to recommend to the Audio Subgroup to incorporate this CE technology into the USAC DIS. As always, CE acceptance is contingent on completing the CE by:

Integrating into the decoder source code to implement the tool.
Providing sufficient support, as educational text and/or exemplary source code, so as to enable others to implement the tool in the MPEG Reference Encoder. The “support” component must be accepted by the consensus of the Audio Subgroup. The proponent expects to provide source code for the MPEG Reference Encoder such that the LP mode runs at the new framing and the encoder produces a conformant bitstream that causes the decoder to run at the new framing.

Kei Kikuiri, NTT DOCOMO, presented

m19255

NTT DOCOMO Listening Test Report on QMF Based Harmonic Transposer CE in USAC

Kei Kikuiri

USAC-HT-QMF

The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores, 1 item was worse at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores, 3 items are better and mean is better at 95% level of significance.

Werner Oomen, Philips, presented

m19271

Philips Listening test results on USAC CE on QMF Based Harmonic Transposer

Werner Oomen, Jeroen Koppens, Erik Schuijers

USAC-HT-QMF

The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores, there was no difference at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores, there was no difference at 95% level of significance.

Toru Chinen, Sony, presented

m19254

Sony listening test report on QMF based harmonic transposer CE

Mitsuyuki Hatanaka, Hiroyuki Honma, Toru Chinen, Masayuki Nishiguchi

USAC-HT-QMF

The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores, 1 item was worse at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores, 1 item is worse but mean is better at 95% level of significance.

Haishan Zhong, Panasonic, presented

m19301

Finalization of CE on QMF based harmonic transposer

Haishan Zhong, Kok Seng Chong, Dan Zhao, Takeshi Norimatsu, Tomokazu Ishikawa, Lars Villemoes, Per Ekstrand, Kristofer Kjörling, Max Neuendorf, Stephan Wilde, Sascha Disch, Frederik Nagel

USAC-HT-QMF

The contribution reviewed the goal of the CE technology, which is to provide a low-complexity harmonic transposer operating in the QMF domain. It also summarized the history of this CE over the last five MPEG meetings.

The contribution summarized the set of listening test data in the following charts:

QMF transposer reduces the complexity of the entire decoder by approximately 35% for the 8 kb/s and 12 kb/s operating points.

Max Neuendorf, FhG, noted that the FFT transposer increases the quality of signals such as sio1 (harpsichord). Kristofer Kjörling, Dolby, noted that the relative complexity of the FFT transposer decreases as operating point bitrate increases.

Kristofer Kjörling, Dolby, proposed to conduct further complexity analysis and, further, to host a “harpsichord” listening test session.

It was decided to continue discussion on this topic Tuesday after lunch.

Jeff Huang, Qualcomm, presented

m19321

Crosscheck listening test report for USAC TFP CE

Jeff Huang

USAC-ATFPP

The contribution reports on a listening test. At 12 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 1 item is better for Sys1 and mean is better for Sys2, at the 95% level of significance.

The systems under test are:

Sys1 RM8 + SFC + TFPP
Sys2 RM8 + iSBR + TFPP
Sys3 RM8 + iSBR

It also reports that Qualcomm verified that the CE decoder was able to decode the CE bitstreams and produce the CE waveforms in a bit-exact manner.

Takehiro Moriya, NTT, presented

m19222

NTT listening test of CE on adaptive T/F domain post-processing in USAC

Takehiro Moriya, Noboru Harada, Yutaka Kamamoto

USAC-ATFPP

The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 1 item is better for Sys2, at the 95% level of significance.

Heiko Purnhagen, Dolby, presented

m19265

Dolby listening test results for CE on T/F post-processing in USAC

Heiko Purnhagen, Kristofer Kjörling

USAC-ATFPP

The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), there are no differences at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), there are no differences at 95% level of significance.

David Virette, Huawei, presented

m19303

Finalization of CE on adaptive T/F domain post-processing for USAC

David Virette, Wei Xiao, Qing Zhang

USAC-ATFPP

The contribution reports on a listening test. At 8 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 6 items are better and mean better for Sys2 at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 4 items are better and mean better for Sys2 at 95% level of significance.

When all listening data is pooled, at 8 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 3 items are better and mean better for Sys2 at 95% level of significance. At 12 kb/s and differential analysis of MUSHRA scores (Sys[12] – Sys3), 3 items are better and mean better for Sys2 at 95% level of significance.

The presenter noted that generally the improvement occurs for speech items and that these quality improvements correlate well with tool activation.

The complexity of the TFPP tool averages 0.18 WMOPS at 8 kb/s and 0.24 WMOPS at 12 kb/s. At 12 kb/s mono the peak complexity of TFPP is 8.2%.

The contribution proposed to adopt the Sys2 technology into the USAC DIS.

Heiko Purnhagen, Dolby, noted that the cross-check sites did not agree with the proponent listening test results. A closer look at the data showed that this is due to large confidence intervals, which further suggests that there is a lack of agreement on what is “better” amongst the cross-check listeners.

When pooling over only the cross-check sites, there is one item (lion) where Sys2 is better than Sys3 at 8 kb/s, but there is no significant difference for the mean. At 12 kb/s, the mean over all items is better for Sys2 than Sys3, but there are no significant differences for any individual items.

It was decided to continue discussion on this topic Tuesday after lunch.

The Chair presented the draft AhG report to the group. This was reviewed and it was agreed that it was ready for upload.
