International organisation for standardisation


MPEG-D Unified Speech and Audio Coding




4.2.3 MPEG-D Unified Speech and Audio Coding


Continued Discussion of Window Transitions

Mohamad Raad, RaadTech, presented an analysis of the differential scores for each of the test sites. He presented a summary table showing which items were “better” at which test sites for the 12 kb/s and 20 kb/s mono operating points. Overall, there was good agreement between test sites in terms of which individual items were better at the 95% level of significance. This is shown in the following two tables, where green highlighting shows items for which there is consistent agreement of improvement across test sites, B indicates Better and N indicates Not different.



12 kb/s mono:

Item               VoiceAge  FhG  Dolby  LG  ETRI  Samsung
Arirang_speech     B         N    N      N   N     N
es01               N         N    N      N   N     N
HarryPotter        B         N    B      N   N     N
Louis_Raquin       N         B    N      B   N     N
Music_1            B         N    N      B   N     N
Music_3            N         N    N      N   N     N
Salvation          N         N    N      N   N     N
SpeechOverMusic_1  N         N    N      N   N     N
SpeechOverMusic_4  B         B    N      N   N     N
te1_mg54_speech    N         N    N      N   N     N
te15               N         N    N      N   B     N
twinkle_ff51       B         N    B      N   N     N
Wedding_speech     B         B    B      B   B     N
lion               B         N    N      N   N     N
phi7               B         B    B      B   N     B

20 kb/s mono:

Item               VoiceAge  FhG  Dolby  ETRI
Arirang_speech     B         N    N      N
es01               N         N    N      N
HarryPotter        B         B    B      N
Louis_Raquin       B         N    B      N
Music_1            B         N    N      B
Music_3            N         N    N      N
Salvation          N         N    N      N
SpeechOverMusic_1  N         N    N      N
SpeechOverMusic_4  B         N    N      N
te1_mg54_speech    N         B    B      B
te15               N         N    N      N
twinkle_ff51       B         N    N      N
Wedding_speech     B         N    N      N
lion               B         N    N      N
phi7               B         B    N      N
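As a reading aid, the per-site results in the two tables can be tallied per item. The sketch below transcribes the tables into Python strings (one letter per test site, in the column order given) and counts how many sites reported B. The names `RESULTS_12`, `RESULTS_20`, `count_better` and `items_with_agreement` are illustrative only, and the count is a rough proxy for cross-site agreement, not a reconstruction of the original green highlighting.

```python
# Per-item results transcribed from the two tables.
# Each string holds one letter per test site, in column order.
SITES_12 = ["VoiceAge", "FhG", "Dolby", "LG", "ETRI", "Samsung"]
SITES_20 = ["VoiceAge", "FhG", "Dolby", "ETRI"]

RESULTS_12 = {
    "Arirang_speech": "BNNNNN",
    "es01": "NNNNNN",
    "HarryPotter": "BNBNNN",
    "Louis_Raquin": "NBNBNN",
    "Music_1": "BNNBNN",
    "Music_3": "NNNNNN",
    "Salvation": "NNNNNN",
    "SpeechOverMusic_1": "NNNNNN",
    "SpeechOverMusic_4": "BBNNNN",
    "te1_mg54_speech": "NNNNNN",
    "te15": "NNNNBN",
    "twinkle_ff51": "BNBNNN",
    "Wedding_speech": "BBBBBN",
    "lion": "BNNNNN",
    "phi7": "BBBBNB",
}

RESULTS_20 = {
    "Arirang_speech": "BNNN",
    "es01": "NNNN",
    "HarryPotter": "BBBN",
    "Louis_Raquin": "BNBN",
    "Music_1": "BNNB",
    "Music_3": "NNNN",
    "Salvation": "NNNN",
    "SpeechOverMusic_1": "NNNN",
    "SpeechOverMusic_4": "BNNN",
    "te1_mg54_speech": "NBBB",
    "te15": "NNNN",
    "twinkle_ff51": "BNNN",
    "Wedding_speech": "BNNN",
    "lion": "BNNN",
    "phi7": "BBNN",
}


def count_better(results):
    """Return {item: number of sites that reported B (Better)}."""
    return {item: row.count("B") for item, row in results.items()}


def items_with_agreement(results, min_sites):
    """Items for which at least min_sites sites reported B (threshold is a free choice)."""
    counts = count_better(results)
    return [item for item, n in counts.items() if n >= min_sites]


for label, results in (("12 kb/s mono", RESULTS_12), ("20 kb/s mono", RESULTS_20)):
    print(label, items_with_agreement(results, 2))
```

The threshold of two sites in the final loop is arbitrary; any other value can be passed as `min_sites`.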

It was the consensus of the Audio subgroup that the CE (i.e. FDNS+FAC technology) does provide a significant improvement in subjective audio quality.

A second open issue from Sunday’s AhG discussion concerns interpolation in the FDNS. This was discussed in the task group, and the group reported back that this is not a problem. It was the consensus of the ASG that the FDNS interpolation is appropriate and is not an issue in adoption of the tool.

Issues of complexity were discussed Wednesday after MPEG plenary. For continuity in the Audio report, these discussions are noted here.



Window Transition CE discussion

Pierrick Philippe, Orange Labs, gave a verbal report on the complexity of the window transition CE technology: he has checked the information in contribution m17167 and confirms that the complexity information it contains is correct.

The Chair noted that all identified open issues relating to this CE have been resolved and asked if there was consensus to adopt the CE technology. It was the consensus of the Audio subgroup to incorporate the window transition technology into USAC WD6.

Oliver Wuebbolt, Thomson, presented



m17271

Spectral Noiseless Coding CE in USAC: Report on evaluation of a combined proposal from Fraunhofer and Thomson

Oliver Wuebbolt
Markus Multrus

The contribution notes that moving from 4-tuple coding to 1-tuple arithmetic coding entailed a significant increase in computational complexity. Since the Thomson CE proposal was based on 1-tuple coding, it was not investigated further as a component in USAC arithmetic coding of spectral coefficients.

Oliver Wuebbolt, Thomson, presented



m17249

Spectral Noiseless Coding CE: Crosscheck report

Oliver Wuebbolt

This presented a cross-check on the performance of the modified CE proposal found in the next contribution (m17270). The presenter noted that the Thomson performance results (in terms of increase in compression efficiency) are identical to the data found in m17270.

Markus Multrus, FhG, presented



m17270

Updated proposal of the CE on the Spectral Noiseless Coding in USAC

Guillaume Fuchs
JungHoe Kim
Markus Multrus
Eunmi Oh
Nikolaus Rettelbach
Vignesh Subbaraman

The contribution gives an overview of how the proposed technology has changed relative to the proposal of the last meeting. The main contributions of the proposed technology are:

Spectral coding tool



  • Lower ROM table usage as compared to WD5 (16900 words reduced to 1500 words)

  • Lower RAM usage as compared to WD5 (666 words reduced to 72 words per channel)

Overall

  • Increase in compression efficiency (on average an increase of 1.87% with respect to the entire bitstream length)

  • Lower ROM table size for entire USAC decoder (37000 words reduced to 21500 words, or a reduction of more than 40 %)

For the first bullet item, it notes that the spectral noiseless coding table size represents 45% of the total table size in WD5. This is primarily due to the coding of 4-tuples (8^4 + ESC = 4097 alphabet size). Since typical DSP platforms in consumer devices have a cache size of 8 kBytes to 32 kBytes, a total table size of less than 2000 words is highly desirable. One could reduce the alphabet size by coding 1-tuples, although this increases the total number of symbols that entail arithmetic decoding. As a compromise, 2-tuple coding was used for the alphabet and the context. In addition, sign and magnitude are coded separately so that the tokens are magnitude only. The alphabet is 2x[0:3] + ESC = 17. In addition, a new tool is used for coding the LSBs of the token (the “enhanced” version). The new proposal has a total ROM table usage that is comparable to the Huffman table size found in AAC.
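The alphabet sizes and table reductions quoted above can be checked with a few lines of arithmetic. The sketch below assumes that "2x[0:3]" means two elements per tuple, each with a magnitude in the range 0 to 3, and that a single escape symbol is added in both cases; the function names are illustrative.

```python
def alphabet_size(tuple_len, values_per_element, escape_symbols=1):
    """Symbols needed to code tuple_len elements jointly, each element
    taking one of values_per_element values, plus escape symbol(s)."""
    return values_per_element ** tuple_len + escape_symbols


def reduction_percent(before, after):
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before


wd5_alphabet = alphabet_size(4, 8)        # 4-tuples: 8^4 + ESC
proposed_alphabet = alphabet_size(2, 4)   # 2-tuples of magnitudes 0..3 + ESC (assumed reading)

# ROM figures in words, as quoted in the contribution summary.
spectral_rom_reduction = reduction_percent(16900, 1500)
decoder_rom_reduction = reduction_percent(37000, 21500)
spectral_share_of_wd5 = 100.0 * 16900 / 37000

print(wd5_alphabet, proposed_alphabet)
print(round(spectral_rom_reduction, 1))
print(round(decoder_rom_reduction, 1))
print(round(spectral_share_of_wd5, 1))
```

The last three figures correspond to the spectral coding ROM reduction, the "more than 40 %" decoder ROM reduction, and the roughly 45% share of the WD5 table size attributed to spectral noiseless coding.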

With respect to the computational complexity, the contribution showed that the complexity of the proposed arithmetic coding technology is comparable to that of the arithmetic coding tool in WD5.

The contribution presents performance information on the base and enhanced technology. The enhanced technology gives some small additional compression at the cost of almost no additional storage complexity or computational complexity.

David Virette, Huawei, noted that he is not convinced of the merit of the enhanced version. Eunmi Oh, Samsung, noted that the complexity of the enhancement is extremely low. She further noted that many CEs entail multiple tools and these are not “dissected.” The Chair suggested that discussion be continued after the contributions on the related CE from Huawei are presented.

Pierrick Philippe, Orange Labs, presented

m17257

Cross check report for Huawei proposal on Spectral Noiseless Coding for USAC

Pierrick Philippe
Gregory Pallone

The contribution reports compression information that is identical to that provided by Huawei in m17246.

David Virette, Huawei, presented



m17246

Proposed CE on Spectral Noiseless Coding for USAC

Wei Xiao
David Virette
Herve Taddei

The contribution proposes a technology that permits continuous adaptation of the tuple size used to code spectral information between 4-tuples and 1-tuples. Specifically, 1-, 2- or 4-tuples can be coded. The goal is to use a larger tuple size when there is greater correlation between tuple elements (i.e. longer for tonal spectra, shorter for noise-like spectra). The proposal delivers:

  • Increased compression efficiency: ranging from 0.88% at 64 kb/s stereo to 2.03% at 12 kb/s mono.

  • Lower ROM table size: reduced from 16900 words to 5334 words.

JungHoe Kim, Samsung, noted that at 64 kb/s the proposal does not satisfy bit buffer constraints when transcoding from WD3 bitstreams. Pierrick Philippe, Orange Labs, noted that adoption of the FDNS and FAC tools, which changes the TCX window sizes, may have an impact on the probability models used to construct the arithmetic coding tables.

There was considerable additional discussion. It was agreed to bring this CE up again Friday (see Section 5.1).

Hyunkook Lee, LG, presented

m17289

Progress report on the arithmetic coding CE for USAC

Sungyong Yoon
Hyunkook Lee

This contribution proposes a revision of this CE in which there are random access frames in which global gain is coded as 8-bit PCM; otherwise it is coded as a differential value. The SF quantizer uses several contexts, which are used adaptively. The technology is tested using WD2, which has a reset every 25 frames (approximately once per second). The performance can be summarized as:


  • Increase in coding efficiency for differential global gain is between 0.00% (e.g. most speech items) and 1.02% (salvation, 12 kb/s mono), depending on the bitrate and test item. The average improvement in compression performance is (approx.) 0.35%.

  • Table size for the revised proposal is 345.5 32-bit words.

  • RAM size required is approximately 26 32-bit words.
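The switch between absolute and differential global gain coding described above can be sketched as follows. This is a minimal illustration, not the proposed syntax: it only shows which frames carry an absolute 8-bit value and which carry a difference, and it omits the entropy coding and adaptive contexts of the proposal entirely. The token format and function names are hypothetical; the reset interval of 25 frames is taken from the test setup described above.

```python
RESET_INTERVAL = 25  # frames between random access points, as in the reported test setup


def encode_global_gains(gains, reset_interval=RESET_INTERVAL):
    """Map 8-bit global gain values to (kind, value) tokens.

    'pcm' tokens carry the absolute 8-bit value (random access frames);
    'diff' tokens carry the change relative to the previous frame.
    """
    tokens = []
    prev = None
    for i, gain in enumerate(gains):
        if not 0 <= gain <= 255:
            raise ValueError("global gain must fit in 8 bits")
        if i % reset_interval == 0:
            tokens.append(("pcm", gain))
        else:
            tokens.append(("diff", gain - prev))
        prev = gain
    return tokens


def decode_global_gains(tokens):
    """Invert encode_global_gains."""
    gains = []
    for kind, value in tokens:
        if kind == "pcm":
            gains.append(value)
        else:
            gains.append(gains[-1] + value)
    return gains


# Hypothetical gain track: slowly varying values over 60 frames.
example = [100 + (i % 7) - 3 for i in range(60)]
tokens = encode_global_gains(example)
assert decode_global_gains(tokens) == example
```

In a track like this the differential values cluster near zero, which is the property an entropy coder would exploit; how much is gained in practice depends on the item, as the figures above indicate.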

There was considerable discussion on the merit of this proposal. The Chair called for a show of hands as to who could NOT support adoption of this proposal. Many (more than 5) hands were raised. Based on this, the Chair concluded that there is not consensus within the group to adopt the proposed technology at this time.

Markus Multrus, FhG, presented



m17361

Changes to WD5 for the CE on the Spectral Noiseless Coding in USAC

Markus Multrus
JungHoe Kim

The contribution shows the new bitstream syntax and associated decoding semantics that are needed to support the FhG/Samsung Spectral Noiseless Coding CE (m17270).

Since this is a late contribution (Wednesday of MPEG week before 2400 GMT), it was agreed to bring this CE up Friday morning so that interested experts have time to study the document (see Section 5.1).

Kei Kikuiri, NTT DOCOMO, presented

m17251

NTT DOCOMO Listening Test Results for Harmonic Transposer CE in USAC

Kei Kikuiri

The contribution reports the results of a cross-check listening test on the Harmonic Transposer CE. When doing an absolute score analysis, it shows that the mean score for WD4+CE is not different from that of WD4 at the 95% level of significance. When doing differential analysis, it shows that


  • the mean score for WD4+CE is better than that of WD4bugfix for three items and also better when averaged over all items at the 95% level of significance.

David Virette, Huawei, presented

m17245

Huawei Listening Test Report on Harmonic Transposition

Zhengzhong Du
Wei Xiao
David Virette
Hervé Taddei

The contribution reports the results of a cross-check listening test on the Harmonic Transposer CE. When doing an absolute score analysis, it shows that the mean score for WD4+CE is not different from that of WD4 at the 95% level of significance.

When doing differential analysis, it shows that



  • the mean score for WD4+CE is better than that of WD4bugfix for one item but not different when averaged over all items at the 95% level of significance.

Kristofer Kjörling, Dolby, presented

m17166

WD text for USAC CE on Harmonic Transposer

Per Ekstrand
Lars Villemoes
Sascha Disch
Frederik Nagel
Stephan Wilde

The contribution presents syntax and semantics required for the CE technology.

Discussion

Kristofer Kjörling, Dolby, notes that the interesting comparison for discussion is WD4bugfix vs CE technology. He reviewed the Dolby performance test from the 90th meeting and then summarized the cross-check information from the previous contributions.

It was the consensus of the Audio subgroup to incorporate the harmonic transposer technology into USAC WD6.

Werner Oomen, Philips, presented



m17280

Further thoughts on USAC CE process

Werner Oomen
Kristofer Kjorling
Heiko Purnhagen
Bernhard Grill
Johannes Hilpert
Philippe Gournay

The contribution is a joint contribution from Dolby, FhG and VoiceAge, and presents possible guidelines for use by Audio in their CE work. It is a response to a contribution at the 90th meeting (m16870). Some of the points raised are that

  • Complete proposals with clear merit should be acted on at the current meeting. In terms of length of discussion, proposals with clear merit should have sufficient discussion to identify and resolve open issues. As a consequence, discussion time on other CE proposals may need to be limited.

  • A table that compares the various CEs in a uniform way can provide useful guidance with respect to what CEs have the greatest merit.

There was good discussion on this topic. The Chair presented USAC-CE.xls and asked proponents to review and correct the information as appropriate.

Hyunkook Lee, LG, presented



m17291

Core Experiment Proposal on the quantization of spectral coefficients of TCX

Kiho Cho
Sungyong Yoon
Hyunkook Lee
Nam Soo Kim

This is a new CE proposal and the contribution is a first report on the technology. It proposes an additional non-uniform quantizer for use in quantizing TCX, such that TCX can switch between the two quantizers. The encoder used in the experiment was the MPEG Reference Encoder with mode information taken from the WD5 reference bitstreams. The contribution reports SNR improvement from using the switched quantizer scheme.

The Chair noted that the strategy of taking the encoder mode information from the WD5 reference bitstreams is an excellent idea and the presenter confirmed this greatly improved the quality of the MPEG Reference Encoder.

The Audio subgroup looks forward to additional information at the next MPEG meeting.

Kei Kikuiri, NTT DOCOMO, presented



m17252

Additional Information on Enhanced Temporal Envelope Shaping CE for USAC

Kei Kikuiri
Atsushi Yamaguchi
Nobuhiko Naka

The contribution reports on progress of this CE. It reports on some new parameter values that drive the operation of the proposed technology. It reports significantly lower computational complexity and lower RAM requirements.

Five additional items with non-stationary statistics (e.g. percussive signals) were included in a subjective performance test. The test results show that



  • At 24 kb/s, 7 items were better and the mean score was better. Four of these items are from the original USAC test set.

  • At 16 kb/s, 7 items were better and the mean score was better. Two of these items are from the original USAC test set.

There was good discussion about the technology being proposed. The Chair suggested that a contribution to the next meeting that explains the latest proposed technology at a tutorial level would be most welcome. Kristofer Kjörling, Dolby, noted that USAC is at WD6, in which there are new tools that address coding of non-stationary signals. Hence, new subjective tests should be WD6 vs WD6+CE.

Interested experts will draft a workplan to coordinate cross-check of this proposal.

Werner Oomen, Philips, presented

m17279

Update on CE on parametric stereo coding in USAC

Erik Schuijers
Heiko Purnhagen
Pontus Carlsson
Werner Oomen

The contribution notes that WD3 had a bug that put the parametric stereo coding CE at a disadvantage. It gives new subjective test results of WD5 vs WD5+CE. The results show


  • Absolute score analysis shows no difference between systems under test

  • Differential score analysis shows that for C2 and C3 three items are better and C3 is better in the mean.

The presenter feels that there is still more investigation to do and expects to bring additional information to the next meeting.

Takehiro Moriya, NTT, presented



m17211

Crosscheck report of Huawei’s CE proposal for memory and complexity reduction of USAC LPD mode

Takehiro Moriya

The contribution is a cross-check on bit-exactness of the Huawei CE proposal and verifies complexity figures previously provided by Huawei. It confirms:

  • There is bit-exact reconstruction for all WD5 bitstreams and waveforms

  • Complexity information is correct: reduction of ROM size from 2.64 Kbytes to 1.5 Kbytes (reduction of 1.2 Kbytes); reduction of WMOPS is 25%, or 0.02% reduction with respect to overall decoder complexity.

The presenter feels that there is still more investigation to do and expects to bring additional information to the next meeting.

David Virette, Huawei, presented



m17241

Progress report on improved indexing of the AVQ-based LPC quantization for USAC

Wei Xiao
David Virette
Herve Taddei

The contribution gives WD syntax and semantics that are required to implement the CE technology. The primary advantage of the technology is to reduce the table size needed in the decoder.

Bruno Bessette, VoiceAge, noted that the AVQ tables are accessed 4 times per superframe, and hence that there is negligible benefit in having the AVQ table fit in processor cache.

Philippe Gournay, VoiceAge, made a short presentation on how the WD5 AVQ tables might be reduced in size. He gave an example of optimizing one table


  • Table in WD5 is 226 4-byte values for 904 bytes total.

However, this table can be partitioned into two smaller tables:

  • 37 unsigned int as 4-byte values

  • 226 unsigned int 1-byte values

Using this technique, total storage size is reduced from 904 to 374 bytes (reduction of 530 bytes).
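The storage figures can be reproduced under one plausible reading of the example, which is an assumption and not stated in the presentation: the 226 entries take only 37 distinct values, so the distinct values are stored once as 4-byte words and each entry becomes a 1-byte index into them. The sketch below uses a synthetic stand-in table, not the actual WD5 AVQ data, and the function names are illustrative.

```python
def split_table(table):
    """Split table into (unique_values, indices) such that
    table[i] == unique_values[indices[i]] for every i."""
    unique_values = sorted(set(table))
    if len(unique_values) > 256:
        raise ValueError("indices would not fit in one byte")
    position = {value: k for k, value in enumerate(unique_values)}
    indices = bytes(position[value] for value in table)
    return unique_values, indices


def storage_bytes(num_values, num_indices, value_bytes=4, index_bytes=1):
    """Total storage of the split representation, in bytes."""
    return num_values * value_bytes + num_indices * index_bytes


# Synthetic stand-in: 226 entries drawn from 37 distinct values.
table = [(i % 37) * 1000 for i in range(226)]
values, indices = split_table(table)

original_size = len(table) * 4                        # one 4-byte word per entry
compact_size = storage_bytes(len(values), len(indices))

print(original_size, compact_size, original_size - compact_size)
```

The cost of this layout is one extra indirection per lookup, which is minor if the tables are accessed only a few times per superframe.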

The Chair urged experts to consider the problem of how to evaluate this and similar proposals where increase in performance may be modest.

David Virette, Huawei, presented

m17242

Progress report on Improved Pulse Indexing for ACELP in USAC

David Virette
Dejun Zhang
Fuwei Ma
Hervé Taddei

The contribution gave performance information for a new database of speech items consisting of both clean speech and speech plus background noise. Increase in compression performance was given at 20 kb/s, 24 kb/s and 32 kb/s. It noted that for speech plus background noise the performance is lower because in this case only 50% of the frames are coded by ACELP. The contribution gives the required syntax and semantics changes to WD5 to support the CE technology.

The presenter noted that the new database consists of 37 files, of which approximately half are clean speech.

The Chair asked whether ACELP mode is always used for the new speech database. The presenter noted that the coding mode mix was approximately as follows:


  • Clean speech: 50% ACELP and 50% TCX

  • Speech + background noise: 25% ACELP and 75% TCX

Roch Lefebvre, VoiceAge, asked whether Huawei experts have an idea of how the saved bits might be used to provide coding gain.

The Chair asked about using the saved bits. The presenter stated that it is unlikely that the saved bits can be used by quantizing in a higher-rate ACELP codebook. Roch Lefebvre, VoiceAge, noted that this might likely result in only a small run of frames in the higher-rate coding mode. Philippe Gournay, VoiceAge, noted that experiments at VoiceAge suggest that ~1% bit savings is not sufficient to realize a measurable increase in quality for TCX frames.

Roch Lefebvre, VoiceAge, noted that this is the most optimal known solution for ACELP coding. Pierrick Philippe, Orange Labs, noted that the ACELP tool is mature in that no new CE has been proposed in a number of meetings, hence it may be time

The Audio Subgroup agreed to draft a workplan on next steps, especially on how this technology might result in an increase in subjective quality.

The Chair re-iterated the importance of preparing an overview of the compression CEs so as to treat all such CEs in a fair and transparent manner.

Zhong Haishan, Panasonic, presented



m17261

Status report of time warping CE

Zhong Haishan
Chong Kok Seng
Zhou Huan 
Takeshi Norimatsu
Tomokazu Ishikawa
Neo Sua Hong

The contribution requests that a workplan be drafted to facilitate experiments on time warping, with the goal of bringing new information to the next meeting.

Zhou Huan, Panasonic, presented



m17204

Updated Information on CE proposal of QMF Based Phase Vocoder in eSBR Tool

Zhou Huan
Chong Kok Seng
Zhong Haishan
Takeshi Norimatsu
Tomokazu Ishikawa
Neo Sua Hong

This CE proposes a low-complexity phase vocoder as a means to do QMF-domain time stretching for use in spectral replication in the eSBR tool. The contribution reports details of complexity, in which the current proposal has approximately 25% of the complexity of the WD5 FFT-based tool. MUSHRA subjective listening test results were presented. Differential analysis shows no change in audio quality for a tool that has 75% lower complexity.

Several parties expressed an interest in collaborating on this CE, and that effort will be coordinated with a workplan.

Takehiro Moriya, NTT, presented

m17212

Core Experiment proposal on lossless coding of pitch lag for ACELP in USAC

Takehiro Moriya
Yutaka Kamamoto
Noboru Harada

The contribution notes that ACELP was designed for mobile communications with an error-prone air interface channel, while USAC is most likely used for IP transport, in which case error robustness is not as important an issue. Hence it proposes to use variable-length codes for the ACELP pitch lag. The proposal assumes that every ACELP superframe is a random access point, and does entropy coding for subsequent pitch lags within the superframe.

At 12 kb/s mono, the CE technology is able to reduce bit rate by 2.2% for ACELP frames and 0.5% when averaged over all frames in the USAC test set. It requires 400 bytes of table and negligible RAM and processor cycles.

The presenter noted that there are additional ideas that could yield a further increase in compression efficiency.

Roch Lefebvre, VoiceAge, observed that he would expect the CE technology to provide the greatest bit savings during stationary voiced intervals. Bruno Bessette, VoiceAge, suggested that it might be more beneficial to optimize the ACELP coding engine in the context of USAC and the new window transition technology that was adopted at this meeting.

A workplan will be drafted to make progress on this CE.

Finally, the presenter proposed an objective Figure of Merit for evaluation of CEs:

FoM = aR – bM – cC

where


a, b and c are weighting factors

R is increase in compression efficiency

M is increase in memory usage

C is increase in computational complexity
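A minimal sketch of how such a Figure of Merit might be evaluated is given below. The presenter did not specify values for the weighting factors or the units of R, M and C, so everything numeric here is a hypothetical placeholder chosen for illustration; the `CEResult` class and `figure_of_merit` function are likewise illustrative names.

```python
from dataclasses import dataclass


@dataclass
class CEResult:
    name: str
    R: float  # increase in compression efficiency
    M: float  # increase in memory usage (negative if memory is saved)
    C: float  # increase in computational complexity (negative if reduced)


def figure_of_merit(ce, a, b, c):
    """FoM = a*R - b*M - c*C, with a, b and c as weighting factors."""
    return a * ce.R - b * ce.M - c * ce.C


# Hypothetical CEs and weights, not taken from any contribution.
candidates = [
    CEResult("CE-A", R=2.0, M=1.0, C=0.5),
    CEResult("CE-B", R=1.0, M=-2.0, C=0.0),
]
weights = dict(a=1.0, b=0.5, c=2.0)

ranked = sorted(candidates, key=lambda ce: figure_of_merit(ce, **weights), reverse=True)
for ce in ranked:
    print(ce.name, figure_of_merit(ce, **weights))
```

Because M and C enter with negative signs, a CE that reduces memory or complexity (negative M or C) raises its own score, so the ranking depends strongly on the choice of weights.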

Pierrick Philippe, Orange Labs, presented

m17153

Contribution to the MPEG USAC Reference Encoder

Pierrick Philippe

The contribution describes a new module that is contributed to the MPEG USAC Reference Encoder. Included in the contribution is source code and informative text.

A workplan on MPEG USAC Reference Encoder will coordinate integration of the module into the USAC SVN software repository.

Max Neuendorf, FhG, presented

m17253

Proposed Corrections to Reference Software and WD5 of USAC

Max Neuendorf

The contribution notes two bugs in the USAC decoder reference software:


  • Channel Pair Elements are not implemented in the decoder

  • A bug reported in MPEG Surround (m16985) should also be corrected in USAC

It was clarified that the USAC reference bitstreams never used channel pair elements.

It was the consensus of the Audio Subgroup to



  • Add full Channel Pair Elements capability to USAC decoder

  • Correct bug reported in MPEG Surround (m16985) in USAC decoder

It was agreed that there will be a workplan on USAC Reference Software to

  • Clarify if encoder can produce CPE and if not, incorporate that functionality.

  • Correct MPEG Surround bug in encoder and decoder

  • Benchmark performance of MPEG Reference Encoder

For the second bullet item, bitstreams and decoded waveforms change, but it was the consensus of the ASG that no cross-check listening test is necessary.

JungHoe Kim, Samsung, presented



m17230

Comments on reference software of MPEG Surround and reference software using MPEG Surround tool

JungHoe Kim
Miyoung Kim
Eunmi Oh

The contribution reports on the status of the MPEG Surround reference software encoder and notes that projects that use MPEG Surround software, e.g. USAC, may be impacted.

It requests workplans to identify what reference encoder tools are most worthwhile to be integrated into the reference software for:



  • MPEG Surround

  • SAOC

  • LD MPEG Surround combined with AAC ELD
