Organisation internationale de normalisation



Wednesday 11AM

Max Neuendorf, FhG, presented new information on this CE. The presenter noted that, in his opinion, the CE tries to do something that the RM coder can already do in the FD coding mode.

He presented the results of a listening test. The systems under test were:


  • A USAC-TCXW CE system

  • LovlpAsFD_T80: new system, described below

  • LovlpAsFD_T80T40: new system, described below

  • Reg: RM8

The middle two systems in the list above derive the following mode decisions from ETRI’s RM8_CE bitstreams:

  • FD (with low SBR xover)

  • Regular LPD with short overlap

  • LPD with TCX80 and long overlap (“Lovlp”)

  • LPD with TCX40 and long overlap

These mode decisions are imposed on the RQE encoder. However, instead of coding long-overlap TCX frames, the encoder uses FD mode with a higher SBR crossover, as follows:



  • Use FD mode when two or more consecutive frames are coded with TCX80 long-overlap

    • “LovlpAsFD_T80”

  • Same as above but in addition also use FD mode on frames where TCX40 long-overlap occurs

    • “LovlpAsFD_T80T40”

The last condition uses the current regular short overlap in LPD:

    • “reg”
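The mode-remapping rule above can be sketched as follows. This is a minimal illustration, not FhG's encoder code: the `Mode` labels and the `remap_modes` function are hypothetical, each list entry stands for one frame's mode decision, and it assumes that a long-overlap frame which is not converted to FD falls back to regular short-overlap LPD (the notes do not say how such frames are handled).

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the per-frame mode decisions in ETRI's bitstreams."""
    FD = "FD"                # frequency-domain coding
    LPD = "LPD"              # regular LPD with short overlap
    TCX80_LONG = "TCX80L"    # LPD, TCX80 with long overlap ("Lovlp")
    TCX40_LONG = "TCX40L"    # LPD, TCX40 with long overlap


def remap_modes(modes, include_tcx40=False):
    """Map long-overlap TCX decisions to FD.

    include_tcx40=False models "LovlpAsFD_T80": only runs of two or more
    consecutive TCX80 long-overlap frames become FD.
    include_tcx40=True models "LovlpAsFD_T80T40": TCX40 long-overlap
    frames also become FD.
    Assumption: a long-overlap frame not mapped to FD is coded as regular
    short-overlap LPD.
    """
    out = []
    for i, mode in enumerate(modes):
        if mode is Mode.TCX80_LONG:
            prev_is_t80 = i > 0 and modes[i - 1] is Mode.TCX80_LONG
            next_is_t80 = i + 1 < len(modes) and modes[i + 1] is Mode.TCX80_LONG
            out.append(Mode.FD if (prev_is_t80 or next_is_t80) else Mode.LPD)
        elif mode is Mode.TCX40_LONG:
            out.append(Mode.FD if include_tcx40 else Mode.LPD)
        else:
            out.append(mode)  # FD and regular LPD decisions pass through unchanged
    return out


decisions = [Mode.LPD, Mode.TCX80_LONG, Mode.TCX80_LONG,
             Mode.LPD, Mode.TCX80_LONG, Mode.TCX40_LONG]
print([m.value for m in remap_modes(decisions)])
print([m.value for m in remap_modes(decisions, include_tcx40=True)])
```

The isolated TCX80 long-overlap frame (index 4) is not part of a run, so it stays in LPD under both variants; only the T80T40 variant converts the final TCX40 frame to FD.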

With this “reg” configuration, the only difference between the CE system and the “reg” system is the ETRI technology. The listening test results at 12 kb/s mono show no difference in the absolute scores. In the analysis of the differential scores, no difference is observed either between the CE and LovlpAsFD_T80 or between the CE and LovlpAsFD_T80T40.

Taejin Lee, ETRI, continued his presentation on the USAC-TCXW CE. In summing up, he made the following points:



  • ETRI followed the CE methodology, and requests that a decision be made based on the CE results.

  • The pooled listening test results are quite strong: the CE technology performs better than RM8 at both 12 kb/s and 16 kb/s, in the overall mean and in the means for the Speech, Music and Mixed content categories.

Bernhard Grill, FhG, noted that the point of this discussion is whether a new tool (TCXW) is needed, or whether the current USAC encoder can deliver the same quality with additional tuning and/or classification modules.

Taejin Lee, ETRI, noted that a new encoder tuning that shows merit for this CE may not represent a reference quality encoder tuning.

Bernhard Grill, FhG, gave a presentation on his view of the structure of USAC. The main point was that, neglecting ACELP mode, USAC is an AAC codec with two means of shaping the quantization noise: scale factors or LPC frequency-domain weighting. He noted that one can always improve one coding mode, but what matters is improving the overall coder performance. For example, he stated that in the FD transform one could investigate the following options:


  • Window length (128, 256, 512, 1024)

  • Window overlap (50%, or shorter, e.g. 20%)

  • Window shape (sine, Kaiser-Bessel)

Hence one could explore 16 configurations (4 lengths × 2 overlaps × 2 shapes), whereas AAC standardized only 4 configurations (lengths 128 and 1024; sine and Kaiser-Bessel shapes). He contended that the other configurations would bring small improvements, if any, at the cost of a significant increase in encoder control complexity. The basic problem is to select a long-overlap window for tonal items and a short-overlap window for speech items. All the tools for this are already in USAC, so in his view the ETRI CE technology is not needed.
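The configuration count above is a simple Cartesian product. A minimal sketch, assuming "20%" as the example shorter overlap and assuming that both standardized AAC window lengths use the 50% overlap; the constant names are illustrative only.

```python
from itertools import product

# Design space listed above; "20%" stands in for a shorter-than-50% overlap (example value).
LENGTHS = (128, 256, 512, 1024)
OVERLAPS = ("50%", "20%")
SHAPES = ("sine", "Kaiser-Bessel")

all_configs = list(product(LENGTHS, OVERLAPS, SHAPES))

# AAC subset: lengths 128 and 1024 with both shapes (50% overlap assumed for both).
aac_configs = [c for c in all_configs if c[0] in (128, 1024) and c[1] == "50%"]

print(len(all_configs), len(aac_configs))  # 16 4
```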

The Chair asked if the CE imposes any significant constraints on the encoder architecture. Taejin Lee, ETRI, noted that the CE encoder/decoder has a small additional delay (192 samples).

The Chair called for a show of hands of experts that do not support adopting the CE technology, and 9 experts raised their hands.

The Chair proposed that the issue be brought up again on Thursday at 1400 hrs.

David Virette, Huawei, presented



m19262

Report on cross-check listening test for the CE on lossless coding of pitch lag for ACELP in USAC

David Virette

USAC-PL

The contribution presents the results of a listening test at an 8 kb/s mono operating point. The two systems under test were:

Sys1 RM8


Sys2 RM8+CE

There is no difference in the absolute scores. For differential scores, the CE was better for 1 item.

Philippe Gournay, VoiceAge, presented

m19345

VoiceAge Report on Cross-check Listening Test and Software Verification for the Core Experiment on Lossless Pitch Coding

Philippe Gournay, Roch Lefebvre

USAC-PL

The contribution presents the results of a listening test at an 8 kb/s mono operating point. There is no difference in the absolute scores. For differential scores, the CE was better for 5 items and for the mean. The presenter noted that the items were better by 1 or 2 MUSHRA points and had small confidence intervals, indicating strong agreement among listeners.
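The "better/worse" verdicts in these reports come from differential scores with confidence intervals. A minimal sketch of that decision for one item, assuming a two-sided 95% Student-t interval; the exact statistical procedure used in the MPEG test evaluations is not stated in these notes, and the function name and scores below are hypothetical.

```python
import math
import statistics


def classify_difference(diffs, t_crit):
    """Classify per-listener differential scores (CE minus reference).

    Returns "better", "worse" or "no difference" depending on whether the
    interval mean +/- t_crit * s / sqrt(n) lies entirely above zero,
    entirely below zero, or straddles zero.
    """
    n = len(diffs)
    mean = statistics.mean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(n)
    if mean - half_width > 0:
        return "better"
    if mean + half_width < 0:
        return "worse"
    return "no difference"


# Hypothetical differential MUSHRA scores from 10 listeners for one item.
scores = [2, 1, 3, 2, 2, 1, 3, 2, 2, 2]
print(classify_difference(scores, t_crit=2.262))  # t(0.975, df=9)
```

A small mean difference (here 2 points) can still be significant when the listeners agree closely, because the small standard deviation keeps the interval narrow.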

The contribution also reports on CE decoder verification: the CE bitstreams decoded to exactly the CE waveforms.



Takehiro Moriya, NTT, presented

m19223

Information of CE on lossless coding of pitch lag for ACELP in USAC

Takehiro Moriya, Noboru Harada, Yutaka Kamamoto

USAC-PL

The contribution reports that the CE was slightly revised, in that it now uses new Huffman codebooks. This resulted in a saving of more than 6 bits per ACELP frame (an average saving of 3% for ACELP-mode frames at 8 kb/s). The presenter noted that speech items coded at low rates may use ACELP mode for 50% of the frames. Saved bits were returned to the bit reservoir, but an additional encoder tool was used to steer additional bits to ACELP and to raise the ACELP excitation mode when appropriate. The presenter noted that the CE proposal is purely lossless coding with respect to RM8 operation.

Differential coding was exploited, but an ACELP superframe remains a random access entry point.
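The statement above combines two properties: differences between successive pitch lags are small (and so cheap to entropy-code), yet each superframe can be decoded on its own. A minimal sketch of one way to get both, assuming integer lags and ignoring fractional lag resolution and the Huffman stage; the function names are hypothetical and this is not the CE's actual bitstream format.

```python
def encode_superframe(lags):
    """First lag absolute, remaining lags as differences to the previous one."""
    return [lags[0]] + [cur - prev for prev, cur in zip(lags, lags[1:])]


def decode_superframe(symbols):
    """Invert encode_superframe using only this superframe's symbols."""
    lags = [symbols[0]]
    for delta in symbols[1:]:
        lags.append(lags[-1] + delta)
    return lags


# Hypothetical integer pitch lags for the subframes of two superframes.
stream = [[84, 85, 85, 87], [120, 118, 118, 119]]
coded = [encode_superframe(sf) for sf in stream]
print(coded)  # [[84, 1, 0, 2], [120, -2, 0, 1]]

# Random access: start decoding at the second superframe without the first.
print(decode_superframe(coded[1]))  # [120, 118, 118, 119]
```

Because the differencing restarts at every superframe boundary, no state is carried across superframes, which is what keeps each superframe a valid entry point.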

Listening test results for data pooled over all listening sites were presented:

Only cross-check sites: 2 items better

Proponent and cross-check sites: 6 items better, and mean better. Some improvements were as large as 3 points.

When investigating consistency amongst test sites:

Of the 7 items that are better for pooled data, 2 items were judged better at 2 of 3 test sites.

Complexity at 8 kb/s:

0.04 WMOPS typical, 0.15 worst case

1025 bytes of Huffman tables

The presenter noted that one step up in ACELP excitation pulse coding requires 32 bits per superframe; hence a saving of 6 bits per superframe permits the system to jump to a higher rate in roughly one of every 6 superframes. However, in voiced regions the savings can be much larger, so the system could jump to a higher-rate excitation mode more often.
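The arithmetic above can be checked with a small accumulator model. A minimal sketch, assuming a constant saving of 6 bits per superframe and that savings are banked and spent as soon as they cover one 32-bit step; the real encoder steering tool is not described in these notes, and `upgrade_schedule` is a hypothetical name.

```python
def upgrade_schedule(saved_bits, upgrade_cost, n_superframes):
    """Superframe indices at which accumulated savings pay for one excitation step up."""
    reservoir = 0
    upgrades = []
    for sf in range(n_superframes):
        reservoir += saved_bits          # lossless savings of this superframe
        if reservoir >= upgrade_cost:
            reservoir -= upgrade_cost    # spend the savings on a higher-rate excitation
            upgrades.append(sf)
    return upgrades


print(upgrade_schedule(saved_bits=6, upgrade_cost=32, n_superframes=30))
# [5, 10, 15, 21, 26]
```

The gaps alternate between 5 and 6 superframes, matching the long-run rate of 6/32 ≈ 0.19 upgrades per superframe; larger savings in voiced regions shorten the gaps.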

MM, FhG, asked what material was used to train the Huffman tables. The presenter stated that they were trained on 100,000 ACELP frames from outside the RM0 test set and also on the RM8 reference quality bitstreams.
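Training a Huffman table amounts to counting symbol occurrences over the training frames and building the code from those counts, which is why the choice of training database matters. A minimal sketch of deriving code lengths from counts, assuming the training symbols are pitch-lag differences; the data and function name are hypothetical, and the CE's actual table design is not described here.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(training_symbols):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    freq = Counter(training_symbols)
    if len(freq) == 1:  # degenerate case: a single symbol still needs 1 bit
        return {next(iter(freq)): 1}
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(n, next(tiebreak), {sym: 0}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, lens1 = heapq.heappop(heap)
        n2, _, lens2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every code in both of them.
        merged = {sym: length + 1 for sym, length in {**lens1, **lens2}.items()}
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical training data: pitch-lag differences collected from ACELP frames.
deltas = [0, 0, 0, 0, 0, 1, 1, -1, 2]
print(huffman_code_lengths(deltas))
```

Frequent symbols such as a zero lag difference get the shortest codes; if the training data is unrepresentative of the test material, the savings shrink.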

Audio experts requested additional information on:


  • The dynamics of when the CE technology is able to jump up to a higher ACELP coding mode with respect to RM8, and the TCX frame lengths used with respect to RM8.

  • The database used for training the Huffman codebooks

This discussion will be continued Thursday at 1400 hrs.

Takeshi Norimatsu, Panasonic, presented



m19246

Panasonic crosscheck report on Improved Tonal Component Coding

Takeshi Norimatsu, Tomokazu Ishikawa, Zhong Haishan, Zhao Dan

USAC-TonComp

The contribution presents the results of a listening test at 20 kb/s mono. The systems under test were:

Sys1 RM8


Sys2 RM8+CE

When analysing absolute scores, there was no significant difference.

When analysing differential scores:

1 item better (Id4)



Eunwoo Song, Yonsei University, presented

m19249

Yonsei crosscheck listening test report on improved tonal component coding CE for USAC

Jeongook Song, Eunwoo Song, Henney Oh

USAC-TonComp

The contribution presents the results of a listening test at 16 kb/s mono. When analysing absolute scores:

1 item better (Id4)

When analysing differential scores:

1 item better (Id4), 1 item worse (SpeechOverMusic_1)


Tomasz Zernicki, Telcordia, presented the following contributions

m19238

Report on CE on Improved Tonal Component Coding in eSBR

Tomasz Zernicki, Maciej Bartkowiak, Lukasz Januszkiewicz, Marek Domanski

USAC-TonComp

m19367

Telcordia and PUT listening test report on Tonal Component Coding in eSBR

Tomasz Zernicki, Maciej Bartkowiak, Lukasz Januszkiewicz, Marek Domanski

USAC-TonComp

The presentation reviewed the problem that the CE proposes to solve: the SBR tool in USAC is unable to reconstruct signals whose tonal components lie only in the high band, since there are no low-band harmonics to patch up to the high band. The proposed solution is to use a sinusoidal coding tool to represent the high-band tonal components.
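The core idea of sinusoidal coding is that each tonal component is transmitted as a few parameters and resynthesized directly at its high-band frequency, so no low-band source is needed. A minimal sketch of the synthesis side, assuming stationary (frequency, amplitude, phase) triples per frame; the CE's actual parameterization, tracking across frames and quantization are not described here, and `synthesize_partials` is a hypothetical name.

```python
import math


def synthesize_partials(partials, fs, n_samples):
    """Sum of sinusoids: partials is a list of (freq_hz, amplitude, phase_rad)."""
    out = [0.0] * n_samples
    for freq, amp, phase in partials:
        omega = 2.0 * math.pi * freq / fs  # radians per sample
        for n in range(n_samples):
            out[n] += amp * math.cos(omega * n + phase)
    return out


# Hypothetical: one 12 kHz high-band partial at 48 kHz sampling.
frame = synthesize_partials([(12000.0, 0.5, 0.0)], fs=48000, n_samples=4)
print([round(x, 6) for x in frame])
```

At a quarter of the sampling rate the partial advances by a quarter period per sample, so the four samples trace one full cycle of the cosine.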

The complexity of the decoding tool is:

0.5 WMOPS

0.64 kbytes ROM

48 kbytes RAM now, but anticipated to be 10 kbytes after further optimization

Listening test results for all data pooled together were presented:

Poznan, 16 kb/s: 2 items better, some with mean values better by more than 15 MUSHRA points

Telcordia, 20 kb/s: 5 items better, mean better

Poznan, Yonsei, 16 kb/s: 2 items better, 1 item worse

Telcordia, Poznan, Panasonic, 20 kb/s: 5 items better, mean better

The presenter noted that the improved items are for the most part the new test items (e.g. Id4). He further noted that for the one item judged worse (at Yonsei), the Tonal Components tool was never used; hence any actual negative impact would be due to the 1 bit per frame allocated to the enable/disable flag, and he considered it very likely that the result was a statistical anomaly.

It was clarified that the USAC CfP items were coded as one concatenated file, while the new items were encoded separately. It was also clarified that the average tonal components bit budget was subtracted from the nominal USAC bit rate to obtain an average bit rate for USAC to code the residual signal. Kristofer Kjörling, Dolby, noted that this differs from how an actual system would work, in which the per-frame tonal components bit requirements would be subtracted from the per-frame USAC residual coding bit budget. The Chair agreed that not modelling the instantaneous bit requirements leaves considerable uncertainty in evaluating the subjective performance data.
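The difference between the two bit-allocation models discussed above can be shown with a toy example. A minimal sketch, assuming a fixed per-frame budget of 400 bits and hypothetical tonal-component bit counts; `residual_budgets` is an illustrative name, not part of any CE software.

```python
def residual_budgets(frame_bits, tonal_bits, per_frame=True):
    """Bits left for USAC residual coding in each frame."""
    if per_frame:
        # Realistic model: subtract each frame's actual tonal-component bits.
        return [frame_bits - t for t in tonal_bits]
    # Model used in the CE test: subtract only the average tonal-component bits.
    average = sum(tonal_bits) / len(tonal_bits)
    return [frame_bits - average for _ in tonal_bits]


tonal = [0, 0, 60, 20]  # hypothetical tonal-component bits per frame
print(residual_budgets(400, tonal))                   # [400, 400, 340, 380]
print(residual_budgets(400, tonal, per_frame=False))  # [380.0, 380.0, 380.0, 380.0]
```

Both models spend the same total (1520 bits), but the averaged model gives the residual coder 40 bits more than it would really have in the frame where the tonal tool is most active, and 20 bits less in frames where the tool is idle.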

Max Neuendorf, FhG, noted that there is an 8-frame latency in the encoder, since the tool encodes 8 frames as a unit. There was considerable discussion on exactly how this data was transmitted over the channel.

Werner Oomen, Philips, reported on a very informal in-house listening test in which the violin item was graded as having worse quality.

Kristofer Kjörling, Dolby, summed up his concerns as:


  • The CE did not implement per-frame bit allocation

  • The contribution does not contain a single section showing the tool’s syntax and semantics

This discussion will be continued Thursday at 1400 hrs.

David Virette, Huawei, presented



m19305

Enhanced Pulse Indexing CE for ACELP in USAC

David Virette, Wei Xiao, Anisse Taleb

USAC-EPI

The contribution summarized the Enhanced Pulse Indexing CE technology. The technology is able to save 1 bit per ACELP track in certain coding modes, resulting in the following average bit savings:

Rate and mode | Bits saved per ACELP frame | Percent savings (for ACELP frames)
20 kb/s mono | 4.8 | 1.6%
24 kb/s mono | 9.5 | 2.9%
32 kb/s mono | 9.9 | 2.7%
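The contribution's indexing method is not described in these notes. As a generic illustration only of how lossless joint indexing can save a bit per track, the sketch below compares coding K pulse positions separately with coding the unordered set of positions as a single combinatorial index. It assumes a hypothetical track of 16 positions with 2 unsigned pulses at distinct positions; real ACELP tracks also carry signs and may allow coinciding pulses.

```python
from itertools import combinations
from math import ceil, comb, log2


def rank(positions):
    """Index of a set of distinct pulse positions (combinatorial number system)."""
    return sum(comb(p, i + 1) for i, p in enumerate(sorted(positions)))


def unrank(index, k):
    """Inverse of rank(): recover k distinct positions from their index."""
    positions = []
    for i in range(k, 0, -1):
        p = i - 1
        while comb(p + 1, i) <= index:  # find the largest p with comb(p, i) <= index
            p += 1
        positions.append(p)
        index -= comb(p, i)
    return sorted(positions)


N, K = 16, 2  # hypothetical track: 16 positions, 2 pulses at distinct positions
separate_bits = K * ceil(log2(N))    # each position coded on its own
joint_bits = ceil(log2(comb(N, K)))  # one index for the unordered set
print(separate_bits, joint_bits)  # 8 7

# The joint index is lossless: every position set round-trips exactly.
assert all(unrank(rank(c), K) == list(c) for c in combinations(range(N), K))
```

Joint indexing saves bits because separate coding wastes codewords on orderings and duplicates that describe the same pulse set; whether the CE exploits this particular redundancy is not stated.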

The Chair noted that this CE technology is guaranteed to deliver increased compression, since it is a lossless compression technology. However, the difficulty with the CE is that it has been unable to demonstrate that these bit savings translate into an increase in subjective performance.

The Chair called for a show of hands of experts that do not support adopting the CE technology, and 10 experts raised their hands. Based on this, the Chair concluded that there is no consensus to adopt the CE technology.

Max Neuendorf, FhG, presented

m19337

Proposed revision of USAC bit stream syntax addressing USAC design considerations

Max Neuendorf, Markus Multrus, Stefan Döhla, Heiko Purnhagen, Frans de Bont, Julien Robilliard, Matthias Neusinger, Johannes Hilpert

USAC-BITSTR

The contribution proposes changes to the USAC bitstream that address the requirements listed in the workplan from the 94th meeting. Possible bitstream changes fall into the following three categories, of which the contribution addresses the first two:


  • Configuration data

  • USAC bitstream payload (i.e. Access Units)

  • Transport (i.e. synchronization and transmission error detection)

Config

  • Make the configuration much more modular

  • Modules that might conceivably be outside of an operating profile carry length information so they can be “skipped”

Payload

  • Indicate “independence” (i.e. random access points)

  • Remove redundant “static” information

  • Extensibility

  • Careful integration of SBR and MPS information

  • Order of syntax elements within Access Unit
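The "skippable module" idea from the Config list above can be sketched as a length-prefixed parse loop: a decoder reads each module's header, keeps the modules it understands, and jumps over the rest using the length field. This is a minimal sketch under a hypothetical byte-aligned layout (1-byte id, 2-byte big-endian length); the proposed USAC syntax itself is bit-oriented and is not reproduced here.

```python
import struct


def parse_config_modules(buf, known_ids):
    """Parse a sequence of length-prefixed configuration modules.

    Hypothetical layout per module: 1-byte module id, 2-byte big-endian
    payload length, then the payload. Unknown ids are skipped using the
    length field instead of aborting the parse.
    """
    pos = 0
    parsed, skipped = {}, []
    while pos < len(buf):
        module_id, length = struct.unpack_from(">BH", buf, pos)
        pos += 3  # size of the id + length header
        payload = buf[pos:pos + length]
        if len(payload) != length:
            raise ValueError(f"module {module_id} is truncated")
        if module_id in known_ids:
            parsed[module_id] = payload
        else:
            skipped.append(module_id)  # e.g. a module outside this decoder's profile
        pos += length
    return parsed, skipped


config = bytes([1, 0, 2, 0xAA, 0xBB,   # module 1, 2-byte payload
                9, 0, 1, 0xCC,         # module 9, unknown to this decoder
                2, 0, 0])              # module 2, empty payload
print(parse_config_modules(config, known_ids={1, 2}))
```

The cost of this design is a few header bits per module; the benefit is that a decoder built for a restricted profile can still parse configurations that contain modules it does not support.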

The Chair noted that every change to the set of Extension Element IDs will require an amendment to the USAC specification, and suggested that the alternative mechanism of a registry be considered. He further suggested that the Independency flag be mapped back to the Config, e.g. by declaring that every 100th frame is an Independent frame and hence a random access entry point. Heiko Purnhagen, Dolby, noted that, in the philosophy of the proposed design, such information would be at a higher level (e.g. Systems), which is not covered in this contribution.
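The two ways of signalling random access points discussed above can be contrasted with a small seek helper. A minimal sketch, assuming that with a Config-declared period the independent frames are exactly the multiples of that period starting at frame 0; `next_entry_point` and its parameters are hypothetical.

```python
def next_entry_point(target, indep_flags=None, period=None):
    """First frame >= target at which a decoder can start.

    indep_flags: per-frame Independency flags as carried in the payload.
    period: alternative in which the Config declares every period-th frame
    independent (assumed aligned to frame 0, e.g. period=100).
    """
    if indep_flags is not None:
        for i in range(target, len(indep_flags)):
            if indep_flags[i]:
                return i
        return None  # no independent frame at or after target in this stream
    if period is not None:
        return -(-target // period) * period  # round target up to a multiple of period
    raise ValueError("need indep_flags or period")


flags = [True, False, False, True, False]
print(next_entry_point(1, indep_flags=flags))  # 3
print(next_entry_point(150, period=100))       # 200
```

The per-frame flag lets the encoder place entry points anywhere (for example at scene changes) at the cost of signalling in every access unit, while a Config-level period costs nothing per frame but fixes where entry points can occur.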

The presenter hopes to bring information on the impact of the syntax design for various application scenarios, e.g. high-bitrate versus low-bitrate and file-based versus broadcast.

The presenter requested that the following syntax elements be adopted into the USAC DIS.


  • MPS 212 changes, both Config and MPS data.

  • Independency flag

The Chair stated that there was no more time to discuss this item. The discussion will continue in Friday’s Audio plenary.

