International Organisation for Standardisation
Organisation Internationale de Normalisation




7.10.2 Unified Speech and Audio Coding


Max Neuendorf, FhG, presented

m16687

Listening Test Report on TCX Improvements for USAC

Max Neuendorf, Jérémie Lecomte

The contribution presents the results of a listening test comparing the WD3 system with WD3 plus the proposed TCX windowing. Two subjective tests were conducted and, on average, there was no difference between the two systems at the 95% level of significance. In the difference-score analysis of individual items, at 12 kbit/s one item was better and two items were worse for WD3 with the proposed TCX windowing. At 16 kbit/s two items were better and one item was worse for WD3 with the proposed TCX windowing.
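The per-item verdicts ("better", "worse", "no difference" at the 95% level) come from a difference-score analysis. The report does not state the exact statistical procedure each site used, so the following is only a minimal sketch assuming paired per-listener scores and a two-sided Student-t confidence interval on the mean difference. The function name and the caller-supplied t_crit value are illustrative assumptions.

```python
import math
import statistics


def difference_verdict(scores_ref, scores_test, t_crit):
    """Classify one test item from paired listener scores (a sketch).

    scores_ref / scores_test: scores from the same listeners for the
    reference system (e.g. WD3) and the system under test.
    t_crit: two-sided 95% Student-t critical value for len(scores) - 1
    degrees of freedom (e.g. 2.262 for 10 listeners); supplied by the caller.
    """
    if len(scores_ref) != len(scores_test) or len(scores_ref) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [t - r for r, t in zip(scores_ref, scores_test)]
    mean = statistics.mean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(len(diffs))
    low, high = mean - half_width, mean + half_width
    # Significant only if the whole confidence interval excludes zero.
    if low > 0:
        verdict = "better"
    elif high < 0:
        verdict = "worse"
    else:
        verdict = "no difference"
    return mean, (low, high), verdict


# Hypothetical scores for 10 listeners on one item.
ref = [60, 62, 58, 65, 61, 59, 63, 60, 64, 62]
test = [r + d for r, d in zip(ref, [1, 2, 1, 2, 1, 2, 1, 2, 1, 2])]
print(difference_verdict(ref, test, t_crit=2.262))
```

Pooling over test sites, as in some of the reports that follow, amounts to concatenating the paired scores from all sites before computing the interval (with t_crit updated for the larger sample).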

Taejin Lee, ETRI, presented



m16714

ETRI listening test results for USAC CE on TCX improvements

Taejin Lee, Seungkwon Beack, Minje Kim, Kyeongok Kang

The contribution presents listening test results for the proposed adaptive TCX windowing technology. In the difference-score analysis of individual items, WD3 with the proposed TCX windowing was better for two items at 12 kb/s and for four items at 16 kb/s.

Pierrick Philippe, Orange Labs, presented



m16561

Evaluation of TCX improvements Core Experiment

Pierrick Philippe

The contribution presents a listening test result (at 12 kb/s mono) for the proposed adaptive TCX windowing technology. Overall, there was no difference between the two systems at the 95% level of significance. In the analysis of individual items, WD3 with the proposed TCX windowing was better for one item.

Hyunkook Lee, LG, presented



m16632

LGE listening test results for USAC CE on TCX improvements

Hyunkook Lee, Sungyong Yoon

At 12 kb/s the listening test showed, on average, no difference at the 95% level of significance. When examining difference scores for individual items, the listening tests showed

  • at 12 kb/s there were four items that were better at the 95% level of significance

  • at 16 kb/s there were two items that were better at the 95% level of significance

Taejin Lee, ETRI, presented

m16715

Report on TCX Improvements for USAC

Taejin Lee, Seungkwon Beack, Minje Kim, Kyeongok Kang

The contribution shows an analysis of the frequency characteristics of the proposed adaptive TCX windows. In all cases the proposed windows, as compared to WD3 windows, have:

  • narrower main lobe

  • lower first sidelobe

  • higher sidelobes that roll off at a faster rate
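Main-lobe width and first-sidelobe level are read off the window's magnitude spectrum. The WD3 and proposed TCX window shapes are not given in this report, so the sketch below uses a rectangular window and a sine window purely as stand-ins to show how the two metrics are measured. The helper names and the 16x oversampling of the spectrum are illustrative assumptions; sidelobe rolloff rate is not measured here.

```python
import cmath
import math


def magnitude_spectrum(window, oversample=16):
    """|DTFT| of `window` sampled on len(window) * oversample points, 0..Nyquist."""
    n_fft = len(window) * oversample
    return [
        abs(sum(w * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, w in enumerate(window)))
        for k in range(n_fft // 2 + 1)
    ]


def lobe_metrics(window, oversample=16):
    """Return (main-lobe half-width in bins, first-sidelobe level in dB)."""
    mag = magnitude_spectrum(window, oversample)
    k = 0
    while mag[k + 1] < mag[k]:   # walk down the main lobe to the first null
        k += 1
    null = k
    while mag[k + 1] >= mag[k]:  # climb to the peak of the first sidelobe
        k += 1
    half_width_bins = null / oversample
    sidelobe_db = 20 * math.log10(mag[k] / mag[0])
    return half_width_bins, sidelobe_db


N = 64
rectangular = [1.0] * N
sine = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]
for name, w in [("rectangular", rectangular), ("sine", sine)]:
    width, level = lobe_metrics(w)
    print(f"{name:12s} half-width {width:.2f} bins, first sidelobe {level:.1f} dB")
```

The stand-ins show the usual trade-off (the sine window has a lower first sidelobe but a wider main lobe than the rectangular window), which is why a window that improves both metrics at once, as claimed for the proposal, is notable.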

The presenter also showed listening test results for pooling over all test sites. An analysis of the difference scores showed that

  • at 12 kb/s there were two items better (es01, phi7) at the 95% level of significance

  • at 16 kb/s there were two items better (es01, phi7) at the 95% level of significance

There was considerable discussion of the contribution’s term “data rate gain”, since for every MDCT window shown in the contribution the MDCT is a critically sampled filterbank. Roch Lefebvre, VoiceAge, noted that the left side (ACELP side) of the RM3 window shown in the contribution neglected to account for the effect of subtracting the zero-input response of the ACELP LP filter.
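The critical-sampling point can be checked directly: an MDCT with a 2N-sample window and a hop of N produces exactly N coefficients per N new input samples, and overlap-add of the windowed inverse transforms cancels the time-domain aliasing. This is a minimal sketch using the direct O(N²) MDCT formula and a sine window satisfying the Princen-Bradley condition; it does not reproduce the RM3 or proposed USAC window shapes, and the function names are illustrative.

```python
import math


def mdct(block):
    """MDCT of 2N (already windowed) samples -> N coefficients."""
    N = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (scale 2/N)."""
    N = len(coeffs)
    return [
        2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                      for k in range(N))
        for n in range(2 * N)
    ]


def sine_window(N):
    """Length-2N sine window; satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def lapped_round_trip(x, N):
    """Hop N, window 2N: each hop adds N new samples and yields N coefficients."""
    if len(x) % N:
        raise ValueError("len(x) must be a multiple of N")
    w = sine_window(N)
    padded = [0.0] * N + list(x) + [0.0] * N
    out = [0.0] * len(padded)
    blocks = []
    for start in range(0, len(padded) - N, N):
        coeffs = mdct([padded[start + n] * w[n] for n in range(2 * N)])
        blocks.append(coeffs)
        # Overlap-add the windowed IMDCT output; aliasing from neighbours cancels.
        for n, y in enumerate(imdct(coeffs)):
            out[start + n] += y * w[n]
    return out[N:N + len(x)], blocks


x = [math.sin(0.3 * i) + 0.05 * i for i in range(32)]
y, blocks = lapped_round_trip(x, N=8)
print(max(abs(a - b) for a, b in zip(x, y)))  # rounding-level error only
print({len(c) for c in blocks})               # {8}: N coefficients per hop of N samples
```

Because the coefficient count per hop equals the hop size, any claimed rate advantage has to come from somewhere other than the number of transform coefficients, which is the substance of the objection raised.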

Bruno Bessette, VoiceAge, asked whether open-loop or closed-loop analysis was used to select coding mode at (potential) ACELP to TCX transitions.

The Chair noted that there is a “3x3 matrix” of coding mode transitions (at least on a superframe boundary), and that it would be beneficial to evaluate the extent to which the WD3 technology is mature (or not) for each of the nine transition cases. Finally, given the amount of discussion, the proposal appears to need more time to be well understood. It will be brought up again later in the week.
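One way to keep the Chair's “3x3 matrix” explicit is a table keyed by (previous mode, next mode). The three mode labels below are an assumption (the report does not name them here): frequency-domain (AAC-based) coding plus the two LP-domain modes ACELP and TCX. The status strings are placeholders, not assessments from the meeting.

```python
from itertools import product

# Assumed mode labels; the report only says there are three modes.
MODES = ("FD", "ACELP", "TCX")


def transition_matrix(default="not yet assessed"):
    """Return a dict keyed by (previous_mode, next_mode) for all 3 x 3 cases."""
    return {pair: default for pair in product(MODES, repeat=2)}


matrix = transition_matrix()
# Placeholder entry illustrating how a CE's status could be recorded.
matrix[("ACELP", "TCX")] = "CE alternatives under discussion"
for (prev, nxt), status in sorted(matrix.items()):
    print(f"{prev:>5} -> {nxt:<5}: {status}")
```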

Roch Lefebvre, VoiceAge, presented



m16688

Alternatives for windowing in USAC

Bruno Bessette, Roch Lefebvre, Philippe Gournay, Redwan Salami

The contribution concerns transitions between the ACELP and TCX modes within LP-domain processing. It notes that, classically:

  • ACELP uses rectangular, non-overlapping windowing

  • TCX uses non-rectangular, overlapping windows

It also notes that

  • TCX window tails start at the beginning of the TCX frame

  • TCX windows are not centered on the TCX frame

  • In WD3, critical sampling is possible in TCX to TCX transitions

  • WD3 uses many different TCX windows, depending on adjacent frame processing mode (e.g. ACELP)

It proposes to select optimal TCX windows and to adapt the processing of adjacent non-TCX frames (as opposed to adapting TCX processing to the adjacent frame mode). The contribution notes that, for ACELP to TCX transitions, the first half of the TCX frame needs to compensate for the following two effects:

  • time-domain aliasing

  • the non-unity level (gain) of the window
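The two effects can be made concrete under a simple model. Assume a symmetric sine window and a standard MDCT with 2/N IMDCT scaling; then, with no preceding TCX frame to overlap-add with, the windowed output over the first half of the TCX frame is w·(w·x − reversed(w·x)). The error relative to the original x splits exactly into an aliasing term w·reversed(w·x) and a window-level term (1 − w²)·x. This is a sketch of the model only, not the WD3 or proposed processing; all function names are hypothetical.

```python
import math


def rising_sine_half(N):
    """Rising half (length N) of a length-2N sine window."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(N)]


def tcx_left_half_output(x, w):
    """Windowed IMDCT output over the first half of a TCX frame, assuming a
    2/N-scaled IMDCT: w * (u - reversed(u)) with u = w * x."""
    u = [wi * xi for wi, xi in zip(w, x)]
    return [wi * (ui - ri) for wi, ui, ri in zip(w, u, u[::-1])]


def transition_error(x, w):
    """Split x - output into an aliasing part and a window-level part."""
    u_rev = [wi * xi for wi, xi in zip(w, x)][::-1]
    aliasing = [wi * ri for wi, ri in zip(w, u_rev)]        # w * reversed(w * x)
    level = [(1.0 - wi * wi) * xi for wi, xi in zip(w, x)]  # (1 - w**2) * x
    return aliasing, level


N = 16
x = [math.cos(0.4 * n) for n in range(N)]
w = rising_sine_half(N)
out = tcx_left_half_output(x, w)
aliasing, level = transition_error(x, w)
err = [xi - oi for xi, oi in zip(x, out)]
print(max(abs(e - (a + l)) for e, a, l in zip(err, aliasing, level)))
```

In this model the level term is largest at the start of the frame, where the window is near zero, while the aliasing term peaks near the middle of the overlap; a correction signal has to cover both.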

It reports on the results of an experiment in which the encoder was forced to run in the following three modes:

  • TCX-256 frame surrounded on both sides by ACELP frames

  • TCX-512 frame surrounded on both sides by ACELP frames

  • TCX-1024 frame surrounded on both sides by ACELP frames

A table was presented showing the segmental SNR, calculated in the weighted LPC residual domain, and the bits used to code the TCX MDCT coefficients and the FAC (forward aliasing cancellation) information. In the table, the FAC bit allocation was weighted by the relative frequency of ACELP to TCX transitions observed in the WD3 system.
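Both quantities in the table can be sketched briefly. Segmental SNR is the mean of per-segment SNRs in dB; the segment length and the clamping limits below are hypothetical choices, and the sketch operates on whatever signals it is given (producing the weighted LPC residual is outside its scope). The bit weighting reflects that FAC bits are only spent at ACELP to TCX transitions, so their average cost per frame scales with how often such transitions occur. Function names and the example numbers are illustrative assumptions.

```python
import math


def segmental_snr(reference, decoded, seg_len, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-segment SNRs in dB, each clamped to [floor_db, ceil_db]."""
    snrs = []
    for start in range(0, len(reference) - seg_len + 1, seg_len):
        ref = reference[start:start + seg_len]
        dec = decoded[start:start + seg_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0.0:
            continue  # skip silent segments
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(signal / noise)
        snrs.append(min(max(snr, floor_db), ceil_db))
    return sum(snrs) / len(snrs) if snrs else float("nan")


def expected_bits_per_frame(tcx_bits, fac_bits, transition_rate):
    """Average bits per frame when FAC bits are spent only on the fraction
    `transition_rate` of frames that follow an ACELP frame."""
    return tcx_bits + transition_rate * fac_bits


ref = [1.0 + 0.5 * math.sin(0.1 * i) for i in range(256)]
dec = [1.1 * r for r in ref]  # 10% amplitude error in every sample
print(segmental_snr(ref, dec, seg_len=64))
print(expected_bits_per_frame(tcx_bits=100, fac_bits=40, transition_rate=0.25))
```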

A final set of plots showed the distribution of the noise, in the weighted LPC residual domain, at the various positions within the TCX frame.

In summary, the contribution proposes a TCX coding method that:


  • Treats all TCX frames identically

  • Centers the TCX window on the TCX frame

  • Avoids the need for special radix MDCTs

  • Improves segSNR within the TCX frame

  • Flattens the distribution of the noise floor in the TCX frame

The Chair asked the following questions:

  • Why not use asymmetrical windows to achieve maximal overlap in TCX to TCX transitions? The presenter noted that this could be done, but this tradeoff was not investigated.

  • Why not simply multiply the decoded TCX samples by known factors to achieve unity gain? The presenter noted that this would amplify the noise floor most at the window edges, making the distribution of noise power less flat across the TCX frame.

  • Why not use the TCX quantizer for FAC instead of algebraic VQ? The presenter noted that the TCX quantizer would require a new PDF for this signal, and that algebraic VQ is appropriate for a noise-like signal such as this one.
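The presenter's answer to the unity-gain question can be quantified under a simple model. Assume the effective level over the overlap is w[n]² (analysis times synthesis window, rising half of a sine window) and that the decoded noise is white with equal power before correction. Rescaling by g[n] = 1/w[n]² then multiplies the noise power by g[n]², which explodes at the frame edge where w is near zero. The model and the function name are assumptions for illustration only.

```python
import math


def unity_gain_noise_profile(N):
    """Noise power gain per sample when rescaling by g[n] = 1 / w[n]**2."""
    profile = []
    for n in range(N):
        w = math.sin(math.pi * (n + 0.5) / (2 * N))  # rising half of a sine window
        g = 1.0 / (w * w)      # gain that undoes the assumed w**2 level
        profile.append(g * g)  # white noise of power s2 becomes g**2 * s2
    return profile


p = unity_gain_noise_profile(64)
print(f"edge: {p[0]:.3g}, middle of overlap: {p[32]:.3g}, end: {p[-1]:.3g}")
```

The profile falls monotonically from the frame edge to the end of the overlap, which is the opposite of the flat noise distribution the proposal aims for.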

There was quite a bit of additional discussion. The Chair suggested that the group develop a statement, with respect to the “3x3 matrix” of possible transitions, of what they feel WD3 does right and where the CEs in hand offer alternatives for better performance, and that these alternatives be described in a “common vocabulary.”

Hyunkook Lee, LG, presented



m16635

Proposed core experiment on improved mode transition

Kiho Cho, Hyunkook Lee, Sungyong Yoon

The contribution proposes different processing for ACELP to TCX transitions, with many of the transitions using synthetic TDAC data generated from the adjacent ACELP frames. In the difference-score analysis of the subjective listening tests, two items (music3 and phi7) were better for the proposed technology at the 95% level of significance. The presenter noted that the overlap between ACELP and TCX reduces the noise “blocking” beyond what would be obtained by the LP synthesis filter.
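The report does not detail how the synthetic TDAC data are generated. One plausible reading, sketched below under stated assumptions (symmetric sine window, standard MDCT folding with 2/N IMDCT scaling), is that the decoder treats the ACELP-decoded samples in the overlap as if they had passed through the previous frame's windowed MDCT/IMDCT path and adds that emulated contribution to the TCX output. When the ACELP output is exact, the aliasing cancels and the window levels sum to one. All function names are hypothetical.

```python
import math


def sine_halves(N):
    """Rising and falling halves of a length-2N sine window."""
    w = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]
    return w[:N], w[N:]


def tcx_first_half(x, w_rise):
    """TCX windowed-IMDCT output over the overlap: w*(w*x - reversed(w*x))."""
    u = [a * b for a, b in zip(w_rise, x)]
    return [a * (b - c) for a, b, c in zip(w_rise, u, u[::-1])]


def synthetic_tdac(acelp, w_fall):
    """Hypothetical: emulate the missing previous-frame contribution
    w*(w*s + reversed(w*s)) from the ACELP-decoded samples s."""
    v = [a * b for a, b in zip(w_fall, acelp)]
    return [a * (b + c) for a, b, c in zip(w_fall, v, v[::-1])]


N = 16
x = [math.sin(0.5 * n) for n in range(N)]  # true signal in the overlap
w_rise, w_fall = sine_halves(N)
recon = [a + b for a, b in zip(tcx_first_half(x, w_rise), synthetic_tdac(x, w_fall))]
print(max(abs(a - b) for a, b in zip(x, recon)))  # rounding-level when ACELP is exact
```

With an imperfect ACELP output, the residual is the ACELP error passed through the same windowed folding, so it is confined to the overlap region.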

Ralf Geiger, FhG, noted that an MDCT with a block length of 1088 requires a radix-17 transform stage, which may have complexity implications.
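The radix-17 remark follows from simple factorisation: 1088 = 2^6 × 17. Fast MDCT algorithms commonly reduce the transform to an FFT of half or a quarter of the block length, but halving only removes factors of 2, so a factor-17 stage remains whichever length the FFT uses. The helper below is a plain trial-division sketch.

```python
def prime_factors(n):
    """Prime factorisation as {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


for length in (1024, 1088, 1088 // 2, 1088 // 4):
    print(length, prime_factors(length))
# 1024 {2: 10}
# 1088 {2: 6, 17: 1}
# 544 {2: 5, 17: 1}
# 272 {2: 4, 17: 1}
```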

Philippe Gournay, VoiceAge, presented

m16672

Report on Unvoiced Speech Coding for USAC

The contribution reports on joint work of VoiceAge and Samsung. The goal was to merge two coding modes from CfP Sys4 (an unvoiced coding mode and a low-energy coding mode) into the USAC RM. These coding modes use a Gaussian codebook, which is known to be very efficient for weakly structured, noise-like signals such as unvoiced and low-energy segments. At the last MPEG meeting it was noted that TCX supports “noise fill-in” for regions of the audio spectrum, and it was investigated how this feature could serve such unvoiced or low-energy segments. Based on internal experiments, it was concluded that the TCX noise-fill tool performed at least as well as the proposed unvoiced and low-energy coding modes. Hence there is no need to alter the current WD3 technology, and the CE is concluded.
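The idea behind noise fill-in is to replace spectral lines that the quantizer set to zero with low-level random values, so that noise-like regions are not rendered as spectral holes. This is a generic sketch, not the WD3 TCX noise-fill algorithm: the start bin, the fixed level, the random-sign rule and the function name are all illustrative assumptions (in a real codec the level would be derived from transmitted side information).

```python
import random


def noise_fill(quantized, start_bin, level, seed=0):
    """Replace spectral lines quantized to zero (from start_bin upward) with
    random-sign values of magnitude `level`; nonzero lines are untouched."""
    rng = random.Random(seed)
    filled = list(quantized)
    for k in range(start_bin, len(filled)):
        if filled[k] == 0:
            filled[k] = level if rng.random() < 0.5 else -level
    return filled


spectrum = [12, -7, 3, 0, 1, 0, 0, 0, -1, 0, 0, 0]
print(noise_fill(spectrum, start_bin=4, level=0.4))
```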

Kristofer Kjörling, Dolby, presented



m16640

Progress report on harmonic transposer CE for the USAC work item

Kristofer Kjörling, Per Ekstrand, Lars Villemoes, Max Neuendorf, Markus Multrus

The contribution reports the current status of the CE. The work is expected to reduce complexity and to converge on perhaps a single synthesis filterbank instead of three. Furthermore, it will try to restrict the transposer tool to the transposition function alone and let other tools (e.g. the noise-fill tool) bring the decoder performance to the expected level.

The Chair presented the AhG report, received comments from the group, made modifications and the report was approved.


