Organisation internationale de normalisation






3 Record of AhG meetings

3.1 AhG Meeting: SAOC, Unified Speech and Audio, Sunday 1000-1700

3.1.1 SAOC 1000-1300


Oliver Hellmuth, FhG, presented

m15123

Information and Verification Results for CE on Karaoke/solo System Improving Performance of MPEG SAOC RM0

Oliver Hellmuth
Johannes Hilpert
Andreas Hölzer
Leonid Terentiev
Cornelia Falch

This contribution notes that RM0 does not provide a satisfactory level of performance for the difficult problem of muting a foreground object, as in the Karaoke application. It reviewed the technology proposed as a CE at the previous MPEG meeting. If the Foreground Object (FGO) is stereo, it proposes to cascade TTT-1 boxes and shows that such a cascade can be formulated as a TTN-1 box, where N=3 if the FGO is mono and N=4 if the FGO is stereo.

Listening test results were presented, comparing SAOC RM0 and SAOC with the new TTN technology. In global mean performance, TTN was better than SAOC RM0 in all tests at the 95% level of significance. Furthermore, for the operating points demonstrated, the SAOC TTN technology achieved scores that were solidly in the “good” region.

Heiko Purnhagen, Dolby Labs, presented

m15162

Cross Verification of SAOC CE on Karaoke enhancement

Jonas Engdegard

This contribution presents a listening test that provided a cross-check on the FhG Karaoke CE. In all cases, the mean performance of the TTN technology was better than the mean performance of RM0 at the 95% level of significance.

Henney Oh, LGE, noted that FhG presented no evidence of performance for energy mode, and that there is no basis for incorporating this operating mode into the SAOC WD. The Chair suggested that this could be provided at the next meeting, perhaps even as a collaboration between FhG and LG.

The AhG recommends that the Audio Subgroup accept the TTN prediction mode with residual coding into the SAOC WD.

Jeongil Seo, ETRI, presented

m15144

Consideration on enhanced Karaoke processing for stereo FGO

Jeongil Seo
Seungkwon Beack
Kwang-ki Kim
Kyeoungok Kang

This contribution notes that the current performance of SAOC RM0 in the karaoke application (i.e. suppression of the FGO) has limited quality. ETRI suggests an alternative structure for karaoke/solo modes based on a cascade of OTT boxes in the case of a stereo FGO. It further notes that the OTT box requires 2 parameters while the TTT box requires 3 parameters.

ETRI feels that the proposed technology can provide lower complexity and lower bitrate. The Chair welcomed ETRI to proceed with the CE, but noted that the proposed technology provides functionality similar to that of the FhG CE, which is recommended for acceptance into the SAOC WD. Hence there must be a significant increase in performance in order to displace the FhG CE technology. The Chair asked ETRI to give, sometime during the MPEG week, specific estimates of what resources, if any, ETRI might need from the SAOC group.

Henney Oh, LG, presented

m15112

Comments on SAOC applications and architectures

Henney Oh
Yang-Won Jung

The contribution makes three suggestions:


  • Downmix preprocessor – it suggests that mono to mono downmix be supported.

  • Binaural transcoder – it suggests incorporating a separate binaural synthesis engine into the SAOC decoder.

  • MBO architecture – it suggests that in the case of Multichannel Background Object (MBO), the downmix should be able to be either mono or stereo.

The Chair noted that the suggested modification for binaural transcoding provides no additional functionality compared to the combination of SAOC and MPEG Surround. Oliver Hellmuth, FhG, noted that in a real implementation, one is free to optimize the internals of how the SAOC and MPEG Surround functionality is combined.

The Chair suggested that it may be good to add an informative section to the SAOC specification on how to “collapse” SAOC and MPEG Surround functionalities in the case of a unified implementation.

It was agreed that interested parties should continue to discuss this contribution and report to the Audio Subgroup mid-week.

Osamu Shimada, NEC, presented

m15110

A core experiment proposal for an additional SAOC functionality of separating real-environment signals into multiple objects

Osamu Shimada
Toshiyuki Nomura
Akihiko Sugiyama
Osamu Hoshuyama

The contribution notes that SAOC does not provide information on the nature or relationship of the multiple objects in the SAOC bitstream such that the decoder can meaningfully decode and place objects in a multi-channel presentation.

Oliver Hellmuth, FhG, asked whether the current SAOC architecture with the addition of metadata that indicates that two objects are related (e.g. from the same microphone) could provide the same functionality. The Chair asked if NEC might clarify why the proposed technology (System 4) does not show significant improvement over what can be provided by the existing SAOC architecture (System 3).

In conclusion, the Chair suggested that NEC have discussions with interested parties during the first part of the MPEG week and make a mid-week presentation that addresses the issues raised.

3.1.2 Unified Speech and Audio 1400-1700


Kristofer Kjörling, Dolby, presented

m15158

Homework according to the joint speech and audio workplan

Kristofer Kjörling
Heiko Purnhagen

This contribution reports the information requested in “Workplan for Candidate Test Items.” It did not find permission information on the item from NRSC, but did give information on where to get the DC associated with other items.

Schuyler Quackenbush will contact David Layer, NRSC, to ask if MPEG can get access to this item.

In addition, it presented a table that recommends the downmix (either L or (L+R)/2) and the level adjustment, based on subjective evaluation.
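The two downmix options named above can be illustrated with a minimal sketch. The function name `downmix`, the mode labels and the `gain_db` parameter are hypothetical, chosen only for this illustration; the contribution's actual per-item recommendations are not reproduced here.

```python
def downmix(left, right, mode="mid", gain_db=0.0):
    """Downmix two equal-length sample lists to mono, then adjust the level.

    mode="left" keeps L only; mode="mid" computes (L+R)/2.
    gain_db is a level adjustment in decibels (hypothetical parameter).
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    gain = 10.0 ** (gain_db / 20.0)  # convert dB to a linear amplitude factor
    if mode == "left":
        mono = list(left)
    elif mode == "mid":
        mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    else:
        raise ValueError("mode must be 'left' or 'mid'")
    return [gain * s for s in mono]
```

Using (L+R)/2 rather than L+R keeps the mono signal from exceeding the range of its inputs, which is one plausible reason for pairing the downmix with a separate level adjustment.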

Schuyler Quackenbush, Audio Research Labs, presented



m15095

Collected Set of Possible Evaluation Guidelines

S. Quackenbush

This contribution is merely the collection of text from various audio experts that was available on the Friday of the 82nd MPEG meeting. The presenter highlighted areas in which a choice of methods must be made, but asked that discussion be deferred, as the remaining contributions will provide a better vehicle for discussion.

Werner Oomen, Philips, presented



m15155

Evaluation criteria and test items for unified speech and audio coding

Werner Oomen
Erik Schuijers

This contribution covers four topics:

  • Derivation of VC – for each item and each operating point a VC is selected.

  • Candidate test items – remove items that might duplicate the effect of concatenated test items.

  • Figure of Merit – system of assigning points.

  • Item Selection – to select a representative subset of the 38 items, as two sets: most critical items and items that are coded with very good performance.

The contribution presented the results of applying the item selection procedure using testing at 32 kb/s.

Kristofer Kjörling, Dolby, presented



m15160

Thoughts on evaluation criteria for joint speech and audio workitem

Kristofer Kjörling
Heiko Purnhagen

This contribution covers five topics:

  • Derivation of VC – for each item, each operating point and each test site, a VC is selected.

  • Figure of Merit – which operating points are evaluated, and how do we pick a winner.

  • Concatenate all test items to make a single item to code – this removes the opportunity for:

    • Per-item tuning

    • Bit buffer abuse

  • Speech to Music transition – such items should be removed from the test, since grading is difficult in the case that, e.g., speech is handled well and music is not.

  • Dolby endorses the notion of using items such as the “classic” 12 MPEG items for the speech and audio process, as these are difficult and diverse items that span a large space of possible encoder “tunings.” The Speech and Audio test set should be known at the close of the April MPEG meeting.


Johannes Boehm, Thomson, presented

m15145

Thoughts on Speech and Audio Evaluation Guidelines

Oliver Wuebbolt
Johannes Boehm

The contribution shows a method to combine the variances of a given system under test over all test sites. It recommends that the Evaluation Guidelines document:

  • Take care when building a measure of variance for use in determining the 95% CI on a global mean performance.

  • Specify in advance what the “additional information” might be when one must “consider additional information” in order to choose a best system when the Figure of Merit fails to decide a winner.
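The record does not describe the combination method itself. As an illustration only, one textbook way to combine variances over test sites is the pooled variance, which weights each site's sample variance by its degrees of freedom. The sketch below assumes that approach and should not be read as the method Thomson presented; `pooled_variance` and the example scores are hypothetical.

```python
from statistics import variance


def pooled_variance(site_scores):
    """Pool per-site sample variances, weighted by degrees of freedom.

    site_scores: list of score lists, one list per test site, all for
    the same system under test. Each site needs at least two scores.
    """
    weighted_sum = 0.0
    dof = 0
    for scores in site_scores:
        if len(scores) < 2:
            raise ValueError("each site needs at least two scores")
        weighted_sum += (len(scores) - 1) * variance(scores)
        dof += len(scores) - 1
    return weighted_sum / dof


# Hypothetical scores from two sites for one system.
sites = [[60, 70, 80], [50, 70]]
print(pooled_variance(sites))
```

Unlike the variance of all scores lumped together, the pooled form is not inflated by a constant offset between sites, which is the kind of care the first recommendation asks for.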

Miyoung Kim, Samsung, presented

m15118

Comments on Unified Speech and Audio CfP Evaluation Guidelines

Miyoung Kim
Eunmi Oh
JungHoe Kim

The contribution proposes to:

  • Determine VC by pooling over all test sites

  • Requirements – at 64 kb/s pool over all signal categories to get a single mean performance

The Chair noted that pooling over all signal categories will result in a smaller confidence interval for that one score and thus may make the proposed 64 kb/s requirement more difficult to fulfil.
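The Chair's point follows from the confidence interval half-width scaling roughly as 1/sqrt(n). A minimal sketch, assuming a normal-approximation 95% interval and hypothetical scores for three signal categories with similar means (with few listeners a Student-t quantile would be more appropriate):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ci95(scores):
    """Return (mean, half_width) of a normal-approximation 95% CI on the mean."""
    z = NormalDist().inv_cdf(0.975)  # two-sided 95% quantile, about 1.96
    return mean(scores), z * stdev(scores) / sqrt(len(scores))


# Hypothetical listening-test scores, one list per signal category.
speech = [70, 74, 78, 82]
music = [68, 72, 80, 84]
mixed = [66, 74, 78, 86]

for name, scores in (("speech", speech), ("music", music), ("mixed", mixed)):
    print(name, ci95(scores))
print("pooled", ci95(speech + music + mixed))
```

With these numbers the pooled interval is narrower than any single-category interval, so a requirement stated against the pooled score is a tighter test. The effect weakens if the category means differ widely, since that spread then enters the pooled standard deviation.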

Markus Multrus, FhG, presented



m15165

Comments on Speech and Audio Evaluation Guidelines

Ralf Geiger
Markus Multrus
Bernhard Grill

The contribution raises a number of issues:

  • Confidence intervals on the grand mean performance should be used when comparing the performance of systems under test.

The “winner” amongst systems with overlapping confidence intervals should be selected by considering additional information such as:

  • Operation at higher bitrates, e.g. 128 kb/s

  • That re-use of existing MPEG technology is desirable

Miyoung Kim, Samsung, noted that it is undesirable to delay the selection process by running another listening test to get additional information. Anisse Taleb, Ericsson, stated that we cannot ask for subjective performance information at 128 kb/s because that operating point is not listed in the Call, and the Chair agreed with that statement. Ralf Geiger, FhG, noted that in a deadlocked situation an additional listening test may be the quickest way to resolve the deadlock.
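The deadlock discussed above arises when the best-scoring system's confidence interval overlaps that of one or more competitors. A minimal sketch of detecting that situation, assuming each system is summarised by a grand mean and a 95% CI half-width; the function name `tied_with_best` and the example figures are hypothetical:

```python
def tied_with_best(systems):
    """Return the names of systems whose CI overlaps the top scorer's CI.

    systems: dict mapping system name -> (grand_mean, ci_half_width).
    Two intervals overlap when the distance between the means does not
    exceed the sum of the half-widths. The top scorer is always included.
    """
    best = max(systems, key=lambda name: systems[name][0])
    best_mean, best_hw = systems[best]
    return sorted(
        name
        for name, (m, hw) in systems.items()
        if abs(best_mean - m) <= best_hw + hw
    )


# Hypothetical results: A and B overlap, C is clearly worse.
results = {"A": (80.0, 3.0), "B": (76.0, 2.0), "C": (70.0, 2.0)}
print(tied_with_best(results))
```

If the returned list holds more than one name, the Figure of Merit alone does not decide a winner and the additional information agreed in advance would have to be consulted.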

Schuyler Quackenbush, Audio Research Labs, presented



m15096

Draft Workplan for Testing of SA Proposals

S. Quackenbush

This is a skeleton for the final workplan document. The presenter asked that interested audio experts please read and provide comments on components that are missing or could be improved.

The Chair presented the AhG report, which was approved by the AhG members present.


