3 Record of AhG meetings
3.1 AhG Meeting SAOC, Unified Speech and Audio Sunday 1000-1700
3.1.1 SAOC 1000-1300
Oliver Hellmuth, FhG, presented
m15123 | Information and Verification Results for CE on Karaoke/solo System Improving Performance of MPEG SAOC RM0 | Oliver Hellmuth, Johannes Hilpert, Andreas Hölzer, Leonid Terentiev, Cornelia Falch |
This contribution notes that RM0 does not provide a very satisfying level of performance for the difficult problem of muting a foreground object, as in the Karaoke application. It reviewed the technology proposed as a CE at the previous MPEG meeting. If the Foreground Object (FGO) is stereo, it proposes to cascade TTT-1 boxes and shows that such a cascade can be formulated as a TTN-1 box, where N=3 if the FGO is mono and N=4 if the FGO is stereo.
Listening test results were presented, comparing SAOC RM0 and SAOC with the new TTN technology. In global mean performance, TTN was better than SAOC RM0 in all tests at the 95% level of significance. Furthermore, for the operating points demonstrated, the SAOC TTN technology achieved scores that were solidly in the “good” region.
Heiko Purnhagen, Dolby Labs, presented
m15162 | Cross Verification of SAOC CE on Karaoke enhancement | Jonas Engdegard |
This contribution presents a listening test that provided a cross-check on the FhG Karaoke CE. In all cases, the mean performance of the TTN technology was better than the mean performance of RM0 at the 95% level of significance.
Henney Oh, LGE, noted that FhG presented no evidence of performance for energy mode, and that there is no basis for incorporating this operating mode into the SAOC WD. The Chair suggested that this could be provided at the next meeting, perhaps even as a collaboration between FhG and LG.
The AhG recommends that the Audio Subgroup accept the TTN prediction mode with residual coding into the SAOC WD.
Jeongil Seo, ETRI, presented
m15144 | Consideration on enhanced Karaoke processing for stereo FGO | Jeongil Seo, Seungkwon Beack, Kwang-ki Kim, Kyeoungok Kang |
This contribution notes that the current performance of SAOC RM0 in the karaoke application (i.e. suppression of the FGO) has limited quality. ETRI suggests an alternative structure for karaoke/solo modes based on a cascade of OTT boxes in the case of a stereo FGO. It further notes that the OTT box requires 2 parameters while the TTT box requires 3 parameters.
ETRI feels that the proposed technology can provide lower complexity and lower bitrate. The Chair welcomed ETRI to proceed with the CE, but noted that the proposed technology provides functionality similar to that of the FhG CE, which is recommended to be accepted into the SAOC WD. Hence there must be a significant increase in performance in order to displace the FhG CE technology. The Chair asked ETRI to give specific estimates of what, if any, resources ETRI might need from the SAOC sometime during the MPEG week.
Henney Oh, LG, presented
m15112 | Comments on SAOC applications and architectures | Henney Oh, Yang-Won Jung |
The contribution makes three suggestions:
- Downmix preprocessor – it suggests that mono to mono downmix be supported.
- Binaural transcoder – it suggests incorporating a separate binaural synthesis engine into the SAOC decoder.
- MBO architecture – it suggests that in the case of Multichannel Background Object (MBO), the downmix should be able to be either mono or stereo.
The Chair noted that the suggested modification for binaural transcoding provides no additional functionality as compared to the SAOC and MPEG Surround combination. Oliver Hellmuth, FhG, noted that in a real implementation, one is free to optimize the internals relating to how the SAOC and MPEG Surround functionality are combined.
The Chair suggested that it may be good to add an informative section to the SAOC specification on how to “collapse” SAOC and MPEG Surround functionalities in the case of a unified implementation.
It was agreed that interested parties should continue to discuss this contribution and report to the Audio Subgroup mid-week.
Osamu Shimada, NEC, presented
m15110 | A core experiment proposal for an additional SAOC functionality of separating real-environment signals into multiple objects | Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama, Osamu Hoshuyama |
The contribution notes that SAOC does not provide information on the nature or relationship of the multiple objects in the SAOC bitstream such that the decoder can meaningfully decode and place objects in a multi-channel presentation.
Oliver Hellmuth, FhG, asked whether the current SAOC architecture with the addition of metadata that indicates that two objects are related (e.g. from the same microphone) could provide the same functionality. The Chair asked if NEC might clarify why the proposed technology (System 4) does not show significant improvement over what can be provided by the existing SAOC architecture (System 3).
In conclusion, the Chair suggested that NEC have discussions with interested parties during the first part of the MPEG week and make a mid-week presentation that addresses the issues raised.
3.1.2 Unified Speech and Audio 1400-1700
Kristofer Kjörling, Dolby, presented
m15158 | Homework according to the joint speech and audio workplan | Kristofer Kjörling, Heiko Purnhagen |
This contribution reports the information requested in “Workplan for Candidate Test Items.” It did not find permission information on the item from NRSC, but did give information on where to get the DC associated with other items.
Schuyler Quackenbush will contact David Layer, NRSC, to ask if MPEG can get access to this item.
In addition, it presented a table that recommends a downmix (either L or (L+R)/2) and a level adjustment, based on subjective evaluation.
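The two downmix options named in the table can be illustrated with a minimal sketch. The function name `downmix`, the mode labels, and the expression of the level adjustment as a gain in dB are assumptions made for illustration; the contribution's table is not reproduced here.

```python
def downmix(left, right, mode, gain_db=0.0):
    """Downmix two channels (lists of samples) to mono.

    mode "L"   : keep the left channel only
    mode "avg" : use (L + R) / 2
    gain_db    : hypothetical level adjustment, in dB, applied afterwards
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    if mode == "L":
        mono = list(left)
    elif mode == "avg":
        mono = [(l + r) / 2 for l, r in zip(left, right)]
    else:
        raise ValueError("mode must be 'L' or 'avg'")
    gain = 10 ** (gain_db / 20)  # convert dB to a linear amplitude factor
    return [gain * s for s in mono]
```

For instance, `downmix([1.0, 0.5], [0.0, 0.5], "avg")` averages the channels sample by sample, while `mode="L"` discards the right channel entirely.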
Schuyler Quackenbush, Audio Research Labs, presented
m15095 | Collected Set of Possible Evaluation Guidelines | S. Quackenbush |
This contribution is merely the collection of text from various audio experts that was available on the Friday of the 82nd MPEG meeting. The presenter highlighted areas in which a choice of methods must be made, but asked that discussion be deferred, as the remaining contributions will provide a better vehicle for discussion.
Werner Oomen, Philips, presented
m15155 | Evaluation criteria and test items for unified speech and audio coding | Werner Oomen, Erik Schuijers |
This contribution covers four topics:
- Derivation of VC – for each item and each operating point a VC is selected.
- Candidate test items – remove items that might duplicate the effect of concatenated test items.
- Figure of Merit – system of assigning points.
- Item Selection – to select a representative subset of the 38 items, as two sets: most critical items and items that are coded with very good performance.
The contribution presented the results of applying the item selection procedure using testing at 32 kb/s.
Kristofer Kjörling, Dolby, presented
m15160 | Thoughts on evaluation criteria for joint speech and audio workitem | Kristofer Kjörling, Heiko Purnhagen |
This contribution covers five topics:
- Derivation of VC – for each item, each operating point and each test site, a VC is selected.
- Figure of Merit – which operating points are evaluated, and how do we pick a winner.
- Concatenate all test items to make a single item to code – this removes the opportunity for:
  - Per-item tuning
  - Bit buffer abuse
- Speech to Music transition – such items should be removed from the test, since grading is difficult in the case that, e.g., speech is handled well and music is not.
- Dolby endorses the notion of using items such as the “classic” 12 MPEG items for the speech and audio process, as these are difficult and diverse items that span a large space of possible encoder “tunings.” The Speech and Audio test set should be known at the close of the April MPEG meeting.
Johannes Boehm, Thomson, presented
m15145 | Thoughts on Speech and Audio Evaluation Guidelines | Oliver Wuebbolt, Johannes Boehm |
The contribution shows a method to combine the variances of a given system under test over all test sites. It recommends that the Evaluation Guidelines document:
- Take care when building a measure of variance for use in determining the 95% CI on a global mean performance.
- Specify in advance what the information might be when one must “consider additional information” in order to choose a best system when the Figure of Merit fails to decide a winner.
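As an illustration of why the variance measure matters, the following is a minimal sketch of one textbook way to combine per-site variances: the pooled within-site variance, weighted by degrees of freedom. This is not necessarily the method shown in m15145. The function name `pooled_ci`, the example scores, and the normal-approximation factor z = 1.96 are all assumptions.

```python
import math
from statistics import variance

def pooled_ci(scores_by_site, z=1.96):
    """Return (grand_mean, pooled_variance, ci_half_width).

    scores_by_site: one list of listener scores per test site, each with
    at least two scores. The pooled variance weights each site's sample
    variance by its degrees of freedom (n_site - 1).
    z = 1.96 assumes a normal approximation for the 95% interval.
    """
    n_total = sum(len(site) for site in scores_by_site)
    n_sites = len(scores_by_site)
    pooled_var = sum(
        (len(site) - 1) * variance(site) for site in scores_by_site
    ) / (n_total - n_sites)
    grand_mean = sum(sum(site) for site in scores_by_site) / n_total
    half_width = z * math.sqrt(pooled_var / n_total)
    return grand_mean, pooled_var, half_width

# Hypothetical scores from two test sites for one system under test.
sites = [[60, 70, 80], [50, 60, 70]]
grand_mean, pooled_var, half_width = pooled_ci(sites)
```

Pooling within sites in this way keeps a systematic offset between sites from inflating the variance estimate, which is one reason the choice of variance measure affects the resulting 95% CI.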
Miyoung Kim, Samsung, presented
m15118 | Comments on Unified Speech and Audio CfP Evaluation Guidelines | Miyoung Kim, Eunmi Oh, JungHoe Kim |
The contribution proposes to:
- Determine VC by pooling over all test sites
- Requirements – at 64 kb/s pool over all signal categories to get a single mean performance
The Chair noted that pooling over all signal categories will result in a smaller confidence interval for that one score and thus may make the proposed 64 kb/s requirement more difficult to fulfil.
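The Chair's point about interval width can be sketched numerically. The scores and category names below are hypothetical, and the interval uses a normal approximation (z = 1.96) rather than any procedure from the Call. The effect holds when the categories have comparable means and spreads: the half-width shrinks roughly as one over the square root of the number of pooled scores.

```python
import math
from statistics import stdev

def ci_half_width(scores, z=1.96):
    """95% CI half-width of the mean, normal approximation (assumed)."""
    return z * stdev(scores) / math.sqrt(len(scores))

# Hypothetical listener scores per signal category at one bitrate.
categories = {
    "speech": [60, 65, 70, 75],
    "music": [62, 66, 70, 74],
    "mixed": [61, 66, 71, 76],
}

per_category = {name: ci_half_width(s) for name, s in categories.items()}

# Pooling concatenates all scores into a single sample.
pooled_scores = [x for s in categories.values() for x in s]
pooled = ci_half_width(pooled_scores)
```

With a narrower interval around the single pooled score, a requirement that is stated relative to that interval leaves less margin, which is the difficulty the Chair described.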
Markus Multrus, FhG, presented
m15165 | Comments on Speech and Audio Evaluation Guidelines | Ralf Geiger, Markus Multrus, Bernhard Grill |
The contribution raises a number of issues:
- Confidence intervals on the grand mean performance should be used when comparing the performance of systems under test.
The “winner” amongst systems with overlapping confidence intervals should be selected by considering additional information such as:
- Operation at higher bitrates, e.g. 128 kb/s
- That re-use of existing MPEG technology is desirable
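The comparison described in these issues can be sketched as a simple overlap check. The function names and the rule of returning no winner when intervals overlap are illustrative assumptions, not text from the contribution or the Evaluation Guidelines.

```python
def intervals_overlap(mean_a, half_width_a, mean_b, half_width_b):
    """True if [mean - hw, mean + hw] for A and B share any point."""
    return abs(mean_a - mean_b) <= half_width_a + half_width_b

def pick_winner(systems):
    """systems: dict of name -> (grand_mean, ci_half_width).

    Returns the name of the top-scoring system if its interval is
    disjoint from every other system's interval, otherwise None to
    signal that additional information would be needed (assumed rule).
    """
    best = max(systems, key=lambda name: systems[name][0])
    best_mean, best_hw = systems[best]
    for name, (mean, hw) in systems.items():
        if name != best and intervals_overlap(best_mean, best_hw, mean, hw):
            return None
    return best
```

A `None` result corresponds to the deadlocked situation discussed in these minutes, where the grand mean scores alone do not separate the systems.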
Miyoung Kim, Samsung, noted that it is undesirable to delay the selection process by running another listening test to get additional information. Anisse Taleb, Ericsson, stated that we cannot ask for subjective performance information at 128 kb/s because that operating point is not listed in the Call, and the Chair agreed with that statement. Ralf Geiger, FhG, noted that in a deadlocked situation an additional listening test may be the quickest way to resolve the deadlock.
Schuyler Quackenbush, Audio Research Labs, presented
m15096
|
Draft Workplan for Testing of SA Proposals
|
S. Quackenbush
|
This is a skeleton for the final workplan document. The presenter asked that interested audio experts please read and provide comments on components that are missing or could be improved.
The Chair presented the AhG report, which was approved by the AhG members present.