Discussion
Oliver Hellmuth, FhG, felt that the test was difficult in that the two test items to be compared both had poor quality but were distorted in very different manners. Hence, responding with “A same as B” was clearly not appropriate, but selecting “A better than B” might not be appropriate either.
Heiko Purnhagen, Dolby, agreed with the previous comment. He noted that if the “0” response were “I have no preference between A and B” the test outcome might have been quite different. The Chair noted that the perfect test would be hardware with a knob. This could be simulated with a MUSHRA test having 10 items for 10 knob settings and no reference, asking users to move the sliders to reflect how much they liked the quality of each item.
Ken Sugiyama, NEC, noted that the test paradigm did force users to select between distortion and residual noise, and hence was an appropriate test setup. Leonid, FhG, noted that if the zero response values are excluded, it is not clear that there is a bimodal distribution in the data. He agreed that a MUSHRA test might be a suitable alternative.
Since there were a number of concerns raised by audio experts, it was the consensus of the AhG to have further discussion in a break-out group of the Audio Subgroup, possibly resulting in a workplan for additional testing.
Osamu Shimada, NEC, presented
m15407 | A proposal for test methodology of Test2 for SAOC CE on the functionality of separating real-environment signals into multiple objects | Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama, Osamu Hoshuyama
This contribution notes that:
- Test 1 evaluates the effectiveness of the user-controllability over the object separation process.
- Test 2 evaluates the quality improvement resulting from user-controllability over the object separation process.
It presents details of Test 2, which is proposed to be a two-stage test. The first stage selects the best parameters for an encoder-based separation and for the proposed user-controllable separation; this stage uses a MUSHRA-like test in which there is no Reference, Hidden Reference or Low-Pass Anchor. The second stage compares the two best outcomes to determine which framework (encoder-based separation or user-controllable separation) is preferred.
Results for Test 2 conducted at NEC were presented.
There was some discussion concerning the selected two-stage test methodology. An alternative test methodology would be to put all test conditions into a single MUSHRA-like test.
The proposal asks to proceed to cross-check. The Chair noted that a CE can always proceed to the cross-check phase, but that concerns were raised about Test 1 and that there may be an opportunity to address those concerns in the cross-check of Test 2.
Osamu Shimada, NEC, presented
m15408 | A proposal of additional information for implementing the separation functionality by SAOC RM0 | Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama, Osamu Hoshuyama
The contribution proposes additions to WD syntax and semantics to support the real-environment separation functionality. Two additional fields are proposed for the SAOC header:
- Origin – used to indicate which input channel (i.e. microphone) an object came from.
- Attribute – used to indicate the nature of the object (e.g. speech, background noise, background music, babble noise).
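The two proposed header fields could be modelled as in the sketch below. This is a hypothetical illustration only: the contribution does not fix numeric codes or a field layout, and the `Attribute` code values and the `SaocObjectInfo` container are invented for this example.

```python
# Hypothetical illustration of the two proposed SAOC header fields (m15408).
# The attribute categories come from the contribution's examples; the numeric
# codes and the container class are invented for illustration.
from dataclasses import dataclass
from enum import Enum

class Attribute(Enum):
    SPEECH = 0
    BACKGROUND_NOISE = 1
    BACKGROUND_MUSIC = 2
    BABBLE_NOISE = 3

@dataclass
class SaocObjectInfo:
    origin: int           # input channel (microphone) the object came from
    attribute: Attribute  # nature of the object

obj = SaocObjectInfo(origin=2, attribute=Attribute.SPEECH)
print(obj.origin, obj.attribute.name)
```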
Oliver Hellmuth, FhG, noted that this might also be signalled using the SAOC meta-data. Osamu Shimada, NEC, noted that language dependency in user-defined meta-data (e.g. German vs. Japanese) might make its interpretation difficult.
Werner Oomen, Philips, noted that encoders might not be able to provide this information, in which case the proposed syntax fields would be empty. Osamu Shimada, NEC, noted that an encoder could automatically determine the attribute. The user could experience
It was the consensus of the AhG to have further discussion in a break-out group of the Audio Subgroup.
The remaining contributions on SAOC were presented during the MPEG week.
Taejin Lee, ETRI, presented
m15362 | Evaluation of test items for Unified Speech and Audio Coding | Taejin Lee, Minje Kim, Seungkwon Beack, Kyeongok Kang
The contribution selected two mono items and two stereo items for each content category. The selection was based on the maximum difference in score between the two reference codecs. The following table shows the resulting test item selection.
| | Speech | Mixed | Music |
| Mono Max Difference 1 | Arirang_speech | Lion | Phi3 |
| Mono Max Difference 2 | Wedding_speech | Te16_fe49 | Music_4 |
| Stereo Max Difference 1 | Green_speech | Alice | Music_1 |
| Stereo Max Difference 2 | KoreanM1 | SpeechOverMusic_4 | Music_3 |
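The max-difference selection rule described above can be sketched as follows. The item names and scores in this example are illustrative, not the actual listening test data.

```python
# Hypothetical sketch of the m15362 selection rule: pick the items whose
# scores differ most between the two reference codecs (HE-AAC v2, AMR-WB+).
# Item names and scores below are illustrative, not the actual test data.
def select_max_difference(scores, n=2):
    """scores maps item -> (score_codec_a, score_codec_b)."""
    gap = lambda item: abs(scores[item][0] - scores[item][1])
    return sorted(scores, key=gap, reverse=True)[:n]

scores = {
    "item_a": (3.1, 4.0),  # gap 0.9
    "item_b": (2.5, 2.6),  # gap 0.1
    "item_c": (4.2, 2.9),  # gap 1.3
}
print(select_max_difference(scores))  # items with the largest gaps first
```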
Kristofer Kjörling, Dolby, presented
m15400 | Proposal for item selection for the Unified Speech and Audio Coding CfP | Kristofer Kjörling, Heiko Purnhagen, Lars Villemoes
This contribution used the following methodology for item selection:
- for the speech and the music categories, from the available test data, select the items with the largest difference between the two reference codecs;
- for the mixed category, apply a method similar to that of m15155, according to:
  - min of HE-AAC v2 performance
  - min of AMR-WB+ performance
  - min of VC performance
  - max of VC performance
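A minimal sketch of applying these per-codec min/max criteria to the mixed category is given below. The codec names are from the contribution; the item names and scores are illustrative.

```python
# Hypothetical sketch of the per-codec min/max criteria from m15400 for the
# mixed category. Codec names are from the contribution; scores illustrative.
def pick_items(scores):
    """scores maps item -> {codec: score}; returns one item per criterion."""
    by = lambda codec: (lambda item: scores[item][codec])
    return {
        "min HE-AAC v2": min(scores, key=by("HE-AAC v2")),
        "min AMR-WB+": min(scores, key=by("AMR-WB+")),
        "min VC": min(scores, key=by("VC")),
        "max VC": max(scores, key=by("VC")),
    }

scores = {
    "mix_1": {"HE-AAC v2": 2.0, "AMR-WB+": 3.5, "VC": 4.0},
    "mix_2": {"HE-AAC v2": 3.0, "AMR-WB+": 2.0, "VC": 2.5},
    "mix_3": {"HE-AAC v2": 3.8, "AMR-WB+": 3.0, "VC": 4.5},
}
print(pick_items(scores))
```

Note that nothing forces the four criteria to pick four distinct items; in practice the selection would be deduplicated across categories.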
Kristofer Kjörling stated that Dolby experts would be willing to apply these criteria to the listening test data of other test sites, and to bring the results to the Audio Subgroup during the MPEG week. He also suggested that there should be a diversity of languages and sound stage. The Chair encouraged all test sites to share their data, and will make available an Excel spreadsheet template for sharing this data.
Werner Oomen, Philips, presented
m15422 | Proposal for test items for unified speech and audio coding | Werner Oomen, Erik Schuijers
The contribution proposed the following rules:
- min(HE-AAC v2) – improve worst-case behavior for frameworks based on the HE-AAC v2 structure
- min(AMR-WB+) – improve worst-case behavior for frameworks based on the AMR-WB+ structure
- min(VC) – improve worst-case behavior for the virtual coder
- max(VC) for the music and mixed music/speech categories – no compromise on best-case behavior
- mean(VC) for the speech category – to exclude selection of the very dry speech items. Such items are not envisioned in the use cases of unified speech and audio coding.
Eunmi Oh, Samsung, asked why the last criterion was mean(VC) as opposed to max(VC). Werner Oomen responded that max(VC) would tend to favor a coder using a pure speech model operating on clean speech signals. He further noted that the contribution’s selection process did not use the mean(VC) criterion. Heiko Purnhagen, Dolby, suggested that there should be a diversity of sound stage or nature of “difficulty”, in that some speech items present the same stereo properties, and we might not want to choose more than one item with a given “difficult” property.
The items selected were:
| Criteria | Speech | Speech over Music | Music |
| 1 min (HE-AAC v2) | Wedding_speech | HarryPotter | Salvation |
| 2 min (AMR-WB+) | Arirang_speech | Phi6 | Music_1 |
| 3 min (VC) | Green_speech | Alice | Music_3 |
| 4 max (VC) | Louis_raquin_15 | SpeechOverMusic5 | Sc03 |
Redwan Salami, Voice Age, presented
m15424 | Test Items Selection for Unified Speech and Audio Coding | Redwan Salami, Jimmy Lapierre, Philippe Gournay
The contribution proposed to select items based on the largest difference between HE-AAC v2 and AMR-WB+, computing that difference over the three mono bitrates.
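This per-item criterion can be sketched as below, assuming a score per codec at each of the three mono bitrates. The bitrate labels and scores are illustrative, not the actual test data.

```python
# Hypothetical sketch of the m15424 criterion: rank an item by the score
# difference between HE-AAC v2 and AMR-WB+ summed over the three mono
# bitrates. Bitrate labels and scores below are illustrative.
def mono_difference(item_scores):
    """item_scores maps bitrate -> (heaac_v2_score, amrwbplus_score)."""
    return sum(abs(a - b) for a, b in item_scores.values())

item = {"rate_1": (2.0, 3.0), "rate_2": (2.5, 3.1), "rate_3": (3.2, 3.4)}
print(round(mono_difference(item), 2))  # summed gap across bitrates
```

Items would then be ranked by this summed difference and the largest-difference items selected.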
Werner Oomen, Philips, asked why only mono was investigated. Redwan Salami responded that this tested the basic coding engine, in that stereo might be a tool that could be applied to any underlying coding engine.
Items selected were: