Discussion
Oliver Hellmuth, FhG, felt that the test was difficult in that the two test items to be compared both had poor quality but were distorted in very different manners. Hence, responding with “A same as B” was clearly not appropriate, but selecting “A better than B” might not be appropriate either.
Heiko Purnhagen, Dolby, agreed with the previous comment. He noted that if the “0” response were “I have no preference between A and B” the test outcome might have been quite different. The Chair noted that the perfect test would be hardware with a knob. This could be simulated with a MUSHRA test having 10 items for 10 knob settings and no reference, asking users to move the sliders to reflect how much they liked the quality of each item.
Ken Sugiyama, NEC, noted that the test paradigm did force users to select between distortion and residual noise, and hence was an appropriate test setup. Leonid, FhG, noted that if the zero response values are excluded, it is not clear that there is a bimodal distribution in the data. He agreed that a MUSHRA test might be a suitable alternative.
Since there were a number of concerns raised by audio experts, it was the consensus of the AhG to have further discussion in a break-out group of the Audio Subgroup, possibly resulting in a workplan for additional testing.
Osamu Shimada, NEC, presented
m15407 | A proposal for test methodology of Test2 for SAOC CE on the functionality of separating real-environment signals into multiple objects | Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama, Osamu Hoshuyama
This contribution notes that:
- Test 1 evaluates the effectiveness of the user-controllability over the object separation process.
- Test 2 evaluates the quality improvement resulting from user-controllability over the object separation process.
It presents details of Test 2, which is proposed to be a two-stage test. The first stage selects the best parameters for an encoder-based separation and for the proposed user-controllable separation; this stage uses a MUSHRA-like test in which there is no Reference, Hidden Reference or Low-Pass Anchor. The second stage compares the two best outcomes to determine which framework (encoder-based separation or user-controllable separation) is preferred.
Results for Test 2 conducted at NEC were presented.
There was some discussion concerning the selected two-stage test methodology. An alternative test methodology would be to put all test conditions into a single MUSHRA-like test.
The proposal asks to proceed to cross-check. The Chair noted that a CE can always proceed to the cross-check phase, but that concerns were raised about Test 1 and that there may be an opportunity to address those concerns in the cross-check of Test 2.
Osamu Shimada, NEC, presented
m15408 | A proposal of additional information for implementing the separation functionality by SAOC RM0 | Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama, Osamu Hoshuyama
The contribution proposes additions to WD syntax and semantics to support the real-environment separation functionality. Two additional fields are proposed for the SAOC header:
- Origin – used to indicate which input channel (i.e. microphone) an object came from.
- Attribute – used to indicate the nature of the object (e.g. speech, background noise, background music, babble noise).
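The two proposed header fields could be modelled as in the sketch below. This is a hypothetical illustration only: the contribution does not fix numeric codes or a field layout, and the `Attribute` code values and the `SaocObjectInfo` container are invented for this example.

```python
# Hypothetical illustration of the two proposed SAOC header fields (m15408).
# The attribute categories come from the contribution's examples; the numeric
# codes and the container class are invented for illustration.
from dataclasses import dataclass
from enum import Enum

class Attribute(Enum):
    SPEECH = 0
    BACKGROUND_NOISE = 1
    BACKGROUND_MUSIC = 2
    BABBLE_NOISE = 3

@dataclass
class SaocObjectInfo:
    origin: int           # input channel (microphone) the object came from
    attribute: Attribute  # nature of the object

obj = SaocObjectInfo(origin=2, attribute=Attribute.SPEECH)
print(obj.origin, obj.attribute.name)
```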
Oliver Hellmuth, FhG, noted that this might also be signalled using the SAOC meta-data. Osamu Shimada, NEC, noted that language dependency in user-defined meta-data (e.g. German vs. Japanese) might make its interpretation difficult.
Werner Oomen, Philips, noted that encoders might not be able to provide this information, in which case the proposed syntax fields would be empty. Osamu Shimada, NEC, noted that an encoder could automatically determine the attribute. The user could experience
It was the consensus of the AhG to have further discussion in a break-out group of the Audio Subgroup.
The remaining contributions on SAOC were presented during the MPEG week.
Taejin Lee, ETRI, presented
m15362 | Evaluation of test items for Unified Speech and Audio Coding | Taejin Lee, Minje Kim, Seungkwon Beack, Kyeongok Kang
The contribution selected two mono items and two stereo items for each content category. The selection was based on the maximum difference in score between the two reference codecs. The following table shows the resulting test item selection.
| | Speech | Mixed | Music |
| Mono Max Difference 1 | Arirang_speech | Lion | Phi3 |
| Mono Max Difference 2 | Wedding_speech | Te16_fe49 | Music_4 |
| Stereo Max Difference 1 | Green_speech | Alice | Music_1 |
| Stereo Max Difference 2 | KoreanM1 | SpeechOverMusic_4 | Music_3 |
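The max-difference selection rule described above can be sketched as follows. The item names and scores in this example are illustrative, not the actual listening test data.

```python
# Hypothetical sketch of the m15362 selection rule: pick the items whose
# scores differ most between the two reference codecs (HE-AAC v2, AMR-WB+).
# Item names and scores below are illustrative, not the actual test data.
def select_max_difference(scores, n=2):
    """scores maps item -> (score_codec_a, score_codec_b)."""
    gap = lambda item: abs(scores[item][0] - scores[item][1])
    return sorted(scores, key=gap, reverse=True)[:n]

scores = {
    "item_a": (3.1, 4.0),  # gap 0.9
    "item_b": (2.5, 2.6),  # gap 0.1
    "item_c": (4.2, 2.9),  # gap 1.3
}
print(select_max_difference(scores))  # items with the largest gaps first
```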
Kristofer Kjörling, Dolby, presented
m15400 | Proposal for item selection for the Unified Speech and Audio Coding CfP | Kristofer Kjörling, Heiko Purnhagen, Lars Villemoes
This contribution used the following methodology for item selection:
- for the speech and the music categories, from the available test data, select the items with the largest difference between the two reference codecs;
- for the mixed category, apply a method similar to that of m15155, according to:
  - min of HE-AAC v2 performance
  - min of AMR-WB+ performance
  - min of VC performance
  - max of VC performance
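A minimal sketch of applying these per-codec min/max criteria to the mixed category is given below. The codec names are from the contribution; the item names and scores are illustrative.

```python
# Hypothetical sketch of the per-codec min/max criteria from m15400 for the
# mixed category. Codec names are from the contribution; scores illustrative.
def pick_items(scores):
    """scores maps item -> {codec: score}; returns one item per criterion."""
    by = lambda codec: (lambda item: scores[item][codec])
    return {
        "min HE-AAC v2": min(scores, key=by("HE-AAC v2")),
        "min AMR-WB+": min(scores, key=by("AMR-WB+")),
        "min VC": min(scores, key=by("VC")),
        "max VC": max(scores, key=by("VC")),
    }

scores = {
    "mix_1": {"HE-AAC v2": 2.0, "AMR-WB+": 3.5, "VC": 4.0},
    "mix_2": {"HE-AAC v2": 3.0, "AMR-WB+": 2.0, "VC": 2.5},
    "mix_3": {"HE-AAC v2": 3.8, "AMR-WB+": 3.0, "VC": 4.5},
}
print(pick_items(scores))
```

Note that nothing forces the four criteria to pick four distinct items; in practice the selection would be deduplicated across categories.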
Kristofer Kjörling stated that Dolby experts would be willing to apply these criteria to the listening test data of other test sites, and to bring the results to the Audio Subgroup during the MPEG week. He also suggested that there should be a diversity of languages and sound stage. The Chair encouraged all test sites to share their data, and will make available an Excel spreadsheet template for sharing this data.
Werner Oomen, Philips, presented
m15422 | Proposal for test items for unified speech and audio coding | Werner Oomen, Erik Schuijers
The contribution proposed the following rules:
- min(HE-AAC v2) – improve worst-case behavior for frameworks based on the HE-AAC v2 structure
- min(AMR-WB+) – improve worst-case behavior for frameworks based on the AMR-WB+ structure
- min(VC) – improve worst-case behavior for the virtual coder
- max(VC) for the music and mixed music/speech categories – no compromise on best-case behavior
- mean(VC) for the speech category – to exclude selection of the very dry speech items. Such items are not envisioned in the use cases of unified speech and audio coding.
Eunmi Oh, Samsung, asked why the last criterion was mean(VC) as opposed to max(VC). Werner Oomen responded that max(VC) would tend to favor a coder using a pure speech model operating on clean speech signals. He further noted that the contribution’s selection process did not use the mean(VC) criterion. Heiko Purnhagen, Dolby, suggested that there should be a diversity of sound stage or nature of “difficulty”, in that some speech items present the same stereo properties, and we might not want to choose more than one item with a given “difficult” property.
The items selected were:
| Criteria | Speech | Speech over Music | Music |
| 1 min (HE-AAC v2) | Wedding_speech | HarryPotter | Salvation |
| 2 min (AMR-WB+) | Arirang_speech | Phi6 | Music_1 |
| 3 min (VC) | Green_speech | Alice | Music_3 |
| 4 max (VC) | Louis_raquin_15 | SpeechOverMusic5 | Sc03 |
Redwan Salami, Voice Age, presented
m15424 | Test Items Selection for Unified Speech and Audio Coding | Redwan Salami, Jimmy Lapierre, Philippe Gournay
The contribution proposed to select items based on the largest difference between HE-AAC v2 and AMR-WB+, computing that difference over the three mono bitrates.
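This per-item criterion can be sketched as below, assuming a score per codec at each of the three mono bitrates. The bitrate labels and scores are illustrative, not the actual test data.

```python
# Hypothetical sketch of the m15424 criterion: rank an item by the score
# difference between HE-AAC v2 and AMR-WB+ summed over the three mono
# bitrates. Bitrate labels and scores below are illustrative.
def mono_difference(item_scores):
    """item_scores maps bitrate -> (heaac_v2_score, amrwbplus_score)."""
    return sum(abs(a - b) for a, b in item_scores.values())

item = {"rate_1": (2.0, 3.0), "rate_2": (2.5, 3.1), "rate_3": (3.2, 3.4)}
print(round(mono_difference(item), 2))  # summed gap across bitrates
```

Items would then be ranked by this summed difference and the largest-difference items selected.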
Werner Oomen, Philips, asked why only mono was investigated. Redwan Salami responded that this tested the basic coding engine, in that stereo might be a tool that could be applied to any underlying coding engine.
Items selected were: