Organisation internationale de normalisation




2.3 Creation of Task Groups


Task groups were convened for the duration of the MPEG meeting. Results of task group activities are reported below.

2.4 Approval of previous meeting report


The 84th Audio Subgroup meeting report was registered as a contribution, and was approved.

2.5 Review of AHG reports


There were no requests to review any of the AHG reports.

2.6 Joint meetings


The joint meetings with Audio for the week are shown below:

Groups            | What                                 | Where   | Day | Time
All               | MPEG standards for Gaming            | Plenary | Wed | 1100-1200
Sys., Audio, Req. | Interactive Music AF, M15626         | Audio   | Wed | 1600-1700
Audio, Req.       | Proposed Audio profiles for ALS, SLS | Audio   | Thu | 1100-1130

2.7 Received National Body Comments and Liaison Matters


The NB Comments and Liaison documents for the meeting that require a response are as shown below.

No.    | Title                                                                    | Topic                  | Response by
m15261 | Liaison Statement from ITU-T SG 16                                       | Enhanced Low Delay AAC | Schnell
m15516 | Liaison Statement from ITU-T SG 16                                       | Full-band audio coding | Schnell
m15555 | IEC 61937-10 Digital Audio Interface                                     | ALS over IEC 61937-10  | Quackenbush
m15551 | FRNB comment on the Unified Speech and Audio Coding Exploration Activity | USAC                   | Quackenbush
m15720 | KNB comment on the Unified Speech and Audio Coding Exploration Activity  | USAC                   | Quackenbush

3 Record of AhG meetings

3.1 AhG Meeting on SAOC, Unified Speech and Audio, Sunday 1000-1700

3.1.1 Unified Speech and Audio 0900-1400


Test Site Reports

The first set of presentations reported on the test site listening facilities. These were very straightforward and factual, and in each case there was no discussion to record.

Kristofer Kjörling, Dolby, presented m15545, "Speech and Audio listening test lab report from Dolby" (Kristofer Kjörling).

Anisse Taleb, Ericsson, presented m15618, "Speech and Audio listening test lab report (Ericsson)" (Anisse Taleb).

Taejin Lee, ETRI, presented m15570, "Unified speech and audio coding listening test report from ETRI" (Taejin Lee, Seungkwon Beack, Minje Kim, Kyeongok Kang).

Herve Taddei, Huawei, presented m15678, "Audio Report on the subjective testing of Unified Speech and Audio Coding proposals at Huawei" (Lijing Xu, Herve Taddei).

Pierrick Philippe, France Telecom, presented m15655, "France Telecom listening test results for the CfP on Unified Speech and Audio Coding" (Pierrick Philippe, David Virette).

Markus Multrus, FhG, presented m15619, "Report on USAC Subjective Tests at Fraunhofer IIS Test Site" (Markus Multrus, Ralf Geiger).

Dong Soo Kim, LGE, presented m15581, "Report on the USAC listening test at LGE" (Dong Soo Kim, Sungyong Yoon, Jaehyun Lim, Hyun-Kook Lee, Henney Oh, Yang-Won Jung).

Werner Oomen, Philips, presented m15548, "Speech and Audio listening test lab report from Philips" (Werner Oomen, Jeroen Koppens).

Eunmi Oh, Samsung, presented m15567, "Listening test results for Unified Speech and Audio Coding from Samsung" (Eunmi Oh, Miyoung Kim, JungHoe Kim).

Oliver Wuebbolt, Thomson, presented m15565, "Speech & Audio - Listening Test, Report & Results - Thomson" (Johannes Boehm, Oliver Wuebbolt, Florian Keiler).

Roch Lefebvre, VoiceAge, presented m15608, "Report on USAC subjective tests at VoiceAge test site" (Roch Lefebvre).


System Descriptions

The next set of presentations reported on the proponent system architectures.

Kristofer Kjörling, Dolby, presented m15547, "Technical description of the Dolby Philips proposal for the speech and audio work-item" (Kristofer Kjörling, Werner Oomen, Jonas Samuelsson, Lars Villemoes, Barbara Resch, Erik Schuijers, Pontus Carlsson).

The Dolby/Philips system builds on the HE-AAC V2 architecture, adding prediction-based time-domain processing elements. The tools in HE-AAC V2 were slightly modified to offer additional compression performance on diverse speech and audio content. The MDCT permits a number of window lengths and block sizes, in which all transform bins have equal bandwidth. All AAC-LC tools were available in the core coder. The codec delay can be as much as 1.2 seconds, due primarily to encoder look-ahead and blocking into super-frames.

Taejin Lee, ETRI, presented m15568, "Technical description of the ETRI proposal for the unified speech and audio coding" (Taejin Lee, Seungkwon Beack, Minje Kim, Kyeongok Kang, Hochong Park, Youngcheol Park).

The ETRI system combined the individual tools from AMR-WB+ and HE-AAC V2 to create a new codec, with modifications to the tools as needed to address the coding of diverse content types. It used two processing models: Steady State, using linear prediction coding tools, and Complex State, using transform coding tools. A very flexible signal-state decision module makes a hard decision to use one of the two processing models for each block.

Bernhard Grill, FhG, presented m15621, "Technical Description of the Fraunhofer IIS Submission for the CfP on USAC" (Markus Multrus, Ralf Geiger, Bernhard Grill, Nikolaus Rettelbach, Max Neuendorf).

The FhG system is a collaboration between Fraunhofer IIS and VoiceAge. The system combines elements from MPEG Surround, the SBR tool, AAC and ACELP linear prediction coding. The tools were modified as appropriate for better performance when coding diverse speech and audio content.

MPEG Surround is used for stereo processing, although it can easily be extended to process multi-channel signals; in the submission it uses only a single OTT box. The SBR technology employs the same filterbank as MPEG Surround, but was enhanced to support very low cross-over frequencies. It can be switched off for high-bitrate operation. The core coder comprises a Linear Prediction tool and either an MDCT transform coder or a time-domain coder.

In the core coder, the Linear Prediction tools can be activated on a block-by-block basis or can always be active. The LP tool is followed by either a time-domain coding tool (ACELP) or a frequency-domain coding tool (AAC). At high bitrates the system is virtually identical to AAC. The AAC tool uses an improved entropy coder and optionally uses time-warping for improved performance on speech signals.

The possible configurations of the core coder are:



State | Stage 1 | Stage 2
1     | LPC     | Time-domain residual coder
2     | LPC     | MDCT residual coder
3     | LPC off | MDCT coder
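The three states amount to a per-block mode switch in the core coder. A minimal sketch of such a dispatch, with illustrative mode names and decision thresholds (the contribution does not specify the encoder's actual heuristics, so the `choose_mode` logic below is an assumption for illustration only):

```python
from enum import Enum

class CoreMode(Enum):
    LPC_TD = 1    # state 1: LPC + time-domain residual coder (ACELP)
    LPC_MDCT = 2  # state 2: LPC + MDCT residual coder
    MDCT = 3      # state 3: LPC off, plain MDCT transform coder

def choose_mode(speech_likeness: float, high_bitrate: bool) -> CoreMode:
    """Illustrative block-by-block decision: at high bitrates the
    system is described as virtually identical to AAC, so transform
    coding is used; speech-like blocks favour the LPC paths."""
    if high_bitrate:
        return CoreMode.MDCT
    if speech_likeness > 0.7:
        return CoreMode.LPC_TD
    if speech_likeness > 0.3:
        return CoreMode.LPC_MDCT
    return CoreMode.MDCT
```

The thresholds are placeholders; the point is that each block independently selects one of exactly three configurations.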

Philippe Gournay, VoiceAge, presented m15609, "Technical Description of the VoiceAge Candidate for USAC" (Roch Lefebvre).

The VoiceAge system is a collaboration between Fraunhofer IIS and VoiceAge. The decoder is essentially the same system as was presented in the FhG contribution. The time-domain core coding tool is the AMR-WB ACELP coding tool.

The LPC coding tool is typically switched on for speech-like signals. In this case, the core coder can switch between time-domain (ACELP) and frequency-domain coding of the residual (TCX) on a block-by-block basis. If the LPC coding tool is switched off, the coder uses frequency-domain coding. At high bitrates the codec is very similar to AAC. The VoiceAge encoder employed a different psychoacoustic model, SBR tool and MDCT coder. The MDCT did not employ time-warping.

The complexity of the decoder is no more than 1.8 times that of HE-AAC V2.

Dong Soo Kim, LGE, presented m15582, "LGE submission to Unified Speech & Audio Coding" (Dong Soo Kim, Sungyong Yoon, Jaehyun Lim, Hyun-Kook Lee).

The LG system switched between three coding modules for speech-rich, music-rich, and mixed speech-and-music signals. The speech-rich coder employs modified AMR-WB+ tools, the music-rich coder modified HE-AAC V2 tools, and the mixed coder a residual coding scheme. The decoder employs a delay-compensation process so that the system can switch between coding modes on a block-by-block basis.

JungHoe Kim, Samsung, presented m15564, "Response to CfP on unified speech and audio coding" (Eunmi Oh, JungHoe Kim, Miyoung Kim, KiHyun Choo, Hosang Sung).

The presentation noted that coding of music signals is most effective using a transform coder with a perceptual threshold for shaping the quantization noise, while coding of speech signals is most effective using a linear predictive speech model. The Samsung system uses a non-uniform bandwidth MDCT tool, and any of the T/F information can be coded by either the high-temporal-resolution tool or HE-AAC-like quantization tools. In addition, it uses a TNS tool, an SBR tool and a parametric stereo tool. The high-temporal-resolution tool is only used at the lower bitrates. In discussion it was noted that, without considering quantization errors, the variable T/F MDCT tool operates such that time-domain aliasing is always cancelled.
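The aliasing cancellation in question is the standard MDCT time-domain aliasing cancellation (TDAC) mechanism: each inverse-transformed block contains time-domain aliasing that cancels in the overlap-add with the neighbouring block, provided the window satisfies the Princen-Bradley condition. A small numerical sketch using a textbook fixed-size MDCT with a sine window (not Samsung's non-uniform variable T/F tool):

```python
import math, random

def mdct(x):
    """MDCT of a block of length 2N -> N coefficients."""
    N = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(X):
    """Inverse MDCT: N coefficients -> 2N samples, still containing
    time-domain aliasing that the overlap-add must cancel."""
    N = len(X)
    return [(2.0 / N) * sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                            for k in range(N)) for n in range(2 * N)]

def tdac_reconstruct(signal, N):
    """Window, MDCT, IMDCT, window again, overlap-add with 50% overlap.
    Interior samples (covered by two blocks) are reconstructed exactly."""
    # The sine window satisfies the Princen-Bradley condition
    # w[n]^2 + w[n+N]^2 = 1, which makes the aliasing cancel.
    w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]
    out = [0.0] * len(signal)
    for s in range(0, len(signal) - 2 * N + 1, N):
        block = [signal[s + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            out[s + n] += y[n] * w[n]
    return out
```

Running any signal through `tdac_reconstruct` reproduces the interior samples to numerical precision, illustrating that the cancellation is a structural property of the transform, independent of the signal (quantization errors aside).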

Oliver Wuebbolt, Thomson, presented m15566, "Speech & Audio - Description of Technology of the Thomson proposal" (Florian Keiler, Oliver Wuebbolt, Johannes Boehm).

The Thomson system uses a speech/audio switch to select either a time-domain coding tool or a frequency-domain coding tool in the core coder. In addition, it incorporates SBR and Parametric Stereo coding tools.

Analysis of Listening Test Data

The final set of presentations reported on analysis of the listening test data.

Werner Oomen, Philips, presented m15546, "Analysis of speech and audio listening test data" (Werner Oomen, Kristofer Kjörling, Heiko Purnhagen).

This contribution presents the Philips and Dolby experts' analysis of the listening test data. The data were analyzed to determine whether the requirements specified in the Evaluation document were met, for both 95% and 99% confidence intervals. The results showed that systems 2, 4, 8, 10 and 11 were promising with respect to the Requirements analysis.
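A requirements check of this kind can be sketched as follows. The data layout (per-listener difference scores against the VC reference) and the use of a normal approximation in place of the exact Student-t quantile are assumptions for illustration; the Evaluation document's exact procedure is not reproduced here:

```python
from statistics import NormalDist, mean, stdev

def no_worse_than_vc(diffs, confidence=0.95):
    """diffs: per-listener score differences (candidate minus VC).
    The requirement is read here as: the confidence interval around
    the mean difference must not lie entirely below zero.  A normal
    approximation stands in for the exact t quantile, which is
    reasonable for typical listener counts (an assumption)."""
    n = len(diffs)
    se = stdev(diffs) / n ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean(diffs) + z * se >= 0.0
```

A system scoring near or above VC passes at both the 95% and 99% levels, while one scoring clearly below fails.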

Taejin Lee, ETRI, presented m15613, "Analysis of unified speech and audio coding listening test results" (Taejin Lee, Seungkwon Beack, Minje Kim, Kyeongok Kang).

This contribution presents an analysis of the Figure of Merit statistics. It presents a count of how often each system has the highest mean performance for each test and each signal category. Its conclusion is that there is no dominant best system.
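The counting view described here can be sketched as follows; the data below is hypothetical and only illustrates the statistic being tabulated, not the contribution's actual numbers:

```python
from collections import Counter

def first_place_counts(mean_scores):
    """mean_scores maps (test, category) -> {system: mean score}.
    Returns how often each system attains the highest mean, the
    count this style of analysis tabulates."""
    wins = Counter()
    for cell in mean_scores.values():
        wins[max(cell, key=cell.get)] += 1
    return wins

# Hypothetical illustration: no single system wins every cell.
scores = {
    ("mono_12kbps", "speech"): {"sys4": 62.1, "sys10": 61.8, "sys11": 60.0},
    ("mono_12kbps", "music"): {"sys4": 55.0, "sys10": 63.2, "sys11": 62.9},
    ("stereo_32kbps", "mixed"): {"sys2": 71.4, "sys10": 70.9, "sys11": 71.0},
}
```

With first places spread across several systems, the "no dominant best system" conclusion follows directly from such a table.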

Pierrick Philippe, France Telecom R&D, presented m15656, "France Telecom contribution to the analysis of the listening test results following the CfP on Unified Speech and Audio Coding" (Pierrick Philippe, David Virette).

This contribution presented graphs of system performance as compared to VC. For each signal category it performed single-sided t-tests with respect to the best-performing system.

Category | System with best mean score
S        | 4, but not different from performance of 10
S+M      | 2 is best
M        | 10, but not different from performance of 11

The overall conclusions are that systems 2, 4, 8, 10 and 11 all seem promising for this new work.
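A single-sided comparison against the best-performing system can be sketched as a two-sample test; the Welch-style standard error and the normal approximation to the t distribution are assumptions here, not the exact test France Telecom ran:

```python
from statistics import NormalDist, mean, stdev

def one_sided_p(best_scores, other_scores):
    """One-sided p-value for the hypothesis that best_scores has a
    higher true mean than other_scores (Welch-style standard error,
    normal approximation to the t distribution).  A large p-value
    means the challenger is 'not different from the best' at the
    chosen significance level."""
    se = (stdev(best_scores) ** 2 / len(best_scores)
          + stdev(other_scores) ** 2 / len(other_scores)) ** 0.5
    t = (mean(best_scores) - mean(other_scores)) / se
    return 1.0 - NormalDist().cdf(t)
```

Under such a test, a p-value above 0.05 for, say, system 10 versus the category winner would yield a "not different from the best" verdict like those in the table above.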

Ralf Geiger, FhG, presented m15625, "Analysis of USAC Listening Test Data" (Frederik Nagel, Ralf Geiger, Max Neuendorf, Markus Multrus).

The contribution first presents the calculation of VC for each test and each category.

For the Requirement that the New Technology be no worse than VC, only sys10 and sys11 meet the requirement for every test and every category at both the 95% and 99% levels of significance.

For the Figure of Merit, only sys10 and sys11 have a non-negative "d" statistic for all tests at the 95% level of significance. When averaged over all tests, only sys2, sys10 and sys11 have a non-negative "d" statistic at the 95% level of significance; sys10 has the best mean "d" statistic, but this mean is not different from that of sys11 at the 95% level of significance.

Roch Lefebvre, VoiceAge, presented m15610, "Analysis of Combined Listening Test Results for USAC" (Roch Lefebvre).

This contribution presents a number of interesting views of the test results, ranking the proponent mean scores by test, signal category and signal. The overall conclusion is that sys10 is the best overall proposal, followed closely by sys11.

Eunmi Oh, Samsung, presented a statistical analysis of listening test data, which was provided in contribution m15567. It presented a table showing which systems had the best performance by test and signal category, and also showed the systems whose performance was not different from the best. It concluded that there is no dominant system when performance is viewed by test and signal category; systems sys2, sys4, sys8, sys10 and sys11 all seem promising. Frederik Nagel, FhG, noted that presenting a table of results based on best mean score might be misleading, in that several systems may have mean scores that are not different from the best at the 95% level of significance.

Dong Soo Kim, LG, presented m15713, "USAC listening test analysis from LGE" (Dong Soo Kim, Sungyong Yoon).

This contribution presents a statistical analysis of listening test data; it was registered as m15713 and uploaded on Sunday, July 20. For the Requirements, it showed that only sys10 and sys11 pass the Requirements for all tests and all signal categories. It further noted that sys4 passed the Requirements most often for speech-category signals and sys2 most often for mixed-category signals. In a Figure of Merit analysis by signal, sys2, sys10 and sys11 all have similarly good performance.

Review of AhG Report

The AhG Chairs presented the AhG report and proposed:

  • To use the France Telecom R&D mono/stereo graphs of performance with respect to VC, but removing the LP anchors, AMR-WB+ and HE-AAC V2.

and additionally to summarize the test results as:

  • sys10 and sys11 are the best, or not different from the best, for almost every bitrate (mono and stereo) and content type

  • sys2, sys4 and sys8 are the best, or not different from the best, for many bitrates (mono and stereo) and content types

3.1.2 SAOC 1400-1800


CE: Separation of real-environment signals

Osamu Shimada, NEC, made a short presentation reminding the Audio experts of the framework for, and meaning of, Test 1 and Test 2 in this core experiment.

Jeongil Seo, ETRI, presented the Test 1 results in m15572, "Listening Test Report for CE on separating real-environment signals into multiple objects from ETRI" (Jeongil Seo, Seungkwon Beack, Kyeongok Kang).

Yang-Won Jung, LG, presented the Test 1 results in m15586, "Listening test reports for CE on separating real-environment signals" (Yang-Won Jung, Henney Oh, Dong Soo Kim, Sungyong Yoon).


Leonid Terentiev, FhG, presented the Test 1 results in m15632, "Listening test report for CE on separating real-environment signals into multiple objects for the MPEG SAOC system" (Leonid Terentiev, Cornelia Falch, Oliver Hellmuth, Johannes Hilpert).



The contribution showed that all listeners in Test 1 had at best "no preference" for, and at worst "did not prefer", the proposed technology. There was considerable discussion of the slide presenting the mean score over all listeners in Test 1.

Osamu Shimada, NEC, presented m15574, "Listening test results of Test 1 for SAOC CE on separating real-environment signals into multiple objects" (Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama, Osamu Hoshuyama).

The set of contributions was discussed. The Chair noted that averaging over listener responses may obscure an individual's intent rather than minimize the noise in the responses. It was decided that a break-out group would consider how best to interpret the data available at this meeting and report back later in the week.


