2.3 Creation of Task Groups
Task groups were convened for the duration of the MPEG meeting. Results of task group activities are reported below.
2.4 Approval of previous meeting report
The 84th Audio Subgroup meeting report was registered as a contribution and was approved.
2.5 Review of AHG reports
There were no requests to review any of the AHG reports.
2.6 Joint meetings
The joint meetings with Audio for the week are shown below:
Groups | What | Where | Day | Time
All | MPEG standards for Gaming | Plenary | Wed | 1100-1200
Sys., Audio, Req. | Interactive Music AF, M15626 | Audio | Wed | 1600-1700
Audio, Req. | Proposed Audio profiles for ALS, SLS | Audio | Thu | 1100-1130
2.7 Received National Body Comments and Liaison matters
The NB Comments and Liaison documents for the meeting that require a response are shown below.
No. | Title | Topic | Response by
m15261 | Liaison Statement from ITU-T SG 16 | Enhanced Low Delay AAC | Schnell
m15516 | Liaison Statement from ITU-T SG 16 | full band audio coding | Schnell
m15555 | IEC 61937-10 Digital Audio Interface | ALS over IEC 61937-10 | Quackenbush
m15551 | FRNB comment on the Unified Speech and Audio Coding Exploration Activity | USAC | Quackenbush
m15720 | KNB comment on the Unified Speech and Audio Coding Exploration Activity | USAC | Quackenbush
3 Record of AhG meetings
3.1 AhG Meeting on SAOC, Unified Speech and Audio (Sunday 1000-1700)
3.1.1 Unified Speech and Audio (0900-1400)
Test Site Reports
The first set of presentations reported on the test site listening facilities. These were very straightforward and factual, and in each case there was no discussion to record.
Kristofer Kjörling, Dolby, presented
m15545 | Speech and Audio listening test lab report from Dolby | Kristofer Kjörling
Anisse Taleb, Ericsson, presented
m15618 | Speech and Audio listening test lab report (Ericsson) | Anisse Taleb
Taejin Lee, ETRI, presented
m15570 | Unified speech and audio coding listening test report from ETRI | Taejin Lee, Seungkwon Beack, Minje Kim, Kyeongok Kang
Herve Taddei, Huawei, presented
m15678 | Audio Report on the subjective testing of Unified Speech and Audio Coding proposals at Huawei | Lijing Xu, Herve Taddei
Pierrick Philippe, France Telecom, presented
m15655 | France Telecom listening test results for the CfP on Unified Speech and Audio Coding | Pierrick Philippe, David Virette
Markus Multrus, FhG, presented
m15619 | Report on USAC Subjective Tests at Fraunhofer IIS Test Site | Markus Multrus, Ralf Geiger
Dong Soo Kim, LGE, presented
m15581 | Report on the USAC listening test at LGE | Dong Soo Kim, Sungyong Yoon, Jaehyun Lim, Hyun-Kook Lee, Henney Oh, Yang-Won Jung
Werner Oomen, Philips, presented
m15548 | Speech and Audio listening test lab report from Philips | Werner Oomen, Jeroen Koppens
Eunmi Oh, Samsung, presented
m15567 | Listening test results for Unified Speech and Audio Coding from Samsung | Eunmi Oh, Miyoung Kim, JungHoe Kim
Oliver Wuebbolt, Thomson, presented
m15565 | Speech & Audio - Listening Test, Report & Results - Thomson | Johannes Boehm, Oliver Wuebbolt, Florian Keiler
Roch Lefebvre, VoiceAge, presented
m15608 | Report on USAC subjective tests at VoiceAge test site | Roch Lefebvre
System Descriptions
The next set of presentations reported on the proponent system architectures.
Kristofer Kjörling, Dolby, presented
m15547 | Technical description of the Dolby Philips proposal for the speech and audio work-item | Kristofer Kjörling, Werner Oomen, Jonas Samuelsson, Lars Villemoes, Barbara Resch, Erik Schuijers, Pontus Carlsson
The Dolby/Philips system builds on the HE-AAC V2 architecture, adding prediction-based time-domain processing elements. The tools in HE-AAC V2 have been slightly modified to offer additional compression performance for diverse speech and audio content. The MDCT permits a number of window lengths and block sizes, in which all transform bins have equal bandwidth. All AAC-LC tools were available in the core coder. The codec delay can be as much as 1.2 seconds, due primarily to encoder look-ahead and blocking into super-frames.
Taejin Lee, ETRI, presented
m15568 | Technical description of the ETRI proposal for the unified speech and audio coding | Taejin Lee, Seungkwon Beack, Minje Kim, Kyeongok Kang, Hochong Park, Youngcheol Park
The ETRI system combined the individual tools from AMR-WB+ and HE-AAC V2 to create a new codec, with modifications to the tools as needed to address the coding of diverse content types. It used two processing models: Steady State, using linear prediction coding tools, and Complex State, using transform coding tools. A very flexible signal-state decision module makes a hard decision to use one of the two processing models for each block.
Bernhard Grill, FhG, presented
m15621 | Technical Description of the Fraunhofer IIS Submission for the CfP on USAC | Markus Multrus, Ralf Geiger, Bernhard Grill, Nikolaus Rettelbach, Max Neuendorf
The FhG system is a collaboration between Fraunhofer IIS and VoiceAge. The system combines elements from MPEG Surround, the SBR tool, AAC and ACELP linear prediction coding. The tools were modified as appropriate for better performance when coding diverse speech and audio content.
MPEG Surround is used for stereo processing, although it can easily be extended to process multi-channel signals. In the submission it uses only a single OTT box. The SBR technology employed the same filterbank as MPEG Surround, but was enhanced to support very low cross-over frequencies; it can be switched off for high-bitrate operation. The core coder comprises a Linear Prediction tool followed by either an MDCT transform coder or a time-domain coder.
In the core coder, the Linear Prediction tools can be activated on a block-by-block basis or can always be active. The LP tool is followed by either a time-domain coding tool (ACELP) or a frequency-domain coding tool (AAC). At high bitrates the system is virtually identical to AAC. The AAC tool uses an improved entropy coder and optionally uses time-warping for improved performance on speech signals.
The possible configurations of the core coder are:
State | Stage 1 | Stage 2
1 | LPC | Time-Domain residual coder
2 | LPC | MDCT residual coder
3 | LPC off | MDCT Coder
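The block-by-block switching among these three configurations can be sketched as a simple dispatch. All names below (`CoreState`, `encode_block`, the stage functions) are hypothetical placeholders for illustration, not the actual reference-model API, and the stage implementations are stubbed out:

```python
from enum import Enum

class CoreState(Enum):
    """The three core-coder configurations from the table above."""
    LPC_TD = 1     # State 1: LPC + time-domain residual coder
    LPC_MDCT = 2   # State 2: LPC + MDCT residual coder
    MDCT_ONLY = 3  # State 3: LPC off, MDCT coder

# Purely illustrative stubs standing in for the real coding stages.
def lpc_analysis(block):
    return block, "lpc_params"          # (residual, LP coefficients)

def time_domain_residual_coder(res):
    return ("td", res)

def mdct_residual_coder(res):
    return ("mdct_res", res)

def mdct_coder(block):
    return ("mdct", block)

def encode_block(block, state):
    """Per-block dispatch across the three configurations."""
    if state is CoreState.MDCT_ONLY:
        return mdct_coder(block)               # Stage 2 only
    residual, params = lpc_analysis(block)     # Stage 1: LPC
    if state is CoreState.LPC_TD:
        return params, time_domain_residual_coder(residual)
    return params, mdct_residual_coder(residual)
```

The state can change on every block, which mirrors the block-by-block activation of the LP tool described above.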
Philippe Gournay, VoiceAge, presented
m15609 | Technical Description of the VoiceAge Candidate for USAC | Roch Lefebvre
The VoiceAge system is a collaboration between Fraunhofer IIS and VoiceAge. The decoder is essentially the same system as was presented in the FhG contribution. The time-domain core coding tool is the AMR-WB ACELP coding tool.
The LPC coding tool is typically switched on for speech-like signals. In this case, the core coder can switch between time-domain (ACELP) and frequency-domain coding of the residual (TCX) on a block-by-block basis. If the LPC coding tool is switched off, the coder uses frequency-domain coding. At high bitrates the codec is very similar to AAC. The VoiceAge encoder employed a different psychoacoustic model, SBR tool and MDCT coder. The MDCT did not employ time-warping.
The complexity of the decoder is no more than 1.8 times that of HE-AAC V2.
Dong Soo Kim, LGE, presented
m15582 | LGE submission to Unified Speech & Audio Coding | Dong Soo Kim, Sungyong Yoon, Jaehyun Lim, Hyun-Kook Lee
The LG system switched between three coding modules: one for speech-rich signals, one for music-rich signals and one for mixed speech and music. The speech-rich coder employs modified AMR-WB+ tools, the music-rich coder uses modified HE-AAC V2 tools and the mixed coder uses a residual coding scheme. The decoder employs a delay compensation process so that the system can switch between coding modes on a block-by-block basis.
JungHoe Kim, Samsung, presented
m15564 | Response to CfP on unified speech and audio coding | Eunmi Oh, JungHoe Kim, Miyoung Kim, KiHyun Choo, Hosang Sung
The presentation noted that coding of music signals is most effective using a transform coder with a perceptual threshold for shaping the quantization noise, while coding of speech signals is most effective using a linear predictive speech model. The Samsung system uses a non-uniform bandwidth MDCT tool, and any of the T/F information can be coded by either the high-temporal-resolution tool or HE-AAC-like quantization tools. In addition, it uses a TNS tool, an SBR tool and a parametric stereo tool. The high-temporal-resolution tool is only used at the lower bitrates. In discussion it was noted that, without considering quantization errors, the variable T/F MDCT tool operates such that time-domain aliasing is always cancelled.
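The time-domain aliasing cancellation property discussed here can be checked numerically for a plain MDCT: with a sine window satisfying the Princen-Bradley condition, overlap-adding the inverse transforms of adjacent 50%-overlapped frames cancels the aliasing exactly (ignoring quantization). A minimal sketch of the textbook transform, not tied to any proponent's implementation:

```python
import math

def mdct(frame, N):
    """Windowed forward MDCT of a 2N-sample frame -> N coefficients."""
    w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]
    return [sum(w[n] * frame[n] *
                math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    """Windowed inverse MDCT -> 2N aliased samples (before overlap-add)."""
    w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]
    return [(2.0 / N) * w[n] *
            sum(coeffs[k] *
                math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N))
            for n in range(2 * N)]

# Demonstrate TDAC: overlap-add adjacent frames hopped by N samples.
N = 8
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * N)]
out = [0.0] * (4 * N)
for start in (0, N, 2 * N):
    rec = imdct(mdct(x[start:start + 2 * N], N), N)
    for n in range(2 * N):
        out[start + n] += rec[n]
# The fully overlapped region [N, 3N) is reconstructed up to rounding error.
err = max(abs(out[n] - x[n]) for n in range(N, 3 * N))
```

The aliasing introduced by each frame is the mirror image of its neighbour's, so the windowed overlap-add sums it to zero, which is what "time-domain aliasing is always cancelled" refers to.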
Oliver Wuebbolt, Thomson, presented
m15566 | Speech & Audio - Description of Technology of the Thomson proposal | Florian Keiler, Oliver Wuebbolt, Johannes Boehm
The Thomson system uses a speech/audio switch to select either a time-domain coding tool or a frequency-domain coding tool in the core coder. In addition, it incorporates SBR and Parametric Stereo coding tools.
Analysis of Listening Test Data
The final set of presentations reported on analysis of the listening test data.
Werner Oomen, Philips, presented
m15546 | Analysis of speech and audio listening test data | Werner Oomen, Kristofer Kjörling, Heiko Purnhagen
This contribution presents the Philips and Dolby experts' analysis of the listening test data. The data was analyzed to determine whether the requirements as specified in the Evaluation document were met, at both the 95% and 99% confidence levels. The results showed that systems 2, 4, 8, 10 and 11 were promising with respect to the Requirements analysis.
Taejin Lee, ETRI, presented
m15613 | Analysis of unified speech and audio coding listening test results | Taejin Lee, Seungkwon Beack, Minje Kim, Kyeongok Kang
This contribution presents an analysis of the Figure of Merit statistics. It presents a count of how often each system has the highest mean performance for each test and each signal category. Its conclusion is that there is no dominant best system.
Pierrick Philippe, France Telecom R&D, presented
m15656 | France Telecom contribution to the analysis of the listening test results following the CfP on Unified Speech and Audio Coding | Pierrick Philippe, David Virette
This contribution presented graphs of system performance as compared to VC. For each signal category it performed single-sided t-tests with respect to the best-performing system.
Category | System with best mean score
S | 4, but not different from performance of 10
S+M | 2 is best
M | 10, but not different from performance of 11
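A single-sided comparison of this kind can be carried out with a one-sided Welch two-sample t-test on the per-listener scores. The sketch below uses made-up example scores, not the actual test data, and is only an illustration of the statistical machinery:

```python
import math
from statistics import mean, variance

def one_sided_t(best, other):
    """Welch two-sample t statistic for testing H1: mean(best) > mean(other).

    Returns the t statistic and the Welch-Satterthwaite degrees of freedom.
    The t value is then compared against the upper critical value of
    Student's t distribution at the chosen significance level (e.g. 95%).
    """
    n1, n2 = len(best), len(other)
    v1, v2 = variance(best), variance(other)   # sample variances (n-1)
    se2 = v1 / n1 + v2 / n2                    # squared standard error
    t = (mean(best) - mean(other)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Made-up per-listener scores for two hypothetical systems.
t, df = one_sided_t([80, 82, 78, 81], [70, 72, 69, 71])
```

"Not different from the best" then means the one-sided test fails to reject equality of means at the chosen significance level.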
The overall conclusions are that systems 2, 4, 8, 10 and 11 all seem promising for this new work.
Ralf Geiger, FhG, presented
m15625 | Analysis of USAC Listening Test Data | Frederik Nagel, Ralf Geiger, Max Neuendorf, Markus Multrus
The contribution first presents the calculation of VC for each test and each category.
For the Requirement that the New Technology be no worse than VC, only sys10 and sys11 meet the requirement for every test and every category at both the 95% and 99% levels of significance.
For the Figure of Merit, only sys10 and sys11 have a non-negative "d" statistic for all tests at the 95% level of significance. When averaged over all tests, only sys2, sys10 and sys11 have a non-negative "d" statistic at the 95% level of significance; sys10 has the best mean "d" statistic, but this mean value is not different from that of sys11 at the 95% level of significance.
Roch Lefebvre, VoiceAge, presented
m15610 | Analysis of Combined Listening Test Results for USAC | Roch Lefebvre
This contribution presents a number of interesting views of the test results based on ranking of the proponent mean scores based on test, signal category and signal. The overall conclusion is that sys10 is the best overall proposal, followed closely by sys11.
Eunmi Oh, Samsung, presented a statistical analysis of listening test data, which was provided in contribution m15567. It presented a table showing which systems had the best performance by test and signal category, and also showed the systems whose performance was not different from the best. It concluded that there is no dominant system when performance is viewed by test and signal category. Systems sys2, sys4, sys8, sys10 and sys11 all seem promising. Frederik Nagel, FhG, noted that presenting a table of results based on best mean score might be misleading, in that several systems may have mean scores that are not different from the best at the 95% level of significance.
Dong Soo Kim, LG, presented
m15713 | USAC listening test analysis from LGE | Dong Soo Kim, Sungyong Yoon
This contribution presents a statistical analysis of listening test data. It was registered as m15713 and uploaded on Sunday July 20. For the Requirements, it showed that only sys10 and sys11 pass the Requirements for all tests and all signal categories. Furthermore, it noted that sys4 passed the Requirements most often for speech-category signals and sys2 passed the Requirements most often for mixed-category signals. It also presents a Figure of Merit analysis by signal, in which sys2, sys10 and sys11 all have similarly good performance.
Review of AhG Report
The AhG Chairs presented the AhG report and proposed:
- To use the France Telecom R&D mono/stereo graphs of performance with respect to VC, but removing the LP anchors, AMR-WB+ and HE-AAC V2.
And additionally to summarize the test results as:
- sys10 and sys11 are the best, or not different from the best, for almost every bitrate (mono and stereo) and content type
- sys2, sys4 and sys8 are the best, or not different from the best, for many bitrates (mono and stereo) and content types
3.1.2 SAOC (1400-1800)
CE: Separation of real-environment signals
Osamu Shimada, NEC, made a short presentation reminding the Audio experts of the framework for and meaning of Test 1 and Test 2 in this core experiment.
Jeongil Seo, ETRI, presented the Test 1 results in
m15572 | Listening Test Report for CE on separating real-environment signals into multiple objects from ETRI | Jeongil Seo, Seungkwon Beack, Kyeongok Kang
Yang-Won Jung, LG, presented the Test 1 results in
m15586 | Listening test reports for CE on separating real-environment signals | Yang-Won Jung, Henney Oh, Dong Soo Kim, Sungyong Yoon
Leonid Terentiev, FhG, presented the Test 1 results in
m15632 | Listening test report for CE on separating real-environment signals into multiple objects for the MPEG SAOC system | Leonid Terentiev, Cornelia Falch, Oliver Hellmuth, Johannes Hilpert
The contribution showed that all listeners in Test 1 had at best "no preference" for, and at worst "did not prefer", the proposed technology. There was considerable discussion of the slide that presented the mean score for all listeners in Test 1.
Osamu Shimada, NEC, presented
m15574 | Listening test results of Test 1 for SAOC CE on separating real-environment signals into multiple objects | Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama, Osamu Hoshuyama
The set of contributions was discussed. The Chair noted that averaging over listener responses may obscure an individual listener's intent rather than minimize the noise in the responses. It was decided that a break-out group would consider how best to interpret the data available at this meeting and report back later in the week.