Miyoung Kim, Samsung, presented
14959 | Miyoung Kim, Eunmi Oh, JungHoe Kim, Hosang Sung, KiHyun Choo | Further evidence on Joint Speech and Audio Coding
It was noted that at the previous MPEG meeting evidence was provided on performance of Samsung technology when operating at 16, 20, 24 kb/s for stereo signals. This contribution shows the performance of Samsung technology when operating at 16, 20, 24 kb/s for mono signals, which is summarized as follows:
16 kb/s mono
- For speech, NT is not different from VC.
- For music, NT is better than VC at the 95% level of significance.
- For mixed content, NT is better than VC at the 95% level of significance.
- When pooling all content types, NT is better than VC at the 95% level of significance.
24 kb/s mono
- For speech, NT is not different from VC.
- For music, NT is not different from VC.
- For mixed content, NT is not different from VC.
- When pooling all content types, NT is better than VC at the 95% level of significance.
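The minutes do not say which statistical procedure produced the "better" and "not different" verdicts above. As a rough illustration only, the sketch below assumes one common listening-test convention: two systems differ at the 95% level when the 95% confidence intervals of their mean scores do not overlap. The function names, the normal approximation (z = 1.96) and the overlap criterion are all assumptions, not the contribution's method.

```python
import math
import statistics


def mean_ci95(scores):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Assumption: normal approximation (z = 1.96). A Student-t quantile is
    more appropriate for small listener panels.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(n)
    return mean, half_width


def compare(nt_scores, vc_scores):
    """Classify NT against VC by whether their 95% intervals overlap.

    This overlap rule is an assumed criterion, not one stated in the minutes.
    """
    nt_mean, nt_hw = mean_ci95(nt_scores)
    vc_mean, vc_hw = mean_ci95(vc_scores)
    if nt_mean - nt_hw > vc_mean + vc_hw:
        return "better"
    if nt_mean + nt_hw < vc_mean - vc_hw:
        return "worse"
    return "not different"
```

Pooling all content types increases the number of scores per system, which narrows the intervals. That is one way a pooled result can reach significance when the per-category results do not, as in the 24 kb/s case.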
The contribution also presented the bitrate used to code each test item. This data was cross-checked by Audio Research Labs, as reported on the AhG reflector.
Anisse Taleb, Ericsson, commented that he is not able to confirm the performance of VC because the Samsung results are not reported on a per-item basis. The Chair suggested that all the Samsung NT performance information to date be captured in a WG11 output document.
Eunmi Oh, Samsung, presented
15011 | Eunmi Oh, JungHoe Kim, Miyoung Kim | Thoughts on Joint speech and audio coding
This contribution reviewed the status of all items in the candidate test set and addressed the following issues:
- Clarify the exact source, if CD.
- Clarify the copyright status and whether MPEG has permission to use the item for the purposes of developing an MPEG standard.
- Noted that audio books may be an important application area for joint speech and audio coding, and nominated several additional items in the mixed category that are excerpts from audio books.
Recommended that the following be addressed via a workplan:
- Adjust the level of each item so that all items are in a listener’s “comfort zone.”
- For each item, determine whether the best means to derive mono from stereo is via downmix, left channel or right channel.
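The three ways of deriving mono from stereo named above can be sketched as follows. This is a minimal illustration: the function name `derive_mono` and the equal-weight downmix 0.5 * (L + R) are assumptions, since the contribution does not specify a downmix formula.

```python
def derive_mono(left, right, method="downmix"):
    """Derive a mono signal from two equal-length channel sample lists.

    method: "downmix" (assumed to be the equal-weight average of L and R),
            "left" or "right".
    """
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    if method == "downmix":
        return [0.5 * (l + r) for l, r in zip(left, right)]
    if method == "left":
        return list(left)
    if method == "right":
        return list(right)
    raise ValueError(f"unknown method: {method!r}")
```

The choice matters per item because content that is out of phase between the channels cancels in a downmix: `derive_mono([1.0, -1.0], [-1.0, 1.0])` yields silence, while either single channel keeps the signal.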
David Virette, France Telecom, presented
14980 | Pierrick Philippe, David Virette | Comments on Speech and Audio Coding Activity
This contribution proposes that all items be concatenated, coded and then split prior to presentation for subjective evaluation. This concatenation can be used to eliminate issues of coder startup, shutdown and bit buffer management. In addition, it can be used to emulate the speech/music transition in the encoding and decoding process.
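The concatenate, code and split procedure can be sketched as below. This is an illustration under stated assumptions: `codec` is a hypothetical callable that performs encoding followed by decoding on a list of samples, and `delay` is an assumed, known codec delay in samples. Neither comes from the contribution.

```python
def concatenate(items):
    """Join items into one signal and record each item's (start, end) range."""
    signal = []
    boundaries = []
    for item in items:
        start = len(signal)
        signal.extend(item)
        boundaries.append((start, len(signal)))
    return signal, boundaries


def split(signal, boundaries, delay=0):
    """Cut the decoded signal back into items, shifting by the codec delay."""
    return [signal[start + delay:end + delay] for start, end in boundaries]


def code_concatenated(items, codec, delay=0):
    """Code all items in a single pass, then split them for presentation.

    `codec` is a hypothetical encode-then-decode function (an assumption).
    """
    signal, boundaries = concatenate(items)
    decoded = codec(signal)
    return split(decoded, boundaries, delay)
```

Because the codec runs once over the whole sequence, only the first item sees coder startup and only the last sees shutdown. The ordering of items then determines which speech/music transitions the codec encounters.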
It further recommended that there should be five test items in each of the categories of speech, music and mixed content.
Finally, it recommended a two-step selection procedure:
- As a first step, retain only codecs better than or equal to VC.
- If necessary, as a second step, undertake a collaborative effort to merge qualified codecs into a single RM.
The Chair noted that such collaboration is problematic if a proponent fully describes their technology prior to collaboration and, during the collaboration phase, their technology is eliminated and hence is not part of the selected RM.
Anisse Taleb, Ericsson, presented
14971 | Anisse Taleb, Manuel Briand | Comments on draft CfP on Speech and Audio Coding
This contribution raises a number of issues concerning the CfP.
- First, the requirements are for speech and audio coding without reference to latency, but the introductory paragraphs refer to “voice communications applications.”
- Second, the CfP notes that WG11 is not obliged to proceed with standardization subsequent to the call, but elsewhere its language suggests that standardization will proceed after evaluation of the responses to the CfP.
- Third, Ericsson experts are concerned that if test items are selected far in advance of submission, it could permit tuning of codecs. It is proposed that the process for final test item selection be discussed and carefully formulated.
- Fourth, proponent complexity is described in terms of CPU load on an x86 platform. Ericsson experts propose instead that complexity be reported as a count of arithmetic operations, e.g. on a theoretical floating-point DSP architecture.
- Fifth, the proposal notes that it may be that no submitted technology meets the CfP requirements. In this case, a process should be identified to, e.g., merge submissions into a single technology that meets the CfP requirements.
Schuyler Quackenbush, ARL, presented
This contribution contained a revised CfP based on the output document from the last meeting. The Chair noted that the final CfP will be issued at this meeting, so working on the CfP text will be one of the most important tasks for this week. With that in mind, Quackenbush presented three additional documents that he felt would guide the Speech and Audio Call: Further edits on the CfP, Guidelines for Evaluation and Workplan for Evaluation Tests. These were distributed to the group and will be discussed later in the week.
Recommendations
The AhG recommended that the Audio Subgroup:
- Draft an output document that collects all Samsung NT performance information to date.
- Draft an output document workplan to clarify the test item issues raised in the Samsung contribution m15011.
- Issue the final CfP on Speech and Audio Coding at this meeting. Issues raised by contributions 14980 and 14971 should be discussed.