Ongoing CEs
Toru Chinen, Sony, presented
m20007 | Sony listening test report on lossless coding of pitch lag for ACELP CE | Mitsuyuki Hatanaka, Toru Chinen, Masayuki Nishiguchi | USAC-PL
The contribution presented results of a cross-check listening test of the Lossless Coding of Pitch Lag CE operating at 8 kb/s mono. In both absolute and differential analysis of MUSHRA scores, there was no difference between RM and RM+CE at the 95% level of significance, assuming Student's t-distributions.
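For readers unfamiliar with the differential analysis mentioned above, the following is a minimal sketch of how a paired difference of MUSHRA scores might be checked against a 95% confidence interval under a Student's t assumption. It is not the procedure used by any test site. The function names (`differential_ci`, `verdict`) and the scores in the usage note are hypothetical, and the critical values are taken from a small lookup table (standard two-sided 95% values) rather than from a statistics library.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t-distribution,
# indexed by degrees of freedom (standard table values, df = 1..20).
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086,
}


def differential_ci(scores_a, scores_b):
    """Return (mean, lower, upper) of the per-listener differences a - b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired analysis needs one score per listener per system")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    sem = statistics.stdev(diffs) / math.sqrt(n)
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width


def verdict(scores_a, scores_b):
    """Classify system A against system B for a single item."""
    _, lower, upper = differential_ci(scores_a, scores_b)
    if lower > 0:
        return "better"
    if upper < 0:
        return "worse"
    return "no difference"  # the interval contains zero
```

With made-up scores for eight listeners, `verdict([62, 58, 65, 60, 63, 59, 61, 64], [55, 57, 60, 54, 58, 56, 57, 59])` classifies the first system as better, because the whole interval of the mean difference lies above zero.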
Philippe Gournay, VoiceAge, presented
The contribution reviewed the CE status at the 95th meeting:
- Audio experts requested that the Huffman codebooks be re-trained on a larger database.
- Audio experts noted that a part of the CE performance gain was due to the lossless coding of pitch lag and a part due to a new encoder tuning.
Based on these observations, two new tests were undertaken.
Test 1. Systems under test were:
- RM8
- RM8+CE using same ACELP/TCX decision as RM8 and with CE lossless coding
Test 2. Systems under test were:
- RM8
- RM8+CE using same ACELP/TCX decision as RM8 and with CE lossless coding
- mod_RM8 using the same ACELP/TCX decision and the same ACELP mode (i.e. ACELP bit rate) as in RM8+CE. This is an encoder-only strategy that can be decoded with the normative RM8.
The contribution presented results of a cross-check listening test for Test 1 and Test 2.
Test 1:
- Differential analysis of MUSHRA scores for (RM8+CE – RM8) showed 1 item better and 2 items worse.
Test 2:
- Differential analysis of MUSHRA scores for (RM8+CE – RM8) showed 1 item better.
- Differential analysis of MUSHRA scores for (mod_RM8 – RM8) showed 1 item better.
- Differential analysis of MUSHRA scores for (mod_RM8 – RM8+CE) showed no differences.
The Chair noted that no system in Test 1 or Test 2 was exactly the system under test (RM8+CE) presented at the 95th meeting, in that this system additionally allocated bits from the bit reservoir (typically from bits that would otherwise be allocated to TCX and SBR) to ACELP coding.
Takehiro Moriya, NTT, presented
m20023 | Additional information of CE on lossless coding of pitch lag for ACELP in USAC | Takehiro Moriya, Yutaka Kamamoto, Noboru Harada | USAC-PL
The contribution reports that the Huffman codebook used for lossless coding of pitch lag was re-trained. A large (more than 3 hours) database of speech sampled at 48 kHz with 16-bit linear PCM was used to train the codebook. No USAC test item was included in the training set. At operating points from 8 kb/s to 24 kb/s mono, the re-trained codebooks provided slightly improved compression performance.
NTT test result:
Test 1:
- Differential analysis of MUSHRA scores for (RM8+CE – RM8) showed 2 items better and mean better.
Test 2:
- Differential analysis of MUSHRA scores for (RM8+CE – RM8) showed 2 better, mean better.
- Differential analysis of MUSHRA scores for (mod_RM8 – RM8) showed no difference.
- Differential analysis of MUSHRA scores for (RM8+CE – mod_RM8) showed 1 better, mean better.
When NTT, Sony and VoiceAge data are pooled
Test 1:
- Differential analysis of MUSHRA scores for (RM8+CE – RM8) showed 3 better.
When Sony and VoiceAge data are pooled (NTT data is excluded)
Test 1:
- Differential analysis of MUSHRA scores for (RM8+CE – RM8) showed 1 better.
- The better item (“Arirang”) is consistently better in the NTT-only data, in the pooled Sony and VoiceAge data, and in the data pooled across all sites.
When NTT and VoiceAge data are pooled
Test 2:
- Differential analysis of MUSHRA scores for (RM8+CE – RM8) showed 3 better, mean better.
- Differential analysis of MUSHRA scores for (mod_RM8 – RM8) showed no difference.
- Differential analysis of MUSHRA scores for (RM8+CE – mod_RM8) showed mean better.
- “Arirang” coded by RM8+CE is again better than RM8 in the NTT-only data and in the pooled data.
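The pooled results above count, per test, how many items are significantly better or worse once listener data from several sites are combined. The following is a minimal sketch of one way such pooling and counting could be done. It assumes that pooling means concatenating per-listener difference scores for each item, which the minutes do not confirm. The names `pool_sites`, `classify` and `count_items`, the item labels and the scores in the usage note are all hypothetical.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t-distribution for df = 1..30
# (standard table values); entry i holds the value for df = i + 1.
T_CRIT_95 = (
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042,
)


def pool_sites(*sites):
    """Merge {item: [difference scores]} dictionaries from several sites."""
    pooled = {}
    for site in sites:
        for item, diffs in site.items():
            pooled.setdefault(item, []).extend(diffs)
    return pooled


def classify(diffs):
    """Classify one item from its difference scores using a 95% t interval."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    half_width = T_CRIT_95[n - 2] * statistics.stdev(diffs) / math.sqrt(n)
    if mean - half_width > 0:
        return "better"
    if mean + half_width < 0:
        return "worse"
    return "no difference"


def count_items(pooled):
    """Count how many items fall into each category."""
    counts = {"better": 0, "worse": 0, "no difference": 0}
    for diffs in pooled.values():
        counts[classify(diffs)] += 1
    return counts
```

As a usage example with invented numbers, pooling `{"item1": [3, 4, 2, 5], "item2": [1, -1, 0, 2]}` with `{"item1": [4, 3, 5, 2], "item2": [-2, 1, 0, -1]}` and passing the result to `count_items` reports one item better and one with no difference. Pooling more listeners narrows the interval, which is one reason the number of significant items can change between single-site and pooled analyses.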
The contribution notes that the CE performance could be increased with these additional encoder-only optimizations:
Roch Lefebvre, VoiceAge, noted that the items that showed an improvement were not consistent across test sites.
Bernhard Grill, FhG, noted that this CE provides bitrate savings at the lowest bitrates where it matters the most.
The presenter noted that at 8 kb/s and 12 kb/s the encoder always selects ACELP mode when the signal is clean speech. Hence the CE brings at least 3.0% and 2.5% savings, respectively.