International organisation for standardisation organisation internationale de normalisation






4 Task group activities

4.1 Joint Meeting

4.1.1 With Requirements on Audio for HEVC (Wed 1400-1500)


The Requirements and Audio Chairs reviewed a document that presented a number of application scenarios for new audio work, the most prominent of which was audio coding and presentation for visual display systems that make use of High Efficiency Video Coding (HEVC).

Such visual displays might be Ultra-HD (UHD) devices, such as 4K x 2K displays. In such displays, a much closer viewing distance is feasible and perhaps desirable. At such viewing distances there is both visual and audio envelopment. This could have a significant impact on audio presentation, such that much more accurate sound localization in terms of direction and distance is desirable. If there is only one viewer, aspects of the audio presentation might be individualized in some meaningful way.

The Chairs captured comments from the group, with respect to relevant application scenarios, relevant technology and, most importantly, relevant requirements. The results of the joint meeting are captured in the output document “Audio for HEVC.”

4.2 Task Group discussions

4.2.1 MPEG-2, MPEG-4, MPEG-7, Audio Conformance, Reference Software, MPEG Surround


Ferenc Kraemer, Dolby, presented

  1. m18406

  1. Proposed addition to Intensity Stereo in ISO/IEC 14496-26:2009, Audio conformance

  1. Ferenc Kraemer, Heiko Purnhagen

The contribution notes that many AAC encoders and decoders have a bug in the intensity stereo processing. The bug concerns the interaction between the intensity codebook indicated (INTENSITY_HCB or INTENSITY_HCB2, numbers 15 and 14 respectively) and the ms_mask bit for a scale factor band. The correct behaviour is shown in the following table:




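| | Intensity in-phase | Intensity in-phase | Intensity out-of-phase | Intensity out-of-phase |
|---|---|---|---|---|
| IS Codebook | INTENSITY_HCB (#15) | INTENSITY_HCB (#15) | INTENSITY_HCB2 (#14) | INTENSITY_HCB2 (#14) |
| M/S Bitmask Field | 0 | 1 | 0 | 1 |
| Phase Position of Left and Right | 0° | 180° | 180° | 0° |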


The contribution notes that the ms_mask_present flag can be used to “preset” the values of the ms_mask array. A buggy encoder or decoder ignores the role that the “preset” or “implicit” values of the ms_mask bit play when constructing the desired intensity phase.
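The rule in the table, including the preset case, can be written as a minimal sketch. It assumes the AAC syntax convention that ms_mask_present values 0, 1 and 2 mean an all-zero mask, a transmitted mask and an all-one mask, and it follows the contribution's reading that preset bits take part in the phase decision. The function names are hypothetical and are not taken from any reference software.

```python
INTENSITY_HCB = 15   # in-phase intensity codebook
INTENSITY_HCB2 = 14  # out-of-phase intensity codebook


def effective_ms_bit(ms_mask_present, ms_used, sfb):
    """Return the ms_mask bit for one scale factor band, including preset values."""
    if ms_mask_present == 0:
        return 0             # preset: mask is implicitly all zeros
    if ms_mask_present == 2:
        return 1             # preset: mask is implicitly all ones
    return ms_used[sfb]      # ms_mask_present == 1: the bit is transmitted


def intensity_sign(codebook, ms_bit):
    """Return +1 for 0 degrees (in-phase) or -1 for 180 degrees (out-of-phase)."""
    if codebook not in (INTENSITY_HCB, INTENSITY_HCB2):
        raise ValueError("not an intensity codebook")
    base = 1 if codebook == INTENSITY_HCB else -1
    return base * (1 - 2 * ms_bit)  # a set ms_mask bit flips the phase
```

A decoder that only looks at transmitted bits would skip the two preset branches of `effective_ms_bit`, which is the behaviour the contribution describes as buggy.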

The Chair asked the group to consider whether additional text in the informative part of the MPEG-2 AAC and MPEG-4 AAC specifications to clarify the new encoder behaviour might be an appropriate additional action.

It was the consensus of the Audio Subgroup to issue two DCORs, one against MPEG-2 Conformance and one against MPEG-4 Audio Conformance.

Ferenc Kraemer, Dolby, presented

  1. m18426

  1. Additional information on MPEG Surround conformance testing

  1. Andreas Hölzer, Christian Ertel, Markus Lohwasser, Michael Härtl, Ferenc Kraemer, Frans de Bont

The contribution presents data on the output of the MPEG Surround conformance tool. It reports on the decoding performance, in terms of LSBs of difference relative to a given reference, of a number of important platforms, e.g. floating point and fixed point.
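As an illustration of a difference measured in LSBs, the following sketch compares a decoded integer PCM signal with a reference. The report does not say which statistics the conformance tool produces, so the maximum absolute difference and the RMS difference used here are assumptions, and `lsb_difference` is a hypothetical name.

```python
import math


def lsb_difference(decoded, reference):
    """Return (max_abs_diff, rms_diff) in LSBs for two integer PCM sequences."""
    if len(decoded) != len(reference):
        raise ValueError("signals must have equal length")
    if not decoded:
        raise ValueError("signals must not be empty")
    diffs = [d - r for d, r in zip(decoded, reference)]
    max_abs = max(abs(x) for x in diffs)
    rms = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    return max_abs, rms
```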

The presenter notes that MPEG Surround conformance criteria must be defined and expects to bring more information to the next meeting.


4.2.2 MPEG-D Spatial Audio Object Coding


Leon Terentiv, FhG, presented

  1. m18436

  1. Report on corrections for MPEG SAOC

  1. Jonas Engdegård, Heiko Purnhagen, Oliver Hellmuth, Jürgen Herre, Cornelia Falch, Leon Terentiv, Maria Luis Valero, Johannes Hilpert, Andreas Hölzer, Werner Oomen

The contribution proposes two small corrections to the Low Delay MPEG Surround part of the SAOC specification.

It was the consensus of the Audio Subgroup to incorporate the proposed changes into a revised Defect Report that is an output of this meeting.

Oliver Hellmuth, FhG, presented

  1. m18437

  1. Revised draft of SAOC verification test report

  1. Jonas Engdegård, Heiko Purnhagen, Oliver Hellmuth, Jürgen Herre, Cornelia Falch, Leon Terentiv, Maria Luis Valero, Johannes Hilpert, Andreas Hölzer, Werner Oomen

The contribution is a draft of the SAOC test report. The presenter reviewed the document. It was noted that perhaps results for individual items should be added to the report, but as an Annex. The Chair noted that figures that graphically

It is the consensus of the ASG to



  • Issue the SAOC verification test report at this meeting, with an editing period. This will be a public document.

  • Look forward to the “Karaoke” solution at the 95th meeting. It is expected to be added to the SAOC Defect Report (from this meeting) and to issue as a DCOR at the 95th meeting.

  • Based on info at the 95th meeting, decide on whether to update the VTR with a Karaoke test. Such a revised VTR would issue at the 96th meeting.

4.2.3 MPEG-D Unified Speech and Audio Coding


CE Educational Material

Markus Multrus, FhG, presented



  1. m18454

  1. Informative Encoder Description for USAC Improved Noiseless Coding CE

  1. Vignesh Subbaraman, Markus Multrus, Kihyun Choo

The contribution contains text for the informative annex of the USAC specification that describes how to build the spectral arithmetic coding tool for the USAC encoder. The presenter verified that the MPEG Reference Encoder has full source code for the USAC Improved Noiseless Coding CE functionality.

Kei Kikuiri, NTT DOCOMO, presented



  1. m18473

  1. Educational Information on Encoder Implementation of inter-TES tool in USAC

  1. Kei Kikuiri, Atsushi Yamaguchi, Nobuhiko Naka

The contribution contains text for the informative annex of the USAC specification which describes how to build the inter-TES tool in the USAC encoder. The presenter verified that the MPEG Reference Encoder has only bitstream syntax support for the USAC inter-TES tool.

It was the consensus of the Audio Subgroup that the contribution provides sufficient educational information. It was noted that Markus Multrus, FhG, reported in an email (Sep 24, 2010, 16:35 CEST) that the inter-TES normative decoder software has been integrated into the USAC reference decoder. Hence this CE is concluded.



USAC Performance

Gregory Pallone, Orange Labs, presented



  1. m18434

  1. Report on USAC performance

  1. Gregory Pallone, Pierrick Philippe

The contribution reports on a listening test that compares the performance of USAC, MPEG-4 AAC and MPEG-4 HE-AAC. Systems under test were:

  • aac+64: HE AAC encoder @ 64kbps

  • aac96: AAC encoder @ 96kbps

  • aac128: AAC encoder @ 128kbps

  • usac64: reference quality bitstream of WD7 @ 64 kbps

  • usac96: reference quality bitstream of WD7+CE (improved stereo) @ 96 kbps

There were ten subjects after post-screening. When averaged over all listeners and all test items, USAC demonstrates a slight increase in performance (in the mean) for the range of bit rates tested.

USAC Configuration, Payload and Transport Issues

Markus Multrus, FhG, presented



  1. m18480

  1. Comments on further USAC investigation

  1. Max Neuendorf, Markus Multrus, Stefan Doehla, Werner Oomen, Heiko Purnhagen

This contribution presents a more detailed list of requirements that should be addressed in the final steps of the USAC development work.

It divides the requirements into three broad categories:



  • Configuration of decoder

  • Payload (i.e. Access Unit)

  • Transport (of payloads)

Contributions on this topic should refer to what aspect of the “requirements” (as put forth in this document) is addressed.


It was agreed to make the body of the contribution a section of the USAC CE Status and Workplan document.

MPEG Reference Encoder

David Virette, Huawei, presented

  1. m18470

  1. A new signal classifier for USAC reference encoder

  1. David Virette, Lijing Xu, Wei Xiao

The contribution describes a new software module for the USAC MPEG Reference Encoder that analyzes the input signal and determines which coding mode should be applied. The performance of this classifier was compared to both the current MPEG Reference Encoder and the RM7 bitstream classification decisions.

It was the consensus of the Audio Subgroup to incorporate this new module into the SVN trunk of the USAC Reference Encoder.

Jeongook Song, Yonsei University, presented

  1. m18429

  1. Status report on USAC Reference Software JAME

  1. Jeongook Song, Henney Oh, Hong-Goo Kang

The contribution reports on the status of the JAME USAC encoder project. In the AhG period running up to the 94th MPEG meeting, work was done to revise the psychoacoustic model tool and corresponding bit-allocation loop tool, but that work is not yet complete and so no new code release is available at this MPEG meeting.

The Chair hopes that, at the 95th MPEG meeting, a workplan can be created that coordinates a listening test done by interested companies in the MPEG Audio Subgroup that will benchmark the performance of the JAME software encoder against the Reference Quality Encoder.

Additional bandwidth extension

Henney Oh, Yonsei University, presented



  1. m18428

  1. Yonsei Crosscheck listening test report on additional bandwidth extension CE for USAC

  1. Jeongook Song, Eunwoo Song, Henney Oh

The contribution presents listening test results for USAC operating at 8 kb/s for mono signals. It compares RM7+CE and RM7. For absolute scores there were no differences at the 95% level of significance. When analyzing differential scores, 3 items were worse and the mean was worse at the 95% level of significance assuming a Normal distribution. When analyzed with the Student-t distribution, 1 item was worse and the mean was worse at the 95% level of significance.
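The reason fewer items are significant under the Student-t distribution is that its 95% critical value is larger than the Normal one for small listener panels, so the confidence interval is wider. The sketch below shows this for one item. The per-listener differential scores are invented for illustration (the report does not list them), and the function names are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
             11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131}


def diff_confidence_interval(diff_scores, use_student_t):
    """Return the 95% confidence interval of the mean differential score."""
    n = len(diff_scores)
    if n < 2:
        raise ValueError("need at least two listeners")
    mean = statistics.mean(diff_scores)
    sem = statistics.stdev(diff_scores) / math.sqrt(n)  # standard error
    if use_student_t:
        crit = T_CRIT_95[n - 1]  # table covers up to 16 listeners
    else:
        crit = statistics.NormalDist().inv_cdf(0.975)
    return mean - crit * sem, mean + crit * sem


def verdict(diff_scores, use_student_t):
    """Classify an item as better, worse or not different (CE minus reference)."""
    low, high = diff_confidence_interval(diff_scores, use_student_t)
    if low > 0:
        return "better"
    if high < 0:
        return "worse"
    return "no difference"


# Hypothetical differential MUSHRA scores (RM7+CE minus RM7) from 8 listeners.
example_diffs = [-8, 1, -6, 2, -5, 1, -7, -2]
```

With these invented scores the Normal interval excludes zero while the Student-t interval includes it, so the same item would be counted as "worse" in one analysis and "no difference" in the other.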

David Virette, Huawei, presented



  1. m18472

  1. Progress report on additional bandwidth extension CE for USAC at low bit rates

  1. David Virette, Wei Xiao

The contribution gave an overview of the CE technology. This experiment had a bitrate of 8 kb/s and a sampling rate of 19200 Hz. The technology extends the bandwidth up to 9.6 kHz. Two architectures were presented: one is “outside” of the SBR framework, which is appropriate for SBR extension that covers nearly up to the Nyquist frequency (typically when the internal sampling rate is adjusted lower for coding efficiency). The other is “inside” the SBR framework, which is appropriate when the SBR extension is far short of the Nyquist frequency.

The contribution presents results of a listening test of RM7+CE and RM7. For absolute scores there were no differences at the 95% level of significance. When analyzing differential scores, 1 item was at the 95% level of significance

Anecdotally, the presenter noted that listeners had to weigh the perceived effects of greater bandwidth and additional noise.

Next steps for this CE will be supported by the USAC CE Workplan.



Increased structural flexibility in SBR

Max Neuendorf, FhG, presented



  1. m18432

  1. Status update of CE proposal on increased structural flexibility in SBR

  1. Stephan Wilde, Markus Multrus, Max Neuendorf, Kristofer Kjörling, Heiko Purnhagen

The contribution presented an overview of the proposed technology. The basic idea is to extend the SBR functionality from 1:2 extension to 1:4 extension. The 1:4 extension mode is appropriate for very low bit rate systems (e.g. 12 kb/s) which may have very low internal sampling rates in the encoder/decoder framework. The 1:4 extension is achieved by using only 16 of the existing 64 bands of the QMF filterbank for the core coder signal. It notes that a similar technology has already been specified in the DRM system.

The contribution also presents results of a listening test for RM7+CE versus RM7 for the 8 kb/s mono operating mode. For absolute scores, 1 item is better and the mean is better. For differential scores, 9 items are better and the mean is better. The CE technology brings 4 MUSHRA points improvement for the mean and up to 7 points for Music3.

The presenter noted that RM7 and RM7+CE both use a core coder sampling rate of 9.6 kHz, so that the RM7 output sampling rate is 19.2 kHz and the RM7+CE output sampling rate is 2 × 19.2 kHz = 38.4 kHz.
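The sampling-rate and band arithmetic for the two SBR ratios can be checked with a short sketch. It assumes that a 1:ratio extension multiplies the core sampling rate by the ratio and leaves 64/ratio of the QMF bands to the core coder signal, which matches the 16-of-64 figure given for the 1:4 mode. The name `sbr_rates` is hypothetical.

```python
QMF_BANDS = 64  # bands of the existing QMF filterbank, as stated in the summary


def sbr_rates(core_rate_hz, ratio):
    """Return (output_rate_hz, core_qmf_bands) for a 1:ratio SBR extension."""
    if ratio <= 0 or QMF_BANDS % ratio:
        raise ValueError("ratio must divide the number of QMF bands")
    return core_rate_hz * ratio, QMF_BANDS // ratio
```

For a 9.6 kHz core this gives a 19.2 kHz output with 32 core bands in the 1:2 mode and a 38.4 kHz output with 16 core bands in the 1:4 mode.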

The presenter expects to bring a complete CE proposal to the 96th MPEG meeting.


Enhanced long term predictor (eLTP)

Henney Oh, Yonsei University, presented



  1. m18430

  1. Updated CE on enhanced long term predictor (eLTP) for USAC

  1. Jeongook Song, Henney Oh, Hong-Goo Kang

The contribution presents an enhanced functionality, in that the eLTP tool now operates for all USAC core coding modes.

The Audio Subgroup looks forward to more information at the next meeting.


TCX windowing

Seungkwon Beack, ETRI, presented



  1. m18234

  1. Progress report on the TCX windowing CE for USAC

  1. Taejin Lee, Seungkwon Beack, Minje Kim, Kyeongok Kang

The contribution noted that there was not sufficient time in the AhG period to execute all aspects of the workplan, so that there are no new results to report at this meeting.

The Audio Subgroup looks forward to more information at the next meeting.


Time warping

Zhao Dan, Panasonic, presented



  1. m18419

  1. Status report of time warping CE in USAC

  1. Zhao Dan, Zhong Haishan, Chong Kok Seng, Takeshi Norimatsu, Tomokazu Ishikawa, Neo Sua Hong

The contribution presented an overview of the time warping technology.

It also presented results of a listening test for the operating modes 24 kb/s mono and 20 kb/s mono. Systems under test were:



  • No_TW (WD6 no time warping)

  • WDQ (WD6 time warping forced to be active)

  • TW_CE (WD6+CE)




| Statistic | Mode | Performance (differential score analysis) |
|---|---|---|
| TW_CE – No_TW | 24 kb/s | 3 items better, mean better |
| TW_CE – WDQ | 24 kb/s | 3 items better, mean better |
| TW_CE – No_TW | 20 kb/s | 8 items better, mean better |
| TW_CE – WDQ | 20 kb/s | 4 items better, mean better |

The contribution notes that the CE technology provides some bit savings (up to 2%) and also accommodates a higher dynamic range in the time warping parameter (via Huffman compression).

The Audio Subgroup looks forward to more information at the next meeting.

Pulse Indexing

Toru Chinen, Sony, presented



  1. m18396

  1. Sony listening test report on pulse indexing in USAC

  1. Toru Chinen, Masayuki Nishiguchi

The contribution presents results of a listening test which evaluated the RM7+CE versus RM7 at 20 kb/s for mono signals. All 15 test items were clean speech. It showed no differences when analyzing either absolute or differential scores at the 95% level of significance using Student-t distribution.

David Virette, Huawei, presented



  1. m18469

  1. Finalization of Enhanced Pulse Indexing CE for ACELP in USAC

  1. David Virette, Dejun Zhang, Fuwei Ma

The contribution summarizes the Enhanced Pulse Indexing CE technology. A theoretical analysis showed that it is able to provide the following additional compression efficiency:

| ACELP Mode | Bit savings per frame |
|---|---|
| 18k4 | 8 or 16 |
| 16k4 | 8 to 16 |
| 14k4 | 4 to 8 |

The contribution also presents the results of a listening test. When analyzing absolute scores, there were no differences at the 95% level of significance. When analyzing differential scores, 2 items were better at the 95% level of significance using Normal distribution.

The Sony and Huawei listening data were pooled and an analysis of the differential scores showed 1 item better (nadib2) and 1 worse (nadib1).

This was discussed later in the MPEG week (see later in this section).



PVC for SBR

Takeshi Norimatsu, Panasonic, presented



  1. m18361



  1. Panasonic cross check report on PVC for SBR envelope coding in USAC

  1. Takeshi Norimatsu, Tomokazu Ishikawa, Haishan Zhong, Dan Zhao, Kok Seng Chong

The contribution gave results of a listening test which evaluated the RM7+CE versus RM7 at 12 kb/s for mono signals. It showed no differences when analyzing absolute scores at the 95% level of significance using Normal distribution. When analyzing differential scores, 3 items were better at the 95% level of significance using Normal distribution, 2 items better when using the Student-t distribution.

The contribution also reports on verification of the RM7+CE decoder: all listening test bitstreams were decoded to produce exactly the WAV files used in the listening test.

Heiko Purnhagen, Dolby, presented

  1. m18374

  1. Dolby listening test results for CE on PVC for SBR in USAC

  1. Heiko Purnhagen, Kristofer Kjörling

The contribution gave results of a listening test which evaluated the RM7+CE versus RM7 at 20 kb/s for mono signals. It showed no differences when analyzing absolute scores at the 95% level of significance using Normal distribution. When analyzing differential scores, 2 items were better and 1 worse at the 95% level of significance using Normal distribution, 1 item better and 1 worse when using the Student-t distribution.

Markus Multrus, FhG, presented



  1. m18455

  1. FhG Listening Test Report – PVC for SBR envelope coding

  1. Frederik Nagel

The contribution gave results of a listening test which evaluated the RM7+CE versus RM7 at 20 kb/s for mono signals. It showed no differences when analyzing absolute scores at the 95% level of significance using Normal distribution. When analyzing differential scores, 3 items were better and 2 worse at the 95% level of significance using Normal distribution and 2 better, 2 worse if Student-t distribution is used.

Toru Chinen, Sony, presented



  1. m18399

  1. Report on PVC CE for SBR in USAC

  1. Toru Chinen, Yuki Yamamoto, Mitsuyuki Hatanaka, Masayuki Nishiguchi

The contribution gave an overview of the Predictive Vector Coding (PVC). The CE aims at reducing encoding bit rate and increasing the subjective quality by modifying the delta coding scheme of SBR envelope scalefactors. The proposed scheme is based on the prediction of SBR envelope scalefactors by using the energy of QMF subband samples below SBR range. The indices of prediction coefficient matrices are coded by using vector quantization.
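The idea of predicting the SBR envelope from low-band energies with an indexed set of coefficient matrices can be sketched as follows. This is only an illustration of the principle described above, not the PVC algorithm from the contribution: the matrix sizes, the energy domain, the squared-error criterion and all names are assumptions.

```python
def predict_envelope(matrix, low_band_energies):
    """Predict high-band envelope values as a matrix-vector product."""
    return [sum(c * e for c, e in zip(row, low_band_energies)) for row in matrix]


def choose_matrix_index(codebook, low_band_energies, target_envelope):
    """Return the index of the codebook matrix with the smallest squared error."""
    def squared_error(matrix):
        predicted = predict_envelope(matrix, low_band_energies)
        return sum((p - t) ** 2 for p, t in zip(predicted, target_envelope))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


# Hypothetical two-entry codebook of 2x2 prediction coefficient matrices.
example_codebook = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.25, 0.75]],
]
```

In this sketch the encoder would transmit only the chosen index, and the decoder would rebuild the envelope from the same low-band energies it has already decoded.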

PVC increases the complexity of the SBR tool by 0.4 WMOPS (peak load) and requires 1096 bytes of additional table storage. It provides 1.44% additional compression efficiency as averaged over all CfP test items (as compared to RM7 reference bit streams).

The contribution gave results of a listening test at Sony. When analyzing differential scores, 6 items were better and the mean was better.

When all data is pooled, 5 items are better and the mean is better. Of these, 1 item shows an improvement of 8 MUSHRA points and 2 items show an improvement of 5 points. When looking at individual test site results, 2 items are better for 2 of 4 sites, 2 items are better for 3 of 4 sites.

This was discussed later in the MPEG week (see later in this section).

Tonal component coding in eSBR

Marek Domanski, Poznan University, presented



  1. m18532

  1. Telcordia and PUT listening test results for CE on improved tonal component coding in eSBR (USAC)

  1. Tomasz Zernicki, Maciej Bartkowiak, Marek Domanski

The contribution reviewed the CE technology. It proposes a separate sinusoidal encoder/decoder and separate bit stream of sinusoidal model parameters, as shown in the following figure. The sinusoidal model parameter bit stream is typically 2 kb/s. The sinusoidal modelling is only run for the SBR high-band region.

It presents results of a listening test at 16 kb/s and 20 kb/s. When analyzing differential scores of RM7+CE – RM7, at



  • 16 kb/s 3 of the new items with many high-frequency tonal components showed improvement (e.g. saxophone, violin, accordion). None of the CfP items showed any degradation.

  • 20 kb/s 4 of the new items showed improvement.

The presenter clarified that the RM7 encoder was the MPEG Reference Encoder compiled and run on the CfP items and the new CE items.

As a next step, it was agreed to draft a workplan in which the CE listening test experiment is repeated, but using the Reference Quality Encoder.



New CEs

New excitation coding for LPD mode

Roch Lefebvre, VoiceAge, presented



  1. m18481

  1. Proposed CE for extending the LPD mode in USAC

  1. Bruno Bessette, Philippe Gournay, Roch Lefebvre

The contribution proposes to add an additional excitation coding method for use in ACELP coding mode.


The additional tool gives the following advantages:

  • Reduce the time and frequency dynamics of the residual (LPC excitation)

  • Better control of bit allocation for LPC excitation

  • More explicit control of coding noise in ACELP frame.

  • Re-use existing tools

  • Achieve higher quality for the ACELP coding mode at higher bit rates (i.e. above 32 kb/s)

The contribution presents the results of a listening test for stereo signals. The following Systems or Codec under Test (CuT) were tested:



  • USAC WD6 at 32 kbps stereo (switched FD/LPD)

  • Modified USAC WD6 (CuT) at 48 kbps stereo (LPD only)

  • Modified USAC WD6 (CuT) at 64 kbps stereo (LPD only)

  • USAC WD6 at 64 kbps stereo (FD only)

The results show that for the differential statistics

  • (CuT at 48 kbps – USAC WD6 at 32 kbps) – all items and the mean showed improvement at the 95% level of significance.

  • (CuT at 64 kbps – USAC WD6 at 64 kbps) – 5 better, 1 worse, mean better, at the 95% level of significance.

The proponent expects to bring a complete CE to the next meeting. It was agreed to draft a workplan to coordinate cross-check efforts.

Enhanced Performance at Mid Bitrates

Markus Multrus, FhG, presented



  1. m18479

  1. Proposed Core-Experiment on Enhancing the USAC Codec at Mid Bitrates

  1. Markus Multrus, Philippe Gournay, Nikolaus Rettelbach, Bruno Bessette, Bernhard Grill

The contribution notes that USAC can switch between a block-based transform coder and an LP coder. Because switching to LP mode must be accommodated, the transform coder has a fixed block length and cannot adjust its time resolution (e.g. via internal sampling rate conversion). It further notes that higher temporal resolution can be achieved via either of two means:

  • Shorten transform frame size

  • Increase sampling rate

The proposed technology used both methods to achieve higher time resolution at mid bit rates (e.g. 24 kb/s). The proposed new operating mode is shown in the following figure:

The computational complexity of the proposed system is comparable to a system run at 44.1 kHz sampling rate (with a 1:2 SBR configuration). The memory complexity, comprised of tables for the new transform block lengths, is approximately 960 32-bit words.

The contribution presents results of a listening test of 24 kb/s mono. When differential scores are analyzed: (RM7+CE – RM7), 4 items are better at the 95% level of significance.

The presenter requests a workplan that will specify an experimental setup and coordinate cross-checks, and expects to bring a complete CE to the next meeting.



Improved stereo coding

David Virette, Huawei, presented



  1. m18474

  1. CE proposal on improved stereo coding at low bit rates

  1. David Virette

The contribution describes the problem in USAC stereo coding that is addressed by the CE technology. For low bit rate stereo coding, when only CLD and ICC parameters are used, the upmix matrix may have a wrong behaviour for negative ICC parameters. The proposed technology uses an IPD estimation to better reconstruct the stereo signal in case of negative ICC.

The contribution reports the results of a listening test at 20 kb/s stereo. With differential analysis of the statistic (H6+CE – H6), there is 1 item better and mean better at the 95% level of significance, where “H6” is Huawei’s internal USAC encoder which employs the same tools as in RM6.

Werner Oomen, Philips, noted that there are already means used in USAC to neutralize the anti-phase, and it might be interesting to compare the CE technology with current USAC tools.

The presenter requests a workplan that will specify an experimental setup and coordinate cross-checks, and expects to bring a complete CE to the next meeting.



Lower-Complexity Decorrelator

Julien Robilliard, FhG, presented



  1. m18433

  1. Proposed decorrelator improvements in USAC

  1. Julien Robilliard, Matthias Neusinger, Johannes Hilpert, Erik Schuijers, Bert den Brinker, Werner Oomen

The contribution presented results of an investigation on reducing the complexity of the MPS 212 decorrelator without compromising audio quality. A complexity analysis of the current and proposed decorrelator is shown in the table below. The main point is that the decorrelator complexity is reduced by more than 50%, and that the complete decoder complexity is reduced by nearly 15%.




| | Proposed filter (MOPS) | RM8 filter (MOPS) | Complexity reduction |
|---|---|---|---|
| Decorrelator tool | 1.7 | 3.6 | 52.8% |
| Complete USAC decoder | 11.2 | 13.1 | 14.5% |
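The reduction percentages follow directly from the MOPS figures. The short check below assumes the reduction is measured relative to the RM8 filter; `reduction_percent` is a hypothetical helper name.

```python
def reduction_percent(proposed_mops, current_mops):
    """Return the complexity reduction relative to the current figure, in percent."""
    if current_mops <= 0:
        raise ValueError("current complexity must be positive")
    return 100.0 * (current_mops - proposed_mops) / current_mops


decorrelator = round(reduction_percent(1.7, 3.6), 1)     # decorrelator tool
full_decoder = round(reduction_percent(11.2, 13.1), 1)   # complete USAC decoder
```

Both rows save the same 1.9 MOPS in absolute terms; the percentage is smaller for the complete decoder only because the baseline is larger.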

The contribution presents results of a listening test for 32 kb/s stereo, showing no significant differences for any item at the 95% level of significance.

The presenter requests a workplan that will specify an experimental setup and coordinate cross-checks, and expects to bring a complete CE to the next meeting.

Improved applause coding

Julien Robilliard, FhG, presented



  1. m18413

  1. CE proposal on improved applause coding in USAC

  1. Achim Kuntz, Sascha Disch, Erik Schuijers, Werner Oomen

The contribution is a joint proposal from FhG-IIS and Philips on Transient Steering Decorrelator (TSD). It notes that the parametric stereo technology inherited from MPEG Surround has a number of shortcomings:

  • Narrowed sound stage

  • Lack of envelopment

  • Sound coloration

The proposed solution is to add a block that separates out transient components and passes them through a new block especially adapted for decorrelating transient signals. This is shown in the following block diagram:

Results of a listening test at 32 kb/s stereo were presented. Analysis of absolute MUSHRA scores showed that 3 “applause” items showed significant improvement (more than 10 MUSHRA points) at the 95% level of significance. Analysis of differential scores showed that all items are improved at the 95% level of significance.

The presenter requests a workplan that will specify an experimental setup and coordinate cross-checks, and expects to bring a complete CE to the next meeting.

Enhanced Mode Transitions

Sangoh Jeong, LG, presented



  1. m18393

  1. Core Experiment on Enhanced Mode Transitions in USAC

  1. Kiho Cho, Nam Soo Kim, Sungyong Yoon, Sangoh Jeong

The contribution proposes a technology for transitions between TCX and ACELP frames and from FD to ACELP frames. The main idea is shown in the following figures.

It notes that the proposed technology can save approximately 1% of the RM7 bitrate, depending on the operating mode. The complexity of the proposal is less than the FAC operation.

The results of a listening test are reported for 20 kb/s mono and 12 kb/s mono. It shows no differences for either absolute scores or differential scores.

Philippe Gournay, VoiceAge, noted that the proposed technology has two potential shortcomings:



  • Zero the filter state on the MDCT (i.e. TCX, FD) to ACELP transition

  • Base the TDAC on ACELP encoded signal

Max Neuendorf, FhG, agreed with Philippe Gournay in that FhG experimented with an LP filter state reset and found that it had a negative impact on audio quality. He further noted that the adaptive codebook in ACELP assumes a “time forward” sequence in the residual, while the folding operation results in a “time reversed” interval that will not be efficiently encoded.
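The “time reversed” interval comes from the time-domain aliasing that is inherent in the MDCT. The sketch below demonstrates the standard identity that an unwindowed MDCT of a block (a, b, c, d) equals a DCT-IV of the folded block (−c_r − d, a − b_r), where the subscript r denotes time reversal. It illustrates the folding in general and is not the transition scheme proposed in the CE; the function names are hypothetical.

```python
import math


def mdct(x):
    """Direct MDCT of 2N time samples to N coefficients (no window)."""
    n = len(x) // 2
    return [sum(x[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def fold(x):
    """Fold (a, b, c, d) into (-c_r - d, a - b_r); length must be a multiple of 4."""
    if len(x) % 4:
        raise ValueError("block length must be a multiple of 4")
    q = len(x) // 4
    a, b, c, d = x[:q], x[q:2 * q], x[2 * q:3 * q], x[3 * q:]
    first = [-cr - dd for cr, dd in zip(reversed(c), d)]   # contains c reversed
    second = [aa - br for aa, br in zip(a, reversed(b))]   # contains b reversed
    return first + second


def dct_iv(u):
    """Direct DCT-IV of N samples."""
    n = len(u)
    return [sum(u[t] * math.cos(math.pi / n * (t + 0.5) * (k + 0.5))
                for t in range(n))
            for k in range(n)]
```

Because the quarters b and c enter the folded block in reversed order, a signal reconstructed from a single transform block contains time-reversed components until they are cancelled by overlap-add with the neighbouring block.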

The presenter requests a workplan that will specify an experimental setup and coordinate cross-checks, and expects to bring a complete CE to the next meeting.



Unified global gain syntax

Max Neuendorf, FhG, presented



  1. m18450

  1. Proposed unified global gain syntax element in USAC

  1. Guillaume Fuchs, Markus Multrus, Max Neuendorf

The contribution proposes to modify syntax elements of the USAC bit stream to support a means to change the USAC output level by changing a single global gain element.

The current gain elements are shown here:



| Mode | Gain element |
|---|---|
| FD | 8-bit “Global gain” per 1024 length blocks |
| TCX | 7-bit “TCX global gain” per 256 to 1024 length blocks |
| ACELP | 2-bit “Mean energy” per 256 length block |
| SBR | Codes absolute envelope energy. (This does not automatically scale when global gain of core coder changes.) |

The proposed modifications are shown here:



| Mode | Gain element |
|---|---|
| FD | Stays the same: 8-bit “Global gain” per 1024 length blocks |
| TCX | 1 bit refinement: 7-bit “TCX global gain” per 256 to 1024 length blocks |
| ACELP | Remove “Mean energy” |
| SBR | Code absolute envelope energy relative to the core codec energy. |

This is shown graphically in the following figure:




Since the proposed change is not “noiseless,” the contribution presents the results of a listening test at 16 kb/s mono. One system under test was the proposed technology, another was the proposed technology, but decreased in level via the 8-bit Global gain element and then increased in level in the decoded PCM domain. When examining absolute MUSHRA scores, there is no difference at the 95% level of significance. When examining differential scores, there is no difference at the 95% level of significance.

Heiko Purnhagen, Dolby, asked what requirement is addressed with the proposal. In particular, he noted that the SBR high-band level adjustment is only responsive to an SBR header. Hence, adjustments in global gain will not impact the SBR high-band until an SBR header occurs in the bitstream.

The presenter requests a workplan that will specify an experimental setup and coordinate cross-checks, and expects to bring a complete CE to the next meeting.

Additional discussion on PVC for SBR

Toru Chinen, Sony, made a presentation on a proposed workplan. This was discussed and it was the consensus of the Audio Subgroup to draft a workplan to execute the following points:



  • Gather new evidence based on RM7 code base.

  • Bring evidence of performance, as a listening test, at one additional operating point, e.g. 8kb/s.

  • If the PVC bit stream and decoded output does not change for the 12 kb/s operating point, then that test does not have to be repeated.


Additional discussion on T/F domain post processing

David Virette, Huawei, made a presentation on the performance of T/F domain post processing at 8 kb/s and 12 kb/s mono. The new information pertains to calculating the 95% confidence intervals using the Student-t distribution. In addition, he presented information on the complexity of the two modules proposed in the CE.

He presented a comparison of the role of the eTES processing and the T/F post processing. He noted that


  • The eTES processing sharply adjusts the time-varying energy envelope

  • The T/F domain post processing applies a smooth emphasis to the spectral peaks and smooth de-emphasis to the spectral valleys, as in classical speech post processing.

He presented a graphic showing per-item listening test results, and the extent that the two modules (flattening and post-processing) were active in the frames of each item.

He concluded by noting that post-filtering is well known in the speech coding community and has been shown to provide performance improvements at low bit rates.

Finally, he requests that the post-processing tool be incorporated, and to rely on the Dolby improved SBR envelope flattening to perform the spectral flattening function.

There was considerable discussion. In particular, it was noted that it is not clear



  • Which flattening technique is best (CE from Dolby or CE from Huawei)

  • Whether the T/F post processing provides merit in addition to the spectral flattening

The workplan will conduct additional listening tests at 12 kb/s mono and 8 kb/s mono operating points. The systems under test will be:

1. RM7+iSBR

2. RM7+iSBR+T/Fpp

3. RM7+CEfull (flattening always active)

where


  • iSBR is “Improved SBR” technology, which flattens the high-band spectral envelope

  • T/Fpp is the Time/Frequency post processing filter

  • CEfull is both the spectral flattening module and the Time/Frequency post processing module proposed in the CE

In system 3, above, the spectral flattening module will be tuned so that it is active for all content categories, namely speech, speech+music and music.
Based on the MUSHRA scores, the following difference statistics would be analyzed (where numbers refer to systems, as labelled above). For each diff score, if there is significant improvement, then an action is indicated.


| Diff Score | Conclusion if significant positive value |
|---|---|
| 2-1 | Incorporate iSBR + T/Fpp |
| 3-2 | Replace iSBR with CEflattening and incorporate T/Fpp |
| 3-1 | |
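The difference-score analysis can be sketched as follows. This is only an illustration under stated assumptions: the scores are hypothetical, the systems are assumed to be rated by the same listeners so that scores can be paired, "significant" is taken to mean that the 95% confidence interval on the mean difference lies entirely above zero, and the Student-t critical value is supplied by the caller. Only the two rows for which the report gives a conclusion are included.

```python
import math
import statistics


def significant_improvement(scores_a, scores_b, t_crit):
    """True if the mean of (scores_a - scores_b) is significantly above zero.

    scores_a and scores_b are paired (same listener and item order).
    t_crit is the two-sided 95% Student-t critical value for
    len(scores_a) - 1 degrees of freedom, supplied by the caller.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff - half_width > 0


# Hypothetical MUSHRA scores for eight listeners, keyed by system number.
systems = {
    1: [60, 62, 58, 65, 61, 59, 63, 60],  # RM7+iSBR
    2: [66, 67, 63, 70, 68, 64, 69, 66],  # RM7+iSBR+T/Fpp
    3: [67, 66, 64, 70, 69, 63, 70, 67],  # RM7+CEfull
}
T_CRIT_DF7 = 2.365  # two-sided 95% critical value for 7 degrees of freedom

# Actions for the rows whose conclusion is stated in the report.
actions = {
    (2, 1): "Incorporate iSBR + T/Fpp",
    (3, 2): "Replace iSBR with CEflattening and incorporate T/Fpp",
}

for (a, b), action in actions.items():
    if significant_improvement(systems[a], systems[b], T_CRIT_DF7):
        print(f"{a}-{b}: {action}")
    else:
        print(f"{a}-{b}: no significant improvement")
```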





Additional discussion on QMF transposer

KK, Dolby, made a presentation on next steps for the QMF transposer. The possible decisions are



  • Keep both QMF and FFT transposer

  • Replace FFT transposer by QMF transposer

The workplan will conduct additional listening tests at 12 kb/s mono and 8 kb/s mono operating points. The baseline software will be RM7 plus the transposer cross-products technology.

RM8 Reference Software

Markus Multrus, FhG, discussed an issue pertaining to the RM8 reference software. A bug was discovered in the decoder when it operates at 64 kb/s with the time warping tool active. The window size, a float, was stored as an int and then used again as a float, so that on second use the window size was off by one. This bug resulted in a maximum difference of 42 in the decoded waveform, although the RMS error was much smaller.
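The class of bug described here can be illustrated with a short sketch. The report does not give the actual code, so the function names and the window size value below are hypothetical; Python's int() truncates toward zero in the same way as a float-to-int cast in C.

```python
def store_and_reuse(window_size: float) -> float:
    """Mimic the faulty pattern: float -> int (truncation) -> float."""
    stored = int(window_size)  # truncates toward zero, like a C cast
    return float(stored)


def store_and_reuse_rounded(window_size: float) -> float:
    """One possible remedy: round to the nearest integer before storing."""
    stored = round(window_size)
    return float(stored)


# Hypothetical window size that lands just below an integer after
# floating-point arithmetic.
window_size = 1023.9999

print(store_and_reuse(window_size))          # truncated value
print(store_and_reuse_rounded(window_size))  # rounded value
```

Whether rounding or keeping the value as a float throughout was the fix adopted in RM8 is not stated in the report.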

It was the consensus of the Audio Subgroup to:


  • Correct the RM8 Reference Software

  • Correct the RM8 decoded waveforms

An email to the mpeg-audio-call reflector reported that this update was done during the MPEG meeting.
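The two error figures quoted for the decoder bug, maximum difference and RMS error, can be computed as in the following minimal sketch. The sample values are hypothetical and chosen only so that a single sample differs by 42; they are not taken from the RM8 waveforms.

```python
import math


def compare_waveforms(reference, test):
    """Return (max_abs_diff, rms_error) between two equal-length waveforms."""
    if len(reference) != len(test):
        raise ValueError("waveforms must have the same number of samples")
    diffs = [r - t for r, t in zip(reference, test)]
    max_abs_diff = max(abs(d) for d in diffs)
    rms_error = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return max_abs_diff, rms_error


# Hypothetical 16-bit PCM samples: one sample differs by 42.
reference = [0, 100, -200, 300, -400, 500, -600, 700]
test = [0, 100, -200, 342, -400, 500, -600, 700]

max_abs_diff, rms_error = compare_waveforms(reference, test)
print(f"max difference = {max_abs_diff}, RMS error = {rms_error:.2f}")
```

An isolated large difference raises the maximum but contributes little to the RMS error, which is consistent with the RMS error being much smaller than the maximum difference.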

Performance of MPEG Reference Encoder Software

The Chair stated that it would be very desirable for the Audio Subgroup to conduct a small-scale but well controlled listening test to measure the performance of the MPEG Reference Encoder software. There was good discussion on this proposal and the following issues were identified. It is believed that the JAME software project represents the best version of the MPEG Reference Encoder software. The Yonsei experts verified that the JAME software is not encumbered by external (i.e. non-MPEG) copyright, such that all source code files can have the MPEG copyright header.

The Chair noted that such a listening test could be conducted in April, 2011, in which case it will not conflict with the USAC Verification Test work, which is envisioned to have listening tests in June, 2011. A test in the April, 2011 time frame could use the Reference Quality bit streams made available from the March MPEG meeting, which should be identical or at least very close in performance to the Reference Quality bit streams used in the USAC Verification Test.

USAC Timeline and CE Logistics

The Chair presented a timeline for USAC activities, shown in the table below. This was discussed and audio experts agreed to this schedule. However, the Chair acknowledged that the new ballot intervals under which USAC must operate, plus the DIS editing period and ballot processing, could mean that the DIS ballot results are not available at the 97th meeting, thus making it impossible to progress to FDIS in July 2011.



| Meeting | Date | Activity |
|---|---|---|
| 94th | Oct 2010 | Study on CD |
| 95th | Jan 2011 | The default position is that all CEs must be complete at this meeting. Exceptions will be discussed on a case by case basis; DoC on CD; DIS text |
| 96th | Mar 2011 | Workplan for Verification Test |
| | May 2011 | Integrate CE reference software into normative decoder |
| | Jun 2011 | Conduct Verification Test listening tests and make results available as Excel spreadsheet |
| 97th | Jul 2011 | DoC on DIS; FDIS text; Produce Verification Test Report |
| | Sep 2011 | IS |


Name for USAC specification

The Chair discussed the need for a name for the USAC specification, and expressed a strong interest in having that name at the 95th MPEG meeting. The Chair suggested the following name and acronym, and asked audio experts to please come to the next meeting with something much better:

eXtended Audio Coding – XAC

During the meeting, other experts suggested:

NSA – Next Stage Audio

FEAC – Future Entertainment Audio Coder

FRAC – Full Range Audio Coder

C-AAC – Content Agnostic Audio Coder



MMAC – MPEG Media Audio Coder
