International organisation for standardisation




MPEG-4

Technical issues

Compression techniques


Mr. Moriya presented m3686, reporting on the convergence of Twin-VQ and AAC. The rationalisation of the two toolsets covered the extension of sampling rates and the introduction of a Flexmux interface. Sampling rates from 8 kHz to 48 kHz are now supported. However, this was reported to cause some loss of quality at certain combinations of bitrate, sampling frequency and programme item. It was noted that the complexity is significantly reduced, and that some minor modifications to the quantisers and enhancement parts will be verified by the end of August. Mr. Moriya also has to submit bitstreams for conformance testing.

Mr. Nishigushi presented the Sony Labs appraisal of the proposed changes to NB- and WB-CELP, m3700. This showed small improvements (typically one quarter of ‘slightly better than’) in the quality in all modes tested, using Japanese listeners listening to English and German.

Mr. Nomura presented m3756, describing the changes proposed to their CELP coder and the associated test results. The changes relate to the LSP codebook and the LSP parameter. Modified syntax was provided. Again, the improvement is statistically significant but very small in terms of quality. Mr. Paley reported informal tests done by TI, in which CELP coding was evaluated with noisy signals and problems were found. It was ascertained that the changes in NB-CELP were constrained to the contents of a code table.

In m3763, Philips report test results relating to WB-CELP Optimised VQ and MPEG-4 VM (990508) Scalable VQ. The results show a small improvement.

Mr. Richard presented m3758, which showed test results for three of the CELP options. Modified NB-CELP is worse than G.723.1 but better than the unmodified NB-CELP. Modified WB-CELP (mode7) is better, most of the time, than WB-CELP (mode3).

Mr. Tanaka also presented m3750 showing additional results with small differences.

Three decisions were required of the Subgroup.

1) NB-CELP mode 8 VM replace? Changes are to the informative post filter, normative codebook changes and new values for LPC window. Accepted.

2) Modified additional mode, 6.2 kb/s and 30 ms frame size? Changes are an additional configuration table entry in encoder and decoder (number of subframes and parses). 30 ms frame already there for lower bitrates. Accepted.

3) WB-CELP mode 7, modification to add additional modes? Add some tables (size 3K words of ROM, 850 words RAM), no new tools, but increased complexity, with 75% increase in ROM. Mr. Oomen carried out further analysis of the complexity of the new mode, showing significant extra computational complexity in the encoder (perhaps 10:1) and some increase in the decoder (2:1). Additional comparative listening was conducted in the task group to quantify the benefits or otherwise of the new mode. Even on limited listening, the improvements with respect to mode 3 of the NADIB test were clearly noticeable, which in the context of the NADIB test results is good news. What is not known is the benefit of the new mode with and without optimised VQ. Accepted.
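The frame budget implied by decision 2 can be worked out directly. The following is a minimal sketch, assuming the 8 kHz sampling rate usual for narrowband speech (the minutes do not state it for this mode); the function name is illustrative only.

```python
from typing import Tuple


def frame_budget(bitrate_bps: float, frame_ms: float, sample_rate_hz: int) -> Tuple[int, int]:
    """Return (bits per frame, samples per frame) for a fixed-rate coder."""
    bits = round(bitrate_bps * frame_ms / 1000)
    samples = round(sample_rate_hz * frame_ms / 1000)
    return bits, samples


# 6.2 kb/s with a 30 ms frame; the 8000 Hz sampling rate is an assumption.
print(frame_budget(6200, 30, 8000))  # -> (186, 240)
```

Under these assumptions each 30 ms frame carries 186 bits for 240 input samples.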

It was noted that the addition of mode 7 allowing RPE and MPE for each of the earlier modes requires an additional bit in the CELP configuration syntax. A proposal for this change was approved and included in N2271. The reference software will be updated to reflect this change in the ad-hoc activity until the Atlantic City meeting.

TTSI (Text-to-Speech Interface)


The TTS/FBA ad-hoc report was presented by Mr. Lee. The group had concentrated on assessing options for a markup language to convey the FBA information. Four candidates were considered: SSML, STML, JSML and SABLE. The recommendation was to develop a new tool. Bookmark tools also affect the TTSI functionality. It is therefore proposed that the full facial expression bookmark functions should be in the new tool in MPEG-4 version 2. The proposal is given in m3627. It was suggested that a subset of the full functionality should be in version 1, just enough to time-lock TTS and lip movement.

Mr. Ostermann presented a demonstration of the insertion of FBA bookmarks. The audio is not affected, as has been demonstrated previously, but what about video? Video can be affected by trick modes such as skipping a sequence, which could lose a facial expression change. The solution is to repeat bookmarks where necessary, in the limit at every sentence. The matter was resolved and will be covered by appropriate entries in the normative and informative parts of the standard.
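The idea of repeating bookmarks can be illustrated with a small sketch. It is not the bookmark syntax of the standard: the `Sentence` type, its fields and the expression names are hypothetical, and the sketch shows only the limiting case in which every sentence carries the expression currently in force.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sentence:
    text: str
    # Expression change that starts at this sentence, if any (hypothetical field).
    bookmark: Optional[str] = None


def repeat_bookmarks(sentences: List[Sentence]) -> List[Sentence]:
    """Return a copy in which every sentence carries the expression in force."""
    current: Optional[str] = None
    out: List[Sentence] = []
    for s in sentences:
        if s.bookmark is not None:
            current = s.bookmark
        out.append(Sentence(s.text, current))
    return out


stream = [
    Sentence("Hello.", "smile"),
    Sentence("How are you?"),
    Sentence("Oh no.", "surprise"),
    Sentence("Goodbye."),
]
# A player that skips straight to the last sentence still sees "surprise".
print([s.bookmark for s in repeat_bookmarks(stream)])
# -> ['smile', 'smile', 'surprise', 'surprise']
```

The cost is redundancy in the text stream, which is why repetition at every sentence is the limit rather than the default.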

Mr. Lee reported the Korean NB paper on TTSI, m3692. Most of the points are editorial and were approved. Mr. Lee prepared a response to the Korean NB.

Mr. Lee also demonstrated, in the context of m3680, a combination of TTS/FBA with markup, showing functionalities of FBA, speed change, forward/reverse. The functionalities were proven. His work is based on SABLE markup language. In m3680, he also showed how the TTSI could be used with FBA and how the synchronisation could be achieved. This Markup TTS (N2286) will be added to the MPEG-4 Version 2 Working Draft.

The report on the harmonisation of TTSI and FBA is presented in document WG11/N2281.

Structured Audio


Mr. Ray presented the report of the SA ad-hoc group, m3610. The MMA contribution is given in m3609. The alignment of SASBF and MIDI DLS2 has been achieved and so common formats are now a reality. There was still concern about some incomplete or incorrect details in the not yet finalised DLS2. These items have been identified and work is continuing to rectify these shortcomings until the next MPEG meeting. The completed work is now being incorporated into the FCD. During the Dublin meeting, the matter of Levels for SA was addressed and is covered in an output document from Requirements.

At a joint meeting with ISG, the complexity of SA was discussed at length. Analysis based on SAOL authorship applications was suggested. The possibility of a new profile or level supporting only those portions of SAOL needed for FX was also discussed.

Mr. Ray introduced the question of how one can compute the level of complexity of the SA tools. This topic is covered in documents m3602 and m3611. The proposal in m3602 was preferred as it relates to real platforms and is presented in output document WG11/N2282.

Conformance testing of SA was discussed. It may require internal test points that cannot be economically placed in a commercial decoder. This will require further discussion in the SA ad-hoc group.

The location in the standard of PICOLA, the speed change tool, was debated. The obvious place for it seemed to be as a post-process in the FX processor. This will be discussed further within the SA ad-hoc group.

Due to the limited availability of resources the backchannel requirement for SA is being deferred to MPEG-4 version 2.



Complexity


Mr. Spille presented m3605, the ad-hoc meeting report. Five new entries to the complexity table were received, but very little email traffic was noted. There are still some open issues for this meeting: where practical figures are not available, calculated figures will be introduced.

Systems issues


In a joint meeting with Requirements, Systems, and ISG the topic of audio composition was debated. Mr. Horbach presented m3587 on the topic. Studer AG have looked into the question of composition with a view to optimising processing power for typical composition actions, e.g. crossfade, panning, delay. Means of carrying out these operations were suggested and the principles discussed. Quality parameters were noted as important within composition.
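The three composition actions named above are simple per-sample operations. The sketch below shows one common way of realising each; it is not taken from m3587, and the linear crossfade law, the constant-power pan law and the function names are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def crossfade(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Linear crossfade from a to b over their common length."""
    n = min(len(a), len(b))
    out = []
    for i in range(n):
        g = i / (n - 1) if n > 1 else 1.0  # gain of b rises from 0 to 1
        out.append((1.0 - g) * a[i] + g * b[i])
    return out


def pan(mono: Sequence[float], position: float) -> List[Tuple[float, float]]:
    """Constant-power pan: position 0.0 is full left, 1.0 is full right."""
    theta = position * math.pi / 2
    gain_left, gain_right = math.cos(theta), math.sin(theta)
    return [(gain_left * x, gain_right * x) for x in mono]


def delay(signal: Sequence[float], samples: int) -> List[float]:
    """Delay by a whole number of samples, keeping the original length."""
    if samples <= 0:
        return list(signal)
    return ([0.0] * samples + list(signal))[: len(signal)]
```

Each operation costs at most a few multiplies per output sample, which is the kind of figure a composition level would need to bound.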

Mr. Zoia presented similar ideas in m3604 to identify profiles and levels from an analysis of audio composition. The document made various proposals for the Levels within audio composition, which were discussed in the group. The proposals are well timed and will help the Task Group to define the SA composition profiles.

Mr. Coleman and Mr. Teichman monitored the Systems discussions during the week and reported progress into the Audio Subgroup. For instance, random access to audio objects needs to be considered by Systems for both ‘clean’ access for editing and ‘dirty’ access for break-in. The issues raised are covered in document WG11/N2280.

Backchannel bitstream syntax


No time was available for discussing the issue of backchannel. It has been agreed that this topic is work for MPEG-4 version 2.

Other matters


Mr. Richard introduced m3783 describing the MPEG-4 audio demonstrator that he has developed. It copes with the functionalities of distance, multiple objects, spatialisation and real-time composition. He demonstrated this to members.

Audio 14496-3 FDIS (FDIS Oct 98)


Mr. Purnhagen presented m3745 on a review of the FCD, identifying a number of problems and offering solutions. His recommendations included adding 9 bits/frame (0.28 kb/s) to HILN for transmission of extra noise parameters for better coding of noise-like signals. However, this has not been checked on other types of programme material; it is to be checked during this week.
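The two figures quoted for the HILN change are related through the frame duration, which the minutes do not state. As a rough check, the sketch below derives the frame duration implied by the quoted numbers and confirms that a 32 ms frame (an assumption here) reproduces the quoted overhead.

```python
def overhead_kbps(bits_per_frame: float, frame_ms: float) -> float:
    """Bitrate overhead in kb/s; one bit per millisecond equals 1 kb/s."""
    return bits_per_frame / frame_ms


def implied_frame_ms(bits_per_frame: float, kbps: float) -> float:
    """Frame duration in ms implied by a per-frame bit count and a bitrate."""
    return bits_per_frame / kbps


# Frame duration implied by the quoted 9 bits/frame and 0.28 kb/s (about 32 ms).
print(implied_frame_ms(9, 0.28))
# With an assumed 32 ms frame, 9 bits/frame rounds to the quoted figure.
print(round(overhead_kbps(9, 32), 2))  # -> 0.28
```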

Also in m3745, changes are proposed to add scalability to HILN. Where a change applies only to the encoder and does not alter the bitstream, it would be an informative annex only to the FDIS. This was accepted. The other proposal, relating to normative changes, was not accepted. Additional editorial changes were accepted.

Mr. Grill presented the reformatted style for the FDIS such that the section editors could prepare their inputs in this fashion. He also reviewed, for the subgroup, the inputs that had been reviewed in the ad-hoc group and task group. The Study on the FDIS and the study on the DoCs are given in documents WG11/N2271 and N2272 respectively.

Conformance Testing 14496-4 WD (CD Dec98)


M3606 was presented by Mr. Spille. The ad-hoc group discussed briefly the issues of noise generators, HILN sine components, TTS synch with FBA, SA filter specification, and AAC conformance for perceptual noise tools. Further work was noted to be needed.

In the task group, a model for Audio Conformance elements was generated, as shown above. It was noted, in doing so, that a PCM-Elementary Stream needs to be introduced directly into the Compositor: this will need to be added to the FCDs of Audio and Systems. Also some elements will need specific forms of conformance testing: looking for 1 LSB max. error is no longer appropriate. Psycho-acoustic objective test methods may be useful (e.g. PEAQ from ITU-R), but they too have their drawbacks. Mr. Inoue offered, in m3759, a testing procedure for HVXC.
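For reference, the classic criterion mentioned above compares decoder output with a reference sample by sample. The following is a minimal sketch assuming integer PCM samples of equal length; the function names are illustrative and are not taken from the conformance draft.

```python
from typing import Sequence


def max_abs_error(decoded: Sequence[int], reference: Sequence[int]) -> int:
    """Largest absolute sample difference between two PCM sequences."""
    if len(decoded) != len(reference):
        raise ValueError("sequences must have the same length")
    return max((abs(d - r) for d, r in zip(decoded, reference)), default=0)


def conforms_within_lsb(decoded: Sequence[int], reference: Sequence[int],
                        tolerance_lsb: int = 1) -> bool:
    """True if no sample deviates from the reference by more than the tolerance."""
    return max_abs_error(decoded, reference) <= tolerance_lsb


reference = [100, -200, 300, -400]
print(conforms_within_lsb([101, -200, 299, -400], reference))  # -> True
print(conforms_within_lsb([103, -200, 300, -400], reference))  # -> False
```

A test of this kind presumes a deterministic output waveform. A tool that synthesises noise can produce two equally valid outputs that differ by far more than 1 LSB, which is why other forms of testing are needed for such elements.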



Mr. Spille worked with his editing group during the week and produced version 3 of the conformance working draft, document WG11/N2273.


Reference Software 14496-5 FDIS (FDIS Oct 98)


Mr. Purnhagen’s task group reviewed the bug reports on the VM software. Contributions were merged during the week. The group also discussed the concept of ‘thread-safe’ software and bitstream exchange. The latter is essential to check that the merged software was working as expected. But in the absence of an agreed file format for MPEG-4, how can bitstream exchange take place? Mr. S-W Kim volunteered to extract the bitstream parser from IM1 to be used in the Audio VM. The study on the FDIS and the study on the DoC on the FCD are presented in documents WG11/N2274 and N2275.

Requirements


Mr. Thom reviewed, with others, the Version 1 and Version 2 MPEG-4 Overview documents and added further information to them. These were amalgamated into the Requirements’ output documents, N2323 and N2324.

Profiles & levels


Mr. Tanaka presented m3744 describing the handling of the PICOLA speed change tool in profiles and levels. The suggestion is that it becomes an optional element in any or all of the profiles, and that it be described in a separate table for optional tools. The input document also gives estimates for complexity. The Japanese NB position paper, m3684, also makes this point. Mr. Brandenburg suggested that it be included in the FX block of SA as an alternative. This latter view was upheld and a response to the Japanese NB was prepared.

Testing

NADIB tests


Mr. Dietz presented the report on the tests, m3796. The results show differences and similarities between codecs. The report and its conclusions show that

  • only reliable listeners were used,

  • the two test sites gave statistically different results, so the two sets of results were analysed separately,

  • some codecs gave a very programme-dependent performance,

  • in the 8 kHz test, NB-CELP and G.723.1 performed equally well and better than Twin-VQ,

  • in the 24 kHz test, AAC-24 was the best,

  • MPEG-4 at 24 kb/s offers a worthwhile improvement to AM broadcasting,

  • scalability at 6+18 kb/s is better than basic coding at 18 kb/s but not as good as basic coding at 24 kb/s,

  • WB-CELP (mode3) did not perform well for speech+music.

The reasons for some of these observations were discussed. It was agreed that the report should be edited into a form suitable for an output document, as given in document WG11/N2276. An abstract was added to the report rather than attempting to produce a summary report.

Speech codec tests


It was noted in listening to the collected speech test material that there was an unacceptable variation of levels. The decision was to adjust four of the German language samples to bring down this variation. This will be done prior to their use in the forthcoming tests.
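One simple way to reduce level variation is to bring each sample to a common RMS level. The minutes do not say which level measure was used, so the sketch below is only an illustration: it assumes floating-point samples with full scale at 1.0 and uses plain RMS, whereas speech tests often use an active speech level measure instead.

```python
import math
from typing import List, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to a full scale of 1.0."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def align_level(samples: Sequence[float], target_dbfs: float) -> List[float]:
    """Scale the samples so that their RMS level equals the target level."""
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return list(samples)  # silence cannot be scaled to a target level
    gain = 10.0 ** ((target_dbfs - level) / 20.0)
    return [gain * x for x in samples]


loud = [0.5, -0.5, 0.5, -0.5]            # RMS 0.5, about -6 dBFS
aligned = align_level(loud, -20.0)       # hypothetical common target level
print(round(rms_dbfs(aligned), 6))       # -> -20.0
```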

There was a great deal of debate over what codecs should be included in the codec tests to be run and completed by the beginning of September. Great pressure was brought to bear by those who wanted new tools or variations on themes to be included. It was a requirement that the just accepted mode 7 WB-CELP coder be included in the testing. The final decisions are reported in the test plan given in document WG11/N2277.


Internet radio tests


The selection process was discussed in Task Group to determine what critical items should be included and who could participate. The chair, Mr. S-W Kim, and his team prepared a full specification and timeline for these tests including 17 codec/bitrate combinations. The plan shows the results being made available by 4th September and is given in document WG11/N2278. Additionally, interested members reviewed the full range of test material and produced a subset for these tests as documented in WG11/N2279.

Archival records of audio source material

The chair of the Test Subgroup inquired whether the Audio Subgroup had plans to establish an archive site for all of the source material that has been used in MPEG Audio testing. The Video Subgroup has a planned activity. It was agreed that an archival record was of interest. This will be addressed further at the next meeting.


Version 2 matters

Audio 14496-3/Amd 1 WD (CD Dec 98)


Mr. Dietz led a drafting group during the week, which reviewed the MPEG-4 Version 2 Working Draft. The revised text is given in document WG11/N2283. The list of Audio MPEG-4 version 2 work items now includes:

  1. Error resilience

  2. Environmental spatialisation

  3. Low delay

  4. Backchannel

  5. Dynamic range control

  6. Watermarking

  7. Markup TTS



IPR and content protection

Watermarking

Mr. Meares reported on discussions with CRL. They are now in a position to process files for coding. Mr. Oomen and Mr. Paley volunteered to code/decode the files and send them back to CRL for watermark extraction checks. The list of source files for these evaluations is given in document WG11/N2285. The Subgroup discussed ways in which the number of watermarking proposals could be extended beyond that of CRL’s. A number of codec proponents agreed to participate in an initial informal evaluation of the CRL watermarking technique and report the results of listening tests and any other techniques used to determine the effects of a watermarking technique on codec performance.

Intellectual Property Management and Protection (IPMP)

Mr. Meares reported that the trend of discussions on IPMP during the week was toward defining just an IPMP Interface, which will be used to identify flags showing whether or not IPMP is active for a particular MPEG-4 Object. The actual IPMP elements would be outside MPEG-4 and would be one or more proprietary systems. There is, however, concern amongst members that the timing consequences of the IPMPI have been neither discussed nor resolved. Indeed, the plans for the IPMP Group, presented in the final Plenary, confirm that even the first checks on timing implications will not be carried out until the Atlantic City meeting.


Error resilience


Mr. Dietz reviewed and edited the error resilience workplan in his task group. The resultant workplan is given in document WG11/N2284.

Low delay


No effort was available to pursue this topic at this meeting.

Environmental spatialisation


This topic was discussed in SNHC but not in Audio.

Other developments


TTSI Markup language for Version 2 was presented by Mr. Lee. He outlined the additional functionality relative to the version 1 options. The description of the Markup TTS is given in document WG11/N2286.
