The following joint meetings were held.
Joint meeting CDVS/3DG on usage of descriptors in VR/AR 12-13 (3G room)
Joint meeting CDVS/3DG on relation of AR and 3D Video Thu 9-10 (3DG room):
- Augmentation requires depth map (and its meaning of representation), camera parameters, position of light sources also desirable.
- Problem: Camera parameters of MVC do not include the scale (are in pixel coordinates)
- Temporal consistency of depth is important
- Reliability of depth is important
- Contradiction VSO and reliable depth?
- Provide videos, depth maps (incl. camera parameters) and bit streams.
Joint meeting with Requirements on HEVC issues Mon 15-17 (Req. room)
Joint meeting with JCT-3V on MFC Tue 9-10
Thu 12:30-13:00 Decoder optimization (Req. room)
Thu 13:00-13:30 Conformance (3G room)
Thu 13:30-14:00 Liaison (3G room)
Thu afternoon: Communication white papers (R7) – CDVS experts were present to discuss finalization of CDVS white paper.
Output Documents planned
A provisional list was set up in the opening plenary and further updated during the Wednesday plenary; see list of final output docs in section 12.1.
MPEG-4 part 2 Visual
Following the Defect report from the previous meeting, and after further checking by experts, a DCOR N13322 was issued for correction of a rounding operation in interlaced motion compensation, where the current text and software/conformance deviate.
MPEG-4 part 10 AVC
Following the Defect report from the previous meeting, and after further checking by experts, a DCOR N13325 was issued for correction of the sub-bitstream extraction process in AVC. Furthermore, PDAM4 (N13329, Request N13328) was issued including a new SEI message for tone mapping, and including a code point for the wide-gamut color space description corresponding to ITU-R BT.2020 (UHD formats) in VUI. Both changes are in alignment with the same definitions in HEVC.
In coordination with JCT-3V which had met in parallel under ITU-T auspices, the following documents were approved as outputs:
- N13324 ISO/IEC 14496-4:2004/PDAM41 Conformance testing of the MVC plus depth extension of AVC
- N13326 DoC on ISO/IEC 14496-10:2012/DAM2
- N13327 ISO/IEC 14496-10:2012/FDAM2 MVC extension for inclusion of depth maps
- N13331 Text of ISO/IEC 14496-10:2012/PDAM5 Multi-resolution Frame Compatible Stereo Coding (Request N13330)
- N13332 Study Text of ISO/IEC 14496-10:2012/PDAM3 AVC compatible video-plus-depth extension
- N13333 3D-AVC Test Model 5
It was further planned to issue a new edition of 14496-10 upon completion of Amds. 3–5.
MPEG-7 Visual
See above under AHG m27320. No specific input was brought for the current meeting.
CDVS
The discussions about CDVS were held as breakout activity, reporting back to the video plenary. One of the video chairs (JO) partially participated when the BoG discussed the structure and specification methodology of WD2. All recommendations of the BoG concerning WD3 and TM5, adoptions and definitions of new CEs were later approved by the video plenary (no objections were raised against any recommendation). BoG meetings were held Monday through Thursday and the subsequent reports are partially taken from the notes in the BoG (recorded by M. Bober and G. Cordara).
General
4.2.0.1.1.1.1.1.23 m27344 CDVS Whitepaper: Update [Danilo Pau, Talha Soysal, Arcangelo Bruna, Emanuele Plebani]
Presented/discussed in joint meeting with the Communications SG.
4.2.0.1.1.1.1.1.24 m27346 Update on UBC SIFT patent situation [Danilo Pau, Doug Sorensen]
The information brought in this document was considered, and this led to the decision to delay the CD and investigate possible alternatives to SIFT for the keypoint detection.
Breakout Monday 21/01:
Discussion about CDVS pipeline
The group agreed that the S-mode should be removed from the software pipeline and the corresponding documents.
CE and other technical inputs
CE 1 Global Descriptors
4.2.0.1.1.1.1.1.25 m28061 Peking Univ. Response to CE1: Performance Improvements of the Scalable Compressed Fisher Codes in TM 4 [Jie Lin, Ling-Yu Duan, Shuang Yang, Jie Chen, Alex C. Kot, Tiejun Huang, Wen Gao]
An improvement of the scalable GD (Global Descriptor) using Fisher Vector is presented.
Combination of the gradient vectors w.r.t. both the mean and the variance of Gaussian functions at higher operating points is proposed. Memory reduction (down to 21 KB, reducing both PCA and GMM tables) was obtained. The descriptor length increases for higher operating points, where the variance is added, thus resulting in a decrease of the number of local descriptors sent. Comparable results were obtained with respect to current TM4 in terms of pairwise matching and localization accuracy. >2% increase in mAP, >1% PTM. 32 parameters used for pairwise matching (102 are used in current TM4). Point of attention: reduction of number of keypoints.
Contribution adopted: parameters to control the GD will be made configurable through the parameter file. Code for training was to be made available during the meeting.
4.2.0.1.1.1.1.1.26 m28082 CDVS: Cross-Check of Peking University responses to CE 1 [M28061] [Zheng Liu]
4.2.0.1.1.1.1.1.27 m28062 Peking Univ. Response to CE 1: Optimizing the parameters of the SCFV global descriptor in pair-wise matching [Xiaofang Wang, Ling-Yu Duan, Shuang Yang, Jie Lin, Alex C. Kot, Tiejun Huang, Wen Gao]
The contribution focuses on the optimization of parameters of the GD: no changes in the GD and the bitstream have been introduced. In TM4 there are 102 parameters, computed on MPEG datasets. A continuous model for curve fitting has been presented with a reduction of the number of parameters to 32. The contribution also presents an independent dataset used for the training. The group should continue the effort for reducing the number of parameters further.
Contribution adopted: parameters to control the GD will be made configurable through the parameter file. Code and dataset for training were to be made available during the meeting.
4.2.0.1.1.1.1.1.28 m28083 CDVS: Cross-Check of Peking University responses to CE 1 [M28062] [Zheng Liu]
4.2.0.1.1.1.1.1.29 m28281 CDVS CE1: Improve Global Descriptor Matching with Order Statistics on Hamming Distance [Zhu Li, Xin Xin, Abhishek Nagar, Gaurav Srivastava, Felix Fernandes, Kong Posh Bhat]
The method aims at improving the accuracy of GD (Global Descriptor) pairwise matching by analyzing order statistics of Hamming distances, in particular their different behavior for matching and non-matching pairs. Distances are matched to affinity values. Weights are computed through an LDA computed on a subset of 32 Hamming distances. No changes were made in the bitstream.
Results were shown on MPEG datasets, but not fully integrated into the TM. Therefore, no actions were taken at this meeting. The approach needs to be tested with the new GD adopted at this meeting. The proponent plans to continue working on this topic.
CE 2 Local Descriptor Compression
4.2.0.1.1.1.1.1.30 m28063 Peking Univ. Response to CE2: Performance Improvement of S-mode [Jie Chen, Ling-Yu Duan, Jie Lin, Alex C. Kot, Tiejun Huang, Wen Gao]
Since the group decided to keep only the S-mode, the contribution on S-mode was to be reviewed after the final review of CE contributions, together with the other technical inputs.
4.2.0.1.1.1.1.1.31 m28084 CDVS: Cross-Check of Peking University responses to CE 2 [M28063] [Zheng Liu]
4.2.0.1.1.1.1.1.32 m28316 CDVS CE2 Cross-check of PKU contribution M28063 [Karol Wnukowicz, Stavros Paschalakis]
4.2.0.1.1.1.1.1.33 m28179 CDVS CE2: Local Descriptor Compression [Stavros Paschalakis, Karol Wnukowicz, Miroslaw Bober, Alessandra Mosca, Massimo Mattelliano]
Improvements of the H-mode are presented in the document, which also introduces into this mode the one-to-one matching and weighted matching integrated last time into the S-mode. Also, the arithmetic coder has been replaced with prefix coding to reduce complexity. A 6% drop in compression efficiency (number of local features inserted into the bitstream) is reported as a result of removing arithmetic coding. Overall performance, also thanks to the one-to-one and weighted matching, was slightly increased.
Contribution adopted. The group agreed to keep the arithmetic coding in the TM software for this part of the pipeline as a fall-back option in case performance is too degraded when integrated with the new GD.
This was agreed to be discussed further during the week with a plan to decide what actions to take in case performance is decreased (only one TM needs to be generated, preliminary results generated).
4.2.0.1.1.1.1.1.34 m28207 CDVS CE2: Cross-check of Visual Atom's Proposal [Ryota Mase, Kota Iwamoto]
4.2.0.1.1.1.1.1.35 m28278 CDVS CE2: Improve Local Descriptor Compression with SIFT Level Ground Truth [Zhu Li, Xin Xin, Abhishek Nagar, Gaurav Srivastava, Felix Fernandes, Kong Posh Bhat]
CE 3 Feature Point Location Coding
4.2.0.1.1.1.1.1.36 m28131 CDVS Core Experiment 3 related: reduction of sum-based context tables [Giovanni Cordara, Lukasz Kondrad, Jacek Konieczny, Nicola Piotto]
Linear approximation of the context probabilities was proposed. For a given block size, a reduction in table size of two orders of magnitude was reported. No impact on TPR, FPR, or localization accuracy was reported. A 0.5%-1% loss was reported in terms of coordinate compression efficiency, with a maximum loss of 1 key point. The TM block size was maintained at 6. This was accepted into TM5.0 under the condition of fast integration.
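As a rough illustration of how a linear approximation shrinks a context table, the sketch below replaces a per-context probability table with the two coefficients of a least-squares line. The table values, the function names, and the choice of an ordinary least-squares fit are assumptions made for illustration; the fitting method actually used in m28131 is not described in these notes, and this toy table is far smaller than the TM tables.

```python
def fit_line(values):
    """Least-squares fit of y = a*x + b over x = 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def approx_probability(a, b, context):
    """Evaluate the fitted line and clamp the result to [0, 1]."""
    return min(1.0, max(0.0, a * context + b))


# Hypothetical table: probability that a cell is occupied, indexed by a
# sum-based context (made-up numbers, not taken from the TM).
table = [0.02, 0.05, 0.09, 0.12, 0.16, 0.19, 0.23, 0.26]

a, b = fit_line(table)  # two stored coefficients instead of len(table) entries
max_error = max(abs(approx_probability(a, b, c) - p) for c, p in enumerate(table))
```

The trade-off is the one reported above: the approximated probabilities differ slightly from the trained ones, which costs a little compression efficiency in exchange for a much smaller table.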
4.2.0.1.1.1.1.1.37 m28132 CDVS: Cross check of Huawei response to CE3 (m28131) [Gianluca Francini]
4.2.0.1.1.1.1.1.38 m28244 CDVS: ETRI's Response to CE3 [Sang-il Na, Seung-jae Lee, Weon-geun Oh, Kyung-min Choi, Hae-kwang Kim]
Two block sizes were proposed for context coding: 55 blocks coded at 5 and the remaining coded at 7. The coordinate bit compression was increased by 20% but this did not result in any improvement in TPR or FPR. Marginal improvement in localization: average 84.00 vs 84.22. Some increase in complexity due to the variable block size. The group decided not to include this in the TM.
4.2.0.1.1.1.1.1.39 m28247 CDVS: Crosscheck of m28244 response to CE3 [Jie Chen]
CE 4 Key Point Detection
CE4 discussion
Two proposals provided the full set of results and had been cross-checked.
-
The Peking University proposal had been crosschecked by Huawei.
-
The STM proposals had been crosschecked by Telecom Italia and Sisvel.
4.2.0.1.1.1.1.1.40 m28066 Peking Univ. Response to CE4: Frequency Domain LoG Detector [Fangkun Wang, Ling-Yu Duan, Jie Chen, Tiejun Huang, Wen Gao, DaniloPietro Pau]
Frequency domain LoG was presented. Filtering is done in the frequency domain, then LoG is computed. The detection of local extrema is performed after the inverse Fourier transform in the spatial domain. Re-training of the different elements of the pipeline had been performed. The reported performance tests show a small drop in retrieval mAP (average drop around 0.57%, bigger drop at lowest operating point). Also a slight drop in TPR accompanied by some drop in FPR. This suggests comparable performance.
Comment: the frequency domain processing corresponds closely (equivalent under certain conditions) to the image domain processing.
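The comment above rests on the convolution theorem: multiplying two spectra corresponds to circular convolution in the signal domain, so the two forms of processing agree when boundaries are handled periodically. A minimal 1-D sketch with a naive DFT follows; the signal, the kernel values, and the helper names are made up for illustration, and an actual detector would operate on 2-D images with an FFT library.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_convolve(signal, kernel):
    """Direct circular convolution in the signal domain."""
    n = len(signal)
    return [sum(signal[(i - j) % n] * kernel[j] for j in range(n)) for i in range(n)]


def filter_in_frequency_domain(signal, kernel):
    """Multiply the two spectra, then transform back."""
    product = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [v.real for v in dft(product, inverse=True)]


signal = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0]
# Discrete Laplacian-like kernel stored in wrap-around order (illustrative only).
kernel = [-2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

spatial = circular_convolve(signal, kernel)
spectral = filter_in_frequency_domain(signal, kernel)
```

Both lists agree up to floating-point rounding, and the strongest negative response falls on the peak of the signal, which is the kind of extremum a LoG-style detector looks for.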
4.2.0.1.1.1.1.1.41 m28085 CDVS: Cross-Check of Peking University responses to CE 4 [M28066] [Zheng Liu]
4.2.0.1.1.1.1.1.42 m28076 STMicroelectronics response to CE4: Fourier transform Based interest point detector using LoG frequency response [Danilo Pau, Ettore Napoli, Giorgio Lopez, Emanuele Plebani, Arcangelo Bruna, Doug Sorensen]
See notes below under M28080
4.2.0.1.1.1.1.1.43 m28092 CDVS: Crosscheck of STMicroelectronics response to CE4 (m28076) [Alessandra Mosca, Massimo Mattelliano]
4.2.0.1.1.1.1.1.44 m28080 STMicroelectronics response to CE4: Fourier transform Based interest point detector for Scales Computation [Danilo Pau, Ettore Napoli, Giorgio Lopez, Emanuele Plebani, Arcangelo Bruna, Doug Sorensen]
M28080/28076 – Interest point detector based on block-based filtering in the frequency domain (similar in design to M28066, although the proposals are independent). The difference between the '076 and '080 proposals mostly relates to the filter banks used: LoG (076) vs. FBSC filter (DoG kernel) (080). The image is decomposed into blocks for low-pass and band-pass filtering, and is subsequently recomposed to locate the extrema in the image. Results for H-mode and S-mode are presented, without any retraining of the pipeline. FBSC shows a drop around 0.5% TPR (on average) and 0.88% in mAP (compared to TM4). FBLoG shows a drop of 1.7% for TPR and 3.4% for mAP. Complexity is significantly decreased vs. frame based approach (066). Memory necessary for the hardware implementation is reported as 574 kB (128x128 pixels blocks) and can be potentially lowered to 240 kB by using 64x64 blocks. Extraction performance 1.35 s vs. 0.45 s for the TM, without exploiting block parallelism.
4.2.0.1.1.1.1.1.45 m28098 CDVS: Cross check of STM response to CE4 (m28080) [Massimo Balestri]
4.2.0.1.1.1.1.1.46 m28090 CDVS: Huawei's Response to CE 4: Preliminary Results by Fourier Transform Based LOG [Zheng Liu, Qiang Zhou, Guojun Xu]
The approach is the same as that presented in M28066. The image is also divided, but into slices instead of blocks, and the image is not recomposed; only the locations of the key points are stored. A drop in performance was reported, in particular because the frequency-domain analysis is performed at lower resolution in order to keep the complexity unchanged.
It was suggested to try to use just LoG and keep the CE open.
Conclusion:
The group was very pleased to receive these contributions and decided that the strong features of M28080/28076 (low complexity, hardware optimized) and M28066 (smaller drop in performance vs. TM4.0) should be combined and further optimized in a CE relating to key-point detection.
The proposal M28090, while not cross-verified, also showed some good points, and the CE will be open to other contributions, which are encouraged.
CE 5 Local Descriptors
CE 6 Retrieval
4.2.0.1.1.1.1.1.47 m27873 MPEG-7 CDVS: Telecom Italia's response to CE6 on S-mode sort method [M. Balestri, G. Francini, S. Lepsoy, A. Varesio]
In the current TM, all descriptors are extracted, sorted based on relevancy, and low-relevance descriptors are later rejected. This proposal is to reduce computational complexity by extracting only the descriptors classified as relevant. The proposal is interesting but some implementation issues need to be considered carefully, so it is recommended that an implementation plan be suggested at the next meeting.
4.2.0.1.1.1.1.1.48 m28064 Peking Univ. Response to CE6: Retrieval Code optimization [Shuang Yang, Ling-Yu Duan, Zhe Wang, Alex C. Kot, Tiejun Huang, Wen Gao]
Code optimizations were reported on the retrieval pipeline, in particular for its first stage. The re-ranking of the current TM does not have high resource consumption; most of the resources are consumed in computing the query function (WordVisited and POP_COUNT). Code optimizations had been introduced to reduce the retrieval time. The speed-up of the first stage of the retrieval pipeline is between 3 and 4 times. Excellent complexity reduction was reported. Decision: Include in the TM.
4.2.0.1.1.1.1.1.49 m28086 CDVS: Cross-Check of Peking University responses to CE 6 [M28064] [Zheng Liu]
4.2.0.1.1.1.1.1.50 m28065 Peking Univ. Response to CE6: An indexing approach to speed up retrieval [Zhe Wang, Ling-Yu Duan, Jie Lin, Alex C. Kot, Tiejun Huang, Wen Gao]
An indexing approach was presented to speed up retrieval. A multi-block-based index table is introduced, and Hamming distance is computed only for a subset of candidate images. A global index is applied, aiming at retaining only the most promising images. SCFV is divided into blocks, partitioning the whole GD into different blocks of Gaussian components. For each combination, there is an index table, where the entry table is given by the different enumerations of the binary combinations. 70% of the images are eliminated based on this first analysis. Hamming distance is computed only on the remaining 30%. Without any performance drop, the reported speed-up is around 50% for the first stage. Further speed-up can be achieved with some performance reduction (up to 2% with 20 times speed-up).
This was a good proposal overall, affecting the non-normative part, and only applied on the retrieval side. However, results were not shown on the extended global descriptor that was accepted at the 103rd meeting. Contributors were asked to extend this to the new GD and show results within the TM5.0 context. The proposal is interesting, so it was recommended that an integration plan be adopted at the next meeting.
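The candidate-pruning idea can be sketched roughly as follows: each binary descriptor is split into fixed-size blocks, each block value keys an index table, and the full Hamming distance is computed only for images that share at least one block value with the query. The block width, the 16-bit toy descriptors, the "at least one shared block" rule, and all names are assumptions for illustration, not the design of m28065.

```python
from collections import defaultdict

BLOCK_BITS = 4   # hypothetical block width
NUM_BLOCKS = 4   # toy 16-bit descriptors


def blocks(desc):
    """Split an integer descriptor into NUM_BLOCKS values of BLOCK_BITS bits."""
    mask = (1 << BLOCK_BITS) - 1
    return [(desc >> (i * BLOCK_BITS)) & mask for i in range(NUM_BLOCKS)]


def build_index(database):
    """One table per block position, mapping block value -> set of image ids."""
    tables = [defaultdict(set) for _ in range(NUM_BLOCKS)]
    for image_id, desc in enumerate(database):
        for pos, value in enumerate(blocks(desc)):
            tables[pos][value].add(image_id)
    return tables


def search(query, database, tables):
    """Hamming-rank only the images sharing at least one block with the query."""
    candidates = set()
    for pos, value in enumerate(blocks(query)):
        candidates |= tables[pos].get(value, set())
    # XOR marks the differing bits; counting them gives the Hamming distance.
    scored = [(bin(query ^ database[i]).count("1"), i) for i in candidates]
    return sorted(scored)


database = [0b1011001110001111, 0b1011001010001011, 0b0100110001110000]
tables = build_index(database)
results = search(0b1011001110001110, database, tables)
```

In this toy run the third image shares no block with the query and is never scored, which mirrors the reported elimination of most database images before the Hamming stage. A looser or stricter candidate rule would trade accuracy against speed, as the notes describe.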
4.2.0.1.1.1.1.1.51 m28087 CDVS: Cross-Check of Peking University responses to CE 6 [M28065] [Zheng Liu]
CE 7 Feature Selection
4.2.0.1.1.1.1.1.52 m28240 CDVS CE7: 2-Way SIFT Matching to Improve Image Matching Accuracy [Xin Xin, Zhu Li, Abhishek Nagar, Gaurav Srivastava, Felix C. A. Fernandes, Kong Posh Bhat]
The contribution introduces two-way matching to the SIFT descriptors, in order to make the matching more robust. TPR was slightly increased and FPR decreased by 0.5%.
Unfortunately, no cross-verification was available. These seemed to be very promising results, and were expected to extend to retrieval. The group considered the proposal to have good potential but would like to see full cross-verification and also results for all bit rates before TM adoption. Samsung committed to release the software based on TM5.0 in order for the group to be able to better evaluate the performance.
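Two-way matching is commonly implemented as a mutual nearest-neighbour check: a pair is kept only if each descriptor is the other's closest match. The sketch below assumes that reading, with toy 2-D vectors standing in for SIFT descriptors; the exact criterion used in m28240 is not given in these notes, and the function names are hypothetical.

```python
def nearest(vec, candidates):
    """Index of the candidate closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in candidates]
    return dists.index(min(dists))


def two_way_matches(query_desc, ref_desc):
    """Keep (i, j) only when i -> j and j -> i are both nearest neighbours."""
    matches = []
    for i, q in enumerate(query_desc):
        j = nearest(q, ref_desc)
        if nearest(ref_desc[j], query_desc) == i:
            matches.append((i, j))
    return matches


# Toy data: query 0 and query 2 both prefer reference 0, but reference 0
# only prefers query 2, so the one-way match (0, 0) is discarded.
query_desc = [(0.0, 0.0), (5.0, 5.0), (0.9, 0.0)]
ref_desc = [(1.0, 0.0), (5.0, 4.0)]
matches = two_way_matches(query_desc, ref_desc)
```

Discarding one-directional matches is what makes the result more robust, at the cost of a second nearest-neighbour search per candidate pair.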
4.2.0.1.1.1.1.1.53 m28246 CDVS: Crosscheck of m28240 response to CE7 [Jie Chen]
4.2.0.1.1.1.1.1.54 m28241 CDVS CE7: Robust Feature Selection with Self-Matching Score [Zhu Li, Xin Xin, Abhishek Nagar, Gaurav Srivastava, Kong Posh Bhat, Felix C. A. Fernandes]
This contribution proposed a random affine image transformation followed by self-matching. Descriptors are selected based on consistently matched pairs to indicate stability. This seemed to be an interesting idea, but further work was needed. It seemed too complex – so simplification was needed. No cross-verification was presented when reviewed.
4.2.0.1.1.1.1.1.55 m28300 CDVS: Crosscheck of Samsung US response to CE7 [Seung-jae Lee, Sang-il Na, In-soo Won, Weon-geun Oh, Dong-seok Jeong]
-
4.2.0.1.1.1.1.1.56 m28279 CDVS CE8: SKIP Mode - Reconstructing Global Descriptor from Local Descriptors at Server End [Abhishek Nagar, Xin Xin, Zhan Ma, Zhu Li, Gaurav Srivastava, Ankur Saxena, YoungKwon Lim, Felix Fernandes, Kong Posh Bhat]
This contribution proposed a skip mode for the GD. The global descriptor is not present in the bitstream; it is reconstructed from the local descriptors that are present. The problem concerns the retrieval: the GD has less information, and the efficiency of the re-ranking stage was reduced. It may be difficult to reconstruct the GD from local descriptors. It was suggested that the system implications need study.
4.2.0.1.1.1.1.1.57 m28280 CDVS CE8: Differential Coding of Global Descriptor from Local Descriptors [Abhishek Nagar, Xin Xin, Zhan Ma, Zhu Li, Gaurav Srivastava, Felix Fernandes, Kong Posh Bhat]
A compression scheme for the GD was proposed, aiming at removing redundancy in the query through differential encoding. FPR slightly increases with a corresponding slight increase in TPR, at the cost of an increase of complexity. Localization is improved by 0.79%.
Cross-verification was ongoing (had not been completed yet). The results should have been verified also on the retrieval pipeline. No action was taken on this.