ISO/IEC DIS 14496-19 (N13561) had been issued in Incheon. Resolutions had been formulated at the same time and renewed in Vienna, urging action on submitting IPR statements before the closing of the DIS ballot. Though the situation has improved since August 2013, the patent statements received so far require further investigation, and some of the previously mentioned problems regarding the ISO database are not yet resolved. Therefore, it was decided to put the issuing of the FDIS on hold for one more meeting cycle. Several NBs had pointed out the need for text improvements during the ballot. A Draft DoC and a Draft FDIS integrating these changes were issued as output documents N13921 and N13922, respectively.
Further assessment of the different possible type-1 codecs (WVC, VCB, IVC) is also planned towards the 107th meeting (see notes under Wednesday plenary).
MPEG-7 Visual
See AHG report M30724 above. Due to the limited resources of contributors, the progress made in updating the XM software was not yet sufficient to start issuing a new edition, but further work is planned in the continuing AHG.
CDVS
The BoG on CDVS was chaired by Miroslaw Bober and partially by Giovanni Cordara. The notes in this section are primarily records from the BoG, which met Monday to Thursday, usually for the whole day (except plenary times).
Decision was reached on:
- Progress to CD: yes
- Keeping normative keypoint extraction
- Introducing the new “ALP” framework for keypoint extraction in CD (based on analysis of third-order surface approximation of scale space)
- Approx. 1% worse performance than SIFT
- Describes generation of scale space in a generic way (specifying the properties of the set of Gaussian filters), which may be costly to implement
It was also considered to eventually recommend removing inactive editors from the list of editors of 15938-13, insofar as they had not contributed text elements to the current CD text. The current list had been set up very early, when the involvement was not yet fully foreseeable. No action was taken at this point in time.
Requirements related
14.1.1.1.1.1.1.1.25m31648 CDVidS Input Requirements [Jean-Ronan Vigouroux, Frédéric Lefèbvre, Hassane Guermoud, Joaquin Zepeda, Louis Chevallier]
General
14.1.1.1.1.1.1.1.26m31469 CDVS [Giovanni Ballocca, Alessandra Mosca, Massimo Mattelliano]
-
CE1: Key point detection
14.1.1.1.1.1.1.1.27m31487 CDVS: Suggestion of time complexity measurement for CE1 [Seungjae Lee, Keundong Lee, Sang-il Na, Woen-Geun Oh]
Several methods were proposed to evaluate time complexity, in particular comparing Windows implementations (32-bit and 64-bit).
On Windows, the CPU status (idle/busy) has a significant impact on the measured times, and the proposed methods aim to measure them more reliably. The group decided that the current extract and match executables, which have so far been compiled for 32-bit Windows, will be moved to a 64-bit implementation, and that the software description and TM document will be updated accordingly.
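The measurement methods proposed in m31487 are not recorded in these notes. As a rough illustration of why repeated measurement helps when the CPU may be busy, the following is a minimal sketch, assuming that the minimum over several runs is taken as the estimate. The function `dummy_extract` is a hypothetical stand-in for the extraction step, not part of the TM software.

```python
import time


def measure_min_time(func, repeats=5):
    """Return the smallest wall-clock time (in seconds) over several runs of func().

    Taking the minimum is one common way to reduce the influence of
    background CPU load; it is an assumption here, not the CE1 procedure.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def dummy_extract():
    # Hypothetical workload standing in for descriptor extraction.
    return sum(i * i for i in range(10000))


if __name__ == "__main__":
    print(f"best of 5 runs: {measure_min_time(dummy_extract):.6f} s")
```

The same wrapper could be applied to a call that launches an external executable; averaging or reporting the spread instead of the minimum is an equally valid design choice.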
14.1.1.1.1.1.1.1.28m31113 Huawei’s Response to CE 1: An Improved Block-Based Spatial LoG Interest Point Detector [Zheng Liu, Qiang Zhou, Guojun Xu, Giovanni Cordara]
Easy implementation; key point selection moved after descriptor extraction.
Extrema are found in the middle layer first, and only pixels that are possible extrema are processed on the remaining layers. Fast computation: 15 times faster than TM7.0. Lower memory use: 93 bytes of static memory; dynamic memory is at most 0.87 MB. Fully parallelizable. TM7 performance maintained. Crosscheck OK.
14.1.1.1.1.1.1.1.29m31555 CDVS: Crosscheck of Huawei's proposal m31113 response to CE1 [Jie Chen, Ling-Yu Duan]
Crosscheck noted.
14.1.1.1.1.1.1.1.30m31249 CDVS: STM response to Interest Point Detector CE1 [Danilo Pau, Emanuele Plebani, Arcangelo Bruna, Marco Marcon]
Evolution of proposal M30233. Implementation is 23 times faster than TM7.0; 75% dynamic memory reduction (5 MB after reduction).
Only the Gaussian Scale Space (GSS) is computed using overlap-and-save Fourier filtering. The Laplacian is applied in the spatial domain, which reduces the number of FFTs. Gradients are computed on the fly. The number of interest points is slightly lower. TPR is 0.2% higher.
Crosscheck OK.
14.1.1.1.1.1.1.1.31m31374 CDVS: Crosscheck of STMicroelectronics response to CE1 (m31249) [Alessandra Mosca, Massimo Mattelliano]
Crosscheck noted.
14.1.1.1.1.1.1.1.32m31662 Crosscheck of ST Micro's CE1 response [Abhishek Nagar]
Crosscheck noted.
14.1.1.1.1.1.1.1.33m31369 CDVS: Telecom Italia's response to CE1 - interest point detection [Gianluca Francini, Massimo Balestri, Skjalg Lepsoy]
ALP detector: approximation of the scale space using a 3rd-order polynomial from 4 LoG filters applied at each octave. Maxima are first detected in the scale space, then compared with their neighbors, and the location is refined to obtain sub-pixel accuracy. “Good” extrema in scale space result in stable matching. All operations can be done pixel-wise and are fast (40% reduction of the overall pipeline time with vlf). Performance is maintained in pairwise matching, with some drop in retrieval.
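The per-pixel idea behind a polynomial approximation in scale can be sketched as follows. This is only an illustration under stated assumptions, not the formulation of m31369: it fits a cubic through four (scale, LoG response) samples for one pixel and locates the extrema of that cubic analytically. The function names and the sample scales are hypothetical.

```python
import math


def fit_cubic(scales, responses):
    """Return (a0, a1, a2, a3) of the cubic passing through four samples."""
    n = 4
    # Augmented Vandermonde system: sum_k a_k * s**k = r
    rows = [[s ** k for k in range(n)] + [r] for s, r in zip(scales, responses)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda i: abs(rows[i][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for i in range(col + 1, n):
            factor = rows[i][col] / rows[col][col]
            for j in range(col, n + 1):
                rows[i][j] -= factor * rows[col][j]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        acc = rows[i][n] - sum(rows[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = acc / rows[i][i]
    return tuple(coeffs)


def scale_extrema(coeffs, lo, hi):
    """Return the scales in [lo, hi] where the cubic's derivative is zero."""
    _, a1, a2, a3 = coeffs
    # Derivative: p'(s) = a1 + 2*a2*s + 3*a3*s**2
    qa, qb, qc = 3 * a3, 2 * a2, a1
    if abs(qa) < 1e-12:
        roots = [] if abs(qb) < 1e-12 else [-qc / qb]
    else:
        disc = qb * qb - 4 * qa * qc
        if disc < 0:
            roots = []
        else:
            sq = math.sqrt(disc)
            roots = [(-qb - sq) / (2 * qa), (-qb + sq) / (2 * qa)]
    return sorted(r for r in roots if lo <= r <= hi)
```

Because the extremum comes from a closed-form root of the derivative, the scale estimate is continuous rather than restricted to the sampled scales, which is consistent with the pixel-wise processing noted above.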
14.1.1.1.1.1.1.1.34m31376 CDVS: Crosscheck of Telecom Italia response to CE1 (m31369) [Alessandra Mosca, Massimo Mattelliano]
Cross check OK.
14.1.1.1.1.1.1.1.35m31398 CDVS: PKU response to CE1 - v1 [Jie Chen, Lin-Yu Duan, Tiejun Huang, Wen Gao]
Block based processing extended to extrema detection and orientation assignment + descriptor computation with significant reduction of the memory footprint.
Block size reduced to 128*128 with the corresponding reduction in complexity.
Dynamic buffer introduced. Feature selection moved after descriptor.
Detection time: TM7: 462 ms; Improved BFloG: 131 ms; Low Complexity BFloG: 113 ms.
Total memory reduced to 957 KB (13 MB in TM7.0).
Some performance improvement: +0.48% in retrieval.
14.1.1.1.1.1.1.1.36m31625 CDVS: PKU response to CE1 - v1.update [Jie Chen, Ling-Yu Duan, Tiejun Huang, Wen Gao]
14.1.1.1.1.1.1.1.37m31629 Cross-check of m31398 [Sang-il Na, Weon-Geun Oh, Insu Won, Dong-Seok Jeong]
Crosscheck OK.
14.1.1.1.1.1.1.1.38m31399 CDVS: PKU response to CE1 - v2 [Jie Chen, Lin-Yu Duan, Tiejun Huang, Wen Gao]
Changed frequency-based processing to mixed-domain processing. Laplacian moved to the spatial domain. Filtering time reduced from 371 ms (TM7) to 31 ms (m31398) to 24 ms (this proposal). Static memory reduced from 31 kB to 18 kB. Results comparable to TM7.
Crosscheck OK.
14.1.1.1.1.1.1.1.39m31630 CDVS: Cross-check of m31399 [Sang-il Na, Keundong Lee, Seungjae Lee, Weon-Geun Oh, Insu Won, Dong-Seok Jeong]
Crosscheck noted.
14.1.1.1.1.1.1.1.40m31486 CDVS: ETRI's Response to CE1 [Sang-il Na, Keundong Lee, Seungjae Lee, Woen-Geun Oh]
Scale normalised LoG. Rough feature selection and update gradient buffer. Gradient only needs to be calculated on the feature points (partial computation of the gradients). Performance largely maintained, some improvement on the localisation. 3 times faster compared to TM7.0.
Cross check OK.
14.1.1.1.1.1.1.1.41m31490 CDVS CE1: Parallel CABOX Filtering and Key Points Detection [Gaurav Srivastava, Victor Fragoso, Abhishek Nagar, Zhu Li, Kyungmo Park]
Gaussian filtering is approximated with cascaded difference-of-box filtering: a 2-step LS-LASSO scheme was developed to compute a sparse linear combination of box filters from a large dictionary, which offers a good speed-up of the extraction process. It is quite different from SIFT detection.
Some drop in performance (-5% MAP, -1.8% TPR)
14.1.1.1.1.1.1.1.42m31687 Crosscheck of Samsung’s Proposal m31490 Response to CE1 [Jie Chen, Ling-Yu Duan]
Cross verification OK only for Pairwise.
14.1.1.1.1.1.1.1.43m31718 CDVS: Addendum to m31369 [Gianluca Francini, Massimo Balestri, Skjalg Lepsoy]
Late contribution presented as an addendum to CE1, as requested by the group, describing the modifications introduced to the feature selection, in order to fully exploit ALP detector capabilities.
14.1.1.1.1.1.1.1.44 CE1 overall conclusion:
The BoG recommended that ALP be adopted into the TM and included in the CD, and that a collaborative CE-1 be started to combine the efficient filtering approaches presented at the 106th meeting with the base ALP approach for detecting extrema in the scale space. The main reason for this decision is the strong differentiation with respect to the SIFT-based framework. The objective of the new CE-1 is to improve the overall performance, reduce complexity and memory use, and test stability and interoperability.
CE2: Global descriptor
14.1.1.1.1.1.1.1.45m31485 CE2 update - v2 [Miroslaw Bober, Syed Husain, Karol Wnukowicz, Stavros Paschalakis]
Document not presented/withdrawn.
14.1.1.1.1.1.1.1.46m31401 CDVS: PKU response to CE2 - v1 [Jie Lin, Zhe Wang, Lin-Yu Duan, Tiejun Huang, Wen Gao]
Improvements to the SCFV global descriptor are presented, in two different solutions. The first one relies only on a different training dataset, which contains a balanced number of 2D and 3D objects (only 3D objects had been used for training so far).
The new training dataset contains images from Flickr and from the Peking dataset used in the CADAL Chinese project. Memory usage of the PCA+GMM tables is reduced to 12 KB. Only one threshold is adopted for the SCFV matching process. Significant improvement is achieved: a 1.81% increase in TPR. Retrieval results for the first method are missing and are to be provided during the week.
The second method keeps the same pipeline as the first; further improvements are achieved by using RootSIFT and increasing the number of clusters from 128 to 200 (results are also reported for 256). Significant improvement is shown: a 1.9% average increase in TPR, and an even bigger one with 256 clusters, which also shows improvements in retrieval (1.31% on average).
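For reference, RootSIFT is commonly defined as L1-normalizing a SIFT descriptor and then taking the element-wise square root. The sketch below follows that common definition and assumes non-negative descriptor entries; it is not taken from the PKU contribution, and the function name is hypothetical.

```python
import math


def root_sift(descriptor):
    """L1-normalize a non-negative descriptor, then take element-wise square roots."""
    total = sum(abs(v) for v in descriptor)
    if total == 0:
        # An all-zero descriptor is returned unchanged.
        return [0.0] * len(descriptor)
    return [math.sqrt(abs(v) / total) for v in descriptor]
```

With this mapping the result has unit L2 norm, so comparing two transformed descriptors with the Euclidean distance corresponds to comparing the originals with the Hellinger kernel.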
14.1.1.1.1.1.1.1.47m31402 CDVS: PKU response to CE2 - v2 [Jie Lin, Zhe Wang, Lin-Yu Duan, Tiejun Huang, Wen Gao]
Document not presented/withdrawn.
14.1.1.1.1.1.1.1.48m31426 Improving performance and usability of CDVS TM7 with a Robust Visual Descriptor (RVD) - CE 2 Proposal from University of Surrey and Visual Atoms [Miroslaw Bober, Syed Husain, Stavros Paschalakis, Karol Wnukowicz]
In Vienna the proposal was compared to TM6; it has now also been compared to TM7. Improvements are shown on the 2 tests related to 3D objects. Only 1 parameter controls TPR vs. FPR. When the method was incorporated into TM7, the GD size turned out to be higher than in the TM, so some steps were taken to decrease the size; the resulting drop in performance was compensated by further tuning. In experiments 4 and 5, TM7 performance on 3D objects is up to 16% lower (1 KB). The RVD builds on 2 years of development in robust statistics. 170 class centers are used, increasing the number of vectors assigned to each class. Pipeline: PCA to 48 dimensions (RootSIFT), then RVD aggregation, cluster selection (new), sign binarization and bit selection. In RVD aggregation, the error vector is normalized to unity using the L1 norm, so that it does not matter how far a vector is from the center and all vectors have the same impact on the cluster, which limits outliers. Each vector is then assigned to 3 clusters, because there are only about 300 vectors (too few) and they need to be reused. The result is not sensitive when rank 3 is used with weighting to give the same impact. Clusters are selected based on occupancy. Matching uses a weighted Hamming distance with 1 weighting parameter, plus penalties for empty clusters, so that clusters occupied in both images contribute to image matching. This gives a probabilistic view of occupancy vs. matching of images. Parameters reduced from 14 to 1. Table size 16.4 KB vs. 21.5 KB. Now comparable with the latest SCFV presented at MPEG 106. Accuracy is comparable with TM7, but with mAP and TPR improvements on experiments 4 and 5. 3D TPR is lifted. Note that the TM7 GD is 8% larger than the RVD at 16K, which acts as an advantage for TM7; RVD can be improved by using this bitrate limit. Collaborative experiments with the latest PKU proposal are recommended, because it uses 20% more bits for profile 13 in the retrieval experiment.
The solution is based on the RVD presented at the last meeting. The approach is identical to input M30311; the only difference is that the size of the global descriptor has been changed to match the size of the global descriptor currently used in TM7.
This solution also uses just a single threshold for pairwise matching.
25% lower memory compared to TM7.0.
The significant improvements presented at the last meeting were confirmed, in particular for TPR (+1.21%), with a less prominent improvement for retrieval (0.38% MAP).
Localization drops slightly because of the difference in the size of the global descriptor.
The BoG recommends that a collaborative Core Experiment be started in order to identify the optimal configuration for the global descriptor (e.g. size, number of clusters, thresholding mechanisms) and to further improve performance.
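The aggregation steps noted for m31426 (L1-normalized residuals, assignment of each vector to its 3 nearest clusters, sign binarization, cluster occupancy) can be sketched roughly as below. This is a simplified illustration, not the RVD as specified in the contribution: the squared Euclidean distance, the `rank_weights` values and all function names are assumptions, and PCA, cluster selection, bit selection and the weighted Hamming matching are omitted.

```python
def l1_normalize(vec):
    """Scale vec to unit L1 norm (an all-zero vector is returned as zeros)."""
    norm = sum(abs(v) for v in vec)
    return [v / norm for v in vec] if norm > 0 else [0.0] * len(vec)


def rvd_aggregate(descriptors, centers, rank=3, rank_weights=(1.0, 0.5, 0.25)):
    """Aggregate local descriptors into per-cluster sign bits.

    Returns (bits, occupied): bits[k] is the binarized aggregate of cluster k,
    occupied[k] tells whether any descriptor was assigned to cluster k.
    The rank weights are hypothetical placeholder values.
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    occupied = [False] * len(centers)
    for d in descriptors:
        # Squared Euclidean distance to every center (assumed metric).
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in centers]
        nearest = sorted(range(len(centers)), key=dists.__getitem__)[:rank]
        for weight, k in zip(rank_weights, nearest):
            # Unit-L1 residual: every vector has the same impact on the cluster.
            residual = l1_normalize([a - b for a, b in zip(d, centers[k])])
            for i, r in enumerate(residual):
                sums[k][i] += weight * r
            occupied[k] = True
    # Sign binarization: one bit per dimension and cluster.
    bits = [[1 if v > 0 else 0 for v in s] for s in sums]
    return bits, occupied
```

Normalizing each residual before summation is what keeps a single distant descriptor from dominating a cluster, which matches the outlier-limiting motivation recorded in the notes.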
14.1.1.1.1.1.1.1.49m31611 CDVS: crosscheck of m31426 [Massimo Balestri]
Crosscheck noted.
14.1.1.1.1.1.1.1.50m31491 CDVS CE2: AKULA - Adaptive Cluster Aggregation [Abhishek Nagar, Zhu Li, Gaurav Srivastava, Kyungmo Park]
A type of intra, independent coding without any global model. For PWM the distance is straightforward. Local accuracy +2.7%. Is a binarization scheme needed for retrieval to work? 1 KB memory cost. SCFV is used for indexing. Iterative optimization process. The descriptor consists of centroids and SIFT counts. 16 clusters. Distance metric: minimum centroid distance weighted by SIFT count. Inner loop aggregation. SCFV is forced to use different decision metrics, for example a weighted Hamming distance metric. AKULA rates: 64 B @ 512B, 128 @ 1-4K, 256B @ 8-16K. FAR 0.85% @ 512B. Very efficient compaction. Huge gain in localization at lower rates.
The method presents an aggregation that does not depend on a global model. K-means and local quantization are applied to every image independently, with a new method based on adaptive histogram computation and comparison.
Very first results obtained, just with the 1st subspace. No results ready for retrieval yet. Further work needed.
14.1.1.1.1.1.1.1.51m31664 CDVS: crosscheck of m31491 [Massimo Balestri]
Crosscheck noted.
Retrieval and Matching
No additional contributions noted.