To obtain rotational invariance of the descriptor, a dominant orientation in the neighbourhood of the interest point is determined from the orientations of the gradient vectors in that neighbourhood, and the grid over which the position-dependent histogram is computed is then oriented with respect to this dominant orientation.
To find the dominant orientation, a local histogram of gradient directions is accumulated around the interest point, and peaks are detected in this histogram. When computing the orientation histogram, the increments are weighted by the gradient magnitude and also by a Gaussian window function centered at the interest point, with its size proportional to the detection scale. In the case of multiple peaks, each peak is used for computing a new image descriptor for the corresponding orientation estimate.
To increase the accuracy of the orientation estimate, a rather dense sampling of the orientations is used, with 36 bins in the histogram. Moreover, the position of the peak is localized by local parabolic interpolation around the maximum point in the histogram.
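The orientation estimation described above can be sketched as follows. This is a minimal NumPy illustration, not the original implementation; the function name and the flat input arrays are my own, and the caller is assumed to have already computed gradient magnitudes, directions, and Gaussian window weights for the samples in the neighbourhood:

```python
import numpy as np

def dominant_orientation(grad_mag, grad_dir, weights, n_bins=36):
    """Estimate the dominant gradient orientation of a neighbourhood.

    grad_mag : gradient magnitudes of the neighbourhood samples
    grad_dir : gradient directions in radians, in [0, 2*pi)
    weights  : Gaussian window weights centred at the interest point
    """
    hist = np.zeros(n_bins)
    bins = (grad_dir / (2 * np.pi) * n_bins).astype(int) % n_bins
    # Increments weighted by gradient magnitude and the Gaussian window.
    np.add.at(hist, bins, grad_mag * weights)

    k = int(np.argmax(hist))  # index of the highest peak
    # Local parabolic interpolation around the peak for sub-bin accuracy.
    l, c, r = hist[(k - 1) % n_bins], hist[k], hist[(k + 1) % n_bins]
    denom = l - 2 * c + r
    offset = 0.0 if denom == 0 else 0.5 * (l - r) / denom
    return (k + offset) * (2 * np.pi / n_bins)
```

A full implementation would also return secondary peaks above some fraction of the maximum, so that each strong peak yields its own descriptor as described above.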
Given the scale and orientation estimates for an interest point, a rectangular grid is laid out in the image domain, centered at the interest point, with its orientation determined by the main peak(s) in the histogram and with its spacing proportional to the detection scale of the interest point.
For each point on this grid, a local histogram of gradient directions at the scale of the interest point is accumulated. During the accumulation of the histograms, the increments in the histogram bins are weighted by the gradient magnitude. To give stronger weight to gradient orientations near the interest point, the entries in the histogram are also weighted by a Gaussian window function centered at the interest point, with its size proportional to the detection scale of the interest point.
The resulting image descriptor is referred to as the SIFT descriptor. Figure 2: Illustration of how the SIFT descriptor is computed from sampled values of the gradient orientation and the gradient magnitude over a locally adapted grid around each interest point, with the scale factor determined from the detection scale of the interest point and the orientation determined from the dominant peak in a gradient orientation histogram around the interest point.
To increase the accuracy of the local histograms, trilinear interpolation is used for distributing the weighted increments for the sampled image measurements into adjacent histogram bins. A closely related notion of orientation histograms ("zoning") has also previously been used for optical character recognition (Trier et al.). To obtain contrast invariance, the SIFT descriptor is normalized to unit sum.
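The trilinear distribution of one weighted gradient sample into adjacent histogram bins can be sketched as follows. This is an illustrative helper under my own naming conventions: the descriptor grid is assumed to be 4x4 sub-blocks with `n_bins` orientation bins, and `(x, y)` are continuous coordinates of the sample in sub-block units, already rotated into the descriptor frame:

```python
import numpy as np

def trilinear_accumulate(hist, x, y, theta, w, n_bins=8):
    """Distribute the weight w of one gradient sample into adjacent bins.

    hist  : (4, 4, n_bins) histogram array for the descriptor grid
    x, y  : continuous sub-block coordinates in [0, 4)
    theta : gradient orientation in radians, in [0, 2*pi)
    """
    o = theta / (2 * np.pi) * n_bins
    x0, y0, o0 = int(np.floor(x - 0.5)), int(np.floor(y - 0.5)), int(np.floor(o))
    dx, dy, do = x - 0.5 - x0, y - 0.5 - y0, o - o0
    # Split the weight linearly along each of the three dimensions.
    for i, wx in ((x0, 1 - dx), (x0 + 1, dx)):
        if not 0 <= i < 4:
            continue
        for j, wy in ((y0, 1 - dy), (y0 + 1, dy)):
            if not 0 <= j < 4:
                continue
            for k, wo in ((o0 % n_bins, 1 - do), ((o0 + 1) % n_bins, do)):
                hist[i, j, k] += w * wx * wy * wo
```

Note that the orientation dimension wraps around, while weight falling outside the spatial grid is simply dropped.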
In this way, the weighted entries in the histogram will be invariant under local affine transformations of the image intensities around the interest point, which improves the robustness of the image descriptor under illumination variations.
To avoid local high-contrast measurements being given excessive emphasis in the image descriptor, Lowe proposed a two-stage normalization, where the entries after a first-stage unit sum normalization are limited to not exceed 0.2, after which the descriptor is normalized a second time. The use of local position-dependent histograms of gradient directions for matching and recognition in SIFT constitutes a specific example of using image descriptors based on image measurements in terms of receptive fields.
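The two-stage normalization can be sketched in a few lines. Following the text's unit-sum convention (Lowe's paper uses unit Euclidean length), and with the 0.2 clipping value from Lowe's paper as default; the function name is my own:

```python
import numpy as np

def sift_normalize(desc, clip=0.2):
    """Two-stage contrast normalization of a SIFT descriptor.

    First normalize to unit sum, then clip entries at `clip` and
    normalize again, so that single high-contrast gradients cannot
    dominate the descriptor.
    """
    desc = desc / max(desc.sum(), 1e-12)   # first-stage normalization
    desc = np.minimum(desc, clip)          # limit overly large entries
    return desc / max(desc.sum(), 1e-12)   # second-stage normalization
```

Because the clipping happens before the second normalization, a few entries may end up slightly above the clip value afterwards; that is the intended behaviour.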
More generally, receptive fields in terms of Gaussian derivatives have been proposed as a canonical model for linear receptive fields in computer vision by Koenderink and van Doorn, and by Lindeberg. The pyramid representation previously proposed by Burt and Adelson and by Crowley and Stern, and used by Lowe, can be seen as a numerical approximation of such Gaussian receptive fields.
By the theoretical analysis in Lindeberg, it can be shown that such receptive fields capture inherent characteristics of the reflectance patterns of the surfaces of objects and thus enable visual recognition.
The use of scale selection in the interest point detection step ensures that the interest points will be invariant under scaling transformations (Lindeberg). Specifically, the scale normalization of the image descriptor establishes a local scale-invariant reference frame, which implies that the image descriptors, and matching schemes based on them, will also be invariant under scaling transformations (Lindeberg). Thereby, image matching and object recognition based on such image features have the ability to handle objects of different sizes, as well as objects seen at different distances from the camera.
A more general set of scale-space interest point detectors for image-based matching and recognition, with better properties than Laplacian or difference-of-Gaussians interest points, is presented in Lindeberg. Given a set of image descriptors computed from two different images, the descriptors can be mutually matched by, for each point, finding the point in the other image that minimizes the Euclidean distance between the descriptors, represented as 128-dimensional vectors.
To suppress matches that could be regarded as possibly ambiguous, Lowe only accepted matches for which the ratio between the distances to the nearest and the next-nearest point is less than 0.8. Figure 3: Interest points detected from two images of the same scene with the computed image matches drawn as black lines between corresponding interest points. The blue and red arrows at the centers of the circles illustrate the orientation estimates obtained from peaks in local orientation histograms around the interest points.
If we were to apply the above-mentioned nearest-neighbour matching approach for recognizing an object against a large collection of objects in a database, such matching would imply comparisons to all the image descriptors stored in the database.
To speed up the resulting nearest-neighbour matching for larger data sets, Lowe applied an approximate best-bin-first (BBF) algorithm (Beis and Lowe) that scales better with increasing numbers of image features.
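The nearest-neighbour matching with the distance-ratio criterion can be sketched as a brute-force loop; the BBF/kd-tree acceleration discussed above is omitted, and the function name and the 0.8 default are illustrative (0.8 being the commonly quoted value from Lowe's paper):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbour matching with the ratio test.

    desc_a, desc_b : (n, d) arrays of image descriptors
    Returns a list of (i, j) index pairs for accepted matches.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        # Accept only clearly unambiguous matches: the nearest
        # neighbour must be much closer than the second nearest.
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```

This runs in time proportional to the product of the two descriptor-set sizes, which is exactly the cost that approximate search structures such as BBF are designed to avoid.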
In later work (Muja and Lowe), this approach has been extended to hierarchical k-means trees and randomized k-d trees. When applying the SIFT descriptor for object recognition, Lowe developed a Hough transform approach based on triples of image matches to accumulate evidence for objects as represented by sets of interest points with associated image descriptors.
When integrating the different components together, matching based on the SIFT descriptor quickly established itself as a state-of-the-art method for image-based matching and object recognition. In an experimental evaluation of the robustness of different image descriptors performed by Mikolajczyk and Schmid, the SIFT descriptor was found to be more robust to image deformations than steerable filters, differential invariants, moment invariants, complex filters and cross-correlation of different types of interest points.
Ke and Sukthankar proposed an alternative approach for defining local image descriptors, similar to the SIFT descriptor in the sense of detecting interest points with associated scale estimates from scale-space extrema and performing orientation normalization from peaks in a local orientation histogram, but different in terms of the actual image measurements underlying the image descriptors.
Instead of computing gradient orientations, they first compute local maps of the gradient magnitude. These local patches are then oriented with respect to a dominant image orientation to achieve rotational invariance. A normalization to unit sum is also performed to achieve local contrast invariance. Then, these local gradient maps are projected to a lower-dimensional subspace with 20 dimensions using principal component analysis (PCA). Thus, given a specific interest point, the corresponding gradient map is computed and, after contrast normalization, projected to the lower-dimensional subspace.
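The PCA projection step can be sketched as follows. This is an illustrative NumPy sketch under my own naming, with the 20-dimensional subspace taken from the text; for simplicity the subspace is learned here from the input patches themselves, whereas a PCA-SIFT-style system would learn it offline from a large training set:

```python
import numpy as np

def pca_project(patches, n_components=20):
    """Project flattened gradient patches to a low-dimensional subspace.

    patches : (n, d) array, one flattened, contrast-normalized
              gradient map per interest point
    Returns the projected descriptors, the mean patch, and the basis.
    """
    mean = patches.mean(axis=0)
    centered = patches - mean
    # SVD of the centered data yields the principal directions
    # as the rows of vt, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    return centered @ basis.T, mean, basis
```

At matching time, a new patch would be centered with the stored mean and projected with the stored basis before computing Euclidean distances.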
Then, these local image descriptors are matched by minimizing the Euclidean distance. Different ways of extending the SIFT descriptor from grey-level to colour images have also been proposed by different authors. Returning to the keypoint detection step: to eliminate keypoints that lie on edges, a concept similar to the Harris corner detector is used. A 2x2 Hessian matrix H is computed to obtain the principal curvatures.
We know from the Harris corner detector that for edges, one eigenvalue is larger than the other. So a simple function of the trace and determinant, Tr(H)^2 / Det(H), is used, which depends only on the ratio of the eigenvalues. If this ratio is greater than a threshold, called edgeThreshold in OpenCV, that keypoint is discarded; the value 10 is used in the paper. This eliminates low-contrast keypoints and edge keypoints, and what remains are strong interest points. Next, an orientation is assigned to each keypoint to achieve invariance to image rotation.
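The edge check can be sketched as follows. The comparison value (r+1)^2/r for an eigenvalue-ratio threshold r follows Lowe's paper; the function name is my own:

```python
import numpy as np

def is_edge_like(H, edge_threshold=10.0):
    """Reject keypoints lying along edges.

    H : 2x2 Hessian matrix at the keypoint.
    For eigenvalues a, b with a = r*b, Tr(H)^2/Det(H) = (r+1)^2/r,
    so the ratio test avoids computing the eigenvalues explicitly.
    """
    tr = H[0, 0] + H[1, 1]
    det = H[0, 0] * H[1, 1] - H[0, 1] * H[1, 0]
    if det <= 0:              # curvatures of opposite sign: discard
        return True
    r = edge_threshold
    return tr * tr / det >= (r + 1) ** 2 / r
```

A keypoint on a blob-like structure has two comparable curvatures and passes; a keypoint on an edge has one dominant curvature and is rejected.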
A neighbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region. If the orientation histogram has multiple strong peaks, keypoints with the same location and scale but different orientations are created, which contributes to the stability of matching. Next, the keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken and divided into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created.
So a total of 128 bin values are available. They are represented as a vector to form the keypoint descriptor. In addition to this, several measures are taken to achieve robustness against illumination changes, rotation, etc.
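The descriptor assembly from the 16x16 neighbourhood can be sketched as follows. This is a deliberately simplified illustration under my own naming: the Gaussian weighting, trilinear interpolation, and final normalization described earlier are omitted, and the inputs are assumed to be already rotated to the keypoint's dominant orientation:

```python
import numpy as np

def keypoint_descriptor(grad_mag, grad_dir, n_bins=8):
    """Build a 128-D descriptor from a 16x16 neighbourhood.

    grad_mag, grad_dir : (16, 16) arrays of gradient magnitude and
    orientation (radians) around the keypoint.
    """
    desc = np.zeros((4, 4, n_bins))
    bins = (grad_dir / (2 * np.pi) * n_bins).astype(int) % n_bins
    for y in range(16):
        for x in range(16):
            # Each 4x4 sub-block accumulates its own 8-bin histogram,
            # with increments weighted by the gradient magnitude.
            desc[y // 4, x // 4, bins[y, x]] += grad_mag[y, x]
    return desc.ravel()  # 4 * 4 * 8 = 128 values
```

The flattening at the end is what produces the 128-dimensional vector used for Euclidean-distance matching.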
Keypoints between two images are matched by identifying their nearest neighbours. In some cases, however, the second-closest match may be very near to the first, due to noise or other reasons; such ambiguous matches are rejected using the distance-ratio test described above.