An Efficient Image Matching Method for Multi-View Stereo
Shuji Sakai1, Koichi Ito1, Takafumi Aoki1, Tomohito Masuda2, and Hiroki Unten2
1 Graduate School of Information Sciences, Tohoku University,
Sendai, Miyagi 980-8579, Japan
sakai@aoki.ecei.tohoku.ac.jp
2 Toppan Printing Co., Ltd., Bunkyo-ku, Tokyo, 112–8531, Japan
Abstract. Most existing Multi-View Stereo (MVS) algorithms employ an image matching method based on Normalized Cross-Correlation (NCC) to estimate the depth of an object. The accuracy of the estimated depth depends on the step size of the depth in NCC-based window matching. The step size must be small for accurate 3D reconstruction, while a small step significantly increases the computational cost. To improve the accuracy of depth estimation and reduce the computational cost, this paper proposes an efficient image matching method for MVS. The proposed method is based on Phase-Only Correlation (POC), which is a high-accuracy image matching technique using the phase components of Fourier transforms. The advantages of using POC are that (i) the correlation function is obtained by only one window matching and (ii) the accurate sub-pixel displacement between two matching windows can be estimated by fitting the analytical correlation peak model of the POC function. Thus, using POC-based window matching for MVS makes it possible to estimate depth accurately from the correlation function obtained by only one window matching. Through a set of experiments using public MVS datasets, we demonstrate that the proposed method performs better in terms of accuracy and computational cost than the conventional method.
1 Introduction
In recent years, the topic of Multi-View Stereo (MVS) has attracted much attention in the field of computer vision [1–10]. MVS aims to reconstruct a complete 3D model from a set of images taken from different viewpoints. A typical MVS algorithm consists of two steps: (i) estimating 3D points on the basis of a photo-consistency measure and a visibility model using a local image matching method and (ii) reconstructing a 3D model from the estimated 3D point clouds. The accuracy, robustness, and computational cost of an MVS algorithm depend on the performance of its image matching method, which is therefore the most important component of MVS algorithms.

Most MVS algorithms employ Normalized Cross-Correlation (NCC)-based image matching to estimate 3D points [1,5,6,8–10]. Goesele et al. [5] applied NCC-based image matching to the plane-sweeping approach to estimate a reliable depth map by accumulating the correlation values calculated from multiple image pairs while changing the depth. Campbell et al. [8] estimated a depth map more accurately than Goesele et al. [5] by using the matching results obtained from neighboring pixels to reduce outliers. Bradley et al. [9] and Furukawa et al. [10] achieved robust image matching by transforming the matching window in accordance with not only the depth but also the normal of the 3D points. In the MVS algorithms mentioned above, the NCC value between matching windows is used as the reliability of a 3D point. The optimal 3D point is estimated by iteratively computing NCC values between matching windows while changing the parameters of the 3D point, i.e., depth or normal. For example, a plane-sweeping approach such as that of Goesele et al. [5] computes NCC values between matching windows while discretely changing the depth and selects the depth with the highest NCC value as the optimal one. To estimate the depth accurately, a sufficiently small depth step must be employed, which significantly increases the computational cost. Moreover, when the depth step is small, the translational displacement of a 3D point on the multi-view images is sub-pixel. Most existing methods assume that the sub-pixel resolution of a matching window is represented by linear interpolation; this assumption, however, is not always true.

In this paper, we propose an efficient image matching method for MVS using Phase-Only Correlation (POC), or simply "phase correlation". POC is a correlation function calculated only from the phase components of the Fourier transform. The translational displacement and the similarity between two images can be estimated from the position and the height of the correlation peak of the POC function, respectively. Kuglin et al. [11] proposed a fundamental image matching technique using POC, and Takita et al. [12] proposed a sub-pixel image registration technique using POC. The major advantages of POC-based over NCC-based image matching are the following two points: (i) the correlation function is obtained by only one window matching and (ii) the accurate sub-pixel translational displacement between two windows can be estimated by fitting the analytical correlation peak model of the POC function. When POC-based image matching is applied to depth estimation, the peak position of the POC function indicates the displacement between the assumed and true depth. Hence, we can directly estimate the true depth from the result of only one POC-based window matching. By introducing POC-based image matching into the plane-sweeping approach, we need little window matching to estimate the true depth from multi-view images. In addition, the accuracy of depth estimation can be improved by integrating the POC functions calculated from multiple stereo image pairs. Thus, using POC-based window matching for MVS makes it possible to estimate depth accurately from the correlation function obtained by only one window matching. Through a set of experiments using the public multi-view stereo datasets [13], we demonstrate that the proposed method performs better in terms of the accuracy and the computational cost than the method proposed by Goesele et al. [5].
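To make the cost trade-off concrete, the following minimal sketch (ours, not the implementation of [5]) shows NCC-based plane sweeping for one pixel; extract_window and project are hypothetical helpers that crop a window around a point and re-project a pixel at an assumed depth into the neighboring view. Each additional depth step costs one full window matching.

```python
import numpy as np

def ncc(f, g):
    """Normalized cross-correlation between two equal-sized windows."""
    f = f - f.mean()
    g = g - g.mean()
    denom = np.sqrt((f * f).sum() * (g * g).sum())
    return float((f * g).sum() / denom) if denom > 0 else 0.0

def sweep_depth(ref_img, ngb_img, pixel, depths, extract_window, project):
    """Return the candidate depth with the highest NCC score.
    One window matching is performed per depth step, so halving the
    step size doubles the computational cost."""
    f = extract_window(ref_img, pixel)
    scores = [ncc(f, extract_window(ngb_img, project(pixel, z)))
              for z in depths]
    return depths[int(np.argmax(scores))]
```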
2 Phase-Only Correlation
This section describes the fundamentals of POC-based image matching. Most existing POC-based image matching methods are designed for 2D images. Matching between stereo images, however, can be reduced to a 1D matching problem through stereo rectification. In this paper, we therefore employ the 1D POC function to estimate depth from multi-view images.

POC is an image matching technique using the phase components of the Discrete Fourier Transforms (DFTs) of given images. Consider two N-length 1D image signals f(n) and g(n), where the index range is n = −M, ..., M (M > 0) and hence N = 2M + 1. Let F(k) and G(k) denote the 1D DFTs of the two signals, given by

F(k) = \sum_{n=-M}^{M} f(n) W_N^{kn} = A_F(k) e^{j\theta_F(k)},    (1)

G(k) = \sum_{n=-M}^{M} g(n) W_N^{kn} = A_G(k) e^{j\theta_G(k)},    (2)

where k = −M, ..., M, W_N = e^{−j2π/N}, A_F(k) and A_G(k) are the amplitude components, and θ_F(k) and θ_G(k) are the phase components. The normalized cross-power spectrum R(k) is given by

R(k) = \frac{F(k)\overline{G(k)}}{|F(k)\overline{G(k)}|} = e^{j(\theta_F(k) - \theta_G(k))},    (3)

where \overline{G(k)} is the complex conjugate of G(k) and θ_F(k) − θ_G(k) denotes the phase difference. The POC function r(n) is defined as the Inverse DFT (IDFT) of R(k):

r(n) = \frac{1}{N} \sum_{k=-M}^{M} R(k) W_N^{-kn}.    (4)

Shibahara et al. [14] derived the analytical peak model of the 1D POC function. Let us assume that f(n) and g(n) are minutely displaced with respect to each other. The analytical peak model of the 1D POC function is then

r(n) \simeq \frac{\alpha}{N} \frac{\sin(\pi(n + \delta))}{\sin(\frac{\pi}{N}(n + \delta))},    (5)

where δ is the sub-pixel peak position and α is the peak value. The position of the correlation peak indicates the translational displacement δ between the two 1D image signals, and the peak value α indicates their similarity. The translational displacement with sub-pixel accuracy can be estimated by fitting the model of Eq. (5) to the calculated data array around the correlation peak, with α and δ as the fitting parameters. In addition, we employ the following techniques to improve the accuracy of 1D image matching: (i) windowing to reduce boundary effects, (ii) spectral weighting to reduce aliasing and noise effects, and (iii) averaging 1D POC functions to improve the peak-to-noise ratio [12, 14]. Fig. 1 shows an example of 1D POC-based image matching.

Fig. 1. Example of 1D POC-based image matching: two 1D image signals f(n) and g(n), the amplitude and phase components of their DFTs F(k) and G(k), and the resulting 1D POC function r(n) with peak value α at the sub-pixel peak position δ.
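As an illustration of Eqs. (1)–(5), the following NumPy sketch (ours, not the authors' implementation) computes the 1D POC function and fits the analytical peak model around the correlation peak; the windowing and spectral weighting techniques mentioned above are omitted for brevity.

```python
import numpy as np
from scipy.optimize import curve_fit

def poc_1d(f, g):
    """1D POC function r(n) of Eqs. (1)-(4): the IDFT of the normalized
    cross-power spectrum R(k) of the two signals."""
    R = np.fft.fft(f) * np.conj(np.fft.fft(g))
    R /= np.maximum(np.abs(R), 1e-12)                # keep only the phase
    return np.fft.fftshift(np.real(np.fft.ifft(R)))  # put n = 0 at the center

def fit_peak(r):
    """Estimate the sub-pixel peak position delta and the peak value alpha
    by fitting the model of Eq. (5) around the correlation peak."""
    N = len(r)
    n = np.arange(N) - N // 2
    p = int(np.argmax(r))                            # integer-pixel peak index

    def model(n, delta, alpha):
        x = np.pi * (n + delta)
        # Eq. (5), alpha/N * sin(pi(n+delta)) / sin(pi(n+delta)/N),
        # written with sinc to handle the removable singularity.
        return alpha * np.sinc(x / np.pi) / np.sinc(x / (np.pi * N))

    s = slice(max(p - 2, 0), min(p + 3, N))          # samples around the peak
    (delta, alpha), _ = curve_fit(model, n[s], r[s],
                                  p0=(-float(n[p]), float(r[p])))
    return delta, alpha
```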
3 POC-Based Image Matching for Multi-View Stereo
In this section, we describe a POC-based image matching method for MVS. Existing algorithms using NCC-based image matching must perform many NCC computations, changing the assumed depth, to estimate the accurate depth of a 3D point. The proposed method, on the other hand, estimates the accurate depth with only one window matching: the depth change of a 3D point is approximated by a translational displacement on the stereo images, and this displacement is estimated using POC. The proposed method also enhances the estimation accuracy by integrating the POC functions calculated from multiple stereo image pairs.

The POC functions calculated from stereo images with different viewpoints exhibit different peak positions due to the difference in camera positions. To integrate the POC functions, the proposed method normalizes the disparity of each stereo image and integrates the POC functions on the same coordinate system. Okutomi et al. [15] proposed a disparity normalization technique to integrate correlation functions calculated from stereo images with different viewpoints. This technique, however, assumes that all cameras are located on the same line, which is not suitable in practical situations. The disparity normalization technique used in the proposed method, which is a generalized version of the technique proposed by Okutomi et al. [15], can integrate the correlation functions calculated from stereo images with different viewpoints even if the cameras are not located on the same line.

Let V = {V_0, ..., V_{H−1}} be the multi-view images with known camera parameters. We consider a reference view V_R ∈ V and neighboring views C = {C_0, ..., C_{K−1}} ⊂ V − {V_R} as input images, where H and K are the number of multi-view images and the number of neighboring views, respectively. The proposed method generates K rectified stereo image pairs and estimates the depth of each point in V_R from the peak position of the correlation function obtained by integrating the POC functions with normalized disparities. We use the stereo rectification method employed in the Camera Calibration Toolbox for Matlab [16]. In the following, we describe the two key techniques of the proposed method, (i) normalizing the disparity and (ii) integrating the POC functions, and then describe the proposed depth estimation method using POC-based image matching.

3.1 Normalization of Disparity

We assume that the camera coordinate system of the reference view V_R coincides with the world coordinate system. Let V^rect_{R,i}-C^rect_i be a rectified stereo image pair, where V^rect_{R,i} is the rectified image of V_R corresponding to the view angle of C_i. The relationship among the 3D point M = [X, Y, Z]^T in the camera coordinate system of V_R, the rectified stereo image pair V^rect_{R,i}-C^rect_i (C_i ∈ C) with disparity d_i, and the rectified stereo image pair V^rect_{R,j}-C^rect_j (C_j ∈ C − {C_i}) with disparity d_j is given by

M = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = R_i \begin{bmatrix} (u_i - u_{0i}) B_i / d_i \\ (v_i - v_{0i}) B_i / d_i \\ f_i B_i / d_i \end{bmatrix} = R_j \begin{bmatrix} (u_j - u_{0j}) B_j / d_j \\ (v_j - v_{0j}) B_j / d_j \\ f_j B_j / d_j \end{bmatrix},    (6)

where (u_l, v_l) is the corresponding point of M in V^rect_{R,l}, (u_{0l}, v_{0l}) is the optical center of V^rect_{R,l}, f_l is the focal length, and B_l is the baseline length of V^rect_{R,l}-C^rect_l (l = i, j). R_l denotes the rotation matrix from the reference view V_R to the rectified reference view V^rect_{R,l} used in the stereo rectification for V^rect_{R,l}-C^rect_l, and is given by

R_l = \begin{bmatrix} R_{l11} & R_{l12} & R_{l13} \\ R_{l21} & R_{l22} & R_{l23} \\ R_{l31} & R_{l32} & R_{l33} \end{bmatrix}.    (7)

From Eq. (6), we derive the relationship between d_i and d_j as

d_i = \frac{R_{i31}(u_i - u_{0i}) + R_{i32}(v_i - v_{0i}) + R_{i33} f_i}{R_{j31}(u_j - u_{0j}) + R_{j32}(v_j - v_{0j}) + R_{j33} f_j} \frac{B_i}{B_j} d_j.    (8)

Thus, the relationship between d_i and d_j is represented by a scaling factor that depends on the camera parameters and the coordinates of the corresponding points in V^rect_R. We define the normalized disparity d to take into account the scale factor for each disparity.

Fig. 2. Geometric relationship between the location of a 3D point and the disparities on the images.

Considering the rectified stereo image pairs V^rect_{R,i}-C^rect_i (i = 0, ..., K−1), the relationship between the disparity d_i of each rectified stereo pair and the normalized disparity d can be written as

d_i = s_i d,    (9)

where s_i denotes the scale factor for the disparity d_i and is given by

s_i = \frac{(R_{i31}(u_i - u_{0i}) + R_{i32}(v_i - v_{0i}) + R_{i33} f_i) B_i}{\frac{1}{K} \sum_{l=0}^{K-1} (R_{l31}(u_l - u_{0l}) + R_{l32}(v_l - v_{0l}) + R_{l33} f_l) B_l}.    (10)

In this case, the 3D point M can be written as

M = R_i \begin{bmatrix} (u_i - u_{0i}) B_i / (s_i d) \\ (v_i - v_{0i}) B_i / (s_i d) \\ f_i B_i / (s_i d) \end{bmatrix}.    (11)
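For illustration, the scale factor of Eq. (10) and the back-projection of Eq. (11) translate directly into code. The sketch below reflects our reading of these equations, not the authors' implementation; the per-view parameter record (keys R, uv, c0, f, B) is a hypothetical layout.

```python
import numpy as np

def scale_factor(i, views):
    """Scale factor s_i of Eq. (10). Each entry of `views` holds the
    rectification rotation R (3x3), the corresponding point uv = (u, v),
    the optical center c0 = (u0, v0), the focal length f, and the
    baseline length B of one rectified stereo pair."""
    def numer(view):
        R, (u, v), (u0, v0) = view["R"], view["uv"], view["c0"]
        return (R[2, 0] * (u - u0) + R[2, 1] * (v - v0)
                + R[2, 2] * view["f"]) * view["B"]
    return numer(views[i]) / (sum(numer(v) for v in views) / len(views))

def point_from_disparity(i, views, d):
    """3D point M of Eq. (11) for a normalized disparity d."""
    view = views[i]
    (u, v), (u0, v0) = view["uv"], view["c0"]
    s = scale_factor(i, views)
    m = np.array([u - u0, v - v0, view["f"]]) * view["B"]
    return view["R"] @ (m / (s * d))
```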

3.2 Integration of POC Functions

We consider the 3D point M and a minutely displaced 3D point M_0 = M + ΔM, where ΔM = [ΔX, ΔY, ΔZ]^T denotes the minute displacement, as shown in Fig. 2. Let d and d_0 be the normalized disparities of M and M_0, respectively. Assuming that M is the true 3D point, the relationship between d and d_0 is given by

d_0 = d + δ,    (12)
where δ denotes the error between the normalized disparities d and d_0. For the rectified stereo image pair V^rect_{R,i}-C^rect_i (i ∈ {0, ..., K−1}), the relationship between the 3D point M_0 and the normalized disparity d is

M_0 = R_i \begin{bmatrix} (u_i - u_{0i}) B_i / (s_i (d + \delta)) \\ (v_i - v_{0i}) B_i / (s_i (d + \delta)) \\ f_i B_i / (s_i (d + \delta)) \end{bmatrix}.    (13)

Let f_i and g_i be the matching windows extracted from V^rect_{R,i} and C^rect_i centered on the corresponding points of M_0, respectively. Approximating the local image transformation by a translational displacement, the displacement between f_i and g_i is δ_i = s_i δ. The displacement δ_i can be estimated from the correlation peak position of the POC function r_i between f_i and g_i, as mentioned in Sect. 2. Different rectified stereo image pairs, however, have different translational displacements. For example, δ_i in V^rect_{R,i}-C^rect_i and δ_j in V^rect_{R,j}-C^rect_j (j ∈ {0, ..., K−1} − {i}) are not always equal. In other words, the POC functions r_i and r_j have different correlation peak positions.

To address this problem, we convert the coordinate systems of the POC functions r_i and r_j into the same coordinate system by scaling the matching windows in accordance with each normalized disparity. Let w be the unified size of the matching window. The size of the matching windows f_i and g_i is defined as s_i w. Scaling the image signals f_i and g_i by 1/s_i, the size of the matching windows is normalized to w, where we denote by f̂_i and ĝ_i the scaled versions of the matching windows f_i and g_i, respectively. Hence, the correlation peak of the POC function r̂_i between f̂_i and ĝ_i is located at δ. Similarly, for the rectified stereo image pair V^rect_{R,j}-C^rect_j, the correlation peak of the POC function r̂_j between f̂_j and ĝ_j is located at the same position δ, although the size of the matching window, i.e., s_j w, differs from that for V^rect_{R,i}-C^rect_i, i.e., s_i w.
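The scaling-and-averaging step can be sketched as follows (our illustration under the notation above, reusing poc_1d from the sketch in Sect. 2; linear interpolation stands in for the 1/s_i rescaling):

```python
import numpy as np

def rescale(x, w):
    """Resample a 1D window of length s_i * w to the unified length w."""
    return np.interp(np.linspace(0.0, len(x) - 1.0, w),
                     np.arange(len(x)), x)

def integrated_poc(window_pairs, w):
    """Average the POC functions of all rectified pairs on a common
    coordinate system. `window_pairs` is a list of (f_i, g_i) matching
    windows, each of length s_i * w (a hypothetical layout for this
    sketch). After rescaling, every POC function peaks at the same
    position delta, so the functions can simply be averaged."""
    r = [poc_1d(rescale(f_i, w), rescale(g_i, w))
         for f_i, g_i in window_pairs]
    return np.mean(r, axis=0)
```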

Fig. 3. Integration of the POC functions calculated from stereo image pairs with different viewpoints: (a) POC functions before disparity normalization and (b) POC functions after disparity normalization (four pairs V^rect_{R,i}-C^rect_i, i = 0, ..., 3).

Fig. 4. Depth estimation using POC-based image matching: the matching windows f_i and g_i are extracted from the rectified pairs V^rect_{R,i}-C^rect_i around the projections of M_0, the scaled POC functions r̂_0, ..., r̂_3 are averaged into r̂_ave, and its peak position δ relates M_0 to the true 3D point M.
Fig. 3(a) shows the POC functions before disparity normalization. In this case, the translational displacement δ_i between the matching windows differs for each viewpoint, and thus the positions of the correlation peaks also differ. Fig. 3(b) shows the POC functions after disparity normalization. In this case, the translational displacement δ is the same for all viewpoints, and therefore all the POC functions peak at the same position. Disparity normalization thus makes it possible to integrate the POC functions calculated from rectified stereo image pairs with different viewpoints. In this paper, we employ the POC function r̂_ave, which is the average of the POC functions r̂_i (i = 0, ..., K−1), as the integrated POC function.

3.3 Depth Estimation Using POC-Based Image Matching

We now describe the depth estimation method using POC-based image matching with the two key techniques described above. Fig. 4 shows the flow of the proposed method. First, the initial position of the 3D point M_0 is projected onto each rectified stereo image pair V^rect_{R,i}-C^rect_i, and its coordinates on V^rect_{R,i} and C^rect_i are denoted by m_i = [u_i, v_i] and m^C_i = [u^C_i, v^C_i], respectively, where i = 0, ..., K−1. Next, the matching windows f_i and g_i are extracted from V^rect_{R,i} centered at m_i and from C^rect_i centered at m^C_i, each with size s_i w × L. Note that we extract L lines per matching window so as to employ the technique of averaging 1D POC functions to improve the peak-to-noise ratio, as described in Sect. 2. Then, we apply the disparity normalization to the matching windows f_i and g_i and calculate the 1D POC function r̂_i between f̂_i and ĝ_i.

The correlation peak position of the 1D POC function r̂_i may include a significant error if the 3D point M_0 is not visible from the neighboring view C_i ∈ C or if the matching window is extracted from the boundary region of an object, where multiple disparities are observed. In such cases, we observe that the correlation peak value α_i drops, since the local image transformation between the matching windows cannot be approximated by a translational displacement. To improve the accuracy of depth estimation, the average POC function r̂_ave is therefore calculated only from the POC functions r̂_i with α_i > th_corr, where th_corr is a threshold. Finally, the correlation peak position δ with sub-pixel accuracy is estimated by fitting the analytical peak model of the POC function to r̂_ave. From Eqs. (11) and (12) and the estimated δ, the true position of the 3D point M is given by

M = R_i \begin{bmatrix} (u_i - u_{0i}) B_i / (s_i (d_0 - \delta)) \\ (v_i - v_{0i}) B_i / (s_i (d_0 - \delta)) \\ f_i B_i / (s_i (d_0 - \delta)) \end{bmatrix}.    (14)

To generate a depth map, we apply the POC-based image matching to a plane-sweeping approach and search for the depth of each pixel in V_R. Since POC-based image matching can estimate the depth corresponding to ±w/4 pixels in the neighboring-view image, we search along the ray within the bounding box, changing the depth of M_0 in steps of s_i w/4 pixels in the stereo images. We also apply a coarse-to-fine strategy using image pyramids: we first estimate the approximate depth in the coarsest image layer and then refine it in the subsequent layers.
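Putting the pieces together, one matching step of the proposed plane sweep might look as follows. This is a sketch under the same assumptions as the earlier snippets, reusing the hypothetical poc_1d, rescale, fit_peak, and point_from_disparity helpers; it is not the authors' implementation.

```python
import numpy as np

def estimate_point(window_pairs, views, i, d0, w, th_corr=0.3):
    """One POC-based matching step of Sect. 3.3: refine the assumed 3D
    point (normalized disparity d0) from a single window matching."""
    # Keep only pairs whose correlation peak is reliable (alpha_i > th_corr),
    # discarding occluded views and object-boundary windows.
    rs = [poc_1d(rescale(f, w), rescale(g, w)) for f, g in window_pairs]
    rs = [r for r in rs if r.max() > th_corr]
    if not rs:
        return None                      # no reliable pair survived
    r_ave = np.mean(rs, axis=0)          # integrated POC function
    delta, alpha = fit_peak(r_ave)       # sub-pixel peak of Eq. (5)
    # Eq. (14): the true point lies at normalized disparity d0 - delta.
    return point_from_disparity(i, views, d0 - delta)
```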
4 Experiments and Discussion
We evaluate the reconstruction accuracy and the computational cost of the conventional method and the proposed method using the public multi-view stereo image datasets [13]. In the experiments, we employ the well-known plane-sweeping method proposed by Goesele et al. [5] as the conventional method.

4.1 Implementation

We describe the implementation notes for Goesele's method and the proposed method.

Goesele's method [5]: The reconstruction accuracy and the computational cost of Goesele's method significantly depend on the step size ΔZ of the depth. In the experiments, we employ four variations of ΔZ such that the resolution of the disparity on the widest-baseline stereo image is 1, 1/2, 1/5, and 1/10 pixels. The window size of NCC-based window matching is 17 × 17 pixels. The threshold value for averaging the NCC values calculated from stereo image pairs is 0.3.

Proposed method: The parameters of the proposed method used in the experiments are as follows. The threshold th_corr is 0.3, the matching window size w is 32 pixels, and the number of POC functions L is 17. Note that the effective information of a POC function of 32 pixels × 17 lines is limited to 17 pixels × 17 lines, since we apply a Hanning window with w/2 half-width to reduce the boundary effect, as described in Sect. 2. We also employ the coarse-to-fine strategy using image pyramids. The numbers of layers are 2, 3, and 4 for 768 × 512, 1,536 × 1,024, and 3,072 × 2,048 pixels, respectively.

4.2 Evaluation of 3D Reconstruction Accuracy

We evaluate the 3D reconstruction accuracy using Herz-Jesu-P8 (8 images) and Fountain-P11 (11 images), which are available in [13]. The Herz-Jesu-P8 and Fountain-P11 datasets include multi-view images of 3,072 × 2,048 pixels, camera parameters, bounding boxes, and a mesh model of the target object that can be used as the ground truth. For each dataset, we generate depth maps for all viewpoints using Goesele's method and the proposed method. We use two neighboring-view images C for one reference-view image V_R. Fig. 5 shows examples of V_R and C used in the experiments. The performance is evaluated for three different image sizes: 768 × 512, 1,536 × 1,024, and 3,072 × 2,048 pixels.

Fig. 5. Examples of the reference-view image V_R and the neighboring-view images C used in the experiments (upper: Herz-Jesu-P8, lower: Fountain-P11).

We evaluate the accuracy of 3D reconstruction by the error rate e defined by

e = \frac{|Z_{calculated} - Z_{ground\ truth}|}{Z_{ground\ truth}} \times 100\ [\%],    (15)

where Z_calculated and Z_ground truth denote the estimated depth and the true depth obtained from the ground truth, respectively. Fig. 6 shows the reconstructed 3D point clouds of Goesele's method and the proposed method for 1,536 × 1,024-pixel images.
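For reference, the error rate of Eq. (15) and the inlier statistics reported in Figs. 7 and 8 can be computed as in the following small sketch (array names are ours):

```python
import numpy as np

def error_rate(z_est, z_gt):
    """Per-point error rate e of Eq. (15), in percent."""
    return np.abs(z_est - z_gt) / z_gt * 100.0

def inlier_stats(z_est, z_gt, th=1.0):
    """Inlier rate [%] and average error rate of inliers [%], where an
    inlier is a point whose error rate is below th (1.0% as in Fig. 8)."""
    e = error_rate(np.asarray(z_est), np.asarray(z_gt))
    inlier = e < th
    return inlier.mean() * 100.0, e[inlier].mean()
```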

Fig. 6. Reconstruction results for 1,536 × 1,024-pixel images for each dataset (upper: Herz-Jesu-P8, lower: Fountain-P11; left to right: Goesele's method with ΔZ = 1/10 pixel, proposed method, ground truth).

Fig. 7. Inlier rate versus error-rate threshold for each dataset and image size (upper: Herz-Jesu-P8, lower: Fountain-P11; columns: 768 × 512, 1,536 × 1,024, and 3,072 × 2,048 pixels; curves: Goesele's method with ΔZ = 1, 1/2, 1/5, and 1/10 pixels, and the proposed method).

Fig. 7 shows the inlier rates obtained by changing the threshold on the error rate for each dataset. Fig. 8 shows the average error rates of the inliers, where an inlier is defined as a 3D point whose error rate is less than 1.0%.

Fig. 8. Average error rates of inliers for each dataset and image size (left: Herz-Jesu-P8, right: Fountain-P11; image sizes: 768 × 512, 1,536 × 1,024, and 3,072 × 2,048 pixels).
For Goesele's method, the error rates of the 3D point clouds are small when the step size ΔZ is sufficiently small. For the proposed method, we observe that the reconstructed 3D points are concentrated at smaller error rates than in Goesele's method with ΔZ = 1/10 pixel. We also confirm this result from the average error rates in Fig. 8. For Fountain-P11, the proposed method estimates depth more accurately than Goesele's method for all image sizes. In Goesele's method, the sub-pixel displacement between the matching windows is represented by image interpolation. In contrast, the proposed method employs POC-based image matching, which estimates the accurate sub-pixel displacement between the matching windows by fitting the analytical correlation peak model of the POC function. As observed in the above experiments, the proposed method exhibits higher reconstruction accuracy than Goesele's method.

4.3 Evaluation of Computational Cost

We evaluate the computational cost of estimating the depth of one point on the reference-view image for Goesele's method and the proposed method. When using a w-pixel matching window, the proposed method can estimate the displacement within ±w/4 pixels with one window matching. For Goesele's method, we also estimate the displacement within ±w/4 pixels using NCC-based image matching. Table 1 shows the computational cost of each method. Goesele's method with a small step size ΔZ requires a high computational cost. The proposed method, on the other hand, requires a low computational cost, comparable to that of Goesele's method with ΔZ = 1 pixel or ΔZ = 1/2 pixel, while, as described in Sect. 4.2, its reconstruction accuracy is higher than that of Goesele's method with ΔZ = 1/10 pixel. Although the computational cost of Goesele's method can be reduced by enlarging ΔZ, the reconstruction accuracy then drops significantly. Compared with Goesele's method, the proposed method thus achieves efficient 3D reconstruction from multi-view images in terms of both reconstruction accuracy and computational cost.

Table 1. Computational cost to estimate the depth of one point on the reference-view image for each method.

Method                       Additions   Multiplications   Divisions   Square roots
Goesele, ΔZ = 1 pixel           75,140            31,246         578            578
Goesele, ΔZ = 1/2 pixel        150,280            62,492       1,156          1,156
Goesele, ΔZ = 1/5 pixel        375,700           156,230       2,890          2,890
Goesele, ΔZ = 1/10 pixel       751,400           312,460       5,780          5,780
Proposed method                 40,000            34,496       2,176          1,088
5 Conclusion
This paper has proposed an efficient image matching method for Multi-View Stereo (MVS) using Phase-Only Correlation (POC). By normalizing the disparity and integrating the POC functions, the proposed method can estimate the depth from a correlation function obtained by only one window matching. Moreover, the reconstruction accuracy of the proposed method is higher than that of NCC-based image matching, since POC-based image matching can estimate the accurate sub-pixel translational displacement between two windows by fitting the analytical correlation peak model of the POC function. Through a set of experiments using the public multi-view stereo datasets, we have demonstrated that the proposed method performs better in terms of accuracy and computational cost than Goesele's method. In future work, we will improve the accuracy of the proposed method by considering the normal vectors of 3D points and develop a complete MVS algorithm based on the proposed method.
References
1. Szeliski, R.: Computer Vision: Algorithms and Applications. Springer-Verlag New York Inc. (2010)
2. Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., Szeliski, R.: A comparison and evaluation of multi-view stereo reconstruction algorithms. Proc. Int'l Conf. Computer Vision and Pattern Recognition (2006) pp. 519–528
3. Strecha, C., Fransens, R., Gool, L.V.: Wide-baseline stereo from multiple views: A probabilistic account. Proc. Int'l Conf. Computer Vision and Pattern Recognition (2004) pp. 552–559
4. Strecha, C., Fransens, R., Gool, L.V.: Combined depth and outlier estimation in multi-view stereo. Proc. Int'l Conf. Computer Vision and Pattern Recognition (2006) pp. 2394–2401
5. Goesele, M., Curless, B., Seitz, S.M.: Multi-view stereo revisited. Proc. Int'l Conf. Computer Vision and Pattern Recognition (2006) pp. 2402–2409
6. Goesele, M., Snavely, N., Curless, B., Hoppe, H., Seitz, S.M.: Multi-view stereo for community photo collections. Proc. Int'l Conf. Computer Vision (2007) pp. 1–8
7. Strecha, C., von Hansen, W., Gool, L.V., Fua, P., Thoennessen, U.: On benchmarking camera calibration and multi-view stereo for high resolution imagery. Proc. Int'l Conf. Computer Vision and Pattern Recognition (2008) pp. 1–8

8. Campbell, N.D.F., Vogiatzis, G., Hernandez, C., Cipolla, R.: Using multiple hypotheses to improve depth-maps for multi-view stereo. Proc. European Conf. Computer Vision (2008) pp. 766–779
9. Bradley, D., Boubekeur, T., Heidrich, W.: Accurate multi-view reconstruction using robust binocular stereo and surface meshing. Proc. Int'l Conf. Computer Vision and Pattern Recognition (2008) pp. 1–8
10. Furukawa, Y., Ponce, J.: Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Analysis and Machine Intelligence Vol. 32 (2010) pp. 1362–1376
11. Kuglin, C.D., Hines, D.C.: The phase correlation image alignment method. Proc. Int'l Conf. Cybernetics and Society (1975) pp. 163–165
12. Takita, K., Aoki, T., Sasaki, Y., Higuchi, T., Kobayashi, K.: High-accuracy sub-pixel image registration based on phase-only correlation. IEICE Trans. Fundamentals Vol. E86-A (2003) pp. 1925–1934
13. Strecha, C.: Multi-view evaluation. http://cvlab.epfl.ch/data/
14. Shibahara, T., Aoki, T., Nakajima, H., Kobayashi, K.: A sub-pixel stereo correspondence technique based on 1D phase-only correlation. Proc. Int'l Conf. Image Processing (2007) pp. V-221–V-224
15. Okutomi, M., Kanade, T.: A multiple-baseline stereo. IEEE Trans. Pattern Analysis and Machine Intelligence Vol. 15 (1993) pp. 353–363
16. Bouguet, J.Y.: Camera Calibration Toolbox for Matlab. http://www.vision.caltech.edu/bouguetj/calib_doc/
