An Efficient Image Matching Method
for Multi-View Stereo
Shuji Sakai1, Koichi Ito1, Takafumi Aoki1, Tomohito Masuda2,
and Hiroki Unten2
1 Graduate School of Information Sciences, Tohoku University,
Sendai, Miyagi, 980–8579, Japan
sakai@aoki.ecei.tohoku.ac.jp
2 Toppan Printing Co., Ltd., Bunkyo-ku, Tokyo, 112–8531, Japan
Abstract. Most existing Multi-View Stereo (MVS) algorithms employ image
matching based on Normalized Cross-Correlation (NCC) to estimate the depth
of an object. The accuracy of the estimated depth depends on the step size
of the depth in NCC-based window matching. The step size must be small for
accurate 3D reconstruction, but a small step significantly increases the
computational cost. To improve the accuracy of depth estimation and reduce
the computational cost, this paper proposes an efficient image matching
method for MVS. The proposed method is based on Phase-Only Correlation
(POC), a high-accuracy image matching technique that uses the phase
components of Fourier transforms. The advantages of using POC are that
(i) the correlation function is obtained from a single window matching and
(ii) the sub-pixel displacement between two matching windows can be
estimated accurately by fitting the analytical correlation peak model of
the POC function. Thus, POC-based window matching for MVS makes it possible
to estimate depth accurately from the correlation function obtained from a
single window matching. Through a set of experiments using public MVS
datasets, we demonstrate that the proposed method performs better in terms
of accuracy and computational cost than the conventional method.
1 Introduction
In recent years, Multi-View Stereo (MVS) has attracted much attention in
the field of computer vision [1–10]. MVS aims to reconstruct a complete
3D model from a set of images taken from different viewpoints. A typical
MVS algorithm consists of two steps: (i) estimating 3D points on the basis
of a photo-consistency measure and a visibility model using a local image
matching method and (ii) reconstructing a 3D model from the estimated 3D
point clouds. The accuracy, robustness, and computational cost of MVS
algorithms depend on the performance of the image matching method, which is
therefore the most important factor in MVS algorithms.
S. Sakai et al.
Most MVS algorithms employ Normalized Cross-Correlation (NCC)-based
image matching to estimate 3D points [1,5,6,8–10]. Goesele et al. [5]
applied NCC-based image matching to the plane-sweeping approach to estimate
a reliable depth map by accumulating the correlation values calculated from
multiple image pairs while changing the depth. Campbell et al. [8]
estimated a depth map more accurately than Goesele et al. [5] by using the
matching results obtained from neighboring pixels to reduce outliers.
Bradley et al. [9] and Furukawa et al. [10] achieved robust image matching
by transforming the matching window in accordance with not only the depth
but also the normal of the 3D points.
In the MVS algorithms mentioned above, the NCC value between matching
windows is used as the reliability of a 3D point. The optimal 3D point is
estimated by iteratively computing NCC values between matching windows
while changing the parameters of the 3D point, i.e., its depth or normal.
For example, a plane-sweeping approach such as that of Goesele et al. [5]
computes NCC values between matching windows while changing the depth in
discrete steps and selects the depth with the highest NCC value as the
optimal one. To estimate the depth accurately, a sufficiently small depth
step must be employed, which significantly increases the computational
cost. If the depth step is small, the translational displacement of a 3D
point on the multi-view images is at the sub-pixel level. Most existing
methods assume that the sub-pixel resolution of a matching window can be
represented by linear interpolation. This assumption, however, is not
always true.
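The discrete search described above can be sketched in a few lines of
Python with NumPy. This is a simplified 1D illustration under stated
assumptions, not the implementation of any cited method: `ncc` and
`sweep_displacement` are hypothetical helper names, the sweep runs over a
displacement along one scanline rather than over depth, and sub-pixel
candidates are sampled by linear interpolation, which is the assumption
questioned in the text.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized windows."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def sweep_displacement(ref_line, nbr_line, centre, half, candidates):
    """Evaluate NCC once per candidate displacement and keep the best one."""
    ref_line = np.asarray(ref_line, dtype=float)
    nbr_line = np.asarray(nbr_line, dtype=float)
    offsets = np.arange(-half, half + 1)
    ref_win = ref_line[centre + offsets]
    grid = np.arange(nbr_line.size)
    scores = []
    for d in candidates:
        # Linear interpolation supplies the samples at sub-pixel positions.
        nbr_win = np.interp(centre + offsets + d, grid, nbr_line)
        scores.append(ncc(ref_win, nbr_win))
    scores = np.array(scores)
    return float(candidates[int(np.argmax(scores))]), scores

# Synthetic scanlines: the neighboring line is the reference displaced by 3 pixels.
rng = np.random.default_rng(0)
ref = rng.standard_normal(100)
nbr = np.roll(ref, 3)
candidates = np.arange(-5.0, 5.5, 0.5)   # step of 1/2 pixel
best, scores = sweep_displacement(ref, nbr, centre=50, half=8,
                                  candidates=candidates)
```

Halving the step doubles the number of NCC evaluations over the same search
range, which is the cost growth that motivates the proposed method.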
In this paper, we propose an efficient image matching method for MVS using
Phase-Only Correlation (POC) (or simply "phase correlation"). POC is a
correlation function calculated only from the phase components of the
Fourier transforms of the images. The translational displacement and the
similarity between two images can be estimated from the position and the
height of the correlation peak of the POC function, respectively. Kuglin
et al. [11] proposed a fundamental image matching technique using POC, and
Takita et al. [12] proposed a sub-pixel image registration technique using
POC. POC-based image matching has two major advantages over NCC-based
image matching: (i) the correlation function is obtained from a single
window matching and (ii) the sub-pixel translational displacement between
two windows can be estimated accurately by fitting the analytical
correlation peak model of the POC function. When POC-based image matching
is applied to depth estimation, the peak position of the POC function
indicates the displacement between the assumed and the true depth. Hence,
we can directly estimate the true depth from the result of a single
POC-based window matching. By introducing POC-based image matching into
the plane-sweeping approach, only a few window matchings are needed to
estimate the true depth from multi-view images. In addition, the accuracy
of depth estimation can be improved by integrating the POC functions
calculated from multiple stereo image pairs. Thus, POC-based window
matching for MVS makes it possible to estimate depth accurately from the
correlation function obtained from a single window matching. Through a set
of experiments using the public multi-view stereo datasets [13], we
demonstrate that the proposed method
performs better in terms of the accuracy and the computational cost than the
method proposed by Goesele et al. [5].
2 Phase-Only Correlation
This section describes the fundamentals of POC-based image matching. Most
existing POC-based image matching methods are designed for 2D images.
Image matching between stereo images, however, can be reduced to 1D image
matching through stereo rectification. In this paper, we employ the 1D POC
function to estimate the depth from multi-view images.
POC is an image matching technique that uses the phase components of the
Discrete Fourier Transforms (DFTs) of the given images. Consider two
N-length 1D image signals $f(n)$ and $g(n)$, where the index range is
$-M, \cdots, M$ ($M > 0$) and hence $N = 2M + 1$. Let $F(k)$ and $G(k)$
denote the 1D DFTs of the two signals. $F(k)$ and $G(k)$ are given by
$$F(k) = \sum_{n=-M}^{M} f(n) W_N^{kn} = A_F(k) e^{j\theta_F(k)}, \tag{1}$$
$$G(k) = \sum_{n=-M}^{M} g(n) W_N^{kn} = A_G(k) e^{j\theta_G(k)}, \tag{2}$$
where $k = -M, \cdots, M$, $W_N = e^{-j\frac{2\pi}{N}}$, $A_F(k)$ and
$A_G(k)$ are the amplitude components, and $\theta_F(k)$ and $\theta_G(k)$
are the phase components. The normalized cross-power spectrum $R(k)$ is
given by
$$R(k) = \frac{F(k)\overline{G(k)}}{\left|F(k)\overline{G(k)}\right|} = e^{j(\theta_F(k) - \theta_G(k))}, \tag{3}$$
where $\overline{G(k)}$ is the complex conjugate of $G(k)$, and
$\theta_F(k) - \theta_G(k)$ denotes the phase difference. The POC function
$r(n)$ is defined as the Inverse DFT (IDFT) of $R(k)$ and is given by
$$r(n) = \frac{1}{N} \sum_{k=-M}^{M} R(k) W_N^{-kn}. \tag{4}$$
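As a rough illustration of Eqs. (1)–(4), the following NumPy sketch
computes the 1D POC function of two signals of odd length $N = 2M + 1$.
The function name `poc_1d` and the small constant `eps`, which guards
against division by zero, are assumptions made for this example; windowing
and spectral weighting are omitted. With these definitions, a circular
shift $g(n) = f(n - d)$ places the peak at $n = -d$.

```python
import numpy as np

def poc_1d(f, g, eps=1e-12):
    """Return r(n) of Eq. (4) for n = -M..M (array index i corresponds to n = i - M)."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    # Move the sample n = 0 (array centre) to index 0 so that the FFT
    # matches the DFT over the index range -M..M.
    F = np.fft.fft(np.fft.ifftshift(f))
    G = np.fft.fft(np.fft.ifftshift(g))
    cross = F * np.conj(G)
    R = cross / np.maximum(np.abs(cross), eps)   # Eq. (3): keep the phase only
    r = np.real(np.fft.ifft(R))                  # Eq. (4)
    return np.fft.fftshift(r)                    # reorder to n = -M..M

# Synthetic check: g is f circularly shifted by 3 samples.
rng = np.random.default_rng(0)
M = 16
f = rng.standard_normal(2 * M + 1)
g = np.roll(f, 3)
r_same = poc_1d(f, f)
r_shift = poc_1d(f, g)
peak_n = int(np.argmax(r_shift)) - M
```

For identical signals the function reduces to a unit peak at $n = 0$, which
is a convenient sanity check before applying it to real image lines.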
Shibahara et al. [14] derived the analytical peak model of the 1D POC
function. Let us assume that $f(n)$ and $g(n)$ are minutely displaced from
each other. The analytical peak model of the 1D POC function can then be
defined by
$$r(n) \simeq \frac{\alpha}{N} \frac{\sin\left(\pi(n + \delta)\right)}{\sin\left(\frac{\pi}{N}(n + \delta)\right)}, \tag{5}$$
where $\delta$ is a sub-pixel peak position and $\alpha$ is a peak value.
The peak position $n = \delta$ indicates the translational displacement
between the two 1D image signals, and the peak value $\alpha$ indicates the
similarity between them. The translational displacement can be estimated
with sub-pixel accuracy by fitting the model of Eq. (5) to the calculated
data array around the correlation peak, where $\alpha$ and $\delta$ are the
fitting parameters. In addition, we employ the following techniques to
improve the accuracy of 1D image matching: (i) windowing to reduce boundary
effects, (ii) spectral weighting to reduce aliasing and noise effects, and
(iii) averaging 1D POC functions to improve the peak-to-noise ratio
[12,14]. Fig. 1 shows an example of 1D POC-based image matching.
[Figure 1: 1D image signals f(n) and g(n) taken from Image 1 and Image 2, the amplitude and phase of their DFTs F(k) and G(k), and the 1D POC function r(n) obtained by IDFT]
Fig. 1. Example of 1D POC-based image matching.
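One simple way to fit the model of Eq. (5) is sketched below in Python with
NumPy. It is only an illustration under assumptions: the fitting procedure
is not specified in the text, so the sketch scans $\delta$ over a fine grid
and solves for $\alpha$ in closed form by least squares; `peak_model`,
`fit_peak`, `half_width`, and `grid` are hypothetical names. The returned
$\delta$ follows the parameterization of Eq. (5) as written, in which the
model attains its maximum where $n + \delta = 0$.

```python
import numpy as np

def peak_model(n, delta, N):
    """Eq. (5) with alpha = 1: sin(pi (n + delta)) / (N sin(pi (n + delta) / N))."""
    x = np.asarray(n, dtype=float) + delta
    num = np.sin(np.pi * x)
    den = N * np.sin(np.pi * x / N)
    small = np.abs(den) < 1e-12            # limit value 1 where n + delta = 0
    return np.where(small, 1.0, num / np.where(small, 1.0, den))

def fit_peak(r, half_width=2, grid=2001):
    """Least-squares fit of (delta, alpha) to the samples around the maximum of r."""
    r = np.asarray(r, dtype=float)
    N = r.size
    M = N // 2
    p = int(np.argmax(r))
    idx = np.arange(max(p - half_width, 0), min(p + half_width, N - 1) + 1)
    n = idx - M
    best_err, best_delta, best_alpha = np.inf, 0.0, 0.0
    centre = -(p - M)                      # the model peaks at n = -delta
    for delta in np.linspace(centre - 1.0, centre + 1.0, grid):
        m = peak_model(n, delta, N)
        alpha = float(m @ r[idx]) / float(m @ m)   # optimal alpha for this delta
        err = float(np.sum((r[idx] - alpha * m) ** 2))
        if err < best_err:
            best_err, best_delta, best_alpha = err, float(delta), alpha
    return best_delta, best_alpha

# Synthetic check: samples generated from the model itself.
N = 33
n = np.arange(N) - N // 2
r = 0.8 * peak_model(n, 0.3, N)
delta, alpha = fit_peak(r)
```

A gradient-based optimizer could replace the grid scan; the grid is used
here only to keep the sketch dependency-free.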
3 POC-Based Image Matching for Multi-View Stereo
In this section, we describe a POC-based image matching method for MVS.
Existing algorithms using NCC-based image matching need to perform many
NCC computations while changing the assumed depth to estimate the accurate
depth of a 3D point. In contrast, the proposed method estimates the
accurate depth with a single window matching by approximating the depth
change of a 3D point by a translational displacement on the stereo image
and estimating this displacement using POC. The proposed method also
enhances the estimation accuracy by integrating the POC functions
calculated from multiple stereo image pairs.
The POC functions calculated from stereo images with different viewpoints
have different peak positions because of the difference in camera
positions. To integrate the POC functions, the proposed method normalizes
the disparity of each stereo image and integrates the POC functions on the
same coordinate system. Okutomi et al. [15] proposed a disparity
normalization technique to integrate correlation functions calculated from
stereo images with different viewpoints. This technique, however, assumes
that all cameras are located on the same line, which is not suitable in
practical situations. The disparity normalization technique used in the
proposed method, which is a generalized version of the technique proposed
by Okutomi et al. [15], can integrate the correlation functions calculated
from stereo images with different viewpoints even if the cameras are not
located on the same line.
Let $V = \{V_0, \cdots, V_{H-1}\}$ be the multi-view images with known
camera parameters. We consider a reference view $V_R \in V$ and neighboring
views $C = \{C_0, \cdots, C_{K-1}\} \subset V - \{V_R\}$ as input images,
where $H$ and $K$ are the number of multi-view images and the number of
neighboring views, respectively. The proposed method generates $K$
rectified stereo image pairs and estimates the depth of each point in $V_R$
from the peak position of the correlation function obtained by integrating
the POC functions with normalized disparity. We use the stereo
rectification method employed in the Camera Calibration Toolbox for
Matlab [16].
Next, we describe the key techniques of the proposed method: (i) normalizing
the disparity and (ii) integrating the POC functions. Then, we describe the
proposed depth estimation method using POC-based image matching.
3.1 Normalization of Disparity
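We assume that the camera coordinate system of the reference view $V_R$
coincides with the world coordinate system. Let
$V^{\mathrm{rect}}_{R,i}$-$C^{\mathrm{rect}}_{i}$ be the rectified stereo
image pair, where $V^{\mathrm{rect}}_{R,i}$ is the image of $V_R$ rectified
so as to correspond to the view angle of $C_i$. The relationship among the
3D point $M = [X, Y, Z]^T$ in the camera coordinate system of $V_R$, the
rectified stereo image pair
$V^{\mathrm{rect}}_{R,i}$-$C^{\mathrm{rect}}_{i}$ ($C_i \in C$) with
disparity $d_i$, and the rectified stereo image pair
$V^{\mathrm{rect}}_{R,j}$-$C^{\mathrm{rect}}_{j}$
($C_j \in C - \{C_i\}$) with disparity $d_j$ is given by
$$M = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= R_i \begin{bmatrix} (u_i - u_{0i}) B_i / d_i \\ (v_i - v_{0i}) B_i / d_i \\ \beta_i B_i / d_i \end{bmatrix}
= R_j \begin{bmatrix} (u_j - u_{0j}) B_j / d_j \\ (v_j - v_{0j}) B_j / d_j \\ \beta_j B_j / d_j \end{bmatrix}, \tag{6}$$
where $(u_l, v_l)$ is the corresponding point of $M$ in
$V^{\mathrm{rect}}_{R,l}$, $(u_{0l}, v_{0l})$ is the optical center of
$V^{\mathrm{rect}}_{R,l}$, $\beta_l$ is the focal length, and $B_l$ is the
baseline length of $V^{\mathrm{rect}}_{R,l}$-$C^{\mathrm{rect}}_{l}$
($l = i, j$). $R_l$ denotes the rotation matrix from the reference view
$V_R$ to the rectified reference view $V^{\mathrm{rect}}_{R,l}$ used in the
stereo rectification for
$V^{\mathrm{rect}}_{R,l}$-$C^{\mathrm{rect}}_{l}$, and is given by
$$R_l = \begin{bmatrix} R_{l11} & R_{l12} & R_{l13} \\ R_{l21} & R_{l22} & R_{l23} \\ R_{l31} & R_{l32} & R_{l33} \end{bmatrix}. \tag{7}$$
From Eq. (6), we derive the relationship between $d_i$ and $d_j$ as follows:
$$d_i = \frac{R_{i31}(u_i - u_{0i}) + R_{i32}(v_i - v_{0i}) + R_{i33}\beta_i}{R_{j31}(u_j - u_{0j}) + R_{j32}(v_j - v_{0j}) + R_{j33}\beta_j} \, \frac{B_i}{B_j} \, d_j. \tag{8}$$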
From Eq. (8), the relationship between $d_i$ and $d_j$ is represented by a
scaling factor that depends on the camera parameters and the coordinates of
the corresponding points in $V^{\mathrm{rect}}_{R}$. We define the
normalized disparity $d$ to take into account the scale factor for each
disparity. If we consider the rectified stereo image pairs
$V^{\mathrm{rect}}_{R,i}$-$C^{\mathrm{rect}}_{i}$
($i = 0, \cdots, K-1$), the relationship between $d_i$ in each rectified
stereo pair and the normalized disparity $d$ can be written as
$$d_i = s_i d, \tag{9}$$
where $s_i$ denotes the scale factor for the disparity $d_i$ and is given by
$$s_i = \frac{\left(R_{i31}(u_i - u_{0i}) + R_{i32}(v_i - v_{0i}) + R_{i33}\beta_i\right) B_i}{\frac{1}{K} \sum_{l=0}^{K-1} \left(R_{l31}(u_l - u_{0l}) + R_{l32}(v_l - v_{0l}) + R_{l33}\beta_l\right) B_l}. \tag{10}$$
In this case, the 3D point $M$ can be defined by
$$M = R_i \begin{bmatrix} (u_i - u_{0i}) B_i / (s_i d) \\ (v_i - v_{0i}) B_i / (s_i d) \\ \beta_i B_i / (s_i d) \end{bmatrix}. \tag{11}$$
[Figure 2: the reference view VR with axes X, Y, Z, the neighboring views C0 and C1 with baselines B0 and B1, and the 3D points M and M′ = M + ∆M]
Fig. 2. Geometric relationship between the location of the 3D point and the
disparity on the images.
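The scale factors of Eq. (10) and the back-projection of Eq. (11) can be
sketched as follows in Python with NumPy. The function names and array
layouts are assumptions made for this example, `beta` stands for the focal
length symbol of Eq. (6), and the numbers in the usage part are synthetic
values chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def depth_terms(R, uv, uv0, beta, B):
    """(R_l31 (u_l - u_0l) + R_l32 (v_l - v_0l) + R_l33 beta_l) B_l for each pair l."""
    R = np.asarray(R, dtype=float)         # shape (K, 3, 3)
    uv = np.asarray(uv, dtype=float)       # shape (K, 2)
    uv0 = np.asarray(uv0, dtype=float)     # shape (K, 2)
    beta = np.asarray(beta, dtype=float)   # shape (K,)
    B = np.asarray(B, dtype=float)         # shape (K,)
    du = uv[:, 0] - uv0[:, 0]
    dv = uv[:, 1] - uv0[:, 1]
    return (R[:, 2, 0] * du + R[:, 2, 1] * dv + R[:, 2, 2] * beta) * B

def scale_factors(R, uv, uv0, beta, B):
    """Eq. (10): each term divided by the mean over the K stereo pairs."""
    t = depth_terms(R, uv, uv0, beta, B)
    return t / np.mean(t)

def point_from_normalized_disparity(R_i, uv_i, uv0_i, beta_i, B_i, s_i, d):
    """Eq. (11): 3D point M from the normalized disparity d, using pair i."""
    ray = np.array([uv_i[0] - uv0_i[0], uv_i[1] - uv0_i[1], beta_i], dtype=float)
    return np.asarray(R_i, dtype=float) @ (ray * B_i / (s_i * d))

# Synthetic example with K = 2 pairs, identity rotations, and the point
# projected onto the optical centre of both rectified reference images.
R = np.stack([np.eye(3), np.eye(3)])
uv = uv0 = np.array([[320.0, 240.0], [320.0, 240.0]])
beta = np.array([1000.0, 1000.0])
B = np.array([0.1, 0.3])
s = scale_factors(R, uv, uv0, beta, B)
M0 = point_from_normalized_disparity(R[0], uv[0], uv0[0], beta[0], B[0], s[0], d=2.0)
M1 = point_from_normalized_disparity(R[1], uv[1], uv0[1], beta[1], B[1], s[1], d=2.0)
```

In this example the wider baseline receives the larger scale factor, and
both pairs map the same normalized disparity to the same depth, which is
the property that lets their correlation functions share one axis.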
3.2 Integration of POC Function
We consider the 3D point $M$ and its minutely displaced 3D point
$M' = M + \Delta M$, where $\Delta M = [\Delta X, \Delta Y, \Delta Z]^T$
denotes the minute displacement, as shown in Fig. 2. Let $d$ and $d'$ be
the normalized disparities of $M$ and $M'$, respectively. Assuming that $M$
is the true 3D point, the relationship between $d$ and $d'$ is given by
$$d' = d + \delta, \tag{12}$$
where $\delta$ denotes the error between the normalized disparities $d$ and
$d'$. For the rectified stereo image pair
$V^{\mathrm{rect}}_{R,i}$-$C^{\mathrm{rect}}_{i}$
($i \in \{0, \cdots, K-1\}$), the relationship between the 3D point $M'$
and the normalized disparity $d$ is
$$M' = R_i \begin{bmatrix} (u_i - u_{0i}) B_i / (s_i(d + \delta)) \\ (v_i - v_{0i}) B_i / (s_i(d + \delta)) \\ \beta_i B_i / (s_i(d + \delta)) \end{bmatrix}. \tag{13}$$
[Figure 3: two plots, (a) and (b), of the POC functions r(n) against n for the four rectified stereo pairs VR,0-C0 to VR,3-C3]
Fig. 3. Integration of the POC functions calculated from stereo image pairs
with different viewpoints: (a) POC functions before disparity normalization
and (b) POC functions after disparity normalization.
Let $f_i$ and $g_i$ be the matching windows extracted from
$V^{\mathrm{rect}}_{R,i}$ and $C^{\mathrm{rect}}_{i}$, respectively,
centered on the corresponding point of $M'$. Approximating the local image
transformation by a translational displacement, the translational
displacement between $f_i$ and $g_i$ is $\delta_i = s_i \delta$. The
displacement $\delta_i$ can be estimated from the correlation peak position
of the POC function $r_i$ between $f_i$ and $g_i$ as mentioned in Sect. 2.
Different rectified stereo image pairs, however, have different
translational displacements. For example, $\delta_i$ in
$V^{\mathrm{rect}}_{R,i}$-$C^{\mathrm{rect}}_{i}$ and $\delta_j$ in
$V^{\mathrm{rect}}_{R,j}$-$C^{\mathrm{rect}}_{j}$
($j \in \{0, \cdots, K-1\} - \{i\}$) are not always equal. In other words,
the POC functions $r_i$ and $r_j$ have different correlation peak positions.
To address this problem, we convert the coordinate systems of the POC
functions $r_i$ and $r_j$ into the same coordinate system by scaling the
matching windows in accordance with each normalized disparity. Let $w$ be
the unified size of the matching window. The size of the matching windows
$f_i$ and $g_i$ is defined as $s_i w$. By scaling the image signals $f_i$
and $g_i$ by $1/s_i$, the size of the matching windows is normalized to
$w$; we denote the scaled versions of the matching windows $f_i$ and $g_i$
by $\hat{f}_i$ and $\hat{g}_i$, respectively. Hence, the correlation peak
of the POC function $\hat{r}_i$ between $\hat{f}_i$ and $\hat{g}_i$ is
located at $\delta$. Similarly, for the rectified stereo image pair
$V^{\mathrm{rect}}_{R,j}$-$C^{\mathrm{rect}}_{j}$, the correlation peak of
the POC function $\hat{r}_j$ between $\hat{f}_j$ and $\hat{g}_j$ is located
at the same position $\delta$, although the size of the matching window,
$s_j w$, is different from that for
$V^{\mathrm{rect}}_{R,i}$-$C^{\mathrm{rect}}_{i}$, $s_i w$.
[Figure 4: matching windows f0 to f3 and g0 to g3 extracted from the four rectified stereo pairs around the projections of M′, their POC functions, and the averaged POC function whose peak position relates M′ to M]
Fig. 4. Depth estimation using POC-based image matching.
Fig. 3 (a) shows the POC functions before disparity normalization. In this
case, the translational displacement $\delta_i$ between the matching
windows is different for each viewpoint. Thus, the positions of the
correlation peaks are also different. On the other hand, Fig. 3 (b) shows
the POC functions after disparity normalization. In this case, the
translational displacement $\delta$ is the same for all the viewpoints.
Therefore, the peaks of all the POC functions overlap at the same position.
Disparity normalization thus makes it possible to integrate the POC
functions calculated from rectified stereo image pairs with different
viewpoints. In this paper, we employ the POC function $\hat{r}_{ave}$,
which is the average of the POC functions $\hat{r}_i$
($i = 0, \cdots, K-1$), as the integrated POC function.
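The window scaling used for disparity normalization can be illustrated
with the short NumPy sketch below. The resampling scheme is not specified
in the text, so linear interpolation through `np.interp` is an assumption
of this example, and `rescale_window` is a hypothetical helper name. The
input window is assumed to span at least $s_i (w - 1) + 1$ pixels around
its centre so that no sample falls outside it.

```python
import numpy as np

def rescale_window(window, s, w):
    """Resample each line of a window by 1/s about its centre to a width of w."""
    window = np.atleast_2d(np.asarray(window, dtype=float))   # shape (L, length)
    length = window.shape[1]
    centre = (length - 1) / 2.0
    n = np.arange(w) - (w - 1) / 2.0    # output sample positions about the centre
    x = centre + s * n                  # where each output sample reads the input
    src = np.arange(length)
    return np.stack([np.interp(x, src, line) for line in window])

# Synthetic check: a 49-pixel ramp scaled by 1/1.5 down to w = 32 samples.
ramp = np.arange(49, dtype=float)
out = rescale_window(ramp, s=1.5, w=32)
```

After this resampling, a feature that lies $s_i \delta$ pixels from the
centre of the input window lies $\delta$ samples from the centre of the
output, so the POC functions of all pairs can be averaged sample by sample.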
3.3 Depth Estimation Using POC-Based Image Matching
We describe the depth estimation method using POC-based image matching
with the two techniques described above. Fig. 4 shows the flow of the
proposed method. First, the initial position of the 3D point $M'$ is
projected onto the rectified stereo image pair
$V^{\mathrm{rect}}_{R,i}$-$C^{\mathrm{rect}}_{i}$, and the coordinates on
$V^{\mathrm{rect}}_{R,i}$ and $C^{\mathrm{rect}}_{i}$ are denoted by
$m_i = [u_i, v_i]$ and $m^C_i = [u^C_i, v^C_i]$, respectively, where
$i = 0, \cdots, K-1$. Next, the matching windows $f_i$ and $g_i$ are
extracted from $V^{\mathrm{rect}}_{R,i}$, centered at $m_i$ with the size
$s_i w \times L$, and from $C^{\mathrm{rect}}_{i}$, centered at $m^C_i$
with the size $s_i w \times L$, respectively. Note that we extract $L$
lines for each matching window in order to average the 1D POC functions
and thereby improve the peak-to-noise ratio, as described in Sect. 2. Then,
we apply the disparity normalization to the matching windows $f_i$ and
$g_i$ and calculate the 1D POC function $\hat{r}_i$ between $\hat{f}_i$ and
$\hat{g}_i$. The correlation peak position of $\hat{r}_i$ may include a
significant error if the 3D point $M'$ is not visible from the neighboring
view $C_i \in C$ or if the matching window is extracted from the boundary
region of an object that has multiple disparities. In this case, we observe
that the correlation peak value $\alpha_i$ drops, since the local image
transformation between the matching windows cannot be approximated by a
translational displacement. To improve the accuracy of depth estimation,
the average POC function $\hat{r}_{ave}$ is calculated only from the POC
functions $\hat{r}_i$ with $\alpha_i > th_{corr}$, where $th_{corr}$ is a
threshold. Finally, the correlation peak position $\delta$ is estimated
with sub-pixel accuracy by fitting the analytical peak model of the POC
function to $\hat{r}_{ave}$. From Eq. (11), Eq. (12), and $\delta$, the
true position of the 3D point $M$ is estimated by
$$M = R_i \begin{bmatrix} (u_i - u_{0i}) B_i / (s_i(d' - \delta)) \\ (v_i - v_{0i}) B_i / (s_i(d' - \delta)) \\ \beta_i B_i / (s_i(d' - \delta)) \end{bmatrix}. \tag{14}$$
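The last two steps, thresholded averaging and the update of Eq. (14), can
be sketched as follows in Python with NumPy. The helper names
`integrate_poc` and `refine_point` and the toy numbers are assumptions made
for this example; the peak values $\alpha_i$ and the sub-pixel position
$\delta$ are taken as given, for instance from a fit of Eq. (5), and `beta`
stands for the focal length.

```python
import numpy as np

def integrate_poc(r_hat, alpha, th_corr=0.3):
    """Average the normalized POC functions whose peak value exceeds th_corr."""
    r_hat = np.asarray(r_hat, dtype=float)   # shape (K, w)
    alpha = np.asarray(alpha, dtype=float)   # shape (K,)
    keep = alpha > th_corr
    if not np.any(keep):
        return None                          # no reliable stereo pair
    return r_hat[keep].mean(axis=0)

def refine_point(R_i, uv_i, uv0_i, beta_i, B_i, s_i, d_prime, delta):
    """Eq. (14): 3D point for the corrected normalized disparity d' - delta."""
    ray = np.array([uv_i[0] - uv0_i[0], uv_i[1] - uv0_i[1], beta_i], dtype=float)
    return np.asarray(R_i, dtype=float) @ (ray * B_i / (s_i * (d_prime - delta)))

# Toy example: the third pair has a low peak value and is excluded.
r_hat = [[0.0, 1.0, 0.0], [0.0, 0.8, 0.0], [0.2, 0.1, 0.0]]
alpha = [1.0, 0.8, 0.2]
r_ave = integrate_poc(r_hat, alpha, th_corr=0.3)
M_true = refine_point(np.eye(3), (320.0, 240.0), (320.0, 240.0),
                      beta_i=1000.0, B_i=0.2, s_i=1.0, d_prime=2.5, delta=0.5)
```

Returning `None` when every pair falls below the threshold is a design
choice of this sketch; the text does not state how such points are handled.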
To generate a depth map, we apply the POC-based image matching to a
plane-sweeping approach and search for the depth of each pixel in $V_R$.
Since the POC-based image matching can estimate a depth corresponding to a
displacement of $\pm w/4$ pixels in the neighboring-view image, we search
along the ray within the bounding box while changing the depth of $M'$ in
steps of $s_i w/4$ pixels in the stereo images. We also apply a
coarse-to-fine strategy using image pyramids to the proposed method
described above. We first estimate the approximate depth in the coarsest
image layer and then refine the depth in the subsequent image layers.
4 Experiments and Discussion
We evaluate the reconstruction accuracy and the computational cost of the
conventional method and the proposed method using the public multi-view
stereo image datasets [13]. In the experiments, we employ the well-known
plane-sweeping method proposed by Goesele et al. [5] as the conventional
method.
4.1 Implementation
We describe the implementation notes for Goesele's method and the proposed
method.
Goesele's method [5]
The reconstruction accuracy and the computational cost of Goesele's method
depend significantly on the step size ∆Z of the depth. In the experiments,
we employ four variations of ∆Z such that the resolution of the disparity
on the widest-baseline stereo image is 1, 1/2, 1/5, and 1/10 pixels. The
size of the window for NCC-based matching is 17 × 17 pixels. The threshold
value for averaging the NCC values calculated from stereo image pairs is
0.3.
Proposed method
The parameters for the proposed method used in the experiments are as
follows. The threshold $th_{corr}$ is 0.3, the matching window size w is
32 pixels, and the number of POC functions L is 17. Note that the effective
information of the POC function with 32 pixels × 17 lines is limited to
17 pixels × 17 lines, since we apply a Hanning window with a half width of
w/2 to the POC function to reduce the boundary effect as described in
Sect. 2. We also employ the coarse-to-fine strategy using image pyramids.
The numbers of layers are 2, 3, and 4 for 768 × 512, 1,536 × 1,024, and
3,072 × 2,048 pixels, respectively.
[Figure 5: a reference view VR and two neighboring views C0 and C1 for each dataset]
Fig. 5. Examples of reference-view image VR and neighboring-view images C
used in the experiments (upper: Herz-Jesu-P8, lower: Fountain-P11).
4.2 Evaluation of 3D Reconstruction Accuracy
We evaluate the 3D reconstruction accuracy using Herz-Jesu-P8 (8 images)
and Fountain-P11 (11 images), which are available in [13]. These datasets
include multi-view images with 3,072 × 2,048 pixels, camera parameters,
bounding boxes, and a mesh model of the target object that can be used as
the ground truth. For each dataset, we generate depth maps for all the
viewpoints using Goesele's method and the proposed method. We use two
neighboring-view images C for one reference-view image VR. Fig. 5 shows
examples of VR and C used in the experiments. The performance is evaluated
for three different image sizes: 768 × 512, 1,536 × 1,024, and
3,072 × 2,048 pixels.
We evaluate the accuracy of 3D reconstruction by the error rate e defined by
$$e = \frac{\left| Z_{\mathrm{calculated}} - Z_{\mathrm{ground\ truth}} \right|}{Z_{\mathrm{ground\ truth}}} \times 100 \ [\%], \tag{15}$$
where $Z_{\mathrm{calculated}}$ and $Z_{\mathrm{ground\ truth}}$ denote the
estimated depth and the true depth obtained from the ground truth,
respectively. Fig. 6 shows the reconstructed 3D point clouds of Goesele's
method and the proposed method for 1,536 × 1,024-pixel images. Fig. 7 shows
the inlier rates as a function of the error-rate threshold for each
dataset. Fig. 8 shows the average error rates of inliers, where an inlier
is defined as a 3D point whose error rate is less than 1.0%.
[Figure 6: reconstructed point clouds in three columns: Goesele, ∆Z = 1/10 pixel; Proposed method; Ground truth]
Fig. 6. Reconstruction results of 1,536 × 1,024-pixel images for each
dataset (upper: Herz-Jesu-P8, lower: Fountain-P11).
[Figure 7: plots of inlier rate [%] against error rate [%] for image sizes of 768 × 512, 1,536 × 1,024, and 3,072 × 2,048 pixels, comparing Goesele's method with ∆Z = 1, 1/2, 1/5, and 1/10 pixel and the proposed method]
Fig. 7. Inlier rate for each dataset (upper: Herz-Jesu-P8, lower:
Fountain-P11).
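The evaluation measures can be computed as in the following NumPy sketch.
The function names and the sample depths are illustrative assumptions, and
the magnitude of the depth difference is used so that the error rate is
non-negative.

```python
import numpy as np

def error_rate(z_calculated, z_ground_truth):
    """Error rate e of Eq. (15) in percent, per 3D point."""
    z_calculated = np.asarray(z_calculated, dtype=float)
    z_ground_truth = np.asarray(z_ground_truth, dtype=float)
    return np.abs(z_calculated - z_ground_truth) / z_ground_truth * 100.0

def inlier_rate(e, threshold=1.0):
    """Percentage of points whose error rate is below the threshold (cf. Fig. 7)."""
    return 100.0 * float(np.mean(np.asarray(e) < threshold))

def mean_inlier_error(e, threshold=1.0):
    """Average error rate over the inliers only (cf. Fig. 8)."""
    e = np.asarray(e, dtype=float)
    return float(np.mean(e[e < threshold]))

# Made-up depths for four points; the third one is an outlier.
z_gt = np.array([10.0, 10.0, 10.0, 10.0])
z_est = np.array([10.05, 9.98, 10.5, 10.0])
e = error_rate(z_est, z_gt)
rate = inlier_rate(e)
avg = mean_inlier_error(e)
```

Sweeping `threshold` from 0 to 1 and plotting `inlier_rate` gives a curve
of the kind shown in Fig. 7.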
[Figure 8: plots of the average error rate [%] against image size [pixels] (768 × 512, 1,536 × 1,024, and 3,072 × 2,048), comparing Goesele's method with ∆Z = 1, 1/2, 1/5, and 1/10 pixel and the proposed method]
Fig. 8. Average error rates for each dataset (left: Herz-Jesu-P8, right:
Fountain-P11).
For Goesele's method, the error rates of the 3D point clouds are small when
the step size ∆Z is sufficiently small. For the proposed method, we observe
that the reconstructed 3D points are concentrated at smaller error rates
than those of Goesele's method with ∆Z = 1/10 pixel. This result is also
confirmed by the average error rates in Fig. 8. For Fountain-P11, the
proposed method estimates more accurate depth than Goesele's method for all
the image sizes. In Goesele's method, the sub-pixel displacement between
the matching windows is represented by image interpolation in order to
estimate the accurate depth. On the other hand, the proposed method employs
POC-based image matching, which can estimate the sub-pixel displacement
between the matching windows accurately by fitting the analytical
correlation peak model of the POC function.
As observed in the above experiments, the proposed method exhibits higher
reconstruction accuracy than Goesele's method.
4.3 Evaluation of Computational Cost
We evaluate the computational cost of estimating the depth of one point on
the reference-view image for Goesele's method and the proposed method. When
a w-pixel matching window is used, the proposed method can estimate a
displacement within ±w/4 pixels with a single window matching. For
Goesele's method, we also estimate a displacement within ±w/4 pixels using
NCC-based image matching. Table 1 shows the computational cost of each
method. Goesele's method with a small step size ∆Z requires a high
computational cost. On the other hand, the proposed method requires a low
computational cost that is comparable to that of Goesele's method with
∆Z = 1 pixel or ∆Z = 1/2 pixel. As described in Sect. 4.2, the
reconstruction accuracy of the proposed method is higher than that of
Goesele's method with ∆Z = 1/10 pixel. Although the computational cost of
Goesele's method can be reduced by using a large ∆Z, the reconstruction
accuracy then drops significantly. Compared with Goesele's method, the
proposed method achieves efficient 3D reconstruction from multi-view images
in terms of both reconstruction accuracy and computational cost.
Table 1. Computational cost to estimate the depth of one point on the
reference-view image for each method.
Method                      Additions  Multiplications  Divisions  Square roots
Goesele, ∆Z = 1 pixel          75,140           31,246        578           578
Goesele, ∆Z = 1/2 pixel       150,280           62,492      1,156         1,156
Goesele, ∆Z = 1/5 pixel       357,700          156,230      2,890         2,890
Goesele, ∆Z = 1/10 pixel      751,400          312,460      5,780         5,780
Proposed method                40,000           34,496      2,176         1,088
5 Conclusion
This paper has proposed an efficient image matching method for Multi-View
Stereo (MVS) using Phase-Only Correlation (POC). By normalizing the
disparity and integrating the POC functions, the proposed method can
estimate the depth from the correlation function obtained from a single
window matching. In addition, the reconstruction accuracy of the proposed
method is higher than that of NCC-based image matching, since POC-based
image matching can estimate the sub-pixel translational displacement
between two windows accurately by fitting the analytical correlation peak
model of the POC function. Through a set of experiments using the public
multi-view stereo datasets, we have demonstrated that the proposed method
performs better in terms of accuracy and computational cost than Goesele's
method. In future work, we will improve the accuracy of the proposed method
by considering the normal vectors of 3D points and develop an MVS algorithm
using the proposed method.
References
1. Szeliski, R.: Computer Vision: Algorithms and Applications.
Springer-Verlag New York Inc. (2010)
2. Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., Szeliski, R.: A
comparison and evaluation of multi-view stereo reconstruction algorithms.
Proc. Int'l Conf. Computer Vision and Pattern Recognition (2006) pp. 519–528
3. Strecha, C., Fransens, R., Gool, L.V.: Wide-baseline stereo from multiple
views: A probabilistic account. Proc. Int'l Conf. Computer Vision and
Pattern Recognition (2004) pp. 552–559
4. Strecha, C., Fransens, R., Gool, L.V.: Combined depth and outlier
estimation in multi-view stereo. Proc. Int'l Conf. Computer Vision and
Pattern Recognition (2006) pp. 2394–2401
5. Goesele, M., Curless, B., Seitz, S.M.: Multi-view stereo revisited. Proc.
Int'l Conf. Computer Vision and Pattern Recognition (2006) pp. 2402–2409
6. Goesele, M., Snavely, N., Curless, B., Hoppe, H., Seitz, S.M.: Multi-view
stereo for community photo collections. Proc. Int'l Conf. Computer Vision
(2007) pp. 1–8
7. Strecha, C., von Hansen, W., Gool, L.V., Fua, P., Thoennessen, U.: On
benchmarking camera calibration and multi-view stereo for high resolution
imagery. Proc. Int'l Conf. Computer Vision and Pattern Recognition (2008)
pp. 1–8
8. Campbell, N.D.F., Vogiatzis, G., Hernandez, C., Cipolla, R.: Using
multiple hypotheses to improve depth-maps for multi-view stereo. Proc.
European Conf. Computer Vision (2008) pp. 766–779
9. Bradley, D., Boubekeur, T., Heidrich, W.: Accurate multi-view
reconstruction using robust binocular stereo and surface meshing. Proc.
Int'l Conf. Computer Vision and Pattern Recognition (2008) pp. 1–8
10. Furukawa, Y., Ponce, J.: Accurate, dense, and robust multiview
stereopsis. IEEE Trans. Pattern Analysis and Machine Intelligence Vol. 32
(2010) pp. 1362–1376
11. Kuglin, C.D., Hines, D.C.: The phase correlation image alignment method.
Proc. Int'l Conf. Cybernetics and Society (1975) pp. 163–165
12. Takita, K., Aoki, T., Sasaki, Y., Higuchi, T., Kobayashi, K.:
High-accuracy subpixel image registration based on phase-only correlation.
IEICE Trans. Fundamentals Vol. E86-A (2003) pp. 1925–1934
13. Strecha, C.: (Multi-view evaluation) http://cvlab.epfl.ch/data/.
14. Shibahara, T., Aoki, T., Nakajima, H., Kobayashi, K.: A sub-pixel stereo
correspondence technique based on 1D phase-only correlation. Proc. Int'l
Conf. Image Processing (2007) pp. V-221–V-224
15. Okutomi, M., Kanade, T.: A multiple-baseline stereo. IEEE Trans. Pattern
Analysis and Machine Intelligence Vol. 15 (1993) pp. 353–363
16. Bouguet, J.Y.: (Camera calibration toolbox for matlab)
http://www.vision.caltech.edu/bouguetj/calib_doc/.