Robust Principal Component Analysis with Missing Data
Fanhua Shang†, Yuanyuan Liu‡, James Cheng†, Hong Cheng‡
†Dept. of Computer Science and Engineering, The Chinese University of Hong Kong
{fhshang, jcheng}@cse.cuhk.edu.hk
‡Dept. of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
{yyliu, hcheng}@se.cuhk.edu.hk
ABSTRACT
Recovering matrices from incomplete and corrupted observations is a fundamental problem with many applications in various areas of science and engineering. In theory, under certain conditions, this problem can be solved via a natural convex relaxation. However, all current provable algorithms suffer from superlinear per-iteration cost, which severely limits their applicability to large-scale problems. In this paper, we propose a robust principal component analysis (RPCA) plus matrix completion framework to recover low-rank and sparse matrices from missing and grossly corrupted observations. Under the unified framework, we first present a convex robust matrix completion model that replaces the linear projection operator constraint by a simple equality one. To further improve the efficiency of our convex model, we also develop a scalable structured factorization model, which can yield an orthogonal dictionary and a robust data representation simultaneously. Then, we develop two alternating direction augmented Lagrangian (ADAL) algorithms to efficiently solve the proposed problems. Finally, we discuss the convergence analysis of our algorithms. Experimental results verify both the efficiency and effectiveness of our methods compared with state-of-the-art algorithms.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data Mining
Keywords
RPCA, robust matrix completion, low-rank, ADAL
1. INTRODUCTION
In recent years, high-dimensional data, such as digital photographs, surveillance videos and social network data, have become increasingly ubiquitous due to advances in data collection technologies. However, the
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for components
of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from permissions@acm.org.
CIKM'14, November 3–7, 2014, Shanghai, China.
Copyright 2014 ACM 978-1-4503-2598-1/14/11 ...$15.00.
http://dx.doi.org/10.1145/2661829.2662083.
"curse of dimensionality" has rendered many tasks, such as inference, learning, and recognition, impractical. Principal
component analysis (PCA) is arguably the most widely used
technique for dimensionality reduction in statistical data
analysis, mainly because it is simple to implement, can be
solved efficiently, and is often effective in real-world applica-
tions such as latent semantic indexing, face recognition and
text clustering. However, one of the main challenges faced
by PCA is that the observed data is often contaminated by
outliers or missing values.
This problem has drawn much attention from researchers
in various communities such as data mining, machine learn-
ing, signal/image processing, and computer vision [21, 30, 5,
27, 34, 7, 16]. Based on compressive sensing and rank minimization, many methods for recovering low-rank and sparse matrices (also called robust principal component analysis, or RPCA [30]) from incomplete or grossly corrupted observations have been proposed, such as principal component pursuit (PCP) [5] and outlier pursuit [31]. In principle, those methods minimize a hybrid objective involving both l1-norm and trace norm minimization. It is recognized that the l1-norm and the trace norm, as the convex surrogates of the l0-norm and the rank function, respectively, are powerful in inducing sparsity and low-rankness [24, 5]. In addition, Xu et al. [31] used the l1,2-norm to model corrupted columns.
Although RPCA has been well studied in recent years, there is little work focusing on RPCA plus matrix completion (also called robust matrix completion, or RMC [7]). In this paper, we are particularly interested in the trace norm regularized problem for RMC:

    min_X ‖X‖_* + λ f(X),    (1)

where ‖X‖_* is the trace norm of the desired matrix X, i.e., the sum of its singular values, λ ≥ 0 is a regularization parameter, and f(·) denotes the loss function, such as the l2-norm loss or the l1-norm (or l1,2-norm) loss. In the following, we give a few examples of applications where RMC is useful.
• Robust principal component analysis (RPCA). When the loss function is the l1-norm loss, the model (1) is an RPCA problem, which is adopted by a number of emerging approaches such as PCP [5], and has been successfully applied to many important problems such as latent semantic indexing [21], video surveillance [30, 5], and low-rank textures.
• Matrix completion (MC). When the loss function is the l2-norm loss and only a relatively small number of entries are observed, the goal is to complete a low-rank matrix from incomplete samples of its entries. This problem, also called matrix completion, is fundamental to collaborative filtering [6], link prediction, and global positioning.
In this paper, we aim to recover both low-rank and sparse
matrices from missing and grossly corrupted observations.
Our solution provides a good approximation to the origi-
nal data contaminated with both outliers and missing val-
ues. Unlike existing RPCA methods, our approach not only
takes into account the fact that the observations are contam-
inated by additive outliers or missing values, but can also
identify both low-rank and sparse components from missing
and grossly corrupted observations. We develop a provable
and scalable solution framework for RPCA and RMC prob-
lems, which is particularly useful in this "big data" era where
many real-world applications need to deal with large, high
dimensional data with missing and corrupted values. We
conduct extensive experiments that verify both the efficiency
and effectiveness of our methods.
We summarize the main contributions of our work as fol-
lows:
• We propose a unified RMC framework to recover both low-rank and sparse matrices from missing and grossly corrupted observations, where the loss function can be the l2-norm or the l1-norm loss.
• We present both convex and non-convex scalable mod-
els to replace the linear projection operator constraint
by a simple equality one. With the orthogonality con-
straint applied to the dictionary component, we con-
vert the non-convex model into a smaller-scale matrix
trace norm regularized problem.
• We develop two efficient alternating direction augmented
Lagrangian (ADAL) algorithms to solve the proposed
methods, which can be accelerated by adaptively chang-
ing the penalty parameter.
• Finally, we also provide the convergence analysis of our
convex and non-convex algorithms.
This paper is organized as follows. We review background
and related work in Section 2. In Section 3, we present a convex and a non-convex scalable trace norm regularized model. We develop two efficient ADAL algorithms in Section 4. We provide the theoretical analysis of our algorithms
in Section 5. We report empirical results in Section 6, and
conclude this paper in Section 7.
2. BACKGROUND AND PROBLEM FORMULATION
Given a matrix, some of its entries may not be observed
due to problems in the acquisition process, e.g., loss of in-
formation or high cost of experiments to obtain complete
data [1]. For the matrix completion problem, many iterative thresholding algorithms [25, 19] have been proposed to solve the trace norm regularized linear least squares problem

    min_X ‖X‖_* + (λ/2) ‖P_Ω(X) − P_Ω(Z)‖_F²,    (2)

where P_Ω(Z) is defined as the projection of the matrix Z onto the observed entries Ω: P_Ω(Z)_ij = Z_ij if (i, j) ∈ Ω, and P_Ω(Z)_ij = 0 otherwise. In other words, f(·) is the l2-norm loss function, i.e., f(X) = (1/2) ‖P_Ω(X) − P_Ω(Z)‖_F².
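In implementations, the projection operator P_Ω is typically realized with a boolean mask over the observed entries. The following numpy sketch (our own illustration; the function names are not from the paper) implements P_Ω and the l2-norm loss of (2):

```python
import numpy as np

def P_Omega(M, mask):
    """Projection onto the observed entries Omega:
    keeps M[i, j] where mask[i, j] is True and zeros the rest."""
    return np.where(mask, M, 0.0)

def l2_loss(X, Z, mask):
    """f(X) = (1/2) * ||P_Omega(X) - P_Omega(Z)||_F^2, the loss in (2)."""
    return 0.5 * np.linalg.norm(P_Omega(X, mask) - P_Omega(Z, mask)) ** 2
```

Only the entries in Ω contribute to the loss, so X is unconstrained on the complement of Ω and is determined there by the trace norm term alone.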
A low-rank matrix X can be recovered from the highly corrupted matrix Z = X + E via the following trace norm and l1-norm minimization problem

    min_X ‖X‖_* + λ‖Z − X‖_1,    (3)

where ‖·‖_1 denotes the element-wise l1-norm, i.e., ‖E‖_1 = Σ_ij |e_ij|, and E = Z − X. The model (3) is called the RPCA problem [30], where f(·) is the l1-norm loss function, i.e., f(X) = ‖Z − X‖_1. Several algorithms have been developed to solve the convex optimization problem (3), such as PCP [5] and IALM [17].
A more general RMC model in [7] and [16] aims to simultaneously recover both low-rank and sparse components from incomplete and grossly corrupted observations via the convex optimization problem

    min_{X,E} ‖X‖_* + λ‖E‖_1,  s.t.  P_Ω(X + E) = P_Ω(Z).    (4)
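For intuition, the observation model behind (4) is easy to simulate. The sketch below (our own illustration, with arbitrary sizes and corruption levels) builds a low-rank X, a sparse E, and an observed index set Ω:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 50, 40, 3

X_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # low-rank component
E_true = np.zeros((m, n))
corrupt = rng.random((m, n)) < 0.05            # 5% grossly corrupted entries
E_true[corrupt] = 10.0 * rng.standard_normal(corrupt.sum())
mask = rng.random((m, n)) < 0.8                # Omega: 80% of entries observed
Z_obs = np.where(mask, X_true + E_true, 0.0)   # the data P_Omega(Z), with Z = X + E
```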
Chen et al. [7] and Li [16] provided theoretical performance guarantees for when minimizing the trace norm plus the l1-norm succeeds in exact recovery. Although the RMC model (4) is a convex optimization problem and can be solved by convex algorithms, some additional variables need to be introduced. In addition, all current provable algorithms for RMC involve the singular value decomposition (SVD), and thus suffer from the high computational cost of full or partial SVDs, which severely limits their applicability to large-scale problems. To address these issues, we propose a scalable RMC method to recover matrices from missing and grossly corrupted observations.
3. CONVEX AND NON-CONVEX RMC MODELS
From the optimization problem (4), we can see that the optimal solution satisfies E_Ω^c = 0, where Ω^c is the complement of Ω, i.e., the index set of unobserved entries. Consequently, we have the following lemma.

Lemma 1. The RMC problem (4) is equivalent to the following convex optimization problem:

    min_{X,E} ‖X‖_* + λ‖P_Ω(E)‖_1,  s.t.  P_Ω(X + E) = P_Ω(Z),  E_Ω^c = 0.    (5)
3.1 Convex RMC Model

To efficiently solve the RMC problem (4) and avoid introducing auxiliary variables, we can assume without loss of generality that the unobserved data satisfy Z_Ω^c = 0, and that E_Ω^c may take any values such that Z_Ω^c = X_Ω^c + E_Ω^c. Therefore, the constraint with the linear projection operator P_Ω in (4) is simplified to Z = X + E. It is worth noting that E_Ω^c will finally be set to 0 for the expected output E. Hence, we replace the constraint P_Ω(Z) = P_Ω(X + E) with Z = X + E, and obtain the following equivalent form:

    min_{X,E} ‖X‖_* + λ‖P_Ω(E)‖_1,  s.t.  X + E = Z.    (6)

To further improve the efficiency of our convex model (6) and the scalability of handling large data sets, we also propose a scalable non-convex model.
3.2 Non-Convex RMC Model

The desired low-rank matrix X is factorized into two much smaller matrices G ∈ R^{m×d} (with G^T G = I) and H ∈ R^{n×d} whose product equals X, i.e., X = GH^T, where d is an upper bound on the rank of the matrix X, i.e., d ≥ r = rank(X). We have the following lemma.
Lemma 2. Let G and H be two matrices of compatible dimensions, where G has orthonormal columns, i.e., G^T G = I. Then ‖GH^T‖_* = ‖H‖_*.
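Lemma 2 can be checked numerically. In this small sketch (our own verification, not from the paper), a G with orthonormal columns is obtained from a QR factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 8, 6, 3

# G with orthonormal columns (G^T G = I), from a QR factorization.
G, _ = np.linalg.qr(rng.standard_normal((m, d)))
H = rng.standard_normal((n, d))

def trace_norm(A):
    """Trace (nuclear) norm: the sum of the singular values."""
    return np.linalg.svd(A, compute_uv=False).sum()

# Since (G H^T)^T (G H^T) = H H^T, the singular values of G H^T and H coincide.
assert np.isclose(trace_norm(G @ H.T), trace_norm(H))
```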
By substituting GH^T = X and ‖H‖_* = ‖X‖_* into (6), we obtain a much smaller-scale matrix trace norm minimization problem:

    min_{G,H,E} ‖H‖_* + λ‖P_Ω(E)‖_1,  s.t.  GH^T + E = Z,  G^T G = I.    (7)
Theorem 1. Suppose (X*, E*) is a solution of the convex problem (6) with rank(X*) = r. Then there exists a solution G ∈ R^{m×d}, H ∈ R^{n×d} and E ∈ R^{m×n} to the problem (7) with d ≥ r and E_Ω^c = 0 such that GH^T = X*, and (GH^T, E) is also a solution to the problem (6).

Proof. If (X*, E*) is a solution to the convex optimization problem (6), then it is also a solution to

    min_{X,E, rank(X)=r} ‖X‖_* + λ‖P_Ω(E)‖_1,  s.t.  P_Ω(Z) = P_Ω(X + E),  P_Ω^c(E) = 0.

For any (X*, E*) with rank(X*) = r, we can find G ∈ R^{m×d} and H ∈ R^{n×d} satisfying GH^T = X* and P_Ω(Z − GH^T) = P_Ω(E) = P_Ω(E*), where d ≥ r. Moreover, according to Lemma 2, we have

    min_{G,H,E} ‖H‖_* + λ‖P_Ω(E)‖_1,  s.t.  P_Ω(Z) = P_Ω(GH^T + E),  G^T G = I
    = min_{G,H,E} ‖GH^T‖_* + λ‖P_Ω(E)‖_1,  s.t.  P_Ω(Z) = P_Ω(GH^T + E)
    = min_{X,E, rank(X)=r} ‖X‖_* + λ‖P_Ω(E)‖_1,  s.t.  P_Ω(Z) = P_Ω(X + E),

where P_Ω^c(E) = 0.
In the following, we discuss how to solve our convex and non-convex models (6) and (7). It is worth noting that the RPCA problem (3) can be viewed as a special case of both (6) and (7), where all entries of the corrupted matrix are directly observed. We will develop two efficient alternating direction augmented Lagrangian (ADAL) solvers for the convex model (6) and the non-convex model (7), respectively. It is also worth noting that although our non-convex algorithm may produce different estimates of G and H, the estimate of GH^T is stable, as guaranteed by Theorem 1 and the convexity of the problem (6).
4. OPTIMIZATION ALGORITHMS
In this section, we will develop two efficient alternating di-
rection augmented Lagrangian (ADAL) algorithms for solv-
ing both problems (6) and (7). First, we design a general
convex ADAL scheme for solving the convex problem (6).
Then we propose a similar procedure for solving the non-
convex problem (7). The convergence analysis of our algo-
rithms is provided in the next section.
Algorithm 1 Solving the problem (8) via ADAL
Input: μ > 0.
Initialize: x_0 = 0, z_0 = 0 and λ_0 = 0.
for k = 0, 1, ..., T do
  x_{k+1} = arg min_x L_μ(x, z_k, λ_k).
  z_{k+1} = arg min_z L_μ(x_{k+1}, z, λ_k).
  λ_{k+1} = λ_k + μ(A x_{k+1} + B z_{k+1} − c).
end for
Output: x_k and z_k.
4.1 Generic Formulation

We describe our optimization algorithm based on the ADAL method (also known as the alternating direction method of multipliers) for solving (6). The ADAL method was introduced for optimization in the 1970s, and its origins can be traced back to techniques for solving partial differential equations in the 1950s. It has received renewed interest because it efficiently tackles large-scale problems and handles optimization problems with multiple non-smooth terms in the objective function [17]. ADAL can be considered an approximation of the method of multipliers. It decomposes a large global problem into a series of smaller subproblems, and coordinates the solutions of the subproblems to compute the globally optimal solution. The problem solved by ADAL takes the generic form

    min_{x∈R^n, z∈R^m} f(x) + g(z),  s.t.  Ax + Bz = c,    (8)

where both f(·) and g(·) are convex functions. ADAL reformulates the problem using a variant of the augmented Lagrangian method as follows:

    L_μ(x, z, λ) = f(x) + g(z) + λ^T(Ax + Bz − c) + (μ/2)‖Ax + Bz − c‖_2²,

where λ is the Lagrange multiplier and μ is a penalty parameter. ADAL solves the problem (8) by iteratively minimizing L_μ(x, z, λ) over x and z, and then updating λ, as outlined in Algorithm 1 [2].
4.2 Convex ADAL Scheme

Our RMC optimization problem (6) can be solved by ADAL. The augmented Lagrangian of (6) is given by

    L_μ(X, E, Y) = ‖X‖_* + λ‖P_Ω(E)‖_1 + ⟨Y, Z − X − E⟩ + (μ/2)‖Z − X − E‖_F².    (9)

Applying ADAL, we carry out the following updating steps in each iteration.
4.2.1 Updating X

With all other variables fixed, the optimal X is the solution to the following problem:

    min_X ‖X‖_* + (μ/2)‖X − Z + E − Y/μ‖_F².    (10)

To solve the problem (10), the spectral soft-thresholding operation [6, 4], a shrinkage operation on the singular values, is applied:

    X = prox_{1/μ}(T) := U diag(max{σ − 1/μ, 0}) V^T,    (11)
Algorithm 2 Solving the RMC problem (6) via ADAL
Input: Given data P_Ω(Z) and λ.
Initialize: X_0 = E_0 = Y_0 = 0, μ_0 = 10^-4, μ_max = 10^10, ρ = 1.10, and tol.
while not converged do
  Update X_{k+1} by (11).
  Update E_{k+1} by (13) and (14).
  Update the multiplier Y_{k+1} by Y_{k+1} = Y_k + μ_k(Z − X_{k+1} − E_{k+1}).
  Update the parameter μ_{k+1} by μ_{k+1} = min(ρ μ_k, μ_max).
  Check the convergence condition: ‖Z − X_{k+1} − E_{k+1}‖_F < tol.
end while
Output: X_k and E_k, where (E_k)_Ω^c is set to 0.
where T := Z − E + Y/μ, max{·, ·} is taken element-wise, and U ∈ R^{m×r}, V ∈ R^{n×r}, and σ = (σ_1, σ_2, ..., σ_r)^T ∈ R^{r×1} are obtained by the SVD of T, i.e., T = U diag(σ) V^T.
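The spectral soft-thresholding step (11) takes a few lines of numpy; this sketch (our illustration) uses a dense SVD, whereas the paper uses PROPACK for partial SVDs:

```python
import numpy as np

def svt(T, tau):
    """Spectral soft-thresholding prox_{tau}(T) of (11):
    shrink the singular values of T by tau and truncate at zero."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

In Algorithm 2 this is invoked with T = Z − E + Y/μ and tau = 1/μ.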
4.2.2 Updating E

With all other variables fixed, the optimal E is the solution to the following problem:

    min_E λ‖P_Ω(E)‖_1 + (μ/2)‖E + X − Z − Y/μ‖_F².    (12)

To solve the problem (12), we introduce the following well-known shrinkage (soft-thresholding) operator [8]:

    S_τ(M_ij) := M_ij − τ,  if M_ij > τ;
                 M_ij + τ,  if M_ij < −τ;
                 0,         otherwise.

According to the soft-thresholding operator S_τ, the closed-form solution E_Ω to the problem (12) is given by

    E_Ω = (S_{λ/μ}(Z − X + Y/μ))_Ω.    (13)

We can easily obtain the closed-form solution on the unobserved entries by zeroing the gradient of (12) with respect to E_Ω^c, i.e.,

    E_Ω^c = (Z − X + Y/μ)_Ω^c.    (14)
We can replace the l1-norm loss function in the sparse component learning problem (12) with the l2,1-norm loss function, as in outlier pursuit [31] or low-rank representation [18] problems. The optimal solution to the problem with l2,1 regularization can be obtained by the soft-thresholding operator in [32].
Based on the description above, we develop an ADAL iterative algorithm for solving the RMC problem (6), as outlined in Algorithm 2. In addition, E_Ω^c should be set to 0 for the expected output E. Moreover, an O(1/k) convergence rate of Algorithm 2 can be established following the conclusion in [11]. A fixed μ is commonly used, but there are schemes of varying the penalty parameter to achieve better convergence. This algorithm can be accelerated by adaptively changing μ. An efficient strategy [17] is to let μ = μ_0 (the initialization in Algorithm 2) and increase μ_k iteratively by μ_{k+1} = ρ μ_k, where in general ρ ∈ (1.0, 1.1] and μ_0 is a small constant.
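Putting the pieces together, Algorithm 2 fits in a few lines of numpy. This is our illustrative reimplementation for small dense matrices (parameter defaults follow the initialization in Algorithm 2), not the authors' code:

```python
import numpy as np

def svt(T, tau):
    """Singular value soft-thresholding, step (11)."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Element-wise soft-thresholding S_tau."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rmc_adal(Z, mask, lam, mu=1e-4, mu_max=1e10, rho=1.10, tol=1e-4, max_iter=500):
    """Sketch of Algorithm 2 for problem (6):
    min ||X||_* + lam ||P_Omega(E)||_1  s.t.  X + E = Z,
    where the unobserved entries of Z are assumed to be zero."""
    X = np.zeros_like(Z)
    E = np.zeros_like(Z)
    Y = np.zeros_like(Z)
    for _ in range(max_iter):
        X = svt(Z - E + Y / mu, 1.0 / mu)            # update (11)
        R = Z - X + Y / mu
        E = np.where(mask, shrink(R, lam / mu), R)   # updates (13) and (14)
        Y = Y + mu * (Z - X - E)                     # multiplier update
        mu = min(rho * mu, mu_max)                   # penalty update
        if np.linalg.norm(Z - X - E) < tol:
            break
    return X, np.where(mask, E, 0.0)                 # zero E on Omega^c for output
```

The penalty update implements the ρ-growth strategy described above; with ρ = 1 the algorithm reduces to the fixed-μ variant.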
Algorithm 2 can be easily applied to solve the RPCA problem (3), where all entries of the corrupted matrix are directly observed. Although we also use the PROPACK package [14] to compute a partial SVD as in [25, 4, 17], Algorithm 2 employs the SVD for the spectral soft-thresholding operation, and becomes slow or even impractical for large-scale problems. Therefore, we further propose an efficient ADAL algorithm for solving the non-convex problem (7) in Section 4.3. In addition, several researchers [13, 28] have provided matrix rank estimation strategies to compute a good value r for the rank of the involved matrix. Thus, we only need to set a relatively large integer d such that d ≥ r.
4.3 Non-Convex ADAL Scheme

Our non-convex RMC problem (7) can also be solved by ADAL. The augmented Lagrangian of (7) is given by

    L_μ(G, H, E, Y) = ‖H‖_* + λ‖P_Ω(E)‖_1 + ⟨Y, Z − GH^T − E⟩ + (μ/2)‖Z − GH^T − E‖_F².    (15)
We will derive our scheme for solving the following subproblems with respect to G, H and E, respectively:

    G* = arg min_{G∈R^{m×d}} L_μ(G, H, E, Y),  s.t.  G^T G = I,    (16)

    H* = arg min_{H∈R^{n×d}} L_μ(G*, H, E, Y),    (17)

    E* = arg min_{E∈R^{m×n}} L_μ(G*, H*, E, Y).    (18)
4.3.1 Updating G

Fixing H and E at their latest values, removing the terms that do not depend on G, and adding proper terms that do not depend on G, the optimization problem (16) with respect to G is reformulated as follows:

    min_G ‖GH^T − P‖_F²,  s.t.  G^T G = I,    (19)

where P := Z − E + Y/μ. This is the well-known orthogonal Procrustes problem [22], whose optimal solution is given by the SVD of the matrix PH:

    G* = Û V̂^T,    (20)

where Û and V̂ are given by the SVD of PH, i.e., PH = Û Ŝ V̂^T.
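The G-step (19)-(20) is the classical orthogonal Procrustes solve; a numpy sketch (our illustration):

```python
import numpy as np

def update_G(P, H):
    """Solve min_G ||G H^T - P||_F^2 s.t. G^T G = I
    via the SVD of P @ H, as in (20)."""
    U, _, Vt = np.linalg.svd(P @ H, full_matrices=False)
    return U @ Vt
```

Only the d×d-sized right factor and the m×d-sized left factor of PH are needed, so this step avoids any SVD of an m×n matrix.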
4.3.2 Updating H

Fixing G and E, the optimization problem (17) with respect to H can be rewritten as:

    min_H (μ/2)‖G* H^T − P‖_F² + ‖H‖_*.    (21)

According to Theorem 2.1 in [4], the closed-form solution to the problem (21) is given by the following theorem.

Theorem 2. The trace norm minimization problem (21) has a closed-form solution given by:

    H* = prox_{1/μ}(P^T G*).    (22)

Proof. The first-order necessary and sufficient optimality condition for the convex problem (21) is given by

    0 ∈ ∂‖H‖_* + μ(G* H^T − P)^T G*,
Algorithm 3 Solving the RMCMF problem (7) via ADAL
Input: Given data P_Ω(Z) and λ.
Initialize: G_0 = eye(m, d), H_0 = 0, E_0 = Y_0 = 0, μ_0 = 10^-4, μ_max = 10^10, ρ = 1.10, and tol.
while not converged do
  Update G_{k+1} by (20).
  Update H_{k+1} by (22).
  Update E_{k+1} by (26) and (27).
  Update the multiplier Y_{k+1} by Y_{k+1} = Y_k + μ_k(Z − G_{k+1} H_{k+1}^T − E_{k+1}).
  Update the parameter μ_{k+1} by μ_{k+1} = min(ρ μ_k, μ_max).
  Check the convergence condition: ‖Z − G_{k+1} H_{k+1}^T − E_{k+1}‖_F < tol.
end while
Output: G_k, H_k and E_k, where (E_k)_Ω^c is set to 0.
where ∂‖H‖_* denotes the set of subgradients of the trace norm (optimality conditions for the trace norm are given in [4]). Since (G*)^T G* = I, the optimality condition for the problem (21) can be rewritten as follows:

    0 ∈ ∂‖H‖_* + μ(H − P^T G*).    (23)

(23) is also the first-order optimality condition for the following problem:

    min_H (μ/2)‖H − P^T G*‖_F² + ‖H‖_*.    (24)

According to Theorem 2.1 in [4], the optimal solution of the problem (24) is given by (22).
4.3.3 Updating E

By fixing all other variables, the optimal E is the solution to the following problem:

    min_E λ‖P_Ω(E)‖_1 + (μ/2)‖E + G*(H*)^T − Z − Y/μ‖_F².    (25)

The updating steps for E are very similar to (13) and (14), with X replaced by G*(H*)^T:

    E*_Ω = (S_{λ/μ}(Z − G*(H*)^T + Y/μ))_Ω,    (26)

and

    E*_Ω^c = (Z − G*(H*)^T + Y/μ)_Ω^c.    (27)
Following the above analysis, we obtain an ADAL algorithm to solve the matrix factorization based RMC (RMCMF) problem (7), as outlined in Algorithm 3. In addition, E_Ω^c should be set to 0 for the output E. Algorithm 3 can also be easily applied to solve the RPCA problem (3).
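Analogously to the convex case, Algorithm 3 can be sketched in numpy for small dense matrices (our illustrative reimplementation, not the authors' code):

```python
import numpy as np

def svt(T, tau):
    """Singular value soft-thresholding, as in (22)."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Element-wise soft-thresholding S_tau."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rmcmf_adal(Z, mask, lam, d, mu=1e-4, mu_max=1e10, rho=1.10, tol=1e-4, max_iter=500):
    """Sketch of Algorithm 3 for problem (7):
    min ||H||_* + lam ||P_Omega(E)||_1  s.t.  G H^T + E = Z, G^T G = I."""
    m, n = Z.shape
    G = np.eye(m, d)
    H = np.zeros((n, d))
    E = np.zeros_like(Z)
    Y = np.zeros_like(Z)
    for _ in range(max_iter):
        P = Z - E + Y / mu
        U, _, Vt = np.linalg.svd(P @ H, full_matrices=False)  # G-step (20)
        G = U @ Vt
        H = svt(P.T @ G, 1.0 / mu)                            # H-step (22)
        R = Z - G @ H.T + Y / mu
        E = np.where(mask, shrink(R, lam / mu), R)            # E-steps (26) and (27)
        Y = Y + mu * (Z - G @ H.T - E)                        # multiplier update
        mu = min(rho * mu, mu_max)                            # penalty update
        if np.linalg.norm(Z - G @ H.T - E) < tol:
            break
    return G, H, np.where(mask, E, 0.0)                       # zero E on Omega^c
```

Note that all SVDs here act on m×d or n×d products, which is the source of the speedup over the full m×n SVD in Algorithm 2.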
5. ALGORITHM ANALYSIS
We now provide convergence and complexity analysis for our algorithms.
5.1 Convergence Analysis

The convergence of ADAL for solving the standard form (8) was studied in [9, 2]. We establish the convergence of Algorithm 2 by transforming the RMC problem (6) into the standard form (8), and by showing that the transformed problem satisfies the conditions needed to establish convergence. Algorithm 2 alternates between two blocks of variables, X and E. Let x denote the vectorization of X, i.e., x = vec(X) ∈ R^{mn×1}, and similarly let e = vec(E) ∈ R^{mn×1} and z = vec(Z) ∈ R^{mn×1}; define f(X) := ‖X‖_* and g(E) := λ‖P_Ω(E)‖_1. We can then write the equality constraint in (6) in the form

    Ax + Be = z,

where both A ∈ R^{mn×mn} and B ∈ R^{mn×mn} are identity matrices. By the definitions of f(X) and g(E), it is easy to verify that the problem (6) and Algorithm 2 satisfy the conditions in Algorithm 1. Hence, the convergence of Algorithm 2 is given as follows:
Theorem 3. Consider the RMC problem (6), where both f(X) and g(E) are convex functions, and A and B are identity matrices with full column rank. The sequence {X_k, E_k} generated by Algorithm 2 converges to an optimal solution {X*, E*} of the problem (6).
Hence, the sequence {X_k, E_k} converges to an optimal solution of the RMC problem (4), where (E_k)_Ω^c = 0. Moreover, the convergence of our derived Algorithm 3 for the non-convex problem (7) is guaranteed, as shown in the following theorem.
Theorem 4. Let (G_k, H_k, E_k) be a sequence generated by Algorithm 3. Then the following conclusions hold:

1. (G_k, H_k, E_k) approaches a feasible solution, i.e., lim_{k→∞} ‖Z − G_k H_k^T − E_k‖_F = 0.

2. Both sequences {G_k H_k^T} and {E_k} are Cauchy sequences.

3. (G_k, H_k, E_k) converges to a KKT point of the problem (7).

The proof of this theorem can be found in the Appendix.
5.2 Complexity Analysis

For the convex problem (6), the running time of Algorithm 2 is dominated by the SVD of an m × n matrix. For the non-convex problem (7), Algorithm 3 performs SVDs on much smaller matrices of sizes m × d and d × n, plus some matrix multiplications in (22). Hence, the total time complexities of Algorithm 2 and Algorithm 3 are O(tmn²) and O(t(d²m + mnd)) (with d ≪ n ≤ m), respectively, where t is the number of iterations.
5.3 Connections to Existing Approaches

Our non-convex method is the scalable version of our convex method for both RPCA and RMC problems. In addition, the computational complexity of existing convex algorithms is at least O(mn²). This means that common RPCA (e.g., PCP [5]) and RMC (e.g., SpaRCS [27]) methods cannot handle large-scale problems, while our non-convex method has a complexity practically linear in the input matrix size and scales well to large-scale problems.

From the problems (3) and (6), we can see that in essence our convex method for RPCA problems is equivalent to IALM [17]. The models in [10] and [20] are special cases of our model (7) when λ → ∞. Moreover, the models in [33] and [3] focus only on the desired low-rank matrix; in this sense, they can be viewed as special cases of our model (7). From the above complexity analysis, both schemes have the same theoretical computational complexity. However, the experimental results in the next section show that our non-convex method usually runs much faster than the methods in [33] and [3]. The bilinear regularized matrix factorization formulation in [3] is one of the models most similar to our model (7):

    min_{G,H} λ f(Z − GH^T) + (1/2)(‖G‖_F² + ‖H‖_F²).    (28)

Figure 1: Image used in the text removal experiment: (a) Input image; (b) Original image; (c) Outlier mask.
6. EXPERIMENTAL RESULTS
In this section, we evaluate both the effectiveness and ef-
ficiency of our RMC and RMCMF algorithms for solving
RMC problems such as text removal, face reconstruction
and background modeling. All experiments were performed
using Matlab 7.11 on an Intel(R) Core (TM) i5-4570 (3.20
GHz) PC running Windows 7 with 8GB main memory.
6.1 Text Removal
We first conduct an experiment on a simulated image processing task with artificially generated data, where the goal is to remove generated text from an image. The ground-truth image is of size 256 × 222, and the rank of the data matrix is 10. We then add to the image a short phrase in text form, which plays the role of outliers. Fig. 1 shows the input image together with the clean image and the outlier mask. For fairness, we set the rank parameter of all the algorithms to 20, which is twice the true rank of the underlying matrix. The input data are generated by setting 30% of the randomly selected pixels of the image as missing entries. We compare our two methods, RMC and RMCMF, with the state-of-the-art methods PCP [5], SpaRCS¹ [27], RegL1² [33] and BF-ALM [3]. We set the regularization parameter λ = 1/√max(m, n) and the stopping tolerance tol = 10⁻⁴ for all algorithms in this experiment.
The results obtained by the different methods are shown in Fig. 2, where the outlier detection accuracy (the Area Under the receiver operating characteristic Curve, AUC) and the error of low-rank component recovery (i.e., ‖X − Z‖_F/‖Z‖_F, where Z and X denote the ground-truth image matrix and the recovered image matrix, respectively) are also presented. As far as low-rank matrix recovery is concerned, the five RMC methods, SpaRCS, RegL1, BF-ALM, RMC and RMCMF, outperform PCP, not only visually but also quantitatively. For outlier detection, our methods RMC and RMCMF perform better than the other methods. In short, RMC and RMCMF significantly outperform PCP, SpaRCS, RegL1 and BF-ALM in terms of both low-rank matrix recovery and sparse outlier identification. Moreover, the running times of PCP, SpaRCS,
1http://www.ece.rice.edu/~aew2/sparcs.html
2https://sites.google.com/site/yinqiangzheng/
RegL1, BF-ALM, RMC and RMCMF are 15.39sec, 5.74sec, 3.86sec, 2.62sec, 1.10sec and 0.23sec, respectively.

Figure 3: Performance of PCP, SpaRCS, RegL1, RMC and RMCMF in terms of AUC (left) and Error (right) with varying ranks (the first row) or regularization parameters (the second row).
Moreover, we further evaluate the robustness of our methods, RMC and RMCMF, with respect to the regularization parameter λ and to changes in the given rank. We conduct experiments on the artificially generated data, and report the outlier detection accuracy (AUC) and the error (Error) of low-rank component recovery, where the rank parameter of RMCMF, SpaRCS and RegL1 is chosen from {30, 35, ..., 70}, and the regularization parameter λ of RMC, RMCMF, PCP and RegL1 is chosen from the grid {0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1}. Notice that we only report the results of RegL1 due to the similar performance of RegL1 and BF-ALM. The average AUC and Error results of 10 independent runs are shown in Fig. 3, from which we can see that our RMCMF method is much more robust than SpaRCS and RegL1 in terms of AUC and Error with respect to the given rank. Furthermore, our RMC and RMCMF methods are much more robust than PCP and RegL1 in terms of AUC and Error against the regularization parameter.
6.2 Face Reconstruction

We also test our methods on face reconstruction problems. The face database used in this experiment is a part of the Extended Yale Face Database B [15] with large corruptions. This part of Extended Yale-B consists of 320 frontal face images from the first 5 classes, where each subset contains 64 images with varying illumination conditions and heavy "shadows" as outliers. The resolution of all images is 192 × 168, and the pixel values are normalized to [0, 1]; the pixel values are then used to form data vectors of dimension 32,256. The input data are generated by setting 40% of the randomly selected pixels of each image as missing entries.

Fig. 4 shows some original and reconstructed images by RegL1 and CWM³ [20], and by our methods, RMC and RMCMF, where the average computational time (in seconds)
3http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm
Figure 2: Text removal results of different methods, where the first row shows the recovered foreground masks and the second row shows the recovered background images: (a) PCP (AUC: 0.7934; Error: 0.3290); (b) SpaRCS (AUC: 0.8487; Error: 0.2623); (c) RegL1 (AUC: 0.8792; Error: 0.2291); (d) BF-ALM (AUC: 0.8568; Error: 0.2435); (e) RMC (AUC: 0.9206; Error: 0.1987); (f) RMCMF (AUC: 0.9197; Error: 0.1996).
Figure 4: Face reconstruction results. From left column to right column: input corrupted images (black pixels denote missing entries), original images, and reconstruction results by CWM (1554.91sec), RegL1 (2710.68sec), RMC (54.19sec) and RMCMF (21.18sec).

of all these algorithms on each person's faces is presented. It can be observed that RMC and RMCMF perform much better than the other methods visually, as they effectively eliminate the heavy noise and "shadows" and simultaneously complete the missing entries. In other words, our RMC and RMCMF methods can recover the latent features underlying the original images even though the observed data are corrupted by both outliers and missing values. Impressively, both RMC and RMCMF are also significantly faster than RegL1 and CWM.
6.3 Background Modeling
In this experiment we test our methods on real surveillance videos, formulating object detection and background subtraction as a robust matrix completion problem. Background modeling is a crucial task for motion segmentation in surveillance videos. A video sequence satisfies the low-rank and sparse structures: the background of all the frames is controlled by a few factors and hence exhibits the low-rank property, while the foreground is detected by identifying spatially localized sparse residuals [30, 5]. We test our methods on four color surveillance videos: the Bootstrap, Lobby, Hall and Mall databases4. The data matrix Z consists of the first 400 frames of size 120 × 160. Since all the original videos are in color, we first reshape every frame of the video into a long column vector and then collect all the columns into a data matrix Z of size 57,600 × 400. Moreover, the input data are generated by setting 10% of the randomly selected pixels of each frame as missing entries.
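The vectorization step described above can be sketched as follows; `frames_to_matrix` and `hide_pixels` are illustrative helpers (not the authors' code), assuming frames are stored as a (T, H, W, C) array:

```python
import numpy as np

def frames_to_matrix(frames):
    """Vectorize each color frame into one column of the data matrix Z."""
    T = frames.shape[0]
    return frames.reshape(T, -1).T       # shape: (H*W*C, T)

def hide_pixels(Z, frac=0.1, seed=0):
    """Mark a random fraction of the entries of each frame as missing."""
    rng = np.random.default_rng(seed)
    omega = rng.random(Z.shape) >= frac  # True = observed
    return Z * omega, omega

# 400 Bootstrap-sized frames: 120 x 160 pixels, 3 color channels.
frames = np.zeros((400, 120, 160, 3))
Z = frames_to_matrix(frames)
Z_obs, omega = hide_pixels(Z, frac=0.1)
print(Z.shape)  # -> (57600, 400)
```

Each column of `Z` is one vectorized frame, which is what lets a low-rank background plus sparse foreground decomposition apply frame-wise.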
Figs. 5 and 6 illustrate the foreground and background separation results on the Bootstrap and Mall databases, where the first and fourth columns represent the incomplete input images, the second and fifth columns show the low-rank recoveries, and the third and sixth columns show the sparse components. It is clear that the background can be effectively extracted by RMC, RMCMF, BF-ALM, and GRASTA5 [12]. Notice that SpaRCS [27] could not yield experimental results on these databases because it ran out of memory. We can see that the decomposition results of RMCMF and RMC are slightly better than those of GRASTA and BF-ALM. As pointed out in [23], the theoretical reason for the unsatisfactory performance of the l1-penalty is that the irrepresentable condition is not met. Hence, RMCMF, which incorporates matrix factorization, is more accurate in recovering the low-rank matrix than RMC. Furthermore, we also report the CPU time of these algorithms on all four databases in Table 1, from which we can see that RMCMF is more than 7 times faster than RMC, more than 4 times faster than GRASTA, and more than 2 times faster than BF-ALM. This further shows that RMCMF has good scalability and can address large-scale problems.
7. CONCLUSIONS
We proposed a unified RMC framework for RPCA and RMC problems. We first presented two matrix trace norm regularized models that replace the linear projection operator constraint by a simple equality one. Then we de-
4http://perception.i2r.a-star.edu.sg/bkmodel/bkindex
5https://sites.google.com/site/hejunzz/grasta
Figure 5: Foreground and background separation results of different algorithms on the Bootstrap data set,
where the first, second, third and last rows show the recovered low-rank and sparse images by GRASTA,
BF-ALM, RMC and RMCMF, respectively.
Table 1: Comparison of time costs in CPU seconds of GRASTA, BF-ALM, RMC and RMCMF on background modeling datasets.

Datasets  | Sizes         | GRASTA | BF-ALM | RMC    | RMCMF
Bootstrap | 57,600 × 400  | 153.65 | 93.17  | 344.13 | 38.32
Lobby     | 61,440 × 400  | 187.43 | 139.83 | 390.78 | 50.08
Hall      | 76,032 × 400  | 315.11 | 153.45 | 461.48 | 67.73
Mall      | 245,760 × 200 | 493.92 | –      | –      | 94.59
veloped two efficient ADAL algorithms to solve our con-
vex and non-convex low-rank and sparse matrix decompo-
sition problems. Finally, we analyzed the convergence of
our algorithms. Experimental results on synthetic and real-
world data sets demonstrated the superior performance of
our methods compared with the state-of-the-art methods in
terms of both efficiency and effectiveness.
Both our algorithms are essentially Gauss-Seidel schemes of ADAL, and their Jacobi-type update schemes can be easily implemented in parallel. Hence, our algorithms are well suited for parallel and distributed computing and are particularly attractive for solving certain large-scale problems. Moreover, our methods can easily be extended to the general nonconvex low-rank-inducing penalty problem [26]. For future work, we will consider the compressive RMC (also called compressive principal component pursuit) problem with a general linear operator as in [29].
8. ACKNOWLEDGMENTS
We thank the reviewers for giving us many constructive
comments, with which we have significantly improved our
paper. This research is supported in part by SHIAE Grant
No. 8115048, MSRA Grant No. 6903555, GRF No. 411211,
and CUHK direct grant Nos. 4055015 and 4055017.
9. REFERENCES
[1] E. Acar, D. Dunlavy, T. Kolda, and M. Mørup.
Scalable tensor factorizations with missing data. In
SDM, pages 701–711, 2010.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and
J. Eckstein. Distributed optimization and statistical
learning via the alternating direction method of
multipliers.
Found. Trends Mach. Learn., 3(1):1–122,
2011.
[3] R. Cabral, F. Torre, J. Costeira, and A. Bernardino.
Unifying nuclear norm and bilinear factorization
approaches for low-rank matrix decomposition. In
ICCV, pages 2488–2495, 2013.
[4] J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20(4):1956–1982, 2010.
[5] E. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. ACM, 58(3):1–37, 2011.
[6] E. Candès and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9(6):717–772, 2009.
[7] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis.
Low-rank matrix recovery from errors and erasures.
IEEE Trans. Inform. Theory, 59(7):4324–4337, 2013.
[8] I. Daubechies, M. Defrise, and C. DeMol. An iterative
thresholding algorithm for linear inverse problems
with a sparsity constraint.
Commun. Pur. Appl.
Math., 57(11):1413–1457, 2004.
[9] J. Eckstein and D. Bertsekas. On the
Douglas-Rachford splitting method and the proximal
point algorithm for maximal monotone operators.
Math. Prog., 55(1):293–318, 1992.
Figure 6: Foreground and background separation results of different algorithms on the Mall data set, where
the first and second rows show the recovered low-rank and sparse images by GRASTA and RMCMF, respec-
tively.
[10] A. Eriksson and A. van den Hengel. Efficient
computation of robust low-rank matrix
approximations in the presence of missing data using
the l1 norm. In
CVPR, pages 771–778, 2010.
[11] B. He and X. Yuan. On the O(1/n) convergence rate
of the Douglas-Rachford alternating direction method.
SIAM J. Numer. Anal., 50(2):700–709, 2012.
[12] J. He, L. Balzano, and A. Szlam. Incremental gradient
on the Grassmannian for online foreground and
background separation in subsampled video. In
CVPR, pages 1568–1575, 2012.
[13] R. Keshavan, A. Montanari, and S. Oh. Matrix
completion from a few entries.
IEEE Trans. Inform.
Theory, 56(6):2980–2998, 2010.
[14] R. Larsen. PROPACK: software for large and sparse SVD calculations, 2005.
[15] K. Lee, J. Ho, and D. Kriegman. Acquiring linear
subspaces for face recognition under variable lighting.
IEEE Trans. Pattern Anal. Mach. Intell.,
27(5):684–698, 2005.
[16] X. Li. Compressed sensing and matrix completion
with constant proportion of corruptions.
Constr.
Approx., 37:73–99, 2013.
[17] Z. Lin, R. Liu, and Z. Su. Linearized alternating
direction method with adaptive penalty for low-rank
representation. In
NIPS, pages 612–620, 2011.
[18] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma.
Robust recovery of subspace structures by low-rank
representation.
IEEE Trans. Pattern Anal. Mach.
Intell., 35(1):171–184, 2013.
[19] S. Ma, D. Goldfarb, and L. Chen. Fixed point and
Bregman iterative methods for matrix rank
minimization.
Math. Prog., 128(1):321–353, 2011.
[20] D. Meng, Z. Xu, L. Zhang, and J. Zhao. A cyclic
weighted median method for L1 low-rank matrix
factorization with missing entries. In
AAAI, 2013.
[21] K. Min, Z. Zhang, J. Wright, and Y. Ma. Decomposing background topics from keywords by principal component pursuit. In CIKM, pages 269–278, 2010.
[22] N. J. Higham. Matrix Procrustes problems, 1995.
[23] Y. She and A. Owen. Outlier detection using
nonconvex penalized regression.
J. Am. Stat. Assoc.,
106(494):626–639, 2011.
[24] M. Tao and X. Yuan. Recovering low-rank and sparse
components of matrices from incomplete and noisy
observations.
SIAM J. Optim., 21(1):57–81, 2011.
[25] K.-C. Toh and S. Yun. An accelerated proximal
gradient algorithm for nuclear norm regularized least
squares problems.
Pac. J. Optim., 6:615–640, 2010.
[26] S. Wang, D. Liu, and Z. Zhang. Nonconvex relaxation
approaches to robust matrix recovery. In
IJCAI, pages
1764–1770, 2013.
[27] A. Waters, A. Sankaranarayanan, and R. Baraniuk.
SpaRCS: Recovering low-rank and sparse matrices
from compressive measurements. In
NIPS, pages
1089–1097, 2011.
[28] Z. Wen, W. Yin, and Y. Zhang. Solving a low-rank
factorization model for matrix completion by a
nonlinear successive over-relaxation algorithm.
Math.
Prog. Comp., 4(4):333–361, 2012.
[29] J. Wright, A. Ganesh, K. Min, and Y. Ma.
Compressive principal component pursuit.
Inform.
Infer., 2:32–68, 2013.
[30] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma.
Robust principal component analysis: exact recovery
of corrupted low-rank matrices by convex
optimization. In
NIPS, pages 2080–2088, 2009.
[31] H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA
via outlier pursuit. In
NIPS, pages 2496–2504, 2010.
[32] J. Yang, W. Yin, Y. Zhang, and Y. Wang. A fast
algorithm for edge-preserving variational multichannel
image restoration.
SIAM J. Imag. Sci., 2(2):569–592,
2009.
[33] Y. Zheng, G. Liu, S. Sugimoto, S. Yan, and
M. Okutomi. Practical low-rank matrix approximation
under robust L1-norm. In
CVPR, pages 1410–1417,
2012.
[34] T. Zhou and D. Tao. GoDec: Randomized low-rank &
sparse matrix decomposition in noisy case. In
ICML,
pages 33–40, 2011.
APPENDIX
We first prove the boundedness of the multiplier and some variables of Algorithm 3, and then analyze the convergence of Algorithm 3. To prove the boundedness, we first give the following lemmas.
Lemma 3. Let $\mathcal{X}$ be a real Hilbert space endowed with an inner product $\langle\cdot,\cdot\rangle$ and a corresponding norm $\|\cdot\|$ (the trace norm or the $l_1$ norm), and let $y \in \partial\|x\|$, where $\partial\|\cdot\|$ denotes the subgradient. Then $\|y\|^* = 1$ if $x \neq 0$, and $\|y\|^* \leq 1$ if $x = 0$, where $\|\cdot\|^*$ is the dual norm of the norm $\|\cdot\|$.
Lemma 4. Let $Y_{k+1} = Y_k + \mu_k(Z - G_{k+1}H_{k+1}^T - E_{k+1})$, $\hat{Y}_{k+1} = Y_k + \mu_k(Z - G_{k+1}H_{k+1}^T - E_k)$, and $\tilde{Y}_{k+1} = Y_k + \mu_k(Z - G_{k+1}H_k^T - E_k)$, where $G_{k+1}$ is the optimal solution of the problem (19). Then the sequences $\{Y_k\}$, $\{\hat{Y}_k\}$, $\{\tilde{Y}_k\}$, $\{H_k\}$ and $\{E_k\}$ produced by Algorithm 3 are all bounded.
Proof. By the first-order optimality condition of the problem (18) with respect to $E_{k+1}$, we have
$$0 \in \partial_{(E_{k+1})_{\Omega}}\,\mathcal{L}_{\mu_k}(G_{k+1}, H_{k+1}, E_{k+1}, Y_k),$$
and $(Y_k + \mu_k(Z - G_{k+1}H_{k+1}^T - E_{k+1}))_{\Omega} \in \lambda\,\partial\|(E_{k+1})_{\Omega}\|_1$, i.e., $(Y_{k+1})_{\Omega} \in \lambda\,\partial\|(E_{k+1})_{\Omega}\|_1$.
Furthermore, by substituting $Y_{k+1} = Y_k + \mu_k(Z - G_{k+1}H_{k+1}^T - E_{k+1})$ into (18), we have $(Y_{k+1})_{\Omega^C} = 0$.
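This subgradient inclusion is exactly the optimality condition satisfied by elementwise soft-thresholding, the usual proximal step for $l_1$ subproblems of this form; a quick numerical check (with illustrative values of $\lambda$ and $\mu$, not tied to the paper's parameter settings):

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(1)
v = rng.standard_normal(100)
lam, mu = 0.3, 2.0                 # illustrative lambda and mu
e = soft_threshold(v, lam / mu)

# Optimality check: mu*(v - e) must lie in lam * subgrad(||e||_1),
# i.e. equal lam*sign(e_i) where e_i != 0, and satisfy |.| <= lam elsewhere.
g = mu * (v - e)
ok = np.where(e != 0, np.isclose(g, lam * np.sign(e)), np.abs(g) <= lam + 1e-12)
assert ok.all()
```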
By Lemma 3, we have
$$\|Y_{k+1}\|_{\infty} = \|(Y_{k+1})_{\Omega^C} + (Y_{k+1})_{\Omega}\|_{\infty} \leq \lambda,$$
where $\|\cdot\|_{\infty}$ denotes the matrix $l_{\infty}$-norm, i.e., $\|M\|_{\infty} = \max_{i,j}|M_{i,j}|$. Thus, the sequence $\{Y_k\}$ is bounded.
By the iteration procedure of Algorithm 3, we have
$$\mathcal{L}_{\mu_k}(G_{k+1}, H_{k+1}, E_{k+1}, Y_k) \leq \mathcal{L}_{\mu_k}(G_{k+1}, H_{k+1}, E_k, Y_k) \leq \mathcal{L}_{\mu_k}(G_{k+1}, H_k, E_k, Y_k) \leq \mathcal{L}_{\mu_k}(G_k, H_k, E_k, Y_k) = \mathcal{L}_{\mu_{k-1}}(G_k, H_k, E_k, Y_{k-1}) + \alpha_k\|Y_k - Y_{k-1}\|_F^2,$$
where $\alpha_k = \frac{1}{2}\mu_{k-1}^{-2}(\mu_{k-1} + \mu_k)$ and $\mu_k = \rho\mu_{k-1}$. Since
$$\sum_{k=1}^{\infty}\frac{1}{2}\mu_{k-1}^{-2}(\mu_{k-1} + \mu_k) = \frac{\rho(\rho+1)}{2\mu_0}\sum_{k=1}^{\infty}\frac{1}{\rho^k} = \frac{\rho(\rho+1)}{2\mu_0(\rho-1)} < \infty,$$
we have that $\{\mathcal{L}_{\mu_k}(G_{k+1}, H_{k+1}, E_{k+1}, Y_k)\}$ is upper bounded due to the boundedness of $\{Y_k\}$. Then
$$\|H_k\|_* + \lambda\|\mathcal{P}_{\Omega}(E_k)\|_1 = \mathcal{L}_{\mu_{k-1}}(G_k, H_k, E_k, Y_{k-1}) - \langle Y_{k-1},\, Z - G_kH_k^T - E_k\rangle - \frac{\mu_{k-1}}{2}\|Z - G_kH_k^T - E_k\|_F^2 = \mathcal{L}_{\mu_{k-1}}(G_k, H_k, E_k, Y_{k-1}) - \frac{1}{2\mu_{k-1}}\left(\|Y_k\|_F^2 - \|Y_{k-1}\|_F^2\right)$$
is upper bounded, i.e., $\{H_k\}$ and $\{E_k\}$ are bounded. Since $\|G_kH_k^T\|_* = \|H_k\|_*$, $\{G_kH_k^T\}$ is also bounded.
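The identity $\|G_kH_k^T\|_* = \|H_k\|_*$ holds because multiplication by a matrix with orthonormal columns preserves singular values; a quick NumPy check with illustrative sizes (not the paper's dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 30, 5, 20                       # illustrative sizes, d << m

# G with orthonormal columns (G^T G = I), from a QR factorization
G, _ = np.linalg.qr(rng.standard_normal((m, d)))
H = rng.standard_normal((n, d))

nuc = lambda A: np.linalg.svd(A, compute_uv=False).sum()  # trace norm
assert np.allclose(G.T @ G, np.eye(d))
assert np.isclose(nuc(G @ H.T), nuc(H))
```

The check works because $(GH^T)^T(GH^T) = HG^TGH^T = HH^T$, so $GH^T$ and $H^T$ share singular values.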
We next prove that $\{\tilde{Y}_k\}$ is bounded. Since $G_{k+1}$ is the optimal solution of the problem (19), we have
$$\|Y_k + \mu_k(Z - G_{k+1}H_k^T - E_k)\|_F^2 \leq \|Y_k + \mu_k(Z - G_kH_k^T - E_k)\|_F^2.$$
By the definition of $\tilde{Y}_{k+1}$ and $\mu_k = \rho\mu_{k-1}$, we thus have
$$\|\tilde{Y}_{k+1}\|_F^2 \leq \|(1+\rho)Y_k - \rho Y_{k-1}\|_F^2.$$
By the boundedness of $\{H_k\}$ and $\{Y_k\}$, the sequence $\{\tilde{Y}_k\}$ is bounded.
The first-order optimality condition of the problem (21) with respect to $H_{k+1}$ can be rewritten as follows:
$$G_{k+1}^T\hat{Y}_{k+1} \in \partial\|H_{k+1}^T\|_*.$$
According to Lemma 3, we have $\|G_{k+1}^T\hat{Y}_{k+1}\|_2 \leq 1$, where $\|M\|_2$ denotes the spectral norm of $M$ and is equal to the maximum singular value of $M$. Thus, $G_{k+1}^T\hat{Y}_{k+1}$ is bounded.
Let $G_{k+1}^{\perp}$ denote the orthogonal complement of $G_{k+1}$, i.e., $(G_{k+1}^{\perp})^TG_{k+1} = 0$; then we have
$$(G_{k+1}^{\perp})^T\hat{Y}_{k+1} = (G_{k+1}^{\perp})^T\big(Y_k + \mu_k(Z - G_{k+1}H_{k+1}^T - E_k)\big) = (G_{k+1}^{\perp})^T\big(Y_k + \mu_k(Z - E_k)\big) = (G_{k+1}^{\perp})^T\big(Y_k + \mu_k(Z - G_{k+1}H_k^T - E_k)\big) = (G_{k+1}^{\perp})^T\tilde{Y}_{k+1}.$$
Thus, $\{(G_{k+1}^{\perp})^T\hat{Y}_{k+1}\}$ is bounded due to the boundedness of $\{\tilde{Y}_k\}$. Then we have
$$\|\hat{Y}_{k+1}\|_2 = \|G_{k+1}^T\hat{Y}_{k+1} + (G_{k+1}^{\perp})^T\hat{Y}_{k+1}\|_2 \leq \|G_{k+1}^T\hat{Y}_{k+1}\|_2 + \|(G_{k+1}^{\perp})^T\hat{Y}_{k+1}\|_2.$$
Since both $G_{k+1}^T\hat{Y}_{k+1}$ and $(G_{k+1}^{\perp})^T\hat{Y}_{k+1}$ are bounded, the sequence $\{\hat{Y}_k\}$ is bounded.
Proof of Theorem 4:
Proof. 1. By $Z - G_{k+1}H_{k+1}^T - E_{k+1} = \mu_k^{-1}(Y_{k+1} - Y_k)$, the boundedness of $\{Y_k\}$ and $\lim_{k\to\infty}\mu_k = \infty$, we have
$$\lim_{k\to\infty}\, Z - G_{k+1}H_{k+1}^T - E_{k+1} = 0.$$
Thus, $(G_k, H_k, E_k)$ approaches a feasible solution.
2. We prove that the sequences $\{E_k\}$ and $\{G_kH_k^T\}$ are Cauchy sequences. By $\|E_{k+1} - E_k\| = \mu_k^{-1}\|Y_{k+1} - \hat{Y}_{k+1}\| = o(\mu_k^{-1})$ and
$$\sum_{k=1}^{\infty}\mu_{k-1}^{-1} = \frac{\rho}{\mu_0(\rho-1)} < \infty,$$
$\{E_k\}$ is a Cauchy sequence, and it has a limit $E^*$. Similarly, $\{G_kH_k^T\}$ is also a Cauchy sequence, and therefore it has a limit $G^*(H^*)^T$.
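Both this series and the earlier one are summable only because of the geometric schedule $\mu_k = \rho\mu_{k-1}$; the closed form can be checked numerically (the values of $\mu_0$ and $\rho$ below are illustrative, not the paper's settings):

```python
import math

mu0, rho = 1e-3, 1.1               # illustrative penalty schedule
# Partial sum of sum_{k>=1} 1/mu_{k-1}, with mu_{k-1} = mu0 * rho**(k-1);
# 2000 terms are plenty, since the tail decays geometrically.
s = sum(1.0 / (mu0 * rho**k) for k in range(2000))
closed_form = rho / (mu0 * (rho - 1))
assert math.isclose(s, closed_form, rel_tol=1e-9)
```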
3. According to Algorithm 3, the first-order optimality conditions of the problems (25) and (21) at the $k$-th iteration are formulated as follows:
$$G_k^T\big(Y_{k-1} + \mu_{k-1}(Z - G_kH_k^T - E_{k-1})\big) \in \partial\|H_k^T\|_*,$$
and
$$\big(Y_{k-1} + \mu_{k-1}(Z - G_kH_k^T - E_k)\big)_{\Omega} \in \lambda\,\partial\|(E_k)_{\Omega}\|_1.$$
Since both $\{E_k\}$ and $\{G_kH_k^T\}$ are Cauchy sequences, let $E^*$ and $G^*(H^*)^T$ be the limits of $\{E_k\}$ and $\{G_kH_k^T\}$, respectively. By the definition of $Y_k$, we have
$$(G^*)^TY^* \in \partial\|(H^*)^T\|_*, \quad (Y^*)_{\Omega} \in \lambda\,\partial\|(E^*)_{\Omega}\|_1,$$
and $(G^*, H^*, E^*)$ is a feasible solution, i.e., $Z = G^*(H^*)^T + E^*$ and $(G^*)^TG^* = I$.
Thus, $(G^*, H^*, E^*)$ is a KKT point of the problem (7).