Robust Principal Component Analysis with Missing Data
Fanhua Shang, Yuanyuan Liu, James Cheng, Hong Cheng
Dept. of Computer Science and Engineering, The Chinese University of Hong Kong
{fhshang, jcheng}@cse.cuhk.edu.hk
Dept. of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
{yyliu, hcheng}@se.cuhk.edu.hk

ABSTRACT
Recovering matrices from incomplete and corrupted observations is a fundamental problem with many applications in various areas of science and engineering. In theory, under certain conditions, this problem can be solved via a natural convex relaxation. However, all current provable algorithms suffer from superlinear per-iteration cost, which severely limits their applicability to large-scale problems. In this paper, we propose a robust principal component analysis (RPCA) plus matrix completion framework to recover low-rank and sparse matrices from missing and grossly corrupted observations. Under the unified framework, we first present a convex robust matrix completion model to replace the linear projection operator constraint by a simple equality one. To further improve the efficiency of our convex model, we also develop a scalable structured factorization model, which can yield an orthogonal dictionary and a robust data representation simultaneously. Then, we develop two alternating direction augmented Lagrangian (ADAL) algorithms to efficiently solve the proposed problems. Finally, we discuss the convergence analysis of our algorithms. Experimental results verify both the efficiency and effectiveness of our methods compared with the state-of-the-art algorithms.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database applications - Data Mining
Keywords
RPCA, robust matrix completion, low-rank, ADAL
1. INTRODUCTION
In recent years, high-dimensional data such as digital photographs, surveillance videos and social network data are becoming increasingly ubiquitous. Such high-dimensional data are becoming more commonly available due to advances in data collection technologies. However, the
"curse of dimensionality" has rendered many tasks such as inference, learning, and recognition impractical. Principal component analysis (PCA) is arguably the most widely used technique for dimensionality reduction in statistical data analysis, mainly because it is simple to implement, can be solved efficiently, and is often effective in real-world applications such as latent semantic indexing, face recognition and text clustering. However, one of the main challenges faced by PCA is that the observed data are often contaminated by outliers or missing values.

This problem has drawn much attention from researchers in various communities such as data mining, machine learning, signal/image processing, and computer vision [21, 30, 5, 27, 34, 7, 16]. Based on compressive sensing and rank minimization, many methods for recovering low-rank and sparse matrices (also called robust principal component analysis or RPCA [30]) with incomplete or grossly corrupted observations have been proposed, such as principal component pursuit (PCP) [5] and outlier pursuit [31]. In principle, those methods aim to minimize a hybrid optimization problem involving both l1-norm and trace norm minimization. It is recognized that the l1-norm and the trace norm, as the convex surrogates of the l0-norm and the rank function, are powerfully capable of inducing sparsity and low-rankness, respectively [24, 5]. In addition, Xu et al. [31] used the l1,2-norm to model corrupted columns.

Although RPCA has been well studied in recent years, there is little work focusing on RPCA plus matrix completion (also called robust matrix completion or RMC [7]). In this paper, we are particularly interested in the trace norm regularized problem for RMC:

min_X ‖X‖_* + λ f(X), (1)

where ‖X‖_* is the trace norm of the desired matrix X, i.e., the sum of its singular values, λ ≥ 0 is a regularization parameter, and f(·) denotes the loss function, such as the l2-norm loss or the l1-norm (or l1,2-norm) loss. In the following, we give a few examples of applications where RMC is useful.

Robust principal component analysis (RPCA). When the loss function is the l1-norm loss, the model (1) is an RPCA problem, which is adopted by a great number of emerging approaches such as PCP [5], and has been successfully applied to many important problems such as latent semantic indexing [21], video surveillance [30, 5], and low-rank textures.

Matrix completion (MC). When the loss function is the l2-norm loss, and only a relatively small number
of entries are observed, the goal of this problem is to complete a low-rank matrix from incomplete samples of its entries. This problem, also called matrix completion, is fundamental for collaborative filtering [6], link prediction, and global positioning.

In this paper, we aim to recover both low-rank and sparse matrices from missing and grossly corrupted observations. Our solution provides a good approximation to the original data contaminated with both outliers and missing values. Unlike existing RPCA methods, our approach not only takes into account the fact that the observations are contaminated by additive outliers or missing values, but can also identify both low-rank and sparse components from missing and grossly corrupted observations. We develop a provable and scalable solution framework for RPCA and RMC problems, which is particularly useful in this "big data" era, where many real-world applications need to deal with large, high-dimensional data with missing and corrupted values. We conduct extensive experiments that verify both the efficiency and effectiveness of our methods.

We summarize the main contributions of our work as follows:

• We propose a unified RMC framework to recover both low-rank and sparse matrices from missing and grossly corrupted observations, where the loss function can be selected as the l2-norm or the l1-norm.

• We present both convex and non-convex scalable models to replace the linear projection operator constraint by a simple equality one. With the orthogonality constraint applied to the dictionary component, we convert the non-convex model into a smaller-scale matrix trace norm regularized problem.

• We develop two efficient alternating direction augmented Lagrangian (ADAL) algorithms to solve the proposed methods, which can be accelerated by adaptively changing the penalty parameter. Finally, we also provide the convergence analysis of our convex and non-convex algorithms.

This paper is organized as follows. We review background and related work in Section 2. In Section 3, we present a convex and a non-convex scalable trace norm regularized model. We develop two efficient ADAL algorithms in Section 4. We provide the theoretical analysis of our algorithms in Section 5. We report empirical results in Section 6, and conclude this paper in Section 7.
2. BACKGROUND AND PROBLEM FORMULATION
Given a matrix, some of its entries may not be observed due to problems in the acquisition process, e.g., loss of information or the high cost of experiments to obtain complete data [1]. For the matrix completion problem, many iterative thresholding algorithms [25, 19] have been proposed to solve the trace norm regularized linear least squares problem

min_X ‖X‖_* + (λ/2)‖P_Ω(X) − P_Ω(Z)‖_F^2, (2)

where P_Ω(Z) is defined as the projection of the matrix Z onto the observed entries Ω: P_Ω(Z)_ij = Z_ij if (i, j) ∈ Ω and P_Ω(Z)_ij = 0 otherwise. In other words, f(·) is the l2-norm loss function, i.e., f(X) = (1/2)‖P_Ω(X) − P_Ω(Z)‖_F^2.
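For concreteness, the projection operator P_Ω amounts to an entry-wise mask. The following minimal NumPy sketch is our own illustration (the names P_Omega, squared_loss, and the boolean mask Omega are not from the paper); it shows the operator and the l2-norm loss in (2):

import numpy as np

def P_Omega(M, Omega):
    # Keep the entries of M indexed by the boolean mask Omega; zero out the rest.
    return np.where(Omega, M, 0.0)

def squared_loss(X, Z, Omega):
    # f(X) = (1/2) * ||P_Omega(X) - P_Omega(Z)||_F^2, the l2-norm loss used in (2).
    R = P_Omega(X - Z, Omega)
    return 0.5 * np.sum(R ** 2)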
A low-rank matrix X can be recovered from the highly corrupted matrix Z = X + E via the following trace norm and l1-norm minimization problem

min_X ‖X‖_* + λ‖Z − X‖_1, (3)

where ‖·‖_1 denotes the element-wise l1-norm, i.e., ‖E‖_1 = Σ_ij |e_ij| and E = Z − X. The model (3) is called the RPCA problem [30], where f(·) is the l1-norm loss function, i.e., f(X) = ‖Z − X‖_1. Several algorithms have been developed to solve the convex optimization problem (3), such as PCP [5] and IALM [17].

A more general RMC model in [7] and [16] aims to simultaneously recover both low-rank and sparse components from incomplete and grossly corrupted observations via the convex optimization problem

min_{X,E} ‖X‖_* + λ‖E‖_1, s.t. P_Ω(X + E) = P_Ω(Z). (4)

Chen et al. [7] and Li [16] provided theoretical performance guarantees for when minimizing the trace norm plus the l1-norm succeeds in exact recovery. Although the RMC model (4) is a convex optimization problem and can be solved by some convex algorithms, some additional variables need to be introduced. In addition, all current provable algorithms for RMC involve the singular value decomposition (SVD), and thus they suffer from the high computational cost of full or partial SVDs, which severely limits their applicability to large-scale problems. To address these issues, we propose a scalable RMC method to recover matrices with missing and grossly corrupted observations.
3. CONVEX AND NON-CONVEX RMC MODELS
From the optimization problem (4), we can see that the optimal solution satisfies E_{Ω^C} = 0, where Ω^C is the complement of Ω, i.e., the index set of unobserved entries. Consequently, we have the following lemma.

Lemma 1. The RMC problem (4) is equivalent to the following convex optimization problem:

min_{X,E} ‖X‖_* + λ‖P_Ω(E)‖_1, s.t. P_Ω(X + E) = P_Ω(Z), E_{Ω^C} = 0. (5)
3.1 Convex RMC Model
To efficiently solve the RMC problem (4) and avoid introducing auxiliary variables, we can assume without loss of generality that the unobserved data satisfy Z_{Ω^C} = 0, and that E_{Ω^C} may take arbitrary values such that Z_{Ω^C} = X_{Ω^C} + E_{Ω^C}. Therefore, the constraint with the linear projection operator P_Ω in (4) is simplified to Z = X + E. It is worth noting that E_{Ω^C} will finally be set to 0 for the expected output E. Hence, we replace the constraint P_Ω(Z) = P_Ω(X + E) with Z = X + E, and obtain the following equivalent form:

min_{X,E} ‖X‖_* + λ‖P_Ω(E)‖_1, s.t. X + E = Z. (6)

To further improve the efficiency of our convex model (6) and its scalability for handling large data sets, we also propose a scalable non-convex model.

3.2 Non-Convex RMC Model
The desired low-rank matrix X is factorized into two much smaller matrices G ∈ R^{m×d} (with G^T G = I) and H ∈ R^{n×d} whose product equals X, i.e., X = GH^T, where d is an upper bound on the rank of the matrix X, i.e., d ≥ r = rank(X). We have the following lemma.

Lemma 2. Let G and H be two matrices of compatible dimensions, where G has orthogonal columns, i.e., G^T G = I. Then we have ‖GH^T‖_* = ‖H‖_*.

By substituting GH^T = X and ‖H‖_* = ‖X‖_* into (6), we obtain a much smaller-scale matrix trace norm minimization problem:

min_{G,H,E} ‖H‖_* + λ‖P_Ω(E)‖_1, s.t. GH^T + E = Z, G^T G = I. (7)

Theorem 1. Suppose (X*, E*) is a solution of the convex problem (6) with rank(X*) = r. Then there exists a solution G ∈ R^{m×d}, H ∈ R^{n×d} and E ∈ R^{m×n} to the problem (7) with d ≥ r and E_{Ω^C} = 0 such that GH^T = X*, and (GH^T, E) is also a solution to the problem (6).

Proof. If (X*, E*) is a solution to the convex optimization problem (6), it is also a solution to

min_{X,E, rank(X)=r} ‖X‖_* + λ‖P_Ω(E)‖_1, s.t. P_Ω(Z) = P_Ω(X + E), P_{Ω^C}(E) = 0.

For any (X*, E*) with rank(X*) = r, we can find G ∈ R^{m×d} and H ∈ R^{n×d} satisfying GH^T = X* and P_Ω(Z − GH^T) = P_Ω(E) = P_Ω(E*), where d ≥ r. Moreover, according to Lemma 2, we have

min_{G,H,E} ‖H‖_* + λ‖P_Ω(E)‖_1, s.t. P_Ω(Z) = P_Ω(GH^T + E), G^T G = I
= min_{G,H,E} ‖GH^T‖_* + λ‖P_Ω(E)‖_1, s.t. P_Ω(Z) = P_Ω(GH^T + E)
= min_{X,E, rank(X)=r} ‖X‖_* + λ‖P_Ω(E)‖_1, s.t. P_Ω(Z) = P_Ω(X + E),

where P_{Ω^C}(E) = 0.

In the following, we discuss how to solve our convex and non-convex models (6) and (7). It is worth noting that the RPCA problem (3) can be viewed as a special case of both (6) and (7), where all entries of the corrupted matrix are directly observed. We will develop two efficient alternating direction augmented Lagrangian (ADAL) solvers for our convex model (6) and non-convex model (7), respectively. It is also worth noting that although our non-convex algorithm will produce different estimations of G and H, the estimation of GH^T is stable, as guaranteed by Theorem 1 and the convexity of the problem (6).
4. OPTIMIZATION ALGORITHMS
In this section, we develop two efficient alternating direction augmented Lagrangian (ADAL) algorithms for solving both problems (6) and (7). First, we design a general convex ADAL scheme for solving the convex problem (6). Then we propose a similar procedure for solving the non-convex problem (7). The convergence analysis of our algorithms is provided in the next section.

Algorithm 1 Solving the problem (8) via ADAL
Input: µ.
Initialize: x_0 = 0, z_0 = 0 and y_0 = 0.
for k = 0, 1, ..., T do
  x_{k+1} = arg min_x L_µ(x, z_k, y_k).
  z_{k+1} = arg min_z L_µ(x_{k+1}, z, y_k).
  y_{k+1} = y_k + µ(Ax_{k+1} + Bz_{k+1} − c).
end for
Output: x_k and z_k.
4.1 Generic Formulation
We describe our optimization algorithm based on the ADAL method (also known as the alternating direction method of multipliers) for solving (6). The ADAL method was introduced for optimization in the 1970s, and its origins can be traced back to techniques for solving partial difference equations in the 1950s. It has received renewed interest because it efficiently tackles large-scale problems and solves optimization problems with multiple non-smooth terms in the objective function [17]. ADAL can be considered an approximation of the method of multipliers. It decomposes a large global problem into a series of smaller subproblems, and coordinates the solutions of the subproblems to compute the globally optimal solution. The problem solved by ADAL takes the following generic form

min_{x∈R^n, z∈R^m} f(x) + g(z), s.t. Ax + Bz = c, (8)

where both f(·) and g(·) are convex functions. ADAL reformulates the problem using a variant of the augmented Lagrangian method as follows:

L_µ(x, z, y) = f(x) + g(z) + y^T(Ax + Bz − c) + (µ/2)‖Ax + Bz − c‖_2^2,

where y is the Lagrange multiplier and µ is a penalty parameter. ADAL solves the problem (8) by iteratively minimizing L_µ(x, z, y) over x and z, and then updating y, as outlined in Algorithm 1 [2].
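As an illustration of the three-step iteration in Algorithm 1, here is a minimal generic ADAL loop in Python. The two subproblem solvers are passed in as callables, and all names (adal, prox_f, prox_g) are our own, not from the paper; this is a sketch, not a definitive implementation:

import numpy as np

def adal(prox_f, prox_g, A, B, c, mu, iters=100):
    # Generic ADAL loop for min f(x) + g(z) s.t. Ax + Bz = c.
    # prox_f(z, y) is assumed to return argmin_x L_mu(x, z, y);
    # prox_g(x, y) is assumed to return argmin_z L_mu(x, z, y).
    x = np.zeros(A.shape[1])
    z = np.zeros(B.shape[1])
    y = np.zeros(c.shape[0])               # Lagrange multiplier
    for _ in range(iters):
        x = prox_f(z, y)                   # x-update
        z = prox_g(x, y)                   # z-update
        y = y + mu * (A @ x + B @ z - c)   # multiplier update
    return x, z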
4.2 Convex ADAL Scheme
Our RMC optimization problem (6) can be solved by ADAL. The augmented Lagrangian of (6) is given by

L_µ(X, E, Y) = ‖X‖_* + λ‖P_Ω(E)‖_1 + ⟨Y, Z − X − E⟩ + (µ/2)‖Z − X − E‖_F^2. (9)

Applying ADAL, we carry out the following updating steps in each iteration.
4.2.1 Updating X
With all other variables fixed, the optimal X is the solution to the following problem:

min_X ‖X‖_* + (µ/2)‖X − Z + E − Y/µ‖_F^2. (10)

To solve the problem (10), the spectral soft-thresholding operation [6, 4], a shrinkage operation on the singular values, is defined as follows:

X = prox_{1/µ}(T) := U diag(max{σ − 1/µ, 0}) V^T, (11)

where T := Z − E + Y/µ, max{·, ·} should be understood element-wise, and U ∈ R^{m×r}, V ∈ R^{n×r} and σ = (σ_1, σ_2, ..., σ_r)^T ∈ R^{r×1} are obtained by the SVD of T, i.e., T = U diag(σ) V^T.
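A minimal NumPy sketch of the spectral soft-thresholding step (11); the function name svt is ours, and tau plays the role of 1/µ:

import numpy as np

def svt(T, tau):
    # Singular value thresholding: U diag(max(sigma - tau, 0)) V^T, as in (11).
    U, sigma, Vt = np.linalg.svd(T, full_matrices=False)
    return (U * np.maximum(sigma - tau, 0.0)) @ Vt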
4.2.2 Updating E
The optimal E with all other variables fixed is the solution to the following problem:

min_E λ‖P_Ω(E)‖_1 + (µ/2)‖E + X − Z − Y/µ‖_F^2. (12)

To solve the problem (12), we introduce the following well-known shrinkage (soft-thresholding) operator [8]:

S_τ(M_ij) := M_ij − τ if M_ij > τ; M_ij + τ if M_ij < −τ; 0 otherwise.

According to the soft-thresholding operator S_τ, the closed-form solution E_Ω of the problem (12) is given by

E_Ω = S_{λ/µ}((Z − X + Y/µ)_Ω). (13)

We can easily obtain the closed-form solution on the unobserved entries by zeroing the gradient of (12) with respect to E_{Ω^C}, i.e.,

E_{Ω^C} = (Z − X + Y/µ)_{Ω^C}. (14)

We can replace the l1-norm loss function in the sparse component learning problem (12) with the l2,1-norm loss function for a column-wise sparse solution, as in outlier pursuit [31] or low-rank representation [18]. The optimal solution to the problem with l2,1 regularization can be obtained by the soft-thresholding operator in [32].

Based on the description above, we develop an ADAL iterative algorithm for solving the RMC problem (6), as outlined in Algorithm 2. In addition, E_{Ω^C} should be set to 0 for the expected output E. Moreover, an O(1/k) convergence rate of Algorithm 2 can be established following the conclusion in [11]. A fixed µ is commonly used, but there are schemes for varying the penalty parameter that achieve better convergence, and the algorithm can be accelerated by adaptively changing µ. An efficient strategy [17] is to let µ = µ_0 (the initialization in Algorithm 2) and increase µ_k iteratively by µ_{k+1} = ρµ_k, where ρ ∈ (1.0, 1.1] in general and µ_0 is a small constant.

Algorithm 2 Solving the RMC problem (6) via ADAL
Input: Given data P_Ω(Z) and λ.
Initialize: X_0 = E_0 = Y_0 = 0, µ_0 = 10^{-4}, µ_max = 10^{10}, ρ = 1.10, and tol.
while not converged do
  Update X_{k+1} by (11).
  Update E_{k+1} by (13) and (14).
  Update the multiplier Y_{k+1} by Y_{k+1} = Y_k + µ_k(Z − X_{k+1} − E_{k+1}).
  Update the parameter µ_{k+1} by µ_{k+1} = min(ρµ_k, µ_max).
  Check the convergence condition ‖Z − X_{k+1} − E_{k+1}‖_F < tol.
end while
Output: X_k and E_k, where (E_k)_{Ω^C} is set to 0.

Algorithm 2 can be easily applied to solve the RPCA problem (3), where all entries of the corrupted matrix are directly observed. Although we also use the PROPACK package [14] to compute a partial SVD as in [25, 4, 17], Algorithm 2 employs the SVD for the spectral soft-thresholding operation, and thus becomes slow or even impractical for large-scale problems. Therefore, we further propose an efficient ADAL algorithm for solving the non-convex problem (7) in Section 4.3. In addition, several researchers [13, 28] have provided matrix rank estimation strategies to compute a good value r for the rank of the involved matrix; thus, we only need to set a relatively large integer d such that d ≥ r.
4.3 Non-Convex ADAL Scheme
Our non-convex RMC problem (7) can also be solved by ADAL. The augmented Lagrangian of (7) is given by

L_µ(G, H, E, Y) = ‖H‖_* + λ‖P_Ω(E)‖_1 + ⟨Y, Z − GH^T − E⟩ + (µ/2)‖Z − GH^T − E‖_F^2. (15)

We will derive our scheme for solving the following subproblems with respect to G, H and E, respectively:

G* = arg min_{G ∈ R^{m×d}} L_µ(G, H, E, Y), s.t. G^T G = I, (16)
H* = arg min_{H ∈ R^{n×d}} L_µ(G*, H, E, Y), (17)
E* = arg min_{E ∈ R^{m×n}} L_µ(G*, H*, E, Y). (18)
4.3.1 Updating G
By fixing H and E at their latest values, removing the terms that do not depend on G, and adding proper terms that do not depend on G, the optimization problem (16) with respect to G is reformulated as follows:

min_G ‖GH^T − P‖_F^2, s.t. G^T G = I, (19)

where P := Z − E + Y/µ. This is the well-known orthogonal Procrustes problem [22], whose optimal solution is given by the SVD of the matrix PH, i.e.,

G* = Û V̂^T, (20)

where Û and V̂ are given by the SVD of PH, i.e., PH = Û Ŝ V̂^T.
4.3.2 Updating H
By fixing G and E, the optimization problem (17) with respect to H can be rewritten as:

min_H (µ/2)‖G* H^T − P‖_F^2 + ‖H‖_*. (21)

According to Theorem 2.1 in [4], the closed-form solution to the problem (21) is given by the following theorem.

Theorem 2. The trace norm minimization problem (21) has a closed-form solution given by:

H* = prox_{1/µ}(P^T G*). (22)

Proof. The first-order necessary and sufficient optimality condition for the convex problem (21) is given by

0 ∈ ∂‖H‖_* + µ(G* H^T − P)^T G*,

where ∂‖H‖_* denotes the set of subgradients of the trace norm (optimality conditions for the trace norm are given in [4]). Since (G*)^T G* = I, the optimality condition for the problem (21) can be rewritten as

0 ∈ ∂‖H‖_* + µ(H − P^T G*). (23)

(23) is also the first-order optimality condition for the following problem:

min_H (µ/2)‖H − P^T G*‖_F^2 + ‖H‖_*. (24)

According to Theorem 2.1 in [4], the optimal solution of the problem (24) is given by (22).
4.3.3 Updating E
By fixing all other variables, the optimal E is the solution to the following problem:

min_E λ‖P_Ω(E)‖_1 + (µ/2)‖E + G*(H*)^T − Z − Y/µ‖_F^2. (25)

The updating steps for E are very similar to (13) and (14), where X is replaced by G*(H*)^T, as follows:

E_Ω = S_{λ/µ}((Z − G*(H*)^T + Y/µ)_Ω), (26)

and

E_{Ω^C} = (Z − G*(H*)^T + Y/µ)_{Ω^C}. (27)

Following the above analysis, we obtain an ADAL algorithm to solve the matrix factorization based RMC (RMCMF) problem (7), as outlined in Algorithm 3. In addition, E_{Ω^C} should be set to 0 for the output E. Algorithm 3 can also be easily applied to solve the RPCA problem (3).

Algorithm 3 Solving the RMCMF problem (7) via ADAL
Input: Given data P_Ω(Z) and λ.
Initialize: G_0 = eye(m, d), H_0 = 0, E_0 = Y_0 = 0, µ_0 = 10^{-4}, µ_max = 10^{10}, ρ = 1.10, and tol.
while not converged do
  Update G_{k+1} by (20).
  Update H_{k+1} by (22).
  Update E_{k+1} by (26) and (27).
  Update the multiplier Y_{k+1} by Y_{k+1} = Y_k + µ_k(Z − G_{k+1}H_{k+1}^T − E_{k+1}).
  Update the parameter µ_{k+1} by µ_{k+1} = min(ρµ_k, µ_max).
  Check the convergence condition ‖Z − G_{k+1}H_{k+1}^T − E_{k+1}‖_F < tol.
end while
Output: G_k, H_k and E_k, where (E_k)_{Ω^C} is set to 0.
5. ALGORITHM ANALYSIS
We now provide convergence analysis and complexity analysis for our algorithms.
5.1 Convergence Analysis
The convergence of ADAL for the standard form (8) was studied in [9, 2]. We establish the convergence of Algorithm 2 by transforming the RMC problem (6) into the standard form (8), and show that the transformed problem satisfies the conditions needed to establish convergence. Algorithm 2 alternates between two blocks of variables, X and E. Let x denote the vectorization of X, i.e., x = vec(X) ∈ R^{mn×1}, and similarly e = vec(E) ∈ R^{mn×1} and z = vec(Z) ∈ R^{mn×1}, and let f(X) := ‖X‖_* and g(E) := λ‖P_Ω(E)‖_1. We can write the equality constraint in (6) in the form Ax + Be = z, where both A ∈ R^{mn×mn} and B ∈ R^{mn×mn} are identity matrices. By the definitions of f(X) and g(E), it is easy to verify that the problem (6) and Algorithm 2 satisfy the conditions of Algorithm 1. Hence, the convergence of Algorithm 2 is given as follows.

Theorem 3. Consider the RMC problem (6), where both f(X) and g(E) are convex functions, and A and B are identity matrices and thus have full column rank. The sequence {X_k, E_k} generated by Algorithm 2 converges to an optimal solution {X*, E*} of the problem (6).

Hence, the sequence {X_k, E_k} converges to an optimal solution of the RMC problem (4), where (E_k)_{Ω^C} = 0. Moreover, the convergence of our derived Algorithm 3 for the non-convex problem (7) is guaranteed, as shown in the following theorem.

Theorem 4. Let (G_k, H_k, E_k) be a sequence generated by Algorithm 3. Then we have the following conclusions:
1. (G_k, H_k, E_k) approaches a feasible solution, i.e., lim_{k→∞} ‖Z − G_kH_k^T − E_k‖_F = 0.
2. Both sequences {G_kH_k^T} and {E_k} are Cauchy sequences.
3. (G_k, H_k, E_k) converges to a KKT point of the problem (7).

The proof of this theorem can be found in the APPENDIX.
5.2 Complexity Analysis
For the convex problem (6), the running time of Algorithm 2 is dominated by performing SVD on a matrix of size m × n. For the non-convex problem (7), Algorithm 3 performs SVDs on much smaller matrices of sizes m × d and d × n, together with some matrix multiplications in (22). Hence, the total time complexities of Algorithm 2 and Algorithm 3 are O(tmn^2) and O(t(d^2m + mnd)) (d ≪ n ≤ m), respectively, where t is the number of iterations.
5.3 Connections to Existing Approaches
Our non-convex method is a scalable version of our convex method for both RPCA and RMC problems. The computational complexity of existing convex algorithms is at least O(mn^2), which means that common RPCA (e.g., PCP [5]) and RMC (e.g., SpaRCS [27]) methods cannot handle large-scale problems, whereas our non-convex method has a complexity that is practically linear in the input matrix size and scales well to large-scale problems. From the problems (3) and (6), we can see that, in essence, our convex method for RPCA problems is equivalent to IALM [17]. The models in [10] and [20] are special cases of our model (7) when λ → ∞. Moreover, the models in [33] and [3] focus only on the desired low-rank matrix; in this sense, they can be viewed as special cases of our model (7). From the above complexity analysis, both schemes have the same theoretical computational complexity. However, from the experimental results in the next section, we can

see that our non-convex method usually runs much faster than the methods in [33] and [3]. The following bilinear regularized matrix factorization formulation in [3] is one of the models most similar to our model (7):

min_{G,H} λ f(Z − GH^T) + (1/2)(‖G‖_F^2 + ‖H‖_F^2). (28)

Figure 1: Images used in the text removal experiment: (a) input image; (b) original image; (c) outlier mask.
6. EXPERIMENTAL RESULTS
In this section, we evaluate both the effectiveness and efficiency of our RMC and RMCMF algorithms for solving RMC problems such as text removal, face reconstruction and background modeling. All experiments were performed using Matlab 7.11 on an Intel(R) Core(TM) i5-4570 (3.20 GHz) PC running Windows 7 with 8GB main memory.
6.1 Text Removal
We first conduct an experiment on a simulated image processing task with artificially generated data, where the goal is to remove some generated text from an image. The ground-truth image is of size 256 × 222, and the rank of its data matrix equals 10. We then add to the image a short phrase in text form, which plays the role of outliers. Fig. 1 shows the image together with the clean image and the outlier mask. For fairness, we set the rank parameter of all algorithms to 20, which is two times the true rank of the underlying matrix. The input data are generated by setting 30% of the randomly selected pixels of the image as missing entries.

We compare our two methods, RMC and RMCMF, with the state-of-the-art methods PCP [5], SpaRCS^1 [27], RegL1^2 [33] and BF-ALM [3]. We set the regularization parameter λ = 1/√(max(m, n)) and the stopping tolerance tol = 10^{-4} for all algorithms in this experiment.

The results obtained by the different methods are shown visually in Fig. 2, where the outlier detection accuracy (the Area Under the receiver operating characteristic Curve, AUC) and the error of low-rank component recovery (i.e., ‖X − Z‖_F/‖Z‖_F, where Z and X denote the ground-truth image matrix and the recovered image matrix, respectively) are also reported. As far as low-rank matrix recovery is concerned, the five RMC methods, SpaRCS, RegL1, BF-ALM, RMC and RMCMF, outperform PCP, not only visually but also quantitatively. For outlier detection, our methods RMC and RMCMF perform better than the other methods. In short, RMC and RMCMF significantly outperform PCP, SpaRCS, RegL1 and BF-ALM in terms of both low-rank matrix recovery and sparse outlier identification. Moreover, the running times of PCP, SpaRCS, RegL1, BF-ALM, RMC and RMCMF are 15.39sec, 5.74sec, 3.86sec, 2.62sec, 1.10sec and 0.23sec, respectively.

Moreover, we further evaluate the robustness of our methods, RMC and RMCMF, with respect to the regularization parameter λ and the given rank. We conduct experiments on the artificially generated data and report the outlier detection accuracy (AUC) and the error (Error) of low-rank component recovery, where the rank parameter of our RMCMF method, SpaRCS and RegL1 is chosen from {30, 35, ..., 70}, and the regularization parameter λ of RMC, RMCMF, PCP and RegL1 is chosen from the grid {0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1}. Notice that we only report the results of RegL1 because RegL1 and BF-ALM perform similarly. The average AUC and Error results of 10 independent runs are shown in Fig. 3, from which we can see that our RMCMF method is much more robust than SpaRCS and RegL1 in terms of AUC and Error with respect to the given rank. Furthermore, our RMC and RMCMF methods are much more robust than PCP and RegL1 in terms of AUC and Error against the regularization parameter.

Figure 3: Performance of PCP, SpaRCS, RegL1, RMC and RMCMF in terms of AUC (left) and Error (right) with varying ranks (first row) or regularization parameters (second row).

1. http://www.ece.rice.edu/~aew2/sparcs.html
2. https://sites.google.com/site/yinqiangzheng/
6.2 Face Reconstruction
We also test our methods on face reconstruction problems. The face database used in this experiment is a part of the Extended Yale Face Database B [15] with large corruptions. This part of Extended Yale-B consists of 320 frontal face images from the first 5 classes, and each subset contains 64 images with varying illumination conditions and heavy "shadows" as outliers. The resolution of all images is 192 × 168 and the pixel values are normalized to [0, 1]; the pixel values are then used to form data vectors of dimension 32,256. The input data are generated by setting 40% of the randomly selected pixels of each image as missing entries.

Fig. 4 shows some original and reconstructed images by RegL1, CWM^3 [20], and our methods RMC and RMCMF, where the average computational time (in seconds) of all these algorithms on each person's faces is also presented. It can be observed that RMC and RMCMF perform much better than the other methods visually, as they effectively eliminate the heavy noise and "shadows" and simultaneously complete the missing entries. In other words, our RMC and RMCMF methods can recover the latent features underlying the original images even though the observed data are corrupted by both outliers and missing values. Impressively, both RMC and RMCMF are also significantly faster than RegL1 and CWM.

Figure 2: Text removal results of different methods, where the first row shows the recovered foreground masks and the second row shows the recovered background images: (a) PCP (AUC: 0.7934; Error: 0.3290); (b) SpaRCS (AUC: 0.8487; Error: 0.2623); (c) RegL1 (AUC: 0.8792; Error: 0.2291); (d) BF-ALM (AUC: 0.8568; Error: 0.2435); (e) RMC (AUC: 0.9206; Error: 0.1987); (f) RMCMF (AUC: 0.9197; Error: 0.1996).

Figure 4: Face reconstruction results. From left column to right column: input corrupted images (black pixels denote missing entries), original images, and reconstruction results by CWM (1554.91sec), RegL1 (2710.68sec), RMC (54.19sec) and RMCMF (21.18sec).

3. http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm
6.3 Background Modeling
In this experiment, we test our methods on real surveillance videos for object detection and background subtraction, cast as a robust matrix completion problem. Background modeling is a crucial task for motion segmentation in surveillance videos. A video sequence satisfies the low-rank and sparse structures, because the background of all the frames is controlled by a few factors and hence exhibits a low-rank property, while the foreground is detected by identifying spatially localized sparse residuals [30, 5]. We test our methods on four color surveillance videos: the Bootstrap, Lobby, Hall and Mall databases^4. The data matrix Z consists of the first 400 frames of size 120 × 160. Since all the original videos are in color, we first reshape every frame of a video into a long column vector and then collect all the columns into a data matrix Z of size 57,600 × 400. Moreover, the input data are generated by setting 10% of the randomly selected pixels of each frame as missing entries.

Figs. 5 and 6 illustrate the foreground and background separation results on the Bootstrap and Mall databases, where the first and fourth columns show the incomplete input images, the second and fifth columns show the low-rank recoveries, and the third and sixth columns show the sparse components. It is clear that the background can be effectively extracted by RMC, RMCMF, BF-ALM and GRASTA^5 [12]. Notice that SpaRCS [27] could not produce experimental results on these databases because it ran out of memory. We can see that the decomposition results of RMCMF and RMC are slightly better than those of GRASTA and BF-ALM. As pointed out in [23], the theoretical reason for the unsatisfactory performance of the l1-penalty is that the irrepresentable condition is not met. Hence, RMCMF, which incorporates matrix factorization, is more accurate in recovering the low-rank matrix than RMC. Furthermore, we report the CPU time consumption of these algorithms on all four databases in Table 1, from which we can see that RMCMF is more than 7 times faster than RMC, more than 4 times faster than GRASTA, and more than 2 times faster than BF-ALM. This further shows that RMCMF has good scalability and can address large-scale problems.
4. http://perception.i2r.a-star.edu.sg/bkmodel/bkindex
5. https://sites.google.com/site/hejunzz/grasta

Figure 5: Foreground and background separation results of different algorithms on the Bootstrap data set, where the first, second, third and last rows show the recovered low-rank and sparse images by GRASTA, BF-ALM, RMC and RMCMF, respectively.

Figure 6: Foreground and background separation results of different algorithms on the Mall data set, where the first and second rows show the recovered low-rank and sparse images by GRASTA and RMCMF, respectively.

Table 1: Comparison of time costs in CPU seconds of GRASTA, BF-ALM, RMC and RMCMF on background modeling datasets.

Datasets    Sizes           GRASTA    BF-ALM    RMC       RMCMF
Bootstrap   57,600 × 400    153.65    93.17     344.13    38.32
Lobby       61,440 × 400    187.43    139.83    390.78    50.08
Hall        76,032 × 400    315.11    153.45    461.48    67.73
Mall        245,760 × 200   493.92    -         -         94.59

7. CONCLUSIONS

We proposed a unified RMC framework for RPCA and RMC problems. We first presented two matrix trace norm regularized models that replace the linear projection operator constraint by a simple equality one. Then we developed two efficient ADAL algorithms to solve our convex and non-convex low-rank and sparse matrix decomposition problems. Finally, we analyzed the convergence of our algorithms. Experimental results on synthetic and real-world data sets demonstrated the superior performance of our methods compared with the state-of-the-art methods in terms of both efficiency and effectiveness.

Both our algorithms are essentially Gauss-Seidel schemes of ADAL, and their Jacobi-type update schemes can be easily implemented in parallel. Hence, our algorithms are well suited for parallel and distributed computing and are particularly attractive for solving certain large-scale problems. Moreover, our methods can easily be extended to the general non-convex low-rank-inducing penalty problem [26]. For future work, we will consider the compressive RMC (also called compressive principal component pursuit) problem with a general linear operator as in [29].
8. ACKNOWLEDGMENTS
We thank the reviewers for giving us many constructive comments, with which we have significantly improved our paper. This research is supported in part by SHIAE Grant No. 8115048, MSRA Grant No. 6903555, GRF No. 411211, and CUHK direct grant Nos. 4055015 and 4055017.
9. REFERENCES
[1] E. Acar, D. Dunlavy, T. Kolda, and M. Mørup. Scalable tensor factorizations with missing data. In SDM, pages 701-711, 2010.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1-122, 2011.
[3] R. Cabral, F. Torre, J. Costeira, and A. Bernardino. Unifying nuclear norm and bilinear factorization approaches for low-rank matrix decomposition. In ICCV, pages 2488-2495, 2013.
[4] J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20(4):1956-1982, 2010.
[5] E. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. ACM, 58(3):1-37, 2011.
[6] E. Candès and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9(6):717-772, 2009.
[7] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis. Low-rank matrix recovery from errors and erasures. IEEE Trans. Inform. Theory, 59(7):4324-4337, 2013.
[8] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pur. Appl. Math., 57(11):1413-1457, 2004.
[9] J. Eckstein and D. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Prog., 55(1):293-318, 1992.
[10] A. Eriksson and A. van den Hengel. Efficient computation of robust low-rank matrix approximations in the presence of missing data using the l1 norm. In CVPR, pages 771-778, 2010.
[11] B. He and X. Yuan. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal., 50(2):700-709, 2012.
[12] J. He, L. Balzano, and A. Szlam. Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video. In CVPR, pages 1568-1575, 2012.
[13] R. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Trans. Inform. Theory, 56(6):2980-2998, 2010.
[14] R. Larsen. PROPACK: Software for large and sparse SVD calculations. 2005.
[15] K. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intell., 27(5):684-698, 2005.
[16] X. Li. Compressed sensing and matrix completion with constant proportion of corruptions. Constr. Approx., 37:73-99, 2013.
[17] Z. Lin, R. Liu, and Z. Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, pages 612-620, 2011.
[18] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell., 35(1):171-184, 2013.
[19] S. Ma, D. Goldfarb, and L. Chen. Fixed point and Bregman iterative methods for matrix rank minimization. Math. Prog., 128(1):321-353, 2011.
[20] D. Meng, Z. Xu, L. Zhang, and J. Zhao. A cyclic weighted median method for L1 low-rank matrix factorization with missing entries. In AAAI, 2013.
[21] K. Min, Z. Zhang, J. Wright, and Y. Ma. Decomposing background topics from keywords by principal component pursuit. In CIKM, pages 269-278, 2010.
[22] N. Higham. Matrix Procrustes problems. 1995.
[23] Y. She and A. Owen. Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc., 106(494):626-639, 2011.
[24] M. Tao and X. Yuan. Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim., 21(1):57-81, 2011.
[25] K.-C. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pac. J. Optim., 6:615-640, 2010.
[26] S. Wang, D. Liu, and Z. Zhang. Nonconvex relaxation approaches to robust matrix recovery. In IJCAI, pages 1764-1770, 2013.
[27] A. Waters, A. Sankaranarayanan, and R. Baraniuk. SpaRCS: Recovering low-rank and sparse matrices from compressive measurements. In NIPS, pages 1089-1097, 2011.
[28] Z. Wen, W. Yin, and Y. Zhang. Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Prog. Comp., 4(4):333-361, 2012.
[29] J. Wright, A. Ganesh, K. Min, and Y. Ma. Compressive principal component pursuit. Inform. Infer., 2:32-68, 2013.
[30] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma. Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization. In NIPS, pages 2080-2088, 2009.
[31] H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. In NIPS, pages 2496-2504, 2010.
[32] J. Yang, W. Yin, Y. Zhang, and Y. Wang. A fast algorithm for edge-preserving variational multichannel image restoration. SIAM J. Imag. Sci., 2(2):569-592, 2009.
[33] Y. Zheng, G. Liu, S. Sugimoto, S. Yan, and M. Okutomi. Practical low-rank matrix approximation under robust L1-norm. In CVPR, pages 1410-1417, 2012.
[34] T. Zhou and D. Tao. GoDec: Randomized low-rank & sparse matrix decomposition in noisy case. In ICML, pages 33-40, 2011.
APPENDIX
We first prove the boundedness of the multiplier and of some variables of Algorithm 3, and then analyze the convergence of Algorithm 3. To prove the boundedness, we first give the following lemmas.

Lemma 3. Let X be a real Hilbert space endowed with an inner product ⟨·,·⟩ and a corresponding norm ‖·‖ (the trace norm or the l1-norm), and let y ∈ ∂‖x‖, where ∂‖·‖ denotes the subgradient. Then ‖y‖^* = 1 if x ≠ 0, and ‖y‖^* ≤ 1 if x = 0, where ‖·‖^* is the dual norm of ‖·‖.

Lemma 4. Let Y_{k+1} = Y_k + µ_k(Z − G_{k+1}H_{k+1}^T − E_{k+1}), Ŷ_{k+1} = Y_k + µ_k(Z − G_{k+1}H_{k+1}^T − E_k), and Ỹ_{k+1} = Y_k + µ_k(Z − G_{k+1}H_k^T − E_k), where G_{k+1} is the optimal solution of the problem (19). Then the sequences {Y_k}, {Ŷ_k}, {Ỹ_k}, {H_k} and {E_k} produced by Algorithm 3 are all bounded.

Proof. By the first-order optimality condition of the problem (18) with respect to E_{k+1}, we have

0 ∈ ∂_{(E_{k+1})_Ω} L_{µ_k}(G_{k+1}, H_{k+1}, E_{k+1}, Y_k), and
(Y_k + µ_k(Z − G_{k+1}H_{k+1}^T − E_{k+1}))_Ω ∈ λ∂‖(E_{k+1})_Ω‖_1,

i.e., (Y_{k+1})_Ω ∈ λ∂‖(E_{k+1})_Ω‖_1. Furthermore, by substituting Y_{k+1} = Y_k + µ_k(Z − G_{k+1}H_{k+1}^T − E_{k+1}) into (18), we have (Y_{k+1})_{Ω^C} = 0. By Lemma 3, we have

‖Y_{k+1}‖_∞ = ‖(Y_{k+1})_{Ω^C} + (Y_{k+1})_Ω‖_∞ ≤ λ,

where ‖·‖_∞ denotes the matrix l_∞-norm, i.e., ‖M‖_∞ = max_{i,j} |M_{i,j}|. Thus, the sequence {Y_k} is bounded.

By the iteration procedure of Algorithm 3, we have

L_{µ_k}(G_{k+1}, H_{k+1}, E_{k+1}, Y_k) ≤ L_{µ_k}(G_{k+1}, H_{k+1}, E_k, Y_k) ≤ L_{µ_k}(G_{k+1}, H_k, E_k, Y_k) ≤ L_{µ_k}(G_k, H_k, E_k, Y_k) = L_{µ_{k−1}}(G_k, H_k, E_k, Y_{k−1}) + α_k‖Y_k − Y_{k−1}‖_F^2,

where α_k = (µ_{k−1} + µ_k)/(2µ_{k−1}^2) and µ_k = ρµ_{k−1}. Since

Σ_{k=1}^∞ (µ_{k−1} + µ_k)/(2µ_{k−1}^2) = (ρ(ρ + 1)/(2µ_0)) Σ_{k=1}^∞ 1/ρ^k = ρ(ρ + 1)/(2µ_0(ρ − 1)) < ∞,

the sequence {L_{µ_k}(G_{k+1}, H_{k+1}, E_{k+1}, Y_k)} is upper bounded due to the boundedness of {Y_k}. Then

‖H_k‖_* + λ‖P_Ω(E_k)‖_1
= L_{µ_{k−1}}(G_k, H_k, E_k, Y_{k−1}) − ⟨Y_{k−1}, Z − G_kH_k^T − E_k⟩ − (µ_{k−1}/2)‖Z − G_kH_k^T − E_k‖_F^2
= L_{µ_{k−1}}(G_k, H_k, E_k, Y_{k−1}) − (1/(2µ_{k−1}))(‖Y_k‖_F^2 − ‖Y_{k−1}‖_F^2)

is upper bounded, i.e., {H_k} and {E_k} are bounded. Since ‖G_kH_k^T‖_* = ‖H_k‖_*, {G_kH_k^T} is also bounded.

We next prove that {Ỹ_k} is bounded. Since G_{k+1} is the optimal solution of the problem (19), we have

‖Y_k + µ_k(Z − G_{k+1}H_k^T − E_k)‖_F^2 ≤ ‖Y_k + µ_k(Z − G_kH_k^T − E_k)‖_F^2.

By the definition of Ỹ_{k+1} and µ_{k+1} = ρµ_k, we thus have

‖Ỹ_{k+1}‖_F^2 ≤ ‖(1 + ρ)Y_k − ρY_{k−1}‖_F^2.

By the boundedness of {H_k} and {Y_k}, the sequence {Ỹ_k} is bounded.

The first-order optimality condition of the problem (21) with respect to H_{k+1} can be rewritten as

G_{k+1}^T Ŷ_{k+1} ∈ ∂‖H_{k+1}^T‖_*.

According to Lemma 3, we have ‖G_{k+1}^T Ŷ_{k+1}‖_2 ≤ 1, where ‖M‖_2 denotes the spectral norm of M, i.e., its maximum singular value. Thus, G_{k+1}^T Ŷ_{k+1} is bounded. Let G_{k+1}^⊥ denote the orthogonal complement of G_{k+1}, i.e., (G_{k+1}^⊥)^T G_{k+1} = 0. Then we have

(G_{k+1}^⊥)^T Ŷ_{k+1} = (G_{k+1}^⊥)^T (Y_k + µ_k(Z − G_{k+1}H_{k+1}^T − E_k)) = (G_{k+1}^⊥)^T (Y_k + µ_k(Z − E_k)) = (G_{k+1}^⊥)^T (Y_k + µ_k(Z − G_{k+1}H_k^T − E_k)) = (G_{k+1}^⊥)^T Ỹ_{k+1}.

Thus, {(G_{k+1}^⊥)^T Ŷ_{k+1}} is bounded due to the boundedness of {Ỹ_k}. Then we have

‖Ŷ_{k+1}‖_2 = ‖G_{k+1}^T Ŷ_{k+1} + (G_{k+1}^⊥)^T Ŷ_{k+1}‖_2 ≤ ‖G_{k+1}^T Ŷ_{k+1}‖_2 + ‖(G_{k+1}^⊥)^T Ŷ_{k+1}‖_2.

Since both G_{k+1}^T Ŷ_{k+1} and (G_{k+1}^⊥)^T Ŷ_{k+1} are bounded, the sequence {Ŷ_k} is bounded.

Proof of Theorem 4:

Proof. 1. By Z − G_{k+1}H_{k+1}^T − E_{k+1} = µ_k^{-1}(Y_{k+1} − Y_k), the boundedness of {Y_k} and lim_{k→∞} µ_k = ∞, we have

lim_{k→∞} Z − G_{k+1}H_{k+1}^T − E_{k+1} = 0.

Thus, (G_k, H_k, E_k) approaches a feasible solution.

2. We prove that the sequences {E_k} and {G_kH_k^T} are Cauchy sequences. By ‖E_{k+1} − E_k‖ = µ_k^{-1}‖Y_{k+1} − Ŷ_{k+1}‖ = O(µ_k^{-1}) and

Σ_{k=1}^∞ µ_{k−1}^{-1} = ρ/(µ_0(ρ − 1)) < ∞,

{E_k} is a Cauchy sequence, and it has a limit E*. Similarly, {G_kH_k^T} is also a Cauchy sequence, and therefore it has a limit G*(H*)^T.

3. According to Algorithm 3, the first-order optimality conditions of the problems (25) and (21) at the k-th iteration are formulated as follows:

G_k^T (Y_{k−1} + µ_{k−1}(Z − G_kH_k^T − E_{k−1})) ∈ ∂‖H_k^T‖_*,
(Y_{k−1} + µ_{k−1}(Z − G_kH_k^T − E_k))_Ω ∈ λ∂‖(E_k)_Ω‖_1.

Since both {E_k} and {G_kH_k^T} are Cauchy sequences, let E* and G*(H*)^T be the limits of {E_k} and {G_kH_k^T}, respectively. By the definition of Y_k, we have

(G*)^T Y* ∈ ∂‖(H*)^T‖_*, (Y*)_Ω ∈ λ∂‖(E*)_Ω‖_1,

and (G*, H*, E*) is a feasible solution, i.e., Z = G*(H*)^T + E* and (G*)^T G* = I. Thus, (G*, H*, E*) is a KKT point of the problem (7).
