Applications of Convex Optimization in
Image Processing
Anirudh K. Agrawal (11098)
Department of Electrical Engineering
Indian Institute of Technology
kanirudh@iitk.ac.in
Vivek Kumar (11821)
Department of Electrical Engineering
Indian Institute of Technology
viveks@iitk.ac.in
,
Abstract We planning to highlight the different areas of image
processing where convex optimization finds its uses. Some of the
general areas where we apply convex optimization are denoising
deblurring of images, segmentation, 3d reconstruction and face
recognition. In the following report we will discuss different areas
of Image Processing in relation to convex optimization, namely
de-noising and face-recognition. To solve the image denoising
problem we discuss an efficient iterative algorithm to solve a
certain class of linearly constrained non-convex optimization
problems, which is used to implement non-convex norms for
regularization tasks in image processing and computer vision and
compare different variants of it. We also discuss Augmented La-
grangian Methods (ALM) for fast and scalable {1-minimization
in real-world Robust Face Recognition and also compares several
other popular l₁-minimization algorithm in sparse representation
based classification. In this report we verify the results obtained
by the authors and present our own comments.
I. INTRODUCTION
In many problems related to Image Processing and Com-
puter Vision we find many optimization problems, many of
which are of convex nature. We can use the various Convex
optimization techniques to solve them.
There are also many computer vision problems which falls
under linearly constrained convex plus concave optimization
problems which are actually non-convex problems but can
be efficiently minimized using state-of-the-art algorithms for
convex optimization. We will study how to efficiently solve a
wide class of optimization problems including l2-lp and l1-lp
problems with 0 < p < 1 which are non-convex in nature.
The model considered is a linearly constrained minimization
on a finite dimensional Hilbert space H of the form
Ax = b,
min F(x), s.t.
xЄH
(3)
with F : H → R being a sum of two Lipschitz continuous
terms
F(x) := F1(x) + F₂(|x|)
(4)
We assume that F is bounded below, F₁: H→ RU{∞} is a
proper convex and F2 : H+ → R is concave and increasing.
Here, H+ denotes the non-negative orthant of the space H;
For solving the Face Recognition problem a sparsity based increasingness and the absolute value |x| are to be understood
classification framework has been used which boils down to
solving a l₁ minimization problem as which can be written as
follows
min ||x||1 subject to b = Ax
T
coordinate-wise. The linear constraint Ax = b is given by a
linear operator A: H→ H1. For the study purpose, we take
formulation of F₁ and F₂ as
(1)
F₁(x) = ||Tx — g||2, and
F₂(x) = \||x||2,p₁ (5)
where,
But in practical cases b has noise in it, so we solve a
unconstrained Basis pursuit denoising problem which can be
written as follows
min
-
·||b – Ax||2 + λ||x||1
X
2
(2)
=
Σ¿(xi) + ε)² is a non-convex norm for
0 < p < 1, λ = R+, T is a linear operator, and g is a vector to
be approximated. To solve the optimization problem given in
(3) the following iterative algorithm is proposed in the paper:
Here the matrix 'A' is made for each class, whose column
vectors are the images of size(w x h) and 'y' is the test
image. We already know that minimizing the l₁ produces a
sparse results, this finds applications in image compression
also. Although the above problem can be solved as a simple
linear program due to high dimensionality of images the time
and computational complexity is vary high. Many efficient
methods have been developed over the years, we will discuss
some of them and compare them based on their computational
complexity.
xk+1
=
argmin F(x)
:=
Ax-b
argminF₁(x) + ||w*.x||1,
Ax=b
(6)
(7)
where OF2 denotes the superdifferential of the concave func-
tion F2. This iterative algorithm proceeds by iteratively solving
l₁ problems which approximate the original problem. We have
worked on the implementation of above algorithm in image
denoising by Rudin, Osher, and Fatemi (ROF) model and
iteratively reweighted l₁ (IRL1) algorithm and compared the
results for both.
II. IMAGE DENOISING
The general problem (3) is formulated for specific computer
vision applications as:
min F(x):
Ax=b
= min F₁(x) + F₂(|x|)
Ax-b
:= min||Tx - g|| +ATF2(|x|),
Ax=b
(8)
(9)
where F₂ : H+ → H+ is a coordinate-wise increasing and
concave funtion. A : H → H∞,TH → H₁ are linear
operators acting between finite dimensional Hilbert spaces
H₁ or H1. The weight A € H+ has non-negative entries.
The data-term is the convex lp-norm with q≥ 1. Various
prototypes for F2(x) are
│Xi│→ (|Xi│+ε)P or |xi| → log(1+B\æil), Vi, (10)
i.e., the regularized 1-norm, 0 < p < 1,ɛ € R+, or
a convex-log function. The p-norm becomes Lipschitz by
the ɛ-regularization and the log-function naturally is Lips-
chitz.Algorithm (7) simplifies to
xk+1
=
argmin||Txg||2 + ||diag(A)(w*.x||1,
Ax-b
B. Iterative Reweighted l₁ Algorithm
Paper [6] describes an iterative reweighted l₁ algorithm
for non-convex lp pseudo-norms, p Є (0,1), which are non-
differentiable at zero. It solves a sequence of non-smooth l₁
problems and can be seen as counterpart to IRLS (Iteratively
Reweighted Least Squares) algorithm. Originally, it was pro-
posed to improve the sparsity properties in l₁ regularized
compressed sensing problems, but this algorithm found its
uses in various computer vision algorithms. In the formulation
of the IRL1 algorithm, we have minimization problem as
explained in equations (10), (11) and (12) as
|2i|→log(1+B|xi), Vi,
k+1 = argmin||Txg||2 + ||diag(A)(w*.x||1,
Ax=b
where the weights given by the superdifferential of F2 are
B
wk
p
-
or wķ
ພ
=
(1 + B|xi)
The results for both the algorithms discussed above are pro-
(11) vided in the Section 4 (Results and Simulations).
where the weights given by the superdifferential of F½ are
Ρ
wk
or wķ
=
(|xk|+ε)¹-p
B
(1 + ẞ|xi|)'
III. 1 MINIMIZATION ALGORITHMS
Here we are gonna discuss various l₁ minimization tech-
(12) niques.
Further in the report, we will describe the two main algorithms
for Image denoising which are optimized using the above
described algorithm and do their comparison.
A. ROF Image Denoising Model
Now, we'll touch upon ROF image model [5] which aims
at obtaining a clean image x from a noisy image ƒ : N →
R, where N is bounder open subset of R2. For solving this
problem they proposed the following minimization problem:
=
u = argmin||p + X||ƒ – u||2.
Ա
(13)
=
Here F₁(u) = |x|p and F2(u) = \||ƒ—u||2 and subdifferential
0F2 2λ(u – ƒ). Formally, we can write dF2(u)
-div(V), and we can write the Euler-lagrange differential
equation for the RUdin Osher Fatemi model as follows:
0 = -div
Vu
|vu|
+2λ(uf)
or equivalently,
1
uft
u = ƒ + 2x div (V)
21
A. Homotopy Methods
The method [4] uses the observation that while solving (2)
the objective function undergoes a homotopy from the ell2
constraint to the ell₁ objective as A decreases, i.e when > → 0,
* converges to the solution of (1). Also we can show that the
solution path = {x* : \ € [0, ∞)} is a piecewise constant
function of A. Therefore if we can the breakpoints that lead to
changes in the support of xi.e either removal of a previously
non-zero coefficients or addition of a new non-zero coefficient,
we can construct a decreasing sequence of X.
In order to computer the gradient of objective function of the
problem defined in (2), let us define a subdifferential of ||x||1
defined as follows:
Ші
=
u(x) = 8||x||1
=
Setting initial value x(0)
X
sgn(xi) xi#0
0
Uż Є [−1, 1] Xi
=
(16)
= 0 we can iterate w.r.t a non-zero
(14) A, and using the condition 0 € OF(x) we derive the following
(15)
Here, The parameter determines the denoising level. if \
is small, then it is equivalent to penalizing the fidelity term
more and the minimizer u is very close to original image.
On the other hand, for large λ the minimizer is a cartoon
representation of the image ƒ.
c(x) = Ab – A¹ Ax € \u(x)
=
(17)
A sparse support set T {i: c) (x) = \} can be generated
according to (16) at the k-th iteration. Then we can compute
the update directions for non-zero coefficients of xk. The
newton update equations derived involve only the non-zero
coefficients of x*, which will usually be very small number
since x is sparse. The complexity of Homotopy Algorithms is
bounded by O(dm²+dmn) which is much better than Interior
Point Methods.
B. Templates for Convex Cone Solvers (TFOCS)
It is a recently developed novel method [3] which can be
used to solve the general constrained problem of the type
min
Ax+bЄK
f(x)
where A is a linear operator and K is a convex cone.We can
observe that that equation (1) and (2) fall into the above cat-
egory. TFOCS proposes to use the soft-thersholding operator
to solve the dual problem of l₁ minimization.
In this algorithm we wish to use the generalized project
gradient descent to obtain the solution of dual problem of
(1) and in turn get the primal solution. But if the dual cost
function is not smooth the gradient descent cannot be applied.
In order to overcome the above problem we use a smoothing
term (x) to the primal problem. For the (1) problem we get
the following problem
XC
min ||||1 +μφ(α) subject to b = Ax (18)
here μ > 0 is the smoothing parameter and ☀(x) is strongly
convex function. (x) is usually taken as (x) = ||x-xo||2
, xo has to be specified. We can form the Lagrangian for as
follows
-
Cu(a, 0) = |lx|| + |x – xo} – 07 (Ax – b)
н
2
here is the Lagrangian multiplier.
=
μ
preferred because of its smoothness.
The Lagrangian of (20) is given by the following
£g(2,0) = g(e) + \\h()+0h()
(21)
here Є Rm is a Lagrange multiplier. L(x, 0) is the
Augmented Lagrangian of (1). It can be shown that there exists
0* € Rm and έ* € R such that
X*
x* = arg min (x, 0*)
Ꮖ
Vε > &*
(22)
To find the optimal solution for (1) we can use the method of
multipliers for minimizing the augmented Lagrangian function
L(x, 0). We get the following iterative scheme for solving
(22)
xk+1 =
Øk+1
=
argminÎL¸x (x, Ok)
0k+$kh(k+1)
(23)
(24)
here {k} is a pre-defined positive sequence. Convergence is
guaranteed provided that {0} is a bounded sequence and {k}
is sufficiently large after a certain index.
The choice of {k} is problem dependent. Higher the
value of έ more difficult it is to minimize L (x, Øk). For
experiments of synthetic data we let it be that έk → ∞ while
(19) for real face data we chose some fixed k = § ·
We can define the dual problem for above as maxLµ(0),
where Lu(0) min Lµ(x, 0). We also note that Lμ(0) is
smooth concave function whose gradient can be calculated
and is given by
▼Lµ(0) = b − Аîµ(0)
D. ALM for dual problems (DALM)
The concepts described above can also be dual problem of
(1) which is
y subj. toА¹y Є B∞
b
Y
(25)
here B∞ = {x € R" : ||x||∞ ≤1. Writing the augmented
Using first-order project gradient methods we can update the Lagrangian for above we get
primal and dual variables as follows
Xk
=
soft(xo+ (AT0k, 1)
0k+1 = 0k + tk (b – Axk)
A||
where {tr} are the step-sizes satisfying tATA for all
k. Note that for a small μ the solution to () reduces to the
solution of (1). The values of µ and xo determine the number
of iterations.
C. Augmented Lagrangian Methods (PALM)
We will discuss here a special class of first-order methods
called the Augmented Lagrangian Methods [2] We know that
Lagrangian eliminates the equality constraints by including a
penalty term in the cost function for the minimization problem.
Let us consider a modified cost function for (1)
x
min_9(x)+ ||h(x)||3 subj.to_h(x) = 0
here g(x)
=
(20)
|||||1 and h(x) = b - Ax, observe that for g
and h are continuous and convex functions. (20) has the same
optimal solution as that of (1) for any § > 0 . Although we
can use other kinds of penalty functions by the quadratic is
B
miny,z – b¹y — x¹ (z — A¹y) + ||z – A¹y||½
–
2
subj. to z Є Boo
here x is the Lagrange Multiplier of the dual problem. For
ease of computation we adopt an alternation strategy to solve
the above problem. In this we iteratively minimize the cost
function with respect to on variable while holding the rest
constant.
Given (xk, yk) the update rule for z is
Zk+1 = Pâ÷ (A¹yk +xk/B)
(26)
here PB is the projection operator onto B. Given
(xk, Zk+1) minimizing w.r.t to y we get a Least Squares
problem which can be solved as follows:
-
-
BAATу = ẞAzk+1 − (Axk − b)
and the update equation for x is
xk+1 = Xk - ẞ(2k+1 − ATуk+1
(27)
(28)
The dual algorithm always converges because all the subprob-
lems are solved exactly.
Relative Error of x
((a)) Original Image
WILSON LAW FIRM, PLC
Century Title Services, Inc.
4725 GARSE MILL ROAD
((b)) After adding gaussian noise (PSNR=20.14)
((c)) Denoising using ROF model (PSNR=26.08)
((d)) Denoising using IRL1 (PSNR=24.80)
Fig. 1: Image denoising problem. Denoising the image using ROF model (100 iterations, time elapsed:1.13 sec) and IRL1
algorithm (40 iterations,time elapsed:24.57 sec)
0
10°
10-5
10-15
PALM
TFOCS
Homotopy
DALM
100
10°
PALM
10-5
TFOCS
Homotopy
DALM
PALM
10-5
TFOCS
Homotopy
DALM
10-1
10-15
10-15
10-20
0
1
2
10-20
3
4
5
0
1
2
3
4
10-20
5
Time(s)
6
7
8
0
1
2
3
4
5
Time(s)
6
7
8
((a)) m = 800,n=1000,d=100
Time(s)
((c)) m =
800,n=2000,d=200
((b)) m = 800,n=1000,d=100
Fig. 2: Relative error v/s CPU time
IV. RESULTS AND SIMULATIONS
A. Image Denoising
Figure 1 shows original image (Figure 1(a)) in consideration
and the noisy image by adding a gaussian noise of σ = 25.
The peak signal to noise ration (PSNR) for the noisy image
is 20.143045 db.
1) ROF Image Denoising Model: ROF algorithm was run
for noisy image for λ = 10 and 100 iterations (figure 1(c)).
The peak SNR of denoised Image by ROF denoising model is:
26.088432 db and the time taken for 100 iterations is 24.57
seconds. On increasing the number of iterations the PSNR
value increases and saturates after around 2000 iterations.
2) Iterative Reweighted l₁ Algorithm: The output of de-
noised image by IRL1 algorithm is shown in figure 1(d). Peak
SNR of denoised image by IRL1 algorithm found out to be
24.80 db after 40 iterations which is less than as compared
to by ROF model. Also the time taken IRL1 is much higer
(24.57 seconds).
B. l1 Minimization algorithm comparison
-
To compare the various 1 minimization algorithms we
plot solve the (2) problem for each of them and plot the
relative error r(x) ||xo||2/||xo||2 w.r.t to number of
iterations. We have tested the algorithms by generating and
sparse signal and under determined linear system based on
Gaussian Distribution. The matrix A of size m x n ( m¡ n)
is generated by setting each entry a i.i.d gaussian. We have
normalized in column to have unit 2 norm. The support of
xo is chosen based on a uniform distribution in the interval
[-10,10]. The support of xo is also chosen randomly with
d i.e there are d non-zero entries in xo. We plot
the the relative error for different values of d and n keeping
m = 800 fixed as done in the paper [2].From Fig. 2 we can see
that DALM has the best performance among all other methods
discussed above.
=
Fig. 2 c) that in terms on time complexity and scaling in the
size of support of x. Though, we still have to test the above
algorithms on real face data.
REFERENCES
[1] Ochs. P; Dosovitskiy. A; Brox. T;Pock. T;"An iterated L1 Algorithm
for Non-Smooth Non-Convex Optimization in Computer Vision"IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[2] Yang, A.Y.; Zihan Zhou; Balasubramanian, A.G.; Sastry, S.S.; Yi Ma,
"Fast l₁ -Minimization Algorithms for Robust Face Recognition," Image
Processing, IEEE Transactions on, vol.22, no.8, pp.3234,3246, Aug. 2013
[3] S. R. Becker, E. J. Cands, and M. Grant, Templates for convex cone
problems with applications to sparse signal recovery, Math. Program.
Compon., vol. 3, no. 3, pp. 165218, 2010.
[4] D. Malioutov, M. Cetin, and A. Willsky, Homotopy continuation for
sparse signal representation, in Proc. IEEE Int. Conf. Acoust., Speech,
Signal Process., Mar. 2005, pp. 733736.
[5] Rudin, Leonid I., Stanley Osher, and Emad Fatemi. "Nonlinear total vari-
ation based noise removal algorithms.” Physica D: Nonlinear Phenomena
60.1 (1992): 259-268.
[6] Candes, Emmanuel J., Michael B. Wakin, and Stephen P. Boyd. "Enhanc-
ing sparsity by reweighted 1 minimization.” Journal of Fourier analysis
and applications 14.5-6 (2008): 877-905.
Time (s)
1
2
5
6
PALM
TFOCS
Homotopy
DALM
0
100 120 140 160
180
200 220 240 260 280 300
d
=
Fig. 3: Variation of Time Taken v/s d
In the above Fig. 3 we calculate the time is takes for
rk (x) ||x − x0||2/||x0||2 < 10-5 and plot that against
the size of support of x. As can be seen from Fig. 3 we
can see that we increase the size of support of x, Homotopy
performance degrades strongly. While DALM,PALM,TFOCS
have comparable performances. From the these two we can
say DALM works best among
V. CONCLUSION
We studied an efficient iterative algorithm to solve certain
class of linearly constrain non-convex problems and used it
to implement two image denoising algorithms ROF image
denoising and iterative reweighted l₁ algorithm and compared
the results for both of them.
We presented various el₁ minimzation approaches that can be
used in face recognition. From our simulation results we found
that DALM has the best performance. WE can observe from