Convex Optimization
Stephen Boyd Lieven Vandenberghe
Revised slides by Stephen Boyd, Lieven Vandenberghe, and Parth Nobel
5. Duality
Outline
Lagrangian and dual function
Lagrange dual problem
KKT conditions
Sensitivity analysis
Problem reformulations
Theorems of alternatives
Convex Optimization
Boyd and Vandenberghe
5.1
Lagrangian
▶
standard form problem (not necessarily convex)
minimize
f0(
x)
subject to
fi(
x) ≤ 0,
i = 1,...,
m
hi(
x) = 0,
i = 1,...,
p
variable
x ∈
Rn, domain D, optimal value
p★
▶
Lagrangian: L :
Rn ×
Rm ×
Rp →
R, with
dom L = D ×
Rm ×
Rp,
L(
x, , ) =
f0(
x) +
m
∑︁
i=1
ifi(
x) +
p
∑︁
i=1
ihi(
x)
– weighted sum of objective and constraint functions
–
i is
Lagrange multiplier associated with
fi(
x) ≤ 0
–
i is Lagrange multiplier associated with
hi(
x) = 0
Convex Optimization
Boyd and Vandenberghe
5.2
Lagrange dual function
▶
Lagrange dual function: g :
Rm ×
Rp →
R,
g( , ) = inf
x∈D
L(
x, , ) = inf
x∈D
(
f0(
x) +
m
∑︁
i=1
ifi(
x) +
p
∑︁
i=1
ihi(
x)
)
▶
g is concave, can be −∞ for some ,
▶
lower bound property: if ⪰ 0, then
g( , ) ≤
p★
▶ proof: if ˜
x is feasible and ⪰ 0, then
f0(˜
x) ≥
L(˜
x, , ) ≥ inf
x∈D
L(
x, , ) =
g( , )
minimizing over all feasible ˜
x gives
p★ ≥
g( , )
Convex Optimization
Boyd and Vandenberghe
5.3
Least-norm solution of linear equations
minimize
xTx
subject to
Ax =
b
▶ Lagrangian is
L(
x, ) =
xTx +
T (
Ax −
b)
▶ to minimize
L over
x, set gradient equal to zero:
∇
xL(
x, ) = 2
x +
AT
= 0
=⇒
x = −(1/2)
AT
▶ plug
x into
L to obtain
g( ) =
L((−1/2)
AT
, ) = −
1
4
TAAT
−
bT
▶ lower bound property:
p★ ≥ −(1/4)
TAAT
−
bT
for all
Convex Optimization
Boyd and Vandenberghe
5.4
Standard form LP
minimize
cTx
subject to
Ax =
b,
x ⪰ 0
▶ Lagrangian is
L(
x, , ) =
cTx +
T (
Ax −
b) −
Tx = −
bT
+ (
c +
AT
− )
Tx
▶
L is affine in
x, so
g( , ) = inf
x
L(
x, , ) =
{ −
bT AT
− +
c = 0
−∞
otherwise
▶
g is linear on affine domain {( , ) |
AT
− +
c = 0}, hence concave
▶ lower bound property:
p★ ≥ −
bT
if
AT
+
c ⪰ 0
Convex Optimization
Boyd and Vandenberghe
5.5
Equality constrained norm minimization
minimize
∥
x∥
subject to
Ax =
b
▶ dual function is
g( ) = inf
x
(∥
x∥ −
TAx +
bT
) =
{
bT
∥
AT
∥∗ ≤ 1
−∞ otherwise
where ∥
v∥∗ = sup∥
u∥≤1
uTv is dual norm of ∥·∥
▶ lower bound property:
p★ ≥
bT
if ∥
AT
∥∗ ≤ 1
Convex Optimization
Boyd and Vandenberghe
5.6
Two-way partitioning
minimize
xTWx
subject to
x2
i= 1,
i = 1,...,
n
▶ a nonconvex problem; feasible set contains 2
n discrete points
▶ interpretation: partition {1,...,
n} in two sets encoded as
xi = 1 and
xi = −1
▶
Wij is cost of assigning
i,
j to the same set; −
Wij is cost of assigning to different sets
▶ dual function is
g( ) = inf
x
(
xTWx +
∑︁
i
i(
x2
i− 1)
)
= inf
x
xT (
W +
diag( ))
x−
1T
=
{ −
1T W +
diag( ) ⪰ 0
−∞
otherwise
▶ lower bound property:
p★ ≥ −
1T
if
W +
diag( ) ⪰ 0
Convex Optimization
Boyd and Vandenberghe
5.7
Lagrange dual and conjugate function
minimize
f0(
x)
subject to
Ax ⪯
b,
Cx =
d
▶ dual function
g( , ) =
inf
x∈
dom f0
(
f0(
x)+(
AT
+
CT
)
Tx −
bT
−
dT
)
= −
f∗
0
(−
AT
−
CT
) −
bT
−
dT
where
f∗(
y) = sup
x∈
dom f (
yTx −
f (
x)) is conjugate of
f0
▶ simplifies derivation of dual if conjugate of
f0 is known
▶
example: entropy maximization
f0(
x) =
n
∑︁
i=1
xi log
xi,
f∗
0
(
y) =
n
∑︁
i=1
eyi−1
Convex Optimization
Boyd and Vandenberghe
5.8
Outline
Lagrangian and dual function
Lagrange dual problem
KKT conditions
Sensitivity analysis
Problem reformulations
Theorems of alternatives
Convex Optimization
Boyd and Vandenberghe
5.9
The Lagrange dual problem
(Lagrange)
dual problem
maximize
g( , )
subject to ⪰ 0
▶ finds best lower bound on
p★, obtained from Lagrange dual function
▶ a convex optimization problem, even if original
primal problem is not
▶ dual optimal value denoted
d★
▶ , are dual feasible if ⪰ 0, ( , ) ∈
dom g
▶ often simplified by making implicit constraint ( , ) ∈
dom g explicit
Convex Optimization
Boyd and Vandenberghe
5.10
Example: standard form LP
(see page 5.5)
▶ primal standard form LP:
minimize
cTx
subject to
Ax =
b
x ⪰ 0
▶ dual problem is
maximize
g( , )
subject to ⪰ 0
with
g( , ) = −
bT
if
AT
− +
c = 0, −∞ otherwise
▶ make implicit constraint explicit, and eliminate to obtain (transformed) dual problem
maximize −
bT
subject to
AT
+
c ⪰ 0
Convex Optimization
Boyd and Vandenberghe
5.11
Weak and strong duality
weak duality: d★ ≤
p★
▶ always holds (for convex and nonconvex problems)
▶ can be used to find nontrivial lower bounds for difficult problems,
e.g., solving the SDP
maximize −
1T
subject to
W +
diag( ) ⪰ 0
gives a lower bound for the two-way partitioning problem on page 5.7
strong duality: d★ =
p★
▶ does not hold in general
▶ (usually) holds for convex problems
▶ conditions that guarantee strong duality in convex problems are called
constraint
qualifications
Convex Optimization
Boyd and Vandenberghe
5.12
Slater’s constraint qualification
strong duality holds for a convex problem
minimize
f0(
x)
subject to
fi(
x) ≤ 0,
i = 1,...,
m
Ax =
b
if it is
strictly feasible,
i.e., there is an
x ∈
int D with
fi(
x) < 0,
i = 1,...,
m,
Ax =
b
▶ also guarantees that the dual optimum is attained (if
p★ > −∞)
▶ can be sharpened:
e.g.,
– can replace
int D with
relint D (interior relative to affine hull)
– linear inequalities do not need to hold with strict inequality
▶ there are many other types of constraint qualifications
Convex Optimization
Boyd and Vandenberghe
5.13
Inequality form LP
primal problem
minimize
cTx
subject to
Ax ⪯
b
dual function
g( ) = inf
x
(
(
c +
AT
)
Tx −
bT
)
=
{ −
bT AT
+
c = 0
−∞
otherwise
dual problem
maximize −
bT
subject to
AT
+
c = 0, ⪰ 0
▶ from the sharpened Slater’s condition:
p★ =
d★ if the primal problem is feasible
▶ in fact,
p★ =
d★ except when primal and dual are both infeasible
Convex Optimization
Boyd and Vandenberghe
5.14
Quadratic program
primal problem (assume
P ∈
Sn
++)
minimize
xTPx
subject to
Ax ⪯
b
dual function
g( ) = inf
x
(
xTPx +
T (
Ax −
b)
)
= −
1
4
TAP−1
AT
−
bT
dual problem
maximize −(1/4)
TAP−1
AT
−
bT
subject to ⪰ 0
▶ from the sharpened Slater’s condition:
p★ =
d★ if the primal problem is feasible
▶ in fact,
p★ =
d★ always
Convex Optimization
Boyd and Vandenberghe
5.15
Geometric interpretation
▶ for simplicity, consider problem with one constraint
f1(
x) ≤ 0
▶ G = {(
f1(
x),
f0(
x)) |
x ∈ D} is set of achievable (constraint, objective) values
▶
interpretation of dual function: g( ) = inf(
u,
t)∈G (
t +
u)
G
p★
g()
u +
t =
g()
t
u
G
p★
d★
t
u
▶
u +
t =
g( ) is (non-vertical) supporting hyperplane to G
▶ hyperplane intersects
t-axis at
t =
g( )
Convex Optimization
Boyd and Vandenberghe
5.16
Epigraph variation
▶ same with G replaced with A = {(
u,
t) |
f1(
x) ≤
u,
f0(
x) ≤
t for some
x ∈ D}
A
p★
g()
u +
t =
g()
t
u
▶ strong duality holds if there is a non-vertical supporting hyperplane to A at (0,
p★)
▶ for convex problem, A is convex, hence has supporting hyperplane at (0,
p★)
▶ Slater’s condition: if there exist (˜
u,˜
t)∈A with ˜
u < 0, then supporting hyperplane at (0,
p★)
must be non-vertical
Convex Optimization
Boyd and Vandenberghe
5.17
Outline
Lagrangian and dual function
Lagrange dual problem
KKT conditions
Sensitivity analysis
Problem reformulations
Theorems of alternatives
Convex Optimization
Boyd and Vandenberghe
5.18
Complementary slackness
▶ assume strong duality holds,
x★ is primal optimal, ( ★, ★) is dual optimal
f0(
x★) =
g( ★
,
★) = inf
x
(
f0(
x) +
m
∑︁
i=1
★
i fi(
x) +
p
∑︁
i=1
★
i hi(
x)
)
≤
f0(
x★) +
m
∑︁
i=1
★
i fi(
x★) +
p
∑︁
i=1
★
i hi(
x★)
≤
f0(
x★)
▶ hence, the two inequalities hold with equality
▶
x★ minimizes
L(
x, ★, ★)
▶ ★
i
fi(
x★) = 0 for
i = 1,...,
m (known as
complementary slackness):
★
i > 0 =⇒
fi(
x★) = 0,
fi(
x★) < 0 =⇒ ★
i = 0
Convex Optimization
Boyd and Vandenberghe
5.19
Karush-Kuhn-Tucker (KKT) conditions
the
KKT conditions (for a problem with differentiable
fi,
hi) are
1. primal constraints:
fi(
x) ≤ 0,
i = 1,...,
m,
hi(
x) = 0,
i = 1,...,
p
2. dual constraints: ⪰ 0
3. complementary slackness:
ifi(
x) = 0,
i = 1,...,
m
4. gradient of Lagrangian with respect to
x vanishes:
∇
f0(
x) +
m
∑︁
i=1
i∇
fi(
x) +
p
∑︁
i=1
i∇
hi(
x) = 0
if strong duality holds and
x, , are optimal, they satisfy the KKT conditions
Convex Optimization
Boyd and Vandenberghe
5.20
KKT conditions for convex problem
if ˜
x, ˜ , ˜ satisfy KKT for a convex problem, then they are optimal:
▶ from complementary slackness:
f0(˜
x) =
L(˜
x, ˜ , ˜ )
▶ from 4th condition (and convexity):
g( ˜ , ˜ ) =
L(˜
x, ˜ , ˜ )
hence,
f0(˜
x) =
g( ˜ , ˜ )
if Slater’s condition is satisfied, then
x is optimal if and only if there exist , that satisfy KKT conditions
▶ recall that Slater implies strong duality, and dual optimum is attained
▶ generalizes optimality condition ∇
f0(
x) = 0 for unconstrained problem
Convex Optimization
Boyd and Vandenberghe
5.21
Outline
Lagrangian and dual function
Lagrange dual problem
KKT conditions
Sensitivity analysis
Problem reformulations
Theorems of alternatives
Convex Optimization
Boyd and Vandenberghe
5.22
Perturbation and sensitivity analysis
(unperturbed) optimization problem and its dual
minimize
f0(
x)
subject to
fi(
x) ≤ 0,
i = 1,...,
m
hi(
x) = 0,
i = 1,...,
p
maximize
g( , )
subject to ⪰ 0
perturbed problem and its dual
minimoize
f0(
x)
subject to
fi(
x) ≤
ui,
i = 1,...,
m
hi(
x) =
vi,
i = 1,...,
p
maximize
g( , ) −
uT
−
vT
subject to ⪰ 0
▶
x is primal variable;
u,
v are parameters
▶
p★(
u,
v) is optimal value as a function of
u,
v
▶
p★(0, 0) is optimal value of unperturbed problem
Convex Optimization
Boyd and Vandenberghe
5.23
Global sensitivity via duality
▶ assume strong duality holds for unperturbed problem, with ★, ★ dual optimal
▶ apply weak duality to perturbed problem:
p★(
u,
v) ≥
g( ★
,
★) −
uT
★ −
vT
★ =
p★(0, 0) −
uT
★ −
vT
★
▶
implications
– if ★
i
large:
p★ increases greatly if we tighten constraint
i (
ui < 0)
– if ★
i
small:
p★ does not decrease much if we loosen constraint
i (
ui > 0)
– if ★
i
large and positive:
p★ increases greatly if we take
vi < 0
– if ★
i
large and negative:
p★ increases greatly if we take
vi > 0
– if ★
i
small and positive:
p★ does not decrease much if we take
vi > 0
– if ★
i
small and negative:
p★ does not decrease much if we take
vi < 0
Convex Optimization
Boyd and Vandenberghe
5.24
Local sensitivity via duality
if (in addition)
p★(
u,
v) is differentiable at (0, 0), then
★
i = −
p★(0, 0)
ui
,
★
i = −
p★(0, 0)
vi
proof (for ★
i
): from global sensitivity result,
p★(0, 0)
ui
= lim
t↘0
p★(
tei, 0) −
p★(0, 0)
t
≥ − ★
i
p★(0, 0)
ui
= lim
t↗0
p★(
tei, 0) −
p★(0, 0)
t
≤ − ★
i
hence, equality
p★(
u) for a problem with one (inequality) constraint:
u
p
★(
u)
p
★(0) − ★
u
u = 0
Convex Optimization
Boyd and Vandenberghe
5.25
Outline
Lagrangian and dual function
Lagrange dual problem
KKT conditions
Sensitivity analysis
Problem reformulations
Theorems of alternatives
Convex Optimization
Boyd and Vandenberghe
5.26
Duality and problem reformulations
▶ equivalent formulations of a problem can lead to very different duals
▶ reformulating primal problem can be useful when dual is difficult to derive, or uninteresting
common reformulations
▶ introduce new variables and equality constraints
▶ make explicit constraints implicit or vice-versa
▶ transform objective or constraint functions,
e.g., replace
f0(
x) by (
f0(
x)) with convex,
increasing
Convex Optimization
Boyd and Vandenberghe
5.27
Introducing new variables and equality constraints
▶ unconstrained problem: minimize
f0(
Ax +
b)
▶ dual function is constant:
g = inf
x L(
x) = inf
x f0(
Ax +
b) =
p★
▶ we have strong duality, but dual is quite useless
▶ introduce new variable
y and equality constraints
y =
Ax +
b
minimize
f0(
y)
subject to
Ax +
b −
y = 0
▶ dual of reformulated problem is
maximize
bT
−
f∗
0
( )
subject to
AT
= 0
▶ a nontrivial, useful dual (assuming the conjugate
f∗
0
is easy to express)
Convex Optimization
Boyd and Vandenberghe
5.28
Example: Norm approximation
▶ minimize ∥
Ax −
b∥
▶ reformulate as minimize ∥
y∥ subject to
y =
Ax −
b
▶ recall conjugate of general norm:
∥
z∥∗ =
{ 0
∥
z∥∗ ≤ 1
∞ otherwise
▶ dual of (reformulated) norm approximation problem:
maximize
bT
subject to
AT
= 0, ∥ ∥∗ ≤ 1
Convex Optimization
Boyd and Vandenberghe
5.29
Outline
Lagrangian and dual function
Lagrange dual problem
KKT conditions
Sensitivity analysis
Problem reformulations
Theorems of alternatives
Convex Optimization
Boyd and Vandenberghe
5.30
Theorems of alternatives
▶ consider two systems of inequality and equality constraints
▶ called
weak alternatives if no more than one system is feasible
▶ called
strong alternatives if exactly one of them is feasible
▶ examples: for any
a ∈
R, with variable
x ∈
R,
–
x >
a and
x ≤
a − 1 are weak alternatives
–
x >
a and
x ≤
a are strong alternatives
▶ a
theorem of alternatives states that two inequality systems are (weak or strong)
alternatives
▶ can be considered the extension of duality to feasibility problems
Convex Optimization
Boyd and Vandenberghe
5.31
Feasibility problems
▶ consider system of (not necessarily convex) inequalities and equalities
fi(
x) ≤ 0,
i = 1,...,
m,
hi(
x) = 0,
i = 1,...,
p
▶ express as
feasibility problem
minimize
0
subject to
fi(
x) ≤ 0,
i = 1,...,
m,
hi(
x) = 0,
i = 1,...,
p
▶ if system if feasible,
p★ = 0; if not,
p★ = ∞
Convex Optimization
Boyd and Vandenberghe
5.32
Duality for feasibility problems
▶ dual function of feasibility problem is
g( , ) = inf
x
(
m
i=1
ifi(
x) +
p
i=1
ihi(
x)
)
▶ for ⪰ 0, we have
g( , ) ≤
p★
▶ it follows that feasibility of the inequality system
⪰ 0,
g( , ) > 0
implies the original system is infeasible
▶ so this is a weak alternative to original system
▶ it is strong if
fi convex,
hi affine, and a constraint qualification holds
▶
g is positive homogeneous so we can write alternative system as
⪰ 0,
g( , ) ≥ 1
Convex Optimization
Boyd and Vandenberghe
5.33
Example: Nonnegative solution of linear equations
▶ consider system
Ax =
b,
x ⪰ 0
▶ dual function is
g( , ) =
{ −
Tb AT
=
−∞
otherwise
▶ can express strong alternative of
Ax =
b,
x ⪰ 0 as
AT
⪰ 0,
Tb ≤ −1
(we can replace
Tb ≤ −1 with
Tb = −1)
Convex Optimization
Boyd and Vandenberghe
5.34
Farkas’ lemma
▶ Farkas’ lemma:
Ax ⪯ 0,
cTx < 0 and
ATy +
c = 0,
y ⪰ 0
are strong alternatives
▶ proof: use (strong) duality for (feasible) LP
minimize
cTx
subject to
Ax ⪯ 0
Convex Optimization
Boyd and Vandenberghe
5.35
Investment arbitrage
▶ we invest
xj in each of
n assets 1,...,
n with prices
p1,...,
pn
▶ our initial cost is
pTx
▶ at the end of the investment period there are only
m possible outcomes
i = 1,...,
m
▶
Vij is the
payoff or final value of asset
j in outcome
i
▶ first investment is risk-free (cash):
p1 = 1 and
Vi1 = 1 for all
i
▶
arbitrage means there is
x with
pTx < 0,
Vx ⪰ 0
▶ arbitrage means we receive money up front, and our investment cannot lose
▶ standard assumption in economics: the prices are such that
there is no arbitrage
Convex Optimization
Boyd and Vandenberghe
5.36
Absence of arbitrage
▶ by Farkas’ lemma, there is no arbitrage ⇐⇒ there exists
y ∈
Rm
+ with
VTy =
p
▶ since first column of
V is
1, we have
1Ty = 1
▶
y is interpreted as a
risk-neutral probability on the outcomes 1,...,
m
▶
VTy are the expected values of the payoffs under the risk-neutral probability
▶ interpretation of
VTy =
p:
asset prices equal their expected payoff under the risk-neutral probability
▶
arbitrage theorem: there is no arbitrage ⇔ there exists a risk-neutral probability
distribution under which each asset price is its expected payoff
Convex Optimization
Boyd and Vandenberghe
5.37
Example
V =
1.0 0.5 0.0
1.0 0.8 0.0
1.0 1.0 1.0
1.0 1.3 4.0
,
p =
1.0
0.9
0.3
,
˜
p =
1.0
0.8
0.7
▶ with prices
p, there is an arbitrage
x =
6.2
−7.7
1.5
,
pTx = −0.2,
1Tx = 0,
Vx =
2.35
0.04
0.00
2.19
▶ with prices ˜
p, there is no arbitrage, with risk-neutral probability
y =
0.36
0.27
0.26
0.11
VTy =
1.0
0.8
0.7
Convex Optimization
Boyd and Vandenberghe
5.38