Distributed Optimization via Alternating Direction Method of Multipliers

Page 1 |

Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato
Stanford University

ITMANET, Stanford, January 2011

Page 2 |

• precursors
  – dual decomposition
  – method of multipliers
• alternating direction method of multipliers
• applications/examples
• conclusions/big picture


Page 3 |

• convex equality constrained optimization problem

minimize f(x) subject to Ax = b

• Lagrangian: L(x, y) = f(x) + y^T (Ax − b)
• dual function: g(y) = inf_x L(x, y)
• dual problem: maximize g(y)
• recover x⋆ = argmin_x L(x, y⋆)
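As a concrete check (not on the slide), the construction can be verified numerically for the illustrative choice f(x) = (1/2)‖x‖_2^2, where both the x-minimization and the dual maximization are closed-form; the data A, b below are assumptions:

```python
import numpy as np

# Illustrative instance with f(x) = (1/2)||x||^2 (an assumed choice):
# inf_x L(x, y) is attained at x = -A^T y, so
# g(y) = -(1/2)||A^T y||^2 - b^T y, maximized where A A^T y = -b.
A = np.array([[1., 0., 2., 0., 1.],
              [0., 1., 0., 1., 1.],
              [1., 1., 0., 0., 2.]])
b = np.array([1., 2., 3.])

y_star = np.linalg.solve(A @ A.T, -b)  # maximize g(y)
x_star = -A.T @ y_star                 # recover x* = argmin_x L(x, y*)

print(np.linalg.norm(A @ x_star - b))  # x* is feasible: Ax* = b
```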


Page 4 |

• gradient method for dual problem: y^{k+1} = y^k + α^k ∇g(y^k)
• ∇g(y^k) = Ax̃ − b, where x̃ = argmin_x L(x, y^k)
• dual ascent method is

x^{k+1} := argmin_x L(x, y^k)          // x-minimization
y^{k+1} := y^k + α^k (Ax^{k+1} − b)    // dual update

• works, with lots of strong assumptions
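The dual ascent loop can be sketched in a few lines for the illustrative choice f(x) = (1/2)‖x‖_2^2 (so the x-minimization is x = −A^T y); the data and the fixed step size α are assumptions, not from the slides:

```python
import numpy as np

# Dual ascent for minimize (1/2)||x||^2 subject to Ax = b (illustrative data).
A = np.array([[1., 0., 2., 0., 1.],
              [0., 1., 0., 1., 1.],
              [1., 1., 0., 0., 2.]])
b = np.array([1., 2., 3.])
alpha = 1.0 / np.linalg.eigvalsh(A @ A.T).max()  # a safe fixed step size

y = np.zeros(3)
for k in range(300):
    x = -A.T @ y                 # x-minimization: argmin_x L(x, y)
    y = y + alpha * (A @ x - b)  # dual update

print(np.linalg.norm(A @ x - b))  # primal residual shrinks toward 0
```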


Page 5 |

• suppose f is separable:

f(x) = f_1(x_1) + ··· + f_N(x_N),  x = (x_1, ..., x_N)

• then L is separable in x: L(x, y) = L_1(x_1, y) + ··· + L_N(x_N, y) − y^T b, where

L_i(x_i, y) = f_i(x_i) + y^T A_i x_i

• x-minimization in dual ascent splits into N separate minimizations

x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k)

which can be carried out in parallel


Page 6 |

• dual decomposition (Everett, Dantzig, Wolfe, Benders 1960–65)

x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k),  i = 1, ..., N
y^{k+1} := y^k + α^k (∑_{i=1}^N A_i x_i^{k+1} − b)

• scatter y^k; update x_i in parallel; gather A_i x_i^{k+1}
• solve a large problem
  – by iteratively solving subproblems (in parallel)
  – dual variable update provides coordination
• works, with lots of assumptions; often slow
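A minimal sketch of the scatter/update/gather pattern, assuming scalar blocks f_i(x_i) = (1/2)(x_i − c_i)^2 and a single coupling constraint a^T x = b (all data illustrative); each x_i-minimization is independent and could run in parallel:

```python
import numpy as np

# Dual decomposition: minimize sum_i (1/2)(x_i - c_i)^2 subject to a^T x = b.
rng = np.random.default_rng(1)
N = 10
a = rng.standard_normal(N)       # here A_i is the scalar a_i
c = rng.standard_normal(N)
b = 1.0
alpha = 0.5 / (a @ a)            # a safe fixed step size

y = 0.0
for k in range(200):
    x = c - a * y                # N independent x_i-minimizations (scatter y)
    y = y + alpha * (a @ x - b)  # dual update (gather the a_i * x_i)

print(abs(a @ x - b))
```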


Page 7 |

• a method to robustify dual ascent
• use augmented Lagrangian (Hestenes, Powell 1969), ρ > 0

L_ρ(x, y) = f(x) + y^T (Ax − b) + (ρ/2)‖Ax − b‖_2^2

• method of multipliers (Hestenes, Powell; analysis in Bertsekas 1982)

x^{k+1} := argmin_x L_ρ(x, y^k)
y^{k+1} := y^k + ρ (Ax^{k+1} − b)

(note specific dual update step length ρ)
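A sketch of the method of multipliers on an illustrative problem, minimize (1/2)‖x‖_2^2 subject to Ax = b (data assumed, not from the slides); note the dual update uses step length ρ:

```python
import numpy as np

# Method of multipliers for minimize (1/2)||x||^2 subject to Ax = b.
A = np.array([[1., 0., 2., 0., 1.],
              [0., 1., 0., 1., 1.],
              [1., 1., 0., 0., 2.]])
b = np.array([1., 2., 3.])
rho = 1.0

M = np.eye(5) + rho * A.T @ A    # x-update solves (I + rho A^T A) x = A^T (rho b - y)
y = np.zeros(3)
for k in range(60):
    x = np.linalg.solve(M, A.T @ (rho * b - y))  # argmin_x L_rho(x, y^k)
    y = y + rho * (A @ x - b)                    # dual update with step rho

print(np.linalg.norm(A @ x - b))
```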


Page 8 |

• good news: converges under much more relaxed conditions

(f can be nondifferentiable, take on value +∞, ...)

• bad news: quadratic penalty destroys splitting of the x-update, so can't do decomposition


Page 9 |

• a method
  – with good robustness of method of multipliers
  – which can support decomposition

"robust dual decomposition" or "decomposable method of multipliers"

• proposed by Gabay, Mercier, Glowinski, Marrocco in 1976


Page 10 |

• ADMM problem form (with f, g convex)

minimize f(x) + g(z)
subject to Ax + Bz = c

  – two sets of variables, with separable objective

• L_ρ(x, z, y) = f(x) + g(z) + y^T (Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖_2^2

• ADMM:

x^{k+1} := argmin_x L_ρ(x, z^k, y^k)               // x-minimization
z^{k+1} := argmin_z L_ρ(x^{k+1}, z, y^k)           // z-minimization
y^{k+1} := y^k + ρ (Ax^{k+1} + Bz^{k+1} − c)       // dual update
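A familiar instance of this form (not on this slide) is the lasso: f(x) = (1/2)‖Dx − d‖_2^2, g(z) = λ‖z‖_1, with constraint x − z = 0 (so A = I, B = −I, c = 0); the z-minimization is then soft thresholding. A sketch with illustrative data:

```python
import numpy as np

# ADMM for the lasso: minimize (1/2)||D x - d||^2 + lam*||z||_1, s.t. x = z.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 8))
d = rng.standard_normal(20)
lam, rho = 0.5, 1.0

x = np.zeros(8); z = np.zeros(8); y = np.zeros(8)
M = D.T @ D + rho * np.eye(8)
for k in range(500):
    x = np.linalg.solve(M, D.T @ d + rho * z - y)          # x-minimization
    v = x + y / rho
    z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0)  # z-min: soft threshold
    y = y + rho * (x - z)                                  # dual update

print(np.linalg.norm(x - z))  # consensus residual
```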


Page 11 |

• if we minimized over x and z jointly, reduces to method of multipliers
• instead, we do one pass of a Gauss-Seidel method
• we get splitting since we minimize over x with z fixed, and vice versa


Page 12 |

• assume (very little!)
  – f, g convex, closed, proper
  – L_0 has a saddle point
• then ADMM converges:
  – iterates approach feasibility: Ax^k + Bz^k − c → 0
  – objective approaches optimal value: f(x^k) + g(z^k) → p⋆


Page 13 |

• operator splitting methods (Douglas, Peaceman, Rachford, Lions, Mercier, . . . 1950s, 1979)
• proximal point algorithm (Rockafellar 1976)
• Dykstra's alternating projections algorithm (1983)
• Spingarn's method of partial inverses (1985)
• Rockafellar-Wets progressive hedging (1991)
• proximal methods (Rockafellar, many others, 1976–present)
• Bregman iterative methods (2008–present)
• most of these are special cases of the proximal point algorithm


Page 14 |

• want to solve problem with N objective terms

minimize ∑_{i=1}^N f_i(x)

  – e.g., f_i is the loss function for ith block of training data

• ADMM form:

minimize ∑_{i=1}^N f_i(x_i)
subject to x_i − z = 0

  – x_i are local variables
  – z is the global variable
  – x_i − z = 0 are consistency or consensus constraints


Page 15 |

• L_ρ(x, z, y) = ∑_{i=1}^N ( f_i(x_i) + y_i^T (x_i − z) + (ρ/2)‖x_i − z‖_2^2 )

• ADMM:

x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT} (x_i − z^k) + (ρ/2)‖x_i − z^k‖_2^2 )
z^{k+1} := (1/N) ∑_{i=1}^N ( x_i^{k+1} + (1/ρ) y_i^k )
y_i^{k+1} := y_i^k + ρ (x_i^{k+1} − z^{k+1})
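The three updates above can be sketched with quadratic terms f_i(x_i) = (1/2)‖x_i − c_i‖_2^2 (an assumed choice, so every step is closed-form); the consensus variable then converges to the average of the c_i:

```python
import numpy as np

# Consensus ADMM with f_i(x) = (1/2)||x - c_i||^2 (illustrative data),
# whose consensus solution is the average of the c_i.
rng = np.random.default_rng(0)
N, n, rho = 5, 3, 1.0
c = rng.standard_normal((N, n))

x = np.zeros((N, n)); y = np.zeros((N, n)); z = np.zeros(n)
for k in range(100):
    x = (c + rho * z - y) / (1 + rho)  # local x_i-updates (parallelizable)
    z = (x + y / rho).mean(axis=0)     # averaging z-update
    y = y + rho * (x - z)              # local dual updates

print(np.linalg.norm(z - c.mean(axis=0)))  # distance to the consensus solution
```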


Page 16 |

• using ∑_{i=1}^N y_i^k = 0, algorithm simplifies to

x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT} (x_i − x̄^k) + (ρ/2)‖x_i − x̄^k‖_2^2 )
y_i^{k+1} := y_i^k + ρ (x_i^{k+1} − x̄^{k+1})

where x̄^k = (1/N) ∑_{i=1}^N x_i^k

• in each iteration
  – gather x_i^k and average to get x̄^k
  – scatter the average x̄^k to processors
  – update y_i^k locally (in each processor, in parallel)
  – update x_i locally


Page 17 |

• f_i is negative log-likelihood for parameter x given ith data block
• x_i^{k+1} is MAP estimate under prior N(x̄^k + (1/ρ) y_i^k, ρI)
• prior mean is previous iteration's consensus shifted by "price" of processor i disagreeing with previous consensus
• processors only need to support a Gaussian MAP method
  – type or number of data in each block not relevant
  – consensus protocol yields global maximum-likelihood estimate


Page 18 |

• data (examples) (a_i, b_i), i = 1, ..., N, a_i ∈ R^n, b_i ∈ {−1, +1}
• linear classifier sign(a^T w + v), with weight w, offset v
• margin for ith example is b_i (a_i^T w + v); want margin to be positive
• loss for ith example is l(b_i (a_i^T w + v))
  – l is loss function (hinge, logistic, probit, exponential, . . . )
• choose w, v to minimize (1/N) ∑_{i=1}^N l(b_i (a_i^T w + v)) + r(w)
  – r(w) is regularization term (ℓ_2, ℓ_1, ...)
• split data and use ADMM consensus to solve
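A sketch of consensus ADMM for this setup, with simplifying assumptions not made on the slide: logistic loss (smooth, so the local x_i-updates can be approximated by a few warm-started gradient steps), no offset v, r(w) = λ‖w‖_2^2, and synthetic separable data split into N blocks of m examples:

```python
import numpy as np

# Consensus ADMM for regularized linear classification (illustrative data).
# Assumptions: logistic loss l(u) = log(1 + exp(-u)), no offset v,
# r(w) = lam*||w||^2, approximate local minimizations via gradient steps.
rng = np.random.default_rng(0)
N, m, n = 4, 25, 2                    # blocks, examples per block, features
a = rng.standard_normal((N, m, n))
w_true = np.array([1.0, -2.0])        # synthetic separating direction
bsign = np.sign(a @ w_true)           # labels in {-1, +1}
lam, rho = 0.1, 1.0

x = np.zeros((N, n)); y = np.zeros((N, n)); z = np.zeros(n)
for k in range(200):
    for i in range(N):                # local x_i-updates (parallelizable)
        for _ in range(20):           # warm-started gradient steps ~ argmin
            u = bsign[i] * (a[i] @ x[i])                            # margins
            g = -(a[i] * (bsign[i] / (1 + np.exp(u)))[:, None]).mean(axis=0)
            x[i] -= 0.2 * (g + y[i] + rho * (x[i] - z))
    # z-update: argmin_z lam*||z||^2 + sum_i ( y_i^T(x_i - z) + (rho/2)||x_i - z||^2 )
    z = (rho * x + y).sum(axis=0) / (N * rho + 2 * lam)
    y = y + rho * (x - z)             # local dual updates

acc = np.mean(np.sign(a.reshape(-1, n) @ z) == bsign.reshape(-1))
print(acc)  # training accuracy of the consensus classifier
```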


Page 19 |

• hinge loss l(u) = (1 − u)+ with ℓ_2 regularization
• baby problem with n = 2, N = 400 to illustrate
• examples split into 20 groups, in worst possible way:
  each group contains only positive or negative examples


Page 20–22 |

[three figure slides showing classifier iterates on the baby problem; only the axis ranges (−3 to 3 horizontal, −10 to 10 vertical) are recoverable from the text]

Page 23 |

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers (Boyd, Parikh, Chu, Peleato, Eckstein), available at Boyd's web site

