
Multi-stage Stochastic Programming, Stochastic Decomposition, and Connections to Dynamic Programming: An Algorithmic Perspective

Page 1
Data Driven Decisions @ The Ohio State University Industrial and Systems Engineering
Multi-stage Stochastic Programming, Stochastic Decomposition, and Connections to Dynamic Programming: An Algorithmic Perspective
Suvrajeet Sen, Data Driven Decisions Lab, ISE Department, Ohio State University, Columbus, OH

Page 2
The Plan – Part I (70 minutes)
∎ The Mathematical Models
❑ 2-stage Stochastic Programs with Recourse
❑ T-stage Stochastic Programs with Recourse
❑ Non-anticipativity and Measurability of Decisions
∎ Energy Infrastructure Applications of SP
∎ Foundations for 2-stage Programs
❑ Subgradients of the Expected Recourse Function
❑ Subgradient and Stochastic Quasi-gradient Methods
❑ Deterministic Decomposition (Kelley/Benders/L-shaped)
❑ Stochastic Decomposition

Page 3
The Plan – Part II (70 minutes)
∎ Algorithms for T-stage Programs
❑ Scenario Decomposition
∎ Progressive Hedging Algorithm
❑ Tree-traversal (time-staged) methods
∎ Nested Benders' Decomposition & Variants
❑ Discrete-time Dynamic Systems (not covered)
❑ Simulating Optimization (not covered)
∎ Multi-stage Extensions of Stochastic Decomposition
∎ Comparative Remarks and Conclusions

Page 4
The Mathematical Models

Page 5
2-stage Stochastic Program with Recourse
Deterministic Data
Avoids having to deal with Zero Prob. Events
Stochastic Data
SP as a Large-Scale Linear Program
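For reference, a common textbook statement of the 2-stage model and of its extensive ("large-scale LP") form is sketched below; the notation (c, A, b, g, D, C, ξ) is a generic assumption, not the slide's own symbols.

```latex
% Generic 2-stage SLP with recourse and its extensive form (assumed notation):
\begin{align*}
&\min_{x}\; c^\top x + \mathbb{E}\big[h(x,\tilde\omega)\big]
   \quad \text{s.t. } Ax \le b,\; x \ge 0,\\
&\text{where } h(x,\omega) = \min_{y}\;\{\, g^\top y \;:\; Dy = \xi(\omega) - Cx,\; y \ge 0 \,\}.\\[4pt]
&\text{Extensive form over finitely many scenarios } \omega^1,\dots,\omega^S
   \text{ with probabilities } p_s:\\
&\min_{x,\,y_1,\dots,y_S}\; c^\top x + \sum_{s=1}^{S} p_s\, g^\top y_s
   \quad \text{s.t. } Ax \le b,\;\; Dy_s = \xi(\omega^s) - Cx,\;\; x,\, y_s \ge 0.
\end{align*}
```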

Page 6
T-stage Stochastic Program with Recourse
Nested Expectation Formulation (à la DP) – with Deterministic Data, Stochastic Data, and Non-anticipative decisions; constraints hold almost surely (a.s.).
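A generic nested-expectation statement of the T-stage problem, in notation that matches the 2-stage sketch above (an assumption, not the slide's formula), is:

```latex
% Nested expectation form of the T-stage problem (assumed notation):
\min_{x_0 \in X_0} c_0^\top x_0
 + \mathbb{E}\Big[\min_{x_1 \in X_1(x_0,\,\omega_1)} c_1^\top x_1
 + \mathbb{E}\big[\;\cdots\;
 + \mathbb{E}\big[\min_{x_{T-1} \in X_{T-1}(x_{T-2},\,\omega_{T-1})} c_{T-1}^\top x_{T-1}\big]\cdots\big]\Big]
```

where each x_t may depend only on the history (ω_1, …, ω_t) (non-anticipativity), and the constraints hold almost surely.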

Page 7
What do we mean by Non-anticipativity?
Paths 1 and 2 in the scenario tree share the same history until t = 2, so non-anticipativity requires that they also share the same decisions until t = 2.
[Scenario tree figure: stage-1 nodes x_11, x_21; stage-2 nodes x_12, x_22, x_32, x_42; stage-3 nodes x_13, x_23, …, x_83.]

Page 8
Observations of the Decision Process
Each observation of the decision process is one path through the scenario tree:
(x_0, x_11, x_12, x_13), (x_0, x_11, x_12, x_23), (x_0, x_11, x_22, x_33), (x_0, x_11, x_22, x_43),
(x_0, x_21, x_32, x_53), (x_0, x_21, x_32, x_63), (x_0, x_21, x_42, x_73), (x_0, x_21, x_42, x_83)
[Scenario tree figure, as on the previous slide.]

Page 9
Probabilistic Language: Measurability
The decision x_t is measurable with respect to the σ-algebra ℱ_t. If we view the decisions as a stochastic process, then one can assign probability measures (on decisions) that are consistent with the stochastic process embedded in the scenario tree.
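In symbols (generic notation, assumed here), non-anticipativity is the measurability requirement:

```latex
% Non-anticipativity as measurability (assumed notation):
x_t \ \text{is } \mathcal{F}_t\text{-measurable for every } t,
\qquad \text{equivalently} \qquad
x_t = \mathbb{E}\big[\,x_t \mid \mathcal{F}_t\,\big] \ \text{a.s.}
```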

Page 10
What does this mean for algorithms?
For multi-stage SP models, it is necessary to track (non-anticipative) decisions at each node of the scenario tree. So, for multi-stage SD, we will track decisions by node number.
[Scenario tree figure, as before.]

Page 11
Matching mathematical models with algorithms
∎ Stochastic LPs as Linear Programs
❑ Specialized Factorization Methods for Simplex and Interior Point Methods (few scenarios)
∎ Scenario Decomposition
❑ Progressive Hedging Algorithm
∎ Tree-traversal (time-staged) methods
❑ Nested Benders' Decomposition & Variants
∎ Simulating Optimization (time permitting)
❑ Stochastic Decomposition

Page 12
Energy Infrastructure Applications of SP

Page 13
Why Stochastic Programming in Power Systems? Long History
∎ Reliability metrics in widespread use
❑ Loss of Load Probability (LOLP)
❑ Loss of Load Expectation (LOLE)
❑ Expected Loss of Load Duration (ELOLD)
∎ Some SP references prior to 2000
❑ Murphy, Sen and Soyster (1981) – Generation Planning
❑ Louveaux and Smeers (1981) – Generation Planning
❑ Sherali, Sen and Soyster (1984) – Electricity Prices
❑ Prekopa and Boros (1989) – System Reliability
❑ Hobbs and Maheshwari (1990) – System Planning
❑ Frauendorfer, Glavitsch, and Bacher (1992) – System Operations
❑ Takriti, Birge and Long (1996) – Stochastic UC (SMIP)
❑ Takriti, Kassenbrink and Wu (2000) – Electricity Contracts
∎ Conference on Probabilistic Methods Applied to Power Systems (since the early 1990s)

Page 14
Economic Dispatch Problem
❑ Next-generation power grids are highly dynamic
▪ Distributed Storage
▪ Cogeneration
▪ Large-Scale Renewable Generation
▪ Real-time Pricing
❑ This mandates more proactive and fast operational systems such as Economic Dispatch (ED).
❑ The ED system updates the output levels of the committed generators to match the load demands in a cost-optimal manner.
▪ Present ED systems use forecasts on the order of 2 hours (myopic).
▪ Steep trends such as wind ramping degrade the performance of the ED system.
▪ Increasing the foresight and resolution of the ED problem comes at the expense of additional computational complexity.

Page 15
Economic Dispatch Problem
[Network diagram: loads, a wind generator, and thermal generators.]

Page 16
ED Problem Formulation Features (V. Zavala, Argonne National Labs)
∎ 2-stage model of a T-period application
❑ Decisions for the first stage – played out over the next T-1 periods as the second stage
❑ Randomness in the second stage is wind
∎ Each stage has a Cost of Generation and the following constraints:
❑ Generation capacity constraints
❑ Power Flow
❑ Flow Balance
❑ Power Flow Bounds
❑ Bus-angle Bounds
❑ Wind ramp constraints (randomness)
❑ Generation ramp constraints
∎ First stage only has generation ramp constraints (bounds)

Page 17
Foundations for 2-stage Programs: A Review of Basics

Page 18
The commonly stated 2-stage SLP (will be stated again, as needed)
∎ Let a random variable be defined on a probability space
∎ Then the "static" formulation of a stochastic program is given by the 2-stage SLP
∎ Why call it "static"?
❑ The outcome is revealed once, and the rest of the decision model becomes deterministic
❑ This process may be repeated many times.
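As a reference for the next slides, the static formulation can be restated in the generic notation used earlier (an assumption, not the slide's own symbols):

```latex
% "Static" 2-stage formulation (assumed notation):
\min_{x \in X}\; f(x) := c^\top x + \mathbb{E}\big[h(x,\tilde\omega)\big],
\qquad
h(x,\omega) = \min_{y \ge 0}\{\, g^\top y \;:\; Dy = \xi(\omega) - Cx \,\}.
```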

Page 19
The Recourse Function and its Expectation
∎ Usually, the matrix C_1 is NOT fixed. For this presentation we assume that it is fixed; this is called the "fixed recourse" assumption.
∎ Assuming that h(x_0; ω) is finite, LP duality gives a dual (max) representation of the recourse function.
∎ LP theory also implies that h(·; ω) is piecewise linear and convex, and moreover its subgradients come from the optimal dual multipliers (see the sketch below).
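With the generic notation above and fixed recourse, the dual form and the resulting subgradient referred to on this slide can be written as follows (a sketch, not the slide's exact formulas):

```latex
% Dual representation and subgradient of the recourse function (assumed notation):
h(x,\omega) = \max_{\pi}\;\{\, \pi^\top(\xi(\omega) - Cx) \;:\; D^\top \pi \le g \,\},
\qquad
-C^\top \pi^*(x,\omega) \in \partial_x\, h(x,\omega),
```

so that, interchanging expectation and subdifferentiation, E[−C^T π*(x, ω̃)] is a subgradient of E[h(x, ω̃)].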

Page 20
Subgradient Method (Shor/Polyak/Nesterov/Nemirovsky, …)
∎ At iteration k, let the current iterate x^k be given
∎ Let a subgradient of the objective at x^k be computed
∎ Then take the projected step x^{k+1} = Proj_X( x^k − a_k · (subgradient) ), where Proj_X denotes the projection operator onto the set of decisions
∎ Note that the exact subgradient of the expected recourse function is very difficult to compute!
Enter SQG! Use an unbiased estimate of the subgradient instead
∎ How? Use a sample of size N_k (a sketch follows below)
(Interchange of expectation and subdifferentiation is required here)
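A minimal Python sketch of one projected step with a sampled subgradient estimate; `sample_outcomes`, `solve_second_stage_dual`, and `project` are assumed helper functions, and c and C follow the generic notation above rather than the lecture's own code.

```python
import numpy as np

# Hypothetical sketch of one projected (stochastic quasi-)subgradient step.
# The helpers below are assumptions, not part of the lecture material.
def sqg_step(x_k, k, c, C, sample_outcomes, solve_second_stage_dual, project,
             N_k=1, a0=1.0):
    """x_{k+1} = Proj_X( x_k - a_k * (c + estimated subgradient of E[h]) )."""
    outcomes = sample_outcomes(N_k)                  # N_k i.i.d. outcomes of omega
    g_hat = np.zeros_like(x_k)
    for omega in outcomes:
        pi = solve_second_stage_dual(x_k, omega)     # optimal duals of the recourse LP
        g_hat += -C.T @ pi                           # subgradient of h(., omega) at x_k
    g_hat /= len(outcomes)                           # unbiased estimate for E[h]
    a_k = a0 / (k + 1)                               # non-adaptive "Constant/k" step size
    return project(x_k - a_k * (c + g_hat))          # project onto the first-stage set X
```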

Page 21
Stochastic Quasi-gradient Method (SQG) (Ermoliev/Gaivoronski, …)
∎ At iteration k, let the current iterate be given
∎ Replace the exact subgradient of the previous slide with its unbiased (sampled) estimate
∎ Then take the same projected step, with step sizes satisfying, in addition, the conditions below:
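The additional step-size conditions referred to here are, in the classical form:

```latex
% Classical SQG step-size conditions:
a_k > 0, \qquad \sum_{k} a_k = \infty, \qquad \sum_{k} a_k^{2} < \infty
\qquad (\text{e.g. } a_k = \text{Constant}/k).
```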

Page 22
Strengths and Weaknesses of Subgradient Methods
∎ Strengths
❑ Easy to program, no master problem, and easily parallelizable
∎ Weaknesses
❑ Non-adaptive step sizes (e.g. Constant/k)
▪ Needs a lot of fine-tuning to determine the step size (e.g. the constant)
❑ Convergence
▪ The method makes good progress early on, but like other steepest-descent-type methods there is zig-zagging behavior
❑ Need ways to stop the algorithm
▪ Difficult because upper and lower bounds on objective values are difficult to obtain

Page 23
Kelley's Cutting Plane / Benders' / L-shaped Decomposition for the 2-stage SLP (recall the problem)
∎ Let a random variable be defined on a probability space
∎ Then the "static" formulation of the stochastic program is as stated earlier

Page 24
KBL Decomposition (J. Benders/Van Slyke/Wets)
∎ At iteration k, let the current iterate and the previously generated cuts be given. Recall the recourse function defined earlier.
∎ Then define the new cut from the optimal second-stage dual multipliers: its constant term is the constant term of the supporting hyperplane, and its slope is the "normal" of the supporting hyperplane (the same quantity used by the subgradient method). A sketch of the resulting loop follows below.
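A minimal Python sketch of the KBL loop, assuming relatively complete recourse (so no feasibility cuts are needed) and a finite scenario list; `solve_master`, `solve_second_stage_dual`, and `xi` are assumed helpers in the generic notation used earlier.

```python
import numpy as np

# Hypothetical sketch of the KBL (Kelley/Benders/L-shaped) loop; helper names are assumptions.
def kbl(scenarios, probs, c, C, xi, solve_master, solve_second_stage_dual,
        tol=1e-6, max_iter=100):
    cuts = []                                        # cuts of the form eta >= alpha + beta^T x
    lb, ub = -np.inf, np.inf
    x_k = None
    for _ in range(max_iter):
        x_k, eta_k = solve_master(cuts)              # min c^T x + eta s.t. Ax <= b and all cuts
        lb = max(lb, c @ x_k + eta_k)                # master value is a lower bound
        alpha, beta, exp_h = 0.0, np.zeros_like(x_k), 0.0
        for omega, p in zip(scenarios, probs):       # solve ALL second-stage LPs (expensive)
            pi = solve_second_stage_dual(x_k, omega)
            alpha += p * (pi @ xi(omega))            # constant term of the supporting hyperplane
            beta += p * (-(C.T @ pi))                # "normal" of the supporting hyperplane
            exp_h += p * (pi @ (xi(omega) - C @ x_k))
        ub = min(ub, c @ x_k + exp_h)                # candidate upper bound
        if ub - lb <= tol:                           # automatic stopping rule: UB - LB <= tol
            break
        cuts.append((alpha, beta))                   # add the new supporting hyperplane
    return x_k
```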

Page 25
KBL Graphical Illustration
[Graphical illustration: the expected recourse function together with successive piecewise-linear approximations f_{k-1} and f_k.]

Page 26
Comparing Subgradient Method and KBL Decomposition
∎ Both evaluate subgradients
∎ Expensive operation (requires solving as many second-stage LPs as there are scenarios)
∎ The step size in KBL is implicit (the user need not worry)
∎ The master program grows without bound and looks unstable in the early rounds
∎ The stopping rule is automatic (Upper Bound – Lower Bound ≤ ε)
∎ KBL's use of a master can be a bottleneck for parallelization

Page 27
Regularization of the Master Problem (Ruszczynski/Kiwiel/Lemarechal, …)
∎ Addresses the following issue:
❑ The master program grows without bound and looks unstable in the early rounds
∎ Include an incumbent and a proximity measure from the incumbent, using a positive weight (written σ below):
∎ Particularly useful in the case of Stochastic Decomposition.
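In symbols, with the incumbent written as x̄^k and the weight as σ (both assumed notation), the regularized master problem is:

```latex
% Regularized master problem (assumed notation):
x^{k+1} \in \arg\min_{x \in X}\;
\Big\{\, c^\top x + \max_{j \le k}\{\alpha_j + \beta_j^\top x\}
 + \tfrac{\sigma}{2}\,\| x - \bar{x}^{\,k} \|^{2} \,\Big\}.
```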

Page 28
Where do we stand at this point in the Lecture?
Feature \ Method               | Subgradient Method | SQG Algorithm | KBL Decomposition
Subgradient or Estimation      | Accurate           | Estimation    | Accurate
Step Length Choice Required    | Yes                | Yes           | No
Stopping Rules                 | Unknown            | Unknown       | Known
Parallel Computations          | Good               | Good          | Not so good – Good
Continuous Random Variables    | No                 | Yes           | No
First-stage Integer Variables  | No                 | No            | Yes
Second-stage Integer Variables | No                 | No            | No
Of course for small instances, we can always try deterministic equivalents!

Page 29
Stochastic Decomposition (Sequential Sampling) Higle/Sen
∎ Allow arbitrarily many outcomes (scenarios), including continuous random variables
∎ Requirement: the user can provide a simulator
∎ Assume: cost coefficients are deterministic (although random costs will be allowed soon)

Page 30
Central Question: Scalability of each iteration
∎ If the number of scenarios is large, can we afford to solve all second-stage LPs to obtain accurate subgradient estimates?
❑ No!
∎ Put another way: what is the smallest number of LPs we can solve in each iteration, and yet guarantee asymptotic convergence?
❑ The SD answer: 1!
∎ How?

Page 31
Approximating the recourse function in SD
∎ At the start of iteration k, sample one more outcome, say ω^k, independently of the previously sampled outcomes
∎ Given the current candidate solution, solve the second-stage LP for this new outcome only
∎ Add the resulting dual vertex to the set of dual vertices discovered so far, and calculate, for every outcome sampled so far, the best ("argmax") dual vertex from this set (a sketch follows below)
∎ Notice the mapping of outcomes to finitely many dual vertices
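A hypothetical Python sketch of this per-iteration work; `solve_second_stage_dual`, `xi`, and the dual-vertex list `V` are assumed names, not the authors' code.

```python
import numpy as np

# Hypothetical sketch of the SD dual-vertex bookkeeping for one iteration.
def sd_dual_vertices(x_k, past_outcomes, new_outcome, V, C, xi, solve_second_stage_dual):
    """Solve ONE new LP, then map every outcome seen so far to a known dual vertex."""
    pi_new = solve_second_stage_dual(x_k, new_outcome)   # the only LP solved this iteration
    V.append(pi_new)                                     # grow the set of discovered dual vertices
    chosen = []
    for omega in past_outcomes + [new_outcome]:
        rhs = xi(omega) - C @ x_k
        chosen.append(max(V, key=lambda pi: pi @ rhs))   # "argmax" over known dual vertices
    return chosen
```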

Page 32
Forming the approximation of the Expected Recourse Function.
∎ The estimated "cut" in SD is given by the sample average of the argmax values (see the formula below)
∎ Calculating this "cut" requires one LP (corresponding to the most recent outcome) plus the "argmax" operations at the bottom of the previous slide
∎ In addition, all previous cuts need to be updated, to make old cuts consistent with the changing sample size over iterations
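In the generic notation above, with sampled outcomes ω^1, …, ω^k and dual-vertex set V_k, the estimated cut can be written as (a sketch of the standard form):

```latex
% Estimated SD cut at iteration k (assumed notation):
f_k(x) \;=\; \frac{1}{k}\sum_{j=1}^{k} (\pi^{j}_{k})^\top\big(\xi(\omega^{j}) - Cx\big),
\qquad
\pi^{j}_{k} \in \arg\max_{\pi \in V_k}\, \pi^\top\big(\xi(\omega^{j}) - Cx^{k}\big),
```

which lower-bounds the sample-mean recourse function (1/k) Σ_j h(x, ω^j) at every x; because the sample mean changes with k, the coefficients of older cuts must be rescaled, which is the updating mentioned above.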

Page 33
Many more SD details (skipped here) – see Higle and Sen (1999)
∎ Updating previously generated subgradients
∎ Defining incumbents
∎ Using regularized approximations
∎ Dropping subgradients (finite master)
∎ Stopping rules – three phases
❑ The set of dual vertices stops changing
❑ The incumbent objective stabilizes
❑ The bootstrapped estimate of the distribution of the duality gap is acceptably small

Page 34
Further comparisons including 2-stage SD
Feature \ Method               | SQG Algorithm | KBL Decomposition  | Stochastic Decomposition
Subgradient or Estimation      | Estimation    | Accurate           | Estimation
Step Length Choice Required    | Yes           | Not Needed         | Not Needed
Stopping Rules                 | Unknown       | Well Studied       | Partially Solved
Parallel Computations          | Good          | Not so good – Good | Not known
Continuous Random Variables    | Yes           | No                 | Yes
First-stage Integer Variables  | No            | Yes                | Yes
Second-stage Integer Variables | No            | No                 | Available in a Dissertation
Of course for small instances, we can always try deterministic equivalents!

Page 35
Comparisons between SD and SAA (Higle/Zhao accepted for publication)
Problem (rvs)       | Method  | Approx. Value | Seconds
20Term (40 rvs)     | Reg. SD | 254,581 (79)  | 259.30 (31.85)
20Term              | SAA     | 254,512 (55)  | approx. 10,000
Fleet20_3 (200 rvs) | Reg. SD | 141,749 (18)  | 293.57 (2.45)
Fleet20_3           | SAA     | 141,654 (6.5) | approx. 12,000
SSN (80 rvs)        | Reg. SD | 10.26 (0.14)  | 7,491.81 (3,728.81)
SSN                 | SAA     | 10.57 (0.28)  | approx. 100,000

Page 36
Why the difference in computational times between SD and SAA?
❑ If there are 1000 outcomes in the SAA approximation, the subproblem LP must be solved for all 1000 outcomes in every iteration.
❑ Unlike SAA, the subproblem in SD is solved for only one outcome, while approximations are used for the other outcomes (from previous iterations). This explains the difference in computational times.

Page 37
Comparisons between SQG and SD
∎ Inventory coordination instance (Herer et al. 2006)
∎ The SQG method was used to solve a small inventory transshipment problem
❑ Find order quantities to minimize the total cost of inventory and transshipment
❑ The example has 7 outlets which can ship goods among themselves, if necessary
❑ Demand is normally distributed
❑ Herer et al. ran the SQG method for K = 3000 iterations, using subgradient estimates with N = 1000 simulated outcomes in each iteration

Page 38
SQG trajectory of order quantities

Page 39
SD Trajectory of Order Quantities

Page 40
Optimal Values from 20 SD Runs

Page 41
Iterations may vary depending on the seed

Page 42
Running times for SD and SQG

Page 43
Multi-stage Stochastic Programming Algorithms

Page 44
Non-anticipative Solutions by Scenario (Rockafellar/Wets)
The eight scenario paths, one decision sequence per scenario:
(x_0, x_11, x_12, x_13), (x_0, x_11, x_12, x_23), (x_0, x_11, x_22, x_33), (x_0, x_11, x_22, x_43),
(x_0, x_21, x_32, x_53), (x_0, x_21, x_32, x_63), (x_0, x_21, x_42, x_73), (x_0, x_21, x_42, x_83)
[Scenario tree figure, as on the earlier slides.]

Page 45
Non-anticipative Solutions by Node
[Scenario tree figure with nodes numbered 0 through 14.]
Nodal variables: X_n

Page 46
Scenario Tree
Relaxing Nonanticipativity Creates Clairvoyant Decisions
One unconstrained copy of the entire decision sequence for each scenario:
(x_0, x_1, x_2, x_3)^1, (x_0, x_1, x_2, x_3)^2, (x_0, x_1, x_2, x_3)^3, (x_0, x_1, x_2, x_3)^4,
(x_0, x_1, x_2, x_3)^5, (x_0, x_1, x_2, x_3)^6, (x_0, x_1, x_2, x_3)^7, (x_0, x_1, x_2, x_3)^8

Page 47
Progressive Hedging Algorithm: Coordination of Clairvoyant Decisions
∎ The non-anticipativity constraints X_n − (x_t)^ω = 0 may be viewed as a graph linking the nodal variables X_0, X_1, …, X_6 to the scenario copies (x_t)^ω
∎ There are no non-anticipativity constraints for the terminal stage

Page 48
Dualizing for Progressive Hedging
[Diagram: the primal constraints X_n − (x_t)^ω = 0, where t is the time of node n; since each X_n is free, the corresponding dual constraints require the multipliers attached to node n to sum to 0.]

Page 49
Dualizing for Progressive Hedging
[Diagram continued: the primal problem and the corresponding Lagrangian dual problem, with the per-node multiplier sums equal to 0.]
Page 50
Regularization of the Lagrangian Dual
Regularized Lagrangian Dual Problem
The incumbent and the multipliers are given at the start of any iteration, and the multipliers are assumed to satisfy dual feasibility. The Progressive Hedging strategy is really simple: fix two of the three categories of variables and optimize the third, in the following order: first the scenario (primal) decisions, then the primal X (the conditional mean), and finally the multipliers. Now repeat this procedure until the change in the estimated solution is within an acceptable range.

Page 51
Summary of the PHA Process
∎ First, minimize each scenario subproblem; note that this minimization only involves data for that one outcome
∎ Next, "minimizing" with respect to X gives a new estimate of the conditional expectations; this is simply the conditional expectation of the new scenario vectors
∎ Finally, update the dual multipliers (a sketch of the whole iteration follows below)
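A hedged Python sketch of one such iteration for the 2-stage special case (a single non-anticipativity group, so the conditional mean is just the probability-weighted mean); `solve_scenario_subproblem` and the penalty `rho` are assumed names. In the multi-stage case, the averaging in Step 2 is done per node of the scenario tree.

```python
# Hypothetical sketch of one Progressive Hedging iteration (2-stage special case).
def pha_iteration(scenarios, probs, x_bar, w, rho, solve_scenario_subproblem):
    # Step 1: per-scenario minimization of cost + w_s^T x + (rho/2)||x - x_bar||^2,
    #         which only involves data for that scenario (easily parallelized).
    x = {s: solve_scenario_subproblem(s, w[s], x_bar, rho) for s in scenarios}
    # Step 2: new non-anticipative estimate = probability-weighted mean of the x_s.
    x_bar_new = sum(p * x[s] for s, p in zip(scenarios, probs))
    # Step 3: multiplier update on the constraints x_s - x_bar = 0.
    w_new = {s: w[s] + rho * (x[s] - x_bar_new) for s in scenarios}
    return x, x_bar_new, w_new
```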

Page 52
Comments on the PHA Process
∎ The coordination process has no master problem, making it highly suited for parallelization
∎ The Lagrange multipliers provide ex-post estimates of prices or subsidies for every scenario
❑ But the search spaces are very large because of the exponentially many dual variables
∎ The method has also been used as a heuristic for Stochastic Mixed-Integer Programs (see Watson, Wets and Woodruff, as well as PySP, part of Coopr at Sandia)

Page 53
Nested Benders' Decomposition (Birge/Gassman/Dempster, …)
[Scenario tree figure, as before.]
Remember: all data and conditional probabilities of the multi-stage stochastic LP are supplied

Page 54
Nested Benders' Decomposition (Birge/Gassman/Dempster, …)
Information Visualization
• Upstream nodes place "orders" based on a local decision (e.g. x_22)
• Downstream nodes respond with prices (i.e. subgradients) and feasibility facets
[Scenario tree figure, as before.]

Page 55
Each Node of the Tree will "House" an LP
Notation: j is an index for a stage; i is an index of a node in stage j; i− ("i minus") denotes its upstream node.
Each node's LP uses prices (i.e. subgradients) and feasibility facets supplied by downstream nodes, together with "orders" from upstream.

Page 56
"Nested" Benders' Method
∎ Traverse the tree, solving LPs whenever feasible; in this case, pass a subgradient to the upstream node
∎ If any LP is infeasible, pass a "feasibility facet" to the upstream node (a sketch of one such sweep follows below)
∎ Question:
❑ Can this algorithm be run via asynchronous processing, and still converge?
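A coarse, hypothetical sketch of one recursive sweep, phrased in Python; `solve_node_lp` and the `tree` interface (`children`, `add_cut`) are assumptions, and the actual method repeats such sweeps (tracking bounds) until convergence.

```python
# Hypothetical sketch of one forward/backward sweep of Nested Benders' decomposition.
def nested_benders_sweep(tree, node, solve_node_lp, incoming=None):
    """Solve the LP housed at `node`, recurse into its children, return a cut for the parent."""
    status, x_n, dual_info = solve_node_lp(tree, node, incoming)
    if status == "infeasible":
        return ("feasibility_facet", dual_info)          # pass a feasibility facet upstream
    for child in tree.children(node):                    # place an "order" x_n with each child
        kind, cut = nested_benders_sweep(tree, child, solve_node_lp, x_n)
        tree.add_cut(node, child, kind, cut)             # child responds with prices / facets
    status, x_n, dual_info = solve_node_lp(tree, node, incoming)   # re-solve with the new cuts
    return ("optimality_cut", dual_info)                 # duals act as a subgradient upstream
```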

Page 57
Comments on "Nested" Benders'
∎ Has been extended to sampling the tree (but the same probabilities are used), so asymptotic convergence does NOT rely on subgradients that are stochastic (Philpott and Guan 2008)
∎ Extensions to stochastic subgradients are right around the corner (under revision)
