Statistics 910, #8
Introduction to ARMA Models
Overview
1. Modeling paradigm
2. Review of stationary linear processes
3. ARMA processes
4. Stationarity of ARMA processes
5. Identifiability of ARMA processes
6. Invertibility of ARMA processes
7. ARIMA processes
Modeling paradigm
Modeling objective A common measure used to assess many statistical models is their ability to reduce the input data to random noise. For example, we say that a regression model "fits well" if its residuals resemble iid random noise; with real data we often settle for residuals that are merely uncorrelated.

Filters and noise Model the observed time series as the output of an unknown process (or model) M "driven by" an input sequence of independent random errors {ϵt}, with ϵt iid ∼ Dist(0, σ²) (not necessarily normal):

    ϵt → Process M → Xt

From observing the output, say X1, ..., Xn, the modeling task is to characterize the process (and often to predict its course). This "signal processing" view may be more appealing in the context of, say, underwater acoustics rather than macroeconomic processes.

Prediction rationale If a model reduces the data to iid noise, then the model captures all of the relevant structure, at least in the sense that we obtain the decomposition

    Xt = E(Xt | Xt−1, Xt−2, ...) + ϵt = X̂t + ϵt .

Causal, one-sided Our notions of time and causation imply that the current value of the process cannot depend upon the future (it is nonanticipating), allowing us to express the process M as

    Xt = M(ϵt, ϵt−1, ...) .

Volterra expansion is a general (too general?) expansion, like the infinite Taylor-series expansion of a function, that expresses the process in terms of prior values of the driving input noise. Differentiating M with respect to each of its arguments, we obtain the one-sided expansion (Wiener 1958)

    Xt − µ = ∑_{j=0}^∞ ψj ϵt−j + ∑_{j,k} ψjk ϵt−j ϵt−k + ∑_{j,k,m} ψjkm ϵt−j ϵt−k ϵt−m + ··· ,

where, for example, ψjk = ∂²M/(∂ϵt−j ∂ϵt−k) evaluated at zero. The first summand on the right gives the linear expansion.

Linearity The resulting process is linear if Xt is a linear combination (weighted sum) of the inputs,

    Xt = ∑_{j=0}^∞ ψj ϵt−j .   (1)

Other processes are nonlinear. The process is also said to be causal (Defn 3.7) if there exists a white-noise sequence {ϵt} and an absolutely summable sequence (or sometimes an l2 sequence) {ψj} such that (1) holds. The key notion of causality is that the current observation is a function of current and past white-noise terms (analogous to a random variable that is adapted to a filtration).

Invertibility The linear representation (1) suggests a big problem for identifying and then estimating the process: it resembles a regression in which all of the explanatory variables are functions of the unobserved errors. The invertibility condition implies that we can also express the errors as a weighted sum of current and prior observations,

    ϵt = ∑_{j=0}^∞ πj Xt−j .

Thinking toward prediction, we will want an equivalence of the form (for any r.v. Y)

    E[Y | Xt, Xt−1, ...] = E[Y | ϵt, ϵt−1, ...] .

This equivalence implies that the information in the current and past errors is equivalent to the information in the current and past data (i.e., the two sigma-fields generated by the sequences are the same). Notice that the conditioning here relies on the infinite collection of prior values, not a finite collection back to some fixed point in time, such as t = 1.

Implication and questions The initial goal of time series modeling with the class of ARMA models defined next amounts to finding a parsimonious, linear model that reduces {Xt} to iid noise. Questions remain about how to do this:

1. Do such infinite sums of random variables exist, and how are they to be manipulated?
2. What types of stationary processes can this approach capture (i.e., which covariance functions)?
3. Can one express these models using few parameters?
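To make the "reduce the data to noise" idea concrete, here is a minimal numerical sketch (my own illustration, not part of the notes; the AR(1) example and all names are assumptions): simulate a first-order autoregression, fit the lag-one regression by least squares, and check that the residual autocorrelations look like those of white noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) process X_t = 0.7 X_{t-1} + eps_t (illustrative choice).
n, phi = 500, 0.7
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Fit X_t on X_{t-1} by ordinary least squares.
phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
resid = x[1:] - phi_hat * x[:-1]

# Sample autocorrelations of the residuals; if the model has captured the
# structure, they should be near zero for all lags h >= 1.
def sample_acf(y, max_lag=5):
    y = y - y.mean()
    denom = np.sum(y ** 2)
    return np.array([np.sum(y[h:] * y[:-h]) / denom for h in range(1, max_lag + 1)])

print("phi_hat:", phi_hat)
print("residual ACF:", sample_acf(resid))
```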
Review: Stationary Linear Processes
Notation S&S use {wt} as the canonical mean-zero, finite-variance white-noise process (which is not necessarily normally distributed), wt ∼ WN(0, σ²).

Convergence For the linear process defined as Xt = ∑j ψjwt−j to exist, we need assumptions on the weights ψj. An infinite sum is a limit,

    ∑_{j=0}^∞ ψjwt−j = lim_{n→∞} ∑_{j=0}^n ψjwt−j ,

and limits require a notion of convergence (how else do you decide whether you are close?). Modes of convergence for random variables include:

• Almost sure (almost everywhere, with probability one, w.p. 1): Xn →a.s. X means P{ω : lim Xn(ω) = X(ω)} = 1.

• In probability: Xn →P X means limn P{ω : |Xn(ω) − X(ω)| > ϵ} = 0 for every ϵ > 0.

• In mean square (or l2): the mean squared difference goes to zero, E(Xn − X)² → 0.

l2 convergence of the linear process requires that ∑j ψj² < ∞, i.e., {ψj} ∈ l2. Consequently (by the Cauchy–Schwarz inequality) |∑j ψjψj+k| ≤ ∑j ψj² < ∞.
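A small numerical aside (mine, not from the notes): for geometric weights ψj = φ^j with |φ| < 1, the mean-square tail of the series, σ² ∑_{j>n} ψj², shrinks to zero as n grows, which is exactly what l2 convergence of the partial sums requires.

```python
import numpy as np

# Illustrative geometric weights psi_j = phi**j, square summable for |phi| < 1.
phi, sigma2 = 0.9, 1.0
j = np.arange(0, 10_000)          # long truncation standing in for an infinite sum
psi = phi ** j

for n in (10, 50, 100, 500):
    tail = sigma2 * np.sum(psi[n + 1:] ** 2)   # variance of the terms beyond lag n
    print(f"n = {n:4d}   tail variance = {tail:.3e}")
```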
In general, if {ψj} ∈ l2 and {Xt} is stationary, then the linear filter

    Yt = ∑_{j=0}^∞ ψjXt−j

defines a stationary process with covariance function

    Cov(Yt+h, Yt) = γY(h) = ∑_{j,k} ψjψk γX(h − j + k) .

Informally, Var(Yt) = ∑_{j,k} ψjψk γX(k − j), which is bounded by γX(0)(∑j |ψj|)² when the weights are absolutely summable.

Covariances When the "input" is white noise, the covariances are infinite sums of products of the coefficients,

    γY(h) = σ² ∑_{j=0}^∞ ψjψj+h ,   (2)

where σ² denotes the variance of the white-noise input.
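As a quick numerical check of (2) (my own sketch; the weights ψj = 0.8^j and all names are illustrative assumptions), simulate a truncated linear process driven by Gaussian white noise and compare sample autocovariances with σ² ∑j ψjψj+h.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative square-summable weights psi_j = 0.8**j, truncated at m terms.
m, sigma = 200, 1.0
psi = 0.8 ** np.arange(m)

# Simulate X_t = sum_j psi_j w_{t-j} by convolving white noise with the weights.
n = 20_000
w = rng.normal(scale=sigma, size=n + m)
x = np.convolve(w, psi, mode="valid")[:n]

def sample_acov(y, h):
    y = y - y.mean()
    return np.sum(y[h:] * y[:len(y) - h]) / len(y)

for h in range(4):
    theory = sigma**2 * np.sum(psi[:m - h] * psi[h:])   # formula (2), truncated
    print(f"h = {h}: sample {sample_acov(x, h):6.3f}   formula (2) {theory:6.3f}")
```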

Absolutely summable S&S often assume that ∑j |ψj| < ∞ (absolute summability). This is a stronger assumption that simplifies proofs of a.s. convergence. For example, ψj = 1/j is not absolutely summable, but it is square summable. We will not be too concerned with a.s. convergence and will focus on mean-square convergence. (The issue is moot for ARMA processes.)

Question Does the collection of linear processes as defined form a vector space that allows operations like addition? The answer is yes, using the concept of a Hilbert space of random variables.
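A tiny numerical illustration of the distinction (mine, not from the notes): partial sums of 1/j keep growing, while partial sums of 1/j² settle near π²/6 ≈ 1.6449.

```python
import numpy as np

# Partial sums: sum 1/j diverges slowly; sum (1/j)^2 converges to pi^2/6.
for n in (10**2, 10**4, 10**6):
    j = np.arange(1, n + 1, dtype=float)
    print(f"n = {n:>8d}   sum 1/j = {np.sum(1/j):9.3f}   sum 1/j^2 = {np.sum(1/j**2):.6f}")
```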

ARMA Processes
Conflicting goals Obtain models that possess a wide range of covariance functions (2) and that characterize the ψj as functions of a few parameters that are reasonably easy to estimate. We have seen several of these parsimonious models previously, e.g.,

• Finite moving averages: ψj = 0 for j > q > 0.
• First-order autoregression: ψj = φ^j, |φ| < 1.

ARMA processes also arise when sampling a continuous-time solution to a stochastic differential equation. (The sampled solution to a pth-degree SDE is an ARMA(p, p − 1) process.)

Definition 3.5 The process {Xt} is an ARMA(p,q) process if

1. It is stationary.
2. It (or the deviations Xt − E Xt) satisfies the linear difference equation written in "regression form" (as in S&S, with negative signs attached to the φs) as

       Xt − φ1Xt−1 − ··· − φpXt−p = wt + θ1wt−1 + ··· + θqwt−q ,   (3)

   where wt ∼ WN(0, σ²).

Backshift operator Abbreviate equation (3) using the so-called backshift operator defined as B^k Xt = Xt−k. Using B, write (3) as

    φ(B)Xt = θ(B)wt ,

where the polynomials are (note the difference in signs)

    φ(z) = 1 − φ1z − ··· − φpz^p and θ(z) = 1 + θ1z + ··· + θqz^q .

Closure The backshift operator shifts the stochastic process in time. Because the process is stationary (having a distribution that is invariant to shifts of the time index), this transformation maps a stationary process into another stationary process. Similarly, scalar multiplication and finite summation preserve stationarity; that is, the vector space of stationary processes is closed in the sense that if Xt is stationary, then so too is Θ(B)Xt so long as ∑j θj² < ∞.

Aside: Shifts elsewhere in math The notion of shifting a stationary process {..., Xt, Xt+1, ...} to {..., Xt−1, Xt, ...} has parallels. For example, suppose that p(x) is a polynomial, and define the operator S on the space of polynomials by S p(x) = p(x − 1). If the space of polynomials is finite dimensional (up to degree m), then we can write

    S = I − D + D²/2! − D³/3! + ··· + (−1)^m D^m/m! ,

where I is the identity (I p = p) and D is the differentiation operator. (The proof is a direct application of Taylor's theorem.)

Value of backshift notation

1. It gives a compact way to write difference equations and avoid back-substitution. Back-substitution becomes the conversion of 1/(1 − x) into a geometric series: if we manipulate B algebraically in converting the AR(1) to moving-average form, we obtain the same geometric representation without explicitly doing the tedious back-substitution,

       φ(B)Xt = wt ⇒ Xt = wt/(1 − φB) = (1 + φB + φ²B² + ···)wt = wt + φwt−1 + φ²wt−2 + ··· ,   (4)

   which clearly requires (as in a geometric series) that |φ| < 1.

2. It expresses constraints that assure stationarity and identifiability.

3. It expresses the effects of operations on a process, for example:
   • Adding uncorrelated observation noise to an AR process produces an ARMA process.
   • A weighted mixture of lags of an AR(p) model is ARMA.

Consider the claim that an average of several lags of an autoregression forms an ARMA process. Backshift polynomials make it trivial to show that this claim holds:

    φ(B)Xt = wt ⇒ θ(B)Xt = (θ(B)/φ(B)) wt ,

which has the rational form of an ARMA process.

Converting to MA form, in general In order to determine ψ(z), notice that θ(z)/φ(z) = ψ(z) implies that

    φ(z)ψ(z) = θ(z) .

Given the normalization φ0 = θ0 = ψ0 = 1, one solves for the ψj by equating coefficients of like powers of z in the two polynomials (recursively).
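The coefficient-matching recursion is easy to code. A minimal sketch (the function name and example coefficients are my own, not from the notes): equating the coefficient of z^j in φ(z)ψ(z) = θ(z) gives ψj = θj + φ1ψj−1 + ··· + φpψj−p, with θj = 0 for j > q.

```python
import numpy as np

def psi_weights(phi, theta, m=20):
    """psi weights of an ARMA model from phi(z) psi(z) = theta(z).

    phi   : [phi_1, ..., phi_p] (signs as in the regression form of (3))
    theta : [theta_1, ..., theta_q]
    Returns psi_0, ..., psi_m with psi_0 = 1.
    """
    psi = np.zeros(m + 1)
    psi[0] = 1.0
    for j in range(1, m + 1):
        acc = theta[j - 1] if j <= len(theta) else 0.0
        for k in range(1, min(j, len(phi)) + 1):
            acc += phi[k - 1] * psi[j - k]
        psi[j] = acc
    return psi

# AR(1) check: for phi_1 = 0.5 the weights should be 0.5**j, as in (4).
print(psi_weights([0.5], [], m=5))      # -> [1.0, 0.5, 0.25, 0.125, ...]
# An ARMA(1,1) example, purely illustrative.
print(psi_weights([0.5], [0.4], m=5))   # -> [1.0, 0.9, 0.45, 0.225, ...]
```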
Stationarity of ARMA Processes
Moving averages If p = 0, the process is a moving average of order q, abbreviated an MA(q) process,

    Xt = wt + θ1wt−1 + ··· + θqwt−q .   (5)

This is a special case of the general linear process, having a finite number of nonzero coefficients (i.e., ψj = θj for j = 1, ..., q, with ψ0 = 1). Thus the MA(q) process must be stationary, with covariances of the form (2):

    γX(h) = σ² ∑_{j=0}^{q−|h|} θjθj+|h| for |h| ≤ q, and zero otherwise.

All moving averages are stationary Under the typical constraint of S&S that the coefficients of a moving average are absolutely summable, all moving averages are stationary, even moving averages of other moving averages.

Proof. Suppose ∑j θj² < ∞ and that Xt is stationary with covariance function γX(h). The covariance function of Yt = ∑j θjXt−j is

    Cov(Yt+h, Yt) = Cov( ∑_{j=0}^∞ θjXt+h−j , ∑_{k=0}^∞ θkXt−k )
                  = ∑_{j,k=0}^∞ θjθk Cov(Xt+h−j, Xt−k)
                  = ∑_{j,k=0}^∞ θjθk γX(h − j + k) .

These covariances are free of t and are finite (for instance, under absolute summability of {θj} the double sum is bounded by γX(0)(∑j |θj|)²), so Yt is stationary.

Constraints remain An MA(q) process of finite order models a process that becomes uncorrelated beyond time separation q. There may be other limitations on the structure of the covariances. For example, consider the MA(1) model Xt = wt + θ1wt−1. The covariances of this process are

    γ(0) = σ²(1 + θ1²), γ(1) = σ²θ1, γ(h) = 0 for h > 1.

Hence

    ρ(1) = θ1/(1 + θ1²), which satisfies |ρ(1)| ≤ 1/2 ,

as we can see from a graph or by noting that the extremes occur where the derivative

    ∂ρ(1)/∂θ1 = (1 − θ1²)/(1 + θ1²)² = 0 .

Don't try to model the covariance function {γ(h)} = (1, 0.8, 0, ...) with an MA(1) process! Other ARMA models place similar types of constraints on the possible covariances.

Autoregressions If q = 0, the process is an autoregression, or AR(p),

    Xt = φ1Xt−1 + ··· + φpXt−p + wt .   (6)

The stationarity of a solution to (6) is less obvious because of the presence of "feedback" (beyond the AR(1) case considered previously). To investigate, we first make the AR process resemble a linear process (a weighted sum of past white noise), since we know that such a process is stationary.

Factor the polynomial φ(z) using its zeros (φ(zj) = 0) as

    φ(z) = (1 − z/z1)···(1 − z/zp) = ∏j (1 − z/zj) .

Some of the zeros zj are likely to be complex. (Complex zeros come in conjugate pairs, say zj = z̄k, since the coefficients φj are real.) As long as all of the zeros are greater than one in modulus (|zj| > 1), we can repeat the method used in (4) to convert {Xt} into a moving average, one term at a time. Since at each step we form a linear filter of a stationary process with square-summable weights (indeed, absolutely summable weights), the steps are valid.

AR(2) example These processes are interesting because they allow for complex-valued zeros in the polynomial φ(z). The presence of a complex pair produces oscillations in the observed process. For the process to be stationary, we need the zeros of φ(z) to lie outside the unit circle. If the two zeros are z1 and z2, then

    φ(z) = 1 − φ1z − φ2z² = (1 − z/z1)(1 − z/z2) .   (7)

Since φ1 = 1/z1 + 1/z2 and φ2 = −1/(z1z2), the coefficients lie within the rectangular region

    −2 < φ1 = 1/z1 + 1/z2 < +2,   −1 < φ2 = −1/(z1z2) < +1 .

Since φ(z) ≠ 0 for |z| ≤ 1 and φ(0) = 1, φ(z) is positive on the real interval −1 ≤ z ≤ 1, so

    φ(1) = 1 − φ1 − φ2 > 0 ⇒ φ1 + φ2 < 1 ,
    φ(−1) = 1 + φ1 − φ2 > 0 ⇒ φ2 − φ1 < 1 .

From the quadratic formula applied to (7), φ1² + 4φ2 < 0 implies that the zeros form a complex conjugate pair,

    z1 = re^{iλ}, z2 = re^{−iλ} ⇒ φ1 = 2cos(λ)/r, φ2 = −1/r² .

Turning to the covariances of the AR(2) process, these satisfy the difference equation γ(h) − φ1γ(h−1) − φ2γ(h−2) = 0 for h = 1, 2, ....

We need two initial values to start these recursions. To make this chore easier, work with correlations: dividing by γ(0) gives

    ρ(h) − φ1ρ(h − 1) − φ2ρ(h − 2) = 0 ,

and we know ρ(0) = 1. To find ρ(1), use the equation relating γ(0) and γ(1) = γ(−1):

    ρ(1) = φ1ρ(0) + φ2ρ(−1) = φ1 + φ2ρ(1) ,

which shows ρ(1) = φ1/(1 − φ2). When the zeros are a complex pair z = re^{±iλ} (with r > 1), ρ(h) = c r^{−h} cos(hλ + b) for constants c and b, a damped sinusoid, and realizations exhibit quasi-periodic behavior.

ARMA(p,q) case In general, the process has the representation (again, |zj| > 1)

    Xt = (θ(B)/φ(B)) wt = [ ∏j (1 − B/sj) / ∏j (1 − B/zj) ] wt = ψ(B)wt ,   (8)

where sj, j = 1, ..., q, are the zeros of θ(z) and ψ(B) = θ(B)/φ(B). This is a finite weighted sum of lags of the stationary process φ(B)^{−1}wt, and thus stationary. Stationarity does not require that |sj| > 1; that condition is required for invertibility (defined below).
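These zero conditions are easy to check numerically. A hedged sketch using numpy.roots (the helper function and example coefficients are my own assumptions): write φ(z) and θ(z) with the constant term first, find the zeros, and require every zero to exceed one in modulus, for stationarity (φ) and for invertibility (θ).

```python
import numpy as np

def check_arma(phi, theta):
    """Check |z_j| > 1 for phi(z) and |s_j| > 1 for theta(z).

    phi, theta are [phi_1,...,phi_p] and [theta_1,...,theta_q] in the
    regression-form signs, so phi(z) = 1 - phi_1 z - ... - phi_p z^p and
    theta(z) = 1 + theta_1 z + ... + theta_q z^q.
    """
    phi_poly = np.r_[1.0, -np.asarray(phi, dtype=float)]    # phi(z), constant first
    theta_poly = np.r_[1.0, np.asarray(theta, dtype=float)] # theta(z), constant first
    # np.roots expects the highest power first, hence the reversal.
    z = np.roots(phi_poly[::-1]) if len(phi) else np.array([])
    s = np.roots(theta_poly[::-1]) if len(theta) else np.array([])
    return {
        "AR zeros": z, "stationary": bool(np.all(np.abs(z) > 1)),
        "MA zeros": s, "invertible": bool(np.all(np.abs(s) > 1)),
    }

# Illustrative AR(2) with a complex pair of zeros (1 +/- i) outside the unit circle.
print(check_arma(phi=[1.0, -0.5], theta=[0.4]))
```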
Identifiability of ARMA processes
Identifiable A model with likelihood L(θ) is identified if different parameters produce different likelihoods: θ1 ≠ θ2 ⇒ L(θ1) ≠ L(θ2). For a Gaussian time series, this condition amounts to having a one-to-one correspondence between the parameters and the covariances of the process.

Analogy to regression The most common example of a poorly identified model is a regression model with collinear explanatory variables. If X1 + X2 = 1, say, then

    Y = β0 + β1X1 + 0·X2 + ϵ ⇔ Y = (β0 + β1) + 0·X1 − β1X2 + ϵ .

Both models obtain the same fit, but with very different coefficients. Least squares can find many fits that all obtain the same R² (the coefficients lie in a subspace).

Non-causal process These "odd" processes hint at how models fail to be identifiable. Suppose that |φ̃| > 1. Is there a stationary solution to Xt − φ̃Xt−1 = wt for some white-noise process {wt}? The surprising answer is yes, but the solution is odd: it runs backwards in time. The hint that this might happen lies in the symmetry of the covariances, Cov(Xt+h, Xt) = Cov(Xt, Xt+h).

To arrive at this representation, forward-substitute rather than back-substitute; this flips the coefficient from φ̃ to 1/φ̃, which is less than 1 in absolute value. Start with the process at time t + 1:

    Xt+1 = φ̃Xt + wt+1 ⇒ Xt = (1/φ̃)Xt+1 − (1/φ̃)wt+1 .

Continuing recursively,

    Xt = −wt+1/φ̃ + (1/φ̃)Xt+1
       = −wt+1/φ̃ + (1/φ̃)( −wt+2/φ̃ + (1/φ̃)Xt+2 )
       = −wt+1/φ̃ − wt+2/φ̃² + Xt+2/φ̃²
       ...
       = −∑_{j=1}^k wt+j/φ̃^j + Xt+k/φ̃^k ,

which in the limit becomes

    Xt = −wt+1/φ̃ − wt+2/φ̃² − ··· = −∑_{j=1}^∞ φ̃^{−j} wt+j = −∑_{j=0}^∞ φ̃^{−j} w̃t+1+j ,

where w̃t = wt/φ̃. This is the unique stationary solution to the difference equation Xt − φ̃Xt−1 = wt. The process is said to be non-causal since Xt depends on "future" errors ws, s > t, rather than those in the past. If |φ̃| = 1, no stationary solution exists.

Non-uniqueness of covariance The covariance formula (2) implies that

    Cov( −∑_{j=0}^∞ φ̃^{−j} w̃t+1+h+j , −∑_{j=0}^∞ φ̃^{−j} w̃t+1+j ) = (σ²/φ̃²) (1/φ̃)^{|h|} / (1 − (1/φ̃)²) .

Thus the non-causal process has the same correlation function as the more familiar causal process with coefficient 1/φ̃, |1/φ̃| < 1 (the non-causal version has a smaller error variance).
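A quick numerical illustration of this point (my own sketch, with illustrative values): the non-causal process with coefficient φ̃ and noise variance σ², and the causal AR(1) with coefficient 1/φ̃ and noise variance σ²/φ̃², have identical autocovariance functions.

```python
import numpy as np

# Non-causal AR(1) with coefficient phi_t (|phi_t| > 1) and noise variance sigma2,
# versus a causal AR(1) with coefficient 1/phi_t and noise variance sigma2/phi_t**2.
phi_t, sigma2 = 2.0, 1.0
a = 1.0 / phi_t

def acov_causal(a, s2, h):
    # Autocovariance of a causal AR(1): gamma(h) = s2 * a**|h| / (1 - a**2).
    return s2 * a ** abs(h) / (1.0 - a ** 2)

def acov_noncausal(phi_t, s2, h):
    # Formula derived above for the forward (non-causal) representation.
    return (s2 / phi_t ** 2) * (1.0 / phi_t) ** abs(h) / (1.0 - 1.0 / phi_t ** 2)

for h in range(4):
    print(h, acov_causal(a, sigma2 / phi_t ** 2, h), acov_noncausal(phi_t, sigma2, h))
```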

Not identified Either choice of coefficient (φ̃ or 1/φ̃) generates a stationary solution of the first-order difference equation Xt = φXt−1 + wt. If |φ| < 1, we find a solution via back-substitution; if |φ| > 1, we obtain a stationary solution via forward substitution. For a Gaussian process with mean zero, the likelihood is a function of the covariances. Since these two solutions have the same correlations, the model is not identified. Either way, we cannot allow |φ| = 1.

Moving averages: one more condition Such issues also appear in the analysis of moving averages. Consider the covariances of the two processes

    Xt = wt + θwt−1 and Xt = wt−1 + θwt−2 .

The second incorporates a time delay. Since both are finite moving averages, both are stationary. Is the model identified? It is, with the added condition that ψ0 = 1.

Covariance generating function This function expresses the covariance of an ARMA process in terms of the polynomials φ(z) and θ(z). The moving-average representation of the ARMA(p, q) process given by (8), combined with our fundamental result (2), implies that the covariances of {Xt} are

    γ(h) = σ² ∑_{j=0}^∞ ψj+|h| ψj ,   (9)

where ψj is the coefficient of z^j in ψ(z). The sum in (9) can be recognized as the coefficient of z^h in the product ψ(z)ψ(z^{−1}), implying that the covariance γ(h) is the coefficient of z^h in

    Γ(z) = σ² ψ(z)ψ(z^{−1}) = σ² θ(z)θ(z^{−1}) / (φ(z)φ(z^{−1})) ,

which is known as the covariance generating function of the process.

Reciprocals of zeros The zeros of φ(1/z) are the reciprocals of those of φ(z). Hence, as far as the covariances are concerned, it does not matter whether the zeros lie inside or outside the unit circle; they just cannot lie on the unit circle. Since both φ(z) (which has zeros outside the unit circle) and φ(1/z) (which has zeros inside the unit circle) appear in the definition of Γ(z), some authors state the conditions for stationarity in terms of one polynomial or the other. In any case, no zero can lie on the unit circle.

Further identifiability issue: common factors Suppose that the polynomials share a common zero r,

    θ(z) = θ̃(z)(1 − z/r),   φ(z) = φ̃(z)(1 − z/r) .

Then this factor cancels in the covariance generating function. Thus the process φ̃(B)Xt = θ̃(B)wt has the same covariances as the process φ(B)Xt = θ(B)wt. To avoid this type of non-identifiability, we require that φ(z) and θ(z) have no common zeros.
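A short numerical sketch of the common-factor problem (mine, not from the notes; the polynomials are illustrative): compare the zero sets of φ(z) and θ(z), and deflate any shared zero by polynomial division to obtain the reduced, identified model.

```python
import numpy as np

# phi(z) = (1 - 0.5 z)(1 - z/3) and theta(z) = (1 + 0.4 z)(1 - z/3) share the zero z = 3.
phi = np.polynomial.polynomial.polymul([1, -0.5], [1, -1/3])     # constant term first
theta = np.polynomial.polynomial.polymul([1, 0.4], [1, -1/3])

z_phi = np.polynomial.polynomial.polyroots(phi)
z_theta = np.polynomial.polynomial.polyroots(theta)
common = [z for z in z_phi if np.any(np.isclose(z, z_theta))]
print("common zeros:", common)                                    # approximately [3.0]

# Cancel the common factor (1 - z/r) from both polynomials.
r = common[0]
factor = [1, -1 / r]
phi_reduced, _ = np.polynomial.polynomial.polydiv(phi, factor)
theta_reduced, _ = np.polynomial.polynomial.polydiv(theta, factor)
print("reduced phi(z):  ", phi_reduced)                           # approximately [1, -0.5]
print("reduced theta(z):", theta_reduced)                         # approximately [1, 0.4]
```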
Invertibility of ARMA processes
Invertible (Defn 3.8) An ARMA process {Xt} is invertible if there exists an absolutely summable (or perhaps l2) sequence {πj} such that

    wt = ∑_{j=0}^∞ πjXt−j .

Conditions for an invertible ARMA process Assume that the polynomials φ(B) and θ(B) have no common zeros. The process {Xt} is invertible if and only if the zeros of the moving-average polynomial θ(B) lie outside the unit circle.
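The π weights can be computed by the same coefficient-matching idea used for the ψ weights, now applied to θ(z)π(z) = φ(z). A minimal sketch (function and example are my own assumptions):

```python
import numpy as np

def pi_weights(phi, theta, m=20):
    """pi weights from theta(z) pi(z) = phi(z), equating coefficients.

    phi = [phi_1,...,phi_p], theta = [theta_1,...,theta_q] in the regression-form
    signs; returns pi_0, ..., pi_m with pi_0 = 1.
    """
    pi = np.zeros(m + 1)
    pi[0] = 1.0
    for j in range(1, m + 1):
        acc = -phi[j - 1] if j <= len(phi) else 0.0
        for k in range(1, min(j, len(theta)) + 1):
            acc -= theta[k - 1] * pi[j - k]
        pi[j] = acc
    return pi

# MA(1) check: for X_t = w_t + 0.4 w_{t-1}, pi_j should equal (-0.4)**j.
print(pi_weights([], [0.4], m=5))      # -> [1.0, -0.4, 0.16, -0.064, ...]
```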
ARIMA processes
Nonstationary processes are common in many situations, and these would at first appear outside the scope of ARMA models (certainly by the definition of S&S). The use of differencing, via the operator (1 − B)Xt = Xt − Xt−1, changes this.

Differencing is the discrete-time version of differentiation. For example, differencing a process whose mean function E Xt = a + bt is trending in time removes this source of nonstationarity: if {Xt} is a stationary process, then

    Yt = α + βt + Xt ⇒ (1 − B)Yt = β + Xt − Xt−1

reveals the possibly stationary component of the process. (Note, however, that differencing is not harmless: if Xt were stationary to begin with, the differenced series (1 − B)Xt is stationary but not invertible.)

ARIMA(p,d,q) models are simply ARMA(p,q) models with Xt replaced by (1 − B)^d Xt, where (1 − B)^d is manipulated algebraically.

Long-memory processes are stationary (unlike ARIMA processes) and are formed by raising the differencing operator to a fractional power, say (1 − B)^{1/4}. We will study these later.
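A minimal numerical sketch of the trend-removal point (my own illustration; all parameter values are assumptions): difference a linear-trend-plus-AR(1) series and note that the differenced series has mean near β and a stable variance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Y_t = alpha + beta * t + X_t with X_t a stationary AR(1) (illustrative parameters).
n, alpha, beta, phi = 1000, 3.0, 0.05, 0.6
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]
y = alpha + beta * np.arange(n) + x

dy = np.diff(y)                # (1 - B)Y_t = beta + X_t - X_{t-1}
print("mean of diff(Y):", dy.mean(), " (should be near beta =", beta, ")")
print("variance of Y  :", y.var(), " inflated by the trend")
print("variance of dY :", dy.var(), " stable, consistent with a stationary series")
```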