
**Chapter**

**Thirteen**

*McGraw-Hill/Irwin*

*© 2006
The McGraw-Hill Companies, Inc., All Rights Reserved.*

**Linear Regression
and Correlation**

**The Business
of Prediction**

**It is important to know whether relationships exist among variables.**

Examples:

- Amount of gas and the mileage
- Advertising budget and actual sales
- Population vs. precipitation

**Recognizing and modeling the relationship between two variables can be useful for prediction.**

Example: predicting the sales revenue that would result from spending a certain dollar amount on advertising.

The Dependent Variable is the variable being predicted or estimated.

The Independent Variable provides the basis for estimation. It is the predictor variable.

**Correlation
Analysis**

Measurement
of association between two variables.

A Scatter Diagram
is a chart that portrays the relationship between two variables.

If you suspect that two variables are related, start by drawing a scatter plot.

**Using Excel
to create a Scatter Plot (Chart Wizard)**

*Example on pages 379-81*

Correlation Coefficient (Pearson *r*)

- Measures the strength of the linear relationship between two variables.
- It requires interval- or ratio-scaled data.
- It can range from -1.00 to 1.00.
- Positive values indicate a direct relationship, and negative values indicate an inverse relationship.
- Values of -1.00 or 1.00 indicate perfect correlation.
- Values close to 0.0 indicate weak correlation.
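A minimal Python sketch of how *r* is computed (the slides use Excel; Python is used here only for illustration). It divides the sum of cross-products of deviations, Σ(X − X̄)(Y − Ȳ), by √[Σ(X − X̄)² · Σ(Y − Ȳ)²]. The function name `pearson_r` and the gallons/miles values are hypothetical, not the textbook's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, with at least 2 pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: gallons in the tank vs. miles driven
gallons = [5, 8, 10, 12, 15]
miles = [120, 190, 260, 290, 370]
print(round(pearson_r(gallons, miles), 3))
```

A value near +1 indicates a strong direct relationship; reversing the order of the Y values would produce a negative *r*.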

The coefficient of determination (*R*²)

- It is the square of the coefficient of correlation (*r*).
- It ranges from 0 to 1.
- It is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).

*Example: if R² = 0.80, then 80% of the variation in miles driven is accounted for by the number of gallons in the tank. The remaining 20% is influenced by road conditions, number of passengers, etc. (more discussion later!).*

**Correlation
and Cause**

A high correlation indicates a strong relationship between the variables.

But a high correlation does not necessarily mean that one variable influences the other.

Example: "Higher SAT scores lead to better college grades." A high correlation alone does not establish this.

Another example:

A study measured the number of TV sets per person (say, X) and the life expectancy (say, Y) for every country.

The study found a high correlation.

On this basis, it was concluded that countries with more TV sets have higher life expectancies.

**Regression
Analysis**

If there is a strong relationship between two variables (as measured by *r*), one can estimate a linear model of the form:

**Y' = a + bX** *[a = estimate of α; b = estimate of β]*

where

**Y'** is the predicted value and Y is the actual value for a given X.

**a** is the Y-intercept *(it is the value of Y' when X = 0).*

**b** is the slope of the line, or the average change in Y' for each one-unit change in X.

The **least squares principle** is used to fit the line, i.e., a and b are chosen to minimize Σ(Y − Y')², the sum of the squared errors in prediction.

**a** and **b** are calculated as:

$$b = r\,\frac{s_y}{s_x} \qquad a = \bar{Y} - b\bar{X}$$

*{ The regression line always passes through $(\bar{X}, \bar{Y})$. }*

*{ If r = 1, the slope is simply $s_y / s_x$. }*
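The two formulas translate directly into code. This Python sketch (the name `fit_line` and the data are hypothetical) takes sample standard deviations from the standard-library `statistics` module, computes *r* from them, and then applies b = r·s_y/s_x and a = Ȳ − bX̄. The last line checks that the fitted line passes through (X̄, Ȳ).

```python
import statistics


def fit_line(xs, ys):
    """Return (a, b) using b = r * s_y / s_x and a = Y_bar - b * X_bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
    # Pearson r expressed with sample standard deviations: Sxy / ((n - 1) * s_x * s_y)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


xs = [1, 2, 3, 4, 5]            # hypothetical X values
ys = [2.0, 2.9, 4.1, 4.8, 6.2]  # hypothetical Y values
a, b = fit_line(xs, ys)
print(f"Y' = {a:.3f} + {b:.3f}X")
# The fitted line evaluated at X_bar reproduces Y_bar
print(a + b * statistics.mean(xs), statistics.mean(ys))
```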

**Error in prediction**

The difference between the actual value (Y) and the predicted value (Y'), (Y − Y'), is the error in prediction.
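To make the least squares idea concrete, this Python sketch computes the error in prediction (Y − Y') for each point and the sum of squared errors. It fits the line with the closed form b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)², which gives the same slope as b = r·s_y/s_x. The helper names and data are hypothetical. For the least squares line the errors sum to (essentially) zero, and shifting the line away from it increases the sum of squared errors.

```python
def fit_least_squares(xs, ys):
    """Least squares intercept a and slope b, with b = Sxy / Sxx and a = Y_bar - b * X_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def prediction_errors(xs, ys, a, b):
    """Error in prediction, Y - Y', for each observation on the line Y' = a + bX."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_errors(xs, ys, a, b):
    """The quantity the least squares line makes as small as possible."""
    return sum(e ** 2 for e in prediction_errors(xs, ys, a, b))


xs = [1, 3, 4, 6, 8]  # hypothetical X values
ys = [2, 4, 3, 6, 7]  # hypothetical Y values
a, b = fit_least_squares(xs, ys)
print("errors:", [round(e, 3) for e in prediction_errors(xs, ys, a, b)])
print("SSE at the least squares line:", sum_squared_errors(xs, ys, a, b))
print("SSE with the intercept raised by 0.5:", sum_squared_errors(xs, ys, a + 0.5, b))
```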

**Example** *(page 400)*

The production supervisor of XYZ
Inc. looked at the number of units produced by 5 of his employees during
a week. He also looked at how long they had been working for the company.

*[Table: Years on the job (X) and number of units produced, #Units (Y), for the 5 employees]*

The supervisor wants to know:

(i) whether there is a correlation between X and Y (i.e., *r*)

(ii) the equation of the regression line (i.e., Y' = a + bX)

(iii) how much of the variation in Y is explained by X (i.e., R²)

**Using Excel
for Regression**

**1. What
is the independent variable?**

**2. What
is the dependent variable?**

**3. What
is the regression equation?**

**4. Is the regression model a significant predictor of #Units?**

**5. Is Years a significant predictor of #Units?**

**6. If one had Years = 20, predict #Units.**

**7. Construct
a 95% CI around it.**

*Use SE to
calculate CI*
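A hedged Python sketch of questions 6 and 7, run on hypothetical data rather than the textbook's table. It predicts Y' at X = 20 and builds two common 95% intervals around it from the standard error of estimate, s_e = √(SSE / (n − 2)): a confidence interval for the mean of Y at that X, and a wider prediction interval for an individual Y. The t critical value is passed in; 3.182 is the two-tailed 95% value for n − 2 = 3 degrees of freedom. The function name `interval_at` is hypothetical.

```python
import math


def interval_at(xs, ys, x0, t_crit):
    """Return Y' at x0, a confidence interval for the mean of Y at x0,
    and a prediction interval for an individual Y at x0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    y_hat = a + b * x0
    # Standard error of estimate: sqrt(SSE / (n - 2))
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s_e * math.sqrt(spread)      # mean of Y at x0
    pi_half = t_crit * s_e * math.sqrt(1 + spread)  # an individual Y at x0
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Hypothetical data for five employees: years on the job and units produced
years = [4, 8, 12, 16, 22]
units = [5, 6, 8, 9, 12]
# n - 2 = 3 degrees of freedom; the two-tailed 95% t value is 3.182
y_hat, ci, pi = interval_at(years, units, 20, 3.182)
print("Predicted #Units:", round(y_hat, 2))
print("95% CI for the mean:", tuple(round(v, 2) for v in ci))
print("95% prediction interval:", tuple(round(v, 2) for v in pi))
```

The prediction interval is always wider because it adds the scatter of individual observations around the line to the uncertainty in the line itself.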

*Watch the screencam tutorial on the book's CD to learn how to use Excel for regression.*

*See also the lab handout.*

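*Significance F (the p-value for the model) is about 0.04 here, so the regression is significant at the 5% level; this does not mean the equation will be correct in 96% of cases.*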

**Calculating Total Variation** *(page 402)*

Now we want to find out how much of the variation in #Units is contributed by Years on the job.

*[Table: Years (X) and #Units (Y)]*

The sample mean of Y is 6. Total variation is given by Σ(Y − Ȳ)².

*[see Chapter 3, pages 78 & 80]*
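Total variation is the same sum of squared deviations used for the sample variance in Chapter 3, so SST = (n − 1)s². This Python sketch (hypothetical function name and Y values) computes it both ways.

```python
import statistics


def total_variation(ys):
    """Total variation of Y: sum((Y - Y_bar)^2), the squared deviations from the sample mean."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)


ys = [3, 8, 5, 9, 6]  # hypothetical Y values
sst = total_variation(ys)
# Same quantity via the sample variance: SST = (n - 1) * s^2
print(sst, (len(ys) - 1) * statistics.variance(ys))
```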

When we came up with the regression equation,

**Y' = 2 + 0.4X**

we added the assumption that Years on the job and Production are related.

Let us see how well this equation
fits our data.

It can be seen that the "fit" between Y' and Y is not "perfect".

Let us calculate the **error variation**, as shown on the next slide.

**Catching the "Error"**

*[Table: Actual (Y) vs. Predicted (Y') values]*

**Calculating Unexplained Variation (Error in prediction)** *(page 401)*

**Calculating Coefficient of Determination**

$$R^2 = 1 - \frac{\text{Unexplained variation}}{\text{Total variation}}$$

Substituting the values for the unexplained and total variations from our example problem, we get

$$R^2 = 1 - \frac{4}{20} = \frac{16}{20} = 0.8^{*}$$

Thus, we say that 80% of the variation in weekly production is explained by years of experience on the job.
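The same arithmetic in a short Python sketch: `r_squared` applies R² = 1 − unexplained/total to the example's figures (4 and 20), and `variation_summary` (a hypothetical helper, run on hypothetical data) confirms that 1 − SSE/SST equals the square of Pearson's *r* for a least squares line, which ties this calculation back to the definition of R² as r².

```python
import math


def r_squared(unexplained, total):
    """Coefficient of determination: R^2 = 1 - unexplained variation / total variation."""
    return 1 - unexplained / total


def variation_summary(xs, ys):
    """Return (SST, SSE, R^2, r^2) for the least squares line through the data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variation
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    r = sxy / math.sqrt(sxx * sst)  # Pearson r
    return sst, sse, r_squared(sse, sst), r ** 2


# Figures from the example: unexplained variation = 4, total variation = 20
print(r_squared(4, 20))

# Hypothetical data: 1 - SSE/SST matches r squared
sst, sse, r2, r_sq = variation_summary([1, 2, 4, 5, 7], [3, 4, 4, 7, 8])
print(round(r2, 6), round(r_sq, 6))
```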

* Compare this
with the computer output on the next slide

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} \tag{1}$$

$$\text{Explained variation} = \text{Total variation} - \text{Unexplained variation} \tag{2}$$

*SSR = SST − SSE*

Substituting Equation 2 in Equation 1,

$$R^2 = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}} \tag{3}$$
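Equation 2 can be checked numerically. This Python sketch (hypothetical function name and data) computes all three sums directly from a least squares fit: SST = Σ(Y − Ȳ)², SSR = Σ(Y' − Ȳ)² and SSE = Σ(Y − Y')². For a least squares line with an intercept, SST = SSR + SSE, so Equations 1 and 3 give the same R².

```python
def decompose(xs, ys):
    """Return (SST, SSR, SSE) for the least squares line, each summed directly."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    preds = [a + b * x for x in xs]  # Y' for each observation
    sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
    ssr = sum((p - y_bar) ** 2 for p in preds)          # explained variation
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
    return sst, ssr, sse


sst, ssr, sse = decompose([2, 3, 5, 7, 9], [4, 5, 7, 6, 10])  # hypothetical data
print(sst, ssr + sse)                # Equation 2: SSR = SST - SSE
print(ssr / sst, (sst - sse) / sst)  # Equations 1 and 3 give the same R^2
```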

**Interpreting
Excel Regression Output**

Make sure you know how to interpret the Excel output.

(No kidding!)

*[Figure: Excel regression output, with callouts marking the p-value and the values to use for the CI]*

**F, R² and SE tell whether the regression model is really useful for prediction.**
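As a rough sketch of where three of these output values come from, the Python function below (hypothetical name) derives F, R² and the standard error of estimate from the total and unexplained variation. With the example's SST = 20, SSE = 4, n = 5 and one independent variable, it gives F = 16 / (4/3) = 12, R² = 0.8 and SE = √(4/3) ≈ 1.155. The exact labels and layout in Excel's output may differ.

```python
import math


def anova_summary(sst, sse, n, k=1):
    """F statistic, R^2 and standard error of estimate from total and unexplained variation.

    n: number of observations; k: number of independent variables (1 for simple regression).
    """
    ssr = sst - sse       # explained variation
    df_error = n - k - 1  # degrees of freedom for the error term
    msr = ssr / k         # regression mean square
    mse = sse / df_error  # error mean square
    f_stat = msr / mse
    se = math.sqrt(mse)   # standard error of estimate
    r2 = ssr / sst
    return f_stat, r2, se


# The example's figures: SST = 20, SSE = 4, n = 5 employees, one predictor (Years)
f_stat, r2, se = anova_summary(20, 4, 5)
print(round(f_stat, 2), round(r2, 2), round(se, 3))
```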

**Multiple
Regression**

You can extend the idea of linear regression and make the dependent variable depend on more than one independent variable.

Example: the price of a house can depend on square footage, number of bedrooms, baths, pool, garage, etc. [see page 503].

The general multiple regression equation is:

$$Y' = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n$$
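The coefficients a and b₁ through bₙ are again chosen by least squares. As a minimal sketch of that computation, the Python code below solves the normal equations (XᵀX)β = Xᵀy by Gaussian elimination with partial pivoting; the function names and the house data (square feet, bedrooms, price in $1000s) are hypothetical. For real data, Excel's regression tool or a numerical library is more robust.

```python
def fit_multiple(rows, ys):
    """Least squares coefficients [a, b1, ..., bn] for Y' = a + b1*X1 + ... + bn*Xn.

    rows: one tuple (X1, ..., Xn) per observation.
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 for the intercept a
    p = len(design[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    m = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(design, ys))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(m[k][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, p):
            factor = m[k][col] / m[col][col]
            for j in range(col, p + 1):
                m[k][j] -= factor * m[col][j]
    # Back substitution
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (m[i][p] - sum(m[i][j] * beta[j] for j in range(i + 1, p))) / m[i][i]
    return beta


def predict(beta, row):
    """Evaluate Y' = a + b1*X1 + ... + bn*Xn for one observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Hypothetical houses: (square feet, bedrooms) and price in $1000s
houses = [(1500, 3), (1800, 3), (2100, 4), (2400, 4), (3000, 5), (1200, 2)]
prices = [210, 250, 300, 330, 410, 170]
beta = fit_multiple(houses, prices)
print("a, b1, b2 =", [round(c, 3) for c in beta])
print("Predicted price for 2000 sq ft, 3 bedrooms:", round(predict(beta, (2000, 3)), 1))
```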

**Practice!**

**1. What
are the independent variables?**

**2. What
is the dependent variable?**

**3. What
is the regression equation?**

**4. Is the regression model a significant predictor of Price?**

**5. Is Bedrooms a significant predictor of Price?**

**6. Is Baths a significant predictor of Price?**

**7. If one had 8 Bedrooms, predict Price.**

**8. Construct a 95% CI around it.**
