Consider these 10 ages (in years):

21 42 5 11 30 50 28 27 24 52

• The symbol $\bar{x}$ (“x-bar”) denotes the sample mean.

The three main measures that summarize the center of a distribution are the mean, median, and mode. While there are several different types of

$\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{10}(290) = 29 \text{ years}$
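As a cross-check, here is a minimal Python sketch of the same calculation. The helper name `sample_mean` is ours, not the text's; the standard library's `statistics.mean` is used only to confirm the result.

```python
from statistics import mean

# The 10 ages from the example
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

def sample_mean(values):
    """x-bar: the sum of the values divided by the number of values."""
    return sum(values) / len(values)

x_bar = sample_mean(ages)
print(sum(ages), len(ages), x_bar)   # 290 10 29.0

# Cross-check against the standard library
assert x_bar == mean(ages)
```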


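The median was covered in the previous chapter. Briefly, this is the value with a depth of (n + 1) / 2 in the ordered array. For example, consider these five ordered values (the bracketed value is the median):

4 7 [8] 11 12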

The median has a depth of (5 + 1) / 2 = 3 and a value of 8. The median is slightly more
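The depth rule can be sketched in Python as follows. `median_by_depth` is a hypothetical helper; when the depth is not a whole number (an even n), it averages the two values on either side of that depth, which is the usual convention.

```python
import math

def median_by_depth(values):
    """Return (depth, median), where depth = (n + 1) / 2 in the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2
    # Depths are 1-based in the text; Python lists are 0-based.
    below = ordered[math.floor(depth) - 1]
    above = ordered[math.ceil(depth) - 1]
    # For odd n, below and above are the same value.
    return depth, (below + above) / 2

print(median_by_depth([4, 7, 8, 11, 12]))                        # (3.0, 8.0)
print(median_by_depth([21, 42, 5, 11, 30, 50, 28, 27, 24, 52]))  # (5.5, 27.5)
```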

The mode is the most frequently occurring value in a data set.

AGE   | Freq      %
------+---------------
    3 |    2    0.3%
    4 |    9    1.4%
    5 |   28    4.3%
    6 |   37    5.7%
    7 |   54    8.3%
    8 |   85   13.0%
    9 |   94   14.4%   ← Mode (most frequent value)
   10 |   81   12.4%
   11 |   90   13.8%
   12 |   57    8.7%
   13 |   43    6.6%
   14 |   25    3.8%
   15 |   19    2.9%
   16 |   13    2.0%
   17 |    8    1.2%
   18 |    6    0.9%
   19 |    3    0.5%
------+---------------
Total |  654  100.0%
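A short Python sketch of finding the mode from a frequency table. The counts are copied from the AGE table for every age except 9. The age-9 count is recovered as the remainder of the stated total of 654, which assumes the rows sum to that total and doubles as a consistency check. The dictionary layout is our own choice.

```python
TOTAL = 654  # "Total" row of the table

# age -> frequency, for every row except age 9
freq = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85,
        10: 81, 11: 90, 12: 57, 13: 43, 14: 25,
        15: 19, 16: 13, 17: 8, 18: 6, 19: 3}

# Assumption: the rows sum to the stated total, so age 9 gets the remainder
freq[9] = TOTAL - sum(freq.values())

# The mode is the value with the highest frequency
mode_age = max(freq, key=freq.get)
print(mode_age, freq[mode_age], round(100 * freq[mode_age] / TOTAL, 1))  # 9 94 14.4
```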

The mode is unreliable in all but very large data sets.


The mean, median, and mode are equivalent when the distribution is unimodal and symmetrical. With asymmetry, however, the median lies about one-third of the way from the mean toward the mode:

$\bar{x} - \text{mode} \approx 3(\bar{x} - \text{median})$

The mean, median, and mode offer different advantages and disadvantages. The mean offers the advantages of familiarity and efficiency. It also has advantages when making inferences about a population mean. On the downside, the mean is markedly influenced by extreme skewness and outliers. Under circumstances of extreme skewness, the median is a more “stable” measure. An often-cited example of this advantage comes from employee salaries, where the salaries of highly paid executives pull the average toward a misleadingly high value. Another example is the average price of homes, in which high-priced homes skew the data in a positive direction. In such circumstances, the median is less likely to be misinterpreted and is therefore the preferred measure of central location.

You can judge the asymmetry of a distribution by comparing its mean and median. When the mean is greater than the median, the distribution has a positive skew. When the mean is about equal to the median, the distribution is symmetrical. When the mean is less than the median, the distribution has a negative skew:

mean > median : positive skew
mean ≅ median : symmetry
mean < median : negative skew

In general, the mean is preferred when data are symmetrical and do not have outliers. In other instances, the median may be the preferred measure of central location.
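The mean-versus-median rule of thumb can be sketched in Python. The function name `skew_direction` and its `tolerance` parameter are hypothetical: the text says only "about equal", so the cutoff for treating the two as equal (here in the data's own units) is an arbitrary choice you would set per problem.

```python
from statistics import mean, median

def skew_direction(values, tolerance=0.5):
    """Rough skew direction from comparing the mean and the median.

    `tolerance` (same units as the data) is a hypothetical cutoff for
    treating the mean and median as "about equal".
    """
    m, md = mean(values), median(values)
    if abs(m - md) <= tolerance:
        return "symmetry"
    return "positive skew" if m > md else "negative skew"

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
print(skew_direction(ages))   # positive skew (mean 29 > median 27.5)
```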


Measures of Spread

Simply noting the minimum and maximum values can be useful when describing the spread of a distribution. However, calculating the sample range (maximum – minimum) is not an acceptable measure of spread because it will consistently underestimate the population range.
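A small simulation illustrates the underestimation. The population (the integers 0 to 99), the sample size, the number of trials, and the seed are arbitrary choices for illustration. A sample cannot contain values beyond the population's own minimum and maximum, so each sample range is at most the population range, and on average it falls short.

```python
import random

# Hypothetical population: the integers 0..99, so the population range is 99
population = list(range(100))
pop_range = max(population) - min(population)

random.seed(1)          # fixed seed so the run is reproducible
n, trials = 10, 2000    # sample size and number of simulated samples

sample_ranges = []
for _ in range(trials):
    sample = random.sample(population, n)   # simple random sample, no replacement
    sample_ranges.append(max(sample) - min(sample))

avg_range = sum(sample_ranges) / trials
print(pop_range, round(avg_range, 1))       # the average sample range is below 99
```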

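The five-point summary consists of the minimum, Q1, the median, Q3, and the maximum. Using Tukey's hinges, the ordered data are split at the median (M) into a low group and a high group; Q1 is the “median” of the low group and Q3 is the “median” of the high group. When n is odd, the median belongs to both groups. For the 10 ages: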

Low group: 5 11 21 24 27 | M = 27.5 | High group: 28 30 42 50 52

The low group has a “median” of 21; this is Q1. The high group has a “median” of 42; this is Q3. The five-point summary is (5, 21, 27.5, 42, 52).

Now consider a data set with an odd number of values (n = 7):

1.47 2.06 2.36 3.43 3.74 3.78 3.94   (median M = 3.43)

Recall that for Tukey's hinges you must include the median in both the low group and the high group. Therefore the low group consists of {1.47, 2.06, 2.36, 3.43}. The “median” of this low group (Q1) is the average of 2.06 and 2.36, or 2.21. The high group consists of {3.43, 3.74, 3.78, 3.94}, and its “median” (Q3) is the average of 3.74 and 3.78, or 3.76. The five-point summary is (1.47, 2.21, 3.43, 3.76, 3.94).
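A minimal Python sketch of the five-point summary with Tukey's hinges. The function name `five_point_summary` is hypothetical; `statistics.median` supplies the group medians, and each hinge group has (n + 1) // 2 values so that the median lands in both groups when n is odd.

```python
from statistics import median

def five_point_summary(values):
    """(min, Q1, median, Q3, max) using Tukey's hinges."""
    x = sorted(values)
    n = len(x)
    half = (n + 1) // 2            # group size; includes the median when n is odd
    low, high = x[:half], x[n - half:]
    return (x[0], median(low), median(x), median(high), x[-1])

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
print(five_point_summary(ages))    # (5, 21, 27.5, 42, 52)

odd = [1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94]
print([round(v, 2) for v in five_point_summary(odd)])  # [1.47, 2.21, 3.43, 3.76, 3.94]
```

Rounding in the second example only hides floating-point noise from averaging two decimals.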


A good measure of spread (especially for asymmetrical data) is the interquartile range, IQR = Q3 − Q1.

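The Tukey boxplot consists of a box showing Q1, Q2, and Q3; whiskers; and, occasionally, outside values. The fences are FL = Q1 − (1.5)(IQR) and FU = Q3 + (1.5)(IQR). Values beyond a fence are outside values; the most extreme values still inside the fences are the inside values, and the whiskers extend from the hinges to them.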

5 11 21 24 27 28 30 42 50 52

The five-point summary (determined above) is (5, 21, 27.5, 42, 52). The box extends from 21 to 42 and has a line in its midst to identify the median at 27.5. The IQR is 42 − 21 = 21, so FL = 21 − (1.5)(21) = −10.5 and FU = 42 + (1.5)(21) = 73.5. No values in the data set are above 73.5 or below −10.5; therefore, there are no outside values. The upper inside value is 52 and the lower inside value is 5. Whiskers are drawn from the hinges to the inside values. The boxplot is shown on the next page.


Now consider this ordered data set:

3 21 22 24 25 26 28 29 31 51

The five-point summary is (3, 22, 25.5, 29, 51). IQR = 29 − 22 = 7. FU = 29 + (1.5)(7) = 39.5. FL = 22 − (1.5)(7) = 11.5. There is one value above the upper fence (51) and one value below the lower fence (3). The largest value still inside the upper fence (the upper inside value) is 31. The smallest value still inside the lower fence (the lower inside value) is 21.
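The fence calculations for both examples can be reproduced with a short Python sketch. `boxplot_stats` is a hypothetical helper; it uses the same Tukey-hinge split as the five-point summary, and it treats a value lying exactly on a fence as inside, an assumption the text does not address.

```python
from statistics import median

def boxplot_stats(values):
    """Tukey hinges, IQR, fences, inside values, and outside values."""
    x = sorted(values)
    n = len(x)
    half = (n + 1) // 2                        # hinge groups include the median if n is odd
    q1, q3 = median(x[:half]), median(x[n - half:])
    iqr = q3 - q1
    fl, fu = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # lower and upper fences
    inside = [v for v in x if fl <= v <= fu]   # on-fence values counted as inside
    outside = [v for v in x if v < fl or v > fu]
    return {"Q1": q1, "Q3": q3, "IQR": iqr, "FL": fl, "FU": fu,
            "lower inside": inside[0], "upper inside": inside[-1],
            "outside": outside}

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
second = [3, 21, 22, 24, 25, 26, 28, 29, 31, 51]
print(boxplot_stats(ages))     # FL -10.5, FU 73.5, no outside values
print(boxplot_stats(second))   # FL 11.5, FU 39.5, outside values [3, 51]
```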


Both the standard deviation and variance are based on the sum of squares (SS), the sum of squared deviations from the mean:

$SS = \sum (x_i - \bar{x})^2$

The sample variance ($s^2$) is the sum of squares divided by n − 1:

$s^2 = \frac{SS}{n - 1}$

Note: The denominator in this formula is n − 1 (the degrees of freedom), not n.

The sample standard deviation is the square root of the variance:

$s = \sqrt{s^2}$

For the 10 ages, SS = 2134, so the sample variance is

$s^2 = \frac{2134}{10 - 1} = 237.1111 \text{ years}^2$

[Note the squared units.] The standard deviation is

$s = \sqrt{237.1111} = 15.398 \approx 15.4 \text{ years}$

Report the standard deviation in conjunction with the mean, round accordingly, and include units of measure: “The mean age of the participants was 29.0 years with a standard deviation of 15.4 years.” Calculating a few variances and standard deviations by hand is instructive. Most of the time, however, we calculate the standard deviation with a computer or calculator. You do not need to calculate variances and standard deviations by hand unless the instructions specifically request a step-by-step calculation.
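A minimal Python version of the hand calculation, using only the standard library. The variable names are ours; `statistics.variance` and `statistics.stdev` compute the sample (n − 1) versions and serve as a check.

```python
import math
import statistics

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
n = len(ages)
x_bar = sum(ages) / n

ss = sum((x - x_bar) ** 2 for x in ages)   # sum of squared deviations (SS)
s2 = ss / (n - 1)                           # sample variance, n - 1 in the denominator
s = math.sqrt(s2)                           # sample standard deviation

print(ss, round(s2, 2), round(s, 1))        # 2134.0 237.11 15.4

# Cross-check against the standard library (which also divides by n - 1)
assert math.isclose(s2, statistics.variance(ages))
assert math.isclose(s, statistics.stdev(ages))
```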

