3.1 Center and Deviation

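[Figure: Overview of data types. Data is divided into categorical data (displayed with a bar chart or a pie chart) and numerical data (displayed with a histogram and described by the mean, median, range, variance, and standard deviation).]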

The mean and the median are descriptive measures of the center of a data set.

The range, variance, and standard deviation are descriptive measures of the deviation (spread) of a data set.

3.1.1 Mean (Measurement of Center)

For a data set, the mean (or average) is defined as:

$$\text{Mean} = \frac{\text{Sum of all the observations}}{\text{Number of observations}}$$

Example 1. The following are the scores of two students:

Student A: 9 7 8.5 7.5 8

Student B: 5 10 6 8

(a) Find the sample mean for each student.

(b) Use the sample means to determine which student performs better.

Solution

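(a)

For Student A, $\text{Mean} = \frac{9 + 7 + 8.5 + 7.5 + 8}{5} = \frac{40}{5} = 8$

For Student B, $\text{Mean} = \frac{5 + 10 + 6 + 8}{4} = \frac{29}{4} = 7.25$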

(b) Student A performs better because Student A has the higher mean (8 versus 7.25).

Symbols and formula

Consider Student A: 9 7 8.5 7.5 8

The score for the 1st quiz is denoted as $x_1 = 9$.

The score for the 2nd quiz is denoted as $x_2 = 7$.

The score for the 3rd quiz is denoted as $x_3 = 8.5$.

The score for the 4th quiz is denoted as $x_4 = 7.5$.

The score for the 5th quiz is denoted as $x_5 = 8$.

Then, the sum of the observations is:

$$x_1 + x_2 + x_3 + x_4 + x_5 = 9 + 7 + 8.5 + 7.5 + 8 = 40$$

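The Greek letter $\Sigma$ (capital sigma) corresponds to the letter S and is used as an abbreviation for the phrase “the sum of”. So, in place of $x_1 + x_2 + x_3 + x_4 + x_5$, we can use the summation notation $\sum x_i$. Then we can express the sum as

$$\sum x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 40$$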

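Let $n$ denote the number of observations (also called the sample size). Here $n = 5$.

Let $\bar{x}$ (read “x-bar”) denote the sample mean. Then

$$\bar{x} = \frac{\sum x_i}{n} = \frac{40}{5} = 8$$

In general, for a sample containing $n$ observations $x_1, x_2, \ldots, x_n$, the sample mean is given by the following expression:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$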
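The formula $\bar{x} = \sum x_i / n$ translates directly into code. Below is a minimal Python sketch. The function name `sample_mean` is a hypothetical choice of ours, and the two lists hold the quiz scores of Students A and B from Example 1.

```python
def sample_mean(xs):
    """Sample mean: the sum of the observations divided by the sample size n."""
    n = len(xs)        # number of observations (sample size)
    total = sum(xs)    # sum of x_i
    return total / n


student_a = [9, 7, 8.5, 7.5, 8]
student_b = [5, 10, 6, 8]

print(sample_mean(student_a))  # 8.0
print(sample_mean(student_b))  # 7.25
```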

3.1.2 Median (Measurement of Center)

To find the median of a data set, we first arrange the data in increasing order. If the number of observations is odd, the median is the observation in the middle. If the number is even, there is no single middle observation, and the median is the average of the two observations in the middle.

By this definition, the median splits the ordered data in half: roughly half the observations are smaller than the median and half are larger.

Example 2. The following are the scores of two students:

Student A: 9 7 8.5 7.5 8

Student B: 5 10 6 8

(a) Find the median score for each student.

(b) Use the medians to determine which student performs better.

Solution

(a) For Student A, the data set in increasing order is: 7 7.5 8 8.5 9

There are 5 observations (an odd number), so the median is the middle (3rd) observation: Median = 8.

For Student B, the data set in increasing order is: 5 6 8 10

There are 4 observations (an even number), so there are two middle numbers, 6 and 8. The median is their average: Median = (6 + 8)/2 = 7.

(b) Student A performs better because Student A has the higher median (8 versus 7).

We summarize the process of finding the median as follows:

1. Arrange the data in increasing order.
2. If the number of observations is odd, the median is the observation in the middle.
3. If the number of observations is even, the median is the average of the two observations in the middle.
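The three steps can be sketched in Python as follows. This is an illustrative sketch, not a library routine, and the function name `median` is hypothetical here. Python's standard library also provides `statistics.median`, which follows the same odd/even rule.

```python
def median(xs):
    """Median of a list of numbers, following the three steps above."""
    ordered = sorted(xs)          # Step 1: arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # Step 2: odd n, take the middle observation
        return ordered[mid]
    # Step 3: even n, average the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([9, 7, 8.5, 7.5, 8]))  # 8   (Student A)
print(median([5, 10, 6, 8]))        # 7.0 (Student B)
```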

Note: In data on household incomes in a region, a few households may have very large incomes (for example, $200,000 or much more). Data with a long tail of large values like this is called right skewed, and for such data the mean is typically larger than the median. The reason is as follows. When calculating the mean, each value contributes in proportion to its size. One household with a $200,000 income adds as much to the total as ten households with a $20,000 income, so a few very large incomes pull the mean up. When calculating the median, only the order of the values matters. A household with a $200,000 income is counted as one observation, exactly like a household with a $20,000 income, so the size of the largest incomes does not pull the median up.

Another explanation uses a balance (seesaw) with objects placed on it:

Case 1 (like the median). The distance of each object from the center has no effect on the balance; only the number of objects on each side matters.

Case 2 (like the mean). The farther an object is from the center, the more it tips the balance.
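To see the effect numerically, the following Python sketch compares the mean and the median on a small set of hypothetical household incomes in dollars. The numbers are made up for illustration and are not real data. The sketch uses `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical incomes (illustrative only, not real data):
# nine households earn 20,000 and one earns 200,000.
incomes = [20_000] * 9 + [200_000]
print(f"mean = {mean(incomes):,.0f}, median = {median(incomes):,.0f}")
# mean = 38,000, median = 20,000

# Make the largest income ten times bigger; only the mean moves.
incomes_more_skewed = [20_000] * 9 + [2_000_000]
print(f"mean = {mean(incomes_more_skewed):,.0f}, median = {median(incomes_more_skewed):,.0f}")
# mean = 218,000, median = 20,000
```

Enlarging the top income changes that observation's size but not its position in the ordered list. The median therefore stays at 20,000, while the mean rises sharply.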

3.1.3 Range (Measurement of Deviation)

For a set of data, the range is:

$$\text{Range} = \text{Maximum} - \text{Minimum}$$

Example 3. The following are the scores of two students:

Student A: 9 7 8.5 7.5 8

Student B: 5 10 6 8

(a) Find the range for each student.

(b) Use the ranges to determine which student’s scores are more spread out.

Solution

(a)

For Student A, $\text{Range} = 9 - 7 = 2$

For Student B, $\text{Range} = 10 - 5 = 5$

(b) Student B’s scores are more spread out because Student B’s range is larger (5 versus 2).
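The range uses only the largest and the smallest values, so it takes one line to compute. In this hypothetical Python helper, the name `data_range` is chosen so that it does not shadow Python's built-in `range`.

```python
def data_range(xs):
    """Range of a data set: maximum minus minimum."""
    return max(xs) - min(xs)


print(data_range([9, 7, 8.5, 7.5, 8]))  # 2 (Student A)
print(data_range([5, 10, 6, 8]))        # 5 (Student B)
```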

Note: The range is calculated from only two values, the maximum and the minimum of a data set. Next we discuss the sample standard deviation, which is calculated from all the numbers in a data set.

3.1.4 Sample Variance and Standard Deviation (Measurement of Deviation)

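We use the following formulas to find the sample variance $s^2$ and the sample standard deviation $s$:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$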

Example 4. The following are the scores of Student A:

Student A: 9 7 8.5 7.5 8

Find the sample standard deviation.

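Solution 1 (Using formulas)

$$\bar{x} = \frac{9 + 7 + 8.5 + 7.5 + 8}{5} = \frac{40}{5} = 8$$

$$s^2 = \frac{(9-8)^2 + (7-8)^2 + (8.5-8)^2 + (7.5-8)^2 + (8-8)^2}{5-1} = \frac{1 + 1 + 0.25 + 0.25 + 0}{4} = \frac{2.5}{4} = 0.625$$

$$s = \sqrt{0.625} \approx 0.79$$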

Solution 2 (Using table)

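| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 9 | 9 - 8 = 1 | (1)² = 1 |
| 7 | 7 - 8 = -1 | (-1)² = 1 |
| 8.5 | 8.5 - 8 = 0.5 | (0.5)² = 0.25 |
| 7.5 | 7.5 - 8 = -0.5 | (-0.5)² = 0.25 |
| 8 | 8 - 8 = 0 | (0)² = 0 |
| Total: 40 | | Total: 2.5 |

From the totals, $\bar{x} = \frac{40}{5} = 8$, $s^2 = \frac{2.5}{5 - 1} = 0.625$, and $s = \sqrt{0.625} \approx 0.79$.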

The sample standard deviation measures how far the scores typically are from the mean 8. So $s = 0.79$ indicates that the scores typically deviate from the center by about 0.79 point.

[Figure: center 8, with one standard deviation (SD = 0.79) marked on each side of the center.]
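The table method can be sketched in Python as below: compute the mean, then the deviations, then the squared deviations, then divide their total by $n - 1$ and take the square root. The function names are hypothetical. Python's standard `statistics.variance` and `statistics.stdev` also use the $n - 1$ divisor, so they should agree with this sketch.

```python
import math


def sample_variance(xs):
    """Sample variance s^2 = sum of (x_i - mean)^2, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n                       # mean: total of column 1 over n
    deviations = [x - xbar for x in xs]      # column 2: x_i - xbar
    squares = [d ** 2 for d in deviations]   # column 3: (x_i - xbar)^2
    return sum(squares) / (n - 1)            # divide by n - 1, not n


def sample_sd(xs):
    """Sample standard deviation s = square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


student_a = [9, 7, 8.5, 7.5, 8]
student_b = [5, 10, 6, 8]
for name, scores in [("A", student_a), ("B", student_b)]:
    print(name, round(sample_variance(scores), 3), round(sample_sd(scores), 2))
# A 0.625 0.79
# B 4.917 2.22
```

The Student B line reproduces the standard deviation of 2.22 worked out in the next example.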

Example 5. The following are the scores of Student B:

Student B: 5 10 6 8

Find the sample standard deviation.

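Solution

$$\bar{x} = \frac{5 + 10 + 6 + 8}{4} = \frac{29}{4} = 7.25$$

$$s^2 = \frac{(5-7.25)^2 + (10-7.25)^2 + (6-7.25)^2 + (8-7.25)^2}{4-1} = \frac{5.0625 + 7.5625 + 1.5625 + 0.5625}{3} = \frac{14.75}{3} \approx 4.92$$

$$s = \sqrt{\frac{14.75}{3}} \approx 2.22$$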

The sample standard deviation measures how far the scores typically are from the mean 7.25. So $s = 2.22$ indicates that the scores typically deviate from the center by about 2.22 points.

[Figure: center 7.25, with one standard deviation (SD = 2.22) marked on each side of the center.]

Example 6. The following are the scores of two students:

Student A: 9 7 8.5 7.5 8

Student B: 5 10 6 8

Use the sample standard deviations to determine which student’s scores are more spread out.

Solution

From Examples 4 and 5, we have

| Student | Sample standard deviation |
|---|---|
| Student A | 0.79 |
| Student B | 2.22 |

Student B’s scores are more spread out because the sample standard deviation is larger.

Relation between sample variance and sample standard deviation

Sample variance is the square of sample standard deviation. Conversely, sample standard deviation is the square root of the sample variance.

Example 7. The sample variance of a sample is 6.54. Find the sample standard deviation.

Solution

$$s = \sqrt{s^2} = \sqrt{6.54} \approx 2.56$$

Example 8. The sample standard deviation is 0.56. Find the sample variance.

Solution

$$s^2 = (0.56)^2 = 0.3136$$
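The two conversions amount to a square root and a square. The short Python sketch below checks Examples 7 and 8 with `math.sqrt` from the standard library. The variable names are our own illustrative choices.

```python
import math

# Example 7: standard deviation from variance (square root)
variance_given = 6.54
sd_from_variance = math.sqrt(variance_given)
print(round(sd_from_variance, 2))   # 2.56

# Example 8: variance from standard deviation (square)
sd_given = 0.56
variance_from_sd = sd_given ** 2
print(round(variance_from_sd, 4))   # 0.3136
```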

3.1.5 Weighted Average

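Suppose that the values $x_1, x_2, \ldots, x_k$ have different frequencies (or weights) $w_1, w_2, \ldots, w_k$. The weighted average is

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k}$$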

Example 9

Class A has 20 students and the average score is 80.

Class B has 26 students and the average score is 75.

Class C has 32 students and the average score is 81.

What is the average score for all the students?

Solution 1 (using formula)

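The number of students in each class is the weight $w_i$, and the average score in each class is $x_i$. Then

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{80 \times 20 + 75 \times 26 + 81 \times 32}{20 + 26 + 32} = \frac{1600 + 1950 + 2592}{78} = \frac{6142}{78} \approx 78.74$$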

Solution 2 (using table)

We can organize the above in the following table:

| Grade ($x_i$) | Number of students (frequency or weight) ($w_i$) | $x_i \times w_i$ |
|---|---|---|
| 80 | 20 | 80 × 20 = 1600 |
| 75 | 26 | 75 × 26 = 1950 |
| 81 | 32 | 81 × 32 = 2592 |
| Total | 78 | 6142 |

By the weighted average formula:

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{6142}{78} \approx 78.74$$

Note: 78 is the total frequency (the total number of students), and 6142 is the total of all the students’ scores.
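The weighted average formula can be sketched in Python as follows. The function name `weighted_average` is hypothetical. For contrast, the sketch also computes the plain (unweighted) average of the three class averages.

```python
def weighted_average(values, weights):
    """Weighted average: sum of w_i * x_i divided by sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(w * x for x, w in zip(values, weights))  # total of all scores
    return total / sum(weights)                          # divide by total weight


class_means = [80, 75, 81]   # average score x_i of classes A, B, C
class_sizes = [20, 26, 32]   # number of students w_i in each class

print(round(weighted_average(class_means, class_sizes), 2))  # 78.74
print(round(sum(class_means) / len(class_means), 2))         # 78.67 (unweighted)
```

The unweighted figure, 78.67, gives each class equal importance even though the classes differ in size. Weighting by the number of students gives 78.74, which is the average over all 78 students.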

3.1.6 Using Midpoint to Calculate Weighted Average

Example 10. The following is a frequency distribution of commuting time from home to school for the students of a class. Find the average commuting time.

| Commuting time (minutes) | Number of students |
|---|---|
| 1-10 | 2 |
| 11-20 | 3 |
| 21-30 | 6 |
| 31-40 | 0 |
| 41-50 | 1 |

Solution 1 (using formula)

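| Commuting time (minutes) (midpoint $x_i$) | Number of students ($w_i$) |
|---|---|
| 1-10 (5.5) | 2 |
| 11-20 (15.5) | 3 |
| 21-30 (25.5) | 6 |
| 31-40 (35.5) | 0 |
| 41-50 (45.5) | 1 |

Using the midpoint of each class as its value:

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{5.5 \times 2 + 15.5 \times 3 + 25.5 \times 6 + 35.5 \times 0 + 45.5 \times 1}{2 + 3 + 6 + 0 + 1} = \frac{256}{12} \approx 21.33 \text{ minutes}$$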

Note: For the class 11-20, if we use the average of the endpoints as the midpoint, then midpoint = (11 + 20)/2 = 15.5. For convenience, we can also use 15 as the midpoint. The individual commuting times within a class are unknown, so there is no reason to believe that 15.5 is more accurate than 15, or vice versa. Therefore, both are acceptable in homework assignments and tests.

Solution 2 (using table)

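| Commuting time (minutes) | Midpoint ($x_i$) | Frequency ($w_i$) | $x_i \times w_i$ |
|---|---|---|---|
| 1-10 | 5.5 | 2 | 5.5 × 2 = 11 |
| 11-20 | 15.5 | 3 | 15.5 × 3 = 46.5 |
| 21-30 | 25.5 | 6 | 25.5 × 6 = 153 |
| 31-40 | 35.5 | 0 | 35.5 × 0 = 0 |
| 41-50 | 45.5 | 1 | 45.5 × 1 = 45.5 |
| Total | | 12 | 256 |

Average commuting time = 256/12 ≈ 21.33 minutes.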
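The midpoint method is the weighted average with class midpoints standing in for the unknown individual values. Below is a minimal Python sketch. The list names are illustrative, and the midpoints are the endpoint averages (5.5, 15.5, ...) used in the table.

```python
# Class limits and frequencies from the commuting-time table
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
counts = [2, 3, 6, 0, 1]

# Midpoint of each class = average of its two endpoints
midpoints = [(lo + hi) / 2 for lo, hi in classes]
total_time = sum(m * f for m, f in zip(midpoints, counts))  # sum of x_i * w_i
n = sum(counts)                                             # total frequency

print(total_time, n)             # 256.0 12
print(round(total_time / n, 2))  # 21.33
```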