3.2 Five-Number Summary and Boxplot

Figure: Ways to describe data. Categorical data: bar chart, pie chart. Numerical data: histogram; mean, median, range, variance, standard deviation; five-number summary and boxplot.

3.2.1 Quartiles and Five-Number Summary

We use the following method to find quartiles of a data set:

Arrange the data set in increasing order.

The middle value (the median) is the second quartile (Q2). It divides the data set into two halves: the lower half on the left and the upper half on the right. When the number of observations is odd, the median itself belongs to neither half.

The middle value of the lower half of the data set is called the first quartile (Q1).

The middle value of the upper half of the data set is called the third quartile (Q3).

The five-number summary is defined as: Minimum, Q1, Q2, Q3, Maximum
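The steps above can be sketched in Python as follows. This is a minimal illustration, assuming the convention used in this section's examples: when the number of observations is odd, the median is left out of both halves. The function names `median` and `five_number_summary` are our own. Library routines such as `numpy.percentile` use other interpolation conventions, so they can return different quartiles for the same data.

```python
from typing import List, Sequence, Tuple


def median(sorted_values: Sequence[float]) -> float:
    """Middle value of an already-sorted, non-empty sequence."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    # Even count: average the two middle values.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (Minimum, Q1, Q2, Q3, Maximum) using the median-of-halves method."""
    values: List[float] = sorted(data)  # step 1: increasing order
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = values[:half]            # left of the median
    upper = values[half + n % 2:]    # right of the median (skips it when n is odd)
    q1, q2, q3 = median(lower), median(values), median(upper)
    return values[0], q1, q2, q3, values[-1]


print(five_number_summary([15, 35, 5, 10, 20, 50, 30, 90, 37, 40, 60, 45]))
# prints (5, 17.5, 36.0, 47.5, 90)
```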

Example 1 A survey of a class shows that the commuting times to campus (in minutes) are as follows:

15 35 5 10 20 50 30 90 37 40 60 45

(a) Find the quartiles.

(b) Find the five-number summary.

Solution

(a) First we arrange the data set in increasing order:

5 10 15 20 30 35 37 40 45 50 60 90

There are 12 observations, so the median is the average of the 6th and 7th values, 35 and 37:

Q2 = (35 + 37)/2 = 36

The lower half is 5 10 15 20 30 35. Its middle value is the average of 15 and 20:

Q1 = (15 + 20)/2 = 17.5

The upper half is 37 40 45 50 60 90. Its middle value is the average of 45 and 50:

Q3 = (45 + 50)/2 = 47.5

(b)

The five-number summary is: 5, 17.5, 36, 47.5, 90

Note: The following graph shows the data set and the five-number summary:

The quartiles divide the ordered data set into four parts, each containing the same number of observations (3 observations in this example).
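This claim can be checked for Example 1 with a short Python sketch. It places each ordered observation into one of the four intervals bounded by the quartiles found above and counts the observations in each interval.

```python
# Ordered data and quartiles from Example 1.
ordered = [5, 10, 15, 20, 30, 35, 37, 40, 45, 50, 60, 90]
q1, q2, q3 = 17.5, 36, 47.5

# Strict inequalities are safe here only because no observation
# in this data set is exactly equal to a quartile.
parts = [
    [x for x in ordered if x < q1],
    [x for x in ordered if q1 < x < q2],
    [x for x in ordered if q2 < x < q3],
    [x for x in ordered if x > q3],
]
for part in parts:
    print(part, len(part))
# prints [5, 10, 15] 3, then [20, 30, 35] 3, [37, 40, 45] 3, [50, 60, 90] 3
```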

Example 2 The following are the weekly TV watching times (in hours) of several people:

15 17 49 25 20 10 21 19 27

(a) Find the quartiles.

(b) Find the five-number summary.

Solution

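(a) The ordered data set is:

10 15 17 19 20 21 25 27 49

There are 9 observations, so the median is the 5th value: Q2 = 20. Leaving the median out, the lower half is 10 15 17 19 and the upper half is 21 25 27 49. Hence

Q1 = (15 + 17)/2 = 16 and Q3 = (25 + 27)/2 = 26

(b)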

The five-number summary is: 10, 16, 20, 26, 49
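With an odd number of observations, the median is itself a data value, and in this example it is left out of both halves before Q1 and Q3 are found. The following Python sketch traces that arithmetic step by step. The variable names are our own.

```python
ordered = sorted([15, 17, 49, 25, 20, 10, 21, 19, 27])
n = len(ordered)                 # 9 observations (odd)
m = n // 2                       # index of the median, 4
q2 = ordered[m]                  # 20
lower = ordered[:m]              # [10, 15, 17, 19], median excluded
upper = ordered[m + 1:]          # [21, 25, 27, 49], median excluded
q1 = (lower[1] + lower[2]) / 2   # each half has 4 values: (15 + 17) / 2
q3 = (upper[1] + upper[2]) / 2   # (25 + 27) / 2
print(ordered[0], q1, q2, q3, ordered[-1])
# prints 10 16.0 20 26.0 49
```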

The following is a graph of the five-number summary:

3.2.2 Outliers and Boxplot

Outliers are exceptionally small or large observations compared with other observations in a data set.

Case 1 All of a student's homework scores are between 8 and 10 except one, which is 1. The score of 1 may be considered an outlier.

Case 2 The gas prices along a street are mostly between $3.16 and $3.21. However, one gas station's price is $3.00 (cheap), and another's is $3.45 (expensive). These two prices may be considered outliers.

Case 3 A research team collects a water sample from a lake each day and tests its pollution level. One day the pollution level is 10 times the previous levels. This high pollution level may be considered an outlier.

A boxplot is a graph that describes the distribution of a data set. As shown in the following figure, the box extends from Q1 to Q3, with a line at the median Q2. Outliers are plotted as separate points. The whiskers cover the data outside the box, excluding the outliers. The endpoints of the whiskers are called adjacent values. An adjacent value is the minimum (or maximum) observation if there is no outlier at that end; otherwise it is the observation next to the outliers.

Figure: A boxplot on a scale from 10 to 50, with labels marking Q1, Q2 and Q3 on the box, an adjacent value at the end of each whisker, and an outlier plotted as a separate point.
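This section does not say how outliers are identified. One common convention (Tukey's rule) flags any observation more than 1.5 × IQR below Q1 or above Q3, where IQR = Q3 − Q1. Assuming that rule, the Python sketch below computes the parts of a boxplot. The function names are our own, and the quartiles follow the median-of-halves method of 3.2.1.

```python
from typing import List, Sequence, Tuple


def quartiles(values: Sequence[float]) -> Tuple[float, float, float]:
    """Q1, Q2, Q3 of sorted values by the median-of-halves method."""
    def med(v: Sequence[float]) -> float:
        k = len(v) // 2
        return v[k] if len(v) % 2 else (v[k - 1] + v[k]) / 2

    half = len(values) // 2
    # For odd counts, the upper half starts after the median.
    return med(values[:half]), med(values), med(values[half + len(values) % 2:])


def boxplot_parts(data: Sequence[float]) -> Tuple[float, float, float, float, float, List[float]]:
    """Return (adjacent_low, Q1, Q2, Q3, adjacent_high, outliers).

    ASSUMPTION: outliers are flagged with the 1.5 * IQR fence rule,
    which this section does not itself specify.
    """
    values = sorted(data)
    q1, q2, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in values if x < low_fence or x > high_fence]
    inside = [x for x in values if low_fence <= x <= high_fence]
    # Adjacent values: the most extreme observations that are not outliers.
    return inside[0], q1, q2, q3, inside[-1], outliers


print(boxplot_parts([15, 17, 49, 25, 20, 10, 21, 19, 27]))
# prints (10, 16.0, 20, 26.0, 27, [49])
```

For the TV data of Example 2, IQR = 26 − 16 = 10, so the fences are 16 − 15 = 1 and 26 + 15 = 41. Under this rule, 49 is flagged as an outlier and the upper whisker ends at the adjacent value 27. For Example 1, the fences are −27.5 and 92.5, so no observation is flagged.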