3.2 Five-Number Summary and Boxplot
[Figure: overview of data types and their summaries. Categorical Data: Bar Chart, Pie Chart. Numerical Data: Histogram; Mean, Median, Range, Variance, Standard Deviation; Five-Number Summary, Boxplot.]
3.2.1 Quartiles and Five-Number Summary
We use the following method to find the quartiles of a data set:
Arrange the data set in increasing order.
The middle value (the median) is the second quartile (Q2), which divides the data set into two halves (the lower half on the left and the upper half on the right).
The middle value of the lower half of the data set is called the first quartile (Q1).
The middle value of the upper half of the data set is called the third quartile (Q3).
The five-number summary is defined as: Minimum, Q1, Q2, Q3, Maximum
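The method above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name `five_number_summary` is hypothetical, and the choice to leave the median out of both halves when the number of observations is odd is an assumption that matches the worked examples in this section.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves method."""
    values = sorted(data)                # arrange in increasing order
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = values[:half]                # lower half, left of the median
    upper = values[half + n % 2:]        # upper half; skips the median when n is odd
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Commuting times from Example 1
commute = [15, 35, 5, 10, 20, 50, 30, 90, 37, 40, 60, 45]
print(five_number_summary(commute))      # (5, 17.5, 36.0, 47.5, 90)
```

When a half has an even number of values, `median` returns the average of its two middle values, which is how 17.5 and 47.5 arise for the commuting data.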
Example 1 A survey of a class shows that the commuting times to campus (in minutes) are as follows:
15 35 5 10 20 50 30 90 37 40 60 45
(a) Find the quartiles.
(b) Find the five-number summary.
Solution
(a) First we arrange the data set in increasing order:
5 10 15 20 30 35 37 40 45 50 60 90
There are 12 observations, so the middle value is the average of the 6th and 7th values: Q2 = (35 + 37)/2 = 36.
The lower half is 5 10 15 20 30 35, and its middle value is Q1 = (15 + 20)/2 = 17.5.
The upper half is 37 40 45 50 60 90, and its middle value is Q3 = (45 + 50)/2 = 47.5.
(b) The five-number summary is: 5, 17.5, 36, 47.5, 90
Note: The following graph shows the data set and the five-number summary:
The quartiles divide the ordered data set into four parts so that each part contains the same number of observations (3 observations in this example).
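The equal split can be checked directly. The short sketch below counts the observations that fall in each of the four parts of Example 1; the variable names are illustrative, and the strict comparisons work here only because no observation is equal to a quartile.

```python
# Ordered commuting times and quartiles from Example 1
ordered = [5, 10, 15, 20, 30, 35, 37, 40, 45, 50, 60, 90]
q1, q2, q3 = 17.5, 36, 47.5

parts = [
    [x for x in ordered if x < q1],          # below Q1
    [x for x in ordered if q1 < x < q2],     # between Q1 and Q2
    [x for x in ordered if q2 < x < q3],     # between Q2 and Q3
    [x for x in ordered if x > q3],          # above Q3
]
part_sizes = [len(p) for p in parts]
print(part_sizes)                            # [3, 3, 3, 3]
```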
Example 2 The following are the TV watching times of nine people for one week (in hours):
15 17 49 25 20 10 21 19 27
(a) Find the quartiles.
(b) Find the five-number summary.
Solution
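(a) The ordered data set is:
10 15 17 19 20 21 25 27 49
There are 9 observations, so the middle value is the 5th value: Q2 = 20. The lower half (10 15 17 19) and the upper half (21 25 27 49) do not include the median, so Q1 = (15 + 17)/2 = 16 and Q3 = (25 + 27)/2 = 26.
(b) The five-number summary is: 10, 16, 20, 26, 49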
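Software does not always use the median-of-halves method from Section 3.2.1, so computed quartiles may differ slightly from hand calculations. As a hedged illustration, the sketch below calls `statistics.quantiles` from the Python standard library with its default settings, which interpolate between neighboring values. The variable names are illustrative only.

```python
from statistics import quantiles

tv_hours = [15, 17, 49, 25, 20, 10, 21, 19, 27]               # Example 2
commute = [15, 35, 5, 10, 20, 50, 30, 90, 37, 40, 60, 45]     # Example 1

# n=4 asks for the three cut points that split the data into quarters
tv_q = quantiles(tv_hours, n=4)
commute_q = quantiles(commute, n=4)

print(tv_q)        # [16.0, 20.0, 26.0]    same as the hand calculation
print(commute_q)   # [16.25, 36.0, 48.75]  Q1 and Q3 differ from 17.5 and 47.5
```

The median agrees in both cases. Only Q1 and Q3 depend on the convention, so it is worth stating which method was used when reporting quartiles.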
The following is a graph of the five-number summary:
3.2.2 Outliers and Boxplot
Outliers are exceptionally small or large observations compared with other observations in a data set.
Case 1 A student’s homework scores are all between 8 and 10, except for one score of 1. The score of 1 may be considered an outlier.
Case 2 The gas prices along a street are mostly between $3.16 and $3.21. However, one gas station’s price is $3.00 (cheap), and another gas station’s price is $3.45 (expensive). These two prices may be considered outliers.
Case 3 A research team collects water samples from a lake each day and tests the pollution level. One day the pollution level is 10 times higher than the previous levels. This high pollution level may be considered an outlier.
A boxplot is a graph that describes the distribution of a data set. As shown in the following figure, the box extends from Q1 to Q3, with a line inside it at Q2. Outliers are plotted as separate points. The whiskers represent the data outside the box, excluding the outliers. The endpoints of the whiskers are called adjacent values: an adjacent value is the minimum or maximum of the data set if there is no outlier at that end, and otherwise it is the observation next to the outliers.
[Figure: a boxplot drawn above a horizontal axis running from 10 to 50, with labels for the two adjacent values, Q1, Q2, Q3, and an outlier.]
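The definition of adjacent values can be sketched as follows. The function `adjacent_values` is a hypothetical helper, and it takes the outliers as an input because no rule for identifying them has been stated at this point in the section. Treating 49 as an outlier in the TV data is an assumption made only for illustration.

```python
def adjacent_values(data, outliers):
    """Return the whisker endpoints: the smallest and largest non-outlier values."""
    flagged = set(outliers)
    kept = sorted(x for x in data if x not in flagged)
    if not kept:
        raise ValueError("every observation was flagged as an outlier")
    return kept[0], kept[-1]

tv_hours = [15, 17, 49, 25, 20, 10, 21, 19, 27]    # data from Example 2

# No outliers: the whiskers reach the minimum and the maximum
print(adjacent_values(tv_hours, outliers=[]))      # (10, 49)

# Assumed outlier at 49: the upper whisker stops at the next value, 27
print(adjacent_values(tv_hours, outliers=[49]))    # (10, 27)
```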