# Chapter 9 Estimation from Sample Data Work Sampling

#### Document technical information

Format pdf
Size 4.0 MB
First found Jun 9, 2017

#### Transcript

```Chapter 9
Estimation from
Sample Data
Work Sampling
In the workplace, production experts sometimes conduct studies to find out how
much time workers spend doing various job activities. This information can be used in
establishing or updating standards for performance, as well as for comparing and
evaluating worker performance. Among the approaches for determining how much
time a worker spends at various activities is a technique known as work sampling.
Compared to alternative methods (such as following the person and timing him or
her with a stopwatch), work sampling is unobtrusive in that the behavior observed
is not influenced by the observation process itself.
In work sampling, a worker is observed at randomly selected points along an
interval of time; then the proportion of these observations that involve each selected
activity is determined. For example, if we want to determine how much of the time a
secretary spends keyboarding, we can observe the person at random times during a
typical day or week, then calculate the proportion of time that he or she happens to
be keyboarding. If the person were observed to be keyboarding in 100 of 280 random
observations, the sample proportion would be 100/280, or 0.357.
[Photo caption: Making an inference based on sample data]
Using this information, along with estimation techniques described later in this chapter,
we could then arrive at an interval estimate reflecting the likely range of values within
which the true population proportion lies. When you finish Section 9.6, you might like to
pay a short visit back here and verify that the 90% confidence interval for the population
proportion is from 0.310 to 0.404, and that we are 90% confident that the person spends
somewhere between 31.0% and 40.4% of his or her time clicking away at the keyboard.
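As a rough check of the interval quoted above, here is a sketch only. It assumes the z-interval
for a proportion, p ± z√(p(1 − p)/n), which is covered in Section 9.6, and the standard normal
value z = 1.645 for 90% confidence:
  p = 100/280 = 0.357
  standard error = √(0.357 × 0.643 / 280) = √0.000820 = 0.0286
  margin = 1.645 × 0.0286 = 0.047
  interval = 0.357 ± 0.047, or about 0.310 to 0.404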
Part 3: Sampling Distributions and Estimation
learning objectives
After reading this chapter, you should be able to:
• Explain the difference between a point estimate and an interval estimate for a
population parameter.
• Use the standard normal distribution in constructing a confidence interval for a
population mean or proportion.
• Use the t distribution in constructing a confidence interval for a population
mean.
• Decide whether the standard normal distribution or the t distribution should
be used in constructing a given confidence interval.
• Determine how large a simple random sample must be in order to estimate a
population mean or proportion at specified levels of accuracy and confidence.
• Use Excel and Minitab to construct confidence intervals.
9.1 INTRODUCTION
In Chapter 8, we began with a population having a known mean (μ) or proportion (π); then we examined the sampling distribution of the corresponding sample statistic (x̄ or p) for samples of a given size. In this chapter, we’ll be going
in the opposite direction—based on sample data, we will be making estimates
involving the (unknown) value of the population mean or proportion. As mentioned previously, the use of sample information to draw conclusions about the
population is known as inferential statistics.
To repeat a very important point, this chapter completes a key transition discussed at the beginning of Chapter 7:
• In Chapter 7, we made probability statements about individual x values taken from the population.
• In Chapter 8, we invoked the central limit theorem and began making probability statements about the
means of samples taken from the population. (Beginning with Chapter 8, the
sample mean itself is considered as a random variable.)
• In this chapter, we again lack access to the population mean, but we will begin
using sample data as the basis from which to make probability statements
about the true (but unknown) value of the population mean. As in Chapter 8,
we will be relying heavily on the central limit theorem.
In the following sections, we will use sample data to make both point and
interval estimates regarding the population mean or proportion. While the
point estimate is a single number that estimates the exact value of the population parameter of interest (e.g., μ or π), an interval estimate includes a range
of possible values that are likely to include the actual population parameter.
When the interval estimate is associated with a degree of confidence that it
actually includes the population parameter, it is referred to as a confidence
interval.
Point and interval estimates can also be made regarding the difference between
two population means (μ1 − μ2) or proportions (π1 − π2). These involve data from
two samples, and they will be discussed in the context of the hypothesis-testing
procedures of Chapter 11.
Chapter 9: Estimation from Sample Data
Whenever sample data are used for estimating a population mean or proportion, sampling error will tend to be present because a sample has been taken
instead of a census. As a result, the observed sample statistic (x̄ or p) will differ
from the actual value of the population parameter (μ or π). Assuming a simple
random sampling of elements from the population, formulas will be presented for
determining how large a sample size is necessary to ensure that such sampling
error is not likely to exceed a given amount.
exercises
9.1 Differentiate between a point estimate and an interval estimate for a population parameter.
9.2 What is meant by inferential statistics, and what role does it play in estimation?
9.3 What is necessary for an interval estimate to be a confidence interval?
9.2 POINT ESTIMATES
An important consideration in choosing a sample statistic as a point estimate of
the value of a population parameter is that the sample statistic be an unbiased estimator. An estimator is unbiased if the expected value of the sample statistic is
the same as the actual value of the population parameter it is intended to estimate.
Three important point estimators introduced in the chapter are those for a population mean (μ), a population variance (σ²), and a population proportion (π).
As Chapter 8 showed, the expected value of the sample mean is the population
mean, and the expected value of the sample proportion is the population proportion. As a result, x̄ and p are unbiased estimators of μ and π, respectively. Table 9.1
presents a review of the applicable formulas. Note that the divisor in the formula
for the sample variance (s²) is (n − 1). Using (n − 1) as the divisor in calculating the
variance of the sample results in s² being an unbiased estimate of the (unknown)
population variance, σ². The positive square root of s², the sample standard deviation (s), will not be an unbiased estimate of the population standard deviation (σ).
In practice, however, s is the most frequently used estimator of its population counterpart, σ.
TABLE 9.1
An estimator is unbiased if its expected value is the same as the actual value of the
corresponding population parameter. Listed here are unbiased point estimators for a
population mean, a population variance, and a population proportion.

Population Parameter   Unbiased Estimator   Formula
Mean, μ                x̄                    x̄ = Σxi / n
Variance, σ²           s²                   s² = Σ(xi − x̄)² / (n − 1)
Proportion, π          p                    p = x successes / n trials
exercises
9.4 What is meant when a sample statistic is said to be an unbiased estimator?
9.5 When calculating the sample variance, what procedure is necessary to ensure that s² will be
an unbiased estimator of σ²? Will s be an unbiased estimator of σ?
9.6 During the month of July, an auto manufacturer gives its production employees a vacation
period so it can tool up for the new model run. In surveying a simple random sample of 200
production workers, the personnel director finds that 38% of them plan to vacation out of state
for at least one week during this period. Is this a point estimate or an interval estimate? Explain.
9.7 A simple random sample of 8 employees is selected from a large firm. For the 8 employees,
the number of days each was absent during the past month was found to be 0, 2, 4, 2, 1, 7, 3,
and 2, respectively.
a. What is the point estimate for μ, the mean number of days absent for the firm’s employees?
b. What is the point estimate for σ², the variance of the number of days absent?
9.8 The average annual U.S. per capita consumption of iceberg lettuce has been estimated as
24.3 pounds. The annual per capita consumption 2 years earlier had been estimated as
21.6 pounds. Could either or both of these consumption figures be considered a point estimate?
Could the difference between the two consumption figures be considered an interval estimate?
Explain your reasoning in both cases. SOURCE: Bureau of the Census, Statistical Abstract of the
United States 2002, p. 130.
9.3 A PREVIEW OF INTERVAL ESTIMATES
When we know the values of the population mean and standard deviation, we can
(if either the population is normally distributed or n is large) use the standard
normal distribution in determining the proportion of sample means that will fall
within a given number of standard error (σx̄) units of the known population
mean. This is exactly what we did in Chapter 8.
It is typical of inferential statistics that we must use the mean (x̄) and standard deviation (s) of a single sample as our best estimates of the (unknown)
values of μ and σ. However, this does not prevent us from employing x̄ and s in
constructing an estimated sampling distribution for all means having this sample
size. This is the basis for the construction of an interval estimate for the population mean.
When we apply the techniques of this chapter and establish the sample
mean as the midpoint of an interval estimate for the population mean, the
resulting interval may or may not include the actual value of μ. For example,
in Figure 9.1, six of the seven simple random samples from the same population led to an interval estimate that included the true value of the population
mean.
In Figure 9.1, the mean of sample number 1 (x̄1) is slightly greater than the
population mean (μ), and the interval estimate based on this sample actually includes μ. For sample 3, taken from the same population, the estimation interval
does not include μ. In Figure 9.1, we can make these observations because the
value of μ is known. In practice, however, we will not have the benefit of knowing the actual value of the population mean. Therefore, we will not be able to say
with complete certainty that an interval based on our sample result will actually
include the (unknown) value of μ.
FIGURE 9.1
Examples of seven different interval estimates for a population mean, with each interval
based on a separate simple random sample from the population. Six of the seven interval
estimates include the actual value of μ.
[Figure: sampling distribution of the mean for all possible simple random samples with a
given sample size, centered at μx̄ = μ = 20.0, shown above the seven interval estimates
listed here.]

Sample mean   Lower limit   Midpoint   Upper limit   Includes μ?
x̄1            15.0          22.0       29.0          Yes
x̄2            11.0          18.0       25.0          Yes
x̄3            23.5          30.5       37.5          No
x̄4            10.0          17.0       24.0          Yes
x̄5             8.0          15.0       22.0          Yes
x̄6            19.0          26.0       33.0          Yes
x̄7            14.5          21.5       28.5          Yes

The interval estimate for the mean simply describes a range of values that is
likely to include the actual population mean. This is also the case for our use of
the sample proportion (p) to estimate the population proportion (π), as well as
for our construction of an interval estimate within which the actual value of π is
likely to fall.
The following terms are of great importance in interval estimation:
INTERVAL ESTIMATE A range of values within which the actual value of the
population parameter may fall.
INTERVAL LIMITS The lower and upper values of the interval estimate.
CONFIDENCE INTERVAL An interval estimate for which there is a specified degree of certainty that the actual value of the population parameter will fall within
the interval.
CONFIDENCE COEFFICIENT For a confidence interval, the proportion of such intervals that would include the population parameter if the process leading to the
interval were repeated a great many times.
CONFIDENCE LEVEL Like the confidence coefficient, this expresses the degree
of certainty that an interval will include the actual value of the population parameter, but it is stated as a percentage. For example, a 0.95 confidence coefficient is
equivalent to a 95% confidence level.
ACCURACY The difference between the observed sample statistic and the actual
value of the population parameter being estimated. This may also be referred to as
estimation error or sampling error.
To illustrate these and several other terms discussed so far, we have provided
their values in the following example, which is typical of published statistical findings. The methods by which the values were determined will become apparent in
the sections to follow.
example
Interval Estimates
“In our simple random sample of 2000 households, we found the average income
to be x̄ = \$65,000, with a standard deviation, s = \$12,000. Based on these data,
we have 95% confidence that the population mean is somewhere between
\$64,474 and \$65,526.”
SOLUTION
• Point estimate of μ = \$65,000
• Point estimate of σ = \$12,000
• Interval estimate of μ = \$64,474 to \$65,526
• Lower and upper interval limits for μ = \$64,474 and \$65,526
• Confidence coefficient = 0.95
• Confidence level = 95%
• Accuracy: For 95% of such intervals, the sample mean would not differ from
the actual population mean by more than \$526.
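The limits in this statement can be reproduced with a short calculation. This is only a
sketch: it assumes that, with n = 2000, the z value of 1.96 was used together with s as the
estimate of σ, which the example itself does not state.
  standard error = s/√n = 12,000/√2000 = 12,000/44.72 = 268.3
  margin = 1.96 × 268.3 = 525.9, or about \$526
  interval = \$65,000 ± \$526, or \$64,474 to \$65,526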
When constructing a confidence interval for the mean, a key consideration
is whether we know the actual value of the population standard deviation (σ).
As Figure 9.2 shows, this will determine whether the normal distribution or
the t distribution (see Section 9.5) will be used in determining the appropriate
interval. Figure 9.2 also summarizes the procedure for constructing the confidence interval for the population proportion, a technique that will be discussed
in Section 9.6.
FIGURE 9.2
This figure provides an overview of the methods for determining confidence interval
estimates for a population mean or a population proportion and indicates the chapter
section in which each is discussed. Key assumptions are reviewed in the figure notes.

Confidence interval estimate for a:
  Population mean, σ known:    z-interval, with limits x̄ ± z(σ/√n)                  (Section 9.4, Note 1)
  Population mean, σ unknown:  t-interval, with limits x̄ ± t(s/√n) and df = n − 1   (Section 9.5, Note 2)
  Population proportion:       z-interval, with limits p ± z√(p(1 − p)/n)            (Section 9.6, Note 3)

Notes:
1 If the population is not normally distributed, n should be at least 30 for the central limit
theorem to apply.
2 When σ is unknown, but the underlying population can be assumed to be approximately
normally distributed, use of the t distribution is a necessity when n < 30. Use of the t distribution is also appropriate, however, when σ is unknown and the sample sizes are larger.
Most computer statistical packages routinely use the t-interval for all sample sizes when s is
used to estimate σ.
3 Assumes that both np and n(1 − p) are ≥ 5. The normal distribution as an approximation
to the binomial improves as n becomes larger and for values of p that are closer to 0.5.
exercises
9.9 Exactly what is meant by the accuracy of a point
estimate?
9.10 A population is approximately normally distributed
and the sample size is to be n = 40. What additional factor
must be considered in determining whether to use the standard normal distribution in constructing the confidence
interval for the population mean?
9.11 “In surveying a simple random sample of 1000 employed adults, we found that 450 individuals felt they
were underpaid by at least \$3000. Based on these results,
we have 95% confidence that the proportion of the population of employed adults who share this sentiment is
between 0.419 and 0.481.” For this summary statement,
identify the
a. point estimate of the population proportion.
b. confidence interval estimate for the population
proportion.
c. confidence level and the confidence coefficient.
d. accuracy of the sample result.
9.4 CONFIDENCE INTERVAL ESTIMATES FOR THE MEAN: σ KNOWN
If we don’t know the value of the population mean, chances are that we also do
not know the value of the population standard deviation. However, in some cases,
usually industrial processes, σ may be known while μ is not. If the population
cannot be assumed to be normally distributed, the sample size must be at least 30
for the central limit theorem to apply. When these conditions are met, the confidence interval for the population mean will be as follows:
Confidence interval limits for the population mean, σ known:
  x̄ ± z(σ/√n)
where x̄ = sample mean
      σ = population standard deviation
      n = sample size
      z = z value corresponding to the level of confidence desired (e.g., z = 1.96
          for the 95% confidence level)
      σ/√n = standard error of the sampling distribution of the mean
NOTES
This application assumes that either (1) the underlying population is normally
distributed or (2) the sample size is n ≥ 30. Also, an alternative way of describing
the z value is to refer to it as z(α/2), with α/2 being the area to the right. For example, z(0.025) would be 1.96.
example
z-Interval, Mean
From past experience, the population standard deviation of rod diameters produced by a machine has been found to be σ = 0.053 inches. For a simple random
sample of n = 30 rods, the average diameter is found to be x̄ = 1.400 inches. The
underlying data are in file CX09RODS.
SOLUTION
What Is the 95% Confidence Interval for the Population Mean, ␮?
Although we don’t know the value of μ, x̄ = 1.400 inches is our best estimate of
the population mean diameter. As Figure 9.3 shows, the sampling distribution of
the mean will have a standard error of σ/√n, or 0.053/√30. For the standard
normal distribution, 95% of the area will fall between z = −1.96 and z = +1.96.
We are able to use the standard normal distribution table because n ≥ 30 and the
central limit theorem can be invoked. As a result, the 95% confidence interval for
the (unknown) population mean can be calculated as
  x̄ ± z(σ/√n) = 1.400 ± 1.96(0.053/√30), or between 1.381 and 1.419 inches
Figure 9.3 shows the midpoint (x̄ = 1.400 inches) for the 95% confidence interval for the mean, along with the lower and upper limits for the confidence interval. Based on our calculations, the 95% confidence interval for the population
mean is from 1.381 to 1.419 inches.
FIGURE 9.3
[Figure: normal distribution with a central area of 0.95 between z = −1.96 and z = +1.96.
For 95% confidence, z will be ±1.96. The 95% confidence interval for μ runs from
x̄ − z(σ/√n) = 1.400 − 1.96(0.053/√30) = 1.381 to x̄ + z(σ/√n) = 1.400 + 1.96(0.053/√30) = 1.419,
with its midpoint at x̄ = 1.400.]
More precisely, 95% of such intervals constructed in this way would include
the population mean. Since we have taken only one sample and constructed just
one interval, it is technically correct to say we have 95% confidence that this
particular interval contains the population mean. Although the logic may be
tempting, this is not the same as saying the probability is 0.95 that this particular
interval will include the population mean.
With other factors unchanged, a higher confidence level will require a wider
confidence interval. Likewise, a lower confidence level will lead to a narrower
confidence interval. In other words, the more certain we wish to be that the interval estimate contains the population parameter, the wider the interval will have to
be. Refer to Seeing Statistics Applet 9, at the end of the chapter, to visually
demonstrate how the width of the confidence interval changes when higher and
lower levels of confidence are specified.
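To illustrate with the rod example, here is a sketch that assumes the standard normal values
z = 1.645, 1.96, and 2.576 for the 90%, 95%, and 99% confidence levels. The standard error
stays at σ/√n = 0.053/√30 = 0.00968, while the interval widens as the confidence level rises:
  Confidence level   z       z(σ/√n)   Interval (inches)
  90%                1.645   0.0159    1.384 to 1.416
  95%                1.960   0.0190    1.381 to 1.419
  99%                2.576   0.0249    1.375 to 1.425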
Computer Solutions 9.1 shows how we can use Excel or Minitab to generate
a confidence interval for the mean when the population standard deviation is
known or assumed. In this case, we are replicating the 95% confidence interval
shown in Figure 9.3, and the 30 data values are in file CX09RODS. If we use an
Excel procedure based on summary statistics, it can be interesting to examine
“what-if” scenarios to instantly see how changes in the specified confidence level
would change the width of the confidence interval.
(Figure 9.3 caption) Construction of the 95% confidence interval for the population mean,
based on a sample of 30 rods for which the average diameter is 1.400 inches. From past
experience, the population standard deviation is known to be σ = 0.053 inches. Because σ is
known, the normal distribution can be used in determining the interval limits. We have
95% confidence that μ is between 1.381 and 1.419 inches.
computer solutions 9.1
Confidence Interval for Population Mean, σ Known
These procedures show how to construct a confidence interval for the population mean when the
population standard deviation is known.
EXCEL
z-Estimate: Mean
                      diameter
Mean                  1.400
Standard Deviation    0.052
Observations          30
SIGMA                 0.053
LCL                   1.381
UCL                   1.419
Excel confidence interval for μ based on raw data and σ known
1. For example, using the 30 rod diameters (file CX09RODS.XLS) on which Figure 9.3 is based: The label and 30 data values are in A1:A31. Click Tools. Click Data Analysis Plus. Click Z-Estimate: Mean. Click OK.
2. Enter A1:A31 into the Input Range box. Enter the known population standard deviation (0.053) into the Standard
Deviation (SIGMA) box. Click Labels, since the variable name is in the first cell within the field. The desired confidence level as a decimal fraction is 0.95, so the corresponding alpha value is 1 − 0.95 = 0.05. Enter 0.05 into the
Alpha box. Click OK. The confidence interval results will be as shown above.
Excel confidence interval for μ based on summary statistics and σ known
1. For example, with x̄ = 1.400, σ = 0.053, and n = 30, as in Figure 9.3: Open the ESTIMATORS.XLS workbook, supplied
with the text.
2. Using the arrows at the bottom left, select the z-Estimate_Mean worksheet. Enter the sample mean (1.4), the known
sigma (0.053), the sample size (30), and the desired confidence level as a decimal fraction (0.95).
(Note: As an alternative, you can use Excel worksheet template TMZINT.XLS, supplied with the text. The steps are described
within the template.)
MINITAB
Confidence interval for μ based on raw data and σ known
One-Sample Z: diameter
The assumed standard deviation = 0.053
Variable    N    Mean      StDev     SE Mean   95% CI
diameter    30   1.40000   0.05196   0.00968   (1.38103, 1.41897)
1. For example, using the data (file CX09RODS.MTW) on which Figure 9.3 is based, with the 30 data values in column C1:
Click Stat. Select Basic Statistics. Click 1-Sample Z.
2. Select Samples in columns and enter C1 into the box. Enter the known population standard deviation (0.053) into
the Standard deviation box. The Perform hypothesis test box should be left blank—we will not be doing hypothesis testing until the next chapter.
3. Click Options. Enter the desired confidence level as a percentage (95.0) into the Confidence Level box. Within the
Alternative box, select not equal. Click OK. Click OK. The printout also includes the sample mean (1.400), the known
sigma (0.053), the sample size (30), and the standard error of the mean (calculated as 0.053/√30 = 0.00968). Although the sample standard deviation is shown, it is not used in the construction of this confidence interval.
Confidence interval for μ based on summary statistics and σ known
Follow the procedure in the previous steps for raw data, but in step 2 select Summarized data and enter 30 into the
Sample size box and 1.4 into the Mean box.
exercises
9.12 What role does the central limit theorem play in the
construction of a confidence interval for the population
mean?
9.13 In using the standard normal distribution to construct
a confidence interval for the population mean, what two
assumptions are necessary if the sample size is less than 30?
9.14 The following data values are a simple random
sample from a population that is normally distributed,
with σ² = 25.0: 47, 43, 33, 42, 34, and 41. Construct
and interpret the 95% and 99% confidence intervals for
the population mean.
9.15 A simple random sample of 30 has been collected
from a population for which it is known that σ = 10.0.
The sample mean has been calculated as 240.0. Construct
and interpret the 90% and 95% confidence intervals for
the population mean.
9.16 A simple random sample of 25 has been collected
from a normally distributed population for which it is
known that σ = 17.0. The sample mean has been calculated as 342.0, and the sample standard deviation is
s = 14.9. Construct and interpret the 95% and 99%
confidence intervals for the population mean.
9.17 The administrator of a physical therapy facility has
found that postoperative performance scores on a knee
flexibility test have tended to follow a normal distribution
with a standard deviation of 4. For a simple random sample of ten patients who have recently had knee surgery,
the scores are as follows: 101, 92, 94, 88, 52, 93, 76, 84,
72, and 98. Construct and interpret the 90% and 95%
confidence intervals for the population mean.
9.18 In testing the heat resistance of electrical components,
safety engineers for an appliance manufacturer routinely
subject wiring connectors to a temperature of 450 degrees
Fahrenheit, then record the amount of time it takes for
the connector to melt and cause a short circuit. Past experience has shown the standard deviation of failure times
to be 6.4 seconds. In a simple random sample of 40 connectors from a very large production run, the mean time
until failure was found to be 35.5 seconds. Construct and
interpret the 99% confidence interval for μ = the mean
time until failure for all of the connectors from the
production run.
9.19 An assembly process includes a torque wrench
device that automatically tightens compressor housing
bolts; the device has a known process standard deviation
of σ = 3 lb-ft in the torque applied. A simple random
sample of 35 nuts is selected, and the average torque to
which they have been tightened is 150 lb-ft. What is the
95% confidence interval for the average torque being
applied during the assembly process?
9.20 A machine that stuffs a cheese-filled snack product
can be adjusted for the amount of cheese injected into
each unit. A simple random sample of 30 units is selected,
and the average amount of cheese injected is found to
be x̄ = 3.5 grams. If the process standard deviation is
known to be σ = 0.25 grams, construct the 95% confidence interval for μ = the average amount of cheese
being injected by the machine.
9.21 In Exercise 9.20, if the sample size had been n = 5, what would have to be assumed
about the population distribution of filling weights in order
to use z values in constructing the confidence interval?
(data set) Note: Exercises 9.22 and 9.23 require a
computer and statistical software.
9.22 For one of the tasks in a manufacturing process, the
mean time for task completion has historically been
35.0 minutes, with a standard deviation of 2.5 minutes.
Workers have recently complained that the machinery used
in the task is wearing out and slowing down. In response to
the complaints, plant engineers have measured the time required for a sample consisting of 100 task operations. The
100 sample times, in minutes, are in data file XR09022.
Using the mean for this sample, and assuming that the population standard deviation has remained unchanged at 2.5
minutes, construct the 95% confidence interval for the population mean. Is 35.0 minutes within the confidence interval? Interpret your “yes” or “no” answer in terms of
whether the mean time for the task may have changed.
9.23 Sheila Johnson, a state procurement manager, is
responsible for monitoring the integrity of a wide range of
products purchased by state agencies. She is currently
examining a sample of paint containers recently received
from a long-time supplier. According to the supplier, the
process by which the cans are filled involves a small
amount of variation from one can to the next, and the standard deviation is 0.25 fluid ounces. The 40 cans in Sheila’s
sample were examined to determine how much paint they
contained, and the results (in fluid ounces) are listed in data
file XR09023. Using the mean for this sample, and assuming
that the population standard deviation is 0.25 fluid ounces,
construct the 90% confidence interval for the population
mean volume for the cans of paint provided by the supplier.
If the labels on the paint cans say the mean content for such
containers is 100.0 fluid ounces, would your confidence interval tend to support this possibility?
9.5 CONFIDENCE INTERVAL ESTIMATES FOR THE MEAN: σ UNKNOWN
It is rare that we know the standard deviation of a population but have no knowledge about its mean. For this reason, the techniques of the previous section are
much less likely to be used in practice than those discussed here. Whenever the
population standard deviation is unknown, it must be estimated by the sample
standard deviation, s. For such applications, there is a continuous distribution
called the Student’s t distribution.
The Student’s t Distribution
Description
Also referred to as simply the t distribution, this distribution is really a family of
continuous, unimodal, bell-shaped distributions. It was developed in the early
1900s by W. S. Gosset, who used the pen name “Student” because his company
did not permit employees to publish their research results. The t distribution is the
probability distribution for the random variable t = (x̄ − μ)/(s/√n). It has a
mean of zero, but its shape is determined by what is called the number of degrees
of freedom (df). For confidence interval applications, the specific member of the
family is determined by df = n − 1.
The term degrees of freedom refers to the number of values that remain free
to vary once some information about them is already known. For example, if four
items have a mean of 10.0, and three of these items are known to have values of
8, 12, and 7, there is no choice but for the fourth item to have a value of 13. In
effect, one degree of freedom has been lost.
The t distribution tends to be flatter and more spread out than the normal distribution, especially for very small sample sizes. Figure 9.4 compares the approximate shape of a standard normal distribution with that of a t distribution for
which df = 6. The t distribution converges to the normal distribution as the sample size (and df) increases, and as the number of degrees of freedom approaches
infinity, the two distributions are actually identical. As with our use of z previously in this chapter, t represents distance in terms of standard error units. Seeing
Statistics Applet 10, at the end of the chapter, can be used in seeing how the shape
of the t distribution responds to different df values.
FIGURE 9.4
[Figure: the standard normal distribution curve plotted together with the flatter, more
spread-out curve of the t distribution with df = 6.]
example
t Distribution Table
Using the t Distribution Table. A table for t values that correspond to selected
areas beneath the t distribution appears on the pages immediately preceding the
back cover. A portion of the table is reproduced as Table 9.2 on page 282. In general, it is used in the same way as the standard normal table, but there are two
exceptions: (1) the areas provided are for the right tail only, and (2) it is necessary to refer to the appropriate degrees of freedom (df ) row in finding the appropriate t value.
SOLUTION
For a Sample Size of n = 15, What t Values Would Correspond to an Area
Centered at t = 0 and Having an Area beneath the Curve of 95%?
The area of interest beneath the curve can be expressed as 0.95, so the total area
in both tails combined will be (1.00 − 0.95), or 0.05. Since the curve is symmetrical, the area in just one tail will be 0.05/2, or 0.025. The number of degrees
of freedom will be the sample size minus 1, or df = n − 1, or 15 − 1 = 14.
Referring to the 0.025 column and the df = 14 row of the table, we find that
the value of t corresponding to a right-tail area of 0.025 is t = 2.145. Because
the curve is symmetrical, the value of t for a left-tail area of 0.025 will be
t = −2.145.
Note that these values of t (t = ±2.145) are farther apart than the z values
(z = ±1.96) that would have led to a 95% area beneath the standard normal
curve. Remember that the shape of the t distribution tends to be flatter and more
spread out than that of the normal distribution, especially for small samples.
For a Sample Size of n = 99, What t Values Would Correspond to an Area
Centered at t = 0 and Having an Area beneath the Curve of 90%?
In this case, the proportion of the area beneath the curve is 0.90, so each tail will
have an area of (1.00 − 0.90)/2, or 0.05. Therefore, we will refer to the 0.05
column in the t table. Subtracting 1 from the sample size of 99, df = 99 − 1,
or 98. Using the 0.05 column and the df = 98 row, the corresponding t value is
t = 1.661. Since t = 1.661 corresponds to a right-tail area of 0.05, t = −1.661 will correspond to a left-tail area of 0.05. This is due to the symmetry of
the distribution, and the distance from t = −1.661 to t = 1.661 will include
90% of the area beneath the curve. You can use Seeing Statistics Applet 11, at the
end of the chapter, to further examine areas beneath the t distribution curve.

Figure 9.4 caption: A comparison of the approximate shape of the standard normal distribution with that of a t distribution having 6 degrees of freedom. The shape of the t distribution is flatter but approaches that of the standard normal distribution as the number of degrees of freedom increases.

TABLE 9.2
A portion of the Student’s t distribution table. The t distribution is really a family of symmetric, continuous distributions with a mean of t = 0. The specific member of the distribution depends on the number of degrees of freedom, or df. As df increases, the t distribution approaches the normal distribution, and the t values in the infinity row are identical to the z values for the standard normal distribution.
α = right-tail area
(For a right-tail area of 0.025 and df = 15, the t value is 2.131.)

α              0.10     0.05    0.025     0.01    0.005
df = 1        3.078    6.314   12.706   31.821   63.657
     2        1.886    2.920    4.303    6.965    9.925
     3        1.638    2.353    3.182    4.541    5.841
     4        1.533    2.132    2.776    3.747    4.604
     5        1.476    2.015    2.571    3.365    4.032
     6        1.440    1.943    2.447    3.143    3.707
     7        1.415    1.895    2.365    2.998    3.499
     8        1.397    1.860    2.306    2.896    3.355
     9        1.383    1.833    2.262    2.821    3.250
    10        1.372    1.812    2.228    2.764    3.169
    11        1.363    1.796    2.201    2.718    3.106
    12        1.356    1.782    2.179    2.681    3.055
    13        1.350    1.771    2.160    2.650    3.012
    14        1.345    1.761    2.145    2.624    2.977
    15        1.341    1.753    2.131    2.602    2.947
     ⋮            ⋮        ⋮        ⋮        ⋮        ⋮
    98        1.290    1.661    1.984    2.365    2.627
    99        1.290    1.660    1.984    2.365    2.626
   100        1.290    1.660    1.984    2.364    2.626
“Infinity”    1.282    1.645    1.960    2.326    2.576
Should you encounter a situation in which the number of degrees of freedom
exceeds the df = 100 limit of the t distribution table, just use the corresponding z
value for the desired level of confidence. These z values are listed in the df = infinity
row in the t distribution table.
AN IMPORTANT NOTE: Using the t table instead of the standard normal table to which you’ve become accustomed may seem cumbersome at first. As we mentioned previously, however,
the t-interval is the technically appropriate procedure whenever s has been used to
estimate σ. This is also the method you will either use or come into contact with
when dealing with computer statistical packages and their construction of confidence intervals. When using computer statistical packages, it’s easy to routinely
(and correctly) use the t distribution for constructing confidence intervals whenever σ is unknown and being estimated by s.
Confidence Intervals Using the t Distribution
Aside from the use of the t distribution, the basic procedure for estimating
confidence intervals is similar to that of the previous section. The appropriate
t value is used instead of z, and s replaces σ. The t distribution assumes the underlying population is approximately normally distributed, but this assumption is
important only when the sample is small, that is, n < 30. The interval estimate is
summarized as follows:
Confidence interval limits for the population mean, σ unknown:
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
where x̄ = sample mean
s = sample standard deviation
n = sample size
t = t value corresponding to the level of confidence desired, with df = n − 1 (e.g., t = 2.201 for 95% confidence, n = 12, and df = 12 − 1 = 11)
s/√n = estimated standard error of the sampling distribution of the mean
(If n < 30, it must be assumed that the underlying population is approximately
normally distributed.)
example
t-Interval, Mean
A simple random sample of n = 90 manufacturing employees has been selected
from those working throughout a state. The average number of overtime
hours worked last week was x̄ = 8.46 hours, with a sample standard deviation of
s = 3.61 hours. The underlying data are in file CX09OVER.
SOLUTION
What Is the 98% Confidence Interval for the Population Mean, μ?
The first step in determining the appropriate value of t is to identify the column
of the t distribution table to which we must refer. Since the confidence level is
98%, the right-tail area of interest is (1.00 − 0.98)/2, or 0.01. For this sample
size, the number of degrees of freedom will be 90 − 1, or df = 89. Referring to
the 0.01 column and the df = 89 row, we determine t = 2.369. Due to the
symmetry of the t distribution, 98% of the area beneath the curve will be
between t = −2.369 and t = 2.369. For the results summarized in Figure 9.5,
the underlying calculations for the 98% confidence interval are
$\bar{x} \pm t \frac{s}{\sqrt{n}} = 8.46 \pm 2.369 \frac{3.61}{\sqrt{90}}$, or between 7.56 and 9.36 hours

FIGURE 9.5
Although the sample size in this example is relatively large (n = 90), the t distribution was used in constructing the 98% confidence interval for the population mean. This is because the population standard deviation is unknown and is being estimated by the sample standard deviation (s = 3.61 hours).
[Figure 9.5 shows the t distribution with df = n − 1 = 89; for 98% confidence, t will be ±2.369, and the area between t = −2.369 and t = +2.369 is 0.98. The 98% confidence interval for μ runs from 8.46 − 2.369(3.61/√90) = 7.56 to 8.46 + 2.369(3.61/√90) = 9.36, centered on x̄ = 8.46.]

A REMINDER
If n is so large (e.g., n > 101) that df exceeds the finite limits of the t table, just
use the infinity row of the table. In the preceding example, if n were larger than 101, we
would refer to the 0.01 column and the df = infinity row and obtain a t value of
2.326. This is the same as using z instead of t. For such large samples, the z and t
distributions are similar enough that the z will be a very close approximation.
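The limits for the overtime example can be checked by working the arithmetic in two steps: first the estimated standard error, then the margin that is subtracted from and added to the sample mean. This is only a restatement of the calculation already given, using x̄ = 8.46, s = 3.61, n = 90, and t = 2.369 from the example:

\[
\frac{s}{\sqrt{n}} = \frac{3.61}{\sqrt{90}} \approx 0.3805, \qquad
t \cdot \frac{s}{\sqrt{n}} \approx 2.369 \times 0.3805 \approx 0.901
\]
\[
\bar{x} \pm 0.901 = 8.46 \pm 0.901 \;\Rightarrow\; 7.56 \text{ to } 9.36 \text{ hours}
\]

The standard error of about 0.381 agrees with the "SE Mean" value in the Minitab output of Computer Solutions 9.2. Had the infinity-row value of 2.326 been used instead, the margin would be about 2.326 × 0.3805 ≈ 0.885, which is only slightly narrower.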
Computer Solutions 9.2 shows how we can use Excel or Minitab to generate
a confidence interval for the mean when the population standard deviation is unknown. In this case, we are replicating the 98% confidence interval shown in Figure 9.5, and the 90 data values are in file CX09OVER. Once again, if we use an
Excel procedure based on summary statistics, it can be interesting to examine
“what-if” scenarios to instantly see how changes in the specified confidence level
would change the width of the confidence interval.
computer solutions 9.2
Confidence Interval for Population Mean, σ Unknown
These procedures show how to construct a confidence interval for the population mean when the
population standard deviation is unknown.
EXCEL
t-Estimate: Mean
                       hours
Mean                   8.4603
Standard Deviation     3.61
LCL                    7.559
UCL                    9.362
Excel confidence interval for μ based on raw data and σ unknown
1. For example, using the 90 overtime data values (file CX09OVER.XLS) on which Figure 9.5 is based: The label and 90
data values are in A1:A91. Click Tools. Click Data Analysis Plus. Click t-Estimate: Mean. Click OK.
2. Enter A1:A91 into the Input Range box. Click Labels. The desired confidence level as a decimal fraction is 0.98, so
enter the corresponding alpha value (0.02) into the Alpha box. Click OK. The lower portion of the printout lists the
lower and upper limits for the 98% confidence interval.
Excel confidence interval for μ based on summary statistics and σ unknown
1. Open the ESTIMATORS.XLS workbook, supplied with the text.
2. Using the arrows at the bottom left, select the t-Estimate_Mean worksheet. Enter the sample mean (8.46), the sample standard deviation (3.61), the sample size (90), and the desired confidence level as a decimal fraction (0.98).
(Note: As an alternative, you can use the Excel worksheet template TMTINT.XLS, supplied with the text. The steps are described within the template.)
MINITAB
Confidence interval for μ based on raw data and σ unknown
One-Sample T: hours
Variable    N    Mean   StDev  SE Mean  98% CI
hours      90   8.460   3.610    0.381  (7.559, 9.362)
1. For example, using the data (file CX09OVER.MTW) on which Figure 9.5 is based, with the 90 data values in column C1:
Click Stat. Select Basic Statistics. Click 1-Sample t.
2. Select Samples in columns and enter C1 into the box. The Perform hypothesis test box should be left blank—we
will not be doing hypothesis testing until the next chapter.
3. Click Options. Enter the desired confidence level as a percentage (98.0) into the Confidence Level box. Within the
Alternative box, select not equal. Click OK. Click OK.
Confidence interval for μ based on summary statistics and σ unknown
Follow the procedure in the previous steps for raw data, but in step 2 select Summarized data, enter 90 into the Sample size
box, 8.46 into the Mean box, and 3.61 into the Standard deviation box.
exercises
9.24 When the t distribution is used in constructing a
confidence interval based on a sample size of less than 30,
what assumption must be made about the shape of the
underlying population?
9.25 Why are the t values listed in the df = infinity row
of the t distribution table identical to the z values that
correspond to the same right-tail areas of the standard
normal distribution? What does this indicate about the
relationship between the t and standard normal
distributions?
9.26 In using the t distribution table, what value of t
would correspond to an upper-tail area of 0.025 for
19 degrees of freedom?
9.27 In using the t distribution table, what value of t
would correspond to an upper-tail area of 0.10 for
28 degrees of freedom?
9.29 For df = 85, determine the value of A that
corresponds to each of the following probabilities:
a. P(t A) 0.10
b. P(t A) 0.025
c. P(−A ≤ t ≤ A) = 0.98
55
54
9.30 Given the following observations in a simple
random sample from a population that is approximately
normally distributed, construct and interpret the 90%
and 95% confidence intervals for the mean:
79
71
98
74
70
59
102
92
96
9.31 Given the following observations in a simple
random sample from a population that is approximately
normally distributed, construct and interpret the 95%
and 99% confidence intervals for the mean:
66
50
34
34
59
42
56
61
51
53
45
48
38
57
58
47
52
50
9.33 The service manager of Appliance Universe has
recorded the times for a simple random sample of 50
refrigerator service calls taken from last year’s service
records. The sample mean and standard deviation were
25 minutes and 10 minutes, respectively.
a. Construct and interpret the 95% confidence interval
for the mean.
b. It’s quite possible that the population of such times is
strongly skewed in the positive direction — that is,
some jobs, such as compressor replacement, might
take 3 or 4 hours. If this were true, would the interval constructed in part (a) still be appropriate?
9.34 An automobile rental agency has the following
mileages for a simple random sample of 20 cars that were
rented last year. Given this information, and assuming
the data are from a population that is approximately
normally distributed, construct and interpret the 90%
confidence interval for the population mean:
9.28 For df = 25, determine the value of A that
corresponds to each of the following probabilities:
a. P(t A) 0.025
b. P(t A) 0.10
c. P(−A ≤ t ≤ A) = 0.99
67
52
54
9.32 A consumer magazine has contacted a simple random
sample of 33 owners of a certain model of automobile and
within the first 2 months of ownership. The average number of defects was x̄ = 3.7, with a standard deviation of
1.8 defects.
a. Use the t distribution to construct a 95% confidence
interval for μ = the average number of defects for
this model.
b. Use the z distribution to construct a 95% confidence
interval for μ = the average number of defects for this
model.
c. Given that the population standard deviation is not
known, which of these two confidence intervals
should be used as the interval estimate for μ?
35
50
65
74
64
92
69
59
37
50
88
38
80
59
39
29
61
60 miles
9.35 One of the most popular products sold by a manufacturer of electrical fuses is the standard 30-ampere
fuse used by electrical contractors in the construction
industry. The company has tested a simple random sample of 16 fuses from its production output and found the
amperages at which they “blew” to be as shown here.
Given this information, and assuming the data are from a
normally distributed population, construct and interpret
the 95% confidence interval for the population mean
amperage these fuses will withstand.
30.6 30.2 27.7 28.5 29.0 27.5 28.9 28.1
30.3 30.8 28.5 30.3 30.0 28.5 29.0 28.2 amperes
9.36 The author of an entry-level book on using Microsoft
Word has carried out a test in which 35 novices were provided with the book, a computer, and the assignment of
converting a complex handwritten document with text and
tables into a Microsoft Word file. The novices required an
average time of 105 minutes, with a standard deviation of
20 minutes. Construct and interpret the 90% confidence
interval for the population mean time it would take all
such novices to complete this task.
9.37 An office equipment manufacturer has developed a
new photocopy machine and would like to estimate the
average number of 8½-by-11 copies that can be made
using a single bottle of toner. For a simple random sample
of 20 bottles of toner, the average was 1535 pages, with a
standard deviation of 30 pages. Making and stating
whatever assumptions you believe are necessary,
construct and interpret the 95% confidence interval for
the population mean.
9.38 Researchers have estimated that office workers in
Germany receive an average of 15.0 fax messages per day.
Assuming this finding to be based on a simple random sample of 80 German office workers, with a sample standard
deviation of s = 3.5 messages, construct and interpret the
95% confidence interval for the population mean. Given
this confidence interval, would it seem very unusual if
another sample of this size were to have a mean of 16.5
faxes? SOURCE: Anne R. Carey and Genevieve Lynn, “Message Overload?”
USA Today, September 13, 1999, p. 1B.
9.39 According to Nielsen//NetRatings, the average visitor
to the American Greetings website spends 11.85 minutes
at the site. Assuming this finding to be based on a simple
random sample of 20 visitors to the site, with a sample
standard deviation of s = 3.0 minutes, and from a population of visiting times that is approximately normally
distributed, construct and interpret the 98% confidence
interval for the population mean. Given this confidence
interval, would it seem very unusual if another sample of
this size were to have a mean visiting time of 13.0 minutes?
SOURCE: “Top Web Properties,” USA Today, April 27, 2000, p. 3D.
/ data set / Note: Exercises 9.40 and 9.41 require a
computer and statistical software.
9.40 In an article published in a British medical journal,
Japanese researchers found that adults who were
videotaped in a relaxed setting blinked an average of
15.0 times per minute. Under the assumption that this
finding was based on the sample data in file XR09040,
construct and interpret the 99% confidence interval for
the population mean. Based on this confidence interval,
would it seem very unusual if another sample of this
size were to exhibit a mean blink rate of 16.0 times per
minute? SOURCE: “Blink Factor,” USA Today, August 9, 1999, p. 3D.
9.41 Automotive researchers have reported that building
the Ford Mustang required an average of 22.3 labor hours.
Under the assumption that this finding was based on the
sample data in file XR09041, construct and interpret the
95% confidence interval for the population mean number
of labor hours required for this model. Based on this
confidence interval, would it seem very unusual if
another sample of this size were to require an average of
22.9 labor hours for production of a Ford Mustang?
SOURCE: Michael Woodyard, “U.S. Makers Narrow Efficiency Gap,”
Automotive News, June 21, 1999, p. 8.
9.6 CONFIDENCE INTERVAL ESTIMATES FOR THE POPULATION PROPORTION
Determining a confidence interval estimate for the population proportion requires
that we use the sample proportion (p) for two purposes: (1) as a point estimate of
the (unknown) population proportion, π, and (2) in combination with the sample
size (n) in estimating the standard error of the sampling distribution of the sample
proportion for samples of this size.
The technique of this section uses the normal distribution as an approximation
to the binomial distribution. This approximation is considered satisfactory whenever np and n(1 − p) are both ≥ 5, and becomes better for large values of n and
whenever p is closer to 0.5. The midpoint of the confidence interval is the sample
proportion, and the lower and upper confidence limits are determined as follows:
Confidence interval limits for the population proportion:
$p \pm z \sqrt{\frac{p(1 - p)}{n}}$
where p = sample proportion = (number of successes)/(number of trials)
n = sample size
z = z value corresponding to desired level of confidence (e.g., z = 1.96 for 95% confidence)
$\sqrt{p(1 - p)/n}$ = estimated standard error of the sampling distribution of the proportion
example
z-Interval, Proportion
In a USA Today/CNN poll, 1406 adults were randomly selected from across the United
States. In response to the question, “Do you agree that the current system discourages
the best candidates from running for president?” 22% responded “strongly agree.”1
SOLUTION
What Is the 95% Confidence Interval for the Population Proportion Who
Would Have Answered “Strongly Agree” to the Question Posed?
The sample proportion p = 0.22 is our point estimate of π and the midpoint of
the interval. Since the confidence level is to be 95%, z will be 1.96. The resulting confidence interval, shown in Figure 9.6, is
$p \pm z \sqrt{\frac{p(1 - p)}{n}} = 0.22 \pm 1.96 \sqrt{\frac{0.22(1 - 0.22)}{1406}}$, or 0.198 to 0.242
From these results, we have 95% confidence that the population proportion is
somewhere between 0.198 and 0.242. Expressed in terms of percentage points,
the interval would be from 19.8% to 24.2% for the percentage of the population
who would have “strongly agreed,” and the interval width would be (24.2 − 19.8),
or 4.4 percentage points.
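As a check on the poll result, the steps can be written out one at a time. The sketch below first confirms that the normal approximation is reasonable (both np and n(1 − p) are at least 5) and then builds the interval; all values come from the example itself:

\[
np = 1406(0.22) = 309.32 \ge 5, \qquad n(1 - p) = 1406(0.78) = 1096.68 \ge 5
\]
\[
\sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.22 \times 0.78}{1406}} = \sqrt{\frac{0.1716}{1406}} \approx 0.01105, \qquad
1.96 \times 0.01105 \approx 0.0217
\]
\[
0.22 \pm 0.0217 \;\Rightarrow\; 0.198 \text{ to } 0.242
\]

The same steps reproduce the work-sampling interval quoted at the opening of the chapter, where 100 of 280 observations gave p ≈ 0.357 and the confidence level was 90% (z = 1.645, from the infinity row of the t table):

\[
\sqrt{\frac{0.357 \times 0.643}{280}} \approx 0.0286, \qquad
1.645 \times 0.0286 \approx 0.0471, \qquad
0.357 \pm 0.0471 \;\Rightarrow\; 0.310 \text{ to } 0.404
\]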
1 Source: Jean Becker, “Voters Favor a National Primary,” USA Today, February 5, 1988, p. 8A.
FIGURE 9.6
The 95% confidence interval for a population proportion, based on a political poll having a sample proportion of p = 0.22 and a sample size of n = 1406. We have 95% confidence that π is between 0.198 and 0.242.
[Figure 9.6 shows the normal distribution with an area of 0.95 between z = −1.96 and z = +1.96, and the 95% confidence interval for π running from 0.198 to 0.242, centered on p = 0.22.]
Computer Solutions 9.3 shows how we can use Excel or Minitab to generate
a confidence interval for a population proportion. In this case, we are replicating
the 95% confidence interval shown in Figure 9.6. As always, if we use an Excel
procedure based on summary statistics, it can be interesting to examine “what-if”
scenarios to instantly see how changes in the specified confidence level would
change the width of the confidence interval.
computer solutions 9.3
Confidence Interval for Population Proportion
These procedures show how to construct a confidence interval for the population proportion.
EXCEL
z-Estimate of a Proportion
Sample proportion     0.22      Confidence Interval Estimate
Sample size           1406      0.22  plus/minus  0.022
Confidence level      0.95      Lower confidence limit    0.198
                                Upper confidence limit    0.242
Excel confidence interval for π based on summary statistics
1. For example, with n = 1406 and p = 0.22, as in Figure 9.6: Open the ESTIMATORS.XLS workbook, supplied with
the text.
2. Using the arrows at the bottom left, select the z-Estimate_Proportion worksheet. Enter the sample proportion
(0.22), the sample size (1406), and the desired confidence level as a decimal fraction (0.95). The confidence interval
appears as shown here.
(Note: As an alternative, you can use Excel worksheet template TMPINT.XLS, supplied with the text. The steps are described
within the template.)
Excel confidence interval for π based on raw data
1. For example, if we had 20 data values that were coded (1 = female, 2 = male), with the label and data values in
A1:A21: Click Tools. Click Data Analysis Plus. Click Z-Estimate: Proportion. Click OK.
2. Enter A1:A21 into the Input Range box. Enter 1 into the Code for Success box. Click Labels. The desired confidence
level as a decimal fraction is 0.95, so enter the corresponding alpha value (0.05) into the Alpha box. Click OK.
MINITAB
Minitab confidence interval for π based on summary statistics
Test and CI for One Proportion
Sample     X      N   Sample p   95% CI
1        309   1406   0.219772   (0.198128, 0.241417)
Using the normal approximation.
1. This interval is based on the summary statistics for Figure 9.6: Click Stat. Select Basic Statistics. Click 1 Proportion.
Select Summarized Data. Enter the sample size (in this case, 1406) into the Number of Trials box. Multiply the sample proportion (0.22) times the sample size (1406) to get the number of “successes” or “events” as (0.22)(1406) =
309.32. Round to the nearest integer and enter the result (309) into the Number of Events box. The Perform
hypothesis test box should be left blank.
2. Click Options. Enter the desired confidence level as a percentage (95.0) into the Confidence Level box. Within the
Alternative box, select not equal. Click to select Use test and interval based on normal distribution. Click OK.
Click OK.
Minitab confidence interval for π based on raw data
1. For example, if column C1 contains 20 data values that are coded (1 = female, 2 = male): Click Stat. Select Basic
Statistics. Click 1 Proportion. Select Samples in columns and enter C1 into the dialog box.
2. Follow step 2 in the summary-information procedure. Note: Minitab will select the larger of the two codes (i.e., 2 = male) as the “success” or “event” and provide the sample proportion and the confidence interval for the
population proportion of males. To obtain the results for females, just recode the data so females will have the higher
code number: Click Data. Select Code. Click Numeric to Numeric. Enter C1 into both the Code data from columns
box and the Into columns box. Enter 1 into the Original values box. Enter 3 into the New box. Click OK. The new
codes will be (3 = female, 2 = male).
exercises
9.42 Under what conditions is it appropriate to use the
normal approximation to the binomial distribution in
constructing the confidence interval for the population
proportion?
9.43 A pharmaceutical company found that 46% of 1000
U.S. adults surveyed knew neither their blood pressure nor
their cholesterol level. Assuming the persons surveyed to be
a simple random sample of U.S. adults, construct a 95%
confidence interval for π = the population proportion of U.S. adults who know neither their blood pressure nor their cholesterol level.
9.44 An airline has surveyed a simple random sample of
air travelers to find out whether they would be interested
in paying a higher fare in order to have access to e-mail
during their flight. Of the 400 travelers surveyed, 80
said e-mail access would be worth a slight extra cost.
Construct a 95% confidence interval for the population
proportion of air travelers who are in favor of the
airline’s e-mail idea.
9.45 In response to media inquiries and concerns
expressed by groups opposed to violence, the president of
a university with over 25,000 students has agreed to survey a simple random sample of her students to find out
whether the student body thinks the school’s “Plundering
Pirate” mascot should be changed to one that is less
aggressive in name and appearance. Of the 200 students
selected for participation in the survey, only 20% believe
the school should select a new and more kindly mascot.
Construct a 90% confidence interval for the population
proportion of students who believe the mascot should be
changed. Based on the sample findings and associated
confidence interval, comment on the credibility of a local
journalist’s comment that “over 50%” of the students
would like a new mascot.
9.46 In examining a simple random sample of 100 sales
invoices from several thousand such invoices for the previous
year, a researcher finds that 65 of the invoices involved
customers who bought less than \$2000 worth of merchandise from the company during that year. Construct a 90%
confidence interval for the proportion of all sales invoices
that were for customers buying less than \$2000 worth of
merchandise during the year.
9.47 Survey researchers estimate that 40% of U.S.
women age 18–29 save in a 401(k) or individual retirement account. Assuming the persons surveyed to be a
simple random sample of 1000 U.S. women in this age
group, construct a 95% confidence interval for π = the
population proportion of U.S. women in this age group
been taken instead of a survey. SOURCE: Anne R. Carey and
Sam Ward, “Women Saving for Retirement,” USA Today, September 5,
1996, p. 1B.
9.48 A study by the Society of Human Resource Management found 23% of U.S. business executives surveyed
believe that an employer has no right to read employees’
e-mail. Assuming that the survey included a simple
random sample of 1200 executives, construct a 90%
confidence interval for π = the population proportion of
U.S. business executives who believe that employers have
no right to read employees’ e-mail. SOURCE: Anne R. Carey and
Marcy E. Mullins, “Bosses OK E-Mail Oversight,” USA Today, April 22, 1996,
p. 1A.
9.49 According to Nielsen Media Research viewership
data, the top television broadcast of all time was the last
episode of M*A*S*H, which aired on February 28, 1983,
and was viewed by an estimated 60.2% of all TV
households. Assuming this estimate was based on a simple
random sample of 1800 TV households, what is the 95%
confidence interval for π = the proportion of all TV households who viewed the last episode of M*A*S*H? SOURCE:
The World Almanac and Book of Facts 2003, p. 283.
9.50 In a major industry where well over 100,000 manufacturing employees are represented by a single union, a simple
random sampling of n = 100 union members finds that 57%
of those in the sample intend to vote for the new labor contract negotiated by union and management representatives.
a. What is the 99% confidence interval for π = the
population proportion of union-represented employees
who intend to vote for the labor contract?
b. Based on your response to part (a), does contract
approval by the union appear to be a “sure thing”?
Why or why not?
9.51 Repeat Exercise 9.50, but assume that the sample size
was n = 900 instead of n = 100.
9.52 Based on its 1999 survey, Student Monitor reports that
20% of U.S. college students used the Internet for job hunting during the month preceding the survey. Assuming this
finding to be based on a simple random sample of 1600 college students, construct and interpret the 90% confidence
interval for the population proportion of college students
who used the Internet for job hunting during this period.
SOURCE:
Julie Stacey, “Online Extracurricular Activities,” USA Today, March 13,
2000, p. 10D.
9.53 According to Keynote Systems, Wal-Mart’s website
was available 95% of the time during a holiday shopping
season. Assuming this finding to be based on a simple random sample of 200 attempts, construct and interpret the
90% confidence interval for the population proportion of
the time the Wal-Mart site was available during this period.
SOURCE:
“Measuring How Key Web Sites Handle Holiday Shopping Rush,”
USA Today, November 17, 1999, p. 3B.
9.54 A Pathfinder Research Group survey estimates that, of
U.S. adults who have a favorite among The Three Stooges,
Moe is the favorite of 31% of them. Assuming this finding
to be based on a simple random sample of 800 Stooge fans
who have a favorite Stooge, construct and interpret the
95% confidence interval for the population proportion
whose favorite Stooge is Moe. SOURCE: Anne R. Carey and Marcy
E. Mullins, “Favorite Stooges,” USA Today, November 22, 1996, p. 1D.
9.55 Estelle McCarthy, a candidate for state office in New
Jersey, has been advised that she must get at least 65% of
the union vote in her state. A recent political poll of likely
voters included 800 union-member respondents, and 60%
of them said they intended to vote for Ms. McCarthy.
Based on the survey, construct and interpret the 95% confidence interval for the population proportion of likely-voter
union members who intend to vote for Ms. McCarthy. Is
the 65% level of support within the confidence interval?
the possibility that Ms. McCarthy might not succeed in
obtaining the level of union support she needs.
/ data set / Note: Exercises 9.56 and 9.57 require a
computer and statistical software.
9.56 In the documentation that accompanies its products
that are returned for in-warranty service, a manufacturer of
electric can openers asks the customer to indicate the
reason for the return. The codes and return problem categories are: (1) “doesn’t work,” (2) “excessive noise,” and
(3) “other.” Data file XR09056 contains the problem codes
for a simple random sample of 300 product returns. Based
on this sample, construct and interpret the 95% confidence
interval for the population proportion of returns that were
because the product “doesn’t work.”
9.57 An investment counselor has purchased a large mailing list consisting of 50,000 potential investors. Before
creating a brochure to send to members of the list, the
counselor mails a questionnaire to a small simple random
sampling of them. In one of the questions, the respondent is
asked, “Do I think of myself as someone who enjoys taking
risks?” The response codes are (1) “Yes” and (2) “No.”
The results for the 200 investors who answered this question are represented by the codes listed in data file XR09057.
Based on this sample, construct and interpret the 99% confidence interval for the population proportion of investors
on the counselor’s mailing list who think of themselves as
someone who enjoys taking risks.
9.7 SAMPLE SIZE DETERMINATION
In our interval estimates to this point, we have taken our results, including a
stated sample size, then constructed a confidence interval. In this section, we’ll
proceed in the opposite direction: we will decide in advance the
desired confidence interval width, then work backward to find out how large a
sample size is necessary to achieve this goal. Central to our discussion in this section is the fact that the maximum likely sampling error (accuracy) is one-half the
width of the confidence interval.
Estimating the Population Mean
To show how the necessary sample-size equation is obtained, we will consider a
case in which σ is known. Of especial importance is that the distance from the
midpoint to the upper confidence limit can be expressed as either (1) the maximum likely sampling error (e) or (2) z times the standard error of the sampling
distribution. Since the two quantities are the same, we can set up the following
equation and solve for n:
$$e = z\frac{\sigma}{\sqrt{n}} \;\longrightarrow\; \text{Solving for } n \;\longrightarrow\; n = \frac{z^2 \cdot \sigma^2}{e^2}$$
Required sample size for estimating a population mean:
$$n = \frac{z^2 \cdot \sigma^2}{e^2}$$
where n = required sample size
z = z value for which ±z corresponds to the desired level of confidence
σ = known (or, if necessary, estimated) value of the population standard deviation
e = maximum likely error that is acceptable
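The algebra behind this result can be sketched in three steps, using only the symbols defined above (the intermediate step is supplied here for illustration):
$$e = z\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{z\,\sigma}{e} \;\Rightarrow\; n = \frac{z^2\sigma^2}{e^2}$$
Squaring both sides in the last step is what makes n depend on the squares of z, σ, and e.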
One way of estimating an unknown σ is to use a relatively small-scale pilot
study from which the sample standard deviation is used as a point estimate of the
population standard deviation. A second approach is to estimate σ by using the
results of a similar study done at some time in the past. A third method is to estimate σ as 1/6 the approximate range of data values.
example
Sample Size, Estimating a Mean
A state politician would like to determine the average amount earned during summer employment by state teenagers during the past summer’s vacation period. She
wants to have 95% confidence that the sample mean is within \$50 of the actual
population mean. Based on past studies, she has estimated the population standard deviation to be σ = \$400.
SOLUTION
What Sample Size Is Necessary to Have 95% Confidence That x̄ Will Be
Within \$50 of the Actual Population Mean?
For this situation, 95% confidence leads to a z value of 1.96, e = the \$50 maximum likely error that is acceptable, and the estimated value of σ is \$400. The necessary sample size will be
$$n = \frac{z^2 \cdot \sigma^2}{e^2} = \frac{1.96^2 \cdot 400^2}{50^2} = 245.9 \text{ persons, rounded up to } 246$$
Since we can’t include a fraction of a person in the sample, we round up to
n = 246 to ensure 95% confidence in being within \$50 of the population mean.
Whenever the calculated value of n is not an integer, it is a standard (though
slightly conservative) practice to round up to the next integer value.
Note that if we cut the desired maximum likely error in half, the necessary
sample size will quadruple. This is because the e term in the denominator is
squared. The desire for extremely accurate results can lead to sample sizes that
grow very rapidly in size (and expense) as the specified value for e is reduced.
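As a quick check of this quadrupling effect, suppose (purely as an illustration, not a case taken from the example) that the politician had asked for e = \$25 instead of \$50, with z = 1.96 and σ = \$400 unchanged:
$$n = \frac{1.96^2 \cdot 400^2}{25^2} = \frac{614{,}656}{625} = 983.4$$
This rounds up to 984, four times the unrounded value of 245.9 obtained for e = \$50.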
Estimating the Population Proportion
As when estimating the population mean, the maximum likely error (e) in estimating a population proportion will be one-half of the eventual confidence interval width. Likewise, the distance from the midpoint of the confidence interval to
the upper confidence limit can be described in two ways. Setting them equal and
solving for n gives the following result:
$$e = z\sqrt{\frac{p(1-p)}{n}} \;\longrightarrow\; \text{Solving for } n \;\longrightarrow\; n = \frac{z^2 p(1-p)}{e^2}$$
Required sample size for estimating a population proportion:
$$n = \frac{z^2 p(1-p)}{e^2}$$
where n = required sample size
z = z value for which ±z corresponds to the desired level of confidence
p = the estimated value of the population proportion (As a conservative strategy, use p = 0.5 if you have no idea as to the actual value of π.)
e = maximum likely error that is acceptable
In applying the preceding formula, we should first consider whether the true
population proportion is likely to be either much less or much greater than 0.5. If
we have absolutely no idea as to what π might be, using p = 0.5 is the conservative strategy to follow. This is because the required sample size, n, is proportional
to the value of p(1 − p), and this value is largest whenever p = 0.5.
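A short worked check, added here as an illustration, shows why p = 0.5 is the worst case. Treating p(1 − p) as a function of p and setting its derivative to zero:
$$\frac{d}{dp}\,\bigl[p(1-p)\bigr] = 1 - 2p = 0 \;\Rightarrow\; p = 0.5, \qquad p(1-p)\Big|_{p=0.5} = 0.25$$
For comparison, p = 0.4 gives 0.24, p = 0.3 gives 0.21, and p = 0.1 gives 0.09, so the required sample size falls steadily as the estimate moves away from 0.5 in either direction.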
If we are totally uncertain regarding the actual population proportion, we
may wish to conduct a pilot study to get a rough idea of its value. If we can estimate the population proportion as being either much less or much more than 0.5,
we can obtain the desired accuracy with a smaller sample than would have otherwise been necessary.
If the population proportion is felt to be within a range, such as “between
0.20 and 0.40,” we should use as our estimate the value that is closest to 0.5.
For example, if we believe the population proportion is somewhere between
0.20 and 0.40, it should be estimated as 0.40 when calculating the required
sample size.
statistics in action 9.1
Sampling Error in Survey Research
When survey results are published, they are sometimes
accompanied by an explanation of how the survey was
conducted and how much sampling error could have been
present. Persons who have had a statistics course need only
know the size of the sample and the sample proportion or
percentage for a given question. For the general public,
however, the explanation of survey methods and sampling
error needs a bit of rephrasing. The following statement
accompanied the results of a survey commissioned by the
Associated Press.
How Poll Was Conducted
The Associated Press poll on taxes was taken Feb. 14–20
using a random sample of 1009 adult Americans.
No more than one time in 20 should chance variations
in the sample cause the results to vary by more than 3 percentage points from the answers that would be obtained if
all Americans were polled.
The first sentence in this description briefly describes the
size and nature of the sample. The second sentence describes the confidence level as 19/20, or 95%, and the sampling error as plus or minus 3 percentage points. Using the
techniques of this chapter, we can verify the calculations in
this description. For 95% confidence, z = 1.96. Since some
questions may have a population proportion of 0.5 (the
most conservative value to use when determining the required sample size), this is used in the following calculations:
$$n = \frac{z^2 p(1-p)}{e^2} \quad \text{where } p = \text{estimated population proportion}$$
and
$$e = \sqrt{\frac{z^2 p(1-p)}{n}} = \sqrt{\frac{(1.96)^2(0.5)(1-0.5)}{1009}} = 0.0309$$
As these calculations indicate, the Associated Press
rounded down to the nearest full percentage point (from
3.09 to 3.0) in its published explanation of the sampling
error. This is not too unusual, since the general public would
probably have enough to digest in the description quoted
above without having to deal with decimal fractions.
Source: “How Poll Was Conducted,” in Howard Goldberg, “Most Not Ready
to Scrap Tax System,” Indiana Gazette, February 26, 1996, p. 3.
example
Sample Size, Estimating a Proportion
A tourist agency researcher would like to determine the proportion of U.S. adults
who have ever vacationed in Mexico and wishes to be 95% confident that the
sampling error will be no more than 0.03 (3 percentage points).
SOLUTION
Assuming the Researcher Has No Idea Regarding the Actual Value of the Population Proportion, What Sample Size Is Necessary to Have 95% Confidence
That the Sample Proportion Will Be within 0.03 (3 Percentage Points) of the
Actual Population Proportion?
For the 95% level of confidence, the z value will be 1.96. The maximum acceptable
error is e = 0.03. Not wishing to make an estimate, the researcher will use p = 0.5
in calculating the necessary sample size:
$$n = \frac{z^2 p(1-p)}{e^2} = \frac{1.96^2(0.5)(1-0.5)}{0.03^2} = 1067.1 \text{ persons, rounded up to } 1068$$
If the Researcher Believes the Population Proportion Is No More Than 0.3,
and Uses p = 0.3 as the Estimate, What Sample Size Will Be Necessary?
Other factors are unchanged, so z remains 1.96 and e is still specified as 0.03.
However, the p(1 − p) term in the numerator will be reduced due to the assumption that the population proportion is no more than 0.3. The required sample size
will now be
$$n = \frac{z^2 p(1-p)}{e^2} = \frac{1.96^2(0.3)(1-0.3)}{0.03^2} = 896.4 \text{ persons, rounded up to } 897$$
As in determining the necessary size for estimating a population mean, lower
values of e lead to greatly increased sample sizes. For example, if the researcher
estimated the population proportion as being no more than 0.3, but specified a
maximum likely error of 0.01 instead of 0.03, he would have to include nine
times as many people in the sample (8068 instead of 897).
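The arithmetic behind this ninefold increase can be verified directly, keeping z = 1.96 and p = 0.3 and changing only e:
$$n = \frac{1.96^2(0.3)(1-0.3)}{0.01^2} = \frac{0.806736}{0.0001} = 8067.4$$
which rounds up to 8068. Dividing e by 3 multiplies the unrounded n by 3² = 9, just as halving e multiplied it by 4 in the population-mean case.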
Computer Solutions 9.4 (page 296) shows how we can use Excel to determine the necessary sample size for estimating a population mean or proportion.
With these procedures, it is very easy to examine “what-if” scenarios and instantly see how changes in confidence level or specified maximum likely error
will affect the required sample size.
exercises
9.58 “If we want to cut the maximum likely error in half,
we’ll have to double the sample size.” Is this statement
correct? Why or why not?
9.59 In determining the necessary sample size in making
an interval estimate for a population mean, it is necessary
to first make an estimate of the population standard deviation. On what bases might such an estimate be made?
9.60 From past experience, a package-filling machine has
been found to have a process standard deviation of 0.65
ounces of product weight. A simple random sample is to
be selected from the machine’s output for the purpose of
determining the average weight of product being packed
by the machine. For 95% confidence that the sample
mean will not differ from the actual population mean by
more than 0.1 ounces, what sample size is required?
9.61 Based on a pilot study, the population standard
deviation of scores for U.S. high school graduates taking a
new version of an aptitude test has been estimated as 3.7
points. If a larger study is to be undertaken, how large a
simple random sample will be necessary to have 99% confidence that the sample mean will not differ from the
actual population mean by more than 1.0 points?
9.62 A consumer agency has retained an independent
testing firm to examine a television manufacturer’s claim
that its 25-inch console model consumes just 110 watts of
electricity. Based on a preliminary study, the population
standard deviation has been estimated as 11.2 watts for
these sets. In undertaking a larger study, and using a simple
random sample, how many sets must be tested for the firm
to be 95% confident that its sample mean does not differ
from the actual population mean by more than 3.0 watts?
9.63 A national political candidate has commissioned a
study to determine the percentage of registered voters who
intend to vote for him in the upcoming election. To have
95% confidence that the sample percentage will be within
3 percentage points of the actual population percentage,
how large a simple random sample is required?
9.64 Suppose that Nabisco would like to determine, with
95% confidence and a maximum likely error of 0.03, the
Nabisco’s Spoon-Size Shredded Wheat for breakfast at
least once last week. In determining the necessary size of
a simple random sample for this purpose:
a. Use 0.5 as your estimate of the population proportion.
b. Do you think the population proportion could really
be as high as 0.5? If not, repeat part (a) using an
estimated proportion that you think would be more
likely to be true. What effect does your use of this
estimate have on the sample size?
computer solutions 9.4
Sample Size Determination
These procedures determine the necessary sample size for estimating a population mean or a population
proportion.
EXCEL
Sample size required for estimating a population mean:
Estimate for sigma: 400.00
Maximum likely error, e: 50.00
Confidence level desired: 0.95
alpha = (1 - conf. level desired): 0.05
The corresponding z value is: 1.960
The required sample size is n = 245.9
Sample size for estimating a population mean, using Excel worksheet template TMNFORMU.XLS that accompanies the text
To determine the necessary sample size for estimating a population mean within \$50 and with 95% confidence, assuming a population standard deviation of \$400: Open Excel worksheet TMNFORMU.XLS. Enter the estimated sigma
(400), the maximum likely error (50), and the specified confidence level as a decimal fraction (0.95). The required sample size (in cell D11) should then be rounded up to the nearest integer (246). This procedure is also described within
the worksheet template.
Caution Do not save any changes when exiting Excel.
Sample size for estimating a population proportion, using Excel worksheet template TMNFORPI.XLS that
accompanies the text
To determine the necessary sample size for estimating a population proportion within 0.03 (3 percentage points) and
with 95% confidence: Open Excel worksheet TMNFORPI.XLS. Enter the estimate for pi (0.50), the maximum likely error
(0.03), and the specified confidence level as a decimal fraction (0.95). The required sample size (in cell D11) should then
be rounded up to the nearest integer (1068). (Note: If you have knowledge about the population and can estimate pi
as either less than or greater than 0.50, use your estimate and the necessary sample size will be smaller. Otherwise, be
conservative and use 0.50 as your estimate.) This procedure is also described within the worksheet template.
Caution Do not save any changes when exiting Excel.
9.65 The Chevrolet dealers of a large county are conducting
a study to determine the proportion of car owners in
the county who are considering the purchase of a new car
within the next year. If the population proportion is
believed to be no more than 0.15, how many owners
must be included in a simple random sample if the dealers
want to be 90% confident that the maximum likely error
will be no more than 0.02?
9.66 In Exercise 9.65, suppose that (unknown to the
dealers) the actual population proportion is really 0.35. If
they use their estimated value (π = 0.15) in determining
the sample size and then conduct the study, will their
maximum likely error be greater than, equal to, or less
than 0.02? Why?
9.67 In reporting the results of their survey of a simple
random sample of U.S. registered voters, pollsters claim
95% confidence that their sampling error is no more than
4 percentage points. Given this information only, what
sample size was used?
9.8
WHEN THE POPULATION IS FINITE
Whenever sampling is without replacement and from a finite population, it may be
necessary to modify slightly the techniques for confidence-interval estimation and
sample size determination in the preceding sections. As in Chapter 8, the general
idea is to reduce the value of the standard error of the estimate for the sampling
distribution of the mean or proportion. As a rule of thumb, the methods in this section should be applied whenever the sample size (n) is at least 5% as large as the
population. When n < 0.05N, there will be very little difference in the results.
Confidence-Interval Estimation
Whether we are dealing with interval estimation for a population mean (μ) or a
population proportion (π), the confidence intervals will be similar to those in Figure 9.2. The only difference is that the “±” term will be multiplied by the “finite
population correction factor” shown in Table 9.3. As in Chapter 8, this correction depends on the sample size (n) and the population size (N).
As an example of how this works, we’ll consider a situation such as that in
Section 9.5, where a confidence interval is to be constructed for the population
mean, and the sample standard deviation (s) is used as an estimate of the population standard deviation (σ). In this case, however, the sample will be relatively
large compared to the size of the population.
example
Interval Estimates, Finite Population
According to the Bureau of the Census, the population of Kent County, Texas, is
812 persons.2 For purposes of our example, assume that a researcher has interviewed a simple random sample of 400 persons and found that their average
number of years of formal education is x̄ = 11.5 years, with a standard deviation
of s = 4.3 years.
2Source: The World Almanac and Book of Facts 2003, p. 459.
TABLE 9.3
Summary of confidence interval formulas when sampling without replacement from a finite population. As a rule of thumb, they should be applied whenever the sample is at least 5% as large as the population. The formulas and terms are similar to those in Figure 9.2 but include a “finite population correction factor,” the value of which depends on the relative sizes of the sample (n) and population (N).
Confidence Interval Estimate for the Population Mean, σ Known
Infinite Population: $\bar{x} \pm z\dfrac{\sigma}{\sqrt{n}}$
Finite Population: $\bar{x} \pm z\left(\dfrac{\sigma}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}\right)$
Confidence Interval Estimate for the Population Mean, σ Unknown
Infinite Population: $\bar{x} \pm t\dfrac{s}{\sqrt{n}}$
Finite Population: $\bar{x} \pm t\left(\dfrac{s}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}\right)$
Confidence Interval Estimate for the Population Proportion
Infinite Population: $p \pm z\sqrt{\dfrac{p(1-p)}{n}}$
Finite Population: $p \pm z\left(\sqrt{\dfrac{p(1-p)}{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}\right)$
SOLUTION
Considering That the Population Is Finite and n ≥ 0.05N, What Is the
95% Confidence Interval for the Population Mean?
Since the number of degrees of freedom (df = n − 1 = 400 − 1, or 399) exceeds
the limits of our t distribution table, the t distribution and normal distribution can
be considered to be practically identical, and we can use the infinity row of the t
table. The appropriate column in this table will be 0.025 for a 95% confidence interval, and the entry in the infinity row of this column is a t value of 1.96.
Since s is being used to estimate σ, and the sample is more than 5% as large
as the population, we will use the “σ unknown” formula of the finite population
expressions in Table 9.3. The 95% confidence interval for the population mean
can be determined as
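$$\bar{x} \pm t\left(\frac{s}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}\right) = 11.5 \pm 1.96\left(\frac{4.3}{\sqrt{400}} \cdot \sqrt{\frac{812-400}{812-1}}\right)$$
$$= 11.5 \pm 1.96(0.215 \cdot 0.713) = 11.5 \pm 0.300$$
or from 11.200 to 11.800
The square-root term containing N is the finite population correction term:
this will be smaller than 1.0 so the
standard error will be less than if an
infinite population were involved.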
As expected, the finite correction term (0.713) is less than 1.0 and leads to a
95% confidence interval that is narrower than if an infinite population had been
assumed. (Note: If the population had been considered infinite, the resulting interval would have been wider, with lower and upper limits of 11.079 and 11.921
years, respectively.)
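The infinite-population limits quoted in this note can be reproduced by dropping the correction term from the same calculation (a worked check added for illustration):
$$\bar{x} \pm t\frac{s}{\sqrt{n}} = 11.5 \pm 1.96\left(\frac{4.3}{\sqrt{400}}\right) = 11.5 \pm 1.96(0.215) = 11.5 \pm 0.421$$
or from 11.079 to 11.921. Applying the correction shrinks the half-width from 0.421 to 0.300, a ratio of about 0.713, which is the value of the correction term itself.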
Sample Size Determination
As in confidence-interval estimation, the rule of thumb is to change our sample
size determination procedure slightly whenever we are sampling without replacement from a finite population and the sample is likely to be at least 5% as large
as the population. Although different in appearance, the following formulas are
applied in the same way that we used their counterparts in Section 9.7.
NOTE
If you were to substitute an N value of infinity into each of the following equations, you would find that the right-hand term in the denominator of each would
be eliminated, and the result would be an expression exactly the same as its counterpart in Section 9.7.
Required sample size for estimating the mean of a finite population:
$$n = \frac{\sigma^2}{\dfrac{e^2}{z^2} + \dfrac{\sigma^2}{N}}$$
where n = required sample size
N = population size
z = z value for which ±z corresponds to the desired level of confidence
σ = known (or, if necessary, estimated) value of the population standard deviation
e = maximum likely error that is acceptable
Required sample size, estimating the proportion for a finite population:
$$n = \frac{p(1-p)}{\dfrac{e^2}{z^2} + \dfrac{p(1-p)}{N}}$$
where n = required sample size
N = population size
z = z value for which ±z corresponds to the desired level of confidence
p = the estimated value of the population proportion (As a conservative strategy, use p = 0.5 if you have no idea as to the actual value of π.)
e = maximum likely error that is acceptable
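The limiting behavior described in the note preceding these formulas can be written out explicitly. As a sketch, letting N grow without bound sends the second term of each denominator to zero:
$$\lim_{N\to\infty}\frac{\sigma^2}{\dfrac{e^2}{z^2}+\dfrac{\sigma^2}{N}} = \frac{\sigma^2}{e^2/z^2} = \frac{z^2\sigma^2}{e^2}, \qquad \lim_{N\to\infty}\frac{p(1-p)}{\dfrac{e^2}{z^2}+\dfrac{p(1-p)}{N}} = \frac{z^2p(1-p)}{e^2}$$
These limits are the Section 9.7 expressions. For any finite N the extra term makes the denominator larger, so the finite-population sample size is always somewhat smaller.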
example
Sample Size, Finite Population
The Federal Aviation Administration (FAA) lists 8586 pilots holding commercial
helicopter certificates.3 Suppose the FAA wishes to question a simple random
sample of these individuals to find out what proportion are interested in switching jobs within the next 3 years. Assume the FAA wishes to have 95% confidence
that the sample proportion is no more than 0.04 (i.e., 4 percentage points) away
from the true population proportion.
SOLUTION
Considering That the Population Is Finite, What Sample Size Is Necessary to
Have 95% Confidence That the Sample Proportion Will Not Differ from the
Population Proportion by More Than 0.04?
Since the actual population proportion who are interested in switching jobs has
not been estimated, we will be conservative and use p = 0.5 in deciding on the
necessary sample size. For the 95% confidence level, z will be 1.96. Applying the
finite population formula, with N = 8586, the number of pilots who should be included in the sample is
$$n = \frac{p(1-p)}{\dfrac{e^2}{z^2} + \dfrac{p(1-p)}{N}} = \frac{0.5(1-0.5)}{\dfrac{0.04^2}{1.96^2} + \dfrac{0.5(1-0.5)}{8586}} = 561.0$$
Had the population been infinite, the required sample size would have been
calculated as in Section 9.7. This would have resulted in n = 600.25, rounded up
to 601. By recognizing that the population is finite, we are able to achieve the desired confidence level and maximum error with a sample size that includes only
3Source: General Aviation Manufacturers Association, General Aviation Statistical Database, 2005
Edition, p. 37.
exercises
9.68 As a rule of thumb, under what conditions should the
finite population correction be employed in determining
confidence intervals and calculating required sample sizes?
9.69 Compared to situations where the population is either
infinite or very large compared to the sample size, what
effect will the finite population correction tend to have on
a. the width of a confidence interval?
b. the required size of a sample?
9.70 The personnel manager of a firm with 200 employees
has selected a simple random sample of 40 employees
and examined their health-benefit claims over the past
year. The average amount claimed during the year was
\$260, with a standard deviation of \$80. Construct and
interpret the 95% confidence interval for the population
mean. Was it necessary to make any assumptions about
the shape of the population distribution? Explain.
9.71 Of 1200 undergraduates enrolled at a university,
a simple random sample of 600 have been surveyed
to measure student support for a \$5 activities fee increase
to help fund women’s intercollegiate athletics at the
NCAA division 1A level. Of those who were polled, 55%
supported the fee increase. Construct and interpret the
95% and 99% confidence intervals for the population
proportion. Based on your results, comment on the possibility that the fee increase might lose when it is voted on
at next week’s university-wide student referendum.
9.72 A local environmental agency has selected a simple
random sample of 16 homes to be tested for tap-water
12 parts per billion and a standard deviation of 4 parts per
billion. Considering that the homes were selected from a
community in which there are 100 homes, construct and
interpret the 95% confidence interval for the population
mean. Based on your results, comment on the possibility
that the average lead concentration in this community’s
homes might exceed the Environmental Protection Agency’s
recommended limit of 15 parts per billion of lead.
9.73 A simple random sample is to be drawn from a population
of 800. In order to have 95% confidence that the
sampling error in estimating π is no more than 0.03,
what sample size will be necessary?
9.74 A simple random sample is to be drawn from a population
of 2000. The population standard deviation has
been estimated as being 40 grams. In order to have 99%
confidence that the sampling error in estimating μ is no
more than 5 grams, what sample size will be necessary?
9.75 There are 100 members in the United States Senate. A
political scientist wants to estimate, with 95% confidence
and within 3 percentage points, the percentage who own
stock in foreign companies. How many senators should be
interviewed? Explain any assumptions you used in obtaining your recommended sample size.
9.76 A transportation company operates 200 trucks and
would like to use a hidden speed monitor device to record
the maximum speed at which a truck is operated during
the period that the device is installed. The trucks are driven
primarily on interstate highways, and the company wants
to estimate the average maximum speed for its fleet with
90% confidence and within 2 miles per hour. Using (and
explaining) your own estimate for the population standard
deviation, determine the number of trucks on which the
company should install the hidden speed-recording device.
9.77 A research firm supports a consumer panel of 2000
households that keep written diaries of their weekly grocery expenditures. The firm would like to estimate, with
95% confidence and within 4 percentage points, the percentage of its panel households who would be interested
in providing more extensive information in return for an
extra \$50 per week remuneration. How many of the
households should be surveyed? Explain any assumptions
you used in obtaining your recommended sample size.
9.78 A university official wants to estimate, with 99%
confidence and within \$2, the average amount that
members of fraternities and sororities spend at local
restaurants during the first week of the semester. If the
total fraternity/sorority membership is 300 people, how
many members should be included in the sample? Use
(and explain) your own estimate for the population standard deviation.
9.79 A quality-management supervisor believes that no
more than 5% of the items in a recent shipment of 2000
are defective. If she wishes to determine, within 1 percentage point and with 99% confidence, the percentage of
defective items in the shipment, how large a simple
random sample would be necessary?
SUMMARY
• Inferential statistics: point and interval estimates for a population parameter
Chapter 8 examined the sampling distribution of a sample mean or a sample proportion from a known population. In this chapter, the emphasis has been on the
estimation of an unknown population mean (μ) or proportion (π) on the basis of
sample statistics. Point estimates involve using the sample mean (x̄) or proportion
(p) as the single best estimate of the value of the population mean or proportion.
Interval estimates involve a range of values that may contain the actual value of
the population parameter. When interval estimates are associated with a degree of
certainty that they really do include the true population parameter, they are referred to as confidence intervals.
• Constructing a confidence interval for a population mean or a population proportion
The procedure appropriate to constructing an interval estimate for the population
mean depends largely on whether the population standard deviation is known.
Figure 9.2 summarizes these procedures and their underlying assumptions. Although the t-interval is often associated with interval estimates based on small
samples, it is appropriate for larger samples as well. Using computer statistical
packages, we can easily and routinely apply the t distribution for interval estimates of the mean whenever σ is unknown, even for very large sample sizes.
A trade-off exists between the degree of confidence that an interval contains
the population parameter and the width of the interval itself. The more certain we
wish to be that the interval estimate contains the parameter, the wider the interval
will have to be.
• Sample size determination
Accuracy, or sampling error, is equal to one-half of the confidence interval width.
The process of sample size determination anticipates the width of the eventual
confidence interval, then determines the required sample size that will limit the
maximum likely sampling error to an acceptable amount.
• When the sample is large compared to the population
As in Chapter 8, when sampling is without replacement from a finite population,
it is appropriate to use a finite population correction factor whenever the sample is
at least 5% of the size of the population. Such corrections are presented for both
interval estimation and sample size determination techniques within the chapter.
• Computer-generated confidence intervals
Most computer statistical packages are able to construct confidence interval estimates of the types discussed in the chapter. Examples of Excel and Minitab outputs
are provided for a number of chapter examples in which such confidence intervals
were developed.
9.9
equations
Confidence Interval Limits for the Population Mean, σ Known
$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$$
where x̄ = sample mean
σ = population standard deviation
n = sample size
z = z value for desired confidence level
σ/√n = standard error of the sampling distribution of the mean
(Assumes that either (1) the underlying population is normally distributed or (2) the sample size is n ≥ 30.)
Confidence Interval Limits for the Population Mean, σ Unknown
$$\bar{x} \pm t\frac{s}{\sqrt{n}}$$
where x̄ = sample mean
s = sample standard deviation
n = sample size
t = t value corresponding to the level of confidence desired, with df = n − 1
s/√n = estimated standard error of the sampling distribution of the mean
(If n < 30, this requires the assumption that the underlying population is approximately
normally distributed.)
Confidence Interval Limits for the Population Proportion
$$p \pm z\sqrt{\frac{p(1-p)}{n}}$$
where p = sample proportion = (number of successes)/(number of trials)
n = sample size
z = z value corresponding to desired level of confidence (e.g., z = 1.96 for 95% confidence)
√(p(1 − p)/n) = estimated standard error of the sampling distribution of the proportion
Required Sample Size for Estimating a Population Mean
$$n = \frac{z^2 \cdot \sigma^2}{e^2}$$
where n = required sample size
z = z value for which ±z corresponds to the desired level of confidence
σ = known (or, if necessary, estimated) value of the population standard deviation
e = maximum likely error that is acceptable
Required Sample Size for Estimating a Population Proportion
$$n = \frac{z^2 p(1-p)}{e^2}$$
where n = required sample size
z = z value for desired level of confidence
p = estimated value of the population proportion (if not estimated, use p = 0.5)
e = maximum likely error that is acceptable
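Confidence Interval Estimates When the Population Is Finite
• For the population mean, σ known:
$$\bar{x} \pm z\left(\frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}\right)$$
where n = sample size
N = population size
• For the population mean, σ unknown:
$$\bar{x} \pm t\left(\frac{s}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}\right)$$
• For the population proportion:
$$p \pm z\left(\sqrt{\frac{p(1-p)}{n}} \cdot \sqrt{\frac{N-n}{N-1}}\right)$$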
Required Sample Size for Estimating the Mean of a Finite Population
$$n = \frac{\sigma^2}{\dfrac{e^2}{z^2} + \dfrac{\sigma^2}{N}}$$
where n = required sample size
N = population size
z = z value for desired level of confidence
σ = known (or estimated) value of the population standard deviation
e = maximum likely error that is acceptable
Required Sample Size for Estimating the Proportion for a Finite Population
$$n = \frac{p(1-p)}{\dfrac{e^2}{z^2} + \dfrac{p(1-p)}{N}}$$
where n = required sample size
N = population size
z = z value for desired level of confidence
p = the estimated population proportion (if not estimated, use p = 0.5)
e = maximum likely error that is acceptable
chapter exercises
9.80 In a destructive test of product quality, a briefcase
manufacturer places each of a simple random sample of the
day’s production in a viselike device and measures how
many pounds it takes to crush the case. From past experience, the standard deviation has been found to be 21.5
pounds. For 35 cases randomly selected from today’s
production, the average breaking strength was 341.0
pounds. Construct and interpret the 99% confidence interval for the mean breaking strength of the briefcases
produced today.
9.81 Working independently, each of two researchers has
devised a sampling plan to be carried out for the purpose of
constructing a 90% confidence interval for the mean of a
certain population. What is the probability that neither of
their confidence intervals will include the population mean?
9.82 The accompanying data represent one-way
commuting times (minutes) for a simple random sample
of 15 persons who work at a large assembly plant. The
data are also in file XR09082. Assuming an approximately
normal distribution of commuting times for those who
work at the plant, construct and interpret the 90% and
95% confidence intervals for the mean.
21.7 39.0 30.0 26.8 28.0
33.6 33.1 24.7 33.3 27.9
28.4 34.1 23.5 28.9 35.1
9.83 A torque wrench used in the final assembly of cylinder heads has a process standard deviation of 5.0 lb-ft. The
engineers have specified that a process average of 135 lb-ft
is desirable. For a simple random sample of 30 nuts that
the machine has recently tightened, the sample mean is
137.0 lb-ft. Construct and interpret the 95% confidence
interval for the current process mean. Discuss the possibility that the machine may be in need of adjustment to
correct the process mean.
9.84 There are approximately 109 million television
households in the United States. A ratings service would
like to know, within 5 percentage points and with 95%
confidence, the percentage of these households who tune
in to the first episode of a network miniseries. How many
television households must be included in the sample?
SOURCE: The World Almanac and Book of Facts 2006, p. 278.
9.85 In Exercise 9.84, a small-scale preliminary survey
has indicated that no more than 20% of the television
households will tune in to the first episode of the
miniseries. Given this information, how large must the
sample be?
9.86 In a survey of 500 U.S. adults, 45% of them said
that lounging at the beach was their “dream vacation.”
Assuming this to be a simple random sample of U.S.
adults, construct and interpret the 95% and 99%
confidence intervals for the proportion of U.S. adults who
consider lounging at the beach to be their dream vacation.
9.87 For the following simple random sample of household incomes (thousands of dollars) from a large county,
construct and interpret the 90% and 95% confidence intervals for the population mean. The data are also in file
XR09087.
58.3 50.0 58.1 33.5 51.1 38.1 42.3 60.4 55.8 46.2
40.4 52.5 51.3 47.5 48.5 59.3 40.9 37.1 39.1 43.6
55.3 42.3 48.2 42.8 61.1 34.7 35.5 52.9 44.7 51.5
9.88 For a new process with which the production
personnel have little experience, neither the standard deviation nor the mean of the process is known. Twenty different simple random samples, each with n = 50, are to
be drawn from the process, and a 90% confidence interval for the mean is to be constructed for each sample.
What is the probability that at least 2 of the confidence
intervals will not contain the population mean?
9.89 There were 904 new Subway Restaurants franchises
opened during 2002. Suppose that Subway wished to survey a simple random sample of the new franchisees to
find out what percentage of them were totally pleased
with their relationship with the company. If Subway
wanted to have 90% confidence in being within 3
percentage points of the population percentage who are
pleased, how many of the new franchisees would have to
be included in the sample? SOURCE: Subway.com. June 13, 2003.
9.90 In Exercise 9.89, suppose Subway has carried out
the study, using the sample size determined in that exercise, and 27.5% of the franchisees say they are pleased
with their relationship with Subway. Construct and
interpret the 95% confidence interval for the population
percentage.
9.91 A research firm wants to be 90% confident that a
population percentage has been estimated to within 3
percentage points. The research manager calculates the
necessary sample size with 0.5 as his estimate of the population proportion. A new business school graduate who
has just joined the firm questions the research manager
further, and they agree that the population proportion is
no more than 0.3. If interviews cost \$10 each, how much
money has the new graduate just saved the company?
9.92 The activities director of a large university has
surveyed a simple random sample of 100 students for
the purpose of determining approximately how many
students to expect at next month’s awards ceremony to
be held in the gymnasium. Forty of the students said they
plan to attend. What are the upper and lower 95% confidence limits for the number of the university’s 10,000
students who plan to attend the awards ceremony?
9.93 A research firm has found that 39% of U.S. adults
in the over-\$75,000 income category work at least 51
hours per week. Assuming this was a simple random sample of 500 adults in this income group, construct and
interpret the 95% and 99% confidence intervals for the
proportion who work at least 51 hours per week. For
each of the confidence intervals, identify and explain the
maximum likely error in the study.
9.94 For a process having a known standard deviation, a
simple random sample of 35 items is selected. If the width
of the 95% confidence interval is identified as y, express the
width of the 99% confidence interval as a multiple of y.
9.95 The makers of Count Chocula breakfast cereal
would like to determine, within 2 percentage points and
with 99% confidence, the percentage of U.S. senior
citizens who have Count Chocula for breakfast at least
once a week. What sample size would you recommend?
9.96 In a work-sampling study, an industrial engineer
has observed the activities of a clerical worker on 121
randomly selected times during a workweek. On 32 of
these occasions, the employee was talking on the
telephone. For an 8-hour day, what are the upper and
lower 95% confidence limits for the number of minutes
this employee talks on the phone?
9.97 A researcher would like to determine, within 3 percentage points and with 90% confidence, the percentage of
Americans who have a certain characteristic. If she feels
certain that the percentage is somewhere between 20%
and 40%, how many persons should be included in the
sample?
9.98 In a survey of 1320 executives who oversee corporate data systems, 24% said they had experienced losses
caused by computer viruses during the past year. Assuming the executives were a simple random sample of all
such executives, construct and interpret the 90%
confidence interval for the population proportion who
were monetarily harmed by computer viruses that year.
9.99 An airline would like to determine, within 3
percentage points and with 95% confidence, the percentage of next month’s customers who judge the courtesy of
its employees as being “very good to excellent.” What
sample size would you recommend?
9.100 A consultant conducts a pilot study to estimate a
population standard deviation, then determines how large
a simple random sample will be necessary to have a given
level of confidence that the difference between x̄ and μ
will be within the maximum error specified by her client.
The necessary sample size has been calculated as n = 100. If the client suddenly decides that the maximum
error must be only one-fourth that originally specified,
what sample size will now be necessary?
9.101 There are 1733 machinery rebuilding and repairing companies in the United States. A tool manufacturer
wishes to survey a simple random sample of these firms
to find out what proportion of them are interested in a
new tool design. If the tool manufacturer would like to be
95% confident that the sample proportion is within 0.01
of the actual population proportion, how many machinery rebuilding and repairing companies should be
included in the sample? SOURCE: American Business Information,
Sales Leads & Mailing Lists, August 1999, p. 22.
9.102 In Exercise 9.101, suppose the tool manufacturer
has carried out the study, using the sample size determined
in that exercise, and 39.0% of the machinery rebuilding
and repairing companies are interested in the new tool
design. Construct and interpret the 95% confidence
interval for the population percentage.
9.103 The Colgate-Palmolive Company has 37,700
employees. If the company wishes to estimate, within
2 percentage points and with 99% confidence, the percentage of employees who are interested in participating in a
new stock option benefits program, how large a simple
random sample will be necessary? SOURCE: Colgate-Palmolive
Company, 2002 Annual Report, p. 8.
9.104 To gain information about competitors’ products,
companies sometimes employ “reverse engineering,”
which consists of buying the competitor’s product, then
taking it apart and examining the parts in great detail.
Engaging in this practice, a bicycle manufacturer intends
to purchase some of a competitor’s mountain
bikes and measure the tensile strength of the crossbar
portion of the frame. Past experience has shown these
strengths to be approximately normally distributed with a
standard deviation of 20 pounds per square inch (psi).
If the bike purchaser wants to have 90% confidence that
the sampling error will be no more than 5 psi, how many
of the competitor’s mountain bikes should be purchased
for destructive testing?
9.105 A survey of business travelers found that 40% of
those surveyed utilize hotel exercise facilities during their
stay. Under the assumption that a simple random sample
of 1000 business travelers were surveyed, construct and
interpret the 90% and 95% confidence intervals for the
proportion of business travelers who use their hotel’s
exercise facilities.
9.106 A researcher, believing π to be no more than 0.40,
calculates the necessary sample size for the confidence
level and maximum likely error he has specified. Upon
completing the study, he finds the sample proportion to
be 0.32. Is the maximum likely error greater than, equal
to, or less than that originally specified? Explain.
9.107 A truck loaded with 8000 electronic circuit boards
has just pulled into a firm’s receiving dock. The supplier
claims that no more than 3% of the boards fall outside the
most rigid level of industry performance specifications. In a
simple random sample of 300 boards from this shipment,
12 fall outside these specifications. Construct the 95%
confidence interval for the percentage of all boards in this
shipment that fall outside the specifications, then comment
on whether the supplier’s claim would appear to be
correct.
9.108 A researcher has estimated that U.S. college
students spend an average of 17.2 hours per week on the
Internet. Assuming a simple random sample of 500
college students and a sample standard deviation of
1.4 hours per week, construct and interpret the 99%
confidence interval for the population mean.
Note: Exercises 9.109–9.111 require a
computer and statistical software.
9.109 According to the National Restaurant Association,
the average check for a sit-down dinner is \$25. Such a
finding could have been based on data like the 800 sample checks in file XR09109. Using the data in this file, construct and interpret the 95% confidence interval for the
population mean. SOURCE: “U.S. Dining-Out Tab: \$1B a Day,” USA
Today, May 25, 2000, p. 1D.
9.110 For taxpayers having an adjusted gross income of
\$1 million or more, the Internal Revenue Service reports
that the average deduction for gifts to charity was
\$144,700. Curious to see how his state compares, a legislator surveys a simple random sample of 200 taxpayers
from his state who are in this gross income category, with
the data as shown in file XR09110. Using the data in this
file, construct and interpret the 90% confidence interval
for the mean charitable-gifts deduction for all of the
state’s taxpayers who are in the \$1 million or more
adjusted gross income category. Is \$144,700 within the
confidence interval? Given the answer to the preceding
question, comment on whether the state’s taxpayers who
are in this income group might not be typical of those in
the nation as a whole in terms of their tax-deductible
charitable contributions. SOURCE: “Brilliant Deductions, Taxing
Questions,” USA Today, March 3, 2000, p. 3B.
9.111 To avoid losing part of their federal highway fund
allocation, state safety administrators must ensure that
interstate speed limits are adequately enforced within
their state. In an upcoming test, federal researchers will
be randomly selecting and clocking a very large sample
of vehicles on a given section of the state’s portion of an
interstate highway that has historically had a relatively
high accident rate. In anticipation of the upcoming study,
state administrators randomly select and clock 100 vehicles along this route, obtaining the speeds shown in data
file XR09111. Construct and interpret the 95% confidence
interval for the population mean vehicle speed along this
stretch of highway. Based on this interval, comment on
whether the mean speed for the population of vehicles
using this part of the highway might be 70 mph, the cutoff above which federal highway funds become
endangered.
integrated cases
Thorndike Sports Equipment (Thorndike Video Unit Four)
Seeing the fishing pole in his grandfather’s office, Ted
Thorndike’s first thought is that old Luke is going to go
fishing again and leave him to manage the store. He is
quite surprised to learn the fishing pole is actually an inspiration for a new series of ads that Luke has in mind.
The elder Thorndike explains, “Ted, this fishing pole
is made of graphite, the same stuff that goes into our GrafPro racquetball racquets. It’s so flexible and strong that it
can be bent so the two ends actually touch each other.
They even show this in the ads.” Although Luke realizes
that you can’t do exactly the same thing with a racquetball
racquet, he’d like to put some of his racquets into a horizontal mounting device, then see how much weight they’ll
take before they break.
If the amount of weight is impressive enough, Luke
plans to include this kind of test in the television advertisements he’s planning for the firm’s racquetball racquets.
However, he wants to be careful not to brag about the
racquet being able to hold too much weight, since the firm
could get into trouble with the government and other
He asks Ted to set up a test in which racquets are
mounted horizontally, then the weight on the end is
gradually increased until they break. Based on the test results, a weight value would be selected such that the average racquet would almost certainly be able to withstand
this amount. Although accuracy is important, Ted has
been instructed not to break more than 15 or 20 racquets
in coming up with an average for all the racquets.
For 20 racquets subjected to this severe test, the
weight (in pounds) at which each one failed was as follows. The data are also in file THORN09.
221 208 224 217 228 220 222 230 223 217
229 236 218 224 215 222 218 225 221 234
Ted believes it’s reasonable to assume the population of
breaking strengths is approximately normally distributed.
Because of Luke’s concern about being able to support the
advertising claim, he wants to be very conservative in estimating the population mean for these breaking strengths.
Ted needs some help in deciding how conservative he
would like to be, and in coming up with a number that can
seeing statistics: applet 9
Confidence Interval Size
This applet allows us to construct and view z-intervals for
the population mean by using the slider to specify the confidence level. As in Figure 9.3, the sample mean is 1.400
inches, the sample size is 30, and the population standard
deviation is known to be 0.053 inches.
Note that the confidence interval limits shown in the
graph may sometimes differ slightly from those we would
calculate using the pocket calculator and our standard normal distribution table. This is because the applet is using
more exact values for z than we are able to show within
printed tables like the one in the text.
Applet Exercises
9.1 With the slider positioned so as to specify a 95% confidence interval for μ, what are the upper and lower
confidence limits?
9.2 Move the slider so that the confidence interval is now
99%. Describe how the increase in the confidence level has
changed the width of the confidence interval.
9.3 Move the slider so that the confidence interval is now
80%. Describe how the decrease in the confidence level
has changed the width of the confidence interval.
9.4 Position the slider at its extreme left position, then
gradually move it to the far right. Describe how this movement changes the confidence level and the width of the
confidence interval.
Springdale Shopping Survey
The case in Chapter 2 listed 30 questions asked of 150 respondents in the community of Springdale. The coding key
for these responses was also provided in this earlier
exercise. The data are in file SHOPPING. In this exercise,
some of the estimation techniques presented in the chapter
will be applied to the survey results. You may assume that
these respondents represent a simple random sample of all
potential respondents within the community and that the
population is large enough that application of the finite
population correction would not make an appreciable difference in the results.
Managers associated with shopping areas like these
find it useful to have point estimates regarding variables
describing the characteristics and behaviors of their customers. In addition, it is helpful for them to have some
idea as to the likely accuracy of these estimates. Therein
lies the benefit of the techniques presented in this chapter
and applied here.
1. Item C in the description of the data collection instrument lists variables 7, 8, and 9, which represent the respondent’s general attitude toward each of the three
shopping areas. Each of these variables has numerically
equal distances between the possible responses, and for
purposes of analysis they may be considered to be of the
interval scale of measurement.
a. Determine the point estimate, then construct the 95%
confidence interval for μ₇, the average attitude toward Springdale Mall. What is the maximum likely
error in the point estimate of the population mean?
b. Repeat part (a) for μ₈ and μ₉, the average attitudes
toward Downtown and West Mall, respectively.
2. Given the breakdown of responses for variable 26 (sex
of respondent), determine the point estimate, then
construct the 95% confidence interval for π₂₆, the
population proportion of males. What is the maximum likely error in the point estimate of the population proportion?
3. Given the breakdown of responses for variable 28
(marital status of respondent), determine the point
estimate, then construct the 95% confidence interval
for π₂₈, the population proportion in the “single or
other” category. What is the maximum likely error in
the point estimate of the population proportion?
seeing statistics: applet 10
Comparing the Normal and Student t Distributions
In this applet, we use a slider to change the number of degrees of freedom and shape for the Student t distribution
and then observe how the resulting shape compares to that
of the standard normal distribution. The standard normal
distribution is fixed and shown in red, and the Student t distribution is displayed in blue.
Applet Exercises
10.1 Move the slider so that df = 5. Describe the shape of
the t distribution compared to that of the standard normal
distribution.
10.2 Move the slider downward so that df = 2. How has
this decrease changed the shape of the t distribution?
10.3 Gradually move the slider upward so that df increases
from 2 to 10. Describe how the shape of the t distribution
changes along the way.
10.4 Position the slider so that df = 2, then gradually move
it upward until df = 100. Describe how the shape of the t
distribution changes along the way.
seeing statistics: applet 11
Student t Distribution Areas
In this applet, we use a slider to change the number of degrees of freedom for the t distribution, and text boxes allow
us to change the t value or the two-tail probability for a given
df. When changing a text-box entry, be sure the cursor is still
within the box before pressing the enter or return key.
Applet Exercises
11.1 With the slider set so that df = 9 and the left text box
containing t = 3.25, what is the area beneath the curve between t = −3.25 and t = +3.25?
11.2 Gradually move the slider upward until df = 89. What
effect does this have on the t value shown in the text
box?
11.3 Position the slider so that df = 2, then gradually move
it upward until df = 100. Describe how the value in the t
text box and the shape of the t distribution change along
the way.
11.4 With the slider set so that df = 9, enter 0.10 into the
two-tail probability text box at the right. What value of t now
appears in the left text box? To what right-tail area does this
correspond? Verify the value of t for df = 9 and this right-tail
area by using the t table immediately preceding the back
cover of the book.
```
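
The finite-population formulas summarized in the chapter (confidence intervals using the correction factor sqrt((N − n)/(N − 1)), plus the two required-sample-size formulas) can be sketched in Python as follows. This is a minimal illustration, not part of the text: the function names are hypothetical, the z value comes from the standard library's `statistics.NormalDist` rather than a printed table (so results may differ slightly from table-based answers), and rounding the required sample size up to the next whole unit is an assumption made here. For the σ-unknown case, the t value must be supplied by the caller, since the standard library has no t distribution.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-tail z critical value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def fpc(n, N):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def mean_interval_finite(xbar, sd, n, N, crit):
    """Interval for the mean: xbar +/- crit * (sd / sqrt(n)) * fpc.

    Pass sd = sigma with crit = z when sigma is known,
    or sd = s with crit = t (df = n - 1) when sigma is unknown.
    """
    half_width = crit * (sd / math.sqrt(n)) * fpc(n, N)
    return xbar - half_width, xbar + half_width


def proportion_interval_finite(p, n, N, confidence):
    """Interval for the proportion: p +/- z * sqrt(p(1-p)/n) * fpc."""
    z = z_value(confidence)
    half_width = z * math.sqrt(p * (1 - p) / n) * fpc(n, N)
    return p - half_width, p + half_width


def sample_size_mean_finite(sigma, e, N, confidence):
    """n = sigma^2 / (e^2 / z^2 + sigma^2 / N), rounded up (assumption)."""
    z = z_value(confidence)
    n = sigma ** 2 / (e ** 2 / z ** 2 + sigma ** 2 / N)
    return math.ceil(n)


def sample_size_proportion_finite(e, N, confidence, p=0.5):
    """n = p(1-p) / (e^2 / z^2 + p(1-p) / N), rounded up (assumption).

    p defaults to 0.5, the most conservative choice when no estimate exists.
    """
    z = z_value(confidence)
    n = p * (1 - p) / (e ** 2 / z ** 2 + p * (1 - p) / N)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    print(sample_size_proportion_finite(e=0.05, N=1000, confidence=0.95))
    print(proportion_interval_finite(p=0.40, n=100, N=1000, confidence=0.95))
```

As N grows large relative to n, the correction factor approaches 1 and the results approach the usual infinite-population formulas, which is why the correction matters mainly when the sample is a sizable fraction of the population.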