Chapter 10 One- and Two-Sample Tests of Hypotheses

10.1 Statistical Hypotheses: General Concepts

Often, the problem confronting the scientist or engineer is not so much the estimation of a population parameter, as discussed in Chapter 9, but rather the formation of a data-based decision procedure that can produce a conclusion about some scientific system. For example, a medical researcher may decide on the basis of experimental evidence whether coffee drinking increases the risk of cancer in humans; an engineer might have to decide on the basis of sample data whether there is a difference between the accuracy of two kinds of gauges; or a sociologist might wish to collect appropriate data to enable him or her to decide whether a person's blood type and eye color are independent variables. In each of these cases the scientist or engineer postulates or conjectures something about a system. In addition, each must involve the use of experimental data and decision making that is based on the data. Formally, in each case, the conjecture can be put in the form of a statistical hypothesis. Procedures that lead to the acceptance or rejection of statistical hypotheses such as these comprise a major area of statistical inference. First, let us define precisely what we mean by a statistical hypothesis.

Definition 10.1 A statistical hypothesis is an assertion or conjecture concerning one or more populations.

The truth or falsity of a statistical hypothesis is never known with absolute certainty unless we examine the entire population. This, of course, would be impractical in most situations. Instead, we take a random sample from the population of interest and use the data contained in this sample to provide evidence that either supports or does not support the hypothesis.
Evidence from the sample that is inconsistent with the stated hypothesis leads to a rejection of the hypothesis, whereas evidence supporting the hypothesis leads to its acceptance.

It should be made clear to the reader that the design of a decision procedure must be done with the notion in mind of the probability of a wrong conclusion. For example, suppose that the conjecture (the hypothesis) postulated by the engineer is that the fraction defective $p$ in a certain process is 0.10. The experiment is to observe a random sample of the product in question. Suppose that 100 items are tested and 12 items are found defective. It is reasonable to conclude that this evidence does not refute the condition $p = 0.10$, and thus it may lead to an acceptance of the hypothesis. However, it also does not refute $p = 0.12$ or perhaps even $p = 0.15$. As a result, the reader must be accustomed to understanding that the acceptance of a hypothesis merely implies that the data do not give sufficient evidence to refute it. On the other hand, rejection implies that the sample evidence refutes it. Put another way, rejection means that there is a small probability of obtaining the sample information observed when, in fact, the hypothesis is true. For example, in our proportion-defective hypothesis, a sample of 100 revealing 20 defective items is certainly evidence for rejection. Why? If, indeed, $p = 0.10$, the probability of obtaining 20 or more defectives is approximately 0.0035. With the resulting small risk of a wrong conclusion, it would seem safe to reject the hypothesis that $p = 0.10$. In other words, rejection of a hypothesis tends to all but "rule out" the hypothesis. On the other hand, it is very important to emphasize that the acceptance or, rather, failure to reject does not rule out other possibilities. As a result, the firm conclusion is established by the data analyst when a hypothesis is rejected.
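The tail probability quoted above can be checked numerically. The following is a minimal sketch in Python using only the standard library (`math.comb` needs Python 3.8 or later); the helper names `binom_upper_tail` and `poisson_upper_tail` are hypothetical. It computes $P(X \ge 20)$ exactly under a binomial model with $n = 100$, $p = 0.10$, and also with a Poisson approximation ($\lambda = np = 10$). The quoted value of about 0.0035 appears to match the Poisson approximation; the exact binomial tail comes out somewhat smaller (about 0.002), and either way the probability is small enough to support rejection.

```python
from math import comb, exp, factorial


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(Y >= k) for Y ~ Poisson(lam), computed as 1 - P(Y <= k - 1)."""
    return 1.0 - sum(exp(-lam) * lam**y / factorial(y) for y in range(k))


n, p0 = 100, 0.10
exact = binom_upper_tail(20, n, p0)      # binomial model with p = 0.10
approx = poisson_upper_tail(20, n * p0)  # Poisson approximation, lam = np = 10
print(f"exact binomial P(X >= 20): {exact:.4f}")
print(f"Poisson approx P(X >= 20): {approx:.4f}")
```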
The formal statement of a hypothesis is often influenced by the structure of the probability of a wrong conclusion. If the scientist is interested in strongly supporting a contention, he or she hopes to arrive at the contention in the form of rejection of a hypothesis. If the medical researcher wishes to show strong evidence in favor of the contention that coffee drinking increases the risk of cancer, the hypothesis tested should be of the form "there is no increase in cancer risk produced by drinking coffee." As a result, the contention is reached via a rejection. Similarly, to support the claim that one kind of gauge is more accurate than another, the engineer tests the hypothesis that there is no difference in the accuracy of the two kinds of gauges.

The Null and Alternative Hypotheses

The structure of hypothesis testing will be formulated with the use of the term null hypothesis. This refers to any hypothesis we wish to test and is denoted by $H_0$. The rejection of $H_0$ leads to the acceptance of an alternative hypothesis, denoted by $H_1$. A null hypothesis concerning a population parameter will always be stated so as to specify an exact value of the parameter, whereas the alternative hypothesis allows for the possibility of several values. Hence, if $H_0$ is the null hypothesis $p = 0.5$ for a binomial population, the alternative hypothesis $H_1$ would be one of the following: $p > 0.5$, $p < 0.5$, or $p \ne 0.5$.

10.2 Testing a Statistical Hypothesis

To illustrate the concepts used in testing a statistical hypothesis about a population, consider the following example. A certain type of cold vaccine is known to be only 25% effective after a period of 2 years. To determine if a new and somewhat more expensive vaccine is superior in providing protection against the same virus for a longer period of time, suppose that 20 people are chosen at random and inoculated.
In an actual study of this type the participants receiving the new vaccine might number several thousand. The number 20 is being used here only to demonstrate the basic steps in carrying out a statistical test. If more than 8 of those receiving the new vaccine surpass the 2-year period without contracting the virus, the new vaccine will be considered superior to the one presently in use. The requirement that the number exceed 8 is somewhat arbitrary but appears reasonable in that it represents a modest gain over the 5 people that could be expected to receive protection if the 20 people had been inoculated with the vaccine already in use. We are essentially testing the null hypothesis that the new vaccine is equally effective after a period of 2 years as the one now commonly used. The alternative hypothesis is that the new vaccine is in fact superior. This is equivalent to testing the hypothesis that the binomial parameter for the probability of a success on a given trial is $p = 1/4$ against the alternative that $p > 1/4$. This is usually written as follows:

$$H_0: p = \tfrac{1}{4}, \qquad H_1: p > \tfrac{1}{4}.$$

The test statistic on which we base our decision is $X$, the number of individuals in our test group who receive protection from the new vaccine for a period of at least 2 years. The possible values of $X$, from 0 to 20, are divided into two groups: those numbers less than or equal to 8 and those greater than 8. All possible scores greater than 8 constitute the critical region, and all possible scores less than or equal to 8 determine the acceptance region. The last number that we observe in passing from the acceptance region into the critical region is called the critical value. In our illustration the critical value is the number 8. Therefore, if $x > 8$, we reject $H_0$ in favor of the alternative hypothesis $H_1$. If $x \le 8$, we accept $H_0$. This decision criterion is illustrated in Figure 10.1.
[Figure 10.1: Decision criterion for testing $p = 1/4$ versus $p > 1/4$. Values of $x$ from 0 to 8 form the acceptance region (accept $H_0$, $p = 1/4$); values from 9 to 20 form the critical region (reject $H_0$, $p > 1/4$).]

The decision procedure just described could lead to either of two wrong conclusions. For instance, the new vaccine may be no better than the one now in use and, for this particular randomly selected group of individuals, more than 8 surpass the 2-year period without contracting the virus. We would be committing an error by rejecting $H_0$ in favor of $H_1$ when, in fact, $H_0$ is true. Such an error is called a type I error.

Definition 10.2 Rejection of the null hypothesis when it is true is called a type I error.

A second kind of error is committed if 8 or fewer of the group surpass the 2-year period successfully and we conclude that the new vaccine is no better when it actually is better. In this case we would accept $H_0$ when it is false. This is called a type II error.

Definition 10.3 Acceptance of the null hypothesis when it is false is called a type II error.

In testing any statistical hypothesis, there are four possible situations that determine whether our decision is correct or in error. These four situations are summarized in Table 10.1.

Table 10.1 Possible Situations for Testing a Statistical Hypothesis

                 H0 is true          H0 is false
    Accept H0    Correct decision    Type II error
    Reject H0    Type I error        Correct decision

The probability of committing a type I error, also called the level of significance, is denoted by the Greek letter $\alpha$. In our illustration, a type I error will occur when more than 8 individuals surpass the 2-year period without contracting the virus using a new vaccine that is actually equivalent to the one in use. Hence, if $X$ is the number of individuals who remain free of the virus for at least 2 years,

$$\alpha = P(\text{type I error}) = P\left(X > 8 \text{ when } p = \tfrac{1}{4}\right) = \sum_{x=9}^{20} b\left(x; 20, \tfrac{1}{4}\right) = 1 - \sum_{x=0}^{8} b\left(x; 20, \tfrac{1}{4}\right) = 1 - 0.9591 = 0.0409.$$

We say that the null hypothesis, $p = 1/4$, is being tested at the $\alpha = 0.0409$ level of significance. Sometimes the level of significance is called the size of the critical region. A critical region of size 0.0409 is very small and therefore it is unlikely that a type I error will be committed. Consequently, it would be most unusual for more than 8 individuals to remain immune to a virus for a 2-year period using a new vaccine that is essentially equivalent to the one now on the market.

The probability of committing a type II error, denoted by $\beta$, is impossible to compute unless we have a specific alternative hypothesis. If we test the null hypothesis that $p = 1/4$ against the alternative hypothesis that $p = 1/2$, then we are able to compute the probability of accepting $H_0$ when it is false. We simply find the probability of obtaining 8 or fewer in the group that surpass the 2-year period when $p = 1/2$. In this case

$$\beta = P(\text{type II error}) = P\left(X \le 8 \text{ when } p = \tfrac{1}{2}\right) = \sum_{x=0}^{8} b\left(x; 20, \tfrac{1}{2}\right) = 0.2517.$$

This is a rather high probability, indicating a test procedure in which it is quite likely that we shall reject the new vaccine when, in fact, it is superior to that now in use. Ideally, we like to use a test procedure for which both the type I and type II errors are small.

It is possible that the director of the testing program is willing to make a type II error if the more expensive vaccine is not significantly superior. In fact, the only time he wishes to guard against the type II error is when the true value of $p$ is at least 0.7. If $p = 0.7$, this test procedure gives

$$\beta = P(\text{type II error}) = P(X \le 8 \text{ when } p = 0.7) = \sum_{x=0}^{8} b(x; 20, 0.7) = 0.0051.$$

With such a small probability of committing a type II error, it is extremely unlikely that the new vaccine would be rejected when it is 70% effective after a period of 2 years.
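The binomial sums in this section can be checked directly. The following is a minimal sketch in Python using only the standard library; the helper names `binom_cdf` and `vaccine_error_rates` are hypothetical and not from the text. For the rule "reject $H_0$ if $X > c$" with $n = 20$, it evaluates $\alpha = P(X > c \mid p = 1/4)$ and $\beta = P(X \le c \mid p = p_1)$, and should agree with the tabulated values to about four decimal places.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def vaccine_error_rates(critical: int, n: int = 20, p0: float = 0.25, p1: float = 0.5):
    """Type I and type II error probabilities for 'reject H0 if X > critical'."""
    alpha = 1.0 - binom_cdf(critical, n, p0)  # P(X > critical) when p = p0 (H0 true)
    beta = binom_cdf(critical, n, p1)         # P(X <= critical) when p = p1 (H1 true)
    return alpha, beta


alpha, beta_half = vaccine_error_rates(8)    # alternative p = 1/2
_, beta_07 = vaccine_error_rates(8, p1=0.7)  # alternative p = 0.7
print(f"alpha = {alpha:.4f}, beta(p=1/2) = {beta_half:.4f}, beta(p=0.7) = {beta_07:.4f}")
```

Calling `vaccine_error_rates(7)` evaluates the alternative decision rule with critical value 7 that is considered later in this section; it shows directly how lowering the critical value trades a smaller $\beta$ for a larger $\alpha$.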
As the value of $p$ specified by the alternative hypothesis approaches unity, the value of $\beta$ diminishes to zero.

Let us assume that the director of the testing program is unwilling to commit a type II error when the alternative hypothesis $p = 1/2$ is true, even though we have found the probability of such an error to be $\beta = 0.2517$. A reduction in $\beta$ is always possible by increasing the size of the critical region. For example, consider what happens to the values of $\alpha$ and $\beta$ when we change our critical value to 7 so that all scores greater than 7 fall in the critical region and those less than or equal to 7 fall in the acceptance region. Now, in testing $p = 1/4$ against the alternative hypothesis that $p = 1/2$, we find that

$$\alpha = \sum_{x=8}^{20} b\left(x; 20, \tfrac{1}{4}\right) = 1 - \sum_{x=0}^{7} b\left(x; 20, \tfrac{1}{4}\right) = 1 - 0.8982 = 0.1018$$

and

$$\beta = \sum_{x=0}^{7} b\left(x; 20, \tfrac{1}{2}\right) = 0.1316.$$

By adopting a new decision procedure, we have reduced the probability of committing a type II error at the expense of increasing the probability of committing a type I error. For a fixed sample size, a decrease in the probability of one error will usually result in an increase in the probability of the other error. Fortunately, the probability of committing both types of error can be reduced by increasing the sample size. Consider the same problem using a random sample of 100 individuals. If more than 36 of the group surpass the 2-year period, we reject the null hypothesis that $p = 1/4$ and accept the alternative hypothesis that $p > 1/4$. The critical value is now 36. All possible scores above 36 constitute the critical region and all possible scores less than or equal to 36 fall in the acceptance region.

To determine the probability of committing a type I error, we shall use the normal-curve approximation with

$$\mu = np = (100)\left(\tfrac{1}{4}\right) = 25 \qquad \text{and} \qquad \sigma = \sqrt{npq} = \sqrt{(100)\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right)} = 4.33.$$
Referring to Figure 10.2, we need the area under the normal curve to the right of $x = 36.5$. The corresponding z-value is

$$z = \frac{36.5 - 25}{4.33} = 2.66.$$

[Figure 10.2: Probability of a type I error. Normal curve centered at $\mu = 25$; the shaded area to the right of $x = 36.5$ is $\alpha$.]

From Table A.3 we find that

$$\alpha = P(\text{type I error}) = P\left(X > 36 \text{ when } p = \tfrac{1}{4}\right) \approx P(Z > 2.66) = 1 - P(Z < 2.66) = 1 - 0.9961 = 0.0039.$$

If $H_0$ is false and the true value of $H_1$ is $p = 1/2$, we can determine the probability of a type II error using the normal-curve approximation with

$$\mu = np = (100)\left(\tfrac{1}{2}\right) = 50 \qquad \text{and} \qquad \sigma = \sqrt{npq} = \sqrt{(100)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)} = 5.$$

The probability of falling in the acceptance region when $H_1$ is true is given by the area of the shaded region to the left of $x = 36.5$ in Figure 10.3. The z-value corresponding to $x = 36.5$ is

$$z = \frac{36.5 - 50}{5} = -2.7.$$

[Figure 10.3: Probability of a type II error. Normal curves centered at $\mu = 25$ ($H_0$) and $\mu = 50$ with $\sigma = 5$ ($H_1$); the shaded area under the $H_1$ curve to the left of $x = 36.5$ is $\beta$.]

Therefore,

$$\beta = P(\text{type II error}) = P\left(X \le 36 \text{ when } p = \tfrac{1}{2}\right) \approx P(Z < -2.7) = 0.0035.$$

Obviously, the type I and type II errors will rarely occur if the experiment consists of 100 individuals.

The illustration above underscores the strategy of the scientist in hypothesis testing. After the null and alternative hypotheses are stated, it is important to consider the sensitivity of the test procedure. By this we mean that there should be a determination, for a fixed $\alpha$, of a reasonable value for the probability of wrongly accepting $H_0$ (i.e., the value of $\beta$) when the true situation represents some important deviation from $H_0$. A value for the sample size can usually be determined for which there is a reasonable balance between $\alpha$ and the value of $\beta$ computed in this fashion. The vaccine problem is an illustration.

The concepts discussed here for a discrete population can equally well be applied to continuous populations.
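Returning briefly to the discrete vaccine example with $n = 100$, the normal-curve approximations above can be sketched in Python with only the standard library (`statistics.NormalDist`). The function name `normal_approx_errors` is hypothetical. It mirrors the text: the continuity-corrected cut-off is 36.5, with $\mu = np$ and $\sigma = \sqrt{npq}$ computed separately under $H_0$ and $H_1$. Because $z$ is not rounded to two decimals before the normal CDF is evaluated, the printed values may differ slightly in the last digit from 0.0039 and 0.0035.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_errors(critical: int, n: int, p0: float, p1: float):
    """Normal-curve approximations to alpha and beta for 'reject H0 if X > critical',
    using the continuity-corrected cut-off critical + 0.5 as in the text."""
    cut = critical + 0.5
    std = NormalDist()  # standard normal Z
    alpha = 1.0 - std.cdf((cut - n * p0) / sqrt(n * p0 * (1 - p0)))  # P(Z > z) under H0
    beta = std.cdf((cut - n * p1) / sqrt(n * p1 * (1 - p1)))         # P(Z < z) under H1
    return alpha, beta


alpha, beta = normal_approx_errors(36, n=100, p0=0.25, p1=0.5)
print(f"alpha ~ {alpha:.4f}, beta ~ {beta:.4f}")
```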
Consider the null hypothesis that the average weight of male students in a certain college is 68 kilograms against the alternative hypothesis that it is unequal to 68. That is, we wish to test

$$H_0: \mu = 68, \qquad H_1: \mu \ne 68.$$

The alternative hypothesis allows for the possibility that $\mu < 68$ or $\mu > 68$.

A sample mean that falls close to the hypothesized value of 68 would be considered evidence in favor of $H_0$. On the other hand, a sample mean that is considerably less than or more than 68 would be evidence inconsistent with $H_0$ and therefore favoring $H_1$. The sample mean is the test statistic in this case. A critical region for the test statistic might arbitrarily be chosen to be the two intervals $\bar{x} < 67$ and $\bar{x} > 69$. The acceptance region will then be the interval $67 \le \bar{x} \le 69$. This decision criterion is illustrated in Figure 10.4.

[Figure 10.4: Reject $H_0$ ($\mu \ne 68$) for $\bar{x} < 67$; accept $H_0$ ($\mu = 68$) for $67 \le \bar{x} \le 69$; reject $H_0$ ($\mu \ne 68$) for $\bar{x} > 69$.]

Let us now use the decision criterion of Figure 10.4 to calculate the probabilities of committing type I and type II errors when testing the null hypothesis that $\mu = 68$ kilograms against the alternative that $\mu \ne 68$ kilograms for the continuous population of students' weights. Assume the standard deviation of the population of weights to be $\sigma = 3.6$. For large samples we may substitute $s$ for $\sigma$ if no other estimate of $\sigma$ is available. Our decision statistic, based on a random sample of size $n = 36$, will be $\bar{X}$, the most efficient estimator of $\mu$. From the central limit theorem, we know that the sampling distribution of $\bar{X}$ is approximately normal with standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 3.6/6 = 0.6$.

The probability of committing a type I error, or the level of significance of our test, is equal to the sum of the areas that have been shaded in each tail of the distribution in Figure 10.5.
Therefore,

$$\alpha = P(\bar{X} < 67 \text{ when } \mu = 68) + P(\bar{X} > 69 \text{ when } \mu = 68).$$

[Figure 10.5: Critical region for testing $\mu = 68$ versus $\mu \ne 68$; both tails of the sampling distribution, below $\bar{x} = 67$ and above $\bar{x} = 69$, are shaded.]

The z-values corresponding to $\bar{x}_1 = 67$ and $\bar{x}_2 = 69$ when $H_0$ is true are

$$z_1 = \frac{67 - 68}{0.6} = -1.67 \qquad \text{and} \qquad z_2 = \frac{69 - 68}{0.6} = 1.67.$$

Therefore,

$$\alpha = P(Z < -1.67) + P(Z > 1.67) = 2P(Z < -1.67) = 0.0950.$$

Thus 9.5% of all samples of size 36 would lead us to reject $\mu = 68$ kilograms when it is true. To reduce $\alpha$, we have a choice of increasing the sample size or widening the acceptance region. Suppose that we increase the sample size to $n = 64$. Then $\sigma_{\bar{X}} = 3.6/8 = 0.45$. Now

$$z_1 = \frac{67 - 68}{0.45} = -2.22 \qquad \text{and} \qquad z_2 = \frac{69 - 68}{0.45} = 2.22.$$

Hence

$$\alpha = P(Z < -2.22) + P(Z > 2.22) = 2P(Z < -2.22) = 0.0264.$$

The reduction in $\alpha$ is not sufficient by itself to guarantee a good testing procedure. We must evaluate $\beta$ for various alternative hypotheses that we feel should be accepted if true. Therefore, if it is important to reject $H_0$ when the true mean is some value $\mu \ge 70$ or $\mu \le 66$, then the probability of committing a type II error should be computed and examined for the alternatives $\mu = 66$ and $\mu = 70$. Because of symmetry, it is only necessary to consider the probability of accepting the null hypothesis that $\mu = 68$ when the alternative $\mu = 70$ is true. A type II error will result when the sample mean $\bar{x}$ falls between 67 and 69 when $H_1$ is true. Therefore, referring to Figure 10.6, we find that

$$\beta = P(67 \le \bar{X} \le 69 \text{ when } \mu = 70).$$

[Figure 10.6: Type II error for testing $\mu = 68$ versus $\mu = 70$.]

The z-values corresponding to $\bar{x}_1 = 67$ and $\bar{x}_2 = 69$ when $H_1$ is true are

$$z_1 = \frac{67 - 70}{0.45} = -6.67 \qquad \text{and} \qquad z_2 = \frac{69 - 70}{0.45} = -2.22.$$

Therefore,

$$\beta = P(-6.67 < Z < -2.22) = P(Z < -2.22) - P(Z < -6.67) = 0.0132 - 0.0000 = 0.0132.$$
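The two-tailed calculations for the weight example can be sketched the same way. In this minimal Python sketch (standard library only; the function names `xbar_dist`, `alpha_two_sided`, and `beta_two_sided` are hypothetical), the sampling distribution of $\bar{X}$ is modeled as normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, following the central limit theorem argument above. Since the z-values are not rounded before the normal CDF is evaluated, the printed $\alpha$ for $n = 36$ comes out slightly larger than the table-based 0.0950 (about 0.0956).

```python
from math import sqrt
from statistics import NormalDist


def xbar_dist(mu: float, sigma: float, n: int) -> NormalDist:
    """Approximate sampling distribution of Xbar (central limit theorem)."""
    return NormalDist(mu, sigma / sqrt(n))


def alpha_two_sided(lo, hi, mu0, sigma, n):
    """Type I error: P(Xbar < lo or Xbar > hi) when mu = mu0."""
    d = xbar_dist(mu0, sigma, n)
    return d.cdf(lo) + (1.0 - d.cdf(hi))


def beta_two_sided(lo, hi, mu1, sigma, n):
    """Type II error: P(lo <= Xbar <= hi) when mu = mu1."""
    d = xbar_dist(mu1, sigma, n)
    return d.cdf(hi) - d.cdf(lo)


for n in (36, 64):
    print(f"n = {n}: alpha = {alpha_two_sided(67, 69, 68, 3.6, n):.4f}")
print(f"n = 64, mu = 70: beta = {beta_two_sided(67, 69, 70, 3.6, 64):.4f}")
```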
If the true value of $\mu$ is the alternative $\mu = 66$, the value of $\beta$ will again be 0.0132. For all possible values of $\mu < 66$ or $\mu > 70$, the value of $\beta$ will be even smaller when $n = 64$, and consequently there would be little chance of accepting $H_0$ when it is false.

The probability of committing a type II error increases rapidly when the true value of $\mu$ approaches, but is not equal to, the hypothesized value. Of course, this is usually the situation where we do not mind making a type II error. For example, if the alternative hypothesis $\mu = 68.5$ is true, we do not mind committing a type II error by concluding that the true answer is $\mu = 68$. The probability of making such an error will be high when $n = 64$. Referring to Figure 10.7, we have

$$\beta = P(67 \le \bar{X} \le 69 \text{ when } \mu = 68.5).$$

[Figure 10.7: Type II error for testing $\mu = 68$ versus $\mu = 68.5$.]

The z-values corresponding to $\bar{x}_1 = 67$ and $\bar{x}_2 = 69$ when $\mu = 68.5$ are

$$z_1 = \frac{67 - 68.5}{0.45} = -3.33 \qquad \text{and} \qquad z_2 = \frac{69 - 68.5}{0.45} = 1.11.$$

Therefore,

$$\beta = P(-3.33 < Z < 1.11) = P(Z < 1.11) - P(Z < -3.33) = 0.8665 - 0.0004 = 0.8661.$$

The preceding examples illustrate the following important properties:

1. The type I error and type II error are related. A decrease in the probability of one generally results in an increase in the probability of the other.
2. The size of the critical region, and therefore the probability of committing a type I error, can always be reduced by adjusting the critical value(s).
3. An increase in the sample size $n$ will reduce $\alpha$ and $\beta$ simultaneously.
4. If the null hypothesis is false, $\beta$ is a maximum when the true value of a parameter approaches the hypothesized value. The greater the distance between the true value and the hypothesized value, the smaller $\beta$ will be.
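Property 4 can be seen by sweeping the true mean. The following minimal Python sketch (standard library only; the function name `beta_at` is hypothetical) keeps the text's acceptance region $67 \le \bar{x} \le 69$ with $n = 64$ and $\sigma = 3.6$, and evaluates the probability of accepting $H_0$ for several true means. Note that at $\mu = 68$ itself, where $H_0$ is true, the same expression gives $1 - \alpha$ rather than $\beta$, so that value is left out of the sweep.

```python
from math import sqrt
from statistics import NormalDist


def beta_at(mu_true: float, lo: float = 67.0, hi: float = 69.0,
            sigma: float = 3.6, n: int = 64) -> float:
    """P(lo <= Xbar <= hi) when the true mean is mu_true.

    For mu_true != 68 this is the type II error probability beta; at
    mu_true = 68 (H0 true) it would instead equal 1 - alpha."""
    d = NormalDist(mu_true, sigma / sqrt(n))
    return d.cdf(hi) - d.cdf(lo)


for mu in (66.0, 67.0, 68.5, 69.0, 70.0, 71.0):
    print(f"true mu = {mu:4.1f}: beta = {beta_at(mu):.4f}")
```

The printed values are symmetric about 68 and shrink quickly as the true mean moves away from the hypothesized value, which is the behavior described in property 4.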
One very important concept that relates to error probabilities is the notion of the power of a test.

Definition 10.4 The power of a test is the probability of rejecting $H_0$ given that a specific alternative is true.

The power of a test can be computed as $1 - \beta$. Often different types of tests are compared by contrasting power properties. Consider the previous illustration in which we were testing $H_0: \mu = 68$ and $H_1: \mu \ne 68$. As before, suppose we are interested in assessing the sensitivity of the test. The test is governed by the rule that we accept $H_0$ if $67 \le \bar{x} \le 69$. We seek the capability of the test for properly rejecting $H_0$ when indeed $\mu = 68.5$. We have seen that the probability of a type II error is given by $\beta = 0.8661$. Thus the power of the test is $1 - 0.8661 = 0.1339$. In a sense, the power is a more succinct measure of how sensitive the test is for "detecting differences" between a mean of 68 and 68.5. In this case, if $\mu$ is truly 68.5, the test as described will properly reject $H_0$ only 13.39% of the time. As a result, the test would not be a good one if it is important that the analyst have a reasonable chance of truly distinguishing between a mean of 68.0 (specified by $H_0$) and a mean of 68.5. From the foregoing, it is clear that to produce a desirable power (say, greater than 0.8), one must either increase $\alpha$ or increase the sample size.

In what has preceded in this chapter, much of the text on hypothesis testing revolves around foundations and definitions. In the sections that follow we get more specific and put hypotheses in categories as well as discuss tests of hypotheses on various parameters of interest. We begin by drawing the distinction between a one-sided and a two-sided hypothesis.
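Returning to the power calculation for the weight example, the following minimal Python sketch (standard library only; all function names are hypothetical) computes the power $1 - \beta$ under the text's fixed acceptance region and then searches for the sample size giving power 0.8 at $\mu = 68.5$. The search rests on an assumption that is not in the text: the critical values are recomputed for each $n$ as $68 \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, so that $\alpha$ stays at 0.05. This matters because with the region $67 \le \bar{x} \le 69$ held fixed, the power at 68.5 actually falls as $n$ grows (68.5 lies inside the acceptance region), so a fixed region can never reach power 0.8 there.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal Z


def power_fixed_region(mu_true, lo=67.0, hi=69.0, sigma=3.6, n=64):
    """Power = 1 - beta for the text's rule: reject H0 unless lo <= Xbar <= hi."""
    d = NormalDist(mu_true, sigma / sqrt(n))
    return 1.0 - (d.cdf(hi) - d.cdf(lo))


def power_fixed_alpha(mu_true, mu0=68.0, sigma=3.6, n=64, alpha=0.05):
    """Power of a two-sided test whose critical values are rescaled with n
    so that the level stays at alpha (an assumption, not the text's region)."""
    se = sigma / sqrt(n)
    half_width = STD.inv_cdf(1 - alpha / 2) * se
    d = NormalDist(mu_true, se)
    return d.cdf(mu0 - half_width) + (1.0 - d.cdf(mu0 + half_width))


def smallest_n(mu_true, target=0.8, **kwargs):
    """Smallest n whose fixed-alpha power at mu_true reaches the target."""
    n = 2
    while power_fixed_alpha(mu_true, n=n, **kwargs) < target:
        n += 1
    return n


print(f"power at mu = 68.5, n = 64: {power_fixed_region(68.5):.4f}")
n_needed = smallest_n(68.5)
print(f"n needed for power 0.8 at alpha = 0.05: {n_needed}")
```

A simple linear search is used for clarity; since the fixed-$\alpha$ power increases with $n$, a bisection would find the same value faster.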
10.3 One- and Two-Tailed Tests

A test of any statistical hypothesis where the alternative is one-sided, such as

$$H_0: \theta = \theta_0, \qquad H_1: \theta > \theta_0,$$

or perhaps

$$H_0: \theta = \theta_0, \qquad H_1: \theta < \theta_0,$$

is called a one-tailed test. In Section 10.2, we made reference to the test statistic for a hypothesis. Generally, the critical region for the alternative hypothesis $\theta > \theta_0$ lies in the right tail of the distribution of the test statistic, while the critical region for the alternative hypothesis $\theta < \theta_0$ lies entirely in the left tail. In a sense, the inequality symbol points in the direction where the critical region lies. A one-tailed test was used in the vaccine experiment of Section 10.2 to test the hypothesis $p = 1/4$ against the one-sided alternative $p > 1/4$ for the binomial distribution. The one-tailed critical region is usually obvious. For an understanding, the reader should visualize the behavior of the test statistic and notice the obvious signal that would produce evidence supporting the alternative hypothesis.

A test of any statistical hypothesis where the alternative is two-sided, such as

$$H_0: \theta = \theta_0, \qquad H_1: \theta \ne \theta_0,$$

is called a two-tailed test, since the critical region is split into two parts, often having equal probabilities placed in each tail of the distribution of the test statistic. The alternative hypothesis $\theta \ne \theta_0$ states that either $\theta < \theta_0$ or $\theta > \theta_0$. A two-tailed test was used to test the null hypothesis that $\mu = 68$ kilograms against the two-sided alternative $\mu \ne 68$ kilograms for the continuous population of student weights in Section 10.2.

The null hypothesis, $H_0$, will always be stated using the equality sign so as to specify a single value. In this way the probability of committing a type I error can be controlled. Whether one sets up a one-tailed or a two-tailed test will depend on the conclusion to be drawn if $H_0$ is rejected.
The location of the critical region can be determined only after $H_1$ has been stated. For example, in testing a new drug, one sets up the hypothesis that it is no better than similar drugs now on the market and tests this against the alternative hypothesis that the new drug is superior. Such an alternative hypothesis will result in a one-tailed test with the critical region in the right tail. However, if we wish to compare a new teaching technique with the conventional classroom procedure, the alternative hypothesis should allow for the new approach to be either inferior or superior to the conventional procedure. Hence the test is two-tailed with the critical region divided equally so as to fall in the extreme left and right tails of the distribution of our statistic.

Certain guidelines are desirable in determining which hypothesis should be stated as $H_0$ and which should be stated as $H_1$. First, read the problem carefully and determine the claim that you want to test. Should the claim suggest a simple direction such as more than, less than, superior to, inferior to, and so on, then $H_1$ will be stated using the inequality symbol ($<$ or $>$) corresponding to the suggested direction. If, for example, in testing a new drug we wish to show strong evidence that more than 30% of the people will be helped, we immediately write $H_1: p > 0.3$ and then the null hypothesis is written $H_0: p = 0.3$. Should the claim suggest a compound direction (equality as well as direction) such as at least, equal to or greater, at most, no more than, and so on, then this entire compound direction ($\ge$ or $\le$) is expressed as $H_0$, but using only the equality sign, and $H_1$ is given by the opposite direction. Finally, if no direction whatsoever is suggested by the claim, then $H_1$ is stated using the not-equal symbol ($\ne$).
Example 10.1 A manufacturer of a certain brand of rice cereal claims that the average saturated fat content does not exceed 1.5 grams. State the null and alternative hypotheses to be used in testing this claim and determine where the critical region is located.

Solution: The manufacturer's claim should be rejected only if $\mu$ is greater than 1.5 grams and should be accepted if $\mu$ is less than or equal to 1.5 grams. Since the null hypothesis always specifies a single value of the parameter, we test

$$H_0: \mu = 1.5, \qquad H_1: \mu > 1.5.$$

Although we have stated the null hypothesis with an equal sign, it is understood to include any value not specified by the alternative hypothesis. Consequently, the acceptance of $H_0$ does not imply that $\mu$ is exactly equal to 1.5 grams but rather that we do not have sufficient evidence favoring $H_1$. Since we have a one-tailed test, the greater-than symbol indicates that the critical region lies entirely in the right tail of the distribution of our test statistic $\bar{X}$.

Example 10.2 A real estate agent claims that 60% of all private residences being built today are 3-bedroom homes. To test this claim, a large sample of new residences is inspected; the proportion of these homes with 3 bedrooms is recorded and used as our test statistic. State the null and alternative hypotheses to be used in this test and determine the location of the critical region.

Solution: If the test statistic is substantially higher or lower than $p = 0.6$, we would reject the agent's claim. Hence we should make the hypothesis

$$H_0: p = 0.6, \qquad H_1: p \ne 0.6.$$

The alternative hypothesis implies a two-tailed test with the critical region divided equally in both tails of the distribution of $\hat{P}$, our test statistic.
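The rule that "the inequality symbol points in the direction where the critical region lies" can be written down directly for a standardized test statistic. The following is a minimal Python sketch that assumes (this is not stated in the text) the statistic $Z$ is approximately standard normal under $H_0$; the function name `z_critical_region` and the string labels `"greater"`, `"less"`, and `"two-sided"` are hypothetical. For a one-tailed alternative the whole of $\alpha$ goes into one tail; for a two-sided alternative it is split equally between the tails. Example 10.1 would use `"greater"` and Example 10.2 `"two-sided"`.

```python
from statistics import NormalDist


def z_critical_region(alternative: str, alpha: float = 0.05):
    """Return a predicate that is True when an observed z lies in the critical region.

    alternative: 'greater' (H1: theta > theta0), 'less' (H1: theta < theta0),
    or 'two-sided' (H1: theta != theta0)."""
    std = NormalDist()
    if alternative == "greater":
        c = std.inv_cdf(1 - alpha)      # all of alpha in the right tail
        return lambda z: z > c
    if alternative == "less":
        c = std.inv_cdf(alpha)          # all of alpha in the left tail
        return lambda z: z < c
    if alternative == "two-sided":
        c = std.inv_cdf(1 - alpha / 2)  # alpha split equally between the tails
        return lambda z: abs(z) > c
    raise ValueError(f"unknown alternative: {alternative!r}")


for alt in ("greater", "less", "two-sided"):
    reject = z_critical_region(alt)
    print(alt, [z for z in (-2.5, -1.8, 0.0, 1.8, 2.5) if reject(z)])
```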
10.4 The Use of P-Values for Decision Making

In testing hypotheses in which the test statistic is discrete, the critical region may be chosen arbitrarily and its size determined. If $\alpha$ is too large, it can be reduced by making an adjustment in the critical value. It may be necessary to increase the sample size to offset the decrease that occurs automatically in the power of the test.

Over a number of generations of statistical analysis, it had become customary to choose an $\alpha$ of 0.05 or 0.01 and select the critical region accordingly. Then, of course, strict rejection or nonrejection of $H_0$ would depend on that critical region. For example, if the test is two-tailed and $\alpha$ is set at the 0.05 level of significance and the test statistic involves, say, the standard normal distribution, then a z-value is observed from the data and the critical region is

$$z > 1.96, \qquad z < -1.96,$$

where the value 1.96 is found as $z_{0.025}$ in Table A.3. A value of $z$ in the critical region prompts the statement: "The value of the test statistic is significant." We can translate that into the user's language. For example, if the hypothesis is given by

$$H_0: \mu = 10, \qquad H_1: \mu \ne 10,$$

one might say: "The mean differs significantly from the value 10."

This preselection of a significance level $\alpha$ has its roots in the philosophy that the maximum risk of making a type I error should be controlled. However, this approach does not account for values of test statistics that are "close" to the critical region. Suppose, for example, in the illustration with $H_0: \mu = 10$ versus $H_1: \mu \ne 10$, a value of $z = 1.87$ is observed; strictly speaking, with $\alpha = 0.05$ the value is not significant. But the risk of committing a type I error if one rejects $H_0$ in this case could hardly be considered severe. In fact, in a two-tailed scenario one can quantify this risk as

$$P = 2P(Z > 1.87 \text{ when } \mu = 10) = 2(0.0307) = 0.0614.$$

As a result, 0.0614 is the probability of obtaining a value of $z$ as large or larger (in magnitude) than 1.87 when in fact $\mu = 10$. Although this evidence against $H_0$ is not as strong as that which would result from a rejection at an $\alpha = 0.05$ level, it is important information to the user. Indeed, continued use of $\alpha = 0.05$ or 0.01 is only a result of what standards have been passed through the generations. The P-value approach has been adopted extensively by users in applied statistics. The approach is designed to give the user an alternative (in terms of a probability) to a mere "reject" or "do not reject" conclusion. The P-value computation also gives the user important information when the z-value falls well into the ordinary critical region. For example, if $z$ is 2.73, it is informative for the user to observe that

$$P = 2(0.0032) = 0.0064,$$

and thus the z-value is significant at a level considerably less than 0.05. It is important to know that under the condition of $H_0$, a value of $z = 2.73$ is an extremely rare event. Namely, a value at least that large in magnitude would only occur 64 times in 10,000 experiments.

One very simple way of explaining a P-value graphically is to consider two distinct samples. Suppose that two materials are considered for coating a particular type of metal in order to inhibit corrosion. Specimens are obtained, and one collection is coated with material 1 and one collection coated with material 2. The sample sizes are $n_1 = n_2 = 10$ for each sample, and corrosion was measured in percent of surface area affected. The hypothesis is that the samples came from common distributions with mean $\mu = 10$. Let us assume that the population variance is 1.0. Then we are testing

$$H_0: \mu_1 = \mu_2 = 10.$$

Let Figure 10.8 represent a point plot of the data; the data are placed on the distribution stated by the null hypothesis.
Now it seems clear that the data do refute the null hypothesis. But how can this be summarized in one number? The P-value can be viewed as simply the probability of obtaining this data set given that the samples come from the distribution depicted. Clearly, this probability is quite small, say 0.00000001! Thus the small P-value clearly refutes $H_0$, and the conclusion is that the population means are significantly different.

[Figure 10.8: Data that are likely generated from populations having two different means (point plot of both samples on the null distribution centered at $\mu = 10$).]

The P-value approach as an aid in decision making is quite natural because nearly all computer packages that provide hypothesis-testing computation print out P-values along with values of the appropriate test statistic. The following is a formal definition of a P-value.

Definition 10.5: A P-value is the lowest level (of significance) at which the observed value of the test statistic is significant.

It might be appropriate at this point to summarize the procedures for hypothesis testing. This may serve as a foundation on which special cases are based in succeeding sections. For this summary, assume that the hypothesis is $H_0: \theta = \theta_0$.

1. State the null hypothesis $H_0$ that $\theta = \theta_0$.
2. Choose an appropriate alternative hypothesis $H_1$ from one of the alternatives $\theta < \theta_0$, $\theta > \theta_0$, or $\theta \neq \theta_0$.
3. Choose a significance level of size $\alpha$.
4. Select the appropriate test statistic and establish the critical region. (If the decision is to be based on a P-value, it is not necessary to state the critical region.)
5. Compute the value of the test statistic from the sample data.
6. Decision: Reject $H_0$ if the test statistic has a value in the critical region (or if the computed P-value is less than or equal to the desired significance level $\alpha$); otherwise, do not reject $H_0$.

The reader should realize that the conclusions drawn by the analyst may be affected by computed P-values. In other words, one may not have a preselected $\alpha$ level in mind and thus draw conclusions based on the information provided by the P-value. As indicated earlier, this is the approach often taken in real-life situations.

Exercises

1. Suppose that an allergist wishes to test the hypothesis that at least 30% of the public is allergic to some cheese products. Explain how the allergist could commit (a) a type I error; (b) a type II error.

2. A sociologist is concerned about the effectiveness of a training course designed to get more drivers to use seat belts in automobiles. (a) What hypothesis is she testing if she commits a type I error by erroneously concluding that the training course is ineffective? (b) What hypothesis is she testing if she commits a type II error by erroneously concluding that the training course is effective?

3. A large manufacturing firm is being charged with discrimination in its hiring practices. (a) What hypothesis is being tested if a jury commits a type I error by finding the firm guilty? (b) What hypothesis is being tested if a jury commits a type II error by finding the firm guilty?

4. The proportion of adults living in a small town who are college graduates is estimated to be $p = 0.6$. To test this hypothesis, a random sample of 15 adults is selected. If the number of college graduates in our sample is anywhere from 6 to 12, we shall accept the null hypothesis that $p = 0.6$; otherwise, we shall conclude that $p \neq 0.6$. (a) Evaluate $\alpha$ assuming that $p = 0.6$. Use the binomial distribution. (b) Evaluate $\beta$ for the alternatives $p = 0.5$ and $p = 0.7$. (c) Is this a good test procedure?

5. Repeat Exercise 4 when 200 adults are selected and the acceptance region is defined to be $110 \le x \le 130$, where $x$ is the number of college graduates in our sample. Use the normal approximation.

6. A fabric manufacturer believes that the proportion of orders for raw material arriving late is $p = 0.6$. If a random sample of 10 orders shows that 3 or fewer arrived late, the hypothesis that $p = 0.6$ should be rejected in favor of the alternative $p < 0.6$. Use the binomial distribution. (a) Find the probability of committing a type I error if the true proportion is $p = 0.6$. (b) Find the probability of committing a type II error for the alternatives $p = 0.3$, $p = 0.4$, and $p = 0.5$.

7. Repeat Exercise 6 when 50 orders are selected and the critical region is defined to be $x \le 24$, where $x$ is the number of orders in our sample that arrived late. Use the normal approximation.

8. A dry cleaning establishment claims that a new spot remover will remove more than 70% of the spots to which it is applied. To check this claim, the spot remover will be used on 12 spots chosen at random. If fewer than 11 of the spots are removed, we shall accept the null hypothesis that $p = 0.7$; otherwise, we conclude that $p > 0.7$. (a) Evaluate $\alpha$, assuming that $p = 0.7$. (b) Evaluate $\beta$ for the alternative $p = 0.9$.

9. Repeat Exercise 8 when 100 spots are treated and the critical region is defined to be $x > 82$, where $x$ is the number of spots removed.

10. In the publication Relief from Arthritis by Thorsons Publishers, Ltd., John E. Croft claims that over 40% of the sufferers from osteoarthritis received measurable relief from an ingredient produced by a particular species of mussel found off the coast of New Zealand. To test this claim, the mussel extract is to be given to a group of 7 osteoarthritic patients. If 3 or more of the patients receive relief, we shall accept the null hypothesis that $p = 0.4$; otherwise, we conclude that $p < 0.4$. (a) Evaluate $\alpha$, assuming that $p = 0.4$. (b) Evaluate $\beta$ for the alternative $p = 0.3$.

11. Repeat Exercise 10 when 70 patients are given the mussel extract and the critical region is defined to be $x < 24$, where $x$ is the number of osteoarthritic patients who receive relief.

12. A random sample of 400 voters in a certain city are asked if they favor an additional 4% gasoline sales tax to provide badly needed revenues for street repairs. If more than 220 but fewer than 260 favor the sales tax, we shall conclude that 60% of the voters are for it. (a) Find the probability of committing a type I error if 60% of the voters favor the increased tax. (b) What is the probability of committing a type II error using this test procedure if actually only 48% of the voters are in favor of the additional gasoline tax?

13. Suppose, in Exercise 12, we conclude that 60% of the voters favor the gasoline sales tax if more than 214 but fewer than 266 voters in our sample favor it. Show that this new acceptance region results in a smaller value for $\alpha$ at the expense of increasing $\beta$.

14. A manufacturer has developed a new fishing line, which he claims has a mean breaking strength of 15 kilograms with a standard deviation of 0.5 kilogram. To test the hypothesis that $\mu = 15$ kilograms against the alternative that $\mu < 15$ kilograms, a random sample of 50 lines will be tested. The critical region is defined to be $\bar{x} < 14.9$. (a) Find the probability of committing a type I error when $H_0$ is true. (b) Evaluate $\beta$ for the alternatives $\mu = 14.8$ and $\mu = 14.9$ kilograms.

15. A soft-drink machine at a steak house is regulated so that the amount of drink dispensed is approximately normally distributed with a mean of 200 milliliters and a standard deviation of 15 milliliters. The machine is checked periodically by taking a sample of 9 drinks and computing the average content. If $\bar{x}$ falls in the interval $191 < \bar{x} < 209$, the machine is thought to be operating satisfactorily; otherwise, we conclude that $\mu \neq 200$ milliliters. (a) Find the probability of committing a type I error when $\mu = 200$ milliliters. (b) Find the probability of committing a type II error when $\mu = 215$ milliliters.

16. Repeat Exercise 15 for samples of size $n = 25$. Use the same critical region.

17. A new cure has been developed for a certain type of cement that results in a mean compressive strength of 5000 kilograms per square centimeter and a standard deviation of 120. To test the hypothesis that $\mu = 5000$ against the alternative that $\mu < 5000$, a random sample of 50 pieces of cement is tested. The critical region is defined to be $\bar{x} < 4970$. (a) Find the probability of committing a type I error when $H_0$ is true. (b) Evaluate $\beta$ for the alternatives $\mu = 4970$ and $\mu = 4960$.

18. If we plot the probabilities of accepting $H_0$ corresponding to various alternatives for $\mu$ (including the value specified by $H_0$) and connect all the points by a smooth curve, we obtain the operating characteristic curve of the test criterion, or simply the OC curve. Note that the probability of accepting $H_0$ when it is true is simply $1 - \alpha$. Operating characteristic curves are widely used in industrial applications to provide a visual display of the merits of the test criterion. With reference to Exercise 15, find the probabilities of accepting $H_0$ for the following 9 values of $\mu$ and plot the OC curve: 184, 188, 192, 196, 200, 204, 208, 212, and 216.

10.5 Single Sample: Tests Concerning a Single Mean (Variance Known)

In this section we formally consider tests of hypotheses on a single population mean. Many of the illustrations from previous sections involved tests on the mean, so the reader should already have insight into some of the details that are outlined here. We should first describe the assumptions on which the experiment is based. The model for the underlying situation centers around an experiment with $X_1, X_2, \ldots, X_n$ representing a random sample from a distribution with mean $\mu$ and variance $\sigma^2 > 0$. Consider first the hypothesis
$$H_0: \mu = \mu_0, \qquad H_1: \mu \neq \mu_0.$$
The appropriate test statistic should be based on the random variable $\bar{X}$. In Chapter 8, the central limit theorem is introduced, which essentially states that regardless of the distribution of $X$, the random variable $\bar{X}$ has approximately a normal distribution with mean $\mu$ and variance $\sigma^2/n$ for reasonably large sample sizes. So, $\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \sigma^2/n$. We can then determine a critical region based on the computed sample average, $\bar{x}$. It should be clear to the reader by now that there will be a two-tailed critical region for the test.

It is convenient to standardize $\bar{X}$ and formally involve the standard normal random variable $Z$, where
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
We know that under $H_0$, that is, if $\mu = \mu_0$, then $(\bar{X} - \mu_0)/(\sigma/\sqrt{n})$ has an $N(0, 1)$ distribution, and hence the expression
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$
can be used to write an appropriate acceptance region.
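The statement that the acceptance region has probability $1 - \alpha$ under $H_0$ can be checked by simulation. The sketch below is a minimal illustration using only Python's standard library; it assumes a normal population, and the function name `acceptance_rate` and the chosen values ($\mu_0 = 10$, $\sigma = 1$, $n = 10$) are hypothetical, picked only to exercise the idea.

```python
import random
from statistics import NormalDist


def acceptance_rate(mu0, sigma, n, alpha, trials, seed=1):
    """Simulate `trials` samples of size n from N(mu0, sigma^2), i.e. with H0 true
    (normal population assumed), and return the fraction whose standardized mean
    lands inside the acceptance region (-z_{alpha/2}, z_{alpha/2})."""
    rng = random.Random(seed)                      # seeded so the run is reproducible
    z_half = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    se = sigma / n ** 0.5                          # standard error sigma / sqrt(n)
    inside = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (xbar - mu0) / se
        if -z_half < z < z_half:
            inside += 1
    return inside / trials


rate = acceptance_rate(mu0=10.0, sigma=1.0, n=10, alpha=0.05, trials=20000)
print(f"acceptance rate {rate:.3f}, simulated type I error rate {1 - rate:.3f}")
```

With 20,000 simulated samples the acceptance rate should land close to 0.95, and its complement, the simulated type I error rate, close to $\alpha = 0.05$.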
The reader should keep in mind that, formally, the critical region is designed to control $\alpha$, the probability of type I error. It should be obvious that a two-tailed signal of evidence is needed to support $H_1$. Thus, given a computed value $\bar{x}$, the formal test involves rejecting $H_0$ if the computed test statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
satisfies $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$. If $-z_{\alpha/2} < z < z_{\alpha/2}$, do not reject $H_0$. Rejection of $H_0$, of course, implies acceptance of the alternative hypothesis $\mu \neq \mu_0$. With this definition of the critical region it should be clear that there will be probability $\alpha$ of rejecting $H_0$ (falling into the critical region) when, indeed, $\mu = \mu_0$.

Although it is easier to understand the critical region written in terms of $z$, we can write the same critical region in terms of the computed average $\bar{x}$. The following is an identical decision procedure: reject $H_0$ if $\bar{x} > b$ or $\bar{x} < a$, where
$$a = \mu_0 - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad b = \mu_0 + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
Hence, for an $\alpha$ level of significance, the critical values of the random variables $Z$ and $\bar{X}$ are both depicted in Figure 10.9.

[Figure 10.9: Critical region for the alternative hypothesis $\mu \neq \mu_0$, shown on both the $\bar{x}$-scale and the $z$-scale.]

Tests of one-sided hypotheses on the mean involve the same statistic described in the two-sided case. The difference, of course, is that the critical region is only in one tail of the standard normal distribution. For example, suppose that we seek to test
$$H_0: \mu = \mu_0, \qquad H_1: \mu > \mu_0.$$
The signal that favors $H_1$ comes from large values of $z$. Thus rejection of $H_0$ results when the computed $z > z_\alpha$. Obviously, if the alternative is $H_1: \mu < \mu_0$, the critical region is entirely in the lower tail and thus rejection results from $z < -z_\alpha$. The following two examples illustrate tests on means for the case in which $\sigma$ is known.
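The decision rules above, together with the P-values of Section 10.4, translate directly into code. The following is a sketch rather than a reference implementation: it relies on `statistics.NormalDist` from Python's standard library for $\Phi$ and its inverse, while the function names and the `alternative` strings are our own choices. It returns the P-value alongside the decision and also computes the critical values $a$ and $b$ on the $\bar{x}$ scale.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal Z ~ N(0, 1)


def p_value(z, alternative="two-sided"):
    """P-value of an observed z under H0.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0),
    or "less" (H1: mu < mu0)."""
    if alternative == "two-sided":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    if alternative == "greater":
        return 1 - STD_NORMAL.cdf(z)
    if alternative == "less":
        return STD_NORMAL.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Single-mean z-test with sigma known; returns (z, P-value, reject H0?)."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    p = p_value(z, alternative)
    return z, p, p <= alpha


def xbar_critical_values(mu0, sigma, n, alpha=0.05):
    """Two-sided critical values a < b on the x-bar scale:
    reject H0 if xbar < a or xbar > b."""
    half_width = STD_NORMAL.inv_cdf(1 - alpha / 2) * sigma / n ** 0.5
    return mu0 - half_width, mu0 + half_width


# Two-sided P-values for the z-values discussed in Section 10.4
print(round(p_value(1.87), 4), round(p_value(2.73), 4))
```

Here `p_value(1.87)` comes out near 0.061 and `p_value(2.73)` near 0.006, in line with the table-based values 0.0614 and 0.0064; the small differences come from rounding in Table A.3. Deciding by `p <= alpha` is equivalent to checking whether $z$ lies in the critical region.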
Example 10.3: A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years. Assuming a population standard deviation of 8.9 years, does this seem to indicate that the mean life span today is greater than 70 years? Use a 0.05 level of significance.

Solution:
1. $H_0$: $\mu = 70$ years.
2. $H_1$: $\mu > 70$ years.
3. $\alpha = 0.05$.
4. Critical region: $z > 1.645$, where $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
5. Computations: $\bar{x} = 71.8$ years, $\sigma = 8.9$ years, and $z = \dfrac{71.8 - 70}{8.9/\sqrt{100}} = 2.02$.
6. Decision: Reject $H_0$ and conclude that the mean life span today is greater than 70 years.

In Example 10.3 the P-value corresponding to $z = 2.02$ is given by the area of the shaded region in Figure 10.10.

[Figure 10.10: P-value for Example 10.3, the shaded area under the standard normal curve to the right of $z = 2.02$.]

Using Table A.3, we have
$$P = P(Z > 2.02) = 0.0217.$$
As a result, the evidence in favor of $H_1$ is even stronger than that suggested by a 0.05 level of significance.

Example 10.4: A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test the hypothesis that $\mu = 8$ kilograms against the alternative that $\mu \neq 8$ kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms. Use a 0.01 level of significance.

Solution:
1. $H_0$: $\mu = 8$ kilograms.
2. $H_1$: $\mu \neq 8$ kilograms.
3. $\alpha = 0.01$.
4. Critical region: $z < -2.575$ and $z > 2.575$, where $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
5. Computations: $\bar{x} = 7.8$ kilograms, $n = 50$, and hence $z = \dfrac{7.8 - 8}{0.5/\sqrt{50}} = -2.83$.
6. Decision: Reject $H_0$ and conclude that the average breaking strength is not equal to 8 but is, in fact, less than 8 kilograms.

[Figure 10.11: P-value for Example 10.4; shaded areas of $P/2$ each lie below $z = -2.83$ and above $z = 2.83$.]

Since the test in Example 10.4 is two-tailed, the desired P-value is twice the area of the shaded region in Figure 10.11 to the left of $z = -2.83$. Therefore, using Table A.3, we have
$$P = P(|Z| > 2.83) = 2P(Z < -2.83) = 0.0046,$$
which allows us to reject the null hypothesis that $\mu = 8$ kilograms at a level of significance smaller than 0.01.

10.6 Relationship to Confidence Interval Estimation

The reader should realize by now that the hypothesis-testing approach to statistical inference in this chapter is very closely related to the confidence interval approach in Chapter 9. Confidence interval estimation involves computation of bounds for which it is "reasonable" that the parameter in question is inside the bounds. For the case of a single population mean $\mu$ with $\sigma^2$ known, the structure of both hypothesis testing and confidence interval estimation is based on the random variable
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
It turns out that testing $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$ at a significance level $\alpha$ is equivalent to computing a $100(1 - \alpha)\%$ confidence interval on $\mu$ and rejecting $H_0$ if $\mu_0$ is not inside the confidence interval. If $\mu_0$ is inside the confidence interval, the hypothesis is not rejected. The equivalence is very intuitive and quite simple to illustrate. Recall that with an observed value $\bar{x}$, failure to reject $H_0$ at significance level $\alpha$ implies that
$$-z_{\alpha/2} \le \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \le z_{\alpha/2},$$
which is equivalent to
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu_0 \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

The confidence interval equivalence to hypothesis testing extends to differences between two means, variances, ratios of variances, and so on. As a result, the student of statistics should not consider confidence interval estimation and hypothesis testing as separate forms of statistical inference. For example, consider Example 9.2. The 95% confidence interval on the mean is given by the bounds (2.50, 2.70). Thus, with the same sample information, a two-sided hypothesis on $\mu$ involving any hypothesized value between 2.50 and 2.70 will not be rejected. As we turn to different areas of hypothesis testing, the equivalence to confidence interval estimation will continue to be exploited.

10.7 Single Sample: Tests on a Single Mean (Variance Unknown)

One would certainly suspect that tests on a population mean $\mu$ with $\sigma^2$ unknown, like confidence interval estimation, should involve the use of Student's t-distribution. Strictly speaking, the application of Student's t for both confidence intervals and hypothesis testing is developed under the following assumptions. The random variables $X_1, X_2, \ldots, X_n$ represent a random sample from a normal distribution with unknown $\mu$ and $\sigma^2$. Then the random variable $\sqrt{n}(\bar{X} - \mu)/S$ has a Student's t-distribution with $n - 1$ degrees of freedom. The structure of the test is identical to that for the case of $\sigma$ known, with the exception that the value $\sigma$ in the test statistic is replaced by the computed estimate $S$ and the standard normal distribution is replaced by a t-distribution. As a result, for the two-sided hypothesis
$$H_0: \mu = \mu_0, \qquad H_1: \mu \neq \mu_0,$$
rejection of $H_0$ at significance level $\alpha$ results when a computed t-statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
exceeds $t_{\alpha/2, n-1}$ or is less than $-t_{\alpha/2, n-1}$. The reader should recall from Chapters 8 and 9 that the t-distribution is symmetric around the value zero. Thus this two-tailed critical region applies in a fashion similar to that for the case of known $\sigma$. For $H_1: \mu > \mu_0$, rejection results when $t > t_{\alpha, n-1}$. For $H_1: \mu < \mu_0$, the critical region is given by $t < -t_{\alpha, n-1}$.

Example 10.5: The Edison Electric Institute has published figures on the annual number of kilowatt-hours expended by various home appliances. It is claimed that a vacuum cleaner expends an average of 46 kilowatt-hours per year. If a random sample of 12 homes included in a planned study indicates that vacuum cleaners expend an average of 42 kilowatt-hours per year with a standard deviation of 11.9 kilowatt-hours, does this suggest at the 0.05 level of significance that vacuum cleaners expend, on the average, less than 46 kilowatt-hours annually? Assume the population of kilowatt-hours to be normal.

Solution:
1. $H_0$: $\mu = 46$ kilowatt-hours.
2. $H_1$: $\mu < 46$ kilowatt-hours.
3. $\alpha = 0.05$.
4. Critical region: $t < -1.796$, where $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $v = 11$ degrees of freedom.
5. Computations: $\bar{x} = 42$ kilowatt-hours, $s = 11.9$ kilowatt-hours, and $n = 12$. Hence,
$$t = \frac{42 - 46}{11.9/\sqrt{12}} = -1.16, \qquad P = P(T < -1.16) = 0.135.$$
6. Decision: Do not reject $H_0$ and conclude that the average number of kilowatt-hours expended annually by home vacuum cleaners is not significantly less than 46.

Comment on the Single-Sample T-Test

The reader has probably noticed that the equivalence of the two-tailed t-test for a single mean and the computation of a confidence interval on $\mu$ with $\sigma$ replaced by $s$ is maintained. For example, consider Example 9.4. Essentially, we can view that computation as one in which we have found all values of $\mu_0$, the hypothesized mean volume of containers of sulfuric acid, for which the hypothesis $H_0: \mu = \mu_0$ will not be rejected at $\alpha = 0.05$. Again, this is consistent with the statement: "Based on the sample information, values of the population mean volume between 9.74 and 10.26 liters are not unreasonable."

Comments regarding the normality assumption are worth emphasis at this point. We have indicated that when $\sigma$ is known, the central limit theorem allows for the use of a test statistic or a confidence interval which is based on $Z$, the standard normal random variable. Strictly speaking, of course, the central limit theorem and thus the use of the standard normal does not apply unless $\sigma$ is known. Now, in Chapter 8, the development of the t-distribution is given. At that point it was stated that normality of $X_1, X_2, \ldots, X_n$ was an underlying assumption. Thus, strictly speaking, the Student's t-tables of percentage points for tests or confidence intervals should not be used unless it is known that the sample comes from a normal population. In practice, $\sigma$ can rarely be assumed known. However, a very good estimate may be available from previous experiments. Many statistics textbooks suggest that one can safely replace $\sigma$ by $s$ in the test statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
when $n \ge 30$ and still use the Z-tables for the appropriate critical region.
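The t-based P-value in step 5 of Example 10.5 can be reproduced without statistical tables, and doing so also shows what the $n \ge 30$ shortcut would cost for a sample as small as $n = 12$. The sketch below assumes only Python's standard library: it integrates the Student's t density numerically with Simpson's rule (our choice of method, adequate for illustration rather than production use) and compares the result with the normal-table P-value for the same statistic. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(t, v):
    """Density of Student's t-distribution with v degrees of freedom."""
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(v * math.pi)
    return math.exp(log_c) * (1 + t * t / v) ** (-(v + 1) / 2)


def t_cdf(t, v, steps=2000):
    """P(T <= t), using symmetry about 0 and Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, v) + t_pdf(a, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t(xbar, s, n, mu0):
    """Return the statistic (xbar - mu0) / (s / sqrt(n)) and its n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


# Example 10.5: H1 is mu < 46, so the P-value is the lower-tail area.
t, v = one_sample_t(xbar=42, s=11.9, n=12, mu0=46)
p_t = t_cdf(t, v)            # t referred to the t-distribution with 11 df
p_z = NormalDist().cdf(t)    # the same statistic referred to the Z-tables instead
print(f"t = {t:.2f}, P (t, {v} df) = {p_t:.3f}, P (normal) = {p_z:.3f}")
```

For Example 10.5 the t-based P-value comes out near 0.135, while referring the same statistic to the Z-tables gives roughly 0.12, so the normal shortcut overstates the evidence against $H_0$ somewhat when $n$ is small.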
The implication of this practice is that the central limit theorem is indeed being invoked and one is relying on the fact that $s \approx \sigma$. Obviously, when this is done the results are approximate. Thus a computed P-value (from the Z-distribution) of 0.15 may be 0.12 or perhaps 0.17, or a computed confidence interval may be a 93% confidence interval rather than a 95% interval as desired. Now what about situations where $n < 30$? The user cannot rely on $s$ being close to $\sigma$, and in order to take into account the inaccuracy of the estimate, the confidence interval should be wider or the critical value larger in magnitude. The t-distribution percentage points accomplish this but are correct only when the sample is from a normal distribution. For small samples, it is often difficult to detect deviations from a normal distribution. (Goodness-of-fit tests are discussed in a later section of this chapter.) For bell-shaped distributions of the random variables $X_1, X_2, \ldots, X_n$, the use of the t-distribution for tests or confidence intervals is likely to be quite good. When in doubt, the user should resort to nonparametric procedures, which are presented in Chapter 16.

It should be of interest for the reader to see annotated computer printout showing the result of a single-sample t-test.
Suppose that an engineer is interested in testing the bias in a pH meter. Data are collected on the meter by measuring the pH of a neutral substance (pH = 7.0). A sample of the measurements was taken, with the data as follows:

7.07  7.00  7.10  6.97  7.00  7.03  7.01  7.01  6.98  7.08

It is, then, of interest to test
$$H_0: \mu = 7.0, \qquad H_1: \mu \neq 7.0.$$
In this illustration we use the computer package MINITAB to illustrate the analysis of the data set above. Notice the key components of the printout shown in Figure 10.12. Of course, the sample mean is 7.0250, StDev is simply the sample standard deviation $s = 0.044$, and SE Mean is the estimated standard error of the mean, computed as $s/\sqrt{n} = 0.0139$. The t-value is the ratio
$$t = \frac{7.0250 - 7}{0.0139} = 1.80.$$

    MTB > ttest mu 7 'pH-meter'

    TEST OF MU = 7.0000 VS MU N.E. 7.0000

                  N     MEAN    STDEV   SE MEAN      T   P VALUE
    pH-meter     10   7.0250   0.0440    0.0139   1.80      0.11

Figure 10.12: MINITAB printout for one-sample t-test for pH meter.

The P-value of 0.11 suggests results that are inconclusive. There is not a strong rejection of $H_0$ (based on an $\alpha$ of 0.05 or 0.10), yet one certainly cannot truly conclude that the pH meter is unbiased. Notice that the sample size of 10 is rather small. An increase in sample size (perhaps another experiment) may sort things out. A discussion regarding appropriate sample size appears in Section 10.10.

10.8 Two Samples: Tests on Two Means

The reader has already come to understand the relationship between tests and confidence intervals and can rely largely on details supplied by the confidence interval material in Chapter 9. Tests concerning two means represent a set of very important analytical tools for the scientist or engineer. The experimental setting is very much like that described in Section 9.7. Two independent random samples of sizes $n_1$ and $n_2$, respectively, are drawn from two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. We know that the random variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
has a standard normal distribution. Here we are assuming that $n_1$ and $n_2$ are sufficiently large that the central limit theorem applies. Of course, if the two populations are normal, the statistic above has a standard normal distribution even for small $n_1$ and $n_2$. Obviously, if we can assume that $\sigma_1 = \sigma_2 = \sigma$, the statistic above reduces to
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}}.$$
The two statistics above serve as a basis for the development of the test procedures involving two means. The confidence interval equivalence and the ease in the transition from the case of tests on one mean provide simplicity.

The two-sided hypothesis on two means can be written quite generally as
$$H_0: \mu_1 - \mu_2 = d_0.$$
Obviously, the alternative can be two-sided or one-sided. Again, the distribution used is the distribution of the test statistic under $H_0$. Values $\bar{x}_1$ and $\bar{x}_2$ are computed and, for $\sigma_1$ and $\sigma_2$ known, the test statistic is given by
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}},$$
with a two-tailed critical region in the case of a two-sided alternative. That is, reject $H_0$ in favor of $H_1: \mu_1 - \mu_2 \neq d_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$. One-tailed critical regions are used in the case of the one-sided alternatives.
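For reference, the two-sample statistics can be sketched as plain functions. The block below is illustrative and uses only Python's standard library; besides the known-variance $z$ statistic just described, it anticipates the pooled and unequal-variance forms treated under Unknown Variances below. All function names are our own, and P-values are deliberately left out because they would require a t-distribution routine.

```python
import math


def two_sample_z(x1bar, x2bar, sigma1, sigma2, n1, n2, d0=0.0):
    """z statistic for H0: mu1 - mu2 = d0 when sigma1 and sigma2 are known."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((x1bar - x2bar) - d0) / se


def pooled_t(x1bar, x2bar, s1, s2, n1, n2, d0=0.0):
    """Pooled two-sample t statistic (assumes sigma1 = sigma2); returns (t, df, s_p)."""
    df = n1 + n2 - 2
    sp = math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / df)
    t = ((x1bar - x2bar) - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df, sp


def welch_df(s1, s2, n1, n2):
    """Approximate degrees of freedom v for the unequal-variance statistic t'."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Summary data of Example 10.6 (material 1 versus material 2, d0 = 2)
t, df, sp = pooled_t(85, 81, 4, 5, 12, 10, d0=2)
print(f"t = {t:.2f} with {df} df, s_p = {sp:.3f}")
```

As a check, the pooled statistic for the summary data of Example 10.6 reproduces the hand computation given there: $t \approx 1.04$ with 20 degrees of freedom and $s_p \approx 4.478$. When $\sigma_1 = \sigma_2$, `two_sample_z` agrees with the reduced equal-variance formula above.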
The reader should, as before, study the test statistic and be satisfied that for, say, $H_1: \mu_1 - \mu_2 > d_0$, the signal favoring $H_1$ comes from large values of $z$. Thus the upper-tailed critical region applies.

Unknown Variances

The more prevalent situations involving tests on two means are those in which variances are unknown. If the scientist involved is willing to assume that both distributions are normal and that $\sigma_1 = \sigma_2 = \sigma$, the pooled t-test (often called the two-sample t-test) may be used. The test statistic (see Section 9.7) is given by the following test procedure.

Two-Sample Pooled T-Test:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{1/n_1 + 1/n_2}}, \qquad \text{where} \quad s_p^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}.$$

The t-distribution is involved and the two-sided hypothesis is not rejected when
$$-t_{\alpha/2,\, n_1+n_2-2} < t < t_{\alpha/2,\, n_1+n_2-2}.$$
Recall from material in Chapter 9 that the degrees of freedom for the t-distribution are a result of pooling of information from the two samples to estimate $\sigma^2$. One-sided alternatives suggest one-sided critical regions, as one might expect. For example, for $H_1: \mu_1 - \mu_2 > d_0$, reject $H_0: \mu_1 - \mu_2 = d_0$ when $t > t_{\alpha,\, n_1+n_2-2}$.

Example 10.6: An experiment was performed to compare the abrasive wear of two different laminated materials. Twelve pieces of material 1 were tested by exposing each piece to a machine measuring wear. Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed. The samples of material 1 gave an average (coded) wear of 85 units with a sample standard deviation of 4, while the samples of material 2 gave an average of 81 and a sample standard deviation of 5. Can we conclude at the 0.05 level of significance that the abrasive wear of material 1 exceeds that of material 2 by more than 2 units? Assume the populations to be approximately normal with equal variances.
Solution: Let $\mu_1$ and $\mu_2$ represent the population means of the abrasive wear for material 1 and material 2, respectively.
1. $H_0$: $\mu_1 - \mu_2 = 2$.
2. $H_1$: $\mu_1 - \mu_2 > 2$.
3. $\alpha = 0.05$.
4. Critical region: $t > 1.725$, where $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{1/n_1 + 1/n_2}}$ with $v = 20$ degrees of freedom.
5. Computations: $\bar{x}_1 = 85$, $s_1 = 4$, $n_1 = 12$, $\bar{x}_2 = 81$, $s_2 = 5$, $n_2 = 10$. Hence
$$s_p = \sqrt{\frac{(11)(16) + (9)(25)}{12 + 10 - 2}} = 4.478,$$
$$t = \frac{(85 - 81) - 2}{4.478\sqrt{1/12 + 1/10}} = 1.04, \qquad P = P(T > 1.04) = 0.16.$$
6. Decision: Do not reject $H_0$. We are unable to conclude that the abrasive wear of material 1 exceeds that of material 2 by more than 2 units.

Unknown But Unequal Variances

There are situations where the analyst is not able to assume that $\sigma_1 = \sigma_2$. Recall from Chapter 9 that, if the populations are normal, the statistic
$$T' = \frac{(\bar{X}_1 - \bar{X}_2) - d_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$$
has an approximate t-distribution with approximate degrees of freedom
$$v = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}.$$
As a result, the test procedure is to not reject $H_0$ when
$$-t_{\alpha/2,\, v} < t' < t_{\alpha/2,\, v},$$
with $v$ given as above. Again, as in the case of the pooled t-test, one-sided alternatives suggest one-sided critical regions.

Paired Observations

When the student of statistics studies the two-sample t-test or confidence interval on the difference between means, he or she should realize that some elementary notions dealing in experimental design become relevant and must be addressed. [...] to the experimental units in the study. For example, consider Exercise 6, Section 9.8. The 20 seedlings play the role of the experimental units. Ten of them are to be treated with nitrogen and 10 with no nitrogen.
It may be very important that this assignment to the "nitrogen" and "no nitrogen" treatments be random to ensure that systematic differences between the seedlings do not interfere with a valid comparison between the means.

In Example 10.6, time of measurement is the most likely choice of the experimental unit. The 22 pieces of material should be measured in random order. We need to guard against the possibility that wear measurements made close together in time might tend to give similar results. Systematic (nonrandom) differences in experimental units are not expected; however, random assignment guards against the problem.

References to planning of experiments, randomization, choice of sample size, and so on, will continue to influence much of the development in Chapters 13, 14, and 15. Any scientist or engineer whose interest lies in analysis of real data should study this material. The pooled t-test is extended in Chapter 13 to cover more than two means.

Testing of two means can be accomplished when data are in the form of paired observations, as discussed in Chapter 9. In this pairing structure, the conditions of the two populations (treatments) are assigned randomly within homogeneous units. Computation of the confidence interval for $\mu_1 - \mu_2$ in the situation with paired observations is based on the random variable
$$T = \frac{\bar{D} - \mu_D}{S_d/\sqrt{n}},$$
where $\bar{D}$ and $S_d$ are random variables representing the sample mean and standard deviation of the differences of the observations in the experimental units. As in the case of the pooled t-test, the assumption is that the observations from each population are normal. This two-sample problem is essentially reduced to a one-sample problem by using the computed differences $d_1, d_2, \ldots, d_n$. Thus the hypothesis reduces to
$$H_0\colon \mu_D = d_0.$$
The computed test statistic is then given by
$$t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}}.$$
Critical regions are constructed using the t-distribution with $n - 1$ degrees of freedom.

Example 10.7
In a study conducted in the forestry and wildlife department at Virginia Polytechnic Institute and State University, J. A. Wesson examined the influence of the drug succinylcholine on the circulating levels of androgens in the blood. Blood samples from wild, free-ranging deer were obtained via the jugular vein immediately after an intramuscular injection of succinylcholine using darts and a capture gun. Deer were bled again approximately 30 minutes after the injection and then released. The levels of androgens at time of capture and 30 minutes later, measured in nanograms per milliliter (ng/ml), for 15 deer are as follows:

| Deer | At time of injection (ng/ml) | 30 minutes after injection (ng/ml) | $d_i$ |
|---|---|---|---|
| 1 | 2.76 | 7.02 | 4.26 |
| 2 | 5.18 | 3.10 | -2.08 |
| 3 | 2.68 | 5.44 | 2.76 |
| 4 | 3.05 | 3.99 | 0.94 |
| 5 | 4.10 | 5.21 | 1.11 |
| 6 | 7.05 | 10.26 | 3.21 |
| 7 | 6.60 | 13.91 | 7.31 |
| 8 | 4.79 | 18.53 | 13.74 |
| 9 | 7.39 | 7.91 | 0.52 |
| 10 | 7.30 | 4.85 | -2.45 |
| 11 | 11.78 | 11.10 | -0.68 |
| 12 | 3.90 | 3.74 | -0.16 |
| 13 | 26.00 | 94.03 | 68.03 |
| 14 | 67.48 | 94.03 | 26.55 |
| 15 | 17.04 | 41.70 | 24.66 |

Assuming that the populations of androgen at time of injection and 30 minutes later are normally distributed, test at the 0.05 level of significance whether the androgen concentrations are altered after 30 minutes of restraint.

SOLUTION Let $\mu_1$ and $\mu_2$ be the average androgen concentration at the time of injection and 30 minutes later, respectively, and let $d_i$ be the 30-minute level minus the level at injection. We proceed as follows:
1. $H_0\colon \mu_1 = \mu_2$, or $\mu_D = \mu_2 - \mu_1 = 0$.
2. $H_1\colon \mu_1 \ne \mu_2$, or $\mu_D = \mu_2 - \mu_1 \ne 0$.
3. $\alpha = 0.05$.
4. Critical region: $t < -2.145$ and $t > 2.145$, where $t = (\bar{d} - d_0)/(s_d/\sqrt{n})$ with $v = 14$ degrees of freedom.
5. Computations: The sample mean and standard deviation for the $d_i$'s are $\bar{d} = 9.848$ and $s_d = 18.474$. Therefore,
$$t = \frac{9.848 - 0}{18.474/\sqrt{15}} = 2.06.$$
6. Though the t-statistic is not significant at the 0.05 level, $P = P(|T| > 2.06) \approx 0.06$. As a result, there is some evidence that there is a difference in mean circulating levels of androgen.

In the case of paired observations, it is important that there be no interaction between the treatments and the experimental units. This was discussed in Chapter 9 in the development of confidence intervals. The no-interaction assumption implies that the effect of the experimental, or pairing, unit is the same for each of the two treatments. In Example 10.7, we are assuming that the effect of the deer is the same for the two conditions under study, namely "at injection" and "30 minutes after injection."

Annotated Computer Printout for Paired T-Test

Figure 10.13 displays a SAS computer printout for a paired t-test using the data of Example 10.7. Notice that the appearance of the printout is that of a single-sample t-test and, of course, that is exactly what is accomplished, since the test seeks to determine if $\bar{d}$ is significantly different from zero.

    Analysis Variable : DIFF Difference in Levels of Androgens

     N         Mean    Std Error          T   Prob>|T|
    15    9.8480000    4.7698699  2.0646265     0.0580

Figure 10.13 SAS printout of paired t-test for data of Example 10.7.

Summary of Test Procedures

As we complete the formal development of tests on population means, we offer Table 10.2, which summarizes the test procedure for the cases of a single mean and two means. Notice the approximate procedure when distributions are normal and variances are unknown but not assumed to be equal. This statistic was introduced in Chapter 9.
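The two-sample and paired statistics developed in this section can be checked numerically against the worked examples. The following is a minimal sketch in Python using only the standard library; the helper names `pooled_t`, `welch_t`, and `paired_t` are hypothetical, not functions from any package. The sketch returns only each statistic and its degrees of freedom; critical values and P-values still come from the t-table. The seedling inputs are the rounded summary values read from the SAS printout of Figure 10.18, so they are an assumption about that printout rather than raw data.

```python
import math
import statistics


def pooled_t(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Pooled two-sample t statistic and its degrees of freedom (assumes sigma1 = sigma2)."""
    df = n1 + n2 - 2
    sp = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / df)
    t = ((xbar1 - xbar2) - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df


def welch_t(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Statistic t' and approximate degrees of freedom v when variances are not assumed equal."""
    a, b = s1**2 / n1, s2**2 / n2
    t_prime = ((xbar1 - xbar2) - d0) / math.sqrt(a + b)
    v = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t_prime, v


def paired_t(diffs, d0=0.0):
    """Paired t statistic computed from the differences d_i, with n - 1 degrees of freedom."""
    n = len(diffs)
    dbar = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation, divisor n - 1
    return (dbar - d0) / (sd / math.sqrt(n)), n - 1


# Example 10.6: H0: mu1 - mu2 = 2 versus H1: mu1 - mu2 > 2; reject if t > 1.725 (20 df).
t_106, df_106 = pooled_t(85, 4, 12, 81, 5, 10, d0=2)

# Example 10.7: androgen levels (ng/ml) at injection and 30 minutes later.
at_injection = [2.76, 5.18, 2.68, 3.05, 4.10, 7.05, 6.60, 4.79,
                7.39, 7.30, 11.78, 3.90, 26.00, 67.48, 17.04]
after_30_min = [7.02, 3.10, 5.44, 3.99, 5.21, 10.26, 13.91, 18.53,
                7.91, 4.85, 11.10, 3.74, 94.03, 94.03, 41.70]
diffs = [after - at for at, after in zip(at_injection, after_30_min)]
t_107, df_107 = paired_t(diffs)

# Seedling summary statistics as read from Figure 10.18 (no nitrogen first, nitrogen second).
t_eq, df_eq = pooled_t(0.399, 0.07279, 10, 0.565, 0.18674, 10)
t_uneq, v = welch_t(0.399, 0.07279, 10, 0.565, 0.18674, 10)

print(f"Example 10.6: t = {t_106:.2f} with {df_106} df")  # t = 1.04 with 20 df
print(f"Example 10.7: t = {t_107:.2f} with {df_107} df")  # t = 2.06 with 14 df
print(f"Figure 10.18: t = {t_eq:.4f} ({df_eq} df), t' = {t_uneq:.4f} (v = {v:.1f})")
```

With equal sample sizes the pooled $t$ and $t'$ coincide, as remarked for Figure 10.18, while the degrees of freedom differ (18 for the pooled test versus about 11.7 for the approximate procedure).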
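Table 10.2 Tests Concerning Means

| $H_0$ | Value of test statistic | $H_1$ | Critical region |
|---|---|---|---|
| $\mu = \mu_0$ | $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$; $\sigma$ known | $\mu < \mu_0$ | $z < -z_\alpha$ |
| | | $\mu > \mu_0$ | $z > z_\alpha$ |
| | | $\mu \ne \mu_0$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| $\mu = \mu_0$ | $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$; $v = n - 1$; $\sigma$ unknown | $\mu < \mu_0$ | $t < -t_\alpha$ |
| | | $\mu > \mu_0$ | $t > t_\alpha$ |
| | | $\mu \ne \mu_0$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |
| $\mu_1 - \mu_2 = d_0$ | $z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$; $\sigma_1$ and $\sigma_2$ known | $\mu_1 - \mu_2 < d_0$ | $z < -z_\alpha$ |
| | | $\mu_1 - \mu_2 > d_0$ | $z > z_\alpha$ |
| | | $\mu_1 - \mu_2 \ne d_0$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| $\mu_1 - \mu_2 = d_0$ | $t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{1/n_1 + 1/n_2}}$; $v = n_1 + n_2 - 2$; $\sigma_1 = \sigma_2$ but unknown; $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ | $\mu_1 - \mu_2 < d_0$ | $t < -t_\alpha$ |
| | | $\mu_1 - \mu_2 > d_0$ | $t > t_\alpha$ |
| | | $\mu_1 - \mu_2 \ne d_0$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |
| $\mu_1 - \mu_2 = d_0$ | $t' = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$; $v = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$; $\sigma_1 \ne \sigma_2$ and unknown | $\mu_1 - \mu_2 < d_0$ | $t' < -t_\alpha$ |
| | | $\mu_1 - \mu_2 > d_0$ | $t' > t_\alpha$ |
| | | $\mu_1 - \mu_2 \ne d_0$ | $t' < -t_{\alpha/2}$ or $t' > t_{\alpha/2}$ |
| $\mu_D = d_0$ (paired observations) | $t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}}$; $v = n - 1$ | $\mu_D < d_0$ | $t < -t_\alpha$ |
| | | $\mu_D > d_0$ | $t > t_\alpha$ |
| | | $\mu_D \ne d_0$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |

10.9 Choice of Sample Size for Testing Means

In Section 10.2 we demonstrated how the analyst can exploit relationships among the sample size, the significance level $\alpha$, and the power of the test to achieve a certain standard of quality.

Suppose that we wish to test the hypothesis
$$H_0\colon \mu = \mu_0, \qquad H_1\colon \mu > \mu_0,$$
with a significance level $\alpha$, when the variance $\sigma^2$ is known. For a specific alternative, say $\mu = \mu_0 + \delta$, the power of our test is shown in Figure 10.14 to be
$$1 - \beta = P(\bar{X} > a \text{ when } \mu = \mu_0 + \delta),$$
where $a$ is the critical value of $\bar{X}$.

[Figure 10.14: Testing $\mu = \mu_0$ versus $\mu = \mu_0 + \delta$.]

Therefore,
$$\beta = P(\bar{X} < a \text{ when } \mu = \mu_0 + \delta) = P\left[\frac{\bar{X} - (\mu_0 + \delta)}{\sigma/\sqrt{n}} < \frac{a - (\mu_0 + \delta)}{\sigma/\sqrt{n}} \text{ when } \mu = \mu_0 + \delta\right].$$
Under the alternative hypothesis $\mu = \mu_0 + \delta$, the statistic
$$\frac{\bar{X} - (\mu_0 + \delta)}{\sigma/\sqrt{n}}$$
is the standard normal variable $Z$.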
Therefore, since the critical value satisfies $(a - \mu_0)/(\sigma/\sqrt{n}) = z_\alpha$,
$$\beta = P\left(Z < \frac{a - \mu_0}{\sigma/\sqrt{n}} - \frac{\delta}{\sigma/\sqrt{n}}\right) = P\left(Z < z_\alpha - \frac{\delta}{\sigma/\sqrt{n}}\right),$$
from which we conclude that
$$-z_\beta = z_\alpha - \frac{\delta\sqrt{n}}{\sigma},$$
and hence

Choice of Sample Size:
$$n = \frac{(z_\alpha + z_\beta)^2\sigma^2}{\delta^2},$$
a result that is also true when the alternative hypothesis is $\mu < \mu_0$. In the case of a two-tailed test we obtain the power $1 - \beta$ for a specified alternative when
$$n \approx \frac{(z_{\alpha/2} + z_\beta)^2\sigma^2}{\delta^2}.$$

Example 10.8
Suppose that we wish to test the hypothesis
$$H_0\colon \mu = 68 \text{ kilograms}, \qquad H_1\colon \mu > 68 \text{ kilograms}$$
for the weights of male students at a certain college using an $\alpha = 0.05$ level of significance when it is known that $\sigma = 5$. Find the sample size required if the power of our test is to be 0.95 when the true mean is 69 kilograms.

SOLUTION Since $\alpha = \beta = 0.05$, we have $z_\alpha = z_\beta = 1.645$. For the alternative $\mu = 69$, we take $\delta = 1$ and then
$$n = \frac{(1.645 + 1.645)^2(25)}{1} = 270.6.$$
Therefore, 271 observations are required if the test is to reject the null hypothesis 95% of the time when, in fact, $\mu$ is as large as 69 kilograms.

A similar procedure can be used to determine the sample size $n = n_1 = n_2$ required for a specific power of the test in which two population means are being compared. For example, suppose that we wish to test the hypothesis
$$H_0\colon \mu_1 - \mu_2 = d_0, \qquad H_1\colon \mu_1 - \mu_2 \ne d_0,$$
when $\sigma_1$ and $\sigma_2$ are known. For a specific alternative, say $\mu_1 - \mu_2 = d_0 + \delta$, the power of our test is shown in Figure 10.15 to be
$$1 - \beta = P(|\bar{X}_1 - \bar{X}_2 - d_0| > a \text{ when } \mu_1 - \mu_2 = d_0 + \delta).$$

[Figure 10.15: Testing $\mu_1 - \mu_2 = d_0$ versus $\mu_1 - \mu_2 = d_0 + \delta$.]

Therefore,
$$\beta = P(d_0 - a < \bar{X}_1 - \bar{X}_2 < d_0 + a \text{ when } \mu_1 - \mu_2 = d_0 + \delta)$$
$$= P\left[\frac{-a - \delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} < \frac{(\bar{X}_1 - \bar{X}_2) - (d_0 + \delta)}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} < \frac{a - \delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} \text{ when } \mu_1 - \mu_2 = d_0 + \delta\right].$$
Under the alternative hypothesis $\mu_1 - \mu_2 = d_0 + \delta$, the statistic
$$\frac{(\bar{X}_1 - \bar{X}_2) - (d_0 + \delta)}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}}$$
is the standard normal variable $Z$. Now, writing
$$z_{\alpha/2} = \frac{a}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}},$$
we have
$$\beta = P\left[-z_{\alpha/2} - \frac{\delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} < Z < z_{\alpha/2} - \frac{\delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}}\right],$$
from which we conclude (neglecting the small lower-tail probability when $\delta > 0$) that
$$-z_\beta \approx z_{\alpha/2} - \frac{\delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}},$$
and hence
$$n \approx \frac{(z_{\alpha/2} + z_\beta)^2(\sigma_1^2 + \sigma_2^2)}{\delta^2}.$$
For the one-tailed test, the expression for the required sample size when $n = n_1 = n_2$ is

Choice of Sample Size:
$$n = \frac{(z_\alpha + z_\beta)^2(\sigma_1^2 + \sigma_2^2)}{\delta^2}.$$

When the population variance (or variances, in the two-sample situation) is unknown, the choice of sample size is not straightforward. In testing the hypothesis $\mu = \mu_0$ when the true value is $\mu = \mu_0 + \delta$, the statistic
$$\frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
does not follow the t-distribution, as one might expect, but instead follows the noncentral t-distribution. However, tables or charts based on the noncentral t-distribution do exist for determining the appropriate sample size if some estimate of $\sigma$ is available or if $\delta$ is a multiple of $\sigma$. Table A.8 gives the sample sizes needed to control the values of $\alpha$ and $\beta$ for various values of
$$\Delta = \frac{|\delta|}{\sigma} = \frac{|\mu - \mu_0|}{\sigma}$$
for both one- and two-tailed tests. In the case of the two-sample t-test in which the variances are unknown but assumed equal, we obtain the sample sizes $n = n_1 = n_2$ needed to control the values of $\alpha$ and $\beta$ for various values of
$$\Delta = \frac{|\delta|}{\sigma} = \frac{|\mu_1 - \mu_2 - d_0|}{\sigma}$$
from Table A.9.

Example 10.9
In comparing the performance of two catalysts on the effect of a reaction yield, a two-sample t-test is to be conducted with $\alpha = 0.05$. The variances in the yields are considered to be the same for the two catalysts. How large a sample for each catalyst is needed to test the hypothesis
$$H_0\colon \mu_1 = \mu_2, \qquad H_1\colon \mu_1 \ne \mu_2$$
if it is essential to detect a difference of $0.8\sigma$ between the catalysts with probability 0.9?

SOLUTION From Table A.9, with $\alpha = 0.05$ for a two-tailed test, $\beta = 0.1$, and
$$\Delta = \frac{|0.8\sigma|}{\sigma} = 0.8,$$
we find the required sample size to be $n = 34$.

It is emphasized that in practical situations it might be difficult to force a scientist or engineer to make a commitment on information from which a value of $\Delta$ can be found. The reader is reminded that the $\Delta$-value quantifies the kind of difference between the means that the scientist considers important, that is, a difference considered significant from a scientific, not a statistical, point of view. Example 10.9 illustrates how this choice is often made, namely, by selecting a fraction of $\sigma$. Obviously, if the sample size is based on a choice of $|\delta|$ that is a small fraction of $\sigma$, the resulting sample size may be quite large compared to what the study allows.

10.10 Graphical Methods for Comparing Means

In Chapter 3 considerable attention is directed toward displaying data in graphical form. Stem-and-leaf displays and, in Chapter 8, box-and-whisker plots, quantile plots, and quantile-quantile normal plots are used to provide a "picture" to summarize a set of experimental data. Many computer software packages produce graphical displays. As we proceed to other forms of data analysis (e.g., regression analysis and analysis of variance), graphical methods become even more informative.

Graphical aids used in conjunction with hypothesis testing are not used as a replacement of the test procedure. Certainly, the value of the test statistic indicates the proper type of evidence in support of $H_0$ or $H_1$. However, a pictorial display provides a good illustration and is often a better communicator of evidence to the beneficiary of the analysis. Also, a picture will often clarify why a significant difference was found. Failure of an important assumption may be exposed by a summary type of graphical display.
For the comparison of means, side-by-side box-and-whisker plots provide a telling display. The reader should recall that these plots display the 25th percentile, the 75th percentile, and the median in a data set. In addition, the whiskers display the extremes in a data set. Consider Exercise 22 following this section. Plasma ascorbic acid levels were measured in two groups of pregnant women, smokers and nonsmokers. Figure 10.16 shows the box-and-whisker plots for both groups of women.

[Figure 10.16: Multiple box-and-whisker plot for plasma ascorbic acid in nonsmokers and smokers (vertical axis: ascorbic acid, 0 to 1.8).]

Two things are very apparent. Taking into account variability, there appears to be a negligible difference in the sample means. In addition, the variability in the two groups appears to be somewhat different. Of course, the analyst must keep in mind the rather sizable differences between the sample sizes in this case.

[Figure 10.17: Multiple box-and-whisker plots of seedling data (groups: none, nitrogen).]

Consider Exercise 6 following Section 9.8. Figure 10.17 shows the multiple box-and-whisker plot for the data of 20 seedlings, half given nitrogen and half given no nitrogen. The display reveals a smaller variability for the group containing no nitrogen. In addition, the lack of overlap of the box plots suggests a significant difference between the mean stem weights of the two groups. It would appear that the presence of nitrogen increases the stem weights and perhaps increases the variability in the weights.

There are no certain rules of thumb regarding when two box-and-whisker plots give evidence of significant difference between the means. However,
a rough guideline is that if the 25th percentile line for one sample exceeds the median line for the other sample, there is strong evidence of a difference between means. More emphasis is placed on graphical methods in a real-life case study demonstrated later in this chapter.

Annotated Computer Printout for Two-Sample T-Test

Consider the data of Exercise 6, Section 9.8, where seedling data under conditions of nitrogen and no nitrogen were collected. Test
$$H_0\colon \mu_{NIT} = \mu_{NON}, \qquad H_1\colon \mu_{NIT} > \mu_{NON},$$
where the population means indicate mean weights. Figure 10.18 is an annotated computer printout using the SAS package. Notice that sample standard deviations and standard errors are shown for both samples. The t-statistics under the assumption of "equal variance" and "unequal variance" are both given. From the box-and-whisker plot of Figure 10.17 it would certainly appear that the equal variance assumption is violated. A P-value of 0.0229 suggests a conclusion of unequal means. This concurs with the diagnostic information given in Figure 10.17. Incidentally, notice that $t$ and $t'$ are equal in this case, since $n_1 = n_2$.

    TTEST PROCEDURE
    Variable: WEIGHT
    MINERAL       N        Mean   Std Dev   Std Error
    no nitrogen  10  0.39900000   0.07279  0.02301932
    nitrogen     10  0.56500000   0.18674  0.05905271

    Variances        T     DF   Prob>|T|
    Unequal    -2.6191   11.7     0.0229
    Equal      -2.6191   18.0     0.0174

    For H0: Variances are equal, F' = 6.58   DF = (9,9)   Prob>F' = 0.0090

Figure 10.18 SAS printout for two-sample t-test.

Exercises

1. An electrical firm manufactures light bulbs that have a lifetime that is approximately normally distributed with a mean of 800 hours and a standard deviation of 40 hours.
Test the hypothesis that $\mu = 800$ hours against the alternative $\mu \ne 800$ hours if a random sample of 30 bulbs has an average life of 788 hours. Use a 0.04 level of significance.

2. A random sample of 64 bags of White Cheddar Popcorn weighed, on average, 5.23 ounces with a standard deviation of 0.24 ounces. Test the hypothesis that $\mu = 5.5$ ounces against the alternative hypothesis $\mu < 5.5$ ounces at the 0.05 level of significance.

3. In a research report by Richard H. Weindruch of the UCLA Medical School, it is claimed that mice with an average life span of 32 months will live to be about 40 months old when 40% of the calories in their food are replaced by vitamins and protein. Is there any reason to believe that $\mu < 40$ if 64 mice that are placed on this diet have an average life of 38 months …

… was made on each dog and the strength was measured. The resulting data appear below.
(a) Write an appropriate hypothesis to determine if there is a significant difference in strength between the hot and cold incisions.
(b) Test the hypothesis using a paired t-test. Use a P-value in your conclusion.

| Dog | Knife | Strength |
|---|---|---|
| 1 | Hot | 5120 |
| 1 | Cold | 8200 |
| 2 | Hot | 10,000 |
| 2 | Cold | 8600 |
| 3 | Hot | 10,000 |
| 3 | Cold | 9200 |
| 4 | Hot | 10,000 |
| 4 | Cold | 6200 |
| 5 | Hot | 10,000 |
| 5 | Cold | 10,000 |
| 6 | Hot | 7900 |
| 6 | Cold | 5200 |
| 7 | Hot | 510 |
| 7 | Cold | 885 |
| 8 | Hot | 1020 |
| 8 | Cold | 460 |

36. Nine subjects were used in an experiment to determine if an atmosphere involving exposure to carbon monoxide has an impact on breathing capability. The data were collected by personnel in the Health and Physical Education Department at Virginia Polytechnic Institute and State University. The data were analyzed in the Statistics Consulting Center at VPI&SU. The subjects were exposed to breathing chambers, one of which contained a high concentration of CO. Several breathing measures were made for each subject for each chamber. The subjects were exposed to the breathing chambers in random sequence. The following data give the breathing frequency in number of breaths taken per minute. Make a one-sided test of the hypothesis that mean breathing frequency is the same for the two environments. Use $\alpha = 0.05$.

| Subject | With CO | Without CO |
|---|---|---|
| 1 | 30 | 30 |
| 2 | 45 | 40 |
| 3 | 26 | 25 |
| 4 | 25 | 23 |
| 5 | 34 | 30 |
| 6 | 51 | 49 |
| 7 | 46 | 41 |
| 8 | 32 | 35 |
| 9 | 30 | 28 |

10.11 One Sample: Test on a Single Proportion

Tests of hypotheses concerning proportions are required in many areas. The politician is certainly interested in knowing what fraction of the voters will favor him in the next election. All manufacturing firms are concerned about the proportion of defective items when a shipment is made. The gambler depends on a knowledge of the proportion of outcomes that he considers favorable.

We shall consider the problem of testing the hypothesis that the proportion of successes in a binomial experiment equals some specified value. That is, we are testing the null hypothesis $H_0$ that $p = p_0$, where $p$ is the parameter of the binomial distribution. The alternative hypothesis may be one of the usual one-sided or two-sided alternatives: $p < p_0$, $p > p_0$, or $p \ne p_0$.

The appropriate random variable on which we base our decision criterion is the binomial random variable $X$, although we could just as well use the statistic $\hat{p} = X/n$. Values of $X$ that are far from the mean $\mu = np_0$ will lead to the rejection of the null hypothesis. Because $X$ is a discrete binomial variable,
it is unlikely that a critical region can be established whose size is exactly equal to a prespecified value of $\alpha$. For this reason it is preferable, in dealing with small samples, to base our decisions on P-values. To test the hypothesis
$$H_0\colon p = p_0, \qquad H_1\colon p < p_0,$$
we use the binomial distribution to compute the P-value
$$P = P(X \le x \text{ when } p = p_0).$$
The value $x$ is the number of successes in our sample of size $n$. If this P-value is less than or equal to $\alpha$, our test is significant at the $\alpha$ level and we reject $H_0$ in favor of $H_1$. Similarly, to test the hypothesis
$$H_0\colon p = p_0, \qquad H_1\colon p > p_0,$$
at the $\alpha$-level of significance, we compute
$$P = P(X \ge x \text{ when } p = p_0)$$
and reject $H_0$ in favor of $H_1$ if this P-value is less than or equal to $\alpha$. Finally, to test the hypothesis
$$H_0\colon p = p_0, \qquad H_1\colon p \ne p_0,$$
at the $\alpha$-level of significance, we compute
$$P = 2P(X \le x \text{ when } p = p_0) \text{ if } x < np_0, \qquad P = 2P(X \ge x \text{ when } p = p_0) \text{ if } x > np_0,$$
and reject $H_0$ in favor of $H_1$ if the computed P-value is less than or equal to $\alpha$.

The steps for testing a null hypothesis about a proportion against various alternatives using the binomial probabilities of Table A.1 are as follows:

Testing a Proportion: Small Samples
1. $H_0\colon p = p_0$.
2. $H_1$: Alternatives are $p < p_0$, $p > p_0$, or $p \ne p_0$.
3. Choose a level of significance equal to $\alpha$.
4. Test statistic: Binomial variable $X$ with $p = p_0$.
5. Computations: Find $x$, the number of successes, and compute the appropriate P-value.
6. Decision: Draw appropriate conclusions based on the P-value.

Example 10.10
A builder claims that heat pumps are installed in 70% of all homes being constructed today in the city of Richmond. Would you agree with this claim if a random survey of new homes in this city shows that 8 out of 15 had heat pumps installed? Use a 0.10 level of significance.

SOLUTION
1. $H_0\colon p = 0.7$.
2. $H_1\colon p \ne 0.7$.
3. $\alpha = 0.10$.
4. Test statistic: Binomial variable $X$ with $p = 0.7$ and $n = 15$.
5. Computations: $x = 8$ and $np_0 = (15)(0.7) = 10.5$. Therefore, from Table A.1, the computed P-value is
$$P = 2P(X \le 8 \text{ when } p = 0.7) = 2\sum_{x=0}^{8} b(x; 15, 0.7) = 0.2622 > 0.10.$$
6. Decision: Do not reject $H_0$. Conclude that there is insufficient reason to doubt the builder's claim.

In Section 5.3, we saw that binomial probabilities were obtainable from the actual binomial formula or from Table A.1 when $n$ is small. For large $n$, approximation procedures are required. When the hypothesized value $p_0$ is very close to 0 or 1, the Poisson distribution, with parameter $\mu = np_0$, may be used. However, the normal-curve approximation, with parameters $\mu = np_0$ and $\sigma^2 = np_0q_0$, where $q_0 = 1 - p_0$, is usually preferred for large $n$ and is very accurate as long as $p_0$ is not extremely close to 0 or to 1. If we use the normal approximation, the z-value for testing $p = p_0$ is given by
$$z = \frac{x - np_0}{\sqrt{np_0q_0}},$$
which is a value of the standard normal variable $Z$. Hence, for a two-tailed test at the $\alpha$-level of significance, the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$. For the one-sided alternative $p < p_0$, the critical region is $z < -z_\alpha$, and for the alternative $p > p_0$, the critical region is $z > z_\alpha$.

Example 10.11
A commonly prescribed drug for relieving nervous tension is believed to be only 60% effective. Experimental results with a new drug administered to a random sample of 100 adults who were suffering from nervous tension show that 70 received relief. Is this sufficient evidence to conclude that the new drug is superior to the one commonly prescribed? Use a 0.05 level of significance.

SOLUTION
1. $H_0\colon p = 0.6$.
2. $H_1\colon p > 0.6$.
3. $\alpha = 0.05$.
4. Critical region: $z > 1.645$.
5. Computations: $x = 70$, $n = 100$, $np_0 = (100)(0.6) = 60$, and
$$z = \frac{70 - 60}{\sqrt{(100)(0.6)(0.4)}} = 2.04, \qquad P = P(Z > 2.04) < 0.025.$$
6. Decision: Reject $H_0$ and conclude that the new drug is superior.

10.12 Two Samples: Tests on Two Proportions

Situations often arise where we wish to test the hypothesis that two proportions are equal. For example, we might try to show evidence that the proportion of doctors who are pediatricians in one state is equal to the proportion of pediatricians in another state. A person may decide to give up smoking only if he or she is convinced that the proportion of smokers with lung cancer exceeds the proportion of nonsmokers with lung cancer.

In general, we wish to test the null hypothesis that two proportions, or binomial parameters, are equal. That is, we are testing $p_1 = p_2$ against one of the alternatives $p_1 < p_2$, $p_1 > p_2$, or $p_1 \ne p_2$. Of course, this is equivalent to testing the null hypothesis that $p_1 - p_2 = 0$ against one of the alternatives $p_1 - p_2 < 0$, $p_1 - p_2 > 0$, or $p_1 - p_2 \ne 0$. The statistic on which we base our decision is the random variable $\hat{P}_1 - \hat{P}_2$. Independent samples of sizes $n_1$ and $n_2$ are selected at random from two binomial populations, and the proportions of successes $\hat{P}_1$ and $\hat{P}_2$ for the two samples are computed.

In our construction of confidence intervals for $p_1 - p_2$ we noted, for $n_1$ and $n_2$ sufficiently large, that the point estimator $\hat{P}_1$ minus $\hat{P}_2$ was approximately normally distributed with mean
$$\mu_{\hat{P}_1 - \hat{P}_2} = p_1 - p_2$$
and variance
$$\sigma^2_{\hat{P}_1 - \hat{P}_2} = \frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}.$$
Therefore, our acceptance and critical regions can be established by using the standard normal variable
$$Z = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\sqrt{p_1q_1/n_1 + p_2q_2/n_2}}.$$
When $H_0$ is true, we can substitute $p_1 = p_2 = p$ and $q_1 = q_2 = q$ (where $p$ and $q$ are the common values) in the preceding formula for $Z$ to give the form
$$Z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{pq(1/n_1 + 1/n_2)}}.$$
To compute a value of $Z$, however, we must estimate the parameters $p$ and $q$ that appear in the radical.
Upon pooling the data from both samples, the pooled estimate of the proportion $p$ is
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2},$$
where $x_1$ and $x_2$ are the numbers of successes in each of the two samples. Substituting $\hat{p}$ for $p$ and $\hat{q} = 1 - \hat{p}$ for $q$, the z-value for testing $p_1 = p_2$ is determined from the formula
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}(1/n_1 + 1/n_2)}}.$$
The critical regions for the appropriate alternative hypotheses are set up as before using critical points of the standard normal curve. Hence, for the alternative $p_1 \ne p_2$ at the $\alpha$-level of significance, the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$. For a test where the alternative is $p_1 < p_2$, the critical region is $z < -z_\alpha$, and when the alternative is $p_1 > p_2$, the critical region is $z > z_\alpha$.

Example 10.12
A vote is to be taken among the residents of a town and the surrounding county to determine whether a proposed chemical plant should be constructed. The construction site is within the town limits, and for this reason many voters in the county feel that the proposal will pass because of the large proportion of town voters who favor the construction. To determine if there is a significant difference in the proportion of town voters and county voters favoring the proposal, a poll is taken. If 120 of 200 town voters favor the proposal and 240 of 500 county residents favor it,
would you agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters? Use a 0.025 level of significance.

Solution: Let $p_1$ and $p_2$ be the true proportions of voters in the town and county, respectively, favoring the proposal.

1. $H_0$: $p_1 = p_2$.
2. $H_1$: $p_1 > p_2$.
3. $\alpha = 0.025$.
4. Critical region: $z > 1.96$.
5. Computations:
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{120}{200} = 0.60, \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{240}{500} = 0.48,$$
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{120 + 240}{200 + 500} = 0.51.$$
Therefore,
$$z = \frac{0.60 - 0.48}{\sqrt{(0.51)(0.49)\left(\frac{1}{200} + \frac{1}{500}\right)}} = 2.9, \qquad P = P(Z > 2.9) = 0.0019.$$
6. Decision: Reject $H_0$ and agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters.

Section 10.12 Two Samples: Tests on Two Proportions

Exercises

1. A marketing expert for a pasta-making company believes that 40% of pasta lovers prefer lasagna. If 9 out of 20 pasta lovers choose lasagna over other pastas, what can be concluded about the expert's claim? Use a 0.05 level of significance.

2. Suppose that, in the past, 40% of all adults favored capital punishment. Do we have reason to believe that the proportion of adults favoring capital punishment today has increased if, in a random sample of 15 adults, 8 favor capital punishment? Use a 0.05 level of significance.

3. A coin is tossed 20 times, resulting in 5 heads. Is this sufficient evidence to reject the hypothesis that the coin is balanced in favor of the alternative that heads occur less than 50% of the time? Quote a P-value.

4. It is believed that at least 60% of the residents in a certain area favor an annexation suit by a neighboring city. What conclusion would you draw if only 110 in a sample of 200 voters favor the suit? Use a 0.05 level of significance.

5. A fuel oil company claims that one-fifth of the homes in a certain city are heated by oil. Do we have reason to doubt this claim if, in a random sample of 1000 homes in this city, it is found that 136 are heated by oil? Use a 0.01 level of significance.

6. At a certain college it is estimated that at most 25% of the students ride bicycles to class. Does this seem to be a valid estimate if, in a random sample of 90 college students, 28 are found to ride bicycles to class? Use a 0.05 level of significance.

7. A new radar device is being considered for a certain defense missile system. The system is checked by experimenting with actual aircraft in which a kill or a no kill is simulated. If in 300 trials, 250 kills occur, accept or reject, at the 0.04 level of significance, the claim that the probability of a kill with the new system does not exceed the 0.8 probability of the existing device.

8. In a controlled laboratory experiment, scientists at the University of Minnesota discovered that 25% of a certain strain of rats subjected to a 20% coffee bean diet and then force-fed a powerful cancer-causing chemical later developed cancerous tumors. Would we have reason to believe that the proportion of rats developing tumors when subjected to this diet has increased if the experiment were repeated and 16 of 48 rats developed tumors? Use a 0.05 level of significance.

9. In a study to estimate the proportion of residents in a certain city and its suburbs who favor the construction of a nuclear power plant, it is found that 63 of 100 urban residents favor the construction while only 59 of 125 suburban residents are in favor. Is there a significant difference between the proportions of urban and suburban residents who favor construction of the nuclear plant? Make use of a P-value.

10. In a study on the fertility of married women conducted by Martin O'Connell and Carolyn C. Rogers for the Census Bureau in 1979, two groups of childless wives aged 25 to 29 were selected at random, and each wife was asked if she eventually planned to have a child. One group was selected from among wives married less than two years and the other from among wives married five years. Suppose that 240 of the 300 wives married less than two years planned to have children some day, compared to 288 of the 400 wives married five years. Can we conclude that the proportion of wives married less than two years who planned to have children is significantly higher than the proportion of wives married five years? Make use of a P-value.

11. An urban community would like to show that the incidence of breast cancer is higher in the community than in a nearby rural area. (PCB levels were found to be higher in the soil of the urban community.) If it is found that 20 of 200 adult women in the urban community have breast cancer and 10 of 150 adult women in the rural community have breast cancer, can we conclude at the 0.05 level of significance that breast cancer is more prevalent in the urban community?

12. During a winter flu epidemic, 2000 babies were surveyed by a well-known pharmaceutical company to determine if the company's new medicine was effective after two days. Among 120 babies who had the flu and were given the medicine, 29 were cured within two days. Among 280 babies who had the flu but were not given the medicine, 56 were cured within two days. Is there any significant indication that supports the company's claim of the effectiveness of the medicine?
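The pooled computation in step 5 of the town/county example can be checked numerically. The same sketch also computes exact binomial tail probabilities, which small-sample one-proportion exercises such as the coin in Exercise 3 may call for. It is a minimal Python sketch that assumes only the standard library. The function names `normal_sf`, `two_proportion_z`, and `binomial_cdf` are illustrative choices, not notation from the text. The normal tail area comes from `math.erfc` instead of being read from a table.

```python
import math


def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Large-sample z statistic for H0: p1 = p2, using the pooled estimate p-hat."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion, justified under H0
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2))
    return (p1_hat - p2_hat) / se


def binomial_cdf(x: int, n: int, p0: float) -> float:
    """Exact P(X <= x) for X ~ b(x; n, p0), intended for tests with small n."""
    return sum(math.comb(n, k) * p0**k * (1.0 - p0) ** (n - k) for k in range(x + 1))


if __name__ == "__main__":
    # Town (x1 = 120 of n1 = 200) versus county (x2 = 240 of n2 = 500).
    z = two_proportion_z(120, 200, 240, 500)
    p_value = normal_sf(z)  # one-sided P-value for H1: p1 > p2
    print(f"z = {z:.2f}, P = {p_value:.4f}")  # prints z = 2.87, P = 0.0021
```

Without intermediate rounding the statistic is about 2.87. The text rounds z to 2.9 before using the normal table, which gives P = 0.0019. Either value is far below $\alpha = 0.025$, so the decision is unchanged. An upper-tail binomial probability $P(X \ge x)$ can be obtained as `1 - binomial_cdf(x - 1, n, p0)`.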