UNIVERSITY OF SWAZILAND

FINAL EXAMINATION PAPER 2013

TITLE OF PAPER: SAMPLE SURVEY THEORY
COURSE CODE: ST306
TIME ALLOWED: TWO (2) HOURS
REQUIREMENTS: CALCULATOR AND STATISTICAL TABLES
INSTRUCTIONS: ANSWER ANY THREE QUESTIONS

Question 1
[20 marks, 3+3+3+3+4+4]

(a) For each of the following surveys, describe the target population, sampling frame, sampling unit, and observation unit. Discuss any possible sources of selection bias or inaccuracy of responses.

(i) A student wants to estimate the percentage of mutual funds whose shares went up in price last week. She selects every tenth fund listing in the mutual fund pages of the newspaper and calculates the percentage of those in which the share price increased.

(ii) A sample of 8 architects was chosen in a city with 14 architects and architectural firms. To select a survey sample, each architect was contacted by telephone in order of appearance in the telephone directory. The first 8 agreeing to be interviewed formed the sample.

(iii) To estimate how many books in the library need rebinding, a librarian uses a random number table to randomly select 100 locations on library shelves. He then walks to each location, looks at the book that resides at that spot, and records whether the book needs rebinding or not.

(iv) A survey is conducted to find the average weight of cows in a region. A list of all farms is available for the region, and 50 farms are selected at random. Then the weight of each cow at the 50 selected farms is recorded.

(b) Senturia et al. (1994) describe a survey taken to study how many children have access to guns in their households. Questionnaires were distributed to all parents who attended selected clinics in the Chicago area during a 1-week period for well- or sick-child visits.

(i) Suppose the quantity of interest is the percentage of households with guns. Describe why this is a cluster sample. What is the psu? The ssu? Is it a one-stage or two-stage cluster sample?
How would you estimate the percentage of households with guns and the standard error of your estimate?

(ii) What is the sampling population for this study? Do you think this sampling procedure results in a representative sample of households with children? Why, or why not?

Question 2
[20 marks, 10+5+5]

(a) Foresters want to estimate the average age of trees in a stand. Determining age is cumbersome because one needs to count the tree rings on a core taken from the tree. In general, though, the older the tree, the larger the diameter, and diameter is easy to measure. The foresters measure the diameter of all 1132 trees and find that the population mean equals 10.3. They then randomly select 20 trees for age measurement.

Tree No.   Diameter, x   Age, y      Tree No.   Diameter, x   Age, y
    1         12.0         125          11          5.7          61
    2         11.4         119          12          8.0          80
    3          7.9          83          13         10.3         114
    4          9.0          85          14         12.0         147
    5         10.5          99          15          9.2         122
    6          7.9         117          16          8.5         106
    7          7.3          69          17          7.0          82
    8         10.2         133          18         10.7          88
    9         11.7         154          19          9.3          97
   10         11.3         168          20          8.2          99

Estimate the population mean age of trees in the stand and give an approximate standard error for your estimate.

(b) An accounting firm is interested in estimating the error rate in a compliance audit it is conducting. The population contains 828 claims, and the firm audits an SRS of 85 of those claims. In each of the 85 sampled claims, 215 fields are checked for errors. One claim has errors in 4 of the 215 fields, 1 claim has three errors, 4 claims have two errors, 22 claims have one error, and the remaining 57 claims have no errors. (Data courtesy of Fritz Scheuren.)

(i) Treating the claims as psu's and the observations for each field as ssu's, estimate the error rate for all 828 claims. Give a standard error for your estimate.

(ii) Estimate (with SE) the total number of errors in the 828 claims.

Question 3
[20 marks, 4+4+12]

(a) Mayr et al. (1994) took an SRS of 240 children aged 2 to 6 years who visited their pediatric outpatient clinic.
They found the following frequency distribution for free (unassisted) walking among the children:

Age (months)          9   10   11   12   13   14   15   16   17   18   19   20
Number of children   13   35   44   69   36   24    7    3    2    5    1    1

(i) Find the mean, standard error, and a 95% CI for the average age for onset of free walking.

(ii) Suppose the researchers want to do another study in a different region and want a 95% confidence interval for the mean age of onset of walking to have margin of error 0.5. Using the estimated standard deviation for these data, what sample size would they need to take?

(b) The following data are from a stratified sample of faculty, using the areas biological sciences, physical sciences, social sciences, and humanities as the strata. Proportional allocation was used in this sample.

Stratum               Number of Faculty      Number of Faculty
                      Members in Stratum     Members in Sample
Biological sciences          102                     7
Physical sciences            310                    19
Social sciences              217                    13
Humanities                   178                    11
Total                        807                    50

The frequency table for number of publications in the strata is given below.

Number of Refereed            Number of Faculty Members
Publications          Biological   Physical   Social   Humanities
        0                  1          10         9          8
        1                  2           2         0          2
        2                  0           0         1          0
        3                  1           1         0          1
        4                  0           2         2          0
        5                  2           1         0          0
        6                  0           1         1          0
        7                  1           0         0          0
        8                  0           2         0          0

(i) Estimate the total number of refereed publications by faculty members in the college and give the standard error.

(ii) Estimate the proportion of faculty with no refereed publications and give the standard error.

Question 4
[20 marks, 4+4+4+2+2+2+2]

(a) A letter in the December 1995 issue of Dell Champion Variety Puzzles stated: "I've noticed over the last several issues there have been no winners from the South in your contests. You always say that winners are picked at random, so does this mean you're getting fewer entries from the South?" In response, the editors took a random sample of 1000 entries from the last few contests and found that 175 of those came from the South.

(i) Find a 95% CI for the percentage of entries that come from the South.
(ii) According to Statistical Abstract of the United States, 30.9% of the U.S. population live in states that the editors considered to be in the South. Is there evidence from your confidence interval that the percentage of entries from the South differs from the percentage of persons living in the South?

(b) A city council of a small city wants to know the proportion of eligible voters who oppose having an incinerator built for burning Phoenix garbage, just outside city limits. They randomly select 100 residential numbers from the city's telephone book, which contains 3000 such numbers. Each selected residence is then called and asked for (a) the total number of eligible voters and (b) the number of voters opposed to the incinerator. A total of 157 voters are surveyed; of these, 23 refuse to answer the question. Of the remaining 134 voters, 112 oppose the incinerator, so the council estimates the proportion by

$\hat{p} = \frac{112}{134} = 0.83582$

with

$\hat{V}(\hat{p}) = \frac{0.83582(1 - 0.83582)}{134} = 0.00102$.

Are these estimates valid? Why, or why not?

(c) For each of the following situations, indicate how you might use ratio or regression estimation.

(i) Estimate the proportion of time devoted to sports in television news broadcasts in your city.

(ii) Estimate the average number of fish caught per hour by anglers visiting a lake in August.

(iii) Estimate the average amount that undergraduate students spent on textbooks at your university in the fall semester.

(iv) Estimate the total weight of usable meat (discarding bones, fat, and skin) in a shipment of chickens.

Question 5
[20 marks, 4+4+6+6]

Suppose a city has 90,000 dwelling units, of which 35,000 are houses, 45,000 are apartments, and 10,000 are condominiums. You believe that the mean electricity usage is about twice as much for houses as for apartments or condominiums and that the standard deviation is proportional to the mean.
(a) How would you allocate a sample of 900 observations if you want to estimate the mean electricity consumption for all households in the city?

(b) Now suppose that you want to estimate the overall proportion of households in which energy conservation is practiced. You have strong reason to believe that about 45% of house dwellers use some sort of energy conservation and that the corresponding percentages are 25% for apartment dwellers and 3% for condominium residents. What gain would proportional allocation offer over simple random sampling?

(c) Someone else has taken a small survey, using an SRS, of energy usage in houses. On the basis of the survey, each house is categorized as having electric heating or some other kind of heating. The January electricity consumption in kilowatt-hours for each house is recorded ($y_i$) and the results are given below:

Type of Heating   Number of Houses   Sample Mean   Sample Variance
Electric                 24              972           202,396
Nonelectric              36              463            96,721
Total                    60

From other records, it is known that 16,450 of the 35,000 houses have electric heating, and 18,550 have nonelectric heating.

(i) Using the sample, give an estimate and its standard error of the proportion of houses with electric heating. Does your 95% CI include the true proportion?

(ii) Give an estimate and its standard error of the average number of kilowatt-hours used by houses in the city. What type of estimator did you use, and why did you choose that estimator?

Useful formulas

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$

$\hat{\mu}_{srs} = \bar{y}$

$\hat{p}_{srs} = \frac{\sum_{i=1}^{n} y_i}{n}$

$\hat{\tau}_{hh} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{p_i}$

$\hat{\mu}_{hh} = \frac{\hat{\tau}_{hh}}{N}$

$\hat{\mu}_{r} = r \mu_x$

$\hat{\tau}_{L} = N \hat{\mu}_{L}$

$\hat{\mu}_{str} = \sum_{h=1}^{L} \frac{N_h}{N} \bar{y}_h$

$\hat{\tau}_{str} = N \hat{\mu}_{str}$

$\hat{p}_{str} = \sum_{h=1}^{L} \frac{N_h}{N} \hat{p}_h$

$\hat{\mu}_{pstr} = \sum_{h=1}^{L} w_h \bar{y}_h$

$\hat{V}(\hat{\tau}_{cl}) = N(N - n) \frac{s_u^2}{n}$, where $s_u^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ and $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

[The remaining one-stage cluster sampling estimator formulas are illegible in the source.]

The formulas for systematic sampling are the same as those used for one-stage cluster sampling. Change the subscript cl to sys to denote the fact that data were collected under systematic sampling.

To estimate $\tau$, multiply $\hat{\mu}_{c(\cdot)}$ by $M$. To get the estimated variances, multiply $\hat{V}(\hat{\mu}_{c(\cdot)})$ by $M^2$. If $M$ is not known, substitute $M$ with $N\bar{m}$, where $\bar{m} = \sum_{i=1}^{n} M_i / n$.

n for $\mu$ SRS:  $n = \frac{N\sigma^2}{(N-1)(d^2/z^2) + \sigma^2}$

n for $\tau$ SRS:  $n = \frac{N\sigma^2}{(N-1)(d^2/z^2N^2) + \sigma^2}$

n for $p$ SRS:  $n = \frac{Np(1-p)}{(N-1)(d^2/z^2) + p(1-p)}$

n for $\mu$ SYS:  $n = \frac{N\sigma^2}{(N-1)(d^2/z^2) + \sigma^2}$

n for $\tau$ SYS:  $n = \frac{N\sigma^2}{(N-1)(d^2/z^2N^2) + \sigma^2}$

n for $\mu$ STR:  $n = \frac{\sum_{h=1}^{L} N_h^2 (\sigma_h^2 / w_h)}{N^2 (d^2/z^2) + \sum_{h=1}^{L} N_h \sigma_h^2}$

n for $\tau$ STR:  $n = \frac{\sum_{h=1}^{L} N_h^2 (\sigma_h^2 / w_h)}{N^2 (d^2/z^2N^2) + \sum_{h=1}^{L} N_h \sigma_h^2}$

where $w_h = n_h / n$.

Allocations for STR $\mu$:

$n = \frac{\left(\sum_{k=1}^{L} N_k \sigma_k / \sqrt{c_k}\right)\left(\sum_{k=1}^{L} N_k \sigma_k \sqrt{c_k}\right)}{N^2 (d^2/z^2) + \sum_{k=1}^{L} N_k \sigma_k^2}$

$n = \frac{\left(\sum_{k=1}^{L} N_k \sigma_k\right)^2}{N^2 (d^2/z^2) + \sum_{k=1}^{L} N_k \sigma_k^2}$

[A further allocation formula involving $(c - c_0)$ is illegible in the source.]

Allocations for STR $\tau$:

Allocations for STR $p$:

[The allocation formulas for $\tau$ and $p$ are illegible in the source.]

STATISTICAL TABLES 1

TABLE A.1 Cumulative Standardized Normal Distribution

A(z) is the integral of the standardized normal distribution from $-\infty$ to z (in other words, the area under the curve to the left of z). It gives the probability of a normal random variable not being more than z standard deviations above its mean.
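The definition of A(z) above can be checked numerically. The following is a minimal Python sketch, assuming the standard identity A(z) = 0.5(1 + erf(z/√2)); the function name `A` simply mirrors the table's notation and is not part of the paper.

```python
import math


def A(z: float) -> float:
    """Area under the standardized normal curve to the left of z."""
    # Standard identity linking the normal CDF to the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Print a few values to four decimals for comparison with Table A.1.
for z in (0.00, 1.00, 1.96, 2.58):
    print(f"A({z:.2f}) = {A(z):.4f}")
```

Values printed this way should agree with the tabulated entries to four decimal places; a candidate would still read the table in the examination itself.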
Values of z of particular importance:

  z       A(z)
1.645    0.9500    Lower limit of right 5% tail
1.960    0.9750    Lower limit of right 2.5% tail
2.326    0.9900    Lower limit of right 1% tail
2.576    0.9950    Lower limit of right 0.5% tail
3.090    0.9990    Lower limit of right 0.1% tail
3.291    0.9995    Lower limit of right 0.05% tail

  z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
 0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
 0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
 0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
 0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
 0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
 0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
 0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
 0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
 0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
 0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
 1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
 1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
 1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
 1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
 1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
 1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
 1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
 1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
 1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
 1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
 2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
 2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
 2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
 2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
 2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
 2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
 2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
 2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
 2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
 2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
 3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
 3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
 3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
 3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
 3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
 3.5   0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998
 3.6   0.9998  0.9998  0.9999

STATISTICAL TABLES 2

TABLE A.2 t Distribution: Critical Values of t

                               Significance level
Degrees of   Two-tailed test:   10%      5%       2%       1%      0.2%     0.1%
freedom      One-tailed test:    5%     2.5%      1%      0.5%     0.1%    0.05%
    1                          6.314   12.706   31.821   63.657  318.309  636.619
    2                          2.920    4.303    6.965    9.925   22.327   31.599
    3                          2.353    3.182    4.541    5.841   10.215   12.924
    4                          2.132    2.776    3.747    4.604    7.173    8.610
    5                          2.015    2.571    3.365    4.032    5.893    6.869
    6                          1.943    2.447    3.143    3.707    5.208    5.959
    7                          1.894    2.365    2.998    3.499    4.785    5.408
    8                          1.860    2.306    2.896    3.355    4.501    5.041
    9                          1.833    2.262    2.821    3.250    4.297    4.781
   10                          1.812    2.228    2.764    3.169    4.144    4.587
   11                          1.796    2.201    2.718    3.106    4.025    4.437
   12                          1.782    2.179    2.681    3.055    3.930    4.318
   13                          1.771    2.160    2.650    3.012    3.852    4.221
   14                          1.761    2.145    2.624    2.977    3.787    4.140
   15                          1.753    2.131    2.602    2.947    3.733    4.073
   16                          1.746    2.120    2.583    2.921    3.686    4.015
   17                          1.740    2.110    2.567    2.898    3.646    3.965
   18                          1.734    2.101    2.552    2.878    3.610    3.922
   19                          1.729    2.093    2.539    2.861    3.579    3.883
   20                          1.725    2.086    2.528    2.845    3.552    3.850
   21                          1.721    2.080    2.518    2.831    3.527    3.819
   22                          1.717    2.074    2.508    2.819    3.505    3.792
   23                          1.714    2.069    2.500    2.807    3.485    3.768
   24                          1.711    2.064    2.492    2.797    3.467    3.745
   25                          1.708    2.060    2.485    2.787    3.450    3.725
   26                          1.706    2.056    2.479    2.779    3.435    3.707
   27                          1.703    2.052    2.473    2.771    3.421    3.690
   28                          1.701    2.048    2.467    2.763    3.408    3.674
   29                          1.699    2.045    2.462    2.756    3.396    3.659
   30                          1.697    2.042    2.457    2.750    3.385    3.646
   32                          1.694    2.037    2.449    2.738    3.365    3.622
   34                          1.691    2.032    2.441    2.728    3.348    3.601
   36                          1.688    2.028    2.434    2.719    3.333    3.582
   38                          1.686    2.024    2.429    2.712    3.319    3.566
   40                          1.684    2.021    2.423    2.704    3.307    3.551
   42                          1.682    2.018    2.418    2.698    3.296    3.538
   44                          1.680    2.015    2.414    2.692    3.286    3.526
   46                          1.679    2.013    2.410    2.687    3.277    3.515
   48                          1.677    2.011    2.407    2.682    3.269    3.505
   50                          1.676    2.009    2.403    2.678    3.261    3.496
   60                          1.671    2.000    2.390    2.660    3.232    3.460
   70                          1.667    1.994    2.381    2.648    3.211    3.435
   80                          1.664    1.990    2.374    2.639    3.195    3.416
   90                          1.662    1.987    2.368    2.632    3.183    3.402
  100                          1.660    1.984    2.364    2.626    3.174    3.390
  120                          1.658    1.980    2.358    2.617    3.160    3.373
  150                          1.655    1.976    2.351    2.609    3.145    3.357
  200                          1.653    1.972    2.345    2.601    3.131    3.340
  300                          1.650    1.968    2.339    2.592    3.118    3.323
  400                          1.649    1.966    2.336    2.588    3.111    3.315
  500                          1.648    1.965    2.334    2.586    3.107    3.310
  600                          1.647    1.964    2.333    2.584    3.104    3.307
   ∞                           1.645    1.960    2.326    2.576    3.090    3.291
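
The critical value z = 1.960 from the tables feeds directly into the formula sheet's SRS sample-size expressions, n = Nσ²/((N − 1)(d²/z²) + σ²) for a mean and n = Np(1 − p)/((N − 1)(d²/z²) + p(1 − p)) for a proportion. The Python sketch below illustrates the arithmetic only. The function names and all input values are hypothetical and are not taken from any question in this paper, and rounding up to the next whole unit is an assumption of the sketch.

```python
import math


def n_for_mean_srs(N: int, sigma: float, d: float, z: float = 1.960) -> int:
    """Sample size for estimating a mean under SRS with margin of error d."""
    var = sigma ** 2
    n = N * var / ((N - 1) * (d ** 2 / z ** 2) + var)
    return math.ceil(n)  # round up so the margin of error is not exceeded


def n_for_proportion_srs(N: int, p: float, d: float, z: float = 1.960) -> int:
    """Sample size for estimating a proportion under SRS with margin d."""
    pq = p * (1.0 - p)
    n = N * pq / ((N - 1) * (d ** 2 / z ** 2) + pq)
    return math.ceil(n)


# Hypothetical inputs chosen only to illustrate the calculation.
print(n_for_mean_srs(N=1000, sigma=10.0, d=2.0))
print(n_for_proportion_srs(N=5000, p=0.5, d=0.05))
```

Using p = 0.5 in the proportion formula gives the most conservative (largest) sample size when no prior estimate of p is available.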