ECO 312 Fall 2013 Chris Sims

MIDTERM EXAM

©2013 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

(1) 10 minutes. Below is a qq plot against a normal cdf of a sample of 200 observations. Does the plot show the sample has approximately a normal distribution, a distribution with fatter tails than the normal, a distribution with thinner tails than the normal, or a distribution with a fatter tail than the normal on one side and a thinner tail on the other? Explain your answer.

[Figure: Normal Q-Q Plot. Horizontal axis: Theoretical Quantiles (−3 to 3); vertical axis: Sample Quantiles (0.0 to 1.0).]

The qq plot (quantile-quantile plot) plots the quantiles of one distribution against those of another. The $p$'th quantile of $X$ is the value $q_p$ of the random variable $X$ such that $P[X \le q_p] = p$. The plot shows that, e.g. at the left end, the quantiles of the normal (the horizontal axis) are more widely spaced than those of the sample distribution, so the slope of the plot is low relative to the slope in the middle of the distribution. In one of the exercises, you computed a qq plot of a t-distributed sample against a normal, and that plot was steeper at the far left and far right than in the middle. A t distribution is fatter-tailed than a normal. This question's plot shows flatter slope at the far left and far right than in the middle. So it corresponds to a thinner-tailed distribution.

If you didn't remember the shape from the plot on the exercise, trying to figure out whether the plot is fat or thin tailed by thinking it through from first principles is a bit tricky. The qq plot plots $y$ against $x$, where the relation between $y$ and $x$ is determined by $F_x(x) = F_y(y)$, with $F_x$ and $F_y$ being the cdf's of $x$ and $y$. Differentiating both sides with respect to $x$ gives $f_x(x) = f_y(y(x))\, y'(x)$, so the slope of the $y(x)$ function is

$$ y'(x) = \frac{f_x(x)}{f_y(y(x))} . $$

If you think that a relatively fat-tailed $y$ distribution is one in which $f_y / f_x$ is big in the tails (a natural first stab at a definition of “fat tail”), you might think that a low slope at the extremes of the distribution goes with a fat tail, when in fact the opposite is true. The reason is that we are not taking the ratio of $f_x$ to $f_y$ at $y = x$, but the ratio at $F_x(x) = F_y(y)$.

One definition of relative tail fatness is that $x$ has fatter tails than $y$ if $f_x(x)/F_x(x)$ is smaller than $f_y(y)/F_y(y)$ when $x$ and $y$ are both close to the lower limit of the support of their respective distributions. (At the other extreme, the comparison is based on $f_x(x)/(1 - F_x(x))$.)

So if $x$ is a t with one degree of freedom, this ratio behaves, as $x$ approaches $-\infty$, like $-2x/(1 + x^2)$ and thus approaches zero, whereas for a normal it behaves like $-x$, and thus approaches infinity. So the t has fatter tails, and for large negative $x$, at points where $F_x(x) = F_y(y)$ (with $y$ normal) we will have $f_x / f_y = f_x F_y / (f_y F_x)$ very small. A qq plot of a t(1) against a normal puts the normal $y$ on the horizontal axis and the t-distributed $x$ on the vertical axis, so its slope is the reciprocal, $f_y / f_x$, which is very large. I.e., the slope will be very steep at the extremes in the qq plot when a t(1) is plotted against a normal.

The plot shown on the exam was of a sample from the uniform distribution against a normal.

(2) 30 minutes. Here are the results from estimating a logit model from the Stock-Watson “Names” data set. These data resulted from sending out resumés to prospective employers with names that were likely to suggest the applicants were black, and/or female. The variable call_back is one for those resumés that generated a call back from the employer, 0 otherwise. black is 1 for resumés that were likely to be identified as black, zero otherwise, and female is one for resumés likely to be identified as female, 0 otherwise.
There was also a dummy variable that was 1 for employers in Chicago. (There were a lot of other variables, but we are keeping this simple.) The logit regression has call_back as dependent variable, black, female and chicago as independent variables. Here is the summary output from the maximum likelihood fit:

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -2.2116     0.1246 -17.743  < 2e-16 ***
    female        0.2745     0.1335   2.057   0.0397 *
    black        -0.4431     0.1075  -4.120 3.78e-05 ***
    chicago      -0.4599     0.1096  -4.198 2.70e-05 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(a) Find an approximate 95% probability interval for the coefficient on black and use it to give a 95% probability interval for the difference in probability of a call back between a black male not in Chicago and a non-black male, also not in Chicago.

The 2.5% tails of the $N(0, 1)$ distribution are at $\pm 1.96$. Thus the 95% interval on the black variable is

$$ -.4431 \pm (1.96 \cdot .1075) = (-.6538,\ -.2324) . $$

For a male not in Chicago, the coefficients on female and chicago are not needed to compute the probabilities. The point estimates of the call-back probabilities for a non-black and a black male, respectively, are

$$ \frac{e^{-2.2116}}{1 + e^{-2.2116}} = .0987 , \qquad \frac{e^{-2.2116 - .4431}}{1 + e^{-2.2116 - .4431}} = .0657 . $$

One way to complete the answer, treating the intercept as fixed at its point estimate, is to evaluate the second expression at the two endpoints of the interval for the black coefficient. This gives call-back probabilities of .0539 and .0799 for the black applicant, so the implied interval for the difference in probability (black minus non-black) is approximately $(-.045,\ -.019)$.

(b) The estimated covariance matrix of these coefficients is

                 (Intercept)       female         black       chicago
    (Intercept)  0.015537058 -0.012133751 -0.0046002636 -0.0027857895
    female      -0.012133751  0.017814209 -0.0002528960 -0.0037471701
    black       -0.004600264 -0.000252896  0.0115659895  0.0001393946
    chicago     -0.002785789 -0.003747170  0.0001393946  0.0120026554

Use it to construct a chi-squared test statistic for the hypothesis that the female and Chicago coefficients are both zero. Three significant figure accuracy is fine, and you will get full credit if you set up a correct numerical matrix expression for the statistic, even if you don’t carry out all the matrix multiplications, inversions, etc.
Explain how to determine whether the hypothesis is rejected at the 95% level by a frequentist test. Explain with a hand sketch what region of the space of coefficients on the two tested variables has probability pchisq(s,df), where s is your statistic, df is the appropriate degrees of freedom for the chi-squared distribution, and pchisq() is a function delivering the probability of a chi-squared variable being less than its first argument (s in the case at hand).

The test statistic is

$$ \begin{pmatrix} 0.2745 & -0.4599 \end{pmatrix} \begin{pmatrix} .01781 & -.003747 \\ -.003747 & .01200 \end{pmatrix}^{-1} \begin{pmatrix} 0.2745 \\ -0.4599 \end{pmatrix} . $$

The inverse of the matrix in the center is

$$ \frac{1}{.01781 \cdot .012 - .003747^2} \begin{pmatrix} .012 & .003747 \\ .003747 & .01781 \end{pmatrix} = \begin{pmatrix} 60.09616 & 18.76502 \\ 18.76502 & 89.19271 \end{pmatrix} . $$

The test statistic evaluates to 18.655. Under the null hypothesis it is distributed as $\chi^2(2)$, whose 5% critical value is 5.99, so the null is rejected at any reasonable significance level. The probability of a $\chi^2(2)$ variable exceeding this level is .0001.

The highest posterior density region of the parameter space for these two coefficients is an ellipse centered at the coefficient estimates and slightly negatively sloped (because the coefficients have negative covariance). The ellipse with probability .9999 corresponding to our test statistic passes through the null hypothesis point (0,0). A plot of it (more precise than you were required to produce) is below. Note that it includes some negative values of the coefficient on female (as we might have expected because of its smaller t statistic) but scarcely any positive values of the chicago coefficient.

[Figure: plot of the ellipse. Horizontal axis: cf[1]; vertical axis: cf[2]; both run from −1.0 to 1.0.]

(3) 20 minutes. We would like to estimate the coefficients in the equation

$$ y_j = \beta_0 + \beta_1 x_j + \varepsilon_j . \tag{1} $$

The data are i.i.d. and $\varepsilon_j$ has mean zero, but we do not actually have data on $x$. We have instead two error-ridden proxies for $x$:

$$ z_j = x_j + \nu_j \tag{2} $$

$$ w_j = x_j + \xi_j . \tag{3} $$

We are willing to assume that all the data are jointly i.i.d.
across $j$, that $\nu_j$, $\xi_j$ and $\varepsilon_j$ all have zero mean conditional on $x_j$, and that $\nu_j$, $\xi_j$, and $\varepsilon_j$ are uncorrelated with one another and have constant variance across $j$.

(a) Can we obtain consistent estimates of $\beta_0$ and $\beta_1$ by replacing $x_j$ with $z_j$ in the equation and then using $w_j$ as an instrument for $z_j$? Would the results be different if we did the reverse, using $z_j$ as an instrument for $w_j$?

The original equation implies that when we replace $x_j$ by $z_j$ we get

$$ y_j = \beta_0 + \beta_1 z_j - \beta_1 \nu_j + \varepsilon_j = \beta_0 + \beta_1 z_j + \zeta_j . $$

The error term in this equation, $\zeta_j = -\beta_1 \nu_j + \varepsilon_j$, is clearly, under our assumptions, correlated with $z_j$ (both contain $\nu_j$), so OLS will not be consistent. However, $w_j$ is correlated with $z_j$ because of their common dependence on $x_j$, and neither of $w_j$’s two components, $x_j$ and $\xi_j$, is correlated with $\zeta_j$, so the model meets the basic requirements for validity of instrumental variables. The same arguments with the roles of $z$ and $w$ reversed imply that we could use $z_j$ as an instrument for $w_j$ instead. The two estimates would not be the same, though, since they are $(W'Z)^{-1} W'Y$ (with $w$ as the instrument for $z$) and $(Z'W)^{-1} Z'Y$ (with $z$ as the instrument for $w$), where $Z$ is the matrix consisting of the constant vector and the $z_j$’s and $W$ is the matrix consisting of the constant vector and the $w_j$’s. The asymptotic covariance matrix would not even match between the two (though you were not asked to check that).

(b) We could get a more accurate measure of $x_j$ by taking an average of $w_j$ and $z_j$. What about replacing $x_j$ with that average and using $z_j$ as an instrument for it? Would that give different results? Better results?

It would certainly give different results, and not better results. The residual once $x_j$ was replaced by $(z_j + w_j)/2$ would be $\varepsilon_j - \beta_1 (\nu_j + \xi_j)/2$, which depends on the noise terms in both $z$ and $w$, so neither $z$ nor $w$ would be usable as an instrument.

(4) 10 minutes. Suppose we have i.i.d. data on $\{y_j, x_j\}$ for $j = 1, \dots, n$ and wish to estimate the regression (with no constant term)

$$ y_j = \alpha x_j + \varepsilon_j . \tag{4} $$

Suppose we know that $\varepsilon_j \mid x_j \sim N(0, \sigma^2 x_j^2)$, where $\sigma^2$ is an unknown parameter. Explain why ordinary least squares estimation is inefficient in this case and explain how to implement a more efficient estimator.

Ordinary least squares is maximum likelihood and unbiased when the errors are homoskedastic, i.e. $\varepsilon_j \sim N(0, \sigma^2)$ with the same $\sigma^2$ value for all $j$. Here the error variance varies with $x_j$, so OLS, though still unbiased, is no longer maximum likelihood, and GLS gives a lower variance. The weighted least squares version of GLS would tell us here to divide the data $(y_j, x_j)$ for each $j$ by $x_j$, then apply least squares to the transformed data. This results in the equation

$$ \frac{y_j}{x_j} = \alpha + \nu_j , $$

where $\nu_j = \varepsilon_j / x_j$ and $\nu_j \sim N(0, \sigma^2)$. Applying least squares to this homoskedastic equation is just taking the sample mean of $y_j / x_j$.
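
The efficiency claim in question (4) can be checked with a small simulation. The sketch below is illustrative only and not part of the exam: the function names, the parameter values ($\alpha = 2$, $\sigma = 1$, $n = 50$) and the uniform design for $x_j$ on $(0.5, 3)$ are all assumptions made for the example. It draws repeated samples with $\varepsilon_j \mid x_j \sim N(0, \sigma^2 x_j^2)$ and compares the sampling variance of OLS with that of the weighted estimator, the sample mean of $y_j / x_j$.

```python
import random
import statistics


def simulate_once(rng, alpha, sigma, n):
    """Draw one sample and return (OLS estimate, WLS estimate) of alpha."""
    # Assumed design (hypothetical): x_j uniform on (0.5, 3), kept away from 0.
    xs = [rng.uniform(0.5, 3.0) for _ in range(n)]
    # eps_j | x_j ~ N(0, sigma^2 x_j^2), so its standard deviation is sigma * x_j.
    ys = [alpha * x + rng.gauss(0.0, sigma * x) for x in xs]
    # OLS without a constant term: sum(x*y) / sum(x^2).
    ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    # WLS: divide each observation by x_j, then take the mean of y_j / x_j.
    wls = statistics.fmean([y / x for x, y in zip(xs, ys)])
    return ols, wls


def compare_estimators(alpha=2.0, sigma=1.0, n=50, reps=2000, seed=0):
    """Monte Carlo means and variances of the two estimators."""
    rng = random.Random(seed)
    draws = [simulate_once(rng, alpha, sigma, n) for _ in range(reps)]
    ols_draws = [d[0] for d in draws]
    wls_draws = [d[1] for d in draws]
    return {
        "ols_mean": statistics.fmean(ols_draws),
        "wls_mean": statistics.fmean(wls_draws),
        "ols_var": statistics.pvariance(ols_draws),
        "wls_var": statistics.pvariance(wls_draws),
    }


if __name__ == "__main__":
    for name, value in compare_estimators().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions both estimators should center near $\alpha$, while the variance of the weighted estimator should be close to $\sigma^2 / n$ and visibly below that of OLS. How large the gap is depends on how much $x_j$ varies in the assumed design.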