ST 521: Statistical Theory I
Armin Schwartzman

Class topic 3: Random Variables

Contents

Random Variables
    Random Variables
    Conventions
    Examples
Distribution Functions
    Cumulative Distribution Functions
    Some properties of the cdf
    Induced Probability Space
    Identically distributed rvs
    Types of Random Variables
Discrete random variables
    Discrete random variables
    Properties of the pmf
    Example
Continuous Random Variables
    Continuous Random Variables
    Probability density function
    Properties
    Notes
    Notes (cont.)
    Notes (cont.)
    Stochastic ordering
ST 521 – Class topic 3, Fall 2014

Random Variables

Suppose we start with a probability space (S, A, P). Instead of referring to outcomes and events observed from the sample space S, it is often convenient to assign a number to each possible outcome and record that instead.

Definition: A random variable Y is a real-valued, measurable function defined on a probability space; that is, Y : S -> R. Every point ω in S maps to a point in R, namely Y(ω). Conversely, we define the inverse image under Y of a subset B of R as

    Y^{-1}(B) = {ω : Y(ω) ∈ B}

The definition of a random variable requires that the inverse image of every Borel set B ⊂ R is an element of A. This property allows us to assign probabilities to random variables. More precisely,

    P{Y ∈ B} = P{Y^{-1}(B)}

Conventions

A random variable is a function defined on the sample space which takes values on the real line (for now). Often the argument is omitted and one writes Y instead of Y(ω). Random variables are usually denoted by capital letters (e.g. Y). Values which a random variable can take on are denoted by lower-case letters (e.g. y).

Example: Coin toss.
    S = {H, T},  Y(H) = 1,  Y(T) = 0
If P(head) = 0.5, then P(Y = 1) = 0.5.

Examples

Roll of a die: S = set of sides of a cube.
1. Define Y(ω) = # of dots on side ω. Identifying each side with its number of dots,
    P{Y = y} = 1/6  for y = 1, ..., 6
2.
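As a concrete illustration of a random variable as a function on a sample space, here is a minimal Python sketch (not from the slides; the names `Y` and `prob_Y_equals` are just illustrative). It encodes the coin-toss example and computes P(Y = 1) as the probability of the inverse image Y^{-1}({1}) = {H}.

```python
from fractions import Fraction

# Sample space and probability measure for one toss of a fair coin.
S = ["H", "T"]
P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def Y(omega):
    """The random variable from the slide: Y(H) = 1, Y(T) = 0."""
    return 1 if omega == "H" else 0

def prob_Y_equals(y):
    """P(Y = y), computed as P of the inverse image Y^{-1}({y})."""
    inverse_image = [omega for omega in S if Y(omega) == y]
    return sum(P[omega] for omega in inverse_image)

print(prob_Y_equals(1))  # Fraction(1, 2)
```

The same pattern works for the die: replace S by the six sides and Y by the dot count.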
Define X(ω) = # of dots on side ω, modulo 3.

Distribution Functions

Cumulative Distribution Functions

Distribution functions are used to describe the behavior of a rv.

Definition: The cumulative distribution function (cdf) of a random variable Y is the real-valued function F_Y(y) defined by

    F_Y(y) = P{Y ≤ y} = P{ω : Y(ω) ≤ y}

Example: cdf of a die.

Definition: The survival function of Y is defined by

    S_Y(y) = 1 - F_Y(y) = P(Y > y)

Some properties of the cdf

Let F(y) be a cdf. Then:
1. 0 ≤ F(y) ≤ 1
2. lim_{y -> -∞} F(y) = 0
3. lim_{y -> ∞} F(y) = 1
4. F is nondecreasing: if a < b, then F(a) ≤ F(b)
5. F is right-continuous: lim_{y ↓ b} F(y) = F(b)
6. P{a < Y ≤ b} = F(b) - F(a)

These properties can all be proved using the properties of probability measures. The above properties are also sufficient for F(y) to be the cdf of some rv.

Induced Probability Space

All probability questions about a random variable can be answered via its cdf. Every random variable defined on a probability space induces a probability space on R:

    (S, A, P)  -->  Y(ω)  -->  (R, B, F(·))

Points in S are transformed to points on R (the real line). Sets (events in A) are mapped into intervals on the real line, i.e., into members of the Borel sets B. P is replaced by F(·). Because of this, the abstract notion of a sample space recedes, and attention is usually given primarily to random variables and their distributions. We will sometimes refer to the 'sample space' of a random variable, which will be taken to be the values in R that the random variable takes on.

Identically distributed rvs

The cdf does not contain information about the original sample space.

Example: Toss a fair coin n times. The number of heads and the number of tails have the same distribution.
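The cdf and survival function of a die can be spot-checked numerically. The sketch below (illustrative names `F` and `S_surv`, not from the slides) builds the step-wise constant cdf of a fair die and verifies several of the listed properties, including property 6.

```python
from fractions import Fraction

# pmf of a fair die: each of the six faces has probability 1/6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def F(y):
    """cdf F(y) = P(Y <= y): a step function, constant between jumps."""
    return sum(p for k, p in pmf.items() if k <= y)

def S_surv(y):
    """Survival function S(y) = 1 - F(y) = P(Y > y)."""
    return 1 - F(y)

# Spot-check the cdf properties from the slide:
assert F(0) == 0                            # F vanishes below the support
assert F(6) == 1                            # F reaches 1 at the top
assert F(3.5) == F(3) == Fraction(1, 2)     # constant between the jumps
assert F(2) <= F(5)                         # nondecreasing
assert F(5) - F(2) == Fraction(1, 2)        # P(2 < Y <= 5) = F(5) - F(2)
assert S_surv(4) == Fraction(1, 3)          # P(Y > 4) = 2/6
```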
Definition: Two rvs X and Y are identically distributed if for every Borel set A ⊂ R, P(X ∈ A) = P(Y ∈ A).

Theorem (C&B 1.5.10): The following two statements are equivalent:
a. The rvs X and Y are identically distributed.
b. F_X(x) = F_Y(x) for every x.

The distinction between two rvs being equal and having the same distribution will become important later in questions of convergence.

Types of Random Variables

A random variable Y can be:
- discrete: Y takes on a finite or countably infinite number of values; F_Y(y) is step-wise constant.
- continuous: the range of Y consists of subsets of the real line; F_Y(y) is continuous.
- mixed: F_Y(y) is piecewise continuous.

Example: a rv with cdf

    F(x) = 0        for x < 0
         = x/2      for 0 ≤ x < 1
         = 2/3      for 1 ≤ x < 2
         = 11/12    for 2 ≤ x < 3
         = 1        for 3 ≤ x

Discrete random variables

Suppose a random variable Y takes only a finite or countable number of values. Let the sample space of Y be S = {y1, y2, ...}. Then the cdf can be expressed as

    F(y) = Σ_{yi ≤ y} P{Y = yi}

Definition: The probability mass function (pmf), or frequency function, is a function f(y) defined by

    f(y) = P{Y(ω) = y}

If the sample space of Y is S = {y1, y2, ...} with y1 < y2 < ..., then

    f(yi) = P(Y = yi) = P(yi-1 < Y ≤ yi) = F(yi) - F(yi-1)

Example: Suppose Y is a random variable that takes the values 0, 1, or 2 with probability 0.5, 0.3, and 0.2, respectively.

Properties of the pmf

Definition: The domain of a random variable Y is the set of all values y for which f(y) > 0. This is also called the range or sample space.

Properties of the pmf:
1. f(y) > 0 for at most a countable number of values y. For all other values y, f(y) = 0.
2. Let {y1, y2, ...} denote the domain of Y. Then Σ_{i=1}^∞ f(yi) = 1.

An obvious consequence is that f(y) ≤ 1 over the domain.

Example: What is the pmf of a deterministic rv (a constant)?
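The relation f(yi) = F(yi) - F(yi-1) says that the pmf reads off the jumps of the cdf. A minimal Python check of this, using the 0/1/2 example above (the names `pmf` and `F` are illustrative):

```python
# pmf from the example: Y takes 0, 1, 2 with probabilities .5, .3, .2
pmf = {0: 0.5, 1: 0.3, 2: 0.2}

def F(y):
    """Step-wise constant cdf of a discrete rv: sum of f(yi) over yi <= y."""
    return sum(p for yi, p in pmf.items() if yi <= y)

# Property 2: the pmf sums to 1 over the domain.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# The pmf equals the jump of the cdf at each support point:
# f(yi) = F(yi) - F(yi-1).
assert abs((F(1) - F(0)) - pmf[1]) < 1e-12
assert abs((F(2) - F(1)) - pmf[2]) < 1e-12
```

For the deterministic rv asked about above, the same code with `pmf = {c: 1.0}` gives a cdf with a single jump of size one at c.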
Example

In many applications, a formula can be used to represent the pmf of a random variable. Suppose Y can take values 1, 2, ... with pmf

    f(y) = 1 / (y(y+1))   for y = 1, 2, ...
         = 0              otherwise

How would we determine whether this is an allowable pmf?

Continuous Random Variables

A random variable Y is called continuous if its distribution function F_Y(y) = P(Y ≤ y) is a continuous function. A random variable Y is called absolutely continuous if its distribution function F(y) = P(Y ≤ y) is an absolutely continuous function.

Definition: A function F(y) is absolutely continuous if it can be written

    F(y) = ∫_{-∞}^{y} f(x) dx

Absolute continuity is stronger than continuity but weaker than differentiability. An example of an absolutely continuous function is one that is continuous everywhere and differentiable everywhere, except possibly at a countable number of points.

Probability density function

If F(y) is absolutely continuous, f(y) is called the probability density function (pdf) of Y, and

    dF(y)/dy = f(y)

Building on this idea,

    P(a < Y ≤ b) = F(b) - F(a) = ∫_a^b f(x) dx

More generally, for a set B,

    P(Y ∈ B) = ∫_B f(x) dx

Note that of course B has to be an 'allowable' subset of the real line R, that is, a Borel set.

Properties

In general, a function f(x) is a pdf iff
1. f(x) ≥ 0
2. ∫_{-∞}^{∞} f(x) dx = 1

Examples: Suppose F(x) = 1 - e^{-λx} for x > 0 and F(x) = 0 otherwise. Is F(x) a cdf? What is the associated pdf? What about f(x) = 1/x^r for x > 1 and f(x) = 0 otherwise?

Notes

f(x) is not the probability that Y = x. In fact, if Y is an absolutely continuous random variable with density function f(x), then P(Y = x) = 0. Why?
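One way to answer the "allowable pmf" question is the telescoping identity 1/(y(y+1)) = 1/y - 1/(y+1), so the partial sums are 1 - 1/(n+1), which tend to 1. A small sketch verifying this exactly with rational arithmetic (the name `partial_sum` is illustrative):

```python
from fractions import Fraction

# f(y) = 1/(y(y+1)) is nonnegative, and its partial sums telescope:
# sum_{y=1}^{n} 1/(y(y+1)) = sum_{y=1}^{n} (1/y - 1/(y+1)) = 1 - 1/(n+1).
def partial_sum(n):
    return sum(Fraction(1, y * (y + 1)) for y in range(1, n + 1))

assert partial_sum(10) == 1 - Fraction(1, 11)
assert partial_sum(1000) == 1 - Fraction(1, 1001)
# The partial sums increase to 1, so the total mass is 1 and f is a valid pmf.
```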
    P(Y = x) = lim_{h -> 0} ∫_{x-h}^{x+h} f(u) du
             = lim_{h -> 0} [F(x + h) - F(x - h)]
             = F(x+) - F(x-) = 0

Notes (cont.)

More generally, if B is a subset of R with ∫_B dx = 0, and Y is an absolutely continuous random variable defined on R, then P(Y ∈ B) = 0 also.

Because P(Y = a) = 0 and P(Y = b) = 0, all the following are equal: P(a ≤ Y ≤ b), P(a < Y ≤ b), P(a ≤ Y < b), and P(a < Y < b).

Also, note that f(x) can exceed one!

Notes (cont.)

f(x) can be interpreted as the relative probability that Y takes values near x. Why? By the mean value theorem, we can say

    P(x < Y ≤ x + Δ) ≈ f(x)Δ

Thus P(Y ∈ interval of width Δ centered at a) ≈ f(a)Δ and P(Y ∈ interval of width Δ centered at b) ≈ f(b)Δ. Hence, if f(b) > f(a), we can say that it is more likely for Y to take values near b rather than near a.

Stochastic ordering

Suppose Y is a rv and define X = Y + 2. Then X > Y always. Now suppose

    X ~ F_X(t) = (1 - e^{-t}) 1(t > 0)
    Y ~ F_Y(t) = (1 - e^{-2t}) 1(t > 0)

Then X is not always greater than Y, but it is likely to be.

Definition: X is stochastically greater than Y if

    F_X(t) ≤ F_Y(t) for all t, and F_X(t) < F_Y(t) for some t

or equivalently,

    P(X > t) ≥ P(Y > t) for all t, and P(X > t) > P(Y > t) for some t
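The exponential example above can be checked numerically: the cdf inequality holds everywhere, yet a simulation shows X does not always beat Y. A sketch (illustrative names; if X and Y are taken independent with rates 1 and 2, P(X > Y) = 2/(1+2) = 2/3):

```python
import math
import random

def FX(t):
    """Exponential(rate 1) cdf from the slide: (1 - e^{-t}) 1(t > 0)."""
    return (1 - math.exp(-t)) if t > 0 else 0.0

def FY(t):
    """Exponential(rate 2) cdf: (1 - e^{-2t}) 1(t > 0)."""
    return (1 - math.exp(-2 * t)) if t > 0 else 0.0

# X is stochastically greater than Y: FX <= FY everywhere, strictly for t > 0.
grid = [k / 10 for k in range(-20, 101)]
assert all(FX(t) <= FY(t) + 1e-15 for t in grid)
assert any(FX(t) < FY(t) for t in grid)

# Yet X > Y is not certain: simulate independent draws by inversion,
# using -log(1 - U) ~ Exp(1) and -log(1 - U)/2 ~ Exp(2) for U ~ Uniform(0,1).
random.seed(0)
draws = 100_000
x_wins = sum(
    -math.log(1 - random.random()) > -math.log(1 - random.random()) / 2
    for _ in range(draws)
)
print(x_wins / draws)  # close to 2/3, not 1
```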