Decisions, Decisions
Lecture 12, CS567

• Concepts
• Naïve Bayesian Classification
• Decision Trees
  – General Algorithm
  – Refinements
• Accuracy
• Scalability
  – Strengths and Limitations

Concepts

• Problem
  – Will there be a pop quiz today?
• Data
  – Duration of delay in entering class
  – Instructor's bag (bulging/not bulging)
  – Instructor (has/does not have) wicked/impish smile on face
  – There (was/wasn't) a quiz last class
• Naïve Bayesian
  – Calculate P(Pop Quiz) from the data, with no regard to the order of calculation
• Decision Tree
  – Evaluate the data in a particular branching sequence: if … then … elsif …

Naïve Bayesian

• Goal: to estimate P(M|D), a.k.a. the posterior
• "Naïve" assumption
  – All data have "free will": all attributes have independent probability distributions (the mutual information between every pair of attributes = 0)
• Prior = P(M). Binomial distribution with parameter p = P(Pop Quiz). Thus p = ?
• P(Pop Quiz|D) = P(D|Pop Quiz) P(Pop Quiz) / P(D), where, for the i attributes constituting the data:
  – P(D|Pop Quiz) = Π_i P(D_i|Pop Quiz)
  – P(D) = K (uniform assumption) OR P(D) = Π_i P(D_i)
• Thus, either calculate P(Pop Quiz|D) explicitly, OR make a maximum-likelihood comparison of P(Pop Quiz|D) and P(No Pop Quiz|D), in which the shared denominator P(D) cancels (sketched in code after these notes)

Decision Trees

• Directed graph for reaching a decision
• Decision =
  – Verdict
  – More generally, classification into one of several classes
  – If (..) OR (…) .. then Pop Quiz; if (..) OR (..) .. then no Pop Quiz
• Given i attributes/data about an instance, navigate the graph based on the values of {i}
• Based on minimizing uncertainty
  – Greedy approach: the largest drops in uncertainty occur first

Decision Trees - Training

• Given: a set of labeled data (D_i = {a_i}, C_k)
• Goal: to find the best classification tree
• Maximum uncertainty = log2(|classes|) = 1 bit for a 2-class problem
• Entropy of the given data: E(D) = −Σ_k P(C_k) log2 P(C_k), for n classes
• Conditional/residual entropy: E(D|a_i, C_k) = Σ_l [|D_l| / |D|] · E(D_l), over the l subsets induced by the values of a_i
• Reduction in uncertainty = gain of information
• Gain: G(a_i) = E(D) − E(D|a_i, C_k) (see the entropy/gain sketch after these notes)

Decision Trees - Training

• Find a_i such that G(a_i) > G(a_j) for all j ≠ i
• Root = a_i, with children = the subsets of data falling into each range of a_i
• Iterate through the remaining attributes until all a_i have been considered
• Label each subset with the majority class label
• Optional, highly recommended, steps:
  – Prepruning and postpruning: avoid over-fitting

Decision Trees - Training

• Caveat with the previous approach:
  – Subsetting into single or few data point(s) is highly favored
  – In other words, variables with higher resolution (a wider range of possible values) are favored
• Gain ratio
  – An alternative to information gain as the criterion for the choice of attributes
  – Compensates for the bias towards high scores for a_i with high resolution (a higher number of states for that attribute)
  – Gain ratio = G(a_i) / E(a_i)
• Gini
  – Recommended for attribute selection on large training sets, for scalability
  – Gini index = 1 − Σ_k p_k²

Decision Trees - Limitation

• Assumes a rectangular (hypercuboidal) partitioning of the data space
• Not the best solution where the separating line/hyperplane is not orthogonal to the data dimensions
• The greedy strategy can easily lead to overfitting
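
To make the Naïve Bayesian slide concrete, here is a minimal Python sketch of the maximum-likelihood comparison. The attribute names, training counts, and helper names below are invented for illustration; the lecture specifies only the method.

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (attribute_dict, class_label) pairs.
    Returns the priors P(C_k) and a likelihood function P(D_i = v | C_k)."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)   # (attr, label) -> Counter of values
    for attrs, label in examples:
        for attr, value in attrs.items():
            value_counts[(attr, label)][value] += 1
    n = len(examples)
    priors = {c: class_counts[c] / n for c in class_counts}

    def likelihood(attr, value, label):
        counts = value_counts[(attr, label)]
        # Simple add-one smoothing so an unseen value does not zero the product
        return (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)

    return priors, likelihood

def classify(attrs, priors, likelihood):
    """Score each class by P(C_k) * prod_i P(D_i | C_k); the shared
    denominator P(D) cancels, exactly as on the slide."""
    scores = {label: prior for label, prior in priors.items()}
    for label in scores:
        for attr, value in attrs.items():
            scores[label] *= likelihood(attr, value, label)
    return max(scores, key=scores.get)

# Hypothetical training data for the pop-quiz example
data = [
    ({"bag": "bulging", "smile": "wicked", "last": "no"},  "quiz"),
    ({"bag": "bulging", "smile": "none",   "last": "no"},  "quiz"),
    ({"bag": "flat",    "smile": "none",   "last": "yes"}, "no quiz"),
    ({"bag": "flat",    "smile": "wicked", "last": "yes"}, "no quiz"),
]
priors, likelihood = train_nb(data)
print(classify({"bag": "bulging", "smile": "wicked", "last": "no"}, priors, likelihood))
```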
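
The entropy, conditional entropy, and gain formulas on the first training slide translate directly into code. This is a sketch; the function names are mine, the formulas are the slide's.

```python
import math
from collections import Counter

def entropy(labels):
    """E(D) = -sum_k P(C_k) * log2 P(C_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, labels, attr):
    """E(D|a_i, C_k) = sum_l [|D_l| / |D|] * E(D_l), where each subset D_l
    holds the rows sharing one value of attribute attr."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    n = len(labels)
    return sum(len(s) / n * entropy(s) for s in subsets.values())

def gain(rows, labels, attr):
    """G(a_i) = E(D) - E(D|a_i, C_k)."""
    return entropy(labels) - conditional_entropy(rows, labels, attr)
```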
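
The greedy procedure on the second training slide (take the attribute with the largest gain as the root, split, recurse on the remaining attributes, and label leaves with the majority class) can then be sketched as below. It reuses gain() from the previous sketch and deliberately omits pruning, so it will overfit exactly as the limitation slide warns.

```python
from collections import Counter

def majority(labels):
    """Majority class label for a subset."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:      # pure subset: nothing left to split
        return labels[0]
    if not attrs:                  # attributes exhausted: take majority label
        return majority(labels)
    # Greedy step: the largest drop in uncertainty occurs first
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    node = {"attr": best, "children": {}}
    for value in {row[best] for row in rows}:
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*sub)
        node["children"][value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return node

# e.g. tree = build_tree(rows, labels, list(rows[0]))
```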
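
The two refinements fit on the same helpers. Here E(a_i) is read as the entropy of the attribute's own value distribution (its "split information"), which is one standard reading of the slide's gain-ratio formula; the Gini index avoids logarithms entirely, which is one reason it scales better on large training sets.

```python
from collections import Counter

def gain_ratio(rows, labels, attr):
    """Gain ratio = G(a_i) / E(a_i); penalizes attributes with many states.
    Reuses entropy() and gain() from the sketches above."""
    split_info = entropy([row[attr] for row in rows])
    return gain(rows, labels, attr) / split_info if split_info else 0.0

def gini(labels):
    """Gini index = 1 - sum_k p_k**2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
```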