Document

Document technical information

Format: ppt
Size: 60.4 kB
First found: May 22, 2018

Document content analysis

Language: English

Transcript

```Decisions, Decisions
• Concepts
• Naïve Bayesian Classification
• Decision Trees
– General Algorithm
– Refinements
• Accuracy
• Scalability
– Strengths and Limitations
Lecture 12, CS567
1
Concepts
• Problem
– Will there be a pop quiz today?
• Data
– (Duration of delay) in entering class
– Instructor’s bag (bulging/not bulging)
– Instructor (has/does not have) wicked/impish smile on face
– There (was/wasn’t) a quiz last class
• Naïve Bayesian
– Calculate P(Pop Quiz) from data with no regard to order of calculation
• Decision Tree
– Evaluate data in a particular branching sequence:
If …. then
elsif …..
Naïve Bayesian
• Goal: To estimate P(M|D), aka the Posterior
• “Naïve” assumption
– All data have “free will”: all attributes have independent probability distributions (mutual information between every pair of attributes = 0)
• Prior = P(M). Binomial distribution with parameter p = P(Pop Quiz). Thus p = ?
• P(Pop Quiz|D) = P(D|Pop Quiz) P(Pop Quiz) / P(D) where, for i attributes constituting the data,
– P(D|Pop Quiz) = Π_i P(D_i|Pop Quiz)
– P(D) = K (uniform assumption) OR P(D) = Π_i P(D_i)
• Thus, either calculate the explicit P(Pop Quiz|D), OR do a maximum-likelihood comparison of P(Pop Quiz|D) and P(No Pop Quiz|D)
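# The two-class comparison above can be sketched in Python. A minimal,
# hypothetical example: the priors and per-attribute likelihoods P(Di|class)
# below are made-up numbers for illustration, not values from the lecture.

def naive_bayes_score(prior, likelihoods):
    # Unnormalized posterior: P(class|D) ∝ P(class) * Π_i P(Di|class),
    # using the independence ("free will") assumption.
    score = prior
    for p in likelihoods:
        score *= p
    return score

# Hypothetical likelihoods for: delay, bulging bag, impish smile, quiz last class
quiz = naive_bayes_score(0.3, [0.8, 0.7, 0.9, 0.6])
no_quiz = naive_bayes_score(0.7, [0.2, 0.4, 0.3, 0.5])
# Maximum-likelihood comparison: P(D) cancels, so only numerators are compared.
prediction = "Pop Quiz" if quiz > no_quiz else "No Pop Quiz"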
Decision Trees
• Directed Graph for reaching a decision
• Decision =
– Verdict
– More generally, classification into one of several classes
– If (..) OR (…) .. then Pop Quiz; If (..) OR (..) .. then no Pop Quiz
• Given i attributes/data about an instance, navigate graph
based on the values of {i}
• Based on minimizing uncertainty
– Greedy approach: Largest drops in uncertainty occur first
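# The graph navigation above is just nested conditionals over attribute values.
# A toy sketch; the attribute names, threshold, and branch order are invented:

def predict_pop_quiz(delay_min, bag_bulging, impish_smile, quiz_last_class):
    # Each level tests one attribute; greedy training would put the
    # attribute with the largest drop in uncertainty at the root.
    if bag_bulging:
        if impish_smile:
            return True
        return delay_min > 5
    elif quiz_last_class:
        return False
    else:
        return impish_smile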
Decision Trees - Training
• Given: Set of labeled data (Di = {ai}, Ck)
• Goal: To find the best classification tree
• Maximum uncertainty = log2(|classes|) = 1 bit for a 2-class problem
• Entropy of the given data: E(D) = – Σ_k P(Ck) log P(Ck), for n classes
• Conditional/Residual entropy:
E(D|ai, Ck) = Σ_l [ (|Dl| / |D|) · E(Dl) ], for l subsets
• Reduction in uncertainty = Gain of Information
• Gain G(ai) = E(D) – E(D|ai, Ck)
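# The entropy and gain formulas above translate directly into code.
# A sketch; the helper names (entropy, info_gain) are mine, not the lecture's:

from collections import Counter
from math import log2

def entropy(labels):
    # E(D) = -Σ_k P(Ck) log2 P(Ck)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    # G(ai) = E(D) - Σ_l (|Dl|/|D|) E(Dl), with Dl the subsets induced by ai
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    residual = sum((len(sub) / n) * entropy(sub) for sub in subsets.values())
    return entropy(labels) - residual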
Decision Trees - Training
• Find ai | [G(ai) > G(aj) for all j ≠ i]
• Root = ai, with children = subsets of data falling into each range of ai
• Iterate through the remaining list of attributes till all ai have been considered
• Label each subset with the majority class label
• Optional, highly recommended, steps:
– Prepruning and postpruning: avoid over-fitting
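# The greedy loop above amounts to a recursive ID3-style build: pick the
# highest-gain attribute, split, recurse, and label leaves with the majority
# class. A self-contained sketch (no pruning); all names are illustrative:

from collections import Counter
from math import log2

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    # Leaf: pure subset, or no attributes left -> majority class label
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    def gain(a):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[a], []).append(label)
        residual = sum(len(g) / len(labels) * _entropy(g) for g in groups.values())
        return _entropy(labels) - residual
    best = max(attrs, key=gain)  # greedy: largest drop in uncertainty first
    node = {"attr": best, "children": {}}
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[best], []).append((row, label))
    rest = [a for a in attrs if a != best]
    for value, pairs in branches.items():
        node["children"][value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], rest)
    return node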
Decision Trees - Training
• Caveat with the previous approach:
– Subsetting with single or few data point(s) is highly favored
– In other words, variables with higher resolution (a wider range of possible values) are favored
• Gain ratio
– Alternative to information gain as the criterion for choice of attributes
– Compensates for the bias towards high scores for ai with high resolution (a higher number of states for that attribute)
– Gain ratio = G(ai) / E(ai)
• Gini
– Recommended for attribute selection on large training sets, for scalability
– Gini Index = 1 – Σ_k Pk²
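# The two alternative criteria above can be sketched as follows;
# the function names are illustrative:

from collections import Counter
from math import log2

def split_entropy(values):
    # E(ai): entropy of the attribute's own value distribution.
    # Gain ratio = G(ai) / E(ai), penalizing many-valued attributes.
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gini_index(labels):
    # Gini Index = 1 - Σ_k P(Ck)^2; avoids the log, helping scalability.
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())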
Decision Trees - Limitation
• Assumes a rectangular (hypercuboidal) partitioning of the data space
• Not the best solution where the separating hyperline/plane is not orthogonal to the data dimensions
• Greedy strategy can easily lead to overfitting
```