Tricks of the Trade
Deep Learning and Neural Nets
Spring 2015

Agenda
1. Homa Hosseinmardi on cyberbullying
2. Model fitting and overfitting
3. Generalizing architectures, activation functions, and error functions
4. The latest tricks that seem to make a difference

Learning And Generalization
• What’s my rule?
    1 2 3 ⇒ satisfies rule
    4 5 6 ⇒ satisfies rule
    6 7 8 ⇒ satisfies rule
    9 2 31 ⇒ does not satisfy rule
• Plausible rules
    3 consecutive single digits
    3 consecutive integers
    3 numbers in ascending order
    3 numbers whose sum is less than 25
    3 numbers < 10
    1, 4, or 6 in first column
    “yes” to first 3 sequences, “no” to all others

“What’s My Rule” For Machine Learning
•   x1  x2  x3  y
    0   0   0   1
    0   1   1   0
    1   0   0   0
    1   1   1   1
    0   0   1   ?
    0   1   0   ?
    1   0   1   ?
    1   1   0   ?
• 16 possible rules (models)
• With N binary inputs and P training examples, there are 2^(2^N - P) possible models. Here N = 3 and P = 4, so 2^(8 - 4) = 16.

Model Space
[Figure: nested regions inside the space of all possible models: restricted model class, models consistent with data, correct model]
• Challenge for learning
    Start with model class appropriately restricted for problem domain

Model Complexity
• Models range in their flexibility to fit arbitrary data
• Simple model
    high bias
    constrained
    low variance
    small capacity may prevent it from representing all structure in data
• Complex model
    low bias
    unconstrained
    high variance
    large capacity may allow it to memorize data and fail to capture regularities

Training Vs. Test Set Error
[Figure: error curves for the training set and the test set]

Error on Test Set

Bias-Variance Trade Off
[Figure: bias-variance trade-off curve running from underfit to overfit; image credit: scott.fortmann-roe.com]

Overfitting
• Occurs when training procedure fits not only regularities in training data but also noise.
• Like memorizing the training examples instead of learning the statistical regularities that make a “2” a “2”
• Leads to poor performance on test set
• Most of the practical issues with neural nets involve avoiding overfitting

Avoiding Overfitting
• Increase training set size
    Make sure effective size is growing; redundancy doesn’t help
• Incorporate domain-appropriate bias into model
    Customize model to your problem
• Set hyperparameters of model
    number of layers, number of hidden units per layer, connectivity, etc.
• Regularization techniques
    “smoothing” to reduce model complexity

Incorporating Domain-Appropriate Bias Into Model
• Input representation
• Output representation
    e.g., discrete probability distribution
• Architecture
    # layers, connectivity
    e.g., family trees net; convolutional nets
• Activation function
• Error function

Customizing Networks
• Hinton softmax video lecture gives one example of how neural nets can be customized based on understanding of problem domain
    choice of error function
    choice of activation function
• Domain knowledge can be used to impose domain-appropriate bias on model
    bias is good if it reflects properties of the data set
    bias is harmful if it conflicts with properties of data
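
A minimal sketch of the softmax customization mentioned above, assuming the output should be a discrete probability distribution over mutually exclusive classes. It pairs a softmax activation function with a cross-entropy error function. The function names (softmax, cross_entropy, output_gradient) and the example scores are hypothetical and chosen for illustration only; they are not taken from the lecture.

```python
import math


def softmax(logits):
    """Turn real-valued scores into a discrete probability distribution."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Error function: negative log probability of the correct class."""
    return -math.log(probs[target_index])


def output_gradient(probs, target_index):
    """Gradient of the cross-entropy error with respect to the scores.

    With a softmax output and a one-hot target t, this reduces to p - t.
    """
    return [p - (1.0 if i == target_index else 0.0)
            for i, p in enumerate(probs)]


# Hypothetical scores for a three-class problem; class 0 is the target.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(probs, 0)
grad = output_gradient(probs, 0)

print("probabilities:", probs)
print("error:", loss)
print("gradient w.r.t. scores:", grad)
```

The pairing is the design choice worth noting: the softmax builds the "outputs sum to 1" property of the domain into the model, and with cross-entropy as the error function the gradient at the output simplifies to predicted probability minus target.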