Regularization
The Problem of Overfitting
- Overfitting occurs when a model fits its training data so closely that it performs well on the training data but poorly on unseen test data
Handling Overfitting
- Simplify the model
  - Reduce the degree of high-order polynomial features
  - Reduce the number of features
- Regularization
  - Constrain the magnitude of the learned parameters $\theta$
Intuition of Regularization
- Constraining the $\theta$ values of high-order terms makes the model behave like a lower-degree one, which effectively simplifies the model (see the toy check below)
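As a quick numeric illustration of this intuition (a toy sketch, with coefficients chosen purely for illustration), shrinking the high-order coefficients of a cubic toward zero makes it nearly indistinguishable from a line:

```python
import numpy as np

x = np.linspace(-1, 1, 100)

# Cubic whose high-order coefficients have been shrunk toward zero
cubic = 1.0 + 2.0 * x + 0.001 * x**2 + 0.001 * x**3
# The lower-degree model it effectively collapses to
line = 1.0 + 2.0 * x

# Maximum gap between the two models over [-1, 1]
print(np.max(np.abs(cubic - line)))  # ~0.002
```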
Cost function with Regularization
- Apply regularization by adding a regularization term at the end of the cost function
$J(\boldsymbol\theta) = -\frac{1}{N}\displaystyle\sum_{i=1}^{N}\left[y^{(i)}\log h_\boldsymbol\theta(\boldsymbol{x}^{(i)}) + (1-y^{(i)})\log(1-h_\boldsymbol\theta(\boldsymbol{x}^{(i)}))\right] + \frac{\lambda}{2N}\displaystyle\sum_{k=1}^{K}\theta_k^2$
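A minimal NumPy sketch of this cost (the function name `regularized_cost` and the convention that `theta[0]` is an unpenalized bias are assumptions for illustration, not from the source):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Regularized logistic-regression cost J(theta).

    X is assumed to carry a leading column of ones, so theta[0] is the
    bias term; as in the sum starting at k = 1 above, it is excluded
    from the penalty.
    """
    N = len(y)
    h = sigmoid(X @ theta)
    # Cross-entropy part of the cost
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / N
    # Penalty term: (lambda / 2N) * sum of theta_k^2 for k >= 1
    penalty = (lam / (2 * N)) * np.sum(theta[1:] ** 2)
    return cross_entropy + penalty
```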
How does $\lambda$ work?
- $\lambda$ is a hyper-parameter, so its value must be chosen carefully
- If $\lambda$ is too large, the $\theta$ values shrink too much and the model becomes overly simple: underfitting
- If $\lambda$ is too small, regularization has little effect and overfitting remains (see the sketch below)
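To see both failure modes concretely, here is a hedged sketch using scikit-learn (not part of the original notes); note that scikit-learn exposes the inverse regularization strength $C = 1/\lambda$, so a small `C` corresponds to a large $\lambda$:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C = 1/lambda: small C = strong regularization (risk of underfitting),
# large C = weak regularization (risk of overfitting)
for C in (0.001, 1.0, 1000.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    print(f"C={C:g}: train acc={clf.score(X_train, y_train):.2f}, "
          f"test acc={clf.score(X_test, y_test):.2f}")
```

In practice, $\lambda$ (or `C`) is typically selected by evaluating several candidate values on a held-out validation set.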