Model Parameter Learning

How to train a model

  • Model learning means finding the parameters $\theta$ that maximize the probability of observing the data set $D$ under the model $p_\theta$; this requires two ingredients:
    • Cost function
    • Parameter learning algorithm

Gradient-based optimization

  • Continuous and differentiable cost function $J:\mathbb{R}^d\to\mathbb{R}$
  • How should the parameter $\theta$ be moved to minimize the value of the cost function $J$?
  • Gradient descent
    • Partially differentiate the cost function with respect to $\theta$ to obtain the gradient, then move $\theta$ in the direction opposite the gradient, where $J$ decreases
  • Cost function for logistic regression

    $J(\boldsymbol\theta) = -\frac{1}{N}\displaystyle\sum_{i=1}^{N}\left[y^{(i)}\log h_{\boldsymbol\theta}(\boldsymbol{x}^{(i)})+(1-y^{(i)})\log(1-h_{\boldsymbol\theta}(\boldsymbol{x}^{(i)}))\right]$

  • Gradient Descent update (= Delta Learning Rule); see the code sketch after this list

    $\theta_j := \theta_j - \alpha\frac{\partial J(\boldsymbol\theta)}{\partial\theta_j}=\theta_j-\alpha\frac{1}{N}\displaystyle\sum_{i=1}^{N}(h_{\boldsymbol\theta}(\boldsymbol{x}^{(i)})-y^{(i)})x_j^{(i)}$
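As a concrete reference, here is a minimal NumPy sketch of this cost and its gradient. The helper names (`sigmoid`, `cost`, `gradient`) and the design-matrix convention (a first column of ones for the bias term $\theta_0$) are illustrative assumptions, not part of the original notes:

```python
import numpy as np

def sigmoid(z):
    """Logistic function; the hypothesis is h_theta(x) = sigmoid(theta . x)."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Average cross-entropy cost J(theta) over N examples.

    X: (N, d+1) design matrix (first column all ones for the bias theta_0).
    y: (N,) labels in {0, 1}.
    """
    h = sigmoid(X @ theta)
    # Clip to avoid log(0) when predictions saturate.
    h = np.clip(h, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """dJ/dtheta_j = (1/N) * sum_i (h(x_i) - y_i) * x_ij, vectorized over j."""
    N = X.shape[0]
    return (X.T @ (sigmoid(X @ theta) - y)) / N
```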

Gradient Descent Algorithm

  • Initialize $\theta$ arbitrarily
  • repeat
    • for $j=0$ to $d$ (the number of parameters), updating all $\theta_j$ simultaneously: $\theta_j := \theta_j-\alpha\frac{1}{N}\displaystyle\sum_{i=1}^{N}(h_{\boldsymbol\theta}(\boldsymbol{x}^{(i)})-y^{(i)})x_j^{(i)}$
  • until $J(\boldsymbol\theta)$ converges (a runnable sketch of this loop follows below)
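A minimal runnable sketch of this loop, reusing the `cost` and `gradient` helpers above; the learning rate `alpha`, tolerance `tol`, and iteration cap `max_iter` are illustrative defaults, not values from the original notes:

```python
def gradient_descent(X, y, alpha=0.1, tol=1e-7, max_iter=10000):
    """Repeat the simultaneous update theta_j -= alpha * dJ/dtheta_j
    until J(theta) converges (change smaller than tol)."""
    theta = np.zeros(X.shape[1])  # one arbitrary initialization choice
    prev = cost(theta, X, y)
    for _ in range(max_iter):
        theta -= alpha * gradient(theta, X, y)  # update all theta_j at once
        curr = cost(theta, X, y)
        if abs(prev - curr) < tol:  # J(theta) has converged
            break
        prev = curr
    return theta
```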

Convex Optimization

  • The cost function should be convex so that gradient descent can find the global minimum
  • With a non-convex cost function, gradient descent can instead get stuck in a local minimum
  • The cost function for logistic regression is convex; for a single example it is

    $J(\boldsymbol\theta) = -\left[y\log h_{\boldsymbol\theta}(\boldsymbol{x})+(1-y)\log(1-h_{\boldsymbol\theta}(\boldsymbol{x}))\right]$
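One way to check this concretely: for a convex $J$, the value at any point on the segment between two parameter vectors lies on or below the chord, i.e. $J(\lambda\boldsymbol{a}+(1-\lambda)\boldsymbol{b}) \le \lambda J(\boldsymbol{a})+(1-\lambda)J(\boldsymbol{b})$. A small numeric sanity check reusing the `cost` helper above (the synthetic data set and the points `a`, `b` are arbitrary illustrations):

```python
# Synthetic data: 100 examples, bias column plus 2 features.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)

# Check the chord inequality along the segment between two random points.
a, b = rng.normal(size=3), rng.normal(size=3)
for lam in np.linspace(0, 1, 11):
    mix = cost(lam * a + (1 - lam) * b, X, y)
    chord = lam * cost(a, X, y) + (1 - lam) * cost(b, X, y)
    assert mix <= chord + 1e-9  # holds at every lambda when J is convex
```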
