Information Theory

  • Developed to quantify information and compress messages efficiently

  • How many bits are needed to describe a given event

  • Fewer bits are used for frequent events

  • More bits are used for rare events

  • A coin flip can be described with 1 bit

  • 2 bits are used for 4 choices (00/01/10/11)
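
The bullets above can be checked numerically: for $N$ equally likely choices, $\log_2 N$ bits are enough to label every outcome. A minimal Python sketch (the outcome counts are just illustrative):

```python
import math

# Bits needed to label N equally likely outcomes: log2(N).
for n_outcomes in (2, 4, 8):
    bits = math.log2(n_outcomes)
    print(f"{n_outcomes} equally likely choices -> {bits:.0f} bits")
# 2 -> 1 bit (coin flip), 4 -> 2 bits (00/01/10/11), 8 -> 3 bits
```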

Uncertainty

  • How many yes/no questions are needed on average to identify a state

Self-Information

  • The amount of information carried by an event $x$ that occurs under probability distribution $P$ is $I(x)$
  • The unit is bits or nats, depending on the base of the logarithm used to compute $I(x)$
  • $\log_2 P(x)$: bits, $\log_e P(x)$: nats

  $I(x) = -\log P(x)$

  • If $P(\text{sunny}) = 0.5$, then $I(\text{sunny}) = -\log_2\frac{1}{2} = 1$ bit
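
A small Python sketch of the definition above; the `self_information` helper and the weather probability are illustrative, not taken from any particular library:

```python
import math

def self_information(p, base=2):
    """I(x) = -log P(x); base 2 gives bits, base e gives nats."""
    return -math.log(p, base)

print(self_information(0.5))          # I(sunny) = 1 bit when P(sunny) = 0.5
print(self_information(0.5, math.e))  # ~0.693 nats for the same event
```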

Entropy

  • Degree of uncertainty
  • How many questions are needed on average to determine the outcome
  • The expected amount of information when an event $x$ is sampled from probability distribution $P$
      $H(X) = \mathbb{E}_{x \sim P}[I(x)] = -\displaystyle\sum_{x} P(x) \log P(x)$
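A minimal sketch of the entropy formula in Python, assuming the distribution is given as a plain list of probabilities:

```python
import math

def entropy(probs, base=2):
    """H(X) = -sum_x P(x) log P(x); terms with P(x) = 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1 bit of uncertainty
print(entropy([0.9, 0.1]))   # biased coin: ~0.469 bits (less uncertain)
print(entropy([0.25] * 4))   # 4 equally likely choices: 2 bits
```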

KL divergence

  • KL divergence measures the difference between probability distributions $P$ and $Q$
  • It is the extra entropy incurred when an approximating distribution $Q$ is used in place of the true distribution $P$ when sampling
  • KL divergence is not symmetric: $D_{KL}(P\|Q) \neq D_{KL}(Q\|P)$
      $D_{KL}(P\|Q) = \mathbb{E}_{x \sim P}[\log P(x) - \log Q(x)] = \displaystyle\sum_{x} P(x) \log\frac{P(x)}{Q(x)} = \underbrace{\left(-\displaystyle\sum_{x} P(x) \log Q(x)\right)}_{H(P,Q)} - \underbrace{\left(-\displaystyle\sum_{x} P(x) \log P(x)\right)}_{H(P)}$
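
A sketch of the sum form above, with two illustrative distributions chosen only to show the asymmetry; it assumes $Q(x) > 0$ wherever $P(x) > 0$:

```python
import math

def kl_divergence(p, q, base=2):
    """D_KL(P||Q) = sum_x P(x) log(P(x)/Q(x)); assumes q > 0 wherever p > 0."""
    return sum(px * math.log(px / qx, base) for px, qx in zip(p, q) if px > 0)

P = [0.5, 0.5]   # "true" distribution (illustrative)
Q = [0.9, 0.1]   # approximating distribution (illustrative)
print(kl_divergence(P, Q))  # ~0.737 bits
print(kl_divergence(Q, P))  # ~0.531 bits -> D_KL(P||Q) != D_KL(Q||P)
```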

Cross Entropy

  • Used to measure the difference between probability distributions $P$ and $Q$
    $H(P,Q) = -\displaystyle\sum_{x} P(x) \log Q(x)$
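
From the KL derivation above, cross entropy decomposes as $H(P,Q) = H(P) + D_{KL}(P\|Q)$. A short Python sketch checking this with the same illustrative distributions:

```python
import math

def entropy(p, base=2):
    """H(P) = -sum_x P(x) log P(x)."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

def cross_entropy(p, q, base=2):
    """H(P, Q) = -sum_x P(x) log Q(x)."""
    return -sum(px * math.log(qx, base) for px, qx in zip(p, q) if px > 0)

P = [0.5, 0.5]
Q = [0.9, 0.1]
print(cross_entropy(P, Q))               # ~1.737 bits
print(entropy(P))                        # 1 bit
print(cross_entropy(P, Q) - entropy(P))  # ~0.737 bits = D_KL(P||Q)
```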
