Information Theory

  • Developed to quantify information and compress messages efficiently

  • How many bits are needed to describe a given event

  • Fewer bits are used for frequent events

  • More bits are used for rare events

  • A coin flip can be described with 1 bit

  • 2 bits are used for 4 choices (00/01/10/11)
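
The bullets above can be checked numerically: for $N$ equally likely choices, $\log_2 N$ bits are enough to label every outcome. A minimal Python sketch (the outcome counts are just illustrative):

```python
import math

# Bits needed to label N equally likely outcomes: log2(N).
for n_outcomes in (2, 4, 8):
    bits = math.log2(n_outcomes)
    print(f"{n_outcomes} equally likely choices -> {bits:.0f} bits")
# 2 -> 1 bit (coin flip), 4 -> 2 bits (00/01/10/11), 8 -> 3 bits
```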

Uncertainty

  • How many yes/no questions are needed on average to identify a state

Self-Information

  • The amount of information carried by an event $x$ that occurs under probability distribution $P$ is $I(x)$
  • The unit is bits or nats, depending on the base of the logarithm used to compute $I(x)$
  • $\log_2 P(x)$: bits, $\log_e P(x)$: nats

  $I(x) = -\log P(x)$

  • If $P(\text{sunny}) = 0.5$, then $I(\text{sunny}) = -\log_2\frac{1}{2} = 1$ bit
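
A small Python sketch of the definition above; the `self_information` helper and the weather probability are illustrative, not taken from any particular library:

```python
import math

def self_information(p, base=2):
    """I(x) = -log P(x); base 2 gives bits, base e gives nats."""
    return -math.log(p, base)

print(self_information(0.5))          # I(sunny) = 1 bit when P(sunny) = 0.5
print(self_information(0.5, math.e))  # ~0.693 nats for the same event
```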

Entropy

  • Degree of uncertainty
  • How many questions are needed on average to determine the outcome
  • The expected amount of information when an event $x$ is sampled from probability distribution $P$
      $H(X) = \mathbb{E}_{x \sim P}[I(x)] = -\displaystyle\sum_{x} P(x) \log P(x)$
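A minimal sketch of the entropy formula in Python, assuming the distribution is given as a plain list of probabilities:

```python
import math

def entropy(probs, base=2):
    """H(X) = -sum_x P(x) log P(x); terms with P(x) = 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1 bit of uncertainty
print(entropy([0.9, 0.1]))   # biased coin: ~0.469 bits (less uncertain)
print(entropy([0.25] * 4))   # 4 equally likely choices: 2 bits
```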

KL divergence

  • KL divergence measures the difference between probability distributions $P$ and $Q$
  • It is the extra entropy incurred when an approximating distribution $Q$ is used in place of the true distribution $P$ when sampling
  • KL divergence is not symmetric: $D_{KL}(P\|Q) \neq D_{KL}(Q\|P)$
      $D_{KL}(P\|Q) = \mathbb{E}_{x \sim P}[\log P(x) - \log Q(x)] = \displaystyle\sum_{x} P(x) \log\frac{P(x)}{Q(x)} = \underbrace{\left(-\displaystyle\sum_{x} P(x) \log Q(x)\right)}_{H(P,Q)} - \underbrace{\left(-\displaystyle\sum_{x} P(x) \log P(x)\right)}_{H(P)}$
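
A sketch of the sum form above, with two illustrative distributions chosen only to show the asymmetry; it assumes $Q(x) > 0$ wherever $P(x) > 0$:

```python
import math

def kl_divergence(p, q, base=2):
    """D_KL(P||Q) = sum_x P(x) log(P(x)/Q(x)); assumes q > 0 wherever p > 0."""
    return sum(px * math.log(px / qx, base) for px, qx in zip(p, q) if px > 0)

P = [0.5, 0.5]   # "true" distribution (illustrative)
Q = [0.9, 0.1]   # approximating distribution (illustrative)
print(kl_divergence(P, Q))  # ~0.737 bits
print(kl_divergence(Q, P))  # ~0.531 bits -> D_KL(P||Q) != D_KL(Q||P)
```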

Cross Entropy

  • Used to measure the difference between probability distributions $P$ and $Q$
    $H(P,Q) = -\displaystyle\sum_{x} P(x) \log Q(x)$
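
From the KL derivation above, cross entropy decomposes as $H(P,Q) = H(P) + D_{KL}(P\|Q)$. A short Python sketch checking this with the same illustrative distributions:

```python
import math

def entropy(p, base=2):
    """H(P) = -sum_x P(x) log P(x)."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

def cross_entropy(p, q, base=2):
    """H(P, Q) = -sum_x P(x) log Q(x)."""
    return -sum(px * math.log(qx, base) for px, qx in zip(p, q) if px > 0)

P = [0.5, 0.5]
Q = [0.9, 0.1]
print(cross_entropy(P, Q))               # ~1.737 bits
print(entropy(P))                        # 1 bit
print(cross_entropy(P, Q) - entropy(P))  # ~0.737 bits = D_KL(P||Q)
```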
