By David Lin (linkewei :at: stanford.edu)
In this article, we explore some applications of information-theoretic methods to various combinatorial problems. The various approaches are best thought of as an extension of the probabilistic method, initiated by Paul Erdős in the 1940s, where a probability distribution imposed on combinatorial objects (e.g. a randomly selected element of a set) can give rise to deterministic conclusions that are otherwise difficult to obtain.
The probabilistic method usually depends on computing the expected value (where we obtain a “typical” value), or showing that the probability of an event is positive (i.e. the probability of constructing a desired object). Information theory forms a natural extension to this method by offering additional quantities to consider, since entropy and mutual information possess intuitive significance once a random distribution is imposed on the objects.
## Information Theory Preliminaries

When we construct a (discrete) random variable $X$ taking values in a set $\mathcal{X}$, the entropy is defined as
$$H(X) = \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{p(x)},$$
where all logs are base 2 and by convention $0 \log \frac{1}{0} = 0$. Following the analogy, if we have a second random variable $Y$ taking values in $\mathcal{Y}$, we have the concepts of joint entropy $H(X, Y)$ and conditional entropy $H(Y \mid X)$:
$$H(X, Y) = \sum_{x, y} p(x, y) \log \frac{1}{p(x, y)}, \qquad H(Y \mid X) = \sum_{x, y} p(x, y) \log \frac{1}{p(y \mid x)}.$$
There is a natural information-based interpretation for the above. $H(X, Y)$ is a measure of the combined information of $X$ and $Y$ (or equivalently, the measure of the information of $(X, Y)$ as a joint variable), while $H(Y \mid X)$ is the measure of the information expected to remain in $Y$ once $X$ is revealed.
These two quantities are also related by the chain rule:
Fact. (Chain Rule) $H(X, Y) = H(X) + H(Y \mid X)$.
(Interpretation: the information from revealing $X$ and $Y$ together is equivalent to the information from revealing $X$, then revealing $Y$ given $X$.)
A key inequality is as follows:
Fact. (Data Processing Inequality) $H(Y \mid X, Z) \le H(Y \mid Z)$. (Interpretation: revealing additional information cannot increase the remaining entropy.)
Remark. As a quick corollary, we have the much simpler $H(Y \mid X) \le H(Y)$, which when combined with the chain rule gives us the subadditivity of entropy: $H(X, Y) \le H(X) + H(Y)$.
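These identities are easy to check numerically. Below is a small sanity check of the chain rule, the conditioning bound, and subadditivity on an arbitrary joint distribution; the code and helper names (`H`, `marginal`, `H_cond`) are my own illustration, not part of the article.

```python
from collections import Counter
from math import log2

# A joint distribution p(x, y), chosen arbitrarily (sums to 1).
joint = {(0, 0): 0.2, (0, 1): 0.1, (0, 2): 0.1,
         (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.3}

def H(dist):
    """Entropy (base 2) of a distribution given as {outcome: probability}."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal distribution along one coordinate of a joint distribution."""
    m = Counter()
    for outcome, p in joint.items():
        m[outcome[axis]] += p
    return m

def H_cond(joint):
    """Conditional entropy H(Y | X) = sum_{x,y} p(x,y) log2(p(x) / p(x,y))."""
    px = marginal(joint, 0)
    return sum(p * log2(px[x] / p) for (x, y), p in joint.items() if p > 0)

H_XY, H_X, H_Y = H(joint), H(marginal(joint, 0)), H(marginal(joint, 1))

assert abs(H_XY - (H_X + H_cond(joint))) < 1e-12  # chain rule
assert H_cond(joint) <= H_Y + 1e-12               # conditioning reduces entropy
assert H_XY <= H_X + H_Y + 1e-12                  # subadditivity
```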
## Entropic Lower Bounds

A basic property is that the entropy gives a lower bound on (the logarithm of) the number of distinct values $X$ can take:
$$H(X) \le \log |\mathcal{X}|, \qquad \text{equivalently} \qquad |\mathcal{X}| \ge 2^{H(X)}.$$
This immediately gives us a way to connect entropy and counting: we can bound any count we want by setting up some distribution over the desired object, and then estimating the entropy.
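As a tiny illustration of the counting bound (my own code; the example distribution is arbitrary):

```python
# Any distribution over a set has entropy at most log2 of the set's size,
# so the support size is at least 2^H, with equality for the uniform case.
from math import log2

def H(dist):
    """Entropy (base 2) of a distribution given as {outcome: probability}."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

dist = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}  # H = 1.75 bits
assert 2 ** H(dist) <= len(dist) + 1e-9               # |supp| >= 2^H

uniform = {x: 1 / len(dist) for x in dist}
assert abs(H(uniform) - log2(len(dist))) < 1e-12      # equality when uniform
```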
Problem. Let $G$ be a bipartite graph with vertex partitions $A$ and $B$ of size $m$ and $n$ respectively. If $P_3$ is the set of length-3 paths $x_1 x_2 x_3 x_4$ with $x_1 \in A$ (edges may be repeated), then
$$|P_3| \ge \frac{|E|^3}{mn}.$$
Solution. We will select a random length-3 path as follows: the edge $(X_1, X_2)$ (with $X_1 \in A$, $X_2 \in B$) is uniformly selected among all edges, then $X_3$ is uniformly selected among all edges at $X_2$, and $X_4$ is uniformly selected among all edges at $X_3$, independently of $(X_1, X_2)$.
First we compute $H(X_3 \mid X_2)$: since $\Pr[X_2 = v] = d(v)/|E|$ and $X_3$ is uniform among the $d(v)$ neighbors of $v$, the pair $(X_2, X_3)$ is also uniform over the edges, so
$$H(X_3 \mid X_2) = H(X_2, X_3) - H(X_2) = \log |E| - H(X_2).$$
Similarly $H(X_4 \mid X_3) = \log |E| - H(X_3)$. Then:
$$\log |P_3| \ge H(X_1, X_2, X_3, X_4) = H(X_1, X_2) + H(X_3 \mid X_2) + H(X_4 \mid X_3) = 3 \log |E| - H(X_2) - H(X_3) \ge 3 \log |E| - \log n - \log m,$$
using $H(X_2) \le \log n$ and $H(X_3) \le \log m$; exponentiating gives $|P_3| \ge |E|^3/(mn)$.
- This is clearly tight for complete bipartite graphs.
- The way to invent this solution is firstly to notice the naive argument that gives the RHS: picking $e_1, e_2, e_3$ uniformly at random among $E$, $e_1$ and $e_2$ share the $B$-vertex with probability roughly $1/n$, while $e_2$ and $e_3$ share the $A$-vertex with probability roughly $1/m$. Hence the probability of forming a valid path is $\frac{1}{mn}$ if we make the naive assumption that the two events are independent, suggesting a count of about $|E|^3/(mn)$.
This leads us to the correct solution: when we impose such a distribution on the set of 3-paths, the entropic counting bound gives us the right answer.
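The bound can also be confirmed by brute force on small graphs. The sketch below (my own code; the helper `count_walks` is invented for the check) counts length-3 walks in random bipartite graphs and in $K_{2,2}$, where the bound is tight:

```python
# Empirical check of |P3| >= |E|^3 / (mn) for bipartite graphs with parts
# of size m and n; walks x1-x2-x3-x4 may repeat edges.
import random

def count_walks(edges, m, n):
    """Number of walks x1-x2-x3-x4 with x1, x3 in [m] and x2, x4 in [n]."""
    adj_a = {a: [b for (x, b) in edges if x == a] for a in range(m)}
    adj_b = {b: [a for (a, y) in edges if y == b] for b in range(n)}
    return sum(1 for (x1, x2) in edges
                 for x3 in adj_b[x2]
                 for x4 in adj_a[x3])

# Tight for complete bipartite graphs: here |E|^3/(mn) = 4^3/4 = 16.
k22 = [(a, b) for a in range(2) for b in range(2)]
assert count_walks(k22, 2, 2) == 16

random.seed(0)
for _ in range(50):
    m, n = random.randint(2, 5), random.randint(2, 5)
    edges = [(a, b) for a in range(m) for b in range(n)
             if random.random() < 0.6]
    assert count_walks(edges, m, n) >= len(edges) ** 3 / (m * n) - 1e-9
```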
Problem. Given finite sets $A_1, \dots, A_n$ and functions $f_1, \dots, f_{n-1}$, where $f_i$ assigns to each $a \in A_i$ a set $f_i(a) \subseteq A_{i+1}$ of at least $k_i \ge 1$ allowed successors, a vector $(a_1, \dots, a_n) \in A_1 \times \cdots \times A_n$ is called nice if $a_{i+1} \in f_i(a_i)$ for each $1 \le i \le n-1$. Show that the number of nice vectors is at least
$$|A_1| \cdot k_1 k_2 \cdots k_{n-1}.$$
Solution. Select $X_1 \in A_1$ uniformly at random, and select $X_2$ among the possible choices $f_1(X_1)$ (given $X_1$) uniformly at random, and so on. The probability of $X = (X_1, \dots, X_n)$ being any particular nice vector $x$ is fixed at
$$p(x) = \frac{1}{|A_1|} \prod_{i=1}^{n-1} \frac{1}{|f_i(x_i)|} \le \frac{1}{|A_1| \, k_1 \cdots k_{n-1}}.$$
So:
$$\#\{\text{nice vectors}\} = \mathbb{E}\left[\frac{1}{p(X)}\right] \ge 2^{\mathbb{E}[\log(1/p(X))]} = 2^{H(X)} \ge |A_1| \, k_1 \cdots k_{n-1},$$
which is what we wanted.
Remark. We made use of the QM-AM-GM-HM inequality for random variables: for a positive random variable $Z$,
$$\sqrt{\mathbb{E}[Z^2]} \ge \mathbb{E}[Z] \ge 2^{\mathbb{E}[\log Z]} \ge \frac{1}{\mathbb{E}[1/Z]},$$
here applied in its AM-GM form with $Z = 1/p(X)$.
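The chain of means can be verified numerically; the sketch below (my own illustrative code) compares the quadratic, arithmetic, geometric, and harmonic means of random positive variables:

```python
# Check sqrt(E[Z^2]) >= E[Z] >= 2^{E[log2 Z]} >= 1 / E[1/Z] on random
# finitely supported positive random variables Z.
import random
from math import log2, sqrt

random.seed(1)
for _ in range(100):
    values = [random.uniform(0.1, 10) for _ in range(5)]
    weights = [random.random() + 0.01 for _ in range(5)]
    total = sum(weights)
    probs = [w / total for w in weights]       # a random distribution
    qm = sqrt(sum(p * z * z for p, z in zip(probs, values)))
    am = sum(p * z for p, z in zip(probs, values))
    gm = 2 ** sum(p * log2(z) for p, z in zip(probs, values))
    hm = 1 / sum(p / z for p, z in zip(probs, values))
    assert qm >= am - 1e-9
    assert am >= gm - 1e-9
    assert gm >= hm - 1e-9
```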
## Entropic Upper Bounds

An alternative argument provides us a way to obtain an upper bound instead of a lower bound. We start with the following observation: if $X$ is the uniform distribution on $\mathcal{X}$, then the upper bound $H(X) \le \log |\mathcal{X}|$ is in fact attained. This suggests that for an upper bound, we should start with the uniform distribution and then bound the entropy from above by various methods.
A naive way to do so (especially when joint variables are present) is to simply make use of subadditivity. If we are more careful, we could also reveal each variable sequentially and track the incremental information revealed by each variable, expressed by the chain rule:
$$H(X_1, \dots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, \dots, X_{i-1}).$$
Problem. (Binomial tail bounding) Show the bound (for $k \le n/2$)
$$\sum_{i=0}^{k} \binom{n}{i} \le 2^{n h(k/n)},$$
where $h(x) = x \log \frac{1}{x} + (1-x) \log \frac{1}{1-x}$ is the binary entropy function.
Solution. Consider a random $n$-bit string $X = (X_1, \dots, X_n)$ uniformly selected from the set $S$ of strings with at most $k$ 1's. We have
$$\log \sum_{i=0}^{k} \binom{n}{i} = \log |S| = H(X) \le \sum_{i=1}^{n} H(X_i) = \sum_{i=1}^{n} h(\Pr[X_i = 1]) \le n \, h(k/n),$$
since by symmetry $\Pr[X_i = 1] = \mathbb{E}[\#\text{1's}]/n \le k/n \le 1/2$ and $h$ is monotonically increasing on $[0, 1/2]$.
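This bound is easy to test exhaustively for small $n$; a quick check of my own, with `h` mirroring the binary entropy function above:

```python
# Verify sum_{i<=k} C(n, i) <= 2^{n h(k/n)} for all k <= n/2 and small n.
from math import comb, log2

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0 by convention."""
    if p in (0, 1):
        return 0.0
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

for n in range(1, 40):
    for k in range(n // 2 + 1):
        tail = sum(comb(n, i) for i in range(k + 1))
        assert tail <= 2 ** (n * h(k / n)) + 1e-6
```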
Problem. (Discrete Loomis-Whitney) Let $A$ be a finite subset of $\mathbb{Z}^3$, and let $A_{xy}$ be the projection of $A$ onto the $xy$-plane (with $A_{yz}$ and $A_{xz}$ defined similarly). Then:
$$|A|^2 \le |A_{xy}| \, |A_{yz}| \, |A_{xz}|.$$
Solution. The following entropy estimate is useful (a special case of Han's inequality):
$$2 H(X, Y, Z) \le H(X, Y) + H(Y, Z) + H(X, Z).$$
This can be established by a short computation:
$$H(X, Y) = H(X) + H(Y \mid X),$$
$$H(Y, Z) = H(Y) + H(Z \mid Y) \ge H(Y \mid X) + H(Z \mid X, Y),$$
$$H(X, Z) = H(X) + H(Z \mid X) \ge H(X) + H(Z \mid X, Y);$$
summing the three lines, the right-hand side is twice $H(X) + H(Y \mid X) + H(Z \mid X, Y) = H(X, Y, Z)$. Now select $(X, Y, Z)$ uniformly at random from $A$, noting that $(X, Y)$ takes values in $A_{xy}$, etc. Then
$$2 \log |A| = 2 H(X, Y, Z) \le H(X, Y) + H(Y, Z) + H(X, Z) \le \log |A_{xy}| + \log |A_{yz}| + \log |A_{xz}|,$$
which is exactly the log of the inequality we needed to show.
- This was proposed at the International Mathematical Olympiad 1992, a contest for high school students. How might they have solved it? Here's a possible approach: Alternative Solution. For each $z$, let $A_z = \{(x, y) : (x, y, z) \in A\}$ be the slice of $A$ at height $z$, and let $B_z \subseteq A_{xz}$ and $C_z \subseteq A_{yz}$ be the corresponding slices of the projections. We note that there is an injection from $A_z$ to pairs with the same $z$-coordinate, $(x, y) \mapsto ((x, z), (y, z)) \in B_z \times C_z$, and also $A_z \subseteq A_{xy}$, so $|A_z| \le \sqrt{|A_{xy}|} \sqrt{|B_z| \, |C_z|}$. By the Cauchy-Schwarz inequality:
$$|A| = \sum_z |A_z| \le \sqrt{|A_{xy}|} \sum_z \sqrt{|B_z|} \sqrt{|C_z|} \le \sqrt{|A_{xy}|} \sqrt{\sum_z |B_z|} \sqrt{\sum_z |C_z|} = \sqrt{|A_{xy}| \, |A_{xz}| \, |A_{yz}|}.$$
- What are the equality cases? Looking at the computation above, we require at least:
$$H(Y) = H(Y \mid X), \qquad H(Z \mid X) = H(Z \mid Y) = H(Z \mid X, Y).$$
In particular, the second equation tells us that the values for $Z$ should have the same distribution conditioned on the full point as conditioned on the projection onto either coordinate alone. The first tells us that, combined with the uniformity of $(X, Y)$ on $A_{xy}$, the projection $A_{xy}$ forms a grid, so equality holds for "cuboidal sets" (if you look closely, the alternative solution suggests the same equality case).
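The inequality and its equality case can be checked by brute force; in the sketch below (my own code, with the helper `projections` invented for the check) we test random subsets of a small cube and one "cuboidal" box:

```python
# Check discrete Loomis-Whitney: |A|^2 <= |A_xy| * |A_yz| * |A_xz| for
# random finite A in Z^3, with equality for a box (cuboidal set).
import itertools
import random

def projections(A):
    """The three coordinate-plane projections of a set of integer triples."""
    Axy = {(x, y) for (x, y, z) in A}
    Ayz = {(y, z) for (x, y, z) in A}
    Axz = {(x, z) for (x, y, z) in A}
    return Axy, Ayz, Axz

random.seed(2)
pts = list(itertools.product(range(4), repeat=3))
for _ in range(50):
    A = set(random.sample(pts, random.randint(1, 30)))
    Axy, Ayz, Axz = projections(A)
    assert len(A) ** 2 <= len(Axy) * len(Ayz) * len(Axz)

box = set(itertools.product(range(2), range(3), range(4)))  # 24 points
Axy, Ayz, Axz = projections(box)
assert len(box) ** 2 == len(Axy) * len(Ayz) * len(Axz)      # equality: 576
```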
Problem. Given a graph $G$, call an ordered triple of vertices $(x, y, z)$:

- a triangle if $xy$, $yz$, $zx$ are edges
- a vee if $xy$, $yz$ are edges

Show that the number of triangles is at most the number of vees.
Solution. Select $(X, Y, Z)$ uniformly at random from the set $T$ of all triangles. Then:
$$\log |T| = H(X, Y, Z) = H(X) + H(Y \mid X) + H(Z \mid X, Y),$$
where the last line has an interpretation as the entropy decrements from revealing $X$, $Y$, $Z$ in that order. However, we can ignore the dependence of $Z$ on $X$ (i.e. $H(Z \mid X, Y) \le H(Z \mid Y) = H(Y \mid X)$ by cyclic symmetry) to get an upper bound:
$$\log |T| \le H(X) + 2 H(Y \mid X).$$
This suggests constructing a distribution on vees as follows: sample $(X', Y')$ with the distribution of $(X, Y)$ above, then sample $Z'$ from the conditional distribution of $Y$ given $X = Y'$, independently of $X'$. The result is that $H(X', Y', Z')$ is precisely $H(X) + 2 H(Y \mid X)$, but it must be upper bounded by the log of the number of vees.
Discussion. Ordinarily, when we relax $H(Z \mid X, Y)$ to $H(Z \mid Y)$, we would get a distribution on length-2 paths (like $x$-$y$-$z$), but we know that every triangle is a length-2 path (and so by following through, the conclusion would have been trivial). However, the magic happens when we make use of cyclic symmetry to "move" $H(Z \mid Y)$ to $H(Y \mid X)$.
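As the discussion notes, the inequality itself is combinatorially clear (every triangle is in particular a vee), so the following brute-force check of my own is only a sanity test of the counts, on $K_3$ and on random graphs:

```python
# Count ordered triples (x, y, z): vees have edges xy and yz; triangles
# additionally have the edge zx. Check #triangles <= #vees.
import itertools
import random

def count_triangles_and_vees(adj, n):
    """Return (#ordered triangles, #ordered vees) for an adjacency matrix."""
    triangles = vees = 0
    for x, y, z in itertools.product(range(n), repeat=3):
        if adj[x][y] and adj[y][z]:
            vees += 1
            if adj[z][x]:
                triangles += 1
    return triangles, vees

# On K_3 there are 6 ordered triangles and 12 ordered vees.
k3 = [[i != j for j in range(3)] for i in range(3)]
assert count_triangles_and_vees(k3, 3) == (6, 12)

random.seed(3)
for _ in range(20):
    n = random.randint(3, 7)
    adj = [[False] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if random.random() < 0.5:
            adj[i][j] = adj[j][i] = True
    t, v = count_triangles_and_vees(adj, n)
    assert t <= v
```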
Aside from entropy, other information-theoretic concepts also have some relevance to combinatorial problems. For instance, Moser used a compression argument to prove the Lovász local lemma, which guarantees that for a collection of independent variables and a sequence of "local" (i.e. depending only on a few variables) and "rare" bad events, all bad events are avoided with some positive probability.
I would like to thank Yanjun for his useful suggestions and advice for this project, and Professor Weissman for teaching this course and introducing me to the wonderful world of information theory.