Combinatorial Problems via Information Theory

Information Theory (Winter 2020)

By David Lin (linkewei :at:


In this article, we explore some applications of information-theoretic methods to various combinatorial problems. The various approaches are best thought of as an extension to the probabilistic method, first initiated by Paul Erdős in the 1940s, where a probability distribution imposed on combinatorial objects (e.g. a randomly selected element of a set) can give rise to deterministic conclusions that are otherwise difficult to obtain.

The probabilistic method usually depends on computing the expected value (where we obtain a “typical” value), or showing that the probability of an event is positive (i.e. the probability of constructing a desired object). Information theory forms a natural extension to this method by offering additional quantities to consider, since entropy and mutual information possess intuitive significance once a random distribution is imposed on the objects.

Information Theory Preliminaries

When we construct a (discrete) random variable X taking values among a set \mathcal X, the entropy is defined as H(X) \overset{\scriptscriptstyle\Delta}= \mathbb E[-\log p_X(X)] = \sum_{x\in \mathcal X} -p_X(x) \log p_X(x) where all logs are base 2 and by convention 0\log 0= 0. Following the analogy, if we have a second random variable Y taking values on \mathcal Y, we have the concepts of joint entropy and conditional entropy : \begin{aligned} H(X,Y) &\overset{\scriptscriptstyle\Delta}= \mathbb E[-\log p_{X,Y}(X,Y)] = \sum_{(x,y)\in \mathcal X\times \mathcal Y} -p_{X,Y}(x,y) \log p_{X,Y}(x,y)\\ H(Y|X) &\overset{\scriptscriptstyle\Delta}= \mathbb E[H(Y|X=x)] = \mathbb E[-\log p_{Y|X}] = \sum_{(x,y)\in \mathcal X \times \mathcal Y} - p_{X,Y}(x,y) \log p_{Y|X=x}(y) \end{aligned} There is a natural information-based interpretation for the above. H(X,Y) is a measure of the combined information of X and Y (or equivalently, the measure of the information of (X,Y) as a joint variable), while H(Y|X) is the measure of the expected information revealed by X about Y.

These two quantities are also related by the chain rule:

Fact. (Chain Rule) H(X,Y) = H(X) + H(Y|X).

(Interpretation: the information of revealing X and Y is equivalent to the information of revealing X then revealing Y.)

A key inequality is as follows:

Fact. (Data Processing Inequality) H(X|Y,Z) \le H(X|Y). (Interpretation: revealing additional information cannot increase entropy)

Remark. As a quick corollary, we have the much simpler H(Y|X)\le H(Y), which when combined with the chain rule gives us the subadditivity of entropy.

Entropic Lower Bounds

A basic property is that the entropy gives a lower bound on the number of distinct values X can take:

Fact. H(X) \le \log |\mathcal X|.

This immediately gives us a way to connect entropy and counting: we can bound any count we want by setting up some distribution over the desired object, and then estimating the entropy.

Problem. Let G = (U\sqcup V,E) be a bipartite graph with vertex partitions of size |U|=m and |V|=n respectively. If P is the set of length 3 paths (edges may be repeated), then |P|\ge \frac{|E|^3}{mn}

Solution. We will select a random length 3 path (e_1,e_2,e_3) as follows: e_2 = \overline{uv} is uniformly selected among all edges, then e_1 is uniformly selected among all edges at u and e_3 is uniformly selected among all edges at v independent of e_1.

First we compute H[e_1|e_2]: \begin{aligned} H[e_1|e_2] &= \sum_{u\in U}\frac{\deg u}{|E|}\log (\deg u)\\ &\ge \frac{m}{|E|}\left(\frac{|E|}{m}\log\left(\frac{|E|}{m}\right)\right) = \log\left(\frac{|E|}{m}\right)\\ \end{aligned}

Similarly H[e_3|e_2]\ge \log\left(\frac{|E|}{n}\right). Then: \begin{aligned} \log |P|&\ge H(e_1,e_2,e_3)\\ &= H(e_2) + H(e_1|e_2) + H(e_3|e_2)\\ &\ge \log\left(\frac{|E|^3}{mn}\right)\\ \end{aligned}

  • This is clearly tight for complete graphs.
  • The way to invent this solution is firstly to notice the naive argument that gives the RHS: picking (e_1,e_2,e_3) randomly among E\times E\times E, e_1 and e_2 shares the U-vertex with probability 1/m while e_2 and e_3 shares the V-vertex with probability 1/n. Hence the probability of (e_1,e_2,e_3) being a valid path is 1/mn if we make the naive assumption that the two events are independent.

    This leads us to the correct solution when we impose such a distribution on the set of 3-paths, then the entropic counting bound gives us the right answer.

Problem. Given finite sets X, A_1,...,A_{n-1} and functions f_i : X \to A_i, a vector (x_1,...,x_n) \in X^n is called nice if f_i(x_i) = f_i(x_{i+1}) for each i = 1,2,..,n-1. Show that the number of nice vectors is at least \frac{|X|^n}{\prod_{i=1}^{n-1} |A_i|}

Solution. Select x_1 uniformly at random, and select x_2 among the possible choices (given x_1) uniformly at random, and so on. The probability of x_i being any particular value of X is fixed at \frac{1}{|X|}. So: \begin{aligned} H[x_1,x_2,...,x_n] &= H[x_1] + H[x_2|x_1] + H[x_3 | x_1,x_2]...\\ &= \log |X| + \mathbb{E}\left[\log \deg x_1\right] + \mathbb{E}\left[\log \deg x_2\right]...\\ &\ge \log |X| - \log\mathbb{E}\left[\frac{1}{\deg x_1}\right] - \log\mathbb{E}\left[\frac{1}{\deg x_2}\right] ... \qquad\text{(GM-HM)}\\ &= \log |X| - \log \left( |A_1|/|X|\right) - \log \left( |A_2|/|X|\right) - ... \end{aligned} which is what we wanted.

Remark. We made use of the QM-AM-GM-HM inequality for random variables: \frac{1}{2}\log \mathbb{E}[X^2] \ge \log \mathbb{E}[X]\ge \mathbb{E}[\log(X)]\ge -\log \mathbb{E}[1/X]

Entropic Lower Bounds

An alternative argument provides us a way to obtain an upper bound instead of a lower bound. We start with the following observation: if X was the uniform distribution, then the \log |\mathcal X| upper bound is in fact attained. This suggests that for an upper bound, we should start with the uniform distribution and then bound the entropy by various methods.

A naive way to do so (especially when joint variables are present) is to simply make use of subadditivity. If we are more careful, we could also reveal each variable sequentially and track the incremental information revealed by each variable, expressed by the equation below: H(X_1,X_2,...,X_n) = \sum_{i=1}^n H(X_i|X_1,...,X_{i-1}) \le \sum_{i=1}^n H(X_i)

Problem. (Binomial tail bounding) Show the bound (for k\le n/2) \sum_{i=0}^k \binom{n}{k} \le 2^{nh_2(k/n)} where h_2(\alpha) = \alpha \log_2 \alpha + (1-\alpha) \log_2 (1-\alpha) is the binary entropy function.

Solution. Consider a random n-bit string X_1X_2...X_n uniformly selected from the set of strings with at most n 1’s. We have \log LHS = H(X_1X_2...X_n) \le \sum_{i=1}^n H(X_i) = \sum_{i=1}^n h_2(\mathbb P(X_i = 1))\le nh_2(k/n) since \mathbb P(X_i = 1) \le k/n \le 1/2 and h_2 is monotonic on [0,1/2].

Problem. (Discrete Loomis-Whitney) Let S be a finite subset of \Z^3, and let S_{xy} be the projection of S onto the xy-plane (with S_{yz},S_{zx} defined similarly). Then: |S|^2 \le |S_{xy}|\cdot |S_{yz}|\cdot |S_{zx}|

Solution. The following entropy estimate is useful (special case of Han’s inequality): 2H[X,Y,Z] \le H[X,Y] + H[Y,Z] + H[Z,X] This can be established by a short computation: \begin{aligned}     H[X,Y] + H[Y,Z] + H[Z,X] - 2H[X,Y,Z] &= H[X,Y] - H[X|Y,Z] - H[Y|X,Z]\\     &\ge H[X,Y] - H[X|Y] - H[Y] = 0 \qedhere     \end{aligned} Now select (P_x,P_y,P_z)\in S uniformly at random, then \begin{aligned}         2\log |S| &= 2H[P_x, P_y, P_z]\\                   &\le H[P_x, P_y] + H[P_y, P_z] + H[P_z,P_x]\\                   &\le \log |S_{xy}| + \log|S_{yz}| + \log|S_{zx}|     \end{aligned} which is exactly log of the inequality we needed to show.

  • This was proposed at the International Mathematics Olympiad 1992, a contest for high school students. How might they have solved it? Here’s a possible approach: Alternative Solution. Let V_{a,b} = \{(a,b,c)\in S\}. By the Cauchy-Schwarz inequality: |S|^2 = \left(\sum_{(a,b)\in S_{xy}} |V_{a,b}|\right)^2 \le |S_{xy}|\cdot \sum_{(a,b)\in S_{xy}} |V_{a,b}|^2 then we note that there is an injection from pairs with the same xy coordinates to S_{yz}\times S_{zx}: (a,b,c),(a,b,c')\mapsto (a,c), (b,c').
  • What are the equality cases? Looking at the computation above, we require at least: \begin{aligned}         I(P_x;P_z|P_y) &= 0\\         I(P_y;P_x,P_z) &= 0     \end{aligned} In particular, the second equation tells us that the values for P_y should have the same distribution conditioned the projection onto S_{zx}. The first tells us that conditioned on Y, (P_x,P_z) forms a grid, so equality holds for “cuboidal sets” S=S_x\times S_y\times S_z (If you look closely, the alternative solution suggests the same equality case).
Problem. In a directed graph G, a triple of vertices (x,y,z) is a
  • triangle if \overrightarrow{xy}, \overrightarrow{yz}, \overrightarrow{zx} are edges
  • vee if \overrightarrow{xy}, \overrightarrow{xz} are edges
If T and V denonte the sets of triangles and vees respectively, show that |V|\ge |T|.

Solution. Select (X,Y,Z) uniformly at random from the set of all triangles. Then: \begin{aligned}     \log_2 T &= H(X,Y,Z)\\     &= H(X) + H(Y|X) + H(Z|Y,X) \end{aligned} where the last line has an interpretation as the entropy decrements from revealing X,Y,Z in that order. However, we can ignore the dependence of Z on X (i.e. H(Z|Y,X) \le H(Z|Y) = H(Y|X) by symmetry) to get an upper bound: \log_2 T \le H(X) + 2H(Y|X) This suggests constructing a distribution on vees as follows: X'\sim X above and Y',Z'\sim Y|X'. The result is that \log_2 (X',Y',Z') is precisely H(X) + 2H(Y|X), but it must be upper bounded by \log_2 |V|.

Discussion. Ordinarily, when we relax H(Z|Y,X) to H(Z|Y), we would get a distribution of length-2 paths (like X\rightarrow Y \rightarrow Z), but we know that every triangle is a length-2 path (and so by following through, the conclusion would have been trivial). However, the magic happens when we make use of cyclic symmetry to “move” H(Z|Y) to H(Y|X).

Other directions

Aside from entropy, other information-theoretic concepts also have some relevance to combinatorial problems. For instance, Moser [2009] used a compression argument to prove the Lovasz local lemma, which guarantees that for a collection of independent variables and a sequence of “local” (i.e. depending only on a few variables) and “rare” bad events , all bad events are avoided with some positive probability.


I would like to thank Yanjun for his useful suggestions and advice for this project, and Professor Weissman for teaching this course and introducing me to the wonderful world of information theory.

An Order-Optimal Edit-Correcting Code

Information Theory (Winter 2020)

By Daniel Tan,


DNA (deoxyribonucleic acid), could also be called the “source code” of life on Earth. Whereas machine code relies on a binary alphabet of 0s and 1s, DNA relies on a quaternary code of the ATCG nucleotides. DNA is easy to read from, write to, and copy, since these functions are already implemented by natural cellular machinery. In addition, it is very stable, and does not require much energy to store. Due to these advantages, it has recently become a medium of interest for long-term information storage. [1]

One flaw of DNA as a storage medium is that information stored in DNA can be corrupted through point mutations. [2] In particular, mutations can involve a nucleotide being removed from the DNA sequence (deletion), a nucleotide being changed to a different nucleotide (substitution), or a new nucleotide being added at an arbitrary location (insertion). If DNA is to be used as information storage, there is a need for error-correcting codes which can reliably handle all three types of error.


In this project, I implement a code presented in a recent paper [3] that corrects a single edit (substitution, insertion, or deletion) with \log n + O(\log \log n) redundancy. In other words, to correct a single edit in a message of  n symbols, only \log n + O(\log \log n) additional symbols need to be used. In contrast, previous approaches had a redundancy of  c \log n where  c \geq 2 .

The implementation and a high-level explanation of how the code works can be found at this GitHub repository. The code is intended to be open-source and may be freely used, reproduced or modified with suitable accreditation.


[1] Goldman, N., Bertone, P., Chen, S. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494, 77–80 (2013).

[2] “Point Mutation”. Biology Dictionary. Retrieved 25 Mar 2020.

[3] Cai, Kui et al. Optimal Codes Correcting a Single Indel / Edit for DNA-Based Data Storage. arXiv:1910.06501 [cs.IT]



An information theory approach to understanding medical lab tests

Information Theory (Winter 2020)


In this project, we take an information theoretic approach to understanding redundancy in common blood tests taken for patients in the ICU. Patients in the ICU routinely get a lot of bloodwork done. While results from their tests are used by clinicians to monitor their health and progress over time, there are several drawbacks to excessive testing. Routinely drawing blood is a nuisance for the patient, and is also costly. In addition, many of the test administered may be redundant. If we are able to quantify the amount of redundancy between common tests, it may be possible to reduce the amount of testing done for a patient. This in turn can lower costs and potentially reduce patient stress.

This project is inspired from a similar study done by Lee and Maslove, who also studied redundancy in lab tests in the ICU. In our approach, however, we use a much larger dataset, focus on fewer tests, and relax the requirement for data from consecutive days.


Data was collected from the Optum database, a de-identified database from a national insurance provider. We extracted lab test data for patients in the ICU. Specifically, we focused on the following five tests lab tests: BUN (Blood urea nitrogen), sodium, platelets, WBC (which blood cells), and glucose. We collected data for up to 3 lab tests, where each lab test was at least one day apart. That is, the lab tests do not necessarily have to be on consecutive days. If the same lab test was done several times in a single day, the average value of all the tests were used. Outliers were excluded from the data. The table below summarizes the number of patients with results per day for each test. The number of patients decrease each day, since they perhaps no longer require the test or leave the ICU.

Lab test Test 1 Test 2 Test 3
BUN 60,881 30,780 24,117
Glucose 57,060 29,558 22,906
Platelets 55,650 29,134 22,028
Sodium 62,051 31,972 25,104
WBC 47,018 21,579 15,793

The data was discretized into 30 bins. Histograms of the tests over 3 days of testing are shown below. Visually, it appears that the distribution of the lab test values stay remain similar over the three days of testing. To get a quantitative measure, we turn to information theory concepts.



First, we calculate the entropy of the test values on each day. Entropy is a measure of randomness or “surprise” of a variable. If  U is a discrete random variable taking values in alphabet  \mathcal{U}, then the entropy of  U is given by  H(U)=- \sum_n p(u) \log p(u). In the table below, we report the entropy of each lab test per day. For most tests, the entropy decreases over time. BUN and sodium, however, are exceptions.

Lab test Test 1 Test 2 Test 3
BUN 3.091 3.115 3.312
Glucose 3.538 3.404 3.315
Platelets 3.615 3.431 3.447
Sodium 3.534 3.625 3.558
WBC 3.244 3.223 3.050

Next, we are interested in quantifying redundancy between consecutive lab tests. That is, for example, how much additional information do the lab results from day 2 provide, given the lab test from day 1? If there is not much new information, there is a high level of redundancy.

To better quantify this measure, we introduce the concepts of mutual information and conditional entropy. Mutual information measures how much a variables reduces the entropy in another variable. High mutual information between tests suggests high redundancy between the tests. Given random variables  X and  Y, we calculate their mutual information as follows:  I(X; Y) = \sum_{(x,y)}P(x,y) \log \frac{P(x,y)}{P(x)P(y)}. We can use mutual information to motivate conditional entropy  H(X|Y), which is the entropy of a given random variable  X given knowledge of the random variable  Y. To calculate the conditional entropy of  X given  Y, we simply subtract the mutual information from the entropy of  X. That is,  H(X|Y) = H(X) - I(X;Y).

We now return to the context of lab tests taken over many days. Below, we report the conditional entropy of a lab test given the previous lab test. We see that entropy drops significantly after conditioning, which suggests a high degree of redundancy in these consecutive tests. Interestingly, we see that the entropy of test 3 decreases more upon conditioning on test 1 compared to test 2.

Lab Test H(Test 2 | Test 1) H(Test 3 | Test 2)
H(Test 3 | Test 1)
BUN 0.282 0.508 0.532
Glucose 0.311 0.376 0.311
Platelets 0.218 0.298 0.436
Sodium 0.371 0.280 0.371
WBC 0.225 0.181 0.065

We also visualize the results in the figures below. We plot both the entropy of the test over 3 days and the conditional entropy of the test, conditioned on the previous lab value (note we are of course unable to condition for the first lab test). Again, we notice how conditioning significantly decreases entropy.


Next, we performed a pairwise analysis to determine the extent to which pairs of tests are redundant. Specifically, we calculated the entropy of a lab test conditioned on another lab test. We calculated these results for all pairs over all 3 days of testing and plot the results below.


We note a few things about these results. First, conditional entropy is not symmetric. For example, let us look at WBC tests. We see that conditioning on sodium, in general, reduces the entropy on WBC the most compared to the rest of the tests. However, conditioning on WBC does not reduce the entropy most for sodium. Rather, conditioning on platelets most reduces entropy of sodium. Coincidentally, the relationship is symmetric for platelets and sodium: conditioning on sodium most reduces entropy in platelets (conditioning on glucose also does quite well in reducing entropy). Across all tests, conditioning on sodium in general reduces entropy most. This suggests that the sodium test is indeed important, and the sodium test results have value in eliminating redundancy in the other tests.


Lab testing for ICU patients consumes resources and money. In this project, we used concepts from information theory to study redundancy in 5 common lab tests ordered by doctors for ICU patients. As suspected, we found that there is a lot of mutual information in lab tests taken over multiple days. Conditioning on previous lab test from days prior results significantly reduces entropy of the lab test. Likewise, there is also a degree of redundancy between different lab tests. We saw conditioning on the sodium lab in general reduces entropy for most of the other lab tests.

The methods and approaches in this project can be further extended to include more tests over more days to understand the relationship between different tests. Findings can be used to better inform doctors on which tests to administer for ICU patients.


[1]  Lee, J., Maslove, D.M. Using information theory to identify redundancy in common laboratory tests in the intensive care unit. BMC Med Informatics and Decision Making 15:59 (2015).

[2]  Vollmer, Robin. (2007). Entropy and Information Content of Laboratory Test Results. American journal of clinical pathology. 127. 60-5. 10.1309/H1F0WQW44F157XDU.


Using Art and Tech to Communicate Concepts in Information Theory

Information Theory (Winter 2020)

Team Members: Sarah Woodard, Meera Radhakrishnan, Natalie Gable, Andrea Ramirez


For our final project, we developed an interactive LED-based art installation to communicate concepts in information theory, with the goal of presenting it to students at Ravenswood Middle School at the end of the quarter. In creating this project, we hoped to combine our interests in electrical engineering and art to design a project that would both intuitively communicate the underlying concepts in information theory while being visually engaging and interesting for the students at the outreach event.

Although we initially intended this art installation to be an interactive setup at the end-of quarter outreach event, given the cancellation of the event we instead demonstrated our project via a video to be shared with the students who would have attended the event. We appreciated the opportunity to leverage our engineering skills, explore our artistic sides, and develop a project with the intention of reaching out to students who may still be deciding whether they find engineering exciting.

Our interactive installation consists of the following two components:

  1. An art piece inspired by bits traveling across multiple cascaded binary symmetric channels.
  2. Two instructive mini-displays that students can play with to learn how a binary symmetric channel and binary erasure channel work, so that they can better understand the concepts underlying the art piece.

Art Piece

In developing the concept for our main art installation, we wanted to come up with a design that would be both educational and visually exciting. We decided to build an LED light sculpture that displays patterns inspired by the behavior of multiple cascaded binary symmetric channels.


The preliminary design for our sculpture. Only 3 cascaded channels are pictured here, but the final product is a full cylinder consisting of 10 channels total.

Our initial concept is pictured above. The sculpture would consist of a twisted cylinder of 10 vertical lines of LEDs. (The picture only shows 3 for clarity.) Each LED strip represents a series of cascaded binary symmetric channels across which a bit is communicated.

First, an Arduino controlling the installation would generate a random sequence of 10 bits. 0s would be represented by one color, and 1s by another. These bits would light up along the top ring of the sculpture.

Then, each bit would be “communicated” downwards to the LED below it. With a small probability of error, the color of the next LED below may flip, indicating that an error in that BSC has occurred. This occurs down each line of LEDs, with the LED color flipping at each step with a small probability representing the probability of error.

Finally, when the bits reach the bottom ring, the program controlling the sculpture will check if the communicated sequence of bits matches the originally generated bit sequence. The columns representing any bits that were communicated incorrectly will then light up red, and columns representing bits that were communicated correctly will light up green.


To build the sculpture, we decided to use a cylindrical lampshade as a mechanical scaffold for our LEDs. The lampshade would mechanically support the LEDs and provide the cylindrical shape we were looking for.

While prototyping the design, we also decided to modify the “twisted cylinder” design and instead have the LED strips run straight vertically down the sculpture. This would both be easier to construct given the LED strings we had, and would more clearly communicate the underlying concepts to viewers of the art piece.

We debated several ways to mount the LEDs onto the scaffold. Since the LED strings came with significant wiring, we wanted the wiring to be neatly tucked inside the sculpture.

One option we considered was to poke holes through the lampshade, push the LEDs through from the inside of the sculpture, and glue them in place. However, we realized that the paper of the lampshade might be too fragile, and that it would be difficult to make this design look neat. Instead we chose to glue the LEDs to the inside of the cylinder, allowing the light to filter out through the lampshade. We used hot glue to attach a total of 200 LEDs to the inside of our sculpture.

Programming and Control

Our sculpture is controlled by an Arduino and a potentiometer. The potentiometer sets the probability of error that the sculpture uses to display the binary symmetric channels.’

Final Product

Interactive Mini-Displays

In addition to our larger display, we created two interactive hand-held mini displays. The goal of these displays was to teach students at the outreach event how different types of channels work, so that they could better understand the concepts underlying our art installation.

Binary Symmetric Channel (BSC) Display

The display pictured below represents a BSC. The switch on the left can be toggled to change the input bit, which is displayed at the leftmost LEDs, from a 1 to a 0 or vice versa. In the image below, a 1 has been selected as the input and so the leftmost green LED lights up. The probability of error is encoded on the Arduino, and is used to generate a random number which determines whether or not an error has occurred.

If no error occurs, the light will travel in a horizontal line to the output, which is represented by the rightmost column of LEDs. If an error does occur, then this is represented by the light travelling down one of the diagonal strings of LEDs to the opposite output. This process is illustrated in our outreach video, which is linked to below.

Binary Erasure Channel (BEC) Display

The other display, shown below, models a BEC. The BEC display is similar to the BSC display, except now instead of error causing the light to travel to the opposite value, it will now travel to the center-right blue LED. See the outreach video below to watch the BEC display in action.

How They Work

Each mini display was made using an assortment of LEDs, an arduino, and a switch. The boards were hand soldered, and the cathodes and anodes of the LEDs were wired based on what row and column the LED belonged to, and resistors were added to each column. The input of each row and column went to the output pin of an arduino using the longer grey wires in the image below.

The serial input of the Arduino can be used to alter the probability of error, which is then used to randomly determine if an error has occurred. Once this is determined, the appropriate cathodes and anodes are activated to create the desired LED lighting pattern.


Here’s the video presenting our project, as well as a flyer that we would have distributed at our installation if the event had occurred in person. We hope you enjoy!


A flyer for our outreach installation encouraging students to be excited about engineering!
Artist: Natalie Gable

Information of Popular Communication Methods

Information Theory (Winter 2020)

Arun Seetharaman and Alissa Ling

Information theory can be a difficult topic to convey to middle school students, and we would like to encourage students to further explore fields in engineering. Our goal is to teach the theory and science of information using popular communication methods such as texting, social media, and phone calls as a platform to engage with the students. We believe this will be a concept more accessible and appreciated by the students of our outreach initiative. Our final deliverable will be a fun video that will both inform and captivate our audience. We will use drawings and animation throughout the video to make the material accessible. This is important so that we can help younger students understand an abstract science concept and get them excited about the field of engineering.

The link to our video is here:

Our outreach project was intended to be an interactive, animated video that we would share with the students. We would have paused the video and asked the students questions during the video so that they stayed engaged in the dialogue. Since the outreach was cancelled, we just recorded ourselves asking the questions we would have asked to the students.

Our project is unique in that our deliverable is the same as the outreach video, since our original idea was to make a video for the middle schoolers. We believe we accomplished our goals in creating this video to be fun, interactive, and engaging, and believe that we made the content accessible to the level of a middle schooler. While it is unfortunate that we cannot share it in person, we believe that watching the video is as good as if we were there in person.

Outline of video:

  1. Introduction
    • You can understand the fundamental aspects of information theory without knowing all the math behind it.
    • Everything you communicate can be broken down into bits, which is 1’s and 0’s.
  2. Texting
    • Each letter in a word has its own, unique binary sequence.
    • While texts are simple and can be represented easily, as you scale up the complexity of the data, you need to compress it.
  3. Compression
    • You need to use the structure of the binary sequences in the data in order to make the data smaller, while still retaining the same information.
    • Example. Say you have a sequence such as 1111111111101. Instead of sending the whole sequence, you can just send the location of the 0.
  4. Photos
    • A picture is made of pixels which are little squares of color, which all combine to form the larger image.
    • To compress an image, the most common method is to use lossy compression in the form of a JPEG, which means that some of the data is lost.
    • The image quality if affected by how much compression is applied to the image. As more compression is applied, the image becomes blurrier and more noisy.
    • Example is Snapchat.
  5. Videos
    • Videos are many pictures in quick succession. Since there are many pictures, videos require a lot of data to store, and thus need more complex methods of compression.
    • We need to both spatially and temporally compress a video. Spatial compression is image compression, and temporal compression looks at the changes in the frames.
    • Streaming and video calling are two different applications the behave differently for a variety of reasons. Examples are Netflix and FaceTime.
  6. Conclusion
    • Everything can be represented in bits, which is a binary sequence of 1s and 0s.
    • Compression is necessary when dealing with complex data such as photos and videos because they are very large.
    • Everything you use is related to information theory.
    • If you found this interesting, then you should pursue a career in engineering!


We found a variety of reputable sources and technical standards that are agreed upon by the scientific community. Many of our resources are blog posts, lecture slides, and google support posts. Unfortunately some of the information we are looking for is proprietary information that is not publicly available. For instance the exact type of compression used by popular applications such as Netflix and Snapchat are not publicly specified due to their role in their respective company’s success in their field. In these cases we can only look at what publicly known techniques are considered cutting edge and use that as an estimate.

  6. All images and clips were found online and were not created by us.