# An information theory approach to understanding medical lab tests

Information Theory (Winter 2020)

## Introduction

In this project, we take an information-theoretic approach to understanding redundancy in common blood tests taken for patients in the ICU. Patients in the ICU routinely get a lot of bloodwork done. While clinicians use the results to monitor patients' health and progress over time, excessive testing has several drawbacks. Routinely drawing blood is a nuisance for the patient and is also costly. In addition, many of the tests administered may be redundant. If we are able to quantify the amount of redundancy between common tests, it may be possible to reduce the amount of testing done for a patient. This in turn can lower costs and potentially reduce patient stress.

This project is inspired by a similar study by Lee and Maslove [1], who also studied redundancy in lab tests in the ICU. In our approach, however, we use a much larger dataset, focus on fewer tests, and relax the requirement for data from consecutive days.

## Dataset

Data was collected from the Optum database, a de-identified database from a national insurance provider. We extracted lab test data for patients in the ICU. Specifically, we focused on the following five lab tests: BUN (blood urea nitrogen), sodium, platelets, WBC (white blood cells), and glucose. We collected up to 3 results per lab test, with successive results at least one day apart. That is, the lab tests do not necessarily have to be on consecutive days. If the same lab test was done several times in a single day, the average value of all the results was used. Outliers were excluded from the data. The table below summarizes the number of patients with a first, second, and third result for each test. The number of patients decreases with each successive test, perhaps because patients no longer require the test or have left the ICU.

| Lab test | Test 1 | Test 2 | Test 3 |
|---|---|---|---|
| BUN | 60,881 | 30,780 | 24,117 |
| Glucose | 57,060 | 29,558 | 22,906 |
| Platelets | 55,650 | 29,134 | 22,028 |
| Sodium | 62,051 | 31,972 | 25,104 |
| WBC | 47,018 | 21,579 | 15,793 |
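The same-day averaging step described above can be sketched as follows. This is a minimal illustration, not the extraction code used for the project: the `(patient_id, day, value)` record layout and the `daily_means` helper are assumptions made for the example.

```python
from collections import defaultdict


def daily_means(records):
    """Average repeated results of one lab test within the same patient-day.

    `records` is an iterable of (patient_id, day, value) tuples.
    This layout is hypothetical; the real Optum schema is not shown here.
    Returns a dict mapping (patient_id, day) to the mean value.
    """
    grouped = defaultdict(list)
    for patient_id, day, value in records:
        grouped[(patient_id, day)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


# Toy sodium results: patient p1 was tested twice on day 1 and once on day 3.
records = [("p1", 1, 140.0), ("p1", 1, 142.0), ("p1", 3, 138.0)]
means = daily_means(records)
```

After this step each patient contributes at most one value per day, which is the form the rest of the analysis works with.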

The data was discretized into 30 bins. Histograms of the tests over 3 days of testing are shown below. Visually, it appears that the distributions of the lab test values remain similar over the three days of testing. To get a quantitative measure, we turn to information theory concepts.
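One way to carry out the discretization is sketched below. The text only states that 30 bins were used; equal-width bins spanning the observed minimum and maximum are an assumption here, and `discretize` is a hypothetical helper name.

```python
def discretize(values, n_bins=30):
    """Map each value to an integer bin label in the range 0..n_bins-1.

    Assumes equal-width bins between the observed minimum and maximum.
    The binning scheme actually used in the project is not specified.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls in a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


values = [0.0, 1.0, 29.0, 30.0]
bins = discretize(values, n_bins=30)
```

Quantile-based bins would be a reasonable alternative; the choice changes the entropy values, so the same scheme should be used for every day being compared.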

## Methods

First, we calculate the entropy of the test values on each day. Entropy is a measure of the randomness or “surprise” of a variable. If $U$ is a discrete random variable taking values in alphabet $\mathcal{U}$ with probability mass function $p$, then the entropy of $U$ is given by $H(U) = -\sum_{u \in \mathcal{U}} p(u) \log p(u)$. In the table below, we report the entropy of each lab test per day. For most tests, the entropy is lower at test 3 than at test 1. BUN and sodium, however, are exceptions.

| Lab test | Test 1 | Test 2 | Test 3 |
|---|---|---|---|
| BUN | 3.091 | 3.115 | 3.312 |
| Glucose | 3.538 | 3.404 | 3.315 |
| Platelets | 3.615 | 3.431 | 3.447 |
| Sodium | 3.534 | 3.625 | 3.558 |
| WBC | 3.244 | 3.223 | 3.050 |
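Entropies like those above can be estimated from the bin labels with a simple plug-in estimator, sketched here. The base of the logarithm is not stated in the text; several reported values exceed $\ln 30 \approx 3.40$, the maximum possible in nats with 30 bins, so base 2 (bits) is assumed. The `entropy` function name is illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Plug-in estimate of H(U) in bits from a list of discrete labels.

    Each probability p(u) is estimated by the relative frequency of u.
    Base 2 is an assumption; the source does not state the log base.
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


uniform_four = entropy([0, 1, 2, 3])  # four equally likely bins
constant = entropy([5, 5, 5])         # a single bin, no uncertainty
```

With 30 bins the estimate is bounded above by $\log_2 30 \approx 4.91$ bits, which is reached only if all bins are equally populated.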

Next, we are interested in quantifying redundancy between consecutive lab tests. That is, for example, how much additional information do the lab results from day 2 provide, given the lab test from day 1? If there is not much new information, there is a high level of redundancy.

To better quantify this measure, we introduce the concepts of mutual information and conditional entropy. Mutual information measures how much knowing one variable reduces the entropy of another. High mutual information between tests suggests high redundancy between the tests. Given random variables $X$ and $Y$, we calculate their mutual information as follows: $I(X; Y) = \sum_{(x,y)} P(x,y) \log \frac{P(x,y)}{P(x)P(y)}$. We can use mutual information to motivate conditional entropy $H(X|Y)$, which is the entropy that remains in a random variable $X$ given knowledge of the random variable $Y$. To calculate the conditional entropy of $X$ given $Y$, we simply subtract the mutual information from the entropy of $X$. That is, $H(X|Y) = H(X) - I(X;Y)$.
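The two formulas above translate directly into code. The sketch below estimates them from paired bin labels, one pair per patient; the function names, the base-2 logarithm, and the toy data are assumptions for illustration.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Plug-in entropy in bits of a list of discrete labels."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired labels.

    Uses P(x,y) / (P(x) P(y)) = c * n / (count_x * count_y), where c is
    the number of times the pair (x, y) was observed among n pairs.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items()
    )


def conditional_entropy(xs, ys):
    """H(X|Y) = H(X) - I(X;Y), as defined in the text."""
    return entropy(xs) - mutual_information(xs, ys)


# Toy bin labels for the same six patients on two days.
day1 = [0, 0, 1, 1, 2, 2]
day2_copy = list(day1)               # fully redundant with day 1
day2_unrelated = [0, 1, 0, 1, 0, 1]  # independent of day 1 in this sample

h_redundant = conditional_entropy(day2_copy, day1)
h_unrelated = conditional_entropy(day2_unrelated, day1)
```

In the fully redundant case the conditional entropy is zero, and in the independent case it equals the unconditional entropy, which are the two extremes the lab-test results fall between.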

We now return to the context of lab tests taken over many days. Below, we report the conditional entropy of a lab test given an earlier result of the same test. We see that entropy drops significantly after conditioning, which suggests a high degree of redundancy in these consecutive tests. Interestingly, for glucose and WBC, the entropy of test 3 decreases more upon conditioning on test 1 than on test 2.

| Lab test | H(Test 2 \| Test 1) | H(Test 3 \| Test 2) | H(Test 3 \| Test 1) |
|---|---|---|---|
| BUN | 0.282 | 0.508 | 0.532 |
| Glucose | 0.311 | 0.376 | 0.311 |
| Platelets | 0.218 | 0.298 | 0.436 |
| Sodium | 0.371 | 0.280 | 0.371 |
| WBC | 0.225 | 0.181 | 0.065 |
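As a worked example of how to read these numbers, take BUN. Rearranging $H(X|Y) = H(X) - I(X;Y)$ and plugging in the reported entropy of test 2 and its conditional entropy given test 1 gives the mutual information between the two tests. The "fraction explained" ratio in the second line is an illustrative summary, not a quantity reported in the original analysis.

$$
\begin{aligned}
I(\text{Test 2}; \text{Test 1}) &= H(\text{Test 2}) - H(\text{Test 2} \mid \text{Test 1}) = 3.115 - 0.282 = 2.833 \\
\frac{I(\text{Test 2}; \text{Test 1})}{H(\text{Test 2})} &= \frac{2.833}{3.115} \approx 0.91
\end{aligned}
$$

Under this reading, roughly 91% of the uncertainty in the second BUN result is already accounted for by the first.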

We also visualize the results in the figures below. We plot both the entropy of the test over 3 days and the conditional entropy of the test, conditioned on the previous lab value (the first lab test has no earlier result to condition on). Again, we notice how conditioning significantly decreases entropy.

Next, we performed a pairwise analysis to determine the extent to which pairs of tests are redundant. Specifically, we calculated the entropy of a lab test conditioned on another lab test. We calculated these results for all pairs over all 3 days of testing and plot the results below.
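The pairwise analysis can be sketched as a loop over ordered pairs of tests. The code below assumes each test is given as a list of bin labels aligned so that position `i` refers to the same patient and day in every list; how patients missing one of the two tests were handled is not described in the text. It uses the chain-rule form $H(X|Y) = H(X,Y) - H(Y)$, which is equivalent to $H(X) - I(X;Y)$. All names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Plug-in entropy in bits of a list of hashable labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(xs, ys):
    """H(X|Y) via the chain rule: H(X,Y) - H(Y)."""
    return entropy(list(zip(xs, ys))) - entropy(ys)


def pairwise_conditional_entropy(tests):
    """Return {(a, b): H(a | b)} for every ordered pair of distinct tests.

    `tests` maps a test name to its list of bin labels, aligned by patient.
    """
    return {
        (a, b): conditional_entropy(tests[a], tests[b])
        for a in tests
        for b in tests
        if a != b
    }


# Toy data: "coarse" is a two-level summary of the four-level "fine".
tests = {"fine": [0, 1, 2, 3], "coarse": [0, 0, 1, 1]}
result = pairwise_conditional_entropy(tests)
```

In the toy data, knowing `fine` determines `coarse` completely, but not the other way around, so the two directions give different values.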

We note a few things about these results. First, conditional entropy is not symmetric. For example, let us look at WBC tests. We see that conditioning on sodium, in general, reduces the entropy of WBC the most compared to the rest of the tests. However, conditioning on WBC does not reduce the entropy of sodium the most. Rather, conditioning on platelets reduces the entropy of sodium the most. Coincidentally, the relationship is symmetric for platelets and sodium: conditioning on sodium reduces the entropy of platelets the most (conditioning on glucose also does quite well). Across all tests, conditioning on sodium in general reduces entropy the most. This suggests that the sodium test is particularly informative, and that its results are valuable for identifying redundancy in the other tests.
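The asymmetry noted above follows directly from the definitions. Mutual information itself is symmetric, $I(X;Y) = I(Y;X)$, so subtracting the two conditional entropies leaves only the difference of the marginal entropies:

$$
H(X \mid Y) - H(Y \mid X) = \bigl[H(X) - I(X;Y)\bigr] - \bigl[H(Y) - I(X;Y)\bigr] = H(X) - H(Y)
$$

The two conditional entropies therefore coincide only when the two tests have equal entropy, which is generally not the case for the tests reported here.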

## Conclusions

Lab testing for ICU patients consumes resources and money. In this project, we used concepts from information theory to study redundancy in 5 common lab tests ordered by doctors for ICU patients. As suspected, we found that there is a lot of mutual information in lab tests taken over multiple days. Conditioning on results of the same lab test from prior days significantly reduces the entropy of the test. Likewise, there is also a degree of redundancy between different lab tests. We saw that conditioning on the sodium test in general reduces entropy for most of the other lab tests.

The methods and approaches in this project can be further extended to include more tests over more days to understand the relationship between different tests. Findings can be used to better inform doctors on which tests to administer for ICU patients.

References:

[1]  Lee, J., Maslove, D.M. Using information theory to identify redundancy in common laboratory tests in the intensive care unit. BMC Medical Informatics and Decision Making 15:59 (2015). https://doi.org/10.1186/s12911-015-0187-x

[2]  Vollmer, R. Entropy and information content of laboratory test results. American Journal of Clinical Pathology 127:60-65 (2007). https://doi.org/10.1309/H1F0WQW44F157XDU