## Bayes' Theorem

### Motivation

Although Bayes’ theorem looks like a simple equation, there are some deep insights to be gained by looking at it closely. Here is a typical problem where it applies:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

Unfortunately, only 15% of doctors get the answer correct.

Here is the same problem stated using counts of women instead of percentages, a form that a lot more doctors get right:

100 out of 10,000 women at age forty who participate in routine screening have breast cancer. 80 of every 100 women with breast cancer will get a positive mammography. 950 out of 9,900 women without breast cancer will also get a positive mammography. If 10,000 women in this age group undergo a routine screening, about what fraction of women with positive mammographies will actually have breast cancer?

Now, here’s another way to look at the problem which makes it easier to understand. Before the mammography screening, the 10,000 women can be divided into two groups:

Group 1: 100 women with breast cancer

Group 2: 9,900 women without breast cancer.

After the mammography, the women can be divided into four groups:

Group A: 80 women with breast cancer, and a positive mammography

Group B: 20 women with breast cancer, and a negative mammography

Group C: 950 women without breast cancer, and a positive mammography

Group D: 8,950 women without breast cancer, and a negative mammography.

Among all women with a positive result, the proportion who actually have breast cancer is the proportion of (A) within (A + C) = 80 / (80 + 950) = 80 / 1030 = 7.8%. This is the chance that a woman actually has cancer if she is one of the unlucky ones who got a positive mammography: 7.8%, not 80%. This is Bayes’ Theorem at work.
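The grouping above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and chosen for this example.

```python
# Counts from the 10,000-woman example above.
cancer_positive = 80       # Group A
cancer_negative = 20       # Group B
healthy_positive = 950     # Group C
healthy_negative = 8950    # Group D

total = cancer_positive + cancer_negative + healthy_positive + healthy_negative
all_positive = cancer_positive + healthy_positive    # Groups A + C

# Fraction of positive mammographies that belong to women with cancer.
p_cancer_given_positive = cancer_positive / all_positive

print(total)                              # 10000
print(all_positive)                       # 1030
print(f"{p_cancer_given_positive:.1%}")   # 7.8%
```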

As the above calculation shows, figuring out the final answer always requires all three pieces of information:

1) the percentage of women with breast cancer

2) the percentage of women without breast cancer who receive false positives

3) the percentage of women with breast cancer who receive (correct) positives.

The first one is called the **prior** and the other two are called **conditionals**.

A common mistake is to ignore one of these three when calculating the probability; most commonly, it is the prior that gets ignored.
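To see why the prior cannot be dropped, the sketch below keeps the test fixed (80% true positives, 9.6% false positives) and varies only the prior. The function name `posterior` and the priors of 10% and 50% are assumptions made for illustration; only the 1% figure comes from the problem above.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Probability of the condition given a positive test result."""
    true_positives = true_positive_rate * prior
    false_positives = false_positive_rate * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Same test, three different priors.
for prior in (0.01, 0.10, 0.50):
    result = posterior(prior, 0.80, 0.096)
    print(f"prior {prior:.0%} -> posterior {result:.1%}")

# prior 1% -> posterior 7.8%
# prior 10% -> posterior 48.1%
# prior 50% -> posterior 89.3%
```

The same positive result means something very different depending on how common the condition was to begin with.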

### Interpretation as probability slider

Another way to think about the above situation is this: a woman in the above population has a 1% chance of having cancer. However, if she undergoes the test and gets a positive result, her chance increases to 7.8%. So, by using the test we are sliding the scale of belief in the hypothesis “has cancer” from 1% to 7.8%. How far a test slides the belief depends on the ratio of its true positive rate to its false positive rate. A bad test may not help at all. In fact, if the rate of false positives is the same as the rate of true positives, you always have the same probability after the test as when you started. And if the rate of false positives exceeds the rate of true positives, a positive result lowers the probability, so it is really a test for the reverse condition.
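The three cases can be sketched as follows. The helper `update` and the rates for the "useless" and "reversed" tests are made up for illustration; only the 1% prior and the 80% / 9.6% rates come from the example.

```python
def update(prior, p_pos_given_h, p_pos_given_not_h):
    """Belief in hypothesis H after a positive result."""
    numerator = p_pos_given_h * prior
    return numerator / (numerator + p_pos_given_not_h * (1 - prior))

prior = 0.01

informative = update(prior, 0.80, 0.096)   # true positive rate > false positive rate
useless = update(prior, 0.30, 0.30)        # both rates equal (hypothetical test)
reverse_test = update(prior, 0.10, 0.40)   # false positives more likely (hypothetical test)

print(f"{informative:.2%}")    # 7.76%: belief slides up
print(f"{useless:.2%}")        # 1.00%: belief does not move
print(f"{reverse_test:.2%}")   # 0.25%: belief slides down
```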

### Understanding the terms of the equation

Bayes’ theorem expresses the above observations more formally in mathematical terms. To understand it, some basics first:

In probability theory, P(A|B) is read as “the probability of A given B”. It can also be read, loosely, as the degree to which B implies A (i.e. the reading direction is right-to-left).

P(A, B) is P(A & B), the probability that both A and B are true, and it is different from P(A|B). Confusing the two is another common mistake.

The probability that a test gives a true positive divided by the probability that a test gives a false positive is known as the **likelihood ratio** of that test. However, the likelihood ratio does not sum up everything there is to know about the usefulness of the test, as is evident from Bayes’ theorem.
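One way to read that remark is through the odds form of the update: posterior odds = prior odds × likelihood ratio. The sketch below compares the mammography test with a hypothetical second test whose rates are both ten times smaller, so its likelihood ratio for a positive result is identical. The second test, the helper name `posterior_from_odds`, and the likelihood ratio for a negative result, (1 − true positive rate) / (1 − false positive rate), are additions for illustration and do not appear in the text above.

```python
def posterior_from_odds(prior, likelihood_ratio):
    """Update a probability by multiplying its odds by a likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = 0.01
tests = {
    "mammography": (0.80, 0.096),          # rates from the example
    "hypothetical test": (0.08, 0.0096),   # assumed: same ratio, rarely positive
}

results = {}
for name, (tpr, fpr) in tests.items():
    lr_positive = tpr / fpr
    lr_negative = (1 - tpr) / (1 - fpr)
    results[name] = (
        posterior_from_odds(prior, lr_positive),
        posterior_from_odds(prior, lr_negative),
    )
    after_pos, after_neg = results[name]
    print(f"{name}: after positive {after_pos:.2%}, after negative {after_neg:.2%}")
```

Both tests move a positive result to about 7.8%, but a negative mammography lowers the probability to roughly 0.2%, while a negative result on the hypothetical test barely moves it from 1%. The likelihood ratio for positives alone does not capture that difference.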

Bayes’ theorem states:

p(A|X) = p(X|A) × p(A) / [p(X|A) × p(A) + p(X|~A) × p(~A)]

Where:

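p(A) is the prior: the probability of theory A before seeing the evidence, and p(~A) = 1 − p(A).

p(X|A) is the probability of the evidence X as predicted by theory A, and p(X|~A) is the probability of the same evidence if A is false (the conditionals).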

p(A|X) is the probability or degree of belief we should have in our theory given the evidence X.

The denominator above is p(X), the overall probability of observing the evidence, written out using the *law of total probability*.
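Plugging the mammography percentages into these terms gives a quick check. This is a sketch only; the variable names mirror the symbols above, with A standing for "has breast cancer" and X for "positive mammography".

```python
p_a = 0.01               # p(A): prior
p_x_given_a = 0.80       # p(X|A): true positive rate
p_x_given_not_a = 0.096  # p(X|~A): false positive rate
p_not_a = 1 - p_a        # p(~A)

# Law of total probability: every positive result comes from A or from ~A.
p_x = p_x_given_a * p_a + p_x_given_not_a * p_not_a

# Bayes' theorem.
p_a_given_x = p_x_given_a * p_a / p_x

print(f"p(X)   = {p_x:.5f}")          # p(X)   = 0.10304
print(f"p(A|X) = {p_a_given_x:.4f}")  # p(A|X) = 0.0776
```

The value of p(X) says that about 10.3% of all screened women test positive, which matches the roughly 1,030 positives out of 10,000 in the counting version of the problem.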

### Scientific method as a special case of Bayes’ theorem

Previously, the most popular philosophy of science was probably [Karl Popper’s falsificationism](https://www.stuvia.com/doc/49916/summary-notes-of-karl-popper—science-as-falsification). This is the old philosophy that the Bayesian revolution is currently dethroning. Karl Popper’s falsificationism says that theories can be definitely falsified, but never definitely confirmed. It is yet another special case of the Bayesian rules. You can even formalize Popper’s philosophy mathematically.

Let’s say that there is a theory A and we make an observation X which we think supports it. The likelihood ratio for X, p(X|A)/p(X|~A), determines how much observing X slides the probability for A. Since p(X|A) can be at most 1, the likelihood ratio for confirmatory evidence can never exceed 1/p(X|~A), so there is a limit on how much mileage you can get from successful predictions.

On the other hand, if you encounter some piece of evidence Y that is definitely not predicted by your theory, this is enormously strong evidence against your theory. If p(Y|A) is infinitesimal, then the likelihood ratio will also be infinitesimal. For example, if p(Y|A) is 0.0001% and p(Y|~A) is 1%, then the likelihood ratio p(Y|A)/p(Y|~A) will be 1:10000. Or, flipping the likelihood ratio, if p(Y|A) is very small, then p(Y|~A)/p(Y|A) will be very large, meaning that observing Y greatly favors ~A over A. Falsification is much stronger than confirmation.
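The asymmetry can be illustrated numerically. In the sketch below, the 50% starting belief in A and the confirmation case (A predicts X with certainty, while ~A allows X half the time) are assumptions chosen for illustration; the falsification case uses the 0.0001% and 1% figures from the text.

```python
def update(prior, p_e_given_a, p_e_given_not_a):
    """Belief in theory A after observing evidence E."""
    numerator = p_e_given_a * prior
    return numerator / (numerator + p_e_given_not_a * (1 - prior))

prior = 0.5   # assumed starting belief in theory A

# Confirmation: p(X|A) is already at its maximum of 1, so the
# likelihood ratio is capped at 1 / p(X|~A) = 2.
after_x = update(prior, 1.0, 0.5)

# Falsification: p(Y|A) = 0.0001%, p(Y|~A) = 1%, likelihood ratio 1:10000.
after_y = update(prior, 0.000001, 0.01)

print(f"after confirming X:  {after_x:.4f}")   # 0.6667
print(f"after falsifying Y:  {after_y:.6f}")   # 0.000100
```

A perfect prediction moves the belief from 50% to about 67%, while a single observation the theory all but forbids drops it to about 0.01%.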

This is a consequence of the earlier point that very strong evidence is not the product of a very high probability that A leads to X, but the product of a very low probability that not-A could have led to X. This is the precise Bayesian rule that underlies the heuristic value of Popper’s falsificationism.

With some poetic license, we can say Bayes’ theorem has rational inference on the left end and physical causality on the right end; an equation with mind on one side and reality on the other.

### Conclusion

*An Intuitive Explanation of Bayes’ Theorem* is full of witticisms like “According to legend, one who fully grasped Bayes’ Theorem would gain the ability to create and physically enter an alternate universe using only off-the-shelf equipment and a short computer program. One who fully grasps Bayes’ Theorem, yet remains in our universe to aid others, is known as a Bayesattva.”

Give it a read.