2. Small Worlds and Large Worlds

Hard.

2H1. Suppose there are two species of panda bear. Both are equally common in the wild and live in the same places. They look exactly alike and eat the same food, and there is yet no genetic assay capable of telling them apart. They differ however in their family sizes. Species A gives birth to twins 10% of the time, otherwise birthing a single infant. Species B births twins 20% of the time, otherwise birthing singleton infants. Assume these numbers are known with certainty, from many years of field research. Now suppose you are managing a captive panda breeding program. You have a new female panda of unknown species, and she has just given birth to twins. What is the probability that her next birth will also be twins?

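We first need the marginal probability that our panda has twins, averaging over her unknown species. We then use that information to compute the probability that she will have twins again, given that she has already had one set. Because both species are equally common, $\Pr(A) = \Pr(B) = 0.5$, so $\Pr(\text{twins}) = \Pr(\text{twins} \mid A)\Pr(A) + \Pr(\text{twins} \mid B)\Pr(B) = 0.1 \times 0.5 + 0.2 \times 0.5 = 0.15$.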

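We can now use Bayes’ rule to update the probabilities that our panda is a member of species A or species B, given that she has had twins: $\Pr(A \mid \text{twins}) = \frac{0.1 \times 0.5}{0.1 \times 0.5 + 0.2 \times 0.5} = \frac{1}{3}$ and $\Pr(B \mid \text{twins}) = \frac{0.2 \times 0.5}{0.1 \times 0.5 + 0.2 \times 0.5} = \frac{2}{3}$.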

After the birth of twins, we now believe our panda is twice as likely to be a member of species B as of species A. We use these updated species probabilities, in place of the original 0.5 each, to recompute the probability that she will have twins: $\Pr(\text{twins}_2 \mid \text{twins}_1) = 0.1 \times \frac{1}{3} + 0.2 \times \frac{2}{3} = \frac{1}{6} \approx 0.17$.
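The three steps of 2H1 can be checked with a short Python sketch. This is an illustration rather than part of the original solution; the names `prior`, `p_twins`, `update`, and `prob_twins` are assumptions introduced here.

```python
# Sketch of the 2H1 calculation; all names here are illustrative.
prior = {"A": 0.5, "B": 0.5}    # both species are equally common
p_twins = {"A": 0.1, "B": 0.2}  # twin rates from the problem statement


def update(prior, likelihood):
    """Bayes' rule: multiply prior by likelihood, then normalise."""
    unnormalised = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalised.values())
    return {s: p / total for s, p in unnormalised.items()}


def prob_twins(belief):
    """Probability of twins, averaged over the current species belief."""
    return sum(belief[s] * p_twins[s] for s in belief)


posterior = update(prior, p_twins)  # belief after the first twin birth

print(prob_twins(prior))      # marginal probability of twins before any births
print(posterior)              # species probabilities after observing twins
print(prob_twins(posterior))  # probability that the next birth is also twins
```

The last value should come out close to 1/6. Passing `posterior` back into `prob_twins` is exactly the substitution described above, with the updated beliefs standing in for the original 50/50 split.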


2H2. Recall all the facts from the problem above. Now compute the probability that the panda we have is from species A, assuming we have observed only the first birth and that it was twins.

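We computed this probability while solving 2H1. It is simply the posterior probability of species A after the first birth: $\Pr(A \mid \text{twins}) = \frac{0.1 \times 0.5}{0.1 \times 0.5 + 0.2 \times 0.5} = \frac{1}{3} \approx 0.33$.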


2H3. Continuing on from the previous problem, suppose the same panda mother has a second birth and that it is not twins, but a singleton infant. Compute the posterior probability that this panda is species A.

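We can compute this probability by using the information we learned in 2H1. The key is to take the species probabilities computed after the first set of twins, $\Pr(A) = \frac{1}{3}$ and $\Pr(B) = \frac{2}{3}$, as the new prior. The likelihood of a singleton birth is $1 - 0.1 = 0.9$ for species A and $1 - 0.2 = 0.8$ for species B, so $\Pr(A \mid \text{twins}, \text{singleton}) = \frac{0.9 \times \frac{1}{3}}{0.9 \times \frac{1}{3} + 0.8 \times \frac{2}{3}} = 0.36$.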
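The same sequential update can be sketched in Python, processing one birth at a time so that each posterior becomes the prior for the next observation. The helper names `likelihood` and `update` are hypothetical and not part of the original solution.

```python
# Sketch of sequential updating for 2H3; names are illustrative.
P_TWINS = {"A": 0.1, "B": 0.2}  # twin rates from the problem statement


def likelihood(birth):
    """Probability of one birth outcome ('twins' or 'singleton') per species."""
    if birth == "twins":
        return dict(P_TWINS)
    if birth == "singleton":
        return {s: 1.0 - p for s, p in P_TWINS.items()}
    raise ValueError(f"unknown birth outcome: {birth!r}")


def update(belief, birth):
    """One step of Bayes' rule: the previous posterior acts as the prior."""
    like = likelihood(birth)
    unnormalised = {s: belief[s] * like[s] for s in belief}
    total = sum(unnormalised.values())
    return {s: p / total for s, p in unnormalised.items()}


belief = {"A": 0.5, "B": 0.5}  # equally common species
for birth in ["twins", "singleton"]:
    belief = update(belief, birth)

print(round(belief["A"], 2))  # 0.36
```

Reversing the order of the two births gives the same posterior, since the update only multiplies likelihoods together before normalising.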


2H4. A common boast of Bayesian statisticians is that Bayesian inference makes it easy to use all of the data, even if the data are of different types. So suppose now that a veterinarian comes along who has a new genetic test that she claims can identify the species of our mother panda. But the test, like all tests, is imperfect. This is the information you have about the test: the probability it correctly identifies a species A panda is 0.8, and the probability it correctly identifies a species B panda is 0.65.

The vet administers the test to your panda and tells you that the test is positive for species A. First ignore your previous information from the births and compute the posterior probability that your panda is species A. Then redo your calculation, now using the birth data as well.

It is helpful to lay out a table of the veterinarian’s possible test results. Each row is the panda’s true species, each column is the test result, and each row sums to 1:

$$\begin{array}{c|c|c} & \text{Test = A} & \text{Test = B} \\ \hline \text{Species A} & 0.8 & 0.2 \\ \text{Species B} & 0.35 & 0.65 \\ \end{array}$$

To complete the first part of the question, we ignore the births and start from our initial estimates, $\Pr(A) = \Pr(B) = 0.5$, then update on the positive test for species A: $\Pr(A \mid \text{test} = A) = \frac{0.8 \times 0.5}{0.8 \times 0.5 + 0.35 \times 0.5} = \frac{0.4}{0.575} \approx 0.70$.

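By including the information about our panda’s births, we can gain a better estimate of her species membership. All we need to do is use the information we learned in 2H3. The key is to use the probabilities computed after observing that our panda had twins and then a singleton, $\Pr(A) = 0.36$ and $\Pr(B) = 0.64$, as the prior for the test result: $\Pr(A \mid \text{test} = A, \text{births}) = \frac{0.8 \times 0.36}{0.8 \times 0.36 + 0.35 \times 0.64} = \frac{0.288}{0.512} \approx 0.56$.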
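Both parts of 2H4 can be checked with a short Python sketch that applies the same test likelihoods to two different priors. The names `P_TEST_SAYS_A` and `posterior_a` are assumptions made for this illustration, and the 0.36 prior is the 2H3 result.

```python
# Sketch of the 2H4 calculation; names are illustrative.
# Probability that the test reports "A", given the true species
# (the first column of the table above).
P_TEST_SAYS_A = {"A": 0.8, "B": 0.35}


def posterior_a(prior_a):
    """Posterior probability of species A after a test result of 'A'."""
    numerator = P_TEST_SAYS_A["A"] * prior_a
    denominator = numerator + P_TEST_SAYS_A["B"] * (1.0 - prior_a)
    return numerator / denominator


test_only = posterior_a(0.5)     # ignore the births: equal priors
with_births = posterior_a(0.36)  # prior is the posterior after twins, then a singleton

print(round(test_only, 4))    # 0.6957
print(round(with_births, 4))  # 0.5625
```

The only thing that changes between the two calls is the prior, which is what makes combining the genetic test with the birth records straightforward.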