Archive for March, 2024

Monday Morning Math: Naive Bayes

March 25, 2024

Good morning! Last week I talked about Bayes’ theorem, which is a way of using the probability of B (assuming that you already know A) to find the probability of A (assuming that you already know B). As an example, you can use the probability that a person with a disease gets a positive test for that disease to find the probability that a person with a positive test actually has the disease, and (still and always surprising to me) those are not the same.

It turns out that Bayes’ theorem can also be used to determine if an email is spam! Here’s how it works. The email in question is made up of a bunch of words, and matters order the. But for this process, all the words are treated as independent, just a bunch of words in a pile; that assumption is what is behind the word “naive”. The Naive Bayes algorithm looks at this bunch of words, figures out the probability that a piece of spam has those words in it, and then uses Bayes’ theorem to turn that around and find the probability that an email with those words is spam! The math involved is about one step more complicated than Bayes’ theorem, maybe two (something called Laplace smoothing plays a role, so that a single word that never showed up in the example emails doesn’t drive the whole probability to zero), but it’s still the same basic idea of flipping probabilities around, in a modern application!
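For the curious, here is a minimal sketch of the idea in Python. It assumes a tiny made-up training set (the four example emails and the function names `train` and `classify` are hypothetical, just for illustration). It counts words for each label, uses add-one (Laplace) smoothing, and adds logarithms instead of multiplying lots of small probabilities. This is one common way to set it up, roughly in the spirit of the textbook, not necessarily exactly how the book presents it.

```python
import math
from collections import Counter

def train(docs):
    """docs is a list of (text, label) pairs; returns priors, word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        # "Bag of words": only counts matter, word order is thrown away
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {label: n / len(docs) for label, n in label_counts.items()}
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    """Pick the label with the largest log P(label) + sum of log P(word | label)."""
    scores = {}
    for label, prior in priors.items():
        counts = word_counts[label]
        total = sum(counts.values())
        score = math.log(prior)
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are simply skipped
            # Laplace (add-one) smoothing: a word missing from this label still gets a small probability
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# A tiny, made-up training set (purely illustrative)
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting moved to monday", "ham"),
    ("see you at lunch monday", "ham"),
]
model = train(training)
print(classify("claim your free money", *model))    # spam
print(classify("lunch meeting on monday", *model))  # ham
```

With only four training emails this is a toy, but the structure is the same as the real thing: a prior for each label, multiplied by a smoothed probability for every word in the pile.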

Thanks, S, for sharing this with me!
Sources: “Speech and Language Processing” by Daniel Jurafsky and James H. Martin, as explained by S.

Monday Morning Math: Bayes’ theorem

March 18, 2024

Good morning! Today’s post is the first of a two-parter, starting with Bayes’ theorem, named after the Reverend Thomas Bayes, who lived in England in the 1700s. Bayes’ theorem is a rule that lets you calculate probabilities that are sometimes surprising.

Let’s do an example. (I love this example – I use it each time I teach stats.) Let’s say that there is a disease that affects 1% of the population, and you have a test that is 95% accurate. This means if a person has the disease, there’s a 95% chance that the test is positive, and if the person doesn’t have the disease, there’s a 95% chance that the test is negative.

Now suppose someone takes the test and it comes back positive. What is the chance that person has the disease? It might seem like it’s 95% but it isn’t! It’s actually less than 25%. To find the probability, we can use Bayes’ theorem. Bayes’ theorem is usually written as P(A|B)=P(B|A)P(A)/P(B), but I’m going to replace A with 🤒 (for having the disease) and B with + (for having a positive test) and write it as
P(🤒 | +) = P(+ | 🤒) P(🤒) / P(+)

Here’s what everything means:

  • P(🤒 | +) is the probability that a person has the disease if they got a positive test. That’s what we’re trying to figure out.
  • P(+ | 🤒) is the probability that a person gets a positive test if they have the disease. That’s 95% because the test is 95% accurate.
  • P(🤒) is the probability that a person has the disease. That’s 1%, since 1% of the population has the disease.
  • P(+) is the probability that a person gets a positive test. That’s a little more complicated: If a person has the disease there’s a 95% chance they get a positive test, but if they don’t have the disease there’s still a 5% chance that the test would (incorrectly) come back positive. So P(+) is 95% of 1% plus 5% of 99%, which is 0.95% + 4.95% = 5.9%.

Putting this all together, we get that P(🤒 | +) = 95%*1%/5.9%, which is about 16.1%. In other words, even though the test is 95% accurate, if a person gets a positive test there’s only a 16.1% chance that they have the disease! This is about 1 out of 6, because out of every 100 people, only 1 person will have the disease (and they will probably get a positive test), but there will also be about 5 people without the disease who will (incorrectly) get a positive test result.
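If you’d like to check the arithmetic, here is a short Python sketch that plugs the numbers from the example into Bayes’ theorem, and then does the same calculation by counting people in a made-up population of 10,000.

```python
# Numbers from the example: 1% prevalence, test is 95% accurate both ways
p_disease = 0.01            # P(disease)
p_pos_given_disease = 0.95  # P(+ | disease)
p_pos_given_healthy = 0.05  # P(+ | no disease), the false positives

# P(+) = P(+ | disease) P(disease) + P(+ | no disease) P(no disease)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(disease | +) = P(+ | disease) P(disease) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(+) = {p_pos:.4f}")                          # P(+) = 0.0590
print(f"P(disease | +) = {p_disease_given_pos:.3f}")  # P(disease | +) = 0.161

# Same answer by counting people (a hypothetical population of 10,000)
population = 10_000
sick = population // 100                        # 100 people have the disease
true_pos = sick * 95 // 100                     # 95 of them test positive
false_pos = (population - sick) * 5 // 100      # 495 healthy people test positive
print(true_pos / (true_pos + false_pos))        # 95 / 590, about 0.161
```

The counting version is a nice way to see why the answer is so low: the 495 false positives swamp the 95 true ones.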

Isn’t that amazing? Next week we’ll go one step further, and see how this applies to your computer deciding whether an email is spam…

How many digits of π should NASA use?

March 11, 2024

Good morning! I’ve been looking up space things lately in honor of our upcoming Total Eclipse 🌑🌞, and ran across an article by NASA/JPL on how many digits of π are necessary for accurate calculations with space travel. It’s an interesting read, but was last updated in 2022, which means it’s out of date. Or is it? Let’s do some calculations in anticipation of ✨ Pi Day ✨ this coming Thursday and find out!

The driving force behind the question is Voyager 1. Fun fact: Voyager 1 was launched on September 5, 1977, two weeks after Voyager 2, but it took a faster route into space and by mid-December that year was further away from Earth than Voyager 2, and fourteen months later got to discover that Jupiter had a ring! (Don’t feel too bad for Voyager 2, though, since V2 is still the only spacecraft to have visited Uranus and Neptune. Everyone gets to contribute.) In 1998 Voyager 1 overtook Pioneer 10, which had been launched in 1972, and with that Voyager 1 became the furthest spacecraft from Earth. At this point, after over 46 years on the (space)road, Voyager 1 is about 15.14 billion miles from Earth, which is about 24.37 billion kilometers. We’ll round up to 25 billion kilometers to be safe. And, umm, because that makes the math easier.

Speaking of math, let’s do some calculations! If we think of r as Voyager 1’s distance from Earth, the circumference of a circle with that radius would be 2πr. Now let’s think of π as being made up of an approximation p plus some small error ∆p. This means the circumference would be 2(p+∆p)r = 2pr + 2r∆p, so the error in the circumference is 2r∆p. We’re using the circumference here as a proxy for how far off our estimate of Voyager 1’s exact location could be if we round π too much.

So to figure out how accurate our approximation for π needs to be in space travel computations, we can decide how much error we’re OK with, and divide that by 2r. If we want the circumference using Voyager 1’s distance to be accurate to 1 millimeter, which honestly is pretty good after close to 50 years of travel, then we’ll take that 1 millimeter and divide by 50 billion kilometers (which is twice the 25 billion km that V1 is from Earth). Putting everything in meters for an easier calculation, that’s 1×10⁻³ meters divided by 50×10¹² meters. Division gives (1/50)×10⁻¹⁵. Since 1/50 is 0.02 = 2×10⁻², our allowable error in the approximation for π is 2×10⁻¹⁷.
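Here’s the same calculation as a small Python sketch, using exact fractions so that computer rounding doesn’t sneak in. The 25 billion km and the 1 mm tolerance are the assumptions from above; the loop at the end finds the smallest number of decimal places n for which chopping π off after n places (an error less than 10⁻ⁿ) stays within the allowance.

```python
from fractions import Fraction

# Assumptions from the post: r rounded up to 25 billion km, 1 mm allowed error
r_m = 25 * 10**9 * 1000              # 25 billion km, in meters
allowed_error_m = Fraction(1, 1000)  # 1 millimeter, in meters

# Circumference error is 2 r Δp, so Δp can be at most allowed_error / (2 r)
max_delta_p = allowed_error_m / (2 * r_m)
print(float(max_delta_p))  # 2e-17

# Smallest n with 10^-n <= max_delta_p (truncating after n places is then safe)
n = 0
while Fraction(1, 10**n) > max_delta_p:
    n += 1
print(n)  # 17
```

Using `Fraction` instead of ordinary floats is a design choice for this sketch: the numbers involved are tiny, and exact arithmetic makes the comparison with 10⁻ⁿ trustworthy.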

What does all this mean? Well, it means that if our approximation for π is accurate to 17 decimal places, then our computations involving Voyager 1’s distance will be accurate to within a millimeter! (Even more, I think, because that 1mm is spread over the whole circumference, but we’re trying to keep things simple.) In other words, NASA can use 3.141 592 653 589 793 24. That last digit is rounded up from a 3, but we’re still OK whether we use a 3 or a 4 in that last spot. We can still know where Voyager 1 is.
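And here is a quick check of that last claim, sketched with Python’s `decimal` module. The 30-decimal value of π is typed in by hand as a reference; the code measures how far each 17-decimal approximation (ending in 3 or in 4) is from it and compares that to the 2×10⁻¹⁷ allowance, then rounds the reference value to 17 places.

```python
from decimal import Decimal

# π to 30 decimal places, typed in by hand as a reference value
PI = Decimal("3.141592653589793238462643383279")
max_delta_p = Decimal("2e-17")  # allowable error from the calculation above

errors = {}
for approx in ("3.14159265358979323", "3.14159265358979324"):
    error = abs(PI - Decimal(approx))
    errors[approx] = error
    # Both approximations are within the 2e-17 allowance
    print(approx, error, error <= max_delta_p)

# Rounding the reference value to 17 decimal places gives the version ending in 4
print(PI.quantize(Decimal("1e-17")))  # 3.14159265358979324
```

Ending in 3 is off by a bit under 10⁻¹⁷ and ending in 4 by under 2×10⁻¹⁸, so both lines print True.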

HAPPY PI DAY!!!

Motivation: https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimals-of-pi-do-we-really-need/
(although they approached it in the opposite way – picking 15 decimal places and seeing how far off the circumference using Voyager 1 would be, and similarly looking at distances over Earth or over the known universe!).

Monday Morning Math: Kathaleen Land

March 4, 2024

Good morning! Our mathematician this morning is Kathaleen Land, one of the Hidden Figures who worked at NASA. Kathaleen Land was the Sunday school teacher of Margot Lee Shetterly, the author of the book (and then movie) Hidden Figures, and she is sometimes referred to as the inspiration for the book. According to Shetterly, in 2010 she and her husband were visiting her parents in Hampton, Virginia, and had gone to her church, where she caught up with her former teacher. After they left, Shetterly’s father, himself a research scientist at the National Aeronautics and Space Administration’s Langley Research Center, pointed out that Mrs. Land had been a computer (someone who computes) at NASA. Shetterly had grown up knowing many of the computers, but now, as an adult, she found their story compelling in a new way. Kathaleen Land was one of the first women Margot Lee Shetterly interviewed, and Land pointed the way to other women whose contributions were not widely known.

Kathaleen Land herself was born Bonnie Kathaleen Pleasants in northern Virginia and moved to Hampton, Virginia, in 1941, around the time she married her husband, Stanley Land. They had three daughters and lived in Hampton, where Langley is located, for the rest of their lives. Kathaleen Land passed away in 2012 at the age of 93.

Sources: