The Statistical (And Related) Notions You Just Have to Know
Because intensive calculation is often part and parcel of the statistician’s tool set, many people have the misconception that statistics is about number crunching. Number crunching is just one small part of the path to sound decisions, however.
By shouldering the number-crunching load, software increases our speed of traveling down that path. Some software packages are specialized for statistical analysis and contain many of the tools that statisticians use.
Although not marketed specifically as a statistical package, Excel provides a number of these tools, which is why I wrote this book.
I said that number crunching is a small part of the path to sound decisions. The most important part is the concepts statisticians work with, and that’s what I talk about for most of the rest of this chapter.
Lucas Carnaúba de Oliveira
Samples and populations
On election night, TV commentators routinely predict the outcome of elections before the polls close. Most of the time they’re right. How do they do that? The trick is to interview a sample of voters after they cast their ballots. Assuming the voters tell the truth about whom they voted for, and assuming the sample truly represents the population, network analysts use the sample data to generalize to the population of voters.
This is the job of a statistician: to use the findings from a sample to make a decision about the population from which the sample comes. But sometimes those decisions don’t turn out the way the numbers predicted. History buffs are probably familiar with the memorable picture of President Harry Truman holding up a copy of the Chicago Daily Tribune with the famous, but wrong, headline “Dewey Defeats Truman” after the 1948 election. Part of the statistician’s job is to express how much confidence he or she has in the decision.
Another election-related example speaks to the idea of the confidence in the decision. Pre-election polls (again, assuming a representative sample of voters) tell you the percentage of sampled voters who prefer each candidate.
The polling organization adds how accurate it believes the polls are. When you hear a newscaster say something like “accurate to within three percent,” you’re hearing a judgment about confidence.
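To make the “within three percent” idea concrete, here is a minimal Python sketch of one common way to compute a poll’s margin of error. The polling organization’s actual method isn’t described here, so treat this as an assumption: it uses the textbook normal-approximation formula for a proportion, assumes a simple random sample, and assumes a 95 percent confidence level. The function name margin_of_error and the sample size of 1,067 are made up for illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    real polls often adjust for weighting and survey design.
    """
    # z is the standard normal cutoff for the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical poll: 50% of 1,067 sampled voters prefer a candidate
moe = margin_of_error(0.5, 1067)
print(f"{moe:.3f}")  # prints 0.030, that is, about three percent
```

Under these assumptions, a larger sample shrinks the margin of error, which is why bigger polls can claim tighter accuracy.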
Here’s another example. Suppose you’ve been assigned to find the average reading speed of all fifth-grade children in the U.S., but you haven’t got the time or the money to test them all. What would you do? Your best bet is to take a sample of fifth-graders, measure their reading speeds (in words per minute), and calculate the average of the reading speeds in the sample. You can then use the sample average as an estimate of the population average. Estimating the population average is one kind of inference that statisticians make from sample data. I discuss inference in more detail in the upcoming section “Inferential Statistics: Testing Hypotheses.”
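The reading-speed idea can be sketched in a few lines of Python. Everything numeric here is invented for illustration: the simulated population of 100,000 children, the typical speed of 120 words per minute, the spread of 25, and the sample size of 200 are all assumptions, not real data. In real life you would never see the population average, which is the whole reason for sampling; the simulation computes it only so you can compare it with the estimate.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of reading speeds (words per minute).
# The mean of 120 and spread of 25 are made-up values.
population = [random.gauss(120, 25) for _ in range(100_000)]

# In practice this number is unknown; it is the thing being estimated.
population_mean = statistics.fmean(population)

# Test only 200 children, chosen at random without replacement.
sample = random.sample(population, 200)
sample_mean = statistics.fmean(sample)

print(f"Population average: {population_mean:.1f}")
print(f"Sample average:     {sample_mean:.1f}")
```

The two averages should land close to each other but not match exactly. That gap is why a statistician reports an estimate together with a statement of confidence.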
Now for some terminology you have to know: Characteristics of a population (like the population average) are called parameters, and characteristics of a sample (like the sample average) are called statistics. When you confine your field of view to samples, your statistics are descriptive. When you broaden your horizons and concern yourself with populations, your statistics are inferential.
Now for a notation convention you have to know: Statisticians use Greek letters (μ, σ, ρ) to stand for parameters, and English letters (x̄, s, r) to stand for statistics. Figure 1-1 summarizes the relationship between populations and samples, and parameters and statistics.