Math 281 - Chapter 1
1-2
Statistics
Statistics is a collection of methods for:
- Planning experiments and surveys
- Obtaining data
- Organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on data (observations that have been collected)
Population/Sample
Population: the group to be studied
Sample: a subgroup of the population
In a census, we collect data from the entire population
In a sample, we collect data from only part of the population (usually to try to draw conclusions about the whole population)
What’s the population & sample?
A congressperson reads 350 letters from her constituents about gun control
A pollster asks 30 adults at the mall about their shopping preferences
Fox News does a poll, and reports the opinions of the 2500 people who called in.
Researchers test out a new cancer drug on 100 men with prostate cancer.
Parameters and Statistics
A parameter is a numerical measurement describing some characteristic of a population
A statistic is a numerical measurement describing some characteristic of a sample
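As a rough illustration of the distinction, the sketch below builds a small made-up population of heights (the numbers are hypothetical, not from any study) and compares the population mean, which is a parameter, with the mean of a random sample, which is a statistic.

```python
import random
import statistics

# Hypothetical population: heights (in inches) of all 12 members of a club.
population = [62, 64, 65, 66, 67, 68, 68, 69, 70, 71, 72, 74]

# Parameter: describes the whole population.
population_mean = statistics.mean(population)

# Statistic: describes a sample drawn from the population.
rng = random.Random(1)               # fixed seed so the run is repeatable
sample = rng.sample(population, 4)   # 4 members chosen at random
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):    ", sample_mean)
```

The parameter is fixed for a given population, while the statistic changes from sample to sample.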
Parameter or Statistic?
Of the 350 letters read, 250 wanted more restriction on gun sales
Of the 30 people questioned, 20 were shopping for themselves
Americans spend $23 billion shopping online this holiday season.
In the last election, 82% of registered voters voted in Washington
Qualitative/Quantitative
Qualitative Data is data that only has qualities (categories or names), but no numbers.
Examples: Eye color, Yes/No, type of car
Quantitative Data is data that is a numerical quantity (a number, usually with units).
Examples: Height, weight, hours slept
Discrete and Continuous Data
Discrete Data comes from a finite or countable set of possibilities.
Example: How many kids do you have? Possible answers are 0,1,2,…, but not 1.3
Continuous Data comes from infinitely many possible values on a continuous scale, with no gaps between the possible values.
Example: Your car speedometer reading. It is possible to be going 20mph, 20.5mph, 20.01mph, etc.
Levels of Measurement
Nominal – all we have is names
Ordinal – we can also order
Interval – also, differences are meaningful
Ratio – also, ratios are meaningful
What’s the level of measurement?
Salaries of corporate executives
Your “now serving” number at the DMV
People’s area codes
Record time for 100m dash
People’s birth state
Daily low temperatures
Birth years
Homework 1-2
1-2: 1, 3, 5, 9, 11, 13, 15, 17
1-3
Misuses of Statistics
Voluntary response sample (self-selected):
Internet, call-in, magazine polls
Small samples:
“4 out of 5 dentists prefer…”
Self-Interest Study:
Company-sponsored studies
Precise Numbers:
“People spent an average of $23.12 on a gift”
Questions
Loaded questions
Order of questions
Refusals (non-response)
Misleading Graphs
Graphs with different scales: two graphs show the same data, salaries increasing from $2100 in 1994 to $2300 in 2004.
In the first graph, the scale for salaries runs from $0 to $3000, making the difference look very small.
In the second graph, the scale runs from $2000 to $2350, making the difference look very large.
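The effect of the axis scale can be put in numbers. The sketch below assumes each salary is drawn as a bar (or point) whose height is measured from the bottom of the axis; `bar_height` is a hypothetical helper written for this illustration.

```python
def bar_height(value, axis_min, axis_max):
    """Fraction of the plot height that a bar for `value` fills on this axis."""
    return (value - axis_min) / (axis_max - axis_min)

salary_1994, salary_2004 = 2100, 2300

# Same data, two different vertical scales.
for axis_min, axis_max in [(0, 3000), (2000, 2350)]:
    h_1994 = bar_height(salary_1994, axis_min, axis_max)
    h_2004 = bar_height(salary_2004, axis_min, axis_max)
    ratio = h_2004 / h_1994
    print(f"axis ${axis_min}-${axis_max}: "
          f"2004 bar is {ratio:.2f}x as tall as the 1994 bar")
```

On the $0 to $3000 axis the bars differ by about 10%, matching the real increase; on the $2000 to $2350 axis the 2004 bar is three times as tall, although the data are unchanged.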
Misleading Pictographs
A pictograph represents a number with a picture. The size of the picture represents the size of the number.
In this example, Worker salaries of $2000 a month are compared with Manager salaries of $4000 a month. The picture representing the salary for managers is twice as wide (and, scaled proportionally, twice as tall), making the picture actually 4 times as big in area.
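The arithmetic behind the distortion is short. This sketch assumes the manager picture is scaled proportionally, so its height doubles along with its width.

```python
worker_salary, manager_salary = 2000, 4000

scale = manager_salary / worker_salary    # 2.0: the comparison we intend to show
width_ratio = scale                       # picture drawn twice as wide
height_ratio = scale                      # assumed: height scaled to match
area_ratio = width_ratio * height_ratio   # what the eye actually compares

print("salary ratio:", scale)
print("area ratio:  ", area_ratio)
```

A salary that is 2 times as large ends up drawn with 4 times the area.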
Percentages
A percentage has to be of something
100% means all of it
Percents are based on a “base” amount:
10% extra savings on clearance merchandise (50% off).
$100 coat -> $50 clearance price
10% extra -> $45 final, not $40
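The coat example can be checked step by step. The point of the sketch is that the extra 10% is taken from the clearance price (the new base), not from the original price.

```python
original_price = 100.00

clearance_price = original_price * (1 - 0.50)   # 50% off the original price
final_price = clearance_price * (1 - 0.10)      # extra 10% off the clearance price

# Common mistake: adding the percents as if both used the original price as base.
wrong_price = original_price * (1 - 0.50 - 0.10)

print(f"clearance price: ${clearance_price:.2f}")
print(f"final price:     ${final_price:.2f}")
print(f"if the percents simply added: ${wrong_price:.2f}")
```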
Correlation and Causation
Just because two things have a relationship does NOT mean one causes the other.
Even when a causal relationship is likely, we have to be careful about assuming what is the cause, and what is the effect.
Examples: Golf scores and salary for CEOs
Prozac and Suicide risk
Homework
1-3: 3, 5, 11, 17
1-4
Types of Studies
Observational Study:
Observes and measures characteristics without trying to modify the subjects being studied
Experiment:
Impose a treatment on the subjects, then observe the response
Types of Observational Studies
Cross-sectional Study:
Data are observed, measured, and collected at one point in time.
Example: What percentage of people own dogs?
Most polls are cross-sectional studies
Retrospective (or case control) Study:
Data are collected from the past.
Example: What was the average rainfall in 1994?
Prospective (or Longitudinal or Cohort) Study:
Data are collected in the future from groups (cohorts) sharing similar characteristics.
Example: What percentage of dogs who attend an obedience class are still well-behaved 2 years later?
Confounding
When it’s not possible to distinguish the effects of each factor (i.e., which factor caused the outcome?)
Usually, when there are multiple differences between comparison groups
Confounding can be avoided by good study design
Examples of Confounding
Example: A middle school implements a new math curriculum. It also encourages parent participation and offers after-school tutoring. An improvement in performance results.
Example: An experiment is done to determine if students perform better on tests while listening to music. Each subject is given two similar tests; the first in silence, and the second while listening to music. Performance is higher on the second test.
Ways to control confounding
Blocks:
Create groups with similar characteristics.
Ideally identical in every way except the factor being compared
Blinding:
Subjects don’t know if they’re receiving a treatment or placebo
Double-blinding:
Neither the subjects nor the experimenters know which subjects are receiving the treatment
Experiment Design (How to create blocks)
Completely randomized experimental design:
Subjects are assigned to groups based on a process of random selection
Rigorously controlled experimental design:
Subjects are very carefully chosen and assigned to groups so they have similar characteristics
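A completely randomized design can be sketched as a shuffle followed by a split. The subject IDs and the group size below are hypothetical, and the fixed seed is only there to make the example repeatable.

```python
import random

# Hypothetical subject IDs; in practice these would be the enrolled subjects.
subjects = [f"S{i:02d}" for i in range(1, 21)]

rng = random.Random(42)   # fixed seed only so the example is repeatable
shuffled = subjects[:]    # copy, so the original list is left alone
rng.shuffle(shuffled)     # chance alone decides the order

half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print("treatment:", sorted(treatment_group))
print("control:  ", sorted(control_group))
```

Because chance decides who lands in each group, other differences between subjects tend to balance out across the groups rather than line up with the treatment.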
Sample size
Sample must be large enough to reveal the true nature of any effects
Large samples do not make up for bad samples; sample must be selected appropriately for results to be valid.
Random Sampling
Random Sample:
Members of the population are chosen so that each individual has an equal likelihood of being chosen
Simple Random Sample:
A special random sample where every possible sample of the same size is equally likely
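The difference between the two definitions is easier to see with a small made-up population. In the sketch below, both methods give every member the same chance of being chosen, but only the first can produce every possible group of 5.

```python
import random

rng = random.Random(0)
population = list(range(1, 11))   # hypothetical members numbered 1..10

# Simple random sample: every possible group of 5 is equally likely.
srs = rng.sample(population, 5)

# Random but not simple: flip a coin to take either the first half or the
# second half. Each member has a 1/2 chance of being chosen, but a mixed
# group such as [1, 2, 3, 9, 10] can never be selected.
first_half, second_half = population[:5], population[5:]
coin_flip_sample = first_half if rng.random() < 0.5 else second_half

print("simple random sample:", sorted(srs))
print("coin-flip sample:    ", coin_flip_sample)
```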
Other types of sampling
Systematic sampling:
Population is ordered, and every kth element is chosen.
This is only random sampling if the starting element is randomly chosen
Stratified sampling:
Population is divided into groups with similar characteristics, and a sample is chosen from each subgroup (stratum).
This is only random sampling if the sample from each subgroup is chosen randomly
Cluster sampling:
Divide the population into sections, or clusters. Select a group of clusters, and use all members of those clusters.
This is only random sampling if the clusters to be used are selected randomly
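The three methods above can be sketched on one small population. Everything here is hypothetical: 24 made-up students, with class year used as both the strata and the clusters purely to keep the example short.

```python
import random

rng = random.Random(7)
years = ["fr", "so", "jr", "sr"]
# Hypothetical population: 24 students, 6 in each class year, listed in order.
population = [(f"student{i:02d}", years[i // 6]) for i in range(24)]

# Systematic: random starting element, then every k-th element.
k = 4
start = rng.randrange(k)
systematic = population[start::k]

# Stratified: a random sample of 2 from each class year (stratum).
strata = {}
for name, year in population:
    strata.setdefault(year, []).append((name, year))
stratified = [person for group in strata.values()
              for person in rng.sample(group, 2)]

# Cluster: randomly pick 2 class years and keep everyone in them.
chosen_years = rng.sample(sorted(strata), 2)
cluster = [person for year in chosen_years for person in strata[year]]

print("systematic:", [name for name, _ in systematic])
print("stratified:", [name for name, _ in stratified])
print("cluster:   ", [name for name, _ in cluster])
```

Stratified sampling takes a few members from every group, while cluster sampling takes every member from a few groups.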
Not-so-good sampling methods
Voluntary response sample
Convenience sample:
Choosing whoever’s handy
Sources of Error
Sampling error:
The difference between the sample result and the population result, caused by chance fluctuations
Non-sampling error:
Error caused by problems in collecting, recording, and analyzing the data (like broken tools, typos, or miscalculations), or by a biased sample (bad sample selection)
Differences in error:
Sampling error is natural and unavoidable. We must consider it when analyzing our data, but we cannot eliminate it.
Non-sampling error is avoidable, and every effort should be taken to do so.
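Sampling error can be seen in a small simulation. The population below is made up, and `sampling_errors` is a hypothetical helper: it draws repeated simple random samples and records how far each sample mean lands from the population mean.

```python
import random
import statistics

rng = random.Random(3)
# Hypothetical population of 1,000 values; the numbers are made up.
population = [rng.gauss(50, 10) for _ in range(1000)]
population_mean = statistics.mean(population)

def sampling_errors(sample_size, repeats=200):
    """Sample mean minus population mean, for repeated simple random samples."""
    return [statistics.mean(rng.sample(population, sample_size)) - population_mean
            for _ in range(repeats)]

for n in (10, 100):
    errors = sampling_errors(n)
    typical = statistics.mean(abs(e) for e in errors)
    print(f"n = {n:3d}: typical sampling error is about {typical:.2f}")
```

Nothing is wrong with the sampling method here, yet the sample means still miss the population mean by chance. Larger samples tend to miss by less, but the error does not disappear.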
Homework
1-4: 1-21 odd