12  Confidence Intervals

Confidence Interval

A Confidence Interval is an interval of plausible values for the population parameter and is used to estimate the parameter.

The interval is calculated from the data and has the form \[\text{point estimate} \pm \text{margin of error}\]

Margin of Error

The margin of error is calculated from the confidence level (\(C\)) and the data, and has the form

\[ME = m = \text{critical value} \cdot \text{standard error of the statistic}\]

The difference between the point estimate (the value of the statistic from a sample, which provides an estimate of the population parameter) and the true parameter value will be less than the margin of error in \(C\)% of samples.
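As a concrete illustration of this form, here is a minimal Python sketch that computes a margin of error for a sample proportion; the sample counts and the 95% confidence level are hypothetical, and scipy's norm.ppf supplies the critical value.

    from math import sqrt
    from scipy.stats import norm

    # Hypothetical sample: 120 "successes" out of 200 observations
    n, successes = 200, 120
    p_hat = successes / n            # point estimate of the population proportion

    C = 0.95                         # confidence level
    z_star = norm.ppf((1 + C) / 2)   # critical value for a C-level z-interval (about 1.96)

    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    margin_of_error = z_star * se

    print(f"point estimate = {p_hat:.3f}, margin of error = {margin_of_error:.3f}")
    print(f"interval: ({p_hat - margin_of_error:.3f}, {p_hat + margin_of_error:.3f})")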

12.1 Interpreting Confidence Intervals

You can say something of the form:

Confidence Interval Interpretation

I am [\(C\)]% confident the interval from [lower bound] to [upper bound] captures the [true parameter in the context of the problem].

Warning

You CANNOT say that “There is a 95% chance that this interval contains the population (proportion or mean)” because your single interval either did or did not capture the real value, so that probability is either 0 or 1.

12.2 Interpreting Confidence Level

Confidence Level Interpretation

If we take many, many samples and create many, many intervals using the same method, about [\(C\)]% of them will capture the [true parameter in the context of the problem].

In other words, you can say that: “95% of intervals created in this manner will capture the true population (proportion or mean)”
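One way to see this interpretation in action is a quick simulation: repeatedly sample from a population with a known proportion, build a 95% z-interval from each sample, and count how often the interval captures the true value. The sketch below uses a hypothetical true proportion and sample size of my choosing.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    p_true = 0.40        # hypothetical true population proportion
    n = 100              # observations per simulated sample
    C = 0.95
    z_star = norm.ppf((1 + C) / 2)

    reps = 10_000
    captured = 0
    for _ in range(reps):
        p_hat = rng.binomial(n, p_true) / n              # simulated sample proportion
        me = z_star * np.sqrt(p_hat * (1 - p_hat) / n)   # margin of error for this sample
        if p_hat - me <= p_true <= p_hat + me:
            captured += 1

    print(f"capture rate over {reps} intervals: {captured / reps:.3f}")   # lands near 0.95

With these settings the capture rate should land close to 95%; when the sample size is small or the true proportion is extreme, it can dip below the nominal level, which is part of why the Large condition in 12.4 matters.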

12.3 Changing Confidence Levels

When we increase the confidence level, we are requiring that more of the intervals created in this manner capture the true parameter. To achieve this, the confidence intervals must increase in width.

When we increase the size of a sample, the precision of our statistic increases because sampling variability is reduced (the standard deviation of the statistic decreases as \(n\) increases). This means we can estimate the parameter with a smaller range than we could with a smaller sample. So, when we increase the size of our samples, our confidence intervals become narrower.
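Both effects are easy to check numerically. Here is a minimal sketch (the sample proportion 0.57 is hypothetical) that prints the width \(2\cdot z^\ast\cdot SE\) of a one-sample z-interval for a few confidence levels and sample sizes:

    from math import sqrt
    from scipy.stats import norm

    p_hat = 0.57   # hypothetical sample proportion

    def interval_width(C, n):
        # Width of a one-sample z-interval for a proportion: 2 * z* * SE
        z_star = norm.ppf((1 + C) / 2)
        return 2 * z_star * sqrt(p_hat * (1 - p_hat) / n)

    for C in (0.90, 0.95, 0.99):       # higher confidence level -> wider interval
        print(f"C = {C:.0%}, n = 75:   width = {interval_width(C, 75):.3f}")

    for n in (75, 300, 1200):          # larger sample -> narrower interval
        print(f"C = 95%, n = {n:4d}:  width = {interval_width(0.95, n):.3f}")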

12.4 Conditions for Confidence Intervals

1. Random

The data must come from a random sample, or random assignment in an experiment.

In order to infer about a larger population, we must have a random, unbiased sample from that population. If the sample is not random, we need to think about whether it’s at least unbiased and/or representative. If so, we may be able to treat it as a random sample, but we must state that we’re doing this. If you’re not told that a sample is random, you can write something like “I’m going to have to assume I can treat this as a random sample”.

In the case of experiments, we can perform statistical tests on the data from our experimental groups, even if we don’t have a random sample from a larger population. In this case, we must make sure we have internal validity in our experiment (a well-conducted experiment with control, replication, and random assignment of treatments). If we do, our results are valid for our subjects, but we may or may not be able to generalize to a larger population. This depends on whether our subjects are a random sample from, or at least representative of, some larger population.

2. Large

You must check the following to determine whether the sample is large enough for the sampling distribution to be approximately normal.

Proportions

For proportions, remember, the population distribution is never anywhere near normal (it’s always two bars, yes and no). In this case we check the Large Counts Condition: \(n\hat{p}\ge10\) and \(n\left(1-\hat{p}\right)\ge10\).

This ensures that our sample proportion can take on enough different values (and make enough bars in a histogram) to create an approximately normal sampling distribution.
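This check is simple enough to encode directly; here is a tiny helper (the function name is mine, not from the course material):

    def large_counts_ok(n, p_hat, threshold=10):
        # Large Counts Condition: the counts of successes (n * p_hat) and
        # failures (n * (1 - p_hat)) must both be at least the threshold (10).
        return n * p_hat >= threshold and n * (1 - p_hat) >= threshold

    # The worked example later in this chapter: 43 "yes" answers out of 75
    print(large_counts_ok(75, 43 / 75))   # True -- the counts are 43 and 32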

Sample means

In the case of sample means, the sampling distribution is approximately normal if at least one of the following conditions is met (a rough way to organize this check is sketched after the list).

  • Normal/Large Condition: the population distribution is stated to be normal, or the sample size is large (\(n\ge30\)).

  • When \(n<30\) and the shape of the population is unknown, graph the sample data and check that it shows no strong skew or outliers.
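The sketch below is one rough way to organize that check; the function name and the 1.5·IQR outlier flag are my own additions (a hint only, not a substitute for actually graphing the data):

    import numpy as np

    def normal_large_check(sample, population_normal=False):
        # Rough organizer for the Normal/Large condition for sample means.
        n = len(sample)
        if population_normal:
            return "OK: the population distribution is stated to be normal."
        if n >= 30:
            return f"OK: n = {n} >= 30, so the sampling distribution is approximately normal."
        # n < 30 and shape unknown: flag potential outliers with the 1.5*IQR rule,
        # then graph the data and judge the skew yourself.
        q1, q3 = np.percentile(sample, [25, 75])
        iqr = q3 - q1
        outliers = [x for x in sample if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
        return f"n = {n} < 30: graph the data; potential outliers flagged: {outliers}"

    # Hypothetical small sample with one suspicious value
    print(normal_large_check([12.1, 11.8, 13.0, 12.4, 40.2, 12.6]))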

3. Sampling Independence

Observations in our sample must be independent of each other.

In random samples from a population, observations are never independent because the population changes with every person we sample and remove from it. However, this effect is small enough to ignore as long as the population from which we’re sampling is at least 10 times as large as our sample. This needs to be stated!

In experiments, or other situations where we’re not sampling from a population (rolls of dice for example), we just need to think about whether the outcome of one observation could be having any effect on the outcome of another observation. Often we don’t know this, so we need to state that we’re assuming it; “I’m going to have to assume I can treat these observations as independent.”

12.5 The Four-Step Process

  1. STATE exactly what you’re doing.
State

Be sure to be specific here, especially with regard to defining the actual numbers you will be analyzing. You need to demonstrate your understanding of the population about which you’re inferring, the procedure you’re using, and the parameter of interest.

I will estimate, with 92% confidence, \(p =\) the proportion of all non-solar Berkeley homeowners who would install solar panels if they knew that they could recover the cost in 10 years.

  2. PLAN which method you will use and the conditions required to use the method.
Plan

State the name of the procedure that you’ll attempt.

Check the conditions, and if the conditions are not met, or if you can’t be sure they’re met, you need to explain what needs to be true in order for the conditions to be met.

If the conditions are met, I will use a one-sample z-interval for the population parameter \(p\).

Random Sample: Yes, it’s stated that the sample is random.

Large Sample: \(n\hat{p}=75\left(\frac{43}{75}\right)=43\) and \(n\left(1-\hat{p}\right)=75\left(1-\frac{43}{75}\right)=32\)

Both counts are at least 10, so the sample is large enough for the sampling distribution of \(\hat{p}\) to be approximately normal.

Independence: We have independence as long as the population of non-solar Berkeley homeowners is at least 750 (10 times the sample size of 75). This is definitely true.

  3. DO the calculations
Do

Perform the calculations necessary for the method you chose to use. Show all work!

\[ \begin{aligned} z^\ast &= \mathtt{invNorm((1 - 0.92) / 2 + 0.92)} \approx 1.75 \\ \\ CI_{92}&=\hat{p}\pm z^\ast\cdot SE_{\hat{p}} \\ &=\hat{p}\pm1.75\cdot\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}} \text{, and we know that } n=75 \text{ and } \hat{p}=\frac{43}{75}=0.5733 \\ &=0.5733\pm1.75\sqrt{\frac{0.5733\left(1-0.5733\right)}{75}} \\ &=\left[0.4734,\ 0.6732\right] \end{aligned} \]
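If you want to double-check the calculator work, here is a minimal Python sketch (scipy's norm.ppf plays the role of invNorm):

    from math import sqrt
    from scipy.stats import norm

    C = 0.92
    n = 75
    p_hat = 43 / 75

    z_star = norm.ppf((1 - C) / 2 + C)   # same input as invNorm above; about 1.75
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    me = z_star * se

    print(f"z* = {z_star:.4f}")
    print(f"92% CI: [{p_hat - me:.4f}, {p_hat + me:.4f}]")
    # Matches the hand calculation above, up to rounding of z* and p_hat.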

  4. CONCLUDE in the context of the situation.
Conclude

Interpret the confidence interval in the context of the problem.

I am 92% confident that the interval from 47.34% to 67.32% captures \(p\) (as defined in State).