7 Experimental Design
An experiment deliberately imposes some treatment (the specific condition applied) on individuals (experimental units or subjects if the experimental units are human) to measure their responses.
If an experiment is conducted improperly, or an observational study is held instead, it would mean that many variables are not controlled for so we cannot establish causation between the explanatory and response variables.
Response variables measure an outcome of a study.
Explanatory variables may help explain or predict changes in a response variable.
Because of the lack of control, we cannot be certain that variables other than the explanatory variable confound the response instead. There may be confounding occurring that makes us think that something is happening at first glance.
This website highlights many different quirky correlations that we can see between non-related variables https://www.tylervigen.com/spurious-correlations. As you should notice, there is obviously factors that are influencing the data to look like as one increases, the other increases.
Confounding occurs when the effects of two variables on a response variable cannot be separated from each other.
Confounding variables are variables other than the explanatory variable that may have an effect on the response variable.
When our goal is to understand cause and effect, an experiment is the only source of fully convincing data since results in observational studies typically have some confounding variable in play. For this reason, the distinction between observational study and experiment is one of the most important in statistics.
7.1 Experiment Principles
In general, the quality of experiments (their internal validity) can be judged based on the degree to which they have four things: comparison, randomization, control, and replication. Stronger internal validity gives us a better cause-effect link in our experiment. Whenever you are describing or evaluating the design of an experiment, you need to be sure to discuss all four of these!
These four determine how much internal validity we have in our experiment.
- Comparison
- Use a design that compares two or more treatments.
- Randomization
- Use chance to assign experimental units to treatments. Doing so helps create roughly equivalent groups of experimental units by balancing the effects of other variables among treatment groups.
- Control
- Keep other variables that might affect the response the same for all groups.
- Replication
- Use enough experimental units in each group so that difference in the effects of the treatments can be distinguished from chance differences between the groups.
The logic of a randomized comparative experiment depends on our ability to treat all the subjects the same in every way except for the actual treatments being compared. Good experiments, therefore, require careful attention to details to ensure that all subjects really are treated identically.
7.1.1 Placebos
The response to a dummy treatment is called the placebo effect. Subjects are given a placebo treatment to control for the placebo effect.
For example, If I tell someone that I am giving them an energy drink (when it in fact has doesn’t actually provide “energy”), and they feel like they have energy after, they have fallen for the placebo effect.
It’s well known that someone’s mental state can easily affect their physical state, so it’s important to control for the placebo effect. Typically, this applies to medicine settings, where you might give a pill with the actual medicine and a placebo (a pill with everything but the actual medicine) and conduct it in a blind.
Conducting an experiment in a blind means that you give treatments to patients without allowing them to know which treatment they are taking.
However, Whenever possible, experiments with human subjects take it a bit further and conduct their experiment in a double-blind, where neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received.
7.2 Experiment Designs
7.2.1 Completely Randomized Design
In a completely randomized design, the experimental units are assigned to the treatments completely by chance. This is similar to (but NOT the same as) a simple random sample (SRS), because in both cases we ignore other variables. Here’s the difference: In an SRS, we’re picking some people (our sample) to study, and ignoring the rest. In a completely randomized experiment, however, we already have our sample (the people in our experiment), and we’re randomly deciding how we’re going to study each person (or, which treatment they’re going to get). So in complete randomization, the randomization is in the assignment, not in the selection, of people in our study.
7.2.2 Randomized Block Design
In a randomized block design, the experimental units are first assigned to blocks according to the different types/status of the experimental units in the experiment. This is similar to stratified random sampling, however, we are not taking a sample. Any reference to stratified random sampling is wrong when describing an experiment design.
After each experimental unit is assigned to their block, the experiment is carried out in each block, where a completely randomized design is carried out within the block.
Afterwards, you compare and analyze results from each block and finally combine all results and analyze the differences between blocks.
7.2.3 Matched Pairs Design
A matched pairs design is a special case of a randomized block design that uses blocks of size 2. In this kind of design, you have to have “matched pairs.” In other words, you need to have two extremely similar individuals that make up each block. In some cases, you have a single person for each block and that person recieves both treatments in randomized order (because who is more similar to a person than themselves?).
7.3 Inference
The main purpose of experiments is to be able to infer something about what we did. Does A actually affect B? Is it true for anyone else other than the people we experimented on?
An observed effect so large that it would rarely occur by chance is said to be statistically significant. If we test something according to a single assumption that we make and find out that the data that we collect doesn’t really match up with that claim (if the chance of seeing data like the one we obtained is too low), then we’d say that it is statistically significant evidence.
7.3.1 Scope of Inference
The scope of inference refers to the type of inferences (conclusions) that can be drawn from a study. The types of inferences we can make (inferences about the population and inferences about cause-and-effect) are determined by two factors in the design of the study.
Were individuals randomly assigned to groups? | |||
---|---|---|---|
Yes | No | ||
Were individuals randomly selected from a population? |
Yes |
Can make inferences about the population Can make inferences about cause and effect (Rare in the real world) |
Can make inferences about the population Cannot make inferences about cause and effect (Some observational studies) |
No |
Cannot make inferences about the population Can make Inferences about cause and effect: (Most experiments) |
Cannot make inferences about the population Cannot make Inferences about cause and effect (Some observational studies) |
7.4 Free Response Tip
Free response questions will mainly challenge you to think about the 4 principles of experimental design. So make sure to have those to heart and mention as many as posible when asked how to improve an experiment or what is wrong with an experiment.
Any other concept is still free game to ask you to elaborate on.
Something could be on a FRQ that’s concrete would be needing to describe how to randomly assign experimental units to treatments. Make sure that you incorporate the same elements as a sampling description would: assigning numbers, a specific mechanism for randomly selecting numbers, and assigning corresponding to the numbers.
Here’s an example taken from the 2007 Exam:
As dogs age, diminished joint and hip health may lead to joint pain and thus reduce a dog’s activity level. Such
a reduction in activity can lead to other health concerns such as weight gain and lethargy due to lack of exercise. A study is to be conducted to see which of two dietary supplements, glucosamine or chondroitin, is more effective in promoting joint and hip health and reducing the onset of canine osteoarthritis. Researchers will randomly select a total of 300 dogs from ten different large veterinary practices around the country. All of the dogs are more than 6 years old, and their owners have given consent to participate in the study. Changes in joint and hip health will be evaluated after 6 months of treatment.Assuming a control group is added to the other two groups in the study, explain how you would assign the 300 dogs to these three groups for a completely randomized design.
- Assign each of the dogs a number 1-300.
- Use a random number generator to get 200 unique random numbers 1-300.
- Use the first 100 numbers generated to assign the corresponding dogs to the control group.
- Use the last 100 numbers generated to assign the corresponding dogs to a group that recieves glucosamine.
- Assign the remaining dogs to a group that recieves chrondroitin.
Another approach could be the “slips in the hat approach.” Be aware that regardless of approach, that you mention randomly assigned and that the method is descriptive enough to ensure that someone can reproduce it. Also make sure that you ensure that every individual has an equal chance of being assigned to some group.
- Assign each of the dogs a number 1-300 and write their numbers on slips of paper and place them in a box
- Mixing well, randomly take 100 slips out of the box and assign the corresponding dogs to the control group.
- Repeat step two and assign the corresponding dogs to the glucosamine group.
- Assign the remaining dogs to a group that recieves chrondroitin.