1. Intro to Stats and Collecting Data
Intro to Stats
Problem 1.1.28
Textbook Question
In Exercises 25–28, refer to the data in the table below. The entries are for five different years, and they consist of weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population [based on data from “The Trouble with QSAR (or How I Learned to Stop Worrying and Embrace Fallacy)” by Stephen Johnson, Journal of Chemical Information and Modeling, Vol. 48, No. 1].
[IMAGE]
Conclusion: If we were to use the sample data and conclude that there is a correlation or association between lemon imports and crash fatality rates, does it follow that lemon imports are the cause of fatal crashes?

Step 1: Understand the problem. The question asks whether a correlation or association between two variables (lemon imports and car crash fatality rates) implies causation. The underlying principle is that correlation alone does not establish causation.
Step 2: Define the key terms. Correlation refers to a statistical relationship between two variables, which can be positive, negative, or zero. Causation, on the other hand, implies that one variable directly affects the other. It is important to distinguish between these two concepts.
Step 3: Analyze the data. Look at the data provided in the table. Calculate the correlation coefficient (r) to determine the strength and direction of the linear relationship between lemon imports and car crash fatality rates. Use the formula for Pearson's correlation coefficient: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$. Here, x and y represent the two variables, and x̄ and ȳ are their respective means.
Step 4: Interpret the correlation coefficient. If the correlation coefficient is close to 1 or -1, it indicates a strong linear relationship. If it is close to 0, it indicates a weak linear relationship or none at all. However, even a strong correlation does not imply causation. Other factors, such as confounding variables, could be influencing the relationship.
Step 5: Draw a conclusion. Based on the statistical analysis and the concept of correlation versus causation, explain that even if a correlation is found between lemon imports and car crash fatality rates, it does not mean that lemon imports cause fatal crashes. This could be an example of a spurious correlation, where two variables appear to be related but are not causally connected.
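
The calculation in Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not the textbook's worked solution: the function name `pearson_r` is our own, and the two data lists are hypothetical placeholder values, since the original table is only available as an image. Substitute the real table values to reproduce the exercise.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical placeholder values (NOT the values from the textbook table)
lemon_imports = [100.0, 200.0, 300.0, 400.0, 500.0]   # metric tons
fatality_rate = [16.0, 15.8, 15.5, 15.4, 15.0]        # per 100,000 population

r = pearson_r(lemon_imports, fatality_rate)
print(f"r = {r:.3f}")
```

With these placeholder values r comes out strongly negative, which mirrors the situation the exercise describes: a strong association in the numbers that still says nothing about whether lemons affect crashes.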

Key Concepts
Here are the essential concepts you must grasp in order to answer the question correctly.
Correlation vs. Causation
Correlation refers to a statistical relationship between two variables, indicating that they tend to move together. However, this does not imply that one variable causes the other. Understanding this distinction is crucial, as it helps prevent erroneous conclusions about the nature of the relationship, especially in observational data where confounding factors may be present.
Confounding Variables
Confounding variables are external factors that may influence both the independent and dependent variables, potentially leading to misleading interpretations of data. In the context of the question, other factors could affect both lemon imports and car crash fatality rates, making it essential to identify and control for these variables to draw valid conclusions.
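
A small simulation can make the idea concrete. The sketch below is an illustration under assumed, made-up parameters, not an analysis of the lemon data: a hidden variable `z` (for example, a time trend) drives both `x` and `y`, which have no direct link to each other. The helper names `pearson_r` and `residuals` are our own.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, zs):
    """What is left of ys after removing a least-squares line fitted on zs."""
    n = len(zs)
    z_bar = sum(zs) / n
    y_bar = sum(ys) / n
    slope = (sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
             / sum((z - z_bar) ** 2 for z in zs))
    return [(y - y_bar) - slope * (z - z_bar) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical confounder z drives both x and y; x never influences y.
z = [rng.uniform(0, 10) for _ in range(2000)]
x = [3.0 * zi + rng.gauss(0, 1) for zi in z]
y = [-2.0 * zi + rng.gauss(0, 1) for zi in z]

r_raw = pearson_r(x, y)                                   # ignores z
r_adjusted = pearson_r(residuals(x, z), residuals(y, z))  # controls for z

print(f"raw correlation:      {r_raw:.3f}")
print(f"after adjusting for z: {r_adjusted:.3f}")
```

The raw correlation is strong even though `x` has no effect on `y`. Once the linear effect of `z` is removed from both variables, the remaining correlation is close to zero, which is the sense in which controlling for a confounder can dissolve an apparent relationship.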
Statistical Significance
Statistical significance assesses whether an observed relationship in the data could plausibly be due to chance alone. A statistically significant result means that a correlation this strong would be unlikely if the two variables were truly unrelated. However, it is important to remember that statistical significance does not imply practical significance or causation, which must be evaluated through further analysis.
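
One way to check significance with only five data points is an exact permutation test, sketched below. This is a swapped-in technique chosen for illustration, not necessarily the method the textbook uses, and the data are hypothetical placeholder values rather than the values in the table. The function names are our own.

```python
import itertools
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys):
    """Exact two-sided permutation p-value for the correlation of xs and ys.

    Counts the share of all reorderings of ys whose |r| is at least as
    large as the observed |r|. Only practical for small samples.
    """
    observed = abs(pearson_r(xs, ys))
    count = 0
    total = 0
    for shuffled in itertools.permutations(ys):
        total += 1
        # Small tolerance guards against floating-point ties
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical placeholder values (NOT the values from the textbook table)
lemon_imports = [100.0, 200.0, 300.0, 400.0, 500.0]
fatality_rate = [16.0, 15.8, 15.5, 15.4, 15.0]

p_value = permutation_p_value(lemon_imports, fatality_rate)
print(f"p-value = {p_value:.4f}")
```

A small p-value here only says that an association this strong would rarely arise from random pairing of the values. It is silent on why the association exists, so it cannot by itself support the claim that lemon imports cause or prevent fatal crashes.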