Why Divide by n − 1? Let a population consist of the values 9 cigar...

Question

Why Divide by n − 1? Let a population consist of the values 9 cigarettes, 10 cigarettes, and 20 cigarettes smoked in a day (based on data from the California Health Interview Survey). Assume that samples of two values are randomly selected with replacement from this population. (That is, a selected value is replaced before the second selection is made.)

d. Which approach results in values that are better estimates of σ²: part (b) or part (c)? Why? When computing variances of samples, should you use division by n or n − 1?


Accepted Answer

Welcome back, everyone. In this problem, a population consists of 4 hours, 7 hours, and 15 hours of study time per week, based on a student survey. Assume that samples of two values are randomly selected with replacement from this population, and list all possible pairs. For each of the 9 possible samples, calculate the variance in two ways: treating the sample as a population, that is, dividing by n, and treating it as a sample, dividing by n minus 1. Then compute the mean of the variances in both cases. Which approach results in values that are better estimates of the true population variance, and why?

A says dividing by n is better because it ensures the variance is always slightly lower, making the estimate more stable. B says dividing by n minus 1 is better because it corrects for the bias that occurs when estimating population variance from a sample. C says both approaches give the same result, so it doesn't matter whether you divide by n or n minus 1. And D says neither approach provides a good estimate; a larger sample size is required to estimate variance accurately.

Essentially, what we're doing here is finding all of the possible pairs for our randomly selected values of 4, 7, and 15 hours of study time. We're calculating the variance in two different ways, and then the mean of those variances in each case. Once we have those results, we'll figure out which approach gives the better estimates. So let's make a table to track all the data we're going to generate.

For starters, recall that if we're treating the sample as a population, the variance is equal to the sum of the squared deviations of each data point from the mean, divided by n.
But if we're treating it as a sample, the variance is equal to the same sum of squared deviations, now divided by n minus 1. So there are a few things we need to get here. First, we need to generate our samples. Then we need to calculate the sample mean, x̄, for each sample. Once we have the sample mean, we can find the squared deviations: for the first value in the sample we subtract the sample mean and square the result, (x1 − x̄)², and we do the same for the second value, (x2 − x̄)². Then we can compute the variance in both ways: the population variance, where we divide by n, and the sample variance, where we divide by n minus 1. Those are the columns of our table, and they give us what we need to compute the mean in each scenario. Since each sample has two values, n is equal to 2 in both formulas.

So, let's generate our samples. Remember, we were given 4, 7, and 15. What are all the possible pairs, with replacement, that we can get from that list? One sample could be (4, 4), another (4, 7), and another (4, 15). Then we could get (7, 4), (7, 7), and (7, 15). And we could also get (15, 4), (15, 7), and (15, 15). Those are all the possible samples.
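The setup so far can be sketched in Python. This is a minimal illustration and not part of the original video: the function names `variance_n` and `variance_n_minus_1` are hypothetical helpers introduced here, and `itertools.product` is used to enumerate the ordered pairs drawn with replacement.

```python
from itertools import product

population = [4, 7, 15]  # hours of study time per week


def variance_n(sample):
    """Variance treating the sample as a population (divide by n)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def variance_n_minus_1(sample):
    """Variance treating the values as a sample (divide by n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


# All ordered pairs drawn with replacement: 3 * 3 = 9 samples.
samples = list(product(population, repeat=2))

# One row per sample: the pair, its divide-by-n variance,
# and its divide-by-(n - 1) variance.
for s in samples:
    print(s, variance_n(s), variance_n_minus_1(s))
```

Each printed row corresponds to one row of the table described above: the sample, its variance with division by n, and its variance with division by n minus 1.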
So those are the 9 possible samples, which makes sense because the problem statement told us there should be 9. Now we can calculate all of the values in the headers of our table.

For (4, 4), the sample mean is the mean of 4 and 4, which is 4. So (x1 − x̄)² is (4 − 4)², which is 0, and (x2 − x̄)² is also (4 − 4)², which is 0. The population variance is the sum of these squared deviations, 0 + 0, divided by n, which is 2, and that equals 0. The sample variance is 0 + 0 divided by 2 minus 1, which is also 0.

We do the same thing for the rest of our samples. For (4, 7), the sample mean is 5.5. The squared deviation for the first value, (4 − 5.5)², is 2.25, and for the second value, (7 − 5.5)², it is also 2.25. So the population variance is 4.5 divided by 2, or 2.25, while the sample variance is 4.5 divided by 2 minus 1, or 1, which gives 4.5. For (4, 15), the sample mean is 9.5. The squared deviations are 30.25 and 30.25, so the population variance is 30.25, while the sample variance is 60.5. For (7, 4), the sample mean is 5.5, the squared deviations are 2.25 and 2.25, the population variance is 2.25, and the sample variance is 4.5. For (7, 7), the mean is 7 and the deviations are 0, so the population and sample variances are both 0. For (7, 15), the mean is 11 and the squared deviations are both 16, so the population variance is 16 and the sample variance is 32. For (15, 4), the mean is 9.5 and the squared deviations are 30.25 and 30.25, so the population variance is 30.25 and the sample variance is 60.5. For (15, 7), the mean is 11 and the squared deviations are both 16, so the population variance is 16 and the sample variance is 32. And finally, for (15, 15), the mean is 15 and the deviations are 0, so the population and sample variances are both 0.

So what have we done here? We've generated a table that has our population variance and our sample variance for every sample. Now all we need to do is compute the mean of each column. The mean of the population variances is the sum of the population variances divided by the number of samples, 9. Adding all of the values in that column gives 97, and 97 divided by 9 is about 10.78. The mean of the sample variances is the sum of the values in our last column, which is 194, divided by 9, which is about 21.56.

So what do these values tell us? For comparison, the true variance of the population 4, 7, 15 is the sum of the squared deviations from the population mean of about 8.67, divided by 3, which is about 21.56. The mean of the n minus 1 variances matches that exactly, while the mean of the divide-by-n variances, 10.78, is only half of it. So when estimating the population variance from a sample, dividing by n minus 1 instead of n gives the better estimate, for two reasons. First, bias correction: dividing by n tends to underestimate the true population variance, because the deviations are measured from the sample mean instead of the population mean.
Second, degrees of freedom: since one data point in the sample is constrained by the sample mean, only n minus 1 of the deviations are truly independent. Using n minus 1 therefore gives an unbiased estimate of the population variance, which is why it is used in sample variance calculations. Thus, B is the correct answer. Thanks a lot for watching, everyone. I hope this video helped.
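As a quick check on that conclusion, the averaging step can be reproduced in a few lines of Python. This sketch is an addition, not the presenter's own code. It uses the standard library's `statistics.pvariance` (division by n) and `statistics.variance` (division by n minus 1), and it uses `fractions.Fraction` so that the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance, variance

# Population of weekly study hours, stored as exact fractions.
population = [Fraction(v) for v in (4, 7, 15)]

# All 9 ordered samples of size 2 drawn with replacement.
samples = [list(pair) for pair in product(population, repeat=2)]

# Mean of the per-sample variances under each divisor.
mean_var_n = mean([pvariance(s) for s in samples])         # divide by n
mean_var_n_minus_1 = mean([variance(s) for s in samples])  # divide by n - 1

# True variance of the full population (divide by N = 3).
true_variance = pvariance(population)

print("mean of divide-by-n variances:      ", float(mean_var_n))
print("mean of divide-by-(n - 1) variances:", float(mean_var_n_minus_1))
print("true population variance:           ", float(true_variance))
print("n - 1 version is unbiased here:     ", mean_var_n_minus_1 == true_variance)
```

The last line compares the mean of the n minus 1 variances with the population variance directly. The mean of the divide-by-n variances comes out smaller by the factor (n − 1)/n, which is 1/2 for samples of size 2.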


Key Concepts

Population and Sample

Variance and Sample Variance

Sampling with Replacement
