We've spent a lot of time visualizing different types of data where there was one variable across a bunch of different categories. It could be like a histogram or a frequency distribution, a pie chart, et cetera. But now we're going to start taking a look at visualizing data sets in which there are multiple variables, like two or more. And one of the most common ways of doing that is called a scatterplot. Alright?
Now that's something that you may have done in a previous math course, like an algebra course, so this may be a little bit of a refresher, but there's a couple of important conceptual things that we'll address and some terms that we'll need to know. So let's go ahead and get started here. All right. So basically, a scatterplot is just a graph on an x y coordinate system, and it's just a graph of paired numerical data. So this is basically just a bunch of pairs of numbers where there's one independent variable, which we call the x, and one dependent variable, which we call the y.
So whenever I think of a scatterplot, I basically just think of a bunch of x and y points on a grid. It looks something like this. Alright. So we're going to talk about this second point in just a second here, but let's just jump right into our example. So a teacher is taking a survey of students to determine different factors that might affect test scores.
We've got these tables that show different sets of data, and we're going to go ahead and plot the data if necessary, and then we'll go from there. Alright. So let's take a look at this first graph over here, this first table, which just shows test scores versus time spent studying in minutes for an exam. So we've got time on the x axis, and then we've got the scores on the y axis. So, basically, what happens here is you just pair up each one of these numbers, and they just form a bunch of x y points that you would just draw on a grid.
Alright? Now all of these points except for the first two are already drawn for us, so I'm just going to go ahead and draw the first two, just in case it's been some time since you've done this. All you do is you just grab two of these numbers over here in the same column, and that's going to be your x y pair. So you're just going to go 50 on the x and then 86 on the y, and you're just going to drop a point right there. So this coordinate pair would be right over here, and this would be (50, 86).
Alright. So I'm going to do this one more time over here because all the rest of these are already drawn. (60, 92) would look something like this. You'd go to 60 and then all the way up to 92 and drop a dot right there, and that would be that coordinate. All right?
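If you'd rather see that pairing step written out, here's a minimal Python sketch. Only the first two pairs, (50, 86) and (60, 92), come from the table in the lesson; the other numbers are made up for illustration, and the little `draw_scatterplot` helper is a hypothetical name that assumes you have matplotlib installed.

```python
# Only (50, 86) and (60, 92) come from the lesson; the rest is made-up data.
minutes_studied = [50, 60, 20, 35]
test_scores = [86, 92, 68, 77]

# Pair up the numbers that sit in the same column: each pair is one (x, y) point.
points = list(zip(minutes_studied, test_scores))


def draw_scatterplot(points):
    """Drop one dot per (x, y) pair. Hypothetical helper; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    plt.scatter(xs, ys)
    plt.xlabel("Time spent studying (minutes)")
    plt.ylabel("Test score")
    plt.show()


print(points[:2])  # the two points drawn by hand in the lesson
```

Calling `draw_scatterplot(points)` would open a window with one dot per pair, with time on the x axis and score on the y axis.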
So that's just how you plot numbers on a scatterplot, and that's really all there is to it. All right? So now let's take a look at the second part of this question, which is to determine what type of correlation each graph has. One of the things you might notice about this graph over here is that generally, as the data points go to the right, the y values also increase. As students spend more time studying for that exam, they get higher test scores, which hopefully should be expected.
Right? So we actually say that these things are correlated, these x and y values. A correlation happens whenever the data points basically form some kind of a trend or some kind of a pattern. Now, there are different types of correlations that you might see, but the most common one that we work with in this course is called a linear correlation. This is basically where the general trend of the data is pretty much like a straight line, where you can draw a straight line that kind of cuts through most of those data points.
We say that those things are linearly correlated. Alright? So what we can see here is that as the x values increase to the right, the y values also increase. In other words, if you were to take the slope of this line here that you just drew, think back to algebra rise over run, we would say that that slope is positive. And because it's positive, we say that this correlation here is a positive correlation.
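Here's a rough sketch of that slope idea in Python. It assumes the straight line is a least-squares line, which is one common way to pick a line that cuts through the points; the lesson itself just eyeballs it. The function names and every data value except (50, 86) and (60, 92) are made up for illustration.

```python
def fitted_slope(xs, ys):
    """Slope (rise over run) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    rise = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    run = sum((x - x_mean) ** 2 for x in xs)
    return rise / run


def direction(xs, ys):
    """Name the direction of a linear trend from the sign of the slope."""
    slope = fitted_slope(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "flat"


# Made-up study data, apart from the points (50, 86) and (60, 92).
minutes_studied = [10, 20, 30, 40, 50, 60]
test_scores = [62, 70, 75, 80, 86, 92]

print(direction(minutes_studied, test_scores))
```

The sign of the slope is all this looks at, so it only tells you which way a roughly straight trend is leaning, not how tightly the points hug the line.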
Right? That's all there is to it. Positive correlation just looks like this. Now let's take a look on the right over here because we have slightly different data. We have test scores versus the number of pins on a student's backpack.
And clearly, we can see a slightly different relationship that's going on here. So as we can see, the more pins students have on their backpack, the worse they actually did on their tests for some reason. In this case, what we actually see here is kind of the opposite. As the x values increase, the y values are actually decreasing. If you were to draw a line that kind of cuts through most of these things, the slope of that line would actually be negative, and therefore, we just say that this is a negative correlation.
Alright? So positive correlation looks like this, and negative correlation kind of looks like that. That's really all there is to it. Alright? Now I want to point out one important conceptual thing between these two datasets.
Right? Here we saw that there was a relationship between test scores and time spent studying for the exam. As students spent more time studying, they got a higher score on their test. So it kind of makes sense here that these two things would be correlated because it could be something like a cause and effect relationship. If you spend more time studying, you're going to get a higher test score.
Over here, we saw that there is a relationship between test scores and number of pins on the backpack, even though that makes absolutely no sense. So I just want to point out here that two variables can look like they're related to each other without it actually being cause and effect. It's not like you're going to go to your next class and have more pins on your backpack, and you're going to do worse on that exam. Alright? It just looks that way because of coincidence.
So one really important thing here that you may see in your course is that correlation does not imply causation. Just because two things show a sort of relationship like this doesn't mean that one causes another. Alright? Now let's take a look at our last two graphs over here. We're going to look at these two slightly different sort of datasets.
Here we have test scores versus the time spent sleeping in number of hours. So clearly, we can see here that all the data points are forming some kind of a trend. But instead of it being a straight line like this or a straight line like that, it actually sort of forms a little bit of a parabola or a U shape or something like that. Whatever it is, it's definitely not a straight line that represents the relationship between these variables. We see here that students who spent less time sleeping did worse on their exam, and students who slept more also did worse.
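To see why a single straight line doesn't capture a pattern like this, here's a small Python sketch. The sleep numbers are made up to match the description (low scores at both ends, higher in the middle), and the least-squares slope is just an assumed way of putting a number on "the trend".

```python
def fitted_slope(xs, ys):
    """Slope of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    rise = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    run = sum((x - x_mean) ** 2 for x in xs)
    return rise / run


# Made-up data: scores are low for very little sleep and for a lot of sleep.
hours_slept = [4, 5, 6, 7, 8, 9, 10]
test_scores = [60, 75, 85, 90, 85, 75, 60]

overall = fitted_slope(hours_slept, test_scores)
left_half = fitted_slope(hours_slept[:4], test_scores[:4])   # 4 to 7 hours
right_half = fitted_slope(hours_slept[3:], test_scores[3:])  # 7 to 10 hours

print(round(overall, 6), left_half, right_half)
```

With this made-up data, one line through everything comes out essentially flat, even though the points clearly follow a pattern: the trend goes up on the left side and down on the right side, so no single slope describes it.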
So we would just say here that this is a nonlinear relationship. That's really all you need to know about that. The final one that we'll see here is test scores versus the number of siblings that the different students have. And clearly, we can see here that this relationship isn't a straight line either upwards or downwards, and it's not even a curve or a different kind of shape. So we would just say here that this data is just not correlated at all.
So you can actually have no correlation whatsoever. So you have positive, negative, nonlinear, and no correlation. Alright, folks. That's just the basics. Let's go ahead and take a look at some practice problems.