Welcome back, everybody. So in recent videos, we've taken a look at x and y scatter plots, and we've been able to sort of visually tell just by looking at the data whether there were trends or correlations in it. But sometimes you may have to more specifically quantify how related two variables are. Fortunately, there is a measure for that, and it's called the correlation coefficient. Now I know that sounds kind of scary, but all I'm going to show you in this video is that it's just a number.
It's a simple number that represents how related two variables are. So let's take a look at a couple of examples. We're going to jump right in. Let's get started. Alright?
So basically, this correlation coefficient, sometimes called the linear correlation coefficient, is given by the letter "r", and basically, it is just a number that is between negative one and positive one. So you can kind of imagine it on a number line like this. And what it does is it measures two things: it measures the direction and the strength of correlation between two variables.
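To make "it's just a number" concrete, here is a minimal sketch of how r could be computed by hand. The video doesn't give a formula, so this assumes the standard sample (Pearson) formula for the linear correlation coefficient. The function name `pearson_r` and the `hours` and `scores` data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Return the linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. test score (not from the video).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 57, 70, 74]
r = pearson_r(hours, scores)
print(round(r, 2))  # a single number between -1 and 1
```

Whatever data you feed in, the result lands somewhere on that number line from negative one to positive one.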
In fact, let's just jump right into our example so I can show you how this all works. So we have these three correlation coefficients that are given to us over here: 0.13, 0.64, and negative 0.96. They're all numbers between negative one and positive one and they always will be. All we have to do is we're given these three numbers and we just have to match them up to their appropriate graphs.
So let's go ahead and get started here. I mentioned that r measures the direction and strength. So let's talk about the direction. The direction of correlation is always going to match the sign of that r value. So, for example, a couple of videos ago, we talked about how positive correlation shows an upward trend like this.
Those are going to be positive r values. So basically, whenever you have data trending upward like this, your r value is going to be positive, and vice versa. Whenever you have something that sort of slopes downward like this, that's negative correlation. Those r values will be negative. All right.
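Here's a small sketch of why the sign of r tracks the direction. It assumes the standard formula for r, where the top of the fraction is a sum of products of deviations and the bottom is always positive, so that sum alone decides the sign. The helper name `direction` and both data sets are hypothetical.

```python
def direction(xs, ys):
    """Return 'positive', 'negative', or 'none' based on the sign r would have."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # This sum is the numerator of r. The denominator is positive whenever
    # r is defined, so the sign of this sum is the sign of r.
    s = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"

# Made-up points: one set trends upward, the other slopes downward.
upward = direction([1, 2, 3, 4, 5], [2, 4, 5, 7, 9])
downward = direction([1, 2, 3, 4, 5], [9, 8, 6, 3, 1])
print(upward, downward)
```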
So let's take a look at our three numbers. You'll notice that these two are positive and this one is negative. So we just have to find which one of these graphs shows a downward sloping trend in the data. And if you take a look, we should see here that there's basically only one, and it's going to be the left one. So automatically, just by looking at the graphs, we can figure out that this one is negative 0.96.
So sometimes you can tell just right off the bat just by looking at the signs of the r values. Alright? Okay. So that's a little bit more about the direction, but let's talk about the strength of that r value.
So we say that correlation is strong whenever the points are tightly clustered around a line that kind of cuts through a lot of the data points. For example, in that left graph, you can see that all of these data points are actually really sort of all lined up. There's a little bit of wiggle room here, but you can almost draw a line that cuts through most of these data points, and they're really tightly clustered around it. This is what we would say is strong correlation. So this is how we can visually tell whether two variables are strongly correlated. Now when this happens, the r value is going to be close to either negative one or positive one, depending on whether it slopes down or up.
All right. So, clearly, we can see here that this slopes downwards. Therefore, it's negative. But because the data points are tightly packed, it's going to be an r value that's very close to negative one. All right.
Now, as you get closer towards either one or negative one, that correlation gets stronger and stronger. On the opposite side, if you get closer towards zero, it means that the correlation is getting weaker. And when you have values that are close to zero, it means that there's little to no correlation and the data points are kind of scattered around everywhere.
Let's take a look at our remaining two values.
We have 0.13 and 0.64. So both of them are positive numbers, but which one do you think is going to represent each one of these two graphs? Well, if I take a look at the second graph over here and compare it to the third one, you can see that these data points are much more loosely scattered around. I can't draw any line that cuts through most of these data points, so there's no real correlation anywhere here. So which of the values do you think it's going to be?
Well, it's going to be 0.13, and this is what we would say is no correlation as we've seen before. Therefore, just by default, this final value over here, this r value, is going to be 0.64. We can still see that there's somewhat of a trend line that cuts through most of these data points.
But unlike that left graph, those data points don't really cluster tightly around that line. So we say that this is weak correlation. Now there's no general consensus on what counts as strong versus weak, what the cutoff is. But generally, anything beyond 0.8 in either direction, so above positive 0.8 or below negative 0.8, is considered strong here. So somewhere around 0.8 would be a good boundary for that.
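Putting direction and strength together, here's a minimal sketch that labels the three r values from the example. The 0.8 boundary is the rule of thumb mentioned above. The 0.2 boundary for "no correlation" is purely an assumption for illustration, since the video doesn't give one, and the function name `describe_r` is made up.

```python
def describe_r(r, strong_cutoff=0.8, none_cutoff=0.2):
    """Label an r value by strength and direction using rule-of-thumb cutoffs."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    strength = abs(r)  # strength ignores the sign
    if strength < none_cutoff:  # assumed cutoff, not from the video
        return "no correlation"
    direction = "positive" if r > 0 else "negative"  # direction is the sign
    label = "strong" if strength >= strong_cutoff else "weak"
    return f"{label} {direction}"

# The three r values from the matching example.
labels = {r: describe_r(r) for r in (0.13, 0.64, -0.96)}
for r, label in labels.items():
    print(r, "->", label)
```

Since there's no agreed-upon cutoff, both boundaries are left as parameters you can adjust.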
Alright? That's really all there is to it, folks.
Thanks for watching. Let's get some practice.