WEBVTT
00:00:01.120 --> 00:00:07.840
Using the information given in the table, find the Spearman’s rank correlation between the variables 𝑥 and 𝑦.
00:00:08.440 --> 00:00:10.920
Give your answer to four decimal places.
00:00:11.880 --> 00:00:18.720
We’re given a set of bivariate data, that is, paired data, for variables 𝑥 and 𝑦.
00:00:19.440 --> 00:00:21.480
The variables are qualitative, or categorical.
00:00:21.760 --> 00:00:24.240
That is, the values they take are nonnumerical.
00:00:24.760 --> 00:00:31.120
And we see that the possible values of both 𝑥 and 𝑦 are “Excellent,” “Very Good,” “Good,” and “Poor.”
00:00:31.800 --> 00:00:41.960
We can say then that our data follow a grading system with values in the definite categorical order ranging from “Excellent” to “Poor.”
00:00:42.680 --> 00:00:49.760
And since there is an order to our data, we can assign ranks to the variable values in our data set.
00:00:50.040 --> 00:00:54.400
We can then use these ranks to calculate Spearman’s correlation coefficient.
00:00:54.880 --> 00:00:59.720
To do this, we first add two new rows to our table for the ranks.
00:01:00.400 --> 00:01:03.080
That’s 𝑅 𝑥 and 𝑅 𝑦.
00:01:03.800 --> 00:01:11.080
To rank our 𝑥-data, and to see how the ranking works, we first list our six values in order.
00:01:12.160 --> 00:01:19.640
And since we have four instances of “Excellent,” these take up the first four places in our ranking.
00:01:20.280 --> 00:01:23.640
And our two instances of “Good” will take up places five and six.
00:01:24.240 --> 00:01:34.000
And when we have repeated values like this, we need to calculate their tied ranks so that the repeated values all have the same rank.
00:01:34.600 --> 00:01:41.080
And we do this by calculating the average of their places or positions in the ordered list.
00:01:41.560 --> 00:01:53.120
For the four instances of “Excellent” then in our 𝑥-data, each instance has a rank of one plus two plus three plus four, all divided by four.
00:01:53.120 --> 00:01:59.320
That is the sum of the positions taken by the four instances divided by the number of elements of equal value.
00:02:00.040 --> 00:02:04.640
And this evaluates to 10 over four, which is 2.5.
00:02:05.360 --> 00:02:16.800
This means that each of our instances of “Excellent” for the variable 𝑥 has a rank of 2.5, which we can put in our table in the row for the ranks of the 𝑥-values.
00:02:17.560 --> 00:02:25.880
And we know that assigning tied ranks in this way ensures that the sum of the ranks is the same for each of the two variables 𝑥 and 𝑦.
00:02:26.480 --> 00:02:31.760
And we’ll see that this is indeed the case when we finish ranking our data.
00:02:32.520 --> 00:02:40.760
So now for the 𝑥-variable, we’re left with two values, which are both “Good.”
00:02:41.280 --> 00:02:46.520
These are in positions five and six in our list.
00:02:47.320 --> 00:02:53.960
And since they both have the same value, that is, “Good,” we’ll need to work out their tied ranks.
00:02:54.840 --> 00:02:56.120
This is given by the average of their positions five and six, which is five plus six over two.
00:02:56.680 --> 00:02:58.800
That is 11 over two, which is 5.5.
00:02:59.520 --> 00:03:06.640
These are both then assigned the rank of 5.5, which we put in our table underneath the instances of “Good” for the 𝑥-variable.
00:03:07.440 --> 00:03:10.520
Now let’s assign ranks in the same way to our 𝑦-data.
00:03:11.240 --> 00:03:16.080
And listing our 𝑦-data in order, we see again that we have some repeated values.
00:03:16.720 --> 00:03:31.640
Labeling our positions again one to six, it’s very important that the positioning is the same way round as it was for the 𝑥-values; that is, we have a low number attached to a high grade, so we start with one for “Excellent.”
00:03:32.480 --> 00:03:40.160
Since we have only one instance of “Excellent,” this element is ranked first, or one.
00:03:41.160 --> 00:03:45.920
And we can put this in our table under “Excellent” for the 𝑦-data.
00:03:46.960 --> 00:03:54.040
Similarly, we have only one instance of “Very Good” in the 𝑦-data, so we can rank this second.
00:03:54.760 --> 00:03:58.920
In our table then, the rank of two goes underneath the instance of “Very Good” for the 𝑦-data.
00:03:59.640 --> 00:04:00.880
We have two instances of “Good.”
00:04:01.400 --> 00:04:06.920
So working out the tied ranks for these two, their positions are third and fourth.
00:04:07.640 --> 00:04:13.760
So their tied ranks are three plus four over two, that is, seven over two, which is 3.5.
00:04:14.200 --> 00:04:16.560
And this is their rank, which we put in our table underneath the two instances of “Good” within the 𝑦-data.
00:04:17.040 --> 00:04:22.440
And now we’re left with two instances of “Poor.”
00:04:23.000 --> 00:04:36.880
Their tied ranks are the average of their positions, that is, five plus six over two, which is 11 over two, which is 5.5.
00:04:37.480 --> 00:04:40.480
So these are both ranked 5.5.
00:04:41.040 --> 00:04:44.760
And we can put these in our table in the rankings for 𝑦.
00:04:45.440 --> 00:04:53.080
So now if we work out the sums of the ranks for each variable, we find, as expected, these are equal with a value of 21.
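NOTE A minimal Python sketch of the tied-ranking step described above, not part of the original narration. The helper name tied_ranks and the column order of x and y are assumptions: the order is reconstructed from the ranks and differences read out in the video, since the table itself is not in the transcript.
```python
from collections import defaultdict
# Grades ordered best to worst, so a low position goes with a high grade.
GRADE_ORDER = ["Excellent", "Very Good", "Good", "Poor"]
def tied_ranks(values):
    """Return the average-position (tied) rank of each value, in input order."""
    # Sort the indices by grade, best first.
    order = sorted(range(len(values)), key=lambda i: GRADE_ORDER.index(values[i]))
    positions = defaultdict(list)
    for position, index in enumerate(order, start=1):
        positions[values[index]].append(position)
    # Every repeated value gets the mean of the positions it occupies.
    return [sum(positions[v]) / len(positions[v]) for v in values]
# Assumed column order, reconstructed from the narration.
x = ["Good", "Excellent", "Good", "Excellent", "Excellent", "Excellent"]
y = ["Poor", "Good", "Poor", "Excellent", "Very Good", "Good"]
rank_x = tied_ranks(x)
rank_y = tied_ranks(y)
print(rank_x, sum(rank_x))  # [5.5, 2.5, 5.5, 2.5, 2.5, 2.5] 21.0
print(rank_y, sum(rank_y))  # [5.5, 3.5, 5.5, 1.0, 2.0, 3.5] 21.0
```
Both rank sums come out as 21, matching the check made in the narration.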
00:04:53.720 --> 00:05:13.600
Now, to calculate Spearman’s rank correlation between the two variables, we use the formula: the coefficient 𝑟 subscript 𝑠 is equal to one minus six times the sum of the squared differences 𝑑 subscript 𝑖 squared, over 𝑛 times 𝑛 squared minus one, where 𝑛 is the number of data pairs, which in our case is six, and 𝑑 subscript 𝑖 is the difference in ranks of each data pair for 𝑖 from one to 𝑛.
00:05:14.200 --> 00:05:18.120
So we’re going to need to work out the differences in the ranks and the squares of those differences.
00:05:18.840 --> 00:05:21.120
So we add two more rows to our table.
00:05:22.000 --> 00:05:23.480
So we begin by working out our differences.
00:05:24.040 --> 00:05:38.240
If we number our data pairs with the index 𝑖 equal to one, two, three, four, five, and six, we have that our first difference 𝑑 one is 5.5 minus 5.5, and that’s equal to zero.
00:05:38.720 --> 00:05:46.200
Our second difference 𝑑 two is 2.5 minus 3.5, which is negative one.
00:05:46.880 --> 00:05:52.120
𝑑 three is 5.5 minus 5.5, which is zero.
00:05:52.760 --> 00:05:58.240
𝑑 four is 2.5 minus one, which is 1.5.
00:05:58.960 --> 00:06:04.000
And similarly, 𝑑 five is 0.5, and 𝑑 six is negative one.
00:06:04.920 --> 00:06:11.000
A good check that we’re on the right track at this point is that the sum of the differences is equal to zero.
00:06:11.640 --> 00:06:14.520
And in fact, this is the case for our differences.
00:06:15.120 --> 00:06:20.400
So now we work out the differences squared since this is what we need for our formula.
00:06:21.000 --> 00:06:39.280
We have zero squared, which is equal to zero; negative one squared, which is equal to one; again zero squared, which is zero; 1.5 squared, which is 2.25; 0.5 squared, which is 0.25; and again negative one squared, which is one.
00:06:39.920 --> 00:06:44.880
And so if we now sum the squared differences, we have a sum of 4.5.
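NOTE A short Python sketch of the difference rows, not part of the original narration. The rank lists are typed in from the values read out above; the ranks for pairs five and six are inferred from the stated differences 0.5 and negative one, which is an assumption.
```python
# Ranks per data pair, in table order (pairs five and six are inferred).
rank_x = [5.5, 2.5, 5.5, 2.5, 2.5, 2.5]
rank_y = [5.5, 3.5, 5.5, 1.0, 2.0, 3.5]
# d_i is the rank of x minus the rank of y for each pair.
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
d_squared = [di ** 2 for di in d]
print(d)               # [0.0, -1.0, 0.0, 1.5, 0.5, -1.0]
print(sum(d))          # 0.0, the check that the differences cancel
print(d_squared)       # [0.0, 1.0, 0.0, 2.25, 0.25, 1.0]
print(sum(d_squared))  # 4.5
```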
00:06:45.760 --> 00:07:01.000
So now we can use this and the fact that 𝑛 is equal to six in our formula so that our Spearman’s rank correlation 𝑟 𝑠 is one minus six times 4.5 over six times six squared minus one.
00:07:01.480 --> 00:07:16.400
Our fraction evaluates to 27 over 210 so that the Spearman’s correlation is approximately equal to one minus 0.128571.
00:07:17.080 --> 00:07:25.640
To six decimal places, that gives 0.871429, which to four decimal places is 0.8714.
00:07:26.400 --> 00:07:37.560
Hence, the Spearman’s rank correlation coefficient between the variables 𝑥 and 𝑦 is 0.8714 to four decimal places.
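NOTE A minimal Python sketch of the final substitution, not part of the original narration. The function name spearman_from_squared_differences is hypothetical; it simply evaluates the formula used in the video, taking the sum of squared rank differences and the number of pairs as inputs.
```python
def spearman_from_squared_differences(sum_d_squared, n):
    """Evaluate 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))
# Values from the worked example: sum of squared differences 4.5, six pairs.
r_s = spearman_from_squared_differences(4.5, 6)
print(round(r_s, 4))  # 0.8714
```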
00:07:38.120 --> 00:07:47.960
We can note at this point that Spearman’s rank correlation coefficient can take values from negative one to positive one and that our coefficient is within this range.
00:07:48.480 --> 00:07:56.080
In fact, since our coefficient is close to positive one, we can say that the rankings for 𝑥 and 𝑦 are in strong agreement.
00:07:56.640 --> 00:08:03.760
And so we would associate better grades or ratings for 𝑥 with better grades or ratings for 𝑦 and vice versa.
00:08:04.360 --> 00:08:13.560
And this is how we interpret the Spearman’s correlation coefficient with a value of 0.8714.