Correlation vs. Causation

People like to draw conclusions; it's human nature.  Patterns emerge, and people are quick to attribute the patterns to certain characteristics.  Many people have noted the number of Senior team captains in Gene Smith's two drafts and decided that it meant Gene Smith wasn't interested in any players besides Senior team captains.  This reasoning is a logical fallacy called "cum hoc ergo propter hoc", which translates to "with this, therefore because of this".  It continually shows itself in sports, and I've seen a great deal of it specifically in football-related conversation.  The problem is that it's flawed logic; in statistical terms, the fallacy ignores the principle that correlation does not imply causation.  To understand why, we need to look at both correlation and causation.


Correlation is defined by Merriam-Webster as:

a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone

In layman's terms, correlation is a relationship between things that tend to happen together more often than you would expect from random chance.  For example, there is correlation between having good grades and rate of graduation from college; obviously, you would expect students with good grades to have a higher graduation rate than students with poor grades.

The issue with the definition of correlation is "not expected".  Though you wouldn't EXPECT the things to be related based on chance, it's not IMPOSSIBLE for them to be related based on chance.  This is the source of most people's flawed logic when it comes to sports.  Correlation is all over the place in sports.  In football, correlation can be found in many places, such as correlation between a running back's age and yards per carry average.  To understand why correlation does not imply causation, though, we must look at causation as well.

Merriam-Webster defines causation as:

the act or process of causing; the act or agency which produces an effect

Causation is the act of causing, which means the act of directly producing an effect on something else.  When people look for the cause of something, they are not looking for things that are possibly or even likely related; they are looking for the specific reason something occurred.  For example, if a car accident occurred, perhaps the driver involved was on their cell phone, but that does not mean the cell phone use was the reason for the crash.  What if that driver's foot slipped off the brake?  What if that driver's steering wheel locked up?  Just because the driver was on the cell phone does not mean the phone caused the accident.  Of course it IS possible that the cell phone was indeed the cause of the accident, but the fact that the cell phone use and the accident happened at the same time (correlation) does not mean the cell phone use caused the accident (causation).

When people state that correlation implies causation, their argument is that A occurs in correlation with B, and therefore, A causes B.  In reality, there are five total possibilities; they are:

1.  A may be the cause of B
2.  B may be the cause of A
3.  Some unknown factor C may be the cause of A and B
4.  A combination of any or all of A, B, and/or C
5.  The results are coincidental or are random enough statistically to be termed coincidental, meaning there is not a relationship between A and B
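
Possibility 3 is the one people tend to overlook, so here is a rough sketch of it in Python.  Everything in it is made up for illustration: the hidden factor C is just random noise, and A and B each depend on C but never on each other.  The helper names (pearson, residuals, simulate_confounder) are my own, not from any library.  Under those assumptions, A and B should still come out clearly correlated (around 0.5), and the correlation should mostly vanish once C is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its straight-line dependence on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate_confounder(n=5000, seed=0):
    """Hypothetical data: C drives both A and B; A and B never touch each other."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]
    a = [ci + rng.gauss(0, 1) for ci in c]  # A = C + its own noise
    b = [ci + rng.gauss(0, 1) for ci in c]  # B = C + its own noise
    return a, b, c


a, b, c = simulate_confounder()
raw = pearson(a, b)
controlled = pearson(residuals(a, c), residuals(b, c))

print(f"Correlation of A and B:                 {raw:.2f}")
print(f"Correlation of A and B after removing C: {controlled:.2f}")
```

If you only ever saw A and B, you would be tempted to say one causes the other.  The simulation knows better, because it was built so that neither one does.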

The fact is that it's usually not possible to prove causation.  Often the cause of something is not clearly evident given the information available, but people love jumping to conclusions whether or not they already have a "Jump to Conclusions Mat".

In sports terms, let's look at a favorite example of mine:  the players that fans tend to label "Gene guys".  These players are team captains, Seniors in college, and "safe" draft picks.  Over the first two years of his tenure as Jaguars general manager, Gene Smith has drafted 15 players.  Of these 15, 14 were Seniors.  It seems easy to state that Gene Smith prefers college Seniors; in fact, most people do.  However, we cannot definitively state causation here.

Here is the statement in A and B form:

A:  Gene Smith has drafted 15 players in two years
B:  Gene Smith has drafted 14 Seniors in two years

Let's go down the possibilities list:

1.  Gene Smith prefers to draft Seniors - This is possible
2.  Seniors prefer to be drafted by Gene Smith - Maybe, but it doesn't mean anything in terms of causation.
3.  Gene Smith prefers to draft Seniors that are better at football than underclassmen - in this statement, football ability is factor C.
4.  Gene Smith prefers to draft Seniors when they're better at football than underclassmen, or gives being a Senior additional weight but does not use it as an overriding philosophy, or something like that - A combination of factors
5.  The fact that Gene Smith has drafted 14 Seniors in 15 picks is a coincidence - There is no underlying relationship at all.
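
For option 5, a quick back-of-the-envelope check is possible, with heavy caveats.  The sketch below treats each pick as an independent coin flip that lands on "Senior" with some fixed probability, which is a big simplification of how a draft works.  The Senior shares it tries (50%, 70%, 90%) are hypothetical placeholders, not real draft data; I don't have the actual share of Seniors among draftable players.  All it shows is how likely 14 or more Seniors in 15 picks would be by chance alone under each assumed share.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) when X counts successes in n independent tries, each with chance p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


PICKS = 15    # players drafted over two years (from the post)
SENIORS = 14  # of those, how many were Seniors (from the post)

# Hypothetical shares of Seniors in the draftable pool; NOT real data.
for share in (0.5, 0.7, 0.9):
    chance = prob_at_least(SENIORS, PICKS, share)
    print(f"If {share:.0%} of the pool were Seniors: "
          f"P(at least {SENIORS} of {PICKS}) = {chance:.4f}")
```

The answer swings a lot depending on the assumed share, which is the point:  without knowing the makeup of the pool, we can't even say how surprising 14 of 15 is.  And even a small probability would only argue against pure coincidence; it would not tell us which of options 1 through 4 is the real story.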

When I look at this list, option 1 is the one most stated by people that I've run across.  Aldon Smith is a great example; he's an underclassman who most people have rated in the top 20 overall players this year.  I've heard many times that the Jaguars won't draft Aldon Smith "because he's an underclassman."  This is fallacious logic.  When I look at the list above, the options that seem most likely to impact whether or not Gene Smith drafts Aldon Smith are options 3 and 4; that is, that Gene Smith prefers to draft Seniors when they're better players than underclassmen.  He may give additional weight to whether or not a player is a Senior, but in no way does being an underclassman eliminate a player from draft consideration.  If Aldon Smith is a more highly-rated pro prospect in Gene Smith's eyes than, say, Ryan Kerrigan, it would be foolish to expect Gene to pick Kerrigan anyway.

The purpose of this post is to try to get people to think things through rather than jumping to conclusions.  Is Jimmy Smith's coverage ability the cause of opposing teams only throwing at him 20 times in man coverage, or could it be some other reason?  Is Maurice Jones-Drew's increased carry load the cause of his knee injury, or could it have happened for some other reason?  Before attributing causes to certain events, let's try to take a step back and think about other possible causes.  Increased credibility comes with increased knowledge and logical reasoning.  Let's stop with the proclamations about the cause of this or that and investigate all potential causes instead of trying to (likely erroneously) pinpoint the absolute cause.