A relationship is said to be spurious when statistical summaries indicate that two variables are related although there is in fact no theoretical relationship between them. This often happens in time series data, and there are many well-known examples of spurious correlation in such data. For example, Yule (1926) observed a strong relationship between marriages in church and the mortality rate in UK data. Obviously, it is very hard to explain how marriages in church could possibly affect mortality, but the statistics say that one variable is very strongly correlated with the other. This is a typical example of spurious regression. Yule (1926) thought that this happens because of a missing third variable.
The term spurious correlation was coined in or before 1897, i.e. less than 15 years after the invention of regression analysis. In 1897, Karl Pearson wrote a paper entitled ‘Mathematical Contributions to the Theory of Evolution: On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs’. The title indicates that the term spurious correlation was known at least as early as 1897, and that the phenomenon was observed in data on the measurement of organs. The reason for this spurious correlation was the use of indices. Over the next 20 years, many reasons for spurious correlation were uncovered, the most popular being a missing third variable. This is the case where X is a cause of Y and X is also a cause of Z, but Y and Z are not directly associated. If you regress Y on Z, you will nevertheless find a significant relationship, and that relationship is spurious.
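The missing third variable case can be illustrated with a small simulation. The following is only a sketch under assumed values: the coefficients 2.0 and -1.5, the sample size and the function name `common_cause_demo` are hypothetical choices made for illustration, not taken from any of the papers cited here. Note that the data are purely cross-sectional, with no time dimension at all.

```python
import numpy as np

def common_cause_demo(n=5000, seed=0):
    """Simulate X -> Y and X -> Z with no direct link between Y and Z.

    Returns the slope of Y on Z alone and the slope of Y on Z once X
    is included in the regression. All coefficients are illustrative.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                  # the common cause
    y = 2.0 * x + rng.normal(size=n)        # Y depends on X only
    z = -1.5 * x + rng.normal(size=n)       # Z depends on X only

    # Regress Y on Z, leaving X out (the missing third variable).
    slope_without_x = np.polyfit(z, y, 1)[0]

    # Regress Y on a constant, Z and X together.
    design = np.column_stack([np.ones(n), z, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    slope_with_x = coef[1]

    return slope_without_x, slope_with_x

slope_without_x, slope_with_x = common_cause_demo()
print(f"slope of Y on Z, X omitted:  {slope_without_x:.3f}")
print(f"slope of Y on Z, X included: {slope_with_x:.3f}")
```

With these assumed coefficients, the population slope of Y on Z alone is cov(Y, Z) / var(Z) = -3 / 3.25, which is far from zero, while the coefficient on Z should be close to zero once X enters the regression.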
In 1974, Granger and Newbold (Granger later won the Nobel Prize) found that two non-stationary series may also yield spurious results even if there is no missing variable. This finding only added one more item to the list of possible reasons for spurious regression. It can be used neither to argue that non-stationarity is the one and only reason for spurious regression, nor to argue that spurious regression is a time series phenomenon. Unfortunately, economists adopted both misperceptions. First, they thought that spurious regression is a time series phenomenon. Second, although it is not explicitly stated, economists appear to assume that non-stationarity is the only cause of spurious regression. As a result, most books and articles that discuss spurious regression do so in the context of non-stationary time series.
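The Granger and Newbold finding is easy to reproduce in spirit with a small Monte Carlo experiment. The sketch below is not their original design: the sample size, the number of replications, the 1.96 critical value and all function names are assumptions made for illustration. It regresses one independent random walk on another many times and counts how often the usual t-test on the slope rejects the (true) hypothesis of no relationship, with independent white noise as a benchmark.

```python
import numpy as np

def slope_t_stat(y, x):
    """Conventional OLS t-statistic for the slope in y = a + b*x + e."""
    n = len(y)
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - 2)                 # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])

def random_walk(rng, n):
    return np.cumsum(rng.normal(size=n))             # non-stationary series

def white_noise(rng, n):
    return rng.normal(size=n)                        # stationary benchmark

def rejection_rate(make_series, n=100, reps=500, seed=0):
    """Share of replications in which |t| > 1.96 for two independent series."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        y = make_series(rng, n)
        x = make_series(rng, n)                      # generated independently of y
        if abs(slope_t_stat(y, x)) > 1.96:
            hits += 1
    return hits / reps

print("random walks:", rejection_rate(random_walk))
print("white noise: ", rejection_rate(white_noise))
```

For white noise the rejection rate should stay near the nominal 5%, whereas for independent random walks the test rejects far more often than that, which is the spurious regression problem.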
In 1998, Granger and his coauthors wrote a paper entitled “Spurious regressions with stationary series”, in which they show that spurious regression can occur in stationary data. They thereby cleared up one of the common misconceptions, namely that spurious regression is due only to non-stationarity, but they were themselves caught in the second misconception, that spurious regression is a time series phenomenon. They define spurious regression as follows: “A spurious regression occurs when a pair of independent series but with strong temporal properties, are found apparently to be related according to standard inference in an OLS regression”. The use of the term temporal properties implies that they take spurious regression to be a time series phenomenon. But a hundred years earlier, Pearson had already shown spurious correlation in cross-sectional data.
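The same kind of experiment can be run with stationary series. The sketch below uses two independent AR(1) processes, which is only one possible example of series with "strong temporal properties". The AR coefficient of 0.9, the burn-in length, the sample size and the function names are assumptions for illustration and are not taken from the paper. The t-statistic is computed from the sample correlation, which is equivalent to the conventional OLS slope t-statistic in a simple regression with an intercept.

```python
import numpy as np

def ar1(rng, n, phi, burn=200):
    """Simulate a stationary AR(1) series s_t = phi * s_{t-1} + e_t (|phi| < 1)."""
    e = rng.normal(size=n + burn)
    s = np.zeros(n + burn)
    for t in range(1, n + burn):
        s[t] = phi * s[t - 1] + e[t]
    return s[burn:]                      # drop the burn-in so the start value does not matter

def ar1_rejection_rate(phi, n=100, reps=500, seed=1):
    """Share of replications in which two independent AR(1) series look related."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        y = ar1(rng, n, phi)
        x = ar1(rng, n, phi)             # generated independently of y
        r = np.corrcoef(y, x)[0, 1]
        t = r * np.sqrt((n - 2) / (1 - r**2))   # same as the OLS slope t-statistic
        if abs(t) > 1.96:
            hits += 1
    return hits / reps

print("phi = 0.9:", ar1_rejection_rate(0.9))
print("phi = 0.0:", ar1_rejection_rate(0.0))
```

With `phi = 0.0` the series are white noise and the rejection rate should be near 5%. With `phi = 0.9` both series are still stationary, yet the test should reject far more often than the nominal level.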
Unit root and cointegration analysis were developed to cope with the problem of spurious regression. The literature argues that spurious regression can be avoided if there is cointegration. Unfortunately, cointegration can be defined only for non-stationary data. How can spurious regression be avoided if the underlying series are stationary? The literature is silent on this question.
Pesaran et al (1998) developed a new technique, the ‘ARDL Bounds Test’, to test for the existence of a level relationship between variables. People often confuse a level relationship with cointegration, and the ARDL bounds test is commonly referred to as ARDL cointegration, but in reality a level relationship does not necessarily imply cointegration. The findings of the bounds test are more general and imply cointegration only under certain conditions. The ARDL approach is capable of testing for a long run relationship between a pair of stationary time series as well as between a pair of non-stationary time series. However, a long run relationship between stationary time series cannot be termed cointegration, because by definition cointegration is a long run relationship between non-stationary time series.
In fact, the ARDL bounds test is a better way to deal with spurious regression in stationary time series, but several misunderstandings about the test have restricted its usefulness. We will discuss the use and features of ARDL in a future blog.