Spurious Regression With Stationary Time Series

The spurious relationship is said to have occurred if the statistical summaries are indicating that two variables are related to each other when in fact there is no theoretical relationship between two variables. It often happens in time series data and there are many well-known examples of spurious correlation in time series data as well. For example, Yule (1926) observed strong relationship between marriages in church and the mortality rate in UK data. Obviously, it is very hard to explain that how the marriages in church can possibly effect the mortality, but the statistics says one variable has very strong correlation with other. This is typical example of spurious regression. Yule (1926) thought that this happens due to missing third variable.

This term spurious correlation was invented on or before 1897 i.e. in less than 15 years after invention of regression analysis. In 1897, Karl Pearson wrote a paper entitled, ‘Mathematical Contributions to the Theory of Evolution: On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs’. The title indicates the terms spurious regression was known at least as early as 1897, and it was observed in the data related to measurement of organs. The reason for this spurious correlation was use of indices. In next 20 years, many reasons for spurious correlation were unveiled with the most popular being missing third variable. This means if X is a cause of Y and X is also a cause of Z, but Y and Z are not directly associated. If you regress Y on Z, you will find spurious regression.

In 1974, Granger and Newbold (Granger won noble prize later) found that two non-stationary series may also yield spurious results even if there is no missing variable. This finding only added another reason to the possible reasons of spurious regression. Neither this finding can be used to argue that the non-stationarity is one and only reason of spurious regression nor this can be used to argue that the spurious regression is time series phenomenon. However, unfortunately, the economists adapted the two misperception. First, they thought that spurious regression is time series phenomenon and secondly, although not explicitly stated, it appears that the economists assume that the non-stationarity is the only cause of spurious regression. Therefore, although not explicitly stated, most of books and articles discussing the spurious regression, discuss the phenomenon in the context of non-stationary time series.

Granger and his coauthors in 1998 wrote a paper entitled “Spurious regressions with stationary series”, in which they show that spurious regression can occur in the stationary data. Therefore, they clear one of the common misconception that the spurious regression is only due to non-stationarity, but they were themselves caught in the second misconception that the spurious regression is time series phenomenon. They define spurious regression as “A spurious regression occurs when a pair of independent series but with strong temporal properties, are found apparently to be related according to standard inference in an OLS regression”. The use of term temporal properties implies that they assume the spurious regression to be time series related phenomenon. But a 100 years ago, Pearson has shown the spurious regression a cross-sectional data.

The unit root and cointegration analysis were developed to cope with the problem of spurious regression. The literature argues that spurious regression can be avoided if there is cointegration. But unfortunately, cointegration can be defined only for non-stationary data. What is the way to avoid spurious regression if the underlying are stationary? The literature is silent to answer this question.

Pesaran et al (1998) developed a new technique ‘ARDL Bound Test’ to test the existence of level relationship between variables. People often confuse the level relationship with cointegration and the common term used for ARDL Bound test is ARDL cointegration, but the in reality, this does not necessarily imply cointegration. The findings of Bound test are more general and imply cointegration only under certain conditions. The ARDL is capable of testing long run relationship between pair of stationary time series as well as between pair of non-stationary time series. However, the long run relationship between stationary time series cannot be termed as cointegration because by definition cointegration is the long run relationship between stationary time series.

In fact, ARDL bound test is a better way to deal with the spurious regression in stationary time series, but several misunderstandings about the test has restricted the usefulness of the test. We will discuss the use and features of ARDL in a future blog. 

Please follow and like us:
error

Can cointegration analysis solve spurious regression problem?

The efforts to avoid the existence of spurious regression has led to the development of modern time series analysis (see How Modern Time Series Analysis Emerged? ). The core objective of unit root and cointegration procedures is to differentiate between genuine and spurious regression. However, despite the huge literature, the unit root and cointegration analysis are unable to solve spurious regression problem. The reason lies mainly in the misunderstanding of the term spurious regression.

Spurious correlation/spurious correlation occur when a pair of variable having no (weak) causal connection appears to have significant (strong) correlation/regression. In these meanings the term spurious correlation/spurious has the same history as the term regression itself. The correlation and regression analysis were invented by Sir Francis Galton in around 1888 and in 1897, Karl Pearson wrote a paper with the following title, ‘Mathematical Contributions to the Theory of Evolution: On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs’ (Pearson, 1897).

This title indicates number of important things about the term spurious correlation: (a) the term spurious correlation was known as early as 1897, that is, in less than 10 years after the invention of correlation analysis (ii) there were more than one types of spurious correlation known to the scientists of that time, therefore, the author used the phrase ‘On a Farm of Spurious Regression’, (c) the spurious correlation was observed in measurement of organs, a cross-sectional data (d) the reason of spurious correlation was use of indices, not the non-stationarity.

One can find in classical econometric literature that that many kinds of spurious correlations were known to experts in first two decades of twentieth century. These kinds of spurious correlations include spurious correlation due to use of indices (Pearson, 1897), spurious correlation due to variations in magnitude of population (Yule, 1910), spurious correlation due to mixing of heterogeneous records (Brown et al, 1914), etc. The most important reason, as the econometricians of that time understand, was the missing third variable (Yule, 1926).

Granger and Newbold (1974) performed a simulation study in which they generated two independent random walk time series x(t)=x(t-1)+e(t) and y(t)=y(t-1)+u(t) . The two series are non-stationary and the correlation of error terms in the two series is zero so that the two series are totally independent of each other. The two variables don’t have any common missing factor to which the movement of the two series can be attributed. Now the regression of the type y(t)=a+bx(t)+e(t) should give insignificant regression coefficient, but the simulation showed very high probability of getting significant coefficient. Therefore, Granger and Newbold concluded that spurious regression occurs due to non-stationarity.

Three points are worth considering regarding the study of Granger and Newbold. First, the above cited literature clearly indicates that the spurious correlation does exist in cross-sectional data and the Granger-Newbold experiment is not capable to explain cross-sectional spurious correlation. Second, the existing understanding of the spurious correlation was that it happens due to missing variables and the experiment adds another reason for the phenomenon which cannot deny the existing understanding. Third, the experiment shows that non-stationarity is one of the reasons of spurious regression. It does not prove that non-stationarity  is ‘the only’ reason of spurious regression.

However, unfortunately, the econometric literature that emerged after Granger and Newbold, adapted the misconception. Now, many textbooks discuss the spurious regression only in the context of non-stationarity, which leads to the misconception that the spurious regression is a non-stationarity related phenomenon. Similarly, the discussion of missing variable as a reason of spurious regression is usually not present in the recent textbooks and other academic literature.

To show that spurious regression is not necessarily a time series phenomenon, consider the following example:

A researcher is interested in knowing the relationship between shoe size and mathematical ability level of school students. He goes to a high school and takes a random sample of the students present in the school. He takes readings on shoe size and ability to solve the mathematical problems of the selected students. He finds that there is very high correlation between two variables. Would this be sufficient to argue that the admission policy of the school should be based on the measurement of shoe size? Otherwise, what accounts for this high correlation?

If sample is selected from a high school having kids in different classes, the same observation is almost sure to occur. The pupil in higher classes have larger shoe size and have higher mathematical skills, whereas student in lower classes have lower mathematical skills. Therefore, high correlation is expected. However, if we take data of only one class, say grade III, we will not see such high correlation. Since theoretically, there should be no correlation between shoe size and mathematical skills, this apparently high correlation may be regarded as spurious correlation/regression. The reason for this spurious correlation is mixing of missing age factor which drives both shoe size and mathematical skills.

Since this is not a time series data, there is no question of the existence of non-stationarity, but the spurious correlation exists. This shows that spurious correlation is no necessarily a time series phenomenon. The unit root and cointegration would be just incapable to solve this problem.

Similarly, it can be shown that the unit root and cointegration analysis can fail to work even with time series data, and this will be discussed in our next blog

Please follow and like us:
error

So why Seem to be Web based On-line casinos Turning into Huge Names during Rugby Sponsorship?

Night-life is undoubtedly probably the most effective establishments about the globe. Daily folks commit your fortune on video games, sporting and even leisure. Certain pleasure things to do can be purchased in real life while others this kind of as on the internet casinos anticipate game enthusiasts and even buffs of on the web slots to the Web. Nfl is the most used off-line method to receive enjoyable, having any throne just as one proven fighter ly sports. Exactly how do you receive check it again you enroll in on-line betting houses and then baseball? The solution is – soccer sponsorship – in addition to here’south for what reason internet gambling establishments have become giant companies during it.

Continue reading “So why Seem to be Web based On-line casinos Turning into Huge Names during Rugby Sponsorship?”

Please follow and like us:
error