Central Bank Borrowing and Inflation: between Myth and Reality

The Covid-19 pandemic was a global disaster, and every nation was affected with a different level of intensity. Many large industries, such as aviation and tourism, faced huge losses. In such circumstances, it was not possible to fund a large Covid support package out of tax revenue. Despite this, many governments, including those that had been running deficit budgets for decades, provided very large Covid relief packages. For example, the UK recorded its last budget surplus in 2002; every budget since then has been in deficit. Even so, the UK’s Covid support package reached 18% of its GDP. Tax revenue could never support such spending, so how did the UK find the money for the package? It printed the money.

This reminds me of an old debate: sovereign governments are authorized to print as much money as they need, so why do governments need to collect taxes, and why don’t they cover all of their expenses just by printing money?

In fact, if governments print money arbitrarily, without regard to economic fundamentals, this may lead to hyperinflation. Germany chose to print money to meet the expenditures of the First World War and its aftermath, and as a result the German mark lost its value in the hyperinflation of the early 1920s.

A more recent example is Zimbabwe. Zimbabwe announced that it would print money to retire public debt and, as a result, inflation jumped to over one million percent in the following year. Within a couple of years, Zimbabwe had to abandon its currency. Currency printing must therefore be done with extreme care and with careful analysis of economic fundamentals.

But what exactly are these fundamentals? How much money can be created without fear of inflation? This question is widely misunderstood by academics and by policy makers.

There is a very clear difference between the practices adopted by the advanced economies and those of the emerging economies. Most of the advanced economies used their central banks to create money for their Covid support programs, but most developing nations remained reluctant to do so. So the question is: what is the limit of money printing without invoking inflation?

There is an emerging heterodox school of thought called Modern Monetary Theory, which has an entirely different perspective on the nature of money. I am not going into the theories and solutions put forward by Modern Monetary Theory; my analysis lies entirely within the framework of conventional economics. My observation is that the conventional wisdom on the money-inflation relationship is very badly misunderstood by the profession.

Let’s start with the very basic Quantity Theory of Money (QTM). The QTM is described by the equation

MV=PY

Where

M represents money supply

V represents velocity of money

P represents the price level

and Y denotes the aggregate GDP

This equation is an identity, which is bound to hold.

Usually V is assumed to be fixed, and in the short run Y is also assumed to be fixed; if M is increased, P must then increase so that the equality holds. The equation therefore says that if the money supply in an economy is increased, the price level P will also increase. But this conclusion rests on two assumptions: constancy of the velocity of money and constancy of aggregate economic activity. Suppose new money is printed and used to create entirely new economic activity, so that Y increases. In this case, the equality may hold without any increase in the price level. The QTM doesn’t predict a necessary rise in prices.

This simple analysis indicates that money can be created for new activity without fear of inflation. There are countries that opted to print money for new activities and did so successfully, without inflation.

Similarly, in times of economic recession people tend to consume less, and therefore V decreases. This decrease in V may lead to a decrease in Y or P, i.e. to a recession or a deflation. Both deflation and recession are considered undesirable.

Alternatively, governments may choose to increase M so that the downward changes in P and Y can be stopped. In this case, the money creation need not be inflationary.
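The three cases discussed above can be checked with a little arithmetic. The following is only a minimal sketch: the function name `price_level` and all the numbers are made up for illustration, and the identity MV=PY is simply solved for P.

```python
def price_level(money, velocity, output):
    """Solve the identity M * V = P * Y for the price level P."""
    return money * velocity / output


# Hypothetical baseline economy (illustrative numbers only).
M, V, Y = 100.0, 2.0, 200.0
p_base = price_level(M, V, Y)

# Case 1: M rises by 10% while V and Y stay fixed, so P rises by 10%.
p_fixed = price_level(M * 1.10, V, Y)

# Case 2: M rises by 10% and the new money funds 10% more output, so P is unchanged.
p_new_activity = price_level(M * 1.10, V, Y * 1.10)

# Case 3: in a recession V falls by 10%; raising M offsets it and P is unchanged.
p_offset = price_level(M / 0.90, V * 0.90, Y)

print(p_base, p_fixed, p_new_activity, p_offset)
```

Only the first case produces a higher price level; in the other two the extra money is absorbed by higher output or by lower velocity.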

This is also evident from the behavior of a large number of developed and some developing nations.

For example, the UK had a budget deficit of about 17% of GDP in 2020, because of huge Covid-related spending and the loss of productivity in the first half of the year. The UK financed this deficit through its central bank. During 2020, the money supply in the UK increased by about 12%, but inflation actually fell from 1.7% to 0.4%. Germany spent about 35% of GDP on Covid-related measures, leading to a huge increase in public debt and the budget deficit, but inflation in Germany is not out of control. In Canada, the M2 money supply increased by about 20% during 2020 without sparking high inflation.

International financial institutions such as the IMF have observed that, at least during economic recessions, central bank borrowing and other kinds of monetary expansion do not bring inflation. Despite this, these institutions use their influence over the developing nations that are obliged to them to enforce policies that only add to the miseries.

What is needed at this time is a deeper analysis of the relationship between central bank borrowing and inflation. Don’t print money in an uncontrolled manner, but do learn lessons from the countries that printed successfully without inflation, and try to follow them. Ultimately, the government’s surplus is the people’s deficit, and vice versa.


Mainstream Monetary Economics: A Package of Logical Fallacies

Mainstream monetary economics is filled with contradictions, logical inconsistencies, missed and messed-up normative implications, and inconsistencies with the data. There exist heterodox theories that match the historical data better, but these theories are often undermined and ignored. It is in fact difficult to find something logical and valid in classical monetary economics. Despite its clear empirical failure, mainstream monetary economics is still widely believed, which is quite surprising.

Thomas Tooke was perhaps the first person to produce a book on monetary economics. In 1857, he wrote a book titled ‘History of Prices and of the State of the Circulation during the Years 1793–1856’. He was also a pioneer of the ‘Banking School theory’. This theory predicts that higher interest rates should be associated with higher price levels. The logic of this view is very simple: the interest rate is part of the cost of production for firms, so the higher the interest rate, the higher the cost of production, leading to higher prices. This is the oldest theory of the relationship between the interest rate and inflation.

However, mainstream economics adopted an opposite theory, known as the demand channel of monetary transmission. This says that if the interest rate increases, people will reduce spending, and the resulting fall in aggregate demand will lead to a reduction in prices. This view was adopted at least as early as the 1890s and remains popular to date. The inflation targeting framework, which is the most popular framework for designing monetary policy today, is also based on this hypothesis.

Historical data from every time period have provided evidence against the demand channel. The best known of the early pieces of evidence against it is the finding of Gibson. Gibson (1923) analyzed data on the interest rate and prices for the United Kingdom over about 200 years and found that a high interest rate is associated with higher prices, which matches Tooke’s view and is supported by the oldest theory in monetary economics.

The findings of Gibson were so impressive that Keynes recognized them as ‘one of the most completely established empirical facts in the whole field of quantitative economics’. However, Keynes termed this finding the ‘Gibson paradox’, indicating the absence of any theory to explain the observation. Given the existence of Tooke’s Banking School theory, this labeling was erroneous. Still, Keynes’ recognition was strong support for the idea that the interest rate and inflation are positively associated. This is quite opposite to the logical foundations of the inflation targeting framework.

History went on, and the empirical evidence supporting Tooke’s view was ignored by labeling it a paradox. In the 1970s, there was a re-invention of supply-side economics, and people discussed the possibility of a cost channel of the monetary transmission mechanism. This was strong theoretical support for the positive association between the interest rate and inflation.

In 1992, Sims produced his seminal paper, in which he found that the impulse response of inflation to changes in the interest rate is positive. Despite the stature of Sims, who later won the Nobel Prize, his findings were labeled the ‘price puzzle’, to indicate the absence of a theoretical underpinning for the observation. This was a denial of Tooke’s theory and of the cost channel.

Brazil reduced its policy rate from 14% to 2% during the three years starting from 2017. Such a drastic cut in the interest rate should have sent inflation skyrocketing if the widely believed demand channel were valid, but the opposite happened. Inflation in Brazil was about 10% in 2017 and is now below 6%.

[Figure: trend in the interest rate and inflation in Brazil, 2015-2021. From 2016 to 2020 the interest rate dropped from 14% to 2%, and inflation also fell.]

The responses to the Global Financial Crisis and to Covid-19 also mark the failure of classical monetary theory. All major economies responded to the pandemic by reducing the interest rate, and inflation also fell. Despite this failure of monetary theory, international financial institutions such as the IMF continue to advocate inflation targeting, which is quite strange.

The graph shows the trend in the interest rate and inflation in the UK. In 2020, the interest rate was suddenly reduced from 0.7% to 0.1%. Mainstream wisdom predicts a rise in inflation after a reduction in the interest rate, but inflation also fell after the cut.

Besides the contradictions with empirical evidence, there are logical inconsistencies and messed-up and missed normative implications. Assume for a while that the demand channel is valid, i.e. that increasing the interest rate reduces inflation. If so, it can happen only through luxuries, because the demand for necessities cannot be reduced significantly. Therefore, if any reduction in the aggregate price level occurs, it must be driven by the prices of luxuries. A rise in the interest rate will thus improve the purchasing power of consumers of luxuries, and will be ineffective in reducing the prices of necessities. These are very obvious normative implications, but conventional monetary economics never discusses the normative implications of monetary policy. That is the missed normative implication.

Assume again that the traditional demand channel exists and that an increase in the interest rate reduces prices. The demand channel also implies that a higher interest rate leads to an increase in unemployment. The cost of price stability is therefore borne by those who lose their jobs. It is also well known that those at risk of losing their jobs are the poorest people. Price stability therefore comes at the cost of the most vulnerable cohort of society, another very serious normative implication. But traditional monetary economics totally ignores the normative implications of monetary policies.

It is also clear that any real effect of inflation on the economy comes from relative price movements. If the prices of all goods and services increase at the same rate, no real variable is affected, an implication known as monetary neutrality. Contrary to monetary neutrality, the Phillips curve assumes that inflation affects employment, which happens because wages and commodity prices change at different rates. This means that focusing on aggregate inflation is meaningless; one needs to look at the relative movement of the sub-indices of the consumer price index. But monetary policy, especially the inflation targeting framework, explicitly focuses on the aggregate price level without taking any account of relative price movements. There is no explanation for this in the literature.

In short, if you look into the theoretical underpinnings of monetary policy, you will find them to be very weak. If you look at the empirical data, the data show the invalidity of the underlying hypothesis. If you look into the normative implications, you will find many which are practically ignored. Therefore, the textbooks on monetary economics need a rewrite, and an alternative monetary theory needs to be developed, one based on empirical data rather than on hypothetical theories.


General to Specific Modeling; a Step by Step Guide

In my previous blogs [1] [2], I have explained that, following the General to Specific methodology, one can choose between theoretical models to find a model which is compatible with the data. Here is an example which shows the step-by-step procedure of the general to simple methodology.

At the end of this blog, you will find the data on three variables, (i) household consumption, (ii) GDP and (iii) inflation, for South Korea. The data set is retrieved from WDI.

Before starting the modeling, it is very useful to plot the data series. We have three data series; two of them are on the same scale and can be plotted together. The third series, ‘inflation’, is in percentage form, and if plotted with the other two series it would not be visible. The graph of the two series is as follows

You can see that the gap between income and consumption widens over time. This is a natural phenomenon: suppose a person has an income of 1000 and consumes 70% of it; the difference between income and consumption would be 300. Suppose the income goes up to 10,000 and the MPC stays the same; then the difference between the two variables widens to 3000. This widening gap is visible in the graph.

However, the widening gap creates a problem for OLS. The residuals at the beginning of the sample would have a smaller variance and those at the end a larger variance, i.e. there will be heteroskedasticity. In the presence of heteroskedasticity, OLS is no longer efficient.

The graphs also show non-linearity: the two series appear to behave like exponential series. A solution to both problems is to use the log transform. The difference between the logs of two series is roughly equal to their percentage difference, and if the MPC remains the same, the gap between the two series becomes stable.
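The income example above can be reproduced in a few lines. This is a small illustrative sketch (the helper name `gaps` is hypothetical): it shows that the gap in levels grows tenfold with income, while the gap in logs stays the same as long as the MPC is unchanged.

```python
import math


def gaps(income, mpc):
    """Return (gap in levels, gap in logs) between income and consumption."""
    consumption = mpc * income
    level_gap = income - consumption
    log_gap = math.log(income) - math.log(consumption)
    return level_gap, log_gap


# Same 70% MPC at two income levels, as in the example in the text.
low_level, low_log = gaps(1_000, 0.7)
high_level, high_log = gaps(10_000, 0.7)

print(low_level, high_level)   # gap in levels: about 300 versus about 3000
print(low_log, high_log)       # gap in logs: identical, equal to -log(0.7)
```

Because the log gap depends only on the MPC, a log gap that still widens over time points to a changing MPC rather than to the growing scale of the series.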

I have taken the log transform and plotted the series again; the graph is as follows

You can see that the gap between the log transforms of the two series is smooth compared to the previous graph. The gap is still widening, but much more smoothly than before. The widening gap in this graph indicates a decline in the MPC over time. In any case, the two graphs indicate that the log transform is the better starting point for building the model.

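I am starting with an ARDL model with two lags of each variable, of the following form

Eq (1): $\log C_t = a + b_1 \log C_{t-1} + b_2 \log C_{t-2} + d_0 \log Y_t + d_1 \log Y_{t-1} + d_2 \log Y_{t-2} + e_t$

where $C_t$ denotes consumption and $Y_t$ denotes income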

The estimated equation is as follows

The equation has a very high R-square, but a high R-square in time series is no surprise; it turns out to be high even with unrelated series. The thing to note is sigma, the standard deviation of the residuals, which indicates that the average size of the error is 0.0271. Before we proceed further, we want to make sure that the estimated model does not violate its assumptions. We tested the model for normality, autocorrelation and heteroskedasticity, and the results are as follows:

The autocorrelation (AR) test has the null hypothesis of no autocorrelation, and the p-value for the AR test is above 5%, indicating that the null is not rejected; the hypothesis survived by a narrow margin. The normality test, with a null of normality, and the heteroskedasticity test, with a null of homoskedasticity, also indicate that the assumptions are valid.

We want to ensure that the model is also good at prediction, because the ultimate goal of an econometric model is to predict the future. The problem is that, for real-time forecasting, we would have to wait for years to see whether the model has the capability to predict. One solution to this problem is to leave some observations out of the estimation sample and then see how well the model predicts these observations.
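The hold-out idea can be sketched as follows. This is not the software output discussed in this blog and it does not compute the same test statistics; it is a crude illustration on made-up data, with a simple one-regressor OLS and hypothetical helper names (`fit_line`, `holdout_check`), that compares the in-sample error with the error on the observations left out.

```python
def fit_line(x, y):
    """OLS for y = a + b*x using the closed-form simple-regression formulas."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


def holdout_check(x, y, n_holdout):
    """Fit on the early observations, then forecast the last n_holdout ones."""
    x_fit, y_fit = x[:-n_holdout], y[:-n_holdout]
    a, b = fit_line(x_fit, y_fit)
    in_sample = [yi - (a + b * xi) for xi, yi in zip(x_fit, y_fit)]
    forecast = [yi - (a + b * xi)
                for xi, yi in zip(x[-n_holdout:], y[-n_holdout:])]
    return rmse(in_sample), rmse(forecast), forecast


# Made-up data: a straight line plus a small alternating disturbance.
x = [float(t) for t in range(20)]
y = [1.0 + 0.5 * xi + 0.1 * (-1) ** t for t, xi in enumerate(x)]

rmse_in, rmse_out, forecast_errors = holdout_check(x, y, n_holdout=5)
print(rmse_in, rmse_out)
```

If the forecast error were much larger than the in-sample error, that would suggest the model does not carry over to the observations it has not seen.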

The output indicates that the two forecast tests have p-values much greater than 5%. The null hypothesis of the Forecast Chi-square test is that the error variance is the same for the sample period and the forecast period, and this hypothesis is not rejected. Similarly, the null hypothesis of the Chow test is that the parameters remain the same for the sample period and the forecast period, and this hypothesis is also not rejected.

All the diagnostics again show satisfactory results.

Now let’s look back at the output of Eq (2). It shows that the second-lag variables Lconsumption_2 and LGDP_2 are insignificant. This means that, keeping Lconsumption_2 in the model, you can exclude LGDP_2, and vice versa. But to exclude both of these variables, you need to test the significance of the two variables simultaneously. Sometimes two variables are individually insignificant but become significant when taken together; usually this happens due to multicollinearity. We therefore test the joint significance of the two second-lag variables, i.e. the null hypothesis that the coefficients of Lconsumption_2 and LGDP_2 are both zero.

The results of the test are

F(2,48)   =   2.1631 [0.1260] 

The results indicate that the hypothesis is not rejected; therefore, we can take the coefficients of the two second-lag variables to be zero, and the model becomes
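For readers who want to see where such a p-value comes from, here is a minimal sketch. The function names are hypothetical, and the F statistic is written in the usual restricted versus unrestricted residual-sum-of-squares form, which is an assumption about how the test is computed. When exactly two restrictions are tested, the upper tail of the F distribution has a simple closed form, so no statistics library is needed.

```python
def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, df_resid):
    """F = ((RSS_r - RSS_u) / q) / (RSS_u / df_resid)."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    return numerator / (rss_unrestricted / df_resid)


def p_value_two_restrictions(f_value, df_resid):
    """Upper-tail probability of an F(2, df_resid) variable.

    Valid only for two numerator degrees of freedom, where the tail
    probability reduces to (1 + 2F/df)^(-df/2).
    """
    return (1.0 + 2.0 * f_value / df_resid) ** (-df_resid / 2.0)


# The statistic reported in the text: F(2,48) = 2.1631
p = p_value_two_restrictions(2.1631, 48)
print(p)
```

Evaluated at the reported statistic, this gives roughly 0.126, in line with the p-value shown in brackets above, which is well over 5%.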

The model M2 was estimated and the results are as follows

The results show that the diagnostic tests for the newly estimated model are all OK, and the forecast performance of the new model is not affected by excluding the two variables. If you compare sigma for Eq (2) and Eq (3), you will find the difference only at the fourth decimal place. This means the size of the model is reduced without paying any cost in terms of predictability.

Now the variables in the model are all significant except the intercept, for which the p-value is 0.178. This means the regression doesn’t support an intercept, and we can reduce the model further by excluding it. This time we don’t need to test a joint restriction, because we want to exclude only one variable. After excluding the intercept, the model becomes

The output indicates that all the diagnostics are OK. All the variables are significant, so no variable can be excluded further.

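Now we can impose some linear restrictions instead of exclusion restrictions. For example, if we want to test whether or not we can take the difference of consumption and income, we need to test the restriction that the coefficient of lagged consumption equals 1 and that the coefficients of current and lagged income sum to 0.

And if we want to test the restriction for the error correction model, we have to test that the coefficient of lagged consumption minus 1 equals minus the sum of the two income coefficients.

Apparently the two restrictions seem valid, because the estimated coefficient of lagged consumption is close to 1 and the coefficients of current and lagged income sum to approximately 0. We have the choice to test R3 or R4. We are testing restriction R3 first. The results are as follows

This means the error correction model can be estimated for the data under consideration.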

For the error correction model, one needs to estimate a static regression (without lags) and use the residuals of that equation as the error correction term. Estimating the static regression yields

The estimates of this equation represent the long-run coefficients of the relationship between the two variables. They show that the long-run elasticity of consumption with respect to income is 0.93.

We have to estimate the following kind of error correction regression

The intercept doesn’t enter the error correction regression. The estimates are as follows

This is the parsimonious model for consumption and income. Eq (5) represents the long-run relationship between the two variables, and Eq (6) describes the short-run dynamics.

The final model has only two parameters, whereas Eq (1), which we started with, contains 6 parameters. The sigma values for Eq (6) and Eq (2) are roughly the same, which tells us that the large model we started from has the same predictive power as the final model. The diagnostic tests are all OK, which means the final model is statistically adequate, in that the assumptions of the model are not opposed by the data.

The final model is an error correction model, which contains information on both the short run and the long run. The short-run information is present in Eq (6), whereas the long-run information is implicit in the error correction term and is available in the static Eq (5).

The same methodology can be adopted for more complex situations: the researcher needs to start from a general model and reduce it successively until the most parsimonious model which is statistically adequate is reached.

Data

Variables Details:

Consumption: Households and NPISHs Final consumption expenditure (current LCU)

GDP: GDP Current LCU

Country: Korea, Republic

Time Period: 1960-2019

Source: WDI online (open source data)

Year  Consumption  GDP
1960  212720000000  249860000000
1961  252990000000  301690000000
1962  304100000000  365860000000
1963  417870000000  518540000000
1964  612260000000  739680000000
1965  693780000000  831390000000
1966  837890000000  1066070000000
1967  1039800000000  1313620000000
1968  1277060000000  1692900000000
1969  1597780000000  2212660000000
1970  2062100000000  2796600000000
1971  2592400000000  3438000000000
1972  3104800000000  4267700000000
1973  3798600000000  5527300000000
1974  5443600000000  7905000000000
1975  7285700000000  10543600000000
1976  9315500000000  14472800000000
1977  11361500000000  18608100000000
1978  15016700000000  25154500000000
1979  19439700000000  32402300000000
1980  24916700000000  39725100000000
1981  31181500000000  49669800000000
1982  35278600000000  57286600000000
1983  39796800000000  68080100000000
1984  44444800000000  78591300000000
1985  49305000000000  88129700000000
1986  54837200000000  102986000000000
1987  61775800000000  121698000000000
1988  71362200000000  145995000000000
1989  83899400000000  165802000000000
1990  100738000000000  200556000000000
1991  122045000000000  242481000000000
1992  141345000000000  277541000000000
1993  161105000000000  315181000000000
1994  192771000000000  372493000000000
1995  227070000000000  436989000000000
1996  261377000000000  490851000000000
1997  289425000000000  542002000000000
1998  270298000000000  537215000000000
1999  311177000000000  591453000000000
2000  355141000000000  651634000000000
2001  391692000000000  707021000000000
2002  440207000000000  784741000000000
2003  452737000000000  837365000000000
2004  468701000000000  908439000000000
2005  500911000000000  957448000000000
2006  533278000000000  1005600000000000
2007  571810000000000  1089660000000000
2008  606356000000000  1154220000000000
2009  622809000000000  1205350000000000
2010  667061000000000  1322610000000000
2011  711119000000000  1388940000000000
2012  738312000000000  1440110000000000
2013  758005000000000  1500820000000000
2014  780463000000000  1562930000000000
2015  804812000000000  1658020000000000
2016  834805000000000  1740780000000000
2017  872791000000000  1835700000000000
2018  911576000000000  1898190000000000
2019  931670000000000  1919040000000000

ARDL model and General to simple methodology

On hearing the word ARDL, the first thing that comes to mind is the bounds testing approach introduced by Pesaran and Shin (1999). Pesaran and Shin’s approach is an incredible use of the ARDL; however, the term ARDL is much older, and the ARDL model has many other uses as well. In fact, the equation used by Pesaran and Shin is a restricted version of the ARDL; the unrestricted version was introduced by Sargan (1964) and popularized by David F. Hendry and his coauthors in several papers. The most important of these is the paper usually known as DHSY, but we will come to the details of DHSY later. Let me first introduce what the ARDL is and what the advantages of this model are.


What is ARDL model?

The ARDL model is an a-theoretic model of the relationship between two time series. Suppose we want to see the effect of a time series variable $X_t$ on another variable $Y_t$. The ARDL model for this purpose will be of the form

$Y_t = a + b_1 Y_{t-1} + \dots + b_j Y_{t-j} + d_0 X_t + d_1 X_{t-1} + \dots + d_k X_{t-k} + e_t$

The same model can be written as

$Y_t = a + \sum_{i=1}^{j} b_i Y_{t-i} + \sum_{i=0}^{k} d_i X_{t-i} + e_t$

In layman’s language, this means that the dependent variable is regressed on its own lags, on the independent variable and on the lags of the independent variable. The above model can be termed an ARDL(j, k) model, referring to the numbers of lags j and k in the model.
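In practice, ‘regressing on lags’ simply means lining up each observation with its own earlier values. The following is a rough sketch of that bookkeeping; the function name `ardl_rows` and the ordering of the regressors are my own choices for illustration, not part of any particular software.

```python
def ardl_rows(y, x, j, k):
    """Build (dependent value, regressor list) pairs for an ARDL(j, k) model.

    Regressors are ordered as: constant, y lags 1..j, then x lags 0..k.
    The first max(j, k) observations are lost because their lags do not exist.
    """
    start = max(j, k)
    rows = []
    for t in range(start, len(y)):
        regressors = [1.0]
        regressors += [y[t - i] for i in range(1, j + 1)]   # own lags
        regressors += [x[t - i] for i in range(0, k + 1)]   # current and lagged x
        rows.append((y[t], regressors))
    return rows


# Tiny made-up series, ARDL(1, 1): each row is y_t on [1, y_{t-1}, x_t, x_{t-1}].
rows = ardl_rows([1, 2, 3, 4], [10, 20, 30, 40], j=1, k=1)
for dependent, regressors in rows:
    print(dependent, regressors)
```

With four observations and one lag, three usable rows remain, which is why longer lag structures quickly eat into short annual samples.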

The model itself is written without any theoretical considerations. However, a large number of theoretical models are embedded inside this model, and one can derive the appropriate theoretical model by testing and imposing restrictions on it.

To have a more concrete idea, let’s consider the relationship between consumption and income. To simplify further, let’s take j=k=1, so that the ARDL(1,1) model for the relationship between consumption and income can be written as

Model 1:          $C_t = a + b_1 C_{t-1} + d_0 Y_t + d_1 Y_{t-1} + e_t$

Here $C$ denotes consumption and $Y$ denotes income; $a$, $b_1$, $d_0$, $d_1$ denote the regression coefficients and $e_t$ denotes the error term. So far, no theory has been used to develop this model, and the regression coefficients don’t have any theoretical interpretation. However, this model can be used to select the appropriate theoretical model of consumption.

Suppose we have estimated the above model and found the regression coefficients. We can test any one of the coefficients, or a number of coefficients together, for various kinds of restrictions. Suppose we test the restriction that

R1: H0: $b_1 = d_1 = 0$

Suppose testing this restriction on actual data shows that it is valid; this means we can exclude the corresponding variables from the model. Excluding the variables, the model becomes

Model 2:          $C_t = a + d_0 Y_t + e_t$

Model 2 is actually the Keynesian consumption function (also called the absolute income hypothesis), which says that current consumption depends on current income only. The coefficient of income in this equation is the marginal propensity to consume, and Keynes predicted that this coefficient would be between 0 and 1, implying that individuals consume a part of their income and save a part of it for the future.

Suppose that the data did not support the restriction R1, but the following restriction is valid

R2: H0: $d_1 = 0$

This means model 1 would become

Model 3:          $C_t = a + b_1 C_{t-1} + d_0 Y_t + e_t$

This means that current consumption depends on current income and past consumption. This is called the Habit Persistence model, where past consumption is the proxy for habit. The model says that what was consumed in the past has an effect on current consumption, which is evident from human behavior.

Suppose that the data did not support the restriction R1, but the following restriction is valid

R3: H0: $b_1 = 0$

This means model 1 would become

Model 4:          $C_t = a + d_0 Y_t + d_1 Y_{t-1} + e_t$

This means that current consumption depends on current income and past income. This is called the Partial Adjustment model. According to the Keynesian consumption function, consumption should depend only on current income, but the partial adjustment model says that it takes some time to adjust to a new income. Therefore, consumption depends partially on current income and partially on past income.
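The way models 2 to 4 sit inside model 1 can be made concrete with a small sketch. The coefficient values below are invented purely for illustration, the error term is left out, and the names `model1` and `RESTRICTIONS` are hypothetical; each restriction simply sets some coefficients of model 1 to zero.

```python
def model1(c_prev, y_now, y_prev, a, b1, d0, d1):
    """Systematic part of the ARDL(1,1): C_t = a + b1*C_{t-1} + d0*Y_t + d1*Y_{t-1}."""
    return a + b1 * c_prev + d0 * y_now + d1 * y_prev


# Each nested model is model 1 with some coefficients restricted to zero.
RESTRICTIONS = {
    "Model 2 (Keynesian)": {"b1": 0.0, "d1": 0.0},    # R1
    "Model 3 (habit persistence)": {"d1": 0.0},       # R2
    "Model 4 (partial adjustment)": {"b1": 0.0},      # R3
}

# Invented coefficients and data point, for illustration only.
params = {"a": 10.0, "b1": 0.5, "d0": 0.6, "d1": 0.2}
c_prev, y_now, y_prev = 100.0, 200.0, 150.0

predictions = {"Model 1 (unrestricted)": model1(c_prev, y_now, y_prev, **params)}
for name, restriction in RESTRICTIONS.items():
    restricted_params = {**params, **restriction}
    predictions[name] = model1(c_prev, y_now, y_prev, **restricted_params)

for name, value in predictions.items():
    print(name, value)
```

The same function serves all four models; which version to report is decided by which restrictions the data allow.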

In a similar way, one can derive many other models from model 1 which represent different theories. The details of the models that can be derived from model 1 can be found in Charemza and Deadman (1997)’s ‘New Directions in Econometric Practice…’.

It can also be shown that difference-form models are derivable from model 1. Consider the following restriction

R4: H0: $b_1 = 1$ and $d_1 = -d_0$

If this restriction is valid, model 1 becomes

$C_t = a + C_{t-1} + d_0 Y_t - d_0 Y_{t-1} + e_t$

This model can be re-written as

$C_t - C_{t-1} = a + d_0 (Y_t - Y_{t-1}) + e_t$

This means

Model 5: $\Delta C_t = a + d_0 \Delta Y_t + e_t$

This indicates that difference-form models can also be derived from model 1 with certain restrictions.

Further elaboration shows that error correction models can also be derived from model 1.

Consider model 1 again and subtract $C_{t-1}$ from both sides; we get

$C_t - C_{t-1} = a + b_1 C_{t-1} - C_{t-1} + d_0 Y_t + d_1 Y_{t-1} + e_t$

Adding and subtracting $d_0 Y_{t-1}$ on the right-hand side, we get

$\Delta C_t = a + (b_1 - 1) C_{t-1} + d_0 Y_t + d_1 Y_{t-1} + d_0 Y_{t-1} - d_0 Y_{t-1} + e_t$

$\Delta C_t = a + (b_1 - 1) C_{t-1} + d_0 \Delta Y_t + d_1 Y_{t-1} + d_0 Y_{t-1} + e_t$

$\Delta C_t = a + (b_1 - 1) C_{t-1} + d_0 \Delta Y_t + (d_1 + d_0) Y_{t-1} + e_t$

This equation contains an error correction mechanism if

R6: $(b_1 - 1) = -(d_1 + d_0)$

Assume

$(b_1 - 1) = -(d_1 + d_0) = F$

The equation will reduce to

$\Delta C_t = a + F (C_{t-1} - Y_{t-1}) + d_0 \Delta Y_t + e_t$

This is our well-known error correction model, and it can be derived if R6 is valid.

Therefore, the existence of an error correction mechanism can also be tested from model 1: the mechanism is considered present if R6 is valid.
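The algebra above can be verified numerically. In the following sketch the coefficient values and the income path are invented, and the error term is set to zero; the point is only that, once R6 holds, the levels form of model 1 and the error correction form generate exactly the same consumption path.

```python
def ardl_step(c_prev, y_now, y_prev, a, b1, d0, d1):
    """Model 1 in levels (error term omitted)."""
    return a + b1 * c_prev + d0 * y_now + d1 * y_prev


def ecm_step(c_prev, y_now, y_prev, a, d0, F):
    """Error correction form: compute the change in C and add it to C_{t-1}."""
    delta_c = a + F * (c_prev - y_prev) + d0 * (y_now - y_prev)
    return c_prev + delta_c


# Invented coefficients; d1 is chosen so that R6 holds: (b1 - 1) = -(d1 + d0).
a, b1, d0 = 0.2, 0.7, 0.5
F = b1 - 1.0
d1 = -F - d0

income = [10.0, 10.4, 10.9, 11.1, 11.8, 12.5]   # made-up income path
c_ardl = c_ecm = 9.0                            # common starting value
max_gap = 0.0
for y_prev, y_now in zip(income[:-1], income[1:]):
    c_ardl = ardl_step(c_ardl, y_now, y_prev, a, b1, d0, d1)
    c_ecm = ecm_step(c_ecm, y_now, y_prev, a, d0, F)
    max_gap = max(max_gap, abs(c_ardl - c_ecm))

print(max_gap)   # zero up to floating-point rounding
```

If d1 is changed so that R6 no longer holds, the two paths drift apart, which is exactly what the restriction test is checking.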

As we have discussed, a number of theoretical models can be derived from model 1 by testing certain restrictions. We can start from model 1 and proceed by testing different restrictions. We can impose the restrictions which are found valid and discard the restrictions which are found invalid in our testing. This provides us with a natural way of selecting among various theoretical models.

When we say theoretical model, we mean that the model makes some economic sense. For example, models 2 to 6 all make economic sense. So, how do we decide between these models? This problem can be solved if we start with an ARDL model and choose to impose the restrictions which are permitted by the data.

The famous DHSY paper recommends a methodology like this. DHSY recommend that we should start with a large model which encompasses various theoretical models. The model can then be simplified by testing certain restrictions.

In another blog I have argued that if there are different theories for a certain variable, the research must be comparative. This short blog gives a brief outline of how we can do this. In practice, one needs to take larger ARDL structures, and the number of models that can be derived from the parent model would also be large.


Research in presence of multiple theories

Consider a hypothetical question, a researcher was given with a research question; compare the mathematical ability of male and female students of grade 5. The researcher collected data of 300 female students and 300 male students of grade 5 and administered a test of mathematical questions. The average score for female students was 80% and average score of male student was 50%, the difference was statistically significant and therefore, the researcher concluded that the female students have better mathematical aptitude.

The findings seem strong and impressive, but let me add into the information that the male students were chosen from a far-off village with untrained educational staff and lack of educational facilities. The female students were chosen from an elite school of a metropolitan city, where the best teachers of the city actually serve. What should be the conclusion now? It can be argued that actually difference doesn’t come from the gender, the difference is coming from the school type.

The researcher carrying out the project says ‘look, my research assignment was only to investigate the difference due to gender, the school type is not the question I am interested in, therefore, I have nothing to do with the school type’.

Do you think that the argument of researcher is valid and the findings should be considered reliable? The answer is obvious, the findings are not reliable, and the school type creates a serious bias.  The researcher must compare students from the same school type. This implies you have to take care of the variables not having any mention in your research question if they are determinants of your dependent variable.

Now let’s apply the same logic to econometric modeling. Suppose we have the task of analyzing the impact of financial development on economic growth. We run a regression of GDP growth on a proxy of financial development, obtain the regression output, and present it as the impact of financial development on economic growth. Is this reliable research?

This research is deficient in just the same way as our example of gender and mathematical ability. The research is not reliable if ceteris paribus doesn’t hold: the other variables which may affect the output variable should remain the same.

But in real life, it is often very difficult to keep all other variables the same. The economy continuously evolves, and so do the economic variables. The alternative is to take the other variables into account while running the regression. This means that other variables that determine your dependent variable should be included as control variables. Suppose you want to check the effect of X1 on Y using the model Y=a+bX1+e, and other research studies indicate that another model exists for Y, namely Y=c+dX2+e. Then you cannot run the first model while ignoring the second. If you run only model 1 and ignore the other models, the results will be biased (whenever X2 is correlated with X1) in the same way as in our example of mathematical ability. You have to use the variables of model 2 as control variables, even if you are not interested in the coefficients of model 2. Therefore, the estimated model would be Y=a+bX1+dX2+e.
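The effect of leaving out a control variable can be seen in a small simulation. The sketch below is only an illustration under assumed numbers: the data-generating process, the coefficient values (0.5 on X1, 2.0 on X2), the strength of the link between X1 and X2, and the helper name `ols` are all hypothetical choices, not taken from any study.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
n = 5000

# Hypothetical data-generating process (all numbers are assumptions):
# Y depends on both X1 and X2, and X1 is correlated with X2.
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_short = ols(y, x1)[1]      # Y = a + b*X1 + e, X2 ignored
b_long = ols(y, x1, x2)[1]   # Y = a + b*X1 + d*X2 + e, X2 as control

print(f"coefficient on X1 without the control: {b_short:.2f}")
print(f"coefficient on X1 with X2 as control:  {b_long:.2f}")
```

In this setup the regression with the control should recover a coefficient near the assumed 0.5, while the regression without it attributes part of the effect of X2 to X1.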

Including control variables is feasible when there are only a few models. The seminal study of Davidson, Hendry, Srba and Yeo titled ‘Econometric modelling of the aggregate time-series relationship between ….’ (often referred to as DHSY) summarizes the way to build a model in such a situation. But it often happens that a very large number of models exist for one variable. For example, there is a very large number of models for growth. A book titled ‘Growth Econometrics’ by Durlauf lists hundreds of models for growth used by researchers in their studies. Life becomes very complicated when you have so many models. Estimating a model with all determinants of growth would be literally impossible for most countries using the classical methodology. This is because growth data are usually available at annual or quarterly frequency, and the number of predictors taken from all models collectively would exceed the number of observations. Time series data also have a dynamic structure, and taking lags of variables makes things more complicated. Therefore, classical techniques of econometrics often fail to work for such high dimensional data.

Some experts have invented sophisticated techniques for modeling in a scenario where the number of predictors becomes very large. These techniques include Extreme Bounds Analysis, Weighted Average Least Squares, Autometrics, etc. High dimensional econometric techniques are also a very interesting field of econometric investigation. However, DHSY is extremely useful for situations where there is more than one model for a variable based on different theories. The DHSY methodology is also called the LSE methodology, the General to Specific methodology or simply the G2S methodology.
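A minimal sketch of why classical least squares breaks down when predictors outnumber observations is given below. The sizes (30 observations, 50 predictors) and the random data are assumptions chosen only for illustration; the dependent variable is pure noise, unrelated to the predictors by construction.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, an arbitrary choice
n_obs, n_pred = 30, 50          # e.g. 30 annual observations, 50 candidate predictors

X = rng.normal(size=(n_obs, n_pred))
y = rng.normal(size=n_obs)      # pure noise, unrelated to X by construction

# With more columns than rows, the rank of X cannot exceed n_obs,
# so X'X is singular and the OLS coefficients are not unique.
rank = np.linalg.matrix_rank(X)

# lstsq returns the minimum-norm solution among infinitely many.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
max_resid = np.max(np.abs(y - X @ coef))

print("rank of X:", rank, "but coefficients to estimate:", n_pred)
print("largest absolute residual:", max_resid)
```

The fit is essentially perfect even though y has nothing to do with X, which is why such a regression carries no information about the true determinants.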

Learning Central Limit Theorem with Microsoft Excel

Many statistical and econometric procedures depend on the assumption of normality. The importance of the normal distribution lies in the fact that sums and averages of random variables tend to be approximately normally distributed regardless of the distribution of the individual draws. The central limit theorem (CLT) explains this fact. The CLT is very important because it provides the justification for most of statistical inference. The goal of this paper is to provide a pedagogical introduction to the CLT in the form of a self-study computer exercise. The paper presents a student-friendly illustration of how the central limit theorem works. The mathematics of the theorem is introduced in the last section of the paper.

CENTRAL LIMIT THEOREM

We start with an example in which we observe a phenomenon, and then we discuss the theoretical background of the phenomenon.

Consider 10 players playing with identical dice simultaneously. Each player rolls the die a large number of times. The six numbers on the die have equal probability of occurrence on any roll and for any player. Let us ask the computer to generate data that resemble the outcomes of these rolls.

We need Microsoft Excel (2007 or above preferable) for this exercise. Point to the ‘Data’ tab in the menu bar; it should show ‘Data Analysis’ in the toolbar. If Data Analysis is not there, then you need to install the Analysis ToolPak. To do this, click on the Office button, which is the yellow button at the top left corner of the Microsoft Excel window. Choose ‘Add-Ins’ from the left pane that appears, then check the box against ‘Analysis ToolPak’ and click OK.

Select Office Button → Excel Options, then select Add-Ins → Analysis ToolPak → Go from the screen that appears.

The computer will take a few moments to install the Analysis ToolPak. After the installation is done, you will see ‘Data Analysis’ when you point again to the Data tab in the menu bar. The Analysis ToolPak provides a variety of tools for statistical procedures.

We will use this tool pack to generate data that match the situation described above.

Open an Excel spreadsheet and write 1, 2, 3, …, 6 in cells A1:A6.

Write ‘=1/6’ in cell B1 and copy it down to B6.

This gives the possible outcomes of a roll of the die and their probabilities, as in the following table:

1    0.167
2    0.167
3    0.167
4    0.167
5    0.167
6    0.167

Here the first column contains the outcomes of a roll of the die and the second column contains the probabilities of the outcomes. Now we want the computer to make some draws from this distribution. That is, we want the computer to roll the dice and record the outcomes.

For this, go to Data → Data Analysis → Random Number Generation and select the discrete distribution. Set number of variables = 10 and number of random numbers = 1000, enter A1:B6 as the value and probability input range, set the output range to A8 and click OK.

This will generate a 1000×10 matrix of outcomes of rolls of the die in cells A8:J1007. Each column represents the outcomes for a certain player in 1000 draws, whereas each row represents the outcomes for the 10 players in one particular draw. In the next column, K, we want the sum of each row. Write ‘=SUM(A8:J8)’ in cell K8 and copy it down to K1007. This will generate the column of sums for each draw.
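For readers who prefer a script to a spreadsheet, the same data can be generated outside Excel. The following is a rough Python equivalent, assuming NumPy is available; the seed and the variable names `rolls` and `sums` are arbitrary choices and not part of the original exercise.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed, an arbitrary choice

outcomes = np.arange(1, 7)        # faces of the die (cells A1:A6)
probs = np.full(6, 1 / 6)         # equal probabilities (cells B1:B6)

# 1000 draws (rows) for 10 players (columns), like cells A8:J1007
rolls = rng.choice(outcomes, size=(1000, 10), p=probs)

# Sum of each row, like column K
sums = rolls.sum(axis=1)

print(rolls[:3])   # first three draws for all 10 players
print(sums[:3])    # the corresponding row sums
```

Because the draws are random, the numbers will differ from the Excel output, but they come from the same distribution.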

Now we are interested in knowing the distribution of outcomes for each player.

Let us ask Excel to count the frequency of each outcome for player 1. Choose Data/Data Analysis/Histogram and fill in the dialogue box as follows:

The screenshot shows the dialogue box filled in to count the frequency of the outcomes observed by player 1. The input range is the column for which we want to count the frequency of outcomes (here A8:A1007), and the bin range is the range of possible outcomes (here A1:A6). This process will generate the frequency of the six possible outcomes for the single player. When we did this, we got the following output:

Bin     Frequency
1       155
2       154
3       160
4       169
5       179
6       183
More    0

The table above gives the frequency of the outcomes, and the same frequencies are plotted in the bar chart. You can observe that the frequencies of occurrence are not exactly equal, but the heights of the vertical bars are approximately the same. This implies that the distribution of draws is almost uniform, which is what we expect because we made the draws from a uniform distribution. If we calculate the percentage of each outcome, we get 15.5%, 15.4%, 16%, 16.9%, 17.9% and 18.3% respectively. These percentages are close to the probability of these outcomes, i.e. 16.67%.
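The frequency count for a single player can also be sketched in Python. This is only an illustration with freshly simulated rolls (the seed and the name `player1` are assumptions), so the counts will not match the table above exactly.

```python
import numpy as np

rng = np.random.default_rng(5)            # fixed seed, an arbitrary choice
player1 = rng.integers(1, 7, size=1000)   # 1000 rolls of a fair die (values 1..6)

# bincount counts how often each integer occurs; index 0 is unused here
freq = np.bincount(player1, minlength=7)[1:]
share = 100 * freq / freq.sum()

for face, f, s in zip(range(1, 7), freq, share):
    print(f"{face}  {f:4d}  {s:5.1f}%")
```

Each percentage should lie in the neighbourhood of 16.67%, with small differences due to sampling.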

Now we want to check the distribution of the column which contains the sum of draws for the 10 players, i.e. column K. The possible values of the sum range from 10 to 60 (if all columns have 1, the sum is 10; if all columns have 6, the sum is 60; in all other cases it lies between these two numbers). It would be inappropriate to count the frequencies of all numbers in this range. Let us make a few bins and count the frequencies of these bins. We choose the following bins: (10, 20), (20, 30), …, (50, 60). Again we ask Excel to count the frequencies of these bins. To do this, write 10, 20, …, 60 in column M of the Excel spreadsheet (these numbers are the boundaries of the bins we made). Now select Data/Data Analysis/Histogram and fill in the dialogue box that appears.

The input range is the range that contains the sum of draws, i.e. K8 to K1007, and the bin range is the address of the cells where we have written the boundary points of our desired bins. Completing this procedure produces the frequency of each bin. Here is the output that we got from this exercise.

Bin     Frequency
10      0
20      5
30      211
40      638
50      144
60      2
More    0

The first row of this output tells us that there was no number at or below the starting point of the first bin, i.e. 10, and the 2nd, 3rd, … rows give the frequencies of the bins (10, 20), (20, 30), … respectively, where each bin includes its upper boundary. The last row reports the frequency of numbers larger than the end point of the last bin, i.e. 60.
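The same binning can be sketched in Python. The code below assumes NumPy and uses `np.digitize` with `right=True`, so that each bin includes its upper boundary as in the Excel output; the seed and variable names are arbitrary, and the simulated counts will differ somewhat from the table above.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed, an arbitrary choice
sums = rng.integers(1, 7, size=(1000, 10)).sum(axis=1)   # like column K

bins = np.array([10, 20, 30, 40, 50, 60])   # bin boundaries, like column M

# right=True gives index i with bins[i-1] < x <= bins[i];
# index 0 means x <= 10 and index len(bins) means x > 60 ("More").
idx = np.digitize(sums, bins, right=True)
freq = np.bincount(idx, minlength=len(bins) + 1)

for label, f in zip(list(bins) + ["More"], freq):
    print(label, f)
```

Most of the sums should fall in the bin ending at 40, with few observations in the outer bins.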

Below is the plot of this frequency table.

Obviously this plot has no resemblance to the uniform distribution. Rather, if you remember the famous bell shape of the normal distribution, this plot is closer to that shape.

Let us summarize our observations from this experiment. We have several columns of random numbers that resemble rolls of a die, i.e. the possible outcomes are 1, …, 6, each with probability 1/6 (uniform distribution). If we count the frequency of these outcomes in any column, the outcomes reveal the distributional shape and the histogram is almost uniform. The last column contains the sum of 10 draws from the uniform distribution, and we saw that the distribution of this column is no longer uniform; rather, it is a closer match to the shape of the normal distribution.

Explanation of the observation:

The phenomenon that we observed is explained by the central limit theorem. According to the central limit theorem, if $X_1, X_2, \dots, X_n$ are independent draws from any distribution (not necessarily uniform) with finite variance, then the distribution of the sum of draws $S_n = \sum_{i=1}^{n} X_i$ and of the average of draws $\bar{X}_n = S_n / n$ will be approximately normal if the sample size $n$ is large.

Mean and SE for sum of draws:

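From our basic knowledge of random variables we know that, for random variables $X$ and $Y$:

$E(X + Y) = E(X) + E(Y)$

And, if $X$ and $Y$ are independent,

$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$

Suppose $X_1, X_2, \dots, X_n$ are independent draws from a distribution with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$.

Let $S_n = \sum_{i=1}^{n} X_i$, then $E(S_n) = n\mu$ and $\mathrm{Var}(S_n) = n\sigma^2$.

These two statements give the parameters of the normal distribution that emerges from the sum of random numbers, which is the phenomenon we observed above.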

Verification

Consider the exercise discussed above; columns A:J are draws from dice rolls with expectation 3.5 and variance 2.91667. Column K is the sum of the 10 previous columns. Thus the expected value of K is 10*3.5=35 and its variance is 2.91667*10=29.1667. This also implies that the SE of column K is 5.400 (the square root of the variance).
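The arithmetic behind these numbers can be checked with a few lines of Python. This is a small sketch using exact fractions; the variable names are arbitrary.

```python
from fractions import Fraction
import math

faces = range(1, 7)
p = Fraction(1, 6)   # probability of each face of a fair die

# Expectation and variance of a single roll
mean = sum(p * x for x in faces)                 # 7/2 = 3.5
var = sum(p * (x - mean) ** 2 for x in faces)    # 35/12 = 2.91667

# Sum of n = 10 independent rolls: means add, variances add
n = 10
mean_sum = n * mean
var_sum = n * var
se_sum = math.sqrt(float(var_sum))

print(float(mean), float(var))
print(float(mean_sum), float(var_sum), round(se_sum, 3))
```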

The sample mean and variance in the above exercise can be calculated as follows:

Write ‘=AVERAGE(K8:K1007)’ in any blank cell in the spreadsheet. This will calculate the sample mean of the numbers in column K. The answer will be close to 35. When I did this, I found 34.95.

Write ‘=VAR(K8:K1007)’ in any blank cell in the spreadsheet. This will calculate the sample variance of the numbers in column K. The answer will be close to 29.17. When I did this, I found 30.02.
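A rough Python counterpart of these two formulas is sketched below, again on freshly simulated sums (the seed is an arbitrary assumption). Excel's VAR computes the sample variance, which corresponds to `ddof=1` in NumPy.

```python
import numpy as np

rng = np.random.default_rng(2024)   # fixed seed, an arbitrary choice
sums = rng.integers(1, 7, size=(1000, 10)).sum(axis=1)   # like column K

sample_mean = sums.mean()        # counterpart of AVERAGE(K8:K1007)
sample_var = sums.var(ddof=1)    # counterpart of VAR(K8:K1007)

print(f"sample mean: {sample_mean:.2f}  (theory: 35)")
print(f"sample variance: {sample_var:.2f}  (theory: 29.17)")
```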

Summary:

In this exercise, we observed that if we take draws from a certain distribution, the frequencies of the draws reflect the probability structure of the parent distribution. But when we take the sum of draws, the distribution of the sum takes the shape of the normal distribution. This phenomenon has its root in the central limit theorem, which is stated in Section …..

Spurious Regression With Stationary Time Series

A spurious relationship is said to have occurred if the statistical summaries indicate that two variables are related to each other when in fact there is no theoretical relationship between them. It often happens in time series data, and there are many well-known examples of spurious correlation in time series data. For example, Yule (1926) observed a strong relationship between marriages in church and the mortality rate in UK data. Obviously, it is very hard to explain how marriages in church could possibly affect mortality, but the statistics say one variable has a very strong correlation with the other. This is a typical example of spurious regression. Yule (1926) thought that this happens due to a missing third variable.

The term spurious correlation was invented on or before 1897, i.e. less than 15 years after the invention of regression analysis. In 1897, Karl Pearson wrote a paper entitled ‘Mathematical Contributions to the Theory of Evolution: On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs’. The title indicates that the term spurious correlation was known at least as early as 1897, and that it was observed in data related to the measurement of organs. The reason for this spurious correlation was the use of indices. In the next 20 years, many reasons for spurious correlation were unveiled, the most popular being a missing third variable. This means that if X is a cause of Y and X is also a cause of Z, but Y and Z are not directly associated, then regressing Y on Z will give a spurious regression.

In 1974, Granger and Newbold (Granger later won the Nobel Prize) found that two non-stationary series may also yield spurious results even if there is no missing variable. This finding only added another reason to the possible reasons for spurious regression. It can be used neither to argue that non-stationarity is the one and only reason for spurious regression, nor to argue that spurious regression is a time series phenomenon. Unfortunately, however, economists adopted both misperceptions. First, they thought that spurious regression is a time series phenomenon, and second, although it is not explicitly stated, it appears that economists assume that non-stationarity is the only cause of spurious regression. Therefore, most books and articles discussing spurious regression discuss the phenomenon in the context of non-stationary time series.

Granger and his coauthors in 1998 wrote a paper entitled “Spurious regressions with stationary series”, in which they show that spurious regression can occur in stationary data. They thereby cleared up one common misconception, that spurious regression is only due to non-stationarity, but they were themselves caught in the second misconception, that spurious regression is a time series phenomenon. They define spurious regression as follows: “A spurious regression occurs when a pair of independent series but with strong temporal properties, are found apparently to be related according to standard inference in an OLS regression”. The use of the term temporal properties implies that they assume spurious regression to be a time series related phenomenon. But a hundred years earlier, Pearson had shown spurious correlation in cross-sectional data.
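The possibility of spurious regression between stationary series can be explored with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration, not the design of the cited paper: it uses two independent AR(1) series with coefficient 0.9, a sample size of 100, and the conventional OLS t-test with the critical value 1.96. The helper names `ar1` and `slope_t_stat` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)  # fixed seed, an arbitrary choice


def ar1(n, phi, rng, burn=100):
    """Simulate a stationary AR(1) series z(t) = phi*z(t-1) + e(t)."""
    e = rng.normal(size=n + burn)
    z = np.zeros(n + burn)
    for t in range(1, n + burn):
        z[t] = phi * z[t - 1] + e[t]
    return z[burn:]   # drop the burn-in so the start value does not matter


def slope_t_stat(y, x):
    """Conventional OLS t-statistic of b in y = a + b*x + e."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])


n, phi, reps = 100, 0.9, 1000
rejections = sum(
    abs(slope_t_stat(ar1(n, phi, rng), ar1(n, phi, rng))) > 1.96
    for _ in range(reps)
)
rate = rejections / reps
print(f"share of 'significant' slopes at the nominal 5% level: {rate:.2f}")
```

If the test were reliable, the share would be close to 0.05; with strongly autocorrelated but independent series it should come out far higher, although both series are stationary.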

Unit root and cointegration analysis were developed to cope with the problem of spurious regression. The literature argues that spurious regression can be avoided if there is cointegration. But unfortunately, cointegration can be defined only for non-stationary data. How can spurious regression be avoided if the underlying series are stationary? The literature is silent on this question.

Pesaran et al (1998) developed a new technique, the ‘ARDL bounds test’, to test the existence of a level relationship between variables. People often confuse the level relationship with cointegration, and the common term used for the ARDL bounds test is ARDL cointegration, but in reality the test does not necessarily imply cointegration. The findings of the bounds test are more general and imply cointegration only under certain conditions. The ARDL is capable of testing the long run relationship between a pair of stationary time series as well as between a pair of non-stationary time series. However, the long run relationship between stationary time series cannot be termed cointegration, because by definition cointegration is a long run relationship between non-stationary time series.

In fact, the ARDL bounds test is a better way to deal with spurious regression in stationary time series, but several misunderstandings about the test have restricted its usefulness. We will discuss the use and features of ARDL in a future blog.

Can cointegration analysis solve spurious regression problem?

The efforts to avoid spurious regression have led to the development of modern time series analysis (see How Modern Time Series Analysis Emerged? ). The core objective of unit root and cointegration procedures is to differentiate between genuine and spurious regression. However, despite the huge literature, unit root and cointegration analysis are unable to solve the spurious regression problem. The reason lies mainly in the misunderstanding of the term spurious regression.

Spurious correlation/spurious regression occurs when a pair of variables having no (or weak) causal connection appears to have a significant (strong) correlation/regression. In this meaning, the term spurious correlation/spurious regression has the same history as the term regression itself. Correlation and regression analysis were invented by Sir Francis Galton in around 1888, and in 1897 Karl Pearson wrote a paper with the following title: ‘Mathematical Contributions to the Theory of Evolution: On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs’ (Pearson, 1897).

This title indicates a number of important things about the term spurious correlation: (a) the term spurious correlation was known as early as 1897, that is, less than 10 years after the invention of correlation analysis; (b) more than one type of spurious correlation was known to the scientists of that time, which is why the author used the phrase ‘On a Form of Spurious Correlation’; (c) the spurious correlation was observed in the measurement of organs, i.e. in cross-sectional data; (d) the reason for the spurious correlation was the use of indices, not non-stationarity.

One can find in the classical econometric literature that many kinds of spurious correlation were known to experts in the first two decades of the twentieth century. These include spurious correlation due to the use of indices (Pearson, 1897), spurious correlation due to variations in the magnitude of population (Yule, 1910), spurious correlation due to the mixing of heterogeneous records (Brown et al, 1914), etc. The most important reason, as the econometricians of that time understood it, was the missing third variable (Yule, 1926).

Granger and Newbold (1974) performed a simulation study in which they generated two independent random walk time series, x(t)=x(t-1)+e(t) and y(t)=y(t-1)+u(t). The two series are non-stationary and the correlation of the error terms in the two series is zero, so that the two series are totally independent of each other. The two variables don’t have any common missing factor to which the movement of the two series can be attributed. A regression of the type y(t)=a+bx(t)+e(t) should therefore give an insignificant regression coefficient, but the simulation showed a very high probability of getting a significant coefficient. Therefore, Granger and Newbold concluded that spurious regression occurs due to non-stationarity.

Three points are worth considering regarding the study of Granger and Newbold. First, the literature cited above clearly indicates that spurious correlation does exist in cross-sectional data, and the Granger-Newbold experiment cannot explain cross-sectional spurious correlation. Second, the existing understanding of spurious correlation was that it happens due to missing variables; the experiment adds another reason for the phenomenon, which does not deny the existing understanding. Third, the experiment shows that non-stationarity is one of the reasons for spurious regression. It does not prove that non-stationarity is ‘the only’ reason for spurious regression.

Unfortunately, however, the econometric literature that emerged after Granger and Newbold adopted the misconception. Now, many textbooks discuss spurious regression only in the context of non-stationarity, which leads to the misconception that spurious regression is a non-stationarity related phenomenon. Similarly, the discussion of a missing variable as a reason for spurious regression is usually not present in recent textbooks and other academic literature.
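An experiment in the spirit of the one described above can be sketched in Python. The sample size (100), the number of replications (2000), the seed and the 1.96 critical value are assumptions made here for illustration, not the settings of the original study.

```python
import numpy as np

rng = np.random.default_rng(1974)  # fixed seed, an arbitrary choice
n, reps = 100, 2000

# Independent random walks: x(t) = x(t-1) + e(t), y(t) = y(t-1) + u(t).
# Each row is one replication; cumsum turns white noise into a random walk.
x = rng.normal(size=(reps, n)).cumsum(axis=1)
y = rng.normal(size=(reps, n)).cumsum(axis=1)

# Simple-regression formulas for y(t) = a + b*x(t) + e(t),
# applied to all replications at once.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
sxx = (xd ** 2).sum(axis=1)
b = (xd * yd).sum(axis=1) / sxx
resid = yd - b[:, None] * xd
se_b = np.sqrt((resid ** 2).sum(axis=1) / (n - 2) / sxx)
t = b / se_b

rate = np.mean(np.abs(t) > 1.96)
print(f"share of 'significant' slopes at the nominal 5% level: {rate:.2f}")
```

Although the two series are independent by construction, the conventional t-test should declare the slope significant in a large majority of replications.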

To show that spurious regression is not necessarily a time series phenomenon, consider the following example:

A researcher is interested in knowing the relationship between shoe size and the mathematical ability of school students. He goes to a high school and takes a random sample of the students present in the school. He records the shoe size and the mathematical problem-solving ability of the selected students. He finds that there is a very high correlation between the two variables. Would this be sufficient to argue that the admission policy of the school should be based on the measurement of shoe size? If not, what accounts for this high correlation?

If the sample is selected from a high school having kids in different classes, the same observation is almost sure to occur. Pupils in higher classes have larger shoe sizes and higher mathematical skills, whereas students in lower classes have smaller shoe sizes and lower mathematical skills. Therefore, a high correlation is expected. However, if we take data from only one class, say grade III, we will not see such a high correlation. Since theoretically there should be no correlation between shoe size and mathematical skills, this apparently high correlation may be regarded as spurious correlation/regression. The reason for this spurious correlation is the missing age factor, which drives both shoe size and mathematical skills.

Since this is not time series data, there is no question of non-stationarity, but the spurious correlation exists. This shows that spurious correlation is not necessarily a time series phenomenon. Unit root and cointegration analysis are simply incapable of solving this problem.

Similarly, it can be shown that unit root and cointegration analysis can fail to work even with time series data, and this will be discussed in our next blog.
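The shoe size example can be mimicked with simulated data. Everything in the sketch below is hypothetical: the ten grades, the way shoe size and mathematics score depend on the grade, and the noise levels are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed, an arbitrary choice
n = 2000

# Hypothetical school with grades 1 to 10. Grade (a stand-in for age) drives
# both variables; shoe size and score have no direct link by construction.
grade = rng.integers(1, 11, size=n)
shoe = 28 + 1.2 * grade + rng.normal(scale=1.5, size=n)
math_score = 20 + 7 * grade + rng.normal(scale=8, size=n)

# Correlation in the pooled sample of all grades
r_all = np.corrcoef(shoe, math_score)[0, 1]

# Correlation within a single grade, where age is held fixed
one = grade == 3
r_grade3 = np.corrcoef(shoe[one], math_score[one])[0, 1]

print(f"correlation, all grades pooled: {r_all:.2f}")
print(f"correlation, grade 3 only:      {r_grade3:.2f}")
```

The pooled correlation should be high, while the correlation within one grade should be close to zero, which is the pattern described in the text.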
