Thomas Tooke was among the first authors to produce a major work in monetary economics: ‘A History of Prices and of the State of the Circulation during the Years 1793–1856’, published in volumes between 1838 and 1857. He was also a pioneer of the ‘Banking School’ theory, which predicts that higher interest rates should be associated with higher price levels. The logic of this view is very simple: the interest rate is part of the cost of production for firms, so the higher the interest rate, the higher the cost of production, leading to higher prices. This is the oldest theory on the relationship between the interest rate and inflation.

However, mainstream economics adopted the opposite theory, known as the demand channel of monetary transmission. It says that if the interest rate increases, people will reduce spending, aggregate demand will fall, and prices will fall. This view was adopted at least as early as the 1890s and remains popular to date. The inflation targeting framework, today's most popular framework for designing monetary policy, is also based on this hypothesis.

Historical data in every period have provided evidence against the demand channel. The most famous of the early findings against it is that of Gibson. Gibson (1923) analyzed data on interest rates and prices for the United Kingdom covering about 200 years and found that high interest rates are associated with higher prices; a result matching Tooke's view and supported by the oldest theory in monetary economics.

Gibson's findings were so impressive that Keynes recognized them as ‘one of the most completely established empirical facts in the whole field of quantitative economics’. However, Keynes termed the finding the ‘Gibson paradox’, indicating the absence of any theory to explain it. Given the existence of Tooke's Banking School theory, this label was erroneous. Nevertheless, Keynes' recognition lent strong support to the idea that the interest rate and inflation are positively associated, which is quite the opposite of the logical foundations of the inflation targeting framework.

History went on, and the empirical evidence supporting Tooke's view kept being ignored by labeling it a paradox. In the 1970s, supply-side economics was revived, and economists discussed the possibility of a cost channel of monetary transmission. This provided strong theoretical support for a positive association between the interest rate and inflation.

In 1992, Sims produced his seminal paper in which he found that the impulse response of inflation to an increase in the interest rate is positive. Despite the stature of Sims, who later won the Nobel Prize, his finding was labeled the ‘price puzzle’, to indicate the absence of a theoretical underpinning for the observation. This was a denial of Tooke's theory and the cost channel.

Brazil reduced its policy rate from 14% to 2% over the three years starting in 2017. Such a drastic cut should have sent inflation skyrocketing if the widely believed demand channel were valid, but the opposite happened: inflation in Brazil, which was about 10% around 2017, is now below 6%.

The responses to the Global Financial Crisis and Covid-19 also mark failures of classical monetary theory. All major economies responded to the pandemic by reducing interest rates, and inflation fell as well. Despite this failure of the theory, international financial institutions such as the IMF continue to advocate inflation targeting, which is quite strange.

Besides the contradictions with empirical evidence, there are logical inconsistencies along with messed-up and missed normative implications. Assume for a moment that the demand channel is valid, i.e. raising the interest rate reduces inflation. If so, this can happen only through luxuries, since the demand for necessities cannot be reduced significantly. Therefore, any reduction in the aggregate price level must be driven by the prices of luxuries. A rise in the interest rate would thus improve the purchasing power of consumers of luxuries while doing nothing for the prices of necessities. These are very obvious normative implications, but conventional monetary economics never discusses the normative implications of monetary policy. That is the missed normative implication.

Assume for a moment that the traditional demand channel exists and that an increase in the interest rate reduces prices. The demand channel also implies that a higher interest rate increases unemployment. The cost of price stability would therefore be borne by those who lose their jobs, and it is well known that those most at risk of losing their jobs are the poorest. Price stability thus comes at the cost of the most vulnerable cohort of society, another very serious normative implication. Yet traditional monetary economics totally ignores the normative implications of monetary policy.

It is also clear that any real effect of inflation on the economy comes from relative price movements. If the prices of all goods and services increased at the same rate, no real variable would be affected; an implication known as monetary neutrality. Contrary to monetary neutrality, the Phillips curve assumes that inflation affects employment, and this happens because wages and commodity prices change at different rates. This means that focusing on aggregate inflation is meaningless; one needs to look at the relative movement of the sub-indices of the consumer price index. But monetary policy, especially under the inflation targeting framework, explicitly targets the aggregate price level without any attention to relative price movements. There is no explanation for this in the literature.

In short, if you look into the theoretical underpinnings of monetary policy, you will find them to be very weak. If you look at the empirical data, they show the invalidity of the underlying hypotheses. If you look into the normative implications, you will find many that are practically ignored. Therefore, the textbooks on monetary economics need a rewrite, and an alternative monetary theory needs to be developed, one based on empirical data rather than hypothetical theorizing.

Consider the basic Auto-Regressive Distributed Lag (ARDL) model with an exogenous variable, which is of the form:

y_{t} = c + ß_{1}y_{t-1} + … + ß_{p}y_{t-p} + α_{1}x_{t} + α_{2}x_{t-1} + … + α_{l+1}x_{t-l} + u_{t}

Where y represents the dependent variable and p is the autoregressive order of the ARDL, i.e. the number of lags of the dependent variable y itself. x is an exogenous explanatory variable with *l* lags (a contemporaneous value of x can also be included), and u is the residual term.

The present form of the ARDL is not a long-run form; it is rather a short-run model. Therefore, the actual impact of x through the *α* coefficients must be assessed taking into account the size and order of the autoregressive terms in y through the *ß* coefficients. This leads to a situation where we want to weigh the cumulative impact of the *α*'s, and the way to do so is to use a long-run multiplier. Blackburne & Frank (2007) indicate that an approximation to this long-run multiplier involves a non-linear transformation of the coefficients, given in the general form:

θ = (α_{1} + α_{2} + … + α_{l+1}) / (1 − ß_{1} − ß_{2} − … − ß_{p})

This is the long-run multiplier of the variable x. Note how the formula works: it takes the sum of the α coefficients associated with the independent variable (and its lags) divided by 1 minus the sum of the autoregressive ß coefficients. The numerator corresponds to the long-run propensity of x towards y, which is simply the sum of the coefficients: given a permanent change of one unit in x, this sum is the long-run impact on y. The denominator represents the weight associated with the response of the autoregressive structure.
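As a quick numerical sketch with made-up coefficients (illustrative values only, not estimates from any dataset), the long-run propensity and long-run multiplier can be computed directly from the α's and ß's:

```python
# Hypothetical ARDL(2,2) coefficients -- illustrative values only.
alphas = [0.30, 0.20, 0.10]   # contemporaneous x, lag 1, lag 2
betas = [0.40, 0.20]          # autoregressive coefficients on y(t-1), y(t-2)

long_run_propensity = sum(alphas)                     # sum of the alphas
long_run_multiplier = sum(alphas) / (1 - sum(betas))  # weighted by the AR structure

print(long_run_propensity, long_run_multiplier)
```

With these numbers the propensity is 0.6, but once weighted by the autoregressive structure the long-run multiplier becomes 0.6 / 0.4 = 1.5, illustrating how the denominator amplifies the cumulative impact.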

For example, an ARDL(2,2) refers to a model with two lags of the dependent variable and two lags of the independent variable (along with the contemporaneous value of x). This model is of the form:

y_{t} = c + ß_{1}y_{t-1} + ß_{2}y_{t-2} + α_{1}x_{t} + α_{2}x_{t-1} + α_{3}x_{t-2} + u_{t}

And the weighted long-run multiplier is given by:

θ = (α_{1} + α_{2} + α_{3}) / (1 − ß_{1} − ß_{2})

Here α goes from 1 up to 3: it starts from the contemporaneous value of x, whose coefficient is α1, and then adds the coefficients of the lag orders, α2 for lag 1 and α3 for lag 2. Notice that we subtract the sum of the autoregressive parameters ß from unity to weight the size of the cumulative impact of x.

The interpretation of the long-run coefficient goes as follows: if x in levels changes by one unit, the average/expected long-run change in y is given by the long-run coefficient.

**Let’s put this together with an example in Stata.**

Load up the dataset and generate a time identification variable with:

use https://www.stata-press.com/data/r16/auto

generate t = _n

Then tell Stata that you're working with time series:

tsset t, yearly

Now let's estimate an ARDL(2,2) model using the variables price and weight, where price is the dependent variable and weight the independent variable (all variables assumed to be stationary).

reg price L.price L2.price weight L1.weight L2.weight

From here you can analyze a number of things; for example, the long-run propensity is given by:

* Long-run propensity of x (weight)
display _b[weight] + _b[L1.weight] + _b[L2.weight]

And the long-run multiplier which we discussed, can be calculated by:

* Long-run multiplier of x
display (_b[weight] + _b[L1.weight] + _b[L2.weight]) / (1 - (_b[L1.price] + _b[L2.price]))

And from here you can even estimate the long-run coefficient together with its statistical significance by using nlcom:

nlcom (_b[weight] +_b[L1.weight]+_b[L2.weight]) / (1-(_b[L1.price] + _b[L2.price]))

Notice that when weight increases by one unit, the expected long-run change in price is about 1.68 units, statistically significant at the 10% level.

You can extend this analysis to the famous long-run and short-run dynamics of the Engle & Granger cointegration framework, where the short-run coefficients are computed in order to obtain the long-run coefficients; this will be done in a future post.

An excellent video explaining this idea can be found in Nyboe Tabor (2016).

Bibliography:

Blackburne, E. F. & Frank, M. W. (2007). Estimation of nonstationary heterogeneous panels. The Stata Journal, 7(2), 197–208.

Nyboe Tabor, M. (2016). The ADL Model for Stationary Time Series: Long-run Multipliers and the Long-run Solution. Retrieved from: https://www.youtube.com/watch?v=GLpCVrZbW-g

At the end of this blog, you will find the data on three variables for South Korea: (i) household consumption, (ii) GDP and (iii) inflation. The data set is retrieved from the WDI.

Before starting the modeling, it is very useful to plot the data series. We have three series; two of them are on the same scale and can be plotted together. The third series, inflation, is in percentage form and would not be visible if plotted with the other two. The graph of the two series is as follows:

You can see that the gap between income and consumption appears to diverge over time. This is a natural phenomenon: suppose a person has an income of 1000 and consumes 70% of it; the difference between income and consumption would be 300. If income rises to 10,000 and the MPC stays the same, the difference widens to 3000. This widening gap is visible in the graph.
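The arithmetic of this widening gap can be sketched in a couple of lines, using the hypothetical figures from the text:

```python
mpc = 0.7  # marginal propensity to consume, assumed constant

for income in (1000, 10000):
    consumption = mpc * income
    gap = income - consumption
    print(income, gap)  # the gap grows from 300 to 3000 as income rises
```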

However, the widening gap creates a problem for OLS. The residuals at the beginning of the sample will have a smaller variance and those at the end a larger variance, i.e. there will be heteroskedasticity. In the presence of heteroskedasticity, OLS is no longer efficient.

The graphs also show a non-linearity: the two series appear to behave like exponential series. A solution to both problems is the log transform. The difference between the log transforms of two series is roughly equal to their percentage difference, and if the MPC stays the same, the gap between the two transformed series will be smoothed.
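A quick stdlib check of this claim (again with a hypothetical constant MPC of 0.7): if consumption is a constant fraction of income, the gap in levels keeps widening, while the gap in logs stays exactly constant at -log(MPC).

```python
import math

mpc = 0.7
for income in (1000, 10000, 100000):
    consumption = mpc * income
    level_gap = income - consumption                      # keeps widening
    log_gap = math.log(income) - math.log(consumption)    # constant = -log(mpc)
    print(income, round(level_gap), round(log_gap, 6))
```

A widening gap even in logs, as seen in the actual graph, therefore signals that the MPC itself is changing over time.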

I have taken the log transform and plotted the series again; the graph is as follows:

You can see that the gap between the log transforms of the two series is smooth compared to the previous graph. The gap is still widening, but much more smoothly. The widening gap in this graph indicates a decline in the MPC over time. In any case, the two graphs indicate that the log transform is the better starting point for building a model.

I am starting with an ARDL model of the following form:

Eq (1): LC_{t} = a + b_{1}LC_{t-1} + b_{2}LC_{t-2} + d_{0}LY_{t} + d_{1}LY_{t-1} + d_{2}LY_{t-2} + e_{t}

Where LC_{t} indicates log consumption and LY_{t} indicates log income.

The estimated equation is as follows

The equation has a very high R-square, but a high R-square is no surprise in time series; it turns out high even for unrelated series. The thing to note is sigma, the standard deviation of the residuals, indicating that the average size of the error is 0.0271. Before proceeding further, we want to make sure the estimated model does not suffer from failures of the assumptions. We tested the model for normality, autocorrelation and heteroskedasticity, with the following results:

The autocorrelation (AR) test has the null hypothesis of no autocorrelation, and its p-value is above 5%, indicating that the null is not rejected, though it survives by a narrow margin. The normality test, with the null of normality, and the heteroskedasticity test, with the null of homoskedasticity, also indicate the validity of the assumptions.

We also want to ensure that the model is good at prediction, because the ultimate goal of an econometric model is to predict the future. The problem is that for real-time forecasting we would have to wait years to see whether the model can predict. One solution is to leave some observations out of the estimation sample and then see how well the model predicts them.

The output indicates that the two prediction tests have p-values much greater than 5%. The null hypothesis of the Forecast Chi-square test is that the error variance is the same in the sample and forecast periods, and this hypothesis is not rejected. Similarly, the null hypothesis of the Chow test is that the parameters are the same in the sample and forecast periods, and this hypothesis is also not rejected.

All the diagnostics again show satisfactory results.

Now let's look back at the output of Eq (2). It shows that the second-lag variables, LConsumption_2 and LGDP_2, are insignificant. This means that, keeping LConsumption_2 in the model, you can exclude LGDP_2 and vice versa. But to exclude both variables, you need to test the significance of the two variables simultaneously. It sometimes happens that two variables are individually insignificant but become significant when taken together, usually due to multicollinearity. We therefore test the joint significance of the two second-lag variables, i.e. H0: b_{2} = d_{2} = 0.

The results of the test are

F(2,48) = 2.1631 [0.1260]
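For reference, an F-statistic of this kind is formed from the restricted and unrestricted residual sums of squares. A sketch with made-up RSS values (not the actual values from these regressions), chosen only to show the mechanics:

```python
# Hypothetical residual sums of squares -- illustrative, not from Eq (2).
rss_unrestricted = 0.0352  # model keeping both second-lag terms
rss_restricted = 0.0384    # model with LConsumption_2 and LGDP_2 excluded
q = 2                      # number of restrictions tested jointly
df = 48                    # residual degrees of freedom of the unrestricted model

f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df)
print(round(f_stat, 4))  # compare against the F(q, df) critical value
```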

The results indicate that the hypothesis is not rejected; therefore we can set the coefficients of the relevant variables to zero, and the model becomes:

M2: LC_{t} = a + b_{1}LC_{t-1} + d_{0}LY_{t} + d_{1}LY_{t-1} + e_{t}

The model M2 was estimated, with the following results:

The results show that the diagnostic tests for the newly estimated model are all OK, and the forecast performance is not affected by excluding the two variables. If you compare sigma for Eq (2) and Eq (3), you will see a difference only at the fourth decimal. This means the size of the model has been reduced without paying any cost in terms of predictability.

Now all variables in the model are significant except the intercept, whose p-value is 0.178. This means the data do not support an intercept, so we can reduce the model further by excluding it. This time we don't need a joint test because we are excluding only one variable. After excluding the intercept, the model becomes:

M3: LC_{t} = b_{1}LC_{t-1} + d_{0}LY_{t} + d_{1}LY_{t-1} + e_{t}

The output indicates that all the diagnostics are OK. All the variables are significant, so no further variable can be excluded.

Now we can impose some linear restrictions instead of exclusion restrictions. For example, if we want to test whether we can take the difference of consumption and income, we need to test the following:

R3: b_{1} = 1 and d_{0} + d_{1} = 0

And if we want to test the restriction for the error correction model, we have to test:

R4: b_{1} + d_{0} + d_{1} = 1

Apparently both restrictions seem valid, because the estimated value of b_{1} is close to 1 and the estimated values of d_{0} and d_{1} sum to approximately 0. We can test either R3 or R4; we test R3 first. The results are as follows:

This means the error correction model can be estimated for the data under consideration.

For the error correction model, one needs to estimate a static regression (without lags) and use its residuals as the error correction term. Estimating the static regression yields:

The estimates of this equation represent the long-run coefficients of the relationship between the two variables. They show that the long-run elasticity of consumption with respect to income is 0.93.

We then have to estimate an error correction regression of the following kind:

DLC_{t} = d_{0}DLY_{t} + F·ECM_{t-1} + e_{t}

where ECM_{t-1} is the lagged residual from the static regression.

The intercept does not enter the error correction regression. The estimates are as follows:

This is the parsimonious model for consumption and income. Eq (5) represents the long-run relationship between the two variables and Eq (6) gives the short-run dynamics.

The final model has only two parameters, whereas Eq (1), which we started with, contains 6 parameters. The sigmas for Eq (6) and Eq (2) are roughly the same, which tells us that the large model we started from has the same predictive power as the final model. The diagnostic tests are all OK, which means the final model is statistically adequate in that its assumptions are not contradicted by the data.

The final model is an error correction model, which contains information on both the short run and the long run. The short-run information is present in Eq (6), whereas the long-run information is implicit in the error correction term and is available in the static Eq (5).

The same methodology can be adopted in more complex situations: the researcher starts from a general model and reduces it successively until the most parsimonious, statistically adequate model is reached.

Data

Variables Details:

Consumption: Households and NPISHs final consumption expenditure (current LCU)

GDP: GDP (current LCU)

Country: Korea, Republic

Time Period: 1960-2019

Source: WDI online (open source data)

| Year | Consumption | GDP |
|------|-------------|-----|
| 1960 | 212720000000 | 249860000000 |
| 1961 | 252990000000 | 301690000000 |
| 1962 | 304100000000 | 365860000000 |
| 1963 | 417870000000 | 518540000000 |
| 1964 | 612260000000 | 739680000000 |
| 1965 | 693780000000 | 831390000000 |
| 1966 | 837890000000 | 1066070000000 |
| 1967 | 1039800000000 | 1313620000000 |
| 1968 | 1277060000000 | 1692900000000 |
| 1969 | 1597780000000 | 2212660000000 |
| 1970 | 2062100000000 | 2796600000000 |
| 1971 | 2592400000000 | 3438000000000 |
| 1972 | 3104800000000 | 4267700000000 |
| 1973 | 3798600000000 | 5527300000000 |
| 1974 | 5443600000000 | 7905000000000 |
| 1975 | 7285700000000 | 10543600000000 |
| 1976 | 9315500000000 | 14472800000000 |
| 1977 | 11361500000000 | 18608100000000 |
| 1978 | 15016700000000 | 25154500000000 |
| 1979 | 19439700000000 | 32402300000000 |
| 1980 | 24916700000000 | 39725100000000 |
| 1981 | 31181500000000 | 49669800000000 |
| 1982 | 35278600000000 | 57286600000000 |
| 1983 | 39796800000000 | 68080100000000 |
| 1984 | 44444800000000 | 78591300000000 |
| 1985 | 49305000000000 | 88129700000000 |
| 1986 | 54837200000000 | 102986000000000 |
| 1987 | 61775800000000 | 121698000000000 |
| 1988 | 71362200000000 | 145995000000000 |
| 1989 | 83899400000000 | 165802000000000 |
| 1990 | 100738000000000 | 200556000000000 |
| 1991 | 122045000000000 | 242481000000000 |
| 1992 | 141345000000000 | 277541000000000 |
| 1993 | 161105000000000 | 315181000000000 |
| 1994 | 192771000000000 | 372493000000000 |
| 1995 | 227070000000000 | 436989000000000 |
| 1996 | 261377000000000 | 490851000000000 |
| 1997 | 289425000000000 | 542002000000000 |
| 1998 | 270298000000000 | 537215000000000 |
| 1999 | 311177000000000 | 591453000000000 |
| 2000 | 355141000000000 | 651634000000000 |
| 2001 | 391692000000000 | 707021000000000 |
| 2002 | 440207000000000 | 784741000000000 |
| 2003 | 452737000000000 | 837365000000000 |
| 2004 | 468701000000000 | 908439000000000 |
| 2005 | 500911000000000 | 957448000000000 |
| 2006 | 533278000000000 | 1005600000000000 |
| 2007 | 571810000000000 | 1089660000000000 |
| 2008 | 606356000000000 | 1154220000000000 |
| 2009 | 622809000000000 | 1205350000000000 |
| 2010 | 667061000000000 | 1322610000000000 |
| 2011 | 711119000000000 | 1388940000000000 |
| 2012 | 738312000000000 | 1440110000000000 |
| 2013 | 758005000000000 | 1500820000000000 |
| 2014 | 780463000000000 | 1562930000000000 |
| 2015 | 804812000000000 | 1658020000000000 |
| 2016 | 834805000000000 | 1740780000000000 |
| 2017 | 872791000000000 | 1835700000000000 |
| 2018 | 911576000000000 | 1898190000000000 |
| 2019 | 931670000000000 | 1919040000000000 |

What is an ARDL model?

An ARDL model is an a-theoretic model for the relationship between two time series. Suppose we want to see the effect of a time series variable X_{t} on another variable Y_{t}. The ARDL model for this purpose is of the form:

Y_{t} = a + b_{1}Y_{t-1} + … + b_{j}Y_{t-j} + d_{0}X_{t} + d_{1}X_{t-1} + … + d_{k}X_{t-k} + e_{t}

The same model can be written compactly as:

Y_{t} = a + Σ_{i=1..j} b_{i}Y_{t-i} + Σ_{i=0..k} d_{i}X_{t-i} + e_{t}

In layman's language, this means the dependent variable is regressed on its own lags, the independent variable, and the lags of the independent variable. The above can be termed an ARDL(j, k) model, referring to the numbers of lags j and k in the model.
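The regressor set of an ARDL(j, k) can be built mechanically from the two series. A minimal stdlib sketch (the function name is illustrative, not from any package):

```python
def ardl_regressors(y, x, j, k):
    """Build ARDL(j, k) regressor rows: j lags of y, plus x and its k lags.

    Returns one (lags_of_y, current_and_lagged_x) tuple per usable
    observation, after dropping the first max(j, k) points.
    """
    start = max(j, k)
    rows = []
    for t in range(start, len(y)):
        y_lags = [y[t - i] for i in range(1, j + 1)]   # y(t-1) .. y(t-j)
        x_terms = [x[t - i] for i in range(0, k + 1)]  # x(t) .. x(t-k)
        rows.append((y_lags, x_terms))
    return rows

# Tiny worked example for an ARDL(1, 1):
y = [1, 2, 3, 4]
x = [10, 20, 30, 40]
print(ardl_regressors(y, x, 1, 1))
# first row pairs y(t-1)=1 with [x(t)=20, x(t-1)=10]
```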

The model itself is written without any theoretical considerations. However, a large number of theoretical models are embedded inside it, and one can derive the appropriate theoretical model by testing and imposing restrictions on it.

To have a more concrete idea, let's consider the relationship between consumption and income. To simplify further, let j = k = 1, so that the ARDL(1,1) model for consumption and income can be written as:

Model 1: C_{t} = a + b_{1}C_{t-1} + d_{0}Y_{t} + d_{1}Y_{t-1} + e_{t}

Here C denotes consumption and Y denotes income; a, b_{1}, d_{0}, d_{1} denote the regression coefficients and e_{t} denotes the error term. So far no theory has been used to develop this model and the regression coefficients have no theoretical interpretation. However, the model can be used to select an appropriate theoretical model of consumption.

Suppose we have estimated the above model and obtained the regression coefficients. We can test any one coefficient, or several coefficients, for various kinds of restrictions. Suppose we test the restriction that:

R1: H0: b_{1} = d_{1} = 0

Suppose testing this restriction on actual data implies that it is valid; then we can exclude the corresponding variables from the model, and the model becomes:

Model 2: C_{t} = a + d_{0}Y_{t} + e_{t}

Model 2 is actually the Keynesian consumption function (also called the absolute income hypothesis), which says that current consumption depends on current income only. The coefficient of income is the marginal propensity to consume, and Keynes predicted that it would lie between 0 and 1, implying that individuals consume a part of their income and save a part for the future.

Suppose instead that the data did not support restriction R1, but the following restriction is valid:

R2: H0: d_{1}=0

This means model 1 would become

Model 3: C_{t} = a + b_{1}C_{t-1} + d_{0}Y_{t} + e_{t}

This means current consumption depends on current income and past consumption. This is called the Habit Persistence model: past consumption serves as a proxy for habit. The model says that what was consumed in the past affects current consumption, which is evident from human behavior.

Suppose instead that the data did not support the earlier restrictions, but the following restriction is valid:

R3: H0: b_{1}=0

This means model 1 would become

Model 4: C_{t} = a + d_{0}Y_{t} + d_{1}Y_{t-1} + e_{t}

This means current consumption depends on current and past income. This is called the Partial Adjustment model. According to the Keynesian consumption function, consumption should depend only on current income, but the partial adjustment model says it takes some time to adjust to new income. Therefore consumption depends partially on current income and partially on past income.

In a similar way, one can derive many other models out of model 1 which represent different theories. The details of the models that can be drawn from model 1 can be found in Charemza and Deadman (1997), ‘New Directions in Econometric Practice…’.

It can also be shown that difference-form models are derivable from model 1. Consider the following restriction:

R4: b_{1} = 1 and d_{1} = -d_{0}

If this restriction is valid, the model 1 will become

C_{t} = a + C_{t-1} + d_{0}Y_{t} - d_{0}Y_{t-1} + e_{t}

This model can be re-written as

C_{t} - C_{t-1} = a + d_{0}(Y_{t} - Y_{t-1}) + e_{t}

This means

Model 5: DC_{t} = a + d_{0}DY_{t} + e_{t}

This indicates that the difference form models can also be derived from the model 1 with certain restrictions

Further elaboration shows that the error correction models can also be derived from model 1.

Consider model 1 again and subtract C_{t-1} from both sides; we get:

C_{t} - C_{t-1} = a + b_{1}C_{t-1} - C_{t-1} + d_{0}Y_{t} + d_{1}Y_{t-1} + e_{t}

Adding and subtracting d_{0}Y_{t-1} on the right-hand side, we get:

DC_{t} = a + (b_{1}-1)C_{t-1} + d_{0}Y_{t} + d_{1}Y_{t-1} + d_{0}Y_{t-1} - d_{0}Y_{t-1} + e_{t}

DC_{t} = a + (b_{1}-1)C_{t-1} + d_{0}DY_{t} + d_{1}Y_{t-1} + d_{0}Y_{t-1} + e_{t}

DC_{t} = a + (b_{1}-1)C_{t-1} + d_{0}DY_{t} + (d_{1}+d_{0})Y_{t-1} + e_{t}

This equation contains an error correction mechanism if

R6: (b_{1} - 1) = -(d_{1} + d_{0})

Assume

(b_{1}-1)= – (d_{1}+d_{0})=F

The equation will reduce to

DC_{t} = a + F(C_{t-1} - Y_{t-1}) + d_{0}DY_{t} + e_{t}

This is our well-known error correction model, which can be derived if R6 is valid.
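A quick numeric check of this algebra, with made-up coefficients chosen so that R6 holds (the values are illustrative only, and the error term is dropped):

```python
# Made-up coefficients satisfying R6: (b1 - 1) = -(d1 + d0).
a, b1, d0, d1 = 0.1, 0.8, 0.5, -0.3   # b1 - 1 = -0.2 and -(d1 + d0) = -0.2
F = b1 - 1

# An arbitrary data point: C(t-1), Y(t), Y(t-1)
C_lag, Y, Y_lag = 2.0, 3.0, 2.5

# Model 1: C(t) = a + b1*C(t-1) + d0*Y(t) + d1*Y(t-1)
C = a + b1 * C_lag + d0 * Y + d1 * Y_lag

# ECM form: DC(t) = a + F*(C(t-1) - Y(t-1)) + d0*DY(t)
dC_ecm = a + F * (C_lag - Y_lag) + d0 * (Y - Y_lag)

print(abs((C - C_lag) - dC_ecm) < 1e-9)  # the two forms coincide
```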

Therefore, the existence of an error correction mechanism can also be tested from model 1, and it is considered valid if R6 holds.

As we have discussed, a number of theoretical models can be derived from model 1 by testing certain restrictions. We can start from model 1, test different restrictions, impose those found valid and discard those found invalid. This provides a natural way of selecting among various theoretical models.

When we say theoretical model, we mean a model that makes economic sense. Models 2 to 6 all make economic sense; so how do we decide between them? The problem is solved by starting with an ARDL model and imposing only those restrictions which are permitted by the data.

The famous DHSY paper recommends exactly this kind of methodology: start with a large model which encompasses various theoretical models, then simplify it by testing certain restrictions.

In another blog I have argued that if there are different theories for a certain variable, the research must be comparative. This short blog outlines how we can do this. In practice, one needs to take larger ARDL structures, and the number of models that can be derived from the parent model is correspondingly large.

The findings seem strong and impressive, but let me add that the male students were chosen from a far-off village with untrained staff and a lack of educational facilities, while the female students were chosen from an elite school in a metropolitan city, where the best teachers of the city serve. What should the conclusion be now? It can be argued that the difference doesn't actually come from gender; it comes from the school type.

The researcher carrying out the project says: ‘Look, my research assignment was only to investigate the difference due to gender; school type is not the question I am interested in, therefore I have nothing to do with it.’

Do you think the researcher's argument is valid and the findings should be considered reliable? The answer is obvious: the findings are not reliable, because school type creates a serious bias. The researcher must compare students from the same school type. This implies that you have to take care of variables that have no mention in your research question if they are determinants of your dependent variable.

Now let's apply the same logic to econometric modeling. Suppose our task is to analyze the impact of financial development on economic growth. We run a regression of GDP growth on a proxy of financial development, get a regression output, and present it as the impact of financial development on economic growth. Is this reliable research?

This research is deficient in just the same way as our example of gender and mathematical ability. The research is not reliable if ceteris paribus doesn't hold: the other variables which may affect the outcome should remain the same.

But in real life it is often very difficult to keep all other variables the same; the economy continuously evolves, and so do economic variables. The alternative is to take the other variables into account while running the regression, i.e. the other determinants of your dependent variable should enter the regression as control variables. Suppose you want to check the effect of X1 on Y using the model Y = a + bX1 + e, and other studies indicate that another model exists for Y, namely Y = c + dX2 + e. Then you cannot run the first model while ignoring the second: the results would be biased in exactly the way we saw in the mathematical-ability example. You have to use the variables of model 2 as controls, even if you are not interested in their coefficients, so the estimated model would be Y = a + bX1 + cX2 + e.

Taking control variables is feasible when there are only a few models. The seminal study by Davidson, Hendry, Srba and Yeo, ‘Econometric modelling of the aggregate time-series relationship between ….’ (often referred to as DHSY), summarizes how to build a model in such a situation. But it often happens that a very large number of models exist for one variable. For growth, for example, the number is huge: ‘Growth Econometrics’ by Durlauf lists hundreds of growth models used by researchers in their studies. Life becomes very complicated with so many models. Estimating a model with all determinants of growth would be literally impossible for most countries using classical methodology, because growth data are usually available at annual or quarterly frequency, and the predictors taken from all models collectively would outnumber the observations. Time series data also have a dynamic structure, and taking lags of variables complicates things further. Therefore classical econometric techniques often fail for such high-dimensional data.

Some experts have invented sophisticated techniques for modeling when the number of predictors becomes very large, including Extreme Bounds Analysis, Weighted-Average Least Squares, and Autometrics. High-dimensional econometrics is itself a very interesting field of investigation. However, DHSY is extremely useful for situations where there is more than one theory-based model for a variable. The DHSY methodology is also called the LSE methodology, the general-to-specific methodology, or simply G2S.

Assume a two-equation system of the form:

Where the y’s represent the endogenous variables, Z represents the exogenous variables taken as instruments, and u denotes the residuals of each equation. Notice that y_{2} appears in quadratic form in the first equation but enters linearly in the second equation.

Wooldridge calls this model **nonlinear in endogenous variables**, yet the model is still linear in the parameters γ, making this a particular problem where we need to somehow instrument the quadratic term of y_{2}.

Finding instruments for the quadratic term is an even bigger challenge than it already is for the linear terms in a simple instrumental variables regression. Wooldridge suggests the following:

*“A general approach is to always use some squares and cross products of the exogenous variables appearing somewhere in the system. If something like exper ^{2} appears in the system, additional terms such as exper^{3} and exper^{4} would be added to the instrument list.” (Wooldridge, 2002, p. 235).*

Therefore, it is worth trying nonlinear terms of the exogenous variables in Z, such as Z^{2} or even Z^{3}, and using these as instruments to deal with the endogeneity of the quadratic term of y_{2}. Once we define our set of instruments, any nonlinear equation of this kind can be estimated with two-stage least squares. And as always, we should check the overidentifying restrictions to make sure we avoid inconsistent estimates.
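A minimal sketch of the idea on simulated data (made-up coefficients, numpy only; this is not the Wooldridge example worked below): we instrument y2 and its square with z and z², projecting each endogenous column on the instrument set rather than squaring the fitted value.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Made-up structural model: y1 = 0.5 + 1.0*y2 + 0.3*y2^2 + u,
# with y2 endogenous because u is correlated with the first-stage error v
z = rng.normal(size=n)
v = rng.normal(size=n)
u = 0.7 * v + rng.normal(size=n)           # source of endogeneity
y2 = 1 + z + v
y1 = 0.5 + 1.0 * y2 + 0.3 * y2**2 + u

def fit(X, y):
    """OLS coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)
Z = np.column_stack([const, z, z**2])      # instruments: add the square of z
X = np.column_stack([const, y2, y2**2])    # two endogenous regressors

# First stage: project EACH endogenous column on the full instrument set
# (squaring the fitted value instead would be the "forbidden regression")
X_hat = Z @ fit(Z, X)

# Second stage: OLS of y1 on the fitted regressors
beta_2sls = fit(X_hat, y1)                 # close to (0.5, 1.0, 0.3)
beta_ols = fit(X, y1)                      # biased by the endogeneity
```

Note that the standard errors from this manual two-step procedure would be wrong; software such as Stata's ivregress computes the correct ones, which is why we use it below.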

Let's see the process with an example.

Let's work with the example of a nonlinear labor supply function, which is a system of the form:

A brief description of the model: in the first equation, hours (worked) are a nonlinear function of the wage, the level of education (educ), age (age), the kids' situation by age group, namely whether they are younger than 6 years old or between 6 and 18 (kidslt6 and kidsge6), and the wife's income (nwifeinc).

In the second equation, the wage is a function of education (educ) and a nonlinear function of the exogenous variable experience (exper and exper^{2}).

We work under the natural assumption that E(u|z)=0, so the instruments are exogenous. Z in this case contains all the other variables which are not endogenous (hours and wage are the endogenous variables).

We will instrument the quadratic term of the logarithm of the wage in the first equation, and for this instrumenting process we will add three new quadratic terms, the squares of educ, age, and nwifeinc:

And we include those in the first-stage regression.

With Stata, we first load the dataset, which can be found here:

https://drive.google.com/file/d/1m4bCzsWgU9sTi7jxe1lfMqM2T4-A3BGW/view?usp=sharing

Load up the data (double click the file with Stata open or use some path command to get it ready)

use MROZ.dta

Generate the squared term for the logarithm of the wage with:

gen lwage_sq = lwage*lwage

Then we will use the following ivregress command, which we explain in detail.

*ivregress 2sls hours educ age kidslt6 kidsge6 nwifeinc (lwage lwage_sq = educ c.educ#c.educ exper expersq age c.age#c.age kidsge6 kidslt6 nwifeinc c.nwifeinc#c.nwifeinc), first*

This has the following interpretation, according to Stata's syntax. First, make sure you specify the **first equation** with the associated exogenous variables; we do that with the part:

*ivregress 2sls hours educ age kidslt6 kidsge6 nwifeinc*

Now, let's tell Stata that we have two endogenous regressors, which are the log wage and its squared term. We open the parenthesis and put:

*(lwage lwage_sq =*

This tells Stata that lwage and lwage_sq are endogenous and part of the first equation for hours. After the equal sign, we specify ALL the exogenous variables, including the instruments for the endogenous terms, which leads to the second part:

*(lwage lwage_sq = educ c.educ#c.educ exper expersq age c.age#c.age kidsge6 kidslt6 nwifeinc c.nwifeinc#c.nwifeinc)*

Notice that this second part has a **c.var#c.var** structure; this is Stata's operator for multiplying continuous variables (so we introduce the quadratic terms without generating new variables with another command, as we did with the wage).

So notice we have **c.educ#c.educ**, which is the square of the educ variable, **c.age#c.age**, which is the square of age, and we also square the wife's income with **c.nwifeinc#c.nwifeinc**. These are the instruments for the quadratic term.

The fact that we have two variables on the left (lwage and lwage_sq) indicates that there will be a first-stage equation for lwage and another for lwage_sq, both using the exact same instruments.

We include the option **, first** to see the first-stage regressions.

ivregress 2sls hours educ age kidslt6 kidsge6 nwifeinc (lwage lwage_sq = educ c.educ#c.educ exper expersq age c.age#c.age kidsge6 kidslt6 nwifeinc c.nwifeinc#c.nwifeinc), first

The output of the above model for the first stage equations is:

And the output for the two stage equation is:

This yields coefficients identical to those in Wooldridge's book (2002, p. 236), with slight differences in the standard errors (which do not change the interpretation of the statistical significance of the estimators).

In this way, we instrumented both endogenous regressors, lwage and lwage_sq, which enter the model nonlinearly.

As we can see, the quadratic term is not statistically significant to explain the hours worked.

Finally, we need to make sure that the overidentifying restrictions are valid, so after the regression we use:

estat overid

And within this result, we cannot reject the null that overidentifying restrictions are valid.

**Bibliography**

Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

We start with the linear model:

Where y represents the dependent variable, X is the (1xK) vector of exogenous variables, Z is a vector of time-invariant covariates, and *µ* is the individual effect for each individual. Special importance is attached to the correlation between X and *µ*: if this correlation is zero, we should go for the random-effects model; however, if X and *µ* are correlated, it's better to stick with fixed effects.

The fixed- and random-effects estimators rely on the absence of serial correlation. Building on this, Wooldridge uses the residuals from the regression of (1) in first differences, which is of the form:

Notice that this differencing procedure eliminates the individual effects contained in *µ*: since the individual effects are time-invariant, they have no variation over time and drop out when we take differences.

Once we have the regression in first differences (and assuming the individual-level effects are eliminated), we take the predicted residuals of the first-difference regression. We then check the correlation between the residual of the first-difference equation and its first lag: if there is no serial correlation in the original errors, this correlation should equal -0.5.

Therefore, if the correlation is equal to -0.5, the original model in (1) does not have serial correlation. However, if it differs significantly from -0.5, we have a first-order serial correlation problem in the original model in (1).
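The value of -0.5 follows from a short variance calculation. If the original errors u_{it} are serially uncorrelated with variance σ², then for Δu_{it} = u_{it} - u_{i,t-1}:

```latex
\operatorname{Var}(\Delta u_{it}) = \operatorname{Var}(u_{it}) + \operatorname{Var}(u_{i,t-1}) = 2\sigma^2,
\qquad
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-1}) = -\operatorname{Var}(u_{i,t-1}) = -\sigma^2,
\qquad\text{so}\qquad
\operatorname{Corr}(\Delta u_{it}, \Delta u_{i,t-1}) = \frac{-\sigma^2}{2\sigma^2} = -\tfrac{1}{2}.
```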

For all of the regressions, we account for within-panel correlation, so all of the procedures require clustered standard errors; also, we omit the constant term in the difference equation. In sum, we do the following:

1. **Specify our model** (whether it has fixed or random effects, as long as these are time-invariant).
2. **Create the difference model** (taking first differences of all the variables, so that the difference model has no individual effects). We run this regression clustering by individual and omitting the constant term.
3. **Predict the residuals** of the difference model.
4. **Regress the predicted residual on its first lag**, again clustering and omitting the constant.
5. **Test the hypothesis** that the coefficient on the lagged residual equals -0.5.
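These steps can be sketched on simulated data (a made-up panel, numpy only; the clustering of standard errors is skipped here since we only need the point estimate). With serially uncorrelated errors, the coefficient on the lagged differenced residual comes out close to -0.5:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 500, 10                      # 500 individuals, 10 periods

# Simulated panel: y_it = 1 + 0.5*x_it + mu_i + u_it, iid errors u_it
x = rng.normal(size=(N, T))
mu = rng.normal(size=(N, 1))        # time-invariant individual effects
u = rng.normal(size=(N, T))         # no serial correlation by construction
y = 1 + 0.5 * x + mu + u

# Step 2: first differences remove mu; regress dy on dx without a constant
dy = np.diff(y, axis=1).ravel()
dx = np.diff(x, axis=1).ravel()
b = (dx @ dy) / (dx @ dx)

# Step 3: residuals of the difference regression
e = np.diff(y, axis=1) - b * np.diff(x, axis=1)

# Step 4: regress e_it on e_i,t-1 within each individual, no constant
e_now, e_lag = e[:, 1:].ravel(), e[:, :-1].ravel()
rho = (e_lag @ e_now) / (e_lag @ e_lag)

print(rho)  # should be close to -0.5 when u_it is serially uncorrelated
```

In the Stata example below, the analogue of step 5 is a formal Wald test of rho = -0.5 with clustered standard errors.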

Let's do a quick example of these steps using the same example as Drukker.

We start loading the database.

use http://www.stata-press.com/data/r8/nlswork.dta

Then we declare the panel structure for Stata with the code:

xtset idcode year

Then we generate some quadratic variables.

gen age2 = age^2

gen tenure2 = tenure^2

We regress our model of the form of:

xtreg ln_wage age* ttl_exp tenure* south, fe

It doesn't matter whether it is fixed or random effects, as long as we assume the individual effects are time-invariant (so they get eliminated in the first-difference model).

Now let's do the manual estimation of the test. To do this, we run a pooled regression of the differenced model, without the constant and clustering by the panel variable. This is done as follows:

reg d.ln_wage d.age* d.ttl_exp d.tenure* d.south, noconst cluster(idcode)

The noconst option eliminates the constant term from the difference model, the cluster option applies clustered standard errors, and idcode is the panel variable that identifies the individuals in the panel.

The next thing to do is predict the residuals of the last pooled difference regression, and we do this with:

predict u, res

Then we regress the predicted residual u against the first lag of u, while we cluster and also eliminate the constant of the regression as before.

reg u L.u, noconst cluster(idcode)

Finally, we test the hypothesis of whether the coefficient on the first lag in the pooled difference equation is equal to -0.5:

test L.u==-0.5

According to the results, we strongly reject the null hypothesis of no serial correlation at the 5% level of significance. Therefore, the model has serial correlation problems.

We can also perform the test with Drukker's compiled Stata package, which can be somewhat faster. We do this by using:

xtserial ln_wage age* ttl_exp tenure* south, output

and we'll get the same results. However, the advantage of the manual procedure is that it can be done for any kind of model or regression.

**Bibliography**

Drukker, D. (2003) Testing for serial correlation in linear panel-data models, The Stata Journal, 3(2), pp. 168–177. Taken from: https://journals.sagepub.com/doi/pdf/10.1177/1536867X0300300206

Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

If we're purely interested in statistical inference, we should go for HAC robust standard errors in the time series context. As Wooldridge points out, this name refers to:

“In the time series literature, the serial correlation–robust standard errors are sometimes called heteroskedasticity and autocorrelation consistent, or HAC, standard errors.” (Wooldridge, 2013, p. 432).

Note that HAC standard errors (also called HAC estimators) derive from the work of Newey & West (1987), whose objective was to build a robust approach to handle the usual time series problems of serial correlation and heteroskedasticity.

What’s the idea behind these standard errors? Well, we can summarize it as:

- We do not know the form of the serial correlation.
- They work for arbitrary forms of serial correlation, with a lag structure that can be chosen based on the sample size.
- With larger samples, we can be more flexible about the amount of serial correlation allowed.

This means that even though the robust standard errors are consistent in the presence of serial correlation and heteroskedasticity, we still need to figure out the lag structure for the autocorrelation. Again, Wooldridge helps us decide this with a simple rule of thumb:

*Annual data: 1–2 lags. Quarterly data: 4–8 lags. Monthly data: 12–24 lags.*

Let’s dig into some formulas to understand the relationship between HAC and OLS.

First, Newey & West standard errors are built on the ordinary least squares estimator of the form:

Where X is the matrix of independent variables and Y is the vector of the dependent variable. This establishes that the Newey-West coefficient estimates will not differ from the OLS estimates; only the standard errors do.

Second, Newey & West standard errors modify the role of the estimated variances to include White’s robust approach to heteroskedasticity and also the serial correlation structure.

Consider that estimates of the variance in OLS are given by:

Where Ω is the diagonal matrix containing the distinct variances (for a representation of heteroskedasticity).

Now, White robust estimator is defined by:

Where n is the sample size, e_i is the estimated residual for period i, and x_i is the ith row of the matrix of independent variables. Let's call these the robust estimates with 0 lags (since they only handle heteroskedasticity).

Now here's where Newey & West extended the White estimator to include arbitrary forms of serial correlation with an m-lag structure:

As can be seen, the HAC estimate of the variance now includes heteroskedasticity and an m-lag consistent structure. K represents the number of independent variables, t the time period, and x_t the row of the matrix of independent variables observed at time t.

With this, it is clearer to work under the following frame:

*Annual data: m=1,2 lags. Quarterly data: m=4,8 lags. Monthly data: 12,24 lags.*
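The sandwich formula above can be sketched in a few lines (simulated data with made-up coefficients; Bartlett kernel weights, and no small-sample degrees-of-freedom correction, so the numbers will differ slightly from Stata's newey). With m = 0 it collapses to the White estimator, and the point estimates are always the plain OLS ones.

```python
import numpy as np

def newey_west(X, y, m):
    """OLS with HAC (Newey-West) standard errors using Bartlett weights
    and m lags. With m = 0 this collapses to White's heteroskedasticity-
    robust estimator (no small-sample correction, to keep the sketch short)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y              # point estimates are plain OLS
    e = y - X @ beta

    # "Meat" of the sandwich: White term plus weighted autocovariances
    S = (X * (e**2)[:, None]).T @ X       # sum_t e_t^2 x_t x_t'
    for l in range(1, m + 1):
        w = 1 - l / (m + 1)               # Bartlett kernel weight
        G = (X[l:] * (e[l:] * e[:-l])[:, None]).T @ X[:-l]
        S += w * (G + G.T)

    V = XtX_inv @ S @ XtX_inv             # sandwich variance estimate
    return beta, np.sqrt(np.diag(V))

# Made-up example: AR(1) errors induce serial correlation
rng = np.random.default_rng(3)
T = 200
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.5 * eps[t - 1] + rng.normal()
y = 1 + 2 * x + eps
X = np.column_stack([np.ones(T), x])

beta0, se0 = newey_west(X, y, m=0)        # matches White's robust SEs
beta2, se2 = newey_west(X, y, m=2)        # HAC with 2 lags
```

Note that beta0 and beta2 are identical: changing the lag length only changes the standard errors, never the coefficients.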

Let’s see an example:

use https://www.stata-press.com/data/r16/auto

generate t = _n

tsset t

regress price weight displ, vce(robust)

Up to this point, these are the White standard errors, robust to heteroskedasticity. Now let's estimate the HAC estimator with its equivalent, which is 0 lags.

newey price weight displ, lag(0)

As you can see, everything is identical to White's robust standard errors. Now let's use the HAC structure with 2 lags.

newey price weight displ, lag(2)

Notice as well that the values of standard errors of the independent variables have changed with this estimation.

I would always recommend providing HAC standard error estimates, in order to obtain more comparable estimates and correct inferences.

As a last remark, Greene (2012) states that a usual practice is to select the integer approximation of T^(1/4), where T is the total number of time periods. For example, in our case it would be

display (74)^(1/4)

and Stata will display a value of roughly 2.93, so the lags to select would be 3 or 2 (with no specific criterion to choose one over the other).

**Bibliography:**

Greene, W. H. 2012. Econometric Analysis, 7th edition, section 20.5.2, p. 960

Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55: 703–708.

Wooldridge, J. 2013. Introductory Econometrics: A Modern Approach, Fifth Edition. South-Western CENGAGE Learning.

The lessons from behavioral economics have improved social wellbeing and economic success in recent years. Academics and policymakers now recognize that integrating how individuals behave and make decisions in real life dramatically improves the effectiveness of public policies and the validity of simple theoretical models. Thus, this area of research has enhanced our understanding of the barriers to decision-making and led to the emergence of a wider and richer theoretical and empirical framework to inform human decision making.

This framework builds on fields such as sociology, anthropology, psychology, economics, and political science. Two of the last four Nobel Prizes in Economics (2017 and 2019) have been awarded to behavioral and experimental economists working also on development-related problems. The wider results from this body of work have been used by academics, governments, and international organizations to design evidence-based policies in a wide range of activities such as finance, tax collection, healthcare, education, energy consumption and human cooperation.

Based on this relevance, the present workshop aims to teach the foundations of behavioral economics and how its instruments can help improve social and economic outcomes in problems found in modern public policy. Similarly, the workshop will cover statistical and econometric techniques (and commands) to ensure the correct implementation of interventions and the assessment of their results.

Learn more and register at the upcoming workshop in March 2021 at https://ms-researchhub.com/home/training/expert-metrics-behavioral-and-experimental-econometrics.html
