General to Specific Modeling: A Step-by-Step Guide

In my previous blogs [1] [2], I explained that by following the general-to-specific methodology one can choose among theoretical models to find one that is compatible with the data. Here is an example showing the step-by-step procedure of the general-to-specific methodology.

At the end of this blog, you will find data on three variables for South Korea: (i) household consumption, (ii) GDP and (iii) inflation. The data set is retrieved from the WDI.

Before starting the modeling, it is very useful to plot the data series. We have three series; two of them are on the same scale and can be plotted together. The third series, inflation, is in percentage form, and if plotted with the other two it would not be visible. The graph of the two series is as follows.

You can see that the gap between income and consumption appears to diverge over time. This is a natural phenomenon: suppose a person has an income of 1,000 and consumes 70% of it; the difference between income and consumption would be 300. If income rises to 10,000 and the MPC stays the same, the difference between the two variables widens to 3,000. This widening gap is visible in the graph.

However, the widening gap creates a problem for OLS. The residuals at the beginning of the sample will have smaller variance and those at the end will have larger variance, i.e. there will be heteroskedasticity. In the presence of heteroskedasticity, OLS is no longer efficient.

The graphs also show non-linearity: the two series appear to behave like exponential series. A solution to both problems is the log transform. The difference of the log transforms of two series is roughly equal to the percentage difference, and if the MPC remains the same, the gap between the two series is smoothed out.
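A quick numerical illustration of why the log transform helps, using hypothetical numbers (income growing at 8% per year with a constant MPC of 0.7; neither value comes from the Korean data):

```python
import numpy as np

# Hypothetical series: income grows exponentially and consumption is a
# constant 70% of income, mirroring the constant-MPC example in the text.
income = 1000 * 1.08 ** np.arange(30)        # 8% growth per year (assumed)
consumption = 0.7 * income

# In levels the gap widens over time...
gap_levels = income - consumption
# ...but in logs the gap is constant: log(Y) - log(C) = -log(MPC)
gap_logs = np.log(income) - np.log(consumption)

print(round(gap_levels[-1] / gap_levels[0], 1))   # gap in levels grew ~9.3x
print(np.allclose(gap_logs, -np.log(0.7)))        # True: constant gap in logs
```

If the MPC declines slowly over time, the log gap widens only gently, which is exactly the pattern in the second graph below.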

I have taken the log transform and plotted the series again; the graph is as follows.

You can see that the gap between the log transforms of the two series is smoother than in the previous graph. The gap is still widening, but much more gently. The widening gap in this graph indicates a decline in the MPC over time. In any case, the two graphs indicate that the log transform is the better starting point for building the model.

I am starting with an ARDL model of the following form:

C_t = α_0 + α_1 C_{t-1} + α_2 C_{t-2} + β_0 Y_t + β_1 Y_{t-1} + β_2 Y_{t-2} + ε_t … (1)

where C_t denotes log consumption and Y_t denotes log income.
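An ARDL regression of this kind can be estimated by ordinary least squares on a matrix of lagged regressors. A minimal numpy sketch on simulated series (not the Korean data; the coefficient values and noise levels are arbitrary):

```python
import numpy as np

# Simulated stand-ins for log income (y) and log consumption (c).
rng = np.random.default_rng(0)
n = 60
y = np.cumsum(0.05 + 0.02 * rng.standard_normal(n))   # trending log income
c = 0.9 * y + 0.01 * rng.standard_normal(n)           # log consumption

# Regressor matrix for the ARDL(2,2): intercept, c_{t-1}, c_{t-2},
# y_t, y_{t-1}, y_{t-2}; the target is c_t for t = 2, ..., n-1.
X = np.column_stack([
    np.ones(n - 2),
    c[1:-1], c[:-2],         # lags of consumption
    y[2:], y[1:-1], y[:-2],  # current and lagged income
])
target = c[2:]

beta, *_ = np.linalg.lstsq(X, target, rcond=None)
residuals = target - X @ beta
# sigma: residual standard deviation, the "average size of error"
sigma = np.sqrt(residuals @ residuals / (len(target) - X.shape[1]))
print(beta.shape, float(sigma))
```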

The estimated equation is as follows

The equation has a very high R-squared, but a high R-squared is no surprise in time series; it turns out high even for unrelated series. The thing to note is sigma, the standard deviation of the residuals, which indicates that the average size of the errors is 0.0271. Before proceeding further, we want to make sure the estimated model does not suffer from failures of the assumptions. We tested the model for normality, autocorrelation and heteroskedasticity, and the results are as follows.

The autocorrelation (AR) test has the null hypothesis of no autocorrelation, and its p-value is above 5%, indicating that the null is not rejected, though it survives by a narrow margin. The normality test, with the null of normality, and the heteroskedasticity test, with the null of no heteroskedasticity, also indicate validity of the assumptions.
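These diagnostics are produced automatically by most packages, but their logic is simple. Below are hand-rolled sketches of a normality test (Jarque-Bera) and a quick first-order autocorrelation check (Durbin-Watson); these are illustrations of the idea, not the exact tests reported above:

```python
import numpy as np

def jarque_bera(e):
    """Statistic for the null of normal residuals; ~chi2(2) under H0."""
    e = np.asarray(e, dtype=float)
    z = e - e.mean()
    s2 = (z ** 2).mean()
    skew = (z ** 3).mean() / s2 ** 1.5
    kurt = (z ** 4).mean() / s2 ** 2
    return len(e) / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

def durbin_watson(e):
    """Quick check for first-order autocorrelation; near 2 means none."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(1)
resid = rng.standard_normal(500)    # well-behaved residuals
print(jarque_bera(resid))    # compare with the chi2(2) 5% critical value, 5.99
print(durbin_watson(resid))  # should be close to 2 for white-noise residuals
```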

We also want to ensure that the model is good at prediction, because the ultimate goal of an econometric model is to predict the future. The problem is that for real-time forecasting we would have to wait for years to see whether the model can predict. One solution is to leave some observations out of the estimation sample for the purpose of prediction, and then see how well the model predicts them.

The output indicates that the two prediction tests have p-values much greater than 5%. The null hypothesis of the forecast Chi-square test is that the error variance is the same in the sample period and the forecast period; this hypothesis is not rejected. Similarly, the null hypothesis of the Chow test is that the parameters are the same in the sample period and the forecast period; this is also not rejected.
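The logic of the forecast Chi-square test can be sketched in a few lines: fit on the first n-H observations, forecast the last H, and compare the scaled sum of squared forecast errors with a chi-square critical value. This is a simplified illustration on simulated data, not the exact statistic the software reports:

```python
import numpy as np

def forecast_chi2(X, y, H):
    """Fit OLS on all but the last H observations, forecast the rest,
    and return sum(fe^2)/sigma^2, roughly chi2(H) if the model is stable."""
    Xf, yf = X[:-H], y[:-H]            # fitting sample
    Xh, yh = X[-H:], y[-H:]            # hold-out sample
    beta, *_ = np.linalg.lstsq(Xf, yf, rcond=None)
    resid = yf - Xf @ beta
    sigma2 = resid @ resid / (len(yf) - Xf.shape[1])
    fe = yh - Xh @ beta                # forecast errors
    return float(fe @ fe / sigma2)

rng = np.random.default_rng(2)
n, H = 60, 8
x = rng.standard_normal(n)
y = 1.0 + 0.5 * x + 0.1 * rng.standard_normal(n)   # stable relationship
X = np.column_stack([np.ones(n), x])
stat = forecast_chi2(X, y, H)
print(stat)   # compare with the chi2(8) 5% critical value, about 15.5
```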

All the diagnostics again show satisfactory results.

Now let’s look back at the output of Eq (2). It shows that the second-lag variables, Lconsumption_2 and LGDP_2, are insignificant. This means that, keeping Lconsumption_2 in the model, you can exclude LGDP_2, and vice versa. But to exclude both variables, you need to test the significance of the two variables simultaneously. It sometimes happens that two variables are individually insignificant but become significant when taken together; usually this is due to multicollinearity. We therefore test the joint significance of the two second-lag variables, i.e. the null hypothesis that the coefficients of Lconsumption_2 and LGDP_2 are both zero.

The results of the test are

F(2,48) = 2.1631 [0.1260]

The results indicate that the hypothesis is not rejected; therefore we can take the coefficients of these variables to be zero, and the model becomes

The model M2 was estimated and the results are as follows

The results show that the diagnostic tests for the newly estimated model are all OK, and the forecast performance of the new model is not affected by excluding the two variables. If you compare the sigma for Eq (2) and Eq (3), you will see a difference only at the fourth decimal place. This means the size of the model has been reduced without paying any cost in predictive power.
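The joint exclusion test above is a standard F-test comparing the restricted and unrestricted residual sums of squares. A sketch on simulated data, where two irrelevant regressors are dropped jointly:

```python
import numpy as np

def joint_f(X_full, X_restr, y):
    """F-statistic for jointly dropping the columns absent from X_restr."""
    def ssr(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ b
        return float(e @ e)
    n, k = X_full.shape
    q = k - X_restr.shape[1]           # number of restrictions
    f = (ssr(X_restr) - ssr(X_full)) / q / (ssr(X_full) / (n - k))
    return f, q, n - k

rng = np.random.default_rng(3)
n = 56
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)            # irrelevant regressor
x3 = rng.standard_normal(n)            # irrelevant regressor
y = 2.0 + 1.5 * x1 + rng.standard_normal(n)
X_full = np.column_stack([np.ones(n), x1, x2, x3])
X_restr = X_full[:, :2]                # drop x2 and x3 jointly
f, q, dof = joint_f(X_full, X_restr, y)
print(q, dof)    # F(2, 52) here, analogous to the F(2, 48) result in the text
```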

Now all variables in the model are significant except the intercept, for which the p-value is 0.178. This means the regression does not support an intercept, so we can reduce the model further by excluding it. This time we do not need a joint test because we want to exclude only one variable. After excluding the intercept, the model becomes

The output indicates that all the diagnostics are OK. All the variables are significant, so no further variable can be excluded.

Now we can impose linear restrictions instead of exclusion restrictions. Writing the reduced model as C_t = α_1 C_{t-1} + β_0 Y_t + β_1 Y_{t-1} + ε_t, if we want to test whether or not we can take differences of consumption and income, we need to test the following:

R3: α_1 = 1 and β_0 + β_1 = 0

And if we want to test the restriction for the error correction model, we have to test:

R4: α_1 + β_0 + β_1 = 1

Apparently the two restrictions seem valid, because the estimated value of α_1 is close to 1 and the estimates of β_0 and β_1 sum approximately to 0. We have the choice of testing R3 or R4. We test R3 first; the results are as follows.

 

This means the error correction model can be estimated for the data under consideration.
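Linear restrictions like R3 and R4 can be tested the same way: impose the restriction by substitution, re-estimate, and compare residual sums of squares. A sketch on simulated series (alpha, b0 and b1 are my notation for the coefficients of the reduced model; the data are not the Korean series):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
y = np.cumsum(0.05 + 0.02 * rng.standard_normal(n))   # log income
c = y + 0.01 * rng.standard_normal(n)   # unit long-run elasticity by design

ct, cl = c[1:], c[:-1]          # c_t and c_{t-1}
yt, yl = y[1:], y[:-1]          # y_t and y_{t-1}

def ssr(X, target):
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    e = target - X @ b
    return float(e @ e)

# Unrestricted: c_t = alpha*c_{t-1} + b0*y_t + b1*y_{t-1}
ssr_u = ssr(np.column_stack([cl, yt, yl]), ct)
# Restricted (substitute alpha = 1 - b0 - b1, i.e. alpha + b0 + b1 = 1):
# c_t - c_{t-1} = b0*(y_t - c_{t-1}) + b1*(y_{t-1} - c_{t-1})
ssr_r = ssr(np.column_stack([yt - cl, yl - cl]), ct - cl)

q, k = 1, 3
fstat = (ssr_r - ssr_u) / q / (ssr_u / (len(ct) - k))
print(fstat)   # compare with the F(1, 56) 5% critical value, about 4.0
```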

For the error correction model, one needs to estimate a static regression (without lags) and use its residuals as the error correction term. Estimating the static regression yields

The estimates of this equation represent the long-run coefficients of the relationship between the two variables. They show that the long-run elasticity of consumption with respect to income is 0.93.

We have to estimate an error correction regression of the following kind:

ΔC_t = β ΔY_t + γ ECT_{t-1} + ε_t … (6)

where ECT_{t-1} is the lagged residual from the static regression.

The intercept does not enter the error correction regression. The estimates are as follows.

This is the parsimonious model for consumption and income. Eq (5) represents the long-run relationship between the two variables, and Eq (6) captures the short-run dynamics.
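The two-step procedure just described (a static regression for the long run, then a regression in differences with the lagged residual) can be sketched as follows on simulated series, not the Korean data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
y = np.cumsum(0.05 + 0.02 * rng.standard_normal(n))   # log income, trending
c = 0.93 * y + 0.01 * rng.standard_normal(n)          # log consumption

# Step 1: static long-run regression, c_t on y_t (with intercept);
# the residuals serve as the error correction term (ECT).
X1 = np.column_stack([np.ones(n), y])
b1, *_ = np.linalg.lstsq(X1, c, rcond=None)
ect = c - X1 @ b1

# Step 2: short-run regression in differences with the lagged ECT and
# no intercept, as in Eq (6): d(c_t) = b*d(y_t) + g*ECT_{t-1} + e_t
dc, dy = np.diff(c), np.diff(y)
X2 = np.column_stack([dy, ect[:-1]])
b2, *_ = np.linalg.lstsq(X2, dc, rcond=None)
print(b2.shape)   # two parameters, matching the final model in the text
```

A negative coefficient on the lagged ECT indicates that deviations from the long-run relationship are corrected over time.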

The final model has only two parameters, whereas Eq (1), which we started with, contains 6 parameters. The sigma for Eq (6) and Eq (2) are roughly the same, which tells us that the large model we started with has the same predictive power as the final model. The diagnostic tests are all OK, which means the final model is statistically adequate, in that the assumptions of the model are not contradicted by the data.

The final model is an error correction model, which contains information about both the short run and the long run. The short-run information is in Eq (6), whereas the long-run information is implicit in the error correction term and is available in the static Eq (5).

The same methodology can be adapted to more complex situations: the researcher starts from a general model and reduces it successively until the most parsimonious model that is statistically adequate is achieved.

Data

Variable details:

Consumption: Households and NPISHs Final consumption expenditure (current LCU)

GDP: GDP (current LCU)

Country: Korea, Republic

Time Period: 1960-2019

Source: WDI online (open source data)

Year    Consumption         GDP
1960    212720000000        249860000000
1961    252990000000        301690000000
1962    304100000000        365860000000
1963    417870000000        518540000000
1964    612260000000        739680000000
1965    693780000000        831390000000
1966    837890000000        1066070000000
1967    1039800000000       1313620000000
1968    1277060000000       1692900000000
1969    1597780000000       2212660000000
1970    2062100000000       2796600000000
1971    2592400000000       3438000000000
1972    3104800000000       4267700000000
1973    3798600000000       5527300000000
1974    5443600000000       7905000000000
1975    7285700000000       10543600000000
1976    9315500000000       14472800000000
1977    11361500000000      18608100000000
1978    15016700000000      25154500000000
1979    19439700000000      32402300000000
1980    24916700000000      39725100000000
1981    31181500000000      49669800000000
1982    35278600000000      57286600000000
1983    39796800000000      68080100000000
1984    44444800000000      78591300000000
1985    49305000000000      88129700000000
1986    54837200000000      102986000000000
1987    61775800000000      121698000000000
1988    71362200000000      145995000000000
1989    83899400000000      165802000000000
1990    100738000000000     200556000000000
1991    122045000000000     242481000000000
1992    141345000000000     277541000000000
1993    161105000000000     315181000000000
1994    192771000000000     372493000000000
1995    227070000000000     436989000000000
1996    261377000000000     490851000000000
1997    289425000000000     542002000000000
1998    270298000000000     537215000000000
1999    311177000000000     591453000000000
2000    355141000000000     651634000000000
2001    391692000000000     707021000000000
2002    440207000000000     784741000000000
2003    452737000000000     837365000000000
2004    468701000000000     908439000000000
2005    500911000000000     957448000000000
2006    533278000000000     1005600000000000
2007    571810000000000     1089660000000000
2008    606356000000000     1154220000000000
2009    622809000000000     1205350000000000
2010    667061000000000     1322610000000000
2011    711119000000000     1388940000000000
2012    738312000000000     1440110000000000
2013    758005000000000     1500820000000000
2014    780463000000000     1562930000000000
2015    804812000000000     1658020000000000
2016    834805000000000     1740780000000000
2017    872791000000000     1835700000000000
2018    911576000000000     1898190000000000
2019    931670000000000     1919040000000000

Author: Atiq Rehman

Dr. Atiq-ur-Rehman is Associate Professor and Director at the Kashmir Institute of Economics, University of Azad Jammu and Kashmir (UAJK). Before joining UAJK, he was an Assistant Professor of Econometrics at the Pakistan Institute of Development Economics, Islamabad, Pakistan's leading university in economics, and held a similar position at International Islamic University, Islamabad. He holds a PhD in Econometrics from International Islamic University, where he wrote his thesis under the supervision of Dr. Asad Zaman. Dr. Atiq teaches Econometrics and Statistics, especially theoretical and time series econometrics and statistical research methods, and also has an interest in teaching Monetary Economics. He has formal training in Econometrics and vast experience applying his econometric skills in various economics-related disciplines. His published research relates to Development Studies, Islamic Banking and Finance, Monetary Economics and Econometrics. He has supervised a large number of MS/PhD theses and has presented his research, including work on Islamic finance, at several international conferences.
