In regression analysis, we often check the assumptions of the fitted econometric model. One of the key assumptions is that the model has no omitted variables (that is, it is correctly specified). Ramsey (1969) developed an omitted variable test, which uses powers of the predicted values of the dependent variable to check whether the model has an omitted variable problem.
Assume a basic fitted model given by:
$$ y = Xb + e \qquad (1) $$
where y is the n×1 vector of observations on the dependent variable, X is the n×k matrix of explanatory variables (n is the total number of observations and k is the number of independent variables), b is the estimated coefficient vector, and e is the vector of residuals.
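The Ramsey test then fits an augmented regression of the form:
$$ y = Xb + zt + u \qquad (2) $$
where z contains powers of the fitted values of y (written $\hat{y}$), t is the corresponding coefficient vector, and u is the error term. The Ramsey test performs a standard F test of t=0, and by default the powers are:
$$ z = [\hat{y}^2, \; \hat{y}^3, \; \hat{y}^4] \qquad (3) $$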
In Stata this is easily done with the command
estat ovtest
after the regression command reg.
To illustrate this, consider the following code:
use https://www.stata-press.com/data/r16/auto
regress mpg weight foreign
estat ovtest
The null hypothesis is t=0: the powers of the fitted values have no power to explain the dependent variable y, meaning that the model has no omitted variables. The alternative hypothesis is that the model suffers from an omitted variable problem.
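To make the mechanics concrete, here is a minimal sketch of the same test done by hand for the cross-section example. The variable names yhat, yhat2, yhat3 and yhat4 are illustrative, not part of any Stata command. The F statistic from the final test should be close to the one printed by estat ovtest, although the built-in command may scale the fitted values differently, so small numerical differences are possible.

```stata
* Manual RESET for the cross-section example (illustrative variable names)
use https://www.stata-press.com/data/r16/auto, clear
regress mpg weight foreign

* Built-in version, shown for comparison
estat ovtest

* Fitted values of mpg and their second, third and fourth powers
predict double yhat, xb
generate double yhat2 = yhat^2
generate double yhat3 = yhat^3
generate double yhat4 = yhat^4

* Augmented regression and joint F test that the powers have zero coefficients
regress mpg weight foreign yhat2 yhat3 yhat4
test yhat2 yhat3 yhat4
```

A small p-value from the last command leads to rejecting t=0.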
In a panel data structure, where we observe multiple units over multiple time periods, we fit a model like:
$$ y_{it} = x_{it} b + v_i + e_{it} \qquad (4) $$
with i=1, 2, 3, …, n observations and, for each i, t=1, 2, …, T time periods. Here v represents the heterogeneous effect, which can be treated as a parameter to be estimated (fixed effects, where it can be correlated with the explanatory variables) or as a random variable (random effects, where it is not correlated with the explanatory variables).
To implement the Ramsey test manually for this regression structure in Stata, we follow the recommendation of Santos Silva (2016). We start by predicting the fitted values of the regression (including the heterogeneous effects!). Then we generate the powers of the fitted values and include them in the regression in (4) with clustered standard errors. Finally, we perform a joint significance test on the coefficients of the powers.
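In equation form, the augmented panel regression described above can be sketched as follows. The coefficients on the powers are written here as θ2, θ3 and θ4, which is notation assumed for this sketch to avoid a clash with the time index t; the fitted value includes the estimated heterogeneous effect, matching the xbu prediction.

```latex
\hat{y}_{it} = x_{it}\hat{b} + \hat{v}_i
\\[4pt]
y_{it} = x_{it} b
       + \theta_2 \hat{y}_{it}^{2}
       + \theta_3 \hat{y}_{it}^{3}
       + \theta_4 \hat{y}_{it}^{4}
       + v_i + e_{it}
\\[4pt]
H_0:\ \theta_2 = \theta_3 = \theta_4 = 0
```

The joint test of the null hypothesis uses the cluster-robust covariance matrix of the augmented regression.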
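use https://www.stata-press.com/data/r16/nlswork
* Baseline fixed-effects regression with clustered standard errors
xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, fe cluster(idcode)
* Fitted values including the fixed effect (xb + u)
predict y_hat, xbu
* Second, third and fourth powers of the fitted values
gen y_h_2=y_hat*y_hat
gen y_h_3=y_h_2*y_hat
gen y_h_4=y_h_3*y_hat
* Augmented regression and joint test of the powers
xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south y_h_2 y_h_3 y_h_4, fe cluster(idcode)
test y_h_2 y_h_3 y_h_4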
Alternatively, you can skip generating the powers and specify them directly with the c. and # operators, as in the following code:
use https://www.stata-press.com/data/r16/nlswork
xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, fe cluster(idcode)
predict y_hat, xbu
xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south c.y_hat#c.y_hat c.y_hat#c.y_hat#c.y_hat c.y_hat#c.y_hat#c.y_hat#c.y_hat, fe cluster(idcode)
test c.y_hat#c.y_hat c.y_hat#c.y_hat#c.y_hat c.y_hat#c.y_hat#c.y_hat#c.y_hat
At the end of the procedure, the test command reports the joint F statistic for the powers and its p-value. The null hypothesis is that the model is correctly specified and has no omitted variables. In this case, however, we reject the null hypothesis at the 5% significance level, meaning that our model has omitted variables.
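If you want to vary the highest power or pick up the p-value programmatically, the manual procedure can be written with a loop. This is only a sketch: the local macro names rhs and powers are illustrative, and the xtset line is added as a precaution in case the panel settings are not already stored in the dataset. After test, Stata leaves the F statistic in r(F) and the p-value in r(p).

```stata
* Manual RESET after xtreg, fe, with the powers built in a loop (sketch)
use https://www.stata-press.com/data/r16/nlswork, clear
xtset idcode year

* Regressors of the baseline model, stored once to avoid retyping them
local rhs grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south

xtreg ln_w `rhs', fe cluster(idcode)
predict double y_hat, xbu

* Powers 2 to 4 of the fitted values; change the upper limit to use other powers
local powers
forvalues p = 2/4 {
    generate double y_h_`p' = y_hat^`p'
    local powers `powers' y_h_`p'
}

* Augmented regression and joint test of the powers
xtreg ln_w `rhs' `powers', fe cluster(idcode)
test `powers'
display "F = " r(F) "   p-value = " r(p)
```

Run the block as a do-file (or in one go from the do-file editor) so that the local macros stay defined across lines.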
As an alternative that is somewhat more restricted but offers more features, you can use the user-written package “resetxt”, developed by Emad Abd & Sahra Khaleel (2015). Install it with:
ssc install resetxt, replace
However, this package doesn’t work with factor variables or time-series operators, so we cannot include operators such as c., i., D. or L. The squared terms therefore have to be generated by hand:
clear all
use https://www.stata-press.com/data/r16/nlswork
gen age_sq=age*age
gen ttl_sq=ttl_exp*ttl_exp
gen tenure_sq=tenure*tenure
xtreg ln_w grade age age_sq ttl_exp ttl_sq tenure tenure_sq race not_smsa south, fe cluster(idcode)
resetxt ln_w grade age age_sq ttl_exp ttl_sq tenure tenure_sq race not_smsa south, model(xtfe) id(idcode) it(year)
However, the above code might be computationally demanding in Stata, depending on how much memory you have available. That is why this post implements the manual procedure for the Ramsey test in the panel data structure.
Bibliography
Emad Abd, S. E., & Sahra Khaleel, A. M. (2015). RESETXT: Stata Module to Compute Panel Data REgression Specification Error Tests (RESET). Obtained from: Statistical Software Components S458101: https://ideas.repec.org/c/boc/bocode/s458101.html
Ramsey, J. B. (1969). Tests for specification errors in classical linear least-squares regression analysis. Journal of the Royal Statistical Society Series B 31, 350–371.
Santos Silva, J. (2016). Reset test after xtreg & xi:reg. Obtained from: The Stata Forum: https://www.statalist.org/forums/forum/general-stata-discussion/general/1327362-reset-test-after-xtreg-xi-reg
The Ramsey RESET test is not really a test for omitted variables that are missing from the model in any form. It really is a test for functional form. If the squares, cubes… have significant explanatory power, the test is saying that the linear specification is rejected, and y=f(x) is not a linear function. It is unfortunate that Stata calls this “ovtest”, as failure to reject does not mean that there is no omitted variables problem. Using the auto dataset, reg price rep78 turn and note that turn is positive and significant. Use estat ovtest, no problem. Now reg price rep78 turn weight. The variable turn is still significant (and switched sign!), but its coefficient is obviously biased in the first regression due to the true omitted variables problem. RESET does not pick up this sort of misspecification.
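The commands described in this comment can be run as the short sketch below. It only reproduces the steps the commenter lists, using the same auto dataset as the post; the results are left for the reader to inspect.

```stata
* Commenter's example: RESET versus a genuinely omitted regressor
use https://www.stata-press.com/data/r16/auto, clear

* Short regression: weight is left out
regress price rep78 turn
estat ovtest

* Long regression: weight included, compare the coefficient on turn
regress price rep78 turn weight
```

Comparing the coefficient on turn across the two regressions shows how it changes once weight is included.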
That’s indeed a pretty accurate comment, thank you for the clarification!