Threat to validity of regression analysis – Omitted Variables Bias

Most readers of this blog will be familiar with the ordinary least squares (OLS) estimator and regression models. Let us talk about one source of bias and inconsistency in these estimates and models. This is especially important when we think about the causal relationship of interest, that is, the relationship which is being studied. Proving causality can be difficult. But if we assess our econometric analysis and consider the different threats to the validity of our statistical inference, then we can be more confident in our analysis, results, and inferences.

We shall be discussing omitted variables bias. This bias arises when both of the following conditions hold:

  • the regressor X is correlated with an omitted variable Z.
  • the omitted variable Z is a determinant of the dependent variable Y.

Together, these conditions violate the Gauss-Markov assumption of ordinary least squares regression

$$ E(u \mid X) = 0 $$

which implies that the error term is uncorrelated with the regressors. Here u denotes the error term while X denotes the regressors. Simply put, this bias occurs when an econometric model leaves out one or more relevant variables. The bias results from the model attributing the effects of the missing variable(s) to those that are included in the model.

A simple example takes Salary as the dependent variable, Education as the regressor, and Ability as the omitted variable. Here, Salary is the annual salary of the individuals in our sample. Education can be years of education, test scores, or any other measure of education. Ability is some variable which signifies talent, skill, or proficiency in general. We can also think of Ability as being unmeasurable.

Let us think about the bias induced when we omit Ability as a regressor, either by mistake or because we cannot measure it. Suppose the true data generating process includes Ability and is

$$ \text{Salary}_i = \beta_0 + \beta_1 \, \text{Education}_i + \beta_2 \, \text{Ability}_i + u_i $$

In this case, Ability has some impact on both Salary and Education, so Ability is correlated with Education. Its effect on Salary could be captured directly by including it in the regression. However, we have no way to quantify Ability. If we do not include it, its effect on Salary is picked up indirectly through the variable Education.

If we do not use the true data generating process and instead use

$$ \text{Salary}_i = \beta_0 + \beta_1 \, \text{Education}_i + \varepsilon_i $$

where the error term $\varepsilon_i$ now absorbs the effect of Ability, then we run into the problem of omitted variables bias. This causes our ordinary least squares estimate of the coefficient on Education (denoted by $\beta_1$) to be biased and inconsistent. This means that the bias cannot be removed by increasing the sample size, because omitted variables bias prevents the ordinary least squares estimate from converging in probability to the true parameter value. The strength and direction of the bias are determined by the correlation between the error term and the regressor.
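The inconsistency can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the coefficients (intercept 0.5, 1.0 on Education, 2.0 on Ability), the link between Education and Ability (`rho = 0.5`), and the function name `simulate_short_regression` are all hypothetical choices, not taken from any data. Under these assumptions the short regression converges to 1 + 2 × 0.5 / 1.25 = 1.8 rather than to the true value 1.0.

```python
import numpy as np

def simulate_short_regression(n, beta1=1.0, beta2=2.0, rho=0.5, seed=0):
    """Simulate Salary = 0.5 + beta1*Education + beta2*Ability + u, then
    regress Salary on Education only and return the estimated slope."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)
    # Assumption: Education depends on Ability through rho, plus noise.
    education = rho * ability + rng.normal(size=n)
    u = rng.normal(size=n)
    salary = 0.5 + beta1 * education + beta2 * ability + u

    # Short regression: intercept and Education, Ability left out.
    X = np.column_stack([np.ones(n), education])
    coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
    return coef[1]

# Probability limit of the short-regression slope under these assumptions:
# beta1 + beta2 * Cov(Education, Ability) / Var(Education)
plim = 1.0 + 2.0 * 0.5 / (0.5**2 + 1.0)  # = 1.8

for n in (100, 10_000, 200_000):
    print(n, round(simulate_short_regression(n), 3), "limit:", plim)
```

As the sample grows, the estimate settles near 1.8 instead of 1.0, which is what inconsistency means here.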

Now that we have seen what omitted variables bias is, let us consider some solutions.

One answer to this issue is to include more variables in the regression model. The model then uses as independent variables not only those whose effects on the dependent variable are of interest, but also any variables whose omission might cause omitted variables bias. Including these additional variables reduces the risk of omitted variables bias, but it may also increase the variance of the estimator.

Some general guidelines that help us decide whether to include additional variables are:

  • Specify the coefficient of interest.
  • Based on your knowledge of the variables and model, identify possible sources of omitted variables bias. This should give you a baseline specification as a starting point and a set of additional regressors, sometimes called control variables.
  • Use different model specifications and test against your baseline.
  • Use tables to provide full disclosure of your results – by presenting different model specifications, you can support your argument and enable readers to see the impact of including other regressors.
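These guidelines can be sketched on simulated data. The example below assumes, purely for illustration, that Ability is observable and can be used as a control variable; the data generating values and the helper `ols` are hypothetical. It estimates a baseline specification and a specification with the control, then prints both side by side.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data generating process (all values are assumptions).
ability = rng.normal(size=n)
education = 0.5 * ability + rng.normal(size=n)
salary = 0.5 + 1.0 * education + 2.0 * ability + rng.normal(size=n)

def ols(y, *regressors):
    """Return OLS coefficients, intercept first, via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

baseline = ols(salary, education)               # specification (1)
with_control = ols(salary, education, ability)  # specification (2)

# Report both specifications in one table.
print(f"{'Regressor':<12}{'(1)':>8}{'(2)':>8}")
print(f"{'Education':<12}{baseline[1]:>8.3f}{with_control[1]:>8.3f}")
print(f"{'Ability':<12}{'':>8}{with_control[2]:>8.3f}")
```

The coefficient on Education moves noticeably once the control is added, which is the kind of sensitivity a results table lets readers see for themselves.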

If diminishing the bias by including additional variables is not possible, such as in the cases where there are no adequate control variables, then there are still a variety of approaches which can help us solve this problem.

  • Making use of panel data methods.
  • Making use of instrumental variables regression methods such as Two Stage Least Squares.
  • Making use of a randomized controlled experiment.

These approaches are important to consider because they help us to avoid false inferences of causality due to the presence of another underlying variable, the omitted variable, that influences both the independent and dependent variables.
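As an illustration of the instrumental variables approach, here is a minimal hand-rolled Two Stage Least Squares sketch on simulated data. It assumes a hypothetical variable `instrument` that shifts Education, is unrelated to Ability, and has no direct effect on Salary; all numerical values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

ability = rng.normal(size=n)      # unobserved in practice
instrument = rng.normal(size=n)   # hypothetical valid instrument
education = 0.5 * ability + 0.8 * instrument + rng.normal(size=n)
salary = 0.5 + 1.0 * education + 2.0 * ability + rng.normal(size=n)

def ols(y, x):
    """Return (intercept, slope) from a simple regression of y on x."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Stage 1: regress Education on the instrument and keep the fitted values.
stage1 = ols(education, instrument)
education_hat = stage1[0] + stage1[1] * instrument

# Stage 2: regress Salary on the fitted values of Education.
beta_2sls = ols(salary, education_hat)[1]

# Plain OLS for comparison, biased because Ability is omitted.
beta_ols = ols(salary, education)[1]

print("OLS :", round(beta_ols, 3))
print("2SLS:", round(beta_2sls, 3))
```

The sketch only recovers the point estimate. Standard errors taken from a plain second-stage regression are not valid for 2SLS, so in practice a dedicated routine would be used for inference.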


Log-linearisation in Short with an example

Many models consist of systems of equations for which no closed-form solution exists. In these cases, we use a method known as log-linearisation. One example is non-linear models such as Dynamic Stochastic General Equilibrium (DSGE) models. DSGE models are non-linear in both parameters and variables, which makes solving and estimating them challenging.

Hence, we have to use approximations to the non-linear models. This involves a trade-off: some features of the models are lost, but the models become more manageable.

In the simplest terms, we first take the natural logs of the non-linear equations and then we linearise the logged difference equations about the steady state. Finally, we simplify the equations until we have linear equations where the variables are percentage deviations from the steady state. We use the steady state as that is the point where the economy ends up in the absence of future shocks.

Historically, most estimation in the literature relied on linearised models, but since the global financial crisis non-linear models have been used more and more. Many discrete-time dynamic economic problems require the use of log-linearisation.

There are several ways to do log-linearisation. Some of them are described in the references provided in the bibliography below.

One of the main methods is the application of the Taylor series expansion. Taylor’s theorem tells us that the first-order approximation of a differentiable function f around a point x* is as below.

$$ f(x) \approx f(x^*) + f'(x^*)\,(x - x^*) $$

We can use this to log-linearise equations around the steady state. Since we would be log-linearising around the steady state, x* would be the steady state.
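To get a feel for how good the first-order approximation is, the following sketch applies it to the natural log around a steady state x*. The value `x_star = 1.0` and the function name `first_order_log` are hypothetical choices for illustration.

```python
import math

def first_order_log(x, x_star):
    """First-order Taylor approximation of ln(x) around x_star:
    ln(x*) + (x - x*) / x*."""
    return math.log(x_star) + (x - x_star) / x_star

x_star = 1.0  # hypothetical steady-state value

for x in (1.01, 1.05, 1.50):
    exact = math.log(x)
    approx = first_order_log(x, x_star)
    print(f"x={x:.2f}  ln(x)={exact:.4f}  approx={approx:.4f}  "
          f"error={approx - exact:.4f}")
```

The approximation is close for small deviations from x* and deteriorates as x moves further away, which is why log-linearised models are best read as local approximations around the steady state.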

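For example, let us consider a Cobb-Douglas production function, assumed here to take the standard form with technology A, capital K, and labour L, and then take the log of the function.

$$ Y_t = A_t K_t^{\alpha} L_t^{1-\alpha} $$

$$ \ln Y_t = \ln A_t + \alpha \ln K_t + (1-\alpha) \ln L_t $$

The next step is to apply the Taylor series expansion and take the first-order approximation of each term around the steady state.

$$ \ln Y^* + \frac{Y_t - Y^*}{Y^*} \approx \ln A^* + \frac{A_t - A^*}{A^*} + \alpha \ln K^* + \alpha \frac{K_t - K^*}{K^*} + (1-\alpha) \ln L^* + (1-\alpha) \frac{L_t - L^*}{L^*} $$

Since we know that

$$ \ln Y^* = \ln A^* + \alpha \ln K^* + (1-\alpha) \ln L^* $$

those parts of the function cancel out. We are left with

$$ \frac{Y_t - Y^*}{Y^*} \approx \frac{A_t - A^*}{A^*} + \alpha \frac{K_t - K^*}{K^*} + (1-\alpha) \frac{L_t - L^*}{L^*} $$

For notational ease, we define these terms as the percentage deviation of x about x*, where x* signifies the steady state:

$$ \tilde{x}_t = \frac{x_t - x^*}{x^*} $$

Thus, we get

$$ \tilde{Y}_t \approx \tilde{A}_t + \alpha \tilde{K}_t + (1-\alpha) \tilde{L}_t $$

At last, we have log-linearised the Cobb-Douglas production function around the steady state.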
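A quick numerical check can show how the log-linearised production function behaves. The sketch below assumes the standard form Y = A K^α L^(1-α), a hypothetical α of 0.33, and a hypothetical steady state; none of these values come from the text. It compares the exact percentage deviation of output with the log-linear one.

```python
alpha = 0.33                         # assumed capital share
A_ss, K_ss, L_ss = 1.0, 10.0, 1.0    # hypothetical steady state

def output(A, K, L):
    """Cobb-Douglas production function Y = A * K^alpha * L^(1 - alpha)."""
    return A * K**alpha * L**(1 - alpha)

Y_ss = output(A_ss, K_ss, L_ss)

def exact_deviation(a_dev, k_dev, l_dev):
    """Exact percentage deviation of output from its steady state."""
    Y = output(A_ss * (1 + a_dev), K_ss * (1 + k_dev), L_ss * (1 + l_dev))
    return (Y - Y_ss) / Y_ss

def loglinear_deviation(a_dev, k_dev, l_dev):
    """Log-linear approximation: A-dev + alpha*K-dev + (1-alpha)*L-dev."""
    return a_dev + alpha * k_dev + (1 - alpha) * l_dev

for size in (0.01, 0.05, 0.20):
    exact = exact_deviation(size, size, size)
    approx = loglinear_deviation(size, size, size)
    print(f"deviation={size:.2f}  exact={exact:.4f}  log-linear={approx:.4f}")
```

For small deviations the two numbers nearly coincide, while for a 20% deviation in every input the gap becomes visible, in line with the first-order nature of the approximation.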

Sims, Eric (2011). Graduate Macro Theory II: Notes on Log-Linearization – 2011. Retrieved from

Zietz, Joachim (2006). Log-Linearizing Around the Steady State: A Guide with Examples. SSRN Electronic Journal. 10.2139/ssrn.951753.

McCandless, George (2008). The ABCs of RBCs: An Introduction to Dynamic Macroeconomic Models, Harvard University Press

Uhlig, Harald (1999). A Toolkit for Analyzing Nonlinear Dynamic Stochastic Models Easily, Computational Methods for the Study of Dynamic Economies, Oxford University Press