## Panel Data Nonlinear Simultaneous Equation Models with Two-Stage Least Squares using Stata

In this article, we will follow Woolridge (2002) procedure to estimate a set of equations with nonlinear functional forms for panel data using the two-stage least squares estimator. It has to be mentioned that this topic is quite uncommon and not used a lot in applied econometrics, this is due that instrumenting the nonlinear terms might be somewhat complicated.

Assume a two-equation system of the form:

Where the y’s represents the endogenous variables, Z represents the exogenous variables taken as instruments and u are the residuals for each equation. Notice that y2 is in a quadratic form in the first equation but also present in linear terms on the second equation.

Woolridge calls this model as nonlinear in endogenous variable, yet the model still linear in the parameters γ making this a particular problem where we need to somehow instrument the quadratic term of y2.

Finding the instruments for the quadratic term is a particular challenge than already it is for linear terms in simple instrumental variable regression. He suggests the following:

“A general approach is to always use some squares and cross products of the exogenous variables appearing somewhere in the system. If something like exper2 appears in the system, additional terms such as exper3 and exper4 would be added to the instrument list.” (Wooldridge, 2002, p. 235).

Therefore, it worth the try to use nonlinear terms of the exogenous variables from Z, in the form of possible Z2 or even Z3. And use these instruments to deal with the endogeneity of the quadratic term y2. When we define our set of instruments, then any nonlinear equation can be estimated with two-stage least squares. And as always, we should check the overidentifying restrictions to make sure we manage to avoid inconsistent estimates.

The process with an example.

Let’s work with the Example of a nonlinear labor supply function. Which is a system of the form:

Some brief description of the model indicates that for the first equation, the hours (worked) are a nonlinear function of the wage, the level of education (educ), the age (age), the kids situation associated to the age, whether if they’re younger than 6 years old or between 6 and 18 (kidslt6 and kidsge6), and the wife’s income (nwifeinc).

On the second equation, the wage is a function of the education (educ), and a nonlinear function of the exogenous variable experience (exper and exper2).

We work on the natural assumptions that E(u|z)=0 therefore the instruments are exogenous. Z in this case contains all the other variables which are not endogenous (hours and wage are the endogenous variables).

We will instrument the quadratic term of the logarithm of the wage in the first equation, and for such instrumenting process we will add three new quadratic terms, which are:

And we include those in the first-stage regression.

With Stata we first load the dataset which can be found here.

`https://drive.google.com/file/d/1m4bCzsWgU9sTi7jxe1lfMqM2T4-A3BGW/view?usp=sharing`

Load up the data (double click the file with Stata open or use some path command to get it ready)

`use MROZ.dta`

Generate the squared term for the logarithm of the wage with:

`gen lwage_sq=lwage *lwage`

Then, get ready to use the following command with ivregress, however, we will explain it in detail.

ivregress 2sls hours educ age kidslt6 kidsge6 nwifeinc (lwage lwage_sq  = educ c.educ#c.educ exper expersq age c.age#c.age kidsge6 kidslt6 nwifeinc c.nwifeinc#c.nwifeinc), first

Which has the following interpretation. According to the syntaxis of Stata’s program. First, make sure you specify the first equation with the associated exogenous variables, we do that with the part.

ivregress 2sls hours educ age kidslt6 kidsge6 nwifeinc

Now, let’s tell to Stata that we have two other endogenous regressors, which are the wage and the squared term of the wages. We open the bracket and put

(lwage lwage_sq  =

This will tell to Stata that lwage and lwage_sq are endogenous, part of the first equation of hours, and after the equal, we specify ALL the exogenous variables including the instruments for the endogenous terms, this will lead to include the second part as:

(lwage lwage_sq  = educ c.educ#c.educ exper expersq age c.age#c.age kidsge6 kidslt6 nwifeinc c.nwifeinc#c.nwifeinc)

Notice that this second part will have a c.var#c.var structure, this is Stata’s operator to indicate a multiplication for continuous variables, (and we induce the quadratic terms without generating the variables with another command like we did with the wage).

So notice we have c.educ#c.educ which is the square of the educ variable, and c.age#c.age which is the square of the age, and we also square the wife’s income with c.nwifeinc#c.nwifeinc. These are the instruments for the quadratic term.

The fact that we have two variables on the left (lwage and lwage_sq) indicates that the set of instruments will hold first for an equation for lwage and second for lwage_sq given the exact same instruments.

We include the option , first to see what were the regressions in the first stage.

`ivregress 2sls hours educ age kidslt6 kidsge6 nwifeinc (lwage lwage_sq  = educ c.educ#c.educ exper expersq age c.age#c.age kidsge6 kidslt6 nwifeinc c.nwifeinc#c.nwifeinc), first`

The output of the above model for the first stage equations is:

And the output for the two stage equation is:

Which yields in the identical coefficients in Woolridge’s book (2002, p- 236) also with some slightly difference in the standard errors (yet these slight differences do not change the interpretation of the statistical significance of the estimators).

In this way, we instrumented both endogenous regressors lwage and lwage_sq. Which are a nonlinear relationship in the model.

As we can see, the quadratic term is not statistically significant to explain the hours worked.

At last, we need to make sure that overidentification restrictions are valid. So we use after the regression

`estat overid`

And within this result, we cannot reject the null that overidentifying restrictions are valid.

Bibliography

Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cam-bridge, MA: MIT Press.

## Identifying Patterns with Stata Graphs

When we start to analyze any type of economic relationship, it is often said that we always need to graph the data. The importance of this step is having a visual where we can increase the understanding of our current relationships in the data. Sometimes with this, we can improve the mathematical functional form in the econometric modelling to capture better the relationships and dynamics in the data.

I would suggest to first do the following steps:

1. Scatter your independent variable (in the x-axis) against your dependent variable (in the y-axis)
2. Observe what kind of linear and non-linear relationships may exists in the graph.
3. Place the mean values of the variables to have some sort of idea of what kind of data concentrations we might have.
4. Make your inferences accordingly, and do a matrix with correlations with everything.

To do an example of this, let’s make an example with a Data Generating Process of the form:

And to generate the random sample we will use:

```clear all
set obs 100
gen n=_n
set seed 1234
gen x=rnormal()
gen x_sq=x*x
gen z=rnormal()
gen y= 1 + (0.5*x)+ (- 0.2*x_sq) + (1.5*z)```

Now let’s see a summary of our variables.

`sum`

Which will have as a result

Skipping n, which is just the individual identificatory variable, we can see the mean values of these variables. Now let’s start to play with some scatter plots.

`scatter y xscatter y z`

And we will have two graphs that look like this:

First graph, which is the scatter of y and x doesn’t show any clear relationship, in fact, we might state that there’s no relationship by such dispersion, On the second hand, we find out that there’s a possible linear relationship with y and z.

Let’s go and place the means of each variable in the scatter graph, remember that x mean is 0.0078 and y mean is 0.7479, with these values we will have something like this:

`scatter y x, xline(.0078032) yline(.747933)scatter y z, xline(-.0452837) yline(.747933)`

According to this, the data appears to be normal distributed (as it should be since we use a random sampling with normal distribution), in other cases, we might find that the mean is allocated in extreme values in either of the axis, which might imply some sort of kurtosis or non-normal distributions.

Now let’s use some linear and non-linear predictions using the not so common lfitci and qfitci. To do this, we type:

`twoway (lfitci y x)twoway (lfitci y z)`

And the respective output will be:

If we want to use lines instead of shaded area, we might type

`twoway (lfitci y x, ciplot(rline) )twoway (lfitci y z, ciplot(rline) )`

And it will display the same graph, but without shaded areas.

We can extend the same idea with non-linear relationships with a quadratic form using qfitci:

`twoway (qfitci y x)twoway (qfitci y z)`

And the output of the graph will be:

Notice that the quadratic relationship is now more visible using the quadratic adjustment for x and y. Therefore, it is a good practice to perform the quadratic adjustment even when the relationship is totally linear like in the case of y and z.

One last type of graphical analysis is using the fractional polynomial, where the syntax is given by:

`twoway (fpfitci y x)twoway (fpfitci y z)`

Finally, and to complete the steps we mentioned in this post, let’s do the matrix of correlations. Which is just simply the scatter plots together.

`graph matrix y x z`

The useful thing to consider with the matrix of correlations is that we can observe not only the scatter plots to a certain variable, but instead we got the scatter plots associated to all the variables we place in the command. Therefore, in regression analysis, this is quite useful to inspect to multicollinearity issues among the independent variables and not only the correlation between the dependent variable.

We can say that similar to x and z, there’s no strong linear correlation since it looks like more like a cloud of dots instead of a linear relationship like it has y and z.

Notice, however, that unless we use a quadratic adjustment, we don’t have it easy to detect the quadratic relationship between y and x, therefore, it is recommended to use the qfitci command to investigate such non-linear relationship.

Bibliography.

StataCorp (2020) Graph twoway fpfitci, Recuperated from: https://www.stata.com/manuals13/g-2graphtwowayfpfitci.pdf#g-2graphtwowayfpfitci

## Investigating Non-linear relationships with curvefit using Stata

While modelling specific phenomenon’s in economics, sometimes we might encounter a functional form which may not be linear in the explanatory variables. Assuming, that we still have linearity in the estimators, we have the capability to include in the regression, variables with powers. As an example, consider the following model:

The last equation presents the dependent variable Y as a function of X however, we can see that the polynomial in the model is of second-order degree. A few mentions can be done from here: 1) the model still linear in the parameters β. 2) No multicollinearity can be argued to exists between the regressors in X and the square of X (the model itself in terms of X will be highly correlated) therefore we’re modeling a structure where both of them will move together. 3) The parameters will no longer have a static/basic marginal effect, to find out this marginal effect we need to calculate the derivate of the model, given by:

Which represents that when X increase in one unit, the change in y is the above expression.

Considering the derivate, a turning point is given in the effect of X to Y, and can be found when we equal this derivate to 0 (to find the numerical spot where the slope is equal to 0). And that is done by solving the equation for the value of X:

We clear X and we have:

Let’s see this in practice, first let’s formulate a Data Generating Process -DGP- as follows without any noise or error:

Where X~N(0,1), with Stata let’s generate some random observations and the square variable.

```clear all
** Setting observations
set obs 50
gen n=_n
set seed 1234
gen x=rnormal()
gen x_sq=x*x
gen y= 1 + (0.5*x)+ (- 0.2*x_sq)```

After that, let’s scatter y, over x. and using scatter y x we have the next graph:

If we regress this functional form with the next command:

`regress y x x_sq`

We have the regression totally adjusted to the DGP. But with missing values on lots of statistics (since there is no residual at all!).

Notice also that the linear adjustment for r-squared is 1, meaning it is matching the data perfectly.

Now confirming that coefficients are 0.5, -0.2 and 1 for the constant. Let’s confirm that the turning point of the model is in:

Solving and changing the parameter’s we have that:

The slope of the curve where it turns to be 0 it should be allocated in X=1.25, with an image in Y=1+0.5(1.25)-0.2(1.25^2)= 1.3125 after that, there’s a decreasing effect in Y given changes in X.

Let’s redo the graph but marking those points.

`scatter y x, yline(1.3125) xline(1.25)`

We allocated the exact point where the input of x variable is enough to create a decreasing effect on the dependent variable (specifically at x=1.25, y=1.3525) and moving to x>1.25 we have decreasing effects on y, where areas before this point it was positive.

Within this context, let’s introduce to curvefit command.

This package created by Liu wei (2010) and it is good to investigate this kind of nonlinearities, let’s look it in action.

`curvefit y x, function(1)`

By placing the variables of interest (y as dependent and x as an independent), we need to specify the behavior of the polynomial, as the examples show, function(1) equals a first-order polynomial (a single straight line equation). With the following output.

As you can see, it gives estimates of the coefficients (b0 as the constant with b1 as the slope) and the basic statistic of the number of observations (N) and the adjusted r-squared. The graph displayed is:

Which is a linear model. A simple regression with first-order power in X. let’s try another function (the quadratic function). We type:

`curvefit y x, function(4)`

Which gives the following output:

Where b0 is the constant parameter, b1 would equal to the X without any power, and finally, b2 is the parameter associated with X^2. Giving an R^2 adjusted of 1, represents the goodness fit of the model of 100%. With the associated graph:

As you can see, the curve provides estimates pretty decent of the structure of the data given different types of mathematical models.

Here’s the complete list of what kind of functions it can be modeled.

```function(string) The following are alternative Models correspond with the values of the sting:

. string = 1 Linear: Y = b0 + (b1 * X)
. string = 2 Logarithmic: Y = b0 + (b1 * ln(X))
. string = 3 Inverse: Y = b0 + (b1 / X)
. string = 4 Quadratic: Y = b0 + (b1 * X) + (b2 * X^2)
. string = 5 Cubic: Y = b0 + (b1 * X) + (b2 * X^2) + (b3 * X^3)
. string = 6 Power: Y = b0 * (X^b1) OR ln(Y) = ln(b0) + (b1 * ln(X))
. string = 7 Compound: Y = b0 * (b1^X) OR ln(Y) = ln(b0) + (ln(b1) * X)
. string = 8 S-curve: Y = e^(b0 + (b1/X)) OR ln(Y) = b0 + (b1/X)
. string = 9 Logistic: Y = b0 / (1 + b1 * e^(-b2 * X))
. string = 0 Growth: Y = e^(b0 + (b1 * X)) OR ln(Y) = b0 + (b1 * X)
. string = a Exponential: Y = b0 * (e^(b1 * X)) OR ln(Y) = ln(b0) + (b1 * X)
. string = b Vapor Pressure: Y = e^(b0 + b1/X + b2 * ln(X))
. string = c Reciprocal Logarithmic: Y = 1 / (b0 + (b1 * ln(X)))
. string = d Modified Power: Y = b0 * b1^(X)
. string = e Shifted Power: Y = b0 * (X - b1)^b2
. string = f Geometric: Y = b0 * X^(b1 * X)
. string = g Modified Geometric: Y = b0 * X^(b1/X)
. string = h nth order Polynomial: Y = b0 + b1X + b2X^2 + b3X^3 + b4X^4 + b5*X^5 …
. string = i Hoerl: Y = b0 * (b1^X) * (X^b2)
. string = j Modified Hoerl: Y = b0 * b1^(1/X) * (X^b2)
. string = k Reciprocal: Y = 1 / (b0 + b1 * X)
. string = l Reciprocal Quadratic: Y = 1 / (b0 + b1 * X + b2 * X^2)
. string = m Bleasdale: Y = (b0 + b1 * X)^(-1 / b2)
. string = n Harris: Y = 1 / (b0 + b1 * X^b2)
. string = o Exponential Association: Y = b0 * (1 - e^(-b1 * X))
. string = p Three-Parameter Exponential Association: Y = b0 * (b1 - e^(-b2 * X))
. string = q Saturation-Growth Rate: Y = b0 * X/(b1 + X)
. string = r Gompertz Relation: Y = b0 * e^(-e^(b1 - b2 * X))
. string = s Richards: Y = b0 / (1 + e^(b1 - b2 * X))^(1/b3)
. string = t MMF: Y = (b0 * b1+b2 * X^b3)/(b1 + X^b3)
. string = u Weibull: Y = b0 - b1*e^(-b2 * X^b3)
. string = v Sinusoidal: Y = b0+b1 * b2 * cos(b2 * X + b3)
. string = w Gaussian: Y = b0 * e^((-(b1 - X)^2)/(2 * b2^2))
. string = x Heat Capacity: Y = b0 + b1 * X + b2/X^2
. string = y Rational: Y = (b0 + b1 * X)/(1 + b2 * X + b3 * X^2)
. string = ALL refers to a total of above models (Attention: it's uppercase!) nograph Curve Estimation without curve fit graph.
```

This package can be installed using:

`ssc install curvefit, replace.`

Bibliography.

Liu Wei (2010) “CURVEFIT: Stata module to produces curve estimation regression statistics and related plots between two variables for alternative curve estimation regression models,” Statistical Software Components S457136, Boston College Department of Economics, revised 28 Jul 2013.