Identifying Patterns with Stata Graphs

When we start to analyze any kind of economic relationship, it is often said that we should always graph the data first. The point of this step is to get a visual sense of the relationships present in the data. Sometimes this allows us to improve the mathematical functional form of the econometric model so that it captures the relationships and dynamics in the data more accurately.

I would suggest starting with the following steps:

  1. Scatter your independent variable (on the x-axis) against your dependent variable (on the y-axis).
  2. Observe what kind of linear or non-linear relationships may exist in the graph.
  3. Add the mean values of the variables to get an idea of how the data is concentrated.
  4. Make your inferences accordingly, and build a matrix of scatter plots (and correlations) with all the variables.

To illustrate this, let's work with a Data Generating Process (DGP) of the form

y = 1 + 0.5x - 0.2x^2 + 1.5z,   with x, z ~ N(0,1)

and generate a random sample with:

clear all
set obs 100
gen n=_n
set seed 1234
gen x=rnormal()
gen x_sq=x*x 
gen z=rnormal() 
gen y= 1 + (0.5*x)+ (- 0.2*x_sq) + (1.5*z)

Now let’s see a summary of our variables.

sum

which displays the summary statistics.

Skipping n, which is just the observation identifier, we can see the mean values of the variables. Now let's start playing with some scatter plots.

scatter y x
scatter y z

And we will have two graphs that look like this:

The first graph, the scatter of y against x, doesn't show any clear relationship; in fact, given such dispersion we might state that there is no relationship at all. The second graph, on the other hand, reveals a possible linear relationship between y and z.

Let's now place the mean of each variable in the scatter plots. Remember that the mean of x is 0.0078, the mean of z is -0.0453, and the mean of y is 0.7479. With these values we type:

scatter y x, xline(.0078032) yline(.747933)
scatter y z, xline(-.0452837) yline(.747933)

According to this, the data appears to be normally distributed around its means (as it should be, since we drew a random sample from a normal distribution). In other cases, we might find that the mean sits at extreme values on either axis, which might suggest skewness, excess kurtosis, or some other departure from normality.
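As a small variation (not shown in the original post), the reference lines can be pulled from r(mean) after summarize instead of being typed by hand, so they stay correct if the seed or the sample size changes; a minimal sketch:

quietly summarize x
local mx = r(mean)
quietly summarize z
local mz = r(mean)
quietly summarize y
local my = r(mean)
scatter y x, xline(`mx') yline(`my')
scatter y z, xline(`mz') yline(`my')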

Now let's add some linear and non-linear fits with confidence intervals using the not-so-common lfitci and qfitci. To do this, we type:

twoway (lfitci y x)
twoway (lfitci y z)

And the respective output will be:

If we want lines instead of a shaded area for the confidence interval, we can type:

twoway (lfitci y x, ciplot(rline) )
twoway (lfitci y z, ciplot(rline) )

And it will display the same graph, but without shaded areas.

We can extend the same idea to non-linear relationships with a quadratic fit using qfitci:

twoway (qfitci y x)
twoway (qfitci y z)

And the output of the graph will be:

Notice that the quadratic relationship between y and x is now much more visible using the quadratic fit. It is therefore good practice to try the quadratic fit even when the relationship looks purely linear, as in the case of y and z.
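A small variation of my own (not in the original post) is to overlay the raw scatter on the quadratic fit, which makes the curvature easier to judge against the individual points:

twoway (qfitci y x) (scatter y x, msize(small))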

One last type of graphical analysis uses a fractional polynomial fit, whose syntax is:

twoway (fpfitci y x)
twoway (fpfitci y z)

Finally, to complete the steps mentioned in this post, let's build the scatter plot matrix, which simply displays all the pairwise scatter plots together.

graph matrix y x z

The useful thing about the scatter plot matrix is that we observe not just the scatter plots against a single variable, but the scatter plots associated with every variable we place in the command. In regression analysis this is quite useful for inspecting possible multicollinearity among the independent variables, and not only their correlation with the dependent variable.
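To complement the graphical matrix with the numbers themselves, we can also compute the correlation matrix with Stata's built-in commands; a minimal sketch:

correlate y x z
pwcorr y x z, sig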

We can also say that, between x and z, there is no strong linear correlation, since their panel looks more like a cloud of dots than the linear pattern shown by y and z.

Notice, however, that unless we use a quadratic fit, the quadratic relationship between y and x is not easy to detect, so it is recommended to use the qfitci command to investigate such non-linear relationships.

Bibliography.

StataCorp (2020). Graph twoway fpfitci. Retrieved from: https://www.stata.com/manuals13/g-2graphtwowayfpfitci.pdf#g-2graphtwowayfpfitci


Investigating Non-linear relationships with curvefit using Stata

While modelling specific phenomena in economics, we sometimes encounter a functional form that is not linear in the explanatory variables. As long as we still have linearity in the parameters, we can include regressors raised to powers in the regression. As an example, consider the following model:

Y = β0 + β1·X + β2·X^2 + u
The last equation presents the dependent variable Y as a function of X, where the polynomial in X is of second degree. A few things can be noted here: 1) the model is still linear in the parameters β; 2) although X and X^2 will be highly correlated, this is a structural feature of the specification rather than a conventional multicollinearity problem, since by construction both terms move together; 3) the parameters no longer have a simple, constant marginal effect. To find the marginal effect we need to calculate the derivative of the model, given by:

dY/dX = β1 + 2·β2·X

This means that when X increases by one unit, the change in Y is given by the expression above, which depends on the level of X itself.

Considering the derivative, there is a turning point in the effect of X on Y, which can be found by setting the derivative equal to 0 (the point where the slope is zero) and solving the equation for the value of X:

β1 + 2·β2·X = 0

Solving for X we have:

X* = -β1 / (2·β2)

Let's see this in practice. First, let's formulate a Data Generating Process (DGP) without any noise or error term:

y = 1 + 0.5x - 0.2x^2

where x ~ N(0,1). In Stata, let's generate some random observations and the squared variable:

clear all
** Setting observations
set obs 50
gen n=_n
set seed 1234
gen x=rnormal()
gen x_sq=x*x 
gen y= 1 + (0.5*x)+ (- 0.2*x_sq)

After that, let's scatter y over x. Using scatter y x we get the following graph:

If we regress this functional form with the next command:

regress y x x_sq

The regression adjusts exactly to the DGP, but many statistics are reported as missing (since there are no residuals at all!).

Notice also that the R-squared is 1, meaning the fit matches the data perfectly.

Having confirmed that the coefficients are 0.5, -0.2 and 1 for the constant, let's verify that the turning point of the model is at:

X* = -β1 / (2·β2)

Substituting the parameter values we have:

X* = -0.5 / (2·(-0.2)) = 1.25

The slope of the curve therefore becomes 0 at X = 1.25, with an image of Y = 1 + 0.5(1.25) - 0.2(1.25^2) = 1.3125; beyond that point, further increases in X have a decreasing effect on Y.

Let’s redo the graph but marking those points.

scatter y x, yline(1.3125) xline(1.25)

We have located the exact point where the input of the x variable is enough to start creating a decreasing effect on the dependent variable (specifically at x = 1.25, y = 1.3125); for x > 1.25 further increases in x reduce y, whereas before this point the effect was positive.
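Rather than computing the turning point by hand, it can also be recovered directly from the estimates with nlcom (a sketch of my own, not part of the original post; note that in this noise-free example the variance matrix is degenerate, so only the point estimate is informative):

regress y x x_sq
nlcom turning: -_b[x]/(2*_b[x_sq])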

Within this context, let's introduce the curvefit command.

This package, created by Liu Wei (2010), is well suited to investigating this kind of nonlinearity. Let's see it in action.

curvefit y x, function(1)

Besides placing the variables of interest (y as dependent and x as independent), we need to specify the shape of the function: function(1) corresponds to a first-order polynomial (a single straight-line equation). This produces the following output.

As you can see, it reports estimates of the coefficients (b0 as the constant and b1 as the slope) along with basic statistics: the number of observations (N) and the adjusted R-squared. The graph displayed is:

This is a linear model, a simple regression with X to the first power. Let's try another function (the quadratic one). We type:

curvefit y x, function(4)

Which gives the following output:

Here b0 is the constant, b1 is the parameter on X without any power, and b2 is the parameter associated with X^2. The adjusted R-squared of 1 represents a goodness of fit of 100%. The associated graph is:

As you can see, curvefit provides pretty decent estimates of the structure of the data under different types of mathematical models.

Here's the complete list of functions that can be modeled.

function(string): the following alternative models correspond to the values of the string:

. string = 1 Linear: Y = b0 + (b1 * X) 
. string = 2 Logarithmic: Y = b0 + (b1 * ln(X)) 
. string = 3 Inverse: Y = b0 + (b1 / X) 
. string = 4 Quadratic: Y = b0 + (b1 * X) + (b2 * X^2) 
. string = 5 Cubic: Y = b0 + (b1 * X) + (b2 * X^2) + (b3 * X^3) 
. string = 6 Power: Y = b0 * (X^b1) OR ln(Y) = ln(b0) + (b1 * ln(X)) 
. string = 7 Compound: Y = b0 * (b1^X) OR ln(Y) = ln(b0) + (ln(b1) * X) 
. string = 8 S-curve: Y = e^(b0 + (b1/X)) OR ln(Y) = b0 + (b1/X) 
. string = 9 Logistic: Y = b0 / (1 + b1 * e^(-b2 * X)) 
. string = 0 Growth: Y = e^(b0 + (b1 * X)) OR ln(Y) = b0 + (b1 * X) 
. string = a Exponential: Y = b0 * (e^(b1 * X)) OR ln(Y) = ln(b0) + (b1 * X) 
. string = b Vapor Pressure: Y = e^(b0 + b1/X + b2 * ln(X)) 
. string = c Reciprocal Logarithmic: Y = 1 / (b0 + (b1 * ln(X))) 
. string = d Modified Power: Y = b0 * b1^(X) 
. string = e Shifted Power: Y = b0 * (X - b1)^b2 
. string = f Geometric: Y = b0 * X^(b1 * X) 
. string = g Modified Geometric: Y = b0 * X^(b1/X) 
. string = h nth order Polynomial: Y = b0 + b1X + b2X^2 + b3X^3 + b4X^4 + b5*X^5 … 
. string = i Hoerl: Y = b0 * (b1^X) * (X^b2) 
. string = j Modified Hoerl: Y = b0 * b1^(1/X) * (X^b2) 
. string = k Reciprocal: Y = 1 / (b0 + b1 * X) 
. string = l Reciprocal Quadratic: Y = 1 / (b0 + b1 * X + b2 * X^2) 
. string = m Bleasdale: Y = (b0 + b1 * X)^(-1 / b2) 
. string = n Harris: Y = 1 / (b0 + b1 * X^b2) 
. string = o Exponential Association: Y = b0 * (1 - e^(-b1 * X)) 
. string = p Three-Parameter Exponential Association: Y = b0 * (b1 - e^(-b2 * X)) 
. string = q Saturation-Growth Rate: Y = b0 * X/(b1 + X) 
. string = r Gompertz Relation: Y = b0 * e^(-e^(b1 - b2 * X)) 
. string = s Richards: Y = b0 / (1 + e^(b1 - b2 * X))^(1/b3) 
. string = t MMF: Y = (b0 * b1+b2 * X^b3)/(b1 + X^b3) 
. string = u Weibull: Y = b0 - b1*e^(-b2 * X^b3) 
. string = v Sinusoidal: Y = b0+b1 * b2 * cos(b2 * X + b3) 
. string = w Gaussian: Y = b0 * e^((-(b1 - X)^2)/(2 * b2^2)) 
. string = x Heat Capacity: Y = b0 + b1 * X + b2/X^2 
. string = y Rational: Y = (b0 + b1 * X)/(1 + b2 * X + b3 * X^2) 
. string = ALL refers to all of the above models (Attention: it's uppercase!) 

nograph performs the curve estimation without displaying the curve-fit graph.

This package can be installed using:

ssc install curvefit, replace

Bibliography.

Liu Wei (2010) “CURVEFIT: Stata module to produces curve estimation regression statistics and related plots between two variables for alternative curve estimation regression models,” Statistical Software Components S457136, Boston College Department of Economics, revised 28 Jul 2013.


Log-linearisation in Short

Log-linearisation in Short with an example

There exist many types of models for which no closed-form solution is available. In these cases, we use a method known as log-linearisation. One example of such models are non-linear models like Dynamic Stochastic General Equilibrium (DSGE) models. DSGE models are non-linear both in parameters and in variables, which makes solving and estimating them challenging.

Hence, we have to use approximations to the non-linear models. This involves a trade-off: some features of the models are lost, but the models become more manageable.

In the simplest terms, we first take the natural logs of the non-linear equations and then we linearise the logged difference equations about the steady state. Finally, we simplify the equations until we have linear equations where the variables are percentage deviations from the steady state. We use the steady state as that is the point where the economy ends up in the absence of future shocks.

In the literature, estimation has traditionally relied mainly on linearised models, but since the global financial crisis more and more non-linear models are being used. Many discrete-time dynamic economic problems require the use of log-linearisation.

There are several ways to do log-linearisation; some references are provided in the bibliography below.

One of the main methods is the application of a Taylor series expansion. Taylor's theorem tells us that the first-order approximation of an arbitrary function f around a point x* is:

f(x) ≈ f(x*) + f'(x*)(x - x*)
We can use this to log-linearise equations around the steady state, so x* is the steady-state value.

For example, let us consider a Cobb-Douglas production function and then take a log of the function.

The next step would be to apply Taylor Series Expansion and take the first order approximation.

Since the same equation holds exactly at the steady state, those terms cancel out, and we are left with the deviations from the steady state.

For notational ease, we define each of these terms as the percentage deviation of x from its steady-state value x*.
Thus, we get

At last, we have log-linearised the Cobb-Douglas production function around the steady state.
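For concreteness, here is a compact sketch of the derivation, assuming the standard two-input form Y = A·K^α·N^(1-α); the input names and exponents are my own notation, since the original equations were displayed as images:

ln(Y) = ln(A) + α·ln(K) + (1-α)·ln(N)

Applying the first-order approximation ln(X) ≈ ln(X*) + (X - X*)/X* to each variable and subtracting the steady-state relation ln(Y*) = ln(A*) + α·ln(K*) + (1-α)·ln(N*), the constant terms cancel and we are left with

(Y - Y*)/Y* = (A - A*)/A* + α·(K - K*)/K* + (1-α)·(N - N*)/N*

Writing lowercase letters for percentage deviations from the steady state (y = (Y - Y*)/Y*, and likewise for a, k and n), the log-linearised production function is simply

y = a + α·k + (1-α)·n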

Bibliography:
Sims, Eric (2011). Graduate Macro Theory II: Notes on Log-Linearization – 2011. Retrieved from https://www3.nd.edu/~esims1/log_linearization_sp12.pdf


Zietz, Joachim (2006). Log-Linearizing Around the Steady State: A Guide with Examples. SSRN Electronic Journal. 10.2139/ssrn.951753.


McCandless, George (2008). The ABCs of RBCs: An Introduction to Dynamic Macroeconomic Models, Harvard University Press


Uhlig, Harald (1999). A Toolkit for Analyzing Nonlinear Dynamic Stochastic Models Easily, Computational Methods for the Study of Dynamic
Economies, Oxford University Press


Box-Pierce Test of autocorrelation in Panel Data using Stata.

The test of Box & Pierce was derived from the article “Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models” in the Journal of the American Statistical Association (Box & Pierce, 1970).

The approach is used to test for first-order serial correlation. The general form of the test statistic is:

Q = n · Σ (h = 1, ..., m) ρ_h^2

Here the Box-Pierce statistic Q is defined as the product of the number of observations and the sum of the squared sample autocorrelations ρ_h at lags h = 1, ..., m. The test is closely related to the Ljung & Box (1978) autocorrelation test, and it is used to determine whether serial correlation exists in a time series. Under the null hypothesis the statistic follows a chi-squared distribution.
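For reference, the closely related Ljung & Box (1978) statistic applies a small-sample correction to the same sum; this is the standard textbook form, not something reproduced from the original post:

Q_LB = n·(n + 2) · Σ (h = 1, ..., m) [ρ_h^2 / (n - h)]

Both statistics are compared against a chi-squared distribution with m degrees of freedom when applied to a raw series.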

The null hypothesis of the Box-Pierce test can be stated as H0: the data are independently distributed, against the alternative H1: the data are not independently distributed. In other words, the null hypothesis is that the data do not exhibit an autocorrelation structure, while the alternative proposes that they do.

The test was implemented in Stata for the panel data structure by Emad Abd Elmessih Shehata & Sahra Khaleel A. Mickaiel (2014); it works in the context of ordinary least squares panel data regression (the pooled OLS model). We will develop an example here.

First we install the package using the command ssc install as follows:

ssc install lmabpxt, replace

Then we check the help file for the syntax and options.

help lmabpxt

From that, we get the help file displayed.

We can see that the general form of the syntax is:

lmabpxt depvar indepvars [if] [in] [weight] , id(var) it(var) [noconstant coll ]

Here id(var) and it(var) are the identifiers for the individuals (id) and the time structure (it), so we need to supply them in the model.

Consider the next example

clear all
use http://www.stata-press.com/data/r9/airacc.dta
xtset airline time, yearly
reg pmiles inprog
lmabpxt  pmiles inprog, id(airline) it(time)

Notice that the Box-Pierce test implemented by Shehata & Mickaiel (2014) re-estimates the pooled regression. The general output looks like this:

Here we can see the p-value associated with the Lagrange multiplier form of the Box-Pierce test, which is around 0.96. Therefore, at a 5% level of significance, we cannot reject the null hypothesis of no AR(1) autocorrelation in the panel residuals.

Consider now that you might want a fixed effects approach. A simple way is to include dummy variables for the individuals (airlines in this case), in the spirit of least squares dummy variables, and then compare the results.

To do that we can use:

tab airline, gen(a)

and then include from a2 to a20 in the regression structure, with the following code:

lmabpxt  pmiles inprog a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 , id(airline) it(time)

This is different from an error-components structure; it is simply a fixed effects approach using least squares dummy variable regression. Notice the output.

Using the fixed effects approach with dummy variables, the p-value decreases considerably; in this case we reject the null hypothesis at the 5% level of significance, meaning that we might have a problem of first-order serial correlation in the panel data.

With this example, we have performed the Box-Pierce test for panel data (and, additionally, we established that it is sensitive to the inclusion of fixed effects in the regression structure).

Notes:

The lmabpxt appears to be somewhat sensitive if the number of observations is too large (bigger than 5000 units).

There is an impressive compilation of Stata contributions by Shehata, Emad Abd Elmessih & Sahra Khaleel A. Mickaiel, which can be found at the following link:

http://www.haghish.com/statistics/stata-blog/stata-programming/ssc_stata_package_list.php

I suggest checking it out if you need anything related to Stata.

Bibliography

Box, G. E. P. and Pierce, D. A. (1970) “Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models”, Journal of the American Statistical Association, 65: 1509–1526. JSTOR 2284333

G. M. Ljung; G. E. P. Box (1978). “On a Measure of a Lack of Fit in Time Series Models”. Biometrika 65 (2): 297-303. doi:10.1093/biomet/65.2.297.

Shehata, Emad Abd Elmessih & Sahra Khaleel A. Mickaiel (2014) LMABPXT: “Stata Module to Compute Panel Data Autocorrelation Box-Pierce Test”


Ramsey RESET Test on Panel Data using Stata

In regression analysis, we often check the assumptions of the estimated econometric model. One of the key assumptions is that the model has no omitted variables (i.e., that it is correctly specified). In 1969, Ramsey (1969) developed an omitted-variables test, which essentially uses powers of the predicted values of the dependent variable to check whether the model suffers from an omitted-variables problem.

Assume a basic fitted model given by:

ŷ = Xb

where y is the n×1 vector containing the dependent variable, X is the n×k matrix of explanatory variables (n is the number of observations and k the number of independent variables), and b is the estimated coefficient vector.

The Ramsey test then fits a regression model of the type

y = Xb + zt + u

where z represents the powers of the fitted values of y. The Ramsey test performs a standard F test of t = 0, and the default setting considers the powers

z = (ŷ^2, ŷ^3, ŷ^4)

In Stata this is easily done with the command

estat ovtest

after the regression command reg.

To illustrate this, consider the following code:

use https://www.stata-press.com/data/r16/auto
regress mpg weight foreign
estat ovtest

The null hypothesis is that t = 0, meaning that the powers of the fitted values have no explanatory power over the dependent variable y, i.e., the model has no omitted variables. The alternative hypothesis is that the model suffers from an omitted-variables problem.

In a panel data structure, where we have multiple individuals observed over several time periods, we fit a model like

y_it = x_it·b + v_i + e_it

with i = 1, 2, ..., n individuals and, for each i, t = 1, 2, ..., T time periods. Here v_i represents the heterogeneous (individual) effect, which can be treated as a parameter (fixed effects, where it may be correlated with the explanatory variables) or as a random variable (random effects, where it is assumed uncorrelated with the explanatory variables).

To implement the Ramsey test manually in this regression structure in Stata, we follow Santos Silva's (2016) recommendation: first predict the fitted values of the regression (including the heterogeneous effects!), then generate the powers of the fitted values and include them in the original regression with clustered standard errors, and finally perform a joint significance test on the coefficients of the powers.

use https://www.stata-press.com/data/r16/nlswork

xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, fe cluster(idcode)

predict y_hat,xbu

gen y_h_2=y_hat*y_hat

gen y_h_3=y_h_2*y_hat

gen y_h_4=y_h_3*y_hat

xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south y_h_2 y_h_3 y_h_4, fe cluster(idcode)

test y_h_2 y_h_3 y_h_4
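As a small variation (my own sketch, not part of Santos Silva's code), the powers of the fitted values can also be generated in a loop, which avoids typos if higher orders are needed:

forvalues p = 2/4 {
    gen y_h_`p' = y_hat^`p'
}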

Alternatively, you can skip generating the powers and apply them directly using the c. and # factor-variable operators, as in the following code:

use https://www.stata-press.com/data/r16/nlswork

xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, fe cluster(idcode)

predict y_hat,xbu

xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south c.y_hat#c.y_hat c.y_hat#c.y_hat#c.y_hat c.y_hat#c.y_hat#c.y_hat#c.y_hat, fe cluster(idcode)

test c.y_hat#c.y_hat c.y_hat#c.y_hat#c.y_hat c.y_hat#c.y_hat#c.y_hat#c.y_hat

At the end of the procedure you will have this result.

The null hypothesis is that the model is correctly specified and has no omitted variables; in this case, however, we reject the null hypothesis at the 5% level of significance, meaning that our model has omitted variables.

As an alternative, you can use the user-written package "resetxt" developed by Emad Abd & Sahra Khaleel (2015), which offers some additional features but is also more restricted in other respects; it can be used after installing it with:

ssc install resetxt, replace

This package, however, doesn't work with factor variables or time-series operators, so we cannot include, for example, the c., i., d., or L. operators.

clear all

use https://www.stata-press.com/data/r16/nlswork

gen age_sq=age*age

gen ttl_sq=ttl_exp*ttl_exp

gen tenure_sq= tenure* tenure

xtreg ln_w grade age age_sq ttl_exp ttl_sq tenure tenure_sq race not_smsa south, fe cluster(idcode)

resetxt ln_w grade age age_sq ttl_exp ttl_sq tenure tenure_sq race not_smsa south, model(xtfe) id(idcode) it(year)

However, the above code can be demanding to compute in Stata, depending on how much memory you have available. That is why this post implemented the manual procedure of the Ramsey test for the panel data structure.

Bibliography

Emad Abd, S. E., & Sahra Khaleel, A. M. (2015). RESETXT: Stata Module to Compute Panel Data REgression Specification Error Tests (RESET). Obtained from: Statistical Software Components S458101: https://ideas.repec.org/c/boc/bocode/s458101.html

Ramsey, J. B. (1969). Tests for specification errors in classical linear least-squares regression analysis. Journal of the Royal Statistical Society Series B 31, 350–371.

Santos Silva, J. (2016). Reset test after xtreg & xi:reg . Obtained from: The Stata Forum: https://www.statalist.org/forums/forum/general-stata-discussion/general/1327362-reset-test-after-xtreg-xi-reg?fbclid=IwAR1vdUDn592W6rhsVdyqN2vqFKQgaYvGvJb0L2idZlG8wOYsr-eb8JFRsiA


How to Design a Novel Replicative Study?

“A replication study attempts to validate the findings of a published piece of research. By doing so, that prior research is confirmed as being both accurate and broadly applicable”

A replication process generally consists of two parts. The first part is concerned with reproducing key findings from the original study. If this step is successful, the next part is performing robustness checks. Meta-analysis reveals another side of replicating published research: meta-based studies survey the empirical results of a group of published papers, attempting to assess three key dimensions: statistical power, selective reporting bias, and between-study heterogeneity.

From the perspective of contributing to scientific research, replication studies are important for the continued progress of science. Given the relative scarcity of replication studies, and in recognition of their importance, editors of A-class journals (American Economic Review, Journal of Political Economy, Review of Economic Studies, Journal of Applied Econometrics) have been paying increasing attention to publishing replication studies.

The one-day intensive online workshop on 29 June 2020, “Econometric Replication: Methods & Guidelines for Designing a Replicated Study”, will teach you, theoretically and practically, how to design a novel replication study.

Learn about the workshop and moderator at https://www.ms-researchhub.com/home/events/workshops/econometric-replication.html

References:

http://www.economics-ejournal.org/special-areas/replications-1

https://www.deakin.edu.au/__data/assets/pdf_file/0007/1198456/WhatMeta-AnalysesReveal_WP.pdf

https://link.springer.com/article/10.1007/s11301-018-0149-3


The holy grail in econometrics.

Last month, while going through the literature on military expenditure and economic growth, I found a small statement in an article that points to one of the less-discussed issues in econometrics:

“The Holy Grail of applied econometrics is a tight theoretical model, which fits the data well. Like the Holy Grail, such models are hard to find.” (Dunne, Smith, & Willenbockel, 2005)

When one, as a researcher, meditates on this, one realizes that matching theoretical models with regression equations is indeed hard. Even though econometrics can be defined as the measurement and validation branch of economics, the relationships we set out to study are not exactly as clean as the theory states.

Let me give an example. Consider the conclusions of the Solow-Swan (1956) model with technology, which can be compiled in the following steady-state equation for per capita output:

ln(Y/L) = ln(A) + (α/(1-α))·ln(s) - (α/(1-α))·ln(n + x + δ) + ε

Where Y/L is the gross domestic product (GDP) of the economy measured in per capita units, A is the level of technology, α is the elasticity of output with respect to the aggregate stock of capital, s is an exogenous saving rate, δ is the depreciation rate, x is the growth rate of technology, and n is the growth rate of the population.

The term ε is added as the stochastic error in the equation in order to carry out the regression analysis; theoretically it is defined as independent of the regressors and represents external shocks to per capita output. However, if this does not hold in the time-series context, the error term may be absorbing all the variables not included in the regression, thereby violating the exogeneity assumption and inducing omitted-variable bias and misspecification.

Basically, the model tells us that the economy's output is driven positively by technology and by the saving rate invested in physical capital.

Now, the augmented Solow model proposed by Mankiw, Romer & Weil (1992), also known as the MRW model, concludes the following:

ln(Y/L) = ln(A) + (α/(1-α-β))·ln(s_k) + (β/(1-α-β))·ln(s_h) - ((α+β)/(1-α-β))·ln(n + x + δ) + ε

Here we have some new terms: β is the elasticity of output with respect to the aggregate stock of human capital in the production function, and the saving rate is split into s_k, the saving rate devoted to the accumulation of physical capital, and s_h, the saving rate devoted to the accumulation of human capital.

The Augmented Model proposed by Mankiw, Romer & Weil has more variables in the specification of the growth of the economy.

Which one is correct? The answer relies on the regressions performed with both models; in general, the augmented model explains economic growth and the convergence of economies better than the simple Solow-Swan model.

The simple Solow-Swan model has a specification problem and an omitted-variable problem; the augmented model corrects this by introducing the measurement and importance of human capital accumulation. Both are theoretical constructions, but the augmented model fits reality better than the original model.

Going further, one could ask whether it would be wrong to consider all variables as endogenous. In the last two models we have seen that the saving rates for physical or human capital are exogenous, along with the growth rate of technology, but other theoretical frameworks, like the Ramsey (1928) model, determine savings endogenously; even depreciation and technology can be endogenized, in which case estimating the above equation with two-stage or three-stage least squares would be the better approach.

Considering this set of ideas, econometricians have to face a difficult situation whenever the theoretical approach is not suitable for the reality of the sample. I say this because this is a complex world, where a single explanation for every situation is not plausible.

We also need to remember that the whole objective of theory is to explain reality, and if a theory fails in this objective, even the most logical explanation is useless. What makes no sense at all is to modify reality to match the theory.

The holy grail, then, would be the adequacy of theory to reality; in econometrics, this means finding a strong theoretical framework that matches our data generating process, while keeping the validation techniques logically consistent with the assumptions of the regression.

Going back a step, before theory and empirical methods, we are interested in finding the truth, and this goes from discovering existing (or non-existing) relationships and causality in order to explain reality. Such findings, even when they start from a deviated or wrong approach, are useful for building knowledge.

A great example of this is the Phillips curve (Phillips, 1958). It started as an empirical regularity relating wage inflation and unemployment, and it was later studied in greater depth by Phelps (1967) and Friedman (1977), who brought in more theoretical concepts, such as expectations about inflation.

Econometricians should therefore do research with sound economic logic when estimating relationships, while being aware that samples and individuals are not the same across space and time (they change according to location and period). The theoretical framework is the main basis we must always consider during economic research, but remember that we can also propose a new theoretical framework to explain reality on the basis of facts and past theories.

Bibliography

Dunne, J., Smith, R. P., & Willenbockel, D. (2005). MODELS OF MILITARY EXPENDITURE AND GROWTH: A CRITICAL REVIEW. Defence and Peace Economics, Volume 16, 2005 – Issue 6, 449-461.

Friedman. (1977). Nobel Lecture Inflation and Unemployment. Journal of Political Economy, Vol. 85, No. 3 (Jun., 1977), 451-472.

Kwat, N. (2018). The Circular Flow of Economic Activity. Retrieved from Economics Discussion: http://www.economicsdiscussion.net/circular-flow/the-circular-flow-of-economic-activity/18159

Mankiw, N. G., Romer, D., & Weil, D. N. (1992). A Contribution to the Empirics of Economic Growth. Quarterly Journal of Economics, 407-440.

Marmolejo, I. (2012). Indifference Curve Confusion and Possible Critique. Retrieved from Radical Subjectivist: https://radicalsubjectivist.wordpress.com/2012/02/10/indifference-curve-confusion-and-possible-critique/

Nicholson, W. (2002). Microeconomic Theory. México D.F.: Thompson Learning.

Phelps, E. (1967). Phillips Curves, Expectations of Inflation and Optimal Unemployment over Time . Economica, New Series, Vol. 34, No. 135 (Aug., 1967), 254-281.

Phillips, A. W. (1958). The Relation between Unemployment and the Rate of Change of Money Wage Rates in the United Kingdom, 1861-1957. Economica, New Series, Vol. 25, No. 100. (Nov., 1958),, 283-299.

Ramsey, F. P. (1928). A mathematical theory of saving. Economic Journal, vol. 38, no. 152,, 543–559.

Solow, R. (1956). A Contribution to the Theory of Economic Growth. The Quarterly Journal of Economics, Vol. 70, No. 1 (Feb., 1956),, 65-94.


The budget constraints in the microeconomic approach

Following the last post, which gave an example of modelling the Cobb-Douglas utility function in a microeconometric setting, we need to address an important aspect of consumer behavior: the budget constraint (a linear monetary constraint), which limits the quantities of goods the consumer can buy and use to reach a certain level of utility.

In this article, I want to start with an introduction to the basic concept of the budget constraint related to the consumer's income in microeconomics: the linear constraint, given a set of quantities and prices of goods, that limits the utility the consumer can attain. This is closely related to the Cobb-Douglas utility function (and to utility functions in general), since it is one of the main building blocks of microeconomic theory.

Keeping the utility function as the traditional Cobb-Douglas function:

U(X, Y) = X^α · Y^β

We know that utility is sensitive to the elasticities α and β, with α and β less than or equal to one. And since resources are not infinite, the quantity of goods the consumer can pay for is not infinite either. In markets, the only way to obtain goods and services is with money, and according to the circular flow of the economy, the factor market pays two special productive factors, labor and capital, so we can say that consumers earn a level of income derived from their productive activities.

The Circular Flow of the economy. Source: Kwat (2018)

In microeconomic theory, utility U is maximized subject to the consumer's income through a linear constraint containing the goods consumed and their prices. The budget constraint for the two-good model looks as follows:

I = Px·X + Py·Y

Where I is the income of the individual, Px is the price of good X and Py is the price of good Y. One might wonder why the income of the consumer is the sum of prices times quantities, which at first glance doesn't seem to match what the circular flow states. Income could be defined as the sum of the salary and the overall returns of the consumer's productive activities (like returns on assets), and there is no such thing in the budget equation.

However, if you read the equation as a summary of all spending on goods (assuming the consumer spends everything), it will exactly match everything earned from productive activities.

The maximization problem of the consumer is then established as:

max U(X, Y) = X^α · Y^β   subject to   I = Px·X + Py·Y

The typical solution uses the Lagrange multiplier method, where the Lagrangian function can be stated as:

L = X^α · Y^β + λ·(I - Px·X - Py·Y)

A useful trick to remember how to write this function is that if λ is positive, then income enters with a positive sign and the prices times quantities enter with a negative sign (we are moving everything to one side of the constraint equation). The first-order conditions are given by:

∂L/∂X = α·X^(α-1)·Y^β - λ·Px = 0
∂L/∂Y = β·X^α·Y^(β-1) - λ·Py = 0
∂L/∂λ = I - Px·X - Py·Y = 0

By dividing the first two conditions you get the optimality condition of the consumer's problem:

(α·Y)/(β·X) = Px/Py

Each good is therefore primarily sensitive to its own price and its weight (elasticity) in the utility function, and secondarily to the price and quantity of the other good. Solving this condition for X and substituting into the last first-order condition (the budget constraint), we get:

I = Px·(α·Py·Y)/(β·Px) + Py·Y = (α/β)·Py·Y + Py·Y

Taking Py·Y as a common factor results in:

Y = (β·I) / (Py·(α + β))

The quantity of good Y is thus the income times the elasticity β, divided by the price of good Y times the sum of the elasticities. Using the assumption that α + β = 1, so that β = 1 - α, the optimal quantities of the goods can now be written as:

X* = α·I/Px     Y* = (1 - α)·I/Py
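As a quick numerical check (my own example, not taken from the original post): with α = 0.6, I = 100, Px = 10 and Py = 5, the optimal bundle is X* = 0.6·100/10 = 6 and Y* = 0.4·100/5 = 8, which exactly exhausts the budget, since 10·6 + 5·8 = 100.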

This optimum is displayed graphically below; it represents the point where utility is maximized given a certain level of income and a set of prices for the two goods. If you want to go deeper into this analysis, please refer to Nicholson (2002).

The tangent point of the maximization process related to the income constraint for the case of two goods. Source: Own Elaboration.

The budget constraint: An econometric appreciation

Suppose we have a sample of n individuals who consume a finite number of goods. The income of each individual is known, and so are the quantities of each good. How could we estimate the average price of each good? We start by assuming that income is related to prices and quantities through the following expression:

I_i = P_1·X_1i + P_2·X_2i + ... + P_m·X_mi

Where X_1 is good number one, associated with price P_1, and so on: income is the sum of all quantities multiplied by their prices, or simply the sum of all expenses. That is the demand-based view of income. In this case there are m goods consumed.

Now assume we replace each price with a coefficient to be estimated:

I_i = β_1·X_1i + β_2·X_2i + ... + β_m·X_mi + u_i

Looks familiar, doesn't it? It is a regression structure, so in principle we can estimate each price with ordinary least squares, interpreting the estimated β-coefficients associated with each good as the prices, and treating income as the other side of the coin of the spending process.

The simulation exercises

Assume a process which relates the following variables (interpret it as the Data Generating Process):

I_i = Px·X_i + Py·Y_i + Pz·Z_i + s_i

Where I is total income; Px, Py, Pz are the given prices of goods X, Y and Z; and s is a certain amount of savings, all for individual i. According to this DGP, the population not only spends its income on the goods X, Y and Z but also deposits an amount s in savings. The prices used in the Monte Carlo approach are Px = 10, Py = 15 and Pz = 20.
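The original post reports only the regression output, so here is a minimal Stata sketch of how such a simulation could be set up; the sample size, the distributions of the quantities, the noise term, and the savings equation are my own assumptions (savings are made to depend on the quantity of X so that omitting s visibly biases the price estimates):

clear all
set obs 500
set seed 1234
* hypothetical quantities demanded of each good
gen qx = runiform(0,10)
gen qy = runiform(0,10)
gen qz = runiform(0,10)
* savings correlated with qx, plus some noise
gen s = 5 + 2*qx + runiform(0,5)
* income as total spending: prices Px=10, Py=15, Pz=20, plus savings and a small disturbance
gen income = 10*qx + 15*qy + 20*qz + s + rnormal(0,1)
* omitting savings: the coefficient on qx is biased away from 10
regress income qx qy qz
* including savings: the coefficients recover the prices (10, 15, 20) and a coefficient close to 1 on s
regress income qx qy qz s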

If we regress income on the demanded quantities of each good, we obtain the following:

The coefficients don't match our DGP because the model suffers from omitted-variable bias: we are not taking into account that income is not only the sum of expenses on goods but is also partly allocated to savings. Regressing the expression including the s variable, we have:

The coefficients for the prices of the goods (X, Y, Z) now match our DGP almost exactly, the R-squared increases substantially from 51.45% to 99.98%, and the overall variance of the model is reduced. An interesting thing to note is that each additional monetary unit of savings is associated with an increase of about one unit in income (the coefficient on s is close to 1).

Remember that this is not an exercise in causality; it is an exercise in correlation. We are just using the information on the goods consumed by the individuals in our sample to estimate the average price of each good. If the model is misspecified, this approach cannot be performed.

This is one way to estimate the prices consumers pay for each good; however, keep in mind the underlying assumptions: 1) prices are given and do not vary across individuals; 2) the quantities of X, Y and Z and the amount of savings must be known for each individual, and total spending (including money deposited in savings) must be assumed equal to income; 3) each individual's spending must be assumed to be distributed among the goods and the other variables included in the regression; otherwise, omitted-variable bias will affect the estimators of the goods we are analyzing.

References

Kwat, N. (2018). The Circular Flow of Economic Activity. Economics Discussion. Retrieved from: http://www.economicsdiscussion.net/circular-flow/the-circular-flow-of-economic-activity/18159

Marmolejo, I. (2012). Indifference Curve Confusion and Possible Critique. Radical Subjectivist. Retrieved from: https://radicalsubjectivist.wordpress.com/2012/02/10/indifference-curve-confusion-and-possible-critique/

Nicholson, W. (2002). Microeconomic Theory. México D.F.: Thompson Learning.
