New evidence suggests that heterogeneous effects arise from both of these situations with regard to the behavioral response of Welfare across continental regions. (A working paper titled “

A general pattern arises when the Gini coefficient increases significantly (in particular, above 40): the Sen’s Welfare Index tends to fall. The slope of this relationship is not uniform, however; it is steeper for Sub-Saharan Africa than for Latin America and the Caribbean.

Preliminary results show that the effect of inequality on Welfare is not statistically significant for every continent, and Latin America tends to be a special case in this relationship.

Published results excluding Venezuela, from the article by Riveros-Gavilanes (2020), indicate that for this part of the American continent the long-run effect of inequality is much larger than that of economic growth. Important discrepancies may exist, however, in the short-run dynamics. For this case study, the short-run evidence only shows that economic growth tends to alter the growth of Welfare, whereas in the long run improvements in equality tend to increase Welfare. The empirical strategy used the Human Development Index as the measure of Welfare across the Latin American countries; income per capita was measured by real GDP per capita, and inequality by the complement of the Gini coefficient. This complement serves as the measure of “higher equality” when the regressions are estimated.

The relationship between Welfare, inequality, and economic growth is a topic that requires further research; see the working paper in the repository (https://ms-researchhub.com/home/research/msrworkingpapers.html), which provides a better overview of the regional heterogeneous effects.

Seeking equality is a common objective of states and governments, but in recent times the task has become more complex as evasion and bending of the rules have become more frequent. Inequality then comes at a cost to the Welfare of the majority of the population, since the effect is driven by the proportions of the income distribution: the greater the number of individuals affected, the larger the harm to Welfare.

In fact, micro-data research could be particularly important, from both theoretical and empirical standpoints, for understanding the harm to Welfare, if we could somehow present a “sufficient statistic” of Welfare, or at least a way to measure individual welfare. In the meantime, the theoretical discussion provides solid foundations for what may occur as income inequality rises in the world, particularly after the COVID-19 pandemic and the Russian-Ukrainian conflict.

References

Riveros-Gavilanes, J. M. (2020). Estimación de la función de bienestar social de Amartya Sen para América Latina. Ensayos de Economía, 31(59), 13-40. https://revistas.unal.edu.co/index.php/ede/article/view/88235/82113

Authors of the two winning posts will receive 100 USD and 50 USD respectively, in addition to public exposure of their blogs/profiles and promotion of their work across wide academic and social networks.

We invite students, researchers, public officials, and junior and senior academics to submit their work to John Mandor at info@ms-researchhub.com using the subject line “MSR economic perspectives contest and author names”. Posts can fall under any of the following categories:

1- Technical/methodology related
2- Summary of an innovative research
3- Commentary/analytical/opinion post on global events and local or international economic challenges

The deadline to submit posts is 30 April 2022.

Contest participants should mind the following terms:

1- There is no limit on the number of authors per post. If the post wins the competition, the prize will be distributed evenly among all authors.
2- Posts should be between 1000 and 2000 words.
3- By submitting, authors declare that they have no conflict of interest and that they hold the copyright to publish their work.
4- Non-English posts should be submitted along with a concise and sound English translation.
5- Shortlisted posts will be published under their authors’ names at MSR economic perspectives at https://lnkd.in/e6thRiNb
6- Deadline to send work: 30 April 2022; shortlisting decision: mid-May 2022; results of final selection and public voting: end of June 2022.

The key to understanding the endogeneity of a co-regressor, away from the classic perspective, lies in the potential covariance between the regressors X in a regression model, which may differ from 0. This produces the topic called “M-Bias”: essentially, a situation in which, if one regressor moves, the other regressor moves too (slightly different from near-multicollinearity, which is a linear relationship rather than actual changes!).

Let’s start with a linear model as always:
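(The display that belongs here was lost in conversion; a standard reconstruction consistent with the surrounding notation is:)

```
Y = X_1 \beta_1 + X_2 \beta_2 + u
```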

In this setup, we have two regressors written in matrix notation (to simplify the least-squares calculus), and we may derive the least-squares estimator as:
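(Reconstructed display, assuming the usual least-squares algebra:)

```
\hat{\beta} = (X'X)^{-1} X'Y
```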

If Cov(X1,X2)= 0 holds, the least-squares estimator for B1 will be unbiased!

Thus, we are interested in the case where Cov(X1,X2) ≠ 0. A clear implication of the last statement is that when X2 changes, X1 changes as well. This biases the estimates of B1: it creates an “amplification bias effect”, because the causal channel is no longer clear when we try to isolate the true effect of X1 on Y.

Notice that this is not derived from an existing linear relationship such as we might expect from near-multicollinearity. We may define X1 = a + bX2 + u, and this does not necessarily mean that Cov(X1,X2) ≠ 0, since a linear relationship may exist between two phenomena that do not cause each other; X1 must also be potentially endogenous for the estimates to be biased. Hence, the difference between correlation and causation is important here.

Graphically this means that:

As you can note, X2 affects X1, and X1 affects Y; there may also be unobserved effects/variables U affecting both X1 and Y. This is the case of the endogenous co-regressor: X2 affects the change in X1 and thus alters the causal channel, amplifying the bias if X2 is included in the regression. (This is the case of Model 10 from Cinelli et al., 2021.)

At this point, you may think: “What is this guy talking about? Doesn’t adding covariates produce a more robust result?” That can be true, but you must be careful to include only sensible covariates and not potentially endogenous co-regressors!

To demonstrate the effect of this endogenous co-regressor bias amplification, I will follow Cinelli’s example in R (Cinelli, 2020):

Let’s start by setting the number of observations in R:

`n <- 1e5`

Now let’s create some random disturbances.

`u <- rnorm(n)`

And now let’s create our regressor x2 with also a random behavior:

`x2 <- rnorm(n)`

Let’s create a data generating process (DGP) where we know that Cov(x, x2) differs from 0; thus, if x2 changes, it automatically changes x, making x endogenous:

`x <- 2*x2 + u + rnorm(n)`

And let’s create the DGP for the dependent variable.

`y <- x + u + rnorm(n)`

Notice that y is a linear process that depends on x and u, plus a randomly distributed disturbance. Hence, the component u (as a disturbance) affects both x and y. If we regress y on x, we will get a biased estimator.

`lm(y ~ x)`

```
#> Call:
#> lm(formula = y ~ x)
#>
#> Coefficients:
#> (Intercept) x
#> 0.00338 1.16838
```

And one may think, well, we may improve the estimates if we include x2, but look again!

`lm(y ~ x + x2) # even more biased`

```
#> Call:
#> lm(formula = y ~ x + x2)
#>
#> Coefficients:
#> (Intercept) x x2
#> 0.002855 1.495812 -0.985012
```

The coefficient for x has been greatly amplified in the point estimates by the bias-amplification effect!

The conclusion is that you should double-check before adding controls to your model: be sure to add only sensible controls, otherwise you will bias the estimates of your regression model!

References:

Cinelli, C. (2020). Bad Controls and Omitted Variables. Taken from: https://stats.stackexchange.com/questions/196686/bad-controls-and-omitted-variables

Cinelli, C.; Forney, A.; Pearl, J. (2021). A Crash Course in Good and Bad Controls. Taken from: https://www.researchgate.net/publication/340082755_A_Crash_Course_in_Good_and_Bad_Controls

Pearl, J. (2009a). Causality. Cambridge University Press.

Shrier, I. (2009). Propensity scores. Statistics in Medicine, 28(8):1317–1318.

Pearl, J. (2009c). Myth, confusion, and science in causal analysis. UCLA Cognitive Systems Laboratory, Technical Report (R-348). URL: https://ucla.in/2EihVyD

Sjolander, A. (2009). Propensity scores and m-structures. Statistics in Medicine, 28(9):1416–1420.

Rubin, D. B. (2009). Should observational studies be designed to allow lack of balance in covariate distributions across treatment groups? Statistics in Medicine, 28(9):1420–1423.

Ding, P. and Miratrix, L. W. (2015). To adjust or not to adjust? Sensitivity analysis of m-bias and butterfly-bias. Journal of Causal Inference, 3(1):41–57.

Pearl, J. (2015). Comment on Ding and Miratrix: “To adjust or not to adjust?”. Journal of Causal Inference, 3(1):59–60. URL: https://ucla.in/2PgOWNd

In my most recent publication, co-authored with Jeisson Riveros, the article entitled “

The article builds on the theoretical statements of Oszlak & Kaufman (2014) on the features of Open Government, including its advantages and the innovations arising from the implementation of policies in this area, along with the New Public Management characteristics of Osborne & Gaebler (1994). It empirically reviews how open government policies correlate with corruption and transparency in Colombia in two specific years (2014 and 2016). The methodology involved panel data regression models using fixed and random effects to examine these correlations empirically.

The dependent variable measuring the level of corruption is an index produced by the organization “Transparencia por Colombia”, which belongs to the initiative of Transparency International (well known for constructing the Global Corruption Barometer). The index, called “

The set of independent variables for the first model was chosen from three components of the Open Government Index published by the National Attorney’s Office of Colombia. These components are available for the same years, at the same local level, and are segmented in

Some of the scatterplots of these independent variables against the Transparent Municipal Index for Colombia in the two years of the study reflected a positive correlation with open government policies. Some of the visual results can be seen in the following graphs.

The panel data econometric model involved a linear specification of the Transparent Municipal Index (as the measure of the risk of corruption) explained by the open government components OI, EI, and DI, which led to the following specification:
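(Reconstructed display, inferred from the description of the model; the label ITM for the Transparent Municipal Index is assumed:)

```
\text{ITM}_{it} = \beta_0 + \beta_1 \, \text{OI}_{it} + \beta_2 \, \text{EI}_{it} + \beta_3 \, \text{DI}_{it} + C_i + u_{it}
```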

Where *i* represents the local entity (at the municipal level) and *t* the year of observation. A one-way error component, C_{i}, was included to capture some of the unobserved heterogeneity of the local entities. In the panel data regression outputs, two specifications were included, the second adding a Visibility variable for the public entities. The results were the following:

Of the two models estimated, the inclusion of the Visibility variable greatly improved the original specification. The results showed a significant positive effect of the components Organization of the Information and Exposition of the Information, both relevant at the 5% significance level for explaining the Transparent Municipal Index as the measure of the risk of corruption. Increases in these components raise transparency at the local level and, by the index’s construction, imply a decrease in the risk of corrupt practices. The linear fit of the model (not so important in this context) is also decent across individuals.

**Concluding remarks** are that practices of open government help to reduce the risk of corruption through the theoretical channels by which public management can be viewed, inspected, and controlled by society, providing a better setup for governance in local entities. Further research is required, but this study supports the traditional idea that a good and efficient government is one that has no secrets; such transparency helps to decrease the risk of corruption, and for the sample of this study the two are empirically correlated.

**Reference of the article:**

Riveros Gavilanes, J. M., & Riveros Gavilanes, J. A. (2021). Implicaciones e Incidencias de las Políticas de Gobierno Abierto. In Retos 2020: Gobierno Abierto. Instituto de Altos Estudios Nacionales IAEN (Ecuador). Retrieved from: https://www.researchgate.net/publication/356507453_Implicaciones_e_incidencias_de_las_politicas_de_gobierno_abierto_el_caso_colombiano_2014_y_2016

**General References**

Corporación Transparencia por Colombia CTC. Resultados 2015-2016. Recuperado de https://indicedetransparencia.org.co/2015-2016/ITM/Alcaldias

Oszlak, O., & Kaufman, E. (2014). Teoría y práctica del gobierno abierto: Lecciones de la experiencia internacional. Buenos Aires: OEA. Recuperado de https://redinpae.org/recursos/kaufman-oszlak.pdf

Osborne, D., y Gaebler, T. (1994). Un nuevo modelo de gobierno: cómo transforma el espíritu empresarial al sector público. Ciudad de México: Gernika.

Procuraduría General de la Nación PGN. Índice de Gobierno Abierto. Recuperado de https://www.procuraduria.gov.co/portal/Indice-de-Gobierno-Abierto.page

We can take the natural logarithm
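(Reconstructed display; the original equation image was lost in conversion:)

```
\ln P_t = \ln P_{t-1} + \varepsilon_t
```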

to show that the natural logarithm of asset prices follows a *random walk* – the best forecast for prices is simply the current price. As such, applying regression methods from basic ARIMA models to advanced neural networks will fail – the models will simply repeat the last observation in the training data.

Instead, we can successfully predict asset prices by assuming their *returns* follow *Geometric Brownian Motion* (GBM):
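(Reconstructed display of the standard GBM equation, consistent with the description below:)

```
dS_t = \mu S_t \, dt + \sigma S_t \, dB_t
```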

Here, the change in returns is given by the expected value plus volatility, both multiplied by the last observed price. For the log of returns, and using Ito’s Lemma, one can write the solution to this differential equation as
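(Reconstructed display of the standard GBM solution, matching the drift term used in the code:)

```
S_t = S_0 \exp\left( \left( \mu - \frac{\sigma^2}{2} \right) t + \sigma B_t \right)
```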

where *B_t* represents a Brownian motion process. The above formula is how we will forecast liquid asset prices in this article. For models of other asset types (i.e., illiquid assets), one may simply substitute the appropriate process into Ito’s Lemma and derive a new formula for forecasting.

We first import our packages:

```
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy
import pmdarima
from fitter import Fitter
```

For today, we forecast Bitcoin using data from August 01, 2020 to November 15, 2021. Our data comes from Yahoo Finance.

```
liquid = pd.read_csv("/path/to/BTC-USD.csv")
liquid_returns = np.log(liquid.Close) - np.log(liquid.Close.shift(1))
```

We split both our returns and prices data into training and testing sets:

```
train, test = pmdarima.model_selection.train_test_split(liquid.Close.dropna(), train_size = 0.8)
training, testing = pmdarima.model_selection.train_test_split(liquid_returns.dropna(), train_size = 0.8)
```

Now, we obtain the distribution of our returns. Note that it is a common, and erroneous, practice to assume that returns follow a normal distribution when forecasting. This practice can yield disastrous results; one needs proper knowledge of the distribution to forecast properly.

```
f = Fitter(training, timeout = 120)
f.fit()
f.summary()
```

Using BIC as our criterion, we get the Laplace distribution as our best distribution.

`f.get_best(method = "bic")`

We now write our main function for performing the Monte Carlo simulation. This method uses random numbers to repeatedly sample future paths: in our case, we sample random numbers from a Laplace distribution and multiply them by our volatility to obtain the diffusion term.

```
def GBMsimulatorUniVar(So, mu, sigma, T, N):
    # Simulate N GBM paths of length T with Laplace-distributed shocks
    S = np.zeros([T + 1, int(N)])
    S[0, :] = So
    for t in range(1, int(T) + 1):
        for i in range(0, int(N)):
            drift = (mu - 0.5 * sigma**2)
            Z = scipy.stats.laplace.rvs()
            diffusion = sigma * Z
            S[t][i] = S[t - 1][i] * np.exp(drift + diffusion)
    return S[1:]
```

Here, we forecast our prices with 1000 simulations for the length of our testing data. We use the average of simulations as our optimal forecast.

```
prices = GBMsimulatorUniVar(So = liquid.Close.iloc[len(training)], mu = training.mean(), sigma = training.std(), T = len(test), N = 1000)
newpreds = pd.DataFrame(prices).mean(axis = 1)
```

Taking the mean absolute percentage error (MAPE), we find around 6.8% forecasting error.

```
from sklearn.metrics import mean_absolute_percentage_error as mape
mape(test.dropna(), newpreds)  # y_true first, then y_pred
```

We now plot our forecast against the real test values.

```
axis = np.arange(len(train) + len(test))
plt.plot(axis[:len(train)], train, c = "blue")
plt.plot(axis[len(train):], test, c = "blue")
plt.plot(axis[len(train):], np.array(newpreds), c = "green")
```

As one can see, we have relatively good results.

One should note that other assets may have different distributions. For instance, here are distribution fit results for Ethereum:

```
> f.get_best(method = "bic")
{'gennorm': {'beta': 1.126689300086524,
'loc': 0.007308884923027554,
'scale': 0.047110827282059724}}
```

As a rule of thumb, the distribution parameters returned by the fit need to be multiplied by 2.5 when sampling random numbers to obtain good forecast results. One must also use common sense in deciding which proposed distribution to use: those such as the Gumbel or Logistic (which arise in extreme-value and categorical-response modeling, respectively) are wholly unsuitable for stock price data.

```
def GBMsimulatorUniVar(So, mu, sigma, T, N):
    # Same simulator, but with generalized-normal shocks (beta scaled by 2.5)
    S = np.zeros([T + 1, int(N)])
    S[0, :] = So
    for t in range(1, int(T) + 1):
        for i in range(0, int(N)):
            drift = (mu - 0.5 * sigma**2)
            Z = scipy.stats.gennorm.rvs(beta = 1.126689300086524*2.5)
            diffusion = sigma * Z
            S[t][i] = S[t - 1][i] * np.exp(drift + diffusion)
    return S[1:]
```

This forecast obtains around 8.6% forecasting error with Ethereum.

While some asset prices may follow random walks, using the proper tools to model them gives good forecasting results and accuracy. However, even with the best tools and distributions, no forecast will ever be great if a structural break exists in the data. Both our Ethereum and Bitcoin samples started and ended during the COVID-19 pandemic; mixing pre- and post-pandemic data is always an ill-advised move.

Why do people so easily spend money showing off a new mobile, tablet, shirt, or wristwatch, while they tend to be reluctant or uncertain when paying for a training, a course, or even a book that could pay off and change their lives?

Why do you waste money decorating what others see “on” you, yet rarely or never invest in what you see “in” yourself?

86 billion brain cells control what you do, say, think, feel, and learn. They control your job, investments, and savings. They control your ability to make the right decisions at the right time. They control how you live. The brain is like any muscle: if not trained, it will shrink.

If you think that what you learned in high school or university is enough to rest safely on, you are mistaken. Even graduates of top world universities need to keep building their knowledge to stay up to date with recent scientific discoveries and market trends in their fields.

Do not be fooled by the consumerism era; they fool you so that they keep getting richer and more successful. Stand up, think rationally, and invest in your real property: your brain and knowledge.

Anyway, I liked this topic because I felt I could help in one way or another, and that is why I am writing this now.

Among the things that give you tremendous strength while living abroad is something called God’s companionship. What does that mean?

Honestly, I never knew the meaning of this phrase in the Arabic language, but I have known it in my heart for years. Every day we see things and live through new situations: looking for a job, looking for another apartment to rent, registering our car with an insurance company. You cannot imagine what it means for God to choose everything in your life for you, and for it to turn out, after all the studying and research, to truly be the best thing. That is what God’s companionship really means.

It is as if you had entrusted different people (and to God belongs the highest example), each of whom understands different things, and you can consult each one so they advise you on what to do and what to choose so that it turns out best for you, because each is a specialist in that matter. That is what I mean: you say “O Lord, You are with me and I put my trust in You”, and abroad this is truly your refuge, because you are literally on your own, with no one to go to for support or help, unless, again, God sends someone to stand by you.

So your journey of trusting God begins. Every time you move a little further in your life and look back to evaluate what you did and lived through, you find that it was the best thing for you and the best way to reach where you are now, even if you went through things that exhausted you, because God’s companionship surrounded you the whole time. Let me give you an example. When I first arrived in Germany, I could not speak a single word of German, or two words, to be fair:

Entschuldigung und Danke

meaning “sorry” and “thank you”. After a month spent at home, I started a German course and prayed to God to help me learn this language. Thank God, after three months I was speaking well and could look for work, and I actually found a trainee position in a pharmacy. The manager of that pharmacy was the type of German whose legendary “love” for Arabs you hear about.

Almost every day she would invent a new issue to push the owner of the pharmacy, who genuinely liked and respected me, to quarrel with me so that I would leave. My work in that pharmacy at that time was close to torture, and the sweet moments were when that manager took a vacation and flew far away from me. The strange thing is that when I took parental leave and came back months later, that same manager got up and hugged me the moment I arrived, and no German would normally do that, or hug you at all; the height of emotion, truly. So the manager was,

deep deep inside,

fond of me; it was just rivalry out of fear for her position. Anyway, after my contract ended and I left that pharmacy, I finished my degree-equivalency process and began working in many other pharmacies, roughly a different pharmacy every year. I discovered that this stage was the most important station in my life, and I had to pass through it to learn the language really well, because the manager would not let me work alone and made me stand with another colleague to give me the feeling that I was unqualified. Honestly, that was the most important cornerstone: from it I learned the language well, with the right accent, and my self-confidence grew enormously, even though the manager’s goal was to destroy that confidence in the first place. I also learned the pharmacy profession very well, because the colleague I stood with was on the verge of retirement and had very long experience, and not just one colleague; I stood with the most experienced people in the pharmacy and gathered all their experience. So what the manager did, thinking it would harm me, was the thing that benefited me most, helping me pass the equivalency and excel both linguistically and professionally.

This is exactly what I mean by divine companionship being with you: the thing in front of you that you think is being used to harm you turns out to be the thing that benefits you most, and that is because you said “I put my trust in You, O Lord”. I want to tell you that at the time I did not thank God for being in that pharmacy the way I thank Him now, because now I have looked back and seen how well organized that pharmacy was and how much experience the people working there had in dispensing medication.

This is one situation in my life among many, all of which confirm the meaning of God’s care when it is with you.

We shall be discussing omitted variable bias. This is the bias that occurs when

- the regressor X is correlated with an omitted variable Z.
- omitted variable Z is a determinant of the dependent variable Y.

Both of these conditions result in the violation of the Gauss-Markov assumption of ordinary least squares regression

which states that the error term is uncorrelated with the regressors; u denotes the error term and X the regressors. Simply put, this bias occurs when an econometric model leaves out one or more relevant variables, and it results from the model attributing the effects of the missing variable(s) to the variables that are included.
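In symbols (reconstructed from context), the assumption is:

```
E(u \mid X) = 0
```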

One simple example of this bias takes Salary as the dependent variable, Education as the regressor, and Ability as the omitted variable. Here, Salary is the annual salary of the individuals in our sample; Education can be years of education, test scores, or any other measure of education; and Ability is some variable signifying talent, skill, or proficiency in general. We can also think of Ability as unmeasurable.

Let us think about the bias induced when we omit the Ability variable as a regressor, either by mistake or because we cannot measure it, while the true data generating process includes Ability and is
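(Reconstructed display of the long regression, with generic coefficient names:)

```
\text{Salary} = \beta_0 + \beta_1 \, \text{Education} + \beta_2 \, \text{Ability} + u
```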

In this case, the variable Ability has some impact on both Salary and Education; hence, we can say that Ability is correlated with Education. Its effect on Salary could be captured directly by including it in the regression, but we have no way to quantify Ability, and if we do not include it, its effect on Salary will instead be picked up indirectly through the variable Education.

If we do not use the true data generating process and instead use
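(Reconstructed display of the short regression:)

```
\text{Salary} = \beta_0 + \beta_1 \, \text{Education} + v
```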

then we run into the problem of omitted variable bias. This causes our ordinary least squares estimate of the Education coefficient (denoted by beta_1) to be biased and inconsistent. The bias cannot be removed by increasing the sample size, because omitted variable bias prevents the OLS estimate from converging in probability to the true parameter value. The strength and direction of the bias are determined by the correlation between the error term and the regressor.
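To see the bias concretely, here is a minimal simulation sketch in Python (all variable names and the coefficients 2.0 and 1.5 are illustrative assumptions, not values from any cited study):

```
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical true data generating process: Ability drives both
# Education and Salary, so omitting it biases Education's coefficient.
ability = rng.normal(size=n)
education = ability + rng.normal(size=n)
salary = 2.0 * education + 1.5 * ability + rng.normal(size=n)

# Long regression (includes Ability): recovers the true coefficient, ~2.0.
X_long = np.column_stack([np.ones(n), education, ability])
b_long = np.linalg.lstsq(X_long, salary, rcond=None)[0]

# Short regression (omits Ability): Education absorbs part of Ability's
# effect. Population bias = 1.5 * Cov(Edu, Abil) / Var(Edu) = 0.75.
X_short = np.column_stack([np.ones(n), education])
b_short = np.linalg.lstsq(X_short, salary, rcond=None)[0]

print(b_long[1], b_short[1])
```

No matter how large n grows, the short regression’s estimate converges to about 2.75 rather than 2.0, which is exactly the inconsistency described above.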

Now that we know exactly what the issue of Omitted Variable Bias is, let us consider some solutions.

One answer to this issue is to include more variables in the regression model. The model then uses as independent variables not only those whose effects on the dependent variable are of interest, but also any potential variables that might otherwise cause omitted variable bias. Including these additional variables can reduce the risk of omitted variable bias, but it may also increase the variance of the estimator.

Some general guidelines to follow in this case that help us in our decision to include additional variables are:

- Specify the coefficient of interest.
- Based on your knowledge of the variables and model, identify possible sources of omitted variables bias. This should give you a starting point specification as baseline and a set of regressor variables, sometimes called control variables.
- Use different model specifications and test against your baseline.
- Use tables to provide full disclosure of your results – by presenting different model specifications, you can support your argument and enable readers to see the impact of including other regressors.

If diminishing the bias by including additional variables is not possible, such as in the cases where there are no adequate control variables, then there are still a variety of approaches which can help us solve this problem.

- Making use of panel data methods.
- Making use of instrumental variables regression methods such as Two Stage Least Squares.
- Making use of a randomized control experiment.

These approaches are important to consider because they help us to avoid false inferences of causality due to the presence of another underlying variable, the omitted variable, that influences both the independent and dependent variables.

**Bibliography:**

Wooldridge, Jeffrey M. (2009). “Omitted Variable Bias: The Simple Case”. *Introductory Econometrics: A Modern Approach*. Mason, OH: Cengage Learning. ISBN 9780324660548.

Greene, W. H. (1993). *Econometric Analysis* (2nd ed.). Macmillan.

First, the choice of topics indicates that the researcher treats economic growth as the ultimate objective, with every other variable subservient to growth. To me, it is not sensible to treat growth as the ultimate objective of economic policy; it makes more sense to take humanity as the ultimate objective instead.

For example, you may find many papers with titles similar to ‘Human Capital and Economic Growth’, concluding that human capital improves economic growth and that we should therefore focus on improving human capital. But what is human capital? It consists of measures of the health and education of humanity; it is actually a measure of human well-being and is therefore closer to the ultimate objective. That is, if someone writes ‘there should be growth because it improves public health’, it makes sense; but if someone writes ‘we should focus on health because it will improve growth’, it looks very odd.

Sometimes, growth does not have a very strong relationship with the happiness and well-being of the public. In 2008, the President of France, Nicolas Sarkozy, formed ‘The Commission on the Measurement of Economic Performance and Social Progress’ to revise the measurement of human well-being. The commission included two Nobel laureates, Amartya Sen and Joseph Stiglitz, and another well-known economist, Jean-Paul Fitoussi. The report highlights several flaws of conventional GDP measures when used as measures of well-being. It cites the example of a couple living happily at home: they grow most of their food in a kitchen garden, cook the family’s meals at home, and enjoy reading the newspaper together. None of these activities are marketed, so they do not count toward GDP. In contrast, consider a person who lives in a hostel, eats unhealthy fast food, visits prostitutes, goes to the bar for entertainment and, driving back from the bar drunk, has a serious accident and goes to a mechanic to repair his car. All of these are market activities and count toward GDP. One can easily judge that the couple’s life is much better than that of the lonely young man, but GDP would rank the young man above the couple. So GDP is neither the ultimate objective nor a good measure of happiness and well-being.

The services of women at home are among the most valuable services, as they prepare and bring up the future generations. But these activities are not marketed, so they do not count toward GDP; the same services would be counted if provided in a marketplace. A high rate of economic growth might simply reflect the conversion of home activities into market activities, and may not indicate any improvement in people’s living standards.

Besides the false philosophy of taking GDP as the ultimate objective, many times the research question itself is trivial. For example, you might have seen research papers like ‘Impact of Financial Development on Economic Growth’. But what is financial development? It is usually proxied by profits on financial assets, and these profits are already a part of GDP. GDP includes all the goods and services produced in an economy, and financial assets are part of that economy, so it is useless to ask whether GDP will increase with profits on financial assets: financial development cannot increase without a corresponding increase in overall GDP. The same holds for questions like ‘Energy consumption and economic growth’: energy consumption is a part of GDP, and an increase in energy consumption must increase GDP. No unknown research question is addressed by this kind of research.

The third issue is the methodology used to estimate the models. There are literally dozens of theories of economic growth, and hundreds of variables are candidates for explaining it. In fact, it is no surprise to have so many models of economic growth, because everything produced in an economy ultimately counts toward growth. The number of haircuts at a barber shop counts toward economic growth, which suggests there would be no harm in developing a 'haircut theory of economic growth'. Most probably, a regression of GDP on haircuts would yield a very high correlation with growth.
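This near-mechanical correlation is easy to reproduce in a toy simulation: any series that shares the economy-wide growth trend will "explain" GDP, however trivial it is. All numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(50)

# Haircuts are a tiny component of GDP, but both series share the same growth trend.
haircuts = 1.0 * 1.03 ** t * np.exp(rng.normal(0, 0.02, 50))
rest_of_economy = 1000.0 * 1.03 ** t * np.exp(rng.normal(0, 0.02, 50))
gdp = haircuts + rest_of_economy

# OLS regression of log GDP on log haircuts (with an intercept)
x = np.log(haircuts)
y = np.log(gdp)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
print(f"R^2 of log-GDP on log-haircuts: {r2:.3f}")  # very high, yet economically meaningless
```

The fit is excellent only because both series drift upward together, not because haircuts drive growth.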

But when a variable of interest has so many determinants, estimating a model that leaves any of them out is subject to serious omitted variable bias. Any model based on a single theory is inherently exposed to this bias. You have to take care of all the determinants of Y, even those that are not part of your research question.
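The bias can be demonstrated with a small simulation: omitting a regressor that is correlated with the included one shifts the estimated coefficient by a predictable amount. All parameters below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True model: y = 1.0*x1 + 1.0*x2 + noise, with x1 and x2 correlated.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# The full regression recovers beta1 close to 1.0 ...
X_full = np.column_stack([x1, x2])
b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

# ... but omitting x2 pushes beta1 toward 1.0 + 1.0*0.8 = 1.8
# (bias = beta2 * cov(x1, x2) / var(x1)).
b_short = np.linalg.lstsq(x1[:, None], y, rcond=None)[0]

print(f"beta1 with x2 included: {b_full[0]:.2f}")
print(f"beta1 with x2 omitted:  {b_short[0]:.2f}")
```

The single-theory model does not merely lose precision; it attributes the omitted variable's effect to the included one.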

On the other hand, it is not easy to avoid this omitted variable bias, because there are so many theories and models of growth that a single model encompassing all of their variables is rarely feasible to estimate. This illustrates the inherent difficulty of the task: estimating a growth model with sensible procedures is not an easy job. Only a few people have attempted a growth model in this serious mode, considering all important determinants of growth, and one of them is Sala-i-Martin's study titled 'I Just Ran Two Million Regressions'. The title itself conveys the difficulty of estimating a growth model in a sensible way: you may need to run two million regressions to identify valid determinants of growth.

The same difficulty arises in many other kinds of economic models, such as models of inflation, consumption and others, where many competing theories explain the variable of interest. Yet academic journals keep accepting papers without any regard for these considerations, and people keep lengthening their CVs by adding the titles of such papers.

In fact, having so many models for a variable of interest provides a unique opportunity for novel research, but such research needs more time and effort. A note on selecting appropriate variables coming from different theories can be found in my blogs [1] [2].

Suppose there are three theories for a variable of interest. It is easy to produce a paper based on theory 1 alone, but sensible research should take the variables from theories 1, 2 and 3 simultaneously and come up with a final model.
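One way to sketch this "final model" idea is a general-to-specific search: start from all candidate variables of the three theories and repeatedly drop the least significant regressor until every remaining one is significant. The data, the three "theories" and the |t| > 2 cutoff below are all illustrative assumptions, not a prescribed procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical candidates: columns 0-1 from theory 1, 2-3 from theory 2, 4-5 from theory 3.
X = rng.normal(size=(n, 6))
beta_true = np.array([1.0, 0.0, 0.5, 0.0, 0.0, -0.8])  # only 3 of 6 actually matter
y = X @ beta_true + rng.normal(size=n)

def ols_tstats(X, y):
    """OLS coefficients and their t-statistics."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, b / se

# General-to-specific: start from the unrestricted model and drop the least
# significant regressor until all remaining |t| exceed 2.
keep = list(range(X.shape[1]))
while True:
    b, t = ols_tstats(X[:, keep], y)
    worst = np.argmin(np.abs(t))
    if abs(t[worst]) > 2:
        break
    keep.pop(worst)

print("retained columns:", keep)  # the truly relevant regressors survive
```

In this toy run the search keeps the variables with nonzero true coefficients regardless of which theory contributed them, which is the point of estimating one encompassing model instead of three separate ones.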

In my previous blogs, I have explained how to do research in the presence of multiple theories by constructing the Generalized Unrestricted Model (GUM). However, sometimes it is not possible to construct the generalized model. In the next few blogs, I will explain how we can do research in an area where there are many models and constructing a GUM is not possible. Stay tuned.

Some of the most popular models used in data analysis follow the so-called "black box" approach. In its simplest interpretation, such a model is judged by the inputs it receives and the outputs it delivers in terms of prediction power.

If econometrics aims to estimate population parameters and support causal inference, the black-box approach of data analysis is in a sense the opposite of this concept. Here we care only about responses and predicted responses in order to discriminate across models given a certain amount of data (captured in an observable sample). We contrast predictions with actual values, derive measures of the error, and select the model that best explains the response variable, considering, of course, the tradeoff between the variance and the bias induced.

In an article in the Journal of Economic Perspectives, Mullainathan & Spiess (2017) give a short description of supervised and unsupervised machine learning approaches. The out-of-sample performance of these methods is potentially greater than that of least squares. See the next table, taken from their article:

Source: Mullainathan & Spiess (2017, 90) Note: The dependent variable is the log-dollar house value of owner-occupied units in the 2011 American Housing Survey from 150 covariates including unit characteristics and quality measures. All algorithms are fitted on the same, randomly drawn training sample of 10,000 units and evaluated on the 41,808 remaining held-out units. The numbers in brackets in the hold-out sample column are 95 percent bootstrap confidence intervals for hold-out prediction performance, and represent measurement variation for a fixed prediction function. For this illustration, we do not use sampling weights. Details are provided in the online Appendix at http://e-jep.org.

In this exercise, a training sample and a test sample were used to calculate the "prediction performance" given by R². In econometrics we would call this the goodness of fit of the model, or the percentage of variation in the response explained by the model. It is no secret that as the goodness of fit increases, so does the prediction power (bearing in mind that we will never actually reach an R² of 1 unless there are overfitting issues).
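The caveat about overfitting is easy to show with a toy numpy example: a very flexible model can push the training-sample R² close to 1 while the hold-out R² stays much lower. All data here is simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def r2(y, yhat):
    """Coefficient of determination."""
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# The true relationship is linear; we deliberately overfit with a degree-9 polynomial.
x_train = rng.uniform(-1, 1, 15)
y_train = 2 * x_train + rng.normal(0, 0.5, 15)
x_test = rng.uniform(-1, 1, 1000)
y_test = 2 * x_test + rng.normal(0, 0.5, 1000)

coefs = np.polyfit(x_train, y_train, deg=9)
r2_train = r2(y_train, np.polyval(coefs, x_train))
r2_test = r2(y_test, np.polyval(coefs, x_test))

print(f"train R^2:    {r2_train:.3f}")  # close to 1 -- overfitting
print(f"hold-out R^2: {r2_test:.3f}")   # noticeably lower
```

This is exactly why the table reports hold-out rather than training performance: in-sample fit alone rewards memorizing the noise.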

When you compare the Table 1 results in the "hold-out sample" column, you find that some other approaches can outperform least-squares regression in terms of prediction power. One instance appears in the row corresponding to the LASSO estimates: prediction performance increases relative to least squares, and therefore the LASSO model captures the behavior of the response variable somewhat better (at least for this sample).
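To see why a penalized method can beat OLS out of sample, here is a minimal numpy sketch of LASSO via coordinate descent on simulated sparse data. This is a toy reimplementation under invented parameters, not the authors' code or the exercise behind the table.

```python
import numpy as np

rng = np.random.default_rng(4)
n_train, n_test, p = 60, 1000, 50

# Sparse truth: only 3 of the 50 covariates actually matter.
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
X_tr = rng.normal(size=(n_train, p))
y_tr = X_tr @ beta + rng.normal(size=n_train)
X_te = rng.normal(size=(n_test, p))
y_te = X_te @ beta + rng.normal(size=n_test)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Plain coordinate-descent LASSO (no intercept; columns assumed comparable scale)."""
    b = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r_j = y - X @ b + X[:, j] * b[j]  # partial residual excluding feature j
            rho = X[:, j] @ r_j
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]  # soft-threshold
    return b

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

b_ols = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]
b_lasso = lasso_cd(X_tr, y_tr, lam=30.0)

r2_ols = r2(y_te, X_te @ b_ols)
r2_lasso = r2(y_te, X_te @ b_lasso)
print(f"hold-out R^2, OLS:   {r2_ols:.3f}")
print(f"hold-out R^2, LASSO: {r2_lasso:.3f}")
```

With 50 covariates and only 60 observations, unrestricted OLS fits noise; the penalty trades a little bias for a large variance reduction, which is the bias-variance tradeoff mentioned earlier.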

One should ask at this point what the objective of the analysis is. If we are after statistical inference and the estimation of population parameters, we should stick to the non-black-box approaches; some of these involve traditional LS, GMM or 2SLS, to mention a few. But if we are more interested in prediction power and performance, the black-box approaches will surely come in handy and may sometimes outperform the econometric procedures used to estimate population parameters. The way I see it, the black box, even when its inner details are unknown to us, has the ability to adapt itself to the data (considering, of course, the variety of machine learning methods and algorithms, not just penalized regression).

As the authors express in their article, it is tempting to draw conclusions from these methods as we usually do in econometrics, but first we need to consider some limitations of the black-box approaches: 1) sometimes mere correlation steps in, 2) the production of standard errors becomes harder, 3) some of the methods are inconsistent if we change the initial conditions, and 4) there is a risk of choosing incorrect models, which may induce omitted variable bias.

However, even with the above problems, we can draw some useful connections between the black-box approaches and econometric methods. The advantage of machine learning over the estimation of traditional econometric models may show in the context of large samples, in which the researcher needs to define a set of influential covariates to build or test a theory. It can also be a useful tool for policymakers alongside econometric analysis, since it provides the economist with "a tool to infer actual from reported" values and to proceed with comparisons given the researcher's samples.

We can also correct some of the problems associated with using predictions to estimate population parameters. As the authors point out, consider the case of two-stage least squares: in the first stage we must predict the endogenous regressor from an instrument, and the black-box approach may deliver better predictions to include in the second-stage regression. It should be noted, however, that the instruments selected must be at least reasonably exogenous, because if we leave the black box alone it will simply pick up correlations and may bring up reverse-causality problems.
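The 2SLS logic can be sketched in a few lines of numpy, with a simple linear first stage standing in for whatever prediction method (machine learning or otherwise) produces the fitted regressor. The data-generating process and coefficients below are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# x is endogenous (correlated with the error u); z is a valid, exogenous instrument.
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 * x + u                             # true effect of x on y is 1.0

# Naive OLS is biased upward because cov(x, u) > 0.
b_ols = (x @ y) / (x @ x)

# 2SLS: first stage predicts x from z; second stage regresses y on the prediction.
pi = (z @ x) / (z @ z)
x_hat = pi * z
b_2sls = (x_hat @ y) / (x_hat @ x_hat)

print(f"OLS estimate:  {b_ols:.2f}")   # biased away from 1.0
print(f"2SLS estimate: {b_2sls:.2f}")  # close to 1.0
```

A better first-stage predictor can only help to the extent that it predicts from exogenous variation; if it also exploits the endogenous part of x, the second stage inherits the bias, which is the warning in the paragraph above.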

Supervised or unsupervised machine learning methods may thus provide a better understanding from a different angle, the "black box" approach. Even when it is not exactly part of causal analysis, it may be useful for selecting possible covariates of a phenomenon; the rational analysis and the selected outcome should always be considered and criticized so as to provide the best inference. From this perspective, even when we do not know exactly what happens inside the box, the outcome of the black box itself gives us some useful information.

This topic is under constant review and enhancement for real-world applications, and I believe the bridge between the black-box approaches of machine learning and econometric theory will grow stronger over time, considering, of course, the needs of the growing society in terms of information.

**Bibliography**

Aravindan, G. (2019). Challenges of AI-based adoptions: Simplified. SogetiLabs. Retrieved from: https://labs.sogeti.com/challenges-of-ai-based-adoptions/

Mullainathan, S. & Spiess, J. (2017). Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives, Volume 31, Number 2, Spring 2017, pp. 87–106.

Rudin, C. & Radin, J. (2019). Why Are We Using Black Box Models in AI When We Don't Need To? A Lesson From an Explainable AI Competition. Retrieved from: https://hdsr.mitpress.mit.edu/pub/f9kuryi8/release/6