Econometrics: Cross Section and Panel Data Spring 2017
Final Exam
Name:
Institution:
Professor:
Date:
Wooldridge (2010, 2002) Question 5.5
Question 1
One occasionally sees the following reasoning used in applied work for choosing instrumental variables in the context of omitted variables. The model is:
$y_1 = z_1\delta_1 + \alpha_1 y_2 + \gamma q + a_1$
where $q$ is the omitted factor. We assume that $a_1$ satisfies the structural error assumption $E(a_1 \mid z_1, y_2, q) = 0$, that $z_1$ is exogenous in the sense that $E(q \mid z_1) = 0$, but that $y_2$ and $q$ may be correlated. Let $z_2$ be a vector of instrumental variable candidates for $y_2$, and suppose that $z_2$ appears with a nonzero coefficient in the linear projection of $y_2$ on $(z_1, z_2)$, so the requirement that $z_2$ be partially correlated with $y_2$ is satisfied.
Also, we are willing to assume that $z_2$ is redundant in the structural equation, so that $a_1$ is uncorrelated with $z_2$. What we are unsure of is whether $z_2$ is correlated with the omitted variable $q$, in which case $z_2$ would not contain valid IVs.
To “test” whether $z_2$ is in fact uncorrelated with $q$, it has been suggested to use OLS on the equation
$y_1 = z_1\delta_1 + \alpha_1 y_2 + z_2\psi_1 + u_1$ (5.55)
where $u_1 = \gamma q + a_1$, and to test $H_0: \psi_1 = 0$. Why does this method not work?
As you know, here is Wooldridge’s answer to this question
Under the null hypothesis that $q$ and $z_2$ are uncorrelated, $z_1$ and $z_2$ are exogenous in (5.55) because each is uncorrelated with $u_1$.
Unfortunately, $y_2$ is correlated with $u_1$, and so the regression of $y_1$ on $z_1$, $y_2$, and $z_2$ does not produce a consistent estimator of 0 on $z_2$ even when $E(z_2' q) = 0$. We could find that $\hat{\psi}_1$ from this regression is statistically different from zero even when $q$ and $z_2$ are uncorrelated, in which case we would incorrectly conclude that $z_2$ is not a valid instrument in this model. Or, we might fail to reject $H_0: \psi_1 = 0$ when $z_2$ and $q$ are correlated, in which case we incorrectly conclude that the elements in $z_2$ are valid as instruments. The point of this exercise is that one cannot simply add instrumental variable candidates in the structural equation and then test for significance of these variables using OLS estimation. This is the sense in which identification cannot be tested: we cannot test whether all of the IV candidates are uncorrelated with $q$. With a single endogenous variable, we must maintain an assumption, i.e., that at least one element of $z_2$ is uncorrelated with $q$.
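The first failure mode described in Wooldridge's answer can be illustrated with a small Monte Carlo sketch. It assumes hypothetical parameter values (δ_1 = 1, α_1 = 0.5, γ = 1, and a first stage in which y_2 loads on z_1, z_2 and q) and draws z_2 independently of q, so the null hypothesis is true by construction. The function name `simulate_problem_5_5` is illustrative only.

```python
import numpy as np


def simulate_problem_5_5(n=100_000, seed=0):
    """Monte Carlo sketch of Wooldridge's Problem 5.5 with the null E(z2'q) = 0 true.

    All coefficient values are hypothetical. Returns the OLS estimates on
    (constant, z1, y2, z2) from the regression of y1 on z1, y2 and z2.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n)
    z2 = rng.normal(size=n)   # drawn independently of q: z2 is a valid instrument
    q = rng.normal(size=n)    # omitted factor
    a1 = rng.normal(size=n)   # structural error, independent of everything else
    # y2 is partially correlated with z2 (relevance) and with q (endogeneity)
    y2 = 0.5 * z1 + 1.0 * z2 + 1.0 * q + rng.normal(size=n)
    # Structural equation: delta1 = 1, alpha1 = 0.5, gamma = 1; z2 is excluded
    y1 = 1.0 * z1 + 0.5 * y2 + 1.0 * q + a1
    X = np.column_stack([np.ones(n), z1, y2, z2])
    coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
    return coef


coef = simulate_problem_5_5()
print(f"alpha1_hat = {coef[2]:.3f}, psi1_hat = {coef[3]:.3f}")
```

With these values the probability limits work out to 1.0 for the coefficient on y_2 and -0.5 for the coefficient on z_2, even though ψ_1 = 0 and z_2 is uncorrelated with q: the endogeneity of y_2 alone produces a "significant" coefficient on z_2.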
Given the above:
- Why should one expect the test of the null in Wooldridge’s problem 5.5, model (5.55), to have low power? Explain.
Carrying out the test of the null in Wooldridge’s problem 5.5 will have low power because one cannot simply add instrumental variable candidates to the structural equation and then test their significance using OLS: with a single endogenous variable, one must take a stand that at least one element of $z_2$ is uncorrelated with $q$. Moreover, $z_2$ is, by assumption, partially correlated with $y_2$, so including both in the same OLS regression inflates the standard error of $\hat{\psi}_1$ and makes it harder to detect a nonzero $\psi_1$.
- Is z_{2} a proxy variable or an instrument? Explain.
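In the equation, $z_2$ is intended as an instrumental variable, not a proxy: a proxy variable would be chosen precisely because it is correlated with $q$, whereas $z_2$ is valid only if it is uncorrelated with $q$. An instrument must satisfy two conditions, relevance (partial correlation with $y_2$) and exogeneity (no correlation with the error $u_1$). Relevance is assumed, but exogeneity holds only under the null hypothesis that $z_2$ and $q$ are uncorrelated, and the OLS test cannot verify it: we could incorrectly conclude that $z_2$ is not a valid IV candidate when they are uncorrelated, or incorrectly conclude that $z_2$ is a valid instrument when they are correlated.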
- Does Wooldridge’s statement “$y_2$ is correlated with $u_1$” follow from the information provided? Explain. If we estimate the sample model $y_1 = z_1\delta_1 + \alpha_1 y_2 + a_1$, how do we know if we have unbiased or consistent parameter estimates? Explain.
It follows provided $y_2$ is in fact correlated with $q$ and $\gamma \neq 0$: since $u_1 = \gamma q + a_1$, any correlation between $y_2$ and $q$ carries over to $u_1$. In the sample model $y_1 = z_1\delta_1 + \alpha_1 y_2 + a_1$, $q$ is omitted, so the OLS regression of $y_1$ on $z_1$ and $y_2$ does not produce consistent (or unbiased) estimators unless $\gamma = 0$ or $y_2$ is uncorrelated with $q$, and this cannot be checked from the OLS output itself. We could therefore find that $\hat{\alpha}_1$ is statistically different from zero purely because of omitted-variable bias, and draw incorrect conclusions about $y_2$.
Question 2
Using the data in “Ecs S17 FINAL 2.xls”, a researcher obtained the following OLS regression results.
From the estimates presented above, the researcher concludes:
- Experience (exp), on average, significantly affects wages: more experience leads to higher wages up to a point, but too much experience reduces wages, so workers eventually reach a point of negative returns where they have too much experience.
- Females, on average, make 35.8% less than males, all else equal.
- Union workers make 12.57% more, on average, than nonunion workers.
- Blacks, on average, make 14.64% less than other races, all else equal.
- Marriage has no significant impact on wages.
- Each year of education increases the expected wage by 6.24%.
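If the percentages quoted above are simply 100 times the estimated log-point coefficients (an assumption, since the regression table is not reproduced in this document), they are approximations. In a ln(wage) equation, a dummy with coefficient b shifts the predicted wage by exactly 100·(exp(b) - 1) percent, and a quadratic in experience peaks at exp* = -b_exp / (2·b_exp2). The sketch below computes both; the function names and the quadratic coefficients 0.04 and -0.0007 are hypothetical.

```python
import math


def exact_pct_effect(b):
    """Exact percentage effect of a 0/1 dummy with coefficient b in a ln(wage) model."""
    return 100.0 * (math.exp(b) - 1.0)


def experience_turning_point(b_exp, b_exp2):
    """Experience level at which ln(wage) = ... + b_exp*exp + b_exp2*exp^2 peaks."""
    if b_exp2 >= 0:
        raise ValueError("an interior maximum requires a negative coefficient on exp^2")
    return -b_exp / (2.0 * b_exp2)


# Assumed log-point coefficients: the researcher's percentages divided by 100.
assumed = {"fem": -0.358, "union": 0.1257, "blk": -0.1464}
for name, b in assumed.items():
    print(f"{name}: 100*b = {100 * b:.2f}%, exact = {exact_pct_effect(b):.2f}%")

# Hypothetical quadratic-experience coefficients, for illustration only.
print("peak at", round(experience_turning_point(0.04, -0.0007), 1), "years")
```

Under the assumed coefficients the exact effects are roughly -30.1% (fem), +13.4% (union) and -13.6% (blk). Whether workers actually reach negative returns depends on whether the estimated turning point lies inside the experience range of the sample.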
Using the sample and information above, address the following.
- Assuming no violations of underlying OLS assumptions, are these valid interpretations and conclusions?
The column headers in the XY data table are:
exp, wks, occ, ind, south, smsa, ms, fem, union, ed, blk, lwage
From the results of the regression analysis of wages on experience, gender (female versus male), union status (union versus non-union workers), race (black versus other), marital status and education, several factors affect pay and there is remarkable variation in wages. Some of this variation is the result of experience, unionization, industry, region, gender, race, occupation and marital status, among many others.
An example to illustrate the impact of experience and education on wages is shown below:
$Wage = f(Education, Experience)$
This relationship reflects the idea that the longer a person stays on the job, the better and more productive they become, and therefore the more pay they deserve, which means that:
$\Delta Wage / \Delta Experience > 0$, other things equal.
The relationship between experience, wage, and education in log form is:
$\ln(wage_i) = \beta_1 + \beta_2 Education_i + \beta_3 Experience_i + u_i$
- Are parameter estimates consistent or unbiased? Test and explain.
We ran the Ramsey RESET test to check whether our functional form is correct. Because of the large number of data points, we ran the test with multiple terms rather than only one; the output is shown below:
Ramsey RESET Test:
F-statistic | 3.338742 | Probability | 0.084294763
Log likelihood ratio | 3.743356 | Probability | 0.053017849
In the Ramsey RESET test, the F-statistic is 3.338742 with a p-value of 0.0843. Because the p-value exceeds 0.05, we fail to reject the null hypothesis of correct functional form at the 5% level, although we would reject it at the 10% level; the likelihood ratio statistic (p = 0.0530) points the same way. Comparing the F-statistic with a critical value of 2.61 is inconsistent with the reported p-value, which implies a 5% critical value above 3.34 for the test's degrees of freedom. The evidence of misspecification is therefore weak, but some bias in the parameter estimates cannot be ruled out. These conclusions rely on the standard assumptions: first, all the relevant explanatory variables are included and the model is correctly specified; second, the explanatory variables are not correlated with the error terms; third, the errors are normally distributed; fourth, the error terms have constant variance; and lastly, the error terms are independent of each other.
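The RESET decision rule can be reproduced with a short sketch: fit the model, add powers of the fitted values, and F-test them jointly. This is a minimal NumPy version; `reset_test` is a hypothetical helper, and its default of adding ŷ² and ŷ³ is an assumption (the number of terms used in the original run is not stated). If SciPy is available, `scipy.stats.f.sf(F, df_num, df_den)` turns the statistic into a p-value to compare with 0.05.

```python
import numpy as np


def reset_test(y, X, powers=(2, 3)):
    """Ramsey RESET: F-test that powers of the OLS fitted values add nothing.

    X must already include a constant column. Returns (F, df_num, df_den).
    """
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ b
    ssr_restricted = np.sum((y - yhat) ** 2)
    # Augmented model: original regressors plus yhat^2, yhat^3, ...
    Z = np.column_stack([X] + [yhat ** p for p in powers])
    bz, *_ = np.linalg.lstsq(Z, y, rcond=None)
    ssr_unrestricted = np.sum((y - Z @ bz) ** 2)
    q = len(powers)
    df_den = n - k - q
    F = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_den)
    return F, q, df_den


# Simulated check: a linear fit to linear data vs. a linear fit to quadratic data.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
X = np.column_stack([np.ones_like(x), x])
y_linear = 1.0 + 2.0 * x + rng.normal(size=1000)
y_quadratic = 1.0 + 2.0 * x + 1.5 * x ** 2 + rng.normal(size=1000)
F_lin, _, _ = reset_test(y_linear, X)
F_quad, _, _ = reset_test(y_quadratic, X)
print(f"F (correct form) = {F_lin:.2f}, F (omitted x^2) = {F_quad:.2f}")
```

The omitted quadratic term produces a very large F-statistic, while the correctly specified model gives a small one.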
- Are parameter standard errors efficient? Test and explain.
White’s test for heteroskedasticity checks whether the error terms have constant variance. It gives high p-values (0.59 for the F-statistic and 0.53 for Obs*R-squared), so we fail to reject the null hypothesis of homoskedasticity at any conventional significance level. The conclusion is that there is no statistical evidence against a constant error variance.
White Heteroskedasticity Test:
F-statistic | 0.720549 | Probability | 0.589658095
Obs*R-squared | 3.1892 | Probability | 0.526677039
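The Obs*R-squared statistic reported above can be computed with a sketch of White's auxiliary regression: regress the squared OLS residuals on the regressors, their squares and cross-products, and multiply the R² by n. The helper `white_test` is hypothetical; if SciPy is available, `scipy.stats.chi2.sf(lm, df)` gives the p-value. Degrees of freedom are taken from the rank of the auxiliary design, because squares of 0/1 dummies duplicate the dummies themselves.

```python
import numpy as np
from itertools import combinations_with_replacement


def white_test(y, X):
    """White's test: LM = n * R^2 from regressing squared OLS residuals on the
    regressors, their squares and cross-products. X must NOT contain a constant.
    Returns (LM, df); under homoskedasticity LM is approx chi-squared(df)."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    e2 = (y - X1 @ b) ** 2
    cols = [np.ones(n)] + [X[:, j] for j in range(p)]
    cols += [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(p), 2)]
    Z = np.column_stack(cols)
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    resid = e2 - Z @ g
    r2 = 1.0 - (resid @ resid) / np.sum((e2 - e2.mean()) ** 2)
    df = np.linalg.matrix_rank(Z) - 1  # drop the constant
    return n * r2, df


# Simulated check (hypothetical data): constant vs. |x|-dependent error spread.
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=(n, 1))
y_hom = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)
y_het = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n) * (0.5 + np.abs(x[:, 0]))
lm_hom, df = white_test(y_hom, x)
lm_het, _ = white_test(y_het, x)
print(f"LM homoskedastic = {lm_hom:.2f}, LM heteroskedastic = {lm_het:.2f}, df = {df}")
```

A high p-value, as in the output above, means only that no heteroskedasticity was detected; it does not prove that the variance is constant.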
In the regression summary output, the standard error of the regression is computed as $\sqrt{SSE/(n - k)}$. The reported coefficient standard errors show that the estimate for experience is very imprecise (its standard error is very large), while the standard errors for the remaining variables are not close to 1.
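The standard error of the regression, s = sqrt(SSE/(n - k)), and the conventional coefficient standard errors, sqrt(diag(s²(X'X)⁻¹)), can be computed directly. Below is a minimal NumPy sketch; `ols_with_se` is a hypothetical helper and the four-point dataset is a toy example, not the wage sample.

```python
import numpy as np


def ols_with_se(y, X):
    """Return OLS coefficients b, the standard error of the regression
    s = sqrt(SSE / (n - k)), and conventional coefficient standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    resid = y - X @ b
    sse = float(resid @ resid)
    s = np.sqrt(sse / (n - k))
    se = np.sqrt(np.diag(s ** 2 * XtX_inv))
    return b, s, se


# Tiny worked example (hypothetical data, not the wage sample).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 2.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
b, s, se = ols_with_se(y, X)
print(b, s, se)
```

For this toy data b = (-0.2, 1.3), s = sqrt(0.15) ≈ 0.387 and se(b_2) = sqrt(0.03) ≈ 0.173. Whether a standard error is "large" is judged relative to its coefficient through the t-ratio, not by its distance from 1.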
- Compare the results from at least three other estimators relevant for this problem and compare them to the OLS results above. Explain your choice of estimators and compare the results. Should cross-section effects be included? If so, how? Should time effects be included in these models?
Hausman Test
We estimated pooled OLS, random effects (RE) and fixed effects (FE) versions of the model shown above and used the Hausman test to compare the RE and FE estimates. The test rejects the null hypothesis (p-value = 0.0132, so at the 5% level), which means that the RE parameter estimates are inconsistent because the individual random effects are correlated with the X variables.
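A minimal NumPy sketch of two ingredients of this comparison: the within (fixed-effects) estimator, which is one standard way to include cross-section effects, and the Hausman statistic H = d'(V_FE - V_RE)⁻¹d with d = b_FE - b_RE. Both helper names are hypothetical, the random-effects GLS step is not shown (`hausman` takes RE estimates produced elsewhere), and time effects could be added as period dummies. In finite samples V_FE - V_RE may fail to be positive definite, so the statistic should be checked for sense.

```python
import numpy as np


def within_estimator(y, X, ids):
    """One-way fixed-effects (within) estimator: demean y and each column of X
    by individual, then run OLS without a constant."""
    y_dm = y.astype(float)
    X_dm = X.astype(float)
    for i in np.unique(ids):
        m = ids == i
        y_dm[m] = y_dm[m] - y_dm[m].mean()
        X_dm[m] = X_dm[m] - X_dm[m].mean(axis=0)
    b, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return b


def hausman(b_fe, b_re, V_fe, V_re):
    """H = d'(V_fe - V_re)^{-1} d with d = b_fe - b_re; approx chi-squared(len(d))
    under H0 that the individual effects are uncorrelated with the regressors."""
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    V = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float))
    return float(d @ np.linalg.solve(V, d))


# Simulated panel (hypothetical): individual effects correlated with x.
rng = np.random.default_rng(3)
N, T = 1000, 7
ids = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)[ids]
x = alpha + rng.normal(size=N * T)
y = 2.0 * x + alpha + rng.normal(size=N * T)
b_fe = within_estimator(y, x[:, None], ids)
b_pooled, *_ = np.linalg.lstsq(np.column_stack([np.ones(N * T), x]), y, rcond=None)
print("within:", b_fe, "pooled OLS slope:", b_pooled[1])
print("Hausman example:", hausman(0.5, 0.3, 0.02, 0.01))
```

When the individual effects are correlated with x, pooled OLS drifts toward 2.5 in this design while the within estimator stays near the true slope of 2, which is the kind of divergence a Hausman rejection signals.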
- Which estimator is preferred? Why? Motivate your choice using the appropriate tests and explain how the tests indicate support for your final model relative to the other models you’ve estimated.
The linear regression model is the preferred estimator because it comes with other data analysis tools, such as a straight-line plot that can be used to compare the trends in predicted and observed output. It also allows the estimation to be refined by removing and adding regressors depending on the results of the t-tests.
- Once you have developed and supported the validity of a final model (relative to the others), explain your results in the context of the researcher’s conclusions. Are the original conclusions of the researcher supported by the results from your proposed model? Explain.
Compared with the other models of data analysis, the linear regression model applies a statistical approach to analyzing the relationships between the independent variables and the dependent variable, which produces optimal results. Regression-based forecasting is useful for predicting what is likely to happen in different periods of the financial year, and it provides insight into how higher taxes and consumer tastes and preferences affect the economy and businesses.
In the context of the researcher’s conclusion, the application of linear regression data analysis supports decisions on wages, training and recruitment, employee development, gender issues, and as a scientific angle to management of businesses by helping to reduce large amounts of raw data into actionable information. The model is also useful for managers to correct errors in scenarios where evidence indicates otherwise by providing a more quantitative support for decisions. Lastly, the model provides new insights by uncovering patterns and relationships that were previously unnoticed.
Part II
- A researcher wants to estimate how age, income, education, health state, and other individual characteristics affect the number of times each individual visits an MD during a year.
Ordinary Least Squares (OLS) Data Analysis Model: 2002
In this analysis, the OLS data analysis model was used to estimate how age, income, education, health state, and other individual characteristics affect the number of times each individual visits an MD during a year. The variables that can be identified from the dataset include age, income, education, gender, and ethnicity, among many others. The model is expressed by the following general equation:
$y = \beta_0 + \beta_1 x$
where $\beta_0$ is the y-intercept and $\beta_1$ is the slope.
The population regression model is: $y = \beta_1 + \beta_2 x + u$
We use the fitted regression line: $\hat{y} = b_1 + b_2 x$
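For a single regressor, the least-squares line has a closed form: b_2 = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)² and b_1 = ȳ - b_2·x̄. A minimal pure-Python sketch; `simple_ols` is a hypothetical helper and the data are a toy example.

```python
def simple_ols(x, y):
    """Least-squares intercept b1 and slope b2 for the line y_hat = b1 + b2 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx            # slope = sample covariance / sample variance of x
    b1 = y_bar - b2 * x_bar     # the fitted line passes through (x_bar, y_bar)
    return b1, b2


b1, b2 = simple_ols([0, 1, 2, 3], [0, 1, 2, 4])
print(f"y_hat = {b1:.2f} + {b2:.2f} x")
```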
Education vs docvisits in a year
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.098589865
R Square | 0.009719961
Adjusted R Square | 0.009687085
Standard Error | 2.939940107
Observations | 30123
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 2555.364568 | 2555.364568 | 295.6487 | 6.03875E-66
Residual | 30121 | 260343.2679 | 8.643247831 | |
Total | 30122 | 262898.6325 | | |
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95%
Intercept | 12.58355589 | 0.018810522 | 668.9636637 | 0 | 12.54668646
1 | 0.036833124 | 0.002142154 | 17.19443665 | 6.04E-66 | 0.032634411
Interpreting the OLS Output
Intercept
Refers to the estimated intercept.
Educ
The variable name represents the estimated slope.
Coefficient
The actual estimated values
Standard Error
The value below “R Square” denotes the standard error of the regression: $s = \sqrt{\text{Residual SS}/(n - k)}$,
where k = 2 because we are estimating two parameters.
In the coefficient table, this column denotes the standard error of each OLS estimator, $se(b_1)$ and $se(b_2)$.
t Stat
Denotes the t-statistic for the traditional hypothesis $H_0: \beta_j = 0$.
The statistics are, respectively, $t = b_1/se(b_1)$ and $t = b_2/se(b_2)$.
P-value
Associated p-value for each standard t-test. If the p-value is less than .05, for example, we reject the standard null hypothesis at the 5%-level.
Lower 95%
Denotes the lower bound of a 95% confidence interval. Formulaically, we have, respectively, $b_1 - t_{0.025,\,n-k}\, se(b_1)$ and $b_2 - t_{0.025,\,n-k}\, se(b_2)$.
Upper 95%
The upper bound of a 95% confidence interval. Formulaically, we have, respectively, $b_1 + t_{0.025,\,n-k}\, se(b_1)$ and $b_2 + t_{0.025,\,n-k}\, se(b_2)$.
R Square
The coefficient of determination: $R^2 = \text{Regression SS}/\text{Total SS} = 1 - \text{Residual SS}/\text{Total SS}$.
Predicted Y
Predicted values: $\hat{y}_i = b_1 + b_2 x_i$.
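The glossary formulas can be checked against the numbers in the SUMMARY OUTPUT table above. The sketch below recomputes R², the standard error of the regression, the slope t-statistic and F = t² (which holds with a single regressor). The Upper 95% column is not shown in the extracted table; the sketch assumes a symmetric interval, so Upper = b + (b - Lower).

```python
import math

# Figures copied from the SUMMARY OUTPUT table (education vs. doctor visits).
n, k = 30123, 2
ss_reg, ss_res, ss_tot = 2555.364568, 260343.2679, 262898.6325
b2, se_b2, lower_b2 = 0.036833124, 0.002142154, 0.032634411
b1, se_b1, lower_b1 = 12.58355589, 0.018810522, 12.54668646

r2 = ss_reg / ss_tot                 # coefficient of determination
s = math.sqrt(ss_res / (n - k))      # standard error of the regression
t_b2 = b2 / se_b2                    # t-statistic for H0: beta_2 = 0
f_stat = t_b2 ** 2                   # with one regressor, F equals t squared
# Assumes a symmetric interval: upper = b + (b - lower).
upper_b2 = 2 * b2 - lower_b2
upper_b1 = 2 * b1 - lower_b1

print(f"R^2 = {r2:.9f}, s = {s:.6f}, t = {t_b2:.4f}, F = {f_stat:.2f}")
print(f"Upper 95%: intercept {upper_b1:.8f}, slope {upper_b2:.9f}")
```

The recomputed values match the reported R Square, Standard Error, t Stat and F, which confirms the table is internally consistent. Note that an R² below 0.01 means this single regressor explains less than 1% of the variation in the dependent variable.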
ANALYSIS
How Education affects the number of times each individual visits an MD during a Year
An estimate of how education levels affect the number of times each individual visits an MD during a year reveals that persons with higher levels of education tend to consult an MD more times in a year than those with lower levels of education. The main reason for this trend is preventive care (Lyles & Hummer, 2012).
The findings from the regression analyses on the aggregate level of physician consultations per 1,000 visits show a small Medicare share of persons with lower levels of education (ISCED classification). Results of other research studies also show a sizeable correlation between education and mortality rates, lost days of work, and self-reported poor health (Mertens et al., 2017).
The outcomes of the regression analyses also show a significant gap among the different population groups in the US, the largest being the black-white gap. There are three general explanations for the correlation between education and health. First, education causally leads to better health outcomes (Mertens et al., 2017). Second, reverse causality: poor health leads to lower levels of education. Lastly, additional factors amplify the correlation between education and health.
How income affects the number of times each individual visits an MD during a Year
An estimate of how income affects the number of times each individual visits an MD during a year, using the sample data provided, shows that the trend follows the utility-maximizing rule, whereby consumers of healthcare services choose the bundle of healthcare services that maximizes their utility.
The number of visits in a year is determined by the gain or loss in marginal utility relative to the amount spent in a year on the same service. The demand for healthcare services in a year is also subject to the law of diminishing marginal utility, as illustrated in the utility curve shown below:
Figure 1: The relationship between utility and Quantity of Medicare
The figure above reveals that total utility increases at a decreasing rate with respect to the quantity of Medicare demanded in a year. The bow shape arises because each additional healthcare visit results in a smaller increase in health than the previous visit, because of the law of diminishing marginal productivity.
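Figure 1 itself is not reproduced here, but the shape it describes can be illustrated with a hypothetical concave utility function, U(q) = ln(1 + q), where q is the number of visits. This functional form is an assumption chosen only because its marginal utility falls with each additional visit; it is not estimated from the sample.

```python
import math


def total_utility(visits):
    """Hypothetical concave utility from doctor visits: U(q) = ln(1 + q)."""
    return math.log1p(visits)


# Marginal utility of the q-th visit: U(q) - U(q - 1).
marginal = [total_utility(q) - total_utility(q - 1) for q in range(1, 6)]
for q, mu in enumerate(marginal, start=1):
    print(f"visit {q}: marginal utility {mu:.3f}")
```

Total utility keeps rising (every marginal utility is positive), but each increment is smaller than the last, which is the bow shape the figure describes.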
- Evaluate the estimation results. Are there any potential issues with the specification or in validly making statements such (a) and (b) below? Explain why or why not.
- On average, the number of doctor visits is not significantly affected by age.
The results of the data analysis show that, on average, the number of doctor visits is not significantly affected by age. However, age does have a slight effect on the demand for healthcare services during a year. The aging of a population is the main driver of demand for healthcare services: statistics show that older persons make more visits in a year than young and middle-aged persons. The amount of healthcare services used by seniors aged 65 to 74 and 75 to 84 is driven by the prevalence of chronic health conditions. The sample data reveal that healthcare expenditure for persons above 64 years of age is three or more times the expenditure for those between the ages of 0 and 64.
The key findings in the research show that older people use more services than younger people. The trend also shows that age is not strongly associated with outpatient services, and that patients with medical insurance cover have more visits and stays with an MD than those without cover. Married people tend to spend more time with an MD than single people because single people have a higher chance of being discharged to other institutional care.
- Individuals with fair or poor health visit the doctor significantly more often than others, on average 1.84 times more per year.
This result of the dataset analysis reveals that persons with fair or poor health status visit the doctor approximately 1.84 times more per year because they tend to invest less money in preventive measures. In the regression analysis done to estimate the impact of health state on the number of times each individual visits an MD in a year, the results show a correlation between a country’s health state and the number of times persons from different population groups visit an MD in a year.
A state-level analysis shows higher levels of healthcare spending for population groups with higher incomes, fewer uninsured residents and healthier lifestyles. In the US, data show that the status of the healthcare system is average, and current efforts to improve health focus more on cost containment, increasing levels of education and promoting healthier lifestyles.
The findings of the analysis also show major differences in health status across population groups: minority groups have higher levels of mortality and are less healthy than the majority whites, who have lower mortality rates and are healthier, yet whites tend to visit an MD more times in a year than persons from the minority groups.
- Are parameter estimates consistent or unbiased? Test and explain.
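The parameter estimates cannot be assumed to be fully consistent. The assumption that the X’s are not perfectly linearly related (i.e., that there is no perfect multicollinearity) only guarantees that the OLS estimates are uniquely defined; consistency additionally requires that the regressors be uncorrelated with the error term.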
- Are parameter standard errors efficient? Test and explain.
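The parameter standard errors are efficient, provided the errors are homoskedastic and uncorrelated (the Gauss-Markov assumptions), because they are computed from the “Residual SS” in the output, which is the ESS:
ESS: Error Sum of Squares, or “Sum of Squared Residuals”.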
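One practical check on this point is to compare the conventional standard errors, which are only valid under homoskedasticity, with White (HC0) heteroskedasticity-robust standard errors; a large gap suggests the conventional ones are unreliable. A minimal NumPy sketch, where `conventional_and_robust_se` is a hypothetical helper and the data are simulated with an error spread that grows with |x|.

```python
import numpy as np


def conventional_and_robust_se(y, X):
    """OLS with conventional SEs (valid under homoskedasticity) and White/HC0
    heteroskedasticity-robust SEs. X must include a constant column."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    e = y - X @ b
    s2 = (e @ e) / (n - k)
    se_conv = np.sqrt(np.diag(s2 * XtX_inv))
    meat = X.T @ (X * (e ** 2)[:, None])        # sum of e_i^2 x_i x_i'
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return b, se_conv, se_hc0


# Hypothetical heteroskedastic data: the error spread grows with |x|.
rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n) * np.abs(x)
b, se_conv, se_hc0 = conventional_and_robust_se(y, X)
print("conventional:", se_conv, "robust (HC0):", se_hc0)
```

In this design the robust standard error of the slope is roughly 1.7 times the conventional one; when the two are close, as would be expected after a non-rejection in White's test, the conventional standard errors are a reasonable choice.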
- Compare the results from at least three other estimators relevant for the sample to the OLS results the researcher has above. Which estimator is preferred? Why? Motivate your choice using the appropriate tests and explain how the tests indicate support for your final model relative to the other models you’ve estimated
The linear regression model is the preferred estimator because it comes with other data analysis tools, such as a straight-line plot that can be used to compare the trends in predicted and observed output. It also allows the estimation to be refined by removing and adding regressors depending on the results of the t-tests.
References:
Lyles, R. W., & Hummer, J. E. (2012). Effective Experiment Design and Data Analysis in Transportation Research. Washington, D.C: Transportation Research Board.
Mertens, W., Pugliese, A., & Recker, J. (2017). Quantitative Data Analysis: A Companion for Accounting and Information Systems Research.