Heteroscedastic means “differing variance”, from the Greek words “hetero” (‘different’) and “skedasis” (‘dispersion’). It describes the situation in which the variance of the error terms in a regression model changes with the values of an independent variable.
If heteroscedasticity is present in the data, the variance of the errors differs across the values of the explanatory variables, violating the homoscedasticity assumption of the classical linear regression model. The OLS coefficient estimates remain unbiased, but they are no longer efficient, and the usual standard errors are biased, making inference unreliable. It is therefore imperative to test for heteroscedasticity and apply corrective measures if it is present. Various tests help detect heteroscedasticity, such as the Breusch-Pagan test and the White test.
Heteroscedasticity tests use the residuals obtained from the regression. Therefore, the first step is to run the regression with the same three variables considered in the previous article, for the same period of 1997-98 to 2017-18.
Regression results
The previous article explained the procedure to run the regression with three variables in STATA. The regression result is as follows.
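For reference, the regression can be reproduced with the command below, assuming the dataset from the previous article is already in memory with the variables gdp, gfcf and pfce:
regress gdp gfcf pfce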
Now proceed to the heteroscedasticity tests in STATA using two approaches.
Breusch-Pagan test for heteroscedasticity
The Breusch-Pagan test checks a null hypothesis against an alternative hypothesis. The null hypothesis is that the error variances are all equal (homoscedasticity), whereas the alternative hypothesis states that the error variances are a multiplicative function of one or more variables (heteroscedasticity).
To perform the Breusch-Pagan test, use this STATA command immediately after running the regression:
estat hettest
The below results will appear.
The figure above shows that the probability value of the chi-square statistic is less than 0.05. Therefore, the null hypothesis of constant variance can be rejected at the 5% level of significance, implying the presence of heteroscedasticity in the residuals.
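For intuition, the mechanics of the test can be sketched manually by regressing the squared residuals on the fitted values. This is a Koenker-type (studentized) variant, so the statistic will not exactly match the default ‘estat hettest’ output; the variable names yhat, ehat and ehat2 are hypothetical:
predict yhat, xb // fitted values from the regression above
predict ehat, residuals // OLS residuals
generate ehat2 = ehat^2 // squared residuals as a proxy for the error variance
regress ehat2 yhat // auxiliary regression of squared residuals on fitted values
display "LM statistic = " e(N)*e(r2) // compare with the chi2(1) critical value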
White test for heteroscedasticity
To check heteroscedasticity using the White test, use the following command in STATA:
estat imtest, white
The below results will appear.
Similar to the results of the Breusch-Pagan test, here too Prob > chi2 = 0.000. The null hypothesis of constant variance can again be rejected at the 5% level of significance, implying that there is heteroscedasticity in the residuals.
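The White test can likewise be replicated manually by regressing the squared residuals on the regressors, their squares and their cross product; the LM statistic n*R² from this auxiliary regression should reproduce the value reported by ‘estat imtest, white’. This is only a sketch, and all generated variable names are hypothetical:
quietly regress gdp gfcf pfce // re-run the baseline model
predict ew, residuals // OLS residuals
generate ew2 = ew^2 // squared residuals
generate gfcf2 = gfcf^2
generate pfce2 = pfce^2
generate gfcf_pfce = gfcf*pfce // cross product of the regressors
regress ew2 gfcf pfce gfcf2 pfce2 gfcf_pfce // White auxiliary regression
display "LM statistic = " e(N)*e(r2) // compare with the chi2(5) critical value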
Graphical depiction of results from heteroscedasticity test in STATA
Present heteroscedasticity graphically using the following procedure (figure below):
- Go to ‘Graphics’
- Select ‘Regression diagnostic plots’
- Choose ‘Residuals-versus-fitted’
The rvfplot dialog box will appear (figure below). Click on ‘Reference lines’.
The ‘Reference lines (y-axis)’ window will appear (figure below). Enter ‘0’ in the box for ‘Add lines to the graph at specified y-axis values’. Then click on ‘Accept’, and finally on ‘OK’ in the rvfplot box.
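Alternatively, the same plot can be produced directly from the command line:
rvfplot, yline(0)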
The following graph will appear.
The above graph shows that the residuals are somewhat larger near the mean of the distribution than at the extremes. The residuals also follow a systematic pattern across the fitted values, rather than the random scatter expected under homoscedasticity.
Presence of heteroscedasticity
Thus heteroscedasticity is present. This can be due to measurement error, model misspecification or subpopulation differences. The consequence of heteroscedasticity is that the OLS estimates are no longer BLUE (Best Linear Unbiased Estimator): the coefficient estimates remain unbiased but are inefficient, and the usual standard errors are unreliable, which in turn biases test results and confidence intervals.
Therefore, correct for heteroscedasticity either by changing the functional form of the model or by using the robust option in the regression.
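As an illustration of the first option (changing the functional form), a log transformation often stabilises the variance. This is a sketch only; whether logs are appropriate depends on the data:
generate ln_gdp = ln(gdp)
generate ln_gfcf = ln(gfcf)
generate ln_pfce = ln(pfce)
regress ln_gdp ln_gfcf ln_pfce // log-log specification
estat hettest // re-test for heteroscedasticity in the new model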
Correction for heteroscedasticity
In order to get robust standard errors, add the ‘vce(robust)’ option to the regression command:
regress gdp gfcf pfce, vce(robust)
This will output the following result (figure below).
This does not remove the heteroscedasticity itself; rather, it produces standard errors that are valid in its presence. The coefficient estimates are unchanged, but the robust standard errors differ from those in figure 1: the robust standard error for the variable gfcf is 0.1030497, compared with 0.076651 in figure 1. Similar is the case with the variable pfce.
The presence of autocorrelation, or serial correlation, violates another important ordinary least squares (OLS) assumption: that the errors in the regression model are uncorrelated with each other at all points in time.