## Assumptions and Diagnostics Tutorial assignment

Tutorial 6 (Week 7): Assumptions and Diagnostics

**Q: What might Ramsey's RESET test be used for?**

Ans: Ramsey's RESET test is a test of whether the functional form of the regression is appropriate. In other words, we test whether the relationship between the dependent variable and the independent variables really should be linear, or whether a non-linear form would be more appropriate. The test works by adding powers of the fitted values from the original regression into a second regression. If a linear model were appropriate, the powers of the fitted values would not be significant in this second regression.

**Q: What could be done if it were found that the RESET test failed?**

Ans: The test is performed under the null hypothesis of a linear model, so rejection of the null implies that a non-linear model is supported by the data. However, the test does not tell us the functional form of that non-linear model. If the model fails Ramsey's RESET test, the easiest "solution" is probably to transform all of the variables into logarithms, which turns a multiplicative model into an additive one. If this still fails, then we have to admit that the relationship between the dependent variable and the independent variables is probably not linear after all, so we must either estimate a non-linear model for the data (which is beyond the scope of this course) or go back to the drawing board and run a different regression containing different variables.

### Objectives

1. Identify multicollinearity and possible solutions to the problem.
2. Perform a Chow test to determine whether parameters are the same for different groups.

### Question 1

The data are available in the file "hedonic1.xls". Consider the following multiple regression model for new houses only (age = 0).
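The mechanics of the RESET test can be sketched in Python. This is a toy illustration with invented data, not output from the tutorial's dataset: we fit a linear model to data that are truly quadratic, augment it with powers of the fitted values, and compare the two fits with an F-statistic.

```python
import numpy as np

def ols(X, y):
    """OLS fit; returns the coefficient vector and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x**2 + rng.normal(0, 1, n)   # true relationship is quadratic

# Restricted (linear) model: y on a constant and x
X_r = np.column_stack([np.ones(n), x])
beta_r, rss_r = ols(X_r, y)
fitted = X_r @ beta_r

# Unrestricted model: add powers of the fitted values (the RESET augmentation)
X_u = np.column_stack([X_r, fitted**2, fitted**3])
_, rss_u = ols(X_u, y)

# F-statistic for the q = 2 added regressors
q, k_u = 2, X_u.shape[1]
F = ((rss_r - rss_u) / q) / (rss_u / (n - k_u))
print(round(F, 1))  # far above the 5% critical value (about 3.0): reject linearity
```

Because the data were generated from a quadratic relationship, the powers of the fitted values soak up the structure the linear model misses, and the null of correct (linear) functional form is rejected.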
In EViews: Proc > Set Sample > `if age=0`

a) Estimate the econometric model

$$\ln(SP_t) = \beta_1 + \beta_2\,SFLA_t + \beta_3\,BEDS_t + \beta_4\,BATHS_t + \beta_5\,STORIES_t + \beta_6\,VACANT_t + u_t$$

Quick > Estimate Equation > `log(selling_price) c sfla beds baths stories vacant`

Dependent Variable: LOG(SELLING_PRICE); Method: Least Squares; Date: 07/09/05, Time: 10:42; Sample: 1 6660 IF AGE=0; Included observations: 151

| Variable | Coefficient | Std. Error | t-Statistic | Prob.  |
|----------|-------------|------------|-------------|--------|
| C        | 11.24201    | 0.124654   | 90.18541    | 0.0000 |
| SFLA     | 0.000707    | 5.48E-05   | 12.90426    | 0.0000 |
| BEDS     | -0.084611   | 0.033665   | -2.513295   | 0.0131 |
| BATHS    | 0.034427    | 0.084073   | 0.409495    | 0.6828 |
| STORIES  | -0.141884   | 0.060394   | -2.349299   | 0.0202 |
| VACANT   | 0.068117    | 0.035346   | 1.927130    | 0.0559 |

| R-squared          | 0.784029 | Mean dependent var    | 12.02137  |
|--------------------|----------|-----------------------|-----------|
| Adjusted R-squared | 0.776582 | S.D. dependent var    | 0.431160  |
| S.E. of regression | 0.203797 | Akaike info criterion | -0.304458 |
| Sum squared resid  | 6.022332 | Schwarz criterion     | -0.184566 |
| Log likelihood     | 28.98658 | F-statistic           | 105.2772  |
| Durbin-Watson stat | 0.213214 | Prob(F-statistic)     | 0.000000  |

b) Do the coefficients take the expected signs? Check for any evidence of multicollinearity.

The coefficients on BEDS and STORIES do not take the expected signs: we would expect more bedrooms and more stories to raise the selling price, but the data for new houses do not show this. Vacant houses also appear on average to be more expensive than non-vacant houses, although this effect is not significant at the 5% level.

One way to check for multicollinearity is to examine the correlation coefficients among the explanatory variables. In the workfile window, hold Ctrl and click all the explanatory variables, then right-click and select to open them as a group.
View > Covariance Analysis > tick the "Correlation" option > OK

|         | BATHS     | BEDS      | SFLA      | STORIES   | VACANT    |
|---------|-----------|-----------|-----------|-----------|-----------|
| BATHS   | 1.000000  | 0.676223  | 0.862870  | 0.673716  | -0.178327 |
| BEDS    | 0.676223  | 1.000000  | 0.657986  | 0.515717  | -0.074098 |
| SFLA    | 0.862870  | 0.657986  | 1.000000  | 0.658058  | -0.131856 |
| STORIES | 0.673716  | 0.515717  | 0.658058  | 1.000000  | -0.029968 |
| VACANT  | -0.178327 | -0.074098 | -0.131856 | -0.029968 | 1.000000  |

Correlations above 0.8 are often taken as evidence of multicollinearity. However, there is no strict cut-off: the higher the correlations between the variables, the greater the possible effect.

Another way to detect multicollinearity is to run auxiliary regressions, where we regress one explanatory variable on all the remaining ones. For example:

Quick > Estimate Equation > `sfla c beds baths stories vacant`

Dependent Variable: SFLA; Method: Least Squares; Date: 07/09/05, Time: 12:00; Sample: 1 6660 IF AGE=0; Included observations: 151

| Variable | Coefficient | Std. Error | t-Statistic | Prob.  |
|----------|-------------|------------|-------------|--------|
| C        | -1312.035   | 153.8351   | -8.528843   | 0.0000 |
| BEDS     | 111.6102    | 50.00764   | 2.231862    | 0.0271 |
| BATHS    | 1015.968    | 95.17536   | 10.67469    | 0.0000 |
| STORIES  | 204.8847    | 89.63886   | 2.285668    | 0.0237 |
| VACANT   | 6.589078    | 53.38976   | 0.123415    | 0.9019 |

| R-squared          | 0.763486  | Mean dependent var    | 1621.993 |
|--------------------|-----------|-----------------------|----------|
| Adjusted R-squared | 0.757006  | S.D. dependent var    | 624.5071 |
| S.E. of regression | 307.8469  | Akaike info criterion | 14.32963 |
| Sum squared resid  | 13836377  | Schwarz criterion     | 14.42954 |
| Log likelihood     | -1076.887 | F-statistic           | 117.8251 |
| Durbin-Watson stat | 0.343464  | Prob(F-statistic)     | 0.000000 |

A high R-squared in the auxiliary regression indicates evidence of multicollinearity.

c) What are the possible effects of multicollinearity? What are the possible solutions to the problem?

Possible consequences include incorrect signs and sizes of the coefficients, and possibly large standard errors, so that the variables appear individually insignificant when in fact they are relevant, while still being jointly significant in an F-test. Possible solutions include dropping one of the variables in question.
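The two screening devices above can be summarised numerically. The variance inflation factor implied by an auxiliary regression is VIF = 1/(1 − R²); a minimal sketch, using the R-squared from the SFLA auxiliary regression above and the correlations from the matrix (the 0.8 threshold and the VIF > 10 rule of thumb are conventions, not hard rules):

```python
def vif(r_squared):
    """Variance inflation factor implied by an auxiliary regression's R-squared."""
    return 1.0 / (1.0 - r_squared)

# R-squared from the auxiliary regression of SFLA on the other regressors
r2_sfla = 0.763486
print(round(vif(r2_sfla), 2))  # 4.23; a common rule of thumb flags VIF > 10

# Pairwise correlations from the table above; only BATHS-SFLA exceeds 0.8
corr = {("BATHS", "SFLA"): 0.862870,
        ("BATHS", "BEDS"): 0.676223,
        ("SFLA", "STORIES"): 0.658058}
flagged = [pair for pair, r in corr.items() if abs(r) > 0.8]
print(flagged)  # [('BATHS', 'SFLA')]
```

So while the BATHS–SFLA correlation trips the informal 0.8 threshold, the implied VIF of about 4.2 for SFLA sits below the usual VIF > 10 alarm level, which illustrates the point that there is no single cut-off.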
Other possible solutions include creating a ratio of the variables, or gathering more data to estimate the model.

d) Create a dummy variable for the entire dataset which has a value of 1 for a new house and 0 for any other house.

Proc > Set Sample > clear the `if` statement. Then Genr > `new=0`, followed by Genr > `new=1` with `if age=0` typed in the sample box next to `@all`.

Alternatively, in the command window above the workfile, type `dum1 = age=0`. A new variable `dum1` will appear in the workfile. To check that the dummy has been created properly, graph `dum1` and look for the many spikes with value 1 where age = 0.

e) Do a Chow test for the complete dataset to see if the equation changes depending on whether the house is a new house or not.

Quick > Estimate Equation > `log(selling_price) c sfla beds baths stories vacant new new*sfla new*beds new*baths new*stories new*vacant`

Dependent Variable: LOG(SELLING_PRICE); Method: Least Squares; Date: 07/09/05, Time: 12:24; Sample: 1 6660; Included observations: 6660

| Variable    | Coefficient | Std. Error | t-Statistic | Prob.  |
|-------------|-------------|------------|-------------|--------|
| C           | 11.06159    | 0.016676   | 663.3423    | 0.0000 |
| SFLA        | 0.000465    | 1.12E-05   | 41.67081    | 0.0000 |
| BEDS        | -0.025994   | 0.007025   | -3.700036   | 0.0002 |
| BATHS       | 0.168266    | 0.009302   | 18.08933    | 0.0000 |
| STORIES     | -0.055975   | 0.010797   | -5.184200   | 0.0000 |
| VACANT      | -0.111657   | 0.007842   | -14.23829   | 0.0000 |
| NEW         | 0.180419    | 0.174618   | 1.033220    | 0.3015 |
| NEW*SFLA    | 0.000242    | 7.72E-05   | 3.136150    | 0.0017 |
| NEW*BEDS    | -0.058617   | 0.047466   | -1.234926   | 0.2169 |
| NEW*BATHS   | -0.133838   | 0.117601   | -1.138073   | 0.2551 |
| NEW*STORIES | -0.085909   | 0.084904   | -1.011839   | 0.3117 |
| NEW*VACANT  | 0.179774    | 0.049907   | 3.602154    | 0.0003 |

| R-squared          | 0.538565  | Mean dependent var    | 11.91930 |
|--------------------|-----------|-----------------------|----------|
| Adjusted R-squared | 0.537802  | S.D. dependent var    | 0.418000 |
| S.E. of regression | 0.284178  | Akaike info criterion | 0.323366 |
| Sum squared resid  | 536.8724  | Schwarz criterion     | 0.335626 |
| Log likelihood     | -1064.810 | F-statistic           | 705.3853 |
| Durbin-Watson stat | 0.818210  | Prob(F-statistic)     | 0.000000 |

View > Coefficient Tests > Wald – Coefficient Restrictions > `C(7)=0, C(8)=0, C(9)=0, C(10)=0, C(11)=0, C(12)=0`

Wald Test (Equation: EQ01):

| Test Statistic | Value    | df        | Probability |
|----------------|----------|-----------|-------------|
| F-statistic    | 4.102574 | (6, 6648) | 0.0004      |
| Chi-square     | 24.61545 | 6         | 0.0004      |

H0: The coefficients are the same regardless of whether the house is new or not.
H1: The coefficients change depending on whether the house is new or not.

Assume a 5% level of significance. Since the p-value = 0.0004 < 0.05, we reject the null. At the 5% level we conclude that the effects differ between new and old houses.
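The Wald test above is equivalent to the classic Chow F-statistic computed from pooled and group-specific residual sums of squares. A toy sketch with invented data (two groups with deliberately different slopes; the dataset and variable names are not from the tutorial):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

rng = np.random.default_rng(1)
n1, n2, k = 80, 80, 2          # two groups, k coefficients per equation
x1 = rng.normal(size=n1)
x2 = rng.normal(size=n2)
y1 = 1.0 + 0.5 * x1 + rng.normal(0, 0.3, n1)   # group 1: slope 0.5
y2 = 1.0 + 2.0 * x2 + rng.normal(0, 0.3, n2)   # group 2: slope 2.0

X1 = np.column_stack([np.ones(n1), x1])
X2 = np.column_stack([np.ones(n2), x2])
Xp = np.vstack([X1, X2])
yp = np.concatenate([y1, y2])

rss_pooled = rss(Xp, yp)                 # restricted: one equation for both groups
rss_split = rss(X1, y1) + rss(X2, y2)    # unrestricted: separate equations

# Chow F-statistic with (k, n1 + n2 - 2k) degrees of freedom
F = ((rss_pooled - rss_split) / k) / (rss_split / (n1 + n2 - 2 * k))
print(round(F, 1))  # far above the 5% critical value: reject equal coefficients
```

Because the two groups were generated with very different slopes, pooling them costs a large amount of fit, and the test rejects parameter equality, mirroring the conclusion drawn from the Wald test for new versus old houses.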