Fitness of Regression Line Test

Fit of the Regression Line

Basic Concepts

On this webpage, we show how to test the following null hypothesis:

H0: the regression line doesn’t capture the relationship between the variables

If we reject the null hypothesis, we conclude that the line is a good fit for the data. We now express the null hypothesis in a way that is more easily testable:

H0: \sigma^2_{Reg} ≤ \sigma^2_{Res}

As described in Two Sample Hypothesis Testing to Compare Variances, we can use the F test to compare the variances in two samples. To test the above null hypothesis we set F = MSReg/MSRes and use dfReg, dfRes degrees of freedom.
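As a rough illustration of how the statistic F = MSReg/MSRes can be computed directly from raw data, here is a minimal Python sketch for a regression with one independent variable. The function name `regression_f_statistic` and the sample data are hypothetical (they are not the data from Example 1), and the sketch assumes dfReg = 1 and dfRes = n – 2.

```python
def regression_f_statistic(x, y):
    """Return (F, df_reg, df_res) for a least-squares line fitted to y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, at least 3")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares slope and intercept
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    y_hat = [intercept + slope * xi for xi in x]

    # Sums of squares for the regression and the residuals
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

    # Degrees of freedom for one independent variable (an assumption here)
    df_reg, df_res = 1, n - 2
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return ms_reg / ms_res, df_reg, df_res


# Hypothetical data, chosen only to exercise the function
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
f_stat, df_reg, df_res = regression_f_statistic(x, y)
print(f"F = {f_stat:.3f} with ({df_reg}, {df_res}) degrees of freedom")
```

The resulting F value is then compared with the right-tail critical value of the F distribution with dfReg, dfRes degrees of freedom.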

Assumptions

The use of the linear regression model is based on the following assumptions:

  • Linearity of the phenomenon measured
  • Constant variance of the error term
  • Independence of the error terms
  • Normality of the error term distribution

In fact, the normality assumption is equivalent to the condition that the sample comes from a population with a bivariate normal distribution. See Multivariate Normal Distribution for more information about this distribution. The homogeneity of variance assumption is equivalent to the condition that for any values x1 and x2 of x, the variances of y at those values of x are equal, i.e.

\sigma^2_{y|x_1} = \sigma^2_{y|x_2}

Linear regression can be effective with a sample size as small as 20.

Example

Example 1: Test whether the regression line in Example 1 of Method of Least Squares is a good fit for the data.


Figure 1 – Goodness of fit of regression line for data in Example 1

We note that SST = DEVSQ(B4:B18) = 1683.7 and r = CORREL(A4:A18, B4:B18) = -0.713, and so by Property 3 of Regression Analysis, SSReg = r²·SST = (-0.713)²(1683.7) = 857.0 (computed with the unrounded value of r). By Property 1 of Regression Analysis, SSRes = SST – SSReg = 1683.7 – 857.0 = 826.7. From these values, it is easy to calculate MSReg = SSReg/dfReg = 857.0/1 = 857.0 and MSRes = SSRes/dfRes = 826.7/13 = 63.6.

We now calculate the test statistic F = MSReg/MSRes = 857.0/63.6 = 13.5. Since Fcrit = F.INV.RT(α, dfReg, dfRes) = F.INV.RT(.05, 1, 13) = 4.7 < 13.5 = F, we reject the null hypothesis, and so accept that the regression line is a good fit for the data (with 95% confidence). Note that the right-tailed function F.INV.RT is required here, since F.INV returns the left-tailed inverse. Alternatively, we note that p-value = F.DIST.RT(F, dfReg, dfRes) = F.DIST.RT(13.5, 1, 13) = 0.0028 < .05 = α, and so once again we reject the null hypothesis.
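The calculation above can be retraced outside of Excel. The following Python sketch uses only the standard library and takes SST = 1683.7, SSReg = 857.0 and n = 15 from the example. The helper `f_right_tail` is a hypothetical stand-in for F.DIST.RT: it evaluates the right-tail probability via the regularized incomplete beta function using Simpson's rule, which is an approximation chosen for brevity and assumes dfRes ≥ 2 and F > 0.

```python
import math


def f_right_tail(f, df1, df2, steps=2000):
    """Approximate P(F > f) for an F(df1, df2) distribution.

    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f),
    where I_x is the regularized incomplete beta function, integrated
    numerically by Simpson's rule (steps must be even).
    Assumes df2 >= 2 and f > 0 so the integrand stays bounded on [0, x].
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f)
    h = x / steps

    def g(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = g(0.0) + g(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    integral = total * h / 3.0

    # Beta function B(a, b) computed through log-gamma for stability
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)


# Values taken from the example in the text
sst = 1683.7
ss_reg = 857.0
n = 15
df_reg, df_res = 1, n - 2

ss_res = sst - ss_reg
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res

alpha = 0.05
p_value = f_right_tail(f_stat, df_reg, df_res)
print(f"F = {f_stat:.1f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Working with the p-value avoids having to invert the F distribution; the decision is the same as the one obtained by comparing F with the critical value.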

Observations

There are many ways of calculating SSReg, SSRes and SST. E.g. using the worksheet in Figure 1 of Regression Analysis, we note that SSReg = DEVSQ(K5:K19) and SSRes = DEVSQ(L5:L19). These formulas are valid since the means of the y values and ŷ values are equal by Property 5(b) of Regression Analysis.

Also by Definition 2 of Regression Analysis, SSRes = Σ(yi – ŷi)² = SUMXMY2(J5:J19, K5:K19). Finally, SST = DEVSQ(J5:J19), but alternatively SST = var(y) ∙ dfT = VAR(J5:J19) * (COUNT(J5:J19)-1).
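To see that these alternative formulas agree, the following Python sketch mimics the Excel functions DEVSQ, SUMXMY2 and VAR with small helper functions and checks the identities numerically. The helper names and the data are hypothetical; they do not reproduce the worksheet in Figure 1 of Regression Analysis.

```python
def devsq(values):
    """Sum of squared deviations from the mean (like Excel's DEVSQ)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sumxmy2(a, b):
    """Sum of squared pairwise differences (like Excel's SUMXMY2)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sample_var(values):
    """Sample variance (like Excel's VAR)."""
    return devsq(values) / (len(values) - 1)


# Hypothetical data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.7, 15.2, 16.9]

# Fit the least-squares line
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / devsq(x)
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

ss_reg = devsq(y_hat)        # valid because mean(y_hat) equals mean(y)
ss_res = devsq(residuals)    # valid because the residuals have mean zero
ss_res_alt = sumxmy2(y, y_hat)
sst = devsq(y)
sst_alt = sample_var(y) * (n - 1)

print(f"SSRes: {ss_res:.6f} vs {ss_res_alt:.6f}")
print(f"SST:   {sst:.6f} vs {sst_alt:.6f} vs {ss_reg + ss_res:.6f}")
```

Each pair of printed values should match up to floating-point rounding, and SSReg + SSRes should reproduce SST.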
