7 Assumptions of Linear regression using Stata

There are seven “assumptions” that underpin linear regression. If any of these seven assumptions is not met, you cannot rely on a linear regression analysis of your data, because the result may not be valid. Since assumptions #1 and #2 relate to your choice of variables, they cannot be tested using Stata. However, you should decide whether your study meets these assumptions before moving on.

  • Assumption #1: Your dependent variable should be measured at the continuous level. Examples of such continuous variables include height (measured in feet and inches), temperature (measured in °C), salary (measured in US dollars), revision time (measured in hours), intelligence (measured using IQ score), reaction time (measured in milliseconds), test performance (measured from 0 to 100), sales (measured in number of transactions per month), and so forth. If you are unsure whether your dependent variable is continuous (i.e., measured at the interval or ratio level), see our Types of Variable guide.
  • Assumption #2: Your independent variable should be measured at the continuous or categorical level. However, if you have a categorical independent variable, it is more common to use an independent-samples t-test (for 2 groups) or a one-way ANOVA (for 3 or more groups). In case you are unsure, examples of categorical variables include gender (e.g., 2 groups: male and female), ethnicity (e.g., 3 groups: Caucasian, African American and Hispanic), physical activity level (e.g., 4 groups: sedentary, low, moderate and high), and profession (e.g., 5 groups: surgeon, doctor, nurse, dentist, therapist). In this guide, we show you the linear regression procedure and Stata output when both your dependent and independent variables are measured at the continuous level.

Fortunately, you can check assumptions #3, #4, #5, #6 and #7 using Stata. We suggest testing them in this order, because if a violation of an earlier assumption cannot be corrected, you will not be able to use linear regression at all. Do not be surprised if your data fails one or more of these assumptions. This is fairly typical of real-world data, whereas textbook examples often only show you how to carry out linear regression when everything goes well. However, don’t worry: even when your data fails certain assumptions, there is often a solution (e.g., transforming your data or using another statistical test instead). Just remember that if you do not check that your data meets these assumptions, or you test for them incorrectly, the results you get when running linear regression might not be valid.

  • Assumption #3: There needs to be a linear relationship between the dependent and independent variables. Whilst there are a number of ways to check whether a linear relationship exists between your two variables, we suggest creating a scatterplot in Stata that plots the dependent variable against the independent variable. You can then visually inspect the scatterplot to check for linearity. Your scatterplot may look something like one of the following:
    If the relationship displayed in your scatterplot is not linear, you will have to either run a non-linear regression analysis or “transform” your data, which you can do using Stata.
  • Assumption #4: There should be no significant outliers. Outliers are simply single data points within your data that do not follow the usual pattern (e.g., in a study of 100 students’ IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The following scatterplots highlight the potential impact of outliers:
    The problem with outliers is that they can have a disproportionate effect on the regression equation used to predict the value of the dependent variable from the independent variable. This changes the output that Stata produces and can reduce the predictive accuracy of your results. Fortunately, you can use Stata to carry out casewise diagnostics (e.g., standardized residuals saved with predict after regress) to help you detect possible outliers.
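The checks for assumptions #3 and #4 can be made concrete with a short calculation. In Stata, the scatterplot with a fitted line can be drawn with something like `twoway (scatter y x) (lfit y x)`. Standardized residuals can be saved after `regress y x` with `predict r, rstandard`, where `y`, `x` and `r` are placeholder variable names. The plain-Python sketch below mirrors that second step using made-up toy numbers. It fits the least-squares line and computes standardized (internally studentized) residuals. It then flags points whose absolute value exceeds 3, which is a common rule of thumb rather than a fixed rule. The function names are our own, hypothetical choices.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def standardized_residuals(x, y):
    """Residual e_i divided by s * sqrt(1 - h_i), where s is the residual
    standard error and h_i is the leverage of observation i."""
    n = len(x)
    b0, b1 = fit_simple_ols(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom: one each for the intercept and the slope.
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    # Leverage in simple regression: h_i = 1/n + (x_i - mean_x)^2 / Sxx.
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]


def flag_outliers(x, y, cutoff=3.0):
    """Indices of observations whose |standardized residual| exceeds cutoff."""
    return [i for i, r in enumerate(standardized_residuals(x, y)) if abs(r) > cutoff]


# Made-up toy data: y = 2x + 1 exactly, except one point pushed up by 30.
x = list(range(1, 21))
y = [2 * xi + 1 for xi in x]
y[9] += 30  # hypothetical outlier at x = 10
print(flag_outliers(x, y))  # → [9]
```

In simple regression these residuals can never exceed sqrt(n − 2) in absolute value. A cutoff of 3 therefore cannot flag anything when you have 11 or fewer observations. A flagged point is a candidate for closer inspection, not automatically an error.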

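  • Assumption #5: You should have independence of observations (i.e., the residuals should be independent of one another). If your observations have a natural order, such as data collected over time, you can check for first-order autocorrelation in the residuals using the Durbin-Watson statistic. Stata reports this statistic with estat dwatson after regress, once the data have been declared as time series with tsset.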
  • Assumption #6: Your data needs to show homoscedasticity, which is where the spread of the data points around the line of best fit remains similar as you move along the line. The two scatterplots below provide simple examples, one of data that meets this assumption and one of data that fails it:

    When you analyse your own data, you will be lucky if your scatterplot looks like either of the two above. Whilst these help to illustrate the difference between data that meets and data that violates the assumption of homoscedasticity, real-world data is often a lot messier. You can check whether your data shows homoscedasticity by plotting the regression residuals against the predicted (fitted) values, for example with rvfplot after regress.

  • Assumption #7: Finally, you need to check that the residuals (errors) of the regression line are approximately normally distributed. Two common ways to check this assumption are a histogram of the residuals with a superimposed normal curve (histogram with the normal option in Stata) and a normal P-P plot (pnorm in Stata).
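For assumption #5, the Durbin-Watson statistic is simple enough to compute by hand, which helps when interpreting the number Stata prints. It is d = Σ(e_t − e_{t−1})² / Σe_t², where the e_t are the regression residuals taken in observation order. Values near 2 suggest no first-order autocorrelation. Values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation. The plain-Python sketch below assumes you already have the residuals in time order. The function name and the example residual series are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation (e.g. time) order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Residual sum of squares.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that flip sign every observation (typical of negative autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # → 3.0
# Made-up residuals that change sign slowly (typical of positive autocorrelation).
print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # → 1.0
```

The statistic depends on the order of the rows, so it is only meaningful when that order carries information. This is why Stata's estat dwatson requires the data to be tsset first.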

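A residual-versus-fitted plot (rvfplot in Stata) is the visual check for assumption #6. A formal test is also available through `estat hettest` after `regress`. To sketch the idea behind such tests, the plain-Python code below computes the n·R² (Koenker) form of the Breusch-Pagan statistic. It regresses the squared residuals on the independent variable and compares n·R² with a chi-squared distribution with 1 degree of freedom. Stata's default `estat hettest` uses a version that additionally assumes normal errors, so its statistic will generally differ from this one. The function names and the toy data are our own, made-up examples.

```python
import math


def simple_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """R-squared of the simple regression of y on x (y must not be constant)."""
    b0, b1 = simple_fit(x, y)
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / sst


def breusch_pagan_nr2(x, y):
    """Return (LM statistic, p-value) for the n * R^2 Breusch-Pagan test."""
    b0, b1 = simple_fit(x, y)
    squared_resid = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression: squared residuals on the independent variable.
    lm = len(x) * r_squared(x, squared_resid)
    # Upper tail of chi-squared with 1 df: P(X > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Made-up data whose spread around the line y = 2x grows with x.
x = [v for v in range(1, 11) for _ in range(2)]
y = [2 * xi + (xi if i % 2 == 0 else -xi) for i, xi in enumerate(x)]
lm, p = breusch_pagan_nr2(x, y)
print(f"LM = {lm:.2f}, p = {p:.2g}")
```

A small p-value (for example, below 0.05) is evidence against constant variance. A large p-value only means the test found no such evidence; it does not prove homoscedasticity.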
In practice, checking assumptions #3, #4, #5, #6 and #7 will probably take up most of your time when carrying out linear regression. However, it is not a difficult task, and Stata provides all the tools you need to do it.
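For assumption #7, Stata's histogram command with the normal option overlays a normal curve on the residuals, and pnorm draws a standardized normal probability (P-P) plot. swilk and sktest provide formal tests. To show what a moment-based normality check measures, the plain-Python sketch below computes the sample skewness, the kurtosis and the Jarque-Bera statistic of a set of residuals. Jarque-Bera is swapped in here because it is easy to compute by hand. It is not the statistic that sktest reports. The function name and the example residuals are made up.

```python
import math


def jarque_bera(residuals):
    """Return (skewness, kurtosis, JB statistic, p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments, dividing by n (population-style moments).
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Upper tail of chi-squared with 2 df: P(X > jb) = exp(-jb / 2).
    p_value = math.exp(-jb / 2)
    return skewness, kurtosis, jb, p_value


# Made-up, perfectly symmetric residuals: skewness is 0, kurtosis is 1.7.
skew, kurt, jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(skew, 4), round(kurt, 4), round(jb, 4))  # → 0.0 1.7 0.3521
```

The chi-squared p-value is only a large-sample approximation. With few observations, treat it as a rough guide and rely mainly on the histogram and P-P plot.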

In the Procedure section, we illustrate the Stata procedure required to perform linear regression, assuming that none of the assumptions has been violated. First, we set out the example we use to explain the linear regression procedure in Stata.