Introduction to Regression Analysis with Examples
In studying relationships between two variables, collect the data and then construct a scatter plot. The purpose of the scatter plot is to determine the nature of the relationship between the variables. The possibilities include a positive linear relationship, a negative linear relationship, a curvilinear relationship, or no discernible relationship. After the scatter plot is drawn and a linear relationship is determined, the next steps are to compute the value of the correlation coefficient and to test the significance of the relationship. If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data’s line of best fit. (Note: Determining the regression line when r is not significant and then making predictions using the regression line are meaningless.) The purpose of the regression line is to enable the researcher to see the trend and make predictions on the basis of the data.
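As a rough illustration of the workflow just described, the sketch below computes r for a small made-up data set and then a test statistic for its significance. The data values and function names are hypothetical. The t test with n − 2 degrees of freedom is an assumption here, since this section does not say which significance test is used.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson r computed from the sums of x, y, xy, x^2, and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def t_statistic(r, n):
    """Assumed test: t = r * sqrt((n - 2) / (1 - r^2)), d.f. = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical data, not taken from the text.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

r = correlation_coefficient(xs, ys)
t = t_statistic(r, len(xs))
print(f"r = {r:.3f}, t = {t:.3f}")
```

The computed t would then be compared with a critical value from a t table at the chosen significance level. Only if the relationship is significant is it worthwhile to go on and find the regression line.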
Line of Best Fit
Figure 1 shows a scatter plot for the data of two variables. It shows that several lines can be drawn on the graph near the points. Given a scatter plot, you must be able to draw the line of best fit. Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.
The difference between the actual value y and the predicted value yʹ (that is, the vertical distance from the point to the line) is called a residual or a prediction error. Residuals are used to determine the line that best describes the relationship between the two variables.
The method used for making the residuals as small as possible is called the method of least squares. As a result of this method, the regression line is also called the least-squares regression line.
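The idea of "best fit" can be made concrete with a short sketch that adds up the squared residuals for a few candidate lines. The data set and the candidate lines are hypothetical, chosen only so that the arithmetic is easy to follow. The function name is likewise an illustration, not something defined in the text.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Sum of (y - y')^2, where y' = a + b*x is the predicted value."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the text.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# Candidate lines as (a, b) pairs. The first is the least-squares line
# for this toy data; the other two are arbitrary nearby lines.
candidates = [(0.0, 1.9), (0.0, 2.0), (0.5, 1.8)]

sse_values = [sum_squared_residuals(xs, ys, a, b) for a, b in candidates]
for (a, b), sse in zip(candidates, sse_values):
    print(f"y' = {a} + {b}x  ->  sum of squared residuals = {sse:.2f}")
```

All three lines pass near the points, but the least-squares line gives the smallest sum of squared residuals, which is what the method of least squares requires.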
The reason you need a line of best fit is that the values of y will be predicted from the values of x; hence, the closer the points are to the line, the better the fit and the prediction will be. See Figure 2. When r is positive, the line slopes upward from left to right. When r is negative, the line slopes downward from left to right.
Determination of the Regression Line Equation
In algebra, the equation of a line is usually given as y = mx + b, where m is the slope of the line and b is the y intercept. (Students who need an algebraic review of the properties of a line should refer to the online resources before studying this section.) In statistics, the equation of the regression line is written as yʹ = a + bx, where a is the yʹ intercept and b is the slope of the line. Note that b plays a different role in the two forms: it is the intercept in the algebraic equation but the slope in the regression equation.
There are several methods for finding the equation of the regression line. Two formulas are given here. These formulas use the same values that are used in computing the value of the correlation coefficient. The mathematical development of these formulas is beyond the scope of this book.
Formulas for the Regression Line yʹ = a + bx
a = [(Σy)(Σx²) − (Σx)(Σxy)] / [n(Σx²) − (Σx)²]
b = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²]
where a is the yʹ intercept and b is the slope of the line.
Rounding Rule for the Intercept and Slope: Round the values of a and b to three decimal places.
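For readers who want to check their arithmetic, here is a minimal sketch of the two formulas as a function of the sums. It uses the standard least-squares expressions for a and b, which are assumed to be the two formulas this section refers to. The function name and the sample sums are hypothetical.

```python
def regression_line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for y' = a + b*x, rounded to three decimal places."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return round(a, 3), round(b, 3)


# Hypothetical sums for five points lying exactly on y = 1 + 2x:
# x = 1, 2, 3, 4, 5 and y = 3, 5, 7, 9, 11.
a, b = regression_line_from_sums(n=5, sum_x=15, sum_y=35, sum_xy=125, sum_x2=55)
print(f"y' = {a} + {b}x")
```

Both formulas share the same denominator, so it is computed once. Because the sample points lie exactly on a line, the function should recover that line's intercept and slope.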
The steps for finding the regression line equation are summarized in this Procedure Table.
Procedure Table Finding the Regression Line Equation
Step 1: Make a table with columns for x, y, xy, x², and y².
Step 2: Find the values of xy, x², and y². Place them in the appropriate columns and sum each column.
Step 3: When r is significant, substitute in the formulas to find the values of a and b for the regression line equation yʹ = a + bx.
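The three steps can be sketched in code as follows. This is only an illustration under stated assumptions: the data are made up, the function names are hypothetical, and the significance check on r mentioned in step 3 is assumed to have been done separately.

```python
def build_table(xs, ys):
    """Steps 1 and 2: one row per data pair, columns x, y, xy, x^2, y^2."""
    return [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]


def column_sums(table):
    """Step 2: sum each column of the table."""
    return tuple(sum(column) for column in zip(*table))


def regression_line(xs, ys):
    """Step 3: substitute the sums into the formulas for a and b.

    Assumes r has already been found to be significant.
    """
    n = len(xs)
    sum_x, sum_y, sum_xy, sum_x2, _sum_y2 = column_sums(build_table(xs, ys))
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return round(a, 3), round(b, 3)


# Hypothetical data, not taken from the text.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

table = build_table(xs, ys)
sums = column_sums(table)
a, b = regression_line(xs, ys)

for row in table:
    print(row)
print("sums:", sums)
print(f"y' = {a} + {b}x")
```

The y² column is not used in the formulas for a and b; it is kept in the table because the same table serves for computing r.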
EXAMPLE 1: Car Rental Companies
Find the equation of the regression line for the car rental company data, and graph the line on the scatter plot of the data.
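The values needed for the equation are n = 6, Σx = 153.8, Σy = 18.7, Σxy = 682.77, and Σx² = 5859.26. Substituting in the formulas, you get
a = [(Σy)(Σx²) − (Σx)(Σxy)] / [n(Σx²) − (Σx)²] = [(18.7)(5859.26) − (153.8)(682.77)] / [(6)(5859.26) − (153.8)²] = 0.396
b = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²] = [(6)(682.77) − (153.8)(18.7)] / [(6)(5859.26) − (153.8)²] = 0.106
Hence, the equation of the regression line yʹ = a + bx is
yʹ = 0.396 + 0.106x
To graph the line, select any two points for x and find the corresponding values for y. Use any x values between 10 and 60. For example, let x = 15. Substitute in the equation and find the corresponding yʹ value.
yʹ = 0.396 + 0.106(15) = 1.986
Let x = 40, then
yʹ = 0.396 + 0.106(40) = 4.636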