Introduction to Regression Analysis with Examples

This article introduces regression analysis and works through an example.

In studying relationships between two variables, collect the data and then construct a scatter plot. The purpose of the scatter plot is to determine the nature of the relationship between the variables. The possibilities include a positive linear relationship, a negative linear relationship, a curvilinear relationship, or no discernible relationship. After the scatter plot is drawn and a linear relationship is determined, the next steps are to compute the value of the correlation coefficient and to test the significance of the relationship. If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data’s line of best fit. (Note: Determining the regression line when r is not significant and then making predictions using the regression line are meaningless.) The purpose of the regression line is to enable the researcher to see the trend and make predictions on the basis of the data.
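Here is a rough sketch of the two steps that follow the scatter plot. The Python below computes the correlation coefficient r from the same column sums that are later used for the regression line. It also computes the usual t statistic for testing the significance of r, t = r√((n − 2)/(1 − r²)), which has n − 2 degrees of freedom. The function name `correlation_and_t` and the five data points are hypothetical and serve only as an illustration.

```python
import math

def correlation_and_t(x, y):
    """Return Pearson's r and the test statistic t = r * sqrt((n - 2) / (1 - r^2))."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    r = numerator / denominator
    # t has n - 2 degrees of freedom; it is undefined when r is exactly +1 or -1.
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t

# Hypothetical toy data, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 6, 9]
r, t = correlation_and_t(x, y)
print(f"r = {r:.3f}, t = {t:.3f}")
```

For this toy data, r ≈ 0.981 and t ≈ 8.878. The two-tailed critical value for 3 degrees of freedom at α = 0.05 is 3.182. Because t exceeds it, r would be judged significant, and computing a regression line would be justified.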

Line of Best Fit

Figure 1 shows a scatter plot for the data of two variables. It shows that several lines can be drawn on the graph near the points. Given a scatter plot, you must be able to draw the line of best fit. Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.

The difference between the actual value y and the predicted value y′ (that is, the vertical distance) is called a residual or a prediction error. Residuals are used to determine the line that best describes the relationship between the two variables.

The method that makes the sum of the squared residuals as small as possible is called the method of least squares. For this reason, the regression line is also called the least-squares regression line.
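The least-squares idea can be checked numerically. This minimal sketch uses hypothetical helper names and toy data. For these five points, the least-squares line is a = −0.1 and b = 1.7. Nudging either the intercept or the slope away from those values makes the sum of the squared residuals larger.

```python
def residuals(x, y, a, b):
    """Residuals y - y' for the line y' = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def sum_of_squared_residuals(x, y, a, b):
    """The quantity that the method of least squares minimizes."""
    return sum(e ** 2 for e in residuals(x, y, a, b))

# Hypothetical toy data; a = -0.1, b = 1.7 is its least-squares line.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 6, 9]
best = sum_of_squared_residuals(x, y, -0.1, 1.7)

# Nudge the intercept or the slope and compare the sums of squares.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    other = sum_of_squared_residuals(x, y, -0.1 + da, 1.7 + db)
    print(f"a = {-0.1 + da:.2f}, b = {1.7 + db:.2f}: {other:.4f} vs {best:.4f}")
```

For the least-squares line, the residuals also sum to zero, and the sum of their squares is 1.10.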

You need a line of best fit because the values of y will be predicted from the values of x. The closer the points are to the line, the better the fit and the better the predictions will be. See Figure 2. When r is positive, the line slopes upward from left to right. When r is negative, the line slopes downward from left to right.
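A standard identity, which the original text does not state, explains why the slope follows the sign of r. The least-squares slope b equals r multiplied by the ratio of the sample standard deviations s_y and s_x:

$$b = r\,\frac{s_y}{s_x}$$

Standard deviations are positive, so b and r always have the same sign.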

Figure 1: Scatter Plot with Three Lines Fit to the Data
Figure 2: Line of Best Fit for a Set of Data Points

Determination of the Regression Line Equation

In algebra, the equation of a line is usually given as y = mx + b, where m is the slope of the line and b is the y intercept. (Students who need an algebraic review of the properties of a line should refer to the online resources before studying this section.) In statistics, the equation of the regression line is written as y′ = a + bx, where a is the y′ intercept and b is the slope of the line.

Figure 3: A Line as Represented in Algebra and in Statistics

There are several methods for finding the equation of the regression line. Two formulas are given here. These formulas use the same values that are used in computing the value of the correlation coefficient. The mathematical development of these formulas is beyond the scope of this book.

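Formulas for the Regression Line y′ = a + bx

$$a = \frac{(\Sigma y)(\Sigma x^2) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^2) - (\Sigma x)^2}$$

$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2}$$

where a is the y′ intercept and b is the slope of the line.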

Rounding Rule for the Intercept and Slope: Round the values of a and b to three decimal places.

The steps for finding the regression line equation are summarized in this Procedure Table.

Procedure Table: Finding the Regression Line Equation

Step 1: Make a table, as shown in step 2.

Step 2: Find the values of xy, x², and y². Place them in the appropriate columns and sum each column.

x       y       xy       x²       y²
.       .       .        .        .
.       .       .        .        .
.       .       .        .        .
Σx =    Σy =    Σxy =    Σx² =    Σy² =

Step 3: When r is significant, substitute the sums into the formulas to find the values of a and b for the regression line equation y′ = a + bx.
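Steps 1 to 3 can be sketched in Python as follows. The sketch assumes the data arrive as two equal-length lists and that r has already been found significant. The helper names `regression_table` and `regression_line` and the toy data are hypothetical.

```python
def regression_table(x, y):
    """Steps 1-2: build the x, y, xy, x^2, y^2 columns and sum each column."""
    rows = [(xi, yi, xi * yi, xi ** 2, yi ** 2) for xi, yi in zip(x, y)]
    sums = [sum(column) for column in zip(*rows)]
    return rows, sums

def regression_line(n, sum_x, sum_y, sum_xy, sum_x2):
    """Step 3: substitute the sums into the formulas for a and b,
    rounding to three decimal places as the rounding rule requires."""
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return round(a, 3), round(b, 3)

# Hypothetical toy data (r is significant for these points).
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 6, 9]
rows, (sum_x, sum_y, sum_xy, sum_x2, sum_y2) = regression_table(x, y)
print("x y xy x^2 y^2")
for row in rows:
    print(*row)
print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
a, b = regression_line(len(x), sum_x, sum_y, sum_xy, sum_x2)
print(f"y' = {a} + {b}x")
```

The column sum Σy² is not needed for a and b. It is kept in the table because the correlation coefficient uses it.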

EXAMPLE 1: Car Rental Companies

Find the equation of the regression line for the car rental company data, and graph the line on the scatter plot of the data.

SOLUTION

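The values needed for the equation are n = 6, Σx = 153.8, Σy = 18.7, Σxy = 682.77, and Σx² = 5859.26. Substituting in the formulas, you get

$$a = \frac{(18.7)(5859.26) - (153.8)(682.77)}{(6)(5859.26) - (153.8)^2} = \frac{4558.136}{11501.12} \approx 0.396$$

$$b = \frac{(6)(682.77) - (153.8)(18.7)}{(6)(5859.26) - (153.8)^2} = \frac{1220.56}{11501.12} \approx 0.106$$

Hence, the equation of the regression line y′ = a + bx is

$$y' = 0.396 + 0.106x$$

To graph the line, select any two x values and find the corresponding y′ values. Use any x values between 10 and 60. For example, let x = 15. Substitute it into the equation to find the corresponding y′ value:

$$y' = 0.396 + 0.106(15) = 1.986$$

Let x = 40; then

$$y' = 0.396 + 0.106(40) = 4.636$$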

Then plot the two points (15, 1.986) and (40, 4.636) and draw a line connecting them. See Figure 4.

Figure 4: Regression Line for Example 1
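The hand calculation in Example 1 can be reproduced from the sums alone. This sketch uses a hypothetical `regression_line` helper that implements the two formulas and applies the three-decimal rounding rule. It takes Σy = 18.7, which is the value for which the formulas give a = 0.396 and b = 0.106, and hence the plotted points (15, 1.986) and (40, 4.636).

```python
def regression_line(n, sum_x, sum_y, sum_xy, sum_x2):
    """Intercept a and slope b of y' = a + bx, rounded to three decimals."""
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return round(a, 3), round(b, 3)

# Sums for Example 1 (Car Rental Companies).
a, b = regression_line(n=6, sum_x=153.8, sum_y=18.7, sum_xy=682.77, sum_x2=5859.26)

# Two points for drawing the line, using x = 15 and x = 40 as in the text.
points = [(x, round(a + b * x, 3)) for x in (15, 40)]
print(f"y' = {a} + {b}x")
print(points)
```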
