Method of Least Squares Using Excel

In Correlation we study the linear correlation between two random variables x and y. We now look at the line in the xy plane that best fits the data (x₁, y₁), …, (x_n, y_n).

Recall that the equation for a straight line is y = bx + a, where

b = the slope of the line
a = the y-intercept, i.e. the value of y where the line intersects the y-axis

For our purposes, we write the equation of the best fit line as

$$\hat{y} = b(x - \bar{x}) + c$$

where c is the value of ŷ at x = x̄, and so the y-intercept is

$$a = c - b\bar{x}$$

For each i, we define ŷ_i as the value on this line corresponding to x_i, and so

$$\hat{y}_i = b x_i + a$$

Intuitively, the best fit line is the line that keeps the vertical distances between the n data points and the line as small as possible. A mathematically convenient way to make this precise is to find the line for which the sum of the squared vertical distances

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

is a minimum.
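To make the criterion concrete, here is a minimal Python sketch that evaluates the sum of squares Σ(y_i − ŷ_i)² for a candidate line ŷ = bx + a and then runs a crude grid search for the candidate with the smallest sum. The data values and the helper name `sum_of_squares` are hypothetical, chosen only for illustration; a grid search is not how the best fit line is actually computed, it merely shows the criterion at work.

```python
# Hypothetical illustration data; not the data used in Example 1.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances (y_i - yhat_i)^2 for the line yhat = b*x + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Crude brute-force search: try intercepts a = 0.0, 0.1, ..., 4.0 and
# slopes b = 0.00, 0.05, ..., 1.50, keeping the pair with the smallest sum of squares.
candidates = [(i * 0.1, j * 0.05) for i in range(41) for j in range(31)]
best_a, best_b = min(candidates, key=lambda ab: sum_of_squares(xs, ys, *ab))

print(f"a ≈ {best_a:.2f}, b ≈ {best_b:.2f}, "
      f"sum of squares ≈ {sum_of_squares(xs, ys, best_a, best_b):.2f}")
```

For this data the grid minimum lands at a ≈ 2.20 and b ≈ 0.60 with a sum of squares of about 2.40; every other candidate on the grid gives a larger sum.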

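Theorem 1: The best fit line for the points (x₁, y₁), …, (x_n, y_n) is given by

$$\hat{y} - \bar{y} = b(x - \bar{x})$$

where

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$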

Click here for the proof of Theorem 1. Two proofs are given, one of which does not use calculus.
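A minimal Python sketch of the slope formula in Theorem 1: compute b from the deviation sums and use the line ŷ = ȳ + b(x − x̄). The function names and data are hypothetical. The loop at the end only checks that, among lines through (x̄, ȳ), nudging the slope away from the Theorem 1 value increases the sum of squares; it is a sanity check, not a proof.

```python
from statistics import fmean


def theorem1_slope(xs, ys):
    """Slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2) from Theorem 1."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    return sxy / sxx


def sum_of_squares_through_means(xs, ys, b):
    """Sum of squares for the line yhat = y_bar + b*(x - x_bar)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    return sum((y - (y_bar + b * (x - x_bar))) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b = theorem1_slope(xs, ys)
print("b =", b)

# Among lines through (x_bar, y_bar), nudging the slope either way
# increases the sum of squares.
for other in (b - 0.1, b + 0.1):
    assert sum_of_squares_through_means(xs, ys, other) > sum_of_squares_through_means(xs, ys, b)
```

For this data the script prints b = 0.6.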

Definition 1: The best fit line is called the regression line.

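Observation: The theorem shows that the regression line passes through the point (x̄, ȳ) and has the equation

$$\hat{y} = bx + a$$

where the slope is

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and the y-intercept is

$$a = \bar{y} - b\bar{x}$$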
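The observation can be checked numerically. This Python sketch (hypothetical data and helper name) computes the slope b from the deviation sums, converts the line to slope-intercept form using a = ȳ − bx̄, and confirms that ŷ = bx + a passes through (x̄, ȳ).

```python
from statistics import fmean


def regression_line(xs, ys):
    """Return (a, b) for yhat = b*x + a, with a = y_bar - b*x_bar."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_line(xs, ys)
x_bar, y_bar = fmean(xs), fmean(ys)
print(f"yhat = {b:.2f}x + {a:.2f}")  # slope-intercept form
print("passes through (x_bar, y_bar):", abs(b * x_bar + a - y_bar) < 1e-12)
```

For this data the line is ŷ = 0.60x + 2.20, and the check prints True.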

Note too that b = cov(x,y)/var(x). Since the same factor (1/n for the population versions, 1/(n − 1) for the sample versions) appears in both the covariance and the variance, it cancels, so this ratio can be computed with either the population or the sample covariance and variance. Thus a and b can be calculated in Excel as follows, where R1 = the array of y values and R2 = the array of x values:

b = SLOPE(R1, R2) = COVAR(R1, R2) / VARP(R2)

a = INTERCEPT(R1, R2) = AVERAGE(R1) - b * AVERAGE(R2)

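Property 1:

$$b = r\,\frac{s_y}{s_x}$$

Proof: By Definition 2 of Correlation,

$$r = \frac{\operatorname{cov}(x,y)}{s_x s_y}$$

and so by the above observation we have

$$b = \frac{\operatorname{cov}(x,y)}{s_x^2} = \frac{r\,s_x s_y}{s_x^2} = r\,\frac{s_y}{s_x}$$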

Excel Functions: Excel provides the following functions for forecasting the value of y for any x based on the regression line. Here R1 = the array of y data values and R2 = the array of x data values:

SLOPE(R1, R2) = slope of the regression line as described above

INTERCEPT(R1, R2) = y-intercept of the regression line as described above

FORECAST(x, R1, R2) calculates the predicted value y for the given value of x. Thus FORECAST(x, R1, R2) = a + b * x where a = INTERCEPT(R1, R2) and b = SLOPE(R1, R2).

TREND(R1, R2) = array function which produces an array of predicted y values corresponding to x values stored in array R2, based on the regression line calculated from x values stored in array R2 and y values stored in array R1.

TREND(R1, R2, R3) = array function which predicts the y values corresponding to the x values in R3, using the regression line computed from the x values stored in array R2 and the y values stored in array R1.

To use TREND(R1, R2), highlight the range where you want to store the predicted values of y. Then enter TREND and a left parenthesis. Next, highlight the array of observed values for y (array R1), enter a comma, and highlight the array of observed values for x (array R2), followed by a right parenthesis. Finally, press Ctrl-Shift-Enter.

To use TREND(R1, R2, R3), highlight the range where you want to store the predicted values of y. Then enter TREND and a left parenthesis. Next, highlight the array of observed values for y (array R1), enter a comma, and highlight the array of observed values for x (array R2). Enter another comma and highlight the array R3 containing the x values for which you want to predict y values based on the regression line. Finally, enter a right parenthesis and press Ctrl-Shift-Enter.

Excel 2016 Function: Excel 2016 introduces a new function FORECAST.LINEAR, which is equivalent to FORECAST.

Example 1: Calculate the regression line for the data in Example 1 of One Sample Hypothesis Testing for Correlation and plot the results.

Figure 1 – Fitting a regression line to the data in Example 1

Using Theorem 1 and the observation following it, we can calculate the slope b and y-intercept a of the regression line that best fits the data, as shown in Figure 1. Using Excel’s charting capabilities, we can also plot the scatter diagram for the data in columns A and B, then select Layout > Analysis|Trendline and choose a Linear Trendline from the list of options. This displays the regression line y = bx + a on the chart (see Figure 1).
