Least Squares Method in Excel. Regression analysis

The least squares method (least squares) is in the field of regression analysis. It has many uses, as it allows an approximate representation of a given function by other simpler ones. OLS can be extremely useful in processing observations, and it is actively used to estimate some values ​​from the results of measurements of others containing random errors. In this article, you'll learn how to implement least squares calculations in Excel.

Statement of the problem on a concrete example

Suppose there are two indicators X and Y. Moreover, Y depends on X. Since MNCs are of interest to us from the point of view of regression analysis (in Excel its methods are implemented using built-in functions), we should immediately proceed to consider a specific problem.

So, let X be the sales area of ​​the grocery store, measured in square meters, and Y the annual turnover, defined in millions of rubles.

It is required to make a forecast what kind of goods turnover (Y) the store will have if it has one or another retail space. Obviously, the function Y = f (X) is increasing, since the hypermarket sells more goods than the stall.

A few words about the correctness of the source data used for prediction

Suppose we have a table built according to data for n stores.

X

x 1

x 2

...

x n

Y

y 1

y 2

...

y n

According to mathematical statistics, the results will be more or less correct if the data on at least 5-6 objects are examined. In addition, “abnormal” results cannot be used. In particular, an elite small boutique can have turnover many times greater than the turnover of large retail outlets of the “masmarket” class.

The essence of the method

The table data can be represented on the Cartesian plane as points M 1 (x 1 , y 1 ), ... M n (x n , y n ). Now the solution of the problem is reduced to the selection of the approximating function y = f (x), which has a graph passing as close as possible to the points M 1, M 2, .. M n .

Of course, it is possible to use a polynomial of a high degree, but this option is not only difficult to implement, but simply incorrect, as it will not reflect the main trend that needs to be detected. The most reasonable solution is to find the straight line y = ax + b, which best approximates the experimental data, and more precisely, the coefficients a and b.

regression model example

Accuracy rating

In any approximation, the estimation of its accuracy is of particular importance. We denote by e i the difference (deviation) between the functional and experimental values ​​for the point x i , i.e., e i = y i - f (x i ).

Obviously, to estimate the accuracy of approximation, the sum of deviations can be used, i.e., when choosing a straight line for an approximate representation of the dependence of X on Y, one should give preference to the one with the smallest value of the sum e i at all points under consideration. However, not everything is so simple, since along with positive deviations practically negative ones will also be present.

You can solve the problem using the modules of deviations or their squares. The latter method was most widely used. It is used in many areas, including regression analysis (in Excel, it is implemented using two built-in functions), and has long been proven to be effective.

Least square method

In Excel, as you know, there is a built-in auto-sum function that allows you to calculate the values ​​of all values ​​located in the selected range. Thus, nothing will stop us from calculating the value of the expression (e 1 2 + e 2 2 + e 3 2 + ... e n 2 ).

In a mathematical notation, this has the form:

regression model example

Since the decision was first made to approximate using a straight line, we have:

Excel Formulas for Dummies

Thus, the problem of finding the line that best describes the specific dependence of the quantities X and Y comes down to calculating the minimum function of two variables:

using functions in Excel

To do this, we need to set partial derivatives with respect to the new variables a and b to zero, and solve a primitive system consisting of two equations with 2 unknowns of the form:

regression analysis in Excel

After some simple transformations, including dividing by 2 and manipulating the sums, we get:

MNCs in Excel

Solving it, for example, using the Cramer method, we obtain a stationary point with certain coefficients a * and b * . This is the minimum, i.e., to predict what the store’s turnover will be for a certain area, the straight line y = a * x + b * is suitable, which is a regression model for the example in question. Of course, it will not allow you to find the exact result, but it will help to get an idea of ​​whether the purchase will pay off on credit of the store of a particular area.

How to implement a least squares method in Excel

In "Excel" there is a function for calculating the value by OLS. It has the following form: “TREND” (known values ​​of Y; known values ​​of X; new values ​​of X; const.). We apply the formula for calculating OLS in Excel to our table.

To do this, in the cell in which the result of the calculation by the least squares method is to be displayed in Excel, we enter the “=” sign and select the “TREND” function. In the window that opens, fill in the appropriate fields, highlighting:

  • the range of known values ​​for Y (in this case, data for turnover);
  • range x 1 , ... x n , i.e., the size of the retail space;
  • both known and unknown values ​​of x, for which you need to find out the size of the turnover (information on their location on the worksheet see below).

In addition, in the formula there is a logical variable "Const". If you enter 1 in the corresponding field, then this will mean that calculations should be carried out, assuming that b = 0.

If you need to find out the forecast for more than one value of x, then after entering the formula, you should not click on “Enter”, but rather type on the keyboard the combination “Shift” + “Control” + “Enter”.

Some features

Regression analysis may be available even to dummies. The Excel formula for predicting the value of an array of unknown variables - "TREND" - can be used even by those who have never heard of the least squares method. Just know some of the features of her work. In particular:

  • If you place the range of known values ​​of the variable y in one row or column, then each row (column) with known values ​​of x will be perceived by the program as a separate variable.
  • If the “TREND” window does not specify a range with known x, then if you use the function in Excel, the program will consider it as an array of integers, the number of which corresponds to the range with the given values ​​of the variable y.
  • In order to get an array of “predicted” values ​​at the output, the expression for calculating the trend needs to be entered as an array formula.
  • If new x values ​​are not specified, then the TREND function considers them equal to known. If they are not specified, then array 1 is taken as an argument; 2; 3; 4; ..., which is commensurate with the range with parameters y already set.
  • A range containing new x values ​​must consist of the same or more rows or columns as a range with given y values. In other words, it must be a proportionate independent variable.
  • An array with known x values ​​may contain several variables. However, if we are talking only about one, then it is required that the ranges with the given values ​​of x and y be proportionate. In the case of several variables, it is necessary that the range with the given values ​​of y fit in one column or in one row.

function window feed

Predict function

Regression analysis in Excel is implemented using several functions. One of them is called “PREDICTION”. It is similar to "TRENDS", that is, it produces the result of calculations using the least squares method. However, only for one X for which the value of Y is unknown.

Now you know the formulas in Excel for dummies that allow you to predict the future value of a particular indicator according to a linear trend.


All Articles