Correlation analysis is a set of mathematically grounded methods for detecting a correlation dependence between a pair of features or factors that have a random component. The techniques most widely used in this method are:
- construction of correlation fields, preparation of correlation tables;
- calculation of the correlation ratio or sample coefficients;
- testing the hypothesis that the relationship is statistically significant.
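The coefficient calculation and the significance test listed above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: it assumes the Pearson sample coefficient and a two-sided Student's t test at the 5% level, and the function names and data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0 'no linear correlation'.

    Under H0 it follows Student's t distribution with n - 2 degrees of freedom.
    """
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Illustrative data (invented for this sketch).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))

# Two-sided 5% critical value of Student's t for 10 - 2 = 8 degrees of freedom.
T_CRITICAL = 2.306
significant = abs(t) > T_CRITICAL
print(f"r = {r:.4f}, t = {t:.2f}, significant at 5%: {significant}")
```

The critical value is hard-coded here for 8 degrees of freedom; for another sample size it would have to be taken from a t table or a statistics library.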
Further study establishes the specific form of the relationship between the quantities. When the relationship involves more than three random features or factors, multivariate analysis is required.
The correlation field and the correlation table serve as auxiliary tools in the analysis of sample data. Plotting the sample points on a coordinate plane gives the so-called correlation field. The arrangement of the points already allows a preliminary judgment about the form of the dependence between the random variables. Numerical processing of the results requires grouping them into a correlation table.
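As a rough illustration of the grouping step, the sketch below counts how many sample points fall into each cell of a grid, which is one common way to lay out a correlation table. The function name, the interval boundaries, and the data are assumptions made for the example.

```python
def correlation_table(points, x_edges, y_edges):
    """Count the (x, y) points that fall into each cell of a rectangular grid.

    Rows correspond to y intervals, columns to x intervals. Intervals are
    half-open [low, high), except the last one, which also includes its
    upper boundary. Points outside the grid are ignored.
    """

    def bin_index(value, edges):
        last = len(edges) - 2
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                return i
            if i == last and value == edges[i + 1]:
                return i
        return None

    table = [[0] * (len(x_edges) - 1) for _ in range(len(y_edges) - 1)]
    for x, y in points:
        col = bin_index(x, x_edges)
        row = bin_index(y, y_edges)
        if col is not None and row is not None:
            table[row][col] += 1
    return table


# Illustrative sample (invented for this sketch).
points = [(1, 1), (2, 2), (2, 3), (5, 4), (6, 6), (7, 5), (8, 8), (9, 9)]
edges = [0, 3, 6, 10]

table = correlation_table(points, edges, edges)
for row in table:
    print(row)
# [2, 0, 0]
# [1, 1, 1]
# [0, 0, 3]
```

Counts concentrated along the diagonal, as here, are what a positive dependence looks like in tabular form.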
The term "correlation" first appeared in the 18th century. Thanks to the paleontologist Georges Cuvier, it came into active use for the process of reconstructing the appearance of fossil animals from parts of their remains. As this narrowly focused paleontological method developed, correlation analysis came to be used in a wide variety of fields.
The method is well suited to processing statistical data. Correlation analysis was first used in statistics by the English biologist and statistician Francis Galton at the end of the 19th century. Further development of the method made it possible to measure the strength of the relationship both between a pair of variables and among a larger number of variables. Correlation analysis is closely related to regression analysis.
Correlation analysis occupies a special place in economics, but its use is subject to a number of constraints. First of all, there must be enough measurements and data to study. Practice suggests that the number of observations should be at least 5 to 6 times the number of factors. Ideally, the number of observations exceeds the number of factors by several tens of times. In that case the law of large numbers applies, and random fluctuations cancel each other out.
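The rule of thumb above amounts to a simple ratio check. In the sketch below, the function name and the default thresholds are assumptions: 6 stands in for "5 to 6 times", and 20 is only one possible reading of "several tens of times".

```python
def sample_size_verdict(n_observations, n_factors,
                        minimum_ratio=6, comfortable_ratio=20):
    """Classify a sample by its observations-per-factor ratio.

    The thresholds are illustrative defaults, not fixed statistical constants.
    """
    if n_factors <= 0:
        raise ValueError("the number of factors must be positive")
    ratio = n_observations / n_factors
    if ratio < minimum_ratio:
        return "insufficient"
    if ratio < comfortable_ratio:
        return "acceptable"
    return "comfortable"


print(sample_size_verdict(20, 4))   # insufficient (5 observations per factor)
print(sample_size_verdict(40, 4))   # acceptable (10 observations per factor)
print(sample_size_verdict(200, 4))  # comfortable (50 observations per factor)
```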
It should also be ensured that the whole set of factor and outcome variables follows a multivariate normal distribution. When the sample is too small for a formal normality test, the distribution law is assessed visually from the correlation field. If the points follow a linear trend, it is reasonable to assume that the source data satisfy the requirements of the normal distribution law.
The original data set must also be qualitatively homogeneous.
The existence of a correlation dependence does not in itself justify the claim that one of the variables precedes the other or causes its changes. In other words, correlation does not establish a strict causal relationship between them, and both may be driven by some third factor.
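The influence of a third factor can be illustrated with a partial correlation, which measures the association between two variables after the linear effect of a third has been removed. The sketch below uses the standard first-order formula; the data are invented so that a common factor z drives both x and y, and the function names are chosen for the example.

```python
import math


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """Correlation of x and y with the linear effect of z removed."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented data: z is the common factor; x and y are each z plus a small,
# mutually unrelated disturbance.
zs = [1, 2, 3, 4, 5, 6, 7, 8]
xs = [z + e for z, e in zip(zs, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
ys = [z + e for z, e in zip(zs, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5])]

r_xy = pearson_r(xs, ys)
r_xy_given_z = partial_r(xs, ys, zs)
print(f"r(x, y) = {r_xy:.3f}, r(x, y | z) = {r_xy_given_z:.3f}")
```

Here x and y are strongly correlated, yet the association almost vanishes once z is taken into account, so neither variable needs to cause the other.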
Applying the results of correlation-based analysis in practice, one can draw definite conclusions about the presence and, more importantly, the nature of the interdependence. This alone provides substantial information about the object under study.