Statistical model: method essence, construction and analysis

A statistical model is a mathematical model that embodies a set of statistical assumptions about the generation of sample data. It represents the data-generating process, often in a considerably idealized form.

The assumptions embodied by a statistical model describe a set of probability distributions, some of which are assumed to adequately approximate the distribution from which a particular data set is sampled. The probability distributions inherent in statistical models are what distinguish them from other mathematical models.

Mathematical models in general

A mathematical model is a description of a system using mathematical concepts and language. Such models are used in the natural sciences (such as physics, biology, earth science, chemistry) and engineering disciplines (such as computer science, electrical engineering), as well as in the social sciences (such as economics, psychology, sociology, political science).

A model can help explain the system and study the effects of various components, as well as make predictions of behavior.

Mathematical models can take various forms, including dynamical systems, statistical models, differential equations, or game-theoretic models. These and other types can overlap, and a given model may involve a variety of abstract structures. In general, mathematical models may include logical models. In many cases, the quality of a scientific field depends on how well the mathematical models developed on the theoretical side agree with the results of repeatable experiments. A lack of agreement between theoretical models and experimental measurements often leads to important advances as better theories are developed.

In the physical sciences, a traditional mathematical model contains most of the following elements:

Governing equations.
Additional submodels.
Defining equations.
Constitutive equations.
Assumptions and limitations.
Initial and boundary conditions.
Classical constraints and kinematic equations.

Formula

A statistical model is usually specified as a mathematical relationship between one or more random variables and, possibly, other non-random variables. As such, a statistical model is “a formal representation of a theory.”

All statistical hypothesis tests and all statistical estimators are derived via statistical models.

Introduction

Informally, a statistical model can be thought of as an assumption (or a set of assumptions) with a certain property: it allows us to calculate the probability of any event. As an example, consider a pair of ordinary six-sided dice. We will study two different statistical assumptions about the dice.

The first assumption is as follows:

For each of the dice, the probability of each face (1, 2, 3, 4, 5, and 6) coming up is 1/6.

From this assumption, we can calculate the probability of both dice coming up 1: 1/6 × 1/6 = 1/36.

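More generally, under the first assumption we can calculate the probability of any event. The second, alternative assumption specifies the probability of only one face of each die (as for weighted dice). Under it, we can still calculate the probability of both dice showing that face, but we cannot calculate the probability of any other non-trivial event, because the probabilities of the remaining faces are unknown.

Only the first assumption constitutes a statistical model, because only with the first assumption can the probability of every event be determined.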

In the example above, with the first assumption, calculating the probability of an event is easy. In other examples the calculation can be difficult or even impractical (for example, it might require many years of computation). For an assumption to constitute a statistical model, such difficulty is acceptable: the calculation does not need to be practicable, only theoretically possible.
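The dice calculation under the first assumption can be sketched in a few lines of Python. This is only an illustration: it assumes that the two dice are independent (which the product 1/6 × 1/6 implies), and the helper name `probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# First assumption: each face of each die has probability 1/6.
# Additional assumption made here: the two dice are independent.
FACES = range(1, 7)
face_prob = {face: Fraction(1, 6) for face in FACES}

# Sample space for a pair of dice: all ordered pairs of faces.
sample_space = list(product(FACES, repeat=2))


def probability(event):
    """Probability of an event, given as a predicate on (die1, die2)."""
    return sum(
        face_prob[a] * face_prob[b]
        for a, b in sample_space
        if event(a, b)
    )


p_both_one = probability(lambda a, b: a == 1 and b == 1)
p_sum_seven = probability(lambda a, b: a + b == 7)
p_certain = probability(lambda a, b: True)

print(p_both_one)   # 1/36
print(p_sum_seven)  # 1/6
print(p_certain)    # 1
```

Because every face has a stated probability, any predicate on the pair of dice can be passed to `probability`, which is what makes the first assumption a statistical model.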

Formal definition

In mathematical terms, a statistical model is usually defined as a pair $(S, P)$, where $S$ is the set of possible observations, that is, the sample space, and $P$ is a set of probability distributions on $S$.

The intuition behind this definition is as follows. It is assumed that there is a “true” probability distribution induced by the process that generates the observed data. We choose $P$ to represent a set of distributions that contains a distribution adequately approximating the true one.

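The set $P$ is almost always parameterized: $P = \{P_\theta : \theta \in \Theta\}$. The set $\Theta$ defines the parameters of the model. A parameterization is generally required to be such that distinct parameter values give rise to distinct distributions, i.e.

$$P_{\theta_1} = P_{\theta_2} \Rightarrow \theta_1 = \theta_2$$

must hold (in other words, the map $\theta \mapsto P_\theta$ must be injective). A parameterization that meets this requirement is called identifiable.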
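The identifiability requirement can be checked on small examples. The sketch below uses two hypothetical Gaussian families that do not appear in the text: one indexed by its mean, which is identifiable, and one indexed by a pair (a, b) that enters only through the sum a + b, which is not.

```python
from statistics import NormalDist


def identifiable_family(mu):
    """Hypothetical family N(mu, 1): distinct mu give distinct distributions."""
    return NormalDist(mu=mu, sigma=1.0)


def non_identifiable_family(a, b):
    """Hypothetical family N(a + b, 1): only the sum a + b matters."""
    return NormalDist(mu=a + b, sigma=1.0)


# Two different parameter values, one and the same distribution:
# the parameterization (a, b) is not injective.
collision = non_identifiable_family(0.0, 1.0) == non_identifiable_family(1.0, 0.0)

# Different means give different distributions.
distinct = identifiable_family(0.0) != identifiable_family(1.0)

print(collision)  # True
print(distinct)   # True
```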

Example

Suppose that we have a population of children of different ages. The height of a child will be stochastically related to the child's age: for example, knowing that a child is 7 years old changes the probability that the child is taller than some given height.

We can formalize this relationship in a linear regression model, for example as follows: $\text{height}_i = b_0 + b_1\,\text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is the parameter by which age is multiplied to obtain a prediction of height, $\varepsilon_i$ is the error term, and $i$ identifies the child. That is, height is predicted by age, with some error.

An admissible model must be consistent with all the data points. Thus a straight line ($\text{height}_i = b_0 + b_1\,\text{age}_i$) cannot be the equation for a model of the data unless it fits all the data points exactly, that is, unless every data point lies perfectly on the line. The error term $\varepsilon_i$ must be included in the equation so that the model is consistent with all the data points.

To do statistical inference, we first need to assume some probability distribution for the $\varepsilon_i$. For example, we can assume that the $\varepsilon_i$ are i.i.d. Gaussian with zero mean. In this case, the model has 3 parameters: $b_0$, $b_1$, and the variance of the Gaussian distribution.

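We can formally specify the model in the form $(S, P)$ as follows. The sample space $S$ comprises the set of all possible pairs (age, height). Each possible value of $\theta = (b_0, b_1, \sigma^2)$ determines a distribution on $S$; denote that distribution by $P_\theta$. If $\Theta$ is the set of all possible values of $\theta$, then $P = \{P_\theta : \theta \in \Theta\}$.

In this example, the model is determined by specifying $S$ and making some assumptions relevant to $P$. There are two assumptions:

That height can be approximated by a linear function of age;

That the errors in the approximation are distributed as i.i.d. Gaussian.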
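The height model can be explored by simulation. The following is a minimal sketch, not a definitive implementation: the "true" parameter values (80, 6 and 5, in centimetres), the age range and the function names are all hypothetical choices made for illustration, and the fit uses ordinary least squares.

```python
import random


def simulate_heights(ages, b0, b1, sigma, rng):
    """Draw height_i = b0 + b1 * age_i + eps_i with eps_i ~ N(0, sigma^2)."""
    return [b0 + b1 * age + rng.gauss(0.0, sigma) for age in ages]


def fit_line(xs, ys):
    """Least-squares estimates of the intercept, slope and error variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # n - 2 degrees of freedom, because b0 and b1 were estimated.
    variance = sum(r ** 2 for r in residuals) / (n - 2)
    return b0, b1, variance


rng = random.Random(0)  # fixed seed so the run is reproducible
ages = [rng.uniform(5.0, 12.0) for _ in range(2000)]
# Hypothetical "true" parameters, chosen only for illustration.
heights = simulate_heights(ages, b0=80.0, b1=6.0, sigma=5.0, rng=rng)

b0_hat, b1_hat, var_hat = fit_line(ages, heights)
print(b0_hat, b1_hat, var_hat)
```

The three returned values correspond to the three parameters of the model: the intercept, the slope, and the variance of the Gaussian errors.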

General remarks

A statistical model is a special class of mathematical model. What distinguishes a statistical model from other mathematical models is that it is non-deterministic. Thus, in a statistical model specified by mathematical equations, some of the variables do not have specific values but instead have probability distributions. That is, some of the variables are stochastic. In the previous example, $\varepsilon$ is a stochastic variable. Without it, the model would be deterministic.

Statistical models are often used even when the physical process being modeled is considered deterministic. For example, tossing a coin is, in principle, a deterministic process. However, in most cases it is modeled as stochastic (through a Bernoulli process).
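As an illustration of modeling coin tossing as a Bernoulli process, the sketch below draws independent tosses with a fixed probability of heads. The value p = 0.5, the number of tosses and the function name are assumptions made for the example.

```python
import random


def bernoulli_process(p, n, rng):
    """n independent tosses; each is 1 (heads) with probability p, else 0."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


rng = random.Random(42)  # fixed seed so the run is reproducible
tosses = bernoulli_process(p=0.5, n=10_000, rng=rng)

# The observed fraction of heads should be close to p for large n.
heads_fraction = sum(tosses) / len(tosses)
print(heads_fraction)
```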

According to Konishi and Kitagawa, a statistical model serves three purposes:

Predictions.
Extraction of information.
Description of stochastic structures.

Dimension of a model

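Suppose that we have a statistical model $(S, P)$ with

$$P = \{P_\theta : \theta \in \Theta\}.$$

The model is called parametric if $\Theta$ has finite dimension. In notation, we write

$$\Theta \subseteq \mathbb{R}^k,$$

where $k$ is a positive integer ($\mathbb{R}$ denotes the real numbers). Here $k$ is called the dimension of the model.

As an example, we can assume that the data arise from a one-dimensional Gaussian distribution:

$$P = \left\{ P_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) : \mu \in \mathbb{R},\ \sigma > 0 \right\}.$$

In this example, the dimension $k$ is 2.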
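The one-dimensional Gaussian family can be written as a density function with two parameters. This is a minimal sketch in which the parameter point (1.0, 2.0) is an arbitrary illustrative choice. The standard library's `statistics.NormalDist` serves as a cross-check.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mu, sigma):
    """Density of the Gaussian distribution with mean mu and std. dev. sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


theta = (1.0, 2.0)  # one point (mu, sigma) of the parameter set
k = len(theta)      # dimension of the model

density_at_zero = gaussian_pdf(0.0, *theta)
reference = NormalDist(mu=theta[0], sigma=theta[1]).pdf(0.0)

print(k)  # 2
print(math.isclose(density_at_zero, reference))  # True
```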

As another example, suppose that the data consist of points $(x, y)$ that are assumed to be distributed along a straight line with i.i.d. Gaussian residuals (with zero mean). Then the dimension of the statistical model is 3: the intercept of the line, the slope of the line, and the variance of the distribution of the residuals. (Note that in geometry, a straight line has dimension 1.)

Although $\theta \in \Theta$ is formally a single parameter of dimension $k$, it is sometimes regarded as comprising $k$ separate parameters. For example, with the one-dimensional Gaussian distribution, $\theta$ is formally a single parameter of dimension 2, but it is sometimes regarded as comprising two separate parameters: the mean and the standard deviation.

A statistical model is nonparametric if the parameter set $\Theta$ is infinite-dimensional. It is semi-parametric if it has both finite-dimensional and infinite-dimensional parameters. Formally, if $k$ is the dimension of $\Theta$ and $n$ is the number of samples, both semi-parametric and nonparametric models have $k \to \infty$ as $n \to \infty$. If $k/n \to 0$ as $n \to \infty$, then the model is semi-parametric. Otherwise, the model is nonparametric.

Parametric models are by far the most commonly used statistical models. Regarding semi-parametric and nonparametric models, Sir David Cox stated:

“These typically involve fewer assumptions of structure and distributional form but usually contain strong assumptions about independencies.”

Nested Models

Not to be confused with multilevel models.

Two statistical models are nested if the first can be transformed into the second by imposing constraints on the parameters of the first. For example, the set of all Gaussian distributions has the set of zero-mean Gaussian distributions nested within it:

That is, we constrain the mean in the set of all Gaussian distributions in order to obtain the zero-mean distributions. As a second example, the quadratic model $y = b_0 + b_1 x + b_2 x^2 + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$ has the linear model $y = b_0 + b_1 x + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$ nested within it: we constrain the parameter $b_2$ to equal 0.

In both of these examples, the first model has a higher dimension than the second model. This is often, but not always, the case. As a different example, the set of Gaussian distributions with a positive mean, which has dimension 2, is nested within the set of all Gaussian distributions, which also has dimension 2.
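The nesting of the linear model inside the quadratic model can be shown directly on the mean functions. This is only a sketch: the function names and the coefficient values are hypothetical, and the error term is left out because it is the same in both models.

```python
def quadratic_mean(x, b0, b1, b2):
    """Mean of y under the quadratic model: b0 + b1*x + b2*x^2."""
    return b0 + b1 * x + b2 * x ** 2


def linear_mean(x, b0, b1):
    """Mean of y under the linear model: b0 + b1*x."""
    return b0 + b1 * x


def constrained_quadratic_mean(x, b0, b1):
    """Quadratic model with the constraint b2 = 0 imposed."""
    return quadratic_mean(x, b0, b1, b2=0.0)


xs = [-2.0, -0.5, 0.0, 1.0, 3.5]
# With b2 fixed at 0, the quadratic model reproduces the linear model.
agree = all(
    constrained_quadratic_mean(x, 1.0, 2.0) == linear_mean(x, 1.0, 2.0)
    for x in xs
)
print(agree)  # True
```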

Model Comparison

It is assumed that there is a “true” probability distribution underlying the observed data, induced by the process that generated them.

Models can be compared with each other by exploratory or confirmatory analysis. In an exploratory analysis, various models are formulated and an assessment is made of how well each of them describes the data. In a confirmatory analysis, models formulated before the data were collected are checked against the data. Common criteria for comparing models include $R^2$, the Bayes factor, and the relative likelihood.
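As a rough illustration of comparing models with the coefficient of determination, the sketch below fits a constant model and a straight-line model to simulated data and computes $R^2$ for each. The data-generating values (intercept 2, slope 3, unit error variance) and all function names are hypothetical. Least squares is assumed as the fitting method.

```python
import random


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


rng = random.Random(1)  # fixed seed so the run is reproducible
xs = [rng.uniform(0.0, 10.0) for _ in range(500)]
# Hypothetical data-generating process, chosen only for illustration.
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Model A: a constant (the sample mean). Its R^2 is 0 by construction.
mean_y = sum(ys) / len(ys)
r2_constant = r_squared(ys, [mean_y] * len(ys))

# Model B: a straight line fitted by least squares.
b0, b1 = fit_line(xs, ys)
r2_linear = r_squared(ys, [b0 + b1 * x for x in xs])

print(r2_constant, r2_linear)
```

Since the constant model is nested within the linear model, $R^2$ on the fitting data cannot decrease when the slope is added. Criteria such as the Bayes factor also account for model complexity.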

Konishi and Kitagawa state:

“The majority of the problems in statistical inference can be considered to be problems related to statistical modeling. They are typically formulated as comparisons of several statistical models.”

In addition, Sir David Cox said: “How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis.”
