
Estimated variance and standard deviation

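The variance is calculated by the formula:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

where X_i are the individual (current) values of the random variable, n is the number of values, and \bar{X} is the average value of the random variable in the sample, calculated by the formula:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$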

So, variance is the mean square of the deviations. That is, the average value is calculated first, then the difference between each original value and the mean is taken and squared, the squares are added up, and the sum is divided by the number of values in the given population.

The difference between an individual value and the mean reflects the size of the deviation. It is squared so that all deviations become non-negative numbers and positive and negative deviations do not cancel each other when they are summed. Then, given the squared deviations, we simply calculate their arithmetic mean.
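The three steps described above (mean, squared deviations, average of the squares) can be sketched in a few lines of Python. The function name `population_variance` and the sample numbers are illustrative assumptions, not part of the original text.

```python
def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n                                   # step 1: average value
    squared_deviations = [(x - mean) ** 2 for x in values]   # step 2: deviation, squared
    return sum(squared_deviations) / n                       # step 3: average of the squares


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean is 5
print(population_variance(sample))   # 4.0
```

Here the deviations from the mean are −3, −1, −1, −1, 0, 0, 2, 4; their squares sum to 32, and 32 / 8 = 4.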

The key to the magic word "variance" lies in just these three words: average - square - deviations.

Standard deviation (RMS)

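Taking the square root of the variance, we get the so-called "standard deviation". It is also called the root-mean-square (RMS) deviation or "sigma" (from the name of the Greek letter σ). The formula for the standard deviation is:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$

where X_i are the individual values, \bar{X} is their average and n is the number of values.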

So, variance is sigma squared, that is, the standard deviation squared.

The standard deviation obviously also characterizes the spread of the data, but (unlike the variance) it can be compared directly with the original data, since the two have the same units of measurement (this is clear from the calculation formula). Another simple measure of spread, the range of variation, is the difference between the extreme values. The standard deviation, as a measure of uncertainty, is also involved in many statistical calculations: it is used to establish the accuracy of various estimates and forecasts. If the variation is very large, the standard deviation will also be large, so the forecast will be inaccurate, which shows up, for example, in very wide confidence intervals.
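As a small illustration of the units argument, the sketch below computes the variance, the standard deviation and the range of variation for one hypothetical sample using Python's standard `statistics` module. The data are made up for the example.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data, e.g. prices in some unit

variance = statistics.pvariance(sample)    # mean squared deviation, in squared units
sigma = math.sqrt(variance)                # standard deviation, in the units of the data
value_range = max(sample) - min(sample)    # range of variation: max minus min

print("variance:", variance)
print("standard deviation:", sigma)
print("range of variation:", value_range)
```

For this sample the standard deviation is 2 and the range is 7, both in the same units as the data, while the variance of 4 is in squared units.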

Therefore, in statistical data processing for real estate appraisals, the two-sigma or three-sigma rule is used, depending on the accuracy the task requires.

To compare the two sigma rule and the three sigma rule, we use the Laplace formula:

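$$P(\alpha < X < \beta) = \Phi\left(\frac{\beta - a}{s}\right) - \Phi\left(\frac{\alpha - a}{s}\right),$$

where Φ(x) is the Laplace function;

α = minimum value

β = maximum value

s = sigma value (standard deviation)

a = mean value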

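In this case, a particular form of the Laplace formula is used, in which the boundaries α and β of the values of the random variable X are equally spaced from the distribution center a = M(X) by some value d: α = a − d, β = a + d. Since the Laplace function is odd, Φ(−x) = −Φ(x), the formula becomes

$$P(|X - a| < d) = \Phi\left(\frac{d}{s}\right) - \Phi\left(-\frac{d}{s}\right) = 2\Phi\left(\frac{d}{s}\right) \qquad (1)$$

Formula (1) determines the probability that a normally distributed random variable X deviates from its mathematical expectation M(X) = a by less than a given d. If in formula (1) we take successively d = 2s and d = 3s, then we get:

$$P(|X - a| < 2s) = 2\Phi(2) \approx 0.954 \qquad (2)$$

$$P(|X - a| < 3s) = 2\Phi(3) \approx 0.9973 \qquad (3)$$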
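The Laplace formula and its symmetric special case can be checked numerically. The sketch below assumes the Laplace function in its usual form, the integral of the standard normal density from 0 to x, which can be written as 0.5·erf(x/√2). The function names `laplace_phi`, `prob_between` and `prob_within` are illustrative, not taken from the original.

```python
import math


def laplace_phi(x):
    """Laplace function: integral of the standard normal density from 0 to x (assumed form)."""
    return 0.5 * math.erf(x / math.sqrt(2))


def prob_between(alpha, beta, a, s):
    """P(alpha < X < beta) for a normal X with mean a and standard deviation s."""
    return laplace_phi((beta - a) / s) - laplace_phi((alpha - a) / s)


def prob_within(d, s):
    """P(|X - a| < d): the symmetric case alpha = a - d, beta = a + d."""
    return 2 * laplace_phi(d / s)


s = 1.0
print(round(prob_within(2 * s, s), 4))   # two sigma:   0.9545
print(round(prob_within(3 * s, s), 4))   # three sigma: 0.9973
```

The two printed values match the confidence probabilities quoted for the two sigma and three sigma rules (0.954 and 0.9973) up to rounding.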

Two sigma rule

With near certainty (with a confidence probability of 0.954) it can be stated that a value of a normally distributed random variable X deviates from its mathematical expectation M(X) = a by no more than 2s (two standard deviations). The confidence probability (Pd) is the probability of events that are conditionally accepted as certain (their probability is close to 1).

Let us illustrate the two sigma rule geometrically. Fig. 6 shows a Gaussian curve with distribution center a. The area bounded by the entire curve and the Ox axis is 1 (100%), and the area of the curvilinear trapezoid between the abscissas a−2s and a+2s, according to the two sigma rule, is 0.954 (95.4% of the total area). The area of the shaded regions is 1 − 0.954 = 0.046 (<5% of the total area). These regions are called the critical region of the random variable. Values of the random variable that fall into the critical region are unlikely and in practice are conditionally treated as impossible.

The probability of conditionally impossible values is called the significance level of the random variable. The significance level is related to the confidence probability by the formula:

$$q = (1 - P_d) \cdot 100\%$$

where q is the significance level, expressed as a percentage, and P_d is the confidence probability.
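A minimal sketch of this relation, assuming it has the form q = (1 − Pd)·100%, which is consistent with the pairs quoted in the text (0.954 and 4.6%, 0.9973 and 0.27%). The function name `significance_percent` is hypothetical.

```python
def significance_percent(confidence):
    """Significance level q in percent for a confidence probability Pd (assumed relation)."""
    return (1.0 - confidence) * 100.0


for pd_value in (0.954, 0.9973):
    # round only for display; the raw result carries floating-point noise
    print(pd_value, "->", round(significance_percent(pd_value), 2), "%")
```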

Three sigma rule

When solving problems that require greater reliability, the confidence probability (Pd) is taken equal to 0.997 (more precisely, 0.9973), and instead of the two sigma rule the three sigma rule is used, according to formula (3).



According to the three sigma rule, with a confidence probability of 0.9973 the critical region is the set of attribute values outside the interval (a−3s, a+3s). The significance level is 0.27%.

In other words, the probability that the absolute value of the deviation will exceed three times the standard deviation is very small, namely 0.0027 = 1 − 0.9973. This means that it can happen in only 0.27% of cases. Such events, based on the principle of the impossibility of unlikely events, can be considered practically impossible. That is, the sample is of high accuracy.

This is the essence of the three sigma rule:

If a random variable is normally distributed, then the absolute value of its deviation from the mathematical expectation practically never exceeds three times the standard deviation (RMS).

In practice, the three-sigma rule is applied as follows: if the distribution of the random variable under study is unknown, but the condition specified in the above rule is met, then there is reason to assume that the studied variable is distributed normally; otherwise, it is not normally distributed.
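The practical check described above can be sketched as follows. This is a rough heuristic under stated assumptions, not a formal normality test: the mean and standard deviation are estimated from the sample itself, and the function names `fraction_within` and `passes_three_sigma` and the data are hypothetical.

```python
import statistics


def fraction_within(values, k):
    """Share of values whose deviation from the sample mean is at most k standard deviations."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)   # population form: divides by n
    inside = [x for x in values if abs(x - mean) <= k * sigma]
    return len(inside) / len(values)


def passes_three_sigma(values):
    """True if every value lies inside (mean - 3*sigma, mean + 3*sigma)."""
    return fraction_within(values, 3) == 1.0


regular = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical sample without extreme values
with_outlier = [10] * 19 + [100]        # hypothetical sample with one extreme value

print(passes_three_sigma(regular))       # True
print(passes_three_sigma(with_outlier))  # False
```

In the second sample the value 100 lies about 4.4 standard deviations from the mean, so the condition of the rule is violated.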

The level of significance is taken depending on the permitted degree of risk and the task. For real estate appraisals, a less accurate sample is usually taken, following the two sigma rule.