
Distribution of workshop workers by category

The modal level (variant) is 5, since it has the highest frequency (f = 55).

The median (Me) is the value of the varying characteristic that divides the population in half, i.e. lies in the middle of the ranked series.

The position of the median in the series:

a) with an odd number of units:

NMe = (n + 1) / 2;

b) with an even number of units, the median is the average of the two central values, occupying positions n/2 and n/2 + 1.

In our example, the median occupies rank 4.

Formulas for calculating structural averages from grouped (interval) data:

Mo = xMo + i * (fMo – fMo-1) / ((fMo – fMo-1) + (fMo – fMo+1)), (6.21)

where xMo is the lower limit of the modal interval (the interval with the highest frequency); i is the interval width; fMo, fMo-1, fMo+1 are the frequencies of the modal, pre-modal and post-modal intervals, respectively.

Me = xMe + i * (0.5*Σf – SMe-1) / fMe, (6.22)

where xMe is the lower limit of the median interval (the interval in which half of the population units are located); i is the interval width; Σf is the sum of all frequencies; SMe-1 is the cumulative frequency of the intervals preceding the median interval; fMe is the frequency of the median interval.
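As a sketch of how formulas (6.21) and (6.22) can be applied, the snippet below computes the mode and median for grouped data. The intervals and frequencies are illustrative only (chosen so that they reproduce the Mo ≈ 8.5 and Me = 10 of Example 8 below); the original frequency table is not reproduced in this text.

```python
# Sketch: mode and median for grouped (interval) data, formulas (6.21) and (6.22).
# Illustrative data: work experience of 30 workers (the original table is not available).
intervals = [(0, 6), (6, 12), (12, 18), (18, 24)]   # experience, years
freqs = [7, 12, 5, 6]                                # workers per interval (hypothetical)

def grouped_mode(intervals, freqs):
    k = freqs.index(max(freqs))                      # modal interval
    x_lo, x_hi = intervals[k]
    i = x_hi - x_lo
    f_mo = freqs[k]
    f_prev = freqs[k - 1] if k > 0 else 0
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0
    return x_lo + i * (f_mo - f_prev) / ((f_mo - f_prev) + (f_mo - f_next))

def grouped_median(intervals, freqs):
    total = sum(freqs)
    cum = 0
    for (x_lo, x_hi), f in zip(intervals, freqs):
        if cum + f >= total / 2:                     # median interval found
            return x_lo + (x_hi - x_lo) * (0.5 * total - cum) / f
        cum += f

print(grouped_mode(intervals, freqs), grouped_median(intervals, freqs))  # 8.5 10.0
```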

Example 8. The table shows data on the work experience of 30 workers in the workshop.

Solution:

The modal interval (work experience from 6 to 12 years) has the highest frequency, 12. Substituting into formula (6.21) gives:

Mo ≈ 8.5 years.

The mode shows that workshop workers most often have about 8.5 years of experience.

The median interval (the one containing the middle of the population, i.e. the 15th of the 30 workers) is also the interval from 6 to 12 years. Substituting into formula (6.22) gives:

Me = 10 years.

The median shows that half of the workers have up to 10 years of experience and half have more than 10 years.

Structural averages can be determined not only by formulas but also graphically: the mode from the histogram and the median from the cumulative frequency curve (cumulate).

To determine the mode graphically, three bars of the histogram are used: the tallest one and the two adjacent to it, on the left and on the right. Inside the tallest bar two lines are drawn: one connects its upper right corner with the upper right corner of the preceding bar, and the other connects its upper left corner with the upper left corner of the following bar. The abscissa of the point where these lines intersect is the mode of the distribution represented by the histogram (Fig. 6.1).

Figure 6.1 Graphical representation of mode in a distribution histogram

To determine the median graphically, the cumulate is constructed and its last ordinate is divided in half. Through the resulting point a straight line is drawn parallel to the abscissa axis until it intersects the cumulate. The abscissa of the intersection point is the median of the distribution (Fig. 6.2).

Figure 6.2 Graphical representation of the median on the cumulate of the distribution

CALCULATION OF MAIN TECHNICAL AND ECONOMIC INDICATORS OF ENTERPRISE ACTIVITY

Student of the full-time department, 2nd year, group 171010

Kubatina Pavel….

Scientific adviser:

Ph.D. N. A. Gerasimova

BELGOROD, 2011

Initial data for the calculation task.

The following data are used to calculate the main technical and economic indicators:

1. Output of finished products: Nizd = 152 (n.h.)

2. Duration of the production cycle for manufacturing a product: Tts = 18 (days)

3. Labor intensity of one product: Tizd = 50,017 (n.h.)

4. Product readiness ratio: Kg = 0.5

5. Services and works of a production nature: U = 19 (n.h.)

6. Actual balance of work in progress at the time of planning: Nf = 5.5 (n.h.)

7. Finished main products: G = 1417 (n)


1. Calculation of the planned gross volume of work

1.1. Calculation of the volume of unfinished products

1.2. Calculation of the gross volume of work

2. Planning the number of personnel

2.1. Calculation of working time funds

3. Planning the salary fund

3.1. Calculation of the average tariff rate

3.1.1. Distribution of production workers by category and working conditions

3.1.2. Calculation of tariff rates for workers by category

3.1.3. Calculation of the average tariff rate

3.2. Calculation of the basic and additional salary funds for production workers

3.3. Determination of the average monthly salary of a production worker

3.4. Calculation of the salary fund for other categories of employees

4. Planning product costs


Calculation of the planned gross volume of work.

Calculation of volumes of unfinished products.

First, let us calculate the volume of work in progress at the beginning of the year: Nng = Nf + h, where h is the increase in work in progress before the start of the planning period; we take h = 0, so Nng = Nf = 5.5.

The volume of work in progress at the end of the year:

Nkg = (Nizd * Tts * Tizd * Kg) / T = (152 * 18 * 50017 * 0.5) / 365 ≈ 187461,

where T is the planned period (365 days).

Calculation of the gross volume of work.

The gross volume of work will be calculated using the formula:

Vnch = G + U + Ng + Nu (n.h.), where:

Ng is the balance of work in progress for main products: Ng = Nkg – Nng = 187461 – 5.5 = 187455.5;

Nu is the balance of work in progress for services; we take Nu = 0.

Vnch = 1417 + 19 + 187455.5 + 0 = 188891.5 (n.h.)
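The same chain of calculations as a minimal sketch, using the initial data quoted above (this is only an illustration of the formulas, not the original worksheet):

```python
# Sketch: planned gross volume of work, using the numbers quoted in the text.
N_izd = 152      # output of finished products
T_ts = 18        # duration of the production cycle, days
T_izd = 50017    # labor intensity of one product (as used in the source calculation)
K_g = 0.5        # product readiness ratio
T = 365          # planned period, days
N_f = 5.5        # actual balance of work in progress at the time of planning
G, U = 1417, 19  # finished main products; production services and works

N_ng = N_f + 0                          # WIP at the beginning of the year (h = 0)
N_kg = N_izd * T_ts * T_izd * K_g / T   # WIP at the end of the year
N_g = N_kg - N_ng                       # change in WIP for main products
V_nch = G + U + N_g + 0                 # gross volume of work (Nu = 0)
print(round(N_kg), round(V_nch, 1))     # ~187461 and ~188891.5
```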
2. Personnel planning.

Calculation of working time funds.

Let us calculate the nominal working time fund:

Fn = (Dyear – Dpr – Dout) * tp, where:

Dyear – number of days in a year (365);

Dpr – number of holidays per year (11);

Dout – number of days off with a 5-day working week (104);

tp – length of a full working day.

tp = Pn / Dn, where:

Pn – length of the working week (40 hours);

Dn – number of working days in a week (5), so tp = 8 hours.

Fn = (365 – 11 – 104) * 8 h = 2000 h

Let us calculate the effective working time fund:

Feff = (Dyear – Dpr – Dout) * ts, where ts is the average length of the working day.

To calculate ts, intra-shift losses are subtracted from the full working day: downtime – 1.5%, technical training – 2.52%, other – 0.17%, so ts = 8 * (1 – 0.015 – 0.0252 – 0.0017) = 7.6648 h.

The calculation of ts is summarized in Table 2.1:

Table 2.1.

Then Feff = 250 * 7.6648 = 1916.2 h

Headcount planning is carried out in the following categories:

1) Rpr – production workers

2) Rvr – auxiliary workers

3) Ritr – engineering and technical workers

4) Rsl – office (clerical) employees

5) Rmop – junior service personnel

The number of production workers will be calculated using the formula:

Rpr = Vnch / (Feff * Kvn), where Kvn is the coefficient of fulfillment of production standards, taken as 1.1.
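A sketch of the working time funds and the headcount formula with the figures from this section (again an illustration only; the worker-distribution example further below starts from a different headcount):

```python
# Sketch: working time funds and production-worker headcount.
D_year, D_pr, D_out = 365, 11, 104      # calendar days, holidays, days off
t_p = 40 / 5                            # full working day, hours
F_n = (D_year - D_pr - D_out) * t_p     # nominal fund: 2000 h

losses = 0.015 + 0.0252 + 0.0017        # downtime, technical training, other
t_s = t_p * (1 - losses)                # average working day: 7.6648 h
F_eff = (D_year - D_pr - D_out) * t_s   # effective fund: ~1916.2 h

V_nch = 188891.5                        # gross volume of work from the previous section
K_vn = 1.1                              # coefficient of fulfillment of production standards
R_pr = V_nch / (F_eff * K_vn)           # number of production workers
print(F_n, round(F_eff, 1), round(R_pr))
```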

The wages of production workers are determined on the basis of hourly wage rates, depending on working conditions, qualification category and forms of remuneration. The exact distribution of workers by category and working conditions depends on the characteristics of a particular production. For mechanical engineering enterprises, approximately the following distribution is observed:

  • about 80% of the average number of production workers are employed in cold jobs under normal working conditions, and about 20% in hot and heavy work;
  • of all production workers, approximately 65% work on a piece-rate basis and the rest on a time-rate basis;
  • within each group, workers are distributed in percentage terms across qualification categories (an example of such a distribution is given in the bottom rows of Table 2).

Since this project requires, first of all, determining the basic wages of the main (production) workers, it is their number (Chor) that should be distributed by working conditions and qualification categories. To calculate the corresponding numbers it is convenient to use the diagram (tree) shown in Fig. 1. The following notation is used: Chn – the number of people working under normal conditions; Cht – the number of people working in hot and heavy conditions; Chsd – the number of people on the piece-rate wage system; Chpv – the number of people on the time-rate wage system; Chsd_i and Chpv_i – the number of workers of the i-th category; a_i – a coefficient that takes into account the share of workers of the i-th category in the corresponding group.

Fractional values of Chnsd_i, Chnpv_i, etc. should be rounded to whole numbers so that their final sum equals the total value Chor. It is assumed that auxiliary workers are distributed by working conditions and categories in the same way as the main ones.

Hourly tariff rates, rub. (only part of Table 2 is recoverable):

At cold jobs with normal working conditions (Chn):

– piece workers (Chnsd): Znsd3 = 5.12;

– time workers (Chnpv): Znpv3 = 4.78; Znpv4 = 5.36; Znpv5 = 6.10.

At hot and heavy work (Cht):

– piece workers (Chtsd);

– time workers (Chtpv).

The bottom rows of the table give the distribution of workers by category, %, and the coefficient bi that takes into account the number of workers of the corresponding category.

Chor = 360 persons; Chn = 0.8*360 = 288 persons; Cht = 0.2*360 = 72 persons.

Chnsd = 0.65*288 = 187 persons. Chtsd = 0.65*72 = 46 persons.

Chnsd1 = 187*0.05 = 9 persons. Chtsd1 = 46*0.05 = 2 persons.

Chnsd2 = 187*0.12 = 22 persons. Chtsd2 = 46*0.12 = 5 persons.

Chnsd3 = 187*0.5 = 93 persons. Chtsd3 = 46*0.5 = 23 persons.

Chnsd4 = 187*0.2 = 37 persons. Chtsd4 = 46*0.2 = 9 persons.

Chnsd5 = 187*0.1 = 18 persons. Chtsd5 = 46*0.1 = 4 persons.

Chnsd6 = 187*0.03 = 5 persons. Chtsd6 = 46*0.03 = 1 person.

Chnpv = 0.35*288 = 100 persons. Chtpv = 0.35*72 = 25 persons.

Chnpv1 = 100*0.05 = 5 persons. Chtpv1 = 25*0.05 = 1 person.

Chnpv2 = 100*0.12 = 12 persons. Chtpv2 = 25*0.12 = 3 persons.

Chnpv3 = 100*0.5 = 50 persons. Chtpv3 = 25*0.5 = 12 persons.

Chnpv4 = 100*0.2 = 20 persons. Chtpv4 = 25*0.2 = 5 persons.

Chnpv5 = 100*0.1 = 10 persons. Chtpv5 = 25*0.1 = 2 persons.

Chnpv6 = 100*0.03 = 3 persons. Chtpv6 = 25*0.03 = 1 person.
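A small sketch of the distribution tree described above. The category shares 5/12/50/20/10/3% are taken from the calculation lines; the rounding here is naive, whereas the text requires rounding the parts so that they sum exactly to Chor:

```python
# Sketch: distributing Chor main workers by conditions, wage form and category.
Ch_or = 360                                          # total main workers (from the example)
shares_cond = {"normal": 0.80, "hot_heavy": 0.20}    # working conditions
shares_pay = {"piece": 0.65, "time": 0.35}           # wage form
shares_cat = [0.05, 0.12, 0.50, 0.20, 0.10, 0.03]    # categories 1..6

for cond, s_c in shares_cond.items():
    for pay, s_p in shares_pay.items():
        group = Ch_or * s_c * s_p
        by_cat = [round(group * a_i) for a_i in shares_cat]
        print(cond, pay, round(group), by_cat)
```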

It is necessary to determine, with a probability of 0.997, the limits within which the average wage category of the workers of the machine shop lies.

Let us determine the sample averages for the teams and the overall average.

Let us determine the inter-series variance.

Let us calculate the average sampling error.

Let us calculate the maximum sampling error with a probability of 0.997.

With a probability of 0.997 it can be stated that the average category of the machine-shop workers lies within the calculated limits.◄

With repeated serial selection, the average sampling error for a share is determined by the formula

μw = √(δw² / r),

where δw² is the inter-series variance of the share and r is the number of selected series.

Example.

A batch of parts is packed in 200 boxes, 40 pieces in each. To check the quality of the parts, a complete inspection was carried out in 20 boxes (non-repetitive serial sampling). The inspection showed that the proportion of defective parts is 15%; the inter-series variance is 49. With a probability of 0.997, let us determine the limits within which the proportion of defective products in the whole batch of boxes lies.

The average sampling error for the share (non-repetitive serial selection, r = 20 series out of R = 200): μw ≈ 1.47%.

The maximum sampling error for the share with probability 0.997 (t = 3): Δw = t·μw ≈ 4.41%.

With a probability of 0.997, it can be stated that the proportion of defective parts in the batch will range from 10.59% to 19.41%.
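A sketch of the same calculation (non-repetitive serial selection of a share). With the figures quoted above it gives limits close to the 10.59-19.41% stated in the text; the small differences come from rounding in the source:

```python
import math

# Sketch: share of defective parts, non-repetitive serial (cluster) sampling.
w = 15.0           # sample share of defective parts, %
delta2_w = 49      # inter-series variance of the share
r, R = 20, 200     # selected series and total number of series
t = 3              # confidence coefficient for P = 0.997

mu_w = math.sqrt(delta2_w / r * (1 - r / R))   # average sampling error of the share
delta_w = t * mu_w                             # maximum sampling error
print(round(w - delta_w, 2), round(w + delta_w, 2))   # ~10.5% ... ~19.5%
```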

Example

To determine the speed of settlements with creditors, 50 payment documents were selected by mechanical sampling, for which the average time for transferring money turned out to be 28.2 days with a standard deviation of 5.4 days. It is required to determine the average term of all payments during a given year with a probability of 0.95.

Solution. The marginal sampling error: Δ = t·σ/√n = 1.96·5.4/√50 ≈ 1.49 days.

Then, with a probability of 0.95, it can be stated that the average duration of settlements for the enterprises of this trust is not less than 26.7 days (28.2 – 1.49) and not more than 29.7 days (28.2 + 1.49). ◄
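The same confidence interval reproduced numerically (t = 1.96 for P = 0.95 is assumed, as in the text):

```python
import math

# Sketch: confidence interval for the mean settlement time (mechanical sample).
n, mean, sigma = 50, 28.2, 5.4
t = 1.96                                  # confidence coefficient for P = 0.95
delta = t * sigma / math.sqrt(n)          # marginal sampling error, ~1.49 days
print(round(mean - delta, 1), round(mean + delta, 1))   # 26.7 ... 29.7
```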

Example

The general population N consists of 100,000 units divided into 200 series of equal size. A non-repetitive combined sample was made: 50% of the series and 20% of the units from each selected series. The average of the within-series variances turned out to be 12, and the inter-series variance 5. It is necessary to determine the average sampling error.

The number of units in the selected series: 0.5·100,000 = 50,000; the number of units in the sample itself: 0.2·50,000 = 10,000. Using the formula for the average error of a non-repetitive combined sample, we find the average sampling error.

A sample of the same size could also be drawn from the 100,000 units by selecting 20% of the series and 50% of the units from each series. With the same values of the average within-series variance and the inter-series variance, the average error of such a sample would double.

The distribution of sample means always follows (or approaches) the normal law for sufficiently large samples, regardless of the nature of the distribution of the general population. In the case of small samples, however, a different law applies: the Student distribution. In this case the confidence coefficient is found from tables of Student's t-distribution, depending on the confidence probability and the sample size. For individual values, the confidence probability of a small sample is determined from special Student tables (Table 9), which give the distribution of standardized deviations:

Table 9.

n \ t    0.5      1.0      1.5      2.0      3.0
 …       0.347    0.609    0.769    0.861    0.942
 …       0.362    0.637    0.806    0.898    0.970
 …       0.368    0.649    0.823    0.914    0.980
 …       0.371    0.657    0.832    0.923    0.985
 …       0.376    0.666    0.846    0.936    0.992
 …       0.377    0.670    0.850    0.940    0.993
(The values of n for the individual rows are not recoverable from the source.)

Since in practice the confidence probability for a small sample is taken to be 0.95 or 0.99, the following values of the Student distribution are used to determine the maximum error of a small sample (Table 10).

Table 10.

n \ P    0.95     0.99
 …       3.183    5.841
 …       2.777    4.604
 …       2.571    4.032
 …       2.447    3.707
 …       2.364    3.500
 …       2.307    3.356
 10      2.263    3.250
 …       2.119    2.921
 …       2.078    2.832
(Only the row n = 10 can be identified with certainty, from the example below.)

Example.

During a quality control check of sausages supplied for sale, data on the table salt content in the samples were obtained. From the sample survey data it is necessary to establish, with a probability of 0.95, the limits within which the average percentage of table salt in the given batch of goods lies.

We draw up a calculation table and use it to determine the mean of the small sample (Table 11).

Table 11.

x, %      x – x̄     (x – x̄)²
4.3        0.2       0.04
4.2        0.1       0.01
3.8       -0.3       0.09
4.3        0.2       0.04
3.7       -0.4       0.16
3.9       -0.2       0.04
4.5        0.4       0.16
4.4        0.3       0.09
4.0       -0.1       0.01
3.9       -0.2       0.04
Σ = 41.0   0         0.68

We determine the variance of the small sample from the sum of squared deviations Σ(x – x̄)² = 0.68.

We determine the average error of the small sample: μ = √(0.68 / (10·9)) ≈ 0.087.

Based on the sample size (n = 10) and the specified probability P = 0.95, the confidence coefficient t = 2.263 is taken from the Student distribution (see Table 10).

The marginal error of the small sample: Δ = t·μ = 2.263·0.087 ≈ 0.2.

Therefore, with a probability of 0.95 it can be stated that the table salt content in the entire batch of sausage lies within the limits

from 4.1% – 0.2% = 3.9% to 4.1% + 0.2% = 4.3%.◄
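A sketch of the same small-sample interval; scipy's Student quantile is used instead of Table 10, which gives essentially the same t ≈ 2.262 for n = 10:

```python
import math
from scipy import stats

# Sketch: small-sample confidence interval for the mean salt content (Table 11 data).
x = [4.3, 4.2, 3.8, 4.3, 3.7, 3.9, 4.5, 4.4, 4.0, 3.9]
n = len(x)
mean = sum(x) / n                               # 4.1
ss = sum((v - mean) ** 2 for v in x)            # 0.68
mu = math.sqrt(ss / (n * (n - 1)))              # average error of the small sample
t = stats.t.ppf(0.975, df=n - 1)                # ~2.262 for a two-sided P = 0.95
print(round(mean - t * mu, 1), round(mean + t * mu, 1))   # 3.9 ... 4.3
```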

Example

It is required to construct a 99% confidence interval for estimating the general average diameter of a product based on a sample of 10 parts processed on an automatic lathe, if the deviations of the dimensions of these parts from the middle of the tolerance field turned out to be as follows (Table 12):

Table 12.

The sample average is 2.0 μm; the sample variance is 5.2.

The mean square error of the sample average is 0.76 μm.

With P = 0.99 and the number of degrees of freedom k = 9, we find from the table that t = 3.25. Then, with a probability of 0.99, the error of the sample average will be no more than 2.47 μm (3.25 · 0.76), and the acceptable values of the population parameter lie in the range from –0.47 to +4.47 μm (2.0 ± 2.47).◄

4. Determining the required sample size. Before carrying out a sample observation, it is always necessary to decide how many units of the population under study must be selected for the survey. The sample size is determined taking into account:

· type of intended sample;

· selection method (repeated or non-repetitive);

· the parameter to be estimated (the average value of a characteristic or a proportion).

In addition, it is necessary to determine in advance the value of the confidence probability that suits the consumer of information, and the size of the permissible maximum sampling error.

These problems are solved on the basis of the theorems of P. Chebyshev and A. Lyapunov. The maximum sampling error for a purely random or mechanical sample is Δ = t·√(σ²/n), from which, for repeated selection, the required sample size for an average quantitative characteristic is n = t²σ²/Δ².

For purely random and mechanical sampling with a non-repetitive selection method, the required sample size for an average quantitative characteristic is calculated using the formula

n = t²σ²N / (Δ²N + t²σ²).

When the share of a characteristic is determined from the sample materials, rather than its average value, the sample size is determined by the following formulas.

For repeated selection:

n = t²w(1 – w) / Δ².

For non-repetitive selection:

n = t²w(1 – w)N / (Δ²N + t²w(1 – w)).

The variance characterizing the dispersion in the general population is often unknown. In mathematical statistics it has been proven that the general and sample variances are related by the equality σ² = s²·n/(n – 1).

Since for sufficiently large n the factor n/(n – 1) is close to unity, we can assume that σ² ≈ s². Therefore, in practice, the sample variance is used as an estimate of the general variance. Note that at the beginning of a sample observation the variation indicators are unknown, so determining the required sample size is often a serious problem connected with estimating the variation of the characteristic being studied. The variation indicator is estimated approximately in one of the following ways:

· taken from previous studies;

· if the structure and the conditions of development of the phenomenon are sufficiently stable, the variance is found from its known relationship with the approximate value of the average;

· if the minimum and maximum values of the characteristic are known, the standard deviation can be estimated using the "three sigma" rule: σ ≈ R/6, since in a normal distribution the range of variation R fits within 6σ. If the distribution is clearly asymmetrical, then σ ≈ R/5;

· when studying an alternative characteristic, if even an approximate value of the share is unknown, one can take the maximum value of the dispersion of the share, equal to 0.25 (w(1 – w) = 0.5·0.5 = 0.25). In this case n = t²·0.25/Δ² for repeated selection and n = t²·0.25·N/(Δ²N + t²·0.25) for non-repetitive selection;

· carry out a “test” sample, from which the variation index is calculated, used as an estimate of the general population.

Since the general variance is estimated approximately, the sample size is rounded up both for repeated and non-repetitive sampling, since there must always be some “reserve” in the number of surveyed units to ensure the required accuracy of the results.

Often in practice it is not the absolute maximum error that is specified but the relative error, expressed as a percentage of the average:

Δ% = (Δ / x̄)·100, so that Δ = Δ%·x̄ / 100.

Substituting Δ expressed through the relative error into the formula for the sample size, we obtain for repeated selection

n = t²V² / Δ%².

As is known, the ratio (σ / x̄)·100 is the coefficient of variation V.

With non-repetitive sampling, the sample size is calculated using the formula

n = t²V²N / (Δ%²N + t²V²).

If the maximum sampling error and the sample size are given, then one can determine the value of the coefficient t and, knowing it, find the corresponding probability from the table.

Example

How many travel agents need to be surveyed at the region's travel enterprises in order to characterize the average level of remuneration of this category of workers in the region? It is known that the difference between the highest and lowest levels of remuneration of travel agents in the region is 300 thousand rubles.

For a normal distribution, the interval ±3σ includes 99.7% of all values of the characteristic; in relation to the problem under consideration this means that 300 thousand rubles is approximately equal to six standard deviations (300 ≈ 6σ). Therefore, an approximate estimate of the standard deviation of wages in the general population of travel agents of the region is 50 thousand rubles (300/6). For the further calculation it is enough that, with a probability of 0.954, the maximum sampling error should not exceed 10 thousand rubles. Then, knowing that σ = 50 thousand rubles and t = 2, and using formula (5.6) for the required sample size (n = t²σ²/Δ²), we obtain n = 4·2500/100 = 100 people.

Thus, under given conditions, it is necessary to survey the salaries of 100 travel agents in the region.◄
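The same estimate as a short sketch (t = 2 for P = 0.954, as in the text):

```python
# Sketch: required sample size from the "three sigma" estimate of sigma.
R = 300          # range of wages, thousand rubles
sigma = R / 6    # ~50, rough estimate via the three-sigma rule
delta = 10       # permissible maximum error, thousand rubles
t = 2            # confidence coefficient for P = 0.954
n = t ** 2 * sigma ** 2 / delta ** 2
print(n)         # 100.0
```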

Example

What should the sample size be from a population of 8,000 young investors so that, with a probability of 0.954, the relative marginal error is no more than 1%, if it is known that the coefficient of variation of the characteristic for the entire population is 0.125, i.e. 12.5%?

With V = 12.5%, Δ% = 1% and t = 2, the formula for non-repetitive selection gives n = t²V²N / (Δ%²N + t²V²) = 4·12.5²·8000 / (1²·8000 + 4·12.5²) ≈ 580 people.◄

Example

Using a sample survey of a certain population group (N = 5,000), it is required to determine the proportion of families that currently do not have an imported car. The maximum sampling error should be no more than 0.01 with a probability of 0.954. It can be assumed that the proportion in the population does not exceed 0.2. What should the sample size be?

The share of households that do not have an imported car is taken at its assumed upper bound, w = 0.2, so that w(1 – w) = 0.16. For non-repetitive selection the required sample size is n = t²w(1 – w)N / (Δ²N + t²w(1 – w)) = 4·0.16·5000 / (0.0001·5000 + 4·0.16) ≈ 2807 households. If in this example we do not take the size of the population into account, the calculation leads to a meaningless result: n = t²w(1 – w)/Δ² = 4·0.16/0.0001 = 6400, which exceeds the size of the population group itself.
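A sketch of both variants of this calculation, with and without the finite-population correction, using the standard sample-size formulas for a share:

```python
# Sketch: sample size for a share, with and without the population size N.
t, delta, N = 2, 0.01, 5000
w = 0.2                          # assumed upper bound of the share without an imported car
var = w * (1 - w)                # 0.16, dispersion of the share

n_repeated = t ** 2 * var / delta ** 2                             # ignores N
n_nonrep = t ** 2 * var * N / (delta ** 2 * N + t ** 2 * var)      # uses N
print(round(n_repeated), round(n_nonrep))   # 6400 (meaningless, > N) and ~2807
```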

Example

In a sample of 1000 units, the proportion of defective products was 2%. What is the probability that in the entire batch of products (10,000 pieces) the proportion of defective products will be in the range from 1.5 to 2.5%?

The confidence probability to be determined is a function of t. The latter is found from the formula for the maximum sampling error Δ = t·μ, i.e. t = Δ/μ. The value of the maximum sampling error can be taken as the difference between the maximum permissible general share (2.5% by the condition) and the share of defective products in the sample (2%).

Thus Δ = 0.5% (2.5% – 2.0%). Since the sample is random and non-repetitive, the average sampling error is found by the formula μ = √(w(1 – w)/n · (1 – n/N)) = √(0.02·0.98/1000 · (1 – 1000/10000)) ≈ 0.42%.

We find the value of the confidence coefficient: t = Δ/μ = 0.5/0.42 ≈ 1.19.

According to the table of the Laplace integral function, the probability corresponding to this value of t is 0.76595. ◄
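A sketch of the same reasoning, with the Laplace table replaced by scipy's normal CDF (Φ(t) = 2Φ0(t)):

```python
import math
from scipy import stats

# Sketch: probability that the batch defect share lies within 2% +/- 0.5%.
w, n, N = 0.02, 1000, 10000
delta = 0.005                                    # permissible deviation of the share
mu = math.sqrt(w * (1 - w) / n * (1 - n / N))    # non-repetitive sampling error
t = delta / mu                                   # ~1.19
prob = 2 * stats.norm.cdf(t) - 1                 # value of the Laplace integral function
print(round(t, 2), round(prob, 3))               # ~1.19 and ~0.766
```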

5. Methods of extending sample data to the general population. The sampling method is most often used to obtain characteristics of the general population from the corresponding sample indicators. Depending on the purposes of the research, two methods of extending the results of sample observation to the general population are used: direct recalculation of the sample indicators for the population, and the method of correction factors.

The direct recalculation method consists in extending the sample share or sample average to the general population, taking into account the sampling error. In this case the general average is estimated as x̄ ± Δ, and the general share as w ± Δw.

Thus, in trade the number of non-standard products received in a consignment is determined in this way: taking into account the accepted level of probability, the share of non-standard products in the sample is multiplied by the number of products in the entire consignment.

Example.

During a random inspection of a batch of 2,000 sliced loaves, the share of non-standard products in the sample was 0.1 (10 out of 100 inspected), with a maximum sampling error of 0.06 established with probability P = 0.954.

Based on these data, the share of non-standard products in the entire batch is 0.1 ± 0.06, i.e. from 0.04 to 0.16.

Using the method of direct recalculation, the limits of the absolute number of non-standard products in the entire batch can be determined: the minimum number is 2,000 · 0.04 = 80 pcs.; the maximum number is 2,000 · 0.16 = 320 pcs.
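A sketch of the direct recalculation, including the sampling error of the share (t = 2 for P = 0.954 and a sample of 100 loaves, as in the text):

```python
import math

# Sketch: direct recalculation of the share of non-standard loaves to the whole batch.
batch, n, defective = 2000, 100, 10
w = defective / n                                # sample share, 0.1
t = 2                                            # confidence coefficient for P = 0.954
delta = t * math.sqrt(w * (1 - w) / n)           # maximum sampling error, ~0.06
lo, hi = w - delta, w + delta                    # 0.04 ... 0.16
print(round(lo, 2), round(hi, 2), round(batch * lo), round(batch * hi))   # 0.04 0.16 80 320
```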

The method of correction factors is used in cases where the purpose of the sampling is to refine the results of a complete (continuous) observation.

In statistical practice, this method is used to refine the data of the annual censuses of livestock owned by the population. After the data of the complete census have been summarized, a 10% sample survey is carried out to determine the so-called "percentage of undercounting".

For example, if a 10% sample registered 52 head of livestock on the farms of the village population, while the complete census recorded 50 head for the same array, then the undercount coefficient is 4% ((2/50)·100). Taking this coefficient into account, a correction is made to the total number of livestock owned by the population of the given village.

6. Statistical testing of hypotheses. A hypothesis is a scientific assumption about the properties of phenomena and the factors that determine them, which requires verification and proof.

A statistical hypothesis is an assumption about the parameters or the form of the distribution of the general population that can be verified on the basis of the results of a sample observation. The essence of hypothesis testing is to check whether the sample results are consistent with the hypothesis, i.e. whether the discrepancies between the hypothesis and the sample data are random or systematic.

A hypothesis may concern a normal, binomial, Poisson or other distribution. The frequent reference to the normal distribution is explained by the fact that this type of distribution expresses a pattern arising from the interaction of many random causes, none of which predominates. In socio-economic statistics the normal distribution in its pure form is rare, but comparison with it is important for determining the extent and nature of the deviation of an actual distribution from it. When testing hypotheses, two types of errors are possible:

a) Type I error – the hypothesis being tested (usually called the null hypothesis) is in fact true, but the test results lead to its rejection;

b) Type II error– the hypothesis being tested is in fact erroneous, but the results of the test lead to its acceptance.

Most often, the hypothesis to be tested is formulated as the absence of a discrepancy between the unknown population parameter and a given value; this null hypothesis is denoted H0. The content of the hypothesis is written after a colon, for example H0: x̄ = x̄0.

A statistical criterion is the rule according to which the null hypothesis is accepted or rejected. Special criteria have been developed for each type of hypothesis; the most frequently used are tests based on the normal distribution, the Student distribution, the Fisher test and the Pearson ("chi-square") distribution, among others.

To construct a statistical criterion that allows you to test a certain hypothesis, you need the following:

1) Formulate a testable hypothesis. Along with the hypothesis being tested, a competing hypothesis (alternative) is also formulated;

2) select a significance level that controls the permissible probability of a type I error;

3) determine the range of acceptable values and the so-called critical region;

4) make a decision based on a comparison of the actual and critical values of the criterion.

The significance level (α) is such a small probability of the criterion falling into the critical region, given that the hypothesis is valid, that the occurrence of this event can be regarded as the consequence of a significant discrepancy between the hypothesis put forward and the sample results. Typically the significance level is taken to be 0.05 or 0.01.

The power of a test is the probability of rejecting the null hypothesis being tested when the alternative hypothesis is true, i.e. the probability that a type II error will not be made. Naturally, a more powerful test is preferable, since it ensures a minimal probability of a type II error.

Statistical tests used to test hypotheses are of two types:

1) Parametric criteria are those based on the assumption that the distribution of the random variable in the population obeys some known law (for example, the normal, binomial or Poisson law). These include, in particular, tests based on the normal, Student and Fisher distributions.

2) Nonparametric (rank) criteria are those whose use does not require knowledge of the distribution law of the random variable. They can be used when the distribution differs significantly from the normal one. These include the sign test and the Wilcoxon, White and Mann-Whitney tests.

Compared to parametric tests, non-parametric testing has the following advantages and disadvantages.

Advantages:

1. Fewer assumptions about the population are required. The most important is that the population does not have to be normally or approximately normally distributed.

2. Non-parametric testing methods can be applied even when the sample is very small.

3. Data presented in any measurement scale (nominal, ordinal) can be used.

4. Simplicity of calculations, which can be carried out on a microcalculator. This is primarily due to the small number of observations to which nonparametric tests are applied.

Disadvantages:

1. The information in the data is used less efficiently, and the power of the tests is lower than that of parametric tests.

2. Nonparametric testing relies more heavily on statistical tables, unless a special software package is used.

Stages of work on testing a statistical hypothesis:

1) assessment of input information and description of the statistical model of the sample population;

2) formulation of the null and alternative hypotheses;

3) establishing the significance level with which to control the error of the first type;

4) choosing a powerful criterion to test the null hypothesis (this makes it possible to control the occurrence of a type II error);

5) calculation of the actual value of the criterion using a certain algorithm;

6) determining the critical region and the region of agreement with the null hypothesis, that is, establishing a tabular value of the criterion;

7) comparison of the actual and tabulated values of the criterion and drawing conclusions based on the results of testing the null hypothesis.

The number of observations from which the empirical distribution is constructed is limited and represents a sample from the population under study. Empirical data are subject to random errors whose magnitude is unknown. As the number of observations increases and the interval width simultaneously decreases, the zigzags of the polygon smooth out and, in the limit, it turns into a smooth curve, the distribution curve.

The distribution curve characterizes the theoretical distribution, that is, the distribution that would be obtained if all random causes obscuring the main pattern were completely eliminated.

The study of the pattern (shape) of distribution includes:

· clarification of the general nature of the distribution;

· smoothing of the empirical distribution, that is, constructing a curve of a given form on the basis of the empirical distribution;

· checking the compliance of the found theoretical distribution with the empirical one.

Homogeneous populations are characterized by single-peaked distributions. A multi-peaked distribution indicates heterogeneity of the population being studied. In this case it is necessary to regroup the data in order to identify more homogeneous groups.

Determining the general nature of the distribution involves assessing the degree of homogeneity, as well as calculating indicators of asymmetry and kurtosis.

A distribution is symmetrical if the frequencies of any two variants equidistant from the center of the distribution are equal to each other. For a symmetrical distribution x̄ = Me = Mo.

For a comparative analysis of the asymmetry of several distributions, the relative asymmetry index is calculated: As = (x̄ – Mo) / σ.

The value of As can be positive or negative. A positive value indicates right-sided asymmetry (the right branch is more elongated relative to the maximum ordinate than the left one) (Fig. 1):

Fig. 1. Mo < Me < x̄

A negative sign of the asymmetry indicator indicates the presence of left-sided asymmetry (Fig. 2).

Fig. 2. Mo > Me > x̄

The most commonly used asymmetry indicator is calculated by the formula

As = μ3 / σ³,

where μ3 is the central moment of the third order.

The use of this indicator makes it possible to determine not only the degree of asymmetry but also whether asymmetry of the characteristic is present in the general population. The assessment is carried out using the mean square error of the asymmetry indicator, σAs, which depends on the number of observations n.

If As/σAs > 3, the asymmetry is significant and the distribution of the characteristic in the general population is not symmetrical. If As/σAs < 3, the asymmetry is insignificant and its presence may be explained by the influence of random circumstances.

A goodness-of-fit criterion is a criterion for testing a hypothesis about the assumed law of an unknown distribution in the general population. There are a number of goodness-of-fit criteria: Pearson, Kolmogorov, Smirnov, Yastremsky. These criteria make it possible to establish whether the empirical distribution agrees with the theoretical one and how significant the discrepancies between the distributions are.

One of the most widely used goodness-of-fit tests is K. Pearson's "chi-square" criterion:

χ² = Σ (f – f′)² / f′,

where f and f′ are the frequencies of the empirical and theoretical distributions in the given interval, respectively.

The greater the difference between the observed and theoretical frequencies, the greater the value of the Pearson criterion. To distinguish significant values from those that may arise merely as a result of random sampling, the calculated value of the criterion is compared with the tabulated value for the appropriate number of degrees of freedom and the given significance level.

Having calculated the value of the Pearson criterion from the data of a specific sample, one of the following situations may arise:

1) χ²calc > χ²table, i.e. the calculated value falls into the critical region. This means that the discrepancy between the empirical and theoretical frequencies is significant and cannot be explained by random fluctuations of the sample data. In this case the hypothesis that the empirical distribution is close to normal is rejected.

2) χ²calc ≤ χ²table, i.e. the calculated criterion does not exceed the maximum discrepancy between empirical and theoretical frequencies that can arise due to random fluctuations of the sample data. In this case the hypothesis that the empirical distribution is close to normal is not rejected.

The table value of the Pearson criterion is determined at a fixed significance level and the corresponding number of degrees of freedom.

The number of degrees of freedom is k – z, where k is the number of groups and z is the number of conditions that are assumed to be satisfied when calculating the theoretical frequencies. The concept of the number of degrees of freedom arises because in statistical aggregates linear relationships must be taken into account that restrict the freedom of variation of the random variables. For example, when calculating the variance we have n – 1 degrees of freedom, since any one value of the characteristic can be determined from the other n – 1 values and the arithmetic mean.

When calculating the Pearson criterion, the following conditions must be met:

1. The number of observations must be large enough

2. If the theoretical frequencies in some intervals are less than 5, then such intervals are combined so that the frequencies are greater than 5.
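A sketch of how such a check can be run in code. The observed and expected frequencies here are hypothetical, and scipy's chisquare is used with ddof = 2 because two parameters (the mean and the standard deviation) are assumed to have been estimated from the sample:

```python
from scipy import stats

# Sketch: Pearson chi-square goodness-of-fit check with hypothetical frequencies.
f_obs = [6, 12, 25, 32, 30, 22, 10, 6]     # empirical frequencies (hypothetical)
f_exp = [4, 11, 22, 31, 33, 23, 12, 7]     # theoretical (normal-law) frequencies

# ddof = 2: two distribution parameters were estimated from the sample,
# so the degrees of freedom are k - 1 - 2.
chi2, p_value = stats.chisquare(f_obs, f_exp, ddof=2)
print(round(chi2, 2), round(p_value, 3))   # compare chi2 with the tabulated critical value
```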

Example

It is required to check, using the χ² criterion, whether the distribution of the region's enterprises by the average cost of fixed assets corresponds to the normal distribution law.

It is necessary to test the hypothesis that the sample is obtained from a normally distributed population (in this population the mean is 30.3 and the standard deviation is 8.44).

To answer the question, we will compile auxiliary table 13.

Table 13

Groups of construction enterprises by volume of contract work performed, mln rub. | t1 | t2 | Φ(t1) | Φ(t2) | p | f′ = 143·p
10–15   -2.41   -1.81   -0.984   -0.930   0.027    3.9
15–20   -1.81   -1.22   -0.930   -0.778   0.076   10.9
20–25   -1.22   -0.63   -0.778   -0.471   0.153   21.9
25–30   -0.63   -0.04   -0.471   -0.032   0.220   31.4
30–35   -0.04    0.56   -0.032    0.425   0.228   32.6
35–40    0.56    1.15    0.425    0.750   0.163   23.3
40–45    1.15    1.74    0.750    0.918   0.084   12.0
45–50    1.74    2.33    0.918    0.980   0.031    4.4
50–55    2.33    2.93    0.980    0.997   0.008    1.2
(The columns with the observed and rounded frequencies and the individual χ² terms are only partly recoverable: 0.18; 3.226; 1.48; 0.173; 0.333; 0.2.)
Total: χ² = 5.512

For the first interval the theoretical frequency is

143·0.027 = 3.9 ≈ 4.

The number of groups after combining the small ones was 7. The critical value of χ² with 7 – 3 = 4 degrees of freedom and a significance level of 0.05 is 9.49; at α = 0.1 it is 7.78, which is also greater than the actual value of 5.512. The discrepancy between the empirical and normal distributions is therefore not significant, and the hypothesis that the distribution of the given population follows the normal law cannot be rejected.

Using the χ² criterion, one can test not only the hypothesis that the empirical distribution agrees with the normal law but also agreement with any other known distribution law, for example the Poisson distribution. This distribution arises when low-probability events occurring in a long series of independent trials are considered. The probability of occurrence of such rare events is

P(m) = (a^m · e^(–a)) / m!,

where a = np is the average number of occurrences of event A in n identical independent trials; p is the probability of the event in one trial; e = 2.71828; m is the frequency (number of occurrences) of the event.

For example, to carry out internal quality control of the processing of payment requests, 100 documents were selected at random; the average number of errors per document turned out to be a = 0.39. It is required to check, using the χ² criterion, whether the empirical distribution corresponds to the Poisson distribution (Table 14).

Table 14

Number of errors m | P(m) | f′ = 100·P(m) | (f – f′)²/f′
0       0.6771   67.7     0.7859
1       0.2641   26.4     0.4100
2       0.0515    5.15    0.0043
3       0.0067    0.7     8.1148
4       0.0007    0.1    13.3877
Total   1.0000           26.400
(The column with the observed number of documents checked is not recoverable from the source.)

The value χ² = 26.4. The number of degrees of freedom df = 5 – 1 = 4. (In general, for the Poisson distribution df = k – 1 – r, where r = 1 if the distribution parameter is estimated from the sample and r = 0 otherwise.) The table values are χ²(0.05; 4) = 9.49 and χ²(0.01; 4) = 13.28. Since 26.4 exceeds both table values, the hypothesis of a Poisson distribution is rejected.

To assess the degree of agreement between empirical and theoretical distributions according to this criterion, special tables are used.

In the absence of special tables, the "chi-square" criterion can be replaced by V. I. Romanovsky's criterion

C = (χ² – ν) / √(2ν),

where ν is the number of degrees of freedom.

For a normal distribution the Charlier criterion is also used, in which k denotes the number of intervals (groups).

The differences between the empirical and theoretical frequencies are considered random if the value of the criterion is less than three.

In addition to these criteria, nonparametric criteria are also used, and the relevance of their use is constantly increasing.

The Wilcoxon signed-rank test is based on ranking the observations by the absolute size of their deviation from the hypothesized median; the statistic W is the sum of the ranks of the observations with positive deviations, and n is the number of observations for which the deviation is non-zero.

The rejection region for H0 can lie on one side or on both sides, depending on which null hypothesis is being tested. In the absence of special tables of the W statistic, the standard normal distribution can be used, i.e. a Z statistic with the corresponding probability P.

Example

It is required, using the Wilcoxon signed-rank test, to decide whether the median profit in the studied population of firms engaged in real estate transactions significantly exceeds zero (5% significance level). The null and alternative hypotheses are written as follows: H0: m ≤ 0; H1: m > 0.

Table 16

Wilcoxon test calculation

Firm | Observed values (profit as a percentage of sales) | Rank R+ | Rank R–

(The individual firm values and ranks are only partly recoverable from the source; the column totals are ΣR+ = 139.5 and ΣR– = 13.5.)

For firms with positive profit the ranks are placed in a separate column R+. The sum of the values in this column gives the Wilcoxon statistic: W = 139.5. (The column R– is not used in the test itself but is calculated to guard against errors.)

Critical Criterion Value W can be found from tables.

For 17 non-zero differences and α = 0.05, the lower critical value is W = 42 and the upper one is 111. The actual value of 139.5 does not fall within the range of table values; therefore, the null hypothesis can be rejected at the 5% significance level.

The Wilcoxon rank-sum test for comparing two samples can be used as a nonparametric alternative for problems that were previously solved with the parametric t-test. The values of one population are denoted x1, x2, … and of the other y1, y2, … The calculation procedure is similar to applying the criterion to a single sample.

Example

Each member of a 17-person analysis team was shown two advertisements, and each subject rated the creative level of each advertisement on a scale from 1 to 5. It is required to compare the creative levels of the two advertisements at the 5% significance level.

H0: m = 0, that is, the median of the differences in ratings in the population is zero (the creative levels of the two advertisements are the same).

The value 21.5 falls within the table limits; therefore, the null hypothesis is accepted. Conclusion: the advertisements being compared have the same level of creativity. For the two-sample comparison it is assumed that the ranks for the data of the second sample are written in column R2, and the observed (actual) value of the Wilcoxon statistic W is calculated as the sum of the ranks of one of the samples.

Example. The firm is facing a lawsuit alleging wage discrimination against employees based on gender. Using the wage data presented in Table 19, it is required to determine at the 5% significance level whether both distributions have the same median.

Table 19

Data on employee gender discrimination

Monthly salary, thousand rubles

Women: 11.2, 10.5, 8.3, 10.2, 14.4, 8.5, 5.0 (combined ranks 10, 7.5, 2, 6, 14, 3, 1; sum of ranks ΣR = 43.5)

Men: 9.1, 18.3, 14.1, 21.9, 10.5, 13.8, 14.6, 8.6, 13.4, 10.6 (the tied value 10.5 also receives rank 7.5)

Since there is no reason to believe that monthly wages for one group of employees are higher than for another, the null and alternative hypotheses are formulated as two-tailed.
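A sketch of this two-sample comparison in code, using the salary values from Table 19. scipy's Mann-Whitney (rank-sum) implementation is used here in place of the table lookup, so the numerical details differ slightly from the manual procedure described in the text:

```python
from scipy import stats

# Sketch: two-sample rank test on the Table 19 salary data (thousand rubles).
women = [11.2, 10.5, 8.3, 10.2, 14.4, 8.5, 5.0]
men = [9.1, 18.3, 14.1, 21.9, 10.5, 13.8, 14.6, 8.6, 13.4, 10.6]

# Two-sided alternative: the medians of the two distributions differ.
stat, p_value = stats.mannwhitneyu(women, men, alternative="two-sided")
print(stat, round(p_value, 3))   # compare p_value with the 0.05 significance level
```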

The working time fund of one worker is determined by a formula that takes into account the following quantities:

calendar days in the year, 365;

days off per year, 54;

holidays per year, 13;

vacation days, 24;

days of absence from work due to illness, 2;

days of absence from work due to the performance of government duties, 1;

length of the working day, 8 hours;

time by which the working day is shortened before days off and holidays, 1 hour.

Number of repair workers

The number of repair workers is determined by the corresponding formula, in which Kvn, the coefficient of fulfillment of production standards for repair workers, is taken within the range 1.06-1.1.

Distribution of workers by profession and category

The distribution of the number of repair workers by category and profession is made taking into account the volume of production by type of work based on Table 4.

Table 4 - Distribution of workers by category and type of work

Calculation of the average wage category of a worker

The average tariff category of a worker is calculated by the formula:

Rav = (4·Ch4 + 5·Ch5 + 6·Ch6) / (Ch4 + Ch5 + Ch6), (32)

where 4, 5, 6 are the job categories and Ch4, Ch5, Ch6 are the numbers of repair workers of the corresponding category, people.

The result is most often a fractional value and should not be rounded to a whole number. It is recommended to write the average tariff category as a decimal fraction, with the whole part denoted by a Roman numeral and the fractional part by Arabic numerals, for example III,5.
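A sketch of formula (32); the headcounts per category are hypothetical, since Table 4 is not reproduced in the text:

```python
# Sketch: average tariff category as a weighted mean (formula 32).
headcount = {4: 6, 5: 3, 6: 1}   # hypothetical numbers of workers in categories 4, 5, 6

avg = sum(cat * n for cat, n in headcount.items()) / sum(headcount.values())
print(round(avg, 1))   # 4.5, written as IV,5 in the notation described above
```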

Selecting work and rest modes

The working hours and rest time of enterprise employees are regulated by labor legislation. A work-rest regime is optimal if it extends the period of stable performance. To a large extent this is achieved by the right choice of the lunch break time and of additional break times.

The best time for a lunch break is the middle of the shift. The frequency and duration of regulated short breaks should be set depending on the workload and pace of work at the production site. For a light workload and work tempo, 2-4 five-minute breaks are recommended; for a heavy workload and high work tempo, 4-5 ten-minute breaks; for work with a high tempo and nervous tension, 4 fifteen-minute breaks per 7-8 hour shift.