Statistical processing of data describing very large collectivities

CC BY

Abstract of the research article in economics and business. Authors: Ivan Ion, Ciurea Cristian, Vinturis Sorin

Types of collectivities and their characteristics are presented. Indicators associated with data that describe collectivities are determined. National databases intended to treat homogeneous sets of problems are analyzed. Data sets are identified and a combined analysis is performed to determine certain statistics.


STATISTICAL PROCEDURES FOR DESCRIBING DATA OF LARGE SETS

UDC 311.2, 004.652

Ion Ivan

PhD Professor, Economic Informatics Department of Bucharest Academy of Economic Studies

Phone: +4.021.319.19.00, +4.021.319.19.01; E-mail: [email protected]

Cristian Ciurea

PhD Candidate, Economic Informatics Department of Bucharest Academy of Economic Studies

E-mail: [email protected]

Sorin Vinturis

PhD student at the Doctoral School of Bucharest Academy of Economic Studies, in the field of Economic Informatics

E-mail: [email protected]


Keywords: statistical processing, collectivity, database, data set.


1. Very large statistical collectivities

Consider a very large collectivity C consisting of the elements $C_1, C_2, \ldots, C_N$, where N is a very large number, of the order of $10^6$ elements. One such collectivity is the population of a country or the bank transactions recorded over a number of months. Material resources exist to create a physical medium for the data of collectivity C. A large collectivity contains millions of components, for example a portal database with real-time data acquisition.

Each element $C_i$ is described by the characteristics $K_1, K_2, \ldots, K_M$.

If C is a collectivity of people, then $K_1$ represents the name, $K_2$ the height and $K_3$ the weight.

The database size which is used to store the elements of the collectivity is measured as:

- number of records N;

- number of bytes.

The database size in bytes, DIM, is given by:

$$DIM = N \cdot \sum_{i=1}^{M} L_i,$$

where:

$L_i$ - length of the field $K_i$,

M - number of characteristics.
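As a quick check of the size formula, the following minimal sketch computes DIM for fixed-length fields. The function name `database_size_bytes` and the field lengths (50, 4 and 4 bytes) are illustrative assumptions, not values taken from the article.

```python
def database_size_bytes(num_records: int, field_lengths: list[int]) -> int:
    """Return DIM = N * sum(L_i), assuming every record has fixed-length fields."""
    return num_records * sum(field_lengths)


# Hypothetical lengths in bytes for K1 (name), K2 (height), K3 (weight).
field_lengths = [50, 4, 4]

# N of the order of 10^6 elements, as in the text.
dim = database_size_bytes(1_000_000, field_lengths)
print(dim)  # 58000000
```

Under these assumed lengths, a collectivity of one million people needs about 58 MB before any indexing or storage overhead is counted.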

The quality characteristics of data from very large databases, that describe the collectivity, are:

- homogeneity of the measurement procedures, which meet the same standards; these are certified to satisfy that standard;

- comparability;

- precision;

- completeness; if data gathering is not complete, high correction costs arise;

- reproducibility;

- interdependence over time.

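Table 1.1. Associated indicators that describe collectivities

Characteristic | Measurement 1 | Measurement 2 | Difference
$C_1$ | $\alpha_1$ | $\beta_1$ | $\Delta_1$
$C_2$ | $\alpha_2$ | $\beta_2$ | $\Delta_2$
... | ... | ... | ...
$C_N$ | $\alpha_N$ | $\beta_N$ | $\Delta_N$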

If the differences $\Delta_i$ are significant, then the data gathering process has not been carried out properly. The data are actually used for decisions such as reserving a hotel room or running a promotion for the customers of a mobile phone company.
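The comparison of two measurements can be sketched as follows. This is only an illustration: it assumes that the difference is the first measurement minus the second, and that "significant" means exceeding a tolerance chosen by the analyst; neither convention is stated in the article.

```python
def significant_differences(first, second, tolerance):
    """Return (element index, difference) pairs whose absolute difference exceeds tolerance.

    Assumption: the difference is measurement 1 minus measurement 2.
    """
    return [
        (i, a - b)
        for i, (a, b) in enumerate(zip(first, second), start=1)
        if abs(a - b) > tolerance
    ]


# Hypothetical height measurements (cm) for three elements of the collectivity.
measurement_1 = [170.0, 182.5, 165.0]
measurement_2 = [170.0, 179.0, 165.2]

flagged = significant_differences(measurement_1, measurement_2, tolerance=1.0)
print(flagged)  # [(2, 3.5)]
```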

Consider a database of hotel rooms. Room reservations are made over the Internet for some period of time. The database is wrong: the hotel actually has 100 rooms, but the database lists 150. A customer making a reservation may receive room number 130, which does not exist in reality. If $h_i$ is the number of hotels in town i, i = 1..k, of a country, then the size of the set H of hotels in that country is determined according to the following relation:

$$H = \sum_{i=1}^{k} h_i.$$

For every hotel in the set H, the minimum number of reservations necessary for the hotel to operate, $R_{\min}$, and the maximum number of reservations that the hotel can support, $R_{\max}$, are calculated.

The actual number of room reservations, R, must satisfy the following relation:

$$R_{\min} \le R \le R_{\max}.$$
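The two checks implied by the hotel example can be sketched as below. The function names and the sample bounds are hypothetical; only the figures of 100 real rooms and room number 130 come from the text.

```python
def reservations_within_bounds(r_min: int, r: int, r_max: int) -> bool:
    """Check the relation R_min <= R <= R_max for one hotel."""
    return r_min <= r <= r_max


def room_exists(room_number: int, real_room_count: int) -> bool:
    """Check an assigned room number against the real number of rooms.

    Assumption: rooms are numbered consecutively starting from 1.
    """
    return 1 <= room_number <= real_room_count


# The hotel has 100 rooms in reality, so room 130 cannot be honored.
print(room_exists(130, 100))  # False

# Hypothetical bounds: at least 20 and at most 100 reservations.
print(reservations_within_bounds(20, 75, 100))  # True
```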

Consider the collectivity $H_1, H_2, \ldots, H_T$ of town halls. In a town hall there are the positions $P_1, P_2, \ldots, P_S$. The database indicator $G_{tj}$ is determined, representing the number of people working in town hall t who hold position j.

There are T databases, corresponding to the T town halls. The T databases are merged to obtain the virtual database of all workers from the T town halls.

Assume that x% of GDP is allocated to the workers' wages. The virtual database indicator $g_i$ is determined, representing the number of workers of type i, by making selections for each profession $P_i$.

The professions stack is constructed and the occurrence frequency of each profession is determined, satisfying the following relationship:

$$\sum_{i=1}^{t} f_i = S,$$

where:

t - number of weights,

S - number of professions.

The weight $p_i$ of a profession in the collectivity is determined based on the following relationship:

$$p_i = \frac{f_i}{S}.$$

The product $A \cdot p_i$ is considered the salary weight received by workers holding position i, with $B_i$ representing the wages of individuals from professional category i.

To obtain the changed wages, the GDP percentage is modified from x to x'. The wage change is simulated either proportionally for all employees or differentially using the weights $p_1, p_2, \ldots, p_n$, provided that:

$$\sum_{i=1}^{n} p_i = 1.$$
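A minimal sketch of how profession weights that sum to 1 could be derived from the merged list of workers. It assumes each weight is the occurrence frequency of the profession divided by the sum of all frequencies; the profession names are hypothetical, and exact fractions are used so that the constraint can be checked without rounding.

```python
from collections import Counter
from fractions import Fraction


def profession_weights(professions):
    """Map each profession to its frequency divided by the total of all frequencies."""
    counts = Counter(professions)
    total = sum(counts.values())
    return {name: Fraction(count, total) for name, count in counts.items()}


# Hypothetical professions of the workers in the merged (virtual) database.
workers = ["clerk", "clerk", "inspector", "accountant", "clerk"]

weights = profession_weights(workers)
print(weights["clerk"])       # 3/5
print(sum(weights.values()))  # 1
```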

Fig. 1.1. Virtual database of workers from the town halls

The new weights will be:

$$p_1' = p_1 + a_1,$$

$$p_2' = p_2 + a_2,$$

$$\ldots$$

$$p_n' = p_n + a_n,$$

provided that $\sum_{i=1}^{n} p_i' = 1$ and $\sum_{i=1}^{n} a_i = 0$.

The risk of working on samples lies in the loss of information. The population is determined every ten years through a census, or through statistics on the number of births and deaths. When working on databases, data are grouped by profession or by wage category, and the results obtained are more accurate. The risks of operating with very large data sets concern the difficulty of eliminating abnormal or incorrect values, introduced by human operators or retrieved through acquisition from various equipment. The existence of such false data affects the values of the indicators calculated on the entire collectivity.

The arithmetic mean of the elements of a very large collectivity, AM, is calculated on the intervals [1; j], [j+1; 2j], [2j+1; 3j], ..., [(i-1)j+1; ij], according to the following formula:

$$AM = \frac{\sum_{m=1}^{j} e_m + \sum_{m=j+1}^{2j} e_m + \sum_{m=2j+1}^{3j} e_m + \ldots + \sum_{m=(i-1)j+1}^{ij} e_m}{i \cdot j},$$

where:

$e_m$ - the m-th element of the collectivity;

i - number of intervals;

j - number of elements within an interval.
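The interval-based mean can be sketched as follows. The in-memory list stands in for the database, and each slice stands in for one short query over an interval; the handling of a shorter last interval is an addition of this sketch, since the formula assumes exactly i intervals of j elements.

```python
def mean_on_intervals(elements, j):
    """Arithmetic mean computed by summing consecutive intervals of j elements."""
    total = 0
    count = 0
    for start in range(0, len(elements), j):
        interval = elements[start:start + j]  # stands in for one query on one interval
        total += sum(interval)
        count += len(interval)  # the last interval may hold fewer than j elements
    return total / count


values = list(range(1, 11))  # hypothetical collectivity: 1, 2, ..., 10
print(mean_on_intervals(values, 3))  # 5.5
```

The result is the same as summing everything at once; what changes is that each partial sum needs the data only briefly.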

The computing power of servers hosting very large databases is sufficient to compute the arithmetic mean of the elements of the entire collectivity without dividing them into intervals. However, this approach locks the database for longer than computing on intervals, in which case multiple connections to the database are made. The solution based on very large virtual databases fragments the locking, which dramatically reduces waiting times in distributed applications with a large number of simultaneous users.

2. Very large databases

The databases of a bank are filled with over 100 GB of new data per day as a result of customer transactions. This growth is achieved by updating the databases solely by adding data on current transactions. Raiffeisen Bank has a data warehouse of over 24 TB. These data need to be correlated, cleaned and aggregated. Much of this data resides in separate systems, including at user level, in local Excel files or local Access databases within everyone's reach, because of the information anarchy generated by Office products. Organizations make impressive investments in technology and servers to process data, yet the information becomes dispersed and is lost in silos of applications and expertise. These silos need to be consolidated, and IT solutions should be developed not to store large amounts of data, but to create information and knowledge uniformly and consistently.

Retrieved and processed, these data are translated into banking information. The processed data form a banking information flow that influences both the internal operation of the banking unit and what lies beyond it. The complexity of the banking information system requires that its organization and functioning rest on a series of principles:

- the design and operation of the banking information system should follow the organizational structure of the banking unit;

- the ranking of banking information by importance and degree of efficiency;

- the concentration and centralization of banking information;


- the design, organization and operation of the banking information system should be simple enough that banking operations become more efficient.

In a bank, tens of millions of transactions are made daily, representing transfers between existing accounts, opening of new accounts, opening or closing of deposits, and loan agreements. These transactions require an advanced database management system and an integrated computer system. Electronic transactions that take place in a bank are saved in databases and are never deleted. Each bank has well-tuned backup and disaster recovery procedures to avoid the loss of database records, even in the event of natural disasters.

Considering the virtual database corresponding to the daily transactions in a bank, the problem of diversity extraction arises. The extraction operation involves the following steps:

- a collectivity is considered;

- a characteristic is chosen within the collectivity;

- the collectivity items are sorted by the chosen characteristic;

- the frequencies of occurrence of identical values of the chosen characteristic are computed;

- the collectivity items are sorted by frequency levels;

- the variability of the elements belonging to the considered frequencies is analyzed.
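The first five steps above can be sketched as a small function. The record layout (dictionaries with an `amount` key) is a hypothetical choice for illustration, and ties between equal frequencies are broken by value, which the article does not specify.

```python
from collections import Counter


def frequency_levels(items, characteristic):
    """Sort items by a characteristic, count identical values, and rank values by frequency."""
    values = sorted(item[characteristic] for item in items)  # sort by the chosen characteristic
    counts = Counter(values)                                 # frequency of identical values
    # Highest frequency first; equal frequencies are ordered by value (an assumption).
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical collectivity of transactions.
transactions = [{"amount": a} for a in (100, 250, 100, 100, 250, 40)]

print(frequency_levels(transactions, "amount"))  # [(100, 3), (250, 2), (40, 1)]
```

The last step, the variability analysis, then works on the groups at the top of this ranking.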

For the daily transactions conducted in a bank, the characteristics are the workstation, the client account, the beneficiary account and the transaction value. Transactions are sorted by value. The frequencies of the traded amounts are determined and the highest frequencies are chosen. It is verified whether the transactions were made from the same workstation. The persons who operated the transactions are determined, as are the beneficiaries of the traded amounts. The analysis reveals abnormal situations, for instance that all transactions were made to a certain destination, or that all transactions were made from the same workstation. An explanation of these abnormal situations is sought in order to prevent any fraud attempts.
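A hedged sketch of the check described above: the most frequent amounts are selected, and a group is flagged when all its transactions come from one workstation or go to one beneficiary. The field names, the `top` parameter and the sample data are assumptions made for the example; a real rule set would be defined by the bank.

```python
from collections import defaultdict


def suspicious_amounts(transactions, top=1):
    """Flag the most frequent amounts whose transactions share one workstation or beneficiary."""
    by_amount = defaultdict(list)
    for tx in transactions:
        by_amount[tx["amount"]].append(tx)

    # Keep only the `top` amounts with the highest frequencies.
    most_frequent = sorted(by_amount.items(), key=lambda kv: -len(kv[1]))[:top]

    alerts = []
    for amount, group in most_frequent:
        workstations = {tx["workstation"] for tx in group}
        beneficiaries = {tx["beneficiary"] for tx in group}
        if len(group) > 1 and (len(workstations) == 1 or len(beneficiaries) == 1):
            alerts.append((amount, len(group)))
    return alerts


# Hypothetical daily transactions.
daily = [
    {"amount": 999, "workstation": "WS1", "beneficiary": "B1"},
    {"amount": 999, "workstation": "WS1", "beneficiary": "B2"},
    {"amount": 999, "workstation": "WS1", "beneficiary": "B3"},
    {"amount": 50, "workstation": "WS2", "beneficiary": "B4"},
    {"amount": 50, "workstation": "WS3", "beneficiary": "B5"},
]

print(suspicious_amounts(daily, top=2))  # [(999, 3)]
```

A flagged group is only a prompt for the explanation the text calls for, not evidence of fraud by itself.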

If an incorrect transaction is made on a client account, meaning that a payment was made to a beneficiary other than the correct one or that a wrong amount of money was transferred, then a payment reversal is carried out and a new transaction is registered on the account. Once a transaction is registered on an account, it can no longer be erased. The payment reversal requires crediting the customer's account with the equivalent of the payment, which leads to two entries in the database: one for the payment and one for the payment reversal. This working method has advantages, such as keeping track of all transactions carried out on an account.

Fig. 2.1. Transaction elements (key, ID, transaction date, transaction place, value)

Information regarding banking transactions increases in value as it ages. Banks have recognized this opportunity and charge for making a customer's old account transactions available. If a customer requires proof of a payment made from his account three years ago, the bank provides the account statement for the requested payment day for a fee.
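The payment reversal mechanism, in which nothing is deleted and a compensating entry is appended, can be sketched as an append-only ledger. The entry layout and identifiers are hypothetical.

```python
def reverse_payment(ledger, payment_id):
    """Append a compensating entry for a payment instead of deleting the original."""
    original = next(entry for entry in ledger if entry["id"] == payment_id)
    reversal = {
        "id": len(ledger) + 1,
        "account": original["account"],
        "amount": -original["amount"],  # credit back the equivalent of the payment
        "reverses": payment_id,
    }
    ledger.append(reversal)
    return reversal


def balance(ledger, account):
    """Sum all entries registered on an account."""
    return sum(entry["amount"] for entry in ledger if entry["account"] == account)


# Hypothetical wrong payment of 200 debited from account "A".
ledger = [{"id": 1, "account": "A", "amount": -200, "reverses": None}]
reverse_payment(ledger, 1)

print(len(ledger))           # 2
print(balance(ledger, "A"))  # 0
```

Both entries remain in the ledger, which is what allows the bank to produce a statement for any past day.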

When a customer requests preferential discounts and commissions for conducting operations, the bank analyzes the history of the customer's transactions and the monthly transaction volume. The bank holds all the transactions made by all its customers over a long period of time. If a bank has N branches and each branch makes an average of K daily transactions, then in H days the total number of transactions, $NTR = N \cdot K \cdot H$, reaches hundreds of millions. With complete databases of all transactions conducted by all customers, the bank examines the customer's money turnover and decides whether to grant preferential discounts and commissions.

The problem is to develop processes to search the database for a truncated key.
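One possible reading of a "truncated key" is a key prefix. Under that assumption, a search over a sorted key list can be sketched with binary search; the key format shown is hypothetical, and a database would normally do the same through an index.

```python
from bisect import bisect_left


def search_truncated_key(sorted_keys, prefix):
    """Return all keys that start with the given prefix; sorted_keys must be sorted."""
    matches = []
    index = bisect_left(sorted_keys, prefix)  # first position where the prefix could occur
    while index < len(sorted_keys) and sorted_keys[index].startswith(prefix):
        matches.append(sorted_keys[index])
        index += 1
    return matches


# Hypothetical transaction keys.
keys = sorted(["RO01A", "RO01B", "RO02A", "RO10C"])

print(search_truncated_key(keys, "RO01"))  # ['RO01A', 'RO01B']
```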

All of the customer's transactions of a certain value are extracted if this is specifically required.

Consider the National Health Insurance House database. The database contains patient identification data, such as name, age, gender and address, and data on each patient's medical history, with records such as consultations performed, medical treatments and diseases suffered. The incidence of the most common diseases is determined by age group or by patient address.

If the data are grouped by age intervals, a frequency distribution series on intervals is obtained. The number of grouping intervals is obtained using Sturges' relation:

$$r = 1 + 3.322 \cdot \log_{10} N,$$

where:

r - the number of grouping intervals,

N - the number of persons in the collectivity.

To determine the age grouping intervals, the following steps are performed:

Step 1. The amplitude of the Age characteristic is established:

$$AP = A_{\max} - A_{\min},$$

where:

AP = amplitude,

$A_{\max}$ = maximum level of the Age characteristic,

$A_{\min}$ = minimum level of the Age characteristic.

Step 2. The number of grouping intervals, r, is determined using Sturges' relation above.

Step 3. The size of the grouping intervals is calculated:

$$h = \frac{AP}{r},$$

where:

h = interval size.

Step 4. The grouping intervals are established starting from $A_{\min}$:

$$I_1 = [A_{\min}, A_{\min} + h),$$

$$I_2 = [A_{\min} + h, A_{\min} + 2 \cdot h),$$

$$I_3 = [A_{\min} + 2 \cdot h, A_{\min} + 3 \cdot h),$$

$$\ldots$$

$$I_r = [A_{\min} + (r-1) \cdot h, A_{\min} + r \cdot h).$$

In this way r groups are obtained, for which the frequencies are established by counting the units belonging to each grouping interval. The frequency distribution on intervals is determined (Table 2.1).

Table 2.1. Frequency distribution on intervals

Variation interval for Age characteristic | Occurrence frequency for a disease
$I_1 = [A_{1,\mathrm{inf}}, A_{1,\mathrm{sup}})$ | $n_1$
$I_2 = [A_{2,\mathrm{inf}}, A_{2,\mathrm{sup}})$ | $n_2$
$I_3 = [A_{3,\mathrm{inf}}, A_{3,\mathrm{sup}})$ | $n_3$
... | ...
$I_r = [A_{r,\mathrm{inf}}, A_{r,\mathrm{sup}})$ | $n_r$
TOTAL | N

The frequency of diseases depending on age group is obtained.
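The four grouping steps can be sketched as below. Two choices are assumptions of this sketch, not of the article: r is rounded to the nearest integer, and the maximum age is counted in the last interval even though the intervals are written as half-open. The function also assumes that not all ages are equal, otherwise the interval size would be zero.

```python
import math


def group_by_sturges(ages):
    """Build r = 1 + 3.322 * log10(N) intervals of size h and count the ages in each."""
    n = len(ages)
    r = round(1 + 3.322 * math.log10(n))  # Step 2 (rounding is an assumption)
    a_min, a_max = min(ages), max(ages)
    h = (a_max - a_min) / r                # Steps 1 and 3: h = AP / r
    bounds = [(a_min + k * h, a_min + (k + 1) * h) for k in range(r)]  # Step 4

    frequencies = [0] * r
    for age in ages:
        k = min(int((age - a_min) / h), r - 1)  # the maximum age goes to the last interval
        frequencies[k] += 1
    return bounds, frequencies


# Hypothetical collectivity: one patient for each age from 0 to 99.
bounds, frequencies = group_by_sturges(list(range(100)))

print(len(bounds))       # 8
print(sum(frequencies))  # 100
```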

If data are grouped by a time variable, a chronological data series is obtained. The chronological data series consists of two data sets: a time data set and another referring to the frequency of occurrence of disease X.

The chronological data series is plotted as a chronogram or historigram (Fig. 2.2).

In the rectangular coordinate system, time units are marked on the abscissa and the values of the measured variable on the ordinate.

A radar diagram is drawn to examine whether the time series has seasonal variations (Fig. 2.3).
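A chronological series with monthly time units can be sketched as follows; the resulting twelve values are what a column chronogram or a radar diagram would display. The diagnosis dates are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_series(diagnosis_dates):
    """Return the frequency of occurrence of a disease for each month, January to December."""
    counts = Counter(d.month for d in diagnosis_dates)
    return [counts.get(month, 0) for month in range(1, 13)]


# Hypothetical dates on which disease X was recorded.
records = [date(2009, 1, 5), date(2009, 1, 20), date(2009, 3, 2)]

series = monthly_series(records)
print(series)  # [2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```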

Initial data volume is used to estimate the physical resource requirements for data storage and the workload required for data preparation and database creation.

When estimating the volume of a large virtual database, as many elements as possible must be considered so that, whatever changes occur to the virtual database, its structure remains stable. There are situations in which experience with database operations is only partially taken into account; in such cases the initial structure of the database changes, because fields are added to describe states that were not considered when the database was designed.

If the patients' virtual database is designed only to store data from registers and observation files, and data acquired from equipment such as computed tomography scanners, electrocardiographs and MRI systems need to be correlated with the patients' database, then more fields must be added to allow links between the virtual database and the databases containing the data acquisitions.

Fig. 2.2. Chronogram represented by columns (horizontal axis: the months, from January to December)

Fig. 2.3. Radar diagram

But if hypothetical situations are taken into account when designing the virtual database, thereby providing some reserved fields, then whenever new ways of describing the patient's condition arise these fields are initialized and the structural stability of the database is assured. Figure 2.4 shows that the virtual database obtained by concatenating the local databases is completed with pointers to the information stored in the data acquisition databases.

Figure 2.4 presents the classical databases, the virtual database obtained through concatenation, and the databases obtained through data acquisition from the electrocardiograph, the computed tomography scanners and the MRI system. The virtual database has pointers to the records acquired from the machines.
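The structure described here, local databases concatenated into one virtual database whose records point to acquisition records, can be sketched with plain Python structures. The field names (`patient_id`, `device`), the source names and the use of list positions as pointers are assumptions made for the illustration.

```python
def build_virtual_database(local_databases, acquisitions):
    """Concatenate local databases and attach pointers to matching acquisition records."""
    virtual = []
    for source, records in local_databases.items():
        for record in records:
            # A pointer is the position of the acquisition record in its own database.
            pointers = [
                position
                for position, acquired in enumerate(acquisitions)
                if acquired["patient_id"] == record["patient_id"]
            ]
            virtual.append({**record, "source": source, "acquisition_refs": pointers})
    return virtual


# Hypothetical local databases and acquisition records.
local_databases = {
    "hospital_A": [{"patient_id": 1, "name": "P1"}],
    "hospital_B": [{"patient_id": 2, "name": "P2"}],
}
acquisitions = [
    {"patient_id": 2, "device": "MRI"},
    {"patient_id": 1, "device": "ECG"},
    {"patient_id": 2, "device": "CT"},
]

virtual = build_virtual_database(local_databases, acquisitions)
print(virtual[1]["acquisition_refs"])  # [0, 2]
```

The acquisition data stay where they were acquired; the virtual database stores only the references.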

Fig. 2.4. Virtual database

3. National databases

There are national databases designed to treat homogeneous sets of problems. In these databases, updates are made without deleting old data:

- the population database contains information describing the characteristics of the people of a country;

- the car database covers unique vehicle identifiers, such as car number, engine series and car type; the information is used to determine the taxes that the vehicle owner must pay;

- the driving license database contains the person's identification code, the date of the exam, the examination test taken to obtain the license, and the scores obtained in the exam;

- the diplomas database covers information on the diplomas that the educational system has awarded to individuals; this information is of interest to companies on the market that are hiring staff;

- the health database assumes the existence of data for every individual receiving medical services;

- the weather evolution database, in which the weather information needed for producing weather forecasts is recorded in real time;

- the cadaster database, containing detailed information on all buildings in a country;

- the fiscal administration database, in which data on taxpayers and on paid or unpaid taxes are entered;

- the phone subscribers database, which contains information about subscribers to fixed or mobile phone companies; at national level, the subscribers database of a mobile phone company includes several million records.

In all the databases mentioned, every change made must be recorded, while preserving the identification elements of the data change operation performed.

National databases are designed as a whole in which very detailed data on the described elements are kept, together with aggregate data, averages for each year of study, group indicators, aggregate indicators by area, and focused data.

Consider the database of unemployed persons of the National Agency for Employment. The database contains information identifying the unemployed, such as name, age, sex, residence and educational level. The number of registered unemployed is determined according to the residence of each unemployed person.

Data are grouped by an area or county variable, and a spatial data series is obtained.
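Grouping by a county variable can be sketched as a simple count per county. The records are hypothetical; the county names are used only as labels.

```python
from collections import Counter


def spatial_series(unemployed_records):
    """Count registered unemployed persons per county of residence."""
    return Counter(record["county"] for record in unemployed_records)


# Hypothetical registrations.
registrations = [
    {"name": "A", "county": "Cluj"},
    {"name": "B", "county": "Alba"},
    {"name": "C", "county": "Cluj"},
]

series_by_county = spatial_series(registrations)
print(series_by_county["Cluj"])  # 2
print(series_by_county["Alba"])  # 1
```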

If, at the moment the dataset is created, there are Q records corresponding to the Q elements of the collectivity, then the problem analysis phase studies the processes that dominate the dynamics of the collectivity, focusing on:

• the volume of changes in the levels of the characteristics that define the states of the elements; in the case of patients whose data are stored in a large database, the blood group, birth date, eye color and DNA remain constant, while blood sugar, weight, height and other measurements characterizing the parameters that define the health status change; for each group of people there is, over a time interval, a minimum and a maximum number of changes; the total volume of changes depends on the number of highly homogeneous groups that make up the collectivity, the number of people in each group and the estimated volume of changes;


• the variation of the number of elements in the collectivity, given by the elements leaving the collectivity, the new elements entering the collectivity and the elements that change state through corrections made to fields considered invariant; if a person has a wrong birth date recorded, it must be corrected; the same applies to a change of blood group or sex;

• the sets of data describing the transition of each element of the collectivity to a new state, marking completely and unambiguously the evolution of the element that records the transition to a new quality; in the case of training cycles, a structure is built for each cycle that covers, from an information point of view, all the elements involved in promotion and in obtaining the documents that prove attendance of each stage.

Fig. 3.1. Update a record in the database (a record with the fields CNP, Name "Ionescu" and Surname "Gheorghe"; the operation code for changing the name, "625", sets the Name to "Popescu")
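An update that keeps the old data and the identification of the change operation can be sketched as below. The history layout is an assumption of this sketch, and the CNP shown is a placeholder, not a real identifier; only the operation code "625" and the two names follow the figure.

```python
def change_name(record, history, new_name, operation_code="625"):
    """Return the updated record and log the change without discarding the old value."""
    history.append({
        "cnp": record["cnp"],
        "operation_code": operation_code,  # identifies the kind of change performed
        "field": "name",
        "old_value": record["name"],
        "new_value": new_name,
    })
    return {**record, "name": new_name}


# Placeholder CNP; the names follow the example in the figure.
person = {"cnp": "0000000000000", "name": "Ionescu", "surname": "Gheorghe"}
history = []

updated = change_name(person, history, "Popescu")
print(updated["name"])          # Popescu
print(history[0]["old_value"])  # Ionescu
```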

Working with national databases implies the existence of millions of records, such as the population of a country, the taxes paid by taxpayers, the text messages sent by the subscribers of a mobile phone company, or detailed information on all the buildings in a country. The number of operations performed on national databases exceeds $10^6$.

The use of national databases should be restricted so that only authorized persons can access them. Some information from the national databases is available to everyone without special authorization. Aggregated indicators at the level of counties or countries are data to which everyone has access.

4. Combined analysis of banking datasets

Consider the database of the Collaborative Multicash Servicedesk (CMS) application, which stores the requests of a bank's customers regarding the problems they encounter when using the Multicash electronic payment service.

The Collaborative Multicash Servicedesk application is structured in two modules:

- the module for online registration of bank customers' requests;

- the module for recording phone requests by Multicash Helpdesk analysts.

In the module for online registration of bank customers' requests, each customer receives from the bank a username and password for authenticating in the application. The associated customer interface allows the customer to send a written request to the Helpdesk department, placing the issue in the appropriate category and subcategory, and also to register a priority request in exchange for a fee.

In the module for recording phone requests by Multicash Helpdesk analysts, after authenticating in the application, the analyst sees the page from which requests are registered in the database.

The fields to be completed or selected by the bank analyst are the following:

- the customer name, based on suggestions from a predefined list of Multicash customers;

- the contact person of the customer who made the call;

- the request category, which is a dropdown list with predefined categories and related codes;

- the request description, which is a field for adding the details of the problem;

- the way the request was solved, by selecting the appropriate option.

The fields of the table in the CMS application database are shown in Figure 4.1.

Table: Telefonice

Field Name | Data Type
ID | Number
Customer | Text
Contact_person | Text
Request_code | Text
Category | Text
Description | Text
Solution | Text
Analyst | Text
Call_date | Text

Fig. 4.1. The fields from the table of CMS application database

Table 3.1. Unemployment rate by counties, at the end of December 2009 [5]

County | Unemployment rate [%]
Alba | 12.6
Arad | 6.8
Arges | 9.4
Bihor | 5.9
Caras-Severin | 10.4
Calarasi | 9.2
Cluj | 6.3
Constanta | 6.3
Covasna | 11.1
Hunedoara | 10.6
Prahova | 9
Satu Mare | 6.5
Timis | 4.4
Bucuresti | 2.3

The CMS application is used effectively within Raiffeisen Bank, with over two thousand requests per month being entered into its database. With the database of all customer requests available, the types of problems faced by Multicash service users are analyzed and strategies for addressing each customer are determined according to the history of problems the customer has encountered.

A request in the CMS application database contains the fields shown in Figure 4.2.

Working with the entire database of requests makes it possible to avoid future complaints by analyzing previous customer problems and by offering solutions and additional support to customers with many requests.

The situation of requests on categories, recorded in the period February 15 to March 15, 2010, is presented in Table 4.1.

The data in Table 4.1 show that most requests were registered in the Check payments status category, because the Multicash service shows information on the settlement status of payments and on account balances that is updated every hour. Customers need confirmation that certain payments have been processed at a certain time, and they call the Helpdesk department to obtain these confirmations.

According to Table 4.1, the three categories with the largest number of requests are Check payments status, Other requests and Confirm account balance. The difference between the number of requests registered in these categories and the number of requests in the other categories is significant. To reduce the number of requests in these categories:

- the Multicash service should be improved, to allow a real-time view of the operations performed;

- the requests recorded in the Other requests category should be reviewed, in order to reclassify them into existing categories or to create new categories of problems;

- account balances should be updated in real time.

Fig. 4.2. Fields of a record in the database
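The ranking of request categories can be reproduced with a short sketch. For brevity it uses only the six largest entries of Table 4.1; the function name is hypothetical.

```python
def top_categories(requests_per_category, how_many=3):
    """Return the categories with the largest number of requests, largest first."""
    ranked = sorted(requests_per_category.items(), key=lambda pair: -pair[1])
    return ranked[:how_many]


# Subset of Table 4.1 (the six categories with the most requests).
requests_per_category = {
    "Check payments status": 795,
    "Other requests": 729,
    "Confirm account balance": 424,
    "Communication initiated": 254,
    "User blocked on the communication": 248,
    "Signature error": 233,
}

for category, count in top_categories(requests_per_category):
    print(category, count)
# Check payments status 795
# Other requests 729
# Confirm account balance 424
```

The third category has 424 requests and the fourth has 254, which illustrates the gap that the text calls significant.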

Working on large data sets makes it possible to formulate hypotheses, perform calculations and determine ways of correcting reality.

Databases of the transactions performed in a bank contain information about the user who performed the operation, the channel through which it was done, the workstation, the date and the hour. These databases are updated in real time and are consulted by the Banking Security Department to discover any fraud attempts. If it is found that an operator at a workstation makes many transactions compared to other operators, or that the amounts transferred are very high, then these operations are investigated thoroughly.

Table 4.1. Number of requests on categories

Request code | Request category | Number of requests
901 | Training on using the application | 107
902 | User blocked at logon | 127
903 | User blocked on the communication | 248
904 | Training on see rejected payments | 56
905 | Check payments status | 795
906 | Login with admin2 user | 3
907 | Index corrupted in database tables | 27
908 | Please repeat job with AC29 | 37
909 | Communication initiated | 254
910 | Transmission interrupted | 155
911 | Signature error | 233
912 | Generate electronic signature | 122
913 | Add new users in the client application | 95
914 | Add new accounts in the client application | 105
915 | Change name / address of payer | 15
916 | Training of branches for completing annexes | 52
917 | Error on see statements | 186
918 | Delivery account statements | 80
919 | Delivery files for distributed signature | 34
920 | Move the application on another computer | 70
921 | Installing the application abroad | 11
922 | Confirm account balance | 424
923 | Deactivate payments file | 11
924 | Change communication channel | 8
925 | Setting print parameters | 20
926 | Reinstalling the application | 54
928 | Change number of approvals / amount limits | 6
929 | Error on starting the application | 54
930 | Statements export | 20
931 | Setting communication sessions | 6
932 | ROI or INT button disappearing from the main menu | 2
933 | Other requests | 729
934 | Delivery file with bank codes | 83
935 | Unresolved - BAS blocked | 2
936 | Check the import file structure | 6
937 | Check the validity of files sent for distributed signature | 1
938 | Payments cancellation | 28
939 | Change the customer status in LIVE/ TEST | 3
940 | Communication problems to the customer | 17
941 | Decryption error/ wrong communication password | 7
942 | Missing a bank branch | 12
999 | Intervention of service provider | 9

No. 3, 2010

From the database of the CMS application, data sets are identified and a combined analysis is performed to determine certain statistics. The combined analysis involves correlations between data sets, for the calculation of quality indicators.

For the Person - Operations analysis, the types of operations performed by a person are identified.

The load of each agent in the system is determined and operations are redistributed, so that no agent is overloaded while another does not have enough operations to fill the working time.

Let $H_1$, $H_2$, $H_3$ and $H_4$ be the names of four analysts who actually work with the CMS application within the Multicash Helpdesk department of Raiffeisen Bank.

From the combined analysis Analyst - Category of requests, based on the records of the Collaborative Multicash Servicedesk application, it results that analyst $H_1$ solved requests from the categories Add new accounts in the client application, User blocked on the communication, Generate electronic signature and Change communication channel, and analyst $H_2$ solved requests from the categories Add new users in the client application, Training on see rejected payments and Move the application on another computer. Taking into account the number of requests recorded in each category, it follows that analyst $H_1$ has been overloaded.
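The load comparison can be illustrated with the request counts from Table 4.1. The sketch makes a simplifying assumption that the article does not state: each analyst is taken to have handled all the requests recorded in the listed categories.

```python
# Request counts taken from Table 4.1 for the categories named in the text.
requests_per_category = {
    "Add new accounts in the client application": 105,
    "User blocked on the communication": 248,
    "Generate electronic signature": 122,
    "Change communication channel": 8,
    "Add new users in the client application": 95,
    "Training on see rejected payments": 56,
    "Move the application on another computer": 70,
}

# Categories solved by each analyst, as listed in the text.
categories_by_analyst = {
    "H1": [
        "Add new accounts in the client application",
        "User blocked on the communication",
        "Generate electronic signature",
        "Change communication channel",
    ],
    "H2": [
        "Add new users in the client application",
        "Training on see rejected payments",
        "Move the application on another computer",
    ],
}


def analyst_load(categories_by_analyst, requests_per_category):
    """Sum the requests of the categories handled by each analyst (upper-bound assumption)."""
    return {
        analyst: sum(requests_per_category[category] for category in categories)
        for analyst, categories in categories_by_analyst.items()
    }


load = analyst_load(categories_by_analyst, requests_per_category)
print(load)  # {'H1': 483, 'H2': 221}
```

Under this assumption the first analyst carries more than twice the load of the second, which is consistent with the conclusion that the first analyst was overloaded.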

For the Person - Resolutions analysis, the types of resolutions adopted and their frequencies of occurrence are evaluated:

$H_3$: resolution YES at the rate of x%, NO at the rate of y%.

$H_4$: resolution YES at the rate of z%, NO at the rate of w%.

If x > z, then $H_3$ gave more positive resolutions than $H_4$. If x > y, then $H_3$ gave more positive than negative resolutions. If z > w, then $H_4$ gave more positive than negative resolutions.

By generalization, considering the data sets $D_1, D_2, \ldots, D_n$, correlations are established between any $D_i$ and $D_j$, where $i, j = 1..n$, with $i \neq j$. For each combined analysis $D_i$ - $D_j$, the types of correlations are analyzed and quantitative and qualitative indicators are calculated.
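The set of combined analyses over n data sets can be enumerated mechanically. The sketch below assumes that direction matters (an indicator comparing one data set to another is a ratio, so swapping the two gives a different value) and therefore lists ordered pairs; the labels and the helper name `ordered_pairs` are hypothetical.

```python
from itertools import permutations


def ordered_pairs(names):
    """Return every ordered pair (a, b) of distinct data set names."""
    return list(permutations(names, 2))


data_sets = ["D1", "D2", "D3"]  # hypothetical labels
pairs = ordered_pairs(data_sets)

for a, b in pairs:
    print(f"combined analysis {a} - {b}")
print(len(pairs), "analyses for", len(data_sets), "data sets")  # n * (n - 1)
```

If an analysis is symmetric, `itertools.combinations` would halve the number of pairs to n(n - 1)/2.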

Indicators for the case presented above are:

- the quantitative indicator comparing the number of resolutions adopted by the two entities:

$$I_{D_i/D_j} = \frac{N_{D_i}}{N_{D_j}},$$ where:

$N_{D_i}$ - total number of resolutions adopted by $D_i$;

$N_{D_j}$ - total number of resolutions adopted by $D_j$;

- the qualitative indicators comparing values between the two resolutions adopted:

$$I_{D_i} = \frac{x}{y}; \quad I_{D_j} = \frac{z}{w}; \quad I_{x/z} = \frac{x}{z}; \quad I_{y/w} = \frac{y}{w}$$

For x = 80, y = 20, z = 70 and w = 30, the indicator $I_{D_i}$ has the value 4, representing the ratio of positive to negative resolutions given by H3. The indicator $I_{D_j}$ has the value 2.33, representing the ratio of positive to negative resolutions given by H4. The ratio of positive resolutions given by H3 versus H4 is $I_{x/z} = 1.14$, and the ratio of negative resolutions given by H3 versus H4 is $I_{y/w} = 0.67$.
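The worked example can be checked with a short script. This is a minimal sketch: the function name `qualitative_indicators` and the dictionary keys are chosen here for illustration, and the zero-rate guard is an added assumption about how undefined ratios should be handled.

```python
def qualitative_indicators(x, y, z, w):
    """Compute the four ratio indicators from the YES/NO rates of two entities.

    x, y: YES and NO rates of the first entity (H3 in the example);
    z, w: YES and NO rates of the second entity (H4 in the example).
    """
    if 0 in (y, z, w):
        raise ValueError("a rate used as a denominator is zero")
    return {
        "I_Di": x / y,    # positive vs. negative, first entity
        "I_Dj": z / w,    # positive vs. negative, second entity
        "I_x/z": x / z,   # positive rate, first vs. second entity
        "I_y/w": y / w,   # negative rate, first vs. second entity
    }


indicators = qualitative_indicators(80, 20, 70, 30)
for name, value in indicators.items():
    print(f"{name} = {value:.2f}")
```

Values are rounded to two decimals on output, so y/w = 0.666... prints as 0.67.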

5. Conclusions

Increasing the volume of information and improving the software products that exploit it have led to a new quality of data usage, through analyses that reveal to the organization's management information that is difficult or impossible to obtain otherwise. In this way, information is obtained on customer preferences, their profile or distribution. Management is provided with data on the region of the country where a product sells well and on the preferences of a particular market segment.

Such information is obtained only by using certain treatments, such as multidimensional analysis, statistical forecasting methods and other mathematical methods applied to a very large volume of data. These mathematical methods require the use of specialized, very complex computer software.

Statistical processing of data describing a collectivity must start from the entire population, without having to work on samples that may or may not be representative. Working with exact data about a collectivity avoids rounding errors and yields accurate values of the calculated indicators. Current information systems allow working with very large databases, which must be updated in real time.

Operating on the primary databases requires accuracy, flexibility, decisions founded on alternative hypotheses, and a qualitative leap. Working with the entire population or collectivity does not contradict working on a given sample.

6. Acknowledgements

This article is a result of the projects POSDRU/6/1.5/S/11 "Doctoral Program and PhD Students in the education research and innovation triangle (DOC-ECI)" and "PhD in economy at European knowledge standards (DoEsEc)". These projects are co-funded by the European Social Fund through the Sectoral Operational Programme for Human Resources Development 2007-2013, coordinated by the Bucharest Academy of Economic Studies.

References

1. C. Gaber and V. Voineagu, Sondajul: metoda de investigare a fenomenelor de masa, Editura ASE, 2004.

2. V. Voineagu, "Structura ocuparii fortei de munca," Tribuna economica, Vol. 18, Nr. 31, pp. 30-32, 2007.

3. Peng Li, "Variational analysis of large power grids by exploring statistical sampling sharing and spatial locality," Computer-Aided Design, ICCAD-2005. IEEE/ACM International Conference on, pp. 645-651, 6-10 Nov. 2005.

4. http://www.contabilizat.ro/cursuri_de_perfectionare~categoria-management_si_marketing~nume-statistica_economica.html

5. Ministerul Muncii, Familiei si Protectiei Sociale, Statistici Somaj 2009, Available at: http://www.mmuncii.ro/pub/imagemanager/images/file/Statistica/Statistici%20lunare/s39a12.pdf

6. K. Abe, S. Sugawa, S. Watabe, N. Miyamoto, A. Teramoto, Y. Kamata, K. Shibusawa, M. Toita and T. Ohmi, "Random Telegraph Signal Statistical Analysis using a Very Large-scale Array TEG with 1M MOSFETs," VLSI Technology, 2007 IEEE Symposium on, pp. 210-211, 12-14 June 2007.

7. I. Ivan and C. Ciurea, "Quality Characteristics of Collaborative Systems," International Conference on Advances in Computer-Human Interaction, ACHI 2009, pp. 164-168, 2009 Second International Conferences on Advances in Computer-Human Interactions, 2009.

8. I. Ivan and C. Ciurea, "Using Very Large Volume Data Sets for Collaborative Systems Study," Informatica Economics Journal, Vol. 13, No. 1, 2009.

9. I. Ivan, B. Vintila, C. Ciurea and M. Doinea, "The Modern Developments Cycle of Citizen Oriented Applications," Studies in Informatics and Control, Vol. 18, No. 3, 2009.
