ANALYSIS AND SOLUTION METHODS OF A MULTI-USER CLIENT-SERVER WEB SYSTEM FOR MANAGING DISTRIBUTED COMPUTING
Y.N. ALIYEVA, E.I. AHMADLI
Azerbaijan State Oil and Industry University
ANNOTATION
Many important problems require computing power far beyond what a single modern computing system can provide. The underlying assumption of parallel computing is that computing modules (processors or computers) can be interconnected so that the solution of a problem on the resulting computing system is accelerated roughly in proportion to the number of modules used. It is therefore important to know which classes of methods are convenient to implement on a parallel system, to understand the algorithmic structure of these methods, and to study parallel programming tools.
Key words: WEB-systems, distributed computing, interface, architecture, client-server.
The purpose of this article is to create a client-server WEB-system that provides convenient access to computing resources, splits the main task into subtasks, analyzes the performance of clients, and assigns each subtask to a client in accordance with the power of that client's computer. A client is any user registered in the system who has installed the client service on a computer. The task, in the context of this work, is recovering a password (up to a maximum length specified by the user) from its hash value. A subtask is a portion of the search space, bounded by a region of the alphabet and a possible password length.
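The splitting described above can be sketched as follows. This is an illustrative outline, not the article's actual code; the function names and the choice of fixing the first character's alphabet region are assumptions.

```python
# Hedged sketch: split a brute-force password search into independent
# subtasks, each bounded by a region of the alphabet (for the first
# character) and a candidate password length.
import itertools

def make_subtasks(alphabet, max_len, region_size):
    """Yield (first_chars, length) subtasks covering every candidate password."""
    for length in range(1, max_len + 1):
        for i in range(0, len(alphabet), region_size):
            yield (alphabet[i:i + region_size], length)

def candidates(subtask, alphabet):
    """Enumerate all passwords of one subtask (first character is restricted)."""
    first_chars, length = subtask
    for first in first_chars:
        for rest in itertools.product(alphabet, repeat=length - 1):
            yield first + "".join(rest)
```

Because the regions and lengths partition the search space, the subtasks never overlap and can be distributed to clients independently.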
The client service receives subtasks from the server site via the HTTP protocol and sends back the calculation results.
In distributed computing, multiple computers are most frequently combined into a parallel computing system to solve time-consuming computational problems.
Unlike local supercomputers, distributed multiprocessor computing systems can increase their performance practically without limit through scaling.
At the moment, there are several ready-made solutions for convenient parallelization, both low-level and high-level.
OpenMP (Open Multi-Processing) is a free standard for parallelizing C, C++, and Fortran programs. It defines a collection of compiler directives, library routines, and environment variables for creating multithreaded applications on shared-memory multiprocessor platforms. OpenMP implements parallel computing through multithreading: a master thread spawns a group of slave threads and distributes the work among them. On a machine with multiple processors, the threads are intended to run concurrently.
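OpenMP itself targets C, C++, and Fortran; as a language-neutral sketch of the same fork-join idea, the snippet below has a "master" divide a shared array into chunks, fork worker threads over them, and join on completion, loosely analogous to a `#pragma omp parallel for` loop. The function name and chunking scheme are illustrative assumptions.

```python
# Fork-join sketch: the master splits the index range into chunks,
# worker threads fill a shared output array, and the pool joins on exit.
from concurrent.futures import ThreadPoolExecutor

def parallel_square(data, n_threads=4):
    out = [0] * len(data)  # shared memory, one slot per element

    def work(chunk):
        lo, hi = chunk
        for i in range(lo, hi):          # each thread owns its index range
            out[i] = data[i] * data[i]

    step = max(1, len(data) // n_threads)
    chunks = [(i, min(i + step, len(data))) for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        pool.map(work, chunks)           # fork workers; join when the block exits
    return out
```

Because every thread writes to a disjoint slice of `out`, no locking is needed, which mirrors the way OpenMP divides loop iterations among threads.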
The Message Passing Interface (MPI) is a programming interface (API) that enables processes working on a common task to communicate with one another. MPI is the most widely used data-interchange standard in parallel programming, with implementations for many computer platforms, and it is employed in the development of software for supercomputers and clusters. In MPI, the primary means of communication between processes is passing messages to one another.
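MPI is normally used from C or Fortran; the following is a hedged Python analogue of its point-to-point message passing, modeling per-rank mailboxes with queues in place of `MPI_Send`/`MPI_Recv`. The rank layout and message contents are invented for illustration.

```python
# Message-passing sketch: each "rank" blocks on its own mailbox (receive),
# computes, and sends its result back to the master through a result queue.
import queue
import threading

def mpi_style_exchange(n_ranks=3):
    inbox = [queue.Queue() for _ in range(n_ranks)]  # one mailbox per rank
    results = queue.Queue()

    def rank_proc(rank):
        msg = inbox[rank].get()        # blocking receive, like MPI_Recv
        results.put((rank, msg * 10))  # reply to the master, like MPI_Send

    threads = [threading.Thread(target=rank_proc, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for r in range(n_ranks):
        inbox[r].put(r + 1)            # master distributes one message per rank
    for t in threads:
        t.join()
    return dict(results.get() for _ in range(n_ranks))
```

The contrast with the OpenMP model is the point: here the workers share no state at all and cooperate only by exchanging messages.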
GRID is a collection of several computers assembled to solve a single computationally complex task that has been divided into subtasks. Each computer solves several subproblems, after which the results of the individual calculations are combined. The main advantage of GRID is that it can consist of computers that are hundreds or thousands of kilometers apart and have different hardware and software characteristics. The job of connecting the computers is performed by middleware that (virtually) links them all over the Internet into a single supercomputer. The idea of GRID originated in the 1990s, when, with the development of computer communications, consolidating geographically dispersed computers became a cheaper, simpler, and potentially more powerful means of increasing productivity than increasing the power of a single supercomputer.
There are different types of GRIDs. Volunteer Computing is a form of GRID computing that takes advantage of the idle time of ordinary users' computers around the world. Currently the largest Volunteer Computing project, both in the number of participants and in total power, is Folding@home, a project for computer simulation of protein folding launched in October 2000 by scientists from Stanford University. This article, however, will focus not on Folding@home but on the second-largest project, the BOINC system. The reason is simple: unlike the specialized Folding@home, BOINC provides an opportunity to participate in a wide variety of scientific projects, from breaking cryptographic systems to searching for extraterrestrial civilizations.
To organize distributed computing (GRID computing), an appropriate software platform is required. The system should be able to split one large task into many small subtasks, distribute these subtasks among computing nodes, receive the results of calculations and combine them into a single whole. To do this, various software "layers" were created between the control server and computing nodes. One such software layer is BOINC.
BOINC (Berkeley Open Infrastructure for Network Computing) is a free (distributed under the GNU LGPL) software platform for distributed computing (more specifically, Volunteer Computing). The BOINC system was developed at the University of California at Berkeley under the direction of David Anderson by the team behind the legendary SETI@home project. The main motive for the development of the system was the lack of free computing power for processing data from radio telescopes. That is why the developers decided to attract computing resources and unite the communities of several scientific projects. To solve this large-scale problem, the BOINC software platform was created.
The BOINC system consists of a client program common to all BOINC projects, a composite server (the term "composite server" means that the server may physically consist of several separate computers), and the server software. Distributed computing is performed using a client-server architecture.
Users who donate BOINC computing power to projects like SETI@home "earn" credits for completing individual subtasks. These credits are needed only to create a spirit of competition among project participants - users with the largest number of credits have a chance to "light up" on the main page of the project.
The peculiarity of "volunteer computing" is that, for a successful solution, the individual small subtasks must be very weakly interconnected and practically independent of the results of parallel tasks; otherwise, waiting for other results and synchronizing them introduces a very large performance overhead. An ideal task is, for example, password guessing by the "brute force" method: for each batch of password variants (tens or hundreds, depending on the cost of the computation), a separate computer calculates the hashes and compares them with the given one. The first match solves the problem, no other result can bring the search closer to the answer, and the subtasks are completely independent (due to the properties of password hash functions). The architecture of the BOINC system was developed for tasks of exactly this kind, consisting of independent subtasks. The figure below shows the architecture of the BOINC system.
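The independence property can be made concrete with a small sketch (illustrative names; MD5 is used here only because it is cheap to compute): each worker checks its own slice of candidates against the target hash and never needs another worker's intermediate results.

```python
# Each slice of candidate passwords is an independent subtask: checking one
# slice requires only the target hash, never another worker's progress.
import hashlib

def check_slice(candidate_slice, target_hash):
    """Return the matching password from this slice, or None."""
    for pw in candidate_slice:
        if hashlib.md5(pw.encode()).hexdigest() == target_hash:
            return pw
    return None

target = hashlib.md5(b"cab").hexdigest()
slices = [["aaa", "abc"], ["bca", "cab"], ["ccc"]]   # three independent subtasks
hits = [check_slice(s, target) for s in slices]       # could run on 3 machines
found = next(h for h in hits if h is not None)
```

Since the slices could be evaluated in any order, on any machines, with no synchronization until the final collection of `hits`, the task maps directly onto the volunteer-computing model.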
Figure 1. Architecture of the BOINC system
The BOINC architecture is based on the idea of a finite state machine: the server consists of a set of separate subsystems, each responsible for its own specific task, for example performing calculations, transferring files, and so on. Each subsystem runs in an endless loop: it checks the state of a subtask, performs some actions, and changes the subtask's state.
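A minimal sketch of this finite-state-machine idea follows. The state names and transitions here are illustrative assumptions, not BOINC's actual internal states; the point is only that each subsystem advances a subtask one state per pass of its loop.

```python
# Finite-state-machine sketch: each server subsystem inspects a subtask's
# state, acts, and advances it; a subtask in a terminal state is left alone.
UNSENT, IN_PROGRESS, RETURNED, VALIDATED = "unsent", "in_progress", "returned", "validated"

TRANSITIONS = {
    UNSENT: IN_PROGRESS,    # scheduler: subtask handed to a client
    IN_PROGRESS: RETURNED,  # transitioner: client uploaded a result
    RETURNED: VALIDATED,    # validator: result checked and credited
}

def step(task):
    """One pass of a subsystem loop: advance the task's state if possible."""
    nxt = TRANSITIONS.get(task["state"])
    if nxt is not None:
        task["state"] = nxt
    return task

task = {"id": 1, "state": UNSENT}
for _ in range(3):          # the subsystems' endless loop, truncated here
    step(task)
```

Keeping all progress in the subtask's state record is what lets the subsystems run as independent loops, even on separate physical servers.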
In general, the system consists of a BOINC server (distributed, if necessary, over several physical servers to improve performance, fault tolerance, and security), many clients that execute the server's tasks, and possibly additional components in the form of connected GRID networks (for example, based on the widely used Globus Toolkit).
Globus Toolkit is an open-source toolkit for building computational Grids. It includes a set of software services and libraries for resource monitoring, discovery and management of computing nodes, security, and file management. It is developed and maintained by the Globus Alliance.
Previously, most passwords were stored in plain text, but developers quickly realized that this was unsafe. It is better to store not the password itself but its hash: a fixed-size value generated from the password. In that case, even if an attacker obtains the hash, he cannot read the password directly. There are various hashing algorithms, such as MD5, SHA-1, SHA-2, and many more. Nevertheless, there is a way to recover a password from a hash: by brute force, we simply generate the hash for each possible password and compare it with the hash that needs to be cracked.
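The store-the-hash scheme and its brute-force weakness can both be shown in a few lines. This is a minimal illustration (SHA-256 chosen arbitrarily; real systems add salting and slow hashes, which this sketch omits):

```python
# Store only the hash of a password; recover a short password by exhaustive
# search: hash every candidate and compare against the stored value.
import hashlib
import itertools
import string

def store(password):
    """What the database keeps instead of the plaintext password."""
    return hashlib.sha256(password.encode()).hexdigest()

def brute_force(target_hash, alphabet=string.ascii_lowercase, max_len=3):
    """Try every candidate up to max_len; return the password or None."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            pw = "".join(combo)
            if store(pw) == target_hash:
                return pw
    return None
```

The search cost grows exponentially with `max_len`, which is exactly why the article's system divides this enumeration into subtasks across many clients.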
There are various programs for cracking hashes; the most popular of them is Hashcat. With this utility, you can attack a hash value using a dictionary or an exhaustive search over all values. Hashcat supports the following hashing algorithms: md5, md5crypt, sha1, sha2, sha256, md4, mysql, sha512, wpa, wpa2, grub2, android, sha256crypt, drupal7, scrypt, django, and others. The disadvantage of Hashcat is the complexity of its configuration and the need to remember many command-line switches.
Presenting the system as a web site makes distributed computing much easier for the user, who can start working anywhere with Internet access. Such a presentation relieves the user of the need to study complex documentation for parallelization tools or to remember command-line switches. Beyond registering and installing the client service, nothing is required of the user.
Client registration makes it possible to keep statistics on submitted tasks and on user participation in solving them. Showing the user up-to-date information about the number of online clients ready to work on tasks, and listing the solved tasks a particular user has created, simplifies work with the portal. A feedback channel allows the developer to respond to user comments in a timely manner.
The developed program is useful for administrators in large and small companies when a user has forgotten a password. Usually only the hash is stored in the database, and there is no reverse algorithm for converting a hash back into a password. If a user employs the same password for several resources, it can be easier to recover it by brute force than to file change requests for each resource.
The program can also help if a computer has been infected with ransomware: the virus locks files or folders with a password and demands money for access to them. The program can determine the private key by enumeration, which then allows the files to be decrypted. An advantage of the program is its use of a modern, widely used encryption algorithm; if that algorithm does not fit, it can easily be replaced in the program.
In contrast to file-based storage of information, the use of databases provides undeniable advantages. For example, it is easy to organize searching, sort records by date and time, and carry out various selections of records, since the database provides an efficient organization of storage that minimizes access and search time. One specific record can be found quickly among many thousands (by a given identifier).
To work with large amounts of data, either file storage or databases are used. When working with files, many auxiliary parameters and files must constantly be kept under control, which complicates site search and archive creation.
Databases also avoid the big disadvantage of files: data-sharing problems. A script that modifies a file while running may be launched by two people at the same time, and unless steps are taken to lock the file, problems can arise. With databases these problems do not exist, because concurrent access is handled at a low level with maximum efficiency.
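The advantages described above can be sketched with SQLite (the table name and columns are invented for illustration, not taken from the article's system): indexed lookup by identifier and sorting by timestamp each take a single query, with no manual file bookkeeping.

```python
# Database sketch: keyed lookup and date-ordered selection, the two
# operations the text contrasts with manual file storage.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, created TEXT, status TEXT)")
rows = [(1, "2023-01-02", "done"),
        (2, "2023-01-01", "pending"),
        (3, "2023-01-03", "done")]
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)", rows)

# Find one specific record among many by identifier (uses the primary-key index):
row = conn.execute("SELECT status FROM tasks WHERE id = ?", (2,)).fetchone()

# Sort records by date, something file storage makes awkward:
ordered = [r[0] for r in conn.execute("SELECT id FROM tasks ORDER BY created")]
```

Concurrent access, locking, and index maintenance are all handled by the database engine rather than by the application script.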