#
Вестник РУДН. Серия МИФ
RUDN Journal of MIPh
http://journals.rudn.ru/miph
2018 Vol. 26 No. 1 84-92
Информатика и вычислительная техника
UDC 621.39
DOI: 10.22363/2312-9735-2018-26-1-84-92
Analysis of the File Distribution Time in Peer-to-Peer Network
Peer-to-peer (P2P) file sharing systems are responsible for a significant part of the Internet traffic today. File sharing is perhaps the most popular application among P2P applications. In comparison with traditional Client/Server file distribution, P2P file sharing has some advantages, namely, scalability, bandwidth and others. In this paper we study the minimum distribution time for getting the entire file by all of the users in the system, who need this file. This parameter is closely associated with the mentioned bandwidth. The expression for the minimum distribution time uses fluid-flow arguments and includes such terms as the file size, the upload rates of the seeds and the upload and download rates of the leechers. Using numerical examples and the expression for the minimum distribution time, we show the efficiency of P2P file sharing. We consider the system behaviour, when there are two types of leechers in the system. These types differ from each other by their upload bandwidths.
Key words and phrases: peer-to-peer network (P2P), file distribution, minimum distribution time, leecher, seeder, peer, file sharing, fluid-flow arguments
The classical methods of resource distribution are based on the paradigm Client/Server. Here a set of servers distributes a file to receiving users. This file could be a software, a content such as a movie or a TV show, etc. The servers and the servers' bandwidth can be bottlenecks in the process of the file distribution, when the file size and the number of receiving nodes become large.
Peer-to-peer (P2P) file sharing is an alternative to the classical Client/Server file distribution and allows to amplify the uploading capacity of the receiving users to aid in the process of the file distribution [1]. We should notice, that a node, participating in P2P file sharing, is usually called a peer. In particular, as soon as a peer has got any portion of the file, it can redistribute that portion to any of the other receiving peers. There are lots of examples of P2P networks today, for instance: Napster [2], Gnutella [3], Freenet [4], etc. P2P file sharing systems, transferring files via the BitTorrent protocol [5], for instance, Vuse [6], etc., are widespread. The intrinsic scalability of these protocols enables to distribute the files of large sizes to several thousand participants. Here the usage of high bandwidth at the distribution servers is not demanded. A user with an ordinary PC with a regular connection can apply such a way to distribute large files to an audience, which size is significantly higher than what is possible with the classical Client/Server approach.
Obviously, P2P file sharing systems have become well known in the Internet today. But there are a lot of different questions, which demand answers. It is necessary to explore: how good quantitatively P2P is in the file sharing process. Can P2P be essentially better than Client/Server distribution? Can P2P scale well when the number of receiving peers increases and becomes very large? How does the cooperation of the server upload
Received 8th September, 2017.
The publication was prepared with the support of the "RUDN University Program 5-100" and funded by RFBR according to the research projects No 15-07-03608, No 17-07-00845.
The authors would like to express thanks to Prof. Konstantin Samouylov for useful remarks in preparing the paper.
E. V. Bobrikova, Yu. V. Gaidamaka
Department of Applied Probability and Informatics Peoples' Friendship University of Russia (RUDN University) 6 Miklukho-Maklaya St., Moscow, 117198, Russian Federation
1. Introduction
bandwidth, the upload bandwidth of a receiving peer, and the download bandwidth of a receiving peer influence the total distribution time?
Various considerable papers have been dedicated to mathematical models, measurements, and simulation results for P2P networks, [7-12].
In this paper, we consider fundamental questions of P2P file sharing, which are in a basis of P2P file sharing. We propose an expression for the minimum achievable file distribution time. In the expression we use fluid arguments. In the expression we also use terms of the basic parameters of a P2P file sharing system, precisely, the file size, the number of servers, the number of receiving peers, and the upload and download bandwidths of all the peers, who participate. The expression takes place for arbitrary and heterogeneous upload and download bandwidths. The expression has a closed and simple form. The problem statement was given in [13]. The mentioned result allows to address many of the questions posed above.
The rest of the paper is organized as follows. In Section 2 we describe the main problem and demonstrate the main result of the paper. We discuss the applicability of the result. In Section 3 we present numerical results for the case of the receiving peers with different upload bandwidths. In Section 4 there is a conclusion.
2. Main Problem Description
The fundamental problem in P2P file sharing is how to distribute a file to peers in P2P network. In P2P a file is divided into parts or portions to distribute it. We consider two sets of peers: seeds and leechers. Any seed has a whole copy of the file and stays in the system to allow other peers to download from itself. Any leecher needs a copy of the file. At first leechers have no portions of the file and have to download portions from the seeds. But as soon as a leecher gets a portion, it begins to upload the portion to other leechers. So leechers can get portions of the file from any of the seeds and from other leechers that have portions. A leecher is allowed to leave the system after getting the whole file. The problem is to minimize the distribution time, i.e. the time, which is needed to obtain the file by all of the leechers.
We consider the following parameters: V — a set of all peers in P2P network, P = |V| — a number of all peers; S — a set of seeds, S = |5| — a number of seeds; £ — a set of leechers, L = |£| — a number of leechers, so we have V = SU£ and P = S + L; F — the size of a file; di — a download bandwidth of a leecher i; m — an upload bandwidth of a peer i.
A peer i can transfer bits at a maximum rate of Ui and can download bits at a maximum rate of di. We take into account the condition di > m, that is usual in the Internet today, but we can also consider arbitrary upload and download rates. Furthermore,
Ti(t) — the rate, at which a leecher i downloads "new" content from seeds and other leechers combined at time t, so R = {ri(t),t > 0,i G £} — a rate profile; T — a distribution time for a rate profile R; Tmin = min T — minimum distribution time achievable over all possible rate profiles.
The rate profile {ri(t),t > 0,i G £} can achieve Tmin for arbitrary values of F, Ui, i G V, and di, i G £. So we aim to determine the corresponding minimum distribution time Tmin.
In fact the model, which is considered in the paper, is a fluid model [14]. Particularly, we imply that a leecher can replicate and forward a bit as soon as it receives the bit. This key assumption allows us to derive remarkably explicit expressions for the minimum distribution time for general, heterogeneous models. This assumption is core and important. Due to it clear expressions for the minimum distribution time for general, heterogeneous models are obtained.
As we mentioned, BitTorrent is a very popular protocol for real P2P file sharing systems. The idea of BitTorrent is to divide the file to be distributed into parts, named chunks. The size of the chunks is typically 256 KB [8]. In BitTorrent a peer can only forward a chunk as soon as it has fully got the chunk. So chunk-based models are more realistic than fluid-based models. For the chunk-based model, closed form expressions for the minimum
distribution time are only available for very simple cases of peers, who have homogeneous bandwidths and infinite download bandwidths, and for the case of heterogeneous systems it is difficult to obtain closed-form expressions for the minimum distribution time. The explicit expressions for the minimum distribution time Tmin, demonstrated in the paper and obtained for the fluid-based model, are lower bounds for more realistic chunk-based models. We can compare the minimum distribution time Tmin(/) for a fluid-based model and the minimum distribution time Tmin(c) for a chunk-based model. It is possible to show that for homogeneous systems, the error is about
^min (c) - Tmin(/) _ log2 L
Tmin(/) = N ' ()
where N is the number of chunks in the file. It turns out this error is negligibly small for cases of practical interest, even when file sizes are medium. Thus although the chunk-based model is more realistic than our fluid-based model, the last one leads to a clear expression for the minimum distribution time for heterogeneous systems. This expression is a good approximation of the minimum distribution time for the chunk-based model for homogeneous systems.
We imply, that the bandwidth bottlenecks are in upload and download rates and not in the basis of Internet. This postulate dominate in Internet today. Further, we do not take into account the impacts of network congestion and TCP congestion control. Thus, our expressions for the minimum distribution time Tmin are really approximations for real file distribution times. However, the expressions can be used to make useful, relevant calculations. Further that can be a base for a file-distribution protocol for arbitrary upload and download rates.
We also make an assumption that each peer in the system takes participation in the file distribution up until the peer has gained the whole file. This enables to understand the main aspect: how different system parameters affect the P2P file distribution process.
Further, the following parameters are also considered:
u(A) = yi Ui, dmin(^) = min Ui
i€A
eA
for any subset A C V, dmin = dmin(£).
3. Minimum File Distribution Time
In this section we demonstrate the result for the minimum distribution time Tmin of a file [13]. We consider all the leechers are equal, but their download and upload bandwidth can differ. Thus, P2P system is heterogeneous.
Theorem 1. The minimum distribution time for the general heterogeneous P2P file sharing system is
F
Tmin = ■ tj TcvT' (2)
mm{dmin, ,u(s)}
The expression (2) has a rather simple and explicit form. The theorem gives some distribution scheme for any upload and download parameters and can be used as a reference point for the distribution time for any P2P file distribution protocol.
We can notice, that we actually choose Tmin from three values. Each value has its own sense. We have Tmin > F/dmin, because the leecher with the lowest download rate cannot receive the file faster than F/dmin. We also have Tmin > F/u(S), because the set of seeds cannot distribute fresh bits at a rate faster than u(S), a leecher cannot obtain the file at a rate faster than u(S). Finally, we have Tmin > LF/u(V), because the total upload bandwidth of the system is u(P) and because leechers need to obtain
a total of LF bits. So we have the lower bound for P2P file distribution: Tmin > max{F/dmin, LF/u(V),F/u(S)}. The theorem finds that the right-hand side of this inequality is not only a lower bound, but also the exact value of the minimum distribution time Tmin.
As we can see, four cases are possible here (the proof of the theorem is based on these cases):
dmin < min{«(p)/L, u(S)} and dmin < u(£)/(L — 1);
dmin < min{«(P)/L, u(S)} and dmin > u(£)/(L — 1);
u(V)/L < min u(S)}; u(S) < mm u(V )/L}.
Let Si(t) be the rate at which the seeds send bits to leecher i at time t. In each case, a rate profile has the same general structure, based on the following assumptions. As soon as each leecher i begins to obtain its bits from the seeds, it replicates the obtained bits to each of the other L — 1 leechers at some rate less than or equal Si(t) as shown in Figure 1.
Figure 1. The scheme of P2P file distribution
Thus, for each case, the distribution scheme consists of L application-level multicast trees. Each tree has a root in the seed, passes through one of the leechers, and ends at each of the L — 1 other leechers.
Now we consider three examples to show the significance and the usefulness of the presented theorem. In each example, it is necessary to distribute a file of size F = 1.25 GB and to calculate Tmin according to the expression (2). There are one seed (S = 1) and ten leechers (L = 10) in P2P network. The seed's upload bandwidth us is usually higher than leecher's upload bandwidth. We distinguish two types of leechers: ordinary leechers and super leechers. The number of ordinary leechers is denoted as Lord and the number of super leechers is denoted as Lsup. A super leecher has the same upload bandwidth us as a seed, and an ordinary leecher's upload bandwidth uiord is lower.
Example 1. Each leecher has download bandwidth d = 2000 Kbps. Each ordinary leecher has upload bandwidth uiord = 200 Kbps. If we change the number of the ordinary leechers Lord from 1 to 10, the value of the minimum distribution time Tmin increases.
In Figure 2 we present the dynamics of Tmin at three values of a seed's upload bandwidth: us = 1000 Kbps, us = 1500 Kbps, us = 2000 Kbps.
We can see, the higher us the less time is necessary to distribute the file.
Example 2. Here again the download bandwidth of an each leecher is d = 2000 Kbps. A seed's upload bandwidth is a constant us = 1500 Kbps.
Now we present in Figure 3 the same function, but now we change leecher's upload bandwidth, we have uiord = 200 Kbps, uiord = 600 Kbps, uiord = 1000 Kbps.
Here we can observe similar effect: the higher uiord the less time is necessary to distribute the file.
Figure 3. Tmin = Tmin(Tord), «iord = const
Example 3. In this example each ordinary leecher has constant upload bandwidth uiord = 600 Kbps, and a seed has constant upload bandwidth us = 2000 Kbps. Again we draw the plot for function Tmin, depending on a number of ordinary leechers Lord. The parameter of leecher's download bandwidth is changed: d = 600 Kbps, d = 1600 Kbps, d = 2600 Kbps. The plot is presented in Figure 4.
Here Tmin = 4.63 hours is a constant at low download bandwidth d = 600 Kbps. In other cases of d first the values of Tmin vary, then from the number of ordinary leechers 5 they coincide and increase.
The presented examples allow to conclude that our calculations for fluid-based model is rather good for a description of a real P2P file distribution process in the Internet.
The study of the implications of theorem are made in [13]. In (1) we present the fractional error between a chunk-based model and a fluid-based model. This error is obtained for the minimum distribution time Tmin for a homogeneous system with peers having infinite download capacity. It turns out, that for file size of 350 MB or more, the percentage error between Tmin(f) and Tmin(c) is less than 1%, even when there are 10,000 leechers in the network.
We can conclude from (1) that, for homogeneous systems, the error can be surely neglected if N ^ log2 L. This condition is also easily satisfied for typical file sizes of the order of several GB. We suppose this is true for heterogeneous systems as well.
5,00
4,50
4,00
3,50
l/l 3,00
-C
c Z,50
E,
I-1 2,00
1,50
1,00
0,50
0,00
y
-- ----
-d=600kb(35 d=1600kbps d=2600kbps
5 6 L_ord
Figure 4. Tmin = Tmin(iord), d = const
The fluid-based model is very accurate, even though the fluid-based model is not as realistic as the chunk-based model. The fluid-based model has the advantage of providing a simple, explicit expression for the minimum distribution time for general, heterogeneous systems.
4. Conclusion
Today P2P file sharing is an important application in the Internet. The determination of a minimum achievable time for distribution of a file to all leechers is a fundamental problem in P2P file sharing network. Distribution of a file to all leechers means, that all equal small pieces of the file, named chunks, are obtained by all leechers. The mentioned fundamental problem becomes a complex optimal scheduling problem, when we consider a model in which discrete chunks are kept and forwarded at the peers.
In this paper we consider a version of a fluid-based model for the problem of the determination of the minimum achievable distribution time. We get an explicit expression for this fundamental problem, using this fluid-based model. The result is simple, useful. On the basis of the expression we construct three numerical examples, demonstrating the behaviour of the minimum distribution time depending on the number of super leechers with the high upload bandwidth and ordinary leechers with the ordinary upload bandwidth.
As it was expected, with the growing number of ordinary leechers, the total bandwidth available on the network for distribution is reduced, which leads to the increase of the value of the minimum distribution time Tmin. As we can see from the plot in Figure 2, this time depends on the upload bandwidth us, and for super leechers with us = 2 Mbps the minimum distribution time Tmin is minimal.
From the plot in Figure 3 we can see the minimum distribution time Tmin also depends on the upload bandwidth uiord, and for ordinary leechers with uiord = 1 Mbps the minimum distribution time Tmin is minimal. Finally, from the plot in Figure 4 we see the minimum distribution time Tmin depends on the download bandwidth d, and for leechers with d = 2.6 Mbps the minimum distribution time Tmin is minimal. Note that, the values of Tmjn are the same for d =1.6 Mbps and d = 2.6 Mbps starting with the value Lord = 5.
The obtained expression is rather close to the similar result for a more realistic chunk-based model in the case of homogeneous system. The result of this paper has a very high accuracy. The given fractional error demonstrates this fact. It turns out, that the discussed fluid-based model is a rather good approximation to a real network.
The results of the paper can be developed in different directions. One direction is to compare the minimum distribution time in the fluid-based model with the minimum
distribution time in the chunk-based model for heterogeneous systems. Another direction is a determination of the minimum distribution time for networks which limit the number of simultaneous connections between participating peers.
References
1. Yu. V. Gaidamaka, A. K. Samuilov, Analysis of Playback Continuity for Video Streaming in Peer-to-Peer Networks with Data Transfer Delays, T-Comm: Telecommunications and Transport (11) (2013) 77-81, in Russian.
2. Napster Company Info.
URL http://us.napster.com/availability
3. The Gnutella Protocol Specification v0.4. URL https://gnunet.org/node/147
4. What is Freenet? Freenet Company Info. URL https://freenetproject.org
5. The BitTorrent Protocol Specification.
URL http://www.bittorrent.org/beps/bep_0003
6. Vuse BitTorrent Client. URL http://www.vuze.com
7. X. Yang, G. de Veciana, Service Capacity of Peer to Peer Networks, in: Proceedings of IEEE INFOCOM, Vol. 4, 2004, pp. 2242-2252.
8. D. Qiu, R. Srikant, Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks, in: Proceedings of ACM SIGCOMM, Vol. 34, 2004, pp. 367-378.
9. Z. Mordji, M. Amad, D. Aissani, A Derived Queueing Network Model for Structured P2P Architectures, in: VECoS 2014, 2014, pp. 76-84.
10. A. Ferragut, F. Paganini, Fluid Models of Population and Download Progress in P2P Networks, IEEE Trans. on Control of Network Systems 3 (1) (2016) 34-45.
11. Yu. V. Gaidamaka, E. V. Bobrikova, E. G. Medvedeva, The Application of Fluid Models to the Analysis of Peer-to-Peer Network, Bulletin of PFUR. Series: Mathematics. Informatics. Physics (4) (2016) 15-25, in Russian.
12. Yu. V. Gaidamaka, , E. G. Medvedeva, S. I. Salpagarov, E. V. Bobrikova, Analysis of Model for Multichannel Peer-to-peer TV Network with View-upload Decoupling Scheme, RUDN Journal of Mathematics, Information Sciences and Physics 25 (2) (2017) 123-132, in Russian. doi:10.22363/2312-9735-2017-25-2-123-132.
13. R. Kumar, K. W. Ross, Optimal Peer-Assisted File Distribution: Single and Multi-Class Problems, 2006.
14. K. E. Samouylov, E. V. Bobrikova, A Simple Fluid Model of P2P File Sharing Network, T-Comm: Telecommunications and Transport (7) (2012) 180-184, in Russian.
УДК 621.39
Б01: 10.22363/2312-9735-2018-26-1-84-92
Анализ времени распространения файла для одноранговой сети
Е. В. Бобрикова, Ю. В. Гайдамака
Кафедра прикладной информатики и теории вероятностей Российский университет дружбы народов ул. Миклухо-Маклая, д. 6, Москва, Россия, 117198
Передача данных по одноранговым сетям или P2P-сетям занимает значительную долю трафика в современной сети Интернет. Наибольшей популярностью пользуется обмен файлами по P2P-сетям. Обмен файлами по P2P-сети обладает рядом преимуществ такими, как, например, хорошая масштабируемость, высокая пропускная способность, по сравнению
с традиционным подходом Клиент/Сервер к передаче файлов. Данная работа посвящена изучению минимального времени распространения файла. Речь идёт о минимальном времени, которое необходимо затратить для получения целого файла всеми пользователями сети, которым необходим этот файл. Этот параметр имеет прямое отношение к уже упомянутой пропускной способности сети. Для получения выражения для минимального времени распространения файла используется так называемая жидкостная модель P2P-сети. Выражение оперирует такими понятиями, как размер файла, скорость передачи сидов, скорость передачи и скорость загрузки личеров. С использованием численных примеров для минимального времени распространения файла показана эффективность применения жидкостной модели для описания файлообмена по P2P. Рассматривается поведение системы в случае, когда в сети имеются личеры двух типов, которые отличаются друг от друга скоростью передачи данных.
Ключевые слова: одноранговая сеть, жидкостная модель, личер, сид, пир, время загрузки файла
Литература
1. Гайдамака Ю. В., Самуйлов А. К. Анализ стратегий заполнения буфера оборудования пользователя при предоставлении услуги потокового видео в одноранговой сети // T-Comm: Телекоммуникации и Транспорт. — 2013. — № 11. — С. 77-81.
2. Napster Company Info. — http://us.napster.com/availability.
3. The Gnutella Protocol Specification v0.4. — https://gnunet.org/node/147.
4. What is Freenet? Freenet Company Info. — https://freenetproject.org.
5. The BitTorrent Protocol Specification. — http://www.bittorrent.org/beps/bep_ 0003.
6. Vuse BitTorrent Client. — http://www.vuze.com.
7. Yang X, de Veciana G. Service Capacity of Peer to Peer Networks // Proceedings of IEEE INFOCOM. — Vol. 4. — 2004. — Pp. 2242-2252.
8. Qiu D., Srikant R. Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks // Proceedings of ACM SIGCOMM. — Vol. 34, No 4. — 2004. — Pp. 367-378.
9. Mordji Z, Amad M., Aissani D. A Derived Queueing Network Model for Structured P2P Architectures // VECoS 2014. — 2014. — Pp. 76-84.
10. Ferragut A, Paganini F. Fluid Models of Population and Download Progress in P2P Networks // IEEE Trans. on Control of Network Systems. — 2016. — Vol. 3, No 1. — Pp. 34-45.
11. Гайдамака Ю. В., Бобрикова Е. В., Медведева Е. Г. Применение жидкостных моделей к анализу одноранговой сети // Вестник РУДН. Серия: Математика. Информатика. Физика. — 2016. — № 4. — С. 15-25.
12. Анализ модели многоканальной одноранговой сети вещательного телевидения для схемы с разделением видеопотока / Ю. В. Гайдамака, Е. Г. Медведева, С. И. Салпагаров, Е. В. Бобрикова // Вестник РУДН. Серия: Математика. Информатика. Физика. — 2017. — Т. 25, № 2. — С. 123-132.
13. Kumar R., Ross K. W. Optimal Peer-Assisted File Distribution: Single and Multi-Class Problems. — 2006.
14. Самуйлов К. Е., Бобрикова Е. В. Простейшая жидкостная модель файлообменной Р2Р-сети // T-Comm: Телекоммуникации и транспорт. — 2012. — № 7. — С. 180184.
© Bobrikova E. V., Gaidamaka Yu.V., 2018
Для цитирования:
Bobrikova E. V., Gaidamaka Yu. V. Analysis of the File Distribution Time in Peer-to-Peer
Network // RUDN Journal of Mathematics, Information Sciences and Physics. — 2018. —
Vol. 26, No 1. — Pp. 84-92. — DOI: 10.22363/2312-9735-2018-26-1-84-92.
For citation:
Bobrikova E.V., Gaidamaka Yu. V. Analysis of the File Distribution Time in Peer-to-Peer Network, RUDN Journal of Mathematics, Information Sciences and Physics 26 (1) (2018) 84-92. DOI: 10.22363/2312-9735-2018-26-1-84-92.
Сведения об авторах:
Бобрикова Екатерина Васильевна — кандидат физико-математических наук, старший преподаватель кафедры прикладной информатики и теории вероятностей РУДН (e-mail: [email protected], тел.: +7 (495) 9550999) Гайдамака Юлия Васильевна — доцент, кандидат физико-математических наук, доцент кафедры прикладной информатики и теории вероятностей РУДН (e-mail: [email protected], тел.: +7 (495) 9550999)
Information about the authors:
Bobrikova E. V. — Candidate of Physical and Mathematical Sciences, senior lecturer of Department of Applied Probability and Informatics of Peoples' Friendship University of Russia (RUDN University) (e-mail: [email protected], phone: +7 (495) 9550999)
Gaidamaka Yu. V. — Candidate of Physical and Mathematical Sciences, associate professor of Department of Applied Probability and Informatics of Peoples' Friendship University of Russia (RUDN University) (e-mail: [email protected], phone: +7 (495) 9550999)