SCIENTIFIC SECTION
COMPUTER SCIENCE
A Method of Protected Distribution of Data Among Unreliable and Untrusted Nodes
Yu. V. Kosolapov, F. S. Pevnev
Yury V. Kosolapov, https://orcid.org/0000-0002-1491-524X, Institute of Mathematics, Mechanics, and Computer Science named after I. I. Vorovich, Southern Federal University, 8a Milchakova St., Rostov-on-Don 344090, Russia, [email protected]
Fedor S. Pevnev, Institute of Mathematics, Mechanics, and Computer Science named after I. I. Vorovich, Southern Federal University, 8a Milchakova St., Rostov-on-Don 344090, Russia, [email protected]
We consider a model for protecting the confidentiality and recoverability of data in a distributed storage system. It is assumed that informational blocks are encoded into code blocks, which are then divided into parts and distributed among the repositories of the distributed storage. A modification of the code noising method is constructed which provides computational resistance to coalition attacks on the confidentiality of stored data and, at the same time, protection against the failure of a part of the storage nodes. Confidentiality is protected against coalitions of greater cardinality than with the classical code noising method. It is shown that the computational resistance is based on the complexity of solving one well-known problem of coding theory.
Keywords: wiretap channel, distributed secure storage, coalition attack.
Received: 05.10.2018 / Accepted: 21.05.2019 / Published: 31.08.2019
This is an open access article distributed under the terms of
Creative Commons Attribution License (CC-BY 4.0).
DOI: https://doi.org/10.18500/1816-9791-2019-19-3-326-337
INTRODUCTION
Let us consider a model of secure data storage on $n$ independent and, in general, untrusted repositories $S_1, \dots, S_n$ (see the Figure). In what follows these repositories are also referred to as nodes. We consider cloud repositories such as Google Drive, Yandex.Disk, etc. to be such independent storages. The users are able to write their data into each of the $n$ repositories and to read data from at least $\nu$ ($\in \mathbb{N}$) of them (inaccessible repositories are crossed out in the Figure). We assume that an adversary coalition contains no more than $\mu$ ($\in \mathbb{N}$) repositories (referred to as participants of the coalition) and is able to obtain data from each of them (the coalition is marked with a dashed line in the Figure). The parameters $n$, $\nu$ and $\mu$ are known to everybody. The challenge for the developers of a protection system is to choose the transformation applied to protected data before it is distributed among the repositories. On the one hand, this transformation should provide confidentiality of the protected data against a coalition of cardinality $\mu$ or less; on the other hand, it should make it possible to recover the protected data when any $n - \nu$ repositories are inaccessible. The coding method is not considered secret. We are interested in non-cryptographic methods, because in this case it is not necessary to support the life cycle of cryptographic keys.
[Figure. The distributed storage system. Legend: dashed outline marks a coalition of nodes; a crossed-out node is a failed node.]
The storage model described above is the research subject of [1]. In [1] the transformation of protected data is the code noising method (in the terminology of [2]) based on a pair of linear codes $(\widetilde{C}, C)$, where the $[n, l, \tilde{d}]$-code $\widetilde{C}$ with length $n$, dimension $l$, and code distance $\tilde{d}$ contains the $[n, l-k, d]$-code $C$, $k < l$. In [1] both codes are MDS codes (Maximum Distance Separable codes). The code noising method is optimal for this storage model if $n - \nu \le \tilde{d} - 1$ and
$$\mu \le l - k \tag{1}$$
(see results in [3-6]). In this case confidentiality is provided in the information-theoretic sense if the protected data is uniformly distributed. Pairs of MDS codes are also optimal if the availability of the data storage is limited [7], or if the coalition has access to an additional part of the protected data [8]. Some experimental estimations of code noising resistance in distributed storage are explored in [9], but the observer has identified an attack algorithm in that case.
The article [10] considers a repetitive interception attack against the classical code noising method. It is assumed that the observer has the opportunity to observe several partial code blocks corresponding to one unknown informational block. In [10] it is also assumed that different code blocks are observed on different subsets of coordinates. The repetitive interception attack is successful if condition (1) does not hold [10]. Thus, in the distributed storage model a coalition of repositories is able to attack confidentiality effectively with the repetitive interception attack whenever a system similar to the one described in [1] is used and condition (1) does not hold. We offer a modification of the code noising method which provides high resistance to repetitive interception even when condition (1) does not hold.
Our solution is based on a regular change of the coding map. Synchronization of the sender and the receiver is not required; however, the sender needs to additionally send information about the mapping used. To estimate the resistance of the proposed method we use an approach that is usual in cryptography. According to this approach, it is enough to reduce the task of breaking the method to one or several (usually well-known) mathematical problems. In the present paper the resistance of the constructed method is based on the complexity of one problem of coding theory.
The article consists of an introduction, two sections, and a conclusion. The first section describes the analytical model (the data storage model), the code noising method and its modification. The second section analyzes the application of the constructed modified method in a distributed storage system. An estimate is obtained for the number of storages that may fail without affecting the correct recovery of informational blocks from the intact repositories (nodes). The resistance of the modified method to a coalition attack is also analyzed.
1. DATA PROTECTION SCHEME
1.1. Analytical model
Let us briefly describe the data storage model proposed in [1]. Let $i \in \{1, \dots, r\}$ and let data from the $i$-th source $U_i$ be represented as informational blocks of $k$ characters over a finite field $F_q$. Each informational block is encoded independently into a code block of $n$ characters from $F_q$ via the encoder Enc. Then all $n$ symbols of the code block $c = (c_1, \dots, c_n) \in F_q^n$ are distributed among the $n$ repositories so that the $j$-th symbol $c_j$ is written to the repository with number $\pi_i(j)$ (or, equivalently, to the node $S_{\pi_i(j)}$), where
$$\pi_i : \{1, \dots, n\} \to \{1, \dots, n\} \tag{2}$$
is a permutation. The users choose the permutations (2) independently, and these permutations are not private. We also assume that the users know the permutation $\pi_i$ when they obtain information from the repositories. The permutations may differ from block to block, from file to file, or in some other way.
To extract one informational block the user reads the characters of the corresponding code block from the repositories and then feeds the whole block into the decoder Dec. The value of the $\pi_i^{-1}(j)$-th coordinate is unknown to the user if the repository $S_j$ is inaccessible (e.g. due to failure or damage). In this case we consider this coordinate to be erased and write the symbol $*$ instead of its value. We assume that no more than $n - \nu$ repositories may be inaccessible while the user is reading data. As the repositories are supposed to be untrusted, we consider every node to be an eavesdropper which knows the value of only one coordinate in every code block. The other coordinates are considered to be erased. The participants of a coalition of $\mu$ repositories will know the values of $\mu$ coordinates in every code block. This set of coordinates may differ from block to block because of the different permutations (2) used. Therefore, the coalition has the opportunity to launch the repetitive interception attack from [10] if the classical code noising method is used.
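The storage step described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names `distribute`, `read_block` and `coalition_view` are hypothetical, indices are 0-based, and the toy code block is arbitrary. It shows how a public permutation places symbols on repositories, how inaccessible repositories turn into erasures for the user, and which coordinates a coalition observes.

```python
import random

ERASED = "*"  # erasure marker, as in the text

def distribute(code_block, pi):
    """Write symbol j of the code block to repository pi[j] (0-based)."""
    repos = [None] * len(code_block)
    for j, symbol in enumerate(code_block):
        repos[pi[j]] = symbol
    return repos

def read_block(repos, pi, inaccessible):
    """Rebuild the code block; coordinates stored on inaccessible
    repositories come back as erasures."""
    return [ERASED if pi[j] in inaccessible else repos[pi[j]]
            for j in range(len(pi))]

def coalition_view(repos, pi, coalition):
    """What a coalition learns: only the coordinates stored on its
    members; every other coordinate is erased for it."""
    return [repos[pi[j]] if pi[j] in coalition else ERASED
            for j in range(len(pi))]

n = 7
c = [3, 1, 4, 1, 5, 9, 2]        # a toy code block over some F_q
pi = list(range(n))
random.Random(0).shuffle(pi)      # a public permutation as in (2)

repos = distribute(c, pi)
print(read_block(repos, pi, inaccessible={0, 5}))
print(coalition_view(repos, pi, coalition={1, 2, 3}))
```

Because the permutation may change from block to block, the coalition's set of observed coordinates changes too, which is exactly what makes the repetitive interception attack applicable.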
1.2. Classical code noising method
The code noising method is used in [1] to protect data against an adversary coalition and against inaccessibility of the repositories at the same time. We can describe this method in the following way. Let $\widetilde{C}$ be a linear $[n, l, \tilde{d}]$-code with length $n$, dimension $l$, and code distance $\tilde{d}$; let $\widehat{C}$ and $C$ be an $[n, k]$-code and an $[n, l-k]$-code respectively, such that $\widehat{C} \cap C = \{\mathbf{0}\}$, $\mathbf{0} = (0, \dots, 0) \in F_q^n$, and the direct sum $\widehat{C} \oplus C$ is equal to $\widetilde{C}$. Let $\widehat{G}$ and $G$ be generator matrices of the codes $\widehat{C}$ and $C$ respectively. Code noising is the function
$$f : F_q^k \times F_q^{l-k} \to \widetilde{C}, \qquad f(m, r) = m\widehat{G} + rG = c,$$
where $m$ ($\in F_q^k$) is an informational block and $r$ is a vector chosen randomly and equiprobably from $F_q^{l-k}$. Let
$$\mathrm{Dec}_{\widetilde{C}} : (F_q \cup \{*\})^n \to F_q^l$$
be a decoder which is able to correct no more than $\tilde{d} - 1$ erasures in every code block and outputs vectors from $F_q^l$. One can try to obtain the informational block from the block $c' \in (F_q \cup \{*\})^n$ by applying the decoder $\mathrm{Dec}_{\widetilde{C}}$ to $c'$ and cutting off the last $l - k$ symbols of the decoder output.
Let us assume that every informational block $m$ ($\in F_q^k$) has equal probability $p_M(m) = 1/q^k$, i.e. the random variable $M$ is uniformly distributed over $F_q^k$. As we can see in [1, 4-6, 8], the resistance of the code noising method strongly depends on the pair $(\widetilde{C}, C)$. In fact, for every pair $(\widetilde{C}, C)$ there exists a threshold $\mu_0$ ($\in \mathbb{N}$) such that if the coalition (or eavesdropper) knows the values of no more than $\mu_0$ coordinates of the code block, it will not obtain any information about the encoded informational block. Otherwise, if $\mu > \mu_0$, the eavesdropper can get non-zero information. In this case there is at least one set of observed coordinates $\tau$ ($|\tau| = \mu$) which does not leave the whole set of informational blocks as candidates for the original informational block, i.e. the size of the resulting set of candidates is less than $q^k$. So the eavesdropper may attempt to use the repetitive interception attack from [10]. For example, as shown in [1], if $(\widetilde{C}, C)$ is a pair of MDS codes then $\mu_0 = l - k$ (see (1)), where $l$ is the dimension of $\widetilde{C}$ and $l - k$ is the dimension of $C$. If $\mu > \mu_0$, the eavesdropper can easily recover the informational block from a few partially erased code blocks (see [10]). We propose a modification of the code noising method for counteracting the repetitive interception attack. We describe this modification in the next subsection. Its defensive capability is analyzed in Section 2.
1.3. Modified code noising method
The main idea of the modified code noising method is a periodic change of encoding functions in such a way that the legal receiver can determine the exact encoding function from the received code blocks, while an illegal eavesdropper cannot. Here and below a user is called a legal receiver if he or she has permission to read data from the storage. Note that the idea of changing encoding functions is not new. The authors of [11] used this idea to create the XtX encoding construction. They assumed that the eavesdropper is able to obtain the full data with errors (not erasures) and analyzed properties of this construction such as code rate and security. The principal distinction of our scheme is that it uses only one operation to provide security instead of the two operations of the XtX construction.
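We denote the set of numbers $\{1, \dots, n\}$ as $\underline{n}$. Let the set $\mathrm{supp}(a) = \{i : a_i \ne 0\}$ be the support of the vector $a = (a_1, \dots, a_n)$ and the number $w(a) = |\mathrm{supp}(a)|$ be the weight of this vector. For positive integers $n' \le n$ the operator
$$\Pi_\tau : F_q^n \to F_q^{n'}$$
will be used as the projection operator onto the set $\tau$ ($\subseteq \underline{n}$). If $\tau = \{i_1, \dots, i_{n'}\}$ and $a = (a_1, \dots, a_n)$, then $\Pi_\tau(a) = (a_{i_1}, \dots, a_{i_{n'}})$. If $A \subseteq F_q^n$ is a set, then its projection is denoted as the set $\Pi_\tau(A) = \{\Pi_\tau(a) : a \in A\}$. Let the function $\beta : F_q^l \to F_2^l$ be such that for $a$ from $F_q^l$:
$$\beta(a) = b \ (\in F_2^l) \quad \text{and} \quad \mathrm{supp}(a) = \mathrm{supp}(b). \tag{3}$$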
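For a generator matrix
$$\widetilde{G} = (\tilde{g}_i)_{i=1}^{l} \tag{4}$$
of the $[n, l, \tilde{d}]$-code $\widetilde{C}$ and for a vector $k$ ($\in F_2^l$), let us denote by $\widetilde{G}_k$ the submatrix of matrix (4) such that $\widetilde{G}_k = (\tilde{g}_i)_{i \in \mathrm{supp}(k)}$. Random encoding parametrized with a binary vector $k \in F_2^l$ is a function $g_k : F_q^{w(k)} \times F_q^{l - w(k)} \to \widetilde{C}$ such that
$$g_k(m, r) = m\widetilde{G}_k + r\widetilde{G}_{\bar{k}} = c, \tag{5}$$
where $m \in F_q^{w(k)}$, $r$ is chosen from $F_q^{l - w(k)}$ randomly and equiprobably, $\bar{k} = \mathbf{1} \oplus k$, $\mathbf{1} \in F_2^l$ and $w(\mathbf{1}) = l$. Let $c'$ be a partially erased vector corresponding to the code block $c$ (see (5)). If $k$ is known, then one can try to extract the informational vector $m'$ by the following rule:
$$m' = g_k^{-1}(c') = \Pi_{\mathrm{supp}(k)}\left(\mathrm{Dec}_{\widetilde{C}}(c')\right). \tag{6}$$
The set of all possible functions $g_k$ for a given $\widetilde{G}$ is denoted as $\mathcal{G}(\widetilde{G})$:
$$\mathcal{G}(\widetilde{G}) = \{g_k : k \in F_2^l\}.$$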
The legal sender (i.e. a user who has permission to write symbols of code blocks into the distributed storage) chooses the function $g_k$ randomly and equiprobably from $\mathcal{G}(\widetilde{G})$. Under this assumption the legal receiver (a user who has permission to read data from the distributed storage) is not able to recover $m'$ uniquely from a single code block $c'$, because he or she has to know the set of coordinates in $\mathrm{Dec}_{\widetilde{C}}(c')$ corresponding to the informational vector (see (6)). The legal receiver must know $k$ to recover the informational block. We propose to put the information about the vector $k$ into a package of $\theta + 1$ code blocks, $\theta \in \mathbb{N}$. Note that it is usual for data storage systems to read and write data as packages of blocks rather than as single blocks.
Let us consider how the legal sender forms the $t$-th package, $t \in \mathbb{N}$. The data from the source are represented as packages of $\theta$ blocks. The length of blocks may be different in different packages. First, the sender draws a vector $k$ ($\in F_2^l$) randomly and equiprobably. This vector determines the function $g_k$. At the next step the sender represents the data as a sequence of $\theta$ blocks of length $w(k)$, so that the $t$-th package $M_t$ of informational vectors is
$$M_t = (m_{t,1}, \dots, m_{t,\theta}), \quad m_{t,p} \in F_q^{w(k)}.$$
The corresponding package of code blocks is
$$C_t = (c_{t,1}, \dots, c_{t,\theta}, c_{t,\theta+1}), \tag{7}$$
where $c_{t,p} = g_k(m_{t,p}, r_{t,p})$ for $p = 1, \dots, \theta$ and $c_{t,\theta+1} = g_k(u_t, \mathbf{0})$, $\mathbf{0}$ is a zero vector, and $w(u_t) = w(k)$. The vector $u_t$ is chosen from the set of vectors of weight $w(k)$ ($u_t \in F_q^{w(k)}$) with probability $(q - 1)^{-w(k)}$ for every vector. Let us denote the encoding of $M_t$ as $\mathrm{Enc}(M_t) = C_t$.
The legal receiver can extract the informational blocks from a package as follows. He or she calculates the vector $k' = \beta(\mathrm{Dec}_{\widetilde{C}}(c_{t,\theta+1}))$ and then finds $m_{t,p} = g_{k'}^{-1}(c_{t,p})$, $p = 1, \dots, \theta$. We denote the decoding of the package $C_t$ as $\mathrm{Dec}(C_t) = M_t = (m_{t,1}, \dots, m_{t,\theta})$. We call the constructed modification of the code noising method the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme.
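The package encoding and decoding just described can be sketched in Python for the binary case $q = 2$. This is a toy illustration under loud assumptions: the $[7, 4, 3]$ Hamming code stands in for $\widetilde{C}$, the decoder is a brute-force search over $F_2^l$ rather than a practical erasure decoder, and all function names (`g`, `dec`, `enc_package`, `dec_package`) are hypothetical. For $q = 2$ the only vector $u_t$ of weight $w(k)$ is the all-ones vector and $\beta$ is the identity map.

```python
import itertools
import random

# Generator matrix of the binary [7, 4, 3] Hamming code (stand-in for G~).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
L, N = len(G), len(G[0])
ERASED = None

def mul(a, M):
    """Row vector times matrix over F_2."""
    return [sum(a[i] * M[i][j] for i in range(len(a))) % 2
            for j in range(len(M[0]))]

def g(k, m, r):
    """g_k(m, r): message symbols use rows in supp(k), noise uses the rest."""
    m_it, r_it = iter(m), iter(r)
    a = [next(m_it) if k[i] else next(r_it) for i in range(L)]
    return mul(a, G)

def dec(c_erased):
    """Brute-force erasure decoder: the unique a in F_2^l whose code word
    agrees with every non-erased coordinate."""
    hits = [a for a in itertools.product([0, 1], repeat=L)
            if all(x is ERASED or x == y for x, y in zip(c_erased, mul(a, G)))]
    if len(hits) != 1:
        raise ValueError("too many erasures")
    return list(hits[0])

def enc_package(blocks, k, rng):
    """Encode theta blocks of length w(k) plus one block that carries k."""
    w = sum(k)
    pkg = [g(k, m, [rng.randrange(2) for _ in range(L - w)]) for m in blocks]
    pkg.append(g(k, [1] * w, [0] * (L - w)))   # u_t is all-ones for q = 2
    return pkg

def dec_package(pkg_erased):
    """Recover k from the last block, then project every decoded block."""
    k = dec(pkg_erased[-1])                    # beta is the identity for q = 2
    return k, [[a[i] for i in range(L) if k[i]]
               for a in map(dec, pkg_erased[:-1])]

rng = random.Random(1)
k = [1, 0, 1, 1]
M = [[1, 0, 1], [0, 1, 1]]                     # theta = 2 blocks, w(k) = 3
pkg = enc_package(M, k, rng)
for block in pkg:                              # erase d - 1 = 2 symbols per block
    for j in rng.sample(range(N), 2):
        block[j] = ERASED
print(dec_package(pkg))
```

The receiver never needs to be told $k$ out of band: the last block of the package decodes to a vector whose support is exactly $\mathrm{supp}(k)$.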
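For our method the code rate is equal to
$$R_{k,\theta} = \frac{\theta\, w(k)}{(\theta + 1)\, l} R$$
for a fixed $k$ ($\in F_2^l$), where $R = l/n$ is the code rate of the code $\widetilde{C}$. As the vector $k$ is chosen randomly and equiprobably, the expected value $R_\theta$ of the code rate is
$$R_\theta = \sum_{k \in F_2^l} \frac{1}{2^l} R_{k,\theta} = \frac{\theta}{(\theta + 1)\, l\, 2^l} \sum_{k \in F_2^l} w(k) R = \frac{\theta}{2(\theta + 1)} R, \tag{8}$$
as $\sum_{k \in F_2^l} w(k) = l\, 2^{l-1}$. Note that $\lim_{\theta \to \infty} R_\theta = 0.5R$.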
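As a sanity check of the expected code rate (8), the following Python sketch averages the per-key rate over all $k \in F_2^l$ with exact fractions and compares the result with the closed form $\frac{\theta}{2(\theta+1)} R$. The function names and the small parameters $n = 7$, $l = 4$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def expected_rate(n, l, theta):
    """Average of theta*w(k) / ((theta+1)*n) over all k in F_2^l."""
    total = sum(Fraction(theta * sum(k), (theta + 1) * n)
                for k in product([0, 1], repeat=l))
    return total / 2**l

def closed_form(n, l, theta):
    """Right-hand side of (8): theta / (2*(theta+1)) * R with R = l/n."""
    return Fraction(theta, 2 * (theta + 1)) * Fraction(l, n)

n, l = 7, 4
for theta in (1, 2, 10, 1000):
    assert expected_rate(n, l, theta) == closed_form(n, l, theta)
    print(theta, float(closed_form(n, l, theta)))
```

For $\theta = 1$ the expected rate is $R/4$, and it approaches $R/2$ as $\theta$ grows.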
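Let us denote the set $\mathcal{G}(\widetilde{G})_{h_1, h_2} = \{g_k : h_1 \le w(k) \le h_2\}$ for $h_1, h_2 \in \{0, \dots, l\}$, $h_2 \ge h_1$. One may use this set if, for example, it is necessary to increase the code rate $R_\theta$. Note that $\mathcal{G}(\widetilde{G}) = \mathcal{G}(\widetilde{G})_{0, l}$. In the next section the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme is analyzed for resistance against the failure of $n - \nu$ repositories and against a coalition of $\mu$ participants (recall that here the length $n$ of the code block is equal to the number of repositories). The results presented in the next section are easy to generalize to the case where the $(\mathcal{G}(\widetilde{G})_{h_1, h_2}, \theta)$-scheme is used instead of the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme.
2. ANALYSIS AND APPLICATION OF THE $(\mathcal{G}(\widetilde{G}), \theta)$-SCHEME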
2.1. Defense against unreliable nodes
Theorem 1. Let $\widetilde{C}$ be an $[n, l, \tilde{d}]$-code with generator matrix $\widetilde{G}$, let the package $C_t = \mathrm{Enc}(M_t)$ be an output of the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme using the function $g_k$, and let $C'_t = (c'_{t,1}, \dots, c'_{t,\theta+1})$ be the corresponding package of partially erased code blocks. If every block $c'_{t,p}$, $p = 1, \dots, \theta + 1$, has no more than $\tilde{d} - 1$ erasures, then $\mathrm{Dec}(C'_t) = M_t$.
Proof. By the condition, $\tilde{d}$ is the code distance of the code $\widetilde{C}$ and there are no more than $\tilde{d} - 1$ erasures in every code block. Then $\beta(\mathrm{Dec}_{\widetilde{C}}(c'_{t,\theta+1})) = k$. For the same reason, $g_k^{-1}(c'_{t,p}) = m_{t,p}$ for $p = 1, \dots, \theta$. □
Theorem 1 gives a limit on the number $n - \nu$ of unreliable nodes that may be inaccessible while the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme still provides recovery of the information. Namely, $n - \nu \le \tilde{d} - 1$, where $\tilde{d}$ is the code distance of $\widetilde{C}$.
2.2. Defense against coalition of untrusted nodes
If the coalition (or eavesdropper) knows the function $g_k$, then the resistance of the modified code noising method does not exceed the resistance of classical code noising based on the pair $(\widetilde{C}, L(\widetilde{G}_{\bar{k}}))$, where $L(A)$ is the linear subspace with the rows of the matrix $A$ as its basis. In other words, if the adversary knows $k$, he or she will be able to use all known attacks, e.g. an attack on repeated messages. In what follows we presume that the next hypothesis holds.
Hypothesis 1. Anyone who wants to get any information about the data in package (7) has to obtain information about the function $g_k$ which was used to encode the package.
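Let $K$ be a vector chosen randomly and equiprobably from $F_2^l$, and let $U$ be a random vector with distribution
$$p_U(u) = 2^{-l} (q - 1)^{-w(u)} \tag{9}$$
on $F_q^l$. Obviously, the random vectors $K$ and $\beta(U)$ have the same distribution. Let us consider for a fixed $k$ the random vector
$$\mathbf{C}_k = g_k\left(M^{(w(k))}, R^{(l - w(k))}\right),$$
where $M^{(w(k))}$ and $R^{(l - w(k))}$ are random vectors distributed uniformly over $F_q^{w(k)}$ and $F_q^{l - w(k)}$ respectively. Note that the random vector $\mathbf{C}_k$ has uniform distribution over $\widetilde{C}$ for any $k$. Let $H(K)$ and $H(K \mid \mathbf{C}_k)$ be the entropy of the random vector $K$ and the conditional entropy of $K$ given $\mathbf{C}_k$ respectively:
$$H(K) = -\sum_{x \in F_2^l} p_K(x) \log_2 p_K(x) = l,$$
$$H(K \mid \mathbf{C}_k) = -\sum_{x \in F_2^l} \sum_{c \in \widetilde{C}} p_{K, \mathbf{C}_k}(x, c) \log_2 p_{K \mid \mathbf{C}_k}(x \mid c).$$
If $\mathbf{C}_k = c$ ($\in \widetilde{C}$), there is no way to choose the correct function from $\mathcal{G}(\widetilde{G})$ using only the decoded value $\mathrm{Dec}_{\widetilde{C}}(c)$, because the vectors $M^{(w(k))}$, $R^{(l - w(k))}$, and $K$ are distributed uniformly. Thus $H(K \mid \mathbf{C}_k) = l$ and the mutual information $I(K; \mathbf{C}_k)$ is equal to zero:
$$I(K; \mathbf{C}_k) = H(K) - H(K \mid \mathbf{C}_k) = 0.$$
Moreover, it is not hard to check that $I(K; (\mathbf{C}_k^1, \dots, \mathbf{C}_k^\theta)) = 0$ for $\theta$ copies $\mathbf{C}_k^1, \dots, \mathbf{C}_k^\theta$ of the random vector $\mathbf{C}_k$.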
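The claim that $\beta(U)$ is uniform on $F_2^l$ when $U$ follows distribution (9) can be checked exactly for small parameters. The sketch below is an illustration only: the function name is hypothetical, and field elements are represented by the integers $0, \dots, q-1$, which is sufficient here because only the zero or non-zero status of each coordinate matters.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def beta_distribution(q, l):
    """Exact distribution of beta(U) when U follows (9) on F_q^l."""
    dist = defaultdict(Fraction)
    for u in product(range(q), repeat=l):
        w = sum(1 for x in u if x != 0)
        p_u = Fraction(1, 2**l * (q - 1)**w)          # formula (9)
        dist[tuple(int(x != 0) for x in u)] += p_u    # beta(u): support indicator
    return dist

dist = beta_distribution(q=5, l=3)
assert sum(dist.values()) == 1                         # (9) is a distribution
assert all(p == Fraction(1, 2**3) for p in dist.values())
print(len(dist), "supports, each with probability", Fraction(1, 2**3))
```

Each support $k$ is shared by $(q-1)^{w(k)}$ vectors of probability $2^{-l}(q-1)^{-w(k)}$, so every $k$ receives total probability $2^{-l}$.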
Let us consider the random vectors $\mathbf{C}^1, \dots, \mathbf{C}^\theta$, $X = U\widetilde{G}$, where $\mathbf{C}^p = \mathbf{C}^p_{\beta(u)}$ if $U = u$, $p = 1, \dots, \theta$, and $U$ is a random vector with distribution (9). Let $\tau$ ($\subseteq \underline{n}$) be the set of observed coordinates (or the numbers of the repositories from the coalition) with cardinality $|\tau| = \mu$, and let $Z_1, \dots, Z_\theta, Y$ be random vectors, $Z_p = \Pi_\tau(\mathbf{C}^p)$, $p = 1, \dots, \theta$, $Y = \Pi_\tau(X)$. Writing $Z = (Z_1, \dots, Z_\theta)$ and $z = (z_1, \dots, z_\theta)$ for brevity, it is not hard to check the following chain of equalities:
$$\begin{aligned}
\Pr\{\beta(U) = k \mid Z = z, Y = y\} &= \frac{\Pr\{\beta(U) = k, Z = z, Y = y\}}{\Pr\{Z = z, Y = y\}} \\
&= \frac{\Pr\{\beta(U) = k, Z = z\} \Pr\{Y = y \mid \beta(U) = k, Z = z\}}{\Pr\{Z = z, Y = y\}} \\
&= \frac{\Pr\{\beta(U) = k\} \Pr\{Z = z \mid \beta(U) = k\} \Pr\{Y = y \mid \beta(U) = k, Z = z\}}{\Pr\{Z = z\} \Pr\{Y = y \mid Z = z\}} \\
&= \frac{\Pr\{\beta(U) = k\} \Pr\{Y = y \mid \beta(U) = k\}}{\Pr\{Y = y\}} \\
&= \Pr\{\beta(U) = k \mid Y = y\} = \Pr\{K = k \mid Y = y\}
\end{aligned}$$
for every $k \in F_2^l$ and $z_1, \dots, z_\theta, y \in \Pi_\tau(\widetilde{C})$. Thus, $H(K \mid Z_1, \dots, Z_\theta, Y) = H(K \mid Y)$ and
$$I(K; (Z_1, \dots, Z_\theta, Y)) = H(K) - H(K \mid Y) = I(K; Y) = l - H(K \mid Y), \tag{10}$$
because $K$ is equiprobable. For every $k \in F_2^l$ we denote $B(k) = \{u \in F_q^l : \beta(u) = k\}$. Let $n \in \mathbb{N}$, $\tau \subseteq \underline{n}$, and let $y$ be a realization of the random vector $Y = \Pi_\tau(X)$. Consider the system of equations
$$u\, \Pi_\tau(\widetilde{G}) = y, \tag{11}$$
where $u$ is unknown. The set of solutions of this system is denoted $\Gamma(y)$.
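Lemma 1. Let $\beta(\Gamma(y)) = \{\beta(g) : g \in \Gamma(y)\}$. Then
$$H(K \mid Y = y) \le \log_2 |\beta(\Gamma(y))| \le \min\{(l - \mathrm{rank}(\Pi_\tau(\widetilde{G}))) \log_2 q;\ l\}.$$
Proof. Note that $p_{U \mid Y}(u \mid y) = 0$ if $u \notin \Gamma(y)$. Then
$$\begin{aligned}
p_{K \mid Y}(k \mid y) &= \sum_{u \in B(k)} p_{U \mid Y}(u \mid y) = \frac{\sum_{u \in B(k) \cap \Gamma(y)} p_U(u)}{\sum_{u' \in \Gamma(y)} p_U(u')} = \frac{\sum_{u \in B(k) \cap \Gamma(y)} 2^{-l} (q - 1)^{-w(u)}}{\sum_{u' \in \Gamma(y)} 2^{-l} (q - 1)^{-w(u')}} \\
&= \frac{|B(k) \cap \Gamma(y)|\, (q - 1)^{-w(k)}}{\sum_{u \in \Gamma(y)} (q - 1)^{-w(u)}} = \frac{|B(k) \cap \Gamma(y)|}{\sum_{u \in \Gamma(y)} (q - 1)^{w(k) - w(u)}}.
\end{aligned} \tag{12}$$
It is obvious that
$$p_{K \mid Y}(k \mid y) \ne 0 \iff B(k) \cap \Gamma(y) \ne \emptyset \iff k \in \beta(\Gamma(y)),$$
then
$$\begin{aligned}
H(K \mid Y = y) &= -\sum_{k \in \beta(\Gamma(y))} p_{K \mid Y}(k \mid y) \log_2 p_{K \mid Y}(k \mid y) \\
&= -\sum_{k \in \beta(\Gamma(y))} \frac{|B(k) \cap \Gamma(y)|}{\sum_{u \in \Gamma(y)} (q - 1)^{w(k) - w(u)}} \log_2 \left( \frac{|B(k) \cap \Gamma(y)|}{\sum_{u \in \Gamma(y)} (q - 1)^{w(k) - w(u)}} \right) \le \log_2 |\beta(\Gamma(y))|,
\end{aligned}$$
because $\log_2 |\beta(\Gamma(y))|$ is the entropy of $K$ uniformly distributed over $\beta(\Gamma(y))$ for a given $y$. The upper estimate of $\log_2 |\beta(\Gamma(y))|$ also holds because, on the one hand, there are only $2^l$ possible values of the vector $k$ ($\in F_2^l$) and, on the other hand, equation (11) has $q^{l - \mathrm{rank}(\Pi_\tau(\widetilde{G}))}$ solutions. □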
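Let $B(k, y) = B(k) \cap \Gamma(y)$ and for $i \in \{0, \dots, l\}$ define $A_i(\Gamma(y)) = \sum_{u \in \Gamma(y)} (q - 1)^{i - w(u)}$. Then from (12) we get $p_{K \mid Y}(k \mid y) = \dfrac{|B(k, y)|}{A_{w(k)}(\Gamma(y))}$ and
$$\begin{aligned}
H(K \mid Y = y) &= -\sum_{k \in F_2^l} p_{K \mid Y}(k \mid y) \log_2 p_{K \mid Y}(k \mid y) = -\sum_{k \in F_2^l} \frac{|B(k, y)|}{A_{w(k)}(\Gamma(y))} \log_2 \left( \frac{|B(k, y)|}{A_{w(k)}(\Gamma(y))} \right) \\
&= -\sum_{i=0}^{l} \frac{1}{A_i(\Gamma(y))} \sum_{k \in F_2^l : w(k) = i} |B(k, y)| \left( \log_2 |B(k, y)| - \log_2 A_i(\Gamma(y)) \right) \\
&= -\sum_{i=0}^{l} \frac{1}{A_i(\Gamma(y))} \left( \sum_{k \in F_2^l : w(k) = i} |B(k, y)| \log_2 |B(k, y)| - \log_2 A_i(\Gamma(y)) \sum_{k \in F_2^l : w(k) = i} |B(k, y)| \right) \\
&= -\sum_{i=0}^{l} \frac{1}{A_i(\Gamma(y))} \left( \sum_{k \in F_2^l : w(k) = i} |B(k, y)| \log_2 |B(k, y)| - N_i(\Gamma(y)) \log_2 A_i(\Gamma(y)) \right),
\end{aligned}$$
where $N_i(\Gamma(y)) = |\{u \in \Gamma(y) : w(u) = i\}|$ and terms with $|B(k, y)| = 0$ are taken to be zero. Thus, if $q \ne 2$, then the calculation of $H(K \mid Y = y)$ and $I(K; Y)$ seems to be a hard challenge, because two or more different solutions of the system (11) can correspond to one binary vector $k$ if their supports are the same. The next theorem calculates $I(K; Y)$ for $q = 2$.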
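Theorem 2. Let $q = 2$ and $\tau \subseteq \underline{n}$. Then $I(K; Y) = \mathrm{rank}(\Pi_\tau(\widetilde{G}))$ for the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme.
Proof. As $q = 2$, we have $|B(k)| = 1$ and $\Gamma(y) = \beta(\Gamma(y))$, because different binary vectors have different supports. If $y$ is fixed, then using (12) from the proof of Lemma 1 for $k \in \Gamma(y)$ we have:
$$p_{K \mid Y}(k \mid y) = \frac{|B(k) \cap \Gamma(y)|}{\sum_{u \in \Gamma(y)} (2 - 1)^{w(k) - w(u)}} = \frac{1}{|\Gamma(y)|},$$
$$H(K \mid Y = y) = -\sum_{k \in \Gamma(y)} \frac{1}{|\Gamma(y)|} \log_2 \left( \frac{1}{|\Gamma(y)|} \right) = \log_2 |\Gamma(y)|.$$
As $|\Gamma(y)| = 2^{l - \mathrm{rank}(\Pi_\tau(\widetilde{G}))}$, then $H(K \mid Y) = H(K \mid Y = y) = l - \mathrm{rank}(\Pi_\tau(\widetilde{G}))$. The last step is the substitution of the value of $H(K \mid Y)$ into (10). □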
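Theorem 2 can be verified by brute force for a small binary code. The Python sketch below is an illustration under assumptions: the $[7, 4, 3]$ Hamming generator matrix stands in for $\widetilde{G}$, the function names are hypothetical, and coordinates are 0-based. It computes $I(K; Y)$ directly from the counts $|\Gamma(y)|$ and compares it with the rank of the projected matrix for every set $\tau$.

```python
from collections import Counter
from itertools import combinations, product
from math import log2

G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def rank_gf2(rows):
    """Rank over F_2 of a list of 0/1 rows (Gaussian elimination)."""
    rows = [r[:] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def mutual_information(G, tau):
    """I(K; Y) for uniform K in F_2^l and Y = Pi_tau(K G), with q = 2."""
    l = len(G)
    G_tau = [[row[j] for j in tau] for row in G]
    counts = Counter(
        tuple(sum(k[i] * G_tau[i][j] for i in range(l)) % 2
              for j in range(len(tau)))
        for k in product([0, 1], repeat=l))
    # H(K|Y) = sum_y Pr(Y = y) * log2 |Gamma(y)|, where |Gamma(y)| = counts[y]
    h_cond = sum(c / 2**l * log2(c) for c in counts.values())
    return l - h_cond

for size in range(len(G[0]) + 1):
    for tau in combinations(range(len(G[0])), size):
        G_tau = [[row[j] for j in tau] for row in G]
        assert abs(mutual_information(G, tau) - rank_gf2(G_tau)) < 1e-9
print("I(K;Y) equals rank(Pi_tau(G)) for every tau")
```

The check is exhaustive only for this toy code; it illustrates the statement rather than replacing the proof.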
It follows from (10) and Lemma 1 that obtaining the information $I(K; (Z_1, \dots, Z_\theta, Y = y)) = I(K; Y = y)$ is closely related to constructing $\Gamma(y)$ as the set of solutions of the system (11). As $|\Gamma(y)| = q^{l - \mathrm{rank}(\Pi_\tau(\widetilde{G}))}$, this task can be made hard by an appropriate selection of the parameters of the scheme. In general, if the $(\mathcal{G}(\widetilde{G})_{h_1, h_2}, \theta)$-scheme is used, then
$$I(K; Y = y) \ge \log_2 \left| \mathcal{G}(\widetilde{G})_{h_1, h_2} \right| - \log_2 \left| \beta(\Gamma(y)) \cap \{k \in F_2^l : h_1 \le w(k) \le h_2\} \right|.$$
The complexity of obtaining this information seems to be equivalent to the complexity of constructing the set
$$\beta(\Gamma(y)) \cap \{k \in F_2^l : h_1 \le w(k) \le h_2\}.$$
To construct this set it is necessary either to construct the set $\Gamma(y)$ or to look over all possible vectors from $F_q^l$, keep only the vectors with weight in the range $[h_1, h_2]$, and check whether these vectors are solutions of the system (11). Furthermore, if $|\mathcal{G}(\widetilde{G})_{h_1, h_2}|$ is small, the eavesdropper is able to check all functions from $\mathcal{G}(\widetilde{G})_{h_1, h_2}$. Thus, the computational complexity of obtaining information about a package of informational blocks (when Hypothesis 1 holds) is not less than
$$O\left( \min\left\{ \left| \mathcal{G}(\widetilde{G})_{h_1, h_2} \right|, |\Gamma(y)| \right\} \right), \tag{13}$$
where $O(|\mathcal{G}(\widetilde{G})_{h_1, h_2}|)$ is the complexity of brute force over all functions from $\mathcal{G}(\widetilde{G})_{h_1, h_2}$ and $O(|\Gamma(y)|)$ is the complexity of constructing the set $\Gamma(y)$.
Note that for $q = 2$ obtaining full information even about the length of the informational blocks from package (7) is a severe challenge. Consider the general case when the $(\mathcal{G}(\widetilde{G})_{h_1, h_2}, \theta)$-scheme is used. The eavesdropper will get non-zero information about the length only if there is at least one number $l' \in \{h_1, \dots, h_2\}$ such that there is no vector of weight $l'$ in $\Gamma(y)$. The complexity of obtaining information about the length under the conditions $\mu < l$ and $\mathrm{rank}(\Pi_\tau(\widetilde{G})) = \mu$ may be reduced to a problem of coding theory. Namely, the matrix $\Pi_\tau(\widetilde{G})$ may be considered as the transposed parity-check matrix of some $[l, l - \mu]$-code. Let a vector $y$, a number $w \in \{h_1, \dots, h_2\}$, and a transposed parity-check matrix $\Pi_\tau(\widetilde{G})$ be given. The task of finding a vector $u$ of weight no more than $w$ satisfying (11) is NP-complete [12]. If the $(\mathcal{G}(\widetilde{G})_{h_1, h_2}, \theta)$-scheme is used, then obtaining non-zero information about the length of the message is equivalent to finding out that there is no vector $u$ of weight exactly $w$ satisfying (11). We do not know any polynomial algorithm for solving the latter task. Note that in the binary case this problem is also NP-complete [13].
Thus, under Hypothesis 1 the resistance of the modified code noising method to known attacks, particularly to the repetitive interception attack, is based on the fact that it may be difficult (depending on the parameters) for the coalition to obtain information about the mapping used. It should be noted that, to increase resistance to the repetitive interception attack, it is also recommended to use a small value of the parameter $\theta$. In this case the probability that several code blocks corresponding to one informational block are produced with the same mapping is reduced. If $\theta = 1$, then the level of defense is maximal in this sense, but the code rate $R_\theta = 0.25R$ is minimal.
We assumed above that the number of repositories is equal to the length of the code block. This is impractical if the code block is long. However, the proposed $(\mathcal{G}(\widetilde{G}), \theta)$-scheme may be easily adapted to a smaller number of repositories. If $N$ is the length of the code block and $n$ is the number of repositories ($n < N$), then we should write no more than $\lceil N/n \rceil$ code symbols into every repository. In this case a coalition of $\mu$ repositories knows no more than $\mu \lceil N/n \rceil$ symbols of every code block, and inaccessibility of $n - \nu$ repositories is equivalent to the erasure of $(n - \nu) \lceil N/n \rceil$ symbols of every code block.
2.3. An example of $(\mathcal{G}(\widetilde{G}), \theta)$-scheme application
Let $\widetilde{C}$ be a $[255, 200, 56]$ Reed-Solomon code over $F_{2^8}$, $q = 2^8$. The table compares the characteristics of the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme and of the classical code noising method based on the pair $(\widetilde{C}, C)$, where $C$ is a $[255, 150, 106]$ Reed-Solomon code. The code rate of the pair-based code noising method is $50/255 \approx 0.196$, and information-theoretic resistance is achieved if the coalition knows no more than 150 symbols of a code block [1]. For the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme the code rate is equal to 0.196 if $\theta = 1$ and is about 0.392 if $\theta > 1000$.
The maximal allowable size $\mu$ of a coalition for the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme is calculated on the condition that complexity (13) should be not less than $2^{128}$; this corresponds to a high level of resistance according to [14]. Note that estimate (13) for this example takes the form $O(\min\{2^{200}, |\Gamma(y)|\})$. For any different sets $\tau_1$ and $\tau_2$ of the same cardinality and for a generator matrix of a Reed-Solomon code we have $\mathrm{rank}(\Pi_{\tau_1}(\widetilde{G})) = \mathrm{rank}(\Pi_{\tau_2}(\widetilde{G}))$. Then, for security reasons, the maximal allowed number $x$ of symbols observed by the coalition in each code block can be obtained from the inequality $2^{8(200 - x)} \ge 2^{128}$; so we have $x \le 184$. Note that each repository knows only $\lceil N/n \rceil$ symbols of each code block, where $N = 255$ is the length of the code block and $n$ is the number of repositories, $n \in \{3, 5, 17\}$. As we can see from the table, the modified method provides defense against a bigger coalition. In particular, with three repositories the modified code noising method provides computational resistance even when two of the three participants have united in a coalition. At the same time, the classical code noising method provides resistance only against a coalition consisting of one participant.

Table. Comparison of characteristics of the $(\mathcal{G}(\widetilde{G}), \theta)$-scheme and the $(\widetilde{C}, C)$-pair

| Number of repositories, $n$ | 3 | 5 | 17 |
|---|---|---|---|
| Length of a part, $\lceil N/n \rceil$ | 85 | 51 | 15 |
| Max. value of $n - \nu$ | 0 | 1 | 3 |
| Max. value of $\mu$ for $(\mathcal{G}(\widetilde{G}), \theta)$ | 2 | 3 | 12 |
| Max. value of $\mu$ for $(\widetilde{C}, C)$ | 1 | 2 | 10 |
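The entries of the comparison table follow from integer arithmetic on the parameters of this example. The Python sketch below recomputes them; the function name `table_row` and the variable names are illustrative assumptions, while the numbers $N = 255$, $l = 200$, $\tilde{d} = 56$, $l - k = 150$ and the bound $x \le 184$ come from the text.

```python
from math import ceil

N, l, d = 255, 200, 56     # the [255, 200, 56] Reed-Solomon code over F_{2^8}
l_minus_k = 150            # dimension of the [255, 150, 106] code C
x_max = l - 128 // 8       # largest x with 2^(8(200 - x)) >= 2^128, i.e. 184

def table_row(n):
    """Part length, max failed nodes, max coalition (scheme, classical)."""
    part = ceil(N / n)                 # symbols stored per repository
    max_failed = (d - 1) // part       # erasures must stay within d - 1
    mu_scheme = x_max // part          # observed symbols must stay within 184
    mu_classic = l_minus_k // part     # observed symbols must stay within l - k
    return part, max_failed, mu_scheme, mu_classic

for n in (3, 5, 17):
    print(n, table_row(n))
```

Each bound is a floor division because a whole repository either fails or joins the coalition, contributing all of its $\lceil N/n \rceil$ symbols at once.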
CONCLUSION
Usually the resistance of modern methods of data confidentiality protection rests on a certain mathematical problem that is computationally hard to solve if a particular "secret" is unknown. In this paper, a non-cryptographic method for protecting data confidentiality is constructed based on special data coding and on the distribution of parts of the encoded data among the nodes of the distributed storage. In this case the "secret" is replaced with the assumption that the observer (i.e. a coalition of nodes) cannot get data from all nodes of the distributed storage. The paper shows that the complexity of recovering the protected data by a coalition is not less than the complexity of solving the coding theory problem of finding all weights of vectors with a given syndrome. The computations lead to the conclusion that the constructed method can provide protection against coalitions of larger cardinality than the classical code noising method, and provides no less protection against the failure of storage nodes.
References
1. Subramanian A., McLaughlin S. W. MDS codes on the erasure-erasure wiretap channel. arXiv:0902.3286 [cs.IT], 2009.
2. Korzhik V., Yakovlev V. Nonasymptotic estimates of information protection efficiency for the wire-tap channel concept. In: Seberry J., Zheng Y. (eds.). Advances in Cryptology — AUSCRYPT '92. AUSCRYPT 1992. Lecture Notes in Computer Science, 1993, vol. 718, pp. 183-195. DOI: https://doi.org/10.1007/3-540-57220-1_61
3. Ozarow L. H., Wyner A. D. Wire-Tap Channel II. In: Beth T., Cot N., Ingemarsson I. (eds.). Advances in Cryptology. EUROCRYPT 1984. Lecture Notes in Computer Science, 1984, vol. 209, pp. 33-55. DOI: https://doi.org/10.1007/3-540-39757-4_5
4. Wei V. K. Generalized Hamming Weights for Linear Codes. IEEE Trans. Inform. Theory, 1991, vol. 37, no. 5, pp. 1412-1418. DOI: https://doi.org/10.1109/18.135655
5. Forney G. D. Dimension/Length Profiles and Trellis Complexity of Linear Block Codes. IEEE Trans. Inform. Theory, 1994, vol. 40, no. 6, pp. 1741-1752. DOI: https://doi.org/10.1109/18.340452
6. Luo Y., Mitrpant C., Han Vinck A. J., Chen K. Some new characters on the wire-tap channel of type II. IEEE Trans. Inform. Theory, 2005, vol. 51, no. 3, pp. 1222-1229. DOI: https://doi.org/10.1109/TIT.2004.842763
7. Hu P., Sung C. W., Ho S.-W., Chan T. H. Optimal Coding and Allocation for Perfect Secrecy in Multiple Clouds. IEEE Transactions on Information Forensics and Security, 2016, vol. 11, no. 2, pp. 388-399. DOI: https://doi.org/10.1109/TIFS.2015.2500193
8. Kosolapov Yu. V. Codes for a generalized wire-tap channel model. Problems of Information Transmission, 2015, vol. 51, no. 1, pp. 20-24. DOI: https://doi.org/10.1134/S0032946015010020
9. Kosolapov Yu. V., Pozdnyakov A. V. Evaluation of resistance of code noising in the distributed data storage. Systems and Means of Informatics, 2015, vol. 25, no. 4, pp. 158-174 (in Russian). DOI: https://doi.org/10.14357/08696527150412
10. Gazaryan Yu. O., Kosolapov Yu. V. On the experimental estimation of the lower bound for the maximum number of messages in a scheme aimed at data protection against spoofing. Computational Technologies, 2015, vol. 20, no. 6, pp. 5-21 (in Russian).
11. Bellare M., Tessaro S., Vardy A. A Cryptographic Treatment of the Wiretap Channel. arXiv:1201.2205 [cs.IT], 2012.
12. Barg S. Some new NP-complete coding problems. Problems of Information Transmission, 1994, vol. 30, no. 3, pp. 209-214.
13. Sendrier N., Simos D. E. The Hardness of Code Equivalence over Fq and Its Application to Code-Based Cryptography. In: Gaborit P. (eds.). Post-Quantum Cryptography. PQCrypto 2013. Lecture Notes in Computer Science, 2013, vol. 7932, pp. 203-216.
14. Lenstra A. K., Verheul E. R. Selecting Cryptographic Key Sizes. J. Cryptology, 2001, vol. 14, pp. 255-293. DOI: https://doi.org/10.1007/s00145-001-0009-4
Cite this article as:
Kosolapov Yu. V., Pevnev F. S. A Method of Protected Distribution of Data Among Unreliable and Untrusted Nodes. Izv. Saratov Univ. (N. S.), Ser. Math. Mech. Inform., 2019, vol. 19, iss. 3, pp. 326-337. DOI: https://doi.org/10.18500/1816-9791-2019-19-3-326-337
UDC 621.391.7
A Method of Protected Distribution of Data Among Unreliable and Untrusted Nodes
Yu. V. Kosolapov, F. S. Pevnev
Yury V. Kosolapov, Candidate of Technical Sciences, Associate Professor, Institute of Mathematics, Mechanics, and Computer Science named after I. I. Vorovich, Southern Federal University, 8a Milchakova St., Rostov-on-Don 344090, Russia, [email protected]
Fedor S. Pevnev, Master's student, Institute of Mathematics, Mechanics, and Computer Science named after I. I. Vorovich, Southern Federal University, 8a Milchakova St., Rostov-on-Don 344090, Russia, [email protected]