UDC 579.519.6
doi 10.18101/2306-2363-2016-4-17-25
© D. Batkhishig, B. Mijiddorj, P. Enkhbayar
HELICAL PARAMETERS OF REGULAR π-HELICES IN PROTEINS
(Part 2)
The α-helix, 3₁₀-helix, π-helix and ω-helix have been observed in protein structures, accounting for 32%, 4%, 0.3% and 0.2% of residues, respectively. However, these percentages depend on the resolution of the solved structures and on the method used to assign secondary structure. A culled Protein Data Bank (PDB) data set from May 2016, containing 2901 protein chains with less than 25% sequence identity and resolution < 1.6 Å (R-value < 0.25), was used in this analysis. π-Helices were assigned by DSSP, STRIDE and SECSTR. The helical parameters of the π-helices (pitch, residues per turn, radius, handedness and ρ = RMSD/(N-1)^(1/2)) were determined with the HELFIT program. The ρ value estimates helical regularity, and all π-helices with ρ < 0.10 Å were classified as regular. The helical parameters of protein π-helices are compared with those of the canonical π-helix and of other types of protein helices.
Keywords: 3₁₀-helix, α-helix, π-helix, helical parameters, regular helix, protein structures, protein chains.
Introduction
The helix is one of the two main types of secondary structure in proteins. Helices are usually designated as iₙ, where i is the number of residues per turn and n is the number of atoms in the ring closed by the backbone hydrogen bond [1]. Pauling and Corey first proposed the α-helix (3.6₁₃) and the γ-helix (5.1₁₇) [2]. Donohue later considered other possible helices (2.2₇, 3₁₀, 4.3₁₄ and 4.4₁₆) [1]. Low and Baybutt also proposed the 4.4₁₆-helix, or π-helix [3]. The main stabilizing factor of helical structures in polypeptides is repeated hydrogen bonding between main-chain carbonyl oxygen (C=O) and amide hydrogen (N-H) groups: the α-helix is characterized by an (i → i+4) pattern, and the 3₁₀- and π-helices by repeating (i → i+3) and (i → i+5) hydrogen bonds, respectively [4].
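The iₙ designations and the hydrogen-bond patterns above are linked by simple counting. An (i → i+k) backbone hydrogen bond closes a ring made of the C and O atoms of residue i, the N, Cα and C atoms of each of the k - 1 intervening residues, and the N and H atoms of residue i+k, i.e. 3k + 1 atoms. The short Python sketch below (the function and dictionary names are illustrative, not taken from any of the cited programs) reproduces the ring sizes in the 3₁₀, α (3.6₁₃) and π (4.4₁₆) designations:

```python
def ring_size(k: int) -> int:
    """Number of atoms in the ring closed by an (i -> i+k) C=O...H-N bond.

    Counts C and O of residue i, the N, CA and C atoms of each of the
    k - 1 intervening residues, and N and H of residue i+k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    return 2 + 3 * (k - 1) + 2


# Hydrogen-bond spacing k of the three helix types discussed in the text.
HELIX_SPACING = {"3_10-helix": 3, "alpha-helix": 4, "pi-helix": 5}

for name, k in HELIX_SPACING.items():
    print(f"{name}: (i -> i+{k}) hydrogen bonds, {ring_size(k)}-atom ring")
```

The same count gives 7 atoms for k = 2, matching the 2.2₇ designation.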
Several programs assign secondary structures from the three-dimensional (3D) atomic coordinates of proteins [4-6]. Of these, DSSP [4] and STRIDE [5] are the most widely used [7]. DSSP identifies helices from repeating (i → i+n) hydrogen bonds, with n = 3, 4 and 5 for 3₁₀-, α- and π-helices, respectively [4, 8]. STRIDE uses both hydrogen bonds and main-chain dihedral angles to define secondary structures [5]. DSSP identified only 9 unique π-helices in a database of more than 6000 proteins [9]. Fodje and Al-Karadaghi identified 116 π-helices in a database of 932 high-resolution protein structures using SECSTR, a program they developed in-house [7].
These different results can be explained by two factors: (1) the number of solved 3D structures was still insufficient at that time, and (2) secondary structure assignment programs use different methods.
In this work, we determined the helical parameters of protein π-helices with the HELFIT program and compared them with those of the canonical π-helix.
Materials and Methods
Composition of database
The 16 May 2016 culled PDB data set, containing 2969 protein chains with less than 20% sequence identity and resolution < 1.6 Å (R-value < 0.25), was used in this analysis.
DSSP program
DSSP assigns a hydrogen bond between the C=O group of residue i and the N-H group of residue i+n (i → i+n) when the bonding energy E < -0.5 kcal/mol. The optimal energy of a main-chain N-H···O hydrogen bond is about -3 kcal/mol. The hydrogen bond energy depends on both the electrostatic interaction of the N-H···O atoms and the hydrogen bond angle θ [4].
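The energy behind this criterion comes from the DSSP point-charge model of Kabsch and Sander, E = 0.42 × 0.20 × 332 × (1/r(ON) + 1/r(CH) - 1/r(OH) - 1/r(CN)) kcal/mol with distances in Å. The Python sketch below evaluates that expression and then applies the DSSP minimal-helix rule, in which two consecutive n-turns starting at residues i-1 and i make residues i to i+n-1 helical. It is a simplified sketch: DSSP itself also derives the amide H position from the backbone and handles further details (for example, a lower bound on the energy) that are omitted here, and the function names are illustrative.

```python
import math
from typing import Iterable, Sequence, Set

Point = Sequence[float]  # (x, y, z) in angstroms

# Kabsch-Sander point-charge model: q1 = 0.42e on C=O, q2 = 0.20e on N-H,
# dimensional factor f = 332, giving energies in kcal/mol.
COUPLING = 0.42 * 0.20 * 332.0
HBOND_CUTOFF = -0.5  # kcal/mol


def dssp_hbond_energy(c_i: Point, o_i: Point, n_j: Point, h_j: Point) -> float:
    """Electrostatic energy of a C=O(i) ... H-N(j) hydrogen bond in kcal/mol."""
    d = math.dist
    return COUPLING * (1 / d(o_i, n_j) + 1 / d(c_i, h_j)
                       - 1 / d(o_i, h_j) - 1 / d(c_i, n_j))


def helix_residues(turn_starts: Iterable[int], n: int) -> Set[int]:
    """Residues assigned to an n-helix by the minimal-helix rule.

    turn_starts holds every residue i whose C=O is bonded to the N-H of
    residue i+n (an n-turn at i). Consecutive n-turns at i-1 and i mark
    residues i .. i+n-1 as helical.
    """
    turns = set(turn_starts)
    helix: Set[int] = set()
    for i in turns:
        if i - 1 in turns:
            helix.update(range(i, i + n))
    return helix


# Idealized linear N-H...O=C geometry: H...O 1.9 A, N...O 2.9 A, C=O 1.23 A.
energy = dssp_hbond_energy(c_i=(4.13, 0.0, 0.0), o_i=(2.9, 0.0, 0.0),
                           n_j=(0.0, 0.0, 0.0), h_j=(1.0, 0.0, 0.0))
print(f"E = {energy:.2f} kcal/mol, bonded: {energy < HBOND_CUTOFF}")

# Three consecutive 5-turns (pi-helix spacing) starting at residues 10-12.
print(sorted(helix_residues([10, 11, 12], n=5)))
```

For this idealized geometry the energy is close to the optimal value quoted above, and three consecutive 5-turns yield a six-residue π-helical segment.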
STRIDE program
The STRIDE program assigns protein secondary structure from 3D atomic coordinates using both hydrogen bond energy and statistically derived backbone torsion angle information [7]. The hydrogen bond energy Ehb is calculated with an empirical energy function derived from the analysis of experimental hydrogen bond geometries in crystal structures of amino acids and polypeptide chains [10].
SECSTR program
SECSTR is a new addition to the DSSP program dedicated to identifying π-helices, which were seldom assigned by older versions of DSSP and STRIDE [7]. Secondary structure assignment methods based on hydrogen bonds (DSSP, STRIDE and SECSTR) produce nearly identical assignments, with more than 90% agreement [6].
HELFIT program
HELFIT calculates all five helix parameters simultaneously with high accuracy. The minimum number of data points required for the analysis is only four. HELFIT also calculates the parameter ρ = RMSD/(N-1)^(1/2), which estimates the regularity of a helical structure independently of the number of data points; here RMSD is the root-mean-square distance from the best-fit helix to the data points and N is the number of data points [11].
Results and Discussion
We identified 27, 22 and 340 π-helices in 2901 high-resolution protein structures with DSSP, STRIDE and SECSTR, respectively. The π-helices were divided into two groups by their ρ value: regular (ρ < 0.10 Å) and irregular (ρ > 0.10 Å). HELFIT classified 7 of 27, 5 of 22 and 76 of 340 helices as regular. To compare protein π-helices with the canonical π-helix, only the parameters of regular π-helices were used in the further analysis (Table 1).
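The regularity criterion amounts to a normalization followed by a threshold. HELFIT obtains the RMSD itself by total least-squares helix fitting [11]; the Python sketch below (illustrative names, hypothetical RMSD values) takes an RMSD as given, converts it to ρ, applies the 0.10 Å cut-off, and recomputes the fraction of regular π-helices from the counts reported above. Its length check only prevents division by zero; HELFIT itself needs at least four points.

```python
import math

RHO_CUTOFF = 0.10  # angstrom; helices with rho below this are called regular here


def rho(rmsd: float, n_points: int) -> float:
    """Regularity measure rho = RMSD / (N - 1)**(1/2), in angstroms."""
    if n_points < 2:
        raise ValueError("need at least two data points")
    return rmsd / math.sqrt(n_points - 1)


def is_regular(rmsd: float, n_points: int) -> bool:
    """Classify a fitted helix as regular using the rho cut-off."""
    return rho(rmsd, n_points) < RHO_CUTOFF


# Hypothetical fits of a 7-point helix: RMSD 0.20 A is regular, 0.30 A is not.
print(is_regular(0.20, 7), is_regular(0.30, 7))

# (regular, total) pi-helix counts reported for each assignment program.
counts = {"DSSP": (7, 27), "STRIDE": (5, 22), "SECSTR": (76, 340)}
for program, (regular, total) in counts.items():
    print(f"{program}: {regular}/{total} = {100 * regular / total:.1f}% regular")

regular_all = sum(r for r, _ in counts.values())
total_all = sum(t for _, t in counts.values())
print(f"All programs: {regular_all}/{total_all} = {100 * regular_all / total_all:.1f}%")
```

With these counts, 88 of the 389 assigned π-helices (about 22.6%) pass the regularity cut-off.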
Table 1
Helical parameters of 86 regular π-helices in proteins identified by the DSSP, STRIDE and SECSTR programs
PDB ID  Chain_Position  P (Å)  n     Δz (Å)ᵃ  r (Å)  Vc (Å³)ᵃ  ρ (Å)  Program
1DJ0    A_81-87         5.01   4.18  1.20     2.58   25.06     0.10   SECSTR
1DK8    A_242-249       5.12   4.36  1.17     2.69   26.70     0.10   SECSTR
1ELK    A_95-101        5.24   4.42  1.19     2.70   27.15     0.10   SECSTR
1JET    A_301-308       4.82   4.44  1.09     2.80   26.74     0.09   SECSTR
1KJQ    A_119-125       5.10   4.30  1.19     2.67   26.56     0.09   SECSTR
1KKO    A_199-205       4.99   4.53  1.10     2.81   27.33     0.09   DSSP, STRIDE, SECSTR
1NUY    A_1276-1282     5.30   4.64  1.14     2.87   29.56     0.10   SECSTR
1RK6    A_386-393       5.30   4.47  1.19     2.80   29.20     0.06   DSSP
1RK6    A_387-393       5.14   4.41  1.17     2.74   27.49     0.04   STRIDE
1RK6    A_384-393       5.22   4.37  1.19     2.71   27.56     0.06   SECSTR
1W5R    A_58-64         5.17   4.37  1.18     2.73   27.70     0.08   SECSTR
1XG0    A_105-111       5.32   4.55  1.17     2.84   29.63     0.10   SECSTR
1XGK    A_266-272       5.02   4.31  1.16     2.70   26.67     0.07   SECSTR
2BFD    A_109-115       5.20   4.50  1.16     2.80   28.46     0.10   SECSTR
2CI1    A_51-57         5.12   4.33  1.18     2.68   26.68     0.09   SECSTR
2DPL    A_68-74         5.17   4.42  1.17     2.77   28.20     0.03   SECSTR
2GZS    A_163-169       5.31   4.53  1.17     2.83   29.49     0.09   SECSTR
2H1V    A_264-274       5.15   4.21  1.22     2.62   26.38     0.09   SECSTR
2JIS    A_28-35         5.15   4.42  1.17     2.75   27.68     0.07   SECSTR
200A    A_424-430       5.13   4.48  1.15     2.79   28.00     0.08   SECSTR
2P51    A_207-213       5.15   4.32  1.19     2.71   27.51     0.08   SECSTR
2P6W    A_154-160       5.18   4.38  1.18     2.73   27.69     0.08   SECSTR
2PBD    A_88-94         5.24   4.42  1.19     2.77   28.58     0.09   SECSTR
2POF    A_37-43         5.29   4.52  1.17     2.81   29.03     0.10   SECSTR
2PYQ    B_61-67         5.17   4.39  1.18     2.76   28.18     0.02   DSSP
2PYX    A_232-239       5.09   4.42  1.15     2.75   27.36     0.06   DSSP
2PYX    A_232-238       5.06   4.42  1.14     2.78   27.80     0.05   STRIDE
2PYX    A_232-239       5.08   4.42  1.15     2.75   27.31     0.06   SECSTR
2RBK    A_122-129       5.25   4.49  1.17     2.77   28.19     0.09   SECSTR
2VLA    A_68-77         5.27   4.00  1.32     2.79   32.22     0.09   SECSTR
2WQF    A_59-65         5.36   4.55  1.18     2.81   29.22     0.10   SECSTR
2XRY    A_300-306       5.20   4.34  1.20     2.70   27.44     0.08   SECSTR
2Y53    A_48-54         5.11   4.23  1.21     2.62   26.05     0.10   SECSTR
3A0Y    A_723-729       5.11   4.32  1.18     2.70   27.09     0.06   SECSTR
3BHQ    A_128-134       5.07   4.40  1.15     2.77   27.78     0.08   SECSTR
3H9C    A_382-391       5.28   4.44  1.19     2.75   28.25     0.09   SECSTR
3IT3    A_56-63         5.16   4.45  1.16     2.80   28.56     0.04   SECSTR
3OAJ    A_24-30         4.97   4.29  1.16     2.72   26.93     0.07   SECSTR
3OCJ    A_253-259       5.13   4.62  1.11     2.88   28.93     0.10   SECSTR
3OYV    A_227-233       5.49   4.41  1.24     2.74   29.36     0.08   SECSTR
3PB6    X_93-99         5.25   4.54  1.16     2.83   29.10     0.06   SECSTR
3PJP    A_1334-1340     5.18   4.44  1.17     2.77   28.12     0.10   SECSTR
3Q28    A_280-286       5.30   4.50  1.18     2.81   29.22     0.08   SECSTR
3RRI    A_22-28         5.17   4.43  1.17     2.77   28.13     0.06   SECSTR
3S5M    A_692-698       5.24   4.45  1.18     2.76   28.18     0.09   SECSTR
3T4L    A_168-174       5.29   4.45  1.19     2.77   28.66     0.05   SECSTR
3VEN    A_437-443       5.14   4.41  1.17     2.75   27.69     0.10   SECSTR
3WA2    X_297-303       5.04   4.25  1.19     2.68   26.76     0.10   DSSP
3ZBO    A_94-100        5.22   4.50  1.16     2.82   28.98     0.08   SECSTR
4AYO    A_122-128       5.18   4.28  1.21     2.65   26.70     0.10   SECSTR
4B1Y    B_88-94         5.22   4.42  1.18     2.76   28.26     0.09   SECSTR
4BRC    A_359-365       5.06   4.35  1.16     2.76   27.84     0.05   SECSTR
4CBU    A_89-95         5.22   5.07  1.03     2.72   23.93     0.06   SECSTR
4CD5    A_248-254       5.11   4.38  1.17     2.75   27.72     0.07   SECSTR
4CD5    A_350-356       5.25   4.55  1.15     2.81   28.62     0.09   SECSTR
4DJA    A_305-311       4.99   4.32  1.16     2.73   27.05     0.09   SECSTR
4DJA    A_405-412       5.10   4.39  1.16     2.75   27.60     0.10   SECSTR
4ESM    A_137-143       5.36   4.14  1.29     2.59   27.28     0.07   SECSTR
4EZI    A_128-135       5.02   4.35  1.15     2.72   26.82     0.09   SECSTR
4GVF    A_231-239       5.19   4.46  1.16     2.76   27.85     0.06   DSSP
4GVF    A_232-238       5.14   4.46  1.15     2.80   28.39     0.07   STRIDE
4GVF    A_229-239       5.16   4.44  1.16     2.77   28.01     0.05   SECSTR
4I3G    A_257-264       5.13   4.36  1.18     2.73   27.55     0.08   DSSP
4I3G    A_257-263       5.12   4.40  1.16     2.74   27.45     0.09   STRIDE
4I3G    A_253-264       5.21   4.38  1.19     2.71   27.44     0.09   SECSTR
4JA8    A_66-72         5.29   4.53  1.17     2.85   29.80     0.10   SECSTR
4LRT    A_267-273       5.28   4.46  1.18     2.78   28.74     0.09   SECSTR
4ME2    A_192-198       4.86   4.47  1.09     2.81   26.97     0.09   SECSTR
4QB3    A_66-72         5.08   4.53  1.12     2.84   28.42     0.09   SECSTR
4R75    A_311-318       5.11   4.38  1.17     2.74   27.52     0.09   SECSTR
4U9H    L_127-133       5.07   4.54  1.12     2.85   28.50     0.07   SECSTR
4W7L    A_373-379       5.23   4.53  1.15     2.81   28.64     0.08   SECSTR
4WRI    A_65-71         5.18   4.38  1.18     2.73   27.69     0.06   SECSTR
4XEM    A_120-126       5.15   4.33  1.19     2.69   27.04     0.08   SECSTR
4XFJ    A_68-74         5.30   4.45  1.19     2.74   28.09     0.10   SECSTR
4XQ7    A_217-223       5.17   4.49  1.15     2.79   28.16     0.09   SECSTR
4Z5S    A_108-115       5.22   4.40  1.19     2.73   27.78     0.09   SECSTR
4ZGW    A_115-121       5.29   4.53  1.17     2.80   28.76     0.10   SECSTR
5A0Y    A_314-324       5.09   4.46  1.14     2.77   27.51     0.10   SECSTR
5AZB    A_203-210       5.15   4.41  1.17     2.74   27.54     0.09   SECSTR
5BSR    A_240-247       5.12   4.33  1.18     2.68   26.68     0.10   SECSTR
5DAW    A_89-95         5.32   4.37  1.22     2.72   28.30     0.09   SECSTR
5DP2    A_143-149       5.18   4.44  1.17     2.78   28.33     0.06   SECSTR
5E8X    A_442-448       5.09   4.35  1.17     2.74   27.60     0.08   SECSTR
5EJ8    A_485-491       5.22   4.55  1.15     2.81   28.46     0.10   SECSTR
5HZ7    A_280-286       5.25   4.49  1.17     2.80   28.80     0.08   SECSTR
Average                 5.17±0.11  4.42±0.13  1.17±0.04  2.75±0.06  27.89±1.09  0.08±0.02
Canonical π-helix       5.16       4.40       1.15       2.68       25.9        -
ᵃ Δz is the helix rise per residue, Δz = P/n; Vc is the Voronoi volume, Vc = π r² Δz.
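The two derived columns of Table 1 follow from the fitted parameters through the footnote formulas, Δz = P/n and Vc = π r² Δz. A minimal Python check (the function names are illustrative) on the first row, 1DJ0 A_81-87:

```python
import math


def rise_per_residue(pitch: float, residues_per_turn: float) -> float:
    """Helix rise per residue, dz = P / n, in angstroms."""
    return pitch / residues_per_turn


def voronoi_volume(pitch: float, residues_per_turn: float, radius: float) -> float:
    """Volume per residue, Vc = pi * r**2 * dz, in cubic angstroms."""
    return math.pi * radius ** 2 * rise_per_residue(pitch, residues_per_turn)


# First row of Table 1 (1DJ0, A_81-87): P = 5.01 A, n = 4.18, r = 2.58 A.
dz = rise_per_residue(5.01, 4.18)
vc = voronoi_volume(5.01, 4.18, 2.58)
print(f"dz = {dz:.2f} A, Vc = {vc:.2f} A^3")  # prints: dz = 1.20 A, Vc = 25.06 A^3
```

Both values agree with the tabulated Δz = 1.20 Å and Vc = 25.06 Å³.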
In total, 88 regular π-helices were identified: 7 by DSSP, 5 by STRIDE and 76 by SECSTR. The π-helix at positions 199-205 of chain A in 1KKO is identified by all three programs, so Table 1 contains 86 distinct entries [12-18].
The average radius (2.75 Å) and Voronoi volume (27.89 Å³) of the protein π-helices are larger than those of the canonical π-helix (2.68 Å and 25.9 Å³), whereas the other helical parameters are close to the canonical values. The average helix length is 7.47 residues, and lengths range from 7 to 12 residues (Table 2).
Table 2
Averages and standard deviations of helical parameters for regular π-helices in proteins
Average              <P> (Å)    <n>        <Δz> (Å)   <r> (Å)    <Vc> (Å³)   <ρ> (Å)
π-helices (DSSP)     5.13±0.10  4.41±0.09  1.16±0.03  2.76±0.04  27.75±0.78  0.07±0.03
π-helices (STRIDE)   5.09±0.06  4.44±0.05  1.15±0.02  2.77±0.03  27.69±0.38  0.07±0.02
π-helices (SECSTR)   5.17±0.11  4.42±0.13  1.17±0.04  2.75±0.06  27.90±1.13  0.08±0.02
The standard deviations of the helical parameters are larger for π-helices identified by SECSTR than for those identified by DSSP and STRIDE. The average helix radius r and number of residues per turn n are nearly the same for all three programs.
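The entries of Table 2 are means with standard deviations over the helices found by each program. As a check, the Python sketch below recomputes <P> and <n> for the seven DSSP helices from their Table 1 values. It assumes the sample standard deviation (denominator N - 1); with that choice both results round to the DSSP row of Table 2, whereas the population standard deviation would give 0.08 rather than 0.09 for n.

```python
import statistics

# Pitch P (angstrom) and residues per turn n of the seven regular pi-helices
# identified by DSSP in Table 1: 1KKO, 1RK6, 2PYQ, 2PYX, 3WA2, 4GVF, 4I3G.
dssp_pitch = [4.99, 5.30, 5.17, 5.09, 5.04, 5.19, 5.13]
dssp_n = [4.53, 4.47, 4.39, 4.42, 4.25, 4.46, 4.36]


def mean_sd(values):
    """Mean and sample standard deviation (N - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


for label, values in (("<P>", dssp_pitch), ("<n>", dssp_n)):
    m, sd = mean_sd(values)
    print(f"{label} = {m:.2f} +/- {sd:.2f}")
```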
[Figure: Ramachandran plots of regular π-helices, panels A) DSSP, B) STRIDE, C) SECSTR; φ (abscissa) and ψ (ordinate) from -180° to 180°]
Fig. Ramachandran map of regular π-helices in proteins. Panels A), B) and C) show the φ, ψ angles of regular π-helices identified by DSSP, STRIDE and SECSTR, respectively. The abscissa is φ and the ordinate is ψ. The φ, ψ angles of the N-cap (Nc) and C-cap (Cc) residues are not shown.
The average dihedral angles (φ, ψ) of the regular π-helices were (-77°±14°, -50°±11°) for DSSP, (-77°±15°, -51°±12°) for STRIDE and (-81°±18°, -44°±21°) for SECSTR. Over all regular π-helices, the average backbone dihedral angles were (φ, ψ) = (-81°, -45°), with standard deviations (σφ, σψ) = (17°, 20°). Compared with the canonical π-helix (-57°, -70°), the average φ is about 24° more negative and the average ψ about 25° less negative. Glycine residues were excluded from the calculation; the φ, ψ angles of the remaining residues of regular π-helices lie in the allowed regions of the Ramachandran map (Fig.).
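The averages above are arithmetic means, which behave well here because helical φ and ψ values lie far from the ±180° boundary of the Ramachandran map. When a set of angles straddles that boundary, a circular mean (averaging unit vectors) is the usual safer choice. The Python sketch below illustrates that general technique; it is not the procedure used in this study, and the function name is illustrative. It also recomputes the shift of the average (φ, ψ) from the canonical π-helix values quoted above.

```python
import math


def circular_mean_deg(angles):
    """Mean direction of angles given in degrees, returned in [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


# Canonical pi-helix and the average reported here for regular protein pi-helices.
canonical = (-57.0, -70.0)
observed = (-81.0, -45.0)
dphi = observed[0] - canonical[0]  # phi is 24 degrees more negative
dpsi = observed[1] - canonical[1]  # psi is 25 degrees less negative
print(f"shift from canonical: dphi = {dphi:+.0f}, dpsi = {dpsi:+.0f} degrees")

# Wrap-around case: the arithmetic mean of 179 and -179 is 0,
# whereas the circular mean lies at +/-180 degrees.
print(circular_mean_deg([179.0, -179.0]))
```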
Conclusion
• 2901 high-resolution protein chains were downloaded from the Protein Data Bank (PDB), and 389 π-helices were identified in them, i.e. on average 0.13 π-helices per protein chain.
• All π-helices were divided into two groups, regular and irregular. Of the 389 π-helices, 88 (7 + 5 + 76, about 22.6%) are regular, and the helical parameters of all regular π-helices were used in the further analysis.
• The radii and Voronoi volumes of the regular π-helices are larger than those of the canonical π-helix, while the other helical parameters are comparable with the canonical values.
References
1. Donohue J. Hydrogen Bonded Helical Configurations of the Polypeptide Chain // Proc. Natl. Acad. Sci. USA. — 1953. — V. 39, № 6. — P. 470-478.
2. Pauling L., Corey R. B., Branson H. R. The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain // Proc. Natl. Acad. Sci. USA. — 1951. — V. 37, № 4. — P. 205-211.
3. Low B. W., Baybutt R. B. The π helix: a hydrogen bonded configuration of the polypeptide chain // J. of the American Chemical Society. — 1952. — V. 74(22). — P. 5806-5807.
4. Kabsch W., Sander C. How good are predictions of protein secondary structure? // FEBS Lett. — 1983. — 155(2). — P. 179-82.
5. Frishman D., Argos P. Knowledge-based protein secondary structure assignment // Proteins. — 1995. — 23(4). — P. 566-579.
6. Tyagi M., Bornot A., Offmann B., de Brevern A. G. Analysis of loop boundaries using different local structure assignment methods // Protein Science. — 2009. — 18(9). — P. 1869-1881.
7. Fodje M. N., Al-Karadaghi S. Occurrence, conformational features and amino acid propensities for the pi-helix // Protein Eng. — 2002. — 15(5). — P. 353-358.
8. Richardson J. S. The anatomy and taxonomy of protein structure // Adv. Protein Chem. — 1981. — V. 34. — P. 167-339.
9. Weaver T. M. The pi-helix translates structure into function // Protein Sci. — 2000. — 9(1). — P. 201-6.
10. Boobbyer D. N., Goodford P. J., McWhinnie P. M., Wade R. C. New hydrogen-bond potentials for use in determining energetically favorable binding sites on molecules of known structure // J. of medicinal chemistry. — 1989. — 32(5). — P. 1083-1094.
11. Enkhbayar P., Damdinsuren S., Osaki M., Matsushima N. HELFIT: Helix fitting by a total least squares method // Comput. Biol. Chem. — 2008. — 32(4). — P. 307-10.
12. Baker E. N. and Hubbard R. E. Hydrogen bonding in globular proteins // Prog Biophys Mol. Biol. — 1984. — 44(2). — P. 97-179.
13. Barlow D. J. and Thornton J. M. Helix geometry in proteins // J. Mol. Biol. — 1988. — 201(3). — P. 601-19.
14. Ramachandran G. N. and Sasisekharan V. Conformation of polypeptides and proteins // Adv. Protein Chem. — 1968. — 23. — P. 283-438.
15. Perutz M. F. New X-Ray Evidence on the Configuration of Polypeptide Chains: Polypeptide Chains in Poly-γ-benzyl-L-glutamate, Keratin and Haemoglobin // Nature. — 1951. — 167. — P. 1053-1054.
16. Lees W. J., Benson T. E., Hogle J. M., Walsh C. T. (E)-enolbutyryl-UDP-N-acetylglucosamine as a mechanistic probe of UDP-N-acetylenolpyruvylglucosamine reductase (MurB) // Biochemistry. — 1996. — 35(5). — P. 1342-51.
17. Cooley R. B., Arp D. J., Karplus P. A. Evolutionary origin of a secondary structure: π-helices as cryptic but widespread insertional variations of α-helices that enhance protein functionality // J. of molecular biology. — 2010. — 404(2). — P. 232-246.
18. Duneau J. P., Genest D., Genest M. Detailed description of an alpha helix, pi bulge transition detected by molecular dynamics simulations of the p185c-erbB2 V659G transmembrane domain // J. Biomol. Struct. Dyn. — 1996. — 13(5). — P. 753-69.
Batkhishig D., Department of Physics, School of Mathematics and Natural Science, Mongolian National University of Education; Laboratory of Bioinformatics and Systems Biology, Department of Information and Computer Science, School of Engineering and Applied Sciences, National University of Mongolia, Ulaanbaatar, Mongolia.
Mijiddorj B., Laboratory of Bioinformatics and Systems Biology, School of Engineering and Applied Sciences, National University of Mongolia, Ulaanbaatar, Mongolia.
Enkhbayar P., Laboratory of Bioinformatics and Systems Biology, Department of Information and Computer Science, School of Engineering and Applied Sciences, National University of Mongolia, Ulaanbaatar, Mongolia.