UDC 579.519.6
doi: 10.18101/2306-2363-2016-2-3-10-17
© D. Batkhishig, P. Enkhbayar
HELICAL PARAMETERS OF REGULAR π-HELICES IN PROTEINS
The α-helix, 3₁₀-helix, π-helix and ω-helix have been observed in protein structures; they account for 32%, 4%, 0.3% and 0.2% of residues, respectively. However, these percentages depend on the resolution of the solved structures and on the method used to assign secondary structure. A culled data set containing 2901 protein chains with less than 25% sequence identity and resolution < 1.6 Å (R-value < 0.25) was used in this analysis. Secondary structure assignments of π-helices were performed with DSSP, STRIDE, and SECSTR.
The HELFIT program determines the helical parameters (pitch, residues per turn, radius, and handedness) and p = rmsd/(N-1)^(1/2) for π-helices, where rmsd is the root mean square deviation from the best-fit helix and N is the helix length. The p-value estimates helical regularity, and all regular π-helices with p < 0.10 Å were identified. The helical parameters of these π-helices are compared with those of canonical π-helices and of other types of protein helices.
Keywords: 3₁₀-helix, α-helix, π-helix, helical parameters, regular helix, protein structures, protein chains.
The helix is one of the two main types of secondary structure in proteins. Helices are usually designated i_n, based on the number of residues per turn (i) and the number of atoms in the ring closed by the backbone hydrogen bond (n) [3]. Pauling and Corey first proposed the α-helix (3.6₁₃) and the γ-helix (5.1₁₇) [15]. Donohue later considered the possibility of other types of helices (2.2₇, 3₁₀, 4.3₁₄ and 4.4₁₆) [3]. Low and Baybutt also suggested the possibility of the 4.4₁₆-helix, or π-helix [14]. The main stabilizing factor for helical structures in polypeptides is the repeating hydrogen bond between main-chain carbonyl oxygen (C=O) and amide hydrogen (N-H) groups: the α-helix is characterized by an (i ← i+4) pattern, and the 3₁₀- and π-helices by repeating (i ← i+3) and (i ← i+5) hydrogen bonds, respectively [11].
Several programs assign secondary structures from the three-dimensional atomic coordinates of proteins [7, 11]. Among these, DSSP (Kabsch and Sander, 1983) and STRIDE (Frishman and Argos, 1995) are the most widely used. DSSP identifies helices from repeating (i ← i+n) hydrogen bonds, with n = 3, 4 and 5 for 3₁₀-, α- and π-helices, respectively [11], whereas STRIDE uses both hydrogen bonds and main-chain dihedral angles to define secondary structures. The DSSP program identified only 9 unique π-helices in a database of more than 6000 proteins [19]. Fodje and Al-Karadaghi identified 116 π-helices with their in-house program, SECSTR, in a database of 932 high-resolution three-dimensional protein structures [8].
These different results can be explained by two reasons: (1) the number of solved 3D structures was insufficient at that time; (2) the programs that assign secondary structures use different methods.
We studied the helical parameters of π-helices in proteins with the HELFIT program and compared them with the parameters of canonical π-helices.
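The hydrogen-bond pattern rule can be illustrated with a minimal Python sketch. It assumes the hydrogen bonds are already known and follows the Kabsch-Sander convention that two consecutive n-turns define a minimal helix; the function name `helix_residues` and the input format are illustrative only, and the real DSSP program additionally derives the bonds from energies and resolves overlapping helix types.

```python
from typing import Iterable, List, Set, Tuple

# Hydrogen-bond offset n for each helix type, as stated in the text.
HELIX_NAMES = {3: "3_10", 4: "alpha", 5: "pi"}


def helix_residues(hbonds: Iterable[Tuple[int, int]], n: int) -> List[int]:
    """Return residues that belong to minimal n-helices.

    Each pair (i, j) means the C=O of residue i is bonded to the N-H of
    residue j. An n-turn exists at i when the bond (i, i + n) is present.
    Two consecutive n-turns at i - 1 and i mark residues i .. i + n - 1
    as helical.
    """
    bonds: Set[Tuple[int, int]] = set(hbonds)
    turns = {i for (i, j) in bonds if j - i == n}
    helical: Set[int] = set()
    for i in turns:
        if i - 1 in turns:
            helical.update(range(i, i + n))
    return sorted(helical)


# Three consecutive (i <- i+5) bonds give a short pi-helix.
example_bonds = [(1, 6), (2, 7), (3, 8)]
print(HELIX_NAMES[5], helix_residues(example_bonds, n=5))  # pi [2, 3, 4, 5, 6, 7]
```

A single isolated (i ← i+5) bond yields no helix under this rule, which is one reason why strict pattern-based assignment reports few π-helices.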
Materials and Methods
Composition of Database
The culled PDB data set of 16 May 2016, containing 2969 protein chains with less than 20% sequence identity and 1.6 Å resolution (R-value < 0.25), was used in this analysis.
DSSP program
DSSP assigns secondary structure from hydrogen bonds: a bond between the C=O of residue i and the N-H of residue i+n (i ← i+n) is accepted when its energy is E < -0.5 kcal/mol. The optimal energy of a mainchain-mainchain N-H···O hydrogen bond is about E_m = -3 kcal/mol. The hydrogen bond energy depends on both the electrostatic interaction of the N-H···O atoms and the hydrogen bond angle θ [11].
The electrostatic interaction energy between the two hydrogen-bonding groups is calculated by placing partial charges on the C, O (+q_1, -q_1) and N, H (-q_2, +q_2) atoms:
\[ E = q_1 q_2 \left( \frac{1}{r(ON)} + \frac{1}{r(CH)} - \frac{1}{r(OH)} - \frac{1}{r(CN)} \right) f \]
where E is the interaction energy (kcal/mol), q_1 = 0.42e and q_2 = 0.20e, e is the elementary charge (|e| = 1.6 × 10⁻¹⁹ C), r(AB) is the distance between atoms A and B (Å), and the dimensional factor is f = 332 Å·kcal/(e²·mol) [11].
Fig. 1. Distances used to calculate the hydrogen bond energy as a Coulomb interaction.
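As an illustration, the electrostatic energy can be evaluated with a few lines of Python. This is a sketch under stated assumptions: the energy is taken in the Kabsch-Sander form E = q_1 q_2 (1/r(ON) + 1/r(CH) - 1/r(OH) - 1/r(CN)) f, the function name `dssp_hbond_energy` is hypothetical, and the collinear test geometry is invented for the example rather than taken from a structure.

```python
import math

Q1 = 0.42   # partial charge on C and O, in units of e
Q2 = 0.20   # partial charge on N and H, in units of e
F = 332.0   # dimensional factor, Å·kcal/(mol·e²)
CUTOFF = -0.5  # kcal/mol; bonds with lower energy are accepted


def dssp_hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a C=O and an N-H group.

    Each argument is an (x, y, z) coordinate in Å.
    """
    return Q1 * Q2 * F * (
        1.0 / math.dist(o, n)
        + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h)
        - 1.0 / math.dist(c, n)
    )


# Illustrative collinear geometry: C=O 1.23 Å, O...H 1.90 Å, H-N 1.00 Å.
c_atom, o_atom = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h_atom, n_atom = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
energy = dssp_hbond_energy(c_atom, o_atom, n_atom, h_atom)
print(round(energy, 2), energy < CUTOFF)  # -2.9 True
```

For this idealized geometry the energy comes out close to the optimal value of about -3 kcal/mol quoted above.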
STRIDE program
The STRIDE program assigns protein secondary structure from three-dimensional atomic coordinates by combining hydrogen bond energy with statistically derived backbone torsion angle information [7]. The hydrogen bond energy E_hb is calculated with an empirical energy function derived from the analysis of experimental data on hydrogen bond geometries in crystal structures of amino acids in polypeptide chains:
\[ E_{hb} = E_r \cdot E_t \cdot E_p \]
where E_r describes the distance dependence of the hydrogen bond, and E_t and E_p describe its directional properties. The distance term is
\[ E_r = \frac{C}{r^8} - \frac{D}{r^6} \]
where C = -3 E_m r_m^8 (kcal Å^8/mol), D = -4 E_m r_m^6 (kcal Å^6/mol), r is the distance between the donor and acceptor atoms participating in the hydrogen bond (fig. 1), and E_m and r_m are the optimal hydrogen bond energy and length, respectively [7]. For mainchain-mainchain N-H···O hydrogen bonds, E_m = -2.8 kcal/mol and r_m = 3.0 Å [11]. The angular terms E_p and E_t have the following forms:
\[ E_p = \cos^2 \theta \]
and
\[ E_t = \begin{cases} (0.9 + 0.1 \sin 2t_i) \cos t_o, & 0 < t_i \le 90^\circ \\ K_1 (K_2 - \cos^2 t_i)^3 \cos t_o, & 90^\circ < t_i \le 110^\circ \\ 0, & t_i > 110^\circ \end{cases} \]
where K_1 = 0.9/cos^6 110° and K_2 = cos^2 110°. The Cα, C', N and O atoms lie in a plane; t_i is the angle by which the projection of r_OH (the vector drawn from the O atom to the H atom) onto this plane deviates from the direction of the C'=O bond, and t_o is the angle between r_OH and the plane (fig. 2).
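The STRIDE energy terms can be sketched in Python as below. The sketch assumes that the three terms are combined as a product (E_hb = E_r · E_t · E_p), that E_r = C/r^8 - D/r^6, that E_p = cos²θ and that K_2 = cos² 110°, which makes E_t continuous at 90° and zero at 110°; all function names are illustrative and angles are passed in degrees.

```python
import math

E_M = -2.8  # optimal hydrogen bond energy, kcal/mol
R_M = 3.0   # optimal donor-acceptor distance, Å


def e_r(r):
    """Distance term C/r^8 - D/r^6; equals E_M at r = R_M."""
    c = -3.0 * E_M * R_M ** 8
    d = -4.0 * E_M * R_M ** 6
    return c / r ** 8 - d / r ** 6


def e_t(t_i_deg, t_o_deg):
    """Angular term depending on the in-plane (t_i) and out-of-plane (t_o) angles."""
    t_i, t_o = math.radians(t_i_deg), math.radians(t_o_deg)
    if t_i_deg <= 90.0:
        return (0.9 + 0.1 * math.sin(2.0 * t_i)) * math.cos(t_o)
    if t_i_deg <= 110.0:
        cos110 = math.cos(math.radians(110.0))
        k1 = 0.9 / cos110 ** 6
        k2 = cos110 ** 2
        return k1 * (k2 - math.cos(t_i) ** 2) ** 3 * math.cos(t_o)
    return 0.0


def e_p(theta_deg):
    """Angular term cos^2(theta)."""
    return math.cos(math.radians(theta_deg)) ** 2


def stride_hbond_energy(r, t_i_deg, t_o_deg, theta_deg):
    """Product of the distance and angular terms (kcal/mol)."""
    return e_r(r) * e_t(t_i_deg, t_o_deg) * e_p(theta_deg)


print(round(e_r(R_M), 3))                              # -2.8
print(round(stride_hbond_energy(3.0, 45.0, 0.0, 0.0), 3))  # -2.8
```

With t_i = 45°, t_o = 0° and θ = 0° both angular terms equal 1, so the energy at the optimal distance reduces to E_m.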
SECSTR program
SECSTR is a newer addition to the DSSP program that is dedicated to identifying π-helices, which were seldom assigned by older versions of DSSP and STRIDE [8]. The secondary structure assignment methods based on hydrogen bonds (DSSP, STRIDE, and SECSTR) produce nearly identical assignments, with more than 90% agreement [20].
Fig. 2. An illustration of main-chain hydrogen bond geometry as adapted from Boobbyer et al.
HELFIT program
HELFIT fits a continuous helix to a set of data points by a total least squares method [6]. It determines five helical parameters:
1. the direction vector of the helix axis (a_x, a_y, a_z)
2. the radius of the helix r
3. the error of the radius ±Δr
4. the pitch of the helix P
5. the number of residues per turn n
The program determines the helical parameters with high accuracy from at least four data points given as three-dimensional coordinates; here the points are the coordinates of the Cα atoms of the residues that form a helical secondary structure segment of the polypeptide chain. In addition to the helical parameters listed above, such as the pitch P (the displacement along the helix axis per turn), it calculates the parameter p = rmsd/(N-1)^(1/2), where rmsd is the root mean square distance from the best-fit helix to the data points and N is the number of data points [6].
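The regularity parameter p can be illustrated with a short Python sketch. It does not reproduce the HELFIT fitting procedure: the reference helix is simply generated from known parameters, the rmsd is taken point by point against that reference, and the function names and the 0.2 Å displacement are assumptions made for the example. The canonical π-helix values (r = 2.68 Å, P = 5.16 Å, n = 4.40) and the 0.10 Å cutoff are those used in this paper.

```python
import math


def ideal_helix(radius, pitch, res_per_turn, n_points):
    """Points on a right-handed helix whose axis is the z axis (Å)."""
    points = []
    for k in range(n_points):
        angle = 2.0 * math.pi * k / res_per_turn
        points.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       pitch * k / res_per_turn))
    return points


def rmsd(points, reference):
    """Root mean square distance between paired points."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(points, reference))
    return math.sqrt(total / len(points))


def regularity_p(rmsd_value, n_points):
    """p = rmsd / (N - 1)^(1/2)."""
    return rmsd_value / math.sqrt(n_points - 1)


def is_regular(p, cutoff=0.10):
    """A helix is called regular when p is below the cutoff (Å)."""
    return p < cutoff


# Seven Cα-like points of a canonical pi-helix, each displaced by 0.2 Å in x.
reference = ideal_helix(2.68, 5.16, 4.40, 7)
observed = [(x + 0.2, y, z) for x, y, z in reference]
p = regularity_p(rmsd(observed, reference), len(observed))
print(round(p, 3), is_regular(p))  # 0.082 True
```

Because p divides the rmsd by (N-1)^(1/2), a longer helix with the same rmsd obtains a smaller p.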
Discussion and Results
The α-helix is considered to be the most abundant form of secondary structure, accounting for about 31% of amino acid secondary structure states, while the 3₁₀-helix accounts for about 4% [1, 2]. The π-helix, however, appears to be extremely rare. Its rarity has been attributed to instability arising from the following properties: (1) the dihedral angles φ and ψ are unfavorable, lying at the very edge of an allowed minimum-energy region of the Ramachandran plot [17]; (2) the larger radius of the π-helix means that main-chain atoms are no longer in van der Waals contact along the helix axis, resulting in a hole too small for a water molecule to fill [14]; (3) a large entropic cost is required to form a helix in which five residues must be aligned to permit the (i ← i+5) hydrogen bond [16]. A few researchers have, however, found π-helices forming during molecular dynamics simulations of peptides [12], with some reports of a transition from α-helix to π-helix structure [4]. This suggests that the π-helix is not as unstable as previously believed.
We identified 27, 22 and 340 π-helices in 2901 proteins with DSSP, STRIDE and SECSTR, respectively. All π-helices are divided into two types according to their p-value: regular (p < 0.10 Å) and irregular (p > 0.10 Å). To compare protein π-helices with the canonical π-helix, only the parameters of regular π-helices are used in the further analysis.
Table 1
Helical parameters of 7 regular π-helices in proteins
                    Length  P (Å)  n     Δz^b (Å)  r (Å)  V_c^a (Å³)  p (Å)
Average             7.71    5.13   4.41  1.16      2.76   27.76       0.07
Standard deviation  0.76    0.10   0.09  0.03      0.04   0.77        0.03
Minimum             7.00    5.01   4.25  1.11      2.68   26.76       0.02
Maximum             9.00    5.30   4.53  1.19      2.81   29.20       0.10
Canonical           -       5.16   4.40  1.15      2.68   25.90       -
a Voronoi volume, V_c = π r² Δz. b Helix rise per residue, Δz = P/n.
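The two derived quantities in the table can be checked with a short Python sketch. It assumes that the footnote formulas read V_c = π r² Δz and Δz = P/n, a reading that reproduces the tabulated averages; the function names are illustrative.

```python
import math


def rise_per_residue(pitch, res_per_turn):
    """Helix rise per residue, delta_z = P / n (Å)."""
    return pitch / res_per_turn


def voronoi_volume(radius, rise):
    """Volume of a cylinder slice per residue, V_c = pi * r^2 * delta_z (Å^3)."""
    return math.pi * radius ** 2 * rise


# Average values of the regular pi-helices in Table 1.
print(round(rise_per_residue(5.13, 4.41), 2))  # 1.16
print(round(voronoi_volume(2.76, 1.16), 2))    # 27.76
```

Since V_c grows with the square of the radius, a modest increase in r over the canonical value accounts for most of the increase in volume.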
Helical parameters of π-helices identified by the DSSP program
In total, 27 π-helices (20 of them irregular) were identified in 2901 proteins by the DSSP program. The helix length, helical parameters, and p-values of these π-helices were determined with the HELFIT program (Table 1).
The radius of the observed π-helices is larger than that of the canonical π-helix, and so is their Voronoi volume. The other helical parameters are close to those of the canonical π-helix. The average length is 7.71 residues, with a range of 7-9 residues.
Helical parameters of π-helices identified by the STRIDE program
In total, 22 π-helices were identified in the high-resolution 3D structures of proteins by the STRIDE program. The helical parameters and p-values of these π-helices were determined with the HELFIT program, and their Voronoi volume (V_c) and helix rise per residue (Δz) were calculated (Table 2).
Table 2
Helical parameters of 6 regular π-helices in proteins
                    Length  P (Å)  n     Δz (Å)  r (Å)  V_c (Å³)  p (Å)
Average             7       5.10   4.46  1.14    2.78   27.65     0.07
Standard deviation  0       0.06   0.06  0.02    0.03   0.36      0.04
Minimum             7       4.99   4.40  1.10    2.74   27.33     0.03
Maximum             7       5.14   4.54  1.17    2.81   28.39     0.14
Canonical           -       5.16   4.40  1.15    2.68   25.90     -
All of these π-helices have the same length (7 residues), and their radius is larger than that of the canonical π-helix. The remaining parameters (P, n and Δz) are close to those of the canonical π-helix.
Helical parameters of π-helices identified by the SECSTR program
In total, 340 π-helices (264 of them irregular) were identified in the high-resolution 3D structures of proteins by the SECSTR program. The helical parameters of these π-helices were determined with the HELFIT program, and their Voronoi volume (V_c) and helix rise per residue (Δz) were calculated (Table 3).
Table 3
Helical parameters of 76 regular π-helices in proteins
                    Length  P (Å)  n     Δz (Å)  r (Å)  V_c (Å³)  p (Å)
Average             7.42    5.18   4.40  1.18    2.75   28.10     0.08
Standard deviation  1.16    0.12   0.21  0.09    0.06   1.98      0.02
Minimum             5.00    4.82   2.99  1.03    2.58   23.93     0.03
Maximum             12.00   5.51   5.07  1.84    2.88   42.20     0.10
Canonical           -       5.16   4.40  1.15    2.68   25.90     -
The average values of P, Δz, r, and V_c are larger than those of the canonical helix, while the average value of n is the same as that of the canonical π-helix.
Conclusion
• The high-resolution 3D structures of 2901 proteins were downloaded from the Protein Data Bank (PDB); they contain 389 π-helices. On average, each protein contains 0.13 π-helices.
• All π-helices are divided into two groups, regular and irregular. Of the 389 π-helices, 89 (22.9%) are regular. The helical parameters of all regular π-helices are used for further analysis.
• The radii of the regular π-helices are on average larger than that of the canonical π-helix, while the other helical parameters are comparable with those of the canonical helix.
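The counts in these conclusions can be tallied from the per-program results reported above. The short Python sketch below only repeats that arithmetic; the dictionary layout is an illustrative choice, and the totals simply add the three programs' counts as the paper does, without removing helices found by more than one program.

```python
# (total pi-helices, regular pi-helices) per assignment program, from Tables 1-3.
counts = {"DSSP": (27, 7), "STRIDE": (22, 6), "SECSTR": (340, 76)}
n_proteins = 2901

total = sum(t for t, _ in counts.values())
regular = sum(r for _, r in counts.values())

print(total, regular)                      # 389 89
print(round(total / n_proteins, 2))        # 0.13 pi-helices per protein
print(round(100.0 * regular / total, 1))   # 22.9 percent regular
print(round(total / regular, 2))           # 4.37 pi-helices per regular one
```

The last line shows that the value 4.37 is the ratio of all π-helices to regular ones (389/89), whereas the regular fraction is about 22.9%.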
References
1. Baker E. N., Hubbard R. E. Hydrogen bonding in globular proteins // Prog. Biophys. Mol. Biol. - 1984. - V. 44. - P. 97-179.
2. Barlow D. J., Thornton J. M. Helix geometry in proteins // J. Mol. Biol. - 1988. - V. 201. - P. 601-619.
3. Donohue J. Hydrogen bonded helical configurations of the polypeptide chain // Proc. Nat. Acad. Sci. USA. - 1953. - V. 39. - P. 470-478.
4. Duneau J. P., Genest D., Genest M. Detailed description of an alpha helix-pi bulge transition detected by molecular dynamics simulations of the p185(c-erbB2) V659G transmembrane domain // J. Biomol. Struct. Dyn. - 1996. - V. 13. - P. 753-769.
5. Enkhbayar P., Boldgiv B., Matsushima N. ω-Helices in proteins // Protein J. - 2010. - V. 29. - P. 242-249.
6. Enkhbayar P., Damdinsuren S., Osaki M., Matsushima N. HELFIT: Helix fitting by a total least squares method // Comput. Biol. Chem. - 2008. - V. 32. - P. 307-310.
7. Frishman D., Argos P. Knowledge-based protein secondary structure assignment // Proteins. - 1995. - V. 23. - P. 566-579.
8. Fodje M. N., Al-Karadaghi S. Occurrence, conformational features and amino acid propensities for the pi-helix // Protein Eng. - 2002. - V. 15. - P. 353-358.
9. Hobohm U., Scharf M., Schneider R., Sander C. Selection of representative protein data sets // Protein Sci. - 1992. - V. 1. - P. 409-417.
10. IUPAC-IUB Commission on Biochemical Nomenclature // J. Mol. Biol. - 1970. - V. 52. - P. 1-17.
11. Kabsch W., Sander C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features // Biopolymers. - 1983. - V. 22. - P. 2577-2637.
12. Kovacs H., Mark A. E., Johansson J., van Gunsteren W. F. The effect of environment on the stability of an integral membrane helix: molecular dynamics simulations of surfactant protein C in chloroform, methanol and water // J. Mol. Biol. - 1995. - V. 247. - P. 808-822.
13. Lee K. H., Benson D. R., Kuczera K. Transitions from α to π helix observed in molecular dynamics simulations of synthetic peptides // Biochemistry. - 2000. - V. 39. - P. 13737-13747.
14. Low B. W., Baybutt R. B. The pi-helix - a hydrogen bonded configuration of the polypeptide chain // J. Am. Chem. Soc. - 1952. - V. 74. - P. 5806-5814.
15. Pauling L., Corey R. B., Branson H. R. The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain // Proc. Nat. Acad. Sci. USA. - 1951. - V. 37. - P. 205-211.
16. Perutz M. F. New X-ray evidence on the configuration of polypeptide chains // Nature. - 1951. - V. 167. - P. 1053-1054.
17. Ramachandran G. N., Sasisekharan V. Conformation of polypeptides and proteins // Adv. Protein Chem. - 1968. - V. 23. - P. 283-437.
18. Rohl C. A., Doig A. J. Models for the 3(10)-helix/coil, pi-helix/coil, and alpha-helix/3(10)-helix/coil transitions in isolated peptides // Protein Sci. - 1996. - V. 5. - P. 1687-1696.
19. Weaver T. M. The pi-helix translates structure into function // Protein Sci. - 2000. - V. 9. - P. 201-206.
20. Tyagi M., Bornot A., Offmann B., de Brevern A. G. Analysis of loop boundaries using different local structure assignment methods // Protein Sci. - 2009. - V. 18. - P. 1869-1881.
Batkhishig D., Department of Physics, School of Mathematics and Natural Science, Mongolian National University of Education; Laboratory of Bioinformatics and Systems Biology, Department of Information and Computer Science, School of Engineering and Applied Sciences, National University of Mongolia, Ulaanbaatar, Mongolia, E-mail: [email protected]
Enkhbayar P., Laboratory of Bioinformatics and Systems Biology, Department of Information and Computer Science, School of Engineering and Applied Sciences, National University of Mongolia, Ulaanbaatar, Mongolia, E-mail: [email protected]