Scientific Works of the Union of Scientists in Bulgaria-Plovdiv, Series C. Technics and Technologies, Vol. XV, ISSN 1311-9419 (Print), ISSN 2534-9384 (Online), 2017.
COMBINATORIAL GENERATION OF MOLECULES BY VIRTUAL SOFTWARE REACTOR
Nikolay Kochev1, Svetlana Avramova1, Nina Jeliazkova2
1Department of Analytical Chemistry and Computer Chemistry, University of Plovdiv „Paisii Hilendarski", 24 Tzar Assen Str., 4000 Plovdiv, Bulgaria
2Ideaconsult Ltd, 4 A. Kanchev Str., Sofia 1000, Bulgaria
Abstract
Ambit-Reactor is a newly developed software module for simulation of chemical reactions as a part of the open source chemoinformatics platform Ambit. For a given set of initial reactants, Ambit-Reactor exhaustively applies all transformations based on generic chemical reaction rules described in a predefined set of reactions. For each of the resulting products all possible transformations are applied to obtain new products, and so on. In order to control the combinatorial explosion, the process stops when conditions defined by the user are reached. Ambit-Reactor is configured via JSON files that specify the reaction strategy, reaction rules, allowed and forbidden products, a set of parameters and logical conditions for reaction application, and the definition of sites where reactions occur. The reactor strategy is defined by logical expressions of molecular descriptors' values. We demonstrate applications of Ambit-Reactor for generation of virtual compound libraries and for combinatorial generation of metabolites.
Key words: chemical reaction, software reactor, SMIRKS, SMARTS, JSON
Introduction
Computer-assisted application of molecular transformations and simulation of chemical reactions have become important tools in the search for new biologically active compounds and in lead optimization into medicinal drugs, especially in the context of combinatorial chemistry and big data processing. Software-based reaction application determines which products can be obtained when a transformation takes place on a given set of initial reactants. Virtual combinatorial chemistry relies on various chemoinformatics techniques for the representation of chemical reactions, e.g. the linear notations SMIRKS (http://www.daylight.com), SLN (Homer, 2008) and RInChI (Grethe, 2013), and the XML-based format CSRML (Yang, 2015). Such techniques are used to define the exact topology of the reactants and the products, with all the necessary atom and bond features, represented in a compact form which is easy to interpret. Widely used cheminformatics toolkits for simulation of chemical reactions include CDK (Willighagen, 2017), OpenBabel (O'Boyle, 2011), RDKit (http://www.rdkit.org), OpenEye (https://www.eyesopen.com/), Daylight (http://www.daylight.com) and ChemAxon Marvin (https://chemaxon.com). Cheminformatics toolkits are attracting considerable interest due to the tools they provide for molecule comparison (exact and substructure search), handling of chemical structures and generation of molecular fingerprints. Specifying the sites of the reacting molecules that undergo chemical transformations and assembling models of the corresponding products is a methodology applied in the modeling of metabolic processes (Kirchmair, 2012) and the generation of virtual compound libraries (Durrant, 2012).
In this paper we present a newly developed software reactor based on the linear notation SMIRKS.
Ambit-Reactor Software Architecture
Ambit-Reactor is a software module for simulation of chemical reactions developed in our group. It is part of the Java library Ambit-Reactions within the chemoinformatics software platform Ambit (Jeliazkova, 2011). The Ambit-Reactor module is written in Java on top of the chemoinformatics library CDK (Willighagen, 2017), which is used for the internal representation of molecules. Ambit-SMARTS (Jeliazkova, 2011) is a module previously developed by the authors that provides basic functionality for representing search queries via the linear notation SMARTS and chemical reactions via the linear notation SMIRKS. Ambit-Reactor uses the latest extension of Ambit-SMARTS, called Ambit-SMIRKS, which performs (1) parsing of SMIRKS linear notations into internal reaction (transformation) representations and (2) application of the stored reactions to target molecules for the actual transformation of the chemical objects.
[Figure 1 panels: (i) SMILES CCCN, stored as a connection table of atoms and bonds in a CDK AtomContainer object; (ii) SMARTS [C;R1][C;R0], stored as a specialized connection table of atom expressions and bond expressions in a QueryAtomContainer and matched against a molecule fragment; (iii) SMIRKS [C:1][H]>>[C:1][O][H] (aliphatic hydroxylation), stored as a SMIRKSReaction.]
Figure 1. Representation of molecules, search queries and reactions by means of linear notations (i) SMILES, (ii) SMARTS and (iii) SMIRKS.
Figure 1 illustrates the techniques used in Ambit-Reactor for handling three basic types of chemical objects. Molecules are represented as CDK objects (e.g. AtomContainer), and the input to the system can be supplied in several chemical formats; figure 1-(i) shows the popular linear notation SMILES as an example. Search queries are handled by means of SMARTS notations (figure 1-(ii)). Chemical reactions are represented via the linear notation SMIRKS (figure 1-(iii)), which can be regarded as a notation built from two SMARTS parts, one for the reactants and one for the reaction products, together with additional atom mapping.
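The reading of SMIRKS as two SMARTS-like parts plus atom mapping can be illustrated with a few lines of Python. This is only a string-level sketch and not the Ambit-SMIRKS parser: the function names split_smirks and atom_maps are hypothetical, and only the simple reactants>>products form is handled.

```python
import re

# Matches a bracket atom that carries an atom map, e.g. "[C:1]" -> "1".
ATOM_MAP = re.compile(r"\[[^\]]*?:(\d+)\]")


def split_smirks(smirks):
    """Split a SMIRKS string into its reactant and product patterns."""
    reactants, separator, products = smirks.partition(">>")
    if not separator:
        raise ValueError("expected a SMIRKS of the form reactants>>products")
    return reactants, products


def atom_maps(pattern):
    """Return the sorted atom map numbers used in one side of a SMIRKS."""
    return sorted(int(number) for number in ATOM_MAP.findall(pattern))


# Aliphatic hydroxylation from figure 1-(iii).
reactants, products = split_smirks("[C:1][H]>>[C:1][O][H]")
print(reactants, atom_maps(reactants))
print(products, atom_maps(products))
```

The mapped atom (map number 1) appears on both sides, which is how the notation states that the same carbon is kept while an oxygen is inserted between it and the hydrogen.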
Ambit-Reactor provides the following basic functionality:
(1) Exhaustive application of all possible transformations from a predefined set of generic chemical reaction rules;
(2) Reactor configuration with a set of chemical and logical rules called the "strategy of the reactor";
(3) User-defined logical conditions for the application of particular reactions.
Reactor Configuration
The Reactor is configured via JSON (http://www.json.org/) configuration files that specify reaction rules, the reaction strategy, allowed products, forbidden products, parameter values that define the logical conditions for reaction application and the sites where reactions occur. Each reaction is represented as a JSON section in the following format:
{
  "NAME" : "Aldehyde oxidation",
  "CLASS" : "phase1",
  "SMIRKS" : "[H][C:1]=[O:2]>>[C:1](O[H])=[O:2]",
  "USE_CONDITIONS" : ["REACTION_APPLICATIONS_PER_REACTANT < 2"]
}
The basic reaction description includes a formal name, reaction class, SMIRKS defining the reaction and conditions for the reaction application.
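A reaction section of this kind can be read and its use condition checked with standard JSON tooling. The Python sketch below is illustrative only: the condition grammar (descriptor name, comparison operator, number) is an assumption inferred from the single example above, and parse_condition and conditions_hold are hypothetical helpers, not part of Ambit-Reactor.

```python
import json
import operator
import re

REACTION_JSON = """
{
  "NAME": "Aldehyde oxidation",
  "CLASS": "phase1",
  "SMIRKS": "[H][C:1]=[O:2]>>[C:1](O[H])=[O:2]",
  "USE_CONDITIONS": ["REACTION_APPLICATIONS_PER_REACTANT < 2"]
}
"""

# Assumed condition grammar: <descriptor name> <operator> <number>.
OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
             ">=": operator.ge, "==": operator.eq}
CONDITION = re.compile(r"^\s*(\w+)\s*(<=|>=|==|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_condition(text):
    """Turn 'NAME < 2' into (name, comparison function, threshold)."""
    match = CONDITION.match(text)
    if match is None:
        raise ValueError(f"unsupported condition: {text!r}")
    name, symbol, value = match.groups()
    return name, OPERATORS[symbol], float(value)


def conditions_hold(reaction, descriptors):
    """Check every USE_CONDITIONS entry against the given descriptor values."""
    for text in reaction.get("USE_CONDITIONS", []):
        name, compare, threshold = parse_condition(text)
        if not compare(descriptors[name], threshold):
            return False
    return True


reaction = json.loads(REACTION_JSON)
print(reaction["NAME"], reaction["SMIRKS"])
print(conditions_hold(reaction, {"REACTION_APPLICATIONS_PER_REACTANT": 1}))
```

Keeping the conditions as plain strings in the configuration means new descriptors can be referenced without changing the file format; only the evaluator has to know how to compute them.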
The combination of multiple chemical and logical conditions is called the "strategy of the reactor". Formally, these conditions are set by logical expressions over the values of descriptors calculated for the compounds taking part in the reactions. Figure 2 shows the Reactor configuration process and the interrelations between the elements of the reactor strategy.
[Figure 2 content recoverable from the source: the JSON configuration is parsed into the reaction database.]
Reactor Algorithm
The Reactor algorithm workflow is illustrated in figure 3, which shows part of a simulation tree for tyrosine metabolism.
[Figure 3 content: the input molecule (SMILES, InChI or MOL file), here tyrosine, NC(Cc1ccc(O)cc1)C(O)=O, is parsed; reaction rules from the database are searched against the initial molecule; aromatic hydroxylation, [c:1][H:2]>>[c:1][O][H:2], is applied at the sites where the reaction can take place; the rules are then searched against product 1 and product 3; decarboxylation (SMIRKS only partly legible in the source: [*:1][c:2][C](=[OHO])[O][H]>>[*:1][C'2]) gives product 5, dopamine (an allowed product), and product 6, 4-(2-aminoethyl)benzene-1,3-diol; the same steps are repeated until the products fulfill the reaction strategy conditions or the reaction tree is exhausted.]
Figure 3. Reactor work flow: reaction application for tyrosine molecule (part of reactor tree).
The start molecule is submitted to the reactor in one of the popular structure data formats: SMILES, InChI or a *.MOL file (figure 3). For a given initial reactant, Ambit-Reactor exhaustively applies all possible transformations from a predefined database of generic chemical reaction rules. For each of the products all possible transformations are applied again to obtain new products, and so on. In order to control the combinatorial explosion, the process stops when conditions defined by the user are reached, e.g. the number of reactions to take place, the maximum number of nodes, or the number of "success" or "failed" nodes. The reaction simulation can be represented as a tree data structure in which each node represents a particular product obtained from reactions applied to the compounds in the upper-level nodes. Each node also includes information about the transformation path and the descriptors that are used for the reactor strategy. Typically the root node is the input starting molecule. "Success" nodes are those that comply with the strategy; accordingly, "failed" nodes are those that do not comply with all logical conditions defined in the reactor strategy.
The tyrosine molecule is the starting (root) node for the example simulation given in figure 3. The aromatic hydroxylation reaction is matched at four possible sites of the tyrosine molecule and accordingly four possible products are obtained; they form child nodes of the root node. Product 2 is topologically equivalent to product 1, and analogously product 4 is equivalent to product 3, therefore only two child nodes are generated. The reactor strategy in figure 3 is applied with the condition REACTION_APPLICATIONS_PER_REACTANT < 2, hence aromatic hydroxylation is not applied further to products 1 and 3. The reaction application continues with the decarboxylation reaction, which is matched at one site of product 1 and at one site of product 3.
The obtained products are dopamine (product 5) and 4-(2-aminoethyl)benzene-1,3-diol (product 6). The dopamine molecule is in the "allowed products" list of the reaction strategy and thus the dopamine node is a terminal one in the reactor tree. The reactor algorithm continues with product 6. Analogous steps are repeated for the obtained products that fulfill the reaction strategy conditions until the reaction tree is exhausted, the maximal number of nodes is reached or some other condition for terminating the simulation is met.
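The tree expansion described above can be sketched in Python as a breadth-first loop. Everything here is a simplified stand-in under stated assumptions: molecules are plain labels, SMIRKS application is replaced by a lookup table covering only the part of the tree shown in figure 3, the "hypothetical trihydroxy product" entry is invented solely to show the application limit taking effect, and the limit is interpreted as counting applications of one rule along the path from the root.

```python
from collections import deque
from dataclasses import dataclass, field

# Toy stand-in for SMIRKS application:
# rule name -> {reactant: products, one entry per matched site}.
RULES = {
    "aromatic hydroxylation": {
        "tyrosine": ["product 1", "product 1", "product 3", "product 3"],
        "product 1": ["hypothetical trihydroxy product"],  # invented entry
    },
    "decarboxylation": {
        "product 1": ["dopamine"],
        "product 3": ["4-(2-aminoethyl)benzene-1,3-diol"],
    },
}


@dataclass
class Node:
    molecule: str
    path: tuple = ()  # reaction names applied from the root to this node
    children: list = field(default_factory=list)
    terminal: bool = False


def run_reactor(start, rules, allowed_products, max_nodes=50, limit=2):
    """Expand the reaction tree breadth-first until a stop condition is met."""
    root = Node(start)
    queue = deque([root])
    node_count = 1
    while queue:
        node = queue.popleft()
        if node.molecule in allowed_products:
            node.terminal = True  # allowed products are not expanded further
            continue
        seen = set()  # equivalent products of one reactant are merged
        for name, table in rules.items():
            # Assumed meaning of REACTION_APPLICATIONS_PER_REACTANT < limit.
            if not node.path.count(name) + 1 < limit:
                continue
            for product in table.get(node.molecule, []):
                if product in seen:
                    continue
                if node_count >= max_nodes:
                    return root  # stop condition: maximum number of nodes
                seen.add(product)
                child = Node(product, node.path + (name,))
                node.children.append(child)
                queue.append(child)
                node_count += 1
    return root


def molecules(node):
    """Yield the molecules of the tree in depth-first (pre-order) order."""
    yield node.molecule
    for child in node.children:
        yield from molecules(child)


tree = run_reactor("tyrosine", RULES, allowed_products={"dopamine"})
print(list(molecules(tree)))
```

In a real reactor the equality test on labels would be replaced by a comparison of canonical structures, which is what makes products 2 and 4 collapse into products 1 and 3.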
Ambit-Reactor Applications
Depending on the configured strategy of the reactor, Ambit-Reactor can be applied for synthesis planning and retrosynthetic analysis, modeling of metabolic processes, combinatorial generation of virtual compound libraries and guided generation of compound libraries.
Figure 4 shows three simple cases of exhaustive combinatorial generation of hydrocarbons and alcohols.
a) Reactor configuration: methylation of terminal carbon, [CH3:1][H]>>[CH2:1]C([H])([H])[H]
   Starting molecule: CC (ethane)
   Result: CCC (propane), CCCC (butane), CCCCC (pentane), CCCCCC (hexane)
b) Reactor configuration: methylation (any carbon), [C:1][H]>>[C:1]C([H])([H])[H]
   Starting molecule: CC (ethane)
   Result: CCC (propane), CCCC (butane), CC(C)C (isobutane), CC(C)(C)C (neo-pentane)
c) Reactor configuration: methylation of terminal carbon, [CH3:1][H]>>[CH2:1]C([H])([H])[H]
   Starting molecule: OCC (ethanol)
   Result: OCCC (propanol), OCCCC (butanol), OCCCCC (pentanol), OCCCCCC (hexanol)
Figure 4. Combinatorial generation of molecules with Ambit-Reactor.
The hydrocarbons generated in case a) are only normal alkanes since the reactor is configured with methylation reaction applied only for terminal carbons while in case b) branched hydrocarbons are obtained as well (see figure 4). In case c), the same strategy of the reactor as in case a) is applied, but using ethanol as a starting molecule, hence alcohols are obtained.
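The difference between cases a) and b) can be reproduced with a small Python sketch that works on hydrogen-suppressed carbon skeletons instead of SMIRKS. This is an illustration under assumptions, not the Ambit-Reactor algorithm: a molecule is an adjacency list of carbons, methylation attaches a new carbon to any carbon with fewer than four carbon neighbours (or only to terminal carbons), and duplicates are removed with a canonical tree encoding rooted at the tree center.

```python
def canonical(adj):
    """Canonical string of an unrooted tree (AHU-style encoding from its center)."""
    n = len(adj)
    degree = [len(neighbours) for neighbours in adj]
    removed = [False] * n
    leaves = [i for i in range(n) if degree[i] <= 1]
    remaining = n
    while remaining > 2:  # peel leaf layers until one or two centers remain
        next_leaves = []
        for leaf in leaves:
            removed[leaf] = True
            remaining -= 1
            for neighbour in adj[leaf]:
                if not removed[neighbour]:
                    degree[neighbour] -= 1
                    if degree[neighbour] == 1:
                        next_leaves.append(neighbour)
        leaves = next_leaves

    def encode(node, parent):
        parts = sorted(encode(c, node) for c in adj[node] if c != parent)
        return "(" + "".join(parts) + ")"

    return min(encode(center, -1) for center in leaves)


def methylate(adj, terminal_only):
    """Return all skeletons obtained by attaching one carbon to an allowed site."""
    products = []
    for i, neighbours in enumerate(adj):
        allowed = len(neighbours) == 1 if terminal_only else len(neighbours) < 4
        if allowed:
            new_adj = [list(x) for x in adj]
            new_adj[i].append(len(adj))
            new_adj.append([i])
            products.append(new_adj)
    return products


def generate(start, steps, terminal_only=False):
    """Count the unique molecules obtained after each methylation step."""
    level = {canonical(start): start}
    counts = []
    for _ in range(steps):
        next_level = {}
        for molecule in level.values():
            for product in methylate(molecule, terminal_only):
                next_level.setdefault(canonical(product), product)
        level = next_level
        counts.append(len(level))
    return counts


ETHANE = [[1], [0]]  # two carbons joined by one bond
print(generate(ETHANE, 4, terminal_only=True))  # linear chains only
print(generate(ETHANE, 4))                      # branched skeletons included
```

With methylation at any carbon, the numbers of unique skeletons for three to six carbons match the known counts of alkane constitutional isomers (1, 2, 3 and 5), whereas the terminal-only rule yields a single linear chain at each step.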
Summary
Ambit-Reactor can be used for exhaustive combinatorial generation of molecules by means of a predefined reaction database and reactor strategy. Focused compound libraries and virtual libraries that contain molecules with higher synthetic accessibility can be generated by means of suitable strategy rules.
The Ambit-Reactor module can be used as a software library through its Java API (http://ambit.sourceforge.net/). A standalone command-line application is also available at http://ambit.sourceforge.net/reactor.html.
References
[1] http://www.daylight.com
[2] Homer, R.; Swanson, J.; Jilek, R. J.; Hurst, T.; Clark, R. D.; SYBYL Line Notation (SLN): A Single Notation To Represent Chemical Structures, Queries, Reactions, and Virtual Libraries; J. Chem. Inf. Model. 2008, 48, 2294-2307.
[3] Grethe, G.; Goodman, J.; Allen, C.; International chemical identifier for reactions (RInChI); Journal of Cheminformatics 2013, 5:45.
[4] Yang, C.; Tarkhov, A.; Marusczyk, J.; Bienfait, B.; Gasteiger, J.; Kleinoeder, T.; Magdziarz, T.; Sacher, O.; Schwab, C. H.; Schwoebel, J.; Terfloth, L.; Arvidson, K.; Richard, A.; Worth, A.; Rathman, J.; New Publicly Available Chemical Query Language, CSRML, To Support Chemotype Representations for Application to Data Mining and Modeling; J. Chem. Inf. Model. 2015, 55, 510-528.
[5] Willighagen, E.; Mayfield, J.; Alvarsson, J.; Berg, A.; Carlsson, L.; Jeliazkova, N.; Kuhn, S.; Pluskal, T.; Rojas-Cherto, M.; Spjuth, O.; Torrance, G.; Evelo, C.; Guha, R.; Steinbeck, C.; The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching; Journal of Cheminformatics 2017, 9:33.
[6] O'Boyle, N.; Banck, N.; James, C.; Morley, C.; Vandermeersch, T.; Hutchison, G.; Open Babel: An open chemical toolbox; Journal of Cheminformatics 2011, 3:33.
[7] http://www.rdkit.org
[8] https://www.eyesopen.com
[9] https://chemaxon.com
[10] Kirchmair, J.; Williamson, M.; Tyzack, J.; Tan, L.; Bond, P.; Bender, A.; Glen, R.; Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms; J. Chem. Inf. Model. 2012, 52, 617-648.
[11] Durrant, J.; McCammon, A.; AutoClickChem: Click Chemistry in Silico; PLoS Comput. Biol. 2012, 8(3): e1002397.
[12] Jeliazkova, N.; Jeliazkov, V.; AMBIT RESTful web services: an implementation of the OpenTox application programming interface; Journal of Cheminformatics 2011, 3:18.
[13] Jeliazkova, N.; Kochev, N.; AMBIT-SMARTS: Efficient Searching of Chemical Structures and Fragments; Mol. Inf. 2011, 30, 707-720.
[14] http://www.json.org/