CXSMILES – Part 2: Component Grouping

This post follows up on the introduction in Part 1. Here I examine how fragment grouping can be captured in CXSMILES and in other SMILES extensions.

Fragment Grouping

Fragment grouping (or component grouping) allows you to group together separate fragments/components of a molecule. It is critical for reaction representation, and therefore several independent SMILES extensions have emerged. Common cases include keeping counter-ions, hydrates, and salts together as a single “molecule”.

Figure: `EP2305640A2 Example 11, Step v` (SMILES)

Figure: `EP2305640A2 Example 11, Step v` (CXSMILES with fragment grouping)

Syntax

Here is a simple example annotated with the fragment indexes. We want to group together (0,1) and (3,4,5):

[Na+].[OH-].c1ccccc1.[Cs+].[O-]C(=O)[O-].[Cs+]>> |f:0.1,3.4.5|
--0-- --1-- ----2--- --3-- -----4------- --5--

Component indexes span the entire reaction, so we can, for example, move the caesium carbonate to the agents and the CXSMILES encoding does not change:

[Na+].[OH-].c1ccccc1>[Cs+].[O-]C(=O)[O-].[Cs+]> |f:0.1,3.4.5|
--0-- --1-- ----2--- --3-- -----4------- --5--
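
To make the indexing concrete, here is a minimal Python sketch (my own illustration, not code from any toolkit) that numbers the components left to right across the “>” separators and reads the groups from the f: field. The function names are hypothetical, and it assumes every “.” separates a distinct component, which does not hold if a ring-closure bond connects atoms across a dot.

```python
import re

# Matches the fragment-grouping field, e.g. "f:0.1,3.4.5", inside the |...| layer.
F_FIELD = re.compile(r"(?:^|,)f:(\d+(?:\.\d+)*(?:,\d+(?:\.\d+)*)*)")


def split_components(smiles):
    """Return (index, role, fragment) for each dot-separated component.

    Assumption: no ring-closure bonds across dots, so each dot-separated
    piece is its own component.
    """
    parts = smiles.split(">")
    roles = ("reactant", "agent", "product") if len(parts) == 3 else ("molecule",)
    components = []
    for role, part in zip(roles, parts):
        for fragment in part.split("."):
            if fragment:
                # The index keeps counting across roles.
                components.append((len(components), role, fragment))
    return components


def fragment_groups(cxsmiles):
    """Return the f: groups as lists of component indexes (empty if absent)."""
    _, _, rest = cxsmiles.strip().partition(" ")
    layers = rest.split("|")[1] if rest.startswith("|") else ""
    match = F_FIELD.search(layers)
    if match is None:
        return []
    return [[int(i) for i in group.split(".")] for group in match.group(1).split(",")]


for line in (
    "[Na+].[OH-].c1ccccc1.[Cs+].[O-]C(=O)[O-].[Cs+]>> |f:0.1,3.4.5|",
    "[Na+].[OH-].c1ccccc1>[Cs+].[O-]C(=O)[O-].[Cs+]> |f:0.1,3.4.5|",
):
    comps = split_components(line.split(" ")[0])
    for group in fragment_groups(line):
        print([comps[i][1:] for i in group])
```

Both inputs yield the same groups; only the role attached to components 3, 4 and 5 changes.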

Does it only apply to reactions?

Toolkit dependent. ChemAxon appears to only read/write it on reactions (MarvinJS v22.9.0), but it’s also useful on molecules to capture formulations/mixtures (e.g. artificial seawater). In reactions, fragment grouping of agents (between the two “>”) appears to be ignored in MarvinJS, so my example images aren’t valid examples. One of our customers tested Marvin Desktop v21.15.1 for me and confirmed it round-trips correctly.

Do component terms need to be adjacent?

Toolkit dependent. I remember older versions of the ChemAxon desktop tools (which I no longer have access to) rejecting non-adjacent components:

[Na+].c1ccccc1.[OH-]>> |f:0.2|

This does not seem to be the case in MarvinJS, and again a customer confirmed it round-trips OK in Marvin Desktop.
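
If you need to feed a toolkit that insists on adjacency, one option is to reorder the components first. The following is a rough Python sketch of that idea with hypothetical helper names; it works on a flat list of components and ignores reaction roles, so it is only safe when all the components involved share the same role.

```python
def non_adjacent_groups(groups):
    """Return the groups whose indexes do not form a contiguous run."""
    return [g for g in groups if sorted(g) != list(range(min(g), max(g) + 1))]


def make_adjacent(components, groups):
    """Reorder components so each group is contiguous and renumber the groups.

    Assumption: all components share one role (roles are not considered).
    """
    grouped = {i for g in groups for i in g}
    first = {min(g): g for g in groups}
    order = []
    for i in range(len(components)):
        if i in first:
            # Emit the whole group at the position of its first member.
            order.extend(sorted(first[i]))
        elif i not in grouped:
            order.append(i)
    new_index = {old: new for new, old in enumerate(order)}
    new_groups = [[new_index[i] for i in sorted(g)] for g in groups]
    return [components[i] for i in order], new_groups


components = ["[Na+]", "c1ccccc1", "[OH-]"]
groups = [[0, 2]]
print(non_adjacent_groups(groups))
new_components, new_groups = make_adjacent(components, groups)
field = ",".join(".".join(str(i) for i in g) for g in new_groups)
print(".".join(new_components) + ">> |f:" + field + "|")
```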

Spanning Different Roles?

An input where the components being grouped have different roles (e.g. a reactant and a product) could be rejected as inconsistent:

c1ccccc1.[Na+]>>[OH-] |f:1.2|
c1ccccc1.[Na+]>>[OH-] |f:2.1|

MarvinJS and CDK default to the role of the first component encountered, so those two inputs are different. As the author of the CDK logic, I will note this is consistent only by coincidence.
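
As a rough illustration of that behaviour, here is a small Python sketch with hypothetical function names. It is not the CDK or MarvinJS code, and it assumes that “first component encountered” means the first index listed in the group.

```python
ROLES = ("reactant", "agent", "product")


def component_roles(rxn_smiles):
    """Role of each dot-separated component, indexed across the whole reaction."""
    roles = []
    for role, part in zip(ROLES, rxn_smiles.split(">")):
        roles.extend(role for fragment in part.split(".") if fragment)
    return roles


def spans_roles(roles, group):
    """True if the grouped components do not all share one role."""
    return len({roles[i] for i in group}) > 1


def group_role(roles, group):
    """Assumed rule: the group takes the role of its first listed component."""
    return roles[group[0]]


roles = component_roles("c1ccccc1.[Na+]>>[OH-]")
for group in ([1, 2], [2, 1]):
    print(group, spans_roles(roles, group), group_role(roles, group))
```

A stricter reader could instead raise an error whenever `spans_roles` is true.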

Implicit grouping?

We can implicitly group components with multi-attach (m:) and Sgroup brackets. For example, consider the following two inputs. Which is preferred?

**.c1ccccc1 |$R1$,m:1:2.3.4.5.6.7|
**.c1ccccc1 |$R1$,f:0.1,m:1:2.3.4.5.6.7|

Alternatives

Daylight SMILES5

As with the cis/trans specification, SMILES5 had a solution:

“Molecule-level Components: There will be another level of components within a molecule and reaction object which will allow easier handling of complex mixtures.” – Futures, MUG 2005

SMARTS has component grouping using zero-level brackets already and it would likely have followed a similar syntax:

([Na+].[OH-]).c1ccccc1.([Cs+].[O-]C(=O)[O-].[Cs+])>>

Notice the wording that it applies to both molecule and reactions.

LillyMol

An extension used by Eli Lilly’s LillyMol is to use “.” to separate the fragments within a molecule and “+” to separate the molecules.

[Na+].[OH-]+c1ccccc1+[Cs+].[O-]C(=O)[O-].[Cs+]>>

Note that LillyMol also supports CXSMILES.

IBM RXN for Chemistry

Schwaller et al. 2020 describe how they use “~” to group together fragments. They note in the supplementary information that this is more useful than CXSMILES for their purposes, since it ensures the fragments are kept together:

[Na+]~[OH-].c1ccccc1.[Cs+]~[O-]C(=O)[O-]~[Cs+]>>

NextMove (proposed) / OntoChem

In 2013 Roger proposed a double-dot “..” for a similar purpose. The advantage is that “relaxed” SMILES parsers will simply ignore the repeated dot:

[Na+]..[OH-].c1ccccc1.[Cs+]..[O-]C(=O)[O-]..[Cs+]>>

OntoChem also use this representation in reactions but I cannot find a link to relevant material.
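
The delimiter-based alternatives above (LillyMol, IBM RXN, double-dot) differ only in which separator joins fragments within a molecule and which separates molecules. The following Python sketch, with a hypothetical `regroup` function of my own, rewrites a reaction SMILES plus its f: groups into any of them. It assumes dot-separated components with no ring closures across dots, and it refuses groups that span roles since none of these syntaxes can express them.

```python
def regroup(rxn_smiles, groups, within, between):
    """Rewrite a reaction SMILES using inline separators for fragment groups.

    groups  -- lists of component indexes, numbered across the whole reaction
    within  -- separator between fragments of one molecule
    between -- separator between molecules
    """
    owner = {i: g for g in groups for i in g}
    out, offset = [], 0
    for part in rxn_smiles.split(">"):
        fragments = part.split(".") if part else []
        local = dict(enumerate(fragments, start=offset))  # global index -> fragment
        molecules = []
        for i in local:
            group = owner.get(i, [i])
            if i != min(group):
                continue  # already written with the first member of its group
            if any(j not in local for j in group):
                raise ValueError("group %s spans different roles" % group)
            molecules.append(within.join(local[j] for j in sorted(group)))
        out.append(between.join(molecules))
        offset += len(fragments)
    return ">".join(out)


rxn = "[Na+].[OH-].c1ccccc1.[Cs+].[O-]C(=O)[O-].[Cs+]>>"
groups = [[0, 1], [3, 4, 5]]
print(regroup(rxn, groups, within=".", between="+"))   # LillyMol style
print(regroup(rxn, groups, within="~", between="."))   # IBM RXN style
print(regroup(rxn, groups, within="..", between="."))  # double-dot style
```

Note that non-adjacent group members are pulled together at the position of the first member, so the component order can change.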

NextMove (actual)

In Pistachio we use CXSMILES for fragment grouping in reactions. In recent releases we have tried, where possible, to use alternative representations that avoid the need for grouping, such as writing salts in a covalent form:

[Na]O.c1ccccc1.[Cs]OC(=O)O[Cs]>>

Where needed we still use CXSMILES since it is the most widely supported convention.

The fragment grouping also gets captured in the JSON format of reactions, albeit much less compactly:

{
  "role": "Product",
  "orgName": "title compound",
  "name": "Methyl (2S)-2-amino-3-(2-chlorophenyl)propanoate hydrochloride",
  "smiles": "Cl.N[C@H](C(=O)OC)CC1=C(C=CC=C1)Cl",
  "quantities": [ {"type": "Mass", "value": 5.89, "text": "5.89 g"},
                  {"type": "Yield", "value": 94, "text": "94%"} ],
  "stoichiometry": 1
}

We also support interconversion of the LillyMol syntax in our reaction processing tool set, HazELNut:

$ echo "*>[Na+].[OH-].[Cs+].[O-]C(=O)[O-].[Cs+]>* |f:1.2.3,4.5|" | ./filbert .smi .iwsmi
*>C(=O)([O-])[O-].[Cs+]+[OH-].[Na+].[Cs+]>*

$ echo "*>C(=O)([O-])[O-].[Cs+]+[OH-].[Na+].[Cs+]>*" | ./filbert .iwsmi .smi
*>C(=O)([O-])[O-].[OH-].[Na+].[Cs+].[Cs+]>* |f:1.4,2.3.5|
