Network analysis provides insight into navigating chemical space

last modified Mar 06, 2018 12:25 PM
Alexei Lapkin and Philipp-Maximilian Jacob

Figure: Probability of reaction sequence length

Prof Alexei Lapkin and Philipp-Maximilian Jacob have discovered that the network of reactions comprising organic chemistry shares statistical phenomena with social networks, and that the structure of this network is itself full of untapped chemical information.

The original paper is 'Statistics of the network of organic chemistry', P-M Jacob and A Lapkin, React. Chem. Eng., 2018, 3, 102 (DOI: 10.1039/c7re00129k). The work has recently been featured in Chemistry World, a publication of the Royal Society of Chemistry.

The abstract for the paper states that organic chemistry can be represented as a network of reactions and studied with the mathematical tools of graph theory. In this paper, the structure of a network of organic reactions has been studied using several graph theory metrics. The network was based on a section of chemical space downloaded from Reaxys. The studied area of chemistry corresponds to the chemistry of terpenes and includes 12 238 931 species and 12 939 422 reactions after filtering of an initial set of 35 million reactions. The analysis of the network statistics confirmed that the network was scale-free, as was reported in the earlier literature from the analysis of a much smaller network. In many networks in other technological or non-technological areas, nodes show a preference for connecting either to highly connected or to scarcely connected nodes, but for chemistry no such trend was observed. It was found that the network of reactions exhibits 'small world' behaviour: in analogy to the 'six degrees of separation' encountered in social networks, on average any molecule could be made from any other molecule in six synthesis steps. Scale-free networks have hubs in their wiring pattern. By investigating whether these hubs are not only well studied but also frequently used, it was found that they concentrated a large share of the network's load onto themselves, showing that the network's structure impacts the usage of chemistry, or vice versa, implying a hierarchy of molecules.
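
To make two of the metrics mentioned in the abstract more concrete, the short Python sketch below computes node degree (to spot hubs) and the average number of synthesis steps between molecules on a tiny reaction network. It is only an illustration under stated assumptions: the network `REACTIONS`, its molecule names and the helper functions are hypothetical, and this is not the authors' code or the Reaxys data set.

```python
from collections import deque

# Hypothetical toy network (not real chemistry): each key is a molecule and
# each value lists the molecules that can be made from it in one reaction.
REACTIONS = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub", "d"],
    "c": ["hub"],
    "d": ["hub"],
}


def degrees(graph):
    """Total degree (incoming plus outgoing reactions) of every molecule."""
    deg = {node: 0 for node in graph}
    for source, products in graph.items():
        for product in products:
            deg[source] += 1
            deg[product] = deg.get(product, 0) + 1
    return deg


def steps_from(graph, start):
    """Breadth-first search: fewest reaction steps from start to each
    molecule that can be reached from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_path_length(graph):
    """Mean number of steps over all ordered (source, target) pairs for
    which a reaction route exists."""
    nodes = set(graph) | {p for products in graph.values() for p in products}
    total, pairs = 0, 0
    for source in nodes:
        for target, d in steps_from(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    deg = degrees(REACTIONS)
    print("most connected molecule:", max(deg, key=deg.get))
    print("average synthesis steps:", average_path_length(REACTIONS))
```

In this toy example one molecule takes part in most of the reactions, so it plays the role of a hub, and the average route length stays short because most routes pass through it. The paper's 'six synthesis steps' is the same kind of average, measured on a network of millions of species.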

Looking to the future, Prof Lapkin predicts that chemistry will depend progressively more on the algorithmic use of information, which is only possible if published research contains machine-readable data. 'We would urge people to look into how easy it is to extract numbers out of publications if they want the papers to have long-term impact in the future of chemistry.'