
Pipeline Pilot and BIOVIA Foundation


Generating Fragments problem

Hello,

I want to do a recursive fragmentation based on the paper 


Right now, the Generate Fragments component doesn't fragment molecules the way the paper describes. I want to remove rings, linkers, and substituents in the first iteration: starting from the molecule in Figure 1 of the paper, I want to generate nodes 2, 3, 4, and 5. Can anyone help me with this fragmentation? Other ideas are always welcome.
Phil Cochrane
Hi Abhik,

Although I haven't read the paper fully, I suspect the fragmentation would need to be done in one of two ways.

The first is to draw "reactions" that break the bonds you'd like to break at each stage, if that's possible. These reactions wouldn't be based on real chemistry; they would simply be transformations that perform the changes needed. They can become very unwieldy, especially when there is a choice of which one to perform.

The alternative is to perform this fragmentation using the Molecular Toolkit API. This gives you complete control over what to transform and when, but at the cost of probably having to write the component in Java or Perl. Although the Molecular Toolkit API is partially exposed in PilotScript, my experience is that the limited functionality exposed, coupled with the lack of functions, makes such a complex task close to impossible in PilotScript.

As I said, I haven't fully read the paper yet, but I see that it mentions the Scaffold Tree fragmentation. Before this was available in Pipeline Pilot (it was introduced in PP 9.0), I wrote a Java-based fragment component to implement those rules using the Molecular Toolkit API.

From scanning the methods section (page 6441), I think these rules might be somewhat easier to implement than the Scaffold Tree. It might be possible to implement them with a combination of reactions and some of the existing fragmentation components. I've attached a simple example of how this might work, using 4-hydroxybiphenyl as in Figure 1. The reaction and starting materials are embedded in the protocol, so it should just run. Using Generate Fragments (set to generate the Murcko assemblies, which removes the side-chain functionality) and a single reaction, I've managed to generate node 2 (which is just the Murcko assembly in this case), node 7, and node 5. To capture node 4, you'd need a "break side chain" reaction. The remaining nodes are then simply a matter of separating the fragments from the product molecules.
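
To make "removing the side-chain functionality" concrete, here is a minimal sketch in plain Python of the Bemis-Murcko idea in graph terms. It is not Pipeline Pilot's Generate Fragments component, and the names (murcko_framework, hexagon) are hypothetical. It assumes a hydrogen-suppressed molecule stored as a dict of atoms plus a set of unordered atom pairs, and it ignores bond orders, so it would also strip exocyclic double-bonded atoms (e.g. a ring C=O) that some Murcko implementations keep.

```python
from collections import defaultdict


def murcko_framework(atoms, bonds):
    """Prune side chains until only ring systems and linkers remain.

    atoms: dict of atom index -> element symbol (hydrogens suppressed)
    bonds: set of frozenset({i, j}) pairs (bond order ignored in this sketch)
    Atoms with at most one neighbour are deleted repeatedly; an acyclic
    molecule is therefore pruned away completely.
    """
    atoms = dict(atoms)
    bonds = set(bonds)
    while True:
        degree = defaultdict(int)
        for bond in bonds:
            for a in bond:
                degree[a] += 1
        terminal = {a for a in atoms if degree[a] <= 1}
        if not terminal:
            return atoms, bonds
        for a in terminal:
            del atoms[a]
        bonds = {b for b in bonds if not (b & terminal)}


def hexagon(start):
    """Bonds of a six-membered ring using atoms start..start+5."""
    return {frozenset({start + k, start + (k + 1) % 6}) for k in range(6)}


# 4-hydroxybiphenyl: rings 0-5 and 6-11 joined by bond 0-6, OH oxygen on atom 3
atoms = {i: "C" for i in range(12)}
atoms[12] = "O"
bonds = hexagon(0) | hexagon(6) | {frozenset({0, 6}), frozenset({3, 12})}
fw_atoms, fw_bonds = murcko_framework(atoms, bonds)
print(len(fw_atoms), len(fw_bonds))  # 12 13 (the biphenyl framework)
```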

I should point out that my "break linker" reaction is limited to a single bond directly joining two rings, where each of those ring atoms has only two ring bonds. If the linker were longer than one bond (e.g. benzylbenzene), it would fail. If one of the ring atoms were a bridgehead in a bicyclic ring system, it would have three ring bonds, so again my reaction would fail.
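
The restriction above can be written down as a graph rule. The sketch below is plain Python, not the attached reaction (which isn't reproduced here), and the helper names are hypothetical. It assumes the same hydrogen-suppressed representation (bonds as unordered atom pairs) and treats a bond as a ring bond when its two ends stay connected without it. A candidate linker is then an acyclic bond whose two atoms each have exactly two ring bonds, which is why a CH2 linker or a bridgehead atom is rejected.

```python
from collections import defaultdict


def adjacency(bonds):
    adj = defaultdict(set)
    for bond in bonds:
        i, j = tuple(bond)
        adj[i].add(j)
        adj[j].add(i)
    return adj


def still_connected(adj, start, goal, skip):
    """True if start can reach goal without crossing the bond `skip`."""
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        if a == goal:
            return True
        for n in adj[a]:
            if frozenset({a, n}) != skip and n not in seen:
                seen.add(n)
                stack.append(n)
    return False


def breakable_linkers(bonds):
    adj = adjacency(bonds)
    # A bond lies in a ring iff its two ends stay connected without it.
    ring = {b for b in bonds if still_connected(adj, *tuple(b), skip=b)}
    ring_degree = defaultdict(int)
    for b in ring:
        for a in b:
            ring_degree[a] += 1
    # Acyclic bond whose two atoms each have exactly two ring bonds.
    return [b for b in bonds
            if b not in ring and all(ring_degree[a] == 2 for a in b)]


def hexagon(start):
    """Bonds of a six-membered ring using atoms start..start+5."""
    return {frozenset({start + k, start + (k + 1) % 6}) for k in range(6)}


biphenyl = hexagon(0) | hexagon(6) | {frozenset({0, 6})}
diphenylmethane = hexagon(0) | hexagon(7) | {frozenset({0, 6}), frozenset({6, 7})}
print(breakable_linkers(biphenyl))         # [frozenset({0, 6})]
print(breakable_linkers(diphenylmethane))  # []
```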

Beyond supporting more reactions, the real issue here is how to make this recursive, and that's where more examples would be needed. For example, what happens with 1,4-diphenylbenzene? I assume you get biphenyl and benzene, and then you break the biphenyl into two benzene rings? Doing this with reactions would therefore involve looping in the protocol (e.g. a substructure filter to test whether reaction A applies to the molecule, then applying that reaction, then repeating the test).
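
The looping idea can be sketched as a worklist: take a node, find every bond the rule can break, queue the resulting pieces as child nodes, and repeat until nothing more can be broken. This is plain Python, not a PilotScript or protocol implementation, and all names are hypothetical. It assumes only the restricted single-bond linker rule described above (no ring or substituent removal), and it identifies a node by its set of atom indices in the parent rather than by canonical SMILES, so the three benzene rings of 1,4-diphenylbenzene stay separate nodes. A real version would canonicalize fragments to merge identical structures.

```python
from collections import defaultdict


def components(atoms, bonds):
    """Split `atoms` into connected pieces using only `bonds`."""
    adj = defaultdict(set)
    for bond in bonds:
        i, j = tuple(bond)
        adj[i].add(j)
        adj[j].add(i)
    pieces, seen = [], set()
    for start in atoms:
        if start in seen:
            continue
        seen.add(start)
        piece, stack = set(), [start]
        while stack:
            a = stack.pop()
            piece.add(a)
            for n in adj[a]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        pieces.append(frozenset(piece))
    return pieces


def breakable_linkers(atoms, bonds):
    """Acyclic bond joining two atoms that each have exactly two ring bonds."""
    ring = set()
    for b in bonds:
        i, j = tuple(b)
        # b is a ring bond iff i and j stay in one piece without it
        if any(i in p and j in p for p in components(atoms, bonds - {b})):
            ring.add(b)
    ring_degree = defaultdict(int)
    for b in ring:
        for a in b:
            ring_degree[a] += 1
    return [b for b in bonds
            if b not in ring and all(ring_degree[a] == 2 for a in b)]


def fragment_tree(atoms, bonds):
    """Map each node (frozenset of atom indices) to its set of child nodes."""
    tree = {}
    todo = [frozenset(atoms)]
    while todo:
        node = todo.pop()
        if node in tree:
            continue  # already expanded through another parent
        node_bonds = {b for b in bonds if b <= node}
        children = set()
        # Each breakable linker gives one split; its pieces become children.
        for link in breakable_linkers(node, node_bonds):
            children.update(components(node, node_bonds - {link}))
        tree[node] = children
        todo.extend(children)
    return tree


def hexagon(start):
    """Bonds of a six-membered ring using atoms start..start+5."""
    return {frozenset({start + k, start + (k + 1) % 6}) for k in range(6)}


# 1,4-diphenylbenzene: rings 0-5, 6-11, 12-17; central ring bonded at 6 and 9
terphenyl = (hexagon(0) | hexagon(6) | hexagon(12)
             | {frozenset({0, 6}), frozenset({9, 12})})
tree = fragment_tree(range(18), terphenyl)
print(len(tree))  # 6
```

The six nodes are the starting molecule, two biphenyl fragments, and three single benzene rings (the leaves).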

I hope, though, that what I've described above at least gives you some pointers on which direction to try.


Phil Cochrane

Hi Abhik,

I found this supporting paper, which helped define the process. I've implemented a simple Java component that generates the fragments as SMILES strings. I've used the Dynamic Java (on Server) component, so the Java code is embedded for you to look at (though I haven't really documented what I've done).

The component takes the incoming molecule and generates an array of SMILES strings for the child nodes. It uses the same SMARTS string that the paper uses to find the bonds to break. The order of the child nodes differs from the paper because the SMILES generation differs: the paper uses Daylight, whereas this component uses Pipeline Pilot. The nodes themselves are the same, though, and match Figure 1 in the original paper.


This code does not record parent relationships, nor does it add the other properties that the paper mentions, but I'm sure that those could be added.

I hope that's useful for you.




This is certainly helpful, Phil. Much appreciated.