Times are changing but order matters

Our preprint Times are changing but order matters: Transferable prediction of small molecule liquid chromatography retention times is finally available on chemRxiv! Congratulations to Fleming and all authors.

In short, we show that prediction of retention times is a somewhat ill-posed problem, as retention times vary substantially even for nominally identical condition. Next, we show that retention order is much better preserved; but even retention order changes when the chromatographic conditions (column, mobile phase, temperature) vary. Third, we show that we can predict a retention order index (ROI) based on the compound structure and the chromatographic conditions. And finally, we show that one can easily map ROIs to retention times, even for target datasets that the machine learning model has never seen during training. Even for chromatographic conditions (column, pH) that are not present in the training data. Even for “novel compounds” that were not in the training data. Even if all of those restrictions hit us simultaneously.

In principle, this means that transferable retention time prediction (for novel compounds and novel chromatographic conditions) is “solved” for reversed-phase chromatography. There definitely is room for improvement, but that mainly requires more training data, not better methods. For that, we need your help: We need you, the experimentalists, to provide your reference LC-MS datasets. It does not matter what compounds are in there, it does not matter what chromatographic setup you used; as long as it is references data measured from standards (meaning that you know exactly each compound in the dataset), and as long as you tell us the minimum metadata, this will improve prediction quality.

You can post your data wherever you like, but you can also upload them to RepoRT so that in the future, scientists from machine learning can easily access them. We now provide a web app that makes uploading really simple. If you are truly interested in retention time prediction, then this should be well-invested 30 min.