We are happy to announce that SIRIUS 4.7.0 is now available for download. This release is all about bug fixes and performance optimization. To all who had problems with the ILP solvers, a freezing GUI, high memory consumption or long running times: this update should make your life much easier. For a full list of changes, see the Changelog.
We have also integrated the option to compute fragmentation trees using only our heuristic algorithm (no ILP involved) to speed up molecular formula identification for high-mass compounds. Together with timeouts applied at the compound level, this should make processing large datasets much more feasible.
In short: CANOPUS is a computational tool for systematic compound class annotation. It uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. From the machine learning perspective, the interesting part is that different levels of the neural network are trained using different data (heterogeneous training). CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available, and even predicts classes completely lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four (rather advanced) baseline methods. We used CANOPUS to investigate the effect of microbial colonization in the mouse digestive system, to analyze the chemodiversity of different Euphorbia plants, and for the structural elucidation of a novel marine natural product.
Full citation: K. Dührkop, L.-F. Nothias, M. Fleischauer, R. Reher, M. Ludwig, M. A. Hoffmann, D. Petras, W. H. Gerwick, J. Rousu, P. C. Dorrestein, and S. Böcker. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat Biotechnol, 2020. https://doi.org/10.1038/s41587-020-0740-8
In short: Annotating the molecular formula of a small molecule is the first step towards its structural elucidation but remains highly challenging, particularly for “large compounds” above 500 Daltons. ZODIAC is a network-based algorithm for the de novo annotation (no database needed) of molecular formulas, and it processes complete experimental LC-MS/MS runs. (No metabolite is an island.) In comparison to SIRIUS, previously best-of-class for this task, ZODIAC reduces the rate of false annotations roughly by half. And sometimes, much more…
Full citation: M. Ludwig, L.-F. Nothias, K. Dührkop, I. Koester, M. Fleischauer, M.A. Hoffmann, D. Petras, F. Vargas, M. Morsy, L. Aluwihare, P.C. Dorrestein, and S. Böcker. Database-independent molecular formula annotation using Gibbs sampling through ZODIAC. Nat Mach Intell 2:629–641, 2020.
Some of you might have noticed problems with the prediction of negative ion mode data within the last few days. These problems should be fixed now. In addition, SIRIUS 4.0.1 build 8 fixes some problems with the workspace export.
K. Dührkop, M. Fleischauer, M. Ludwig, A. A. Aksenov, A. V. Melnik, M. Meusel, P. C. Dorrestein, J. Rousu, and S. Böcker, “Sirius 4: Turning tandem mass spectra into metabolite structure information,” Nature Methods, doi 10.1038/s41592-019-0344-8, 2019.