How good is the new SIRIUS? (update)

With the release of the new SIRIUS version (and, behind the scenes, a new version of CSI:FingerID), we want to share some numbers so you know whether it was worth the hassle. We use CASMI 2016 data so that you can compare our results against those of other methods. We use the candidate structures provided as part of category 2, automated structural identification. See also Schymanski et al. (J Cheminf 2017).

For molecular formula identification, we use both isotope patterns and fragmentation patterns. (Isotope pattern data were released after the contest.) We consider all molecular formulas: we will never tire of stressing that if you limit molecular formulas to those found in some structure database, you will never ever find a new molecular formula. We find that SIRIUS 4 identifies the correct molecular formula for 91.3% of the challenges.

Next, we use CSI:FingerID to identify the compound structures. During the last year, the CASMI 2016 data have found their way into the CSI:FingerID training data. We know that CSI:FingerID has excellent identification performance if a spectrum for this structure is present in the training data (expect anything between 70% and 95%). But this is not challenging, and it also does not tell us how well SIRIUS and CSI:FingerID can identify truly novel compounds.

To this end, we excluded all structures from CASMI 2016 from the training data. Hence, any structure is novel, in the sense that CSI:FingerID has never before seen any MS/MS data for this structure. SIRIUS 4 and the new CSI:FingerID reach 37.8% correct identifications for novel structures and positive ion mode; this is significantly better than the 27.6% reported in the CASMI paper (Schymanski et al., J Cheminf 2017). In addition, we can now also process challenges in negative ion mode, thanks to the training data available in NIST; here, we reach 28.4% correct identifications for novel structures.

These numbers are for “unambiguously correctly” identified structures: Sometimes, two candidate structures have exactly the same molecular fingerprint and are scored with exactly the same score. If we include these “ambiguously correctly” identified structures, numbers increase to 40.2% and 30.9%, respectively.
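To make the distinction concrete, here is a minimal sketch of how such rates could be counted. It is not our evaluation code: the function names and the data layout (one dictionary of candidate scores per challenge) are assumptions made for illustration only.

```python
from typing import Dict, Hashable, List, Tuple


def identification_status(scores: Dict[Hashable, float], correct: Hashable) -> str:
    """Classify one challenge (hypothetical helper).

    'unambiguous': the correct structure alone has the top score.
    'ambiguous':   the correct structure shares the top score with others.
    'incorrect':   some other structure scores strictly higher.
    """
    top = max(scores.values())
    if scores[correct] < top:
        return "incorrect"
    # Count candidates tied at exactly the top score.
    ties = sum(1 for s in scores.values() if s == top)
    return "unambiguous" if ties == 1 else "ambiguous"


def identification_rates(
    challenges: List[Tuple[Dict[Hashable, float], Hashable]]
) -> Tuple[float, float]:
    """Return (unambiguous rate, rate including ambiguous hits)."""
    statuses = [identification_status(scores, correct) for scores, correct in challenges]
    n = len(statuses)
    unambiguous = statuses.count("unambiguous") / n
    ambiguous = statuses.count("ambiguous") / n
    return unambiguous, unambiguous + ambiguous
```

The second number can never be smaller than the first, since it only adds the challenges where the correct structure is tied for the top score.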

By the way: The idea of a challenge is to be challenging. That is why category 2 of CASMI 2016 uses candidate lists directly extracted from ChemSpider. In practice, you will probably use smaller candidate lists, which makes identification easier and improves identification rates: For example, unambiguous correct identifications for novel structures and positive ion mode increase to 71.7% if we search in a biomolecular structure database with “only” 0.5 million structures.

To cut a long story short: SIRIUS 4 and CSI:FingerID provide outstanding performance for molecular formula and structure identification. As mentioned in the release news, SIRIUS 4 is also much faster than before: On a set of 1533 GNPS compounds, we observed a 36-fold speedup.

Update: Two bugs had to be corrected in our evaluation. Minor: We chose the wrong parameter set (TOF vs. Orbitrap), resulting in small ID rate changes. Major: We did not search in the biomolecule DB but rather in “biomolecules plus MINEs”. Fixing this resulted in pretty dramatic changes.

SIRIUS 4.0 released

A new version of SIRIUS and CSI:FingerID is available for download. SIRIUS 4.0 supports structure elucidation for negative ion mode spectra and computes fragmentation trees up to 40 times faster. For positive ion mode spectra, CSI:FingerID shows an improved identification rate due to new training data and several methodological enhancements.
We also fixed a lot of bugs and think that CSI:FingerID is now more stable.

You can download SIRIUS with CSI:FingerID here.

Call for Tutorials: SIRIUS and CSI:FingerID

During the Dagstuhl seminar on Computational Metabolomics a few days ago, there was a session on “bridging the gap” between method developers and experimentalists, which resulted in a long list of what method developers should do to bridge the gap (develop GUIs, provide example data, manuals, tutorials, etc.) but rather little that experimentalists can do.

Now, let us turn the tables: SIRIUS and CSI:FingerID come with a nice GUI, manual etc. But undoubtedly, the best people to explain how to use our software in practice are you, the users! To this end, we ask you to send us any tutorial material you have (maybe a video tutorial?) that may help fellow experimentalists get acquainted with the software more quickly; you may also point out what the software cannot do, and how you deal with it; and so on.

Please leave a comment below with the link (I hope that works) or, even better, send us a link () that we can put on our soon-to-come training material web page!

PS: The first video tutorial is already out there: Louis-Felix Nothias-Scaglia (UCSD) has prepared this video tutorial on YouTube that explains how to combine Optimus (OpenMS), SIRIUS plus CSI:FingerID and GNPS. Many thanks to him! The tutorial is potentially already outdated as development is progressing fast, but isn’t that nice?!

CASMI 2017 results are out

CASMI (Critical Assessment of Small Molecule Identification) 2017 results are finally out! Kai participated with SIRIUS/CSI:FingerID, and won categories 1 (Best Structure Identification on Natural Products), 2 (Best Automatic Structural Identification – In Silico Fragmentation Only), and 4 (Best Automatic Candidate Ranking). He did not participate in category 3.

For category 4 (in silico methods for searching in molecular structure databases), CSI:FingerID correctly identified 66 of 198 compounds (33.3%). This is more than six times what the best non-CSI:FingerID contestant reached. (Submissions containing “IOKR” in their name correspond to different variants of the Input Output Kernel Regression version of CSI:FingerID.)

Unfortunately, CASMI 2017 data did not include isotope patterns. It appears that in many cases, SIRIUS did not rank the correct molecular formula as the top candidate. This resulted in several cases where the correct structure was excluded due to the “wrong” molecular formula. We will investigate how many compounds would have been correctly identified if isotope pattern data had been available.
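The effect is easy to see in a toy sketch: if structure candidates are only retrieved for the top-ranked molecular formula, a wrong top formula removes the correct structure before any structure scoring happens. The function name, the `top_k` parameter and the data below are made up for illustration and are not part of SIRIUS or of our CASMI submission.

```python
from typing import Dict, List


def structure_candidates(
    formula_ranking: List[str],
    structures_by_formula: Dict[str, List[str]],
    top_k: int = 1,
) -> List[str]:
    """Collect candidate structures for the top_k ranked formulas (hypothetical helper)."""
    candidates: List[str] = []
    for formula in formula_ranking[:top_k]:
        # A formula without any known structure contributes no candidates.
        candidates.extend(structures_by_formula.get(formula, []))
    return candidates


# Toy data: the correct formula is ranked second only.
ranking = ["C7H8N2O", "C6H12O6"]
database = {
    "C7H8N2O": ["structure_1", "structure_2"],
    "C6H12O6": ["correct_structure"],
}

only_top = structure_candidates(ranking, database)
top_two = structure_candidates(ranking, database, top_k=2)
```

With `top_k=1`, `only_top` does not contain `correct_structure`, no matter how well the structure scoring works; it only reappears in `top_two`.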

SIRIUS+CSI:FingerID 3.5 released

Our new version, SIRIUS 3.5, comes with several advancements. Download and use it here.
We have a new overview tab for CSI:FingerID hits which displays results of structure search for multiple molecular formulas.
You can examine the predicted fingerprint of each compound (and molecular formula) independently of any database.
We now offer the possibility to create and search in custom structure databases.
Besides, we have a new Bayesian network scoring function for CSI:FingerID which considers dependencies between different molecular properties.
This and much more.

SIRIUS hotfix

We released a patch for SIRIUS+CSI:FingerID (now version 3.4.1) that fixes some critical bugs. We recommend that everybody download the newest version of SIRIUS.

SIRIUS+CSI:FingerID 3.4 released

A new SIRIUS version is available. We included new features to enable a more intuitive workflow. We hope you’ll like it. Download and use it here.
We provide element prediction using isotope patterns. CSI:FingerID now predicts more molecular properties, which improves structure identification.
Besides, we fundamentally changed the structure of the result output generated by the command line tool to its final version. This means you might have to adjust your workflow.

Sirius release 3.4 is coming soon

During the next few days, we will release SIRIUS 3.4.

This will be a major release containing several changes on the command line interface.
You may have to adjust existing scripts to get them to work with the finalized command line interface.

SIRIUS+CSI:FingerID for Mac OSX

SIRIUS+CSI:FingerID is now available for Mac. You can download it here. A few highlights of the new version:

  • Searching your tandem mass spectra in molecular databases using CSI:FingerID
  • Restrict the SIRIUS molecular formula identification to formulas appearing in molecular databases – or search through the whole space of possible formulas. It’s up to you.
  • Sort your identification results by confidence such that more reliable identifications (e.g. from high quality spectra or easily recognizable compounds) are separated from bogus identifications.
  • Predict structural features and substructures from tandem mass spectra and visualize them in your candidate structure list.

Don’t hesitate to give us your feedback, report bugs or suggest new features. Just write to

SIRIUS+CSI:FingerID release

SIRIUS+CSI:FingerID leaves the alpha state. You can download it here. A few highlights of the new version:

  • Searching your tandem mass spectra in molecular databases using CSI:FingerID
  • Restrict the SIRIUS molecular formula identification to formulas appearing in molecular databases – or search through the whole space of possible formulas. It’s up to you.
  • Sort your identification results by confidence such that more reliable identifications (e.g. from high quality spectra or easily recognizable compounds) are separated from bogus identifications.
  • Predict structural features and substructures from tandem mass spectra and visualize them in your candidate structure list.

Don’t hesitate to give us your feedback, report bugs or suggest new features. Just write to

CSI:FingerID: Best automated method in CASMI contest

CSI:FingerID participated in this year’s CASMI (Critical Assessment of Small Molecule Identification) contest for automated methods identifying compounds solely by mass spectral data without additional meta data (category 2). IOKR, the new prediction method within CSI:FingerID, won this category. The standard CSI:FingerID prediction method ranked second. For positive ionization, CSI:FingerID identified more than twice as many compounds as any other method.

We are pleased that SIRIUS and CSI:FingerID were successfully used by many other contestants in category 1.

You can try CSI:FingerID on http://www.csi-fingerid.org/. A command line version as well as a user interface which allows batch processing of MS/MS data will be released soon. We will also integrate the new IOKR prediction method into the CSI:FingerID web interface.

SIRIUS 3.0 release

We present SIRIUS 3.0, a Java-based software for de novo identification of metabolites using single and tandem mass spectrometry.
SIRIUS uses isotope pattern analysis for detecting the molecular formula and further analyses the fragmentation pattern of a compound using fragmentation trees.

Version 3.0 is a complete rewrite of our previous software. It uses faster algorithms for the fragmentation tree computation and a revised scoring based on Bayesian statistics.

A command line tool is available here. We will add a graphical user interface soon.