Marcus (with the help of The People) wrote a not-too-short, not-too-shabby HowTo document on, well, how to use SIRIUS 4 and CSI:FingerID. This will be published as a book chapter in a few months, but check out a preprint here.
Some of you might have noticed problems with the
The International Max Planck Research School at the Max Planck Institute for Chemical Ecology in Jena is looking for PhD students. One of the projects is from our group on “making SIRIUS and CSI:FingerID GCMS-ready”. Deadline is May 24, 2019.
SIRIUS and CSI:FingerID are the best-of-class tools for MS-based compound identification in metabolomics, natural products and related fields. More than one million compound queries have been submitted to our web service, from over 3000 users and 47 countries. See our recent publication in Nature Methods (Dührkop et al., 2019).
Currently, our tools can only process tandem mass spectrometry data; extending them to Gas Chromatography Electron Ionization appears natural, but comes with numerous challenging problems from algorithmics and machine learning. This will be done in cooperation with the group of Georg Pohnert, see his recent publication in Nature (Thume et al., 2018).
We are searching for motivated candidates from bioinformatics, machine learning, cheminformatics and/or computer science who want to work in this exciting, quickly evolving interdisciplinary field. Please contact Sebastian Böcker in case of questions.
Half a position is being paid by the IMPRS; this will be supplemented by funding from our chair to 2/3 TV-L E13. (Note that the cost of living in East Germany is still considerably lower than in West Germany.) Jena is a beautiful city and wine is grown in the region: https://www.youtube.com/watch?v=DQPafhqkabc.
SIRIUS & CSI:FingerID: https://bio.informatik.uni-jena.de/software/sirius/
Literature: https://bio.informatik.uni-jena.de/publications/ and https://bio.informatik.uni-jena.de/textbook-algoms/
- K. Dührkop, M. Fleischauer, M. Ludwig, A. A. Aksenov, A. V. Melnik, M. Meusel, P. C. Dorrestein, J. Rousu, and S. Böcker, “Sirius 4: Turning tandem mass spectra into metabolite structure information,” Nature Methods, doi 10.1038/s41592-019-0344-8, 2019.
View-only access to the paper is available here.
Speaking of SIRIUS and CSI:FingerID gathering interest in the community: the CSI:FingerID web service has processed more than ten million compound queries. Awesome!
With SIRIUS and CSI:FingerID gathering interest in the community, we are thinking about a SIRIUS and CSI:FingerID user meeting (a SIRIUS user meeting, so to say) in Jena. This would be a 2-3 day get-together with the possibility to show what you are doing with our tools, discuss with the developers, give us feedback on what is SIRIUSly needed etc. We are open to suggestions.
But most importantly: Are you interested in such a meeting? Would you come to Jena for 2-3 days? When would be a good time? (September is the default, but this is usually packed.)
In case you are interested, please let us know. You can leave your comment below, but please also send an email to the SIRIUS email address.
A new version of SIRIUS 4 is available for download.
SIRIUS 4.0.1 brings many bugfixes, user interface polishing and improved stability of the CSI:FingerID backend.
- SIRIUS 4.0.1 now supports Java 9 and higher
- The structures used to train CSI:FingerID are now available via the web service:
See our changelog for further details.
You can download SIRIUS with CSI:FingerID here.
We have to shut down the CSI:FingerID web service. We will restart the service as soon as the AC is fixed.
UPDATE: All up and running again.
On July 8, 2018, the CSI:FingerID web service passed five million compound queries. Awesome!
Larson et al. (J. Am. Soc. Mass Spectrom., 2018) have applied CSI:FingerID to dopant-assisted atmospheric pressure chemical ionization (dAPCI) gas chromatography mass spectrometry (GC-MS) data. They identified almost three times as many compounds with SIRIUS and CSI:FingerID as when searching in the NIST spectral library. Find their study here.
On May 9, 2018, the CSI:FingerID web service passed two million compound queries. Nice.
We have fixed a bug in SIRIUS when analyzing compounds in negative ion mode. These were wrongly treated as intrinsically charged. If you have analyzed negative ion mode data with SIRIUS 4 and CSI:FingerID, you might want to reanalyze the data with the newest version.
Some of you might have experienced problems reaching the CSI:FingerID web service on Thursday, April 12, 2018. The reason is: After a “relatively quiet” Wednesday with “only” 120k compound queries (irony warning), the CSI:FingerID web service had to handle a real-world stress test on Thursday, with 260k compound queries submitted on a single day. That is 3 compound queries per second on average.
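For those who like to check the numbers: the average rate follows directly from dividing the daily query count by the number of seconds in a day. A minimal sketch in Python (the function name is ours, purely for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_queries_per_second(queries_per_day: int) -> float:
    """Average query rate, assuming the load is spread over one full day."""
    return queries_per_day / SECONDS_PER_DAY


# Daily totals as quoted in the post (rounded to the nearest 10k there).
wednesday_rate = average_queries_per_second(120_000)
thursday_rate = average_queries_per_second(260_000)

print(f"Wednesday: {wednesday_rate:.2f} queries/s")
print(f"Thursday:  {thursday_rate:.2f} queries/s")
```

Note that this is an average over 24 hours; the peak rate during working hours was presumably higher.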
This brought our job database into some trouble: The database runs on a redundant twin server and is regularly mirrored between the two servers, in case one of them is hosed. Unfortunately, the log files used for mirroring jobs became too big, stalling the web server.
The problem persisted until noon on Friday the 13th. Bad luck.
We are sorry if this caused any inconvenience. All systems should be up and running again.
As a courtesy towards other users: In case you want to submit complete databases with hundreds of runs, it would be great if you could distribute your computations over the course of several days. Also, you might want to contact us in advance, so that we do not accidentally block you.
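One way to spread a large submission over several days is to split the list of runs into daily batches before submitting anything. The following Python sketch only does the planning; it does not call SIRIUS or the web service. The file names, the start date and the cap of 100 runs per day are made up for illustration and are not a limit imposed by the web service.

```python
from datetime import date, timedelta


def schedule_batches(run_names, max_runs_per_day, start):
    """Assign runs to consecutive days, at most max_runs_per_day per day."""
    if max_runs_per_day < 1:
        raise ValueError("max_runs_per_day must be at least 1")
    schedule = {}
    for index, run in enumerate(run_names):
        # Integer division maps runs 0..k-1 to day 0, k..2k-1 to day 1, etc.
        day = start + timedelta(days=index // max_runs_per_day)
        schedule.setdefault(day, []).append(run)
    return schedule


# Hypothetical input: 250 runs, planned at 100 runs per day.
runs = [f"run_{i:03d}.ms" for i in range(250)]
plan = schedule_batches(runs, max_runs_per_day=100, start=date(2018, 4, 16))

for day, batch in plan.items():
    print(day.isoformat(), len(batch), "runs")
```

Each day's batch can then be submitted separately, by whatever means you normally use.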
By the way: We have obviously passed 1.5 million compound queries after these two days.
The International Max Planck Research School at the MPI for Chemical Ecology in Jena is looking for PhD students, and one of the projects is on “making SIRIUS and CSI:FingerID GCMS-ready”. Only a half position is being paid by the IMPRS, but this can be supplemented by funding from our chair. We are searching for motivated candidates from bioinformatics, cheminformatics and computer science who want to work in this exciting, quickly evolving interdisciplinary field. Please see here for details, and apply here. Application deadline is May 16th, 2018. Contact Sebastian in case you have questions.
We found a major bug in the web service of SIRIUS 3 which can also affect the stability of the new SIRIUS 4. Therefore we decided to shut down the web service of SIRIUS 3 immediately.
Please contact us if you need to finish work that can only be done with SIRIUS 3. We will try to find a solution then.
With the release of the new SIRIUS version (and, behind the scenes, a new version of CSI:FingerID), we want to share some numbers so you know if it was worth the hassle. We use CASMI 2016 data, to allow you to compare our results against those of other methods. We use the candidate structures provided as part of category 2, automated structural identification. See also Schymanski et al. (J Cheminf 2017).
For molecular formula identification, we use both isotope patterns and fragmentation patterns. (Isotope pattern data were released after the contest.) We consider all molecular formulas: we will never tire of stressing that if you limit molecular formulas to those found in some structure database, you will never ever find a new molecular formula. We find that SIRIUS 4 identifies the correct molecular formula for 91.3% of the challenges.
Next, we use CSI:FingerID to identify the compound structures. During the last year, the CASMI 2016 data have found their way into the CSI:FingerID training data. We know that CSI:FingerID has excellent identification performance if a spectrum for this structure is present in the training data (expect anything between 70% and 95%). But this is not challenging, and also does not tell us how well SIRIUS and CSI:FingerID can identify truly novel compounds.
To this end, we excluded all structures from CASMI 2016 from the training data. Hence, any structure is novel, in the sense that CSI:FingerID has never before seen any MS/MS data for this structure. SIRIUS 4 and the new CSI:FingerID reach 37.8% correct identifications for novel structures and positive ion mode; this is significantly better than the 27.6% reported in the CASMI paper (Schymanski et al., J Cheminf 2017). In addition, we can now also process challenges in negative ion mode, thanks to the training data available in NIST; here, we reach 28.4% correct identifications for novel structures.
These numbers are for “unambiguously correctly” identified structures: Sometimes, two candidate structures have exactly the same molecular fingerprint and are scored with exactly the same score. If we include these “ambiguously correctly” identified structures, numbers increase to 40.2% and 30.9%, respectively.
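To make the distinction concrete, here is a small Python sketch of how such a tally could be done. It reflects our reading of the definitions above, not the actual evaluation code: a challenge counts as “unambiguous” if the correct structure alone has the top score, and as “ambiguous” if it shares the top score with at least one other candidate. The candidate names and scores are toy values, not CASMI data.

```python
from collections import Counter


def classify_identification(scores, correct):
    """Classify one challenge as 'unambiguous', 'ambiguous' or 'wrong'.

    scores:  dict mapping candidate id -> score (higher is better)
    correct: id of the true structure
    """
    if not scores:
        return "wrong"
    best = max(scores.values())
    # All candidates tied at exactly the best score.
    top = [cand for cand, score in scores.items() if score == best]
    if correct not in top:
        return "wrong"
    return "unambiguous" if len(top) == 1 else "ambiguous"


# Toy challenges: (candidate scores, correct candidate).
challenges = [
    ({"A": 0.9, "B": 0.5}, "A"),            # correct alone on top
    ({"A": 0.7, "B": 0.7, "C": 0.1}, "B"),  # correct tied for top
    ({"A": 0.4, "B": 0.8}, "A"),            # correct not on top
    ({"A": 0.6}, "A"),                      # single candidate
]

counts = Counter(classify_identification(s, c) for s, c in challenges)
total = len(challenges)
unambiguous_rate = counts["unambiguous"] / total
with_ties_rate = (counts["unambiguous"] + counts["ambiguous"]) / total

print(f"unambiguous only: {unambiguous_rate:.1%}")
print(f"including ties:   {with_ties_rate:.1%}")
```

The second rate can never be lower than the first, which is why the numbers only go up when ties are included.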
By the way: The idea of a challenge is to be challenging. That is why category 2 of CASMI 2016 uses candidate lists directly extracted from ChemSpider. In application, you will probably use smaller candidate lists, which will make identification easier and improve identification rates: For example, unambiguous correct identifications for novel structures and positive ion mode increase to 71.7% if we search in a biomolecular structure database with “only” 0.5 million structures.
To cut a long story short: SIRIUS 4 and CSI:FingerID provide outstanding performance for molecular formula and structure identification. As mentioned in the release news, it is also much faster than before: On a set of 1533 GNPS compounds, we observed a 36-fold speedup.
Update: Two bugs had to be corrected in our evaluation. Minor: We chose the wrong parameter set (TOF vs. Orbitrap), resulting in small ID rate changes. Major: We did not search in the biomolecule DB but rather in “biomolecules plus MINEs”. Fixing this resulted in pretty dramatic changes.
A new version of SIRIUS and CSI:FingerID is available for download. SIRIUS 4.0 supports structure elucidation for negative ion mode spectra and computes fragmentation trees up to 40 times faster. For positive ion mode spectra, CSI:FingerID shows an improved identification rate due to new training data and several methodical enhancements.
We also fixed a lot of bugs and think that CSI:FingerID now runs more stably.
You can download SIRIUS with CSI:FingerID here.
The CSI:FingerID web service has now processed more than one million compound queries. (In fact, we are already 50k queries beyond that.) We are thrilled, and hope that you made some interesting discoveries. 😉
During the Dagstuhl seminar on Computational Metabolomics a few days ago, there was a session on “bridging the gap” between methods developers and experimentalists, which resulted in a long list of what method developers should do to bridge the gap (develop GUIs, provide example data, manuals, tutorials, etc.) but rather little that experimentalists can do.
Now, let us turn the tables: SIRIUS and CSI:FingerID come with a nice GUI, manual etc. But undoubtedly, the people best placed to explain how to use our software in practice are you, the users! To this end, we ask you to send us any tutorial material you have (maybe, a video tutorial?) that may help fellow experimentalists to get acquainted with the software more quickly; you may also point out what the software cannot do, and how you deal with it; and so on.
Please leave a comment below with the link (I hope that works) or, even better, send us a link that we can put on our soon-to-come training material web page!
ps. The first video tutorial is already out there: Louis-Felix Nothias-Scaglia (UCSD) has prepared this video tutorial on YouTube that explains how to combine Optimus (OpenMS), SIRIUS plus CSI:FingerID and GNPS. Many thx to him! The tutorial is potentially already outdated as development is progressing fast, but isn’t that nice?!
CASMI (Critical Assessment of Small Molecule Identification) 2017 results are finally out! Kai participated with SIRIUS/CSI:FingerID, and won categories 1 (Best Structure Identification on Natural Products), 2 (Best Automatic Structural Identification – In Silico Fragmentation Only), and 4 (Best Automatic Candidate Ranking). He did not participate in category 3.
For category 4 (in silico methods for searching in molecular structure databases), CSI:FingerID correctly identified 66 of 198 compounds (33.3%). This is more than 6-fold of what the best non-CSI:FingerID contestant reached. (Submissions containing “IOKR” in their name, correspond to different variants of the Input Output Kernel Regression version of CSI:FingerID.)
Unfortunately, CASMI 2017 data did not include isotope patterns. It appears that in many cases, SIRIUS was not able to find the correct molecular formula among the top candidates. This resulted in several cases where the correct structure was excluded due to the “wrong” molecular formula. We will investigate how many compounds would have been correctly identified if isotope pattern data had been available.