The International Max Planck Research School at the MPI for Chemical Ecology in Jena is looking for PhD students, and one of the projects is on “making SIRIUS and CSI:FingerID GCMS-ready”. The IMPRS pays for only a half position, but this can be supplemented by funding from our chair. We are searching for motivated candidates from bioinformatics, cheminformatics and computer science who want to work in this exciting, quickly evolving interdisciplinary field. Please see here for details, and apply here. Application deadline is May 16th, 2018. Contact Sebastian in case you have questions.
Sebastian Böcker
How good is the new SIRIUS? (update)
With the release of the new SIRIUS version (and, behind the scenes, a new version of CSI:FingerID), we want to share some numbers so you know if it was worth the hassle. We use CASMI 2016 data, to allow you to compare our results against those of other methods. We use the candidate structures provided as part of category 2, automated structural identification. See also Schymanski et al. (J Cheminf 2017).
For molecular formula identification, we use both isotope patterns and fragmentation patterns. (Isotope pattern data were released after the contest.) We consider all molecular formulas: we never tire of stressing that if you limit molecular formulas to those found in some structure database, you will never ever find a new molecular formula. We find that SIRIUS 4 identifies the correct molecular formula for 91.3% of the challenges.
Next, we use CSI:FingerID to identify the compound structures. During the last year, the CASMI 2016 data have found their way into the CSI:FingerID training data. We know that CSI:FingerID has excellent identification performance if a spectrum for this structure is present in the training data (expect anything between 70% and 95%). But this is not challenging, and also does not tell us how well SIRIUS and CSI:FingerID can identify truly novel compounds.
To this end, we excluded all structures from CASMI 2016 from the training data. Hence, any structure is novel, in the sense that CSI:FingerID has never before seen any MS/MS data for this structure. SIRIUS 4 and the new CSI:FingerID reach 37.8% correct identifications for novel structures and positive ion mode; this is significantly better than the 27.6% reported in the CASMI paper (Schymanski et al., J Cheminf 2017). In addition, we can now also process challenges in negative ion mode, thanks to the training data available in NIST; here, we reach 28.4% correct identifications for novel structures.
These numbers are for “unambiguously correctly” identified structures: Sometimes, two candidate structures have exactly the same molecular fingerprint and are scored with exactly the same score. If we include these “ambiguously correctly” identified structures, numbers increase to 40.2% and 30.9%, respectively.
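The distinction between “unambiguously” and “ambiguously” correct identifications can be made concrete with a small sketch. This is not the evaluation code we used; the function names, the score dictionaries and the example data are all hypothetical, and we simply assume that each query comes with one score per candidate structure.

```python
# Minimal sketch (hypothetical names and data, not the actual evaluation code):
# a query counts as "unambiguous" if the correct structure alone has the top
# score, and as "ambiguous" if it shares the top score with other candidates.

def classify_identification(scores, correct):
    """Classify one query as 'unambiguous', 'ambiguous', or 'wrong'.

    scores  -- dict mapping candidate structure id -> score
    correct -- id of the true structure
    """
    best = max(scores.values())
    # All candidates that tie for the best score (exact equality, as in the text).
    top = [cand for cand, score in scores.items() if score == best]
    if correct not in top:
        return "wrong"
    return "unambiguous" if len(top) == 1 else "ambiguous"


def identification_rates(queries):
    """Return (strict rate, rate including ambiguous hits) over all queries."""
    outcomes = [classify_identification(scores, correct) for scores, correct in queries]
    n = len(outcomes)
    unambiguous = outcomes.count("unambiguous")
    ambiguous = outcomes.count("ambiguous")
    return unambiguous / n, (unambiguous + ambiguous) / n


# Made-up toy data: one unambiguous hit, one tie, two misses.
example_queries = [
    ({"A": 0.9, "B": 0.5}, "A"),
    ({"A": 0.7, "B": 0.7}, "B"),
    ({"A": 0.8, "B": 0.2}, "B"),
    ({"A": 0.1, "B": 0.6, "C": 0.3}, "C"),
]

strict_rate, lenient_rate = identification_rates(example_queries)
print(f"unambiguous: {strict_rate:.1%}, including ambiguous: {lenient_rate:.1%}")
```

The first number corresponds to the stricter way of counting, the second to the more lenient one that also accepts ties at the top rank.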
By the way: The idea of a challenge is to be challenging. That is why category 2 of CASMI 2016 uses candidate lists directly extracted from ChemSpider. In application, you will probably use smaller candidate lists, which will make identification easier and improve identification rates: For example, unambiguous correct identifications for novel structures and positive ion mode increase to 71.7% if we search in a biomolecular structure database with “only” 0.5 million structures.
To cut a long story short: SIRIUS 4 and CSI:FingerID provide outstanding performance for molecular formula and structure identification. As mentioned in the release news, SIRIUS 4 is also much faster than before: On a set of 1533 GNPS compounds, we observed a 36-fold speedup.
Update: Two bugs had to be corrected in our evaluation. Minor: We chose the wrong parameter set (TOF vs. Orbitrap), resulting in small ID rate changes. Major: We did not search in the biomolecule DB but rather in “biomolecules plus MINEs”. Fixing this resulted in pretty dramatic changes.
One million compound queries for CSI:FingerID
The CSI:FingerID web service has now processed more than one million compound queries. (In fact, we are already 50k queries beyond that.) We are thrilled, and hope that you made some interesting discoveries. 😉
Juho Rousu will visit us Aug to Sep 2018
Juho Rousu (Aalto University, Finland) will visit our group August to September 2018. We are excited to have him with us again: During his last visit, we jointly laid the foundations for CSI:FingerID; let us see what comes out this time!
Call for Tutorials: SIRIUS and CSI:FingerID
During the Dagstuhl seminar on Computational Metabolomics a few days ago, there was a session on “bridging the gap” between methods developers and experimentalists, which resulted in a long list of what method developers should do to bridge the gap (develop GUIs, provide example data, manuals, tutorials, etc) but rather little that experimentalists can do.
Now, let us turn the tables: SIRIUS and CSI:FingerID come with a nice GUI, manual etc. But undoubtedly, the people best placed to explain how to use our software in practice are you, the users! To this end, we ask you to send us any tutorial material you have (maybe a video tutorial?) that may help fellow experimentalists get acquainted with the software more quickly; you may also point out what the software cannot do, and how you deal with it; and so on.
Please leave a comment below with the link (I hope that works) or, even better, send us a link (sirius@uni-jena.de) that we can put on our soon-to-come training material web page!
PS: The first video tutorial is already out there: Louis-Felix Nothias-Scaglia (UCSD) has prepared this video tutorial on YouTube that explains how to combine Optimus (OpenMS), SIRIUS plus CSI:FingerID and GNPS. Many thanks to him! The tutorial is potentially already outdated as development is progressing fast, but isn’t that nice?!
Our paper on FDR estimation has finally appeared in Nature Communications
Our paper “Significance estimation for large scale metabolomics annotations by spectral matching” (joint work with the group of Pieter Dorrestein) has finally appeared in Nature Communications; you can find it here.
CASMI 2017 results are out
CASMI (Critical Assessment of Small Molecule Identification) 2017 results are finally out! Kai participated with SIRIUS/CSI:FingerID, and won categories 1 (Best Structure Identification on Natural Products), 2 (Best Automatic Structural Identification – In Silico Fragmentation Only), and 4 (Best Automatic Candidate Ranking). He did not participate in category 3.
For category 4 (in silico methods for searching in molecular structure databases), CSI:FingerID correctly identified 66 of 198 compounds (33.3%). This is more than six times what the best non-CSI:FingerID contestant reached. (Submissions containing “IOKR” in their name correspond to different variants of the Input Output Kernel Regression version of CSI:FingerID.)
Unfortunately, CASMI 2017 data did not include isotope patterns. It appears that in many cases, SIRIUS was not able to rank the correct molecular formula as the top candidate. This resulted in several cases where the correct structure was excluded due to the “wrong” molecular formula. We will investigate how many compounds would have been correctly identified if isotope pattern data had been available.
CSI:FingerID has processed 500,000 query compounds
The CSI:FingerID web service has just passed the mark of processing data from 500,000 query compounds — congratulations to CSI:FingerID, and thank you for your interest in our tools! (Be reminded that CSI:FingerID should be accessed via the SIRIUS application, not via the web page.)
SIRIUS license change to GNU GPL
Since version 3.4, SIRIUS is licensed under the GNU General Public License (GNU GPL). If you need SIRIUS under a different license, please contact us.
Trinity workflow video tutorial
SIRIUS and CSI:FingerID are part of the Trinity workflow, and Louis-Felix Nothias-Scaglia (UCSD) has prepared a wonderful video tutorial on how to use Optimus, SIRIUS & CSI:FingerID and GNPS together – enjoy!
Kerstin awarded Wissenschaftspreis für anwendungsorientierte Abschlussarbeit
Today, Kerstin was awarded the “Wissenschaftspreis für anwendungsorientierte Abschlussarbeit” (scientific award for application-oriented thesis) from the Wirtschaftsförderungsgesellschaft mbH and the FSU Jena, for her PhD thesis “Small molecules: From mass spectral fragmentation data to structural elucidation”. Congratulations!
Meet Sebastian at the OpenMS user meeting 2016
Sebastian will give a talk on CSI:FingerID and SIRIUS at the OpenMS user meeting 2016 in Tübingen, 21-23 September 2016.
Meet Sebastian at the de.NBI Summer School 2016
Sebastian will give a talk at the de.NBI Summer School 2016: From Big Data to Big Insights (Dagstuhl, 26-30 September 2016).
Guest lectures by Sebastian in Halle
On Monday, January 25, 2016, Sebastian will give two lectures on “Genome Rearrangements and Gene Clusters” at the University of Halle.
The lectures take place from 10:15 to 12:00 in seminar room 1.30 and from 14:00 to 15:15 in seminar room 0.04 of the Institut für Informatik (Von-Seckendorff-Platz 1, Halle).
Everyone interested is cordially invited.
We have published a paper in PNAS
Our paper “Searching molecular structure databases with tandem mass spectra using CSI:FingerID” has just appeared in the online issue of Proceedings of the National Academy of Sciences USA. Also see the press release by the Friedrich-Schiller-University Jena (here for German). The metabolite search engine CSI:FingerID is available from http://www.csi-fingerid.org/. This is joint work with Juho Rousu and his group at Aalto University (Finland).
PNAS paper scheduled for Sep 21
We are thrilled that our paper in “Proceedings of the National Academy of Sciences USA” is scheduled for the Sep 21, 2015 online issue. More details then.
Meet Sebastian at Metabolomics 2015
Sebastian will attend the Conference of the Metabolomics Society 2015 in San Francisco, and give a talk on “Searching molecular structure databases with tandem mass spectra using CSI:FingerID”. On Tuesday after 6 pm, Steffen Neumann and Sebastian will head the BoF meeting on Computational Mass Spectrometry.
KinderUni on Friday, June 19
On Friday, June 19, 2015, Sebastian will give a talk at the KinderUni Jena (children's university) titled “Auch Computer müssen aufräumen” (“Computers have to tidy up, too”). The talk starts at 16:00 in lecture hall 7 (Hörsaal 7) at Carl Zeiss Straße 3, next to the Mensa.
Sebastian will give a talk at the Science Pub in Jena
Sebastian will give a talk at the Science Pub in Jena: “Was ist Bioinformatik und kann man mit Dreck Krebs heilen?” (“What is bioinformatics, and can you cure cancer with dirt?”). The talk is on Monday, April 13, 2015 at 20:00, Cafe Wagner.