Marcus (with the help of The People) wrote a not-too-short, not-too-shabby HowTo document on, well, how to use SIRIUS 4 and CSI:FingerID. This will be published as a book chapter in a few months, but check out a preprint here.
I have just uploaded a new version (0.8.3) of the Lecture Notes on Algorithmic Mass Spectrometry. As expected, I did not have too much time to work on it (them?) during lecture time, which is luckily over now. This version consists of many small improvements. Also, Magnus Palmblad was kind enough to take an expert look through the isotope pattern sections. Unfortunately, the stuff that was missing from the previous version is still missing now…
Meet Marcus and Sebastian at the conference of the Metabolomics Society 2019.
On Monday and Tuesday, Marcus will present a poster (539) about SIRIUS 4 and turning tandem mass spectra into metabolite structure information.
The idea of the project is to integrate retention times from liquid chromatography into the SIRIUS/CSI:FingerID identification pipeline. Literally hundreds of papers have been published on the topic of retention time prediction, but all of them fail to provide predictions that are transferable across chromatography conditions and compound classes; see Héberger’s review (Journal of Chromatography A, 2007) where he speaks rather frankly about the malpractices of publishing such RT-prediction methods. On the other hand, retention times can indeed be used to further boost CSI:FingerID’s identification performance. Also, transferable retention prediction is not impossible, as we have shown here. The trick is not to try to predict retention time (which is extremely dependent on instrument parameters etc) but rather retention order.
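To illustrate why retention order is the more transferable quantity, here is a minimal sketch (not our actual method) that measures how often pairs of compounds elute in the same order on two chromatographic setups. The function name `order_agreement`, the compound labels and all retention times are made up for illustration only.

```python
from itertools import combinations


def order_agreement(rt_a, rt_b):
    """Fraction of compound pairs that elute in the same order in both runs.

    rt_a, rt_b: dicts mapping a compound label to its retention time.
    Only compounds measured in both runs are compared; tied pairs are skipped.
    """
    shared = sorted(set(rt_a) & set(rt_b))
    concordant = 0
    total = 0
    for x, y in combinations(shared, 2):
        diff_a = rt_a[x] - rt_a[y]
        diff_b = rt_b[x] - rt_b[y]
        if diff_a == 0 or diff_b == 0:
            continue  # a tie carries no order information
        total += 1
        if (diff_a > 0) == (diff_b > 0):
            concordant += 1
    return concordant / total if total else float("nan")


# Hypothetical retention times (minutes) on two different setups:
# the absolute values differ a lot, the elution order does not.
system_a = {"cmpd_1": 0.9, "cmpd_2": 3.1, "cmpd_3": 7.4, "cmpd_4": 15.2}
system_b = {"cmpd_1": 1.4, "cmpd_2": 5.6, "cmpd_3": 12.0, "cmpd_4": 27.5}

agreement = order_agreement(system_a, system_b)
print(f"pairwise order agreement: {agreement:.2f}")
```

In this toy example, no retention time carries over from one setup to the other, yet every pair of compounds keeps its order.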
We are searching for a qualified and motivated PhD student who wants to accept this challenge. (S)he should be knowledgeable in machine learning and preferably also bioinformatics in general; biochemistry knowledge is clearly also a plus. We believe that this can be the next big thing to further push CSI:FingerID’s performance. Please contact Sebastian or Kathrin in case you are interested and qualified.
The International Max Planck Research School at the Max Planck Institute for Chemical Ecology in Jena is looking for PhD students. One of the projects is from our group on “making SIRIUS and CSI:FingerID GCMS-ready”. Deadline is May 24, 2019.
SIRIUS and CSI:FingerID are the best-of-class tools for MS-based compound identification in metabolomics, natural products and related fields. More than one million compound queries have been submitted to our web service, from over 3000 users and 47 countries. See our recent publication in Nature Methods (Dührkop et al., 2019).
Currently, our tools can only process tandem mass spectrometry data; extending them to Gas Chromatography Electron Ionization appears natural, but comes with numerous challenging problems from algorithmics and machine learning. This will be done in cooperation with the group of Georg Pohnert, see his recent publication in Nature (Thume et al., 2018).
We are searching for motivated candidates from bioinformatics, machine learning, cheminformatics and/or computer science who want to work in this exciting, quickly evolving interdisciplinary field. Please contact Sebastian Böcker in case of questions.
Half a position is being paid by the IMPRS; this will be supplemented by funding from our chair to 2/3 TV-L E13. (Note that the cost of living in East Germany is still considerably lower than in West Germany.) Jena is a beautiful city and wine is grown in the region: https://www.youtube.com/watch?v=DQPafhqkabc.
SIRIUS & CSI:FingerID: https://bio.informatik.uni-jena.de/software/sirius/
Literature: https://bio.informatik.uni-jena.de/publications/ and https://bio.informatik.uni-jena.de/textbook-algoms/
Is it a script? Is it a textbook? Or maybe, lecture notes? See all the details at https://bio.informatik.uni-jena.de/textbook-algoms/!
Version 0.7 is now available and contains chapters on p-value calculation and decoy databases.
The previous version was not publicly announced because I hoped to get some feedback first; but somehow, it did not work out this way…
- K. Dührkop, M. Fleischauer, M. Ludwig, A. A. Aksenov, A. V. Melnik, M. Meusel, P. C. Dorrestein, J. Rousu, and S. Böcker, “Sirius 4: Turning tandem mass spectra into metabolite structure information,” Nature Methods, doi 10.1038/s41592-019-0344-8, 2019.
View-only access to the paper is available here.
Another Dagstuhl seminar on Computational Metabolomics will be held in January 2020. The seminar is filling up quickly: Invitations were sent out less than a month ago, but 25 people have already accepted! That is a lot, considering that there are still 10 months to go.
The title of the Dagstuhl seminar is “Computational Metabolomics: From Cheminformatics to Machine Learning”; it will be organized by Corey Broeckling, Emma Schymanski, Nicola Zamboni and myself. Unfortunately, it is invitation only. Two Dagstuhl seminars on related topics (Seminar 15492 in Nov/Dec 2015 and Seminar 17491 in Dec 2017) were already very successful.
Hope that we have a jolly good time in Dagstuhl!
With SIRIUS and CSI:FingerID gathering interest in the community, we are thinking about a SIRIUS and CSI:FingerID user meeting (a SIRIUS user meeting, so to say) in Jena. This would be a 2-3 day get-together with the possibility to show what you are doing with our tools, discuss with the developers, give us feedback on what is SIRIUSly needed etc. We are open to suggestions.
But most importantly: Are you interested in such a meeting? Would you come to Jena for 2-3 days? When would be a good time? (September is the default, but this is usually packed.)
In case you are interested, please let us know. You can leave your comment below, but please also send an email to the SIRIUS email address.
With the publication of the beam search variant of BCD supertrees (Fleischauer and Böcker, PeerJ 2018), this project has come to an end. BCD supertrees shows outstanding performance for a supertree method with guaranteed polynomial running time, and is usually on par with or even better than established supertree methods such as MRP or SuperFine. With the beam search, you can trade running time for supertree quality; but for input trees that contain branch lengths, even the “regular” BCD shows excellent performance.
We sincerely hope that someone will continue our work and, in particular, will integrate BCD supertrees into a divide-and-conquer strategy to improve the quality of phylogenetic reconstruction for very large trees. In (Fleischauer and Böcker, Mol Biol Evol 2017) we have shown that this is indeed possible (Fig. 2): Computation with RAxML gets faster and the tree quality is improved. Given BCD’s fast and guaranteed running times, this should be very interesting for large phylogenies with several thousand taxa: BCD requires only hours to compute a supertree with 5000+ taxa and, even more importantly, supertree quality does not deteriorate for such large datasets.
For us, this is it in phylogenetics — at least, for the moment. It has been a great experience with challenging and fascinating combinatorial problems!
ps. We gratefully acknowledge funding by Deutsche Forschungsgemeinschaft.
pps. The BCD code is available on GitHub.
Larson et al. (J. Am. Soc. Mass Spectrom., 2018) have applied CSI:FingerID to dopant-assisted atmospheric pressure chemical ionization (dAPCI) gas chromatography mass spectrometry (GC-MS) data. They identified almost three times as many compounds with SIRIUS and CSI:FingerID as when searching in the NIST spectral library. Find their study here.
On May 9, 2018, the CSI:FingerID web service passed two million compound queries. Nice.
We have fixed a bug in SIRIUS when analyzing compounds in negative ion mode. These were wrongly treated as intrinsically charged. If you have analyzed negative ion mode data with SIRIUS 4 and CSI:FingerID, you might want to reanalyze the data with the newest version.
Some of you might have experienced problems reaching the CSI:FingerID web service on Thursday, April 12, 2018. The reason is: After a “relatively quiet” Wednesday with “only” 120k compound queries (irony warning), the CSI:FingerID web service had to handle a real-world stress test on Thursday, with 260k compound queries submitted on a single day. That is 3 compound queries per second on average.
This got our job database into some trouble: The database runs on a redundant twin server and is regularly mirrored between the two servers, in case one of them is hosed. Unfortunately, the log files used for mirroring jobs became too big, stalling the web server.
The problem persisted until noon on Friday the 13th. Bad luck.
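For those who want to check the arithmetic, a quick back-of-the-envelope sketch; it assumes the 260k queries were spread evenly over the full 24 hours, which they certainly were not.

```python
# Average load for 260k compound queries in one day,
# assuming (unrealistically) a uniform spread over 24 hours.
queries = 260_000
seconds_per_day = 24 * 60 * 60  # 86,400 seconds

rate = queries / seconds_per_day
print(f"{rate:.2f} compound queries per second")  # prints "3.01 compound queries per second"
```

Peak load during working hours will have been noticeably higher than this average.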
We are sorry if this caused any inconvenience. All systems should be up and running again.
As a courtesy towards other users: In case you want to submit complete databases with hundreds of runs, it would be great if you could distribute your computations over the course of several days. Also, you might want to contact us in advance, so that we do not accidentally block you.
By the way: We have obviously passed 1.5 million compound queries after these two days.
The International Max Planck Research School at the MPI for Chemical Ecology in Jena is looking for PhD students, and one of the projects is on “making SIRIUS and CSI:FingerID GCMS-ready”. Only a half position is being paid by the IMPRS, but this can be supplemented by funding from our chair. We are searching for motivated candidates from bioinformatics, cheminformatics and computer science who want to work in this exciting, quickly evolving interdisciplinary field. Please see here for details, and apply here. The application deadline is May 16, 2018. Contact Sebastian in case you have questions.