The Deutsche Forschungsgemeinschaft has provided us with funding for our project Harvester. A central problem in many areas of small molecule machine learning is how limited the available training data are and how slowly more data become available. This is also true for MS/MS data, where the doubling time is a decade or two, possibly more. A somewhat obvious way around this is to resort to unlabeled data, as there are tons of them available (particularly in GNPS). Yet, using these data is non-trivial. We have already experimented with pre-training, but this improved annotation rates by a mere single percentage point. In our new project, we instead want to resort to self-training, a technique recently “rediscovered” and successfully used for AlphaFold2, among others. What we now need is someone to do the work and take the money. If you are interested, let us know!