Skip to content

Research

DFG project PABI “Parameterized Algorithmics for Bioinformatics”

This project aims at solving NP-hard bioinformatics problems using fixed-parameter algorithmics.  Using a careful analysis of the problem structure, exact solutions to several seemingly intractable problems come into reach.  The main idea of fixed-parameter algorithms is to confine the exponential part of the running time to a preferably small parameter individually chosen for the problem.  Despite longer running times, computing exact solutions to NP-hard problems in bioinformatics can pay off, as it may enable a better analysis of data from expensive and laborious experiments. Biological problems often feature characteristics that can be exploited towards a fixed-parameter algorithm.

In the first two years of the PABI project, we have developed, implemented, engineered, and evaluated fixed-parameter algorithms and data reduction rules for various problems in computational biology.  The focus of the following two years, besides developing algorithms for new problems, will be to continuously improve existing methods, and to make them available to researchers in  biology.

Project publications

DFG project IDUN “Identifying the Unknowns: Towards structural elucidation of small molecules using mass spectrometry”

Rapid identification of small compounds from small amounts of substance is of interest in many areas of biology and medicine such as metabolomics, bio-prospecting, biomarker discovery, and diagnostics. Today, mass spectrometry (MS) is a key technology for the identification of small molecules. Due to immense technological advances in mass spectrometry over the last years, the amount and complexity of MS data has been growing rapidly. Tandem mass spectrometry offers the potential of identifying small molecules solely from MS fingerprints, but computational analysis of such data is in its infancy, and this is presumed to be one of the major technological hurdles in metabolomics today.

The key objective of this project is to develop computational methods to “identify the unknowns”, that is, compounds that cannot be found in any database. For this, we will develop new computational techniques, models, and algorithms for the interpretation of MS fragmentation data from small molecules. As a medium term goal, we want to develop a BLAST-like search tool for searching fragmentation patterns. It turns out that many of the arising problems are computationally hard, and require new algorithmic concepts to guarantee that optimal solutions can be found. We will implement, train, and evaluate our methods to allow an automated processing of metabolite MS data.

Project publications

DFG project 3AGC “Algorithms for the Analysis of Approximate Gene Clusters”

This project is realized in cooperation with the AG Genome Informatics of Prof. Dr. Jens Stoye at Bielefeld University, Germany.

The order of genes in genomes can be used to determine the function of unknown genes, as well as the phylogenetic history of the organisms. In view of the ever-increasing speed of genome sequencing, there exists a huge amount of data for such studies. On the algorithmic side, though, methods are often based on overly simplified genome models, use heuristics to solve optimization problems, or suffer from long running times.

Gene clusters are sets of genes that occur as single contiguous blocks in several genomes. Unfortunately, the requirement of exact occurrences of gene clusters turns out to be too strict for the biological application. In this project, we want to develop models and algorithms for the computation of approximate gene clusters, that combine a formal strictness with applicability to biological data. At the same time, our algorithms must be swift to allow application to the increasing amount of genome data. We will combine methods from combinatorial optimization and algorithmic graph theory with a statistically sound evaluation. We will implement, train, and evaluate our methods to allow an automated processing of gene order data. Finally, we want to apply our method to biological data, to derive new insights about gene function.

Project publications

DFG project “FlipCut supertrees: Computing larger, better phylogenies faster”

Supertree approaches combine input trees with overlapping taxa sets into one large and more comprehensive tree. Here, the problem is dealing with incompatible data in a reasonable way. Supertree methods can be used as part of a divide-and-conquer approach for phylogenetic tree inference:  (1) we compute, say, maximum likelihood (ML) trees for several subsets of taxa; (2) we merge the resulting trees into one supertree;  (3) finally, we use this supertree as a seed for an ML analysis on the complete set of taxa.  Current supertree methods suffer from being too slow or too inaccurate when analyzing large datasets.  We have recently developed a novel supertree called FLIPCUT supertrees, which is very swift and computes supertrees of good quality.  In this project, we will further advance this methods so that supertree quality is on a par with the current supertree “gold standard” Matrix Representation with Parsimony (MRP). At the same time, we want to preserve the extraordinary swiftness of the method. We will then integrate the resulting method into a divide-and-conquer approach for phylogenetic reconstruction, and evaluate it against “pure” ML programs.

Project publications