The European Commission has awarded 6.3 million Euros to a four-year collaborative project on data-driven design of cells and microbial communities for applications ranging from human health to sustainable production of chemicals. With advances in synthetic biology, genomes can now be edited at unprecedented speed, making it possible to introduce multiple changes in the same genome at the same time. This increases the need for computational tools to design cells and communities of cells, analogous to the tools used in computer-aided design of cars, buildings and other man-made objects. In biotechnology, these design tools need to be able to use existing large-scale databases to discover new parts and place them in the functioning context of the cell. The tools need to be easily accessible and provide an intuitive visual map of the cell to the biotechnologists working in the lab on building better cell factories and communities.
The project, called DD-DeCaF (Bioinformatics Services for Data-Driven Design of Cell Factories and Communities), brings together leading academic partners from five European universities with five innovative European companies to address the challenge of building a comprehensive design tool. The academic partners will develop cutting-edge methods for using large-scale data to design cell factories and communities for biotechnological applications. Three innovative Small/Medium Enterprise partners will convert these advanced methods into software tools that can be used by non-experts and will build intuitive visualizations of biological networks. These tools will be tested and applied to real-world cell factory development projects by end-user partners.
Using big “bio-data” to design better cell factories [Press release] (April 01, 2016)
The EU has granted 6.3 million Euros to the project DD-DeCaF, coordinated by the Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark. The objective is to develop a computer tool that will allow biotech companies to design and engineer cell factories faster than is currently possible. The tool will accelerate the production of sustainable bio-chemicals and lay the groundwork for the design of healthier foodstuffs.
DD-DeCaF Kickoff Meeting in Brussels (March 10, 2016)
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from eggNOG. To validate our method, we benchmarked Gene Ontology predictions against two widely used homology-based approaches: BLAST and InterProScan. Compared to BLAST, eggNOG-mapper reduced by 7% the rate of false positive assignments, and increased by 19% the ratio of curated terms recovered over all terms assigned per protein. Compared to InterProScan, eggNOG-mapper achieved similar proteome coverage and precision, while predicting on average 32 more terms per protein and increasing by 26% the rate of curated terms recovered over total term assignments per protein. Through strict orthology assignments, eggNOG-mapper further renders more specific annotations than possible from domain similarity only (e.g. predicting gene family names). eggNOG-mapper runs ~15x faster than BLAST and at least 2.5x faster than InterProScan. The tool is available standalone or as an online service at http://eggnog-mapper.embl.de.
Genome-scale metabolic reconstructions are currently available for hundreds of organisms. Constraint-based modeling enables the analysis of the phenotypic landscape of these organisms, predicting the response to genetic and environmental perturbations. However, since constraint-based models can only describe the metabolic phenotype at the reaction level, understanding the mechanistic link between genotype and phenotype is still hampered by the complexity of gene-protein-reaction associations. We implement a model transformation that enables constraint-based methods to be applied at the gene level by explicitly accounting for the individual fluxes of enzymes (and subunits) encoded by each gene. We show how this can be applied to different kinds of constraint-based analysis: flux distribution prediction, gene essentiality analysis, random flux sampling, elementary mode analysis, transcriptomics data integration, and rational strain design. In each case we demonstrate how this approach can lead to improved phenotype predictions and a deeper understanding of the genotype-to-phenotype link. In particular, we show that a large fraction of reaction-based designs obtained by current strain design methods are not actually feasible, and show how our approach allows using the same methods to obtain feasible gene-based designs. We also show, by extensive comparison with experimental 13C-flux data, how simple reformulations of different simulation methods with gene-wise objective functions result in improved prediction accuracy. The model transformation proposed in this work enables existing constraint-based methods to be used at the gene level without modification. This automatically leverages phenotype analysis from reaction to gene level, improving the biological insight that can be obtained from genome-scale models.
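To make the role of gene-protein-reaction (GPR) associations more concrete, the following Python sketch evaluates Boolean GPR rules for a set of gene deletions. It is not the model transformation described in the abstract above; it only illustrates the common convention that "and" joins subunits of an enzyme complex and "or" joins isozymes. All gene IDs, reaction IDs and function names are hypothetical.

```python
import re

# Hypothetical GPR rules: "and" = enzyme complex, "or" = isozymes.
gpr_rules = {
    "R1": "g1 and g2",
    "R2": "g3 or g4",
    "R3": "(g1 and g2) or g5",
    "R4": "g3",
}


def reaction_is_active(gpr_rule, knocked_out):
    """Return True if the reaction can still be catalysed after the deletions."""
    if not gpr_rule.strip():
        return True  # assumption: reactions without a rule are unaffected
    # Split the rule into parentheses and words (operators or gene IDs).
    tokens = re.findall(r"\(|\)|[^\s()]+", gpr_rule)
    parts = []
    for token in tokens:
        if token in ("(", ")"):
            parts.append(token)
        elif token.lower() in ("and", "or"):
            parts.append(token.lower())
        else:
            # Replace each gene ID by whether the gene is still present.
            parts.append(str(token not in knocked_out))
    # The expression now contains only True/False, and/or and parentheses.
    return eval(" ".join(parts), {"__builtins__": {}})


def inactive_reactions(rules, knocked_out):
    """List the reactions that are lost for a given set of deleted genes."""
    return sorted(
        rxn for rxn, rule in rules.items()
        if not reaction_is_active(rule, knocked_out)
    )


if __name__ == "__main__":
    for deletion in ({"g1"}, {"g3"}, {"g3", "g4"}):
        print(sorted(deletion), "->", inactive_reactions(gpr_rules, deletion))
```

In this toy network, removing R2 requires deleting both g3 and g4, but deleting g3 also removes R4. A reaction-level design that deletes R2 while keeping R4 therefore has no gene-level counterpart, which is the kind of infeasibility the abstract refers to.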
Optlang is a Python package implementing a modeling language for solving mathematical optimization problems, i.e., maximizing or minimizing an objective function over a set of variables subject to a number of constraints. It provides a common native Python interface to a series of optimization tools, so different solver backends can be used and changed in a transparent way. Optlang’s object-oriented API takes advantage of the symbolic math library SymPy (Team 2016) to allow objective functions and constraints to be easily formulated algebraically from symbolic expressions of variables. Optlang targets scientists who can thus focus on formulating optimization problems based on mathematical equations derived from domain knowledge. Solver interfaces can be added by subclassing the four main classes of the optlang API (Variable, Constraint, Objective, and Model) and implementing the relevant API functions.
The composition of a cell in terms of macromolecular building blocks and other organic molecules underlies the metabolic needs and capabilities of a species. Although some core biomass components such as nucleic acids and proteins are evident for most species, the essentiality of the pool of other organic molecules, especially cofactors and prosthetic groups, is yet unclear. Here we integrate biomass compositions from 71 manually curated genome-scale models, 33 large-scale gene essentiality datasets, enzyme-cofactor association data and a vast array of publications, revealing universally essential cofactors for prokaryotic metabolism and also others that are specific for phylogenetic branches or metabolic modes. Our results revise predictions of essential genes in Klebsiella pneumoniae and identify missing biosynthetic pathways in models of Mycobacterium tuberculosis. This work provides fundamental insights into the essentiality of organic cofactors and has implications for minimal cell studies as well as for modeling genotype-phenotype relations in prokaryotic metabolic networks.
OptFlux is open-source, modular software that supports in silico metabolic engineering tasks and aims to be the reference computational application in the field.
Cameo is a high-level Python library developed to aid the strain design process in metabolic engineering projects.
iPath2: interactive Pathways Explorer is a web-based tool for the visualization, analysis and customization of various pathway maps.
The Transport Reactions Annotation and Generation (Triage) tool identifies the metabolites transported by each transmembrane protein and its transporter family.
@Note is a Biomedical Text Mining platform that copes with major Information Retrieval and Information Extraction tasks and promotes multi-disciplinary research.
The Mass-Action Stoichiometric Simulation (MASS) toolbox is a modeling software package that focuses on the construction and analysis of kinetic and constraint-based models of biochemical reaction systems.
iTOL: interactive Tree Of Life is an online tool for the display, annotation and management of phylogenetic trees. Trees can be annotated with 14 different dataset types, and exported into various graphical formats.
The Metabolic Models Reconstruction Using Genome-Scale Information (merlin) tool is a user-friendly Java application that performs the reconstruction of genome-scale metabolic models for any organism that has its genome sequenced.
Optlang is a Python package implementing a modeling language for solving mathematical optimization problems, i.e. maximizing or minimizing an objective function over a set of variables subject to a number of constraints. Optlang provides a common interface to a series of optimization tools, so different solver backends can be changed in a transparent way.