The European Commission has awarded 6.3 million Euros to a four-year collaborative project on data-driven design of cells and microbial communities for applications ranging from human health to sustainable production of chemicals. With advances in synthetic biology, genomes can now be edited at unprecedented speed, making it possible to introduce multiple changes in the same genome at the same time. This increases the need for computational tools to design cells and communities of cells, analogous to the tools used in computer-aided design of cars, buildings and other man-made objects. In biotechnology, these design tools need to be able to use existing large-scale databases to discover new parts and place them in the functioning context of the cell. The tools need to be easily accessible and provide an intuitive visual map of the cell to the biotechnologists working in the lab on building better cell factories and communities.
The project, called DD-DeCaF (Bioinformatics Services for Data-Driven Design of Cell Factories and Communities), brings together leading academic partners from five European universities with five innovative European companies to address the challenge of building a comprehensive design tool. The academic partners will develop cutting-edge methods for using large-scale data to design cell factories and communities for biotechnological applications. Three innovative small and medium-sized enterprise (SME) partners will convert these advanced methods into software tools that can be used by non-experts and will build intuitive visualizations of biological networks. These tools will be tested and applied to real-world cell factory development projects by end-user partners.
1st edition of the DD-DeCaF newsletter now available! (February 28, 2017)
The first edition of the DD-DeCaF newsletter is now available. This issue presents an overview of the project and its resources, along with a summary of the publications generated in year 1 of the project.
Webcast DD-DeCaF platform adds an interactive pathway viewer (January 25, 2017)
DD-DeCaF 2nd Consortium Meeting in Heidelberg (September 28, 2016)
Using big “bio-data” to design better cell factories [Press release] (April 01, 2016)
The EU has granted 6.3 million Euros to the project DD-DeCaF, coordinated by the Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark. The objective is to develop a computer tool that will allow biotech companies to design and engineer cell factories faster than is currently possible. The tool will accelerate the production of sustainable bio-chemicals and lay the groundwork for the design of healthier foodstuffs.
DD-DeCaF Kickoff Meeting in Brussels (March 10, 2016)
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from eggNOG. To validate our method, we benchmarked Gene Ontology predictions against two widely used homology-based approaches: BLAST and InterProScan. Compared to BLAST, eggNOG-mapper reduced by 7% the rate of false positive assignments, and increased by 19% the ratio of curated terms recovered over all terms assigned per protein. Compared to InterProScan, eggNOG-mapper achieved similar proteome coverage and precision, while predicting on average 32 more terms per protein and increasing by 26% the rate of curated terms recovered over total term assignments per protein. Through strict orthology assignments, eggNOG-mapper further renders more specific annotations than possible from domain similarity only (e.g. predicting gene family names). eggNOG-mapper runs ~15x faster than BLAST and at least 2.5x faster than InterProScan. The tool is available standalone or as an online service at http://eggnog-mapper.embl.de.
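The benchmark metric quoted above, "curated terms recovered over all terms assigned per protein", can be read as a per-protein ratio averaged over proteins. The following is a minimal sketch of that reading only. The function names and the toy term labels are hypothetical, and the published benchmark may define or filter terms differently.

```python
def recovered_ratio(assigned, curated):
    """Share of the assigned terms that also appear in the curated set.

    Returns None when no terms were assigned, because the ratio is undefined.
    """
    if not assigned:
        return None
    return len(assigned & curated) / len(assigned)


def mean_recovered_ratio(assignments, curated_terms):
    """Average the per-protein ratio over proteins with at least one assigned term."""
    ratios = []
    for protein, assigned in assignments.items():
        ratio = recovered_ratio(assigned, curated_terms.get(protein, set()))
        if ratio is not None:
            ratios.append(ratio)
    return sum(ratios) / len(ratios) if ratios else None


# Made-up proteins and term labels, for illustration only.
curated_terms = {
    "P1": {"termA", "termB"},
    "P2": {"termC"},
}
assignments = {
    "P1": {"termA", "termB", "termX", "termY"},  # 2 of 4 assigned terms are curated
    "P2": {"termC"},                             # 1 of 1 assigned terms is curated
    "P3": set(),                                 # nothing assigned, so skipped
}

print(mean_recovered_ratio(assignments, curated_terms))  # 0.75
```

Under this reading, a tool that assigns many extra terms per protein lowers its ratio unless those terms are also curated.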
Genome-scale metabolic reconstructions are currently available for hundreds of organisms. Constraint-based modeling enables the analysis of the phenotypic landscape of these organisms, predicting the response to genetic and environmental perturbations. However, since constraint-based models can only describe the metabolic phenotype at the reaction level, understanding the mechanistic link between genotype and phenotype is still hampered by the complexity of gene-protein-reaction associations. We implement a model transformation that enables constraint-based methods to be applied at the gene level by explicitly accounting for the individual fluxes of enzymes (and subunits) encoded by each gene. We show how this can be applied to different kinds of constraint-based analysis: flux distribution prediction, gene essentiality analysis, random flux sampling, elementary mode analysis, transcriptomics data integration, and rational strain design. In each case we demonstrate how this approach can lead to improved phenotype predictions and a deeper understanding of the genotype-to-phenotype link. In particular, we show that a large fraction of reaction-based designs obtained by current strain design methods are not actually feasible, and show how our approach allows using the same methods to obtain feasible gene-based designs. We also show, by extensive comparison with experimental 13C-flux data, how simple reformulations of different simulation methods with gene-wise objective functions result in improved prediction accuracy. The model transformation proposed in this work enables existing constraint-based methods to be used at the gene level without modification. This automatically leverages phenotype analysis from reaction to gene level, improving the biological insight that can be obtained from genome-scale models.
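To illustrate the gene-protein-reaction (GPR) associations mentioned above, here is a minimal sketch of the conventional reaction-level treatment, in which a Boolean rule decides whether a reaction survives a set of gene knockouts. It is not the model transformation described in the abstract. The nested-tuple rule format and all gene and reaction names are assumptions made for this example.

```python
# A GPR rule is modeled here as either a gene identifier (str) or a nested
# tuple of the form ("and" | "or", operand, operand, ...).

def rule_is_satisfied(rule, knocked_out):
    """Return True if the rule can still be satisfied after the knockouts."""
    if isinstance(rule, str):           # a single gene
        return rule not in knocked_out
    operator, *operands = rule
    results = (rule_is_satisfied(op, knocked_out) for op in operands)
    if operator == "and":               # enzyme complex: every subunit is required
        return all(results)
    if operator == "or":                # isozymes: any one of them is sufficient
        return any(results)
    raise ValueError(f"unknown operator: {operator!r}")


def blocked_reactions(gpr_rules, knocked_out):
    """List (sorted) the reactions whose GPR rule is no longer satisfied."""
    return sorted(
        reaction
        for reaction, rule in gpr_rules.items()
        if not rule_is_satisfied(rule, knocked_out)
    )


# Hypothetical toy network; gene and reaction names are made up.
gpr_rules = {
    "R1": "g1",                               # one gene, one reaction
    "R2": ("or", "g2", "g3"),                 # two isozymes
    "R3": ("and", "g1", "g4"),                # two-subunit complex
    "R4": ("or", "g2", ("and", "g3", "g4")),  # isozyme or complex
}

print(blocked_reactions(gpr_rules, {"g1"}))        # ['R1', 'R3']
print(blocked_reactions(gpr_rules, {"g2"}))        # []
print(blocked_reactions(gpr_rules, {"g2", "g4"}))  # ['R3', 'R4']
```

Even in this toy case, one gene (g1) affects two reactions, while a single knockout of g2 affects none. This is the kind of many-to-many mapping that makes reaction-level results hard to interpret at the gene level.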
The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de.
Optlang is a Python package implementing a modeling language for solving mathematical optimization problems, i.e., maximizing or minimizing an objective function over a set of variables subject to a number of constraints. It provides a common native Python interface to a series of optimization tools, so different solver backends can be used and changed in a transparent way. Optlang’s object-oriented API takes advantage of the symbolic math library SymPy (Team 2016) to allow objective functions and constraints to be easily formulated algebraically from symbolic expressions of variables. Optlang targets scientists who can thus focus on formulating optimization problems based on mathematical equations derived from domain knowledge. Solver interfaces can be added by subclassing the four main classes of the optlang API (Variable, Constraint, Objective, and Model) and implementing the relevant API functions.
The composition of a cell in terms of macromolecular building blocks and other organic molecules underlies the metabolic needs and capabilities of a species. Although some core biomass components such as nucleic acids and proteins are evident for most species, the essentiality of the pool of other organic molecules, especially cofactors and prosthetic groups, is yet unclear. Here we integrate biomass compositions from 71 manually curated genome-scale models, 33 large-scale gene essentiality datasets, enzyme-cofactor association data and a vast array of publications, revealing universally essential cofactors for prokaryotic metabolism and also others that are specific for phylogenetic branches or metabolic modes. Our results revise predictions of essential genes in Klebsiella pneumoniae and identify missing biosynthetic pathways in models of Mycobacterium tuberculosis. This work provides fundamental insights into the essentiality of organic cofactors and has implications for minimal cell studies as well as for modeling genotype-phenotype relations in prokaryotic metabolic networks.
OptFlux is open-source, modular software that supports in silico metabolic engineering tasks and aims to be the reference computational application in the field.
Cameo is a high-level Python library developed to aid the strain design process in metabolic engineering projects.
iPath2: interactive Pathways Explorer is a web-based tool for the visualization, analysis and customization of various pathway maps.
The Transport Reactions Annotation and Generation (Triage) tool identifies the metabolites transported by each transmembrane protein and its transporter family.
@Note is a Biomedical Text Mining platform that copes with major Information Retrieval and Information Extraction tasks and promotes multi-disciplinary research.
The Mass-Action Stoichiometric Simulation (MASS) toolbox is a modeling software package that focuses on the construction and analysis of kinetic and constraint-based models of biochemical reaction systems.
iTOL: interactive Tree Of Life is an online tool for the display, annotation and management of phylogenetic trees. Trees can be annotated with 14 different dataset types, and exported into various graphical formats.
The Metabolic Models Reconstruction Using Genome-Scale Information (merlin) tool is a user-friendly Java application that performs the reconstruction of genome-scale metabolic models for any organism whose genome has been sequenced.
Optlang is a Python package implementing a modeling language for solving mathematical optimization problems, i.e. maximizing or minimizing an objective function over a set of variables subject to a number of constraints. Optlang provides a common interface to a series of optimization tools, so different solver backends can be changed in a transparent way.