The European Commission has awarded 6.3 million euros to a four-year collaborative project on data-driven design of cells and microbial communities for applications ranging from human health to sustainable production of chemicals. With advances in synthetic biology, genomes can now be edited at unprecedented speed, making it possible to introduce multiple changes in the same genome at the same time. This increases the need for computational tools to design cells and communities of cells, analogous to the tools used in computer-aided design of cars, buildings and other man-made objects. In biotechnology, these design tools need to be able to use existing large-scale databases to discover new parts and place them in the functioning context of the cell. The tools also need to be easily accessible and provide an intuitive visual map of the cell to the biotechnologists working in the lab on building better cell factories and communities.
The project, called DD-DeCaF (Bioinformatics Services for Data-Driven Design of Cell Factories and Communities), brings together leading academic partners from five European universities with five innovative European companies to address the challenge of building a comprehensive design tool. The academic partners will develop cutting-edge methods for using large-scale data to design cell factories and communities for biotechnological applications. Three innovative small and medium-sized enterprise (SME) partners will convert these advanced methods into software tools that can be used by non-experts and will build intuitive visualizations of biological networks. These tools will be tested and applied to real-world cell factory development projects by end-user partners.
2nd workshop in September, Oeiras/Lisbon (September 18, 2018)
DD-DeCaF 2nd Workshop: data-driven cell factory design
DD-DeCaF 5th Consortium Meeting in Lausanne (April 24, 2018)
DD-DeCaF 4th Consortium Meeting in Delft (September 11, 2017)
Hands-on workshop in September, Delft (June 29, 2017)
DD-DeCaF 1st Workshop: Hands on introduction to data-driven cell factory and community design
DD-DeCaF 3rd Consortium Meeting in Copenhagen (May 30, 2017)
The third DD-DeCaF consortium meeting was held in Kongens Lyngby (Copenhagen) on 11-12 May 2017 at the Novo Nordisk Foundation Center for Biosustainability.
1st edition of the DD-DeCaF newsletter now available! (February 28, 2017)
The first edition of the DD-DeCaF newsletter is now available. This issue presents an overview of the project and its resources, and a summary of the publications generated in year 1 of the project.
Webcast DD-DeCaF platform adds an interactive pathway viewer (January 25, 2017)
DD-DeCaF 2nd Consortium Meeting in Heidelberg (September 28, 2016)
Using big “bio-data” to design better cell factories [Press release] (April 01, 2016)
The EU has granted 6.3 million euros to the project DD-DeCaF, coordinated by the Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark. The objective is to develop a computer tool that will allow biotech companies to design and engineer cell factories faster than is possible today. The tool will accelerate the production of sustainable bio-chemicals and lay the groundwork for the design of healthier foodstuffs.
DD-DeCaF Kickoff Meeting in Brussels (March 10, 2016)
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from eggNOG. To validate our method, we benchmarked Gene Ontology predictions against two widely used homology-based approaches: BLAST and InterProScan. Compared to BLAST, eggNOG-mapper reduced the rate of false positive assignments by 7%, and increased the ratio of curated terms recovered over all terms assigned per protein by 19%. Compared to InterProScan, eggNOG-mapper achieved similar proteome coverage and precision, while predicting on average 32 more terms per protein and increasing the rate of curated terms recovered over total term assignments per protein by 26%. Through strict orthology assignments, eggNOG-mapper further renders more specific annotations than possible from domain similarity only (e.g. predicting gene family names). eggNOG-mapper runs ~15x faster than BLAST and at least 2.5x faster than InterProScan. The tool is available standalone or as an online service at http://eggnog-mapper.embl.de.
Genome-scale metabolic reconstructions are currently available for hundreds of organisms. Constraint-based modeling enables the analysis of the phenotypic landscape of these organisms, predicting the response to genetic and environmental perturbations. However, since constraint-based models can only describe the metabolic phenotype at the reaction level, understanding the mechanistic link between genotype and phenotype is still hampered by the complexity of gene-protein-reaction associations. We implement a model transformation that enables constraint-based methods to be applied at the gene level by explicitly accounting for the individual fluxes of enzymes (and subunits) encoded by each gene. We show how this can be applied to different kinds of constraint-based analysis: flux distribution prediction, gene essentiality analysis, random flux sampling, elementary mode analysis, transcriptomics data integration, and rational strain design. In each case we demonstrate how this approach can lead to improved phenotype predictions and a deeper understanding of the genotype-to-phenotype link. In particular, we show that a large fraction of reaction-based designs obtained by current strain design methods are not actually feasible, and show how our approach allows using the same methods to obtain feasible gene-based designs. We also show, by extensive comparison with experimental 13C-flux data, how simple reformulations of different simulation methods with gene-wise objective functions result in improved prediction accuracy. The model transformation proposed in this work enables existing constraint-based methods to be used at the gene level without modification. This automatically leverages phenotype analysis from reaction to gene level, improving the biological insight that can be obtained from genome-scale models.
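The difficulty with gene-protein-reaction (GPR) associations described above can be illustrated with a minimal sketch. This is not the model transformation from the paper; it only evaluates Boolean GPR rules to show why a design expressed as reaction deletions may not translate cleanly to gene deletions. The reaction names, gene names and rules are hypothetical.

```python
# Hypothetical toy GPR rules in disjunctive normal form (an assumption for
# illustration): each reaction maps to a list of alternative enzyme complexes
# (isozymes), and each complex is the set of genes that must all be present.
GPR = {
    "R1": [{"g1"}, {"g2"}],   # g1 OR g2 (isozymes)
    "R2": [{"g2", "g3"}],     # g2 AND g3 (protein complex)
    "R3": [{"g3"}],           # single gene
}


def active_reactions(gpr, knocked_out):
    """Return the reactions that still have at least one intact complex."""
    knocked_out = set(knocked_out)
    return {
        rxn
        for rxn, complexes in gpr.items()
        # A complex is intact if none of its genes has been knocked out.
        if any(not (cplx & knocked_out) for cplx in complexes)
    }


# Deleting one isozyme gene does not remove R1.
single_ko = active_reactions(GPR, {"g1"})

# Removing R1 requires deleting both g1 and g2, which also disables R2.
double_ko = active_reactions(GPR, {"g1", "g2"})

print(sorted(single_ko))
print(sorted(double_ko))
```

In this toy case a design that asks for "delete R1 only" has no gene-level equivalent, because every gene set that disables R1 also disables R2.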
The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de.
Optlang is a Python package implementing a modeling language for solving mathematical optimization problems, i.e., maximizing or minimizing an objective function over a set of variables subject to a number of constraints. It provides a common native Python interface to a series of optimization tools, so different solver backends can be used and changed in a transparent way. Optlang’s object-oriented API takes advantage of the symbolic math library SymPy (Team 2016) to allow objective functions and constraints to be easily formulated algebraically from symbolic expressions of variables. Optlang targets scientists who can thus focus on formulating optimization problems based on mathematical equations derived from domain knowledge. Solver interfaces can be added by subclassing the four main classes of the optlang API (Variable, Constraint, Objective, and Model) and implementing the relevant API functions.
The composition of a cell in terms of macromolecular building blocks and other organic molecules underlies the metabolic needs and capabilities of a species. Although some core biomass components such as nucleic acids and proteins are evident for most species, the essentiality of the pool of other organic molecules, especially cofactors and prosthetic groups, is yet unclear. Here we integrate biomass compositions from 71 manually curated genome-scale models, 33 large-scale gene essentiality datasets, enzyme-cofactor association data and a vast array of publications, revealing universally essential cofactors for prokaryotic metabolism and also others that are specific for phylogenetic branches or metabolic modes. Our results revise predictions of essential genes in Klebsiella pneumoniae and identify missing biosynthetic pathways in models of Mycobacterium tuberculosis. This work provides fundamental insights into the essentiality of organic cofactors and has implications for minimal cell studies as well as for modeling genotype-phenotype relations in prokaryotic metabolic networks.
Microbial cell factories based on renewable carbon sources are fundamental to a sustainable bio-economy. The economic feasibility of producer cells requires robust performance balancing growth and production. However, the inherent competition between these two objectives often leads to instability and reduces productivity. While algorithms exist to design metabolic network reduction strategies for aligning these objectives, the biochemical basis of the growth-product coupling has remained unresolved. Here, we reveal key reactions in the cellular biochemical repertoire as universal anchor reactions for aligning cell growth and production. A necessary condition for a reaction to be an anchor is that it splits a substrate into two or more molecules. By searching the currently known biochemical reaction space, we identify 62 C‐C cleaving anchor reactions, such as isocitrate lyase (EC 4.1.3.1) and L-tryptophan indole-lyase (EC 4.1.99.1), which are relevant for biorefining. The anchor reactions identified here mark network nodes on which to base growth-coupled metabolic engineering and novel pathway designs.
There is an urgent need to significantly accelerate the development of microbial cell factories to produce fuels and chemicals from renewable feedstocks in order to facilitate the transition to a biobased society. Methods commonly used within the field of systems biology including omics characterization, genome-scale metabolic modeling, and adaptive laboratory evolution can be readily deployed in metabolic engineering projects. However, high performance strains usually carry tens of genetic modifications and need to operate in challenging environmental conditions. This additional complexity compared to basic science research requires pushing systems biology strategies to their limits and often spurs innovative developments that benefit fields outside metabolic engineering. Here we survey recent advanced applications of systems biology methods in engineering microbial production strains for biofuels and -chemicals.
Genome-scale metabolic reconstructions have proven to be valuable resources in enhancing our understanding of metabolic networks, as they encapsulate all known metabolic capabilities of the organisms from genes to proteins to their functions. However, the complexity of these large metabolic networks often hinders their utility in various practical applications. Although reduced models are commonly used for modeling and for integrating experimental data, they are often inconsistent across different studies and laboratories due to different criteria and levels of detail, which can compromise the transferability of findings and the integration of experimental data from different groups. In this study, we have developed a systematic semi-automatic approach to reduce genome-scale models into core models in a consistent and logical manner, focusing on the central metabolism or subsystems of interest. The method minimizes the loss of information using an approach that combines graph-based search and optimization methods. The resulting core models are shown to capture key properties of the genome-scale models and preserve consistency in terms of biomass and by-product yields, flux and concentration variability, and gene essentiality. The development of these “consistently-reduced” models will help to clarify and facilitate the integration of different experimental data to draw new understanding that can be directly extended to genome-scale models.
In the post-genomic era, genome-scale metabolic networks (GEMs) have emerged as invaluable tools to understand the metabolic capabilities of organisms. Different parts of these metabolic networks are defined as subsystems/pathways, which are sets of functional roles that implement a specific biological process or structural complex, such as glycolysis and the TCA cycle. Subsystem/pathway definition is also employed to delineate the biosynthetic routes that produce biomass building blocks. In databases such as MetaCyc and SEED, these representations are composed of linear routes from precursors to target biomass building blocks. However, this approach cannot capture the nested, complex nature of GEMs. Here we implemented an algorithm, lumpGEM, which generates biosynthetic subnetworks composed of reactions that can synthesize a target metabolite from a set of defined core precursor metabolites. lumpGEM captures balanced subnetworks, which account for the fate of all metabolites along the synthesis routes, thus encapsulating reactions from various subsystems/pathways to balance these metabolites in the metabolic network. Moreover, lumpGEM collapses these subnetworks into elementally balanced lumped reactions that specify the cost of all precursor metabolites and cofactors. It also generates alternative subnetworks and lumped reactions for the same metabolite, accounting for the flexibility of organisms. lumpGEM is applicable to any GEM and any target metabolite defined in the network. Lumped reactions generated by lumpGEM can also be used to generate properly balanced reduced core metabolic models.
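The collapsing step described above can be sketched in a few lines. This is only an illustration of how a subnetwork with a given flux distribution reduces to one net (lumped) reaction; it is not the lumpGEM algorithm, which also has to identify the subnetworks themselves. The metabolite names, reactions and fluxes are hypothetical.

```python
from collections import defaultdict


def lump(reactions, fluxes, tol=1e-9):
    """Collapse a subnetwork into one net reaction.

    reactions: {reaction_id: {metabolite: stoichiometric coefficient}},
               negative for substrates, positive for products.
    fluxes:    {reaction_id: flux through that reaction in the subnetwork}.
    Metabolites whose net coefficient is (numerically) zero are balanced
    intermediates and are dropped from the lumped reaction.
    """
    net = defaultdict(float)
    for rxn_id, stoich in reactions.items():
        for met, coeff in stoich.items():
            net[met] += fluxes[rxn_id] * coeff
    return {met: coeff for met, coeff in net.items() if abs(coeff) > tol}


# Hypothetical two-step route from core precursor A to target C via B.
subnetwork = {
    "R1": {"A": -1, "ATP": -1, "B": 1, "ADP": 1},
    "R2": {"B": -1, "NADPH": -1, "C": 1, "NADP": 1},
}
unit_fluxes = {"R1": 1.0, "R2": 1.0}

lumped = lump(subnetwork, unit_fluxes)
print(lumped)
```

The intermediate B cancels out, and the lumped reaction states the precursor and cofactor cost of making one unit of C: one A, one ATP and one NADPH.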
The past two decades have witnessed great advances in the computational modeling and systems biology fields. Soon after the first models of metabolism were developed, methods for phenotype prediction were put forward, as well as strain optimization methods, within the field of Metabolic Engineering. Evolutionary computation (EC) has been on the front line, with the proposal of bilevel metaheuristics, where EC works over phenotype simulation, selecting the most promising solutions for bioengineering tasks. Recently, Schuetz and co-workers proposed that the metabolism of bacteria operates close to the Pareto-optimal surface of a three-dimensional space defined by competing objectives. Although multi-objective strain optimization approaches focused on bioengineering objectives have been proposed, none tackles the multi-objective nature of the cellular objectives. In this work, we propose multi-objective evolutionary algorithms for strain optimization, where objective functions are defined based on distinct phenotype prediction methods, showing that these can lead to more robust designs and make it possible to find solutions in more complex scenarios.
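As a minimal sketch of what Pareto optimality means for competing objectives, the following filter keeps only the non-dominated candidates from a set of designs. It is a generic illustration, not the evolutionary algorithm proposed in the work; the objective values (growth, production) are hypothetical and both are assumed to be maximized.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (order preserved)."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]


# Hypothetical (growth rate, product yield) pairs for five candidate strains.
candidates = [
    (0.9, 0.1),
    (0.5, 0.5),
    (0.1, 0.9),
    (0.4, 0.4),  # dominated by (0.5, 0.5)
    (0.5, 0.2),  # dominated by (0.5, 0.5)
]

front = pareto_front(candidates)
print(front)
```

A multi-objective optimizer returns such a front of trade-off solutions instead of a single best design.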
Genome‐scale metabolic models (GEMs) are widely used to calculate metabolic phenotypes. They rely on defining a set of constraints, the most common of which is that the production of metabolites and/or growth are limited by the carbon source uptake rate. However, enzyme abundances and kinetics, which act as limitations on metabolic fluxes, are not taken into account. Here, we present GECKO, a method that enhances a GEM to account for enzymes as part of reactions, thereby ensuring that each metabolic flux does not exceed its maximum capacity, equal to the product of the enzyme’s abundance and turnover number. We applied GECKO to a Saccharomyces cerevisiae GEM and demonstrated that the new model could correctly describe phenotypes that the previous model could not, particularly under high enzymatic pressure conditions, such as yeast growing on different carbon sources in excess, coping with stress, or overexpressing a specific pathway. GECKO also allows direct integration of quantitative proteomics data; by doing so, we significantly reduced the flux variability of the model in over 60% of metabolic reactions. Additionally, the model gives insight into the distribution of enzyme usage between and within metabolic pathways. The developed method and model are expected to increase the use of model‐based design in metabolic engineering.
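The capacity constraint stated above, flux no greater than enzyme abundance times turnover number, can be written out as a small sketch. It shows only the arithmetic of the bound, not how GECKO incorporates enzymes into the model; the function names, units and numbers are assumptions chosen for illustration.

```python
SECONDS_PER_HOUR = 3600.0


def enzyme_capacity(kcat_per_s, abundance_mmol_per_gdw):
    """Maximum flux an enzyme pool can carry, v_max = kcat * [E].

    kcat is given per second and abundance in mmol/gDW, so the result is
    converted to mmol/gDW/h, the unit commonly used for fluxes in GEMs.
    """
    return kcat_per_s * SECONDS_PER_HOUR * abundance_mmol_per_gdw


def constrained_upper_bound(original_ub, kcat_per_s, abundance_mmol_per_gdw):
    """Tighten a reaction's upper bound to the enzyme capacity, if lower."""
    return min(original_ub, enzyme_capacity(kcat_per_s, abundance_mmol_per_gdw))


# Hypothetical enzyme: kcat = 100 1/s, abundance = 1e-5 mmol/gDW.
# The default bound of 1000 mmol/gDW/h is replaced by the enzyme capacity.
tight_ub = constrained_upper_bound(1000.0, kcat_per_s=100.0,
                                   abundance_mmol_per_gdw=1e-5)

# If the original bound is already below the capacity, it is left unchanged.
loose_ub = constrained_upper_bound(2.0, kcat_per_s=100.0,
                                   abundance_mmol_per_gdw=1e-5)

print(tight_ub, loose_ub)
```

With measured abundances from proteomics, bounds of this kind are what shrink the feasible flux ranges of the model.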
Many microorganisms live in communities and depend on metabolites secreted by fellow community members for survival. Yet our knowledge of interspecies metabolic dependencies is limited to a few communities with a small number of exchanged metabolites, and even less is known about the cellular regulation facilitating metabolic exchange. Here we show how yeast enables growth of lactic acid bacteria through endogenous, multi-component cross-feeding in a readily established community. In nitrogen-rich environments, Saccharomyces cerevisiae adjusts its metabolism by secreting a pool of metabolites, especially amino acids, and thereby enables survival of Lactobacillus plantarum and Lactococcus lactis. The quantity of the available nitrogen sources and the status of nitrogen catabolite repression pathways jointly modulate this niche creation. We demonstrate how nitrogen overflow by yeast benefits L. plantarum in grape juice, and contributes to the emergence of mutualism with L. lactis in a medium with lactose. Our results illustrate how metabolic decisions of an individual species can benefit others.
OptFlux is open-source, modular software that supports in silico metabolic engineering tasks and aims to be the reference computational application in the field.
Cameo is a high-level Python library developed to aid the strain design process in metabolic engineering projects.
iPath2: interactive Pathways Explorer is a web-based tool for the visualization, analysis and customization of various pathway maps.
The Transport Reactions Annotation and Generation (Triage) tool identifies the metabolites transported by each transmembrane protein and its transporter family.
@Note is a Biomedical Text Mining platform that copes with major Information Retrieval and Information Extraction tasks and promotes multi-disciplinary research.
NGLess is a domain-specific language for processing next-generation sequencing (NGS) data, with a focus on metagenomics.
The Mass-Action Stoichiometric Simulation (MASS) toolbox is a modeling software package that focuses on the construction and analysis of kinetic and constraint-based models of biochemical reaction systems.
iTOL: interactive Tree Of Life is an online tool for the display, annotation and management of phylogenetic trees. Trees can be annotated with 14 different dataset types, and exported into various graphical formats.
The Metabolic Models Reconstruction Using Genome-Scale Information (merlin) tool is a user-friendly Java application that performs the reconstruction of genome-scale metabolic models for any organism that has its genome sequenced.
Optlang is a Python package implementing a modeling language for solving mathematical optimization problems, i.e. maximizing or minimizing an objective function over a set of variables subject to a number of constraints. Optlang provides a common interface to a series of optimization tools, so different solver backends can be changed in a transparent way.
The GECKO toolbox is a Matlab/Python package for enhancing a Genome-scale model to account for Enzyme Constraints, using Kinetics and Omics.