The European Commission has awarded 6.3 million Euros to a four-year collaborative project on data-driven design of cells and microbial communities for applications ranging from human health to sustainable production of chemicals. With advances in synthetic biology, genomes can now be edited at unprecedented speed, allowing multiple changes to be made in the same genome at the same time. This increases the need for computational tools to design cells and communities of cells, analogous to the tools used in Computer Aided Design of cars, buildings and other man-made objects. In biotechnology these design tools need to be able to use existing large-scale databases to discover new parts and place them in the functioning context of the cell. The tools need to be easily accessible and provide an intuitive visual map of the cell to the biotechnologists working in the lab on building better cell factories and communities.
The project, called DD-DeCaF (Bioinformatics Services for Data-Driven Design of Cell Factories and Communities), brings together leading academic partners from five European universities with five innovative European companies to address the challenge of building a comprehensive design tool. The academic partners will develop cutting-edge methods for using large-scale data to design cell factories and communities for biotechnological applications. Three innovative Small/Medium Enterprise partners will convert these advanced methods into software tools that can be used by non-experts and will build intuitive visualizations of biological networks. These tools will be tested and applied to real-world cell factory development projects by end-user partners.
3rd workshop in November, ESIB, Graz (November 18, 2019)
The DD-DeCaF 3rd Workshop: Computer-aided design of cell factories will be integrated in the European Summit of Industrial Biotechnology (ESIB), to be held in Graz, Austria.
2nd periodic review meeting (March 15, 2019)
The 2nd periodic review meeting of the DD-DeCaF project was held in Brussels on March 15, 2019 at the CPH EU Office.
2nd workshop in September, Oeiras/Lisbon (September 18, 2018)
DD-DeCaF 2nd Workshop: data-driven cell factory design
DD-DeCaF 5th Consortium Meeting in Lausanne (April 24, 2018)
DD-DeCaF 4th Consortium Meeting in Delft (September 11, 2017)
Hands-on workshop in September, Delft (June 29, 2017)
DD-DeCaF 1st Workshop: Hands on introduction to data-driven cell factory and community design
DD-DeCaF 3rd Consortium Meeting in Copenhagen (May 30, 2017)
The third DD-DeCaF consortium meeting was held in Kongens Lyngby (Copenhagen) on 11-12 May 2017 at the Novo Nordisk Foundation Center for Biosustainability.
1st edition of the DD-DeCaF newsletter now available! (February 28, 2017)
The first edition of the DD-DeCaF newsletter is now available. This issue presents an overview of the project and its resources, together with a summary of the publications generated in year 1 of the project.
Webcast DD-DeCaF platform adds an interactive pathway viewer (January 25, 2017)
DD-DeCaF 2nd Consortium Meeting in Heidelberg (September 28, 2016)
Using big “bio-data” to design better cell factories [Press release] (April 01, 2016)
The EU has granted 6.3 million Euros to the project DD-DeCaF, coordinated by the Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark. The objective is to develop a computer tool that will allow biotech companies to design and engineer cell factories faster than is currently possible. The tool will accelerate the production of sustainable bio-chemicals and lay the groundwork for the design of healthier foodstuffs.
DD-DeCaF Kickoff Meeting in Brussels (March 10, 2016)
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from eggNOG. To validate our method, we benchmarked Gene Ontology predictions against two widely used homology-based approaches: BLAST and InterProScan. Compared to BLAST, eggNOG-mapper reduced by 7% the rate of false positive assignments, and increased by 19% the ratio of curated terms recovered over all terms assigned per protein. Compared to InterProScan, eggNOG-mapper achieved similar proteome coverage and precision, while predicting on average 32 more terms per protein and increasing by 26% the rate of curated terms recovered over total term assignments per protein. Through strict orthology assignments, eggNOG-mapper further renders more specific annotations than possible from domain similarity only (e.g. predicting gene family names). eggNOG-mapper runs ~15x faster than BLAST and at least 2.5x faster than InterProScan. The tool is available standalone or as an online service at http://eggnog-mapper.embl.de.
Genome-scale metabolic reconstructions are currently available for hundreds of organisms. Constraint-based modeling enables the analysis of the phenotypic landscape of these organisms, predicting the response to genetic and environmental perturbations. However, since constraint-based models can only describe the metabolic phenotype at the reaction level, understanding the mechanistic link between genotype and phenotype is still hampered by the complexity of gene-protein-reaction associations. We implement a model transformation that enables constraint-based methods to be applied at the gene level by explicitly accounting for the individual fluxes of enzymes (and subunits) encoded by each gene. We show how this can be applied to different kinds of constraint-based analysis: flux distribution prediction, gene essentiality analysis, random flux sampling, elementary mode analysis, transcriptomics data integration, and rational strain design. In each case we demonstrate how this approach can lead to improved phenotype predictions and a deeper understanding of the genotype-to-phenotype link. In particular, we show that a large fraction of reaction-based designs obtained by current strain design methods are not actually feasible, and show how our approach allows using the same methods to obtain feasible gene-based designs. We also show, by extensive comparison with experimental 13C-flux data, how simple reformulations of different simulation methods with gene-wise objective functions result in improved prediction accuracy. The model transformation proposed in this work enables existing constraint-based methods to be used at the gene level without modification. This automatically leverages phenotype analysis from reaction to gene level, improving the biological insight that can be obtained from genome-scale models.
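The point about infeasible reaction-based designs can be illustrated with a minimal sketch. This is not the model transformation from the paper; the gene names, reaction names, and rule encoding below are hypothetical. It only evaluates toy gene-protein-reaction (GPR) rules after a gene deletion, showing that a reaction can often be removed only through a gene whose product also catalyses other reactions.

```python
# Toy gene-protein-reaction (GPR) rules; all names are hypothetical.
# Each rule is a list of enzyme complexes (alternatives, logical OR);
# each complex is a set of genes that must all be present (logical AND).
GPR_RULES = {
    "R1": [{"g1"}],              # R1 requires g1
    "R2": [{"g1"}, {"g2"}],      # R2: g1 OR g2 (isozymes)
    "R3": [{"g1", "g3"}],        # R3: g1 AND g3 (two-subunit complex)
}


def active_reactions(knocked_out):
    """Return the reactions that keep at least one intact enzyme complex."""
    knocked_out = set(knocked_out)
    return {
        rxn
        for rxn, complexes in GPR_RULES.items()
        if any(not (genes & knocked_out) for genes in complexes)
    }


def lost_reactions(knocked_out):
    """Return the reactions disabled by deleting the given genes."""
    return set(GPR_RULES) - active_reactions(knocked_out)


# The reaction-level design "delete R1" can only be built by deleting g1,
# which also disables R3 as a side effect (R2 survives through g2).
print(sorted(lost_reactions({"g1"})))
```

In this toy case no gene deletion removes R1 alone, which is the kind of mismatch between reaction-level and gene-level designs that the abstract describes.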
The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de.
Optlang is a Python package implementing a modeling language for solving mathematical optimization problems, i.e., maximizing or minimizing an objective function over a set of variables subject to a number of constraints. It provides a common native Python interface to a series of optimization tools, so different solver backends can be used and changed in a transparent way. Optlang’s object-oriented API takes advantage of the symbolic math library SymPy (Team 2016) to allow objective functions and constraints to be easily formulated algebraically from symbolic expressions of variables. Optlang targets scientists who can thus focus on formulating optimization problems based on mathematical equations derived from domain knowledge. Solver interfaces can be added by subclassing the four main classes of the optlang API (Variable, Constraint, Objective, and Model) and implementing the relevant API functions.
The composition of a cell in terms of macromolecular building blocks and other organic molecules underlies the metabolic needs and capabilities of a species. Although some core biomass components such as nucleic acids and proteins are evident for most species, the essentiality of the pool of other organic molecules, especially cofactors and prosthetic groups, is yet unclear. Here we integrate biomass compositions from 71 manually curated genome-scale models, 33 large-scale gene essentiality datasets, enzyme-cofactor association data and a vast array of publications, revealing universally essential cofactors for prokaryotic metabolism and also others that are specific for phylogenetic branches or metabolic modes. Our results revise predictions of essential genes in Klebsiella pneumoniae and identify missing biosynthetic pathways in models of Mycobacterium tuberculosis. This work provides fundamental insights into the essentiality of organic cofactors and has implications for minimal cell studies as well as for modeling genotype-phenotype relations in prokaryotic metabolic networks.
Microbial cell factories based on renewable carbon sources are fundamental to a sustainable bio-economy. The economic feasibility of producer cells requires robust performance balancing growth and production. However, the inherent competition between these two objectives often leads to instability and reduces productivity. While algorithms exist to design metabolic network reduction strategies for aligning these objectives, the biochemical basis of the growth-product coupling has remained unresolved. Here, we reveal key reactions in the cellular biochemical repertoire as universal anchor reactions for aligning cell growth and production. A necessary condition for a reaction to be an anchor is that it splits a substrate into two or more molecules. By searching the currently known biochemical reaction space, we identify 62 C‐C cleaving anchor reactions, such as isocitrate lyase (EC 4.1.3.1) and L-tryptophan indole-lyase (EC 4.1.99.1), which are relevant for biorefining. The anchor reactions identified here mark network nodes for basing growth-coupled metabolic engineering and novel pathway designs.
There is an urgent need to significantly accelerate the development of microbial cell factories to produce fuels and chemicals from renewable feedstocks in order to facilitate the transition to a biobased society. Methods commonly used within the field of systems biology including omics characterization, genome-scale metabolic modeling, and adaptive laboratory evolution can be readily deployed in metabolic engineering projects. However, high performance strains usually carry tens of genetic modifications and need to operate in challenging environmental conditions. This additional complexity compared to basic science research requires pushing systems biology strategies to their limits and often spurs innovative developments that benefit fields outside metabolic engineering. Here we survey recent advanced applications of systems biology methods in engineering microbial production strains for biofuels and -chemicals.
Genome-scale metabolic reconstructions have proven to be valuable resources in enhancing our understanding of metabolic networks as they encapsulate all known metabolic capabilities of the organisms from genes to proteins to their functions. However the complexity of these large metabolic networks often hinders their utility in various practical applications. Although reduced models are commonly used for modeling and in integrating experimental data, they are often inconsistent across different studies and laboratories due to different criteria and detail, which can compromise transferability of the findings and also integration of experimental data from different groups. In this study, we have developed a systematic semi-automatic approach to reduce genome-scale models into core models in a consistent and logical manner focusing on the central metabolism or subsystems of interest. The method minimizes the loss of information using an approach that combines graph-based search and optimization methods. The resulting core models are shown to be able to capture key properties of the genome-scale models and preserve consistency in terms of biomass and by-product yields, flux and concentration variability and gene essentiality. The development of these “consistently-reduced” models will help to clarify and facilitate integration of different experimental data to draw new understanding that can be directly extendable to genome-scale models.
In the post-genomic era, Genome-scale metabolic networks (GEMs) have emerged as invaluable tools to understand metabolic capabilities of organisms. Different parts of these metabolic networks are defined as subsystems/pathways, which are sets of functional roles to implement a specific biological process or structural complex, such as glycolysis and TCA cycle. Subsystem/pathway definition is also employed to delineate the biosynthetic routes that produce biomass building blocks. In databases, such as MetaCyc and SEED, these representations are composed of linear routes from precursors to target biomass building blocks. However, this approach cannot capture the nested, complex nature of GEMs. Here we implemented an algorithm, lumpGEM, which generates biosynthetic subnetworks composed of reactions that can synthesize a target metabolite from a set of defined core precursor metabolites. lumpGEM captures balanced subnetworks, which account for the fate of all metabolites along the synthesis routes, thus encapsulating reactions from various subsystems/pathways to balance these metabolites in the metabolic network. Moreover, lumpGEM collapses these subnetworks into elementally balanced lumped reactions that specify the cost of all precursor metabolites and cofactors. It also generates alternative subnetworks and lumped reactions for the same metabolite, accounting for the flexibility of organisms. lumpGEM is applicable to any GEM and any target metabolite defined in the network. Lumped reactions generated by lumpGEM can be also used to generate properly balanced reduced core metabolic models.
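The idea of collapsing a balanced subnetwork into a single lumped reaction can be sketched as follows. This is only an illustration of the bookkeeping step, assuming the subnetwork and its fluxes are already known; it does not reproduce how lumpGEM finds subnetworks. The metabolites, reactions, and flux values are hypothetical.

```python
from collections import defaultdict


def lump(reactions, fluxes):
    """Sum flux-weighted stoichiometries into one net (lumped) reaction.

    Intermediates that are produced and consumed in equal amounts cancel
    and are dropped from the result.
    """
    net = defaultdict(float)
    for name, stoich in reactions.items():
        for met, coeff in stoich.items():
            net[met] += fluxes[name] * coeff
    return {met: coeff for met, coeff in net.items() if abs(coeff) > 1e-9}


# Hypothetical subnetwork: negative = consumed, positive = produced.
reactions = {
    "r1": {"A": -1, "ATP": -1, "B": 1, "ADP": 1},  # A + ATP -> B + ADP
    "r2": {"B": -1, "C": 1},                       # B -> C
}
fluxes = {"r1": 1.0, "r2": 1.0}

# Net result: A + ATP -> C + ADP; the intermediate B cancels out.
print(lump(reactions, fluxes))
```

The lumped reaction makes the precursor and cofactor cost of the target explicit, here one A and one ATP per C.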
The past two decades have witnessed great advances in the computational modeling and systems biology fields. Soon after the first models of metabolism were developed, methods for phenotype prediction were put forward, as well as strain optimization methods, within the field of Metabolic Engineering. Evolutionary computation (EC) has been on the front line, with the proposal of bilevel metaheuristics, where EC works over phenotype simulation, selecting the most promising solutions for bioengineering tasks. Recently, Schuetz and co-workers proposed that the metabolism of bacteria operates close to the Pareto-optimal surface of a three-dimensional space defined by competing objectives. Although multi-objective strain optimization approaches focused on bioengineering objectives have been proposed, none tackles the multi-objective nature of the cellular objectives. In this work, we propose multi-objective evolutionary algorithms for strain optimization, where objective functions are defined based on distinct phenotype prediction methods, showing that those can lead to more robust designs and make it possible to find solutions in more complex scenarios.
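The notion of Pareto optimality used above can be made concrete with a short sketch. It shows only a non-dominated filter for maximization problems, not the evolutionary algorithms proposed in the work; the two objectives and the candidate scores are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical strain designs scored as (growth, product yield).
designs = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9), (0.5, 0.2)]

# (0.4, 0.4) and (0.5, 0.2) are both dominated by (0.5, 0.5).
print(pareto_front(designs))
```

A multi-objective optimizer returns such a set of trade-off solutions rather than a single best design.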
Genome‐scale metabolic models (GEMs) are widely used to calculate metabolic phenotypes. They rely on defining a set of constraints, the most common of which is that the production of metabolites and/or growth are limited by the carbon source uptake rate. However, enzyme abundances and kinetics, which act as limitations on metabolic fluxes, are not taken into account. Here, we present GECKO, a method that enhances a GEM to account for enzymes as part of reactions, thereby ensuring that each metabolic flux does not exceed its maximum capacity, equal to the product of the enzyme’s abundance and turnover number. We applied GECKO to a Saccharomyces cerevisiae GEM and demonstrated that the new model could correctly describe phenotypes that the previous model could not, particularly under high enzymatic pressure conditions, such as yeast growing on different carbon sources in excess, coping with stress, or overexpressing a specific pathway. GECKO also allows direct integration of quantitative proteomics data; by doing so, we significantly reduced flux variability of the model, in over 60% of metabolic reactions. Additionally, the model gives insight into the distribution of enzyme usage between and within metabolic pathways. The developed method and model are expected to increase the use of model‐based design in metabolic engineering.
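The capacity constraint described above, flux no greater than enzyme abundance times turnover number, can be written out as a small worked example. This is a sketch of the arithmetic only, not the GECKO implementation; the function names, the unit choices (turnover number in 1/s, abundance in mmol/gDW, flux in mmol/gDW/h), and the numbers are assumptions made for illustration.

```python
SECONDS_PER_HOUR = 3600


def max_flux(kcat_per_s, enzyme_mmol_per_gdw):
    """Upper bound on a flux in mmol/gDW/h: turnover number times abundance."""
    return kcat_per_s * SECONDS_PER_HOUR * enzyme_mmol_per_gdw


def within_capacity(flux, kcat_per_s, enzyme_mmol_per_gdw):
    """Check that a non-negative flux respects the enzyme capacity bound."""
    return flux <= max_flux(kcat_per_s, enzyme_mmol_per_gdw)


# Hypothetical enzyme: kcat = 100 1/s, abundance = 1e-5 mmol/gDW.
# The bound is about 3.6 mmol/gDW/h.
bound = max_flux(100.0, 1e-5)
print(round(bound, 3))
print(within_capacity(2.0, 100.0, 1e-5))   # below the bound
print(within_capacity(5.0, 100.0, 1e-5))   # above the bound
```

When measured protein abundances are available, each one tightens the corresponding bound, which is why integrating proteomics data reduces flux variability.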
Cpf1 represents a novel single RNA‐guided CRISPR/Cas endonuclease system suitable for genome editing with distinct features compared with Cas9. We demonstrate the functionality of three Cpf1 orthologues – Acidaminococcus spp. BV3L6 (AsCpf1), Lachnospiraceae bacterium ND2006 (LbCpf1) and Francisella novicida U112 (FnCpf1) – for genome editing of Saccharomyces cerevisiae. These Cpf1‐based systems enable fast and reliable introduction of donor DNA on the genome using a two‐plasmid‐based editing approach together with linear donor DNA. LbCpf1 and FnCpf1 displayed editing efficiencies comparable with the CRISPR/Cas9 system, whereas AsCpf1 editing efficiency was lower. Further characterization showed that AsCpf1 and LbCpf1 displayed a preference for their cognate crRNA, while FnCpf1‐mediated editing with similar efficiencies was observed using non‐cognate crRNAs of AsCpf1 and LbCpf1. In addition, multiplex genome editing using a single LbCpf1 crRNA array is shown to be functional in yeast. This work demonstrates that Cpf1 broadens the genome editing toolbox available for Saccharomyces cerevisiae.
Many microorganisms live in communities and depend on metabolites secreted by fellow community members for survival. Yet our knowledge of interspecies metabolic dependencies is limited to a few communities with a small number of exchanged metabolites, and even less is known about cellular regulation facilitating metabolic exchange. Here we show how yeast enables growth of lactic acid bacteria through endogenous, multi-component cross-feeding in a readily established community. In nitrogen-rich environments, Saccharomyces cerevisiae adjusts its metabolism by secreting a pool of metabolites, especially amino acids, and thereby enables survival of Lactobacillus plantarum and Lactococcus lactis. The quantity of available nitrogen sources and the status of nitrogen catabolite repression pathways jointly modulate this niche creation. We demonstrate how nitrogen overflow by yeast benefits L. plantarum in grape juice, and contributes to the emergence of mutualism with L. lactis in a medium with lactose. Our results illustrate how metabolic decisions of an individual species can benefit others.
Summary: Metabolite analogues (MAs) mimic the structure of native metabolites, can competitively inhibit their utilization in enzymatic reactions, and are commonly used as selection tools for isolating desirable mutants of industrial microorganisms. Genome-scale metabolic models representing all biochemical reactions in an organism can be used to predict effects of MAs on cellular phenotypes. Here, we present the metabolite analogues for rational strain improvement (MARSI) framework. MARSI provides a rational approach to strain improvement by searching for metabolites as targets instead of genes or reactions. The designs found by MARSI can be implemented by supplying MAs in the culture media, enabling metabolic rewiring without the use of recombinant DNA technologies that cannot always be used due to regulations. To facilitate experimental implementation, MARSI provides tools to identify candidate MAs to a target metabolite from a database of known drugs and analogues.
Computational systems biology methods enable rational design of cell factories on a genome-scale and thus accelerate the engineering of cells for the production of valuable chemicals and proteins. Unfortunately, the majority of these methods’ implementations are either not published, rely on proprietary software, or do not provide documented interfaces, which has precluded their mainstream adoption in the field. In this work we present cameo, a platform-independent software that enables in silico design of cell factories and targets both experienced modelers as well as users new to the field. It is written in Python and implements state-of-the-art methods for enumerating and prioritizing knockout, knock-in, overexpression, and down-regulation strategies and combinations thereof. Cameo is an open source software project and is freely available under the Apache License 2.0. A dedicated Web site including documentation, examples, and installation instructions can be found at http://cameo.bio. Users can also give cameo a try at http://try.cameo.bio.
Bacterial metabolism plays a fundamental role in gut microbiota ecology and host–microbiome interactions. Yet the metabolic capabilities of most gut bacteria have remained unknown. Here we report growth characteristics of 96 phylogenetically diverse gut bacterial strains across 4 rich and 15 defined media. The vast majority of strains (76) grow in at least one defined medium, enabling accurate assessment of their biosynthetic capabilities. These do not necessarily match phylogenetic similarity, thus indicating a complex evolution of nutritional preferences. We identify mucin utilizers and species inhibited by amino acids and short-chain fatty acids. Our analysis also uncovers media for in vitro studies wherein growth capacity correlates well with in vivo abundance. Further value of the underlying resource is demonstrated by correcting pathway gaps in available genome-scale metabolic models of gut microorganisms. Together, the media resource and the extracted knowledge on growth abilities widen experimental and computational access to the gut microbiota.
Background: Gut microbes influence their hosts in many ways, in particular by modulating the impact of diet. These effects have been studied most extensively in humans and mice. In this work, we used whole genome metagenomics to investigate the relationship between the gut metagenomes of dogs, humans, mice, and pigs.
Fast metabolite quantification methods are required for high-throughput screening of microbial strains obtained by combinatorial or evolutionary engineering approaches. In this study, a rapid RIP-LC-MS/MS (RapidRIP) method for high-throughput quantitative metabolomics was developed and validated that was capable of quantifying 102 metabolites from central, amino acid, energy, nucleotide, and cofactor metabolism in less than 5 minutes. The method was shown to have sensitivity and resolving capability comparable to those of a full-length RIP-LC-MS/MS method (FullRIP). The RapidRIP method was used to quantify the metabolome of seven industrial strains of E. coli, revealing significant differences in glycolytic, pentose phosphate, TCA cycle, amino acid, and energy and cofactor metabolites. These differences translated to statistically and biologically significant differences in thermodynamics of biochemical reactions between strains that could have implications when choosing a host for bioprocessing.
Microbial cell factories have proven to be an economical means of production for many bulk, specialty, and fine chemical products. However, we still lack both a holistic understanding of organism physiology and the ability to predictively tune enzyme activities in vivo, thus slowing down rational engineering of industrially relevant strains. An alternative concept to rational engineering is to use evolution as the driving force to select for desired changes, an approach often described as evolutionary engineering. In evolutionary engineering, in vivo selections for a desired phenotype are combined with either generation of spontaneous mutations or some form of targeted or random mutagenesis. Evolutionary engineering has been used to successfully engineer easily selectable phenotypes, such as utilization of a suboptimal nutrient source or tolerance to inhibitory substrates or products. In this review, we focus primarily on a more challenging problem—the use of evolutionary engineering for improving the production of chemicals in microbes directly. We describe recent developments in evolutionary engineering strategies, in general, and discuss, in detail, case studies where production of a chemical has been successfully achieved through evolutionary engineering by coupling production to cellular growth.
Summary: pyTFA and matTFA are the first published implementations of the thermodynamics-based flux analysis (TFA) method described in the original TFA paper. Specifically, they include explicit formulation of Gibbs energies and metabolite concentrations, which enables straightforward integration of metabolite concentration measurements.
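The link between metabolite concentrations and Gibbs energies can be shown with a short worked example. This is a sketch of the standard relation between the transformed Gibbs energy of reaction, its standard value, and the reaction quotient; it does not use the pyTFA or matTFA APIs. The function name, the toy reaction, and all numerical values are assumptions.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def reaction_gibbs_energy(dg0_prime, stoich, conc_molar, temperature_k=298.15):
    """Return dG' = dG'0 + R*T*sum(n_i * ln(c_i)) in kJ/mol.

    stoich maps metabolite -> stoichiometric coefficient (negative for
    substrates, positive for products); conc_molar maps metabolite -> mol/L.
    """
    ln_q = sum(coeff * math.log(conc_molar[met]) for met, coeff in stoich.items())
    return dg0_prime + R_KJ_PER_MOL_K * temperature_k * ln_q


# Hypothetical reaction A -> B with an unfavourable standard Gibbs energy.
stoich = {"A": -1, "B": 1}
dg = reaction_gibbs_energy(5.0, stoich, {"A": 1e-2, "B": 1e-4})

# A high substrate-to-product ratio makes dG' negative, so the forward
# direction is thermodynamically allowed despite dG'0 > 0.
print(round(dg, 2), dg < 0)
```

Measured concentrations narrow the range of possible Gibbs energies for each reaction, which in turn restricts the directions in which it can carry flux.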
Soils harbour some of the most diverse microbiomes on Earth and are essential for both nutrient cycling and carbon storage. To understand soil functioning, it is necessary to model the global distribution patterns and functional gene repertoires of soil microorganisms, as well as the biotic and environmental associations between the diversity and structure of both bacterial and fungal soil communities. Here we show, by leveraging metagenomics and metabarcoding of global topsoil samples (189 sites, 7,560 subsamples), that bacterial, but not fungal, genetic diversity is highest in temperate habitats and that microbial gene composition varies more strongly with environmental variables than with geographic distance. We demonstrate that fungi and bacteria show global niche differentiation that is associated with contrasting diversity responses to precipitation and soil pH. Furthermore, we provide evidence for strong bacterial–fungal antagonism, inferred from antibiotic-resistance genes, in topsoil and ocean habitats, indicating the substantial role of biotic interactions in shaping microbial communities. Our results suggest that both competition and environmental filtering affect the abundance, composition and encoded gene functions of bacterial and fungal communities, indicating that the relative contributions of these microorganisms to global nutrient cycling vary spatially.
Mathematical modeling is a key process to describe the behavior of biological networks. One of the most difficult challenges is to build models that allow quantitative predictions of the cells’ states over time. Recently, this issue started to be tackled through novel in silico approaches, such as the reconstruction of dynamic models, the use of phenotype prediction methods, and pathway design via efficient strain optimization algorithms. The use of dynamic models, which include detailed kinetic information of the biological systems, potentially increases the scope of the applications and the accuracy of the phenotype predictions. New efforts in metabolic engineering aim at bridging the gap between this approach and other paradigms of mathematical modeling, such as constraint-based approaches. These strategies take advantage of the best features of each method, and deal with the most remarkable limitation, the lack of available experimental information, which affects the accuracy and feasibility of solutions. Parameter estimation helps to solve this problem, but adds computational cost to the overall process. Moreover, existing approaches have limitations in scalability, flexibility, and convergence time of the simulations, among others. The aim is to establish a trade-off between the size of the model and the level of accuracy of the solutions. In this work, we review the state of the art of dynamic modeling and related methods used for metabolic engineering applications, including approaches based on hybrid modeling. We describe approaches developed to address issues regarding the mathematical formulation and the underlying optimization algorithms, and that address phenotype prediction by including available kinetic rate laws of metabolic processes. Then, we discuss how these have been used and combined as the basis to build computational strain optimization methods for metabolic engineering purposes, and how they lead to bi-level schemes that can be used in industry, including a consideration of their limitations.
Genome-scale metabolic models are instrumental in uncovering operating principles of cellular metabolism, for model-guided re-engineering, and unraveling cross-feeding in microbial communities. Yet, the application of genome-scale models, especially to microbial communities, is lagging behind the availability of sequenced genomes. This is largely due to the time-consuming steps of manual curation required to obtain good quality models. Here, we present an automated tool, CarveMe, for reconstruction of species and community level metabolic models. We introduce the concept of a universal model, which is manually curated and simulation ready. Starting with this universal model and annotated genome sequences, CarveMe uses a top-down approach to build single-species and community models in a fast and scalable manner. We show that CarveMe models perform closely to manually curated models in reproducing experimental phenotypes (substrate utilization and gene essentiality). Additionally, we build a collection of 74 models for human gut bacteria and test their ability to reproduce growth on a set of experimentally defined media. Finally, we create a database of 5587 bacterial models and demonstrate its potential for fast generation of microbial community models. Overall, CarveMe provides an open-source and user-friendly tool towards broadening the use of metabolic modeling in studying microbial species and communities.
Vitamins are essential compounds in human and animal diets. Their demand is increasing globally in food, feed, cosmetics, chemical and pharmaceutical industries. Most current production methods are unsustainable because they use non-renewable sources and often generate hazardous waste. Many microorganisms produce vitamins naturally, but their corresponding metabolic pathways are tightly regulated since vitamins are needed only in catalytic amounts. Metabolic engineering is accelerating the development of microbial cell factories for vitamins that could compete with chemical methods that have been optimized over decades, but scientific hurdles remain. Additional technological and regulatory issues need to be overcome for innovative bioprocesses to reach the market. Here, we review the current state of development and challenges for fermentative processes for the B vitamin group.
Background Genome-scale metabolic models (GEMs) allow predicting metabolic phenotypes from limited data on uptake and secretion fluxes by defining the space of all the feasible solutions and excluding physico-chemically and biologically unfeasible behaviors. The integration of additional biological information in genome-scale models, e.g., transcriptomic or proteomic profiles, has the potential to improve phenotype prediction accuracy. This is particularly important for metabolic engineering applications where more accurate model predictions can translate to more reliable model-based strain design. Results Here we present a GEM with Enzymatic Constraints using Kinetic and Omics data (GECKO) model of Bacillus subtilis, which uses publicly available proteomic data and enzyme kinetic parameters for central carbon (CC) metabolic reactions to constrain the flux solution space. This model allows more accurate prediction of the flux distribution and growth rate of wild-type and single-gene/operon deletion strains compared to a standard genome-scale metabolic model. The flux prediction error decreased by 43% and 36% for wild-type and mutants respectively. The model additionally increased the number of correctly predicted essential genes in CC pathways by 2.5-fold and significantly decreased flux variability in more than 80% of the reactions with variable flux. Finally, the model was used to find new gene deletion targets to optimize the flux toward the biosynthesis of poly-γ-glutamic acid (γ-PGA) polymer in engineered B. subtilis. We implemented the single-reaction deletion targets identified by the model experimentally and showed that the new strains have a twofold higher γ-PGA concentration and production rate compared to the ancestral strain. Conclusions This work confirms that integration of enzyme constraints is a powerful tool to improve existing genome-scale models, and demonstrates the successful use of enzyme-constrained models in B. subtilis metabolic engineering. We expect that the new model can be used to guide future metabolic engineering efforts in the important industrial production host B. subtilis.
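The enzyme-constraint idea in the abstract above can be illustrated with a minimal Python sketch. It assumes the commonly used capacity bound v ≤ kcat · [E], where kcat is the turnover number and [E] the enzyme abundance. The function names, units, and numbers are illustrative assumptions, not taken from the B. subtilis model or from the GECKO toolbox.

```python
SECONDS_PER_HOUR = 3600


def enzyme_flux_bound(kcat_per_s, enzyme_mmol_per_gdw):
    """Upper bound on a reaction flux in mmol/gDW/h, assuming v <= kcat * [E].

    kcat_per_s: turnover number in 1/s (illustrative input).
    enzyme_mmol_per_gdw: enzyme abundance in mmol per gram dry weight.
    """
    if kcat_per_s < 0 or enzyme_mmol_per_gdw < 0:
        raise ValueError("kcat and enzyme abundance must be non-negative")
    return kcat_per_s * SECONDS_PER_HOUR * enzyme_mmol_per_gdw


def constrain_flux(unconstrained_flux, kcat_per_s, enzyme_mmol_per_gdw):
    """Clip a non-negative flux to the enzyme-imposed capacity."""
    return min(unconstrained_flux, enzyme_flux_bound(kcat_per_s, enzyme_mmol_per_gdw))


# Hypothetical numbers: kcat = 100 1/s and 2e-5 mmol enzyme per gDW.
bound = enzyme_flux_bound(100.0, 2e-5)
```

In a full model such a bound would be imposed on every enzyme-catalyzed reaction inside the optimization problem, which is how the feasible flux space shrinks.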
Background A recurrent problem in genome-scale metabolic models (GEMs) is to correctly represent lipids as biomass requirements, due to the numerous possible combinations of individual lipid species and the corresponding lack of fully detailed data. In this study we present SLIMEr, a formalism for correctly representing lipid requirements in GEMs using commonly available experimental data. Results SLIMEr enhances a GEM with mathematical constructs where we Split Lipids Into Measurable Entities (SLIME reactions), in addition to constraints on both the lipid classes and the acyl chain distribution. By implementing SLIMEr on the consensus GEM of Saccharomyces cerevisiae, we can represent accurate amounts of lipid species, analyze the flexibility of the resulting distribution, and compute the energy costs of moving from one metabolic state to another. Conclusions The approach shows potential for better understanding lipid metabolism in yeast under different conditions. SLIMEr is freely available at https://github.com/SysBioChalmers/SLIMEr.
Objective The composition of the healthy human adult gut microbiome is relatively stable over prolonged periods, and representatives of the most highly abundant and prevalent species have been cultured and described. However, microbial abundances can change on perturbations, such as antibiotic intake, enabling the identification and characterisation of otherwise low abundant species. Design Analysing gut microbial time-series data, we used shotgun metagenomics to create strain level taxonomic and functional profiles. Community dynamics were modelled postintervention with a focus on conditionally rare taxa and previously unknown bacteria. Results In response to a commonly prescribed cephalosporin (ceftriaxone), we observe a strong compositional shift in one subject, in which a previously unknown species, UBorkfalki ceftriaxensis, was identified, blooming to 92% relative abundance. The genome assembly reveals that this species (1) belongs to a so far undescribed order of Firmicutes, (2) is ubiquitously present at low abundances in at least one third of adults, (3) is opportunistically growing, being ecologically similar to typical probiotic species and (4) is stably associated with healthy hosts as determined by single nucleotide variation analysis. It was the first coloniser after the antibiotic intervention that led to a long-lasting microbial community shift and likely permanent loss of nine commensals. Conclusion The bloom of UB. ceftriaxensis and a subsequent one of Parabacteroides distasonis demonstrate the existence of monodominance community states in the gut. Our study points to an undiscovered wealth of low abundant but common taxa in the human gut and calls for more highly resolved longitudinal studies, in particular on ecosystem perturbations.
Biological production of chemicals is an attractive alternative to petrochemical-based production, due to advantages in environmental impact and the spectrum of feasible targets. However, engineering microbial strains to overproduce a compound of interest can be a long, costly and painstaking process. If production can be coupled to cell growth it is possible to use adaptive laboratory evolution to increase the production rate. Strategies for coupling production to growth, however, are often not trivial to find. Here we present OptCouple, a constraint-based modeling algorithm to simultaneously identify combinations of gene knockouts, insertions and medium supplements that lead to growth-coupled production of a target compound. We validated the algorithm by showing that it can find novel strategies that are growth-coupled in silico for a compound that has not been coupled to growth previously, as well as reproduce known growth-coupled strain designs for two different target compounds. Furthermore, we used OptCouple to construct an alternative design with potential for higher production. We provide an efficient and easy-to-use implementation of the OptCouple algorithm in the cameo Python package for computational strain design.
Microbial cell factories offer new and sustainable production routes for high-value chemicals. However, identification of high producers within a library of clones remains a challenge. When product formation is coupled to growth, millions of metabolic variants can be effectively interrogated by growth selection, dramatically increasing the throughput of strain evaluation. While growth-coupled selections for cell factories have a long history of success based on metabolite auxotrophies and toxic antimetabolites, such methods are generally restricted to molecules native to their host metabolism. New synthetic biology tools offer the opportunity to rewire cellular metabolism to depend on specific and non-native products for growth.
The uncertain relationship between genotype and phenotype can make strain engineering an arduous trial-and-error process. To identify promising gene targets faster, constraint-based modeling methodologies are often used, although they remain limited in their predictive power. Even though the search for gene knockouts is fairly established in constraint-based modeling, most strain design methods still model gene up/down-regulations by forcing the corresponding flux values to fixed levels without taking into consideration the availability of resources. Here, we present a constraint-based algorithm, the turnover-dependent phenotypic simulation (TDPS), that quantitatively simulates phenotypes in a resource-conscious manner. Unlike other available algorithms, TDPS does not force flux values and considers resource availability, using metabolite production turnovers as an indicator of metabolite abundance. TDPS can simulate up-regulation of metabolic reactions as well as the introduction of heterologous genes, alongside gene deletion and down-regulation scenarios. TDPS simulations were validated using engineered Saccharomyces cerevisiae strains available in the literature by comparing the simulated and experimental production yields of the target metabolite. For many of the strains evaluated, the experimental production yields were within the simulated intervals and the relative strain performance could be predicted with TDPS. However, the algorithm failed to predict some of the production changes observed experimentally, suggesting that further improvements are necessary. The results also showed that TDPS may be helpful in finding metabolic bottlenecks, but further experiments would be required to confirm these findings.
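The notion of a metabolite production turnover mentioned above can be made concrete with a small Python sketch. It assumes that the turnover of a metabolite is the sum of all positive stoichiometric contributions (coefficient times flux) across reactions. The toy network, reaction names, and function name are hypothetical, and this is not the TDPS implementation itself.

```python
def production_turnover(stoichiometry, fluxes):
    """Return the production turnover of each metabolite.

    stoichiometry: dict metabolite -> dict reaction -> coefficient
    fluxes: dict reaction -> flux value (negative = reverse direction)
    """
    turnover = {}
    for metabolite, coefficients in stoichiometry.items():
        total = 0.0
        for reaction, coefficient in coefficients.items():
            contribution = coefficient * fluxes.get(reaction, 0.0)
            if contribution > 0:  # only count terms that produce the metabolite
                total += contribution
        turnover[metabolite] = total
    return turnover


# Hypothetical toy network: R1 makes A, R2 converts A to B, R3 drains B.
toy_stoichiometry = {
    "A": {"R1": 1.0, "R2": -1.0},
    "B": {"R2": 1.0, "R3": -1.0},
}
toy_fluxes = {"R1": 2.0, "R2": 2.0, "R3": 2.0}
toy_turnover = production_turnover(toy_stoichiometry, toy_fluxes)
```

At steady state the production and consumption turnovers of a metabolite are equal, so either one can serve as a proxy for how much of it passes through the network.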
The CRISPR/Cas12a system in combination with a single crRNA array enables efficient multiplex editing of the S. cerevisiae genome at multiple loci simultaneously. This is demonstrated by constructing carotenoid producing yeast strains which are subsequently used to create yeast pixel art.
Background Shotgun metagenomes contain a sample of all the genomic material in an environment, allowing for the characterization of a microbial community. In order to understand these communities, bioinformatics methods are crucial. A common first step in processing metagenomes is to compute abundance estimates of different taxonomic or functional groups from the raw sequencing data. Given the breadth of the field, computational solutions need to be flexible and extensible, enabling the combination of different tools into a larger pipeline. Results We present NGLess and NG-meta-profiler. NGLess is a domain specific language for describing next-generation sequence processing pipelines. It was developed with the goal of enabling user-friendly computational reproducibility. It provides built-in support for many common operations on sequencing data and is extensible with external tools with configuration files. Using this framework, we developed NG-meta-profiler, a fast profiler for metagenomes which performs sequence preprocessing, mapping to bundled databases, filtering of the mapping results, and profiling (taxonomic and functional). It is significantly faster than either MOCAT2 or htseq-count and (as it builds on NGLess) its results are perfectly reproducible. Conclusions NG-meta-profiler is a high-performance solution for metagenomics processing built on NGLess. It can be used as-is to execute standard analyses or serve as the starting point for customization in a perfectly reproducible fashion. NGLess and NG-meta-profiler are open source software (under the liberal MIT license) and can be downloaded from https://ngless.embl.de or installed through bioconda.
Background Computational strain optimisation methods (CSOMs) have been successfully used to exploit genome-scale metabolic models, yielding strategies useful for allowing compound overproduction in metabolic cell factories. Minimal cut sets are particularly interesting since their definition allows searching for intervention strategies that impose strong growth-coupling phenotypes, and are not subject to optimality bias when compared with simulation-based CSOMs. However, since both types of methods have different underlying principles, they also imply different ways to formulate metabolic engineering problems, posing an obstacle when comparing their outputs. Results In this work, we perform an in-depth analysis of potential strategies that can be obtained with both methods, providing a critical comparison of performance, robustness, predicted phenotypes as well as strategy structure and size. To this end, we devised a pipeline including enumeration of strategies from evolutionary algorithms (EA) and minimal cut sets (MCS), filtering and flux analysis of predicted mutants to optimize the production of succinic acid in Saccharomyces cerevisiae. We additionally attempt to generalize problem formulations for MCS enumeration within the context of growth-coupled product synthesis. Strategies from evolutionary algorithms show the best compromise between acceptable growth rates and compound overproduction. However, constrained MCSs lead to a larger variety of phenotypes with several degrees of growth-coupling with production flux. The latter have proven useful in revealing the importance, in silico, of the gamma-aminobutyric acid shunt and manipulation of cofactor pools in growth-coupled designs for succinate production, mechanisms which have also been touted as potentially useful for metabolic engineering. Conclusions The two main groups of CSOMs are valuable for finding growth-coupled mutants. Despite the limitations in maximum growth rates and large strategy sizes, MCSs help uncover novel mechanisms for compound overproduction and thus, analyzing outputs from both methods provides a richer overview on strategies that can be potentially carried over in vivo.
Background Pythium irregulare is an oleaginous oomycete able to accumulate large amounts of lipids, including eicosapentaenoic acid (EPA). EPA is an important and expensive dietary supplement with a promising and very competitive market, which is dependent on fish-oil extraction. This has prompted several research groups to study biotechnological routes to obtain specific fatty acids rather than a mixture of various lipids. Moreover, microorganisms can use low-cost carbon sources for lipid production, thus reducing production costs. Previous studies have highlighted the production of EPA by P. irregulare, exploiting diverse low-cost carbon sources that are produced in large amounts, such as vinasse, glycerol, and food wastewater. However, there is still a lack of knowledge about its biosynthetic pathways, because no functional annotation of any Pythium sp. exists yet. The goal of this work was to identify key genes and pathways related to EPA biosynthesis in P. irregulare CBS 494.86, by sequencing and performing an unprecedented annotation of its genome, considering the possibility of using wastewater as a carbon source. Results Genome sequencing provided 17,727 candidate genes, with 3809 of them associated with an enzyme code and 945 with membrane transporter proteins. The functional annotation was compared with curated information on oleaginous organisms, covering amino acid and fatty acid production, and the consumption of carbon and nitrogen sources present in the wastewater. The main features include the presence of genes related to the consumption of several sugars and candidate genes for unsaturated fatty acid production. Conclusions The whole metabolic genome presented, which is an unprecedented reconstruction of P. irregulare CBS 494.86, shows its potential to produce value-added products, in particular EPA, for the food and pharmaceutical industries. Moreover, it infers metabolic capabilities of the microorganism by incorporating information obtained from literature and genomic data, supplying information of great importance to future work.
Summary CoBAMP is a modular framework for the enumeration of pathway analysis concepts, such as elementary flux modes (EFM) and minimal cut sets in genome-scale constraint-based models (CBMs) of metabolism. It currently includes the K-shortest EFM algorithm and facilitates integration with other frameworks involving reading, manipulation and analysis of CBMs. Availability and implementation The software is implemented in Python 3, supported on most operating systems and requires a mixed-integer linear programming optimizer supported by the optlang framework. Source-code is available at https://github.com/BioSystemsUM/cobamp.
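To illustrate how elementary flux modes and minimal cut sets relate, the following Python sketch enumerates minimal cut sets by brute force as minimal hitting sets of a list of target flux modes. It is a toy illustration under that textbook definition, written with the standard library only. It does not use the CoBAMP API, and the function name and reaction identifiers are hypothetical.

```python
from itertools import combinations


def minimal_cut_sets(target_modes, max_size=None):
    """Enumerate minimal cut sets by brute force.

    target_modes: iterable of sets, each holding the reactions active in
        one elementary flux mode that should be blocked.
    A cut set contains at least one reaction from every target mode;
    it is minimal if no proper subset is also a cut set.
    """
    modes = [frozenset(mode) for mode in target_modes]
    reactions = sorted(set().union(*modes)) if modes else []
    limit = len(reactions) if max_size is None else max_size
    found = []
    for size in range(1, limit + 1):
        for candidate in combinations(reactions, size):
            candidate_set = frozenset(candidate)
            # Skip supersets of smaller cut sets that were already found.
            if any(cut <= candidate_set for cut in found):
                continue
            if all(candidate_set & mode for mode in modes):
                found.append(candidate_set)
    return found


# Hypothetical target modes: both use R1, then branch through R2 or R3.
toy_cuts = minimal_cut_sets([{"R1", "R2"}, {"R1", "R3"}])
```

Brute force scales exponentially with the number of reactions, which is why genome-scale work relies on dedicated enumeration algorithms backed by mixed-integer solvers.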
Background As genome sequencing projects grow rapidly, the diversity of organisms with recently assembled genome sequences peaks at an unprecedented scale, thereby highlighting the need to make gene functional annotations fast and efficient. However, the (high) quality of such annotations must be guaranteed, as this is the first indicator of the genomic potential of every organism. Automatic procedures help accelerate the annotation process, though they decrease the confidence and reliability of the outcomes. Manually curating a genome-wide annotation of gene, enzyme and transporter protein functions is a highly time-consuming, tedious and impractical task, even for the most proficient curator. Hence, a semi-automated procedure, which balances the two approaches, will increase the reliability of the annotation, while speeding up the process. In fact, a prior analysis of the annotation algorithm may improve its performance, by tuning its parameters, hastening the downstream processing and the manual curation of assigning functions to genes encoding proteins. Results Here SamPler, a novel strategy to select parameters for gene functional annotation routines, is presented. This semi-automated method is based on the manual curation of a randomly selected set of genes/proteins. Then, in a multi-dimensional array, this sample is used to assess the automatic annotations for all possible combinations of the algorithm’s parameters. These assessments allow creating an array of confusion matrices, for which several metrics are calculated (accuracy, precision and negative predictive value) and used to reach optimal values for the parameters. Conclusions The potential of this methodology is demonstrated with four genome functional annotations performed in merlin, an in-house user-friendly computational framework for genome-scale metabolic annotation and model reconstruction. For that, SamPler was implemented as a new plugin for the merlin tool.
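The evaluation step described above, one confusion matrix per parameter setting followed by accuracy, precision and negative predictive value, can be sketched in a few lines of Python. This is a simplified illustration that scans a single hypothetical score threshold rather than the multi-dimensional parameter grid of the method. The scores, labels, and function names are assumptions made for the example and are not taken from SamPler or merlin.

```python
def confusion_matrix(scores, curated_correct, threshold):
    """Count TP, FP, TN, FN when annotations scoring >= threshold are accepted."""
    tp = fp = tn = fn = 0
    for score, correct in zip(scores, curated_correct):
        accepted = score >= threshold
        if accepted and correct:
            tp += 1
        elif accepted and not correct:
            fp += 1
        elif not accepted and not correct:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Accuracy, precision and negative predictive value (None if undefined)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "npv": tn / (tn + fn) if (tn + fn) else None,
    }


def scan_thresholds(scores, curated_correct, thresholds):
    """Evaluate every candidate threshold against the curated sample."""
    return {t: metrics(*confusion_matrix(scores, curated_correct, t)) for t in thresholds}


# Hypothetical curated sample: annotation scores and whether curation agreed.
sample_scores = [0.9, 0.8, 0.6, 0.4, 0.2]
sample_correct = [True, True, False, True, False]
results = scan_thresholds(sample_scores, sample_correct, [0.3, 0.5, 0.7])
```

With more than one parameter, the same scan would run over the Cartesian product of the candidate values, giving the array of confusion matrices that the abstract refers to.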
Genome-scale metabolic models (GEMs) represent extensive knowledgebases that provide a platform for model simulations and integrative analysis of omics data. This study introduces Yeast8 and an associated ecosystem of models that represent a comprehensive computational resource for performing simulations of the metabolism of Saccharomyces cerevisiae - an important model organism and widely used cell-factory. Yeast8 tracks community development with version control, setting a standard for how GEMs can be continuously updated in a simple and reproducible way. We use Yeast8 to develop the derived models panYeast8 and coreYeast8, which in turn enable the reconstruction of GEMs for 1,011 different yeast strains. Through integration with enzyme constraints (ecYeast8) and protein 3D structures (proYeast8DB), Yeast8 further facilitates the exploration of yeast metabolism at a multi-scale level, enabling prediction of how single nucleotide variations translate to phenotypic traits.
OptFlux is open-source, modular software that supports in silico metabolic engineering tasks and aims to be the reference computational application in the field.
Cameo is a high-level Python library developed to aid the strain design process in metabolic engineering projects.
iPath2: interactive Pathways Explorer is a web-based tool for the visualization, analysis and customization of various pathways maps.
The Transport Reactions Annotation and Generation (Triage) tool identifies the metabolites transported by each transmembrane protein and its transporter family.
@Note is a Biomedical Text Mining platform that copes with major Information Retrieval and Information Extraction tasks and promotes multi-disciplinary research.
NGLess is a domain-specific language for processing next-generation sequencing (NGS) data, with a focus on metagenomics.
The Mass-Action Stoichiometric Simulation (MASS) toolbox is a modeling software package that focuses on the construction and analysis of kinetic and constraint-based models of biochemical reaction systems.
iTOL: interactive Tree Of Life is an online tool for the display, annotation and management of phylogenetic trees. Trees can be annotated with 14 different dataset types, and exported into various graphical formats.
The Metabolic Models Reconstruction Using Genome-Scale Information (merlin) tool is a user-friendly Java application that performs the reconstruction of genome-scale metabolic models for any organism that has its genome sequenced.
Optlang is a Python package implementing a modeling language for solving mathematical optimization problems, i.e. maximizing or minimizing an objective function over a set of variables subject to a number of constraints. Optlang provides a common interface to a series of optimization tools, so different solver backends can be changed in a transparent way.
The GECKO toolbox is a MATLAB/Python package for enhancing a Genome-scale model to account for Enzyme Constraints, using Kinetics and Omics.