Constraint-based analysis (CBA) code

We will analyse contraint-based models of the form

using cobrapy.

Information in this tutorial is based on https://cobrapy.readthedocs.io/en/latest/

%matplotlib inline

1. Working with metabolic models

1.1 Loading and inspecting model

To begin with, cobrapy comes with a “textbook” model of E. coli core metabolism. To load the test model use

import cobra
import cobra.test
model = cobra.test.create_test_model("textbook")

The reactions, metabolites, and genes attributes of the cobrapy model are a special type of list called a cobra.DictList, and each one is made up of cobra.Reaction, cobra.Metabolite and cobra.Gene objects respectively.

print(len(model.reactions))
print(len(model.metabolites))
print(len(model.genes))
95
72
137

When using Jupyter notebook this type of information is rendered as a table.

model
Name e_coli_core
Memory address 0x07fac4a9cb290
Number of metabolites 72
Number of reactions 95
Number of groups 0
Objective expression 1.0*Biomass_Ecoli_core - 1.0*Biomass_Ecoli_core_reverse_2cdba
Compartments cytosol, extracellular

Just like a regular list, objects in the DictList can be retrieved by index. For example, to get the 30th reaction in the model (at index 29 because of 0-indexing):

model.reactions[29]
Reaction identifierEX_glu__L_e
NameL-Glutamate exchange
Memory address 0x07facb0d14610
Stoichiometry

glu__L_e -->

L-Glutamate -->

GPR
Lower bound0.0
Upper bound1000.0

Additionally, items can be retrieved by their id using the DictList.get_by_id() function. For example, to get the cytosolic atp metabolite object (the id is “atp_c”), we can do the following:

model.metabolites.get_by_id("atp_c")
Metabolite identifieratp_c
NameATP
Memory address 0x07fac4a737850
FormulaC10H12N5O13P3
Compartmentc
In 13 reaction(s) PGK, ACKr, ADK1, PFK, PPS, ATPM, SUCOAS, Biomass_Ecoli_core, GLNS, PYK, ATPS4r, PPCK, GLNabc

1.2 Reactions

We will consider the reaction glucose 6-phosphate isomerase, which interconverts glucose 6-phosphate and fructose 6-phosphate. The reaction id for this reaction in our test model is PGI.

pgi = model.reactions.get_by_id("PGI")
pgi
Reaction identifierPGI
Nameglucose-6-phosphate isomerase
Memory address 0x07facb0cfa7d0
Stoichiometry

g6p_c <=> f6p_c

D-Glucose 6-phosphate <=> D-Fructose 6-phosphate

GPRb4025
Lower bound-1000.0
Upper bound1000.0

We can also ensure the reaction is mass balanced. This function will return elements which violate mass balance. If it comes back empty, then the reaction is mass balanced.

pgi.check_mass_balance()
{}

1.3 Metabolites

We will consider cytosolic atp as our metabolite, which has the id “atp_c” in our test model.

atp = model.metabolites.get_by_id("atp_c")
atp
Metabolite identifieratp_c
NameATP
Memory address 0x07fac4a737850
FormulaC10H12N5O13P3
Compartmentc
In 13 reaction(s) PGK, ACKr, ADK1, PFK, PPS, ATPM, SUCOAS, Biomass_Ecoli_core, GLNS, PYK, ATPS4r, PPCK, GLNabc

1.4 Genes

The gene_reaction_rule is a boolean representation of the gene requirements for this reaction to be active as described in Schellenberger2011.

The GPR is stored as the gene_reaction_rule for a Reaction object as a string.

gpr = pgi.gene_reaction_rule
gpr
'b4025'

Corresponding gene objects also exist. These objects are tracked by the reactions itself, as well as by the model

pgi.genes
frozenset({<Gene b4025 at 0x7facb0d45090>})
pgi_gene = model.genes.get_by_id("b4025")
pgi_gene
Gene identifierb4025
Namepgi
Memory address 0x07facb0d45090
FunctionalTrue
In 1 reaction(s) PGI

Each gene keeps track of the reactions it catalyzes

pgi_gene.reactions
frozenset({<Reaction PGI at 0x7facb0cfa7d0>})

1.5 Export as SBML & Visualization

The model can be exported to standard exchange formats, e.g., Systems Biology Markup Language (SBML) and be analyzed in other tools like for instance Cytoscape with cy3sbml

from cobra.io import write_sbml_model
write_sbml_model(model, "./models/e_coli_core.xml")

For instance after loading the model with cy3sbml we can get an overview over the metabolic network with the associated gene-product rules. Note the typical hairball structure of biological networks due to highly connected nodes (cofactor metabolites).

2. FBA simulations

Simulations using flux balance analysis can be solved using Model.optimize(). This will maximize or minimize (maximizing is the default) flux through the objective reactions.

import cobra.test
model = cobra.test.create_test_model("textbook")
model
Name e_coli_core
Memory address 0x07facb0cfafd0
Number of metabolites 72
Number of reactions 95
Number of groups 0
Objective expression 1.0*Biomass_Ecoli_core - 1.0*Biomass_Ecoli_core_reverse_2cdba
Compartments cytosol, extracellular

2.1. Running FBA

solution = model.optimize()
print(solution)
<Solution 0.874 at 0x7facb0be84d0>

The Model.optimize() function will return a Solution object. A solution object has several attributes:

  • objective_value: the objective value
  • status: the status from the linear programming solver
  • fluxes: a pandas series with flux indexed by reaction identifier. The flux for a reaction variable is the difference of the primal values for the forward and reverse reaction variables.
  • shadow_prices: a pandas series with shadow price indexed by the metabolite identifier.

For example, after the last call to model.optimize(), if the optimization succeeds it’s status will be optimal. In case the model is infeasible an error is raised.

2.2. Analyzing FBA solutions

Models solved using FBA can be further analyzed by using summary methods, which output printed text to give a quick representation of model behavior. Calling the summary method on the entire model displays information on the input and output behavior of the model, along with the optimized objective.

model.summary()
IN_FLUXES OUT_FLUXES OBJECTIVES
ID FLUX ID FLUX ID FLUX
0 o2_e 21.799493 h2o_e 29.175827 Biomass_Ecoli_core 0.873922
1 glc__D_e 10.000000 co2_e 22.809833 NaN NaN
2 nh4_e 4.765319 h_e 17.530865 NaN NaN
3 pi_e 3.214895 NaN NaN NaN NaN

In addition, the input-output behavior of individual metabolites can also be inspected using summary methods. For instance, the following commands can be used to examine the overall redox balance of the model

model.metabolites.nadh_c.summary()
PERCENT FLUX REACTION_STRING
RXN_STAT ID
PRODUCING GAPD 41.582168 16.023526 g3p_c + nad_c + pi_c <=> 13dpg_c + h_c + nadh_c
PDH 24.088820 9.282533 coa_c + nad_c + pyr_c --> accoa_c + co2_c + na...
AKGDH 13.142408 5.064376 akg_c + coa_c + nad_c --> co2_c + nadh_c + suc...
MDH 13.142408 5.064376 mal__L_c + nad_c <=> h_c + nadh_c + oaa_c
Biomass_Ecoli_core 8.044196 3.099800 1.496 3pg_c + 3.7478 accoa_c + 59.81 atp_c + 0...
CONSUMING NADH16 100.000000 38.534610 4.0 h_c + nadh_c + q8_c --> 3.0 h_e + nad_c + ...

Or to get a sense of the main energy production and consumption reactions

model.metabolites.atp_c.summary()
PERCENT FLUX REACTION_STRING
RXN_STAT ID
PRODUCING ATPS4r 66.579799 45.514010 adp_c + 4.0 h_e + pi_c <=> atp_c + h2o_c + 3.0...
PGK 23.439885 16.023526 3pg_c + atp_c <=> 13dpg_c + adp_c
SUCOAS 7.408381 5.064376 atp_c + coa_c + succ_c <=> adp_c + pi_c + succ...
PYK 2.571936 1.758177 adp_c + h_c + pep_c --> atp_c + pyr_c
CONSUMING Biomass_Ecoli_core 76.461640 52.269245 1.496 3pg_c + 3.7478 accoa_c + 59.81 atp_c + 0...
ATPM 12.273243 8.390000 atp_c + h2o_c --> adp_c + h_c + pi_c
PFK 10.938227 7.477382 atp_c + f6p_c --> adp_c + fdp_c + h_c

2.3. Changing objectives

The objective function is determined from the objective_coefficient attribute of the objective reaction(s). Generally, a “biomass” function which describes the composition of metabolites which make up a cell is used.

biomass_rxn = model.reactions.get_by_id("Biomass_Ecoli_core")
biomass_rxn
Reaction identifierBiomass_Ecoli_core
NameBiomass Objective Function with GAM
Memory address 0x07facb0c110d0
Stoichiometry

1.496 3pg_c + 3.7478 accoa_c + 59.81 atp_c + 0.361 e4p_c + 0.0709 f6p_c + 0.129 g3p_c + 0.205 g6p_c + 0.2557 gln__L_c + 4.9414 glu__L_c + 59.81 h2o_c + 3.547 nad_c + 13.0279 nadph_c + 1.7867 oaa_c ...

1.496 3-Phospho-D-glycerate + 3.7478 Acetyl-CoA + 59.81 ATP + 0.361 D-Erythrose 4-phosphate + 0.0709 D-Fructose 6-phosphate + 0.129 Glyceraldehyde 3-phosphate + 0.205 D-Glucose 6-phosphate + 0.2557...

GPR
Lower bound0.0
Upper bound1000.0

Currently in the model, there is only one reaction in the objective (the biomass reaction), with an linear coefficient of 1.

from cobra.util.solver import linear_reaction_coefficients
linear_reaction_coefficients(model)
{<Reaction Biomass_Ecoli_core at 0x7facb0c110d0>: 1.0}

The objective function can be changed by assigning Model.objective, which can be a reaction object (or just it’s name), or a dict of {Reaction: objective_coefficient}.

# change the objective to ATPM
model.objective = "ATPM"

# The upper bound should be 1000, so that we get
# the actual optimal value
model.reactions.get_by_id("ATPM").upper_bound = 1000.
linear_reaction_coefficients(model)
{<Reaction ATPM at 0x7facb0c11990>: 1.0}
model.optimize().objective_value
174.99999999999966
model.summary()
IN_FLUXES OUT_FLUXES OBJECTIVES
ID FLUX ID FLUX ID FLUX
0 o2_e 60.0 co2_e 60.0 ATPM 175.0
1 glc__D_e 10.0 h2o_e 60.0 NaN NaN
model.metabolites.atp_c.summary()
PERCENT FLUX REACTION_STRING
RXN_STAT ID
PRODUCING ATPS4r 72.972973 135.0 adp_c + 4.0 h_e + pi_c <=> atp_c + h2o_c + 3.0...
SUCOAS 10.810811 20.0 atp_c + coa_c + succ_c <=> adp_c + pi_c + succ...
PGK 10.810811 20.0 3pg_c + atp_c <=> 13dpg_c + adp_c
PYK 5.405405 10.0 adp_c + h_c + pep_c --> atp_c + pyr_c
CONSUMING ATPM 94.594595 175.0 atp_c + h2o_c --> adp_c + h_c + pi_c
PFK 5.405405 10.0 atp_c + f6p_c --> adp_c + fdp_c + h_c

3 Flux Variability Analysis (FVA)

FBA will not give always give unique solution, because multiple flux states can achieve the same optimum. FVA (or flux variability analysis) finds the ranges of each metabolic flux at the optimum.

from cobra.flux_analysis import flux_variability_analysis
flux_variability_analysis(model, model.reactions[:10])
minimum maximum
ACALD -5.247085e-14 0.000000e+00
ACALDt -5.247085e-14 0.000000e+00
ACKr -8.024953e-14 0.000000e+00
ACONTa 2.000000e+01 2.000000e+01
ACONTb 2.000000e+01 2.000000e+01
ACt2r -8.024953e-14 0.000000e+00
ADK1 0.000000e+00 3.410605e-13
AKGDH 2.000000e+01 2.000000e+01
AKGt2r -3.172656e-14 0.000000e+00
ALCD2x -4.547474e-14 0.000000e+00

Setting parameter fraction_of_optimium=0.90 would give the flux ranges for reactions at 90% optimality.

cobra.flux_analysis.flux_variability_analysis(
    model, model.reactions[:10], fraction_of_optimum=0.9)
minimum maximum
ACALD -2.692308 0.0
ACALDt -2.692308 0.0
ACKr -4.117647 0.0
ACONTa 8.461538 20.0
ACONTb 8.461538 20.0
ACt2r -4.117647 0.0
ADK1 0.000000 17.5
AKGDH 2.500000 20.0
AKGt2r -1.489362 0.0
ALCD2x -2.333333 0.0

3.1. Running FVA in summary methods

Flux variability analysis can also be embedded in calls to summary methods. For instance, the expected variability in substrate consumption and product formation can be quickly found by

model.optimize()
model.summary(fva=0.95)
IN_FLUXES OUT_FLUXES OBJECTIVES
ID FLUX FLUX_MIN FLUX_MAX ID FLUX FLUX_MIN FLUX_MAX ID FLUX
0 o2_e 60.0 55.882353 60.000000 h2o_e 60.0 54.166667 60.000000 ATPM 175.0
1 glc__D_e 10.0 9.500000 10.000000 co2_e 60.0 54.166667 60.000000 NaN NaN
2 nh4_e -0.0 -0.000000 0.673077 h_e 0.0 0.000000 5.833333 NaN NaN
3 NaN NaN NaN NaN for_e 0.0 0.000000 5.833333 NaN NaN
4 NaN NaN NaN NaN ac_e 0.0 0.000000 2.058824 NaN NaN
5 NaN NaN NaN NaN acald_e 0.0 0.000000 1.346154 NaN NaN
6 NaN NaN NaN NaN pyr_e 0.0 0.000000 1.346154 NaN NaN
7 NaN NaN NaN NaN etoh_e 0.0 0.000000 1.166667 NaN NaN
8 NaN NaN NaN NaN lac__D_e 0.0 0.000000 1.129032 NaN NaN
9 NaN NaN NaN NaN succ_e 0.0 0.000000 0.875000 NaN NaN
10 NaN NaN NaN NaN akg_e 0.0 0.000000 0.744681 NaN NaN
11 NaN NaN NaN NaN glu__L_e 0.0 0.000000 0.673077 NaN NaN

4 Flux sampling

The easiest way to get started with flux sampling is using the sample function in the flux_analysis submodule. sample takes at least two arguments: a cobra model and the number of samples you want to generate.

This samples from the flux cone of possible solutions (see Megchelenbrink2014 for details).

from cobra.test import create_test_model
from cobra.sampling import sample

model = create_test_model("textbook")
s = sample(model, 500)
s.head()
ACALD ACALDt ACKr ACONTa ACONTb ACt2r ADK1 AKGDH AKGt2r ALCD2x ... RPI SUCCt2_2 SUCCt3 SUCDi SUCOAS TALA THD2 TKT1 TKT2 TPI
0 -0.279405 -0.204919 -0.495190 11.843064 11.843064 -0.495190 42.164976 9.972375 -0.027105 -0.074487 ... -0.547864 1.638816 3.633646 491.572336 -9.972375 0.538130 3.697968 0.538130 0.534215 9.159104
1 -0.493738 -0.186165 -3.729782 5.419629 5.419629 -3.729782 17.895967 3.137708 -0.422005 -0.307573 ... -7.013742 14.738326 15.027930 903.123183 -3.137708 7.006810 16.375670 7.006810 7.004023 2.801040
2 -2.918138 -0.047177 -5.604668 4.184706 4.184706 -5.604668 21.380668 3.457323 -0.032278 -2.870962 ... -2.137431 18.409834 19.089707 851.886851 -3.457323 2.133926 2.890324 2.133926 2.132516 7.635461
3 -1.210218 -0.108791 -4.188558 5.325057 5.325057 -4.188558 4.512811 1.853503 -0.703495 -1.101427 ... -3.141216 0.985120 1.207118 891.174008 -1.853503 3.108748 4.295533 3.108748 3.095691 6.291233
4 -2.144321 -0.133900 -2.748090 5.946703 5.946703 -2.748090 11.061729 4.668504 -0.052671 -2.010421 ... -4.326408 9.847510 11.487255 963.492028 -4.668504 4.312598 5.354727 4.312598 4.307044 5.605263

5 rows × 95 columns

import altair as alt
p1 = alt.Chart(s).mark_tick().encode(
    x='PGI:Q',
)
p2 = alt.Chart(s).mark_bar().encode(
    alt.X('PGI:Q', bin=True),
    y='count()'
)
p1 & p2

5. Simulating deletions

import pandas as pd
import cobra.test
from cobra.flux_analysis import (
    single_gene_deletion, single_reaction_deletion, double_gene_deletion,
    double_reaction_deletion)

model = cobra.test.create_test_model("textbook")

5.1. Single gene and reaction knockouts

A commonly asked question when analyzing metabolic models is what will happen if a certain reaction was not allowed to have any flux at all. This can tested using cobrapy by

print('complete model: ', model.optimize())
with model:
    model.reactions.PFK.knock_out()
    print('pfk knocked out: ', model.optimize())
complete model:  <Solution 0.874 at 0x7fac34eeec50>
pfk knocked out:  <Solution 0.704 at 0x7fac34eee550>

For evaluating genetic manipulation strategies, it is more interesting to examine what happens if given genes are knocked out as doing so can affect no reactions in case of redundancy, or more reactions if gene when is participating in more than one reaction.

print('complete model: ', model.optimize())
with model:
    model.genes.b1723.knock_out()
    print('pfkA knocked out: ', model.optimize())
    model.genes.b3916.knock_out()
    print('pfkB knocked out: ', model.optimize())
complete model:  <Solution 0.874 at 0x7fac34ed0d10>
pfkA knocked out:  <Solution 0.874 at 0x7fac34ed0350>
pfkB knocked out:  <Solution 0.704 at 0x7fac34ed0390>

5.2 Single deletions

Perform all single gene deletions on a model

deletion_results = single_gene_deletion(model)

These can also be done for only a subset of genes

single_gene_deletion(model, model.genes[:20])
growth status
ids
(b1478) 0.873922 optimal
(b0474) 0.873922 optimal
(b0118) 0.873922 optimal
(b2587) 0.873922 optimal
(b1849) 0.873922 optimal
(b0726) 0.858307 optimal
(b1276) 0.873922 optimal
(b0356) 0.873922 optimal
(b0727) 0.858307 optimal
(b3115) 0.873922 optimal
(b3736) 0.374230 optimal
(b2296) 0.873922 optimal
(b0116) 0.782351 optimal
(b0351) 0.873922 optimal
(b1241) 0.873922 optimal
(b3733) 0.374230 optimal
(b3732) 0.374230 optimal
(b3734) 0.374230 optimal
(s0001) 0.211141 optimal
(b3735) 0.374230 optimal

5.3 Double deletions

Double deletions run in a similar way. Passing in return_frame=True will cause them to format the results as a pandas.DataFrame.

double_gene_deletion(
    model, model.genes[-5:], return_frame=True).round(4)
growth status
ids
(b0008) 0.8739 optimal
(b3919, b2464) 0.7040 optimal
(b2465, b2464) 0.8739 optimal
(b0008, b2935) 0.8739 optimal
(b3919, b2465) 0.7040 optimal
(b2465, b2935) 0.8739 optimal
(b2465) 0.8739 optimal
(b3919) 0.7040 optimal
(b2935) 0.8739 optimal
(b0008, b2464) 0.8739 optimal
(b3919, b2935) 0.7040 optimal
(b2464) 0.8739 optimal
(b0008, b3919) 0.7040 optimal
(b2464, b2935) 0.8739 optimal
(b0008, b2465) 0.8739 optimal