Constraint-based analysis (CBA) code¶
We will analyse contraint-based models of the form
using cobrapy
.
Information in this tutorial is based on https://cobrapy.readthedocs.io/en/latest/
%matplotlib inline
1. Working with metabolic models¶
1.1 Loading and inspecting model¶
To begin with, cobrapy comes with a “textbook” model of E. coli core metabolism. To load the test model use
import cobra
import cobra.test
model = cobra.test.create_test_model("textbook")
The reactions, metabolites, and genes attributes of the cobrapy model
are a special type of list called a cobra.DictList
, and each one is
made up of cobra.Reaction
, cobra.Metabolite
and cobra.Gene
objects respectively.
print(len(model.reactions))
print(len(model.metabolites))
print(len(model.genes))
95
72
137
When using Jupyter notebook this type of information is rendered as a table.
model
Name | e_coli_core |
Memory address | 0x07fac4a9cb290 |
Number of metabolites | 72 |
Number of reactions | 95 |
Number of groups | 0 |
Objective expression | 1.0*Biomass_Ecoli_core - 1.0*Biomass_Ecoli_core_reverse_2cdba |
Compartments | cytosol, extracellular |
Just like a regular list, objects in the DictList
can be retrieved
by index. For example, to get the 30th reaction in the model (at index
29 because of 0-indexing):
model.reactions[29]
Reaction identifier | EX_glu__L_e |
Name | L-Glutamate exchange |
Memory address | 0x07facb0d14610 |
Stoichiometry |
glu__L_e --> L-Glutamate --> |
GPR | |
Lower bound | 0.0 |
Upper bound | 1000.0 |
Additionally, items can be retrieved by their id
using the
DictList.get_by_id()
function. For example, to get the cytosolic atp
metabolite object (the id is “atp_c”), we can do the following:
model.metabolites.get_by_id("atp_c")
Metabolite identifier | atp_c |
Name | ATP |
Memory address | 0x07fac4a737850 |
Formula | C10H12N5O13P3 |
Compartment | c |
In 13 reaction(s) | PGK, ACKr, ADK1, PFK, PPS, ATPM, SUCOAS, Biomass_Ecoli_core, GLNS, PYK, ATPS4r, PPCK, GLNabc |
1.2 Reactions¶
We will consider the reaction glucose 6-phosphate isomerase, which interconverts glucose 6-phosphate and fructose 6-phosphate. The reaction id for this reaction in our test model is PGI.
pgi = model.reactions.get_by_id("PGI")
pgi
Reaction identifier | PGI |
Name | glucose-6-phosphate isomerase |
Memory address | 0x07facb0cfa7d0 |
Stoichiometry |
g6p_c <=> f6p_c D-Glucose 6-phosphate <=> D-Fructose 6-phosphate |
GPR | b4025 |
Lower bound | -1000.0 |
Upper bound | 1000.0 |
We can also ensure the reaction is mass balanced. This function will return elements which violate mass balance. If it comes back empty, then the reaction is mass balanced.
pgi.check_mass_balance()
{}
1.3 Metabolites¶
We will consider cytosolic atp as our metabolite, which has the id “atp_c” in our test model.
atp = model.metabolites.get_by_id("atp_c")
atp
Metabolite identifier | atp_c |
Name | ATP |
Memory address | 0x07fac4a737850 |
Formula | C10H12N5O13P3 |
Compartment | c |
In 13 reaction(s) | PGK, ACKr, ADK1, PFK, PPS, ATPM, SUCOAS, Biomass_Ecoli_core, GLNS, PYK, ATPS4r, PPCK, GLNabc |
1.4 Genes¶
The gene_reaction_rule is a boolean representation of the gene requirements for this reaction to be active as described in Schellenberger2011.
The GPR is stored as the gene_reaction_rule for a Reaction object as a string.
gpr = pgi.gene_reaction_rule
gpr
'b4025'
Corresponding gene objects also exist. These objects are tracked by the reactions itself, as well as by the model
pgi.genes
frozenset({<Gene b4025 at 0x7facb0d45090>})
pgi_gene = model.genes.get_by_id("b4025")
pgi_gene
Gene identifier | b4025 |
Name | pgi |
Memory address | 0x07facb0d45090 |
Functional | True |
In 1 reaction(s) | PGI |
Each gene keeps track of the reactions it catalyzes
pgi_gene.reactions
frozenset({<Reaction PGI at 0x7facb0cfa7d0>})
1.5 Export as SBML & Visualization¶
The model can be exported to standard exchange formats, e.g., Systems
Biology Markup Language (SBML) and be analyzed in other tools like for
instance Cytoscape
with cy3sbml
from cobra.io import write_sbml_model
write_sbml_model(model, "./models/e_coli_core.xml")
For instance after loading the model with cy3sbml
we can get an
overview over the metabolic network with the associated gene-product
rules. Note the typical hairball structure of biological networks due to
highly connected nodes (cofactor metabolites).
2. FBA simulations¶
Simulations using flux balance analysis can be solved using Model.optimize(). This will maximize or minimize (maximizing is the default) flux through the objective reactions.
import cobra.test
model = cobra.test.create_test_model("textbook")
model
Name | e_coli_core |
Memory address | 0x07facb0cfafd0 |
Number of metabolites | 72 |
Number of reactions | 95 |
Number of groups | 0 |
Objective expression | 1.0*Biomass_Ecoli_core - 1.0*Biomass_Ecoli_core_reverse_2cdba |
Compartments | cytosol, extracellular |
2.1. Running FBA¶
solution = model.optimize()
print(solution)
<Solution 0.874 at 0x7facb0be84d0>
The Model.optimize()
function will return a Solution object. A
solution object has several attributes:
objective_value
: the objective valuestatus
: the status from the linear programming solverfluxes
: a pandas series with flux indexed by reaction identifier. The flux for a reaction variable is the difference of the primal values for the forward and reverse reaction variables.shadow_prices
: a pandas series with shadow price indexed by the metabolite identifier.
For example, after the last call to model.optimize()
, if the
optimization succeeds it’s status will be optimal. In case the model is
infeasible an error is raised.
2.2. Analyzing FBA solutions¶
Models solved using FBA can be further analyzed by using summary methods, which output printed text to give a quick representation of model behavior. Calling the summary method on the entire model displays information on the input and output behavior of the model, along with the optimized objective.
model.summary()
IN_FLUXES | OUT_FLUXES | OBJECTIVES | ||||
---|---|---|---|---|---|---|
ID | FLUX | ID | FLUX | ID | FLUX | |
0 | o2_e | 21.799493 | h2o_e | 29.175827 | Biomass_Ecoli_core | 0.873922 |
1 | glc__D_e | 10.000000 | co2_e | 22.809833 | NaN | NaN |
2 | nh4_e | 4.765319 | h_e | 17.530865 | NaN | NaN |
3 | pi_e | 3.214895 | NaN | NaN | NaN | NaN |
In addition, the input-output behavior of individual metabolites can also be inspected using summary methods. For instance, the following commands can be used to examine the overall redox balance of the model
model.metabolites.nadh_c.summary()
PERCENT | FLUX | REACTION_STRING | ||
---|---|---|---|---|
RXN_STAT | ID | |||
PRODUCING | GAPD | 41.582168 | 16.023526 | g3p_c + nad_c + pi_c <=> 13dpg_c + h_c + nadh_c |
PDH | 24.088820 | 9.282533 | coa_c + nad_c + pyr_c --> accoa_c + co2_c + na... | |
AKGDH | 13.142408 | 5.064376 | akg_c + coa_c + nad_c --> co2_c + nadh_c + suc... | |
MDH | 13.142408 | 5.064376 | mal__L_c + nad_c <=> h_c + nadh_c + oaa_c | |
Biomass_Ecoli_core | 8.044196 | 3.099800 | 1.496 3pg_c + 3.7478 accoa_c + 59.81 atp_c + 0... | |
CONSUMING | NADH16 | 100.000000 | 38.534610 | 4.0 h_c + nadh_c + q8_c --> 3.0 h_e + nad_c + ... |
Or to get a sense of the main energy production and consumption reactions
model.metabolites.atp_c.summary()
PERCENT | FLUX | REACTION_STRING | ||
---|---|---|---|---|
RXN_STAT | ID | |||
PRODUCING | ATPS4r | 66.579799 | 45.514010 | adp_c + 4.0 h_e + pi_c <=> atp_c + h2o_c + 3.0... |
PGK | 23.439885 | 16.023526 | 3pg_c + atp_c <=> 13dpg_c + adp_c | |
SUCOAS | 7.408381 | 5.064376 | atp_c + coa_c + succ_c <=> adp_c + pi_c + succ... | |
PYK | 2.571936 | 1.758177 | adp_c + h_c + pep_c --> atp_c + pyr_c | |
CONSUMING | Biomass_Ecoli_core | 76.461640 | 52.269245 | 1.496 3pg_c + 3.7478 accoa_c + 59.81 atp_c + 0... |
ATPM | 12.273243 | 8.390000 | atp_c + h2o_c --> adp_c + h_c + pi_c | |
PFK | 10.938227 | 7.477382 | atp_c + f6p_c --> adp_c + fdp_c + h_c |
2.3. Changing objectives¶
The objective function is determined from the objective_coefficient attribute of the objective reaction(s). Generally, a “biomass” function which describes the composition of metabolites which make up a cell is used.
biomass_rxn = model.reactions.get_by_id("Biomass_Ecoli_core")
biomass_rxn
Reaction identifier | Biomass_Ecoli_core |
Name | Biomass Objective Function with GAM |
Memory address | 0x07facb0c110d0 |
Stoichiometry |
1.496 3pg_c + 3.7478 accoa_c + 59.81 atp_c + 0.361 e4p_c + 0.0709 f6p_c + 0.129 g3p_c + 0.205 g6p_c + 0.2557 gln__L_c + 4.9414 glu__L_c + 59.81 h2o_c + 3.547 nad_c + 13.0279 nadph_c + 1.7867 oaa_c ... 1.496 3-Phospho-D-glycerate + 3.7478 Acetyl-CoA + 59.81 ATP + 0.361 D-Erythrose 4-phosphate + 0.0709 D-Fructose 6-phosphate + 0.129 Glyceraldehyde 3-phosphate + 0.205 D-Glucose 6-phosphate + 0.2557... |
GPR | |
Lower bound | 0.0 |
Upper bound | 1000.0 |
Currently in the model, there is only one reaction in the objective (the biomass reaction), with an linear coefficient of 1.
from cobra.util.solver import linear_reaction_coefficients
linear_reaction_coefficients(model)
{<Reaction Biomass_Ecoli_core at 0x7facb0c110d0>: 1.0}
The objective function can be changed by assigning Model.objective, which can be a reaction object (or just it’s name), or a dict of {Reaction: objective_coefficient}.
# change the objective to ATPM
model.objective = "ATPM"
# The upper bound should be 1000, so that we get
# the actual optimal value
model.reactions.get_by_id("ATPM").upper_bound = 1000.
linear_reaction_coefficients(model)
{<Reaction ATPM at 0x7facb0c11990>: 1.0}
model.optimize().objective_value
174.99999999999966
model.summary()
IN_FLUXES | OUT_FLUXES | OBJECTIVES | ||||
---|---|---|---|---|---|---|
ID | FLUX | ID | FLUX | ID | FLUX | |
0 | o2_e | 60.0 | co2_e | 60.0 | ATPM | 175.0 |
1 | glc__D_e | 10.0 | h2o_e | 60.0 | NaN | NaN |
model.metabolites.atp_c.summary()
PERCENT | FLUX | REACTION_STRING | ||
---|---|---|---|---|
RXN_STAT | ID | |||
PRODUCING | ATPS4r | 72.972973 | 135.0 | adp_c + 4.0 h_e + pi_c <=> atp_c + h2o_c + 3.0... |
SUCOAS | 10.810811 | 20.0 | atp_c + coa_c + succ_c <=> adp_c + pi_c + succ... | |
PGK | 10.810811 | 20.0 | 3pg_c + atp_c <=> 13dpg_c + adp_c | |
PYK | 5.405405 | 10.0 | adp_c + h_c + pep_c --> atp_c + pyr_c | |
CONSUMING | ATPM | 94.594595 | 175.0 | atp_c + h2o_c --> adp_c + h_c + pi_c |
PFK | 5.405405 | 10.0 | atp_c + f6p_c --> adp_c + fdp_c + h_c |
3 Flux Variability Analysis (FVA)¶
FBA will not give always give unique solution, because multiple flux states can achieve the same optimum. FVA (or flux variability analysis) finds the ranges of each metabolic flux at the optimum.
from cobra.flux_analysis import flux_variability_analysis
flux_variability_analysis(model, model.reactions[:10])
minimum | maximum | |
---|---|---|
ACALD | -5.247085e-14 | 0.000000e+00 |
ACALDt | -5.247085e-14 | 0.000000e+00 |
ACKr | -8.024953e-14 | 0.000000e+00 |
ACONTa | 2.000000e+01 | 2.000000e+01 |
ACONTb | 2.000000e+01 | 2.000000e+01 |
ACt2r | -8.024953e-14 | 0.000000e+00 |
ADK1 | 0.000000e+00 | 3.410605e-13 |
AKGDH | 2.000000e+01 | 2.000000e+01 |
AKGt2r | -3.172656e-14 | 0.000000e+00 |
ALCD2x | -4.547474e-14 | 0.000000e+00 |
Setting parameter fraction_of_optimium=0.90 would give the flux ranges for reactions at 90% optimality.
cobra.flux_analysis.flux_variability_analysis(
model, model.reactions[:10], fraction_of_optimum=0.9)
minimum | maximum | |
---|---|---|
ACALD | -2.692308 | 0.0 |
ACALDt | -2.692308 | 0.0 |
ACKr | -4.117647 | 0.0 |
ACONTa | 8.461538 | 20.0 |
ACONTb | 8.461538 | 20.0 |
ACt2r | -4.117647 | 0.0 |
ADK1 | 0.000000 | 17.5 |
AKGDH | 2.500000 | 20.0 |
AKGt2r | -1.489362 | 0.0 |
ALCD2x | -2.333333 | 0.0 |
3.1. Running FVA in summary methods¶
Flux variability analysis can also be embedded in calls to summary methods. For instance, the expected variability in substrate consumption and product formation can be quickly found by
model.optimize()
model.summary(fva=0.95)
IN_FLUXES | OUT_FLUXES | OBJECTIVES | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
ID | FLUX | FLUX_MIN | FLUX_MAX | ID | FLUX | FLUX_MIN | FLUX_MAX | ID | FLUX | |
0 | o2_e | 60.0 | 55.882353 | 60.000000 | h2o_e | 60.0 | 54.166667 | 60.000000 | ATPM | 175.0 |
1 | glc__D_e | 10.0 | 9.500000 | 10.000000 | co2_e | 60.0 | 54.166667 | 60.000000 | NaN | NaN |
2 | nh4_e | -0.0 | -0.000000 | 0.673077 | h_e | 0.0 | 0.000000 | 5.833333 | NaN | NaN |
3 | NaN | NaN | NaN | NaN | for_e | 0.0 | 0.000000 | 5.833333 | NaN | NaN |
4 | NaN | NaN | NaN | NaN | ac_e | 0.0 | 0.000000 | 2.058824 | NaN | NaN |
5 | NaN | NaN | NaN | NaN | acald_e | 0.0 | 0.000000 | 1.346154 | NaN | NaN |
6 | NaN | NaN | NaN | NaN | pyr_e | 0.0 | 0.000000 | 1.346154 | NaN | NaN |
7 | NaN | NaN | NaN | NaN | etoh_e | 0.0 | 0.000000 | 1.166667 | NaN | NaN |
8 | NaN | NaN | NaN | NaN | lac__D_e | 0.0 | 0.000000 | 1.129032 | NaN | NaN |
9 | NaN | NaN | NaN | NaN | succ_e | 0.0 | 0.000000 | 0.875000 | NaN | NaN |
10 | NaN | NaN | NaN | NaN | akg_e | 0.0 | 0.000000 | 0.744681 | NaN | NaN |
11 | NaN | NaN | NaN | NaN | glu__L_e | 0.0 | 0.000000 | 0.673077 | NaN | NaN |
4 Flux sampling¶
The easiest way to get started with flux sampling is using the sample function in the flux_analysis submodule. sample takes at least two arguments: a cobra model and the number of samples you want to generate.
This samples from the flux cone of possible solutions (see Megchelenbrink2014 for details).
from cobra.test import create_test_model
from cobra.sampling import sample
model = create_test_model("textbook")
s = sample(model, 500)
s.head()
ACALD | ACALDt | ACKr | ACONTa | ACONTb | ACt2r | ADK1 | AKGDH | AKGt2r | ALCD2x | ... | RPI | SUCCt2_2 | SUCCt3 | SUCDi | SUCOAS | TALA | THD2 | TKT1 | TKT2 | TPI | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -0.279405 | -0.204919 | -0.495190 | 11.843064 | 11.843064 | -0.495190 | 42.164976 | 9.972375 | -0.027105 | -0.074487 | ... | -0.547864 | 1.638816 | 3.633646 | 491.572336 | -9.972375 | 0.538130 | 3.697968 | 0.538130 | 0.534215 | 9.159104 |
1 | -0.493738 | -0.186165 | -3.729782 | 5.419629 | 5.419629 | -3.729782 | 17.895967 | 3.137708 | -0.422005 | -0.307573 | ... | -7.013742 | 14.738326 | 15.027930 | 903.123183 | -3.137708 | 7.006810 | 16.375670 | 7.006810 | 7.004023 | 2.801040 |
2 | -2.918138 | -0.047177 | -5.604668 | 4.184706 | 4.184706 | -5.604668 | 21.380668 | 3.457323 | -0.032278 | -2.870962 | ... | -2.137431 | 18.409834 | 19.089707 | 851.886851 | -3.457323 | 2.133926 | 2.890324 | 2.133926 | 2.132516 | 7.635461 |
3 | -1.210218 | -0.108791 | -4.188558 | 5.325057 | 5.325057 | -4.188558 | 4.512811 | 1.853503 | -0.703495 | -1.101427 | ... | -3.141216 | 0.985120 | 1.207118 | 891.174008 | -1.853503 | 3.108748 | 4.295533 | 3.108748 | 3.095691 | 6.291233 |
4 | -2.144321 | -0.133900 | -2.748090 | 5.946703 | 5.946703 | -2.748090 | 11.061729 | 4.668504 | -0.052671 | -2.010421 | ... | -4.326408 | 9.847510 | 11.487255 | 963.492028 | -4.668504 | 4.312598 | 5.354727 | 4.312598 | 4.307044 | 5.605263 |
5 rows × 95 columns
import altair as alt
p1 = alt.Chart(s).mark_tick().encode(
x='PGI:Q',
)
p2 = alt.Chart(s).mark_bar().encode(
alt.X('PGI:Q', bin=True),
y='count()'
)
p1 & p2
5. Simulating deletions¶
import pandas as pd
import cobra.test
from cobra.flux_analysis import (
single_gene_deletion, single_reaction_deletion, double_gene_deletion,
double_reaction_deletion)
model = cobra.test.create_test_model("textbook")
5.1. Single gene and reaction knockouts¶
A commonly asked question when analyzing metabolic models is what will happen if a certain reaction was not allowed to have any flux at all. This can tested using cobrapy by
print('complete model: ', model.optimize())
with model:
model.reactions.PFK.knock_out()
print('pfk knocked out: ', model.optimize())
complete model: <Solution 0.874 at 0x7fac34eeec50>
pfk knocked out: <Solution 0.704 at 0x7fac34eee550>
For evaluating genetic manipulation strategies, it is more interesting to examine what happens if given genes are knocked out as doing so can affect no reactions in case of redundancy, or more reactions if gene when is participating in more than one reaction.
print('complete model: ', model.optimize())
with model:
model.genes.b1723.knock_out()
print('pfkA knocked out: ', model.optimize())
model.genes.b3916.knock_out()
print('pfkB knocked out: ', model.optimize())
complete model: <Solution 0.874 at 0x7fac34ed0d10>
pfkA knocked out: <Solution 0.874 at 0x7fac34ed0350>
pfkB knocked out: <Solution 0.704 at 0x7fac34ed0390>
5.2 Single deletions¶
Perform all single gene deletions on a model
deletion_results = single_gene_deletion(model)
These can also be done for only a subset of genes
single_gene_deletion(model, model.genes[:20])
growth | status | |
---|---|---|
ids | ||
(b1478) | 0.873922 | optimal |
(b0474) | 0.873922 | optimal |
(b0118) | 0.873922 | optimal |
(b2587) | 0.873922 | optimal |
(b1849) | 0.873922 | optimal |
(b0726) | 0.858307 | optimal |
(b1276) | 0.873922 | optimal |
(b0356) | 0.873922 | optimal |
(b0727) | 0.858307 | optimal |
(b3115) | 0.873922 | optimal |
(b3736) | 0.374230 | optimal |
(b2296) | 0.873922 | optimal |
(b0116) | 0.782351 | optimal |
(b0351) | 0.873922 | optimal |
(b1241) | 0.873922 | optimal |
(b3733) | 0.374230 | optimal |
(b3732) | 0.374230 | optimal |
(b3734) | 0.374230 | optimal |
(s0001) | 0.211141 | optimal |
(b3735) | 0.374230 | optimal |
5.3 Double deletions¶
Double deletions run in a similar way. Passing in return_frame=True
will cause them to format the results as a pandas.DataFrame
.
double_gene_deletion(
model, model.genes[-5:], return_frame=True).round(4)
growth | status | |
---|---|---|
ids | ||
(b0008) | 0.8739 | optimal |
(b3919, b2464) | 0.7040 | optimal |
(b2465, b2464) | 0.8739 | optimal |
(b0008, b2935) | 0.8739 | optimal |
(b3919, b2465) | 0.7040 | optimal |
(b2465, b2935) | 0.8739 | optimal |
(b2465) | 0.8739 | optimal |
(b3919) | 0.7040 | optimal |
(b2935) | 0.8739 | optimal |
(b0008, b2464) | 0.8739 | optimal |
(b3919, b2935) | 0.7040 | optimal |
(b2464) | 0.8739 | optimal |
(b0008, b3919) | 0.7040 | optimal |
(b2464, b2935) | 0.8739 | optimal |
(b0008, b2465) | 0.8739 | optimal |