Constraint-based analysis (CBA) code¶

We will analyse contraint-based models of the form

using cobrapy.

Information in this tutorial is based on https://cobrapy.readthedocs.io/en/latest/

%matplotlib inline

1. Working with metabolic models¶

1.1 Loading and inspecting model¶

To begin with, cobrapy comes with a “textbook” model of E. coli core metabolism. To load the test model use

import cobra
import cobra.test
model = cobra.test.create_test_model("textbook")

The reactions, metabolites, and genes attributes of the cobrapy model are a special type of list called a cobra.DictList, and each one is made up of cobra.Reaction, cobra.Metabolite and cobra.Gene objects respectively.

print(len(model.reactions))
print(len(model.metabolites))
print(len(model.genes))

95
72
137

When using Jupyter notebook this type of information is rendered as a table.

model

Name	e_coli_core
Memory address	0x07fac4a9cb290
Number of metabolites	72
Number of reactions	95
Number of groups	0
Objective expression	1.0Biomass_Ecoli_core - 1.0Biomass_Ecoli_core_reverse_2cdba
Compartments	cytosol, extracellular

Just like a regular list, objects in the DictList can be retrieved by index. For example, to get the 30th reaction in the model (at index 29 because of 0-indexing):

model.reactions[29]

Reaction identifier	EX_glu__L_e
Name	L-Glutamate exchange
Memory address	0x07facb0d14610
Stoichiometry	glu__L_e --> L-Glutamate -->
GPR
Lower bound	0.0
Upper bound	1000.0

Additionally, items can be retrieved by their id using the DictList.get_by_id() function. For example, to get the cytosolic atp metabolite object (the id is “atp_c”), we can do the following:

model.metabolites.get_by_id("atp_c")

Metabolite identifier	atp_c
Name	ATP
Memory address	0x07fac4a737850
Formula	C10H12N5O13P3
Compartment	c
In 13 reaction(s)	PGK, ACKr, ADK1, PFK, PPS, ATPM, SUCOAS, Biomass_Ecoli_core, GLNS, PYK, ATPS4r, PPCK, GLNabc

1.2 Reactions¶

We will consider the reaction glucose 6-phosphate isomerase, which interconverts glucose 6-phosphate and fructose 6-phosphate. The reaction id for this reaction in our test model is PGI.

pgi = model.reactions.get_by_id("PGI")
pgi

Reaction identifier	PGI
Name	glucose-6-phosphate isomerase
Memory address	0x07facb0cfa7d0
Stoichiometry	g6p_c <=> f6p_c D-Glucose 6-phosphate <=> D-Fructose 6-phosphate
GPR	b4025
Lower bound	-1000.0
Upper bound	1000.0

We can also ensure the reaction is mass balanced. This function will return elements which violate mass balance. If it comes back empty, then the reaction is mass balanced.

pgi.check_mass_balance()

{}

1.3 Metabolites¶

We will consider cytosolic atp as our metabolite, which has the id “atp_c” in our test model.

atp = model.metabolites.get_by_id("atp_c")
atp

Metabolite identifier	atp_c
Name	ATP
Memory address	0x07fac4a737850
Formula	C10H12N5O13P3
Compartment	c
In 13 reaction(s)	PGK, ACKr, ADK1, PFK, PPS, ATPM, SUCOAS, Biomass_Ecoli_core, GLNS, PYK, ATPS4r, PPCK, GLNabc

1.4 Genes¶

The gene_reaction_rule is a boolean representation of the gene requirements for this reaction to be active as described in Schellenberger2011.

The GPR is stored as the gene_reaction_rule for a Reaction object as a string.

gpr = pgi.gene_reaction_rule
gpr

'b4025'

Corresponding gene objects also exist. These objects are tracked by the reactions itself, as well as by the model

pgi.genes

frozenset({<Gene b4025 at 0x7facb0d45090>})

pgi_gene = model.genes.get_by_id("b4025")
pgi_gene

Gene identifier	b4025
Name	pgi
Memory address	0x07facb0d45090
Functional	True
In 1 reaction(s)	PGI

Each gene keeps track of the reactions it catalyzes

pgi_gene.reactions

frozenset({<Reaction PGI at 0x7facb0cfa7d0>})

1.5 Export as SBML & Visualization¶

The model can be exported to standard exchange formats, e.g., Systems Biology Markup Language (SBML) and be analyzed in other tools like for instance Cytoscape with cy3sbml

from cobra.io import write_sbml_model
write_sbml_model(model, "./models/e_coli_core.xml")

For instance after loading the model with cy3sbml we can get an overview over the metabolic network with the associated gene-product rules. Note the typical hairball structure of biological networks due to highly connected nodes (cofactor metabolites).

2. FBA simulations¶

Simulations using flux balance analysis can be solved using Model.optimize(). This will maximize or minimize (maximizing is the default) flux through the objective reactions.

import cobra.test
model = cobra.test.create_test_model("textbook")
model

Name	e_coli_core
Memory address	0x07facb0cfafd0
Number of metabolites	72
Number of reactions	95
Number of groups	0
Objective expression	1.0Biomass_Ecoli_core - 1.0Biomass_Ecoli_core_reverse_2cdba
Compartments	cytosol, extracellular

2.1. Running FBA¶

solution = model.optimize()
print(solution)

<Solution 0.874 at 0x7facb0be84d0>

The Model.optimize() function will return a Solution object. A solution object has several attributes:

objective_value: the objective value
status: the status from the linear programming solver
fluxes: a pandas series with flux indexed by reaction identifier. The flux for a reaction variable is the difference of the primal values for the forward and reverse reaction variables.
shadow_prices: a pandas series with shadow price indexed by the metabolite identifier.

For example, after the last call to model.optimize(), if the optimization succeeds it’s status will be optimal. In case the model is infeasible an error is raised.

2.2. Analyzing FBA solutions¶

Models solved using FBA can be further analyzed by using summary methods, which output printed text to give a quick representation of model behavior. Calling the summary method on the entire model displays information on the input and output behavior of the model, along with the optimized objective.

model.summary()

	IN_FLUXES		OUT_FLUXES		OBJECTIVES
	ID	FLUX	ID	FLUX	ID	FLUX
0	o2_e	21.799493	h2o_e	29.175827	Biomass_Ecoli_core	0.873922
1	glc__D_e	10.000000	co2_e	22.809833	NaN	NaN
2	nh4_e	4.765319	h_e	17.530865	NaN	NaN
3	pi_e	3.214895	NaN	NaN	NaN	NaN

In addition, the input-output behavior of individual metabolites can also be inspected using summary methods. For instance, the following commands can be used to examine the overall redox balance of the model

model.metabolites.nadh_c.summary()

		PERCENT	FLUX	REACTION_STRING
RXN_STAT	ID
PRODUCING	GAPD	41.582168	16.023526	g3p_c + nad_c + pi_c <=> 13dpg_c + h_c + nadh_c
	PDH	24.088820	9.282533	coa_c + nad_c + pyr_c --> accoa_c + co2_c + na...
	AKGDH	13.142408	5.064376	akg_c + coa_c + nad_c --> co2_c + nadh_c + suc...
	MDH	13.142408	5.064376	mal__L_c + nad_c <=> h_c + nadh_c + oaa_c
	Biomass_Ecoli_core	8.044196	3.099800	1.496 3pg_c + 3.7478 accoa_c + 59.81 atp_c + 0...
CONSUMING	NADH16	100.000000	38.534610	4.0 h_c + nadh_c + q8_c --> 3.0 h_e + nad_c + ...

Or to get a sense of the main energy production and consumption reactions

model.metabolites.atp_c.summary()

		PERCENT	FLUX	REACTION_STRING
RXN_STAT	ID
PRODUCING	ATPS4r	66.579799	45.514010	adp_c + 4.0 h_e + pi_c <=> atp_c + h2o_c + 3.0...
	PGK	23.439885	16.023526	3pg_c + atp_c <=> 13dpg_c + adp_c
	SUCOAS	7.408381	5.064376	atp_c + coa_c + succ_c <=> adp_c + pi_c + succ...
	PYK	2.571936	1.758177	adp_c + h_c + pep_c --> atp_c + pyr_c
CONSUMING	Biomass_Ecoli_core	76.461640	52.269245	1.496 3pg_c + 3.7478 accoa_c + 59.81 atp_c + 0...
	ATPM	12.273243	8.390000	atp_c + h2o_c --> adp_c + h_c + pi_c
	PFK	10.938227	7.477382	atp_c + f6p_c --> adp_c + fdp_c + h_c

2.3. Changing objectives¶

The objective function is determined from the objective_coefficient attribute of the objective reaction(s). Generally, a “biomass” function which describes the composition of metabolites which make up a cell is used.

biomass_rxn = model.reactions.get_by_id("Biomass_Ecoli_core")
biomass_rxn

Reaction identifier	Biomass_Ecoli_core
Name	Biomass Objective Function with GAM
Memory address	0x07facb0c110d0
Stoichiometry	1.496 3pg_c + 3.7478 accoa_c + 59.81 atp_c + 0.361 e4p_c + 0.0709 f6p_c + 0.129 g3p_c + 0.205 g6p_c + 0.2557 gln__L_c + 4.9414 glu__L_c + 59.81 h2o_c + 3.547 nad_c + 13.0279 nadph_c + 1.7867 oaa_c ... 1.496 3-Phospho-D-glycerate + 3.7478 Acetyl-CoA + 59.81 ATP + 0.361 D-Erythrose 4-phosphate + 0.0709 D-Fructose 6-phosphate + 0.129 Glyceraldehyde 3-phosphate + 0.205 D-Glucose 6-phosphate + 0.2557...
GPR
Lower bound	0.0
Upper bound	1000.0

Currently in the model, there is only one reaction in the objective (the biomass reaction), with an linear coefficient of 1.

from cobra.util.solver import linear_reaction_coefficients
linear_reaction_coefficients(model)

{<Reaction Biomass_Ecoli_core at 0x7facb0c110d0>: 1.0}

The objective function can be changed by assigning Model.objective, which can be a reaction object (or just it’s name), or a dict of {Reaction: objective_coefficient}.

# change the objective to ATPM
model.objective = "ATPM"

# The upper bound should be 1000, so that we get
# the actual optimal value
model.reactions.get_by_id("ATPM").upper_bound = 1000.
linear_reaction_coefficients(model)

{<Reaction ATPM at 0x7facb0c11990>: 1.0}

model.optimize().objective_value

174.99999999999966

model.summary()

	IN_FLUXES		OUT_FLUXES		OBJECTIVES
	ID	FLUX	ID	FLUX	ID	FLUX
0	o2_e	60.0	co2_e	60.0	ATPM	175.0
1	glc__D_e	10.0	h2o_e	60.0	NaN	NaN

model.metabolites.atp_c.summary()

		PERCENT	FLUX	REACTION_STRING
RXN_STAT	ID
PRODUCING	ATPS4r	72.972973	135.0	adp_c + 4.0 h_e + pi_c <=> atp_c + h2o_c + 3.0...
	SUCOAS	10.810811	20.0	atp_c + coa_c + succ_c <=> adp_c + pi_c + succ...
	PGK	10.810811	20.0	3pg_c + atp_c <=> 13dpg_c + adp_c
	PYK	5.405405	10.0	adp_c + h_c + pep_c --> atp_c + pyr_c
CONSUMING	ATPM	94.594595	175.0	atp_c + h2o_c --> adp_c + h_c + pi_c
CONSUMING	PFK	5.405405	10.0	atp_c + f6p_c --> adp_c + fdp_c + h_c

3 Flux Variability Analysis (FVA)¶

FBA will not give always give unique solution, because multiple flux states can achieve the same optimum. FVA (or flux variability analysis) finds the ranges of each metabolic flux at the optimum.

from cobra.flux_analysis import flux_variability_analysis
flux_variability_analysis(model, model.reactions[:10])

	minimum	maximum
ACALD	-5.247085e-14	0.000000e+00
ACALDt	-5.247085e-14	0.000000e+00
ACKr	-8.024953e-14	0.000000e+00
ACONTa	2.000000e+01	2.000000e+01
ACONTb	2.000000e+01	2.000000e+01
ACt2r	-8.024953e-14	0.000000e+00
ADK1	0.000000e+00	3.410605e-13
AKGDH	2.000000e+01	2.000000e+01
AKGt2r	-3.172656e-14	0.000000e+00
ALCD2x	-4.547474e-14	0.000000e+00

Setting parameter fraction_of_optimium=0.90 would give the flux ranges for reactions at 90% optimality.

cobra.flux_analysis.flux_variability_analysis(
    model, model.reactions[:10], fraction_of_optimum=0.9)

	minimum	maximum
ACALD	-2.692308	0.0
ACALDt	-2.692308	0.0
ACKr	-4.117647	0.0
ACONTa	8.461538	20.0
ACONTb	8.461538	20.0
ACt2r	-4.117647	0.0
ADK1	0.000000	17.5
AKGDH	2.500000	20.0
AKGt2r	-1.489362	0.0
ALCD2x	-2.333333	0.0

3.1. Running FVA in summary methods¶

Flux variability analysis can also be embedded in calls to summary methods. For instance, the expected variability in substrate consumption and product formation can be quickly found by

model.optimize()
model.summary(fva=0.95)

	IN_FLUXES				OUT_FLUXES				OBJECTIVES
	ID	FLUX	FLUX_MIN	FLUX_MAX	ID	FLUX	FLUX_MIN	FLUX_MAX	ID	FLUX
0	o2_e	60.0	55.882353	60.000000	h2o_e	60.0	54.166667	60.000000	ATPM	175.0
1	glc__D_e	10.0	9.500000	10.000000	co2_e	60.0	54.166667	60.000000	NaN	NaN
2	nh4_e	-0.0	-0.000000	0.673077	h_e	0.0	0.000000	5.833333	NaN	NaN
3	NaN	NaN	NaN	NaN	for_e	0.0	0.000000	5.833333	NaN	NaN
4	NaN	NaN	NaN	NaN	ac_e	0.0	0.000000	2.058824	NaN	NaN
5	NaN	NaN	NaN	NaN	acald_e	0.0	0.000000	1.346154	NaN	NaN
6	NaN	NaN	NaN	NaN	pyr_e	0.0	0.000000	1.346154	NaN	NaN
7	NaN	NaN	NaN	NaN	etoh_e	0.0	0.000000	1.166667	NaN	NaN
8	NaN	NaN	NaN	NaN	lac__D_e	0.0	0.000000	1.129032	NaN	NaN
9	NaN	NaN	NaN	NaN	succ_e	0.0	0.000000	0.875000	NaN	NaN
10	NaN	NaN	NaN	NaN	akg_e	0.0	0.000000	0.744681	NaN	NaN
11	NaN	NaN	NaN	NaN	glu__L_e	0.0	0.000000	0.673077	NaN	NaN

4 Flux sampling¶

The easiest way to get started with flux sampling is using the sample function in the flux_analysis submodule. sample takes at least two arguments: a cobra model and the number of samples you want to generate.

This samples from the flux cone of possible solutions (see Megchelenbrink2014 for details).

from cobra.test import create_test_model
from cobra.sampling import sample

model = create_test_model("textbook")
s = sample(model, 500)
s.head()

	ACALD	ACALDt	ACKr	ACONTa	ACONTb	ACt2r	ADK1	AKGDH	AKGt2r	ALCD2x	...	RPI	SUCCt2_2	SUCCt3	SUCDi	SUCOAS	TALA	THD2	TKT1	TKT2	TPI
0	-0.279405	-0.204919	-0.495190	11.843064	11.843064	-0.495190	42.164976	9.972375	-0.027105	-0.074487	...	-0.547864	1.638816	3.633646	491.572336	-9.972375	0.538130	3.697968	0.538130	0.534215	9.159104
1	-0.493738	-0.186165	-3.729782	5.419629	5.419629	-3.729782	17.895967	3.137708	-0.422005	-0.307573	...	-7.013742	14.738326	15.027930	903.123183	-3.137708	7.006810	16.375670	7.006810	7.004023	2.801040
2	-2.918138	-0.047177	-5.604668	4.184706	4.184706	-5.604668	21.380668	3.457323	-0.032278	-2.870962	...	-2.137431	18.409834	19.089707	851.886851	-3.457323	2.133926	2.890324	2.133926	2.132516	7.635461
3	-1.210218	-0.108791	-4.188558	5.325057	5.325057	-4.188558	4.512811	1.853503	-0.703495	-1.101427	...	-3.141216	0.985120	1.207118	891.174008	-1.853503	3.108748	4.295533	3.108748	3.095691	6.291233
4	-2.144321	-0.133900	-2.748090	5.946703	5.946703	-2.748090	11.061729	4.668504	-0.052671	-2.010421	...	-4.326408	9.847510	11.487255	963.492028	-4.668504	4.312598	5.354727	4.312598	4.307044	5.605263

5 rows × 95 columns

import altair as alt
p1 = alt.Chart(s).mark_tick().encode(
    x='PGI:Q',
)
p2 = alt.Chart(s).mark_bar().encode(
    alt.X('PGI:Q', bin=True),
    y='count()'
)
p1 & p2

5. Simulating deletions¶

import pandas as pd
import cobra.test
from cobra.flux_analysis import (
    single_gene_deletion, single_reaction_deletion, double_gene_deletion,
    double_reaction_deletion)

model = cobra.test.create_test_model("textbook")

5.1. Single gene and reaction knockouts¶

A commonly asked question when analyzing metabolic models is what will happen if a certain reaction was not allowed to have any flux at all. This can tested using cobrapy by

print('complete model: ', model.optimize())
with model:
    model.reactions.PFK.knock_out()
    print('pfk knocked out: ', model.optimize())

complete model:  <Solution 0.874 at 0x7fac34eeec50>
pfk knocked out:  <Solution 0.704 at 0x7fac34eee550>

For evaluating genetic manipulation strategies, it is more interesting to examine what happens if given genes are knocked out as doing so can affect no reactions in case of redundancy, or more reactions if gene when is participating in more than one reaction.

print('complete model: ', model.optimize())
with model:
    model.genes.b1723.knock_out()
    print('pfkA knocked out: ', model.optimize())
    model.genes.b3916.knock_out()
    print('pfkB knocked out: ', model.optimize())

complete model:  <Solution 0.874 at 0x7fac34ed0d10>
pfkA knocked out:  <Solution 0.874 at 0x7fac34ed0350>
pfkB knocked out:  <Solution 0.704 at 0x7fac34ed0390>

5.2 Single deletions¶

Perform all single gene deletions on a model

deletion_results = single_gene_deletion(model)

These can also be done for only a subset of genes

single_gene_deletion(model, model.genes[:20])

	growth	status
ids
(b1478)	0.873922	optimal
(b0474)	0.873922	optimal
(b0118)	0.873922	optimal
(b2587)	0.873922	optimal
(b1849)	0.873922	optimal
(b0726)	0.858307	optimal
(b1276)	0.873922	optimal
(b0356)	0.873922	optimal
(b0727)	0.858307	optimal
(b3115)	0.873922	optimal
(b3736)	0.374230	optimal
(b2296)	0.873922	optimal
(b0116)	0.782351	optimal
(b0351)	0.873922	optimal
(b1241)	0.873922	optimal
(b3733)	0.374230	optimal
(b3732)	0.374230	optimal
(b3734)	0.374230	optimal
(s0001)	0.211141	optimal
(b3735)	0.374230	optimal

5.3 Double deletions¶

Double deletions run in a similar way. Passing in return_frame=True will cause them to format the results as a pandas.DataFrame.

double_gene_deletion(
    model, model.genes[-5:], return_frame=True).round(4)

	growth	status
ids
(b0008)	0.8739	optimal
(b3919, b2464)	0.7040	optimal
(b2465, b2464)	0.8739	optimal
(b0008, b2935)	0.8739	optimal
(b3919, b2465)	0.7040	optimal
(b2465, b2935)	0.8739	optimal
(b2465)	0.8739	optimal
(b3919)	0.7040	optimal
(b2935)	0.8739	optimal
(b0008, b2464)	0.8739	optimal
(b3919, b2935)	0.7040	optimal
(b2464)	0.8739	optimal
(b0008, b3919)	0.7040	optimal
(b2464, b2935)	0.8739	optimal
(b0008, b2465)	0.8739	optimal