Yeast Metabolysis - MUCM

**Yeast Metabolysis**

Peter Challenor, Jim Gattiker

National Oceanography Centre, Southampton

The Problem

◮ Saccharomyces cerevisiae glycolytic pathway (i.e., yeast glycolysis)

◮ Data from Teusink et al. (2000) - in vitro experiment

◮ Model from Pritchard and Kell (2002) - system of ODEs (in SBML) to explain the data

◮ Aims:

◮ Calibrate the model

◮ Demonstrate **MUCM** methods with biological data

Data

◮ The data are the concentrations of chemical species

◮ One (glycerone phosphate) was removed from the analysis because its response was ‘dead’ and it caused some numerical instabilities

◮ Tolerance is ±2 standard deviations.

The Data

SBML id   Teusink name          P&K name       Value   Tolerance
ATP       ATP                   ATP            2.52    0.20
G6P       G6P                   Glc6P          2.45    0.14
ADP       ADP                   ADP            1.32    0.10
F6P       F6P                   Fru6P          0.62    0.02
F16bP     F1,6bP2               Fru1,6P2       5.51    0.04
AMP       AMP                   AMP            0.25    0.07
GAP       Phosphoenolpyruvate   Gra3P          0.07    0.01
NAD       NAD                   NAD            1.20    0.13
NADH      NADH                  NADH           0.39    0.09
P3G       3GriP                 Gri3P          0.90    0.02
P2G       2GriP                 Gri2P          0.12    0.01
PYR       Pyruvate              pyrauvate      1.85    0.64
AcAld     Acetaldehyde          acetaldehyde   0.17    0.02
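As a small illustration of how the tolerance column can be used, the sketch below checks whether a simulated concentration falls inside the observed value plus or minus its tolerance, and converts the tolerance to a standard deviation on the reading that the tolerance equals two standard deviations. The function names are hypothetical and only the first four rows of the table are copied in; this is not the authors' code.

```python
# Observed values and tolerances for a few species, copied from the data table.
# Keys are the SBML ids; each entry is (value, tolerance).
OBSERVED = {
    "ATP": (2.52, 0.20),
    "G6P": (2.45, 0.14),
    "ADP": (1.32, 0.10),
    "F6P": (0.62, 0.02),
}


def implied_sd(tolerance):
    """Standard deviation implied by a tolerance of +/- 2 standard deviations."""
    return tolerance / 2.0


def within_tolerance(species, simulated):
    """True if a simulated value lies inside observed value +/- tolerance."""
    value, tolerance = OBSERVED[species]
    return abs(simulated - value) <= tolerance


def standardised_residual(species, simulated):
    """Distance of a simulated value from the observation, in standard deviations."""
    value, tolerance = OBSERVED[species]
    return (simulated - value) / implied_sd(tolerance)
```

For example, `within_tolerance("ATP", 2.60)` is true because 2.60 differs from 2.52 by less than the tolerance of 0.20.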

The simulations

◮ Taken directly from SBML - SBML Toolbox for Matlab

◮ The free parameters are the Vmax parameters in the ODE system.

◮ Parameters are uncertain in the range from $\tfrac{1}{2}x^{*}$ to $2x^{*}$, where $x^{*}$ is the “nominal” value given in the reference SBML model

◮ Solved using Matlab’s ode15s function (for stiff ODEs), to $10^{4}$ seconds, which is long enough for the system to settle and is also quick to compute.

SBML species id   SBML param id   Nominal value
HXT               Vmax_1          103.2700
HK                Vmax_2          403.5400
PGI               Vmax_3          1937.6000
PFK               Vmax_4          121.6700
ALD               Vmax_5          101.1800
GAPDH             Vmaxf_7         3419.5000
GAPDH             Vmaxr_7         6596.2000
PGK               Vmax_8          1283.8000
PGM               Vmax_9          2429.2000
ENO               Vmax_10         220.6200
PYK               Vmax_11         952.2700
PDC               Vmax_12         874.2500
ADH               Vmax_13         50.1380
G3PDH             Vmax_16         47.3710

The Design of the Experiment

◮ To build the emulator we need a training set of model runs

◮ This is generated in a designed experiment

◮ This ensemble is designed to fill parameter space with a minimum number of model runs

◮ The most common design is the Latin Hypercube, which we use here

◮ In our experiments we use 256-member ensembles for training

The Latin hypercube

[Figure: Latin hypercube design points plotted against X1 and X2.]

Not all Latin hypercubes are equal

[Figure: a second Latin hypercube design plotted against X1 and X2.]

Maximin Latin Hypercubes

◮ Optimal Latin Hypercube not available

◮ Use the maximin criterion (maximise the minimum distance between points)

◮ Generate a large number of hypercubes and choose the ‘best’
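The generate-and-select procedure in the bullets above can be sketched as follows. This is a minimal illustration on the unit cube using only the Python standard library, not the code used for the experiment; the function names, the default number of candidates and the seed are all assumptions.

```python
import itertools
import math
import random


def latin_hypercube(n_points, n_dims, rng):
    """One random Latin hypercube on [0, 1]^n_dims.

    Each axis is cut into n_points equal bins and every bin is used exactly once.
    """
    columns = []
    for _ in range(n_dims):
        bins = list(range(n_points))
        rng.shuffle(bins)
        # Place each point at a random position inside its bin.
        columns.append([(b + rng.random()) / n_points for b in bins])
    # Transpose from per-dimension columns to a list of points.
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def min_distance(design):
    """Smallest Euclidean distance between any two points of the design."""
    return min(math.dist(p, q) for p, q in itertools.combinations(design, 2))


def maximin_latin_hypercube(n_points, n_dims, n_candidates=200, seed=0):
    """Generate n_candidates hypercubes and keep the one with the largest minimum distance."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_points, n_dims, rng) for _ in range(n_candidates))
    return max(candidates, key=min_distance)
```

A call such as `maximin_latin_hypercube(256, 14)` would correspond to a 256-member design over the 14 Vmax parameters, with each coordinate then rescaled from [0, 1] to that parameter's range.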

Our experiment

◮ Maximin Latin Hypercube with 256 members

◮ Only 137 reached steady state

Scatterplot showing convergence in the LHC design

Red indicates convergence.

Where do we get to steady state?

Figure: Tree classifier diagram, classifying whether a point in the design converges to steady-state (1) or not. The tree shown achieves 90% accuracy.

Multivariate Outputs

◮ Emulator theory (at present) is univariate

◮ We have 14 outputs

◮ We could emulate them separately

◮ Instead take principal components of outputs

◮ Use first 8 - these account for 99% of the variation

◮ Emulate these

◮ Predict 25 runs held back in turn
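The reduction of the outputs to principal components can be sketched as below. The slides do not say whether the outputs were centred or scaled first, so standardising each output is an assumption here, as are the function name, the use of NumPy (the original work used Matlab) and the 0.99 default, which mirrors the 99% figure quoted above.

```python
import numpy as np


def principal_components(outputs, variance_target=0.99):
    """PCA of an (n_runs, n_outputs) array via SVD of the standardised outputs.

    Returns the scores on the retained components, the component vectors,
    the fraction of variance explained by every component, and the mean and
    scale used for standardisation.
    """
    mean = outputs.mean(axis=0)
    scale = outputs.std(axis=0, ddof=1)
    z = (outputs - mean) / scale
    # Rows of vt are the principal directions; s holds the singular values.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest number of components whose cumulative share reaches the target.
    n_keep = int(np.searchsorted(np.cumsum(explained), variance_target) + 1)
    scores = z @ vt[:n_keep].T
    return scores, vt[:n_keep], explained, mean, scale
```

Each column of `scores` would then be emulated as a separate univariate output, and predictions mapped back to the original variables with `scores @ components * scale + mean`.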

Crossvalidation with PC’s

[Figure: cross-validation plots, one panel per principal component (PC1 to PC8).]

Crossvalidation with original variables

[Figure: cross-validation plots, one panel per output: ATP, Glc6P, ADP, Fru6P, Fru1,6P2, AMP, Gra3P, NAD, NADH, Gri3P, Gri2P, pyrauvate, acetaldehyde.]

Active variables

[Figure: one panel per principal component (PC1 to PC8); y-axis "Values" (0 to 1); x-axis: dummy, HXT, HK, PGI, PFK, ALD, GAPDH, GAPDH, PGK, PGM, ENO, PYK, PDC, ADH, G3PDH.]

Active variables

[Figure: single panel; y-axis "Values" (0 to 1); x-axis: dummy, HXT, HK, PGI, PFK, ALD, GAPDH, GAPDH, PGK, PGM, ENO, PYK, PDC, ADH, G3PDH.]

Active variables

◮ Conservative variable selection chooses 6 active variables

◮ HXT, PFK, ENO, PDC, ADH, G3PDH

◮ Redo the experiment with these six variables

◮ New 256 member Latin Hypercube

Convergence in the reduced LHC design space

Convergence is shown in blue.

[Figure: scatterplot panels over the scaled active variables HXT, PFK, ENO, PDC, ADH and G3PDH, each axis running from 0 to 1.]

Crossvalidation using the active variables

[Figure: cross-validation plots, one panel per output: ATP, Glc6P, ADP, Fru6P, Fru1,6P2, AMP, Gra3P, NAD, NADH, Gri3P, Gri2P, pyrauvate, acetaldehyde.]

Calibration

◮ Now we calibrate the model

◮ Use the Teusink et al. data

Calibration

SBML species   SBML param id   Nominal value   Scaled   Native
HXT            Vmax_1          103.2700        0.28     95.1
PFK            Vmax_4          121.6700        0.84     213.4
ENO            Vmax_10         220.6200        0.26     196.4
PDC            Vmax_12         874.2500        0.28     806.0
ADH            Vmax_13         50.1380         0.62     71.4
G3PDH          Vmax_16         47.3710         0.13     33.0
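The table does not state how the scaled value maps to the native one. A linear map of [0, 1] onto the range from half the nominal value to twice the nominal value reproduces the native column to within the rounding of the scaled column, so the sketch below assumes that mapping. The function name is hypothetical.

```python
# Nominal Vmax values and calibrated scaled values, copied from the table above.
NOMINAL = {"HXT": 103.27, "PFK": 121.67, "ENO": 220.62,
           "PDC": 874.25, "ADH": 50.138, "G3PDH": 47.371}
SCALED = {"HXT": 0.28, "PFK": 0.84, "ENO": 0.26,
          "PDC": 0.28, "ADH": 0.62, "G3PDH": 0.13}


def scaled_to_native(scaled, nominal):
    """Map a scaled value in [0, 1] linearly onto [nominal / 2, 2 * nominal].

    The linear form is an assumption inferred from the table, not a stated fact.
    """
    lower, upper = 0.5 * nominal, 2.0 * nominal
    return lower + scaled * (upper - lower)


# Native values implied by the assumed mapping, keyed by SBML species.
NATIVE = {name: scaled_to_native(SCALED[name], NOMINAL[name]) for name in NOMINAL}
```

Under this assumption the scaled value corresponding to the nominal setting is 1/3, so four of the six calibrated values (HXT, ENO, PDC, G3PDH) sit below nominal and two (PFK, ADH) above it.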

Calibration

[Figure: calibration results shown as a matrix of panels over HXT, PFK, ENO, PDC, ADH and G3PDH.]

Model Discrepancy

◮ Our calibration includes a model discrepancy term

◮ We have used a simple constant (with a prior value of 0)
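One common way to write a calibration model with a constant discrepancy is given below. It is a sketch of the general form, not necessarily the exact model used here: the symbols are introduced for illustration, and taking the observation standard deviation as half the quoted tolerance is an assumption based on the tolerance being ±2 standard deviations.

```latex
% z_i           : observed concentration of output i (Teusink et al. data)
% \eta_i        : simulator (or emulator) output i at calibration inputs \theta
% \delta_i      : constant model discrepancy for output i, prior centred on 0
% \varepsilon_i : observation error, with \sigma_i assumed to be tolerance_i / 2
\[
  z_i = \eta_i(\theta) + \delta_i + \varepsilon_i ,
  \qquad \mathrm{E}[\delta_i] = 0 \ \text{a priori} ,
  \qquad \varepsilon_i \sim N(0, \sigma_i^{2}) .
\]
```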

Model Discrepancy

[Figure: model discrepancy plots, one panel per output: ATP, Glc6P, ADP, Fru6P, Fru1,6P2, AMP, Gra3P, NAD, NADH, Gri3P, Gri2P, pyrauvate, acetaldehyde.]

Model Discrepancy

[Figure: second set of model discrepancy plots, one panel per output: ATP, Glc6P, ADP, Fru6P, Fru1,6P2, AMP, Gra3P, NAD, NADH, Gri3P, Gri2P, pyrauvate, acetaldehyde.]

Conclusions

◮ Demonstrated that emulator-based methods can be applied to the glycolysis model

◮ More work needed on

◮ Variable selection - sequential selection

◮ Non-steady state solutions. Can we build an emulator to predict where we will not get a solution?

◮ Better deterministic modelling

◮ Better priors

◮ More realistic model discrepancy

◮ More complex models
