HIERARCHAL INDUCTIVE PROCESS MODELING AND ANALYSIS 

Youri Noël Nelson 

A Thesis Submitted to the 

University of North Carolina Wilmington in Partial Fulfillment 

of the Requirements for the Degree of 

Master of Science 

Department of Mathematics and Statistics 

University of North Carolina Wilmington 

2011 

Approved by 

Advisory Committee 

Michael Freeze 

Xin Lu 

Wei Feng 

Chair 

Stuart Borrett 

Co-Chair 

Accepted by 

Dean, Graduate School

TABLE OF CONTENTS 

ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
1 INTRODUCTION
2 METHOD
    2.1 HIPM Description
        2.1.1 Measure of Fit
        2.1.2 Entities specification and model library
    2.2 Experiment Design
3 COMPUTATIONAL RESULTS
    3.1 Increase in number of time-series input
    3.2 Value of Information
    3.3 Summary
4 ANALYTICAL ANALYSIS
    4.1 Most recurrent models
    4.2 Preliminaries
    4.3 Model A
    4.4 Model B
    4.5 Model C
    4.6 Effects of increasing the number of constraints
5 CONCLUSION
REFERENCES
APPENDIX
    A. Sample CIAO data - 1997
    B. Full entity specification file
    C. Full Ross Sea generic model library
    D. Models selected in both experiment 8 and 19
    E. Models selected in both experiment 8 and 21

ABSTRACT 

Understanding phytoplankton dynamics in the Ross Sea polynya may yield useful
knowledge in the search for solutions to the world's rising carbon dioxide levels.
Modeling such dynamics is a lengthy and tedious process that can be helped by
computational tools such as HIPM. This system relies on knowledge that is already
available, in the form of time-series data and a process library, to construct and
then evaluate candidate models. In this research, models were ranked by sum of
squared error from lowest to highest, the lowest being the best-fit model. Some of
the questions that arise from the use of HIPM concern the amount and value of the
time series provided to the software, from which we formulated two hypotheses.
Will having more time series improve the output of the system? Will time series
for different variables provide different quality of output? Through 31 experiments
and mathematical analysis, we began to answer these questions. The computational
results showed that our first hypothesis does not always hold true, which is thought
to be because of the way the fit is measured. The mathematical analysis, on the other
hand, showed many variations across the experiments in the structure of the
zooplankton equation, which may indicate that the process library needs to be better
defined and that the system needs to take into consideration not only the
phytoplankton species Phaeocystis antarctica but also diatoms. This thesis provides
the start of an answer to these questions, but further research is still needed.

DEDICATION 

This thesis is dedicated to all my friends and family who have supported me in this
incredible journey I started five years ago. More importantly, I want to dedicate it
to our Lord and Savior, as I certainly would not be here today without his help,
support and comfort.

“I can do anything through God who strengthens me.” (Philippians 4:13)

I also want to dedicate this to my nephew Noah Nelson and my niece Sarah Nelson,
for always putting a smile on my face during the tough times, for their unconditional
love, and for making me want to persevere always. I love you beyond words.

Thank you, Christel & Douglas Nelson, Lara Nelson, Celio & Elise Nelson, Sven 

Diebold, Andrew & Robin Nelson, Ed & Pat Nelson, Joann Nelson, Philip Varvaris, 

Luke Brown, Taylor Jackson and Bud Edwards (for always being there at the right 

place at the right time) and all my other friends and family members that are not 

named here but are present in my heart and to whom I am so grateful for all the 

words of encouragement and support throughout the years. 


ACKNOWLEDGMENTS 

I would like to thank Dr. Feng, Dr. Borrett, Dr. Simmons, Dr. Freeze and Dr. 

Lu for all their help and support in this endeavor and process, as well as my friend 

Brevin Rock for his advice in completing a Master's thesis.

LIST OF TABLES 

1 Example of entity definition and instantiation (P)
2 Example of process definition (Growth)
3 Data contained in CIAO set
4 Cutoff Value Chart
5 Model A Parameter Values
6 Model B Parameter Values
7 Model C Parameter Values

LIST OF FIGURES 

1 Initial Conceptual Model
2 Tree diagram representing the process library
3 Map of the Ross Sea
4 reMSE summary - Part 1
7 Good fit Models VS. Number of inputted time-series
8 Mean Activation Values Graph

LIST OF SYMBOLS 

$P$ = Amount of phytoplankton present in the system (mg Chl a/m$^3$),
$D$ = Detritus concentration (mg C/m$^3$),
$F$ = Iron concentration (µM),
$Z$ = Zooplankton concentration (mg C/m$^3$),
$N$ = Nitrate concentration (µM),
$E_{ice}(t)$ = Sea ice concentration,
$E_{T_{H_2O}}(t)$ = Temperature of the water (°C),
$E_{PUR}(t)$ = Photosynthetically usable radiation (µmol photons m$^{-2}$ s$^{-1}$),
$E_{T_{H_2O}\,max}$ = Maximum water temperature,
$E_{T_{H_2O}\,min}$ = Minimum water temperature,
$a_i$ = Optimal parameters of the system selected by the HIPM software

1 INTRODUCTION 

Biology, mathematics, physics, ecology, and every other science share a common
objective: to explain and describe the world that surrounds us. All of these fields
build upon collections of observations to explain recurring phenomena. To explain
and depict some of these phenomena, scientists make use of models, which can take a
variety of forms including conceptual, formal, physical and diagrammatic
(Haefner, 2005).

Models are widely used in science, and researchers continue to look for tools or
techniques that will enhance their ability to construct new models or improve
existing ones. The appropriate modeling technique differs with the task. For
instance, Haefner (2005) uses a Forrester diagram to model a hypothetical
agro-ecosystem, which is a qualitative model formulation. In biology, when
describing predator-prey interactions, one can instead use differential equation
models such as those formulated by Lotka and Volterra (Berryman 1992). Models are
useful for studying systems because they let researchers conduct experiments and
test theories that would otherwise be unethical or impossible to perform on the real
system, and because they make it possible to predict the behavior of varying
components of an ecosystem.

Model construction is a difficult and lengthy endeavor. For a given system there
may be many different combinations of processes (e.g. grazing, decay, growth) that
could provide a plausible explanation for the behavior being studied. Exploring
and evaluating all these possibilities is therefore a tedious task. In the past,
limited computational power restricted scientists' ability to investigate complex
models, and certain known or suspected processes were left out to simplify
calculations; as computational power increased, so did our capacity to evaluate more
intricate models (Oreskes 2000). In addition, numerical models of natural systems
are non-unique: there are multiple ways to represent the same dynamics. Creating
computational tools that quickly and automatically evaluate multiple models
therefore seemed a promising way to search through the extensive model space. The
success of machine learning and data mining in commercial domains led scientists to
investigate the field of automated modeling for that purpose (Fayyad et al., 1996).

The act of gathering small pieces of information and combining them with prior
knowledge to formulate a complex overview of the object or process studied is called
induction. Induction avoids searching the entire space of possible equations by
piecing together only the meaningful terms; for instance, a predator-prey model
will need terms specifying growth and death (Todorovski et al. 2005). Inductive
modeling methods (e.g. LAGRAMGE, HIPM, ARIMA, FUSE) use the principles of
induction to construct models of the studied system. Methods used for commercial
applications, such as the Knowledge Discovery in Databases (KDD) process, were
insufficient for scientific purposes because they only described, and did not
explain, the observed system behavior (Langley et al. 2006). A simple example is the
modeling of water consumption in a city: a water company could easily create a
numerical model based on previous years that gives a good estimate of projected
water consumption over time, but the model may not explain why consumption
fluctuates the way it does. In other words, the commercial methods produce models
that are useful for making accurate predictions about a system but are very limited
when trying to explain which processes drive the system's behavior; these methods
did not explore the realm of all possible models. Thus, induction methods had to be
enhanced to automate the task of building and evaluating multiple models
(Dzeroski et al. 1995).

In this thesis, I used the hierarchical inductive process modeling technique, which
is encoded as a computer algorithm called HIPM (Langley et al. 2006; Bridewell et
al. 2005; Dzeroski et al. 1995; Borrett et al. 2007). Inductive process modeling
methods such as HIPM (Bridewell et al. 2008; Borrett et al. 2007; Langley et al.
2006; Todorovski et al. 2005) search through two spaces. The first space is made
up of mathematical formulations and alternative model structures, which consist of
entities, processes and the connections binding the two; the second space is made
up of parameter values (Borrett et al. 2007). The system takes three inputs
(Todorovski et al. 2005): a hierarchy of generic processes, where a process is a
certain action on the system defined by means of fragments of mathematical equations
and rules on how to combine these fragments with the rest of the equations; a set of
entities, where an entity is an object grouping the properties of an organism or
nutrient by means of variables and parameters; and a set of observed time series of
the entities' variables. HIPM performs one of two searches for the model structure,
a heuristic search or an exhaustive search. With the search option selected, HIPM
creates all the possible model structures consistent with the given background
knowledge and selects the best set of parameters for each model structure. Finally,
the system ranks the models based on their sum of squared error (Todorovski et al.
2005).
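The structure space described above can be pictured as a Cartesian product over alternative process formulations. The following Python sketch is a toy illustration only: the process slots, the alternative names and the function `enumerate_structures` are hypothetical and are not taken from HIPM or from the Ross Sea library, and the constraints that HIPM applies when linking processes are ignored.

```python
from itertools import product

# Hypothetical library: each process slot lists its alternative formulations.
ALTERNATIVES = {
    "growth_limitation": ["form_1", "form_2"],
    "grazing": ["form_1", "form_2", "form_3"],
    "decay": ["form_1", "form_2"],
}


def enumerate_structures(alternatives):
    """Return every structure that picks exactly one alternative per slot."""
    slots = sorted(alternatives)
    return [
        dict(zip(slots, choice))
        for choice in product(*(alternatives[slot] for slot in slots))
    ]


structures = enumerate_structures(ALTERNATIVES)
```

In this simplified picture the number of structures is the product of the number of alternatives per slot, which is why adding alternatives for a single process multiplies the size of the space to be searched.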

This system allows complex system dynamics to be represented as models. For
example, in a study of photosynthesis regulation it generated a model that
reproduced both the qualitative shape and the quantitative details of the
time-series data while incorporating processes that made biological sense
(Langley et al. 2006).

In this thesis I used the HIPM tool, combined with the appropriate process library,
to study the phytoplankton dynamics of the aquatic ecosystem of the Ross Sea. Here
the term process library is defined as the collection of processes (e.g. grazing,
decay, growth) and entities (e.g. phytoplankton, zooplankton, nitrate), together
with their relations to one another. It is best represented by Figure 2.

Figure 1: This schematic represents the interactions between the entities and the
exogenous variables driving the model. Here, P, Z, D, NO3 and Fe are the state
variables. PUR, T and Ice are the exogenous variables acting on the system and
influencing the state variables. The arrows represent the action of one variable
on another (Borrett, unpublished research).

Arrigo, Borrett, Bridewell and Langley used HIPM and the Ross Sea process library
to create and search a space of over 1120 possible model structures to explain
the phytoplankton and nitrogen temporal dynamics in the Ross Sea ecosystem. All
models contained five state variables: phytoplankton, zooplankton, detritus,
nitrogen and iron. Time series for both phytoplankton and nitrogen were available
and were given to HIPM along with the process library. Their initial research found
that 200 model structures were of good fit, where good fit was defined as a sum of
squared error less than or equal to 0.2. From a computer scientist's standpoint,
reducing the search space from 1120 model structures to 200 is a great
accomplishment; for a biologist, however, the solution is not specific enough and
offers few insights into the ecosystem dynamics. There is a need for ways to
constrain the search further, bringing down the number of good-fit models and
making the output useful to biologists.

Figure 2: A tree diagram representing the process library constructed for the Ross
Sea ecosystem problem. The interaction between processes and entities is defined in
the library as explained in Section 2.1.2 (Borrett et al. 2007).

Superficially, HIPM appears related to equation discovery methods, a subfield of
machine learning (Langley, 1995; Mitchell, 1997) that investigates collections
of measurements and observations, using different computational methods, in search
of quantitative laws (Todorovski, 2003). For example, the LAGRAMGE system takes as
input a dependent variable and background knowledge encoded as a grammar specifying
the space of possible equations, and outputs the best equation for that variable; it
can only perform the search for one variable at a time (Dzeroski et al. 1993,
Todorovski 2003). Equation discovery is in turn related to the methods used in
Ljung's work (1993) on system identification, which are further removed from
inductive process modeling.

The main assumption behind system identification is that the model structure
is known and that the primary concern is finding adequate parameter values;
equation discovery focuses on both the structure and the parameter values
(Todorovski et al. 1998). Both of these approaches produce descriptive models that
summarize and predict the data, but they fail to search through the space of
alternative explanations: they do not take into account models with theoretical
variables or consider alternate processes to explain certain dynamics (Bridewell et
al. 2005).

The Southern Ocean covers an area equivalent to about 10% of the global ocean
and is a key element of the global ocean system, as it links all major ocean basins
and facilitates the global distribution of its deep water; it is considered to play
an important part in the global carbon (C) cycle (Arrigo et al. 2003). The Ross Sea
polynya (an area of open water surrounded by sea ice) is one of the most productive
ecosystems in the Southern Ocean, as it experiences some of the largest
phytoplankton blooms in the region (Arrigo et al. 1994, 1998, 2000, 2003).
Phytoplankton productivity is important to the carbon cycle because photosynthesis
removes carbon dioxide (CO2) from surface water, and part of that carbon is then
exported to deep ocean water. What makes the Ross Sea polynya so interesting for
ecologists, compared to other locations such as Terra Nova Bay, is the type of
phytoplankton dominating the ecosystem. In the Ross Sea polynya, Phaeocystis
antarctica dominates, as opposed to diatoms (species such as Fragilariopsis spp.) in
Terra Nova Bay. Phaeocystis antarctica is thought to resist grazing more than other
phytoplankton species, which could imply that more carbon is taken from shallow
water into the depths as the uneaten, carbon-rich phytoplankton sinks to the bottom
(Tagliabue and Arrigo 2003). Deep ocean water has a longer residence time than
shallow water, meaning that carbon trapped in deep ocean water is effectively
removed from atmospheric circulation for a much longer time than the carbon
contained in surface water.

Figure 3: Map of the southwestern Ross Sea showing the Ross Sea polynya, located

north of the Ross Sea Ice Shelf, and the Terra Nova Bay polynya, located on the 

western continental shelf (Arrigo et al. 2003) 

Thus, there is an incentive to understand the ecological processes that control
phytoplankton productivity and community composition (which species dominates) in
the Ross Sea. Fluctuations in the phytoplankton population could potentially affect
the CO2 levels in the atmosphere (Carlson et al. 1998). If we can determine why
Phaeocystis antarctica is predominant, that would be useful information to
scientists as they entertain the idea of altering phytoplankton populations around
the world to create carbon sinks, providing a temporary solution to our CO2 problem.
All these elements motivated the search for the best process explanation of the
phytoplankton dynamics in the Ross Sea: by determining which processes act upon the
system and which entities are most important, scientists will accumulate knowledge
that may prove valuable in the fight against rising CO2 levels.

As mentioned, the tool that I have chosen for model search relies on measurements
and observations of one or more variables of a system to make inferences about the
remaining variables, for which no data are available, and about the processes at
work in the system. In Borrett’s study, the only state variables for which he had
measurements and observations were phytoplankton and nitrate. Ultimately, the goal
is to select model structures that are good approximations of the natural system and
give good insights into the processes at work in it. However, here I was faced with
an underconstrained optimization problem: there were no data available for three of
the five state variables. Indeed, one of the big challenges of using HIPM for this
particular ecosystem is that the data used to conduct the search are very expensive
to collect, and iron (Fe) in particular is difficult to measure. From this arise two
questions. Does knowing data for more than one state variable significantly narrow
down the number of possible good-fit models? Will knowledge about certain variables
have better optimization power than knowledge about others? For example, if we
could only afford to collect data for one of the five variables in the system, would
phytoplankton give us better model output (fewer good-fit models) in HIPM than
zooplankton, or would it be detritus? These are important questions because, as
scientists try to advance their knowledge of the Ross Sea, there is a need to make
educated decisions on what information to collect in an effort to optimize the use
of resources.

This thesis is structured in five parts. First, I describe the method used to
gather the data for my analysis, including the HIPM software and an overview of the
data sets. I then present the quantitative analysis, looking strictly at the results
generated by the HIPM software and discussing what they tell us from an ecological
standpoint. In Section 4, I turn to the analytical part of the analysis, picking and
studying some of the best-fit models selected during the quantitative analysis. I
then discuss these analytical results and, in the next section, tie them back to the
biology in an effort to link the qualitative and quantitative research. Through this
analysis we see how we can help HIPM's model selection method as well as assist
scientists in finding the model that most accurately explains the processes at work
in the ecosystem observed.

2 METHOD 

The method employed in this paper involves constructing process models from continuous 

data. To assist in this task we used a piece of software named HIPM. It 

is the output and model selection efficiency of this computer software that we are 

investigating. To better understand the task at hand it is important to define what 

HIPM does, as well as the steps we are taking to test its efficiency. 

2.1 HIPM Description 

Ecologists rely heavily on system modeling to build ecological theory and to guide
environmental assessment and management (Borrett et al. 2007). Typically, scientists
build and study a couple of models, basing the model structure on previous research
or making a judgement call on which entities and processes should or should not be
included. One of the aspirations, and one of the problems, of modeling natural
systems is to capture the essence of the system needed for the model's purpose by
figuring out what can be left out. Deciding which entities and processes should be
included, and what the best mathematical formulation and parameter values are for a
given structure, becomes an essential part of this search. Choosing among the
possible model structures presents an intricate and time-consuming challenge for
ecologists who want to navigate this space (Borrett et al. 2007). In searching
through this space of possible models, we are guided by the claim made by Langley et
al. (1987), which we support, that we must look for models that fit real-life
observations. In summary, we are faced with the problem of constructing models
anchored in domain theory, conducting a time-consuming search, and linking the
models to empirical data (Borrett et al. 2007). The HIPM software is designed to
remedy these issues; HIPM stands for Hierarchical Inductive Process Modeling. This
scientific approach (Langley et al. 2005) assumes the following:

• Given: Time-series data for continuous variables. 

• Given: Background knowledge about the entities of the system; in other words 

constraints on variables and other parameters driving these entities. 

• Given: Background knowledge on the type of processes that may be involved 

in driving the ecosystem as well as the constraints that may exist for the said 

processes. 

Then the task for the software is to perform a search through the structure and 

parameter space defined by the process-entity library to find the models that best 

fit the data. HIPM operates in four phases. 

1. In an exhaustive search, it first finds all the possible instantiations of the
generic processes for all variables. This means that the system finds all the
possible combinations of processes that can affect a given variable (an example is
given in Section 2.1.2). For our purposes we used the exhaustive search option
programmed into the software, but a heuristic search option is also available.

2. The system then assembles the models. In other words, it puts together, into a
generic model, one instantiation of generic processes for each variable present in
the system. It uses the constraints given by the users to determine which
instantiations can be linked together into a generic model; the program goes through
an exhaustive search to find all the possible models. In our study it builds 1120
model structures, due mainly to the large number of different grazing processes that
are potentially present in the ecosystem.

3. It searches for the parameter values for each model using the constraints defined
by the users. To infer these parameters, the system picks a random set of values
that respect the constraints and, using the Levenberg-Marquardt gradient descent
method, finds a local optimum. To avoid entrapment in local minima, the system
restarts the parameter estimation from multiple random points, retaining only the
parameters that produce the lowest error. In our experiments we set the number of
restarts to 128. This technique has been found to produce reasonable matches to time
series in multiple systems (Langley et al. 2007).

4. It evaluates the performance of the produced model structures (predicted values)
against the data series (observed values) by calculating the relative mean squared
error (reMSE); the models with the lowest reMSE are considered the best-fit models.
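The random-restart strategy of phase 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HIPM's implementation: a simple bounded coordinate pattern search is swapped in for Levenberg-Marquardt so that the example needs only the standard library, and the names `sse`, `local_search`, `fit_with_restarts` and `decay`, the exponential-decay model and the synthetic data are all hypothetical.

```python
import math
import random


def sse(params, model, times, observed):
    """Sum of squared errors between model predictions and observations."""
    return sum((model(t, params) - y) ** 2 for t, y in zip(times, observed))


def local_search(loss, start, bounds, step=0.1, tol=1e-8):
    """Bounded coordinate pattern search (stand-in for Levenberg-Marquardt)."""
    best = list(start)
    best_loss = loss(best)
    # Scale the initial step of each parameter by the width of its range.
    steps = [step * (hi - lo) for lo, hi in bounds]
    while max(steps) > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for direction in (1.0, -1.0):
                trial = list(best)
                trial[i] = min(hi, max(lo, best[i] + direction * steps[i]))
                trial_loss = loss(trial)
                if trial_loss < best_loss:
                    best, best_loss = trial, trial_loss
                    improved = True
        if not improved:
            steps = [s / 2.0 for s in steps]  # refine once no move helps
    return best, best_loss


def fit_with_restarts(loss, bounds, n_restarts=128, seed=0):
    """Run the local search from random starts; keep the lowest-error result."""
    rng = random.Random(seed)
    best, best_loss = None, float("inf")
    for _ in range(n_restarts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        params, value = local_search(loss, start, bounds)
        if value < best_loss:
            best, best_loss = params, value
    return best, best_loss


def decay(t, params):
    """Hypothetical one-process model: exponential decay a * exp(-b * t)."""
    a, b = params
    return a * math.exp(-b * t)


TIMES = list(range(10))
OBSERVED = [decay(t, (2.0, 0.5)) for t in TIMES]  # synthetic, noise-free data
BOUNDS = [(0.0, 5.0), (0.0, 2.0)]                 # user-supplied constraints

best_params, best_error = fit_with_restarts(
    lambda p: sse(p, decay, TIMES, OBSERVED), BOUNDS, n_restarts=16
)
```

Each restart draws its starting point inside the user-supplied ranges, so the constraints shape both where the search begins and where it is allowed to move.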

2.1.1 Measure of Fit 

As mentioned above, HIPM evaluates and selects the best model structure and set
of parameters according to a fitness measure. The system currently uses the sum
of squared errors (SSE) to evaluate fitness (Bridewell et al. 2007), which is
defined as follows:

$$\sum_{i=1}^{n} \mathrm{SSE}(x_i, x_i^{\mathrm{obs}}) = \sum_{i=1}^{n} \sum_{k=1}^{m} \left(x_{i,k} - x_{i,k}^{\mathrm{obs}}\right)^2$$

where $x_1, \ldots, x_n$ are the variables being fitted, with $m$ observed values for
each. To take into account the modeling of variables of varying scale, the system
uses a relative mean squared error that we define in the following way:

$$\mathrm{reMSE} = \frac{\displaystyle\sum_{i=1}^{n} \frac{\mathrm{SSE}(x_i, x_i^{\mathrm{obs}})}{s^2(x_i^{\mathrm{obs}})}}{nm}$$

Here $s^2(x_i^{\mathrm{obs}})$ is the sample variance of the observations for $x_i$.
Throughout this thesis we refer to the relative mean squared error as reMSE. The
biggest asset of this rescaling is the ability to compare values across data sets.
Typically, an reMSE of 1.0 or above signifies that the model performs poorly;
conversely, the lower the reMSE, the better the fit.
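The two measures can be computed directly from their definitions. The Python sketch below is an illustration only: the function names and the toy series are assumptions, and `statistics.variance` is used because it returns the sample variance.

```python
from statistics import mean, variance


def sse(predicted, observed):
    """Sum of squared errors for one variable."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def remse(predicted_by_var, observed_by_var):
    """Relative mean squared error over n variables with m observations each."""
    n = len(observed_by_var)
    m = len(next(iter(observed_by_var.values())))
    total = 0.0
    for name, obs in observed_by_var.items():
        # Dividing by the sample variance puts every variable on a common scale.
        total += sse(predicted_by_var[name], obs) / variance(obs)
    return total / (n * m)


# Toy data: two variables whose scales differ by a factor of ten.
observed = {"P": [1.0, 2.0, 3.0, 4.0], "N": [10.0, 20.0, 30.0, 40.0]}
predicted = {"P": [1.5, 2.5, 3.5, 4.5], "N": [15.0, 25.0, 35.0, 45.0]}
score = remse(predicted, observed)

# Baseline: a "model" that always predicts the observed mean of each variable.
mean_model = {k: [mean(v)] * len(v) for k, v in observed.items()}
baseline = remse(mean_model, observed)
```

Both variables contribute equally to `score` even though their raw squared errors differ by a factor of 100. The mean-only baseline scores (m - 1)/m, which approaches 1 for long series; this is consistent with treating values near or above 1.0 as a poor fit.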

2.1.2 Entities specification and model library 

Each entity of a system is defined by a combination of variables and parameters,
which makes it both an actor and a receiver of action in the model. A distinction is
to be made between generic entities and instantiated entities. A formal generic
entity has a name and a set of properties, which can include both variables and
parameters. In a given model the parameters of the instantiated entity do not
change, whereas the variables do. Every variable in the entity has a name and a
rule that determines how multiple processes and their subprocesses are combined
(e.g. summed, minimum, product). Every parameter has a name and a range that
constrains its possible values. Instantiated entities, on the other hand, have their
variables either associated with time series or given initial values, and their
parameters are assigned real values. A field is also included to indicate the parent
generic entity (Borrett et al. 2007). A given generic entity can be instantiated
multiple times; the generic entity can be thought of as a blueprint for the
instantiated entities. For example, in our system we defined the entity
phytoplankton as presented in Table 1. Here the entity's name is “P”; it contains
the variables “conc”, “growth_rate” and “growth_lim”, each with the rule determining
how it is aggregated across processes; the next part of the entity definition is the
list of parameters of concern for this entity, such as “max_growth” with possible
values in the (0.4, 0.8) range. Following the definition of the generic entity in
Table 1 is an instantiated entity, “phyto”, whose first argument, “pe”, refers to
the parent generic entity. Each variable is then given either the name of a time
series to which the model will be fitted, as for “conc”, where “PHA_c” refers to the
phytoplankton column of the CIAO data set, or an initial value, such as 0 for
“growth_rate”, indicating that this particular variable is not fitted to a time
series. The keyword “system”, as opposed to “exogenous”, simply states that the
variable depends on the system, as opposed to being independent of it like solar
radiation or water temperature. The full instantiated entity library can be found in
Appendix B and the generic entity library in Appendix C.
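The blueprint relationship between a generic entity and its instances can be mirrored with two small data classes. This is a hypothetical sketch, not the HIPM API: the class names, the `out_of_range` helper and the assumption that range bounds are inclusive are all illustrative, while the parameter ranges and values are copied from Table 1.

```python
from dataclasses import dataclass


@dataclass
class GenericEntity:
    name: str
    variable_rules: dict    # variable name -> combining rule ("sum", "prod", "min")
    parameter_ranges: dict  # parameter name -> (low, high)


@dataclass
class EntityInstance:
    parent: GenericEntity
    name: str
    parameters: dict        # parameter name -> fixed value


def out_of_range(instance):
    """List parameters that are undeclared or outside the parent's range."""
    bad = []
    for pname, value in instance.parameters.items():
        bounds = instance.parent.parameter_ranges.get(pname)
        # Assumption: range bounds are inclusive.
        if bounds is None or not (bounds[0] <= value <= bounds[1]):
            bad.append(pname)
    return bad


generic_p = GenericEntity(
    "P",
    {"conc": "sum", "growth_rate": "prod", "growth_lim": "min"},
    {"max_growth": (0.4, 0.8), "exude_rate": (0.001, 0.2),
     "death_rate": (0.02, 0.04), "Ek_max": (1, 100),
     "sinking_rate": (0.0001, 0.25), "biomin": (0.02, 0.04),
     "PhotoInhib": (200, 1500)},
)
phyto = EntityInstance(
    generic_p, "phyto",
    {"max_growth": 0.59, "exude_rate": 0.19, "death_rate": 0.025,
     "Ek_max": 30, "biomin": 0.025, "PhotoInhib": 200},
)
# Hypothetical second instance with a value outside the declared range.
variant = EntityInstance(generic_p, "phyto_variant", {"max_growth": 0.9})
```

Here `out_of_range(phyto)` is empty, because every value assigned in Table 1 lies inside the range declared by the generic entity, whereas the hypothetical `variant` is flagged for `max_growth`.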

For HIPM to be fully functional there needs to be a library of processes. Processes
are the physical, chemical, or biological actions that drive change in dynamic
models. Just as we distinguished between generic and instantiated entities, we
distinguish between generic and instantiated processes. Every generic process is
defined by a name by which entities can tie into the process, the subprocesses that
are tied to that process, and one or more equations. A generic process can also
include a set of Boolean conditions that determine whether the process is active,
making the process dynamic by turning it on and off depending on whether the
conditions are satisfied (Borrett et al. 2007). For instance, we could set the
photosynthetic process to occur only if an environmental light variable is greater
than zero. Table 2 gives an example of a generic process. It is named “growth”, and
any of the entities “P”, “N”, “D” and “E” can take a role in the process. Next comes
the list of subprocesses linked to this process, each with the entities that can
take a role in it, and finally the equation that defines the process; this equation
calls on the “conc” and “growth_rate” variables that all entities must have. An
instantiated process takes on a specific name and is bound to a specific
instantiated entity, one of P, N, D or E. The instantiated entity takes its role in
the equation of the instantiated process. All the instantiated processes are
aggregated according to the rule defined in the generic entity. It is this
organization in terms of entities and processes that drives inductive process
modeling. It makes the construction of systems of equations easier by building them
from fragments.
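The aggregation of process fragments by a per-variable rule can be illustrated as follows. This is a hedged sketch rather than HIPM code: the `COMBINERS` table, the `combine` function and the numeric contributions are hypothetical, and only the rule names ("sum", "prod", "min") and variable names come from Table 1.

```python
from functools import reduce
import operator

# One combining function per rule name used in the generic entity definition.
COMBINERS = {
    "sum": sum,
    "prod": lambda values: reduce(operator.mul, values, 1.0),
    "min": min,
}


def combine(rules, contributions):
    """Aggregate the fragments that the processes contribute to each variable."""
    return {
        var: COMBINERS[rules[var]](values)
        for var, values in contributions.items()
    }


# Combining rules as declared for the generic entity "P" in Table 1.
RULES = {"conc": "sum", "growth_rate": "prod", "growth_lim": "min"}

# Hypothetical values contributed by individual processes at one time step.
CONTRIBUTIONS = {
    "conc": [0.30, -0.05, -0.02],  # e.g. growth, death and sinking terms
    "growth_rate": [0.59, 0.8],    # e.g. maximum rate times a limitation factor
    "growth_lim": [0.7, 0.4],      # e.g. two candidate limiting factors
}

aggregated = combine(RULES, CONTRIBUTIONS)
```

Because each process only supplies its own fragment, swapping one process for an alternative changes a single entry in the list of contributions and leaves the rest of the equation untouched.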

Table 1: An example of a generic entity definition, with its variables and
parameters, followed by an example of an instantiated entity, here phytoplankton
(P), in which the variable “conc” is given a time series and the other variables are
given initial values.

    pe = lib.add_generic_entity("P",
        { "conc":"sum",
          "growth_rate":"prod",
          "growth_lim":"min"},
        { "max_growth": (0.4,0.8),
          "exude_rate": (0.001,0.2),
          "death_rate": (0.02,0.04),
          "Ek_max":(1,100),
          "sinking_rate":(0.0001,0.25),
          "biomin":(0.02,0.04),
          "PhotoInhib":(200,1500),});

    p1 = entity_instance (pe,
        "phyto",
        { "conc": ("system", "PHA_c", (0,600)),
          "growth_rate": ("system", 0, (0,1)),
          "growth_lim": ("system", 1, (0,1))},
        { "max_growth":0.59,
          "exude_rate":0.19,
          "death_rate":0.025,
          "Ek_max":30,
          "biomin":0.025,
          "PhotoInhib":200 } );

Table 2: Defining a process - Growth

    lib.add_generic_process(
        "growth", "",
        [("P", [pe], 1, 1), ("N", [no3, fe], 1, 100),
         ("D", [de], 1, 1), ("E", [ee], 1, 1)],
        [("limited_growth", ["P", "N", "E"], 0),
         ("exudation", ["P"], 1),
         ("nutrient_uptake", ["P", "N"], 0)],
        {},
        {},
        {"P.conc": "P.growth_rate * P.conc"} );

To sum it up, HIPM’s power resides in its knowledge of the modeled domain as 

well as its ability to estimate parameters (Bridewell et al. 2007). 

2.2 Experiment Design 

Having established how HIPM works, let us consider the problem at hand. In theory HIPM is an extremely powerful tool that permits a search through a wide structure and parameter space, but previous research has demonstrated that a more thorough investigation of HIPM’s output is necessary to evaluate its potential and usefulness to biologists.

In our example of the Ross Sea ecosystem, with the process-entity library set up as described, the search space contains 1120 possible models; each model can take on a wide variety of parameter sets depending on the constraints given to the software. The phytoplankton dynamic models of the Ross Sea have five variables: Phytoplankton (P), Zooplankton (Z), Detritus (D), Nitrate (N) and Iron (F). In previous research, real-life time series for Phytoplankton and Nitrate were available for this particular ecosystem, and this data was fed to HIPM. HIPM then returned about 200 possible models with an reMSE less than or equal to 0.2, which from a computer science standpoint is a good improvement: the search space is reduced from 1120 possible models to 200. For a biologist, however, that is still a large number of models approximating the ecosystem studied; going through and testing every one of these 200 models would be extremely time-consuming. It is therefore clear that we need to lower the number of possible models to a point deemed reasonable and useful to biologists. Logically, we assume that increasing the number of constraints (i.e. adding real-life time series for a variable for which we had no previous empirical data) would help model discrimination in HIPM. But this would require the scientist to go into the field and collect time series for one of the variables in the system. Since that process is very expensive, can HIPM be used to make an informed decision about which variable would yield the most discriminatory power, if there is any difference between variables at all? This is what we are investigating, and in light of these elements we have formulated two hypotheses:

• Hypothesis 1: Increasing the number of constraints: increasing the number of 

time-series for which we have data in HIPM for model selection will induce 

better fits. In other words, the increase in number of known time-series of 

system variables leads to better model discrimination and therefore better 

model selection. 

• Hypothesis 2: Variables yield different values of information: some variables 

will have more discriminatory power and restrict the best fit models more than 

others. 

To test our two hypotheses it was imperative to employ a full data set including time-series for all variables of the system, in order to compare the results depending upon whether certain time-series are included or not as constraints for HIPM. Since no full data set with real-life data was available, we turned to a simulated data set called the “Coupled Ice and Ocean model” dataset, otherwise referred to as the CIAO dataset. This dataset is generated from a three-dimensional ecosystem model that spans the entire water column and multiple stations across the Ross Sea. However, for our purposes only a portion of this data, the top 5 meters at the Ross Sea Polynya station 01, is used. The type of information contained in the CIAO dataset is stated in Table 3.

Table 3: Information included in the CIAO data set.
NOTE: A sample of the CIAO 1997 data can be found as Appendix A.

    Symbol  Units                    Description
    JDAY    Day                      Day of the measurements
    TEMP    °C                       Temperature of the water
    DPML    m                        Mixed layer depth
    AI                               Sea ice concentration
    NITR    µM                       Nitrate concentration
    PHOS    mg Chla/m^3              Phosphate concentration
    SILC    µM                       Silicate concentration
    IRON    nM or µM                 Iron concentration
    PARL    µmol photons m^-2 s^-1   Solar radiation used by organisms in photosynthesis
    PHA     mg Chla/m^3              Phaeo chlorophyll concentration
    DIAT    mg Chla/m^3              Diatom chlorophyll concentration
    ZOO     mg C/m^3                 Zooplankton concentration
    DET     mg C/m^3                 Detritus concentration
    PURL    µmol photons m^-2 s^-1   Photosynthetically usable radiation

In addition to a full data set, it is necessary to have a working library that, as stated in Section 2.1.2, defines both entities and processes for HIPM. The process-entity library that we used is available in Appendices B and C; it was previously put together by Bridewell, Borrett, Langley and Arrigo. All the processes and subprocesses in which the instantiated entities can take a role in our study are represented in Figure 2.

With the background knowledge necessary for HIPM to conduct successful runs in place, we designed thirty-one experiments, one for each non-empty combination of the five state variables; each experiment represents a possible combination of time-series constraints that could be entered into the software. For example, if we had time-series for Iron and Nitrate and fed them into HIPM, they would act as additional constraints in the model selection process: to be selected, models have to exhibit behavior close to the given time-series. All the experiments are summarized in Table 4.
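The thirty-one experiments correspond to the 2^5 - 1 = 31 non-empty subsets of the five state variables. The short sketch below enumerates them; the function name `enumerate_experiments` is hypothetical, and the ordering by subset size is chosen because it reproduces the ID numbering used in Table 4 (for example, ID 20 is the combination P, D, F).

```python
from itertools import combinations

# The five state variables of the Ross Sea models.
VARIABLES = ["P", "Z", "D", "N", "F"]


def enumerate_experiments(variables=VARIABLES):
    """List every non-empty combination of variables, smallest subsets first."""
    experiments = []
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            experiments.append(combo)
    return experiments


experiments = enumerate_experiments()

# Print each experiment with a 1-based ID and its binary constraint row.
for exp_id, combo in enumerate(experiments, start=1):
    flags = [1 if v in combo else 0 for v in VARIABLES]
    print(exp_id, flags)
```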


3 COMPUTATIONAL RESULTS 

The main topic of this paper is to determine how to optimize the use of HIPM to assist scientists in their decision-making process when selecting the model that most accurately represents an ecosystem. The first need is to narrow down the number of good fit models capable of describing the system. We attempted this by feeding additional time series for the state variables into HIPM, thus providing more constraints; did the assumption that this reduces the number of good fit models hold true? Secondly, if adding more constraints to HIPM does reduce that number, do observations for one specific state variable hold more reducing power than those for the other state variables? The data collected helped us answer these questions as well as discuss the efficiency of HIPM in its current state.

Thirty-one different experiments were performed, each returning a measure of fit value (reMSE) for every one of the 1120 models tested. This makes for a large amount of data to analyze. To get a better idea of what this data looks like, the measure of fit values of models with an reMSE between 0 and 2 were ranked from lowest to highest and graphed (see Figures 4, 5 and 6). We did not look at reMSE values higher than 2.0 since, as stated previously, models with an reMSE higher than 1.0 are typically classified as poorly performing, as this indicates a very large difference between observed and expected values. We estimated that the (0, 2) range would be sufficient for our purpose, as it would encompass most models. Based on these initial results we picked an reMSE of 0.5 as our good fit cutoff; any model under that cutoff is considered a good fit. This cutoff was chosen because several of the graphs exhibit a turning point or slight step pattern around this reMSE value, as in the graphs for experiments 1, 5 and 20.
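As an illustration of how the good fit counts are obtained from a list of reMSE values, here is a minimal sketch. The function names are hypothetical, the reMSE values are made up for the example, and treating "under the cutoff" as a strict inequality is an assumption.

```python
def count_good_fit(remse_values, cutoff=0.5):
    """Count the models whose reMSE lies strictly under the cutoff."""
    return sum(1 for value in remse_values if value < cutoff)


def counts_by_cutoff(remse_values, cutoffs):
    """Rank the values and count the good fit models under each cutoff."""
    ranked = sorted(remse_values)
    return {cutoff: count_good_fit(ranked, cutoff) for cutoff in cutoffs}


# Made-up reMSE values for six models (not HIPM output).
example_remse = [0.97, 0.05, 1.8, 0.48, 0.32, 0.5]

# Cutoffs 0.1, 0.2, ..., 1.0.
cutoffs = [round(0.1 * k, 1) for k in range(1, 11)]
table_row = counts_by_cutoff(example_remse, cutoffs)
```

Each value of `table_row` plays the role of one cell of a row in a table of counts per cutoff.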

[Figure 4 plots: one panel per experiment for experiments 1 to 12, i.e. [P], [Z], [D], [N], [F], [P,Z], [P,D], [P,N], [P,F], [Z,D], [Z,N] and [Z,F]. Vertical axis: reMSE from 0.0 to 2.0; horizontal axis: model rank. Panels are annotated with their number of good fit models.]

Figure 4: reMSE values ranked from lowest to highest. The value reMSE = 0.5 marks the good fit cutoff; any model under that value is considered a good fit model. The experimental setup for each run, as well as its ID number, is indicated in the top right corner of each panel.

[Ranked reMSE plots, continued: panels for experiments 13 to 24, i.e. [D,N], [D,F], [N,F], [P,Z,D], [P,Z,N], [P,Z,F], [P,D,N], [P,D,F], [P,N,F], [Z,D,N], [Z,D,F] and [Z,N,F].]

[Ranked reMSE plots, continued: panels for experiments 25 to 31, i.e. [D,N,F], [P,Z,D,N], [P,Z,D,F], [P,Z,N,F], [P,D,N,F], [Z,D,N,F] and [P,Z,D,N,F].]

3.1 Increase in number of time-series input 

One of the first observations made when looking at the data set is a general trend: the more time-series were used in HIPM, the smaller the number of good fit models, as represented in Figure 7.

[Figure 7 plot; axis label: Number of Good Fit Models.]

dramatically, with very small or non existent variance, and get very close or equal 

to zero. This suggests there may be some issues in the selection process, which could originate from over-constraining the system or from a need to improve the process-entity library. Furthermore, looking at Figure 6 we observe that there are no models with an reMSE lower than 1.5, which means that all models performed poorly given the constraints. At first glance, and momentarily putting aside the observed behavior for four and five time-series constraints, we can conclude that adding up to three time-series constraints produces the desired effect and reduces the number of good fit models. But the conclusion of this initial examination does not always hold true, as a closer look at the data reveals.

At this point my research entered the field of exploratory statistics as opposed to hypothesis-testing statistics; conventional statistical tools such as p-values or confidence intervals were not suitable for evaluating the hypotheses. The data has been reformatted into the more reader-friendly Table 4, which represents each experiment in a binary format: the instantiated entities given time-series for a run received a 1 and the ones with only initial values received a 0. In addition, the number of good fit models under each reMSE cutoff value from 0.1 to 1 was counted in order to analyze the individual effect, on the model selection process, of adding a time-series constraint for each entity. To do so, we select the subset of Table 4 for which a certain entity has the value 1. For example, for P we selected the subset of rows where P had a value of 1. By doing so we are only looking at the runs in which the constraint on P had a role, excluding the experiments where P was not constrained.

Looking carefully at Table 4, we notice that in experiment 1, when given observations only for phytoplankton, the number of good fit models under 0.5 reMSE is 197. In experiments 7 and 9 this number dropped to 61 and 79 respectively, with the addition of observations for detritus in one case and iron in the other. However, in experiment 20, where observations for phytoplankton, detritus and iron were all used, the number of good fit models is 177, which is higher than that of experiments 7 and 9. This counterexample demonstrates that more observations are not always synonymous with fewer models. Yet, according to Table 4, there are some experiments for which this assumption did hold true. Indeed, in experiment 8 both phytoplankton and nitrate are known and HIPM output 25 good fit models under 0.5 reMSE; when iron observations were added in experiment 21 we count 15 models selected, and when detritus was added in experiment 29 this number dropped to 5. Thus, in some cases more information will provide further restriction in the number of models selected.

Table 4: Each experiment in binary form: 1 signifies that a time-series was given for this entity and 0 that no time-series was given for this run. For each experiment we counted the number of models under each reMSE cutoff value.

         Data          reMSE Cutoff
         Constraints
    ID   P Z D N F   .1   .2   .3   .4   .5   .6   .7   .8   .9    1
     1   1 0 0 0 0   11  122  161  183  197  213  233  253  284  336
     2   0 1 0 0 0   14   38   46   72  101  141  186  236  296  487
     3   0 0 1 0 0   95  184  248  331  366  404  441  501  552  602
     4   0 0 0 1 0   67  188  301  376  439  482  517  531  547  605
     5   0 0 0 0 1  167  361  414  452  509  537  563  594  628 1094
     6   1 1 0 0 0    0    0    1    4    5    9   14   19   20   22
     7   1 0 1 0 0    0   18   35   45   61   75   88   94  100  102
     8   1 0 0 1 0    0    0   15   18   25   30   35   38   45   48
     9   1 0 0 0 1    0   37   60   73   79   91  110  123  143  158
    10   0 1 1 0 0    0    1    5    5    8   10   11   14   18   18
    11   0 1 0 1 0    0    0    0    1    1    1    1    1    1    5
    12   0 1 0 0 1    0    0    0    0    0    3   10   14   16   23
    13   0 0 1 1 0    2    8   13   48   67   93  120  142  167  178
    14   0 0 1 0 1   59   94  140  156  190  232  255  276  295  311
    15   0 0 0 1 1   23   40   57   89  128  151  179  218  252  290
    16   1 1 1 0 0    0    0    0    0    0    1    4    5    7    7
    17   1 1 0 1 0    0    0    0    0    0    0    0    0    0    0
    18   1 1 0 0 1    0    0    0    0    0    0    0    0    0    0
    19   1 0 1 1 0    0    0    0    9   13   19   22   24   24   25
    20   1 0 1 0 1   44   91  132  149  177  226  253  280  295  312
    21   1 0 0 1 1    0    0    0   11   15   17   25   25   27   31
    22   0 1 1 1 0    0    0    0    0    0    0    0    0    0    0
    23   0 1 1 0 1    0    0    0    0    0    3    4    7    7    7
    24   0 1 0 1 1    0    0    1    1    3    4    5    6    7    8
    25   0 0 1 1 1    3   13   21   27   39   51   65   87  100  114
    26   1 1 1 1 0    0    0    0    0    0    0    0    0    0    0
    27   1 1 1 0 1    0    0    0    0    0    0    0    0    0    0
    28   1 1 0 1 1    0    0    1    1    2    2    3    4    5    6
    29   1 0 1 1 1    0    0    0    2    5   10   13   17   17   18
    30   0 1 1 1 1    0    0    0    0    0    1    2    2    2    2
    31   1 1 1 1 1    0    0    0    0    0    0    0    0    0    0

If observations and measurements are available for one or two state variables and we want to collect data about one of the remaining state variables, our choice of which state variable to measure should be strongly influenced by the restriction power of each variable, which can be estimated from the data already known. To place this back into context, in experiment 9 data for both phytoplankton and iron were used in the selection process and the output was 79 good fit models under 0.5 reMSE; if we wanted to decrease this number, adding detritus would be an unwise choice, as it yields 177 good fit models (experiment 20). However, this is not always the case. For instance, in experiment 8, where phytoplankton and nitrate data are known, the output was 25 good fit models, and with the addition of detritus this number went down to 13 in experiment 19, in line with the assumption that more time-series data would constrain the model selection process further.

However, the way the reMSE is calculated may be the reason why some results contradict our assumptions. The reMSE is the average of the fit for each of the variables being fitted. For example, if a model was fitted to both phytoplankton and iron and each had an reMSE of 0.6, the average would be 0.6 and the model would not be selected as a good fit model; but if in the next experiment we added detritus and it had a fit of 0.1, the overall average would become (0.6 + 0.6 + 0.1)/3 ≈ 0.43, which would put the model into the good fit category. This explains why more time-series does not always mean fewer models being selected. Hence, different entities will yield different restriction powers based on the pre-existing knowledge. It seems that the value of the information varies according to the data previously available; but is there a specific state variable that tends to provide more restriction than the others regardless of the case or, inversely, is there a state variable that tends to provide very little additional information in the model selection process?
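The averaging effect described in this section can be checked with a few lines of arithmetic. The sketch below assumes, as stated above, that the overall reMSE is the plain mean of the per-variable fits; the function name `overall_remse` is hypothetical and the per-variable values are the illustrative ones from the text, not HIPM output.

```python
GOOD_FIT_CUTOFF = 0.5


def overall_remse(per_variable_remse):
    """Overall reMSE, assumed to be the mean of the per-variable fits."""
    return sum(per_variable_remse) / len(per_variable_remse)


# Two constrained variables (phytoplankton and iron), each fitted at 0.6.
two_series = overall_remse([0.6, 0.6])

# Adding a third, well-fitted variable (detritus at 0.1) lowers the mean.
three_series = overall_remse([0.6, 0.6, 0.1])

# The same model fails the cutoff with two series but passes with three.
selected_with_two = two_series < GOOD_FIT_CUTOFF
selected_with_three = three_series < GOOD_FIT_CUTOFF
```

A model rejected under two constraints can therefore be accepted once a third, well-fitted series is added, which is one way the number of good fit models can go up rather than down.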

3.2 Value of Information 

It seems realistic to think that each state variable could yield a different restrictive power when it comes to model selection through HIPM. To verify this assumption we used subsets of Table 4 to create Figure 8: five subsets were created, one for each state variable, using the experiments for which the observations and measurements for that variable were known. I evaluated the overall impact of each state variable over all experiments by first looking at the mean number of good fit models for each reMSE cutoff. Realizing that the variation in these subsets was great because of the presence of so many zeros, we decided to look instead at the median number of models for each of the different reMSE cutoff values in order to quantify the discriminatory power of each entity. We will refer to these values as Median Activation Values, in reference to the Activation Probabilities of Bayesian statistics that inspired this approach. The lower the Median Activation Value, the more restrictive power the entity holds.

Indeed, the activation value refers to the median number of models selected across all the runs that included that entity. In our case we are looking for the state variable that would reduce the number of good fit models the most. This is determined by the activation value: the lower it is, the more discriminatory power the variable holds at that particular cutoff. Looking at Figure 8, we see that zooplankton has the most discriminatory power for cutoffs between 0.4 and 1. Nitrate comes in second, overlapping with iron for reMSE cutoffs between 0.4 and 0.6, but for higher cutoffs nitrate has a lower Median Activation Value than iron. As far as phytoplankton and detritus are concerned, they seem to behave similarly, with high activation values. Note that for cutoffs less than 0.4 no one entity seems to have greater discriminatory power. Based on this graph alone it would seem that zooplankton is the one entity that yields the most information when it comes to model selection and therefore would be the entity worth collecting in the field.

[Figure 8 plot: Median Activation Values (vertical axis, 0 to 25) against reMSE cutoff (horizontal axis, 0.2 to 1.0), with one series for each of P, Z, D, N and F.]

Figure 8: Median Activation Values by cutoff. The lower the median activation value, the more discriminatory power that entity holds at that particular cutoff. Between 0.4 and 1, Zooplankton consistently has the lowest median activation value.
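The Median Activation Values can be recomputed directly from Table 4. The sketch below does so for the 0.5 cutoff only, as an illustration under stated assumptions rather than the procedure actually used to draw Figure 8: the counts are copied from the ".5" column of Table 4, the experiment order is assumed to follow the combination order of the table IDs, and the names `COUNTS_AT_0_5`, `EXPERIMENTS` and `median_activation` are hypothetical.

```python
from itertools import combinations
from statistics import median

VARIABLES = ["P", "Z", "D", "N", "F"]

# Number of good fit models under the 0.5 cutoff for experiments 1..31,
# copied from the ".5" column of Table 4.
COUNTS_AT_0_5 = [
    197, 101, 366, 439, 509, 5, 61, 25, 79, 8, 1, 0, 67, 190, 128,
    0, 0, 0, 13, 177, 15, 0, 0, 3, 39, 0, 0, 2, 5, 0, 0,
]

# Experiment IDs in Table 4 follow the order of combinations by subset size.
EXPERIMENTS = [
    set(combo)
    for size in range(1, len(VARIABLES) + 1)
    for combo in combinations(VARIABLES, size)
]


def median_activation(entity):
    """Median count over all experiments that constrain the given entity."""
    subset = [n for exp, n in zip(EXPERIMENTS, COUNTS_AT_0_5) if entity in exp]
    return median(subset)


activation = {v: median_activation(v) for v in VARIABLES}
most_restrictive = min(activation, key=activation.get)
```

At this cutoff the recomputed medians put zooplankton lowest and give nitrate and iron the same value, in line with the overlap between nitrate and iron noted above.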

That said, Table 4 also reveals a worrisome number of data constraint combinations that yield no models with an reMSE less than one. An example is experiment 22, which yields no good fit models under the reMSE cutoff of 1. Another case that is cause for worry is the one where time-series are given for all 5 entities, for which we would expect at least one model to be selected somewhere in the 0.1 to 1 range of reMSE cutoffs. This observation raises the possibility that there is an underlying issue with the model selection process. Incidentally, all of the combinations that yield no models under the reMSE cutoff of 1 are experiments in which Z was given a time-series constraint, which may say something about the processes that drive zooplankton; the library may be in need of improvement.

3.3 Summary 

This result analysis allows us to make the following observations: 

• In most cases, increasing the number of time-series constraints up to 3 seemed to reduce the number of good fit models under a 0.5 cutoff. If we consider experiments 1 through 25, there was only one case (experiment 20) for which the number of good fit models increased and six cases for which the number of good fit models went to zero (experiments 12, 16, 17, 18, 22 and 23), which can be interpreted as a deficiency in the library. Overall, this result is due to the fact that the reMSE is an average of the fit of all the state variables for which we have time-series.

• The decision as to which data to collect next should take into consideration the previously acquired time-series. Recommendations may differ based on the state variables previously measured and used with HIPM in the model selection process. The reason for this is once again the way the reMSE is calculated: how well the previously included time-series fit a particular model determines whether the addition of another time-series moves that model into or out of the pool of good fit models.

• For reMSE cutoffs of less than 0.4 the Median Activation Values for all 5 entities blend together and are not useful. However, for reMSE cutoffs of 0.4 or greater the Median Activation Values seem to indicate that zooplankton yields the most discriminatory power, which could be due to the numerous experiments for which we obtain zero good fit models. These zeros could result from one of two things: either the discriminatory power of zooplankton is genuinely superior or, the most plausible explanation at this time, zooplankton are not appropriately defined in the process library.

That being said there are a couple of elements that raise question in regards 

to the accuracy of the model selection process or the process-entity library. These 

elements being: 

• The behavior observed in Figure 7 with time-series for four or five of the entities, namely an average number of good fit models very close or equal to zero together with a spike in reMSE, fosters doubt as to the accuracy of the selection process when it is provided with too many constraints.

• The lack of good fit models, under reMSE cutoffs ranging from 0.1 to 1, for 

many of the experiments that included Zooplankton time-series constraints. 

This leads us to conjecture that when HIPM is given more than 3 data series the system becomes overconstrained, preventing it from accurately selecting models. Another conjecture is that the entity Zooplankton as defined in the process-entity library needs to be reviewed; this element alone could be at the origin of the issue in the model selection. More specifically, one of the assumptions of the system is that zooplankton feed very little, if at all, on Phaeocystis antarctica, as it is more resistant to grazing than diatoms, which are more typically grazed upon by zooplankton. The way the process library is currently set up, diatoms are not taken into consideration, which could in turn affect how well zooplankton performs when fitting models.

4 ANALYTICAL ANALYSIS 

The quantitative analysis of HIPM’s results enabled us to make some useful observations. 

Since the main purpose of this software for a biologist is to approximate 

the natural system observed in order to use this model to perform experiments, we 

decided to choose two of the models with an reMSE of less than 0.5 that came up 

most frequently over all 31 runs. 

4.1 Most recurrent models 

The initial concept driving the models is represented in Figure 1. Phytoplankton plays a role in both Zooplankton and Detritus concentration; it is acted upon by both Nitrate and Iron, which are in turn acted upon by Detritus. The environmental factors act on the Phytoplankton concentration as well as on Nitrate and Iron. Model A came up 13 times and Model B came up 11 times over all runs. We analyzed these models to determine whether their behavior makes sense from an ecological standpoint and whether they could give us information on how to improve the HIPM selection process. The models are composed of five differential equations, one for each of the five principal concentrations: Phytoplankton (P), Zooplankton (Z), Detritus (D), Nitrate (N) and Iron (F). All these entities are acted upon by the sets of parameters listed in Tables 5 and 6. There is also a set of exogenous variables acting on the system, defined as follows: $E_{PUR}(t)$ is the photosynthetically usable radiation, $E_{T_{H_2O}}(t)$ is the temperature of the water and $E_{ice}(t)$ is the sea ice concentration.

Table 5: Parameters that play a role in Model A.

Model A

    ID      Name                                            Value
    a_0     phyto.max growth                                0.8
    a_1     phyto.Ek max                                    12.033
    a_2     phyto.PhotoInhib                                771.158
    a_3     arrigoetal1998 w photoinhibition coefficient    13.2302
    a_4     NO3 monod lim coefficient                       0.00099718
    a_5     Fe monod lim coefficient                        0.000394882
    a_6     phyto.exude rate                                0.0228636
    a_7     NO3.toCratio                                    6.6
    a_8     Fe.toCratio                                     308026
    a_9     phyto.death rate                                0.0311617
    a_10    environment.beta                                0.327204
    a_11    zoo.death rate                                  0.270568
    a_12    zoo.assim eff                                   0.167516
    a_13    zoo.gmax                                        0.403535
    a_14    grazing ivlev delta coefficient                 0.997648
    a_15    detritus.remin rate                             0.0335311
    a_16    zoo.respiration rate                            0.0103725
    a_17    phyto.sinking rate                              0.015739
    a_18    detritus.sinking rate                           0.074487
    a_19    NO3.avg deep conc                               31
    a_20    NO3 linear temp control max mixing rate         0.729376
    a_21    Fe.avg deep conc                                0.00045
    a_22    Fe linear temp control max mixing rate          0.00794959

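Model A

\[
\frac{dP}{dt} = \Big[ \underbrace{\big[ (1 - E_{ice}(t))\, a_0\, e^{0.06933\, E_{T_{H_2O}}(t)}\, M(t) \big]}_{\text{P Growth Rate}} (1 - a_6) - a_9 - a_{17} \Big] P - \underbrace{\big( a_{13} (1 - e^{-a_{14} P}) \big)}_{\text{Z Grazing Rate}} Z \tag{1}
\]

\[
\frac{dZ}{dt} = \Big[ a_{12}\, a_{13} (1 - e^{-a_{14} P}) - a_{11} - a_{16} \Big] Z \tag{2}
\]

\[
\frac{dD}{dt} = \big( (1 - a_{10})\, a_{11} P \big) + \big( (1 - a_{10})\, a_{11} Z \big) + \big( (1 - a_{10})(1 - a_{12})\, a_{13} (1 - e^{-a_{14} P})\, Z \big) - D (a_{15} + a_{18}) \tag{3}
\]

\[
\frac{dN}{dt} = (a_{19} - N) \underbrace{\left[ a_{20}\, \frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}} \right]}_{\text{N Mixing Rate}} - \left[ \frac{P}{a_7 \cdot 12.0107} \underbrace{\big[ (1 - E_{ice}(t))\, a_0\, e^{0.06933\, E_{T_{H_2O}}(t)}\, M(t) \big]}_{\text{P Growth Rate}} \right] + \frac{a_{15} D}{a_7 \cdot 12.0107} \tag{4}
\]

\[
\frac{dF}{dt} = (a_{21} - F) \underbrace{\left[ a_{22}\, \frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}} \right]}_{\text{F Mixing Rate}} - \left[ \frac{P}{a_8 \cdot 12.0107} \underbrace{\big[ (1 - E_{ice}(t))\, a_0\, e^{0.06933\, E_{T_{H_2O}}(t)}\, M(t) \big]}_{\text{P Growth Rate}} \right] + \frac{a_{15} D}{a_8 \cdot 12.0107} \tag{5}
\]

Where,

\[
M(t) = \underbrace{\min \left\{ \frac{F}{F + a_5},\; \frac{N}{N + a_4},\; \left( e^{-E_{PUR}(t)/a_2} \right) \left( 1 - e^{-E_{PUR}(t) \left( 1 + a_3\, e^{\left( E_{PUR}(t)\, e^{1.089 - 2.12 \log_{10}(a_1)} \right)} \right) / a_1} \right) \right\}}_{\text{Phytoplankton Growth Limitation}}
\]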

Table 6: Parameters that play a role in Model B.

Model B

    ID      Name                        Value
    a_2     phyto.PhotoInhib            394.809
    a_4     nut lim exp coefficient     0.784127
    a_5     monod lim coefficient       0.000722964
    a_13    zoo.gmax                    0.350046
    a_14    zoo.gcap                    288.23
    a_15    zoo.glim                    19.0002
    a_16    phyto.biomin                0.0201679
    a_21    NO3.avg deep conc           31




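Model B

\[
\frac{dP}{dt} = \underbrace{\big[ (1 - E_{ice}(t))\, a_0\, e^{0.06933\, E_{T_{H_2O}}(t)}\, M(t) \big]}_{\text{P Growth Rate}} P (1 - a_6) - (a_9 P) - H(t) Z - (a_{19} P) \tag{6}
\]

\[
\frac{dZ}{dt} = \big( a_{12} H(t) Z \big) - a_{11} Z^2 - a_{18} Z \tag{7}
\]

\[
\frac{dD}{dt} = \big( (1 - a_{10})\, a_9 P \big) + \big( (1 - a_{10})\, a_{11} Z^2 \big) + \big( (1 - a_{10})(1 - a_{12}) H(t) Z \big) - D (a_{17} + a_{20}) \tag{8}
\]

\[
\frac{dN}{dt} = \left[ \frac{a_{17} D}{a_7 \cdot 12.0107} \right] + (a_{21} - N) \underbrace{\left[ a_{22}\, \frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}} \right]}_{\text{N Mixing Rate}} - \left[ \frac{P}{a_7 \cdot 12.0107} \underbrace{\big[ (1 - E_{ice}(t))\, a_0\, e^{0.06933\, E_{T_{H_2O}}(t)}\, M(t) \big]}_{\text{P Growth Rate}} \right] \tag{9}
\]

\[
\frac{dF}{dt} = \left[ \frac{a_{17} D}{a_8 \cdot 12.0107} \right] + (a_{23} - F) \underbrace{\left[ a_{24}\, \frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}} \right]}_{\text{F Mixing Rate}} - \left[ \frac{P}{a_8 \cdot 12.0107} \underbrace{\big[ (1 - E_{ice}(t))\, a_0\, e^{0.06933\, E_{T_{H_2O}}(t)}\, M(t) \big]}_{\text{P Growth Rate}} \right] \tag{10}
\]

Where,

\[
M(t) = \min \left\{ \frac{F}{F + a_5},\; \left( 1 - e^{-a_5 N} \right),\; \left( e^{-E_{PUR}(t)/a_2} \right) \left( 1 - e^{-E_{PUR}(t) \left( 1 + a_3\, e^{\left( E_{PUR}(t)\, e^{1.089 - 2.12 \log_{10}(a_1)} \right)} \right) / a_1} \right) \right\}
\]

\[
H(t) = \underbrace{\max \left\{ 0,\; \frac{a_{13} (P - a_{16} - a_{15})}{a_{14} + (P - a_{16} - a_{15})} \right\}}_{\text{Zooplankton Grazing Rate}}
\]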

4.2 Preliminaries 

Both Models A and B have complex structures which differ from the more theoretical models studied by mathematicians. Since solving these differential equations directly is extremely difficult, we take a more indirect approach by looking at bounds on the solutions, using the positivity lemma and the comparison argument that follow.

Lemma 1 (A Positivity Lemma). Let $W(t)$ be a smooth function on a domain $[0, T]$, $T \in \mathbb{R}$. If $W$ satisfies $W'(t) + M(t)W(t) \ge 0$ in $(0, T]$ and $W(0) \ge 0$, where $M(t)$ is a bounded function on $[0, T]$, then $W(t) \ge 0$ on $[0, T]$.

Proof: We prove this lemma by contradiction. Assume that the statement $W(t) \ge 0$ on $[0, T]$ were not true. Then there would exist a point $t_0 \in [0, T]$ such that $W(t_0)$ is a negative minimum of $W$ on $[0, T]$. Since $W(0) \ge 0$, we have $t_0 \in (0, T]$, which means that

\[
W'(t_0) + M(t_0) W(t_0) \ge 0.
\]

Since $W$ reaches its minimum value at $t_0$, we have $W'(t_0) = 0$ if $t_0 \ne T$ and $W'(t_0) \le 0$ if $t_0 = T$. This ensures that

\[
M(t_0) W(t_0) \ge 0,
\]

which contradicts our assumption that $W(t_0) < 0$ when $M(t_0) > 0$.

For the case $M(t_0) \le 0$, we let $V(t) = e^{-\gamma t} W(t)$ for some constant $\gamma$ with $\gamma > -M(t)$ in $(0, T]$. Then $V$ satisfies the relation $V'(t) + (\gamma + M(t)) V(t) \ge 0$ in $(0, T]$ and $V(0) \ge 0$, where $\gamma + M(t) > 0$ for all $t \in (0, T]$. From the above arguments we have $V(t) \ge 0$ on $[0, T]$. It follows from $W(t) = e^{\gamma t} V(t)$ that $W(t) \ge 0$ on $[0, T]$. □

As an application of Lemma 1, we have the following comparison argument for 

the respective solutions u 1 and u 2 of the initial-value problem 

u ′ i = f i (t, u i ) in (0, T ], u i (0) = u i,0 , (11) 

where i = 1, 2. f 1 and f 2 are continuous functions in [0, T ] × R. 

Lemma 2 The Comparison Argument. Assume that both $\partial f_1/\partial u$ and $\partial f_2/\partial u$ are continuous in $[0, T] \times \mathbb{R}$. If $f_1(t, u) \le f_2(t, u)$ in $(0, T] \times \mathbb{R}$ and $u_{1,0} \le u_{2,0}$, then the respective solutions $u_1$ and $u_2$ of (11) satisfy $u_1(t) \le u_2(t)$ on $[0, T]$.

Proof: Let $W = u_2 - u_1$, and let $M = M(t)$ be any bounded function on $[0, T]$. Then by (11), $W$ satisfies

\[
\begin{aligned}
W'(t) + M(t)W(t) &= M(t)[u_2(t) - u_1(t)] + f_2(t, u_2(t)) - f_1(t, u_1(t)) \quad \text{in } (0, T], \\
W(0) &= u_{2,0} - u_{1,0} \ge 0.
\end{aligned}
\]

Since $\partial f_1/\partial u$ is continuous in $u$, then by the mean value theorem [2],

\[
\begin{aligned}
f_2(t, u_2) - f_1(t, u_1) &= [f_2(t, u_2) - f_1(t, u_2)] + [f_1(t, u_2) - f_1(t, u_1)] \\
&\ge \frac{\partial f_1}{\partial u}(t, \hat{\eta})(u_2 - u_1),
\end{aligned}
\]

where $\hat{\eta} = \hat{\eta}(t)$ is an intermediate value between $u_1$ and $u_2$. Hence, for the bounded function $M(t) = -\frac{\partial f_1}{\partial u}(t, \hat{\eta}(t))$, $W$ satisfies $W'(t) + M(t)W(t) \ge 0$ in $(0, T]$. It is known from Lemma 1 that $W \ge 0$, i.e. $u_2(t) \ge u_1(t)$ on $[0, T]$. This proves Lemma 2.

□
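The comparison argument can be illustrated numerically. The following is a minimal sketch, assuming two hypothetical right-hand sides f1 and f2 with f1 ≤ f2 and ordered initial values; neither function is taken from Models A or B, and the helper name rk4_path is an illustrative choice.

```python
def rk4_path(f, u0, t_end, steps=400):
    """Integrate u' = f(t, u) with classical RK4 and return all grid values."""
    h = t_end / steps
    t, u = 0.0, u0
    path = [u]
    for _ in range(steps):
        k1 = f(t, u)
        k2 = f(t + h / 2, u + h * k1 / 2)
        k3 = f(t + h / 2, u + h * k2 / 2)
        k4 = f(t + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        path.append(u)
    return path

# Assumed example: f1 <= f2 everywhere and u1(0) <= u2(0).
f1 = lambda t, u: -u * u
f2 = lambda t, u: -u * u + 0.5

u1 = rk4_path(f1, 1.0, 5.0)
u2 = rk4_path(f2, 1.2, 5.0)

# Lemma 2 predicts u1(t) <= u2(t) at every grid point.
ordered = all(a <= b for a, b in zip(u1, u2))
print("u1 <= u2 on the whole grid:", ordered)
```

Here u1 has the exact solution 1/(1 + t), which gives an independent check on the integrator.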


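In addition to these two lemmas, let us recall the solution of a first order linear differential equation.

Proposition 1 Suppose $u$ is a function that satisfies, for constants $\alpha \neq 0$ and $\beta$,

\[
\frac{du}{dt} = \alpha u + \beta, \qquad u(0) = u_0,
\]

then,

\[
u(t) = \left(u_0 + \frac{\beta}{\alpha}\right)e^{\alpha t} - \frac{\beta}{\alpha}.
\]

Proof: $\frac{du}{dt} = \alpha u + \beta$ is a first order linear differential equation and can be solved using the method of integrating factor.

\[
\begin{aligned}
(e^{-\alpha t}u)' &= \beta e^{-\alpha t} \\
e^{-\alpha t}u(t) - u_0 &= \int_0^t \beta e^{-\alpha s}\, ds \\
u &= -\frac{\beta}{\alpha} + Ce^{\alpha t}
\end{aligned}
\]

Using the initial condition $u(0) = u_0$ we find $C = u_0 + \frac{\beta}{\alpha}$ and hence the following solution:

\[
u = \left(u_0 + \frac{\beta}{\alpha}\right)e^{\alpha t} - \frac{\beta}{\alpha}
\]

Hence, proving Proposition 1. □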
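As a quick numerical check of Proposition 1, the following minimal sketch compares the closed-form solution with a fourth-order Runge-Kutta integration of the same equation. The function names and the sample values of α, β and u0 are illustrative assumptions and are not parameters of the models studied here.

```python
import math

def linear_ode_solution(alpha, beta, u0, t):
    """Closed form from Proposition 1 (requires alpha != 0)."""
    return (u0 + beta / alpha) * math.exp(alpha * t) - beta / alpha

def rk4(f, u0, t_end, steps=1000):
    """Classical fourth-order Runge-Kutta integration of u' = f(t, u)."""
    h = t_end / steps
    t, u = 0.0, u0
    for _ in range(steps):
        k1 = f(t, u)
        k2 = f(t + h / 2, u + h * k1 / 2)
        k3 = f(t + h / 2, u + h * k2 / 2)
        k4 = f(t + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return u

# Illustrative values only (assumed, not model parameters).
alpha, beta, u0, t_end = -0.5, 2.0, 1.0, 3.0

exact = linear_ode_solution(alpha, beta, u0, t_end)
numeric = rk4(lambda t, u: alpha * u + beta, u0, t_end)
print(exact, numeric)
```

With α < 0 the solution relaxes toward −β/α, which is the pattern used for the nutrient bounds in the sections that follow.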


4.3 Model A 

Our analysis of Model A begins with two entities, nitrate and iron, whose equations have very similar structure and differ only in variables and parameters.

\[
\frac{dN}{dt} = \left[(a_{19} - N)\, \overbrace{a_{20}\, \frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}}}^{\le a_{20}}\right] - \left[\frac{P}{(a_7 \cdot 12.0107)}\right]\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))}\right] M(t) + \left[\frac{a_{15} D}{(a_7 \cdot 12.0107)}\right]
\]

We opt for a very wide upper bound and try to keep the bounds simple yet still informative. Thus, for the upper bound we decided to drop the subtracted term.

\[
\frac{dN}{dt} \le (a_{19} - N)a_{20} = a_{19}a_{20} - N a_{20}
\]

Then, solving for $N_u(t)$,

\[
\frac{dN_u}{dt} + N_u a_{20} = a_{19}a_{20}.
\]

Using the integrating factor $e^{a_{20}t}$ and Proposition 1 we get,

\[
N_u(t) = a_{19} + R_u e^{-a_{20}t}, \qquad \text{where } R_u = N_0 - a_{19}.
\]

As far as the lower bound is concerned, we chose it to be zero. Thus, summarizing the bounds we get,

\[
0 \le N(t) \le a_{19} + R_u e^{-a_{20}t} \tag{12}
\]

When $t \to \infty$ the upper bound tends to $a_{19}$ and we get the following,

\[
0 \le N(t) \le a_{19} \tag{13}
\]

This result tells us that the nitrate concentration in this model will not exceed the value of parameter 19, which is the nitrate average deep concentration. (12) also tells us that the maximum rate of decline of nitrate will be that of parameter 20, which represents the nitrate maximum mixing rate. This means that the accuracy of the nitrate concentration is extremely dependent on how well the parameters are selected. Since iron has the same equation structure, the same analysis applies:

\[
\frac{dF}{dt} = \left[(a_{21} - F)\, \overbrace{a_{22}\, \frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}}}^{\le a_{22}}\right] - \left[\frac{P}{(a_8 \cdot 12.0107)}\right]\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))}\right] M(t) + \left[\frac{a_{15} D}{(a_8 \cdot 12.0107)}\right]
\]

Using the same method as for nitrate, the upper bound for iron is

\[
F_u(t) = a_{21} + Q_u e^{-a_{22}t}, \qquad \text{where } Q_u = F_0 - a_{21}.
\]

Summarizing the bounds we get,

\[
0 \le F(t) \le a_{21} + Q_u e^{-a_{22}t} \tag{14}
\]

When $t \to \infty$ this becomes,

\[
\therefore\ 0 \le F(t) \le a_{21} \tag{15}
\]

As for nitrate, the iron concentration will not exceed a maximum set by parameter 21, which is the iron average deep concentration. Similarly, the maximum rate of decline for iron is set by its maximum mixing rate. The behavior of the iron and nitrate concentrations is thus dependent on the accuracy of the parameter selection process.
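The shape of the nutrient upper bounds can be sketched in a few lines. This is an illustration only: the values standing in for the deep concentration, the maximum mixing rate and the initial concentration are hypothetical placeholders and are not the fitted parameters of Model A.

```python
import math

def relaxation_upper_bound(x0, deep_conc, mixing_rate, t):
    """Upper bound x_u(t) = deep_conc + (x0 - deep_conc) * exp(-mixing_rate * t)."""
    return deep_conc + (x0 - deep_conc) * math.exp(-mixing_rate * t)

# Hypothetical values standing in for a19 (deep concentration),
# a20 (maximum mixing rate) and the initial concentration N0.
a19, a20, n0 = 30.0, 0.1, 10.0

times = (0.0, 10.0, 50.0, 200.0)
bounds = [relaxation_upper_bound(n0, a19, a20, t) for t in times]
for t, b in zip(times, bounds):
    print(t, b)
```

The bound starts at the initial concentration and relaxes monotonically toward the deep concentration at a rate set by the mixing parameter, which is why these two parameters control the bound entirely.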

The next entity in our analysis is zooplankton, as it has a fairly simple equation structure. By the Comparison Argument in Lemma 2, and since $(1 - e^{(-a_{14}P)}) \le 1$, we can write,

\[
\frac{dZ}{dt} = \left[a_{12}a_{13}(1 - e^{(-a_{14}P)}) - a_{11} - a_{16}\right]Z \le \left[a_{12}a_{13} - a_{11} - a_{16}\right]Z
\]

Thus, by Proposition 1,

\[
Z(t) \le Z_0 e^{(a_{12}a_{13} - a_{11} - a_{16})t} = Z_0 e^{-0.2166944309\,t}
\]

We know that by definition the zooplankton concentration is non-negative, which gives us a lower bound of zero. Then,

\[
0 \le Z(t) \le Z_0 e^{-\delta t} \quad \text{where } \delta = 0.2166944309 \tag{16}
\]

We notice that as $t \to \infty$, $Z(t)$ goes to zero, which implies that the zooplankton population is driven to extinction:

\[
\lim_{t \to +\infty} Z(t) = 0 \tag{17}
\]

Since for a biologist this result goes against expectations, the validity of this model structure is questioned.

For the zooplankton not to be forced to zero as t goes to infinity, $(a_{12}a_{13} - a_{11} - a_{16})$ would have to be greater than zero. This may be a clue to refining the constraints on the parameter selection process, so that this quantity is strictly positive and the bound no longer drives the zooplankton concentration to zero for this model structure.
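To give a feel for how quickly the bound in (16) collapses, the following sketch evaluates it and computes the time at which it has fallen to a given fraction of Z0. The decay rate δ is the value quoted in (16); the function names are illustrative, and the time unit is whatever unit the model's rate parameters use.

```python
import math

DELTA = 0.2166944309  # decay rate delta quoted in (16)

def zoo_upper_bound(z0, t, delta=DELTA):
    """Upper bound Z0 * exp(-delta * t) on the zooplankton concentration."""
    return z0 * math.exp(-delta * t)

def time_to_fraction(fraction, delta=DELTA):
    """Time at which the bound equals `fraction` of Z0 (0 < fraction <= 1)."""
    return -math.log(fraction) / delta

half_life = time_to_fraction(0.5)
one_percent = time_to_fraction(0.01)
print("bound halves after", half_life, "time units")
print("bound reaches 1% of Z0 after", one_percent, "time units")
```

A positive value of a12·a13 − a11 − a16 would instead make the bound grow, in which case it would say nothing about extinction.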

This result is then used to further our analysis by looking at the phytoplankton equation (1), as knowing P(t) will help us find bounds for the other entities. The phytoplankton differential equation, like those of nitrate and iron, contains a minimum function M(t), which is not often found in differential equations. In order to find bounds for P(t) we must first find bounds for M(t). Recall,

\[
M(t) = \min\left\{\frac{F}{(F + a_5)},\ \frac{N}{(N + a_4)},\ \left(e^{-E_{PUR}(t)/a_2}\right)\left(1 - e^{-E_{PUR}(t)\left(1 + a_3 e^{\left(E_{PUR}(t)\, e^{1.089 - 2.12\log_{10}(a_1)}\right)}\right)/a_1}\right)\right\}
\]

Since M(t) is a minimum function, it will always pick the smallest value of the three functions stated above; thus, using (15), we can safely estimate the range of M(t) to be:

\[
0 \le M(t) \le \frac{F_{upperbound}}{F_{upperbound} + a_5} = \frac{a_{21}}{a_{21} + a_5} = 0.53262. \tag{18}
\]
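The step from the iron bound to (18) relies on the iron limitation term being increasing in F. A short worked justification, assuming $a_5 > 0$ and $F \ge 0$ (assumptions that are natural for a half-saturation constant and a concentration but are not stated explicitly above):

```latex
\[
\frac{d}{dF}\left(\frac{F}{F + a_5}\right) = \frac{a_5}{(F + a_5)^2} > 0
\quad\Longrightarrow\quad
F \le a_{21} \ \text{implies}\ \frac{F}{F + a_5} \le \frac{a_{21}}{a_{21} + a_5},
\]
\[
M(t) = \min\{\cdot,\cdot,\cdot\} \le \frac{F}{F + a_5} \le \frac{a_{21}}{a_{21} + a_5}.
\]
```

Since the minimum of several terms never exceeds any single one of them, bounding the iron term alone is enough for an upper bound on M(t).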

To find an upper bound for P(t) we use the lower bound of (16): since the Z(t) term is subtracted, we replace Z(t) by its smallest value (i.e. its lower bound). With Lemma 2 we have,

\[
\begin{aligned}
\frac{dP}{dt} &= \left[\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))}\right] M(t)(1 - a_6) - a_9 - a_{17}\right]P - \left(a_{13}(1 - e^{(-a_{14}P)})\right)Z \\
&\le \left[\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))}\right] M(t)(1 - a_6) - a_9 - a_{17}\right]P
\end{aligned}
\]

Using the exogenous time-series at our disposal we find,

\[
0.01406716 \le \left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))}\right] \le 0.8639959 \tag{19}
\]

Using (18) and (19), we may rewrite this as follows,

\[
\frac{dP}{dt} \le \left[(0.8639959)(0.53262)(1 - a_6) - (a_9 + a_{17})\right]P \le \underbrace{(0.402759)}_{\alpha_u} P
\]

\[
\frac{dP}{dt} \le \alpha_u P \quad \text{where } \alpha_u = 0.402759
\]

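For the lower bound of P(t), using the upper bound of (16), the lower bound of (18) and Lemma 2, we get a manageable lower bound,

\[
\begin{aligned}
\frac{dP}{dt} &\ge -\underbrace{(a_9 + a_{17})}_{\alpha_l} P - \underbrace{a_{13}(1 - e^{(-a_{14}P)})}_{\le a_{13}}\ \underbrace{Z_0 e^{-\delta t}}_{(16)} \\
&\ge -\alpha_l P - a_{13} Z_0 e^{-\delta t} \quad \text{where } \alpha_l = 0.0469007
\end{aligned}
\]

Solving the corresponding linear equation with an integrating factor, as in Proposition 1, we get,

\[
P(t) \ge \left(P_0 + \frac{a_{13}}{\alpha_l - \delta} Z_0\right)e^{-\alpha_l t} - \frac{a_{13}}{\alpha_l - \delta} Z_0 e^{-\delta t}
\]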
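Because the forcing term in the lower-bound equation depends on time, Proposition 1 does not apply verbatim; the same integrating-factor idea still gives the stated formula. A worked sketch, writing $P_l$ for the solution of the comparison equation (a name introduced here for illustration) and assuming $\alpha_l \neq \delta$, which holds for the values quoted above:

```latex
\[
P_l' + \alpha_l P_l = -a_{13} Z_0 e^{-\delta t}, \qquad P_l(0) = P_0
\]
\[
\left(e^{\alpha_l t} P_l\right)' = -a_{13} Z_0 e^{(\alpha_l - \delta)t}
\quad\Longrightarrow\quad
e^{\alpha_l t} P_l(t) - P_0 = -\frac{a_{13} Z_0}{\alpha_l - \delta}\left(e^{(\alpha_l - \delta)t} - 1\right)
\]
\[
P_l(t) = \left(P_0 + \frac{a_{13} Z_0}{\alpha_l - \delta}\right)e^{-\alpha_l t} - \frac{a_{13} Z_0}{\alpha_l - \delta}\, e^{-\delta t}
\]
```

Setting $t = 0$ recovers $P_l(0) = P_0$, which is a quick consistency check on the constants.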

Summarizing the bounds for P(t) we get,

\[
\therefore\ \left(P_0 + \frac{a_{13}}{\alpha_l - \delta} Z_0\right)e^{-\alpha_l t} - \frac{a_{13}}{\alpha_l - \delta} Z_0 e^{-\delta t} \le P(t) \le P_0 e^{\alpha_u t} \tag{20}
\]

\[
P(t) > 0 \quad \text{for } t \in (0, +\infty)
\]

From a biological standpoint $\alpha_l$ is the maximum rate of decline and $\alpha_u$ is the maximum rate of growth. These bounds give us little information about the model, as they simply state that the phytoplankton concentration is contained between zero and infinity.

Next we look at detritus:

\[
\begin{aligned}
\frac{dD}{dt} &= \left((1 - a_{10})a_{11}P\right) + \left(\left(a_{11} + (1 - a_{12})\,\overbrace{a_{13}(1 - e^{(-a_{14}P)})}^{\le a_{13}}\right)(1 - a_{10})Z\right) - D(a_{15} + a_{18}) \\
&\le \left((1 - a_{10})a_{11}P\right) + \left((a_{11} + (1 - a_{12})a_{13})(1 - a_{10})Z\right) - D(a_{15} + a_{18}) \\
&\le (1 - a_{10})a_{11}\underbrace{P_0 e^{\alpha_u t}}_{(20)} + (a_{11} + (1 - a_{12})a_{13})(1 - a_{10})\underbrace{Z_0 e^{-\delta t}}_{(16)} - D(a_{15} + a_{18})
\end{aligned}
\]

Using (16) and (20) and simplifying a bit we get a more manageable upper bound. Solving for the upper bound $D_u(t)$,

\[
\frac{dD_u}{dt} + (a_{15} + a_{18})D_u = (1 - a_{10})a_{11}P_0 e^{\alpha_u t} + (a_{11} + (1 - a_{12})a_{13})(1 - a_{10})Z_0 e^{-\delta t}
\]

Using the integrating factor $e^{(a_{15} + a_{18})t}$ and Proposition 1 we get,

\[
D_u(t) = \underbrace{\left(\frac{(1 - a_{10})a_{11}}{\alpha_u + a_{15} + a_{18}}\right)}_{\beta_{u1}} P_0 e^{\alpha_u t} + \underbrace{\left(\frac{(a_{11} + (1 - a_{12})a_{13})(1 - a_{10})}{-\delta + a_{15} + a_{18}}\right)}_{\beta_{u2}} Z_0 e^{-\delta t} + C_u e^{-(a_{15} + a_{18})t}
\]

Let us rewrite this to simplify the expression a bit,

\[
D_u(t) = \beta_{u1}P_0 e^{\alpha_u t} + \beta_{u2}Z_0 e^{-\delta t} + C_u e^{-(a_{15} + a_{18})t},
\]

where $\beta_{u1} = 0.224039$ and $\beta_{u2} = -3.754762$.

Assuming $D(0) = D_0$ and still following Proposition 1 we solve for $C_u$,

\[
C_u = D_0 - \beta_{u1}P_0 - \beta_{u2}Z_0
\]
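The pattern used for the detritus bound, a linear equation forced by a sum of exponentials, can be checked numerically. The sketch below is a generic helper under stated assumptions: α_u and δ are the values quoted in the text, while the loss rate k (standing in for a15 + a18), the forcing amplitudes and D0 are hypothetical placeholders.

```python
import math

def forced_linear_solution(k, terms, d0):
    """Solve D' + k*D = sum(c * exp(r*t)) with D(0) = d0.

    `terms` is a list of (c, r) pairs with r + k != 0.
    Returns a function t -> D(t).
    """
    coeffs = [(c / (r + k), r) for c, r in terms]  # the beta-type coefficients
    c_hom = d0 - sum(b for b, _ in coeffs)         # constant of the decaying term

    def d(t):
        forced = sum(b * math.exp(r * t) for b, r in coeffs)
        return forced + c_hom * math.exp(-k * t)

    return d

# alpha_u and delta are quoted in the text; everything else is a placeholder.
alpha_u, delta = 0.402759, 0.2166944309
k, d0 = 0.5, 1.0
terms = [(0.2, alpha_u), (0.3, -delta)]
d_u = forced_linear_solution(k, terms, d0)

# Residual of the ODE at t = 2, using a central difference for D'.
t, h = 2.0, 1e-5
lhs = (d_u(t + h) - d_u(t - h)) / (2 * h) + k * d_u(t)
rhs = sum(c * math.exp(r * t) for c, r in terms)
print("ODE residual:", abs(lhs - rhs))
```

Each coefficient c/(r + k) plays the role of β_u1 or β_u2, and the homogeneous constant plays the role of C_u.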

Similarly for the lower bound, using Proposition 1, (16) and (20) we get,

\[
D_l(t) = \underbrace{\left(\frac{(1 - a_{10})a_{11}}{-\alpha_l + a_{15} + a_{18}}\right)}_{\beta_l} P_0 e^{-\alpha_l t} + C_l e^{-(a_{15} + a_{18})t}
\]

Let us rewrite it as,

\[
D_l(t) = \beta_l P_0 e^{-\alpha_l t} + C_l e^{-(a_{15} + a_{18})t},
\]

where $\beta_l = 2.9784819$ and $C_l = D_0 - \beta_l P_0$.

Summarizing the bounds,

\[
\beta_l P_0 e^{-\alpha_l t} + C_l e^{-(a_{15} + a_{18})t} \le D(t) \le \beta_{u1}P_0 e^{\alpha_u t} + \beta_{u2}Z_0 e^{-\delta t} + C_u e^{-(a_{15} + a_{18})t} \tag{21}
\]

This concludes our analysis of Model A; the results will be discussed further on.

4.4 Model B 

Shifting our focus to Model B, we find different model structures. Indeed, (7) yields a Lotka-Volterra structure, which will make for an interesting analysis.

Following the procedure used for Model A, we start our analysis with the zooplankton equation (7), the simplest of all five. To find bounds for Z(t) we first need to find those of H(t).

\[
H(t) = \max\left\{0,\ \frac{a_{13}(P - a_{16} - a_{15})}{a_{14} + (P - a_{16} - a_{15})}\right\}
\]

\[
\frac{a_{13}(P - a_{16} - a_{15})}{a_{14} + (P - a_{16} - a_{15})} \le a_{13}
\]

Thus we get,

\[
0 \le H(t) < a_{13} \tag{22}
\]
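The inequality behind (22) can be made explicit. A short worked step, writing $x = P - a_{16} - a_{15}$ (a shorthand introduced here) and assuming $a_{13} > 0$, $a_{14} > 0$ and $x > 0$; when $-a_{14} < x \le 0$ the fraction is non-positive and the maximum returns zero:

```latex
\[
\frac{a_{13}\,x}{a_{14} + x} = a_{13}\left(1 - \frac{a_{14}}{a_{14} + x}\right) < a_{13}
\qquad (x > 0,\ a_{14} > 0).
\]
```

The saturating form means the grazing rate approaches $a_{13}$ as the available phytoplankton grows but never reaches it, which is why the upper inequality in (22) is strict.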

Knowing (22) we can conclude,

\[
\begin{aligned}
\frac{dZ}{dt} &= \left(a_{12}H(t)Z\right) - a_{11}Z^2 - a_{18}Z \\
&\le a_{12}a_{13}Z - a_{11}Z^2 - a_{18}Z = \underbrace{Z\left[a_{12}a_{13} - a_{18} - a_{11}Z\right]}_{\text{Logistic Equation}}
\end{aligned}
\]

Setting $a_{12}a_{13} - a_{18} - a_{11}Z = 0$ we can find the carrying capacity K.

\[
K = \frac{a_{12}a_{13} - a_{18}}{a_{11}} = 42.3156486
\]

Thus, an upper bound for Z(t) will be,

\[
\lim_{t \to +\infty} Z(t) \le K = 42.3156486
\]

This is significant, since the zooplankton concentration will have a maximum of K, which is closer to the type of behavior an ecologist would expect to see in zooplankton concentrations. The lower bound of this entity will be zero, because of (22) and because we know from biology that the zooplankton concentration cannot be negative. Hence,

\[
\therefore\ 0 \le Z(t) \le 42.3156486 \tag{23}
\]

However, if $P(0) \le a_{16} + a_{15}$ then Z(t) would go to zero as t goes to infinity, because H(t) = 0, hence changing the structure of the equation and driving the population to extinction.
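The logistic comparison equation above has a closed-form solution, which makes the role of the carrying capacity easy to see. In the sketch below K is the value quoted in the text, while the intrinsic rate r (standing in for a12·a13 − a18) and the initial values are hypothetical placeholders.

```python
import math

K = 42.3156486  # carrying capacity quoted in the text

def logistic(z0, r, t, k=K):
    """Solution of Z' = r*Z*(1 - Z/k) with Z(0) = z0 > 0 and r > 0."""
    growth = math.exp(r * t)
    return k * z0 * growth / (k + z0 * (growth - 1.0))

# r = a12*a13 - a18 is not listed in this section; 0.3 is a placeholder.
r = 0.3
for z0 in (1.0, 80.0):
    values = [round(logistic(z0, r, t), 3) for t in (0, 10, 50, 100)]
    print("Z0 =", z0, "->", values)
```

Trajectories of the comparison equation approach K from either side, so the bound Z(t) ≤ K for all t, as opposed to in the limit, additionally presumes an initial value no larger than K.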

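In order to proceed to the analysis of P(t) we must first find the bounds for M(t).

\[
M(t) = \min\left\{\frac{F}{(F + a_5)},\ (1 - e^{-a_5 N}),\ \left(e^{-E_{PUR}(t)/a_2}\right)\left(1 - e^{-E_{PUR}(t)\left(1 + a_3 e^{\left(E_{PUR}(t)\, e^{1.089 - 2.12\log_{10}(a_1)}\right)}\right)/a_1}\right)\right\}
\]

Since M(t) is the minimum of terms that each lie between zero and one, its bounds are,

\[
0 \le M(t) \le 1. \tag{24}
\]

We now are able to find bounds for P(t),

\[
\frac{dP}{dt} = \left[\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))}\right] M(t)P(1 - a_6)\right] - (a_9 P) - \left(H(t)Z\right) - (a_{19}P)
\]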

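Using the exogenous variables' time-series we estimate:

\[
0.009868042 \le \left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))}\right] \le 0.6060888 \tag{25}
\]

Using (24), (25) and dropping the subtracted grazing term we find the upper bound to be,

\[
\frac{dP}{dt} \le \underbrace{\left[(0.6060888)(1)(1 - a_6) - (a_9 + a_{19})\right]}_{\alpha_u} P
\]

\[
\frac{dP}{dt} \le \alpha_u P \quad \text{where } \alpha_u = 0.4720906
\]

For the lower bound, by (22), (23) and (24):

\[
\frac{dP}{dt} \ge -\underbrace{(a_9 + a_{19})}_{\alpha_l} P - a_{13}K
\]

\[
\frac{dP}{dt} \ge -\alpha_l P - a_{13}K \quad \text{where } \alpha_l = 0.032101
\]

Using Proposition 1 we get,

\[
\therefore\ \left(P_0 + \frac{a_{13}K}{\alpha_l}\right)e^{-\alpha_l t} - \frac{a_{13}K}{\alpha_l} \le P(t) \le P_0 e^{\alpha_u t} \tag{26}
\]

\[
P(t) > 0 \quad \text{for } t \in (0, +\infty)
\]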

From a biological standpoint $\alpha_l$ is the maximum rate of decline and $\alpha_u$ is the maximum rate of growth. These bounds give us little information about the model, as they state that the phytoplankton concentration is contained between zero and infinity. We continue our analysis with detritus:

\[
\frac{dD}{dt} = \left((1 - a_{10})a_9 P\right) + \left((1 - a_{10})a_{11}Z^2 + (1 - a_{10})(1 - a_{12})H(t)Z\right) - D(a_{17} + a_{20})
\]

Using (22), (23) and (26) we get,

\[
\frac{dD}{dt} \le \left((1 - a_{10})a_9 P_0 e^{\alpha_u t}\right) + \left((1 - a_{10})a_{11}K^2 + (1 - a_{10})(1 - a_{12})a_{13}K\right) - D(a_{17} + a_{20})
\]

Then solving for the upper bound,

\[
\frac{dD_u}{dt} + D_u(a_{17} + a_{20}) = (1 - a_{10})a_9 P_0 e^{\alpha_u t} + \left(a_{11}K + (1 - a_{12})a_{13}\right)(1 - a_{10})K
\]

Using Proposition 1,

\[
D_u(t) = \underbrace{\frac{\left(a_{11}K + (1 - a_{12})a_{13}\right)(1 - a_{10})K}{(a_{17} + a_{20})}}_{\beta_{u1}} + \underbrace{\frac{(1 - a_{10})a_9}{\alpha_u + a_{17} + a_{20}}}_{\beta_{u2}} P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{17} + a_{20})t}
\]

\[
\lim_{t \to +\infty} D_u(t) = \infty
\]

where $\beta_{u1} = 220.00119$ and $\beta_{u2} = 0.03054$.

Then finding a lower bound,

\[
\frac{dD_l}{dt} + D_l(a_{17} + a_{20}) = 0
\]

\[
D_l(t) = D_0 e^{-(a_{17} + a_{20})t}, \qquad \lim_{t \to +\infty} D_l(t) = 0
\]

Thus,

\[
D_0 e^{-(a_{17} + a_{20})t} \le D(t) \le \beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{17} + a_{20})t} \tag{27}
\]

The bounds (27) show a maximum rate of decline driven by parameters 17 and 20. As in Model A, the equation structures for nitrate and iron are very similar, differing only in parameters and variables.

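\[
\frac{dN}{dt} = \left[\frac{a_{17}D}{(a_7 \cdot 12.0107)}\right] + \left[(a_{21} - N)\, \overbrace{a_{22}\, \frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}}}^{\le a_{22}}\right] - \left[\frac{P}{(a_7 \cdot 12.0107)}\right]\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))} M(t)\right]
\]

Using (26) and (27) we find an upper bound; the P term is dropped as its lower bound is zero. Thus,

\[
\frac{dN}{dt} \le (a_{21} - N)a_{22} + \frac{a_{17}}{(a_7 \cdot 12.0107)}\left(\beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{17} + a_{20})t}\right)
\]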

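Setting up to solve for $N_u(t)$,

\[
\frac{dN_u}{dt} = (a_{21} - N_u)a_{22} + \frac{a_{17}}{(a_7 \cdot 12.0107)}\left(\beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{17} + a_{20})t}\right)
\]

\[
\frac{dN_u}{dt} + N_u a_{22} = a_{21}a_{22} + \frac{a_{17}}{(a_7 \cdot 12.0107)}\left(\beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{17} + a_{20})t}\right)
\]

Solving for $N_u(t)$ using Proposition 1,

\[
N_u(t) = \underbrace{\left(a_{21} + \frac{a_{17}\left(\beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{17} + a_{20})t}\right)}{(a_{22}\, a_7 \cdot 12.0107)}\right)}_{\gamma_N} + \left(N_0 - \gamma_N\right)e^{-(a_{22})t}
\]

where,

\[
\lim_{t \to +\infty} \gamma_N(t) = \infty
\]

\[
N_u(t) = \gamma_N + \left(N_0 - \gamma_N\right)e^{-(a_{22})t}, \qquad \lim_{t \to +\infty} N_u(t) = \infty
\]

For the lower bound, the D term is dropped as its lower bound goes to zero.

\[
\frac{dN}{dt} \ge (a_{21} - N)a_{22}
\]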

Thus, using Proposition 1 we get,

\[
\frac{dN_l}{dt} = (a_{21} - N_l)a_{22}
\]

\[
\frac{dN_l}{dt} + N_l a_{22} = a_{21}a_{22}
\]

\[
N_l(t) = a_{21} + (N_0 - a_{21})e^{-a_{22}t}, \qquad \lim_{t \to +\infty} N_l(t) = a_{21}
\]

To summarize the bounds,

\[
a_{21} + (N_0 - a_{21})e^{-a_{22}t} \le N(t) \le \gamma_N + \left(N_0 - \gamma_N\right)e^{-a_{22}t}
\]

When $t \to \infty$ we obtain,

\[
\therefore\ a_{21} = 31 \le N(t) \le \infty \tag{28}
\]

Nitrate then has a constant lower bound, which implies that the concentration will never go below $a_{21}$ for this particular model structure. This leads us to reiterate that these models are very sensitive to the parameter selection process. As mentioned above, iron has the same equation structure; thus, using the same analysis, we find the following for iron:

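\[
\frac{dF}{dt} = \left[\frac{D a_{17}}{(a_8 \cdot 12.0107)}\right] + \left[(a_{23} - F)\, a_{24}\, \frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}}\right] - \left[\frac{P}{(a_8 \cdot 12.0107)}\right]\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))} M(t)\right]
\]

Thus,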

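\[
F_u(t) = \underbrace{\left(a_{23} + \frac{a_{17}\left(\beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{17} + a_{20})t}\right)}{(a_{24}\, a_8 \cdot 12.0107)}\right)}_{\gamma_F} + \left(F_0 - \gamma_F\right)e^{-(a_{24})t}
\]

where,

\[
\lim_{t \to +\infty} \gamma_F(t) = \infty
\]

\[
F_u(t) = \gamma_F + \left(F_0 - \gamma_F\right)e^{-(a_{24})t}, \qquad \lim_{t \to +\infty} F_u(t) = \infty
\]

For the lower bound we get,

\[
F_l(t) = a_{23} + (F_0 - a_{23})e^{-a_{24}t}, \qquad \lim_{t \to +\infty} F_l(t) = a_{23}
\]

To summarize we get,

\[
a_{23} + (F_0 - a_{23})e^{-a_{24}t} \le F(t) \le \gamma_F + \left(F_0 - \gamma_F\right)e^{-a_{24}t}
\]

which as $t \to \infty$ gives,

\[
\therefore\ a_{23} = 4.5 \cdot 10^{-4} \le F(t) \le \infty \tag{29}
\]

As was the case for nitrate, iron is bounded below by parameter 23. This concludes the analysis of Model B.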

Models A and B are the two good fit models, under a 0.5 reMSE, which came up most frequently throughout the 31 experiments. Our analysis has shown that the structures of the equations for phytoplankton and detritus produce similar dynamics and bounds for both models; on the other hand, where iron and nitrate were bounded above by a parameter in Model A, they were bounded below by a parameter value in Model B. Also, the structure and bounds for zooplankton varied much more. For instance, the bounds for Model A implied that the zooplankton population will go to extinction, whereas the bounds for Model B indicated that the population has an upper bound at the carrying capacity K. This simple observation led me to look further into the zooplankton dynamics; to do so I chose to select the model with the lowest reMSE from experiment 6. In this experiment HIPM was provided observations for both phytoplankton and zooplankton dynamics. This was not a random choice, since phytoplankton is the dynamic we are trying to model and zooplankton is the state variable demonstrating the most variability in structure and having potentially the most restrictive power of all state variables, based on the computational results. This model will be presented as Model C.

4.5 Model C 

Table 7: This table summarizes all the parameters that play a role in Model C

Model C

ID      Name                                          Value
a_3     Nitrate monod lim coefficient                 5.13429e-05
a_4     Iron monod lim                                0.0001252
a_12    zoo.attack                                    0.340717
a_13    zoo grazing.ratio dependent 3 coefficient     1.86168
a_18    NO3.avg deep conc                             31.2197

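Model C

\[
\frac{dP}{dt} = \Big[\underbrace{\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))} M(t)\right]}_{\text{P Growth Rate}}(1 - a_5) - a_8 - a_{16}\Big]P - \left(\frac{(a_{12}P^2)}{Z^2 + a_{12}a_{13}P^2}\right)Z \tag{30}
\]

\[
\frac{dZ}{dt} = \left(a_{11}\,\frac{(a_{12}P^2)}{Z^2 + a_{12}a_{13}P^2}\,Z\right) - \left(a_{10}Z + a_{15}\right)Z \tag{31}
\]

\[
\frac{dD}{dt} = \left((1 - a_9)(a_8 P + a_{10}Z^2)\right) + \left((1 - a_9)(1 - a_{11})\,\frac{(a_{12}P^2)}{Z^2 + a_{12}a_{13}P^2}\,Z\right) - D(a_{14} + a_{17}) \tag{32}
\]

\[
\frac{dN}{dt} = \left[\frac{a_{14}D}{(a_6 \cdot 12.0107)}\right] + \Big[(a_{18} - N)\underbrace{a_{19}\,\frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}}}_{\text{N Mixing Rate}}\Big] - \left[\frac{P}{(a_6 \cdot 12.0107)}\right]\underbrace{\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))} M(t)\right]}_{\text{P Growth Rate}} \tag{33}
\]

\[
\frac{dF}{dt} = \left[\frac{a_{14}D}{(a_7 \cdot 12.0107)}\right] + \Big[(a_{20} - F)\underbrace{a_{21}\,\frac{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}(t)}{E_{T_{H_2O}}^{\max} - E_{T_{H_2O}}^{\min}}}_{\text{F Mixing Rate}}\Big] - \left[\frac{P}{(a_7 \cdot 12.0107)}\right]\underbrace{\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))} M(t)\right]}_{\text{P Growth Rate}} \tag{34}
\]

Where,

\[
M(t) = \min\left\{\frac{F}{(F + a_4)},\ \frac{N}{(N + a_3)},\ \left(1 - e^{-E_{PUR}(t)\left(1 + a_2 e^{\left(E_{PUR}(t)\, e^{1.089 - 2.12\log_{10}(a_1)}\right)}\right)/a_1}\right)\right\}
\]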

The analysis of Model C is very similar to that of Model B, especially for nitrate and iron, since their equation structures are identical except for the parameters and M(t). In this case, since we are using the approximation that M(t) is bounded between zero and one for all the models, the bounds for both nitrate and iron are going to be close in structure to those of Model B. Following the procedure used in both previous models, we start our analysis with the zooplankton equation (31), which is the simplest of the five.

\[
\begin{aligned}
\frac{dZ}{dt} &= \left(a_{11}\,\frac{(a_{12}P^2)}{Z^2 + a_{12}a_{13}P^2}\,Z\right) - \left(a_{10}Z + a_{15}\right)Z \\
&\le \underbrace{Z\left[\frac{a_{11}}{a_{13}} - a_{15} - a_{10}Z\right]}_{\text{Logistic Equation}}
\end{aligned}
\]

Setting $\frac{a_{11}}{a_{13}} - a_{15} - a_{10}Z = 0$ we can find the carrying capacity K.

\[
K = \frac{\frac{a_{11}}{a_{13}} - a_{15}}{a_{10}} = 76.41856944
\]

Thus, an upper bound for Z(t) will be,

\[
\lim_{t \to +\infty} Z(t) \le K = 76.41856944
\]

This is significant since the zooplankton concentration will have a maximum of K. The lower bound of this entity will be zero, since we know from biology that the zooplankton concentration cannot be negative. Hence,

\[
\therefore\ 0 \le Z(t) \le 76.41856944 \tag{35}
\]
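The inequality used to reach the logistic form follows from dropping the non-negative $Z^2$ term in the denominator of the grazing ratio. A short worked step, assuming $a_{11}, a_{12}, a_{13} > 0$, $P \neq 0$ and $Z \ge 0$ (sign assumptions that are not stated explicitly above):

```latex
\[
\frac{a_{12}P^2}{Z^2 + a_{12}a_{13}P^2} \le \frac{a_{12}P^2}{a_{12}a_{13}P^2} = \frac{1}{a_{13}}
\quad\Longrightarrow\quad
a_{11}\,\frac{a_{12}P^2}{Z^2 + a_{12}a_{13}P^2}\,Z \le \frac{a_{11}}{a_{13}}\,Z .
\]
```

Substituting this into (31) gives $\frac{dZ}{dt} \le Z\left[\frac{a_{11}}{a_{13}} - a_{15} - a_{10}Z\right]$, whose positive equilibrium is the carrying capacity K.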


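Next we turn our attention to P(t), for which we take M(t) to have the following bounds:

\[
0 \le M(t) \le 1, \tag{36}
\]

and using the exogenous variables' time-series we estimate:

\[
0.0138249 \le \left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))}\right] \le 0.8491189 \tag{37}
\]

Then,

\[
\begin{aligned}
\frac{dP}{dt} &= \left[\left[(1 - E_{ice}(t))\, a_0\, e^{(0.06933\, E_{T_{H_2O}}(t))}\right] M(t)(1 - a_5) - a_8 - a_{16}\right]P - \left(\frac{(a_{12}P^2)}{Z^2 + a_{12}a_{13}P^2}\right)Z \\
&\le \underbrace{\left[(0.8491189)(1)(1 - a_5) - a_8 - a_{16}\right]}_{\alpha_u} P
\end{aligned}
\]

\[
\frac{dP}{dt} \le \alpha_u P \quad \text{where } \alpha_u = 0.806566
\]

Using Proposition 1 we get,

\[
\therefore\ 0 \le P(t) \le P_0 e^{\alpha_u t} \tag{38}
\]

and the upper bound is unbounded in time:

\[
\lim_{t \to +\infty} P_0 e^{\alpha_u t} = \infty \tag{39}
\]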

That being said, let us continue the analysis of Model C with detritus.

\[
\frac{dD}{dt} = \left((1 - a_9)(a_8 P + a_{10}Z^2)\right) + \left((1 - a_9)(1 - a_{11})\,\frac{(a_{12}P^2)}{Z^2 + a_{12}a_{13}P^2}\,Z\right) - D(a_{14} + a_{17})
\]

Using (35) and (38) we get,

\[
\frac{dD}{dt} \le \left((1 - a_9)(a_8 P_0 e^{\alpha_u t} + a_{10}K^2)\right) + \left((1 - a_9)(1 - a_{11})\frac{K}{a_{13}}\right) - D(a_{14} + a_{17})
\]

Then solving for the upper bound,

\[
\frac{dD_u}{dt} + D_u(a_{14} + a_{17}) = (1 - a_9)a_8 P_0 e^{\alpha_u t} + \left(a_{10}K + \frac{(1 - a_{11})}{a_{13}}\right)(1 - a_9)K
\]

Using Proposition 1,

\[
D_u(t) = \underbrace{\frac{\left(a_{10}K + \frac{(1 - a_{11})}{a_{13}}\right)(1 - a_9)K}{(a_{14} + a_{17})}}_{\beta_{u1}} + \underbrace{\frac{(1 - a_9)a_8}{\alpha_u + (a_{14} + a_{17})}}_{\beta_{u2}} P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{14} + a_{17})t}
\]

where $\beta_{u1} = 1.721033$ and $\beta_{u2} = 1.7508 \times 10^{-4}$.

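We can rewrite $D_u(t)$ as,

\[
D_u(t) = \beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{14} + a_{17})t}, \qquad \lim_{t \to +\infty} D_u(t) = \infty
\]

Then finding a lower bound,

\[
\frac{dD_l}{dt} + D_l(a_{14} + a_{17}) = 0
\]

\[
D_l(t) = D_0 e^{-(a_{14} + a_{17})t}, \qquad \lim_{t \to +\infty} D_l(t) = 0
\]

Thus,

\[
D_0 e^{-(a_{14} + a_{17})t} \le D(t) \le \beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{14} + a_{17})t} \tag{40}
\]

When $t \to \infty$ we get the following,

\[
\therefore\ 0 \le D(t) \le \infty \tag{41}
\]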

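The approach for nitrate and iron is identical to that of Model B, so we get, for nitrate,

\[
a_{18} + (N_0 - a_{18})e^{-a_{19}t} \le N(t) \le \gamma_N + \left(N_0 - \gamma_N\right)e^{-a_{19}t}
\]

where,

\[
\gamma_N = a_{18} + \frac{a_{14}\left(\beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{14} + a_{17})t}\right)}{(a_{19}\, a_6 \cdot 12.0107)}
\]

When $t \to \infty$ we obtain,

\[
\therefore\ a_{18} = 31.2197 \le N(t) \le \infty \tag{42}
\]

And for iron we get,

\[
a_{20} + (F_0 - a_{20})e^{-a_{21}t} \le F(t) \le \gamma_F + \left(F_0 - \gamma_F\right)e^{-a_{21}t}
\]

where,

\[
\gamma_F = a_{20} + \frac{a_{14}\left(\beta_{u1} + \beta_{u2}P_0 e^{\alpha_u t} + \left(D_0 - \beta_{u1} - \beta_{u2}P_0\right)e^{-(a_{14} + a_{17})t}\right)}{(a_{21}\, a_7 \cdot 12.0107)}
\]

which as $t \to \infty$ gives,

\[
\therefore\ a_{20} = 4.5 \cdot 10^{-4} \le F(t) \le \infty \tag{43}
\]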

The analysis of Model C yields some interesting results. Indeed, we obtained realistic bounds for all the state variables. The upper bounds for phytoplankton, nitrate and iron are not informative, as they go to infinity. That being said, with only two sets of time-series (phytoplankton and zooplankton) HIPM produced five good fit models under a 0.5 reMSE, one of which I have analyzed and which seemed to yield dynamics in accordance with what domain scientists would expect.

4.6 Effects of increasing the number of constraints 

The computational results have established that increasing the number of constraints supplied to HIPM, by adding time-series for additional entities, will in some cases reduce the number of good fit models selected by the software. A few cases were studied to look at the impact of additional constraints on the models selected. During his research on the Ross Sea phytoplankton dynamics, Borrett (unpublished data) worked with real-life measurements for phytoplankton and nitrate and supplied these data to HIPM; for that reason I decided to look at experiment number 8, which assumed data for both nitrate and phytoplankton (cf. Table 4). The number of good fit models, under a 0.5 reMSE, produced by HIPM is 25. When observations for zooplankton are added, no models under the chosen reMSE cutoff are selected; on the other hand, when iron or detritus are added, a significant decrease in the number of good fit models can be observed. The addition of detritus (experiment 19) and iron (experiment 21) yielded 13 and 15 good fit models, respectively. However, out of all the models selected in experiment 8, only 2 were part of the set selected in experiment 19 (addition of detritus data) and only 3 were part of the set selected in experiment 21 (addition of iron data). A comparison of the structures of these models (the models can be found in Appendices D and E) shows that they differ only in the type of zooplankton grazing process used, in the parameter values and, in some cases, in the phytoplankton growth limitation (i.e. M(t)). Otherwise the structures of these models are comparable, implying perhaps that the grazing processes used in these models have similar effects on the ecosystem, which is plausible given the number of grazing processes present in the process library. It is in fact this high number of grazing options that makes for a very large structural search space.

5 CONCLUSION 

The hierarchal inductive process-modeling framework was effective in its role of covering two very extensive search spaces in a short amount of time, and with the availability of the CIAO data I was able to investigate the usefulness of the software in our search for the best model representation.

Some of the major observations made throughout this thesis concern the zooplankton state variable. Not only did it tend to provide great restrictive power when its time series was supplied to HIPM, as shown by its median activation value and Table 4, but most often some of the good fit models I selected differed only in the type of zooplankton grazing process chosen. All this may suggest one of two things. Either zooplankton does indeed yield the most important discriminatory power of all state variables and is the data that should be collected first and foremost for the Ross Sea ecosystem, or the way the zooplankton entity is defined in the HIPM framework is inadequate for this type of ecosystem, which incidentally weeds out many of the models that are being searched through (i.e. all the zeros in Table 4). This makes for high variability in structure for the good fit models selected, which in turn creates an array of dynamics, some of which are at opposite ends of the spectrum (i.e. the zooplankton population going to extinction in some cases or going to the carrying capacity in others), when HIPM is provided with time-series data other than those of zooplankton. The other explanation for the issues encountered with zooplankton could be found not in the way zooplankton is defined in the process library but rather in an assumption made within the biological knowledge encoded into HIPM. Indeed, Phaeocystis are assumed to be resistant to grazing by zooplankton, meaning that it is more difficult for zooplankton to graze upon Phaeocystis than upon diatoms, the latter not being included in the process-library, as we will discuss later on. Hence, the inability for zooplankton to be properly fitted may lie in the way phytoplankton has been defined and not zooplankton.

Even though phytoplankton did not seem to reduce the number of good fit models as effectively as zooplankton, it did have an unexpected dynamic for the good fit models studied. Indeed, in all models the upper bound of the phytoplankton state variable goes to infinity as t goes to infinity. As mentioned previously, the Ross Sea is the scene of one of the largest phytoplankton blooms in the Southern Ocean. The population of phytoplankton in the Ross Sea is primarily made up of two groups, Phaeocystis antarctica and diatoms. Since Phaeocystis dominates the phytoplankton bloom in the region, it was the only species taken into consideration in this experiment, which may have influenced the output of the system and the dynamics of the state variables, and more specifically those of zooplankton. The addition of another species of phytoplankton in HIPM could change the outputs significantly. There is a need here to modify the way phytoplankton are defined in HIPM and to incorporate the multiplicity of species of phytoplankton in the system. This can be grounds for future research.

Surprisingly, we found evidence contrary to the assumption that more information meant fewer models being selected. As explained in the computational results, there were instances where inputting two time series into HIPM selected fewer good fit models than inputting three time-series. Once again, as previously mentioned, the explanation for this observation is the way reMSE is defined within HIPM when dealing with multiple variables: the mean square errors of the state variables being fitted are averaged. For instance, if both phytoplankton and detritus have a mean error of 0.3, the reMSE would be 0.3; if we then add iron with a mean error of 0.5, the overall reMSE is approximately 0.37, which is still under the cutoff for good fit models. We can imagine cases in which models that were not in the good fit category with two time-series then become good fit models with the addition of another time-series. Hence, the set of good fit models selected with three time-series entries will not necessarily be a subset of the set of good fit models selected with two of the three time-series previously mentioned. This puts an important emphasis on which measurements and observations are added to the search. The decision to add an extra time-series to the software should be highly influenced by which data has already been collected and used with HIPM. That being said, this could be an area where HIPM could be enhanced and a direction for further research. Indeed, it would be interesting to look at the output of HIPM if the reMSE was calculated by taking the maximum square error of all the variables being fitted, which would then make our assumption, that more time-series data equates to fewer models, valid.
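The effect of the aggregation rule can be sketched in a few lines. The per-variable errors below are hypothetical numbers chosen for illustration, and the two aggregation functions are stand-ins for the averaging described above and the proposed maximum; they are not HIPM's actual implementation.

```python
CUTOFF = 0.5  # good-fit threshold on reMSE used in the text

def aggregate_mean(errors):
    """Average the per-variable errors (the averaging rule described above)."""
    return sum(errors.values()) / len(errors)

def aggregate_max(errors):
    """Take the worst per-variable error (the alternative proposed above)."""
    return max(errors.values())

# Hypothetical per-variable errors for a single candidate model.
two_series = {"phytoplankton": 0.55, "detritus": 0.55}
three_series = dict(two_series, iron=0.3)

for label, errs in (("two series", two_series), ("three series", three_series)):
    mean_err, max_err = aggregate_mean(errs), aggregate_max(errs)
    print(label,
          "| mean:", round(mean_err, 3), "good fit" if mean_err < CUTOFF else "rejected",
          "| max:", round(max_err, 3), "good fit" if max_err < CUTOFF else "rejected")
```

Under averaging, this hypothetical model is rejected with two series but accepted once a well-fitted third series is added. Under the maximum, adding a series can never lower the aggregate error, so the good fit sets would be nested.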

The parameter selection process is a very important step of HIPM model selection; this statement was reinforced by the mathematical analysis, which made evident the sensitivity of the system to parameter values. Differences in parameter values could mean the difference between a population going to extinction or not. In the case of nitrate and iron I observed that specific parameters acted as bounds for these variables, which is useful information. Indeed, when looking at a model through mathematical analysis one can determine whether a parameter will have a significant effect on the overall dynamics of the system. Coupling this information with expert knowledge, it would then be possible to redefine the ranges set within HIPM for the parameter selection, which would in turn refine the search process. Scientists could then run the software once again, take the result and see if parameter ranges could be further refined. This can almost be seen as a cycle: deriving information on the parameters from the analytical analysis, which in turn helps improve the constraints given to HIPM, then repeating the process to see the improvement made to the type of model being selected. This in itself can be seen as a procedure that transcends the phytoplankton dynamics in the Ross Sea ecosystem and can be generalized to other systems.

Automated modeling is a successful method: the LAGRAMGE framework has been successfully applied in a real-world domain and selected models that performed really well (Atanasova et al. 2007). While this framework can evaluate only one state variable at a time, HIPM is capable of evaluating multiple variables simultaneously; however, we are faced with an under-constrained optimization problem, since we want to select models with data for only a couple of the variables. Using CIAO simulated data we were able to explore and investigate the response of HIPM for the phytoplankton dynamics in the Ross Sea. Even though we did not use real-life data, the results support two conclusions. First, more data is not synonymous with fewer models being selected. This conclusion must be tested to see if it can be generalized to other ecosystems or if it becomes obsolete with a refined and improved process-library. Secondly, the result that zooplankton contains more restrictive power than the other state variables was attained only through multiple experiments using a full data set. There is room for further work in the area of exploratory statistics with the median activation value in order to develop a formal procedure that would assist scientists in their decision making process for data collection.

The number of processes taken into consideration for the Ross Sea ecosystem makes for an extensive process library, which creates models with very intricate and complex structures. A direction for improvement would be to look at a measure of complexity for the models, motivated by the law of parsimony, which states that the simplest explanation is often the best. This could be coupled with a measure of distance between models; these two concepts could potentially be of great value when comparing different models. However, at this point the more plausible and logical steps for future research would be, first, to incorporate diatoms into the way phytoplankton is defined in the process library and, second, to switch from a mean square error to a maximum square error, which in my opinion would yield very different results. In the long run, this thesis could serve as the basis for a protocol to guide decision making in the data collection process.
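To illustrate why switching from a mean square error to a maximum square error could change which models are preferred, here is a minimal Python sketch. The two error functions are generic textbook definitions, not the exact measure of fit implemented in HIPM, and the observed and simulated series are invented for illustration.

```python
def mean_squared_error(observed, simulated):
    """Average of the squared residuals over all time points."""
    errors = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return sum(errors) / len(errors)


def max_squared_error(observed, simulated):
    """Largest squared residual over all time points."""
    return max((o - s) ** 2 for o, s in zip(observed, simulated))


# Hypothetical series: model A is slightly off everywhere,
# model B is exact except for one large miss.
observed = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 3.5, 4.5]
model_b = [1.0, 2.0, 3.0, 4.9]

for name, sim in (("A", model_a), ("B", model_b)):
    print(name, mean_squared_error(observed, sim), max_squared_error(observed, sim))
```

With these numbers the mean square error favors model B, while the maximum square error favors model A, because the latter penalizes a single large deviation regardless of how well the rest of the series is matched.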


REFERENCES 

[1] Arrigo, K. R., and C. R. McClain, “Spring phytoplankton production in the western Ross Sea”, Science, 266, 261-263, 1994.

[2] Arrigo, K. R., A. M. Weiss, and W. O. Smith Jr., “Physical forcing of phytoplankton dynamics in the southwestern Ross Sea”, J. Geophys. Res., 103, 1007-1021, 1998.

[3] Arrigo, K. R., G. R. DiTullio, R. B. Dunbar, M. P. Lizotte, D. H. Robinson, M. VanWoert, and D. L. Worthen, “Phytoplankton taxonomic variability and nutrient utilization and primary production in the Ross Sea”, J. Geophys. Res., 105, 8827-8846, 2000.

[4] Arrigo, K. R., D. Worthen & D. Robinson, “A coupled ocean-ecosystem model of the Ross Sea: 2. Iron regulation of phytoplankton taxonomic variability and primary production”, Journal of Geophysical Research, Vol. 108, No. C7, 3231, 2003.

[5] Atanasova, N., L. Todorovski, S. Dzeroski & B. Kompare, “Application of automated model discovery from data and expert knowledge to a real-world domain: Lake Glumsø”, Ecological Modelling, 212, 92-98, 2008.

[6] Borrett, S. R., W. Bridewell, P. Langley & K. Arrigo, “A method for representing and developing process models”, Ecological Complexity, 4, 1-12, 2007.

[7] Bridewell, W., P. Langley, S. Racunas & S. Borrett, “Learning process models with missing data”, Proceedings of the Seventeenth European Conference on Machine Learning, 557-565, 2006.

[8] Bridewell, W., P. Langley, L. Todorovski & S. Dzeroski, “Inductive Process Modeling”, Stanford University, Stanford, CA, 2007.


[9] Dzeroski, S. & L. Todorovski, “Discovering dynamics: from inductive logic programming to machine discovery”, Journal of Intelligent Information Systems, 4, 89-108, 1995.

[10] Dzeroski, S. & L. Todorovski, “Discovering dynamics”, Proceedings of the Tenth International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, pp. 97-103, 1993.

[11] Fayyad, U., D. Haussler & P. Stolorz, “KDD for science data analysis: Issues and examples”, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 50-56, Portland, OR: AAAI Press, 1996.

[12] Feng, W., X. Lu & R. Donovan, “Population Dynamics in a Model Territory Acquisition”, Discrete and Continuous Dynamical Systems, Added Volume, 156-165, 2001.

[13] Langley, P., J. Shrager, N. Asgharbeygi & S. Bay, “Inducing Explanatory Process Models from Biological Time Series”, Stanford University, Stanford, CA.

[14] Langley, P., Elements of Machine Learning, San Mateo, CA: Morgan Kaufmann, 1995.

[15] Langley, P., O. Shiran, J. Shrager, L. Todorovski & A. Pohorille, “Constructing explanatory process models from biological data and knowledge”, AI in Medicine, 37, 191-201, 2006.

[16] Ljung, L., “Modelling of industrial systems”, Proceedings of the Seventh International Symposium on Methodologies for Intelligent Systems, pp. 338-349, Berlin: Springer, 1993.

[17] Mitchell, T. M., Machine Learning, New York, NY: McGraw Hill, 1997.


[18] Oreskes, N., K. Shrader-Frechette & K. Belitz, “Verification, validation, and confirmation of numerical models in the earth sciences”, Science, vol. 263, pp. 641-646, 1994. [Reprinted in Transactions of the Computer Measurement Group, vol. 84, pp. 85-92.]

[19] Oreskes, N., “Why believe a computer? Models, measures, and meaning in the natural world”, in The Earth Around Us: Maintaining a Livable Planet, edited by Jill S. Schneiderman (San Francisco: W.H. Freeman and Co.), pp. 70-82, 2000.

[20] Tagliabue, A. & K. R. Arrigo, “Anomalously Low Zooplankton Abundance in the Ross Sea: An Alternative Explanation”, Limnology and Oceanography, Vol. 48, No. 2, pp. 686-699, 2003.

[21] Todorovski, L., S. Dzeroski & B. Kompare, “Modelling and prediction of phytoplankton growth with equation discovery”, Ecological Modelling, 113, 71-81, 1998.

[22] Todorovski, L., “Using domain knowledge for automated modeling of dynamic systems with equation discovery”, Doctoral dissertation, Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia, 2003.

[23] Todorovski, L., W. Bridewell, O. Shiran & P. Langley, “Inducing hierarchical process models in dynamic domains”, Proceedings of the Twentieth National Conference on Artificial Intelligence, 892-897, 2005.


APPENDIX 

A. Sample CIAO data - 1997 

JDAY  TEMP    DPML   AI    NITR  PHOS  SILC   IRON_nm  IRON_um    PARL   PHA      PHA_c   DIA      DIA_c   ZOO    DET      PURL   TP
229   -1.842  68.86  0.87  31    2.1   75.97  0.5052   0.0005052  3.204  0.02518  2.2662  0.02518  1.7626  1.999  0.02289  1.678  4.0288
230   -1.842  79     0.84  31    2.1   75.86  0.505    0.000505   7.401  0.02518  2.2662  0.02518  1.7626  1.999  0.02289  3.891  4.0288
231   -1.844  83.75  0.82  31    2.1   75.73  0.5054   0.0005054  5.875  0.02518  2.2662  0.02518  1.7626  1.999  0.02289  3.166  4.0288
232   -1.848  84.19  0.84  31    2.1   75.7   0.5062   0.0005062  8.494  0.02518  2.2662  0.02518  1.7626  1.999  0.02289  4.425  4.0288
233   -1.838  112.1  0.83  31    2.1   75.7   0.5069   0.0005069  10.76  0.02518  2.2662  0.02518  1.7626  1.999  0.02289  5.595  4.0288
234   -1.832  114.4  0.86  31    2.1   75.64  0.5073   0.0005073  16.48  0.02518  2.2662  0.02518  1.7626  1.999  0.02289  8.723  4.0288
235   -1.84   103    0.9   31    2.1   75.54  0.5083   0.0005083  19.51  0.02518  2.2662  0.02518  1.7626  1.999  0.02289  10.33  4.0288
236   -1.844  92.1   0.88  31    2.1   75.44  0.5087   0.0005087
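As an illustration of how such a whitespace-separated sample could be read into named columns, here is a minimal Python sketch. The function name `parse_rows` is hypothetical and is not part of HIPM or CIAO; the header and the two data rows are copied from the sample above, and rows that do not contain a value for every column (such as the truncated last row) are skipped.

```python
HEADER = ("JDAY TEMP DPML AI NITR PHOS SILC IRON_nm IRON_um PARL "
          "PHA PHA_c DIA DIA_c ZOO DET PURL TP").split()

SAMPLE = """\
229 -1.842 68.86 0.87 31 2.1 75.97 0.5052 0.0005052 3.204 0.02518 2.2662 0.02518 1.7626 1.999 0.02289 1.678 4.0288
230 -1.842 79 0.84 31 2.1 75.86 0.505 0.000505 7.401 0.02518 2.2662 0.02518 1.7626 1.999 0.02289 3.891 4.0288
"""


def parse_rows(text, header=HEADER):
    """Return one dict per complete row, mapping column name to float."""
    rows = []
    for line in text.splitlines():
        values = line.split()
        if len(values) != len(header):
            continue  # skip blank or truncated rows
        rows.append(dict(zip(header, map(float, values))))
    return rows


rows = parse_rows(SAMPLE)
nitrate = [row["NITR"] for row in rows]
print(len(rows), nitrate)
```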


B. Full entity Specification File 

#!/usr/bin/python
"""
This is the revised file for entity specification
Stuart Borrett
April 26, 2007
"""
# import library
from ross_lib import *

# observed primary producer
p1 = entity_instance(pe, "phyto",
    {"conc": ("system", "PHA_c", (0, 600)),  # ugC/L
     "growth_rate": ("system", 0, (0, 1)),
     "growth_lim": ("system", 1, (0, 1))},
    {"max_growth": 0.59,
     "exude_rate": 0.19,
     "Ek_max": 30,
     "biomin": 0.025,
     "PhotoInhib": 200}
)

# unobserved grazer with initial value from [0,1] default 0.1
Z1 = entity_instance(ze, "zoo",
    {"conc": ("system", 0.1, (0.10, 510)),
     "growth_rate": ("system", 0.1, (0, 1))},
    {"assim_eff": 0.75,
     "respiration_rate": 0.019,
     "gmax": 0.4,
     "gcap": 200}
)

# observed nitrate
no3 = entity_instance(no3, "NO3",
    {"conc": ("system", "NITR", (0, 32)),
     "mixing_rate": ("system", 0, (0, 1))},
    None
)

# unobserved iron
fe = entity_instance(fe, "Fe",
    {"conc": ("system", .00042920, (0, 0.001)),
     "mixing_rate": ("system", 0, (0, 1))},
    None
)

# observed/exogenous ENVIRONMENT
e1 = entity_instance(ee, "environment",
    {"PUR": ("exogenous", "PURL", None),
     "TH2O": ("exogenous", "TEMP", None),
     "ice": ("exogenous", "AI", None)},
    {"beta": 0.7}
)

# unobservable detritus with initial value from [0,1] default 0.1
D1 = entity_instance(de, "detritus",
    {"conc": ("system", 0.1, (0.001, 210))},
    None
)


C. Full Ross Sea generic model library

#!/usr/bin/python
"""
This generic model library supports the construction
of an ecosystem model of the Ross Sea.
It is hierarchical in processes, but the entities are flat.
This version is updated and corrected.
It is designed for use with the sensitivity analysis experiments.
"""
from library import *
from entities import *
from processes import *

lib = library("aquatic_ecosystem")

# -----------------------------------------------------------------------
# -----------------------------------------------------------------------
# GENERIC ENTITIES
# id, variables, constant parameters
# -----------------------------------------------------------------------

# --- PHYTOPLANKTON ---
pe = lib.add_generic_entity("P",
    {"conc": "sum",
     "growth_rate": "prod",
     "growth_lim": "min"},
    {"max_growth": (0.4, 0.8),
     "exude_rate": (0.001, 0.2),
     "death_rate": (0.02, 0.04),
     "Ek_max": (1, 100),
     "sinking_rate": (0.0001, 0.25),
     "biomin": (0.02, 0.04),
     "PhotoInhib": (200, 1500),
    }
)

# --- ZOOPLANKTON ---
ze = lib.add_generic_entity("Z",
    {"conc": "sum",
     "grazing_rate": "prod"},
    {"assim_eff": (0.05, 0.4),
     "death_rate": (0.001, 0.3),
     "respiration_rate": (0.01, 0.04),
     "sinking_rate": (0.001, 0.25),
     "gmax": (0.3, 0.5),
     "glim": (19, 21),
     "gcap": (199, 301)}
)

# --- NUTRIENTS ---
# nitrate
no3 = lib.add_generic_entity("Nitrate",
    {"conc": "sum",
     "mixing_rate": "sum"},
    {"toCratio": (6.6, 6.7),
     "avg_deep_conc": (31, 32)}
)

# iron
fe = lib.add_generic_entity("Iron",
    {"conc": "sum",
     "mixing_rate": "sum"},
    {"toCratio": (3000, 450000),
     "avg_deep_conc": (0.00035, 0.00045)}
)

# --- DETRITUS ---
de = lib.add_generic_entity("D",
    {"conc": "sum"},
    {"remin_rate": (0.03, 0.04),
     "sinking_rate": (0.00001, 0.1)}
)

# --- ENVIRONMENT ---
ee = lib.add_generic_entity("E",
    {"TH2O": "sum",
     "PUR": "sum",
     "ice": "sum"},
    {"beta": (0.001, 1),
    }
)

# -----------------------------------------------------------------------
# -----------------------------------------------------------------------
# GENERIC PROCESSES:
# id, type, entities related, list of subprocesses,
# constant parameters, equations
# -----------------------------------------------------------------------

# --- GROWTH ---
lib.add_generic_process(
    "growth", "",
    [("P", [pe], 1, 1), ("N", [no3, fe], 1, 100), ("D", [de], 1, 1), ("E", [ee], 1, 1)],
    [("limited_growth", ["P", "N", "E"], 0),
     ("exudation", ["P"], 1),
     ("nutrient_uptake", ["P", "N"], 0)],
    {},
    {},
    {"P.conc": "P.growth_rate * P.conc"}
)

lib.add_generic_process(
    "exudation", "exudation",
    [("P", [pe], 1, 1)],
    [],
    {},
    {},
    {"P.conc": "-1 * P.exude_rate * P.growth_rate * P.conc"}
)

lib.add_generic_process(
    "nutrient_uptake", "nutrient_uptake",
    [("P", [pe], 1, 1), ("N", [no3, fe], 1, 1)],
    [],
    {},
    {},
    {"N.conc": "-1 * 1/( N.toCratio * 12.0107)"
               " * P.growth_rate * P.conc"}
)

lib.add_generic_process(
    "limited_growth", "limited_growth",
    [("P", [pe], 1, 1), ("N", [no3, fe], 1, 100), ("E", [ee], 1, 1)],
    [("light_lim", ["P", "E"], 0), ("nutrient_lim", ["P", "N"], 0)],
    {},
    {"P.growth_rate": "(1-E.ice) * P.max_growth"
                      " * exp(0.06933 * E.TH2O) * P.growth_lim"},
    {}
)

# ------ P.growth_lim --
# There are multiple factors (and formulations of factors)
# that might limit growth.
# In this library nutrient and light limitations are combined
# into P.growth_lim using a minimum function
# so that only one operates at a time (i.e., they are substitutable).
# The disadvantage of this encoding is that it will not be possible
# to determine which factor is operating at a given time.
# Temperature is a multiplicative control factor encoded in the
# P.growth_rate equation, and in the present library we do not
# consider alternative temperature effect functions.

# -- light lim --
lib.add_generic_process(
    "arrigoetal1998", "light_lim",
    [("P", [pe], 1, 1), ("E", [ee], 1, 1)],
    [],
    {"a": (5, 15)},
    {"P.growth_lim": "(1 - exp(-E.PUR / (P.Ek_max / (1 + a"
                     " * exp(E.PUR * exp(1.089 - 2.12 * log10(P.Ek_max)))))))"},
    {}
)

lib.add_generic_process(
    "arrigoetal1998_w_photoinhibition", "light_lim",
    [("P", [pe], 1, 1), ("E", [ee], 1, 1)],
    [],
    {"a": (5, 15)},
    {"P.growth_lim": "(1 - exp(-E.PUR / (P.Ek_max / (1 + a"
                     " * exp(E.PUR * exp(1.089 - 2.12 * log10(P.Ek_max)))))))"
                     " * exp(-1 * E.PUR /P.PhotoInhib)"},
    {}
)

# -- nutrient lim --
lib.add_generic_process(
    "monod_lim", "nutrient_lim",
    [("P", [pe], 1, 1), ("N", [no3, fe], 1, 1)],
    [],
    {"k": (0.000001, 0.001)},
    {"P.growth_lim": "N.conc / (N.conc + k)"},
    {}
)

'''
lib.add_generic_process(
    "ratio_lim", "nutrient_lim",
    [("P", [pe], 1, 1), ("N", [no3, fe], 1, 1)],
    [],
    {"k": (0.000001, 1)},
    {"P.growth_lim": "N.conc / (N.conc + k * P.conc)"},
    {}
)
'''

lib.add_generic_process(
    "monod_2nd", "nutrient_lim",
    [("P", [pe], 1, 1), ("N", [no3, fe], 1, 1)],
    [],
    {"k": (0.000001, 0.001)},
    {"P.growth_lim": "pow(N.conc,2) / (pow(N.conc,2) + k)"},
    {}
)

lib.add_generic_process(
    "nut_lim_exp", "nutrient_lim",
    [("P", [pe], 1, 1), ("N", [no3, fe], 1, 1)],
    [],
    {"k": (0.000001, 1)},
    {"P.growth_lim": "1-exp(-1* k * N.conc)"},
    {}
)

# --- DEATH ---
lib.add_generic_process(
    "death_exp", "",
    [("S", [pe, ze], 1, 1), ("D", [de], 1, 1), ("E", [ee], 1, 1)],
    [],
    {},
    {},
    {"S.conc": "-1 * S.death_rate * S.conc",
     "D.conc": "(1-E.beta) * S.death_rate * S.conc"},
)

# --- REMINERALIZATION ---
lib.add_generic_process(
    "remineralization", "",
    [("D", [de], 1, 1), ("N", [fe, no3], 1, 3)],
    [("nutrient_remineralization", ["D", "N"], 0)],
    {},
    {},
    {"D.conc": "-1 * D.remin_rate * D.conc"}
)

lib.add_generic_process(
    "nutrient_remineralization", "",
    [("D", [de], 1, 1), ("N", [fe], 1, 1)],
    [],
    {},
    {},
    {"N.conc": "1/(N.toCratio * 12.0107) * D.remin_rate * D.conc"}
)

# --- RESPIRATION ---
lib.add_generic_process(
    "respiration", "",
    [("Z", [ze], 1, 1)],
    [],
    {},
    {},
    {"Z.conc": "-1 * Z.respiration_rate * Z.conc"}
)

# --- SINKING ---
lib.add_generic_process(
    "sinking", "",
    [("V", [pe, ze, de], 1, 1)],
    [],
    {},
    {},
    {"V.conc": "-1 * V.sinking_rate * V.conc"}
)

# --- GRAZING ---
lib.add_generic_process(
    "holling_type_1", "graze_rate",
    [("Z", [ze], 1, 1), ("P", [pe], 1, 1)],
    [],
    {},
    {"Z.grazing_rate": "Z.gmax * P.conc"},
    {}
)

lib.add_generic_process(
    "holling_type_2", "graze_rate",
    [("Z", [ze], 1, 1), ("P", [pe], 1, 1)],
    [],
    {},
    {"Z.grazing_rate": "max(0,Z.gmax * P.conc / (Z.gcap + P.conc))"},
    {}
)

lib.add_generic_process(
    "holling_type_2_mod", "graze_rate",
    [("Z", [ze], 1, 1), ("P", [pe], 1, 1)],
    [],
    {},
    {"Z.grazing_rate": "max(0,(Z.gmax * (P.conc - P.biomin - Z.glim)"
                       " / (Z.gcap + (P.conc - P.biomin - Z.glim))))"},
    {}
)

lib.add_generic_process(
    "ivlev", "graze_rate",
    [("Z", [ze], 1, 1), ("P", [pe], 1, 1)],
    [],
    {"delta": (0.01, 0.5)},
    {"Z.grazing_rate": "max(0,Z.gmax * (1 - exp(-1 * delta * P.conc)))"},
    {}
)

lib.add_generic_process(
    "grazing", "grazing",
    [("Z", [ze], 1, 1), ("P", [pe], 0, 1), ("D", [de], 0, 1), ("E", [ee], 0, 1)],
    [("graze_rate", ["Z", "P"], 0)],
    {},
    {},
    {"Z.conc": "Z.assim_eff * Z.grazing_rate * Z.conc",
     "P.conc": "-1 * Z.grazing_rate * Z.conc",
     "D.conc": "(1-E.beta) * (1-Z.assim_eff) * Z.grazing_rate * Z.conc"}
)

# --- Nutrient Mixing ------------------------------------------
# this process represents an input of nutrients (nitrate)
# due to mixing or upwelling.
lib.add_generic_process(
    "nutrient_mixing", "",
    [("N", [no3, fe], 1, 1), ("E", [ee], 1, 1)],
    [("mixing_rate", ["N", "E"], 0)],
    {},
    {},
    {"N.conc": "(N.avg_deep_conc - N.conc) * N.mixing_rate"}
)

lib.add_generic_process(
    "linear_temp_control", "mixing_rate",
    [("N", [no3, fe], 1, 1), ("E", [ee], 1, 1)],
    [],
    {"max_mixing_rate": (0.000001, 1)},
    {"N.mixing_rate": "max_mixing_rate"
                      " *(datamax(E.TH2O)-E.TH2O)/(datamax(E.TH2O)-datamin(E.TH2O))"},
    {},
)

# --- ROOT ---
lib.add_generic_process("root", "",
    [("Z", [ze], 0, 1), ("P", [pe], 1, 2),
     ("N", [no3, fe], 2, 2), ("D", [de], 1, 1), ("E", [ee], 1, 1)],
    [("growth", ["P", "N", "D", "E"], 0),
     ("death_exp", ["P", "D", "E"], 1),
     ("death_exp", ["Z", "D", "E"], 1),
     ("grazing", ["Z", "P", "D", "E"], 0),
     ("remineralization", ["D", "N"], 0),
     ("respiration", ["Z"], 1),
     ("sinking", ["P"], 1),
     ("sinking", ["D"], 1),
     ("nutrient_mixing", ["N", "E"], 1),
    ],
    {}, {}, {}
)
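The way the library combines limitation factors with a minimum function can be illustrated numerically. The following Python sketch re-implements three of the `P.growth_lim` expressions from the listing above as plain functions; it is not HIPM code, and the function `factors_for` is a hypothetical helper. The values of `Ek_max` (30) and the iron concentration come from the entity specification in Appendix B, the nitrate and PUR values from the CIAO sample in Appendix A, and the constants `a` and `k` are arbitrary choices inside the ranges given in the library.

```python
import math


def monod_lim(conc, k):
    """Monod form: N.conc / (N.conc + k)."""
    return conc / (conc + k)


def nut_lim_exp(conc, k):
    """Exponential form: 1 - exp(-k * N.conc)."""
    return 1.0 - math.exp(-k * conc)


def light_lim(pur, ek_max, a):
    """Light limitation without photoinhibition (arrigoetal1998)."""
    ek = ek_max / (1.0 + a * math.exp(pur * math.exp(1.089 - 2.12 * math.log10(ek_max))))
    return 1.0 - math.exp(-pur / ek)


def growth_lim(factors):
    """Return the name and value of the smallest limitation factor."""
    name = min(factors, key=factors.get)
    return name, factors[name]


def factors_for(pur):
    # a and k are assumed values chosen within the library ranges.
    return {
        "light": light_lim(pur, ek_max=30.0, a=10.0),
        "nitrate": nut_lim_exp(31.0, k=0.5),
        "iron": monod_lim(0.0004292, k=0.0001),
    }


for pur in (1.678, 10.33):  # PURL on days 229 and 235 of the sample
    print(pur, growth_lim(factors_for(pur)))
```

Unlike the library encoding, this sketch also reports which factor attains the minimum, which is one way the limitation noted in the comment above could be examined outside of HIPM.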


D. Models selected in both experiments 8 and 19

Model D 

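\[
\begin{aligned}
\frac{dP}{dt} &= \Big[\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]\,M(t)\,(1 - a_6) - a_9 - a_{17}\Big]P - \big(a_{13}P^{a_{14}}\big)Z \\
\frac{dZ}{dt} &= a_{12}\big(a_{13}P^{a_{14}}\big)Z - (a_{11}Z + a_{16})Z \\
\frac{dD}{dt} &= (1 - a_{10})(a_9 P + a_{11}Z^2) + (1 - a_{10})(1 - a_{12})\big(a_{13}P^{a_{14}}\big)Z - D(a_{15} + a_{18}) \\
\frac{dN}{dt} &= (a_{19} - N)\,a_{20}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_7 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{15}D}{a_7 \cdot 12.0107}\right] \\
\frac{dF}{dt} &= (a_{21} - F)\,a_{22}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_8 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{15}D}{a_7 \cdot 12.0107}\right] \\
M(t) &= \min\left\{\frac{F}{F + a_5},\; 1 - e^{-a_4 N},\; e^{-E_{PUR}(t)/a_2}\left(1 - e^{-E_{PUR}(t)\left(1 + a_3\,e^{E_{PUR}(t)\,e^{1.089 - 2.12\log_{10}(a_1)}}\right)/a_1}\right)\right\}
\end{aligned}
\]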


Model E 

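\[
\begin{aligned}
\frac{dP}{dt} &= \Big[\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]\,M(t)\,(1 - a_5) - a_8 - a_{18}\Big]P - \max\left\{0,\; \frac{a_{12}(P - a_{15} - a_{14})}{a_{13} + P - a_{15} - a_{14}}\right\}Z \\
\frac{dZ}{dt} &= a_{11}\max\left\{0,\; \frac{a_{12}(P - a_{15} - a_{14})}{a_{13} + P - a_{15} - a_{14}}\right\}Z - (a_{10}Z + a_{17})Z \\
\frac{dD}{dt} &= (1 - a_9)(a_8 P + a_{10}Z^2) + (1 - a_9)(1 - a_{11})\max\left\{0,\; \frac{a_{12}(P - a_{15} - a_{14})}{a_{13} + P - a_{15} - a_{14}}\right\}Z - D(a_{16} + a_{19}) \\
\frac{dN}{dt} &= (a_{20} - N)\,a_{21}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_6 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{16}D}{a_6 \cdot 12.0107}\right] \\
\frac{dF}{dt} &= (a_{22} - F)\,a_{23}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_7 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{16}D}{a_7 \cdot 12.0107}\right] \\
M(t) &= \min\left\{\frac{F}{F + a_4},\; 1 - e^{-a_3 N},\; 1 - e^{-E_{PUR}(t)\left(1 + a_2\,e^{E_{PUR}(t)\,e^{1.089 - 2.12\log_{10}(a_1)}}\right)/a_1}\right\}
\end{aligned}
\]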


E. Models selected in both experiments 8 and 21

Model F 

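\[
\begin{aligned}
\frac{dP}{dt} &= \Big[\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]\,M(t)\,(1 - a_6) - a_9 - a_{17}\Big]P - \left(\frac{a_{13}P}{1 + a_{13}a_{14}P}\right)Z \\
\frac{dZ}{dt} &= a_{12}\left(\frac{a_{13}P}{1 + a_{13}a_{14}P}\right)Z - (a_{11} + a_{16})Z \\
\frac{dD}{dt} &= (1 - a_{10})(a_9 P + a_{11}Z) + (1 - a_{10})(1 - a_{12})\left(\frac{a_{13}P}{1 + a_{13}a_{14}P}\right)Z - D(a_{15} + a_{18}) \\
\frac{dN}{dt} &= (a_{19} - N)\,a_{20}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_7 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{15}D}{a_7 \cdot 12.0107}\right] \\
\frac{dF}{dt} &= (a_{21} - F)\,a_{22}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_8 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{15}D}{a_7 \cdot 12.0107}\right] \\
M(t) &= \min\left\{\frac{F}{F + a_5},\; \frac{N}{N + a_4},\; \big[\text{light-limitation term in } E_{PUR}(t) \text{ and } a_1 \text{, not recoverable from the source}\big]\right\}
\end{aligned}
\]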


Model G 

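\[
\begin{aligned}
\frac{dP}{dt} &= \Big[\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]\,M(t)\,(1 - a_6) - a_9 - a_{17}\Big]P - \left(\frac{a_{13}P^2}{1 + a_{13}a_{14}P^2}\right)Z \\
\frac{dZ}{dt} &= a_{12}\left(\frac{a_{13}P^2}{1 + a_{13}a_{14}P^2}\right)Z - (a_{11} + a_{16})Z \\
\frac{dD}{dt} &= (1 - a_{10})(a_9 P + a_{11}Z) + (1 - a_{10})(1 - a_{12})\left(\frac{a_{13}P^2}{1 + a_{13}a_{14}P^2}\right)Z - D(a_{15} + a_{18}) \\
\frac{dN}{dt} &= (a_{19} - N)\,a_{20}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_7 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{15}D}{a_7 \cdot 12.0107}\right] \\
\frac{dF}{dt} &= (a_{21} - F)\,a_{22}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_8 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{15}D}{a_7 \cdot 12.0107}\right] \\
M(t) &= \min\left\{\frac{F}{F + a_5},\; \frac{N}{N + a_4},\; \big[\text{light-limitation term in } E_{PUR}(t) \text{ and } a_1 \text{, not recoverable from the source}\big]\right\}
\end{aligned}
\]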


Model H 

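\[
\begin{aligned}
\frac{dP}{dt} &= \Big[\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]\,M(t)\,(1 - a_6) - a_9 - a_{17}\Big]P - \big(a_{13}(1 - e^{-a_{14}P})\big)Z \\
\frac{dZ}{dt} &= a_{12}\big(a_{13}(1 - e^{-a_{14}P})\big)Z - (a_{11} + a_{16})Z \\
\frac{dD}{dt} &= (1 - a_{10})(a_9 P + a_{11}Z) + (1 - a_{10})(1 - a_{12})\big(a_{13}(1 - e^{-a_{14}P})\big)Z - D(a_{15} + a_{18}) \\
\frac{dN}{dt} &= (a_{19} - N)\,a_{20}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_7 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{15}D}{a_7 \cdot 12.0107}\right] \\
\frac{dF}{dt} &= (a_{21} - F)\,a_{22}\left[\frac{E_{TH_2O}^{\max} - E_{TH_2O}(t)}{E_{TH_2O}^{\max}}\right] - \left[\frac{P}{a_8 \cdot 12.0107}\right]\big[(1 - E_{ice}(t))\,a_0\,e^{0.06933\,E_{TH_2O}(t)}\big]M(t) + \left[\frac{a_{15}D}{a_7 \cdot 12.0107}\right] \\
M(t) &= \min\left\{\frac{F}{F + a_5},\; \frac{N}{N + a_4},\; \big[\text{light-limitation term in } E_{PUR}(t) \text{ and } a_1 \text{, not recoverable from the source}\big]\right\}
\end{aligned}
\]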
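Models F, G and H differ only in the grazing term that multiplies Z. As a rough illustration, the Python sketch below evaluates the three terms side by side. The function names are hypothetical, and the parameter values for a13 and a14 are invented for illustration and are not fitted values from the experiments. The forms correspond to what is commonly called a Holling type II response (Model F), a Holling type III response (Model G) and an Ivlev response (Model H).

```python
import math


def grazing_model_f(p, a13, a14):
    """Model F term: a13 * P / (1 + a13 * a14 * P)."""
    return a13 * p / (1.0 + a13 * a14 * p)


def grazing_model_g(p, a13, a14):
    """Model G term: a13 * P^2 / (1 + a13 * a14 * P^2)."""
    return a13 * p ** 2 / (1.0 + a13 * a14 * p ** 2)


def grazing_model_h(p, a13, a14):
    """Model H term: a13 * (1 - exp(-a14 * P))."""
    return a13 * (1.0 - math.exp(-a14 * p))


A13, A14 = 0.4, 0.05  # hypothetical parameter values

for p in (0.0, 5.0, 50.0, 500.0):
    print(p,
          grazing_model_f(p, A13, A14),
          grazing_model_g(p, A13, A14),
          grazing_model_h(p, A13, A14))
```

All three terms vanish at P = 0 and saturate for large P. In the forms of Models F and G the saturation level is 1/a14, whereas in the form of Model H it is a13, so the same parameter indices play different roles across the models.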


BIOGRAPHICAL SKETCH 

I was born and raised in France and came to the United States in 2006 to further my education. I saw in this an incredible opportunity not only to explore my father’s origins but also to set out on a journey that promised to be full of learning experiences. I used to be terrible at math. If you had told me in high school that I would study math later in life, I probably would have laughed. But sure enough, I completed my undergraduate degree in Applied Mathematics at the University of North Carolina Wilmington in 2010. For the past year and a half I have conducted research under Dr. Borrett on Inductive Process Modeling. I am now looking at the possibility of traveling and working for a non-profit Christian organization that works with orphanages around the world. I have a heart for service and helping others. I trust that God will use the skills that I have acquired during my Master’s where he sees fit.
