29.12.2014 Views

Quantitative effects of composting state variables on ... - ResearchGate


1246 W. Sun et al. / Science of the Total Environment 409 (2011) 1243–1254

3.2. Dataset

All observed data from the six food waste composting runs (198 samples) were collected and combined to form the whole dataset. Each sample was a combination of inputs (i.e. state variables and SCA's inherent parameters) and the C/N ratio. The state variables, which could potentially affect the variation of the C/N ratio, were time ($X_1$), mean temperature ($X_2$), pH ($X_3$), moisture content ($X_4$), ash content ($X_5$), organic content ($X_6$), NH$_4^+$-N concentration ($X_7$), cumulative NH$_3$ emissions ($X_8$), log colony count of thermophilic bacteria ($X_9$), log colony count of mesophilic bacteria ($X_{10}$), upper temperature ($X_{11}$), and lower temperature ($X_{12}$). Although $X_4$, $X_5$ and $X_6$ sum to one, all three state variables were retained because it was difficult to exclude any one of them from the candidates on subjective grounds. SCA's inherent parameters were $\alpha_{\mathrm{cut}}$ (significance level for cutting clusters), $\alpha_{\mathrm{merge}}$ (significance level for merging clusters), and $N_{\min}$ (the minimum number of samples within tip clusters). The C/N ratio was the output of interest. The entire dataset was randomly divided into a training set (167 samples, 84%) and a test set (31 samples, 16%). The training set was used to calibrate the GASCA (or SCA) forecasting trees, and the test set was used for verification on independent samples from different reactors. Since the magnitudes of the state variables differed significantly from each other, we first converted all inputs into the range [−1, 1] before the training process and then transformed the predicted values of the dependent variable back to their original range after the training was completed.
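The split and rescaling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the data arrays are random placeholders, the helper names (`fit_minmax`, `to_unit_range`, `from_unit_range`) are hypothetical, and taking the per-variable minimum and maximum from the training set only is an assumption, since the text does not say which samples define the range.

```python
import numpy as np

def fit_minmax(a):
    """Return the per-column minimum and maximum of a 2-D array."""
    a = np.asarray(a, dtype=float)
    return a.min(axis=0), a.max(axis=0)

def to_unit_range(a, lo, hi):
    """Map values linearly so that lo -> -1 and hi -> +1."""
    return 2.0 * (np.asarray(a, dtype=float) - lo) / (hi - lo) - 1.0

def from_unit_range(s, lo, hi):
    """Inverse of to_unit_range: map [-1, 1] back to [lo, hi]."""
    return (np.asarray(s, dtype=float) + 1.0) * (hi - lo) / 2.0 + lo

rng = np.random.default_rng(0)

# Placeholder data: 198 samples, 12 state variables of very different scale,
# and a placeholder C/N ratio. Real measurements would replace these.
X = rng.normal(size=(198, 12)) * np.arange(1, 13)
y = rng.uniform(10.0, 30.0, size=198)

# Random split into 167 training samples and 31 test samples.
idx = rng.permutation(198)
train_idx, test_idx = idx[:167], idx[167:]

# Assumption: scaling bounds come from the training set only.
x_lo, x_hi = fit_minmax(X[train_idx])
X_train = to_unit_range(X[train_idx], x_lo, x_hi)
X_test = to_unit_range(X[test_idx], x_lo, x_hi)

# Predictions made on the scaled output are mapped back the same way.
y_lo, y_hi = y[train_idx].min(), y[train_idx].max()
y_train_scaled = to_unit_range(y[train_idx], y_lo, y_hi)
y_back = from_unit_range(y_train_scaled, y_lo, y_hi)
```

With training-set bounds, test values can fall slightly outside [−1, 1]; computing the bounds on the whole dataset would avoid that at the cost of using test information during calibration.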

3.3. Principle of stepwise cluster analysis (SCA)

SCA is based on the theory of multivariate analysis of variance. Its implementation generally involves two phases: training and test. In the training phase, the sample dataset is repeatedly cut into two subsets (clusters), and pairs of clusters are merged into one, during the iterative training process. According to Wilks' likelihood-ratio criterion, a cutting point is optimal only if the value of Wilks' $\Lambda$ is at its minimum (Wilks, 1962). Since $\Lambda$ is directly related to the F statistic, the sample means of the two sub-clusters can be compared for significant differences through an F test (Rao, 1952). Thus, the criteria for cutting or merging clusters rely on a set of F tests (Rao, 1965; Tatsuoka, 1971). Based on the F tests, a tree is shaped step by step until no cluster can be cut or merged any further. In the test phase, the values of the independent variables in each test sample are used to determine which child cluster the sample enters from its parent cluster, so that the sample gradually moves down to one tip cluster. The mean of the dependent variable over the training samples in that tip cluster becomes the predicted value of the sample's dependent variable. A more detailed description of SCA can be found in our previous work (Qin et al., 2008).
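To make the cutting criterion concrete, the sketch below searches for the cut that minimizes Wilks' $\Lambda$ when there is a single dependent variable (as with the C/N ratio here), in which case $\Lambda$ reduces to the within-group sum of squares divided by the total sum of squares. It assumes that candidate cuts are obtained by sorting the samples along one independent variable, and it uses the standard two-group relation $F = \frac{1-\Lambda}{\Lambda}\cdot\frac{n-d-1}{d}$ with $d$ dependent variables. The function names are hypothetical, and this is an illustration of the principle rather than the full SCA procedure.

```python
import numpy as np

def wilks_lambda(y_left, y_right):
    """Wilks' Lambda for a two-way split of one dependent variable:
    within-group sum of squares over total sum of squares."""
    y_all = np.concatenate([y_left, y_right])
    within = ((y_left - y_left.mean()) ** 2).sum() \
        + ((y_right - y_right.mean()) ** 2).sum()
    total = ((y_all - y_all.mean()) ** 2).sum()
    if total == 0.0:
        return 1.0  # all values identical: the split explains nothing
    return within / total

def f_statistic(lam, n_total, d=1):
    """F statistic for two groups, with (d, n_total - d - 1) degrees of freedom."""
    if lam == 0.0:
        return float("inf")
    return (1.0 - lam) / lam * (n_total - d - 1) / d

def best_cut(x, y):
    """Sort samples by x and return (k, Lambda) for the cut that puts the
    first k sorted samples in one sub-cluster and minimizes Lambda."""
    order = np.argsort(x)
    y_sorted = np.asarray(y, dtype=float)[order]
    best_k, best_lam = None, np.inf
    for k in range(1, len(y_sorted)):
        lam = wilks_lambda(y_sorted[:k], y_sorted[k:])
        if lam < best_lam:
            best_k, best_lam = k, lam
    return best_k, best_lam

# Toy example: the dependent variable jumps between the 3rd and 4th sample.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
k, lam = best_cut(x, y)
F = f_statistic(lam, len(y))
```

In a full implementation the cut would be accepted only if `F` exceeds the critical value of the F distribution at the significance level $\alpha_{\mathrm{cut}}$; that comparison is left out here to keep the sketch dependency-free.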

3.4. Variable/parameter selection using GA

An integrated GASCA model combining GA with SCA would possess abilities in both variable searching and nonlinear fitting. GA passes chromosome information (the selected independent variables and inherent parameters) to SCA, and SCA then returns the fitness (an index reflecting both the structure and the performance of the SCA tree) to the corresponding chromosome in GA. In detail, the chromosome is a 19-bit binary string. The first twelve bits encode SCA's candidate independent variables, where '1' means that the corresponding independent variable is selected and '0' means that it is not. The last seven bits represent the inherent parameters of SCA ($\alpha_{\mathrm{cut}}$, $\alpha_{\mathrm{merge}}$ and $N_{\min}$). To unify the calculation, we chose several representative values within the appropriate range of each of the three parameters and used binary strings to represent the position of each value within its range. Thus the binary strings are capable of describing real-number values of SCA's inherent parameters. The mapping relationships between the binary string and the variables/parameters are illustrated in Fig. 3.
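The encoding can be illustrated with a small decoder. This is a sketch under stated assumptions: the 3/2/2 split of the last seven bits follows the labels C1–C3, M1–M2 and N1–N2 in Fig. 3, the bits are read most-significant first, and the candidate value lists are invented placeholders because the text does not give the representative values actually used.

```python
from typing import List, NamedTuple

# Hypothetical representative values (NOT from the source).
ALPHA_CUT_VALUES = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.15, 0.2]  # 3 bits -> 8 slots
ALPHA_MERGE_VALUES = [0.01, 0.05, 0.1, 0.2]                          # 2 bits -> 4 slots
N_MIN_VALUES = [3, 5, 8, 10]                                         # 2 bits -> 4 slots

class Decoded(NamedTuple):
    selected: List[str]   # names of the selected state variables
    alpha_cut: float
    alpha_merge: float
    n_min: int

def bits_to_index(bits):
    """Interpret a list of 0/1 values as an unsigned integer, MSB first."""
    index = 0
    for b in bits:
        index = index * 2 + b
    return index

def decode(chromosome):
    """Split a 19-bit chromosome into a variable mask and three parameter values."""
    if len(chromosome) != 19:
        raise ValueError("expected a 19-bit chromosome")
    selected = [f"X{i + 1}" for i, b in enumerate(chromosome[:12]) if b == 1]
    alpha_cut = ALPHA_CUT_VALUES[bits_to_index(chromosome[12:15])]
    alpha_merge = ALPHA_MERGE_VALUES[bits_to_index(chromosome[15:17])]
    n_min = N_MIN_VALUES[bits_to_index(chromosome[17:19])]
    return Decoded(selected, alpha_cut, alpha_merge, n_min)

# The 19-bit example vector shown in the "Detailed form" part of Fig. 3.
example = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
decoded = decode(example)
```

Storing an index into a short list of candidate values, rather than a raw number, keeps every bit pattern valid after crossover or mutation.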

An index based on both SCA's fitting accuracy and the SCA tree's complexity was developed as the fitness function. The fitting accuracy is defined as the RMSE on the training set. The complexity of the SCA tree is expressed as the sum of the normalized number of layers of the SCA tree ($N_{s,\mathrm{layer}}$), the normalized number of nodes of the SCA tree ($N_{s,\mathrm{node}}$) and the normalized number of independent variables ($N_{s,x}$). Four weighting coefficients ($\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$) were introduced to balance accuracy against complexity. Thus the fitness function has the following definition:

$$\mathrm{Fitness} = \mu_1\,\mathrm{RMSE}_s + \mu_2 N_{s,\mathrm{layer}} + \mu_3 N_{s,\mathrm{node}} + \mu_4 N_{s,x} \tag{1}$$

$$\mathrm{RMSE}_s = \frac{\sqrt{\dfrac{\sum_{j=1}^{n} \left(y_j - y_{\mathrm{training},j}\right)^2}{n-1}} - \mathrm{RMSE}_{\min}}{\mathrm{RMSE}_{\max} - \mathrm{RMSE}_{\min}} \tag{2}$$

$$N_{s,\mathrm{layer}} = \frac{N_{\mathrm{layer}} - N_{\mathrm{layer},\min}}{N_{\mathrm{layer},\max} - N_{\mathrm{layer},\min}} \tag{3}$$

$$N_{s,\mathrm{node}} = \frac{N_{\mathrm{node}} - N_{\mathrm{node},\min}}{N_{\mathrm{node},\max} - N_{\mathrm{node},\min}} \tag{4}$$

$$N_{s,x} = \frac{N_x - N_{x,\min}}{N_{x,\max} - N_{x,\min}} \tag{5}$$

where $\mathrm{RMSE}_{\min}$ and $\mathrm{RMSE}_{\max}$ are the ideal minimum and maximum values of RMSE, respectively; $n$ is the number of samples in the training set; $y_j$ and $y_{\mathrm{training},j}$ are the predicted and observed values of C/N for the $j$th sample in the training set, respectively; $N_{\mathrm{layer}}$ is the number of layers of the SCA tree. $N_{\mathrm{layer},\min}$ and $N_{\mathrm{layer},\max}$ are the ideal minimum and maximum values of $N_{\mathrm{layer}}$,

[Figure 3: each individual in the GA population is a 19-bit vector. Bits X1–X12 (state variables) select the input variables of SCA; bits C1–C3, M1–M2 and N1–N2 encode the internal parameters of SCA ($\alpha_{\mathrm{cut}}$, $\alpha_{\mathrm{merge}}$ and $N_{\min}$). The vectors evolve through crossover and mutation.]

Fig. 3. Principle of GASCA.
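The fitness definition in Eqs. (1)–(5) can be evaluated as in the sketch below. The function and argument names, the bounds used for normalization and the weights are all hypothetical; the source does not report the ideal minimum and maximum values or the values of $\mu_1$ to $\mu_4$ in this section.

```python
import math

def normalize(value, lo, hi):
    """Min-max normalization used in Eqs. (2)-(5)."""
    return (value - lo) / (hi - lo)

def rmse(predicted, observed):
    """Root mean square error with the n - 1 denominator of Eq. (2)."""
    n = len(observed)
    squared_error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared_error / (n - 1))

def fitness(predicted, observed, n_layer, n_node, n_x, bounds, weights):
    """Weighted sum of normalized training error and tree complexity, Eq. (1)."""
    mu1, mu2, mu3, mu4 = weights
    rmse_s = normalize(rmse(predicted, observed), *bounds["rmse"])
    layer_s = normalize(n_layer, *bounds["layer"])
    node_s = normalize(n_node, *bounds["node"])
    x_s = normalize(n_x, *bounds["x"])
    return mu1 * rmse_s + mu2 * layer_s + mu3 * node_s + mu4 * x_s

# Placeholder bounds (ideal minimum, ideal maximum) and equal weights.
bounds = {"rmse": (0.0, 2.0), "layer": (1, 7), "node": (1, 19), "x": (1, 12)}
weights = (1.0, 1.0, 1.0, 1.0)

value = fitness(
    predicted=[1.0, 2.0, 3.0],
    observed=[1.0, 2.0, 5.0],
    n_layer=4, n_node=10, n_x=6,
    bounds=bounds, weights=weights,
)
```

With positive weights, every term grows with either training error or tree size, so smaller fitness values correspond to trees that are both more accurate and simpler.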
