Quantitative effects of composting state variables on ... - ResearchGate
Quantitative effects of composting state variables on ... - ResearchGate
Quantitative effects of composting state variables on ... - ResearchGate
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
1246 W. Sun et al. / Science <str<strong>on</strong>g>of</str<strong>on</strong>g> the Total Envir<strong>on</strong>ment 409 (2011) 1243–1254<br />
3.2. Dataset<br />
All the observed data from the six runs <str<strong>on</strong>g>of</str<strong>on</strong>g> food waste <str<strong>on</strong>g>composting</str<strong>on</strong>g><br />
were collected (198 samples) and combined to form the whole<br />
dataset. Each sample was a combinati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> inputs (i.e. <str<strong>on</strong>g>state</str<strong>on</strong>g> <str<strong>on</strong>g>variables</str<strong>on</strong>g><br />
and SCA's inherent parameters) and the C/N ratio. The <str<strong>on</strong>g>state</str<strong>on</strong>g> <str<strong>on</strong>g>variables</str<strong>on</strong>g><br />
had potential <str<strong>on</strong>g>effects</str<strong>on</strong>g> <strong>on</strong> C/N ratio variati<strong>on</strong>; they included time (X 1 ),<br />
mean temperature (X 2 ), pH (X 3 ), moisture c<strong>on</strong>tent (X 4 ), ash c<strong>on</strong>tent<br />
(X 5 ), organic c<strong>on</strong>tent (X 6 ), NH + 4 -N c<strong>on</strong>centrati<strong>on</strong> (X 7 ), cumulative NH 3<br />
emissi<strong>on</strong>s (X 8 ), log col<strong>on</strong>y count <str<strong>on</strong>g>of</str<strong>on</strong>g> thermophilic bacteria (X 9 ), log<br />
col<strong>on</strong>y count <str<strong>on</strong>g>of</str<strong>on</strong>g> mesophilic bacteria (X 10 ), upper temperature (X 11 ),<br />
and lower temperature (X 12 ). Although the sum <str<strong>on</strong>g>of</str<strong>on</strong>g> X 4 ,X 5 and X 6 was<br />
<strong>on</strong>e, the three <str<strong>on</strong>g>state</str<strong>on</strong>g> <str<strong>on</strong>g>variables</str<strong>on</strong>g> were still included since it was difficult to<br />
exclude <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> them from the candidates subjectively; The SCA's<br />
inherent parameters were α cut (significance level for cutting clusters),<br />
α merge (significance level for merging clusters), and N min (the<br />
minimum number <str<strong>on</strong>g>of</str<strong>on</strong>g> samples within tip clusters). The C/N ratio was<br />
the c<strong>on</strong>cerned output. The entire dataset was randomly divided into a<br />
training set (167 samples, 84%) and a test set (31 samples, 16%). The<br />
training set was used for calibrating the GASCA (or SCA) forecasting<br />
trees and the test <strong>on</strong>e was for verificati<strong>on</strong> in independent samples<br />
from different reactors. Since the magnitudes <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>state</str<strong>on</strong>g> <str<strong>on</strong>g>variables</str<strong>on</strong>g> were<br />
significantly different from each other, we firstly c<strong>on</strong>verted all input<br />
datasets into the range <str<strong>on</strong>g>of</str<strong>on</strong>g> [−1, 1] before the training process and then<br />
transformed the values <str<strong>on</strong>g>of</str<strong>on</strong>g> the predicted dependent <str<strong>on</strong>g>variables</str<strong>on</strong>g> back to<br />
their original ranges after the training was completed.<br />
3.3. Principle <str<strong>on</strong>g>of</str<strong>on</strong>g> stepwise cluster analysis (SCA)<br />
SCA is based <strong>on</strong> the theory <str<strong>on</strong>g>of</str<strong>on</strong>g> multivariate analysis <str<strong>on</strong>g>of</str<strong>on</strong>g> variance.<br />
Generally, two phases are involved in its implementati<strong>on</strong>: training<br />
and test. In the training phase, the sample dataset is cut into two<br />
subsets (clusters) and then two are merged into <strong>on</strong>e during the<br />
iterative training process. According to Wilks' likelihood-ratio<br />
criteri<strong>on</strong>, the cutting point is optimal <strong>on</strong>ly if the value <str<strong>on</strong>g>of</str<strong>on</strong>g> Wilks' Λ is<br />
minimum (Wilks, 1962). Since the Λ is directly related to the F<br />
statistic, the sample means <str<strong>on</strong>g>of</str<strong>on</strong>g> the two sub-clusters can be compared<br />
for significant differences through F test (Rao, 1952). Thus, the criteria<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> cutting (or merging or not) clusters rely <strong>on</strong> a set <str<strong>on</strong>g>of</str<strong>on</strong>g> F tests (Rao,<br />
1965; Tatsuoka, 1971). Based <strong>on</strong> the F tests, a tree is shaped step by<br />
step until no subsets could be cut or merged any more. In the test<br />
phase, the values <str<strong>on</strong>g>of</str<strong>on</strong>g> the new independent <str<strong>on</strong>g>variables</str<strong>on</strong>g> in the test samples<br />
will be used as references to determine which child cluster a sample in<br />
the parent cluster will enter. Thus, the sample would gradually enter<br />
<strong>on</strong>e tip cluster. The mean <str<strong>on</strong>g>of</str<strong>on</strong>g> dependent <str<strong>on</strong>g>variables</str<strong>on</strong>g> from the training set<br />
at the tip cluster that the sample finally enters becomes the predicted<br />
value <str<strong>on</strong>g>of</str<strong>on</strong>g> the sample's dependent variable. A more detailed descripti<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> SCA can be referred to our previous work (Qin et al., 2008).<br />
3.4. Variable/parameter selecti<strong>on</strong> using GA<br />
An integrated GASCA model combining GA with SCA would possess<br />
abilities in both variable searching and n<strong>on</strong>linear fitting. GA transfers<br />
chromosome informati<strong>on</strong> (certain independent <str<strong>on</strong>g>variables</str<strong>on</strong>g> and inherent<br />
parameters) to SCA and then SCA returned the fitness (an index reflecting<br />
both structure and performance <str<strong>on</strong>g>of</str<strong>on</strong>g> the SCA tree) to the corresp<strong>on</strong>ding<br />
chromosome in GA. In detail, the chromosome is a 19-bit binary string.<br />
The first twelve binary strings encode all the SCA's candidate independent<br />
<str<strong>on</strong>g>variables</str<strong>on</strong>g>, where the ‘1’ represents that its corresp<strong>on</strong>ding independent<br />
variable is selected while the ‘0’ means not. The last seven binary<br />
strings represent inherent parameters <str<strong>on</strong>g>of</str<strong>on</strong>g> SCA (α cut , α merge and N min ). In<br />
order to unify the calculati<strong>on</strong>, we chosen several representative values<br />
within the appropriate ranges <str<strong>on</strong>g>of</str<strong>on</strong>g> the three parameters respectively and<br />
used binary strings to represent the locati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> each value in the ranges.<br />
Thus the binary strings are capable <str<strong>on</strong>g>of</str<strong>on</strong>g> describing real-number values <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
SCA's inherent parameters. The mapping relati<strong>on</strong>ships between the<br />
binary string and the <str<strong>on</strong>g>variables</str<strong>on</strong>g>/parameters are illustrated in Fig. 3.<br />
An index based <strong>on</strong> both the SCA's fitting accuracy and the SCA tree's<br />
complexity was developed as the fitness functi<strong>on</strong>. The accuracy <str<strong>on</strong>g>of</str<strong>on</strong>g> SCA's<br />
fitting is defined as RMSE <str<strong>on</strong>g>of</str<strong>on</strong>g> the training set. The complexity <str<strong>on</strong>g>of</str<strong>on</strong>g> SCA tree<br />
is depicted as the sum <str<strong>on</strong>g>of</str<strong>on</strong>g> normalized layer number <str<strong>on</strong>g>of</str<strong>on</strong>g> the SCA (N s,layer ),<br />
the normalized node number <str<strong>on</strong>g>of</str<strong>on</strong>g> the SCA (N s,node ) and the normalized<br />
number <str<strong>on</strong>g>of</str<strong>on</strong>g> independent <str<strong>on</strong>g>variables</str<strong>on</strong>g> (N s,x ). Four weighting coefficients<br />
(μ 1 , μ 2 , μ 3 , and μ 4 ) were introduced to balance the accuracy and the<br />
complexity. Thus the fitness functi<strong>on</strong> has the following definiti<strong>on</strong>:<br />
Fitness = μ 1 RMSE s + μ 2 N s;layer + μ 3 N s;node + μ 4 N s;x<br />
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi<br />
n<br />
u∑<br />
ðy j −y training; j Þ 2<br />
tj =1<br />
−RMSE<br />
n−1<br />
min<br />
RMSE s =<br />
RMSE max −RMSE min<br />
N layer−N layer; min<br />
N s;layer =<br />
N layer; max −N layer; min<br />
N node−N node; min<br />
N s;node =<br />
N node; max −N node; min<br />
N s;x =<br />
N x−N x; min<br />
N x; max −N x; min<br />
where RMSE min and RMSE max are ideal minimum and maximum values <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
RMSE, respectively; n is the number <str<strong>on</strong>g>of</str<strong>on</strong>g> samples in the training set; y j and<br />
y training,j are the predicted and observed values for C/N in the jth sample in<br />
the training set, respectively; N layer is the layer number <str<strong>on</strong>g>of</str<strong>on</strong>g> the SCA tree.<br />
N layer,min and N layer,max are ideal minimum and maximum values <str<strong>on</strong>g>of</str<strong>on</strong>g> N layer ,<br />
ð1Þ<br />
ð2Þ<br />
ð3Þ<br />
ð4Þ<br />
ð5Þ<br />
X1<br />
Input Variables <str<strong>on</strong>g>of</str<strong>on</strong>g> SCA<br />
Internal Parameters <str<strong>on</strong>g>of</str<strong>on</strong>g> SCA<br />
State Variables<br />
α cut α merge N min<br />
X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 C1 C2 C3 M1 M2 N1 N2<br />
Populati<strong>on</strong><br />
1 0 ... 1 0 1 ... 0 1 1 ... 0 1 1 1<br />
1 1 ... 0 1 0 ... 0 1 0 ... 0 0 1 0<br />
1 0 ... 1 1 1 ... 0 1 1 ... 0 1 1 0<br />
0 0 ... 1 1 1 ... 0 1 1 ... 0 1 1 0<br />
… … ...<br />
0 1 ... 1 0 1 ... 0 1 0 ... 0 1 0 1<br />
1 0 ... 1 0 1 ... 0 1 0 ... 0 1 1 0<br />
Detailed form<br />
1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 1 0 1 0<br />
Vector 1 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1<br />
1 0 0 0 1 1 0 1 1 0 1 0 0 0 1 0 1 1 0<br />
1 0 1 0 0 1 1 1 0 0 1 0 1 0 1 0 0 1 0<br />
mutati<strong>on</strong> mutati<strong>on</strong><br />
mutati<strong>on</strong><br />
1 0 1 0 1 1 1 0 0 0 1 0 1 0 1 1 0 1 0<br />
crossover<br />
Fig. 3. Principle <str<strong>on</strong>g>of</str<strong>on</strong>g> GASCA.