29.12.2014 Views

Quantitative effects of composting state variables on ... - ResearchGate

Quantitative effects of composting state variables on ... - ResearchGate

Quantitative effects of composting state variables on ... - ResearchGate

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

1248 W. Sun et al. / Science <str<strong>on</strong>g>of</str<strong>on</strong>g> the Total Envir<strong>on</strong>ment 409 (2011) 1243–1254<br />

Best, Worst, and Mean Scores<br />

Average Distance<br />

Times Calling Proxy Table<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

0.4<br />

0.3<br />

0.2<br />

0.1<br />

3.5<br />

3<br />

2.5<br />

2<br />

1.5<br />

1<br />

0.5<br />

0<br />

100<br />

90<br />

80<br />

70<br />

60<br />

50<br />

40<br />

30<br />

20<br />

10<br />

worse and the average fitness value (the vertical bars in Fig. 4a) would<br />

decrease gradually with the increase <str<strong>on</strong>g>of</str<strong>on</strong>g> generati<strong>on</strong> number. There<br />

were always significant differences between the best and the average<br />

fitness values at each generati<strong>on</strong>; the average distance would<br />

decrease instantly but automatically keep fluctuating during the<br />

process <str<strong>on</strong>g>of</str<strong>on</strong>g> GA operati<strong>on</strong> (Fig. 4b). This dem<strong>on</strong>strated that the<br />

populati<strong>on</strong> diversity could be maintained and the undesired prematurity<br />

could be avoided during GA operati<strong>on</strong>s.<br />

4.2. Interacti<strong>on</strong>s between SCA and GA<br />

5 10 15 20 25<br />

5 10 15 20 25<br />

5 10 15 20 25<br />

Generati<strong>on</strong><br />

(a)<br />

(b)<br />

(c)<br />

Fig. 4. Representative operati<strong>on</strong> processes <str<strong>on</strong>g>of</str<strong>on</strong>g> GASCA 2 .<br />

Within the GASCA framework, reducti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> SCA's computati<strong>on</strong>al<br />

efforts is critical for improving GASCA's efficiency. This is due to: (i) the<br />

SCA's calculati<strong>on</strong> amount increases significantly with the increase <str<strong>on</strong>g>of</str<strong>on</strong>g><br />

dataset volume, and (ii) SCA is frequently invoked to calculate the<br />

fitness functi<strong>on</strong> during the GA operati<strong>on</strong>. Thus, effective use <str<strong>on</strong>g>of</str<strong>on</strong>g> SCA's<br />

results that have been calculated during the former generati<strong>on</strong>s would<br />

greatly improve the entire efficiency <str<strong>on</strong>g>of</str<strong>on</strong>g> GASCA. To this end, a proxy table<br />

are introduced to realize this mechanism. During GA operati<strong>on</strong>s, when<br />

GASCA needs to evaluate the fitness <str<strong>on</strong>g>of</str<strong>on</strong>g> a new chromosome, a proxy table<br />

can be used to find out whether the same chromosome has been<br />

calculated in the previous generati<strong>on</strong>s. Fig. 4c shows that the time to<br />

invoke the proxy table gradually increase to its maximum value with the<br />

increase <str<strong>on</strong>g>of</str<strong>on</strong>g> generati<strong>on</strong> numbers. During the early stage <str<strong>on</strong>g>of</str<strong>on</strong>g> GASCA, the<br />

invoking time was relatively limited. It implied that most <str<strong>on</strong>g>of</str<strong>on</strong>g> the SCAs<br />

needed to be trained since there were few chromosome records in the<br />

proxy table. At later stages, the invoking time kept increasing since more<br />

and more chromosome records were saved in the proxy table.<br />

Therefore, the corresp<strong>on</strong>ding fitness values <str<strong>on</strong>g>of</str<strong>on</strong>g> many SCAs could be<br />

obtained directly from the proxy table since these chromosomes and<br />

their fitness values had already been recorded. In Fig. 4c, the sum <str<strong>on</strong>g>of</str<strong>on</strong>g> time<br />

invoking the proxy table was 1879 while that required for calculating<br />

the fitness was 2600; about 72% efforts for fitness calculati<strong>on</strong> had been<br />

saved. It thus dem<strong>on</strong>strated that the introducti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> a proxy table could<br />

effectively reduce the chance <str<strong>on</strong>g>of</str<strong>on</strong>g> repetitive training and thus significantly<br />

improve the efficiency <str<strong>on</strong>g>of</str<strong>on</strong>g> GASCA iterati<strong>on</strong>.<br />

In additi<strong>on</strong> to efficiency, the accuracy <str<strong>on</strong>g>of</str<strong>on</strong>g> the final result <str<strong>on</strong>g>of</str<strong>on</strong>g> GASCA is<br />

also critical. The manner that GA affects the c<strong>on</strong>structi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> a SCA tree<br />

largely determines the performance <str<strong>on</strong>g>of</str<strong>on</strong>g> a GASCA tree. The factors for<br />

c<strong>on</strong>trolling SCA's performance c<strong>on</strong>sist <str<strong>on</strong>g>of</str<strong>on</strong>g> dataset quality, dataset<br />

partiti<strong>on</strong> strategy, SCA's internal parameters (α cut , α merge and N min ),<br />

and the combinati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> candidate independent <str<strong>on</strong>g>variables</str<strong>on</strong>g> (Qin et al.,<br />

2007). In this study, the candidate independent <str<strong>on</strong>g>variables</str<strong>on</strong>g> and<br />

inherent parameters <str<strong>on</strong>g>of</str<strong>on</strong>g> SCA are c<strong>on</strong>sidered as the major factors that<br />

GA would affect a SCA tree. In our practice for the current dataset <str<strong>on</strong>g>of</str<strong>on</strong>g><br />

food waste <str<strong>on</strong>g>composting</str<strong>on</strong>g>, it is found that the SCA would tend to be overfitting<br />

(i.e. the RMSE for the test set is much worse than the <strong>on</strong>e for the<br />

training set) in <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> following settings: 1) when the α cut is less than<br />

0.03 or greater than 0.065; 2) when the α merge is less than 0.05 or<br />

greater than 0.065; 3) when the α cut is greater than α merge ; 4) and<br />

when N min is less than 7 and greater than 10. Thus, the variati<strong>on</strong> range<br />

<str<strong>on</strong>g>of</str<strong>on</strong>g> inherent parameters are set as: α cut ∈ [0.03, 0.065], α merge ∈ [0.05,<br />

0.065], and N min ∈ integer in [7, 10]. Since α cut and α merge are real<br />

numbers, it is difficult to incorporate them into binary-bit chromosomes.<br />

Alternatively, a set <str<strong>on</strong>g>of</str<strong>on</strong>g> representative values for these inherent<br />

parameters are chosen as: α cut ∈{0.03, 0.035, 0.04, 0.045, 0.05, 0.055,<br />

0.06, 0.065}, α merge ∈{0.05, 0.055, 0.06, 0.065}, and N min ∈ {7, 8, 9, 10}.<br />

Meanwhile, 3, 2 and 2 binary-bits are used to represent 2 3 ,2 2 , and 2 2<br />

numbers <str<strong>on</strong>g>of</str<strong>on</strong>g> representative values, respectively. For example, the last<br />

seven bit <str<strong>on</strong>g>of</str<strong>on</strong>g> chromosome (1010110) includes three parts 101 (No.6 in<br />

α cut 's representative values), 01 (No.2 for α merge ) and 10 (No.3 for<br />

N min ), which means the corresp<strong>on</strong>ding values were chosen as:<br />

α cut =0.055, α merge =0.055, and N min =9. In this manner, the values<br />

<str<strong>on</strong>g>of</str<strong>on</strong>g> chromosomes in GA represent the locati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> parameters in the<br />

given ranges. Al<strong>on</strong>g with the evoluti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> GA, the performance <str<strong>on</strong>g>of</str<strong>on</strong>g> the<br />

chromosomes will be improved and the related parameters will be<br />

transferred to SCA.<br />

4.3. GASCA vs. SCA<br />

To compare GASCA with SCA, four opti<strong>on</strong>al SCA/GASCA models<br />

were calculated (Table 2). SCA 1 has all <str<strong>on</strong>g>of</str<strong>on</strong>g> the candidate independent<br />

<str<strong>on</strong>g>variables</str<strong>on</strong>g> and uses each internal parameter at its default value; SCA 2<br />

selects these <str<strong>on</strong>g>variables</str<strong>on</strong>g>/parameters through the trial and error<br />

method; GASCA 1 uses GA to select <strong>on</strong>ly independent <str<strong>on</strong>g>variables</str<strong>on</strong>g> for<br />

SCA and employs default SCA's inherent parameters; GASCA 2 uses GA<br />

to select both independent <str<strong>on</strong>g>variables</str<strong>on</strong>g> and SCA's inherent parameters.<br />

The results indicated that the R 2 for training sets <str<strong>on</strong>g>of</str<strong>on</strong>g> GASCA were much<br />

higher than those <str<strong>on</strong>g>of</str<strong>on</strong>g> SCA [R 2 -train: GASCA 2 (0.990)≈ GASCA 1

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!