21.01.2014 Views

A semi-automatic approach for workflow staff assignment - Faculty of ...

A semi-automatic approach for workflow staff assignment - Faculty of ...

A semi-automatic approach for workflow staff assignment - Faculty of ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Y. Liu et al. / Computers in Industry 59 (2008) 463–476 471<br />

represents the activity number, which is defined in decreasing<br />

order <strong>of</strong> their training set sizes.<br />

In Fig. 7, the first row <strong>of</strong> subfigures (subfigure a–c) illustrates<br />

each activity’s training set size. The vertical axis represents the<br />

number <strong>of</strong> training samples <strong>for</strong> each activity. In the real<br />

situation, it reveals the number <strong>of</strong> times that the target activity<br />

has been executed. The second row <strong>of</strong> subfigures illustrates the<br />

number <strong>of</strong> classes <strong>for</strong> each training set. The vertical axis<br />

represents the class number. In our <strong>approach</strong>, the class is<br />

represented by actor’s identity. There<strong>for</strong>e, this number<br />

demonstrates how many different actors have executed the<br />

target activity. The last row <strong>of</strong> subfigures depicts the number <strong>of</strong><br />

features <strong>for</strong> each training set. The vertical axis is the feature<br />

number. Because we use previous activities <strong>of</strong> the target activity<br />

as features, this number demonstrates how many activities in a<br />

<strong>workflow</strong> model have the precedence relationship with the<br />

target activity.<br />

For example, in enterprise A (the first column <strong>of</strong> subfigures<br />

in Fig. 7), the property <strong>of</strong> the first activity corresponds to the left<br />

most pixel <strong>of</strong> each subfigure. This activity is a review activity <strong>of</strong><br />

a CAD drawing design process, which is the second activity <strong>of</strong><br />

the <strong>workflow</strong> model. 58 different actors executed it <strong>for</strong> 316<br />

times in the past. There<strong>for</strong>e, the training set <strong>of</strong> this activity<br />

contains 316 training samples, 58 classes and 2 features (one<br />

feature <strong>for</strong> the actor <strong>of</strong> the first activity and the other <strong>for</strong> the<br />

initiator <strong>of</strong> the <strong>workflow</strong> instance). It is important to mention<br />

that, in Definition 3, the start node is also included in the<br />

precedence activity sequences to represent the instantiation<br />

event <strong>of</strong> <strong>workflow</strong>, so there is always one more feature than the<br />

number <strong>of</strong> activities.<br />

According to Fig. 7, we may find that the size <strong>of</strong> the<br />

training set is not the same <strong>for</strong> each activity, which implies a<br />

different execution frequency. However, this unbalanced<br />

execution frequency does not have much impact on the<br />

number <strong>of</strong> past actors (i.e., the second row <strong>of</strong> subfigures),<br />

which means the activities are designed to be undertaken by<br />

many actors. Moreover, judging from the number <strong>of</strong> features,<br />

we can see that the activities that need run-time <strong>staff</strong><br />

<strong>assignment</strong> are located in different places <strong>of</strong> the <strong>workflow</strong><br />

model.<br />

4.2. Prediction accuracy <strong>of</strong> classifiers<br />

Fig. 8 shows the prediction accuracy <strong>of</strong> 10-fold crossvalidation<br />

<strong>for</strong> the selected activities in three enterprises. In<br />

order to provide a reference, we connected the pixels <strong>of</strong> Naïve<br />

Bayes classifiers using dotted lines. As shown in the figure, the<br />

prediction accuracy is not the same <strong>for</strong> all the selected<br />

Fig. 8. Prediction accuracy <strong>of</strong> classifiers in three enterprises.<br />

activities. Some <strong>of</strong> them are well predicted by the generated<br />

classifier (>80%), but some <strong>of</strong> them are not. However, most<br />

classifiers have achieved a prediction accuracy that is greater<br />

than 50%. Just few activities’ <strong>staff</strong> <strong>assignment</strong> is poorly<br />

predicted in three enterprises.<br />

4.3. Overall per<strong>for</strong>mance <strong>of</strong> learning algorithms<br />

Table 4 lists the overall prediction accuracy <strong>of</strong> three<br />

algorithms. We use the percentage <strong>of</strong> correct predictions as a<br />

measure <strong>of</strong> overall per<strong>for</strong>mance. According to the result, in<br />

three enterprises, all the algorithms have achieved an overall<br />

prediction accuracy <strong>of</strong> well over 75%. SVM generally per<strong>for</strong>ms<br />

the best, but only marginally.<br />

Table 5 lists the training and testing time <strong>of</strong> three algorithms.<br />

The testing time <strong>of</strong> three algorithms is very quick, but the<br />

training time differs greatly. Naïve Bayes per<strong>for</strong>ms the best in<br />

terms <strong>of</strong> time consumption. In contrast, SVM’s training time is<br />

very long.<br />

4.4. Analysis <strong>of</strong> some special cases<br />

As a further investigation, we evaluated some special cases.<br />

Table 6 lists the properties <strong>of</strong> some worst predicted activities in<br />

three enterprises (the first, third and fifth rows). Their related<br />

<strong>workflow</strong> models are depicted in Fig. 9. As a comparison, we<br />

also provide some <strong>of</strong> the best predicted activities in Table 6 (the<br />

second, fourth and sixth rows).<br />

Table 4<br />

Overall prediction accuracy in three enterprises<br />

Enterprise Number <strong>of</strong> predictions Number <strong>of</strong> correct predictions Prediction accuracy (%)<br />

C4.5 Naïve Bayes SVM C4.5 Naïve Bayes SVM<br />

A 3,447 2,739 2,532 2,740 79.460 73.455 79.489<br />

B 13,483 10,669 10,294 10,712 79.129 76.348 79.448<br />

C 24,347 20,028 19,766 20,178 82.261 81.185 82.877

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!