
Having applied the one-way ANOVA model to the pre-test results, we found that these scores did not differ significantly among the three groups (F=0.108, df=2, p=0.89). Table 2 presents the results of applying the one-way ANOVA model to the post-test results. Post-test scores among the three groups showed no significant differences either (F=1.18, df=2, p=0.32) (Brown et al., 2006). Since the F-value is lower than the F-critical value, we concluded that there is no statistically significant difference and that the observed score differences are best explained by chance. The outcome of our test was that we accepted the null hypothesis that the students in the experimental group would perform in a similar manner to the students who learned in the traditional way (groups B and C).
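
The decision rule we applied can be illustrated with a short computation. The following Python sketch (using SciPy; the score arrays are invented placeholders, not the actual test data from this study) shows how the F statistic and p-value of a one-way ANOVA over three groups are obtained and compared against a significance level.

    # Illustrative sketch only: the score arrays are placeholders, not the study's data.
    from scipy import stats

    group_a = [72, 65, 80, 58, 74, 69, 77, 63, 71, 66, 75, 62, 70, 68]  # experimental (DEPTHS)
    group_b = [70, 67, 78, 60, 73, 64, 76, 61, 69, 72, 66, 65, 74, 59]  # control, traditional
    group_c = [68, 71, 75, 62, 70, 66, 79, 60, 73, 64, 67, 72, 61, 69]  # control, traditional

    # One-way ANOVA: H0 = the three group means are equal.
    f_value, p_value = stats.f_oneway(group_a, group_b, group_c)

    alpha = 0.05
    print(f"F = {f_value:.3f}, p = {p_value:.3f}")
    if p_value < alpha:
        print("Reject H0: at least one group mean differs significantly.")
    else:
        print("Fail to reject H0: the score differences are best explained by chance.")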

Even though the results of the one-way ANOVA lead to the conclusion that DEPTHS does not bring significant improvements over traditional learning, we are encouraged by the fact that our system, even in its early stages, achieves better results than traditional learning. In particular, the results of the t-test performed on the experimental group showed significant progress for the students who learned with DEPTHS. Moreover, the qualitative results indicate that the students prefer this kind of learning over the traditional way of learning design patterns. The experience gained from this evaluation will lead to many improvements of the system that should make it far more successful in the future.
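
The within-group comparison mentioned above is a paired pre-test/post-test design. A minimal sketch of such a dependent-samples t-test, again with invented placeholder scores rather than the study's data, might look as follows.

    # Paired (dependent-samples) t-test on one group's pre- and post-test scores.
    # The numbers are placeholders, not the data collected in this evaluation.
    from scipy import stats

    pre_scores  = [55, 48, 62, 51, 58, 45, 60, 53, 49, 57, 52, 46, 61, 50]
    post_scores = [68, 59, 74, 63, 70, 56, 73, 66, 60, 69, 64, 58, 75, 62]

    t_value, p_value = stats.ttest_rel(post_scores, pre_scores)
    print(f"t = {t_value:.3f}, p = {p_value:.4f}")
    # A small p-value would indicate a statistically significant pre-to-post improvement.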

Student model assessment evaluation

To evaluate the accuracy of DEPTHS's student knowledge assessment, we compared the students' knowledge of the domain as captured in their student models with their results on an external test administered after the course was completed. For each student, the DEPTHS system keeps track of all the tests he/she has solved throughout the course (i.e., after having learned each domain concept) and evaluates his/her overall knowledge of the domain from the test results. The external test was designed by a group of teachers who are experts in the domain of software engineering. It was given to the students at the end of the course to assess their knowledge of the whole course content.

The results of the 14 participants from the experimental group were analyzed. The questions of the external test were divided into 10 groups, each group covering one domain concept, so that, based on the test results, we could assess the students' knowledge of each concept individually. Comparing the results of the external test with the student model data (Fig. 8), we found that the external test indicates a slightly lower knowledge level (by 15.43%). That difference was expected, because the system (i.e., its student modeling component) assesses student knowledge of each concept immediately after the student has learned it, whereas the external test assesses student knowledge at the end of the whole course, when a number of other variables affect the final results. For example, students could not use DEPTHS outside the classroom, but we could not control whether they were using other sources of information (literature or the Internet). Moreover, when estimating a student's knowledge of a concept, the system takes the best score over all the test attempts related to that concept. However, if instead of taking the best result we take the average test result for a concept (from the system's logs for each concept), we get results that are 11.37% lower than those on the external test. As a result of the student model assessment evaluation performed on DEPTHS, we concluded that the comparison between the external test and the student model cannot be a realistic indicator of the student model's accuracy, as the external test is administered after a long period of time (one semester), whereas the system assesses student knowledge immediately after the learning of each concept. Therefore, we decided to analyze the system's log data in order to get more information about the student model used.
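
To make the comparison concrete, the sketch below uses hypothetical per-concept scores (the concept names, field names, and values are illustrative and do not reflect DEPTHS's actual data model) to show how best-attempt and average-attempt estimates from the system's logs could be set against the external test results, concept by concept.

    # Hypothetical per-concept data: quiz scores (0-100) recorded by the system
    # and the score on the corresponding group of external-test questions.
    system_logs = {
        "Observer":  {"attempts": [60, 75, 85], "external": 70},
        "Composite": {"attempts": [80, 90],     "external": 78},
        "Strategy":  {"attempts": [55, 70],     "external": 66},
    }

    for concept, data in system_logs.items():
        best = max(data["attempts"])                         # what the system stores
        avg = sum(data["attempts"]) / len(data["attempts"])  # alternative estimate
        ext = data["external"]
        print(f"{concept:10s} best={best:5.1f} avg={avg:5.1f} external={ext:5.1f} "
              f"(best-ext={best - ext:+5.1f}, avg-ext={avg - ext:+5.1f})")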

By analyzing the system's log data, we found that at the beginning of the course students tended to spend more time on each lesson, made more attempts at quizzes, and often revisited previously learned concepts. The majority of the students (87%) attempted to solve the tests more than once during the first quarter of the course, and 34% made three or more attempts. This trend diminished as the course proceeded. In the last quarter, 49% of the students made two or more attempts and only 11% tried to do the test more than twice. We believe that this decrease in the number of test reattempts is due to the fact that, as the end of the semester was approaching, the students had increasingly more obligations in other courses and less time to dedicate to learning with DEPTHS. This is a well-known phenomenon that has been present at the Military Academy for years.
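
Figures such as these can be derived by counting, for each quarter of the course, the share of students with a given number of test attempts. The sketch below assumes a flat list of (student, quarter, attempts) records, which is only an assumption about how such logs might be represented, not DEPTHS's actual log format.

    from collections import defaultdict

    # Hypothetical log records: (student_id, course_quarter, attempts_on_a_test).
    records = [
        ("s1", 1, 3), ("s2", 1, 2), ("s3", 1, 1), ("s4", 1, 2),
        ("s1", 4, 1), ("s2", 4, 2), ("s3", 4, 1), ("s4", 4, 1),
    ]

    # Keep, per quarter, the maximum number of attempts each student made on any test.
    per_quarter = defaultdict(dict)
    for student, quarter, attempts in records:
        prev = per_quarter[quarter].get(student, 0)
        per_quarter[quarter][student] = max(prev, attempts)

    for quarter, by_student in sorted(per_quarter.items()):
        n = len(by_student)
        repeaters = sum(1 for a in by_student.values() if a >= 2)
        print(f"Quarter {quarter}: {100 * repeaters / n:.0f}% of students "
              f"attempted a test more than once")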

We found an additional interesting issue by analyzing the system's log data. Some questions annotated by their creators as very easy were not answered well by students. Conversely, some other questions described as very difficult were answered surprisingly well. Having analyzed this issue, we concluded that it originates from the fact that the Diagnosis module uses data about a question's difficulty level and the time necessary to solve it provided by a
