Having applied the one-way ANOVA model to the pre-test results, we found that these scores did not differ significantly between the three groups (F = 0.108, df = 2, p = 0.89). Table 2 presents the results of applying the one-way ANOVA model to the post-test results. Post-test scores among the three groups showed no significant differences either (F = 1.18, df = 2, p = 0.32) (Brown et al., 2006). Since the F-value is lower than the critical F-value, we concluded that there is no statistically significant difference and that the score differences are best explained by chance. The outcome of the test was that we could not reject the null hypothesis that the students in the experimental group would perform similarly to the students who learned in the traditional way (groups B and C).
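The group comparison described above can be sketched with a standard one-way ANOVA routine, as in the minimal example below; the score lists are hypothetical placeholders, not the data collected in the study.

```python
# A minimal sketch of the one-way ANOVA comparison of post-test scores
# across the three groups. The score lists are illustrative placeholders,
# not the actual data from the evaluation.
from scipy import stats

group_a = [72, 68, 81, 75, 79, 70, 84, 66, 77, 73, 80, 69, 74, 78]  # experimental (DEPTHS)
group_b = [70, 65, 78, 74, 72, 69, 76, 71, 67, 75, 73, 68, 79, 66]  # traditional
group_c = [71, 69, 77, 73, 70, 68, 75, 72, 66, 74, 76, 67, 78, 65]  # traditional

f_value, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_value:.3f}, p = {p_value:.2f}")
# The null hypothesis of equal group means is retained when p exceeds 0.05.
```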
Even though the one-way ANOVA results suggest that DEPTHS does not lead to significant improvements over traditional learning, we are encouraged by the fact that the system, even at this early stage of its development, achieved somewhat better results than traditional learning. In particular, the t-test performed on the experimental group showed significant progress of the students who learned with DEPTHS. Moreover, the qualitative results indicate that the students prefer this kind of learning over the traditional way of learning design patterns. The experience gained from this evaluation will inform a number of improvements to the system that should make it considerably more successful in the future.
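The within-group comparison mentioned above can be illustrated with a paired t-test on the experimental group's pre- and post-test scores; the arrays below are hypothetical and serve only to show the form of the analysis.

```python
# Illustrative paired t-test on the experimental group's pre- and post-test
# scores (hypothetical values, not the study data).
from scipy import stats

pre_test  = [54, 61, 47, 58, 63, 50, 66, 49, 57, 60, 52, 55, 59, 48]
post_test = [72, 68, 81, 75, 79, 70, 84, 66, 77, 73, 80, 69, 74, 78]

t_value, p_value = stats.ttest_rel(post_test, pre_test)
print(f"t = {t_value:.3f}, p = {p_value:.4f}")
# A p-value below 0.05 indicates significant progress from pre- to post-test.
```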
Student model assessment evaluation
To evaluate the accuracy of DEPTHS' student knowledge assessment, we compared the students' knowledge of the domain as captured in their student models with their results on an external test administered after the course was completed. For each student, the DEPTHS system keeps track of all the tests he has solved throughout the course (i.e., after having learned each domain concept) and estimates his overall knowledge of the domain from the test results. The external test was designed by a group of teachers who are experts in the domain of software engineering. It was given to the students at the end of the course to assess their knowledge of the whole course content.
The results of the 14 participants from the experimental group were analyzed. The questions of the external test were divided into 10 groups, each covering one domain concept, so that, based on the test results, we could assess the students' knowledge of each concept individually. Comparing the results of the external test with the student model data (Fig. 8), we found that the external test indicates a knowledge level that is 15.43% lower. That difference was expected, because the system (i.e., its student modeling component) assesses student knowledge of each concept immediately after the student has learned it, whereas the external test assesses student knowledge at the end of the whole course, when a number of other variables affect the final results. For example, the students could not use DEPTHS outside the classroom, but we could not control whether they used other sources of information (literature or the Internet). Moreover, when estimating a student's knowledge of a concept, the system takes the best score over all the test attempts related to that concept. However, if instead of taking the best result we take the average test result for a concept (from the system's logs on each concept), we get results that are 11.37% lower than those on the external test. As the result of the student model assessment evaluation performed on DEPTHS, we concluded that the comparison between the external test and the student model cannot be a realistic indicator of the student model's accuracy, since the external test covers a long period of time (one semester), whereas the system assesses student knowledge immediately after the learning of each concept. Therefore, we decided to analyze the system's log data in order to obtain more information about the student model.
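The comparison of the student model estimates with the external test can be sketched as follows; the data layout (student attempts per concept) and the numeric values are assumptions made for illustration, not the actual DEPTHS log format or results.

```python
# Sketch of comparing student-model knowledge estimates with the external
# test, using either the best or the average score per concept. The data
# layout and values below are hypothetical, not taken from DEPTHS.
from statistics import mean

def model_estimate(attempts_per_concept, aggregate):
    """Overall knowledge level as the mean over concepts of the chosen
    per-concept aggregate (best or average attempt score, 0-100)."""
    return mean(aggregate(scores) for scores in attempts_per_concept.values())

attempts = {                      # hypothetical attempt scores for one student
    "Observer":  [60, 75, 85],
    "Singleton": [70, 80],
    "Factory":   [55, 65, 90],
}
external_test_score = 78          # hypothetical external test result (0-100)

best_based    = model_estimate(attempts, max)    # what the system actually stores
average_based = model_estimate(attempts, mean)   # alternative aggregation

print(f"best-score model:    {best_based:.1f}")
print(f"average-score model: {average_based:.1f}")
print(f"external test:       {external_test_score}")
# Relative differences, analogous to the 15.43% and 11.37% figures above:
print(f"external vs best:    {(best_based - external_test_score) / best_based:.2%} lower")
print(f"average vs external: {(external_test_score - average_based) / external_test_score:.2%} lower")
```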
By analyzing the system’s log data, we found that at the beginning of the course the students tended to spend more time on each lesson, to make more quiz attempts, and to revisit previously learned concepts more often. The majority of the students (87%) attempted to solve the tests more than once during the first quarter of the course, and 34% made three or more attempts. This trend diminished as the course proceeded: in the last quarter, 49% of the students made two or more attempts and only 11% tried to do the test more than twice. We think that this decrease in the number of test reattempts is due to the fact that, as the end of the semester approached, the students had increasingly more obligations in other courses and less time to dedicate to learning with DEPTHS. This is a well-known phenomenon, present at the Military Academy for years.
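A simple way to derive such reattempt statistics from the logs is sketched below; the log record layout is an assumption made for illustration, not the actual DEPTHS log schema.

```python
# Sketch of computing, per course quarter, the share of students who made
# two or more attempts at a concept test. The record fields (student,
# concept, quarter, attempts) are assumed for illustration only.
from collections import defaultdict

log = [                                    # hypothetical log entries
    {"student": "s1", "concept": "Observer",  "quarter": 1, "attempts": 3},
    {"student": "s1", "concept": "Singleton", "quarter": 4, "attempts": 1},
    {"student": "s2", "concept": "Observer",  "quarter": 1, "attempts": 2},
    {"student": "s2", "concept": "Factory",   "quarter": 4, "attempts": 2},
]

students_per_quarter = defaultdict(set)    # all students active in a quarter
retakers_per_quarter = defaultdict(set)    # students with two or more attempts

for entry in log:
    students_per_quarter[entry["quarter"]].add(entry["student"])
    if entry["attempts"] >= 2:
        retakers_per_quarter[entry["quarter"]].add(entry["student"])

for quarter in sorted(students_per_quarter):
    share = len(retakers_per_quarter[quarter]) / len(students_per_quarter[quarter])
    print(f"quarter {quarter}: {share:.0%} of students made two or more attempts")
```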
We found an additional interesting issue by analyzing the system’s log data. Some questions, annotated by their creators as very easy, were often answered incorrectly by the students. Conversely, some questions described as very difficult were answered surprisingly well. Having analyzed this issue, we concluded that it originates from the fact that the Diagnosis module uses data about a question’s difficulty level and the time necessary to solve it provided by a