25.12.2013 Views

SLAMorris Final Thesis After Corrections.pdf - Cranfield University


approximately half the data in Data Set 1; using a random number generator, half the fragments from each file type were selected. For the thumbnail cache, half the records for each classification of thumbnail cache data were selected. Where the number of fragments in a category was odd, the number of selected fragments was rounded up.
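The selection rule described above (a random half of each category, rounded up when the count is odd) can be sketched as follows. This is a minimal illustration, not the procedure actually used in the thesis: the function name `split_half_per_category`, the `(fragment_id, category)` tuple layout, and the optional `seed` parameter are all assumptions introduced for the example, and fragment identifiers are assumed to be unique.

```python
import math
import random
from collections import defaultdict


def split_half_per_category(fragments, seed=None):
    """Split (fragment_id, category) pairs into training and testing lists.

    Hypothetical helper: half of each category is chosen at random for
    training, rounding up when the category holds an odd number of fragments.
    """
    rng = random.Random(seed)  # seed is optional and only aids repeatability

    # Group fragment identifiers by their category (file type or
    # thumbnail cache classification).
    by_category = defaultdict(list)
    for fragment_id, category in fragments:
        by_category[category].append(fragment_id)

    training, testing = [], []
    for category, ids in by_category.items():
        n_train = math.ceil(len(ids) / 2)  # odd counts round up
        chosen = set(rng.sample(ids, n_train))
        for fragment_id in ids:
            target = training if fragment_id in chosen else testing
            target.append((fragment_id, category))
    return training, testing
```

With five fragments in one category and four in another, this sketch places three and two of them respectively in the training set and leaves the remaining four for testing.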

For methods 1 and 2, the training phase focused on determining which fragment classification rules were accepting false positives. After each run of the methods, the logs were analysed to ascertain which rules were falsely classifying fragments. For example, was a specific structure used to classify a fragment also being detected in other file types? Once the rules generating false positives had been identified, the individual method could be adapted to strengthen the rule. This provided an opportunity to refine the classification rules based on the thumbnail cache information identified in Chapters 5 and 6. For the statistical and Neural Network methods, the training is described in their individual sections.

After the models had been trained using the training set data, they were tested against the testing set. The testing set comprised the remainder of the files from Data Set 1 and all the files in Data Sets 2 and 3. The files in Data Sets 1 and 2 had known classifications. In Data Set 3 the number of fragments in each classification was unknown; therefore the only information provided by each model for this data set was the number of correctly identified thumbnail cache fragments and the number of false positive identifications.
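The two figures reported for Data Set 3 can be illustrated with a short sketch. It assumes that each fragment a model flags as thumbnail cache data can be checked against a set of confirmed thumbnail cache fragment identifiers; how that confirmation was obtained is not stated in this passage, and the function and parameter names below are hypothetical.

```python
def count_thumbnail_hits(predicted_thumbnail_ids, confirmed_thumbnail_ids):
    """Return (correct, false_positives) for one model on one data set.

    Hypothetical helper: both arguments are collections of fragment
    identifiers. No per-category totals are needed, which matches a data
    set whose full classification breakdown is unknown.
    """
    predicted = set(predicted_thumbnail_ids)
    confirmed = set(confirmed_thumbnail_ids)

    correct = len(predicted & confirmed)          # flagged and confirmed
    false_positives = len(predicted - confirmed)  # flagged but not confirmed
    return correct, false_positives
```

Because the totals per classification are unknown, measures that depend on them, such as false negative counts for the other file types, are deliberately left out of this sketch.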
