
Forging new pathways of research and innovation in open and distance learning

RW_2016_Oldenburg_Proceedings


Forging new pathways of research and innovation in open and distance learning:
Reaching from the roots

Proceedings of the 9th European Distance and E-Learning Network Research Workshop, 2016

Oldenburg, 4-6 October, 2016

ISBN 978-615-5511-12-7

SCALING LEARNING ANALYTICS:<br />

THE PRACTICAL APPLICATION OF SYNTHETIC DATA<br />

Alan M. Berg, Stefan T. Mol, Gábor Kismihók, University of Amsterdam, Netherlands,
Niall Sclater, Sclater Digital Ltd, United Kingdom

Summary<br />

This case study draws on experience gained while running a two-day data hackathon on large-scale Learning Analytics infrastructure at the LAK16 conference. The main conclusion is that there will be significant demand for realistic synthetic data to support the development of large-scale infrastructures.

Synthetic data overcomes ethical barriers to sharing large data sets between different organizations, or between parts of the same organization. Properly simulated synthetic data can be used to fine-tune algorithms deployed within the field of Learning Analytics. This data-driven approach lowers the risk of accidental disclosure and avoids the limitations rightfully imposed by legal and ethical constraints on real student data. Applying synthetic data to performance testing allows universities to develop highly scalable infrastructure in parallel with developing central data governance practices.

This short paper explores the conformance testing of Learning Record Stores (LRS, secure locations to store and query student digital traces), discusses the implications for universities around a specific set of xAPI recipes (Berg, Scheffel, Drachsler, Ternier, & Specht, 2016) and generalizes practices for accelerating large-scale deployments of LA infrastructure. The authors argue that by applying a standardized set of synthetic data based on a peer-reviewed synthetic data generator, universities will find it easier to develop reliable recipes for digital learner traces. Consistent data storage across university boundaries will subsequently enable the benchmarking of algorithms that consume student digital traces and support the generation of predictive validity evidence across university boundaries. Universities can thus compare the value of their algorithms with those of other universities and apply algorithms consistently when students transfer.
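To make the idea of synthetic learner traces concrete, the following is a minimal sketch of how xAPI-style statements might be generated for load or conformance testing of an LRS. It is not the generator discussed by the authors. The function name `synthetic_statements`, the `example.org` identifiers, the choice of three ADL verbs and the uniform random sampling are all assumptions made for illustration; a realistic generator would model plausible student behaviour and follow an agreed xAPI recipe.

```python
import json
import random
import uuid
from datetime import datetime, timedelta, timezone

# A small subset of ADL verb IRIs; which verbs a recipe allows is an assumption here.
VERBS = {
    "experienced": "http://adlnet.gov/expapi/verbs/experienced",
    "attempted": "http://adlnet.gov/expapi/verbs/attempted",
    "completed": "http://adlnet.gov/expapi/verbs/completed",
}


def synthetic_statements(n_students, n_statements, seed=0):
    """Return a list of xAPI-style statements for purely fictional students.

    Hypothetical helper: sampling is uniform, so the output is only
    structurally realistic, not behaviourally realistic.
    """
    rng = random.Random(seed)  # seeded so every site can reproduce the same data set
    start = datetime(2016, 1, 1, tzinfo=timezone.utc)
    statements = []
    for _ in range(n_statements):
        student = rng.randrange(n_students)
        verb = rng.choice(sorted(VERBS))
        course = rng.randrange(5)  # five hypothetical courses
        when = start + timedelta(seconds=rng.randrange(90 * 24 * 3600))
        statements.append({
            # Statement id derived from the seeded generator, so it is reproducible.
            "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            "actor": {
                "objectType": "Agent",
                "account": {
                    "homePage": "https://example.org/synthetic",  # placeholder
                    "name": f"student-{student:05d}",
                },
            },
            "verb": {"id": VERBS[verb], "display": {"en-US": verb}},
            "object": {
                "objectType": "Activity",
                "id": f"https://example.org/courses/{course}",  # placeholder
            },
            "timestamp": when.isoformat(),
        })
    return statements


if __name__ == "__main__":
    batch = synthetic_statements(n_students=100, n_statements=3)
    print(json.dumps(batch, indent=2))
```

Because the generator is seeded, two institutions running it with the same parameters obtain identical statements, which is the property that would let them compare LRS behaviour and algorithm output on a shared, disclosure-free data set.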

The relevance of synthetic data in Learning Analytics

Synthetic data, also known as simulated data, has been heavily researched and successfully applied across a broad range of scientific fields. Berg, Mol, Kismihók, & Sclater (2016) have previously discussed the application of synthetic data within the field of Learning Analytics.
