
Measuring the internet economy in The Netherlands: a big data analysis, 2016 | 14


an e-commerce website. Thus, some websites have low probabilities (5%, for example) and are therefore most likely not e-commerce websites, while others have high probabilities (95%, for example) and are most likely e-commerce websites. It then remains to choose a cut-off point. Dataprovider uses a cut-off point of 85%. During this research, we analysed this choice by looking at the websites on either side of the cut-off point. On this basis, the machine-learning algorithm appeared to perform satisfactorily and the 85% cut-off point appeared appropriate: below the cut-off point most websites did not seem to be online stores, while above it most were.
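As a minimal sketch of how a cut-off turns classifier probabilities into a yes/no label, the Python below applies the 85% threshold to a few hypothetical websites and lists those close to the cut-off for manual review. The website names, probabilities, the `classify` and `near_cutoff` helpers, the 5-percentage-point review margin and the choice of "at or above" (rather than strictly above) are all illustrative assumptions, not Dataprovider's implementation.

```python
# Hypothetical classifier output: website -> estimated probability of being
# an online store. Illustrative values only, not data from this study.
probabilities = {
    "shop-example-a.nl": 0.97,
    "blog-example-b.nl": 0.05,
    "shop-example-c.nl": 0.91,
    "dentist-example-d.nl": 0.40,
    "borderline-example-e.nl": 0.84,
    "shop-example-f.nl": 0.88,
}

CUTOFF = 0.85  # the 85% cut-off point mentioned in the text


def classify(probs, cutoff=CUTOFF):
    """Label a website as an online store if its probability is at or above the cut-off.

    Whether the boundary value itself counts as 'above' is an assumption here.
    """
    return {site: p >= cutoff for site, p in probs.items()}


def near_cutoff(probs, cutoff=CUTOFF, margin=0.05):
    """Return websites within `margin` of the cut-off (either side), closest first.

    The margin is a hypothetical review window for manual inspection.
    """
    close = [(abs(p - cutoff), site) for site, p in probs.items()
             if abs(p - cutoff) <= margin]
    return [site for _, site in sorted(close)]


labels = classify(probabilities)
stores = sorted(site for site, is_store in labels.items() if is_store)
print(stores)                      # ['shop-example-a.nl', 'shop-example-c.nl', 'shop-example-f.nl']
print(near_cutoff(probabilities))  # ['borderline-example-e.nl', 'shop-example-f.nl']
```

Inspecting the websites returned by `near_cutoff` mirrors the check described above: if sites just below the threshold are mostly not online stores and sites just above it mostly are, the cut-off is doing its job.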

It is, however, possible that some websites which actually are online stores are missed by this method. We therefore searched for other data with which to test whether the Dataprovider machine-learning algorithm had identified at least the most important online stores. The Dutch website jouwaanbieding.nl contains an up-to-date list of the most popular online stores. Several online stores within this list were better placed in other categories, so we manually removed these from the list. From the remaining websites, approximately 250 additional online stores were identified and added to category C.
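The cross-check against an external list amounts to simple set arithmetic. The sketch below uses hypothetical website names and a hypothetical helper `additions_from_reference`; it assumes the list has already been collected (for instance from jouwaanbieding.nl) and that the manual exclusions are available as a set, and it does not model how either was obtained.

```python
def additions_from_reference(classified, reference, excluded):
    """Return websites on the reference list that are not yet classified as online stores.

    classified: websites already labelled as online stores (category C)
    reference:  websites on an external list of popular online stores
    excluded:   list entries judged by hand to belong in other categories
    """
    return (set(reference) - set(excluded)) - set(classified)


# Hypothetical example data; names are placeholders, not real findings.
category_c = {"shop-a.nl", "shop-b.nl"}
popular_stores = ["shop-a.nl", "shop-c.nl", "travel-d.nl", "shop-e.nl"]
manually_excluded = {"travel-d.nl"}  # e.g. better placed in another category

added = additions_from_reference(category_c, popular_stores, manually_excluded)
category_c |= added  # extend the category with the stores the classifier missed
print(sorted(added))    # ['shop-c.nl', 'shop-e.nl']
print(len(category_c))  # 4
```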

Finally, we employed the Call To Action (CTA) dataset to further refine the category. Of the six calls to action in this dataset (order, buy, view the shopping cart, make a reservation/booking, subscribe or register), three are most closely associated with online stores: order, buy and view the shopping cart. We analysed different combinations of these three calls to action in order to find a combination that added a significant number of online stores to the category without adding any websites that were not online stores. The best-performing set of calls to action was ‘buy’ and ‘view the shopping cart’. This choice of calls to action identified around 5,400 online stores which were not yet categorised as such.
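The search over combinations can be sketched as follows. This assumes (the text does not say) that a website qualifies for a combination only if it shows every call to action in it, and that false positives are judged against a hand-checked sample; the website names, labels and the `evaluate` and `best_combination` helpers are hypothetical. With three candidate calls to action there are only seven non-empty combinations, so trying all of them is cheap.

```python
from itertools import combinations

# The three calls to action the text associates most closely with online stores.
STORE_CTAS = ("order", "buy", "view shopping cart")

# Hypothetical websites: (calls to action found on the site, hand-checked label
# where True means the site really is an online store). Illustrative only.
websites = {
    "shop-a.nl": ({"buy", "view shopping cart"}, True),
    "shop-b.nl": ({"order", "buy", "view shopping cart"}, True),
    "restaurant-c.nl": ({"order", "make a reservation"}, False),
    "shop-d.nl": ({"buy", "view shopping cart", "register"}, True),
    "club-e.nl": ({"buy", "subscribe"}, False),
    "charity-f.nl": ({"view shopping cart", "subscribe"}, False),
}
already_in_category = {"shop-b.nl"}


def evaluate(combo):
    """Return (new online stores, false positives) if every website showing
    all calls to action in `combo` were added to the category."""
    hits = [site for site, (ctas, _) in websites.items()
            if set(combo) <= ctas and site not in already_in_category]
    new_stores = sum(1 for site in hits if websites[site][1])
    return new_stores, len(hits) - new_stores


def best_combination():
    """Among combinations that add no false positives, pick the one adding most stores."""
    candidates = []
    for size in range(1, len(STORE_CTAS) + 1):
        for combo in combinations(STORE_CTAS, size):
            new_stores, false_positives = evaluate(combo)
            if false_positives == 0:
                candidates.append((new_stores, combo))
    return max(candidates, key=lambda c: c[0], default=None)


print(best_combination())  # (2, ('buy', 'view shopping cart'))
```

In this toy data, ‘buy’ alone or ‘view shopping cart’ alone each pull in a non-store, while requiring both keeps the additions clean, which is the kind of trade-off the text describes.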

Comparison to SIC codes for online stores

The use of big data within this project offers a different perspective on the nature of Dutch businesses. It is particularly interesting to compare the nature of businesses according to their SIC codes with the categorisation of businesses in this study. In this box we analyse the SIC codes of all the businesses which are classified as online stores according to our definition. The results are shown below.

5.1.1.1 SIC codes of businesses classified as online stores

(Chart. Segment values: 35.1%, 1.9%, 2.0%, 2.5%, 2.6%, 12.7%, 29.3% and 13.9%. Legend: 4791: Online stores; 47: Retail excluding 4791 (Online stores); 46: Wholesale; 62: Support activities in the field of ICT; 74: Industrial design, photography, translation and other consultancy; 96: Wellness and other services: funeral activities; 45: Sale and repair of motor vehicles; Other.)
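A breakdown like the one in this figure can be computed by mapping each business's SIC code to the figure's groups and taking percentage shares. In this sketch the helper names `figure_group` and `shares`, the prefix-matching rule (four digits for 4791, the two-digit division otherwise) and the sample codes are assumptions for illustration; the sample output is not the study's data.

```python
from collections import Counter

# Two-digit divisions shown separately in the figure (labels follow its legend).
DIVISION_LABELS = {
    "47": "47: Retail excluding 4791",
    "46": "46: Wholesale",
    "62": "62: Support activities in the field of ICT",
    "74": "74: Industrial design, photography, translation and other consultancy",
    "96": "96: Wellness and other services: funeral activities",
    "45": "45: Sale and repair of motor vehicles",
}


def figure_group(sic_code: str) -> str:
    """Map a SIC code to one of the figure's groups (hypothetical helper)."""
    if sic_code.startswith("4791"):
        return "4791: Online stores"
    return DIVISION_LABELS.get(sic_code[:2], "Other")


def shares(sic_codes) -> dict:
    """Percentage of businesses per group, rounded to one decimal place."""
    counts = Counter(figure_group(code) for code in sic_codes)
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


# Hypothetical SIC codes of businesses classified as online stores.
sample = ["47911", "47912", "47190", "46900", "62020", "10710"]
result = shares(sample)
print(result["4791: Online stores"])  # 33.3
```

Checking 4791 before the two-digit division matters: otherwise every online store would fall into the broader retail group 47.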

CBS | Discussion Paper, 2016 | 14
