Measuring the internet economy in The Netherlands a big data analysis 2016 | 14
Retail Survey
The retail survey provides data on a sample of businesses with more than 10 employees
in the retail sector. The sample comprises 7,456 BUs and the data refer to 2015. If a
business operates an online store, the turnover from that online store is reported
separately, alongside the turnover for the whole business. However, turnover from online stores
is not reported for businesses in certain size classes or for certain SIC codes (retail and
wholesale). The retail survey data are used in the categorisation of online stores.
ICT Use Survey
This survey contains annual data on automation and the use of information and
communication technology (ICT) in companies in the Netherlands. The results describe,
among other things, the use of computers, the internet, electronic buying and selling,
software and ICT applications, and show the trends in these areas for the period since
2003. The survey is carried out on a sample of roughly 11,000 businesses drawn from the population
of businesses with at least 10 employees (which consists of approximately
60,000 businesses). We used data from this survey to cross-validate our results and improve
our methodology.
Other sources
Finally, we make use of several other publicly available sources of data from the internet.
For example, where lists of online stores are available online, we use them to check that our
method for identifying websites captures all of the most important online stores (see section
5.1.1). We also use lists of ICT businesses for a similar purpose (see section 5.1.2). These
sources allow us to check the plausibility of our results and to complement the other data
sources as necessary.
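The coverage check described above can be illustrated with a minimal sketch. It assumes that both the identified websites and the public reference list are available as plain lists of URLs or host names. The function names `normalise_domain` and `coverage` and the example domains are hypothetical and are not taken from the paper.

```python
from urllib.parse import urlparse


def normalise_domain(url: str) -> str:
    """Reduce a URL or bare host name to a lower-case host without 'www.'."""
    url = url.strip().lower()
    # urlparse only recognises the host when the string contains '//'
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def coverage(identified, reference):
    """Return the share of reference domains that were identified,
    together with the sorted list of reference domains that were missed."""
    found = {normalise_domain(u) for u in identified}
    ref = {normalise_domain(u) for u in reference}
    missing = sorted(ref - found)
    share = (len(ref) - len(missing)) / len(ref) if ref else 1.0
    return share, missing


# Hypothetical example data (not from the paper)
identified_stores = ["https://www.shop-a.example/", "shop-b.example"]
public_list = ["shop-a.example", "http://shop-b.example/home", "www.shop-c.example"]

share, missing = coverage(identified_stores, public_list)
print(f"coverage: {share:.0%}, missing: {missing}")
```

Any domains returned in `missing` would be candidates for manual inspection, to see why the identification method did not pick them up.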
5. Methodology
This section describes the steps to create a database from which all the results are derived.
Figure 5.1 shows a schematic summary of all the research steps.
We begin with the Dataprovider data on Dutch websites and allocate each website to a
category. We explain our method for this in section 5.1. Once websites have been allocated
to categories, we link the websites to the GBR. Linking to the GBR is
quite complex and employs several different methods. For some linking methods,
we can be more confident than for others that the link between the website and the BU is correct. Therefore,
each linking method is described in detail in section 5.2. At this stage, we have a database
in which a BU can have 1) no website, 2) one website or 3) multiple websites allocated to it.
This means that the database is not unique at the BU level. To represent
the economy accurately, the database needs to be unique at the BU level. Further, we need to translate
the categorisation of the websites attached to each BU into a classification of the BU itself. If,
for example, one BU has two websites, one belonging to category B and the other belonging
to category C, then the BU could be categorised as B or C. To deal with this, we develop a
series of decision rules which allow us to create a database that is unique at the BU level
and in which every BU is allocated to a category. These decision rules are explained in
CBS | Discussion Paper, 2016 | 14 16