08.10.2016 Views

Measuring the internet economy in The Netherlands: a big data analysis (2016 | 14)


Retail Survey

The retail survey provides data on a sample of businesses with more than 10 employees in the retail sector. The sample comprises 7,456 BUs and the data refers to 2015. If a business operates an online store, the turnover from that online store is reported separately, alongside the turnover for the whole business. However, turnover from online stores is not reported for businesses in certain size classes and for certain SIC codes (retail and wholesale). This data from the retail survey is used in the categorisation of online stores.

ICT Use Survey

This survey contains annual data on automation and the use of information and communication technology (ICT) in companies in the Netherlands. The results describe, among other things, the use of computers, the internet, electronic buying and selling, software and ICT applications, and show the trends in these phenomena since 2003. The survey is carried out on a sample of roughly 11,000 businesses drawn from the population of businesses with at least 10 employees (approximately 60,000 businesses). We used data from this survey to cross-validate our results and improve our methodology.

Other sources

Finally, we make use of several other publicly available sources of data from the internet. For example, where lists of online stores are available online, we use them to check that our method for identifying websites captures all of the most important online stores (see section 5.1.1). We also use lists of ICT businesses for a similar purpose (see section 5.1.2). These sources allow us to check the plausibility of our results and to complement the other data sources as necessary.
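The plausibility check against published lists of online stores can be pictured as a simple set comparison. The following is a minimal sketch only, not the procedure used in the paper: the function name `coverage`, the normalisation by lower-casing, and all domain names are assumptions made for illustration.

```python
def coverage(identified, reference):
    """Return (share of reference domains found, sorted list of missing domains).

    Both arguments are iterables of domain names. Comparison is done on
    stripped, lower-cased names (an assumed normalisation).
    """
    identified_norm = {d.strip().lower() for d in identified}
    reference_norm = {d.strip().lower() for d in reference}
    missing = sorted(reference_norm - identified_norm)
    if not reference_norm:
        # An empty reference list cannot reveal any gaps.
        return 1.0, missing
    share = 1 - len(missing) / len(reference_norm)
    return share, missing


# Hypothetical domains, for illustration only.
identified_stores = ["shop-a.example", "Shop-B.example", "shop-c.example"]
reference_list = [
    "shop-a.example",
    "shop-b.example",
    "shop-d.example",
    "shop-e.example",
]

share, missing = coverage(identified_stores, reference_list)
print(f"covered: {share:.0%}, missing: {missing}")
```

Domains that appear in the reference list but not in the identified set are the ones worth inspecting by hand, since they may point to stores that the identification method fails to capture.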

5. Methodology

This section describes the steps to create a database from which all the results are derived. Figure 5.1 shows a schematic summary of all the research steps.

We begin with the Dataprovider data on Dutch websites and allocate each website to a category. We explain our method for this in section 5.1. After the websites have been allocated to categories, we link them to the GBR. The linking procedure is quite complex and combines several methods, and for some methods we can be more confident than for others that the link between the website and the BU is correct. Therefore, each linking method is described in detail in section 5.2. At this stage, we have a database in which a BU can have 1) no website, 2) one website or 3) multiple websites allocated to it. This means that the database is not unique at the BU level. In order to accurately represent the economy, the database needs to be unique at the BU level. Further, we need to translate the categorisation of the websites attached to each BU into a classification of the BU itself. If, for example, one BU has two websites, one belonging to category B and the other to category C, then the BU could be categorised as either B or C. To deal with this, we develop a series of decision rules which allow us to create a database which is unique at the BU level and in which every BU is allocated to a category. These decision rules are explained in

CBS | Discussion Paper, 2016 | 14
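To make the uniqueness problem described in the methodology concrete, the sketch below collapses a list of website-to-BU links into one category per BU. It is an illustration under stated assumptions, not the paper's decision rules, which are not given in this section: the priority order (B before C), the `"no website"` label, and all identifiers are hypothetical.

```python
from collections import defaultdict

# Assumed priority order: a lower number wins. This is NOT taken from the paper.
CATEGORY_PRIORITY = {"B": 0, "C": 1}
NO_WEBSITE = "no website"  # hypothetical label for BUs without a linked website


def collapse_to_bu(bu_ids, links):
    """Return {bu_id: category} with exactly one entry per BU.

    bu_ids: iterable of all BU identifiers in the register.
    links:  iterable of (bu_id, website, category) tuples; a BU may occur
            zero, one or several times.
    """
    categories = defaultdict(set)
    for bu_id, _website, category in links:
        categories[bu_id].add(category)

    result = {}
    for bu_id in bu_ids:
        cats = categories.get(bu_id)
        if not cats:
            result[bu_id] = NO_WEBSITE
        else:
            # Pick the category with the highest assumed priority;
            # the category name breaks ties deterministically.
            result[bu_id] = min(cats, key=lambda c: (CATEGORY_PRIORITY[c], c))
    return result


# Hypothetical data: BU1 has no website, BU2 has one, BU3 has two.
bu_ids = ["BU1", "BU2", "BU3"]
links = [
    ("BU2", "site-1.example", "C"),
    ("BU3", "site-2.example", "C"),
    ("BU3", "site-3.example", "B"),
]

result = collapse_to_bu(bu_ids, links)
print(result)
```

Whatever rules are chosen in practice, the output has the two properties the text asks for: each BU appears exactly once, and each BU carries exactly one category.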
