08.10.2016 Views

Measuring the internet economy in The Netherlands a big data analysis 2016 | 14


Key combination 5

The final category consists of cases where more than 10 websites have the same CoC number. Some businesses may indeed have more than one website; for example, there are several online stores that each have several websites selling different products. In many cases, however, more than 10 websites link to the same CoC number because the CoC number on a given website does not correspond to the business behind the website. For example, a website may contain a list of businesses that supply a particular product, including the CoC number of each business, so a given CoC number can show up multiple times across different websites. Another example is hosting/web-design companies that display their own CoC number on the websites they host or designed. For these cases we incorporated a special decision rule: if more than 10 websites belonged to one CoC number and one of these websites was that of a hosting/web-design company, then this CoC number was allocated to category E. Of all the websites that could be linked to the GBR, 16% belong to this category.
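The special decision rule for key combination 5 can be illustrated with a minimal sketch. The threshold of 10 websites comes from the text; the record layout, the function name `find_category_e` and the `is_hosting` flag are assumptions made for illustration, since the paper does not describe here how hosting/web-design sites were recognised.

```python
from collections import defaultdict

MAX_SITES_PER_COC = 10  # threshold taken from the text


def find_category_e(site_records):
    """Return the CoC numbers that the special decision rule sends to category E.

    site_records: iterable of (hostname, coc_number, is_hosting) tuples.
    The is_hosting flag is a hypothetical input marking hosting/web-design sites.
    """
    hostnames_by_coc = defaultdict(set)
    hosting_cocs = set()
    for hostname, coc, is_hosting in site_records:
        hostnames_by_coc[coc].add(hostname)
        if is_hosting:
            hosting_cocs.add(coc)

    # Both conditions must hold: more than 10 distinct websites share the
    # CoC number, and at least one of them is a hosting/web-design site.
    return {
        coc
        for coc, hostnames in hostnames_by_coc.items()
        if len(hostnames) > MAX_SITES_PER_COC and coc in hosting_cocs
    }


# Made-up example data: three CoC numbers, only the first meets both conditions.
records = [(f"site{i}.example", "11111111", i == 0) for i in range(11)]
records += [(f"shop{i}.example", "22222222", False) for i in range(3)]
records += [(f"list{i}.example", "33333333", False) for i in range(11)]

category_e = find_category_e(records)
print(category_e)
```

In this made-up data, the second CoC number has too few websites and the third has no hosting/web-design site among its websites, so neither is selected.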

All websites that could not be linked by one of the 5 key combinations are excluded from the analysis from this point on. These unlinked websites include all the websites of private individuals, which fall outside the definition of the internet economy, as well as websites for which there was insufficient information to make a link to the GBR. The different key combinations are summarised in the following table.

5.2.1 Results of the merging process

Key combination | Description | Count | Percentage
1 | hostname (Dataprovider) = hostname (GBR) and CoC (Dataprovider) = CoC (GBR) | 272,000 | 32
2 | hostname (Dataprovider) = hostname (GBR) and CoC (Dataprovider) ≠ CoC (GBR) -> CoC (GBR) | 66,000 | 8
3 | hostname (Dataprovider) = hostname (GBR) and CoC (Dataprovider) ≠ CoC (GBR) -> CoC (Dataprovider) | 294,000 | 35
4 | hostname (Dataprovider) ≠ hostname (GBR) and CoC (Dataprovider) ≠ CoC -> CoC through e-mail or telephone number | 76,000 | 9
5 | CoC (Dataprovider) > 10 hostnames | 132,000 | 16
Total matched | | 840,000 | 100
Total not matched | | 1,948,000 |
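As a quick consistency check on the table, the percentages can be recomputed from the counts. The sketch below uses only the counts reported above; the variable names are illustrative. The overall share of websites that could be matched is not stated in this section and is derived here purely by arithmetic.

```python
# Counts per key combination, as reported in the table.
matched_counts = {1: 272_000, 2: 66_000, 3: 294_000, 4: 76_000, 5: 132_000}
not_matched = 1_948_000

total_matched = sum(matched_counts.values())

# Share of each key combination among the matched websites, in whole percent.
shares = {key: round(100 * n / total_matched) for key, n in matched_counts.items()}

# Share of all websites (matched plus not matched) that could be linked.
match_rate = 100 * total_matched / (total_matched + not_matched)

print(total_matched)         # 840000
print(shares)                # {1: 32, 2: 8, 3: 35, 4: 9, 5: 16}
print(round(match_rate, 1))  # 30.1
```

The recomputed shares agree with the Percentage column, and the counts imply that roughly 30% of all websites in the data set could be linked.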

5.3 Decision rules<br />

Decision rules are formulated to determine the role of BUs in the internet economy based on the information we have gained about the website(s) of the BU. Decision rules would not be necessary if every business had only one website, only one CoC number and only one BU. In practice the situation is more complicated because businesses often have multiple websites. Consider, for example, a large business with separate websites for customers, businesses, careers and sales. It is likely that these websites will not all fall into the same category of the internet economy. Additionally, due to the structure of the GBR, multiple CoC numbers often link to a single BU. Furthermore, multiple BUs can link to one EG when large companies have diverse activities. The following figure presents these different problems.
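The many-to-one relationships described above (website to CoC number, CoC number to BU, BU to EG) can also be sketched as simple lookup tables. All identifiers and category labels below are hypothetical and serve only to show why a BU can end up with websites in more than one category, which is the situation the decision rules have to resolve.

```python
from collections import defaultdict

# Hypothetical lookup tables; none of these identifiers come from the paper.
website_to_coc = {
    "customers.example": "coc-1",
    "careers.example": "coc-1",
    "sales.example": "coc-2",
    "other.example": "coc-3",
}
coc_to_bu = {"coc-1": "bu-1", "coc-2": "bu-1", "coc-3": "bu-2"}
bu_to_eg = {"bu-1": "eg-1", "bu-2": "eg-1"}
website_category = {
    "customers.example": "category-x",
    "careers.example": "category-y",
    "sales.example": "category-x",
    "other.example": "category-y",
}


def categories_per_bu(website_to_coc, coc_to_bu, website_category):
    """Collect the set of website categories observed for each BU."""
    result = defaultdict(set)
    for site, coc in website_to_coc.items():
        bu = coc_to_bu.get(coc)
        if bu is not None:  # skip websites whose CoC number has no BU
            result[bu].add(website_category[site])
    return dict(result)


per_bu = categories_per_bu(website_to_coc, coc_to_bu, website_category)

# BUs whose websites fall into more than one category need a decision rule.
ambiguous = {bu: cats for bu, cats in per_bu.items() if len(cats) > 1}

# Several BUs can in turn belong to the same EG.
bus_per_eg = defaultdict(set)
for bu, eg in bu_to_eg.items():
    bus_per_eg[eg].add(bu)

print(ambiguous)
print(dict(bus_per_eg))
```

Here `bu-1` collects websites from two CoC numbers and two categories, so it cannot be classified without a rule that picks one role for the BU.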

CBS | Discussion Paper, 2016 | 14
