19.08.2013 Views

INFORMATION AND SOCIAL ANALYSIS OF REDDIT - SNAP

3.5 CRAWLING RESULTS

Table 1 below summarizes the raw data collected by the crawl.

             Subreddits   Listings   Submissions
Disk Size    21 MB        640 MB     4.3 GB
Requests     243          2,348      257,624
Entries      24,298       227,792    3,851,223

Table 1 – Crawled data statistics.

Note: The Listings column covers only the 1% of subreddits whose listings were crawled.
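
The figures in Table 1 can be turned into per-request and per-entry ratios, which make the three crawl stages easier to compare. The following is a minimal sketch: the dictionary layout and the function name `derived_ratios` are illustrative, and the byte counts assume decimal units (1 MB = 10^6 bytes), which the source does not specify.

```python
# Figures copied from Table 1. Byte counts assume decimal units
# (1 MB = 10**6 bytes, 1 GB = 10**9 bytes); the source does not state the base.
CRAWL_STATS = {
    "Subreddits": {"disk_bytes": 21 * 10**6, "requests": 243, "entries": 24_298},
    "Listings": {"disk_bytes": 640 * 10**6, "requests": 2_348, "entries": 227_792},
    "Submissions": {"disk_bytes": 4_300 * 10**6, "requests": 257_624, "entries": 3_851_223},
}


def derived_ratios(stats):
    """Return entries per request and bytes per entry for each crawl stage."""
    return {
        stage: {
            "entries_per_request": row["entries"] / row["requests"],
            "bytes_per_entry": row["disk_bytes"] / row["entries"],
        }
        for stage, row in stats.items()
    }


if __name__ == "__main__":
    for stage, ratios in derived_ratios(CRAWL_STATS).items():
        print(
            f"{stage:12s} {ratios['entries_per_request']:8.1f} entries/request "
            f"{ratios['bytes_per_entry']:10.1f} bytes/entry"
        )
```

Under these assumptions the subreddit crawl averages just under 100 entries per request, while the submission crawl averages roughly 15.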

Because JSON is a compact, dense format, the results take up much less disk space than an HTML-based crawl would. A simple comparison of the HTML and JSON versions of the same raw data shows that JSON is roughly four times smaller. Besides saving space, the JSON data is also easier to work with in the Analysis section.
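
One way such a size comparison could be carried out is to serialize the same record both ways and compare byte counts. The sketch below is illustrative only: the sample record, the HTML markup, and the function names are hypothetical, and the ratio it prints depends entirely on that made-up markup, so it does not reproduce the factor reported above.

```python
import json
from html import escape


def to_json_bytes(record):
    """Serialize a record as compact JSON (no extra whitespace)."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def to_html_bytes(record):
    """Serialize the same record as a hypothetical HTML fragment."""
    rows = "".join(
        f'<div class="field"><span class="name">{escape(str(key))}</span>'
        f'<span class="value">{escape(str(value))}</span></div>'
        for key, value in record.items()
    )
    return f'<div class="entry">{rows}</div>'.encode("utf-8")


def size_ratio(record):
    """Return the HTML size divided by the JSON size for one record."""
    return len(to_html_bytes(record)) / len(to_json_bytes(record))


if __name__ == "__main__":
    # Hypothetical record; the field names are placeholders, not crawl data.
    sample = {"title": "Example post", "score": 42, "num_comments": 7}
    print(f"HTML is {size_ratio(sample):.1f}x the size of JSON for this record")
```

A real comparison would measure complete crawled pages, where navigation and styling markup add further overhead on the HTML side.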

4 SUBREDDITS ANALYSIS

The listing of subreddits provides the highest-level view of Reddit. Table 2 below shows some general statistics of the crawl. Note the large difference between the subscriber count of the largest subreddit and that of the average subreddit. This is a clear indication that the overwhelming majority of subreddits have few or no subscribers.

Figure 1 – Subreddits versus Age<br />

Count Subreddits     24,298
Max Subscribers      1,140,436
Avg Subscribers      1,077
S.D. Subscribers     23,514

Table 2 – Subreddits
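
The gap between the largest and the average subreddit can be quantified directly from the Table 2 figures. The sketch below computes a few simple skew indicators; the function name and the choice of indicators are illustrative rather than taken from the paper, and the total subscription count is only approximated as mean times count, since the mean is reported rounded.

```python
# Figures copied from Table 2.
COUNT_SUBREDDITS = 24_298
MAX_SUBSCRIBERS = 1_140_436
MEAN_SUBSCRIBERS = 1_077
SD_SUBSCRIBERS = 23_514


def skew_indicators(count, maximum, mean, sd):
    """Return simple ratios that show how far the maximum sits from the mean."""
    approx_total = mean * count  # approximate: the reported mean is rounded
    return {
        # Standard deviation relative to the mean; values far above 1
        # point to a heavily skewed distribution.
        "coefficient_of_variation": sd / mean,
        # How many times larger the biggest subreddit is than the average one.
        "max_over_mean": maximum / mean,
        # Distance of the maximum from the mean, in standard deviations.
        "max_z_score": (maximum - mean) / sd,
        # Approximate share of all subscriptions held by the largest subreddit.
        "max_share_of_total": maximum / approx_total,
    }


if __name__ == "__main__":
    indicators = skew_indicators(
        COUNT_SUBREDDITS, MAX_SUBSCRIBERS, MEAN_SUBSCRIBERS, SD_SUBSCRIBERS
    )
    for name, value in indicators.items():
        print(f"{name:26s} {value:10.3f}")
```

With these inputs the standard deviation is about 21.8 times the mean, and the largest subreddit has roughly 1,059 times the average number of subscribers.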

Figure 2 – Subscriptions versus Age<br />
