
Basic level CDT307 

Implementation of data collection tools using NetFlow 

for statistical analysis at the ISP level 

Bachelor's Thesis in Computer Science 

by 

Daniel Karlström 

Department of Innovation, Design and Engineering

Mälardalen University

SE-722 18 Västerås, Sweden


Supervisors: Stefan Löfgren, Mälardalen University, IDT

Fredrik Holmqvist, Internet 2 Business KB

Examiner: Mats Björkman

Västerås, May 23rd, 2012

Abstract 

Defending against DoS and DDoS attacks is difficult; finding and filtering out illegitimate traffic
from the legitimate flow is nearly impossible. Steps to mitigate or even block the traffic can only
be taken once the IP addresses of the attackers are known. This is achievable by monitoring the
flows to and from the target and identifying the attackers' IP addresses, allowing the company or
its ISP to block the addresses by blackholing them (also known as null routing).

Using the IP accounting and monitoring tool “pmacct”, this thesis investigates whether the pmacct
suite is suited for tracking and mitigating DDoS attacks in larger installations, such as at an
Internet Service Provider (ISP). Potential problems are the amount of traffic that needs to be
analyzed and the computational power required to do so. This thesis also provides information about
the pmacct suite at large.

The conclusions are positive, indicating that pmacct does scale up to handle larger installations
when given careful consideration and planning.


Sammanfattning 

Defending against DoS and DDoS attacks is difficult; finding and filtering out illegitimate traffic
from the legitimate flow is nearly impossible. Countermeasures against such an attack can only be
taken once the attackers' IP addresses are known. This can be achieved by monitoring the traffic
flow between the target of the attack and the attackers and observing which hosts send the most
data, thereby identifying the attackers. This allows the company or its ISP to block the traffic
from these IP addresses by forwarding it to nowhere. This is called blackhole routing or null
routing.

Using the accounting and monitoring program pmacct, this thesis aims to investigate whether the
pmacct suite is suited for larger installations when it comes to tracking and preventing DDoS
attacks, such as at an Internet Service Provider or similar. Potential problems are that the amount
of traffic to be analyzed becomes too large and too demanding. This thesis also covers the pmacct
tool itself.

The conclusions are promising, indicating that pmacct has the potential to handle such large
environments given careful planning.


Acknowledgments 

I would like to thank Fredrik Holmqvist, Internet 2 Business KB, who was my supervisor at the
company, and also to thank the company at large for creating the opportunity for me to write this
bachelor's thesis. Fredrik, and others at the company, were always willing to offer a helping hand
and provide information when needed.

Many thanks as well to my supervisor Stefan Löfgren at the Department of Innovation, Design and
Engineering at Mälardalen University. His guidance helped shape this report, vastly improving it
into what you now see before you.

Last but not least, I would like to reach out and thank Paolo Lucente, the creator of the pmacct
suite, upon which this thesis is based. His advice on configuration issues, given over a continuous
e-mail correspondence, was essential, and his eagerness to help people should be recognized.


Table of Contents 

Abstract
Sammanfattning
Acknowledgments
1 Introduction
    1.1 Thesis background
    1.2 Related work
    1.3 Problem formulation
    1.4 Purpose
    1.5 Software used
    1.6 Organization
    1.7 Limitations
2 Theoretical background
    2.1 Internet Protocol
    2.2 Apache HTTP Server
    2.3 Autonomous Systems
    2.4 NetFlow
    2.5 sFlow
    2.6 IPFIX
    2.7 Linux
        2.7.1 Ubuntu
        2.7.2 Cron
        2.7.3 Scripting
        2.7.4 Libpcap
    2.8 Quagga
    2.9 MRTG
    2.10 pmacct
    2.11 Baselining
3 Problem analysis
    3.1 Collecting the data
        3.1.1 Protocol decision
        3.1.2 Choice of NetFlow-collector
        3.1.3 Network configuration
    3.2 Storing the data
    3.3 Displaying the data
4 Implementation
    4.1 Choice of method
        4.1.1 Collecting the data
        4.1.2 Storing the data
        4.1.3 Displaying the data
    4.2 Method criticism
    4.3 Solution
        4.3.1 Collecting the data - solution
        4.3.2 Storing the data – solution
        4.3.3 Displaying the data – solution
    4.4 Problems
        4.4.1 Problem with collecting the data
        4.4.2 Problems with storing the data
            4.4.2.1 pmacct
            4.4.2.2 MySQL
            4.4.2.3 Execution time of said custom-made script
        4.4.3 Problems with displaying the data
            4.4.3.1 MRTG graphing incorrectly
5 Results
    5.1 Analysis of results
    5.2 Recommendations
    5.3 Future work
6 Summary and conclusions
7 References
8 Appendix
    8.1 E-mail conversation with Paolo Lucente
    8.2 Enabling NetFlow on Cisco devices
    8.3 Configuration of NetFlow on the small router
    8.4 Configuration of Quagga
    8.5 Configuration of pmacct
        8.5.1 agent.map
        8.5.2 pretag.map
    8.6 Bash Script
    8.7 Python Script
    8.8 Script for MRTG
    8.9 Custom-made script to push the data into the database
    8.10 RRDTOOL-script (rrdtool.sh)


Listen to the music, not the song

Definitions & Terminology 

AS / ASN          Autonomous System (Number), a unique identifier which every organization must
                  have in order to be part of the Internet. Prominently used by Internet Service
                  Providers (ISPs).
BGP               Border Gateway Protocol, the routing protocol used between Internet Service
                  Providers and similar large installations.
Cacti             An open source web-based network monitoring and graphing tool.
Datagram          A block of data.
Default route     A packet forwarding rule which is used when no other route can be determined
                  for a particular address. Also called a gateway.
DoS               Denial of Service, an attack that consists of overwhelming the target with
                  information.
DDoS              Distributed Denial of Service, an escalated version of DoS attacks that utilizes
                  several computers working in unison to attack a target.
GNS3              An add-on for Dynamips, which emulates Cisco routers.
Header            The part of an IP packet which contains routing-relevant information, such as
                  protocol version, time to live before the packet gets discarded, and source and
                  destination address.
HTTP              Hypertext Transfer Protocol, the protocol used for browsing web pages on the web.
HTTPS             A derivative of HTTP which adds a layer of security by encrypting traffic
                  between the sender and receiver.
IETF              Internet Engineering Task Force.
IGP               Interior Gateway Protocol, a routing term for protocols that are used to
                  exchange routing information within an autonomous system. Examples of these
                  are RIP, OSPF and IS-IS.
IPFIX             A network protocol developed for collecting IP traffic information, created by
                  the IETF in an attempt to unite the different flow protocols into a universal
                  standard.
ISP               Internet Service Provider.
Layer 3 routing   Routing on layer 3 of the OSI model means that the routing decisions use IP
                  addresses to reach the desired networks.
MRTG              Monitoring tool for the traffic load (usage) on network links.
MySQL             An open source relational database management system (RDBMS); currently the
                  world's most used RDBMS.
NetFlow           A network protocol developed for collecting IP traffic information. Only
                  available on Cisco devices.
Null route        Also called a blackhole route, a null route is a route that goes nowhere, in
                  effect discarding packets destined for a particular network. An example of
                  discarding traffic to the network 192.168.1.0/24 on Cisco IOS is as follows:
                  ip route 192.168.1.0 255.255.255.0 null0
OSI model         A standardized method of viewing the functions of a communications system by
                  separating them into different layers.
IPv4              Internet Protocol version 4, the most commonly used inter-networking method of
                  communicating on the Internet.
IPv6              Internet Protocol version 6, the next step up from IPv4.
pmacct            The name of the program suite as well as of a tool in the suite itself
                  (pmacctd). Other included tools are nfacctd (NetFlow and IPFIX), sfacctd
                  (sFlow) and uacctd.
Quagga            An open source network routing software suite providing implementations of
                  OSPF, RIP, BGP and IS-IS for Unix-like systems.
Route             A route is used by routers and contains the best path for traffic to take in
                  order to get from network A to network B.
Router            A device that forwards data packets between computer networks.
Router, core      A core router is a router that is designed to operate in the network's
                  backbone, or core. All the traffic from the network flows through here, with
                  the core router sending the packets to their destination.
RRDtool           Short for round-robin database tool, RRDtool is designed for handling
                  time-series data such as bandwidth data, temperatures and so on. It stores
                  information in a round-robin database.
sFlow             A network protocol developed for collecting IP traffic information. Unlike
                  NetFlow, sFlow is available on devices from a multitude of manufacturers.
SQL               Structured Query Language. A programming language designed for managing data
                  in an RDBMS.
TCP               Transmission Control Protocol.
Tier 2 Carrier    An ISP that peers with some networks but still purchases IP transit from
                  others to reach the rest of the Internet.
TOS               Type of Service, a field in the header of an IPv4 packet.
UDP               User Datagram Protocol.
VirtualBox        Virtualization software for operating systems.
VMware            Virtualization software for operating systems.


1 Introduction 


This introductory chapter describes the background of this thesis as well as earlier work done in
this area. The problem is formulated, followed by the objectives of the thesis work. With this
defined, the different methods are presented and evaluated. Towards the end of the chapter the
structure of the project is described: its organization, its limitations and the outline of the
rest of the report.

1.1 Thesis background 

The need for protection against cyber attacks increases every day as more and more companies
become ever more tightly tied to the Internet [1]. The Internet has become a reliable medium in
which one needs to be seen in order to compete. Various security measures exist today to secure and
harden a company network and its perimeter against attacks from the outside world. These attacks
vary in both difficulty and execution time. An attack might start with reconnaissance, which could
be as simple as dropping a few infected USB memory sticks in the parking lot in an attempt to get
an inside look at the network. Attacks that are particularly hard to protect against, however, are
the so-called Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks [3]. These
attacks aim to flood the targeted server(s) with bogus traffic in an attempt to overwhelm their
resources. If an attacker simultaneously sends a large number of requests to a web server, the
server has to respond to all of them. This forces legitimate requests to be put on hold while the
server is busy handling a massive number of fake requests. If the stream of requests from the
attacker does not cease and the server is kept busy, users will experience the site as slow or
unresponsive.

Another type of attack involves starting a connection and not completing it, leaving it in a
half-open state. When a TCP connection is made, the client starts by sending a SYN (synchronize)
packet to the server. The server responds with a SYN-ACK (synchronize-acknowledgment), confirming
the request. Finally, the client sends its own ACK, confirming the connection. The server waits for
this final acknowledgment; if a client never sends it, the server keeps the half-open connection
waiting until it times out, tying up resources in the meantime. If enough such connections are
made, the server's resources will be exhausted and it will be unable to process legitimate
requests [24].
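The effect of half-open connections can be illustrated with a toy model. The sketch below is not a network implementation: it assumes a server with a fixed-size queue of pending handshakes, and the class name `Listener`, the backlog size and the client names are all hypothetical. It only shows how SYNs that are never followed by an ACK can crowd out a legitimate client.

```python
class Listener:
    """Toy model of a TCP listener with a bounded queue of half-open connections."""

    def __init__(self, backlog):
        self.backlog = backlog      # assumed maximum number of half-open connections
        self.half_open = set()      # clients that sent SYN but not yet the final ACK
        self.established = set()    # clients that completed the handshake

    def on_syn(self, client):
        """Client sends SYN; the server answers with SYN-ACK if it has room."""
        if len(self.half_open) >= self.backlog:
            return False            # queue full: the SYN is dropped
        self.half_open.add(client)
        return True

    def on_ack(self, client):
        """Client sends the final ACK, completing the three-way handshake."""
        if client in self.half_open:
            self.half_open.remove(client)
            self.established.add(client)
            return True
        return False


def simulate(backlog=4, attacker_syns=10):
    """Return (was the legitimate SYN accepted, number of half-open connections)."""
    server = Listener(backlog)
    # The attacker sends SYNs from many addresses and never sends the final ACK.
    for i in range(attacker_syns):
        server.on_syn(f"attacker-{i}")
    # A legitimate client now tries to connect.
    accepted = server.on_syn("legit-client")
    return accepted, len(server.half_open)


if __name__ == "__main__":
    accepted, pending = simulate()
    print("legitimate SYN accepted:", accepted)
    print("half-open connections:", pending)
```

With the default values the queue is already full when the legitimate client arrives, so its SYN is dropped; a real server would additionally expire old entries after a timeout, which this model leaves out.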

One method of mitigating the effects of these types of attacks is to identify the IP addresses of
the attackers involved and block them from the network. To accomplish this, one needs to collect
traffic information from the network, separate bogus traffic from legitimate traffic and act
accordingly.

One suite of such data collection tools is pmacct [4], a collector of network traffic information.
It works by collecting information about senders and recipients throughout the network, including
source and destination addresses. With the added ability to track BGP AS numbers, aggregating all
flows from a particular ISP effectively monitors the traffic sent to and from that ISP. In this
thesis the pmacct suite is tested and evaluated at an Internet Service Provider. It is first
implemented and configured on a small scale in the provider's network and then evaluated. If the
results are positive, pmacct will be implemented throughout the entire core network.

The testing takes place in Stockholm, at an Internet Service Provider called Internet 2 Business
KB (I2B). Internet 2 Business Sverige KB was founded in the second half of 2009 by Fredrik
Holmqvist, who remains at the company as its CEO to date. The company is situated in Stockholm,
Sweden, and its network consists of about 120 devices (switches and routers) in several different
countries including Norway, Denmark, England, Germany and Holland. Its primary clients are
corporations, other ISPs and organizations.

1.2 Related work 

There are instances of pmacct being used by individuals or small businesses. There are fewer
documented instances of pmacct being deployed in larger installations, although it has been done
successfully. According to the creator of pmacct, Paolo Lucente, when such a case study has been
carried out, the results are usually kept secret by the company in question. A company might not
want to expose the topology of its network since doing so poses a security risk (see the appendix,
chapter 8.1).

Chris Wilson wrote a report about bandwidth management for AfNOG in 2010
(https://nsrc.org/workshops/2010/nsrc-unan-leon/raw-attachment/wiki/Agenda/afnog-bmopresentation-2010.pdf)
in which he presented a working configuration.

1.3 Problem formulation 

The issue at hand is to determine whether pmacct is suitable for larger installations when it comes
to collecting network traffic information for the purpose of mitigating DDoS attacks. The thesis
investigates whether pmacct scales with the company as it grows and whether the hardware
requirements grow at the same pace. It also provides information about the pmacct suite at large.

1.4 Purpose 

The purpose of this thesis is to investigate whether pmacct is a viable option for gathering
network traffic information in larger installations, i.e. at an Internet Service Provider (ISP).
Implementation of both IPv4 and IPv6 is desired, with IPv4 having higher priority. Additionally,
pmacct will be implemented as a NetFlow collector in conjunction with MRTG for visual presentation
of the data. An ISP is an ideal setting for this due to both its size and its complexity, and will
hopefully provide a definitive answer to the question at hand.

1.5 Software used 

The following software tools were used in this thesis. They were chosen for their ease of use and
familiarity.

Software          Purpose
Bash              Scripting language
GNS3              Router emulator software
Linux (Ubuntu)    Operating system
MRTG              Load monitoring and graphing software
OpenOffice        Word processing
pmacct            Network traffic collector
PuTTY             SSH client
Python            Scripting language
Quagga            Routing software suite
RRDTOOL           Creating files for MRTG
VMware            OS virtualization software
TShark            Network protocol analyzer
phpMyAdmin        Web-based MySQL administration

1.6 Organization

This bachelor's thesis was written for Internet 2 Business KB (simply referred to as I2B in this
thesis), which has provided much guidance. Theoretical knowledge and experience have foremost been
provided by the Department of Innovation, Design and Engineering at Mälardalen University.

1.7 Limitations 

Network monitoring software is a large subject in itself, touching many aspects of computer
science. Because of this, a set of limitations must be enforced; the project is limited to a total
of 10 weeks of full-time work, with a few of those weeks set aside for the report itself.

• This thesis will focus primarily on IPv4 while only touching on the subject of IPv6; there was
not enough time to complete both.

• This thesis will implement a 'proof of concept' solution on a small router and then migrate this
configuration to the core router, should it scale appropriately.


2 Theoretical background 


In this chapter, the various technologies used in the implementation section of this thesis will be 

described. 

2.1 Internet Protocol 

The Internet Protocol (IP) is the most commonly used protocol for sending datagrams from, to and 

between networks. IP addresses are used by routers and routing protocols to send traffic from the 

source host to the destination host, using the best path available. 

Illustration 1: IP traffic overview - sending data from host A to host B 

IPv4 

IPv4, the fourth revision of the Internet Protocol, was specified in RFC 791 in 1981 [16]. It is
still the most commonly used Internet Layer protocol as of 2012. Based on a 32-bit addressing
scheme, IPv4 allows for a maximum of 2^32 (4,294,967,296) unique addresses. An address is written
as four dot-separated sections, each containing an 8-bit number, which is why a section is also
referred to as an octet. Each octet is in the range 0 to 255 or, written in binary, 00000000 to
11111111.
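The relationship between the dotted notation, the octets and the underlying 32-bit number can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the function name `describe_ipv4` and the sample address are chosen for illustration only.

```python
import ipaddress


def describe_ipv4(text):
    """Return the 32-bit integer and the four octets (decimal and binary) of an address."""
    addr = ipaddress.IPv4Address(text)
    octets = list(addr.packed)                    # four values, each in the range 0..255
    binary = [format(o, "08b") for o in octets]   # each octet written as 8 binary digits
    return int(addr), octets, binary


if __name__ == "__main__":
    value, octets, binary = describe_ipv4("192.168.1.10")
    print("as a 32-bit integer:", value)
    print("octets:", octets)
    print("binary:", ".".join(binary))
    print("size of the address space:", 2 ** 32)
```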

The header of an IPv4 packet is embedded in every packet sent and received, even if several packets
are destined for the same host. This helps to decrease the risk of packet loss due to a downed
router along the way, since every packet carries its own destination and can be routed
independently along another path. The fields that make up the IPv4 header are shown below; in the
original layout, the fields relevant to NetFlow are color coded:

Version | IHL | DSCP | ECN | Total length
Identification | Flags | Fragment Offset
Time To Live | Protocol | Header Checksum
Source IP address
Destination IP address
Options (rarely used)
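To make the header layout above concrete, the following sketch decodes the fixed 20-byte part of an IPv4 header with Python's `struct` module. The function names and the sample header are hypothetical; the sample is built by hand with a placeholder checksum of zero rather than captured from a real network.

```python
import socket
import struct

# Network byte order: two bytes, three 16-bit words, two bytes, one 16-bit word,
# followed by the 4-byte source and destination addresses (20 bytes in total).
IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"


def parse_ipv4_header(data):
    """Decode the fixed 20-byte part of an IPv4 header into a dictionary."""
    (ver_ihl, dscp_ecn, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(IPV4_HEADER_FORMAT, data[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,                  # header length in 32-bit words
        "dscp": dscp_ecn >> 2,
        "ecn": dscp_ecn & 0x03,
        "total_length": total_length,
        "identification": identification,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,                   # for example 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


def build_sample_header():
    """Hand-made example header: TCP, TTL 64, 'don't fragment' set, checksum left at 0."""
    return struct.pack(IPV4_HEADER_FORMAT, 0x45, 0, 40, 1, 0x4000, 64, 6, 0,
                       socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))


if __name__ == "__main__":
    for field, value in parse_ipv4_header(build_sample_header()).items():
        print(f"{field}: {value}")
```

Options, when present, follow these 20 bytes and are not decoded here.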

Due to this limited address space and the massive increase in network-capable devices, e.g. laptops
and handheld devices, the pool of unallocated IPv4 address blocks was exhausted in 2011, when the
last free block was allocated [17][18].

IPv6 

Continuing the specification work by the IETF, Internet Protocol version 6 was drafted in 1998, a
full 17 years after its predecessor. Based on a 128-bit addressing scheme, IPv6 allows for a
maximum of 2^128 (approximately 3.4 × 10^38) addresses [19]. An address is divided into eight
sections separated by colons, with a four-digit hexadecimal number in each section. An address can
be shortened by omitting leading zeroes within a section, which reduces an all-zero section to a
single zero, and by replacing one run of consecutive all-zero sections with a double colon (::).
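The shortening of IPv6 addresses can be tried out with Python's standard `ipaddress` module, which offers both the shortened and the fully written-out form. A minimal sketch; the helper names `shorten` and `expand` are hypothetical, and the sample address comes from the range reserved for documentation.

```python
import ipaddress


def shorten(text):
    """Return the shortest standard notation of an IPv6 address."""
    return ipaddress.IPv6Address(text).compressed


def expand(text):
    """Return the full notation: eight sections of four hexadecimal digits."""
    return ipaddress.IPv6Address(text).exploded


if __name__ == "__main__":
    full = "2001:0db8:0000:0000:0000:ff00:0042:8329"
    print(shorten(full))
    print(expand("2001:db8::ff00:42:8329"))
```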

2.2 Apache HTTP Server 

The Apache HTTP Server is a web server application, serving web pages to users over HTTP. It is
released free of charge and is developed and maintained by the Apache Software Foundation. The
Apache server runs on most operating systems, including Unix and Linux variants as well as
Microsoft Windows. It consists of a core program which handles the requests and of modules which
extend its features, and is as such both memory efficient and modular. [31]

The Apache web server is used by over 448 million sites all over the world, with a market share of
over 64% as of March 2012 according to the site Netcraft.com [32].

2.3 Autonomous Systems 

An ISP must have at least one Autonomous System (AS) in order to be part of the Internet. Some
larger ISPs even have several. These ASes are used to mark the boundaries of any particular ISP:
which networks fall under a particular ISP's administration and which do not, what the routing
policy is for the networks in question and so on [20]. Routing between different ASes is done with
an Exterior Gateway Protocol (EGP), while the internal traffic within an AS is routed by using an
Interior Gateway Protocol (IGP). The most prominent EGP is the Border Gateway Protocol, or BGP for
short [21][22]. Besides routing traffic between ASes, BGP is also capable of routing inside an AS
itself. It is in these situations called External BGP (EBGP) and Internal BGP (IBGP), respectively.
Routing inside an AS is however usually left to other protocols such as OSPF [23] or IS-IS [25],
partly due to their faster convergence.


Illustration 2: Shortening an address by removing zeroes


RFC 1930 provides guidelines for deciding whether or not one needs a public AS of one's own. In
essence, it is only needed if the network is connected to more than one AS (i.e. multi-homed) or if
its routing policy differs from that of the service provider. If not, the network should be placed
in an AS of the provider. [20]

“An AS is a connected group of one or more IP prefixes run by one or more network operators 

which has a SINGLE and CLEARLY DEFINED routing policy.” [20] 

BGP 

BGP is the primary routing protocol on the Internet and the most commonly used EGP. It became the
most prominent protocol partly because it is both very reliable and very scalable. At the time of
writing, the BGP routing table consists of just above 400,000 routes (see illustration 14). BGP is
also able to merge several routes into one or more routes in a process called supernetting, which
saves memory by keeping the size of the routing table down.
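Supernetting can be illustrated outside of BGP with Python's standard `ipaddress` module, whose `collapse_addresses` function merges adjacent prefixes into the smallest equivalent set. This is only a sketch of the aggregation idea, not of how a BGP implementation performs it; the function name `aggregate` and the prefixes are chosen for illustration.

```python
import ipaddress


def aggregate(prefixes):
    """Merge prefixes into the smallest list of networks covering exactly the same addresses."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    # Four adjacent /24 networks collapse into a single /22 route.
    print(aggregate(["192.168.0.0/24", "192.168.1.0/24",
                     "192.168.2.0/24", "192.168.3.0/24"]))
    # Networks that are not adjacent cannot be merged and stay separate.
    print(aggregate(["10.0.0.0/24", "10.0.2.0/24"]))
```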

Since an EGP routes between ASes, the amount of network chatter must be kept to a minimum to allow
for better use of the available bandwidth. This is why BGP, unlike some protocols, does not send
periodic updates about the network topology. BGP instead notifies its neighbors about a downed
network when the event occurs. This method of updating the network topology is known as triggered
updates.

BGP works by establishing connections to other BGP neighbors, called peers, on TCP port 179. Once
a connection is established, BGP begins by transmitting its entire routing table to its neighbor,
unless configured otherwise. Any further updates are sent by triggered events only. BGP sends a
keep-alive message every 30 seconds to verify the connectivity of its neighbors. If the timers
expire and no connection to a neighbor is re-established, the routes learned from that neighbor
are marked as down and removed from the routing table, followed by a triggered update to the
remaining neighbors.
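The timer behavior described above can be sketched as a small state model. The class below is hypothetical and heavily simplified: the keep-alive interval of 30 seconds is taken from the text, while the hold time of 90 seconds (three missed keep-alives) is an assumption made for this example. Time is passed in explicitly as a number of seconds so that the model is easy to follow.

```python
class PeerSession:
    """Simplified model of how a BGP speaker tracks whether a neighbor is alive."""

    def __init__(self, keepalive=30, hold_time=90):
        self.keepalive = keepalive    # seconds between keep-alive messages (from the text)
        self.hold_time = hold_time    # assumed: silence tolerated before the peer is down
        self.last_heard = 0           # time of the most recent keep-alive
        self.up = True

    def on_keepalive(self, now):
        """Record a keep-alive; a session that is already down must be re-established."""
        if self.up:
            self.last_heard = now

    def check(self, now):
        """Mark the peer as down once the hold time has passed without a keep-alive."""
        if self.up and now - self.last_heard > self.hold_time:
            self.up = False           # its routes would now be withdrawn
        return self.up


if __name__ == "__main__":
    session = PeerSession()
    session.on_keepalive(30)
    print("t=100s, peer up:", session.check(100))
    print("t=121s, peer up:", session.check(121))
```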

BGP differs from the other routing protocols by being classified as neither a distance-vector
protocol nor a link-state protocol. BGP bases its routing decisions on attributes instead. These
attributes are divided into the groups well-known mandatory and well-known discretionary [54][55]:

• Well known mandatory 

Must appear in every UPDATE message and must be supported by every BGP software 

implementation. 

• AS_PATH: The AS_PATH attribute is a list of all the ASes a particular route has to take 

to reach its destination. Separated by spaces, this list can contain several ASes if the 

distance is big enough. 

• ORIGIN: The origin attribute indicates how BGP learned about a particular route. 

Allowed values are IGP, EGP or Incomplete. 

• NEXT_HOP: The EBGP next-hop attribute is the IP address that is used to reach the advertising router.



• Well known discretionary 

May or may not appear in every UPDATE message, but must nonetheless be supported. 

• LOCAL_PREF: Used by a BGP speaker to prefer a specific exit point from the local AS if there are several to choose from. This information is propagated to peers within the same AS (IBGP peers) in the UPDATE message.

• ATOMIC_AGGREGATE: Alerts BGP speakers along the path that some information has been lost due to route aggregation (supernetting).
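
How attributes can drive a routing decision may be sketched as follows. This is a deliberately simplified illustration and not the full BGP decision process: it only compares LOCAL_PREF (higher wins) and then AS_PATH length (shorter wins). The `Route` class, the default LOCAL_PREF of 100 and all addresses and AS numbers are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    prefix: str
    next_hop: str
    as_path: list = field(default_factory=list)
    local_pref: int = 100  # assumed default for this example


def best_route(candidates):
    """Prefer the highest LOCAL_PREF, then the shortest AS_PATH (simplified)."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))


# Example routes to the same prefix; AS numbers are from the documentation range.
routes = [
    Route("203.0.113.0/24", "192.0.2.1", as_path=[64500, 64501, 64502]),
    Route("203.0.113.0/24", "192.0.2.2", as_path=[64510, 64502]),
    Route("203.0.113.0/24", "192.0.2.3",
          as_path=[64520, 64521, 64522, 64502], local_pref=200),
]

best = best_route(routes)              # LOCAL_PREF 200 wins despite the long path
best_by_path = best_route(routes[:2])  # equal LOCAL_PREF, shorter AS_PATH wins
print(best.next_hop, best_by_path.next_hop)
```

Real implementations evaluate further attributes and tie-breakers after these two steps.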

The most commonly used version is version 4, which added support for supernetting and CIDR. 

Support for IPv6 exists as well. [22] 

2.4 NetFlow 

NetFlow is a protocol originally developed by Cisco Systems® in 1996 to collect IP traffic information. By sending that information to a collector, it can answer questions such as who the top bandwidth users are, what applications they use and what percentage of the traffic they account for [26] [45].

NetFlow can be used as a tool for mitigating DoS attacks, since the traffic is actively being monitored. It records IP traffic statistics and exports them as flows to a collector for analysis. A flow is a set of data packets which share common characteristics, such as packets coming from the same source and heading for the same destination. Cisco deems that the following characteristics must match in order for packets to be classified as one flow [2]:


1. Ingress interface 

2. Source IP address 

3. Destination IP address 

4. IP protocol 

5. Source port for UDP or TCP protocols, 0 for other protocols 

6. Destination port for UDP or TCP protocols, 0 for other protocols 

7. IP Type of Service (TOS)


Illustration 3: NetFlow overview 
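
The seven key fields listed above can be used to group packets into flows, as in the following minimal sketch. The packet dictionaries and their field names are hypothetical and chosen for the example; an actual exporter performs this grouping in the router itself.

```python
from collections import defaultdict, namedtuple

# The seven fields that together identify a flow.
FlowKey = namedtuple(
    "FlowKey", "ingress_if src_ip dst_ip protocol src_port dst_port tos")


def aggregate(packets):
    """Group packets into flows and count packets and bytes per flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = FlowKey(
            pkt["ingress_if"], pkt["src_ip"], pkt["dst_ip"], pkt["protocol"],
            pkt.get("src_port", 0),  # 0 for protocols without ports
            pkt.get("dst_port", 0),
            pkt["tos"],
        )
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)


# Two packets of a request and one packet of the reply (made-up values).
packets = [
    {"ingress_if": 1, "src_ip": "198.51.100.7", "dst_ip": "203.0.113.10",
     "protocol": 6, "src_port": 40000, "dst_port": 80, "tos": 0, "length": 60},
    {"ingress_if": 1, "src_ip": "198.51.100.7", "dst_ip": "203.0.113.10",
     "protocol": 6, "src_port": 40000, "dst_port": 80, "tos": 0, "length": 500},
    {"ingress_if": 2, "src_ip": "203.0.113.10", "dst_ip": "198.51.100.7",
     "protocol": 6, "src_port": 80, "dst_port": 40000, "tos": 0, "length": 1500},
]

flows = aggregate(packets)
print(len(flows), "flows")
```

The reply ends up in a flow of its own because its source and destination fields are reversed.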

It is worth mentioning that a TCP header without options is only 20 bytes in size [51].

See appendix 8.2 for information on how to configure NetFlow on Cisco devices. 

2.5 sFlow 

Much like NetFlow, sFlow collects network traffic information for use in monitoring the network. It is an industry standard and is supported on devices from many vendors. It too sends network traffic information to a collector for further analysis. This collector is a computer running any of the many available programs for gathering flow data. sFlow works by sampling the network traffic: it picks one out of every N packets, copies that packet's header information into a new packet, and sends the new packet to the collector once it reaches 1500 bytes (the usual maximum size of an Ethernet packet). Along with the header information from the sampled packets, information about the sampling rate and interface ID is also included. This type of sampling is called random sampling. The other sampling method is based on a time-based polling interval and is called counter sampling. [39] [40] [41] [42] [43][46]
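
The principle of 1-in-N sampling, and how a collector can scale the samples back up to an estimate of the total traffic, can be sketched as below. The function name, the fixed packet size and the seed are assumptions made for the example; the sketch does not reproduce the sFlow datagram format.

```python
import random


def sample_packets(packet_sizes, n, seed=0):
    """Keep each packet with probability 1/n (random sampling)."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [size for size in packet_sizes if rng.randrange(n) == 0]


# 100,000 packets of 1500 bytes each (made-up traffic), sampled 1 in 100.
packet_sizes = [1500] * 100_000
N = 100

sampled = sample_packets(packet_sizes, N)

# The collector multiplies by the sampling rate to estimate the totals.
estimated_packets = len(sampled) * N
estimated_bytes = sum(sampled) * N

print(len(sampled), "samples, estimated", estimated_packets, "packets")
```

The estimate is close to, but usually not exactly, the true total, which is the inaccuracy inherent in sampling.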

2.6 IPFIX 

Internet Protocol Flow Information Export (IPFIX) is a protocol created by the IETF in an attempt to establish a common, universal standard for exporting flow information from routers and other devices, eliminating the need for each vendor to create a version of its own [44]. Being based on Cisco's NetFlow version 9, it defines a flow by the same characteristics: same source, same destination and so on.

2.7 Linux 

Linux is a Unix-like kernel originally developed by Linus Torvalds while he was a student at the University of Helsinki. It was officially released on October 5th, 1991 as open source software. Linux was combined with the GNU system and other free software, which supplied programs such as the X Window System, the Bash shell and various other programs. Together they formed a complete operating system with Linux as the underlying kernel. Linux is licensed under the GNU General Public License, and the combination is called GNU/Linux. [29]

There are several different distributions of GNU/Linux available, each aimed at a specific market. One distribution might be optimized for size and leave out the X Window System or advanced word processors. Another might be aimed at the server market and focus on server-based programs and services. What they all have in common is the underlying system: regardless of distribution, the kernel is Linux.

Illustration 4: Components of a Linux distribution. Image courtesy of Erik Streb 

2.7.1 Ubuntu 

Ubuntu Server is a distribution released by Canonical Ltd., aimed at server use. It does not come with a graphical user interface, such as the X Window System, and is thus purely text-based. Ubuntu Server's package manager is dpkg, the same one used by Debian, another popular distribution. Graphical front ends such as Synaptic are available.

2.7.2 Cron 

Cron is a program which allows automatic execution of commands or scripts based on a schedule. It runs as a daemon to ensure its availability and is the default scheduler on Linux systems, while at the same time being easy to manage. There are several ways to schedule the execution of a command or script; the simplest is to put the script in the appropriate folder in the /etc directory. There are five folders specifically intended for cron jobs, each run at a specific interval. The contents of the folder /etc/cron.hourly run every hour, and so on.

Furthermore, each user account has its own crontab file for personal use. Users can schedule program executions themselves without administrative rights, provided the program in question does not require them.

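Another way to implement a scheduled execution is to edit cron's configuration file directly, usually located at /etc/crontab.

Each line holds one script, with its execution time in the following format (in the system-wide /etc/crontab, a user name is additionally placed between the schedule and the path):

* * * * * /path-to-script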

The five fields, each denoted by an asterisk, are minute, hour, day of month, month and day of week. Replacing a field with an appropriate number tells cron to execute the script at that specified time. For example, 1 * * * * /path-to-script tells cron to execute the script at one minute past every hour. Keeping an asterisk means “always” for that field, hence the first example would execute every minute of every hour of every day of the month, every month and every day of the week. [30]
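
How the five schedule fields are matched against the current time can be sketched in Python. This is a simplified illustration that only understands an asterisk or a single number per field; ranges, lists and step values supported by the real cron are left out, and the function names are made up for the example.

```python
from datetime import datetime


def field_matches(field, value):
    """An asterisk matches anything; otherwise the number must be equal."""
    return field == "*" or int(field) == value


def cron_matches(schedule, when):
    """Return True if the five-field schedule matches the given time."""
    minute, hour, dom, month, dow = schedule.split()
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            # cron counts Sunday as 0; isoweekday() counts Sunday as 7
            and field_matches(dow, when.isoweekday() % 7))


every_minute = cron_matches("* * * * *", datetime(2012, 5, 23, 14, 37))
one_past = cron_matches("1 * * * *", datetime(2012, 5, 23, 14, 1))
two_past = cron_matches("1 * * * *", datetime(2012, 5, 23, 14, 2))
print(every_minute, one_past, two_past)
```

The schedule with a 1 in the minute field matches 14:01 but not 14:02, whereas the all-asterisk schedule matches any time.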

2.7.3 Scripting 

A very basic script is a set of commands which aims to automate certain tasks and is executed from top to bottom. Such tasks can be anything from going through log files and alerting the administrator if an error has occurred, to taking a backup of old logs by compressing them into an archive, moving the archive to another folder and deleting the old logs. Being able to automate such tasks helps administrators enormously, enabling them to focus on more important tasks.

The most commonly used scripting languages today have evolved to support more complex constructs, such as conditional execution (e.g. if...else), looping, and variable creation and evaluation. A script differs from a program in the sense that a script does not need to be compiled or linked before it can run; the code is not compiled into machine code but rather interpreted. This allows for immediate execution once the script is finished. [33]

Bash scripting 

A Bash-script must start with the line #!/bin/bash, also known as a “shebang”. This tells the system which program will be used to run the file; in this case, the Bash shell.

Bash-scripts support the use of variables but lack support for data types. Thus, any variable can contain numbers, strings, or any mix thereof. Variables do not need to be declared beforehand; simply assigning a value to one creates it. Blocks used for evaluation or conditional execution start with “if” and end with “fi” (“if” backwards). Loops start with the appropriate keyword, either “for”, “while” or “until”, and end with “done”. [34]



A simple Bash-script utilizing variables and evaluation can be found in the appendix, section 3. 

Python scripting 

A Python-script must start with the “shebang”-line as well, albeit this time telling the system to use 

Python instead: #!/usr/bin/env python. 

Python is a high-level language which has grown in popularity in recent years, partly due to it being extensively documented as well as having a clean syntax. Being stricter than Bash, Python uses whitespace indentation to mark where blocks start and end. A block in this sense refers to the body of an evaluation or conditional execution, e.g. if...else and looping. A simple Python-script utilizing variables and evaluation can be found in the appendix, section 4.
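
As a brief illustration of indentation-delimited blocks, independent of the script in the appendix, a short sketch is given here. The log lines are invented for the example and do not come from any real program.

```python
#!/usr/bin/env python
# Count error lines in a list of log lines (the log content is made up).

log_lines = [
    "10:00:01 exampled: started",
    "10:00:05 exampled: ERROR: could not open socket",
    "10:01:00 exampled: job finished",
]

error_count = 0
for line in log_lines:       # the indented lines form the loop body
    if "ERROR" in line:      # a nested block is indented one level further
        error_count += 1

if error_count > 0:
    message = "%d error(s) found" % error_count
else:
    message = "no errors found"

print(message)
```

No variable is declared in advance, and the end of each block is marked only by the return to the previous indentation level.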

2.7.4 Libpcap 

Libpcap is an open source-framework written in C and designed for low-level network information 

capturing as well as manipulation. Commonly used “packet sniffers” such as tcpdump, Snort and 

Wireshark all use Libpcap. A framework for Microsoft Windows is also available, under the name 

WinPcap. [38] 

2.8 Quagga 

Quagga is a software suite designed for routing traffic much like a hardware router would. It originated as a spin-off from the GNU Zebra project, another software routing suite which became commercialized and was no longer supported; Quagga was created in an effort to fix the many bugs throughout the software suite as well as to keep it free. [56] It is capable of running multiple routing protocols such as OSPF, RIP and BGP for both versions of the Internet Protocol, namely IPv4 and IPv6. It runs on most Unix platforms such as FreeBSD, Linux, Solaris and NetBSD. Quagga consists of a core process, Zebra, which acts as an abstraction layer on top of the Unix kernel and uses its own API to exchange TCP streams with the Quagga routing protocol processes. In essence, the core process simply passes data to and from the routing processes, and it is these processes that do the majority of the work. Quagga makes use of select(2) system calls instead of multi-threading due to problems with the thread library in GNU/Linux and FreeBSD systems.

The GNU Zebra project itself went inactive in 2005. Quagga is released under the GNU General Public License. [35]



Illustration 5: Illustration depicting the Quagga processes and roles

2.9 MRTG

MRTG is a graphical tool used to monitor the amount of traffic passing through a network link on a device such as a router, switch or server. Being written in Perl, it has successfully been ported to most Unix platforms as well as Microsoft Windows. MRTG obtains its data either by polling the devices in question using SNMP, or from a third-party script. Once the data is obtained, it generates HTML pages containing images based on the traffic load, giving a graphical view of the load complete with timestamps and history.
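
A third-party script for MRTG is expected to print four lines: the value of the first variable (normally incoming traffic), the value of the second variable (normally outgoing traffic), an uptime string and the name of the target. A minimal sketch of such a script is shown below; the counter values, the uptime string and the target name are placeholders, since where the real numbers come from depends on the installation.

```python
def mrtg_output(value_in, value_out, uptime, name):
    """Format two counters, an uptime string and a target name, one per line."""
    return "\n".join([str(value_in), str(value_out), uptime, name])


# Placeholder values; a real script would read them from the data source.
output = mrtg_output(123456, 654321, "5 days, 3:12:00", "example-link")
print(output)
```

MRTG runs the script at every polling interval and graphs the two values it prints.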

Illustration 6: Example MRTG graph. Image courtesy of Tobias Oetiker 



As can be seen in the example illustration above, incoming traffic is plotted as a solid green area while outgoing traffic is plotted as a blue line. The graph also displays the maximum, average and current values. [36]

MRTG can be used in conjunction with RRDtool, a software suite designed to log and analyze collected data. It does this by storing the data in a custom database file with the .rrd extension. These files contain historical data over a definable time period. Using the RRDtool program, graphs can be generated from the database. These graphs are plain image files, which makes them easy to incorporate into web pages for viewing. [37]
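
The idea of keeping history over a fixed, definable period can be illustrated with a round-robin buffer, in which the oldest sample is overwritten once the buffer is full. The sketch below only illustrates that principle; the class name is made up, and it does not reflect the actual file format or interface of RRDtool.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size store that drops the oldest sample when full."""

    def __init__(self, slots):
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))

    def average(self):
        return sum(value for _, value in self.samples) / len(self.samples)


# Three slots, four updates: the first sample is overwritten.
archive = RoundRobinArchive(slots=3)
for timestamp, value in enumerate([10, 20, 30, 40]):
    archive.update(timestamp, value)

print(list(archive.samples), archive.average())
```

Because the number of slots is fixed, the storage never grows regardless of how long data is collected.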

2.10 pmacct

pmacct is a network monitoring suite designed to run on Unix systems with the aim to classify, 

measure and export real-time traffic data. 

pmacct offers finer granularity than SNMP for collecting traffic information, which it can obtain from several sources including NetFlow, sFlow and IPFIX. It can then tag this information, either to bring it in line with company policy or simply to distinguish important traffic, before exporting it. pmacct can export the data in many ways, including several different types of databases, memory tables or even files on the hard drive.

2.11 Baselining 

In order to distinguish potentially harmful traffic from the ordinary day-to-day traffic throughout the network, a method of baselining can be implemented. Having a baseline of the traffic means knowing how much traffic normally passes through and at what times during the day certain drops or spikes occur. A prime example of this would be seeing a drop in traffic during lunch hours when employees are away from their computers. There might be cause for alarm if the drop doesn't occur when it should, or when any other deviation from the norm is observed. Investigating the issue might show a high number of connections of the same type to a specific IP address, which could indicate a DoS attack in progress.

Likewise, an alarm indicating more than the usual number of open connections on a device, server or router might indicate a SYN flood attack in progress.
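
One simple way to express a baseline in code is to store the mean and standard deviation of earlier observations for each hour of the day and to flag values that deviate too far from them. The sketch below is one possible approach under that assumption; the traffic figures and the threshold of three standard deviations are invented for the example.

```python
from statistics import mean, stdev


def build_baseline(history):
    """history maps an hour of the day to a list of earlier observations."""
    return {hour: (mean(values), stdev(values))
            for hour, values in history.items()}


def is_anomalous(baseline, hour, observed, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    average, deviation = baseline[hour]
    return abs(observed - average) > threshold * deviation


# Made-up traffic levels (e.g. Mbit/s) from five earlier days.
history = {
    11: [95, 100, 105, 98, 102],
    12: [38, 42, 40, 41, 39],  # the usual lunch-hour dip
}
baseline = build_baseline(history)

normal_lunch = is_anomalous(baseline, 12, 41)    # within the usual range
missing_dip = is_anomalous(baseline, 12, 100)    # the expected drop is absent
print(normal_lunch, missing_dip)
```

The second check corresponds to the case described above, where the lunch-hour drop fails to appear.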


3 Problem analysis 


The assignment revolves around the pmacct software suite. Hence, obtaining and configuring a 

working Linux machine with the appropriate software is required. The machine itself should 

preferably be plugged in to the network close to the core router in order to avoid unnecessary 

overhead by having the NetFlow traffic passing through multiple switches and routers, as the 

amount of data will probably be significant. 

3.1 Collecting the data 

Collecting all traffic from every entry point at I2B could generate an unnecessary load on the network and its devices. A way of reducing or compressing the data might be required. The log files may grow to the point of becoming unmanageable, since the entire BGP routing table will be logged. The hardware of the computer must also be taken into consideration, such as the number of CPU cores and the amount of RAM available. Tools for monitoring disk I/O may be needed in order to find and assess bottlenecks, such as the file containing the routing table. If the disk I/O is too intense and begins to affect normal computer operations, keeping the logs on another drive might be needed.

3.1.1 Protocol decision 

A decision must be made about which flow protocol to use. NetFlow, sFlow and IPFIX are the most prevalent, and each presents its own benefits and drawbacks. These protocols must be examined and evaluated, and a conclusion must be reached as to which will be used. A balance must therefore be struck between the operability of the protocols and the current topology in use by the service provider.

3.1.2 Choice of NetFlow-collector 

Configuration of the chosen NetFlow-collector would be needed in order to tag and save the desired 

information, as well as to implement a feasible method for mitigating DDoS attacks. Since the 

company wanted to evaluate pmacct, the choice has already been made. A brief summary of the 

different programs available should nonetheless be mentioned. See chapter 4.1.1 for more 

information. 

3.1.3 Network configuration 

BGP peering between the core router and the software router, Quagga, needs to be configured and established, since Quagga provides the pmacct suite with a routing table as well as BGP AS information. Extra thought and careful planning need to be given to the configuration between the two, as this will be done live. Incorrect configuration could end up interrupting the traffic for the company's customers, and steps need to be taken to minimize any such damage, should it occur.

3.2 Storing the data 

The collected data need to be stored to allow for a historical view. This can be either a long-term solution or something shorter, limited to the time frame of the graphed data. That is, if the company graphs one day's worth of data, should data further back be available? Saving data longer than that might be counterproductive, given the increasing amount of hard disk space needed by the log files.

3.3 Displaying the data 

A graphing tool is required for taking our collected data and creating historical graphs, presenting a 

clean overview for easy viewing. The requirements would be finding a tool which can draw data 

from multiple sources, allowing for more compatibility with the chosen storage method. 


4 Implementation 


This chapter will describe the different methods available, as well as which methods were chosen 

and more importantly, why they were chosen. 

4.1 Choice of method 

This chapter will present the chosen method to solve each problem, explaining why a certain 

solution was chosen over the others. 

4.1.1 Collecting the data 

Protocol decision 

Several widely used protocols are available today, all of which gather network traffic in their own 

way. The most prominently used protocols are NetFlow, sFlow and IPFIX. 

All Cisco devices which support flow accounting only support NetFlow, which is Cisco's own solution. One benefit of using NetFlow is that it captures all of the IP traffic without missing anything – every packet is accounted for. This especially needs to be taken into consideration if one is using IP billing, i.e. charging customers for the amount of traffic they use. Furthermore, it might be advisable to use NetFlow if the network already consists of Cisco devices, as it is the only supported flow protocol.

Another protocol is sFlow, which is based on sampling. Due to the nature of sampling, sFlow will very probably miss some of the packets, since only one in every N packets is forwarded to a collector for actual analysis. However, the greatest benefit of sFlow is that it is protocol-independent. Whereas NetFlow can only account for IP traffic, sFlow can gather traffic of any protocol, such as IPX and AppleTalk. It is also capable of operating on Layer 2 and as such does not require Layer 3 routing like its peers. Thus, sFlow is a strong competitor in an environment running multiple protocols.

In the end, NetFlow was chosen for several reasons: the company utilizes Cisco devices throughout its network, and NetFlow allows for the capture of smaller flows since every flow is accounted for. The latter was decided primarily with DDoS attacks in mind, in which attackers might try to avoid detection by creating smaller flows from an increased number of hosts. The use of Cisco devices rules out sFlow, since it is not supported on them. IPFIX was rejected since NetFlow would suffice for the company's needs and is easy to configure – it works by just entering a few commands on an already running device.

Program decision 

Although the choice of software was already decided upon by the company, several tools exist: 

• NetFlow Analyzer by ManageEngine [6] is such a tool. It comes in three editions, with 

prices varying from $795 up to $9995 as of June 2012. Understands IPv4 as well as IPv6. 



• Peakflow by Arbor Networks [7] is another commercial product worth mentioning as they 

provide several solutions suited for different scenarios, in addition to having teamed up with 

Cisco Systems to provide additional features and tools to mitigate DDoS attacks. 

Understands IPv4 as well as IPv6. 

• Flowc [8] by Uninet Ltd. is an open source tool which only works with Cisco devices, using NetFlow version 5 only. Documented configuration and explanation are available, although the site has not been updated since 2006. Status of IPv6 support is unknown.

• Flowd [50] is another NetFlow-only suite, understanding all versions of NetFlow as well as supporting IPv4 and IPv6.

• NFDUMP, a small tool distributed under the BSD license, works by collecting NetFlow data 

(v5, v7 and v9) and writing it to disk. Combined with the graphical web-based front end 

NfSen, the suite collects and graphs the data. Understands IPv4 as well as IPv6. [48] [49] 

• The last tool to be mentioned is pmacct, by Paolo Lucente [4]. It is also the software of choice of Internet 2 Business, and hence of this thesis. It is free and extensive documentation is available, as well as support from other users via mailing lists. Understands IPv4 as well as IPv6.

For a more comprehensive list of programs related to flows, see the appendix “List of software 

related to flow accounting” [5]. 

Routing platform decision 

Although Quagga was the chosen software by the company, several other tools exist: 

• Bird Internet routing daemon [57] is an open source routing daemon for Unix systems and supports the BGP, RIP and OSPF routing protocols on both IPv4 and IPv6. It is already in use at Internet exchanges [58], where it came to replace Quagga because of scalability issues [59], and it is again in active development as of 2008. Editions for Ubuntu and Debian exist [60][61].

• OpenBGPD [62] was developed as an alternative to Quagga when it became apparent that 

Quagga did not fit the requirements and quality standards for BSD [63]. It runs on Unix 

systems and supports BGP v4 only. It is licensed under a BSD license and the latest version, 

4.6, was released in 2009. 

• Vyatta is a software-based virtual router as well as the name of the company behind it [64]. Although commercialized, a free open source version exists which supports OSPF and BGP routing. The full commercial version adds a web GUI, a dashboard, 24/7 technical support as well as access to the API itself [65].

The decision to use Quagga for peering with the routers was mainly due to the company's wish to implement it, possibly as a proof of concept for future work.



4.1.2 Storing the data 

Storing the data in a database has numerous advantages over keeping the data as flat files on the hard drive: complex queries can determine which IP addresses or ASes are the top talkers, how many different flows a client was using or even which port was used the most. Accomplishing the same thing with the data in flat files would require a script, which would undoubtedly be slower. Thus, a database solution was settled on from the start. The available databases suited to the task were MySQL and PostgreSQL, with MS SQL being removed from the list due to its inability to run on Unix systems. Oracle was likewise removed due to its commercial cost, with the free version being too limited.
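
The kind of query meant here can be sketched with Python's built-in sqlite3 module, which is used only as a stand-in so that the example is self-contained; the thesis itself uses MySQL. The table name, the column names and the rows are hypothetical and are not taken from the actual pmacct schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows (ip_src TEXT, ip_dst TEXT, dst_port INTEGER, bytes INTEGER)")
conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?)", [
    ("198.51.100.7", "203.0.113.10", 80, 120000),
    ("198.51.100.7", "203.0.113.10", 80, 80000),
    ("198.51.100.9", "203.0.113.10", 443, 50000),
    ("198.51.100.8", "203.0.113.11", 80, 10000),
])

# Top talkers: source addresses ordered by the total number of bytes sent.
top_talkers = conn.execute(
    "SELECT ip_src, SUM(bytes) AS total FROM flows "
    "GROUP BY ip_src ORDER BY total DESC LIMIT 2"
).fetchall()

# Most used destination port, counted in number of flows.
top_port = conn.execute(
    "SELECT dst_port, COUNT(*) AS n FROM flows "
    "GROUP BY dst_port ORDER BY n DESC LIMIT 1"
).fetchone()

print(top_talkers, top_port)
```

Each question is answered by a single statement, whereas flat files would need a purpose-written script that reads and sums every record.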

While PostgreSQL has for many years been considered more robust and better suited for heavy-load systems than MySQL, the difference between the two has shrunk in recent years of development. MySQL has, on the other hand, been gaining popularity through its ease of installation and speed of use, although improvements have been made on both sides in this regard as well. It was in the end decided that MySQL would be the database system used; the database load would be low and the queries simple enough for both solutions, with MySQL being the more familiar system.

4.1.3 Displaying the data 

Two solutions for displaying graphed data from network links were investigated. 

• Cacti [13] 

• MRTG [14] 

Although both tools support the utilization of RRDtool, there are vast differences in their ease of use and configuration. Cacti was considered first since it supports more features, but it was discarded as these features did not benefit the graphing of bandwidth data. It is, however, complex and supports a wide range of items to graph, allowing for great customization. MRTG is also customizable and very easy to set up. Both Cacti and MRTG support third-party scripts that feed them data to graph, which allows both to graph almost anything. This feature is especially easy to set up with MRTG; it is a bit more complex and more time-consuming to accomplish the same with Cacti.

In the end, MRTG was chosen. This is partly due to Internet 2 Business already using it internally, as well as its ease of use.

4.2 Method criticism 

The choice of software suites could be investigated further. Which would suit the task better, and 

which would be the easiest to configure? The primary reason for choosing pmacct was due to 

Internet 2 Business already having decided upon it. More people would benefit from this thesis if 

the same conclusion could be reached by a thorough investigation of the available software 

applications, not just because it was a favorite. 

The same thought goes for the graphing software used, namely MRTG. A thorough investigation and comparison between Cacti and MRTG would have been preferred, although MRTG proved to be a suitable choice since it was the easiest to configure. A third graphing tool worth taking a look at is Munin [15], as it seems to be updated regularly and is gaining traction in the community.

However, the point was not to investigate whether a certain graphing tool is better than another. The tool is just that: a tool for graphing collected data in an easy-to-read, comprehensible way. Both Cacti and MRTG accomplish this.

4.3 Solution 

The main purpose of using pmacct was as a helpful tool in mitigating DDoS attacks. Detecting them quickly would allow I2B to take appropriate action, such as blackholing the addresses itself (also known as a null route) as well as informing the ISP from which the attack originated. As mentioned, the faster the IP addresses could be identified, the faster the threat would be contained.
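
Once the attacking addresses are known, turning them into null routes is a mechanical step. The sketch below generates Cisco IOS style static routes to the Null0 interface from a list of IPv4 addresses. It is only an illustration of the idea: the thesis does not state how I2B installs its null routes, and the function name and addresses are made up.

```python
import ipaddress


def null_routes(attackers):
    """Return one host route to Null0 per unique IPv4 address."""
    unique = sorted({ipaddress.IPv4Address(addr) for addr in attackers})
    # IPv4Address raises ValueError on malformed input, which guards
    # against pasting anything but plain addresses into the router.
    return [f"ip route {addr} 255.255.255.255 Null0" for addr in unique]


# Example attacker addresses from the documentation range, one duplicated.
commands = null_routes(["198.51.100.9", "198.51.100.7", "198.51.100.9"])
for command in commands:
    print(command)
```

Traffic towards a null-routed address is silently dropped by the router.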

The first approach to avoiding an incorrect configuration which might end up disrupting customer traffic was to do everything in a virtual environment, using the free GNS3 framework which emulates routers from different brands, including Cisco and Juniper. This would emulate any number of routers with the added advantage of not having to worry about configuration mistakes. Once the configuration and environment had been set up, they could either be migrated directly to the live routers, since the configuration would be exactly the same, or even continue running virtually. GNS3 supports bridged networking which allows for communication with the outside world, much like OS virtualization software such as VMware or VirtualBox does. The idea was later discarded for several reasons; it was felt that creating and configuring the virtual environment would be too time-consuming considering what would be gained. Spending this time double-checking the configuration seemed like a better use of the already diminishing timetable.

Another reason for not implementing a virtual environment was that GNS3 requires a real operating system image for the router it is emulating. Thus, emulating a Cisco Catalyst 6500 like the small router would require obtaining a Cisco IOS license for the 6500. Even if that was achieved, generating traffic to capture with NetFlow would take additional time to set up. It was, however, possible to use virtual machines with VMware or VirtualBox to create hosts and link that environment with GNS3 and its virtual routers; in essence, bridging two virtual environments together, either alone or connected to the real network as well. It was either that or yet another license would need to be obtained, this one for traffic generation (TGN for instance). The whole idea was discarded after a few days when the mounting time for configuration became apparent. It was then decided that a live environment would be used instead, with extra care given to the configuration. The problem actually solved itself when configuring Quagga, the software router (see below).

The Linux distribution used was Ubuntu Server 11.10 (Oneiric) 64 bit with 8 GB RAM and 4 CPU 

cores (1 physical). 

4.3.1 Collecting the data - solution 

In order to collect the desired data, the small router was configured to send NetFlow traffic to the Linux machine, and the software router, Quagga, was configured to send BGP information. See appendix 8.3 and 8.4 for configuration.



Verifying the NetFlow-data 

Once the router configuration of sending NetFlow-data to the Linux machine was completed, it had 

to be verified that the computer actually got the flow-data. This was done by using a terminal-based 

version of the popular packet analyzer Wireshark named TShark [52]. 
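
Independently of TShark, the same kind of check can be sketched in a few lines of Python. NetFlow version 5 and version 9 export packets both begin with a 16-bit version field followed by a 16-bit count field, so reading the first four bytes of each datagram is enough to confirm that export packets are arriving. The function names and the default of five packets are assumptions for the example, and the chosen UDP port must not already be in use by another program.

```python
import socket
import struct


def parse_export_header(datagram):
    """Return (version, count) from the first four bytes of an export packet."""
    if len(datagram) < 4:
        raise ValueError("datagram too short for a NetFlow header")
    version, count = struct.unpack("!HH", datagram[:4])
    return version, count


def listen(port=2100, max_packets=5):
    """Print sender and header fields of a few datagrams on the given port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        for _ in range(max_packets):
            data, (sender, _) = sock.recvfrom(65535)
            print(sender, parse_export_header(data))


# Self-contained check with a hand-built header: version 5, 30 records.
example = struct.pack("!HH", 5, 30) + b"\x00" * 20
print(parse_export_header(example))
```

Calling `listen()` on the collector would print one line per received export packet.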

The following syntax was used in order to make TShark focus on port 2100, the default port for NetFlow-data, and save the results to a file:

tshark -ni eth0 -R udp.port==2100 -d udp.port==2100,cflow -V > tshark.cap

The file rapidly grew in size and after viewing the contents it was concluded that the router was 

correctly configured to send NetFlow-data to the computer. Below are two excerpts from the 

captured traffic, each displaying the duration of the flow and other relevant information. 

Illustration 7: Traffic destined for port 80, HTTP

Illustration 8: Captured flow between two parties

Quagga

Configuring Quagga to exchange neighbor traffic with the small router turned out to be very easy and was done fairly quickly. It also turned out that Quagga itself held the solution to the risk of disrupting customer traffic through an incorrect configuration: in order to disrupt customer traffic, Quagga would have to be configured with networks it could advertise as its own to other routers. If the same network(s) also existed on the real router, it would cause an interruption in traffic as both routers would claim ownership of the same networks. It is possible, however unlikely, that this new information would be propagated further out on the Internet as well. ISPs are generally very careful with advertising networks and usually have some sort of filter to only allow routing updates containing their own networks to be sent out. Since Quagga did not have any networks it could advertise to other routers, such a disruption could not occur: it only listened to routing updates and never sent any of its own. It was also one of the things that “just worked” and did not need any special attention once it was set up.



Illustration 9: Quagga successfully peering with the small router and seeing over 48,000 routes

Illustration 10: Quagga later successfully peering with the core router and seeing over 400,000 routes

pmacct 

Since pmacct captures flows between IP addresses, the number of flows grows rapidly. Sending an e-mail to a friend might not generate many flows, but a server answering requests might. Every user connecting to it would be a flow, with the return traffic being another flow (since the destination and source fields are reversed, they do not match and a new flow is created instead). The BitTorrent protocol is an excellent example of creating many flows; every connected user is sending data to all the other connected users. In order to avoid saving all this unnecessary data and to keep the load from the NetFlow-generated traffic low, it was decided to configure pmacct in the following way:


• Since pmacct supports filtering, tagging certain traffic would split up the data and allow for 

easier management. 

• The pmacct software would record and tag traffic based upon the direction from the other 

ASes, i.e. traffic going in to I2B from other ASes and traffic going out to them from I2B. 

These would be tagged as [in] and [out].


• Once an attack was in place from another AS, the amount of flows, packets and bytes 

received would increase rapidly and cause a spike in the graphed data for that particular AS, 

indicating there might be a problem. 

• Focusing on that particular AS with a new tag, named [attack], would yield the wanted

detailed information such as IP addresses of the sender and receiver, bytes sent and received

as well as the number of packets sent.
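The tagging scheme in the list above can be illustrated with a small Python sketch. It is not pmacct's pretag.map syntax or its implementation. The record fields (`src_as`, `dst_as`), the `tag_flow` helper and the placeholder AS numbers (64512 for I2B and 65001 for the attacker, both private-use numbers picked only for illustration) are assumptions made for the example, as is the choice to let the [attack] tag take precedence over [in] and [out].

```python
from collections import Counter
from typing import NamedTuple, Optional

LOCAL_AS = 64512  # hypothetical placeholder for I2B's own AS number


class Flow(NamedTuple):
    src_as: int
    dst_as: int
    packets: int
    nbytes: int


def tag_flow(flow: Flow, attack_as: Optional[int] = None) -> str:
    """Return the tag a flow would receive under the scheme described above."""
    remote_as = flow.dst_as if flow.src_as == LOCAL_AS else flow.src_as
    if attack_as is not None and remote_as == attack_as:
        return "attack"  # detailed accounting for the AS in focus
    if flow.dst_as == LOCAL_AS:
        return "in"      # traffic entering I2B from another AS
    return "out"         # everything else is treated as leaving I2B


flows = [
    Flow(src_as=29518, dst_as=LOCAL_AS, packets=10, nbytes=4000),
    Flow(src_as=LOCAL_AS, dst_as=29518, packets=8, nbytes=1200),
    Flow(src_as=65001, dst_as=LOCAL_AS, packets=900, nbytes=54000),
]

# Byte totals per tag while AS 65001 is the one being focused on.
totals = Counter()
for f in flows:
    totals[tag_flow(f, attack_as=65001)] += f.nbytes
print(dict(totals))
```

Keeping the per-AS totals small and only expanding to per-IP detail for one AS is what keeps the stored data manageable in this design.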

With this in mind, the pmacct configuration took place and divided the traffic accordingly. The 

configuration used several individual configuration files in order to accomplish the tagging 

procedure. The main configuration was done in the regular configuration file, with the BGP peering

being done in the configuration file agent.map, which was referenced from the main

configuration. The tagging of traffic was done in pretag.map, which I2B would edit to

focus on the attacking AS and thereby obtain the detailed information they would

require.

See the appendix, chapter 8.5 and its sub-chapters, for the configuration.

The amount of traffic sent to the Linux machine from the small router turned out to be more than

estimated, but still very manageable. Observation showed that close to 20 GB of data was sent per day,

all of it being flow data. Considering that the only thing being sent is the header information itself

and that a TCP header is only 20 bytes in size, this amounts to quite a lot of data.
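To put the daily volume in perspective, it can be converted into an average rate. The calculation below assumes decimal gigabytes (10^9 bytes) and an even spread over the day, neither of which is stated in the measurements, so it is only a rough conversion.

```python
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(bytes_per_day: float) -> Tuple[float, float]:
    """Return (bytes per second, megabits per second) for a daily volume."""
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    megabits_per_second = bytes_per_second * 8 / 1e6
    return bytes_per_second, megabits_per_second


bps, mbps = average_rate(20e9)  # roughly 20 GB of flow data per day
print(f"{bps / 1e3:.0f} kB/s, {mbps:.2f} Mbit/s")  # 231 kB/s, 1.85 Mbit/s
```

Under these assumptions the export stream averages just under 2 Mbit/s, which is small on the wire but adds up on disk.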

4.3.2 Storing the data – solution 

MySQL was left with its default configuration, as this was judged to

suffice given the expected small load. Fine-tuning for optimal performance was

therefore not necessary. A graphical front end called phpMyAdmin was, however, installed to help

visualize the collected data.

The computer hardware turned out to be more than sufficient. This was indicated both by testing and by a

case study on peering at AS286, in which pmacct was running on a dual-core CPU with 4 GB RAM and

collecting data from a 250+ Gbps routing domain [53].



Illustration 11: Saved data viewed raw with the text editor Nano under Linux 

Illustration 12: Excerpt from the database containing traffic bound for I2B from other ASes using

phpMyAdmin. Notice AS 0

AS 0 would appear in the database regularly, which at first seemed odd since AS 0 is reserved and 



therefore not in use. According to the creator of pmacct, AS 0 often means that the route

exported by NetFlow, as source or destination of the traffic, was not in the BGP routing table. This

typically indicates an internal route to the network, advertised by an IGP or similar.

To simulate an attack in which one of I2B's own customers started to DDoS one of

IP Only's customers (IP Only being a Swedish ISP mainly aimed at businesses), the configuration in

pretag.map was changed to IP Only's AS. This told pmacct to tag traffic originating from I2B

and destined for IP Only. This tagged traffic was then saved in a separate file, the contents of which were

pushed into the database.

Illustration 13: phpMyAdmin displaying detailed information between I2B and IP Only, simulating

an attack. Notice the number of rows in the database – this is only traffic going to IP Only, and from

the small router at that

4.3.3 Displaying the data – solution 

MRTG needs to be fed data in order to create graphs; thus a third-party script was written (see the



appendix, chapter 8.8) to print the data sent in- and out from the requested AS, which in turn was 

stored in a database. The following is the output from the script: 

MRTG would then take over and store those numbers in a database format of its own, the round-robin

database (RRD for short), which keeps historical data. Configuring MRTG to graph data

from a third-party script turned out to be easy. Adding a target to graph in the

configuration file, usually located at /etc/mrtg.cfg, takes no more than a few lines.

# Global configuration 

WorkDir: /var/www/mrtg 

WriteExpires: Yes 

Interval: 1 

LogFormat: rrdtool 

PathAdd: /usr/bin 

LibAdd: /usr/lib 

Target[BB2]: `/var/www/mrtg/pmacct_get_traffic.sh 29518` 

#Above: Path to the script which print in- and out-data. 

MaxBytes[BB2]: 1000000000 

Title[BB2]: Bredband2 bytes 

Options[BB2]: gauge 

The last five lines of the configuration above create a graph for the Swedish ISP Bredband2,

showing traffic in and out between them and I2B.
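When a Target is given as a command in backticks, MRTG reads four lines from it: the first value (normally incoming), the second value (normally outgoing), an uptime string and a name for the target. The thesis used a shell script for this (appendix, chapter 8.8); the Python sketch below only illustrates the output format, and the function name and the sample numbers are made up.

```python
def mrtg_output(value_in: int, value_out: int,
                uptime: str = "unknown", name: str = "pmacct") -> str:
    """Format two counters as the four lines MRTG reads from an external command."""
    return "\n".join([str(value_in), str(value_out), uptime, name])


# Sample values only; a real script would read them from the database.
print(mrtg_output(123456, 654321, name="AS29518"))
```

Because the Options line sets gauge, MRTG plots the printed values as they are instead of treating them as ever-increasing counters.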

4.4 Problems 

This section discusses the different problems that manifested during the implementation and how 

these were dealt with. Some problems were small and easily solved while others required a whole 

different approach. 

4.4.1 Problem with collecting the data 

Configuring pmacct to combine the data from NetFlow and Quagga was the single most difficult task.

Despite well-documented commands and a mailing list, progress was still very slow. This was

attributed to unfamiliarity with the software suite and its way of operation, as getting one

line of configuration correct did not give any hint of things moving in the right direction. The

majority of the configuration had to be correct before any indicators were noticed at all.

Illustration 14: Collected traffic from AS 29518 (Bredband2). Given the small

amount, the counters had been reset prior to polling

One other collector-related problem, which was only partially resolved, was the move from the

small router to the core router. Copying the configuration to the core router was performed and

achieved without incident. The Quagga transition went smoothly as it began peering almost

instantly with the new router; however, the flow data stopped after only a few minutes. Capturing

the traffic on the wire using TShark showed basically no flows at all: instead of several megabytes

of NetFlow data in under a minute of capture, only one or two flows arrived every other

minute. Restarting unfortunately did not resolve the issue. A buffer overflow was first suspected as

the culprit, but this idea was dismissed since the data collection did not even begin after the system

reboot. Data should have started to fill up the buffers until the point that they would overflow, which

means at least some data should have been gathered.

The cause was later determined to be different versions of the operating system: the core router was

running a newer IOS version and the configuration to enable NetFlow was apparently different.

This was later confirmed when the small router received the same IOS upgrade and had its

configuration restored. Although the router accepted the configuration, no NetFlow data was

collected. BGP routing information was, however, received from both routers, which indicated that

Quagga was correctly configured. The reason for seeing normal traffic at first and then having it drop

was pmacct restarting after a configuration change. It is probable that the data

being graphed while this change occurred was a mix of cached data, as data is kept for up to 60

seconds in the database and the graphing software only polls at the same interval. As such,

this gave the false impression of data suddenly dropping out, as the graphing takes a while to

properly update.

The potential problem with large log files remained, and careful observation was carried out right

after the migration, with extra care given to the file containing the routing table. The following are

the sizes of the files which store traffic in and out between I2B and other ASes, as well as the log file.

The log file contains BGP routing information and is thus directly related to the number of routes

being seen. The file containing attacks is not included since its size varies greatly depending on which

AS is in focus, making it impossible to give an estimate. It is also worth noting that it did not

increase in size after the switch to the core router; this is because all routes from and to that

particular AS were already being monitored.

File           Approximated size before migration    Approximated size after migration
               (small router)                        (core router)

nfacctd.in     157 kB                                750 kB

nfacctd.out    150 kB                                740 kB

nfacctd.log    7 MB                                  77 MB

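As seen in the table, the estimated worst-case scenario of a 9-10x increase in size did not happen for the

traffic files. They instead grew by a factor of 4 to 5, despite the number of routes being 9-10 times higher on the core

router. The log file, which holds the BGP routing information, did grow roughly in line with the number of routes (about 11x).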
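The growth factors can be read off the table directly. The short calculation below uses only the approximate sizes listed there, with 1 MB taken as 1000 kB (an assumption). It shows the two traffic files growing a little under five times, while the log file, which tracks the routing table, grows about eleven times.

```python
# Approximate sizes in kB: (before migration, after migration).
sizes_kb = {
    "nfacctd.in": (157, 750),
    "nfacctd.out": (150, 740),
    "nfacctd.log": (7_000, 77_000),  # 7 MB and 77 MB expressed in kB
}

growth = {name: after / before for name, (before, after) in sizes_kb.items()}
for name, factor in growth.items():
    print(f"{name}: {factor:.1f}x")
```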

4.4.2 Problems with storing the data 

Several issues regarding the storing of the data were encountered, some of which remained unexplained

despite investigation. Those problems were not solved; they were instead avoided

altogether by taking different approaches.



4.4.2.1 pmacct 

Problems arose when trying to let pmacct connect to the database itself and save its data

directly. For some reason, no data was saved. An investigation did not find the

cause, although it was clear that the database user had the correct privileges.

Solution: 

The data was instead saved as formatted files (CSV files), and a custom-made script written in

Python was used to push the data into the database. See the appendix, chapter 8.9, for the

script.
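The approach can be sketched as follows. This is not the script from appendix 8.9: the column layout (`as_number,packets,bytes`), the table name `traffic_in` and the file contents are assumptions, and Python's built-in `sqlite3` module stands in for MySQL so that the example runs without a database server.

```python
import csv
import io
import sqlite3

# Stand-in for a file written by pmacct; the layout is assumed for illustration.
SAMPLE_CSV = "as_number,packets,bytes\n29518,120,98000\n64496,40,21000\n"


def load_rows(handle):
    """Parse CSV text into (as_number, packets, bytes) integer tuples."""
    reader = csv.DictReader(handle)
    return [(int(r["as_number"]), int(r["packets"]), int(r["bytes"]))
            for r in reader]


def push_rows(conn, rows):
    """Insert all rows in one batch inside a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO traffic_in (as_number, packets, bytes) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traffic_in (as_number INTEGER, packets INTEGER, bytes INTEGER)"
)
rows = load_rows(io.StringIO(SAMPLE_CSV))
push_rows(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM traffic_in").fetchone()[0]
print(row_count)
```

Decoupling the collector from the database this way means a database problem no longer stops collection; the files simply wait until the next push.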

4.4.2.2 MySQL 

Several performance issues were discovered during the implementation process. One such concern

regarding load and performance arose when pmacct emptied its tables into the database.

Since the ISP wanted as close to real-time analysis as possible, a 60-second refresh timer was set.

This meant pmacct was emptying its buffers into the database every 60 seconds, thus keeping the

information up to date, while discarding the old data first. The default setting was 300

seconds (5 minutes).

This caused a much heavier load than anticipated, with CPU utilization peaking at over 90%

during these pushes. The entire machine became sluggish and almost unresponsive for a few

seconds. The reason for the sluggish behavior was discovered by inspecting the database after such

a push: over 70,000 entries had been created in just a few seconds. Upgrading the system

was unlikely to produce sufficient results, as it was taxed over 90% with just the ~48k

routes from the test bench. Scaling up to the core router with ~400k routes would scale the

requirements up as well. Weighing cost against performance, it was decided that increasing the

specifications of the system to handle the increased load would be a sub-optimal solution.

Solution: 

The configuration was instead changed to allow the data to be stored on the hard drive as individual 

files, where a custom-made script took over and manually read them into the database instead. This 

vastly reduced the CPU load and made the system feel responsive during pushes. 

4.4.2.3 Execution time of said custom-made script 

Another potential problem was discovered with the addition of the script, namely its execution time.

Working with data from the test bench, which contained ~48k routes, the execution time was between 5

and 7 seconds. This posed a potential problem for the day the script would be deployed against the core

routers. The reason for the long execution time was determined to be the way the

script handled the data. Understanding why this was the case, and what the solution was, requires a

brief explanation of the script itself and the reasoning behind it. Since the output files sometimes

contained several entries for the same source / destination but with different byte and

packet counts, it was decided to merge these in the script itself. As the script looped through each

file and each row, it would do a database query to see if the AS already existed in the database or

not.



This turned out to be the source of the long execution time: the query back to the

database. Although the files ([in], [out] and [attack]) only contained ~1100 rows each, for a total of

about 3300 entries, it also meant that the script had to send a query to the database and check its

reply 3300 times before it was done.

Pseudo-code for the relevant section of the script: 

1. Loop through every line in every text file. 

2. For each row, take the AS and make a query back to the database: Does this ISP already 

exist in the database? 

3. If YES, there is an entry with the same AS – update the AS in question and sum together the 

values from the file and the value already present. 

4. If NO, there is no such record – create a new entry with the relevant information about that 

particular ISP. 
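Expressed in Python, the four steps look roughly like the sketch below. Table and column names are hypothetical, and `sqlite3` stands in for MySQL so the example runs without a server. The point to notice is the `SELECT` issued for every row, which is where the roughly 3300 lookups come from.

```python
import sqlite3


def push_with_check(conn, rows):
    """Insert or update each (as_number, packets, bytes) row; return queries issued."""
    queries = 0
    for as_number, packets, nbytes in rows:
        # Step 2: ask the database whether this AS is already stored.
        found = conn.execute(
            "SELECT 1 FROM traffic WHERE as_number = ?", (as_number,)
        ).fetchone()
        queries += 1
        if found:
            # Step 3: existing AS, add the new counters to the stored ones.
            conn.execute(
                "UPDATE traffic SET packets = packets + ?, bytes = bytes + ? "
                "WHERE as_number = ?",
                (packets, nbytes, as_number),
            )
        else:
            # Step 4: unknown AS, create a new record.
            conn.execute(
                "INSERT INTO traffic (as_number, packets, bytes) VALUES (?, ?, ?)",
                (as_number, packets, nbytes),
            )
        queries += 1
    conn.commit()
    return queries


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traffic (as_number INTEGER, packets INTEGER, bytes INTEGER)"
)
sample = [(29518, 10, 4000), (64496, 5, 900), (29518, 2, 600)]
issued = push_with_check(conn, sample)
stored = conn.execute("SELECT * FROM traffic ORDER BY as_number").fetchall()
print(issued, stored)
```

Every input row costs two statements here, one of which is a lookup whose answer must be awaited before the script can continue.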

Solution: 

Once the safety check to avoid duplicate ISPs was removed, the script executed in less than a

second. Even if the scale-up from ~48k routes to ~400k, almost ten times more routes, is linear, it

should not pose a problem. In the worst case the script would take ten times longer

to execute, finishing in about 10 seconds. Considering the data is refreshed every 60 seconds,

there is plenty of time to spare.

This is the preferred solution, despite there now being some duplicate ISPs in the database. The

remedy for this is simply to let MySQL sum the rows together when the query for that information is made.

This meant changing the MRTG script as follows:

Old query               New query

SELECT bytes FROM..     SELECT SUM(bytes) FROM..
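A minimal sketch of the effect, assuming a hypothetical `traffic` table with `as_number` and `bytes` columns and using `sqlite3` in place of MySQL: rows are appended without any lookup, and duplicates are collapsed when the data is read. The `COALESCE` wrapper is an addition for the example so that an AS without rows yields 0; it is not part of the query shown in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (as_number INTEGER, bytes INTEGER)")

# Append-only insert: the same AS may appear more than once.
conn.executemany(
    "INSERT INTO traffic (as_number, bytes) VALUES (?, ?)",
    [(29518, 4000), (64496, 900), (29518, 600)],
)


def bytes_for_as(conn, as_number):
    """Return the summed byte count for one AS, or 0 if it has no rows."""
    row = conn.execute(
        "SELECT COALESCE(SUM(bytes), 0) FROM traffic WHERE as_number = ?",
        (as_number,),
    ).fetchone()
    return row[0]


print(bytes_for_as(conn, 29518))  # 4600
```

The summing work moves from the write path, which runs for every row every minute, to the read path, which runs only for the ASes actually being graphed.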

4.4.3 Problems with displaying the data 

4.4.3.1 MRTG graphing incorrectly 

Once the graphing was achieved, it appeared that the numbers were far too big to be trusted. Graphs

were showing more than 50 times more traffic than was actually flowing to other ASes.

Some testing showed that MRTG took the script's printed data and swapped the in-data and out-data.

In effect, if I2B were sending 10 Mbit to another ISP and receiving 20 Mbit worth

of traffic, MRTG would graph it as if I2B were receiving 10 Mbit and not 20 Mbit as it should. This

was despite the configuration saying that when two values are printed out, the first one would

automatically be tagged as “in” and the other as “out”.



After several days of researching the issue, it was also determined that MRTG wants its data “as-is”

and not aggregated over a period of time, as pmacct delivers it. pmacct keeps the data in memory until

the configured time, at which point it writes the data out. MRTG and pmacct were both

configured to poll or save data every 60 seconds. In the case of pmacct, this meant 60 seconds of

aggregated data.

A small script was set up and configured as a target in MRTG. It would print out the values

100 and 50 on different lines, simulating traffic in and traffic out. These would then be graphed,

giving consistent input for debugging, as errors in a graph of static data are easier to track

down. Manually creating images of the data using a script which called the

rrdtool command directly (see the appendix, chapter 8.10) showed that MRTG was working as

intended, except for the values switching place.

Solution: 

Switching in- and out-traffic for MRTG as well as dividing the printed data by 60 in the custom-made

script showed values closer to what was expected.
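The correction amounts to two small transformations before the values are printed for MRTG. The sketch below is illustrative: the function name is hypothetical, whole-number division is an assumption, and the swap simply mirrors the workaround described above, whose underlying cause was not identified.

```python
from typing import Tuple

AGGREGATION_SECONDS = 60  # pmacct wrote out 60 seconds of accumulated counters


def to_mrtg_gauge(bytes_in: int, bytes_out: int,
                  window: int = AGGREGATION_SECONDS) -> Tuple[int, int]:
    """Convert per-window byte totals to per-second rates, swapped for MRTG."""
    rate_in = bytes_in // window    # whole bytes per second
    rate_out = bytes_out // window
    # Workaround from the text: print the out-value first, then the in-value.
    return rate_out, rate_in


first, second = to_mrtg_gauge(bytes_in=6000, bytes_out=3000)
print(first)   # 50
print(second)  # 100
```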


Illustration 15: Static graph created for debugging purposes, graphing 100 bytes in and 50 

bytes out, in bits. Notice the switched values



5 Results 


Several conclusions can be drawn from the investigation of the software suite pmacct. The pmacct suite

shows strong indications of being able to scale up as the traffic load increases, as long as careful

consideration and planning are applied. Since this particular solution only involved getting detailed

traffic information from the attacking AS, it is uncertain whether this level of IP accounting is feasible when

gathering the same information from every connected AS in real time. That would mean recording every flow

between all customers and every connected ISP, which seems unlikely for

one machine to handle. However, pmacct is able to solve this by splitting the data between several

machines; each machine focuses on only one or a few ASes, as many as is deemed safe and within

margins.

The solution used in this thesis, namely collecting only the number of bytes and

packets sent and received for every AS while focusing in detail on the one attacking or being

attacked, seemed to work very well. It did seem strange that pmacct and MySQL would not

communicate directly, since it worked for other people according to the mailing lists. This was,

however, easily circumvented by the custom-made script.

Another odd event was MRTG and its switched values, where traffic in was being graphed as traffic 

out. An explanation for why this happened is still sought. One thing that was easy to explain was

why the data appeared so much bigger than it actually was, as this was a configuration mishap 

between MRTG and pmacct. 

The move from the small router to the core router turned out to be a real mystery and a lot of time 

was spent trying to figure out the reason. This unfortunately meant the IPv6 implementation had to

be scrapped. The data that was collected from the core router was very promising, however, since

the scale-up from about 50k routes to over 400k routes (a factor of 8) only increased the sizes of the

files by a factor of 4, it would indicate that the performance impact from scaling up would be 

negligible. The script which read the files and pushed the data into the database executed in about 

14 seconds, which meant there was plenty of time to spare. 

5.1 Analysis of results 

IPv6 would have been implemented if the configuration of pmacct had been completed faster. There was

unfortunately not enough time for both IPv4 and IPv6, and thus one had to be scrapped. The MRTG

incident with the data being switched could perhaps have been solved faster if its creator had been

contacted directly. The fact that MRTG graphed the values far too big was

something that could have been avoided had the whole documentation been read; pmacct

was aggregating one minute's worth of data while MRTG wanted it in another form.

The script which pushed data from the files to the database could have been written more 

efficiently, such as moving from a script to a fully fledged program which may have handled the 

database queries faster. 

Detecting DoS attacks turned out to be easier with the solution of storing the data in a database. As

mentioned above, in addition to graphing traffic from different ASes, the collection of detailed

information from a particular AS would also start once an attack is detected. Identifying the top

talkers is done quickly as it only involves an SQL query. A customer might open a ticket about


their router feeling slow or unresponsive altogether; a simple SQL query written to list the number

of connections to that particular router's IP would display the culprits. Ordering the results by most

packets sent or most bytes sent can also be done within the same query. Because MRTG is able to

graph basically anything, and thanks to the database solution, graphing the number of connections

on a device can be done with ease as well.
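Such a top-talker query might look like the sketch below. The table name `attack` and the columns `ip_src`, `ip_dst`, `packets` and `bytes` are assumptions about the schema, the addresses come from the documentation ranges, and `sqlite3` is used in place of MySQL so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attack (ip_src TEXT, ip_dst TEXT, packets INTEGER, bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO attack VALUES (?, ?, ?, ?)",
    [
        ("198.51.100.7", "192.0.2.1", 900, 54000),
        ("198.51.100.7", "192.0.2.1", 700, 42000),
        ("203.0.113.9", "192.0.2.1", 30, 20000),
        ("203.0.113.9", "192.0.2.99", 10, 5000),
    ],
)


def top_talkers(conn, target_ip, limit=10):
    """List sources sending to target_ip, busiest first by packet count."""
    return conn.execute(
        "SELECT ip_src, COUNT(*) AS flows, SUM(packets) AS pkts, "
        "SUM(bytes) AS total "
        "FROM attack WHERE ip_dst = ? "
        "GROUP BY ip_src ORDER BY pkts DESC LIMIT ?",
        (target_ip, limit),
    ).fetchall()


talkers = top_talkers(conn, "192.0.2.1")
for row in talkers:
    print(row)
```

Changing `ORDER BY pkts` to `ORDER BY total` ranks the same sources by bytes instead of packets.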

5.2 Recommendations 

It might be advisable to put the pmacct collector on several different machines, thus splitting up

the workload. The first machine would accept NetFlow data from a router and just focus on one or

two ASes and then pass the flows onward to the next machine, which handles the rest. Another

solution is to have a dedicated machine for each router sending NetFlow data in the network. This

has the benefit of each router sending flows to its own collector instead of having one collector

receiving all the data. The workload on the hard drives should also be monitored, as the amount of

flow data sent from just the small router averaged about 20 gigabytes per day.

If Quagga is to take a more direct role, such as actively participating in the routing process,

it should be observed closely, since it is single-threaded and might not scale with the

increased load.

5.3 Future work 

Manual vs. automated 

As it stands, all the graphs have to be created individually, i.e. a graph is created for one particular AS,

then another and so on. This poses a daunting challenge when considering the number of ISPs that

exist. There are several tens of thousands of ISPs on the Internet today and creating a graph for

each and every one is not feasible. In addition, one would need to actually be watching the graphs in

order to see the attack take place and take appropriate action. One solution to this is to automate the

process using software, either custom scripts or already available products, to monitor the

traffic and send an alarm if it reaches a certain threshold. It would sample the traffic flow to find the

norm and send an alert if it spots an abnormal amount of traffic, using the norm as a template for

normal traffic.
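One simple way to express "abnormal compared to the norm" is a threshold of the historical mean plus a few standard deviations. The sketch below is only one possible interpretation of the idea; the function name, the factor `k = 3` and the sample values are assumptions, not something that was implemented or evaluated in this thesis.

```python
from statistics import mean, pstdev


def is_abnormal(history, current, k=3.0):
    """Flag current if it exceeds the historical mean by more than k std deviations."""
    if len(history) < 2:
        return False  # not enough samples to establish a norm
    baseline = mean(history)
    spread = pstdev(history)
    return current > baseline + k * spread


# Bytes per polling interval for one AS; made-up samples.
normal_minutes = [980, 1010, 1000, 995, 1015]
print(is_abnormal(normal_minutes, 1020))
print(is_abnormal(normal_minutes, 50_000))
```

A production version would have to account for daily traffic patterns, for instance by keeping a separate baseline per hour of the day, so that ordinary evening peaks do not trigger alerts.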

One could further develop this idea to automatically graph said ISP's network traffic in order to be 

ready if it turns out to be an actual attack. The alert message would contain enough information for 

the receiving script to automatically start graphing that particular ISP and display it on a web page 

containing potential threats. This would ensure a significantly lower workload for the employees as

well as eliminating human error.

Using the tools used in this thesis to additionally graph the number of active connections on devices 

is something that could be implemented for greater observation. 


6 Summary and conclusions 


The issue was to determine whether or not pmacct scaled well with larger installations, as it had 

been proven to work well with smaller ones. Investigation showed it would scale well as the

company grows. Both IPv4 and IPv6 were to be monitored, but time only allowed for

one. IPv4 was therefore chosen for monitoring as it is more commonly used throughout the world

today. Graphing the data was successfully done with MRTG and RRDtool. Previous concerns

regarding the hardware specifications turned out to be moot as pmacct doesn't appear to need much 

at all in that regard. It runs well on normal computers one would expect to find at home. However, 

the amount of NetFlow-data sent to the collector turned out to be greater than anticipated. 

Considering that the average amount of flow data turned out to be 20 gigabytes per day from the small

router, the workload on the hard drives over a sustained time period should be taken into 

consideration. 

The project was not without problems however, with some still being unresolved. 

Once the small router's flows were being graphed, the setup was to be migrated to the core router. It all

started well with the software router instantly peering with the core router; however, the flow of data

stopped after a few minutes. A buffer overflow problem was ruled out as a restart did not help; if it

had been a buffer overflow problem, some data should have been gathered before the buffers got full

again. The only remaining conclusion is an incorrect NetFlow configuration, since the routers were

running different versions of the operating system.

More problems did unfortunately arise with storing as well as displaying the data. 

MySQL and pmacct did not want to communicate with each other, which meant that

pmacct could not store its data in the database. The issue was resolved with a small configuration change:

temporarily storing the data as CSV-formatted files and then writing a custom script to read through

the files and push the data into MySQL. Fine-tuning the script itself turned out to be

necessary as its execution time was far too long. This was due to the script making a query back to

the database to see if the AS already existed or not. This turned out to be over 3000 SQL queries,

with data from just the small router. Once the correction was made, the script executed in 1/8th

of the time.

The graphing issue turned out to be a problem between pmacct and MRTG. Apart from the graphed

in- and out-numbers being too big to be real, they had also swapped places with each other. Thus, the

amount of data sent in to I2B was being graphed as data going out from them instead. The solution

to this was simply to switch the numbers manually, but the reason remains unknown. The reason for

graphing far too big values was pmacct holding the data for the configured 60 seconds before

storing it, while MRTG wanted the data differently. Numbers came closer to reality once division by

60 was made before passing the data to MRTG.

The results were especially promising since interesting information had been received during the 

few minutes of operation after the migration. It showed that the performance hit when switching to 

the core router, an increase of routes by a factor of ten, was in fact negligible. The files grew to a 

manageable size with the script finishing in at about 14 seconds, having plenty of time to spare 

before the next update. 


7 References 


[1] “Cyber Attacks Now 2nd Most Common Economic Crime”

http://www.securitiestechnologymonitor.com/news/cyber-attacks-economic-crime-big-30254-1.html

Article publication date: March 27, 2012. Accessed May 23rd, 2012.

[2] “Introduction to Cisco IOS NetFlow - A Technical Overview”, figure 1.

http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6555/ps6601/prod_white_paper0900aecd80406232.html

Article publication date: October 2007. Accessed May 23rd, 2012.

[3] “Understanding Denial-of-Service Attacks”

http://www.us-cert.gov/cas/tips/ST04-015.html

Article publication date: Aug 11, 2004. Accessed May 23rd, 2012.

[4] “pmacct homepage”

http://www.pmacct.net/

[5] “List of software related to flow accounting”

http://www.switch.ch/network/projects/completed/TF-NGN/floma/software.html

Page last updated: March 3rd, 2012. Accessed May 23rd, 2012.

[6] “NetFlow Analyzer by ManageEngine”

http://www.manageengine.com/

Accessed May 23rd, 2012.

[7] “Arbor Peakflow”

http://www.arbornetworks.com/arbor-peakflow-ip-traffic-flow-monitoring-system.html

[8] “Flowc”

http://netacad.kiev.ua/flowc/index.php?id=4

Page last updated: Jul 8th, 2006. Accessed May 23rd, 2012.

[9] “MariaDB”

http://mariadb.org/

[10] “MySQL”

http://mysql.com/

Page last updated: April 10th, 2012. Accessed May 23rd, 2012.

[11] “Microsoft SQL Server”

http://www.microsoft.com/sqlserver/en/us/default.aspx

[12] “PostgreSQL”

http://www.postgresql.org/

Page last updated: May 14th, 2012. Accessed May 23rd, 2012.


[13] “Cacti”

http://cacti.net/

Page last updated: April 29th, 2012. Accessed May 23rd, 2012.

[14] “MRTG”

http://oss.oetiker.ch/mrtg/

Page last updated: Feb 18th, 2012. Accessed May 23rd, 2012.

[15] “Munin”

http://munin-monitoring.org/

Page last updated: May 5th, 2012. Accessed May 23rd, 2012.

[16] “RFC 791 – DARPA Internet Program Protocol Specification”

http://tools.ietf.org/html/rfc791

Page created: Sept 1981. Accessed May 24th, 2012.

[17] “The IANA IPv4 Address Free Pool is Now Depleted”

https://www.arin.net/announcements/2011/20110203.html

Article publication date: Feb 3rd, 2011. Accessed May 25th, 2012.

[18] “Free Pool of IPv4 Address Space Depleted”

http://www.nro.net/news/ipv4-free-pool-depleted

Article publication date: Feb 3rd, 2011. Accessed May 25th, 2012.

[19] “RFC 2460 – Internet Protocol version 6”

http://www.ietf.org/rfc/rfc2460.txt

Page created: Dec 1998. Accessed May 25th, 2012.

[20] “RFC 1930 - Guidelines for creation, selection, and registration of an Autonomous System (AS)”

Page created: Mar 1996. Accessed May 25th, 2012.

[21] “RFC 4271 - A Border Gateway Protocol 4 (BGP-4)”

http://www.ietf.org/rfc/rfc4271

Page created: Jan 2006. Accessed May 25th, 2012.

[22] “RFC 1771 - A Border Gateway Protocol 4 (BGP-4)”

http://www.ietf.org/rfc/rfc1771.txt

Page created: Mar 1995. Accessed May 25th, 2012.

[23] “RFC 2328 – OSPF Version 2”

Page created: April 1998. Accessed May 25th, 2012.

[24] “TCP SYN Flooding and IP Spoofing Attacks”

http://www.cert.org/advisories/CA-1996-21.html

Article publication date: Sep 19, 1996. Accessed Aug 11th, 2012.

[25] “RFC 1142 – OSI IS-IS Intra-domain Routing Protocol”

Page created: Feb 1990. Accessed May 25th, 2012.

[26] “CISCO IOS NETFLOW OVERVIEW”

http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6555/ps6601/prod_presentation0900aecd80311f57.pdf

Article publication date: Feb 2004. Accessed May 26th, 2012.

[27] “What is Linux”

http://www.linux.org/article/view/what-is-linux

Page created: March 5th, 2012. Page last updated: May 25th, 2012. Accessed May 26th, 2012.

[28] “RFC 1918 – Address Allocation for Private Internets”

Page created: Feb 1996. Accessed May 29th, 2012.

[29] “GNU and Linux”

http://www.gnu.org/gnu/linux-and-gnu.html

Page last updated: May 18th, 2012. Accessed June 1st, 2012.

[30] “Newbie: Intro to cron”

http://www.unixgeeks.org/security/newbie/unix/cron-1.html

Page created: Dec 30th, 1999. Accessed June 1st, 2012.

[31] “Apache Frequently Asked Questions”

http://wiki.apache.org/httpd/FAQ

Page last updated: Jan 17th, 2012. Accessed June 1st, 2012.

[32] “March 2012 Web Server Survey”

http://news.netcraft.com/archives/2012/03/05/march-2012-web-server-survey.html

Page created: March 5th, 2012. Accessed June 1st, 2012.

[33] “Scripting Languages” - Rich Morin and Vicki Brown

http://www.mactech.com/articles/mactech/Vol.15/15.09/ScriptingLanguages/index.html

Page created: 1999. Accessed June 1st, 2012.

[34] “Introduction to Bash-scripting” - Mike G Mikkey

http://www.linuxdoc.org/HOWTO/Bash-Prog-Intro-HOWTO.html

Article publication date: Jul 27th, 2000. Accessed June 4th, 2012.

[35] “Quagga”

http://www.nongnu.org/quagga/docs/docs-info.html

Page created: July 28th, 2006. Page last updated: April 1st, 2011. Accessed June 8th, 2012.

[36] “MRTG” - Tobias Oetiker

http://oss.oetiker.ch/mrtg/doc/mrtg.en.html

Page last updated: Jan 13th, 2012. Accessed June 8th, 2012.

[37] “RRDTOOL” - Tobias Oetiker

http://oss.oetiker.ch/rrdtool/doc/rrdtool.en.html



[38] “Libpcap framework” 

http://www.tldp.org/LDP/Linux-Dictionary/html/l.html 

Accessed June 12 th , 2012. 

[39] “sFlow Version 5 specifications” 

http://www.sflow.org/sflow_version_5.txt 

Page created: July 2004. Accessed June 17 th , 2012. 

[40] “Closer look: sFlow better than NetFlow?” - Brad Reese 

http://www.networkworld.com/community/node/29117 

Page created: Jun 20 th 2008. Accessed June 18 th , 2012. 

[41] “NetFlow Vs. sFlow – It may matter to you” - Michael Pattersen, Plixer International 

http://www.plixer.com/blog/general/netflow-vs-sflow-it-may-matter-to-you/ 

Page created: Jan 21 st 2009. Accessed June 20 th , 2012. 

[42] “Cisco's NetFlow vs. Inmon's sFlow: Which will prevail?” - Brad Reese 

http://www.networkworld.com/community/node/22667 

Page created: Dec 5 th 2007. Accessed June 20 th , 2012. 

[43] “An Introduction to sFlow technology and a brief on sFlow vs NetFlow technology” - Arun 

Karthik Asokan, ManageEngine 

http://blogs.manageengine.com/netflowanalyzer/2011/03/11/netflow-vs-sflow/ 

Page created: Mar 11 th 2011. Accessed June 21 st , 2012. 

[44] “IPFIX fine-tunes traffic analysis” - Paul Kohler and Benoit Claise, Network World 

http://www.networkworld.com/news/tech/2003/0811techupdate.html 

Page created: Aug 11 th 2003. Accessed June 21 st , 2012. 

[45] “NetFlow Switching Enhancements Feature Module” 

http://www.cisco.com/en/US/docs/ios/11_2/feature/guide/netflow.html 

Accessed June 21 st , 2012. 

[46] “sFlow” 

http://www.sflow.org/ 

Accessed June 22 nd , 2012. 

[47] “IPFIX” 

http://tools.ietf.org/wg/ipfix/ 

Accessed June 23 rd , 2012. 

[48] “NFDUMP” 

http://nfdump.sourceforge.net/ 

Page last updated: Dec 31 st 2011. Accessed June 25 th , 2012. 

[49] “NfSen” 

http://sourceforge.net/projects/nfsen/ 


[50] “Flowd” 

http://code.google.com/p/flowd/ 

Page last updated: Nov 2 nd 2010. Accessed June 26 th , 2012. 

[51] “TRANSMISSION CONTROL PROTOCOL” 

http://tools.ietf.org/html/rfc793#page-15 

Page created: Sept 1981. Accessed June 30 th , 2012. 

[52] “TShark man-pages” 

http://www.wireshark.org/docs/man-pages/tshark.html 

Accessed July 2 nd , 2012. 

[53] Case-study, peering at AS286 – Paolo Lucente 

http://www.pmacct.net/pmacct_peering_epf5.pdf (the bottom of the presentation) 

Presentation published Sept 2010. Accessed July 5 th , 2012. 

[54] Border Gateway Protocol, Cisco Wiki 

http://docwiki.cisco.com/wiki/Border_Gateway_Protocol 

Page last updated: Dec 17 th 2009. Accessed July 3 rd , 2012. 

[55] BGP AS PATH Attributes 

http://www.inetdaemon.com/tutorials/internet/ip/routing/bgp/operation/messages/update/attributes/index.shtml#wkd 

Page last updated: Aug 10 th 2012. Accessed Aug 11 th 2012. 

[56] BGP – Existing Implementations 

http://www.openbsd.org/papers/bgpd_sucon04/mgp00006.html 

Accessed Aug 11 th 2012. 

[57] “Bird Internet routing daemon” 

http://bird.network.cz/ 

Page last updated: Mar 2 nd 2012. Accessed Aug 11 th 2012. 

[58] “LONAP's Route Servers”, Davidson, Andy 

http://www.uknof.org.uk/uknof13/Davidson-LONAP_routeservers.pdf 

Presentation published May 28 th 2009. Accessed Aug 11 th 2012. 

[59] “BIRD Route Server and Linx”, Preston, Tim 

http://www.uknof.org.uk/uknof15/Preston-Routeserver.pdf 

Presentation published Jan 21 st 2010. Accessed Aug 11 th 2012. 

[60] “BIRD package for Debian” 

http://packages.debian.org/squeeze/bird 


[61] “BIRD package for Ubuntu” 

http://packages.ubuntu.com/hardy/bird 


[62] “OpenBGPD” 

http://www.openbgpd.org/ 

Page last updated: Mar 25 th 2012. Accessed Aug 11 th 2012. 

[63] “A Secure BGP Implementation”, Henning Brauer 

http://www.openbsd.org/papers/bgpd_sucon04/mgp00006.html 


[64] “Vyatta” 

http://www.vyatta.com/ 


[65] “Vyatta Commercial Evaluation Request” 

http://www.vyatta.com/product/vyatta-network-os/get-started 


8 Appendix 

This appendix contains detailed technical material, such as the source code of the scripts and the configuration files for the related programs. 

8.1 E-mail conversation with Paolo Lucente 

pmacct Case Studies 

From: Paolo Lucente (paolo@pmacct.net) 

Sent: Thursday, April 4, 2012 2:06:21 AM 

To: Daniel Karlström (dkm07001@student.mdh.se) 

Hi Daniel, 

Thanks for your interest in pmacct. 

There have indeed been implementations of pmacct on such a scale before. 

Unfortunately, once the implementation is finished the ISP's usually wrap 

it up in an NDAs (Non-Disclosure Agreements), making me unavailable to 

reveal the information to anyone else. This is due to security reasons, 

as such case studies would undoubtedly contain information about their 

internal network and they don't want that getting out. 

The only thing i can point you is at the end of this presentation: 

http://www.pmacct.net/pmacct_peering_epf5.pdf 

A few slides on the case-study of AS286, using pmacct for their peering 

purposes. AS286 is a reputable IP carrier (tier 2) in Europe. 

Regards, 

Paolo 

On Wed, April 3, 2012 at 10:22:49PM +0200, Daniel Karlström wrote: 

> 

> Greetings Paolo, 

> 

> My name is Daniel Karlström and I am doing a thesis on pmacct for Mälardalen University, Sweden. 

> The thesis is investing whether or not pmacct is suitable for larger installations, at the ISP level for instance. I have not come across any prior work in this field and was wondering if they exist. 

> Are there any existing case studies I might use and reference to? 

> 

> Regards, 

> 

> Daniel Karlström, Mälardalen University, Sweden. 

8.2 Enabling NetFlow on Cisco devices 

Example configuration of a Cisco router that exports information about incoming and outgoing traffic using NetFlow version 5. The export is sourced from the router's own FastEthernet 0/0 interface and sent to a collector residing at the IP address 192.168.0.10 on port 2055: 

Router>enable 

Router#conf t 

Router(config)#ip flow-export source fastEthernet 0/0 

Router(config)#ip flow-export version 5 

Router(config)#ip flow-export destination 192.168.0.10 2055 

Router(config)#interface fastEthernet 0/0 

Router(config-if)#ip flow ingress 

Router(config-if)#ip flow egress 

Router(config-if)#ip route-cache flow 
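
To illustrate what the collector at 192.168.0.10:2055 receives from a configuration like this, the following Python 3 sketch unpacks the 24-byte header of a NetFlow version 5 export packet. The function name `parse_v5_header` and the sample values are invented for the example; the field layout follows the published v5 header format, and the flow records that follow the header are not decoded here.

```python
import struct

# NetFlow v5 header layout in network byte order (24 bytes in total):
# version, count, sys_uptime, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval
V5_HEADER = struct.Struct("!HHIIIIBBH")

FIELD_NAMES = (
    "version", "count", "sys_uptime", "unix_secs", "unix_nsecs",
    "flow_sequence", "engine_type", "engine_id", "sampling_interval",
)


def parse_v5_header(packet):
    """Return the header fields of a NetFlow v5 packet as a dict."""
    if len(packet) < V5_HEADER.size:
        raise ValueError("packet is shorter than a NetFlow v5 header")
    header = dict(zip(FIELD_NAMES, V5_HEADER.unpack_from(packet)))
    if header["version"] != 5:
        raise ValueError("not a NetFlow v5 packet")
    return header


# Synthetic header for demonstration only (hypothetical values).
sample = V5_HEADER.pack(5, 2, 123456, 1338500000, 0, 42, 0, 0, 0)
header = parse_v5_header(sample)
print(header["version"], header["count"], header["flow_sequence"])
```

In a real collector the bytes would come from a UDP socket bound to port 2055; pmacct's nfacctd performs this decoding itself, so the sketch is only meant to show what the export contains.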

8.3 Configuration of NetFlow on the small router 

ip flow-cache timeout inactive 10 

ip flow-cache timeout active 5 

mls flow ip full 

interface xx x/x 

ip flow ingress 

ip flow-export source Loopback0 

ip flow-export version 9 

ip flow-export destination x.x.x.x 2100 

8.4 Configuration of Quagga 

router bgp wwwww 

bgp router-id x.x.x.x 

bgp enforce-first-as

! 

neighbor y.y.y.y remote-as zzzzz 

neighbor y.y.y.y update-source x.x.x.x 

neighbor y.y.y.y default-originate 

neighbor y.y.y.y soft-reconfiguration inbound 

neighbor y.y.y.y prefix-list deny-all out 

neighbor 127.0.0.1 remote-as zzzzz 

neighbor 127.0.0.1 port 17917 

neighbor 127.0.0.1 update-source 127.0.0.1 

neighbor 127.0.0.1 route-reflector-client 

ip prefix-list deny-all seq 5 deny 0.0.0.0/0 le 32 
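
The `deny-all` prefix list above relies on the entry `0.0.0.0/0 le 32` matching every IPv4 prefix, so that nothing is announced to the neighbor. A minimal Python 3 sketch of that matching rule, assuming the usual reading of `le` (the candidate lies inside the entry's network and its prefix length is at most the `le` value), could look as follows. The function name `prefix_list_matches` and the test prefixes are hypothetical, and this is not Quagga's implementation.

```python
import ipaddress


def prefix_list_matches(entry, le, candidate):
    """Check whether `candidate` matches a prefix-list entry with an `le` bound.

    A candidate matches when it is contained in the entry's network and
    its prefix length lies between the entry's own length and `le`.
    """
    entry_net = ipaddress.ip_network(entry)
    cand_net = ipaddress.ip_network(candidate)
    return (cand_net.subnet_of(entry_net)
            and entry_net.prefixlen <= cand_net.prefixlen <= le)


# "0.0.0.0/0 le 32" covers every IPv4 prefix, whatever its length.
for prefix in ("0.0.0.0/0", "10.0.0.0/8", "192.168.1.0/24", "192.0.2.1/32"):
    print(prefix, prefix_list_matches("0.0.0.0/0", 32, prefix))

# A narrower entry for contrast: a /24 is too long for "10.0.0.0/8 le 16".
print(prefix_list_matches("10.0.0.0/8", 16, "10.1.2.0/24"))
```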

8.5 Configuration of pmacct 

daemonize: true 

pidfile: /var/run/nfacctd.pid 

logfile: /usr/local/etc/pmacct/nfacctd.log 

! 

interface: eth0 

nfacctd_ip: x.x.x.x 

nfacctd_port: 2100 

nfacctd_renormalize: true 

! 

aggregate[out]: dst_as,peer_src_as,peer_dst_as 

aggregate[in]: src_as,peer_src_as 

aggregate[attack]: dst_as,src_host,dst_host,src_port,dst_port,proto,tos 

! 

pre_tag_filter[attack]: 100 

pre_tag_map: /usr/local/etc/pmacct/pretag.map 

! 

refresh_maps: true 

! 

bgp_daemon: true 

bgp_daemon_ip: 127.0.0.1 

bgp_daemon_port: 17917 

bgp_daemon_msglog: true 

bgp_agent_map: /usr/local/etc/pmacct/agent.map 

nfacctd_as_new: bgp 

! 

pmacctd_force_frag_handling: true 

plugin_buffer_size: 990480000 

plugin_pipe_size: 904800000 

plugins: print[out],print[in],print[attack] 

print_output_file[out]: /usr/local/etc/pmacct/nfacctd.out 

print_output_file[in]: /usr/local/etc/pmacct/nfacctd.in 

print_output_file[attack]: /usr/local/etc/pmacct/nfacctd.attack 

print_output: csv 

print_refresh_time: 60 

8.5.1 agent.map 

id=x.x.x.x ip=y.y.y.y 

8.5.2 pretag.map 

id=100 ip=y.y.y.y dst_as=zzzzz 
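
Both map files consist of whitespace-separated key=value pairs. The Python 3 sketch below shows one way such a line could be parsed and compared with a flow, assuming that `ip` denotes the exporting router and that a rule applies when every key other than `id` matches. The function names, the layout of the `flow` dict, and the concrete address and AS numbers are invented for the example; this is not how pmacct itself is implemented.

```python
def parse_map_line(line):
    """Split a 'key=value key=value ...' map line into a dict of strings."""
    return dict(pair.split("=", 1) for pair in line.split())


def tag_for_flow(rule, flow):
    """Return the rule's id if all other keys match the flow, else None."""
    for key, value in rule.items():
        if key == "id":
            continue
        if str(flow.get(key)) != value:
            return None
    return rule["id"]


# Hypothetical rule in the style of pretag.map, with documentation values.
rule = parse_map_line("id=100 ip=192.0.2.1 dst_as=64500")

flow_a = {"ip": "192.0.2.1", "dst_as": 64500}   # matches the rule
flow_b = {"ip": "192.0.2.1", "dst_as": 64501}   # different destination AS

print(tag_for_flow(rule, flow_a))
print(tag_for_flow(rule, flow_b))
```

With `pre_tag_filter[attack]: 100` in the main configuration, only flows that receive the tag 100 would then reach the `attack` plugin.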

8.6 Bash Script 

#!/bin/bash
color="blue"
if [ "$color" = "blue" ]; then
    echo "Blue is a very nice color!"
else
    echo "Not as good as blue, but.."
fi
echo "Program complete."

8.7 Python Script 

#!/usr/bin/env python
# -*- coding: utf-8 -*-
color = "blue"
if color == "blue":
    print("Blue is a very nice color!")
else:
    print("Not as good as blue, but..")
print("Program complete.")

8.8 Script for MRTG 

#!/bin/sh
# This script takes a single argument, namely the AS
# one wants traffic information about.
# Example: ./pmacct_get_traffic 29518 would
# give in- and out-data for AS 29518 (Bredband2)
# for the last 60 seconds.
AS=$1
unset IN
unset OUT
IN=`/usr/bin/mysql -e "select sum(bytes) from AS_in_v4 where src_as='$AS'" pmacct`
OUT=`/usr/bin/mysql -e "select sum(bytes) from AS_out_v4 where dst_as='$AS'" pmacct`
# Our variables now contain the column header along with
# the actual number, in the form "sum(bytes) 164277".
# The header is removed, as we only want the number.
# We also divide the numbers by 60, the number of seconds
# in pmacct's refresh interval. MRTG wants a current
# per-second value and not a total accumulated over the
# interval, hence the division by 60 is necessary.
IN=$(echo $IN | sed 's/sum(bytes) //')
IN=$(($IN/60))
echo $IN
OUT=$(echo $OUT | sed 's/sum(bytes) //')
OUT=$(($OUT/60))
echo $OUT
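
The same conversion can be sketched in Python 3 to make the arithmetic explicit. The function name `bytes_per_second` is hypothetical, and treating a `NULL` sum (no matching rows) as zero is an assumption added for the example rather than something the shell script spells out.

```python
def bytes_per_second(mysql_output, interval=60):
    """Turn the output of 'select sum(bytes) ...' into a per-second rate.

    The mysql client prints the column header followed by the value,
    so the last whitespace-separated token is the number of interest.
    """
    tokens = mysql_output.split()
    value = tokens[-1] if tokens else "NULL"
    # Assumption: an empty result or NULL sum is reported as zero traffic.
    total = 0 if value == "NULL" else int(value)
    return total // interval


print(bytes_per_second("sum(bytes)\n164277\n"))
print(bytes_per_second("sum(bytes)\nNULL\n"))
```

With the example value from the script's comments, 164277 bytes collected over 60 seconds correspond to 2737 bytes per second after integer division.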

8.9 Custom-made script to push the data into the database 

#!/usr/bin/env python 

# -*- coding: utf-8 -*- 

# Importing various required libraries. 

import MySQLdb 

import sys 

import csv 

import time 

import os 

# Declaring variables. 

script_exec_time = time.asctime( time.localtime(time.time()) ) 

query = "" 

# TIMER START 

startTime=time.clock() 

# DATABASE INFORMATION 

# host, user, passwd, db 

db = MySQLdb.connect("localhost", "user", "password", "pmacct") 

cursor = db.cursor() 

# Opening of the files for reading. 

file_out = csv.reader(open('/usr/local/etc/pmacct/nfacctd.out', 'rb'),
                      delimiter=',', quotechar='|')
file_in = csv.reader(open('/usr/local/etc/pmacct/nfacctd.in', 'rb'),
                     delimiter=',', quotechar='|')
file_attack = csv.reader(open('/usr/local/etc/pmacct/nfacctd.attack', 'rb'),
                         delimiter=',', quotechar='|')

# Creating lists to temporarily store the 

# data before it is pushed into SQL. 

v4_out = [] 

v4_in = [] 

v4_attack = [] 

# Populating the lists with the files' contents. 

v4_out.extend(file_out) 

v4_in.extend(file_in) 

v4_attack.extend(file_attack) 

# Remove the first line from each list, as it contains the
# names of the fields, such as PROTO, BYTE and PACKETS,
# rather than the values themselves. MySQL doesn't like doing
# numerical calculations on fields containing text, for instance.

v4_out.pop(0) 

v4_in.pop(0) 

v4_attack.pop(0) 

# Emptying the database before new values are inserted. 

cursor.execute("TRUNCATE TABLE AS_in_v4") 

cursor.execute("TRUNCATE TABLE AS_out_v4") 

cursor.execute("TRUNCATE TABLE AS_attack_v4") 

# Looping through each list and inserting the values 

# into MySQL. 

# Traffic going OUT from I2B 

for kolumn in v4_out:
    cursor.execute("""INSERT INTO AS_out_v4 (DST_AS, packets,
        bytes) VALUES (%s, %s, %s)""", (kolumn[9], kolumn[30],
        kolumn[32]))

# Note:
# kolumn 9, 30 and 32 are the delimited fields which
# contain the data we want to save. In this case, the
# destination AS, the number of packets and bytes sent.

# Traffic going IN to I2B
for kolumn in v4_in:
    cursor.execute("""INSERT INTO AS_in_v4 (SRC_AS, packets, bytes)
        VALUES (%s, %s, %s)""", (kolumn[8], kolumn[30], kolumn[32]))

# Note:
# kolumn 8, 30 and 32 are the delimited fields which
# contain the data we want to save. In this case, the
# source AS, the number of packets and bytes received.

# Attacking AS
for kolumn in v4_attack:
    cursor.execute("""INSERT INTO AS_attack_v4 (DST_AS, SRC_IP,
        DST_IP, DST_PORT, proto, packets, bytes) VALUES (%s, %s, %s, %s,
        %s, %s, %s)""", (kolumn[9], kolumn[21], kolumn[22], kolumn[26],
        kolumn[28], kolumn[30], kolumn[32]))

# Note:
# kolumn 9, 21, 22, 26, 28, 30 and 32 are the delimited fields
# which contain the data we want to save. In this case, the
# destination AS, source IP, destination IP, destination port,
# protocol and the number of packets and bytes sent.

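# Commit the inserts. MySQLdb does not autocommit by default, so
# this is needed for transactional tables such as InnoDB and is
# harmless for MyISAM tables.
db.commit()

# Close the open handles before exiting.
cursor.close()
db.close()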

# Print the execution time of the script. 

print ("%s: Execution time: %s seconds." % (script_exec_time, 

(time.clock() - startTime))) 
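
The script above selects fields by position (8, 9, 30, 32 and so on), which depends on the exact column order of the print plugin's CSV output. As a hedged alternative, the following Python 3 sketch reads the fields by header name instead. The header names `DST_AS`, `PACKETS` and `BYTES`, the function name `bytes_per_as` and the sample rows are assumptions made for the example and should be checked against the actual file.

```python
import csv
import io
from collections import defaultdict


def bytes_per_as(csv_text, as_column="DST_AS"):
    """Sum the BYTES column per AS, looking fields up by header name."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[as_column]] += int(row["BYTES"])
    return dict(totals)


# Synthetic sample data; a real file contains many more columns.
sample = (
    "DST_AS,PACKETS,BYTES\n"
    "29518,10,1500\n"
    "64500,4,600\n"
    "29518,2,300\n"
)
print(bytes_per_as(sample))
```

Looking fields up by name keeps the script working if the aggregation in the pmacct configuration changes and columns move.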

8.10 RRDTOOL-script (rrdtool.sh) 

/usr/bin/rrdtool graph $1.png \ 

--imgformat=PNG \ 

--start=-7200 \ 

--end=-60 \ 

--title='Pmacct Graph' \ 

--rigid \ 

--base=1000 \ 

--height=120 \ 

--width=500 \ 

--alt-autoscale-max \ 

--lower-limit=0 \ 

--vertical-label='bits/s' \ 

--slope-mode \ 

--font TITLE:12: \ 

--font AXIS:8: \ 

--font LEGEND:8: \ 

--font UNIT:8: \ 

DEF:a="/var/www/mrtg/$1.rrd":ds0:AVERAGE \ 

CDEF:cdefa=a,8,* \ 

DEF:b="/var/www/mrtg/$1.rrd":ds1:AVERAGE \ 

CDEF:cdefb=b,8,* \ 

AREA:cdefa#0000ffff:"Out Traffic" \ 

GPRINT:cdefa:LAST:"Current\:%8.2lf %s" \ 

GPRINT:cdefa:AVERAGE:"Average\:%8.2lf %s" \ 

GPRINT:cdefa:MAX:"Maximum\:%8.2lf %s\n" \ 

AREA:cdefb#00ff0099:"In traffic" \ 

GPRINT:cdefb:LAST:"Current\:%8.2lf %s" \ 

GPRINT:cdefb:AVERAGE:"Average\:%8.2lf %s" \ 

GPRINT:cdefb:MAX:"Maximum\:%8.2lf %s\n" 

AREA:cdefb#FFC73BFF:"In traffic":STACK \ 
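
The `CDEF` lines in the script are written in reverse Polish notation: `a,8,*` pushes the value of `a` and the constant 8 and then multiplies them, converting bytes per second into the bits per second shown on the vertical axis. A small Python 3 sketch of such an evaluation is given here; the function name `eval_rpn` is hypothetical and only the four basic arithmetic operators are handled, far fewer than RRDtool supports.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression, variables):
    """Evaluate a comma-separated RPN expression such as 'a,8,*'."""
    stack = []
    for token in expression.split(","):
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


# 2737 bytes/s, as in the MRTG example value, expressed in bits/s.
print(eval_rpn("a,8,*", {"a": 2737}))
```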
