
SEKE 2012 Proceedings - Knowledge Systems Institute


$$\text{Bandwidth}_{daily} = R_{size} \times R_{sum} \tag{6}$$

$$\text{Storage}_{requirement} = U_{pop} \times U_{size} \tag{7}$$

The outputs of the model are given by the formulas above. The CPU requirement is expressed in MHz, while the size-related outputs (bytes) are expressed in the same units as $R_{size}$ and $U_{size}$. Thus, defining the parameters in MB yields the requirements in MB. Keep in mind that the Hardware Model takes GB as input, so if the sizes are defined in MB, the outputs must be converted to GB at a later stage.
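Equations (6) and (7), together with the unit conversion mentioned above, can be sketched as follows. This is only an illustration: the function names, the sample inputs, and the choice of 1024 MB per GB are assumptions and do not come from the paper.

```python
# Sketch of the Software Model outputs (6) and (7) plus an MB -> GB conversion.
# Assumption: 1 GB = 1024 MB. Use 1000 if the provider bills in decimal GB.
MB_PER_GB = 1024


def bandwidth_daily(r_size, r_sum):
    """Equation (6): daily bandwidth, in the same unit as r_size."""
    return r_size * r_sum


def storage_requirement(u_pop, u_size):
    """Equation (7): total storage, in the same unit as u_size."""
    return u_pop * u_size


def mb_to_gb(value_mb):
    """Convert a size in MB to GB before passing it to the Hardware Model."""
    return value_mb / MB_PER_GB


# Hypothetical inputs, chosen only to illustrate the unit handling.
bandwidth_mb = bandwidth_daily(r_size=0.5, r_sum=20_000)
storage_mb = storage_requirement(u_pop=1_000, u_size=2.0)
bandwidth_gb = mb_to_gb(bandwidth_mb)
storage_gb = mb_to_gb(storage_mb)
```

Keeping the conversion in a separate step mirrors the text: the Software Model stays unit-agnostic, and only the hand-over to the Hardware Model fixes the unit to GB.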

D. Hardware Model

The main function of the Hardware Model is to calculate the cost of the hardware requirements derived from the Software Model. It calculates the cost for a one-month period. The Hardware Model takes all outputs of the Software Model except the peak bandwidth as inputs, and uses the following parameters:

• CPU Cost ($C_{CPU}$): The cost of one MHz of CPU.
• Storage Cost ($C_{storage}$): The cost of storing one GB on the servers per month.
• Bandwidth Cost ($C_{bandwidth}$): The cost of using one GB of bandwidth per month.

The Hardware Model has the following outputs: the cost of fulfilling the CPU requirement (8), the network cost per month (9), and the storage cost (10).

$$\text{CPU}_{cost} = C_{CPU} \times R_{CPU} \tag{8}$$

$$\text{Bandwidth}_{cost} = C_{bandwidth} \times \text{Bandwidth}_{daily} \times 30 \tag{9}$$

$$\text{Storage}_{cost} = C_{storage} \times \text{Storage}_{requirement} \tag{10}$$

Equation (8) is designed to derive the cost-performance of the modeled application on the given hardware. For applications hosted at an Infrastructure as a Service (IaaS) provider such as Amazon or Rackspace, we also pay for the bandwidth and storage of our applications. For the bandwidth cost in (9), we first calculate the daily cost and then convert it to a monthly basis by multiplying by 30. For the storage cost in (10), providers commonly charge a certain amount per GB per month, so no conversion is needed: we simply multiply the number of GB required by the price per GB.
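A minimal sketch of equations (8) to (10) is given below. The class and function names, as well as the prices in the usage example, are hypothetical placeholders rather than values from the paper or from any provider's price list. Bandwidth and storage are assumed to be already expressed in GB.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # the factor used in equation (9)


@dataclass
class HardwarePrices:
    c_cpu: float        # cost of one MHz of CPU
    c_storage: float    # cost of storing one GB per month
    c_bandwidth: float  # cost of one GB of bandwidth


def monthly_cost(prices, r_cpu_mhz, bandwidth_daily_gb, storage_gb):
    """Return the monthly cost components of the Hardware Model."""
    cpu = prices.c_cpu * r_cpu_mhz                                        # (8)
    bandwidth = prices.c_bandwidth * bandwidth_daily_gb * DAYS_PER_MONTH  # (9)
    storage = prices.c_storage * storage_gb                               # (10)
    return {
        "cpu": cpu,
        "bandwidth": bandwidth,
        "storage": storage,
        "total": cpu + bandwidth + storage,
    }


# Hypothetical prices and requirements, for illustration only.
example_prices = HardwarePrices(c_cpu=0.5, c_storage=0.25, c_bandwidth=0.125)
example_costs = monthly_cost(
    example_prices, r_cpu_mhz=1000, bandwidth_daily_gb=2, storage_gb=8
)
```

Returning the three components separately, rather than only the total, makes it easier to see which resource dominates the cost for a given application.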

TABLE I
THE APPLICATIONS INCLUDED IN THE VALIDATION.

Application    U_pop    R_size
1              15140    351839
2              1334     2210956
3              599      676276

III. EVALUATION

This section contains an evaluation of the models. The evaluation was conducted at Malvacom AB; the parameters were estimated using historical data. The validation was done by modeling three applications that are currently in production. The applications were selected based on their user population: we wanted to compare one small, one medium-sized, and one large application from the available dataset in order to see how attributes such as the user population and the users' availability impact the actual cost. The selected applications are listed in Table I. Application 1 is an application used for walking and social interactions, while Applications 2 and 3 are social networking applications. The results were then analyzed and compared against developer opinions to see how the model performs against expert opinion.

In the first step of the evaluation we describe how we estimated the parameters. During the estimation we had access to data that allowed us to estimate the users' availability and various attributes of the requests, such as request size. For the last step, when looking at the cost of deployment, we chose Amazon EC2 due to their clear documentation regarding pricing (any other provider can be used as long as it provides clear pricing data).

A. Parameter Estimation

For the parameter P in the User Model we summarized the traffic for mAppBridge. We grouped the requests per hour and took the average number of requests based on the number of days for which we had collected traffic and the average number of unique users per day. We chose to treat each request as a virtual user, which results in 100% user activity when the number of requests equals the total population. Our extracted user pattern is described in the matrix below. The population and the average request size for each of the applications are listed in Table I.

$$P = \begin{pmatrix}
0.113 & 0.091 & 0.087 & 0.135 & 0.260 & 0.383 \\
0.530 & 0.613 & 0.626 & 0.652 & 0.660 & 0.675 \\
0.679 & 0.711 & 0.695 & 0.651 & 0.624 & 0.658 \\
0.645 & 0.655 & 0.597 & 0.439 & 0.266 & 0.165
\end{pmatrix}$$
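One possible reading of the estimation procedure for P is sketched below. The log format (pairs of timestamp and user id), the function name, and the cap at 1.0 are assumptions made for illustration; the paper does not give the exact computation.

```python
from collections import defaultdict
from datetime import datetime


def estimate_hourly_activity(requests):
    """Estimate 24 hourly activity values from a request log.

    requests: iterable of (timestamp, user_id) pairs, where timestamp is a
    datetime. This log format is hypothetical.
    """
    requests_per_hour = [0] * 24
    users_per_day = defaultdict(set)
    for timestamp, user_id in requests:
        requests_per_hour[timestamp.hour] += 1
        users_per_day[timestamp.date()].add(user_id)

    n_days = len(users_per_day)
    if n_days == 0:
        return [0.0] * 24

    avg_unique_users = sum(len(u) for u in users_per_day.values()) / n_days
    # Each request counts as one virtual user. Capping at 1.0 (100% activity)
    # is an assumption of this sketch.
    return [
        min(1.0, (count / n_days) / avg_unique_users)
        for count in requests_per_hour
    ]


# Tiny hypothetical log: two days, two users.
sample_log = [
    (datetime(2012, 1, 1, 9, 5), "a"),
    (datetime(2012, 1, 1, 9, 40), "b"),
    (datetime(2012, 1, 2, 9, 15), "a"),
    (datetime(2012, 1, 2, 10, 30), "b"),
]
sample_activity = estimate_hourly_activity(sample_log)
```

The result is a flat list of 24 values, one per hour of the day, which corresponds to reading the 4 × 6 matrix above row by row.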

Unfortunately, we did not have any good source regarding the data activity for the application, so we assume that there was always something new for the user to fetch and, hence, set the activity to 100% for each hour, i.e. $DP = \{t_1 = 1, t_2 = 1, \ldots, t_{24} = 1\}$.
