wu.cloud: Insights Gained from Operating a Private Cloud System
Stefan Theußl, Institute for Statistics and Mathematics
WU Wirtschaftsuniversität Wien
March 23, 2011

Introduction
◮ In statistics we are increasingly facing the following challenges:
    ◮ more accurate and time-consuming models (1),
    ◮ computationally intensive applications (2),
    ◮ and/or large datasets (3).
◮ Thus, one could
    ◮ just wait (1+2),
    ◮ reduce problem size (3),
  or
    ◮ run similar tasks on independent processors in parallel (A),
    ◮ load data onto multiple machines that work together in parallel (B),
    ◮ outsource computation (C).
In this talk we focus on option C: outsourcing computation.

Requirements (Applications)
Statisticians need/want to
◮ run computationally intensive applications,
◮ process large data sets,
◮ run memory-demanding applications.
For example:
◮ Bayesian statistics (Gibbs sampling)
◮ Complex optimization problems
◮ Investigation of CDS/bond quotes and trades via a database backend
◮ MC simulation: hedging of options with Lévy processes
◮ Text mining on large data sets
◮ Topic models

Requirements (Software)
However, the scientific software employed is usually rather heterogeneous:
◮ R: want to use the current version and a complete development environment
◮ Compilers: GNU Compiler Collection, Intel Compiler, etc.
◮ Mathematica and gridMathematica
◮ Matlab
◮ Optimization: want to use state-of-the-art optimizers like CPLEX, GLPK, KNITRO, MOSEK, etc.
◮ ideally on different platforms: Linux- and Windows-based systems (32- and 64-bit)
◮ using various editors: emacs, RStudio, Winedit, nano, vi, etc.

wu.cloud
From the NIST Definition of Cloud Computing (see http://csrc.nist.gov/groups/SNS/cloud-computing/) we derived the following cloud model for wu.cloud:
◮ private cloud, operated solely for WU members and projects,
◮ thus, network access only via Intranet/VPN,
◮ on-demand self-service,
◮ resource pooling via virtualization,
◮ extensibility/elasticity,
◮ Infrastructure as a Service (IaaS),
◮ Platform as a Service (PaaS).

wu.cloud
wu.cloud is a private cloud system based on the open source software package Eucalyptus (see http://open.eucalyptus.com/).
◮ Accessible via http://cloud.wu.ac.at/.
◮ Consists of a frontend (website, management software) and a backend (providing resources) system.

wu.cloud Hardware
Backend system:
◮ 2x IBM X3850 X5
◮ 8x8 (64) core Intel Xeon CPUs 2.26 GHz
◮ 1 TB RAM
◮ EMC² Storage Area Network: 7 TB fast + 4 TB slow disks
◮ Suse Linux Enterprise Server 11 SP1
◮ Xen 4.0.1
◮ Eucalyptus backend components (cluster, storage, node controller)
Image credit: (c) 2010 IBM Corporation, from Datasheet XSD03054-USEN-05
Frontend system:
◮ Virtual (Xen) instance
◮ Apache Webserver
◮ Eucalyptus frontend components (cloud controller, walrus)

wu.cloud Characteristics
wu.cloud aims at scaling in three different dimensions:
◮ Compute-nodes: number of cloud instances and cores employed
◮ Memory: amount of memory per instance requested
◮ Software: Windows vs. Linux and software packages installed
[Figure: example instances plotted by CPU (axis 0 to 35) and RAM [GB] per instance (axis 1 to 256): Debian/gridMathematica virtual cluster, Windows/R high CPU instance, Debian/R high memory instance. Software labels: Linux base system, R/Mathematica/Matlab, R dev environment, GUI-based customized system, R dev environment, Matlab/PASW/Stata, Windows base system.]

wu.cloud User Interface
◮ Amazon EC2 API
    ◮ allows for using tools like ec2/euca2ools, hybridfox, etc., primarily designed for EC2
    ◮ transparent use of wu.cloud and EC2/S3 side by side
◮ Remote connection to cloud instances can be established by
    ◮ Secure shell (ssh), PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/)
    ◮ VNC (Linux)
    ◮ Remote Desktop (Windows)
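The side-by-side use of wu.cloud and EC2 can be sketched as follows. This is a minimal illustration, not the setup used at WU: it assumes the client tools read the endpoint and credentials from the environment variables EC2_URL, EC2_ACCESS_KEY and EC2_SECRET_KEY, as a Eucalyptus eucarc file typically sets them. The host name cloud.example.org, the port/path of the private endpoint, and the helper names are placeholders chosen for this example.

```python
import os

# Assumed endpoints. The private-cloud host is a placeholder; port 8773 and
# the /services/Eucalyptus path are the usual Eucalyptus defaults, not values
# taken from the talk.
ENDPOINTS = {
    "wu.cloud": "http://cloud.example.org:8773/services/Eucalyptus",
    "ec2": "https://ec2.us-east-1.amazonaws.com",
}


def client_environment(target, access_key, secret_key, base_env=None):
    """Return a copy of the environment that points the EC2 tools at `target`."""
    env = dict(os.environ if base_env is None else base_env)
    env["EC2_URL"] = ENDPOINTS[target]
    env["EC2_ACCESS_KEY"] = access_key
    env["EC2_SECRET_KEY"] = secret_key
    return env


def describe_instances_command():
    """Command line that lists instances; it is identical for both clouds."""
    return ["euca-describe-instances"]


if __name__ == "__main__":
    # Only the environment differs between the two targets, not the command.
    for target in ENDPOINTS:
        env = client_environment(target, "ACCESS-KEY", "SECRET-KEY")
        print(target, env["EC2_URL"], " ".join(describe_instances_command()))
```

To actually query a cloud, the command and environment could be passed to `subprocess.run(describe_instances_command(), env=env)`, provided euca2ools is installed and valid credentials are supplied.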

Insights Gained and Outlook
◮ Operating a private cloud environment is recommended under the following conditions:
    ◮ one wants to benefit from the cloud model (elasticity, dynamic provisioning, multi-OS/arch operation, etc.),
    ◮ while maintaining control of resources,
    ◮ appropriate hardware is available/affordable,
    ◮ the pay-per-use model of public clouds is not an option.
◮ The wu.cloud idea is easily advertised to researchers using the three dimensions of scalability: computing resources, RAM, and software packages employed.
◮ Some users prefer to have full control over a given system (i.e., being root) rather than just outsourcing computations to a homogeneous system.
◮ Nevertheless, it is very important to guide users into the cloud (manuals, lectures, etc.),
◮ and considerable resources have to be invested in order to provide several base images (Linux/Windows, R, Matlab, etc.).

Contact
Stefan Theußl
Institute for Statistics and Mathematics
email: cloud@wu.ac.at or Stefan.Theussl@wu.ac.at
URL: http://statmath.wu.ac.at/~theussl
WU Vienna
Augasse 2–6, A-1090 Wien

