09.07.2015 Views

Grid Computing Cluster – the Development and ... - Lim Lian Tze

Grid Computing Cluster – the Development and ... - Lim Lian Tze

Grid Computing Cluster – the Development and ... - Lim Lian Tze

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong>The <strong>Development</strong> <strong>and</strong> Integration of<strong>Grid</strong> Services <strong>and</strong> Applications<strong>Grid</strong>@USM2008/09 ReportEdited by Bahari Belaton <strong>and</strong> <strong>Lim</strong> <strong>Lian</strong> <strong>Tze</strong>Platform for Information & Communication Technology ResearchUniversiti Sains Malaysia


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong>The <strong>Development</strong> <strong>and</strong> Integration of<strong>Grid</strong> Services <strong>and</strong> Applications<strong>Grid</strong>@USM2008/09 ReportEdited by Bahari Belaton <strong>and</strong> <strong>Lim</strong> <strong>Lian</strong> <strong>Tze</strong>Platform for Information & Communication Technology ResearchUniversiti Sains Malaysia


© 2009 <strong>Grid</strong>@USMPlatform for Information & Communication Technology ResearchUniversiti Sains Malaysia11800 USM, Penang, MalaysiaAll rights reserved. No part of this publication may be reproduced, stored in aretrieval system, or transmitted, electronic or mechanical photocopying, recordingor o<strong>the</strong>rwise without prior permission of <strong>the</strong> Publisher.Many of <strong>the</strong> designations used by manufacturers for products mentioned in thispublication are claimed as trademarks. The authors <strong>and</strong> <strong>the</strong> editors respect <strong>the</strong>copyright <strong>and</strong> trademarks for all of <strong>the</strong>se products. Where <strong>the</strong>y are mentionedin <strong>the</strong> book, <strong>and</strong> <strong>Grid</strong>@USM was aware of a trademark claim, <strong>the</strong> designationshave been printed in capital letters or with initial capital letters.Perpustakaan Negara MalaysiaCataloguing-in-Publication Data<strong>Grid</strong> computing cluster : <strong>the</strong> development <strong>and</strong> integration of grid services <strong>and</strong>applications / edited by Bahari Belaton, <strong>Lim</strong> <strong>Lian</strong> <strong>Tze</strong>.ISBN 978-983-3986-58-31. Computational grids (Computer systems). 2. Electronic data processing–Distributed Processing. I. <strong>Lim</strong>, <strong>Lian</strong> <strong>Tze</strong>.004.36Proof-read by Norliza Hani Md. Ghazali.Designed <strong>and</strong> typeset by <strong>Lim</strong> <strong>Lian</strong> <strong>Tze</strong> with LATEX.First printing, July 2009.Cover image by ©B S K (http://www.sxc.hu/profile/spekulator).


Image: The Petronas Towers, Kuala Lumpur. Photograph by ©Mahjabeen Mankani (http://www.sxc.hu/profile/mejmankani).PROJECT OVERVIEW


Image: It’s a wide, yet connected world. Illustration by ©B S K (http://www.sxc.hu/profile/spekulator).INTRODUCTION<strong>Grid</strong> computing can provide easy<strong>and</strong> transparent access to high performancecomputing by aggregatinggeographically distributed resourcesto act as a single powerful resource.With grid computing technologies,users can access to any virtual resourcessuch as computers, special instruments,software applications <strong>and</strong>databases in a seamless manner. It isanalogous to <strong>the</strong> electric power grid,where users do not need to know <strong>the</strong>details of <strong>the</strong> technology. They simplyconnect to common interfaces <strong>and</strong>subscribe only what <strong>the</strong>y need. <strong>Grid</strong>computing has evolved from clustercomputing, meta-computing, utilitycomputing <strong>and</strong> today, a new term hasbeen coined as cloud computing.In order to achieve <strong>the</strong> objectiveof providing seamless access to highperformance resource <strong>and</strong> virtual services,<strong>the</strong> concepts of aggregation <strong>and</strong>management are vital. Two typesof resources need to be aggregated,hardware <strong>and</strong> software. The managementaspect involves management oftechnical applications (resource management)<strong>and</strong> jobs (resource allocation).Resource monitoring monitors<strong>the</strong> grid resources <strong>and</strong> mitigate problemwhile resource allocation acceptsjob submission <strong>and</strong> ensures efficient<strong>and</strong> accurate delivery. Aggregationof grid services is classified into data<strong>and</strong> applications. Distributed datamanagement (data placement, datafragmentation or replication <strong>and</strong> heterogeneousdata transformation) providesdata distribution transparencyfor various data sources.The grid application can be anykind of services that can reside anywhere<strong>and</strong> accessible from any place.Application distribution transparencyis a main objective in gluing differentservices toge<strong>the</strong>r in grid environment.In order to manage <strong>the</strong> unstructured,distributed services, a good resourcediscovery framework has to bein place.Universiti Sains Malaysia (USM)is one of <strong>the</strong> first universitiesto embark on grid computing researchin Malaysia. In 2002, <strong>the</strong>first grid testbed was establishedthrough an e-Science project involvingUSM, Univeristi TeknologiMalaysia (UTM), <strong>and</strong> University ofMalaya (UM) in drug discovery <strong>and</strong>liquid crystal simulation. TheMalaysian Biogrid testbed has beenestablished in collaboration with <strong>the</strong>Malaysian Biotechnology <strong>and</strong> BioinformaticsNetwork, Universiti KebangsaanMalaysia (UKM), <strong>and</strong> SunMicrosystems. Through The PacificRim Applications <strong>and</strong> <strong>Grid</strong> MiddlewareAssembly (PRAGMA) <strong>and</strong> as partof <strong>the</strong> newly formed Malaysian ResearchNetwork (MyREN), USM is committedto taking <strong>Grid</strong> <strong>Computing</strong> Applications<strong>and</strong> Research in Malaysiato <strong>the</strong> next level.USM researchers have communicatedwith o<strong>the</strong>r PRAGMA participantsabout experiences in underst<strong>and</strong>ingpractice <strong>and</strong> policy issues concerningsharing resources <strong>and</strong> <strong>the</strong> establishmentof collaborations with o<strong>the</strong>rPRAGMA institutions.In 2002, a research grant wasawarded for <strong>the</strong> project “The e-Science<strong>Grid</strong>: <strong>Development</strong> of Back-end <strong>Grid</strong>Engine <strong>and</strong> <strong>Grid</strong> Infrastructure”. Twohigh performance clusters were assembled,each with 34 processors,18 GB RAM <strong>and</strong> 160 GB shared storage,running on a high-speed GigabitE<strong>the</strong>rnet backbone switch. One clusteris located at USM <strong>and</strong> <strong>the</strong> o<strong>the</strong>r atUniversiti Teknologi Malaysia, KualaLumpur (UTM-KL). The e-Science <strong>Grid</strong>Portal integrates both e-Science <strong>Grid</strong>components (such as <strong>Grid</strong> ResourceMonitoring, <strong>Grid</strong> Resource Allocation,<strong>Grid</strong> Resource Metering, <strong>Grid</strong>Resource Prediction, etc) <strong>and</strong> e-Science<strong>Grid</strong> applications (such as IterativeSolver Agent, Molecular Simulation<strong>and</strong> Molecular Docking).2


Image: A simple network. Illustration by ©Svilen Mushkatov (http://www.sxc.hu/profile/svilen001).<strong>Grid</strong>@USM:DEVELOPING & INTEGRATINGGRID APPLICATIONS & SERVICESUSM’S R&D SUSTAINABILITYTo attain <strong>the</strong> level of excellence as aresearch university (RU), USM in recentyears has seriously engaged herselfin various efforts to formulate<strong>and</strong> chart her future research <strong>and</strong>development (R&D) directions. Thepinnacle of <strong>the</strong>se efforts was translatedinto a 2 year strategic plan “USMResearch-Intensive University 2007–2009”.One of <strong>the</strong> crucial input elementsfor attaining <strong>and</strong> subsequently sustainingRU status is <strong>the</strong> creationof highly conducive research environment<strong>and</strong> infrastructure. <strong>Grid</strong>computing is an example of suchinfrastructure, aiming at providingexcellent research facilities by ensuringgood computational horsepowerto support high impact domainsin <strong>the</strong> bio-sciences, physicalsciences, information <strong>and</strong> communicationtechnology (ICT), Environmentsciences, Education, Arts <strong>and</strong> o<strong>the</strong>rs.<strong>Grid</strong> computing is by its naturehighly distributed geographically,consists of highly specialised equipment(storage, grid engines <strong>and</strong> managementtools), as well as expensive.It is as such an ideal example of common,core computational resourcesto be created <strong>and</strong> pooled among aspiringresearchers who require highperformance resources.Equally important, USM needs a focused,holistic <strong>and</strong> dedicated effort4


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsProf. Koh Hock Lye(PPSM)<strong>Grid</strong> Application to Wave Front Propagation<strong>and</strong> Containment of Vector-borne DiseasesSetting UpUSM<strong>Grid</strong>AP Chan Huah Yong(PPSK)Dynamic Replica Managementin Data <strong>Grid</strong> EnvironmentAP Rahmat BudiartoProf. Sureswaran Ramadass(PPSK)iNet <strong>Grid</strong>AP Bahari BelatonProject LeaderAP Tang Enya Kong(PPSK)<strong>Grid</strong>-enabled Blexisma2AP Phua Kia Kien(INFORMM)Using X<strong>Grid</strong> for CreatingRender-Farm for 3D AnimatorsDr Kamal Zuhairi Zamli(PPKEE)An Automated Java Testing Toolon <strong>the</strong> <strong>Grid</strong>Dr Vincent Khoo Kay Teong(PPSK)B2B St<strong>and</strong>ardsComponent Modeling<strong>Grid</strong> middleware <strong>and</strong> enablers PPSK – School of Computer Sciences PPKEE – School of Electrical & Electronic Engineering<strong>Grid</strong> applications PPSM – School of Ma<strong>the</strong>matical Sciences INFORMM – Institute for Research in Molecular MedicineFigure 1: <strong>Grid</strong>@USM Project Team <strong>and</strong> Sub-projectsto build a sustainable human capitalin grid computing. We needresearch-based human capital in orderto achieve <strong>and</strong> maintain technical humanresource support with technicalskills, knowledge <strong>and</strong> experience toensure uptime of project deliverability,releasing academics to concentrateon activities to fur<strong>the</strong>r <strong>the</strong> frontier ofknowledge.GRID COMPUTING AND USMUSM is no stranger to grid computing.We pioneered various grid computingresearch initiatives at both national<strong>and</strong> international levels, as exemplifiedby our past achievements:• USM led <strong>the</strong> first e-Science grid testbedin 2002, connecting USM–UTM–UKM in a project to simulate liquidcrystal <strong>and</strong> drug discovery.• USM also led <strong>the</strong> bio-grid test-bedengaging industry partners SunMicrosystems, Biotechnology <strong>and</strong>Bioinformatics Network (BBN) <strong>and</strong>UKM.• USM launched its University <strong>Grid</strong><strong>and</strong> International Collaboration initiativein May 2005 to provide awell-designed grid computing infrastructureamong its three campuses,pooling grid resources toprovide computational support toR&D activities.• USM participated in Malaysian ResearchNetwork (MyREN) which provided<strong>the</strong> b<strong>and</strong>width to connectgrid resources within <strong>and</strong> outsideUSM.• USM is <strong>the</strong> first Malaysian memberin The Pacific Rim Applications<strong>and</strong> <strong>Grid</strong> Middleware Assembly(PRAGMA). The PRAGMA communityregularly uses USM’s computationalresources via <strong>the</strong> Auroracluster.• An MoU was signed between USM,Sun-APSTC (Asia-Pacific Science &Technology Center) <strong>and</strong> NationalUniversity of Singapore (NUS),where one of <strong>the</strong> proposals was toset up Asia-Pacific Science & TechnologyForum (APSTF) at USM.• USM was granted <strong>the</strong> honour tohost PRAGMA 15 in October 2008,a milestone in charting <strong>and</strong> leadingR&D in grid computing.Physically, pockets of grid infrastructuresin <strong>the</strong> form of clusteredcomputers, high-end grid engines<strong>and</strong> o<strong>the</strong>r specific resources(e.g. domain-specific software) havebeen acquired <strong>and</strong> deployed withinUSM. The grid infrastructures aremainly concentrated at <strong>the</strong> School ofComputer Sciences (PPSK), <strong>the</strong> Centrefor Knowledge, Communication &Technology (PPKT) <strong>and</strong> <strong>the</strong> Laboratoryof Biocrystallography & StructuralBioinformatics (MBBS). The latest additionis <strong>the</strong> grid engine deployed at<strong>the</strong> Collaborative µElectronic DesignExcellence Center (CEDEC) at <strong>the</strong> Engineeringcampus.GRID ACCESSIBILITY ANDHUMAN EXPERTISEDespite <strong>the</strong> various progresses made,particularly with respect to infrastructure,<strong>the</strong> real benefits of <strong>the</strong>se computationalpowers to <strong>the</strong> scientists werenot yet fully realised. The penetrationrate among pure end-users isperhaps at <strong>the</strong> lower end – 10 % atmost. Clearly, with such costly investmentin <strong>the</strong> infrastructure <strong>and</strong> heavyparticipation from technologists (particularlycomputer scientists), a welldesigneddeployment plan to increaseaccessibility to <strong>the</strong> end-users is desperatelyneeded. In fact, PRAGMA wasestablished solely to h<strong>and</strong>le this lastmileissue – making grid resourcesaccessible to researchers.Zooming into <strong>the</strong> technical level,<strong>the</strong>re are two main barriers hampering<strong>the</strong> quick <strong>and</strong> large-scale acceptanceof grid computing among researchers.Firstly, <strong>the</strong> technology itself stillneeds fur<strong>the</strong>r R&D efforts to makeit more user-friendly. The ultimatePROJECT OVERVIEW 5


<strong>Grid</strong>@USMaim here is to allow researchers withcomputation-intensive requirementsto simply plug <strong>the</strong>ir application intogrid technology <strong>and</strong> run it seamlesslywithout major user intervention. Ananalogy would be <strong>the</strong> seamless mannerwith which we plug electrical devicesinto power sockets <strong>and</strong> <strong>the</strong>y willsimply function – without us knowingwhat actually goes on behind <strong>the</strong>power socket.The second problem is relatedto <strong>the</strong> readiness of grid technology.There are stillmany open researchquestionsto address itsfour main domainareas: policy<strong>and</strong> governanceof gridresources; gridmiddleware <strong>and</strong>tool enablers;grid applications;infrastructure<strong>and</strong> security.It is withUSM’s R&D needs<strong>and</strong> sustainabilityin mindthat we set up<strong>Grid</strong>@USM, agrid computingresearch clusterto address <strong>the</strong> problems mentionedabove. Our two main driving principles<strong>and</strong> objectives are:• Training of human capital in gridtechnology, <strong>and</strong>• Increasing <strong>the</strong> accessibility <strong>and</strong> utilisationof USM grid infrastructures.<strong>Grid</strong>@USM’s pilot project, led byAP Bahari Belaton <strong>and</strong> entitled <strong>Grid</strong><strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong><strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> Applications,was initiated under an RUresearch grant. This RU project involvesintegrated research couplingexpertise from different fields. Thetechnologists (mainly computer scientists)will continue to work <strong>and</strong>improve on <strong>the</strong> previously mentionedgrid computing domain areas,while grid infrastructure accessibility<strong>and</strong> utilisation is increasedby grid-enabling existing research applicationsby experts from o<strong>the</strong>r domains.This is directly related to ourthird objective:• To showcase <strong>the</strong>se grid-enabled applicationsduring PRAGMA 15.Project Objectives❦ To increase accessability <strong>and</strong>utilization of USM Campus <strong>Grid</strong>infrastructure❦ To train research-based humancapital in grid computing❦ To showcase potential gridapplications in PRAGMA 15The human capital training will focuson research-based skills, knowledge<strong>and</strong> h<strong>and</strong>s-on experience in gridcomputing for both technologists <strong>and</strong>domain experts.THE IMPLEMENTATION PLANThe project can be viewed as havingtwo main domains:<strong>Grid</strong> middleware <strong>and</strong> tool enablers,concerning <strong>the</strong> administration <strong>and</strong>management of grid resources<strong>and</strong> users. We allocated 1 / 3 of <strong>the</strong>overall project resources to thisfundamental stage.<strong>Grid</strong> applications, where readilyavailable applications are gridenabledto run on existing gridinfrastructure available at USM.This accounts for 2 / 3 of <strong>the</strong> project.The former task is entrusted to<strong>the</strong> <strong>Grid</strong> <strong>Computing</strong> Lab (GCL) ofPPSK. As for <strong>the</strong> latter aspect,a call for research proposals issuedby <strong>the</strong> ICT Research Platformelicited applications from researchersof various disciplines, ranging frommedicine, computer sciences to engineering.Seven seed applications <strong>and</strong><strong>the</strong>ir principle investigatorswereidentified. Toge<strong>the</strong>rwith <strong>the</strong>task of settingup USM Campus<strong>Grid</strong> i.e. <strong>the</strong> gridinfrastructure forresearchers, wehave a total of8 sub-projects,<strong>the</strong> leaders ofwhich are alsomembers of<strong>the</strong> <strong>Grid</strong>@USMteam.To achieve<strong>the</strong> project aims,domain expertsin <strong>the</strong> respectivefields wereemployed asGraduate ResearchAssistants (GRAs) <strong>and</strong>/orResearch Assistants (RAs) for eachgrid application sub-project. TheseGRAs <strong>and</strong> RAs were given training toget acquainted with <strong>the</strong> grid engines<strong>and</strong> infrastructure at USM. They <strong>the</strong>nadapted <strong>the</strong> domain-specific applicationsto run on <strong>the</strong> grid environmentunder guidance of <strong>the</strong> respective subprojectleaders. Finally, <strong>the</strong>se gridapplications were deployed on USMCampus <strong>Grid</strong>, where <strong>the</strong>y are accessiblevia various grid tool enablers. Detailsabout <strong>the</strong> research methodology ofeach sub-project are reported in laterchapters, as are <strong>the</strong> various activitiescarried out under this project.U@S<strong>Grid</strong>M<strong>Grid</strong>@USM6 <strong>Grid</strong>@USM: DEVELOPING & INTEGRATING GRID APPLICATIONS & SERVICES


PROJECT MILESTONESPast events2007Planned eventsOctProject Kick-offPhase 1: <strong>Grid</strong>-enabling seedapplications for PRAGMA15Sub-project teamsequipped with gridenabledworkstationsSub-project teamshooked up toUSM Campus <strong>Grid</strong>2008FebMarIntroductory Workshopto <strong>Grid</strong> <strong>Computing</strong>PRAGMA 14 in Taiwan(Promoting PRAGMA 15)Workshop on <strong>Grid</strong><strong>Computing</strong> Strategies& Security IssuesJunPRAGMA 15 in USMAugCampus <strong>Grid</strong> Portalwent liveAll sub-projects participatedin poster sessionsOct3 sub-projects selectedfor specialdemonstrations2009Phase 2: Completing gridservices projects to providecampus grid infrastructurefor more researchers <strong>and</strong>external partiesPRAGMA 16 in KoreaMarApr1st Knowledge <strong>Grid</strong>Malaysia ForumUSM Campus <strong>Grid</strong>hardware upgradePublishing ProjectReportJulProject End: 30 Sep(Original plan)AugSep<strong>Grid</strong>-enabled Blender 3DAnimation WorkshopNovSouth China SeaTsunami Workshop 3Presence atsains@usm asservice provider2010U@S<strong>Grid</strong>M7


PROJECT EXPENDITURESJanuary ’08 – May ’09Project allocation <strong>and</strong> expenditures until May 2009. All amounts are in RM.USM Expenditure Code & item Project Allocation Amount Spent Project Balance111 Salary & wages 465,600 274,885 190,715115 O<strong>the</strong>r emoluments 0 4,272 −4,272221 Travel expenses & subsistence 60,000 52,456 7,544223 Communication & utilities 4,000 5 3,995224 Rental 10,000 0 10,000227 Research materials & supplies 20,400 11,226 9,174228 Maintenance & minor repair services 20,000 0 20,000229 Professional & o<strong>the</strong>r services 368,000 59,579 308,421335 Equipment 52,000 274,436 −222,436Total 1,000,000 676,859 323,141Salary & wages O<strong>the</strong>r emoluments Travel expensesCommunication & utilities Rental Research materials & suppliesMaintenance & minor repair services Professional & o<strong>the</strong>r services EquipmentProject allocation (outer ring) <strong>and</strong> actual expenditure up till May ’09 (inner pie)U@S<strong>Grid</strong>M8


Image: Eye on Malaysia by Titiwangsa Lake, Malaysia. Photography by ©Mee Lin Woon (http://www.sxc.hu/profile/meelin).SUB-PROJECTS


Image: Server room at DESY, Hamburg. Photograph by ©Christian Hoffmann (http://www.sxc.hu/profile/brcwcs).SETTING UPUSM CAMPUS GRID<strong>Grid</strong> <strong>Computing</strong> LabSchool of Computer SciencesTEAM MEMBERSProject Leader : AP Chan Huah YongResearchers: Ang Sin Keat, Tan Chin Min, Kheoh Hooi Leng, Cheng Wai Khuen, Muhammad Muzzammil bin MohdSalahudinINTRODUCTIONIn order to increase <strong>the</strong> accessibilityof USM’s grid infrastructure, we at <strong>the</strong><strong>Grid</strong> <strong>Computing</strong> Lab (GCL) embarkedon this effort to set up USM Campus<strong>Grid</strong> to link research groups in all 3Universiti Sains Malaysia (USM) campuses.USM Campus <strong>Grid</strong> also acts as atestbed for <strong>the</strong> sub-project teams’ gridenabledapplications. We welcome researchers,both in <strong>and</strong> outside USM,to utilize USM Campus <strong>Grid</strong> resources.In a nutshell, our objective is tointegrate clusters at <strong>the</strong> 7 sub-projectteam locations with GCL filtered availableresources <strong>and</strong> PPKT resources.USM CAMPUS GRIDRESOURCESCentrally, several machines were setup at GCL to serve all sub-project teammembers, to h<strong>and</strong>le directory <strong>and</strong> servicequeries (Lightweight DirectoryAccess Protocol or LDAP), grid credentialmanagement, user authorisation,as well as for computational processingneeds.On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, this project involves7 sub-project teams from differentresearch groups <strong>and</strong> centres ofUSM, spanning all 3 campuses in differentgeographical locations:Main campus (Minden, Penang)❦ <strong>Grid</strong> <strong>Computing</strong> Lab (GCL),School of Computer Sciences:Dynamic Replica Management inData <strong>Grid</strong> Environment.❦ Computer-aided TranslationUnit (UTMK), School of ComputerSciences: <strong>Grid</strong>-enabledBlexisma2.❦ School of Ma<strong>the</strong>matical Sciences(PPSM): <strong>Grid</strong> Application to WaveFront Propagation <strong>and</strong> Containmentof Vector Borne Diseases.❦ Information Systems EngineeringResearch Group (ISE), Schoolof Computer Sciences: B2B St<strong>and</strong>ardsComponent Modeling.❦ Network Research Group (NRG),School of Computer Sciences:iNet <strong>Grid</strong>.Transkrian Engineering Campus(Nibong Tebal, Penang)❦ School of Electrical & ElectronicEngineering (PPKEE): An AutomatedJava Testing Tool on <strong>the</strong><strong>Grid</strong>.Health Campus(Kubang Kerian, Kelantan)❦ Institute for Research in MolecularMedicine (INFORMM): UsingX<strong>Grid</strong> for Creating Render-Farmfor 3D Animations.We supplied each sub-project teamwith a Dell Vostro 400 Mini TowerDesktop workstation, on which we installed<strong>and</strong> configured <strong>the</strong> Rocks <strong>Cluster</strong>s4.2.1 (Cydonia) operating system,<strong>the</strong> Sun <strong>Grid</strong> Engine (SGE), <strong>the</strong> Globustoolkit, <strong>the</strong> MyProxy credential managementservice, <strong>the</strong> Ganglia moni-10


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsTable 1: Recent USM Campus <strong>Grid</strong> resources<strong>Cluster</strong> No of CPU Physical Hosts Memory (each host) Storage (each host)Aurora 32 16 2 GB (master node)1 GB (compute node)80 GBGCLDatarm 20 5 2 GB 500 GB (master node)250 GB (4 compute)GCLMicro 40 5 8 GB 500 GBGCLHome 8 1 8 GB 5 TBLDAP server 4 1 4 GB 500 GBgclportal 4 1 4 GB 500 GBgclproxy 2 1 1 GB 160 GBgcldns 1 1 786 MB 160 GBmathvbd 4 1 2 GB 500 GBisitb2b 4 1 2 GB 500 GButmkblex 4 1 2 GB 500 GBengtest 4 1 2 GB 500 GBnav6mon 4 1 2 GB 500 GBComsics – – – –Table 2: Physical machines purchased under RU grantModel Quantity ParticularsDell Vostro 400 Mini Tower Desktop 7 A single desktop was distributed to each member in <strong>the</strong> campusgrid projectApple X-<strong>Grid</strong> Server CTO 1 Used for INFORMM’s projectSun SPARC Enterprise T1000 Server 3 Machines just arrived <strong>and</strong> still in installation progress.Dell Power Edge 1950 Rack Mount Server 1 Machine just arrived <strong>and</strong> still in installation progress.Dell Power Edge 1950 Rack Mount Server 3 Machines just arrived <strong>and</strong> still in installation progress.Dell Power Edge SC1435 Server 5 Set up as GCLMicro cluster.toring system, <strong>and</strong> tools for MessagePassing Interface (MPI) programming.Our research officers (ROs) set up<strong>the</strong> machines on-site at each subprojectteam’s lab, connecting <strong>the</strong>m toUSM Campus <strong>Grid</strong> such that each clustercan access resources on all o<strong>the</strong>r clustersin <strong>the</strong> grid. We also connected<strong>the</strong> USM Campus <strong>Grid</strong> to <strong>the</strong> MalaysianResearch Network (MyREN), <strong>the</strong>reforeallowing external parties to utilizeUSM grid resources as well – includingPRAGMA members. We also createduser accounts <strong>and</strong> grid credentialsfor <strong>the</strong> sub-project team membersduring this set up exercise.The current machine resources arelisted in Tables 1 <strong>and</strong> 2, while Figure1 shows a conceptual view of USMCampus <strong>Grid</strong>’s architecture.USM CAMPUS GRID USERSIn line with <strong>Grid</strong>@USM’s objective ofincreasing <strong>the</strong> accessibility <strong>and</strong> utilisationof USM grid infrastructures,we welcome researchers – both in<strong>and</strong> outside <strong>the</strong> School of ComputerSciences <strong>and</strong> even USM – to use USMCampus <strong>Grid</strong> for <strong>the</strong>ir computationalneeds. Table 3 lists current registeredUSM Campus <strong>Grid</strong> users <strong>and</strong> <strong>the</strong>ir researchprojects.SHARED HOME DIRECTORYEach sub-project team originallymaintained <strong>the</strong>ir data <strong>and</strong> applicationfiles on <strong>the</strong>ir own machines, <strong>and</strong><strong>the</strong>refore had to transfer files betweenmachines before submitting processingjobs on o<strong>the</strong>r clusters. We <strong>the</strong>reforeset up a shared user home directory,by using <strong>the</strong> network file system(NFS) protocol <strong>and</strong> <strong>the</strong> Lightweight DirectoryAccess Protocol (LDAP) serverto get <strong>the</strong> user’s details in setting up<strong>the</strong> user profiles.With this shared home directory,USM Campus <strong>Grid</strong> users can log on toany node <strong>and</strong> have access to all <strong>the</strong>irfiles. They are <strong>the</strong>refore able to initiate<strong>and</strong> submit jobs without <strong>the</strong> hassle<strong>and</strong> overhead of transferring requiredfiles between nodes.CAMPUS GRID PORTALWe have built a web portal forUSM Campus <strong>Grid</strong> at http://gclportal.usmgrid.myren.net.my using <strong>the</strong><strong>Grid</strong>Sphere platform. The Campus <strong>Grid</strong>Portal acts as an information centrefor USM Campus <strong>Grid</strong> <strong>and</strong> <strong>Grid</strong>@USM’sprojects. Upon logging in, membersmay customize <strong>the</strong> portal by specifyinggroups that <strong>the</strong>y wish to bein.We describe o<strong>the</strong>r important featuresof <strong>the</strong> Campus <strong>Grid</strong> Portal in <strong>the</strong>following sections.PROJECT PROGRESSUPDATES WITH TRAC WIKIWe have set up a Trac Wiki platformfor <strong>the</strong> respective sub-project teamsto post information <strong>and</strong> progress updatesabout <strong>the</strong>ir respective applications.This wiki serves as a projectmonitoring mechanism, as well as aninvaluable archive for recording <strong>the</strong>irexperience using <strong>Grid</strong> technologies.SUB-PROJECTS 11


<strong>Grid</strong>@USMFigure 1: Simplified view of USM Campus <strong>Grid</strong> architectureAPPLICATION SHOWCASESWITH I-FRAMESub-project teams may showcase <strong>the</strong>irgrid-enabled applications in <strong>the</strong> Campus<strong>Grid</strong> Portal using a number ofmechanisms. In cases where a Webgraphical user interface (GUI) is available<strong>and</strong> presumably hosted on <strong>the</strong>sub-project team’s node, this is just amatter of informing GCL researchersof <strong>the</strong> relevant URL. We would <strong>the</strong>nembed <strong>the</strong> web interface as an i-Frameportlet in <strong>the</strong> Campus <strong>Grid</strong> Portal. Ino<strong>the</strong>r words, <strong>the</strong> portal acts as a onestopshowcase of <strong>the</strong> sub-project gridenabledapplications.Each member can specify <strong>the</strong> membergroup <strong>the</strong>y wish to be in, thuscontrolling <strong>the</strong> i-Frame applicationsthat will appear on <strong>the</strong>ir personalisedportal pages upon logging in. Thisalso provides an easier route for membersto make use of <strong>the</strong> results fromano<strong>the</strong>r team’s application, i.e. via<strong>the</strong> portlets <strong>and</strong> without having toknow <strong>the</strong> application’s original URL.Currently, <strong>the</strong> UTMK <strong>and</strong> ISE projectteams have incorporated Web GUIfront-ends of <strong>the</strong>ir applications into<strong>the</strong> Campus <strong>Grid</strong> Portal as i-Frameportlets (Figure 2).JOB SUBMISSION PORTALJob submission portlets were deployedto provide a user-friendly GUIfor submitting jobs to <strong>the</strong> grid, as opposedto using <strong>the</strong> comm<strong>and</strong> line. Authorisedusers are only required toprovide <strong>the</strong>ir passwords <strong>and</strong> <strong>the</strong> locationof a executable file. At <strong>the</strong> clickof a button, <strong>the</strong> job is submitted toa designated cluster. The portal willnotify <strong>the</strong> user when <strong>the</strong> job is completed,<strong>and</strong> users may <strong>the</strong>n download<strong>the</strong> results. In particular, we have createda dedicated job submission portletto allow <strong>the</strong>PPSM team to submit<strong>the</strong>ir computation jobs to <strong>the</strong> Auroracluster.SINGLE LOGONWe implemented a single log-on systemto provide transparent access too<strong>the</strong>r grid resources in USM. Thesystem currently allows Campus <strong>Grid</strong>Portal users to access <strong>the</strong> Centrefor Knowledge, Communication &12 SETTING UP USM CAMPUS GRID


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsUserAP Phua Kia Kien<strong>Lim</strong> <strong>Lian</strong> <strong>Tze</strong>Prof. Koh Hock LyeDr Kamal Zuhairi ZamliProf. Sureswaran Ramadass, AP Rahmat BudiartoDr Vicent Khoo Kay TeongA<strong>the</strong>erSupervisor: Dr Nur’Aini Abdul RashidAdnanSupervisor: Prof. Sureswaran RamadassSalah SalemSupervisor: AP Chan Huah YongAhmad AbusnainaSupervisor: AP Chan Huah YongDr Yoon Tiem Leong –Ng Kok FuSupervisor: AP Norhashidah Hj Mohd AliKoay Kok TeongSupervisor: AP Norhashidah Hj Mohd AliTable 3: USM Campus <strong>Grid</strong> UsersProjectUsing X<strong>Grid</strong> for Creating Render-Farm for 3D Animators<strong>Grid</strong>-enabled Blexisma2<strong>Grid</strong> Application to Wave Front Propagation <strong>and</strong> Containment ofVector Borne DiseasesAn Automated Java Testing Tool on <strong>the</strong> <strong>Grid</strong>iNet <strong>Grid</strong>B2B St<strong>and</strong>ards Component ModelingMSc (Mixed Mode)PhDMSc (Mixed Mode)MSc (Mixed Mode)PhD, Parallel Preconditioned Explicit Group Methods for Distributed MemoryMulticomputersMSc (Research), Parallel Numerical AlgorithmsFigure 2: An i-Frame portlet in Campus <strong>Grid</strong> PortalTechnology (PPKT)’s portal with a onetimesign-in, <strong>and</strong> vice versa.SUPPORT & MAINTENANCETo summarise, <strong>the</strong> GCL team acts asa support <strong>and</strong> maintenance team ofUSM Campus <strong>Grid</strong> <strong>and</strong> <strong>Grid</strong>@USM ’s RUproject. Among <strong>the</strong> tasks we havebeen (<strong>and</strong> are) carrying out include:• managing USM Campus <strong>Grid</strong> machineslocated centrally at GCL, includinghardware upgrades;• managing <strong>and</strong> monitoring USMCampus <strong>Grid</strong> services, includingmanagement of users, credentials<strong>and</strong> computational resources;• managing <strong>the</strong> Campus <strong>Grid</strong> Portalas an information centre <strong>and</strong> jobsubmission portal;• h<strong>and</strong>ling project management <strong>and</strong>administrative issues;• supporting sub-project teams ingrid-enabling applications, e.g.:Í installing Blender 3D onGCLDatarm <strong>and</strong> creating a samplePHP page for SGE job submissionfor <strong>the</strong> INFORMM team;Í preparing sample Fortran code,with <strong>and</strong> without enabling MPI,for <strong>the</strong> PPSM team;Í enabling database connectivitybetween Microsoft SQL2005 <strong>and</strong>Linux clients using FreeTDS,C/C++ library scripts <strong>and</strong> bindingwith ASP.NET (C#) for <strong>the</strong>ISE team.U@S<strong>Grid</strong>MSUB-PROJECTS 13


Image: Binary data everywhere. Illustration by ©Rodolfo Clix (http://www.sxc.hu/profile/clix).DYNAMIC REPLICA MANAGEMENTIN DATA GRID ENVIRONMENT<strong>Grid</strong> <strong>Computing</strong> LabSchool of Computer SciencesTEAM MEMBERSProject Leader : AP Chan Huah YongResearchers : Aloysius Indrayanto, Muhammad Muzzammil bin Mohd SalahudinContributors : Siamak Sarmady, Wong Jik Soon, Seow Kwen Jin, Zeinab Noorian, Ang Sin Keat, Cheng Wai KhuenINTRODUCTIONAccessing, managing, querying, <strong>and</strong>programming a database system arenormally dependent on <strong>the</strong> type of<strong>the</strong> database deployment, st<strong>and</strong>ards,conventions defined by <strong>the</strong> databasevendor, <strong>and</strong> <strong>the</strong> language that will beused to define <strong>the</strong> query. Currently,<strong>the</strong> Relational Database ManagementSystem (RDBMS) is one of <strong>the</strong> mostwidely used database system. Queryon <strong>the</strong>se kind of databases are usuallydone by using a language namedStructured Query Language (SQL).RDBMS is currently implemented<strong>and</strong> supported by many different vendors.Every vendor has its ownmethod or convention on how usersshould access its database system.This convention is defined as a setof programming procedure calledApplication Programming Interface(API). These differences would causea developer (programmer) <strong>the</strong> needto learn many different API if he/shewants to use <strong>and</strong> integrate RDBMS deploymentsfrom multiple vendors.SQL is a st<strong>and</strong>ardized language.However, <strong>the</strong> st<strong>and</strong>ard set of SQL comm<strong>and</strong>snormally are not enough fora vendor. Hence, database vendorswould normally add non-st<strong>and</strong>ard extensionsto <strong>the</strong> SQL so that users canuse <strong>the</strong>ir RDBMS more efficiently. Thisextension would cause an administratoror a developer <strong>the</strong> need to learnmany non-st<strong>and</strong>ard SQL comm<strong>and</strong>s.In addition, some vendors also providea custom API which will need tobe used if a developer wants an evenmore efficient access to <strong>the</strong> RDBMS.Some vendors only provide customAPIs to access <strong>the</strong> RDBMS’ specific features(no extension to <strong>the</strong> SQL language).Learning many different APIs <strong>and</strong>non-st<strong>and</strong>ard SQL extensions wouldnot be efficient for a developer. Timecould be wasted to factorize whatcan be done <strong>and</strong> what cannot bedone with a particular RDBMS product/deployment.Hence, interoperatingRDBMS from multiple vendorswould be a maintenance problem.Performing a custom distributedquery between two different RDBMSthat are installed in two differenthosts (computers) would be impossibledue to <strong>the</strong> fact that <strong>the</strong> vendorsuse <strong>the</strong>ir own proprietary protocolsfor communication between<strong>the</strong>ir RDBMS servers <strong>and</strong> clients. Also,not all RDBMS vendors support distributedquery between two (or more)hosts. In addition, <strong>the</strong> cost of <strong>the</strong>software license <strong>and</strong> deployment for<strong>the</strong>se kind of databases are expensive.A large database vendors, such asOracle <strong>and</strong> IBM DB2, provides supportsfor distributed query. Hence,<strong>the</strong>ir system is called DistributedDatabase System (DDS). However, <strong>the</strong>two systems may not cooperate wellwith each o<strong>the</strong>r. Hence, performinga distributed query between Oracle<strong>and</strong> DB2 would be difficult.The VDDBMS project aims to providea compatibility layer betweendifferent RDBMS implementations <strong>and</strong><strong>the</strong> end users. The main objective isto allow distributed queries betweentwo or more RDBMS from differentvendors with <strong>the</strong> same SQL <strong>and</strong> API.14


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsFigure 1: Current architecture of <strong>the</strong> VDDBMS systemMETHODOLOGYThe methodology applied by <strong>the</strong>project to achieve its goal can bebriefly described as follow:1. An SQL parser has been implementedto parse <strong>the</strong> user-given SQLstatements into tokens (symbols<strong>and</strong> data).2. A resource locator has been implementedto add references to <strong>the</strong>tokens to record <strong>the</strong> hosts (datasources) on which <strong>the</strong> SQL wouldneed to be executed.3. A token splitter has been implementedto split <strong>the</strong> tokens into subtokens.These steps are necessarybecause a data source does not haveto execute <strong>the</strong> complete tokens.4. An instruction sequencer has beenimplemented to execute to subtokensin <strong>the</strong> appropriate datasources in <strong>the</strong> correct sequence.5. A SQL generator has been implementedto translate back <strong>the</strong> subtokensinto SQL statements underst<strong>and</strong>ableby <strong>the</strong> target datasources.6. A post-processor has been implementedto aggregate back <strong>the</strong> subresultsfrom <strong>the</strong> data sources into<strong>the</strong> final result.DESIGN & IMPLEMENTATIONThe system can be separated into fourmain modules:1. Core engine;2. Backend drivers for each RDBMSvendor to be supported;3. Language binding;4. Network server <strong>and</strong> client drivers.The system is mainly developedusing C++ in Linux. Figure 1 shows<strong>the</strong> current architecture of <strong>the</strong> system.The core engine interacts with <strong>the</strong>backend RDBMS using <strong>the</strong> specificdrivers. The core engine is workingtoge<strong>the</strong>r with resource catalogue <strong>and</strong>lock manager to process user requests.User can connect to <strong>the</strong> VDDBMS systemei<strong>the</strong>r locally or remotely. Currentlylocal connection is supportedby native API in C++ <strong>and</strong> PHP. Remoteconnection is supported by networkAPI in C++, PHP, Java, <strong>and</strong> .NET.Figure 2 shows <strong>the</strong> modules in <strong>the</strong>VDDBMS system with <strong>the</strong>ir simplifieddataflow. In <strong>the</strong> figure <strong>the</strong> solid linesdefines communication path whichare done using API. The dashed linedefines communication path whichare done using network protocol.Most of <strong>the</strong> API communicationpaths are private. Only <strong>the</strong> C++ <strong>and</strong>PHP public interfaces are usable by<strong>the</strong> user. Hence, changes in <strong>the</strong> internalAPI would not raise incompatibilityissue with user applications.The network (VDDBMS) server <strong>and</strong><strong>the</strong> client drivers (C++, Java, .NET)are completely independent. As longas <strong>the</strong> same protocol is used, changesin <strong>the</strong> server would not affect <strong>the</strong>client. The protocol has been designedto be extensible. Hence, anewer server would be able to manageolder client driver implementations.Figure 3 shows <strong>the</strong> high level flowof <strong>the</strong> VDDBMS engine while Figure 4shows <strong>the</strong> detailed flow of <strong>the</strong> coreexecution engine.The main features of <strong>the</strong> currentsystem are:1. Data replication <strong>and</strong> horizontalfragmentation.2. Support for simple SQL INSERT,DELETE, UPDATE, <strong>and</strong>SELECT queries.3. Prepared statement for SQLINSERT.4. Native DATE/TIME field support.5. Stored procedure with supportfor looping, conditional statements,simple calculations, <strong>and</strong> atomic integeroperations.6. MySQL, PostgreSQL, Microsoft-SQL, <strong>and</strong> Oracle are supported as<strong>the</strong> backend RDBMS (data sources).SUB-PROJECTS 15


<strong>Grid</strong>@USMFigure 2: Modules of <strong>the</strong> VDDBMS system <strong>and</strong> <strong>the</strong>ir simplified dataflowsFigure 3: Flow of <strong>the</strong> VDDBMS Engine16 DYNAMIC REPLICA MANAGEMENT IN DATA GRID ENVIRONMENT


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsFigure 4: Detailed Flow of <strong>the</strong> VDDBMS Execution Engine7. Plugin support (backend RDBMSdriver <strong>and</strong> user-implementedfunctionin stored procedure).8. C++ native API.9. PHP, Java, <strong>and</strong> .NET binding.10. Network server is implemented ontop of <strong>the</strong> core engine with userau<strong>the</strong>ntication support.RESULTS & CONCLUSIONThe VDDBMS system is able to:• Display multiple backend RDBMSas if <strong>the</strong>y are one single, virtualdatabase host.• Perform database <strong>and</strong> table managementsin multiple backendRDBMS.• Perform queries <strong>and</strong> updates inmultiple backend RDBMS.Compared with a st<strong>and</strong>aloneRDBMS system such as MySQL, <strong>the</strong>VDDBMS system performance is about10 to 100 times slower. However, ithas <strong>the</strong> advantage in that it can performdistributed query on multipleRDBMS hosts from different vendors.FUTURE WORK• Distributed resource catalogue <strong>and</strong>distributed lock manager.• Support for aggregation (bottomup)approach. In this mode, existingdatabase deployments can beintegrated as one virtual database.The user would still be able to use<strong>the</strong> existing deployments as independentsystem as well as aggregate<strong>the</strong>m with <strong>the</strong> VDDBMS system.• Support for error recovery.• Performance optimization.• Integration of a more intelligent algorithmon top of <strong>the</strong> system forautomated data replica or fragmentmanagement using an already published/implementedwork.CONTRIBUTION TO THE OPENSOURCE COMMUNITYThis work has been published as anOpen Source Software (OSS) projectvia <strong>the</strong> SourceForge code repository.The project homepage can be accessedfrom <strong>the</strong> URL http://vddbms.sourceforge.net.PROJECT PUBLICATIONSAl-Mistarihi, H. H. E. & Chan, H.Y. (2009). On Fairness, OptimizingReplica Selection in Data <strong>Grid</strong>s.IEEE Transactions on Parallel & DistributedSystems. (forthcoming).Al-Mistarihi, H. H. E. (2009). AData <strong>Grid</strong> Replica ManagementSystem with Local <strong>and</strong> GlobalMulti-Objective Optimization. PhD<strong>the</strong>sis, School of Computer Sciences,Universiti Sains Malaysia.Chan, H. Y. & Al-Mistarihi, H. H.E. (2008). Replica Management inData <strong>Grid</strong>. International Journal ofComputer Science <strong>and</strong> Network Security.Chan, H. Y. & Al-Mistarihi, H. H. E.(2008). Replica Optimization Servicein Data <strong>Grid</strong>s. Sciences Publications.Indrayanto, A. (2009). Initial File-Placement in Data <strong>Grid</strong> Environmentusing Game Theory <strong>and</strong> FictitiousPlay. MSc <strong>the</strong>sis, School ofComputer Sciences, Universiti SainsMalaysia.Indrayanto, A. & Chan, H. Y. (2008).Application of Game Theory <strong>and</strong>Fictitious Play in Data Placement.In Proceedings of <strong>the</strong> InternationalConference on Distributed Frameworks& Applications 2008 (DFmA 2008).Penang, Malaysia; pp. 79–83.U@S<strong>Grid</strong>MSUB-PROJECTS 17


Image: Sun-lit spiral staircase. 3D illustration by ©Bertr<strong>and</strong> Benoit (http://www.bertr<strong>and</strong>-benoit.com/).Inset (on wall): INFORMM building model in Blender 3D.USING GRID TECHNOLOGY TOCREATE A RENDER-FARM FORBLENDER 3D ANIMATIONBioinformatics LabInstitute for Research in Molecular MedicineTEAM MEMBERSProject Leader : AP Phua Kia KenResearcher : Zafri MuhammadINTRODUCTION<strong>Grid</strong> computing is becoming morepopular today as computers comeequipped with multi-core CPUs. Toge<strong>the</strong>rwith innovative operating systems,multi-threaded <strong>and</strong> parallel processing,a.k.a. grid computing, has becomemore common, especially for applicationsthat require intensive computations.An attractive feature ofgrid computing is that it can utilizeidle CPUs connected toge<strong>the</strong>r in a networkto create a virtual supercomputer.‘Xgrid’ is a distributed computingtechnology introduced by AppleInc. to create a grid network for parallelprocessing. Applications that canbe controlled via <strong>the</strong> comm<strong>and</strong>-lineinterface of <strong>the</strong> Unix operating systemshould be able to benefit from Xgrid.One such application is Blender 3D.Blender 3D was selected in this researchbecause it is an open source3D modelling <strong>and</strong> animation software,<strong>and</strong> more importantly because it hasan expansive architecture that canbe extended using Python programminglanguage. Blender 3D version2.46 is also capable of multi-threading,which enable one to tap into <strong>the</strong> multicorecapabilities of modern CPUs inaddition to <strong>the</strong> power of grid computing.Recently, several public-domain animationmovies have been producedthat showcase Blender 3D’s capabilityas a professional animation tool.However, <strong>the</strong> rendering time neededfor professional animations take considerabletime to complete using aconventional computer setup. To overcomethis limitation, we explore <strong>the</strong>use of grid computing to create aRender-Farm that will reduce <strong>the</strong> burdenof rendering Blender 3D animations.To encourage end-users (3D animators)to use <strong>the</strong> Render-Farm, weexplored <strong>the</strong> use of PHP to developa web portal for easy access <strong>and</strong> retrievalof rendered results.OBJECTIVETo create a Render-Farm using Xgridtechnology for Blender 3D in order tospeed up <strong>the</strong> processor-intensive rendering<strong>and</strong> compression of animationfiles for Blender 3D animators.RESEARCH METHODOLOGYIn our grid environment, we used10 Apple iMac computers that wereavailable at <strong>the</strong> Bioinformatics Lab,18


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsFigure 1: System Architecture showing system flow for Blender 3D Render-Farm Portal developed at INFORMM, USM.Figure 2: Xgrid Admin showing<strong>the</strong> combined CPU power of <strong>the</strong> gridobtained by networking 10 iMac2.0 GHz (Intel Core 2 Duo CPUs) ina cluster using Apple Xgrid.Institute for Research in MolecularMedicine (INFORMM), USM. All AppleiMac computers were equippedwith Mac OS 10.5, Intel Core 2 Duo2.0 GHz processors <strong>and</strong> 1 GB of RAM.These were connected through <strong>the</strong>USM local area network (Figure 1). AnApple Xserve was used as a controller,<strong>and</strong> <strong>the</strong> Apple iMacs served as agentsin <strong>the</strong> grid. A total of 40 GHz CPUpower was successfully attained on<strong>the</strong> grid cluster when all 10 iMac computersin <strong>the</strong> cluster were used (Figure2).Blender version 2.46 was downloadedfrom <strong>the</strong> Blender website(http://www.blender.org) <strong>and</strong> installedon all agents (see Figure 5(a)for screenshot). A Blender 3D Render-Farm portal was developed usingPHP <strong>and</strong> MySQL to provide easy accessfor users to submit <strong>the</strong>ir Blender3D rendering jobs (animation <strong>and</strong> texturesfiles; Figure 3(a)), <strong>and</strong> to retrieve<strong>the</strong> rendered results (individualframes) from <strong>the</strong> controller (Figure3(b)).Users can simply ‘submit’ <strong>the</strong>irjobs by logging-in to <strong>the</strong> Render-Farmportal; upload <strong>the</strong>ir animation <strong>and</strong>texture files <strong>and</strong> fill in <strong>the</strong> start-frame<strong>and</strong> end-frame in <strong>the</strong> fields provided(Figure 3(a)). Once <strong>the</strong> job is submitted,<strong>the</strong> controller splits <strong>the</strong> job intomultiple tasks <strong>and</strong> <strong>the</strong>n distributes itto any available agents on <strong>the</strong> grid.The submitted job contains a Job ID<strong>and</strong> is saved in <strong>the</strong> MySQL databaserunning on <strong>the</strong> server. During <strong>the</strong> renderingprocess, Xgrid AdministrationTool was used to log-in to <strong>the</strong> Xgridcontroller to monitor its activities (Figures4(a), (b) <strong>and</strong> (c)). The easy-to-useinterface provides a clean summaryof <strong>the</strong> cluster activities, such as <strong>the</strong>SUB-PROJECTS 19


<strong>Grid</strong>@USM(a) Job Submission Page. Users only need to upload <strong>the</strong>ir animation file<strong>and</strong> textures. The render job is given a name <strong>and</strong> <strong>the</strong> ‘start-frame’ <strong>and</strong>‘end-frame’ fields are keyed in before clicking on <strong>the</strong> ‘Submit’ button tosend <strong>the</strong> job to <strong>the</strong> controller, which <strong>the</strong>n splits <strong>and</strong> delegates <strong>the</strong> job to<strong>the</strong> agents on network.(b) Results Page. When <strong>the</strong> job is completed, <strong>the</strong> agents return <strong>the</strong> resultsto <strong>the</strong> controller when <strong>the</strong>y are stored in compressed form as a zip file.Users only need to click <strong>the</strong> ‘Download’ button to return <strong>the</strong> results from<strong>the</strong> controller.Figure 3: Screenshot of <strong>the</strong> Render-Farm portal(a) List of agents available in <strong>the</strong> grid(b) Jobs Status Monitor (Running jobs)(c) Jobs Status Monitor (Finished jobs)Figure 4: Xgrid Admin Tools20 USING GRID TECHNOLOGY TO CREATE A RENDER-FARM FOR BLENDER 3D ANIMATION


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> Applications(a) Wireframe rendered image of INFORMM building at hd resolution (1920 × 1080 pixel) for Blender3D. Model took less than 1 second to render <strong>and</strong> save compressed as a JPEG image file.(b) Final rendered image from Blender 3D using radiocity ray-tracing. Image took 80 seconds to render1 still frame. 24 still frames are needed to produce 1 second of animation.Figure 5: Images rendered by Blender 3Djob status <strong>and</strong> how many jobs are currentlyrunning <strong>and</strong>/or pending, CPUusage, <strong>and</strong> much more.On completion of <strong>the</strong> renderingjob, <strong>the</strong> agents will send back <strong>the</strong> renderedresults (series of static imagefiles) to <strong>the</strong> controller. The controller<strong>the</strong>n collects <strong>the</strong> images into a temporaryfolder <strong>and</strong> <strong>the</strong>n proceeds to compresseach rendered file in <strong>the</strong> folder.The temporary folder is <strong>the</strong>n packagedas a zipped file <strong>and</strong> stored in<strong>the</strong> portal, ready for download. Theuser can <strong>the</strong>n download <strong>the</strong> zippedfile to his terminal for post processing,such as QuickTime stitching toge<strong>the</strong>r<strong>and</strong> temporal or spatial compressionto create a .mov file ready to play.RESULTSThe time periods taken for rendering10,000 frames of Blender 3D animationon <strong>the</strong> Render-Farm containing 1to 10 iMac computers were recorded.The result showed that <strong>the</strong>re is ahuge difference in completion timebetween using one computer <strong>and</strong> <strong>the</strong>Render-Farm running on 10 agents.Xgrid with 10 computers (20 CPUs)running a job consisting of 10,000frames of Blender 3D animation took2340 seconds (39 minutes) comparedwith 20,160 seconds (336 minutes) ona single computer (2 CPUs). This representsa 9 fold increase in renderingspeed.SUB-PROJECTS 21


<strong>Grid</strong>@USMTable 1: Reliability of Xgrid Render-FarmTime Taken (Seconds) To Complete Rendering 1000 FramesNo. of nodes1 2 3 4 5 6 7 8 9 10❵❵❵❵❵❵❵❵❵❵ Run No.❵1 2435.00 1306.90 963.42 663.78 569.78 435.71 422.64 355.13 327.98 305.872 2436.00 1336.00 963.35 663.33 569.40 436.00 421.00 354.00 327.66 303.003 2435.00 1306.90 963.42 663.78 560.78 435.71 422.64 355.23 327.98 305.874 2435.00 1306.90 963.42 663.78 569.78 435.71 333.00 355.13 327.98 305.275 2435.00 1306.90 963.33 660.00 564.00 432.00 421.00 350.95 321.22 301.17coefficient of variation (%) 1.848 3.061 4.152 0.002 0.007 0.003 0.002 0.005 0.009 0.006Figure 6: Blender 3D rendering times using Xgrid <strong>and</strong> increasing number of agentsDISCUSSION &CONCLUSIONIn this project, we constructed a gridnetwork of 10 iMac computers usingApple Xgrid middleware to create aRender-Farm for Blender 3D animation.A web portal was also developedusing PHP to enable easy submissionof render jobs to <strong>the</strong> grid<strong>and</strong> to retrieve <strong>the</strong> results of <strong>the</strong>rendering. To study <strong>the</strong> efficiencyof <strong>the</strong> grid, we recorded <strong>the</strong> timetaken to render 1000, 5000, <strong>and</strong> 10,000frames of Blender 3D animation witha fixed number of agents in <strong>the</strong> gridfor each study. The results showedthat when <strong>the</strong> number of agents wereincreased from 1 to 10, <strong>the</strong> renderingtimes were reduced by 86.78 %,86.68 % <strong>and</strong> 88.39 % for 1000, 5000<strong>and</strong> 10,000 frames, respectively. Onaverage <strong>the</strong> 10-node Render-Farm wasable to complete <strong>the</strong> rendering job byas much as 9 times faster than <strong>the</strong>conventional single computer setup.As <strong>the</strong> workload increases, <strong>the</strong> efficiencyalso increases, although largerjobs also increase <strong>the</strong> time taken toreturn <strong>the</strong> results to <strong>the</strong> controller.To study <strong>the</strong> reliability (reproducibility)of <strong>the</strong> Xgrid Render-Farm,we repeated 5 times each run acrossa fixed number of agents <strong>and</strong> foundthat <strong>the</strong> coefficient of variation oftimes taken for completing of eachrendering job was less than 5 % (Table1). We conclude that Xgrid canbe used to create a reliable low-costRender-Farm to accelerate CPU intensivetasks, saving time <strong>and</strong> cost.FUTURE WORKFollowing <strong>the</strong> PRAGMA 15 exhibition<strong>and</strong> showcase, we have receivedmany queries regarding our Blender3D Render-Farm portal. We arepreparing to organise a workshop onBlender 3D <strong>and</strong> grid computing.PROJECT PUBLICATIONSPhua, K. K. & Wong, V. C. (2007).Developing A Compelling 3D Animation<strong>and</strong> Multimedia PresentationUsing Blender for INFORMM’sOpening Ceremony. Malaysian Journalof Medical Sciences, 14(Supplement1).Zafri, M., Sanjay, K. C., & Phua, K.K. (2009). Implementation Methodsfor Estimating Haplotypes withGRID <strong>Computing</strong> Technology. InCompendium of Abstracts for <strong>the</strong> 14thNational Conference on Medical <strong>and</strong>Health Sciences (NCMHS).Zafri, M., Fadhilah, N. K., & Phua,K. K. (2008). Using Xgrid to ImproveFASTA Efficiency for Alignmentof Multiple DNA <strong>and</strong> ProteinSequences. Malaysian Journal of MedicalSciences, 15(Supplement 1).U@S<strong>Grid</strong>M22 USING GRID TECHNOLOGY TO CREATE A RENDER-FARM FOR BLENDER 3D ANIMATION


Image: S/Cultura Luminosa bookshelves.Designed by ©Bruno Petronzi (http://www.brunopetronzi.it/).GRID-ENABLEDBLEXISMA2Computer-aided Translation UnitSchool of Computer SciencesTEAM MEMBERSProject Leader : AP Tang Enya KongResearchersCollaborator: <strong>Lim</strong> <strong>Lian</strong> <strong>Tze</strong>, Ye Hong Hoe: Dr Didier Schwab,Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP),Laboratoire d’Informatique de Grenoble, Université Pierre Mendès France (Grenoble 2), FranceNATURAL LANGUAGEPROCESSING &GRID COMPUTINGEfforts to create computer programsthat can underst<strong>and</strong> human languages,or natural languages, has along tradition dating back to <strong>the</strong> inventionof <strong>the</strong> computer. Much workis still on-going in <strong>the</strong> field of NaturalLanguage Processing (NLP) today.On one h<strong>and</strong>, <strong>the</strong>re are still manypertinent research questions as naturallanguage is intricate, ambiguous,dynamic <strong>and</strong> continuously evolving.On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, novel needs forNLP applications are discovered everydayas people from different communitiesinteract in this increasinglyglobalised world, across national, linguistic<strong>and</strong> cultural borders. Machinetranslation, automatic summarisation,intelligent search <strong>and</strong> targeted advertisingare some examples that springto mind. Suffice to say that NLP is anexciting research field to be in.NLP systems are notorious for beingresource-hungry. Intricate grammaticalstructures, large vocabularies,<strong>the</strong> complex interactions between<strong>the</strong>se linguistic layers, as well as <strong>the</strong>sheer amount of documents to beprocessed, all call for high processingpower, speed <strong>and</strong> storage. Distributedprocessing in grid environmentshave proved to be a helpful solution:<strong>the</strong> Tsujii Lab at <strong>the</strong> Universityof Tokyo had successfully performeddeep grammatical parsing of <strong>the</strong> entireMEDLINE corpus 1 in 8 days on pc1 MEDLINE is an online database of 11 million citations <strong>and</strong> abstracts from health <strong>and</strong> medical journals <strong>and</strong> o<strong>the</strong>r news sources.It contains about 10 GB of uncompressed data.23


<strong>Grid</strong>@USMwhole#2entity#1273 nodes 7 nodes 5 nodes 6 nodes148699 7553medical building#1medical institution#1medical man#1hypernymhypernymhypernymhospital#1 hospital#2doctor#18 64hypernym hypernymsurgeon#1 1 allergist#1 2has instance3Hippocrates#1Figure 1: Discrete relations <strong>and</strong> non-discrete neighbourhoods of lexical items <strong>and</strong> <strong>the</strong>ir meanings. Hypernymy (<strong>the</strong> ‘is-a’relation) links surgeon to doctor etc, but does not link hospital to surgeon nor doctor directly. Instead, all lexical items related to<strong>the</strong> medical profession are within nearby neighbourhood of each o<strong>the</strong>r by <strong>the</strong> conceptual vector model.clusters consisting of 340 CPUs. Closerto home, we had previously deployedour machine translation system on agrid cluster for load balancing.An NLP application would needto underst<strong>and</strong> – or seen to be capableof underst<strong>and</strong>ing – ‘meaning’ asconveyed by natural language discourse,if it was to perform anythingdeemed ‘intelligent’. While <strong>the</strong> exactdefinition of ‘meaning’ is still debatedamong linguists <strong>and</strong> philosophers, weattempted to create a simple computationalrepresentation of word (lexicalitem) meanings. This model is used byBlexisma2, our distributed multi-agentsystem, to construct a semantic lexicaldatabase <strong>and</strong> to perform semanticprocessing of natural language texts.REPRESENTING LEXICALMEANING WITH COMPUTERSWe designed a Semantic Lexical Base(SLB) to store computer representationsof word meanings. The meaningof a word or lexical item can bemodelled by its relations to o<strong>the</strong>r lexicalitems. For example, <strong>the</strong> Englishword heavy is used to convey <strong>the</strong> intensityof rain – we do not say bigrain or even strong rain in English.In addition, rain intuitively calls tomind o<strong>the</strong>r words like snow, sunny,wet, water. We model such explicitlinguistic relations with discrete linksbetween lexical items (<strong>and</strong> between<strong>the</strong>ir meanings), <strong>and</strong> implicit ‘conceptualneighbourhood’ with ma<strong>the</strong>maticalvectors. Essentially, <strong>the</strong> conceptualvector model projects words <strong>and</strong> <strong>the</strong>irmeanings onto a spherical space, inwhich meanings with similar ‘conceptual<strong>the</strong>mes’ (e.g. water, wea<strong>the</strong>r) arelocated close to each o<strong>the</strong>r. Figure 1illustrates how this hybrid representationscheme captures multiple facetsof lexical item meanings.The SLB is designed to h<strong>and</strong>le polysemy,i.e. that a surface lexical form(e.g. mouse) can have multiple meanings(a small, furry animal; a computeraccessory; a timid person; etc).In o<strong>the</strong>r words, <strong>the</strong> SLB will store arecord for each acknowledged meaning.One of <strong>the</strong> key aims of Blexisma2is to automatically acquire as manymeaning records (called acceptions)as possible from multiple sources,including dictionaries, terminologylists <strong>and</strong> Web articles. We chose touse multiple learning sources becauseno single source can claim exhaustivecoverage of a language’s lexicon.Moreover, if <strong>the</strong> information or definitiongiven by a particular source isbadly written to <strong>the</strong> point of introducingnoise, <strong>the</strong> negative effect can beautomatically reduced when data ofbetter quality from ano<strong>the</strong>r source isprocessed. Therefore, we also store arecord (called a lexie) for each meaningentry acquired from a differentsource. The overall SLB organisationis shown in Figure 2.Blexisma2 is expected to run continuously<strong>and</strong> iteratively, so that newlycoinedterms or new usages of existingwords will be captured in <strong>the</strong>SLB as <strong>the</strong>y appear. The quality ofacception <strong>and</strong> lexie objects will alsobe refined with each iterative cycle,as each refined record would ‘feed’positive inputs into <strong>the</strong> learning processof o<strong>the</strong>r lexical meanings. Weimplemented <strong>the</strong> SLB as a PostgreSQLdatabase, hosted on utmkblex in <strong>the</strong>USM Campus <strong>Grid</strong>.BLEXISMA2 AGENTSBlexisma2 is a distributed multi-agentsystem, implemented with <strong>the</strong> Mad-Kit platform (http://www.madkit.net). Each Blexisma2 agent has a specificrole or function to perform. Theymay communicate with each o<strong>the</strong>r bysending messages, managed by <strong>the</strong>MadKit kernel. An agent may request<strong>the</strong> help of o<strong>the</strong>r agents to completeits task. Conversely, an agent may24 GRID-ENABLED BLEXISMA2


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> Applicationslexical item acceptions lexiesPWN:any of numeroussmall rodents. . .PWN:person who is quietor timidPWN:mousea h<strong>and</strong>-operatedelectronic device. . .Wikipedia:in computing, apointing device. . .Wikipedia:a small animal. . .Wikitionary:any small rodent. . .Figure 2: Schematic organisation of lexical items, acceptions <strong>and</strong> lexies in <strong>the</strong> SLB. Lexies are mined from various sources,<strong>and</strong> are <strong>the</strong>n aligned or clustered into acceptions.respond to o<strong>the</strong>r agents’ requests bysupplying information after performingits own processing, or setting offfur<strong>the</strong>r chains of agent interactions.The first implemented agents inour Blexisma2 system prototype attemptsto compute acception <strong>and</strong> lexieobjects for English lexical items, basedon Princeton WordNet (PWN), a widecoverage,freely available online lexicaldatabase for <strong>the</strong> English language.PWN Analyser agents would requestinput data from Lexicon Dispenseragents, whose role is to interface with<strong>the</strong> SLB on behalf of all o<strong>the</strong>r agents.After performing <strong>the</strong>ir computations,<strong>the</strong> Analysers would pass <strong>the</strong>ir resultsto <strong>the</strong> Dispensers to be storedin <strong>the</strong> SLB. (See <strong>Lim</strong> <strong>and</strong> Schwab2008; Schwab <strong>and</strong> <strong>Lim</strong> 2008 for detailsof <strong>the</strong> computation method). Wedeployed instances of <strong>the</strong>se agents on<strong>the</strong> USM Campus <strong>Grid</strong> using <strong>the</strong> Globusjob submission toolkit (Figure 3), thusfreeing up utmkblex to devote itsresources to <strong>the</strong> running <strong>and</strong> maintenanceof <strong>the</strong> SLB PostgreSQL server.RESULTS AND APPLICATIONWe were able to process as many as1,902,080 lexical items in 5 days, averagingaround 263 items per minute.Performance-wise, we noted bottleneckeffects i.e. hitting <strong>the</strong> performanceceiling at around 8 nodes dueto frequent connections, high I/O <strong>and</strong>overall heavy load on <strong>the</strong> SLB, whichwe aim to overcome in <strong>the</strong> future.To demonstrate how <strong>the</strong> SLB mightbe used, we also implemented asimple Sense-Tagger <strong>and</strong> a SyntacticParser. The Sense-Tagger takes an Englishsentence as input, <strong>and</strong> requests<strong>the</strong> Syntactic Parser to analyse <strong>the</strong>grammatical properties of <strong>the</strong> wordsin <strong>the</strong> sentence. It <strong>the</strong>n uses semanticdata from <strong>the</strong> SLB to identify <strong>the</strong>most likely meanings of each word.For instance, when given <strong>the</strong> inputs(1) Spica is a star in <strong>the</strong> Virgo constellation.(2) She is <strong>the</strong> star actress in our play.(3) A school of fish swam past.(4) The school’s classrooms are spacious.<strong>the</strong> Sense-Tagger identifies star as acelestial body in (1), <strong>and</strong> as an actorplaying a principal role in (2). Schoolin (3) is tagged as a large group offish; <strong>and</strong> as an education institutionin (4). This capability to discern betweendifferent meanings of a wordin different contexts is important inhelping o<strong>the</strong>r NLP applications provideaccurate results.WHAT’S NEXT?While <strong>the</strong> Blexisma2 prototype catersonly for <strong>the</strong> English language, we areSUB-PROJECTS 25


<strong>Grid</strong>@USMPWNAnalyserPWNAnalyserPWNAnalyserPWNAnalyserRemote HostLexiconDispenserRemote HostLexiconDispenserLexiconDispenserRemote HostLexiconDispenserSense-TaggerSyntacticParserLexiconDispenserSLB on utmkblexPWNAnalyserRemote HostFigure 3: Blexisma2 agents are deployed on USM Campus <strong>Grid</strong> nodes using <strong>the</strong> Globus job submission toolkit.(a) Different meanings of star in context(b) Different meanings of school in contextFigure 4: SLB data generated by Blexisma2 agents used to determine most probable meanings of ambiguous lexicalitem. A Web interface to <strong>the</strong> Sense-Tagger is available at http://utmkblex.usmgrid.myren.net.my:8080/Blexisma2Servlets/CVSenseTagger.interested to improve its design toa multilingual setting, where morecomplicated cross-lingual phenomenawill have to be considered. We alsohope to improve <strong>the</strong> agent communicationmechanisms to reduce latency.Apart from creating moreagents that implement different learningalgorithms <strong>and</strong> heuristics, we arealso planning agents responsible foro<strong>the</strong>r NLP tasks such as deep parsing,detecting named entities, etc. In o<strong>the</strong>rwords, we hope to have agents of variousresponsibilities so that <strong>the</strong>y canbe ‘mixed-<strong>and</strong>-matched’ to constructnew NLP applications, such as thosementioned in <strong>the</strong> introduction.On <strong>the</strong> performance issues, wemay attempt several solutions for <strong>the</strong>SLB bottleneck problem mentionedearlier:• hardware (RAM) upgrades,• fur<strong>the</strong>r optimisation of PostgreSQLserver settings,• database connection pooling,• load balancing,• parallelising queries.PROJECT PUBLICATIONS<strong>Lim</strong>, L. T. & Schwab, D. (2008). <strong>Lim</strong>itsof Lexical Semantic Relatednesswith Ontology-based ConceptualVectors. In Proceedings of <strong>the</strong> 5th InternationalWorkshop on Natural LanguageProcessing <strong>and</strong> Cognitive Science(NLPCS’08). Barcelona, Spain;pp. 153–158.Schwab, D. & <strong>Lim</strong>, L. T. (2008). Blexisma2:a Distributed Agent Frameworkfor Constructing a SemanticLexical Database based on ConceptualVectors. In Proceedingsof <strong>the</strong> International Conference onDistributed Frameworks & Applications2008 (DFmA 2008). Penang,Malaysia; pp. 102–110.U@S<strong>Grid</strong>M26 GRID-ENABLED BLEXISMA2


Image: A business h<strong>and</strong>shake. Photograph by ©FOTOCROMO (http://www.sxc.hu/profile/FOTOCROMO).B2B STANDARDSCOMPONENTMODELINGInformation Systems Engineering Research GroupSchool of Computer SciencesTEAM MEMBERSProject Leader : Dr Vincent Khoo Kay TeongResearchers : Ting Tin Tin, Rinki Yadav, Johnson Foong, Kor Chan HockINTRODUCTIONThis report is about <strong>the</strong> research, design,development <strong>and</strong> implementationof two B2B communications modelson a USM networked non-<strong>Grid</strong>architecture, <strong>and</strong> also an attemptedimplementation of <strong>the</strong> models on aMyREN networked <strong>Grid</strong> architecture.B2B INTEGRATIONSTANDARDSA major challenge in integrating <strong>the</strong>business processes among <strong>the</strong> tradingpartners is <strong>the</strong> enhancement of<strong>the</strong> documents interchange processes.Traditional business-to-business (B2B)process integration is based on a‘Push’ model through which documentsare pushed from <strong>the</strong> sendersto <strong>the</strong> receivers, as in e-mailing <strong>and</strong>electronic data interchange. However,as <strong>the</strong> volume of document increases,<strong>the</strong> ‘Push’ model suffers from diminishingdata quality, which could bedue to <strong>the</strong> low personalizability in<strong>the</strong> communication subject. Pushinga large volume of pre-defined st<strong>and</strong>arddocuments among <strong>the</strong> tradersusually results in data redundancy.In addition, low personalizability incommunication channel also incurs ahigher cost <strong>and</strong> longer time in developing<strong>and</strong> deploying <strong>the</strong> infrastructure.In this research, we explored <strong>the</strong>‘Pull-only’ model <strong>and</strong> <strong>the</strong> ‘Push <strong>and</strong>Pull’ model to increase <strong>the</strong> personalizability<strong>and</strong> <strong>the</strong>reby promote a morepervasive use of <strong>the</strong> B2B integrationst<strong>and</strong>ards.‘PULL-ONLY’ MODELThe ‘Pull-only’ model is a style of communicationin which <strong>the</strong> receiver initiates<strong>the</strong> data request to be respondedby <strong>the</strong> sender accordingly. The processthat involves <strong>the</strong> sender is automatedby <strong>the</strong> Web services. After<strong>the</strong> receiver has initiated <strong>the</strong> informationrequest, <strong>the</strong> stored au<strong>the</strong>nticationinformation will be sent by <strong>the</strong>Web services to <strong>the</strong> sender’s serverfor au<strong>the</strong>ntication, <strong>and</strong> connecting <strong>the</strong>Web services to <strong>the</strong> sender’s databaseview. Once it has been au<strong>the</strong>nticated,<strong>the</strong> sender’s database view will allow<strong>the</strong> receiver to ‘pull’ data through <strong>the</strong>Web services. The raw data will betransformed into a XML format messagebased on <strong>the</strong> predefined st<strong>and</strong>ardschema. Once <strong>the</strong> XML messageis validated, it can be transformedback into raw data <strong>and</strong> inserted into<strong>the</strong> receiver’s database automatically.The purpose of <strong>the</strong> transformationof raw data into XML message is toensure that <strong>the</strong> message conforms to<strong>the</strong> st<strong>and</strong>ard schema. This will ease<strong>the</strong> data insertion process.‘PUSH AND PULL’ MODELA ‘Push <strong>and</strong> Pull’ model is a style ofcommunication in which <strong>the</strong> senderpushes <strong>the</strong> data to <strong>the</strong> receiver but<strong>the</strong> amount of data received is filteredby a st<strong>and</strong>ard schema pre-defined by<strong>the</strong> receiver (pull). The receiver controlswhat to receive through <strong>the</strong> filterwhich does not require <strong>the</strong> receiver tomanually select what to be inserted27


<strong>Grid</strong>@USMFigure 1: B2B Web Services Integration FrameworkFigure 2: ‘Pull’ Model Mechanisminto <strong>the</strong>ir database during <strong>the</strong> datainsertion process.ENTERPRISE GRIDARCHITECTURE FORCOLLABORATIVE B2BINFORMATIONINTERCHANGEThe <strong>Grid</strong> infrastructure layer supportsa collaborative database sharingmiddleware. All <strong>the</strong> virtualdatabases sharable among <strong>the</strong> partnersare maintained at a centralizedphysical database. The grid middlewareused in this implementationcontrols <strong>and</strong> manages <strong>the</strong> retrievalof <strong>the</strong> databases. The ‘Push’ <strong>and</strong>‘Pull’ modeling in Education ServiceMashup (EdSeM) helps <strong>the</strong> partners tointerchange confidential informationmore effectively. EdSeM is <strong>the</strong> partnerof choice for business partnersthat are interested in building longtermrelationship among <strong>the</strong> educationalinstitutions worldwide. EdSeMhas been set up to deploy, leverage<strong>and</strong> integrate many different typesof emerging technologies including<strong>the</strong> Web 2.0 content extraction, agentbased,<strong>and</strong> decision-support technologies;knowledge dissemination <strong>and</strong>knowledge representation technologies.The EdSeM infrastructure complements<strong>the</strong> middleware by sharing<strong>the</strong> information interchange Web servicesacross <strong>the</strong> network. Every partnercan use <strong>the</strong> services to interchangeconfidential information.RESULTS & APPLICATIONInfoX is a data mapping tool thatmaps raw data among <strong>the</strong> files in adatabase. It uses <strong>the</strong> ‘Push <strong>and</strong> Pull’models for transforming data between<strong>the</strong> trading partners. Each time a tradingpartner needs to send or receive28 B2B STANDARDS COMPONENT MODELING


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsFigure 3: ‘Push <strong>and</strong> Pull’ Model MechanismFigure 4: Enterprise <strong>Grid</strong> ArchitectureSUB-PROJECTS 29


<strong>Grid</strong>@USMFigure 5: InfoX ArchitectureFigure 6: InfoX Infrastructure Incorporating <strong>Grid</strong> Architecturedata, it has to be connected to <strong>the</strong>InfoX Web Services Server to requestfor <strong>the</strong> Web service to perform <strong>the</strong>required action. All <strong>the</strong> raw data willbe transformed into <strong>the</strong> XML formatto ease <strong>the</strong> data transformation process.The XML format message willbe stored in <strong>the</strong> InfoX server whichis sharable throughout <strong>the</strong> networkthrough a grid middleware. The gridmiddleware controls <strong>and</strong> manages <strong>the</strong>retrieval of <strong>the</strong> database. The managementis transparent to <strong>the</strong> users.The end users only need to query <strong>the</strong>desired information through <strong>the</strong> middleware.The grid services will subsequentlyretrieve <strong>the</strong> data from variousdatabases based on a data catalog.The distribution of data at variousdatabases is carried out by <strong>the</strong> gridservices.By deploying <strong>the</strong> InfoX system ina grid environment, it is able to send45 records per second on an average.However, it was observed that some340 records per second on an averagecould be achieved in a non-gridenvironment.In order to deploy Infox in a gridenvironment through a grid middleware,both <strong>the</strong> InfoX <strong>and</strong> <strong>the</strong> middlewaremust be located in a grid MyRENnetwork. Due to <strong>the</strong> incompatibilityof InfoX in <strong>the</strong> MyREN network,it takes a much longer time to process<strong>the</strong> data <strong>and</strong> eventually affects<strong>the</strong> response time.We have also implemented a selfserviceevaluation system for <strong>the</strong> potentialeducational institutions to estimate<strong>the</strong> possible time required for<strong>the</strong> development <strong>and</strong> deployment ofan industry-wide st<strong>and</strong>ardised architecture.WHAT’S NEXT?As part of <strong>the</strong> future work, InfoX<strong>and</strong> <strong>the</strong> grid middleware have to beenhanced from various perspectives.First of all, <strong>the</strong> InfoX system musthave <strong>the</strong> ability to compress <strong>the</strong> databefore sending to <strong>the</strong> grid middlewarefor fur<strong>the</strong>r processing. It ishoped that <strong>the</strong> response time especiallywhen sending large files willeventually be enhanced. From <strong>the</strong>grid system perspective, we need tofur<strong>the</strong>r investigate <strong>the</strong> existence of abotteneck in <strong>the</strong> MyREN network.PROJECT PUBLICATIONSTing, T. T. & Khoo, V. K. T. (2009).B2B St<strong>and</strong>ardized Information InterchangeChallenges – A Study onSt<strong>and</strong>ardization versus Personalization.In Proceedings of <strong>the</strong> 2009 InternationalConference on AdvancedManagement Science (ICAMS 2009);pp. 244–248.Ting, T. T. & Khoo, V. K. T. (2008).Personalizable Information InterchangeArchitecture for EducationalInstitutions. In Proceedingsof <strong>the</strong> 3rd International Conference one-Commerce with Focus on DevelopingCountries (ECDC’08). Isfahan, Iran.U@S<strong>Grid</strong>M30 B2B STANDARDS COMPONENT MODELING


Image: A programmer hard at work. Photograph by ©Milen Yakimov (http://www.sxc.hu/profile/todorov40).AN AUTOMATED JAVATESTING TOOL ON THE GRIDSoftware Engineering Research GroupSchool of Electrical & Electronic EngineeringTEAM MEMBERSProject Leader : Dr Kamal Zuhairi ZamliResearchers : Dr Nor Ashidi Mat Isa, Mohammed Issam Younis, Saidatul Khatimah binti SaidINTRODUCTIONNowadays, we are increasingly dependenton software to assist as wellas facilitate our daily chores. In fact,whenever possible, most hardware implementationis now being replacedby <strong>the</strong> software counterpart. From<strong>the</strong> washing machine controllers, mobilephone applications to <strong>the</strong> sophisticatedairplane control systems, <strong>the</strong>growing dependent on software canbe attributed to a number of factors.Unlike hardware, software does notwear out. Thus, <strong>the</strong> use of softwarecan also help to control maintenancecost. Additionally, software is alsomalleable <strong>and</strong> can be easily changedas <strong>the</strong> need arises.Covering as much as 40 to 50 %of <strong>the</strong> total software developmentcosts, testing can be considered oneof <strong>the</strong> most important activities forsoftware validation <strong>and</strong> verification.Lack of testing can lead to disastrousconsequences including loss of31


<strong>Grid</strong>@USMFigure 1: Option dialog in Microsoft ExcelFigure 2: G_MIPOG Graphical User Interfacedata, fortunes <strong>and</strong> even lives. Manycombinations of possible input parameters,hardware/software environments,<strong>and</strong> system conditions needto be tested <strong>and</strong> verified against forconformance based on <strong>the</strong> system’sspecification. Often, this results intocombinatorial explosion problem.As illustration, consider <strong>the</strong> optiondialog in Microsoft Excel software(see Figure 1). Even if only<strong>the</strong> View tab option is considered,<strong>the</strong>re are already 20 possible configurationsto be tested. With <strong>the</strong> exceptionof <strong>Grid</strong>lines color which takes56 possible values, each configurationcan take two values (i.e. checked orunchecked). Here, <strong>the</strong>re are 2 20 × 56(i.e. 58,720,256) combinations of testcases to be evaluated. Assuming thatit takes 5 minute for one test case,<strong>the</strong>n it would require nearly 559 yearsfor a complete test of <strong>the</strong> View tab option.As highlighted above, given limitedtime <strong>and</strong> resources, exhaustivetesting is next to impossible. Obviously,<strong>the</strong>re is a need for a systematicstrategy in order to reduce <strong>the</strong> testdata into manageable ones.Earlier work has indicated thatpairwise testing (i.e. based on 2-wayinteraction of variables) can be effectiveto sample <strong>the</strong> test data appropriately.While such conclusion maybe true for some systems, it cannotbe generalized to all software systemfaults (i.e. especially when <strong>the</strong>re aresignificant interactions between variables).This argument is supportedby <strong>the</strong> fact that software applicationsgrew tremendously in size from kilobytesto terabytes in <strong>the</strong> last 5 years.Driven by <strong>the</strong> advancement of technologies<strong>and</strong> market dem<strong>and</strong>s, <strong>the</strong> neteffect of this massive software growthcan often lead to intertwined dependencybetween parameters involved,thus, justifying <strong>the</strong> need to considerfor high interaction strength (i.e. oftenindicated as t).Although desirable, <strong>the</strong> consider-32 AN AUTOMATED JAVA TESTING TOOL ON THE GRID


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsTable 1: Results For 7 to 10 5-Valued Parameters in 6-Way Testing# Parameters 7 8 9 10Test Case Size 15,625 28,125 40,146 45,168Time (MIPOG) (Single Computer) 118.703 485.047 1637.097 4657.457Time (G_MIPOG) (6-Computers) 55.234 183.184 626.797 1552.125Speedup 2.149 2.648 2.838 3.203Table 2: Results using 1 to 11 Computers for 10 10-valued Parameters in 4-way Testing# Computers Time Speedup Test Case Size1 (MIPOG) 77,882.16 12 (G_MIPOG) 43,680.404 1.7833 (G_MIPOG) 31,140.4 2.5014 (G_MIPOG) 24,089.749 3.2335 (G_MIPOG) 17,150.883 4.5416 (G_MIPOG) 15,193.554 5.126 27,3067 (G_MIPOG) 14,928.537 5.2178 (G_MIPOG) 14,789.624 5.2669 (G_MIPOG) 12,928.645 6.02410 (G_MIPOG) 9570.184 8.13811 (G_MIPOG) 9352.967 8.327ation for higher interaction strength(e.g. from t = 3 onwards) can be problematic.When <strong>the</strong> parameter interactioncoverage t increases (i.e. as t > 2),<strong>the</strong> number of t-way test set also increasesexponentially. For example,consider a system with 10 parameters,where each parameter has 5 values.There are 1125 2-way tuples (or pairs),15,000 3-way tuples, 131,250 4-way tuples,787,500 5-way tuples, 3,281,2506-way tuples, 9,375,000 7-way tuples,17,578,125 8-way tuples, 19,531,250 9-way tuples, <strong>and</strong> 9,765,625 10-way tuples.From this example, it is evidentthat for large system with manyparameters, considering higher ordert-way test data invites massively notoriouscomputation.Facing <strong>the</strong>se challenges, our researchinvestigates a new strategy <strong>and</strong>its implementation, called G_MIPOG.Here, G_MIPOG (see Figure 2) originatesfrom a well known sequentialt-way strategy called IPOG. Unlike itspredecessor strategy (IPOG), G_MIPOGpermits harnessing of grid based architecture,allowing <strong>the</strong> support forhigher t than that of IPOG.OUR EXPERIENCE WITH THEG_MIPOG STRATEGYAs <strong>the</strong> first step to <strong>the</strong> development ofG_MIPOG, we investigate an intermediatestrategy, called MIPOG 2 , basedon <strong>the</strong> modification of IPOG. Our experimentalresult demonstrates that<strong>the</strong> control <strong>and</strong> data dependencies inMIPOG can be removed to permit distributedas well as parallel processing.As part of our research, we intendto investigate whe<strong>the</strong>r <strong>the</strong>re isa speedup gain from distributingMIPOG in G_MIPOG. Three experimentsare applied to both MIPOG<strong>and</strong> G_MIPOG in order to gauge <strong>the</strong>speedup. Here, <strong>the</strong> speedup is definedas ratio of <strong>the</strong> time taken bysingle computer to <strong>the</strong> time taken bymultiple computers. The experimentalgoals are to investigate:• <strong>the</strong> speedup as <strong>the</strong> number of parametersincreases;• <strong>the</strong> speedup as <strong>the</strong> number of computernodes increases;• <strong>the</strong> speedup against <strong>the</strong> increase ofparameter coverage;• <strong>the</strong> comparison in terms of test sizeagainst o<strong>the</strong>r strategies.To achieve <strong>the</strong> first goal, we applyMIPOG, <strong>and</strong> G_MIPOG (consists of6 computers) to 5-valued parameters,<strong>and</strong> vary <strong>the</strong> number of parametersfrom 7 to 10, <strong>and</strong> fix t = 6. The resultsare demonstrated in Table 1. Toachieve <strong>the</strong> second goal, we have fixedt = 4, with 10 10-valued parameters.Then, we determine <strong>the</strong> speedup using2 to 11 computers. The results areshown in Table 2. To achieve <strong>the</strong> thirdgoal, we apply G_MIPOG to <strong>the</strong> TrafficCollision Avoidance System (TCAS)module. Here, <strong>the</strong> TCAS module isan aircraft collision avoidance systemfrom <strong>the</strong> Federal Aviation Administrationwhich has been used as casestudy in o<strong>the</strong>r related works. TheTCAS module has twelve parameters:seven parameters have 2 values, twoparameters have 3 values, one parameterhas 4 values, <strong>and</strong> two parametershave 10 values. In this case, we use2 computers, <strong>and</strong> vary t from 2 to 11to determine <strong>the</strong> speedup, as givenin Table 3. Finally, to achieve <strong>the</strong> lastgoal, we compare <strong>the</strong> test size generatedby G_MIPOG <strong>and</strong> o<strong>the</strong>r strategiesfor <strong>the</strong> aforementioned TCAS module.The results are tabulated in Table 4.We have darkened <strong>the</strong> entries to show<strong>the</strong> best result. Entry with ‘NS’ refersto as not supported by <strong>the</strong> implementation.DISCUSSIONIn Table 1, we note that <strong>the</strong> speedupincreases linearly as <strong>the</strong> number of parametersincreases. Here, extra overheadis added for <strong>the</strong> fifth parametersdue to <strong>the</strong> need to start <strong>and</strong>shut down <strong>the</strong> corresponding threads.Referring to Table 2, we note <strong>the</strong>speedup gain also increases quadraticallyas <strong>the</strong> number of values increases.Extrapolating <strong>and</strong> performingcurve fitting of <strong>the</strong> results from Ta-2 The details of MIPOG can be found at http://gclportal.usmgrid.myren.net.my/trac/engpjtstSUB-PROJECTS 33


<strong>Grid</strong>@USMTable 3: Results Using t = 2 to 11 for TCAS Modulet-Way Time (MIPOG) Time(G_MIPOG) Speedup Test Case Size2 0.156 0.292 0.534 1003 0.609 0.547 1.113 4004 3.234 2.578 1.2544 12655 36.797 29.125 1.263 41966 301.128 214.091 1.406 10,8517 1772.407 1086.7 1.631 26,0618 10,242.09 5839.276 1.754 56,7429 36,284.7 19,124.78 1.897 120,36110 41,481.672 21,832.46 1.9 201,60111 14,939.891 7821.932 1.91 230,400Table 4: Comparative Test Size Results Using TCAS Module for t = 2 to 12t G_MIPOG IPOG IPOG_D IPOF1 IPOF2 ITCH Jenny TConfig TVGII2 100 100 130 100 100 120 106 108 1013 400 400 480 402 427 2388 411 472 4344 1265 1361 2522 1352 1644 1484 1527 1476 15995 4196 4219 5306 4290 5018 NS 4680 NS 47736 10,851 10,919 14,480 11,234 13,310 NS 11,608 NS NS7 26,061 NS NS NS NS NS 27,630 NS NS8 56,742 NS NS NS NS NS 58,865 NS NS9 120,361 NS NS NS NS NS NS NS NS10 201,601 NS NS NS NS NS NS NS NS11 230,400 NS NS NS NS NS NS NS NS12 460,800 NS NS NS NS NS NS NS NSble 3, we observe that <strong>the</strong> speedup increaseslogarithmically as <strong>the</strong> strengthof coverage increases. In this case,<strong>the</strong>re is also no speedup gain forthis strategy when t = 2, possiblydue to <strong>the</strong> overhead required for creation,synchronization, <strong>and</strong> deletionof threads for small degree of interaction.Concerning comparison witho<strong>the</strong>r strategies in Table 4, G_MIPOG,IPOG, IPOF1, <strong>and</strong> IPOF2 gave <strong>the</strong> mostoptimum test size at t = 2. At t = 3,both G_MIPOG <strong>and</strong> IPOG gave <strong>the</strong> mostoptimum test size. For all o<strong>the</strong>r cases,G_MIPOG always outperforms o<strong>the</strong>rstrategies. Putting G_MIPOG aside,only Jenny can go more than t > 6for <strong>the</strong> TCAS module. TVG implementationdo not give any result whent > 5 whilst TConfig does not produceany result when t > 4. ITCH cannot support t > 4. We also note thatin <strong>the</strong> case of ITCH, <strong>the</strong> test size fort = 3 is greater than that of t = 4.CONCLUSION &FUTURE WORKIn short, our experimental resultsdemonstrate that G_MIPOG scales wellagainst its sequential strategy (MIPOG)with <strong>the</strong> increase of <strong>the</strong> computersas computational nodes. Moreover,G_MIPOG produces minimal test set ascompared with o<strong>the</strong>r existing strategies(whilst keeping <strong>the</strong> test set ofMIPOG <strong>the</strong> same). Finally, G_MIPOG is<strong>the</strong> only strategy that supports hight even against all o<strong>the</strong>r IPOG variants(i.e. IPOG_D, IPOF1, <strong>and</strong> IPOF2).As t-way testing approach is still ininfancy within <strong>the</strong> software engineeringcommunity, novel research <strong>and</strong>findings are highly welcomed. Thework discussed here is indeed significantto facilitate <strong>the</strong> testing activitiesby systematically minimizing <strong>the</strong> testdata for test consideration. As humanare more <strong>and</strong> more dependent towardsoftware, this work presents a smallleap toward <strong>the</strong> betterment of <strong>the</strong> wayengineers do testing.As a part of our future work, weare trying to integrate test execution<strong>and</strong> reporting along with parallelizing<strong>the</strong> data structure for storing <strong>the</strong> interactionelement using tuples-spacetechnology.PROJECT PUBLICATIONSYounis, M. I., Zamli, K. Z., Klaib, M.F. J., Che Soh, Z. H., Che Abdullah,S. A., & Mat Isa, N. A. (2009).Assessing IRPS as an Efficient PairwiseTest Data Generation Strategy.International Journal of Advanced IntelligenceParadigms. (forthcoming).Younis, M. I., Zamli, K. Z., & MatIsa, N. (2008). IRPS – An EfficientTest Data Generation Strategy forPairwise Testing. Lecture Notes inArtificial Intelligence, 5177.Younis, M. I., Zamli, K. Z., & MatIsa, N. A. (2008). A Strategy for<strong>Grid</strong> Based t-Way Test Data Generation.In Proceedings of <strong>the</strong> InternationalConference on DistributedFrameworks & Applications (DFmA2008). Penang, Malaysia; pp. 73–78.Zamli, K. Z., Che Abdullah, S. A., Younis,M. I., & Che Soh, Z. H. (2009).Software Testing Module. (forthcoming).OpenUniversity Malaysia <strong>and</strong>Pearson Publishing.U@S<strong>Grid</strong>M34 AN AUTOMATED JAVA TESTING TOOL ON THE GRID


Image: E<strong>the</strong>rnet crossover cable. Photograph by ©Elia Schmidt (http://The-Condor.deviantart.com/)INET-GRIDNetwork Research GroupSchool of Computer SciencesTEAM MEMBERSProject Leader : Prof. Sureswaran RamadassResearchers : AP Rahmat Budiarto, AP Chan Huah Yong, Dr Ahmed M. ManasrahGRID COMPUTINGA <strong>Grid</strong> environment is a complexglobally distributed system that involveslarge sets of diverse, geographicallydistributed components used fora numerous type of applications. Thecomponents discussed here includeall of <strong>the</strong> software <strong>and</strong> hardware services<strong>and</strong> resources needed by applications.Due to <strong>the</strong> diversity of <strong>the</strong>components <strong>and</strong> enormous numberof users, <strong>the</strong> <strong>Grid</strong> environment hasbecome more vulnerable to fault, failure<strong>and</strong> excessive tools. Thus, a suitablemechanism is needed to monitor<strong>the</strong> components, <strong>the</strong>ir usage <strong>and</strong> detectingconditions that may lead tobottlenecks, faults or failures. <strong>Grid</strong>monitoring is a potential platform towardsa robust, reliable <strong>and</strong> efficientenvironment in networking wherebyit is crucial to maintain network reliabilityduring a peak environment.In conclusion, a combination betweennetwork monitoring <strong>and</strong> grid monitoringare required in order to enhancenetwork reliability <strong>and</strong> performances.Network monitoring in <strong>Grid</strong> environmentplays an important role innetwork management <strong>and</strong> is essentialin <strong>Grid</strong> utilization. According to <strong>the</strong>Global <strong>Grid</strong> Forum schema, <strong>the</strong> managementof network infrastructure observationsis divided into three mainactivities: <strong>the</strong>ir production, using networkmonitoring tools, <strong>the</strong>ir publication,by way of powerful databases,<strong>and</strong> <strong>the</strong>ir utilization by administration<strong>and</strong> workflow analysis tools.We are embarking on an intelligentreal time grid monitoring systemcalled iNet-<strong>Grid</strong>. The purpose ofthis research is to enable an intelligenttool to assist network <strong>and</strong> system administratorsintelligently by providinginformation for preventive measuresto be taken to minimize damages dueto system or network down time thatcan be very costly. Real-time networkanalysis helps in detecting <strong>and</strong> resolvingnetwork faults <strong>and</strong> performanceproblems in a minimum amount oftime. This system has <strong>the</strong> power to analyzemulti-topology, multi-protocolnetworks, automatically.TECHNICAL OBJECTIVETo prove <strong>the</strong> ability of <strong>the</strong> proposedwork to be integrated with any opensource application for grid monitoring(Ganglia) <strong>and</strong> thus, building a platformthat allow all o<strong>the</strong>r grid monitoringsystem to be integrated toge<strong>the</strong>rfor better monitoring results. For example,<strong>the</strong> platform allows a networkmonitoring solution <strong>and</strong> a grid monitoringsolution to work h<strong>and</strong> in h<strong>and</strong>to provide network monitoring information<strong>and</strong> <strong>the</strong>refore, accurate gridmonitoring results.GENERAL BENEFITSFigure 1 depicts <strong>the</strong> general benefitsfrom monitoring grid environments.1. To underst<strong>and</strong> <strong>the</strong> performance,identify problems <strong>and</strong> to tune agrid system for better overall performance,2. Fault detection <strong>and</strong> recovery mechanisms;to determine if <strong>the</strong>re areany units within <strong>the</strong> grid environmentthat are not functioning properly,<strong>and</strong> thus, to decide whe<strong>the</strong>rto restart a component or redirecta service requests elsewhere,3. For debugging purposes,4. Resource utilization,5. Management decisions,6. Performance utilization,7. Accounting,8. Security.INET-GRIDiNet-<strong>Grid</strong> was built on top of EclipsePlatform as a Rich Client Platform35


<strong>Grid</strong>@USMFigure 1: <strong>Grid</strong> monitoring benefits(RCP) application that gives <strong>the</strong> flexibilityin deploying <strong>the</strong> end solutioninto various/any grid environmentsregardless of <strong>the</strong> grid architecture or<strong>the</strong> operating system. The RCP provides<strong>the</strong> power of plugins that canextend <strong>the</strong> system functionality easily<strong>and</strong> quickly without refactoring<strong>the</strong> whole core component from <strong>the</strong>scratch. Moreover, <strong>the</strong>se plugins alsoprovide extension points for o<strong>the</strong>r pluginsto be integrated or add/enhanceany existing service (plugin) such asinterfacing to Ganglia <strong>and</strong> any o<strong>the</strong>rmonitoring tools.ARCHITECTUREiNet-<strong>Grid</strong> is one of <strong>the</strong> core modulesof iNet-Enterprise Suite <strong>and</strong> plays arole as <strong>the</strong> centralized grid monitoringmodule that retrieves <strong>and</strong> managesall <strong>the</strong> nodes information withina cluster send by a grid monitoringagent. Only one grid agent needsto be installed in a Linux-based machinewithin a cluster equipped withany grid monitoring solution such asGanglia. This monitoring agent willlook into Ganglia database, retrieve<strong>the</strong> data <strong>and</strong> send it to <strong>the</strong> centralmonitoring Server a specific definedtime interval (15 seconds by default).Figure 2 best visualizes <strong>the</strong> iNet-<strong>Grid</strong>’sarchitecture followed by its monitoringagent architecture in Figure 3.Fur<strong>the</strong>rmore, iNet-<strong>Grid</strong> consists ofthree components: grid agent, griddata manager <strong>and</strong> grid data viewer.A grid agent is installed in one machinefor each cluster. It collects <strong>and</strong>processes data loaded from <strong>the</strong> Gangliadatabase <strong>and</strong> sends it to griddata manager which resides in iNet-Server. Figure 4 portrays iNet-<strong>Grid</strong>’sfunctional architecture.The monitoring information consistsof <strong>the</strong> number of all incoming/outgoingpackets, CPU speed,CPU number, CPU usage load <strong>and</strong>hard disk used/free size. On <strong>the</strong>o<strong>the</strong>r h<strong>and</strong>, <strong>the</strong> network monitoringinformation consists of; b<strong>and</strong>widthutilization, viruses/worms infectingor propagating within <strong>the</strong> network,<strong>the</strong> applications that are consumingb<strong>and</strong>width, traffic analyzers, etc. Thevalues are read from Ganglia databasewhich kept in round robin database(RRD) format files in <strong>the</strong> location /usr/share/ganglia/rrdtool/file/libs. Belowis <strong>the</strong> list of <strong>the</strong> files:• boottime.rrd • bytes_out.rrd• cpu_idle.rrd • cpu_num.rrd• cpu_system.rrd • cpu_wio.rrd• disk_total.rrd • load_five.rrd• mem_buffers.rrd • mem_free.rrd• mem_total.rrd • pkts_in.rrd• proc_run.rrd • swap_free.rrd• bytes_in.rrd • cpu_aidle.rrd• cpu_nice.rrd • cpu_speed.rrd• cpu_user.rrd • disk_free.rrd• load_fifteen.rrd • load_one.rrd• mem_cached.rrd • mem_shared.rrd• part_max_used.rrd • pkts_out.rrd• proc_total.rrd • swap_total.rrdCONNECTIONFUNCTIONALITYThe grid monitoring agent sends a requestto <strong>the</strong> central monitoring server.The central monitoring server <strong>the</strong>nchecks/verifies whe<strong>the</strong>r <strong>the</strong> requestis a valid request or not (<strong>the</strong> monitoringagent will verify his identity usinga unique ID that is pre-defined in <strong>the</strong>database), if it is valid, <strong>the</strong> server willgive <strong>the</strong> authorization to <strong>the</strong> monitoringagent <strong>and</strong> allows him to send<strong>the</strong> data so it can be used for realtimeor historical monitoring information.When <strong>the</strong> data reaches to<strong>the</strong> server, it will be stored into <strong>the</strong>database. “<strong>Grid</strong>” table is where <strong>the</strong>data is stored (Table 1 shows <strong>the</strong> tabledata dictionary).iNet-Console (which acts as aviewer) will send a grid data requestto <strong>the</strong> central monitoring server requestingfor grid monitoring information.Finally, <strong>the</strong> received grid informationwill be displayed to <strong>the</strong> viewer.The format of <strong>the</strong> data sent to <strong>the</strong>client is as shown in Figure 5.RESULTSThe implementation of iNet-<strong>Grid</strong> prototypehas been done. We tested itwith 5 clusters <strong>and</strong> 90 nodes in total.From our evaluation we can conclude<strong>the</strong> effectiveness of <strong>the</strong> system. Thescreenshots taken during <strong>the</strong> prototypeare shown in Figure 6.EFFICIENT GRIDMONITORING<strong>Grid</strong> monitoring is considered efficientif it is a combination of network<strong>and</strong> grid monitoring, <strong>the</strong> networkmonitoring in term of b<strong>and</strong>width,anomaly, security, user misusage/misdealingwith <strong>the</strong> networkresources which obviously will resultin slow network, thus, minimizing<strong>the</strong> grid productivity specially if it requirescommunication between differentparties over <strong>the</strong> network. On <strong>the</strong>o<strong>the</strong>r h<strong>and</strong>, grid monitoring will ensurethat <strong>the</strong> <strong>Grid</strong> resources are used<strong>and</strong> managed wisely <strong>and</strong> efficiently36 INET-GRID


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsFigure 2: <strong>Grid</strong> monitoring general architectureFigure 3: <strong>Grid</strong> agent architectureTable 1: Data dictionary of <strong>the</strong> “<strong>Grid</strong>” tableColumn Name Comment Column Name CommentNode_name Holds <strong>the</strong> CPU ID value Time Holds <strong>the</strong> time valueBoot_time Holds <strong>the</strong> boot time value Cpu_idle Holds <strong>the</strong> CPU idle valueCpu_system Holds <strong>the</strong> CPU system value Disk_total Holds <strong>the</strong> disk total valueMem_buffers Holds <strong>the</strong> memory buffers value Mem_total Holds <strong>the</strong> memory total valueProc_run Holds <strong>the</strong> number of running processes Bytes_in Holds <strong>the</strong> bytes in valueCpu_nice Holds <strong>the</strong> CPU nice value Cpu_user Holds <strong>the</strong> CPU user valueLoad_fifteen Holds <strong>the</strong> load fifteen value Mem_cached Holds <strong>the</strong> cached memory valuePart_max_used Holds <strong>the</strong> part max used value Proc_total Holds <strong>the</strong> total processes valueBytes_out Holds <strong>the</strong> bytes out value Cpu_num Holds <strong>the</strong> number of CPUs valueCpu_wio Holds <strong>the</strong> CPU I/O value Load_five Holds <strong>the</strong> load five valueMem_free Holds <strong>the</strong> free memory value Pkts_in Holds <strong>the</strong> packets in valueSwap_free Holds <strong>the</strong> swap free value Cpu_aidle Holds <strong>the</strong> CPU idle valueCpu_speed Holds <strong>the</strong> CPU speed value Disk_free Holds <strong>the</strong> free disk valueLoad_one Holds <strong>the</strong> load one value Mem_shared Holds <strong>the</strong> shared memory valuePkts_out Holds <strong>the</strong> packets out value Swap_total Holds <strong>the</strong> total swap valueby keeping an eye on each individualCPU available in <strong>the</strong> grid <strong>and</strong> alertingwhen certain threshold is triggered.This is illustrated in Figure 7.The <strong>Grid</strong> <strong>and</strong> network monitoringis one tool that all modules are pluggableto each o<strong>the</strong>r on top of one coreengine that manages <strong>the</strong> communicationbetween <strong>the</strong> different availablemodules controlled by XML manifest.This XML manifest makes it easierto add/remove/enhance any existingfunctionality. As shown in Figure 8,each module is working/operatingh<strong>and</strong> in h<strong>and</strong> with o<strong>the</strong>rs. The coreengine is hardwired with a circularbuffer using iNetmon technologyto support high speed network withhigh amount of traffic efficiently witha minimal packet loss. iNet-<strong>Grid</strong> is <strong>the</strong>core of <strong>the</strong> <strong>Grid</strong> monitoring where ithas ano<strong>the</strong>r pluggable interface for existinggrid monitoring tool integration(such as Ganglia) without any specialSUB-PROJECTS 37


<strong>Grid</strong>@USMFigure 5: Data format sent to <strong>the</strong> clientFigure 4: Functional architecture of iNet-<strong>Grid</strong>Figure 6: Snapshots of <strong>the</strong> systemFigure 7: Efficient <strong>Grid</strong> Environment Monitoring Features38 INET-GRID


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsFigure 8: Efficient grid environment monitoring components <strong>and</strong> architectureFigure 9: <strong>Grid</strong> environmentmonitoring <strong>and</strong> deploymentconfiguration accept that for Ganglia(client) which is assumed to be ready<strong>and</strong> functioning as required. iNet-<strong>Grid</strong> client is installed with Gangliaclient to avoid installing <strong>the</strong> wholeGanglia package. Instead, <strong>the</strong> entiresystem will provide as a whole:1. Application usage statistics,2. Worm activity alerts,3. B<strong>and</strong>width usage,4. Top users/usage,5. Nodes status,6. Disk usage,7. Packets per second,8. <strong>Cluster</strong> statistics,9. CPU load,10. RAM usage.DEPLOYMENTAs we described earlier, <strong>the</strong> gridnetworkwill have an intelligent agentfor <strong>the</strong> network part that is responsiblefor any network activity h<strong>and</strong>in h<strong>and</strong> with a grid intelligent agentthat is in charge of each grid environmentresources. Figure 9 portrays <strong>the</strong>combination between network monitoring<strong>and</strong> grid monitoring as well asillustrating <strong>the</strong> deployment plan.CONCLUSION &FUTURE WORKResource monitoring is very importantfor <strong>Grid</strong> environment. Many maturesystems exist today. However,none of <strong>the</strong>m provides complete solution,which covers all requirementsneeded for efficient grid monitoring.With this we propose grid monitoringarchitecture namely iNet-<strong>Grid</strong>. Wefocus on following issues: integratingexisting systems on top of Ganglia,providing component for <strong>the</strong> managementof monitoring component <strong>and</strong>designing iNet Console for <strong>the</strong> GUI. Deploymentof <strong>the</strong> described monitoringsystems has already been successfullyaccomplished <strong>and</strong> <strong>the</strong> preliminary resultshave shown positive outcomes.PROJECT PUBLICATIONSAhmed, M. (2009). A Real TimeDistributed Network Monitoring<strong>and</strong> Security Monitoring Platform(RTDNMS). PhD <strong>the</strong>sis, School ofComputer Sciences, Universiti SainsMalaysia.Ahmed, M. & Talib, N. A. (2008).iNet-<strong>Grid</strong>: A Real-Time <strong>Grid</strong> Monitoring<strong>and</strong> Troubleshooting System.In Proceedings of <strong>the</strong> InternationalConference on Distributed Frameworks& Applications 2008 (DFmA 2008).Penang, Malaysia; pp. 68–72.U@S<strong>Grid</strong>MSUB-PROJECTS 39


Image: Spread of diseases can be modeled ma<strong>the</strong>matically. Illustration by ©Kroma Kromalsky (http://www.sxc.hu/profile/krominator).GRID APPLICATION TO WAVE FRONTPROPAGATION AND CONTAINMENTOF VECTOR BORNE DISEASESSchool of Ma<strong>the</strong>matical SciencesTEAM MEMBERSProject Leader : Prof. Koh Hock LyeResearchers : Dr Teh Su Yean, Tan Kah BeeINTRODUCTIONVector borne diseases such asdengue, Yellow fever, malaria,West Nile Encephalitis (WNE),Japanese Encephalitis (JE), St. LouisEncephalitis (SLE) <strong>and</strong> WesternEquine Encephalitis (WEE) pose seriouspublic health hazard worldwide.Nearly half of <strong>the</strong> world populationis infected with at least one type ofvector borne diseases (WHO 2000).Approximately 1.4 million lives arelost each year due to vector-borne diseases.For most of <strong>the</strong>se vector-bornediseases, vaccines are currently notavailable <strong>and</strong> are unlikely to be availablein <strong>the</strong> near future. Therefore,<strong>the</strong> focus is diverted to controlling<strong>the</strong> spread of a disease by controlling<strong>the</strong> vector of that disease. For thispurpose, a ma<strong>the</strong>matical model isdeveloped by <strong>the</strong> team to study <strong>the</strong>mechanisms involved in <strong>the</strong> invasion<strong>and</strong> persistence of vector borne diseases.Through model simulations,qualitative <strong>and</strong> quantitative assessmentscan be made to determine<strong>the</strong> factors affecting <strong>the</strong> dispersal ofvector population <strong>and</strong> <strong>the</strong> transmissionof <strong>the</strong> disease from <strong>the</strong> vector tohuman. Fur<strong>the</strong>r, control methods canbe experimented numerically on itseffectiveness before implementation.CLIMATE CHANGE IMPACTON DENGUEDengue fever is <strong>the</strong> world fastestgrowing vector borne disease whichis currently endemic or intermittentlyepidemic in many tropical <strong>and</strong>subtropical regions. Current estimatessuggest that up to 50 milliondengue cases occur annually, including500,000 cases of dengue haemorrhagicfever (WHO 2001). Denguevirus is transmitted by arthropods of<strong>the</strong> species Aedes aegypti, a mosquitofound throughout <strong>the</strong> world wherehot <strong>and</strong> humid climate is predominant.The transmission of <strong>the</strong> denguevirus has only one epidemiologicalcycle linking <strong>the</strong> human host <strong>and</strong> femaleAedes aegypti mosquito. Fourmajor serotypes of dengue virus havebeen identified. A person who recoversfrom an infection is immune toonly that serotype <strong>and</strong> may becomesecondarily infected with a differentserotype virus (Atkinson et al. 2007).In <strong>the</strong> absence of vaccine, <strong>the</strong> effortsto control dengue disease are focusedon controlling <strong>the</strong> mosquito vector.Fur<strong>the</strong>r, global warming may increase<strong>the</strong> outbreak risk of dengue diseaseassociated with wea<strong>the</strong>r or precipitationpattern modifications (Schaeffer,Mondet, <strong>and</strong> Touzeau 2008). Climatechange may be irreversible, <strong>and</strong> is predictedto become more extreme in <strong>the</strong>future (IPCC 2001). To examine <strong>the</strong>global-scale relationships between climate,Aedes aegypti populations <strong>and</strong>dengue disease, grid technology is appropriate,where a grid will simulateeach local habitat.APPLICATION OF GRIDTECHNOLOGYWNE is a vector borne disease withmore than one epidemiological cyclelinking <strong>the</strong> human host, <strong>the</strong> mosquitovector <strong>and</strong> ano<strong>the</strong>r mammal such asbirds <strong>and</strong> pigs. WNE started in New40


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsFigure 1: Conceptual computation gridFigure 2: Compartment model for Dengue disease transmission∂WS∂t∂WI∂t= DIFF × ∂2 WS∂x 2= DIFF × ∂2 WI∂x 2− v × ∂WS(∂x + CONV × A × 1 − WN )− ALPHAW × WS − DISW × BITR × WS × HICCWHN− v × ∂WI∂x(∂A∂t = BETA × WN × 1 − ACCAHI− ALPHAW × WI + DISW × BITR × WS ×HN)− ALPHAA × A − CONV × A∂HS= ALPHAH × HN − ALPHAH × HS − DISH × BITR × HS × WI∂tWN∂HI∂t∂HR= REMVH × HI − ALPHAH × HR∂t∂HD∂t= DISH × BITR × HS × WI − REMVH × HI − ALPHAH × HI − DEADH × HIWN= DEADH × HI (1)York in nor<strong>the</strong>ast USA in 1990 butquickly spread to <strong>the</strong> southwest ina matter of 3 to 4 years. The localscale of mosquito dispersion cannotaccount for this speed with which <strong>the</strong>disease spread. The spread was indeedenhanced by infected birds thatmay fly over large distances of tensof kilometers over a season or a fewmonths. This pattern of dispersionmay be conceptualized in Figure 1,in which uneven patches of distributions(squares) spreading over a rectangleof 3000 km by 4000 km are usedto represent WNE distribution on <strong>the</strong>scale of continental USA. Over a localpatch (a square) of <strong>the</strong> order ofa few km, <strong>the</strong> PDE for mosquito willbe solved by finite difference methodwith mesh size of <strong>the</strong> order of 0.1 km,each patch requiring a total numberof meshes of <strong>the</strong> order of 10 to 100thous<strong>and</strong>s nodes.For large system simulations, normalcomputing power offered by <strong>the</strong>regular PC would not be adequate.We <strong>the</strong>refore use <strong>the</strong> high power computingsuch as parallel or grid computingto provide <strong>the</strong> needed power.In this project, we develop grid technologyfor general applications towave front propagation simulations,with particular reference to vectorborne disease dispersal <strong>and</strong> transmission.The grid application developedhas been extended to tsunami wavepropagation <strong>and</strong> will be extended too<strong>the</strong>r areas.DEER MODELDengue <strong>and</strong> Encephalitis EradicationRoutines (DEER) is a numerical simulationmodel developed in this projectto simulate <strong>the</strong> dynamics of vectorborne diseases transmission. DEER isdeveloped to simulate <strong>the</strong> distributionof Aedes aegypti mosquito population<strong>and</strong> <strong>the</strong> dynamics of dengue diseasetransmission, which has only one epidemiologicalcycle linking <strong>the</strong> humanhosts <strong>and</strong> <strong>the</strong> vector mosquitoes.DEER is based upon a mass balanceequation for two phases of mosquitolifecycle: <strong>the</strong> winged female mosquito(represented by W) <strong>and</strong> <strong>the</strong> aquaticform (represented by A), which includeseggs, larvae <strong>and</strong> pupae. Fordisease transmission, <strong>the</strong> winged femalemosquitoes (W) are divided intotwo different classes, susceptible wing(WS) <strong>and</strong> infected wing (WI). Humansare ano<strong>the</strong>r state variables <strong>and</strong>it has 4 different sub-variables, susceptiblehuman (HS), infected human(HI), recovered human (HR) <strong>and</strong>dead human (HD). The compartmentmodel for dengue disease transmissionis illustrated in Figure 2. Theequations governing <strong>the</strong> spatial <strong>and</strong>temporal evolution of <strong>the</strong> disease inDEER are given in (1).Let <strong>the</strong> mortality rate of <strong>the</strong>mosquitoes <strong>and</strong> aquatic forms berespectively ALPHAW (day −1 ) <strong>and</strong>ALPHAA (day −1 ). The specific rateof development of <strong>the</strong> aquatic formsinto wing mosquitoes will be CONV(day −1 ). The rate of oviposition byfemale mosquito is BETA (day −1 ).We will consider <strong>the</strong> Aedes aegyptidispersal as <strong>the</strong> result of a ran-SUB-PROJECTS 41


<strong>Grid</strong>@USMFigure 3: Compartment model for West Nile EncephalitisFigure 4: Graphical user interface (GUI) for DEERdom flying movement representedby a diffusion process with coefficientDIFF (km 2 /day) <strong>and</strong> <strong>the</strong> constantvelocity flux of <strong>the</strong> wind advectionis v (km/day). The densityof wing mosquitoes <strong>and</strong> aquaticforms is limited by carrying capacityof wing mosquitoes CCW (mosquito)<strong>and</strong> aquatic forms CCA (mosquito).Wing female mosquitoes become infectedwhen <strong>the</strong>y bite infected humans.The transmission probabilityfrom infected human hosts tosusceptible vector mosquitoes is DIS.Similarly, humans get dengue virusinfections from <strong>the</strong> bite of infectedwing female mosquitoes. DISH is<strong>the</strong> transmission probability from infectedvector mosquitoes to susceptiblehuman hosts. REMVH is recoveryrate of human (day −1 ). DEADH is<strong>the</strong> mortality rate of human causedby <strong>the</strong> virus. ALPHAH is <strong>the</strong> naturalmortality rate <strong>and</strong> birth rate of human(day −1 ). BITR is biting rate ofmosquito (day −1 ). AHOST is numberof alternative hosts available asblood sources. HN is <strong>the</strong> total humanpopulation (HN = HS + HI + HR).DEER is fur<strong>the</strong>r enhanced to enable<strong>the</strong> simulation of <strong>the</strong> transmission ofo<strong>the</strong>r vector borne diseases with morethan one epidemiological cycle suchas WNE <strong>and</strong> JE. WNE disease transmissioninvolves ano<strong>the</strong>r mammal likebirds or horse in <strong>the</strong> epidemiologicalcycle linking human host <strong>and</strong>mosquito. To enable simulation ofWNE transmission, bird is included asa state variable in our model. Similarto <strong>the</strong> human state variable, it hasfour sub-variables such as susceptiblebirds (BS), infected birds (BI), recoveredbirds (BR) <strong>and</strong> dead birds (BD).Figure 3 shows <strong>the</strong> life cycle of WestNile Virus Disease that mainly circulatesbetween mosquitoes <strong>and</strong> birds.The virus is transmitted to human byinfected mosquito bites. Humans are<strong>the</strong> end host in <strong>the</strong> life cycle of <strong>the</strong>encephalitis virus.GRID APPLICATIONMPI comm<strong>and</strong>s are incorporated into<strong>the</strong> DEER algorithm so that some routinescan be performed in parallel. Wehave successfully run DEER using ourUSM Campus <strong>Grid</strong> with 22 nodes, <strong>the</strong>results of which are presented <strong>and</strong>demonstrated in PRAGMA 15 <strong>and</strong> 16.GRAPHICAL USER INTERFACEA web-based application will enableeasy access to job submission <strong>and</strong> resultsretrieval by users. The user cansubmit jobs using a web browser runningei<strong>the</strong>r from Windows, Linux orMac OS X operating systems. Webbasedapplication enables simple retrievalof results <strong>and</strong> reduces datastorage size. The Microsoft .NetFramework is a platform that providestools <strong>and</strong> technologies we needto build Networked Applications aswell as Distributed Web Services <strong>and</strong>Web Applications. For creating a userfriendlymodel for <strong>the</strong> web services,we develop <strong>the</strong> GUI for DEER modelin C#.NET. Figure 4 shows <strong>the</strong> simpleGUI panel for DEER.AN EXAMPLE APPLICATIONFor <strong>the</strong> purpose of presentation, astudy domain with 400 × 400 gridcells is used for <strong>the</strong> simulations. Thespatial <strong>and</strong> temporal distribution ofAedes aegypti mosquito population isshown in Figure 5. Model simulationbegins with a small concentrationof wing mosquito. The wingmosquitoes spread out to locationswith water containers, which providesites for <strong>the</strong> mosquitoes to lay eggs.A small number of mosquitoes arefrequently transported by vehicles too<strong>the</strong>r locations. Here, we investigate<strong>the</strong> effectiveness of insecticide sprayingon controlling <strong>the</strong> dispersal ofmosquitoes. We assume that insecticideis applied to kill <strong>the</strong> mosquitoesat a certain time. The amount of insecticidesapplied is <strong>the</strong>n gradually reducedto zero. Upon application of insecticide,<strong>the</strong> density of mosquito populationdecreases with <strong>the</strong>ir dispersalcontained. However, <strong>the</strong> density <strong>and</strong>dispersal of <strong>the</strong> mosquitoes eventuallyreturn to <strong>the</strong> pre-treatment level.Figure 6 shows <strong>the</strong> physical computationaltime consumed subject to <strong>the</strong>42 GRID APPLICATION TO WAVE FRONT PROPAGATION AND CONTAINMENT OF VECTOR BORNE DISEASES


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsFigure 5: Mosquito Population DistributionFigure 6: Computational time using different number of nodesFigure 7: Memory required for different research areaFigure 8: Dengue disease transmissionSUB-PROJECTS 43


<strong>Grid</strong>@USMTable 1: DEER computational time on a single PC <strong>and</strong> cluster with 10 <strong>and</strong> 20 nodesSegmentsComputational timeSingle PC 10 nodes 20 nodesCalculation 6 hrs 50 mins 40 minsPrinting 3 hrs 60 mins 20 minsTotal 9 hrs 1 hr 50 mins 1 hrvarious numbers of nodes used. Thereduction of time used becomes saturatedafter 12 nodes. This is because<strong>the</strong> execution speed of <strong>the</strong> programis limited by <strong>the</strong> system input <strong>and</strong>output latencies such as file access<strong>and</strong> message passing over network.Figure 7 shows <strong>the</strong> memory storageneeded for several sizes of study domain.The memory size needed fordata storage increases exponentiallyas <strong>the</strong> calculation domain increases.To examine <strong>the</strong> global-scale relationshipsamong climate, Aedes aegyptipopulations <strong>and</strong> transmissionof dengue disease, a large study domaintypically in terms of thous<strong>and</strong>sof kilometers is needed. This incursexcessive dem<strong>and</strong>s on <strong>the</strong> computationaltime <strong>and</strong> storage size as <strong>the</strong>computational grid size of <strong>the</strong> studydomain must be small enough to reflect<strong>the</strong> local distribution <strong>and</strong> flightreach of Aedes aegypti, which is onlya few hundred meters (Reiter et al.1995). Therefore, grid technology isused to speed up <strong>the</strong> computationaltime <strong>and</strong> to reduce storage size. Thisis achieved by dividing <strong>and</strong> allocatingsections of program computation toseveral computers.REGIONAL DENGUESIMULATION BY GRIDA global scale of dengue disease transmissionis considered in this studyto demonstrate <strong>the</strong> capability of gridcomputing in reducing <strong>the</strong> computationalcost of simulating global transmissionof dengue. Table 1 shows<strong>the</strong> computational time used to run aglobal case of dengue transmission fora single PC <strong>and</strong> clusters with 10 <strong>and</strong>20 nodes. A single simulation takesabout 9 hours to run on a single normalPC of Intel® Pentium® 1.60 GHz<strong>and</strong> 768 MB RAM. Using a clusterwith 20 nodes reduces <strong>the</strong> computationaltime to about half an hour;a total time reduction of more than80 %.DIRECTION OFFUTURE RESEARCHDEER is undergoing continuous enhancementsto upgrade its capabilityin simulating <strong>the</strong> dynamics of denguedisease transmission. Fur<strong>the</strong>r, DEER isundergoing revisions for applicationon <strong>the</strong> PRAGMA <strong>Grid</strong> network. Wehope that this project would stimulateactive research <strong>and</strong> collaborationin this region. Fur<strong>the</strong>r we have successfullyextended grid technology totsunami simulations, which typicallyrequire large computational resources,made available by <strong>Grid</strong> technology.PROJECT PUBLICATIONSKew, L. M., Teh, S. Y., & Koh, H. L.(2009). Optimization of TsunamiModel TUNA by <strong>Grid</strong> Technology.In Proceedings of <strong>the</strong> 5th Asian Ma<strong>the</strong>maticalConference (AMC). KualaLumpur, Malaysia.Koh, H. L., Lee, H. L., Teh, S. Y.,& Izani, A. (2009). Dengue <strong>and</strong>Tsunami Modeling: Application of<strong>Grid</strong> Technology. In Proceedingsof 2nd Regional Conference on Ecological<strong>and</strong> Environmental Modeling(ECOMOD 2007). Penang, Malaysia;pp. 22–28.Tan, K. B., Teh, S. Y., Koh, H. L.,Sui, L. L., Bahari, B., & Izani, A.(2009). Modeling West Nile Viruswith <strong>Grid</strong> Technology. In Proceedingsof <strong>the</strong> 16th Pacific-Rim ApplicationAnd <strong>Grid</strong> Middleware Assembly(PRAGMA 16). Daejeon, Korea.Teh, S. Y., Kew, L. M., & Koh, H.(2008). Application of <strong>Grid</strong> <strong>Computing</strong>in Modeling Tsunami <strong>and</strong>Dengue. In ICTP Advanced Schoolin High Performance <strong>and</strong> GRID <strong>Computing</strong>.Trieste, Italy.REFERENCESAtkinson, M. P., Su, Z., Alphey, N.,Alphey, L., Coleman, P. G., & Weln,L. M. (2007). Analyzing <strong>the</strong> Controlof Mosquito-borne Diseases bya Dominant Lethal Genetic System.Proceedings of <strong>the</strong> NationalAcademy of Sciences of <strong>the</strong> UnitedStates of America (PNAS), 104(22),pp. 9540–9545.Intergovernmental Panel for ClimateChange. (2001). Summary for policymakers climate change 2001: <strong>the</strong> scientificbasis. Cambridge UniversityPress.Reiter, P., Amador, M. A., Anderson,R. A., & Clark, G. G. (1995). ShortReport: Dispersal of Aedes aegyptiin Urban Area after Blood Feedingas Demonstrated by Rubidium-Marked Eggs. American Journal ofTropical Medicine <strong>and</strong> Hygiene, 52,pp. 177–179.Schaeffer, B., Mondet, B., & Touzeau,S. (2008). Using a Climate-Dependent Model to PredictMosquito Abundance: Applicationto Aedes (Stegomyia) africanus<strong>and</strong> Aedes (Diceromyia) furcifer(Diptera: Culicidae). Infection, Genetics<strong>and</strong> Evolution, 8, pp. 422–432.World Health Organization. (2001).Chiang Mai Declaration onDengue/Dengue HaemorrhagicFever. Wkly Epidemiol Rec,pp. 29–30.World Health Organization. (2000).World Health Report 2000 – HealthSystems: Improving Performance.World Health Organization.U@S<strong>Grid</strong>M44 GRID APPLICATION TO WAVE FRONT PROPAGATION AND CONTAINMENT OF VECTOR BORNE DISEASES


Image: Fireworks. Photograph by ©G Schouten de Jel (http://www.sxc.hu/profile/gabriel77).PROJECT ACTIVITIES


Image: A conference room. Photograph by ©Justyna Furmanczyk (http://www.sxc.hu/profile/just4you).ORGANISED EVENTSAPPLE CONTENT CREATION WORKSHOPThis workshop was jointly organised by <strong>the</strong> School of Computer Sciences (PPSK) <strong>and</strong> OpenSOSSdn. Bhd. on 26–28 February 2008 at PPSK. The objective of <strong>the</strong> workshop is to learn how technologyengages students in exploring <strong>the</strong>ir natural curiosity of <strong>the</strong> world while learning <strong>the</strong> core curriculum.Mr Ahmad Shaharuddin Amin b. Sahar, <strong>the</strong> facilitator, guided <strong>the</strong> participants from familiarising<strong>the</strong>mselves with a Mac to using applications in <strong>the</strong> iLife ’08 suite.INTRODUCTORY WORKSHOP TO GRID COMPUTINGAP Chan Huah Yong organised <strong>and</strong> facilitated this one-day workshop on 1–2 March, 2008 at PPSK,assisted by researchers from <strong>the</strong> <strong>Grid</strong> <strong>Computing</strong> Lab (GCL). All ROs <strong>and</strong> RAs from <strong>the</strong> sub-projectteams were invited to get a first taste of <strong>Grid</strong> computing technologies, especially from an end-userpoint perspective. The workshop included an introductory talk on <strong>Grid</strong> computing <strong>and</strong> h<strong>and</strong>s-onsessions on grid credentials, <strong>the</strong> Globus toolkit <strong>and</strong> <strong>the</strong> <strong>Grid</strong>Sphere portal framework.WORKSHOP ON GRID COMPUTINGSTRATEGIES & SECURITY ISSUESHeld in Pulau Perhentian on 2–4 July, 2008, thisworkshop focusing on grid security <strong>and</strong> monitoring.During a co-located session, ROs <strong>and</strong> RAsfrom <strong>the</strong> respective sub-project teams updatedeach o<strong>the</strong>r on <strong>the</strong>ir progress, <strong>and</strong> received trainingon <strong>the</strong> Trac Wiki system. We would later use Tracto post information <strong>and</strong> progress updates on <strong>the</strong>irrespective sub-projects in one centralized repository.46


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsTHE 2ND INTEL MULTICORE WORKSHOPThis event was co-organised with Intel <strong>and</strong> held in Bangunan Eureka on 22–25 July 2008. AP ChanHuah Yong hosted a workshop including a h<strong>and</strong>s-on session <strong>the</strong>re.PRAGMA INSTITUTE, PRAGMA 15 & DFMA 2008The PRAGMA 15 Workshop was held <strong>and</strong> hosted byUSM on 22–24 October, 2008, while PRAGMA Institutemet at <strong>the</strong> same venue earlier on <strong>the</strong> 21 & 22. Allsub-projects were selected for poster exhibition, whileAP Chan Huah Yong, AP Phua Kia Kien <strong>and</strong> Prof. KohHock Lye’s projects were chosen for showcase demonstrations.In addition, The International Conference on Distributed Frameworks & Applications2008 (DFmA 2008) was also held in conjunction on <strong>the</strong> 21 & 22. DFmA 2008 accepted 4 papers from<strong>Grid</strong>@USM members.MICROSOFT HIGH PERFORMANCECOMPUTING (HPC) WORKSHOPThis 3-day workshop was held at <strong>the</strong> School of ComputerSciences, USM on 8–10 April 2009, organised by AP ChanHuah Yong. He was also joined by Mr Ang Sin Keat <strong>and</strong> MsKheoh Hooi Leng.Participants were introduced to <strong>the</strong> infrastructure of WindowsHigh Performance <strong>Computing</strong> (HPC) Server 2008 infrastructure,as well as deploying <strong>and</strong> administrating a WindowsHPC Server 2008 <strong>Cluster</strong>. Topics on <strong>the</strong> second day includednetwork configuration, diagnostics, performance tuning <strong>and</strong>scheduling jobs. The third <strong>and</strong> final day saw <strong>the</strong> participantsdeveloping C/C++ MPI applications with Visual Studio <strong>and</strong>MPI.NET.U@S<strong>Grid</strong>MPROJECT ACTIVITIES 47


Image: a conference room. Photograph by ©Ann Petersen (http://www.sxc.hu/profile/Lilja).OTHER EVENTSPRAGMA 14AP Bahari Belaton was present at PRAGMA 14 heldin March 2008 in Taichung, Taiwan. He <strong>and</strong> DatoProf. Muhd. Idiris, <strong>the</strong>n USM’s Deputy Vice Chancellorfor Research <strong>and</strong> Innovation, were invitedas co-chairs <strong>and</strong> took <strong>the</strong> opportunity to promote<strong>and</strong> raise awareness about PRAGMA 15, which wasto be held in USM in October <strong>the</strong> same year.COURSE ON ENABLING GRIDS FORE-SCIENCEMr Aloysius Indrayanto <strong>and</strong> Ms Kheoh Hooi Lengattended this training course at MIMOS Berhad inKuala Lumpur on 10–14 December, 2007.PBS TRAINING AND GRIDING OFUNIVERSITIESMr Tan Choo Jun <strong>and</strong> Ms Tan Chin Min participatedin this training held at International IslamicUniversity Malaysia (IIUM) on 5–7 May 2008.ICTP ADVANCED SCHOOL IN HIGHPERFORMANCE AND GRIDCOMPUTINGDr Teh Su Yean attended this course at <strong>the</strong> AbdusSalam International Centre for Theoretical Physics(ICTP), Trieste, Italy on 3–14 Nov 2008.48


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> ApplicationsTHE 1ST KNOWLEDGE GRID MALAYSIA FORUM 2009This event was held on 18-19 March, 2009 at <strong>the</strong> IIUM. AP Chan Huah Yong was invitedto conduct a h<strong>and</strong>s-on session entitled “USM <strong>Grid</strong> (Globus)”. Three USM ROs i.e. MrAng Sin Keat, Ms Tan Kah Bee <strong>and</strong> Ms Sui Lee Lin also attended <strong>the</strong> two-day forum.SCOPE ICT CAREER FAIRThis event was held in conjunction with <strong>the</strong> SCoPE ICTExpo on 18 March 2009 at Suntech Penang Cybercity.<strong>Grid</strong>@USM members exhibited 6 posters about <strong>the</strong>irsub-projects at rented booths at <strong>the</strong> fair, including:❦ AP Chan Huah Yong• Virtual Distributed Database Management System• Mobile Desktop <strong>Grid</strong>• Campus <strong>Grid</strong> Portal❦ Dr Vincent Khoo Kay Teong• Enterprise <strong>Grid</strong> Architecture for CollaborativeB2B Information Interchange❦ Prof. Koh Hock Lye• In-house Tsunami Propagation Model TUNA-M2: Application of <strong>Grid</strong>• <strong>Grid</strong> Application: Modelling Mosquitoes DistributionROs attending <strong>the</strong> event include Mr (Mr) JohnsonFoong, Mr Kor Chan Hock, Mr Aloysius Indrayanto<strong>and</strong> Mr Tan Choo Jun.PRAGMA 16 & IDRICPRAGMA 16 was held on 23–26 March 2009 at DCC, Deajon, Korea, in conjunction with <strong>the</strong>International Workshop on Infectious Disease Researches in Cyberinfrastructure (IDRiC).4 posters from USM participated in PRAGMA 16:❦ Prof. Koh Hock Lye• Modeling Dengue Infection for Malaysia• Modelling West Nile Virus with <strong>Grid</strong> Technology❦ AP Chan Huah Yong – Virtual Distributed Database Management System❦ Mr Mohd. Azam Osman – Mobile Desktop <strong>Grid</strong>U@S<strong>Grid</strong>MPROJECT ACTIVITIES 49


Image: Books at a library. Photograph by ©Carlo Lazzeri (http://www.sxc.hu/profile/mvttley).PUBLICATIONSJOURNAL ARTICLESPhua, K. K. & Wong, V. C. (2007).Developing A Compelling 3DAnimation <strong>and</strong> Multimedia PresentationUsing Blender for IN-FORMM’s Opening Ceremony.Malaysian Journal of Medical Sciences,14(Supplement 1).Al-Mistarihi, H. H. E. & Chan, H. Y.(2009). On Fairness, OptimizingReplica Selection in Data <strong>Grid</strong>s.IEEE Transactions on Parallel &Distributed Systems. (forthcoming).Chan, H. Y. & Al-Mistarihi, H. H. E.(2008). Replica Management inData <strong>Grid</strong>. International Journalof Computer Science <strong>and</strong> NetworkSecurity.Chan, H. Y. & Al-Mistarihi, H. H.E. (2008). Replica OptimizationService in Data <strong>Grid</strong>s. SciencesPublications.Younis, M. I., Zamli, K. Z., Klaib, M.F. J., Che Soh, Z. H., Che Abdullah,S. A., & Mat Isa, N. A. (2009).Assessing IRPS as an EfficientPairwise Test Data GenerationStrategy. International Journal ofAdvanced Intelligence Paradigms.(forthcoming).Younis, M. I., Zamli, K. Z., & MatIsa, N. (2008). IRPS – An EfficientTest Data Generation Strategy forPairwise Testing. Lecture Notes inArtificial Intelligence, 5177.Zafri, M., Fadhilah, N. K., & Phua,K. K. (2008). Using Xgrid to ImproveFASTA Efficiency for Alignmentof Multiple DNA <strong>and</strong> ProteinSequences. Malaysian Journalof Medical Sciences, 15(Supplement1).CONFERENCE & WORKSHOPPROCEEDINGSZafri, M., Sanjay, K. C., & Phua, K.K. (2009). Implementation Methodsfor Estimating Haplotypeswith GRID <strong>Computing</strong> Technology.In Compendium of Abstractsfor <strong>the</strong> 14th National Conferenceon Medical <strong>and</strong> Health Sciences(NCMHS).Ahmed, M. & Talib, N. A. (2008).iNet-<strong>Grid</strong>: A Real-Time <strong>Grid</strong> Monitoring<strong>and</strong> Troubleshooting System.In Proceedings of <strong>the</strong> InternationalConference on DistributedFrameworks & Applications 2008(DFmA 2008). Penang, Malaysia;pp. 68–72.Indrayanto, A. & Chan, H. Y. (2008).Application of Game Theory<strong>and</strong> Fictitious Play in Data Placement.In Proceedings of <strong>the</strong> InternationalConference on DistributedFrameworks & Applications 2008(DFmA 2008). Penang, Malaysia;pp. 79–83.Kew, L. M., Teh, S. Y., & Koh, H. L.(2009). Optimization of TsunamiModel TUNA by <strong>Grid</strong> Technology.In Proceedings of <strong>the</strong> 5th AsianMa<strong>the</strong>matical Conference (AMC).Kuala Lumpur, Malaysia.Koh, H. L., Lee, H. L., Teh, S. Y., &Izani, A. (2009). Dengue <strong>and</strong>Tsunami Modeling: Applicationof <strong>Grid</strong> Technology. In Proceedingsof 2nd Regional Conference onEcological <strong>and</strong> Environmental Modeling(ECOMOD 2007). Penang,Malaysia; pp. 22–28.50


<strong>Grid</strong> <strong>Computing</strong> <strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services <strong>and</strong> Applications<strong>Lim</strong>, L. T. & Schwab, D. (2008).<strong>Lim</strong>its of Lexical Semantic Relatednesswith Ontology-basedConceptual Vectors. In Proceedingsof <strong>the</strong> 5th International Workshopon Natural Language Processing<strong>and</strong> Cognitive Science (NLPCS’08).Barcelona, Spain; pp. 153–158.Schwab, D. & <strong>Lim</strong>, L. T. (2008).Blexisma2: a Distributed AgentFramework for Constructing a SemanticLexical Database based onConceptual Vectors. In Proceedingsof <strong>the</strong> International Conference onDistributed Frameworks & Applications2008 (DFmA 2008). Penang,Malaysia; pp. 102–110.Tan, K. B., Teh, S. Y., Koh, H. L., Sui,L. L., Bahari, B., & Izani, A. (2009).Modeling West Nile Virus with<strong>Grid</strong> Technology. In Proceedingsof <strong>the</strong> 16th Pacific-Rim ApplicationAnd <strong>Grid</strong> Middleware Assembly(PRAGMA 16). Daejeon, Korea.Teh, S. Y., Kew, L. M., & Koh, H.(2008). Application of <strong>Grid</strong> <strong>Computing</strong>in Modeling Tsunami <strong>and</strong>Dengue. In ICTP Advanced Schoolin High Performance <strong>and</strong> GRID<strong>Computing</strong>. Trieste, Italy.Ting, T. T. & Khoo, V. K. T. (2009).B2B St<strong>and</strong>ardized Information InterchangeChallenges – A Studyon St<strong>and</strong>ardization versus Personalization.In Proceedings of<strong>the</strong> 2009 International Conferenceon Advanced Management Science(ICAMS 2009); pp. 244–248.Ting, T. T. & Khoo, V. K. T. (2008).Personalizable Information InterchangeArchitecture for EducationalInstitutions. In Proceedings of<strong>the</strong> 3rd International Conference one-Commerce with Focus on DevelopingCountries (ECDC’08). Isfahan,Iran.Younis, M. I., Zamli, K. Z., & MatIsa, N. A. (2008). A Strategyfor <strong>Grid</strong> Based t-Way Test DataGeneration. In Proceedings of <strong>the</strong>International Conference on DistributedFrameworks & Applications(DFmA 2008). Penang, Malaysia;pp. 73–78.THESESAhmed, M. (2009). A Real TimeDistributed Network Monitoring<strong>and</strong> Security Monitoring Platform(RTDNMS). PhD <strong>the</strong>sis, Schoolof Computer Sciences, UniversitiSains Malaysia.Al-Mistarihi, H. H. E. (2009). AData <strong>Grid</strong> Replica ManagementSystem with Local <strong>and</strong> GlobalMulti-Objective Optimization. PhD<strong>the</strong>sis, School of Computer Sciences,Universiti Sains Malaysia.Indrayanto, A. (2009). Initial File-Placement in Data <strong>Grid</strong> Environmentusing Game Theory <strong>and</strong>Fictitious Play. MSc <strong>the</strong>sis, Schoolof Computer Sciences, UniversitiSains Malaysia.BOOKSZamli, K. Z., Che Abdullah, S. A.,Younis, M. I., & Che Soh, Z. H.(2009). Software Testing Module.(forthcoming). OpenUniversityMalaysia <strong>and</strong> Pearson Publishing.U@S<strong>Grid</strong>MImage: A huge pile of books. Photograph by ©Sanja Gjenero(http://www.sxc.hu/profile/lusi).PROJECT ACTIVITIES 51


<strong>Grid</strong>@USM<strong>Grid</strong> computing can provide easy <strong>and</strong> transparent access tohigh performance computing by aggregating geographicallydistributed resources to act as a single powerful resource.With grid computing technologies, users can access to anyvirtual resources such as computers, special instruments, softwareapplications <strong>and</strong> databases in a seamless manner.<strong>Grid</strong> computing is by its nature highly distributed geographically,consists of highly specialised equipment (storage, gridengines <strong>and</strong> management tools), as well as expensive. It isas such an ideal example of common, core computational resourcesto be created <strong>and</strong> pooled among aspiring researcherswho require high performance resources. Equally important,Universiti Sains Malaysia (USM) as a research university (RU)needs a focused, holistic <strong>and</strong> dedicated effort to build a sustainablehuman capital in grid computing.It is with USM’s R&D needs <strong>and</strong> sustainability in mind that<strong>Grid</strong>@USM, a grid computing research cluster to address <strong>the</strong>problems mentioned above, came into being. <strong>Grid</strong>@USM’spilot project, led by AP Bahari Belaton <strong>and</strong> entitled <strong>Grid</strong> <strong>Computing</strong><strong>Cluster</strong> – <strong>the</strong> <strong>Development</strong> <strong>and</strong> Integration of <strong>Grid</strong> Services<strong>and</strong> Applications, was initiated under an RU research grant.This book records our achievements thus far, in <strong>the</strong> form oftechnical reports on 8 sub-projects. These sub-projects includeboth grid middleware <strong>and</strong> tool enablers, as well asapplications from various domains, spanning grid infrastructuresetup, grid monitoring, virtual distributed databases, 3Drendering, natural language processing, B2B st<strong>and</strong>ards componentmodeling, software testing <strong>and</strong> disease modeling.Platform for Information & Communication Technology ResearchUniversiti Sains Malaysia, 11800 USM, Penang, MalaysiaISBN 978-983-3986-58-39 789833 986583

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!