21.06.2013 Views

JAM: Java agents for Meta-Learning over Distributed Databases


Method II: Learn a model using PF fields and hold locally. Again, we assume that the data at Bank A include some additional PF fields that Bank B's data lack. In this approach, Bank A can learn two local models. One, with the PF fields, is stored locally for later use by the meta-learning agents, while the second one, without these fields, is exchanged.

Learning a second classifier without the PF fields, or better yet, with fields that belong to the intersection of the fields of the datasets of the two banks, implies that the second classifier makes use only of the attributes that are common among the participating sites, and no issue exists for its integration at other banks. On the other hand, remote classifiers imported by Bank A (and assured not to involve predictions over the PF fields) can still be locally integrated with the original model that employs the PF fields. In this case, the remote classifiers simply ignore the PF fields of the local dataset.

Both approaches address the data schema integration problem, and meta-learning over these models should proceed in a straightforward manner.

5.1 Description of the learning process

In this section, we describe the setting of our experiments. In particular, we split the original dataset provided by one bank into random partitions and distributed them across the different sites of the JAM network. Then we computed the accuracy of each model obtained at each such partition.

To be more specific, we sampled 84,000 records from the total of 500,000 records of the dataset we used in our experiments, and kept them for the Validation and Test sets to evaluate the accuracy of the resultant distributed models. The learning task is to identify patterns in the 30 attribute fields that can characterize the fraudulent class label.

Let's assume, without loss of generality, that we apply the ID3 learning process to two sites of data (say sites 1 and 2), while two instances of Ripper are applied elsewhere (say at sites 3 and 4), all being initiated as Java agents. The result of these four local computations are four separate classifiers, C_ID3i(), i = 1, 2, and C_Ripperj(), j = 3, 4, that are each invocable as agents at arbitrary sites of credit card transaction data.
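The setup described above (random partitions spread over several sites, one locally learned classifier per site, and a held-out set for measuring accuracy) can be sketched roughly as follows. This is only an illustration under loud assumptions: the data are synthetic with a single made-up `amount` attribute, the learner is a toy threshold rule standing in for both ID3 and Ripper, and every function name here is hypothetical rather than part of JAM.

```python
import random

def partition(records, n_sites, rng):
    """Shuffle the records and deal them into n_sites random, disjoint partitions."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [shuffled[i::n_sites] for i in range(n_sites)]

def learn_threshold_rule(train):
    """Toy learner (NOT ID3 or Ripper): pick the amount threshold that best fits train."""
    best_t, best_acc = None, -1.0
    for t in sorted({amount for amount, _ in train}):
        acc = sum(int(amount >= t) == label for amount, label in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda amount: int(amount >= best_t)

def accuracy(classifier, records):
    """Fraction of records whose predicted label matches the true label."""
    return sum(classifier(amount) == label for amount, label in records) / len(records)

rng = random.Random(0)
# Synthetic stand-in data: (amount, fraud_label). The real data set has
# 30 confidential attribute fields; this single attribute is an assumption.
data = [(amount, int(amount >= 500)) for amount in range(1000)]
rng.shuffle(data)
held_out, training = data[:200], data[200:]

# One random partition and one locally learned classifier per site.
site_data = partition(training, 4, rng)
classifiers = [learn_threshold_rule(part) for part in site_data]

local_accuracy = [accuracy(c, part) for c, part in zip(classifiers, site_data)]
held_out_accuracy = [accuracy(c, held_out) for c in classifiers]
```

Each entry of `classifiers` plays the role of one locally computed model; comparing `held_out_accuracy` across sites mirrors the per-partition accuracy measurement described in the text.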

A sample Ripper Rule-Based Classifier learned from the credit card data set is depicted in Figure 5, a relatively small set of rules that is easily communicated among distributed sites as needed (see footnote 5). To extract fraud data from a distinct fifth site of data, or any other site, using, say, C_Ripper3(), the code implementing this classifier would be transmitted to the fifth site and invoked remotely to extract data. This can be accomplished, for example, using a query of the form:

    Select X.* From Credit-card-data Where C_Ripper3(X.fraudlabel) = 1.
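One way to picture a classifier being invoked from inside a query is to register it as a user-defined SQL function. The sketch below uses Python's standard `sqlite3` module purely for illustration; it is not how JAM ships or runs its agents. The table layout, the `amount` and `hour` columns, and the two rules inside `c_ripper3` are all invented placeholders, since the real attribute names and rules are confidential.

```python
import sqlite3

def c_ripper3(amount, hour):
    """Hypothetical stand-in for a learned rule set; returns 1 for 'fraud'."""
    if amount > 1000 and hour < 6:
        return 1
    if amount > 5000:
        return 1
    return 0

conn = sqlite3.connect(":memory:")
# Assumed schema: the hyphenated name Credit-card-data is written with
# underscores here so that it needs no quoting.
conn.execute(
    "CREATE TABLE credit_card_data (id INTEGER PRIMARY KEY, amount REAL, hour INTEGER)"
)
conn.executemany(
    "INSERT INTO credit_card_data (id, amount, hour) VALUES (?, ?, ?)",
    [(1, 40.0, 14), (2, 1500.0, 3), (3, 7200.0, 12), (4, 900.0, 2)],
)

# Make the classifier callable from SQL as a two-argument function.
conn.create_function("C_Ripper3", 2, c_ripper3)

flagged = conn.execute(
    "SELECT X.* FROM credit_card_data AS X "
    "WHERE C_Ripper3(X.amount, X.hour) = 1 ORDER BY X.id"
).fetchall()
print(flagged)
```

Only the rows that the stand-in rules label as fraud come back from the query, so the site holding the data never has to export the rest of its transactions.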

Naturally, the select expression, rendered here in SQL, can instead be implemented directly as a data filter applied against incoming transactions at a server site in a fraud detection system.

Footnote 5: The specific confidential attribute names are not revealed here.

The end result of this query is a stream of data accessed from some remote source based entirely upon the classifications learned at site 3. Notice that requesting transactions classified as "not fraud" would result in no information being returned at all (rather than
