13.07.2015 Views

2 Credit Card Fraud Detection using Hidden Markov ... - Wiphala.net

2 Credit Card Fraud Detection using Hidden Markov ... - Wiphala.net

2 Credit Card Fraud Detection using Hidden Markov ... - Wiphala.net

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGBarclaycard, the largest credit card company in the UK, towards the end of the last century [2].Retailers like Wal-Mart typically handle much larger number of credit card transactionsincluding on-line and regular purchases. As the number of credit card users rises worldwide, theopportunities for attackers to steal credit card details and subsequently commit fraud are alsoincreasing. The total credit card fraud in the USA itself is reported to be $2.7 billion in 2005 andestimated to be $3.0 billion in 2006 out of which $1.6 billion and $1.7 billion, respectively, arethe estimates of on-line fraud [3].<strong>Credit</strong> card based purchases can be categorized into two types: (a) Physical card and (b)Virtual card. In a physical card based purchase, the cardholder presents his card physically to amerchant for making a payment. To carry out fraudulent transactions in this kind of purchase, anattacker has to steal the credit card. If the cardholder does not realize the loss of card, it can leadto a substantial financial loss to the credit card company. In the second kind of purchase, onlysome important information about a card (card number, expiry date, secure code) is required tomake the payment. Such purchases are normally done on the Inter<strong>net</strong> or over telephone. Tocommit fraud in these types of purchases, a fraudster simply needs to know the card details. Mostof the time, the genuine cardholder is not aware that someone else has seen or stolen his cardinformation. The only way to detect this kind of fraud is to analyze the spending patterns onevery card and to figure out any inconsistency with respect to the “usual” spending patterns.<strong>Fraud</strong> detection based on the analysis of existing purchase data of cardholder is a promising wayto reduce the rate of successful credit card frauds. Since humans tend to exhibit specificbehavioristic profiles, every cardholder can be represented by a set of patterns containinginformation about the typical purchase category, the time since the last purchase, the amount ofmoney spent, etc. Deviation from such patterns is a potential threat to the system.3


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGvariety of commercial databases. Kim and Kim have identified skewed distribution of data andmix of legitimate and fraudulent transactions as the two main reasons for the complexity ofcredit card fraud detection [9]. Based on this observation, they use fraud density of realtransaction data as a confidence value and generate the weighted fraud score to reduce thenumber of misdetections.Fan et al [10] suggest the application of distributed data mining in credit card frauddetection. Brause et al [11] have developed an approach that involves advanced data miningtechniques and neural <strong>net</strong>work algorithms to obtain a high fraud coverage. Chiu and Tsai [12]have proposed web services and data mining techniques to establish a collaborative scheme forfraud detection in the banking industry. With this scheme, participating banks share knowledgeabout the fraud patterns in a heterogeneous and distributed environment. To establish a smoothchannel of data exchange, web services techniques such as XML, SOAP and WSDL are used.Phua et al [13] have done an extensive survey of existing data mining based fraud detectionsystems and published a comprehensive report. Prodromidis and Stolfo [14] use an agent basedapproach with distributed learning for detecting frauds in credit card transactions. It is based onartificial intelligence and combines inductive learning algorithms and meta learning methods forachieving higher accuracy.Phua et al [15] suggest the use of meta classifier similar to [6] in fraud detectionproblems. They consider naïve Bayesian, C4.5 and Back Propagation neural <strong>net</strong>works as the baseclassifiers. A meta-classifier is used to determine which classifier should be considered based onskewness in data. Although they do not directly use credit card fraud detection as the targetapplication, their approach is quite generic. Vatsa et al [16] have recently proposed a gametheoreticapproach to credit card fraud detection. They model the interaction between an attacker5


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGand a fraud detection system as a multi-stage game between two players, each trying tomaximize his payoff.The problem with most of the above-mentioned approaches is that they require labeleddata for both genuine as well as fraudulent transactions to train the classifiers. Getting real worldfraud data is one of the biggest problems associated with credit card fraud detection. Also, theseapproaches cannot detect new kinds of frauds for which labeled data is not available. In contrast,we present a <strong>Hidden</strong> <strong>Markov</strong> Model (HMM) based credit card fraud detection system, whichdoes not require fraud signatures and yet, is able to detect frauds by considering a cardholder'sspending habit. We model credit card transaction processing sequence by the stochastic processof an HMM. The details of items purchased in individual transactions are usually not known to afraud detection system running at the bank that issues credit cards to the cardholders. This can berepresented as the underlying finite <strong>Markov</strong> chain which is not observable. The transactions canonly be observed through the other stochastic process that produces the sequence of the amountof money spent in each transaction. Hence, we feel that HMM is an ideal choice for addressingthis problem. Another important advantage of the HMM-based approach is a drastic reduction inthe number of false positives – transactions identified as malicious by an FDS although they areactually genuine. Since the number of genuine transactions is a few orders of magnitude higherthan the number of malicious transactions, an FDS should be designed in such a way that thenumber of false positives is as low as possible. Otherwise, due to the “base rate fallacy” effect[17], bank administrators may tend to ignore the alarms. To the best of our knowledge, there isno other published literature on the application of HMM for credit card fraud detection.The rest of the paper is organized as follows. In Section 3, we first briefly explain theworking principle of an HMM. We then show how to model credit card transaction processing6


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGattacker is not expected to have a behavior similar to the genuine user. Hence, an alarm is raisedin case of any deviation.An HMM can be characterized by the following [18]:1. N, the number of states in the model. We denote the set of states S = {S 1 , S 2 ,…S N }, whereS i , i =1, 2,…, N is an individual state. The state at time instant t is denoted by q t .2. M, the number of distinct observation symbols per state. The observation symbolscorrespond to the physical output of the system being modeled. We denote the set ofsymbols V = {V 1 , V 2 , ….V M }, where V i , i = 1, 2,..., M is an individual symbol.3. The state transition probability matrix A = [a ij ], wherea ij = P(q t+1 = S j | q t = S i ), 1 i N, 1 j N; t = 1, 2,…. (1)For the general case where any state j can be reached from any other state i in a singlestep, we have a ij > 0 for all i, j. Also,Nj=1aij= 1,1 ≤ i ≤ N4. The observation symbol probability matrix B = [b j (k)], whereb j (k) = P(V k | S j ), 1 j N, 1 k M andMb( k)= 1, 1 ≤k = 1jj ≤ N(2)5. The initial state probability vector = [ i ] whereN i = P(q 1 = S i ), 1 i N, such that π = 1(3)i=16. The observation sequence O = O 1 , O 2 , O 3 ,...O R , where each observation O t is one of thesymbols from V and R is the number of observations in the sequence.iIt is evident that a complete specification of an HMM requires the estimation of two modelparameters, N and M, and three probability distributions A, B and . We use the notation = (A,8


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGB, ) to indicate the complete set of parameters of the model, where A, B implicitly include N andM.An observation sequence O as mentioned above, can be generated by many possible statesequences. Consider one such particular sequence:Q = q 1 , q 2 , …, q R (4)where q 1 is the initial state.The probability that O is generated from this state sequence is given by:RP(O|Q, λ) = ∏t=1P(O t |q t , λ) (5)where statistical independence of observations is assumed. Eq. 5 can be expanded as:P(O|Q, λ) = b q 1(O 1 ).b q 2(O 2 )…b q R(O R ) (6)The probability of the state sequence Q is given as:P(Q|λ) = π q 1a q 1 q a2 q 2 3q …a qR−1q (7)RThus, the probability of generation of the observation sequence O by the <strong>Hidden</strong> <strong>Markov</strong> Modelspecified by λ can be written as follows:P(O|λ) =all QP(O|Q, λ) P (Q| λ) (8)Deriving the value of P(O| λ) <strong>using</strong> the direct definition of Eq. 8 is computationally intensive.Hence, a procedure named as Forward-Backward procedure [18] is used to compute P(O| λ).4. USE OF HMM FOR CREDIT CARD FRAUD DETECTIONA fraud detection system runs at a credit card issuing bank. Each incoming transaction issubmitted to the FDS for verification. FDS receives the card details and the value of purchase toverify whether the transaction is genuine or not. The types of goods that are bought in that9


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGtransaction are not known to the FDS. It tries to find any anomaly in the transaction based on thespending profile of the cardholder, shipping address and billing address, etc. If the FDS confirmsthe transaction to be malicious, it raises an alarm and the issuing bank declines the transaction.The concerned cardholder may then be contacted and alerted about the possibility that the card iscompromised. In this section we explain how HMM can be used for credit card fraud detection.A set of notations and acronyms used in the paper is given in Table I.Table I. Notations and AcronymsNotation MeaningMNumber of observation symbolsNNumber of hidden statesV k k=1..M Observation symbolsl, m, h Price ranges – low, medium and higha x-yProbability of transition from the <strong>Hidden</strong><strong>Markov</strong> Model state x to state yKNumber of clustersc iCentroid of cluster iRSequence lengthαProbability of acceptance of a sequence by<strong>Hidden</strong> <strong>Markov</strong> ModelAcronym Expanded FormFDS<strong>Fraud</strong> <strong>Detection</strong> SystemHMM <strong>Hidden</strong> <strong>Markov</strong> ModelSPSpending Profilehs, ms, ls High spending, Medium spending, Lowspending groupTP, FP True Positive, False Positive4.1. HMM Model for <strong>Credit</strong> <strong>Card</strong> Transaction ProcessingTo map the credit card transaction processing operation in terms of an HMM, we start by firstdeciding the observation symbols in our model. We quantize the purchase values x into M priceranges V 1 , V 2 ,...V M forming the observation symbols at the issuing bank. The actual price rangefor each symbol is configurable based on the spending habit of individual cardholders. These10


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGprice ranges can be determined dynamically by applying a clustering algorithm on the values ofeach cardholder's transactions as shown in the next sub-section. We use V k , k =1, 2,....M torepresent both the observation symbol as well as the corresponding price range.In this work, we consider only three price ranges, namely, low (l), medium (m) and high(h). Our set of observation symbols is, therefore, V = {l, m, h} making M = 3. For example, let l= (0, $100], m = ($100, $500] and h = ($500, credit card limit]. If a cardholder performs atransaction of $190, then the corresponding observation symbol is m.A credit cardholder makes different kinds of purchases of different amounts over a periodof time. One possibility is to consider the sequence of transaction amounts and look fordeviations in them. However, the sequence of types of purchase is more stable compared to thesequence of transaction amounts. The reason is that, a cardholder makes purchases depending onhis need for procuring different types of items over a period of time. This, in turn, generates asequence of transaction amounts. Each individual transaction amount usually depends on thecorresponding type of purchase. Hence, we consider the transition in the type of purchase as statetransition in our model. The type of each purchase is linked to the line of business of thecorresponding merchant. This information about the merchant’s line of business is not known tothe issuing bank running the FDS. Thus, the type of purchase of the cardholder is hidden fromthe FDS. The set of all possible types of purchase and equivalently, the set of all possible lines ofbusiness of merchants forms the set of hidden states of the HMM. It should be noted at this stagethat the line of business of the merchant is known to the acquiring bank, since this information isfurnished at the time of registration of a merchant. Also, some merchants may be dealing invarious types of commodities (For example, Wal Mart, K-Mart or Target sells tens of thousandsof different items). Such types of line of business are considered as Miscellaneous and we do not11


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGattempt to determine the actual types of items purchased in these transactions. Any assumptionabout availability of this information with the issuing bank and hence with the FDS is notpractical and, therefore, would not have been valid. In the results section, we show the effect ofchoice of the number of states on the system performance.After deciding the state and symbol representations, the next step is to determine theprobability matrices A, B and so that representation of the HMM is complete. These threemodel parameters are determined in a training phase <strong>using</strong> the Baum-Welch algorithm [18]. Theinitial choice of parameters affects the performance of this algorithm and hence, they should bechosen carefully.We consider the special case of fully connected HMM in which every state of the modelcan be reached in a single step from every other state as shown in Figure 1. Gr, El, Mi, etc. arenames given to the states to denote purchase types like Groceries, Electronic items andMiscellaneous purchases. Spending profiles of the individual cardholders are used to obtain aninitial estimate for probability matrix B of Eq. 2. In the following sub-section, we explain how todetermine observation symbols dynamically from a cardholder's transactions.Error!12


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGhb gr-hmb gr-mlma gr-mib mi-hha gr-grGrb gr-lb el-mEla gr-ela mi-eia ei-gra mi-gra ei-eia ei-miMia mi-miFigure 1. HMM for <strong>Credit</strong> <strong>Card</strong> <strong>Fraud</strong> <strong>Detection</strong>4.2. Dynamic Generation of Observation SymbolsFor each cardholder, we train and maintain an HMM. To find the observation symbolscorresponding to individual cardholder's transactions dynamically, we run a clustering algorithmon his past transactions. Normally, the transactions that are stored in the issuing bank's databasecontain many attributes. For our work, we consider only the amount that the cardholder spent inhis transactions. While various clustering techniques could be used, we use K-means clusteringalgorithm [24] to determine the clusters. K-means is an unsupervised learning algorithm forgrouping a given set of data based on the similarity in their attribute (often called feature) values.Each group formed in the process is called a cluster. The number of clusters K is fixed a priori.The grouping is performed by minimizing the sum of squares of distances between each datapoint and the centroid of the cluster to which it belongs.13


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGIn our work, K is the same as the number of observation symbols M. Let c 1 , c 2 , ….c M bethe centroids of the generated clusters. These centroids or mean values are used to decide theobservation symbols when a new transaction comes in. Let x be the amount spent by thecardholder u in transaction T. FDS generates the observation symbol for x (denoted by O x ) asfollows:Ox= (9)V arg min |x−c i |iAs mentioned before, the number of symbols is 3 in our system. Considering M = 3, if weexecute K-means algorithm on the example transactions of Table II, we get the clusters as shownin Table III with c l , c m and c h as the respective centroids. It may be noted that the dollar amounts5, 10 and 10 have been clustered together as c l resulting in a centroid of 8.3. The percentage (p)of total number of transactions in this cluster is thus 30%. Similarly, dollar amounts 15, 15, 20,25 and 25 have been grouped in the cluster c m with centroid 20 while amounts 40 and 80 havebeen grouped together in cluster c h . c m and c h , thus, contain 50% and 20% of the total number oftransactions. When the FDS receives a transaction T for this cardholder, it measures the distanceof the purchase amount x with respect to the means c l , c m and c h , to decide (<strong>using</strong> Eq. 9) thecluster to which T belongs and hence, the corresponding observation symbol. As an example, if x= $10, then from Table III <strong>using</strong> Eq. 9, the observation symbol is V 1 = l.Table II. Example Transactions with the Dollar Amount spent in each TransactionTransaction no. 1 2 3 4 5 6 7 8 9 10Dollar Amount 40 25 15 5 10 25 15 20 10 80Table III. Output of K-means Clustering AlgorithmCluster mean/centroid name c l c m c hObservation symbol V 1 = l V 2 = m V 3 = hMean value (Centroid) 8.3 20 60Percentage of total transactions (p) 30 50 2014


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTING4.3. Spending Profile of <strong>Card</strong>holdersThe spending profile of a cardholder suggests his normal spending behavior. <strong>Card</strong>holders can bebroadly categorized into three groups based on their spending habits, namely, high spending (hs)group, medium spending (ms) group and low spending (ls) group. <strong>Card</strong>holders who belong to thehigh spending group, normally use their credit cards for buying high-priced items. Similardefinition applies to the other two categories also.Spending profiles of cardholders are determined at the end of the clustering step. Let p ibe the percentage of total number of transactions of the cardholder that belong to cluster withmean c i . Then, the spending profile (SP) of the cardholder u is determined as follows:SP (u)= (10)arg max (p i )iThus, spending profile denotes the cluster number to which most of the transactions of thecardholder belong. In the example of Table III, the spending profile of the cardholder is 2, i.e. mand hence the cardholder belongs to the medium spending group.4.4. Model Parameter Estimation and TrainingWe use Baum-Welch algorithm to estimate the HMM parameters for each cardholder. Thealgorithm starts with an initial estimate of HMM parameters A, B and and converges to thenearest local maximum of the likelihood function. Initial state probability distribution isconsidered to be uniform, i.e., if there are N states, then the initial probability of each state is 1/N.Initial guess of transition and observation probability distributions can also be considered to beuniform. However, to make the initial guess of observation symbol probabilities more accurate,spending profile of the cardholder as determined in Section 4.3 is taken into account. We make15


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGthree sets of initial probability for observation symbol generation for three spending groups - ls,ms and hs. Based on the cardholder’s spending profile, we choose the corresponding set of initialobservation probabilities. The initial estimate of symbol generation probabilities <strong>using</strong> thismethod leads to accurate learning of the model. Since there is no a priori knowledge about thestate transition probabilities, we consider the initial guesses to be uniform. In case of acollaborative work between an acquiring bank and an issuing bank, we can have better initialguess about state transition probabilities as well.We now start training the <strong>Hidden</strong> <strong>Markov</strong> Model. The training algorithm has thefollowing steps (1) Initialization of HMM parameters, (2) Forward procedure and (3) Backwardprocedure. Details of these steps can be found in [18]. For training the HMM, we convert thecardholder’s transaction amount into observation symbols and form sequences out of them. Atthe end of the training phase, we get an HMM corresponding to each cardholder. Since this stepis done off-line, it does not affect the credit card transaction processing performance, whichneeds on-line response.4.5. <strong>Fraud</strong> <strong>Detection</strong>After the HMM parameters are learnt, we take the symbols from a cardholder's training data andform an initial sequence of symbols. Let O 1 , O 2 , …O R be one such sequence of length R. Thisrecorded sequence is formed from the cardholder's transactions up to time t. We input thissequence to the HMM and compute the probability of acceptance by the HMM. Let theprobability be 1 , which can be written as follows: 1 = P(O 1 , O 2 , O 3 ,…O R | ) (11)16


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGLet O R+1 be the symbol generated by a new transaction at time t + 1. To form another sequenceof length R, we drop O 1 and append O R+1 in that sequence, generating O 2 , O 3 ,..O R , O R+1 as thenew sequence. We input this new sequence to the HMM and calculate the probability ofacceptance by the HMM. Let the new probability be 2 . 2 = P(O 2 , O 3 , O 4 ,… O R+1 | ) (12)Let = 1 - 2 (13)If > 0, it means that the new sequence is accepted by the HMM with low probability and itcould be a fraud. The newly added transaction is determined to be fraudulent if the percentagechange in the probability is above a threshold, i.e., / 1 Threshold (14)The threshold value can be learnt empirically as will be discussed in the next section. If O R+1 ismalicious, the issuing bank does not approve the transaction and the FDS discards the symbol.Otherwise, O R+1 is added in the sequence permanently and the new sequence is used as the basesequence for determining the validity of the next transaction. The reason for including new nonmalicioussymbols in the sequence is to capture the changing spending behavior of a cardholder.Figure 2 shows the complete process flow of the proposed FDS. As shown in the figure, the FDSis divided into two parts - one is the training module and the other is detection. Training phase isperformed off-line while detection is an on-line process.17


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGCreateclustersIdentifyspending profileof cardholderChoose initial setof probabilitiesbased on thespending profileConstruct sequencesfrom training dataConstruct andtrain model1. Training2. <strong>Detection</strong>TransactionGenerate observationsymbol O R+1Add O R+1 inexisting sequenceto form newsequenceAccept bothold and newsequenceCalculateAdd O R+1 symbol tothe existing sequenceAlarmNormalAnomalyTestFigure 2. Process Flow of the Proposed FDS5. RESULTSTesting credit card fraud detection systems <strong>using</strong> real data set is a difficult task. Banks do not, ingeneral, agree to share their data with researchers. There is also no benchmark data set availablefor experimentation. We have, therefore, performed large-scale simulation studies to test theefficacy of the system. A simulator is used to generate a mix of genuine and fraudulenttransactions. The number of fraudulent transactions in a given length of mixed transactions isnormally distributed with a user specified μ (mean) and σ (standard deviation), takingcardholder's spending behavior into account. μ specifies the average number of fraudulenttransactions in a given transaction mix. In a typical scenario, an issuing bank, and hence its FDS,receives a large number of genuine transactions sparingly intermixed with fraudulenttransactions. The genuine transactions are generated according to the cardholders’ profiles. Thecardholders are classified into three categories as mentioned before - the low, medium and high18


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGspending groups. We have studied the effects of spending group and the percentage oftransactions that belong to the low, medium and high price range clusters. We use standardmetrics - True Positive (TP) and False Positive (FP) as well as TP-FP spread and Accuracymetrics as proposed in [7] to measure the effectiveness of the system. TP represents the fractionof fraudulent transactions correctly identified as fraudulent while FP is the fraction of genui<strong>net</strong>ransactions identified as fraudulent. Most of the design choices for a fraud detection system thatresult in higher values of TP, also cause FP to increase. To meaningfully capture theperformance of such a system, the difference between TP and FP, often called the TP-FP spread,is used as a metric. Accuracy represents the fraction of total number of transactions (bothgenuine and fraudulent) that have been detected correctly. It can be expressed as follows:Accuracy =No. of good trans.detected as good + No.of bad trans.detected as badTotal No.of transactions(15)We first carried out a set of experiments to determine the correct combination of HMMdesign parameters, namely, the number of states, the sequence length and the threshold value.Once these parameters were decided, we performed comparative study with another frauddetection system.5.1. Choice of Design ParametersSince there are three parameters in an HMM, we need to vary one at a time keeping the other twofixed, thus generating a large number of possible combinations. For choosing the designparameters, we generate transaction sequences <strong>using</strong> 95% low value, 3% medium value and 2%high value transactions. The reason for <strong>using</strong> this mix is that it represents a profile that stronglyresembles a low spending customer profile. We also consider the μ and σ values to be 1.0 and0.5, respectively. This is chosen so that on an average there will be 1 fraudulent transaction in19


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGany incoming sequence with some scope for variation. After the parameter values are fixed, wewill see in the next sub-section, how the system performs as we vary the profile and the mix offraudulent transactions.For parameter selection, the sequence length is varied from 5 to 25 in steps of 5. Thethreshold values considered are 30%, 50%, 70% and 90%. The number of states is varied from 5to 10 in steps of 1. We consider both TP and FP for deciding the optimum parameter values.Thus, there are a total of 120 (5×4×6) possible combinations of parameters. The number ofsimulation runs required for obtaining results with a given confidence interval was derived asfollows [25, 26].An initial set of 5 simulation runs, each with 100 samples, was carried out to estimate themean and standard deviation of both TP and FP for a fixed sequence length, number of states andthreshold value. Mean TP was found to be an order of magnitude higher than mean FP. Standarddeviation of TP was 0.1 and that for FP was 0.005. We set the target 95% Confidence Interval(CI) for TP and FP respectively as +/- 2.5% and +/- 0.25% around their mean values. UsingStudent’s t-distribution, the minimum number of simulation runs required for obtaining desiredCI for TP was derived as 83 and that for FP as 23. Based on these observations, we set thenumber of simulation runs for all the experiments to be 100. The results obtained were within thedesired confidence interval as mentioned above.Since it is not convenient to present the detailed results for each of the 120 combinations,we show summarized results. In Table IV, we show results for each value of sequence lengthaveraged over all the 6 states. Similarly, we present results for each value of the number of statesaveraged over all the 5 sequence lengths in Table V.20


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGTable IV. Variation of TP and FP with different sequence lengthsThresholdTP averaged over all the 6 states fordifferent sequence lengthsFP averaged over all the 6 states fordifferent sequence lengths(%) 5 10 15 20 25 5 10 15 20 2530 0.52 0.56 0.64 0.58 0.6 0.05 0.05 0.05 0.05 0.0550 0.54 0.54 0.63 0.57 0.6 0.03 0.05 0.04 0.05 0.0570 0.50 0.60 0.60 0.61 0.59 0.04 0.04 0.05 0.05 0.0590 0.42 0.52 0.59 0.58 0.57 0.02 0.04 0.05 00.05 0.05Table V. Variation of TP and FP with different number of statesThresholdTP averaged over all the 5 sequencelengths for different no. of statesFP averaged over all the 5 sequencelengths for different no. of states(%) 5 6 7 8 9 10 5 6 7 8 9 1030 0.56 0.59 0.55 0.60 0.61 0.55 0.06 0.04 0.05 0.05 0.04 0.0450 0.57 0.60 0.52 0.56 0.59 0.59 0.05 0.04 0.05 0.05 0.05 0.0470 0.57 0.59 0.56 0.58 0.56 0.62 0.04 0.05 0.04 0.05 0.04 0.0590 0.56 0.51 0.60 0.53 0.49 0.55 0.03 0.04 0.03 0.04 0.04 0.04In both Tables IV and V, the highest value of TP as well as the lowest value of FP has beenhighlighted for each row. It is seen from the two tables that FP shows a clear trend of decreasingwith higher threshold and smaller sequence lengths. However, the number of states does not havea strong influence either on TP or on FP. From Table IV it is seen that TP is high for sequencelength 15 in 75% of the cases. Also, fraud detection time increases linearly with the sequence21


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGlength as shown in Figure 3. The results have been plotted for a Java implementation on a 1.8GHz Pentium IV processor machine. Hence, we choose 15 as the length of observation symbolsequence for optimum performance. Once sequence length is decided, it is seen from Table IVthat the threshold could be set to either 30% or 50%. While TP is higher for threshold=30%, FPis also higher. To minimize FP, we choose threshold=50%. After choosing sequence length andthreshold, we have to choose the number of states. Since there is no clear indication from theabove summary information as presented in Tables IV and V, we take a look at the detailed datafor TP when threshold = 50% as shown in Table VI below.Table VI. Detailed result of TP for threshold = 50%Sequence LengthNo. of States 5 10 15 20 255 0.53 0.54 0.53 0.67 0.596 0.50 0.55 0.67 0.43 0.567 0.54 0.42 0.67 0.43 0.568 0.58 0.46 0.54 0.60 0.639 0.50 0.61 0.60 0.63 0.6310 0.57 0.65 0.75 0.49 0.5It is seen that for sequence length = 15, the highest value of TP occurs for no. of states = 10 andhence, it would be a good choice for our design.22


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTING<strong>Detection</strong> Time (Sec.)1.81.61.41.21.00.80.60.40.20.00 20 40 60 80 100Sequence LengthFigure 3. <strong>Detection</strong> Time Vs. Length of SequenceWe have also analyzed the time taken by the training phase, which is performed off-line for eachcardholder's HMM. Figure 4 shows the plot of model learning time against the number ofsequences in the training data. As the size of training data increases, learning time increases,especially beyond 100. We, therefore, use 100 sequences for training the HMM. Although doneoff-line, the model learning time has a strong impact on the scalability of the system. Since anHMM is trained for each cardholder, it is imperative that the training time is kept as low aspossible especially when an issuing bank is meant to handle millions of cardholders with manynew cards being issued everyday. The on-line processing time of about 200 ms on a 1.8 GHzPentium IV machine also shows that the system will be able to handle a large number ofconcurrent operations and hence, is scalable.Thus, our design parameter setting is as follows: (i) Number of hidden states N = 10 (ii)Length of observation sequence R = 15 (iii) Threshold value = 50% (iv) Number of sequencesfor training = 100. With this design parameter setting, we next proceed to study the performanceof the system under various combinations of input data.23


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGLearning Time (Sec.)1614121086420100 500 1000 2000 3000 4000 5000 10000No. of Sequences in Training DataFigure 4. Model Learning Time Vs. Number of Sequences in Training Data5.2. Comparative PerformanceIn this sub-section, we show performance of the proposed system as we vary the number offraudulent transactions and also the spending profile of the cardholder. Our design parametersetting is as obtained in the previous sub-section. We compare performance of our approach(denoted by OA below) with the credit card fraud detection technique proposed by Stolfo et al[6] (denoted by ST below). For comparison, we consider the metrics TP and FP as well as TP-FPand Accuracy [7].We carried out experiments by varying both the transaction amount mix as well as thenumber of fraudulent transactions intermixed with a sequence of genuine transactions.Transaction amount mix is captured by the cardholder’s profile. We consider four profiles. Oneof them is the mixed profile, which means that spending profile is not considered at all by ourapproach as explained in Section 4.3. The other profiles considered are (55 35 10), (70 20 10),and (95 3 2). Here (a bc) profile represents a low spending profile cardholder who has beenfound to carry out a% of his transactions in the low, b% in medium and c% in the high range.Thus, our attempt is to see how the system performs in the presence of different mixes of24


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGtransaction amount ranges in the transactions. It may be noted that for cardholders in the othertwo groups, namely, high spending and medium spending, will show similar performance as onlythe relative ordering of a, b and c will change. We also vary the mean value μ of malicioustransactions from 0.5 to 4.0 in steps of 0.5. The σ value is kept fixed at 0.5 for all theexperiments. Thus, every sequence of transaction that we use for testing is a mixed sequencecontaining both genuine as well as malicious transactions. For each combination of spendingprofile and malicious transaction distribution, we carried out 100 runs and report the averageresult. The same set of data was used to determine the performance of both OA and ST.Figure 5(a) shows the variation of TP and FP for the two approaches <strong>using</strong> the spendingprofile (95 3 2). Variation of TP-FP and Accuracy is shown in Figure 5(b).80TP_OA TP_ST FP_OA FP_STTP and FP (%)60402000.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0Malicious Transaction Distribution Mean5(a)25


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGTP - FP Spread andAccuracy (%)100806040200SP_OA SP_ST AC_OA AC_ST0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0Malicious Transaction Distribution Mean5(b)Figure 5. Performance variation of the two systems (OA and ST) with Mean (μ) of MaliciousTransaction Distribution for the Spending Profile (95 3 2) (a) TP and FP (b) TP-FP spread (SP)and Accuracy (AC)It is seen from the figures that TP of the proposed approach is very close to that of Stolfo et al’sapproach. Also, both the approaches have almost similar values of FP. As a result, the twosystems have comparable accuracies and average TP-FP spread. Further, the two approachesexhibit similar trend with variation in μ. Next, we show how the two systems behave as we mixthe transaction amounts. Percentages of low value, medium value and high value transactions arechanged from (95 3 2) as shown above to (70 20 10) in Figures 6(a)-(b) and (55 35 10) in Figures7(a)-(b).80TP_OA TP_ST FP_OA FP_STTP and FP (%)60402000.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0Malicious Transaction Distribution Mean6(a)26


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGTP - FP Spread andAccuracy (%)806040200SP_OA SP_ST AC_OA AC_ST0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0Malicious Transaction Distribution Mean6(b)Figure 6. Performance variation of the two systems (OA and ST) with Mean (μ) of MaliciousTransaction Distribution for the Spending Profile (70 20 10) (a) TP and FP (b) TP-FP spread (SP)and Accuracy (AC)80TP_OA TP_ST FP_OA FP_STTP and FP (%)60402000.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0Malicious Transaction Distribution Mean7(a)27


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGTP - FP Spread andAccuracy (%)100806040200SP_OA SP_ST AC_OA AC_ST0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0Malicious Transaction Distribution Mean7(b)Figure 7. Performance variation of the two systems (OA and ST) with Mean (μ) of MaliciousTransaction Distribution for the Spending Profile (55 35 10) (a) TP and FP (b) TP-FP spread (SP)and Accuracy (AC)There are a number of interesting observations from the above two sets of figures. The firstobservation is that, TP falls and FP rises for both the approaches, as transactions no longerremain strictly low spending in nature. However, the FP rate of ST rises sharply although the TPrate does not degrade to a great extent. On the other hand, for our approach, FP rate remains lowwhile there is a graceful degradation of the TP rate. As a result, the Accuracy of the proposedsystem remains close to 80% for all the above settings. Accuracy of Stolfo et al’s approach fallsdrastically to about 60% for (55 35 10). Thus, our approach has 15-20% better Accuracy. WhileTP-FP values are close for the profile (70 20 10), there is more than 15% difference in TP-FP forthe profile (55 35 10) with our approach performing better.In Figures 8(a)-(b) we show the performance of the two approaches when the profile ismixed, which means that all the three ranges of transactions are equi-probable. It is seen that theFP of the ST method has gone up sharply. In fact, FP is even higher than TP. As a result, themean accuracy has dropped below 40%. Our approach shows a fall in TP but the FP has notdegraded a lot. As a result, the Accuracy of our system still remains around 80%. The TP-FP28


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGvalue for the ST approach has become negative even for μ = 0.5. For our approach, the TP-FPvalue became negative only after μ = 2.5.80TP_OA TP_ST FP_OA FP_STTP and FP (%)60402000.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0Malicious Transaction Distribution Mean8(a)TP - FP Spread andAccuracy (%)100806040200-200.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0-40SP_OA SP_ST AC_OA AC_STMalicious Transaction Distribution Mean8(b)Figure 8. Performance variation of the two systems (OA and ST) with Mean (μ) of MaliciousTransaction Distribution for mixed profile (a) TP and FP (b) TP-FP spread (SP) and Accuracy(AC)From the above results, it can be concluded that the proposed system has an overall Accuracy of80% even under large input condition variations, which is much higher than the overall Accuracyof the method proposed by Stolfo et al [6]. Our system can, therefore, correctly detect most ofthe transactions. However, when there is no profile information at all, the system shows some29


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGperformance degradation in terms of TP-FP. This observation highlights the importance ofprofile selection as explained in Section 4. Also, when there is little difference between genui<strong>net</strong>ransactions and malicious transactions, most of the credit card fraud detection systems sufferperformance degradation, either due to a fall in the number of true positives or a rise in thenumber of false positives.6. CONCLUSIONS AND DISCUSSIONSIn this paper, we have proposed an application of <strong>Hidden</strong> <strong>Markov</strong> Model in credit card frauddetection. The different steps in credit card transaction processing are represented as theunderlying stochastic process of an HMM. We have used the ranges of transaction amount as theobservation symbols, while the types of item have been considered to be states of the HMM. Wehave suggested a method for finding the spending profile of cardholders as well as application ofthis knowledge in deciding the value of observation symbols and initial estimate of the modelparameters. It has also been explained how the HMM can detect whether an incomingtransaction is fraudulent or not. Experimental results show the performance and effectiveness ofour system and demonstrate the usefulness of learning the spending profile of the cardholders.Comparative studies reveal that the Accuracy of the system is close to 80% over a wide variationin the input data. The system is also scalable for handling large volumes of transactions.ACKNOWLEDGMENTSWe are thankful to the anonymous reviewers for their constructive and useful comments. This work ispartially supported by a research grant from the Department of Information Technology, Ministry ofCommunication and Information Technology, Government of India, under Grant No. 12(34)/04-IRSDdated 07/12/2004.30


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTINGREFERENCES1. http://www2.acnielsen.com/reports/documents/2005_cc_onlineshopping.pdf, Global ConsumerAttitude Towards On-line Shopping, 07 March 2007.2. D.J. Hand, G. Blunt, M.G. Kelly, and N.M. Adams, “Data Mining for Fun and Profit,” StatisticalScience, vol. 15, no. 2, pp. 111–131, 2000.3. http://www.epaynews.com/statistics/fraud.html, Statistics for General and On-line <strong>Card</strong> <strong>Fraud</strong>, 07March 2007.4. S. Ghosh and D.L. Reilly, “<strong>Credit</strong> <strong>Card</strong> <strong>Fraud</strong> <strong>Detection</strong> with a Neural-Network,” Proc. 27 th HawaiiInternational Conference on System Sciences: Information Systems: Decision Support andKnowledge-Based Systems, vol. 3, pp. 621-630, 1994.5. M. Syeda, Y. Q. Zhang, and Y. Pan, “Parallel Granular Networks for Fast <strong>Credit</strong> <strong>Card</strong> <strong>Fraud</strong><strong>Detection</strong>,” Proc. IEEE International Conference on Fuzzy Systems, pp. 572-577, 2002.6. S.J. Stolfo, D.W. Fan, W. Lee, A.L. Prodromidis, and P.K. Chan, “<strong>Credit</strong> <strong>Card</strong> <strong>Fraud</strong> <strong>Detection</strong> <strong>using</strong>Meta-Learning: Issues and Initial Results,” Proc. AAAI Workshop on AI Methods in <strong>Fraud</strong> and RiskManagement, pp. 83-90, 1997.7. S.J. Stolfo, D.W. Fan, W. Lee, A. Prodromidis, and P.K. Chan, “Cost-based Modeling for <strong>Fraud</strong> andIntrusion <strong>Detection</strong>: Results from the JAM Project,” Proc. DARPA Information SurvivabilityConference and Exposition, vol. 2, pp. 130-144, 2000.8. E. Aleskerov, B. Freisleben, and B. Rao, “CARDWATCH: A Neural Network Based DatabaseMining System for <strong>Credit</strong> <strong>Card</strong> <strong>Fraud</strong> <strong>Detection</strong>,” Proc. IEEE/IAFE: Computational Intelligence forFinancial Engineering, pp. 220-226, 1997.9. M.J. Kim and T.S. Kim, “A Neural Classifier with <strong>Fraud</strong> Density Map for Effective <strong>Credit</strong> <strong>Card</strong><strong>Fraud</strong> <strong>Detection</strong>,” Proc. International Conference on Intelligent Data Engineering and AutomatedLearning, Lecture Notes in Computer Science, Springer Verlag, no. 2412, pp. 378-383, 2002.31


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTING10. W. Fan, A.L. Prodromidis, and S.J. Stolfo, “Distributed Data Mining in <strong>Credit</strong> <strong>Card</strong> <strong>Fraud</strong><strong>Detection</strong>,” IEEE Intelligent Systems, vol. 14, no. 6, pp. 67-74, 1999.11. R. Brause, T. Langsdorf, and M. Hepp, “Neural Data Mining for <strong>Credit</strong> <strong>Card</strong> <strong>Fraud</strong> <strong>Detection</strong>,” Proc.IEEE International Conference on Tools with Artificial Intelligence, pp. 103-106, 1999.12. C. Chiu and C. Tsai, “A Web Services-based Collaborative Scheme for <strong>Credit</strong> <strong>Card</strong> <strong>Fraud</strong> <strong>Detection</strong>,”Proc. IEEE International Conference on e-Technology, e-Commerce and e-Service, pp. 177-181,2004.13. C. Phua, V. Lee, K. Smith, and R. Gayler, “A Comprehensive Survey of Data Mining-based <strong>Fraud</strong><strong>Detection</strong> Research,” available on-line at http://www.bsys.monash.edu.au/people/cphua/, 07 March2007.14. S. Stolfo and A.L. Prodromidis, “Agent-based Distributed Learning applied to <strong>Fraud</strong> <strong>Detection</strong>,”Technical Report, CUCS-014-99, Columbia University, USA, 1999.15. C. Phua D. Alahakoon, and V. Lee, “Minority Report in <strong>Fraud</strong> <strong>Detection</strong>: Classification of SkewedData,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 50-59, 2004.16. V. Vatsa, S. Sural, and A.K. Majumdar, “A Game-theoretic Approach to <strong>Credit</strong> <strong>Card</strong> <strong>Fraud</strong><strong>Detection</strong>,” Proc. 1 st International Conference on Information Systems Security, Lecture Notes inComputer Science, Springer Verlag, pp. 263-276, 2005.17. S. Axelsson, “The Base-Rate Fallacy and the Difficulty of Intrusion <strong>Detection</strong>,” ACM Transactionson Information and System Security, vol. 3, no. 3, pp. 186-205, 2000.18. L.R. Rabiner, “A Tutorial on <strong>Hidden</strong> <strong>Markov</strong> Models and Selected Applications in SpeechRecognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.19. S.S. Joshi and V.V. Phoha, “Investigating <strong>Hidden</strong> <strong>Markov</strong> Models Capabilities in Anomaly<strong>Detection</strong>,” Proc. of the 43 rd ACM Annual Southeast Regional Conference, vol. 1, pp. 98–103, 2005.20. S.B. Cho and H.J. Park, “Efficient Anomaly <strong>Detection</strong> by Modeling Privilege Flows <strong>using</strong> <strong>Hidden</strong><strong>Markov</strong> Model,” Computer and Security, vol. 22, no. 1, pp. 45-55, 2003.32


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON DEPEDABLE AND SECURE COMPUTING21. D. Ourston, S. Matzner, W. Stump, and B. Hopkins, “Applications of <strong>Hidden</strong> <strong>Markov</strong> Models toDetecting Multi-stage Network Attacks,” Proc. 36 th Annual Hawaii International Conference onSystem Sciences, vol. 9, pp. 334-344, 2003.22. X.D. Hoang, J. Hu, and P. Bertok, “A Multi-layer Model for Anomaly Intrusion <strong>Detection</strong> <strong>using</strong>Program Sequences of System Calls,” Proc. 11 th IEEE International Conference on Networks, pp.531-536, 2003.23. T. Lane, “<strong>Hidden</strong> <strong>Markov</strong> Models for Human/Computer Interface Modeling,” Proc. InternationalJoint Conference on Artificial Intelligence, Workshop on Learning about Users, pp. 35-44, 1999.24. L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis,Wiley Series in Probability and Mathematical Statistics, 1990.25. K. S. Trivedi, Probability and Statistics with Reliability, Queuing, and Computer ScienceApplications, Second Edition, John Wiley and Sons, New York, 2001.26. J. Banks, J. S. Carson II, B. L. Nelson and D. M. Nicol, Discrete-Event System Simulation, FourthEdition, Prentice Hall, 200433

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!