12.07.2015 Views

Association Rules for Predicting Customer Lifetime Value in Retail ...

Association Rules for Predicting Customer Lifetime Value in Retail ...

Association Rules for Predicting Customer Lifetime Value in Retail ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

International Journal of Computer and In<strong>for</strong>mation Technology (ISSN: 2279 – 0764)Volume 02– Issue 02, March 2013<strong>Association</strong> <strong>Rules</strong> <strong>for</strong> <strong>Predict<strong>in</strong>g</strong> <strong>Customer</strong> <strong>Lifetime</strong><strong>Value</strong> <strong>in</strong> <strong>Retail</strong> Bank<strong>in</strong>g Context Based on RDB-MINER AlgorithmZhou X<strong>in</strong>Department of Economic Eng<strong>in</strong>eer<strong>in</strong>gKyushu UniversityFukuoka, JapanAbstract—Data m<strong>in</strong><strong>in</strong>g methodology has a tremendouscontribution <strong>for</strong> extract<strong>in</strong>g the hidden knowledge and patternsfrom the exist<strong>in</strong>g databases. Traditionally, researchers use basketdata to m<strong>in</strong>e association rules of which the basic task is to f<strong>in</strong>dthe frequent items. For relational databases whose data <strong>for</strong>mat isrelational data other than basket data, RDB-MINER algorithmwas proposed. In this paper, we <strong>in</strong>troduce an improved RDB-MINER algorithm and apply it to m<strong>in</strong>e association rules <strong>in</strong> retailbank<strong>in</strong>g relational databases. When we assess the customerlifetime value, RFM model is adopted. Moreover, we propose amethod to f<strong>in</strong>d the association rules between customers’attributes and their lifetime value, these patterns are significant<strong>for</strong> predict<strong>in</strong>g their future value.Keywords- <strong>Association</strong> <strong>Rules</strong>; <strong>Customer</strong> <strong>Lifetime</strong> <strong>Value</strong>; RelationalDatabase; RDB-MINER Algorithm; RFM Model; <strong>Retail</strong> Bank<strong>in</strong>g.I. INTRODUCTIONOver the past two decades, there have been numerous researcheson m<strong>in</strong><strong>in</strong>g frequent itemsets from precise data such as databases ofmarket basket transactions. The discovery of association rules isbased on frequent itemset m<strong>in</strong><strong>in</strong>g. <strong>Association</strong> rules m<strong>in</strong><strong>in</strong>g [1] is apopular and well researched method <strong>for</strong> discover<strong>in</strong>g <strong>in</strong>terest<strong>in</strong>grelations between variables <strong>in</strong> large databases. We can use it todiscover regularities between products <strong>in</strong> large scale transaction datarecorded by POS system <strong>in</strong> supermarkets. For example, the rule{onions, potatoes} {beef} found <strong>in</strong> the sales data means if acustomer buys onions and potatoes together, he is likely to also buybeef. Such rules can be used as the basis <strong>for</strong> decisions aboutmarket<strong>in</strong>g activities, e.g., promotional pric<strong>in</strong>g, product placement ornew product <strong>in</strong>novation. Except <strong>for</strong> the above application <strong>in</strong>market<strong>in</strong>g, association rules are employed today <strong>in</strong> many areas<strong>in</strong>clud<strong>in</strong>g bio<strong>in</strong><strong>for</strong>matics, f<strong>in</strong>ance and economics data analysis, web<strong>in</strong><strong>for</strong>mation m<strong>in</strong><strong>in</strong>g, etc. Apriori algorithm is the first and best-knownalgorithm to m<strong>in</strong>e association rules, it uses a breadth-first searchstrategy to count the support of itemsets and uses a candidategeneration function which exploits the downward closure property ofsupport.II.BACKGROUNDA. Def<strong>in</strong>itionLet D = {t 1 ,t 2 ,…,t n } be a set of transactions <strong>in</strong> databases.Let I = { i 1 ,i 2 ,…,i m } be a set of m attributes called items. Eachtransaction <strong>in</strong> D has a unique ID, and conta<strong>in</strong>s a subset ofitems <strong>in</strong> I. An association rule is def<strong>in</strong>ed as XY whereX, Y I and X Y= .B. ConceptsTo select useful rules from all the possible rules, supportand confidence are used as the two m<strong>in</strong>imum thresholds.The support sup(X) of an itemset X is the proportion oftransactions <strong>in</strong> the data set which conta<strong>in</strong>s X. For example, theitemset {onions, potatoes} has a support of 0.2 if theoccurrence frequency is two out of ten <strong>in</strong> all transactions.The confidence is def<strong>in</strong>ed as con(X Y) = sup(X Y)/sup(X). For example, if the support <strong>for</strong> occurrences oftransactions where {onions, potatoes} and {beef} both appearis 0.1, the rule {onions, potatoes} { beef} has a confidenceof 0.1/0.2= 0.5.C. Basket dataBasket data is represented as a set of records where eachrecord conta<strong>in</strong>s a unique ID and a set of bought items. Most ofmarket data are represented <strong>in</strong> basket data <strong>for</strong>mat. Table1shows an example of basket data.www.ijcit.com 210


International Journal of Computer and In<strong>for</strong>mation Technology (ISSN: 2279 – 0764)Volume 02– Issue 02, March 2013MINER algorithm, the time of comparisons <strong>in</strong> memoryshall be less than the <strong>for</strong>mer one. So it has betterper<strong>for</strong>mance on time efficiency and space efficiency.IV.APPLICATIONIn this section, we show an application <strong>in</strong> which theimproved RDB-MINER algorithm can be applied.A. RFM model <strong>in</strong> retail bank<strong>in</strong>g contextS<strong>in</strong>ce the <strong>in</strong>creased importance is placed on customerequity <strong>in</strong> today’s bus<strong>in</strong>ess environment, many companiespay more attention to the notion of customer lifetime valueand their future profitability to <strong>in</strong>crease market share. TheRFM (Recency, Frequency and Monetary) model [16] is usedto assess the customer lifetime value. Recency is the lastpurchase date <strong>in</strong> a particular period, Frequency is thenumber of purchases <strong>in</strong> a particular period, Monetary is thevalue of purchases <strong>in</strong> a particular period. In retail bank<strong>in</strong>gcontext [17] , suppose that the time w<strong>in</strong>dow period is 3months, and the def<strong>in</strong>itions of RFM are described asfollows:Recency is the <strong>in</strong>terval between the date of last purchaseand the first day of last 3 months.Frequency is the number of days which occur at least onetransaction dur<strong>in</strong>g last 3 months.Monetary is daily average amount of money <strong>in</strong> all thecustomer’s deposits dur<strong>in</strong>g last 3 months.Based on Delphi Experts Grad<strong>in</strong>g Method, the relativeweights of the RFM variables W R , W F and W M can beobta<strong>in</strong>ed. W R +W F + W M =1. Use the normalization methodof statistics, we can obta<strong>in</strong> the normalized R, F , M calledNR , NF , NM respectively. Then, we can use <strong>for</strong>mula (1) tocompute the customer lifetime value(CLV).CLV = NR* W R + NF* W F +NM*W M (1)Because { NR , NF , NM } [0,1] and W R +W F + W M=1, CLV[0,1]. We can divide the range [0,1] <strong>in</strong>to 10sections with a sub range of 0.1, and assess the rank ofCLV(CLVR) us<strong>in</strong>g one of a series of consecutive numbers<strong>in</strong>creased by one 1,2,3,…,10 which represents the CLVRfrom low level to high level. The results of calculated CLV<strong>for</strong> different customers or different segments of customerscan be used to improve market<strong>in</strong>g and strategies <strong>in</strong> the retailbank<strong>in</strong>g.B. <strong>Association</strong> Rule M<strong>in</strong><strong>in</strong>gSuppose a relation customers <strong>in</strong> retail bank is shown <strong>in</strong>table 5, the relative values of W R , W F , W M are 0.08, 0.32and 0.6 respectively. CLV and CLVR are generated from<strong>for</strong>mula (1). Recency, frequency and monetary arenormalized value.Then, we can use SQL to implement the improvedRDB-MINER algorithm to f<strong>in</strong>d association rules.The <strong>in</strong>puts are:relation: customers ;exclude_set : a set of attributes {CID, name, age, sex, city,recency, frequency, monetary, CLV};m<strong>in</strong>_supp : 0.2;m<strong>in</strong>_conf : 0.4;target_attribute : CLVR.From the function Compute_N , we get N=3. Then weobta<strong>in</strong> the powerset A of attributes, = {job, education,<strong>in</strong>stitution}. In a loop, we get all the ISIs: <strong>for</strong> card<strong>in</strong>ality 1,they are {{job},{education},{<strong>in</strong>stitution}}; <strong>for</strong> card<strong>in</strong>ality 2,they are {{job, education},{job, <strong>in</strong>stitution}, {education,<strong>in</strong>stitution}}; <strong>for</strong> card<strong>in</strong>ality 3, it is only one item, {job,education, <strong>in</strong>stitution}. For each ISI, us<strong>in</strong>g fuctionAdd_to_ISI_set , we f<strong>in</strong>d the ISEs which satisfies them<strong>in</strong>_supp. Then, <strong>in</strong> a loop <strong>for</strong> itemsets <strong>in</strong> the ISI set, wef<strong>in</strong>d all the ISEs which simultaneously satisfy m<strong>in</strong>_supp andm<strong>in</strong>_conf. F<strong>in</strong>ally, we can summarize all the associationrules.Table 5. <strong>Customer</strong>sCID name age sex job education <strong>in</strong>stitution city recency frequency MonetaryCLVCLVR1 Li P<strong>in</strong> 30 male manager Phd Company CityA 0.1 0.011 0.4 0.252 32 Li Li 22 female student master College CityB 0.1 0.022 0.001 0.016 13 Zhu Qi 45 male sales bachelor Insurance CityC 0.5 0.011 0.6 0.404 5… … … … … … … … … … … … …www.ijcit.com 213


International Journal of Computer and In<strong>for</strong>mation Technology (ISSN: 2279 – 0764)Volume 02– Issue 02, March 2013V. CONCLUSIONThis paper provides with several contributions.Firstly, we have proposed an improved RDB-MINERalgorithm which considers m<strong>in</strong>_supp and m<strong>in</strong>_conf <strong>in</strong> theprocess of f<strong>in</strong>d<strong>in</strong>g ISIs and correspond<strong>in</strong>g ISEs, and give itsadvantages over RDB-MINER algorithm. Secondly, weadopt this algorithm to f<strong>in</strong>d the association rules <strong>in</strong> customervalue management. The obta<strong>in</strong>ed application results canhelp us to classify the customers, predict specific customer’sfuture value and launch some significant commercialactivities to promote their lifetime value. The differencebetween exist<strong>in</strong>g RDB-MINER algorithm and the improvedone is that the <strong>for</strong>mer considers support and confidence aftergett<strong>in</strong>g all the ISIs, the latter considers support andconfidence <strong>in</strong> the process of algorithm, the latter has betterper<strong>for</strong>mance <strong>in</strong> time complexity, when the volume of data isvery big, the per<strong>for</strong>mance becomes important, so choos<strong>in</strong>g asuitable algorithm is very crucial.REFERENCES[1] R. Agrawal and R. Srikant: Fast Algorithms <strong>for</strong> M<strong>in</strong><strong>in</strong>g <strong>Association</strong><strong>Rules</strong>. Proceed<strong>in</strong>gs of the International Conf. on Very LargeDatabases (VLDB’94), pp. 487--499. Santiago, Chile , 1994[2] R. Elmasri, and S. Navathe: Fundamentals of Database Systems, FifthEdition, Addison-Wesley , 2007[3] Abdallah Alashqur: RDB-MINER: A SQL-Based Algorithm <strong>for</strong>M<strong>in</strong><strong>in</strong>g True Relational Databases. Journal of Software, vol. 5, no. 9,pp. 998—1005. 2010 Academy Publisher , 2010[4] Abdallah Alashqur: Us<strong>in</strong>g a Lattice Intension Structure to FacilitateUser-Guided <strong>Association</strong> Rule M<strong>in</strong><strong>in</strong>g. Computer and In<strong>for</strong>mationScience, vol. 5, no. 2, pp. 11--21. Canadian Center of Science andEducation, Toronto , 2012[5] R. Agrawal and R. Srikant, “Fast Algorithms <strong>for</strong> M<strong>in</strong><strong>in</strong>g <strong>Association</strong><strong>Rules</strong>,” Proceed<strong>in</strong>gs of the International Conf. on Very LargeDatabases (VLDB’94), 1994, pages 487-499, Santiago, Chile.[6] J. Wang, J. Han, Y. Lu, and P. Tzvetkov. “TFP: An effcient algorithm<strong>for</strong> m<strong>in</strong><strong>in</strong>g top-k frequent closed itemsets.” IEEE Trans. Knowledgeand Data Eng<strong>in</strong>eer<strong>in</strong>g, 2005, 17:652-664.[7] D. X<strong>in</strong>, J. Han, X. Yan, and H. Cheng. “M<strong>in</strong><strong>in</strong>g compressed frequentpatternsets.” In Proc. 2005 Int.Conf. Very Large Data Bases(VLDB'05), pages 709-720, Trondheim, Norway, Aug. 2005.[8] B. Rozenberg, E. Gudes, “<strong>Association</strong> rules m<strong>in</strong><strong>in</strong>g <strong>in</strong> verticallypartitioned databases,” Data and Knowledge Eng<strong>in</strong>eer<strong>in</strong>g 59(2) 2006.Pages: 378–396.[9] Mohammad-Hosse<strong>in</strong> Nadimi-Shahraki, Norwati Mustapha, Md NasirSulaiman, Ali B. Mamat, “A New Algorithm <strong>for</strong> Discovery MaximalFrequent Itemsets,”International Conference on Data M<strong>in</strong><strong>in</strong>g (DMIN)2008:309-312[10] B. Lucchese, S. Orlando, R. Perego, “Fast and memory efficientm<strong>in</strong><strong>in</strong>g of frequent closed itemsets,” IEEE Transactions onKnowledge and Data Eng<strong>in</strong>eer<strong>in</strong>g 18 (1), 2006, 21–36.[11] H. Moonest<strong>in</strong>ghe, S. Fodeh, P.N. Tan, “Frequent closed itemsetm<strong>in</strong><strong>in</strong>g us<strong>in</strong>g prefix graphs with an efficient flowbased prun<strong>in</strong>gstrategy,” Proceed<strong>in</strong>gs of the 6 th International Conference on DataM<strong>in</strong><strong>in</strong>g, Hong Kong,2006, pp. 426–435.[12] Lisheng Ma, Yi Qi, “An Efficient Algorithm <strong>for</strong> Frequent ClosedItemsets M<strong>in</strong><strong>in</strong>g,” International Conference on Computer Science andSoftware Eng<strong>in</strong>eer<strong>in</strong>g (CSSE)2008:259-262[13] G. Liu, H. Lu, Y. Xu, and J. X. Yu, “Ascend<strong>in</strong>g Frequency OrderedPrefix-tree: Efficient M<strong>in</strong><strong>in</strong>g of Frequent Patterns,” Proc. 2003 Int.Conf. on Database Systems <strong>for</strong> Advanced Applications (DASFAA03),Kyoto, Japan, March, 2003.[14] J.Han, J. Pei, and Y. Y<strong>in</strong>, “M<strong>in</strong><strong>in</strong>g frequent patterns withoutcandidate generation” In Proc. 2000 ACMSIGMOD Int. Conf.Management of Data (SIGMOD'00), pages 1-12, Dallas, TX, May2000.[15] K. Gouda, and M. Zaki, “GenMax: An Efficient Algorithm <strong>for</strong>M<strong>in</strong><strong>in</strong>g Maximal Frequent Itemsets,” Data M<strong>in</strong><strong>in</strong>g and KnowledgeDiscovery: An International Journal, 2005, 11(3): 223-242.[16] Journal, 2005, 11(3): 223-242.Mahboubeh Khajvand, KiyanaZolfaghar, Sarah Ashoori, Somayeh Alizadeh: Estimat<strong>in</strong>g customerlifetime value based on RFM analysis of customer purchase behavior:Case study. Procedia CS (PROCEDIA), vol. 3, pp. 57--63. ElsevierLtd, 2011 .[17] Mahboubeh Khajvand, Mohammad Jafar Tarokh: Estimat<strong>in</strong>gcustomer future value of different customer segments based onadapted RFM model <strong>in</strong> retail bank<strong>in</strong>g context. Procedia CS(PROCEDIA), vol. 3, pp. 1327--1332. Elsevier Ltd, 2011.www.ijcit.com 214

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!