19.08.2016

Social Network Analysis Approaches for Fraud Analytics


Dr. Archisman Majumdar

Senior Manager, Mphasis NEXTlabs

NEXT labs



Introduction

The impact of fraud on organizations is becoming increasingly costly. Every year, financial institutions lose millions of dollars in revenue to systematic fraud. New technologies and forms of payment, together with the growing sophistication of fraud, complicate the challenges organizations face in creating effective fraud detection strategies. Many existing techniques rely solely on business rules developed by experts. These rules require a great deal of user input and need to be constantly updated.

However, the ability to link multiple data sources, analyze large volumes of data, and apply newer algorithms to transactions gives organizations an opportunity to capture, and sometimes predict, fraud more efficiently. More recent analytics-based approaches to fraud detection include descriptive and predictive analytics, machine learning, and social network analysis methods. This paper discusses some of the key approaches and developments in the use of social network analysis for fraud detection and prevention.

Fraud and Fraud Analytics

Figure 1 Fraud Analytics System (components: transactions, external sources, and social data; pattern-based fraud detection with a train - predict - feedback loop; expert user feedback; reports, scores, and alerts)

Fraud is “an uncommon, imperceptibly concealed, time-evolving and often carefully organized crime which appears in many types of forms” (Baesens, Van Vlasselaer, and Verbeke 2015). The definition highlights some of the key elements of fraud and also points to some of the major identification methods.

The classical approach to fraud identification relies on the creation of explicit rules (IF-THEN-ELSEIF-…) based on the recommendations of experts. These rules are developed and modified through the experts' collective field experience. Over time, however, the dynamic and sophisticated nature of fraud makes the rules complex and difficult to maintain and implement, unless they are updated very regularly. The approach is also very labor-intensive, requiring human intervention at every stage of evaluation, identification, and monitoring.

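The rule-based approach described above can be sketched as a short chain of expert rules evaluated in order, mirroring the IF-THEN-ELSEIF structure. This is a minimal sketch under stated assumptions. The `Transaction` fields and every threshold below are hypothetical and were invented for illustration. They are not taken from any real rule set.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical fields, chosen only to illustrate expert rules.
    amount: float
    country: str
    home_country: str
    hour: int            # local hour of day, 0-23
    txns_last_hour: int  # transactions by the same card in the past hour


def rule_based_flag(t: Transaction) -> str:
    """Apply expert rules in order (IF-THEN-ELSEIF) and return the first match."""
    if t.amount > 10_000:                                  # hypothetical threshold
        return "ALERT: very large amount"
    elif t.country != t.home_country and t.amount > 2_000:
        return "ALERT: large foreign transaction"
    elif t.txns_last_hour > 5:
        return "ALERT: high velocity"
    elif 1 <= t.hour <= 4 and t.amount > 500:
        return "REVIEW: unusual hour"
    else:
        return "OK"


print(rule_based_flag(Transaction(2500.0, "FR", "US", 14, 1)))  # ALERT: large foreign transaction
```

Each newly observed fraud pattern means another branch and another threshold to tune and keep current. This is the maintenance burden the paragraph above describes.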
The availability of data from multiple sources, and the ability of present systems to process and analyze this data, have created new opportunities for identifying fraud. As Figure 1 Fraud Analytics System shows, using multiple data sources to identify patterns is one of the cornerstones of a data mining approach to fraud detection. Fraud analytics also offers the potential to automate multiple stages of the typical fraud detection, monitoring, and intervention cycle.

HyperGraf combines data from multiple sources, including credit scores, enterprise transactional data, and social media, to identify and analyze fraud. One of the key methods used in HyperGraf is network analysis for fraud detection. The following section highlights some of its key aspects.


Network Analytics in Fraud Detection

Social Network Analysis, one of the emergent data mining methods in fraud analytics, is a technique that represents entities as nodes and the relationships between entities as links.

Figure 2 End-to-end approach using social network analysis methods (stages: represent the appropriate relationships as a network; store the network in a database; visually explore and classify networks; apply social network algorithms to identify fraud; classify and present possible fraud to the security analyst in an easy-to-consume manner)

Representing data together with its relationships reveals much more information than simply listing the properties of the entities. Analyzing links and relationships makes it possible to apply various graph mining algorithms to the data source.

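The representation described above, with entities as nodes and relationships as links, can be sketched in a few lines. This minimal sketch assumes hypothetical customer records and treats any shared attribute value (phone, address, device) as a link. A plain Python dict of neighbor sets stands in for a network database. It is not HyperGraf's data model.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical customer records: entity -> attribute values (its "properties").
records = {
    "acct_A": {"phone": "555-0101", "address": "12 Elm St", "device": "dev-1"},
    "acct_B": {"phone": "555-0101", "address": "9 Oak Ave", "device": "dev-2"},
    "acct_C": {"phone": "555-0199", "address": "9 Oak Ave", "device": "dev-3"},
    "acct_D": {"phone": "555-0150", "address": "1 Pine Rd", "device": "dev-4"},
}


def build_graph(records):
    """Entities become nodes; two entities are linked if they share any attribute value."""
    by_value = defaultdict(set)
    for entity, attrs in records.items():
        for key, value in attrs.items():
            by_value[(key, value)].add(entity)
    graph = {entity: set() for entity in records}
    for members in by_value.values():
        for a, b in combinations(sorted(members), 2):
            graph[a].add(b)  # undirected link stored in both directions
            graph[b].add(a)
    return graph


graph = build_graph(records)
print(sorted(graph["acct_B"]))  # ['acct_A', 'acct_C']
```

`acct_A` and `acct_C` share no attribute, so listing their properties side by side reveals nothing. In the graph, however, they are two hops apart through `acct_B`, and `acct_D` stays isolated.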
Traditional data mining techniques rely on statistical patterns to identify fraud. Yet, given the uncommon, time-evolving, and carefully concealed nature of fraud, these methods are often unable to detect various types of fraud. Graph algorithms can help identify such patterns because they use relationship information in addition to user-level attribute information.

In fraud detection, interactions and exchanges can be viewed as heterogeneous networks with multiple participants. The number of participants is generally huge, but the kinds of interactions among individuals are generally limited and known. Graph analysis techniques can further be used to identify suspicious individuals, groups, and relationships, unusual changes over time or geography, and anomalous networks within the overall graph structure.

Some of the popular network analytics methods, and their typical business use cases for fraud detection, are listed in the following figure:

Network visualization and ego-centric analysis
• Displays the relationship between a selected alert and any other related alerts through links and nodes
• Identify links with known (blacklisted) entities

Entity link analysis / entity resolution
• Detecting rings in two-mode networks of people and their attributes
• Identify rings in first-party fraud collusions

Graph walking to identify rings
• Walk through the graph to identify rings
• Identify paper rings of fraudulent collusions

Centrality measures
• Ranking nodes based on various graph centrality parameters
• Identify leaders in fraud networks

Snowball method
• Identify suspects and recursively expand their connections using the snowball method
• Identify linkages to known (blacklisted) entities

Peer group analysis
• This technique detects abnormal behavior of a target by comparing it with its peer group and measuring how far its behavior deviates from that of its peers
• Abnormal changes when compared to the peer group

Network topologies - cliques and stars
• Any quantitative or qualitative features of a user's behavior in online social networks that are inconsistent with the rest of the users can be considered anomalies
• Anomaly Detection - Identify outliers in networks

Page-rank
• The PageRank algorithm can be used to discover the critical accounts of the groups
• Collusive fraud

Combining user level and network level features
• Combine user-level attributes with network-level attributes
• Identify conspired groups

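The ego-centric analysis and snowball method entries in the figure above both start from a node of interest and expand outward through its links. This minimal sketch assumes an undirected network held as a dict mapping each node to its set of neighbors. The node names, the blacklist, and the hop limits are hypothetical. The sketch uses a breadth-first expansion, one wave at a time, which is one straightforward way to implement the idea.

```python
from collections import deque


def snowball(graph, seeds, max_waves):
    """Snowball expansion from suspect seeds, one wave (hop) at a time.

    Returns {node: wave at which it was first reached}; seeds are wave 0.
    """
    wave_of = {s: 0 for s in seeds}
    queue = deque(wave_of)
    while queue:
        node = queue.popleft()
        if wave_of[node] == max_waves:
            continue  # do not expand beyond the wave limit
        for nbr in graph.get(node, ()):
            if nbr not in wave_of:
                wave_of[nbr] = wave_of[node] + 1
                queue.append(nbr)
    return wave_of


def blacklist_links(graph, alert, blacklist, max_hops=2):
    """Ego-centric check: blacklisted entities within max_hops of an alert, with their distance."""
    ego = snowball(graph, [alert], max_hops)
    return {node: hops for node, hops in ego.items() if node in blacklist and node != alert}


# Hypothetical network around one alert (each link stored in both directions).
example_graph = {
    "alert_17": {"acct_1", "acct_2"},
    "acct_1": {"alert_17", "acct_3"},
    "acct_2": {"alert_17"},
    "acct_3": {"acct_1", "acct_9"},
    "acct_9": {"acct_3"},
}
print(blacklist_links(example_graph, "alert_17", {"acct_3", "acct_9"}))  # {'acct_3': 2}
```

With `max_hops=2` the blacklisted `acct_9`, three hops away, is not reported. Raising the limit widens the ego network. In a snowball investigation, the seeds would be the current suspects, and each wave would typically be reviewed before the next one is expanded.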

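The entity resolution and graph walking entries in the figure above both look for "rings". This sketch makes two interpretive assumptions. First, in the two-mode case, a ring is taken to be a group of at least three people connected through shared attributes, found here with union-find. Second, in the graph-walking case, a ring is taken to be a closed loop of directed transfers, found by walking outward from every node. All people, attributes, and accounts are hypothetical.

```python
from collections import defaultdict


def rings_from_two_mode(links, min_size=3):
    """Group people connected through shared attributes in a two-mode (person, attribute) network."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for person, attribute in links:
        # Union: attach the person's component to the attribute's component.
        parent[find(("person", person))] = find(("attr", attribute))

    groups = defaultdict(set)
    for node in list(parent):
        kind, name = node
        if kind == "person":
            groups[find(node)].add(name)
    return [group for group in groups.values() if len(group) >= min_size]


def find_cycles(transfers, max_len=4):
    """Graph walking: report each simple directed cycle of 2..max_len nodes once."""
    cycles = set()

    def walk(start, node, path):
        for nxt in transfers.get(node, ()):
            if nxt == start and len(path) >= 2:
                i = path.index(min(path))  # rotate so the cycle starts at its smallest node
                cycles.add(tuple(path[i:] + path[:i]))
            elif nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for start in transfers:
        walk(start, start, [start])
    return cycles


people_attrs = [
    ("alice", "phone:555-0101"), ("bob", "phone:555-0101"),
    ("bob", "addr:9 Oak Ave"), ("carol", "addr:9 Oak Ave"),
    ("dave", "phone:555-0150"),
]
transfers = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print([sorted(g) for g in rings_from_two_mode(people_attrs)])  # [['alice', 'bob', 'carol']]
print(find_cycles(transfers))                                  # {('A', 'B', 'C')}
```

Alice and Carol share no attribute directly, but both connect through Bob. Account D pays into the loop but is not part of it. The `max_len` cap bounds the walk, because enumerating all cycles grows quickly with graph size.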

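The centrality, page-rank, and "combining user level and network level features" entries in the figure above can be illustrated together. This minimal sketch rests on several assumptions. Normalized in-degree stands in for "leadership", since it is only one of several centrality measures. PageRank is computed by plain power iteration with the conventional damping factor of 0.85. Both scores are then appended to each account's user-level attributes. The payment network and the `account_age_days` attribute are hypothetical.

```python
def in_degree_centrality(graph):
    """Normalized in-degree: share of the other nodes that link directly to each node."""
    n = len(graph)
    indeg = {v: 0 for v in graph}
    for succs in graph.values():
        for w in succs:
            indeg[w] += 1
    return {v: (d / (n - 1) if n > 1 else 0.0) for v, d in indeg.items()}


def pagerank(graph, damping=0.85, iterations=100, tol=1e-12):
    """Power-iteration PageRank for a directed graph {node: set of successors}.

    Every successor must also be a key. Dangling nodes (no out-links) spread their rank evenly.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            for w in out:
                new[w] += damping * rank[v] / len(out)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def combined_features(user_attrs, graph):
    """Append network-level features (in-degree, PageRank) to each node's user-level attributes."""
    indeg = in_degree_centrality(graph)
    pr = pagerank(graph)
    return {v: {**user_attrs.get(v, {}), "in_degree": indeg[v], "pagerank": pr[v]} for v in graph}


# Hypothetical payment network: four mule accounts pay "boss", which forwards to "cashout".
payments = {
    "m1": {"boss"}, "m2": {"boss"}, "m3": {"boss"}, "m4": {"boss"},
    "boss": {"cashout"},
    "cashout": set(),
}
features = combined_features({"boss": {"account_age_days": 20}}, payments)
```

In this toy network, in-degree ranks `boss` first, because four of the five other accounts pay into it. PageRank ranks `cashout` first, because it collects everything flowing through `boss`. The two measures therefore surface different "critical" accounts, and the resulting per-account feature dicts could feed a downstream classifier.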

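The peer group analysis and network topologies (cliques and stars) entries in the figure above both measure how far an entity departs from what is expected. This sketch assumes a z-score against the peer group, flagged at a hypothetical threshold of 3 standard deviations. For topology, it uses the local clustering coefficient on an undirected graph. A value of 1.0 means a node's neighbors form a clique. A value of 0.0 with many neighbors marks the center of a star. Peer assignments and spend figures are invented.

```python
import math
from statistics import mean, stdev


def peer_group_deviation(target_value, peer_values):
    """z-score of a target's metric relative to its peer group (needs at least two peers)."""
    mu = mean(peer_values)
    sigma = stdev(peer_values)  # sample standard deviation of the peers
    if sigma == 0:
        return 0.0 if target_value == mu else math.copysign(math.inf, target_value - mu)
    return (target_value - mu) / sigma


def flag_abnormal(targets, peers_of, metric, threshold=3.0):
    """Return {target: z} for targets deviating from their peers by more than `threshold` std devs."""
    flagged = {}
    for t in targets:
        z = peer_group_deviation(metric[t], [metric[p] for p in peers_of[t]])
        if abs(z) > threshold:
            flagged[t] = z
    return flagged


def local_clustering(graph, v):
    """Share of pairs of v's neighbors that are linked to each other (undirected graph)."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in graph[nbrs[i]])
    return linked / (k * (k - 1) / 2)


# Hypothetical weekly spend; each target is compared with accounts of a similar profile.
weekly_spend = {"acct_X": 9000, "p1": 1000, "p2": 1200, "p3": 900, "p4": 1100, "p5": 1050}
peers = {"acct_X": ["p1", "p2", "p3", "p4", "p5"], "p1": ["p2", "p3", "p4", "p5"]}
print(list(flag_abnormal(["acct_X", "p1"], peers, weekly_spend)))  # ['acct_X']
```

`acct_X` spends far more than its peers (z is roughly 71), while `p1` is within half a standard deviation of its own peers. How the peer group is chosen, for example by product, merchant type, or graph neighborhood, matters as much as the threshold.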
Challenges to Network Analytics in Fraud Detection

Network analysis opens new avenues for fraud detection, which can augment the existing rule-based and data-mining approaches in an organization. While network analytics techniques promise key breakthroughs in fraud detection, certain key challenges make implementing network analysis in fraud detection difficult. These include:

• An emerging body of knowledge, which makes it difficult to identify the correct methods, applications, and interpretations.
• Requirements for novel data storage and warehouse methods. Traditional databases are not optimized or designed for network analysis and operations, and newer NoSQL and graph databases are often more suitable for these operations.
• The high volume and variety of data that needs to be processed.
• Many graph algorithms are ‘computationally intractable’. Even though the problems can be solved in finite time, the amount of processing required makes them infeasible on large graphs.
• The retroactive nature of social network analysis, which makes it less suitable for prediction than other methods, such as machine learning based approaches.
• A lack of automation in network analytics for fraud detection, and the resulting need for expert analysis and interpretation.

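A rough sense of scale for the ‘computationally intractable’ point can be obtained from an exhaustive search for the largest clique, that is, a group in which every account is linked to every other. In the worst case, such a search must examine every subset of the n accounts. The figures below assume, purely for illustration, n = 60 accounts and a hypothetical machine that checks 10^9 subsets per second:

```latex
\begin{aligned}
\text{subsets to check} &= 2^{n} = 2^{60} \approx 1.15 \times 10^{18} \\
\text{time at } 10^{9}\ \text{checks/s} &\approx \frac{1.15 \times 10^{18}}{10^{9}}\ \text{s}
  = 1.15 \times 10^{9}\ \text{s} \approx 36.5\ \text{years}
\end{aligned}
```

Finding a maximum clique is NP-hard, and the best known exact algorithms still take exponential time in the worst case. Adding accounts therefore multiplies the work rather than adding to it.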

Conclusion

Addressing these challenges requires organizations to continually innovate and to use new systems with specific capabilities. Some solutions, like HyperGraf, provide a platform for guided analytics in fraud detection. Nevertheless, the role of domain experts and data scientists in applying these methods is often the key factor in successfully implementing a fraud detection strategy.

Reference

Baesens, Bart, Véronique Van Vlasselaer, and Wouter Verbeke. 2015. “Fraud Analytics Using Descriptive, Predictive & Social Network Techniques.” https://lirias.kuleuven.be/handle/123456789/500346




Archisman is a Senior Manager at Mphasis NEXTlabs. At Mphasis, he conceptualizes, develops, and leads multiple products in the analytics R&D space. He has extensive experience in the IT industry in various project management, research, and engineering roles. He holds a PhD from the Indian Institute of Management Bangalore (IIMB) in the Quantitative Methods and Information Systems area, and was a visiting researcher at the IT University of Copenhagen during his PhD. His areas of expertise are business analytics, social media, product management, and information systems research.

Dr. Archisman Majumdar
Senior Manager, Mphasis NEXTlabs


About Mphasis

Mphasis is a global technology services and solutions company specializing in the areas of Digital and Governance, Risk & Compliance. Our solution focus and superior human capital propel our partnership with large enterprise customers in their Digital Transformation journeys and with global financial institutions in the conception and execution of their Governance, Risk and Compliance Strategies. We focus on next generation technologies for differentiated solutions delivering optimized operations for clients.

For more information, contact: Nextlabs@mphasis.com
