According to Ernst and Young, the marketing, advertising, and media industries lose $8.2 billion a year to fraudulent impressions, infringed content, and malvertising.
The combination of fake news, trolls, bots and money laundering is skewing the value of information and could be hurting your business.
By applying graph technology to the data you already have on hand, you can detect the patterns that fraud leaves behind and stop the people responsible.
We collaborated with Sungpack Hong, Director of Research and Advanced Development at Oracle Labs, to demonstrate five examples of real problems and how graph technology and data are being used to combat them.
But first, a refresher on graph technology.
What Is Graph Technology?
The basic premise of graph technology is that you store, manage, and query data in the form of a graph. Your entities become vertices (as illustrated by the red dots). Your relationships become edges (as represented by the red lines).
By analyzing these fine-grained relationships, you can use graph analysis to detect anomalies with queries and algorithms. We’ll talk about these anomalies later in the article.
The major benefit of graph databases is that they’re naturally indexed by relationships, which provides faster access to data (as compared with a relational database). You can also add data without doing a lot of modeling in advance. These features make graph technology particularly useful for anomaly detection—which is mainly what we’ll be covering in this article for our fraud detection use cases.
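To make the vertex-and-edge model concrete, here is a minimal Python sketch. It assumes a plain in-memory adjacency map rather than any particular graph database, and the account names and the `add_edge` and `neighbors` helpers are made up for illustration.

```python
from collections import defaultdict

# Adjacency map: each vertex points directly at its neighbors, so
# following a relationship is a dictionary lookup rather than a join.
graph = defaultdict(dict)

def add_edge(src, dst, **properties):
    """Add a directed edge src -> dst carrying arbitrary properties."""
    graph[src][dst] = properties
    graph[dst]  # touching the key registers dst as a vertex as well

# New kinds of relationships can be added without remodeling anything.
add_edge("alice", "bob", kind="retweets", count=3)
add_edge("bob", "carol", kind="retweets", count=1)
add_edge("alice", "acct-001", kind="owns")

def neighbors(vertex):
    """Return the vertices reachable in one hop from `vertex`."""
    return sorted(graph[vertex])

print(neighbors("alice"))  # ['acct-001', 'bob']
```

A real graph database adds persistence, a query language, and built-in algorithms on top of this idea, but the access pattern is the same: start at a vertex and follow its edges.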
How to Find Anomalies with Graph Technology
If you take a look at Gartner’s 5 Layers of Fraud Protection, you can see that they break the analysis to discover fraud into two categories:
- Discrete data analysis where you evaluate individual users, actions, and accounts
- Connected analysis where relationships and integrated behaviors facilitate the fraud
It’s this second category based on connections, patterns, and behaviors that can really benefit from graph modeling and analysis.
Through connected analysis and graph technology, you would:
- Combine and correlate enterprise information
- Model the results as a connected graph
- Apply link and social network analysis for discovery
Now we’ll discuss examples of ways companies can apply this to solve real business problems.
Fraud Detection Use Case #1: Finding Bot Accounts in Social Networks
In the world of social media, marketers want to see what they can discover from trends. For example:
- If I’m selling this specific brand of shoes, how popular will they be? What are the trends in shoes?
- If I compare this brand with a competing brand, how do the results mirror actual public opinion?
- On social media, are people saying positive or negative things about me? About my competitors?
Of course, all of this information can be incredibly valuable. At the same time, it can mean nothing if it’s all inaccurate and skewed by how much other companies are willing to pay for bots.
In this case, we worked with Oracle Marketing Cloud to ensure the information they’re delivering to advertisers is as accurate as possible. We sought to find the fake bot accounts that are distorting popularity.
As an example, there are bots that retweet certain target accounts to make them look more popular.
To determine which accounts are “real,” we created a graph between accounts with retweet counts as the edge weights to see how many times these accounts are retweeting their neighboring accounts. We found that the unnaturally popularized accounts exhibit different characteristics from naturally popular accounts.
Here is the pattern for a naturally popular account:
And here is the pattern for an unnaturally popular account:
When all of these accounts are analyzed, certain accounts show an obviously unnatural deviation. And by using graphs and relationships, we can find even more bots by:
- Finding accounts with a high retweet count
- Inspecting how other accounts are retweeting them
- Finding the accounts that also get retweets from only these bots
Fraud Detection Use Case #2: Identifying Sock Puppets in Social Media
In this case, we used graph technology to identify sock puppet accounts. A sock puppet is an online identity used for deception; here, the term refers to different accounts posting the same set of messages. These accounts were working to make certain topics or keywords look more important by making it seem as though they were trending.
To discover the bots, we had to augment the graph from Use Case #1. Here we:
- Added edges between the authors with the same messages
- Counted the number of repeated messages and filtered to discount accidental unison
- Applied heuristics to avoid n² edge generation per same message
Because we found that the messages were always the same, we were able to create subgraphs from those edges and apply a connected components algorithm to them.
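The following Python sketch shows one way this could work. It is not the heuristic the team actually used, which the article does not specify. It assumes two things: an author is a suspect only after repeating at least `min_repeats` messages that someone else also posted, and each message links its authors to a single representative (a star) instead of to every other author. The sample posts are invented.

```python
from collections import defaultdict

# (author, message text); invented sample posts.
posts = [
    ("s1", "Buy X now"), ("s2", "Buy X now"), ("s3", "Buy X now"),
    ("s1", "X is trending"), ("s2", "X is trending"), ("s3", "X is trending"),
    ("ann", "Buy X now"),  # a single accidental match
    ("ann", "Nice weather"), ("ben", "Hello world"),
]

def sockpuppet_groups(posts, min_repeats=2):
    """Group authors who repeatedly post the same messages as others."""
    authors_by_message = defaultdict(set)
    for author, text in posts:
        authors_by_message[text].add(author)

    # Count, per author, how many of their messages were also posted by
    # someone else; a single match is treated as accidental unison.
    repeats = defaultdict(int)
    for authors in authors_by_message.values():
        if len(authors) > 1:
            for author in authors:
                repeats[author] += 1
    suspects = {a for a, n in repeats.items() if n >= min_repeats}

    # Union-find over suspects. Each message links its authors to one
    # representative, giving n - 1 edges instead of n * (n - 1) / 2.
    parent = {a: a for a in suspects}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for authors in authors_by_message.values():
        linked = sorted(authors & suspects)
        for other in linked[1:]:
            parent[find(other)] = find(linked[0])

    # Each connected component with more than one member is a group.
    groups = defaultdict(set)
    for a in suspects:
        groups[find(a)].add(a)
    return [g for g in groups.values() if len(g) > 1]

print(sockpuppet_groups(posts))  # one group containing s1, s2, and s3
```

The star shortcut trades precision for size: two suspects who share only one message can still end up in the same component, so a production version would likely verify pairwise overlap inside each group.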
As a result of all of the analysis that we ran on a small sampling, we discovered that what we thought were the most popular brands actually weren’t—our original list had been distorted by bots.
See the image below – the “new” most popular brands barely even appear on the “old” most popular brands list. But they are a much truer reflection of what’s actually popular. This is the information you need.
After one month, we revisited the identified bot accounts just to see what had happened to them. We discovered:
- 89% were suspended
- 2.2% were deleted
- 8.8% were still serving as bots
Fraud Detection Use Case #3: Circular Payment
A common pattern in financial crimes, a circular money transfer essentially involves a criminal sending money to himself or herself while disguising it as a valid transfer between “normal” accounts. These “normal” accounts are actually fake accounts. They typically share certain information because they are generated from stolen identities (email addresses, addresses, etc.), and it’s this related information that makes graph analysis such a good fit to discover them.
For this use case, you can build a graph from the transactions between entities, plus links between entities that share information such as email addresses, passwords, or addresses. Once the graph exists, all we have to do is write and run a simple query that finds all customers whose accounts share similar information and who are sending money to each other.
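As a rough illustration of such a query, the Python sketch below searches for transfer cycles in which every hop connects two accounts that share at least one attribute value. The accounts, attributes, transfers, and the `max_len` limit are all invented for the example. A graph database would express the same idea as a pattern-matching query instead of hand-written traversal code.

```python
from collections import defaultdict

# Invented sample data: account -> identifying attributes.
accounts = {
    "A": {"email": "x@mail.test", "address": "1 Main St", "phone": "111"},
    "B": {"email": "b@mail.test", "address": "1 Main St", "phone": "222"},
    "C": {"email": "x@mail.test", "address": "9 Oak Ave", "phone": "222"},
    "D": {"email": "d@mail.test", "address": "5 Elm Rd", "phone": "444"},
}
# Directed money transfers (sender, receiver).
transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "A")]

def shares_info(a, b):
    """True if two accounts have at least one identical attribute value."""
    return any(accounts[a][k] == accounts[b].get(k) for k in accounts[a])

def circular_payments(max_len=4):
    """Return transfer cycles where every hop joins info-sharing accounts."""
    out = defaultdict(list)
    for src, dst in transfers:
        out[src].append(dst)

    cycles = []

    def walk(start, path):
        for nxt in out[path[-1]]:
            if not shares_info(path[-1], nxt):
                continue
            if nxt == start:
                cycles.append(path)
            # Only visit accounts that sort after the start, so each
            # cycle is reported once, beginning at its smallest member.
            elif nxt > start and nxt not in path and len(path) < max_len:
                walk(start, path + [nxt])

    for start in sorted(accounts):
        walk(start, [start])
    return cycles

print(circular_payments())  # [['A', 'B', 'C']]
```

The longer loop through D is not reported, because D shares nothing with its neighbors. That combination of money flow and shared identity data is what distinguishes a suspicious cycle from ordinary payments that happen to form a circle.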
Fraud Detection Use Case #4: VAT Fraud Detection
Because Europe has so many borders with different rules about who pays tax to which country when products are crossing borders, VAT (Value Added Tax) fraud detection can get very complicated.
In most cases, the importer should pay the VAT and if the products are exported to other countries, the exporter should receive a refund. But when there are other companies in between, deliberately obfuscating the process, it can get very complicated. The importing company delays paying the tax for weeks and months. The companies in the middle are paper companies. Eventually, the importing company vanishes and that company doesn’t pay VAT but is still able to get payment from the exporting company.
This can be very difficult to decipher, but not with graph analysis. You can easily build a graph from the transactions and use it to answer questions such as who the resellers are and who is creating the companies.
In this real-life analysis, Oracle Practice Manager Wojciech Wcislo examined how the flow works in order to identify suspicious companies. He then used an algorithm in Oracle Spatial and Graph to identify the middle man.
The graph view of VAT fraud detection:
A more complex view:
In that case, you would:
- Identify importers and exporters via a simple query
- Aggregate VAT invoice items as edge weights
- Run the fattest path algorithm
You will then discover the common “middle man” nodes where the flows are aggregated.
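The steps above can be sketched in Python as follows. A fattest (or widest) path is the path whose smallest edge weight is as large as possible, and it can be found with a Dijkstra-style search that uses a max-heap. This is a generic textbook version, not Oracle Spatial and Graph's implementation, and the company names and invoice amounts are invented.

```python
import heapq
from collections import Counter, defaultdict

# Invented VAT invoice items: (seller, buyer, VAT amount).
invoices = [
    ("importer", "shell1", 500), ("importer", "shell1", 400),
    ("shell1", "broker", 850), ("importer", "shell2", 700),
    ("shell2", "broker", 650), ("importer2", "shell2", 800),
    ("broker", "exporter", 1500), ("importer", "exporter", 100),
]

# Aggregate invoice items into one weighted edge per company pair.
edges = defaultdict(int)
for seller, buyer, vat in invoices:
    edges[(seller, buyer)] += vat

def fattest_path(edges, source, target):
    """Return (bottleneck, path) maximizing the smallest edge on the path."""
    graph = defaultdict(list)
    for (u, v), w in edges.items():
        graph[u].append((v, w))

    best = {source: float("inf")}
    prev = {}
    heap = [(-best[source], source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0):
            continue  # stale heap entry
        if node == target:
            break
        for nxt, w in graph[node]:
            cand = min(width, w)
            if cand > best.get(nxt, 0):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (-cand, nxt))

    if target not in best:
        return 0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]

# Count how often each intermediate company sits on a fattest path.
middle_men = Counter()
for importer in ("importer", "importer2"):
    _, path = fattest_path(edges, importer, "exporter")
    middle_men.update(path[1:-1])

print(fattest_path(edges, "importer", "exporter"))
# (850, ['importer', 'shell1', 'broker', 'exporter'])
print(middle_men.most_common(1))  # [('broker', 2)]
```

In this toy data the two importers route through different shell companies, but both flows converge on `broker`, which is the kind of aggregation point the analysis is looking for.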
Fraud Detection Use Case #5: Money Laundering and Financial Fraud
Conceptually, money laundering is pretty simple. Dirty money is passed around to blend it with legitimate funds and then turned into hard assets. This was the kind of process discovered in the Panama Papers analysis.
These tax evasion schemes often rely on false resellers and brokers who are able to apply for tax refunds to avoid payment.
But graphs and graph databases provide relationship models. They let you apply pattern recognition, classification, statistical analysis, and machine learning to these models, which enables more efficient analysis at scale against massive amounts of data.
In this use case, we’ll look more specifically at case correlation. Whenever transactions meet the criteria that regulations define as suspicious, they get a closer look from human investigators. The goal is to avoid inspecting each activity separately and instead group suspicious activities together through previously known connections.
To find these correlations with a graph-based approach, we implemented this flow on general-purpose graph engines, using a pattern-matching query (path finding) and a connected components graph algorithm (with filters).
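A simplified Python sketch of case correlation with a filtered connected components search might look like this. The link types, the `allowed` filter, and the alert data are all hypothetical; in a real deployment, which connections count would depend on each country's rules.

```python
from collections import defaultdict, deque

# Invented sample data: known links between accounts, tagged by link type.
links = [
    ("acct1", "acct2", "same_owner"),
    ("acct2", "acct3", "same_device"),
    ("acct4", "acct5", "same_owner"),
    ("acct3", "acct4", "same_bank"),  # weak link, filtered out below
]
# Suspicious activities flagged by rules: alert id -> account involved.
alerts = {"alert1": "acct1", "alert2": "acct3",
          "alert3": "acct5", "alert4": "acct9"}

def correlate_cases(links, alerts, allowed=("same_owner", "same_device")):
    """Group alerts whose accounts are connected through allowed links."""
    adjacency = defaultdict(set)
    for a, b, kind in links:
        if kind in allowed:  # the filter on the component search
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Breadth-first search labels every reachable account with the
    # account that started the search, which identifies the component.
    component = {}
    for account in sorted(set(alerts.values())):
        if account in component:
            continue
        component[account] = account
        queue = deque([account])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in component:
                    component[nxt] = account
                    queue.append(nxt)

    cases = defaultdict(list)
    for alert, account in sorted(alerts.items()):
        cases[component[account]].append(alert)
    return sorted(cases.values())

print(correlate_cases(links, alerts))
# [['alert1', 'alert2'], ['alert3'], ['alert4']]
```

Changing the `allowed` tuple is all it takes to apply a different correlation rule. For example, adding `"same_bank"` would merge the first two cases into one.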
Through this method, the company didn’t have to build its own custom case correlation engine; it could use graph technology instead, which is more flexible. This flexibility is important because different countries have different rules.
In today’s world, the scammers are getting ever more inventive. But the technology is too. Graph technology is an excellent way to discover the truth in data, and it is a tool that’s rapidly becoming more popular. If you’d like to learn more, you can find white papers, software downloads, documentation and more on Oracle’s Big Data Spatial and Graph pages.
And if you’re ready to get started with exploring your data now, we offer a free guided trial that enables you to build and experiment with your own data lake.