Lift and Shift Your Apps to the Cloud

Ilona Gabinsky

Principal Product Marketing Manager

Today we have a guest blogger: Sai Valluri, Product Marketing Manager.

This year at Collaborate, I had the opportunity to host a panel discussion with three amazing customers and one partner. The title of the panel discussion was “Sharpen Your Competitive Edge: Lift and Shift Your Apps to the Cloud”. This discussion took place on 4/25 at 1:15 pm.

The panel included:

Chris Brown, Director-Application Development, Port of Houston

Chris is a seasoned IT executive with over 20 years of experience across the full vertical structure of operations-oriented enterprises. Chris has experience in full-cycle Tier One ERP implementations of JDE software. In his current role, Chris oversees staff across two terminals who support enterprise software port-wide (several JDE modules) and the interfaces between applications, both hosted and on premises.

Joe Finlinson, Business Applications Technology Director, Intermountain Healthcare

Joe is an experienced IT leader with over 15 years in enterprise IT. In his current role, Joe directs all technical aspects of the Business Applications Portfolio, including PeopleSoft FSCM, Oracle E-Business Suite HR/Payroll, Kronos Timekeeping and Analytics, Hyperion Budgeting, OBIEE/OBIA, and Oracle SOA Suite integrations, along with several other applications.

Michael Lee Sherwood, Director of IT, City of Las Vegas

Michael is an accomplished technology and innovation leader with a demonstrated history of working in government and private-sector industries. He is skilled in process improvement, budgeting, operations management, strategy, customer experience, and entrepreneurship. Michael graduated from the University of Southern California, Marshall School of Business.

Niklas Ivelsatt, Senior Partner, Arisant LLC

Niklas is the Co-founder and Senior Partner of Arisant LLC. Founded in 2006 in Denver, Arisant is well known for designing, building and supporting scalable, cost effective Oracle infrastructure environments. Arisant is an Oracle Platinum partner.

The discussion covered a variety of challenges faced by customers and business benefits accrued by moving to Oracle Cloud. Each of the customers is running a key Oracle application such as E-Business Suite, JD Edwards and PeopleSoft. So in a way it’s a diverse customer group. Each of them made the transition to Cloud to better serve their stakeholders.

Some of the challenges faced by customers included:

  1. Retiring workforce and its impact on day to day operations.
  2. Increasing hardware and data center costs that create a burden on innovation.
  3. Providing services to end users quickly and efficiently.
  4. Ability to set up and tear down test environments.

Business Benefits:

  1. By moving to Cloud at Customer, one customer could keep its E-Business Suite and databases on premises, addressing security and compliance requirements and concerns about putting core systems in the public cloud. This also addressed latency concerns.
  2. Hard-dollar savings compared to legacy deployments. Each of the panelists could validate savings and describe how they could use these dollars for other projects.
  3. The move to Cloud also helped our customers retrain employees and attract new talent keen to work on Cloud.
  4. Cloud enables customers to leverage data and analytics, and thereby better manage operations, analyze data in real time and make faster business decisions.

The panelists also discussed how Oracle Cloud helped them overcome challenges, save costs and grow their business. Other topics they touched upon included total cost of ownership, the selection process, and best practices learned during the evaluation, implementation and post-implementation phases.

Call to Action:

You can hear the entire panel discussion at: http://nnf.questdirect.org/questmediaviewer.aspx?video=268022930


Find the Truth With Data: 5 Fraud Detection Use Cases

According to Ernst and Young, $8.2 billion a year is lost to the marketing, advertising, and media industries through fraudulent impressions, infringed content, and malvertising.

The combination of fake news, trolls, bots and money laundering is skewing the value of information and could be hurting your business.

It’s avoidable.

By using graph technology and the data you already have on hand, you can discover fraud through detectable patterns and stop the perpetrators.

We collaborated with Sungpack Hong, Director of Research and Advanced Development at Oracle Labs to demonstrate five examples of real problems and how graph technology and data are being used to combat them.

Get started with data—register for a guided trial to build a data lake

But first, a refresher on graph technology.

What Is Graph Technology?

With graph technology, the basic premise is that you store, manage and query data in the form of a graph. Your entities become vertices (as illustrated by the red dots). Your relationships become edges (as represented by the red lines).

What Is Graph Technology

By analyzing these fine-grained relationships, you can use graph analysis to detect anomalies with queries and algorithms. We’ll talk about these anomalies later in the article.

The major benefit of graph databases is that they’re naturally indexed by relationships, which provides faster access to data (as compared with a relational database). You can also add data without doing a lot of modeling in advance. These features make graph technology particularly useful for anomaly detection—which is mainly what we’ll be covering in this article for our fraud detection use cases.
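To make the vertex and edge idea concrete, here is a minimal sketch in plain Python rather than in any particular graph product. The account names, relationship labels, and the `neighbors` helper are all hypothetical, chosen only for illustration. It shows why relationship lookups are cheap when data is indexed by its connections.

```python
from collections import defaultdict

# Hypothetical example data: each tuple is (source, relationship, target).
edges = [
    ("alice", "retweets", "brand_x"),
    ("bob", "retweets", "brand_x"),
    ("alice", "follows", "bob"),
]

# Adjacency index: vertex -> list of (relationship, neighbor).
adjacency = defaultdict(list)
for source, relationship, target in edges:
    adjacency[source].append((relationship, target))


def neighbors(vertex, relationship=None):
    """Return the neighbors of a vertex, optionally filtered by relationship."""
    return [t for r, t in adjacency.get(vertex, []) if relationship in (None, r)]


print(neighbors("alice"))             # all outgoing relationships
print(neighbors("alice", "follows"))  # only 'follows' edges
```

Adding a new kind of relationship here only means appending another tuple, which mirrors the point about adding data without much up-front modeling.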

How to Find Anomalies with Graph Technology

Gartner 5 Layers of Fraud Detection

If you take a look at Gartner’s 5 Layers of Fraud Protection, you can see that they break the analysis to discover fraud into two categories:

  • Discrete data analysis where you evaluate individual users, actions, and accounts
  • Connected analysis where relationships and integrated behaviors facilitate the fraud

It’s this second category based on connections, patterns, and behaviors that can really benefit from graph modeling and analysis.

Through connected analysis and graph technology, you would:

  • Combine and correlate enterprise information
  • Model the results as a connected graph
  • Apply link and social network analysis for discovery

Now we’ll discuss examples of ways companies can apply this to solve real business problems.

Fraud Detection Use Case #1: Finding Bot Accounts in Social Networks

In the world of social media, marketers want to see what they can discover from trends. For example:

  • If I’m selling this specific brand of shoes, how popular will they be? What are the trends in shoes?
  • If I compare this brand with a competing brand, how do the results mirror actual public opinion?
  • On social media, are people saying positive or negative things about me? About my competitors?

Of course, all of this information can be incredibly valuable. At the same time, it can mean nothing if it’s all inaccurate and skewed by how much other companies are willing to pay for bots.

In this case, we worked with Oracle Marketing Cloud to ensure the information they’re delivering to advertisers is as accurate as possible. We sought to find the fake bot accounts that are distorting popularity.

As an example, there are bots that retweet certain target accounts to make them look more popular.

To determine which accounts are “real,” we created a graph between accounts with retweet counts as the edge weights to see how many times these accounts are retweeting their neighboring accounts. We found that the unnaturally popularized accounts exhibit different characteristics from naturally popular accounts.

Here is the pattern for a naturally popular account:

Naturally Popular Social Media Account

And here is the pattern for an unnaturally popular account:

Unnaturally Popular Social Media Account

When these accounts are all analyzed, there are certain accounts that have obviously unnatural deviation. And by using graphs and relationships, we can find even more bots by:

  • Finding accounts with a high retweet count
  • Inspecting how other accounts are retweeting them
  • Finding the accounts that also get retweets from only these bots
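The three steps above can be sketched roughly as follows. This is not the analysis we ran; it is a simplified illustration that assumes an "unnatural" pattern means a high retweet total produced by a few very repetitive retweeters. The sample data, the thresholds, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical weighted edges: (retweeter, target, retweet_count).
retweets = [
    ("u1", "celeb", 1), ("u2", "celeb", 2), ("u3", "celeb", 1), ("u4", "celeb", 1),
    ("b1", "promo", 40), ("b2", "promo", 35),
    ("b1", "promo2", 5), ("b2", "promo2", 4),
]

incoming = defaultdict(dict)  # target -> {retweeter: total retweet count}
for retweeter, target, count in retweets:
    incoming[target][retweeter] = incoming[target].get(retweeter, 0) + count


def suspicious_targets(min_total=50, min_avg_per_retweeter=10):
    """Steps 1 and 2: high totals driven by few, very repetitive retweeters."""
    flagged = set()
    for target, sources in incoming.items():
        total = sum(sources.values())
        if total >= min_total and total / len(sources) >= min_avg_per_retweeter:
            flagged.add(target)
    return flagged


def expand(flagged):
    """Step 3: other accounts retweeted only by the suspected bots."""
    bots = {r for t in flagged for r in incoming[t]}
    more = {
        t for t, sources in incoming.items()
        if t not in flagged and set(sources) <= bots
    }
    return bots, more


flagged = suspicious_targets()
bots, more = expand(flagged)
print(flagged, bots, more)
```

In practice the thresholds would have to be tuned against accounts already known to be genuine or fake.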

Fraud Detection Use Case #2: Identifying Sock Puppets in Social Media

In this case, we used graph technology to identify sockpuppet accounts (online identities used for purposes of deception or, in this case, different accounts posting the same set of messages) that were working to make certain topics or keywords look more important by making it seem as though they were trending.

Sock Puppet Accounts in Social Media

To discover the bots, we had to augment the graph from Use Case #1. Here we:

  • Added edges between the authors with the same messages
  • Counted the number of repeated messages and filtered to discount accidental unison
  • Applied heuristics to avoid n² edge generation per same message

Because we found that the messages were always the same, we were able to create subgraphs from those edges and apply a connected components algorithm.
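As a rough sketch of this flow, the Python below groups authors by identical message, links them, filters weak links, and then finds connected components with a small union-find. The "star" linking (one hub author per message) is just one possible heuristic for avoiding n² edges and is an assumption, as are the sample posts and the `MIN_SHARED` threshold.

```python
from collections import Counter, defaultdict

# Hypothetical posts: (author, message_text).
posts = [
    ("a", "Buy brand X now!"), ("b", "Buy brand X now!"), ("c", "Buy brand X now!"),
    ("a", "Brand X is trending"), ("b", "Brand X is trending"), ("c", "Brand X is trending"),
    ("d", "Good morning"), ("e", "Good morning"),
]

authors_by_message = defaultdict(set)
for author, message in posts:
    authors_by_message[message].add(author)

# Star heuristic (an assumption): link every author to one hub per message,
# giving n - 1 edges instead of n * (n - 1) / 2.
pair_counts = Counter()
for authors in authors_by_message.values():
    hub, *others = sorted(authors)
    for other in others:
        pair_counts[(hub, other)] += 1

MIN_SHARED = 2  # hypothetical threshold to discount accidental unison
kept_edges = [pair for pair, n in pair_counts.items() if n >= MIN_SHARED]


def connected_components(edges):
    """Union-find over the kept edges; returns a list of account groups."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return list(groups.values())


groups = connected_components(kept_edges)
print(groups)
```

Here authors d and e share only one message, so their link is filtered out as possible coincidence, while a, b and c end up in one group.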

Sock Puppet Groups

As a result of all of the analysis that we ran on a small sampling, we discovered that what we thought were the most popular brands actually weren’t—our original list had been distorted by bots.

See the image below – the “new” most popular brands barely even appear on the “old” most popular brands list. But they are a much truer reflection of what’s actually popular. This is the information you need.

Brand Popularity Skewed by Bots

After one month, we revisited the identified bot accounts just to see what had happened to them. We discovered:

  • 89% were suspended
  • 2.2% were deleted
  • 8.8% were still serving as bots

Fraud Detection Use Case #3: Circular Payment

A common pattern in financial crimes, a circular money transfer essentially involves a criminal sending money to himself or herself while disguising it as a valid transfer between “normal” accounts. These “normal” accounts are actually fake accounts. They typically share certain information because they are generated from stolen identities (email addresses, addresses, etc.), and it’s this related information that makes graph analysis such a good fit to discover them.

For this use case, you can build a graph from the transactions between entities and from the information those entities share, including email addresses, passwords, addresses, and more. Once the graph is created, all we have to do is write and run a simple query to find all customers whose accounts have similar information and who are sending money to each other.
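The query itself is not shown here, so the following Python is only a stand-in for it: a small search for transfer cycles in which every sender and receiver pair shares at least one attribute value. The accounts, attributes, and the `max_len` bound are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: money transfers and per-account identity attributes.
transfers = [("acc1", "acc2"), ("acc2", "acc3"), ("acc3", "acc1"), ("acc4", "acc5")]
attributes = {
    "acc1": {"email": "x@example.com", "address": "1 Main St", "phone": "555-0100"},
    "acc2": {"email": "y@example.com", "address": "1 Main St", "phone": "555-0101"},
    "acc3": {"email": "y@example.com", "address": "9 Oak Ave", "phone": "555-0100"},
    "acc4": {"email": "p@example.com", "address": "5 Elm St", "phone": "555-0102"},
    "acc5": {"email": "q@example.com", "address": "7 Pine Rd", "phone": "555-0103"},
}

outgoing = defaultdict(set)
for sender, receiver in transfers:
    outgoing[sender].add(receiver)


def shares_info(a, b):
    """True if two accounts have at least one identical attribute value."""
    return any(attributes[a].get(k) == v for k, v in attributes[b].items())


def circular_payments(max_len=4):
    """Find transfer cycles in which each sender and receiver share information."""
    cycles = set()

    def walk(start, path):
        for nxt in outgoing[path[-1]]:
            if not shares_info(path[-1], nxt):
                continue
            if nxt == start and len(path) > 1:
                cycles.add(tuple(path))
            # Only extend through accounts "greater" than the start so each
            # cycle is reported once, from its smallest account name.
            elif nxt not in path and nxt > start and len(path) < max_len:
                walk(start, path + [nxt])

    for account in list(outgoing):
        walk(account, [account])
    return cycles


print(circular_payments())
```

A graph query language would typically express the same idea as a single path pattern; the explicit search is used here only to keep the sketch self-contained.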

Circular Payments Graph Technology

Fraud Detection Use Case #4: VAT Fraud Detection

Because Europe has so many borders with different rules about who pays tax to which country when products are crossing borders, VAT (Value Added Tax) fraud detection can get very complicated.

In most cases, the importer should pay the VAT and if the products are exported to other countries, the exporter should receive a refund. But when there are other companies in between, deliberately obfuscating the process, it can get very complicated. The importing company delays paying the tax for weeks and months. The companies in the middle are paper companies. Eventually, the importing company vanishes and that company doesn’t pay VAT but is still able to get payment from the exporting company.

VAT Fraud Detection

This can be very difficult to decipher, but not with graph analysis. You can easily create a graph from the transactions: who are the resellers, and who is creating the companies?

In this real-life analysis, Oracle Practice Manager Wojciech Wcislo looked at the flow and how the flow works to identify suspicious companies. He then used an algorithm in Oracle Spatial and Graph to identify the middle man.

The graph view of VAT fraud detection:

Graph View of VAT Fraud Detection

A more complex view:

Complex View of Graph Technology and Anomaly Detection

In that case, you would:

  • Identify importers and exporters via simple query
  • Aggregate VAT invoice items as edge weights
  • Run Fattest Path Algorithm

And you will discover common “Middle Man” nodes where the flows are aggregated.
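To illustrate the steps above, here is a minimal fattest path (also called widest path) sketch in plain Python. It is not the Oracle Spatial and Graph implementation. It sums hypothetical invoice amounts into edge weights and then uses a Dijkstra-style search that maximizes the smallest edge weight along the path. The company names and amounts are made up.

```python
import heapq
from collections import defaultdict

# Hypothetical VAT invoice items: (seller, buyer, amount).
invoices = [
    ("importer", "shell_a", 60), ("importer", "shell_a", 40),
    ("importer", "trader_b", 20),
    ("shell_a", "exporter", 90),
    ("trader_b", "exporter", 20),
]

# Aggregate invoice items into edge weights.
weights = defaultdict(lambda: defaultdict(float))
for seller, buyer, amount in invoices:
    weights[seller][buyer] += amount


def fattest_path(source, target):
    """Return (bottleneck, path) maximizing the smallest edge weight on the path."""
    best = {source: float("inf")}
    previous = {}
    heap = [(-best[source], source)]  # max-heap via negated widths
    while heap:
        negative_width, node = heapq.heappop(heap)
        width = -negative_width
        if node == target:
            break
        if width < best.get(node, 0):
            continue  # stale heap entry
        for neighbor, weight in weights.get(node, {}).items():
            candidate = min(width, weight)
            if candidate > best.get(neighbor, 0):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (-candidate, neighbor))
    if target not in best:
        return 0, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


width, path = fattest_path("importer", "exporter")
print(width, path, "middle:", path[1:-1])
```

The intermediate nodes on the fattest path are the candidates for the "Middle Man" role, since they carry the largest aggregated flow between importer and exporter.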

Fraud Detection Use Case #5: Money Laundering and Financial Fraud

Conceptually, money laundering is pretty simple. Dirty money is passed around to blend it with legitimate funds and then turned into hard assets. This was the kind of process discovered in the Panama Papers analysis.

These tax evasion schemes often rely on false resellers and brokers who are able to apply for tax refunds to avoid payment.

But graphs and graph databases provide relationship models. They let you apply pattern recognition, classification, statistical analysis, and machine learning to these models, which enables more efficient analysis at scale against massive amounts of data.

In this use case, we’ll look more specifically at Case Correlation. Whenever there are transactions that regulations dictate are suspicious, those transactions get a closer look from human investigators. The goal here is to avoid inspecting each individual activity separately and instead group these suspicious activities together through pre-known connections.

Money Laundering and Financial Fraud

To find these correlations through a graph-based approach, we implemented this flow with general-purpose graph machinery, using a pattern-matching query (path finding) and a connected components graph algorithm (with filters).
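A simplified sketch of the connected components part of this flow is shown below. It assumes each suspicious activity (alert) is tied to one account and that pre-known connections between accounts carry a type that the filter can accept or reject. The alert names, connection types, and the `correlate_cases` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical flagged activities: alert id -> account involved.
alerts = {"alert1": "acc_a", "alert2": "acc_b", "alert3": "acc_c", "alert4": "acc_d"}

# Hypothetical pre-known connections: (account, account, connection_type).
connections = [
    ("acc_a", "acc_b", "same_owner"),
    ("acc_b", "acc_c", "shared_address"),
    ("acc_c", "acc_d", "same_bank"),
]


def correlate_cases(allowed_types):
    """Group alerts whose accounts are linked through allowed connection types."""
    neighbors = defaultdict(set)
    for left, right, kind in connections:
        if kind in allowed_types:  # the filter
            neighbors[left].add(right)
            neighbors[right].add(left)

    # Depth-first search labels every reachable account with its component root.
    component_of = {}
    for account in alerts.values():
        if account in component_of:
            continue
        component_of[account] = account
        stack = [account]
        while stack:
            current = stack.pop()
            for nxt in neighbors[current]:
                if nxt not in component_of:
                    component_of[nxt] = account
                    stack.append(nxt)

    cases = defaultdict(set)
    for alert, account in alerts.items():
        cases[component_of[account]].add(alert)
    return sorted(cases.values(), key=min)


print(correlate_cases({"same_owner", "shared_address"}))
```

Changing the set of allowed connection types changes which alerts are grouped into one case, which is one way to accommodate rules that differ between countries.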

Through this method, the company didn’t have to create its own custom case correlation engine because it could use graph technology, which offers more flexibility. This flexibility is important because different countries have different rules.

Conclusion

In today’s world, the scammers are getting ever more inventive. But the technology is too. Graph technology is an excellent way to discover the truth in data, and it is a tool that’s rapidly becoming more popular. If you’d like to learn more, you can find white papers, software downloads, documentation and more on Oracle’s Big Data Spatial and Graph pages.

And if you’re ready to get started with exploring your data now, we offer a free guided trial that enables you to build and experiment with your own data lake.


5 Graph Analytics Use Cases

According to Ernst and Young, $8.2 billion a year is lost to the marketing, advertising, and media industries through fraudulent impressions, infringed content, and malvertising.

The combination of fake news, trolls, bots and money laundering is skewing the value of information and could be hurting your business.

It’s avoidable.

By using graph technology and the data you already have on hand, you can discover fraud through detectable patterns and stop their actions.

We collaborated with Sungpack Hong, Director of Research and Advanced Development at Oracle Labs to demonstrate five examples of real problems and how graph technology and data are being used to combat them.

Get started with data—register for a guided trial to build a data lake

But first, a refresher on graph technology.

What Is Graph Technology?

With a graph technology, the basic premise is that you store, manage and query data in the form of a graph. Your entities become vertices (as illustrated by the red dots). Your relationships become edges (as represented by the red lines).

What Is Graph Technology

By analyzing these fine-grained relationships, you can use graph analysis to detect anomalies with queries and algorithms. We’ll talk about these anomalies later in the article.

The major benefit of graph databases is that they’re naturally indexed by relationships, which provides faster access to data (as compared with a relational database). You can also add data without doing a lot of modeling in advance. These features make graph technology particularly useful for anomaly detection—which is mainly what we’ll be covering in this article for our fraud detection use cases.

How to Find Anomalies with Graph Technology

Gartner 5 Layers of Fraud Detection

If you take a look at Gartner’s 5 Layers of Fraud Protection, you can see that they break the analysis to discover fraud into two categories:

  • Discrete data analysis where you evaluate individual users, actions, and accounts
  • Connected analysis where relationships and integrated behaviors facilitate the fraud

It’s this second category based on connections, patterns, and behaviors that can really benefit from graph modeling and analysis.

Through connected analysis and graph technology, you would:

  • Combine and correlate enterprise information
  • Model the results as a connected graph
  • Apply link and social network analysis for discovery

Now we’ll discuss examples of ways companies can apply this to solve real business problems.

Fraud Detection Use Case #1: Finding Bot Accounts in Social Networks

In the world of social media, marketers want to see what they can discover from trends. For example:

  • If I’m selling this specific brand of shoes, how popular will they be? What are the trends in shoes?
  • If I compare this brand with a competing brand, how do the results mirror actual public opinion?
  • On social media, are people saying positive or negative things about me? About my competitors?

Of course, all of this information can be incredibly valuable. At the same time, it can mean nothing if it’s all inaccurate and skewed by how much other companies are willing to pay for bots.

In this case, we worked with Oracle Marketing Cloud to ensure the information they’re delivering to advertisers is as accurate as possible. We sought to find the fake bot accounts that are distorting popularity.

As an example, there are bots that retweet certain target accounts to make them look more popular.

To determine which accounts are “real,” we created a graph between accounts with retweet counts as the edge weights to see how many times these accounts are retweeting their neighboring accounts. We found that the unnaturally popularized accounts exhibit different characteristics from naturally popular accounts.

Here is the pattern for a naturally popular account:

Naturally Popular Social Media Account

And here is the pattern for an unnaturally popular account:

Unnaturally Popular Social Media Account

When these accounts are all analyzed, there are certain accounts that have obviously unnatural deviation. And by using graphs and relationships, we can find even more bots by:

  • Finding accounts with a high retweet count
  • Inspecting how other accounts are retweeting them
  • Finding the accounts that also get retweets from only these bots

Fraud Detection Use Case #2: Identifying Sock Puppets in Social Media

In this case, we used graph technology to identify sockpuppet accounts (online identity used for purposes of deception or in this case, different accounts posting the same set of messages) that were working to make certain topics or keywords look more important by making it seem as though they’re trending.

Sock Puppet Accounts in Social Media

To discover the bots, we had to augment the graph from Use Case #1. Here we:

  • Added edges between the authors with the same messages
  • Counted the number of repeated messaged and filtered to discount accidental unison
  • Applied heuristics to avoid n2 edge generation per same message

Because we found that the messages were always the same, we were able to take that and create subgraphs using those edges and apply a connected components algorithm.

Sock Puppet Groups

As a result of all of the analysis that we ran on a small sampling, we discovered that what we thought were the most popular brands actually weren’t—our original list had been distorted by bots.

See the image below – the “new” most popular brands barely even appear on the “old” most popular brands list. But they are a much truer reflection of what’s actually popular. This is the information you need.

Brand Popularity Skewed by Bots

After one month, we revisited the identified bot accounts just to see what had happened to them. We discovered:

  • 89% were suspended
  • 2.2% were deleted
  • 8.8% were still serving as bots

Fraud Detection Use Case #3: Circular Payment

A common pattern in financial crimes, a circular money transfer essentially involves a criminal sending money to himself or herself—but hides it as a valid transfer between “normal” accounts. These “normal” accounts are actually fake accounts. They typically share certain information because they are generated from stolen identities (email addresses, addresses, etc.), and it’s this related information that makes graph analysis such a good fit to discover them.

For this use case, you can use graph representation by creating a graph from transitions between entities as well as entities that share some information, including the email addresses, passwords, addresses, and more. Once we create a graph out of it, all we have to do is write a simple query and run it to find all customers with accounts that have similar information, and of course who is sending money to each other.

Circular Payments Graph Technology

Fraud Detection Use Case #4: VAT Fraud Detection

Because Europe has so many borders with different rules about who pays tax to which country when products are crossing borders, VAT (Value Added Tax) fraud detection can get very complicated.

In most cases, the importer should pay the VAT and if the products are exported to other countries, the exporter should receive a refund. But when there are other companies in between, deliberately obfuscating the process, it can get very complicated. The importing company delays paying the tax for weeks and months. The companies in the middle are paper companies. Eventually, the importing company vanishes and that company doesn’t pay VAT but is still able to get payment from the exporting company.

VAT Fraud Detection

This can be very difficult to decipher—but not with graph analysis. You can easily create a graph by transactions; who are the resellers and who is creating the companies?

In this real-life analysis, Oracle Practice Manager Wojciech Wcislo looked at the flow and how the flow works to identify suspicious companies. He then used an algorithm in Oracle Spatial and Graph to identify the middle man.

The graph view of VAT fraud detection:

Graph View of VAT Fraud Detection

A more complex view:

Complex View of Graph Technology and Anomaly Detection

In that case, you would:

  • Identify importers and exporters via simple query
  • Aggregate of VAT invoice items as edge weights
  • Run Fattest Path Algorithm

And you will discover common “Middle Man” nodes where the flows are aggregated

Fraud Detection Use Case #5: Money Laundering and Financial Fraud

Conceptually, money laundering is pretty simple. Dirty money is passed around to blend it with legitimate funds and then turned into hard assets. This was the kind of process discovered in the Panama Papers analysis.

These tax evasion schemes often rely on false resellers and brokers who are able to apply for tax refunds to avoid payment.

But graphs and graph databases provide relationship models. They let you apply pattern recognition, classification, statistical analysis, and machine learning to these models, which enables more efficient analysis at scale against massive amounts of data.

In this use case, we’ll look more specifically at Case Correlation. In this case, whenever there are transactions that regulations dictate are suspicious, those transactions get a closer look from human investigators. The goal here is to avoid inspecting each individual activity separately but rather, group these suspicious activities together through pre-known connections.

Money Laundering and Financial Fraud

To find these correlations through a graph-based approach, we implemented this flow through general graph machines, using pattern matching query (path finding) and connected component graph algorithm (with filters).

Through this method, this company didn’t have to create their own custom case correlation engine because they could use graph technology, which has improved flexibility. This flexibility is important because different countries have different rules.

Conclusion

In today’s world, the scammers are getting ever more inventive. But the technology is too. Graph technology is an excellent way to discover the truth in data, and it is a tool that’s rapidly becoming more popular. If you’d like to learn more, you can find white papers, software downloads, documentation and more on Oracle’s Big Data Spatial and Graph pages.

And if you’re ready to get started with exploring your data now, we offer a free guided trial that enables you to build and experiment with your own data lake.

Related:

  • No Related Posts

5 Innovative Ways to Use Graph Analytics

According to Ernst and Young, $8.2 billion a year is lost to the marketing, advertising, and media industries through fraudulent impressions, infringed content, and malvertising.

The combination of fake news, trolls, bots and money laundering is skewing the value of information and could be hurting your business.

It’s avoidable.

By using graph technology and the data you already have on hand, you can discover fraud through detectable patterns and stop their actions.

We collaborated with Sungpack Hong, Director of Research and Advanced Development at Oracle Labs to demonstrate five examples of real problems and how graph technology and data are being used to combat them.

Get started with data—register for a guided trial to build a data lake

But first, a refresher on graph technology.

What Is Graph Technology?

With a graph technology, the basic premise is that you store, manage and query data in the form of a graph. Your entities become vertices (as illustrated by the red dots). Your relationships become edges (as represented by the red lines).

What Is Graph Technology

By analyzing these fine-grained relationships, you can use graph analysis to detect anomalies with queries and algorithms. We’ll talk about these anomalies later in the article.

The major benefit of graph databases is that they’re naturally indexed by relationships, which provides faster access to data (as compared with a relational database). You can also add data without doing a lot of modeling in advance. These features make graph technology particularly useful for anomaly detection—which is mainly what we’ll be covering in this article for our fraud detection use cases.

How to Find Anomalies with Graph Technology

Gartner 5 Layers of Fraud Detection

If you take a look at Gartner’s 5 Layers of Fraud Protection, you can see that they break the analysis to discover fraud into two categories:

  • Discrete data analysis where you evaluate individual users, actions, and accounts
  • Connected analysis where relationships and integrated behaviors facilitate the fraud

It’s this second category based on connections, patterns, and behaviors that can really benefit from graph modeling and analysis.

Through connected analysis and graph technology, you would:

  • Combine and correlate enterprise information
  • Model the results as a connected graph
  • Apply link and social network analysis for discovery

Now we’ll discuss examples of ways companies can apply this to solve real business problems.

Fraud Detection Use Case #1: Finding Bot Accounts in Social Networks

In the world of social media, marketers want to see what they can discover from trends. For example:

  • If I’m selling this specific brand of shoes, how popular will they be? What are the trends in shoes?
  • If I compare this brand with a competing brand, how do the results mirror actual public opinion?
  • On social media, are people saying positive or negative things about me? About my competitors?

Of course, all of this information can be incredibly valuable. At the same time, it can mean nothing if it’s all inaccurate and skewed by how much other companies are willing to pay for bots.

In this case, we worked with Oracle Marketing Cloud to ensure the information they’re delivering to advertisers is as accurate as possible. We sought to find the fake bot accounts that are distorting popularity.

As an example, there are bots that retweet certain target accounts to make them look more popular.

To determine which accounts are “real,” we created a graph between accounts with retweet counts as the edge weights to see how many times these accounts are retweeting their neighboring accounts. We found that the unnaturally popularized accounts exhibit different characteristics from naturally popular accounts.

Here is the pattern for a naturally popular account:

Naturally Popular Social Media Account

And here is the pattern for an unnaturally popular account:

Unnaturally Popular Social Media Account

When these accounts are all analyzed, there are certain accounts that have obviously unnatural deviation. And by using graphs and relationships, we can find even more bots by:

  • Finding accounts with a high retweet count
  • Inspecting how other accounts are retweeting them
  • Finding the accounts that also get retweets from only these bots

Fraud Detection Use Case #2: Identifying Sock Puppets in Social Media

In this case, we used graph technology to identify sockpuppet accounts (online identity used for purposes of deception or in this case, different accounts posting the same set of messages) that were working to make certain topics or keywords look more important by making it seem as though they’re trending.

Sock Puppet Accounts in Social Media

To discover the bots, we had to augment the graph from Use Case #1. Here we:

  • Added edges between the authors with the same messages
  • Counted the number of repeated messaged and filtered to discount accidental unison
  • Applied heuristics to avoid n2 edge generation per same message

Because we found that the messages were always the same, we were able to take that and create subgraphs using those edges and apply a connected components algorithm.

Sock Puppet Groups

As a result of all of the analysis that we ran on a small sampling, we discovered that what we thought were the most popular brands actually weren’t—our original list had been distorted by bots.

See the image below – the “new” most popular brands barely even appear on the “old” most popular brands list. But they are a much truer reflection of what’s actually popular. This is the information you need.

Brand Popularity Skewed by Bots

After one month, we revisited the identified bot accounts just to see what had happened to them. We discovered:

  • 89% were suspended
  • 2.2% were deleted
  • 8.8% were still serving as bots

Fraud Detection Use Case #3: Circular Payment

A common pattern in financial crimes, a circular money transfer essentially involves a criminal sending money to himself or herself—but hides it as a valid transfer between “normal” accounts. These “normal” accounts are actually fake accounts. They typically share certain information because they are generated from stolen identities (email addresses, addresses, etc.), and it’s this related information that makes graph analysis such a good fit to discover them.

For this use case, you can use graph representation by creating a graph from transitions between entities as well as entities that share some information, including the email addresses, passwords, addresses, and more. Once we create a graph out of it, all we have to do is write a simple query and run it to find all customers with accounts that have similar information, and of course who is sending money to each other.

Circular Payments Graph Technology

Fraud Detection Use Case #4: VAT Fraud Detection

Because Europe has so many borders with different rules about who pays tax to which country when products are crossing borders, VAT (Value Added Tax) fraud detection can get very complicated.

In most cases, the importer should pay the VAT and if the products are exported to other countries, the exporter should receive a refund. But when there are other companies in between, deliberately obfuscating the process, it can get very complicated. The importing company delays paying the tax for weeks and months. The companies in the middle are paper companies. Eventually, the importing company vanishes and that company doesn’t pay VAT but is still able to get payment from the exporting company.

VAT Fraud Detection

This can be very difficult to decipher, but not with graph analysis. You can easily create a graph from the transactions and then ask: who are the resellers, and who is creating the companies?

In this real-life analysis, Oracle Practice Manager Wojciech Wcislo looked at the flow and how the flow works to identify suspicious companies. He then used an algorithm in Oracle Spatial and Graph to identify the middle man.

The graph view of VAT fraud detection:

Graph View of VAT Fraud Detection

A more complex view:

Complex View of Graph Technology and Anomaly Detection

In that case, you would:

  • Identify importers and exporters via a simple query
  • Aggregate VAT invoice items as edge weights
  • Run the Fattest Path algorithm

And you will discover common “Middle Man” nodes where the flows are aggregated.
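A fattest path (also called a widest path) is the route between two nodes whose smallest edge weight is as large as possible. The sketch below shows the last two steps in plain Python; it is not the Oracle Spatial and Graph implementation, and the invoice tuples, company names, and function names are made up for illustration.

```python
import heapq
from collections import defaultdict


def aggregate_invoices(invoices):
    """Sum VAT amounts per (seller, buyer) pair to get edge weights."""
    totals = defaultdict(float)
    for seller, buyer, vat in invoices:
        totals[(seller, buyer)] += vat
    return dict(totals)


def fattest_path(edges, source, target):
    """Return (path, width) maximizing the smallest edge weight on the path."""
    graph = defaultdict(list)
    for (u, v), weight in edges.items():
        graph[u].append((v, weight))

    best = {source: float("inf")}   # widest known bottleneck per node
    previous = {}
    heap = [(-best[source], source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == target:
            break
        if width < best.get(node, 0):
            continue  # stale heap entry
        for nxt, weight in graph[node]:
            candidate = min(width, weight)
            if candidate > best.get(nxt, 0):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(heap, (-candidate, nxt))

    if target not in best:
        return None, 0
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


sample_invoices = [
    ("importer", "shell_a", 60), ("importer", "shell_a", 40),
    ("shell_a", "middle", 90), ("middle", "exporter", 95),
    ("importer", "trader", 20), ("trader", "exporter", 80),
]
path, width = fattest_path(aggregate_invoices(sample_invoices),
                           "importer", "exporter")
print(path, width)
```

Running this for many importer and exporter pairs and counting which intermediate companies keep showing up on the returned paths is one way to surface the "Middle Man" candidates.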

Fraud Detection Use Case #5: Money Laundering and Financial Fraud

Conceptually, money laundering is pretty simple. Dirty money is passed around to blend it with legitimate funds and then turned into hard assets. This was the kind of process discovered in the Panama Papers analysis.

These tax evasion schemes often rely on false resellers and brokers who are able to apply for tax refunds to avoid payment.

But graphs and graph databases provide relationship models. They let you apply pattern recognition, classification, statistical analysis, and machine learning to these models, which enables more efficient analysis at scale against massive amounts of data.

In this use case, we’ll look more specifically at case correlation. Whenever there are transactions that regulations flag as suspicious, those transactions get a closer look from human investigators. The goal here is to avoid inspecting each individual activity separately and instead group these suspicious activities together through previously known connections.

Money Laundering and Financial Fraud

To find these correlations through a graph-based approach, we implemented this flow on a general-purpose graph engine, using a pattern-matching query (path finding) and a connected components graph algorithm (with filters).

Through this method, this company didn’t have to create their own custom case correlation engine because they could use graph technology, which has improved flexibility. This flexibility is important because different countries have different rules.
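As a rough illustration of case correlation, the sketch below groups suspicious-activity alerts that are connected through shared entities, and uses a filter to decide which kinds of connection count. The alert IDs, connection kinds, and the `allowed_kinds` parameter are hypothetical; the filter simply stands in for rules that may differ from one jurisdiction to another.

```python
from collections import defaultdict, deque


def correlate_cases(links, allowed_kinds):
    """Group alerts that are connected through allowed shared entities.

    links: iterable of (alert_id, kind, value), e.g. ("t1", "account", "A").
    allowed_kinds: connection kinds that may be used to merge alerts.
    """
    by_alert = defaultdict(set)
    by_entity = defaultdict(set)
    alerts = set()
    for alert, kind, value in links:
        alerts.add(alert)
        if kind in allowed_kinds:  # the filter step
            by_alert[alert].add((kind, value))
            by_entity[(kind, value)].add(alert)

    seen = set()
    cases = []
    for start in sorted(alerts):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        case = []
        queue = deque([start])
        while queue:
            alert = queue.popleft()
            case.append(alert)
            for entity in by_alert[alert]:
                for other in by_entity[entity]:
                    if other not in seen:
                        seen.add(other)
                        queue.append(other)
        cases.append(sorted(case))
    return cases


sample_links = [
    ("t1", "account", "A"), ("t2", "account", "A"),
    ("t2", "phone", "555-0100"), ("t3", "phone", "555-0100"),
    ("t4", "country", "XX"), ("t5", "country", "XX"),
]
print(correlate_cases(sample_links, {"account", "phone"}))
```

Changing the rule set is then a matter of passing a different `allowed_kinds` value rather than rewriting the correlation logic.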

Conclusion

In today’s world, the scammers are getting ever more inventive. But the technology is too. Graph technology is an excellent way to discover the truth in data, and it is a tool that’s rapidly becoming more popular. If you’d like to learn more, you can find white papers, software downloads, documentation and more on Oracle’s Big Data Spatial and Graph pages.

And if you’re ready to get started with exploring your data now, we offer a free guided trial that enables you to build and experiment with your own data lake.

Association Rules in Machine Learning, Simplified

Peter Jeffcock

Big Data Product Marketing

You’ve probably been to a supermarket that printed coupons for you at checkout. Or listened to a playlist that your streaming service generated for you. Or gone shopping online and seen a list of products labeled “you might be interested in….” that did indeed contain some stuff that you were interested in.

Recommendation engines take data about you, similar consumers, and available products, and use that to figure out what you might be interested in and therefore deliver those coupons, playlists, and suggestions.

Download your free ebook, “Demystifying Machine Learning.”

Recommendation engines can be extremely complex. For example, Netflix ran a $1M competition from 2006 to 2009 to improve their movie recommendation engine performance. Over 5,000 teams participated. The winning team combined results from 107 different algorithms or techniques to deliver the 10 percent improvement and claim the prize.

So, there are many different ways to build a recommendation engine and most will combine multiple techniques or approaches. In this article, I want to cover just one approach, association rules, which are fairly easy to understand and require minimal skills in mathematics. If you can work with simple percentages, there’s nothing more complex than that below.

Association Rules in the Real World

Conceptually, association rules are a very simple technique. The end result is one or more statements of the form “if this happened, then the following is likely to happen.” In a rule, the “if” portion is called the antecedent, and the “then” portion is called the consequent. Remember those two terms because they are going to come up in the descriptions below. Let’s start with food shopping because association rules are very often used to analyze the contents of your shopping cart.

As you make your shopping list, you probably buy a mix of pantry staples as well as ingredients for a specific meal or dish that you plan to prepare. Imagine you plan to make tomato sauce for pizza or a pasta dish. You’re probably going to buy tomatoes, onions, garlic, maybe olive oil or fresh basil. You’re far from the only person making tomato sauce and many others will have similar sets of ingredients.

Machine Learning Pizza

If we looked at all the various shopping baskets that people purchased, we could start to see some things in common. “If somebody buys canned tomatoes, then they are more likely to buy dried pasta (or onions or garlic or pizza dough or …)”. Armed with this knowledge, a supermarket could print you a coupon at checkout for something you didn’t purchase, hoping that you would come back. Or a manufacturer might offer you a coupon for their pre-made tomato sauce for those nights when you don’t want to make it from scratch.

Although tomatoes might imply garlic and/or basil, the reverse may not be true. For example, somebody buying garlic and basil could be looking to make pesto, in which case they’d be more likely to buy pine nuts than tomatoes. But with the right analysis, it would be possible to find the rules governing which products were more likely to be associated with each other. Hence the name “association rules”.

Let’s illustrate this process with some real numbers. And to do so we’ll move from buying groceries to watching movies.

How Do Association Rules Work in Machine Learning, Exactly?

The starting point for this algorithm is a collection of transactions. They could be traditional purchase transactions, but could also include events like “put a product in an online shopping cart,” “clicked on a web ad” or, in this case, “watched a movie.”

I’ll use this very abbreviated data set of movie watching habits of five people. I’ve anonymized them to hide their identities (not that this approach always works). Here you see each person and the list of movies they have watched, represented by the numbers 1 to 4.

User    Movies Watched
A       1, 2, 4
B       1, 3
C       1, 4
D       2, 3, 4
E       3, 4

As you work your way down that table the first thing to stand out is that the first and third users both watched movies 1 and 4. From this data, the rule there would be “if somebody watches movie number 1, then they are likely to watch movie number 4.” You’ll need to understand the two terms I snuck in above: movie 1 is the antecedent and movie 4 is the consequent. Let’s look at this rule in more detail.

How useful is this rule? There are 2 users out of 5 who watched both movies 1 and 4. So we can say that this rule has support of 40 percent (2 out of 5 users). How confident are we that it’s a reliable indicator? Three users watched movie number 1, but only 2 of them also watched number 4. The confidence in this rule is 67 percent.

Note that if you reverse the order of the rule (or swap the antecedent and consequent if you prefer) we can also say that “if somebody watched movie number 4 then they are likely to watch movie number 1.” However, while the support is also 40 percent, the confidence changes and is now only 50 percent (check the table above to see how that came about). This is the same process as in the example with tomato sauce and pesto above.
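If you’d like to check those percentages yourself, here is a small Python sketch that computes support and confidence directly from the table. The function and variable names are my own choices for illustration.

```python
# The five users and the movies they watched, copied from the table.
watched = {
    "A": {1, 2, 4},
    "B": {1, 3},
    "C": {1, 4},
    "D": {2, 3, 4},
    "E": {3, 4},
}


def support(items):
    """Fraction of users who watched every movie in `items`."""
    items = set(items)
    return sum(items <= movies for movies in watched.values()) / len(watched)


def confidence(antecedent, consequent):
    """Of the users who watched the antecedent, the share who also
    watched the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


print(f"support(1 and 4)   = {support({1, 4}):.0%}")        # 40%
print(f"confidence(1 -> 4) = {confidence({1}, {4}):.0%}")   # 67%
print(f"confidence(4 -> 1) = {confidence({4}, {1}):.0%}")   # 50%
```

Notice that support is the same whichever way round you write the rule, while confidence depends on which movie is the antecedent.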

What do these metrics mean? With just 5 users and 4 movies it might be hard to see, but imagine this is a subset of many millions of users and thousands of movies. If the support is very low, it basically means that this rule will not apply to many customers. For example, it might mean that people who watch some obscure 70s documentary will also watch an equally obscure 80s film. In the movie recommendation space, this would translate to a niche rule that might not get used very often, but could be quite valuable to that very small subset of customers. However, if you were using rules to find the optimal placement of products on the shelves in a supermarket, lots of low support rules would lead to a very fragmented set of displays. In this kind of application, you might set a threshold for support and discard rules that didn’t meet that minimum.

How to Understand Confidence in Association Rules

Confidence is a little easier to understand. If there’s a rule linking two movies but with very low confidence, then it simply means that most people who watch the first movie won’t actually watch the second one. For the purpose of making recommendations or predictions, you’d much rather have a rule that you were confident about. You could also use a minimum threshold for confidence and ignore or discard rules that fall below it.
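Here is a minimal sketch of what applying both thresholds might look like for rules of length 2. The threshold values (40 percent support, 60 percent confidence) are arbitrary numbers picked for this tiny data set, and a real system would use a dedicated algorithm rather than this brute-force loop.

```python
from itertools import permutations

# One set of watched movies per user, in the order of the table above.
baskets = [{1, 2, 4}, {1, 3}, {1, 4}, {2, 3, 4}, {3, 4}]


def mine_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples for
    every single-movie rule that clears both thresholds."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, c in permutations(items, 2):
        both = sum(1 for b in baskets if a in b and c in b)
        has_a = sum(1 for b in baskets if a in b)
        support = both / n
        confidence = both / has_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, c, support, confidence))
    return rules


for a, c, s, conf in mine_rules(baskets):
    print(f"if {a} then {c}: support {s:.0%}, confidence {conf:.0%}")
```

Raising either threshold shrinks the list; lowering them lets the niche rules back in.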

Take another look at the first rule from above: if somebody watches movie 1 they will also watch movie 4. The confidence here is 67 percent which is pretty good. But take a look at the rest of the table. Four out of 5 users watched movie number 4 anyway. If we know nothing else about their other movie preferences, we know that there’s an 80 percent chance of them watching movie 4. So despite that confidence of 67 percent that first rule we found is actually not useful: somebody who has watched movie 1 is actually less likely to watch movie 4 than somebody picked at random. Fortunately, there’s a way to take this into account. It’s called “lift”.

Lift in Association Rules

Lift is used to measure the performance of the rule when compared against the entire data set. In the example above, we would want to compare the probability of “watching movie 4, given that movie 1 was watched” with the probability of “watching movie 4” occurring in the dataset as a whole. As you might expect, there’s a formula for lift:

Lift is equal to the probability of the consequent given the antecedent (that’s just the confidence for that rule) divided by probability of that consequent occurring in the entire data set (which is the support for the consequent), or more concisely:

Lift = confidence / support(consequent)

In this example, the probability of movie 4, given that movie 1 was watched, is just the confidence of that first rule: 67 percent or 0.67. The probability of some random person in the entire dataset (of just 5 users in this simple example) watching movie 4 is 80 percent or 0.8. Dividing 0.67 by 0.8 gives a lift of approximately 0.84 (or about 0.83 if you divide the unrounded 2/3 by 0.8).

In general, if you have a lift of less than 1, it shows a rule that is less predictive than just picking a user at random, which is the case with this rule, as I explained at the end of the previous section. If you have a lift of around 1, then it’s indicating two independent events, e.g., watching one movie does not influence the likelihood of watching another. Values of lift that are greater than 1 show that the antecedent does make the consequent more likely. In other words, here is a rule that is useful.
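The lift formula is easy to try out on the five-user table. The sketch below is just one way to write it; the function name and the list-of-sets layout are my own. Because it works with the unrounded confidence of 2/3, the first rule comes out at about 0.83.

```python
# One set of watched movies per user, in the order of the table above.
baskets = [{1, 2, 4}, {1, 3}, {1, 4}, {2, 3, 4}, {3, 4}]


def lift(antecedent, consequent, baskets):
    """confidence(antecedent -> consequent) / support(consequent)."""
    has_antecedent = [b for b in baskets if antecedent in b]
    confidence = sum(consequent in b for b in has_antecedent) / len(has_antecedent)
    consequent_support = sum(consequent in b for b in baskets) / len(baskets)
    return confidence / consequent_support


print(f"lift(1 -> 4) = {lift(1, 4, baskets):.2f}")  # 0.83, below 1
print(f"lift(2 -> 4) = {lift(2, 4, baskets):.2f}")  # 1.25, above 1
```

So on this data, having watched movie 2 is a better hint that someone will watch movie 4 than having watched movie 1, even though both rules share the same 40 percent support.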

Testing and Binning Association Rules

I’ll finish with a few more tips and extensions to the simple example above.

First, how do you test the accuracy of your rules? You can just use the same approach that I previously outlined for classification: build your rules with a subset of the available transactions, and then test the performance of those rules against the remainder. And of course, you should monitor their performance if they are used to make recommendations to actual users.
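A bare-bones version of that build-and-test split might look like the sketch below. The 30 percent holdout size, the fixed random seed, and the function names are all assumptions made for the example; the point is only that a rule’s confidence is re-measured on transactions that played no part in creating it.

```python
import random


def split_transactions(transactions, test_fraction=0.3, seed=0):
    """Shuffle once, then hold back a fraction of transactions for testing.

    Returns (training transactions, held-out transactions).
    """
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def holdout_confidence(antecedent, consequent, holdout):
    """Confidence of one rule measured on held-out transactions.

    antecedent and consequent are sets of items. Returns None when the
    antecedent never appears in the held-out data.
    """
    applicable = [t for t in holdout if antecedent <= t]
    if not applicable:
        return None
    return sum(consequent <= t for t in applicable) / len(applicable)


# Example: score the rule "if 1 then 4" on some unseen transactions.
unseen = [{1, 4}, {1, 3}, {1, 2, 4}, {2, 3}]
print(holdout_confidence({1}, {4}, unseen))
```

If the confidence measured on the held-out data is far below what you saw while building the rule, that rule probably won’t hold up in production either.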

We worked with a simple rule above: “if user watched movie 1 then they are likely to watch movie 4.” This is referred to as a rule of length 2, because it incorporates two elements. Of course, we could have more complex rules: “if a user watched movies 1, 2 and 3, then they are likely to watch movie 4” is a rule of length 4. Or if you want to go back to grocery shopping for a similar rule, somebody who buys tomatoes, garlic, and pasta is likely to want some parmesan cheese to go with their spaghetti dinner.

Some streaming sites ask users to rate the things they watch on a scale of 1-5 or 1-10. If we had that information we couldn’t use the numeric value directly; we’d want to “bin” the answers. For example, we might say that a score of 7-10 was considered “high” and so on. A rule might then incorporate “watched movie A and rated it high”.

The concept of binning applies to the last example, which takes us well away from movies and shopping to machines because these rules potentially have wider uses.

Imagine you’re responsible for maintaining some machine that breaks from time to time. You have lots of sensor data and other information about its operation, and you’ve captured several failures. You could in principle treat failure as a consequent and search for the antecedents. You’d have to bin the sensor data in some way (flow rate of 7.3 to 11.4 goes into this bin etc.). In principle you could use association rules to find the conditions that are associated with failure and take corrective action, also referred to as root cause analysis. Replace mechanical failure with diagnosis and you could even use some form of association rules with medical data.
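Binning itself takes very little code. The sketch below builds a small helper around Python’s standard `bisect` module and applies it to both examples. Only the “7-10 is high” rating bin and the 7.3 and 11.4 flow-rate boundaries come from the text above; the other cut-offs, the labels, and the choice to put a boundary value in the upper bin are assumptions.

```python
from bisect import bisect_right


def make_binner(edges, labels):
    """Build a function that maps a number to a bin label.

    edges: sorted lower bounds of every bin after the first.
    labels: one label per bin, so len(labels) == len(edges) + 1.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")

    def bin_value(value):
        # A value equal to an edge falls into the bin that starts there.
        return labels[bisect_right(edges, value)]

    return bin_value


# Ratings on a 1-10 scale: 1-3 low, 4-6 medium, 7-10 high.
bin_rating = make_binner([4, 7], ["low", "medium", "high"])

# Sensor flow rate: below 7.3, from 7.3 up to 11.4, and 11.4 or more.
bin_flow = make_binner([7.3, 11.4], ["flow_low", "flow_mid", "flow_high"])

# A binned reading becomes a set of items, just like a shopping basket.
reading = {bin_flow(9.1), "failure"}
print(bin_rating(8), sorted(reading))
```

Once every reading is a set of labels like this, “failure” can be treated as the consequent and the same support, confidence, and lift calculations apply.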

Learn More

Visit some of our previous articles for high level overviews of machine learning, a look at decision trees, or k-means clustering. If you’d like to find out more about Oracle’s ML technologies you can look at our support for R as well as Advanced Analytics inside Oracle Database. If you’re ready to get started with machine learning, try Oracle Cloud for free and build your own data lake to test out some of these techniques.

Every CEO is in the Data Security Business!

Ilona Gabinsky

Principal Product Marketing Manager

What would you consider your most valuable resource today? What about the world’s? Turns out they are one and the same: data. According to the Economist, data has replaced oil as the world’s most valuable resource. During every millisecond of our IT world, data is being collected on nearly every activity, creating a virtually limitless resource with equally unlimited demand.

The New Data Economy

Data drives improvement to products and services, which drives stronger customer adoption, resulting in even more data collection. The more data Tesla can collect on its self-driving cars, the better the cars will perform, and the better Tesla will outperform its competitors. This new data economy has changed the competitive landscape of the tech world. Whoever can acquire the most data the fastest—whether they collect it on their own or purchase it elsewhere—stands to win.

But this new wealth of data comes with equally expensive risks. With every opportunity to collect more data comes the opportunity for hackers to steal it. Cyber attacks like the Equifax breach or WannaCry cost companies billions and can impose real hardship on the lives of individuals. Needless to say, companies stand to lose more than just money in the face of a cyber attack: a major breach can tarnish a company’s image, damage its credibility, and weaken consumer confidence, causing lasting harm to its brand and reputation.

Cyber Risk is Growing

Smart companies aren’t preparing for the “if” of a cyber attack, but the “when,” because hackers are constantly on the lookout to steal or compromise data. In a world where the next cyber attack could strike at any time, causing a data breach that costs your company millions, how should your company defend itself? Even the most secure firewalls, intrusion detection, and data loss prevention solutions can’t protect against an employee accidentally downloading malware by clicking on the wrong email, a security patch that isn’t applied in a timely manner, or a simple misconfiguration that leaves an entire database open to intrusion. You could hire an army of IT security professionals, but even they wouldn’t be able to manage the tens of thousands of security alerts that come into most security operations centers from today’s hybrid IT systems. Simply put, if you can’t protect your data, your company and your job are at risk.

But what if you could automate the fight? Get ahead of the hacker with a built-in army of “robot cyber warriors” to protect your data automatically, have all patches automatically applied, configurations self-tuned and optimized?

Protect Your Data from the Inside Out

Last October, Oracle introduced the world’s first Autonomous Self-Securing Database. Oracle Autonomous Database is designed with built-in adaptive machine learning to protect you from external hackers as well as malicious internal users. It encrypts all of your data automatically, ensuring comprehensive data protection. It applies security updates automatically while your system is running, with no downtime.

24/7 Data Protection

How many unpatched IT assets does your company have right now? Even one is too many, because if it is detected, it can be exploited. Oracle Autonomous Database patches itself as soon as a patch is available, without relying on humans remembering to apply it. We all know that unpatched systems leave companies open to attack, yet sometimes leaving security vulnerabilities unaddressed is a company’s only option. Your IT team needs time to bring the system down, patch it, and get it up and running again. During every minute of that time, all of your applications, databases, software, and servers are vulnerable to exploitation by hackers. With Oracle Autonomous Database, patching happens automatically while the system is running, without any downtime.

Secure Your Tomorrow Today

Protecting your most valuable assets has never been more important. And it’s something even the most powerful IT security team can’t do on their own. By facing the reality of a potential data breach with the power of a virtual cyber robot army, you protect your data, your brand, and your reputation. Oracle Autonomous Database eliminates the chance of human error and protects your data from malicious actors, both internal and external. You can’t stop a hacker from trying, but with Oracle Autonomous Database, you will always stop them from succeeding.

Join a Database month event: Inside the Mind of a Database Hacker with Penny Avril, VP of Oracle Database Server Technologies and Mark Fallon, Lead Security Architect, Oracle Database

Securing the Oracle Database eBook – Second Edition Now Available

Ilona Gabinsky

Principal Product Marketing Manager

Today we have guest blogger – Michael Mesaros – director of Product Management.

What every data owner should read before hackers and auditors come knocking!

According to the Economist, data has surpassed oil as the most valuable asset. Data gives organizations unprecedented advantages, enabling them to find new ways to serve customers and create value. Your data is your asset, but unless you protect it well it could fall into the wrong hands and become a liability.

We hear reports about breaches almost daily and by some estimates on average over 10 million records are lost or stolen each day worldwide. In addition, new laws and regulations such as the European Union’s GDPR are forcing organizations to take a hard look at how they manage and protect data. Since databases contain most of their sensitive data assets, organizations are now appreciating the importance of securing their databases.

Oracle Database provides the industry’s most comprehensive security. Read the latest eBook from Oracle, Securing the Oracle Database: A Technical Primer, authored by the Oracle Database Security Product Management team to:

  • Learn the various approaches hackers use to try to gain access to your sensitive data.
  • Understand the multiple layers of assessment, preventive, and detective security controls you need to protect your data.
  • Guide your teams with strategies to shrink the attack surface and keep your databases secure, both on-premises and in the cloud.

Use this book as a quick study into what every Database or Security Director/VP should know about the security of Oracle Databases. This book will help you answer questions such as:

  1. What are my options for authenticating and authorizing database users?
  2. How do I enforce separation of duties and limit access to data by administrators and other privileged users?
  3. How can I leverage encryption and key management to protect data in motion and at rest?
  4. How do I create application data sets that are safe to use in test, development and production environments?
  5. How do I audit database user activities and generate management and compliance reports?
  6. How do I monitor database activity and protect from attacks such as SQL injection?
  7. How do I leverage authorization technologies to build secure applications?
  8. How can I evaluate the security posture of my database, and understand what controls I can implement to manage risk?
  9. What is EU GDPR, and how can database security technologies help with this and other regulatory compliance requirements?
  10. What do I need to know about securing databases in the cloud?

Breaches are happening faster than ever and it is crucial that you are prepared with a sound database security strategy. Hackers aren’t resting in their endless quest to acquire your data, and we cannot risk resting either. Arm yourself with up-to-date information about these database security concepts.

Let’s start by securing the source! Download your eBook

Join a Database month event: Inside the Mind of a Database Hacker with Penny Avril, VP of Oracle Database Server Technologies and Mark Fallon, Lead Security Architect, Oracle Database

Learn more about Oracle Database Security Solutions

The Future of Data Management is Autonomous.

Ilona Gabinsky

Principal Product Marketing Manager

Gartner again names Oracle as a Leader in Data Management Solutions for Analytics. At Oracle, we believe that we continue to demonstrate superior ability to execute by delivering ground-breaking innovations to the industry. Oracle revolutionized data management with the delivery of the world’s first autonomous database.

Oracle Autonomous Database Cloud uses ground-breaking machine learning to enable automation that eliminates human labor, human error, and manual tuning to enable unprecedented availability, high performance, and security at much lower cost. “The Oracle Autonomous Database is based on a technology as revolutionary as the internet,” says Larry Ellison, Oracle Executive Chairman and CTO.

Autonomous Database:

  • Uses machine learning to automatically upgrade, patch, and tune itself
  • Recognizes unusual behavior and fixes problems before they become outages, ensuring 99.995% reliability
  • Encrypts data by default, applies security patches automatically and protects from internal and external attacks

Transform Your Data Management Today.

Read the Oracle newsletter featuring Gartner content for more details.

https://www.gartner.com/technology/media-products/newsletters/oracle/1-4TXFZ4K/index.html

Gartner Magic Quadrant for Data Management Solutions for Analytics, Adam M. Ronthal, Roxane Edjlali, Rick Greenwald, 13 February 2018.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
