Machine Learning on Autonomous Database: A Practical Example

The Dataset Import and Preparation

The dataset used for building a network intrusion detection classifier is the classic KDD dataset (you can download it here), first released for the 1999 KDD Cup, with 125,973 records in the training set. It was built for the DARPA Intrusion Detection Evaluation Program by MIT Lincoln Laboratory. It provides raw tcpdump traffic from a local area network (LAN) that contains, as reported here, normal traffic and attacks falling into four main categories:

  • DOS: denial-of-service;
  • R2L: unauthorized access from a remote machine;
  • U2R: unauthorized access to local superuser (root) privileges;
  • Probing: surveillance.

The dataset is already split into training and test sets.

The training dataset contains 22 attack sub-classes plus one “normal” class for allowed traffic. The list of attacks and their mapping to the four categories above is held in this file.

In the test dataset we find 37 kinds of attacks, so we have to delete the records whose class types are not included in the training set, to avoid degrading the quality of the final test of model accuracy (a cleanup sketch follows).
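
Once both tables have been imported (as described below), a minimal cleanup sketch could look like the following; kdd_train and kdd_test are the tables created later in this article, and type is the class column:

%sql
DELETE FROM kdd_test
WHERE type NOT IN (SELECT DISTINCT type FROM kdd_train)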

Let’s describe how to import these three main files (the training dataset, the test dataset, and the dictionary) into an Autonomous DB instance, and how to analyze and prepare them before finally feeding them to the chosen algorithm for training.

Oracle Cloud Infrastructure offers an Object Storage service in which we can upload files and get a URL to use in our notebook to execute the import. In this way you have a secure and managed environment to store datasets that will be used by data scientists, without losing the governance of a data lab. In the OCI console, look for the Object Storage page as shown here:

Oracle Cloud Infrastructure console

and upload the files: training_attack_types, KDDTest+.txt, KDDTrain+.txt previously downloaded using the url links reported above:

Upload a file into Oracle Object Storage

You can view the details of the uploaded file and a preview of its content:

Object Details

Now that your files are in Object Storage, you need to grant the permissions that allow the import from your notebook.

To do this, you have to generate an auth token that you will use in the API call. From the OCI console:

User Details Menu

Select your profile to access your “Identity/Users/User Details” administration page. On this page, click the “Resources/Auth Tokens” menu in the lower-left corner of the page:

In this way you will be able to generate a token that you will provide as the credential to access your Object Storage files from the PL/SQL notebook.

Create a new notebook and, as your first paragraph, you can put something like this:

%script
BEGIN
DBMS_CLOUD.DROP_CREDENTIAL(credential_name => 'CRED_KDD');
DBMS_CLOUD.CREATE_CREDENTIAL(
  credential_name => 'CRED_KDD',   -- Credential Token
  username        => 'oracleidentitycloudservice/email@domain.com',
  password        => '***************'  -- Auth Token
);
END;

where you have to set:

  • a credential named “CRED_KDD”, that you’ll use later to get files;
  • your user name ‘oracleidentitycloudservice/email@domain.com’, which you’ll find on the “Identity/Users/User Details” administration page;
  • the auth token ‘*********************’ generated before.

Unlike pandas.read_csv(), you have to prepare a table matching the file you are going to import from Object Storage. To avoid any problems during the import, I suggest using the NUMBER type for continuous fields and VARCHAR2(4000) for categorical fields.

This is the paragraph for that:

%sql
create table kdd_train (
  duration NUMBER, protocol_type VARCHAR2(4000), service VARCHAR2(4000), flag VARCHAR2(4000),
  src_bytes NUMBER, dst_bytes NUMBER, land VARCHAR2(4000), wrong_fragment NUMBER, urgent NUMBER,
  hot NUMBER, num_failed_logins NUMBER, logged_in VARCHAR2(4000), num_compromised NUMBER,
  root_shell NUMBER, su_attempted NUMBER, num_root NUMBER, num_file_creations NUMBER,
  num_shells NUMBER, num_access_files NUMBER, num_outbound_cmds NUMBER, is_host_login NUMBER,
  is_guest_login NUMBER, count NUMBER, srv_count NUMBER, serror_rate NUMBER,
  srv_serror_rate NUMBER, rerror_rate NUMBER, srv_rerror_rate NUMBER, same_srv_rate NUMBER,
  diff_srv_rate NUMBER, srv_diff_host_rate NUMBER, dst_host_count NUMBER, dst_host_srv_count NUMBER,
  dst_host_same_srv_rate NUMBER, dst_host_diff_srv_rate NUMBER, dst_host_same_src_port_rate NUMBER,
  dst_host_srv_diff_host_rate NUMBER, dst_host_serror_rate NUMBER, dst_host_srv_serror_rate NUMBER,
  dst_host_rerror_rate NUMBER, dst_host_srv_rerror_rate NUMBER, type VARCHAR2(4000), nil NUMBER
);

The last field, “nil”, is an extra column that will be ignored and dropped after the import. NOTE: if the import does not find the same number of fields as defined in the table, the process will abort.

Normally we have a full dataset that we want to split in 70%-30% proportions to get training and test datasets. In scikit-learn there is a function for this, sklearn.model_selection.train_test_split(). In PL/SQL, we can simply do:

%sql
create table train_data as select * from dataset_table sample (70) seed (1);
create table test_data as select * from dataset_table minus select * from train_data;

The first statement randomly extracts 70% of dataset_table (with a fixed seed) and puts it into a train_data table. The second takes the difference between the full dataset and the training dataset to create the test_data table.

The same structure will be used for the test dataset, so we simply add a paragraph with:

%sql
CREATE TABLE kdd_test AS (SELECT * FROM kdd_train);

Now it’s time to import the datasets with DBMS_CLOUD.COPY_DATA():

DBMS_CLOUD.COPY_DATA(
  table_name      => 'KDD_TRAIN',
  credential_name => 'CRED_KDD',
  -- object URL: https://objectstorage.us-ashburn-1.oraclecloud.com/n/italysandbox/b/adwhimport/o/KDDTrain%2B.txt
  file_uri_list   => 'https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/italysandbox/adwhimport/KDDTrain%2B.txt',
  format          => json_object('delimiter' value ',')
);

the main parameters to set are:

  • table_name : the table previously created to import the dataset (KDD_TRAIN)
  • credential_name: the key corresponding to the credential stored (CRED_KDD)
  • format: sets the delimiter used in the file to be imported
  • file_uri_list: from the object details page, get the URL Path (URI). For example:
https://objectstorage.us-ashburn-1.oraclecloud.com/n/italysandbox/b/adwhimport/o/KDDTrain%2B.txt

and create a Swift object URL from it as follows (as in the example above, https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/<object> becomes https://swiftobjectstorage.<region>.oraclecloud.com/v1/<namespace>/<bucket>/<object>):

How to Convert Object URL to URL Accepted by DBMS_CLOUD.COPY_DATA()

At this point you can start the data exploration. For example, with this paragraph:

%sql
select distinct type, count(type) as items from kdd_train group by type

you can get the distribution of attack types as a table:

Distribution of Attack Types

but, unlike a Python scikit-learn stack, you don’t have to write a single line of matplotlib.pyplot() code to get a chart: just click one of the chart-type icons and you will have:

and you can refine the content of the chart by working on its “settings”, as follows:

Other manipulations you can do are dropping the “nil” field, which was imported but is not useful for training, and adding a key id that isn’t in the original imported dataset:

%script
ALTER TABLE kdd_train DROP COLUMN NIL;
ALTER TABLE kdd_train ADD id number;
UPDATE kdd_train SET id = ROWNUM;

Because of the unbalanced distribution of this dataset, we’ll aggregate the original 23 types of records into the five categories mapped by the training_attack_types file. To do this, we import that file, as done before, into a prepared KDD_ATTACKTYPE table, and then add a record at the end to cover the “normal” traffic type.
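
A minimal sketch of that dictionary table and its import might look like the following; the Swift URL and the space delimiter are assumptions based on the earlier upload, not commands taken from the original article:

%script
CREATE TABLE kdd_attacktype (attack VARCHAR2(4000), category VARCHAR2(4000));
BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'KDD_ATTACKTYPE',
    credential_name => 'CRED_KDD',
    file_uri_list   => 'https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/italysandbox/adwhimport/training_attack_types',
    format          => json_object('delimiter' value ' ')   -- assumption: space-separated file
  );
END;
/

With the dictionary loaded, we add the missing record for the “normal” traffic type: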

%script
insert into KDD_ATTACKTYPE (attack, category) values ('normal', 'normal');

we can check the number of classes with:

%sql
select count(*) from kdd_attacktype

Now, with a simple piece of code we can manipulate the attack type, reducing the classes from 23 to 5 by leveraging the dictionary created with the previous import:

%sql
UPDATE kdd_train
SET type = (SELECT category FROM kdd_attacktype WHERE type = attack)
WHERE type <> 'normal';

In this way the training dataset will hold the network traffic classified into only 5 types (one of them being “normal”). The distribution still isn’t optimal, but it is a bit more balanced.
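
To see the new distribution, you can re-run the same group-by query used during data exploration:

%sql
select distinct type, count(type) as items from kdd_train group by type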

That’s all: we can proceed to the training phase.

But some of you are probably asking whether I’m missing something in the data preparation pipeline.

For example, the transformation of symbolic fields into a one-hot encoding, dropping the original field to prevent multicollinearity. Another must-have in data preparation is standardization, which rescales the numeric fields to a mean of 0 and a standard deviation of 1.

The replacement of missing values, with the mean for numerical attributes or the mode for categorical attributes, is another operation that, if skipped, could abort the training process with certain algorithms. Binning is another manipulation required by algorithms like Naive Bayes.

None of these manipulations is needed when using Oracle Machine Learning algorithms.

The algorithms implemented in OML have an Automatic Data Preparation (ADP) feature that automatically performs all the operations described above and much more. More details about this feature are reported here. If you want to disable this feature, you can always do so by setting the attributes of the chosen algorithm.
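
As a rough illustration (not from the original article), Automatic Data Preparation can be switched off through a model settings table that is later passed to DBMS_DATA_MINING.CREATE_MODEL as settings_table_name; the table name below is hypothetical:

%script
CREATE TABLE kdd_model_settings (setting_name VARCHAR2(30), setting_value VARCHAR2(4000));  -- hypothetical name
INSERT INTO kdd_model_settings (setting_name, setting_value) VALUES ('PREP_AUTO', 'OFF');   -- disable Automatic Data Preparation
COMMIT;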


Machine Learning, Spatial and Graph – No License Required!

In keeping with Oracle’s mission to help people see data in new ways, discover insights, unlock endless possibilities, customers wishing to utilize the Machine Learning, Spatial and Graph features of Oracle Database are no longer required to purchase additional licenses.

As of December 5, 2019, the Machine Learning (formerly known as Advanced Analytics), Spatial and Graph features of Oracle Database may be used for development and deployment purposes with all on-prem editions and Oracle Cloud Database Services. See the Oracle Database Licensing Information Manual (pdf) for more details.

This latest announcement further enhances the benefits of Oracle’s multi-model converged architecture by supporting multiple data types, data models (e.g. spatial, graph, JSON, XML), algorithms (e.g. machine learning, graph and statistical functions) and workload types (e.g. operational and analytical) within a single database. At no additional cost, customers can now take full advantage of the enterprise-class performance, reliability and security of Oracle Database for particular use cases, such as:

  • processing and analyzing all types of spatial data in business applications, GIS and operational systems
  • using graph analysis to discover relationships in social networks, detect fraud, and make informed recommendations
  • building and deploying machine learning models for predictive analytics

Developers and data scientists can use standard SQL interfaces and/or APIs with Oracle’s Machine Learning functions, Graph analytics and Spatial operators to develop their models and applications. Most importantly, Oracle’s converged data architecture approach spares customers the cost and complexity commonly associated with the multiple single-purpose database approach advocated by some. To learn more about this exciting development, read the following blogs:

Watch the video (below) to learn from Oracle’s Andy Mendelsohn how a multi-model converged database architecture is preferable to single-use databases for typical customer use cases.


Oracle Database Achieves Highest Scores in all Use Cases in 2019 Gartner Critical Capabilities …

In the 2019 Gartner report, ‘Critical Capabilities for Operational Database Management Systems’, Oracle Database again achieved the highest scores in all four Use Cases. The operational DBMS Use Cases analyzed by Gartner are: Traditional Transactions, Distributed Variable Data, Event Processing/Data in Motion and Augmented Transactions. Click any Use Case image (below) to read the full report.

Gartner says, “Our analysis synthesizes product information provided by vendors and information gathered from interactions with Gartner clients over the past 12 months. Relevant responses from our survey of the vendor’s reference customers, conducted during June of 2019, are also included.”

We believe Oracle’s rankings are due, in part, to its strong rating for high-speed transaction processing, HA/DR and security. Indeed, Oracle’s ratings for all but two of the critical capabilities were 4.0 or above, meeting and exceeding requirements, and HA/DR was rated at 5.0, highest of all vendors.

Source: Gartner Critical Capabilities for Operational Database Management Systems, Donald Feinberg, Merv Adrian, Henry Cook, 25 November 2019

Gartner Disclaimer

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

The graphics (above) were published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Oracle.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and is used herein with permission. All rights reserved.


Oracle | A Leader in 2019 Gartner Magic Quadrant for Operational DBMS

Oracle has been named a Leader in the 2019 Gartner report, ‘Magic Quadrant for Operational Database Management Systems’.

Oracle continues to deliver innovations and enhancements in database software with the latest generation of Oracle Database, in infrastructure with engineered systems such as Oracle Exadata, and in the cloud with the Oracle Autonomous Database. Click image (below) to read the full report.

Source: Gartner Magic Quadrant for Operational Database Management Systems. Merv Adrian, Donald Feinberg, Henry Cook, 25 November 2019

Gartner Disclaimer

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

The graphic (above) was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Oracle.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and is used herein with permission. All rights reserved.


DX Marketing Demonstrates Combined Power of Autonomous Data Warehouse and Oracle Analytics

For most businesses, getting the most out of data means assembling the right tools needed for the development of deep insights. But what happens when your entire business is about doing that for others? Suddenly, your data isn’t just your own internal dataset, but every client’s dataset too. Data management suddenly becomes much more complicated.

The team at DX Marketing (DXM), an award-winning insights company with offices in Greenville, South Carolina, and Savannah, Georgia, found themselves in that exact predicament. As a company focused on providing data-driven digital marketing, every individual client account was essentially a new source of big data.

Data is inherent in everything DXM does. Its products involve collecting data for clients, analyzing that data, and then leveraging it into predictive models, all to deliver insights fulfilling specific end goals. This could be increasing conversion, predicting audience behavior in specific channels, maximizing ROI, entering a new geographic market, or all of the above. With so many data sources, DXM needed a platform to unify it all. Without that, hours of work were wasted performing logistical tasks such as data consolidation and preparation.

To make matters more complicated, several other factors entered the equation. For the most accurate predictive insights for customers, DXM licensed US consumer data from Epsilon. This refreshed the demographic dataset every six weeks. When combined together, it created an intense process for correlation working across datasets. Other logistical factors included being Health Insurance Portability and Accountability Act (HIPAA) compliant regarding security protocols, and providing cloud-based access for a broad team of analysts and data scientists—preferably with an easy-to-learn interface and report generation that could enable greater flexibility in resource usage. In addition, the DXM team wanted to explore the idea of machine learning and artificial intelligence to expedite data preparation and analysis.

In short, it was a lot. And the amount of data coming in wasn’t getting any easier to manage; in fact, as the calendar continued to turn, the data volume followed the worldwide trend of increasing as time moved on. What product could fulfill all of DXM’s needs?

As it turned out, the answer was not a single product but a pair of products working seamlessly together. That’s why DXM went with Oracle’s winning combination of Oracle Autonomous Data Warehouse and Oracle Analytics.

Never miss an update about big data! Subscribe to the Big Data Blog to receive the latest posts straight to your inbox!

A Unified Oracle Platform to Handle Big Data

Let’s examine all these needs one by one:

  • Consolidate Many Data Sources: Oracle’s Autonomous Data Warehouse acts as a smart repository for DXM’s many data sources, including the Epsilon demographic updates that arrived every six weeks.
  • Maintain HIPAA Security Protocols: 1996’s HIPAA law establishes privacy and security rules for electronic health records. Oracle’s platform offers protocol compliance in accordance with AICPA SSAE 18, AT-C sections 205 and 315, among many other global security compliances in defense, finance, and other industries. In addition, because data stays within a single environment rather than being transferred around for processing, risk is inherently minimized.
  • Gain Cloud Access: DXM stressed the need for team members to have access. Fortunately, Oracle Autonomous Data Warehouse and Oracle Analytics Cloud natively provide this, ensuring the access and flexibility required by DXM to keep projects on schedule.
  • Employ Easy-to-Use Interface: Oracle’s platform stresses usability. In particular, Oracle Analytics Cloud makes it easy for business users to create in-depth reports and develop insights without the depth of knowledge of an IT staff member or a data scientist.
  • Harness Machine Learning and Artificial Intelligence: Both Oracle Autonomous Data Warehouse and Oracle Analytics Cloud use embedded machine learning and artificial intelligence in different ways to make the lives of users easier. As a self-running platform, Oracle Autonomous Data Warehouse configures and runs smoother, in addition to expediting the data ingestion and preparation process. For Oracle Analytics Cloud, machine learning and artificial intelligence simplify the data analysis process, speeding up basic tasks while generating new insights.

By combining two powerful Oracle platforms into a single data machine, DXM quickly saw improvements on all levels. Starting with everyday tasks and going all the way to client data, results rolled in for DXM’s two pilot programs. In fact, within six months, customer acquisition cost decreased by 52%, deliverables sped up by 70%, and revenue grew by 25%. How satisfied was DXM with its investment? “It’s been an invaluable tool for us,” says Ray Owens, CEO of DX Marketing. “We’re already, just literally six months in, utilizing some of the key features that we probably wouldn’t have picked up on with traditional database routines.”

To get the full scoop on how DXM made the most out of its Oracle technology, download the complete customer success story or watch the video. And for more about how you can benefit from Oracle Big Data, visit Oracle’s Big Data page, and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.



Introducing Just Enough Documentation

The Oracle Database User Assistance group is introducing “just enough documentation”, a combination of two content types: interactive diagrams and video. Our aim is to help you get started using Oracle’s products as quickly as possible through a combination of interactivity and visual presentation. Our first implementation of this approach is the Architectural Overview for Oracle Data Safe.

The interface accommodates three different learning styles.

At the top of the page, you can download the content for offline viewing or printing by clicking Show PDF. You can also advance through the content using the slide control buttons. Click Next to view the next slide, click Previous to back up one slide, and click First to return to the main slide.

The interactive diagram is implemented as a graphic that you manipulate with your mouse. Click in a box that has an ellipsis to explore the details of a component in the diagram. Right-clicking a component opens the related documentation.

The text below the diagram describes the highlighted components at a high level, and you can watch a demonstration of the components in action by clicking the video link.

If you like interactive diagrams, we have others, including the Oracle Database single-instance and cluster technical architecture diagrams.

What do you think of this approach? How else can we improve the documentation? Let us know in the comments!


Keep Your Data Safe with Oracle Autonomous Database Today!

When I started my professional career building network diagrams, I knew the cylindrical symbol of a database was important. I had no idea how far it could come; self-driving, self-repairing, and self-securing databases seemed like something out of science fiction, similar to self-driving cars and computers that fit into my back pocket. All those concepts are reality today. In 2018, Oracle introduced the Autonomous Database, and it changed the way databases are managed. Of course, we didn’t stop there; Oracle has been building upon the autonomous capabilities again and again. It’s important to recognize that the autonomous capabilities didn’t show up overnight though. They are a culmination of 40+ years of innovations in database, security, and more.

In the area of security, we continue our innovations to improve the self-securing capabilities. Many customers have already been excited about self-patching capabilities, freeing up their DBAs to focus on more strategic projects. Self-encryption has been especially important as shown when a recent breach in the hospitality industry exposed unencrypted personal information of their consumers.

Now, with some of the newest capabilities announced at OpenWorld 2019, Oracle has expanded the self-securing capabilities to secure the locations of the databases and automate the security of the customer information.

Data Safe is our newest database security service and it is included with Autonomous Database. Data Safe helps you better understand the sensitivity of data stored within the Autonomous Database and the risk inherent in users you grant access to that data. Data Safe gives you insight into how your users interact with the data; and when appropriate, lets you easily remove sensitive data from your test and development databases. One customer we spoke with prior to Data Safe shared her request to easily identify the age of an account and whether the account was still actively in use. As you can imagine, such a task would be laborious, repetitive, and tedious. Now, with Data Safe, she can quickly identify inactive database accounts, and as an added bonus can even prioritize those inactive accounts based upon the level of risk they represent. These capabilities and more are now available for Autonomous Database customers with Data Safe.

Autonomous Database Dedicated Deployment is our newest configuration of Autonomous Database. The Serverless Autonomous Database configuration has been very popular already with a strong adoption across all industries. Some customers seek more dedicated infrastructure and control over their Autonomous Database. The dedicated infrastructure for Autonomous Database means that they not only have a physical separation from other databases, but there is also a secure isolation zone for each Autonomous Database. The ability to provide layers of defenses to protect that dedicated Autonomous Database takes self-securing to the next level.

There’s even more to talk about in our self-securing capabilities, so stay tuned to our newest on-demand Autonomous Database security webcast. Tune in today – we not only cover these items, but we also have a direct demonstration of Data Safe available for you.


How Graph Analytics Works: Six Degrees of Kevin Bacon

From a technical perspective, the term “graph analytics” means using a graph format to perform analysis of relationships between data based on strength and direction. That might be a bit hard to understand for the uninitiated, particularly when the traditional idea of data analysis brings up images of poring over spreadsheets—very big spreadsheets, when you’re looking at big data with petabytes or exabytes of data.

So let’s break down that definition piece by piece to offer some clarity. Assuming you know the general idea behind analytics, what is the difference when we add the word “graph” to it? Consider the general statement above:

“Using a graph format to perform analysis of relationships between data based on strength and direction.”

Drilling that statement down into individual pieces, we can look at segments to gain a greater understanding of the definition.

“Using a graph format”: The technical definition of a graph is the relationship between nodes (aka vertices or points) and edges (aka links or lines).

“Analysis of relationships”: Graph analytics excels at delivering insights from relationships. The visual nature of the method makes it much easier to identify unexpected relationships and derive insights faster than using, say, a tabular format of data. While you may be able to come to the same conclusion by analyzing, for example, a spreadsheet of data, a graph format can bring this about with far less effort. The phrase “a picture is worth a thousand words” essentially applies here, and with computing tools designed to maximize the capabilities of graph analytics, insights can be determined in much more efficient ways.

“Based on strength or direction”: If you consider data points to be nodes in a graph, then the edges connecting those points define the relationship between them. Thus, strength of relationship can be derived from the density of the edge (as in, two points have a dozen relationships, so it is denser than an edge with a single connection) as well as the direction of the edge (the visual layout of nodes can translate into spatial data, where physical distance offers insight into the node).

Never miss an update about big data! Subscribe to the Big Data Blog to receive the latest posts straight to your inbox!

A Real-World Example of Graph Analytics

The type of insight provided by graph analytics doesn’t have to be a complex technical concept; in fact, one of the easiest ways to explain graph analytics is through a party game that pretty much everyone has played at one point or another: Six Degrees of Kevin Bacon.

If you’re one of the few people on the planet who’s never heard of it, the idea behind Six Degrees of Kevin Bacon came about in the 1990s based on the theory that every actor was connected to Kevin Bacon through six degrees of working relationships. When playing with friends, the challenge is to minimize the number of degrees connecting the actor to Kevin Bacon—functionally, this is the same thing as running a query via graph analytics.

Imagine every actor working in Hollywood as a node in a massive graph, with Kevin Bacon at the center of it. Edges are drawn for every film connection between actors. We want to run a query to find the relationships that connect Kevin Bacon to another random node (actor). For this example, let’s pick Pedro Pascal (Game of Thrones, The Mandalorian). Pascal’s lengthy list of high-profile work means that he shares the cast list with many other notable actors, creating nearly limitless paths for connecting to Kevin Bacon. However, the goal is to find the shortest path to Kevin Bacon.

To do this, we run a query that analyzes the various paths (or, if you’re doing this in a party setting, you just think really hard), ultimately generating this output:

  • Kevin Bacon (node) was in Crazy, Stupid, Love (edge) with Julianne Moore (node).
  • Julianne Moore (node) was in Kingsman: The Golden Circle (edge) with Pedro Pascal (node).

 How Graph Analytics Works: Six Degrees of Kevin Bacon

The website The Oracle of Bacon (no relation to Oracle, of course) is an online database project built around this game. It should be noted that the site’s database uses a graph algorithm known as breadth-first search, and in this instance, the site would give this a Bacon Number of two, because there are two edges. Thus, the shortest path to Kevin Bacon is through Julianne Moore. That’s a pretty easy example given that Pedro Pascal currently has a high-profile stature. But what if we pick someone who’s a little more obscure, someone whose career came about prior to Kevin Bacon’s breakout period in the 1980s? For this example, let’s use Wendell Corey, a character actor who worked in the 1940s, 50s, and 60s.

If we were using an analytics tool, we would submit a query to search for the number of relationships between Kevin Bacon and Wendell Corey. This produces a Bacon Number of three:

  • Kevin Bacon (node) was in Animal House (edge) with Tim Matheson (node).
  • Tim Matheson (node) was in The Apple Dumpling Gang Rides Again (edge) with Audrey Totter (node).
  • Audrey Totter (node) was in Any Number Can Play (edge) with Wendell Corey (node).

 How Graph Analytics Works: Six Degrees of Kevin Bacon
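
In SQL terms, the kind of shortest-path search described above could be sketched as a recursive query over a hypothetical costars table holding one row per ordered pair of actors who appeared in a film together (both directions stored). This is only an illustration of the idea, not code from The Oracle of Bacon:

-- compute each actor's Bacon Number as the shortest number of co-starring edges to Kevin Bacon
WITH costar_paths (actor, bacon_number) AS (
  SELECT CAST('Kevin Bacon' AS VARCHAR2(100)), 0 FROM dual
  UNION ALL
  SELECT c.actor2, p.bacon_number + 1
  FROM costar_paths p
  JOIN costars c ON c.actor1 = p.actor
  WHERE p.bacon_number < 6            -- stop at six degrees to keep the recursion finite
)
SELECT actor, MIN(bacon_number) AS bacon_number
FROM costar_paths
GROUP BY actor
ORDER BY bacon_number;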

Using a breadth-first search, even seemingly obscure connections can be found quickly and efficiently. How else can we use the Kevin Bacon example to demonstrate elements of graph analytics? The principle behind Six Degrees is to minimize the number of edges between nodes. To try different types of graph analyses, we can look at the strength of relationships. In this case, if the nodes are Kevin Bacon and other actors, we can assign a single edge for every time they are in a film together. Thus, the theoretical strongest relationship between Kevin Bacon and another actor comes down to the quantity of edges in their respective filmographies.

 How Graph Analytics Works: Six Degrees of Kevin Bacon

Distance between nodes offers another dimension in how data is analyzed. For example, if we place Kevin Bacon at the center of this, then the placement of nodes (coordinates) is based on how recently the film with Kevin Bacon was made. In this case, Jill Hennessy co-starred in the series City on a Hill, which was made in 2019, so her node in this analysis would be placed immediately next to Kevin Bacon at the center. Queen Latifah appeared in the film Beauty Shop with Kevin Bacon in 2005, which would place her node further out. With the nodes distributed this way, a more refined analysis could be run based on connections made in the last ten years.

 How Graph Analytics Works: Six Degrees of Kevin Bacon

The context of the query can also change if you take the focus off Kevin Bacon. Degree centrality, the ratio of a node’s number of relationships to the largest number of relationships held by any node (mathematically, a percentage: the node’s relationship count divided by the maximum relationship count), can easily be determined using graph analytics; this shows us, by the way, that Kevin Bacon does not have the largest degree centrality among actors listed in IMDb. Consider the logistics of trying to calculate such a thing by analyzing data in a tabular format versus a graph format, where nodes and edges naturally take care of a significant amount of the prep work.
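
Degree centrality as defined above can likewise be sketched with ordinary SQL over the same hypothetical costars table; again, the table and column names are illustrative:

-- relationships per actor, as a percentage of the best-connected actor's relationship count
WITH degrees AS (
  SELECT actor1 AS actor, COUNT(*) AS n_rel
  FROM costars                        -- assumes both directions of each pair are stored
  GROUP BY actor1
)
SELECT actor,
       ROUND(100 * n_rel / MAX(n_rel) OVER (), 2) AS degree_centrality_pct
FROM degrees
ORDER BY degree_centrality_pct DESC;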

Digging Deeper into Graph Analytics

Six Degrees of Kevin Bacon provides a fun and accessible example of graph analytics, but there’s much more to this method from both a functional and technical perspective. To learn more about the how, what, and why behind graph analytics, check out What Is Graph Analytics for an in-depth explanation.

For more about how you can benefit from Oracle Big Data, visit Oracle’s Big Data page—and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.


Do you know what you know?

In recent years there has been a lot written on the topic of big data, and one idea I find intriguing is that when it’s used effectively, teams with access to big data can go from being dependent on what they know, to being driven by what they can learn. This opens up opportunities for new insights, which can in turn lead to opportunities in areas like efficiency, competitive differentiation and so on.

The key to having data is not that you have data (or that it’s ’big’), it’s the potential business benefit you gain from having data. Unfortunately, too often we put a wall between the data, and the people and teams who could really benefit from it. Sometimes the wall is an expert, or a cost, or simply time. If that wall becomes higher when you collect more and more data, collecting more data won’t add more value to your business.

This is why the ‘question’ analogy is useful here. The goal with data is not to collect as much as we can, it’s to answer specific and evolving questions related to our business. To do so, you need to break down those walls requiring you to find the expertise, budget and time to answer those questions. Failing this, your data will continue to silently hold the answers to your most compelling questions.

So what does the perfect data app look like?

Given that we understand the on-going problem of not being able to get the most out of our data when we need to, it’s a useful exercise to review how, in an ideal world, we would potentially use data to empower our teams. And because we all love acronyms, let’s go for the three ‘U’s of a perfect data application –

Up-to-date – the data you are most interested in to guide your actions today is data that’s current. This means that any delay between an application accumulating data and making it available must be reduced to as close to zero as possible. If you need to update a sales forecast let’s say, you absolutely need the most up-to-date information.

Ubiquitous – you shouldn’t have to jump through hoops to get at that important data, so it should be easily accessible to everyone who needs it, when they need it. Whether it’s a support engineer on the Shinkansen pulling out of Tokyo on the way to a customer meeting, or the product lead preparing for a quarterly review at HQ in Palo Alto.

User-friendly – nobody should need to be ‘trained’ before getting access to the data they need, and everyone should be able to access it directly without having to call on the ‘expert’. And when the data is presented to them, it should be obvious what they’re looking at – so no need for specific expertise, or other documentation or other reference guides. And what’s more, if the app no longer serves the needs of the team as those needs evolve, it should be easy to update.

If your data is sensitive, security is of course a critical concern. You will often need to reliably limit access to the data you choose to share, and perhaps also have a level of control over who can see what data.

And just because the barrier to entry is low, it doesn’t mean that the resulting applications can only be rudimentary – the low code application development tools offered by Oracle are enabling feature rich applications like the Oracle Learning Library today.

Succeed fast, succeed often

Contrary to the common tech mantra, wouldn’t it be better if you could quickly build an app to share critical data with your team, and not hesitate to do it? So rather than wondering if it would be possible, and deciding that you don’t have the skills, you would ideally want a way of developing with a platform that put more of a focus on getting up and running quickly. And the beauty of lowering the barrier to entry means that you can experiment.

The value of experimentation is that it can help you uncover unexpected insights. So let’s say that it would take your expert six months to implement a single solution that you carefully defined, imagine if your team could come up with alternatives in less than two weeks? Initially those alternatives may not be perfect, but the learning process may uncover new possibilities not originally imagined, and ultimately lead to a superior solution.

“Oracle Database and Oracle Application Express are critical to our mission of identifying the biomarkers related to breast cancer. The solution gives our researchers the tools they need to successfully filter and reduce data, as well as generate hypotheses, without being slowed by weeks or months of software obstacles. In addition, the tool’s collaborative services make us more competitive for future research opportunities,” said Dr. John Springer, assistant professor, department of computer and information technology, Purdue University.

Why not make everyone smarter?

That’s obvious – right? If everyone who needs insights from the data can get them, they’re by definition ‘smarter’. But if we dig a little deeper, we’re effectively doing this by lowering the expertise, cost and time barriers. We do this at Oracle in two ways – first we significantly lower the effort to make that data continuously available and secure with Oracle Autonomous Database, then we make the data you and your team care about easily accessible via low code application development with APEX/ORDS/SDW.

Everyone in the team also gets smarter by adding the skill to quickly develop an app that gives them access to the data they care about, when they care about it. And because that data lives in the Oracle Autonomous Database, teams can spend significantly less time and effort keeping their data available and secure – freeing team members to move from data management to data insights.

One great consequence of being able to develop applications more easily, is that they can now be developed speculatively. Imagine being in a position where you can quickly spin up an app on demand, where you don’t need to put it in a pipeline to get budget and the attention of your ‘experts’. That’s what the combination of Oracle Autonomous Database and Oracle APEX/ORDS/SDW gives you.

What’s Next?

In the next installment in this blog series, we’ll take a look at the kinds of applications that can be quickly developed and easily maintained, and what kind of impact you can really have on your business by combining the power of Oracle Autonomous Database with the low code development capabilities offered by APEX and related technology.
