Back To The Database – Prologue: What’s Old Is New Again

Originally posted by Todd Sharp in the Oracle Developers Blog

We’re starting to see some influencers and larger organizations scale back from an “all in” and “by the book” stance on microservices and advocate for a more sensible and logical approach in 2020. There’s nothing wrong with the microservice pattern when used appropriately. It makes extremely complex application architectures easier to manage and continuously deploy for large, distributed teams, but just like all other design patterns it is not (and should not be) a solution for every architecture.

There is no question that smaller, purpose-driven services are (and have been for many years) a smart approach to building out backends and APIs. Even if you’re not taking a complete “table/schema per service” approach with event sourcing and CQRS, it still makes sense to break up your persistence operations into small, manageable services that can be independently scaled and deployed. This is typically the approach that I’ve used over the past several years with the applications that I’ve built. Some people call this approach a “distributed monolith” or even just the standard Service Oriented Architecture (SOA). The ultimate point that I’m trying to express in the next few posts on this blog is that it’s OK to not “do” microservices “by the book”.

You shouldn’t feel bad if your application is labeled as a “monolith” or a “distributed monolith” or any other terms someone may come up with to make you feel like you’re inferior because you’re not up to date on the latest industry buzzword and fad. We can talk about trends and forecasts and where the industry is headed all day long, but at the end of the day your application needs to do the following things:

Be responsive, easy to use and error free.

When users visit your site, it needs to be user friendly, available and quick to respond to their requests, and it needs to complete those requests without failure. Every other decision that you make about your architecture either makes your developers’ lives easier or is reflected in your cloud bill at the end of the month. Don’t get me wrong – those are important things – but don’t lose focus on the most important goal, which is creating a great experience for the end user.

So if our goal is to create a great experience for the people browsing and using our site/application, why do we spend so much time and effort worrying about making sure we’re following the latest trends and fads in the industry? Well, by our nature a lot of developers are tinkerers who are curious and always interested in learning new things. We like to challenge ourselves and no one likes to feel like they are being “left behind” in an industry (especially one that changes as rapidly as ours does). Having been in this industry for almost 20 years now I have started to notice that we tend to circle back to the trends that we used once before. We tend to “reinvent the wheel” quite a bit. I don’t know if we’ve forgotten the lessons we’ve learned, or we ultimately realize that adding layers of complexity to solutions is a bad thing. Either way, it’s OK – sometimes you have to learn things the hard way. Pain is OK, as long as it results in a positive outcome at the end of the day. It’s only when we refuse to learn from our mistakes or compound the pain that we’re dealing with out of stubbornness or ignorance that we’re really doing more harm than good.

Sometimes the best solution is the one that has been staring us in the face all along.

In the case of this series, maybe the best solution for publishing and subscribing to changes in our database has been the one we’ve pretended didn’t exist all along: using the database itself. There’s nothing wrong with using triggers, stored procedures or scheduled jobs in our database, but for the last few years we’ve just stopped using them. We’ve kept this logic outside of the database because…reasons…for a long time. I’d bet that the first reason I would hear if I asked someone why we’ve done that would be that putting business logic in the DB leads to “vendor lock in” because each RDBMS has its own flavor of SQL and none of them are compatible when it comes to certain tasks. To which I’d respond: when was the last time you re-wrote or re-architected an application where the DB layer moved completely intact to the new solution? My answer, in 16 years, was: “never”. If we rewrote something, we looked at the persistence tier and made changes as necessary. Things tend to get ugly over time, and that tier is no exception. Don’t let “vendor lock in” be your excuse. You’re not going to find a piece of functionality that exists in one RDBMS that doesn’t have a suitable counterpart in another RDBMS. You just won’t. Don’t use that excuse.

Regardless of whether you’ve gone “all in” or are building a “distributed monolith” (or whatever you want to call it), the next few posts in this series will show you how to use the database to accomplish some tasks that you may be doing the “hard way” in your applications and services today. I’ll show you exactly how you can use Oracle’s Autonomous DB and a few PL/SQL scripts to publish all changes to a table via triggers and use scheduled jobs to update a table from messages posted to a stream in Oracle Streaming Service. But as I said earlier, none of what I’m going to cover is specific to Oracle DB (other than the code itself). This functionality can be done in just about any RDBMS out there – and I think it’s time we started trusting our database to handle data again instead of handling the complexity of these operations in our application code.
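To make the “any RDBMS” point concrete, here is a minimal sketch of the trigger idea using SQLite through Python’s built-in sqlite3 module. It is not the Oracle PL/SQL from the upcoming posts: the orders table, the order_changes “outbox” table and the trigger names are all hypothetical, and a real setup would have something read the outbox and publish each row to a stream.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);

-- Hypothetical outbox table: one row per change, ready to be published.
CREATE TABLE order_changes (
    change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id   INTEGER NOT NULL,
    operation  TEXT NOT NULL,
    new_status TEXT
);

-- The database itself records every insert and update.
CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
BEGIN
    INSERT INTO order_changes (order_id, operation, new_status)
    VALUES (NEW.id, 'INSERT', NEW.status);
END;

CREATE TRIGGER orders_after_update AFTER UPDATE ON orders
BEGIN
    INSERT INTO order_changes (order_id, operation, new_status)
    VALUES (NEW.id, 'UPDATE', NEW.status);
END;
""")

# Application code only touches the orders table.
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'new')")
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
conn.commit()

changes = conn.execute(
    "SELECT order_id, operation, new_status FROM order_changes ORDER BY change_id"
).fetchall()
print(changes)
```

The application never writes to order_changes directly; the triggers capture each change as a side effect of the normal insert and update.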

It’s time to think like we used to again. It’s time to stop caring about whether or not your solution is something that someone else in this industry thinks you should be doing and start worrying about our users and our developers. There’s no need to over-complicate software – it’s hard enough to build and maintain as it is.


How a Unified Approach Supports Your Data Strategy

Are you finding it difficult to explore and analyze data located on-premises or in the cloud? You are not alone, but there is a solution.

It’s rare for a company to store 100 percent of its data in one place, or to secure 100 percent of its data in the cloud. Most companies must combine datasets. But by establishing a unified data tier, it can be easier to perform certain types of analytics, especially when the data is widely distributed.

Never miss an update about big data! Subscribe to the Big Data Blog to receive the latest posts straight to your inbox!

Take for example the case of a bike-share system that looked at its publicly available ridership data, then added weather data to predict bike ridership and made appropriate changes to make sure bikes were available when and where riders needed them. If the data was stored in different geographical areas and used different storage systems, it might be difficult to compare that information to make an informed decision.

So how can companies take advantage of data, whether it’s located in Oracle Autonomous Data Warehouse, Oracle Database, object store, or Hadoop? A recent Oracle webcast titled, “Explore, Access, and Integrate Any Data, Anywhere,” explored this issue. Host Peter Jeffcock outlined four new services Oracle released in February 2020 to let companies dive right in and solve these real-world problems, manage data, and enable augmented analytics:

The idea is that there needs to be a unified data tier that starts with workload portability, which means that your data and the data environment can be managed in the public cloud, on a local cloud, or in your on-premise data store.

Unified Data Tier

The next step is to develop a converged database, especially with an autonomous component so that repeatable processes free up administrative time and reduce human error. Oracle Database allows for multiple data models, multiple workloads, and multiple tenants, making it easier to operate because all these processes are managed in a single database.

You can take it one step further if you add the cloud to the configuration. Oracle can manage the data and apply different processes and machine learning so that you can run your database autonomously in the cloud.

The unified data tier also means taking advantage of multiple data stores such as data lakes and other databases. Finally, it means expanding that ecosystem with our partners, for example through our recent agreement with Microsoft that allows for a unified data tier between Oracle Cloud and Microsoft’s Azure.

“If you want to run an application in the Microsoft Cloud and you want to connect to the Oracle Cloud where the data is stored, that’s now supported. It’s a unique relationship and it’s something to look into if you want to run a multi-cloud strategy,” Jeffcock says.

You can experience the full presentation if you register for the on-demand webcast.

To learn more about how to get started with data lakes, check out Oracle Big Data Service—and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox. Also, follow us on Twitter @OracleBigData.


Build Your Data Lake with Oracle Big Data Service

In today’s world, there’s an ever-growing deluge of highly diverse data coming from diverse sources. In the struggle to manage and organize that data, practitioners are finding it harder when they only have the traditional relational database or data warehouse as options.

That’s why the data lake has become increasingly popular as a complement to traditional data management. Think of the traditional data warehouse as a reservoir—it’s cleansed, drinkable.

The data lake, on the other hand, has data that are of potentially unknown value. The data isn’t necessarily cleansed—which is why it’s more of an adventure. The data lake can be voluminous, brimming with data and unmatched possibilities. Users can easily load even more data and start experimenting to find new insights that organizations couldn’t discover before.

Organizations must be able to:

• Store their data in a way that is less complicated

• Reduce management even though the data is more complex

• Use data in a way that makes sense for them

And that’s exactly why Oracle has created Oracle Big Data Service as a way to help build data lakes.

Oracle Big Data Service is an automated service based on Cloudera Enterprise that provides a cost-effective Hadoop data lake environment—a secure place to store and analyze data of different types from any source. It can be used as a data lake or a machine learning platform.

It comes with a fully integrated stack that includes both open-source and Oracle value-added tools, and it’s designed for enterprises that need flexible deployment options, scalability, and the ability to add tools of their choosing.

Oracle Big Data Service also provides:

  • An easy way to expand on premises to Oracle Cloud
  • Secure, reliable, and elastic Hadoop clusters in minutes
  • Native integration with Oracle Cloud platform services

Oracle + Hadoop = A Better Data Lake Together

We wanted to make the power of Hadoop and the entire Hadoop ecosystem available to you. But Hadoop can be complicated, which is why we’ve combined the best of what Oracle and Cloudera have to offer and made it into something easier to handle—which makes building and managing your data lake easier than ever.

With Cloudera Enterprise Deployment, our service is vertically integrated for Hadoop, Kafka, and Spark with a best-practices, high-availability deployment.

With Big Data Service, you get:

  • Highly secure, highly available clusters provisioned in minutes
  • Ability to expand on-premises Hadoop, which enables you to deploy test and development environments and/or move data lakes to the cloud
  • Flexibility to scale as you wish using high-performance bare metal or cost-effective virtual machine shapes
  • Automatically deployed security and management features

You also can choose your Cloudera version, giving you the ability to:

  • Match your current deployment—which is important for test and dev environments
  • Deploy new versions—allowing you to take advantage of the distribution’s latest features

Oracle Big Data Service Features

We built Oracle Big Data Service to be your go-to big data and data lake solution, one that’s specifically designed for a diverse set of big data use cases and workloads. From short-lived clusters used to tackle specific tasks to long-lived clusters that manage large data lakes, Oracle Big Data Service scales to meet an organization’s requirements at a low cost and with the highest levels of security.

Let’s explore just how Oracle Big Data Service does this.

  1. Oracle Big Data Service and Oracle Cloud SQL

Use Oracle SQL to query across big data sources with Oracle Cloud SQL, including the Hadoop Distributed File System (HDFS), Hive, object stores, Kafka, and NoSQL.

You can accomplish all of this with simple administration, because Oracle Cloud SQL uses existing Hive metadata and security, and offers fast, scale-out processing using Oracle Cloud SQL compute.

  2. Oracle Big Data Service and Big Data Analytics

What use is managing and accessing your data if you can’t run analytics to find real results? We offer support in the areas of machine learning, spatial analysis, and graph analysis to help you get the information your organization needs to gain better business results and improved metrics. Oracle Big Data Service customers are licensed for these options and can deploy at no extra cost.

It’s also easy to connect to Oracle Cloud services such as Oracle Analytics Cloud, Oracle Cloud Infrastructure Data Science, or Oracle Autonomous Database. Or you can use any Cloudera-certified application for a wide range of analytic tools and applications.

  3. Oracle Big Data Service and Workload Portability

Cloud may be the future of enterprise computing, which is why we’ve built the newest, best cloud infrastructure out there with Oracle Cloud Infrastructure. But it’s not everything, at least not yet. You still need to maintain a mix of public cloud, local cloud, and traditional on-premises computing for the foreseeable future.

With Oracle Big Data Service, deploy where it makes sense. With Oracle, if you develop something on premises, it’s easy to move that to the cloud and vice versa.

  4. Oracle Big Data Service and Secure, High-Availability Clusters

With Oracle Big Data Service, expect easy deployment when creating your clusters. Specify minimal settings to create the cluster, then use just one click to create a cluster with highly available Hadoop services.

You also get a choice of Cloudera versions, enabling “Cloud Also” deployments to match for on-premises compatibility, or you can choose newer versions to take advantage of the latest features.

  5. Oracle Big Data Service Offers Security

If you’re using off-box virtualization, Oracle can’t see customer data and customers can’t see Oracle management code. In most first-generation clouds, the network and tenant environments are coupled, only abstracted by the hypervisor.

Oracle follows a Least Trust Design principle. We don’t trust the hardware, the customer (think rogue employees), or the hypervisor. That’s why we’ve separated our network and tenant environments. Isolating that network virtualization helps prevent the spread and lateral movement of attacks.

In addition, with Oracle Big Data Service, all Cloudera security features are enabled with strong authentication, role-based authorization, auditing, and encryption.

  6. Oracle Big Data Service and the Compute and Storage You Want

Whether you’re using Oracle Big Data Cloud for development, test, data science, or data lakes, we offer the compute offerings you need for your use case. Leverage the flexibility of virtual machines (VMs) and block storage, or the unparalleled performance of bare metal with direct-attached NVMe (non-volatile memory express) storage.

  7. Oracle Big Data Service and Superior Networking

With Oracle Big Data Service, you can expect high-fidelity virtual networks and connectivity. Our networking is:

Fully configurable

  • IP addresses, subnets, routing, and firewalls that support new or existing private networks

High performance and consistent

  • High bandwidth, microsecond latency network
  • Private access without traversing the internet

Capable of connecting to corporate networks

  • FastConnect: dedicated, private connectivity
  • VPN Connect: simple and secure internet connectivity

  8. Oracle Big Data Service and Oracle’s Data Management Platform

Your organization spends time and effort creating, acquiring, and storing data, and you want to be able to use it. With Oracle, you can reduce the time, cost, and effort of getting data from wherever it originates to all the places it’s needed across the enterprise.

Oracle has spent decades building and expanding its data management platform.

With Oracle’s end-to-end data management, you get an easy connection to:

  • Oracle Autonomous Database
  • Oracle Analytics Cloud
  • Oracle Cloud Infrastructure Streaming
  • Oracle Cloud Infrastructure Data Catalog
  • Oracle Cloud Infrastructure Data Science
  • Oracle Cloud Infrastructure Data Flow
  • The list goes on …

And with a unified query with Oracle Cloud SQL, you’ll be able to correlate information from a variety of sources using Oracle SQL. In addition, you will gain a host of Oracle analytic and connectivity options, including:

  • Oracle Machine Learning
  • Oracle Big Data Spatial and Graph
  • Oracle Big Data Connectors
  • Oracle Data Integrator Enterprise Edition

Oracle Big Data Service for All Your Data Lake Needs

From enabling machine learning to storing and analyzing data, Oracle Big Data Service is a scalable, secure data lake service that meets your requirements at a low cost and with the highest levels of security.

It allows you to worry less about managing and storing data. And it empowers you to start analyzing your data in a way that makes the future of your organization more successful than ever before.



Oracle Named 2020 Gartner Peer Insights Customers’ Choice for Operational Database Management …

Oracle has been named a March 2020 Gartner Peer Insights Customers’ Choice for Operational Database Management Systems.

Oracle Database, the industry’s leading database, continues to deliver cutting-edge innovations. Oracle Autonomous Database, the company’s latest innovation, uses ground-breaking machine learning to eliminate manual database management. The result? Unprecedented availability, high performance, and security, all for a much lower cost. Oracle Autonomous Database is self-driving, self-securing, and self-repairing.

  • Self-Driving: Provides continuous adaptive performance tuning based on machine learning.
  • Self-Securing: Automatically upgrades and patches itself while running. Automatically applies security updates while running to protect against cyberattacks.
  • Self-Repairing: Automatically protects from all types of downtime, including system failures, maintenance, user errors, and changes to the application data model.

Gartner Peer Insights hosts more than 330,000 verified customer reviews across 340+ defined markets. In markets with enough data to report, up to seven of the vendors most highly rated by their customers are recognized with the Gartner Peer Insights Customers’ Choice distinction. All reviews go through a strict validation and moderation process, and they play a significant role in customers’ buying decisions.

According to Gartner, “The Gartner Peer Insights Customers’ Choice is a recognition of vendors in this market by verified end-user professionals.” To ensure fair evaluation, Gartner maintains rigorous criteria for recognizing vendors with a high customer satisfaction rate.

Over the past year, Oracle Database has received a rating of 4.5 out of 5 for the Operational Database Management Systems market, based on 682 published reviews across different industries as of March 20. Here are excerpts from some of the reviews that contributed to this distinction:

“Oracle Database is very secured and provides data consistency at any point of time” – Oracle DBA in the Healthcare Industry

“Best of Class Technology and Outstanding Support” – CTO in the Services Industry

“Autonomous lives up to its name and improved our technology footprint. The experience with Oracle has been very productive and successful. The tools have performed better than expected. We’ve experienced faster system responses and reduced costs. When needed, they provide access to expert resources to help clarify or troubleshoot issues.” – CDO in the Healthcare Industry

Everyone at Oracle is deeply honored to be named a 2020 Customers’ Choice for the Operational Database Management Systems Market. To learn more about this distinction and to read the reviews written by our customers, visit the Customers’ Choice landing page on Gartner Peer Insights. To all of our customers who submitted the reviews, thank you! We are only successful when our customers are successful!

The GARTNER PEER INSIGHTS CUSTOMERS’ CHOICE badge is a trademark and service mark of Gartner, Inc., and/or its affiliates, and is used herein with permission. All rights reserved. Gartner Peer Insights Customers’ Choice constitute the subjective opinions of individual end-user reviews, ratings, and data applied against a documented methodology; they neither represent the views of, nor constitute an endorsement by, Gartner or its affiliates.


Data Lakes: Examining the End to End Process

A good way to think of a data lake is as the ultimate hub for your organization. On the most basic level, it takes data in from various sources and makes it available for users to query. But much more goes on during the entire end-to-end process involving a data lake. To get a clearer understanding of how it all comes together, and a bird’s-eye view of what it can do for your organization, let’s look at each step in depth.

Step 1: Identify and connect sources

Unlike data warehouses, data lakes can take inputs from nearly any type of source. Structured, unstructured, and semi-structured data can all coexist in a data lake. The primary goal of this type of feature is allowing all of the data to exist in a single repository in its raw format. A data warehouse specializes in housing processed and prepared data for use, and while that is certainly helpful in many instances, it still leaves many types of data out of the equation. By unifying these disparate data sources into a single source, a data lake allows users to have access to all types of data without requiring the logistical legwork of connecting to individual data warehouses.

Step 2: Ingest data into zones

If a data lake is set up per best practices, then incoming data will not just get dumped into a single data swamp. Instead, since the data sources are known quantities, it is possible to establish landing zones for datasets from particular sources. For example, if you know that a dataset contains sensitive financial information, it can immediately go into a zone that limits access by user role and applies additional security measures. If it’s data that comes in a set format ready for use by a certain user group (for example, the data scientists in HR), then that can immediately go into a zone defined for that group. And if another dataset delivers raw data with too little metadata to identify it easily on a database level (like a stream of images), then that can go into its own zone of raw data, essentially setting that group aside for further processing.

In general, it’s recommended that the following zones be used for incoming data. Establishing this zone sorting right away allows for the first broad strokes of organization to be completed without any manual intervention. There are still more steps to go to optimize discoverability and readiness, but this automates the first big step. Per our blog post 6 Ways To Improve Data Lake Security, these are the recommended zones to establish in a data lake:

Temporal: Where ephemeral data such as copies and streaming spools live prior to deletion.

Raw: Where raw data lives prior to processing. Data in this zone may also be further encrypted if it contains sensitive material.

Trusted: Where data that has been validated as trustworthy lives for easy access by data scientists, analysts, and other end users.

Refined: Where enriched and manipulated data lives, often as final outputs from tools.
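As a rough illustration of automated zone sorting, the sketch below routes an incoming dataset to one of the four zones listed above. The Dataset fields (ephemeral, validated, enriched) and the order of the checks are assumptions made for this example, not part of any Oracle service.

```python
from dataclasses import dataclass

ZONES = ("temporal", "raw", "trusted", "refined")


@dataclass
class Dataset:
    name: str
    ephemeral: bool = False   # copies or streaming spools awaiting deletion
    validated: bool = False   # already checked and considered trustworthy
    enriched: bool = False    # final output produced by a tool


def landing_zone(ds: Dataset) -> str:
    """Pick a landing zone; anything unrecognized defaults to raw."""
    if ds.ephemeral:
        return "temporal"
    if ds.enriched:
        return "refined"
    if ds.validated:
        return "trusted"
    return "raw"


incoming = [
    Dataset("kafka-spool-0412", ephemeral=True),
    Dataset("image-stream"),
    Dataset("hr-survey-clean", validated=True),
    Dataset("churn-model-scores", validated=True, enriched=True),
]
for ds in incoming:
    print(ds.name, "->", landing_zone(ds))
```

Defaulting to the raw zone keeps unknown data out of the zones that end users query directly until it has been processed.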

Step 3: Apply security measures

Data arrives into a data lake completely raw. That means that any inherent security risk with the source data comes along for the ride when it lands in the data lake. If there’s a CSV file with fields containing sensitive data, it will remain that way until security steps have been applied. If step 2 has been established as an automated process, then the initial sorting will help get you halfway to a secure configuration.

Other measures to consider include:

  • Clear user-based access defined by roles, needs, and organization.
  • Encryption based on a big-picture assessment of compatibility within your existing infrastructure.
  • Scrubbing the data for red flags, such as known malware issues, suspicious file names or formats (such as an executable file living in a dataset that is otherwise media files). Machine learning can significantly speed up this process.

Running all incoming data through a standardized security process ensures consistency among protocols and execution; if automation is involved, this also helps to maximize efficiency. The result? The highest levels of confidence that your data will go only to the users that should see it.
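The “executable file living in a dataset that is otherwise media files” check can be sketched in a few lines. This is a simple filename heuristic rather than a malware scanner, and the list of executable suffixes is an assumption chosen for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical list of suffixes treated as executable for this sketch.
EXECUTABLE_SUFFIXES = {".exe", ".dll", ".bat", ".sh"}


def suspicious_files(filenames):
    """Return executables found in a dataset dominated by another file type."""
    suffixes = [PurePosixPath(name).suffix.lower() for name in filenames]
    if not suffixes:
        return []
    dominant, _ = Counter(suffixes).most_common(1)[0]
    return [
        name
        for name, suffix in zip(filenames, suffixes)
        if suffix in EXECUTABLE_SUFFIXES and suffix != dominant
    ]


dataset = ["intro.mp4", "demo.mp4", "setup.exe", "outro.mp4"]
print(suspicious_files(dataset))
```

A check like this only flags candidates for review; deciding what to do with a flagged file still belongs to the standardized security process.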

Step 4: Apply metadata

Once the data is secure, that means that it’s safe for users to access it—but how will they find it? Discoverability is only enabled when the data is properly organized and tagged with metadata. Unfortunately, since data lakes take in raw data, data can arrive with nothing but a filename, format, and time stamp. So what can you do with this?

A data catalog is a tool that can work with data lakes in a way that optimizes discovery. By enabling more metadata application, data can be organized and labeled in an accurate and effective way. In addition, if machine learning is utilized, the data catalog can begin recognizing patterns and habits to automatically label things. For example, let’s assume a data source is consistently sending MP3 files of various lengths—but the ones over twenty minutes are always given the metatag “podcast” after arriving in the data lake. Machine learning will pick up on that pattern and then start auto-tagging that group with “podcast” upon arrival.
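The podcast example can be approximated with a deliberately simple stand-in for machine learning: look at how files were tagged in the past, derive a duration threshold, and apply it to new arrivals. The function names and sample history below are hypothetical, and a real data catalog would use far richer signals than one threshold.

```python
def learn_podcast_threshold(history):
    """history holds (duration_minutes, tag_or_None) pairs for past MP3 files.

    Returns the shortest duration ever tagged "podcast", but only if every
    file at least that long carried the tag; otherwise returns None.
    """
    tagged = [duration for duration, tag in history if tag == "podcast"]
    if not tagged:
        return None
    threshold = min(tagged)
    consistent = all(
        tag == "podcast" for duration, tag in history if duration >= threshold
    )
    return threshold if consistent else None


def auto_tag(duration_minutes, threshold):
    """Tag a new file as a podcast when it meets the learned threshold."""
    if threshold is not None and duration_minutes >= threshold:
        return "podcast"
    return None


history = [(3.5, None), (4.0, None), (21.0, "podcast"), (45.0, "podcast"), (62.5, "podcast")]
threshold = learn_podcast_threshold(history)
print(threshold, auto_tag(30.0, threshold), auto_tag(5.0, threshold))
```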

Given that the volume of big data keeps growing, and that more and more sources of unstructured data are entering data lakes, that type of pattern learning and automation can make a huge difference in efficiency.

Step 5: User discovery

Once data is sorted, it’s ready for users to discover. With all of those data sources consolidated into a single data lake, discovery is easier than ever before. If tools like analytics exist outside of the data lake’s infrastructure, then there’s only one export/import step that needs to take place for the data to be used. In a best-case scenario, those tools are integrated into the data lake, allowing for real-time queries against the absolute latest data, all without any manual intervention.

Why is this so important? A recent survey showed that, on average, five data sources are consulted before making a decision. Consider the inefficiency if each source has to be queried and called manually. Putting it all in a single accessible data lake and integrating tools for real-time data querying removes numerous steps so that discovery can be as easy as a few clicks.

The Hidden Benefits of a Data Lake

The above details break down the end-to-end process of a data lake—and the resulting benefits go beyond saving time and money. By opening up more data to users and removing numerous access and workflow hurdles, users have the flexibility to try new perspectives, experiment with data, and look for other results. All of this leads to previously impossible insights, which can drive an organization’s innovation in new and unpredictable ways.



Four Tools to Integrate into Your Data Lake

A data lake is an absolutely vital piece of today’s big data business environment. A single company may have incoming data from a huge variety of sources, and having a means to handle all of that is essential. For example, your business might be compiling data from places as diverse as your social media feed, your app’s metrics, your internal HR tracking, your website analytics, and your marketing campaigns. A data lake can help you get your arms around all of that, funneling those sources into a single consolidated repository of raw data.

But what can you do with that data once it’s all been brought into a data lake? The truth is that putting everything into a large repository is only part of the equation. While it’s possible to pull data from there for further analysis, a data lake without any integrated tools remains functional but cumbersome, even clunky.

On the other hand, when a data lake integrates with the right tools, the entire user experience opens up. The result is streamlined access to data while minimizing errors during export and ingestion. In fact, integrated tools do more than just make things faster and easier. By expediting automation, the door opens to exciting new insights, allowing for new perspectives and new discoveries that can maximize the potential of your business.

To get there, you’ll need to put the right pieces in place. Here are four essential tools to integrate into your data lake experience.

Machine Learning

Even if your data sources are vetted, secured, and organized, the sheer volume of data makes it unruly. As a data lake tends to be a repository for raw data—which includes unstructured items such as MP3 files, video files, and emails, in addition to structured items such as form data—much of the incoming data across various sources can only be natively organized so far. While it can be easy to set up a known data source for, say, form data into a repository dedicated to the fields related to that format, other data (such as images) arrives with limited discoverability.

Machine learning can help accelerate the processing of this data. With machine learning, data is organized and made more accessible through various processes, including:

  • In processed datasets, machine learning can use historical data and results to identify patterns and insights ahead of time, flagging them for further examination and analysis.
  • With raw data, machine learning can analyze usage patterns and historical metadata assignments to begin implementing metadata automatically for faster discovery.

The latter point requires the use of a data catalog tool, which leads us to the next point.

Data Catalog

Simply put, a data catalog is a tool that integrates into any data repository for metadata management and assignment. Products like Oracle Cloud Infrastructure Data Catalog are a critical element of data processing. With a data catalog, raw data can be assigned technical, operational, and business metadata. These are defined as:

  • Technical metadata: Used in the storage and structure of the data in a database or system
  • Business metadata: Contributed by users as annotations or business context
  • Operational metadata: Created from the processing and accessing of data, which indicates data freshness and data usage, and connects everything together in a meaningful way

By implementing metadata, raw data can be made much more accessible. This accelerates organization, preparation, and discoverability for all users without any need to dig into the technical details of raw data within the data lake.
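To make the three metadata types concrete, here is a minimal sketch of a catalog record. The `CatalogEntry` class, its field names, and the sample path are hypothetical and only illustrate the idea; this is not the Oracle Cloud Infrastructure Data Catalog API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one raw object in the data lake."""
    object_path: str
    technical: dict = field(default_factory=dict)    # storage and structure
    business: dict = field(default_factory=dict)     # user annotations and context
    operational: dict = field(default_factory=dict)  # freshness and usage

    def record_access(self, when: datetime) -> None:
        """Update operational metadata each time the object is read."""
        self.operational["last_accessed"] = when.isoformat()
        self.operational["access_count"] = self.operational.get("access_count", 0) + 1

    def matches(self, term: str) -> bool:
        """Discovery by searching business annotations instead of the raw bytes."""
        term = term.lower()
        return any(term in str(value).lower() for value in self.business.values())


# Illustrative values only (assumptions, not real catalog content).
entry = CatalogEntry(
    object_path="lake/raw/images/store_042.jpg",
    technical={"format": "jpeg", "size_bytes": 482_113},
    business={"description": "Storefront photo, Leeds", "owner": "retail-ops"},
)
entry.record_access(datetime(2020, 2, 1, tzinfo=timezone.utc))
print(entry.matches("storefront"), entry.operational["access_count"])  # True 1
```

The point of the sketch is that a user can find the image by searching for "storefront" without ever opening the file itself.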

Integrated Analytics

A data lake acts as a middleman between data sources and tools, storing the data until it is called for by data scientists and business users. When analytics and other tools exist separate from the data lake, that adds further steps for additional preparation and formatting, exporting to CSV or other standardized formats, and then importing into the analytics platform. Sometimes, this also includes additional configuration once inside the analytics platform for usability. The cumulative effect of all these steps creates a drag on the overall analysis process, and while having all the data within the data lake is certainly a help, this lack of connectivity creates significant hurdles within a workflow.

Thus, the ideal way to allow all users within an organization to swiftly access data is to use analytics tools that seamlessly integrate with your data lake. Doing so removes unnecessary manual steps for data preparation and ingestion. This really comes into play when experimenting with variability in datasets; rather than having to pull a new dataset every time you experiment with different variables, integrated tools allow this to be done in real time (or near-real time). Not only does this make things easier, this flexibility opens the door to new levels of insight as it allows for previously unavailable experimentation.

Integrated Graph Analytics

In recent years, data analysts have started to take advantage of graph analytics, a newer form of data analysis that creates insights based on relationships between data points. For those new to the concept, graph analytics considers individual data points similar to dots in a bubble: each data point is a dot, and graph analytics allows you to examine the relationships between data points by identifying the volume of related connections, proximity, strength of connection, and other factors.

This is a powerful tool that can be used for new types of analysis in datasets with the need to examine relationships between data points. Graph analytics often works with a graph database itself or through a separate graph analytics tool. As with traditional analytics, any sort of extra data exporting/ingesting can slow down the process or create data inaccuracies depending on the level of manual involvement. To get the most out of your data lake, integrating cutting-edge tools such as graph analytics means giving data scientists the means to produce insights as they see fit.
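As a rough illustration of the measures mentioned above (volume of connections, proximity, and strength of connection), here is a small plain-Python sketch. The node names, weights, and helper functions are all hypothetical; a real graph database or graph analytics tool would provide its own, far more capable equivalents.

```python
from collections import defaultdict, deque

# Hypothetical weighted edges: (node_a, node_b, strength_of_connection)
edges = [
    ("alice", "bob", 3.0),
    ("alice", "carol", 1.0),
    ("bob", "carol", 2.0),
    ("carol", "dave", 5.0),
]


def build_graph(edge_list):
    """Build an undirected adjacency map: node -> {neighbor: weight}."""
    graph = defaultdict(dict)
    for a, b, weight in edge_list:
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


def degree(graph, node):
    """Volume of related connections: how many neighbors a node has."""
    return len(graph.get(node, {}))


def strength(graph, node):
    """Strength of connection: sum of the weights on a node's edges."""
    return sum(graph.get(node, {}).values())


def hops(graph, start, goal):
    """Proximity: fewest edges between two nodes (breadth-first search)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph.get(node, {}):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # no path between the two nodes


graph = build_graph(edges)
print(degree(graph, "carol"), strength(graph, "carol"), hops(graph, "alice", "dave"))
# 3 8.0 2
```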

Why Oracle Big Data Service?

Oracle Big Data Service is a powerful Hadoop-based data lake solution that delivers all of the needs and capabilities required in a big data world:

  • Integration: Oracle Big Data Service is built on Oracle Cloud Infrastructure and integrates seamlessly into related services and features such as Oracle Analytics Cloud and Oracle Cloud Infrastructure Data Catalog.
  • Comprehensive software stack: Oracle Big Data Service comes with key big data software: Oracle Machine Learning for Spark, Oracle Spatial Analysis, Oracle Graph Analysis, and much more.
  • Provisioning: Deploying a fully configured version of Cloudera Enterprise, Oracle Big Data Service easily configures and scales up as needed.
  • Secure and highly available: With built-in high availability and security measures, Oracle Big Data Service integrates and executes this in a single click.

To learn more about Oracle Big Data Service, click here—and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.


Podcast #377: Oracle Autonomous Database — An Interview with Maria Colgan

This episode brings you an interview with Maria Colgan, Master Product Manager for Oracle Database. Maria joined Oracle in 1996, and since then has held positions as product manager for Oracle Database In-Memory and the Oracle Database query optimizer. Maria is the primary author of the SQLMaria blog and a contributing author to the Oracle Optimizer blog.

“With the Autonomous Database, data scientists and developers can basically help themselves, provision the database and get running and utilize it.” – Maria Colgan

Handling the host duties for this program is Alexa Weber Morales. An award-winning musician and writer, Alexa is director of developer content at Oracle. She is the former editor in chief of Software Development magazine, and has more than 15 years of experience as a technology content strategist and journalist.

In this program Maria talks about Oracle Autonomous Database and the new features that allow data scientists, developers, and others beyond traditional database users to help themselves.

This program is Groundbreakers Podcast #377. The interview was originally recorded on September 17, 2019 at Oracle OpenWorld. Listen!

On the Mic

Maria Colgan

Master Product Manager, Oracle Database, Oracle

San Francisco, California

Alexa Weber Morales

Editor/Content Strategist, Oracle

San Francisco, California



GA of Oracle Database 20c Preview Release

The latest annual release of the world’s most popular database, Oracle Database 20c, is now available for preview on Oracle Cloud (Database Cloud Service Virtual Machine).

As with every new release, Oracle Database 20c introduces key new features and enhancements that further extend Oracle’s multi-model converged architecture, including Native Blockchain Tables, along with performance enhancements such as Automatic In-Memory (AIM) and a binary JSON datatype. For a quick introduction, watch Oracle EVP Andy Mendelsohn discuss Oracle Database 20c during his last OpenWorld keynote.

For the complete list of new features in Oracle Database 20c, please refer to the new features guide in the latest documentation set. To learn more about some of the key new features and enhancements in Oracle Database 20c, check out the following blog posts:

For availability of Oracle Database 20c on all other platforms, both on-premises (including Exadata) and in Oracle Cloud, please refer to My Oracle Support (MOS) note 742060.1.


Six Retail Dashboards for Data Visualizations

Retail is rapidly transforming. Consumers expect an omni-channel experience that empowers them to shop from anywhere in the world. To capture this global demand, retailers have developed ecommerce platforms to complement their traditional brick-and-mortar stores. Ecommerce is truly revolutionizing the way retailers can collect data about the customer journey and identify key buying behaviors.

As described in Beyond Marketing by Deloitte, analysts are taking advantage of “greater volumes of diverse data” gathered “in an environment that a company controls,” which makes “it possible to develop a deeper understanding of customers and individual preferences and behaviors.” Savvy retailers will leverage the influx of data to generate insights on how to further tailor products to the tastes and preferences of their target audience. But how can they do this?

Try a Data Warehouse to Improve Your Analytics Capabilities

Retailers are looking to leverage data to find innovative answers to their questions. While analysts are looking for those data insights, leadership wants immediate insights delivered in a clear and concise dashboard to understand the business. Oracle Autonomous Database delivers analytical insights, allowing retail analysts to visualize their global operations on the fly with no specialized skills. In this blog, we’ll be creating sales dashboards for a global retailer to help them better understand and guide the business with data.

Analyzing Retail Dashboards

Retail analysts can create dashboards to track KPIs including sales, inventory turnover, return rates, cost of customer acquisition, and retention. These dashboards present KPIs as easily understandable graphics that can be shared with executive management to drive business decisions.

In this blog, we focus on retail dashboards that break down sales and revenue by:

  • Product
  • Region
  • Customer segment

In each dashboard, we will identify and isolate areas of interest. With the introduction of more data, you can continuously update the dashboard and create data-driven insights to guide the business.

Understanding the Retail Data Set

We used modified sales and profit data from a global retailer to simulate the data that retail analysts can incorporate into their own sales dashboards. In our Sample Order Lines Excel file (shown below), we track data elements such as: Order ID, Customer ID, Customer Segment, Product Category, Order Discount, Profit, and Order Tracking.
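Every dashboard in this post boils down to summing sales and profit over order lines grouped by some dimension. Here is a minimal sketch of that idea in Python. The rows are made up for illustration, and the column names (including a `sales` column) are assumptions modeled on the fields listed above, not the actual layout of the spreadsheet.

```python
from collections import defaultdict

# Hypothetical order lines; values are illustrative, not from the real dataset.
order_lines = [
    {"order_id": 1, "customer_segment": "Corporate",
     "product_category": "Technology", "sales": 1200.0, "profit": 300.0},
    {"order_id": 2, "customer_segment": "Consumer",
     "product_category": "Furniture", "sales": 800.0, "profit": 40.0},
    {"order_id": 3, "customer_segment": "Corporate",
     "product_category": "Office Supplies", "sales": 200.0, "profit": 60.0},
    {"order_id": 4, "customer_segment": "Small Business",
     "product_category": "Technology", "sales": 1800.0, "profit": 450.0},
]


def summarize(rows, dimension):
    """Total sales and profit for each distinct value of the given column."""
    totals = defaultdict(lambda: {"sales": 0.0, "profit": 0.0})
    for row in rows:
        bucket = totals[row[dimension]]
        bucket["sales"] += row["sales"]
        bucket["profit"] += row["profit"]
    return dict(totals)


by_category = summarize(order_lines, "product_category")
by_segment = summarize(order_lines, "customer_segment")
print(by_category["Technology"])  # {'sales': 3000.0, 'profit': 750.0}
print(by_segment["Corporate"])    # {'sales': 1400.0, 'profit': 360.0}
```

Swapping the `dimension` argument is the code equivalent of dragging a different attribute onto a chart.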

Data Visualization Desktop, a tool that comes free with ADW, allows users to continuously update their dashboards by easily uploading each month’s sales data. By introducing more data, we can understand how the business is changing over time and adapt accordingly.

For more on how to continuously update your dashboard please see: Loading Data into Autonomous Data Warehouse Using Oracle Data Visualization Desktop

We looked at the following questions:

  1. What is the current overview of sales and profit broken down by product?
  2. Which regional offices have the best sales performance?
  3. Which geographic regions are hotbeds of activity?
  4. How are different products and regions linked together by sales?
  5. Which market segments are the most profitable?
  6. Which specific products are driving profitability?

Here is the view of the data in Excel:

Here’s a quick snapshot of the data loaded into Data Visualization Desktop:

What is the current overview of sales and profit broken down by product?

This is a sales dashboard summary which shows the overall revenue and profit from different product segments. Here are some quick insights:

  • Using tiles (top left), we see: $1.3M profit out of $8.5M total sales making a 15.3% profit margin.
  • We use pie charts to break out total sales and profit by product category (top right). Technology products not only contribute the most to sales of any single product category (40.88%) but also make up the most profit of any single category (56.15%), meaning that technology products are the highest grossing product line. Under the pie charts is a pivot table showing the actual figures.
  • A line graph shows that every product category has been growing, with technology products growing the fastest (bottom left). There was a spike in technology sales that started in August and peaked in November 2018.
  • Sales are broken down by both product and customer segment so that we can understand more about the buying habits for different customers (bottom right).

For an even more detailed segment analysis, we also broke out the corporate customer segment (below) to compare with the overall business (above).

Which regional offices have the best sales performance?

In this visualization, we’re looking at the performance of different regional offices and how they’re collectively trending. We overlaid a horizontal stacked graph and a donut chart on a scatterplot. Using a dot to represent each city, the scatterplot analysis compares profit (x-axis) vs sales (y-axis) using larger dots to represent larger customer populations in each city.

For example, the dot in yellow (far right) represents São Paulo, Brazil with 127 customers generating $200,193 in sales and $44,169 in profit. As the most profitable city, São Paulo has a profit margin of 22%, averaging total purchases of $1,576 per customer. On the scatterplot, cities that make at least $10,000 of profit are indicated left of the dotted line.
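The two derived figures for the highlighted city follow directly from the three numbers quoted above. A quick sketch of the arithmetic (the `city_metrics` helper is our own name, not a feature of the visualization tool):

```python
def city_metrics(customers: int, sales: float, profit: float) -> dict:
    """Derive profit margin and average purchases per customer for one city."""
    return {
        "profit_margin_pct": 100.0 * profit / sales,
        "sales_per_customer": sales / customers,
    }


# Figures quoted in the text for the most profitable city.
city = city_metrics(customers=127, sales=200_193, profit=44_169)
print(round(city["profit_margin_pct"]), round(city["sales_per_customer"]))  # 22 1576
```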

The horizontal stacked graph (top left) breaks down sales by continent so you can see which regions are leading in sales. The donut chart (bottom right) shows the total amount of sales from all the regions ($9M) and each region’s share as a percentage. Here are the leading regions by sales:

  • America (38.64%)
  • Europe (28.81%)
  • Asia (18.05%)

To learn more, we use the “keep selected” option to dynamically look at a specific region like Europe (shown below). We can see that Europe accounts for just under $2.5M in sales, with the largest portion coming from Northern Europe. The scatterplot also changes dynamically to show only cities in Europe. You can now identify that the most profitable European city is Belfast, Northern Ireland ($27,729) and the city with the most sales is St. Petersburg, Russia ($127,521). This allows us to study the success of offices like Belfast and St. Petersburg and replicate it in other regions.
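As a sanity check, the regional percentages can be turned back into dollar amounts. This sketch assumes the $8.5M total sales figure from the summary tiles in the first dashboard; that choice is an assumption on our part, since the donut chart is labeled with a rounder $9M.

```python
TOTAL_SALES = 8_500_000  # assumption: the $8.5M total from the summary tiles

# Regional shares as quoted in the list of leading regions.
region_share_pct = {"America": 38.64, "Europe": 28.81, "Asia": 18.05}


def sales_from_share(total: float, share_pct: float) -> float:
    """Convert a percentage share of total sales into a dollar amount."""
    return total * share_pct / 100.0


europe_sales = sales_from_share(TOTAL_SALES, region_share_pct["Europe"])
print(f"${europe_sales / 1e6:.2f}M")  # $2.45M
```

Under that assumption, the result lines up with the "just under $2.5M" shown for Europe.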

Which geographic regions are hotbeds of activity?

Analysts need to identify which markets to immediately focus on. Using a heat map, we can see which regions have the most sales (shown in color) and regions without sales (gray). This particular global retailer’s sales are primarily in developed markets:

  1. America ($1.5M+)
  2. United Kingdom ($887K)
  3. Australia ($695K)

We can investigate further to pinpoint the exact cities (below) in the UK. We can see that the sales are originating from multiple cities including:

  • Belfast
  • Leeds
  • Manchester
  • Sheffield

Using a heat map can not only help identify how easily customers access storefront locations but also show where to expand operations based on demand.

How are different products and regions linked together by sales?

It’s often hard to see how different factors like sales, product, and geography are interrelated. Using a network map, we see how product categories (technology, furniture, office supplies) are linked to continents that are sublinked to countries. The thickness of the connecting line from one node on the network to another is based on sales, and deeper shades of green represent more profit. We hover over the line connecting Africa to Southern Africa (above) to see the total sales ($242K) and profit ($34K) from Southern Africa.

Another way to focus on a specific region is to hover over a specific node and use the “keep selected” option (below). In this example, we only identify nodes linked to Europe. By doing this, we can see that a majority of the sales and profits from Europe are coming from technology products ($1,030K sales, $213K profit) and originating from Northern Europe ($974K sales, $162K profit), specifically the UK ($880K sales, $162K profit). Analysts can identify the regional sources of sales and profit while seeing a macro view of how products and regions are linked.

Which market segments are the most profitable?

It’s critical to understand which customer groups are growing the fastest and generating the most sales and profit. We use a stacked bar (left) and scatterplot (right) to break down profitability by market segment in FY18. We categorize buyer types into:

  • Consumer
  • Corporate
  • Home office
  • Small business

In the stacked bar, we can see that sales have been growing from Q2 to Q4, but the primary market segments driving sales growth are corporate (61% growth since Q1) and small business (53% growth since Q1). The combined growth of the corporate and small business segments led to a $191K increase in sales since Q1. Although these two segments made up over 63% of total sales in FY18Q4, we can also see that sales from the home office segment more than doubled from FY18Q3 to FY18Q4.

In a scatterplot (right), we can see the changes in profit ratio of each market segment over time. The profit ratio formula divides net profits for a reporting period by net sales for the same period. The fastest growing market segments and the most profitable market segments in FY18 (top right quadrant) are:

  • Corporate
  • Small business
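The profit ratio and the "growth since Q1" figures used in this section can be sketched as two small functions. The quarterly numbers below are hypothetical placeholders for one segment, not values from the dataset.

```python
def profit_ratio(net_profit: float, net_sales: float) -> float:
    """Net profit for a reporting period divided by net sales for the same period."""
    if net_sales == 0:
        raise ValueError("profit ratio is undefined when net sales are zero")
    return net_profit / net_sales


def growth_pct(start: float, end: float) -> float:
    """Percentage change from a starting value to an ending value."""
    return 100.0 * (end - start) / start


# Hypothetical quarterly figures for one market segment (illustration only).
quarters = {
    "FY18Q1": {"net_sales": 100_000.0, "net_profit": 12_000.0},
    "FY18Q2": {"net_sales": 120_000.0, "net_profit": 18_000.0},
}

ratios = {
    quarter: profit_ratio(values["net_profit"], values["net_sales"])
    for quarter, values in quarters.items()
}
sales_growth = growth_pct(
    quarters["FY18Q1"]["net_sales"], quarters["FY18Q2"]["net_sales"]
)
print({q: round(r, 2) for q, r in ratios.items()}, sales_growth)
# {'FY18Q1': 0.12, 'FY18Q2': 0.15} 20.0
```

A segment lands in the top right quadrant of the scatterplot when both numbers are high: it is growing quickly and keeping a large share of each sale as profit.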

We can also isolate the profitability of the corporate customer segment (below). By generating insights about the target market segments, companies are able to focus their product development and marketing efforts.

Which specific products are driving profitability?

Retailers are often managing a portfolio of hundreds, if not thousands, of products. This complexity makes it challenging to track and identify the profitability of individual products. However, we can easily visualize how profitability has changed over time and compare it to specific products. We use a combo graph (top left) to indicate changes to sales and profit ratios over time.

Generally, we can see that every year sales (and profits) increase from Q1 to Q4 then drop off with the start of the next Q1. We use a waterfall graph to track how profits have gradually changed over time (bottom left). From 2013 to the end of 2018, there was a net gain of $167K in profit.

Analysts identify high-performing products to expand and unprofitable products to cut. On the right, we track sales and profit ratios by individual products. We can see that the products that generate the most sales are:

  1. Telephones/communication tools ($1,380K)
  2. Office machines ($1,077K)
  3. Chairs ($1,046K)

The products with the highest profit ratio are:

  1. Binders (35 percent)
  2. Envelopes (32.4 percent)
  3. Labels (31.6 percent)

This means that for every binder sold, 35 percent of the sale was pure profit. We also found that products such as bookcases (-5.2 percent), tablets (-5.3 percent), and scissors/rulers (-8.3 percent) had negative profit ratios, which means that there was a loss on each sale. We can also isolate the sales performance of the top five products (below).
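Using the profit ratios quoted above, a short sketch shows how an analyst might rank products and flag the loss makers. The $10 sale price in the last line is a made-up figure, used only to show what a 35 percent ratio means in dollars.

```python
# Profit ratios (percent) as quoted in the text.
profit_ratio_pct = {
    "Binders": 35.0,
    "Envelopes": 32.4,
    "Labels": 31.6,
    "Bookcases": -5.2,
    "Tablets": -5.3,
    "Scissors/rulers": -8.3,
}

# Rank from most to least profitable per dollar of sales.
ranked = sorted(profit_ratio_pct.items(), key=lambda item: item[1], reverse=True)

# A negative ratio means each sale loses money.
loss_makers = [name for name, pct in ranked if pct < 0]


def profit_on_sale(sale_amount: float, ratio_pct: float) -> float:
    """Dollar profit (or loss, if negative) on a sale at the given profit ratio."""
    return sale_amount * ratio_pct / 100.0


print(ranked[0][0], loss_makers)
# Binders ['Bookcases', 'Tablets', 'Scissors/rulers']
print(profit_on_sale(10.0, profit_ratio_pct["Binders"]))  # 3.5 (hypothetical $10 sale)
```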


Data visualization dashboards powered by Autonomous Data Warehouse allow major global retailers to easily understand the state of their business and make judgments on how to adapt to dynamic market environments.

Oracle Autonomous Database allows users to easily create secure data marts in the cloud to generate powerful business insights, without specialized skills. It took us fewer than five minutes to provision a database and upload data for analysis.

Now you can also leverage the Autonomous Data Warehouse through a cloud trial:

Sign up for your free Autonomous Data Warehouse trial today

Please visit the blogs below for a step-by-step guide on how to start your free cloud trial: upload your data into OCI Object Store, create an Object Store Authentication Token, create a Database Credential for the user, and load data using the Data Import Wizard in SQL Developer:

Feedback and questions are welcome. Tell us about the dashboards you’ve created!