6 Benefits of a Cloud Data Warehouse

Sometimes it seems like cloud technology is all that anyone in the tech world is talking about.

But not all companies have adopted a data warehouse in the cloud. We’ve written this article to help answer some questions:

  • Do you even want a data warehouse in the cloud?
  • What can you expect from it and what are the benefits?

Let’s explore these key topics one by one.

Sign up for your free data warehouse trial

Question 1 – Do you even want a data warehouse in the cloud?

Of course you do! Look at how fast your data warehouse is growing. Look at the growing number of requests building up for new data warehouse projects, new data discovery sandboxes, new departmental marts, faster query response times, and so on. Every IT department is looking for a silver bullet that can magically help them meet the growing demands for data access coming from their business units. That silver bullet is the cloud.

Question 2 – What can you expect from a cloud data warehouse and what are the key benefits? There are many, but we’ve identified the top six for you.

Data Warehouse Cloud Benefit #1: Lower Costs With Elasticity

The biggest reason most people move to a data warehouse in the cloud is cost. Storing data on-premises, in your own data center, can get very expensive. And expanding your data footprint often makes it harder to support all of your ever-expanding analytical needs.

Why? Well, with an on-premises data warehouse, you can’t independently scale compute and storage, at least not quickly or easily. Typically, if you need more storage, the compute comes with it, and you end up having to pay for both.

In addition, you need to purchase as much compute as you need for peak times. So if you’re a retail company worried about how much compute you need to handle Black Friday, well, tough luck—you’re stuck with that much compute for the whole year.

Fortunately, it doesn’t have to be that way.

With the best kind of data warehouse, your system can instantly and flexibly scale to deliver as much or as little compute as necessary, whenever you need it. And because compute and storage are separate, you only need to purchase what’s essential. Lastly, you don’t have as many upfront costs: hardware, server rooms, networking, extra staff, and so on.
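
To make that elasticity concrete, here is a minimal sketch, assuming the OCI Python SDK and a placeholder database OCID, of scaling an Autonomous Data Warehouse's compute up for a peak and back down afterwards. Field names and options can vary by SDK version, so treat this as illustrative rather than definitive.

```python
# Hedged sketch: scale an Autonomous Data Warehouse's compute up for a peak
# period and back down afterwards with the OCI Python SDK. The OCID and core
# counts are placeholders; adjust them for your own tenancy.
import oci

config = oci.config.from_file()                      # reads ~/.oci/config by default
db_client = oci.database.DatabaseClient(config)

ADW_OCID = "ocid1.autonomousdatabase.oc1..example"   # placeholder OCID

def set_cpu_count(cpu_core_count):
    """Request a new CPU core count; storage is left untouched."""
    details = oci.database.models.UpdateAutonomousDatabaseDetails(
        cpu_core_count=cpu_core_count
    )
    return db_client.update_autonomous_database(ADW_OCID, details)

set_cpu_count(16)   # scale up ahead of a known peak, e.g. Black Friday
set_cpu_count(2)    # scale back down once the peak has passed
```

Because compute and storage are billed separately, only the compute charge changes between the two calls.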

Data Warehouse Cloud Benefit #2: Quick to Deploy

In the past, IT teams had to estimate how much storage and compute power would be necessary for their line-of-business teams, sometimes three years in advance. Getting these estimates wrong meant buying hardware they didn’t need, or facing complaints if there was a lack of storage.

Today, this complicated, detailed planning-and-estimation process isn’t necessary. With the cloud, business users can build their own data warehouse, data mart, or sandbox in only minutes, at any time of day or night. Having a data warehouse in the cloud allows organizations to pay for only the resources they need, when they need them.

In addition, Oracle’s cloud makes it quicker and easier to roll out new data warehouse projects such as data discovery sandboxes. IT and business teams can develop and/or prototype new services and products without spending large sums of money on infrastructure.

Data Warehouse Cloud Benefit #3: Grow Your Capabilities

Having a data warehouse in the cloud improves the overall value of the data warehouse. It means that business intelligence and other applications can deliver faster, smarter insights to the business, since availability, scalability, and performance are all better.

As Penny Avril, VP of Product Management, said in a previous article about autonomous capabilities for data warehouses, “The value of the business is driven by data, and by the usage of the data. For many companies, the data is the only real capital they have. Oracle is making it easier for the C-level to manage and use that data. That should help the bottom line.”

With a data warehouse in the cloud, you can engage in the full spectrum of data warehousing, from business analytics to data integration to IoT and more, as part of a complete, integrated solution.

Data Warehouse Cloud Benefit #4: Self-Service Data Warehousing

Self-service is only truly possible if you have a self-driving database. Just as the cloud data warehouse has many benefits, a self-driving, autonomous data warehouse offers even more. Essentially, you no longer have to worry about managing the data warehouse.

And that means you can benefit from fully automated management, patching, and upgrades. It means that, as a business user, you don’t need IT to spin up a new data mart for you. You simply log into the cloud and provision a new data warehouse yourself, in minutes.
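
For the scripting-inclined, the same self-service provisioning can be driven through the OCI Python SDK. The sketch below is illustrative only; the compartment OCID, names, and password are placeholders, and your tenancy's limits determine what you can actually request.

```python
# Illustrative sketch: provision a small Autonomous Data Warehouse from a script.
# All identifiers and credentials below are placeholders.
import oci

config = oci.config.from_file()
db_client = oci.database.DatabaseClient(config)

details = oci.database.models.CreateAutonomousDatabaseDetails(
    compartment_id="ocid1.compartment.oc1..example",   # placeholder compartment
    db_name="SALESMART",                                # database name
    display_name="sales-data-mart",
    db_workload="DW",                                   # data warehouse workload type
    cpu_core_count=1,
    data_storage_size_in_tbs=1,
    admin_password="ChangeMe#12345",                    # placeholder; use a secrets store
)

response = db_client.create_autonomous_database(details)
print("Provisioning state:", response.data.lifecycle_state)
```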

Data is more available and accessible than ever before.

This allows IT teams to focus attention and resources on more strategic aspects of providing value to the business. But this doesn’t mean that DBAs will be out of work—they still have to manage how applications connect to the data warehouse and how developers use the in-database features and functions within their application code.

Data Warehouse Cloud Benefit #5: More Secure Data

In the past, people were convinced that on-premises data warehouses were more secure. But in the same way that they now trust digital copies more than physical paper copies, some are beginning to see a data warehouse in the cloud as more secure than an on-premises system.

But obviously, it all depends on the database company. So choose a company whose business model relies on data security and encryption. Preferably, that company should have over four decades of experience, with entire departments dedicated to protecting your most valuable asset … Hmmm, who could that be?

Just as an aside, with our self-driving database, the Autonomous Data Warehouse, we have strong data encryption switched on by default to ensure your data is fully protected.

Data Warehouse Cloud Benefit #6: The Cloud Itself

A self-driving database makes everything easier: it takes care of much of the dull but highly valuable work that most people don’t want to do. A self-driving database will help you get even more capability out of the cloud.

For many customers, adopting a data warehouse is just one step on a multi-step journey. You need to make sure that your cloud provider offers a complete path to the cloud that encompasses integrated IaaS, PaaS, and SaaS solutions.

You can simplify your IT infrastructure and minimize capital investments by utilizing your cloud’s services for infrastructure, data management, applications, and business intelligence.

When it comes to choosing a cloud, make sure the one you pick allows for flexible deployment models, enabling you to seamlessly migrate your IT workloads from an on-premises data center to the cloud and back again.

Conclusion

The benefits of having a data warehouse in the cloud are many. But don’t just stop there; think about the benefits of a self-driving data warehouse too—and how much more you could accomplish.

If you’re ready to get started, sign up for a free Autonomous Data Warehouse trial. You’ll be able to:

  • Deploy a new data warehouse in minutes
  • Run sample queries against billion-row tables in seconds (see the connection sketch after this list)
  • Work with Oracle Machine Learning SQL notebooks to build and run machine learning models
  • Use Oracle Analytics Cloud to create interactive, guided data visualizations
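
As a flavor of what the first two bullets look like in practice, here is a minimal sketch of querying a trial instance from Python with the python-oracledb driver. It assumes you have downloaded the instance's credentials wallet; the connection alias, wallet paths, password, and sample table are placeholders.

```python
# Minimal sketch: run a query against an Autonomous Data Warehouse trial instance.
# Assumes the python-oracledb driver and a downloaded credentials wallet; the
# service alias, wallet paths, password, and table name are placeholders.
import oracledb

connection = oracledb.connect(
    user="ADMIN",
    password="your_admin_password",        # placeholder
    dsn="mydw_high",                        # service alias from the wallet's tnsnames.ora
    config_dir="/path/to/wallet",           # unzipped wallet directory
    wallet_location="/path/to/wallet",
    wallet_password="your_wallet_password",
)

with connection.cursor() as cursor:
    # The trial ships with large sample tables; the schema/table here is illustrative.
    cursor.execute("SELECT COUNT(*) FROM ssb.lineorder")
    print("Rows:", cursor.fetchone()[0])
```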

Get started today:

Step 1 – Sign up for a new, free trial account

Step 2 – Get started with our free Oracle Learning Library workshop

Step 3 – Learn about loading the data warehouse for business analytics

Written by Sherry Tiao and Keith Laker

Related:

How 4 Customers Use Autonomous Data Warehouse & Analytics

Effective organizations want access to their data, fast, and they want it readily available for analytics. That’s what makes the Autonomous Data Warehouse such a great fit for businesses. It abstracts away the complexities of managing and maintaining a data warehouse while still making it easy for business analysts to sift through and analyze potentially millions of records.

Sign up today for your free data warehouse trial

This enables businesses to spend more time and resources on answering questions about how the business is performing and what to do next, and less time on routine maintenance and upgrades.

How Customers Use a Data Warehouse with Analytics

Here we’ve gathered together four customers who use the Autonomous Data Warehouse with their analytics. Watch what they have to say about their experience, and learn how a self-driving data warehouse helps them deliver more business value.

Drop Tank Fuels Growth with Autonomous Data Warehouse

The Autonomous Data Warehouse enables Drop Tank to stand up a data warehouse in about an hour and start pulling in useful information in around four hours. As a result, they can see information and act upon it very quickly.

With a data warehouse that automatically scales, Drop Tank can run a promotion, and even if transaction volume jumps to 500 times its normal level, the system recognizes the load, makes tuning adjustments, keeps systems secure, and delivers what Drop Tank needs, without Drop Tank having to hire people to manage it.

They’ve also found value in Oracle’s universal credit model. Drop Tank CEO David VanWiggeren said, “If we decide we want to spin up the Analytics Cloud, try it for a day or two and turn it off, we can do that. That’s incredibly flexible and very valuable to a company like us.”

With Autonomous Data Warehouse, Drop Tank can now monetize their data and use it to drive a loyalty program to further delight their customers.

Data Intensity and Reporting With Autonomous Data Warehouse

Data Intensity decided to use Oracle Autonomous Data Warehouse to solve a problem they had around finance and financial reporting. Their finance team was spending around 60 percent of their time getting data out of systems, and only the remaining 40 percent generating value back to the business.

They chose Autonomous Data Warehouse because it was quick, easy, solved a lot of problems for them, and suited their agile development. In addition, they’ve really appreciated the flexibility of a data warehouse in the cloud, and being able to scale up and scale down the solution as needed for financial reporting periods.

Their CFO is especially delighted. With the Autonomous Data Warehouse and Oracle Analytics Cloud together, he can get the data he needs when he needs it – even during important meetings.

Since implementing Autonomous Data Warehouse, Data Intensity has had an initial savings of nearly a quarter of a million dollars and they’re running on 10 times less hardware than they were previously. They also have 10 times the number of users accessing the system as they used to, and all of them are driving value rather than just spending their time getting data out of the system.

Looker: Analytics at the Speed of Thought

At Looker, they were seeing demand for a fully managed experience where people didn’t have to worry about the hardware component. Because of the Autonomous Data Warehouse, users can focus on the analytics from day 1 and have interactive question-answer sessions in real time.

Now, Looker can feel confident that they are supporting their growth while providing analytics to entire organizations as they keep adding new users.

DX Marketing: Advanced Analytics in Autonomous Data Warehouse

DX Marketing wanted to build a data management platform that non-technical people could build themselves. Having an Autonomous Data Warehouse makes things easier for the end user. And using Oracle Advanced Analytics with Autonomous Data Warehouse means that everything runs in the database: there’s no external system pulling data down, processing it, and putting it back, which eliminates network latency.

Four Companies, Four Success Stories with Autonomous Data Warehouse

With Autonomous Data Warehouse, we’ve built a data warehouse that essentially runs itself. These are the only five questions you need to answer before setting up your data warehouse:

  • How many CPUs do you want?
  • How much storage do you need?
  • What’s your password?
  • What’s the database name?
  • What’s a brief description?

It’s really that simple. To get started today and see how you can stop worrying about data management and start thinking about how to take your analytics to the next level, sign up for a free data warehouse trial. It’s easy, it’s fast, and we have a step-by-step how-to guide right here.

Related:

Data Warehouse 101: Introduction

What Is a Data Warehouse?

A data warehouse is a relational database that is designed for queries and analytics rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analytics workloads from transaction workloads and enables an organization to consolidate data from several sources.

So what’s an Oracle Autonomous Data Warehouse?

With an Autonomous Data Warehouse, you no longer need specialized database administration skills. You could be a marketing manager, financial analyst or HRIS administrator, and start an analytics project in minutes without involving IT. Anyone can easily explore data for deeper business insights and transform analytics into visual stories to guide executive decision making.

Sign Up for a Free Data Warehouse Trial

What’s Inside an Autonomous Data Warehouse?

Oracle Autonomous Data Warehouse is built on the market-leading Oracle database and comes with fully automated features that offload manual IT administration tasks and deliver outstanding query performance. This environment is delivered as a fully managed cloud service running on optimized high-end Oracle hardware systems. You don’t need to spend time thinking about how you should store your data, when or how to back it up or how to tune your queries.

We take care of everything for you.

This ongoing series of blogs provides you with a detailed, step-by-step guide on Oracle’s Autonomous Data Warehouse. With Oracle Autonomous Data Warehouse, we make it quick and easy for you to create a secure, fully managed data warehouse service in the Oracle Cloud which allows you to start loading and analyzing your data immediately.

We will provide you with step-by-step instructions for each stage of the process, from setting up your trial and provisioning the service to loading and analyzing data.

We’ll work with publicly available data sets and analyze them with the service, demonstrating the types of powerful visualizations you can create yourself.

How to Set up a Data Warehouse Trial

Let’s get started by setting up a free data warehouse trial:

You can quickly and easily sign up for a free trial account that provides:

  • $300 of free credits, good for up to 3,500 hours of Oracle Cloud usage
  • Credits that can be used on all eligible Cloud Platform and Infrastructure services for the next 30 days
  • Credit card details used for verification only; your card will not be charged unless you ‘Upgrade to Paid’ in My Services

Go to the trial sign-up page to request your free cloud account.

Once your trial account is created, you will receive a “Welcome to Oracle Cloud” email that contains your cloud account password along with links to useful collateral. Then sign in to the Oracle Cloud to continue.

Logging in to Oracle Cloud, Selecting a Data Center for your Workshop

When you log in to the Cloud Console for Autonomous Data Warehouse, you will have the option of choosing the REGION for your new data warehouse instance.

To ensure you get the very best experience possible during this workshop, we recommend choosing our Phoenix data center when creating Autonomous Data Warehouse instances in the North America data region, and our Frankfurt data center when creating instances in EMEA and APAC.

For example, to use the Phoenix data center you would select the us-phoenix-1 REGION.

If you do not see the US-Phoenix-1 region (or any other region), choose the menu item “Manage Regions”. Subscribe to the region you want (e.g. US-Phoenix-1). It will become available for selection after you refresh your browser.

Lab Prerequisites – Required Software

This workshop requires two desktop tools to be installed on your computer to complete the exercises in this lab.

1. SQL Developer

To download and install SQL Developer, go to the SQL Developer download page and select the operating system for your computer. That page also has instructions on how to install SQL Developer on Windows, macOS, and Linux.

If you already have SQL Developer installed on your computer then please check the version. The minimum version that is required to connect to an Oracle Autonomous Data Warehouse Cloud is SQL Developer 17.4.

2. Data Visualization Desktop

Oracle Data Visualization Desktop makes it easy to visualize your data so you can focus on exploring interesting data patterns. Choose from a variety of visualizations to look at data in a specific way. Data Visualization Desktop comes included with Autonomous Data Warehouse.

To download and install Data Visualization Desktop, go to its download page and select the operating system for your computer. That page also has instructions on how to install Data Visualization Desktop on Windows and macOS.

If you already have Data Visualization Desktop installed on your computer then please check the version. The minimum version that is required to connect to an Oracle Autonomous Data Warehouse Cloud is 12c 12.2.5.0.0.

In the next post, Data Warehouse 101: Provisioning, we will provision an autonomous data warehouse.

Written by Sai Valluri and Philip Li

Related:

How USC’s Person Data Integration Project Went Enterprise-Wide

Like other institutions, USC has many different legacy and modern systems that keep operations running on a daily basis. In some cases, we have the same kind of data in different systems; in other cases, we have different data in different systems. The majority of our student data is in the Student Information System, which is over 30 years old, whereas the majority of our staff data is now in cloud software with APIs. At the same time, the majority of our financial data is in another system, and the faculty research data is in yet another system. You get the idea.

Naturally, all of these different systems make it hard to combine the data to make better decisions. The plethora of ways to extract data from these systems, along with differing refresh intervals, makes it easy to make mistakes each time a report needs to be created. Even though many groups on campus need to extract and combine the same kind of data, each group has to do the extraction work individually, which wastes time and resources across the entire organization.

There is no single system that will be able to accommodate all of the different functions USC needs to operate; USC simply does too much: Academics, Athletics, Healthcare, Research, Construction, HR, etc. Because of this, the solution is to systematically integrate the data from the different systems one time, so every department on campus can access the same data while the different systems continue to operate.

The Person Entity Project

At a high level, the Person Entity (PE) Project is essentially a database of everything applicable to a Person. This includes every kind of person – pre-applicant, applicant, admit, student, alum, donor, faculty, staff, etc. It aims to be a centralized, high-integrity database, encompassing data from multiple systems, that can supply data to every department at USC. Such complete, high-quality data encompassing every Trojan’s entire academic and professional life at USC can then motivate powerful decision-making in recruiting, admissions, financial aid, advancement, advising, and many other domains.

As one of the original members of the team that started the PE as a skunk works project, I have seen some of the challenges of transforming a small project to an enterprise-backed service.

Where It All Began

The project came about shortly after Dr. Douglas Shook became the USC Registrar. He wanted the data at USC to be more easily accessible to the academic units on campus, as well as to the university for its business processes. The initial business processes for getting and using data were filled with manual text-file FTPs, Excel files emailed back and forth, nightly data dumps, and the like; delayed, incomplete, costly data was the norm.

The odds were against us because there had been numerous past attempts to transition away from our custom-coded legacy student information system, and they had died mid-project. Though our project was not entirely the same as those failed efforts, it was similar enough that if we could move the data into a relational database and make it more accessible to other systems, the eventual transition would be a lot easier. There was not a lot of documentation on the student information system, and there were 30 years' worth of custom code and fields in it. We were also using a message broker for the first time to connect to a legacy system like ours.

Working on this project was rewarding and worthwhile because we were able not only to overcome technical hurdles, but also to get some early wins for the project, such as providing data for a widely used applicant portal and providing updated data faster than the existing method. We steadily grew our list of data consumers, and now our service provides data for the entire University—schools and administrative units.

If you are currently working on a small experimental project in a large organization, here are some key drivers to keep in mind as you embark on your journey.

1. Create your own success criteria.

Good success criteria for a small project are the small proofs of concept that can be completed for the people on campus who are interested in your services or who have not had all of their needs met by one of the enterprise services. The initial ROI will definitely not be as high as for other projects, so don't measure things by the monetary amount. For us, the success criteria were things like successfully moving data from one system to the other and reducing the number of hours a process took to reach the same result. We also kept a timeline of the project's "firsts": the first successful push of data, first database tables created, first triggers, first procedures, first production database, first integration with another system, first data client, and so on. We could show the sponsors that we were making progress, and we had a growing list of people who were excited about the problems we could solve for them. By keeping the lines of communication open about our progress, we were given the time and funding we needed to keep working on the project.

2. Minimize spending to maximize your budget.

Don't be afraid to ask others in your organization for help. In my organization, we piggybacked on other internal organizations' licenses to lock in lower prices on renewals versus signing on as a new customer. Another way to cut costs is to use student workers or interns to do research and project work. Student workers bring a different set of skills, along with some challenges, but they have been instrumental in our project. In our case, the project initially started with only one full-time member and three to four student workers; I was one of them. We did the data modeling at first, some requirements gathering, and the coding as well. Though our work was far from perfect and we needed some guidance from full-time employees, the entire team was able to gain traction on the project by accomplishing tasks with the help of the students at a very low cost.

I would also suggest doing a cost-benefit analysis for software purchases. For example, we decided to purchase a professional data modeling tool even though we could have continued using Visio, because the monetary and time cost of fiddling with Visio would have been higher than the license fee in the long run. Lastly, find others willing to partially sponsor the project with equipment. For example, you can ask other groups to give you a slice of their virtual machine while you are just getting started with development.

3. Work in bite-sized chunks.

It's easy to be overwhelmed if your project scope is large. That's why it's extremely useful to do proofs of concept and pilot projects. For us, the project was so large that if we had tried to plan everything out at the very beginning, we would have been too overwhelmed to even start. The goal of the project was to migrate all of the data over to the Oracle relational database, so we needed to divide the system into different data domains and start with one or two. We chose to do Admissions and Person first. This is a little unconventional, I think, but we deployed some tables and created some procedures to populate them just so we could get started, before we had completely finished the model. On a small project it is usually okay to be wrong, because you can just start over!

4. Balance selling/promoting the project with working on the project.

The project sponsor is an important member of the team on this point. If there is no interest in your project, it will die; but if there is too much interest and too many expectations, you will fall short of those expectations and people will lose faith in the project. Both interest and expectations matter, but too much of either could be fatal. Our sponsor pitched the project as the solution to problems USC was trying to solve, explained how much faster and easier things would be, and got people excited about using it and lining up to be the next customer. We started partnering with other groups on campus and doing proofs of concept together.

5. Base your design choices on mass adoption and impact.

When working on the project, you need to think about what the best design choice will be once the project is in production. This is easier said than done because the future is unknown. Essentially, don't knowingly make design choices that will cripple your project once it is in production. Think of the potential benefits to the organization if what you are working on is implemented at scale. It is a balancing act between getting things out and getting things perfect; find the balance that your organization finds acceptable. I think it is different for each feature and, in our case, for each data field. In other words, prioritize the features so you can pay special attention to the high-priority ones.

6. Document everything.

Documentation increases the speed of onboarding new members and helps you remember the rationale behind the decisions you made at the time. This will save you a lot of time down the road. Now that we are in year five of the project, we sometimes come back to old code and tables and wonder why we designed them the way we did; this is not ideal, and we should have done a better job at documentation. This is less of an issue with the newer portions of the project, but in the beginning we did not do much documentation, and it has slowed us down recently. Members of the team could leave during a critical stage of the project, and without documentation it is very hard to keep progressing at the same rate. Also, at least version your documentation: even if you don't make it a habit to update it after things change in production, you should at least know when it was last updated.

7. Embrace change.

There's always a lot of change to be expected when working on a small project. Decisions about the future of the project can be made at almost any time, especially since very few services are being provided in the beginning. Those decisions could be made without you in the room as well. Funding and resources can be reallocated, which can severely impact your project. We were lucky in that none of these things happened to us, but I think that is because we were able to overcome the major technical roadblocks early on.

Now that we are an enterprise-backed service, things have definitely changed, and though I am proud that the project has progressed so far, I am a little nostalgic for the time when we were hitting milestones what felt like every week. You should definitely enjoy the time when the project is just starting and small, because it will never get smaller after it gains momentum.

The Person Entity project team now has around three full-time members, including one member focusing strictly on data quality. The project is now part of a larger data team with a portfolio of data systems to manage, such as Tableau and Cognos. We provide some or all of the data and dashboards for schools and administrative units across campus, such as Financial Aid, the Registrar, and Admissions. We now have integrations with cVent, Campaign!, the mandatory education modules, and myUSC, with plenty more on the way. We are still far from completing our task of extracting all the data from our legacy student information system, and we will steadily continue to work on it, as well as on moving our entire infrastructure to the cloud.

Guest Author, Stanley Su is currently a data architect on the Enterprise Data and Analytics team at USC. Stan was one of the original members on the Person Entity project, an enterprise data layer with integrated data from multiple systems at USC, and is the current lead on the project. During the Fall semester, he also TAs for a database class at the Marshall School of Business. Stan is interested in using technology to increase business efficiency and reduce repetitive tasks at work.

Related:

Design Your Data Lake for Maximum Impact

Data lakes are fast becoming valuable tools for businesses that need to organize large volumes of highly diverse data from multiple sources. However, if you are not a data scientist, a data lake may seem more like an ocean that you are bound to drown in. Making a data lake manageable for everyone requires mindful designs that empower users with the appropriate tools.

A recent webcast conducted by TDWI and Oracle, entitled “How to Design a Data Lake with Business Impact in Mind,” identified the best use cases for using a data lake and then defined how to design one for an enterprise-level business. The presentation recommended keeping data-driven use cases at the forefront, making the data lake a central IT-managed function, blending old and new data, empowering self-service, and establishing a sponsor group to manage the company’s data lake plan with enough staffing and skills to keep it relevant.

“Businesses want to make more fact-based decisions, but they also want to go deeper into the data they have with analytics,” says Philip Russom, a Senior Research Director for Data Management at TDWI. “We see data lakes as a good advantage for companies that want to do this, as the data can be repurposed repeatedly for new analytics and use cases.”

Data lake usage is on the rise, according to TDWI surveys. A 2017 survey revealed that nearly a quarter of the businesses questioned (23 percent) already have a data lake in production, another quarter (24 percent) expected to launch one within 12 months, and only 7 percent said they would not adopt a data lake at all. A significant number (21 percent) said they would establish a data lake within three years.

In the same survey, respondents were asked about the business benefits of deploying a Hadoop-based data lake. Half (49 percent) rated advanced analytics, including data mining, statistics, and machine learning, as the primary use case, followed by data exploration and discovery. Using the data lake as a big data source for analytics was the third most common response.

Use cases for data lakes include investigating new data coming from sensors and machines, streaming, and human language text. More complex uses for data lakes include multiplatform data warehouse environments, omnichannel marketing, and digital supply chain.

The best argument for deploying and using a data lake is the ability to blend old and new data together. This is especially helpful for departments like marketing, finance, and governance, which require insight from multiple sources, old and new. Russom noted that multi-module enterprise resource planning, the Internet of Things (IoT), insurance claim workflows, and digital healthcare would all be areas that could benefit from data lake deployments.

When it comes to design, Russom suggests the following:

  • Create a plan, prioritize use cases, and update the plan as the business evolves
  • Choose data platform(s) that support business requirements
  • Get tools that work with the platform and satisfy user requirements
  • Augment your staff with consultants experienced with data lakes
  • Train staff on Hadoop, analytics, data lakes, and clouds
  • Start with a business use case that a lake can address with a clear ROI

Bruce Edwards, a Cloud Luminary and Information Management Specialist with Oracle, added that the convergence of cloud, big data, and data science has enabled the explosion of data lake deployments. Having a central vendor that not only understands large-scale data management but can also integrate existing infrastructures into core data lake components is essential.

“What data lake users need is an open, integrated, self-healing, high performance tool,” Edwards said. “These elements are all needed to allow businesses to begin their data lake journey.”

To experience the entire webcast, download the presentation from our website. If you’re ready to start playing around with a data lake, we can offer you a free trial right here.

Related:

How does Oracle’s Data Lake Enable Big Data Solutions?

By: Wes Prichard

Senior Director Industry Solution Architecture

When I took wood shop back in eighth grade, my shop teacher taught us to create a design for our project before we started building it. The way we captured the design was in what was called a working drawing. In those days it was neatly hand sketched showing shapes and dimensions from different perspectives and it provided enough information to cut and assemble the wood project.

The big data solutions we work with today are much more complex and built with layers of technology and collections of services, but we still need something like working drawings to see how the pieces fit together.

Solution patterns (sometimes called architecture patterns) are a form of working drawing that help us see the components of a system and where they integrate but without some of the detail that can keep us from seeing the forest for the trees. That detail is still important, but it can be captured in other architecture diagrams.

In this blog post, I want to introduce some solution patterns for data lakes. (If you want to learn more about what data lakes are, read “What Is a Data Lake?”) Data lakes have many uses and play a key role in providing solutions to many different business problems.

Register for a guided trial to build your own data lake

The solution patterns described here show some of the different ways data lakes are used in combination with other technologies to address some of the most common big data use cases. I’m going to focus on cloud-based solutions using Oracle’s platform (PaaS) cloud services.

These are the patterns:

  • Data Science Lab
  • ETL Offload for Data Warehouse
  • Big Data Advanced Analytics
  • Streaming Analytics

Data Science Lab Solution Pattern

Let’s start with the Data Science Lab. We call it a lab because it’s a place for discovery and experimentation using the tools of data science. Data Science Labs are important for working with new data, for working with existing data in new ways, and for combining data from different sources that are in different formats. The lab is the place to try out machine learning and determine the value in data.

Before describing the pattern, let me provide a few tips on how to interpret the diagrams. Each blue box represents an Oracle cloud service. A smaller box attached under a larger box represents a required supporting service that is usually transparent to the user. Arrows show the direction of data flow but don’t necessarily indicate how the data flow is initiated.

The data science lab contains a data lake and a data visualization platform. The data lake is a combination of object storage plus the Apache Spark™ execution engine and related tools contained in Oracle Big Data Cloud. Oracle Analytics Cloud provides data visualization and other valuable capabilities like data flows for data preparation and blending relational data with data in the data lake. It also uses an instance of the Oracle Database Cloud Service to manage metadata.

The data lake object store can be populated by the data scientist using an OpenStack Swift client or the Oracle Software Appliance. If automated bulk upload of data is required, Oracle has data integration capabilities for any need, as described in other solution patterns. The object storage used by the lab could be dedicated to the lab, or it can be shared with other services, depending on your data governance practices.
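
To make the lab pattern more tangible, here is a hedged PySpark sketch of the kind of exploratory work it enables: profiling a raw file that a data scientist has dropped into the object store. The storage URI, schema, and column names are placeholders, and the example assumes the Spark cluster is already configured with credentials and a connector for the object store.

```python
# Illustrative data-science-lab sketch: profile a raw CSV file sitting in
# object storage using PySpark. The URI scheme and paths are placeholders and
# depend on how your cluster's object storage connector is configured.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sensor-exploration").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("oci://lab-bucket@tenancy/sensors/2018/*.csv"))   # placeholder URI

raw.printSchema()         # what did we actually ingest?
raw.describe().show()     # quick numeric profile

# Example exploratory question: which devices report the widest temperature swings?
(raw.groupBy("device_id")
    .agg(F.min("temperature").alias("min_t"),
         F.max("temperature").alias("max_t"))
    .withColumn("swing", F.col("max_t") - F.col("min_t"))
    .orderBy(F.desc("swing"))
    .show(10))
```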

ETL Offload for Data Warehouse Solution Pattern

Data warehouses are an important tool for enterprises to manage their most important business data as a source for business intelligence. Data warehouses, being built on relational databases, are highly structured. Data therefore must often be transformed into the desired structure before it is loaded into the data warehouse.

This transformation processing in some cases can become a significant load on the data warehouse driving up the cost of operation. Depending on the level of transformation needed, offloading that transformation processing to other platforms can both reduce the operational costs and free up data warehouse resources to focus on its primary role of serving data.

Oracle’s Data Integration Platform Cloud (DIPC) is the primary tool for extracting, loading, and transforming data for the data warehouse. Oracle Database Cloud Service provides required metadata management for DIPC. Using Extract-Load-Transform (E-LT) processing, data transformations are performed where the data resides.

For cases where additional transformation processing is required before loading (Extract-Transform-Load, or ETL), or where new data products are going to be generated, data can be temporarily staged in object storage and processed in the data lake using Apache Spark™. This also provides an opportunity to extend the data warehouse with technology to query the data lake directly, a capability of Oracle Autonomous Data Warehouse Cloud.
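
As a rough illustration of the offload flow, the sketch below reads staged raw data from object storage, performs the transformation in Spark, and writes a warehouse-shaped result back to object storage. Bucket paths, schema, and column names are placeholders.

```python
# Hedged ETL-offload sketch: transform raw order data in Spark so the heavy
# lifting happens outside the data warehouse. Bucket paths and columns are
# placeholders for whatever your staging layout actually looks like.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-offload").getOrCreate()

orders = spark.read.parquet("oci://staging@tenancy/raw/orders/")   # placeholder path

daily_revenue = (orders
    .filter(F.col("status") == "SHIPPED")
    .withColumn("order_day", F.to_date("order_ts"))
    .groupBy("order_day", "region")
    .agg(F.sum("amount").alias("revenue"),
         F.countDistinct("customer_id").alias("customers")))

# Write the transformed, warehouse-shaped result back to object storage,
# where it can be loaded into (or queried directly by) the data warehouse.
(daily_revenue.write
    .mode("overwrite")
    .parquet("oci://staging@tenancy/curated/daily_revenue/"))
```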

Big Data Advanced Analytics Solution Pattern

Advanced analytics is one of the most common use cases for a data lake to operationalize the analysis of data using machine learning, geospatial, and/or graph analytics techniques. Big data advanced analytics extends the Data Science Lab pattern with enterprise grade data integration.

Also, whereas a lab may use a smaller number of processors and storage, the advanced analytics pattern supports a system scaled-up to the demands of the workload.


Oracle Data Integration Platform Cloud provides a remote agent to capture data at the source and deliver it to the data lake either directly to Spark in Oracle Big Data Cloud or to object storage. The processing of data here tends to be more automated through jobs that run periodically.

Results are made available to Oracle Analytics Cloud for visualization and consumption by business users and analysts. Results like machine learning predictions can also be delivered to other business applications to drive innovative services and applications.

Stream Analytics Solution Pattern

The Stream Analytics pattern is a variation of the Big Data Advanced Analytics pattern that is focused on streaming data. Streaming data brings with it additional demands because the data arrives as it is produced and often the objective is to process it just as quickly.

Stream Analytics is used to detect patterns in transactions, like detecting fraud, or to make predictions about customer behavior like propensity to buy or churn. It can be used for geo-fencing to detect when someone or something crosses a geographical boundary.


Business transactions are captured at the source using the Oracle Data Integration Platform Cloud remote agent and published to an Apache Kafka® topic in Oracle Event Hub Cloud Service. The Stream Analytics Continuous Query Language (CQL) engine running on Spark subscribes to the Kafka topic and performs the desired processing like looking for specific events, responding to patterns over time, or other work that requires immediate action.

Other data sources that can be fed directly to Kafka, like public data feeds or mobile application data, can be processed by business-specific Spark jobs. Results like detected events and machine learning predictions are published to other Kafka topics for consumption by downstream applications and business processes.
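
A bare-bones version of this flow, written with Spark Structured Streaming rather than the CQL tooling described above, might look like the sketch below: subscribe to a transactions topic, flag events matching a simple rule, and publish the flagged events to a downstream topic. Broker addresses, topic names, the message schema, and the rule itself are all placeholders.

```python
# Illustrative streaming sketch (Spark Structured Streaming + Kafka): read
# transactions, flag suspiciously large ones, and publish them downstream.
# Brokers, topics, schema, and the threshold rule are all placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("txn-stream").getOrCreate()

schema = (StructType()
          .add("txn_id", StringType())
          .add("card_id", StringType())
          .add("amount", DoubleType()))

transactions = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "transactions")                # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
    .select("t.*"))

# Toy rule standing in for real pattern detection: flag very large transactions.
suspicious = transactions.filter(F.col("amount") > 10000)

query = (suspicious
    .select(F.to_json(F.struct("*")).alias("value"))
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "suspicious-transactions")
    .option("checkpointLocation", "/tmp/chk/txn-stream")
    .start())

query.awaitTermination()
```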

Conclusion

The four different solution patterns shown here support many different data lake use cases, but what happens if you want a solution that includes capabilities from more than one pattern? You can have it. Patterns can be combined, but the cloud also makes it easy to have multiple Oracle Big Data Cloud instances for different purposes with all accessing data from a common object store.

Now you’ve seen some examples of how Oracle Platform Cloud Services can be combined in different ways to address different classes of business problem. Use these patterns as a starting point for your own solutions. And even though it’s been a few years since eighth grade, I still enjoy woodworking and I always start my projects with a working drawing.

If you’re ready to test these data lake solution patterns, try Oracle Cloud for free with a guided trial, and build your own data lake.


Related:

Integrating Autonomous Data Warehouse and Big Data Using Object Storage

While you can run your business on the data stored in Oracle Autonomous Data Warehouse, there’s lots of other data out there which is potentially valuable. Using Oracle Big Data Cloud, it’s possible to store and process that data, making it ready to be loaded into or queried by the Autonomous Data Warehouse. The point of integration for these two services is object storage which I will explore below. Of course, you need more than this for a complete big data solution. If that’s what you’re looking for, you should read about data lake solution patterns.

Sign up for a free trial to build and populate a data lake in the cloud

Use Cases for the Data Lake and Data Warehouse

Autonomous Data Warehouse and Big Data

Almost all big data use cases involve data that resides in both a data lake and data warehouse. With predictive maintenance, for example, we would want to combine sensor data (stored in the data lake) with official maintenance and purchase records (stored in the data warehouse).

When trying to determine the next best action for a given customer, we would want to work with both customer purchase records (in the data warehouse) and customer web browsing or social media usage (details of which would most likely be stored in the data lake). In use cases from manufacturing to healthcare, having a complete view of all available data means working with data in both the data warehouse and the data lake.

The Data Lake and Data Warehouse for Predictive Maintenance

Take predictive maintenance as an example. Official maintenance records and purchase or warranty information are all important to the business. They may be needed by regulators to check that proper processes are being followed, or by purchasing departments to manage budgets or order new components.

On the other hand, sensor information from machines, weather stations, thermometers, seismometers, and similar devices all produce data that is potentially useful to help understand and predict the behavior of some piece of equipment. If you asked your data warehouse administrator to store many terabytes of this raw, less well-understood, multi-structured data, they would not be very enthusiastic. This kind of data is much better suited for a data lake, where it can be transformed or used as the input for machine learning algorithms. But ultimately, you want to combine both data sets to predict failures or a component moving out of tolerance.

Examples: How Object Storage Works with the Data Warehouse

We talked previously about how object storage is the foundation for a modern data lake. But it’s much more than that. Object storage is used, amongst other things, for backup and archive, to stage data for a data warehouse, or to offload data that is no longer stored there. And these use cases require that the data warehouse can also work easily with object storage, including data in the data lake.

Let’s go back to that predictive maintenance use case. After being loaded into the data lake (in object storage), the sensor data can be processed in a Spark cluster spun up by Oracle Big Data Cloud. “Processing” in this context could be anything from a simple filter or aggregation of results to running a complex machine learning algorithm to uncover hidden patterns.

Once that work is done, a table of results will be written back to object storage. At that point, it could be loaded into the Autonomous Data Warehouse or queried in place. Which approach is best? That depends on the use case. In general, if the data is accessed more frequently, or query performance is more important, then loading it into the Autonomous Data Warehouse is probably optimal. Here you can think of object storage as another tier in your storage hierarchy (note that Autonomous Data Warehouse already has RAM, flash, and disk as storage tiers).

We can also see a similar approach in an ETL offload use case. Raw data is staged into object storage. Transformation processes then run in one or more Big Data Cloud Spark clusters, with the results written back to object storage. This transformed data is then available to load into Autonomous Data Warehouse.
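
To make that last loading step concrete, here is a hedged sketch of pulling a curated file from object storage into an existing Autonomous Data Warehouse table with the DBMS_CLOUD.COPY_DATA procedure, driven from Python. The connection details, credential name, target table, and object URI are placeholders, and the target table and cloud credential are assumed to exist already.

```python
# Hedged sketch: load a curated file from object storage into an existing
# Autonomous Data Warehouse table using DBMS_CLOUD.COPY_DATA. Connection
# details, credential name, table name, and the object URI are placeholders.
import oracledb

connection = oracledb.connect(
    user="ADMIN",
    password="your_admin_password",          # placeholder
    dsn="mydw_high",
    config_dir="/path/to/wallet",
    wallet_location="/path/to/wallet",
    wallet_password="your_wallet_password",
)

plsql = """
BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'DAILY_REVENUE',      -- existing target table
    credential_name => 'OBJ_STORE_CRED',     -- created earlier with DBMS_CLOUD.CREATE_CREDENTIAL
    file_uri_list   => :uri,
    format          => '{"type" : "csv", "skipheaders" : 1}'
  );
END;
"""

# Placeholder object storage URI for the curated result file.
uri = ("https://objectstorage.us-phoenix-1.oraclecloud.com"
       "/n/mynamespace/b/staging/o/curated/daily_revenue.csv")

with connection.cursor() as cursor:
    cursor.execute(plsql, uri=uri)
```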

Autonomous Data Warehouse and Big Data Cloud: Working Together

Don’t think of Oracle Autonomous Data Warehouse and Oracle Big Data Cloud as two totally separate services. They have complementary strengths and can interoperate via object storage. And when they do, it will make it easier to take advantage of all your data, to the benefit of your business as a whole.

If you’re interested in learning more, sign up for an Oracle free trial to build and populate your own data lake. We have tutorials and guides to help you along.

Related:

Autonomous Capabilities Will Make Data Warehouses—and DBAs—More Valuable

By: Ilona Gabinsky

Principal Product Marketing Manager

Today's guest blogger is Alan Zeichick, principal analyst at Camden Associates.

As the old saying goes, you can’t manage what you don’t measure. In a data-driven organization, the best tools for measuring performance are business intelligence (BI) and analytics engines, which require data. And that explains why data warehouses continue to play such a crucial role in business. Data warehouses often provide the source of that data, by rolling up and summarizing key information from a variety of sources.

Data warehouses, which are themselves relational databases, can be complex to set up and manage on a daily basis, so they typically require significant human involvement from database administrators (DBAs). In a large enterprise, a team of DBAs ensures that the data warehouse is extracting data from those disparate data sources, as well as accommodating new and changed data sources. They’re also making sure the extracted data is summarized properly and stored in a structured manner that can be handled by other applications, including those BI and analytics tools.

On top of that, DBAs are managing the data warehouse’s infrastructure: everything from server processor utilization and storage efficiency to data security, backups, and more.

However, the labor-intensive nature of data warehouses is about to change, with the advent of Oracle Autonomous Data Warehouse Cloud, announced in October 2017. The self-driving, self-repairing, self-tuning functionality of Oracle’s Data Warehouse Cloud is good for the organization—and good for the DBAs.

No Performance-Tuning Knobs

Data-driven organizations need timely, up-to-date business intelligence, which can feed instant decision-making, short-term predictions and business adjustments, and long-term strategy. If the data warehouse goes down, slows down, or lacks some information feeds, the impact can be significant. No data warehouse may mean no daily operational dashboards and reports, or inaccurate dashboards or reports.

Oracle Autonomous Data Warehouse Cloud is a powerful platform, because the customer doesn’t have to worry about the system itself, explains Penny Avril, vice president of product management for Oracle Databases.

“Customers don’t have to worry about the operational management of the underlying database—provisioning, scaling, patching, backing up, failover, all of that is fully automated,” she says. “Customers also don’t have to worry about performance. There are no performance knobs for the customer: DBAs don’t have to tweak anything themselves.”

For example, one technique used to drive Autonomous Data Warehouse’s performance is automating the creation of storage indexes, which Avril describes as the top challenge faced by database administrators. Those indexes allow applications to quickly extract the data required to handle routine reports or ad hoc queries.

“DBAs manually create custom indexes when they manage their own data warehouse. Now, the autonomous data warehouse transparently, and continually, generates indexes automatically based on the queries coming in,” she says. Those automatically created indexes keep performance high, without any manual tuning or intervention required by DBAs.


The organization can also benefit from the automatic scaling features of Autonomous Data Warehouse. When the business requires more horsepower in the data warehouse to maintain performance during times of high utilization, the customer can add more processing power by adding more CPUs to the cloud service, for which there is an additional cost. However, Avril says, “Customers can scale back down again when the peak demand is over”—eliminating that extra cost until the next time the CPUs are needed.

Customers can even turn off the processing entirely if needed. “When a customer suspends the service, they pay for storage, but not CPU,” she says. “That’s great for developers and test beds. It’s great for ad-hoc analytics for people running queries. When you don’t need a particular data warehouse, you can just suspend it.”
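
As a rough sketch of what suspending and resuming the service can look like when scripted, the OCI Python SDK exposes stop and start operations for an Autonomous Data Warehouse instance; the OCID below is a placeholder.

```python
# Illustrative sketch: suspend an Autonomous Data Warehouse when it is idle
# (storage is still billed, CPU is not) and resume it later. The OCID is a
# placeholder.
import oci

config = oci.config.from_file()
db_client = oci.database.DatabaseClient(config)

ADW_OCID = "ocid1.autonomousdatabase.oc1..example"

db_client.stop_autonomous_database(ADW_OCID)    # suspend at the end of the day
db_client.start_autonomous_database(ADW_OCID)   # resume when analysts need it again
```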

Freedom for the Database Administrator

Performance optimization, self-repairing, self-securing, scalability up and down—those benefits serve the organization. What about the poor DBA? Is he or she out of work? Not at all, says Avril, laughing at the question. “They can finally tackle the task backlog,” adding more value to the business, she says.

Avril explains that DBAs do two types of day-to-day work. “There are generic tasks, common to all databases, including data warehouses. And there are tasks that are specific to the business. With Oracle’s Autonomous Data Warehouse, the generic tasks go away. Configuring, tuning, provisioning, backup, optimization—gone.”

That leaves the good stuff, she explains: “If they aren’t overloaded with generic tasks, DBAs can do business-specific tasks, like data modeling, integrating new data sources, application tuning, and end-to-end service level management.”

For example, DBAs will have to manage how applications connect to the data warehouse—and what happens if things go wrong. “If the database survives a failure through failover, does the application know to failover instantly and transparently? The DBA still needs to manage that,” Avril says.

In addition, data security still must be managed. “Oracle will take care of patching the data warehouse itself, but Oracle doesn’t see the customer’s data,” she says, “DBAs still need to understand where the data lives, what the data represents, and which people and applications should get to see which data.”

No need for a resume writer: DBAs will still have plenty of work to do.

For C-level executives, Autonomous Data Warehouse can improve the value of the data warehouse—and the responsiveness of business intelligence and other important applications—by improving availability and performance. “The value of the business is driven by data, and by the usage of the data,” says Avril. “For many companies, the data is the only real capital they have. Oracle is making it easier for the C-level to manage and use that data. That should help the bottom line.”

For the DBA, Autonomous Data Warehouse means the end of generic tasks that, on their own, don’t add significant value to the business. Stop worrying about uptime. Forget about disk-drive failures. Move beyond performance tuning. DBAs, you have a business to optimize.

Alan Zeichick is principal analyst at Camden Associates, a tech consultancy in Phoenix, Arizona, specializing in software development, enterprise networking, and cybersecurity. Follow him @zeichick.


Related:

What’s the Difference Between a Data Lake, Data Warehouse and Database?

There are so many buzzwords these days regarding data management. Data lakes, data warehouses, and databases – what are they? In this article, we’ll walk through them and cover the definitions, the key differences, and what we see for the future.

Start building your own data lake with a free trial

Data Lake Definition

If you want full, in-depth information, you can read our article called, “What’s a Data Lake?” But here we can tell you, “A data lake is a place to store your structured and unstructured data, as well as a method for organizing large volumes of highly diverse data from diverse sources.”

The data lake tends to ingest data very quickly and prepare it later, on the fly, as people access it.

Data Warehouse Definition

A data warehouse collects data from various sources, whether internal or external, and optimizes the data for retrieval for business purposes. The data is primarily structured, often from relational databases, but it can be unstructured too.

Primarily, the data warehouse is designed to gather business insights and allows businesses to integrate their data, manage it, and analyze it at many levels.

Database Definition

Essentially, a database is an organized collection of data. Databases are classified by the way they store this data. Early databases were flat and limited to simple rows and columns. Today, the popular databases are:

  • Relational databases, which store their data in tables
  • Object-oriented databases, which store their data in object classes and subclasses

Data Mart, Data Swamp and Other Terms

And, of course, there are other terms such as data mart and data swamp, which we’ll cover very quickly so you can sound like a data expert.

Enterprise Data Warehouse (EDW): This is a data warehouse that serves the entire enterprise.

Data Mart: A data mart is used by individual departments or groups and is intentionally limited in scope because it looks at what users need right now versus the data that already exists.

Data Swamp: When your data lake gets messy and is unmanageable, it becomes a data swamp.

The Differences Between Data Lakes, Data Warehouses, and Databases

Data lakes, data warehouses and databases are all designed to store data. So why are there different ways to store data, and what’s significant about them? In this section, we’ll cover the significant differences, with each definition building on the last.

The Database

Databases came about first, rising in the 1950s with the relational database becoming popular in the 1980s.

Databases are really set up to monitor and update real-time structured data, and they usually have only the most recent data available.

The Data Warehouse

The data warehouse, by contrast, is a model to support the flow of data from operational systems to decision systems. What this means, essentially, is that businesses were finding that their data was coming in from multiple places, and they needed a different place to analyze it all. Hence the growth of the data warehouse.

For example, let’s say you have a rewards card with a grocery chain. The database might hold your most recent purchases, with a goal to analyze current shopper trends. The data warehouse might hold a record of all of the items you’ve ever bought and it would be optimized so that data scientists could more easily analyze all of that data.

The Data Lake

Now let’s throw the data lake into the mix. And because it’s the newest, we’ll talk about this one more in depth. The data lake really started to rise around the 2000s, as a way to store unstructured data in a more cost-effective way. The key phrase here is cost effective.

Although databases and data warehouses can handle unstructured data, they don’t do so in the most efficient manner. With so much data out there, it can get expensive to store all of your data in a database or a data warehouse.

In addition, there’s the time-and-effort constraint. Data that goes into databases and data warehouses needs to be cleansed and prepared before it gets stored. And with today’s unstructured data, that can be a long and arduous process when you’re not even completely sure that the data is going to be used.

That’s why data lakes have risen to the forefront. The data lake is primarily designed to handle unstructured data in the most cost-effective manner possible. As a reminder, unstructured data can be anything from text to social media data to machine data such as log files and sensor data from IoT devices.

Data Lake Example

Going back to the grocery example that we used with the data warehouse, you might consider adding a data lake into the mix when you want a way to store your big data. Think about the social sentiment you’re collecting, or advertising results. Anything that is unstructured but still valuable can be stored in a data lake and work with both your data warehouse and your database.

Note 1: Having a data lake doesn’t mean you can just load your data willy-nilly. That’s what leads to a data swamp. But it does make the process easier, and new technologies such as having a data catalog will steadily make it easier to find and use the data in your data lake.

Note 2: If you want more information on the ideal data lake architecture, you can read the full article we wrote on the topic. It describes why you want your data lake built on object storage and Apache Spark, versus Hadoop.

What’s the Future of Data Lakes, Data Warehouses, and Databases?

Will one of these technologies rise to overtake the others? No, we don’t think so.

Here’s what we see. As the value and amount of unstructured data rises, the data lake will become increasingly popular. But there will always be an essential place for databases and data warehouses.

You’ll probably continue to keep your structured data in the database or data warehouse. But increasingly, companies are moving their unstructured data to data lakes on the cloud, where it’s more cost effective to store it and easy to move it when it’s needed. This workload that involves the database, data warehouse, and data lake in different ways is one that works, and works well. We’ll continue to see more of this for the foreseeable future.

If you’re interested in the data lake and want to try to build one yourself, we’re offering a free data lake trial with a step-by-step tutorial. Get started today.
