What Is Oracle Cloud Infrastructure Data Science?

And how does it work?

Incredible things can be done with data science, and more appear in the news every day—but there are still many barriers to success. These barriers range from a lack of proper support for data scientists to challenges around operationalizing and maintaining models in production.

That is why we created Oracle Cloud Infrastructure Data Science. Based on the acquisition of DataScience.com in 2018, Oracle Cloud Infrastructure Data Science was built with the goal of making data science collaborative, scalable, and powerful for every enterprise on Oracle Cloud Infrastructure. This short video gives an overview of the power of Oracle Cloud Infrastructure Data Science.

Oracle Cloud Infrastructure Data Science was created with the data scientist in mind—and it’s uniquely suited for data science success because of its support for team-based activity. When it comes to data science success, teams must collaborate at each step of the model lifecycle: from building models all the way through to deployment and beyond.

Oracle Cloud Infrastructure Data Science helps make all of that possible.

Never miss an update about data science! Introducing Oracle Data Science on Twitter — follow @OracleDataSci today for the latest updates!

What Is Oracle Cloud Infrastructure Data Science?

Oracle Cloud Infrastructure Data Science makes data science more structured and more efficient by offering:

Access to data and open-source tools

We are data-source agnostic. Your data can be in Autonomous Data Warehouse, in Object Storage, in MongoDB, in an Elasticsearch instance on Azure, or even in AWS Redshift. It doesn’t matter to us where the data is; we just care about giving you access to your data so you can get things done.
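
For example, from a notebook session you might pull a table from Autonomous Data Warehouse with cx_Oracle and read a file straight out of Object Storage. This is a minimal sketch, not the service's prescribed pattern: the credentials, DSN, table, and pre-authenticated request URL below are placeholders you would replace with your own.

import cx_Oracle
import pandas as pd

# Query Autonomous Data Warehouse (assumes a configured wallet; credentials and DSN are placeholders).
conn = cx_Oracle.connect("ml_user", "<your-password>", "adw_high")
sales = pd.read_sql("SELECT * FROM sales WHERE region = 'EMEA'", con=conn)

# Read a CSV directly from Object Storage via a pre-authenticated request URL (placeholder URL).
par_url = "https://objectstorage.<region>.oraclecloud.com/p/<PAR-token>/n/<namespace>/b/<bucket>/o/events.csv"
events = pd.read_csv(par_url)

print(sales.shape, events.shape)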

With Oracle Cloud Infrastructure Data Science, you can use the best of open source, including:

  • Tools and languages like Python and JupyterLab
  • Visualization like Plotly and Matplotlib
  • Machine-learning libraries like TensorFlow, Keras, scikit-learn, and XGBoost
  • Version control with Git

Ability to utilize compute on demand

We’ll give you the client connectors you need to access your data and a configurable volume to store that data in your notebook compute environment.

But of course, it doesn’t stop there. You can also select the amount of compute you need to train your model on Oracle Cloud Infrastructure. For now, you can choose small to large CPU virtual machines. And in the near future, we’re planning to add GPUs.

Collaborative workflow

We make a big deal out of teamwork, because we believe that data science can’t truly be successful unless there’s an emphasis on making those teams efficient and successful. We’ve done everything we can to make this possible.

Data scientists can work in “projects” where it’s easy to see what’s happening with a high-level view. Data scientists can share and reuse data science assets and test their colleagues’ models.

Model deployment

Model deployment is usually challenging, but Oracle Functions on Oracle Cloud Infrastructure makes it easier. Create a machine learning model function that can be invoked from any application. It’s one of many possible deployment targets, and it’s fully managed, highly scalable, and on demand.

What Makes Oracle Cloud Infrastructure Data Science Different?

With the growing popularity of data science and machine learning, products that claim to help are a dime a dozen. So, what makes Oracle Cloud Infrastructure Data Science different?

This isn’t an analytics tool with some machine learning capabilities embedded within it. Nor is it an app that offers AI capabilities across different products.

Oracle Cloud Infrastructure Data Science is a platform built for the modern, expert data scientist. And it was built by data scientists who were seeking a platform that would help them perform their complex work better. It’s not a drag-and-drop interface. This is meant for data scientists who write code in Python and need something with real power to enable real data science.

Oracle Cloud Infrastructure Data Science is right for you if you:

  • Have a team and see the benefits of centralized work
  • Prefer Python to drag-and-drop interfaces
  • Want to take advantage of the benefits of Oracle Cloud, with easy access to your data

Oracle Cloud Infrastructure Data Science is also right for you if you need:

  • The ability to train large models on large amounts of data with minimal infrastructure expertise
  • A system to evaluate and monitor models throughout their lifecycle
  • Improved productivity through automation and streamlined workflows
  • Capabilities to deploy models for varying use cases
  • The ability to collaborate with team members in an enterprise organization
  • A seamless, integrated Oracle Cloud Infrastructure user experience

How Does Oracle Cloud Infrastructure Data Science Work?

Oracle Cloud Infrastructure Data Science has:

Projects to centralize, organize, and document a team’s work. These projects describe the purpose of the work and allow users to organize notebook sessions and models.

Notebook Sessions for Python analyses and model development. Users can easily launch Oracle Cloud Infrastructure compute, storage, and networking for Python data science workloads. These sessions provide easy access to JupyterLab and other curated open-source machine-learning libraries for building and training models.

In addition, these notebook sessions come loaded with tutorials and example use cases to make getting started easier than ever.

Accelerated Data Science (ADS) SDK to make common data science tasks faster, easier, and less error-prone. This is a Python library that offers capabilities for data exploration and manipulation, model explanation and interpretation, and AutoML for automated model training.

Model Catalog to enable model auditability and reproducibility. You can track model metadata (including the creator, created date, name, and provenance), save model artifacts in service-managed object storage, and load models into notebook sessions for testing.

How Does Oracle Cloud Infrastructure Data Science Help with Model Management?

The process of building a machine learning model is an iterative one, and it’s one that essentially never ends. Let’s walk through how Oracle Cloud Infrastructure Data Science makes it easier to manage models throughout every step of the lifecycle.

Building a Model

Oracle Cloud Infrastructure Data Science’s JupyterLab environment offers a variety of open-source libraries for building machine learning models. It also includes the Accelerated Data Science (ADS) SDK, which provides APIs for data ingestion, data profiling and visualization, automated feature engineering, automated machine learning, model evaluation, and model interpretation. It’s everything that’s needed in a unified Python SDK, accomplishing in a few lines of code what a data scientist would typically do in hundreds of lines of code.
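
As a rough illustration, here is the kind of workflow the ADS SDK enables inside a notebook session. The file name and target column are made up, and the module paths reflect the ADS documentation of this period, so treat this as a sketch rather than a definitive API reference.

# ADS module paths as documented for this era of the service; verify in your notebook session.
from ads.dataset.factory import DatasetFactory

# Open a file (local or Object Storage) and declare the prediction target; both are placeholders.
ds = DatasetFactory.open("employee_attrition.csv", target="Attrition")

# Profile the data, then let ADS apply its suggested transformations.
ds.show_in_notebook()
transformed = ds.auto_transform()

# Split into training and test sets for the steps that follow.
train, test = transformed.train_test_split(test_size=0.2)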

Training a Model

Data scientists can automate model training through the ADS AutoML API. ADS can help data scientists find the best data transformations for datasets. After the model evaluation shows that the model is ready for production, the model can be made accessible to anybody who needs to use it.
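
Continuing the sketch above, training with the ADS AutoML API might look like this. Again, the module paths follow the ADS documentation of this era and may differ in newer releases.

from ads.automl.driver import AutoML
from ads.automl.provider import OracleAutoMLProvider

# 'train' is the training split prepared in the previous sketch.
automl = AutoML(train, provider=OracleAutoMLProvider())

# AutoML searches over algorithms, features, and hyperparameters, returning the tuned
# model alongside a simple baseline for comparison.
model, baseline = automl.train()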

Evaluating a Model

ADS also helps with model evaluation to ensure that your model is accurate and reliable. What percent accuracy can you achieve with the model? How can you make it more accurate? You want to feel confident in your model before you start to deploy it.
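
ADS ships its own evaluation helpers, but the idea is easy to show with plain scikit-learn, which is also preinstalled in notebook sessions. A self-contained sketch using synthetic data as a stand-in for a real training set:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

# Accuracy alone rarely tells the whole story; look at AUC and per-class metrics too.
preds = clf.predict(X_test)
probs = clf.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, preds))
print("ROC AUC :", roc_auc_score(y_test, probs))
print(classification_report(y_test, preds))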

Explaining a Model

Model explainability is becoming an increasingly important part of machine learning and data science. Can your model tell you more about why it reaches the decisions it does? European regulations increasingly enshrine a right to know: GDPR, for example, states that the data subject has a right to an explanation of a decision reached by a model.
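
ADS includes its own model-explanation capabilities, but permutation feature importance from scikit-learn illustrates the general idea: measure how much the model's score drops when each feature is shuffled. A minimal, self-contained sketch with synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature several times and record the drop in held-out score.
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")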

Deploying a Model

Taking a trained machine learning model and getting it into the right systems is often a difficult and laborious process. But Oracle Cloud Infrastructure enables teams to operationalize models as scalable and secure APIs. Data scientists can load their model from the model catalog, deploy the model using Oracle Functions, and secure the model endpoint with Oracle API Gateway. Then, the model REST API can be called from any application.
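
Once the function sits behind API Gateway, calling the model is just an HTTP request. This sketch uses a hypothetical endpoint URL, payload, and bearer token; in practice the gateway would be secured however you configured it (for example, OCI request signing or OAuth).

import requests

# Hypothetical API Gateway endpoint fronting the model function.
ENDPOINT = "https://<your-gateway-hostname>/fraud/v1/predict"

payload = {"amount": 142.50, "merchant_category": "electronics", "hour_of_day": 23}
headers = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

response = requests.post(ENDPOINT, json=payload, headers=headers, timeout=10)
response.raise_for_status()
print(response.json())  # for example, a JSON body with a fraud probability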

Model Monitoring

Unfortunately, deploying a model isn’t the end of it. Models must be monitored after deployment to maintain good health. After a while, the data a model was trained on may no longer be representative of the predictions it needs to make. For example, in the case of fraud detection, the fraudsters may come up with new ways to defraud the system, and the model will no longer be as accurate. Oracle Cloud Infrastructure Data Science is working to provide data scientists with tools to easily track how a model continues to perform while it’s deployed, making it easier to monitor model accuracy over time.
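
Oracle's built-in monitoring was still on the roadmap when this was written, but the underlying check is straightforward to prototype yourself: compare the distribution of incoming feature values (or scores) against the training data. A minimal drift check using a Kolmogorov-Smirnov test from SciPy, with synthetic data standing in for a real feature:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)

# Feature values seen at training time vs. values observed in production this week.
train_amounts = rng.normal(loc=50, scale=10, size=5000)
live_amounts = rng.normal(loc=65, scale=12, size=1000)  # the distribution has shifted

stat, p_value = ks_2samp(train_amounts, live_amounts)
if p_value < 0.01:
    print(f"Possible data drift detected (KS statistic={stat:.3f}); consider retraining.")
else:
    print("No significant drift detected.")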

Conclusion

Oracle Cloud Infrastructure Data Science is an enterprise-grade service in which teams of data scientists can collaborate to solve business problems and leverage the latest and greatest in Oracle Cloud Infrastructure to build, train, and deploy their models in the cloud.

It is part of Oracle’s data and AI platform, which makes it simple to integrate and manage your data and to use the power of data science and machine learning to drive better business results.

With Oracle Cloud Infrastructure Data Science, it’s easier than ever before for data scientists to get started, work with the tools and libraries that they want, and gain streamlined access to all data in Oracle Cloud Infrastructure and beyond. For more information, see this overview video and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.

On-Premises Autonomous Database

As Autonomous Database services have launched over the past two years, I’ve often been asked by customers, “will the Autonomous Database features become available to on-premises deployments?”

The answer is yes, but with a large caveat. Actually, the question itself is slightly misguided and made me realize that some fundamentals seem to be missing from the Autonomous Database dialog.

To clarify, we must first consider that there is a big difference between a database and a database service. The Autonomous Database is a service; it’s much more than a database.

A Database is software technology: a bundled set of binaries licensed and installed by a customer onto a predetermined set of infrastructure (compute, storage, and network). Once installed, the customer is responsible for all aspects of operating the database, including optimal workload configuration, the availability and expansion of physical resources like storage, taking backups for regulatory and recovery purposes, keeping the database available in the face of hardware- and software-level failures, updating the software to get new features or patch flaws, addressing security concerns like protecting access to the data, and so on.

On the other hand, a Database Service is not only database software technology but a set of integrated capabilities designed to use automation to enhance the customer experience of using a database in an end-to-end solution. A Database Service transfers many of the responsibilities involved in operating the database to the service provider. An obvious example: not every customer must install a database (or even create one, for that matter); rather, the service provider has automated access so new databases can be available for use in minutes instead of hours or even days. Another example: a customer no longer must download and install software maintenance updates; they can simply be scheduled or initiated on demand, and the service provider’s automation takes care of all the dirty work to get the update completed. Database Services provide capabilities beyond database operations; they can include solution components like identity management, resource governance, and operational notification services, just to name a few. Also, at Oracle, the Database Services have additional data management capabilities built in, like deep monitoring, data modeling tools, data visualization, and low-code development environments: an array of additional value that surrounds “the database.” This is in essence what customers of Oracle’s Co-Managed Database Services have when using Oracle Database Cloud Service (DBCS) or Exadata Cloud Service.

Now, what makes an Autonomous Database a Database Service that is self-driving, self-securing, and self-repairing? The answer is that an Autonomous Database includes an additional artificial intelligence (A.I.) software layer: a layer that leverages machine learning algorithms and decision making to harden software automation, so operations are proactive rather than reactive and it is software, rather than people, operating the database.

Autonomous Database Service - more than a Database

Humans are especially good at abstracting: for example, coming up with a system design for dynamic process monitoring and automation. However, unlike a computer, humans are not good at scanning large quantities of data while looking for divergent patterns; for example, humans are not good at deep packet inspection of the zeros and ones flowing through a network routing algorithm. The A.I. layer is a special-purpose computer process, efficiently examining large quantities of data in a way that is impossible for humans. The A.I. layer compares what it sees to patterns (commonly referred to as models) in that data, and is then capable of making fast decisions based on past patterns of successful operation as well as anti-patterns: data and patterns that lead to failure. Using an A.I. layer makes it possible to minimize the number of humans in the operating loop, so there is a faster, more repeatable, and more reliable response to emerging conditions.

The data and patterns being examined by the A.I. layers in an Autonomous Database go well beyond the database operating logs. The data and patterns extend to every aspect of the service including the operating system, hypervisor, compute and storage level metrics, network logs, logs of supporting processes as well as the logs of adjunct functions like modeling and low code development tools.

Even further, a decision made by the A.I. layer may involve some rather complex activities such as the decision to decommission a compute server, put another one in place to handle the old server’s activities, move the software processes to the new server, update networking layers to incorporate the new server in service call routing, etc. These things are possible because the Autonomous Database service is operating in the Cloud where there is an effectively unlimited amount of infrastructure available and an API enabling a software defined infrastructure setup. It is possible to spin up new resources on demand to proactively mitigate any approaching failure condition.

It is also important to note that the data and patterns of importance can be highly influenced by the underlying components of the service, whether that’s a specific version of some software library or a specific vendor and model of storage device, memory card, network switch, and so on. This is how real machine learning works: it needs a sample data set from a larger (bigger is better) system set to train the models, and if the system set changes, the models can become invalid.

So, let’s now revisit the question, “will the Autonomous Database features become available to on-premises deployments”?

From a traditional database deployment perspective, hopefully it’s clear now: Autonomous Database is a service; it is much more than a database. Much of the additional value that comes with the service is not available in any on-premises form factor. There is no on-premises notion of self-service software-defined infrastructure, a centralized logging service, a resource governance and operational notification service, nor any complete set of highly available tooling that supports the capabilities that surround the database. Finally, the system as a whole, its tooling, and the A.I. layer depend on both the accessibility of a virtually unlimited set of infrastructure that can be called upon on demand and a set of machine learning patterns (models) that depend on a data set specific to the configuration running inside the Oracle Cloud, from the supporting software libraries to the specific vendor hardware running in our Infrastructure as a Service layer.

Given all of this, the answer to our question for a traditional database deployment is, necessarily, no. The large caveat is that Autonomous Database will not become available for traditional on-premises deployments.

So, why then is the answer given at the start of the blog yes? Well, just when you thought it was all finally understood, let’s talk about non-traditional database deployments. There is a version of Autonomous Database coming to what is called an Oracle Gen 2 Exadata Cloud at Customer service deployment. This is a representative slice of the Oracle Cloud that a customer can host inside their data center, inside their network, and behind their own firewall. The Oracle Gen 2 Exadata Cloud at Customer has a lightweight design that allows the Autonomous Database A.I. layer and all of the supporting adjunct service capabilities to live and run in the Oracle Cloud, while the A.I. and Autonomous Database automation operate the database as it runs on your premises. Only in this specialized extension of the Oracle Cloud to your data center will it be possible to have an Autonomous Database on premises. Keep an eye out for a future blog with more details on this exciting Autonomous Database deployment option.

5 ways to get an Oracle Database

Do you want to get your hands on an Oracle Database but don’t know how? Here are 5 ways to get you going:

Do you just want to type some awesome SQL and need a database to do so? Then LiveSQL.oracle.com is your friend. LiveSQL is a browser-based SQL scratchpad that not only allows you to pull off some SQL magic but also lets you save and share your scripts with others. It also comes with a comprehensive library of tutorials and samples. LiveSQL is the best place for anybody who is completely unfamiliar with Oracle Database and wants to get going.

If you want to have an Oracle Database on your machine instead, but don’t want to worry about setup and configuration, the Oracle provided Docker images are a good choice. All you need is to install Docker on your machine (Mac or Windows), build an image one time from Oracle’s Docker GitHub repo and Docker will take care of the rest. From then on, all you have to remember is:

docker run --name oracle -p 1521:1521 oracle/database:19.3.0-ee

and

docker start oracle

Docker is great for running one or many instances and versions of an Oracle Database on your machine without having to know how to operate (start/stop/setup) them. What you end up with is still a full-fledged Oracle Database.
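
Once the container is up, you can connect from the host like any other database. Here's a quick sanity check with cx_Oracle; the user, password, and service name depend on how you built the image, so treat the values below as placeholders (the image's README documents the CDB/PDB names for your build).

# Requires the Oracle Instant Client libraries on the host for cx_Oracle to work.
import cx_Oracle

# Port 1521 is mapped by the docker run command above; the service name is a placeholder.
conn = cx_Oracle.connect("system", "<your-password>", "localhost:1521/ORCLPDB1")

with conn.cursor() as cur:
    cur.execute("SELECT banner FROM v$version")
    for (banner,) in cur:
        print(banner)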

If you want to have an Oracle Database on your machine, but you prefer to run it inside a Virtual Machine, then the Oracle provided Vagrant scripts will do a great job. HashiCorp’s Vagrant is a great tool to provision repeatable VM environments, including VirtualBox VMs. For this scenario, you will need to install Oracle’s VirtualBox and HashiCorp’s Vagrant on your machine first. Once you have done that, provision a VM via the scripts from the Oracle Vagrant Boxes GitHub repo and let Vagrant take care of the rest. All you have to remember is:

vagrant up

and

vagrant ssh

The Vagrant box is great if you want a scripted and repeatable way of creating a VirtualBox VM that contains an Oracle Database. You can also provision multiple VMs with different versions of the Oracle Database. The VM comes with port forwarding enabled by default, which means that you are able to connect any of your tools from your host directly, say SQL Developer for example, to the database inside the VM and treat the VM like a little embedded server.

If you like the VM approach but don’t want or need the repeatable nature of Vagrant, then the Oracle Database Application Development VM is the right choice for you. Simply download the .ova file, import it into VirtualBox and start the VM. The VM will boot into a graphical Linux desktop.

The Oracle Database App Dev VM comes with tools like SQL Developer and Oracle REST Data Services preinstalled, which makes it a great self-contained, one-stop-shop VM. It too has port forwarding enabled by default, in case you want to connect your tools from your host directly. Another bonus of the App Dev VM is that it also includes some hands-on labs that you can go through.

If you want an Oracle Database but not on your laptop, then you should check out the Oracle Cloud Free Tier which includes an Always Free Oracle Autonomous Database. Once you have signed up for the free tier and provisioned your Always Free Autonomous Database, you can head over to SQL Developer Web and get going.

The Always Free Oracle Autonomous Database is great if you want the latest and greatest that Oracle has to offer in terms of cloud databases. SQL Developer Web and APEX come out of the box, and you can connect any other app or IDE from anywhere in the world, as long as it has access to the internet. And the best part: as long as you use the database, it stays with you forever!

Now, what are you waiting for? Get yourself an Oracle Database!

Autonomous Database – Dedicated : Operational Notifications

In a previous blog on Autonomous Operations Policies, I detailed a bit about how Autonomous Database Dedicated Infrastructure deployments differ from Shared Infrastructure and illustrated how, with Dedicated, you can set up a policy to govern a development-test, pre-production, production software update lifecycle.

I had said this next blog would discuss how to monitor Autonomous Database – Dedicated Exadata Infrastructure operations. The objective is that a group of users can be asynchronously informed about maintenance activities, including when a new update operation is scheduled, reminders before scheduled updates occur, and when software updates begin and end, along with a status so you know all is good. This is done by using a combination of Oracle Events and Notifications, and it is of course extremely important for any business that wants to optimize its operations and streamline its response to any disruption.

As it turns out, between the time I wrote the first part of my Autonomous Database Dedicated Infrastructure blog series on operational controls and now, Todd Sharp, a colleague of mine, has written an excellent blog post that gives a general overview of Oracle Events and Notifications. Todd’s blog includes how to configure a Notification Topic Subscription. So, rather than providing a step-by-step guide here again, I am going to refer you to Todd’s blog and focus in this blog on the details of what notifications and events are available specifically for Autonomous Dedicated deployments.

Oracle Cloud service Resources, which are API endpoints in Oracle Cloud, all generate Events about their activities. Those service Resource Events can be monitored using Oracle Notifications Service. Recall there are 3 key service Resources for Autonomous – Dedicated: autonomous-exadata-infrastructures (AEI), autonomous-container-databases (ACD), autonomous-databases (ADB).

Oracle Notifications uses a publish-and-subscribe communications model. The idea is to create a Topic of interest to which relevant service Resource Events are published, and for which interested users create Subscriptions so they are notified of each event via their chosen protocol, e.g., HTTPS, email, or PagerDuty.

For example, one might create a Topic like MaintenanceActivities and then any service resource generating events related to maintenance can be configured to publish their events to that Topic. Users who want to monitor maintenance activities across all resources that are involved in maintenance can create a Subscription to the MaintenanceActivities Topic.

Topics are service Resources that are part of Oracle Notifications. They are defined by you and can represent an aggregation of Events that makes sense for how your organization is set up to monitor service operations. If you are a small company, you might even create a single Topic like ServiceActivities, direct all Events to it, and perhaps have one person subscribed to that Topic who gets all notices about all service activities. In larger companies where responsibilities are segmented, you might create a range of Topics like Compliance, Security, Administration, and Billing, target specific subsets of service events to each, and have different groups of people monitoring each Topic. A single Event can be sent to multiple Topics if it makes sense for more than one group to be aware of a specific kind of activity.
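
The walkthrough below uses the console, but the same Topic and Subscription can be created programmatically. This sketch assumes the OCI Python SDK (the oci package) with a configured ~/.oci/config profile; the compartment OCID and email address are placeholders, and you should verify the model and field names against the current SDK documentation.

import oci

config = oci.config.from_file()  # reads ~/.oci/config
compartment_id = "ocid1.compartment.oc1..<placeholder>"

# Create the Topic that maintenance-related Events will be published to.
control_plane = oci.ons.NotificationControlPlaneClient(config)
topic = control_plane.create_topic(
    oci.ons.models.CreateTopicDetails(
        name="SoftwareUpdateCompliance",
        compartment_id=compartment_id,
        description="Maintenance notifications for Autonomous Database - Dedicated",
    )
).data

# Subscribe an operations mailbox (could equally be PagerDuty, Slack, or an HTTPS endpoint).
data_plane = oci.ons.NotificationDataPlaneClient(config)
subscription = data_plane.create_subscription(
    oci.ons.models.CreateSubscriptionDetails(
        compartment_id=compartment_id,
        topic_id=topic.topic_id,
        protocol="EMAIL",
        endpoint="dba-ops@example.com",
    )
).data

print(topic.topic_id, subscription.id)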

The obvious question becomes, what service Resource events are available for Autonomous Database? The current set of events for Autonomous Database dedicated deployments include:

Autonomous Exadata Infrastructure – Create(begin/end), Maintenance(scheduled/remind/begin/end), Terminate(begin/end) … a total of 8 event types.

Autonomous Container Database – Compartment(change), Backup(begin/end), Create(begin/end), Restart(begin/end), Maintenance(scheduled, reminder, begin, end), Restore(begin/end), Terminate(begin/end), Update(begin/end), Update(begin/end) … a total of 19 event types.

Autonomous Database – Change Compartment(begin/end), Create(begin/end), Create Backup(begin/end), Generate Wallet, Restore(begin/end), Start(begin/end), Stop(begin/end), Terminate(begin/end) … a total of 14 event types.

Your first step in monitoring maintenance activities would be to create a Topic for the maintenance-related events; let’s say we create a Topic called SoftwareUpdateCompliance. Keep in mind that service Resources are specific to a given Compartment, so your Topic is created as a Compartment Resource in the same Compartment where the Events will be generated. In Todd’s blog, you learned how to use Oracle Notifications to create the notification Topic. The details page of such a Topic would look as follows:

Of course, now that you have a Topic, you need at least one Subscription so that when events arrive, they are directed to someone paying attention. You learned how to do that in Todd’s blog, so I will not repeat it here, but ideally you will have created a Subscription that targets a protocol like PagerDuty, as shown below, so your operations team can get asynchronous notifications.

You would now need to set up maintenance-related events to be published to that Topic. Recall from Todd’s blog that in the Events Service, you create Event Rules that aggregate related events; when a Rule fires, it triggers an Action. An Action can be directed towards Oracle Functions, Streaming, or Notifications. When an Action targets a Notification Topic, all Subscriptions to the Topic get notice that the Event has happened, with detail about the Event.

To monitor maintenance activities for Autonomous Databases, you create a Rule for all maintenance-related events and then assign all possible events that have been defined for Maintenance across the Autonomous Database service Resources. You learned how to create an Event Rule in Todd’s blog; below you will see that you need to create an aggregation of Event Types for the Database Service that includes the Event Types Autonomous Exadata Infrastructure – Maintenance(scheduled/remind/begin/end) and Autonomous Container Database – Maintenance(scheduled, reminder, begin, end).

Make sure your Rule’s Action is set up for Notifications

Rule Target setting to Notifications

Choose the Compartment where events would be generated which is where you’ve created your Topic of interest.

Rule to Compartment Setting

Select the Topic that was just created, in this case SoftwareUpdateCompliance.

Setting Rule Topic

After clicking Create Rule, you will be taken to a Details page where you can test it. Maintenance is not easily triggered on demand since it’s a scheduled activity, so it’s important to run a test and make sure you see the event show up in the Slack channel associated with your Topic Subscription.

Because Maintenance cannot be directly triggered, below is an example where an operations-automation channel got an event of “eventType” : “com.oraclecloud.databaseservice.autonomous.database.backup.begin”. Today these events come in a raw JSON format; in the future you will have the option to request a human-readable format, but for now, getting them as JSON can be useful for further API automation.

Well, that’s all there is to it. It’s quite simple to set up Topics of interest for different categories of Events and direct all of the Events in each category to any Subscription created for the Topic. Using these Oracle Cloud features, one can effectively monitor the health of the databases supporting all business applications.

Why Data Lakes Need a Data Catalog

It’s no secret that big data is getting much bigger with each passing year—in fact, the world is seeing exponential growth in the amount of data generated, as plenty of research shows. That creates the issue of storage. If all those bits and bytes are being transmitted and you need access to them in order to analyze and derive insights via business intelligence, then the next logical step is a data lake.

But what happens when all of that data is sitting in the data lake? Finding anything specific within such a repository can be unwieldy by today’s standards. With the growing volume of data generated by all the world’s devices, the data lake will only grow wider and deeper with each passing day. Thus, while collecting it into a repository is key to using it, information needs to be cataloged and accessible in order for it to actually be usable. The sensible solution, then, is to implement a data catalog.

Never miss an update about big data! Subscribe to the Big Data Blog to receive the latest posts straight to your inbox!

What Is a Data Lake?

Before understanding why a data catalog can be so useful in this situation, it’s important to grasp the concept of a data lake. In layman’s terms, a data lake acts as a repository that stores data exactly the way it comes in. If it’s a structured dataset, it maintains that structure without adding any further indexing or metadata. If it’s unstructured data (for example, social media posts, images, MP3 files, etc.), it lands in the data lake as is, whatever its native format might be. Data lakes can take input from multiple sources, making them a functional single repository for an organization to use as a collection point. To go further into the lake metaphor, consider each data source as a stream or a river and they all lead to the data lake, where raw and unfiltered datasets sit next to curated and enterprise/certified datasets.

Collecting data is only half of the equation, however. A repository only works well if data can be called up and used for analysis. In a data lake, data remains in its raw format until this step happens. At that point, a schema is applied to it for processing (schema on read), allowing analysts and data scientists to pick and choose what they work with and how they work with it.
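
A common way to see "schema on read" in practice is with Spark: the raw files sit untouched in the lake, and a schema is supplied only at query time. Here's a minimal PySpark sketch; the path and fields are illustrative, not a prescribed layout.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

# The JSON files in the lake are stored as-is; the schema is applied only when reading.
schema = StructType([
    StructField("device_id", StringType(), True),
    StructField("event_time", TimestampType(), True),
    StructField("temperature", DoubleType(), True),
])

events = spark.read.schema(schema).json("/data/lake/raw/iot/doorbell/")
events.filter(events.temperature > 30.0).show()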

This is a very simple call-and-response action, but one element is missing: the search process. A data lake requires data governance. Without organization, searching for data is a chaotic, inefficient, and time-consuming process. And if too much time passes without clear organization and governance, the value of a data lake may collapse under its own accumulated data.

Enter the data catalog.

What Is a Data Catalog?

A data catalog is exactly as it sounds: it is a catalog for all the big data in a data lake. By applying metadata to everything within the data lake, data discovery and governance become much easier tasks. By applying metadata and a hierarchical logic to incoming data, datasets receive the necessary context and trackable lineage to be used efficiently in workflows.

Let’s use the analogy of notes in a researcher’s library. In this library, a researcher gets structured data in the form of books that feature chapters, indices, and glossaries. The researcher also gets unstructured data in the form of notebooks that feature no real organization or delineation at all. A data catalog would take each of these items without changing their native format and apply a logical catalog to them using metadata such as date received, sender, general topic, and other such items that could accelerate data discovery.

Given that most data lake situations lack a universal organizational tool, a true data catalog is an essential add-on. Without the level of organization of a data catalog, a data lake becomes a data swamp—and trying to pull data from a data swamp creates a process that is inefficient at best and a bottleneck at worst.

How Data Lakes Work with Data Catalogs

Let’s take a look at a data scientist’s workflow from two different perspectives: without a data catalog and with a data catalog. Our hypothetical case study involves a smart doorbell that provides a stream of device data. At the same time, the company tracks mentions on social media from users who’ve had packages stolen, to determine the times when thieves are most likely to strike.

Without a data catalog: In this example, a data lake has datasets streaming in from Internet of Things (IoT) devices along with collected social media posts from the marketing team. A data analyst wants to examine the impact of a specific feature’s usage on social media sharing. Remember, the data in a data lake remains raw and unprocessed. In this case, data scientists will have to pull device datasets from the time period of the feature’s launch, then examine the individual data tables. To cross reference against social media, they will have to pull all social media posts from this time period, then filter out by keyword to try and drill down using mentions of the feature. While all this can be achieved using the data lake as a single source, it also requires quite a bit of manual labor for preparation time.

With a data catalog: As datasets come into the data lake, a data catalog’s machine learning capabilities recognize the IoT data and create a universal schema based on those elements. Users still have the ability to apply their own metadata to enhance discoverability. Thus, when data scientists want to pull their data, a search within the data catalog brings up relevant results associated with the feature and other targeted keywords, allowing for much quicker preparation and processing.

This example illustrates the stark difference created by a data catalog. Without it, data scientists are essentially searching through folders without context—the information sought has to be already identified through some means such as data source, time range, and file type. In a small, controlled data environment with limited sources, this is workable. However, in a large repository featuring many sources and heavy collaboration, it quickly devolves into murky chaos.

A data catalog doesn’t completely automate everything, though its ability to intake structured data does feature significant automated processing. However, even with unstructured data, inherent machine learning and artificial intelligence capabilities mean that if a data scientist manually processes data with set patterns, then the catalog can begin to learn and provide first-cut recommendations to speed things up.

Position Your Data Lake for Success

The volume of data flowing into repositories is only getting bigger with each passing day. To ensure efficiency and accuracy, a form of governance is necessary for creating order among the chaos. Otherwise, a data lake quickly becomes a proverbial data swamp. Fortunately, data catalogs are a simple tool to achieve this, and by integrating such a thing into a repository, organizations are set up for success now—and prepared to scale up as needed towards a bigger-than-big data future.

Need to know more about data lakes and data catalogs? Check out Oracle’s big data management products and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.

Oracle Database(s) Top DB-Engines Ranking

Oracle Database and MySQL top the latest DB-Engines ranking for database management systems (see chart below).

The DB-Engines Index utilizes a scientific method to calculate the popularity of every database management system in use (whether that be relational, NoSQL, time-series, etc.). As of January 2020, Oracle Database is ranked the most popular among all 350 databases that have been analyzed. MySQL, the open source database that is developed, distributed, and supported by Oracle, takes the #2 spot.

Not only do Oracle Database and MySQL top the DB-Engines ranking, but their 2019 increase in popularity was higher than any other database’s (see the chart below). In other words, the popularity gap between Oracle and its database competitors is widening.

This latest DB-Engines ranking follows the November 2019 Gartner report, ‘Critical Capabilities for Operational Database Management Systems’, where Oracle Database again achieved the highest scores in all four Use Cases.

Enterprise Manager CIS Benchmark Certification Eases Adoption of Secure Database Best Practices

It only takes a single mistake for the “bad guys” to exploit a misconfiguration and exfiltrate your data. Thanks to the Center for Internet Security, Oracle Database users can avoid such scenarios by following the best practices defined by the CIS Benchmarks™. With the high rate of change in DevOps-oriented development teams and the proliferation of data across on-premises and cloud environments, database administrators now have an easy way to comply with these standards right within Oracle Enterprise Manager.

Configuration and Compliance management has been part of Oracle Enterprise Manager Database Lifecycle Management for a long time, and we’re happy to report that Oracle Enterprise Manager has been certified by CIS to compare the configuration status of Oracle Databases against the consensus-based best practice standards contained in the Oracle Database Benchmark v2.1.0, Level 1 - RDBMS Profile. Organizations that leverage Oracle Enterprise Manager can now ensure that the configurations of their critical assets align with the CIS Benchmarks consensus-based practice standards for all their database releases, including Oracle Database 18c and 19c. For more details on Oracle’s CIS listings, visit the Center for Internet Security website.

“Data is a company’s most valuable asset, and securing it has never been more important. We are pleased to support the industry standard CIS Benchmarks as part of our comprehensive Enterprise Manager automation and compliance offerings.”

Wim Coekaerts, Senior Vice President, Software Development

“Cybersecurity challenges are mounting daily, which makes the need for standard configurations imperative. By certifying its product with CIS, Oracle has demonstrated its commitment to actively solve the foundational problem of ensuring standard configurations are used throughout a given enterprise.”

Curtis Dukes, CIS Executive Vice President of Security Best Practices & Automation Group.

Enterprise Manager supports two flavors of the CIS Oracle Database v2.1.0 Benchmarks: one for Single-Instance Database and one for Cluster Database. Below is a screenshot of what the listings look like in the Compliance Framework.

Figure 1. CIS Benchmarks as they appear in the Enterprise Manager user interface.

CIS provides comprehensive configuration coverage for Oracle database, including:

  • Installation
  • Parameters
  • Connectivity
  • User Privileges
  • Auditing

Below are examples of some of the specific areas the Benchmark focuses on:

Figure 2. Samples of evaluation areas in the CIS Benchmarks for Oracle Database.

In addition to the CIS Benchmarks included in the latest release of Oracle Enterprise Manager, we’ve also included new Oracle-provided security benchmarks for Database 18c and 19c. We’re committed to continuing to bring you best-in-class security offerings to harden your security posture across your data estate, whether on-premises or in the cloud.

For more information about Oracle Enterprise Manager, visit http://www.oracle.com/enterprise-manager and for more information about the Center for Internet Security (CIS), visit https://www.cisecurity.org.

About CIS

The Center for Internet Security, Inc. (CIS®) makes the connected world a safer place for people, businesses, and governments. We are a community-driven nonprofit, responsible for the CIS Controls® and CIS Benchmarks™, globally recognized best practices for securing IT systems and data. We lead a global community of IT professionals to continuously refine these standards to proactively safeguard against emerging threats. Our CIS Hardened Images® provide secure, on-demand, scalable computing environments in the cloud. CIS is home to the Multi-State Information Sharing and Analysis Center® (MS-ISAC®), the trusted resource for cyber threat prevention, protection, response, and recovery for U.S. State, Local, Tribal, and Territorial government entities, and the Elections Infrastructure Information Sharing and Analysis Center® (EI-ISAC®), which supports the cybersecurity needs of U.S. elections offices. To learn more, visit CISecurity.org or follow us on Twitter: @CISecurity.

Cloud Day: What’s Possible and Where to Start

Want to get a peek into the future of modern IT? Then, come to Oracle Cloud Day, says Dain Hansen, VP of Product Marketing for IaaS and PaaS at Oracle. After speaking at Oracle Cloud Day events in Boston and Chicago last year, Hansen said that one of the things he liked best about the event was that it gave people a real view into what their future could be.

“Imagine a world where everything is automated. You can use AI to power the next level of insights, or you can build a modern application that you can talk to just like you talk to your phone,” Hansen said. “Those are things that we want people to experience. We want them to get first-hand knowledge of and use and touch and see what’s possible.”

Register here

This year, Hansen said, it’s all about how to use data to get a leg up—on the competition and in your career.

“You’re going to see all kinds of ways to use your data,” Hansen said.

Oracle Cloud Day will take a broad, yet detailed, look at all things data—how to manage it, how to secure it, how to draw insights from it, and how to create applications and services that use it in new ways.

But with so much to see at Oracle Cloud Day and so many new technologies to take in, we asked Hansen, “How does someone get the most from Oracle Cloud Day?”

Here are Hansen’s three tips.

Discover the Best Way to Do What You’re Trying to Do

Because there’s so much expertise on hand, Oracle Cloud Day is the perfect place to get information on best practices. Hansen recommends focusing first on what you’re trying to do within your organization, then finding the best way to do it.

If you’re a security person, maybe you want to learn about the latest security threats or figure out the best way to secure your data across cloud and on premises. If you’re an apps IT person, maybe you want to hear about the best way to migrate an application to the cloud.

Whatever it is, zero in on that topic, seek out the best way to do it, and take a look at how Oracle can help. Oracle Cloud Day is a great venue to experience technologies first hand and talk to experts about how they can help you with not only your needs, but the needs of your business as a whole.

Decide What You’re Going to Learn Next

Once you’ve identified how you can address your current needs, take a look at the horizon. What’s next?

“Everyone is always trying to learn something. Even for me, I’m always trying to study and see what I need to pick up on,” Hansen said.

Because of its emphasis on modern IT and the breadth of Oracle technology, Cloud Day is a great place to get up to date on what’s next for you and your business.

Hear From People Already Doing It

Maybe one of the best things about Cloud Day, Hansen said, is that attendees get to hear from companies already reaching their goals. Cloud Day will be packed with real-life stories told by customers who have made the journey.

“Customers don’t mess around. They don’t mince words. They tell it like it is. And that’s one thing that I don’t want anyone to miss is to hear what our customers say about what they’re doing,” Hansen said.

With 15 sessions across three tracks—Modernizing Data Management, Modernizing Applications, and Transforming Business with Analytics and AI—plus the Developer Playground, industry experts and partners in the Innovation Lounge, and a keynote that brings it all together, there are plenty of opportunities to track down all the information you need for what you’re doing today and what you’ll want to do tomorrow.

Now that you know how to make the most of your time at Cloud Day, don’t forget to register. For more information about Oracle Cloud Day, visit the Oracle Cloud Day website.

Data Lake, Data Warehouse and Database…What’s the Difference?

There are so many buzzwords these days regarding data management. Data lakes, data warehouses, and databases – what are they? In this article, we’ll walk through them and cover the definitions, the key differences, and what we see for the future.

Start building your own data lake with a free trial

Data Lake Definition

If you want full, in-depth information, you can read our article called, “What’s a Data Lake?” But here we can tell you, “A data lake is a place to store your structured and unstructured data, as well as a method for organizing large volumes of highly diverse data from diverse sources.”

The data lake tends to ingest data very quickly and prepare it later, on the fly, as people access it.

Never miss an update about big data! Subscribe to the Big Data Blog to receive the latest posts straight to your inbox!

Data Warehouse Definition

A data warehouse collects data from various sources, whether internal or external, and optimizes the data for retrieval for business purposes. The data is usually structured, often from relational databases, but it can be unstructured too.

Primarily, the data warehouse is designed to gather business insights and allows businesses to integrate their data, manage it, and analyze it at many levels.

Database Definition

Essentially, a database is an organized collection of data. Databases are classified by the way they store this data. Early databases were flat and limited to simple rows and columns. Today, the popular databases are:

  • Relational databases, which store their data in tables
  • Object-oriented databases, which store their data in object classes and subclasses

Data Mart, Data Swamp and Other Terms

And, of course, there are other terms such as data mart and data swamp, which we’ll cover very quickly so you can sound like a data expert.

Enterprise Data Warehouse (EDW): This is a data warehouse that serves the entire enterprise.

Data Mart: A data mart is used by individual departments or groups and is intentionally limited in scope because it looks at what users need right now versus the data that already exists.

Data Swamp: When your data lake gets messy and is unmanageable, it becomes a data swamp.

The Differences Between Data Lakes, Data Warehouses, and Databases

Data lakes, data warehouses and databases are all designed to store data. So why are there different ways to store data, and what’s significant about them? In this section, we’ll cover the significant differences, with each definition building on the last.

The Database

Databases came about first, rising in the 1950s with the relational database becoming popular in the 1980s.

Databases are really set up to monitor and update real-time structured data, and they usually have only the most recent data available.

The Data Warehouse

But the data warehouse is a model to support the flow of data from operational systems to decision systems. What this means, essentially, is that businesses were finding that their data was coming in from multiple places—and they needed a different place to analyze it all. Hence the growth of the data warehouse.

For example, let’s say you have a rewards card with a grocery chain. The database might hold your most recent purchases, with a goal to analyze current shopper trends. The data warehouse might hold a record of all of the items you’ve ever bought and it would be optimized so that data scientists could more easily analyze all of that data.

The Data Lake

Now let’s throw the data lake into the mix. And because it’s the newest, we’ll talk about this one more in depth. The data lake really started to rise around the 2000s, as a way to store unstructured data in a more cost-effective way. The key phrase here is cost effective.

Although databases and data warehouses can handle unstructured data, they don’t do so in the most efficient manner. With so much data out there, it can get expensive to store all of your data in a database or a data warehouse.

In addition, there’s the time-and-effort constraint. Data that goes into databases and data warehouses needs to be cleansed and prepared before it gets stored. And with today’s unstructured data, that can be a long and arduous process when you’re not even completely sure that the data is going to be used.

That’s why data lakes have risen to the forefront. The data lake is mainly designed to handle unstructured data in the most cost-effective manner possible. As a reminder, unstructured data can be anything from text to social media data to machine data such as log files and sensor data from IoT devices.

Data Lake Example

Going back to the grocery example that we used with the data warehouse, you might consider adding a data lake into the mix when you want a way to store your big data. Think about the social sentiment you’re collecting, or advertising results. Anything that is unstructured but still valuable can be stored in a data lake and work with both your data warehouse and your database.

Note 1: Having a data lake doesn’t mean you can just load your data willy-nilly. That’s what leads to a data swamp. But it does make the process easier, and new technologies such as having a data catalog will steadily make it simpler to find and use the data in your data lake.

Note 2: If you want more information on the ideal data lake architecture, you can read the full article we wrote on the topic. It describes why you want your data lake built on object storage and Apache Spark, versus Hadoop.

What’s the Future of Data Lakes, Data Warehouses, and Databases?

Will one of these technologies rise to overtake the others?

We don’t think so.

Here’s what we see. As the value and amount of unstructured data rises, the data lake will become increasingly popular. But there will always be an essential place for databases and data warehouses.

You’ll probably continue to keep your structured data in the database or data warehouse. But these days, more companies are moving their unstructured data to data lakes on the cloud, where it’s more cost effective to store it and easier to move it when it’s needed.

This workload that involves the database, data warehouse, and data lake in different ways is one that works, and works well. We’ll continue to see more of this for the foreseeable future.

If you’re interested in the data lake and want to try to build one yourself, we’re offering a free data lake trial with a step-by-step tutorial. Get started today, and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.
