The Edge is More than a Place

The edge is going through a sea change today. Evolving technology ecosystems and a new wave of applications in computer vision, edge analytics, telco virtual radio access networks, and autonomous vehicles require real-time analytics. Remote IT work and data privacy regulations are also driving the new paradigm. The outcome of these changes is an explosion of data produced outside the cloud and the core data center. This data has two requirements: 1) a powerful compute capability to process it, and 2) real-time, very-low-latency systems that provide insights and summarize the massive wave of …

Related:

How a Unified Approach Supports Your Data Strategy

Are you finding it hard to explore and analyze data located on-premises or in the cloud? You are not alone, but there is a solution.

It's rare for a company to store 100 percent of its data in one place, or to keep 100 percent of it in the cloud. Most companies must combine datasets. By establishing a unified data tier, it becomes easier to perform certain types of analytics, especially when the data is widely distributed.

Never miss an update about big data! Subscribe to the Big Data Blog to receive the latest posts straight to your inbox!

Take, for example, a bike-share system that looked at its publicly available ridership data, then added weather data to predict ridership and made changes to ensure bikes were available when and where riders needed them. If that data were stored in different geographical areas and in different storage systems, it might be difficult to combine the information and make an informed decision.

So how can companies take advantage of data, whether it’s located in Oracle Autonomous Data Warehouse, Oracle Database, object store, or Hadoop? A recent Oracle webcast titled, “Explore, Access, and Integrate Any Data, Anywhere,” explored this issue. Host Peter Jeffcock outlined four new services Oracle released in February 2020 that let companies dive right in and solve these real-world problems, manage data, and enable augmented analytics.

The idea is that there needs to be a unified data tier, starting with workload portability: your data and the data environment can be managed in the public cloud, on a local cloud, or in your on-premises data store.

Unified Data Tier

The next step is to develop a converged database, especially one with an autonomous component, so that repeatable processes free up administrative time and reduce human error. Oracle Database allows for multiple data models, multiple workloads, and multiple tenants, making it easier to operate because all of these processes are managed within a single database.

You can take it one step further if you add the cloud to the configuration. Oracle can manage the data and apply different processes and machine learning so that you can run your database autonomously in the cloud.

Unified Data Tier

The unified data tier also means taking advantage of multiple data stores, such as data lakes and other databases. And finally, it means expanding that ecosystem with partners, such as Oracle's recent agreement with Microsoft that allows for a unified data tier spanning Oracle Cloud and Microsoft Azure.

“If you want to run an application in the Microsoft Cloud and you want to connect to the Oracle Cloud where the data is stored, that’s now supported. It’s a unique relationship and it’s something to look into if you want to run a multi-cloud strategy,” Jeffcock says.

You can experience the full presentation if you register for the on-demand webcast.

To learn more about how to get started with data lakes, check out Oracle Big Data Service—and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox. Also, follow us on Twitter @OracleBigData.

Related:

Build Your Data Lake with Oracle Big Data Service

In today’s world, there’s an ever-growing deluge of highly diverse data coming from many different sources. Practitioners struggling to manage and organize that data are finding it harder when their only options are the traditional relational database or data warehouse.

That’s why the data lake has become increasingly popular as a complement to traditional data management. Think of the traditional data warehouse as a reservoir—it’s cleansed, drinkable.

The data lake, on the other hand, holds data of potentially unknown value. That data isn’t necessarily cleansed—which is why it’s more of an adventure. The data lake can be voluminous, brimming with data and possibilities. Users can easily load even more data and start experimenting to find insights that organizations couldn’t discover before.

Organizations must be able to:

• Store their data in a way that is less complicated

• Reduce management overhead even though the data is more complex

• Use data in a way that makes sense for them

And that’s exactly why Oracle has created Oracle Big Data Service as a way to help build data lakes.

Oracle Big Data Service is an automated service based on Cloudera Enterprise that provides a cost-effective Hadoop data lake environment—a secure place to store and analyze data of different types from any source. It can be used as a data lake or a machine learning platform.

It comes with a fully integrated stack that includes both open-source and Oracle value-added tools, and it’s designed for enterprises that need flexible deployment options, scalability, and the ability to add tools of their choosing.

Oracle Big Data Service also provides:

  • An easy way to expand on premises to Oracle Cloud
  • Secure, reliable, and elastic Hadoop clusters in minutes
  • Native integration with Oracle Cloud platform services


Oracle + Hadoop = A Better Data Lake Together

We wanted to make the power of Hadoop and the entire Hadoop ecosystem available to you. But Hadoop can be complicated, which is why we’ve combined the best of what Oracle and Cloudera have to offer and made it into something easier to handle—which makes building and managing your data lake easier than ever.

With Cloudera Enterprise Deployment, our service is vertically integrated for Hadoop, Kafka, and Spark with a best-practices, high-availability deployment.

With Big Data Service, you get:

  • Highly secure, highly available clusters provisioned in minutes
  • Ability to expand on-premises Hadoop, which enables you to deploy, test, develop, and/or move data lakes to the cloud
  • Flexibility to scale as you wish using high-performance bare metal or cost-effective virtual machine shapes
  • Automatically deployed security and management features

You also can choose your Cloudera version, giving you the ability to:

  • Match your current deployment—which is important for test and dev environments
  • Deploy new versions—allowing you to take advantage of the distribution’s latest features

Oracle Big Data Service Features

We built Oracle Big Data Service to be your go-to big data and data lake solution, one that’s specifically designed for a diverse set of big data use cases and workloads. From short-lived clusters used to tackle specific tasks to long-lived clusters that manage large data lakes, Oracle Big Data Service scales to meet an organization’s requirements at a low cost and with the highest levels of security.

Let’s explore just how Oracle Big Data Service does this.

  1. Oracle Big Data Service and Oracle Cloud SQL

Use Oracle SQL to query across big data sources with Oracle Cloud SQL, including the Hadoop Distributed File System (HDFS), Hive, object stores, Kafka, and NoSQL.

You can accomplish all of this with simple administration, because Oracle Cloud SQL uses existing Hive metadata and security, and offers fast, scale-out processing using Oracle Cloud SQL compute.
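
As a rough illustration, here's a minimal sketch of what a unified query could look like from Python using the python-oracledb driver, assuming Cloud SQL external tables have already been defined over object storage; the connection details, table names, and columns below are hypothetical:

    # Minimal sketch: join an ordinary Oracle table with a Cloud SQL external table.
    # Connection details, table names, and columns are hypothetical.
    import oracledb

    conn = oracledb.connect(user="analytics", password="...", dsn="mydb_high")
    cur = conn.cursor()

    # RIDES_EXT is assumed to be an external table defined by Cloud SQL over
    # object storage (e.g. Parquet files); STATIONS is an ordinary Oracle table.
    cur.execute("""
        SELECT s.station_name, COUNT(*) AS ride_count
        FROM   rides_ext r
        JOIN   stations  s ON s.station_id = r.start_station_id
        GROUP  BY s.station_name
        ORDER  BY ride_count DESC
    """)

    for station_name, ride_count in cur:
        print(station_name, ride_count)

    cur.close()
    conn.close()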

  2. Oracle Big Data Service and Big Data Analytics

What use is managing and accessing your data if you can’t run analytics to find real results? We offer support in the areas of machine learning, spatial analysis, and graph analysis to help you get the information your organization needs to gain better business results and improved metrics. Oracle Big Data Service customers are licensed for these options and can deploy at no extra cost.

It’s also easy to connect to Oracle Cloud services such as Oracle Analytics Cloud, Oracle Cloud Infrastructure Data Science, or Oracle Autonomous Database. Or you can use any Cloudera-certified application for a wide range of analytic tools and applications.

  3. Oracle Big Data Service and Workload Portability

Cloud may be the future of enterprise computing, which is why we’ve built the newest, best cloud infrastructure out there with Oracle Cloud Infrastructure. But it’s not everything—at least, not yet. You still need to maintain a mix of public cloud, local cloud, and traditional on-premises computing for the foreseeable future.

With Oracle Big Data Service, deploy where it makes sense. With Oracle, if you develop something on premises, it’s easy to move that to the cloud and vice versa.

  4. Oracle Big Data Service and Secure, High-Availability Clusters

With Oracle Big Data Service, expect easy deployment when creating your clusters. Specify minimal settings to create the cluster, then use just one click to create a cluster with highly available Hadoop services.

You also get a choice of Cloudera versions, enabling “Cloud Also” deployments that match your on-premises environment for compatibility, or you can choose newer versions to take advantage of the latest features.

  5. Oracle Big Data Service Offers Security

Because Oracle uses off-box virtualization, Oracle can’t see customer data and customers can’t see Oracle management code. In most first-generation clouds, the network and tenant environments are coupled, separated only by the hypervisor.

Oracle follows a Least Trust Design principle. We don’t trust the hardware, the customer (think rogue employees), or the hypervisor. That’s why we’ve separated our network and tenant environments. Isolating that network virtualization helps prevent the spread and lateral movement of attacks.

In addition, with Oracle Big Data Service, all Cloudera security features are enabled with strong authentication, role-based authorization, auditing, and encryption.

  6. Oracle Big Data Service and the Compute and Storage You Want

Whether you’re using Oracle Big Data Service for development, test, data science, or data lakes, we offer the compute options you need for your use case. Leverage the flexibility of virtual machines (VMs) and block storage, or the unparalleled performance of bare metal with direct-attached NVMe (non-volatile memory express) storage.

  7. Oracle Big Data Service and Superior Networking

With Oracle Big Data Service, you can expect high-fidelity virtual networks and connectivity. Our networking is:

Customizable

  • Fully configurable IP addresses, subnets, routing, and firewalls to support new or existing private networks

High performance and consistent

  • High bandwidth, microsecond latency network
  • Private access without traversing the internet

Capable of connecting to corporate networks

  • FastConnect—dedicated, private connectivity
  • VPN Connect—simple and secure internet connectivity

  8. Oracle Big Data Service and Oracle’s Data Management Platform

Your organization spends time and effort creating, acquiring, and storing data, and you want to be able to use it. With Oracle, you can reduce the time, cost, and effort of getting data from wherever it originates to all the places it’s needed across the enterprise.

Oracle has spent decades building and expanding its data management platform.

With Oracle’s end-to-end data management, you get an easy connection to:

  • Oracle Autonomous Database
  • Oracle Analytics Cloud
  • Oracle Cloud Infrastructure Streaming
  • Oracle Cloud Infrastructure Data Catalog
  • Oracle Cloud Infrastructure Data Science
  • Oracle Cloud Infrastructure Data Flow
  • The list goes on …

And with unified query through Oracle Cloud SQL, you’ll be able to correlate information from a variety of sources using Oracle SQL. In addition, you gain a host of Oracle analytic and connectivity options, including:

  • Oracle Machine Learning
  • Oracle Big Data Spatial and Graph
  • Oracle Big Data Connectors
  • Oracle Data Integrator Enterprise Edition

Oracle Big Data Service for All Your Data Lake Needs

From enabling machine learning to storing and analyzing data, Oracle Big Data Service is a scalable, secure data lake service that meets your requirements at a low cost and with the highest levels of security.

It allows you to worry less about managing and storing data. And it empowers you to start analyzing your data in a way that makes the future of your organization more successful than ever before.

To learn more about how to get started with data lakes, check out Oracle Big Data Service—and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.

Related:

Four Tools to Integrate into Your Data Lake

A data lake is an absolutely vital piece of today’s big data business environment. A single company may have incoming data from a huge variety of sources, and having a means to handle all of that is essential. For example, your business might be compiling data from places as diverse as your social media feed, your app’s metrics, your internal HR tracking, your website analytics, and your marketing campaigns. A data lake can help you get your arms around all of that, funneling those sources into a single consolidated repository of raw data.

But what can you do with that data once it’s all been brought into a data lake? The truth is that putting everything into a large repository is only part of the equation. While it’s possible to pull data from there for further analysis, a data lake without any integrated tools remains functional but cumbersome, even clunky.

On the other hand, when a data lake integrates with the right tools, the entire user experience opens up. The result is streamlined access to data while minimizing errors during export and ingestion. In fact, integrated tools do more than just make things faster and easier. By expediting automation, the door opens to exciting new insights, allowing for new perspectives and new discoveries that can maximize the potential of your business.

To get there, you’ll need to put the right pieces in place. Here are four essential tools to integrate into your data lake experience.


Machine Learning

Even if your data sources are vetted, secured, and organized, the sheer volume of data makes it unruly. As a data lake tends to be a repository for raw data—which includes unstructured items such as MP3 files, video files, and emails, in addition to structured items such as form data—much of the incoming data across various sources can only be natively organized so far. While it can be easy to set up a known data source for, say, form data into a repository dedicated to the fields related to that format, other data (such as images) arrives with limited discoverability.

Machine learning can help accelerate the processing of this data. With machine learning, data is organized and made more accessible through various processes, including:

  • In processed datasets, machine learning can use historical data and results to identify patterns and insights ahead of time, flagging them for further examination and analysis.

  • With raw data, machine learning can analyze usage patterns and historical metadata assignments to begin applying metadata automatically for faster discovery.

The latter point requires the use of a data catalog tool, which leads us to the next point.
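
As a loose illustration of that second idea, the sketch below uses scikit-learn to learn tag suggestions from historical metadata assignments; the descriptions and tags are made up, and a production catalog would draw on far richer signals:

    # Minimal sketch: learn to suggest metadata tags from historical assignments.
    # The training examples and tag names are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Historical examples: a short description of each object and the tag a
    # data steward assigned to it.
    descriptions = [
        "daily_sales_2019.csv point of sale transactions",
        "store_locations.json address latitude longitude",
        "promo_banner_03.png homepage marketing image",
        "q3_revenue_report.xlsx finance quarterly summary",
    ]
    tags = ["sales", "geo", "marketing", "finance"]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(descriptions, tags)

    # Suggest a tag for a newly landed, untagged object.
    print(model.predict(["weekly_sales_2020.csv register transactions"]))  # e.g. ['sales']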

Data Catalog

Simply put, a data catalog is a tool that integrates into any data repository for metadata management and assignment. Products like Oracle Cloud Infrastructure Data Catalog are a critical element of data processing. With a data catalog, raw data can be assigned technical, operational, and business metadata. These are defined as:

  • Technical metadata: Used in the storage and structure of the data in a database or system
  • Business metadata: Contributed by users as annotations or business context
  • Operational metadata: Created from the processing and accessing of data, which indicates data freshness and data usage, and connects everything together in a meaningful way

By implementing metadata, raw data can be made much more accessible. This accelerates organization, preparation, and discoverability for all users without any need to dig into the technical details of raw data within the data lake.

Integrated Analytics

A data lake acts as a middleman between data sources and tools, storing the data until it is called for by data scientists and business users. When analytics and other tools exist separately from the data lake, that adds steps: additional preparation and formatting, exporting to CSV or other standardized formats, and then importing into the analytics platform. Sometimes it also means additional configuration once inside the analytics platform. The cumulative effect of all these steps creates drag on the overall analysis process, and while having all the data within the data lake certainly helps, this lack of connectivity creates significant hurdles within a workflow.

Thus, the ideal way to allow all users within an organization to swiftly access data is to use analytics tools that seamlessly integrate with your data lake. Doing so removes unnecessary manual steps for data preparation and ingestion. This really comes into play when experimenting with variability in datasets; rather than having to pull a new dataset every time you experiment with different variables, integrated tools let this happen in real time (or near-real time). Not only does this make things easier, but the flexibility also opens the door to new levels of insight by allowing previously impractical experimentation.

Integrated Graph Analytics

In recent years, data analysts have started to take advantage of graph analytics—that is, a newer form of data analysis that creates insights based on relationships between data points. For those new to the concept, graph analytics treats individual data points like dots in a bubble: each data point is a dot, and graph analytics lets you examine the relationships between data by identifying the volume of related connections, proximity, strength of connection, and other factors.

This is a powerful tool for new types of analysis in datasets where the relationships between data points need to be examined. Graph analytics typically runs against a graph database or through a separate graph analytics tool. As with traditional analytics, any extra data exporting and ingesting can slow down the process or introduce inaccuracies, depending on the level of manual involvement. To get the most out of your data lake, integrating cutting-edge tools such as graph analytics gives data scientists the means to produce insights as they see fit.
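
For a feel of what graph analytics looks like in practice, here's a minimal sketch using the open-source networkx library; the nodes and edges are invented for illustration:

    # Minimal sketch of graph analytics on relationships between data points,
    # using the open-source networkx library; the edges are made up.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("customer_a", "product_1"),
        ("customer_a", "product_2"),
        ("customer_b", "product_2"),
        ("customer_c", "product_2"),
        ("customer_c", "product_3"),
    ])

    # How central is each node, based on how it connects to everything else?
    print(nx.degree_centrality(G))
    print(nx.pagerank(G))

    # How far apart are two data points in terms of relationships?
    print(nx.shortest_path(G, "customer_a", "customer_c"))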

Why Oracle Big Data Service?

Oracle Big Data Service is a powerful Hadoop-based data lake solution that delivers the capabilities required in a big data world:

  • Integration: Oracle Big Data Service is built on Oracle Cloud Infrastructure and integrates seamlessly into related services and features such as Oracle Analytics Cloud and Oracle Cloud Infrastructure Data Catalog.
  • Comprehensive software stack: Oracle Big Data Service comes with key big data software: Oracle Machine Learning for Spark, Oracle Spatial Analysis, Oracle Graph Analysis, and much more.
  • Provisioning: Deploying a fully configured version of Cloudera Enterprise, Oracle Big Data Service easily configures and scales up as needed.
  • Secure and highly available: With built-in high availability and security measures, Oracle Big Data Service enables both with a single click.

To learn more about Oracle Big Data Service, click here—and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.

Related:

GA of Oracle Database 20c Preview Release

The latest annual release of the world’s most popular database, Oracle Database 20c, is now available for preview on Oracle Cloud (Database Cloud Service Virtual Machine).

As with every new release, Oracle Database 20c introduces key new features and enhancements that further extend Oracle’s multi-model converged architecture, with the introduction of Native Blockchain Tables and more performance enhancements such as Automatic In-Memory (AIM) and a binary JSON data type. For a quick introduction, watch Oracle EVP Andy Mendelsohn discuss Oracle Database 20c during his most recent OpenWorld keynote.

For the complete list of new features in Oracle Database 20c, please refer to the new features guide in the latest documentation set. To learn more about some of the key new features and enhancements in Oracle Database 20c, check out the related blog posts.

For availability of Oracle Database 20c on all other platforms on-premises (including Exadata) and in Oracle Cloud, please refer to My Oracle Support (MOS) note 742060.1.


Related:

What Is Oracle Cloud Infrastructure Data Catalog?

And What Can You Do with It?

Simply put, Oracle Cloud Infrastructure Data Catalog helps organizations manage their data by creating an organized inventory of data assets. It uses metadata to create a single, all-encompassing and searchable view to provide deeper visibility into your data assets across Oracle Cloud and beyond. This video provides a quick overview of the service.

This helps data professionals such as analysts, data scientists, and data stewards discover and assess data for analytics and data science projects. It also supports data governance by helping users find, understand, and track their cloud data assets and on-premises data as well—and it’s included with your Oracle Cloud Infrastructure subscription.


Why Does Oracle Cloud Infrastructure Data Catalog Matter?

Hint: It has to do with self-service data discovery and governance.

Oracle Cloud Infrastructure Data Catalog matters because it’s a foundational part of the modern data platform—a platform where all of your data stores can act as one, and you can view and access that data easily, whether it resides in Oracle Cloud, object storage, an on-premises database, a big data system, or a self-driving database.

This means that data users—data scientists, data analysts, data engineers, and data stewards—can all find data across systems and the enterprise more easily because a data catalog provides a centralized, collaborative environment to encourage exploration. Now these key players can trust their data because they gain technical as well as business context around it. It means they don’t have to have SQL access, or understand what object storage is, or figure out the complexities of Hadoop—they can get started faster with their single unified view through their data catalog. It’s no longer necessary to have five different people with five different skillsets just to find where the right data resides.

Easy data discovery is now possible.

And of course, it’s not just data discovery that’s easier. Governance is also easier—and that is a key benefit with GDPR and ever more complex compliance requirements in today’s world of multiple enterprise systems, with on-premises, cloud, and multi-cloud environments.

With Oracle Cloud Infrastructure Data Catalog, you have better visibility into all of your assets, and business context is available in the form of a business glossary and user annotations. And of course, understanding the data you have is essential for governance.

How Does Oracle Cloud Infrastructure Data Catalog Work?

Oracle Cloud Infrastructure Data Catalog harvests metadata—technical, business, and operational—from various data sources, users, and assets, and turns it into a data catalog: a single collaborative solution for data professionals to collect, organize, find, access, enrich, and activate metadata, supporting self-service data discovery and governance for trusted data assets across Oracle Cloud.

And what’s so important about this metadata? Metadata is the key to Oracle Cloud Infrastructure Data Catalog. There are three types of metadata that are relevant and key to how our data catalog works:

  • Technical metadata: Used in the storage and structure of the data in a database or system
  • Business metadata: Contributed by users as annotations or business context
  • Operational metadata: Created from the processing and accessing of data, which indicates data freshness and data usage, and connects everything together in a meaningful way

You can harvest this metadata from a variety of sources, including:

    • Oracle Cloud Infrastructure Object Storage
    • Oracle Database
    • Oracle Autonomous Transaction Processing
    • Oracle Autonomous Data Warehouse
    • Oracle MySQL Cloud Service
    • Hive
    • Kafka

And the supported file types for Oracle Cloud Infrastructure Object Storage include:

    • CSV, Excel
    • ORC, Avro, Parquet
    • JSON

Once the technical metadata is harvested, subject matter experts and data users can contribute business metadata in the form of annotations to the technical metadata. By organizing all this metadata and providing a holistic view into it, Oracle Cloud Infrastructure Data Catalog helps data users find the data they need, discover information on available data, and gain information about the trustworthiness of data for different uses.

How Can You Use a Data Catalog?

Metadata Enrichment

Oracle Cloud Infrastructure Data Catalog enables users to collaboratively enrich technical information with business context to capture and share tribal knowledge. You can tag or link data entities and attributes to business terms to provide a more all-inclusive view as you begin to gather data assets for analysis and data science projects. These enrichments also help with classification, search, and data discovery.

Business Glossaries

One of the first steps towards effective data governance is establishing a common understanding of business concepts across the organization, and establishing their relationships to the data assets in the organization. Oracle Cloud Infrastructure Data Catalog makes it possible to see associations and linkages between glossary terms and other technical terms, assets, and artifacts. This helps increase user trust because users understand the relationships and what they’re looking at.

Oracle Cloud Infrastructure Data Catalog makes this possible by including capabilities to collaboratively define business terms in rich text form, categorize them appropriately, and build a hierarchy to organize this vocabulary. You can also create parent-child relationships between various terms to build a taxonomy, or set business term owners and approval status so that users know who can answer their questions regarding specific terms. Once created, users can then link these terms to technical assets to provide business meaning and use them for searching as well.
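
To make the glossary concepts concrete, here's a plain-Python sketch (not the Data Catalog API) of terms with owners, approval status, a parent-child hierarchy, and links to technical assets; all names here are hypothetical:

    # Minimal sketch of the ideas behind a business glossary: terms with owners,
    # approval status, a parent-child hierarchy, and links to technical assets.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class GlossaryTerm:
        name: str
        description: str
        owner: str
        status: str = "draft"                      # e.g. draft / approved
        parent: Optional["GlossaryTerm"] = None    # parent term in the taxonomy
        linked_assets: List[str] = field(default_factory=list)

    revenue = GlossaryTerm("Revenue", "Income from normal business operations",
                           "finance_team", "approved")
    net_revenue = GlossaryTerm("Net Revenue", "Revenue minus returns and discounts",
                               "finance_team", "approved", parent=revenue)

    # Link the term to the technical asset that implements it.
    net_revenue.linked_assets.append("SALES_DW.FACT_ORDERS.NET_AMOUNT")

    print(net_revenue.parent.name, "->", net_revenue.name, net_revenue.linked_assets)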

Searchable Data Asset Inventory

By organizing all this metadata and providing a more complete view into it, Oracle Cloud Infrastructure Data Catalog helps users find the data they need, discover information on available data, and gain information about the trustworthiness of data for different uses.

Being able to search across data stores makes finding the right data so much easier. With Oracle Cloud Infrastructure Data Catalog, you have a powerful, searchable, standardized inventory of the available data sources, entities, and attributes. You can enter technical information, defined tags, or business terms to easily pull up the right data entities and assets. You can also use filtering options to discover relevant datasets, or browse metadata based on the technical hierarchy of data assets, entities, and attributes. These features make it easier to get started with data science, analytics, and data engineering projects.

Data Catalog API and SDK

Many of Oracle Cloud Infrastructure Data Catalog’s capabilities are also available as public REST APIs to enable integrations such as:

  • Searching and displaying results in applications that use the data assets
  • Looking up definitions of defined business terms in the business glossary and displaying them in reporting applications
  • Invoking job execution to harvest metadata as needed
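
As a rough sketch of calling these REST APIs from Python, the snippet below signs a request with the OCI SDK's request signer; the endpoint path, API version, and query parameters are illustrative placeholders, so check the Data Catalog API reference for the actual routes and fields:

    # Minimal sketch of calling a Data Catalog REST endpoint from Python.
    # The path, API version, and query parameters below are placeholders.
    import requests
    import oci

    config = oci.config.from_file()          # reads ~/.oci/config
    signer = oci.signer.Signer(
        tenancy=config["tenancy"],
        user=config["user"],
        fingerprint=config["fingerprint"],
        private_key_file_location=config["key_file"],
    )

    catalog_endpoint = "https://datacatalog.us-phoenix-1.oci.oraclecloud.com"  # region-specific
    resp = requests.get(
        f"{catalog_endpoint}/20190325/catalogs/<catalog-ocid>/search",         # placeholder path
        params={"name": "revenue"},                                            # placeholder query
        auth=signer,
    )
    print(resp.status_code, resp.json())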

Available search capabilities include:

  • Search data based on technical names, business terms, or tags
  • View details of various objects
  • Browse Oracle Cloud Infrastructure Data Catalog based on data assets

The single collaborative environment includes:

  • Homepage with helpful shortcuts and operational stats
  • Search and browse
  • Quick actions to manage data assets, glossaries, jobs, and schedules
  • Popular tags and recently updated objects

Conclusion

Oracle Cloud Infrastructure Data Catalog is the underlying foundation for data management that you’ve been waiting for—and it’s included with your Oracle Cloud Infrastructure subscription. Now, data professionals can use technical, business, and operational metadata to support self-service data discovery and governance for data assets in Oracle Cloud and beyond.

Leverage your data in new ways, and more easily than you ever could before. Try Oracle Cloud Infrastructure Data Catalog today and start discovering the value of your data. And don’t forget to subscribe to the Big Data Blog for the latest on Big Data straight to your inbox!

Related:

What Is Oracle Cloud Infrastructure Data Science?

And how does it work?

Incredible things can be done with data science, and more appear in the news every day—but there are still many barriers to success. These barriers range from a lack of proper support for data scientists to challenges around operationalizing and maintaining models in production.

That is why we created Oracle Cloud Infrastructure Data Science. Based on the acquisition of DataScience.com in 2018, Oracle Cloud Infrastructure Data Science was built with the goal of making data science collaborative, scalable, and powerful for every enterprise on Oracle Cloud Infrastructure. This short video gives an overview of the power of Oracle Cloud Infrastructure Data Science.

Oracle Cloud Infrastructure Data Science was created with the data scientist in mind—and it’s uniquely suited for data science success because of its support for team-based activity. When it comes to data science success, teams must collaborate at each step of the model lifecycle: from building models all the way through to deployment and beyond.

Oracle Cloud Infrastructure Data Science helps make all of that possible.

Never miss an update about data science! Introducing Oracle Data Science on Twitter — follow @OracleDataSci today for the latest updates!

What Is Oracle Cloud Infrastructure Data Science?

Oracle Cloud Infrastructure Data Science makes data science more structured and more efficient by offering:

Access to data and open-source tools

We are data-source agnostic. Your data can be on Autonomous Data Warehouse, on Object Storage, in MongoDB, or even in an Elasticsearch instance on Azure or AWS Redshift. It doesn’t matter to us where the data is; we just care about giving you access to your data to get things done.

With Oracle Cloud Infrastructure Data Science, you can use the best of open source, including:

  • Tools and languages like Python and JupyterLab
  • Visualization libraries like Plotly and Matplotlib
  • Machine-learning libraries like TensorFlow, Keras, SciKit-Learn, and XGBoost
  • Version control with Git
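
To give a sense of the kind of open-source workflow a notebook session supports, here's a minimal sketch with scikit-learn and Matplotlib; it uses a bundled sample dataset rather than enterprise data:

    # Minimal sketch of a notebook-style workflow: scikit-learn for training,
    # Matplotlib for a quick look at the results. The dataset is a bundled sample.
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import RocCurveDisplay
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = GradientBoostingClassifier().fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))

    # Plot an ROC curve for a quick visual check of the classifier.
    RocCurveDisplay.from_estimator(model, X_test, y_test)
    plt.show()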

Ability to utilize compute on demand

We’ll give you the client connectors you need to access your data and a configurable volume to store that data in your notebook compute environment.

But of course, it doesn’t stop there. You can also select the amount of compute you need to train your model on Oracle Cloud Infrastructure. For now, you can choose small to large CPU virtual machines. And in the near future, we’re planning to add GPUs.

Collaborative workflow

We make a big deal out of teamwork, because we believe that data science can’t truly be successful unless there’s an emphasis on making those teams efficient and successful. We’ve done everything we can to make this possible.

Data scientists can work in “projects” where it’s easy to see what’s happening with a high-level view. Data scientists can share and reuse data science assets and test their colleagues’ models.

Model deployment

Model deployment is usually challenging. But it’s made easier with Oracle Functions on Oracle Cloud Infrastructure. Create a machine learning model function that can be invoked from any application. It’s one of many possible deployment targets, and it’s fully managed, highly scalable, and on-demand.

What Makes Oracle Cloud Infrastructure Data Science Different?

With the growing popularity of data science and machine learning, products that claim to help are a dime a dozen. So, what makes Oracle Cloud Infrastructure Data Science different?

This isn’t an analytics tool with some machine learning capabilities embedded within it. Nor is it an app that offers AI capabilities across different products.

Oracle Cloud Infrastructure Data Science is a platform built for the modern, expert data scientist. And it was built by data scientists who were seeking a platform that would help them perform their complex work better. It’s not a drag-and-drop interface. This is meant for data scientists who write code in Python and need something with real power to enable real data science.

Oracle Cloud Infrastructure Data Science is right for you if you:

  • Have a team and see the benefits of centralized work
  • Prefer Python to drag-and-drop interfaces
  • Want to take advantage of the benefits of Oracle Cloud, with easy access to your data

Oracle Cloud Infrastructure Data Science is also right for you if you need:

  • The ability to train large models on large amounts of data with minimal infrastructure expertise
  • A system to evaluate and monitor models throughout their lifecycle
  • Improved productivity through automation and streamlined workflows
  • Capabilities to deploy models for varying use cases
  • The ability to collaborate with team members in an enterprise organization
  • A seamless, integrated Oracle Cloud Infrastructure user experience

How Does Oracle Cloud Infrastructure Data Science Work?

Oracle Cloud Infrastructure Data Science has:

Projects to centralize, organize, and document a team’s work. These projects describe the purpose of the work and allow users to organize notebook sessions and models.

Notebook Sessions for Python analyses and model development. Users can easily launch Oracle Cloud Infrastructure compute, storage, and networking for Python data science workloads. These sessions provide easy access to JupyterLab and other curated open-source machine-learning libraries for building and training models.

In addition, these notebook sessions come loaded with tutorials and example use cases to make getting started easier than ever.

Accelerated Data Science (ADS) SDK to make common data science tasks faster, easier, and less error-prone. This is a Python library that offers capabilities for data exploration and manipulation, model explanation and interpretation, and AutoML for automated model training.

Model Catalog to enable model auditability and reproducibility. You can track model metadata (including the creator, created date, name, and provenance), save model artifacts in service-managed object storage, and load models into notebook sessions for testing.

How Does Oracle Cloud Infrastructure Data Science Help with Model Management?

The process of building a machine learning model is an iterative one, and it essentially never ends. Let’s walk through how Oracle Cloud Infrastructure Data Science makes it easier to manage models throughout every step of the lifecycle.

Building a Model

Oracle Cloud Infrastructure Data Science’s JupyterLab environment offers a variety of open-source libraries for building machine learning models. It also includes the Accelerated Data Science (ADS) SDK, which provides APIs for data ingestion, data profiling and visualization, automated feature engineering, automated machine learning, model evaluation, and model interpretation. It’s everything that’s needed in a unified Python SDK, accomplishing in a few lines of code what a data scientist would typically write in hundreds.

Training a Model

Data scientists can automate model training through the ADS AutoML API. ADS can help data scientists find the best data transformations for datasets. After the model evaluation shows that the model is ready for production, the model can be made accessible to anybody who needs to use it.

Evaluating a Model

ADS also helps with model evaluation to ensure that your model is accurate and reliable. What percent accuracy can you achieve with the model? How can you make it more accurate? You want to feel confident in your model before you start to deploy it.

Explaining a Model

Model explainability is becoming an increasingly important part of machine learning and data science. Can your model give you more information about why it’s making the decisions it’s reaching? Increasingly, there are more European regulations around the right to know. GDPR, for example, states that the data subject has a right to an explanation of the decision reached by a model.
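
One widely used explanation technique is permutation importance, sketched below with scikit-learn; this is a general illustration rather than a description of the ADS explainability APIs, and the dataset is a bundled sample:

    # Minimal sketch of one common model-explanation technique: permutation
    # importance, which measures how much shuffling each feature hurts accuracy.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    data = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(data.data, data.target,
                                                        random_state=0)
    model = GradientBoostingClassifier().fit(X_train, y_train)

    # Which input features, when shuffled, hurt the model's accuracy the most?
    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    for i in result.importances_mean.argsort()[::-1][:5]:
        print(f"{data.feature_names[i]}: {result.importances_mean[i]:.3f}")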

Deploying a Model

Taking a trained machine learning model and getting it into the right systems is often a difficult and laborious process. But Oracle Cloud Infrastructure enables teams to operationalize models as scalable and secure APIs. Data scientists can load their model from the model catalog, deploy it using Oracle Functions, and secure the model endpoint with Oracle API Gateway. Then, the model’s REST API can be called from any application.
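
For a sense of what such a function can look like, here's a minimal sketch of a Python handler in the Fn Project style that Oracle Functions uses; the model file name and request format are hypothetical, and in practice you would pull the artifact from the model catalog when building the function image:

    # Minimal sketch of a Python function (Fn Project / Oracle Functions style)
    # that wraps a trained model behind an HTTP-invocable endpoint.
    # The model file name and feature payload are hypothetical.
    import io
    import json
    import pickle

    from fdk import response

    # Load the serialized model once, when the function container starts.
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    def handler(ctx, data: io.BytesIO = None):
        payload = json.loads(data.getvalue())       # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
        predictions = model.predict(payload["features"]).tolist()
        return response.Response(
            ctx,
            response_data=json.dumps({"predictions": predictions}),
            headers={"Content-Type": "application/json"},
        )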

Model Monitoring

Unfortunately, deploying a model isn’t the end of it. Models must be monitored after deployment to maintain good health. The data a model was trained on may, after a while, no longer be representative of future predictions. For example, in the case of fraud detection, fraudsters may come up with new ways to defraud the system, and the model will no longer be as accurate. Oracle Cloud Infrastructure Data Science is working to provide data scientists with tools to easily monitor how a model continues to perform while it’s deployed, so it becomes easier to track model accuracy over time.
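
As a simple illustration of one kind of monitoring check, the sketch below compares the distribution of a feature at training time with the same feature in recent scoring traffic using a Kolmogorov-Smirnov test; the arrays are randomly generated stand-ins for real logged data:

    # Minimal sketch of a simple drift check: compare the distribution of a
    # feature in the training data with the same feature in recent requests.
    import numpy as np
    from scipy.stats import ks_2samp

    training_amounts = np.random.normal(loc=50, scale=10, size=5000)  # values seen at training time
    recent_amounts = np.random.normal(loc=65, scale=12, size=1000)    # same feature in recent traffic

    stat, p_value = ks_2samp(training_amounts, recent_amounts)
    if p_value < 0.01:
        print(f"possible drift detected (KS statistic {stat:.3f}); consider retraining")
    else:
        print("no significant drift detected")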

Conclusion

Oracle Cloud Infrastructure Data Science is an enterprise-grade service in which teams of data scientists can collaborate to solve business problems and leverage the latest and greatest in Oracle Cloud Infrastructure to build, train, and deploy their models in the cloud.

It is part of Oracle’s data and AI platform, which makes it simple to integrate and manage your data and to use the power of data science and machine learning for better business results.

With Oracle Cloud Infrastructure Data Science, it’s easier than ever before for data scientists to get started, work with the tools and libraries that they want, and gain streamlined access to all data in Oracle Cloud Infrastructure and beyond. For more information, see this overview video and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.

Related:

Cloud Day: What’s Possible and Where to Start

Want to get a peek into the future of modern IT? Then, come to Oracle Cloud Day, says Dain Hansen, VP of Product Marketing for IaaS and PaaS at Oracle. After speaking at Oracle Cloud Day events in Boston and Chicago last year, Hansen said that one of the things he liked best about the event was that it gave people a real view into what their future could be.

“Imagine a world where everything is automated. You can use AI to power the next level of insights, or you can build a modern application that you can talk to just like you talk to your phone,” Hansen said. “Those are things that we want people to experience. We want them to get first-hand knowledge of and use and touch and see what’s possible.”

Register here

This year, Hansen said, it’s all about how to use data to get a leg up—on the competition and in your career.

“You’re going to see all kinds of ways to use your data,” Hansen said.

Oracle Cloud Day will take a broad, yet detailed, look at all things data—how to manage it, how to secure it, how to draw insights from it, and how to create applications and services that use it in new ways.

But with so much to see at Oracle Cloud Day and so many new technologies to take in, we asked Hansen, “How does someone get the most from Oracle Cloud Day?”

Here are Hansen’s three tips.

Discover the Best Way to Do What You’re Trying to Do

Because there’s so much expertise on hand, Oracle Cloud Day is the perfect place to get information on best practices. Hansen recommends focusing first on what you’re trying to do within your organization, then finding the best way to do it.

If you’re a security person, maybe you want to learn about the latest security threats or figure out the best way to secure your data across cloud and on premises. If you’re an apps IT person, maybe you want to hear about the best way to migrate an application to the cloud.

Whatever it is, zero in on that topic, seek out the best way to do it, and take a look at how Oracle can help. Oracle Cloud Day is a great venue to experience technologies first hand and talk to experts about how they can help you with not only your needs, but the needs of your business as a whole.

Decide What You’re Going to Learn Next

Once you’ve identified how you can address your current needs, take a look at the horizon. What’s next?

“Everyone is always trying to learn something. Even for me, I’m always trying to study and see what I need to pick up on,” Hansen said.

Because of its emphasis on modern IT and the breadth of Oracle technology, Cloud Day is a great place to get up to date on what’s next for you and your business.

Hear From People Already Doing It

Maybe one of the best things about Cloud Day, Hansen said, is that attendees get to hear from companies already reaching their goals. Cloud Day will be packed with real-life stories told by customers who have made the journey.

“Customers don’t mess around. They don’t mince words. They tell it like it is. And that’s one thing that I don’t want anyone to miss is to hear what our customers say about what they’re doing,” Hansen said.

With 15 sessions across three tracks—Modernizing Data Management, Modernizing Applications, and Transforming Business with Analytics and AI—plus the Developer Playground, industry experts and partners in the Innovation Lounge, and a keynote that brings it all together, there are plenty of opportunities to track down all the information you need for what you’re doing today and what you’ll want to do tomorrow.

Now that you know how to make the most of your time at Cloud Day, don’t forget to register. For more information about Oracle Cloud Day, visit the Oracle Cloud Day website.

Related:

The Age of Automation: How AI Is Reshaping the Way We Work

This post was contributed by Dain Hansen, VP of Product Marketing for Oracle Cloud Platform (PaaS/IaaS).

Today, we’re seeing automation permeate every aspect of work and life. From always-on vacuums to self-patching databases, these technologies help us reimagine what’s possible and are changing the way we experience our world. We’re now coming out of the Information Age and entering the Age of Automation, a new era of capabilities propelled by groundbreaking business tools and evolved cloud technology that take advantage of the information we continue to value.

With autonomous technologies, we now have the means to automate insights and automate manual tasks, while reducing costs and reducing risk. We can integrate machine learning (ML) software, hardware, applications—and sometimes machinery and electronics—to create all kinds of secure, closed systems that act safely and independently to perform tasks humans have always done, like driving, compiling reports, securing records, or administering minor medical tests.

What’s more, autonomous systems can consume, process, and analyze enormous amounts of data—much more than most of us could do using traditional IT tools. This is significant because today’s most successful business models are anchored in an abundance of data, both to direct internal operations and to innovate new products, services, and processes. Just look around, and you’ll see the disruption these new models have caused in entertainment, lodging, transportation, energy, retail, healthcare, education, manufacturing, and other sectors.

For CIOs, the Age of Automation requires a rethinking of how to approach enterprise IT. From an executive perspective, CIOs need to recognize that “information” means something different than it did in the Information Age. Data is incredibly valuable for business insight, but with ML, it also has become a literal catalyst for action.

Let’s take a deeper look at three things CIOs should do to be successful in the Age of Automation:

1. Use systems-design thinking

Autonomous things are designed as integrated systems, whether it’s a marketing automation platform or a self-navigating robotic surgeon’s tool. Software and data prompt the system to autonomously act, learn, and adjust within human-defined parameters. A secure, reliable operation requires purposeful design and tight integration of sub-systems so that everything takes place in a closed loop.

In the Information Age, a common CIO strategy was to pull together a mix of cloud and software vendors in an ongoing effort to control costs while serving basic workflow, compute, and storage needs. In the Age of Automation, the value of an integrated system goes beyond cost savings and raises standards for performance and security as well.

Think about what could happen if a criminal hacked into “bolted on” sensor software in an autonomous vehicle system. The criminal could essentially take control of the larger system and cause harm. But if the security of the sensor sub-system is designed into the larger vehicle system, hacking into it becomes much more difficult, if not impossible. The same is true for enterprise technology.

Also, when an autonomous system is designed holistically from the start, it’s much more efficient because there are fewer operational gaps and data hand-offs. System-level integration of cloud, data flow, and applications speeds up processing more than has ever been possible.

Companies with more rapid processes and fewer manual touches can free up budget and other resources, and use them to accelerate growth. For example, in a global survey of managers, directors, vice presidents, and C-suite executives, researchers discovered that organizations that have experienced significant growth are two times more likely to have completed intelligent automation initiatives.

The challenge for CIOs will be to stop looking at their enterprise technology as separate sub-systems and start thinking about it as one system that serves the needs of all while operating autonomously.

2. Prioritize data challenges

In the Age of Automation, data is a multidimensional asset. In the Information Age, data—even what we called “big data”—was comparatively limited, inert, and single-purposed. Now, data from many different sources is directly spurring operations and impacting business health. You simply can’t run an organization—and deliver profitable products and services—without prioritizing data.

But many organizations are struggling to address their most pressing data challenges, while others are pulling ahead by prioritizing data management. In the same survey referenced in the prior section, researchers documented that organizations that are data leaders have huge advantages over data laggards. For example, 77% of data leaders said they’ve improved the customer/employee experience as a result of better secured data, while only 11% of laggards said the same. And 69% of data leaders were confident that they were generating meaningful data insights for their organization, while only 23% of data laggards had the same confidence.

To fix these problems, CIOs need to prioritize data challenges by using autonomous systems to streamline, automate, and improve data management while redeploying the newly freed-up assets to help lines-of-business more effectively use data, automation, and autonomy.

Even if an organization has a “low tech” product or service, autonomous systems will reset standards for everyday processes. For example, one bank is working toward having an autonomous rolling forecast system that updates business forecasts on its own using Oracle Cloud and ERP applications. In this system, the forecasts will run themselves. The capabilities they needed were added in a simple upgrade, but that upgrade never would have happened without the CIO prioritizing the common back-office challenge of collecting, normalizing, and analyzing disparate enterprise data from spreadsheets.

As we progress in the Age of Automation, such systems will become omnipresent inside and outside of businesses. Processing the data within these systems at heightened speed will require enterprise-architected cloud resources that are self-aware, self-operating, and self-healing, so that security and reliability are embedded into the system.

3. Move to a Gen 2 cloud

The second-generation cloud is a fundamental re-architecture of the conventional public cloud, which was built on decades-old technology that can’t effectively provide capabilities organizations need in the Age of Automation. Oracle’s Gen 2 cloud is specifically designed for enterprises and supports the emerging technologies used in autonomous systems—a new infrastructure, with new platform capabilities.

For example, Oracle Cloud is different from other clouds because it includes all of the technology required to build, extend, and connect apps—and embed emerging technologies like ML within your connected enterprise workflows. There’s application development that includes mobile, blockchain, AI/ML, and chatbots, and there’s integration for Oracle and non-Oracle apps. Oracle Gen 2 Cloud also serves as the infrastructure through which we deliver Oracle Analytics, Oracle Autonomous Transaction Processing, and Oracle Autonomous Data Warehouse.

Taken together, this provides a comprehensive set of automated capabilities to seamlessly move from on-premises to the cloud, increasing productivity while reducing costs. It enables you to move your organization to the next generation of technology for more secure, high-performance, mission-critical workloads.

Time to Move Toward Autonomous Systems

We are on the cusp of a new era that will take us far beyond what we’ve achieved with the first-generation cloud technology used in the Information Age. Autonomous systems are quickly becoming more prevalent in everyday life, as well as in business operations and innovation.

New business models take advantage of what’s possible with autonomous systems, but first, CIOs will need to rethink enterprise technology. They can cut costs, improve efficiency, and make their organizations more easily scalable, while enabling agility and emerging technologies within workflows by moving to a second-generation cloud like Oracle Cloud.

For more information about Oracle Cloud, visit www.oracle.com/do-more-with-data.

Related: