And how does it work?
Incredible things can be done with data science, and more appear in the news every day—but there are still many barriers to success. These barriers range from a lack of proper support for data scientists to challenges around operationalizing and maintaining models in production.
That is why we created Oracle Cloud Infrastructure Data Science. Based on the acquisition of DataScience.com in 2018, Oracle Cloud Infrastructure Data Science was built with the goal of making data science collaborative, scalable, and powerful for every enterprise on Oracle Cloud Infrastructure. This short video gives an overview of the power of Oracle Cloud Infrastructure Data Science.
Oracle Cloud Infrastructure Data Science was created with the data scientist in mind—and it’s uniquely suited for data science success because of its support for team-based activity. When it comes to data science success, teams must collaborate at each step of the model lifecycle: from building models all the way through to deployment and beyond.
Oracle Cloud Infrastructure Data Science helps make all of that possible.
What Is Oracle Cloud Infrastructure Data Science?
Oracle Cloud Infrastructure Data Science makes data science more structured and more efficient by offering:
Access to data and open-source tools
We are data-source agnostic. Your data can be in Autonomous Data Warehouse, in Object Storage, in MongoDB, in an Elasticsearch instance on Azure, or in Amazon Redshift on AWS. It doesn’t matter to us where the data is; we just care about giving you access to your data to get things done.
With Oracle Cloud Infrastructure Data Science, you can use the best of open source, including:
- Tools and languages like Python and JupyterLab
- Visualization libraries like Plotly and Matplotlib
- Machine-learning libraries like TensorFlow, Keras, scikit-learn, and XGBoost
- Version control with Git
Ability to utilize compute on demand
We’ll give you the client connectors you need to access your data and a configurable volume to store that data in your notebook compute environment.
But of course, it doesn’t stop there. You can also select the amount of compute you need to train your model on Oracle Cloud Infrastructure. For now, you can choose small to large CPU virtual machines. And in the near future, we’re planning to add GPUs.
We make a big deal out of teamwork, because we believe that data science can’t truly be successful unless there’s an emphasis on making those teams efficient and successful. We’ve done everything we can to make this possible.
Data scientists can work in “projects” where it’s easy to see what’s happening with a high-level view. Data scientists can share and reuse data science assets and test their colleagues’ models.
Model deployment is usually challenging, but Oracle Functions on Oracle Cloud Infrastructure makes it easier. You can create a machine learning model function that can be invoked from any application. It’s one of many possible deployment targets, and it’s fully managed, highly scalable, and on-demand.
What Makes Oracle Cloud Infrastructure Data Science Different?
With the growing popularity of data science and machine learning, products that claim to help are a dime a dozen. So, what makes Oracle Cloud Infrastructure Data Science different?
This isn’t an analytics tool with some machine learning capabilities embedded within it. Nor is it an app that offers AI capabilities across different products.
Oracle Cloud Infrastructure Data Science is a platform built for the modern, expert data scientist. And it was built by data scientists who were seeking a platform that would help them perform their complex work better. It’s not a drag-and-drop interface. This is meant for data scientists who write code in Python and need something with real power to enable real data science.
Oracle Cloud Infrastructure Data Science is right for you if you:
- Have a team and see the benefits of centralized work
- Prefer Python to drag-and-drop interfaces
- Want to take advantage of the benefits of Oracle Cloud, with easy access to your data
Oracle Cloud Infrastructure Data Science is also right for you if you need:
- The ability to train large models on large amounts of data with minimal infrastructure expertise
- A system to evaluate and monitor models throughout their lifecycle
- Improved productivity through automation and streamlined workflows
- Capabilities to deploy models for varying use cases
- The ability to collaborate with team members in an enterprise organization
- A seamless, integrated Oracle Cloud Infrastructure user experience
How Does Oracle Cloud Infrastructure Data Science Work?
Oracle Cloud Infrastructure Data Science has:
Projects to centralize, organize, and document a team’s work. These projects describe the purpose of the work and allow users to organize notebook sessions and models.
Notebook Sessions for Python analyses and model development. Users can easily launch Oracle Cloud Infrastructure compute, storage, and networking for Python data science workloads. These sessions provide easy access to JupyterLab and other curated open-source machine-learning libraries for building and training models.
In addition, these notebook sessions come loaded with tutorials and example use cases to make getting started easier than ever.
Accelerated Data Science (ADS) SDK to make common data science tasks faster, easier, and less error-prone. This is a Python library that offers capabilities for data exploration and manipulation, model explanation and interpretation, and AutoML for automated model training.
Model Catalog to enable model auditability and reproducibility. You can track model metadata (including the creator, created date, name, and provenance), save model artifacts in service-managed object storage, and load models into notebook sessions for testing.
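To make the idea of tracked model metadata concrete, here is a minimal sketch in plain Python. It is not the Model Catalog API: the `ModelRecord` class, its field names, and the sample values (model name, creator, commit ID) are all hypothetical placeholders chosen to mirror the metadata listed above.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class ModelRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    creator: str
    created: str  # ISO 8601 timestamp
    provenance: dict = field(default_factory=dict)  # e.g. commit, training script


def make_record(name, creator, provenance, now=None):
    # Stamp the record with a UTC creation time unless one is supplied.
    now = now or datetime.now(timezone.utc)
    return ModelRecord(name=name, creator=creator,
                       created=now.isoformat(), provenance=provenance)


# Placeholder values, not real project data.
record = make_record(
    name="churn-classifier",
    creator="jane.doe",
    provenance={"git_commit": "abc1234", "training_script": "train.py"},
    now=datetime(2020, 2, 1, tzinfo=timezone.utc),
)
print(json.dumps(asdict(record), indent=2))
```

Keeping provenance next to the artifact is what makes a model reproducible later: anyone loading the model can see which code and which author produced it.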
How Does Oracle Cloud Infrastructure Data Science Help with Model Management?
The process of building a machine learning model is an iterative one, and it’s one that essentially never ends. Let’s walk through how Oracle Cloud Infrastructure Data Science makes it easier to manage models throughout every step of the entire lifecycle.
Building a Model
Oracle Cloud Infrastructure Data Science’s JupyterLab environment offers a variety of open-source libraries for building machine learning models. It also includes the Accelerated Data Science (ADS) SDK, which provides APIs on data ingestion, data profiling and visualization, automated feature engineering, automated machine learning, model evaluation, and model interpretation. It’s everything that’s needed in a unified Python SDK, accomplishing in a few lines of code what a data scientist would typically do in hundreds of lines of code.
Training a Model
Data scientists can automate model training through the ADS AutoML API. ADS can help data scientists find the best data transformations for datasets. After the model evaluation shows that the model is ready for production, the model can be made accessible to anybody who needs to use it.
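The core loop behind automated model training can be illustrated with a toy sketch: score several candidate models on held-out data and keep the best one. This is not the ADS AutoML API. The helper names (`make_stump`, `select_best`), the one-rule "stump" models, and the tiny dataset are assumptions made up for illustration.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def make_stump(feature, threshold):
    # A one-rule classifier: predict 1 when the chosen feature exceeds the threshold.
    return lambda row: 1 if row[feature] > threshold else 0


def select_best(candidates, X_val, y_val):
    # Score every candidate on held-out data and keep the highest scorer.
    best_name, best_model, best_score = None, None, -1.0
    for name, model in candidates.items():
        score = accuracy(y_val, [model(row) for row in X_val])
        if score > best_score:
            best_name, best_model, best_score = name, model, score
    return best_name, best_model, best_score


# Made-up validation set: feature 0 drives the label, feature 1 is noise.
X_val = [(0.1, 5.0), (0.4, 1.0), (0.6, 4.0), (0.9, 2.0)]
y_val = [0, 0, 1, 1]

candidates = {
    "f0>0.5": make_stump(0, 0.5),
    "f0>0.8": make_stump(0, 0.8),
    "f1>3.0": make_stump(1, 3.0),
}
name, model, score = select_best(candidates, X_val, y_val)
print(name, score)  # f0>0.5 1.0
```

A real AutoML system searches over algorithms, hyperparameters, and feature transformations rather than three hand-written rules, but the select-by-validation-score structure is the same.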
Evaluating a Model
ADS also helps with model evaluation to ensure that your model is accurate and reliable. What percent accuracy can you achieve with the model? How can you make it more accurate? You want to feel confident in your model before you start to deploy it.
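As a rough illustration of what "percent accuracy" means in practice, the sketch below computes accuracy, precision, and recall for a binary classifier in plain Python. The function names and the sample labels are made up for this example; ADS and other libraries provide their own evaluation tooling.

```python
def confusion_counts(y_true, y_pred, positive=1):
    # Count true/false positives and negatives for a binary problem.
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

Accuracy alone can mislead on imbalanced data (for example, rare fraud cases), which is why precision and recall are reported alongside it.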
Explaining a Model
Model explainability is becoming an increasingly important part of machine learning and data science. Can your model give you more information about why it’s making the decisions it’s reaching? European regulation increasingly addresses this. GDPR, for example, restricts decisions based solely on automated processing, and its Recital 71 refers to a data subject’s right to obtain an explanation of a decision reached that way.
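One common family of explanation techniques asks how much a model's accuracy drops when a single feature is scrambled. The sketch below is a simplified, deterministic variant of permutation importance: instead of random shuffles, it rotates each feature column through every cyclic shift so the result is reproducible. The model, data, and function names are assumptions for illustration, not part of ADS.

```python
def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def shifted_column(X, feature, k):
    # Return a copy of X with one feature column rotated by k positions.
    col = [row[feature] for row in X]
    col = col[k:] + col[:k]
    return [row[:feature] + (v,) + row[feature + 1:] for row, v in zip(X, col)]


def shift_importance(model, X, y, n_features):
    # Average accuracy drop over all non-trivial cyclic shifts of each column.
    baseline = accuracy(model, X, y)
    importances = []
    for f in range(n_features):
        drops = [baseline - accuracy(model, shifted_column(X, f, k), y)
                 for k in range(1, len(X))]
        importances.append(sum(drops) / len(drops))
    return importances


# Toy model that only looks at feature 0; feature 1 is ignored.
model = lambda row: 1 if row[0] > 0.5 else 0
X = [(0.1, 3), (0.2, 7), (0.8, 1), (0.9, 5), (0.3, 9), (0.7, 2)]
y = [0, 0, 1, 1, 0, 1]

importances = shift_importance(model, X, y, n_features=2)
print(importances)
```

Scrambling feature 0 hurts accuracy, so it receives a positive importance, while the ignored feature 1 scores exactly zero. That ranking is the kind of evidence an explanation of a model's decisions can draw on.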
Deploying a Model
Taking a trained machine learning model and getting it into the right systems is often a difficult and laborious process. But Oracle Cloud Infrastructure enables teams to operationalize models as scalable and secure APIs. Data scientists can load their model from the model catalog, deploy the model using Oracle Functions, and secure the model endpoint with Oracle API Gateway. Then, the model REST API can be called from any application.
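The shape of a model-serving function can be sketched in plain Python: load the model once, then parse each JSON request, score it, and return a JSON response. This does not use the Oracle Functions SDK or the model catalog. `load_model`, `handler`, the request format with a `rows` key, and the hard-coded weights are all hypothetical stand-ins.

```python
import json


def load_model():
    # Stand-in for loading a saved artifact; the weights are made up.
    weights, bias = [0.8, -0.5], 0.1

    def predict(row):
        score = sum(w * x for w, x in zip(weights, row)) + bias
        return 1 if score > 0 else 0

    return predict


MODEL = load_model()  # loaded once and reused across invocations


def handler(body: str) -> str:
    """Parse a JSON request, score each row, and return a JSON response."""
    try:
        rows = json.loads(body)["rows"]
    except (ValueError, KeyError, TypeError):
        return json.dumps({"error": "expected a JSON object with a 'rows' list"})
    return json.dumps({"predictions": [MODEL(row) for row in rows]})


print(handler('{"rows": [[1.0, 0.2], [0.0, 1.0]]}'))  # {"predictions": [1, 0]}
```

Loading the model at module level rather than inside the handler avoids paying the load cost on every request, which matters for on-demand function platforms.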
Unfortunately, deploying a model isn’t the end of it. Models must be monitored after deployment to stay healthy. After a while, the data a model was trained on may no longer be representative of the data it sees in production. For example, in fraud detection, fraudsters may come up with new ways to defraud the system, and the model will no longer be as accurate. Oracle Cloud Infrastructure Data Science is working to provide data scientists with tools to track how a deployed model is performing, so that it becomes easier to monitor model accuracy over time.
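A simple way to picture post-deployment monitoring is a rolling accuracy check: as true outcomes arrive, compare them with the model's earlier predictions and flag the model when recent accuracy falls below a threshold. The `AccuracyMonitor` class, window size, and threshold below are hypothetical choices for illustration, not a feature of the service.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions (illustrative only)."""

    def __init__(self, window=4, threshold=0.75):
        self.outcomes = deque(maxlen=window)  # True where prediction was correct
        self.threshold = threshold

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)
        return self.rolling_accuracy()

    def rolling_accuracy(self):
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        # Only alert once the window is full, to avoid noisy early readings.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.rolling_accuracy() < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)

# Made-up stream of (prediction, actual) pairs: the model starts well, then degrades.
for prediction, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(prediction, actual)
healthy = monitor.needs_attention()

for prediction, actual in [(0, 1), (0, 1)]:
    monitor.record(prediction, actual)
degraded = monitor.needs_attention()

print(healthy, degraded, monitor.rolling_accuracy())  # False True 0.5
```

In the fraud example, a drop like this would be the signal to retrain the model on more recent data.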
Oracle Cloud Infrastructure Data Science is an enterprise-grade service in which teams of data scientists can collaborate to solve business problems and leverage the latest and greatest in Oracle Cloud Infrastructure to build, train, and deploy their models in the cloud.
It is part of Oracle’s data and AI platform, which makes it simple to integrate and manage your data and to use data science and machine learning to drive better business results.
With Oracle Cloud Infrastructure Data Science, it’s easier than ever before for data scientists to get started, work with the tools and libraries that they want, and gain streamlined access to all data in Oracle Cloud Infrastructure and beyond. For more information, see this overview video and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.