Oracle Database 18c: Now available on the Oracle Cloud and Oracle Engineered Systems

The release itself focused on 3 major areas:

Multitenant is Oracle’s strategic container architecture for the Oracle Database. It introduced the concept of a pluggable database (PDB), enabling users to plug and unplug their databases and move them to other containers, either locally or in the cloud. The architecture enables massive consolidation and the ability to manage, patch and back up many databases as one. We introduced this architecture in Oracle Database 12c and extended its capabilities in Oracle Database 12c Release 2 with the ability to hot clone, online relocate and provide resource controls for IO, CPU and memory on a per-PDB basis. We also ensured that all of the features available in a non-container database are available for a PDB (Flashback Database, Continuous Query, etc.).

Database In-Memory enables users to perform lightning-fast analytics against their operational databases without being forced to acquire new hardware or make compromises in the way they process their data. The Oracle Database enables this by adopting a dual in-memory model where OLTP data is held both as rows, enabling it to be efficiently updated, and in a columnar form, enabling it to be scanned and aggregated much faster. This columnar in-memory format then leverages compression and software in silicon to analyze billions of rows a second, meaning reports that used to take hours can now be executed in seconds. In Oracle Database 12c Release 2 we introduced many new performance enhancements and extended this capability with new features that enabled In-Memory analytics on JSON documents, as well as significantly improving the speed at which the In-Memory column store becomes available for queries at startup.

Oracle Database Sharding, released in Oracle Database 12c Release 2, provides OLTP scalability and fault isolation for users that want to scale outside of the usual confines of a typical SMP server. It also supports use cases where data needs to be placed in a specific geographic location for performance or regulatory reasons. Oracle Sharding provides superior run-time performance and simpler life-cycle management compared to home-grown deployments that use a similar approach to scalability. Users can automatically scale up the shards to reflect increases in workload, making Oracle one of the most capable and flexible approaches to web-scale workloads for the enterprise today.

Oracle Database 12c Release 2 also included over 600 new features, ranging from small syntax improvements to features like improved Index Compression, Real-Time Materialized Views, Index Usage Statistics, improved JSON support, enhancements to Real Application Clusters and many, many more. I’d strongly recommend taking a look at the “New Features Guide for Oracle Database 12c Release 2” available here.

Small improvements across the board

As you’d expect from a yearly release, Oracle Database 18c doesn’t contain any seismic changes in functionality, but there are lots of small incremental improvements. These range from small syntax enhancements to improvements in performance; some will require that you explicitly enable them, whilst others will happen out of the box. Whilst I’m not going to be able to cover all of the many enhancements in detail, I’ll do my best to give you a flavor of some of these changes. To do this I’ll break the improvements into six main areas: Performance, High Availability, Multitenant, Security, Data Warehousing and Development.

Performance

For users of Exadata and Real Application Clusters (RAC), Oracle Database 18c brings changes that will enable a significant reduction in the amount of undo that needs to be transferred across the interconnect. It achieves this by using RDMA, over the InfiniBand connection, to access the undo blocks in the remote instance. This feature, combined with a local commit cache, significantly improves the throughput of some OLTP workloads when running on top of RAC. This, combined with all of the performance optimizations that Exadata brings to the table, cements its position as the highest-performance Database Engineered System for both OLTP and Data Warehouse workloads.

To support applications that fetch data primarily via a single unique key, Oracle Database 18c provides a memory optimized lookup capability. Users simply need to allocate a portion of Oracle’s memory (SGA) and identify which tables they want to benefit from this functionality; the database takes care of the rest. SQL fetches are significantly faster as they bypass the SQL layer and utilize an in-memory hash index to reduce the number of operations that need to be performed to get the row. For some classes of application this functionality can result in upwards of a 4x increase in throughput and a halving of response times.
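
As a rough sketch of how you might enable this (the pool size and table name here are hypothetical, and the exact steps may vary), you set aside a portion of the SGA and then mark the tables that should benefit:

  -- Reserve part of the SGA for the memory optimized lookup pool (static parameter, needs a restart).
  ALTER SYSTEM SET MEMOPTIMIZE_POOL_SIZE = 2G SCOPE = SPFILE;

  -- Mark a table (which must have a primary key) as eligible for the fast lookup path.
  ALTER TABLE orders MEMOPTIMIZE FOR READ;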

To ease the maintenance work for In-Memory, it’s also now possible to have tables and partitions automatically populated into and aged out of the column store. It does this by utilizing the Heat Map: when the column store is under memory pressure, it evicts inactive segments if more frequently accessed segments would benefit from population.
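
If, as I understand it, this is the Automatic In-Memory feature, enabling it comes down to a single initialization parameter; the values below are my assumption of the supported levels, so check the documentation before relying on them:

  -- OFF disables the behaviour; LOW and MEDIUM control how aggressively
  -- inactive segments are evicted under memory pressure.
  ALTER SYSTEM SET INMEMORY_AUTOMATIC_LEVEL = 'LOW';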

Oracle Database In-Memory gets a number of improvements as well. It now uses parallel lightweight threads to scan its compression units rather than process-driven serial scans. This is available for both serial and parallel scans of data, and it can double the speed at which data is read, improving the already exceptional scan performance of Oracle Database In-Memory. Alongside this feature, Oracle Database In-Memory also enables Oracle Number types to be held in their native binary representation (int, float, etc.). This enables the data to be processed by the vector processing units on processors like Intel’s Xeon CPUs much faster than previously. For some aggregation and arithmetic operations this can result in a staggering 40x improvement in performance.

Finally, In-Memory in Oracle Database 18c also allows you to place data from external tables in the column store, enabling you to execute high performance analytics on data outside of the database.
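
A minimal sketch of what that might look like, assuming a directory object named data_dir and a file sales.csv exist; as far as I know, external tables are populated on demand rather than automatically:

  CREATE TABLE sales_ext (
    product_id NUMBER,
    amount     NUMBER )
  ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS ( FIELDS TERMINATED BY ',' )
    LOCATION ('sales.csv') )
  INMEMORY;

  -- Trigger population of the external data into the column store.
  EXEC DBMS_INMEMORY.POPULATE(schema_name => USER, table_name => 'SALES_EXT');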

High Availability

Whether you are using Oracle Real Application Clusters or Oracle Data Guard, we continue to look for ways to improve on the Oracle Database’s high availability functionality. With Oracle Database 18c we’re rolling out a few small but significant upgrades.

Oracle Grid Infrastructure, which sits at the heart of Oracle’s Real Application Clusters, now offers “Zero Impact Patching”. That is to say, you can now patch each node’s Grid Infrastructure software in a rolling fashion whilst the databases running on that node continue to be available to the application users. We do this by leveraging the Grid Infrastructure services (Flex ASM, Flex Cluster) running on other nodes in the cluster until the patching has taken place.

Oracle Real Application Clusters also gets a hybrid sharding model. With this technology you can enjoy all of the benefits that a shared disk architecture provides whilst leveraging some of the benefits that Sharding offers. The Oracle Database will affinitize table partitions/shards to nodes in the cluster and route connections using the Oracle Database Sharding API based on a shard key. The benefit of this approach is that it formalizes a technique often taken by application developers to improve buffer cache utilization and reduce the number of cross-shard pings between instances. It also has the advantage of removing the punitive cost of cross-shard queries simply by leveraging RAC’s shared disk architecture.

Sharding also gets some improvements in Oracle Database 18c in the form of “User Defined Sharding” and “Swim Lanes”. Users can now specify how shards are to be defined using either the system-managed approach, “Hashing”, or an explicit user-defined model of “Range” and “List” sharding. Using either of these last two approaches gives users the ability to ensure that data is placed in a location appropriate for its access. This might be to reduce the latency between the application and the database, or simply to ensure that data is placed in a specific data center to conform to geographical or regulatory requirements. Sharded swim lanes also make it possible to route requests through sharded application servers all the way to a sharded Oracle Database. Users do this by having their routing layer call a simple REST API. The real benefit of this approach is that it can improve throughput and reduce latency whilst minimizing the number of possible connections the Oracle Database needs to manage.
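
To make the user-defined model concrete, here is a rough sketch of a LIST-sharded table; all of the names are hypothetical, and the sharded database itself must already have been created with user-defined sharding:

  CREATE SHARDED TABLE accounts
  ( account_id NUMBER       NOT NULL,
    region     VARCHAR2(20) NOT NULL,
    balance    NUMBER,
    CONSTRAINT pk_accounts PRIMARY KEY (account_id, region) )
  PARTITION BY LIST (region)
  ( PARTITION p_europe   VALUES ('EU', 'UK') TABLESPACE ts_europe,
    PARTITION p_americas VALUES ('US', 'CA') TABLESPACE ts_americas );

Because each partition maps to a tablespace you control, you decide which shard, and therefore which data center, holds which slice of the data.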

For the users of Java in the Database we’re rolling out a welcome fix that will make it possible to perform rolling patching of the database.

Multitenant

Multitenant in Oracle Database 18c got a number of updates to continue to round out the overall architecture. We’re introducing the concept of a Snapshot Carousel. This enables you to define regular snapshots of PDBs. You can then use these snapshots as a source for PDB clones from various points in time, rather than simply the most current one. The Snapshot Carousel might be ideal for a development environment or to augment a non-mission-critical backup and recovery process.
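
The mechanics are worth a quick, hedged sketch; the names and interval are made up, so verify the exact clauses against the documentation:

  -- Take a snapshot of this PDB automatically every 24 hours.
  ALTER PLUGGABLE DATABASE SNAPSHOT MODE EVERY 24 HOURS;

  -- Or take a named snapshot manually.
  ALTER PLUGGABLE DATABASE SNAPSHOT before_release_42;

  -- Clone a new PDB from one of the snapshots in the carousel.
  CREATE PLUGGABLE DATABASE pdb_dev FROM pdb_prod USING SNAPSHOT before_release_42;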

I’m regularly asked if we support Multitenant container-to-container active/active Data Guard standbys, where some of the primary PDBs in one container have standby PDBs in an opposing container and vice versa. We continue to move in that direction, and in Oracle Database 18c we move a small step closer with the introduction of “Refreshable PDB Switchover”. This enables users to create a PDB which is an incrementally updated copy of a “master” PDB. Users may then perform a planned switchover between the PDBs inside of the container. When this happens, the master PDB becomes the clone and the old clone the master. It’s important to point out that this feature is not using Data Guard; rather, it extends the incremental cloning functionality we introduced in Oracle Database 12c Release 2.
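
In terms of syntax it might look roughly like this; the PDB names and database link are hypothetical, and this is a sketch rather than a definitive recipe:

  -- Create an incrementally refreshed copy of the "master" PDB over a database link.
  CREATE PLUGGABLE DATABASE sales_copy FROM sales_master@cdb_prod_link
    REFRESH MODE EVERY 60 MINUTES;

  -- 18c: planned role reversal; the clone becomes the master and vice versa.
  ALTER PLUGGABLE DATABASE sales_copy
    REFRESH MODE EVERY 60 MINUTES FROM sales_master@cdb_prod_link SWITCHOVER;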

In Oracle Database 18c Multitenant also got some Data Guard improvements. You can now automatically maintain standby databases when you clone a PDB on the primary. This operation ensures that the PDB, including all of its data files, is created on the standby database, which significantly simplifies the process needed to provide disaster recovery for PDBs running inside of a container database. We have also made it possible to clone a PDB from an Active Data Guard standby. This feature dramatically simplifies the work needed to provide copies of production databases for development environments.

Multitenant also got a number of smaller improvements that are still worth mentioning. We now support the use of backups performed on a PDB prior to it being unplugged and plugged into a new container. You can also expect upgrades to be quicker under Multitenant in Oracle Database 18c.

Security

The Oracle Database is widely regarded as the most secure database in the industry, and we continue to innovate in this space. In Oracle Database 18c we have added a number of small but important updates. A simple change that could have a big impact on the security of some databases is the introduction of schema-only accounts. This functionality allows schemas to act as the owners of objects but not allow clients to log in, potentially reducing the attack surface of the database.
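
Creating one is a single statement; the account owns objects but has no password and cannot be used to connect (the names here are illustrative):

  -- Schema-only account: no password, no direct logins.
  CREATE USER app_owner NO AUTHENTICATION
    DEFAULT TABLESPACE users
    QUOTA UNLIMITED ON users;

  GRANT CREATE TABLE, CREATE PROCEDURE TO app_owner;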

To improve the isolation of Pluggable Databases (PDBs), we are adding the ability for each PDB to have its own keystore rather than one for the entire container. This also simplifies the configuration of non-container databases by introducing explicit parameters and hence removing the requirement to edit the sqlnet.ora file.
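
To the best of my knowledge the parameters in question are WALLET_ROOT and TDE_CONFIGURATION; the path below is hypothetical, and WALLET_ROOT is static, so it needs a restart:

  -- Point the database at a keystore location via a parameter rather than sqlnet.ora.
  ALTER SYSTEM SET WALLET_ROOT = '/u01/app/oracle/admin/ORCL/wallet' SCOPE = SPFILE;

  -- After a restart, tell the database to use a file-based (software) keystore.
  ALTER SYSTEM SET TDE_CONFIGURATION = 'KEYSTORE_CONFIGURATION=FILE';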

A welcome change for some Microsoft users is the integration of the Oracle Database with Active Directory. Oracle Database 18c allows Active Directory to authenticate and authorize users directly without the need to also use Oracle Internet Directory. In the future we hope to extend this functionality to include other third-party LDAP version 3–compliant directory services. This change significantly reduces the complexity needed to perform this task and as a result improves the overall security and availability of this critical component.

Data Warehousing

Oracle Database 18c’s support for data warehousing got a number of welcome improvements.

Whilst machine learning has received a lot of attention in the press and social media recently, it’s important to remind ourselves that the Oracle Database has had a number of these algorithms since Oracle 9i. So, in this release we’ve improved on our existing capability by adding a few more and improving a number of others by implementing them directly inside of the database without the need for callouts.

One of the compromises that data warehouse users have had to accept in the past was that if they wanted to use a standby database, they couldn’t use NOLOGGING to rapidly load data into their tables. In Oracle Database 18c that no longer has to be the case. Users can choose between two modes for accommodating non-logged data loads. The first ensures that standbys receive non-logged data changes with minimum impact on loading speed at the primary, but at the cost of allowing the standby to have transient non-logged blocks; these are automatically resolved by managed standby recovery. The second ensures all standbys have the data when the primary load commits, but at the cost of throttling the speed of loading data at the primary, which means the standbys never have any non-logged blocks.
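
If I have the two modes mapped correctly, they correspond to two ALTER DATABASE settings on the primary; treat the comments as my interpretation rather than gospel:

  -- Favour load speed at the primary; standbys may transiently hold non-logged
  -- blocks, which managed recovery resolves afterwards.
  ALTER DATABASE SET STANDBY NOLOGGING FOR LOAD PERFORMANCE;

  -- Favour the standbys: the load waits so that standbys never have non-logged blocks.
  ALTER DATABASE SET STANDBY NOLOGGING FOR DATA AVAILABILITY;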

One of the most interesting developments in Oracle Database 18c is the introduction of polymorphic table functions. Table functions are a popular feature that enables a developer to encapsulate potentially complicated data transformations, aggregations, security rules, etc. inside of a function that, when selected from, returns the data as if it were coming from a physical table. For very complicated ETL operations these table functions can be pipelined and even executed in parallel. The only downside of this approach was that you had to declare the shape of the data returned as part of the definition of the function, i.e. the columns to be returned. With polymorphic table functions, the shape of the data to be returned is determined by the parameters passed to the table function. This allows polymorphic table functions to be more generic in nature at the cost of a little more code.
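
A minimal sketch of the moving parts, assuming the DBMS_TF interface and a hypothetical EMPLOYEES table: a package supplying a DESCRIBE function (returning NULL here means “same shape as the input”) plus a function declared as ROW POLYMORPHIC:

  CREATE OR REPLACE PACKAGE ptf_skeleton AS
    FUNCTION describe (tab IN OUT DBMS_TF.TABLE_T) RETURN DBMS_TF.DESCRIBE_T;
  END ptf_skeleton;
  /
  CREATE OR REPLACE PACKAGE BODY ptf_skeleton AS
    FUNCTION describe (tab IN OUT DBMS_TF.TABLE_T) RETURN DBMS_TF.DESCRIBE_T IS
    BEGIN
      RETURN NULL;  -- no new columns: the output shape follows the input table
    END describe;
  END ptf_skeleton;
  /
  -- The shape of the result is whatever table the caller passes in.
  CREATE OR REPLACE FUNCTION pass_through (t TABLE)
    RETURN TABLE PIPELINED ROW POLYMORPHIC USING ptf_skeleton;
  /
  SELECT * FROM pass_through(employees);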

One of my personal favorite features of this release is the ability to merge partitions online. This is particularly useful if you partition your data by some unit of time, e.g. minutes, hours, days, weeks, and at some stage, as the data is less frequently updated, you aggregate some of the partitions into larger partitions to simplify administration. This was possible in previous versions of the database, but the table was inaccessible whilst it took place. In Oracle Database 18c you can merge your partitions online and maintain the indexes as well. This rounds out a whole list of online table and partition operations that we introduced in Oracle Database 12c Release 1 and Release 2, e.g. move table online, split partition online, convert table to partitioned table online, etc.
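
For example (the table and partition names are hypothetical), merging two quarterly partitions into a half-year partition while the table remains fully available:

  ALTER TABLE sales
    MERGE PARTITIONS sales_2017_q1, sales_2017_q2
    INTO PARTITION sales_2017_h1
    UPDATE INDEXES ONLINE;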

For some classes of queries, getting a relatively accurate approximate answer fast is more useful than getting an exact answer slowly. In Oracle Database 12c we introduced the function APPROX_COUNT_DISTINCT, which typically delivers accuracy of 97% or greater but can provide the result orders of magnitude faster. We added additional functions in Oracle Database 12c Release 2, and in 18c we provide some additional approximate aggregation (group) operations: APPROX_COUNT(), APPROX_SUM() and APPROX_RANK().
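
For instance, the 12c function can simply stand in for COUNT(DISTINCT …), while the 18c additions are aimed at approximate “top-N” style reporting; the second query is a hedged sketch, as these functions come with specific usage rules:

  -- Approximate number of distinct customers, typically accurate to within a few percent.
  SELECT APPROX_COUNT_DISTINCT(customer_id) AS approx_customers
  FROM   sales;

  -- Sketch of the 18c additions: approximate top-10 products by order count.
  SELECT product_id, APPROX_COUNT(*) AS approx_orders
  FROM   sales
  GROUP  BY product_id
  HAVING APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) <= 10;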

Oracle Spatial and Graph also added some improvements in this release. We added support for graphs in Oracle Database 12c Release 2, and now in Oracle Database 18c you can use the Property Graph Query Language (PGQL) to simplify querying the data held within them. Performance was also boosted with the introduction of support for Oracle Database In-Memory and list-hash partitioning.

We also added a little bit of syntactic sugar for external tables. You can now specify the external table definition inline in a statement, so there is no need to create definitions that are used once and then dropped.
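
A sketch of what that looks like, with a hypothetical directory object and file; the external table definition lives entirely inside the statement:

  INSERT INTO sales_history
  SELECT *
  FROM   EXTERNAL (
           ( product_id NUMBER,
             amount     NUMBER,
             sale_date  DATE )
           TYPE ORACLE_LOADER
           DEFAULT DIRECTORY load_dir
           ACCESS PARAMETERS ( FIELDS TERMINATED BY ',' )
           LOCATION ('jan_sales.csv')
           REJECT LIMIT UNLIMITED );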

Development

As you’d expect, there are a number of Oracle Database 18c improvements for developers, but we are also updating our tools and APIs.

JSON is rapidly becoming the preferred format for application developers to transfer data between application tiers. In Oracle Database 12c we introduced support that enabled JSON to be persisted in the Oracle Database and queried using dot notation. This gave developers a no-compromise platform for JSON persistence with the power and industry-leading analytics of the Oracle Database. Developers could also treat the Oracle Database as if it were a NoSQL database using the Simple Oracle Document Access (SODA) API. This meant that whilst some developers could work using REST or Java NoSQL APIs to build applications, others could build out analytical reports using SQL. In Oracle Database 18c we’ve added new SODA APIs for C and PL/SQL and included a number of improvements to the functions that return or manipulate JSON in the database via SQL. We’ve also enhanced the support for Oracle Sharding and JSON.
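
As a hedged sketch of the PL/SQL flavour of SODA (the collection name and document content are made up, and the exact constructors are worth checking in the SODA for PL/SQL documentation):

  DECLARE
    collection  SODA_COLLECTION_T;
    document    SODA_DOCUMENT_T;
    status      NUMBER;
  BEGIN
    -- Create (or open) a document collection backed by a table.
    collection := DBMS_SODA.create_collection('orders');

    -- Build a JSON document and insert it into the collection.
    document := SODA_DOCUMENT_T(
                  b_content => UTL_RAW.cast_to_raw('{"orderId": 1, "total": 42.5}'));
    status   := collection.insert_one(document);
  END;
  /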

Global Temporary Tables are an excellent way to hold transient data used in reporting or batch jobs within the Oracle Database. However, their shape, determined by their columns, is persisted across all sessions in the database. In Oracle Database 18c we’ve provided a more flexible approach with Private Temporary Tables. These allow users to define the shape of a table that is only visible for a given session, or even just a transaction. This approach provides more flexibility in the way developers write code and can ultimately lead to better code maintenance.
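
For example (the default name prefix ORA$PTT_ is required; the rest is made up):

  -- Visible only to this session, and the definition disappears at commit;
  -- use ON COMMIT PRESERVE DEFINITION to keep it for the whole session instead.
  CREATE PRIVATE TEMPORARY TABLE ora$ptt_batch_totals (
    region VARCHAR2(20),
    total  NUMBER )
  ON COMMIT DROP DEFINITION;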

Oracle Application Express, Oracle SQL Developer, Oracle SQLcl and ORDS have all been tested with 18c and in some instances get small bumps in functionality, such as support for Sharding.

We also plan to release a REST API for the Oracle Database. This will ship with ORDS 18.1 a little later this year.

And One Other Thing…

We’re also introducing a new mode for Connection Manager. If you’re not familiar with what Connection Manager (CMAN) does today, I’d recommend taking a look here. Basically, CMAN acts as a connection concentrator, enabling you to funnel thousands of sessions to a single Oracle Database. With the new mode introduced in Oracle Database 18c, it’s able to do a lot more. It can now automatically route connections to surviving database resources in the event of an outage. It can also redirect connections transparently if you relocate a PDB. It can load-balance connections across databases and PDBs whilst also transparently enabling connection performance enhancements such as statement caching and pre-fetching. And it can now significantly improve the security of incoming connections to the database.

All in all, an exciting improvement to a great networking resource for the Oracle Database.

Where to get more information

We’ve covered some of the bigger changes in Oracle Database 18c but there are many more that we don’t have space to cover here. If you want a more comprehensive list take a look at the new features guide here.

https://docs.oracle.com/en/database/oracle/oracle-database/18/newft/new-features.html

You can also find more information on the application development tools here

http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index.html

http://www.oracle.com/technetwork/developer-tools/rest-data-services/overview/index.html

http://www.oracle.com/technetwork/developer-tools/sqlcl/overview/sqlcl-index-2994757.html

http://www.oracle.com/technetwork/developer-tools/apex/overview/index.html

If you’d like to try out Oracle Database 18c you can do it here with LiveSQL

https://livesql.oracle.com/apex/livesql/file/index.html

Oracle Data Management Solutions: Success From The Data Center To The Cloud

By: Edgar Haren

Principal Product Marketing Director

Oracle is off to a strong start in 2018. Industry analyst Gartner has recognized Oracle as a leader among 22 technology providers evaluated for their offerings in data management and analytic solutions. Gartner evaluates solution providers based on a variety of criteria, including ability to execute and completeness of vision. Based on these parameters, Gartner has published its latest February 2018 Magic Quadrant for Data Management Solutions for Analytics, and it is our opinion that Oracle stood out from the competition by offering truly meaningful technologies. The announcement of Oracle Autonomous Database and Autonomous Data Warehouse Cloud enables Oracle to deliver new automation innovations in cloud-based data management services.

Figure-1: Gartner February 2018 Magic Quadrant for Data Management Solutions for Analytics

Oracle’s complete, unified and highly automated data management portfolio provides customers a seamless path to the cloud. As a customer becomes familiar with and finds value in the Oracle Database or Oracle Database Cloud Service, there is a natural expansion into common cloud services such as database backup or IaaS, but also data monetization solutions such as data warehousing, BI, analytics or even big data.

Figure-2: Oracle Solutions For the Data Life-Cycle

One of the big challenges for customers is that many public cloud providers lack support for a true hybrid-cloud deployment, which is the model that is predicted to be the norm for most organizations by the end of 2018. Oracle offers true enterprise grade cloud solutions that are engineered for a 100% compatible hybrid-cloud deployment. Whether your business requires data to remain on-premises, managed inside your data center or deployed in the cloud, with Oracle you have access to the same enterprise scale data management technology virtually anywhere.

Figure-3: Oracle’s Flexibility Of Product Type & Deployment Choice

Read the Gartner report:

http://www.gartner.com/reprints/?id=1-4O3NVDI&ct=180109&st=sb

For more information on Oracle Data Management Cloud Services please visit us at https://cloud.oracle.com/en_US/data-management

Gartner Magic Quadrant for Data Management Solutions for Analytics, Adam M. Ronthal, Roxane Edjlali, Rick Greenwald, 13 February 2018. This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Oracle. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose

Big Data In The Cloud: Why And How?

By: Prashant Jha

Director of Product Management

Lowered Total Cost of Ownership, Total Flexibility, Hyper-Scale, and the Oracle Advantage

Algorithms have a huge influence on our daily lives and our future. Everything from our social interactions, news, entertainment, finance, health, and other aspects of our lives are impacted by mathematical computations and nifty algorithms, and big data is a significant part of what makes this possible.

We’re now in the era of machine learning and artificial intelligence, one more time. But unlike our previous attempts in the 1960s and 1980s, things are different this time. Of course they are. Thanks to Moore’s Law, transistor density continues to increase while storage costs continue to drop.

But that, by itself, is not enough to ensure success.

Sign up for a free trial to build and populate a data lake in the cloud

This time around, we have distributed computing which has also come a long way with new computing paradigms such as Hadoop, Spark, TensorFlow, etc. being developed every day, both within academia and large corporations.

These advancements have enabled us to build systems that are way more powerful than anything that can be achieved with a single machine with the most powerful processor inside, which is what was attempted in the 60s.

It enables us to do more with our big data than we’ve ever been able to before.

But there are two other advances that are playing a huge role in this revolution:

Data is Fueling the Future

Explosive digitization of the physical world is creating an unprecedented amount of data. There’s a deluge of data that needs to get stored and processed. This data comes from:

  • Online social networks
  • User generated content
  • Mobile computing
  • Embedded sensors in everyday objects
  • Automation of routine tasks
  • And much more

The act of analyzing and processing this data is what makes the algorithms smart and efficient, which then gets applied to other areas of our lives. And thus, there is this virtuous cycle of self-fulfillment.

According to some estimates, two-thirds of all digital content is user generated and has been created in the last few years. According to Intel, autonomous cars will generate 4000GB of data per car per day. Soon, machines will be producing and consuming more data than humans can generate and consume.

This is the foundation of a smarter, better, and more efficient planet. If algorithms are the engine that is going to drive us to a better future, then data is the gas in the tank. But how are companies using their gas?

Challenges to Utilizing Big Data

Traditionally, companies have preferred to build out their own server farm, deploy, run, and manage systems themselves. But as this data volume grows and the goal of extracting value out of this massive data set involves complex and sophisticated machine learning and AI algorithms, it is becoming more challenging in terms of operations and total cost of ownership (TCO) to maintain this deployment.

In addition to the TCO, there are challenges with agility and flexibility. From a hardware perspective, there is the sunk cost of buying machines and provisioning for peak load which affects utilization. Longer procurement cycles mean the predictions for growth will have to be accurate and there is no room for error. This limits elasticity of the infrastructure and thus curbs models for experimentation and ad-hoc applications.

In short, here are some major challenges with the traditional model:

  • Low hardware utilization
  • Lack of multi-tenancy support
  • No self-serve model
  • Slow onboarding new applications/users
  • Low bandwidth network
  • High OPEX
  • Lack of big data skills and expertise

The Answer: Oracle Big Data Cloud

For organizations managing this growing volume of data and trying to gain insights/value, the right answer is turning to public cloud computing using open source software (OSS).

However, in certain cases, due to reasons such as organizational concerns, security issues, regulations in industries, or sovereignty rules such as the EU’s GDPR, not all big data deployments can move to the public cloud as is.

Hence, the power, value, and flexibility of Oracle’s Big Data Cloud, which is a modern platform for big data management with support for modern as well as traditional frameworks for:

  • Analytics
  • Business intelligence
  • Machine learning
  • Internet of Things (IoT)
  • Artificial intelligence (AI)

This is the only PaaS service of its kind that addresses the scenarios mentioned previously through two very special offerings:

  • Big Data Cloud: The most comprehensive, secure, performant, scalable, and feature-rich public cloud service for big data in the market today. And we have only gotten started building out this platform so expect more goodness down the road.
  • Cloud At Customer: For customers who cannot move to the public cloud, Oracle Cloud Machine can bring the public cloud to their own data center and provide the same benefits including having Oracle manage the cloud machine. This is a unique service that no other cloud provider offers.

Oracle Big Data Cloud brings the best of open source software to an easy-to-use and secure environment that is seamlessly integrated with other Oracle PaaS and IaaS services. Customers can get started in no time and do not require in-house expertise to put together a solution stack by assembling open source components and other third-party solutions.

7 Key Features of Oracle Big Data Cloud:

1. Advanced Storage: Build a data lake with all your data in one centralized store with advanced storage options. Smart caching allows for extreme performance. Provide your entire organization with access to all of the data sets in a secure and centralized environment. There is built-in data lineage and governance support. It is the easiest way to scale out storage independent of compute clusters.

2. Advanced Compute: Spin up or down compute clusters (Apache Hadoop, Apache Spark, or any analytic stack) within minutes. Auto-scale your clusters based on triggers or metrics. Use GPUs for deep learning.

3. Built-in ML and AI Tools: Data science tools such as Zeppelin come with the service to enable scientists to experiment and explore data sets. As mentioned earlier, there are compute shapes available with full GPU support for advanced algorithms and training in deep learning. A diverse catalog of machine learning libraries such as OpenCV, Scikit, Pandas, etc. makes it easy to build your next intelligent product.

4. Strong Security: Oracle Identity Cloud Service provides a way to allow granular access on a per-user basis and there are audit facilities built in as well. There is full encryption support for data-in-motion and data-at-rest. Sophisticated SDN allows customers to define their own network segments with advanced capability such as custom VPN, whitelisted IP, etc.

5. Integrated IaaS and PaaS Experience: Easy access to other Oracle Cloud Application Development services such as Oracle Event Hub Cloud Service, Oracle Analytics Cloud, Oracle MySQL Cloud, etc. Customers also have the option of using Oracle Cloud Infrastructure to back up Oracle Storage Cloud or create private VPNs to connect on-premise applications with services running in Oracle Public Cloud.

6. Fully Automated: The entire lifecycle of your infrastructure is automated. Our goal is to help you focus on the real differentiator, your data and your application. The platform will take care of all the undifferentiated work of provisioning, managing, patching, etc., so you can focus on your business.

7. World-Class Support: With an integrated approach, Oracle provides a one-stop shop for all things big data including support for Hadoop. Customers will not have to deal with multiple vendors to manage their stack.

For more information on Oracle’s Cloud Platform – Big Data offerings, please visit the Oracle Big Data Cloud webpages. And for the most advanced public cloud service, you can visit the Oracle Cloud Platform pages.

Or, sign up for an Oracle free trial to build and populate your own data lake. We have tutorials and guides to help you along.

Please leave a comment to let us know how we are doing.

Object Storage for Big Data: What Is It? And Why Is It Better?

Hadoop was once the dominant choice for data lakes. But in today’s fast-moving world of technology, there’s already a new approach in town. And that’s the data lake based on Apache Spark clusters and object storage.

In this article, we’ll be taking a deep dive into why that has happened and the history behind it, and why exactly Apache Spark and object storage together is truly the better option now.

The Backdrop to the Rise of Object Storage With Apache Spark

A key big data and data-lake technology, Hadoop arose in the early 2000s. It has become massively popular in the last five years or so. In fact, because Oracle has always been committed to open source, our first big data project five or six years ago was based on Hadoop.

Try building a fully functioning data lake – free

Simply put, you can think of Hadoop as having two main capabilities: a distributed file system, HDFS, to persist data. And a processing framework, MapReduce, that enables you to process all that data in parallel.

Increasingly, organizations started wanting to work with all of their data and not just some of it. And as a result of that, Hadoop became popular because of its ability to store and process new data sources, including system logs, click streams, and sensor- and machine-generated data.

Around 2006, 2007, this was a game changer. At that time, Hadoop made perfect sense for the primary design goal of enabling you to build an on-premises cluster with commodity hardware to store and process this new data cheaply.

It was the right choice for the time – but it isn’t the right choice today.

The Emergence of Spark

Object Storage and Big Data

The good thing about open source is that it’s always evolving. The bad thing about open source is that it’s always evolving too.

What I mean is that there’s a bit of a game of catch-up you have to play, as the newest, biggest, best new projects come rolling out. So let’s take a look at what’s happening now.

Over the last few years, a newer framework than MapReduce emerged: Apache Spark. Conceptually, it’s similar to MapReduce. But the key difference is that it’s optimized to work with data in memory rather than disk. And this, of course, means that algorithms run on Spark will be faster, often dramatically so. In fact, if you’re starting a new big data project today and don’t have a compelling requirement to interoperate with legacy Hadoop or MapReduce applications, then you should be using Spark.

You’ll still need to persist the data and since Spark has been bundled with many Hadoop distributions, most on-premises clusters have used HDFS. That works, but with the rise of the cloud, there’s a better approach to persisting your data: object storage.

What Is Object Storage?

Object storage differs from file storage and block storage in that it keeps data in an “object” rather than in blocks that make up a file. Metadata is associated with that object, which eliminates the need for the hierarchical structure used in file storage; there is no limit to the amount of metadata that can be used. Everything is placed into a flat address space, which is easily scalable.

Object storage offers multiple advantages.

Essentially, object storage performs very well for big content and high stream throughput. It allows data to be stored across multiple regions, it scales infinitely to petabytes and beyond, and it offers customizable metadata to aid with retrieving files.

Many companies, especially those running a private cloud environment, look at object stores as a long-term repository for massive, unstructured data that needs to be kept for compliance reasons.

But it’s not just data for compliance reasons. Companies use object storage to store photos on Facebook, songs on Spotify, and files in Dropbox.

The factor that likely makes most people’s eyes light up is the cost. The cost of bulk storage for object store is much less than the block storage you would need for HDFS. Depending upon where you shop around, you can find that object storage costs about 1/3 to 1/5 as much as block storage (remember, HDFS requires block storage). This means that storing the same amount of data in HDFS could be three to five times as expensive as putting it in object storage.

So, Spark is a faster framework than MapReduce, and object storage is cheaper than HDFS with its block storage requirement. But let’s stop looking at those two components in isolation and look at the new architecture as a whole.

The Benefits of Combining Object Storage and Spark

What we recommend especially, is building a data lake in the cloud based on object storage and Spark. This combination is faster, more flexible and lower cost than a Hadoop-based data lake. Let’s explain this more.

Combining object storage in the cloud with Spark is more elastic than your typical Hadoop/MapReduce configuration. If you’ve ever tried to add and subtract nodes to a Hadoop cluster, you’ll know what I mean. It can be done but it’s not easy, while that same task is trivial in the cloud.

But there’s another aspect of elasticity. With Hadoop if you want to add more storage, you do so by adding more nodes (with compute). If you need more storage, you’re going to get more compute whether you need it or not.

With the object storage architecture, it’s different. If you need more compute, you can spin up a new Spark cluster and leave your storage alone. If you’ve just acquired many terabytes of new data, then just expand your object storage. In the cloud, compute and storage aren’t just elastic. They’re independently elastic. And that’s good, because your needs for compute and storage are also independently elastic.

What Can You Gain With Object Storage and Spark?

1. Object Storage + Spark = Business Agility

All of this means that your performance can improve. You can spin up many different compute clusters according to your needs. A cluster with lots of RAM, heavy-duty general-purpose compute, or GPUs for machine learning – you can do all of this as needed and all at the same time.

By tailoring your cluster to your compute needs, you can get results more quickly. When you’re not using the cluster, you can turn it off so you’re not paying for it.

Use object storage to become the persistent storage repository for the data in your data lake. On the cloud, you’ll only pay for the amount of data you have stored, and you can add or remove data whenever you want.

The practical effect of this newfound flexibility in allocating and using resources is greater agility for the business. When a new requirement arises, you can spin up independent clusters to meet that need. If another department wants to make use of your data that’s also possible because all of those clusters are independent.

2. Object Storage + Spark = Stability and Reliability

There’s a joke doing the rounds that while some people are successful with Hadoop, nobody is happy with it. In part that’s because operating a stable and reliable Hadoop cluster over an extended period of time delivers more than its share of frustration.

If you have an on-premise solution, upgrading your cluster typically means taking the whole cluster down and upgrading everything before bringing it up again. But doing so means you’re without access to that cluster while that’s happening, which could be a very long time if you run into difficulties. And when you bring it back up again, you might find new issues.

Rolling upgrades (node by node) are possible, but they’re still a very painful and difficult process. It’s not widely recommended. And it’s certainly not for the faint of heart.

And it’s not just upgrades and patches. Just running and tuning a Hadoop cluster potentially involves adjusting as many as 500 different parameters.

One way to address this kind of problem is through automation. Indeed, Oracle has taken this path with the Oracle Big Data Appliance.

But the cloud gives you another option. Fully managed Spark and object storage services can do all that work for you. Backup, replication, patching, upgrades, tuning, all outsourced.

In the cloud, the responsibility for stability and reliability is shifted from your IT department to the cloud vendor.

3. Object Storage + Spark = Lowered Total Cost of Ownership

Shifting the work of managing your object storage/Spark configuration to the cloud has another advantage too. You’re essentially outsourcing the annoying part of the work that no one else wants to do anyway. It’s a way to keep your employees engaged and working on exciting projects while saving on costs and contributing to a lowered TCO.

The lower TCO for this new architecture is about more than reducing labor costs, important though that is. Remember that object storage is cheaper than the block storage required by HDFS. Independent elastic scaling really does mean paying for what you use.

Conclusion

We boil down the advantages of this new data lake architecture built on object storage and Spark to three:

  1. Increased business agility
  2. More stability and reliability
  3. Lowered total cost of ownership

All of this is incredibly significant. So if you’d like to try building a fully functioning data lake, give the free trial a spin. We provide a step-by-step guide to walk you through it. Or, contact us if you have any questions, and we’ll be happy to help.

CloudWorld NYC 2018: Data Management Cloud Keynote, Sessions and Activities

By: Edgar Haren

Principal Product Marketing Director

Oracle CloudWorld 2018, New York City, is nearly upon us. To attend the event, please register here.

Oracle’s Data Management Cloud Services will have a significant presence at CloudWorld with the following sessions.

  • Keynote “Revolutionize Your Data Management with World’s 1st Autonomous Database” with Monica Kumar, Vice President, Oracle – 11:20 a.m. – 11:50 a.m. – New York Ballroom West, Third Floor
  • “The Autonomous Data Warehouse Cloud – Simplifying the Path to Innovation” with George Lumpkin, Vice President, Oracle and Edgar Haren, Product Marketing, Oracle – 12:40 p.m. – 1:10 p.m. – New York Ballroom West, Third Floor
  • “Move Your Workloads: No Pain, Lots of Gain” with Sachin Sathaye, Senior Director Oracle, Zach Vinduska, Vice President of IT and Infrastructure, ClubCorp and Arvind Rajan, Entrepreneur and Architect, Astute

Join Monica Kumar, Vice President of Oracle Cloud Platform Product Marketing, in the keynote session as she discusses the rise of data as a valuable asset of your organization, and therefore of your enterprise database, and the impact it will have on the future success of your business. Whether you are looking to unlock the power of your application data or are interested in integrating all your data and making it accessible to all employees, the Autonomous Database can offer significant advantages to your organization.

The Oracle Autonomous Data Warehouse Cloud session explores how this new solution can help customers easily and rapidly deploy data warehouse or data-marts to gain faster insights from their business-critical data. This session examines common customer challenges with traditional data warehouse deployments, while detailing the value of the Autonomous Data Warehouse Cloud autonomous life-cycle and the features aimed at helping overcome these obstacles. In addition, attendees will learn how Oracle’s new data management solution fits into a complete ecosystem consisting of business analytics, data integration, visualization, IoT and more.

Considering moving your workloads to the cloud? The third session brings customers to the forefront with leaders from ClubCorp and Astute discussing their individual cloud migration experiences. Hear how and why these organizations decided to “Lift and Shift” their IT deployments to Oracle Cloud. Also find out why Astute decided to migrate their existing cloud deployment off of Amazon Web Services for Oracle. This session will also cover the key benefits gained by the customers with tangible metrics on TCO and performance. Learn from Oracle leadership and customers the value of moving your workloads to Oracle Cloud.

To see the Autonomous Data Warehouse demo and how you can lift and shift your applications, visit us in the demo area and mini-theatre on the show floor. Talk to Oracle experts in this area.

See you in New York City!

Oracle Releases Database Security Assessment Tool: A New Weapon in the War to Protect Your Data

By: Edgar Haren

Principal Product Marketing Director

Evaluate your database security before hackers do it for you!!

Today, we have guest blogger – Vipin Samar, Senior Vice President, Oracle

Data is a treasure. And in my last 20 years of working in security, I’ve found that hackers have understood this better than many of the organizations that own and process the data.

Attackers are relentless in their pursuit of data, but many organizations ignore database security, focusing only on network and endpoint security. When I ask the leaders responsible for securing their data why this is so, the most frequent answers I hear are:

  • Our databases are protected by multiple firewalls and therefore must be secure.
  • Our databases have had no obvious breaches so far, so whatever we have been doing must be working.
  • Our databases do not have anything sensitive, so there is no need to secure them.

And yet, when they see the results from our field-driven security assessment, the same organizations backtrack. They admit that their databases do, in fact, have sensitive data, and while there may be firewalls, there are very limited security measures in place to directly protect the databases. They are even unsure how secure their databases are, or if they have ever been hacked. Given the high volume of breaches, they realize that they must get ready to face attacks, but don’t know where to start.

Assessing database security is a good first step but it can be quite an arduous task. It involves finding holes from various angles including different points of entry, analyzing the data found, and then prioritizing next steps. With DBAs focused on database availability and performance, spending the time to run security assessments or to develop database security expertise is often not a priority.

Hackers, on the other hand, are motivated to attack and find the fastest way in, and then the fastest way out. They map out the target databases, looking for vulnerabilities in database configuration and over privileged users, run automated tools to quickly penetrate systems, and then exfiltrate sensitive data without leaving behind much of a trail.

If this were a war between organizations and hackers, it would be an asymmetric one. In such situations, assessing your own weaknesses and determining vulnerable points of attack becomes very critical.

Assess First

I am excited to announce availability of the Oracle Database Security Assessment Tool (DBSAT). DBSAT helps organizations assess the security configuration of their databases, identify sensitive data, and evaluate database users for risk exposure. Hackers take similar steps during their reconnaissance, but now organizations can do the same – and do it first.

DBSAT is a simple, lightweight, and free tool that helps Oracle customers quickly assess their databases. Designed to be used by all Oracle database customers in small or large organizations, DBSAT has no dependency on other tools or infrastructure and needs no special expertise. DBAs can download DBSAT and get actionable reports in as little as 10 minutes.

What can you expect DBSAT to find? Based upon decades of Oracle’s field experience in securing databases against common threats, DBSAT looks at various configuration parameters, identifies gaps, discovers missing security patches, and suggests remediation. It checks whether security measures such as encryption, auditing, and access control are deployed, and how they compare against best practices. It evaluates user accounts, roles, and associated security policies, determining who can access the database, whether they have highly sensitive privileges, and how those users should be secured.

Finally, DBSAT searches your database metadata for more than 50 types of sensitive data including personally identifiable information, job data, health data, financial data, and information technology data. You can also customize the search patterns to look for sensitive data specific to your organization or industry. DBSAT helps you not only discover how much sensitive data you have, but also which schemas and tables have them.

With easy-to-understand summary tables and detailed findings, organizations can quickly assess their risk exposure and plan mitigation steps. And all of this can be accomplished in a few minutes, without overloading valuable DBAs or requiring them to take special training.

Reviewing your DBSAT assessment report may be surprising – and in some cases, shocking – but the suggested remediation steps can improve your security dramatically.

Privacy Regulations and Compliance

DBSAT also helps provide recommendations to assist you with regulatory compliance. This includes the European Union General Data Protection Regulation (EU GDPR) that calls for impact assessments and other enhanced privacy protections. Additionally, DBSAT highlights findings that are applicable to EU GDPR and the Center for Internet Security (CIS) benchmark.

Nothing could be Easier

Oracle is a leader in preventive and detective controls for databases, and now with the introduction of DBSAT, security assessment is available to all Oracle Database customers. I urge you to download and try DBSAT – after all, it’s better that you assess your database’s security before the hackers do it for you!

Decision Trees in Machine Learning, Simplified

By: Peter Jeffcock

Big Data Product Marketing

I did a series of blog posts on different machine learning techniques recently, which sparked a lot of interest. You can see part 1, part 2, and part 3 if you want to learn about classification, clustering, regression, and so on.

In that series I was careful to differentiate between a general technique and a specific algorithm like decision trees. Classification, for example, is a general technique used to identify members of a known class like fraudulent transactions, bananas, or high value customers. Read this machine learning post if you need a refresher or are wondering quite what bananas have to do with machine learning.

But once you’ve decided that your problem is addressable with classification, you still need to pick the right algorithm. And there are many to choose from. So I’m going to take a few of them and give a basic explanation, just like for techniques in the previous series. Starting with … decision trees.

Register for a free trial to build a data lake and try machine learning techniques

What is a Decision Tree?

Have you played Twenty Questions? The goal is to guess what object the “answerer” is thinking of by asking a series of questions that can only be answered by “yes” or “no”. If you’re playing well, then each answer helps you ask a more specific question until you get the right answer. You can think of that set of questions as a decision tree that guides you to more specific questions and ultimately the answer.

Or imagine you are calling a rather large company and end up talking to their “intelligent computerized assistant,” pressing 1 then 6, then 7, then entering your account number, mother’s maiden name, the number of your house before pressing 3, 5 and 2 and reaching a harried human being. You may think that you were caught in voicemail hell, but the company you called was just using a decision tree to get you to the right person.

Confusion About Decision Trees

So that’s what decision trees look like in real life. Let’s look at them in a machine learning context. Let’s imagine that we are going to use a decision tree algorithm to classify our customers by likelihood of churning (leaving us for a competitor).

How Does a Decision Tree Work in Machine Learning, Exactly?

Since we’re working with a specific algorithm we need to have some understanding of what the data looks like. We’ll start with something simple like this table below and train a model on that.

Customer ID | Age | Income | Gender | Churned
1008764     | 34  | 47,200 | F      | Yes

We’ll work with just five columns of data, also referred to as attributes or variables:

  • The customer number identifies the customer and is unique to that customer.
  • Age is an integer.
  • Income is an integer rounded to the nearest hundred.
  • Gender is a single letter.
  • Churned identifies class members, so this customer has churned which means she has recently stopped being a customer.

I only show one row, but presumably we’re a large company and there are many thousands or even millions of rows to work with. Initially we’ll work with those middle three columns as predictors and see if we can use age, income, and gender to predict churn. Simplifying only slightly, we can now just point an algorithm at this data set and it will churn out an answer. Let’s look behind the scenes at how it does that.

First step is to find the predictor that will give the best initial split of the data. Finding this involves testing all the predictors and their different values. Imagine that after this calculation, it’s discovered that age is the best initial predictor, with individuals 37 and under being more likely to churn than individuals over 37. So now the algorithm can split the data set into two parts, ideally two roughly equal parts. We now have three nodes on the tree: one root node and two leaf nodes.

At this point we can repeat the process on each of these leaf nodes. Perhaps age is again the best predictor for the under 37 year olds (under 27 more likely to churn than over 27) while gender or income becomes a better predictor for those 37 and older. Now we have seven nodes on the tree, or four leaf nodes. And associated with each node is a percentage: 65% of the customers in this group churned, 13% in this group churned and so on.

In fact, here is an actual diagram of part of a decision tree. (we cut off some of the deeper nodes). Take a look at node nine on the right hand side of the third row. It contains customers with income > 129,500, though the actual income levels are not shown on this particular diagram. The model predicts that members of this group will not churn.

The confidence in this prediction is 85.2% and it’s supported by 223 customers or 17.5% of the original 1274 customers. On the other side of the diagram, node four shows the first prediction of churn, in this case for customers with incomes between 41,500 and 89,500. One more split to node 14 shows a prediction of churn with 100% confidence.

Decision Tree Example

We could repeat this process until we run out of data to split, but it’s useful to set some criteria for stopping. One or more of the following would do:

  • We reach a predetermined number of levels in the tree like seven
  • A node has fewer than 10 (pick a number) records
  • A node has fewer than 5% (pick a number) of the original data set’s records

Interpreting the Results from a Decision Tree

Now we’ve finished building the decision tree let’s see what we’ve got. Each decision gives us two nodes, so if we stopped at seven levels, then we’ve got 128 leaf nodes in the tree. (This assumes, of course, that no nodes terminate prematurely due to insufficient data, which in fact did happen in that diagram above).

If we look at one of those nodes in detail we might find a leaf node that groups women aged 27 to 29 with incomes between 28,000 and 34,000 as being 70% likely to churn. And we’d have 127 more nodes like that with different characteristics. Think of these leaf nodes as segmenting our customer base in 128 different ways. If the head of marketing has a limited budget to prevent churn, then this exercise tells them where to focus scarce funds with a targeted message to maximize impact.

Making a More Accurate Decision Tree

But before we spend any money, we’d like to know how good this model is. If you recall the process from that previous machine learning techniques article, you’ll know that I didn’t use all that training data up creating the initial model. I kept back 20% to 40% that I can use to test that initial model.

Now I can use that “held-aside” data to classify customers the model hasn’t seen before to assess how accurate the model actually is. (If you’d like a little more insight on this process, go search for “confusion matrix”). Assuming it’s accurate enough, now the marketing group can start spending money.

But what if it’s not accurate? What can we do? There are lots of possibilities and lots of things to look out for, including picking a different algorithm. But assuming that a decision tree is appropriate here, let’s look at the data the algorithm worked on.

First, age, income, and gender may not be particularly good predictors of churn. I just picked those three to make the example simple. Machine learning isn’t magic. If you’ve got data that’s only weakly correlated with what you are trying to predict, then you need better data. Perhaps you should run a graph analysis on your customer base to flag whether a given customer is closely connected to people who churn, and then go back and build a new model. Maybe there are other predictors you can add: number of calls to support, dropped service, changed usage patterns, time to renewal and so on.

In fact, it would not be unusual to have thousands of columns potentially available instead of the five I started with in this simplified example. And just as the algorithm could find the best predictor with three columns of data, it could do the same with thousands, though of course that will take more time and CPU resources.

Data Preparation for Decision Trees

Preparing data could be a whole topic in its own right, so I’ll just pull out a few items of interest that arise from this particular example. Notice that I didn’t suggest training the model using customer ID, even though it was the first attribute in the table of data. Those identifiers are unique to each individual. I could create a model that was very accurate at identifying churn, but it would be based on the specific identifiers in the training data.

Introduce new customers with different IDs and the model would be useless because it had never seen any of those IDs before. It would be like training a model to predict people’s incomes given a social security number. I’d get 100% accuracy for the social security numbers in the training data, and 0% accuracy on any other numbers. Such models don’t generalize.

The opposite of a predictor that is different for each row is one that is more or less constant across rows. If I’m working with women’s clothing, for example, you’d expect most customers to be female. (That’s most, not all, making the column mostly constant.) In that case, gender may be a less useful attribute to include in your model.

If you’re a data engineer helping to ready datasets for machine learning, you need to work closely with the data science team to understand what they need. In addition to the points above, you’d also want to make sure that your sample is random and not biased (don’t pick all the new customers from July that were captured via a highly targeted campaign), unless that’s the specific requirement. Also look out for attributes with too many missing values (suppose we didn’t ask customers for income, so we only have that information for 5% of them).
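A minimal sketch of those checks, assuming a hypothetical pandas DataFrame called customers with a customer_id column; the thresholds are arbitrary and would need agreeing with the data science team:

    import pandas as pd

    # `customers` and `customer_id` are illustrative names, not from the original example.
    df = customers.drop(columns=["customer_id"])   # unique IDs don't generalize

    # Near-constant columns (e.g. gender in a women's clothing business).
    nearly_constant = [
        col for col in df.columns
        if df[col].value_counts(normalize=True, dropna=False).iloc[0] > 0.98
    ]

    # Columns that are mostly missing (e.g. income captured for only 5% of customers).
    mostly_missing = [col for col in df.columns if df[col].isna().mean() > 0.95]

    df = df.drop(columns=list(set(nearly_constant + mostly_missing)))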

Where Did That Machine Learning Answer Come From?

In closing, there’s one last characteristic of this algorithm that might be useful. Somebody could look at the model and ask: “These customers are 80% likely to churn? Why is that?” You can ask the scoring engine to return the decision tree “rule” and provide a good answer to that question. That’s not always the case with other algorithms.
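With scikit-learn, for example, you can print a tree’s rules directly. A quick sketch, assuming the fitted model from the earlier sketches and purely illustrative feature names:

    from sklearn.tree import export_text

    # `model` and the feature names are assumed/illustrative, not from the original example.
    print(export_text(model, feature_names=["age", "income", "gender"]))
    # For a single customer, model.decision_path(...) shows exactly which
    # splits fired on the way to that customer's leaf.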

As use of ML and AI becomes more widespread, knowing why a particular prediction or answer was generated can be important, if not essential. For example, regulators might want to know why a particular loan application was rejected, and an organization would want to understand and correct a pattern of discrimination that crept in due to the operation of a particular algorithm.

Decision trees are relatively transparent and humans can interpret their “thought process”. The same cannot be said about neural networks, for example, which from a human perspective operate in a pretty opaque manner.

If you’d like to try out building a data lake and using machine learning on the data, Oracle offers a free trial. Register today to see what you can do.


GDPR and Big Data: 4 Steps to Compliance

GDPR is fast approaching – May 25, 2018. And the implications for big data are, well, big.

Essentially, GDPR is a regulation intended to strengthen and unify data protection for all individuals within the European Union, and it applies regardless of where the company is located. Whether you’re located in the US or Thailand, if you do business with EU residents, you are subject to GDPR.

Penalties for non-compliance can be steep, and companies worldwide are scrambling.

The Impact of GDPR on Big Data

Here’s some of the impact GDPR can have:

  • A need to review and modify organizational processes, applications, and systems
  • New and more stringent privacy and security requirements to address
  • Potential fines of up to 4% of annual global turnover, plus legal costs and recourse

Have you done all you can to address GDPR?

Download the free whitepaper, “Addressing GDPR Compliance Using Oracle Data Integration and Data Governance Solutions”

What to Do About GDPR

Fully addressing GDPR compliance requires a coordinated strategy involving different organizational entities including legal, human resources, marketing, IT, and more.

You’ll want to implement the right technology with effective security controls to:

  • Address regulatory requirements
  • Reduce risk
  • Improve competitive advantage by enabling increased flexibility and quicker time to market
  • Enable digital transformations

GDPR includes key requirements that directly impact the way organizations implement IT security.

In particular, to protect and secure personal data it is necessary to:

  • Know where the data resides (data inventory)
  • Understand risk exposure (risk awareness)
  • Review and where necessary, modify existing applications (application modification)
  • Integrate security into IT architecture (architecture integration)

Unfortunately, it’s not really possible to just buy a GDPR-compliant product and call it done. Because GDPR is more about security processes and managing risk, there isn’t a single product that will solve all of your problems. What you’ll have to do is ensure that your solutions work together so that you are truly GDPR compliant.

This can get complicated. So here is Oracle’s solutions framework for addressing GDPR. We’ll go through the four steps to GDPR compliance.

1. GDPR Discovery

The ability to monitor, enforce, and report on compliance to GDPR will be essential. You’ll need clear insight into how data is coming into your organization, what happens to it, and how it leaves the organization.

For that, you’ll need data governance that provides capabilities such as data lineage, asset inventory, and data discovery. The more data is being reused without proper data governance, the greater the risk of data-handling mishaps. Choose your tools wisely to help with your data governance.

To learn more, download our free whitepaper, “Addressing GDPR Compliance Using Oracle Data Integration and Data Governance Solutions.”

2. GDPR Enrichment

You may need application modifications to comply with the rights of the data subject (people like you and me). This can be a major challenge, as personal information can come in many different formats and types, be stored in various locations, and be held in different forms such as voice recordings and video.

In addition, because individuals can request all information about themselves, it must be possible to dynamically handle and automate a potentially large number of these requests—and delete the data, with GDPR’s “right to be forgotten.”

You might also need to consolidate customer data to get a single view of the data subjects across the organization. If an organization can’t identify all the personal information that belongs to an individual, that’s an indication it doesn’t have appropriate control over that personal information – which can be a red flag to regulators.

3. GDPR Foundation

You want good IT security with an emphasis on availability and performance of the services. That’s because you don’t know when your system will be tasked with pulling information, and how much at once. You’re also going to be responsible for the ability to restore the availability and access to personal data in a timely manner if there’s been a physical or technical incident.

Here’s what you’ll have to think about too: encryption will be more important than ever. Ensure you have detailed application-to-storage mapping so any application can be mapped to the physical storage it uses.

4. GDPR Enforcement

You’ll need technologies that can protect people, software, and systems. This includes products and services that provide predictive, preventive, detective and responsive security controls across database security, identity and access management, and much more.

It’s a common misperception that GDPR lists out specific technologies to be applied. But actually, it’s more that GDPR holds the controller and processor accountable, and requires that they consider the risks associated with the data they handle and adopt appropriate security controls.

For enforcement, the basic security measures that organizations should consider implementing fall into four groups: the predictive, preventive, detective, and responsive controls mentioned above.

Overall, GDPR addresses the key security tenets of confidentiality, integrity and availability of systems and data.

Download the free whitepaper, “Helping Address GDPR Compliance Using Oracle Security Solutions.”

The Opportunities of GDPR for Big Data

So that’s a lot to do. But look at the positive. Some companies view this as a once-in-a-generation chance to truly take a look at their data management and transform it according to general best practices. Data volumes have exploded in the last ten years, and many organizations are working with outdated architectures that haven’t been optimally built. This may be your chance to do something about it, and with GDPR looming, it just might be easier to get executive support.

It’s also a chance to take a second look at your tools. GDPR requires higher and more robust reporting and auditing structures so your organization can respond to any Data Protection Authorities and individuals who may have questions. So if there’s any tool you’ve had your eye on previously, now’s your chance …

Future Proof Your Big Data Compliance

GDPR is not likely to be the only data regulation your organization will have to address. There are multiple laws out there, and the laws are going to change. These laws and regulations are intended to protect citizens, the economy, government, and more. With data breaches and cyber security incidents on the rise, it’s likely this will continue to be an issue.

Consider future-proofing your data, and getting it right now to avoid more headaches (and potentially bigger headaches) in the future.

Consider the Cloud for GDPR

This might also be the perfect time to think about the cloud for your data. Your data is going to have to be:

  • Private
  • Easily portable and removable
  • Compliant with the data minimization principle

At the same time, you’re going to have to understand your internal controls, infrastructure and data architecture, in addition to those of any external partners or service providers. The liability of new regulation is going to fall on all parties. This just might be easier if you switch to a cloud or hybrid solution. And it could reduce costs and risks.

Conclusion

Don’t underestimate the length of time it will take to align with GDPR. Remember, it’s not that you should start on May 25 – that’s the date you’re supposed to be compliant. At Oracle, we’re committed to helping organizations with GDPR. Talk to us if you have any questions or would like to learn more about how we can help.


Why Move SAP Applications to Oracle Cloud?

By: Edgar Haren

Principal Product Marketing Director

Today we have guest blogger, Bertrand Matthelie, Senior Principal Product Marketing Director, providing us with insights into the value of migrating your SAP applications to Oracle Cloud.

SAP NetWeaver-based applications were certified on Oracle Cloud Infrastructure last year. NetWeaver-based applications represent most of the deployed SAP applications, and the majority of them are powered by Oracle databases. Indeed, while SAP is encouraging customers to move to S/4HANA, a Rimini Street survey shows that 65% of them have no plans to do so. They’re unable to build a business case, deem the ROI unclear, consider S/4HANA to be an unproven, early-stage product, and face significant migration & implementation costs. Most customers instead want to keep running their existing, proven SAP applications that they spent years customizing to their needs. At the same time, however, they face pressure to reduce costs and improve agility to better support the business. Digital disruption is hard at work in all industries, and organizations are looking for ways to shift resources from maintenance to innovation. Up to 80% of IT budgets can be spent on “keeping the lights on”; moving enterprise applications to the cloud represents an attractive way to reduce costs, free up resources and focus on higher-value activities than infrastructure management.

Moving SAP applications & Oracle Databases to Oracle Cloud enables customers to preserve existing investments while accelerating innovation, relying on the only cloud architected for enterprise workloads and optimized for Oracle Database. Key benefits include:

  • High and predictable performance for SAP/enterprise applications with dedicated bare metal instances as well as high-performance network and storage resources.
  • Best Oracle Database performance: As demonstrated in a recent Accenture report focused on running enterprise workloads in the cloud, Oracle databases run up to 7.8x faster on Oracle Cloud Infrastructure (OCI) than on a leading cloud provider.
  • Lower costs & transparency: The Accenture report also demonstrates that customers can benefit from up to 34% lower infrastructure costs for their SAP/enterprise workloads by relying on OCI rather than a leading cloud provider. Additionally, there are no hidden costs with Oracle Cloud, and Universal Credits allow you to benefit from simple, flexible and predictable pricing.
  • Security and governance: Compute and network isolation help ensure data security; Compartment capabilities coupled with identity and access management and audit allow visibility and control for your SAP deployments.
  • Complete & integrated cloud, enabling you to leverage Oracle’s most comprehensive PaaS & SaaS offering to, for example, connect your existing SAP applications to SaaS modules from any provider, or to extend your SAP applications with mobile interfaces or chatbots. According to the Rimini Street survey mentioned earlier, 30% of surveyed SAP customers look to augment their existing platforms with cloud applications for innovation.

Various resources to learn more are at your disposal; discover now how you can ensure business continuity, reduce costs and accelerate innovation!

Let us know if you have any questions or comments.


Oracle Ask TOM Office Hours Now Live!

By: Edgar Haren

Principal Product Marketing Director

Hundreds of thousands of companies around the world rely on Oracle Database to power their mission-critical applications. Millions of developers and administrators rely on the Oracle Database dev team to provide them with the tools and knowledge they need to succeed.

AskTOM Office Hours continues the pioneering tradition of Ask TOM. Launched in 2000 by Tom Kyte, the site now has a dedicated team who answer hundreds of questions each month. Together they’ve helped millions of developers understand and use Oracle Database.

AskTOM Office Hours takes this service to the next level, giving you live, direct access to a horde of experts within Oracle, all dedicated to helping you get the most out of your Oracle investment. To take advantage of this new program, visit the Office Hours home page and find an expert who can help. Sign up for the session and, at the appointed hour, join the webinar. There you can put your questions to the host or listen to the Q&A of others, picking up tips and learning about new features.

Have a question about upgrading your database? Sign up for Mike Dietrich and Roy Swonger’s session. Struggling to make sense of SQL analytic functions? AskTOM’s very own Connor McDonald and Chris Saxon will be online each month to help. JSON, Application Express, PL/SQL, In-Memory, Multitenant…if you’ve got a question, we’ve got an expert with an answer, and they’re just a URL away.

Our experts live all over the globe. So even if you inhabit Middleofnowhereland, you’re sure to find a time-slot that suits you.

You need to make the most of Oracle Database and its related technologies. It’s our job to make it easy for you.

So visit https://asktom.oracle.com/pls/apex/f?p=100:500 and sign up for your sessions.

AskTOM Office Hours: Dedicated to Customer Success
