Modern Customer Experience 2017: Event Registration Now Open

By: Nathan Joynt

Product Marketing Content Director

We’re happy to announce super early bird registration is now open for Modern Customer Experience 2017 from April 25-27 in sunny Las Vegas, Nevada.

Experience modern commerce technology and best practices as you continue to build and foster preferred shopping experiences across all customer touch points in 2017, including marketing, service, and sales.

Register by January 22, 2017 to receive:

  • The super early bird rate of $999 (a $900 savings off the standard conference price).
  • If you attended last year, you can also use the code DALM17 for a rate of $899.

Register now!

Call for Speakers for Modern Commerce Experience 2017

We are looking for those who have stories to tell, insights to share or even predictions to make.

Submit your ideas and help inspire our amazing commerce community.


CALL FOR ABSTRACTS: Oracle BIWA Summit ’17 – THE Big Data + Analytics + Spatial + Cloud + IoT …

THE Big Data + Analytics + Spatial + Cloud + IoT + Everything “Cool”

Oracle User Conference 2017

January 31 – February 2, 2017

Oracle Conference Center at Oracle Headquarters Campus, Redwood Shores, CA

What Oracle Big Data + Analytics + Spatial + Cloud + IoT + Everything “Cool” Successes Can You Share?

We want to hear your story. Submit your proposal today for Oracle BIWA Summit 2017, January 31 – February 2, 2017, and share your successes with Oracle technology. Speaker proposals are now being accepted through October 1, 2016. Submit now for possible early acceptance and publication in Oracle BIWA Summit 2017 promotion materials.

Presentations must be non-commercial. Sales promotions for products or services disguised as proposals will be eliminated. Speakers whose abstracts are accepted will be expected to submit a presentation outline and a PDF slide deck at a later date. Accompanying technical and use case papers are encouraged, but not required.

Click HERE to submit your abstract(s) for Oracle BIWA Summit 2017.

BIWA Summits are organized and managed by the Oracle Business Intelligence, Data Warehousing and Analytics (BIWA) SIG and the Oracle Spatial and Graph SIG (both Special Interest Groups in the Independent Oracle User Group, IOUG), together with the Oracle Northern California User Group. BIWA Summits attract presentations and talks from the top BI, DW, Advanced Analytics, Spatial, and Big Data experts. The 3-day BIWA Summit 2016 event featured keynotes by industry experts, educational sessions, hands-on labs, and networking events. Click HERE to see presentations and content from BIWA Summit 2016.

Call for Speakers DEADLINE is October 1, 2016, at midnight Pacific Time.

Complimentary registration to Oracle BIWA Summit 2017 is provided to the primary speaker of each accepted abstract.

Note: One complimentary registration per accepted session will be provided. Any additional co-presenters need to register for the event separately and pay the appropriate registration fees. It is up to the co-presenters’ discretion which presenter to designate for the complimentary registration.

Please submit speaker proposals in one of the following tracks:

  • Advanced Analytics
  • Business Intelligence
  • Big Data + Data Discovery
  • Data Warehousing and ETL
  • Cloud
  • Internet of Things
  • Spatial and Graph
  • …Anything else “Cool” using Oracle technologies in “novel and interesting” ways

Learn from Industry Experts from Oracle, Partners, and Customers

Come join hundreds of professionals with shared interests in the successful deployment of Oracle Business Intelligence, Data Warehousing, IoT and Analytical products:

Cloud & Big Data
  • Oracle Database Cloud Service
  • Big Data Appliance
  • Oracle Data Visualization Cloud Service
  • Hadoop and Spark
  • Big Data Connectors (Hadoop & R)
  • Oracle Data as a Service
  • Engineered Systems

DW & Data Integration
  • Oracle Partitioning
  • Oracle Data Integrator (ETL)

BI & Data Discovery & Visualization
  • Oracle Big Data Preparation Cloud Service
  • Big Data Discovery
  • Data Visualization
  • OBI Applications

Advanced Analytics & Spatial
  • Real-Time Decisions
  • Oracle Advanced Analytics
  • Oracle Spatial and Graph
  • Oracle Data Mining & Oracle Data Miner
  • Oracle R Enterprise
  • SQL Patterns
  • Oracle Text
  • Oracle R Advanced Analytics for Hadoop

Internet of Things
  • Big Data from sensors
  • Edge Analytics
  • Industrial Internet
  • IoT Cloud
  • Monetizing IoT

What To Expect

500+ Attendees | 90+ Speakers | Hands-on Labs | Technical Content | Networking

Exciting Topics Include:

  • Database, Data Warehouse, Cloud, and Big Data Architecture
  • Deep Dives on existing Oracle BI, DW and Analytics products and Hands on Labs
  • Updates on the latest Oracle products and technologies e.g. Oracle Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL
  • Novel and Interesting Use Cases of Everything! Spatial, Text, Data Mining, ETL, Security, Cloud
  • Working with Big Data: Hadoop, “Internet of Things”, SQL, R, Sentiment Analysis
  • Oracle Big Data Discovery, Oracle Business Intelligence (OBIEE), Oracle Spatial and Graph, Oracle Advanced Analytics: All Better Together

Example Talks from BIWA Summit 2016:

[Visit to see last year’s full agenda from BIWA’16 and to download copies of BIWA’16 presentations and HOLs.]

Advanced Analytics

  • Dogfooding – How Oracle Uses Oracle Advanced Analytics To Boost Sales Efficiency, Frank Heilland, Oracle Sales and Support
  • Fiserv Case Study: Using Oracle Advanced Analytics for Fraud Detection in Online Payments, Julia Minkowski, Fiserv
  • Enabling Clorox as Data Driven Enterprise, Yigal Gur, Clorox
  • Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL and the Cloud, Charlie Berger, Oracle
  • Stubhub and Oracle Advanced Analytics, Brian Motzer, Stubhub
  • Fault Detection using Advanced Analytics at CERN’s Large Hadron Collider: Too Hot or Too Cold, Mark Hornick, Oracle
  • Large Scale Machine Learning with Big Data SQL, Hadoop and Spark, Marcos Arancibia, Oracle
  • Oracle R Enterprise 1.5 – Hot new features!, Mark Hornick, Oracle

BI and Visualization

  • Electoral fraud location in Brazilian General Elections 2014, Alex Cordon, Henrique Gomes, CDS
  • See What’s There and What’s Coming with BICS & Data Visualization, Philippe Lions, Oracle
  • Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database option, Kai Yu, Dell
  • BI Movie Magic: Maps, Graphs, and BI Dashboards at AMC Theatres, Tim Vlamis, Vlamis
  • Defining a Roadmap for Migrating to Oracle BI Applications on ODI, Patrick Callahan, AST Corp.
  • Free form Data Visualization, Mashup BI and Advanced Analytics with BI 12c, Philippe Lions, Oracle

Big Data

  • How to choose between Hadoop, NoSQL or Oracle Database, Jean-Pierre Dijcks, Oracle
  • Enrich, Transform and Analyse Big Data using Big Data Discovery and Visual Analyzer, Mark Rittman, Rittman Mead
  • Oracle Big Data: Strategy and Roadmap, Neil Mendelson, Oracle
  • High Speed Video Processing for Big Data Applications, Melliyal Annamalai, Oracle
  • How to choose between Hadoop, NoSQL or Oracle Database, Shyam Nath, General Electric
  • What’s New With Oracle Business Intelligence 12c, Stewart Bryson, Red Pill
  • Leveraging Oracle Big Data Discovery to Master CERN’s Control Data, Antonio Romero Marin, CERN

Cloud Computing

  • Hybrid Cloud Using Oracle DBaaS: How the Italian Workers Comp Authority Uses Graph Technology, Giovanni Corcione, Oracle
  • Oracle DBaaS Migration Road Map, Daniel Morgan, Forsythe Meta7
  • Safe Passage to the CLOUD – Analytics, Rich Solari, Privthi Krishnappa, Deloitte
  • Oracle BI Tools on the Cloud–On Premise vs. Hosted vs. Oracle Cloud, Jeffrey Schauer, JS Business Intelligence

Data Warehousing and ETL

  • Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!), Panel Discussion, Andy Mendelsohn, Oracle, Steven Feuerstein, Oracle, George Lumpkin, Oracle
  • The Place of SQL in the Hybrid World, Kerry Osborne and Tanel Poder, Accenture Enkitec Group
  • Is Oracle SQL the best language for Statistics, Brendan Tierney, Oralytics
  • Taking Full Advantage of the PL/SQL Compiler, Iggy Fernandez, Oracle

Internet of Things

  • Industrial IoT and Machine Learning – Making Wind Energy Cost Competitive, Robert Liekar, M&S Consulting

Spatial Summit

  • Utilizing Oracle Spatial and Graph with Esri for Pipeline GIS and Linear Asset Management, Dave Ellerbeck, Global Information Systems
  • Oracle Spatial and Graph: New Features for 12.2, Siva Ravada, Oracle
  • High Performance Raster Database Manipulation and Data Processing with Oracle Spatial and Graph, Qingyun (Jeffrey) Xie, Oracle

Example Hands-on Labs from BIWA Summit 2016:

  • Scaling R to New Heights with Oracle Database, Mark Hornick, Oracle, Tim Vlamis, Vlamis Software
  • Learn Predictive Analytics in 2 hours!! Oracle Data Miner 4.1, Charlie Berger, Oracle, Brendan Tierney, Oralytics, Karl Rexer, Rexer Analytics
  • Predictive Analytics using SQL and PL/SQL, Brendan Tierney, Oralytics, Charlie Berger, Oracle
  • Oracle Data Visualization Cloud Service Hands-On Lab with Customer Use Cases, Pravin Patil, Kapstone

Lunch & Partner Lightning Rounds

  • Fast and Fun 5-Minute Presentations from Each Partner: Must See!

Submit your abstract(s) today, good luck and hope to see you there!

See last year’s Full Agenda from BIWA’16.



Three Successful Customers Using IoT and Big Data

By: Peter Jeffcock

Big Data Product Marketing

When I wrote about the convergence of IoT and big data I mentioned that we have successful customers. Here I want to pick three that highlight different aspects of the complete story. There are a lot of different components to a complete big data solution. These customers are using different pieces of the Oracle solution, integrating them with existing software and processes.

Gemü manufactures precision valves used to make things like pharmaceuticals. As you can imagine, it’s critical that valves operate correctly to avoid adding too much or too little of an active ingredient. So Gemü turned to the Oracle IoT Cloud Service to help monitor those valves in use in its customers’ production lines. This data helps Gemü and its partners ensure the quality of their products. And over time, this data will enable them to predict failures or even the onset of out-of-tolerance performance. Predictive maintenance is a potentially powerful new capability and enables Gemü to maintain the highest levels of quality and safety.

From small valves to the largest machine on the planet: the Large Hadron Collider at CERN. There are many superlatives about this system. Its cryogenic system is also the largest in the world, and has to keep 36,000 tons of superconducting magnets at 1.9 K (-271.3 °C) using 120 tons of liquid helium. Failures in that system can be costly. CERN has had problems with a weasel and a baguette, both of which are hard to predict, but other failures could potentially be prevented. That is why CERN is using Big Data Discovery to help understand what’s going on with its cryogenic system. It is also using predictive analytics, with the ultimate goal of predicting failures before they happen and avoiding the two months it can take to warm the systems up enough to make even a basic repair and then cool them down again.

And finally this one.

IoT and big data working together can help a plane to fly, a valve to make pharmaceuticals, and the world’s largest machine to stay cool. What can we do for you?


Focus On Big Data at Oracle OpenWorld!

Oracle OpenWorld is fast approaching and you won’t want to miss the big data highlights. Participate in our live demos, attend a theater session, or take part in one of our many hands-on labs, user forums, and conference sessions all dedicated to big data.

Whether you’re interested in machine learning, predictive maintenance, real-time analytics, the internet of things (IoT), data-driven marketing, or learning how Oracle supports open source technologies such as Kafka, Apache Spark, and Hadoop as part of our core strategy, we have the information for you.

For more details on how to focus on big data at OpenWorld, see the “Focus On” Big Data program guide. Here are a few things you won’t want to miss:

  • General Session: Oracle Cloud Platform for Big Data [GEN7471]

    Tuesday, Sept. 20, 11:00 a.m. | Moscone South—103

    Oracle Cloud Platform for big data enables complete, secure solutions that maximize value to your business; it lowers costs, increases agility, and embraces open source technologies. Learn about Oracle’s strategy for big data in the cloud.
  • Oracle Big Data Management in the Cloud [CON7473]

    Wednesday, Sept. 21, 11:00 a.m. | Moscone South—302

    Successful analytical environments require seamless integration of Hadoop, Spark, NoSQL, and relational databases. Data virtualization can eliminate data silos and make this information available to your entire business. Learn to tame the complexity of data management.
  • Oracle Big Data Lab in the Cloud [CON7474]

    Wednesday, Sep 21, 12:15 p.m. | Moscone South—302

    Business analysts and data scientists can experiment and explore diverse data sets and uncover what new questions can be answered in a data lab environment. Learn about the future of the data lab in the cloud and also how lab insights can unlock the value of big data for the business.
  • Oracle Big Data Integration in the Cloud [CON7472]

    Tuesday, Sep 20, 4:00 p.m. | Moscone South—302

    Oracle Data Integration’s cloud services and solutions can help manage your data movement and integration challenges across on-premises, cloud, and other data platforms. Get started quickly in the cloud with data integration for Hadoop, Spark, NoSQL, and Kafka. You’ll also see the latest data preparation self-service tools for nontechnical users.
  • Drive Business Value and Outcomes Using Big Data Platform [THT7828]

    Monday, Sep 19, 2:30 p.m. | Big Data Theater, Moscone South Exhibition Hall

    Driving business value with big data requires more than big data technology. Learn how to maximize the value of big data by bringing together big data management, big data analytics, and enterprise applications. The session explores several different use cases and shows what it takes to construct integrated solutions that address important business problems.
  • Oracle Streaming Big Data and Internet of Things Driving Innovation [CON7477]

    Wednesday, Sep 21, 3:00 p.m. | Moscone South—302

    In the Internet of Things (IoT), a wealth of data is generated that can be monitored and acted on in real time. Applying big data techniques to store and analyze this data can drive predictive, intelligent learning applications. Learn how the convergence of IoT and big data can reduce costs, generate competitive advantage, and open new business opportunities.
  • Oracle Big Data Showcase

    Moscone South

    Visit the Big Data Showcase throughout the show and participate in a live demo or attend one of our many dedicated 20-minute theater sessions with big data experts.

We are looking forward to Oracle OpenWorld 2016 and we can’t wait to see you there!

In the meantime, check out for more information.


Internet of Things and Big Data – Better Together

By: Peter Jeffcock

Big Data Product Marketing

What’s the difference between the Internet of Things and Big Data? That’s not really the best question to ask, because these two are much more alike than they are different. And they complement each other strongly, which is one reason we’ve written a white paper on their convergence.

Big data is all about enabling organizations to use more of the data around them: things customers write in social media; log files from applications and processes; sensor and device data. And there’s IoT! One way to think of it is as one of the sources for big data.

But IoT is more than that. It’s about collecting all that data, analyzing it in real time for events or patterns of interest, and making sure to integrate any new insight into the rest of your business. When you add the rest of big data to IoT, there’s much more data to work with, and powerful big data analytics to come up with additional insights.

Best to look at an example. Using IoT, you can track and monitor assets like trucks, engines, HVAC systems, and pumps, and correct problems as you detect them. With big data, you can analyze all the information you have about failures and start to uncover the root causes. Combine the two and you can do more than react to problems as they occur: you can predict them and fix them before they occur. Go from being reactive to being proactive.
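To make the reactive-versus-proactive distinction concrete, here is a minimal Python sketch. It assumes a single numeric sensor reading per time step and a fixed alert limit, both hypothetical. The reactive check fires only once the limit has been crossed; the predictive check fits a straight line to the history and estimates how many steps remain before the limit is reached. Real predictive maintenance would use much richer models; the linear trend is simply the smallest stand-in for the idea.

```python
from typing import Optional, Sequence


def reactive_alert(readings: Sequence[float], limit: float) -> bool:
    """IoT-style check: alert only once the latest reading has crossed the limit."""
    return readings[-1] >= limit


def predict_steps_to_limit(readings: Sequence[float], limit: float) -> Optional[float]:
    """Estimate how many more time steps until a rising trend reaches the limit.

    Fits an ordinary least-squares line through (step, reading) pairs.
    Returns None if there is too little history or the trend is not rising.
    """
    n = len(readings)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    slope = sxy / sxx
    if slope <= 0:
        return None  # not trending toward the limit
    intercept = mean_y - slope * mean_x
    crossing_step = (limit - intercept) / slope
    # Steps remaining after the latest reading (0 if the fitted line is already past the limit).
    return max(0.0, crossing_step - (n - 1))


# Hypothetical vibration readings from one pump, one per hour.
vibration = [1.0, 1.2, 1.4, 1.6, 1.8]
print(reactive_alert(vibration, limit=2.4))          # nothing to react to yet
print(predict_steps_to_limit(vibration, limit=2.4))  # roughly three hours of warning
```

With these made-up readings and a limit of 2.4, the reactive check stays quiet while the trend estimate reports about three more steps, which is the window in which a proactive repair could be scheduled.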

Check out this infographic. The last data point, down at the bottom right-hand side, may be the most important one: only 8% of businesses are fully capturing and analyzing IoT data in a timely fashion.

Nobody likes to arrive last to a party and find the food and drink all gone. This party’s just getting started. You should be asking every vendor you deal with how they can help you take advantage of IoT and big data – they really are better together, and there’s lots of opportunity. Next post will highlight 3 customers who are taking advantage of that opportunity.


DIY Hadoop: Proceed At Your Own Risk

Could your security and performance be in jeopardy?

Nearly half (3.2 billion, or 45%) of the seven billion people in the world used the Internet in 2015, according to a BBC news report. If you think all those people generate a huge amount of data (in the form of website visits, clicks, likes, tweets, photos, online transactions, and blog posts), wait for the data explosion that will happen when the Internet of Things (IoT) meets the Internet of People. Gartner, Inc. forecast that there will be twice as many–6.4 billion–Internet-connected gadgets (everything from light bulbs to baby diapers to connected cars) in use worldwide in 2016, up 30 percent from 2015, and will reach over 20 billion by 2020.

Companies of all sizes and in virtually every industry are struggling to manage the exploding amounts of data. To cope with the problem, many organizations are turning to solutions based on Apache Hadoop, the popular open-source software framework for storing and processing massive datasets. But purchasing, deploying, configuring, and fine-tuning a do-it-yourself (DIY) Hadoop cluster to work with your existing infrastructure can be much more challenging than many organizations expect, even if your company has the specialized skills needed to tackle the job.

But as both business and IT executives know all too well, managing big data involves far more than just dealing with storage and retrieval challenges—it requires addressing a variety of privacy and security issues as well. Beyond the brand damage that companies like Sony and Target have experienced in the last few years from data breaches, there’s also the likelihood that companies that fail to secure the life cycle of their big data environments will face regulatory consequences. Early last year, the Federal Trade Commission released a report on the Internet of Things that contains guidelines to promote consumer privacy and security. The Federal Trade Commission’s document, Careful Connections: Building Security in the Internet of Things, encourages companies to implement a risk-based approach and take advantage of best practices developed by security experts, such as using strong encryption and proper authentication.

While not calling for new legislation (due to the speed of innovation in the IoT space), the FTC report states that businesses and law enforcers have a shared interest in ensuring that consumers’ expectations about the security of IoT products are met. The report recommends several “time-tested” security best practices for companies processing IoT data, such as:

  • Implementing “security by design” by building security into your products and services at the outset of your planning process, rather than grafting it on as an afterthought.
  • Implementing a defense-in-depth approach that incorporates security measures at several levels.

Business and IT executives who try to follow the FTC’s big data security recommendations are likely to run into roadblocks, especially if you’re trying to integrate Hadoop with your existing IT infrastructure. The main problem with Hadoop is that it wasn’t originally built with security in mind; it was developed solely to address massive distributed data storage and fast processing, which leads to the following threats:

  • DIY Hadoop. A do-it-yourself Hadoop cluster presents inherent risks, especially since many times it’s developed without adequate security by a small group of people in a laboratory-type setting, closed off from a production environment. As a cluster grows from small project to advanced enterprise Hadoop, every period of growth—patching, tuning, verifying versions between Hadoop modules, OS libraries, utilities, user management, and so forth—becomes more difficult and time-consuming.
  • Unauthorized access. Built under the principle of “data democratization” (so that all data is accessible by all users of the cluster), Hadoop has had difficulty meeting compliance standards such as the Health Insurance Portability and Accountability Act (HIPAA) and the Payment Card Industry Data Security Standard (PCI DSS). That’s due to the lack of access controls on data, including password controls, file and database authorization, and auditing.
  • Data provenance. With open source Hadoop, it has been difficult to determine where a particular dataset originated and what data sources it was derived from, which means you can end up basing critical business decisions on analytics taken from suspect or compromised data.

2X Faster Performance than DIY Hadoop

In his keynote at last year’s Oracle OpenWorld 2015, Intel CEO Brian Krzanich described work Intel has been doing with Oracle to build high-performing datacenters using the pre-built Oracle Big Data Appliance, an integrated, optimized solution powered by the Intel Xeon processor family. Specifically, he referred to recent benchmark testing by Intel engineers showing that an Oracle Big Data Appliance solution with some basic tuning achieved nearly two times better performance than a DIY cluster built on comparable hardware.

Not only is it faster, but it was designed to meet the security needs of the enterprise. Oracle Big Data Appliance automates the steps required to deploy a secure cluster – including complex tasks like setting up authentication, data authorization, encryption, and auditing. This dramatically reduces the amount of time required to both set up and maintain a secure infrastructure.

Do-it-yourself (DIY) Apache Hadoop clusters are appealing to many business and IT executives because of the apparent cost savings from using commodity hardware and free software distributions. As I’ve shown, despite the initial savings, DIY Hadoop clusters are not always a good option for organizations looking to get up to speed on an enterprise big data solution, both from a security and performance standpoint.

Find out how your company can move to an enterprise Big Data architecture with Oracle’s Big Data Platform at


The Surprising Economics of Engineered Systems

By: Peter Jeffcock

Big Data Product Marketing

The title’s not mine. It comes from a video done for us by ESG, based on their white paper, which looks at the TCO of building your own Hadoop cluster vs buying one ready-built (Oracle Big Data Appliance). You should watch or read, depending on your preference, or even just check out the infographic. The conclusion could be summed up as “better, faster, cheaper, pick all three”. Which is not what you’d expect. But they found that it’s better (quicker to deploy, lower risk, easier to support), faster (from 2X to 3X faster than a comparable DIY cluster) and cheaper (45% cheaper if you go with list pricing).

So while you may not think that an engineered system like the Big Data Appliance is the right system for you, it should always be on your shortlist. Compare it with building your own – you’ll probably be pleasantly surprised.

There’s a lot more background in the paper in particular, but let me highlight a few things:

– We have seen some instances where other vendors offer huge discounts and actually beat the BDA price. If you see this, check two things. First, will that discount be available for all future purchases, or is it just a one-off? And second, remember to include the cost that you incur to set up, manage, maintain, and patch the system.

– Consider performance. We worked with Intel to tune Hadoop for this specific configuration. There are roughly 500 different Hadoop parameters that can affect performance one way or the other. That tuning project was a multi-week exercise with several different experts. The end result was performance nearly 2X, and sometimes up to 3X, faster than a comparable, untuned DIY cluster. Do you have the resources and expertise to replicate this effort? Would a doubling of performance be useful to you?

– Finally, consider support. A Hadoop cluster is a complex system. Sometimes problems arise that result from the interaction of multiple components. It can be really hard to figure those out, particularly when multiple vendors are involved for different pieces. When no single component is “at fault” it’s hard to find somebody to fix the overall system. You’d never buy a computer with 4 separate support contracts for operating system, CPU, disk and network card – you’d want one contract for the entire system. The same can be true for your Hadoop clusters as well.
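The point about discounts is ultimately arithmetic: a one-off discount lowers only the initial purchase, while setup, operations, and later expansions keep costing money. The Python sketch below compares two options over a multi-year horizon. Every figure in it is a made-up placeholder in arbitrary units, not Oracle, ESG, or any other vendor’s pricing; the field names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterOption:
    name: str
    purchase_price: float   # initial hardware + software price, after any discount
    setup_cost: float       # one-time effort to install, configure, and secure
    yearly_ops_cost: float  # yearly admin, patching, tuning, and support effort
    expansion_price: float  # price of each later expansion (a one-off discount may not apply)


def total_cost(option: ClusterOption, years: int, expansions: int) -> float:
    """Total cost of ownership over the given horizon."""
    return (option.purchase_price
            + option.setup_cost
            + years * option.yearly_ops_cost
            + expansions * option.expansion_price)


# Placeholder figures in arbitrary units, for illustration only.
option_a = ClusterOption("A: discounted build-your-own", 400, 150, 200, 500)
option_b = ClusterOption("B: pre-integrated system", 500, 20, 120, 450)

for option in (option_a, option_b):
    print(option.name, total_cost(option, years=3, expansions=1))
```

With these placeholder inputs, option A’s lower sticker price (400 vs. 500) is outweighed over three years with one expansion (1650 vs. 1330). The useful part is the structure of the comparison; substitute your own quotes and labor estimates.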


Predictions for Big Data Security in 2016

Leading into 2016, Oracle made ten big data predictions, including one about security. We are nearly four months into the year, and we’ve seen these predictions coming to light.

Increase in regulatory protections of personal information

Early February saw the creation of the Federal Privacy Council, “which will bring together the privacy officials from across the Government to help ensure the implementation of more strategic and comprehensive Federal privacy guidelines. Like cyber security, privacy must be effectively and continuously addressed as our nation embraces new technologies, promotes innovation, reaps the benefits of big data and defends against evolving threats.”

The European Union General Data Protection Regulation is a reform of the EU’s 1995 data protection rules (Directive 95/46/EC). Its Big Data fact sheet was put forth to help promote the new regulations: “A plethora of market surveys and studies show that the success of providers to develop new services and products using big data is linked to their capacity to build and maintain consumer trust.” On timing, the EU expects adoption in spring 2016, with enforcement beginning two years later in spring 2018.

Earlier this month, the Federal Communications Commission announced a proposal to restrict Internet providers’ ability to share the information they collect about what their customers do online with advertisers and other third parties.

Increased use of classification systems that categorize data into groups with predefined policies for access, redaction, and masking.

An Infosecurity Magazine article highlights the challenge of data growth and the requirement for classification: “As storage costs dropped, the attention previously shown towards deleting old or unnecessary data has faded. However, unstructured data now makes up 80% of non-tangible assets, and data growth is exploding. IT security teams are now tasked with protecting everything forever, but there is simply too much to protect effectively – especially when some of it is not worth protecting at all.”

The three benefits of classification highlighted include the ability to raise security awareness, prevent data loss, and address records management regulations. All of these are legitimate benefits of data classification that organizations should consider. Case in point, Oracle customer Union Investment increased agility and security by automatically processing investment fund data within their proprietary application, including complex asset classification with up to 500 data fields, which were previously distributed to IT staff using spreadsheets.
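The idea of classifying data into groups with predefined policies for access, redaction, and masking can be sketched in a few lines of Python. The labels, column names, roles, and policy table below are all hypothetical; this is not how Union Investment or any Oracle product implements classification, only an illustration of the pattern: classify each column once, then let the label (not ad hoc code) decide what each role sees.

```python
from enum import Enum
from typing import Dict


class Label(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    SENSITIVE = "sensitive"
    RESTRICTED = "restricted"


# Hypothetical classification of columns in a fund-data table.
CLASSIFICATION: Dict[str, Label] = {
    "fund_name": Label.PUBLIC,
    "nav": Label.INTERNAL,
    "investor_email": Label.SENSITIVE,
    "account_number": Label.RESTRICTED,
}

# Predefined policy per label: roles that may see the clear value,
# and what every other role gets instead ("mask" or "redact").
POLICIES = {
    Label.PUBLIC: ({"analyst", "auditor", "guest"}, "redact"),
    Label.INTERNAL: ({"analyst", "auditor"}, "redact"),
    Label.SENSITIVE: ({"auditor"}, "mask"),
    Label.RESTRICTED: ({"auditor"}, "redact"),
}


def mask(value: object, keep: int = 4) -> str:
    """Replace all but the last `keep` characters with '*' (short values are fully masked)."""
    text = str(value)
    if len(text) <= keep:
        return "*" * len(text)
    return "*" * (len(text) - keep) + text[-keep:]


def apply_policies(row: Dict[str, object], role: str) -> Dict[str, object]:
    """Return the row as `role` is allowed to see it."""
    visible: Dict[str, object] = {}
    for column, value in row.items():
        # Unclassified columns fall back to the most restrictive label.
        label = CLASSIFICATION.get(column, Label.RESTRICTED)
        allowed_roles, fallback = POLICIES[label]
        if role in allowed_roles:
            visible[column] = value
        elif fallback == "mask":
            visible[column] = mask(value)
        else:
            visible[column] = "[REDACTED]"
    return visible


row = {"fund_name": "Global Equity", "nav": 101.25,
       "investor_email": "a.b@example.com", "account_number": "DE0012345678"}
print(apply_policies(row, "analyst"))
```

Defaulting unclassified columns to the most restrictive label is a deliberate choice in this sketch: new data stays protected until someone classifies it.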

Continuous cyber-threats will prompt companies to both tighten security, as well as audit access and use of data.

This is sort of a no-brainer. We know more breaches are coming, such as here, here and here. And we know companies increase security spending after they experience a data breach or witness one close to home. Most organizations now know that completely eliminating the possibility of a data breach is impossible, and therefore, appropriate detective capabilities are more important than ever. We must act as if the bad guys are on our network and then detect their presence and respond accordingly.

See the rest of the Enterprise Big Data Predictions, 2016.


Accelerating SQL Queries that Span Hadoop and Oracle Database

By: Peter Jeffcock

Big Data Product Marketing

It’s hard to deliver “one fast, secure SQL query on all your data”. If you look around you’ll find lots of “SQL on Hadoop” implementations which are unaware of data that’s not on Hadoop. And then you’ll see other solutions that combine the results of two different SQL queries, written in two different dialects, and run mostly independently on two different platforms. That means that while they may work, the person writing the SQL is effectively responsible for optimizing that joint query and implementing the different parts in those two different dialects. Even if you get the different parts right, the end result is more I/O, more data movement and lower performance.

Big Data SQL is different in several ways. (Start with this blog to get the details). From the viewpoint of the user you get one single query, in a modern, fully functional dialect of SQL. The data can be located in multiple places (Hadoop, NoSQL databases and Oracle Database) and software, not a human, does all the planning and optimization to accelerate performance.

Under the covers, one of the key things it tries to do is minimize I/O and minimize data movement so that queries run faster. It does that by trying to push down as much processing as possible to where the data is located. Big Data SQL 3.0 completes that task: now all the processing that can be pushed down, is pushed down. I’ll give an example in the next post.
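The principle of pushing processing to where the data lives can be illustrated with a toy model in Python. This is not Big Data SQL’s implementation; it assumes a hypothetical list of access-log rows on a “remote” platform and counts how many individual values would cross the network with and without pushing the filter (rows) and projection (columns) down to the data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def ship_then_filter(rows: Sequence[Row], predicate: Predicate,
                     columns: Sequence[str]) -> Tuple[List[Row], int]:
    """No pushdown: every column of every row crosses the network first."""
    shipped = [dict(row) for row in rows]
    values_moved = sum(len(row) for row in shipped)
    result = [{c: row[c] for c in columns} for row in shipped if predicate(row)]
    return result, values_moved


def push_down(rows: Sequence[Row], predicate: Predicate,
              columns: Sequence[str]) -> Tuple[List[Row], int]:
    """Pushdown: filter rows and project columns where the data lives."""
    local_result = [{c: row[c] for c in columns} for row in rows if predicate(row)]
    values_moved = sum(len(row) for row in local_result)
    return local_result, values_moved


def accessed_last_month(row: Row) -> bool:
    return row["days_since_login"] <= 30


# Hypothetical access-log rows stored on the remote platform.
logs = [
    {"customer_id": 1, "days_since_login": 5, "region": "EU", "device": "mobile"},
    {"customer_id": 2, "days_since_login": 45, "region": "US", "device": "desktop"},
    {"customer_id": 3, "days_since_login": 12, "region": "US", "device": "mobile"},
    {"customer_id": 4, "days_since_login": 90, "region": "APAC", "device": "desktop"},
]

print(ship_then_filter(logs, accessed_last_month, ["customer_id"]))
print(push_down(logs, accessed_last_month, ["customer_id"]))
```

Both approaches return the same answer, but in this toy example pushdown moves 2 values instead of 16. At real data volumes, that ratio is what turns into less I/O and faster queries.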

What this means is cross-platform queries that are as easy to write, and as highly performant, as a query written just for one platform. Big Data SQL 3.0 further improves the “fast” part of “one fast, secure SQL query on all your data”. We’d encourage you to test it against anything else out there, whether it’s a true cross-platform solution or even something that just runs on one platform.


Delegation and (Data) Management

By: Peter Jeffcock

Big Data Product Marketing

Every business book you read talks about delegation. It’s a core requirement for successful managers: surround yourself with good people, delegate authority and responsibility to them, and get out of their way. It turns out that this is a guiding principle for Big Data SQL as well. I’ll show you how. And without resorting to code. (If you want code examples, start here).

Imagine a not uncommon situation where you have customer data about payments and billing in your data warehouse, while data derived from log files about customer access to your online platform is stored in Hadoop. Perhaps you’d like to see if customers who access their accounts online are any better at paying up when their bills come due. To do this, you might want to start by determining who is behind on payments, but has accessed their account online in the last month. This means you need to query both your data warehouse and Hadoop together.

Big Data SQL uses enhanced Oracle external tables for accessing data in other platforms like Hadoop. So your cross-platform query looks like a query on two tables in Oracle Database. This is important, because it means from the viewpoint of the user (or application) generating the SQL, there’s no practical difference between data in Oracle Database, and data in Hadoop.
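The benefit of the external-table approach is that the query is written once against two things that look like tables, regardless of where each one’s data physically lives. In Big Data SQL this is done with Oracle external tables and SQL; the Python sketch below only mimics the idea with a hypothetical `Table` protocol, a `LocalTable` standing in for warehouse billing data, and an `ExternalTable` that produces rows on demand from made-up log lines. The query function never needs to know which is which.

```python
from typing import Dict, Iterator, List, Protocol

Row = Dict[str, int]


class Table(Protocol):
    def rows(self) -> Iterator[Row]: ...


class LocalTable:
    """Rows already held by the database (e.g. billing data in the warehouse)."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def rows(self) -> Iterator[Row]:
        return iter(self._rows)


class ExternalTable:
    """Rows produced on demand from data on another platform.

    Here the 'remote' data is a list of hypothetical CSV log lines of the form
    'customer_id,days_since_login'; a real external table would read files in Hadoop.
    """

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines

    def rows(self) -> Iterator[Row]:
        for line in self._lines:
            customer_id, days = line.split(",")
            yield {"customer_id": int(customer_id), "days_since_login": int(days)}


def overdue_but_active(billing: Table, access: Table, max_days: int = 30) -> List[int]:
    """The cross-platform question, written once against two 'tables'."""
    recently_active = {r["customer_id"] for r in access.rows()
                       if r["days_since_login"] <= max_days}
    return sorted(r["customer_id"] for r in billing.rows()
                  if r["days_overdue"] > 0 and r["customer_id"] in recently_active)


billing = LocalTable([{"customer_id": 1, "days_overdue": 10},
                      {"customer_id": 2, "days_overdue": 0},
                      {"customer_id": 3, "days_overdue": 5}])
access = ExternalTable(["1,12", "2,3", "3,45"])
print(overdue_but_active(billing, access))  # [1]
```

Customer 2 logged in recently but is not overdue, and customer 3 is overdue but has not logged in for 45 days, so only customer 1 answers the question from the example above.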

But under the covers there are differences, because some of the data is on a remote platform. How you process that data to minimize both data movement and I/O is key to maximizing performance.

Big Data SQL delegates work to Smart Scan software that runs on Hadoop (derived from Exadata’s Smart Scan software). Smart Scan on Hadoop does its own local scan, returning only the rows and columns that are required to complete that query, thus reducing data movement, potentially quite dramatically. And using storage indexing, we can avoid some unnecessary I/O as well. For example, if we’ve indexed a data block and know that the minimum value of “days since accessed accounts online” is 34, then we know that none of the customers in that block has actually accessed their accounts in the last month (30 days). So this kind of optimization reduces I/O. Together, these two techniques increase performance.
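The storage-index reasoning (a block whose minimum “days since accessed accounts online” is 34 cannot contain anyone active in the last 30 days) can be sketched as a min/max summary per block. This Python version is a simplified illustration with hypothetical block contents, not Smart Scan’s actual data structures; it reports both the matching values and how many blocks had to be read at all.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class BlockStats:
    """Per-block summary kept by a (hypothetical) storage index."""
    min_value: int
    max_value: int  # unused below; would serve predicates such as "days > N"


def build_storage_index(blocks: Sequence[Sequence[int]]) -> List[BlockStats]:
    # Assumes every block is non-empty.
    return [BlockStats(min(block), max(block)) for block in blocks]


def scan_accessed_within(blocks: Sequence[Sequence[int]], index: Sequence[BlockStats],
                         max_days: int = 30) -> Tuple[List[int], int]:
    """Return matching values and how many blocks actually had to be read."""
    matches: List[int] = []
    blocks_read = 0
    for block, stats in zip(blocks, index):
        if stats.min_value > max_days:
            continue  # even the smallest value is too old: skip this block's I/O
        blocks_read += 1
        matches.extend(v for v in block if v <= max_days)
    return matches, blocks_read


# "Days since accessed accounts online", grouped into three storage blocks.
blocks = [[34, 50, 71], [3, 45, 29], [40, 34, 88]]
index = build_storage_index(blocks)
print(scan_accessed_within(blocks, index))  # ([3, 29], 1)
```

Two of the three blocks have a minimum of 34, so only one block is read to answer the 30-day question.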

Big Data SQL 3.0 goes one step further, because there’s another opportunity for delegation. ORC and Parquet, for example, are efficient columnar storage formats on Hadoop. So if your data is stored in those formats, Big Data SQL’s Smart Scan can delegate work to them, further increasing performance. This is the kind of optimization that the fastest SQL-on-Hadoop implementations perform, which is why we think that with Big Data SQL you can get performance that’s comparable to anything else out there.

But remember, with Big Data SQL you can also use the SQL skills you already have (no need to learn a new dialect), your applications can access data in Hadoop and NoSQL using the same SQL they already use (don’t have to rewrite applications), and the security policies in Oracle Database can be applied to data in Hadoop and NoSQL (don’t have to write code to implement a different security policy). Hence the tagline: One Fast, Secure SQL Query on All Your Data.