Four Tools to Integrate into Your Data Lake

A data lake is an absolutely vital piece of today’s big data business environment. A single company may have incoming data from a huge variety of sources, and having a means to handle all of that is essential. For example, your business might be compiling data from places as diverse as your social media feed, your app’s metrics, your internal HR tracking, your website analytics, and your marketing campaigns. A data lake can help you get your arms around all of that, funneling those sources into a single consolidated repository of raw data.

But what can you do with that data once it’s all been brought into a data lake? The truth is that putting everything into a large repository is only part of the equation. While it’s possible to pull data from there for further analysis, a data lake without any integrated tools remains functional but cumbersome, even clunky.

On the other hand, when a data lake integrates with the right tools, the entire user experience opens up. The result is streamlined access to data with fewer errors during export and ingestion. In fact, integrated tools do more than make things faster and easier. By expediting automation, they open the door to exciting new insights, allowing for new perspectives and new discoveries that can maximize the potential of your business.

To get there, you’ll need to put the right pieces in place. Here are four essential tools to integrate into your data lake experience.


Machine Learning

Even if your data sources are vetted, secured, and organized, the sheer volume of data makes it unruly. As a data lake tends to be a repository for raw data—which includes unstructured items such as MP3 files, video files, and emails, in addition to structured items such as form data—much of the incoming data across various sources can only be natively organized so far. While it can be easy to set up a known data source for, say, form data into a repository dedicated to the fields related to that format, other data (such as images) arrives with limited discoverability.

Machine learning can help accelerate the processing of this data. With machine learning, data is organized and made more accessible through various processes, including:

In processed datasets, machine learning can use historical data and results to identify patterns and insights ahead of time, flagging them for further examination and analysis.

With raw data, machine learning can analyze usage patterns and historical metadata assignments to begin implementing metadata automatically for faster discovery.

The latter point requires the use of a data catalog tool, which leads us to the next point.

Data Catalog

Simply put, a data catalog is a tool that integrates into any data repository for metadata management and assignment. Products like Oracle Cloud Infrastructure Data Catalog are a critical element of data processing. With a data catalog, raw data can be assigned technical, operational, and business metadata. These are defined as:

  • Technical metadata: Used in the storage and structure of the data in a database or system
  • Business metadata: Contributed by users as annotations or business context
  • Operational metadata: Created from the processing and accessing of data, which indicates data freshness and data usage, and connects everything together in a meaningful way

By implementing metadata, raw data can be made much more accessible. This accelerates organization, preparation, and discoverability for all users without any need to dig into the technical details of raw data within the data lake.
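As a rough illustration of these three categories, a catalog entry can be modeled as a single record with separate technical, business, and operational fields. This is only a sketch; the field names are illustrative and are not the Oracle Cloud Infrastructure Data Catalog schema.

```python
from dataclasses import dataclass, field

# Illustrative model of a data catalog entry with the three
# metadata categories described above. All field names are
# hypothetical, not a real catalog API.
@dataclass
class CatalogEntry:
    asset_path: str
    technical: dict = field(default_factory=dict)    # storage and structure details
    business: dict = field(default_factory=dict)     # user-contributed context
    operational: dict = field(default_factory=dict)  # processing and usage history

entry = CatalogEntry(
    asset_path="s3://lake/raw/clickstream/2021-06-01.json",
    technical={"format": "json", "size_mb": 412, "schema_inferred": True},
    business={"owner": "marketing", "description": "Daily web clickstream"},
    operational={"last_refreshed": "2021-06-01T02:00Z", "reads_last_30d": 57},
)

print(entry.business["owner"])  # marketing
```

With records like this in place, users can discover an asset by searching its business metadata without ever opening the raw file.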

Integrated Analytics

A data lake acts as a middleman between data sources and tools, storing the data until it is called for by data scientists and business users. When analytics and other tools exist separately from the data lake, each analysis requires additional preparation and formatting, exporting to CSV or another standardized format, and then importing into the analytics platform. Sometimes this also means additional configuration once inside the analytics platform. The cumulative effect of all these steps creates a drag on the overall analysis process; while having all the data within the data lake certainly helps, this lack of connectivity creates significant hurdles within a workflow.

Thus, the ideal way to allow all users within an organization to swiftly access data is to use analytics tools that seamlessly integrate with your data lake. Doing so removes unnecessary manual steps for data preparation and ingestion. This really comes into play when experimenting with variability in datasets; rather than having to pull a new dataset every time you experiment with different variables, integrated tools allow this to be done in real time (or near-real time). Not only does this make things easier, this flexibility opens the door to new levels of insight as it allows for previously unavailable experimentation.

Integrated Graph Analytics

In recent years, data analysts have started to take advantage of graph analytics: a newer form of data analysis that creates insights based on relationships between data points. For those new to the concept, graph analytics treats individual data points like dots in a bubble chart. Each data point is a dot, and graph analytics lets you examine the relationships between data by identifying the volume of related connections, proximity, strength of connection, and other factors.

This is a powerful tool that enables new types of analysis in datasets where the relationships between data points matter. Graph analytics typically runs against a graph database itself or through a separate graph analytics tool. As with traditional analytics, any extra exporting and ingesting of data can slow down the process or introduce inaccuracies, depending on the level of manual involvement. To get the most out of your data lake, integrating cutting-edge tools such as graph analytics gives data scientists the means to produce insights as they see fit.
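To make the dots-and-connections idea concrete, here is a toy Python sketch that computes two of the factors mentioned above, connection volume (node degree) and proximity (hop distance), over a made-up set of data points. Real graph analytics tools operate at far larger scale; the names and edges here are invented for demonstration.

```python
from collections import defaultdict

# Each data point is a node; each relationship is an edge.
# These example edges are made up for illustration.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
         ("carol", "dave"), ("dave", "erin")]

graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

# Volume of connections: the degree of each node.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
print(degree["carol"])  # 3 -- the most-connected data point

# Proximity: breadth-first search for the hop distance between two nodes.
def hops(start, goal):
    frontier, seen, distance = {start}, {start}, 0
    while frontier:
        if goal in frontier:
            return distance
        frontier = {n for f in frontier for n in graph[f]} - seen
        seen |= frontier
        distance += 1
    return None  # no path between the nodes

print(hops("alice", "erin"))  # 3
```

Even this tiny example shows the kind of question graph analytics answers that row-oriented analytics does not: not "what is this record's value?" but "how is this record connected to the others?"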

Why Oracle Big Data Service?

Oracle Big Data Service is a powerful Hadoop-based data lake solution that delivers the capabilities required in a big data world:

  • Integration: Oracle Big Data Service is built on Oracle Cloud Infrastructure and integrates seamlessly into related services and features such as Oracle Analytics Cloud and Oracle Cloud Infrastructure Data Catalog.
  • Comprehensive software stack: Oracle Big Data Service comes with key big data software: Oracle Machine Learning for Spark, Oracle Spatial Analysis, Oracle Graph Analysis, and much more.
  • Provisioning: Deploying a fully configured version of Cloudera Enterprise, Oracle Big Data Service easily configures and scales up as needed.
  • Secure and highly available: Oracle Big Data Service comes with built-in high availability and security measures that can be enabled in a single click.

To learn more about Oracle Big Data Service, click here—and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.


Leveraging Analytics for Service Excellence: Award-winning Innovation


Why You Should Use Science to Stay Close to Your Customers

Using analytics and instrumentation to track user behavior and respond to their needs isn’t just for savvy eCommerce operations. In fact, analytics can help you maximize user experience with all your software, whether it’s an external-facing website or an internal-facing application like a CRM or sales tool.

Dell Technologies’ Online Support and Dell Digital Analytics teams recently proved the value of that strategy firsthand by using analytics as a key tool in consolidating two customer support websites into one.

Not only has this approach paid off with improved satisfaction on our support portal, but last month we also won a prestigious STAR Award from the Technology Service Industry Association (TSIA) for Innovation in Leveraging Analytics for Service Excellence.

Central to our project’s success was the fact that we embedded an analytics framework into each step of our software development lifecycle—from design to development to launch to maintenance.

What we have found is that instrumenting every application and applying analytics is helping us listen to users like never before. It is letting us add science to the art of understanding and responding to customer behavior. And it is part of the growing role IT and analytics are playing in our services business to help us keep pace with the evolving needs of our customers.

Fixing a Customer Frustration

To see how analytics made a difference in designing our new support portal, consider one small improvement we made to our search feature as a result of tracking customer behavior.

As we were looking at how and what customers were searching on our support portal, we found that customers were frustrated. Faced with two distinct search boxes, one asking users to describe their issue (such as "PC running slowly") and a second seeking a product serial number, some users were getting the two mixed up.

Since a serial number entered in the box intended for text search or a text search entered in the serial number box returned nonsensical results, the data showed customers were abandoning their searches and leaving the site without resolution.

With the problem clearly quantified, our engineering team developed a unified search experience that could handle either request and decipher user intent. Based on the intent we are now able to send users to a knowledge article or their product page all from a single search experience—a simple solution on a high traffic experience that has moved the needle for our users.
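The unified-search idea can be sketched in a few lines: check whether the input looks like a serial number and route accordingly. This is a hedged illustration, not Dell's actual implementation; the seven-character tag format and the routing labels are hypothetical.

```python
import re

# Hypothetical service-tag format: seven alphanumeric characters.
SERIAL_RE = re.compile(r"^[A-Z0-9]{7}$", re.IGNORECASE)

def route_query(query: str) -> str:
    """Infer user intent from a single search box and route the request."""
    q = query.strip()
    if SERIAL_RE.match(q):
        # Looks like a serial number: send the user to their product page.
        return f"product-page:{q.upper()}"
    # Otherwise treat it as a free-text issue description.
    return f"knowledge-search:{q}"

print(route_query("ABC1234"))            # product-page:ABC1234
print(route_query("PC running slowly"))  # knowledge-search:PC running slowly
```

A production system would layer fuzzier intent detection on top of this, but even a simple pattern check removes the need for two separate boxes.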

Building on a History of Analytics

The journey to this level of scientific user feedback began with the major challenge of integrating our support websites. With the groundbreaking merger of Dell and EMC two years ago, we had two separate high traffic websites, each with a support portal for their legacy products.

Because Dell already had extensive eCommerce capabilities, a significant analytics capability was in place to track every aspect of online sales. That instrumentation also extended to the support portal and could in turn be leveraged for the consolidation effort.

As we charted a course to consolidate our support portals, we decided that analytics would be an important tool to drive our design. Ahead of any design work, we first dramatically increased instrumentation to collect a much deeper level of user measurement as a critical input into our design. Our hope was that by understanding which aspects of the site were being used, and which were not, we could evolve toward a simpler design for the combined site.

Interestingly, having data on each site helped us break down the typical tensions that consolidating anything often creates. Instead of stakeholders citing the way they'd always done things to shape the future design, decisions were analytics-driven. The numbers showed us that the two sites weren't really all that different or unique; at the end of the day, each was a website that customers visit to get support for a piece of hardware with embedded software. We had much more in common than anyone thought at the start.

The new site was launched in April 2018 and extended globally on August 26. The site's analytics-driven approach has improved customer satisfaction scores by 10%, decreased contact volume by 10%, and decreased page load times by 35%.

With analytics embedded in the software development lifecycle process, we continue to measure customer experience to remain responsive to emerging needs. Our IT analytics team members now have a permanent seat at the table and are part of every decision we make about the site and our customer experience.

Adding Science to the Art of Software Experience Design

We still do some of what is considered the art of software experience design: qualitative research that reaches out to small groups of customers for feedback via focus groups, puts prototypes in front of them, and does empathy work to understand their needs. But adding science to the process with a quantitative view of a larger sample of customers lets us hear the voice of the user on a more constant basis.

There is huge value in the quantitative feedback loop which comes from analytics. You are able to know what people are doing in an application and perhaps more importantly what they aren’t doing. We regularly see instances where we design a new app and there are pieces that are getting used and pieces that aren’t getting used even though we think they are important.

It couldn’t be more clear that analytics provide a more sustainable way to stay close to your users, which allows you to deliver better outcomes in every experience.


While both EMC and Dell have received many awards from the TSIA—the biggest service industry association out there—over the years, this is the first time TSIA has recognized the IT side of our services team. That milestone underscores the critical importance of IT in enabling the services industry to provide the experiences we want for our customers.

We invite you to check out the blog Digital Transformation in Field Services: Award-winning Best Practices to learn how Dell EMC Global Field Services won the 2018 TSIA STAR Award for applying end-to-end digitalization, automation, and data capture and analysis to improve the onsite customer support experience.

The post Leveraging Analytics for Service Excellence: Award-winning Innovation appeared first on InFocus Blog | Dell EMC Services.



Google Analytics policy for Citrix Secure Mail and Secure Web

Citrix Secure Mail and Secure Web use Google Analytics to collect anonymous statistics and usage information for analyzing how the apps' various features are used. Citrix uses this data to develop the right feature set for future releases with appropriate priority.

Citrix Secure Mail and Secure Web anonymize the data before sending it to Google Analytics.

In other words, we do not collect any user-specific details such as user ID or email address from the apps; only the domain part of the email address is collected, to categorize feature usage accordingly.
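A minimal sketch of that anonymization step, assuming a simple split on the @ sign; the function name is illustrative and is not the actual Secure Mail implementation.

```python
# Keep only the domain part of the email address so that no
# user-identifying information leaves the app. Function name
# is hypothetical, for illustration only.
def anonymize_account(email: str) -> str:
    _, _, domain = email.partition("@")
    return domain

print(anonymize_account("jane.doe@example.com"))  # example.com
```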

This behavior can be configured using the Secure Mail and Secure Web policies in the XenMobile console.

In the XenMobile console, go to Secure Mail and Secure Web App configuration -> Analytics -> Google Analytics level of detail.

Complete – This is the default configuration. It shares the domain information along with feature usage data, to categorize the data.

Anonymous – Shares only feature usage data.

For example, suppose an email account is configured on Secure Mail:

For Complete: only the domain part of the account's email address is shared via Google Analytics, along with feature usage data to categorize the data accordingly.

For Anonymous: only feature usage data is shared via Google Analytics.