Perhaps you’re looking for a better way to perform data experimentation and facilitate data discovery. Or perhaps you want to start using machine learning to uncover more innovation opportunities in your data.
The answer, of course, is a data lab. Data labs make data science and experimentation with new data far more feasible. Complex analytics like machine learning can strain the service levels of production systems. A data lab ensures your data scientists can experiment and run analytics as they need to, without straining those systems or drawing complaints from other teams.
For many, setting up and implementing a data lab is a new project. In fact, you might even be setting up the first data lab ever at your company.
So how can you ensure that your data lab has the best chance of success? In this article, we lay out seven data lab best practices. Keep in mind that these best practices are designed to get you thinking beyond the nitty-gritty details of architecture and implementation, and more about winning widespread support and adoption.
Data Lab Best Practice #1: Deliver a Quick Win
Better one quick win in two months than three wins after four months. Your data lab is likely a high-visibility, expensive project. People want proof that it’s working and they want it now.
So don’t be tempted to just play in a computer science sandbox. Instead, keep in mind a business goal that aligns with the goals of a key business stakeholder.
You’ll want to show the value of your data lab from both an IT and a business perspective to gain as much support as you can.
For IT, demonstrate that your lab minimizes the strain placed on production systems.
For the business, demonstrate easy ways the company can start saving money or maximizing revenue now.
If you don’t have any ideas, meet with the leader of the business unit and brainstorm. Here are a few questions to use as a jumping-off point:
- How do I design a service to maximize ad revenue?
- What combination of data identifies the customer segment most likely to accept mobile offers?
- What data do I need for this? How do I combine it? Where do I get it?
Concentrate on the quick wins, but keep the future improvements and more complicated projects in mind.
Data Lab Best Practice #2: Consider Starting with Existing Data
Remember there’s value in your existing data, even if you’ve been collecting or cleaning new data at the same time. If you already have clean, labeled data available, consider creating a use case around that so you can get started faster.
Sometimes this means rethinking your project scope. Say your business unit would ideally like a 360-degree view of the customer for more effective promotions. That’s a complicated project that requires a great deal of data.
By contrast, Britain’s National Health Service used existing data to speed up its quick wins. It examined payments, other transactions, and customer complaints for signs of fraud worth investigating. Stopping fraud or recovering fraudulent claims is often a good quick win.
Once you have a few of those quick wins under your belt, you can start tackling more complicated projects that require more resources or more kinds of data. But especially in the early stages, it’s important to remember that most businesses won’t care about how complex or innovative your machine learning algorithm is.
They want results. And the faster they can get those results, the better.
Data Lab Best Practice #3: Try to Have Many (But Not Too Many) Projects in the Pipeline
We’ve said you should have a few quick wins. We’ve also said you should start with existing data. And now we’re also saying that you should try to have many projects in the pipeline (but not too many – stay balanced).
Remember that not every data exploration project will produce viable results that lead to change at the company. And even when an idea does demonstrate change, it might not show enough cost savings or ease of implementation to gain traction, especially if that change is only incremental.
It won’t be enough to demonstrate the many ways your data lab has identified value. The executive team will want to know how many of those ideas were implemented.
That’s why you should have many data projects in the pipeline: it decreases your overall chance of failure. But try to keep some focus, too. At the opposite end of the spectrum is chasing too many ideas and ending up with nothing because you didn’t focus your resources.
Data Lab Best Practice #4: Keep Your Executive Support Engaged
We assume you had some sort of executive support to make it this far. But you will need to keep those executives engaged. That ties back to Best Practice #1: deliver a few quick wins.
But don’t stop there. See which other executives you can get on board. Can you deliver quick wins in another area, too? You don’t want to stretch yourself and your resources too thin. But the ideal vision is a company full of executives clamoring for more machine learning projects, with plenty of support for your data lab because it’s seen as a valued part of the company.
To get there, run sessions on what machine learning can do, and pitch ideas for how you can help other parts of the business expand.
Yes, this entails extra work, but if you’re determined to make your data lab a cornerstone of the business, it’s well worth it.
Data Lab Best Practice #5: Operationalize Your Data
You might be tempted to think that your job ends with finding insights. It doesn’t. You need to push your executives and other business leaders to put your findings into practice. Take a look at the business units or leaders you’re doing work for. Do they praise your findings but never implement them? If so, it’s time for a serious conversation, or time to find a new team to collaborate with.
Think about the actionable reports you can create, or the changes you can make to existing apps and processes. Your findings could even lead to a brand-new service, app, or product.
Remember, at the end of the day, it’s not about how many insights get uncovered. What your business cares about is how much money is being saved and how much revenue is being created. It’s best if you can point to actual revenue generated by your team’s work.
Data Lab Best Practice #6: Be Sure You Have a Platform That Scales
Keep in mind that the cloud is an ideal place for an initiative like a data lab. You can provision your lab there, store massive amounts of data, and spin flexible analytic workloads up and down as needed. The best part? You pay only for what you use, which minimizes your cost and risk.
In addition to a platform that scales, you’ll also need the resources and talent to execute. Without them, you could face a backlog of big data projects from day one. That brings us to Best Practice #7.
Data Lab Best Practice #7: Support Your Data Scientists
A good data scientist is worth his or her weight in gold. Make sure you support your data scientists and set them up for success. Assemble them in talented, diverse teams. Provide them with the tools they need. And make sure your management tolerates risk. It might take time for your data scientists to find the deep wins that everyone is looking for, so set expectations accordingly, while also making sure you can find quick, easy wins to keep everyone happy.
There you have it: our seven best practices for implementing a successful data lab. Data science may not be easy, but a data lab makes it easier, and we hope this article helps you succeed.
If you have further questions, feel free to contact us. Or, if you’re ready to experiment with your data in the cloud, we offer a free guided trial to help you build and implement a successful data lake.