5 Tips To Drive Innovation With Google, AWS, Or Azure Data Lake

A data lake case study: Start small, go for agile cloud data lakes

Thomas Spicer
Sep 8, 2020


Getting started with data lakes can be confusing for newcomers. They are met with misinformation, false comparisons of data lake vs. data warehouse, and overwhelming technical jargon. It does not have to be this way.

Done well, a modern data lake platform can deliver tremendous value to your team and company. Taking an agile, lightweight approach will reduce technical debt while accelerating an enterprise’s understanding of how a data lake can make data more accessible to analysts.

In this post, we outline an approach to get started quickly with a pilot or PoC that applies to Google, AWS, or Azure Data Lake.

Data lake trends and best practices

Getting your feet wet in a lake can be done in the context of a quick, low-risk, disposable data lake pilot or proof-of-concept (PoC). Here are a few tips as you think about how to get your data lake implementation rolling:

1. No data oceans, seas, or great lakes

Seek opportunities where you can deploy an “Ephemeral” or “Project” data lake. Not sure what these are? See this definition for different data lake types. Find an early-stage project that aligns with these data lake types. Embracing an ephemeral lake reduces risk and helps you overcome technical or organizational challenges.

In one example, a customer used a pilot to send Adobe clickstream data to a lake.

By design, you are avoiding overly complicated data lake infrastructure. Adopting a well-defined, manageable scope will help the team achieve velocity. For example, stick with a cloud data lake rather than taking on the complexities of an on-premises data lake. An on-premises data lake project may trigger complex IT procurement, prioritization, and staff assignments.

Google, AWS, and Azure provide the building blocks for cloud-based solutions that are perfect for a low-cost, disposable PoC. If you need more examples on how to set up a PoC or pilot, we reference a few use cases below.

2. Have a sponsor, find the passion

Make sure you have an “evangelist” or “advocate” internally, someone passionate about problem-solving, seeking solutions, and interested in seeing adoption within the company. If you are in technology, can you find a partner on the BI team or vice-versa?

If you do not have a sponsor, your lake may struggle to gain traction.

3. Drive for simple, achievable

Embrace simplicity and agility; put people, process, and technology choices through this lens. If possible, avoid overly complicated data lake vs. warehouse or data lake vs. data mart conversations. These conversations will not contribute to your PoC or pilot efforts.

For example, you can quickly set up a lake with AWS Lake Formation, populate it with test data, and then use a query engine like Amazon Athena for data exploration.

This approach is quick and cost-effective, especially within the context of a pilot or PoC. The lack of complexity should not be seen as a deficiency but as a byproduct of thoughtful design.
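To make the exploration step concrete, here is a minimal Python sketch of submitting a query to Amazon Athena and polling until it finishes. The database, table, and bucket names are hypothetical placeholders, and the client is passed in as a parameter so the logic can be tried without an AWS account. It assumes the lake and its tables already exist; it is one possible way to wire this up, not a prescribed method.

```python
import time


def build_athena_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(client, request, poll_seconds=1.0, max_polls=60):
    """Start a query, then poll until Athena reports a terminal state."""
    query_id = client.start_query_execution(**request)["QueryExecutionId"]
    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
    # Gave up waiting; the query may still be running in Athena.
    return query_id, "TIMED_OUT"


# Hypothetical names: swap in your own database, table, and results bucket.
request = build_athena_request(
    sql="SELECT channel, COUNT(*) AS orders FROM pilot_orders GROUP BY channel",
    database="pilot_lake",
    output_location="s3://example-pilot-bucket/athena-results/",
)
```

With boto3 installed and AWS credentials configured, `run_query(boto3.client("athena"), request)` would submit the query and return its ID and final state. Athena writes results to the S3 output location, which keeps the pilot free of servers to manage.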

4. Work with data you know

Keep the scope tight and well defined by limiting your lake to data you already understand, such as exports from ERP, CRM, point-of-sale, marketing, or advertising systems.

Subject matter expertise offers data literacy. At this stage, following a subject matter-driven approach will help you learn the workflow around data structure, ingestion, governance, quality, and testing. This is critical for velocity. If you work with unfamiliar or overly complex data, you will waste valuable resources on data wrangling efforts.
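As an illustration of the structure and quality checks mentioned above, the following Python sketch profiles a CSV export before it lands in the lake. The sample data and column names are invented for the example, and the checks (required columns, blank values) are just a starting point that someone who knows the data would extend.

```python
import csv
import io


def profile_export(csv_text, required_columns):
    """Report row count, missing columns, and blank values per required column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in required_columns if c not in header]
    present = [c for c in required_columns if c in header]
    blanks = {c: 0 for c in present}
    rows = 0
    for row in reader:
        rows += 1
        for c in present:
            # Short rows yield None for absent fields; treat them as blank.
            if not (row[c] or "").strip():
                blanks[c] += 1
    return {"rows": rows, "missing_columns": missing, "blank_values": blanks}


# Hypothetical CRM export; the column names are illustrative only.
sample = """customer_id,email,signup_date
1001,ana@example.com,2020-08-01
1002,,2020-08-03
1003,li@example.com,
"""

report = profile_export(sample, ["customer_id", "email", "signup_date", "region"])
print(report)
```

Here the report flags the absent `region` column and one blank value each in `email` and `signup_date`. With familiar data, a subject matter expert can tell at a glance whether those gaps are expected or a sign of a broken export.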

5. Make it real, experiment with tools to bring the data to life

Pair your solution with modern BI and analytics tools like Tableau, Power BI, Amazon QuickSight, or Looker. Allow non-technical users an opportunity to experiment and explore data access via a lake.

Engage a different user base that can assess performance bottlenecks and discover opportunities for improvement, possible linkages to existing EDW systems (or other data systems), and additional candidate data sources. Allow for the discovery of data lake tools that make sense for your team, and of where best to invest resources in data lake automation.

For example, one of our customers piloted a data lake to assess the viability of developing media performance insights. The customer found they achieved their data lake goals as they increased their focus on using business performance outcomes as reference points.

How did they do this? Using our lake formation process, Amazon Athena, data lake metadata catalog, and BI tools, the customer was able to quickly develop prototype reporting on top of the lake’s data:

Data lake pilot with AWS Athena and BI

This brought the pilot to life in a visible, tangible deliverable that was quickly consumable by the organization.

Get Started

The outcome of a quick-start pilot, or PoC, should be a demonstration of how a data lake can hold the data your team needs to fuel the tools they love. Being a successful data lake early adopter means taking a business value-driven approach rather than a technology one.

An open, on-demand data lake strategy means you can run queries directly against your raw data, or data in a landing zone, from a wide variety of tools like Tableau, Microsoft Power BI, Looker, Amazon QuickSight, and many others.
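One way raw files in a landing zone become directly queryable is through an external table definition, which describes files in place instead of loading them. The Python sketch below builds an Athena-style `CREATE EXTERNAL TABLE` statement for CSV files with a header row. The database, table, columns, and S3 path are hypothetical, and the sketch assumes simple comma-delimited files with no quoted fields.

```python
def external_table_ddl(database, table, columns, s3_location):
    """Build DDL that exposes CSV files under an S3 prefix as a table."""
    column_lines = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Hypothetical landing-zone layout and schema, for illustration only.
ddl = external_table_ddl(
    database="pilot_lake",
    table="pilot_orders",
    columns=[("order_id", "string"), ("channel", "string"), ("amount", "double")],
    s3_location="s3://example-pilot-bucket/landing/orders/",
)
print(ddl)
```

Because the table only points at the files, dropping it later leaves the raw data untouched, which suits a disposable pilot. BI tools that connect through the query engine then see the table like any other.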

The Openbridge data lake formation architecture delivers fast deployments with surefire cost savings. Our technology, strategy, and modern data lake architecture help businesses be agile with on-demand, serverless data platforms like Azure Data Lake Storage, Amazon Redshift Spectrum, or Amazon Athena.

With a focus on business value, your organization will be able to assess the impact a lake affords.

Want to discuss a data lake pilot or PoC for your organization? Need a platform and team of experts to kickstart your data and analytic efforts? We can help! Openbridge is a leader in code-free, fully automated data lakes.

Getting traction when adopting new technologies can be a roadblock to success, especially if it means your team is working in different and unfamiliar ways. This is especially true in a self-service-only world. If you want to discuss a proof-of-concept, pilot, project, or any other effort, the Openbridge platform and team of data experts are ready to help.

Reach out to us at hello@openbridge.com. Prefer to talk to someone? Set up a call with our team of data experts.

Visit us at www.openbridge.com to learn how we are helping other companies with their data efforts.
