Hardly anyone needs convincing that the more data-driven a company is, the better. We all have examples of great tech companies in mind: the likes of Netflix, Spotify or Airbnb, which are constantly challenging their markets. It’s hard to argue that data-drivenness is the only, or even the most important, catalyst for their success; it’s more likely an interplay of different factors, such as their business model, talent or vision. Still, these companies strategically focus on producing large amounts of data and using it to grow their business even further. In fact, if you look at the biggest companies in the world, you will see that the list is dominated by firms with a strong focus on utilizing their data assets.
In his latest book, Ray Dalio, founder of the world’s biggest hedge fund, presents a remarkably exhaustive analysis of the main forces behind the rise and fall of major world powers. One of his observations is that just a few hundred years ago, the main source of wealth was land, which was needed to grow food. With the industrial revolution, land was replaced by manufacturing resources. Nowadays, data is becoming the most valuable asset, and this trend is hard to ignore.
For modern organizations, being data-driven is not optional anymore.
For a term as popular as “data-driven”, there’s surprisingly little agreement on what it actually means, and there seem to be good reasons for this. First of all, it’s a latent variable: unlike temperature, for example, it cannot be measured directly, which leaves room for interpretation. Secondly, it’s not binary. As with intelligence, you can’t say that someone simply has it or doesn’t; everyone lies somewhere on a spectrum. Thirdly, the upper boundary of this scale is rapidly expanding. What was state of the art ten years ago is yesterday’s news today.
We took on this challenge ourselves. After going through dozens of books, papers and articles, we decided to craft our own definition that could be used in practice. Since our daily job is to help companies build and utilize their data capabilities, we wanted it to be not only exhaustive but also actionable and easy to understand. This is the interpretation we ended up with:
Being a data-driven company means that you use data to link results with actions and act on the resulting feedback.
In other words, data-drivenness is about a feedback loop between your decisions and their outcomes. It doesn’t mean replacing your current processes and business knowledge. It means building on top of them and making them better, thanks to data and analytics.
This loop can be described using three steps: decide, measure and conclude. Here’s how it could look in a perfect scenario.
Decide: major decisions are preceded by careful analysis where possible. (This isn’t always feasible; if data can’t be used at this point, your best option is to turn the decision into a hypothesis to validate.) When you make a decision, you understand your options and their potential impact. You don’t have to rely solely on your experience or intuition, and you have clear expectations of the outcomes.
Measure: the data needed to evaluate decisions is collected and available for analysis. This allows you to look at performance from different perspectives, at a granular level.
There is a scientifically robust way to measure the results of the decision (e.g. A/B tests). If you observe the metrics going up or down, you can determine with confidence whether this was caused by the decision itself or by external factors.
Conclude: if the results are not satisfactory, the process can be repeated using the conclusions drawn from the experiment.
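The paragraph on measuring results mentions A/B tests as a robust way to tell whether a metric moved because of the decision. As a minimal sketch, suppose the metric is a conversion rate and the comparison uses a two-sided two-proportion z-test. That test is one common choice, not necessarily the method used in the projects described here. The function name and the visitor and conversion numbers are hypothetical.

```python
import math


def two_proportion_z_test(conversions_a: int, visitors_a: int,
                          conversions_b: int, visitors_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for H0: rate_a == rate_b.

    Hypothetical helper for illustration. It uses the pooled-proportion
    normal approximation, which assumes reasonably large samples.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control (A) vs. the variant introduced by the decision (B)
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
# 5% is a conventional but arbitrary significance threshold
print(f"z = {z:.2f}, p = {p:.4f}, significant at 5%: {p < 0.05}")
```

The causal interpretation comes from randomly assigning users to the two groups. The test itself only quantifies how likely a difference of this size would be if the decision had no effect.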
For decisions in a less data-rich context, where A/B testing is not possible, we would still build a landscape of metrics to put the decision in context. This helps counter human biases and misconceptions.
Not all decisions in the organization have to follow this process. Basic economic principles still apply: for low-stakes decisions, you are likely to discover that the effort needed to follow it outweighs the expected gains. That’s why the ability to select the right initiatives to pursue is crucial for data-driven companies.
By becoming more data-driven, organizations improve their performance in a number of ways.
We have observed firsthand that the companies we work with see tangible benefits from implementing data-driven solutions. Some of these benefits, as experienced by ING, are listed below:
If you want to learn more about this project, read a customer story here.
Keeping actions in sync with their outcomes is not only beneficial but necessary. Cuil is an example of a company that once challenged Google’s search engine by building an index three times bigger, at a fraction of the cost. However, the company failed to establish any feedback loop for its core decisions. The impressive index wasn’t enough to protect Cuil from failure when it turned out that the features most important to users had not been properly addressed.
Prior to the launch, there was no external feedback to point out that the search quality wasn’t there, that the search engine wasn’t returning enough results and that users didn’t care about the size of the index if it didn’t actually lead to higher quality results. (...)
Joshua Levy, Director of Engineering, Cuil - Edmond Lau, 'The Effective Engineer'
Transforming a company is not a trivial goal. According to McKinsey & Company, 70% of business transformations fail. One of the reasons for this is the lack of a proper framework for change management. That’s why we decided to formulate a 3-step process to boost these efforts.
First, you quickly build an understanding of your current data-driven capabilities (such as tools or skills), analyze your strengths and opportunities, and set short-term goals. To make this a straightforward and repeatable exercise, we created a data-driven survey.
The survey was inspired by scientific research and experience from hundreds of data projects. It guides you through the five core dimensions of data-drivenness: Leadership, Culture, Analytics, Data and Technology. Each dimension is evaluated on a 5-level scale that reflects how strong your capabilities are. The results are also designed to translate directly into recommendations for next actions. The survey is a significant topic in its own right, so we explore it in detail in the blog post “Is my company data-driven? Here’s how you can find out”.
Once you have established which capabilities you want to build, we suggest extending your plan with business initiatives. Because they are value-oriented by definition, they help offset the financial cost of the transformation. They also make it possible to test new capabilities while building them, which helps you navigate the process.
Before formulating a roadmap, we recommend creating a prioritized backlog of analytical initiatives. Design Thinking is a great source of inspiration here. Its concepts of divergent thinking (generate as many ideas as possible) and convergent thinking (select only the best ones) allow us to approach this process systematically.
This philosophy fits very well into the format of workshops, where we first help participants look at business opportunities from different angles to boost creativity. Next, we narrow down the pool of ideas by applying a set of relevant criteria, such as anticipated impact, risk or time to market. At the very start of the transformation, we recommend selecting low-risk initiatives with the potential to showcase the first tangible gains within a relatively short period of time. This way, you can build trust and momentum for the transformation across the whole organization.
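The convergent step can be made explicit with a simple filter-and-rank pass over the backlog. This sketch is purely illustrative. The 1-5 impact and risk scores, the risk ceiling, the time-to-value limit, the ranking rule and the example initiatives are all hypothetical assumptions, not the workshop criteria the authors use.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """A candidate analytical initiative scored in a workshop (hypothetical fields)."""
    name: str
    impact: int           # anticipated business impact, 1 (low) to 5 (high)
    risk: int             # delivery risk, 1 (low) to 5 (high)
    months_to_value: int  # estimated time until the first tangible gains


def shortlist(ideas: list[Initiative], max_risk: int = 2,
              max_months: int = 3) -> list[Initiative]:
    """Keep low-risk quick wins and rank them by impact, highest first.

    Ties are broken by shorter time to value. Thresholds are illustrative defaults.
    """
    quick_wins = [i for i in ideas
                  if i.risk <= max_risk and i.months_to_value <= max_months]
    return sorted(quick_wins, key=lambda i: (-i.impact, i.months_to_value))


# Hypothetical backlog produced by the divergent phase
backlog = [
    Initiative("Churn dashboard", impact=3, risk=1, months_to_value=1),
    Initiative("Real-time fraud scoring", impact=5, risk=4, months_to_value=9),
    Initiative("Campaign A/B testing", impact=4, risk=2, months_to_value=2),
    Initiative("Demand forecasting", impact=4, risk=2, months_to_value=3),
]

for idea in shortlist(backlog):
    print(idea.name)
```

A hard filter on risk and time to value mirrors the advice to start with low-risk quick wins. High-impact but risky ideas, such as the fraud-scoring example, stay in the backlog for later roadmap phases rather than being discarded.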
The final step is to merge the learnings from the survey and the workshop and start working with domain experts to prepare the final roadmap. The high-level overview of this process is shown below.
In the end, a data-driven transformation plan is only as valuable as its execution. At this step, two aspects are worth considering: the implementation team and the management process.
The start of the implementation is the most challenging moment skill-wise. To launch the first initiatives, you need quick access to a wide variety of skills such as:
When planning our projects, we select people with the right mix of skills for end-to-end delivery from the broad group of our experts (Business Intelligence, Data Analytics, Data Science, Analytics Engineering, Data Engineering and ML Ops).
The second aspect is to develop a proper process for managing individual initiatives and data products, as well as the whole endeavor. By scaling up the agile philosophy, you can decrease the risk and ensure that the whole process is transparent.
In a world where being a data-driven company is as much about success as survival, we have tried to give more clarity to what this actually means. We have demonstrated the practical benefits of data-driven transformation and have described the three major steps to pursuing them, based on our research and experience from hundreds of data projects. Soon, we will explore this subject in more detail by providing you with a way to evaluate the data-drivenness of your company.
Would you like to discuss our data-driven fast track? On November 23rd, we are organizing a free online event, Data-Driven Fast Track: introduction to data-drivenness. Feel free to join us, and do not hesitate to fill in the survey; we will be happy to discuss your results and help you become a data-driven company.