When British mathematician Clive Humby declared that “data is the new oil,” it’s unlikely even he could have predicted the vast scope of the industries that would spring up in the ensuing decades, or the tremendous value data would unlock. The mantra has also set off a gold rush: companies have invested millions in data pipelines, data lakes, and data streams whose major output has been, by and large, performance dashboards for internal teams. Lost in this data land grab has often been the “why” of the data. Causality, the link between cause and effect, is the critical element for driving better business decisions and automation in the future.
Causality data is information that captures both the cause and the resulting effect of a situation: the before and the after. For example, a record of the barometric pressure and the current temperature, paired with whether it rained within the next few hours, is data with context. In data collection terms, this is referred to as “event-driven” data, and collecting it is embraced by the world’s most successful companies, business leaders, and technologists. Currently, 80% of Fortune 100 companies use an event-driven data architecture.
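To make that concrete, here is a minimal sketch of what event-driven records for the weather example might look like. The field names are illustrative, not a prescribed schema; the point is that each record keeps the “before” conditions attached to the “after” outcome.

```python
# Hypothetical event-driven records: each one pairs the conditions observed
# *before* an outcome with the outcome itself, preserving cause and effect.
weather_events = [
    {"timestamp": "2023-04-01T09:00", "pressure_hpa": 1002.3, "temp_c": 14.1, "rained_within_3h": True},
    {"timestamp": "2023-04-01T12:00", "pressure_hpa": 1014.8, "temp_c": 17.5, "rained_within_3h": False},
    {"timestamp": "2023-04-02T09:00", "pressure_hpa": 1008.0, "temp_c": 15.2, "rained_within_3h": True},
]
```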
“Causality is what lets us make predictions about the future, explain the past, and intervene to change outcomes.” - Samantha Kleinberg.
Because causality data tells us that an event results from a specific set of circumstances, it is vital for effective Machine Learning. Machine Learning is, simplistically put, algorithms finding patterns in data and then predicting what is likely to happen when similar patterns appear again. To find those patterns, a Machine Learning model needs the before-and-after data too.
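As a rough sketch of what that means in practice, the snippet below fits a simple model on the “before” conditions from the weather example (pressure and temperature) to predict the “after” outcome (whether it rained). It assumes scikit-learn is available, and the data values are illustrative only.

```python
# A minimal sketch: learn the pattern linking "before" conditions to outcomes.
from sklearn.linear_model import LogisticRegression

# Each row: [pressure_hpa, temp_c] observed before the outcome window.
X = [
    [1002.3, 14.1],
    [1014.8, 17.5],
    [1008.0, 15.2],
    [1019.6, 18.9],
    [1004.1, 13.7],
    [1016.2, 16.8],
]
# Corresponding outcomes: 1 = rained within a few hours, 0 = did not.
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Predict the likely outcome for a new set of "before" conditions.
print(model.predict_proba([[1005.5, 14.8]]))  # probabilities of [no rain, rain]
```

Without the outcome attached to each set of conditions, there is nothing for the model to learn from.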
However, many companies, in their rush to collect any and all data, have lost the cause and effect. They aggregated data to make it human-friendly; after all, who could look at spreadsheets filled with thousands of columns of variables and millions of rows of events every day? They even overwrote data to save on storage costs. In the process, they stripped out that data’s vital context and rendered it useless.
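Here is a small sketch, assuming pandas, of how that happens: rolling events up into a daily summary (a common “human-friendly” report) discards the pairing between each set of conditions and its outcome, which is exactly what a model needs.

```python
import pandas as pd

# Illustrative event-level data: each reading is paired with its outcome.
events = pd.DataFrame({
    "day": ["2023-04-01", "2023-04-01", "2023-04-02", "2023-04-02"],
    "pressure_hpa": [1002.3, 1014.8, 1008.0, 1019.6],
    "rained_within_3h": [1, 0, 1, 0],
})

# The daily rollup still looks informative, but you can no longer tell
# which pressure reading preceded the rain.
daily_summary = events.groupby("day").mean(numeric_only=True)
print(daily_summary)
```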
Machine Learning is proving to be a highly efficient way of finding the patterns that lead to outcomes: the needles in the haystacks of data. The world’s most successful companies are already using these patterns to drive better automated decisions. There are countless examples of businesses using Machine Learning to improve profitability, efficiency, and results. The reason is that, with causality data, Machine Learning can finally bring value and actionability to all the effort we have put into building these data “oil” pipelines over the past several years.
First, see if your organization already has an event-driven architecture. If it does, find out which scenarios that data is being collected for. Suppose you want to optimize a sign-up flow for your product; in that case, you might be disappointed if your company isn’t collecting the essential session-driven data about the users who enter that flow and whether they complete it.
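As a concrete illustration, the kind of session-driven records that make a sign-up flow optimizable might look something like the sketch below. The field names are hypothetical; what matters is that each session’s attributes stay paired with its outcome.

```python
# Hypothetical session-level events for a sign-up flow: attributes observed
# during the session, paired with whether the user completed sign-up.
signup_sessions = [
    {"session_id": "a1", "traffic_source": "ads",     "steps_completed": 4, "signed_up": True},
    {"session_id": "b2", "traffic_source": "organic", "steps_completed": 1, "signed_up": False},
    {"session_id": "c3", "traffic_source": "email",   "steps_completed": 3, "signed_up": True},
]
```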
There are plenty of companies out there that will help you collect event-driven data or design a better data stack; I’m happy to make some referrals if needed. The one thing to remember is that you can start Machine Learning in parallel with a larger data project, and doing so is critical to accelerating time-to-value on ML projects.
In the end, if data is the new oil, then causality data is that oil refined into gasoline, ready to power the engine of better business through Machine Learning.
In 15 minutes, see how SAVVI AI can benefit your business.