Moving From Correlation to Causation – the next big step in AI
Date : September 27, 2022
Currently, when talking about Artificial Intelligence (AI) in the context of enterprise class, commercial companies, AI is more of an art than a science. Why is this the case?
AI is a part of Data Science. It says it right in the name, Data Science. The science element of Data Science and AI is a moving and evolving target; that is not a bad thing, it is just the state of the art today and a fact that we should understand a bit better.
The art of analytics, including Data Science and AI, is due to a number of factors, including:
1. AI is a relatively new discipline - the math, technology, and tools to build, manage and operate an AI enabled infrastructure are nascent and rapidly evolving.
2. The skilled professionals who can effectively and efficiently conceive, build, and maintain AI applications are in limited numbers and those professionals are trained in a number of different schools of thought and approaches to the craft.
3. The nascent approaches that are showing promise in research and academic labs need to evolve to work in commercial settings.
The amount of innovation that is being developed is promising and exciting, and provides a conceptual foundation for commercial software, robust models, and applications, but there is more work to be done before we can move from art to science.
Today, the effectiveness and efficiency of Data Science and AI work depend substantially on the skill and ability of the data scientists or machine learning engineers involved. Specifically, they rely on the ability of those professionals to effectively conceive, design, build, and execute the feature engineering phase of their projects.
Much of the success of current feature engineering work comes down to mastery of statistics in general and creativity in using correlation in particular.
Let’s discuss how Correlation is used today in AI and Data Science projects.
What is Correlation and how do we use it today in Data Science/AI?
Let’s define Correlation.
Cor·re·la·tion /ˌkôrəˈlāSH(ə)n/ - a mutual relationship or connection between two or more things.
Data Scientists work with a broad set of features that are correlated with the variable or measure that they seek to understand, monitor, or predict. There may be thousands of possible variables that are correlates of the target measure. In some cases, there may be millions, and possibly billions.
The art of the analytics process, and to be clear, there are very few people who are highly skilled at this part of the AI and Data Science procedure, is to determine which of those thousands, or millions, or maybe even billions, of variables will accurately, robustly, and reliably, predict the actions and behaviors of the target variable or variables in close proximity to the observed behavior in the real world.
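As a rough illustration of the correlation-driven search described above, the following sketch ranks a handful of candidate features by the strength of their Pearson correlation with a target. The feature names and values are hypothetical toy data, not taken from any real project, and real feature engineering involves far more than a single ranking pass.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data: three candidate features and one target measure.
target = [10.0, 12.0, 15.0, 19.0, 24.0]
features = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "store_temp": [21.0, 19.0, 22.0, 20.0, 21.0],
    "discount": [5.0, 4.0, 4.0, 2.0, 1.0],
}

ranking = rank_features(features, target)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

A ranking like this only says which features move together with the target. It says nothing about which of them, if any, actually drive it, and it becomes much harder to interpret as the number of candidate features grows.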
In 37 years, I have worked with a small number of people who are world class in this area (fewer than 20). If AI and Data Science are to grow and be an integral part of every leading company, we cannot rely on a process that only a handful of experts can execute.
That is why we need to move from Correlation to Causation
Let’s define Causation.
Cau·sa·tion /kôˈzāSH(ə)n/ - the relationship between cause and effect; causality.
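To make the difference between the two definitions concrete, here is a minimal simulation sketch. It assumes a made-up data-generating process in which a hidden factor Z drives both X and Y, while X has no effect on Y at all. The variable names, coefficients, and the `force_x` parameter are illustrative assumptions, not part of any causal tool mentioned in this article.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate(n, seed, force_x=None):
    """Hidden Z drives both X and Y; X never influences Y.

    Passing force_x mimics an intervention that sets X by hand.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # hidden common cause
        noise_x = rng.gauss(0.0, 0.3)
        noise_y = rng.gauss(0.0, 0.3)
        x = z + noise_x if force_x is None else force_x
        y = 2.0 * z + noise_y          # Y depends on Z only
        xs.append(x)
        ys.append(y)
    return xs, ys

# Observational data: X and Y look strongly related.
xs, ys = simulate(n=2000, seed=7)
r_observed = pearson(xs, ys)

# Interventions: force X low, then high, and compare the average Y.
_, ys_low = simulate(n=2000, seed=7, force_x=-2.0)
_, ys_high = simulate(n=2000, seed=7, force_x=2.0)
effect_of_x = sum(ys_high) / len(ys_high) - sum(ys_low) / len(ys_low)

print(f"observed correlation between X and Y: {r_observed:.3f}")
print(f"change in average Y when X is forced from -2 to +2: {effect_of_x:.3f}")
```

In this toy setup, X would rank as an excellent correlate of Y, yet changing X does nothing to Y. A correlation-only workflow cannot tell these two situations apart.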
Simply moving from correlation to causality does not change the AI and Data Science process into a repeatable, rigorous scientific process. It will take more than a change in focus or methodology to get us there.
We are at an inflection point where the causal algebra is being proven in academic research labs today. Those academic efforts are being quickly followed by early-stage commercial companies that are building software and tools to leverage the research innovations so that companies can implement causal feature engineering.
Developing appropriate, purpose-built math, tools, and technologies is what will enable the changes needed to replace the current fallible feature engineering process with a science-based process. Doing so will allow a significantly wider population of analytics professionals to undertake the task of developing causal-based analytic applications on a reliable, repeatable, and scalable basis.
Causal algebra is being built into tools appropriate for Data Scientists and Machine Learning Engineers. These tools and applications will improve upon and replace the current hit-and-miss search for correlates with a measured, controlled process that looks at each possible feature and combination of features to ensure that our causal features, models, and applications are the best that they can possibly be.
This new and improved process will look at every feature and every combination of features, test them against the target, and provide intelligence and transparency into how well they fit the objectives of each project. The process will result in a set of features that we know to be the best available, not just the best we happened to find.
Why is Causality the next step in our Data Science/AI journey?
Today's process is highly variable and depends on a small number of highly skilled professionals. Moving from that process to a scalable, repeatable one that is available to all analytics professionals, and that produces scientifically verifiable causal factors for the phenomena that we want to understand, predict, and influence, is a game-changing development.
Moving from an environment where we think we have done the best job possible in finding the optimum features to one where we know that we have the optimum features is a defining development that will have an impact on every industry and company that is leveraging Data Science and AI.
We will still need trained and skilled analytics professionals; there is no question. But with the coming advances in Causal AI, we will be able to bring additional people into the analytics process and still maintain a high level of confidence that we are building robust, reliable, flexible models and applications. We can grow the AI and Data Science fields at a much faster rate.
Causal AI not only provides the data science and AI communities with a verifiable scientific process, but it also brings us one step closer to Explainable AI (xAI).
xAI is the ability to explain what our AI models are doing, how they learn, and why they make the decisions that they make. This opens our ability to use our most powerful analytical techniques on all analytical problems across all industries.
Until we have xAI, we cannot use our most advanced techniques in regulated industries like pharmaceuticals, insurance, retail banking and more. Causal AI will accelerate the development of xAI.