Why industrial AI will be built on causal reasoning, not scale
No technology has reached enterprise scale as quickly as generative AI. Stanford's 2026 AI Index puts enterprise adoption past 88%, roughly double its 2023 level, and investment has grown faster than any category of software spend since cloud. What that headline number hides is that the adoption has not been evenly distributed across the work an enterprise does. It has concentrated almost entirely in one class of problem, which is the class the tools were designed to solve.
That class is content work, and what has been industrialised is language itself: drafting, summarising, searching, coding, querying, writing meeting notes and answering questions from a document library. The value captured in that work is real, measurable, and being renewed by the companies that bought it, because the task actually fits the tool and the tool does it well. Content work, where the transformer actually belongs, is a solved commercial problem. Decision work, where the P&L actually sits, is not.
Where the wins have landed
Klarna has deflected roughly two thirds of its customer service tickets with a language model. Shopify has compressed a meaningful fraction of its engineering hours into copilot-assisted code. Siemens Industrial Copilot at Schaeffler has reported close to a quarter less time spent on reactive maintenance documentation. All of these are wins that the companies running them have renewed and extended rather than quietly rolled back.
The reason the ROI has landed in those specific places is that the tools were built to do precisely that work. As an article titled "Why the AI Industry Ignored Causation — And Why That Bet Is Coming Due" argues, every one of the large language models powering the current wave, from the frontier models at OpenAI, Anthropic and Google through to the open-weights ecosystem underneath them, is built on the transformer architecture introduced in a 2017 Google paper called "Attention Is All You Need". That paper, and the scaling work that followed it, is where every capability gain of the last eight years has come from. A transformer, in the formal sense, is a probability estimator trained over the distribution of tokens it saw during training. When it answers a question, it conditions on whatever context has been provided and samples the next token from that distribution, repeating the operation until the sequence terminates. Every trajectory it produces is, underneath the apparent fluency, a very sophisticated answer to the question of what has usually come next in situations that looked like this one. That is the right computation for retrieval, for summarisation, for drafting, and for any task whose natural output is a document.
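That sampling loop can be sketched in a few lines of Python. Everything here is invented for illustration: the vocabulary, the probabilities and the helper names are a toy stand-in, since a real transformer conditions on the full context with a learned network rather than a lookup table. But the shape of the computation is the same: condition, sample, repeat.

```python
import random

# Toy next-token model: for each context token, a distribution over what
# has usually come next in the (invented) training data.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"pump": 0.5, "valve": 0.3, "plant": 0.2},
    "a": {"pump": 0.6, "valve": 0.4},
    "pump": {"ran": 0.7, "failed": 0.3},
    "valve": {"ran": 0.5, "failed": 0.5},
    "plant": {"ran": 0.8, "failed": 0.2},
}

def sample_next(token, rng):
    """Sample one token from the learned distribution, or None to stop."""
    dist = NEXT_TOKEN_PROBS.get(token)
    if not dist:
        return None
    words, weights = zip(*dist.items())
    return rng.choices(words, weights=weights)[0]

def generate(rng, max_len=4):
    """Repeat the condition-and-sample step until the sequence terminates."""
    seq, token = [], "<start>"
    for _ in range(max_len):
        token = sample_next(token, rng)
        if token is None:
            break
        seq.append(token)
    return seq
```

Every call to `generate` produces a fluent-looking sequence, and every token in it is an answer to "what has usually come next here", nothing more.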
Where scale stops working
So far, the gains made from enterprise AI have been largely attributed to scale, which is to say more parameters, more training data and more compute. Inside content work that attribution holds. Coding, drafting, summarisation, retrieval and analysis all get sharper as the model behind them gets larger, because the task each is doing is itself a sampling problem, and more data lets the model sample more accurately.
Coding and content generation are, on that axis, extremely powerful, and they will keep getting more powerful. But scale cannot close the gap between sampling the past and deciding what to do next. A decision that moves the profit and loss is not a question of what usually comes next in the training distribution. It is a question of what will happen to a system if you change one of its variables, and that is not a question a larger sampler can answer any better than a smaller one.
The information required to answer that kind of question is not sparse in the training corpus. It is absent from it by construction. A transformer learns only from observations that have been recorded, and no dataset, however scaled, contains observations of actions that were never taken. There is no larger version of a dataset that contains the outcome of an experiment nobody ran.
That distinction matters most in industrial settings, where AI is not being asked to help draft a brief but to recommend an action on a physical system. The decision layer in an industrial context is what runs the plant, meaning which setpoint to move, which maintenance action to bring forward, which shutdown to call or defer. It is also the layer where the analyst coverage on enterprise AI has had to reach for the language of abandonment rather than adoption.
Gartner reported in April 2026 that 57% of the infrastructure and operations leaders whose AI initiatives had failed cited the same root cause: they had expected too much too fast. MIT's Project NANDA puts the headline figure at around 95% of enterprise pilots returning no measurable impact to the profit and loss, with appropriate caveats about sample size. Triangulated across tier-one analyst houses with very different methodologies, the numbers share not a story of failure but a story of mismatch between the work these tools are good at and the work enterprises have been hoping they could do.
Decisions are a different operation
The mathematics that answers the decision question has been developing quietly alongside deep learning, for roughly the same four decades. Causal inference, largely developed by Judea Pearl and his collaborators from the late 1980s onwards, is the formal study of how to reason about interventions rather than observations, and it earned Pearl the Turing Award in 2011. It is a different mathematical foundation from the one behind transformers, and it operates on a different object.
That object is a graph representing how things cause each other in the underlying system, with nodes for the variables and arrows for the direction in which one drives another. If raising the speed of a pump increases the pressure in the pipe it feeds, which increases the rate of wear on a downstream valve, that is a small piece of causal graph: three nodes, two arrows, each one encoding not a correlation but a direction of influence. The mathematics asks what would happen if you severed one of that graph's edges and set a variable by external action rather than observed it passively. The distinction between sampling from a joint distribution and performing an intervention on a graph is not one that scale will eventually collapse. A transformer does not construct such a graph internally, and it cannot be coaxed into producing one by sampling harder or sampling longer. The limit on transformers performing interventional reasoning is therefore not a capability ceiling waiting to be broken but a property of the object the model is computing in the first place.
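The pump example can be written down as a toy structural causal model to make the observe-versus-intervene distinction concrete. The linear mechanisms and coefficients below are invented for illustration; real plant physics is nothing this tidy. The point is structural: intervening on pressure severs the speed-to-pressure edge, so upstream mechanisms stop mattering while downstream ones still run.

```python
# Toy structural causal model for: speed -> pressure -> wear.
# Mechanisms and coefficients are illustrative, not real plant physics.

def pressure_from_speed(speed):
    return 2.0 * speed          # pipe pressure driven by pump speed

def wear_from_pressure(pressure):
    return 0.1 * pressure       # valve wear driven by pipe pressure

def observe(speed):
    """Passively observe the system: every mechanism runs as usual."""
    pressure = pressure_from_speed(speed)
    return {"speed": speed, "pressure": pressure,
            "wear": wear_from_pressure(pressure)}

def do_pressure(speed, forced_pressure):
    """Intervene: sever the speed -> pressure edge and set pressure by
    external action. Downstream mechanisms still run; upstream ones no
    longer influence the outcome."""
    return {"speed": speed, "pressure": forced_pressure,
            "wear": wear_from_pressure(forced_pressure)}
```

Under the intervention, wear is the same whatever the pump is doing: `do_pressure(10, 5.0)` and `do_pressure(100, 5.0)` predict identical wear, which is exactly the fact that no amount of passive observation of the original system can reveal.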
The same graph that makes interventional reasoning possible is also what makes the resulting decisions explainable, because every recommendation can be traced backwards along the edges that produced it, which is precisely what anyone accountable for a decision in a regulated environment is asking for when they ask how the system arrived at its conclusion. Causal frameworks are, in that sense, designed for both halves of the decision problem, the making of the decision and the understanding of what the decision will do.
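That backwards trace is itself a simple graph walk. On the same hypothetical three-node graph, a map from each variable to its causal parents is enough to recover every chain of influence ending at a recommendation; the `PARENTS` map and `explain` helper below are illustrative names, not any real framework's API.

```python
# Hypothetical parent map for the three-node graph: speed -> pressure -> wear.
PARENTS = {"wear": ["pressure"], "pressure": ["speed"], "speed": []}

def explain(variable, parents=PARENTS):
    """Return every causal chain (root cause first) that ends at `variable`,
    by walking the graph's edges backwards from effect to cause."""
    if not parents[variable]:
        return [[variable]]
    chains = []
    for parent in parents[variable]:
        for chain in explain(parent, parents):
            chains.append(chain + [variable])
    return chains
```

Asked why valve wear moved, the trace comes back as the single chain speed, then pressure, then wear: each step an edge in the graph that produced the recommendation.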
Neither framework displaces the other. The generative layer handles the language around a decision, meaning the logs, reports and summaries that surround it. The causal layer handles the decision itself, meaning which variable caused the change, what will happen under each available action, and why a given recommendation holds up to scrutiny. The causal decision layer is not a replacement for the transformer. It is a distinct, non-trivial layer of reasoning that sits on top of transformer systems and supplies precisely the capability transformers are not structurally built to produce. Improvements at the generative layer make the combined system more useful rather than less, because the two layers compose rather than compete.
Process manufacturing is the sharpest case
The sharpest expression of this gap is in process manufacturing, the industrial category that runs continuous physical processes. It covers the refineries, chemical plants, pharmaceutical manufacturing sites and energy operators that produce the materials every other industry depends on. What makes it the cleanest test case for decision AI is that every recommendation in that world becomes a real action on live equipment, meaning a change to a target operating temperature, a shift in the mix of raw materials being fed into a reactor, a maintenance intervention brought forward, or a full production shutdown called or deferred. The value of each of those decisions depends entirely on what will happen under the action, rather than on what has historically happened in situations that looked superficially similar in the plant's historical data.
Process manufacturing concentrates every feature that makes this structural rather than a data-volume problem. Processes are intrinsically non-stationary, because catalysts age, feedstocks shift and equipment fouls, so the correlations a model learns in one operating regime may already have broken in the next. A fouling model trained on two years of operating data from a refinery heat exchanger can be confidently right on the trend and then confidently wrong the moment the plant switches its crude oil feed from a lighter grade to a heavier blend, because the model has no representation of the fact that the feedstock change caused the fouling rather than correlated with it. More historical data from the previous regime does not recover that information, because what is missing is not frequency but mechanism. And the cost of a hallucinated intervention in that setting is not an embarrassment to be walked back in a board report; it is a physical event on a live plant, which collapses the tolerance for confident-but-wrong answers to close to zero.
The decade ahead
The regulators have started to see the same divergence. The EU AI Act's classification of industrial control systems as high-risk will eventually require explainable reasoning behind any AI system that influences a process variable, and the FDA's long-running appetite for causal evidence in pharmaceutical manufacturing shows every sign of rising rather than falling. Any AI system proposing an intervention inside a plant will soon need to show its reasoning in a form that can be defended inside a management of change meeting, meaning the formal engineering review that must sign off any alteration to a running industrial asset. A probability distribution over tokens does not meet that bar regardless of how well phrased its outputs happen to be.
The useful question to ask of each AI investment sitting in your stack is not whether it uses the latest model, but whether it is being asked to produce content or to make a decision, and whether it has been architected to do the job you are asking it to do. Pressing content tools into service on decision tasks is where most of the industry's quiet P&L shortfall is currently accumulating, and it is where the shortfall will keep accumulating for as long as the mismatch goes unnamed.
The first wave of enterprise AI industrialised language. The second will industrialise decisions, and it is being built on causal reasoning rather than scale. The companies that understand the difference will be the ones whose AI programmes are still moving the P&L three years from now, while the rest are still explaining why the pilot didn't convert.
Dr Louis Allen is the founder and CEO of Kausalyze, which builds the causal decision layer for process manufacturing. See how it works at kausalyze.com/how-it-works, or book a pilot conversation.
