AI forecasting has a credibility problem in retail and ecommerce, and the technology itself is rarely to blame. Most companies are asking their models to predict a business their own systems cannot fully see. The model ingests demand history, carrier scans, and purchase frequency, but three operational signals that reshape what those numbers mean sit in separate systems, owned by separate teams, with no path back into the forecast.

The three signals are returns volumes, delivery promises made at checkout, and failed deliveries that suppress repurchase. Each one changes the picture the model is working from. When they are missing, the AI is forecasting a simplified version of the business, and it is most wrong exactly where operations are most complex.

The evidence has been building for years, across multiple research disciplines.

A 2018 PLOS One paper tested machine-learning methods against traditional statistical forecasting across more than 1,000 time series. Traditional methods won on every accuracy measure, at every forecasting horizon, using less computing power. A more advanced model does not automatically produce a more useful forecast when the underlying signal is weak.

A 2025 paper in the European Journal of Operational Research pushed the finding further. Better forecast accuracy does not reliably improve inventory performance. The relationship depends on product mix, cost structure, and replenishment policy. Optimizing a forecast metric and optimizing the business are two different things.

A 2025 International Journal of Forecasting study pitted five large language models against 123 human forecasters in retail. The AI did not consistently win. Both humans and AI performed worst during promotional periods, exactly when delivery capacity, returns volumes, and customer expectations collide at once.

The pattern across all three studies points the same way: AI forecasting disappoints most where operations are most complex, and closing that gap requires architectural changes to the data the model can reach.

25 studies, zero that integrate demand forecasting with returns forecasting (Management Review Quarterly, 2024 systematic review)

3 days, 2 ratings: identical delivery speed rated differently depending on the checkout promise (MSOM, millions of deliveries tracked on a major ecommerce platform)

Late = longer gap: late deliveries measurably increase time between orders (Journal of Service Research, 2025 Western European quick-commerce study)

Returns belong inside the forecasting model

Most retailers treat returns as a reverse-logistics cost center. A 2024 systematic review in Management Review Quarterly examined 25 studies on ecommerce returns forecasting and found something stark: none of them integrated demand forecasting with returns forecasting. The two are treated as separate problems, managed by separate teams.

In fashion and general merchandise, that gap is expensive. Returns are already a major operating cost across ecommerce, and in fashion, where return rates run higher, the cost can be much greater. Each return creates inbound carrier volume, warehouse labor, inspection time, and resale decisions, none of which the outbound demand model anticipated.

The delivery management layer is where forward and reverse flows meet. If returns stay outside the forecasting model, the carrier network absorbs volume that no forecast accounted for. The model calls it noise. The warehouse calls it Tuesday.
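One way to stop treating returns as noise is to forecast them from outbound volume: every parcel shipped today is a possible inbound parcel some days later. The sketch below is minimal and rests on stated assumptions. It uses a single return rate and a hypothetical distribution of days between shipping and the return arriving (`lag_probs`). Neither figure comes from the research cited here, and the function name is illustrative.

```python
from typing import Dict, List


def forecast_inbound_returns(
    outbound: List[float],
    return_rate: float,
    lag_probs: Dict[int, float],
) -> List[float]:
    """Expected returned parcels arriving on each day.

    outbound[d]: parcels shipped on day d (forecast or actual).
    return_rate: assumed share of shipped parcels that come back.
    lag_probs[k]: assumed probability that a return arrives k days after shipping.
    """
    horizon = len(outbound) + max(lag_probs)
    inbound = [0.0] * horizon
    for day, shipped in enumerate(outbound):
        for lag, p in lag_probs.items():
            # Spread each day's expected returns over the days they are likely to land.
            inbound[day + lag] += shipped * return_rate * p
    return inbound


# Hypothetical promotion: outbound volume doubles on day 2.
inbound = forecast_inbound_returns([1000, 1000, 2000], return_rate=0.3, lag_probs={7: 0.5, 14: 0.5})
```

Here `inbound[9]` and `inbound[16]` each carry 300 expected returns. That is the day-2 spike arriving back at the warehouse one and two weeks later, which is the volume an outbound-only model would not see coming.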

Hunkemoller, one of Europe's largest lingerie retailers, faced exactly this problem. Before digitizing returns with nShift, the company had no advance visibility into how many returns would arrive on any given day, what was driving them, or how to prepare warehouse capacity. After connecting returns data across six European markets, warehouse teams now see expected volumes days in advance. "We've made returns part of a seamless omnichannel customer experience with increased returns control and insights," says Robin Visser, Omni Channel Business Development Manager at Hunkemoller. "What was a historical pain point for the company and our customers has been changed into something that adds real value."

Checkout promises are rewriting your historical data

Carrier performance data looks clean at first glance, sorted into on-time or late, rated or not. Customers, though, do not experience delivery against those metrics. They experience it against what was promised at checkout.

Research published in Manufacturing & Service Operations Management tracked logistics ratings across millions of deliveries on an ecommerce platform. Ratings were shaped as much by the promised delivery speed as by actual performance. A three-day delivery rated well when customers expected four. The same delivery rated poorly when they expected two.

That creates a compounding data-quality problem. Every time a retailer changes its delivery promise, adds same-day options, adjusts cutoffs, or enters new regions, historical carrier data stops measuring what it used to. An AI model trained on that history thinks it is learning carrier quality. It is actually learning the gap between what was promised and what was delivered, and that gap keeps moving.

Connecting checkout promise logic to the data that trains the model is the only way to stabilize the signal.
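In practice, connecting the promise can be as simple as storing it next to the carrier scan and deriving features relative to it. This is a minimal sketch, assuming a hypothetical `Delivery` record that joins the checkout promise to the delivery date. It shows how the MSOM example looks in data: raw transit time is identical, and only the promise-relative value tells the two cases apart.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Delivery:
    # Hypothetical record joining the checkout promise to the carrier's delivery scan.
    order_date: date
    promised_date: date   # date shown to the customer at checkout
    delivered_date: date  # date of the delivery scan


def transit_days(d: Delivery) -> int:
    """Raw speed: what historical carrier data usually records."""
    return (d.delivered_date - d.order_date).days


def promise_gap_days(d: Delivery) -> int:
    """Speed relative to the promise: positive is late, negative is early."""
    return (d.delivered_date - d.promised_date).days


# The same three-day delivery under a four-day and a two-day promise.
relaxed = Delivery(date(2025, 3, 3), date(2025, 3, 7), date(2025, 3, 6))
tight = Delivery(date(2025, 3, 3), date(2025, 3, 5), date(2025, 3, 6))
```

Both deliveries have a `transit_days` of 3. Their `promise_gap_days` values are -1 and +1. A model trained on the second feature keeps its meaning when the checkout promise changes, and a model trained on the first does not.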

Failed deliveries suppress demand, and most models miss it

A failed delivery does more than generate a support ticket. It pushes the next order further out.

A 2025 Journal of Service Research study tracked purchase behavior on a Western European quick-commerce platform and found that late deliveries measurably increase the time between orders. Early deliveries compress it. The negative effect from a late delivery is stronger than the positive effect from an early one of the same magnitude.

Most AI demand models treat purchase history as a clean signal of customer intent. After a stretch of carrier disruption or missed delivery windows, the model quietly learns that demand is lower than it really is. The business trims capacity and inventory. Then conditions improve and the model is still reading the wrong baseline.

When tracking and exception data feeds back into the demand model, the AI can distinguish between a customer who stopped buying and a customer whose last delivery went wrong.
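Before a demand model can tell those two customers apart, each gap between orders has to carry the outcome of the delivery that came before it. This is a minimal sketch, assuming a hypothetical order table with a per-order delivery outcome (`"on_time"`, `"late"`, or `"failed"`). It labels every purchase gap with the previous delivery's outcome and averages the gaps by label.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical order row: (customer_id, order_date, outcome of that order's delivery),
# where the outcome is one of "on_time", "late", or "failed".
Order = Tuple[str, date, str]


def gaps_by_prior_outcome(orders: List[Order]) -> Dict[str, float]:
    """Mean days until a customer's next order, grouped by how the previous delivery went."""
    by_customer: Dict[str, List[Order]] = defaultdict(list)
    for order in orders:
        by_customer[order[0]].append(order)

    gaps: Dict[str, List[int]] = defaultdict(list)
    for history in by_customer.values():
        history.sort(key=lambda o: o[1])
        # Pair each order with the next one; the gap is labelled with the earlier delivery's outcome.
        for prev, nxt in zip(history, history[1:]):
            gaps[prev[2]].append((nxt[1] - prev[1]).days)

    return {outcome: mean(days) for outcome, days in gaps.items()}


orders = [
    ("c1", date(2025, 1, 1), "on_time"),
    ("c1", date(2025, 1, 11), "late"),
    ("c1", date(2025, 2, 10), "on_time"),
    ("c2", date(2025, 1, 5), "on_time"),
    ("c2", date(2025, 1, 25), "on_time"),
]
```

For this toy history, `gaps_by_prior_outcome(orders)` maps `"on_time"` to 15 days and `"late"` to 30 days. A split like this is descriptive only and shows where to look. In a demand model, the prior delivery outcome would become an input feature alongside purchase history, so that a delivery-driven pause is not read as lost intent.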

The fix is a connected architecture

In practice, four capabilities keep showing up in the organizations where forecasting actually drives operational decisions.

Connected data. Demand, inventory, promotions, delivery promises, carrier events, failed deliveries, returns, and refunds need to be linked in a way that preserves cause and context. A sales dip from a stockout and a sales dip from weak demand look identical in a time series. They require completely different responses.
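The stockout case shows why the link to inventory matters. This is a minimal sketch, assuming a hypothetical daily record that carries opening stock next to units sold. It masks days where stock capped sales, so they are not read as weak demand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DayRecord:
    units_sold: int
    units_on_hand: int  # hypothetical stock available at the start of the day


def demand_signal(days: List[DayRecord]) -> List[Optional[int]]:
    """Daily sales as a demand signal, with stock-constrained days masked.

    If a day starts with no stock, or sales used up all available stock,
    observed sales are only a lower bound on demand, so the value is
    replaced with None instead of being read as weak demand.
    """
    return [
        None if d.units_on_hand == 0 or d.units_sold >= d.units_on_hand else d.units_sold
        for d in days
    ]


def baseline(days: List[DayRecord]) -> Optional[float]:
    """Average demand over days where stock did not cap sales."""
    observed = [v for v in demand_signal(days) if v is not None]
    return sum(observed) / len(observed) if observed else None


days = [
    DayRecord(10, 50),
    DayRecord(12, 40),
    DayRecord(0, 0),   # stockout: zero sales, but not zero demand
    DayRecord(5, 5),   # sold out: demand was at least 5
    DayRecord(8, 30),
]
```

A naive average of raw sales gives 7.0 units per day, and `baseline(days)` gives 10.0. Dropping constrained days is the simplest option but discards information. More careful approaches estimate the unobserved demand on those days instead.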

Probabilistic outputs. Operations teams need ranges, thresholds, and action triggers, not a single number. The difference between "we expect 80,000 orders next week" and "there is a meaningful probability that parcel volume exceeds carrier capacity in these regions if promotion conversion comes in above plan" is the difference between a number on a slide and a decision the ops team can act on.
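To make that contrast concrete, the following is a minimal Monte Carlo sketch that turns a point forecast into a per-region probability of exceeding carrier capacity, plus a simple action trigger. Every input is hypothetical: the regional split of 80,000 expected orders, the capacities, the threshold, and the assumption that promotion uplift is normally distributed and moves all regions together.

```python
import random
from typing import Dict, List


def prob_exceeds_capacity(
    base_orders: Dict[str, float],   # hypothetical expected orders per region
    capacity: Dict[str, float],      # hypothetical carrier capacity (parcels) per region
    uplift_mean: float,              # assumed mean promotion uplift, e.g. 0.2 = +20%
    uplift_sd: float,                # assumed uncertainty in that uplift
    parcels_per_order: float = 1.0,
    n_sims: int = 10_000,
    seed: int = 0,
) -> Dict[str, float]:
    rng = random.Random(seed)
    exceed = {region: 0 for region in base_orders}
    for _ in range(n_sims):
        # One shared uplift per simulation: promotion conversion moves all regions together.
        uplift = rng.gauss(uplift_mean, uplift_sd)
        for region, orders in base_orders.items():
            parcels = orders * (1 + uplift) * parcels_per_order
            if parcels > capacity[region]:
                exceed[region] += 1
    return {region: count / n_sims for region, count in exceed.items()}


def regions_to_act_on(probs: Dict[str, float], threshold: float = 0.2) -> List[str]:
    """Action trigger: regions whose overflow probability is above the threshold."""
    return sorted(region for region, p in probs.items() if p > threshold)


probs = prob_exceeds_capacity(
    base_orders={"north": 50_000, "south": 30_000},
    capacity={"north": 70_000, "south": 30_500},
    uplift_mean=0.2,
    uplift_sd=0.1,
)
```

Both regions share the same 80,000-order point forecast. With these inputs, however, the north overflows only when uplift exceeds 40%, which is rare, while the south overflows at anything above about 1.7%, which happens in most simulations. `regions_to_act_on(probs)` therefore returns only `"south"`. That is the kind of output an ops team can attach a carrier rule or a promise change to.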

Post-deployment monitoring. A 2026 NIST report on deployed AI systems makes the issue explicit. Pre-deployment testing happens in controlled conditions. Deployed models face a world that keeps changing: customer behavior shifts, carrier networks degrade, promotional strategy evolves. A model that passed validation six months ago may be quietly wrong today.
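Catching a model that is "quietly wrong" can start with a very small check. This is a minimal sketch that compares recent forecast error with the error measured at validation. It assumes mean absolute percentage error as the metric, and the window and tolerance defaults are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def abs_pct_errors(actuals: Sequence[float], forecasts: Sequence[float]) -> List[float]:
    """Absolute percentage errors, skipping periods with zero actuals (undefined)."""
    return [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts) if a != 0]


def drift_alert(
    validation_mape: float,
    actuals: Sequence[float],
    forecasts: Sequence[float],
    window: int = 4,
    tolerance: float = 1.5,
) -> bool:
    """Flag the model when recent error is well above the error it was signed off with.

    validation_mape: mean absolute percentage error measured at validation.
    window: number of most recent periods to check (hypothetical default).
    tolerance: alert when recent error exceeds validation error by this factor
        (hypothetical default: 1.5 means 50% worse).
    """
    recent = abs_pct_errors(actuals[-window:], forecasts[-window:])
    if not recent:
        return False
    return mean(recent) > validation_mape * tolerance


weekly_actuals = [100, 100, 100, 100, 100, 100]
drifting = [95, 105, 100, 90, 130, 70]
stable = [95, 105, 98, 102, 96, 104]
```

For a model validated at 10% error, `drift_alert(0.10, weekly_actuals, drifting)` is `True` because recent error is 17.5%, and the stable series returns `False`. Percentage error is only one choice and becomes unstable for low-volume series. Teams often pair an error check like this with checks on whether the inputs themselves have shifted.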

Governance. Someone owns the model, the inputs, the override logic, and the call to retrain or roll back. In Europe, AI governance is increasingly a compliance question as well as an operational one. The EU AI Act entered into force in August 2024 and applies progressively, with stricter obligations for certain high-risk systems.

Four questions to test any AI forecast

Before trusting an AI forecast, ask:

  1. Can the model see the delivery promise the customer was given at checkout?
  2. Can it separate weak demand from stockouts, late deliveries, failed deliveries, and poor service availability?
  3. Can it account for returns as future parcel volume, warehouse labor, inventory movement, and customer friction?
  4. Can the forecast trigger a real operational decision: changing delivery promises, adjusting carrier rules, protecting capacity, or communicating earlier with customers?

If the answer to any of these is no, the AI is predicting outcomes from disconnected evidence. The companies that get forecasting right will not necessarily have the most sophisticated models. They will have the most connected delivery architecture.

AI forecasting will keep disappointing until the delivery layer is part of the forecast, not downstream from it.

This is the conversation at DELIVER Europe 2026

nShift is at DELIVER Europe in Amsterdam on June 3-4 (Stand B39). The session on the Solar Stage, Thursday June 4 at 10:30 CEST, picks up exactly where this argument lands: when AI agents start mediating discovery, comparison, and checkout on the shopper's behalf, the delivery layer becomes one of the last places where the brand earns trust in public.

If you are working on forecasting, carrier orchestration, or connected delivery data, book a 30-minute meeting with the team.

FAQ

Why does AI-driven forecasting often disappoint in ecommerce?

AI-driven forecasting often disappoints because the model cannot see the operational events that shape demand. It may see sales history, but not whether a product was out of stock, whether delivery promises changed, whether a carrier underperformed, or whether returns created capacity pressure. When those signals sit in separate systems, AI forecasts from incomplete evidence. The issue is not only model quality. It is whether the business has connected demand, inventory, delivery, returns, and customer-experience data into one architecture.

Why are returns important for AI forecasting?

Returns change the real operating picture after the original sale. A return can affect available inventory, warehouse labor, inbound parcel volume, refund timing, resale decisions, and customer experience. If returns are treated only as a reverse-logistics issue, the demand forecast misses a major source of future capacity pressure. For ecommerce businesses, especially in fashion, returns should be part of the forecasting signal, not handled as a disconnected afterthought.

How do delivery promises affect forecasting accuracy?

Delivery promises shape how customers judge carrier performance. A delivery that feels acceptable under one promise may feel late under another. If a retailer changes checkout promises, cutoffs, carrier options, or regional delivery rules, historical carrier data may no longer mean the same thing. AI models trained on that history can mistake a promise problem for a carrier problem, or a service issue for lower demand. Forecasting becomes more reliable when checkout promise logic is connected to delivery and customer behavior data.

Can failed deliveries affect future demand?

Yes. Failed or late deliveries can change when customers choose to buy again. Purchase history is often treated as a clean signal of customer intent, but it can be distorted by delivery failures. If customers delay repeat purchases after a poor delivery experience, a demand model may read that delay as weaker demand rather than as the effect of an operational failure. Delivery events, exceptions, and customer communication should be connected to forecasting and planning systems.

What should ecommerce teams do before trusting an AI forecast?

Ecommerce teams should check whether the forecast is connected to the operational signals that shape demand. The model should account for stockouts, promotions, delivery promises, carrier events, failed deliveries, returns, and refunds. It should produce ranges and action triggers, not only a single number. It also needs monitoring after deployment, because customer behavior, carrier performance, and service levels change over time. For retailers operating across multiple markets and carriers, the delivery management platform is a critical part of making AI forecasts usable.

Where does delivery management fit into AI forecasting?

Delivery management connects the signals AI forecasting needs but often misses: checkout promises, carrier selection, parcel events, exceptions, returns, and customer communication. When those signals are connected, retailers can make better decisions about capacity, carrier rules, delivery promises, and customer updates. The nShift delivery management platform connects delivery touchpoints across checkout, warehouse, carrier, tracking, and returns workflows, which is why this layer matters for forecasting in ecommerce.

About the author

Thomas Bailey

Product Innovation Lead, nShift

Thomas plays a key role in shaping how new features and platform improvements deliver real value to customers. With a background spanning product, tech, and go-to-market strategy, he brings a pragmatic view of what innovation looks like in practice and how to make delivery experiences work harder for your business.