For many organisations, the move from a traditional data warehouse to Data Mesh feels less like an evolution and more like an identity crisis.
One day, everything works (maybe “works” is a stretch, but everybody knows the lay of the land). The next day, a new CDO arrives with exciting news: “We’re moving to Data Mesh.” And suddenly, years of carefully designed pipelines, models, and conventions are questioned.
In this article, I want to step away from theory and buzzwords and walk through a practical transition, from a centralised data “monolith” to a contract-driven Data Mesh, using a concrete example: website analytics.
A standardised data contract is the critical enabler for this transition. When schema definitions, business semantics, and quality rules are expressed in an open, structured specification, ETL and data quality tools can interpret the contract directly: generating tests, enforcing validations, orchestrating transformations, and monitoring data health without custom integrations.
The contract shifts from static documentation to an executable control layer that ties governance, transformation, and observability together.
Why traditional data warehousing becomes a monolith
When people hear “monolith”, they often think of bad architecture. But most monolithic data platforms didn’t start that way; they evolved into one.
A traditional enterprise data warehouse typically has:
- One central team responsible for ingestion, modelling, quality, and publishing
- One central architecture with shared pipelines and shared patterns
- Tightly coupled components, where a change in one model can ripple everywhere
- Slow change cycles, because demand always exceeds capacity
- Limited domain context, as modelers are often far removed from the business
- Scaling pain, as more data sources and use cases arrive
This isn’t incompetence; it’s the natural outcome of centralisation and years of unintended consequences. Eventually, the warehouse becomes the bottleneck.
What Data Mesh actually changes (and what it doesn’t)
Data Mesh is often misunderstood as “no more warehouse” or “everyone does their own thing.”
In reality, it’s an operating-model shift, not necessarily a technology shift.
At its core, Data Mesh is built on four pillars:
- Domain ownership
- Data as a Product
- Self-serve data platform
- Federated governance
The key difference is that instead of one big system owned by one team, you get many small, connected data products, owned by domains, and linked together through clear contracts.
And this is where data contracts become the quiet hero of the story.
Data contracts: the missing stabiliser
Data contracts borrow a familiar idea from software engineering: API contracts, applied to data.
They were popularised in the Data Mesh community between 2021 and 2023, with contributions from people and projects such as:
- Andrew Jones, who popularised the term data contract through blogs, talks, and his book, published in 2023 [1]
- Chad Sanderson (gable.ai)
- The Open Data Contract Standard (ODCS), introduced by the Bitol project
A data contract explicitly defines the agreement between a data producer and a data consumer.
The example: website analytics
Let’s ground this with a concrete scenario.
Imagine PlayNest, an online toy store. The business wants to analyse user behaviour on its website.
Two departments are relevant to this exercise. Customer Experience is responsible for the user journey on the website: how customers feel while they browse the products.
Then there is the Marketing domain, which runs campaigns that bring users to the website and, ideally, gets them interested in buying the products.
There is a natural overlap between these two departments. The boundaries between domains are often fuzzy.
At the operational level, a website captures things like:
- Visitors
- Sessions
- Events
- Devices
- Browsers
- Products
A conceptual model for this example could look like this:

From a marketing perspective, however, nobody wants raw events. They want:
- Marketing leads
- Funnel performance
- Campaign effectiveness
- Abandoned carts
- Which types of products people clicked on, for retargeting, and so on
And from a customer experience perspective, they want to know:
- Frustration scores
- Conversion metrics (for example, how many users created wishlists, which signals interest in certain products: a conversion from casual visitor to interested user)
The centralised (pre-Mesh) approach
I will use the Medallion architecture to illustrate how this would be built in a centralised lakehouse.
- Bronze: raw, immutable data from tools like Google Analytics
- Silver: cleaned, standardized, source-agnostic models
- Gold: curated, business-aligned datasets (facts, dimensions, marts)

In the Bronze layer, the raw CSV or JSON objects are stored in an object store such as S3 or Azure Blob Storage. The central team is responsible for ingesting the data, making sure API specifications are followed, and monitoring the ingestion pipelines.
In the Silver layer, the central team cleans and transforms the data. If the chosen modelling approach is Data Vault, for example, the data is standardised into specific data types, business objects are identified, and similar datasets are conformed or loosely coupled.
In the Gold layer, the actual end-user requirements are documented in storyboards, and the central IT team implements the dimensions and facts the different domains need for their analytics. A minimal sketch of the whole flow is shown below.
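To make the layers a little more tangible, here is a minimal sketch of that flow in Python with pandas. The file path and the column names (event_id, session_id, event_type, event_ts, product_id) are assumptions for this example, not the actual schema of a Google Analytics export.

```python
import pandas as pd

# Bronze: land the raw export as-is (immutable), e.g. JSON lines dropped into object storage.
bronze = pd.read_json("bronze/ga_events_2026-02-18.json", lines=True)

# Silver: clean, deduplicate, and standardise into source-agnostic types.
silver = (
    bronze
    .dropna(subset=["event_id", "session_id"])      # drop malformed rows
    .drop_duplicates(subset=["event_id"])           # the source may resend events
    .assign(event_ts=lambda df: pd.to_datetime(df["event_ts"], utc=True))
    .astype({"event_type": "string", "product_id": "string"})
)

# Gold: a curated, business-aligned dataset, e.g. daily product clicks for Marketing.
gold_daily_product_clicks = (
    silver.loc[silver["event_type"] == "product_click"]
    .assign(event_date=lambda df: df["event_ts"].dt.date)
    .groupby(["event_date", "product_id"])
    .size()
    .reset_index(name="click_count")
)
```

In the centralised model, every one of these steps sits with the same central team, which is exactly where the bottleneck described earlier comes from.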
Let’s now reframe this example, moving from a centralised operating model to a decentralised, domain-owned approach.
Website analytics in a Data Mesh
A typical Data Mesh data model could be depicted like this:

A data product is owned by a domain and has a specific type; data comes in via input ports and goes out via output ports. Each port is governed by a data contract.
If your organisation has chosen Data Mesh, you will constantly have to decide between the following two approaches:

Do you organise your landscape with these re-usable building blocks where logic is consolidated, OR:

Do you let each consumer of the data products decide for themselves how to implement the logic, with the risk of duplicating it?
People look at this and tell me it is obvious: of course you should choose the first option, it is the better practice, and I agree. Except that, in reality, the first two questions that will be asked are:
- Who will own the foundational Data Product?
- Who will pay for it?
These are fundamental questions that often hamper the momentum of a Data Mesh initiative. You can either over-engineer it (lots of reusable parts, but at the cost of autonomy and escalating costs) or create a network of many little data products that don’t speak to each other. We want to avoid both of these extremes.

For the sake of our example, let’s assume that instead of every team ingesting Google Analytics independently, we create one or more shared foundational products, for example Website User Behaviour and Products.
These products are owned by a specific domain (in our example, Customer Experience), which is responsible for exposing the data through standard output ports governed by data contracts. The whole idea is that these products become reusable across the organisation, just as external datasets are reusable through a standardised API pattern. Downstream domains, like Marketing, then build consumer data products on top.
Website User Behaviour Foundational Data Product
- Designed for reuse
- Stable, well-governed
- Often built using Data Vault, 3NF, or similar resilient models
- Optimised for change, not for dashboards


The two sources are treated as input ports to the foundational data product.
The modelling technique used to build the data product is again up to the domain to decide, but the motivation is reusability. In this context I have often seen a more flexible technique such as Data Vault being used.
The output ports are also designed for reusability. For example, you can combine the Data Vault objects into an easier-to-consume format, or, for more technical consumers, simply expose the raw Data Vault tables; these are logically split into different output ports. You could also decide to publish a separate output port for LLMs or autonomous agents. A sketch of the curated option follows.
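As a sketch of that curated option (with assumed Data Vault table and column names, not a prescribed model), the port could simply be a view that flattens a hub, a satellite, and a link into one consumer-friendly shape, while the raw tables remain exposed on a separate technical port:

```python
# Illustrative only: the "curated" output port exposed as a single view that
# flattens assumed Data Vault objects (hub, satellite, link). The raw
# hub/sat/link tables stay available, unchanged, on a second technical port.
CURATED_SESSIONS_VIEW = """
CREATE OR REPLACE VIEW output_curated.website_sessions AS
SELECT
    h.session_hk,
    s.session_start_ts,
    s.device_type,
    s.browser,
    lp.product_id
FROM raw_vault.hub_session               AS h
JOIN raw_vault.sat_session_details       AS s  ON s.session_hk = h.session_hk
LEFT JOIN raw_vault.link_session_product AS lp ON lp.session_hk = h.session_hk
"""
```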
Marketing Lead Conversion Metrics Consumer Data Product
- Designed for specific use cases
- Shaped by the needs of the consuming domain
- Often dimensional or highly aggregated
- Allowed (and expected) to duplicate logic if needed


Here I illustrate how we use other foundational data products as input ports. In the case of Website User Behaviour, we consume the normalised Snowflake tables (since we want to keep building in Snowflake) and create a data product that is ready for our specific consumption needs.
Our main consumers are analysts building dashboards, so a dimensional model makes sense: it is optimised for this type of analytical querying. A sketch of one such table follows.
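For illustration, one fact table in that dimensional model could be aggregated straight from the foundational product's normalised tables. The schema, table, and column names below are assumptions for this example, not the actual model:

```python
# Illustrative only: a daily lead-conversion fact for the Marketing consumer
# product, aggregated from the foundational product's normalised tables.
FACT_LEAD_CONVERSION = """
CREATE OR REPLACE TABLE marketing.fact_lead_conversion AS
SELECT
    DATE(s.session_start_ts)                        AS date_key,
    e.campaign_id                                   AS campaign_key,
    COUNT(DISTINCT s.session_hk)                    AS sessions,
    COUNT(DISTINCT CASE WHEN e.event_type = 'signup_completed'
                        THEN s.session_hk END)      AS leads,
    COUNT(DISTINCT CASE WHEN e.event_type = 'cart_abandoned'
                        THEN s.session_hk END)      AS abandoned_carts
FROM website_user_behaviour.sessions AS s
JOIN website_user_behaviour.events   AS e ON e.session_hk = s.session_hk
GROUP BY 1, 2
"""
```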
Zooming into Data Contracts
The data contract is really the glue that holds a Data Mesh together. The contract should specify not only the technical expectations but also the legal and quality requirements, and anything else the consumer would be interested in.
The Bitol Open Data Contract Standard [2] set out to address the gaps left by the vendor-specific contracts available on the market: a shared, open standard for describing data contracts in a way that is human-readable, machine-readable, and tool-agnostic.
Why so much focus on a shared standard?
- Shared language across domains
When every team defines contracts differently, federation becomes impossible.
A standard creates a common vocabulary for producers, consumers, and platform teams.
- Tool interoperability
An open standard allows data quality tools, orchestration frameworks, metadata platforms and CI/CD pipelines to all consume the same contract definition, instead of each requiring its own configuration format.
- Contracts as living artifacts
Contracts shouldn’t be static documents. With a standard, they can be versioned, validated automatically, tested in pipelines and compared over time. This moves contracts from “documentation” to enforceable agreements.
- Avoiding vendor lock-in
Many vendors now support data contracts, which is great, but without an open standard, switching tools becomes expensive.
The ODCS is a YAML template that includes the following key components:
- Fundamentals – Purpose, ownership, domain, and intended consumers
- Schema – Fields, types, constraints, and evolution rules
- Data quality expectations – Freshness, completeness, validity, thresholds
- Service-level agreements (SLAs) – Update frequency, availability, latency
- Support and communication channels – Who to contact when things break
- Teams and roles – Producer, owner, steward responsibilities
- Access and infrastructure – How and where the data is exposed (tables, APIs, files)
- Custom domain rules – Business logic or semantics that consumers must understand

Not every contract needs every section, but the structure matters because it makes expectations explicit and repeatable. A minimal, illustrative contract is sketched below.
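Here is that sketch: a small contract for the foundational product's curated output port, embedded as YAML and loaded in Python. It mirrors the components above, but the field names are simplified and are not guaranteed to match the exact keys of the current ODCS specification.

```python
import yaml  # PyYAML

# Illustrative contract for the foundational product's curated output port.
# Field names are simplified and may not match the exact ODCS keys.
CONTRACT_YAML = """
name: website-user-behaviour.website-sessions
version: 1.2.0
domain: customer-experience
owner: customer-experience-data-team
description: Curated sessions from the Website User Behaviour foundational product.
schema:
  - name: session_hk
    type: string
    required: true
    unique: true
  - name: session_start_ts
    type: timestamp
    required: true
  - name: device_type
    type: string
quality:
  - rule: freshness
    column: session_start_ts
    max_lag_hours: 24
  - rule: completeness
    column: session_hk
    min_ratio: 0.99
sla:
  update_frequency: daily
  availability: "99.5%"
support:
  channel: "#website-user-behaviour"
"""

contract = yaml.safe_load(CONTRACT_YAML)
print(contract["name"], "owned by", contract["owner"])
```

The point is not the exact keys; it is that the same file can be read by humans and parsed by tools.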
Data Contracts enabling interoperability

In our example we have a data contract on the input port (fed by the foundational data product) as well as on the output port (exposed by the consumer data product). You want to enforce these expectations as seamlessly as possible, just as you would with any contract between two parties. Since the contract follows a standardised, machine-readable format, you can integrate with third-party ETL and data quality tools to enforce these expectations.
Platforms such as dbt, SQLMesh, Coalesce, Great Expectations, Soda, and Monte Carlo can programmatically generate tests, enforce validations, orchestrate transformations, and monitor data health without custom integrations. Some of these tools have already announced support for the Open Data Contract Standard.
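As a sketch of what that integration can look like (deliberately vendor-neutral rather than any specific tool's API), the quality rules from the contract sketched earlier could be translated mechanically into SQL checks that a scheduler or data quality tool then executes:

```python
# Sketch: mechanically translate the contract's quality rules into SQL checks.
# Uses the `contract` dict loaded from the YAML sketch above; the table name is
# an assumption. Real tools ship far richer rule formats than this.
def checks_from_contract(contract: dict, table: str) -> list[str]:
    sql_checks = []
    for rule in contract.get("quality", []):
        column = rule["column"]
        if rule["rule"] == "freshness":
            sql_checks.append(
                f"SELECT MAX({column}) >= CURRENT_TIMESTAMP - INTERVAL '{rule['max_lag_hours']} hours' "
                f"FROM {table}"
            )
        elif rule["rule"] == "completeness":
            sql_checks.append(
                f"SELECT COUNT({column}) * 1.0 / COUNT(*) >= {rule['min_ratio']} FROM {table}"
            )
    return sql_checks

for check in checks_from_contract(contract, "output_curated.website_sessions"):
    print(check)  # hand these to your scheduler or data quality tool to run and alert on
```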
LLMs, MCP servers and Data Contracts
By using standardised metadata, including the data contracts, organisations can safely make use of LLMs and other agentic AI applications to interact with their crown jewels, the data.

So in our example, let’s assume Peter from PlayNest wants to check which products are visited the most:

This is enough context for the LLM to use the metadata to determine which data products are relevant, and also to see that the user does not have access to the data. It can then determine from whom, and how, to request access.
Once access is granted:

The LLM can interpret the metadata and create the query that matches the user request.
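A plain-Python sketch of that flow, using no specific MCP SDK and with the catalogue, access groups, and generated query all invented for this example, could look like this:

```python
# Sketch: an agent layer that uses contract metadata as its guardrail.
# The catalogue, access groups, and matched product are assumptions for this example.
CATALOGUE = {
    "website-user-behaviour.website-sessions": {
        "owner": "customer-experience-data-team",
        "description": "Curated website sessions, including visited products",
        "table": "output_curated.website_sessions",
        "readers": {"marketing-analysts"},
    }
}

def answer_request(user_groups: set[str], question: str) -> str:
    # 1) In a real setup the LLM would match `question` against the product
    #    descriptions in the catalogue; here the match is hard-coded.
    product = "website-user-behaviour.website-sessions"
    meta = CATALOGUE[product]

    # 2) Enforce the contract's access rules before generating anything.
    if not (user_groups & meta["readers"]):
        return f"No access to {product}. Request access from {meta['owner']}."

    # 3) Only then generate a query against the documented schema.
    return (
        f"SELECT product_id, COUNT(*) AS visits FROM {meta['table']} "
        f"GROUP BY product_id ORDER BY visits DESC LIMIT 10"
    )

print(answer_request({"playnest-staff"}, "What are our most visited products?"))
print(answer_request({"playnest-staff", "marketing-analysts"}, "What are our most visited products?"))
```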
Making sure autonomous agents and LLMs operate under strict guardrails is what will allow the business to scale its AI use cases.
Multiple vendors are rolling out MCP servers to provide a well-structured approach to exposing your data to autonomous agents. Forcing the interfacing to go through metadata standards and protocols (such as these data contracts) allows safer and more scalable roll-outs of these use cases.
The MCP server provides the toolset and the guardrails within which to operate. The metadata, including the data contracts, provides the policies and enforceable rules under which any agent may operate.
At the moment there is a tsunami of AI use cases being requested by the business, and most of them are not yet adding value. This is a prime opportunity to invest in setting up the right guardrails for these projects to operate within. A critical-mass moment will come when the value arrives, but first we need the building blocks.
I’ll go as far as to say this: a Data Mesh without contracts is simply decentralised chaos. Without clear, enforceable agreements, autonomy turns into silos, shadow IT multiplies, and inconsistency scales faster than value. At that point, you haven’t built a mesh, you’ve distributed disorder. You might as well revert to centralisation.
Contracts replace assumption with accountability. Build small, connect smartly, govern clearly — don’t mesh around.
[1] Jones, A. (2023). Driving data quality with data contracts: A comprehensive guide to building reliable, trusted, and effective data platforms. O’Reilly Media.
[2] Bitol. (n.d.). Open data contract standard (v3.1.0). Retrieved February 18, 2026, from https://bitol-io.github.io/open-data-contract-standard/v3.1.0/
All images in this article were created by the author.