Democratizing Marketing Mix Models (MMM) with Open Source and Gen AI

Editor


Marketing Mix Models have been in the industry for several years, and recently they have experienced a renaissance. With digitally tracked signals being deprecated amid increasing data-privacy restrictions, marketers are turning back to MMMs as a strategic, reliable, privacy-safe measurement and attribution framework.

Unlike user-level tracking tools, MMM uses aggregated time-series and cross-sectional data to estimate how marketing channels drive business KPIs. Advances in Bayesian modeling, combined with enhanced computing power, have pushed MMM back into the center of marketing analytics.
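To make the estimation idea concrete, here is a minimal NumPy sketch of the two transformations most MMM engines (including Meridian) apply to media spend before regression: geometric adstock (carry-over of media effects across periods) and a saturation curve (diminishing returns). The parameter names and values here are illustrative, not taken from any specific engine.

```python
import numpy as np

def geometric_adstock(spend, decay=0.5):
    """Carry a fraction of each period's media effect into later periods."""
    out = np.zeros(len(spend), dtype=float)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out

def hill_saturation(x, half_sat=100.0, shape=1.0):
    """Diminishing returns: response flattens as (adstocked) spend grows."""
    x = np.asarray(x, dtype=float)
    return x**shape / (x**shape + half_sat**shape)

spend = np.array([100.0, 0.0, 0.0, 50.0])   # weekly spend for one channel
effect = hill_saturation(geometric_adstock(spend, decay=0.5))
print(effect.round(3))  # → [0.5   0.333 0.2   0.385]
```

The regression then estimates how these transformed media signals, alongside control variables, explain the KPI; the Bayesian part places priors on the transformation and effect parameters.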

For years, advertisers and media agencies have relied on Bayesian MMM to understand marketing channel contributions and to allocate marketing budgets.

The Role of GenAI in Modern MMM

An increasing number of companies are now utilizing GenAI to enhance MMM in several ways:

1. Data preparation and feature engineering
2. Pipeline automation: generating code for the MMM pipeline
3. Insight explanation: translating model insights into plain business language
4. Scenario planning and budget optimization

While these capabilities are powerful, they rely on proprietary MMM engines.

The purpose of this article is not to showcase how Bayesian MMM works, but to demonstrate a potential open-source, free system design that marketers can explore without subscribing to the black-box MMM stacks that industry vendors provide.

The approach combines:

1. Google Meridian as the open-source Bayesian MMM engine
2. An open-source Large Language Model (LLM), Mistral 7B, as an insight and interaction layer on top of Meridian's Bayesian inference output.

Here is an architecture diagram that represents the proposed open-source system design for marketers.

This architecture diagram was created using Gen-AI-assisted design tools for rapid prototyping.

This open-source workflow has several benefits:

  1. Democratization of Bayesian MMM: eliminates the black box problem of proprietary MMM tools.
  2. Cost Efficiency: reduces financial barrier for small/medium businesses to access advanced analytics.
3. Separation of concerns: keeps the statistical rigor required of MMM engines intact while making the results more accessible.
4. With a GenAI insights layer, audiences do not need to understand the Bayesian math; instead, they can interact through GenAI prompts to learn about model insights on channel contribution, ROI, and possible budget-allocation strategies.
5. Adaptability to newer open-source tools: the GenAI layer can be replaced with newer LLMs as they become openly available, for enhanced insights.
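The separation of concerns described above can be sketched as a thin interface: the MMM engine produces numeric outputs, and the LLM layer only ever sees those summarized outputs wrapped in an instruction prompt. This is an illustrative sketch, with made-up numbers and a hypothetical helper function, not part of Meridian's API.

```python
import json

def build_insight_prompt(mmm_outputs: dict, question: str) -> str:
    """Wrap the MMM engine's numeric outputs in an instruction prompt.

    The LLM never touches raw data or model internals; it only sees the
    engine's summarized outputs, which keeps the statistics (MMM engine)
    and the narration (GenAI layer) cleanly separated and swappable.
    """
    return (
        "You are a marketing mix modeling expert.\n"
        f"{question}\n\n"
        f"JSON:\n{json.dumps(mmm_outputs, indent=2)}"
    )

outputs = {"roi": {"tv": 3.0, "email": 0.7}}  # illustrative numbers only
prompt = build_insight_prompt(outputs, "Summarize channel efficiency.")
print(prompt.splitlines()[0])  # → You are a marketing mix modeling expert.
```

Because the interface is just JSON-in, text-out, either side can be replaced independently: swap Meridian for Robyn, or Mistral 7B for a newer open LLM.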

Hands-on example: implementing a Google Meridian MMM model with an LLM layer

For the purpose of this showcase, I have used the open-source model Mistral 7B, downloaded locally from the Hugging Face platform and executed with the llama-cpp-python engine.

This framework is meant to be engine-agnostic: alternative open-source MMM models such as Meta's Robyn or PyMC-Marketing, and other GPT- or Llama-family LLMs, can be substituted depending on the scale and scope of the insights desired.

Important notes:

  1. A synthetic marketing dataset was created, having a KPI such as ‘Conversions’ and marketing channels such as TV, Search, Paid Social, Email, and OOH (Out-of-Home media).
2. Google Meridian produces rich outputs such as ROI, channel coefficients and contributions in driving the KPI, response curves, etc. While these outputs are statistically sound, they often require specialized expertise to interpret. This is where an LLM becomes valuable and can be used as an insight translator.
3. Google Meridian Python code examples were used to run the Meridian MMM model on the synthetic marketing data. For more information on how to run Meridian code, please refer to the official Meridian documentation.
  4. An open-source LLM model, Mistral 7B, was utilized due to its compatibility with the free tier of Google Colab GPU resources and also for being an adequate model for generating instruction-based insights without relying on any API access requirements.
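The synthetic dataset mentioned in note 1 is not shown in the original code, so here is a minimal sketch of how such a dataset might be generated. The column names match the builder calls used later (`conversions`, `revenue_per_conversion`, `population`, the two control columns, and per-channel `_impression`/`_spend` columns); the coefficients and distributions are entirely arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_weeks, geos = 104, ["geo_a", "geo_b"]
channels = ["tv", "paid_search", "paid_social", "email", "ooh"]

rows = []
for geo in geos:
    for week in pd.date_range("2022-01-03", periods=n_weeks, freq="W-MON"):
        row = {
            "time": week, "geo": geo,
            "population": 1_000_000,
            "sentiment_score_control": rng.normal(0, 1),
            "competitor_sales_control": rng.normal(100, 10),
        }
        spend = {ch: rng.uniform(1_000, 10_000) for ch in channels}
        for ch in channels:
            row[f"{ch}_spend"] = spend[ch]
            # crude impressions-per-dollar assumption, illustrative only
            row[f"{ch}_impression"] = spend[ch] * rng.uniform(80, 120)
        # KPI loosely driven by total spend plus noise
        row["conversions"] = 500 + 0.01 * sum(spend.values()) + rng.normal(0, 50)
        row["revenue_per_conversion"] = rng.uniform(20, 40)
        rows.append(row)

df = pd.DataFrame(rows)
print(df.shape)  # → (208, 17)
```

A real exercise would bake known adstock and saturation effects into `conversions` so the model has ground truth to recover; this sketch only illustrates the required shape.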

Example: the below snippet of Python code was executed in the Google Colab platform:

# Install Meridian from PyPI @ latest release
!pip install --upgrade google-meridian[colab,and-cuda,schema]

# Import dependencies
import IPython
from meridian import constants
from meridian.analysis import analyzer
from meridian.analysis import optimizer
from meridian.analysis import summarizer
from meridian.analysis import visualizer
from meridian.analysis.review import reviewer
from meridian.data import data_frame_input_data_builder
from meridian.model import model
from meridian.model import prior_distribution
from meridian.model import spec
from schema.serde import meridian_serde
import numpy as np
import pandas as pd
import tensorflow_probability as tfp  # used below to define ROI priors

A synthetic marketing dataset (not shown in this code) was created, and as part of the Meridian workflow requirement, an input data builder instance is created as shown below:

builder = data_frame_input_data_builder.DataFrameInputDataBuilder( 
   kpi_type='non_revenue', 
   default_kpi_column='conversions', 
   default_revenue_per_kpi_column='revenue_per_conversion', 
   ) 

builder = ( 
   builder.with_kpi(df) 
  .with_revenue_per_kpi(df) 
  .with_population(df) 
  .with_controls( 
  df, control_cols=["sentiment_score_control", "competitor_sales_control"] ) 
  ) 

channels = ["tv","paid_search","paid_social","email","ooh"] 

builder = builder.with_media( 
  df, 
  media_cols=[f"{channel}_impression" for channel in channels], 
  media_spend_cols=[f"{channel}_spend" for channel in channels], 
  media_channels=channels, 
  ) 

data = builder.build()  # Build the input data

Configure and execute the Meridian MMM model:

# Initialize the Meridian class by passing the loaded data and a customized model
# specification. One advantage of Meridian is the ability to set priors per media
# channel, which lets modelers encode historical knowledge of media behavior into
# each channel's distribution.

roi_mu = 0.2  # Mu for ROI prior for each media channel.
roi_sigma = 0.9  # Sigma for ROI prior for each media channel.

prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M)
)

model_spec = spec.ModelSpec(prior=prior, enable_aks=True)

mmm = model.Meridian(input_data=data, model_spec=model_spec)


mmm.sample_prior(500)
mmm.sample_posterior(
    n_chains=10, n_adapt=2000, n_burnin=500, n_keep=1000, seed=0
)

This code snippet runs the Meridian model with the defined priors on the input dataset. The next step is to assess model performance. While numeric diagnostics such as R-squared and MAPE can be assessed, for the purpose of this article I am including just a visual assessment example:

model_fit = visualizer.ModelFit(mmm)
model_fit.plot_model_fit()
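For readers who want the numeric diagnostics mentioned above alongside the visual check, R-squared and MAPE can be computed from any actual-vs-fitted KPI series. This is a generic sketch (Meridian exposes its own diagnostics through its analysis modules); the sample numbers are invented.

```python
import numpy as np

def fit_metrics(actual, predicted):
    """R-squared and MAPE for an actual vs. fitted KPI series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    return r_squared, mape

# invented weekly KPI values for illustration
r2, mape = fit_metrics([100, 120, 90, 110], [98, 118, 95, 108])
print(round(r2, 3), round(mape, 2))  # → 0.926 2.76
```

A high R-squared with a low MAPE suggests the model tracks the KPI well, though out-of-sample checks remain the stronger test.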

Now that the Meridian MMM model has been executed, we have model output parameters for each media channel, such as ROI, response curves, model coefficients, spend levels, etc. We can bring all this information into a single input JSON object that can be used directly as an input to the LLM to generate insights:

import json

# roi, coeffs, priors, and response_curves are pandas DataFrames extracted
# from Meridian's analysis outputs (extraction code not shown).
# Combine everything into one dictionary
genai_input = {
    "roi": roi.to_dict(orient='records'),
    "coefficients": coeffs.to_dict(orient='records'),
    "priors": priors.to_dict(orient='records'),
    "response_curves": response_curves.to_dict(orient='records')
}

# Convert to JSON string for the LLM
genai_input_json = json.dumps(genai_input, indent=2)

Downloading Mistral 7B LLM from the Hugging Face platform locally and installing the required Llama engine to execute the LLM:

# Download the Mistral 7B LLM from Hugging Face
!mkdir -p /content/models
!wget -O /content/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf

# Install the llama-cpp-python engine
!pip install llama-cpp-python --upgrade

Executing the Mistral LLM using the input JSON having Meridian MMM output and including the appropriate instructional prompt:

from llama_cpp import Llama

# Initialize the model
llm = Llama(
    model_path="/content/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=2048,         # context window size
    n_gpu_layers=35,    # offload layers to the Colab GPU
    seed=42,
    verbose=False,
)

# Convert JSON to a prompt for recommendations
prompt = f"""
You are a marketing mix modeling expert.

1. Summarize the efficiency of each marketing channel based on the MMM output.
2. Explain the key drivers of ROI for each channel in simple business terms.
3. Identify channels with diminishing returns.
4. If we want to maximize total ROI with the current budget, recommend how we should reallocate spend across channels.

JSON:
{genai_input_json}
"""

# Generate recommendations (a low temperature keeps the output focused)
response = llm(prompt, max_tokens=300, temperature=0.2)

# Print only the text output
print(response["choices"][0]["text"])

Example Output:

1. Based on the Marketing Mix Modeling (MMM) output, TV, OOH, and Search Ads have a positive ROI (3.0, 2.5, and 1.5 respectively), while Social Media and Email have a lower ROI (0.9 and 0.7 respectively).

2. The key drivers of ROI for each channel in simple business terms are:

   – TV: Every dollar spent on TV ads generates $3 in sales.

   – Social Media: Every dollar spent on Social Media ads generates $0.9 in sales.

   – Search Ads: Every dollar spent on Search Ads generates $2.5 in sales.

   – Email: Every dollar spent on Email marketing generates $0.7 in sales.

   – OOH: Every dollar spent on Out of Home (OOH) ads generates $1.5 in sales.

3. Channels with diminishing returns are Social Media and Email as their response curves show that the sales growth is not proportional to the increasing marketing spend.

4. To maximize total ROI with the current budget, we should consider reallocating spend from Social Media and Email towards TV and OOH as they have a higher ROI.
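The reallocation idea in point 4 can be illustrated with a deliberately naive sketch: shift a fraction of spend from below-average-ROI channels to the top-ROI channel. The function and the spend figures are hypothetical; a real optimizer (such as Meridian's optimizer module) works from response curves, since diminishing returns mean marginal ROI falls as spend concentrates.

```python
def reallocate_budget(spend, roi, shift_fraction=0.2):
    """Naively move a fraction of spend from below-average-ROI channels
    to the highest-ROI channel, keeping the total budget constant."""
    avg_roi = sum(roi.values()) / len(roi)
    best = max(roi, key=roi.get)
    new_spend = dict(spend)
    for ch, r in roi.items():
        if r < avg_roi and ch != best:
            moved = shift_fraction * spend[ch]
            new_spend[ch] -= moved
            new_spend[best] += moved
    return new_spend

# hypothetical current spend; ROIs taken from the example output above
spend = {"tv": 50_000, "search": 30_000, "social": 20_000, "email": 10_000, "ooh": 15_000}
roi = {"tv": 3.0, "search": 2.5, "social": 0.9, "email": 0.7, "ooh": 1.5}
print(reallocate_budget(spend, roi))  # tv gains 9,000 moved from social, email, and ooh
```

The total budget is preserved; only the mix changes. In practice the shift fraction would be bounded by the response curves so the receiving channel is not pushed past its saturation point.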

Practical Considerations

  • Model quality and insights are still dependent on input data quality.
  • Prompt design is critical to avoid misleading insights.
  • Automation of input-data processing and of model-output reporting and visualization will help this stack operate at scale.

Final thoughts

This walkthrough illustrates how an open-source Bayesian MMM, augmented with a GenAI workflow, can translate complex Bayesian results into actionable insights for marketers and leaders.

This approach does not attempt to simplify the math behind Marketing Mix Models; instead, it preserves that math while attempting to make it more accessible to broader audiences with limited modeling knowledge and budget resources.

As privacy-safe marketing analytics becomes a norm, open-source MMM systems with GenAI augmentation offer a sustainable path: transparent, adaptable, and designed to evolve with both business and underlying technology.
