Dog Poop Compass. A Bayesian analysis of canine business | by Dima Sergeev | Nov, 2024



$$p(\mu \mid \theta) \propto \exp\big(\kappa_{\text{post}} \cos(\mu - \mu_{\text{post}})\big)$$

or, equivalently, a von Mises distribution $\text{VM}(\mu_{\text{post}}, \kappa_{\text{post}})$, where

  • $\kappa_{\text{post}}$ is the posterior concentration parameter: $\kappa_{\text{post}} = \sqrt{C^2 + S^2}$, with $C = \kappa_{\text{prior}}\cos\mu_{\text{prior}} + \kappa_{\text{likelihood}}\cos\theta$ and $S = \kappa_{\text{prior}}\sin\mu_{\text{prior}} + \kappa_{\text{likelihood}}\sin\theta$
  • $\mu_{\text{post}}$ is the posterior mean direction: $\mu_{\text{post}} = \operatorname{atan2}(S, C)$

Yay, we made it! The posterior is also a von Mises distribution with updated parameters (mu_post, kappa_post). Now we can update the prior with every new observation and see how the mean direction changes.
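In code, one update step is tiny. Here's the same rule as a minimal standalone sketch in plain numpy (the function name vonmises_update is mine, not from any library):

import numpy as np

def vonmises_update(mu_prior, kappa_prior, theta, kappa_likelihood):
    # Add the prior and likelihood "mean vectors" in Cartesian coordinates...
    C = kappa_prior * np.cos(mu_prior) + kappa_likelihood * np.cos(theta)
    S = kappa_prior * np.sin(mu_prior) + kappa_likelihood * np.sin(theta)
    # ...and read the posterior parameters back off in polar form
    return np.arctan2(S, C), np.sqrt(C**2 + S**2)  # (mu_post, kappa_post)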

Welcome back to those of you who skipped the math, and congratulations to those who made it through! Let’s code the Bayesian update and visualize the results.

First, let’s define a helper function for visualizing the posterior distribution. We’ll need it to create a nice animation later on.

import imageio
from io import BytesIO
from typing import List, Tuple  # used in the type hints below

def get_posterior_distribution_image_array(
    mu_grid: np.ndarray,
    posterior_pdf: np.ndarray,
    current_samples: List[float],
    idx: int,
    fig_size: Tuple[int, int],
    dpi: int,
    r_max_posterior: float
) -> np.ndarray:
    """
    Creates the posterior distribution and observed samples histogram on a polar plot,
    converts it to an image array, and returns it for GIF processing.

    Parameters:
    -----------
    mu_grid (np.ndarray):
        Grid of mean direction values for plotting the posterior PDF.
    posterior_pdf (np.ndarray):
        Posterior probability density function values for the given `mu_grid`.
    current_samples (List[float]):
        List of observed angle samples in radians.
    idx (int):
        The current step index, used for labeling the plot.
    fig_size (Tuple[int, int]):
        Size of the plot figure (width, height).
    dpi (int):
        Dots per inch (resolution) for the plot.
    r_max_posterior (float):
        Maximum radius for the posterior PDF plot, used to set plot limits.

    Returns:
    --------
    np.ndarray: Image array of the plot in RGB format, suitable for GIF processing.
    """
    fig = plt.figure(figsize=fig_size, dpi=dpi)
    ax = plt.subplot(1, 1, 1, projection='polar')
    ax.set_theta_zero_location('N')
    ax.set_theta_direction(-1)
    ax.plot(mu_grid, posterior_pdf, color='red', linewidth=2, label='Posterior PDF')

    # observed samples histogram
    n_bins = 48
    hist_bins = np.linspace(-np.pi, np.pi, n_bins + 1)
    hist_counts, _ = np.histogram(current_samples, bins=hist_bins)

    # normalize the histogram counts
    if np.max(hist_counts) > 0:
        hist_counts_normalized = hist_counts / np.max(hist_counts)
    else:
        hist_counts_normalized = hist_counts

    bin_centers = (hist_bins[:-1] + hist_bins[1:]) / 2
    bin_width = hist_bins[1] - hist_bins[0]

    # set the maximum radius to accommodate both the posterior pdf and histogram bars
    r_histogram_height = r_max_posterior * 0.9
    r_max = r_max_posterior + r_histogram_height
    ax.set_ylim(0, r_max)

    # plot the histogram bars outside the circle
    for i in range(len(hist_counts_normalized)):
        theta = bin_centers[i]
        width = bin_width
        hist_height = hist_counts_normalized[i] * r_histogram_height
        if hist_counts_normalized[i] > 0:
            ax.bar(
                theta, hist_height, width=width, bottom=r_max_posterior,
                color='teal', edgecolor='black', alpha=0.5
            )

    ax.text(
        0.5, 1.1, f'Posterior Distribution (Step {idx + 1})',
        transform=ax.transAxes, ha='center', va='bottom', fontsize=18
    )
    ax.set_yticklabels([])
    ax.grid(linestyle='--')
    ax.yaxis.set_visible(False)
    ax.spines['polar'].set_visible(False)
    plt.subplots_adjust(top=0.85, bottom=0.05, left=0.05, right=0.95)

    # saving to buffer for gif processing
    buf = BytesIO()
    plt.savefig(buf, format='png', bbox_inches=None, pad_inches=0)
    buf.seek(0)
    img_array = plt.imread(buf)
    img_array = (img_array * 255).astype(np.uint8)
    plt.close(fig)
    return img_array

Now we’re ready to write the update loop. Remember that we need to set a prior distribution: I’ll start with a circular uniform distribution, which is equivalent to a von Mises distribution with a concentration parameter of 0. For kappa_likelihood I set a fixed, moderate value of 2, which makes the posterior update more visible.
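By the way, it’s easy to verify that κ = 0 really gives the circular uniform density: plugging κ = 0 into the von Mises PDF (a quick sketch; i0 is scipy’s modified Bessel function from the normalizer, and i0(0) = 1):

from scipy.special import i0

kappa = 0.0
density = np.exp(kappa * np.cos(0.0)) / (2 * np.pi * i0(kappa))
print(density)  # 0.15915... == 1 / (2 * pi), independent of the angle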

# initial prior parameters
mu_prior = 0.0     # initial mean direction (any value, since kappa_prior = 0)
kappa_prior = 0.0  # uniform prior over the circle

# fixed concentration parameter for the likelihood
kappa_likelihood = 2.0

posterior_mus = []
posterior_kappas = []

mu_grid = np.linspace(-np.pi, np.pi, 200)

# visualization parameters
fig_size = (10, 10)
dpi = 100

current_samples = []
frames = []

for idx, theta_n in enumerate(data['radians']):

    # compute posterior parameters
    C = kappa_prior * np.cos(mu_prior) + kappa_likelihood * np.cos(theta_n)
    S = kappa_prior * np.sin(mu_prior) + kappa_likelihood * np.sin(theta_n)
    kappa_post = np.sqrt(C**2 + S**2)
    mu_post = np.arctan2(S, C)

    # posterior distribution
    posterior_pdf = np.exp(kappa_post * np.cos(mu_grid - mu_post)) / (2 * np.pi * i0(kappa_post))

    # store posterior parameters and observed samples
    posterior_mus.append(mu_post)
    posterior_kappas.append(kappa_post)
    current_samples.append(theta_n)

    # plot posterior distribution
    r_max_posterior = max(posterior_pdf) * 1.1
    img_array = get_posterior_distribution_image_array(
        mu_grid,
        posterior_pdf,
        current_samples,
        idx,
        fig_size,
        dpi,
        r_max_posterior
    )
    frames.append(img_array)

    # updating priors for next iteration
    mu_prior = mu_post
    kappa_prior = kappa_post

# Create GIF
fps = 10
frames.extend([img_array] * fps * 3)  # repeat the last frame to "pause" at the end of the GIF
imageio.mimsave('../images/posterior_updates.gif', frames, fps=fps)

And that’s it! The code will generate a GIF showing the posterior distribution update with every new observation. Here’s the glorious result:

Posterior Distribution Updates (image by author)

With every new observation the posterior distribution gets more and more concentrated around the true mean direction. If only I could replace the red line with Auri’s silhouette, it would be perfect!

We can further visualize the history of the posterior mean direction and concentration parameter. Let’s plot them:

# Convert posterior_mus to degrees
posterior_mus_deg = np.rad2deg(posterior_mus) % 360
n_samples = data.shape[0]
true_mu = data['degrees'].mean()
# Plot evolution of posterior mean direction
fig, ax1 = plt.subplots(figsize=(12, 6))

color = 'tab:blue'
ax1.set_xlabel('Number of Observations')
ax1.set_ylabel('Posterior Mean Direction (Degrees)', color=color)
ax1.plot(range(1, n_samples + 1), posterior_mus_deg, marker='o', color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax1.axhline(true_mu, color='red', linestyle='--', label='Sample Distribution Mean Direction')
ax1.legend(loc='upper left')
ax1.grid(True)

ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
color = 'tab:orange'
ax2.set_ylabel('Posterior Concentration Parameter (kappa)', color=color) # we already handled the x-label with ax1
ax2.plot(range(1, n_samples + 1), posterior_kappas, marker='o', color=color)
ax2.tick_params(axis='y', labelcolor=color)

fig.tight_layout() # otherwise the right y-label is slightly clipped
sns.despine()
plt.title('Evolution of Posterior Mean Direction and Concentration Over Time')
plt.show()

Posterior mu, kappa evolution (image by author)

The plot shows how the posterior mean direction and concentration parameter evolve with every new observation. The mean direction eventually converges to the sample value, while the concentration parameter increases, as the estimate becomes more certain.

The last thing I wanted to try was to use the Bayes Factor approach for hypothesis testing. The idea behind the Bayes Factor is very simple: it’s the ratio of two marginal likelihoods for two competing hypotheses/models.

In general, the Bayes Factor for two models $M_i$ and $M_j$ is defined as:

$$BF_{ij} = \frac{p(D \mid M_i)}{p(D \mid M_j)} = \frac{p(M_i \mid D)}{p(M_j \mid D)} \cdot \frac{p(M_j)}{p(M_i)}$$

where:

  • $p(D \mid M_i)$ and $p(D \mid M_j)$ are the marginal likelihoods of the data under hypotheses $M_i$ and $M_j$
  • $p(M_i \mid D)$ and $p(M_j \mid D)$ are the posterior probabilities of the models given the data
  • $p(M_i)$ and $p(M_j)$ are the prior probabilities of the models

The result is a number that tells us how much more likely one hypothesis is compared to the other. There are different approaches to interpreting the Bayes Factor, but a common one is Jeffreys’ scale, which grades the evidence from “barely worth mentioning” (BF just above 1) through “substantial”, “strong”, and “very strong”, up to “decisive” (BF above 100).
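To keep this interpretation handy in code, here’s Jeffreys’ scale wrapped in a tiny helper (a sketch: the half-decade cut-offs are the standard convention, and the function name is mine):

def jeffreys_label(bf: float) -> str:
    # Interpret a Bayes Factor (alternative over null) on Jeffreys' scale
    if bf < 1:
        return 'supports the null hypothesis'
    for cutoff, label in [
        (10 ** 0.5, 'barely worth mentioning'),  # 1 to ~3.16
        (10 ** 1.0, 'substantial'),              # ~3.16 to 10
        (10 ** 1.5, 'strong'),                   # 10 to ~31.6
        (10 ** 2.0, 'very strong'),              # ~31.6 to 100
    ]:
        if bf < cutoff:
            return label
    return 'decisive'                            # above 100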

What are the models, you might ask? Simple: they are distributions with different parameters. I’ll be using PyMC to define the models and sample from their posterior distributions.

First of all, let’s re-introduce the null hypothesis. I still assume it’s a circular uniform von Mises distribution with kappa=0, but this time we need to calculate the likelihood of the data under this hypothesis. To simplify further calculations, we’ll work with log-likelihoods.

# Calculate log likelihood for H0
log_likelihood_h0 = vonmises.logpdf(data['radians'], kappa=0, loc=0).sum()
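Since the uniform density on the circle is 1/(2π), this sum should come out to exactly −n · log(2π); a quick sanity check (a sketch, where n_obs is my own variable name):

n_obs = len(data['radians'])
print(np.isclose(log_likelihood_h0, -n_obs * np.log(2 * np.pi)))  # expected: True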

Next, it’s time to build the alternative model, starting with a simple scenario: a unimodal South direction, where I assume the distribution is concentrated around 180°, or π in radians.

Let’s define the model in PyMC. We’ll use a von Mises likelihood with a fixed location parameter μ = π and a Half-Normal prior for the non-negative concentration parameter κ. This lets the model learn the concentration parameter from the data and check whether the South direction is preferred.

import pymc as pm
import arviz as az
import arviz.data.inference_data as InferenceData
from scipy.stats import halfnorm, gaussian_kde

with pm.Model() as model_uni:
    # Prior for kappa
    kappa = pm.HalfNormal('kappa', sigma=10)
    # Likelihood
    likelihood_h1 = pm.VonMises('angles', mu=np.pi, kappa=kappa, observed=data['radians'])
    # Sample from posterior
    trace_uni = pm.sample(
        10000, tune=3000, chains=4,
        return_inferencedata=True,
        idata_kwargs={'log_likelihood': True})

This gives us a nice simple model which we can also visualize:

# Model graph
pm.model_to_graphviz(model_uni)
PyMC model graph (image by author)

And here’s the posterior distribution for the concentration parameter κ:

az.plot_posterior(trace_uni, var_names=['kappa'])
plt.show()
Posterior kappa distribution (image by author)

All that’s left is to calculate the log-likelihood for the alternative model and the Bayes Factor.

# Posterior samples for kappa
kappa_samples = trace_uni.posterior.kappa.values.flatten()

# Log likelihood for each posterior sample
log_likes = []
for k in kappa_samples:
    # von Mises log likelihood under H1 for this kappa
    log_like = vonmises.logpdf(data['radians'], k, loc=np.pi).sum()
    log_likes.append(log_like)

# Log-mean-exp trick for numerical stability
log_likelihood_h1 = np.max(log_likes) + \
    np.log(np.mean(np.exp(log_likes - np.max(log_likes))))

BF = np.exp(log_likelihood_h1 - log_likelihood_h0)
print(f"Bayes Factor: {BF:.4f}")
print(f"Probability kappa > 0.5: {np.mean(kappa_samples > 0.5):.4f}")

>> Bayes Factor: 32.4645
>> Probability kappa > 0.5: 0.0649
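As a side note, the “log-mean-exp” above is equivalent to scipy’s logsumexp minus log N, in case you prefer the library routine (a sketch of a drop-in helper):

from scipy.special import logsumexp

def log_mean_exp(log_values):
    # log(mean(exp(x))), computed without overflow
    return logsumexp(log_values) - np.log(len(log_values))

# log_mean_exp(np.array(log_likes)) matches log_likelihood_h1 above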

Since we’re dividing the likelihood of the alternative model by the likelihood of the null model, the Bayes Factor indicates how much more likely the data is under the alternative hypothesis. In this case we get 32.46: very strong evidence on Jeffreys’ scale that the data is not uniformly distributed around the circle and that there is a preference for the South direction.

However, we additionally calculated the probability that the concentration parameter kappa is greater than 0.5, a simple way to check whether the distribution differs meaningfully from the uniform one. With the Unimodal South model, this probability is only 0.0649, meaning the distribution is still quite spread out.

Let’s try another model: Bimodal North-South Mixture.

This time I’ll assume that the distribution is bimodal with peaks around 0° and 180°, just as we’ve seen in the compass rose.

To achieve this, I’ll need a mixture of two von Mises distributions with different fixed mean directions and a shared concentration parameter.
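Concretely, the density we’re fitting is a weighted sum of von Mises components with a shared concentration:

$$p(\theta) = \sum_{k=1}^{2} w_k \, \frac{\exp\big(\kappa \cos(\theta - \mu_k)\big)}{2\pi I_0(\kappa)}, \qquad w_1 + w_2 = 1$$

with fixed mean directions $\mu_1 = 0$ and $\mu_2 = \pi$.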

Let’s define a few helper functions first:

import numpy.typing as npt
from typing import Dict, Union

# Type aliases
ArrayLike = Union[np.ndarray, pd.Series]
ResultDict = Dict[str, Union[float, InferenceData.InferenceData]]

def compute_mixture_vonmises_logpdf(
    series: ArrayLike,
    kappa: float,
    weights: npt.NDArray[np.float64],
    mus: List[float]
) -> float:
    """
    Compute the log PDF for a mixture of von Mises distributions

    Parameters:
    -----------
    series: ArrayLike
        Array of observed angles in radians
    kappa: float
        Concentration parameter
    weights: npt.NDArray[np.float64]
        Array of mixture weights
    mus: List[float]
        Mean directions, one per component

    Returns:
    --------
    float: Sum of log probabilities for all data points
    """
    mixture_pdf = np.zeros_like(series)

    for w, mu in zip(weights, mus):
        mixture_pdf += w * vonmises.pdf(series, kappa, loc=mu)

    return np.log(np.maximum(mixture_pdf, 1e-300)).sum()

def compute_log_likelihoods(
    trace: az.InferenceData,
    series: ArrayLike,
    mus: List[float]
) -> float:
    """
    Compute the marginal log likelihood over the posterior samples in the trace

    Parameters:
    -----------
    trace: az.InferenceData
        The trace from PyMC model sampling.
    series: ArrayLike
        Array of observed angles in radians
    mus: List[float]
        Fixed mean directions of the mixture components
    """
    kappa_samples = trace.posterior.kappa.values.flatten()
    # one weight vector per posterior sample, one entry per component
    weights_samples = trace.posterior.weights.values.reshape(-1, len(mus))

    # Calculate log likelihood for each posterior sample
    log_likes = []
    for k, w in zip(kappa_samples, weights_samples):
        log_like = compute_mixture_vonmises_logpdf(
            series,
            kappa=k,
            weights=w,
            mus=mus
        )
        log_likes.append(log_like)

    # Calculate marginal likelihood using the log-sum-exp trick
    log_likelihood_h1 = np.max(log_likes) + np.log(np.mean(np.exp(log_likes - np.max(log_likes))))
    return log_likelihood_h1

def posterior_report(
    log_likelihood_h0: float,
    log_likelihood_h1: float,
    kappa_samples: ArrayLike,
    kappa_threshold: float = 0.5
) -> str:
    """
    Generate a report with the Bayes Factor and probability kappa > threshold

    Parameters:
    -----------
    log_likelihood_h0: float
        Log likelihood for the null hypothesis
    log_likelihood_h1: float
        Log likelihood for the alternative hypothesis
    kappa_samples: ArrayLike
        Flattened posterior samples of the concentration parameter
    kappa_threshold: float
        Threshold for computing the probability that kappa > threshold

    Returns:
    --------
    summary: str
        A formatted string containing the summary statistics.
    """
    BF = np.exp(log_likelihood_h1 - log_likelihood_h0)

    summary = (
        f"Bayes Factor: {BF:.4f}\n"
        f"Probability kappa > {kappa_threshold}: {np.mean(kappa_samples > kappa_threshold):.4f}"
    )

    return summary

And now back to the model:

mu1 = 0      # 0 degrees
mu2 = np.pi  # 180 degrees

with pm.Model() as model_mixture_bimodal_NS:
    # Prior for the shared concentration parameter
    kappa = pm.HalfNormal('kappa', sigma=10)
    # Prior for component weights
    weights = pm.Dirichlet('weights', a=np.ones(2))

    # Define the two von Mises components
    vm1 = pm.VonMises.dist(mu=mu1, kappa=kappa)
    vm2 = pm.VonMises.dist(mu=mu2, kappa=kappa)

    # Mixture distribution
    likelihood = pm.Mixture(
        'angles',
        w=weights,
        comp_dists=[vm1, vm2],
        observed=data['radians']
    )

    # Sample from the posterior
    trace_mixture_bimodal_NS = pm.sample(
        10000, tune=3000, chains=4, return_inferencedata=True, idata_kwargs={'log_likelihood': True})

# Get kappa samples
kappa_samples = trace_mixture_bimodal_NS.posterior.kappa.values.flatten()

Once again, let’s visualize the model graph and the posterior distribution for the concentration parameter κ:

# Model graph
pm.model_to_graphviz(model_mixture_bimodal_NS)
PyMC model graph (image by author)
# Posterior Analysis
az.plot_posterior(trace_mixture_bimodal_NS, var_names=['kappa'])
plt.show()
Posterior kappa distribution (image by author)

And finally, let’s calculate the Bayes Factor and the probability that the concentration parameter κ is greater than 0.5:

log_likelihood_h1 = compute_log_likelihoods(trace_mixture_bimodal_NS, data['radians'], [mu1, mu2])
print(posterior_report(log_likelihood_h0, log_likelihood_h1, kappa_samples))
>> Bayes Factor: 214.2333
>> Probability kappa > 0.5: 0.9110

Fantastic! Both of our metrics indicate that this model is a much better fit for the data. The Bayes Factor suggests decisive evidence, and most of the posterior κ samples are greater than 0.5, with a mean value of 0.99, as we saw in the distribution plot.

Let’s try a couple more models before wrapping up.

This model, the Bimodal West-South mixture, once again assumes a bimodal distribution, but this time with peaks around 270° and 180°, which were common directions in the compass rose.

mu1 = np.pi          # 180 degrees
mu2 = 3 * np.pi / 2  # 270 degrees

with pm.Model() as model_mixture_bimodal_WS:
    # Prior for the shared concentration parameter
    kappa = pm.HalfNormal('kappa', sigma=10)

    # Prior for component weights
    weights = pm.Dirichlet('weights', a=np.ones(2))

    # Define the two von Mises components
    vm1 = pm.VonMises.dist(mu=mu1, kappa=kappa)
    vm2 = pm.VonMises.dist(mu=mu2, kappa=kappa)

    # Mixture distribution
    likelihood = pm.Mixture(
        'angles',
        w=weights,
        comp_dists=[vm1, vm2],
        observed=data['radians']
    )

    # Sample from the posterior
    trace_mixture_bimodal_WS = pm.sample(
        10000, tune=3000, chains=4, return_inferencedata=True, idata_kwargs={'log_likelihood': True})

# Get kappa samples
kappa_samples = trace_mixture_bimodal_WS.posterior.kappa.values.flatten()

# Posterior Analysis
az.plot_posterior(trace_mixture_bimodal_WS, var_names=['kappa'])
plt.show()

log_likelihood_h1 = compute_log_likelihoods(trace_mixture_bimodal_WS, data['radians'], [mu1, mu2])
print(posterior_report(log_likelihood_h0, log_likelihood_h1, kappa_samples))

>> Bayes Factor: 20.2361
>> Probability kappa > 0.5: 0.1329
Posterior kappa distribution (image by author)

Nope, definitely not as good as the previous model. Next!

Final round. Maybe my dog really likes to align himself with the cardinal directions? Let’s try a quadrimodal distribution with peaks around 0°, 90°, 180°, and 270°.

mu1 = 0              # 0 degrees
mu2 = np.pi / 2      # 90 degrees
mu3 = np.pi          # 180 degrees
mu4 = 3 * np.pi / 2  # 270 degrees

with pm.Model() as model_mixture_quad:
    # Prior for the shared concentration parameter
    kappa = pm.HalfNormal('kappa', sigma=10)

    # Prior for component weights
    weights = pm.Dirichlet('weights', a=np.ones(4))

    # Define the four von Mises components
    vm1 = pm.VonMises.dist(mu=mu1, kappa=kappa)
    vm2 = pm.VonMises.dist(mu=mu2, kappa=kappa)
    vm3 = pm.VonMises.dist(mu=mu3, kappa=kappa)
    vm4 = pm.VonMises.dist(mu=mu4, kappa=kappa)

    # Mixture distribution
    likelihood = pm.Mixture(
        'angles',
        w=weights,
        comp_dists=[vm1, vm2, vm3, vm4],
        observed=data['radians']
    )

    # Sample from the posterior
    trace_mixture_quad = pm.sample(
        10000, tune=3000, chains=4, return_inferencedata=True, idata_kwargs={'log_likelihood': True}
    )

# Get kappa samples
kappa_samples = trace_mixture_quad.posterior.kappa.values.flatten()

# Posterior Analysis
az.plot_posterior(trace_mixture_quad, var_names=['kappa'])
plt.show()

log_likelihood_h1 = compute_log_likelihoods(trace_mixture_quad, data['radians'], [mu1, mu2, mu3, mu4])
print(posterior_report(log_likelihood_h0, log_likelihood_h1, kappa_samples))

>> Bayes Factor: 0.0000
>> Probability kappa > 0.5: 0.9644
Posterior kappa distribution (image by author)

Well… not really. Although the probability that the concentration parameter κ is greater than 0.5 is quite high, the Bayes Factor is 0.0, meaning the marginal likelihood under this model is far lower than under the null.

This is the great thing about the Bayes Factor: it penalizes overly complex models, effectively preventing overfitting.

Let’s summarize the results of all the models using information criteria: the Widely Applicable Information Criterion (WAIC) and Leave-One-Out cross-validation (LOO).

# Compute WAIC for each model
waic_uni = az.waic(trace_uni)
waic_quad = az.waic(trace_mixture_quad)
waic_bimodal_NS = az.waic(trace_mixture_bimodal_NS)
waic_bimodal_WS = az.waic(trace_mixture_bimodal_WS)

model_dict = {
    'Quadrimodal Model': trace_mixture_quad,
    'Bimodal Model (NS)': trace_mixture_bimodal_NS,
    'Bimodal Model (WS)': trace_mixture_bimodal_WS,
    'Unimodal Model': trace_uni
}

# Compare models using WAIC
waic_comparison = az.compare(model_dict, ic='waic')
waic_comparison
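Since we promised LOO as well, the same comparison works with ic='loo' (mirroring the WAIC call above):

# Compare models using LOO
loo_comparison = az.compare(model_dict, ic='loo')
loo_comparison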
