Synergy in Clicks: Harsanyi Dividends for E-Commerce



Have you ever played a cooperative game or sport? Let's consider an example from the professional world. Say you are part of an organization whose primary means of driving product sales is its e-commerce site. Within this organization, you likely have various marketing teams that drive customers to the website through online advertisements, email campaigns, and other channels. The website itself is maintained by another set of teams responsible for design, merchandising, recommendation strategies, and much more. The organization must also consider the product itself and the teams that create, improve, and develop new products. This poses a significant question: how can value be properly attributed so that more strategic decisions can be made going forward? And if that question were not already difficult enough, consider a more realistic viewpoint: all of these teams depend on each other to make a sale. The product teams need a way to sell the products; the organization has a website to facilitate this. For the website to make a sale, it requires customers, which is where the marketing teams come in, creating, launching, and maintaining campaigns that drive traffic to the site. Recognizing the intertwined dependencies among its teams, the business must understand value in terms of team coalitions. This is where Harsanyi Dividends come into play.

Disclaimer: The data presented in this article is entirely synthetic and purely hypothetical. It was generated solely for illustrative and educational purposes. Any resemblance to real-world data, individuals, or organizations is purely coincidental.

Harsanyi Dividends

Harsanyi dividends, a concept from cooperative game theory, measure the excess value created by coalitions in a cooperative game. The key word here is excess. A cooperative game is a concept from game theory, the study of how participants interact strategically; in a cooperative game, the participants work toward a shared goal.

Let's take a bird's-eye view of cooperative game theory, specifically transferable-utility (TU) cooperative games. In a TU cooperative game, players can form coalitions to achieve a collective payoff that is divided in an agreed-upon way. Those of us who have built predictive models using frameworks such as XGBoost or other ensemble methods have probably found ourselves using Shapley values to understand the contribution of each feature, since the model itself is a black box. Shapley values can also be used to determine each player's payoff in a TU cooperative game. There is certainly a lot of value in a framework such as Shapley values for understanding individual contributions; Harsanyi Dividends, however, quantify the extra value generated by coalitions. The two are directly related: a player's Shapley value is the sum of the dividends of every coalition containing that player, split equally among each coalition's members. Let's look at a hypothetical example.

A cooperative game — Dragon Slayer

Let’s say three friends get together to play a new co-op video game where the goal is to work together to inflict as much damage as possible on a dragon. The players are Andrew, Bryan, and Carson. They have played this game many times; however, not all players play together every time. Sometimes, it is only Andrew & Carson, Carson & Bryan, Carson by himself, etc. They have played this game so much that every possible subset of the group has played it many times, including sessions with only one player.

Carson, a data scientist by trade, wants to gain a deeper understanding of the group's performance. He gathered the scores of each session and eventually computed an average score for each coalition and individual. Take a look at these aggregate scores below. Each player is represented by their first initial, and the average score of each coalition/individual is written as v(·).

v(a) = 10

v(b) = 12

v(c) = 18

v(a,b) = 27

v(a,c) = 23

v(c,b) = 29

v(a,b,c) = 37

We can clearly see that Carson is the top individual player, while Andrew & Bryan are the top duo. You are probably not surprised that the three-player coalition yields the highest score. Carson, the curious data scientist, wants to study these interactions in greater depth, so he decides to calculate Harsanyi Dividends to see which group collaborates most effectively. Now, this is a more complicated question than it looks. We can easily see each coalition's score; however, what if we adjusted for what the individual players already contribute? We could then discover which coalitions enhance each other and which detract from what the smaller coalitions of players already deliver. In other words, where does 1 + 1 equal something more than two, and where does 1 + 1 equal something less than two?

To accomplish this, we will use the formula below:

d_v(S) = Σ_{T ⊆ S} (−1)^(|S| − |T|) · v(T)

Let’s break it down piece by piece.

d_v(S)

This represents the Harsanyi Dividend for coalition S.

Σ_{T ⊆ S}

The Sigma symbol denotes taking the sum of a series of terms in compact form. The expression just below it, T ⊆ S, tells us what we are summing over: every subset T of the coalition S, the coalition for which we are calculating the dividend. In combination with Sigma, this says we take one term per subset of S, in the specific manner discussed next. One final note on this part: the coalition S itself counts as one of its own subsets (and the empty set contributes nothing, since v(∅) = 0).

(−1)^(|S| − |T|) · v(T)

The pipe symbols around S and T indicate that we are looking at the sizes of the sets. The difference between these sizes is the power we raise negative one to in each term of the sum. That sign is then multiplied by the value of subset T.

Calculating the Dividends — Individuals

Let’s start with the individual players (Andrew, Bryan, and Carson), as this will be the most straightforward. For convenience, here are their individual aggregated scores mentioned earlier:

v(a) = 10

v(b) = 12

v(c) = 18

Can you guess what their Harsanyi Dividends are? Let’s start with Andrew and calculate it step by step (or just step).

d_v({a}) = (−1)^(|{a}| − |{a}|) · v(a) = (−1)^0 · 10 = 10

For each part of the sum, we need the subset’s value and its size. For individual players, that leaves us with only one subset (itself), so we only have to go through the loop once.

Starting with the exponent: the size of set {a} is one, and T is also of size one. This leaves us raising −1 to the 0 power, which yields 1. We then multiply that by our value for a, which is 10, yielding a Harsanyi dividend of 10 for Andrew. For individuals, the dividend is just the value.

Dividend (a) = 10

Dividend (b) = 12

Dividend (c) = 18

Calculating the Dividends — Pairs

Let's calculate the dividend for Andrew and Bryan (a, b). The subsets are (a), (b), and (a,b). Therefore, our sum will have three terms.

Sum #1, Subset (a), v(a) = 10, size of (a) = 1:

(−1)^(|{a,b}| − |{a}|) · v(a) = (−1)^(2−1) · 10 = −10

Sum #2, Subset (b), v(b) = 12, size of (b) = 1:

(−1)^(|{a,b}| − |{b}|) · v(b) = (−1)^(2−1) · 12 = −12

Sum #3, Subset (a,b), v(a,b) = 27, size of (a,b) = 2:

(−1)^(|{a,b}| − |{a,b}|) · v(a,b) = (−1)^(2−2) · 27 = 27

Add them all together, and we get:

Dividend (a,b) = 5

Let's pause here for some quick intuition behind the pair calculation. The dividend is simply the value of the pair minus the values of the individuals in the pair: 27 − (10 + 12) = 5. In other words, it shows whether the pair generates surplus value or loses value when working together. In this example, Andrew & Bryan demonstrably play the game more effectively together. Take a look at the dividends for the remaining pairs below. What insights can we derive? The first thing that comes to my mind is that Carson is probably not the best teammate, at least when he is in a pair. Let's see how things change when we look at the trio.

Dividend (a,c) = -5

Dividend (c,b) = -1

Calculating the Dividends — Trios

Buckle up, there are a lot of terms here; however, it is essential to understand which values are added and which are subtracted in the trio calculation.

  • Sum #1, Subset (a), v(a) = 10, size of (a) = 1
    • (−1)^(|{a,b,c}| − |{a}|) · v(a) = (−1)^(3−1) · 10 = 10
  • Sum #2, Subset (b), v(b) = 12, size of (b) = 1
    • (−1)^(|{a,b,c}| − |{b}|) · v(b) = (−1)^(3−1) · 12 = 12
  • Sum #3, Subset (c), v(c) = 18, size of (c) = 1
    • (−1)^(|{a,b,c}| − |{c}|) · v(c) = (−1)^(3−1) · 18 = 18
  • Sum #4, Subset (a,b), v(a,b) = 27, size of (a,b) = 2
    • (−1)^(|{a,b,c}| − |{a,b}|) · v(a,b) = (−1)^(3−2) · 27 = −27
  • Sum #5, Subset (a,c), v(a,c) = 23, size of (a,c) = 2
    • (−1)^(|{a,b,c}| − |{a,c}|) · v(a,c) = (−1)^(3−2) · 23 = −23
  • Sum #6, Subset (c,b), v(c,b) = 29, size of (c,b) = 2
    • (−1)^(|{a,b,c}| − |{c,b}|) · v(c,b) = (−1)^(3−2) · 29 = −29
  • Sum #7, Subset (a,b,c), v(a,b,c) = 37, size of (a,b,c) = 3
    • (−1)^(|{a,b,c}| − |{a,b,c}|) · v(a,b,c) = (−1)^(3−3) · 37 = 37

Dividend (a,b,c) = -2

So there is certainly a lot of math, but it is straightforward. What about the intuition behind what is happening? As you just saw, calculating dividends for one- and two-player coalitions is simple enough to execute without the formula; however, once you get to three-player coalitions and above, the number of steps grows exponentially. In the three-player case specifically, the two-player coalition values get subtracted, while the one-player values get added back in. What about four-player coalitions? Three-player values would get subtracted, two-player values added back in, singles subtracted again, and so on. You can easily extrapolate the pattern; but what does this alternating subtracting and adding actually do? Focus on the three-player example. By subtracting the two-player coalition values, we remove the synergy of those pairs along with the individual values contained within them. This over-subtracts: each individual value sits inside two different pairs, so it gets removed twice. Adding the single-player values back in corrects for the over-subtraction, leaving us with the pure synergy of the three-player coalition.
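If you want to verify these numbers, or scale past what is practical by hand, the inclusion-exclusion sum translates directly into a few lines of Python. This is just a minimal sketch of the formula above, not the project's dividends.py:

from itertools import combinations

# Average scores for every coalition, keyed by sorted tuples of players.
values = {
    ("a",): 10, ("b",): 12, ("c",): 18,
    ("a", "b"): 27, ("a", "c"): 23, ("b", "c"): 29,
    ("a", "b", "c"): 37,
}

def harsanyi_dividend(coalition, values):
    """Inclusion-exclusion sum over all non-empty subsets of the coalition."""
    s = len(coalition)
    total = 0.0
    for size in range(1, s + 1):
        for subset in combinations(sorted(coalition), size):
            total += (-1) ** (s - size) * values[subset]
    return total

print(harsanyi_dividend(("a", "b"), values))       # 5.0
print(harsanyi_dividend(("a", "b", "c"), values))  # -2.0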

Real World Application — E-commerce Website

Image provided by the author

Going back to our original example, let's build an application that calculates Harsanyi Dividends across all of the actions a customer can perform on an e-commerce website, so that we can get a sense of which elements of the site work well together. These insights can help stakeholders answer questions such as:

  • Which pages should we send customers to next, depending on where they are?
  • Which products have up-sell or cross-sell opportunities?
  • What are the best landing pages for different channels?
  • Which journeys should be improved, and which should be removed altogether?

The Harsanyi Application

The whole project can be found on my GitHub here. I will walk you through the two core files, synthetic_data.py and dividends.py, and then demo the app itself (app.py).

synthetic_data.py

Why include a synthetic data feature? One of my goals for this project is to be educational, and the synthetic data generation portion lets an end user quickly explore the tool and get a sense of the kind of data it is designed to handle. Note that there is also an option for a user to upload their own data via a CSV file.

Here is a simplified view of what the data should look like:

SEO  Email  Product Page  Desktop  Conversion
 1     0         1           0         1
 0     0         1           0         1
 0     0         0           0         0
 0     1         1           1         0

As you can see, each feature and the target column (Conversion) are Boolean. Each observation can be interpreted as a customer or a website session. In the synthetic data, the feature variables fall into three categories: channel, page, and device; however, if you are uploading your own data, you can use whatever features you want as long as they are Boolean.

Feature Variable Propensities

Generating "good" synthetic data means making it as realistic as possible. In this project, that means including realistic propensities for each feature variable.

In the file, I added a dictionary of feature propensity ranges. These can be easily configured and are used to model the propensity to convert. Some of them intentionally have a fixed value (the low and high ends are equal), but those with a genuine range are passed through a randomization function that outputs a value within it.

FEATURE_PROPENSITY_RANGES

from typing import Dict, Tuple

FEATURE_PROPENSITY_RANGES: Dict[str, Tuple[float, float]] = {
    # Channels
    "email": (2.0, 2.0),
    "seo": (6.0, 6.0),
    "sem": (6.0, 6.0),
    "direct": (5.0, 5.0),
    "display": (1.0, 1.0),
    "social": (1.0, 1.0),
    "affiliate": (7.0, 7.0),
    # Pages (A/F with ranges where specified)
    "product_page_a": (5.0, 7.0),
    "product_page_b": (4.0, 8.0),
    "product_page_c": (5.0, 7.0),
    "product_page_d": (4.0, 8.0),
    "product_page_e": (5.0, 7.0),
    "product_page_f": (4.0, 8.0),
    "deals_page": (6.0, 6.0),
    "search_page": (5.0, 5.0),
    "homepage": (4.0, 4.0),
    "account_page": (7.0, 7.0),
    "support_page": (3.0, 3.0),
    # Device
    "device_desktop": (6.0, 6.0),
    "device_mobile": (3.0, 3.0),
}
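The sampling step itself is simple. The _build_logistic_spec excerpt below calls a helper named _sample_feature_scores; assuming it draws uniformly within each configured range (the actual implementation in synthetic_data.py may differ), a sketch could look like this:

import numpy as np
from typing import Dict

# Hypothetical sketch: draw one propensity score per feature, uniformly
# within its configured range. Fixed entries (lo == hi) pass through as-is.
def _sample_feature_scores(rng: np.random.Generator) -> Dict[str, float]:
    return {
        feature: float(rng.uniform(lo, hi))
        for feature, (lo, hi) in FEATURE_PROPENSITY_RANGES.items()
    }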

_coef_range_for_score

The scores themselves cannot be plugged directly into a model and still yield a sample with an average conversion rate of around 5%. We generate conversions via a logistic regression, which requires realistic coefficients in its linear function. To convert a propensity score into a meaningful coefficient range, I created the following function:

def _coef_range_for_score(score: float) -> Tuple[float, float]:
    if score <= 2.0:
        return (-1.0, -0.3)      # negative effect
    elif score <= 4.0:
        return (-0.3, 0.3)       # near neutral
    elif score <= 6.0:
        return (0.3, 1.0)        # moderate positive
    elif score <= 8.0:
        return (1.0, 2.5)        # strong positive
    else:
        return (2.5, 4.0)        # very strong positive
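For instance, a score of 5.2 drawn for a feature like product_page_b (a hypothetical draw from its (4.0, 8.0) range) maps to the moderate-positive band:

lo, hi = _coef_range_for_score(5.2)
print(lo, hi)  # 0.3 1.0 -> the coefficient is then drawn uniformly in this range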

_sample_marginal_probabilities

While propensity is essential, we must also consider how often a user interacts with each channel, page, or device. Therefore, we need a function that determines how frequently each element shows up in a customer's journey. Note that only the channel section is shown; the page and device sections are handled similarly. Keep in mind that all of the functions you have seen so far draw from a random generator, ensuring the synthetic data is different each time it is generated.

def _sample_marginal_probabilities(
    rng: np.random.Generator,
) -> Tuple[Dict[str, float], float]:

    probs: Dict[str, float] = {}

    # Channels – fairly sparse, some more common (SEO, Direct)
    probs["email"] = rng.uniform(0.03, 0.15)
    probs["seo"] = rng.uniform(0.10, 0.60)
    probs["sem"] = rng.uniform(0.05, 0.40)
    probs["direct"] = rng.uniform(0.10, 0.50)
    probs["display"] = rng.uniform(0.01, 0.10)
    probs["social"] = rng.uniform(0.03, 0.20)
    probs["affiliate"] = rng.uniform(0.02, 0.15)

_build_logistic_spec

The following function builds the specification for the logistic regression model, that is, the coefficients that will later drive the simulated conversions. Here are the first few lines.

def _build_logistic_spec(
    rng: np.random.Generator,
) -> LogisticSpec:
    scores = _sample_feature_scores(rng)

    # Main effects
    main_effects = {}
    for feature in ALL_BINARY_FEATURES:
        score = scores[feature]
        lo, hi = _coef_range_for_score(score)
        main_effects[feature] = rng.uniform(lo, hi)

To create interactions among the variables, we need to add interaction terms to the model. To accomplish this, two small helpers inside our specification function register interaction terms for combinations of two and three features. The coefficient strengths can be configured in the function itself, as you can see in the second code block.

    interactions_2 = {}
    interactions_3 = {}

    strong_2 = (1.0, 3.0)
    moderate_2 = (0.5, 1.5)
    weak_2 = (-0.3, 0.3)
    strong_3 = (1.5, 3.5)
    moderate_3 = (0.7, 2.0)

    def add_interaction_2(a, b, coef_range):
        key = tuple(sorted((a, b)))
        interactions_2[key] = rng.uniform(*coef_range)

    def add_interaction_3(a, b, c, coef_range):
        key = tuple(sorted((a, b, c)))
        interactions_3[key] = rng.uniform(*coef_range)
add_interaction_3("sem", "product_page_a", "deals_page", strong_3)
add_interaction_3("seo", "product_page_c", "search_page", moderate_3)

Finally, we add the intercept, initialized at the log-odds of the 5% target: np.log(0.05 / 0.95) ≈ −2.94. Note that this only sets the baseline; because the main effects and interaction terms shift the average, we will still need to fine-tune the intercept to keep the realized conversion rate close to 5% (see _calibrate_intercept_to_global_rate below).

    intercept = float(np.log(0.05 / (1.0 - 0.05)))
    return LogisticSpec(intercept, main_effects, interactions_2, interactions_3)

_compute_linear_predictor

Now, the previous function does not actually apply the model; it sets the stage by creating dictionaries of features, feature interactions, and their associated coefficients. The function below iterates over those coefficients and returns the linear predictor once each observation's values are plugged in.

def _compute_linear_predictor(
    df: pd.DataFrame,
    spec: LogisticSpec,
) -> np.ndarray:
    z = np.full(shape=len(df), fill_value=spec.intercept, dtype=float)

    # Main effects
    for f, beta in spec.main_effects.items():
        if f in df.columns:
            z += beta * df[f].values

    # 2-way
    for (a, b), beta in spec.interactions_2.items():
        if a in df.columns and b in df.columns:
            z += beta * (df[a].values * df[b].values)

    # 3-way
    for (a, b, c), beta in spec.interactions_3.items():
        if a in df.columns and b in df.columns and c in df.columns:
            z += beta * (df[a].values * df[b].values * df[c].values)

    return z

_calibrate_intercept_to_global_rate

Conversion rates can vary significantly; however, it is safe to assume that most websites convert only a small fraction of their visitors. In this tool, we target a conversion rate of around 5%. There are a few ways to achieve this; I find the most efficient approach is to iteratively adjust the intercept term until the average predicted probability is close to the 5% target. The function below does just that. The final function in the file, which follows this one, combines everything preceding it and is what is actually called in the application.

def _calibrate_intercept_to_global_rate(
    df: pd.DataFrame,
    spec: LogisticSpec,
    target_rate: float = 0.05,
    max_iter: int = 8,
) -> LogisticSpec:
    for _ in range(max_iter):
        z = _compute_linear_predictor(df, spec)
        p = expit(z)
        mean_p = float(p.mean())
        if mean_p <= 0 or mean_p >= 1:
            break  # something degenerate; give up

        current_odds = mean_p / (1.0 - mean_p)
        target_odds = target_rate / (1.0 - target_rate)
        delta = np.log(target_odds / current_odds)

        spec.intercept += float(delta)

        # Early stop if close enough
        if abs(mean_p - target_rate) < 0.002:
            break

    return spec
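To see how these pieces might fit together, here is a hedged end-to-end sketch. It is illustrative wiring rather than the exact code in synthetic_data.py: features are sampled independently for simplicity, and the second return value of _sample_marginal_probabilities is ignored.

import numpy as np
import pandas as pd
from scipy.special import expit

rng = np.random.default_rng(42)

# Sample how often each feature occurs, then draw a binary feature frame.
probs, _ = _sample_marginal_probabilities(rng)
df = pd.DataFrame({
    feature: (rng.uniform(size=50_000) < p).astype(int)
    for feature, p in probs.items()
})

# Build the coefficient spec, calibrate the intercept, simulate conversions.
spec = _build_logistic_spec(rng)
spec = _calibrate_intercept_to_global_rate(df, spec, target_rate=0.05)
p_convert = expit(_compute_linear_predictor(df, spec))
df["conversion"] = (rng.uniform(size=len(df)) < p_convert).astype(int)

print(df["conversion"].mean())  # should land near 0.05 after calibration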

dividends.py

As you probably guessed, this file is the engine that computes the Harsanyi Dividends. We already went through a robust exercise reviewing how they are calculated; therefore, I think it is much more productive to discuss how the dividends will be calculated in the context of this tool.

Clickstream data is inherently sparse: a typical customer journey touches only a handful of the many possible actions. This poses a challenge when calculating coalition values. Say we have a dataset of 100k customers with all of the actions they took, and we want the coalition value for customers who interacted with the homepage and a product page. We may find only a handful of customers who performed those two actions alone; therefore, for each coalition, we check whether a customer performed those actions regardless of what else they did. From there, we take the average conversion to obtain the coalition's value. One vital note: there is no formal definition of how a value score should be calculated in the context of Harsanyi dividends, so you need to use your best judgment. In this example, taking the average works well because the data is binary and the average yields a proportion. If we were using revenue instead, the average could be significantly misleading due to potential outliers.

Lastly, this file uses parallel programming via the concurrent.futures module, along with a few dynamic configurations. Parallelism can significantly reduce the time required to compute Harsanyi dividends on large datasets. One configuration designates the maximum size of the coalitions for which dividends are calculated; the purpose of this tool is to give stakeholders something actionable, and surfacing coalitions that span many interactions produces fragmented opportunities that stretch available resources, rather than focusing effort on a few small, high-value coalitions. The last configuration is the minimum proportion of the data a coalition must cover to be included in the calculations, which ensures that any opportunity the tool uncovers has a respectable sample size. A sketch of the coalition-value step follows.
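Here is a minimal, single-threaded sketch of that coalition-value logic. It is not the actual dividends.py (which adds parallelism), and the parameter names and defaults are illustrative:

from itertools import combinations
import pandas as pd

# v(S) = mean conversion among sessions that performed every action in S,
# regardless of what else they did. Coalitions below the minimum data
# proportion are skipped.
def coalition_values(df: pd.DataFrame, features, target="conversion",
                     max_size=3, min_support=0.01):
    values = {}
    n = len(df)
    for size in range(1, max_size + 1):
        for coalition in combinations(features, size):
            mask = df[list(coalition)].eq(1).all(axis=1)
            if mask.sum() / n >= min_support:
                values[coalition] = df.loc[mask, target].mean()
    return values

The resulting dictionary can be fed straight into an inclusion-exclusion routine like the harsanyi_dividend sketch from the dragon-slayer section.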

Demo using Synthetic Data

Now, let’s do a quick demonstration with the tool. We will go from start to finish using the synthetic dataset option and end with a few insights.

Step 1: Generate a Synthetic Dataset

Image provided by the author

Step 2: Configure the maximum coalition size and the minimum % of data required for a coalition to be counted, then calculate the Harsanyi Dividends.

Image provided by the author

Step 3: Analyze the Results

Image provided by the author

The resulting dataframe is sorted by the Harsanyi Dividend column, so you will most likely see single-feature "coalitions" in the first few rows. These individual entries are not without value, but an individual's dividend is just its raw value, so it reveals nothing about synergy. The real impact comes from analyzing multi-feature coalitions. Let's take a look at a few via the export of the above table.

Image provided by the author

These are the multi-feature coalitions with the largest Harsanyi Dividends; in other words, the "players" that generate the most synergy together. So, what do we do with this information?

The top multi-feature coalition is "deals page" & "SEM": more practically speaking, customers who reached the deals page from an SEM campaign. One recommendation you could make as a professional is to invest more in these types of campaigns.

What about the next few coalitions? They are various combinations of product pages. You could recommend upsell or cross-sell experiences for these products, since conversion rates increase measurably when customers interact with these pages during the same journey.

Conclusion

I could go on and on about the opportunities a Harsanyi Dividend-driven analysis can deliver, especially in a high-volume marketing or online-store environment where countless variables are always at work. To conclude, I want to leave you with a few tips for driving ideas and opportunities via Harsanyi Dividends:

  • Find a balance between coalition value and volume: You will undoubtedly encounter situations where you identify valuable coalitions, but focusing on them would affect only a fraction of the business or customers. It is vital to find a healthy balance from this perspective.
  • Stick with reasonably sized coalitions: Pitching opportunities or ideas to large coalitions could prove costly from several angles. In my e-commerce site example, there may be instances where a valuable coalition spans multiple pages and perhaps numerous marketing channels. If I tell stakeholders to focus on those combinations, it could require complex investments across various teams and technologies. With that being said, if it is a large coalition of several similar pages, then any investment increase could be streamlined. Ultimately, a reasonably sized coalition will depend on the business case. As with any data science project, domain knowledge is key here.
  • Translate Dividends into measurable impact: Any opportunity or idea pitched to a stakeholder will most likely require an estimate of financial impact. Therefore, you need to be able to translate a Harsanyi Dividend into an investment return. It might be as simple as going back to the coalition value metric and applying a multiplier if you recommend a project that would grow the coalition's reach, for example, more campaigns from a specific channel to a particular page, as I mentioned earlier. There are countless ways to accomplish this type of translation; a back-of-envelope sketch follows this list.
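As one hypothetical illustration (every number below is invented), coalition values in this tool are conversion proportions, so a dividend can be read as incremental conversion-rate points and scaled by the traffic you expect to route into the journey:

# Back-of-envelope translation of a dividend into revenue impact.
dividend = 0.012            # +1.2 points of conversion rate from the synergy
eligible_sessions = 40_000  # sessions we could route into this journey
avg_order_value = 85.0      # average order value, in dollars

incremental_conversions = dividend * eligible_sessions         # 480
incremental_revenue = incremental_conversions * avg_order_value
print(f"~${incremental_revenue:,.0f} incremental revenue")     # ~$40,800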

I hope you enjoyed this article! I find this area of cooperative game theory a lot of fun. If you want to learn more, be sure to check out John Harsanyi's original paper, "A Simplified Bargaining Model for the n-Person Cooperative Game" (1963).
