Iron Triangles: Powerful Tools for Analyzing Trade-Offs in AI Product Development



Building and operating AI products involves making trade-offs. For example, a higher-quality product may take more time and resources to build, while complex inference calls may be slower and more expensive. These trade-offs are a natural consequence of the fundamental economic notion of scarcity: our potentially unlimited wants can only be partially satisfied by a limited set of available resources. In this article, we will borrow an intuitive triangle framework from project management theory to explore key trade-offs that builders and users of AI products have to navigate at design- and run-time, respectively.

Note: All figures and formulas in the following sections have been created by the author of this article.

A Primer on Iron Triangles

The tensions between project scope, cost, and time have been studied extensively by academics and practitioners in the field of project management since at least the 1950s. Efforts to visually represent the tensions (or trade-offs) between these three quality dimensions have resulted in a triangular framework that goes by many names, including the “iron triangle,” the “triple constraint,” and the “project management triangle.”

The framework makes a few key points:

  • It is important to analyze the trade-offs between project scope (what benefits, new features, or functionality will the project deliver), cost (in terms of monetary budget, human effort, IT costs), and time (project schedule, time to delivery).
  • Project cost is a function of scope and time (e.g., larger projects and shorter delivery time frames will cost more), and as per the so-called common law of business balance, “you get what you pay for.”
  • In an environment where resources are fundamentally scarce, it may be difficult to simultaneously minimize cost and time while maximizing scope. This situation is neatly captured by the phrase “Good, fast, cheap. Choose two,” which is often attributed (albeit without solid evidence) to Victorian art critic John Ruskin. Project managers thus tend to be highly alert to scope creep (adding more features to the project scope than was previously agreed without adequate governance), which can cause project delays and budget overruns.
  • In any given project, there may be varying degrees of flexibility in levels of scope, cost, and time that are considered acceptable by stakeholders. It may therefore be possible to adjust one or more of these dimensions to derive different acceptable configurations for the project.

The following video explains the use of the triangle framework in project management in more detail:

In the context of AI product development, the triangle framework lends itself to the exploration of trade-offs both at design-time (when the AI product is built), and at run-time (when the AI product is used by customers). In the following sections, we will look more closely at each of these two scenarios in turn.

Trade-Offs at Design-Time

Figure 1 shows a variant of the iron triangle that captures trade-offs faced by an AI product team at design-time.

Figure 1: Design-Time Iron Triangle

The three dimensions of the triangle are:

  • Feature scope (S) of the AI product measured in story points, function points, or feature units.
  • Development cost (C) in terms of person-days of human effort (PM, engineering, UX, data science), and monetary costs of staffing (experienced developers may have higher fully loaded costs) and IT (cloud resources, GPUs for training AI models).
  • Time to market (T), e.g., in weeks or months.

We can theorize the following minimal model of the triple constraint at design-time:

The development cost is proportional to the ratio of scope to time, scaled by a positive factor k representing productivity. A higher value of k implies a lower design-time cost per unit scope per unit time, and hence greater design-time productivity. The model matches our basic intuition: as T tends to infinity (or S tends to zero), C tends to zero (i.e., stretching the project timeline or cutting down the scope makes the project cheaper).
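Written out, the minimal design-time model described above is:

C = S / (k × T)

where C is the development cost (in effort units), S is the feature scope, T is the time to market, and k is the productivity factor.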

For example, suppose that our project consists of building an AI product worth 300 story points, in a 100-day time frame, with a productivity factor of 0.012. Assuming a fully loaded cost of $500 per story point, the minimal model suggests that we should budget around $125k to ship the product:
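The budget figure follows directly from the minimal model and the numbers above; a short calculation makes the arithmetic explicit:

```python
# Minimal design-time model: C = S / (k * T), with effort units
# converted to dollars at the fully loaded cost per story point.
scope = 300            # story points (S)
time_frame = 100       # days (T)
k = 0.012              # productivity factor

effort_units = scope / (k * time_frame)   # 300 / 1.2 = 250 units
cost_per_point = 500                      # fully loaded $ per story point
budget = effort_units * cost_per_point    # 250 * 500 = 125,000

print(f"Budget: ${budget:,.0f}")          # Budget: $125,000
```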

The minimal model encapsulates the physics-like core of the design-time triple constraint. Indeed, the model is reminiscent of the equation taught in school linking distance (d), velocity (v), and time (t), i.e., d = v*t, which relies on some important assumptions (e.g., constant velocity, straight-line motion, continuous measurement of time). In our design-time model, we assume constant productivity (i.e., k does not vary), a linear trade‑off (scope grows linearly with time and cost), and no external shocks (e.g., rework, reorgs, pivots).

Extended versions of the design-time model could consider:

  • Fixed costs (e.g., a baseline overhead for planning, governance, infrastructure provision), which imply a lower bound for the total design-time cost.
  • Limited impact of increasing staffing beyond a certain point. As observed by Fred Brooks in his 1975 book The Mythical Man-Month, “Adding manpower to a late software project makes it later.”
  • Non-linear productivity (e.g., due to rushing or slowing down in different project phases), which can influence the relationship between cost and the scope-time ratio.
  • Explicit accounting of AI quality standards to allow transparent tracking of success metrics (e.g., adherence to regulatory requirements and service level agreements with customers). Currently, the accounting happens indirectly by attribution to the productivity factor and scope.
  • The relationship between productivity and the AI product team’s learning curve, as experience, process repetition, and code reuse make the development more efficient over time.
  • Accounting for net value (i.e., benefits minus costs) or return on investment (ROI) rather than development costs alone.
  • Factoring in the sharing of scarce resources across multiple AI products being developed in parallel. This would involve taking a portfolio perspective of AI products under development at any given time.
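As one illustration, two of these extensions (a fixed-cost floor and a super-linear scope exponent; the specific functional form and parameter values are assumptions for this sketch, not from the article) can be bolted onto the minimal model:

```python
def design_time_cost(scope, time, k=0.012, fixed=0.0, alpha=1.0):
    """Sketch of an extended design-time cost model, in effort units.

    fixed : baseline overhead (planning, governance, infrastructure),
            which acts as a lower bound on total cost
    alpha : scope exponent; alpha > 1 models super-linear effort growth
            (e.g., coordination overhead on larger projects)
    With fixed=0 and alpha=1, this reduces to the minimal model C = S/(k*T).
    """
    return fixed + (scope ** alpha) / (k * time)

# Minimal model recovered:
print(design_time_cost(300, 100))                        # ~250 effort units
# With a 40-unit overhead and mild super-linearity in scope:
print(design_time_cost(300, 100, fixed=40, alpha=1.05))
```

Note that with a fixed-cost term, stretching the timeline no longer drives the total cost to zero, which matches the "lower bound" point in the first bullet above.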

Trade-Offs at Run-Time

Figure 2 shows a variant of the iron triangle capturing trade-offs faced by customers or users of an AI product at run-time.

Figure 2: Run-Time Iron Triangle

The three dimensions of this triangle are:

  • Response quality (Q) of the AI product measured in terms of predictive accuracy, BLEU/ROUGE score, or some other task-specific quality metric.
  • Inference costs (C) in terms of dollars or cents per inference call, GPU seconds converted to dollars, or energy costs.
  • Latency of inference (L) in milliseconds, seconds, etc.

We can theorize the following minimal model of the triple constraint at run-time:

The inference cost is proportional to the ratio of response quality to latency, scaled by a positive factor k representing system efficiency. A higher value of k implies a lower cost for the same response quality and latency. Again, the model aligns with our basic intuition: as L tends to zero (or Q tends to infinity), C tends to infinity (i.e., an AI product that returns real-time, high-quality responses will be more expensive than a similar product delivering slower, inferior responses).
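Written out, the minimal run-time model described above is:

C = Q / (k × L)

where C is the cost per inference call, Q is the response quality, L is the latency, and k is the efficiency factor.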

For example, suppose that an AI product consistently achieves 90% predictive accuracy with an average response latency of 0.5 seconds. Assuming an efficiency factor of 180, we can expect the inference cost to be around one cent:
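The one-cent figure follows directly from the minimal run-time model and the numbers above:

```python
# Minimal run-time model: C = Q / (k * L).
quality = 0.9    # 90% predictive accuracy (Q)
latency = 0.5    # seconds (L)
k = 180          # efficiency factor

cost = quality / (k * latency)   # 0.9 / 90 = 0.01 dollars, i.e., one cent
print(f"Cost per inference: ${cost:.2f}")
```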

Extended versions of the run-time model could consider:

  • Baseline fixed costs (e.g., of model loading, pre- and post-processing of user requests).
  • Variable scaling costs due to a non-linear relationship between cost and quality (e.g., going from 80% to 95% accuracy may be easier than going from 95% to 99%). This could also capture a form of diminishing returns on successive product optimizations.
  • Stochastic nature of quality, which can vary depending on the input (“garbage in, garbage out”). This can be done by using the expected value of quality, E(Q), instead of an absolute value in the triple constraint model; see this article for a deep dive on expected value analysis in AI product management.
  • Fixed and variable latency overheads. Inference cost could be modeled as a function of effective latency, accounting for queuing delays, network hops, etc.
  • Effects of throughput and concurrency. The cost per inference could be lower for batched inferences (due to a kind of amortization of costs across inferences in a batch) or higher if there is network congestion.
  • Explicit accounting for component efficiencies of the AI algorithm (due to an optimized model architecture, use of pruning, or quantization), hardware (GPU/TPU performance), and energy (electricity usage per FLOP) by decomposing the efficiency factor k accordingly.
  • Dynamic adaptation of the efficiency factor k with respect to load, hardware, or type/degree of optimizations. E.g., efficiency could improve with caching or model distillation and deteriorate under heavy load due to resource throttling or blocking.
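As with the design-time model, a couple of these extensions (a fixed per-call overhead and batch amortization; the functional form and parameter values are assumptions for this sketch, not from the article) can be layered onto the minimal model:

```python
def inference_cost(quality, latency, k=180.0, fixed=0.0, batch_size=1):
    """Sketch of an extended run-time cost model, in dollars per call.

    fixed      : baseline cost per call (model loading, pre-/post-processing)
    batch_size : amortizes the variable cost across inferences in a batch
    With fixed=0 and batch_size=1, this reduces to the minimal model C = Q/(k*L).
    """
    return fixed + quality / (k * latency * batch_size)

# Minimal model recovered:
print(inference_cost(0.9, 0.5))                          # ~$0.01
# With a small fixed overhead, amortized over a batch of 8:
print(inference_cost(0.9, 0.5, fixed=0.002, batch_size=8))
```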

Finally, the decisions made at design-time can shape the situation and types of decisions that can be made at run-time. For instance, the product team may choose to invest significant resources in training a comprehensive foundation model, which can be extended via in-context learning at run-time; compared to a conventional machine learning algorithm such as a random forest, the foundation model is a design-time choice that may allow for better response quality at run-time, albeit at a potentially higher inference cost. Design-time investments in clean code and efficient infrastructure could increase the run-time system efficiency factor. The choice of cloud provider could determine the minimum inference cost achievable at run-time. It is therefore vital to consider the design- and run-time trade-offs jointly in a holistic manner.

The Wrap

As this article demonstrates, the iron triangle from project management theory can be repurposed to produce simple yet powerful frameworks for analyzing design- and run-time trade-offs in AI product development. The design-time iron triangle can be used by product teams to make decisions about budgeting, resource allocation, and delivery planning. The complementary run-time iron triangle offers several insights into how the relationship between inference costs, response quality, and latency can affect product adoption and customer satisfaction. Since design-time decisions can constrain run-time optionality, it is important to think about design- and run-time trade-offs jointly from the outset. By recognizing these trade-offs early and navigating them deliberately, product teams and their customers can create more value from the design and use of AI.
