Everyday Decisions are Noisier Than You Think — Here’s How AI Can Help Fix That



This article was inspired by reading the thought-provoking book Noise: A Flaw in Human Judgment by Daniel Kahneman (Nobel laureate in Economics and best-selling author of Thinking, Fast and Slow) and Professors Olivier Sibony and Cass Sunstein. Noise highlights the looming, but usually well-hidden, presence of persistent noise in human affairs — defined as the variability in decision-making outcomes for the same tasks across experts in a particular field. The book supplies many compelling anecdotes about the real-world effects of noise in fields such as insurance, medicine, forensic science and law.

Noise is distinguished from bias, which is the magnitude and direction of the error in decision making across that same set of experts. The key difference is best explained in the following diagram:

Figure 1. Four teams: an illustration of bias and noise in judgment. Here the bullseye is the true or correct answer. Bias occurs when judgments are systematically shifted away from the truth, as in Teams A and B, where the shots are consistently off-center in one direction. Noise, by contrast, reflects inconsistency: the judgments scatter unpredictably, as seen in Teams A, C and D. In this example, Team A has a large degree of noise and bias. 📖 Source: Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (HarperCollins, 2021). Diagram adapted by author.

The diagram illustrates the distinction between bias and noise in human judgment. Each target represents repeated judgments of the same problem, with the bullseye symbolising the correct answer.

We can summarise this as follows:

  • Team A: The shots are all off-center (bias) and not tightly clustered (noise). This shows both bias and noise.
  • Team B: Shots are tightly clustered but systematically away from the bullseye. This shows bias with little noise.
  • Team C: Shots are spread out and inconsistent, with no clear cluster. This is noise, with less systematic bias.
  • Team D: Shots are also spread out with no consistent offset, showing noise with little bias.

While bias pulls decisions in the wrong direction, noise creates variability that undermines fairness and reliability.

Artificial Intelligence (AI) practitioners may be having an a-ha moment about now, as the bias and noise described above are reminiscent of the bias–variance trade-off in machine learning, where we seek models that explain the data well without fitting to the noise. Noise here is synonymous with variance.

The two major components of human judgement error can be broken down through what is called the overall error equation, with mean squared error (MSE) used to aggregate the errors across individual decisions:

Overall Error (MSE) = Bias² + Noise²

Bias is the average error, while noise is the standard deviation of judgments. Overall error can be reduced by addressing either, since both contribute equally. Bias is usually the more visible component — it is often obvious when a set of decisions systematically leans in one direction. Noise, by contrast, is harder to detect because it hides in variability. Think of the target I presented earlier: bias is when all the arrows cluster off-center, while noise is when arrows are scattered all over the board. Both reduce accuracy, but in different ways. The practical takeaway from the error equation is clear: we should aim to reduce both bias and noise, rather than fixating on the more visible bias alone. Reducing noise also has the benefit of making any underlying bias far easier to spot.
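For readers who like to see the arithmetic, here is a minimal sketch of the error equation in Python. The judgment values and the true value below are made up for illustration; the decomposition itself holds for any set of judgments:

```python
# Minimal sketch of the overall error equation: for repeated judgments of
# the same case with a known true value, MSE = bias^2 + noise^2.
# Judgment values are illustrative, not from the book.
from statistics import fmean, pstdev

true_value = 100.0
judgments = [104.0, 112.0, 99.0, 118.0, 107.0]

errors = [j - true_value for j in judgments]
bias = fmean(errors)                  # average error (systematic shift)
noise = pstdev(errors)                # population std dev of the errors
mse = fmean(e ** 2 for e in errors)   # overall error

# The decomposition is exact with the population standard deviation
assert abs(mse - (bias ** 2 + noise ** 2)) < 1e-9
print(f"bias={bias:.2f}, noise={noise:.2f}, mse={mse:.2f}")
```

Note that the identity requires the population standard deviation (`pstdev`), not the sample version — a common slip when reproducing this equation.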

To solidify our understanding of bias and noise, another useful visualisation from the book is shown below. These diagrams plot judgment errors: the x-axis shows the magnitude of the error (difference between judgment and truth), and the y-axis shows its probability. In the left plot, noise is reduced while bias remains: the distribution narrows, but its mean stays offset from zero. In the right plot, bias is reduced: the entire distribution shifts toward zero, while its width (the noise) remains unchanged.

Figure 2: Reducing noise narrows the spread of judgment errors; reducing bias shifts the mean closer to zero. 📖Source: Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (HarperCollins, 2021). Diagram adapted by author.

Noise and bias help explain why organisations often reach decisions that are both inaccurate and inconsistent, with outcomes swayed by factors such as mood, timing, or context. Court rulings are a good example: two judges — or even the same judge on different days — may decide similar cases differently. External factors as trivial as the weather or a local sports result can also shape a judgment. To counter this, startups like Bench IQ are using AI to expose noise and bias in judicial decision-making. Their pitch highlights a tool that maps judges’ patterns to give lawyers a clearer view of how a ruling might unfold. This tool aims to tackle a core concern of Noise: when randomness distorts high-stakes decisions, tools that measure and predict judgment patterns could help restore consistency.

Another compelling example presented by the book comes from the insurance industry. In Noise: A Flaw in Human Judgment, the authors show how judgments by underwriters and adjusters varied dramatically. A noise audit revealed that quotes often depended on who was assigned — essentially a lottery. On average, the difference between two underwriters’ estimates was 55% of their mean, five times higher than what a group of surveyed CEOs expected. For the same case, one underwriter might set a premium at $9,500 while another set it at $16,700 — a shockingly wide margin. Noise is clearly at play here, and this is just one example among many.
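The 55% figure is simply the absolute difference between two quotes divided by their mean. A quick sanity check in Python, using the two premiums quoted above:

```python
# Reproduce the book's noise metric for the underwriting example:
# the difference between two underwriters' quotes as a fraction of
# their mean.
def relative_difference(a: float, b: float) -> float:
    """Absolute difference between two judgments divided by their mean."""
    return abs(a - b) / ((a + b) / 2)

# The two premiums quoted in the text for the same case
ratio = relative_difference(9_500, 16_700)
print(f"{ratio:.0%}")  # about 55%, matching the audit's average
```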

Ask yourself this question: when relying on professional judgement, would you willingly sign up for a lottery that gives highly variable outcomes, or would you prefer a system that reliably produces consistent judgments?

By now it should be apparent that noise is a very real phenomenon and costs organisations hundreds of millions in errors, inefficiencies, and lost opportunities through ineffective decision making.

Why Group Decisions are Even Noisier: Information Cascades and Group Polarisation

The wisdom of crowds suggests that group decisions can approximate the truth: when people make judgments independently, their errors cancel out. The idea goes back to Francis Galton in 1906. At a livestock fair, he asked around 800 people to guess the weight of an ox. Individually, their estimates varied widely, but when averaged, the crowd’s judgment was almost perfect — just one pound off. This is the promise of aggregation: independent errors cancel, and the group judgment converges on the truth.
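The aggregation effect is easy to reproduce. The sketch below simulates 800 independent guesses scattered around the true weight; the spread of the guesses is an assumption for illustration, not Galton’s actual data:

```python
# Simulate the wisdom-of-crowds effect: many independent, noisy guesses
# of a true value, then compare individual error to the crowd average.
import random
from statistics import fmean

random.seed(42)
TRUE_WEIGHT = 1198  # Galton's ox, in pounds

# 800 independent guesses, each individually quite far off
guesses = [random.gauss(TRUE_WEIGHT, 75) for _ in range(800)]

typical_individual_error = fmean(abs(g - TRUE_WEIGHT) for g in guesses)
crowd_error = abs(fmean(guesses) - TRUE_WEIGHT)

print(f"typical individual error: {typical_individual_error:.0f} lb")
print(f"crowd-average error:      {crowd_error:.1f} lb")
# The crowd error shrinks roughly as 1/sqrt(n): independent errors cancel.
```

The key assumption, as the next section stresses, is independence — correlated guesses do not cancel.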

But in reality, psychological and social factors often derail this process. In groups, outcomes are swayed by who speaks first, who sits next to whom, or who gestures at the right moment. The same group, faced with the same problem, can reach very different conclusions on different days.

In Noise: A Flaw in Human Judgment, the authors highlight a study on music popularity as an example of how group choices can be distorted by social influence. When people saw that a particular song had already been downloaded many times, they were more likely to download it themselves, creating a self-reinforcing cycle of popularity. Strikingly, the same song could end up with very different levels of success across different groups, depending largely on whether it happened to attract early momentum. The study shows how social influence can shape collective judgment, often amplifying noise in unpredictable ways.

Two key mechanisms help explain the dynamics of group-based decision making:

  • Information Cascades — Like dominoes falling after the first push, small early signals can tip an entire group. People copy what’s already been said instead of voicing their own true judgment. Social pressure compounds the effect — few want to appear silly or contrarian.
  • Group Polarization — Deliberation often drives groups toward more extreme positions. Instead of balancing out, discussion amplifies tendencies. Kahneman and colleagues illustrate this with juries: statistical juries, where members judge independently, show much less noise than deliberating juries, where discussion pushes the group toward either greater leniency or greater severity, as compared to the median.

Paradoxically, talking together can make groups less accurate and noisier than if individuals had judged alone. There is a salient lesson here for management: group discussions should ideally be orchestrated in a way that is noise-sensitive, using strategies that aim to reduce bias and noise.

Mapping the Landscape of Noisy Decisions

The key lesson from Noise: A Flaw in Human Judgment is that all human decision-making, both individual and group-based, is noisy. This may or may not come as a surprise, depending on how often you have personally been affected by the variance in professional judgments. But the evidence is overwhelming: medicine is noisy, child-custody rulings are noisy, forecasts are noisy, asylum decisions are noisy, personnel judgments are noisy, bail hearings are noisy. Even forensic science and patent reviews are noisy. Noise is everywhere, yet it is rarely noticed — and even more rarely counteracted.

To get a grasp on noise, it helps to categorise it. Let’s begin with a taxonomy of decisions. Two important distinctions help us organise noisy decisions — recurrent vs singular and evaluative vs predictive. Together, these form a simple mental framework for guidance:

  • Recurrent vs Singular decisions: Recurrent decisions involve repeated judgments of similar cases — underwriting insurance policies, hiring employees, or diagnosing patients. Here, noise is easier to spot because patterns of inconsistency emerge across decision-makers. Singular decisions, by contrast, are made only once — a company deciding whether to acquire a rival, or a government responding to a novel crisis. The authors argue that a singular decision is best viewed as a recurrent decision that happens only once: the noise is still present but largely invisible, because we cannot easily compare what another decision-maker would have done in the same case.
  • Evaluative vs Predictive decisions: Evaluative decisions are judgments of quality or merit — such as rating a job candidate, evaluating a scientific paper, or assessing performance. Predictive decisions, on the other hand, forecast outcomes — estimating whether a defendant will reoffend, how a patient will respond to treatment, or whether a startup will succeed. Both types are subject to noise, but the mechanisms differ: evaluative noise often reflects inconsistent standards or criteria, while predictive noise stems from variability in how people imagine and weigh the future.

Together, these categories provide a framework for understanding the noise within human judgment. Noise influences how we evaluate and how we predict. Recognising these distinctions is the first step toward designing systems that reduce variability and improve decision quality. Later, I will present some concrete measures that can be taken for reducing noise in both types of judgements.

Not All Noise Is the Same: A Guide to Its Varieties

A noise audit, which is sometimes possible for recurrent decisions, can reveal just how inconsistent human judgment can be. Management can conduct a noise audit by having multiple individuals evaluate the same case, making the variability in the responses visible and measurable. The outcomes can be very revealing; a good example is the underwriting case I summarised earlier.

To strike at the heart of the beast, the authors of Noise: A Flaw in Human Judgment distinguish between several types of noise. At the broadest level is system noise — the overall variability in judgments across a group of professionals looking at the same case. System noise can be further divided into the following three sub-components:

  • Level Noise — How much do you disagree with your peers? Differences in the overall average judgments across individuals — some judges are stricter, some underwriters more generous.
  • Pattern Noise — In what consistent way are you uniquely wrong? These are the personal, idiosyncratic tendencies that skew an individual’s decisions — always a bit lenient, always a bit pessimistic, always harsher on certain types of cases. Pattern noise can be broken down into stable pattern noise, which reflects enduring personal tendencies that persist across time and situations, and transient pattern noise, which arises from temporary states such as mood, fatigue, or context that may shift from decision to decision.
  • Occasion Noise — How often do you disagree with yourself? Variation in the same person’s judgments at different times, influenced by mood, fatigue, or context. Occasion noise is generally a smaller component in the total system noise. In other words, and thankfully, we are usually more consistent with ourselves across time than interchangeable with another person in the same role.

The relative impact of each type of noise varies across tasks, domains and individuals, though in the audits the authors report, pattern noise often contributes more to system noise than level noise, with occasion noise the smallest component. These forms of noise highlight the complexity of untangling how variability affects decision-making, and their differing effects explain why organisations so often reach inconsistent outcomes even when applying the same rules to the same information.
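The decomposition of system noise into level and pattern noise can be sketched numerically. The snippet below uses a made-up, balanced judge-by-case ratings matrix from a single occasion (so occasion noise cannot be separated here): level noise is the spread of the judges’ overall averages, and pattern noise is what remains after removing both judge and case effects.

```python
# Sketch: decompose system noise into level noise and pattern noise
# for a balanced judge-by-case ratings matrix (illustrative numbers).
from statistics import fmean, pstdev

# ratings[judge][case]
ratings = [
    [6, 7, 4, 8],   # judge A
    [4, 5, 2, 6],   # judge B (systematically stricter: level noise)
    [7, 4, 6, 5],   # judge C (idiosyncratic per case: pattern noise)
]
n_judges, n_cases = len(ratings), len(ratings[0])

grand = fmean(x for row in ratings for x in row)
judge_means = [fmean(row) for row in ratings]
case_means = [fmean(ratings[j][c] for j in range(n_judges))
              for c in range(n_cases)]

# Level noise: spread of the judges' overall averages
level_noise = pstdev(judge_means)

# Pattern noise: spread of what is left after judge and case effects
residuals = [ratings[j][c] - judge_means[j] - case_means[c] + grand
             for j in range(n_judges) for c in range(n_cases)]
pattern_noise = pstdev(residuals)

# For a balanced matrix: system_noise^2 = level^2 + pattern^2
system_noise = (level_noise ** 2 + pattern_noise ** 2) ** 0.5
print(f"level={level_noise:.2f}, pattern={pattern_noise:.2f}, "
      f"system={system_noise:.2f}")
```

With these illustrative numbers, judge C’s case-by-case idiosyncrasies make pattern noise the larger component, mirroring the book’s finding.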

By recognizing both the types of decisions and the sources of noise that shape them, we can design more deliberate strategies to reduce variability and enhance the quality of our judgments.

Strategies for Minimising Noise in our Judgements

Noise in decision-making can never be eliminated, but it can be reduced through well-designed processes and habits — what Kahneman and colleagues call decision hygiene. Like hand-washing, it prevents problems we cannot see or trace directly, yet still lowers risk.

Key strategies include:

  • Conduct a noise audit: Acknowledge that noise is possible and assess the magnitude of variation in judgments by asking multiple decision-makers to evaluate the same cases. This makes noise visible and quantifiable. For example, in the table below three raters scored the same case 4/10, 7/10, and 8/10, producing a mean rating of 6.3/10 and a spread of 4 points. The calculated Noise Index highlights how much individual judgments deviate from the group, making inconsistency explicit.
Table 1 — Noise Audit Example: Three decision-makers independently rate the same case. Their judgments diverge widely, revealing inconsistency driven not by bias but by noise.

    Rater      Score     Deviation from mean
    Rater 1    4/10      −2.3
    Rater 2    7/10      +0.7
    Rater 3    8/10      +1.7
    Mean: 6.3/10 · Spread (max − min): 4 points

📖 Source: Table by author.
  • Use a decision observer: Having a neutral participant in the room helps guide the conversation, surface biases, and keep the group aligned with decision principles. Using a decision observer is most useful to reduce bias in decision making — which is more visible and easier to detect than noise.
  • Assemble a diverse, skilled team: Diversity of expertise reduces correlated errors and provides complementary perspectives, limiting the risk of systematic blind spots.
  • Sequence information carefully: Present only relevant information, in the right order. Exposing irrelevant details early can anchor judgments in unhelpful ways. For example, fingerprint analysts could be swayed by details of the case, or the judgement of a colleague.
  • Adopt checklists: Simple checklists, as championed in The Checklist Manifesto, can be highly effective in high-stakes, high-stress situations by ensuring that critical factors are not overlooked. For example, in medicine the Apgar score began as a guideline for systematically assessing newborn health but was translated into a checklist: clinicians tick through predefined dimensions — heart rate, breathing, reflexes, muscle tone, and skin colour — within a minute of birth. In this way a complex decision is decomposed into sub-judgments, reducing cognitive load and improving consistency.
  • Use a shared scale: Decisions should be anchored to a common, external frame of reference rather than each judge relying on personal criteria. This approach has been shown to reduce noise in contexts such as hiring and workplace performance evaluations. By structuring each performance dimension separately and comparing multiple team members simultaneously, applying a standardised ranking scale, and using forced anchors for reference (e.g., case studies showing what good and great mean), evaluators are much less likely to introduce idiosyncratic biases and variability.
  • Harness the wisdom of crowds: Independent judgments, aggregated, are often more accurate than collective deliberation. Francis Galton’s famous “village fair” study showed that the median of many independent estimates can outperform even experts.
  • Create an “inner crowd”: Individuals can reduce their own noise by simulating multiple perspectives — making the same judgment again after time has passed, or by deliberately arguing against their initial conclusion. This effectively samples responses from an internal probability distribution, reminiscent of how large language models (LLMs) generate alternative completions. A great source of examples of this technique in action can be found in Ben Horowitz’s excellent book The Hard Thing About Hard Things. You can see Horowitz forming an inner crowd to test every angle when facing high-stakes choices — for example, weighing whether to replace a struggling executive, or deciding if the company should pivot its strategy in the midst of crisis. Rather than relying on a single instinct, he systematically challenges his own assumptions, replaying the decision from multiple standpoints until the most resilient path forward becomes clear.
  • Anchor to an external baseline: when making predictive judgments, think statistically and start by identifying an appropriate external baseline average. Then assess how strongly the information at hand correlates with the outcome. If the correlation is high, adjust the baseline accordingly; if it is weak or nonexistent, stick with the average as your best estimate. For instance, imagine you’re trying to predict a student’s GPA. The natural baseline is the statistical average GPA of 3.2. If the student has consistently excelled across similar courses, that record is strongly correlated with future performance, and you can reasonably adjust your forecast upward toward your intuitive guess of, say, 3.8. But if your main piece of information is something weakly predictive — like the student participating in a debate club — you should resist making adjustments and stick close to the baseline. This approach not only reduces noise but also guards against the common bias of ignoring regression to the mean: the statistical tendency for extreme performances (good or bad) to move closer to the average over time. Starting with the baseline and only shifting when strong evidence justifies it is the essence of noise reduction in predictive judgments, as the diagram below illustrates.
Adjusting an intuitive prediction for regression to the mean: statistical view anchors predictions at the average (3.2–3.3), while the intuitive view pulls toward personal judgment (3.8). The adjustment depends on confidence, from no predictive value to perfect prediction. 📖Source: Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (HarperCollins, 2021). Diagram adapted by author.
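The baseline-anchoring recipe from the GPA example can be written as a one-line adjustment. The correlation values below are assumptions for illustration, not figures from the book:

```python
# Sketch of baseline anchoring: start from the outside-view average and
# move toward the intuitive estimate only in proportion to how
# predictive the evidence is (0 = no predictive value, 1 = perfect).
def anchored_prediction(baseline: float, intuition: float,
                        correlation: float) -> float:
    """Shift from the baseline toward intuition by the evidence's
    assumed predictive correlation."""
    return baseline + correlation * (intuition - baseline)

BASELINE_GPA = 3.2  # the statistical average from the example

# Strong evidence (consistent excellence in similar courses)
print(f"{anchored_prediction(BASELINE_GPA, 3.8, correlation=0.7):.2f}")

# Weak evidence (debate-club membership): stay near the baseline
print(f"{anchored_prediction(BASELINE_GPA, 3.8, correlation=0.1):.2f}")
```

With strong evidence the forecast moves most of the way toward the intuitive 3.8; with weak evidence it barely leaves the 3.2 baseline, which is exactly the regression-to-the-mean discipline described above.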

Lastly, and by no means least, we can also turn to algorithms as a helper in our decision making: from simple rules-based models to advanced AI systems, algorithms can radically reduce noise in judgments. Used with a human in the loop for oversight and verification, they provide a consistent baseline while leaving space for human discretion when it is most valuable.

Finding the Broken Legs: Leveraging AI in Judgment

One of the most important questions in decision-making is when to trust algorithms and when to let human judgment take the lead. A useful starting point is the broken leg principle: if you know decisive information that the model could not possibly take into account, you should override its prediction.

For example, if a model predicts that someone will run their usual morning 5k because they never miss a day, but you know they’re down with the flu, you don’t need the algorithm’s forecast — you already know the jog isn’t happening.

AI can often find these types of broken legs on its own. By analysing vast datasets across thousands — or millions — of cases, AI systems can identify subtle, rare, but decisive patterns that humans would likely miss.

As another example, imagine a commuter who bikes to work every day; on the one morning there’s a severe snowstorm, the odds of biking collapse — an anomaly that the data, and an appropriately tuned AI, can still catch.

The book — Noise: A Flaw in Human Judgment — highlights how Sendhil Mullainathan and colleagues explored this idea in the context of bail decisions. They trained an AI system on over 758,000 bail cases. Judges had access to the same information — rap sheets, prior failures to appear, and other case details — but the AI was also given the outcomes: whether defendants were released, failed to appear in court, or were rearrested. The AI produced a simple numerical score estimating risk. Crucially, no matter where the threshold was set, the model outperformed human judges. The AI was significantly more accurate at predicting failures to appear and rearrests.

The advantage comes from AI’s ability to detect complex combinations of variables. While a human judge might focus on obvious cues, the model can weigh thousands of subtle correlations simultaneously. This is especially powerful in identifying the highest-risk individuals, where rare but telling patterns predict dangerous outcomes. In other words, the AI excels at picking up rare but decisive signals — the broken legs — that humans either overlook or can’t consistently evaluate.

“The algorithm makes mistakes, of course. But if human judges make even more mistakes, whom should we trust?” Source: Noise: A Flaw in Human Judgment (HarperCollins, 2021).

AI models, if designed and applied carefully, can reduce discrimination and improve accuracy. As we’ve seen, AI can enhance human decision making by uncovering hidden structure in messy, complex data. The challenge therefore becomes how to balance the two, and establish an effective human-machine team: when to trust the statistical patterns, and when to step in with human judgment for the broken legs the model can’t yet see.

Figure 3: Spectrum of predictive models — from simple rules to advanced machine learning, illustrating the trade-off between simplicity and complexity in judgment and prediction. 📖 Source: Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (HarperCollins, 2021). Diagram adapted by author.

When large-scale data isn’t available to train advanced AI models, all is not lost. We can go simpler: either by using equally weighted predictors — where each factor or input is given the same importance rather than a learned weight (as in multiple regression) — or by applying simple rules. Both approaches can significantly outperform human judgment. Psychologist Robyn Dawes demonstrated this counterintuitive finding, coining the term improper linear models to describe the equal-weighting method.

For example, imagine forecasting next quarter’s sales using four independent predictors: historical trend extrapolation (+8%), market sentiment index (+12%), analyst consensus (+6%), and manager gut-feel (+10%). Instead of trusting any single forecast, the improper linear model simply averages them, producing a final prediction of +9%. By cancelling out random variation in individual inputs, this method often beats expert judgment and shows why equal weighting can be surprisingly powerful.
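The averaging step is trivial to express in code. A minimal sketch of the sales-forecast example above:

```python
# Dawes-style equal weighting for the sales-forecast example: instead of
# picking one predictor, average them all with equal weight.
from statistics import fmean

forecasts_pct = {
    "historical trend extrapolation": 8,
    "market sentiment index": 12,
    "analyst consensus": 6,
    "manager gut-feel": 10,
}

equal_weight_forecast = fmean(forecasts_pct.values())
print(f"+{equal_weight_forecast:.0f}%")  # +9%
```

The simplicity is the point: with no weights to estimate, there is nothing to overfit.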

AI practitioners can view Dawes’ breakthrough as an early form of capacity control: in low-data settings, giving every input equal weight prevents the model from overfitting to noise.

Rules are arguably even simpler and can dramatically cut down the noise. Kahneman, Sibony, and Sunstein highlight a team of researchers who built a simple model to assess flight risk for defendants awaiting trial. Using just two predictors — age and the number of missed court dates — the model produced a risk score that rivalled human assessments. The formula was so simple it could be calculated by hand.
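The book reports only that the rule used age and the number of missed court dates, not its exact weights, so the sketch below is a hypothetical rule in the same spirit — simple enough to compute by hand:

```python
# Hypothetical two-predictor flight-risk rule (weights are assumptions,
# not the formula from the study): younger defendants and more missed
# court dates mean higher risk.
def flight_risk_score(age: int, missed_court_dates: int) -> int:
    """Back-of-envelope risk points, computable by hand."""
    score = 0
    if age < 25:
        score += 2                       # youth is a risk factor
    elif age < 35:
        score += 1
    score += min(missed_court_dates, 3)  # cap the history component
    return score                         # 0 (lowest) .. 5 (highest)

print(flight_risk_score(age=22, missed_court_dates=4))  # 5
print(flight_risk_score(age=50, missed_court_dates=0))  # 0
```

Because the rule is fixed, two clerks applying it to the same defendant always get the same score — the noise across decision-makers drops to zero by construction.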

Conclusions and Final Thoughts

We have explored the main lessons from Noise: A Flaw in Human Judgment by Kahneman, Sibony, and Sunstein. The book highlights how noise is the proverbial elephant in the room — ever present yet rarely acknowledged or addressed. Unlike bias, noise in judgment is silent, but its impact is real: it costs money, shapes decisions, and affects lives. Kahneman and his co-authors make a compelling case for systematically analyzing noise and its consequences wherever important decisions are made.

Figure 4: Noise is the elephant in the room and can greatly influence individual and group judgements. 📖 Source: Author’s own via GPT5.

In this article, we examined the different types of decisions — evaluative versus predictive, recurrent versus singular — and the corresponding types of noise, including system noise, level noise, pattern noise, and occasion noise. We also linked noise to bias through the overall error equation, highlighting the importance of addressing both. While bias is often more visible, the book makes clear that noise is equally damaging, and efforts to reduce it are just as essential.

Noise is less visible than bias not because it cannot be seen, but because it rarely announces itself without systematic comparison. Bias is systematic: after a handful of cases, you can spot a consistent drift in one direction, such as a judge who is always harsher than average. Noise, by contrast, shows up as inconsistency — lenient one day, harsh the next. In principle, this variance is visible, but in practice each decision, viewed in isolation, still feels reasonable. Unless judgments are lined up and compared side by side — a process Kahneman and colleagues call a “noise audit” — the silent cost of variability goes unnoticed.

Thankfully, there are concrete steps we can take to improve our judgments and make our decisions noise-aware. We touched on the importance of a noise audit: first accepting that noise may be an issue, then measuring it. Depending on the situation, we can then embrace better decision hygiene through structured decision protocols, multiple independent assessments, or AI used carefully and responsibly — concrete shifts that reduce variability and make our judgments more consistent.

Disclaimer: The views and opinions expressed in this article are my own and do not represent those of my employer or any affiliated organizations. The content is based on personal experience and reflection, and should not be taken as professional or academic advice.

