Scaling AI With Adaptive Governance

Christian Gralingen

The Research

From 2022 to 2025, the authors conducted in-depth, semistructured interviews with senior leaders and practitioners responsible for AI governance, risk, compliance, data, and product decisions.

Core interviews were conducted at Microsoft, Barclays, Kyriba, Nasdaq, Lloyds Bank, Danske Bank, and the Abu Dhabi Department of Finance. The interviews focused on how governance works in practice: where it breaks down, how controls are enacted, and what organizational trade-offs leaders face as AI systems scale.

The authors collected additional evidence on AI governance at more than 40 other financial institutions by drawing on public disclosures, regulatory filings, and practitioner documentation. These additional cases were used to test whether the themes that emerged consistently from the core interviews generalized beyond those organizations.

Leaders with even a cursory understanding of artificial intelligence know that while the technology can help them improve productivity and capture new opportunities, it can also expose their organization to many risks. Those with a bit more knowledge are aware that surfacing and mitigating those risks requires adopting responsible AI practices. And leaders who are scaling an AI implementation within their organization will quickly realize that ad hoc attention to those practices is inadequate and that they need to develop the capacity to systematically govern AI at scale.

But building that capacity is proving far harder than most executives expect. They know what they need to accomplish; frameworks from governments and regulators define important guardrails and principles, such as transparency, fairness, and accountability.1 But to translate those guardrails and principles into day-to-day workflows and decision-making, organizations must rethink AI governance. They must frame that task not as a compliance obligation but as a strategic, adaptive capability that evolves as AI systems scale, use cases expand, and risks shift over time.

In this article, we will share how leading organizations are doing exactly that. We will also introduce an approach to adaptive AI governance built on two principles: matching governance controls to the type of AI system and risk involved, and embedding those controls directly into workflows, decision rights, and accountability structures.

The Fundamentals of AI Risk

To design effective AI governance, leaders must first understand the multiple ways in which AI can fail and the corresponding risks. The nature and severity of these risks depend on the type of system, its level of autonomy, and the scope of domains affected by its decisions. The central challenge, therefore, is to design controls that anticipate how risks will emerge and that evolve as AI systems operate. Even as conditions, inputs, and expectations change, AI must remain reliable, safe, and aligned with an organization’s values and goals.

In practice, most AI risks emerge at two moments that require very different governance responses: during development and after deployment. Development risks include using biased or incomplete training data, failing to adequately align the model to the task requirements, and following inadequate validation processes. For example, an early credit-limit-increase model at a bank we studied demonstrated that small input changes could lead to unexpected decision shifts.

Deployment risks arise when models interact with dynamic environments and human operators: Sustaining legitimacy, judgment, and accountability once AI systems are operating at scale in real time is a central challenge. Model quality may degrade as the statistical properties of input data change over time, a phenomenon termed data drift. A model may generate plausible but false outputs or be overly trusted by users who lack the means to detect errors. At Nasdaq, AI-driven market-surveillance systems monitor trading activity for suspicious patterns, generating hundreds of alerts per second. Those systems may fail to flag activity accurately, however, because the boundary between abnormal and illicit behavior is often hard to spot; illegitimate behavior may be deliberately designed to pass as compliant by exploiting the patterns a model has learned.
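One common way to quantify data drift is the population stability index (PSI), which compares how recent production inputs are spread across fixed bins relative to the training data. The Python sketch below assumes the inputs have already been binned into proportions; the function names and the 0.1 and 0.25 cutoffs are illustrative rules of thumb, not values drawn from any organization described here.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """Compute PSI between two binned distributions.

    `expected` holds the share of training data in each bin and `actual`
    the share of recent production data in the same bins. Empty bins are
    floored at `eps` to avoid taking log(0).
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


def drift_status(psi, warn=0.1, alert=0.25):
    """Map a PSI value to a status using illustrative (hypothetical) cutoffs."""
    if psi >= alert:
        return "alert"
    if psi >= warn:
        return "warn"
    return "stable"


if __name__ == "__main__":
    training = [0.25, 0.25, 0.25, 0.25]    # e.g., four income bands at training time
    production = [0.10, 0.20, 0.30, 0.40]  # the same bands in recent traffic
    psi = population_stability_index(training, production)
    print(round(psi, 3), drift_status(psi))  # 0.228 warn
```

Here a shift toward the higher bands produces a moderate PSI in the "warn" band. What each status should trigger (review, recalibration, or retraining) remains a governance decision that the metric itself does not settle.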

Fit-for-Purpose Controls

The kinds of controls employed depend not only on when risks arise in the AI life cycle but also on what kind of AI system is involved and how widely its decisions propagate. Artificial intelligence systems can be broadly divided into two categories: those based on bounded-learning (or static) models and those that learn and adapt in deployment. (See “Controls in Adaptive AI Governance Systems.”)

Bounded-learning systems operate within a fixed set of rules and parameters. Optimizing how those rules are applied, rather than changing them, is what improves their performance. Credit-scoring models, for example, refine risk estimates based on income or payment history, but they do not alter how those variables relate to one another. Many generative AI models are “pretrained” (static) and do not get updated during use. Contrast that with adaptive learning systems, which evolve by incorporating production data into their training data and by updating internal representations and relationships between variables. Algorithmic trading platforms and dynamic fraud-detection systems illustrate this approach.

Just as salient to the type of control required is the scope of domains affected by AI decisions, shown on the vertical axis of the figure “Controls in Adaptive AI Governance Systems.” This dimension determines how far and how fast risks can travel once a system goes wrong. At one extreme are narrow-scope systems, where errors remain contained within a specific function or task (such as detecting anomalies within a single transaction stream). At the other extreme are wide-scope systems that shape outcomes across multiple functions, geographies, or even industries, such as cross-border supply-chain optimization platforms. The difference is not incremental but exponential: As system reach expands, small errors interact, propagate, and amplify into second-order effects.

Based on our typology of AI systems, we believe that rules-based controls provide the baseline safeguards for all narrow, static AI systems. When such static systems operate at a wider scope, additional propagation-risk controls must be layered on to address broader downstream effects.

For adaptive learning systems, baseline safeguards remain necessary but must be complemented by ex post alignment controls, particularly those focused on explainability and legitimacy. When adaptive systems also have a wide scope, they require the most comprehensive approach: integrated controls that combine baseline rules-based measures with propagation risk management and alignment mechanisms. Let’s take a closer look at how each of them works in practice.
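The typology reduces to a small decision rule. The Python sketch below encodes it as a mapping from system type and scope to the control layers the framework calls for; the enum and function names are hypothetical labels for the article's categories.

```python
from enum import Enum


class Learning(Enum):
    STATIC = "bounded-learning (static)"
    ADAPTIVE = "adaptive learning"


class Scope(Enum):
    NARROW = "narrow"
    WIDE = "wide"


def required_controls(learning, scope):
    """Return the control layers for a system, following the typology:
    rules-based controls are the baseline everywhere, adaptive systems add
    ex post alignment, and wide-scope systems add propagation-risk controls."""
    controls = ["rules-based"]
    if learning is Learning.ADAPTIVE:
        controls.append("ex post alignment")
    if scope is Scope.WIDE:
        controls.append("propagation-risk")
    return controls


if __name__ == "__main__":
    for learning in Learning:
        for scope in Scope:
            print(learning.value, "/", scope.value, "->", required_controls(learning, scope))
```

Only the wide-scope adaptive case returns all three layers, which corresponds to the integrated controls described above.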

Rules-Based Controls

Rules-based controls are designed to prevent and correct errors in systems that operate within clearly defined parameters. They are particularly effective in narrow decision domains where logic is explicit and outcomes auditable, such as credit scoring, fraud detection, or the use of customer service chatbots. Rules-based controls embed relevant norms (such as ethical guidelines and industry standards) and compliance requirements into models, using them as design constraints. Rules-based controls also include processes such as validation testing or anomaly monitoring.

Consider the credit-limit-increase decision model mentioned earlier. A senior AI leader at the bank explained that it uses a statistical model rather than deep learning so that decisions remain interpretable. Before deploying a new model, the analytics team produces documentation called a model card that covers three aspects of AI risk management. First, data checks indicate whether the training data is complete, recent, and balanced and how the team will detect data drift over time. Next, decision logic and edge cases are checked to see how scores translate into approve/deny decisions; this includes explicit analysis of thresholds where a customer tips from no increase to an increase, so that customers in the “gray zone” are not unfairly treated. Finally, bias and discrimination tests are undertaken to check that the model does not overfit to particular customer profiles or systematically disadvantage certain groups.
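The threshold analysis in the model card can be illustrated with a short Python sketch: Scores at or above a cutoff lead to an increase, and any score within a small margin of the cutoff is flagged as gray zone for closer review. The 0.60 cutoff, the 0.05 margin, and all names are hypothetical placeholders rather than the bank's actual parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitDecision:
    customer_id: str
    score: float
    increase: bool   # True if the score clears the cutoff
    gray_zone: bool  # True if the score sits close to the cutoff


def decide(customer_id, score, cutoff=0.60, margin=0.05):
    """Translate a model score into an approve/deny decision and flag
    borderline scores (hypothetical cutoff and margin)."""
    return LimitDecision(
        customer_id=customer_id,
        score=score,
        increase=score >= cutoff,
        gray_zone=abs(score - cutoff) <= margin,
    )


def gray_zone_report(scores, **kwargs):
    """Return the IDs of customers whose decisions fall in the gray zone."""
    return [cid for cid, s in scores.items() if decide(cid, s, **kwargs).gray_zone]


if __name__ == "__main__":
    scores = {"c1": 0.58, "c2": 0.62, "c3": 0.80, "c4": 0.30}
    print(gray_zone_report(scores))  # ['c1', 'c2']
```

Listing gray-zone cases on both sides of the cutoff makes it easy to check whether near-identical customers receive different outcomes.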

The model card undergoes quality-assurance review by an independent model-risk unit, with input from credit-domain and regulatory experts. Internal auditing later verifies that these steps were followed. Only then does the model go live.

Human judgment is central even in rules-based settings. In one organization, each new lending model for midmarket clients underwent sample testing before deployment. Risk teams selected 100 existing client files and ran them through the model. Relationship managers then compared the model’s recommended lending decisions with their own assessments. Where recommendations diverged, the model team investigated whether the model had uncovered a genuine insight or was overfitting to idiosyncrasies in the data. Only once the sample review showed an acceptable level of alignment between model outputs and the judgments of the domain experts involved — and the sources of disagreement were understood — did the bank approve the model for live use. After launch, periodic sample reviews continued as part of the standard risk-and-control cycle.
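The sample-testing step amounts to comparing model recommendations with relationship managers' assessments on the same client files. The Python sketch below assumes decisions are recorded as labels keyed by file ID. The 90% agreement bar and the function name are hypothetical. Mirroring the process described, approval also requires that every divergence has been investigated and explained.

```python
def sample_review(model_decisions, expert_decisions, explained, min_agreement=0.90):
    """Compare model and expert decisions on the same sample of client files.

    Returns (agreement_rate, divergent_ids, ready_for_live_use). A model is
    ready only if agreement meets the (hypothetical) bar AND every divergent
    file is in `explained`, i.e., its source of disagreement is understood.
    """
    if model_decisions.keys() != expert_decisions.keys():
        raise ValueError("both reviews must cover the same client files")
    divergent = sorted(
        file_id
        for file_id in model_decisions
        if model_decisions[file_id] != expert_decisions[file_id]
    )
    agreement = 1 - len(divergent) / len(model_decisions)
    ready = agreement >= min_agreement and set(divergent) <= set(explained)
    return agreement, divergent, ready


if __name__ == "__main__":
    model = {f"file{i:03d}": "approve" for i in range(100)}
    experts = dict(model)
    experts["file003"] = "decline"
    experts["file071"] = "decline"
    print(sample_review(model, experts, explained={"file003"}))
```

In this example agreement is high, but the model is still not ready because one divergence has not yet been explained.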

Rules-based controls are effective because they make critical decision boundaries explicit, reviewable, and contestable across domains. They are adaptive because they can be recalibrated over time. Divergences between model outputs and expert judgment are treated as learning signals, feeding back into updated model thresholds, assumptions, and review routines as data, models, and decision contexts evolve.

Ex Post Alignment

The complexity of advanced AI systems, particularly those based on deep neural networks, renders traditional traceability and explainability methods less effective. Rules-based controls depend on the ability to specify decision logic, yet that logic becomes increasingly opaque as models grow more complex. As a result, stakeholders can no longer rely on inspecting the logic alone; they must find other ways to ensure that outcomes remain reliable, fair, and aligned with organizational and regulatory expectations. When such systems operate with significant autonomy, this need becomes especially critical, since decisions may be made and acted upon without immediate human review. Generative AI introduces an additional layer of difficulty because of its stochastic behavior: The same prompt may yield different outputs.

This is where ex post alignment controls come in. They reveal not how a decision was made but whether its outcomes remain legitimate. They assess AI decisions against ethical, regulatory, and domain-specific standards. While some techniques carry over from rules-based approaches, the emphasis shifts from preventing errors upfront to detecting misalignment as systems operate, learn, and scale.

Organizations operationalize ex post alignment through layered evaluation processes that test outcomes against reference expectations. Microsoft, for instance, has developed a structured evaluation pipeline in which high-stakes models are assessed against libraries of expert-defined policies — such as what constitutes a “fair” or “acceptable” outcome. Evaluators annotate model outputs against these policies, while independent reviewers validate where the system falls short. In some cases, these evaluations can be partially automated — for example, when AI systems are continuously assessed against predefined policy benchmarks, fairness constraints, or risk thresholds, with automated monitors flagging deviations for human review.
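Partially automated evaluation of this kind can be sketched as a monitor that compares observed metrics with predefined benchmarks and queues deviations for human review. The Python below is an illustration, not Microsoft's pipeline. The metric names, the bounds, and the `Benchmark` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Benchmark:
    metric: str
    lower: Optional[float] = None  # minimum acceptable value, if any
    upper: Optional[float] = None  # maximum acceptable value, if any


def flag_deviations(observed, benchmarks):
    """Return (metric, value, reason) tuples to route to human reviewers.

    A metric that was not measured is flagged too, since missing evidence
    is not evidence of compliance.
    """
    flags = []
    for b in benchmarks:
        value = observed.get(b.metric)
        if value is None:
            flags.append((b.metric, None, "not measured"))
        elif b.lower is not None and value < b.lower:
            flags.append((b.metric, value, f"below {b.lower}"))
        elif b.upper is not None and value > b.upper:
            flags.append((b.metric, value, f"above {b.upper}"))
    return flags


if __name__ == "__main__":
    benchmarks = [
        Benchmark("policy_violation_rate", upper=0.01),
        Benchmark("fairness_ratio", lower=0.80),
        Benchmark("expert_agreement_rate", lower=0.95),
    ]
    observed = {"policy_violation_rate": 0.02, "fairness_ratio": 0.85}
    for flag in flag_deviations(observed, benchmarks):
        print(flag)
```

The monitor decides only what reaches a reviewer. Whether a flagged deviation reflects genuine misalignment remains a human judgment.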

Algorithmic auditing is another critical component of ex post alignment. After deployment, a model’s behavior is systematically examined to detect hidden risks, evaluate fairness and performance across affected groups, and verify that outcomes align with organizational policy and ethical standards. Auditing proceeds in two steps. First, auditors identify plausible failure scenarios and define the full use case, including whom the system serves, who is affected by its decisions, and for what purpose it operates. Second, they monitor these risks by assessing decision outputs, input data, and internal logic against predefined criteria. This process helps organizations surface unintended consequences, such as disparate impacts; document recurring risk patterns; and trigger corrective action before harm proliferates. Frameworks for auditing algorithmic risk, such as those articulated in Cathy O’Neil’s work on auditing AI systems, provide practical tools and metrics to operationalize this approach and strengthen accountability.2 In this way, auditing functions as both a diagnostic mechanism and a foundation for continuous improvement.
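One widely used check for disparate impact compares favorable-outcome rates across groups. U.S. employment-selection guidance treats a group rate below four-fifths (80%) of the most favored group's rate as evidence warranting scrutiny. The Python sketch below applies that ratio to logged decisions. It illustrates a single metric and is not the method of the auditing frameworks cited; the group labels and data are hypothetical.

```python
from collections import Counter


def favorable_rates(outcomes):
    """outcomes: iterable of (group, favorable) pairs, favorable being a bool."""
    totals, favorable = Counter(), Counter()
    for group, ok in outcomes:
        totals[group] += 1
        favorable[group] += int(ok)
    return {group: favorable[group] / totals[group] for group in totals}


def disparate_impact_flags(outcomes, ratio_floor=0.8):
    """Return {group: impact_ratio} for groups whose favorable-outcome rate
    is below `ratio_floor` times the highest group's rate."""
    rates = favorable_rates(outcomes)
    best = max(rates.values())
    if best == 0:
        return {}  # no group receives favorable outcomes; ratios are undefined
    return {g: r / best for g, r in rates.items() if r / best < ratio_floor}


if __name__ == "__main__":
    log = ([("A", True)] * 50 + [("A", False)] * 50
           + [("B", True)] * 30 + [("B", False)] * 70)
    print(disparate_impact_flags(log))  # {'B': 0.6}
```

A flag such as this is a prompt for investigation rather than a verdict. The auditor still has to establish whether the gap reflects bias or a legitimate difference in the underlying cases.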

A key part of ex post alignment is ensuring that people do not treat AI outputs as unquestionable truths. Because many AI recommendations are inherently probabilistic, organizations need to train users to interpret them as informed signals rather than final decisions. Helping managers understand when to rely on the system, when to challenge it, and how to spot unexpected or biased outputs is essential for keeping AI use legitimate, accountable, and aligned with organizational values over time.

Managing misalignment at scale can be a particular challenge, especially for systems that are designed to filter, prioritize, and escalate alerts in real time. Nasdaq’s AI-driven market surveillance, for example, monitors trading activity for irregularities — such as unusual volumes, price anomalies, or potential manipulation — and can generate hundreds of high-risk alerts per second. Cross-functional teams of compliance officers, data scientists, and domain experts review flagged activity through structured case workflows. Each alert is assessed to determine whether it reflects genuine market manipulation or a false positive triggered by unusual but legitimate trading behavior. Investigators document the rationale for their conclusions, and these outcomes are fed back to model developers to recalibrate thresholds, refine detection features, and reduce recurring noise in future alerts.
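The feedback loop from investigators to model developers can be illustrated with a threshold search over closed cases. The Python sketch below assumes each investigated alert is recorded as a (score, confirmed) pair. It picks the lowest alert threshold at which a target share of raised alerts would have been genuine, which cuts noise while keeping as many true detections as possible. The precision target and names are hypothetical, and this is not a description of Nasdaq's method.

```python
def recalibrate_threshold(labeled_alerts, target_precision=0.5):
    """Return the lowest alert threshold at which at least `target_precision`
    of the alerts it would raise were confirmed by investigators, or None.

    labeled_alerts: list of (score, confirmed) pairs from closed cases.
    Scanning candidate thresholds from low to high keeps recall as high as
    possible among thresholds that meet the precision target.
    """
    for threshold in sorted({score for score, _ in labeled_alerts}):
        raised = [confirmed for score, confirmed in labeled_alerts if score >= threshold]
        if sum(raised) / len(raised) >= target_precision:
            return threshold
    return None


if __name__ == "__main__":
    closed_cases = [(0.2, False), (0.3, False), (0.4, False),
                    (0.6, True), (0.7, False), (0.9, True)]
    print(recalibrate_threshold(closed_cases))       # 0.4
    print(recalibrate_threshold(closed_cases, 0.6))  # 0.6
```

Raising the precision target trims more false positives but moves the threshold up, so choosing the target is itself a risk-tolerance decision.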

Escalation committees intervene when investigations suggest the involvement of coordinated bad actors or when anomalies indicate broader systemic risk. Audit trails capture key elements of this process, including the original alert, supporting data signals, the human decision taken, and any subsequent model adjustments made. Periodic governance reviews are conducted to evaluate patterns of false positives and missed detections to ensure accountability, regulatory compliance, and continuous improvement of surveillance rules. Even so, surges in alert volumes can place severe strain on teams, overwhelming response capacity and increasing the risk of error.

One effective approach to managing the impact of high volumes of alerts is to redesign workflows around AI outputs. This approach is well illustrated by a global bank’s experience with AI-driven fraud detection. Executives found that the main challenges did not stem from errors in the model predictions but from breakdowns in how fraud alerts were interpreted, routed, and acted upon across teams. Inconsistent handoffs between compliance, risk, and front-line staff members often led to delayed responses, duplicated effort, or missed follow-up, undermining the system’s effectiveness despite technically sound outputs. For example, alerts were sometimes routed to the wrong team, duplicated across units, or left unresolved because no group clearly owned the next step. Customer service employees occasionally contacted clients based on alerts that fraud teams had not yet validated, while high-risk cases were delayed because escalation criteria were unclear.

To address those problems, the bank mapped the alert workflow step-by-step and reassigned responsibilities at each decision point. Fraud analysts were given clearer authority to close low-confidence alerts, fraud operations focused on rapid escalation of confirmed cases, and customer service teams were engaged after a fraud review determined that outreach was necessary. Decision rules were standardized — for instance, when an alert should be suppressed, investigated further, or escalated — reducing delays, unnecessary escalations, and alert overload.
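Standardized decision rules like these can be written down explicitly, which also makes them reviewable. The Python sketch below encodes one hypothetical version. The confidence cutoffs, the large-amount trigger, and the owner assignments are illustrative assumptions rather than the bank's actual rules.

```python
from enum import Enum


class Action(Enum):
    SUPPRESS = "suppress"
    INVESTIGATE = "investigate"
    ESCALATE = "escalate"


# Hypothetical ownership: analysts close or investigate lower-confidence
# alerts; fraud operations handles escalations.
OWNER = {
    Action.SUPPRESS: "fraud analyst",
    Action.INVESTIGATE: "fraud analyst",
    Action.ESCALATE: "fraud operations",
}


def route_alert(confidence, amount, suppress_below=0.3, escalate_at=0.8, large_amount=10_000):
    """Apply one explicit rule per outcome so every alert has a single next step.
    Large amounts are never auto-suppressed in this sketch."""
    if confidence >= escalate_at:
        return Action.ESCALATE
    if confidence < suppress_below and amount < large_amount:
        return Action.SUPPRESS
    return Action.INVESTIGATE


def may_contact_customer(fraud_review_done, outreach_needed):
    """Customer service reaches out only after a fraud review calls for it."""
    return fraud_review_done and outreach_needed


if __name__ == "__main__":
    for conf, amt in [(0.1, 500), (0.1, 25_000), (0.5, 500), (0.9, 500)]:
        action = route_alert(conf, amt)
        print(conf, amt, action.value, OWNER[action])
```

Because each alert maps to exactly one action and one owner, the ambiguity that left alerts unresolved or duplicated disappears by construction.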

Ex post alignment focuses on evaluating AI decisions after they have been made, by testing outcomes against ethical, regulatory, and domain-specific expectations rather than reconstructing internal decision logic. Ultimately, successful ex post alignment does not eliminate risk; it sustains legitimacy by ensuring that high-agency AI outcomes remain contestable, correctable, and aligned with the standards that matter over time. Unlike traditional risk management, ex post alignment accepts that some misalignment is inevitable — and focuses governance on detection, contestability, and correction rather than prevention alone.

Propagation-Risk Controls

Rules-based controls and ex post alignment mechanisms share an important limitation: They tend to treat risk as largely confined, focusing on discrete errors or individual outputs. This approach can be effective when AI systems operate in relative isolation, but it leaves gaps when systems are interconnected through real-time data flows, APIs, and automated decision-making. The rise of agentic AI is a case in point. As AI systems increasingly initiate actions autonomously, coordinate with other systems, and pursue objectives across multiple domains, errors or misalignments originating in one system can propagate across others. The relevant concern, therefore, is interdependence: Risks that propagate from one system to another can have downstream effects that traditional, output-focused controls may overlook.

Regulators are increasingly recognizing the importance of propagation risks and the need for robust testing and oversight. The Bank of England, for example, has highlighted the risks posed by “deep trading agents” — AI-driven strategies that could amplify external shocks or coordinate in ways that evade human detection. In health care, biased diagnostic models can spread flawed heuristics across hospitals and insurers. In supply chains, algorithmic procurement platforms can amplify pricing errors across entire supplier networks. Similar dynamics can arise in any digitally interconnected system.

Propagation-risk controls represent a third layer of governance and are designed to surface second- and higher-order effects before they overwhelm downstream functions. In our framework, rules-based controls safeguard narrow and relatively static processes, alignment mechanisms address complex systems whose decisions are opaque, and propagation controls focus on interconnected systems. These controls are concerned not only with what happens within a system but also with what occurs when systems interact. Their central challenge is invisibility: Failures travel laterally, exploiting hidden interdependencies that often become apparent only when a disruption occurs. A minor logistics API error, for example, may be harmless in isolation, but when it is combined with a cyber incident affecting a payment gateway, it can contribute to systemic breakdown.

A governance framework built around a company-centric view of risk is poorly suited to track such cross-boundary dynamics. Because propagation risks unfold across interconnected systems, often beyond the visibility or control of any single organization, managing them requires a shift from a company-centric perspective to an ecosystem-aware perspective.

This shift involves three complementary activities: mapping interdependencies, monitoring shared infrastructures, and institutionalizing anticipatory oversight. Together, these practices help surface risks that remain invisible when controls focus only on isolated systems or individual outputs. The European Central Bank’s sectorwide cyber-resilience stress tests show how ecosystem-level propagation-risk controls can be enacted. These exercises map interdependencies across clearinghouses, payment systems, and financial institutions; monitor shared infrastructures for cross-organization vulnerabilities; and simulate how localized disruptions could cascade through the financial system. These practices generalize beyond regulation to any highly interconnected environment.
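The kind of simulation these exercises run can be sketched as a threshold cascade over a dependency map, in which a system fails once a large enough share of the systems it depends on has failed. The Python below uses a hypothetical five-node map echoing the logistics API and payment gateway example. A logistics failure alone stays contained, and a payment-gateway failure alone disrupts settlement, but the two together bring down every downstream system. Node names and tolerances are assumptions, and the model is far simpler than a sectorwide stress test.

```python
DEPENDS_ON = {
    "logistics_api": [],
    "payment_gateway": [],
    "order_service": ["logistics_api", "payment_gateway"],
    "settlement": ["order_service", "payment_gateway"],
    "customer_portal": ["order_service"],
}

# Share of a system's dependencies that must fail before it fails (hypothetical).
TOLERANCE = {"order_service": 1.0, "settlement": 0.5, "customer_portal": 1.0}


def simulate_cascade(depends_on, tolerance, initial_failures):
    """Propagate failures until no further system crosses its tolerance."""
    failed = set(initial_failures)
    changed = True
    while changed:
        changed = False
        for system, deps in depends_on.items():
            if system in failed or not deps:
                continue
            share_failed = sum(dep in failed for dep in deps) / len(deps)
            if share_failed >= tolerance[system]:
                failed.add(system)
                changed = True
    return failed


if __name__ == "__main__":
    for shock in ({"logistics_api"}, {"payment_gateway"},
                  {"logistics_api", "payment_gateway"}):
        print(sorted(shock), "->", sorted(simulate_cascade(DEPENDS_ON, TOLERANCE, shock)))
```

The combined shock is what reveals the systemic exposure; testing each component in isolation would miss it.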

Organizations can enact propagation-risk controls by redistributing visibility, accountability, and decision rights across ecosystems rather than relying solely on organization-level rules or ex post interventions. Because propagation risks are inherently cross-boundary, effective governance depends as much on coordination across organizations as it does on internal control. Some organizations must shift their cultural norms to encourage data sharing, coordination on standards, and co-investment in oversight infrastructures with partners, competitors, regulators, and, in some cases, open-source communities.3 Reducing propagation risks requires the understanding that resilience is no longer something a company can achieve on its own but instead is a property of the broader system it depends on.

As ecosystems become more densely interconnected, these risks are likely to intensify. The rise of agentic AI — capable of autonomously initiating transactions, negotiating contracts, or reallocating resources across networks — extends this logic, increasing both the speed and reach of failure propagation. In finance, logistics, and health care alike, errors may not simply spread; they may increasingly do so with limited human oversight.

Implementing Adaptive AI Governance

Once leaders have identified the AI risks that are salient to their organizations, and the corresponding controls that they need to have in place, the challenge is to integrate those controls into processes and systems, working within them and continuously adapting them. Doing so involves three key practices: embedding controls into workflows and incentives, building cross-domain fluency, and institutionalizing governance as a living learning system. Here is how to do that.

1. Embed risk-control protocols into operations. Risk protocols must be designed and hardwired into workflows, accountability structures, and incentives. Oversight should flow directly into planning, audits, and leadership reviews rather than sitting on a separate compliance layer. Making governance part of the operating fabric is a necessary condition for scaling AI with confidence.

A global bank whose leaders we interviewed embedded AI controls into its standard lending workflow rather than treating them as a separate compliance step. For each approved AI use case, the bank’s AI use-case committee documented (1) the risk tier (high, medium, or low) based on customer impact, regulatory impact, data sensitivity, and model type; (2) the mandatory controls associated with that tier (such as independent model validation, sample testing by relationship managers, or the frequency of post-deployment reviews); and (3) the decision rights (who could approve model changes and under what conditions). These requirements were then encoded directly into the credit-approval process and systems. Relationship managers could not bypass model-validation steps or deployment reviews; exceptions required explicit sign-off from both the business and risk management teams. Oversight surfaced in regular decision-making cycles, not through ad hoc committees or audits.
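The committee's three-part record lends itself to being encoded so that systems, rather than people, enforce it. The Python sketch below is a hypothetical rendering. The rule that a use case takes the tier of its most severe factor, the control lists per tier, and the review frequencies are assumptions for illustration. The dual business-and-risk sign-off for exceptions follows the practice described.

```python
LEVELS = ("low", "medium", "high")

# Hypothetical mandatory controls per tier.
MANDATORY_CONTROLS = {
    "high": ["independent model validation",
             "sample testing by relationship managers",
             "quarterly post-deployment review"],
    "medium": ["independent model validation",
               "semiannual post-deployment review"],
    "low": ["annual post-deployment review"],
}


def risk_tier(customer_impact, regulatory_impact, data_sensitivity, model_type_risk):
    """Each factor is rated 'low', 'medium', or 'high'; the use case takes
    the tier of its most severe factor (an assumed rule)."""
    factors = (customer_impact, regulatory_impact, data_sensitivity, model_type_risk)
    return max(factors, key=LEVELS.index)


def can_deploy(tier, completed_controls, exception_signoffs=()):
    """Block deployment until every mandatory control is done, unless both
    the business and risk management have signed off on an exception."""
    missing = [c for c in MANDATORY_CONTROLS[tier] if c not in completed_controls]
    if not missing:
        return True
    return {"business", "risk"} <= set(exception_signoffs)


if __name__ == "__main__":
    tier = risk_tier("medium", "high", "medium", "low")
    done = ["independent model validation"]
    print(tier, can_deploy(tier, done), can_deploy(tier, done, ["business", "risk"]))
```

A single sign-off is not enough to bypass a missing control, which is how this sketch reflects the rule that relationship managers could not skip validation or review steps on their own.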

2. Enable conclusive judgment across heterogeneous expertise and risk profiles. Adaptive AI governance does not require consensus or shared judgment. Quite the opposite: It requires mechanisms that enable conclusive judgment across heterogeneous expertise, methods, and risk profiles. This is often the hardest — and most decisive — task to accomplish. As AI risks shift across categories and cut through organizational silos, accountability cannot reside within any single function. Differences across domains are not a flaw but a feature: They reflect distinct expertise, evaluative methods, and risk tolerances. The governance challenge is therefore not to homogenize these perspectives but to create the conditions under which organizations can translate them into conclusive decisions at scale — while avoiding both judgment homogenization and uncritical rubber-stamping of AI outputs.

One central obstacle to institutionalizing a durable capacity for conclusive judgment is that rules-based controls are often undermined by siloed knowledge: Domain experts do not share a common frame. To overcome that barrier, share knowledge across domains via joint model reviews and documentation (such as the model cards described earlier), and hold routine cross-functional validation sessions that make decision logic, assumptions, and thresholds explicit and contestable. In ex post alignment controls, the challenge involves not only knowledge silos but also misaligned risk tolerances and methodological approaches. Alignment can break down when different teams operate with different implicit risk thresholds, closing judgment too early on the one hand or falling into analysis paralysis on the other. Reliance on divergent methods for reconciling expected outcomes with observed results (such as analytical validation, controlled experiments, or case-based judgment) can also cause misalignment. In such scenarios, disagreement is not simply about what the model recommends but about how much risk is acceptable and what constitutes sufficient evidence that the model is performing as intended.

A critical response, therefore, is not merely to “build trust” in AI recommendations but to establish shared evaluative routines that surface and reconcile differences in both risk tolerances and methodological approaches. Systematic post-deployment evaluations anchor discussions in observed system behavior rather than abstract beliefs about model quality.

Organizations can do this through structured review routines that combine incident and near-miss analysis, performance-drift monitoring, and explicit comparisons between intended use cases and actual decision outcomes. Crucially, these routines create common reference points — agreed-upon risk thresholds, shared evidentiary standards, and comparable metrics — through which analytically driven teams, experimentation-oriented groups, and use-case owners can jointly assess whether the model is functioning as intended. Over time, this enables assumptions, thresholds, and controls to be recalibrated, reducing both premature shutdowns driven by excessive caution and analysis paralysis driven by methodological disagreement.

Propagation-risk control depends on a fundamental shift in mindset: from treating risk as a company-centric problem to governing it as an ecosystem-level phenomenon. As with digital business ecosystems, risks in AI systems propagate unevenly across actors that have different roles, incentives, and degrees of interdependence.4 Mapping these interdependencies beyond company boundaries is a necessary first step — and often a wake-up call — but it is insufficient on its own.

As research on ecosystem strategy has shown, coordination breaks down when accountability is diffuse, incentives remain locally optimized, and no actor is explicitly responsible for orchestrating cross-boundary trade-offs.5 Similar dynamics undermine AI propagation-risk controls. Teams remain incentivized to focus narrowly on their own systems; risk ownership is fragmented across organizational units and external partners; and downstream or reputational risks are treated as someone else’s responsibility.

Without leadership support for ecosystem-level accountability — and governance mechanisms that differentiate risk ownership by type of interdependence — interdependency mapping risks becoming a one-off analytical exercise rather than a sustained governance capability. An ecosystem mindset requires not only visibility into connections but also shared rules of engagement, escalation rights, and decision authority to manage how risks propagate across organizational and technological boundaries over time.

Overcoming these barriers is essential to creating the conditions for conclusive judgment that respects differences in expertise, methods, and risk tolerance rather than collapsing them into a single uncritical evaluative frame.

3. Institutionalize governance as a learning system. AI governance cannot be static: Risks mutate, so controls must evolve.6 Effective governance therefore requires organizations to establish learning loops, with clearly assigned roles, that capture lessons from incidents and near misses and translate those lessons into updated standards, thresholds, and controls.

Rather than relying solely on controls, systems, or large-scale governance platforms, effective adaptive AI governance depends on building the right mindset and embedding practical learning loops into everyday oversight. This involves assigning explicit responsibility for reviewing incidents and near misses; systematically documenting what went wrong; and ensuring that insights are translated into revised policies, recalibrated thresholds, or strengthened controls. Over time, governance shifts from protocols and systems toward institutionalized continuous improvement, ensuring that AI systems remain aligned with organizational intent as models evolve, contexts shift, and new risks emerge.

Taken together, these steps mark a fundamental shift in governance of AI. Adaptive AI governance is not about multiplying controls, committees, or checklists. It is about identifying fit-for-purpose controls and hardwiring them into how the organization works, decides, and learns — into workflows and incentives, shared frames of judgment, and living systems that continuously absorb and act on experience. Organizations that treat governance as static will inevitably fall behind systems that learn, adapt, and propagate risk in real time. In contrast, organizations that institutionalize governance as a learning capability — one that connects strategy, execution, and oversight — can turn AI governance from a constraint into an enabler of scale. In the age of intelligent systems, advantage will come not from adopting AI faster but from governing it better — by embedding oversight where decisions are made, risks propagate, and value is created.