
How to build fairness into healthcare AI

Artificial intelligence (AI) now sits in the engine room of US healthcare. Triage, risk scores, and care-management pathways run on models that promise faster decisions and sharper diagnoses. Kayode Adeniyi explores how healthcare AI can be built to ensure equity and amplify the best instincts of medicine: precision, prevention, and personalised care.


When this machinery works, patients win. When it leans on the wrong signals, it quietly points help away from the people who need it most. Fairness is not an add-on. It’s the gatekeeper for whether these systems deserve to touch patient care at all.

The recurring failure mode is simple. Many tools measure “need” indirectly, through billing history, prior utilisation, or admissions, because those are easy to grab at scale. That choice bakes history into the future. If an underserved patient has spent less on healthcare due to access barriers, a model that treats high spend as a sign of patient need will rank them lower for extra care even when their clinical need matches that of a better-served patient. This is not a bug; it is a design decision with distributive consequences, documented repeatedly in the literature.

Proxies decide who gets help 

Proxies are substitute measures, like using past hospital bills to guess how sick someone is. They are attractive because they live in clean databases, and dangerous because they smuggle structural inequity into the objective function. In telemedicine and remote monitoring, models trained on utilisation patterns often down-prioritise rural, uninsured, or minoritised patients while still posting glossy headline metrics. Aggregate accuracy looks great, but subgroup performance sags. The error clusters where resources already run thin. Feature choice is not value neutral. Choosing a proxy is choosing a distribution.
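
To see how flattering aggregate numbers can coexist with sagging subgroup performance, it helps to compute the same metric overall and within each group. A minimal Python sketch, with invented labels, predictions, and group names used purely for illustration (nothing here comes from a real system):

```python
import numpy as np

def accuracy_by_group(y_true, y_pred, groups):
    """Return overall accuracy and accuracy within each subgroup."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    overall = float(np.mean(y_true == y_pred))
    per_group = {
        str(g): float(np.mean(y_true[groups == g] == y_pred[groups == g]))
        for g in np.unique(groups)
    }
    return overall, per_group

# Toy example: the model is right for 9 of 10 well-served patients
# but only 2 of 4 underserved patients.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0]
groups = ["well_served"] * 10 + ["underserved"] * 4

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall)    # ~0.79, respectable in aggregate
print(per_group)  # {'underserved': 0.5, 'well_served': 0.9} -- the error clusters where resources run thin
```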

So, who ensures fairness? In theory, everyone. In practice, responsibility diffuses. Developers cite data limits. Hospitals point to a lack of vendor transparency. Regulators hit jurisdictional gaps. This “regulatory orphanhood” lets high-impact systems slip between the Health Insurance Portability and Accountability Act (HIPAA), the Food and Drug Administration (FDA), and the Centers for Medicare & Medicaid Services (CMS) while steering care at scale. Documentation standards such as model cards, data sheets, and decision logs exist and help, but adoption accelerates only when procurement and reimbursement demand them.

The Optum lesson: objectives allocate care 

Optum, a US healthcare analytics firm, supplied the clearest demonstration of how objectives shape care. Its widely used risk-stratification tool treated past spending as a stand-in for future health need. Because Black patients face durable access barriers, they often incur lower recorded costs than white patients with similar morbidity. The proxy misled the system, diverting enhanced care away from many Black patients. When researchers retuned the objective to reflect health need rather than spend, the share of Black patients flagged for extra support jumped from 17.7% to 46.5%, without losing predictive performance. The model did exactly what it was told. It was told the wrong thing.
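
The mechanism is easy to reproduce on toy data: rank a cohort by recorded spend and by underlying need, hold programme capacity fixed, and compare who gets flagged. A minimal sketch, with every number invented for illustration (none of it is drawn from the Optum tool or the published study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cohort of 1,000 patients.
n = 1000
barrier = rng.random(n) < 0.3                    # patients facing access barriers
need = rng.gamma(shape=2.0, scale=1.0, size=n)   # underlying health need
# Recorded spend tracks need, but access barriers suppress it.
spend = need * np.where(barrier, 0.5, 1.0) + rng.normal(0.0, 0.1, n)

capacity = 100                                   # slots in the care-management programme

def share_flagged_with_barriers(score):
    """Among the top `capacity` patients by `score`, what share face barriers?"""
    flagged = np.argsort(score)[-capacity:]
    return float(np.mean(barrier[flagged]))

print(share_flagged_with_barriers(spend))  # ranking on spend under-selects the barrier group
print(share_flagged_with_barriers(need))   # ranking on need reflects actual burden (~0.3)
```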

Two clear inferences follow. First, post-hoc fairness checks arrive late. By the time disparity shows up in outcomes, harm has compounded in downstream pathways. Second, opacity plus fragmented oversight makes it hard for institutions to see, let alone fix, allocation skew at scale. Voluntary transparency helps; it does not carry clinical accountability on its back. 

From model metrics to system ethics: build assurance like building codes

Technologies embody values. In medicine, objectives and constraints decide who benefits and who waits. Optimising solely for discrimination metrics (how well a model separates high risk from low) can yield tools that look calibrated in aggregate yet treat groups inconsistently, because the objective over-weights efficiency and under-weights equity. The remedy is to move fairness upstream, into the objective function and the stop/go gates, not just into a report appendix. Think lifecycle, not launch: bias can enter at problem framing, data selection, feature engineering, evaluation, deployment, and monitoring.
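
One concrete way to move equity into the objective itself is to penalise group disparity during training. The sketch below fits a logistic model with a demographic-parity-style penalty on the gap in mean predicted risk between two groups; the penalty form, the `lam` weight, and all names are illustrative choices rather than the author's prescription, and the article's central fix (a need-based target rather than spend) sits even further upstream:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def penalised_loss_and_grad(w, X, y, group, lam):
    """Cross-entropy loss plus lam * (gap in mean predicted risk between groups)^2.

    `group` is a 0/1 array marking membership of the two groups (illustrative coding).
    """
    p = sigmoid(X @ w)
    eps = 1e-9
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad_ce = X.T @ (p - y) / len(y)

    a, b = group == 1, group == 0
    gap = p[a].mean() - p[b].mean()               # disparity in average predicted risk
    s = p * (1 - p)                               # derivative of the sigmoid
    dgap = (X[a] * s[a][:, None]).mean(axis=0) - (X[b] * s[b][:, None]).mean(axis=0)

    loss = ce + lam * gap ** 2
    grad = grad_ce + 2.0 * lam * gap * dgap
    return loss, grad

def fit(X, y, group, lam=1.0, lr=0.1, steps=5000):
    """Plain gradient descent; enough for a sketch, not for production use."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, grad = penalised_loss_and_grad(w, X, y, group, lam)
        w -= lr * grad
    return w
```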

Healthcare already governs buildings, drugs, and devices as critical infrastructure. When AI steers care, treat it the same way. The National Academy of Medicine sets out subgroup-aware validation and real-world monitoring that can keep pace with shifting context and deployment scale. A natural complement is independent assurance capacity: pre-deployment evaluation followed by continuous surveillance, so algorithmic claims face testing before they land in busy clinics. In that world, documentation stops being a brochure. It becomes an instrument that ties objectives, features, data lineage, and known limits to named stewards with authority to act.

A simple build specification

  1. Make equity a hard constraint. Code distributive equity into the objective alongside accuracy. Reject models that hit headline metrics by eroding subgroup fidelity. Optimise for health need, not spend or throughput, so the system serves clinical purpose rather than historical access patterns. 
  2. Make decisions traceable. Keep an auditable record linking inputs, feature transforms, model versions, and deployment contexts to people and institutions. Traceability turns “black box” rhetoric into assignable responsibility and enables lifecycle audits across vendors and providers. 
  3. Watch subgroup fidelity in the wild. Set minimum performance thresholds for protected groups. Implement retraining, feature revision, or suspension when drift appears. Put these thresholds inside clinical governance so monitoring sits beside safety and effectiveness, not as an occasional side project (a minimal sketch of such a check follows this list). 
  4. Assign well-defined roles. Hospitals set procurement baselines (no deployment without model cards, subgroup plans, and live monitoring) and review equity alongside safety. Vendors publish objectives and limits and accept liability linked to documented assurances. Regulators and payers tie authorisation and reimbursement to demonstrated subgroup performance and traceability. Concentrate responsibility where it can act. 
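
As flagged in point 3, the monitoring check itself can be very small. A sketch with illustrative threshold values and a sensitivity metric chosen for simplicity; real floors and triggers would come from clinical governance, not from this example:

```python
import numpy as np

# Illustrative thresholds only; real values belong to clinical governance.
MIN_SENSITIVITY = 0.80        # floor for every protected group
MAX_GAP = 0.05                # tolerated gap between best and worst group

def subgroup_sensitivity(y_true, y_pred, groups):
    """True-positive rate within each subgroup."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        positives = (groups == g) & (y_true == 1)
        rates[str(g)] = float(np.mean(y_pred[positives] == 1)) if positives.any() else float("nan")
    return rates

def monitoring_verdict(rates):
    """Apply the preset triggers to the latest monitoring window."""
    values = list(rates.values())
    if any(np.isnan(v) for v in values):
        return "trigger: insufficient subgroup data, review before relying on the model"
    worst, best = min(values), max(values)
    if worst < MIN_SENSITIVITY or (best - worst) > MAX_GAP:
        return "trigger: retrain, revise features, or suspend"
    return "within thresholds: keep monitoring"
```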

What “good” looks like 

Think seismic codes for algorithms. Before deployment: objectives aligned with health need; subgroup-aware validation; traceability that names accountable stewards. After deployment: live monitoring with preset triggers; public-facing summaries of performance and limitations; authority to remove unsafe systems from service. Patients and communities participate in design choices with distributive stakes, improving construct validity and legitimacy. 

AI can amplify the best instincts of medicine (precision, prevention, personalised care) when its objective function aligns with equity. The path is straightforward: measure what matters (need, not spend), disclose how the system works (traceability and documentation), watch it in the wild (dynamic subgroup fidelity), and assign responsibility where it can act (procurement, reimbursement, authorisation). Build fairness in first. Then scale. 

