
How AI Helps the Best and Hurts the Rest


Can generative AI serve as an effective adviser for business owners and entrepreneurs? Intuitive chat-based natural language interfaces mean that anyone who can read and write can use GenAI tools for a wide range of tasks, even if they lack technical skills. This has obvious appeal for entrepreneurs and small business owners, many of whom could benefit from an on-demand adviser able to help with marketing, pricing, operations, and strategy.

Improving the performance of entrepreneurs at scale has proved to be challenging. The most effective interventions tend to be high touch, such as hands-on consulting, individualized mentorship, and in-person networking. However, they are expensive to deliver and difficult to scale. In emerging markets specifically, this constraint is often even tighter: High-quality business support can be scarce, and its cost can be prohibitive relative to organizational resources. A low-cost and always-available AI mentor could potentially deliver, at scale, the type of business guidance that has historically been limited by the availability and cost of human experts.

To test whether accessing generative AI can actually help small businesses, we ran a field experiment with hundreds of small business owners in Kenya. We randomly gave half of them access to a WhatsApp contact that connected them to a version of OpenAI’s GPT-4 that we had prompted to act as a Kenyan business adviser, and then we tracked business performance over time. The key factor driving either an increase or decrease in profits and revenues? Whether an entrepreneur had the judgment to distinguish good AI advice from bad.

Testing AI Advice in the Real World

Many previous studies of generative AI have focused on narrow, well-defined tasks, such as drafting emails, answering customer service queries, or generating marketing ads. For such tasks, the tool’s output can often be used with little modification, allowing even less-skilled users to benefit from AI assistance. Consistent with this idea, studies have found that the workers who were struggling the most before using AI benefited the most from using such tools.

Managing a business is not a narrow or well-defined task, though. Entrepreneurs often face vague and ambiguous problems. They do not just need help with writing an email; they need help deciding what problem to tackle, what strategy to pursue, and which advice applies to their specific context and then choosing what to implement under real constraints. On its own, AI does not typically handle those kinds of problems well. When Anthropic gave its Claude 3.7 Sonnet large language model total control of a small vending business in its San Francisco office, the LLM sold items at a loss, gave away free products, and quickly ran the shop into the red. But what happens when, instead of running a business on its own, AI advises a human entrepreneur who can then decide when to implement or ignore its ideas?

To test how AI impacts a broad task like running a business, we designed a study to evaluate it in the messy reality that entrepreneurs face. We recruited 640 small business owners in Kenya from a range of sectors — including food and beverage, agriculture, and car-wash services — and ran a randomized controlled trial from May to November 2023. Half of the participants were given access to a GPT-4-powered AI business adviser delivered via WhatsApp, the dominant messaging platform in Kenya, since most of the country’s population communicates via mobile phone. Eighty percent of participants had never used ChatGPT or any other generative AI tool. Both groups received brief onboarding training, but the control group received an online business training guide instead of AI access.

Business owners in the experimental group could ask any business-related question of their choosing and use the assistant as much or as little as they wanted. We tracked sales and profits over time, comparing entrepreneurs who got the AI assistant against the control group, who did not. On average, the difference between the control group’s and the experimental group’s business performance was close to zero and not statistically significant. But the average for the experimental group masked a striking split: Having access to generative AI boosted revenues and profits by 15% among business owners who had already been doing well (that is, they were in the top 50% of performance before the experiment), but among those in the bottom 50%, AI use led to a nearly 10% decline in revenues and profits.
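The subgroup split described above is a heterogeneous-treatment-effect comparison: the same treatment (AI access) is estimated separately for top-half and bottom-half performers. As a rough illustration of the method only — the data below is simulated, not the study’s actual data, and the effect sizes are assumed from the figures quoted above — a regression with a treatment-by-baseline interaction recovers both subgroup effects:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 640  # same sample size as the study; the data itself is simulated

# Illustrative setup: random AI access, random baseline-performance half.
treated = rng.integers(0, 2, n)        # 1 = AI adviser, 0 = control
high_baseline = rng.integers(0, 2, n)  # 1 = top half of pre-experiment performance

# Assumed effects for the simulation: +15% for treated high performers,
# -10% for treated low performers, plus noise.
noise = rng.normal(0, 0.05, n)
profit_change = treated * np.where(high_baseline == 1, 0.15, -0.10) + noise

# OLS with an interaction term: y = b0 + b1*T + b2*High + b3*(T*High)
X = np.column_stack([np.ones(n), treated, high_baseline, treated * high_baseline])
beta, *_ = np.linalg.lstsq(X, profit_change, rcond=None)

effect_low = beta[1]             # treatment effect for bottom-half performers
effect_high = beta[1] + beta[3]  # treatment effect for top-half performers
print(f"effect on low performers:  {effect_low:+.2f}")
print(f"effect on high performers: {effect_high:+.2f}")
```

Averaging the two subgroups would report an effect near zero, which is exactly how a positive and a negative effect can cancel out in a headline number.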

Same Advice, Different Choices

Why would a tool capable of producing high-quality business suggestions harm the entrepreneurs it was supposed to help? We found that both high- and low-performing entrepreneurs asked a similar number of questions, asked similar types of questions, and even received similar advice from the AI tool. The difference was in what they chose to act on.

In our data, we saw that every entrepreneur, regardless of baseline performance, received generic suggestions like “lower your prices” or “invest in advertising” alongside more tailored, context-specific ideas. Low performers disproportionately acted on the generic advice, cutting prices and increasing spending on advertising. These one-size-fits-all moves often eroded margins and raised costs without generating enough new business to offset them.

High performers, in contrast, used GenAI to discover and implement changes specific to their situation: A cybercafe owner started renting out gaming accessories to customers; a car-wash owner introduced a new in-demand detergent and started selling cold sodas to waiting customers; and another entrepreneur found alternative power sources to withstand electricity blackouts. Both groups had access to the same quality of AI advice. The difference was whether the entrepreneurs had the judgment to sift through AI-generated suggestions, pick the ideas that fit their business, and ignore the rest.

Our takeaway from the study is that in contexts where problems are broad and fuzzy, generative AI amplifies the role of human judgment. The value created by an open-ended AI adviser is critically dependent on the human judgment that guides its use and application. In open-ended contexts, a positive effect of AI on performance relies on asking good questions, interpreting suggestions, and choosing which actions to implement. For users with strong judgment, the tool helps them surface new ideas and think through trade-offs. Users with weak judgment can end up following plausible-sounding but misleading advice that leads to worse outcomes.

For managers and policy makers, recognizing this nuance is essential. Without it, well-intentioned AI deployments risk widening performance gaps, because the people who often need the most help are also the least equipped to filter and apply advice.

How Leaders Should Implement AI Advice for Open-Ended Problems

Our experience prototyping and launching a WhatsApp-based AI adviser shows how quickly and cheaply generative AI tools can be rolled out and made widely accessible. But that same speed raises the risk that organizations roll out open-ended AI tools without strong guardrails or evaluation. As the cost of deployment falls, AI is being applied to an ever-wider range of open-ended tasks. For example, engineers at Google now use AI coding tools in their day-to-day work, and there is evidence that the most experienced developers benefit the most from these tools. In book publishing, established authors have been able to increase their output with AI while AI-assisted entrants have flooded the market with lackluster prose. For leaders managing AI within their organizations, these findings reinforce the importance of careful design and rigorous measurement to ensure that AI does not inadvertently lead to worse performance.

What can leaders do? First, cultivate awareness. Leaders should not assume that AI will boost performance for everyone. Evaluations that focus only on average effects can be misleading, because the mean can conceal meaningful harms for specific groups.

Next, leaders can design for heterogeneity. For workers with experience and judgment, open-ended AI tools can have real returns. Junior or weaker performers might need tighter guardrails to avoid following harmful suggestions. One promising direction is feeding the AI tool more context about the user’s specific situation — their business data, financials, or competitive environment — so that it can better filter out generic advice that doesn’t fit. Building that kind of contextual awareness into AI tools remains an open challenge that GenAI vendors are actively exploring.

In the meantime, it is more likely that most people will find generative AI useful for specific, narrow tasks — such as summarizing documents, writing more clearly, or reviewing code for efficiency — rather than tasks that require a great deal of contextual knowledge to determine the applicability of its output and skill to implement well.

Organizations should also invest in human judgment and scaffolding around AI use. For high-stakes decisions, escalation to human support is a critical safeguard, especially when advice is open-ended, context-dependent, or difficult to evaluate in advance. Organizations can build supports that make these tools safer, such as structured onboarding that elicits context, decision checklists, or warnings about margin-destroying tactics.

The third step is to audit for uneven effects by asking questions in three areas:

  • Adoption: Are some groups avoiding the tool entirely or using it far less than others?
  • The interactions themselves: Are different users asking different kinds of questions, providing different amounts of context, or receiving meaningfully different outputs?
  • What happens next: Is the tool changing real-world decisions, and are those decisions producing better results for some users than others?

Asking those questions can help leaders pinpoint where inequality may emerge, which allows for intervention through targeted training, workflow redesign, or tighter controls.
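As a minimal sketch of what such an audit could look like in practice, the snippet below computes per-segment adoption rates and mean outcome changes from a usage log. The record format, segment labels, and numbers are all hypothetical, invented for illustration — a real audit would pull these from the organization’s own telemetry and outcome data.

```python
import numpy as np

# Hypothetical usage-log records: (user_segment, used_tool, outcome_change).
# Segments and values are illustrative, not drawn from any real deployment.
logs = [
    ("senior", True, 0.12), ("senior", True, 0.08), ("senior", False, 0.01),
    ("junior", True, -0.05), ("junior", False, 0.00), ("junior", True, -0.07),
]

def audit(records):
    """Per-segment adoption rate and mean outcome change among adopters."""
    segments = {}
    for seg, used, delta in records:
        s = segments.setdefault(seg, {"n": 0, "adopted": 0, "deltas": []})
        s["n"] += 1
        if used:
            s["adopted"] += 1
            s["deltas"].append(delta)
    return {
        seg: {
            "adoption_rate": s["adopted"] / s["n"],
            "mean_outcome": float(np.mean(s["deltas"])) if s["deltas"] else None,
        }
        for seg, s in segments.items()
    }

report = audit(logs)
for seg, stats in report.items():
    print(seg, stats)
```

A gap in adoption rates flags the first audit question; a gap in mean outcomes among adopters flags the third — and points at where training, redesign, or tighter controls should be targeted.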

AI shows real potential to increase business performance at scale, but the benefits are not guaranteed. Our results suggest that GenAI can inadvertently increase inequality in business performance by helping stronger performers more than others and, potentially, actively harming lower performers. When deploying AI tools at scale, the central design challenge is not merely to make AI available but to make its use effective, so that scaling AI does not scale inequality.