Thomas Wolf is the cofounder and chief science officer of open-source AI platform Hugging Face, which provides access to millions of pretrained AI models that can be downloaded and run locally. With over 10 million users, the site can be daunting for newcomers. Thomas explains how the company aims to improve its accessibility through documentation on the company blog as well as community feedback signals similar to social media likes and upvotes.
Thomas and Me, Myself, and AI host Sam Ransbotham discuss the benefits and trade-offs of both open-source and closed-source AI models, the evolution of microchips, and the future of hardware and software development, as well as the hopes Thomas has for the future of coding with AI, starting with his children's generation.
Subscribe to Me, Myself, and AI on Apple Podcasts or Spotify.
Transcript
Allison Ryder: Today’s guest believes open-source and closed-source models will coexist in the world of AI. Find out what he considers the opportunities and drawbacks to each, as well as how communities can make AI tools — and themselves — work better together on today’s episode.
Thomas Wolf: Hello, I’m Thomas from Hugging Face, and you’re listening to Me, Myself, and AI.
Sam Ransbotham: Welcome to Me, Myself, and AI, a podcast from MIT Sloan Management Review exploring the future of artificial intelligence. I’m Sam Ransbotham, professor of analytics at Boston College. I’ve been researching data, analytics, and AI at MIT SMR since 2014, with research articles, annual industry reports, case studies, and now 12 seasons of podcast episodes. In each episode, corporate leaders, cutting-edge researchers, and AI policy makers join us to break down what separates AI hype from AI success.
Hey everyone. Thanks for joining us again, and welcome back to a new season. Today, I’m lucky to be talking with Thom Wolf. He’s the cofounder and chief science officer of Hugging Face. Thom, great to have you on the show today.
Thomas Wolf: Thanks, Sam. It’s a big pleasure to be here.
Sam Ransbotham: Let’s start with Hugging Face itself. Some of our listeners may not be familiar with Hugging Face. Can you give us a brief overview of what the company does and what you do?
Thomas Wolf: Yeah, of course. Hugging Face is an open-source AI platform. We give access to all the AI models that are open source, which means, basically, these are models you can download and run wherever you want. So when you use an AI model nowadays, you can choose to go to ChatGPT, Anthropic, or Google — they are the most widely used at the moment — or sometimes you want to run the AI models in your own data center, or you want to run them on some specific hardware. [This] could be local hardware, or it could be faster chips because you need instant response. In those cases, you will want to go for an open-source AI model, which is a model you can basically just download.
There [are] quite a lot of them. On Hugging Face, there [are] close to 4 million of these models at the moment, with one new model being published every five seconds. Some of the most famous ones are the Meta series, the Llama series. And the one I think [that] got the most adoption and visibility recently was DeepSeek, which was released in January and kind of crashed the stock markets when it came out.
So over the past eight years, Hugging Face has been building this platform, growing it together with the community of people and teams who are both sharing and downloading models. This community is now roughly 10 million users, or AI builders, [as] we call them. And we've expanded beyond just model hosting to also host data sets, which are used to train models, to fine-tune them, and to evaluate them. More recently, [we] also [added] what we call Spaces, which are simple, low-code ways to test all of these models.
Sam Ransbotham: So there are a lot of people offering solutions here. And if I think back on the way technology has developed throughout the history of mankind, people came up with chips, and Bell Labs and Intel came along and built fabs for processors. None of that was open source. Why is open source important here?
Thomas Wolf: I think open source has always been important in a way. The thing is, open source is more often the long game in computer science. So if we go back to, for instance, the year 2000 or pre-2000s … basically, Microsoft [had] one of the largest operating systems, and Linux was somehow more for fanatics or geeks. Fast-forward 20 years, and Linux is really the basis of all enterprise software and all enterprise cloud; you almost always run them on some version of Linux. Even macOS, which is probably the most widely used on consumer laptops nowadays [and] is one of the largest competitors to Windows, is itself based on [a Unix] core.
So there is this trend, which is [that] open source has some advantages that make it extremely appealing in the long term. Obviously, [in] the short term, you can go faster with closed source, and that's also what we see with closed models. You can iterate faster. You can raise large amounts of capital to train your models. You can try to grab the most expensive AI researchers and pay them huge sums of money.
We keep pushing a lot for open science, and just this Tuesday we published a new model called SmolLM3, which is an extremely smart model. It's the best one at 3 billion parameters. So it's in the range of size that you can run on your laptop and even on a smartphone.
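As a concrete sketch of what "just download and run" means in practice, the snippet below uses the transformers library's standard text-generation pipeline. The model ID is an assumption based on the SmolLM3 naming above; any open text-generation model on the Hub works the same way.

```python
# A minimal sketch of running an open model locally with the
# `transformers` library. The model ID is assumed from the SmolLM3
# naming above; substitute any open text-generation model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM3-3B",  # assumed Hub ID
)

result = generator("Open-source AI matters because", max_new_tokens=50)
print(result[0]["generated_text"])
```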
We’ve decided to share, at the same time, all the data, all the recipes, all the knowledge on how to build these models. It’s fine for us because we don’t make money out of these models, and we think it’s very good because anyone who can, [who] wants to build a model based on this, or looks a bit like this, or wants to extend this type of model, now has all the knowledge … to start.
We think open source can be defined in many ways in AI, but we think the most radical way is to say you share just everything. You share the data, you share the code, you share the recipe. We wrote up, in a very long blog post [which] we're probably going to make into a full-blown paper, all the nitty-gritty details on how to build this model. So we've kept publishing all of this. We're even writing a book right now on how to efficiently train an LLM [large language model] on a GPU cluster. We think basically all of this should be really accessible.
But I don’t want to give the impression that I’m an open-source absolutist. I think both of them have interesting advantages and drawbacks. And I think both of them will generally coexist in AI.
If you compare [AI] to hardware, it’s an interesting comparison you made, right? Bell Labs was, in part, mostly developing at the time software [that] was probably much more niche, and software and hardware [that] were much more tied together.
I think if you compare [AI] to hardware, there is some difference, but there is also an interesting advantage and point to be made for open-source hardware. That's actually something we have started to do very recently at Hugging Face in robotics. I don't know if it's something we want to cover today, but we just acquired, this year, an open-source hardware company in robotics called Pollen [Robotics]. And I think there is some definite interest there.
The general idea is, for me, that a lot of the way you see hardware in the long term can be reinvented using software. So maybe we’ll come to that. [I] don’t want to go too quickly into crazy tangents.
Sam Ransbotham: No, I think that’s pretty fascinating, especially with the HopeJR and the robotics initiatives that you’re involved with right now. I guess you can go ahead and talk about that. But as you’re doing that, I want to push you a little bit and say [that] I don’t even know how we define hardware and software anymore. You’ve got firmware. You’ve got so many layers in the software stack. And traditionally, even if we had open source — Linux was the example you mentioned — it still may have [been] built on top of firmware that was vendor-specific and proprietary. So even within those stacks it becomes complicated. So talk about hardware. I’d like to talk about HopeJR.
Thomas Wolf: I think this is very interesting. One thing [is] it’s slightly futuristic. But my job at Hugging Face is mostly to think about what’s coming. So I do spend a lot of time thinking about the next year and the coming few years in AI.
I think, just like you're saying, the frontier between software and hardware is maybe becoming smaller again. We had a moment where there was really this huge [separation] — I mean, maybe at the beginning, it was all the same because everyone was so close to the hardware that we didn't really have anything we would call software, any really large abstraction.
And nowadays, the interesting thing is we see this tendency to close the gap and to go back to a very low level. There [are] a couple of trends [I have not] fully thought through, but what I see is, for instance, people using AI again to speed up the process of building hardware — both computer-assisted design, basically for mechanical pieces, but also hardware like chips, all these types of things.
And [they’re] saying [that] using AI, we can maybe reinvent and basically lower, or maybe digest, all of this knowledge that you kind of need nowadays if you want to develop something in hardware in a form that’s actually so helpful that, again, we could develop hardware a little bit like we develop software. So we could iterate much more quickly on it. We would have much less of an entry barrier of knowledge to basically be able to design things in hardware.
So I’m very excited about this. And I think this is, in part, unlocked by the tools that we now have available.
Sam Ransbotham: If I go to Hugging Face’s site, I’m overwhelmed. There’s so much stuff. And I think it’s great that everything’s open, but how do we solve then this curation problem of “There’s a whole bunch of information. I’m never going to get it all. What do I do? How do I get started?”
Thomas Wolf: It’s a difficult problem because there are like 4 million models. So how do you find the model you want to use, right?
In the beginning, we tried to do some of the curation manually ourselves, but with one new model every five seconds, that's not really sustainable.
There [are] a couple of general guidelines, of course. I mean, if you're looking for a speech generation model, you can restrict yourself to this type of model. If you're looking for a text [model], you can filter for that.
Sam Ransbotham: Now you’re down to just a million.
Thomas Wolf: Exactly. It's surprising because everyone usually thinks that the category of model they are interested in is the most downloaded. But it's not. Often the most downloaded one will be a speech model. And I get LLM people or text people [who] are very surprised. They're like, "What?"
But the reality is AI is becoming a huge, huge field with many subfields. And each of these subfields is actually getting really large itself. So we cannot really curate it ourselves anymore. Just like on the internet, you cannot really curate the best websites yourself. You can have a couple you like, you can rely on search, but the best way usually is to rely on a kind of social discovery. So you can go on Reddit, or you can find places where people [add] likes. If you're on Pinterest, maybe there [are] likes for different things.
So the way we’re trying to do that more and more is to give social tools to comment on models, to make collections of models — a little bit like the internet itself, I would say, which is often guided by other people [who] tell you, “You should go there,” and that’s how you find the place.
I think the most [reliable] signals right now are in two places on the Hugging Face website. One is the blog. We have a blog section that's very active, actually, and has really high-quality content. It's a good thing to follow. The other is the trending models, which show you which models have gotten the most likes or the most interest among all the new models in the past week or two.
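The same signals are exposed programmatically. Below is a rough sketch using the huggingface_hub client; the filter and sort values are illustrative choices, and the exact parameters may vary by library version.

```python
# A sketch of the "social discovery" signals described above, using
# the `huggingface_hub` client to rank models programmatically.
from huggingface_hub import list_models

# Most-liked speech generation models (community likes as the signal)
for model in list_models(filter="text-to-speech", sort="likes", limit=5):
    print(model.id, model.likes)

# Most-downloaded models overall, regardless of task
for model in list_models(sort="downloads", limit=5):
    print(model.id, model.downloads)
```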
Sam Ransbotham: That makes a lot of sense. Although, at the same time, I can't help but want to connect it to what we said earlier: "Oh, it's only the weird thing. It's the not-average thing, so maybe you should be looking at the bottom of the list instead of the top of the list." … The problem is there's going to be a lot there.
But back in the original software days, if you think about the huge, monolithic IBM 360 operating systems, they were giant and expensive to get going. And we've seen in software a great reduction in that cost by building on components. But we still have just a handful of chip manufacturers because of the literally billions of dollars it takes to create chips and hardware. That may not be true of all hardware; robotics or whatever [may] have a much lower entry barrier because of the scale. How do we enable that? How do you push that?
Thomas Wolf: In particular, if we talk about chips, which is a field I've been quite interested in recently, I think there [are] two converging things. The first one is that AI, in a way, is quite a simple technology. So … most AI chips are much simpler than a CPU or GPU. We don't need all of the complexity we had to build on top of those — all the branch prediction and the complex things needed to make sure we really make the best use of these chips in [a] very generic compute workflow and setup.
An AI model itself is extremely simple in a way. It's really just a series of matrix multiplications, a couple of nonlinearities, and the attention blocks. That's maybe slightly scary the first time you see the equations, but actually it's really that simple, right?
If you compare it to the huge and cumbersome system we had to design to actually make a general computing system efficient in all of these edge cases, just being able to support one forward pass is a baby task in a way.
So this means we can really reinvent how we make this computing architecture itself, and we can make it much, much simpler. So that’s one thing.
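To make that simplicity concrete, here is a toy sketch of one transformer block's forward pass: random weights, toy dimensions, and no residual connections or layer norm, just to show that the core really is a handful of matrix multiplications, a softmax, and one nonlinearity.

```python
# A toy forward pass for one transformer block: matrix multiplications,
# a softmax for attention, and a single nonlinearity. Dimensions and
# weights are arbitrary illustrative values.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d_model = 8, 16
x = np.random.randn(seq_len, d_model)  # one token embedding per row

# Attention block: matmuls for queries, keys, and values, then scores
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
attn = softmax(q @ k.T / np.sqrt(d_model)) @ v

# Feed-forward block: two matmuls with one nonlinearity between them
W1 = np.random.randn(d_model, 4 * d_model)
W2 = np.random.randn(4 * d_model, d_model)
out = np.maximum(attn @ W1, 0) @ W2  # ReLU as the nonlinearity

print(out.shape)  # (8, 16): same shape out as in, ready for the next block
```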
And if you project into the future, and if you think that AI might become the dominant form of compute — by which I mean the dominant thing we use energy on in compute — this might be the thing we actually want to overindex on and say, "Actually, it's no longer that the GPU is on the side helping the CPU; it's the reverse now. The AI computer is the central piece of what we're building."
So the first thing is, I think there is a way to redesign our computer architecture in a much simpler way. [This] is kind of a funny gift the AI revolution brings us: It's much simpler, and it's also much more powerful, because a well-trained LLM, and the LLMs of the future, can simulate extremely complex things.
And you can even ask it right now, which is [a] funny thought experiment that my friend Stephen [Balaban] from Lambda [Labs] was doing. You can ask it to simulate any type of software. You could even tell it, "Now, behave as a spreadsheet." Or, "Behave as a website."
And it behaves as a very complex type of software pretty well. But if you look at the core operation, it's all just very simple forward passes; you just have a lot of them. So that's one aspect: We have this very simple compute architecture.
And the other aspect is we can increasingly use this AI software as helpers to simplify the complex tasks we had to do. Basically, we can offload a lot of the cognitive load we needed to carry: "Oh, yeah, I need to check this part of the software. I know where it is in the docs, but I need to check how this works. And there is this thing I also need to be careful about."
With the development of these types of AI agents, there is some hope that we can really automate a lot of this design work, just like nowadays, when you use CAD software, you're actually using something extremely powerful. With just a couple of parametric lines, you can design something in 3D that used to be extremely complex to design before.
In the same way, just like we [can] do a little bit of parametric shaping in Onshape or [another] design software [program], I can see us designing [a] very complex system by just giving a couple of points on a [Bézier] curve and trusting that the AI system will connect all the things and make sure it all fits nicely.
So this is the other thing: It's not just that we are building these systems; we're also using them to help us build extremely complex systems.
Sam Ransbotham: I think it’s great. You mentioned Onshape. My son will now listen to the episode. … It’s amazing. You can take a 2D picture and turn it into a 3D. He’s making models of himself just by taking a frontal picture there. … It’s pretty amazing.
I want to push back on a couple of different things you've got going there. One, I think the chip growth is something I hadn't really thought about, but we originally had CPUs, and then we developed GPUs, or graphics processing units, for processing images on the screen. And then we had this aha moment: "Hey, those matrices are the same matrices inside of machine learning models. We can use those GPUs much more efficiently."
But what you’re pointing out is we didn’t make those chips for that purpose originally. And we might have different design constraints if we did make those chips from [the] start. If you combine that with your ability to make chips cheaper, make hardware cheaper in general, then that’s a nice combination that might get started.
Thomas Wolf: To be honest, the GPU is increasingly an AI-optimized chip as well, for sure. I mean, [there are] Tensor Cores, and you see a variety of chips; you see Cerebras chips. We work with a variety of them. I think it's very interesting to follow this field and to see how competition pushes people to explore, just like you were saying, maybe reinventing this. Cerebras is this example of "Let's be able to host a full model on just [these] very large wafer-scale chips." [Groq] — I was just seeing it in the news this morning — is also an interesting case of "Let's push for the low batch, the small batch. Let's push the tokens per second to the max."
Maybe the driving force here — if we step back a little bit and take more of a business view on this — is that for the first time, one of the main metrics we have is a very low-level [metric]: the cost per token. And the cost per token is an interesting [metric] because it's something that … a CFO-level person could take a look at. When you use Gemini, that's the first [metric] people will tell you, and when you compare these values [across providers], that's maybe the most cost-related metric you'll take a look at.
But it’s also a metric that’s extremely low level. Because if you think about that, that’s really just a series of operations. And if you think about that … you can even link that to almost “How many transistors will I activate because this model has this size, and one token is just one for what pass?”
You could link that to exactly how [many] billion transistors you will need to activate for this price.
In the past, we didn’t have that for the price we were paying for all our compute. We never said, “Oh, well, you will pay actually this amount because you do 10,000 operations.” We never had this connection between [these] cost metrics and the extremely low level of one single operation on the chip.
It’s no surprise that you see new startups appearing, both in the U.S. in Europe, [such as] Edge Fractal. A lot of them — I am citing the one that I may be closest to having chips in production. But also [of] all the hyperscalers, there’s Trainium at AWS [and] Microsoft has been developing — now, there seems to be some delay — but it’s basically also building their own chips. Even OpenAI [has its] own chips research division. Just because the first time, naturally, your tendency, when you see this matrix, is that you want to change it, you want to lower this cost.
Sam Ransbotham: I think metrics are a big deal, and we respond to those. I hadn't really thought about it, but before, we talked about computing hours, and it was hard to understand what an hour did for you. At the other end of the scale, we had floating-point operations per second [FLOPS], which I have even less of a way to relate to anything I want to do. The cost per token actually makes a lot more sense, and then people will optimize on it.
Thomas Wolf: It’s the dollar per token. That’s the thing. We never had the dollar per gigahertz that would maybe push you to make faster CPUs. Or, I don’t know, the dollar per FLOPS that would make you make faster GPUs in computer graphics. … The connection was really wide between these two universes.
So a lot of people are saying, and I think that's pretty true, that the limiting cost of intelligence will be the cost of electricity. The last person I heard saying that was Patrick Collison, but I think a lot of people view it this way.
I do agree, but there will be a multiplying factor between electricity and intelligence. And this multiplying factor will be exactly "How much intelligence can your chips give you per electron, per [unit of] energy?" And that's where you will want to squeeze as much as possible.
Sam Ransbotham: You mentioned cognitive [load], and, of course, a big deal is what we're going to do with these chips. Let's say we've got all these chips, and we've got them cheap. What are we going to do with them?
There’s a report that I think you reacted to about a country of Einsteins sitting in a data center. I think that’s pretty appealing, the idea that, “Oh, gosh, we get all these models out there running in data centers, and we’ll just suddenly not have one Einstein, but we’ll have zillions of Einsteins out there.” Just think of the progress.
But today, tools in general and AI specifically, of course, can be a head start for people to get to average quickly, so they don’t have to spend time getting to average cognitive output. But at the same time, they can also be a way that people learn to depend on these tools.
Without practicing skills, we don’t get better at skills. How can we go beyond average? Can we have a country of Einsteins sitting not in data centers but at homes that are using AI tools to provide a head start?
Thomas Wolf: Yeah, for sure. This [debate was] started by an essay from Dario Amodei, the CEO of Anthropic. … It's a beautifully written essay. It's very optimistic. It's called Machines of Loving Grace. It basically says that AI will enable us to make extremely important scientific breakthroughs. What he took as [an] example was the Nobel Prize-level breakthrough. Where I kind of agree with him … is I think if you summarize scientific progress, you have a lot of incremental progress — and I was guilty of doing a lot [of that] during my Ph.D. and postdoc, which basically was the maximal thing I could do: do your tiny piece on this little aspect, extending the frontier a little bit. And then you have these massive paradigm shifts, the ones that will typically be awarded a Nobel Prize. It can be general relativity; it can be CRISPR in biology. There [are] a couple of them in every field. And they usually create new fields [themselves].
What I was saying, and what I think, is that AI will be extremely useful for all the incremental innovation. AI is very good at exploring many, many things around the status quo, but AI is extremely bad at challenging the status quo itself. It's very easy to get ChatGPT to agree with you on anything. It's very hard to get this model to actually disagree with you on something and challenge your view of the world, which is quite a problem in some cases, in particular in scientific research.
Two weeks ago, I had the pleasure of meeting again one of my former professors, Alain Aspect, who got the Nobel Prize [three] years ago, if I'm not mistaken, for basically settling a disagreement that Einstein had with quantum mechanics, where Einstein was disputing a core idea of quantum mechanics.
If you project the wave function, you basically get a random output. So you can test that in an experiment, and he did this optical experiment.
If you talk with this type of researcher, they don’t want to please you. They have strong opinions, they have strong ideas. And I think it’s what actually led them to make a strong discovery, because they were like, “I don’t think this is right. I want to prove this wrong.”
And they don’t try to do this type of sycophancy that the LLM will do, where they actually want to please you.
I think [what’s] a strong missing point for AI models nowadays is they’re really trying to … first, they are trying to predict the most likely next word in a sentence, which means they will miss words that are unlikely. … They will tend to regress. Just like you were saying, they tend to bring you to the average. They’re very good [at] average thinking, or [being an] average designer, or [the] creative process if you use them [as an] image designer. But they’re quite bad at really challenging the average and [giving a] crazy idea that might challenge some of their training data in particular.
So my point is that they will be very useful [as] research assistants, but they won't be the ones that could lead us to [an] extremely novel breakthrough.
So I ask myself this question a lot, in particular when I see my kids, my son, using AI: "How much should they use AI to automate their thinking? What is the remaining part that's very human, that we should keep, that we should build?"
Like always, I think it's probably a bad idea to just say, "Don't use AI." We'll need to find a way to teach them how to use this tool and to remain very conscious of what the missing parts are.
Sam Ransbotham: You brought up something that I think about a lot, both with my own kids and with the university students I'm in contact with all the time, [which] is "What, exactly, should their relationship be with these technologies?"
I feel like, because I'm at the front of the room or around the dinner table, I ought to have some opinion about this. And it's really difficult to know.
It sounds like you’re pushing your kids to use these tools and to embrace them, to some degree at least?
Thomas Wolf: Yeah, I think you have to. [I may have] sounded quite critical of these tools, but I think these tools are also a huge way to unlock creativity. Let's talk about, for instance, vibe coding, which I know quite well.
I think this idea that you can prompt a website into existence, and a quite complex website at that, is very fascinating, because it used to be quite complex to code a website, for sure. A lot of people, I think, just self-censor and say, "Oh, I have this idea for something, but it's so complex. I don't [know] HTML. I don't want to." You have some no-code tools, for sure, but they all have their quirks and limitations on what they can build; some of them don't have databases. And this general idea [that] I can just ask for these things to exist is quite new.
So one month ago, for instance: My son is 12. He's still interested in what I do, luckily for me, for a couple of years, maybe. I don't know how long this [will] last. But I managed to bring him and a couple of friends and kids of other friends to a little hackathon we organized. We selected one vibe coding tool, Lovable, that I found very easy and very nice to use. We explained to them a little bit [of] the design process: that it's better, for instance, to formalize your ideas a little bit. So we asked them to draw the website they had in mind, to think about the idea and how it [works], and then to prompt it. And we tried to organize their process a bit.
But the thing we saw is they grabbed this tool very quickly. And they started to create many more different apps than we [predicted]. We thought they would each have one idea, but basically they had 10 ideas. Very quickly, each kid was experimenting with four or five different websites at the same time, because they wanted to create this thing to connect scouts with football players [and] this thing to connect cat owners with second-hand cats. So that was very crazy to see. And then you can imagine — they were between 9 and 12 — they will grow up knowing that if they want to create a website, it's just a couple of prompts away, just a feat they can do in a couple of hours. I think it was very beautiful to see.
And then you even see them morphing into little entrepreneurs. My daughter was building this website to connect cat owners with these second-hand cats that people wanted [to rehome].
And she was thinking, “Oh, maybe I could also ask them to pay when they want to meet each other because then they need to give the address. …”
And so, just because the technical part is so easy, you see them starting to ask these questions. They start to … project themselves a lot more into how this would work in real life.
That was one recent example that really struck me as an unlocking of creativity I had no idea could exist. In September and October, we want to redo this type of hackathon everywhere in the world. We want to see what [would] happen if we do it at a bigger scale than just our neighborhood. I'm quite excited. Maybe when this podcast goes out, we'll have this worldwide kids' vibe coding hackathon going on.
Sam Ransbotham: I may have a couple of kids we can add to that. But I think that ties together well with what you were saying before: You have this tool that always says yes, will always do what you ask, and tries very hard to accomplish it quickly, and you can use that to your advantage. That seems to tie well with your framing of the tools as good assistants.
Thomas Wolf: I think in [that] way we are quite lucky. I'm rather unimpressed by all the stories of AI freeing itself from its chains and deciding to take over humanity. The way we are building these tools — and really, that's both their advantage and also their strongest limitation, in a way — is really as assistants for what we want to do.
Sam Ransbotham: This has been fascinating; I've really enjoyed talking with you. Maybe by the time this podcast comes out, we'll have some hackathons organized for kids all over the world. Thanks for taking the time.
Thomas Wolf: Thanks a lot, Sam. It was a pleasure.
Sam Ransbotham: Thanks for joining us today. In our next episode, I’m joined by Angela Nakalembe, engineering program manager at YouTube. Please join us for an insightful conversation about trust and safety in the midst of an influx of AI-generated content.
Allison Ryder: Thanks for listening to Me, Myself, and AI. Our show is able to continue, in large part, due to listener support. Your streams and downloads make a big difference. If you have a moment, please consider leaving us an Apple Podcasts review or a rating on Spotify. And share our show with others you think might find it interesting and helpful.