
From Rabbit Holes to Recommendations: Reddit’s Vishal Gupta

Vishal Gupta, engineering manager, machine learning at Reddit, joins the Me, Myself, and AI podcast with host Sam Ransbotham to explain how the social media community uses artificial intelligence to improve user experience and ad relevance. Much of the advertising work relies on increasingly sophisticated recommender systems that have evolved from simple collaborative filtering to deep learning and large language model–based systems capable of multimodal understanding.

Vishal and Sam also explore the philosophical and ethical aspects of AI-driven platforms. Vishal emphasizes the importance of balance — between exploration and exploitation in recommendations, between advertiser goals and user experience, and between human- and machine-generated content. He argues that despite the rise of AI-generated material, authentic human conversation remains vital and even more valuable as models depend on it for training.

Subscribe to Me, Myself, and AI on Apple Podcasts or Spotify.

Transcript

Allison Ryder: Why is balance an important consideration when thinking about nearly all things AI? Find out on today’s episode.

Vishal Gupta: I’m Vishal Gupta from Reddit, and you are listening to Me, Myself, and AI.

Sam Ransbotham: Welcome to Me, Myself, and AI, a podcast from MIT Sloan Management Review exploring the future of artificial intelligence. I’m Sam Ransbotham, professor of analytics at Boston College. I’ve been researching data, analytics, and AI at MIT SMR since 2014, with research articles, annual industry reports, case studies, and now 12 seasons of podcast episodes. In each episode, corporate leaders, cutting-edge researchers, and AI policy makers join us to break down what separates AI hype from AI success.

Hi, listeners. Thanks for joining us again. Our guest today is Vishal Gupta, engineering manager, machine learning at Reddit. He’s helped build cutting-edge AI products used by billions, like YouTube and Google Ads — all the biggies. Vishal has deep expertise in machine learning, recommender systems, and large-scale data processing. At Reddit, he’s also been closely involved in the relationship between platforms like Reddit and the overall AI ecosystem, which I’m excited to learn about.

Vishal, [it’s] great to have you on the podcast.

Vishal Gupta: Thank you for having me, Sam. [It’s] nice to be here.

Sam Ransbotham: I read a stat that Reddit has a billion posts, which is hard to imagine, and 100,000 communities, and is the fifth-most visited website in the U.S., seventh in the world. These are crazy numbers. So I suspect that most of our listeners know about Reddit, but can you give us a quick overview of what Reddit is?

Vishal Gupta: Reddit is a platform where we have hundreds of thousands of communities. People come to Reddit to explore their interests in this vast array of communities. One can subscribe to those communities, have a discussion on a topic, one can chime into already-happening discussions, and one can also go upvote and downvote stuff.

Whether you are into Lord of the Rings, or whether you are into cricket, you can find something that you can relate to [and] engage with the content.

Sam Ransbotham: That’s great. It’s interesting you mentioned cricket, because actually I’m on the record as not really believing that cricket exists, but we’re not going to go into that. Let’s start with how Reddit is using artificial intelligence. Can you give us a few examples?

Vishal Gupta: Reddit is using artificial intelligence in various ways. I will talk about two high-level ways of using AI, and then we can go deeper into those.

[The] first use case of AI is enabling users [to find] the right communities [to] engage with. So once users start their journey on Reddit, they will visit some of the communities that are maybe popular or [that match] the interests they showed while signing up, but we want users to engage with niche communities that they can really belong to. [That’s] connecting users with the content.

Another very important use case is when you’re browsing Reddit, there is a feed. If the feed is engaging, it’s relevant to you. It is able to satisfy the content needs that you’re looking for. AI is powering all of that. Now Reddit is a serious ads platform, showing the right ad to the right user at the right time, which is relevant to them. It is all powered by AI. That is another, broader use case of AI within the company.

Sam Ransbotham: I guess my first reaction was probably pretty naive. I guess I didn’t realize that there were 100,000 communities. My guess was if I just type the name of what I want, I would find what I want. But I guess this is a form of recommendation, like a recommendation engine that you’re doing. Maybe take a minute and explain recommendation engines for people.

Vishal Gupta: You are spot-on, Sam. It’s exactly a recommendation system. What [a] recommendation system does is it captures short-term and long-term users’ interests and provides recommendations that might be valuable to the users. Maybe, for example, user Vishal is typically interested in cricket and Lord of the Rings, but right now, he is searching for what blender to buy. If there is an explicit query on the platform, we want to make sure that we are able to satisfy the information needs of the user. There is a recommendation system component that supports search as well. So that is how I will define recommendation systems.

Sam Ransbotham: You’ve been working with recommendation systems for quite a while in your career. What’s changed over the time you’ve been working with them? How are recommendation engines now different than the recommendation engines of our forefathers?

Vishal Gupta: I started working on recommendation systems in my first job at Google back in 2015. At that time, most of the recommendation systems were powered by simple collaborative filtering types of algorithms. So basically, users have explicit engagement with items, and there [are] large matrices, which basically express users’ interests, and then you do very simple matrix factorization and generate some recommendations for users. That is how the Netflix challenge was won, and that was the old-school recommendation system.
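
For readers who want to make that first wave concrete, here is a minimal sketch of collaborative filtering via matrix factorization: a sparse user-by-item engagement matrix is approximated as the product of two low-rank factor matrices. The data, dimensions, and training loop are illustrative assumptions, not Reddit’s or Google’s actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 100, 50, 8

# Toy sparse matrix of explicit engagements (e.g., ratings or upvotes); 0 means "unobserved."
ratings = rng.integers(0, 2, size=(n_users, n_items)) * rng.integers(1, 6, size=(n_users, n_items))
observed = ratings > 0

# Factorize ratings ~ U @ V.T with plain gradient steps on the observed entries.
U = 0.1 * rng.standard_normal((n_users, n_factors))
V = 0.1 * rng.standard_normal((n_items, n_factors))
lr, reg = 0.01, 0.05

for _ in range(200):
    err = np.where(observed, ratings - U @ V.T, 0.0)  # error only on observed cells
    U += lr * (err @ V - reg * U)
    V += lr * (err.T @ U - reg * V)

# Recommend: score every item for user 0 and surface the highest-scoring ones.
scores = U[0] @ V.T
print("Top items for user 0:", np.argsort(-scores)[:5])
```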

The next step in recommendation systems was, OK, let’s try to predict … the probability of engagement instead of just capturing broad user interest. There were simple logistic regression models that captured the probability of engagement. Now, that is still 2016, 2017. Around 2018, 2019, a lot of systems started leveraging deep learning to revamp their recommendation system stacks. People started using two-tower models, and that replaced your collaborative filtering models. Then people started using heavy ranking models to get accurate predictions of engagement on some piece of content. That is [what] I would call the second wave.
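
The two-tower models Vishal mentions learn separate embeddings for users and items and score candidates with a dot product. Below is a minimal, hypothetical PyTorch sketch trained with in-batch negatives; the layer sizes, vocabulary sizes, and temperature are illustrative assumptions, not a description of any production system.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """Maps an ID (user or item) to a normalized embedding."""
    def __init__(self, n_ids: int, embed_dim: int = 32, out_dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(n_ids, embed_dim)
        self.mlp = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the dot product behaves like cosine similarity.
        return F.normalize(self.mlp(self.embed(ids)), dim=-1)

user_tower, item_tower = Tower(n_ids=10_000), Tower(n_ids=50_000)
opt = torch.optim.Adam(list(user_tower.parameters()) + list(item_tower.parameters()), lr=1e-3)

# One training step on a batch of (user, engaged-item) pairs, using in-batch
# negatives: for each user, every other item in the batch acts as a negative.
user_ids = torch.randint(0, 10_000, (256,))
item_ids = torch.randint(0, 50_000, (256,))

logits = user_tower(user_ids) @ item_tower(item_ids).T  # (256, 256) similarity matrix
labels = torch.arange(256)                              # diagonal entries are the positives
loss = F.cross_entropy(logits / 0.05, labels)           # temperature-scaled softmax loss

opt.zero_grad()
loss.backward()
opt.step()
print("loss:", loss.item())
```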

And the third wave is now we have very powerful LLMs [large language models], powerful models that also can understand what an item is about, both visually [and] with text, so [it’s] multimodal. Now all of those powerful representations feed into recommendation models, and we get hyper-personalized recommendations.
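
One way to read the “third wave” is that multimodal representations of a post (text and image embeddings) become input features for an engagement ranker. The sketch below assumes precomputed embeddings and a small illustrative ranking head; the dimensions and names are hypothetical, not Reddit’s stack.

```python
import torch
import torch.nn as nn

TEXT_DIM, IMAGE_DIM, USER_DIM = 384, 512, 64  # illustrative embedding sizes

class EngagementRanker(nn.Module):
    """Scores P(engage) from precomputed multimodal item embeddings plus a user embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TEXT_DIM + IMAGE_DIM + USER_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, text_emb, image_emb, user_emb):
        # Concatenate what the post is about (text + image) with who is looking at it.
        x = torch.cat([text_emb, image_emb, user_emb], dim=-1)
        return torch.sigmoid(self.net(x)).squeeze(-1)

ranker = EngagementRanker()
# Random tensors stand in for embeddings produced by upstream text/image encoders.
p_engage = ranker(torch.randn(8, TEXT_DIM), torch.randn(8, IMAGE_DIM), torch.randn(8, USER_DIM))
print(p_engage)
```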

One thing to add here is people across the industry have realized that only optimizing for past user behavior is not the way, because then you will be driving users only to rabbit holes. So there should be a fair amount of exploration. How to balance exploration versus exploitation is an important topic. And people have been using [reinforcement learning] or some simple methods to get that balance right.
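
As a toy illustration of the exploration-exploitation balance, an epsilon-greedy policy mostly exploits the highest-scoring candidate but occasionally explores a random one. The subreddit names and scores below are made up; production systems use more sophisticated approaches, such as the reinforcement learning methods Vishal alludes to.

```python
import random

def recommend(candidate_scores: dict[str, float], epsilon: float = 0.1) -> str:
    """With probability epsilon, explore a random candidate; otherwise exploit the best one."""
    if random.random() < epsilon:
        return random.choice(list(candidate_scores))         # explore: surface something new
    return max(candidate_scores, key=candidate_scores.get)   # exploit: highest predicted engagement

# Hypothetical per-community scores for one user, heavily skewed toward known interests.
scores = {"r/Cricket": 0.92, "r/lotr": 0.88, "r/espresso": 0.15, "r/Bonsai": 0.10}
picks = [recommend(scores, epsilon=0.2) for _ in range(10)]
print(picks)  # mostly r/Cricket, with occasional exploratory picks
```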

Sam Ransbotham: I think that idea of exploitation and exploration is big because I feel like that just comes up in almost every context. We want to optimize what we’re doing, but at the same time, we want to make sure we’re not missing something completely different.

You happened to mention the Netflix challenge. For our listeners [who] may not be familiar, this was a contest that Netflix ran probably a decade or more ago, saying, “Here’s a million dollars if you can improve our recommendations.” One of my favorite parts of that story is that they did give the prize out to a team called BellKor’s Pragmatic Chaos, if I remember right. But they never ended up using the algorithms, because the algorithms in the world had changed so much by the time the contest was finished.

So that really speaks to what you’re saying: Getting recommendations hyper-personalized right now is a very challenging job, because not only does the world change, but people change and lots [change] at the same time. I think it’s a fascinating world.

Vishal Gupta: Absolutely. I think recommendation systems and ads [present] one of the most beautiful problems in my opinion.

Sam Ransbotham: I’ve got to push back there. What makes that problem beautiful? Why is it beautiful?

Vishal Gupta: Basically, the way I look at ads, it’s not a pure recommendation system problem. It’s an auction at the end of the day. Advertisers can bid to show the right ad at the right time, but ad platforms also need to make sure that ads are relevant. So balancing what advertisers want and their long-term value, and balancing users’ long-term value [is a] very interesting problem. That’s why I say it’s very beautiful. It has a component of recommendation plus marketplace.

Sam Ransbotham: That’s the second time you’ve mentioned “balance” too. You mentioned the balance of explore-exploit, you mentioned the balance of the need for people to pay for these ads and also for us to see relevant ads. Balance is a big part of all these algorithms. Maybe switch to think about a different sort of balance. A lot of the content in the world right now is being generated by machines and not by humans.

I think one of the things that Reddit is on record as saying is that human conversation is not being replaced by AI but instead is becoming more important. I really like that idea, because if we have a whole lot of AI-generated content that’s common, then, in theory, human conversation will be more distinct and more real and more legitimate. But at the same time, it’s also much more of a needle in a haystack. So how are you balancing the need to incorporate all this content that’s being generated now, at the same time, trying to highlight human and authentic voices?

Vishal Gupta: That’s a great question, Sam. A lot of data is being generated by LLMs, but if you look at how these LLMs are getting trained, they’re essentially getting trained on human-generated data. So human-generated data will get more and more valuable.

I think there is, again, a balance. Sometimes people just want some information, and they don’t have [the] bandwidth to sift through all the human-generated content. There, something like ChatGPT, Gemini, or even Reddit Answers can satisfy those needs. But sometimes you really want to engage and converse on the specific topic that you are interested in. There is significant value in making users engage and have them actually create content.

Sam Ransbotham: Actually, I really hope that’s true, and I hope that it survives. … You mentioned the idea that most of these models are trained off of human conversation. But there’ve been some recent papers — there was one in Science recently that talked about a potential decay problem as these models start to ingest their own output. Are you worried about that, or do you think that Reddit offers a new source of information to hopefully get us through the idea of new content versus just regenerating the old?

Vishal Gupta: I’m not at all worried about that. I think people love talking about their interests. People are passionate. So I’m not at all worried. We humans will continue generating content. Some AI tools might aid us in getting content in the right format, correcting grammar, or adjusting tone, but human content is not going away, and Reddit is one place where humans will continue writing and engaging with content.

Sam Ransbotham: I guess in general, I have that question as I think about our changing perspectives on the internet and toward large language models versus small language models. Do you have a perspective on when we should be using … more general tools or when we should be using more specialized tools? It seems like Reddit Answers is an example of a more specialized tool that might fit in some cases, but then there are other places where we use more general tools. How do we make a decision about how to go down one path versus [going] down another path?

Vishal Gupta: I don’t have a great answer to this, but I can tell you what I do. Let’s say I want to change my car battery myself. I want a multimodal capability that can show … what I’m trying to do, and it can help me debug on the spot. For that, I use generalized large-scale LLMs. Google’s Gemini is really good. ChatGPT is good. But when I want to, let’s say, do research on “Hey, where should I go for a vacation? Hawaii?” or “What should I buy?” There, I value more user-generated content as compared to something [that] is very prescriptive from large language models. So that is where I think the value-add of a small model or Reddit Answers type of product is.

Sam Ransbotham: I think it’s interesting that you used the car battery example, because a little fun fact for everybody is that my first date was changing the dead battery in my wife’s car.

One of the other things I was thinking about is if I go to a search engine and search, I can get content from Reddit. What’s the thinking about Reddit in terms of what makes someone go natively to Reddit to search for something versus going through a search engine? And how do you think about that, or … how does that change your perspective on solving these problems?

Vishal Gupta: We have a lot of users who come to Reddit via [a] search engine because people want to search something on Reddit, and they use [a] search engine to get access to that content. I was doing that myself. The answer to that is there is value in both. We want to support the internet, and we want to make sure users get access to the content they’re looking for. What we are doing on our side is [building products] like Reddit Answers, [which is] a great product. If you’re a heavy Reddit user, you are already on the app, you are searching for something, and you want Reddit-sourced data, that is something you can use. So Reddit Answers is something that I’m really looking forward to.

I really love that product. Previously, I was going and searching on Gemini or ChatGPT, but now I get all my answers directly on [the] Reddit app itself. So this is one thing I’m personally very excited about as a Reddit user, although I work in Reddit Ads.

When you go to Reddit’s homepage or subreddit feed, [it] will keep on getting better and better. People are working really hard to improve recommendation systems, their personalization, etc. On the ads side, we are continually improving our ranking models and retrieval models. As Reddit users, people will continue seeing that ads are getting more and more relevant to them. So I’m really excited about pushing in both directions.

Sam Ransbotham: How can ads help? What’s the future of solving that dilemma where you have people like me who don’t want to share everything but, at the same time, also want relevant stuff?

Vishal Gupta: I strongly believe that there [are] two types of relevance. One kind of relevance is stuff that [the] user is generally interested in. So, for example, organic content on Reddit is [a] really good indicator of users’ interests, both for recommendation systems and ads. I think we will continually invest in generating a nice ads experience, which is based on these first-party engagements. And second, we are working very heavily on contextually relevant ads — capturing the user’s current behavior in the last few minutes, few hours, within the session — so that we can show relevant ads. As an end user, I also don’t share my information with a lot of websites, and I think an ad platform is great if [it’s] able to still show relevant ads in a privacy-preserving way.

Sam Ransbotham: Right. One can hope. One of the things that we like to do on the show is ask some quick questions. Answer … the first thing that comes to your mind.

What about AI is moving faster or slower than you expected?

Vishal Gupta: I think [one of] the things that [is] moving faster in AI is people have figured out how to train the LLMs. Compute is a bottleneck. So scaling laws [hold] true, like training larger models is getting easier and easier. [One thing] moving slower in AI is real innovation on the core AI side. So if you look at all the recent papers, most of the innovation is happening on the system side. How do you squeeze more juice out of your compute? How do you make some FLOPS more efficient? That is where the pace of AI has decelerated, in my personal opinion.

Sam Ransbotham: FLOPS are floating point operations per second. The discussion is that there’s work on making those things faster, but … I guess what you’re looking for is more algorithmic improvements. …

Vishal Gupta: Exactly. That has slowed down, especially after LLMs came, and everyone wants just to improve LLMs.

Sam Ransbotham: What are people using AI for right now that they shouldn’t be using AI for? What’s a bad use of AI?

Vishal Gupta: Using only AI to do homework is, I think, [a] bad use of AI. If you really think about learning, using AI to get all your work done without actually thinking the important parts through is a bad use of AI.

Sam Ransbotham: What’s the biggest misconception that you think most people have about artificial intelligence and machine learning? What are people wrong about?

Vishal Gupta: I think a misconception that a lot of folks in the industry have about AI is that AI will solve everything very quickly. I can give you a very personal example that I keep on seeing throughout my career. We have very good coding agents, we have very powerful models, but I think there is still value in rewriting your entire code base from scratch. There are some unintentional bugs that humans introduced, and that AI models will most likely introduce when editing this code as well. So there is still value in rewriting stuff. AI is not going to solve all the problems. AI will enable humans to solve problems at a faster pace.

Sam Ransbotham: Well, one can hope. What was the first career that you wanted?

Vishal Gupta: I always wanted to be a software engineer.

Sam Ransbotham: That paid off for you, then. So on Reddit, besides cricket, what do you search for?

Vishal Gupta: I am very much into r/BayArea, r/AskSF, what’s going on in [San Francisco], what’s going on in [the] Bay Area. Recently I started following [a] community of my hometown in India. It’s a very small town, but there is a Reddit community about that with a handful of users. But it is interesting to see what’s going on, and I also participate.

Sam Ransbotham: Vishal, it’s been great talking to you. I think an enduring theme has come out about balance, whether it’s explore-exploit or the revenue that ads produce versus the content for people. Your answers are interesting to me because they point out how often we’re making that kind of balancing decision. Thanks for taking the time to talk with us. It’s been fascinating.

Vishal Gupta: Likewise, Sam. Thank you so much.

Sam Ransbotham: Thanks for listening today. On our next episode, I’ll be joined by GeekWire cofounder Todd Bishop, for a fun conversation about AI podcasts. Please tune in!

Allison Ryder: Thanks for listening to Me, Myself, and AI. Our show is able to continue, in large part, due to listener support. Your streams and downloads make a big difference. If you have a moment, please consider leaving us an Apple Podcasts review or a rating on Spotify. And share our show with others you think might find it interesting and helpful.