A.I. Is Driving an Information Revolution
Perplexity.ai’s co-founder and CEO discusses the looming challenges in AI research and product development.
- Subscribe:
- Apple Podcasts
- Spotify
- RSS
Artificial Intelligence is on every business leader’s agenda. How do we make sense of the fast-moving new developments in AI over the past year? In new episodes released throughout December and January, Azeem Azhar returns to bring clarity to leaders who face a complicated information landscape.
This week, Azeem speaks with Aravind Srinivas, the co-founder and CEO of Perplexity.ai, about the looming challenges in AI research and product development, such as user-centric design and the importance of open-source models.
They discuss:
- AI as a tool for democratizing information access.
- The “innovator’s dilemma” for Google Search.
- Whether or not conversational interfaces will become the norm for how we interact with AI.
- The array of interests shaping the AI regulation debate.
Further resources:
- How Perplexity.ai Is Pioneering The Future Of Search (Forbes, 2023)
- AI’s First Flight: An Early Milestone in Generalised Intelligence? (Exponential View, 2023)
AZEEM AZHAR: Hi, I’m Azeem Azhar, founder of Exponential View and your host on the Exponential View podcast. When ChatGPT launched back in November 2022, it became the fastest-growing consumer product ever and it catapulted artificial intelligence to the top of business priorities. It’s a vivid reminder of the transformative potential of the technology. And like many of you, I’ve woven generative AI into the fabric of my daily work. It’s indispensable for my research and analysis. And I know there’s a sense of urgency out there. In my conversations with industry leaders, the common thread is that urgency. How do they bring clarity to this fast-moving, noisy arena? What is real and what isn’t? What, in short, matters? Once a week, I’ll bring you a conversation from the frontiers of AI to help you cut through that noise. We record each conversation in depth for 60 to 90 minutes, but you’ll hear the most vital parts distilled for clarity and impact on this podcast. If you want to listen to the full unedited conversations as soon as they’re available, head to exponentialview.co. I’ve wondered for weeks whether or not to have the conversation we are about to have. I’ve wondered not because I don’t think my guest will be worth your time. No, of course not. But rather that there are so many new artificial intelligence products out there that you almost need an AI to count them. So which ones are really worth your time? Well, ultimately, I found a single data point that told me I needed to bring our guest today in for a conversation. Since October 1st, and we’re recording this in mid-November, I’ve logged into his AI service 268 times from my laptop alone. I also have the app on my phone. I use it every day and it’s displacing a large number of my Google searches. The product itself? Perplexity.ai. And I thought there was no one better to speak to than the man behind it, Aravind Srinivas, the CEO and co-founder of the firm. He’s hot on the heels of a huge funding round from top-tier investors.
And the product itself, as well as my few 100 queries a month, is seeing tens of millions of users monthly. Aravind, welcome to the conversation.
ARAVIND SRINIVAS: Thank you for having me, Azeem.
AZEEM AZHAR: So maybe I thought we’d start with this amazing product, Perplexity, which I would say is a question query engine. It is a way of finding out answers to, quite often, quite complex queries. For an old-timer like me, I think back to a product like Ask Jeeves from the mid-’90s, which attempted to do this, and I was an investor in Powerset, which tried to do this and was ultimately acquired by Microsoft about 15 years ago. But it feels to me like Perplexity has actually worked. You put a question in and you get quite a complex answer out. I thought maybe what we could do is perhaps start by sharing some of our favorite queries.
ARAVIND SRINIVAS: I was looking to buy an engagement ring when I was proposing to my partner at the time, so the query was pretty complex. It was like, “Where can I get a diamond ring for a proposal that would be within a small budget?”, because I was not super into very expensive ones. “… And I can find it in San Francisco, so that I can actually go and try? And it’s not low quality, it’s still pretty good quality?” It suggested this place where we went and bought it. It was done, that quick. Then at the end, after we made the purchase, the person there running the shop asked me, “So how did you discover us? Is it through Google, or Facebook, or friends?” I said, “It’s through this thing called Perplexity.”
AZEEM AZHAR: That’s a great example. The engagement ring is something that will be with you for decades, or with your partner for decades. Let’s play through that a little bit, so why would a query like that be difficult on Google, for example?
ARAVIND SRINIVAS: Yeah. It’s very simple actually. For good or bad, they’ve gotten themselves into a position where if you type anything, let’s say engagement ring, they have to show you 10 or 15 sites that are bidding for that keyword. And it’s in their incentive for you to click as many of the links they show in the 10 blue links. You click on them, and you open all these tabs, and you read, and sit, and decide. It’s not in their incentive to just give you the answer. It’s that whole interface, and the business model built on the interface, that basically bankrolls the company. It’s built around wasting your time.
AZEEM AZHAR: Right. Right. It used to be-
ARAVIND SRINIVAS: So that’s where the opportunity and the window for another product like us comes in where even if they know that this really works, which I’m sure they do, they don’t have an incentive to actually go and change it to the billions of people who use Google every day.
AZEEM AZHAR: Well, you’ve painted a picture of what Clayton Christensen, who passed away a couple of years ago, would call the innovator’s dilemma.
ARAVIND SRINIVAS: Innovator’s dilemma, yeah.
AZEEM AZHAR: The business model doesn’t allow them to do that. I mean, I love your example. It’s so powerful. And I forgot to say to you, many congratulations.
ARAVIND SRINIVAS: Thank you.
AZEEM AZHAR: And many, many decades of happiness ahead. I’ll show you my example because the query that I put in, I think I would’ve really struggled on Google to get the quality of answers that I ultimately got from Perplexity in less than half a day’s work. So behind me, I have some new shelves. They are Dieter Rams’ shelves called Vitsoe. I wanted to find lights to illuminate the display shelf that I have behind me. So I went to Perplexity and I said, “I have Vitsoe shelves and I’m looking for narrow profile lights I can mount under the shelves to throw light onto the shelf below. Any suggestions?” Now, I’ve read that out verbatim. It’s kind of a terrible search query because I repeat myself and I’m using loads of words. The response that I get is… I found a few options and it gives me four options. Then I went back to it and said, “Listen, these aren’t bad suggestions. Please find me some more.” It came out with some others. And of course, I had tried subsequently to do this on Amazon and to do this on Google, and it’s really, really difficult. So I must have saved, as I said, hours on that particular search. I took your recommendation, I went to Ikea as it happened. Thank you very much. Why does it work? Because it strikes me it works not just because, say, Google or Bing’s business model doesn’t work. There is something else that you’re doing. You are working with a whole range of underlying websites who may have better or worse ways of organizing their information. And you are able to go into those and assemble an answer that works really well.
ARAVIND SRINIVAS: Yeah. So I think all this needs to happen really fast. The moment you submit a query… Obviously, we are doing all the work that a typical person browsing, opening all these sites, and reading all the contents in those sites would’ve done. Imagine the equivalent of human labor in trying to write a Wikipedia article for the question you asked, but done in one or two seconds and on-demand, 24/7: dig deeper, ask clarifying questions, ask any number of questions. No judgment at all. You can ask anything you want. That is what you’re getting today. In our case, you get the feeling of talking to someone who’s researched a lot of diamond rings. They know so much that it’s like talking to the expert.
AZEEM AZHAR: And at the heart of this product, I mean, I feel I ought to also share that one of the things that I have discovered in the at least 268 query sessions I’ve had with it in six weeks is, you reference the results. So every time I get a result telling me about a particular LED light, or a way of changing my LAN configuration, or something about human brain capacity, which I was researching for my new book, you give me a reference that I can see, and I can link, and I can go to the underlying source, which gives you a lot of confidence. This is all built though on a series of the technology du jour, which is large language models. And you have your own large language models and you access OpenAI’s GPT-3.5 and GPT-4, which people will have used if they’ve used Bing Chat or OpenAI’s ChatGPT. How does that come together in delivering the experience that I get?
ARAVIND SRINIVAS: The best way to think about it is the search index. The typical traditional search index is like the knowledge engine. You can call it the knowledge engine. And the large language model you can call the reasoning engine, the one with the expressivity of human natural language and the ability to reason on a particular skill or task. These two come together to provide you an answer engine, which is what we are. With the chat capability, which is, again, a skill of the reasoning engine, it becomes a conversational answer engine. And therefore, it becomes like an answer bot that you keep talking to. The magic is actually that the speed at which you get the responses is almost the same as the speed at which you get the response from a traditional chatbot that has no plug into the actual search index. That is actually the major difference in Perplexity, how fast it is and how accurate it is despite doing a lot of work on the back end, where it’s not just using one engine but two engines and orchestrating the two together.
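[The two-engine pipeline Aravind describes, a search index feeding retrieved sources into a language model that composes a cited answer, can be sketched roughly as follows. Every function name and interface here is an illustrative stand-in, not Perplexity’s actual internals.]

```python
# Sketch of a retrieval-augmented "answer engine": a search index
# (the knowledge engine) supplies grounded snippets, and an LLM
# (the reasoning engine) composes a cited answer from them.
# All interfaces below are hypothetical, for illustration only.

def search_index(query: str) -> list[dict]:
    """Knowledge engine: return ranked snippets with source URLs."""
    # A real system would query a web-scale index here.
    return [
        {"url": "https://example.com/a", "text": "snippet about the topic"},
        {"url": "https://example.com/b", "text": "another relevant snippet"},
    ]

def build_prompt(query: str, snippets: list[dict]) -> str:
    """Orchestration: number the sources so the model can cite them."""
    sources = "\n".join(
        f"[{i + 1}] {s['url']}: {s['text']}" for i, s in enumerate(snippets)
    )
    return (
        f"Answer the question using ONLY the sources below. "
        f"Cite them as [n].\n\nSources:\n{sources}\n\nQuestion: {query}"
    )

def answer(query: str, llm) -> str:
    snippets = search_index(query)          # knowledge engine
    prompt = build_prompt(query, snippets)  # orchestration
    return llm(prompt)                      # reasoning engine

# With a stub LLM, the pipeline wires together end to end:
print(answer("What is an answer engine?", llm=lambda p: "A cited answer [1]."))
```

[The design point in the conversation is the orchestration step: the model never answers from memory alone, which is why each claim can be traced back to a numbered source.]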
AZEEM AZHAR: I think there’s a misunderstanding out there broadly that the product is the LLM, that you just get the LLM and you train it on three trillion tokens. You give it loads of parameters, and that’s the product. But in fact, the LLM is like the internal combustion engine, which drives very differently in a Ferrari to a Cadillac. Right?
ARAVIND SRINIVAS: Exactly. Exactly.
AZEEM AZHAR: So how did you think about that and have you managed to get this latency super low? Is it about architectural optimizations? Is it about the internal actual architectural relationship between the systems? Is it about the amount of compute that you throw at the problem?
ARAVIND SRINIVAS: Yeah. So you can optimize the latency in three different ways. One is, you optimize the individual latency of each engine: you make the LLM more optimized or you make the search index more optimized. The orchestration, where you bring these two engines together, is another place you can optimize. All three optimizations are important for making the end user’s latency better. It’s not just the latency, by the way; the throughput also matters, like when there are 10,000 users using the product at once. You don’t want to just focus on the latency. And that happens when there’s spike usage. Let’s say when ChatGPT is down, a lot of people come to us. You want to be ready for those moments. Also, sometimes somebody with millions of followers tweets about Perplexity and thousands of people come and check it all at once. You don’t want the site to crash during those moments. Or, let’s say we depend on OpenAI as a model provider, and one day they’re having some outage and all the nodes go away. We still should be able to serve the product, because at the end of the day, the user doesn’t care. All the user wants is the answer. It takes perspective. You got to build that perspective. Not just yourself, but everybody in the company should build that perspective, because, typically, AI people pride themselves more on the quality of the model, the demos. But for the user, it’s, “Okay, here is an app where I can ask a question, I get the answer. And I get sources and it’s pretty accurate. And I can ask follow-ups.” That’s it. Your job is to only nail this, and then the rest of the brand and everything is taken care of. For that, you got to do a lot more than just train the best model or fine-tune the best model.
Or, you got to do a lot more than just being the best search index there is. You got to do everything and optimize for the end-to-end experience.
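[The resilience requirement Aravind raises, staying up when a model provider has an outage, comes down to orchestration logic in front of the model. A minimal sketch of a provider-fallback wrapper follows; the provider functions and error type are invented for illustration, not a real vendor API.]

```python
# Sketch of provider fallback: if the primary model API fails,
# degrade to a backup provider so the user still gets an answer.
# Provider names and call signatures are illustrative only.

class ProviderError(Exception):
    """Raised by a provider call when that provider is unavailable."""

def call_with_fallback(prompt: str, providers: list) -> str:
    """Try each provider in priority order; raise only if all fail."""
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except ProviderError as e:
            last_error = e  # in production you'd log this and fall through
    raise RuntimeError("all providers unavailable") from last_error

def flaky_primary(prompt: str) -> str:
    raise ProviderError("primary is down")

def backup(prompt: str) -> str:
    return f"answer from backup: {prompt}"

print(call_with_fallback("why is the sky blue?", [flaky_primary, backup]))
```

[The same wrapper shape covers the spike-traffic case he mentions: the fallback can be a cheaper, higher-throughput model rather than a second vendor.]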
AZEEM AZHAR: Yeah. I mean, that’s certainly one of the things that I feel when I use Perplexity 8-10 times a day. But I’m curious about that point that you made, which is, a lot of the AI community is still coming out of the research world. So they are focused on the model itself. Perhaps they’re focused on scientific robustness rather than productization. And I think you spent time both at DeepMind and OpenAI. So I’m curious about the culture that you’re building in Perplexity. What are the bits that you are borrowing from the places that you worked at before and what are the things that you’re leaving behind?
ARAVIND SRINIVAS: So I think the culture I’m borrowing from OpenAI is this iteration culture: don’t just wait for perfection, try to iterate a lot. And speed, the urgency at which we got to get it out in the hands of users. And it’s more important to actually have something usable than something that’s just benchmarks and things like that. So that’s the culture from OpenAI. The culture from DeepMind is the perfection mindset. OpenAI is the iteration mindset. And I think you, obviously, want to have a bit of both. Maybe we are more like 80/20: 80 OpenAI, 20 DeepMind, where we obviously care about perfection and the finer elements of magic that DeepMind inserts into their releases. I really admire that. And it’s all coming from Demis himself, who cares a lot about magical moments.
AZEEM AZHAR: He really does, yeah.
ARAVIND SRINIVAS: So we try to do some of these in the answers, like, “Oh, how does this AI even know it? That’s crazy.” Those wow moments need to be there. But you cannot do it at a slow pace in the startup world, because otherwise somebody else is going to eat your lunch.
AZEEM AZHAR: I think that’s what Google called their code red at the start of this year when ChatGPT came out.
ARAVIND SRINIVAS: Yes. Exactly.
AZEEM AZHAR: You talked about model benchmarks, right? So the benchmark is, how well is the model performing on these different ways of testing it, MMLU, and so on and so forth. It’s a matter of pride when people release their models on Twitter, particularly in open source. They’ll go off and say, “Oh, we beat this code benchmark.” Do you care that much about benchmarks in that sense?
ARAVIND SRINIVAS: I do. Yeah. I mean, here’s the thing: people claim a lot of benchmark-beating achievements, but when you actually use it in the form of a chat product, it doesn’t work as well. So it’s very easy to hack benchmarks by just creating a data set specifically for that benchmark, training for it, and then showing good performance. But the magic of these generally usable chat products is that you never optimized for one benchmark, yet you are really good at it. And that’s the kind of model that makes for a very robust product, because people typically use it in all sorts of ways. So you are not optimizing for one benchmark. You’re optimizing for so many benchmarks at once, in a manner that when you actually put it out, no matter how the user tries to use the product, it just still works. And that’s why I’m more a fan of tracking five or 10 benchmarks at once rather than one benchmark.
AZEEM AZHAR: Right, right. I guess, those are the technical benchmarks on how the models perform in these sort of lab experiments. But I guess, now that you’ve got a product out in the market-
ARAVIND SRINIVAS: Yeah, yeah. We review internally.
AZEEM AZHAR: … you are also considering about retention, and searches per user, and so on.
ARAVIND SRINIVAS: That’s right. Yeah. You have to run a lot of A/B tests on different models and see how people react, and whether queries per user have gone up. Or even run evaluations: if you have an existing model and you have a new model, and the new model is cheaper than the existing model to serve, then what is the one bit that you need to decide if you can switch over? The one bit that you need is whether quality has regressed or not. Right?
AZEEM AZHAR: Right.
ARAVIND SRINIVAS: And you would measure that by saying… You give a human the answers from both the models and ask them which model is better.
AZEEM AZHAR: But then the challenge you’ve got is that people will use this for every type of query, from engagement rings to cancer therapies, from lights under their bookshelf to competitive analysis.
ARAVIND SRINIVAS: Right. That’s right.
AZEEM AZHAR: So that A/B test is pretty complex, right?
ARAVIND SRINIVAS: Exactly.
AZEEM AZHAR: I mean, to get a statistical sample must… Well, it’s probably easier now because you’ve got millions of users using the service, but it would’ve been hard in those original days, right?
ARAVIND SRINIVAS: That’s right. That’s right. I think we probably would benefit a lot from having even more users, so that even A/B testing to a small fraction of the traffic gives you a lot of data. But I think the best way to do this is sample representative queries from your usage and provide them to independent evaluators, and then ask them to… And if the evaluator is not able to say which model gave which answer, especially when it’s a faster, cheaper model against a slower, more expensive model, then the decision is very clear. You switch over to the cheaper model, right?
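[The switchover test described here, blind pairwise judgments on sampled queries with a decision rule that favors the cheaper model unless quality regresses, can be expressed in a few lines. The 0.45 threshold and the sample data are invented for illustration; nothing in the conversation specifies Perplexity’s actual decision rule.]

```python
# Sketch of a blind pairwise evaluation: evaluators see two
# anonymized answers per query and pick a winner, or call it a tie.
# If the cheaper model holds roughly even, it wins on cost.
# The 0.45 threshold is a made-up example value.

def should_switch(judgments: list[str], threshold: float = 0.45) -> bool:
    """judgments: one of 'cheap', 'expensive', or 'tie' per query.
    Ties count as half a win for each side."""
    score = sum(
        1.0 if j == "cheap" else 0.5 if j == "tie" else 0.0 for j in judgments
    )
    win_rate = score / len(judgments)
    return win_rate >= threshold

# Evaluators mostly could not tell the two models apart:
sample = ["tie", "tie", "cheap", "expensive", "tie", "cheap", "tie", "expensive"]
print(should_switch(sample))  # → True: the cheaper model holds even, so switch
```

[This captures the asymmetry in the conversation: the cheaper model does not need to win outright, it only needs to avoid a clear quality regression.]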
AZEEM AZHAR: Right. Right.
ARAVIND SRINIVAS: These are the kind of optimizations that are very hard, by the way, because often what happens when you train a cheaper model, which is faster, is that there will be some regressions. This happens even for big companies like OpenAI. Look at the announcement they made on Developer Day about GPT-4 Turbo, when they claimed it’s going to be a cheaper model and also better than GPT-4. But then people on Reddit were outraged: “Oh, it’s not actually better. Look at all the stuff it used to be capable of before and now it cannot.” So what does that mean? It doesn’t mean that OpenAI is not good at evaluations. It means that despite doing all that, they missed out on some ways to compare.
AZEEM AZHAR: Right. Yeah. So you’ve put your finger on something that I think is really challenging for building with LLMs, building with these new databases or new internal combustion engines, which is that… When I was a product manager, I built products with underlying technologies that were deterministic. They did the thing that you asked them to do the same way every single time. And you could test their parameters much more easily. I think one of the things that large enterprises are also struggling with as they try to build products using LLMs is, the statistical nature of LLM outputs, this idea that the frontier of capability is not very well known, and it’s not very well-defined. It’s a little bit fuzzy. One of my friends calls it the jagged frontier of capability. And especially as you make model improvements, either through optimizing them so that they’re faster, or you retrain them, you may, in general, improve things. But there may be very specific areas where things get worse. One of the things I was so fascinated about Perplexity is I think you are one of the first couple of companies who’s got a working product that is built with LLMs as part of its components. It’s working at some scale. And you seem to have established perhaps this new discipline, this new product engineering discipline, which is, how do you build with these kind of random acting Pokemon LLMs?
ARAVIND SRINIVAS: I think the one thing that I’ve personally stuck with as a wisdom is something Jeff Bezos said very early on in the Amazon days: the user doesn’t care. And the user is almost always right. I’m not talking about users suggesting what you should do, but the user telling you what their problems are. They’re most likely right. And the user doesn’t care how hard it is for you to solve their problems. Let me expand on this. If you actually need to serve a cheaper model, because it’s just inefficient for you to run the product otherwise, your profitability is not their concern. Just like for Amazon, delivery in one or two days is actually something that makes Amazon run, and they keep burning money because of that. If Bezos chose to increase the prices of the goods, or to increase the number of days of delivery from, say, one or two days to three days or five days, that would solve the problem, or at least address it to some extent. But the user doesn’t care. Someone else is going to offer superior terms and they’re going to shift over. So the same thing applies to us in serving this particular product. If there is a way to cut down cost and serve less accurate answers, or a less reliable product that’s not up all the time, or a slower product, we are done. And-
AZEEM AZHAR: But that’s a guiding philosophy, Aravind, right? So that is a guiding philosophy that you and your co-founders are establishing as a cultural attribute of the company.
ARAVIND SRINIVAS: Yeah.
AZEEM AZHAR: But I’m also curious internally about what it means for product management and the decisions that you make about whether you even know a model is right for deployment.
ARAVIND SRINIVAS: Yeah. That’s right. So a model is right for deployment only if it’s more accurate. That’s it.
AZEEM AZHAR: Right. Via your testing that you’ve done [inaudible 00:25:02] this representative queries.
ARAVIND SRINIVAS: Yeah. And if the user complains, “It’s not right”, you got to switch back. And you first should feel good about it yourself. The advantage of working on this particular area, consumer search, is that we all can use the product. I can use it. My co-founders can use it. Employees can use it. So we all can use the product and we all can test the product. This ensures that we ourselves know that this is a model we would love to use. When GPT-4 came, it was pretty clear that what happened was… Let me tell you exactly how models basically made the product a lot better. We had prototyped this version of the product in September, October last year. It would hallucinate a lot. This was GPT-3.5. DaVinci 2. It’s not even DaVinci 3.
AZEEM AZHAR: So hallucination meaning it would come up with plausible sounding, but wrong text largely because it’s using earlier versions of OpenAI models?
ARAVIND SRINIVAS: Exactly. That’s right. That’s right. And then OpenAI, three or four days before ChatGPT released, updated DaVinci 2 to DaVinci 3. It’s the same GPT-3.5, but a much better trained model. And we switched over. We literally just changed the 2 to 3 in the code and used the product. And it just got insanely better. Hallucinations dropped significantly. And then when GPT-3.5 Turbo came, it got a lot faster, and cheaper, and more accurate. And then when GPT-4 came, it was just mind-blowing. It is not cheaper or faster, but hallucinations are like one in 100 now, right?
AZEEM AZHAR: Right.
ARAVIND SRINIVAS: So if you existed in the world in late 2022 and looked at this answer engine product, you would be like, “Okay, this is cool. But one in 10 queries is wrong. It’s not going to make it.” But now you’re like, “Damn, there is one model that basically makes this hallucination problem almost irrelevant.” It’s expensive, but now what is the bet you’re going to make on the future? The bet you’re going to make on the future is that whatever this model is today, which hallucinates one in 100 times, will be 10x cheaper over the years to come.
AZEEM AZHAR: Right. This is the most expensive and the least accurate it’ll ever be.
ARAVIND SRINIVAS: Exactly. This is the most expensive and least accurate it’ll ever be. Right?
AZEEM AZHAR: Right.
ARAVIND SRINIVAS: That for the same cost that we pay for GPT-4 today, there would be a GPT-4.5 or 5 that is even better. Even better in terms of reliability, and accuracy, and the conciseness of the… Don’t over-index on the problems that exist today that… We’ve already got the 80/20 on this product. That’s how I feel today. And the remaining 20 is going to take a lot of effort. In fact, 80% of the effort will go into the last 20% of the product, and that’s why the company exists. The company exists to solve these long-tail problems. For the-
AZEEM AZHAR: Can I ask on that 80/20 effort thing?
ARAVIND SRINIVAS: Yeah.
AZEEM AZHAR: So thinking about where we are today and thinking about just… I’m a fanboy here. So thinking about just how good the product is today, if you go back to 12 months ago, is the Perplexity answer quality much better than you would’ve expected from 12 months ago? Or is it about the same? Or do you think it’s not as good as you’d expected?
ARAVIND SRINIVAS: I mean, it improves every few weeks. I’m using-
AZEEM AZHAR: Was that in line with your expectations of what the improvement curve would’ve looked like?
ARAVIND SRINIVAS: It exceeded my expectations, honestly, mainly because the models got a lot smarter. One of our investors, Daniel Gross, who does this fund with Nat Friedman… Before we released the first version of Perplexity, I had been sending it around to friends asking for feedback. Not a lot of people tried it, but Daniel was one of the few who did, because he had tried building a search engine himself for a startup. The first feedback he gave, the first sentence I remember, was like, “You should call it a submit or a run button instead of a hit button for the search query, because that’s slow.” It takes-
AZEEM AZHAR: It’s like you had to send instructions to a batch processing system and come back a few minutes later.
ARAVIND SRINIVAS: Yeah. Exactly. It took seven to 10 seconds to get the answer. Now, you get the answer in one second or less. When you’re on the free version of the product, and not using GPT-4, the latency is so low that people are like, “Oh, how did you make it this fast?” It’s the same model. I mean, a more improved version of the model, but the same architecture. It took 12 months, and we made it a lot faster. Now give us 12 more months, and we’ll make it even faster. So give us five years, and you’re going to see this answer engine thing as fast as the load time of the 10 blue links.
AZEEM AZHAR: Right, right. One of the beauties is that you don’t have advertising.
ARAVIND SRINIVAS: Yeah. We don’t have advertising.
AZEEM AZHAR: So we pay [inaudible 00:30:40] a month, I think. And what that does is that avoids that incentive misalignment that we see on Google where what the user needs is traded off against what the advertiser needs, and the experience is compromised.
ARAVIND SRINIVAS: That’s right. Oh, that’s another thing where the Bezos’ thing applies. Your shareholder interest and your user interest should always be aligned.
AZEEM AZHAR: Yeah. Absolutely. Absolutely.
ARAVIND SRINIVAS: And in Google’s case, it does align where the user is the advertiser.
AZEEM AZHAR: Right. Yeah. But it-
ARAVIND SRINIVAS: So Google is basically two products. There’s the Google search engine; that’s an amazing product. And there’s Google AdWords and AdSense; that’s also an amazing product. The only thing they did is couple the two together, where the platform for the ad product is the search engine. And now the company is such a large publicly traded behemoth that the only thing the shareholders care about is: is your advertising revenue going up quarter by quarter? Do whatever you can to keep it going up. It could mean moving your ads from the side to right below the search bar, increasing the font size of the ads, adding more ads to the search results page, and basically doing whatever is needed to hit the metrics. So [Inaudible 00:32:02] the big ones. And that’s where I think if you compromise on what the user wants at the end of the day, the search engine user, this is where you’re going to end up.
AZEEM AZHAR: Right. That’s what you end up with.
ARAVIND SRINIVAS: That’s why I like Amazon. Amazon needed to make money too. It wanted to be profitable too, but the way it did it is through the cloud business. The cloud business became profitable.
AZEEM AZHAR: And Amazon is not without those internal tensions now that it has an advertising business. It has [inaudible 00:32:36].
ARAVIND SRINIVAS: Sure. Sure. Now it’s there for… Yeah.
AZEEM AZHAR: In fact they’ve got a few other things. Now, I love this conversation. I love getting inside the founder’s head to understand your journey, but I also know that the Exponential View audience will be really, really keen to understand your thoughts on how this affects knowledge workers and who stands to gain from these technologies, what the impacts might be… Well, thanks for tuning in. If you want to listen to the full unedited conversation, head to Exponentialview.co. Be sure to check the episode notes for further reading and insights from today’s conversation. And don’t forget, you can follow me on LinkedIn, Threads, and Substack Notes for daily updates. Just search for Azeem, A-Z-E-E-M. That is, A-Z-E-E-M.