Investing in AI

02 Apr, 2023

AI is the talk of the town right now, and that town is not just Silicon Valley but the whole damn world. Almost every company or founder is talking about how they are using “AI” to solve X. While this is surely the current “trend”, investors should always be wary of trends, especially in the fast-paced world of technology.

On the other hand, you also can’t ignore the fact that ChatGPT crossed a million users in just 5 days. We have rarely, if ever, seen a technology spread this quickly. Such rapid adoption suggests there is real value.

The above two paragraphs may seem contradictory. One suggests AI is a buzzword, while the other suggests there’s value. To put it simply, I think AI has enormous potential (almost all would agree with me), but at the same time we may very well be in a short-term bubble right now. When a technology becomes a buzzword, it is more often used as a marketing play or a gimmick to raise funds than applied strategically to build a lasting company. This is not new. The same thing happened with the internet back in the 90s. However, what emerged from the infamous dot-com crash were companies like Google, Amazon and Apple, which went on to become trillion-dollar companies. And so, the lesson to learn here is that, in the end, what truly matters is the company itself.

Peter Thiel writes in Zero to One, “Bubble and anti-bubble thinking are both wrong because they hold the truth is social. But if the herd isn’t thinking at all, being contrarian—doing the opposite of the herd—is just as random and useless. To understand businesses and startups, you have to do the truly contrarian thing: you have to think for yourself. The question of what is valuable is a much better question than debating bubble or no bubble. The value question gets better as it gets more specific: is company X valuable? Why? How should we figure that out?”

This is exactly why I decided to write this blog piece. My goal here is to better understand what makes an AI company valuable and where its true competitive advantage lies.

The very first thing to consider, as with any investment, is what problem the company is solving. It should not be the case that the company says “we are using AI to solve X” and it turns out that no one really cares about X. As Steve Jobs put it, “You have got to start with the customer experience and work back toward the technology – not the other way around.” The end user does not care whether you are using AI or not. All they care about is whether their problem is getting solved. If you think using AI is the best way to approach the problem and you also figure out a way to build a sustainable business model around it, then you have an AI company. Otherwise, you don’t.

Coming back to the internet analogy, how did Google, Amazon and others end up so successful? Metcalfe’s law helps explain this. It states that the value of a network grows roughly with the square of the number of its users, so every new user makes the network more valuable, which in turn attracts even more users. The most important thing here was to get people to join your network. The end goal was to build the largest network possible. Google did this with search, Amazon did this with retail, Facebook did this with social.
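To get a feel for why network size mattered so much, here is a minimal sketch of the idea behind Metcalfe’s law. It assumes that a network’s value is proportional to the number of possible user-to-user connections, which is a simplification, and the function name potential_connections is my own, used only for illustration.

```python
def potential_connections(users: int) -> int:
    """Count the distinct pairs of users that could connect in a network.

    Assumption: network value is proportional to this count, which is
    the simplified reading of Metcalfe's law used in this post.
    """
    return users * (users - 1) // 2


# Each 10x increase in users gives roughly a 100x increase in connections.
for n in (10, 100, 1000):
    print(f"{n} users -> {potential_connections(n)} possible connections")
```

Going from 10 to 1,000 users takes the count from 45 to 499,500 possible connections, which is why being the largest network mattered far more than being the second largest.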

Interestingly, in the case of AI companies we see a somewhat similar phenomenon. As more users use the product, the product collects data from which it learns, making the product better thereby attracting even more users. However, the most important thing here is to collect the data. The end goal is to build the best data engine possible.

Collect the data

Collecting as much data as possible is important. But you don’t want just any data. The real competitive advantage lies in having high-quality proprietary data. Think about it this way: what does it take to build an AI system? It takes 1) data, which is the input that goes into 2) the AI models, which are analogous to machines, and lastly it requires energy to run these models, i.e. 3) compute. Today, most AI models have become standardized and are widely available. Meanwhile, the cost of compute keeps falling rapidly. Hence AI models and compute are becoming commodities. The only thing that remains is data. But even data is widely available on the internet. Thus, a company can only have a true competitive advantage when it has access to high-quality proprietary data.

There are two fundamentally important questions to ask while evaluating this data:

One, how difficult is it to acquire the data? Or in other words, what is the cost of acquiring the data? The more difficult it is to acquire the data, the better. E.g.: To build a self-driving system, you need cars on the road that collect data in real time and feed it back to the system. This is what Tesla is doing. It has about 400K cars on the road today in its Full Self-Driving beta program. This gives Tesla a real competitive advantage, because this data is far harder to acquire than the data behind an image generation system like DALL-E. There are billions of images available on the internet. All you need to do is scrape them.

Another way to build a competitive advantage is to have exclusive partnerships with research centers, organizations, companies etc. Recently, Chamath Palihapitiya gave an interview where he had this interesting analogy. He compared large language models like GPT to refrigeration. He said “People that invented refrigeration, made some money. But most of the money was made by Coca-Cola who used refrigeration to build an empire. And so similarly, companies building these large models will make some money, but the Coca-Cola is yet to be built.” What he meant by this is that right now there are a lot of companies crawling the open web to scrape data. Once these models are as widely available as refrigeration, we will see companies and startups with proprietary data building on top of them. This will lead to a whole series of M&A activity where companies acquire other companies just to get access to proprietary data.

The other question to ask is how much data is needed before you start to reach a point of diminishing returns. Or to put it simply, how many edge cases are there? The more the better. This is because once you have proprietary data, you want to ask how difficult a problem you are solving. The more difficult the problem is, the more competitive advantage the company has. E.g.: a simple AI system that identifies whether a picture is of a dog or not has very few edge cases. With a relatively small dataset, a system can very quickly perform the task with good enough accuracy. But to build a self-driving system, there are just too many edge cases. So, you have a very long road until you start to reach a point of diminishing returns, which gives you a strong competitive advantage.
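One rough way to picture this is a back-of-the-envelope sketch, not a measurement. It assumes, purely for illustration, that a model’s error falls as a power law in the amount of data, error(n) = c * n^(-b). The exponent values and the function name below are hypothetical and are not taken from any real system.

```python
def data_multiplier_to_halve_error(exponent: float) -> float:
    """How many times more data is needed to cut the error in half.

    Assumption (illustrative only): error(n) = c * n ** (-exponent).
    Setting error(k * n) = error(n) / 2 gives k = 2 ** (1 / exponent).
    """
    if exponent <= 0:
        raise ValueError("exponent must be positive")
    return 2 ** (1 / exponent)


# Hypothetical exponents: a large one stands in for an "easy" problem with
# few edge cases, a small one for a "hard" problem with many edge cases.
easy = data_multiplier_to_halve_error(0.5)
hard = data_multiplier_to_halve_error(0.1)
print(f"easy problem: {easy:.0f}x more data to halve the error")
print(f"hard problem: {hard:.0f}x more data to halve the error")
```

Under these made-up numbers, the easy problem needs about 4 times more data to halve its error while the hard one needs about 1,024 times more. The more edge cases a problem has, the longer a data lead takes to catch up with.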

Building a data engine

What does high-quality data mean? Let’s say you have access to a proprietary data source. What comes out of that pipeline is not high-quality data. It’s just data. You now need a process, or an engine, that continuously refines this incoming data and feeds it back into the system to learn from. In the end, whoever can spin this engine the fastest wins. Andrej Karpathy, former head of Tesla AI and co-founder of OpenAI, has explained this idea in a tweet.

To conclude,

The internet created winner-takes-all or winner-takes-most markets, and with AI this effect only gets amplified. Having access to proprietary data, combined with a fast enough data engine, is the best way, and perhaps the only way, for an AI company to truly differentiate itself and build a lasting business.