Modern AI History

Part 2: The Godfather's 40 Year Bet

Part 2 of an ongoing series on Modern AI History.

Let's skip along nineteen years, to 2012. Nvidia is no longer a long shot. They survived two near-death experiences in the 1990s, won the gaming wars of the 2000s, and now sit comfortably as the graphics chip of choice for anyone who takes computing seriously. Jensen is still CEO. The chip they set out to build — the specialist for video games — is, for all intents and purposes, finished.

From the three men at the Denny's, we travel north to Toronto, where three other men are about to flip a light on in a bedroom. One of them is a sixty-four-year-old British professor who, four decades earlier, made a bet that the entire field of artificial intelligence had told him was a waste of his life.

His name is Geoffrey Hinton. And to understand his bet, you need to understand the argument he was betting against.

The Civil War

For most of the twentieth century, AI had two camps, and they could not stand each other.

The first camp believed in rules. If you wanted a computer to recognise a cat, you sat down and told it, in painstaking detail, what a cat was: pointed ears, whiskers, four legs, a tail. You translated the world into code, one rule at a time, and the computer checked each new image against your list.

The second camp thought the whole approach was a dead end. You couldn't possibly write down every rule for every thing in the world. What you could do was show the machine millions of examples and let it figure out for itself what a cat looks like. Don't tell it about whiskers. Don't tell it about ears. Let it find the pattern the way a child does, by seeing enough cats that cat eventually becomes obvious. They called the approach a neural network, named loosely after the way a brain learns.
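
To make the difference concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the four-pixel "images", the labels, and the function names (rule_based, train_perceptron, learned) are hypothetical, and the learner is a single artificial neuron, far simpler than anything either camp actually built. The first function is a rule a human wrote down. The second starts out knowing nothing and nudges its numbers every time it gets an example wrong.

```python
# Toy 4-pixel "images": [top-left, top-right, bottom-left, bottom-right], 1 = bright.
# Label 1 means "the top row is brighter", label 0 means "the bottom row is brighter".
# All data and names here are hypothetical, chosen only for illustration.
EXAMPLES = [
    ([1, 1, 0, 0], 1),
    ([1, 0, 0, 0], 1),
    ([0, 1, 0, 0], 1),
    ([0, 0, 1, 1], 0),
    ([0, 0, 1, 0], 0),
    ([0, 0, 0, 1], 0),
]


def rule_based(pixels):
    """First camp: a human writes the rule down explicitly."""
    top = pixels[0] + pixels[1]
    bottom = pixels[2] + pixels[3]
    return 1 if top > bottom else 0


def train_perceptron(examples, epochs=20, lr=1.0):
    """Second camp: start from zero and adjust the weights after each mistake."""
    weights = [0.0] * 4
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            score = sum(w * p for w, p in zip(weights, pixels)) + bias
            prediction = 1 if score > 0 else 0
            error = label - prediction  # 0 when correct, +1 or -1 when wrong
            weights = [w + lr * error * p for w, p in zip(weights, pixels)]
            bias += lr * error
    return weights, bias


def learned(pixels, weights, bias):
    """Classify an image using only the numbers found during training."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if score > 0 else 0


weights, bias = train_perceptron(EXAMPLES)
```

On this toy data the learner ends up agreeing with the hand-written rule, even on images it never saw during training such as [1, 1, 1, 0], without anyone telling it which pixels matter.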

Hinton had been on the losing side of this argument since 1972, when he showed up to grad school in Edinburgh and told his supervisor he wanted to work on neural networks. His supervisor spent the next several years trying to talk him out of it.

He didn't listen. In 1986, Hinton and two collaborators, David Rumelhart and Ronald Williams, published a paper showing that a technique called backpropagation could train neural networks with several layers, something the field had struggled to do. For a brief moment, it looked like his side might win.

It didn't. Within a few years the theory ran into the same wall it always had: not enough data, not enough computing power. The field moved on. Funding dried up. Researchers drifted to other approaches, and the period got a name that stuck — the AI winter.

Hinton stayed in the cold. He kept publishing, kept training students, kept refining the theory while the rest of the field looked the other way. By 2012 he was sixty-four years old and had been waiting forty years for the two missing pieces — the data and the compute — to finally show up.

That year, both walked through the door. Along with two graduate students: a quiet Russian-Israeli named Ilya Sutskever, and a programmer named Alex Krizhevsky.

ImageNet

In 2010, a Stanford professor named Fei-Fei Li launched a competition built on ImageNet, the vast collection of labelled images she and her team had assembled. The premise was simple: here are more than a million labelled images. Build a system that can look at a new one and tell you what's in it. Whoever makes the fewest mistakes wins.

For the first two years, the winning entries relied on hand-engineered features, with researchers deciding in advance what the software should look for. Their error rates barely budged.

In September 2012, Hinton, Sutskever, and Krizhevsky dragged a neural network — the unfashionable approach — out of the drawer and entered. Their system, called AlexNet, didn't win. Or rather, didn't just win. It demolished the field.

The previous year's best had posted an error rate of around 26%. AlexNet posted 15.3%. In a mature scientific field, results don't move like that. A good year was an improvement of one or two percentage points. AlexNet didn't just show up and win the Olympic 100m. It ran it in seven seconds.

Within months, every serious computer vision lab on Earth was tearing the AlexNet paper apart, trying to understand how three people in Toronto had broken their field. The three of them registered a company called DNNresearch. Google bought it for $44 million.

In 2024, twelve years after the bet that had defined his career finally paid off, Geoffrey Hinton — by now known as the godfather of AI — won something almost as precious. A Nobel Prize.

But before we put the tissues down, there's still a question to answer. How did they do it? Neural networks, remember, need two things: data and compute. The data came from ImageNet. The compute came from somewhere else entirely — and to understand that, we need to talk about another big bet, made years earlier by someone who had nothing to do with Hinton.

Because AlexNet hadn't been trained on a supercomputer, or in a Google data centre. It had been trained on two consumer graphics cards Alex Krizhevsky had bought himself and plugged into a desktop in his bedroom. Two gaming cards. The kind teenagers used to play Call of Duty.

Both of them made by Nvidia.

The Other Bet

Let's get back to Denny's favourite son, Jensen Huang. We gave you Nvidia's trajectory at the start of the chapter, but left out the part that matters most.

In 2006, Nvidia released something called CUDA.

To draw a video game, a chip has to do thousands of small calculations at the same time — where every pixel goes, what colour it is, how the light bounces. A normal chip does one big calculation at a time, very fast. A graphics chip does thousands of small ones at once. That ability, it turned out, was useful for almost anything that needed a lot of math done in parallel: weather modelling, drug discovery, financial simulations, cracking codes. CUDA was the software that let people who weren't game developers use Nvidia's chips for any of it.
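
A rough sketch of that idea in Python, under some loud assumptions: the tiny 8-by-4 "screen", the shade function and both render functions are made up for illustration, and Python threads stand in for a graphics chip's many small cores. It won't run any faster this way, and real GPU code would be written in CUDA itself. The point is only the shape of the work: each pixel's value depends on nothing but its own position, so the calculations can be handed out in any order, or all at once.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tiny "screen" used only for this illustration.
WIDTH, HEIGHT = 8, 4


def shade(pixel_index):
    """Brightness (0-255) of one pixel; depends only on that pixel's position."""
    x = pixel_index % WIDTH
    y = pixel_index // WIDTH
    gradient = (x * 255) // (WIDTH - 1)
    # Even rows brighten left to right, odd rows right to left.
    return gradient if y % 2 == 0 else 255 - gradient


def render_one_at_a_time():
    """The 'normal chip' approach: work through the pixels one after another."""
    return [shade(i) for i in range(WIDTH * HEIGHT)]


def render_in_parallel(workers=8):
    """The 'graphics chip' approach: hand independent pixels to many workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever worker finishes first.
        return list(pool.map(shade, range(WIDTH * HEIGHT)))


sequential_frame = render_one_at_a_time()
parallel_frame = render_in_parallel()
```

Both functions produce the same frame. Because no pixel needs to wait for any other, the same pattern applies to any job made of many small, independent sums, which is why the hardware turned out to be useful well beyond games.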

Almost nobody did. For six years, CUDA was a flop. Nvidia spent billions on it. Wall Street mocked Jensen openly. The stock didn't recover its 2007 peak until 2016 — nine years.

He wasn't betting on AI. He thought CUDA would find its market in finance and science. But quietly, on the fringes of the field, a handful of researchers — Andrew Ng at Stanford, a Swiss lab run by Jürgen Schmidhuber — had started using CUDA to train neural networks, getting results dozens of times faster than anyone else. The momentum was building even if nobody outside the field noticed.

Until the light went on in Toronto. At the time, Nvidia was worth $9 billion. Fourteen years later, it's almost 450 times that. They had built the only road wide enough for what came next.

