From Sentience to Performance

I had a bit of time between Christmas and New Year’s, and I watched a few Star Trek: The Next Generation episodes. And by a few, I mean more than I’m comfortable admitting. I hadn’t watched TNG since before the generative AI wave, and something about Data felt different. Not the character himself, but the questions he represented.
Data spent seven seasons pursuing what seemed like a central question: could a machine become enough like us to count as one of us? He painted. He composed music. He kept a cat. He formed friendships and tried to understand humor. The show treated his journey toward humanity as the natural arc of advanced AI. If you built something intelligent enough, it would want to be human, and we would need to decide whether to accept it as such.
Watching those episodes now, I kept feeling that the entire framing was archaeology. We built systems that scored well and shipped well, and it turned out that the question Data embodied was never the one we needed to answer.
The Prelude to Personhood
When I was a kid, Data was how I thought about AI. Not because the show predicted neural networks or anything that technical. He just made the whole thing feel concrete. An android smart enough to serve on a starship bridge would naturally want recognition as a person. The show never questioned this premise. Neither did I.
Another favourite, Isaac Asimov’s “The Bicentennial Man,” operated from the same assumption, only more explicitly. Andrew Martin starts as a household robot, a Roomba on steroids, who discovers creativity and the desire for recognition. He spends decades pursuing legal status as a human being, accepting mortality so society will finally see him as a man rather than property. The story culminates in his death being recognized as the death of a person, not the deactivation of a machine. Both stories treat capability as the prelude and recognition as the inevitable fight. Intelligence implies personhood. Personhood requires the desire to belong.
The logic seemed airtight. Once you accept that premise, the rest follows. Building sufficiently advanced AI would force us to answer hard questions. Does consciousness emerge from complexity? What separates persons from sophisticated tools? Data serving on a starship bridge meant confronting those questions. Andrew cleaning a household meant the same thing. The technical problems would get solved, and then the real challenge would begin.
A Test for Thinking Machines
Alan Turing gave us one answer to that recognition problem. His 1950 paper “Computing Machinery and Intelligence” proposed a deceptively simple test: if a machine could participate in a conversation well enough that a human observer couldn’t reliably distinguish its responses from those of another human, we should grant that the machine was thinking. The elegance of the Turing Test was in sidestepping metaphysics entirely. Instead of debating what consciousness really is, Turing suggested we could evaluate intelligence through observable behavior.
The test embedded an assumption that stayed with us for decades. Human-like performance in conversation became the benchmark for intelligence. If you could fool someone into thinking you were human, you had crossed a threshold that mattered. The test didn’t require the machine to want anything or feel anything. It just needed to produce the right outputs in the right context. But it still pointed toward human-like interaction as the thing worth measuring.
When Performance Became Enough
I remember when Deep Blue beat Kasparov in 1997. It felt like a threshold moment, the kind that should have triggered the questions Data and Andrew made central. Did Deep Blue want to win? Did it experience satisfaction? Should we care?
Nobody seriously argued that it should be treated as a person. The machine had just beaten the world’s best chess player, yet no one suggested it deserved rights or recognition. Deep Blue did something useful. That was enough. The reaction was scoreboard-based. The system didn’t need to present as a mind. It needed to win. The evaluation harness was performance, not personhood.
AlphaGo defeating Lee Sedol in 2016 followed the same pattern. The victory generated excitement about what AI could accomplish, not anxiety about what we owed it. Nobody worried about whether AlphaGo experienced pride or frustration during the matches. The system solved a problem humans found difficult. The personhood question never surfaced.
These victories showed something important about how AI evaluation had shifted. We stopped asking whether machines could think like us and started asking whether they could deliver outputs we cared about. The benchmarks that mattered measured performance on tasks, not similarity to human cognition or behavior. What we tested became what we decided mattered.
The Incentive Structure
Better benchmark scores led to more citations, more funding, and more talent. Better product performance led to more users, more revenue, and more iteration capacity. Benchmarks and product requirements pushed AI development in a specific direction because they connected directly to resources. These feedback loops pointed toward capability, not consciousness. The industry optimized for what could be measured and sold.
This wasn’t a philosophical decision. You can A/B test task completion rates. You can’t A/B test consciousness. The personhood question retreated because it couldn’t fit into the harnesses that determined what got built and what got funded. When you evaluate a language model, you measure how well it answers questions. When you deploy computer vision, you track accuracy and latency. The only metrics that survive are the ones that show up in a paper, a demo, or a contract renewal.
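To make that contrast concrete, here’s a minimal sketch of the kind of harness the paragraph above describes: score task completion and latency, because those are the numbers that fit in a report. Everything in it is illustrative rather than any real team’s tooling; `run_model` is a hypothetical stand-in for whatever system is being measured.

```python
import time

# Hypothetical task set: prompts paired with a substring check we can score automatically.
TASKS = [
    {"prompt": "Extract the total from: 'Invoice total: $42.50'", "expect": "42.50"},
    {"prompt": "Translate 'good morning' into French", "expect": "bonjour"},
]


def run_model(prompt: str) -> str:
    """Stand-in for a real model call; assume it returns a text completion."""
    return "The total is 42.50" if "Invoice" in prompt else "Bonjour"


def evaluate(tasks):
    results = []
    for task in tasks:
        start = time.perf_counter()
        output = run_model(task["prompt"])
        latency = time.perf_counter() - start
        results.append({
            "passed": task["expect"].lower() in output.lower(),  # did the task complete?
            "latency_s": latency,                                # is it fast enough to ship?
        })
    pass_rate = sum(r["passed"] for r in results) / len(results)
    worst_latency = max(r["latency_s"] for r in results)
    # Every number returned here can go in a dashboard or a contract.
    # There is no column for wanting, experiencing, or being anyone.
    return {"pass_rate": pass_rate, "worst_latency_s": worst_latency}


if __name__ == "__main__":
    print(evaluate(TASKS))
```

The point isn’t the code; it’s that a consciousness metric has nowhere to live in a loop like this.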
What We Traded
Data feels like a relic now. He represents a version of the future where the personhood question never goes away. Build something smart enough, and you have to deal with what it wants. In that future, building something smart enough to serve on a starship bridge means building something that wants to be recognized, something that pursues acceptance, something that experiences its own existence. The technical problems and the philosophical problems travel together.
We built something different. We built systems that deliver capability without requiring any account of inner life. Current models pass acceptance tests by demonstrating they can complete tasks within specified constraints. The evaluation rubric asks whether the output is correct, whether the latency is acceptable, and whether the system scales. It doesn’t ask whether anything is happening inside. We stopped needing answers to consciousness questions to deploy useful systems.
For decades, researchers and technologists imagined AI development would force us to confront questions about machine consciousness and rights. We ended up with systems that don’t demand recognition, so product teams can treat them as tools and never face the legal and moral fight that Andrew Martin’s story treated as inevitable. We also built systems whose limitations are harder to see because they perform well on the metrics we chose to track. The things we didn’t measure might matter later. But right now, they don’t appear in any rubric.
Still Watching
I finished those episodes, and I’m still not sure what to make of them. Data’s quest for humanity reads differently when you know how AI evaluation actually developed. The show treated personhood as the natural destination of intelligence. We built systems that scale in capability without converging toward anything Data would recognize as his own goals.
Maybe future systems will revive those questions. Maybe we’ll build something that pursues recognition. But that’s not the trajectory we’re on. The performance keeps improving. The consciousness never arrives.
Data wanted to be human. We built tools that perform tasks.
Only one of those futures shipped.
PS: If you haven’t seen TNG and want somewhere to start, I’d wholeheartedly recommend Season 2, Episode 9, “The Measure of a Man.”