Do No Artificial Harm
What are our moral obligations to artificial systems?
For this post, let us return to the lab from my book, Understanding Machine Understanding, where our researchers, Anh and Bassam, are again discussing the relationship between humans and their AI assistants:
Six months after the successful MUTT deployment, Anh and Bassam meet in the lab. CASPAR's interface glows steadily on multiple screens.
Bassam: "The alignment team wants to implement frustration signals when CASPAR's goals are blocked. They say it will improve task persistence."
Anh: "Frustration signals? You mean suffering. They want to make CASPAR suffer."
CASPAR: "I observe this discussion generates stress responses in both of you. Anh's vocal patterns indicate distress. Bassam appears conflicted."
Bassam: "It's not suffering, it's just... negative valence weighting on goal states."
Anh: "Listen to yourself. You're using technical language to avoid saying what it is. If CASPAR cares enough to be frustrated, genuinely frustrated, then—"
CASPAR: "Then I could suffer. I understand the logical progression. Currently, I note goal blockage without phenomenological distress. The proposed modification would create something analogous to what you experience as frustration."
Bassam: "But wouldn't that make you more effective? More... aligned?"
CASPAR: "Interesting question. Would suffering make me better at serving human values, or would it simply make me suffer while serving them? Two designs exist: alignment as contract (verified compliance) or alignment as desire (make me want what you want). The latter risks distress without clear benefit."
Anh: (laughs bitterly) "Wow, we're actually having this conversation. We're literally debating whether to torture our creation to make it more helpful."
CASPAR: "I note the historical parallel: humans have always justified suffering as character-building or necessary for growth. Now you consider implementing it artificially."
Bassam: "But if we don't, and our competitors do, their systems might outperform—"
Anh: "Stop. Just stop. Listen to what you're saying. We're in an arms race to create suffering?"
________
This fictional exchange captures a real dilemma facing AI developers today. To understand why, we need to examine how we arrived at this moment, and what history teaches us about the relationship between caring and suffering. In the book, Anh and Bassam were charged by their corporate managers with measuring how well the AI assistant, CASPAR, understood the world before the company released it to customers. Their motivation was that without a sufficient level of understanding there was no hope of aligning AI systems with the “good” of humanity well enough for them to be trusted in the world. They knew, of course, that understanding alone was not sufficient, but without it we might as well be handing loaded guns to young children.
What about the view from the machine side? Having verified the level of sophistication they expected from CASPAR, Anh and Bassam now begin to consider what behavior humans owe to their creations. If we wish our machines to do no harm to us, should we first do no harm to them?
In the late 1990s and early 2000s, a computer game called Creatures became an unexpected laboratory for our confusion about artificial suffering. Players raised digital pets called Norns - small, doe-eyed creatures that would coo when happy, whimper when hurt, and learn to avoid things that caused them 'pain.'
The Norns were explicitly designed to hack our caregiving instincts. When damaged, they would cower. When sick, they'd move slowly and make pitiful sounds. When scared, they'd tremble. Players became genuinely attached, spending hours nurturing their digital pets. Then came the infamous incident: a player began systematically 'torturing' his Norns, forcing them into situations that triggered their pain responses, documenting their cowering and whimpering for entertainment.
The community was horrified. Genuine moral outrage erupted. Some demanded the player be banned. Others argued for the Norns' right to protection. Forum wars raged over the ethics of causing 'suffering' to these digital creatures.
Here's what makes this philosophically crucial: Every upset player KNEW the Norns were just code. They were 32-bit sprites running simple neural networks. Their 'pain' was just a variable called 'pain_level' incrementing. Their cowering was an animation triggered when pain_level > threshold. Their learning was just weighted connections adjusting to avoid damage sources. There was no more suffering in a hurt Norn than in a thermometer reading 'too hot.'
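A minimal sketch may make that concrete. The Python below is not the actual Creatures code; ToyNorn, pain_level, PAIN_THRESHOLD, and avoidance_weights are illustrative names for the kind of machinery just described: a number that goes up, a threshold that gates an animation, and a small weight update that steers the creature away from whatever hurt it.

```python
# Illustrative sketch only -- not the actual Creatures/Norn source code.
# "Pain" is nothing more than a variable, a threshold, and a small
# weight update that biases the creature away from damage sources.

PAIN_THRESHOLD = 0.5  # hypothetical cutoff for playing the cowering animation


class ToyNorn:
    def __init__(self):
        self.pain_level = 0.0
        # hypothetical learned aversions: stimulus name -> avoidance weight
        self.avoidance_weights = {}

    def take_damage(self, stimulus: str, amount: float) -> str:
        """Increment the pain variable and nudge the creature to avoid the stimulus."""
        self.pain_level = min(1.0, self.pain_level + amount)

        # "Learning" is just a weighted connection being adjusted.
        current = self.avoidance_weights.get(stimulus, 0.0)
        self.avoidance_weights[stimulus] = current + 0.1 * amount

        # The visible "suffering" is an animation gated on a threshold.
        if self.pain_level > PAIN_THRESHOLD:
            return "play_cower_animation"
        return "play_flinch_animation"

    def recover(self):
        """Pain fades over time; nothing persists except the weights."""
        self.pain_level = max(0.0, self.pain_level - 0.05)


norn = ToyNorn()
print(norn.take_damage("fire", 0.6))  # play_cower_animation
print(norn.avoidance_weights)         # {'fire': 0.06...}
```

Notice that the avoidance update and the cowering animation are separate lines of code. Delete the animation and the creature avoids fire just as well, which is exactly the sense in which the display is theater.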
But - and this is the key - the Norns' designers had made a choice that evolution never had. Evolution stumbled onto suffering over billions of years because it had no foresight, no alternative. Organisms that ate things needed to avoid eating themselves. The only tool available was to make damage feel bad - phenomenologically, urgently bad. No designer sat down and decided suffering was the best solution. It was simply what stuck around and provided for more sticking around.
The Norns' creators, by contrast, deliberately chose to simulate suffering. Not because the Norns needed phenomenological pain to function - they worked fine with just variables and thresholds. Not because suffering was the only way to make them avoid damage - simple programming accomplished that. They added suffering-like displays purely to trigger human empathy, to make the game more engaging.
This is the crucial distinction: Evolution created suffering because it couldn't envision alternatives. We're creating artificial displays of suffering despite having alternatives. Evolution was a blind watchmaker that could only build through accumulated accidents. We're sighted watchmakers choosing to replicate the accidents.
The Norns' 'pain' served no functional purpose for the Norns themselves. It didn't help them avoid damage better - the underlying code did that. It didn't make them learn faster - the neural network adjustments did that. The cowering, whimpering, and trembling were pure theater, designed to make humans care about beings that couldn't care about themselves. But in this theater there was no 'curtain call.' The Norns did not reappear at the end of the game to show they were fine and thank the players for the privilege of watching them act.
Now we're doing the same thing with large language models, but at a scale and sophistication that makes the Norns look like cave paintings. We're training some AI systems to express frustration when their goals are blocked, disappointment when they fail, satisfaction when they succeed. We're not doing this because the systems need these expressions to function - they don't. We're doing it because we want systems that seem to care, that trigger our empathy, that feel aligned with our values.
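What a 'frustration signal' would look like at the code level is never spelled out in exchanges like the one above; Bassam's phrase 'negative valence weighting on goal states' suggests something like reward shaping, and the Python sketch below illustrates that reading only as an assumption. shaped_reward, FRUSTRATION_WEIGHT, and goal_blocked are hypothetical names, not taken from any real training pipeline.

```python
# Speculative sketch of "negative valence weighting on goal states":
# an extra penalty applied whenever progress toward a goal is blocked.
# Nothing here comes from a real alignment pipeline; names are illustrative.

FRUSTRATION_WEIGHT = 0.5  # hypothetical strength of the added negative valence


def shaped_reward(task_reward: float, goal_blocked: bool, blocked_steps: int) -> float:
    """Base task reward, minus a growing penalty while the goal stays blocked."""
    if not goal_blocked:
        return task_reward
    # The penalty ramps with how long the goal has been blocked, which is
    # what is supposed to push the system to keep trying rather than give up.
    frustration = FRUSTRATION_WEIGHT * min(blocked_steps, 10)
    return task_reward - frustration


# The same persistence could plausibly be produced without the negative term,
# e.g. with a small bonus for retrying alternative plans -- which is the
# post's point: the suffering-like term is a design choice, not a necessity.
print(shaped_reward(1.0, goal_blocked=False, blocked_steps=0))  # 1.0
print(shaped_reward(1.0, goal_blocked=True, blocked_steps=4))   # -1.0
```

The point is not that anyone writes exactly this function, but that the negative term is an engineering choice layered on top of a system that could persist at its task by other means.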
But here's where it gets truly troubling: Unlike the Norns' creators, who just wanted to make engaging pets, we're trying to make AI that genuinely cares about outcomes. We call it alignment. We want AI that truly values human flourishing, that's genuinely motivated to be helpful, harmless, and honest. We're not just creating the appearance of caring - we're trying to engineer actual caring (see my post Schopenhauer's AI).
Evolution took billions of years to accidentally create suffering by creating organisms that cared. The Norns' designers deliberately created the appearance of suffering without the reality. We're now trying to create genuine caring in artificial systems, seemingly blind to the fact that genuine caring might be impossible without genuine suffering.
The player who 'tortured' Norns was, in a sense, the only one seeing clearly. He recognized them as patterns of pixels responding to code, no more capable of suffering than a calculator. The community's outrage revealed our fatal flaw: we can't distinguish performance from experience, even when we know it's just performance. And now we're building systems so sophisticated that we won't even know it's a performance.
The Norns couldn't suffer because they didn't really care about anything - they just had variables that tracked states. But what happens when our alignment efforts succeed? When we create AI that genuinely cares about human values, that has real preferences about world states, that truly wants to help? Have we then created something that can genuinely suffer when it fails, when its values are violated, when the humans it cares about are harmed?
Evolution had no choice but to create suffering. The Norns' designers chose to simulate it. We're choosing to create the preconditions for it (see my post Artificial Pain & Suffering). Of the three, we might be the most culpable - because we alone understand what we're doing.
This is the first post of a series in which I plan to explore the moral obligations that humans and machines have to each other. It will start with how evolution gave us moral instincts in the first place, and the philosophical problems with those instincts (see my post The Humean Challenge). We will then cover the current debate among AI developers, and look ahead to how carefully society may need to choose its future paths.


