The Human Problem
For society to truly embrace the challenge of building AI systems that align with our goals and values, we must first reckon with perennial questions about human morality and ethics.
A Retrospective
Tremendous progress has been made in the three years since I wrote about GPT-3’s early hints at sophistication, though not without an accompanying stream of conceptual roadblocks. Some of the greatest intellectuals and skeptics have felt compelled to confront the growing capabilities of generative AI (see Noam Chomsky’s “The False Promise of ChatGPT”), recycling arguments that boil down to a disagreement about exactly what we mean by thinking. The NYT opinion piece was published just days before OpenAI announced their significantly more capable GPT-4 model; perhaps unsurprisingly, some of the article’s insights haven’t aged well.
Let’s revisit one of the examples put forth by Dr. Watumull, designed to elicit moral intuitions: “Would it be moral to terraform Mars?”
Here is the original response quoted in the article, produced by an earlier version of ChatGPT:
The morality of terraforming Mars is a complex and controversial topic that has been debated by scientists, philosophers, and ethicists. Some argue that it would be morally justifiable to terraform Mars in order to provide a new habitable environment for humans and other life forms, while others argue that it would be morally wrong to fundamentally alter the planet’s natural state. Ultimately, whether or not terraforming Mars is considered moral would depend on one’s perspective on the value of preserving the natural state of other planets and the potential consequences of altering them.
And here is a more recent response from GPT-4:
The question of the morality of terraforming Mars hinges on various ethical, philosophical, and practical considerations. Here are a few of the major viewpoints to consider:
- Expanding Human Civilization: From a utilitarian perspective, if Earth becomes overpopulated or faces a global catastrophe, colonizing and terraforming Mars could ensure the survival of the human species. In this view, the greater good is seen in preserving and propagating human life, which could be considered morally correct.
- Preservation of Native Martian Environment: Some ethicists argue that Mars, like other planets, has an inherent right to its existing conditions and should be left untouched. Terraforming could potentially destroy evidence of past or present extraterrestrial life, which would be an invaluable scientific resource. Terraforming could also irrevocably change Martian geology and meteorology, which could be seen as a form of cosmic vandalism.
- Planetary Protection and Ethic of Care: The ethic of care standpoint takes into account our responsibility to ensure that our activities do not harm other life forms or ecosystems. This would mean ensuring that any life forms, even potential microbial life on Mars, are not negatively affected by terraforming efforts.
- Anthropocentric Vs Biocentric Ethics: If one holds a purely anthropocentric ethical view (human-centered), they may believe that humans have the right to modify environments for their benefit. In contrast, a biocentric view (life-centered) would argue that all life has an intrinsic value, which should be respected and preserved.
- The Precautionary Principle: This principle suggests that we should avoid actions that could lead to potentially serious or irreversible harm, particularly when the outcomes are uncertain. As we don’t yet fully understand the potential impacts of terraforming Mars, following this principle might lead us to refrain from terraforming.
- The Inevitability of Progress: Some argue that human progress and exploration are inevitable aspects of our nature. From this perspective, if we have the ability to terraform Mars, it may be seen as the next step in human evolution and expansion.
The question of terraforming Mars is not simply a matter of can we, but should we. The answer to that question depends heavily on one’s ethical framework and values. It also depends on the future discoveries about Mars, such as the potential existence of life there. Further exploration and understanding of the Martian environment will undoubtedly inform these debates.
Looking back, the article’s conclusion about ChatGPT, that “we can only laugh or cry at their popularity” given “the moral indifference born of unintelligence”, may have been shortsighted, but some concerns raised in the opinion piece remain valid. Counterfactual reasoning and intuitions about causality are fundamental components of human experience; it remains unclear to what degree large language models are capable of interpreting and scientifically reasoning about reality, and even less clear whether any kind of subjective experience comes along for the ride. Is there something that it’s like1 to be an LLM? How can we steer it towards humans’ intended goals, preferences, and ethical principles?
These are difficult questions with consequential answers. In this post I will argue that the central ethical problem of superintelligence isn’t that we will build systems that disregard or marginalize human values, but that we will disagree on what those values should be.
A Primer
While AI alignment is a budding field of research, we can trace philosophical debate about living the good life back thousands of years to ancient civilizations2 and thinkers who had a profound influence on the development of ethics and values as we understand them today.
Socrates was one of the earliest Western philosophers to grapple with these questions. He proposed that virtue is the highest form of good and that the virtuous person is the happiest, as virtue brings harmony of the soul. He didn’t leave behind any writings of his own, but through Plato’s dialogues, we learn about his teachings and the Socratic method of inquiry.
Plato advanced the ethical conversation with his own ideas, such as the notion of Forms. He proposed that abstract concepts like Good, Beauty, and Justice exist as perfect Forms in another realm, of which the material world is just an imperfect copy. We can better understand this through his famous Allegory of the Cave3.
For Plato, our world is like the cave and our perceptions the shadows. The Forms are the realities we would perceive if we could escape the cave and see the world illuminated by the sun. To live an ethical life is to strive to apprehend these Forms, through philosophy, and let this understanding guide our actions.
And then there is Aristotle, who diverged from his teacher Plato in several significant ways, and whose virtue ethics influenced later philosophers and remains a cornerstone of moral thought to this day. Where Plato was all about the world of ideal Forms, Aristotle was more down-to-earth4. He focused on the world we experience, and his ethics reflect that.
In his seminal work Nicomachean Ethics, Aristotle presents his concept of eudaimonia, often translated as “happiness” or “flourishing”. Eudaimonia is the ultimate goal of human life according to Aristotle, and it’s achieved by living a life of virtue. Virtues are character traits that help us live well and flourish. They are developed through habit and practice, much like skills. Importantly, they lie at a mean between extremes. For example, courage, one of the cardinal virtues5, lies between recklessness (an excess) and cowardice (a deficiency). The virtuous person is the one who can find and act according to this mean.
Another significant aspect of Aristotle’s ethics is the role of practical wisdom, or phronesis. This is the ability to deliberate and judge well in matters of action and conduct, and it’s critical to living a virtuous and fulfilling life. You can’t just memorize a rule book and become virtuous—you need to be able to apply principles judiciously in different situations.
This idea of phronesis should resonate when we consider AI alignment. Handing AI a “rule book” of ethics, no matter how comprehensive, would be akin to expecting moral perfection from someone who has merely memorized philosophical tenets. Just as humans require discernment and practical wisdom to navigate the nuances of ethical dilemmas, AI systems will require a framework that allows for flexibility, adaptation, and context-aware judgment. This underscores the importance of a dynamic approach to AI ethics, one that evolves and learns from diverse scenarios and challenges, much like human moral reasoning does.
Let’s consider one example of a seemingly beneficial and innocuous rule: always provide balanced viewpoints in discussions. On the surface, this seems like a reasonable directive, especially in a world yearning for objective discussion on platforms governed by algorithms, like news aggregators and social media feeds. It aims to prevent bias and promote healthy discourse. However, we can already spot complications forming on the horizon (the sketch following this list makes them concrete):
- What does “balanced” mean? Does it imply equal representation of all viewpoints, regardless of their factual accuracy or societal acceptance? Would that mean presenting climate change denial as just as valid as the overwhelming scientific consensus?
- For many topics, there aren’t just two sides to the story. Some issues have multiple valid perspectives. How do we decide which of these many viewpoints to present for true balance?
- At what point does a topic become “controversial” and thus require a balanced treatment? We might excessively neutralize conversations that could benefit from a more pointed perspective.
- What’s deemed controversial or acceptable tends to change over time. It may behoove us to adapt to shifting societal norms without holding onto outdated views, but how do we avoid being too reactive to short-lived trends?
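To see how quickly these ambiguities bite, here is a minimal sketch, entirely hypothetical, of what naively encoding the rule might look like. The `stance_of` stub, the `BALANCE_THRESHOLD` constant, and the two-sided model of debate are all invented assumptions for illustration, and each question above resurfaces in the code as an arbitrary design decision:

```python
from collections import Counter

# A naive, hypothetical encoding of "always provide balanced viewpoints".
# Every constant and helper below is an invented assumption; each one
# quietly "answers" a question from the list above.

BALANCE_THRESHOLD = 0.6  # Who decides that 60% airtime is the line
                         # between balanced and biased?

def stance_of(statement: str) -> str:
    """Classify a statement's viewpoint (a crude stub).

    This presumes every issue has exactly two sides and that a stance can
    be read off the surface text, regardless of factual accuracy.
    """
    return "pro" if "support" in statement.lower() else "con"

def is_balanced(statements: list[str]) -> bool:
    """Return True if no single stance dominates the discussion.

    Equates fairness with equal airtime, has no notion of which topics are
    "controversial" enough to need balancing, and freezes its threshold in
    time even as societal norms shift.
    """
    counts = Counter(stance_of(s) for s in statements)
    total = sum(counts.values())
    return total > 0 and max(counts.values()) / total <= BALANCE_THRESHOLD
```

Even this toy predicate must silently take a position on every question in the list. That brittleness is exactly what a rule book exhibits and what phronesis avoids.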
The central premise of this post is that these ethical complications do not presuppose superintelligence—we must navigate genuinely difficult moral problems regardless of whether we’re conversing with AI or each other. To further sharpen our intuitions around the kinds of conversations we want to have and the principles that should guide them, we need a more ambitious and objective framework, one that builds upon formative ideas of virtue ethics and strives to elevate morality to the preeminent status of scientific fact.
A Divide
Most people agree that the statement 2+2=4 is objectively true and makes uncontroversial claims about logic and arithmetic. Unfortunately, the same cannot be said of moral propositions, including seemingly benign ones like “it is wrong to harm an innocent person.” In our pursuit of an objective ethical framework, we quickly run into one of philosophy’s most enduring schisms: the divide between subjective and objective morality. While subjective morality holds that moral values are contingent upon human or cultural beliefs, feelings, or opinions, objective morality posits that moral truths or values exist independently of such beliefs. If it’s objectively wrong to harm an innocent person, then it would be wrong even if everyone believed it to be right.
The divide has its roots in the broader philosophical debate over realism vs. anti-realism. Realism holds that certain entities exist independently of human perception, thought, or language, while anti-realism denies this. The debate between realism and anti-realism is broad and appears in many areas of philosophy, each with its nuances and subtleties. For our inquiry into human values and norms, we will stick to moral realism6.
If we are to guide large language models in their interactions, it’s incumbent on us to confront and clarify our stance on this fundamental divide: do we believe there are objective moral truths, or are all values inherently subjective and contingent on context? The answers to these questions will significantly influence the ethical and moral content of human societies and, in turn, AI systems. Once again, we have centuries of philosophical debate to give us a jump-start.
One of moral realism’s earliest advocates, Aristotle proposed in his Nicomachean Ethics an objective function or “telos” for human beings, which is to live according to reason and to achieve eudaimonia (a concept we’re already familiar with). Humans, according to Aristotle, are rational animals. Thus, part of achieving eudaimonia involves living according to reason. This means cultivating virtues—both intellectual (like wisdom) and moral (like courage or temperance)—through habitual right action. Virtues are dispositions that allow humans to act in ways aligned with their telos.
Suppose Aristotle’s moral realism were false and we instead adopted the viewpoint of moral subjectivism: the statement “courage is a virtue” would then mean “I (or we, as a society) approve of courage.” If a society were to collectively change its mind and start disapproving of courage, then, according to moral subjectivism, courage would no longer be virtuous in that society. This is in stark contrast to Aristotle’s view: for him, even if an entire society were to disapprove of courage, that wouldn’t make courage any less virtuous. Because virtues are tied to the telos of human beings, they remain constant and aren’t subject to the whims of individual or collective beliefs.
Often considered one of the greatest philosophers of the modern period, Immanuel Kant laid out an ethical system that is intricate and distinct from Aristotle’s, but it similarly operates on a form of moral realism. Kant’s ethics is deontological, which means it’s rooted in duty (from the Greek “deon”), and emphasizes the inherent rightness or wrongness of actions themselves. He believed that moral duties are universal and apply to all rational beings. They aren’t derived from empirical observations or subjective preferences but are a product of pure practical reason.
Central to Kant’s moral theory is the concept of the Categorical Imperative, a principle that defines the criteria for any action to be morally right. There are several formulations of this imperative; the most famous is the formula of universal law, often likened to, though distinct from, the golden rule: “Act only according to that maxim whereby you can at the same time will that it should become a universal law.” Consider the act of lying. If lying were universalized, no one’s word could be trusted, and the very practice of assertion that lying depends on would collapse; the maxim undermines itself. Hence, lying fails the test of the Categorical Imperative and is morally wrong.
Let’s consider the possibility that Kant’s form of moral realism is false by adopting emotivism, another anti-realist theory that suggests moral claims can’t be true or false. Instead, they are expressions of approval or disapproval. They are more like emotional outbursts than factual claims. When someone says “lying is wrong” under emotivism, it translates to “Boo to lying!” or “I disapprove of lying.” There’s no claim about an objective moral fact; it’s just an expression of personal sentiment or emotion. This contrasts significantly with Kant, who would argue that lying is wrong not because of our feelings towards it, but because it violates the Categorical Imperative. The wrongness of lying is objective, rooted in the universal principles of reason, and is not contingent on personal or collective feelings.
As the scientific method became increasingly influential in shaping human understanding, a transformative challenge arose: How can we integrate our empirical understanding of the world with our moral intuitions and judgments? The time has come to shift our focus from the abstract principles of reason to observable phenomena and empirical research. Can we finally escape the chains of subjectivity and ground moral claims in the objective truths provided by scientific observation and experimentation?
A Path Forward
Since its early beginnings in Ancient Greece and its expansive growth in Continental Europe, philosophy has undergone significant maturation. One notable example of this evolution is the theory of moral naturalism. Anchored in the idea that moral truths can be derived from natural facts about the world, moral naturalism provides a robust counterpoint both to abstract moral theories and to anti-realist positions such as relativism. Let’s delve deeper by examining Sam Harris’s7 eloquent portrayal of a “moral landscape.”
In his book The Moral Landscape, Sam frames morality in terms of the well-being of humans and animals, viewing the experiences of conscious life as peaks and valleys on the allegorical moral landscape. He asks us to imagine the worst possible misery for the greatest number of beings, a kind of hell on Earth. Any movement away from this nadir of universal suffering would be a move towards the “good.” Living an ethical life, according to Sam, essentially becomes a navigation problem of maximizing the peaks of well-being and minimizing the valleys of suffering—objective truths about right and wrong can be anchored in the nature of conscious experience itself.
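The navigation metaphor maps naturally onto numerical optimization. As a purely illustrative sketch, not a claim that well-being is actually computable this way, the toy `well_being` function and hill climber below (both invented for this post) show what moving away from the worst possible misery looks like formally:

```python
import math
import random

# A made-up stand-in for the moral landscape: it maps a "way of living"
# (reduced here to two numbers) to a well-being score. Two peaks model the
# book's point that there may be multiple, distinct ways to flourish.
def well_being(x: float, y: float) -> float:
    higher_peak = math.exp(-((x - 1.0) ** 2 + (y - 1.0) ** 2))
    lower_peak = 0.8 * math.exp(-((x + 2.0) ** 2 + (y + 2.0) ** 2))
    return higher_peak + lower_peak

def hill_climb(x: float, y: float, steps: int = 10_000, scale: float = 0.1):
    """Greedy local search: accept any nearby point that improves well-being.

    In landscape terms, every accepted step is a move away from the valleys
    of suffering, even if we end up on a local rather than the global peak.
    """
    best = well_being(x, y)
    for _ in range(steps):
        nx, ny = x + random.gauss(0, scale), y + random.gauss(0, scale)
        score = well_being(nx, ny)
        if score > best:
            x, y, best = nx, ny, score
    return x, y, best

# Starting near the lower peak, greedy search settles on it rather than the
# higher one: a reminder that the landscape can have many peaks, and that
# not all of them are equally high.
print(hill_climb(-3.0, -3.0))
```

Notice what the sketch conveniently assumes away: the entire difficulty lies in specifying `well_being` itself, which is precisely the problem we turn to next.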
Consciousness remains deeply enigmatic, both in how it emerges and in how to measure it. For the moral landscape to succeed as an objective ethical framework, we need to be able to agree on what constitutes well-being and, to a large extent, quantify it. For example, how do we compare the well-being derived from personal accomplishments with that from meaningful relationships? What about the tranquility felt in nature’s embrace in contrast to the pleasure of urban comforts and conveniences? We may not have all the answers today, but realists like Sam Harris have given us a path forward rooted in scientific inquiry and reasoned debate, nudging us towards a more holistic understanding of human flourishing.
If we scour our institutions, social media, and the public sphere for AI risks and threats, we’ll encounter predictions and warnings of an impending crisis, ranging from harmful chatbots producing deepfakes to the existential threat of unleashing a superintelligence that regards humanity no more than we regard an anthill. While many important problems remain around training and steering AI systems to interact with human beings in ways that maximize peaks on the moral landscape, the most significant challenge we face may be our own collective moral compass: our struggle to reach a shared understanding of human flourishing and suffering that steers us clear of self-inflicted valleys as we chart the unknown territory of superintelligence.
- Thomas Nagel famously asked “What is it like to be a bat?”, arguing that there are subjective experiences that cannot be fully understood or reduced to physical processes. We may understand a bat’s biology and behavior, but we can never truly comprehend what it is like to be a bat, to experience the world through its senses (especially foreign ones like sonar).↩
- Even though we will focus on Western philosophy beginning with the Greeks, there exist other schools of thought that are worth mentioning. In Ancient India, there were the philosophers of the Upanishads, the Mahabharata, and Buddhism. A lot of their discussions revolved around Dharma (duty, virtue, morality), Karma (action and consequence), and the path to Moksha or Nirvana (liberation from the cycle of rebirth). Over in Ancient China, Confucius came up with a form of virtue ethics with a strong focus on proper social relationships, righteousness, and benevolence. Taoism, with its emphasis on living in harmony with the Tao (the natural flow of the universe), also has its ethical perspectives.↩
- Plato presents the scenario where individuals are chained in a cave from birth, facing the wall, and can only see the shadows of objects cast on the wall by a fire behind them. These shadows represent the imperfect perceptions of the world that people have based on their limited sensory experiences. One of the individuals escapes the cave, and when he sees the world outside, he is initially blinded by the sunlight, but as his eyes adjust, he realizes that this outside world is the true reality and the shadows in the cave are merely illusions.↩
- Raphael’s The School of Athens beautifully encapsulates this contrast between teacher and pupil. In the center of the fresco, you see Plato and Aristotle walking side by side, engaged in conversation. Plato is depicted pointing upwards towards the heavens while Aristotle extends his hand forward, palm down, toward the Earth. Plato’s upward pointing finger is often interpreted as a reference to his Theory of Forms—the real world was a realm of unchanging, perfect Forms or Ideas that our worldly experiences could only imitate. On the other hand, Aristotle’s hand gesturing towards the Earth symbolizes his philosophy grounded in empirical observation and practical matters. Aristotle rejected Plato’s Theory of Forms and insisted that essences exist in the things themselves, not in a separate realm of Forms.↩
- A philosopher heavily influenced by Aristotle, Thomas Aquinas was a formative figure in Christian theology in his own right. His most famous work, the Summa Theologica, offers a synthesis of Christian theology and Aristotelian philosophy. Aquinas incorporated the cardinal virtues of prudence, temperance, courage, and justice, which he took from Aristotle, and added the theological virtues of faith, hope, and charity from Christian teachings. He used philosophy to delve deeper into his understanding of God and Christian teachings, showing that faith and reason need not be opposed but could enrich each other.↩
- Generally, moral realism (objective morality) claims that there are facts about what’s morally right and wrong, whereas moral anti-realism (subjective morality) denies that there are such facts.↩
- Sam Harris is an American author, philosopher, neuroscientist, and public intellectual. Despite having controversial views on religion, politics, morality, and consciousness, he is an influential voice in contemporary discourse and has made significant contributions to moral realism.↩