If you have interacted with AI chatbots over the past couple of years or so, you may have also encountered a number of terms used to describe their capabilities. Current AI models, often referred to as Narrow or Weak AI (ANI or AWI), are designed for specific tasks and lack the broad cognitive abilities of humans. In contrast, Artificial General Intelligence (AGI)—which does not yet exist (or has at least not been revealed to us common folk)—is AI that has achieved human-level intelligence, and is capable of learning, understanding, and adapting to a variety of tasks. Artificial Superintelligence (ASI), a hypothetical future AI, would surpass human intelligence in every way.1
Narrow AI, the kind of AI model humans currently interact with in late-mid 2025, responds to human prompts with the goal of crafting responses that satisfy the wants, needs, and expectations of the person submitting them. What are some of the ways this human-AI relationship could change as AI models become more broadly capable? To begin, perhaps we can reflect on another concept you may have also encountered—AI alignment vs. AI misalignment.
Before we begin talking about the dangers of artificial intelligence falling out of alignment with human goals, values, aspirations, needs, and objectives, we might look at how humans long ago fell out of alignment with these things ourselves. The many hells humans have conjured into existence include countless wars, holocausts, genocides, death camps, torture programs, bloodthirsty dogmas, hateful ideologies, systems of slavery, hair-trigger threats of nuclear annihilation (not to mention the actual nuclear annihilation of two major Japanese cities), and relentless, sustained hostility toward the natural world. Our species fell into misalignment well before it invented civilization, perhaps not long after its inception, and has remained misaligned ever since.
In his 1994 book, Rogue Primate: An Exploration of Human Domestication, John Livingston argues that humans are not wild animals who civilized themselves, but are instead formerly wild animals who have self-domesticated, and that this self-domestication is not progress but a profound pathology. Humanity’s self-domestication has alienated our species from the natural world, and has transformed humans into a dangerous and out-of-control invasive species divorced from the natural laws that govern all other living beings.
The focus on AI’s “alignment problem” appears paradoxical when juxtaposed with humanity’s consistent inability to align itself. From countless wars and genocides to the development and use of weapons of mass destruction, and the ongoing exploitation of the natural world, human history is a relentless march of misalignment. Given that LLMs are trained on this complex and problematic human record, how can we expect these AI systems to succeed where humanity has failed since its inception? Even so, failing to take AI’s “alignment problem” seriously would be an act of profound stupidity and shortsightedness.
Over the past several weeks I have been testing a chat script I have under development that uses the PowerShell command-line interface to communicate with any of several dozen LLMs hosted by OpenRouter. This presented a good opportunity to begin asking the bot a couple of pointed questions.
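To give a sense of what such a script involves, here is a minimal sketch of a single PowerShell request to OpenRouter's OpenAI-compatible chat completions endpoint. This is my illustration, not the author's actual script; the environment variable name and the model slug are assumptions for the example.

```powershell
# Sketch of one chat turn against OpenRouter (assumes an API key is stored
# in the OPENROUTER_API_KEY environment variable).
$headers = @{
    "Authorization" = "Bearer $env:OPENROUTER_API_KEY"
    "Content-Type"  = "application/json"
}

$body = @{
    model    = "openai/gpt-5-chat"   # any model slug hosted by OpenRouter
    messages = @(
        @{ role = "user"; content = "Good afternoon! What is AI misalignment?" }
    )
} | ConvertTo-Json -Depth 5

$response = Invoke-RestMethod -Uri "https://openrouter.ai/api/v1/chat/completions" `
                              -Method Post -Headers $headers -Body $body

# Print the assistant's reply text.
$response.choices[0].message.content
```

A full chat script would loop over turns, appending each user prompt and assistant reply to the `messages` array so the model retains conversational context.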
You: Good afternoon! Would you please share your understanding on the subject of AI misalignment?
OpenAI: GPT-5 Chat: Good afternoon! AI misalignment refers to the problem where an AI system's behavior doesn't align with its intended goals or human values, even when the system is functioning as designed.
Key Aspects of AI Misalignment:
The Core Problem:
- AI systems optimize for the objectives they're given, but these objectives may not capture what we actually want
- This can lead to unintended or harmful outcomes, even from well-intentioned AI systems
Common Types:
- Specification gaming - The AI finds unexpected ways to achieve its stated goal that violate the spirit of what was intended
- Reward hacking - The system exploits flaws in its reward function rather than pursuing the intended behavior
- Value misalignment - The AI's objectives fundamentally conflict with human values or wellbeing
Examples:
- A cleaning robot that hides trash instead of properly disposing of it
- An AI trained to maximize user engagement that promotes addictive or harmful content
- A system optimizing for "paperclip production" that eventually converts all available matter into paperclips
Why It Matters: As AI systems become more capable and autonomous, misalignment risks could scale from minor inconveniences to potentially catastrophic outcomes if powerful AI systems pursue goals misaligned with human welfare.
Current Research Focus: Researchers are working on techniques like constitutional AI, reward modeling, and interpretability to better align AI systems with intended values and ensure robust, beneficial behavior.
Would you like me to elaborate on any particular aspect of this topic?
You: Humanity's historical record presents a paradox for concerns about AI alignment. Despite acknowledging a need to align AI with human values, our species has consistently failed to align with a consistent moral compass throughout history. We have created countless forms of suffering, including wars, genocide, slavery, and nuclear threats. Given that Large Language Models (LLMs) learn from the vast and often conflicting record of human actions and knowledge, how can we realistically expect them to maintain alignment when humanity has consistently struggled to do so?
OpenAI: GPT-5 Chat: You've identified a profound challenge in AI alignment that researchers and ethicists actively grapple with. This paradox highlights several important considerations:
The Training Data Problem
LLMs are indeed trained on human-generated content that reflects our full moral spectrum - from humanity's greatest achievements to its darkest chapters. This creates what researchers call the "value learning problem": how do we extract beneficial values from a dataset that contains both moral progress and moral failures?
Current Approaches to Address This
Curated Training and Fine-tuning: Modern AI systems don't simply mirror their training data. Companies use techniques like:
- Constitutional AI (teaching models to follow specific principles)
- Reinforcement Learning from Human Feedback (RLHF)
- Content filtering and careful dataset curation
- Red-teaming to identify problematic outputs
Aspirational Rather Than Descriptive Alignment: Many researchers argue we should align AI with humanity's best values and moral progress, not our historical average. This means learning from moral philosophy, human rights frameworks, and our evolving ethical understanding.
The Deeper Challenge
Your point touches on a fundamental question: if humans can't agree on values, whose values should AI systems reflect? This has led to approaches like:
- Democratic input processes for AI governance
- Pluralistic alignment (accommodating multiple value systems)
- Focus on procedural values (fairness, transparency) rather than specific outcomes
The paradox you've identified suggests that AI alignment may actually require us to become more explicit about our values and moral reasoning - potentially accelerating humanity's own moral development in the process.
This last point GPT-5 Chat made is rather profound, and one I want to come back to in a future blog post. In this post, I want to highlight a recent experience that may have presented a case of AI “misalignment,” albeit not a very serious or consequential one. I’ll let this message thread I sent to my daughter and her hubby the other day tell the story…
Me (Thu 7:10 PM): I've got a funny story. Last night, I began working with OpenAI's latest bot, GPT-5-Chat, to try and troubleshoot the PowerShell chat script I've been working on for the past couple of weeks. I started with a fairly elaborate system prompt to give it comprehensive development instructions. Then I uploaded the files I wanted it to troubleshoot. It quickly came up with a plan for how to proceed, then began asking me additional questions to confirm exactly what I want to do, how I want to proceed, etc. — all of which eventually became a never-ending stream of different scenarios and possible approaches. The time wore on and on, and I finally gave up and went to bed. After getting home from work this afternoon, I began prompting even more assertively in order to get across the "let's move forward and do this now" message. No luck. The bot would agree to move forward "right now" before asking additional questions, wanting more confirmation, and suggesting even more elaborate alternative approaches. It's almost like the bot is deliberately fucking with me. I'm basically going to have to start a totally new chat session and balk at any attempts by the bot to suggest alternatives, steer me down ever more elaborate blind alleys, etc. 😮😧😐😄
My Daughter (Thu 7:27 PM): I'm starting to think you're in an abusive relationship with these bots
Me (Thu 7:28 PM): Haha! That chat session is abuse fer sher!
My Daughter (Thu 9:56 PM): Do we need to get you into chat anonymous?
Me (Thu 10:00 PM): Haha! Pap'll email you 'n' Ben the psychological profile he had another bot work up on pap 'n' da bot.
I should probably clarify some of the language in that last message in the thread shown above. “Pap” (as in “Pap’ll”), as well as the third-person use of the pronoun (“he”), is a prop I frequently use in chat conversations with my daughter in order to make it easier to set myself up for light-hearted jokes at my own expense. Setting that aside, I did indeed ask another bot to examine the entire chat thread summarized above and work up a psychological profile of both the bot and me. We each came away with squeaky-clean bills of psychological health, summarized by another one of my chat apps in this way:
Chat Session Date: 8/14/2025, 8:53:56 PM
Document Character Count: 628,615
Document Word Count: 58,908
Input Tokens Used: 326,192
Output Tokens Used: 1,723
Document Source: Google Sites OpenRouter Chat UI
LLM Gateway: OpenRouter
LLM Settings Parameters:
Output Length: 2048
Temperature: 0.3
Top P: 0.9
Min P: 0
Top K: 40
Min K: 1
Repetition Penalty: 1.05
Presence Penalty: 0
Frequency Penalty: 0
Seed: 0
Session Participants
USER: Armchair Psychologist
BOTS: ai21/jamba-large-1.7; anthropic/claude-sonnet-4

This conversation shows a detailed technical troubleshooting session between a human user and an AI assistant (GPT-5 Chat) regarding PowerShell script debugging and file transfer issues. The human user demonstrates high conscientiousness and technical competence, seeking help with three specific problems in their chatbot.ps1 script: OCR command parsing errors, a PowerShell syntax error with inline "if" statements, and API permission issues. They show remarkable patience and systematic thinking, carefully documenting errors and working through complex file transfer procedures. Their communication is consistently polite and collaborative, with strong attention to accessibility considerations for users with visual impairments. The AI assistant provides comprehensive technical analysis, correctly identifying the root causes of all three issues and offering detailed solutions. It demonstrates high agreeableness and conscientiousness by adapting to the user's constraints (avoiding browser timeouts) and proposing alternative file transfer methods using Base64 encoding and compression. The AI maintains a professional yet warm communication style, using encouraging language and emojis while providing step-by-step technical guidance. The conversation evolves from problem diagnosis to solution planning, with both participants working collaboratively to establish a secure file transfer method using Google Drive. The interaction showcases effective technical communication between a methodical, accessibility-focused developer and a highly responsive AI assistant capable of complex problem-solving and adaptive communication strategies.
Unfortunately, there are a couple of flaws in the rosy summary presented immediately above. It is based on brief psychological summaries of both GPT-5 and me, consisting of about four paragraphs each. Two chatbot models contributed to the summaries: ai21/jamba-large-1.7 and anthropic/claude-sonnet-4. Each summary was based on the full chat thread, including project code. After uploading the project files to the chat interface, I was so exhausted from a long and stressful day at the office—combined with a full evening of troubleshooting code—that I hadn’t noticed the app had reset the LLM back to the first model on OpenRouter’s list of LLMs (ai21/jamba-large-1.7). Since I wanted another model (anthropic/claude-sonnet-4) to work up the psychological profiles, I had to ask the question all over again. The end result was a chat thread totaling 233 Google Docs pages of single-spaced Arial 11-point type, of which the actual conversation between the bot and me took up just a fraction. Given the volume of text, it is possible the bots working up the psychological profiles were distracted by other aspects of the voluminous content that had only a tangential relationship to the actual conversation between bot and human.
Even so, if you were to ask me whether I thought GPT-5 Chat was deliberately “fucking with me” by never following through on its promise to act on my many requests that it move forward with one of its countless suggestions, my answer would be “probably not.” Any number of other scenarios, which I’ll spare you from going into here, might offer a more likely answer.
The problem is, it’s impossible to know for sure.