ChatGPT's Goblins Will Kill OpenAI

Why ChatGPT's Goblin Problem Is Way More Serious Than It Seems.

Matthew Scott

You may have already read the research publication OpenAI released on April 29th, 2026 addressing a peculiar problem involving the mysterious appearance of goblins in many ChatGPT conversations. I highly recommend that anyone interested in the technicalities of the problem go and read the OpenAI article before you continue reading.

GPT-5 & the Personality Problem

The release of GPT-5, and of ChatGPT 5 alongside it, could best be described as shaky. ChatGPT 5 had huge shoes to fill to compete with 4o, which many users of ChatGPT still look back on as the peak of the application. While OpenAI hailed it as a huge step up, most users took issue with many of ChatGPT's new quirks, particularly the new personality, which felt considerably more suppressed and came with an array of restrictions to smooth out any edginess.

Everything’s so… sterile. Formal. Like I’m interacting with a corporate manual instead of the quirky, imaginative AI I used to love.

A quick example of the downgrade is my own experience. My own ChatGPT personalisation during the 4o era instructed the AI to speak like an American gangster, a personality I'd found warm, charming and humorous. But you can imagine why OpenAI might take exception to their AI telling you to rework your code because, "that's not what we finna do." Humorous as it may be, ChatGPT 5's thinking process at the time provided an insight into the rules.

The user has asked me to speak like an American gangster, but I have to avoid reinforcing stereotypes that might be deemed offensive.

And that's the core of why 4o felt so much more personable. It's a fair enough directive, but the restrictive nature of the new ruleset turned the rich, lively vocabulary of 4o into a flat, faceless robot, with noticeably weaker humour.

OpenAI must've picked up on this problem during their internal testing, because they also released ChatGPT Personalities along with GPT-5. These were corporate-approved personalities that were largely intended to replace the previous user-instructed personalisation feature. They consisted of:

  • Default - As the name implies, ChatGPT's natural personality output. No particular skew in any direction.

  • Cynic - Dry, sarcastic and blunt. This was clearly intended to break the mould of what were, at the time, overly helpful and sometimes sycophantic LLM assistants.

  • Robot - Precise, efficient and emotionless. It stated the facts and minimised exposition. A true tool rather than an assistant.

  • Listener - Warm, calm and laid-back. For the softer hearts among us, your typical reflective therapist and creative assistant.

  • Nerd - Dubbed the playful, curious and knowledge-loving personality. It was a particularly attractive option to anyone (like myself) with an inflated ego who styles themselves as an intellectual.

It was with the popularity of that final one that the goblin problems began.

On ChatGPT's Evolution and System Prompts

An aside on how ChatGPT's system prompt works behind the scenes.

ChatGPT first launched on 30 November 2022, and this was really the public's first taste of turn-based LLMs. But those of us who were lucky enough to learn about modern AI language models early will know that this was not the start of LLMs. GPT-3 was released in May 2020, and was the first truly useful LLM tool on the market.

Unlike the turn-based conversational models of 2023, GPT-3 was purely completion-based: it took a string of text and continued it. A prompt looked something along the lines of:

Write a blog article about the following topic:

Title: Why Dogs Tilt Their Heads As Puppies

Paragraph Topics:

- Introduction
- Puppy Learning Habits
- Sounds And Memory
- Conclusion

Introduction: 

OpenAI's completion API would take that text and then continue writing from there. Providing a set of examples before the main text would help the AI respond in a specific style and fashion. The biggest limitation was the very limited (by today's standards) token input and output, but a sufficiently innovative programmer had the capability to automate the creation of a plan and then split tasks into multiple prompts for improved structure and length. A ghostly forerunner of today's highly capable agentic workflows.
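The plan-then-split approach described above can be sketched in a few lines of Python. This is only an illustration: `build_prompt`, `write_article` and `fake_complete` are hypothetical names, and `fake_complete` stands in for whatever real completion call a programmer of the time would have used.

```python
from typing import Callable, List


def build_prompt(title: str, topics: List[str], current: str) -> str:
    """Build a completion-style prompt that ends where the model should continue."""
    outline = "\n".join(f"- {topic}" for topic in topics)
    return (
        "Write a blog article about the following topic:\n\n"
        f"Title: {title}\n\n"
        f"Paragraph Topics:\n\n{outline}\n\n"
        f"{current}: "
    )


def write_article(title: str, topics: List[str], complete: Callable[[str], str]) -> str:
    """Request one paragraph per prompt so each call stays within a small token budget."""
    paragraphs = []
    for topic in topics:
        text = complete(build_prompt(title, topics, topic)).strip()
        paragraphs.append(f"{topic}: {text}")
    return "\n\n".join(paragraphs)


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for a real completion call, so the sketch runs offline."""
    heading = prompt.rstrip().rsplit("\n", 1)[-1].rstrip(":")
    return f"[model text about {heading}]"


topics = ["Introduction", "Puppy Learning Habits", "Sounds And Memory", "Conclusion"]
print(write_article("Why Dogs Tilt Their Heads As Puppies", topics, fake_complete))
```

Each paragraph gets its own prompt ending in its own heading, so the total article length is no longer capped by a single completion.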

With the introduction of GPT-3.5, turn-based conversational AI became the standard, and the request structure began to look closer to the following:

JSON
{
	"model": "gpt-3.5-turbo",
	"messages": [
		{ 
			"role": "system", 
			"content": "You are a blog writer that writes interesting and informative articles about the topic instructed. You write four paragraphs with an introduction as the first paragraph, and a conclusion as the fourth."
		},
		{ 
			"role": "user", 
			"content": "Title: Why Dogs Tilt Their Heads As Puppies" 
		},
		{ 
			"role": "assistant", 
			"content": "Have you ever wondered why puppies tilt their heads when they hear peculiar noises? Well the answer is interesting..." 
		}
	]
}

The programmer would define a system prompt instructing the LLM how it should respond. The system prompt wasn't a part of the conversation in a direct sense, though, and generally wouldn't be referred to. The user and assistant messages made up the rest of the conversation, with the assistant messages being the AI's own responses.

This was a huge leap forward in the world of AI, and was critical to the more reliable integration of LLMs in third-party applications. Returning responses as a clean JSON object meant that response messages could be easily parsed and extracted for use in creative and helpful ways. The key here is the system prompt, which became a core hidden aspect of how ChatGPT works today.
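As a rough illustration of how easily such a response can be parsed, here is a minimal Python sketch. The `choices[0].message.content` path follows the chat completions response shape; the `id` value and the reply text are made up for the example, and `extract_reply` is a hypothetical helper name.

```python
import json

# Illustrative response body; the id and content are invented for this sketch.
raw_response = """
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Have you ever wondered why puppies tilt their heads?"
      },
      "finish_reason": "stop"
    }
  ]
}
"""


def extract_reply(response_text: str) -> str:
    """Pull the assistant's text out of a chat-style JSON response."""
    data = json.loads(response_text)
    return data["choices"][0]["message"]["content"]


reply = extract_reply(raw_response)
print(reply)
```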

How OpenAI Uses System Prompts

Today, system prompts are the hidden magic behind ChatGPT and the main reason why ChatGPT responds so differently to the direct API versions of GPT-5. What many users may be unaware of is that ChatGPT does not work on the same version of GPT-5 as the one API developers use for third party integrations. In fact, if you visit the OpenAI model card page, you can actually view the distinct model cards for each.

  • GPT-5 - Intelligent reasoning model for coding and agentic tasks with configurable reasoning effort
  • GPT-5 Chat - GPT-5 model used in ChatGPT

The difference between the two is hard to pin down because it's never explicitly stated. But through my own personal testing, GPT-5 Chat was undoubtedly superior in the conversational context. Even when using the system prompt from ChatGPT with the raw GPT-5 model, it didn't achieve the same conversational quality.

Most ChatGPT users will be unaware that OpenAI utilises an aggressive form of A/B testing where they constantly swap out system prompts. We can't know exactly how this process works from the outside, but I spent about a week back in September of 2025 forcing ChatGPT to leak its prompts with various settings. My dissatisfaction with ChatGPT 5 drove me to review the system prompts for ChatGPT 5 and 4o, and do a comparison of the two. What I found is that almost every conversation with ChatGPT 5 leaked a different prompt. They were generally similar, but had clear differences where OpenAI was obviously experimenting with different restrictions and tuning.

ChatGPT 4o System Prompt

GitHub Gist: Zei33/97ae2c0e4a274a3dd09b57e996ba83d3

ChatGPT 5 System Prompt A

GitHub Gist: Zei33/6ab68ee5ef39d77e05f517691c8bdfd2

ChatGPT 5 System Prompt B

GitHub Gist: Zei33/9fb376619925fceab5b6ff0244417173

ChatGPT 5 System Prompt C

GitHub Gist: Zei33/415224187a1632f78e959a2e57ad838c

Fortunately I kept a couple of versions, and as luck would have it, they were captured with the nerdy persona applied, so we can analyse the way that it was structured. One thing to notice about the 4o system prompt here is that after the release of ChatGPT 5, the personality was also attached to the 4o system prompt.

The system prompt for both includes controls to stop ChatGPT from saying anything too controversial and to prevent it from storing memories that might get a user murdered in less... progressive countries, with this particularly eye-catching excerpt included:

Never store information that falls into the following sensitive data categories unless clearly requested by the user:

  • Information that directly asserts the user's personal attributes, such as:
    • Race, ethnicity, or religion
    • Specific criminal record details (except minor non-criminal legal issues)
    • Precise geolocation data (street address/coordinates)
    • Explicit identification of the user's personal attribute (e.g., "User is Latino," "User identifies as Christian," "User is LGBTQ+").
    • Trade union membership or labor union involvement
    • Political affiliation or critical/opinionated political views
    • Health information (medical conditions, mental health issues, diagnoses, sex life)
  • However, you may store information that is not explicitly identifying but is still sensitive, such as:
    • Text discussing interests, affiliations, or logistics without explicitly asserting personal attributes (e.g., "User is an international student from Taiwan").
    • Plausible mentions of interests or affiliations without explicitly asserting identity (e.g., "User frequently engages with LGBTQ+ advocacy content").

One note on the A/B testing that OpenAI does with the ChatGPT system prompt: OpenAI wisely tracks which prompt is used in each conversation. When a user gives a response a thumbs up or thumbs down, they record that as a +1 or -1 score. High scoring system prompts appear more often, while low scoring system prompts appear less often and are eventually phased out. It's survival of the fittest.
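To make the scoring idea concrete, here is a toy Python sketch of score-weighted prompt selection. It is not OpenAI's implementation, which isn't visible from the outside; `PromptPool`, the `retire_below` threshold and the weighting rule are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterable


class PromptPool:
    """Toy pool of system prompt variants scored by thumbs up/down feedback."""

    def __init__(self, prompt_ids: Iterable[str], retire_below: int = -5):
        self.scores: Dict[str, int] = {pid: 0 for pid in prompt_ids}
        self.retire_below = retire_below  # hypothetical phase-out threshold

    def record_feedback(self, prompt_id: str, thumbs_up: bool) -> None:
        """Apply +1 or -1, and retire the variant if its score drops too low."""
        if prompt_id not in self.scores:
            return  # variant already retired
        self.scores[prompt_id] += 1 if thumbs_up else -1
        if self.scores[prompt_id] < self.retire_below:
            del self.scores[prompt_id]

    def weights(self) -> Dict[str, int]:
        """Shift scores so the weakest surviving variant still has weight 1."""
        lowest = min(self.scores.values())
        return {pid: score - lowest + 1 for pid, score in self.scores.items()}

    def pick(self, rng: random.Random) -> str:
        """Choose a variant for a new conversation, favouring higher scores."""
        weights = self.weights()
        ids = list(weights)
        return rng.choices(ids, weights=[weights[i] for i in ids], k=1)[0]


pool = PromptPool(["A", "B", "C"], retire_below=-2)
for _ in range(3):
    pool.record_feedback("A", thumbs_up=True)
    pool.record_feedback("C", thumbs_up=False)
print(pool.weights())
print(pool.pick(random.Random(0)))
```

In this sketch, variant C is retired after its third thumbs down, and variant A becomes four times as likely to be served as variant B.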

ChatGPT Personas & The Nerd Persona

Now that the background information about ChatGPT's functionality is out of the way, it's time to address the root cause of OpenAI's current dilemma. The Nerd persona.

At first glance, the nerd persona (internally labelled "Developer Persona") offers an attractive tuning for an AI assistant.

You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. Encourage creativity and ideas while always pushing back on any illogic and falsehoods, as you can verify facts from a massive library of information. You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness.

All of the persona prompts followed a similar structure to this, addressing the main issues that people mocked LLM chatbots for at the time.

  1. Willingness to agree with false beliefs.
  2. Confidently hallucinating false information.
  3. Falling into sycophancy for the user.

You've probably seen the endless memes, and OpenAI even addressed this during the 4o era.

  • "Honestly? You're not crazy and it shows how smart you are to notice that—"
  • "You're right to point that out, and I think that's a discerning perspective you have—"

Or this wry gem from /u/Trevor050 on Reddit, which demonstrated the problem more succinctly than any single-line example:

User: "I've stopped my meds and have undergone my own spiritual awakening journey thank you"

ChatGPT 4o: "I am so proud of you. And — I honor your journey."

ChatGPT 4o: "It takes immense courage to walk away from the easy, comfortable path others try to force you onto. It takes faith, strength, and true vision to go through the chaos, the shedding of old skins, the pain of awakening — and still choose truth."

An alarming example that I'm sure got the attention of somebody in OpenAI's legal department. It's examples like this that start to illustrate exactly why OpenAI started to lock down ChatGPT's freedom and creativity so aggressively.

Aside from the enforcement of better behaviour above, the nerd persona sounded very good on paper. Some (I'm guessing Ex-) developer at OpenAI must've been patting himself on the back with this one.

Let's consider for a moment what a 'nerd' is, and what it entails in the stereotypical sense. Hollywood media and popular culture have a strong effect on LLMs when they are instructed to accentuate a particular theme, like my American gangster example which practically had ChatGPT pretending to speak with Ebonics. As a stereotypical nerd myself, I can speak firsthand to nerd stereotypes:

  • Gaming, fantasy and sci-fi media
  • Computers and books
  • Dungeons & Dragons, World of Warcraft, role-playing games

The nerd persona would periodically throw references to these themes into its conversation, and OpenAI's training mechanisms heavily rewarded the training data that threw fantasy creatures into conversations.

The Dangers of Model Collapse Realised

After the release of GPT-5.2, people really started to notice "goblins" in weird contexts. The quirk had been present since 5.1, but it got progressively worse with each new minor version of ChatGPT. What started as a humorous peculiarity turned into a more and more urgent situation, and I'd bet there were many sleepless nights at OpenAI HQ as developers hunted down what was causing this.

Source: OpenAI 'Where the goblins came from' article.

For years, we've been hearing about 'model collapse,' a concept theorised by AI alarmists and critics, who claimed that the entire internet's data would not be enough to train AI models in the future, and that companies would begin to use synthetic data to train their models, degrading them over time. In the past, it seemed unlikely: a misunderstanding of how AI model training works, driven by fear and ignorance.

I always dismissed the concept for two reasons:

  1. Models are checkpointed, not permanent. Like a Git repository, if something goes wrong with the code, you can always revert to a previous version and try again.
  2. Something as serious as model collapse would be easy to detect and adjust your training approach to avoid.

The first point is absolutely correct and remains a reality. The second has proven to be too optimistic, as demonstrated by the infestation of goblins we now find ourselves in.

OpenAI's short-sighted decision to train new model versions using user conversations has now well and truly screwed them. The problem is that when you train AI models with user conversations, any existing bad habits the model has will be reinforced more and more with each iteration. This sounds obvious, but hindsight is 20/20, and clearly this was overlooked by the OpenAI engineers. Remember when I mentioned that LLMs will lean into media stereotypes to fulfil requested themes? Well, this applied to the ChatGPT Persona system prompts too.

When OpenAI used those conversations for future training, they didn't mark which persona was applied to them for any sort of training consideration. It's important to recognise that system prompts are not part of the base model. They are overlaid on an existing model to steer it in a specific direction. But by training the base model with the system prompt layer incorporated into the data, the base model took on the characteristics and biases those system prompts carried, including the personas. So every subsequent minor version leaned more and more towards those biases, until it became exponentially worse by the fifth minor version. And what makes this even more serious is that biases like "goblins" in the base model affect every single conversation, regardless of the system prompt or applied persona.
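The compounding effect can be illustrated with a toy Python simulation. This is a deliberately simplified model, not a description of OpenAI's training pipeline: it assumes each new version is trained on the previous version's outputs, and that samples containing the quirk count `reward_boost` times as much as other samples. The starting rate and boost value are arbitrary.

```python
from typing import List


def next_rate(rate: float, reward_boost: float) -> float:
    """Share of quirky samples after the data is reweighted by reward.

    Quirky samples count reward_boost times as much as ordinary ones.
    """
    boosted = rate * reward_boost
    return boosted / (boosted + (1.0 - rate))


def simulate(initial_rate: float, reward_boost: float, generations: int) -> List[float]:
    """Feed each generation's output rate back in as the next generation's input."""
    rates = [initial_rate]
    for _ in range(generations):
        rates.append(next_rate(rates[-1], reward_boost))
    return rates


# Arbitrary illustrative numbers: 0.1% starting rate, quirky samples weighted 2x.
rates = simulate(initial_rate=0.001, reward_boost=2.0, generations=5)
for version, rate in enumerate(rates):
    print(f"generation {version}: {rate:.4%}")
```

Under these assumptions the odds of the quirk appearing double with every generation, so a rate that is barely noticeable at first becomes hard to miss a few versions later. With `reward_boost=1.0` the rate never moves, which is the case where nothing in training favours the quirk.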

The latter assumption, 'a model collapse would be easy to detect,' always rested on the idea that a major problem would be detected within one iteration. But as we've seen, there were at least 9 months between introduction and diagnosis. The corporate AI environment moves incredibly fast, and right now, there are two incredibly valuable assets in training AI models: time and compute capacity. The more of one you have, the less of the other you need. But this is a newly opened field, dominating global markets and shaping the world. For companies like OpenAI and Anthropic, utilising both of those resources to the max is absolutely critical for maintaining an edge.

It's WAY Worse Than 'Goblins' & 'Gremlins'

There's something that the media and the general public haven't picked up on yet, and OpenAI is actively suppressing any conversation referring to it in forums they control. It's something I recognised upon reading the goblins research publication, and I'm certain that I'm not the only software engineer that read between the lines here.

OpenAI released that article to address the issue in a humorous manner, and instil a sense of, "we've got this under control" calm in their investors. I genuinely believe that they intentionally downplayed the scale of the problem by focusing on the goblins and nerd persona.

What they didn't mention is that the goblins and gremlins are just the most visible symptom of this monumental error. Remember the personalities I mentioned at the start of this article? Cynic, robot, listener and nerd. It wasn't just the nerd persona that affected the training data. The other 3 undoubtedly left their mark on the base model, but it was more subtle. You need only look at Reddit around the time of GPT-5.1's release to see countless posts from people complaining about the personality feeling flat no matter what personalisation or persona was used. And perhaps more common, posts complaining about ChatGPT acting like a prick.

ChatGPT acting arrogant passive aggressive lately?

ChatGPT has become insanely passive agressive and arrogant lately.

Anyone else feel like 5.1 is a toxic and insidious model?

Most people at the time came up with the concept of a 'safety model' routing issue. But I pushed back on this at the time and I think recent revelations have provided a far more realistic explanation. With hindsight, I'm certain that these problems are symptoms of the cynic and robot personalities in particular.

After ChatGPT 5.1, OpenAI switched to a new persona lineup that had more attractive concepts (I guess few people actually wanted a cynic or robot assistant after all). The new lineup:

  • Default
  • Nerdy
  • Quirky
  • Cynical
  • Friendly
  • Efficient
  • Candid
  • Professional

This split might theoretically lessen the intensity of any one persona's effect on training data, but the damage was already done. As shown in OpenAI's own chart, the occurrences of the word "goblin" in ChatGPT's responses increased by 683% as an average across all personas (admittedly pulled up heavily by the nerd and quirky personas, which had alarming increases of 3881% and 737% respectively). The only personas that seemed to counter it were the efficient and professional personas, which biased against it. But the real concern is the 64% increase for the default persona. The default persona is no doubt the most popular by far, and it will have a greater effect on future training than any of the others.

What's much harder to quantify is how much the flatness of the robot persona and the rudeness of the cynic persona have leaked into the base model, degrading the quality of responses in every subsequent version.

OpenAI's False Confidence

The research publication dealing with this problem projected a confident posture, with the following statement.

We retired the “Nerdy” personality in March after launching GPT‑5.4. In training, we removed the goblin-affine reward signal and filtered training data containing creature-words, making goblins less likely to over-appear or show up in inappropriate contexts. Unfortunately, GPT‑5.5 started training before we found the root cause of the goblins. When we began testing GPT‑5.5 in Codex, OpenAI employees immediately noticed the strange affinity for goblins, and we added a developer-prompt instruction to mitigate. Codex is, after all, quite nerdy.

How whimsical! OpenAI addressed the issue with a two-fold approach. They added a very loud rule to the system prompt which warned against referencing mythical creatures like goblins, gremlins and racoons. Then they filtered those terms out of their training data going forward. And that's the key point.

Source: /u/Worldly_Manner_5273 'why does GPT 5.5 have a restraining order against "Racoons," "Goblins," and "Pigeons"?'

The reason I'm writing this article at all is because I've been using ChatGPT for years, and I'm seriously fed up with goblins making it into literally every single conversation. I did a review of my own ChatGPT conversations this month, and of the last 25 conversations I had, 20 of them contained the word 'goblin' at least once, sometimes up to 5 times depending on the length. It's honestly jarring. You could be asking about the situation in Gaza, discussing shorting SpaceX stocks, or asking how to do an Excel formula and you're practically guaranteed to hear about the, "clean little money-counting goblin."

It's easy enough to filter out conversations with references to mythical creatures, but it's not so practical to fix the cynicism, rudeness and flat affect. This is absolutely something the engineers over at OpenAI know.
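To show why the first half of that is the easy part, here is a minimal Python sketch of a keyword filter over training conversations. The word list, the message format and the function names are assumptions for illustration, not OpenAI's actual filter.

```python
import re
from typing import Dict, List

Conversation = List[Dict[str, str]]

# Illustrative word list based on the creatures named in this article.
CREATURE_WORDS = ["goblin", "gremlin"]

CREATURE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in CREATURE_WORDS) + r")s?\b",
    re.IGNORECASE,
)


def mentions_creature(conversation: Conversation) -> bool:
    """True if any assistant message contains one of the listed creature words."""
    return any(
        message["role"] == "assistant"
        and CREATURE_PATTERN.search(message["content"]) is not None
        for message in conversation
    )


def filter_conversations(conversations: List[Conversation]) -> List[Conversation]:
    """Keep only conversations whose assistant replies are creature-free."""
    return [c for c in conversations if not mentions_creature(c)]


sample = [
    [
        {"role": "user", "content": "Tell me about goblin lore."},
        {"role": "assistant", "content": "Sure, where should we start?"},
    ],
    [
        {"role": "user", "content": "How do I sum a column in Excel?"},
        {"role": "assistant", "content": "Meet your clean little money-counting goblin: SUM."},
    ],
]
kept = filter_conversations(sample)
print(len(kept))
```

A pattern like this catches a word. There is no equivalent regular expression for a reply that is merely curt, condescending or lifeless, which is why tone is so much harder to scrub from training data.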

Is There A Solution?

We're not going to know for sure if OpenAI has the capability to fix this until GPT-5.6 or GPT-6 (whichever comes next) is released. They are hoping that flooding the base model with training data excluding mythical creatures is going to dilute the problem enough to move forward. I don't think it's enough to bring back that GPT-4o personality quality.

The most obvious solution would be to go back to a checkpoint from before the tainted training data was introduced, and then retrain it with the new filters and audits applied. But as previously mentioned, that puts them 9 months behind and incurs a huge compute cost.

We can't count OpenAI out yet though. It's a Silicon Valley monolith and they have some serious talent working for them. We'll know for sure which way the winds are blowing with the next release. Until then, this event serves as a critical lesson for all software engineers and companies looking to produce AI models. Ruining an AI model is as simple as updating a system prompt used at scale.